Specification for sending statistics

Images subdirectory handled in test blueprint send-anon-usage Change-Id: I9e8a5f2dd289ec957e047a3ab7391d351bf3d3de
2014-09-09 19:59:43 +04:00 · 2014-09-09 19:59:43 +04:00 · 9b60814c7b
commit 9b60814c7b
parent 30a88d4bd8
3 changed files with 413 additions and 0 deletions
--- a/specs/6.0/images/fuel-stat-architecture.png
+++ b/specs/6.0/images/fuel-stat-architecture.png
--- a/specs/6.0/statistics-collecting.rst
+++ b/specs/6.0/statistics-collecting.rst
@@ -0,0 +1,411 @@
 ..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.
 http://creativecommons.org/licenses/by/3.0/legalcode
 ==================================================
 Fuel-stats - sending of the statistics information
 ==================================================
 https://blueprints.launchpad.net/fuel/+spec/send-anon-usage
 Fuel-stats is a service that collects usage data from Fuel installations and
 provides analytical information about how the Fuel product is used.
 Problem description
 ===================
 We need to understand how customers are using Fuel. We need to collect
 usage statistics and provide analytics reports.
 We need to send `immediate failure reports`_ to the support team when a
 deployment fails.
 Proposed change
 ===============
 The Fuel-stats service consists of three parts: a statistics collecting
 service (collector), an analytics service (analytics), and a tool that
 migrates data from the relational DB into the analytics engine (migration).
 .. figure:: images/fuel-stat-architecture.png
   :alt: Fuel-stats architecture
   Pic 1. Fuel-stats architecture
 .. _`statistic data`:
 Statistic data includes:
 * Operation type (adding cluster, adding node, deployment,
  removing node, etc.)
 * Operation start and finish time (in UTC)
 * Distribution / OS
 * Reference architecture (e.g. HA)
 * Network type (Nova-Network, Neutron with VLAN, GRE, NSX, etc.)
 * Hypervisor (KVM, QEMU, vCenter, etc.)
 * Storage options (Glance w/ Ceph, Glance w/ Swift, Cinder w/ LVM,
  Cinder w/iSCSI, Ceph, etc.)
 * Related Projects (Sahara, Murano, Ceilometer, etc.)
 * Number of nodes deployed
 * Roles deployed to each node
 * Number of environments
 * Installation master node identifier (generated once during installation)
 * Fuel version info (build number, release, build id, nailgun sha,
  fuelmain sha, ostf sha, etc.)
 * OpenStack version info
 * Settings modified on Settings tab
 * Interfaces configuration
 * Disk layout
 * Hardware (so we can differentiate between virtualbox and bare metal installs)
 * Network verification - whether it was used, and what was the result
 * Networking configuration
 * Actual time (in seconds) that it took to complete the operation
 * Whether any manual customizations of node metadata were made
 * Kernel parameters
 * Admin network parameters
 * PXE parameters
 * DNS parameters
 * Whether fuel menu was used
 * Whether OSTF was used, and the test results
 * Customer contact information, if provided
 * Plugins information
 For some operations only part of the statistics data can be provided. All
 report creation logic is implemented in the analytics service. All
 identifying information should be sanitized.
 .. _`master node identifier`:
 Statistics are grouped by the unique master node identifier, which is
 generated once, during installation or upgrade of Nailgun. The master node
 identifier is stored in the master node settings in the Nailgun DB.
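 The "generate once" rule above can be sketched as follows. This is a minimal
 illustration, not the Nailgun implementation: the ``settings`` dict stands in
 for the master node settings stored in the Nailgun DB, and the key name
 ``master_node_uid`` and the use of a random UUID are assumptions.

```python
import uuid


def ensure_master_node_uid(settings):
    """Return the master node identifier, generating it only if it is absent.

    `settings` is a plain dict standing in for the master node settings
    (hypothetical layout); the key name is an assumption.
    """
    if not settings.get("master_node_uid"):
        # A random UUID carries no information about the host itself.
        settings["master_node_uid"] = str(uuid.uuid4())
    return settings["master_node_uid"]


settings = {}
first = ensure_master_node_uid(settings)   # e.g. during installation
second = ensure_master_node_uid(settings)  # e.g. during a later upgrade
```

 Calling the helper again leaves the stored value untouched, so all statistics
 from one installation stay grouped under the same identifier.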
 Community and commercial Fuel installations will be served by different
 Fuel-stats instances, so community and commercial ISOs should ship with
 different collector and analytics URIs.
 The collector service provides a REST API that is available from the internet.
 Analytics provides a REST API and a UI for viewing statistics reports online.
 Access to the analytics UI and REST API of the commercial instance is limited
 to the Mirantis network. The analytics REST API has public and private parts.
 The public part is available for search requests only. The private part is
 available only on localhost. The collector and analytics APIs are available
 only through HTTPS.
 Requests and responses in collector are validated by JSON Schema.
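 To illustrate the idea of schema validation, here is a deliberately small
 sketch. It is a hand-written stand-in that understands only the ``type``,
 ``required`` and ``properties`` keywords, not a full JSON Schema validator,
 and the schema itself (field names included) is hypothetical rather than
 taken from the collector.

```python
# Hypothetical schema for one action log record; field names are assumptions.
ACTION_LOG_SCHEMA = {
    "type": "object",
    "required": ["master_node_uid", "action_name", "is_success"],
    "properties": {
        "master_node_uid": {"type": "string"},
        "action_name": {"type": "string"},
        "is_success": {"type": "boolean"},
    },
}

# Mapping from the JSON Schema type names used above to Python types.
_PY_TYPES = {"object": dict, "string": str, "boolean": bool}


def validate(instance, schema):
    """Return a list of error strings; an empty list means valid.

    Only the `type`, `required` and `properties` keywords are handled.
    """
    if not isinstance(instance, _PY_TYPES[schema["type"]]):
        return ["expected %s" % schema["type"]]
    errors = []
    if schema["type"] == "object":
        for key in schema.get("required", []):
            if key not in instance:
                errors.append("missing required field: %s" % key)
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                for error in validate(instance[key], subschema):
                    errors.append("%s: %s" % (key, error))
    return errors


good = {"master_node_uid": "abc", "action_name": "deploy", "is_success": True}
bad = {"master_node_uid": 42, "action_name": "deploy"}
good_errors = validate(good, ACTION_LOG_SCHEMA)
bad_errors = validate(bad, ACTION_LOG_SCHEMA)
```

 A real service would more likely rely on an existing JSON Schema library; the
 point here is only that a request is rejected before it reaches the DB.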
 The design of the analytics UI is out of scope of this specification.
 Each Fuel installation with the 'send statistics' option enabled sends data to
 the collector API once an hour, at a random time. The data consists of action
 logs and information about the configuration of clusters, nodes and other
 objects. Each request to and response from the collector API is validated
 against a JSON Schema. After validation and processing, the data is saved into
 the collector DB. The collector DB has a slave replica.
 Periodically, data from the collector DB is extracted, transformed and loaded
 (ETL) into the analytics DB. For performance reasons, ETL can be configured to
 work with the replica DB. Elasticsearch is used as the analytics engine.
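 One way to read "once an hour at a random time" is to pick a random offset
 inside every hourly window, which spreads the load on the collector across
 installations. The sketch below assumes that reading; the function name and
 the use of plain integer timestamps are illustrative only.

```python
import random

SECONDS_PER_HOUR = 3600


def send_times(start, hours, rng):
    """Return one send timestamp per hourly window, each at a random offset."""
    times = []
    for i in range(hours):
        window_start = start + i * SECONDS_PER_HOUR
        # randrange excludes the upper bound, so the result stays in the window.
        times.append(window_start + rng.randrange(SECONDS_PER_HOUR))
    return times


# A seeded generator keeps the example reproducible.
schedule = send_times(start=0, hours=3, rng=random.Random(42))
```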
 The collector and analytics DBs are backed up periodically. The collector DB
 backup is taken from the slave replica for performance reasons.
 Analytics information can be accessed through the analytics API or web UI.
 Heavy analytics reports can be processed asynchronously, using tasks and a
 messaging system.
 An option for sending statistics and a form for customer contact information
 should be added to the Nailgun UI, together with a detailed description of
 the statistics being sent.
 Storing of action logs should be added to Nailgun. Each modification
 requested through the Nailgun API should be stored in the DB table action_logs.
 An action log record contains the actor (the user who performed the
 operation), action name, result code, execution time, processing time, and
 some serialized additional info. Both successful and failed operations are
 logged, the latter with an error description. Logged info should be sanitized
 of any credentials data. Action logs are always saved, regardless of the state
 of the 'send statistics' option. Nailgun task info is also stored in the
 action logs table.
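 The sanitizing step could look roughly like this. The list of sensitive key
 fragments, the mask value and the sample record are assumptions made for the
 example; the spec does not define them.

```python
# Assumed fragments that mark a key as credential-like.
SENSITIVE_FRAGMENTS = ("password", "token", "secret")
MASK = "***"


def sanitize(value):
    """Recursively mask the values of credential-like keys."""
    if isinstance(value, dict):
        cleaned = {}
        for key, item in value.items():
            if any(frag in str(key).lower() for frag in SENSITIVE_FRAGMENTS):
                cleaned[key] = MASK
            else:
                cleaned[key] = sanitize(item)
        return cleaned
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    return value


# Hypothetical "additional info" of an action log record.
raw_info = {
    "user": "admin",
    "password": "s3cr3t",
    "nodes": [{"name": "node-1", "auth_token": "abc123"}],
}
clean_info = sanitize(raw_info)
```

 The function returns a new structure and leaves its input unchanged, so the
 original request data can still be processed normally.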
 Requests from fuel-cli and fuel-web carry a custom value in the HTTP
 User-Agent header ('fuel-cli' and 'fuel-web' respectively), so that the two
 kinds of requests can be told apart easily.
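 On the receiving side, telling the two clients apart can be as simple as the
 following sketch. The helper name and the ``"other"`` fallback are
 hypothetical; only the two header values come from the text above.

```python
KNOWN_CLIENTS = ("fuel-cli", "fuel-web")


def request_source(headers):
    """Classify a request by its User-Agent header value."""
    agent = headers.get("User-Agent", "")
    return agent if agent in KNOWN_CLIENTS else "other"


cli_source = request_source({"User-Agent": "fuel-cli"})
web_source = request_source({"User-Agent": "fuel-web"})
unknown_source = request_source({"User-Agent": "curl/7.30.0"})
```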
 Execution time is calculated for asynchronous tasks and Nailgun API requests
 and added to the action logs.
 Sending of statistics from Nailgun to the collecting service will be
 implemented as a background process. This process should remember the last
 action log it sent and send only fresh records. It should not affect Nailgun
 services and should be robust to errors. It is started by supervisord. On
 each run the process also sends detailed installation information: number of
 environments, number of nodes and their roles, Fuel release info, OpenStack
 release info, network configuration, etc.
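 The "send only fresh records" behaviour can be sketched as below. The record
 layout (an integer ``id``), the ``state`` dict holding ``last_sent_id`` and
 the ``send`` callable are all assumptions for illustration; in the real
 service the marker would presumably live in the Nailgun DB.

```python
def send_fresh_records(action_logs, state, send):
    """Send records newer than the stored marker; return how many were sent.

    The marker is advanced only after `send` succeeds, so a failed run
    is simply retried on the next run.
    """
    last_sent_id = state.get("last_sent_id", 0)
    batch = [rec for rec in action_logs if rec["id"] > last_sent_id]
    if not batch:
        return 0
    try:
        send(batch)
    except Exception:
        # Robust to errors: keep the marker and try again next time.
        return 0
    state["last_sent_id"] = max(rec["id"] for rec in batch)
    return len(batch)


logs = [{"id": 1}, {"id": 2}, {"id": 3}]
state = {}
delivered = []
sent_first = send_fresh_records(logs, state, delivered.extend)
sent_again = send_fresh_records(logs, state, delivered.extend)
```

 Because the marker moves only after a successful send, an unreachable
 collector delays delivery but does not lose records.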
 Requested analytics reports:
 * totals/distribution for all the categories of information gathered:
    * distribution of OSes of each type (CentOS/Ubuntu) by installations,
    * distribution of node counts by installations,
    * distribution of hypervisor types by environments,
    * average deployment time,
    * how many of a given release (2014.2-6.0, 2014.2.1-6.1, etc.)
      are deployed,
    * most common HW server type.
 * average number of deployment failures before success for environments,
 * total number of node types deployed across customers (e.g. controllers,
  compute, storage, MongoDB, Zabbix, etc.). This should be smart enough
  to recognize combined nodes as well (e.g. where compute and storage are
  on the same node).
 * number of failures for specific Health Checks vs. total runs. This would be,
  for example, to identify the most commonly failing test.
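 Two of the requested reports can be sketched in a few lines. The input
 shapes (a list of nodes with a ``roles`` list, and a per-environment list of
 deployment outcomes) are assumptions, and in the real service such
 aggregations would be run by the analytics engine rather than in plain
 Python.

```python
from collections import Counter


def role_combinations(nodes):
    """Count nodes by their sorted role set, so a combined node such as
    compute+storage is reported as its own type."""
    return Counter("+".join(sorted(node["roles"])) for node in nodes)


def failures_before_success(outcomes):
    """Return the number of failed deployments before the first success,
    or None if the environment never deployed successfully."""
    for index, succeeded in enumerate(outcomes):
        if succeeded:
            return index
    return None


def average_failures_before_success(environments):
    """Average over the environments that eventually succeeded."""
    counts = [failures_before_success(outcomes) for outcomes in environments]
    counts = [count for count in counts if count is not None]
    return sum(counts) / float(len(counts)) if counts else None


nodes = [
    {"roles": ["controller"]},
    {"roles": ["compute", "cinder"]},
    {"roles": ["cinder", "compute"]},
]
by_roles = role_combinations(nodes)
average = average_failures_before_success(
    [[False, False, True], [True], [False]]
)
```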
 .. _`immediate failure reports`:
 The collector API is used for sending failure reports. On failure, all
 required information is gathered, combined with the `customer contact`_
 information and sent to the collector. On the collector side the failure
 report is processed immediately and a notification is sent to the support
 team. If `customer contact`_ is not filled in, only the action log of the
 failure is stored.
 Alternatives
 ------------
 None
 Data model impact
 -----------------
 New databases for collector and analytics will be created.
 The action_logs table will be added to the Nailgun DB.
 If the amount of data becomes very large, the DB can be partitioned by DB
 migration scripts. If partitioning is required, we can introduce it by
 creating a master table with partitions and moving the data into the
 partitioned table. After that the partitioned and original tables can be
 swapped by renaming.
 Master node settings will be added to the Nailgun DB. They include
 `customer contact`_ and `master node identifier`_.
 REST API impact
 ---------------
 REST APIs for the collector and analytics services will be created.
 An API call for enabling and disabling the sending of statistics will be added
 to Nailgun.
 Upgrade impact
 --------------
 The action logs table should be included in the DB migration.
 During deployment the `master node identifier`_ should be generated if it has
 not been generated yet.
 After upgrade, information about environments, nodes, roles, networks,
 releases, etc. will be sent to the collector with the next scheduled sending
 of action logs.
 Security impact
 ---------------
 Protection from data spoofing should be designed and implemented.
 Authentication should be added for access to analytics UI.
 The collector and analytics APIs are available only through HTTPS.
 Notifications impact
 --------------------
 None
 Other end user impact
 ---------------------
 An option for sending statistics and the `customer contact`_ form are added to
 the Fuel UI settings. We must show a clear and obvious message that we are
 collecting data. Information about sending statistics and the
 `customer contact`_ form are shown in a popup right after authorization in
 Fuel. Later they can be changed on the settings tab.
 .. _`customer contact`:
 Customer contact information is added to the settings tab. This information
 is used in `immediate failure reports`_ for feedback and in statistics info.
 Contact information is:
 * Last Name, First Name
 * Email Address
 * Company Name
 By default, the option for sending statistics is selected, and customer
 contacts are not. Statistics will be sent only if the user selects the
 'send statistics' option and saves it in the UI.
 Performance Impact
 ------------------
 Performance should be measured on a large number of action logs.
 Other deployer impact
 ---------------------
 We require hosting for the collector and analytics services and their DBs.
 The collector and analytics services and the DB migrations should be deployed
 from packages by puppet manifests.
 Community and commercial Fuel installations are served by different
 Fuel-stats instances, so the settings of community and commercial Fuel
 distributions should contain different URIs.
 During deployment the `master node identifier`_ should be generated if it has
 not been generated yet.
 Developer impact
 ----------------
 None
 Implementation
 ==============
 Assignee(s)
 -----------
 Primary assignee:
 * aroma@mirantis.com (Artem Roma)
 * akasatkin@mirantis.com (Alexey Kasatkin)
 * akislitsky@mirantis.com (Alexander Kislitksy)
 Other contributors:
 * jkirnosova@mirantis.com (Julia Aranovich) UI developer
 * kpimenova@mirantis.com (Ekaterina Pimenova) UI developer
 * acharykov@mirantis.com (Alexander Charykov) DevOps developer
 * apanchenko@mirantis.com (Artem Panchenko) QA specialist
 * asledzinskiy@mirantis.com (Andrey Sledzinskiy) QA specialist
 * dkaiharodsev@mirantis.com (Dmitry Kaiharodsev) OSCI specialist
 Work Items
 ----------
 Implementation is split into several stages.
 Used technologies
 ^^^^^^^^^^^^^^^^^
 * Programming language - Python 2.7.
 * Application server - uWSGI.
 * API protocol definition - JSON Schema.
 * Web service - Nginx.
 * Database - PostgreSQL.
 * Slave DB replica - by PostgreSQL native WAL technology.
 * DB schema migrations - Alembic.
 * Analytics engine - Elasticsearch
 Stage 1
 ^^^^^^^
 All logic should be covered by unit tests.
 * Configure uWSGI + Nginx + DB. Run simple WSGI application in collector
 * Add JSON Schema support and validation of test request/response
 * Initiate implementation of puppet manifests for service deployment,
  DBs backup
 * Check deployment of collector and analytics, when deployment is ready
 * Implement part of collector API and initiate testing and load testing
  of it by QA team
 * Initiate implementation of enabling sending statistics and viewing
  `statistic data`_
 * Implement saving action logs in Nailgun
 * Implement sending statistics to collector from Nailgun
 * Initiate Nailgun testing by QA
 * Implement enough logic to switch to implementation of the analytics service
 * Implement part of analytics API
 * Implement data migration from PostgreSQL to Elasticsearch
 * Initiate analytics UI implementation
 * Implement full analytics API, collector API
 * Testing, fixing
 * Deploy DB, collector, analytics to servers
 * Add services and servers to the monitoring of IT infrastructure
 * First release is done
 Limitations of the first release:
 * No authentication
 * No replication of collector DB
 * No backup of DB
 * Heavy analytics reports are not handled
 * Only commercial instance is implemented (access to the analytics UI and
  REST API is limited to Mirantis network)
 * No OSTF statistics
 * No action logs viewing in the Nailgun UI
 * No immediate failure reports to the support team
 * No plugins statistics
 Stage 2
 ^^^^^^^
 * Community instance is implemented
 * Improve analytics reports and analytics UI
 * Action logs viewing in the Nailgun UI
 * Collecting OpenStack statistics
 Stage 3
 ^^^^^^^
 * Handle collector DB backup
 * Handle collector DB replication
 * Sending OSTF and plugins statistics
 * Improve analytics reports and analytics UI
 * Immediate failure reports to the support team
 Stage 4
 ^^^^^^^
 * Handle authentication
 * Handle analytics DB backup
 * Improve analytics reports and analytics UI
 Stage 5
 ^^^^^^^
 * Handle heavy analytics reports
 * Handle data partitioning (if required)
 * Improve analytics reports and analytics UI
 Dependencies
 ============
 None
 Testing
 =======
 The following tests are required:
 * APIs integration testing
 * APIs load testing
 * UI functional testing
 Documentation Impact
 ====================
 The option for enabling sending and the `statistic data`_ details
 should be documented.
 The collector API will be documented by JSON Schemas (probably by sphinx).
 Analytics reports and analytics UI should be documented.
 References
 ==========
 None
--- a/tests/test_titles.py
+++ b/tests/test_titles.py
@@ -82,6 +82,8 @@ class TestTitles(testtools.TestCase):
    def test_template(self):
        files = ['specs/template.rst'] + glob.glob('specs/*/*')
        # filtering images subdirectory
        files = filter(lambda x: 'images' not in x, files)
        for filename in files:
            self.assertTrue(filename.endswith(".rst"),
                            "spec's file must uses 'rst' extension.")