Specification for sending statistics
Images subdirectory handled in test blueprint send-anon-usage Change-Id: I9e8a5f2dd289ec957e047a3ab7391d351bf3d3de
This commit is contained in:
parent
30a88d4bd8
commit
9b60814c7b
BIN
specs/6.0/images/fuel-stat-architecture.png
Normal file
BIN
specs/6.0/images/fuel-stat-architecture.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 44 KiB |
411
specs/6.0/statistics-collecting.rst
Normal file
411
specs/6.0/statistics-collecting.rst
Normal file
@ -0,0 +1,411 @@
|
|||||||
|
..
|
||||||
|
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||||
|
License.
|
||||||
|
|
||||||
|
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||||
|
|
||||||
|
==================================================
|
||||||
|
Fuel-stats - sending of the statistics information
|
||||||
|
==================================================
|
||||||
|
|
||||||
|
https://blueprints.launchpad.net/fuel/+spec/send-anon-usage
|
||||||
|
|
||||||
|
Fuel-stats is the service of collecting and providing analytical
|
||||||
|
information about using of the Fuel product.
|
||||||
|
|
||||||
|
Problem description
|
||||||
|
===================
|
||||||
|
|
||||||
|
We need to understand how customers are using Fuel. We need to collect
|
||||||
|
usage statistics and provide the analytics reports.
|
||||||
|
|
||||||
|
We need to send `immediate failure reports`_ to the support team on
|
||||||
|
failed deployment.
|
||||||
|
|
||||||
|
Proposed change
|
||||||
|
===============
|
||||||
|
|
||||||
|
Fuel-stats service is separated into three parts. The first one is statistics
|
||||||
|
collecting service (collector), the second one is analytical service
|
||||||
|
(analytics), the third one is data migration tool from relational DB into
|
||||||
|
analytics engine (migration).
|
||||||
|
|
||||||
|
.. figure:: images/fuel-stat-architecture.png
|
||||||
|
:alt: Fuel-stats architecture
|
||||||
|
|
||||||
|
Pic 1. Fuel-stats architecture
|
||||||
|
|
||||||
|
.. _`statistic data`:
|
||||||
|
|
||||||
|
Statistic data includes:
|
||||||
|
|
||||||
|
* Operation type (adding cluster, adding node, deployment,
|
||||||
|
removing node, e.t.c.)
|
||||||
|
* Operation start and finish time (in UTC)
|
||||||
|
* Distribution / OS
|
||||||
|
* Reference architecture (e.g. HA)
|
||||||
|
* Network type (Nova-Network, Neutron with VLAN, GRE, NSX, etc.)
|
||||||
|
* Hypervisor (KVM, QEMU, vCenter, etc.)
|
||||||
|
* Storage options (Glance w/ Ceph, Glance w/ Swift, Cinder w/ LVM,
|
||||||
|
Cinder w/iSCSI, Ceph, etc.)
|
||||||
|
* Related Projects (Sahara, Murano, Ceilometer, etc.)
|
||||||
|
* Number of nodes deployed
|
||||||
|
* Roles deployed to each node
|
||||||
|
* Number of environments
|
||||||
|
* Installation master node identifier (generated once during installation)
|
||||||
|
* Fuel version info (build number, release, build id, nailgun sha,
|
||||||
|
fuelmain sha, ostf sha, e.t.c.)
|
||||||
|
* OpenStack version info
|
||||||
|
* Settings modified on Settings tab
|
||||||
|
* Interfaces configuration
|
||||||
|
* Disk layout
|
||||||
|
* Hardware (so we can differentiate between virtualbox and bare metal installs)
|
||||||
|
* Network verification - whether it was used, and what was the result
|
||||||
|
* Networking configuration
|
||||||
|
* Actual time (in seconds) that is took to complete the operation
|
||||||
|
* Is there any manual customizations of nodes metadata done
|
||||||
|
* Kernel parameters
|
||||||
|
* Admin network parameters
|
||||||
|
* PXE parameters
|
||||||
|
* DNS parameters
|
||||||
|
* Is fuel menu used
|
||||||
|
* Is OSTF used, and tests results
|
||||||
|
* Customer contact information, if provided
|
||||||
|
* Plugins information
|
||||||
|
|
||||||
|
On some operations we can provide only part of statistic info. All
|
||||||
|
reports creation logic is implemented in the analytic. All the identifying
|
||||||
|
information should be sanitized.
|
||||||
|
|
||||||
|
.. _`master node identifier`:
|
||||||
|
|
||||||
|
Statistics is grouped by the unique master node identifier which is generated
|
||||||
|
once during installation or upgrade the Nailgun. Master node identifier is
|
||||||
|
stored in master node settings in the Nailgun DB.
|
||||||
|
|
||||||
|
Community and commercial Fuel installations will be provided by different
|
||||||
|
Fuel-stats instances. So we should provide different URIs of collector and
|
||||||
|
analytics in community and commercial ISOs.
|
||||||
|
|
||||||
|
Collector service provides REST API which available from the internet.
|
||||||
|
Analytics provides REST API and have UI to viewing stats reports online.
|
||||||
|
Access to the analytics UI and REST API for commercial instance is limited to
|
||||||
|
Mirantis network. Analytics REST API has public and private parts.
|
||||||
|
Public part is available for search requests only. Private part of
|
||||||
|
REST API available only on localhost. API of collector and analytics
|
||||||
|
is available only through HTTPS.
|
||||||
|
|
||||||
|
Requests and responses in collector are validated by JSON Schema.
|
||||||
|
|
||||||
|
Design analytics UI is out of scope of this specification.
|
||||||
|
|
||||||
|
Each Fuel installation, with enabled sending statistics option, in random time
|
||||||
|
once an hour sends info to collector API from action logs and information about
|
||||||
|
cluster, nodes and other objects configurations. Each request and response
|
||||||
|
to the collector API is validated by JSON scheme. After validation and
|
||||||
|
processing data saves into collector DB. Collector DB has slave replica.
|
||||||
|
Periodically data from collector DB extracted, transformed, loaded (ETL)
|
||||||
|
into analytics DB. For performance issues ETL can be configured to work
|
||||||
|
with replica DB. As analytics engine Elasticsearch is used.
|
||||||
|
|
||||||
|
Periodically backup of collector and analytics DBs is made. Collector DB's
|
||||||
|
backup is made from slave replica due to performance issues.
|
||||||
|
|
||||||
|
Analytics information can be accessed through analytics API or web UI. For
|
||||||
|
heavy analytics reports can be used asynchronous processing, based on tasks
|
||||||
|
and messaging system.
|
||||||
|
|
||||||
|
Option for sending statistics and customer contact information should be
|
||||||
|
added into Nailgun UI. Also detailed description of sending statistics
|
||||||
|
should be added into Nailgun UI.
|
||||||
|
|
||||||
|
Storing of action logs should be added into Nailgun. Each modification
|
||||||
|
requested through Nailgun API should be stored into DB table action_logs.
|
||||||
|
Action logs records contains actor (user, performed the operation), action
|
||||||
|
name, result code, execution time, processing time, and some serialized
|
||||||
|
additional info. Success and failure operations with error description
|
||||||
|
are logged. Logged info should be sanitized from any credentials data.
|
||||||
|
Action logs are saved always and saving is not depends on state of
|
||||||
|
the 'send statistics' option. Nailgun tasks info also stored into
|
||||||
|
the action logs table.
|
||||||
|
|
||||||
|
Requests from fuel-cli and fuel-web have custom value in the HTTP header
|
||||||
|
User-Agent. 'fuel-cli' and 'fuel-web' for simple requests separation.
|
||||||
|
|
||||||
|
Execution time is calculated for asynchronous tasks, Nailgun API requests
|
||||||
|
and added into action logs.
|
||||||
|
|
||||||
|
Sending of statistics from Nailgun to collecting service will be implemented
|
||||||
|
as background process. This background process should save info about last
|
||||||
|
sent action log and sends only fresh records. Sending process should not
|
||||||
|
affect Nailgun services, should be robust to errors. It is started by
|
||||||
|
supervisord. Also this process on each run sends installation detailed
|
||||||
|
information: environments number, nodes number and roles, Fuel release info,
|
||||||
|
OpenStack release info, network configuration, e.t.c.
|
||||||
|
|
||||||
|
Requested analytics reports:
|
||||||
|
|
||||||
|
* totals/distribution for all the categories of information gathered:
|
||||||
|
|
||||||
|
* distribution of OSes of each type (CentOS/Ubuntu) by installations,
|
||||||
|
* distribution of nodes numbers by installations,
|
||||||
|
* distribution of hypervizor types by envirionments,
|
||||||
|
* average deployment time,
|
||||||
|
* how many of a given release (2014.2-6.0, 2014.2.1-6.1, etc.)
|
||||||
|
are deployed,
|
||||||
|
* most common HW server type.
|
||||||
|
|
||||||
|
* average number of deployment failures before success for environments,
|
||||||
|
* total number of node types deployed across customers (e.g. controllers,
|
||||||
|
compute, storage, MongoDB, Zabbix, etc.). This should be smart enough
|
||||||
|
to recognize combined nodes as well (e.g. where compute and storage are
|
||||||
|
on the same node).
|
||||||
|
* number of failures for specific Health Checks vs. total runs. This would be,
|
||||||
|
for example, to identify the most commonly failing test.
|
||||||
|
|
||||||
|
.. _`immediate failure reports`:
|
||||||
|
|
||||||
|
For sending failure reports collector API is used. On failure all required
|
||||||
|
information is gathered, combined with `customer contact`_ and sent to the
|
||||||
|
collector. On the collector side failure report is immediately processed and
|
||||||
|
notification is sent to the support team. If `customer contact`_ is not
|
||||||
|
filled only action log of failure will be stored.
|
||||||
|
|
||||||
|
Alternatives
|
||||||
|
------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Data model impact
|
||||||
|
-----------------
|
||||||
|
|
||||||
|
New databases for collector and analytics will be created.
|
||||||
|
|
||||||
|
Action_logs table added into Nailgun.
|
||||||
|
|
||||||
|
In case of extra-large data amounts DB can be partitioned by DB
|
||||||
|
migration scripts. If partitioning is required we can introduce it
|
||||||
|
by creating master table and partitions and moving data into
|
||||||
|
partitioned table. After that partitioned and original table can be
|
||||||
|
swapped by renaming.
|
||||||
|
|
||||||
|
Master node settings will be added into Nailgun DB. `customer contact`_,
|
||||||
|
`master node identifier`_ are included into master node settings.
|
||||||
|
|
||||||
|
REST API impact
|
||||||
|
---------------
|
||||||
|
|
||||||
|
REST API for collector and analytics services will be created.
|
||||||
|
API call for enabling and disabling sending statistics in the Nailgun.
|
||||||
|
|
||||||
|
Upgrade impact
|
||||||
|
--------------
|
||||||
|
|
||||||
|
Action logs table should be included into DB migration.
|
||||||
|
|
||||||
|
During deployment `master node identifier`_ should be generated if it not
|
||||||
|
generated yet.
|
||||||
|
|
||||||
|
After upgrade information about environments, nodes, roles, networks,
|
||||||
|
releases, e.t.c. will be sent into collector on scheduled action logs
|
||||||
|
sending.
|
||||||
|
|
||||||
|
Security impact
|
||||||
|
---------------
|
||||||
|
|
||||||
|
Protection from data spoofing should be designed and implemented.
|
||||||
|
Authentication should be added for access to analytics UI.
|
||||||
|
Collector and analytics API available only through HTTPS.
|
||||||
|
|
||||||
|
Notifications impact
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Other end user impact
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
Option for sending statistic and `customer contact`_ are added into Fuel UI
|
||||||
|
settings. We must have a clear, and obvious message that we are collecting
|
||||||
|
data. Information about sending statistics and `customer contact`_ form are
|
||||||
|
shown at once on the popup page after authorization in the Fuel. Later they
|
||||||
|
can be changed on the settings tab.
|
||||||
|
|
||||||
|
.. _`customer contact`:
|
||||||
|
|
||||||
|
Customer contact information is added to the settings tab. This information
|
||||||
|
is used in `immediate failure reports`_ for feedback and in statistics info.
|
||||||
|
Contact information is:
|
||||||
|
|
||||||
|
* Last Name, First Name
|
||||||
|
* Email Address
|
||||||
|
* Company Name
|
||||||
|
|
||||||
|
By default, option for sending statistics is selected, and customer contacts
|
||||||
|
are not. Statistics will be sent only if user select option 'send statistics'
|
||||||
|
and save it in the UI.
|
||||||
|
|
||||||
|
Performance Impact
|
||||||
|
------------------
|
||||||
|
|
||||||
|
Performance should be measured on the large amount of action logs.
|
||||||
|
|
||||||
|
Other deployer impact
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
We require hosting for collector and analytics services and their DBs.
|
||||||
|
|
||||||
|
Collector and analytics services, DBs migrations should be deployed by
|
||||||
|
puppet manifests from packages.
|
||||||
|
|
||||||
|
Community and commercial Fuel installation are provided by different
|
||||||
|
Fuel-stats instances. Different URIs should be in settings of
|
||||||
|
community and commercial Fuel distributions.
|
||||||
|
|
||||||
|
During deployment `master node identifier`_ should be generated if it not
|
||||||
|
generated yet.
|
||||||
|
|
||||||
|
Developer impact
|
||||||
|
----------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Implementation
|
||||||
|
==============
|
||||||
|
|
||||||
|
Assignee(s)
|
||||||
|
-----------
|
||||||
|
|
||||||
|
Primary assignee:
|
||||||
|
|
||||||
|
* aroma@mirantis.com (Artem Roma)
|
||||||
|
* akasatkin@mirantis.com (Alexey Kasatkin)
|
||||||
|
* akislitsky@mirantis.com (Alexander Kislitksy)
|
||||||
|
|
||||||
|
Other contributors:
|
||||||
|
|
||||||
|
* jkirnosova@mirantis.com (Julia Aranovich) UI developer
|
||||||
|
* kpimenova@mirantis.com (Ekaterina Pimenova) UI developer
|
||||||
|
* acharykov@mirantis.com (Alexander Charykov) DevOps developer
|
||||||
|
* apanchenko@mirantis.com (Artem Panchenko) QA specialist
|
||||||
|
* asledzinskiy@mirantis.com (Andrey Sledzinskiy) QA specialist
|
||||||
|
* dkaiharodsev@mirantis.com (Dmitry Kaiharodsev) OSCI specialist
|
||||||
|
|
||||||
|
Work Items
|
||||||
|
----------
|
||||||
|
|
||||||
|
Implementation is separated on several stages.
|
||||||
|
|
||||||
|
Used technologies
|
||||||
|
^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
* Programming language - Python 2.7.
|
||||||
|
* Application server - uWSGI.
|
||||||
|
* API protocol definition - JSON Schema.
|
||||||
|
* Web service - Nginx.
|
||||||
|
* Database - PostgreSQL.
|
||||||
|
* Slave DB replica - by PostgreSQL native WAL technology.
|
||||||
|
* DB schema migrations - Alembic.
|
||||||
|
* Analytics engine - Elasticsearch
|
||||||
|
|
||||||
|
Stage 1
|
||||||
|
^^^^^^^
|
||||||
|
|
||||||
|
All logic should be covered by unittests.
|
||||||
|
|
||||||
|
* Configure uWSGI + Nginx + DB. Run simple WSGI application in collector
|
||||||
|
* Add JSON Schema support and validation of test request/response
|
||||||
|
* Initiate implementation of puppet manifests for service deployment,
|
||||||
|
DBs backup
|
||||||
|
* Check deployment of collector and analytics, when deployment is ready
|
||||||
|
* Implement part of collector API and initiate testing and load testing
|
||||||
|
of it by QA team
|
||||||
|
* Initiate implementation of enabling sending statistics and viewing
|
||||||
|
`statistic data`_
|
||||||
|
* Implement saving action logs in Nailgun
|
||||||
|
* Implement sending statistics to collector from Nailgun
|
||||||
|
* Initiate Nailgun testing by QA
|
||||||
|
* Implement logic enough for switching to implementation of analytics service
|
||||||
|
* Implement part of analytics API
|
||||||
|
* Implement data migration from PostgreSQL to Elasticsearch
|
||||||
|
* Initiate analytics UI implementation
|
||||||
|
* Implement full analytics API, collector API
|
||||||
|
* Testing, fixing
|
||||||
|
* Deploy DB, collector, analytics to servers
|
||||||
|
* Add services and servers to the monitoring of IT infrastructure
|
||||||
|
* First release is done
|
||||||
|
|
||||||
|
Limitations of the first release:
|
||||||
|
|
||||||
|
* No authentication
|
||||||
|
* No replication of collector DB
|
||||||
|
* No backup of DB
|
||||||
|
* Heavy analytics reports are not handled
|
||||||
|
* Only commercial instance is implemented (access to the analytics UI and
|
||||||
|
REST API is limited to Mirantis network)
|
||||||
|
* No OSTF statistics
|
||||||
|
* No action logs viewing in the Nailgun UI
|
||||||
|
* No immediate failure reports to the support team
|
||||||
|
* No plugins statistics
|
||||||
|
|
||||||
|
|
||||||
|
Stage 2
|
||||||
|
^^^^^^^
|
||||||
|
|
||||||
|
* Community instance is implemented
|
||||||
|
* Improve analytics reports and analytics UI
|
||||||
|
* Action logs viewing in the Nailgun UI
|
||||||
|
* Collecting OpenStack statistics
|
||||||
|
|
||||||
|
Stage 3
|
||||||
|
^^^^^^^
|
||||||
|
|
||||||
|
* Handle collector DB backup
|
||||||
|
* Handle collector DB replication
|
||||||
|
* Sending OSTF and plugins statistic
|
||||||
|
* Improve analytics reports and analytics UI
|
||||||
|
* Immediate failure reports to the support team
|
||||||
|
|
||||||
|
|
||||||
|
Stage 4
|
||||||
|
^^^^^^^
|
||||||
|
|
||||||
|
* Handle authentication
|
||||||
|
* Handle analytics DB backup
|
||||||
|
* Improve analytics reports and analytics UI
|
||||||
|
|
||||||
|
Stage 5
|
||||||
|
^^^^^^^
|
||||||
|
|
||||||
|
* Handle heavy analytics reports
|
||||||
|
* Handle data partitioning (if required)
|
||||||
|
* Improve analytics reports and analytics UI
|
||||||
|
|
||||||
|
Dependencies
|
||||||
|
============
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Testing
|
||||||
|
=======
|
||||||
|
|
||||||
|
We require those tests:
|
||||||
|
|
||||||
|
* APIs integration testing
|
||||||
|
* APIs load testing
|
||||||
|
* UI functional testing
|
||||||
|
|
||||||
|
Documentation Impact
|
||||||
|
====================
|
||||||
|
|
||||||
|
Option for enabling sending, and `statistic data`_ details
|
||||||
|
should be documented.
|
||||||
|
|
||||||
|
Collector API will be documented by JSON Schemas (probably by sphinx).
|
||||||
|
|
||||||
|
Analytics reports and analytics UI should be documented.
|
||||||
|
|
||||||
|
References
|
||||||
|
==========
|
||||||
|
|
||||||
|
None
|
@ -82,6 +82,8 @@ class TestTitles(testtools.TestCase):
|
|||||||
|
|
||||||
def test_template(self):
|
def test_template(self):
|
||||||
files = ['specs/template.rst'] + glob.glob('specs/*/*')
|
files = ['specs/template.rst'] + glob.glob('specs/*/*')
|
||||||
|
# filtering images subdirectory
|
||||||
|
files = filter(lambda x: 'images' not in x, files)
|
||||||
for filename in files:
|
for filename in files:
|
||||||
self.assertTrue(filename.endswith(".rst"),
|
self.assertTrue(filename.endswith(".rst"),
|
||||||
"spec's file must uses 'rst' extension.")
|
"spec's file must uses 'rst' extension.")
|
||||||
|
Loading…
Reference in New Issue
Block a user