Create spec for optional subunit data import

Re: primary assignee: I know Chris had mentioned he wanted me
working on this; primary assignee can be changed if somebody
else is going to be the main actor in this change.

This is very much a work in progress. Some topics currently
being researched/up for debate:
* Is subunit2sql's input sanitization adequate for our needs,
  where protection against malicious file upload is concerned?
* Is there a way to store the data in subunit2sql's db schema
  in the RefStack db or vice versa?

Sections that I will be adding more detail to shortly:
* REST API impact
* Other deployer impact

Change-Id: I03e212694d7d7a586f1caabedbebd7da6a0f38b9
This commit is contained in:
Megan Guiney 2017-07-03 20:12:32 -07:00
parent 454a40b329
commit 2832d3a56f
1 changed files with 315 additions and 0 deletions

View File

@ -0,0 +1,315 @@
=============================================
Upload Subunit Data From Test Results
=============================================
Launchpad blueprint:
* https://blueprints.launchpad.net/refstack/+spec/subunit-data-upload
This specification describes an expansion of the RefStack API's
functionality to allow for upload of the subunit data which corresponds
to a given set of test results.
Problem description
===================
Currently, all test results uploaded to the RefStack website consists
of a json file containing only the portion of the RefStack run pertaining
to the passed tests. This limitation dates back to the the start of the
RefStack project. At that time, Defcore (which is now known as interop-WG)
was very concerned about the possibility that private data may be included
in the subunit upload file. Defcore was concerned that vendors might, for
that reason, be hesistant to upload data into RefStack for fear of
unintentionally revealing vendor-specific data such as reasons for test
failures. For this reason, Defcore agreed unanimously that RefStack should
care only about passing tests, and not failed or skipped ones.
The risk, with this resolution, however, is that not including a full set of
results means that it would be fairly simple to falsify those results in
order to make an OpenStack instance appear to be more interoperable than
it actually is. This too, was discussed at the time, and Defcore eventually
arrived at the conclusion that, in the end, it would be better to accept
vendor results in good faith, but to always leave the door open for users
and Foundation staff to verify those results independently. This decision
did not, however, account for the possibility that vendors seeking support
during the process of verifying the interoperability of their product may
need a way to securely share subunit data for review by Foundation staff.
Proposed change
===============
In order to move towards having a more reliable and verifiable collection
of RefStack results, we intend to add functionality to the RefStack
toolkit that will allow for the upload of the subunit data related to a given
set of test results. This data would be private, only accessible to the party
uploading it, and to foundation staff, to be used for result integrity
verification as well as debugging assistance. Upload of subunit data will not,
for the time being, be required.
After discussing a number of data storage methods at the 7/18/17 RefStack
meeting[3], we settled upon saving the newly usable subunit data in the
current database. With a small adjustment to our alembic settings (which
will keep the version tables from colliding), this could be done using the
existing subunit2sql toolkit[4]. In order to apply the table name change,
we will build in a series of functions that check refstack.conf and rename
the existing alembic version table if needed. This added functionality,
when merged and functional, will make RefStack one of only two OpenStack
projects (according to oslo.conf docs[7])that is currently capable of
modifying configuration at runtime without a service restart. The usage of
subunit2sql will do a lot of the heavy lifting for us, as far as data import
goes, as well as keeping the storage method of test data consistent across
the board.
For the time being, we plan to link the subunit data will be linked to the
corresponding test results via a key value pair in the metadata table that
is an existing part of the RefStack database.
Toolset to use:
subunit2sql
Alternatives
------------
Though we did eventually decide upon storing the new data in a new, separate
database, a few alternate options were discussed during the 7/18/17 RefStack
meeting[3]. The alternate options discussed were:
* Save subunit files as-is in a file system. This has the benefit of being the
least processing-intensive option for saving the data, as it would literally
just save the output into a file. It may, however, make subunit data upload
a bit less elegant, as well as being a deviation from the way test run data
is managed throughout the rest of RefStack.
* Save subunit data in the RefStack database and tables by building in the
functionality required to save and manage it. Like the option listed above,
this option keeps test run data stored consistently across refstack, which
would make the changes to the API more consistent as well. It would also
avoid the overhead that would result from using a separate database, as well
as any redundancies that resulted from using a second, separate database.
However, any redundancy would be fairly minor due to the extremely limited
scope of the data we are currently storing from each test run, and this would
leave more of the implementation up to us, which, because of how well
subunit2sql's schema fulfills the needs of this change, may be wholly
unnecessary.
* Save subunit data in a separate database created by subunit2sql. This has the
benefit of having all of the functionality we need without forcing us to
reinvent the wheel, but it also carries with it the overhead of having to use
a second database. This option doesn't make much sense, however, given that
we can actually use subunit2sql's toolkit in the current refstack database,
as long as we can configure the database to use an extra (differently named)
alembic version table for refstack's core db.
Data model impact
-----------------
We may be able to use the tables created by subunit2sql within the RefStack
database. These tables (for reference) are mapped out below:
--------------------------------------
| tests |
--------------------------------------
| id | String(256) |
| test_id | String(256) |
| run_count | Integer |
| failure | Integer |
| run_time | Float |
--------------------------------------
----------------------------------------
| runs |
----------------------------------------
| id | BigInteger |
| skips | Integer |
| fails | Integer |
| passes | Integer |
| run_time | Float |
| artifacts | Text |
| run_at | DateTime |
----------------------------------------
---------------------------------------------------
| test_runs |
---------------------------------------------------
| id | BigInteger |
| test_id | BigInteger |
| run_id | BigInteger |
| status | String(256) |
| start_time | DateTime |
| start_time_microseconds | Integer |
| stop_time | DateTime |
| stop_time_microseconds | Integer |
| test | Test |
| run | Run |
---------------------------------------------------
-------------------------------------------
| run_metadata |
-------------------------------------------
| id | BigInteger |
| key | String(255) |
| value | String(255) |
| run_id | BigInteger |
| run | Run |
-------------------------------------------
-------------------------------------------
| test_run_metadata |
-------------------------------------------
| id | BigInteger |
| key | String(255) |
| value | String(255) |
| test_run_id | BigInteger |
| test_run | TestRun |
-------------------------------------------
-------------------------------------------
| test_metadata |
-------------------------------------------
| id | BigInteger |
| key | String(255) |
| value | String(255) |
| test_id | BigInteger |
| test | Test |
-------------------------------------------
-------------------------------------------
| attachments |
-------------------------------------------
| id | BigInteger |
| test_run_id | BigInteger |
| label | String(255) |
| attachment | LargeBinary |
| test_run | TestRun |
-------------------------------------------
more details about this data model can be found in the source docs for
subunit2sql[5]
If we end up being unable to integrate the two databases into one at this time,
we plan to use the metadata table which already exists in the RefStack internal
db to store a key pair that links the existing test data to the newly added
subunit data.
REST API impact
---------------
We will need to implement a new REST API for the upload of subunit data
from the client, and then use subunit2sql to process and save the data
into the database.
Security impact
---------------
It has been suggested that uploading the subunit data for tests may expose
private data. However, it was determined in the 6/27/2017 RefStack meeting[1]
that if any such data is revealed through this upload, it would be due to a
leak in tempest's logging procedures, not the upload of this new type of data.
This was also discussed at the 6/28/17 Interop-wg meeting[2]. It was at this
meeting that was confirmed that we would implement this change using an
opt-in flag, so that those who are still concerned about the security of
uploading the results do not, by default, have to upload their data. It was
also determined that, due to the fact that this design reflects a fairly
significant reversal in a past decision, that the community should be
properly notified. This decision also resulted in the following action plan:
1. write an email to distribute to the mailing list
2. send out the official decision after the email is distributed
3. change the offical interop docs to reflect this change
Another concern was that a database injection attack may be possible, if an
attacker were to use maliciously crafted subunit data. This threat, also,
does not appear to be much of a danger, as the mass majority of the data
written to the database is done after the subunit data is processed, meaning
that there are very few places in which raw strings are written into the db.
We need to look a little more into whether sql does enough input sanitization
for our needs.
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance impact
-------------------
None
Other deployer impact
---------------------
We will also need to adjust refstack-client to be able to consume the new API
feature while uploading subunit data.
One of the most user-visible part of this change would be the creation of a
flag option which enables the upload of the subunit data to the refstack site,
which would modify the existing procedure in that we would need to build in
functionality that would allow for the additional data upload.
We would also need to add a second flag to the database sync functionality in
order to allow for the alternate naming of the alembic version table, which
enables us to use both subunit2sql and refstack tables and functionality
within the same database.
Developer impact
----------------
None
Implementation
==============
Assignees(s)
------------
Primary assignee:
Megan Guiney
Other contributors:
Paul Van Eck (subunit data upload ui in refstack-client)
Work Items
----------
* Add a CONF option to allow for the usage of nonstandard alembic
version table names.
* Add a utility that allows for the runtime checking and alteration
of alembic version table names.
* Create an API at the server side to accept the subunit data
* At the server side, use subunit2sql to process the subunit data
* Link subunit data to existing set of refstack results.
* Create UI to upload subunit data (completed, as of 1/20/2016[6],
though may require update)
* Create a UI to display subunit data. There may already be one, but
we need to make sure such a utility exists. We also need to decide
whether the results should be viewable via the refstack website.
Dependencies
============
Testing
=======
Documentation Impact
====================
We will need to update the docs to reflect the additions to the API, the
database, and to refstack-client as well.
References
==========
[1] http://eavesdrop.openstack.org/meetings/refstack/2017/refstack.
2017-06-27-19.00.log.html
[2] http://eavesdrop.openstack.org/meetings/interopwg/2017/interopwg.
2017-06-28-16.00.log.html
[3] http://eavesdrop.openstack.org/meetings/refstack/2017/refstack.
2017-07-18-19.00.log.html
[4] https://git.openstack.org/cgit/openstack-infra/subunit2sql
[5] https://docs.openstack.org/subunit2sql/latest/data_model.html
[6] https://review.openstack.org/#/c/265394/
[7] https://docs.openstack.org/oslo.config/latest/configuration/
mutable.html