..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=============================================
Use uuids in services and os-hypervisors APIs
=============================================

To work with services and hypervisors (compute nodes) in the compute REST API
we currently expose and take primary key IDs. In a multi-cell deployment, these
IDs are not unique. This spec proposes exposing a uuid for services and
hypervisors in the REST API to uniquely identify a resource regardless of which
cell it is in.

Problem description
===================

We currently leak database id fields (primary keys) out of the compute REST API
for services and compute_nodes which are all in a cell database (the 'nova'
database in a cells v2 deployment). These are in the `os-services` and
`os-hypervisors` APIs, respectively.

For example, to delete a service record, you must issue a DELETE request to
``/os-services/{service_id}`` to delete the service record with that id.

The `os-hypervisors` API exposes the id in GET (index) requests and uses it in
the "show" and "uptime" methods to look up the ComputeNode object by that id.

This is ugly but functional in a single-cell deployment. However, in a
multi-cell deployment, we have no context on which cell we should query to get
service/node details from, since you could have multiple cells each with a
nova-compute service and compute node with id 1, so which cell do you pick to
delete the service or show details about the hypervisor?

Use Cases
---------

As a cloud administrator, I want to uniquely identify the resources in my
cloud regardless of which cell they are in and be able to get details about
and delete them.

Proposed change
===============

This blueprint proposes to add a microversion to the compute REST API which
replaces the usage of the id field with a uuid field.
The uuid would be returned instead of the id in GET responses and also taken as
input for the id in CRUD APIs.

Then when a request to delete a service is made, if the uuid is provided we can
simply iterate cells until we find the service, or error with a 404.

Before the microversion, if an id is passed and there is only one cell, or no
duplicates in multiple cells, we will continue to honor the request. But if an
id is passed on the request (before the microversion) and we cannot uniquely
identify the record out of multiple cells, we error with a 400. This is similar
to how creating a server works when a network or port is not provided and
there are multiple networks available to the project: we fail with a 400
"NetworkAmbiguous" error.

The compute_nodes table already has a uuid field. The services table, however,
does not, so as part of this blueprint we will need to add a uuid column to
that table and corresponding versioned object.

Alternatives
------------

An alternative to exposing just the basic uuid and using it to iterate over
potentially multiple cells until we find a match is to encode the cell uuid in
the resource uuid. For example, we could simply return
``{cell_uuid}-{resource_uuid}``. Then rather than iterating all cells to find
the resource, we could decode the input uuid to get the cell we need. This is
not a recommended alternative because it encodes the cell in the REST API which
is something we have said in the past we did not want to do, and is similar to
how cells v1 does namespacing on cells. It would also mean that parts of the
compute API are encoding a cell uuid and others, like the `servers` API, are
not. This could lead to maintenance issues in the actual code since we would
have different lookup operations for different resources.

Another alternative is creating mapping tables in the Nova API database, like
the ``host_mappings`` and ``instance_mappings`` tables.
This alternative is not recommended, at least not at this time, because the
need for working with service records should be relatively small.

Data model impact
-----------------

The `services` table in the cell (nova) database will have a nullable uuid
column added. The column will be nullable due to existing records which do not
have the uuid field. We can migrate the data on access through the versioned
object, and/or provide online data migrations to add uuids to existing records
during an upgrade.

REST API impact
---------------

os-hypervisors
~~~~~~~~~~~~~~

There are only ``GET`` methods in this API. They will all be changed to return
the uuid value for the `id` field and take as input a uuid value for the
``{hypervisor_id}``.

We cannot use the `query parameter validation`_ added in Ocata to validate that
the ID passed in is a uuid since it is not a query parameter. Therefore, we
will need to validate in code that the input `id` value is a uuid.

The following APIs will also be changed::

  * GET /os-hypervisors/{hypervisor_hostname_pattern}/search
  * GET /os-hypervisors/{hypervisor_hostname_pattern}/servers

Both of those APIs return a list of matches given the hostname search pattern.
While not directly needed for the problem stated in this spec, we will take the
opportunity of the microversion change in this API to make these better. The
`hypervisor_hostname_pattern` will change to a query parameter.

* Old: GET /os-hypervisors/{hypervisor_hostname_pattern}/search
* New: GET /os-hypervisors?hypervisor_hostname=xxx

  Example request::

    GET /os-hypervisors?hypervisor_hostname=london1.compute

  Example response::

    {
        "hypervisors": [
            {
                "hypervisor_hostname": "london1.compute.1",
                "id": "37c62dfd-105f-40c2-a749-0bd1c756e8ff",
                "state": "up",
                "status": "enabled"
            }
        ]
    }

* Old: GET /os-hypervisors/{hypervisor_hostname_pattern}/servers
* New: GET /os-hypervisors?hypervisor_hostname=xxx&with_servers=true

  Example request::

    GET /os-hypervisors?hypervisor_hostname=london1.compute&with_servers=true

  Example response::

    {
        "hypervisors": [
            {
                "hypervisor_hostname": "london1.compute.1",
                "id": "37c62dfd-105f-40c2-a749-0bd1c756e8ff",
                "state": "up",
                "status": "enabled",
                "servers": [
                    {
                        "name": "test_server1",
                        "uuid": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
                    },
                    {
                        "name": "test_server2",
                        "uuid": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"
                    }
                ]
            }
        ]
    }

.. _query parameter validation: https://specs.openstack.org/openstack/nova-specs/specs/ocata/implemented/consistent-query-parameters-validation.html

os-services
~~~~~~~~~~~

The following API methods which take as input and/or return the integer primary
key id in the response will be updated to take/return a uuid::

  * GET /os-services
  * DELETE /os-services/{service_id}

For example:

**GET /os-services**

Response::

    {
        "services": [
            {
                "id": "8e6e4ab6-0662-4ff5-8994-dde92bedada1",
                "binary": "nova-scheduler",
                "disabled_reason": "test1",
                "host": "host1",
                "state": "up",
                "status": "disabled",
                "updated_at": "2012-10-29T13:42:02.000000",
                "forced_down": false,
                "zone": "internal"
            },
            {
                "id": "3fe90b52-1d67-4f03-9ed3-5fbf1a6fa1e1",
                "binary": "nova-compute",
                "disabled_reason": "test2",
                "host": "host1",
                "state": "up",
                "status": "disabled",
                "updated_at": "2012-10-29T13:42:05.000000",
                "forced_down": false,
                "zone": "nova"
            }
        ]
    }

**DELETE /os-services/3fe90b52-1d67-4f03-9ed3-5fbf1a6fa1e1**

There is no response for a successful delete operation.
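
The lookup behind a request such as the DELETE above might be sketched in
Python as follows. This is only an illustration under stated assumptions, not
Nova code: ``cells`` is a hypothetical in-memory mapping of cell name to
service records, and ``HTTPError``, ``is_uuid`` and ``find_service`` are
hypothetical names introduced for this example. The uuid check uses the
standard library ``uuid`` module rather than any Nova helper.

```python
import uuid


class HTTPError(Exception):
    """Hypothetical stand-in for the API layer's HTTP error responses."""

    def __init__(self, status, message):
        super().__init__(message)
        self.status = status


def is_uuid(value):
    """Return True if value is a canonical, hyphenated uuid string."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (AttributeError, TypeError, ValueError):
        return False


def find_service(cells, service_id, uuid_microversion):
    """Find one service record across all cells.

    cells is an assumed mapping of cell name to a list of service dicts,
    each with an integer "id" and a string "uuid".
    """
    if uuid_microversion:
        # With the new microversion the id must be a uuid (400 otherwise).
        if not is_uuid(service_id):
            raise HTTPError(400, "service id must be a uuid")
        wanted = service_id.lower()
        # Iterate cells until the service is found, or fail with a 404.
        for services in cells.values():
            for service in services:
                if service["uuid"] == wanted:
                    return service
        raise HTTPError(404, "service not found")

    # Before the microversion: honor an integer id only if it is unique.
    matches = [
        service
        for services in cells.values()
        for service in services
        if service["id"] == service_id
    ]
    if not matches:
        raise HTTPError(404, "service not found")
    if len(matches) > 1:
        raise HTTPError(400, "service id is ambiguous across cells")
    return matches[0]
```

With two cells that each hold a service whose integer id is 1, a
pre-microversion lookup by id 1 raises the 400 error in this sketch, while a
lookup by uuid with the new microversion returns exactly one record.
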

The **action** APIs do not take an id to identify the service on which to
perform an action. These include::

  * PUT /os-services/disable
  * PUT /os-services/disable-log-reason
  * PUT /os-services/enable
  * PUT /os-services/force-down

Unlike the ``/servers/{server_id}/action`` APIs which take the action in the
request body, these APIs do not take a specific service id. The request body
contains a ``host`` and ``binary`` field to identify the service.

As part of this microversion, we will collapse those action APIs into a single
PUT method which supports all of the actions and takes a ``service_id`` as
input to uniquely identify the service rather than a body with the ``host`` and
``binary`` fields.

What follows are examples of the old and new formats for each action API.

* PUT /os-services/disable

  Old request::

    PUT /os-services/disable
    {
        "host": "host1",
        "binary": "nova-compute"
    }

  New request::

    PUT /os-services/{service_id}
    {
        "status": "disabled"
    }

* PUT /os-services/disable-log-reason

  Old request::

    PUT /os-services/disable-log-reason
    {
        "host": "host1",
        "binary": "nova-compute",
        "disabled_reason": "test2"
    }

  New request::

    PUT /os-services/{service_id}
    {
        "status": "disabled",
        "disabled_reason": "test2"
    }

* PUT /os-services/enable

  Old request::

    PUT /os-services/enable
    {
        "host": "host1",
        "binary": "nova-compute"
    }

  New request::

    PUT /os-services/{service_id}
    {
        "status": "enabled"
    }

* PUT /os-services/force-down

  Old request::

    PUT /os-services/force-down
    {
        "host": "host1",
        "binary": "nova-compute",
        "forced_down": true
    }

  New request::

    PUT /os-services/{service_id}
    {
        "forced_down": true
    }

We will also provide a full response for the PUT method now.
For example:

* PUT /os-services/disable-log-reason

  Old response::

    {
        "service": {
            "binary": "nova-compute",
            "disabled_reason": "test2",
            "host": "host1",
            "status": "disabled"
        }
    }

  New response::

    {
        "service": {
            "id": "ade63841-f3e4-47de-840f-815322afa569",
            "binary": "nova-compute",
            "disabled_reason": "test2",
            "host": "host1",
            "state": "up",
            "status": "disabled",
            "updated_at": "2012-10-29T13:42:05.000000",
            "forced_down": false,
            "zone": "nova"
        }
    }

Security impact
---------------

None

Notifications impact
--------------------

Services
~~~~~~~~

The ``service.update`` versioned notification payload will be updated to
include the new uuid field.

Hosts
~~~~~

There are legacy unversioned notifications for actions on a compute node, such
as ``HostAPI.set_enabled.start``. These are not converted to using versioned
notifications yet, so until they are, there are no changes needed.

Other end user impact
---------------------

Since the REST API changes do not change the 'id' key in the response, only the
value, there should not need to be any changes in python-novaclient.

Performance Impact
------------------

None. Since we do not have a mapping table for services in the nova_api
database, we already have to iterate cells looking for a match, as seen in this
change: https://review.openstack.org/#/c/442162/

Other deployer impact
---------------------

Once deployers have multiple cells, they may have to update tooling to specify
the microversion to uniquely identify hypervisors or services, for example, to
delete a service.

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Matt Riedemann (mriedem)

Other contributors:
  Dan Peschman (dpeschman)

Work Items
----------

* Write a database schema migration to add the services.uuid column.
* Add the uuid field to the Service object.
* Generate a uuid for new services if not specified during create().
* Generate and save a uuid for old services upon retrieval from the database,
  like when compute nodes got a uuid [1]_.
* Add `get_by_uuid` methods to the ComputeNode and Service objects.
* Add an online data migration for service uuids like what we had for compute
  nodes [2]_.
* Update the ``nova.compute.api.HostAPI`` methods which take an ID to check if
  the ID is a uuid and, if so, query for the resource using the `get_by_uuid`
  method on the object, otherwise use `get_by_id` as today.
* Add the microversion to the `os-hypervisors` and `os-services` APIs including
  validation to ensure the incoming id is a uuid. This also includes changing
  the request format of the `os-services` PUT method. This is likely going to
  be a large and relatively complicated change to review, but given all of
  these changes are going to be in the same microversion we cannot
  realistically break these changes up.
* Update the compute API response schema validation for hypervisors [3]_ and
  services [4]_. Note that the Tempest response schema already allows for
  integers or strings. As part of this change, we should update the response
  schema validation in Tempest to be strict that the hypervisor and service id
  should be a uuid after this new microversion.

Dependencies
============

None

Testing
=======

* Unit tests for negative scenarios, like not being able to find a service by
  uuid in multiple cells. We should also test passing a non-uuid integer value
  to the changed APIs with the new microversion to ensure the uuid validation
  makes that request fail with a 400 error.
* Functional testing for API samples to ensure the 'id' value in a response
  after the microversion is a uuid and not an integer.
* Tempest API tests *may* be added, although we can probably handle that same
  test coverage with in-tree functional tests.
* We will have to test all of the `os-services` PUT method changes with in-tree
  functional tests because Tempest does not test disabling or forcing down a
  compute service since that would break a concurrent multi-tenant Tempest run.

Documentation Impact
====================

The `os-services`_ and `os-hypervisors`_ API reference docs will need to be
updated to note the new microversion takes as input and returns in the response
a uuid value for the 'id' key.

.. _os-services: https://developer.openstack.org/api-ref/compute/#compute-services-os-services
.. _os-hypervisors: https://developer.openstack.org/api-ref/compute/#hypervisors-os-hypervisors

References
==========

.. [1] https://github.com/openstack/nova/blob/13.0.0/nova/objects/compute_node.py#L243
.. [2] https://github.com/openstack/nova/blob/13.0.0/nova/db/sqlalchemy/api.py#L6436
.. [3] https://github.com/openstack/tempest/blob/15.0.0/tempest/lib/api_schema/response/compute/v2_1/hypervisors.py#L68
.. [4] https://github.com/openstack/tempest/blob/15.0.0/tempest/lib/api_schema/response/compute/v2_1/services.py#L27

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Pike
     - Introduced