Robustify Compute Node Hostnames backlog spec

This is mostly a brain-dump spec on the topic of compute hosts and how
fragile they are in terms of hostname handling. There has long been a
requirement that computes can NEVER change hostnames, and we have few
tools to even detect the problem before we corrupt the database if it
happens. Here I have documented some of the things we could do to make
that more robust, should we choose to do so. This is based on a recent
near-catastrophe and thus reflects things that would have avoided pain
in a real scenario.

Per discussion at the PTG, I am adding this as a backlog spec, to be
an overarching guide for multiple smaller specs to provide more
detailed progress towards the goals described here.

Change-Id: I72fa3f605cfcf7c3dd0ff4c791be7df8f19f058b

@@ -24,4 +24,8 @@ Template:

 Approved (but not implemented) backlog specs:

-None currently
+.. toctree::
+   :glob:
+   :maxdepth: 1
+
+   approved/*

317 specs/backlog/approved/robustify-compute-hostnames.rst Normal file

@@ -0,0 +1,317 @@

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

========================================
Robustify Compute Node Hostname Handling
========================================

Include the URL of your launchpad blueprint:

https://blueprints.launchpad.net/nova/+spec/example

Nova has long had a dependency on an unchanging hostname on the
compute nodes. This spec aims to address this limitation, at least
from the perspective of detecting an accidental change and avoiding
the database catastrophe that can currently result from a hostname
change, whether intentional or not.

Problem description
===================

Currently nova uses the hostname of the compute (specifically
``CONF.host``) for a variety of things:

#. As the routing key for communicating with a compute node over RPC
#. As the link between the instance, service and compute node objects
   in the database
#. For neutron to bind ports to the proper hostname (and in some
   cases, it must match the equivalent setting in the neutron agent
   config)
#. For cinder to export a volume to the proper host
#. As the resource provider name in placement (this actually comes
   from libvirt's notion of the hostname, not ``CONF.host``)
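
Most of these uses key off a single configuration value whose default
follows the system hostname. A simplified sketch of the option
definition (not nova's literal code, but the default is the important
part):

.. code-block:: python

    import socket

    from oslo_config import cfg

    # Simplified sketch: the real option definition lives in nova's
    # config modules and carries more metadata. The key point is that
    # the default tracks whatever the system hostname happens to be.
    opts = [
        cfg.StrOpt('host',
                   default=socket.gethostname(),
                   help='Hostname used as the RPC routing key and as '
                        'the database link for this compute host'),
    ]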

If the hostname of the compute node changes, all of these links
break. Upon starting the compute node with the changed name, we will
be unable to find a matching ``nova-compute`` ``Service`` record in
the database, and will create a new one. After that, we will fail to
find the matching ``ComputeNode`` record and create a new one of
those, with a new UUID. Instances that refer to the old compute and
service records will no longer be associated with the running host,
and thus become unmanageable through the API. Further, new instances
created on the compute node after the rename will be able to claim
resources that have been promised to the orphaned instances (such as
PCI devices and VCPUs), because the tracking of those resources is
associated with the old compute node record.

If the orphaned instances are relatively static, the first indication
that something has gone wrong may come long after the actual rename,
at which point reality has forked: there are instances running on one
compute node that refer to two different compute node records, and
are thus accounted for in two separate locations.

Further, neutron, cinder, and placement resources will have the old
information for pre-rename instances and new information for
instances created after the rename, which requires
reconciliation. This situation may also prevent restarting old
instances if the old hostname is no longer reachable.

Use Cases
---------

* As an operator, I want to make sure my database does not get
  corrupted due to a temporary or permanent DNS change or outage.
* As an operator, I may need to change the name of a compute node as
  my network evolves over many years.
* As a deployment tool writer, I want to make sure that changes in
  tooling and libraries never cause data loss or database corruption.

Proposed change
===============

There are multiple things we can do here to robustify Nova's handling
of this data. Each one increases safety, but we do not have to do all
of them.

Ensure a stable compute node UUID
---------------------------------

For non-Ironic virt drivers, whenever we generate a compute node
UUID, we should write it to a file on the local disk. Whenever we
start, we should look for that UUID file and use its contents, and
under no circumstances should we generate another one. To allow
deployment tools to pre-generate the UUID, we should honor an
existing file even on a first start, using its contents when creating
the ComputeNode record in the database.
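
A minimal sketch of that read-or-create behavior, assuming a
hypothetical ``/var/lib/nova/compute_id`` location (the real path and
integration point are implementation details):

.. code-block:: python

    import os
    import uuid

    # Hypothetical location; a deployment tool may pre-create this file.
    COMPUTE_ID_FILE = '/var/lib/nova/compute_id'

    def get_or_create_node_uuid():
        """Return the persistent compute node UUID for this host.

        If the file exists (written by us previously, or pre-generated
        by a deployment tool), its contents always win; we never
        generate a second UUID for the same host.
        """
        if os.path.exists(COMPUTE_ID_FILE):
            with open(COMPUTE_ID_FILE) as f:
                return str(uuid.UUID(f.read().strip()))
        node_uuid = str(uuid.uuid4())
        with open(COMPUTE_ID_FILE, 'w') as f:
            f.write(node_uuid)
        return node_uuid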

We would put the actual lookup of the compute node UUID in the
``get_available_nodes()`` method of the virt driver (or create a new
UUID-specific one). Ironic would override this with its current
implementation, which returns UUIDs based on the state of Ironic and
the hash ring. Thus only non-Ironic computes would read and write the
persistent UUID file.
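
As an illustration, a new UUID-specific driver hook might look like
the following, building on the earlier sketch; the method name and
return shape are assumptions, not a settled interface:

.. code-block:: python

    from nova.virt import driver

    class LibvirtDriver(driver.ComputeDriver):
        # Hypothetical UUID-specific hook; illustrative only.
        def get_available_node_uuids(self, refresh=False):
            # Single-host drivers return the one persistent UUID from
            # local disk (see get_or_create_node_uuid() above),
            # regardless of the current system hostname. Ironic would
            # instead override this to return UUIDs from its node
            # inventory and hash ring.
            return [get_or_create_node_uuid()]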

Single-host virt drivers like libvirt would be able to tolerate a
system hostname change, updating ``ComputeNode.hypervisor_hostname``
without breaking things.

Link ComputeNode records with Service records by id
---------------------------------------------------

Currently the ComputeNode and Service records are associated in the
database purely by the hostname string. This means that they can
become disassociated, and it is also not ideal from a performance
standpoint. Some other data structures are already linked against
ComputeNode by id, and thus do not automatically re-associate just
because a hostname happens to match.
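
As a sketch of what re-establishing the linkage could look like at
the model level (an abbreviated, self-contained SQLAlchemy example;
the column names are assumptions, not the final schema):

.. code-block:: python

    import sqlalchemy as sa
    from sqlalchemy.ext import declarative

    BASE = declarative.declarative_base()

    class ComputeNode(BASE):
        """Abbreviated model showing only the proposed linkage."""
        __tablename__ = 'compute_nodes'
        id = sa.Column(sa.Integer, primary_key=True)
        uuid = sa.Column(sa.String(36), unique=True)
        # Tight id-based binding to the service record, replacing the
        # string matching on the hostname column.
        service_id = sa.Column(sa.Integer,
                               sa.ForeignKey('services.id'),
                               nullable=True)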

This relationship used to exist, but was `removed`_ in the Kilo
timeframe. I believe this was due to the desire to make the process
less focused on the service object and more on the compute node
(potentially because of Ironic), although the breaking of that tight
relationship has serious downsides as well. I think we can keep the
tight binding for single-host computes where it makes sense.

At startup, ``nova-compute`` should resolve its ComputeNode object
via the persistent UUID, find the associated Service, and fail to
start if the hostname does not match ``CONF.host``. Since the
hostname is used by external services, we should not just "fix it"
silently, as those other links would be broken as well. Refusing to
start at least avoids opening the window for silent data corruption.
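
A minimal sketch of that startup check; the object and exception
names mirror existing nova conventions, but the exact calls here are
illustrative:

.. code-block:: python

    from nova import conf
    from nova import exception
    from nova import objects

    CONF = conf.CONF

    def assert_hostname_unchanged(context, node_uuid):
        """Abort startup if our record belongs to a different host."""
        node = objects.ComputeNode.get_by_uuid(context, node_uuid)
        if node.host != CONF.host:
            raise exception.InvalidConfiguration(
                'Compute node %s is bound to host %s but CONF.host is '
                '%s; refusing to start to avoid corrupting related '
                'records' % (node_uuid, node.host, CONF.host))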

Link Instances to a ComputeNode by id
-------------------------------------

Currently instance records are linked to their Service and
ComputeNode objects purely by hostname. We should link them to a
ComputeNode by its id. Since we need the Service in order to get the
RPC routing key, or for hostname resolution when talking to external
services, we should find it via the Instance->ComputeNode->Service id
relationship.

We already link PCI allocations for instances to the compute node by
id, even though the instance itself is linked via hostname. This
discrepancy makes it easy to get one out of sync with the other.

Potential changes in the future
-------------------------------

If the above changes are made, we open ourselves to the future
possibility of supporting:

#. Renaming service objects through the API if a compute host really
   needs to have its hostname changed. This will require changes to
   the other services at the same time, but nova would at least have a
   single source of truth for the hostname, making it feasible.
#. If we do all of this, Nova could potentially be confident enough
   that a rename is intentional that it could update port bindings,
   cinder volume attachments, and placement resources to make it
   seamless.
#. Moving to the use of the service UUID as the RPC routing key, if
   desired.
#. Dropping quite a few duplicate string fields from our database.

Alternatives
------------

We can always do nothing. Compute hostnames have been unchangeable
forever, and the status quo is "don't do that or it will break",
which is certainly something we could continue to rely on.

We could implement part of this (i.e., the persistent ComputeNode
UUID) without the rest of the database changes. This would allow us
to detect the situation and abort, but without (the work required to
get) the benefits of a more robust database schema that could
potentially also support voluntary renames.

Data model impact
-----------------

Most of the impact here is to the data models for Instance,
ComputeNode, and Service. Other models that reference compute
hostnames may also make sense to change (although it is also
reasonable to punt on those entirely, or defer them to a later
phase). Examples:

* Migration
* InstanceFault
* InstanceActionEvent
* TaskLog
* ConsoleAuthToken

Further, host aggregates use the service name for membership.
Migrating those to database IDs is not possible, since IDs overlap
across multiple cells. We could migrate them to UUIDs, or simply
ignore this case and assume that any *actual* rename operation in the
future would involve API operations to fix aggregates (which is
doable, unlike changing the host of things like Instance).

REST API impact
---------------

No specific REST API impact for this, other than the potential for
enabling a mutable Service hostname in the future.

Security impact
---------------

No impact.

Notifications impact
--------------------

No impact.

Other end user impact
---------------------

Not visible to end users.

Performance Impact
------------------

Theoretically, some benefit comes from integer-based linkages between
these objects, which are currently linked by strings. Eventually we
could remove a lot of duplicated strings from our DB schema and
reduce its footprint.

There will definitely be a one-time performance impact due to the
online data migration(s) required to move to the more robust schema.

Other deployer impact
---------------------

This is really all an (eventual) benefit to the deployer.

Developer impact
----------------

There will be some churn in the database models during the
transition. Looking up the hostname of an instance will require
Instance->ComputeNode->Service, but this can probably be hidden with
helpers in the Instance object such that not much has to change in
the actual workflow.
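
For example, one such helper could be a property; this is a sketch
only, and both the property name and the ``compute_node_id`` field it
relies on are assumptions about the proposed schema:

.. code-block:: python

    from nova import objects
    from nova.objects import base

    class Instance(base.NovaObject):
        # Hypothetical helper; 'compute_node_id' is the proposed
        # linkage field and does not exist today.
        @property
        def service_host(self):
            """Resolve the hostname via Instance->ComputeNode->Service."""
            node = objects.ComputeNode.get_by_id(self._context,
                                                 self.compute_node_id)
            service = objects.Service.get_by_id(self._context,
                                                node.service_id)
            return service.host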

Upgrade impact
--------------

There will be some substantial online data migrations required to get
things into the new schema, and the benefits will only be achievable
in a subsequent release, once everything is converted.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  danms

Work Items
----------

* Persist the compute node UUID to disk when we generate it. Read the
  compute node UUID from that location, if it exists, before we look
  to see if we need to generate, create, or find an existing node
  record.
* Change the compute startup procedures to abort if we detect a
  mismatch.
* Make the schema changes to link database models by id. The
  ComputeNode and Service objects/tables still have the id fields,
  which we can re-enable without even needing a schema change on
  those tables.
* Make the data models honor the ID-based linkages, if present.
* Write an online data migration to construct those links on existing
  databases.
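
The last item could follow nova's usual batched online data migration
pattern, where a function processes up to ``max_count`` rows per
invocation and reports progress. A sketch under that assumption, with
illustrative names:

.. code-block:: python

    from nova import objects

    def backfill_computenode_service_links(context, max_count):
        """Backfill ComputeNode.service_id for up to max_count rows.

        Returns (found, done), the contract used by batched
        'nova-manage db online_data_migrations' style migrations.
        """
        # get_unlinked() is an illustrative finder for nodes that
        # still lack a service_id; the real query would live in the
        # DB API layer.
        nodes = objects.ComputeNodeList.get_unlinked(context,
                                                     limit=max_count)
        done = 0
        for node in nodes:
            # Resolve the service via the legacy hostname match one
            # last time, then persist the id-based link.
            service = objects.Service.get_by_compute_host(context,
                                                          node.host)
            node.service_id = service.id
            node.save()
            done += 1
        return len(nodes), done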

Later, there will be work items to:

* Drop the legacy columns
* Potentially implement an actual service rename procedure

Dependencies
============

There should be no external dependencies for the base of this work,
but there is a dependency on the release cycle, which affects how
quickly we can implement this and drop the old way of doing it.

Testing
=======

Unit and functional testing for the actual compute node startup
behavior should be fine. Existing integration testing should ensure
that we have not broken any of the runtime behavior. Grenade jobs
will test the data migration, and we can implement some
``nova-status`` upgrade check items to help validate things in those
upgrade jobs.

Documentation Impact
====================

There will need to be some documentation about the persistent compute
node UUID file for deployers and tool authors. Ideally, the only
visible result of this would be some additional failure modes if the
compute service detects an unexpected rename, so some documentation
of what that looks like and what to do about it would be helpful.

References
==========

TODO(danms): There are probably bugs we can reference about compute
node renames being not possible, or problematic if/when they happen.

.. _removed: https://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/detach-service-from-computenode.html

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Antelope
     - Introduced