This proposes a method to support hard affinity of nodes to conductors based on locality tags. Change-Id: I043c4615394b6dae873fb46a6597e2e8076f21bb Story: 2001795
Conductor/node grouping affinity
https://storyboard.openstack.org/#!/story/2001795
This spec proposes adding a conductor_group property to nodes, which can be matched to one or more conductors configured with a matching conductor_group configuration option, to restrict control of those nodes to these conductors.
Problem description
Today, there is no way to control the conductor-to-node mapping. Such control is desirable for a number of reasons:
- An operator may have an Ironic deployment that spans multiple sites. Without control of this mapping, images may be pulled over WAN links. This causes slower deployments and may be less secure.
- Similarly, an operator may want to map nodes to conductors that are physically closer to them within the same site, to reduce the number of network hops between the node and the conductor. A prime example would be placing a conductor in each rack, so that traffic between the node and its conductor only passes through the top-of-rack switch.
- A deployer may have multiple out-of-band control networks that must be completely isolated from one another. This feature would allow restricting a conductor to a single out-of-band network.
- A deployer may have multiple physical networks that not all conductors are connected to. By configuring the mapping correctly, conductors can manage only the nodes which they can communicate with. This is described further in another RFE.[0]
Proposed change
We propose adding a conductor_group configuration option for the conductor, which is a single arbitrary string specifying some grouping characteristic of the conductor.
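A minimal sketch of how a conductor might read its group from its configuration file, using Python's standard configparser. The [conductor] section name and the load_conductor_group helper are assumptions for illustration; a real service would register the option through its own configuration library.

```python
import configparser

# Hypothetical excerpt of a conductor's configuration file. The section
# name ([conductor]) is an assumption made for illustration.
SAMPLE_CONF = """
[conductor]
conductor_group = Rack1
"""


def load_conductor_group(conf_text):
    """Return the conductor's group, or "" if none is configured."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # A missing section or option falls back to "", the default group.
    return parser.get("conductor", "conductor_group", fallback="")


print(repr(load_conductor_group(SAMPLE_CONF)))      # 'Rack1'
print(repr(load_conductor_group("[conductor]\n")))  # ''
```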
We also propose a conductor_group field in the node record, which will be used to map a node to a conductor. This matching will be case-insensitive, to make things a bit easier for operators.
A blank conductor_group field or config is the default. A conductor without a group can only manage nodes without a group, and a node without a group can only be managed by a conductor without a group.
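The matching rules above can be sketched as a single comparison. The helper names below are hypothetical, and the sketch treats a missing value (None) the same as the blank default, which is an assumption beyond what this spec states.

```python
def normalize_group(group):
    """Map a conductor group to its canonical form for comparison."""
    # None is treated like the blank default (an assumption); case is ignored.
    return (group or "").lower()


def can_manage(conductor_group, node_conductor_group):
    """Return True if a conductor in conductor_group may manage the node."""
    return normalize_group(conductor_group) == normalize_group(node_conductor_group)


print(can_manage("Rack1", "rack1"))  # True: case-insensitive match
print(can_manage("", ""))            # True: both use the default group
print(can_manage("", "rack1"))       # False: ungrouped conductor, grouped node
print(can_manage("rack1", ""))       # False: grouped conductor, ungrouped node
```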
The hash ring will need to be modified to take grouping into account, as described below in the RPC API Impact section.
Alternatives
Another RFE[1] proposes a complex system of hard and soft affinity, affinity and anti-affinity, and scoring of placement to a conductor with multiple tags. This is quite complex, and I don't believe we'll get it done in the short term. Completing this more basic work doesn't block this more complex work, and so we should take it one step at a time.
Data model impact
A conductor_group field will be added to the nodes table, as a VARCHAR(255). This will have a default of "", or the empty string. This string will be used in the hash ring calculation, so there's no sense in defaulting to NULL.
A conductor_group field will also be added to the conductors table, also as a VARCHAR(255). This will also have a default of "", or the empty string. This will be used to build the hash ring to look up where nodes should land.
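A sketch of the two new columns using SQLite from the Python standard library. The surrounding columns are hypothetical and heavily trimmed. Note that SQLite does not enforce VARCHAR lengths, so in this toy setup the 255-character limit would have to be enforced elsewhere, such as in the API layer.

```python
import sqlite3

# Hypothetical, trimmed tables; only conductor_group reflects this spec.
SCHEMA = """
CREATE TABLE nodes (
    uuid VARCHAR(36) PRIMARY KEY,
    driver VARCHAR(255),
    conductor_group VARCHAR(255) DEFAULT ''
);
CREATE TABLE conductors (
    hostname VARCHAR(255) PRIMARY KEY,
    conductor_group VARCHAR(255) DEFAULT ''
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Rows created without a group pick up the empty-string default, not NULL.
conn.execute("INSERT INTO nodes (uuid, driver) VALUES (?, ?)", ("node-1", "ipmi"))
conn.execute("INSERT INTO conductors (hostname) VALUES (?)", ("cond-1",))
node_group = conn.execute("SELECT conductor_group FROM nodes").fetchone()[0]
conductor_group = conn.execute("SELECT conductor_group FROM conductors").fetchone()[0]
print(repr(node_group), repr(conductor_group))  # '' ''
```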
State Machine Impact
None.
REST API impact
The conductor_group field of the node will be added to the node object in the REST API, with a microversion as usual. It will be allowed in POST and PATCH requests. As with the database, it will be restricted to 255 characters. A conductor in the requested group must be available, because node creation and updates are serviced by a conductor selected via the hash ring.
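A sketch of the length check an API handler might apply to the new field, shown against a JSON Patch body of the kind used for node PATCH requests. The function name and constant are hypothetical. The check that a conductor in the group is actually available is deliberately left out.

```python
import json

# Mirrors the VARCHAR(255) column described in the data model section.
MAX_CONDUCTOR_GROUP_LENGTH = 255


def validate_conductor_group(value):
    """Return value if it is a storable conductor group, else raise ValueError."""
    if not isinstance(value, str):
        raise ValueError("conductor_group must be a string")
    if len(value) > MAX_CONDUCTOR_GROUP_LENGTH:
        raise ValueError("conductor_group must be at most %d characters"
                         % MAX_CONDUCTOR_GROUP_LENGTH)
    return value


# A JSON Patch body a client could send when updating a node.
body = json.loads(
    '[{"op": "replace", "path": "/conductor_group", "value": "rack1"}]')
for operation in body:
    if operation["path"] == "/conductor_group":
        print(validate_conductor_group(operation["value"]))  # rack1
```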
It's worth noting that we would like to expose the grouping of conductors via the REST API eventually. However, the best way to do this isn't immediately clear, so we leave it outside the scope of this spec for now. Another RFE[3] proposes a service management API that may be a good fit.
Client (CLI) impact
"ironic" CLI
None, it's deprecated.
"openstack baremetal" CLI
The conductor_group field for a node will be exposed in the client output, and added to the node create and node set commands.
RPC API impact
This will affect which conductor is the destination for RPC calls corresponding to a given node, but will not have a direct effect on the RPC API itself.
The hash ring will change such that the internal keys for the hash ring will now be of the structure "$conductor_group:$drivername". A colon (:) is used as the separator between the two, to eliminate conflicts between conductor groups and drivers or hardware types. For example, an agent_ilo key with no separator could mean a node with no group and the agent_ilo driver, or it could mean a node with group agent_ using the ilo hardware type.
To handle upgrades, hash ring keys will be built without the conductor group while the service is pinned to a version before this feature, and built with the conductor group when the service is unpinned or pinned to a version after this feature is implemented.
In practice, grouping is ignored for any service whose RPC version is pinned below the release that introduces this feature. Once everything is upgraded and unpinned, the configured grouping tags take effect.
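The upgrade rule can be sketched as a version check on the pin. GROUPING_RPC_VERSION is a placeholder, not the version this feature actually ships in, and the helper names are hypothetical. The pin is modelled here as an RPC API version string, or None when the service is unpinned.

```python
# Placeholder: the RPC API version that introduces grouping is assumed here.
GROUPING_RPC_VERSION = (1, 47)


def parse_version(version):
    """Turn a "major.minor" string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def use_grouping(pinned_rpc_version):
    """Return True if hash ring keys should include the conductor group."""
    if pinned_rpc_version is None:
        # Unpinned services always use the newest behavior.
        return True
    return parse_version(pinned_rpc_version) >= GROUPING_RPC_VERSION


def ring_key_for(conductor_group, driver, pinned_rpc_version):
    """Build a hash ring key, honoring an upgrade pin."""
    if use_grouping(pinned_rpc_version):
        return "%s:%s" % (conductor_group.lower(), driver)
    # Pre-feature behavior: the group is ignored entirely.
    return driver


print(ring_key_for("Rack1", "ipmi", "1.46"))  # ipmi
print(ring_key_for("Rack1", "ipmi", None))    # rack1:ipmi
```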
Until nodes have been configured with a grouping tag, operators should leave enough conductors without a grouping tag to run the cluster. Any nodes without a grouping tag will only be managed by conductors without a grouping tag.
Driver API impact
Hash ring generation and lookup will include the grouping tag, as specified above in the RPC API Impact section.
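To illustrate how generation and lookup both use the grouped key, here is a toy consistent hash ring. It is not Ironic's hash ring implementation. The class, its replicas parameter, and the choice of SHA-256 are assumptions made for the sketch. Each "$conductor_group:$drivername" key gets its own ring, so a node can only land on a conductor from its own group that supports its driver.

```python
import bisect
import hashlib
from collections import defaultdict


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class GroupedHashRing:
    """Toy consistent hash ring with one ring per "group:driver" key."""

    def __init__(self, conductors, replicas=32):
        # conductors: iterable of (hostname, conductor_group, drivers)
        rings = defaultdict(list)
        for hostname, group, drivers in conductors:
            for driver in drivers:
                key = "%s:%s" % (group.lower(), driver)
                # Place several virtual points per conductor for balance.
                for i in range(replicas):
                    rings[key].append((_hash("%s-%d" % (hostname, i)), hostname))
        self._rings = {key: sorted(points) for key, points in rings.items()}

    def get_conductor(self, node_uuid, node_group, driver):
        """Return the hostname responsible for a node, or raise LookupError."""
        key = "%s:%s" % (node_group.lower(), driver)
        ring = self._rings.get(key)
        if not ring:
            raise LookupError("no conductor available for %r" % key)
        positions = [position for position, _ in ring]
        # First point clockwise from the node's hash, wrapping around.
        index = bisect.bisect(positions, _hash(node_uuid)) % len(ring)
        return ring[index][1]


ring = GroupedHashRing([
    ("cond-1", "rack1", ["ipmi"]),
    ("cond-2", "Rack1", ["ipmi"]),
    ("cond-3", "", ["ipmi"]),
])
print(ring.get_conductor("node-a", "RACK1", "ipmi") in ("cond-1", "cond-2"))  # True
print(ring.get_conductor("node-b", "", "ipmi"))  # cond-3
```

If no conductor serves a node's group and driver, the lookup fails, which mirrors the requirement that a conductor in the node's group be available.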
Nova driver impact
This change is transparent to Nova.
Ramdisk impact
None.
Security impact
No direct impact; however, this provides another mechanism for securing a deployment by enabling logical infrastructure segregation.
Other end user impact
None.
Scalability impact
None.
Performance Impact
None.
Other deployer impact
Deployers that wish to use this feature will need to manage the process of labeling conductors and nodes to enable it, which may be a non-trivial task.
Developer impact
None.
Implementation
Assignee(s)
- Primary assignee: jroll
- Other contributors: dtantsur
Work Items
- Add database fields.
- Add conductor config and populate conductor DB field.
- Change the hash ring calculation, and bump the RPC API so that we can pin during upgrades.
- Add fields to the node and conductor objects.
- Make the REST API changes.
- Update the client library/CLI.
- Document the feature.
Dependencies
None.
Testing
Unit tests should be sufficient, as that's how we test our hash ring now. It's difficult to test this with Tempest without exposing conductor grouping via the REST API.
Upgrades and Backwards Compatibility
This is described in the RPC API Impact section.
Documentation Impact
This should be documented in the install guide and admin guide.
References
[0] https://storyboard.openstack.org/#!/story/1734876
[1] https://storyboard.openstack.org/#!/story/1739426
[2] Notes from the Rocky PTG session: https://etherpad.openstack.org/p/ironic-rocky-ptg-location-awareness