Intermediate networking service
Related-Bug: 2063169
Change-Id: If048a175a5e014d3f9f7961143b3c06e96f475ac

parent 888f8581d6
commit 6aeec08df0

@ -0,0 +1,416 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===============
Project Mercury
===============

https://bugs.launchpad.net/ironic/+bug/2063169

This is a project to create a simplified framework between Ironic and
physical network configuration, to facilitate orchestration of networking
in a way that is delineated from the existing OpenStack Neutron service, in a
model which can be operated effectively by a team which is not a "cloud"
team, but a "network" team.

The reasons why are plentiful:

* The number of operators utilizing Ironic continues to grow, although the
  number utilizing Ironic in fully integrated configurations is not
  growing at the same rate as the number running in a "standalone" mode.
* Operators needing physical switch management generally need to operate
  in an environment with strong enforcement of separation of duties,
  i.e. the software might not be granted access to the switch management
  framework, nor can such a service be accessible by any users under any
  circumstances.
* The introduction of DPUs generally means that we now have potential cases
  where switches need to be programmed to provision a DPU, and then the DPU
  needs to be programmed to provision servers.

The goals can be summarized as:

* Provide a mechanism to configure L2 networks on switches.
* Provide a mechanism to configure L2 networks to be provided to a host
  from a DPU.
* Provide a mechanism to accommodate highly isolated network management
  interfaces where operators restrict access such that *only* Ironic
  is able to connect to the remote endpoint.
* Provide a tool to apply and clean up configuration, *not* track
  and then assert configuration. This doesn't preclude a future
  "double check this configuration" mode from existing at some point,
  but the minimum viable functionality is application and removal of
  the network configuration.
* Provide a mechanism of receiving the call to do something, reading in
  networking configuration credentials from local storage, and then doing
  so without the need of a database *OR* shared message bus.

This project is NOT:

* Intended to provide any sort of IPAM functionality.
* Intended to provide management of routing.
* Intended to provide management of security groups or firewalling.
* Intended to provide a public RESTful API, nor require a database.

This project MAY:

* Provide a means to help enable and deploy advanced tooling to a DPU under
  Ironic's control.
* Provide a means of offloading some of the layer 2 interaction
  responsibility, especially in an environment *with* Neutron and Ironic.

.. warning::
   This document is not a precise, and thus prescriptive, design document, but
   a document to record and surface the ideas in a way that can foster
   communication and consensus building. In this case, we are likely to
   leave details to the implementer's prerogative, with this document serving
   as the overall guard rails.

Problem description
===================

Essentially, we need a service to facilitate secure and delineated network
management which can be owned and operated by another infrastructure team
in an enterprise organization, and which brings together aspects like simple
code patterns and playbooks, such that the team can trust the interface layer
to apply basic network configuration and enable easier use of Bare Metal.

Furthermore, the available ecosystem in the DPU space wants to model its
devices in a variety of ways, and some of those devices have inherent
limitations. For example, some devices are just another computer embedded and
connected to the same PCI device bus. Others present the ability
to load a P4 program to handle specific tasks. In other cases, the available
flash and RAM of a device are so limited that options are very limited and
entirely exclude all "off the shelf" operating systems and their utilities.
As a result, almost every device and its resulting use case entirely govern
the configuration and use model, which means an easy to modify, modular
interface would provide the greatest potential impact.

Proposed change
===============

We are proposing an RPC service: specifically, something along the lines of a
JSON-RPC endpoint, which multiple Ironic conductors would be able to connect
to in order to request networking changes to be made.

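
As a rough illustration only, a conductor-side request could be composed as
shown here. The sketch assumes the JSON-RPC 2.0 envelope (``jsonrpc``,
``method``, ``params``, ``id``); the method name ``attach_port`` and its
parameters are hypothetical and are not defined by this specification.

.. code-block:: python

    import json
    import uuid


    def build_request(method, params):
        """Return a JSON-RPC 2.0 request body as a string."""
        return json.dumps({
            "jsonrpc": "2.0",
            "method": method,          # hypothetical method name
            "params": params,          # hypothetical parameters
            "id": str(uuid.uuid4()),   # lets the caller match the response
        })


    body = build_request(
        "attach_port",
        {"network_id": "vlan-100", "port_id": "port-1"},
    )

Because each request carries its own identifier, several conductors could
talk to the same endpoint without sharing state, which is in keeping with
the goal of avoiding a database or shared message bus.
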
Alongside the RPC service, we would have an appropriately named
``network_interface`` driver to take the information stored inside of Ironic
and perform attachment of interfaces based upon the provided information.

.. note::
   A distinct possibility exists that we may actually start with a hybrid
   dual ``network_interface`` driver to help us delineate and handle
   integration in a clear fashion. This is much more of an "implementer's
   prerogative" sort of item.

An MVP would likely exclude locking and instead be modeled as a single worker
service or container which does not maintain state. That largely simplifies
the problem of "who is logged into what" when making concurrent changes,
which has been the historical driver for locking.

.. note::
   Teaching networking-baremetal to call this proposed RPC service is
   generally considered out of scope of this work, but it is entirely within
   reason and possibility to facilitate, as this would functionally provide
   a capability for some ML2 calls to be proxied through.

The overall call model, at a high level, could take the following flow::

    ┌──────────────────────────┐
    │Inbound Request/Connection│
    └───┬──────────────────────┘
        │
    ┌───▼──────────────────────────────────┐
    │{"type": "ml2:update_port_postcommit",│
    │ "payload": {"context": {...}}}       │
    └───┬──────────────────────────────────┘
        │
    ┌───▼──────────────────────────┐
    │Invoke ML2 Plugin With Context│
    └───┬──────────────────────────┘
        │
    ┌───▼────────────────────────────────┐
    │Plugin handles locking, if necessary│
    └───┬────────────────────────────────┘
        │
    ┌───▼───────────────────────────────────────────────────────┐
    │Plugin succeeds (HTTP 200?) or fails and returns HTTP error│
    └───────────────────────────────────────────────────────────┘

With the initially expected ML2 methods being:

* get_allowed_network_types - Returns a list of allowed types
* bind_port
* update_port_postcommit
* delete_port_postcommit
* create_port_postcommit
* delete_network_postcommit
* create_network_postcommit
* try_to_bind_segment_for_agent - This may not be required outright for an
  MVP; however, it does contain additional arguments which need to influence
  our overall design, and we should account for it up front.

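
The flow and method list above could be sketched, on the service side, as a
small allow-list dispatcher. This is an illustration under assumptions, not
a proposed implementation: the ``dispatch`` function, the ``UnsupportedCall``
exception, and the convention of passing the payload keys as keyword
arguments are all hypothetical, and ``FakePlugin`` merely stands in for a
real ML2 plugin.

.. code-block:: python

    ALLOWED_METHODS = frozenset({
        "get_allowed_network_types",
        "bind_port",
        "update_port_postcommit",
        "delete_port_postcommit",
        "create_port_postcommit",
        "delete_network_postcommit",
        "create_network_postcommit",
        "try_to_bind_segment_for_agent",
    })


    class UnsupportedCall(Exception):
        """Raised when a message names a method outside the allow-list."""


    def dispatch(plugin, message):
        """Route one inbound message to the matching plugin method."""
        prefix, _, method = message.get("type", "").partition(":")
        if prefix != "ml2" or method not in ALLOWED_METHODS:
            raise UnsupportedCall(message.get("type"))
        # Payload keys become keyword arguments, so methods which take
        # more than a context can be served by the same code path.
        return getattr(plugin, method)(**message.get("payload", {}))


    class FakePlugin:
        """Stand-in for a real ML2 plugin (illustration only)."""

        def update_port_postcommit(self, context):
            return {"updated": context["port_id"]}


    result = dispatch(
        FakePlugin(),
        {"type": "ml2:update_port_postcommit",
         "payload": {"context": {"port_id": "port-1"}}},
    )

Rejecting anything outside the list before touching the plugin keeps the
exposed surface limited to the methods enumerated above.
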
For the remote RPC service, it is anticipated that logging will need to
be verbose enough that operators can answer the questions they are likely to
raise when investigating issues: when, who, what, why, and how.
Plugin code in Ironic should **also** log verbosely when invoked, to ensure
operators can match requests and resulting changes should an issue arise.

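
One possible shape for such a log line is sketched here using Python's
standard ``logging`` module. The logger name, the field names, and the
``log_request`` helper are assumptions made for illustration only.

.. code-block:: python

    import logging

    LOG = logging.getLogger("mercury.audit")  # hypothetical logger name


    def format_audit(requester, action, target, reason, transport):
        """Build one log line carrying who, what, why, and how."""
        return ("who=%s what=%s target=%s why=%s how=%s"
                % (requester, action, target, reason, transport))


    def log_request(requester, action, target, reason, transport):
        # The "when" comes from the log record timestamp, provided the
        # configured formatter includes %(asctime)s.
        line = format_audit(requester, action, target, reason, transport)
        LOG.info(line)
        return line


    line = log_request("conductor-1", "update_port_postcommit",
                       "switch-a:Ethernet1", "node deploy", "ssh")

Emitting the same fields from both the Ironic plugin and the remote service
would make it straightforward to correlate a request with the change it
caused.
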
While beyond an initial MVP of basic functionality, to solve the DPU case,
the overall pattern would likely take the shape of one where Ironic
would enumerate through the "child nodes", attach the child nodes to the
requested physical network, and then engage in some level of programming
which may need to be vendor or deployment specific based upon the overall
use model. These details cannot be determined at present, because the
foundational layer needs to be constructed before it can be built upon.

Alternatives
------------

The closest alternative would be a ``standalone Neutron`` coupled with some
sort of extended/proxy RPC model, which is fine, but that really does not
address the underlying challenge of the attach/detach functionality
being needed by infrastructure operators. It also introduces modeling which
might not be suitable for bulk infrastructure operators, as they tend to have
their own IP address management and routers in place for the physical portion
of their infrastructure.

Another possibility would be to directly embed the network attach/detach
loading and logic into Ironic itself; however, that would present difficulties
with maintenance where we largely want to unlock capability.

Data model impact
-----------------

At this time, we are largely modeling the idea to leverage existing port data
stored inside of Ironic which is utilized for attachment operations.

A distinct possibility exists that we may look at storing some additional
physical and logical networking detail inside of Ironic's database to be
included in requests, which could possibly be synchronized, but this would be
beyond the scope of the minimum viable product, as in the initial phase we
intend to use the VIF attach/detach model to represent the logical network to
be attached.

State Machine Impact
--------------------

None

REST API impact
---------------

With an MVP, we do not anticipate any REST API changes to Ironic itself,
with the very minor exception of the loosening of a regular expression
around what Ironic accepts for VIF attachment requests. This was agreed
upon by the Ironic community quite some time ago and just never performed.

Existing fields on a node and port will continue to be used just as they
have before with an MVP.

Post-MVP work may include some sort of /v1/physical_network endpoint,
but that is anticipated to be designed once we know more.

Client (CLI) impact
-------------------

"openstack baremetal" CLI
~~~~~~~~~~~~~~~~~~~~~~~~~

None

"openstacksdk"
~~~~~~~~~~~~~~

None

RPC API impact
--------------

This change proposes a service which would be accessed by Ironic utilizing
an RPC model of interaction. This means there would be some shared meaning
for call interactions in the form of a library.

In all likelihood, this may be as simple as "attach" and "detach", but given
the overall needs of an MVP and a use model in which we are also trying to
leverage existing tooling, the exact details are best discovered
through the development process, which likely covers what was noted above in
the Proposed Change section.

Driver API impact
-----------------

None

Nova driver impact
------------------

None

Ramdisk impact
--------------

None

Security impact
---------------

Impact for Ironic itself is minimal, although it will require credentials to
be set for the remote service to signal interface attachments/detachments.

The security risk largely revolves around the new service we're looking at
creating with this effort. The shared library utilized to connect to the
remote service likely needs to also contain the necessary client
wrapper code, as an MVP service is likely going to start with support only
for HTTP Digest Authentication, which can then move towards certificate
authentication as it evolves.

The risk exists in large part because that service will need to load and
combine a set of credentials and access information. As such, this new
service will **not** be a user facing service.

Today, individual ports are attached through a combination of a network
identifier and a binding profile which is utilized to map a port to a switch.
In this model, there would be no substantial difference. A network_id would
be a user supplied detail, and the local_link_information would contain
sufficient information for the executing plugin to identify which device.
The new service would retrieve the details to access the remote device from
local configuration, and combine the rest of the binding profile and target
network identifier to facilitate the attachment of the port to the device.

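
The lookup described above might be sketched as follows. The
``LOCAL_DEVICES`` mapping, the ``build_attachment`` helper, and the shape of
its return value are hypothetical, and the ``switch_id`` and ``port_id`` keys
are assumed to be present in the link information supplied by Ironic.

.. code-block:: python

    # Hypothetical local configuration, keyed by switch identifier. A real
    # deployment would load this from storage readable only by the service.
    LOCAL_DEVICES = {
        "aa:bb:cc:dd:ee:ff": {
            "host": "192.0.2.10",
            "username": "svc",
            "password": "example-only",
        },
    }


    def build_attachment(network_id, local_link_information,
                         devices=LOCAL_DEVICES):
        """Combine request details with locally stored device access."""
        switch_id = local_link_information["switch_id"]
        try:
            access = devices[switch_id]
        except KeyError:
            raise LookupError("no local configuration for %s" % switch_id)
        return {
            "device": access,                            # never sent by Ironic
            "port": local_link_information["port_id"],
            "network_id": network_id,                    # user supplied
        }


    plan = build_attachment(
        "net-1",
        {"switch_id": "aa:bb:cc:dd:ee:ff", "port_id": "Ethernet1"},
    )

The point of the sketch is that device credentials never travel in the
request; only the service, reading its local configuration, can pair a port
with the means to configure it.
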
.. note::
   This security impact does not cover the likely situation of DPU credential
   management. We are presently deferring that as a challenge we
   would focus on after an initial minimum viable product state is reached.

.. note::
   This security risk does not include any future mechanisms to perform
   aspects such as software deployment on a DPU to facilitate a fully
   Neutron-integrated case, which is something we would want to
   identify and determine as we iterate along the path to support such
   a capability.

Other end user impact
---------------------

None

Scalability impact
------------------

Please see the Performance Impact section below.

Performance Impact
------------------

This proposal is intentionally designed to be limited and isolated
to minimize risk and reduce deployment complexity. It is also intentionally
modeled as a tool to "do something", and that "something" happens to be
configuration in an area where device locking is necessary. This realistically
means that the only content written to disk is going to be lock files.

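
If and when locking is implemented, a per-device lock file could be as small
as the sketch here. It assumes a POSIX host (the ``fcntl`` module is not
available elsewhere), and the ``device_lock`` helper and the lock file naming
are hypothetical.

.. code-block:: python

    import contextlib
    import fcntl
    import os


    @contextlib.contextmanager
    def device_lock(lock_dir, device_name):
        """Hold an exclusive advisory lock while configuring one device."""
        path = os.path.join(lock_dir, device_name + ".lock")
        with open(path, "w") as lock_file:
            fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until available
            try:
                yield path
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)

A plugin would wrap its device session in ``with device_lock(...)``, so two
workers on the same host never configure the same switch at the same time.
Advisory locks of this kind only coordinate processes which share the lock
directory.
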
Furthermore, the possibility exists that the Ironic driver code utilized
*could* wait for a response, where today Neutron port attachment/detachment
calls are asynchronous. This would pose an overall improvement for end users
of Ironic. This is handled today with a 15 second sleep by default, which
might not be necessary in this design, improving overall Ironic performance.

Other deployer impact
---------------------

To utilize this functionality, deployers would need to deploy a new service.

This would be opt-in, and would not impact existing users.

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  <Volunteer #1>

Other contributors:
  <Volunteer #2>

Work Items
----------

Broadly speaking, the work items would include:

1) Prototyping this new service.
2) Prototyping an Ironic network_interface driver to utilize this new
   service.
3) Loosen (as previously agreed in past Ironic PTGs) the "vif binding"
   restriction regular expression to permit a VLAN ID to be provided.
4) Test!

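
Work item 3 could look roughly like the following. Both patterns are
hypothetical: the expression Ironic currently uses is not quoted in this
document, and the sketch simply assumes a UUID-only check being widened to
also accept a VLAN ID in the range 1 to 4094.

.. code-block:: python

    import re

    UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
    # VLAN IDs 1-4094, written without leading zeros.
    VLAN = r"(?:[1-9]\d{0,2}|[1-3]\d{3}|40[0-8]\d|409[0-4])"

    STRICT = re.compile(UUID)                           # assumed current check
    LOOSENED = re.compile(r"(?:%s|%s)" % (UUID, VLAN))  # assumed new check


    def is_valid_vif_id(value, pattern=LOOSENED):
        """Return True when the whole value matches the given pattern."""
        return pattern.fullmatch(value) is not None

Using ``fullmatch`` means a value is accepted only when the entire string is
a UUID or a VLAN ID, so values such as ``4095`` or ``100abc`` are rejected.
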
.. note::
   The list below is intended to paint a picture of what we feel are the
   possible steps beyond the initial step of creating a Minimum Viable
   Product. They are included to provide a complete contextual picture
   to help the reader understand our mental model.

Past initial prototyping, the following may apply, in order:

* Creation of a common library for Ironic and any other program or tool
  to utilize to compose RPC calls to this service.
* Extend support to VXLAN ports, which may require additional details or
  design work to take place and work in any ML2 plugins utilized.
* Design a REST API endpoint to facilitate the tracking of physical
  networks to be attached to baremetal nodes.
* Add support to networking-baremetal to try and reconcile these
  physical networks into Ironic, so node port attachment/detachments
  can take place.
* Add support to networking-baremetal for it to proxy the request
  through to this service for port binding requests in Neutron.
* Design a new model, likely superseding VIFs, although VIFs could also just
  be an internal network ID moving forward. This would likely be required for
  formal adoption of the functionality by Metal3, but standalone users could
  move to leverage this immediately once implemented.
* Development of a model and flow where DPU devices could have a service
  deployed to them as part of a step invoked by Ironic. This would involve
  many challenges, but could be used to support the Neutron integrated
  OVS/OVN agents operating on the card for cases such as the remote
  card being in a hypervisor node.

Dependencies
============

To be determined.

Testing
=======

An ideal model of testing in upstream CI has not been determined, and
is dependent upon the state reached with a minimum viable product,
and then what the next objectives appear to be.

This may involve duplication of Ironic's existing multinode job in a
standalone form. Ultimately the expectation is we would have one or
more CI jobs dedicated to exercising such functionality.

Upgrades and Backwards Compatibility
====================================

This functionality is anticipated to be "net new" for Ironic and exposed
to end users through a dedicated ``network_interface`` module which could
be selected by users at a point in the future. As such, no upgrade or
backwards compatibility issues are anticipated.

Documentation Impact
====================

No impact is anticipated at this time.

References
==========

https://etherpad.opendev.org/p/ironic-ptg-april-2024#L609

@ -0,0 +1 @@
../approved/mercury.rst