There are many references to review.openstack.org, and while the redirect should work, we can also go ahead and fix them. Change-Id: Ic82bde84617c461aba1e4a02b3fc562a90c99a6e
15 KiB
Stein Project Priorities
This is a list of development priorities the Ironic team is prioritizing for Stein development, in order of relative size and dependency addressing. Note that this is not our complete backlog for the cycle, we still hope to review and land non-priority items.
The primary contact(s) listed is/are responsible for tracking the status of that work and herding cats to help get that work done. They are not the only contributor(s) to this work, and not necessary doing most of the coding! They are expected to be available on IRC and the ML for questions, and report status on the whiteboard for the weekly IRC sync-up. The number of primary contacts is typically limited to 2-3 individuals to simplify communication. We expect at least one of them to have core privileges to simplify getting changes in.
As the time remaining in the Stein cycle is approximately 30 weeks from the Project Teams Gathering, the list of priorities has been split into two major pieces based upon an estimate of relative size. The overall goal is for the Smaller Goals items to be focused on with in the first few months of the cycle, while the larger Epic Goals may receive some work early on, but will be targeted for later in the cycle.
Smaller Goals
Priority | Primary Contacts |
---|---|
Upgrade Checker | TheJulia, rloo |
Python3 First | derek, TheJulia |
iPXE/PXE interface split | TheJulia, stendulker |
UEFI First | hshiina |
HTTPClient booting | TheJulia |
Nova conductor_group awareness | jroll, TheJulia |
Enhanced Checksum Support | jroll, kaifeng |
DHCP-less/L3 virtual media boot | shekar, stendulker |
Epic Goals
Goal | Primary Contacts |
---|---|
Deploy Templates | mgoddard, dtantsur, rloo |
Graphical Console | mkrai, etingof |
Federation Capabilities | TheJulia, dtantsur |
Task execution improvements | etingof, TheJulia, mgoddard |
No IPA to conductor communication | jroll, rloo |
Getting steps | TheJulia, dtantsur |
Conductor role splitting | jroll, dtantsur |
Neutron Event Processing | vdrok, mgoddard, hjensas |
Inter-Project Goals
Deployment state callbacks to nova | TheJulia, jroll |
Smartnic Support | TheJulia, mkrai, moshele |
Details
Upgrade Checker
This is an OpenStack Community goal for the Stein Cycle. For ironic
this will mean a new command called
ironic-status upgrade check
. This command is intended to
return an error for things that would be fatal for an upgrade such as
new required configuration missing, or schema/data upgrades not yet
performed. The story can be found at story
2003657.
Python3 First
This is an OpenStack Community goal for the Stein Cycle. Most of this
work has already been completed in ironic. Largely we need to change our
tests so we are explicitly testing on Python3. We can't do this for
every test at the moment, but we should be able to change most and still
ensure the bulk of the code paths are covered by tests labeled with
python2
.
We also desire for third party CI to begin to leverage Python3, with a goal of approximately 50% of third party CI jobs until we stop supporting Python2. The story can be found at story 2003230.
iPXE/PXE interface split
This is an older effort that has been restarted in the interest of supporting multiple architectures (such as AArch64, Power, and x86_64) in the same deployment.
As it turns out, Power's architecture expects the older PXELinux style templates that are written by our PXE boot interface. Additionally, while AArch64 can be booted using iPXE, no pre-built binaries are available.
As such, we need to no longer make this global for the conductor, but specific to the node, and splitting the interfaces apart begins to make much more sense. The original specification can be found ipxe-boot interface. The story can be found at story 1628069.
UEFI First
2020 is an important year for Baremetal Operators, as Legacy boot mode support is anticipated to be removed from newer processors being shipped.
To ensure our success, we need to improve our testing and prepare for
the time when UEFI is the only boot mode available for newer hardware.
As a result, this will become a multi-cycle focus to enable the default
boot mode to be changed to uefi
in a future cycle. The
story can be found at story
2003936.
HTTPClient Booting
While the community is interested in supporting HTTPClient based booting, we currently have a few steps to surpass first. Namely the iPXE/PXE interface split and improved UEFI testing.
The nature of this work is to enable an explict HTTP booting scenario where the booting node does not leverage PXE. The story can be found at story 2003934.
Nova conductor_group awareness
This work is exclusively in the ironic virt driver in the openstack/nova repository. This would enable us
to define a conductor_group
to which the nova-compute
process leverages for the view of baremetal nodes it is responsible for.
The story can be found at story
2003942.
Enhanced Checksum Support
Ironic presently defaults to use of MD5 checksums for the
image_checksum
which is far from ideal. During the Rocky
cycle, Glance has enhanced their support for checksum storage, which
means we should enhance ours as well. The story can be found at story
2003938.
DHCP-less/L3 virtual media boot
Some operators and vendors wish to enable ironic to manage deployments where DHCP is not something that is leveraged or utilized in the deployment process. In order to do this, we need to enable some additional capabilities in terms of enabling information to be attached to a deployment ramdisk. The specification can be found at the L3 based deployments specification. The story can be found at story 1749193.
Deploy Templates
In the future, we want to take specific action based upon traits submitted to ironic from Nova describing the instance's expected state or behavior.
This will allow us to take actions and influence the deployment steps, and as such is a continuation of the Deploy Steps work from the Rocky cycle. The story can be found at story 1722275.
Graphical Console
We need a way to expose graphical (e.g. VNC) consoles to users from drivers that support it. We reached agreement on the specification in the Rocky cycle and have started to work through the patches to enable this. Our goal being to have a framework and preferably at least one vendor driver to support Graphical console connectivity. The specification can be found vnc graphical console specification. The story can be found at story 1567629.
Federation Capabilities
Edge computing is bringing a variety of cases where support for federation of ironic deployments can be useful and extremely powerful.
In order to better support this emerging use case, we want to try and agree on a viable path forward that meets several different use cases and requirements. The objective for this effort is an agreed upon specification. The story can be found at story 2001821.
Task execution improvements
We realize that our task execution and locking model is problematic, and while it does scale in some ways, it does not scale in other ways. This work will consist of worker execution improvements, an evaluation and possible implementation of different worker thread execution models, and careful improvement of locking. The story can be found at story 2003943.
No IPA to conductor communication
Larger operators need much more strict security in their deployments, where they wish to prevent all outbound network connectivity to the control plane. Presently the design model requires that nodes are able to reach ironic's API in order to perform heartbeat and lookup operations.
The concept with this is to optionally enable the conductor to drive the deployment by polling IPA using the already known IP address. That being said, this is realistically going to require Task execution improvements to be complete to help ensure that operators are able to have performant deployments. The specification can be found at change 212206. The story can be found at story 1526486.
Getting steps
One of the biggest frustrations that people have with our cleaning
model is the lack of visibility into what steps they can execute. This
is further compounded with deploy steps
. We have ideas on
this and we need to begin providing the mechanisms to raise that
visibility.
This may also involve state machine states to enable the agent to sit in a holding pattern pending operator action.
The goal is ultimately to provide a CLI for the user to be able to understand the available steps that can be utilized. The story can be found at story 1715419.
Neutron Event Processing
Currently ironic has no way to determine when certain asynchronous events actually finish in neutron, and with what result. Nova, on the contrary, uses a special neutron driver, which filters out notifications and posts some of them to a special nova API endpoint. We should do the same. The story can be found at story 1304673.
Conductor role splitting
The conductor presently does all of the work... But does it need to?
This is a question we should be asking ourselves as we evolve, if we can optionally break the conductor into many pieces, to enable edge conductors, or edge local boot management. The goal here is to try and obtain a matrix of distinct actions taken, which will hopefully further guide us as time moves on. The story can be found at story 2003940.
Smartnic Support
Smartnics complicates ironic as the NIC needs to be programmed with the power in a state such that the configuration on the NIC can be changed.
While the effort to support this may ultimately result in enhancements to neutron in the form of Super-Agents to apply the configuration, we still need to understand the impact to our workflows and ensure that sufficient security is still present. The primary objective is to have a joint specification written in advance of the Berlin summit to reach consensus with the Neutron team as to the mechanics, information passing, and setting storage. The story can be found at story 2003346.
Deployment state callbacks to nova
One of the issues in ironic's nova virt driver is that no concept of callbacks exist. Due to this, the virt driver polls the ironic API endpoint repeatedly, which increases overall system load. In an ideal world, ironic would utilize a mechanism to indicate deployment state similar to how neutron informs nova that networking has been configured. The story can be found at story 2003939.