Prep work for Network aware scheduling (Pike)
https://blueprints.launchpad.net/nova/+spec/prep-for-network-aware-scheduling-pike
Change how Nova talks to Neutron so that, in the future, we can implement network-aware scheduling.
This continues the work started in Newton: http://specs.openstack.org/openstack/nova-specs/specs/newton/implemented/prep-for-network-aware-scheduling.html
Problem description
Some IP subnets can be restricted to a subset of hosts, due to an operator's network configuration. In such an environment, an instance could be built on a host that has no public IPs available. The ability to manage IP addresses in this way is being added to Neutron by the Routed Networks feature:
- http://specs.openstack.org/openstack/neutron-specs/specs/newton/routed-networks.html
- https://specs.openstack.org/openstack/nova-specs/specs/newton/implemented/neutron-routed-networks.html
To make this possible, we need to know the details of all the ports the user requested, and what resources they require, before asking the scheduler for a host. In addition, after picking a host, we should check that an IP address is available before continuing with the rest of the build process.
As an aside, the allocate_for_instance call currently contains both parts of that operation and has proved very difficult to maintain and evolve. In Newton, we changed the code to separate the update and create operations, so we are now able to look at moving where those operations happen.
Use Cases
This is largely a code refactor.
Proposed change
In Newton we changed the code inside allocate_for_instance into clear get/create and update phases. We need to complete the split by ensuring the network info cache contains all the information that must be shared between the two phases of the operation (get/create ports and update ports). Should a build request fail and be retried on a different host, the ports that Nova creates should be reused for the new build attempt, just like the ports that are passed into Nova. That bug fix requires the data to be passed between the phases in a very similar way, so it will be the initial focus of this effort.
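The following is a minimal, standalone sketch of that two-phase split and the retry reuse it enables; the helper names and the cache layout here are illustrative assumptions, not Nova's actual code:

```python
# Standalone sketch only: the helper names and the cache layout are
# illustrative assumptions, not Nova's real code.

_created_ports_cache = {}  # stand-in for the instance network info cache


def get_or_create_ports(instance_uuid, requested_networks):
    """Phase 1: look up user-supplied ports and create any Nova-owned ones."""
    ports = _created_ports_cache.get(instance_uuid)
    if ports is None:
        # In the real flow this would call Neutron; here we fake some IDs.
        ports = ['port-%s-%d' % (instance_uuid, i)
                 for i, _net in enumerate(requested_networks)]
        # Record what Nova created, so a rescheduled build (or a delete)
        # knows which ports Nova is responsible for.
        _created_ports_cache[instance_uuid] = ports
    return ports


def update_ports(instance_uuid, ports, host):
    """Phase 2: update/bind the ports for the host the scheduler chose."""
    return [(port, host) for port in ports]


# First attempt, then a reschedule: the second call reuses the cached
# ports instead of creating new ones.
ports = get_or_create_ports('inst-1', ['net-a'])
update_ports('inst-1', ports, 'host-1')
update_ports('inst-1', get_or_create_ports('inst-1', ['net-a']), 'host-2')
```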
Second, we want to move the get/create ports step to before the scheduler is called. In terms of upgrades, we need to ensure old compute nodes do not re-create ports that the conductor has already created. Similarly, when deleting an instance, an old compute node should still know which ports were created by Nova, so they can be deleted with the instance in the usual way. For nova-network users, the get/create ports step can be a no-op.
To avoid problems across upgrades, the early creation of ports is not allowed until all nova-compute nodes have been upgraded to a version that understands whether a port has already been created. Once all nodes are upgraded, and credentials are available on the nova-conductor node, ports will be created before calling the scheduler.
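As a rough illustration of this gating, the sketch below checks a hypothetical minimum nova-compute service version and whether the conductor has Neutron credentials before allowing early port creation; the version constant and helper functions are stand-ins, not Nova's real APIs:

```python
# Standalone sketch only: the version constant and helpers are stand-ins
# for Nova's real service-version and configuration machinery.

PORT_CREATION_AWARE_VERSION = 16  # hypothetical nova-compute version


def minimum_compute_service_version():
    """Stand-in for looking up the minimum nova-compute service version."""
    return 15


def conductor_has_neutron_credentials():
    """Stand-in for checking that Neutron auth options are configured."""
    return True


def can_create_ports_early():
    # Ports may only be created before scheduling once every nova-compute
    # node understands that the ports already exist, and the conductor can
    # actually talk to Neutron.
    return (minimum_compute_service_version() >= PORT_CREATION_AWARE_VERSION
            and conductor_has_neutron_credentials())


if __name__ == '__main__':
    if can_create_ports_early():
        print('create ports in the conductor, before calling the scheduler')
    else:
        print('fall back: the compute node creates ports as it does today')
```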
The third step is to move the port update into the conductor, right after the scheduler has picked an appropriate host. We will not be able to run this code in the conductor until all compute nodes have been upgraded to the newest version; until then, newer nodes will still have to run it on the compute node. While annoying, this move only helps with faster retries, so it should not block any progress. Note that this port update step includes setting the host on the port, and in the future it will be the point at which an IP is assigned, if the port does not yet have one.
It is useful to update the port bindings in the conductor, so any failure in the port binding for a specific host can quickly trigger a retry. This is particularly important with routed networks, where segments can run out of IP addresses independently.
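A standalone sketch of that retry behaviour is shown below; the names are illustrative, and the bind_port callable stands in for the Neutron port update that sets binding:host_id:

```python
# Standalone sketch only: the names are illustrative, and bind_port stands
# in for the Neutron port update that sets binding:host_id on the port.


class PortBindingFailed(Exception):
    pass


def bind_ports_or_retry(ports, candidate_hosts, bind_port):
    """Try to bind all ports on each candidate host in turn."""
    for host in candidate_hosts:
        try:
            for port in ports:
                # With routed networks this is also where an IP would be
                # allocated from the segment local to this host.
                bind_port(port, host)
            return host
        except PortBindingFailed:
            # The segment for this host may have run out of IP addresses;
            # retry quickly on the next candidate instead of failing late
            # on the compute node.
            continue
    raise PortBindingFailed('no candidate host could bind the ports')
```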
For nova-network, we can run the existing allocate-for-instance logic in the conductor, after the scheduler is called. For cells v1 users, this should correctly happen in the child cell conductor, because each cells v1 cell has its own separate nova-network instance with a different set of IP addresses. (For cells v2 users, the networking is global to the Nova deployment, so it does not matter where this happens.)
Alternatives
We could attempt to add more complexity to the existing allocate_for_instance code, but history has shown that this is likely to create many regressions.
Data model impact
None
REST API impact
None
Security impact
Eventually this could mean we do not need any Neutron-related credentials on the compute nodes. This work will not achieve that goal, but it is a step in the right direction.
Notifications impact
Notifications may now have a different host and service, but they should be otherwise identical.
Other end user impact
None
Performance Impact
Currently the Neutron port binding is done in parallel with other long-running tasks that the compute node performs during the boot process. This change moves the port creation and binding into the critical path of the boot process.
When Nova creates ports for users, instead of making a single port-create call with all the parameters, it will now first create the port and later update it. This will slightly increase the load on the Neutron API during the boot process; however, the increase should be minimal, as we are not duplicating any of the expensive parts of the process, such as port binding and IP allocation.
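For illustration, the sketch below shows the create-then-update calling pattern using python-neutronclient; the exact attributes sent at each step are assumptions rather than the final implementation:

```python
# Sketch only: the attributes sent at each step are assumptions, not the
# final implementation.
from neutronclient.v2_0 import client as neutron_client


def make_client(session):
    """Build a Neutron client from a keystoneauth session."""
    return neutron_client.Client(session=session)


def create_then_update_port(neutron, network_id, instance_uuid, chosen_host):
    # Step 1 (before scheduling): create the port with only the cheap
    # attributes, so no host binding (and, with routed networks, no IP
    # allocation) happens yet.
    port = neutron.create_port(
        {'port': {'network_id': network_id,
                  'device_owner': 'compute:nova'}})['port']

    # Step 2 (after the scheduler picks a host): update the port with the
    # host-dependent attributes, which triggers the expensive work.
    return neutron.update_port(
        port['id'],
        {'port': {'device_id': instance_uuid,
                  'binding:host_id': chosen_host}})['port']
```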
This also generally moves more load into the nova-conductor nodes, but on the upside this reduces the load on the nova-compute nodes.
Other deployer impact
We will need Neutron credentials on the nova-conductor nodes, which may not currently be configured.
To maintain our upgrade promise, we will fall back to the old behaviour for one cycle to give deployers a warning about the missing credentials. The following cycle will require the credentials to be present on nova-conductor.
Developer impact
Improved ability to understand allocate_for_instance, and its replacements.
Implementation
Assignee(s)
- Primary assignee: John Garbutt (IRC: johnthetubaguy)
Work Items
- Split allocate_for_instance into two functions
- Move the create/get port call into the conductor, before calling the scheduler, such that allocate_for_instance no longer creates ports (a no-op for nova-net). This is likely to be achieved by adding a new method to the network API for both neutron and nova-net, as sketched after this list.
- Move the remainder of the allocate_for_instance call into the conductor, for both nova-net and neutron
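Below is an illustrative sketch of the kind of new network API method mentioned in the first work item; the method name and signatures are assumptions, shown for both the Neutron and nova-network implementations:

```python
# Sketch only: the method name and signatures are assumptions, not the
# agreed interface.


class NeutronNetworkAPI(object):
    def create_resources_for_instance(self, context, instance,
                                      requested_networks):
        """Get the user-supplied ports and create any Nova-owned ones.

        Called from the conductor before the scheduler, so the later
        update/bind phase can reuse the same ports on retries.
        """
        raise NotImplementedError('real logic comes from the split of '
                                  'allocate_for_instance')


class NovaNetworkAPI(object):
    def create_resources_for_instance(self, context, instance,
                                      requested_networks):
        """nova-network allocates everything after scheduling: a no-op."""
        return None
```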
Dependencies
None (however, several things depend on this work)
Testing
Grenade + neutron should ensure the pre-upgrade flow is covered; the regular gate tests should ensure the post-upgrade flow is covered.
We should add functional tests to test the re-schedule flow. We might also need functional tests to check the transition between the pre and post upgrade flows.
Documentation Impact
We need to describe the transition in the release notes and the release-specific upgrade documentation, at a minimum.
References
- Previous work: http://specs.openstack.org/openstack/nova-specs/specs/newton/implemented/prep-for-network-aware-scheduling.html
- Neutron Routed network spec: http://specs.openstack.org/openstack/neutron-specs/specs/newton/routed-networks.html
- Nova Routed network spec: https://specs.openstack.org/openstack/nova-specs/specs/newton/implemented/neutron-routed-networks.html
History
Release Name | Description
---|---
Newton | Introduced
Ocata | Continued
Pike | Reproposed