diff --git a/specs/7.0/neutron-resource-cleanup.rst b/specs/7.0/neutron-resource-cleanup.rst
new file mode 100644
index 00000000..90abb9d1
--- /dev/null
+++ b/specs/7.0/neutron-resource-cleanup.rst
@@ -0,0 +1,171 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+========================
+Neutron Resource Cleanup
+========================
+
+Include the URL of your launchpad blueprint:
+
+https://blueprints.launchpad.net/fuel/+spec/neutron-resource-cleanup
+
+On controllers, Neutron network resources such as namespaces, interfaces,
+OVS ports and processes are managed in two unrelated parts of the code:
+Neutron agents and OCF scripts, including the `q-agent-cleanup` script.
+The goal is to move all responsibility to a single point - Neutron.
+
+
+Problem description
+===================
+
+Right now resource cleanup is performed by an external script provided by Fuel
+(`q-agent-cleanup`). The script lives in the `fuel-library` repo and it may be
+changed to re-use code from Neutron. Ideally all cleanup code should belong to
+Neutron agents, as they are currently able to manage namespaces, network
+interfaces, OVS ports and processes.
+
+The proposal is inspired by the following issues:
+
+* Cleaning up a large number of network namespaces is very time consuming.
+  Pacemaker has very strict timeout management, and if the operation doesn't
+  fit within the timeout the whole resource may become unmanaged
+  (https://bugs.launchpad.net/fuel/+bug/1436414).
+
+* Restart of a particular agent is performed via banning and clearing the
+  associated Pacemaker resource. As a result, all network namespaces are
+  dropped and then created again, causing traffic interruptions
+  (https://bugs.launchpad.net/fuel/+bug/1464817).
+
+* Code defects in `q-agent-cleanup`
+  (https://bugs.launchpad.net/fuel/+bug/1434196)
+
+
+Proposed change
+===============
+
+The proposed change is mostly related to code refactoring and optimization.
+It doesn't suggest changes in the architecture. OCF scripts will remain the
+entry point for resource clean-up; however, the code will move to Neutron.
+
+Step-by-step changes:
+ 1. Move `q-agent-cleanup` and the corresponding test into the Neutron tree.
+    Add a shell entry point to the Neutron package.
+ 2. Refactor the code by re-using existing parts from Neutron. The CLI should
+    stay the same.
+ 3. Apply optimizations by replacing shell calls with iproute2 (native ovsdb
+    if applicable).
+
+
+Alternatives
+------------
+
+Initially we discussed the proposal to move the whole clean-up from OCF to
+Neutron agents. However, there are cases where resources need to be cleared
+by an external tool.
+
+Data model impact
+-----------------
+
+None
+
+REST API impact
+---------------
+
+None
+
+Upgrade impact
+--------------
+
+None
+
+Security impact
+---------------
+
+None
+
+Notifications impact
+--------------------
+
+None
+
+Other end user impact
+---------------------
+
+None
+
+Performance Impact
+------------------
+
+Restart of agents should become faster when using native APIs instead of
+shell commands.
+
+Plugin impact
+-------------
+
+None
+
+Other deployer impact
+---------------------
+
+None
+
+
+Developer impact
+----------------
+
+Patches to `q-agent-cleanup` will be tested by the Neutron infra, making
+changes less risky.
+
+Infrastructure impact
+---------------------
+
+None
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  shakhat
+
+Other contributors:
+  skolekonov
+  kkuznetsova
+
+Work Items
+----------
+
+* Implementation
+
+  * Move `q-agent-cleanup` into the Neutron tree
+  * Update the Neutron package to install the `q-agent-cleanup` shell utility
+  * Remove the old script from the fuel-library tree
+  * Optimize resource cleanup to satisfy scale tests
+
+Dependencies
+============
+
+None
+
+Testing
+=======
+
+Revisit disaster tests and add new ones if needed.
+The following cases should be covered: restart of an agent, death of an agent,
+and banning an agent on a particular node. The same scenarios should be run
+manually at scale with large numbers (thousands) of networks and routers.
+
+Documentation Impact
+====================
+
+None
+
+References
+==========
+
+None