Andrew Bonney a640ae8cc9 Prevent neutron-l3-agent killing keepalived on restart
Systemd services use a default KillMode of 'control-group',
which causes every process in the service's control group to be
killed on service stop. Neutron expects the keepalived processes
it starts to remain running in order to prevent data-plane
interruptions for HA routers.

This change switches the systemd KillMode to process in order to
prevent this issue. In doing so we also have to clean up
non-keepalived processes started by neutron so that upon restart
everything is running from the latest virtualenv which may have
changed during an upgrade.

Change-Id: I958fda17e6207553466d8a7512e35c30b122c22c
Closes-Bug: #1846198
Depends-On: https://review.opendev.org/771770
2021-01-25 09:08:07 +00:00

---
# Copyright 2014, Rackspace US, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

- name: Stop services
  service:
    name: "{{ item.service_name }}"
    enabled: yes
    state: "stopped"
    daemon_reload: yes
  with_items: "{{ filtered_neutron_services }}"
  register: _stop
  until: _stop is success
  retries: 5
  delay: 2
  listen:
    - "Restart neutron services"
    - "venv changed"

# NOTE(cloudnull):
# When installing or upgrading, it is possible that an old metadata proxy process will not
# be restarted by the metadata agent when a version changes. To fix this, the ns-metadata
# proxy pids are killed if they're not running the current tag. Once the old processes
# are removed, the metadata agent will respawn the missing process within 60 seconds using
# the correct code.
- name: Run ns-metadata-proxy process cleanup
  shell: |
    for ns_pid in $(pgrep neutron-ns-meta); do
      echo $(readlink -f "/proc/$ns_pid/exe") | grep -qv "{{ neutron_venv_tag }}"
      if [ $? -eq 0 ]; then
        if kill -9 "$ns_pid"; then
          logger -s "old metadata proxy pid found and has been cleaned up on: \"$ns_pid\""
        fi
      fi
    done
  when: "'neutron-metadata-agent' in (filtered_neutron_services | map(attribute='service_key') | list)"
  listen:
    - "Restart neutron services"
    - "venv changed"

# NOTE
# When restarting neutron-l3-agent, a non-default systemd KillMode of 'process' is used
# to prevent Keepalived from exiting and causing a data-plane outage. As a result of this
# some neutron processes remain running. In the case of an upgrade, these remaining
# processes will be running code from the previous version. This step ensures these
# orphaned processes are cleaned up correctly.
- name: Run neutron-l3-agent process cleanup
  shell: |
    for ns_pid in $(cat /sys/fs/cgroup/pids/neutron.slice/neutron-l3-agent.service/cgroup.procs); do
      echo $(readlink -f "/proc/$ns_pid/exe") | grep -qv "keepalived"
      if [ $? -eq 0 ]; then
        if kill -9 "$ns_pid"; then
          logger -s "old neutron-l3-agent pid found and has been cleaned up on: \"$ns_pid\""
        fi
      fi
    done
  when: "'neutron-l3-agent' in (filtered_neutron_services | map(attribute='service_key') | list)"
  listen:
    - "Restart neutron services"
    - "venv changed"

- name: Perform a DB contract
  command: "{{ neutron_bin }}/neutron-db-manage upgrade --contract"
  become: yes
  become_user: "{{ neutron_system_user_name }}"
  delegate_to: "{{ groups[neutron_services['neutron-server']['group']][0] }}"
  when:
    - "ansible_local['openstack_ansible']['neutron']['need_db_contract'] | bool"
    - "_neutron_is_first_play_host"
  listen:
    - "Restart neutron services"
    - "venv changed"

- name: Start services
  service:
    name: "{{ item.service_name }}"
    enabled: yes
    state: "started"
    daemon_reload: yes
  with_items: "{{ filtered_neutron_services }}"
  register: _start
  until: _start is success
  retries: 5
  delay: 2
  listen:
    - "Restart neutron services"
    - "venv changed"