diff --git a/doc/source/index.rst b/doc/source/index.rst
index 3782c717ca63..e413ba8777bd 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -186,6 +186,7 @@ Advanced testing and guides
    gmr
    testing/libvirt-numa
    testing/serial-console
+   testing/zero-downtime-upgrade
 
 Sample Configuration File
 -------------------------
diff --git a/doc/source/testing/zero-downtime-upgrade.rst b/doc/source/testing/zero-downtime-upgrade.rst
new file mode 100644
index 000000000000..097d72a6ba69
--- /dev/null
+++ b/doc/source/testing/zero-downtime-upgrade.rst
@@ -0,0 +1,221 @@
+=====================================
+Testing Zero Downtime Upgrade Process
+=====================================
+
+Zero Downtime upgrade eliminates any disruption to the nova API service
+during upgrade.
+
+Nova API services are upgraded at the end. The basic idea of the zero
+downtime upgrade process is to drain connections from the old API nodes
+before they are upgraded. In this process, new connections go to the new
+API nodes while old connections slowly drain from the old nodes. This
+ensures that the user sees the ``max_supported`` API version as a
+monotonically increasing number. There might be some performance
+degradation during the process due to slow HTTP responses and delayed
+request handling, but there is no API downtime.
+
+This page describes how to test the zero downtime upgrade process.
+
+-----------
+Environment
+-----------
+
+* Multinode devstack environment with 2 nodes:
+
+  * controller - All services (N release)
+  * compute-api - Only n-cpu and n-api services (N release)
+
+* Highly available load balancer (HAProxy) on top of the n-api services.
+  This is required for zero downtime upgrade as it allows one n-api service
+  to keep running while we upgrade the other. See the instructions to set up
+  HAProxy below.
+
+------------------------------
+Instructions to set up HAProxy
+------------------------------
+
+Install HAProxy and Keepalived on both nodes:
+
+.. code-block:: bash
+
+    # apt-get install haproxy keepalived
+
+Let the kernel know that we intend to bind additional IP addresses that
+won't be defined in the interfaces file. To do this, edit ``/etc/sysctl.conf``
+and add the following line:
+
+.. code-block:: ini
+
+    net.ipv4.ip_nonlocal_bind=1
+
+Make this take effect without rebooting:
+
+.. code-block:: bash
+
+    # sysctl -p
+
+Configure HAProxy to add the backend servers and assign a virtual IP to the
+frontend. On both nodes add the below HAProxy config, where ``<<vip>>`` is
+the virtual IP that Keepalived will manage:
+
+.. code-block:: bash
+
+    # cd /etc/haproxy
+    # cat >> haproxy.cfg <<EOF
+
+    global
+      daemon
+      log 192.168.0.88 local0
+      stats socket /var/run/haproxy.sock mode 600 level admin
+
+    defaults
+      log global
+      mode http
+      retries 3
+      timeout connect 10s
+      timeout client 1m
+      timeout server 1m
+
+    frontend nova-api-vip
+      bind <<vip>>:8774
+      default_backend nova-api
+
+    backend nova-api
+      balance roundrobin
+      option tcplog
+      server controller 192.168.0.88:8774 check
+      server apicomp 192.168.0.89:8774 check
+
+    EOF
+
+.. note::
+    Just change the IP for the ``log`` entry in the global section on each
+    node.
+
+On both nodes add ``keepalived.conf``:
+
+.. code-block:: bash
+
+    # cd /etc/keepalived
+    # cat >> keepalived.conf <<EOF
+
+    global_defs {
+      router_id controller
+    }
+    vrrp_script haproxy {
+      script "killall -0 haproxy"
+      interval 2
+      weight 2
+    }
+    vrrp_instance VI_1 {
+      virtual_router_id 50
+      advert_int 1
+      priority 101
+      state MASTER
+      interface eth0
+      virtual_ipaddress {
+        <<vip>> dev eth0
+      }
+      track_script {
+        haproxy
+      }
+    }
+    EOF
+
+.. note::
+    Change the ``router_id``, ``priority`` and ``state`` on the other node so
+    that only one node starts as MASTER.
+
+Restart the keepalived and haproxy services on both nodes. Keepalived will
+assign the virtual IP to one node and HAProxy will start balancing requests
+across the two n-api servers.
+
+-----------------------------
+Zero Downtime upgrade process
+-----------------------------
+
+Before maintenance window
+'''''''''''''''''''''''''
+
+* Follow the general rolling upgrade process to upgrade all services except
+  n-api; the API nodes are upgraded last, during the maintenance window.
+
+During maintenance window
+'''''''''''''''''''''''''
+
+* Drain connections from one of the old API nodes by setting its weight to
+  zero:
+
+  .. code-block:: bash
+
+      # echo "set weight nova-api/<<server>> 0" | sudo socat /var/run/haproxy.sock stdio
+
+* OR disable the server using:
+
+  .. code-block:: bash
+
+      # echo "disable server nova-api/<<server>>" | sudo socat /var/run/haproxy.sock stdio
+
+* This allows the current node to complete all of its pending requests.
+  While it is being upgraded, the other API node serves the requests. This
+  way we can achieve zero downtime.
+* Restart the n-api service and re-enable the node using:
+
+  .. code-block:: bash
+
+      # echo "enable server nova-api/<<server>>" | sudo socat /var/run/haproxy.sock stdio
+
+* Drain connections from the other old API node in the same way and upgrade
+  it.
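+  While the nodes are being drained and upgraded, the API can be watched
+  through the load balancer to confirm that the advertised microversion
+  only ever increases. A minimal sketch of such a check, assuming the
+  ``<<vip>>:8774`` frontend from the HAProxy configuration above:
+
+  .. code-block:: bash
+
+      # while true; do curl -s http://<<vip>>:8774/ | grep -o '"version": "[^"]*"'; sleep 1; done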
+* No Tempest tests should fail, since there is no API downtime.
+
+After maintenance window
+''''''''''''''''''''''''
+
+* Follow the steps from the general rolling upgrade process to clear any
+  cached service version data and complete all online data migrations.
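+  As a sketch, assuming the new release's ``nova-manage`` is installed and
+  the nova services can be matched by process name (restarting the services
+  clears the cached version data just as well as a ``SIGHUP``):
+
+  .. code-block:: bash
+
+      # pkill -HUP -f nova
+      # nova-manage db online_data_migrations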