=====================================
Testing Zero Downtime Upgrade Process
=====================================

Zero Downtime upgrade eliminates any disruption to the nova API service
during upgrade.

Nova API services are upgraded at the end. The basic idea of the zero
downtime upgrade process is to have the connections drain from the old API
before it is upgraded. In this process, new connections go to the new API
nodes while old connections slowly drain from the old nodes. This ensures
that the user sees the max_supported API version as a monotonically
increasing number. There might be some performance degradation during the
process due to slow HTTP responses and delayed request handling, but there
is no API downtime.

This page describes how to test the zero downtime upgrade process.

-----------
Environment
-----------

* Multinode devstack environment with 2 nodes:

  * controller - All services (N release)
  * compute-api - Only n-cpu and n-api services (N release)

* Highly available load balancer (HAProxy) on top of the n-api services.
  This is required for zero downtime upgrade as it allows one n-api service
  to keep running while we upgrade the other. See instructions to setup
  HAProxy below.

-----------------------------
Instructions to setup HAProxy
-----------------------------

Install HAProxy and Keepalived on both nodes.

.. code-block:: bash

   # apt-get install haproxy keepalived

Let the kernel know that we intend to bind additional IP addresses that
won't be defined in the interfaces file. To do this, edit
``/etc/sysctl.conf`` and add the following line:

.. code-block:: INI

   net.ipv4.ip_nonlocal_bind=1

Make this take effect without rebooting.

.. code-block:: bash

   # sysctl -p

Configure HAProxy to add the backend servers and assign a virtual IP to the
frontend. On both nodes add the below HAProxy config:
.. code-block:: bash

   # cd /etc/haproxy
   # cat >> haproxy.cfg <<EOF

   global
    daemon
    log 192.168.0.88 local0
    stats socket /var/run/haproxy.sock mode 600 level admin

   defaults
    log global
    retries 3
    timeout connect 10s
    timeout client 1m
    timeout server 1m

   frontend nova-api-vip
    # The frontend binds the keepalived virtual IP (example address,
    # matching the keepalived.conf below).
    bind 192.168.0.95:8774
    default_backend nova-api

   backend nova-api
    balance roundrobin
    option tcplog
    server controller 192.168.0.88:8774 check
    server apicomp 192.168.0.89:8774 check
   EOF

.. note:: Just change the IP for ``log`` in the global section on each node.
   The global and defaults sections above are a minimal example; the stats
   socket is required for the drain commands used later.

On both nodes add ``keepalived.conf``:

.. code-block:: bash

   # cd /etc/keepalived
   # cat >> keepalived.conf <<EOF

   global_defs {
     router_id controller
   }
   vrrp_instance VI_1 {
     ! Example values: use the node's actual interface, and give the
     ! other node a lower priority so that it becomes the backup.
     interface br100
     state MASTER
     virtual_router_id 51
     priority 101
     virtual_ipaddress {
       192.168.0.95
     }
   }
   EOF

Restart the keepalived and haproxy services on both nodes.

.. code-block:: bash

   # service keepalived restart
   # service haproxy restart

-------------------------------
Steps for zero downtime upgrade
-------------------------------

Before maintenance window
'''''''''''''''''''''''''

* Follow the steps from the general rolling upgrade process to install the
  new code and run the database schema migrations on the controller node.

During maintenance window
'''''''''''''''''''''''''

* Following the general rolling upgrade process, upgrade and restart all
  services other than the n-api services.
* Drain connections from the n-api node being upgraded. HAProxy allows the
  connections to drain by setting the server's weight to zero:

  .. code-block:: bash

     # echo "set weight nova-api/<node_name> 0" | sudo socat /var/run/haproxy.sock stdio

  Here ``<node_name>`` is the server name from the HAProxy backend
  (``controller`` or ``apicomp``).

* OR disable the server using:

  .. code-block:: bash

     # echo "disable server nova-api/<node_name>" | sudo socat /var/run/haproxy.sock stdio

* This allows the current node to complete all the pending requests. While
  it is being upgraded, the other api node serves the requests. This way we
  can achieve zero downtime.
* Restart the n-api service and enable n-api using the command:

  .. code-block:: bash

     # echo "enable server nova-api/<node_name>" | sudo socat /var/run/haproxy.sock stdio

* Drain connections from the other old api node in the same way and upgrade
  it.
* No tempest tests should fail since there is no API downtime.

After maintenance window
''''''''''''''''''''''''

* Follow the steps from the general rolling upgrade process to clear any
  cached service version data and complete all online data migrations.
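To confirm the zero downtime guarantee during the maintenance window, one
way is to poll the API through the load balancer and check that the
advertised max microversion never decreases. The sketch below is an
assumption, not part of the official process: the VIP URL, the ``jq`` JSON
path, and the helper name ``check_monotonic`` are illustrative only.

.. code-block:: bash

   #!/usr/bin/env bash
   # Sketch of a downtime check. check_monotonic reads one advertised
   # max microversion per line and fails on the first decrease; sort -V
   # compares version strings numerically (so "2.9" sorts before "2.10").
   check_monotonic() {
       local prev="" cur
       while read -r cur; do
           if [ -n "$prev" ] && \
              [ "$(printf '%s\n%s\n' "$prev" "$cur" | sort -V | head -n1)" != "$prev" ]; then
               echo "version regressed: $prev -> $cur"
               return 1
           fi
           prev="$cur"
       done
       return 0
   }

   # During the upgrade, feed it from the version document served on the
   # VIP (hypothetical address; adjust to your frontend, stop with Ctrl-C):
   #   while sleep 1; do
   #       curl -sf http://192.168.0.95:8774/v2.1/ | jq -r '.version.version'
   #   done | check_monotonic

   # Offline example: a healthy upgrade only ever raises the version.
   printf '2.60\n2.60\n2.87\n' | check_monotonic && echo OK

Running the offline example prints ``OK``; feeding it a sequence that ever
goes backwards reports the regression and exits non-zero.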