Merge "Document testing process for zero downtime upgrade"
commit b3f54cbda8
@@ -186,6 +186,7 @@ Advanced testing and guides
   gmr
   testing/libvirt-numa
   testing/serial-console
   testing/zero-downtime-upgrade

Sample Configuration File
-------------------------

doc/source/testing/zero-downtime-upgrade.rst (new file, 221 lines)
@@ -0,0 +1,221 @@
=====================================
Testing Zero Downtime Upgrade Process
=====================================

A zero downtime upgrade eliminates any disruption to the nova API service
during the upgrade.

Nova API services are upgraded last. The basic idea of the zero downtime
upgrade process is to drain connections from the old API nodes before they
are upgraded: new connections go to the new API nodes while existing
connections slowly drain from the old ones. This ensures that the user sees
the max_supported API version as a monotonically increasing number. There
might be some performance degradation during the process due to slow HTTP
responses and delayed request handling, but there is no API downtime.
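
One way to observe this during testing is to poll the versions document and
watch the advertised maximum microversion; a minimal sketch, assuming the API
is reached through the HAProxy virtual IP and port configured below
(192.168.0.95:8282):

.. code-block:: bash

   # Poll the max microversion advertised by the v2.1 endpoint; the value
   # should only ever increase while nodes are drained and upgraded.
   while true; do
       curl -s http://192.168.0.95:8282/v2.1/ | \
           python -c 'import json,sys; print(json.load(sys.stdin)["version"]["version"])'
       sleep 1
   done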

This page describes how to test the zero downtime upgrade process.

-----------
Environment
-----------

* Multinode devstack environment with 2 nodes:

  * controller - All services (N release)
  * compute-api - Only n-cpu and n-api services (N release)

* Highly available load balancer (HAProxy) on top of the n-api services.
  This is required for a zero downtime upgrade as it allows one n-api service
  to keep running while we upgrade the other. See the instructions to set up
  HAProxy below.

------------------------------
Instructions to set up HAProxy
------------------------------

Install HAProxy and Keepalived on both nodes.

.. code-block:: bash

   # apt-get install haproxy keepalived

Let the kernel know that we intend to bind additional IP addresses that
won't be defined in the interfaces file. To do this, edit ``/etc/sysctl.conf``
and add the following line:

.. code-block:: ini

   net.ipv4.ip_nonlocal_bind=1

Make this take effect without rebooting:

.. code-block:: bash

   # sysctl -p

Configure HAProxy to add the backend servers and assign a virtual IP to the
frontend. On both nodes, add the HAProxy config below:

.. code-block:: bash

   # cd /etc/haproxy
   # cat >> haproxy.cfg <<EOF

   global
     chroot /var/lib/haproxy
     user haproxy
     group haproxy
     daemon
     log 192.168.0.88 local0
     pidfile /var/run/haproxy.pid
     stats socket /var/run/haproxy.sock mode 600 level admin
     stats timeout 2m
     maxconn 4000

   defaults
     log global
     maxconn 8000
     mode http
     option redispatch
     retries 3
     stats enable
     timeout http-request 10s
     timeout queue 1m
     timeout connect 10s
     timeout client 1m
     timeout server 1m
     timeout check 10s

   frontend nova-api-vip
     bind 192.168.0.95:8282  # HAProxy virtual IP
     default_backend nova-api

   backend nova-api
     balance roundrobin
     option tcplog
     server controller 192.168.0.88:8774 check
     server apicomp 192.168.0.89:8774 check

   EOF

.. note::

   On each node, change the IP in the ``log`` line of the ``global`` section
   to that node's own IP.
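
Before restarting anything, you can verify that the file parses cleanly; a
small sketch using HAProxy's built-in configuration check:

.. code-block:: bash

   # Validate the configuration without starting the service
   # haproxy -c -f /etc/haproxy/haproxy.cfg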

On both nodes, add ``keepalived.conf``:

.. code-block:: bash

   # cd /etc/keepalived
   # cat >> keepalived.conf <<EOF

   global_defs {
     router_id controller
   }
   vrrp_script haproxy {
     script "killall -0 haproxy"
     interval 2
     weight 2
   }
   vrrp_instance 50 {
     virtual_router_id 50
     advert_int 1
     priority 101
     state MASTER
     interface eth0
     virtual_ipaddress {
       192.168.0.95 dev eth0
     }
     track_script {
       haproxy
     }
   }

   EOF

.. note::

   Change the priority on node2 to 100 (or vice versa) so that exactly one
   node starts as the keepalived master. Use the HAProxy virtual IP in
   ``virtual_ipaddress``.

Restart the keepalived service.

.. code-block:: bash

   # service keepalived restart

Add ``ENABLED=1`` in ``/etc/default/haproxy`` and then restart the HAProxy
service.

.. code-block:: bash

   # service haproxy restart

When both services have restarted, the node with the highest keepalived
priority claims the virtual IP. You can check which node claimed the virtual
IP using:

.. code-block:: bash

   # ip a

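For example, to confirm the VIP landed on the expected node (a sketch,
assuming the interface and address used above):

.. code-block:: bash

   # The master node lists the VIP as an additional address on eth0
   # ip a show dev eth0 | grep 192.168.0.95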

-----------------------------
Zero downtime upgrade process
-----------------------------

General rolling upgrade process:
http://docs.openstack.org/developer/nova/upgrade.html#minimal-downtime-upgrade-process

Before Upgrade
''''''''''''''

* Change the nova-api endpoint in keystone to point to the HAProxy virtual
  IP (see the sketch after this list).
* Run tempest tests.
* Check that the n-api services on both nodes are serving requests.
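
A minimal sketch of the endpoint change, assuming the ``openstack`` client
and the VIP/port configured earlier; ``<endpoint-id>`` is whatever
``endpoint list`` reports for the compute service:

.. code-block:: bash

   # Look up the compute endpoint, then repoint it at the HAProxy VIP
   $ openstack endpoint list --service compute
   $ openstack endpoint set --url http://192.168.0.95:8282/v2.1 <endpoint-id>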

Before maintenance window
'''''''''''''''''''''''''

* Start the upgrade process with the controller node.
* Follow the steps from the general rolling upgrade process to install the
  new code and sync the database for schema changes (a sketch of the sync
  step follows this list).
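
A sketch of the schema sync step, assuming the standard ``nova-manage``
commands; schema migrations are additive, so they are safe to apply while
the old code is still running:

.. code-block:: bash

   # Apply new schema to the API and main databases
   $ nova-manage api_db sync
   $ nova-manage db sync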

During maintenance window
'''''''''''''''''''''''''

* Set the ``compute`` option in the ``[upgrade_levels]`` section to ``auto``
  in ``nova.conf``.

  .. code-block:: ini

     [upgrade_levels]
     compute = auto

* Starting with n-cond, restart all services except n-api and n-cpu.
* In small batches, gracefully shut down the n-cpu services, then start them
  with the new version of the code.
* Run tempest tests.
* Drain connections on n-api while the tempest tests are running.
  HAProxy allows you to drain the connections by setting the weight to zero:

  .. code-block:: bash

     # echo "set weight nova-api/<<server>> 0" | sudo socat /var/run/haproxy.sock stdio

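  To watch the connections drain, you can poll the stats socket; a small
  sketch, where ``scur`` (the fifth CSV column) is the current session count:

  .. code-block:: bash

     # Watch current sessions fall to zero on the drained server
     # echo "show stat" | sudo socat /var/run/haproxy.sock stdio | cut -d, -f1,2,5
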
* Or disable the server using:

  .. code-block:: bash

     # echo "disable server nova-api/<<server>>" | sudo socat /var/run/haproxy.sock stdio

* This allows the node being drained to complete all of its pending requests.
  While it is being upgraded, the other API node serves the requests; this is
  how zero downtime is achieved.
* Restart the n-api service and re-enable the server using:

  .. code-block:: bash

     # echo "enable server nova-api/<<server>>" | sudo socat /var/run/haproxy.sock stdio

* Drain connections from the other old API node in the same way and upgrade
  it.
* No tempest tests should fail since there is no API downtime.

After maintenance window
''''''''''''''''''''''''

* Follow the steps from the general rolling upgrade process to clear any
  cached service version data and complete all online data migrations (a
  sketch follows).
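
A sketch of the final step, assuming the standard ``nova-manage`` command;
run it until it reports that nothing is left to migrate:

.. code-block:: bash

   # Complete online data migrations; repeat until no rows remain
   $ nova-manage db online_data_migrations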