Merge "Maintenance Mode update"

Jenkins
2015-08-14 11:46:30 +00:00
committed by Gerrit Code Review
3 changed files with 145 additions and 136 deletions

Binary file not shown (image; size before: 91 KiB, after: 46 KiB).


@@ -19,35 +19,35 @@ parameter in one of the following ways:
* by selecting the respective option in the boot menu in Ubuntu or CentOS; * by selecting the respective option in the boot menu in Ubuntu or CentOS;
* by forcing the reboot into maintenance mode from shell with the ``umm on`` * by forcing the reboot into maintenance mode from shell with
command; the :command:`umm on` command;
* automatically, by reaching a number of unclean reboots specified in * automatically, by reaching a number of unclean reboots specified in
REBOOT_COUNT parameters. ``REBOOT_COUNT`` parameters.
"Unclean reboot" means that system reboots unexpectedly without a `Unclean reboot` means that system reboots unexpectedly without a
direct call from the user. direct call from the user.
You can also disable the maintenance mode functionality You can also disable the maintenance mode functionality
if you do not need it (e.g. you do not want to if you do not need it (for example, you do not want to
be automatically booted into it every time). be automatically booted into it every time).
You can operate in maintenance mode through ssh or tty2. You can operate in maintenance mode through ssh or tty2.
A return back into normal mode is issued with the *umm off* command. A return back into normal mode is issued with the :command:`umm off`
command.
.. Note :: .. Note ::
If you manually start a service in the maintenance mode, it will not If you manually start a service in the maintenance mode, it will not
be automatically restarted when you put the system back in the normal be automatically restarted when you put the system back in the normal
mode with the *umm off* command. mode with the :command:`umm off` command.
Using the :command:`umm` command
--------------------------------
Using the ``umm`` command There are several parameters to use with the :command:`umm` command:
-------------------------
There are several parameters to use with the *umm* command:
- ``umm on [cmd]`` - enter the maintenance mode, and execute cmd when MM is reached; - ``umm on [cmd]`` - enter the maintenance mode, and execute cmd when MM is reached;
@@ -68,10 +68,10 @@ There are several parameters to use with the *umm* command:
- ``umm disable`` - disable the maintenance mode functionality. - ``umm disable`` - disable the maintenance mode functionality.
Configuring the ``UMM.conf`` file Configuring the `UMM.conf` file
--------------------------------- ---------------------------------
You can automate the maintenance mode start by editing the */etc/umm.conf* file. You can automate the maintenance mode start by editing the `/etc/umm.conf` file.
The configuration options are: The configuration options are:
@@ -82,20 +82,19 @@ The configuration options are:
where: where:
UMM UMM
will tell the system to go into the maintenance mode based on tells the system to go into the maintenance mode based on
the REBOOT_COUNT and COUNTER_RESET_TIME values. If the value is the ``REBOOT_COUNT`` and ``COUNTER_RESET_TIME`` values. If the value is
anything other than ``yes`` (or if the ``UMM.conf`` file is missing), the anything other than ``yes`` (or if the `UMM.conf` file is missing), the
system will go into the native Ubuntu recovery mode. system will go into the native Ubuntu recovery mode.
REBOOT_COUNT REBOOT_COUNT
determines the number of unclean reboots that will determines the number of unclean reboots that trigger the system to go
trigger the system to go into the maintenance mode; into the maintenance mode.
COUNTER_RESET_TIME
this is a time value in minutes after the system reboot when
"Unclean reboot" counter will be resetted.
COUNTER_RESET_TIME
determines the period of time (in minutes) before the `Unclean reboot`
counter reset.
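The way these options interact could be sketched in shell roughly as follows. This is only an illustration, not the actual UMM implementation: it assumes the file uses plain ``KEY=value`` lines, the sample values and the temporary file are made up, and the ``get_opt`` and ``boot_target`` helpers are hypothetical. The sketch ignores ``COUNTER_RESET_TIME`` and only models one reading of the text above: below ``REBOOT_COUNT`` the node boots normally; once the count is reached it enters MM if ``UMM`` is ``yes`` and the Ubuntu recovery mode otherwise.

```shell
#!/bin/sh
# Hypothetical sample config. The real file is /etc/umm.conf; its syntax is
# ASSUMED here to be plain KEY=value lines, and the values are made up.
conf=$(mktemp)
trap 'rm -f "$conf"' EXIT
cat > "$conf" <<'EOF'
UMM=yes
REBOOT_COUNT=2
COUNTER_RESET_TIME=10
EOF

# Read one option from the config file, falling back to a default.
get_opt() {
    # $1 = key, $2 = default value
    value=$(sed -n "s/^$1=//p" "$conf" | tail -n 1)
    printf '%s\n' "${value:-$2}"
}

# Hypothetical helper: decide the boot target for a given number of
# unclean reboots (an interpretation of the documentation, not UMM code).
boot_target() {
    # $1 = current unclean-reboot count
    if [ "$1" -lt "$(get_opt REBOOT_COUNT 1)" ]; then
        echo "normal boot"
    elif [ "$(get_opt UMM no)" = "yes" ]; then
        echo "maintenance mode"
    else
        echo "Ubuntu recovery mode"
    fi
}

for count in 1 2 3; do
    echo "unclean reboots: $count -> $(boot_target "$count")"
done
```

With the sample values, one unclean reboot leads to a normal boot, while two or more lead to the maintenance mode.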
Example of using MM on one node Example of using MM on one node
@@ -103,53 +102,57 @@ Example of using MM on one node
- Switching node into MM: - Switching node into MM:
:: .. code-block:: bash
:linenos:
root@node-1:~#umm on root@node-1:~#umm on
umm-gr start/running, process 6657 umm-gr start/running, process 6657
Broadcast message from root@node-1 Broadcast message from root@node-1
(/dev/pts/0) at 14:29 ... (/dev/pts/0) at 14:29 ...
The system is going down for reboot NOW! The system is going down for reboot NOW!
root@node-1:~# umm status root@node-1:~# umm status
rebooting rebooting
root@node-1:~# Connection to node-1 closed by remote host. root@node-1:~# Connection to node-1 closed by remote host.
Connection node-1:~# closed. Connection node-1:~# closed.
root@fuel:~#:~$ root@fuel:~#:~$
root@node-1:~#ssh root@node-1:~#ssh
root@node-1:~# umm status root@node-1:~# umm status
umm umm
root@node-1:~#ps -Af root@node-1:~#ps -Af
We can see only small set of working process. We can see only small set of working processes.
- Start the service: - Start the service:
:: .. code-block:: bash
:linenos:
root@node-1:~# /etc/init.d/apache2 start root@node-1:~# /etc/init.d/apache2 start
root@node-1:~# /etc/init.d/apache2 status root@node-1:~# /etc/init.d/apache2 status
Apache2 is running (pid 1907). Apache2 is running (pid 1907).
- Switch back to the working mode: - Switch back to the working mode:
:: .. code-block:: bash
:linenos:
root@node-1:~#umm off root@node-1:~#umm off
- Continue booting into working mode: - Continue booting into working mode:
:: .. code-block:: bash
:linenos:
root@node-1:~#umm status root@node-1:~#umm status
runlevel N 2 runlevel N 2
root@node-1:~#/etc/init.d/apache2 status root@node-1:~#/etc/init.d/apache2 status
Apache2 is running (pid 1907). Apache2 is running (pid 1907).
We can see that service was not restarted during switching from MM to We can see that service was not restarted during switching from MM to
@@ -157,115 +160,122 @@ Example of using MM on one node
- Check the state of the OpenStack services: - Check the state of the OpenStack services:
:: .. code-block:: bash
:linenos:
root@node-1:~#crm status root@node-1:~#crm status
- If you want to reach working mode by reboot, you should use the following - If you want to reach working mode by reboot, you should use the following
command: command:
:: .. code-block:: bash
:linenos:
root@node-1:~# umm off reboot umm-gr start/running, process 2825 root@node-1:~# umm off reboot umm-gr start/running, process 2825
Broadcast message from root@node-1 Broadcast message from root@node-1
(/dev/pts/0) at 11:23 ... (/dev/pts/0) at 11:23 ...
The system is going down for reboot NOW! The system is going down for reboot NOW!
root@node-1:~# Connection to node-1 closed by remote host. root@node-1:~# Connection to node-1 closed by remote host.
Connection to node-1 closed. Connection to node-1 closed.
[root@fuel ~]# [root@fuel ~]#
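When scripting the single-node flow above, it may help to map the :command:`umm status` output to a node state. The following is a minimal sketch, assuming that the three outputs seen in the session (``rebooting``, ``umm``, and ``runlevel N 2``) are representative. The ``classify_umm_status`` helper is hypothetical, and any other output is reported as unknown.

```shell
#!/bin/sh
# Hypothetical helper: classify one line of `umm status` output.
# Only the outputs that appear in the example session are recognized.
classify_umm_status() {
    case "$1" in
        rebooting)   echo "node is rebooting" ;;
        umm)         echo "node is in maintenance mode" ;;
        runlevel\ *) echo "node is in normal mode" ;;
        *)           echo "unknown state" ;;
    esac
}

# Sample inputs copied from the session; on a real node you would pass
# "$(umm status)" instead of a fixed string.
for status in "rebooting" "umm" "runlevel N 2"; do
    printf '%-14s -> %s\n' "$status" "$(classify_umm_status "$status")"
done
```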
Example of putting all nodes into the maintenance mode at the same time Example of putting all nodes into the maintenance mode at the same time
----------------------------------------------------------------------- -----------------------------------------------------------------------
The following maintenance mode sequence is called "Last input First out". The following maintenance mode sequence is called `Last input First out`.
This guarantees that there is going to be the most recent data on This guarantees that there is going to be the most recent data on
the Cloud Infrastructure Controller (CIC) that comes back first. the Cloud Infrastructure Controller (CIC) that comes back first.
- Determine what nodes have Controller (CIC) role: - Determine which nodes have Controller (CIC) role:
:: .. code-block:: bash
:linenos:
[root@fuel ~]# fuel nodes [root@fuel ~]# fuel nodes
id | status | name | cluster| ip | mac | roles | pending_roles| online id | status | name | cluster| ip | mac | roles | pending_roles| online
---|--------|------------------|--------|-----------|-------------------|------------|--------------|------- ---|--------|------------------|--------|-----------|-------------------|------------|--------------|-------
2 | ready | Untitled (c0:02) | 1 | 10.20.0.4 | e6:6a:42:96:a4:45 | controller | | True 2 | ready | Untitled (c0:02) | 1 | 10.20.0.4 | e6:6a:42:96:a4:45 | controller | | True
4 | ready | Untitled (c0:04) | 1 | 10.20.0.6 | 66:10:2e:0c:12:4a | compute | | True 4 | ready | Untitled (c0:04) | 1 | 10.20.0.6 | 66:10:2e:0c:12:4a | compute | | True
1 | ready | Untitled (c0:01) | 1 | 10.20.0.3 | fa:a1:39:94:7f:4c | controller | | True 1 | ready | Untitled (c0:01) | 1 | 10.20.0.3 | fa:a1:39:94:7f:4c | controller | | True
3 | ready | Untitled (c0:03) | 1 | 10.20.0.5 | 82:cb:bb:50:40:47 | controller | | True 3 | ready | Untitled (c0:03) | 1 | 10.20.0.5 | 82:cb:bb:50:40:47 | controller | | True
- Copy id_rsa to the CICs for passwordless ssh authentication: - Copy ``id_rsa`` to the CICs for passwordless ssh authentication:
:: .. code-block:: bash
:linenos:
[root@fuel ~]# scp .ssh/id_rsa node-1:.ssh/id_rsa [root@fuel ~]# scp .ssh/id_rsa node-1:.ssh/id_rsa
Warning: Permanently added 'node-1' (RSA) to the list of known hosts. Warning: Permanently added 'node-1' (RSA) to the list of known hosts.
id_rsa 100% 1675 1.6KB/s 00:00 id_rsa 100% 1675 1.6KB/s 00:00
[root@fuel ~]# scp .ssh/id_rsa node-2:.ssh/id_rsa [root@fuel ~]# scp .ssh/id_rsa node-2:.ssh/id_rsa
Warning: Permanently added 'node-2' (RSA) to the list of known hosts. Warning: Permanently added 'node-2' (RSA) to the list of known hosts.
id_rsa 100% 1675 1.6KB/s 00:00 id_rsa 100% 1675 1.6KB/s 00:00
[root@fuel ~]# scp .ssh/id_rsa node-3:.ssh/id_rsa [root@fuel ~]# scp .ssh/id_rsa node-3:.ssh/id_rsa
Warning: Permanently added 'node-3' (RSA) to the list of known hosts. Warning: Permanently added 'node-3' (RSA) to the list of known hosts.
id_rsa 100% 1675 1.6KB/s 00:00 id_rsa 100% 1675 1.6KB/s 00:00
- Enforce switching into MM mode on all nodes: - Enforce switching into MM mode on all nodes:
:: .. code-block:: bash
:linenos:
[root@fuel ~]# ssh node-1 umm on ssh node-2 umm on ssh node-3 umm on [root@fuel ~]# ssh node-1 umm on ssh node-2 umm on ssh node-3 umm on
Warning: Permanently added 'node-1' (RSA) to the list of known hosts. Warning: Permanently added 'node-1' (RSA) to the list of known hosts.
umm-gr start/running, process 24318 umm-gr start/running, process 24318
Connection to node-1 closed by remote host. Connection to node-1 closed by remote host.
Connection to node-1 closed. Connection to node-1 closed.
[root@fuel ~]# [root@fuel ~]#
[root@fuel ~]# ssh -tt node-1 ssh -tt node-2 ssh -tt node-3 sleep 1 [root@fuel ~]# ssh -tt node-1 ssh -tt node-2 ssh -tt node-3 sleep 1
Warning: Permanently added 'node-1' (RSA) to the list of known hosts. Warning: Permanently added 'node-1' (RSA) to the list of known hosts.
ECDSA key fingerprint is 84:17:0d:ea:27:1f:4e:08:f7:54:b2:8c:fe:8a:13:1a. ECDSA key fingerprint is 84:17:0d:ea:27:1f:4e:08:f7:54:b2:8c:fe:8a:13:1a.
Are you sure you want to continue connecting (yes/no)? yes Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node-2,10.20.0.4' (ECDSA) Warning: Permanently added 'node-2,10.20.0.4' (ECDSA)
to the list of known hosts. established. to the list of known hosts. established.
ECDSA key fingerprint is ECDSA key fingerprint is
c3:c6:ca:7d:11:d3:53:01:15:64:20:f7:c7:44:fb:d1. c3:c6:ca:7d:11:d3:53:01:15:64:20:f7:c7:44:fb:d1.
Are you sure you want to continue connecting (yes/no)? yes Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node-3,192.168.0.6' (ECDSA) Warning: Permanently added 'node-3,192.168.0.6' (ECDSA)
to the list of known hosts. to the list of known hosts.
Connection to node-3 closed. Connection to node-3 closed.
Connection to node-2 closed. Connection to node-2 closed.
Connection to node-1 closed. [root@fuel ~]# Connection to node-1 closed. [root@fuel ~]#
- Wait until the last node reboots: - Wait until the last node reboots:
:: .. code-block:: bash
:linenos:
[root@fuel ~]# ssh node-3 [root@fuel ~]# ssh node-3
Warning: Permanently added 'node-3' (RSA) to the list of known hosts. Warning: Permanently added 'node-3' (RSA) to the list of known hosts.
Welcome to Ubuntu 12.04.4 LTS (GNU/Linux 3.13.0-32-generic x86_64) Welcome to Ubuntu 12.04.4 LTS (GNU/Linux 3.13.0-32-generic x86_64)
* Documentation: https://help.ubuntu.com/ * Documentation: https://help.ubuntu.com/
Last login: Tue Dec 23 05:55:47 2014 from 10.20.0.2 Last login: Tue Dec 23 05:55:47 2014 from 10.20.0.2
root@node-3:~# root@node-3:~#
Broadcast message from root@node-3 Broadcast message from root@node-3
(unknown) at 6:00 ... (unknown) at 6:00 ...
The system is going down for reboot NOW! The system is going down for reboot NOW!
Connection to node-3 closed by remote host. Connection to node-3 closed by remote host.
Connection to node-3 closed. Connection to node-3 closed.
[root@fuel ~]# [root@fuel ~]#
- Perform all the steps, planned for MM. - Perform all the steps planned for MM.
- Enforce a return back into normal mode in reverse state: - Enforce a return back into normal mode in reverse state:
:: .. code-block:: bash
:linenos:
[root@fuel ~]# ssh node-3 umm off [root@fuel ~]# ssh node-3 umm off
Warning: Permanently added 'node-3' (RSA) to the list of known hosts. Warning: Permanently added 'node-3' (RSA) to the list of known hosts.
[root@fuel ~]# ssh node-2 umm off [root@fuel ~]# ssh node-2 umm off
Warning: Permanently added 'node-2' (RSA) to the list of known hosts. Warning: Permanently added 'node-2' (RSA) to the list of known hosts.
[root@fuel ~]# ssh node-1 umm off [root@fuel ~]# ssh node-1 umm off
Warning: Permanently added 'node-1' (RSA) to the list of known hosts. Warning: Permanently added 'node-1' (RSA) to the list of known hosts.
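The reverse-order return shown in the last step can be scripted. The sketch below is a dry run under stated assumptions: the node names come from the example, while the ``reverse_list``, ``run``, and ``leave_mm_all`` helpers and the ``RUN`` variable are hypothetical. By default it only prints the commands it would execute.

```shell
#!/bin/sh
# Controller nodes in the order they were put into maintenance mode.
# Node names are taken from the example; adjust them for your environment.
nodes="node-1 node-2 node-3"

# Reverse a space-separated list (portable, does not need `tac`).
reverse_list() {
    reversed=""
    for item in $1; do
        reversed="$item${reversed:+ $reversed}"
    done
    printf '%s\n' "$reversed"
}

# Dry run by default: print the command instead of executing it.
# Set RUN=1 in the environment to actually invoke it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Return the nodes to normal mode, last one in first.
leave_mm_all() {
    for node in $(reverse_list "$nodes"); do
        run ssh "$node" umm off
    done
}

leave_mm_all
```

Keeping the order in a single list and deriving the reverse from it avoids maintaining two lists that can drift apart.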


@@ -6,21 +6,20 @@ Maintenance Mode
Maintenance mode (MM) is a mode when the operating system on the node Maintenance mode (MM) is a mode when the operating system on the node
has only a critical set of working services that the system needs for has only a critical set of working services that the system needs for
basic network and disk operations. The purpose of the maintenance mode basic network and disk operations. The purpose of MM is to perform a system
is to do a system repair or run other service operations on the system. repair or run other maintenance operations on the system.
The implementation of maintenance mode in 15B is based on the Ubuntu
recovery mode. The system goes into a reboot and goes through the For switching to MM, the system shuts down and then goes through the regular
regular boot process until the system initialization stage (rc-sysinit). boot process until the system initialization stage (rc-sysinit).
This is where the system enters the maintenance mode with the network At that moment the system enters MM, the network and filesystem services
and filesystem services started. In this moment we have already started have already started. During the MM stage, the ``sshd`` and ``tty2``
network and filesystem. In MM stage are started sshd, tty2 and main MM services start, and the main MM service waits for the command to continue the boot flow.
service wait command for boot flow continue.
See the :ref:`mm-ops` section of the Operations guide for the details.
Here is a Cloud Infrastructure Controller boot flow scheme: Here is a Cloud Infrastructure Controller boot flow scheme:
.. image:: /_images/mm_bootflow.png .. image:: /_images/mm_bootflow.png
For more information, see:
- :ref:`mm-ops`.