.. raw:: pdf

   PageBreak

HowTo Notes
===========

.. index:: HowTo: Create an XFS disk partition

.. _create-the-XFS-partition:

HowTo: Create an XFS disk partition
-----------------------------------

In most cases, Fuel creates the XFS partition for you. If for some reason you
need to create it yourself, use this procedure:

.. note:: Replace ``/dev/sdb`` with the appropriate block device you wish to
   configure.

1. Create the partition itself::

     fdisk /dev/sdb
     n (for new)
     p (for partition)
     <enter> (to accept the defaults)
     <enter> (to accept the defaults)
     w (to save changes)

2. Initialize the XFS partition::

     mkfs.xfs -i size=1024 -f /dev/sdb1

3. For a standard Swift install, all data drives are mounted directly under
   ``/srv/node``, so first create the mount point::

     mkdir -p /srv/node/sdb1

4. Finally, add the new partition to ``/etc/fstab`` so it mounts automatically,
   then mount all current partitions::

     echo "/dev/sdb1 /srv/node/sdb1 xfs noatime,nodiratime,nobarrier,logbufs=8 0 0" >> /etc/fstab
     mount -a
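
As an optional sanity check (a sketch, assuming the same ``/dev/sdb1`` device
and mount point as above), confirm that the partition is mounted and formatted
as XFS::

  df -h /srv/node/sdb1
  xfs_info /srv/node/sdb1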

.. index:: HowTo: Redeploy a node from scratch

.. _Redeploy_node_from_scratch:

HowTo: Redeploy a node from scratch
-----------------------------------

Compute and Cinder nodes can be redeployed in both multinode and multinode HA
configurations. Controllers, however, cannot be redeployed without completely
redeploying the environment. To redeploy a Compute or Cinder node, follow these
steps:

1. Remove the node from your environment in the Fuel UI.
2. Deploy Changes.
3. Wait for the host to become available as an unallocated node.
4. Add the node to the environment with the same role as before.
5. Deploy Changes.
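
If you prefer to watch this process from the Fuel master node rather than the
UI, the node's state can be followed with the Fuel CLI (a sketch, assuming the
CLI is available on the master node; output columns vary between releases)::

  fuel node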

.. _Enable_Disable_Galera_autorebuild:

.. index:: HowTo: Galera Cluster Autorebuild

HowTo: Enable/Disable Galera Cluster Autorebuild Mechanism
----------------------------------------------------------

By default, Fuel reassembles the Galera cluster automatically, without the need
for any user interaction.

To disable the autorebuild feature, run::

  crm_attribute -t crm_config --name mysqlprimaryinit --delete

To re-enable the autorebuild feature, run::

  crm_attribute -t crm_config --name mysqlprimaryinit --update done
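
To check the current value of this flag before changing it, you can query the
same attribute (``--query`` is the standard read option of ``crm_attribute``)::

  crm_attribute -t crm_config --name mysqlprimaryinit --query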

.. index:: HowTo: Troubleshoot Corosync/Pacemaker

How To Troubleshoot Corosync/Pacemaker
--------------------------------------

Pacemaker and Corosync come with several CLI utilities that can help you
troubleshoot and understand what is going on.

crm - Cluster Resource Manager
++++++++++++++++++++++++++++++

This is the main Pacemaker utility; it shows you the state of the Pacemaker
cluster. The most useful commands for checking whether your cluster is
consistent are described below.

**crm status**

This command shows the main information about the Pacemaker cluster and the
state of the resources being managed::

  crm(live)# status
  ============
  Last updated: Tue May 14 15:13:47 2013
  Last change: Mon May 13 18:36:56 2013 via cibadmin on fuel-controller-01
  Stack: openais
  Current DC: fuel-controller-01 - partition with quorum
  Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
  5 Nodes configured, 5 expected votes
  3 Resources configured.
  ============

  Online: [ fuel-controller-01 fuel-controller-02 fuel-controller-03 fuel-controller-04 fuel-controller-05 ]

   p_quantum-plugin-openvswitch-agent (ocf::pacemaker:quantum-agent-ovs): Started fuel-controller-01
   p_quantum-dhcp-agent (ocf::pacemaker:quantum-agent-dhcp): Started fuel-controller-01
   p_quantum-l3-agent (ocf::pacemaker:quantum-agent-l3): Started fuel-controller-01

**crm(live)# resource**

Here you can enter resource-specific commands::

  crm(live)resource# status
   p_quantum-plugin-openvswitch-agent (ocf::pacemaker:quantum-agent-ovs) Started
   p_quantum-dhcp-agent (ocf::pacemaker:quantum-agent-dhcp) Started
   p_quantum-l3-agent (ocf::pacemaker:quantum-agent-l3) Started

**crm(live)resource# start|restart|stop|cleanup <resource_name>**

These commands let you start, restart, stop, and clean up a resource,
respectively.
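
The same actions are also available non-interactively from the shell. For
example, to restart the L3 agent resource shown above::

  crm resource restart p_quantum-l3-agent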

**cleanup**

The ``cleanup`` command resets a resource's state on a node when the resource
is in a failed state or has been affected by an unexpected operation outside
Pacemaker's control, such as a side effect of running its SysVInit script
directly. After the cleanup, Pacemaker again manages the resource itself and
decides which node should run it.

Example::

  3 Nodes configured, 3 expected votes
  3 Resources configured.
  ============

  3 Nodes configured, 3 expected votes
  16 Resources configured.

  Online: [ controller-01 controller-02 controller-03 ]

   vip__management_old (ocf::heartbeat:IPaddr2): Started controller-01
   vip__public_old (ocf::heartbeat:IPaddr2): Started controller-02
   Clone Set: clone_p_haproxy [p_haproxy]
       Started: [ controller-01 controller-02 controller-03 ]
   Clone Set: clone_p_mysql [p_mysql]
       Started: [ controller-01 controller-02 controller-03 ]
   Clone Set: clone_p_quantum-openvswitch-agent [p_quantum-openvswitch-agent]
       Started: [ controller-01 controller-02 controller-03 ]
   Clone Set: clone_p_quantum-metadata-agent [p_quantum-metadata-agent]
       Started: [ controller-01 controller-02 controller-03 ]
   p_quantum-dhcp-agent (ocf::mirantis:quantum-agent-dhcp): Started controller-01
   p_quantum-l3-agent (ocf::mirantis:quantum-agent-l3): Started controller-03

In this case there were residual OpenStack agent processes that had been
started by Pacemaker during a network failure and cluster partitioning. After
connectivity was restored, Pacemaker saw these duplicate resources running on
different nodes. You can let it clean up this situation automatically or, if
you do not want to wait, clean them up manually.
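
For example, to manually clean up the duplicate DHCP agent resource from the
listing above (substitute the resource name from your own cluster)::

  crm resource cleanup p_quantum-dhcp-agent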

.. seealso::

   The crm interactive help and documentation resources for Pacemaker
   (e.g. http://doc.opensuse.org/products/draft/SLE-HA/SLE-ha-guide_sd_draft/cha.ha.manual_config.html).

In some network scenarios the cluster can get split into several parts, with
``crm status`` on each part showing something like this::

  On ctrl1
  ============
  ….
  Online: [ ctrl1 ]

  On ctrl2
  ============
  ….
  Online: [ ctrl2 ]

  On ctrl3
  ============
  ….
  Online: [ ctrl3 ]

You can troubleshoot this by checking Corosync connectivity between the nodes.
There are several things to verify:

1) Multicast should be enabled in the network, and the IP address configured
   as the multicast address should not be filtered. The mcast port, a single
   UDP port, should be accepted on the management network among all
   controllers.
2) Corosync should start after the network interfaces are activated.
3) ``bindnetaddr`` should be located in the management network, or at least in
   the same multicast-reachable segment.

You can check multicast group membership in the output of ``ip maddr show``:

.. code-block:: none
   :emphasize-lines: 1,8

   5: br-mgmt
       link 33:33:00:00:00:01
       link 01:00:5e:00:00:01
       link 33:33:ff:a3:e2:57
       link 01:00:5e:01:01:02
       link 01:00:5e:00:00:12
       inet 224.0.0.18
       inet 239.1.1.2
       inet 224.0.0.1
       inet6 ff02::1:ffa3:e257
       inet6 ff02::1
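
To cross-check point 3, compare ``bindnetaddr`` in the Corosync configuration
with the address assigned to the management interface (a sketch, assuming the
default configuration path and the ``br-mgmt`` bridge shown above)::

  grep bindnetaddr /etc/corosync/corosync.conf
  ip addr show br-mgmt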

**corosync-objctl**

This command is used to get and set runtime Corosync configuration values,
including the status of the Corosync redundant ring members::

  runtime.totem.pg.mrp.srp.members.134245130.ip=r(0) ip(10.107.0.8)
  runtime.totem.pg.mrp.srp.members.134245130.join_count=1
  ...
  runtime.totem.pg.mrp.srp.members.201353994.ip=r(0) ip(10.107.0.12)
  runtime.totem.pg.mrp.srp.members.201353994.join_count=1
  runtime.totem.pg.mrp.srp.members.201353994.status=joined

If the IP of a node is 127.0.0.1, it means that Corosync started when only the
loopback interface was available and bound to it.

If there is only one IP in the members list, there is a Corosync connectivity
issue, because the node does not see the other ones. The same applies when the
members list is incomplete.
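
For example, you can list just the member entries shown above by filtering the
full object dump (a sketch; the exact key prefix can differ between Corosync
versions)::

  corosync-objctl | grep members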

.. index:: HowTo: Smoke Test HA

How To Smoke Test HA
--------------------

To test that Neutron HA is working, simply shut down the node hosting the
Neutron agents, either gracefully or hard. You should see the agents start on
another node::

  # crm status

  Online: [ fuel-controller-02 fuel-controller-03 fuel-controller-04 fuel-controller-05 ]
  OFFLINE: [ fuel-controller-01 ]

   p_quantum-plugin-openvswitch-agent (ocf::pacemaker:quantum-agent-ovs): Started fuel-controller-02
   p_quantum-dhcp-agent (ocf::pacemaker:quantum-agent-dhcp): Started fuel-controller-02
   p_quantum-l3-agent (ocf::pacemaker:quantum-agent-l3): Started fuel-controller-02

and the corresponding Neutron interfaces appear on the new Neutron node::

  # ip link show

  11: tap7b4ded0e-cb: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
  12: qr-829736b7-34: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
  13: qg-814b8c84-8f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc

You can also check the ``ovs-vsctl show`` output to see that all corresponding
tunnels, bridges, and interfaces are created and connected properly::

  ce754a73-a1c4-4099-b51b-8b839f10291c
      Bridge br-mgmt
          Port br-mgmt
              Interface br-mgmt
                  type: internal
          Port "eth1"
              Interface "eth1"
      Bridge br-ex
          Port br-ex
              Interface br-ex
                  type: internal
          Port "eth0"
              Interface "eth0"
          Port "qg-814b8c84-8f"
              Interface "qg-814b8c84-8f"
                  type: internal
      Bridge br-int
          Port patch-tun
              Interface patch-tun
                  type: patch
                  options: {peer=patch-int}
          Port br-int
              Interface br-int
                  type: internal
          Port "tap7b4ded0e-cb"
              tag: 1
              Interface "tap7b4ded0e-cb"
                  type: internal
          Port "qr-829736b7-34"
              tag: 1
              Interface "qr-829736b7-34"
                  type: internal
      Bridge br-tun
          Port "gre-1"
              Interface "gre-1"
                  type: gre
                  options: {in_key=flow, out_key=flow, remote_ip="10.107.0.8"}
          Port "gre-2"
              Interface "gre-2"
                  type: gre
                  options: {in_key=flow, out_key=flow, remote_ip="10.107.0.5"}
          Port patch-int
              Interface patch-int
                  type: patch
                  options: {peer=patch-tun}
          Port "gre-3"
              Interface "gre-3"
                  type: gre
                  options: {in_key=flow, out_key=flow, remote_ip="10.107.0.6"}
          Port "gre-4"
              Interface "gre-4"
                  type: gre
                  options: {in_key=flow, out_key=flow, remote_ip="10.107.0.7"}
          Port br-tun
              Interface br-tun
                  type: internal
      ovs_version: "1.4.0+build0"
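
As a final quick check, you can verify on each controller that a GRE tunnel
exists to every other node by filtering for the remote endpoints (the addresses
shown should be the mesh IPs of your other nodes)::

  ovs-vsctl show | grep remote_ip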