Initial release

Andres Rodriguez 2012-11-20 15:06:11 -05:00
commit 1c22ba36b4
19 changed files with 650 additions and 0 deletions

14
README Normal file

@@ -0,0 +1,14 @@
Overview
========
Usage
=====
Contact Information
===================
Technical Bootnotes
===================

37
TODO Normal file

@@ -0,0 +1,37 @@
HA Cluster (pacemaker/corosync) Charm
======================================
* Peer-relations
- make sure node was added to the cluster
- make sure node has been removed from the cluster (when deleting unit)
* One thing that can be done is to:
1. ha-relation-joined puts node in standby.
2. ha-relation-joined writes the HA configuration
3. on hanode-relation-joined (2 or more nodes)
- services are stopped from upstart/lsb
- nodes are put in online mode
- services are loaded by cluster
- this way the service is not in HA until we have a second node (see the sketch at the end of this file).
* Needs to communicate the VIP to the top service
* TODO: Fix disabling of upstart jobs:
- sudo sh -c "echo 'manual' > /etc/init/SERVICE.override"
* BIG PROBLEM:
- given that we can only deploy hacluster once, and its config defines
  the corosync configuration options, we need to change the approach to
  how corosync is configured. Possible solution:
- in the 'service/charm' that uses hacluster, define the corosync options
- instead of a network source, it can define the interfaces to use, assume each
  ethX interface is connected to the same network, and autodetect the network address.
* TODO: on juju destroy-service quantum, ha-relation-broken is executed.
  We need to put nodes in standby or delete them.
* ERROR/BUG (discuss with jamespage):
- On add-unit in controller environment:
- subordinate (in added unit) gets the relation data in ha-relation-joined
- On add-unit in openstack
- subordinate (in added unit) *DOESN'T* get the relation data in ha-relation-joined
- This is fine, really, because we don't need to re-add the services.
- However, the problem is that upstart jobs don't get stopped.
update-rc.d -f pacemaker remove
update-rc.d pacemaker start 50 1 2 3 4 5 . stop 01 0 6 .
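
A sketch of the flow proposed above, reusing helpers that do exist in
hooks/utils.py and hooks/pcmk.py; 'my-service' and the hook wiring are
placeholders, not code in this commit:

import utils
import pcmk

def hanode_relation_joined():
    # steps 1-2: the joining node stays in standby while the HA
    # configuration is written, so it cannot yet run resources
    pcmk.commit('crm -F node standby %s' % utils.get_unit_hostname())
    # step 3: once a peer exists (two or more nodes), stop upstart/lsb
    # control of the service and let the cluster take over
    peers = utils.relation_list(utils.relation_ids('hanode')[0])
    if len(peers) >= 1:  # relation-list returns remote units only
        utils.disable_upstart_services('my-service')  # placeholder name
        utils.stop('my-service')
        pcmk.commit('crm -F node online %s' % utils.get_unit_hostname())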

36
config.yaml Normal file

@@ -0,0 +1,36 @@
options:
corosync_bindnetaddr:
type: string
description: |
Network address of the interface on which corosync will communicate
with the other nodes of the cluster.
corosync_mcastaddr:
default: 226.94.1.1
type: string
description: |
Multicast IP address to use for exchanging messages over the network.
If multiple clusters are on the same bindnetaddr network, this value
can be changed.
corosync_mcastport:
default: 5405
type: int
description: |
Multicast Port number to use for exchanging messages. If multiple
clusters sit on the same Multicast IP Address, this value needs to
be changed.
corosync_pcmk_ver:
default: 1
type: int
description: |
      Version of the Pacemaker service entry in corosync.conf. This tells
      Corosync how to start Pacemaker.
corosync_key:
type: string
description: |
This value will become the Corosync authentication key. To generate
a suitable value use:
.
corosync-keygen
.
This configuration element is mandatory and the service will fail on
install if it is not provided.
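
Since corosync-keygen writes a binary key to /etc/corosync/authkey, turning
it into a string for this option takes an extra step. A minimal sketch,
assuming root and an installed corosync; the base64 step is an assumption
about transporting the key (the charm currently writes the configured value
verbatim):

import base64
import subprocess

subprocess.check_call(['corosync-keygen'])  # writes /etc/corosync/authkey; needs root and entropy
with open('/etc/corosync/authkey', 'rb') as f:
    key = base64.b64encode(f.read())
print(key)  # e.g.: juju set hacluster corosync_key=<printed value>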

17
copyright Normal file

@@ -0,0 +1,17 @@
Format: http://dep.debian.net/deps/dep5/
Files: *
Copyright: Copyright 2011, Canonical Ltd., All Rights Reserved.
License: GPL-3
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.

1
hooks/config-changed Symbolic link

@@ -0,0 +1 @@
hooks.py

1
hooks/ha-relation-changed Symbolic link

@@ -0,0 +1 @@
hooks.py

1
hooks/ha-relation-departed Symbolic link

@@ -0,0 +1 @@
hooks.py

1
hooks/ha-relation-joined Symbolic link

@@ -0,0 +1 @@
hooks.py

190
hooks/hooks.py Executable file

@@ -0,0 +1,190 @@
#!/usr/bin/python
#
# Copyright 2012 Canonical Ltd.
#
# Authors:
# Andres Rodriguez <andres.rodriguez@canonical.com>
#
import glob
import os
import subprocess
import shutil
import sys
import time
import utils
import pcmk
def install():
utils.juju_log('INFO', 'Begin install hook.')
utils.configure_source()
utils.install('corosync', 'pacemaker', 'openstack-resource-agents')
utils.juju_log('INFO', 'End install hook.')
def emit_corosync_conf():
# read config variables
corosync_conf_context = {
'corosync_bindnetaddr': utils.config_get('corosync_bindnetaddr'),
'corosync_mcastaddr': utils.config_get('corosync_mcastaddr'),
'corosync_mcastport': utils.config_get('corosync_mcastport'),
'corosync_pcmk_ver': utils.config_get('corosync_pcmk_ver'),
}
# write /etc/default/corosync file
with open('/etc/default/corosync', 'w') as corosync_default:
corosync_default.write(utils.render_template('corosync', corosync_conf_context))
    # write config file (/etc/corosync/corosync.conf)
with open('/etc/corosync/corosync.conf', 'w') as corosync_conf:
corosync_conf.write(utils.render_template('corosync.conf', corosync_conf_context))
    # write the authkey; the config value is the key itself, so write it
    # to the standard path rather than opening the key as a filename
    corosync_key = utils.config_get('corosync_key')
    with open('/etc/corosync/authkey', 'w') as corosync_key_file:
        corosync_key_file.write(corosync_key)
def config_changed():
utils.juju_log('INFO', 'Begin config-changed hook.')
# validate configuration options
corosync_bindnetaddr = utils.config_get('corosync_bindnetaddr')
if corosync_bindnetaddr == '':
utils.juju_log('CRITICAL', 'No bindnetaddr supplied, cannot proceed.')
sys.exit(1)
corosync_key = utils.config_get('corosync_key')
if corosync_key == '':
utils.juju_log('CRITICAL',
'No Corosync key supplied, cannot proceed')
sys.exit(1)
# Create a new config file
emit_corosync_conf()
utils.juju_log('INFO', 'End config-changed hook.')
def upgrade_charm():
utils.juju_log('INFO', 'Begin upgrade-charm hook.')
emit_corosync_conf()
utils.juju_log('INFO', 'End upgrade-charm hook.')
def start():
if utils.running("corosync"):
utils.restart("corosync")
else:
utils.start("corosync")
# TODO: Only start pacemaker after making sure
# corosync has been started
# Wait a few seconds for corosync to start.
time.sleep(2)
if utils.running("pacemaker"):
utils.restart("pacemaker")
else:
utils.start("pacemaker")
def stop():
    # use the helpers from utils; there is no service() function here
    utils.stop("corosync")
    time.sleep(2)
    utils.stop("pacemaker")
def ha_relation():
utils.juju_log('INFO', 'Begin ha relation joined/changed hook')
pcmk.wait_for_pcmk()
cmd = "crm configure property stonith-enabled=false"
pcmk.commit(cmd)
cmd = "crm configure property no-quorum-policy=ignore"
pcmk.commit(cmd)
cmd = 'crm configure rsc_defaults $id="rsc-options" resource-stickiness="100"'
pcmk.commit(cmd)
    # Obtain relation information; each setting is a stringified dict
    import ast

    def _relation_dict(name):
        value = utils.relation_get(name)
        return {} if value is None else ast.literal_eval(value)

    resources = _relation_dict("resources")
    resource_params = _relation_dict("resource_params")
    groups = _relation_dict("groups")
    orders = _relation_dict("orders")
    colocations = _relation_dict("colocations")
    clones = _relation_dict("clones")
    init_services = _relation_dict("init_services")
# Configuring the Resource
for res_name,res_type in resources.iteritems():
# disable the service we are going to put in HA
if res_type.split(':')[0] == "lsb":
utils.disable_lsb_services(res_type.split(':')[1])
if utils.running(res_type.split(':')[1]):
utils.stop(res_type.split(':')[1])
elif len(init_services) != 0 and res_name in init_services and init_services[res_name]:
utils.disable_upstart_services(init_services[res_name])
if utils.running(init_services[res_name]):
utils.stop(init_services[res_name])
# Put the services in HA, if not already done so
if not pcmk.is_resource_present(res_name):
if resource_params[res_name] is None:
cmd = 'crm -F configure primitive %s %s' % (res_name, res_type)
else:
cmd = 'crm -F configure primitive %s %s %s' % (res_name, res_type, resource_params[res_name])
pcmk.commit(cmd)
utils.juju_log('INFO', '%s' % cmd)
# Configuring groups
for grp_name, grp_params in groups.iteritems():
cmd = 'crm -F configure group %s %s' % (grp_name, grp_params)
pcmk.commit(cmd)
utils.juju_log('INFO', '%s' % cmd)
# Configuring ordering
for ord_name, ord_params in orders.iteritems():
cmd = 'crm -F configure order %s %s' % (ord_name, ord_params)
pcmk.commit(cmd)
utils.juju_log('INFO', '%s' % cmd)
# Configuring colocations
for col_name, col_params in colocations.iteritems():
cmd = 'crm -F configure colocation %s %s' % (col_name, col_params)
pcmk.commit(cmd)
utils.juju_log('INFO', '%s' % cmd)
# Configuring clones
for cln_name, cln_params in clones.iteritems():
cmd = 'crm -F configure clone %s %s' % (cln_name, cln_params)
pcmk.commit(cmd)
utils.juju_log('INFO', '%s' % cmd)
utils.juju_log('INFO', 'End ha relation joined/changed hook')
def ha_relation_departed():
    # TODO: Find out which node is departing and put it in standby mode.
# If this happens, and a new relation is created in the same machine
# (which already has node), then check whether it is standby and put it
# in online mode. This should be done in ha_relation_joined.
cmd = "crm -F node standby %s" % utils.get_unit_hostname()
pcmk.commit(cmd)
utils.do_hooks({
'config-changed': config_changed,
'install': install,
'start': start,
'stop': stop,
'upgrade-charm': upgrade_charm,
'ha-relation-joined': ha_relation,
'ha-relation-changed': ha_relation,
'ha-relation-departed': ha_relation_departed,
#'hanode-relation-departed': hanode_relation_departed, # TODO: should probably remove nodes from the cluster
})
sys.exit(0)
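
For reference, ha_relation() above parses every setting with
ast.literal_eval, so a principal charm is expected to publish stringified
Python dicts over the ha relation. A hypothetical example of such data
(resource names and parameters are illustrative, not from this commit):

import utils

utils.relation_set(
    resources="{'res_ks_vip': 'ocf:heartbeat:IPaddr2'}",
    resource_params="{'res_ks_vip': 'params ip=\"192.168.1.100\" nic=\"eth0\"'}",
    groups="{'grp_ks': 'res_ks_vip'}",
    init_services="{'res_ks_vip': 'keystone'}",
)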

1
hooks/install Symbolic link

@@ -0,0 +1 @@
hooks.py

31
hooks/pcmk.py Normal file

@@ -0,0 +1,31 @@
import utils
import commands
import re
import subprocess
import time
#def is_quorum():
#import time
#def is_leader():
def wait_for_pcmk():
    crm_up = None
    hostname_re = re.compile(utils.get_unit_hostname())
    while not crm_up:
        (status, output) = commands.getstatusoutput("crm node list")
        crm_up = hostname_re.search(output)
        if not crm_up:
            time.sleep(1)  # poll rather than busy-wait while pacemaker starts
def commit(cmd):
subprocess.call(cmd.split())
#def wait_for_cluster():
# while (not is_running()):
# time.sleep(3)
def is_resource_present(resource):
    (status, output) = commands.getstatusoutput("crm resource status %s" % resource)
    # crm exits non-zero when the resource does not exist
    return status == 0

1
hooks/start Symbolic link

@@ -0,0 +1 @@
hooks.py

1
hooks/stop Symbolic link

@@ -0,0 +1 @@
hooks.py

1
hooks/upgrade-charm Symbolic link

@@ -0,0 +1 @@
hooks.py

225
hooks/utils.py Normal file

@@ -0,0 +1,225 @@
#
# Copyright 2012 Canonical Ltd.
#
# Authors:
# James Page <james.page@ubuntu.com>
# Paul Collins <paul.collins@canonical.com>
#
import commands
import os
import re
import subprocess
import socket
import sys
def do_hooks(hooks):
hook = os.path.basename(sys.argv[0])
try:
hooks[hook]()
except KeyError:
juju_log('INFO',
"This charm doesn't know how to handle '{}'.".format(hook))
def install(*pkgs):
cmd = [
'apt-get',
'-y',
'install'
]
for pkg in pkgs:
cmd.append(pkg)
subprocess.check_call(cmd)
TEMPLATES_DIR = 'templates'
try:
import jinja2
except ImportError:
install('python-jinja2')
import jinja2
def render_template(template_name, context, template_dir=TEMPLATES_DIR):
templates = jinja2.Environment(
loader=jinja2.FileSystemLoader(template_dir)
)
template = templates.get_template(template_name)
return template.render(context)
def configure_source():
source = config_get('source')
if (source.startswith('ppa:') or
source.startswith('cloud:')):
cmd = [
'add-apt-repository',
source
]
subprocess.check_call(cmd)
if source.startswith('http:'):
with open('/etc/apt/sources.list.d/hacluster.list', 'w') as apt:
apt.write("deb " + source + "\n")
key = config_get('key')
if key != "":
cmd = [
'apt-key',
'import',
key
]
subprocess.check_call(cmd)
cmd = [
'apt-get',
'update'
]
subprocess.check_call(cmd)
# Protocols
TCP = 'TCP'
UDP = 'UDP'
def expose(port, protocol='TCP'):
cmd = [
'open-port',
'{}/{}'.format(port, protocol)
]
subprocess.check_call(cmd)
def juju_log(severity, message):
cmd = [
'juju-log',
'--log-level', severity,
message
]
subprocess.check_call(cmd)
def relation_ids(relation):
cmd = [
'relation-ids',
relation
]
return subprocess.check_output(cmd).split() # IGNORE:E1103
def relation_list(rid):
cmd = [
'relation-list',
'-r', rid,
]
return subprocess.check_output(cmd).split() # IGNORE:E1103
def relation_get(attribute, unit=None, rid=None):
cmd = [
'relation-get',
]
if rid:
cmd.append('-r')
cmd.append(rid)
cmd.append(attribute)
if unit:
cmd.append(unit)
value = subprocess.check_output(cmd).strip() # IGNORE:E1103
if value == "":
return None
else:
return value
def relation_set(**kwargs):
cmd = [
'relation-set'
]
args = []
for k, v in kwargs.items():
if k == 'rid':
cmd.append('-r')
cmd.append(v)
else:
args.append('{}={}'.format(k, v))
cmd += args
subprocess.check_call(cmd)
def unit_get(attribute):
cmd = [
'unit-get',
attribute
]
return subprocess.check_output(cmd).strip() # IGNORE:E1103
def config_get(attribute):
cmd = [
'config-get',
attribute
]
return subprocess.check_output(cmd).strip() # IGNORE:E1103
def get_unit_hostname():
return socket.gethostname()
def get_host_ip(hostname=None):
    # resolve the default at call time, not at module import
    hostname = hostname or unit_get('private-address')
cmd = [
'dig',
'+short',
hostname
]
return subprocess.check_output(cmd).strip() # IGNORE:E1103
def restart(*services):
for service in services:
subprocess.check_call(['service', service, 'restart'])
def stop(*services):
for service in services:
subprocess.check_call(['service', service, 'stop'])
def start(*services):
for service in services:
subprocess.check_call(['service', service, 'start'])
def running(service):
#output = subprocess.check_output(['service', service, 'status'])
output = commands.getoutput('service %s status' % service)
show_re = re.compile("start/running")
status = show_re.search(output)
if status:
return True
return False
def disable_upstart_services(*services):
    for service in services:
        #subprocess.check_call('sh -c "echo manual > /etc/init/%s.override"' % service, shell=True)
        with open("/etc/init/%s.override" % service, "w") as override:
            override.write("manual")
def enable_upstart_services(*services):
for service in services:
path = '/etc/init/%s.override' % service
if os.path.exists(path):
subprocess.check_call(['rm', '-rf', path])
def disable_lsb_services(*services):
for service in services:
subprocess.check_call(['update-rc.d', '-f', service, 'remove'])
def enable_lsb_services(*services):
for service in services:
subprocess.call(['update-rc.d','-f',service,'defaults'])
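
Note how the symlinked hook files fit together with do_hooks() above: juju
executes e.g. hooks/ha-relation-joined, the symlink resolves to hooks.py,
and the basename of argv[0] selects the handler. A small illustration (the
hook name is an example):

import os
import sys

sys.argv = ['hooks/ha-relation-joined']  # what juju effectively executes
print(os.path.basename(sys.argv[0]))     # -> ha-relation-joined, the dict key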

17
metadata.yaml Normal file

@@ -0,0 +1,17 @@
name: hacluster
summary: Corosync Cluster Engine - membership, messaging and quorum
maintainer: Andres Rodriguez <andres.rodriguez@canonical.com>
subordinate: true
description: |
  Corosync/Pacemaker subordinate charm that places the services of a
  principal charm under the control of an HA cluster.
requires:
juju-info:
interface: juju-info
scope: container
provides:
ha:
interface: hacluster
scope: container
peers:
hanode:
interface: hacluster

1
revision Normal file

@@ -0,0 +1 @@
1

3
templates/corosync Normal file

@@ -0,0 +1,3 @@
# Configuration file created by the ha charm
# start corosync at boot [yes|no]
START=yes

71
templates/corosync.conf Normal file

@@ -0,0 +1,71 @@
# Config file generated by the ha charm.
totem {
version: 2
# How long before declaring a token lost (ms)
token: 3000
# How many token retransmits before forming a new configuration
token_retransmits_before_loss_const: 10
# How long to wait for join messages in the membership protocol (ms)
join: 60
# How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
consensus: 3600
# Turn off the virtual synchrony filter
vsftype: none
# Number of messages that may be sent by one processor on receipt of the token
max_messages: 20
# Limit generated nodeids to 31-bits (positive signed integers)
clear_node_high_bit: yes
# Disable encryption
secauth: off
# How many threads to use for encryption/decryption
threads: 0
# Optionally assign a fixed node id (integer)
# nodeid: 1234
# This specifies the mode of redundant ring, which may be none, active, or passive.
rrp_mode: none
interface {
# The following values need to be set based on your environment
ringnumber: 0
bindnetaddr: {{ corosync_bindnetaddr }}
mcastaddr: {{ corosync_mcastaddr }}
mcastport: {{ corosync_mcastport }}
}
}
amf {
mode: disabled
}
service {
# Load the Pacemaker Cluster Resource Manager
ver: {{ corosync_pcmk_ver }}
name: pacemaker
}
logging {
fileline: off
to_stderr: yes
to_logfile: no
to_syslog: yes
syslog_facility: daemon
debug: off
timestamp: on
logger_subsys {
subsys: AMF
debug: off
tags: enter|leave|trace1|trace2|trace3|trace4|trace6
}
}
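
Note that bindnetaddr above is a network address, not a host address. The
TODO file in this commit suggests autodetecting it from an interface; a
minimal sketch of that computation, assuming the unit's IP address and
netmask are already known (the charm does not do this yet):

import socket
import struct

def network_address(ip, netmask):
    # the network address is the bitwise AND of host address and netmask
    ip_n = struct.unpack('!I', socket.inet_aton(ip))[0]
    mask_n = struct.unpack('!I', socket.inet_aton(netmask))[0]
    return socket.inet_ntoa(struct.pack('!I', ip_n & mask_n))

print(network_address('192.168.1.17', '255.255.255.0'))  # -> 192.168.1.0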