Add autoscaling with self-healing scenario

A scenario for self-healing and auto-scaling with Heat, Mistral,
Zaqar, and Aodh.

Change-Id: I652c3b0e0caa433bfd08c0cb35b21507f0897889
ricolin 2018-09-13 17:49:49 -06:00 committed by Adam Spiers
parent 6161217a88
commit bf603c75cc
16 changed files with 1669 additions and 0 deletions

@@ -9,3 +9,4 @@ a starting point.
   :maxdepth: 1
   use-cases/nic-failure-affects-instance-and-app.rst
   use-cases/heat-mistral-aodh.rst

@@ -0,0 +1,127 @@
..
   This work is licensed under a Creative Commons Attribution 3.0 Unported License.
   http://creativecommons.org/licenses/by/3.0/legalcode

=============================================================
Self-healing and auto-scaling with Heat, Mistral, Zaqar, Aodh
=============================================================

This use case describes an existing scenario for auto-scaling a
self-healing cluster.

OpenStack projects used
=======================

* Heat
* Mistral
* Zaqar
* Telemetry (Aodh, Panko, Gnocchi)

Remediation class
=================

Reactive

Fault detection
===============

The self-healing scenario detects failures such as any instance in
the cluster being deleted or updated outside of this automatic flow.
In the sample templates, this is detected via the Aodh events
``compute.instance.update`` (with state ``stopped`` or ``error``) and
``compute.instance.delete.*``.

Inputs and decision-making
==========================

* A cluster builder (Heat):

  You can use Heat to build the cluster, manage resource dependencies,
  keep some flexibility through batch actions, and reproduce the
  cluster in another OpenStack environment from the same template.
  Whatever you use to build the cluster, make sure it allows your flow
  control service (Mistral) to trigger updates and replace unhealthy
  nodes in the cluster.

* A monitoring service (Aodh):

  There are multiple choices of self-healing monitor to watch the
  cluster and notify the queue service; you can absolutely replace the
  monitoring service in this showcase with your own existing services.
  Just make sure you are able to monitor the nodes in the cluster and
  to send a notification out when an alarm triggers.

* A queue service (Zaqar):

  A queue, or any service that accepts notifications from the alarm
  service and is capable of triggering the healing flow.

* A triggerable flow control service (Mistral):

  A triggerable flow control service to run the healing flow. Make
  sure your flow covers as many success and failure cases as you can,
  so you can be confident that the cluster is actually healed.

We're using Heat, Aodh, Zaqar, and Mistral in this example. You can
always replace any of these services with your own. Just make sure
they're fully integrated with each other for your healing scenario.
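
The glue between the queue and the flow control service can be
expressed directly in a Heat template. Below is a trimmed sketch of
the pattern used in the sample files attached to this change: alarms
post into a Zaqar queue, and a subscription runs a Mistral workflow
for every message.

.. code-block:: yaml

   resources:
     alarm_queue:
       type: OS::Zaqar::Queue

     alarm_subscription:
       type: OS::Zaqar::MistralTrigger
       properties:
         queue_name: {get_resource: alarm_queue}
         workflow_id: {get_resource: autoheal}
         input:
           stack_id: {get_param: "OS::stack_id"}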

Remediation
===========

In our example, for each unit in the cluster, we set up a Nova
instance to run the application on, and bind that instance to a load
balancer so the application can use LBaaS in its infrastructure.

For healing, we create multiple event alarms that monitor the status
of that instance, a Zaqar queue for the alarms to send notifications
to, and a Zaqar MistralTrigger (a Heat resource type that uses a
Zaqar subscription to trigger Mistral workflows) to listen on that
queue and trigger a Mistral workflow that marks the unit as unhealthy
and then uses a Heat stack update to replace the unhealthy unit in
the cluster.

At this point you already have the structure for self-healing. As you
can see in the example, we are monitoring events like
``compute.instance.delete.*``. If you also use Aodh, please make sure
the events you're monitoring are enabled in the source services and
in Ceilometer (be aware that some events are disabled by default).
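
For instance, the sample ``lb_server.yaml`` in this change defines an
alarm that fires when the instance is updated to the ``stopped``
state, posting into the Zaqar queue:

.. code-block:: yaml

   stop_event_alarm:
     type: OS::Aodh::EventAlarm
     properties:
       event_type: compute.instance.update
       query:
         - field: traits.instance_id
           value: {get_resource: server}
           op: eq
         - field: traits.state
           value: stopped
           op: eq
       alarm_queues:
         - {get_resource: alarm_queue}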

The event alarm sends a ``POST`` request to the Zaqar queue when the
instance status changes to deleted. Again, it's possible to use your
own alarm services here.

Once the Zaqar queue receives that information, it triggers a Mistral
workflow. That workflow marks the unit (in this case a server running
nginx) as unhealthy in Heat and triggers a stack update. Heat then
replaces all unhealthy units with regenerated ones in your cluster.
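
The corresponding workflow in the sample template uses the
``heat.resources_mark_unhealthy`` and ``heat.stacks_update`` Mistral
actions, extracting the instance ID from the alarm notification:

.. code-block:: yaml

   autoheal:
     type: OS::Mistral::Workflow
     properties:
       type: direct
       input:
         stack_id:
         root_stack_id:
       tasks:
         - name: resources_mark_unhealthy
           action:
             list_join:
               - ' '
               - - heat.resources_mark_unhealthy
                 - stack_id=<% $.stack_id %>
                 - resource_name=<% env().notification.body.reason_data.event.traits.where($[0] = 'instance_id').select($[2]).first() %>
                 - mark_unhealthy=true
                 - resource_status_reason='Marked by alarm'
           on_success:
             - stacks_update
         - name: stacks_update
           action: heat.stacks_update stack_id=<% $.root_stack_id %> existing=true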

Existing implementation(s)
==========================

With the ability to self-heal, you can (as we show in this case)
combine it with the ability to auto-scale. Also, as you may notice,
we use the Heat container agent, so your image only needs to be able
to run containers (instead of pre-installing services with
diskimage-builder). The example Heat templates are in the :doc:`sample
files <heat-mistral-aodh>`; see also the related `slides (with video
link)
<https://www.slideshare.net/GuanYuLin1/autoscale-a-selfhealing-cluster-in-openstack-with-heat>`_.
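
To try the samples out, launch the top-level template together with
an environment file like the one included in this change (the file
names below are placeholders for wherever you saved the samples):

.. code-block:: console

   $ openstack stack create -t autoscaling.yaml -e env.yaml my-cluster
   $ openstack stack output show my-cluster website_url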

Future work
===========

We need better support for scaling when some resources require
trusts. Further work is tracked in `the story "Improve Autoscaling
integration with Self-healing"
<https://storyboard.openstack.org/#!/story/2003690>`_. Integration
with Vitrage is also in progress, tracked in `the story "Achieve Self
healing with Vitrage Mistral And Heat"
<https://storyboard.openstack.org/#!/story/2002684>`_.

@@ -0,0 +1,38 @@
FROM registry.fedoraproject.org/fedora:26
MAINTAINER "Rico Lin" <rico.lin.guanyu@gmail.com>
ENV container docker
RUN dnf -y --setopt=tsflags=nodocs install \
redhat-rpm-config gcc python2-devel python3-devel docker findutils os-collect-config os-apply-config \
os-refresh-config dib-utils python-pip python-docker-py \
python-yaml python-zaqarclient && \
dnf clean all
# pip installing dpath as python-dpath is an older version of dpath
# install docker-compose
RUN pip install --no-cache dpath docker-compose python-zaqarclient oslo.log psutil
ADD ./scripts/55-heat-config \
/opt/stack/os-config-refresh/configure.d/
ADD ./scripts/50-heat-config-docker-compose \
/opt/stack/os-config-refresh/configure.d/
ADD ./scripts/hooks/* \
/var/lib/heat-config/hooks/
ADD ./scripts/heat-config-notify \
/usr/bin/heat-config-notify
ADD ./scripts/configure_container_agent.sh /tmp/
RUN chmod 700 /tmp/configure_container_agent.sh
RUN /tmp/configure_container_agent.sh
#create volumes to share the host directories
VOLUME [ "/var/lib/cloud"]
VOLUME [ "/var/lib/heat-cfntools" ]
#set DOCKER_HOST environment variable that docker-compose would use
ENV DOCKER_HOST unix:///var/run/docker.sock
CMD /usr/bin/os-collect-config
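# Example build-and-push for this image, assuming the local registry at
# 192.168.1.101:5000 referenced by start-container-agent.sh later in this
# change:
#   docker build -t 192.168.1.101:5000/heat-container-agent-with-docker .
#   docker push 192.168.1.101:5000/heat-container-agent-with-docker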

@@ -0,0 +1,218 @@
heat_template_version: rocky
description: AutoScaling a selfhealing cluster
parameters:
  image:
    type: string
    description: Image used for servers
  key:
    type: string
    description: SSH key to connect to the servers
  flavor:
    type: string
    description: flavor used by the web servers
  security_group:
    type: string
    description: security_group used by the web servers
  network:
    type: string
    description: Network used by the server
  subnet:
    type: string
    description: subnet on which the load balancer will be located
  external_network:
    type: string
    description: UUID or Name of a Neutron external network
resources:
  start_container_agent:
    type: OS::Heat::SoftwareConfig
    properties:
      group: ungrouped
      config: {get_file: ./start-container-agent.sh}
  asg:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 2
      desired_capacity: 2
      max_size: 6
      rolling_updates:
        min_in_service: 1
        max_batch_size: 1
        pause_time: 10
      resource:
        type: lb_server.yaml
        properties:
          root_stack_id: {get_param: "OS::stack_id"}
          flavor: {get_param: flavor}
          external_network: {get_param: external_network}
          security_group: {get_param: security_group}
          image: {get_param: image}
          key_name: {get_param: key}
          network: {get_param: network}
          subnet: {get_param: subnet}
          pool_id: {get_resource: pool}
          metadata: {"metering.server_group": {get_param: "OS::stack_id"}}
          user_data: {get_attr: [start_container_agent, config]}
  web_server_scaleup_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: asg}
      cooldown: 60
      scaling_adjustment: 1
  web_server_scaledown_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: asg}
      cooldown: 60
      scaling_adjustment: -1
  scaleup_policy_percent:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: percent_change_in_capacity
      auto_scaling_group_id: {get_resource: asg}
      cooldown: 5
      scaling_adjustment: 50
      min_adjustment_step: 1
  scaledown_policy_percent:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: percent_change_in_capacity
      auto_scaling_group_id: {get_resource: asg}
      cooldown: 5
      scaling_adjustment: -50
      min_adjustment_step: 1
  cpu_alarm_high:
    type: OS::Aodh::GnocchiAggregationByResourcesAlarm
    properties:
      description: Scale up if CPU > 80%
      metric: cpu_util
      aggregation_method: mean
      granularity: 600
      evaluation_periods: 1
      threshold: 80
      resource_type: instance
      comparison_operator: gt
      alarm_actions:
        - str_replace:
            template: trust+url
            params:
              url: {get_attr: [web_server_scaleup_policy, signal_url]}
      query:
        list_join:
          - ''
          - - {'=': {server_group: {get_param: "OS::stack_id"}}}
  cpu_alarm_low:
    type: OS::Aodh::GnocchiAggregationByResourcesAlarm
    properties:
      description: Scale down if CPU < 15% for 10 minutes
      metric: cpu_util
      aggregation_method: mean
      granularity: 600
      evaluation_periods: 1
      threshold: 15
      resource_type: instance
      comparison_operator: lt
      alarm_actions:
        - str_replace:
            template: trust+url
            params:
              url: {get_attr: [web_server_scaledown_policy, signal_url]}
      query:
        list_join:
          - ''
          - - {'=': {server_group: {get_param: "OS::stack_id"}}}
  lb:
    type: OS::Octavia::LoadBalancer
    #type: OS::Neutron::LBaaS::LoadBalancer
    properties:
      vip_subnet: {get_param: subnet}
  listener:
    type: OS::Octavia::Listener
    #type: OS::Neutron::LBaaS::Listener
    properties:
      loadbalancer: {get_resource: lb}
      protocol: HTTP
      protocol_port: 80
  pool:
    type: OS::Octavia::Pool
    #type: OS::Neutron::LBaaS::Pool
    properties:
      listener: {get_resource: listener}
      lb_algorithm: ROUND_ROBIN
      protocol: HTTP
      #session_persistence:
      #  type: SOURCE_IP
  lb_monitor:
    #type: OS::Neutron::LBaaS::HealthMonitor
    type: OS::Octavia::HealthMonitor
    properties:
      pool: { get_resource: pool }
      type: TCP
      delay: 5
      max_retries: 5
      timeout: 5
  # assign a floating ip address to the load balancer pool.
  lb_floating:
    type: OS::Neutron::FloatingIP
    properties:
      floating_network: {get_param: external_network}
      port_id: {get_attr: [lb, vip_port_id]}
outputs:
  scale_up_url:
    description: >
      This URL is the webhook to scale up the autoscaling group. You
      can invoke the scale-up operation by doing an HTTP POST to this
      URL; no body nor extra headers are needed.
    value: {get_attr: [web_server_scaleup_policy, alarm_url]}
  scale_dn_url:
    description: >
      This URL is the webhook to scale down the autoscaling group.
      You can invoke the scale-down operation by doing an HTTP POST to
      this URL; no body nor extra headers are needed.
    value: {get_attr: [web_server_scaledown_policy, alarm_url]}
  scale_up_percent_url:
    value: {get_attr: [scaleup_policy_percent, signal_url]}
  scale_down_percent_url:
    value: {get_attr: [scaledown_policy_percent, signal_url]}
  pool_ip_address:
    value: {get_attr: [lb, vip_address]}
    description: The IP address of the load balancing pool
  website_url:
    value:
      str_replace:
        template: http://host/rico.html
        params:
          host: { get_attr: [lb_floating, floating_ip_address] }
    description: >
      This URL is the "external" URL that can be used to access the
      sample web page served by the cluster.
  gnocchi_query:
    value:
      str_replace:
        template: >
          gnocchi measures aggregation --resource-type instance
          --query 'server_group="stackval"'
          --granularity 300 --aggregation mean -m cpu_util
        params:
          stackval: { get_param: "OS::stack_id" }
    description: >
      This is a Gnocchi query for statistics on the cpu_util
      measurements of the OS::Nova::Server instances in this stack.
      The --resource-type option selects the type of Gnocchi resource.
      The --query parameter filters resources according to their
      attributes. When a VM's metadata includes an item of the form
      metering.server_group=X, the corresponding Gnocchi resource has
      an attribute named server_group that can be queried with
      'server_group="X"'. In this case the nested stacks give their
      VMs metadata that is passed as a nested stack parameter, and
      this stack passes a metadata of the form
      metering.server_group=X, where X is this stack's ID.
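
To exercise the scaling webhooks by hand, you can HTTP POST to the URLs
exposed as stack outputs, as their descriptions above note; a sketch
(the stack name is a placeholder):

    openstack stack output show my-cluster scale_up_url -f value -c output_value
    curl -X POST "<the signal URL printed above>"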

@@ -0,0 +1,101 @@
#!/bin/bash
set -eux
# os-apply-config templates directory
oac_templates=/usr/libexec/os-apply-config/templates
mkdir -p $oac_templates/etc
# initial /etc/os-collect-config.conf
cat <<EOF >/etc/os-collect-config.conf
[DEFAULT]
command = os-refresh-config
EOF
# template for building os-collect-config.conf for polling heat
cat <<EOF >$oac_templates/etc/os-collect-config.conf
[DEFAULT]
{{^os-collect-config.command}}
command = os-refresh-config
{{/os-collect-config.command}}
{{#os-collect-config}}
{{#command}}
command = {{command}}
{{/command}}
{{#polling_interval}}
polling_interval = {{polling_interval}}
{{/polling_interval}}
{{#cachedir}}
cachedir = {{cachedir}}
{{/cachedir}}
{{#collectors}}
collectors = {{.}}
{{/collectors}}
{{#cfn}}
[cfn]
{{#metadata_url}}
metadata_url = {{metadata_url}}
{{/metadata_url}}
stack_name = {{stack_name}}
secret_access_key = {{secret_access_key}}
access_key_id = {{access_key_id}}
path = {{path}}
{{/cfn}}
{{#heat}}
[heat]
auth_url = {{auth_url}}
user_id = {{user_id}}
password = {{password}}
project_id = {{project_id}}
stack_id = {{stack_id}}
resource_name = {{resource_name}}
{{/heat}}
{{#zaqar}}
[zaqar]
auth_url = {{auth_url}}
user_id = {{user_id}}
password = {{password}}
project_id = {{project_id}}
queue_id = {{queue_id}}
{{/zaqar}}
{{#request}}
[request]
{{#metadata_url}}
metadata_url = {{metadata_url}}
{{/metadata_url}}
{{/request}}
{{/os-collect-config}}
EOF
mkdir -p $oac_templates/var/run/heat-config
# template for writing heat deployments data to a file
echo "{{deployments}}" > $oac_templates/var/run/heat-config/heat-config
# os-refresh-config scripts directory
# This moves to /usr/libexec/os-refresh-config in later releases
orc_scripts=/opt/stack/os-config-refresh
for d in pre-configure.d configure.d migration.d post-configure.d; do
install -m 0755 -o root -g root -d $orc_scripts/$d
done
# os-refresh-config script for running os-apply-config
cat <<EOF >$orc_scripts/configure.d/20-os-apply-config
#!/bin/bash
set -ue
exec os-apply-config
EOF
chmod 700 $orc_scripts/configure.d/20-os-apply-config
chmod 700 /opt/stack/os-config-refresh/configure.d/55-heat-config
chmod 700 /opt/stack/os-config-refresh/configure.d/50-heat-config-docker-compose
chmod 755 /var/lib/heat-config/hooks/atomic
chmod 755 /var/lib/heat-config/hooks/docker-compose
chmod 755 /var/lib/heat-config/hooks/script
chmod 755 /usr/bin/heat-config-notify

@@ -0,0 +1,8 @@
parameters:
  key: key
  flavor: flavor1
  image: atomic-img
  network: network
  external_network: public
  subnet: subnet
  security_group: sg_all_open

@@ -0,0 +1,199 @@
heat_template_version: rocky
description: A load-balancer server
parameters:
  image:
    type: string
    description: Image used for servers
  key_name:
    type: string
    description: SSH key to connect to the servers
  flavor:
    type: string
    description: flavor used by the servers
  security_group:
    type: string
    description: security_group used by the web servers
  pool_id:
    type: string
    description: Pool to contact
  user_data:
    type: string
    description: Server user_data
  metadata:
    type: json
  network:
    type: string
    description: Network used by the server
  subnet:
    type: string
    description: Subnet used by the server
  external_network:
    type: string
    description: UUID or Name of a Neutron external network
  root_stack_id:
    type: string
    default: ""
conditions:
  is_standalone: {equals: [{get_param: root_stack_id}, ""]}
resources:
  config:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      inputs:
        - name: host
        - name: version
      outputs:
        - name: result
      config:
        get_file: nginx-script.sh
  deployment:
    type: OS::Heat::SoftwareDeployment
    properties:
      config:
        get_resource: config
      server:
        get_resource: server
      input_values:
        host: { get_attr: [server, first_address] }
        version: "v1.0.0"
  server:
    type: OS::Nova::Server
    properties:
      flavor: {get_param: flavor}
      security_groups: [{get_param: security_group}]
      image: {get_param: image}
      key_name: {get_param: key_name}
      metadata: {get_param: metadata}
      user_data: {get_param: user_data}
      user_data_format: SOFTWARE_CONFIG
      networks:
        - {network: {get_param: network}}
  member:
    #type: OS::Neutron::LBaaS::PoolMember
    type: OS::Octavia::PoolMember
    properties:
      pool: {get_param: pool_id}
      address: {get_attr: [server, first_address]}
      protocol_port: 80
      subnet: {get_param: subnet}
  server_floating_ip_assoc:
    type: OS::Neutron::FloatingIPAssociation
    properties:
      floatingip_id: {get_resource: floating_ip}
      port_id: {get_attr: [server, addresses, {get_param: network}, 0, port]}
  floating_ip:
    type: OS::Neutron::FloatingIP
    properties:
      floating_network: {get_param: external_network}
  alarm_queue:
    type: OS::Zaqar::Queue
  stop_event_alarm:
    type: OS::Aodh::EventAlarm
    properties:
      event_type: compute.instance.update
      query:
        - field: traits.instance_id
          value: {get_resource: server}
          op: eq
        - field: traits.state
          value: stopped
          op: eq
      alarm_queues:
        - {get_resource: alarm_queue}
  error_event_alarm:
    type: OS::Aodh::EventAlarm
    properties:
      event_type: compute.instance.update
      query:
        - field: traits.instance_id
          value: {get_resource: server}
          op: eq
        - field: traits.state
          value: error
          op: eq
      alarm_queues:
        - {get_resource: alarm_queue}
  deleted_event_alarm:
    type: OS::Aodh::EventAlarm
    properties:
      event_type: compute.instance.delete.*
      query:
        - field: traits.instance_id
          value: {get_resource: server}
          op: eq
      alarm_queues:
        - {get_resource: alarm_queue}
  # The Aodh event alarm does not take effect immediately; it may take up to
  # 60s (by default) for the event_alarm_cache_ttl to expire and the tenant's
  # alarm data to be loaded. This resource ensures the stack is not completed
  # until the alarm is active. See https://bugs.launchpad.net/aodh/+bug/1651273
  alarm_cache_wait:
    type: OS::Heat::TestResource
    properties:
      action_wait_secs:
        create: 60
        update: 60
      value:
        list_join:
          - ''
          - - {get_attr: [stop_event_alarm, show]}
            - {get_attr: [error_event_alarm, show]}
            - {get_attr: [deleted_event_alarm, show]}
  alarm_subscription:
    type: OS::Zaqar::MistralTrigger
    properties:
      queue_name: {get_resource: alarm_queue}
      workflow_id: {get_resource: autoheal}
      input:
        stack_id: {get_param: "OS::stack_id"}
        root_stack_id:
          if:
            - is_standalone
            - {get_param: "OS::stack_id"}
            - {get_param: "root_stack_id"}
  autoheal:
    type: OS::Mistral::Workflow
    properties:
      description: >
        Mark a server as unhealthy and commence a stack update to replace it.
      input:
        stack_id:
        root_stack_id:
      type: direct
      tasks:
        - name: resources_mark_unhealthy
          action:
            list_join:
              - ' '
              - - heat.resources_mark_unhealthy
                - stack_id=<% $.stack_id %>
                - resource_name=<% env().notification.body.reason_data.event.traits.where($[0] = 'instance_id').select($[2]).first() %>
                - mark_unhealthy=true
                - resource_status_reason='Marked by alarm'
          on_success:
            - stacks_update
        - name: stacks_update
          action: heat.stacks_update stack_id=<% $.root_stack_id %> existing=true
outputs:
  server_ip:
    description: IP Address of the load-balanced server.
    value: { get_attr: [server, first_address] }
  lb_member:
    description: LB member details.
    value: { get_attr: [member, show] }

@@ -0,0 +1,11 @@
#!/bin/sh -x
# use `docker pull nginx` if you are not using a local registry
docker pull 192.168.1.101:5000/nginx
docker run -d --name nginx-container -p 80:80 192.168.1.101:5000/nginx
echo "Rico Lin said hello from Vietnam InfraDay to $host::$version" > rico.html
docker cp $PWD/rico.html nginx-container:/usr/share/nginx/html/
cat rico.html > $heat_outputs_path.result
echo "Output to stderr" 1>&2

@@ -0,0 +1,116 @@
#!/usr/bin/env python
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

import json
import logging
import os
import subprocess
import sys
import yaml

CONF_FILE = os.environ.get('HEAT_SHELL_CONFIG',
                           '/var/run/heat-config/heat-config')

DOCKER_COMPOSE_DIR = os.environ.get(
    'HEAT_DOCKER_COMPOSE_WORKING',
    '/var/lib/heat-config/heat-config-docker-compose')

DOCKER_COMPOSE_CMD = os.environ.get('HEAT_DOCKER_COMPOSE_CMD',
                                    'docker-compose')


def main(argv=sys.argv):
    log = logging.getLogger('heat-config')
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter(
            '[%(asctime)s] (%(name)s) [%(levelname)s] %(message)s'))
    log.addHandler(handler)
    log.setLevel('DEBUG')

    if not os.path.exists(CONF_FILE):
        log.error('No config file %s' % CONF_FILE)
        return 1

    if not os.path.isdir(DOCKER_COMPOSE_DIR):
        os.makedirs(DOCKER_COMPOSE_DIR, 0o700)

    try:
        configs = json.load(open(CONF_FILE))
    except ValueError:
        pass

    try:
        cleanup_stale_projects(configs)
        for c in configs:
            write_compose_config(c)
    except Exception as e:
        log.exception(e)


def cleanup_stale_projects(configs):
    def deployments(configs):
        for c in configs:
            yield c['name']

    def compose_projects(compose_dir):
        for proj in os.listdir(compose_dir):
            if os.path.isfile(
                    os.path.join(DOCKER_COMPOSE_DIR,
                                 '%s/docker-compose.yml' % proj)):
                yield proj

    def cleanup_containers(project):
        cmd = [
            DOCKER_COMPOSE_CMD,
            'kill'
        ]
        subproc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE)
        stdout, stderr = subproc.communicate()

    for proj in compose_projects(DOCKER_COMPOSE_DIR):
        if proj not in deployments(configs):
            proj_dir = os.path.join(DOCKER_COMPOSE_DIR, proj)
            os.chdir(proj_dir)
            cleanup_containers(proj)
            os.remove('%s/docker-compose.yml' % proj_dir)


def write_compose_config(c):
    group = c.get('group')
    if group != 'docker-compose':
        return

    def prepare_dir(path):
        if not os.path.isdir(path):
            os.makedirs(path, 0o700)

    compose_conf = c.get('config', '')
    if isinstance(compose_conf, dict):
        yaml_config = yaml.safe_dump(compose_conf, default_flow_style=False)
    else:
        yaml_config = compose_conf
    proj_dir = os.path.join(DOCKER_COMPOSE_DIR, c['name'])
    prepare_dir(proj_dir)
    fn = os.path.join(proj_dir, 'docker-compose.yml')
    with os.fdopen(os.open(fn, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o600),
                   'w') as f:
        f.write(yaml_config.encode('utf-8'))


if __name__ == '__main__':
    sys.exit(main(sys.argv))
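
For reference, this hook consumes the heat-config metadata file written
by the agent to /var/run/heat-config/heat-config; a minimal,
illustrative entry (names and values hypothetical) might look like:

    [
      {
        "name": "web_app",
        "group": "docker-compose",
        "inputs": [{"name": "deploy_action", "value": "CREATE"}],
        "config": {"web": {"image": "nginx", "ports": ["80:80"]}}
      }
    ]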

@@ -0,0 +1,195 @@
#!/usr/bin/env python
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

import json
import logging
import os
import shutil
import stat
import subprocess
import sys

import requests
import six

HOOKS_DIR_PATHS = (
    os.environ.get('HEAT_CONFIG_HOOKS'),
    '/usr/libexec/heat-config/hooks',
    '/var/lib/heat-config/hooks',
)
CONF_FILE = os.environ.get('HEAT_SHELL_CONFIG',
                           '/var/run/heat-config/heat-config')
DEPLOYED_DIR = os.environ.get('HEAT_CONFIG_DEPLOYED',
                              '/var/lib/heat-config/deployed')
OLD_DEPLOYED_DIR = os.environ.get('HEAT_CONFIG_DEPLOYED_OLD',
                                  '/var/run/heat-config/deployed')
HEAT_CONFIG_NOTIFY = os.environ.get('HEAT_CONFIG_NOTIFY',
                                    'heat-config-notify')


def main(argv=sys.argv):
    log = logging.getLogger('heat-config')
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter(
            '[%(asctime)s] (%(name)s) [%(levelname)s] %(message)s'))
    log.addHandler(handler)
    log.setLevel('DEBUG')

    if not os.path.exists(CONF_FILE):
        log.error('No config file %s' % CONF_FILE)
        return 1

    conf_mode = stat.S_IMODE(os.lstat(CONF_FILE).st_mode)
    if conf_mode != 0o600:
        os.chmod(CONF_FILE, 0o600)

    if not os.path.isdir(DEPLOYED_DIR):
        if (DEPLOYED_DIR != OLD_DEPLOYED_DIR and
                os.path.isdir(OLD_DEPLOYED_DIR)):
            log.debug('Migrating deployed state from %s to %s' %
                      (OLD_DEPLOYED_DIR, DEPLOYED_DIR))
            shutil.move(OLD_DEPLOYED_DIR, DEPLOYED_DIR)
        else:
            os.makedirs(DEPLOYED_DIR, 0o700)

    try:
        configs = json.load(open(CONF_FILE))
    except ValueError:
        pass
    else:
        for c in configs:
            try:
                invoke_hook(c, log)
            except Exception as e:
                log.exception(e)


def find_hook_path(group):
    # sanitise the group to get an alphanumeric hook file name
    hook = "".join(
        x for x in group if x == '-' or x == '_' or x.isalnum())
    for h in HOOKS_DIR_PATHS:
        if not h or not os.path.exists(h):
            continue
        hook_path = os.path.join(h, hook)
        if os.path.exists(hook_path):
            return hook_path


def invoke_hook(c, log):
    # Sanitize input values (bug 1333992). Convert all String
    # inputs to strings if they're not already
    hot_inputs = c.get('inputs', [])
    for hot_input in hot_inputs:
        if hot_input.get('type', None) == 'String' and \
                not isinstance(hot_input['value'], six.text_type):
            hot_input['value'] = str(hot_input['value'])
    iv = dict((i['name'], i['value']) for i in c['inputs'])

    # The group property indicates whether it is softwarecomponent or
    # plain softwareconfig
    # If it is softwarecomponent, pick up a property config to invoke
    # according to deploy_action
    group = c.get('group')
    if group == 'component':
        found = False
        action = iv.get('deploy_action')
        config = c.get('config')
        configs = config.get('configs')
        if configs:
            for cfg in configs:
                if action in cfg['actions']:
                    c['config'] = cfg['config']
                    c['group'] = cfg['tool']
                    found = True
                    break
        if not found:
            log.warn('Skipping group %s, no valid script is defined'
                     ' for deploy action %s' % (group, action))
            return

    # check to see if this config is already deployed
    deployed_path = os.path.join(DEPLOYED_DIR, '%s.json' % c['id'])
    if os.path.exists(deployed_path):
        log.warn('Skipping config %s, already deployed' % c['id'])
        log.warn('To force-deploy, rm %s' % deployed_path)
        return

    signal_data = {}
    hook_path = find_hook_path(c['group'])
    if not hook_path:
        log.warn('Skipping group %s with no hook script %s' % (
            c['group'], hook_path))
        return

    # write out config, which indicates it is deployed regardless of
    # subsequent hook success
    with os.fdopen(os.open(
            deployed_path, os.O_CREAT | os.O_WRONLY, 0o600), 'w') as f:
        json.dump(c, f, indent=2)

    log.debug('Running %s < %s' % (hook_path, deployed_path))
    subproc = subprocess.Popen([hook_path],
                               stdin=subprocess.PIPE,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    stdout, stderr = subproc.communicate(input=json.dumps(c))

    log.info(stdout)
    log.debug(stderr)

    if subproc.returncode:
        log.error("Error running %s. [%s]\n" % (
            hook_path, subproc.returncode))
    else:
        log.info('Completed %s' % hook_path)

    try:
        if stdout:
            signal_data = json.loads(stdout)
    except ValueError:
        signal_data = {
            'deploy_stdout': stdout,
            'deploy_stderr': stderr,
            'deploy_status_code': subproc.returncode,
        }

    signal_data_path = os.path.join(DEPLOYED_DIR, '%s.notify.json' % c['id'])
    # write out notify data for debugging
    with os.fdopen(os.open(
            signal_data_path, os.O_CREAT | os.O_WRONLY, 0o600), 'w') as f:
        json.dump(signal_data, f, indent=2)

    log.debug('Running %s %s < %s' % (
        HEAT_CONFIG_NOTIFY, deployed_path, signal_data_path))
    subproc = subprocess.Popen([HEAT_CONFIG_NOTIFY, deployed_path],
                               stdin=subprocess.PIPE,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    stdout, stderr = subproc.communicate(input=json.dumps(signal_data))

    log.info(stdout)
    if subproc.returncode:
        log.error(
            "Error running heat-config-notify. [%s]\n" % subproc.returncode)
        log.error(stderr)
    else:
        log.debug(stderr)


if __name__ == '__main__':
    sys.exit(main(sys.argv))

@@ -0,0 +1,101 @@
#!/bin/bash
set -eux
# os-apply-config templates directory
oac_templates=/usr/libexec/os-apply-config/templates
mkdir -p $oac_templates/etc
# initial /etc/os-collect-config.conf
cat <<EOF >/etc/os-collect-config.conf
[DEFAULT]
command = os-refresh-config
EOF
# template for building os-collect-config.conf for polling heat
cat <<EOF >$oac_templates/etc/os-collect-config.conf
[DEFAULT]
{{^os-collect-config.command}}
command = os-refresh-config
{{/os-collect-config.command}}
{{#os-collect-config}}
{{#command}}
command = {{command}}
{{/command}}
{{#polling_interval}}
polling_interval = {{polling_interval}}
{{/polling_interval}}
{{#cachedir}}
cachedir = {{cachedir}}
{{/cachedir}}
{{#collectors}}
collectors = {{.}}
{{/collectors}}
{{#cfn}}
[cfn]
{{#metadata_url}}
metadata_url = {{metadata_url}}
{{/metadata_url}}
stack_name = {{stack_name}}
secret_access_key = {{secret_access_key}}
access_key_id = {{access_key_id}}
path = {{path}}
{{/cfn}}
{{#heat}}
[heat]
auth_url = {{auth_url}}
user_id = {{user_id}}
password = {{password}}
project_id = {{project_id}}
stack_id = {{stack_id}}
resource_name = {{resource_name}}
{{/heat}}
{{#zaqar}}
[zaqar]
auth_url = {{auth_url}}
user_id = {{user_id}}
password = {{password}}
project_id = {{project_id}}
queue_id = {{queue_id}}
{{/zaqar}}
{{#request}}
[request]
{{#metadata_url}}
metadata_url = {{metadata_url}}
{{/metadata_url}}
{{/request}}
{{/os-collect-config}}
EOF
mkdir -p $oac_templates/var/run/heat-config
# template for writing heat deployments data to a file
echo "{{deployments}}" > $oac_templates/var/run/heat-config/heat-config
# os-refresh-config scripts directory
# This moves to /usr/libexec/os-refresh-config in later releases
orc_scripts=/opt/stack/os-config-refresh
for d in pre-configure.d configure.d migration.d post-configure.d; do
install -m 0755 -o root -g root -d $orc_scripts/$d
done
# os-refresh-config script for running os-apply-config
cat <<EOF >$orc_scripts/configure.d/20-os-apply-config
#!/bin/bash
set -ue
exec os-apply-config
EOF
chmod 700 $orc_scripts/configure.d/20-os-apply-config
chmod 700 /opt/stack/os-config-refresh/configure.d/55-heat-config
chmod 700 /opt/stack/os-config-refresh/configure.d/50-heat-config-docker-compose
chmod 755 /var/lib/heat-config/hooks/atomic
chmod 755 /var/lib/heat-config/hooks/docker-compose
chmod 755 /var/lib/heat-config/hooks/script
chmod 755 /usr/bin/heat-config-notify

@@ -0,0 +1,163 @@
#!/usr/bin/env python
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

import json
import logging
import os
import sys

import requests

try:
    from heatclient import client as heatclient
except ImportError:
    heatclient = None

try:
    from keystoneclient.v3 import client as ksclient
except ImportError:
    ksclient = None

try:
    from zaqarclient.queues.v1 import client as zaqarclient
except ImportError:
    zaqarclient = None

MAX_RESPONSE_SIZE = 950000


def init_logging():
    log = logging.getLogger('heat-config-notify')
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter(
            '[%(asctime)s] (%(name)s) [%(levelname)s] %(message)s'))
    log.addHandler(handler)
    log.setLevel('DEBUG')
    return log


def trim_response(response, trimmed_values=None):
    """Trim selected values from response.

    Makes given response smaller or the same size as MAX_RESPONSE_SIZE by
    trimming given trimmed_values from response dict from the left side
    (beginning). Returns trimmed and serialized JSON response itself.
    """
    trimmed_values = trimmed_values or ('deploy_stdout', 'deploy_stderr')
    str_response = json.dumps(response, ensure_ascii=True, encoding='utf-8')
    len_total = len(str_response)
    offset = MAX_RESPONSE_SIZE - len_total
    if offset >= 0:
        return str_response
    offset = abs(offset)
    for key in trimmed_values:
        len_value = len(response[key])
        cut = int(round(float(len_value) / len_total * offset))
        response[key] = response[key][cut:]
    str_response = json.dumps(response, ensure_ascii=True, encoding='utf-8')
    return str_response


def main(argv=sys.argv, stdin=sys.stdin):
    log = init_logging()
    usage = ('Usage:\n  heat-config-notify /path/to/config.json '
             '< /path/to/signal_data.json')

    if len(argv) < 2:
        log.error(usage)
        return 1

    try:
        signal_data = json.load(stdin)
    except ValueError:
        log.warn('No valid json found on stdin')
        signal_data = {}

    conf_file = argv[1]
    if not os.path.exists(conf_file):
        log.error('No config file %s' % conf_file)
        log.error(usage)
        return 1

    c = json.load(open(conf_file))

    iv = dict((i['name'], i['value']) for i in c['inputs'])

    if 'deploy_signal_id' in iv:
        sigurl = iv.get('deploy_signal_id')
        sigverb = iv.get('deploy_signal_verb', 'POST')
        log.debug('Signaling to %s via %s' % (sigurl, sigverb))
        # we need to trim log content because Heat response size is limited
        # by max_json_body_size = 1048576
        str_signal_data = trim_response(signal_data)
        if sigverb == 'PUT':
            r = requests.put(sigurl, data=str_signal_data,
                             headers={'content-type': 'application/json'})
        else:
            r = requests.post(sigurl, data=str_signal_data,
                              headers={'content-type': 'application/json'})
        log.debug('Response %s ' % r)

    if 'deploy_queue_id' in iv:
        queue_id = iv.get('deploy_queue_id')
        log.debug('Signaling to queue %s' % (queue_id,))

        ks = ksclient.Client(
            auth_url=iv['deploy_auth_url'],
            user_id=iv['deploy_user_id'],
            password=iv['deploy_password'],
            project_id=iv['deploy_project_id'])
        endpoint = ks.service_catalog.url_for(
            service_type='messaging', endpoint_type='publicURL')

        conf = {
            'auth_opts': {
                'backend': 'keystone',
                'options': {
                    'os_auth_token': ks.auth_token,
                    'os_project_id': iv['deploy_project_id'],
                }
            }
        }

        cli = zaqarclient.Client(endpoint, conf=conf, version=1.1)
        queue = cli.queue(queue_id)
        r = queue.post({'body': signal_data, 'ttl': 600})
        log.debug('Response %s ' % r)

    elif 'deploy_auth_url' in iv:
        ks = ksclient.Client(
            auth_url=iv['deploy_auth_url'],
            user_id=iv['deploy_user_id'],
            password=iv['deploy_password'],
            project_id=iv['deploy_project_id'])
        endpoint = ks.service_catalog.url_for(
            service_type='orchestration', endpoint_type='publicURL')
        log.debug('Signalling to %s' % endpoint)
        heat = heatclient.Client(
            '1', endpoint, token=ks.auth_token)
        r = heat.resources.signal(
            iv.get('deploy_stack_id'),
            iv.get('deploy_resource_name'),
            data=signal_data)
        log.debug('Response %s ' % r)

    return 0


if __name__ == '__main__':
    sys.exit(main(sys.argv, sys.stdin))

@@ -0,0 +1,115 @@
#!/usr/bin/env python
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

import json
import logging
import os
import subprocess
import sys

WORKING_DIR = os.environ.get('HEAT_ATOMIC_WORKING',
                             '/var/lib/heat-config/heat-config-atomic')
ATOMIC_CMD = os.environ.get('HEAT_ATOMIC_CMD', 'atomic')


def prepare_dir(path):
    if not os.path.isdir(path):
        os.makedirs(path, 0o700)


def build_response(deploy_stdout, deploy_stderr, deploy_status_code):
    return {
        'deploy_stdout': deploy_stdout,
        'deploy_stderr': deploy_stderr,
        'deploy_status_code': deploy_status_code,
    }


def main(argv=sys.argv):
    log = logging.getLogger('heat-config')
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter(
            '[%(asctime)s] (%(name)s) [%(levelname)s] %(message)s'))
    log.addHandler(handler)
    log.setLevel('DEBUG')

    c = json.load(sys.stdin)

    prepare_dir(WORKING_DIR)
    os.chdir(WORKING_DIR)

    env = os.environ.copy()
    input_values = dict((i['name'], i['value']) for i in c['inputs'])

    stdout, stderr = {}, {}

    config = c.get('config', '')
    if not config:
        log.debug("No 'config' input found, nothing to do.")
        json.dump(build_response(stdout, stderr, 0), sys.stdout)
        return

    atomic_subcmd = config.get('command', 'install')
    image = config.get('image')

    if input_values.get('deploy_action') == 'DELETE':
        cmd = [
            'uninstall',
            atomic_subcmd,
            image
        ]
        subproc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE, env=env)
        stdout, stderr = subproc.communicate()
        json.dump(build_response(stdout, stderr, subproc.returncode),
                  sys.stdout)
        return

    install_cmd = config.get('installcmd', '')
    name = config.get('name', c.get('id'))

    cmd = [
        ATOMIC_CMD,
        atomic_subcmd,
        image,
        '-n %s' % name
    ]
    if atomic_subcmd == 'install':
        cmd.extend([install_cmd])

    privileged = config.get('privileged', False)

    if atomic_subcmd == 'run' and privileged:
        cmd.extend(['--spc'])

    log.debug('Running %s' % cmd)
    subproc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)

    stdout, stderr = subproc.communicate()
    log.debug(stdout)
    log.debug(stderr)

    if subproc.returncode:
        log.error("Error running %s. [%s]\n" % (cmd, subproc.returncode))
    else:
        log.debug('Completed %s' % cmd)

    json.dump(build_response(stdout, stderr, subproc.returncode), sys.stdout)


if __name__ == '__main__':
    sys.exit(main(sys.argv))

@@ -0,0 +1,131 @@
#!/usr/bin/env python
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

import ast
import dpath
import json
import logging
import os
import six
import subprocess
import sys
import warnings
import yaml

WORKING_DIR = os.environ.get('HEAT_DOCKER_COMPOSE_WORKING',
                             '/var/lib/heat-config/heat-config-docker-compose')

DOCKER_COMPOSE_CMD = os.environ.get('HEAT_DOCKER_COMPOSE_CMD',
                                    'docker-compose')


def prepare_dir(path):
    if not os.path.isdir(path):
        os.makedirs(path, 0o700)


def write_input_file(file_path, content):
    prepare_dir(os.path.dirname(file_path))
    with os.fdopen(os.open(
            file_path, os.O_CREAT | os.O_WRONLY, 0o600), 'w') as f:
        f.write(content.encode('utf-8'))


def build_response(deploy_stdout, deploy_stderr, deploy_status_code):
    return {
        'deploy_stdout': deploy_stdout,
        'deploy_stderr': deploy_stderr,
        'deploy_status_code': deploy_status_code,
    }


def main(argv=sys.argv):
    warnings.warn('This hook is deprecated, please use hooks from heat-agents '
                  'repository instead.', DeprecationWarning)

    log = logging.getLogger('heat-config')
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter(
            '[%(asctime)s] (%(name)s) [%(levelname)s] %(message)s'))
    log.addHandler(handler)
    log.setLevel('DEBUG')

    c = json.load(sys.stdin)

    input_values = dict((i['name'], i['value']) for i in c['inputs'])

    proj = os.path.join(WORKING_DIR, c.get('name'))
    prepare_dir(proj)

    stdout, stderr = {}, {}

    if input_values.get('deploy_action') == 'DELETE':
        json.dump(build_response(stdout, stderr, 0), sys.stdout)
        return

    config = c.get('config', '')
    if not config:
        log.debug("No 'config' input found, nothing to do.")
        json.dump(build_response(stdout, stderr, 0), sys.stdout)
        return

    # convert config to dict
    if not isinstance(config, dict):
        config = ast.literal_eval(json.dumps(yaml.safe_load(config)))

    os.chdir(proj)

    compose_env_files = []
    for value in dpath.util.values(config, '*/env_file'):
        if isinstance(value, list):
            compose_env_files.extend(value)
        elif isinstance(value, six.string_types):
            compose_env_files.extend([value])

    input_env_files = {}
    if input_values.get('env_files'):
        input_env_files = dict(
            (i['file_name'], i['content'])
            for i in ast.literal_eval(input_values.get('env_files')))

    for file in compose_env_files:
        if file in input_env_files.keys():
            write_input_file(file, input_env_files.get(file))

    cmd = [
        DOCKER_COMPOSE_CMD,
        'up',
        '-d',
        '--no-build',
    ]

    log.debug('Running %s' % cmd)
    subproc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    stdout, stderr = subproc.communicate()

    log.debug(stdout)
    log.debug(stderr)

    if subproc.returncode:
        log.error("Error running %s. [%s]\n" % (cmd, subproc.returncode))
    else:
        log.debug('Completed %s' % cmd)

    json.dump(build_response(stdout, stderr, subproc.returncode), sys.stdout)


if __name__ == '__main__':
    sys.exit(main(sys.argv))

@@ -0,0 +1,99 @@
#!/usr/bin/env python
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

import json
import logging
import os
import subprocess
import sys
import warnings

WORKING_DIR = os.environ.get('HEAT_SCRIPT_WORKING',
                             '/var/lib/heat-config/heat-config-script')
OUTPUTS_DIR = os.environ.get('HEAT_SCRIPT_OUTPUTS',
                             '/var/run/heat-config/heat-config-script')


def prepare_dir(path):
    if not os.path.isdir(path):
        os.makedirs(path, 0o700)


def main(argv=sys.argv):
    warnings.warn('This hook is deprecated, please use hooks from heat-agents '
                  'repository instead.', DeprecationWarning)

    log = logging.getLogger('heat-config')
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter(
            '[%(asctime)s] (%(name)s) [%(levelname)s] %(message)s'))
    log.addHandler(handler)
    log.setLevel('DEBUG')

    prepare_dir(OUTPUTS_DIR)
    prepare_dir(WORKING_DIR)
    os.chdir(WORKING_DIR)

    c = json.load(sys.stdin)

    env = os.environ.copy()
    for input in c['inputs']:
        input_name = input['name']
        value = input.get('value', '')
        if isinstance(value, dict) or isinstance(value, list):
            env[input_name] = json.dumps(value)
        else:
            env[input_name] = value
        log.info('%s=%s' % (input_name, env[input_name]))

    fn = os.path.join(WORKING_DIR, c['id'])
    heat_outputs_path = os.path.join(OUTPUTS_DIR, c['id'])
    env['heat_outputs_path'] = heat_outputs_path

    with os.fdopen(os.open(fn, os.O_CREAT | os.O_WRONLY, 0o700), 'w') as f:
        f.write(c.get('config', '').encode('utf-8'))

    log.debug('Running %s' % fn)
    subproc = subprocess.Popen([fn], stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE, env=env)
    stdout, stderr = subproc.communicate()

    log.info(stdout)
    log.debug(stderr)

    if subproc.returncode:
        log.error("Error running %s. [%s]\n" % (fn, subproc.returncode))
    else:
        log.info('Completed %s' % fn)

    response = {}

    for output in c.get('outputs') or []:
        output_name = output['name']
        try:
            with open('%s.%s' % (heat_outputs_path, output_name)) as out:
                response[output_name] = out.read()
        except IOError:
            pass

    response.update({
        'deploy_stdout': stdout,
        'deploy_stderr': stderr,
        'deploy_status_code': subproc.returncode,
    })

    json.dump(response, sys.stdout)


if __name__ == '__main__':
    sys.exit(main(sys.argv))

@@ -0,0 +1,46 @@
#!/bin/bash
set -ux
# Allow local registry, and restart docker
sed -i -e "/^OPTIONS=/ s/'$/ --insecure-registry 192.168.1.101:5000'/" /etc/sysconfig/docker
systemctl restart docker
# Use `docker.io/rico/heat-container-agent-with-docker` if you are not using `192.168.1.101:5000/heat-container-agent-with-docker`
# heat-docker-agent service
cat <<EOF > /etc/systemd/system/heat-container-agent.service
[Unit]
Description=Heat Container Agent
After=docker.service
Requires=docker.service
[Service]
TimeoutSec=5min
RestartSec=5min
User=root
Restart=on-failure
ExecStartPre=-/usr/bin/docker rm -f heat-container-agent-with-docker
ExecStartPre=-/usr/bin/docker pull 192.168.1.101:5000/heat-container-agent-with-docker
ExecStart=/usr/bin/docker run --name heat-container-agent-with-docker \\
--privileged \\
--net=host \\
-v /run/systemd:/run/systemd \\
-v /etc/sysconfig:/etc/sysconfig \\
-v /etc/systemd/system:/etc/systemd/system \\
-v /var/lib/heat-cfntools:/var/lib/heat-cfntools \\
-v /var/lib/cloud:/var/lib/cloud \\
-v /var/run/docker.sock:/var/run/docker.sock \\
-v /tmp:/tmp \\
-v /etc/hosts:/etc/hosts \\
192.168.1.101:5000/heat-container-agent-with-docker
ExecStop=/usr/bin/docker stop heat-container-agent-with-docker
[Install]
WantedBy=multi-user.target
EOF
# enable and start heat-container-agent
chmod 0640 /etc/systemd/system/heat-container-agent.service
/usr/bin/systemctl enable heat-container-agent.service
/usr/bin/systemctl start --no-block heat-container-agent.service