119 Commits

Author SHA1 Message Date
Jenkins
d67627b418 Merge "M/N upgrade fail to restart nova-scheduler." 2016-08-26 23:18:37 +00:00
Jenkins
a4b8e86199 Merge "Update pacemaker_resource_restart.sh for new HA arch" 2016-08-26 12:44:10 +00:00
Jenkins
319c42475c Merge "Clean up old functions" 2016-08-25 10:41:34 +00:00
Pradeep Kilambi
34c8a8cf60 Clean up old functions
These are not needed any more as they were specific to
mitaka upgrades.

Change-Id: I0d421b942e620403f88374e1c82105747d8d84c9
2016-08-24 11:47:47 -04:00
Sofer Athlan-Guyot
cb894b4509 M/N upgrade fail to restart nova-scheduler.
The nova api db needs to be synchronized as well.

Change-Id: I2628b24ff1153c84cbf388455666ae42570cb10f
Closes-Bug: 1615042
2016-08-24 15:13:11 +02:00
Jiri Stransky
f45897ea1e Fix check for MariaDB upgrade manual switch off
The MySqlMajorUpgrade parameter has validation on it allowing only
values yes/no/auto, however in the script we checked for '0' instead of
'no', which means the only effective values were yes/auto. This is now
fixed to allow switching the migration off.

Change-Id: I5d64734894c6bfd9003ad643f3747e34e62465cc
Closes-Bug: #1616429
2016-08-24 13:21:59 +02:00
Jiri Stransky
072404b569 Don't trigger mariadb upgrade dump/restore when not needed
When upgrading from mariadb X.Y.Z to mariadb X.Y.Z' (X.Y part stays the
same), the dump/restore of mariadb shouldn't be necessary. Therefore we
now only check for up to the first 2 fields of the version string when
determining if we should trigger the dump/restore operation.
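
The comparison described above can be sketched as follows. This is an illustrative Python sketch, not the code from the upgrade script; the function names `major_minor` and `needs_dump_restore` and the sample version strings are assumptions made for this example.

```python
def major_minor(version):
    """Return the first two dot-separated fields (X, Y) of a version string."""
    return tuple(version.split(".")[:2])


def needs_dump_restore(installed, candidate):
    """Report whether the X.Y part differs between the two versions.

    A change in the third field (Z) alone does not trigger a dump/restore.
    """
    return major_minor(installed) != major_minor(candidate)


# Hypothetical version strings, used only to exercise the comparison.
print(needs_dump_restore("10.1.12", "10.1.18"))  # only Z changes
print(needs_dump_restore("5.5.47", "10.1.18"))   # X and Y change
```

Comparing the fields as strings is enough here because only equality matters, not ordering.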

Closes-Bug: #1615721
Change-Id: Ib7af8bfb121f5c83184d51b3c6dc657108c25973
2016-08-22 18:29:28 +02:00
Pradeep Kilambi
420f786a45 Upgrade scripts to migrate aodh alarm data
In Newton, Aodh will be using its own mysql DB rather than
using ceilometer's mongo instance. This means we need to
migrate any existing alarm and alarm history data from
ceilometer DB to aodh mysql DB. Upstream aodh provides us
with an aodh-data-migration utility. We need to invoke this
during the mitaka->newton upgrade procedure so data is
migrated as expected and aodh mysql backend takes over.

Closes-bug: #1611794

Change-Id: I17888b57ecf98cd83e92af2f9cdbead066b03aa3
2016-08-17 12:40:06 +00:00
James Slagle
cca569346f Update pacemaker_resource_restart.sh for new HA arch
Given the new HA architecture with fewer pacemaker-managed resources, we
need to update this script to reflect those changes. Without these
changes, stack-updates using the exact same templates will fail since
this script is always executed on update.

Change-Id: I2ce1681d19d4a24a7561e3dd9c5efdae40d030b7
Closes-Bug: #1612667
2016-08-16 18:01:41 +02:00
Jenkins
03732f871b Merge "Allow to manually disable post-puppet restarts" 2016-07-28 13:59:11 +00:00
Steven Hardy
fe1f8a8c86 Convert AllNodesExtraConfig to OS::Heat::None
Use OS::Heat::None instead of creating a nested stack, as it has
slightly lower overhead and will make things easier when adding custom
roles (where a hard-coded default template can't work).

Change-Id: If9f8294ba477d1c1364e19a52152905a2c02e959
2016-07-05 17:48:28 +01:00
Steven Hardy
7ff66b9af1 Remove config_identifier from all_nodes extraconfig examples
Since https://review.openstack.org/#/c/315616 this is no longer
required.

Change-Id: I0452d1577a25d19b4351bfe7830a6c7bbe485e67
2016-07-05 17:46:23 +01:00
Jenkins
a70464cd0d Merge "Dump and restore galera db during major upgrades" 2016-07-04 10:12:57 +00:00
Michele Baldessari
292fdf87e0 Dump and restore galera db during major upgrades
When the overcloud is upgraded we do a yum update of the packages.
This step might introduce a newer galera version. In such a situation
we need to dump the db and restore it. The high-level workflow should
be the following:
1) During the main upgrade step, before shutting down the cluster
   we need to dump the db
2) We upgrade the packages
3) We briefly start mysql on a single node while making sure that
   /root/.my.cnf is briefly moved out of the way (because it contains
   a password) and import the data. After the import we shut down this
   mysql instance
4) We let the cluster start up normally

The above steps will take place in the following scenarios.
Given a locally installed mariadb version X.Y.Z and release R,
we will dump and restore the DB under the following conditions:
A) MySqlMajorUpgrade template parameter is set to 'auto' and
   the upgraded package differs in X, Y *or* Z. We don't dump
   automatically if only the release field changes.

B) MySqlMajorUpgrade template parameter is set to 'yes'

When MySqlMajorUpgrade is set to 'no', no dumping will be performed.
Note that this will give a non-functional upgrade if a major mariadb
upgrade is taking place.
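
The decision rules above can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the logic shipped in the upgrade scripts: the function name `should_dump_and_restore` is hypothetical, and the version arguments are assumed to be plain X.Y.Z strings with the release field R already stripped.

```python
VALID_MODES = ("yes", "no", "auto")


def should_dump_and_restore(mode, installed, upgraded):
    """Decide whether to dump and restore the DB for a MySqlMajorUpgrade value.

    `installed` and `upgraded` are X.Y.Z strings without the release field,
    so a change in the release alone never triggers a dump in 'auto' mode.
    """
    if mode not in VALID_MODES:
        raise ValueError("MySqlMajorUpgrade must be one of yes/no/auto")
    if mode == "yes":
        return True
    if mode == "no":
        return False
    # 'auto': dump only when X, Y or Z differs.
    return installed != upgraded


# Hypothetical versions, used only to exercise the three modes.
print(should_dump_and_restore("auto", "5.5.47", "5.5.47"))
print(should_dump_and_restore("auto", "5.5.47", "10.1.12"))
print(should_dump_and_restore("no", "5.5.47", "10.1.12"))
```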

Partial-Bug: #1587449

Co-Author: Damien Ciabrini <dciabrin@redhat.com>
Co-Author: Mike Bayer <mbayer@redhat.com>

Depends-On: I8cb4cb3193e6b823aad48ad7dbbbb227364d2a58
Depends-On: I38dcacfabc44539aab1f7da85168fe44a1b43a51

Change-Id: I374628547aed091129d0deaa29764bfc998d76ea
2016-06-29 23:44:01 +02:00
Damien Ciabrini
017334bbb5 Increase cluster sync timeout for M->N major upgrades
Since the Liberty release, the number of services managed by pacemaker
on HA Overcloud has increased. This has an impact on
major_upgrade_controller_pacemaker_1.sh, where the cluster sync timeout
value tuned for older releases is now too low.

Raise the cluster sync timeout value to a sensible limit to
give pacemaker enough time to stop the cluster during major upgrade.

Change-Id: I821d354ba30ce39134982ba12a82c429faa3ce62
Closes-Bug: #1597506
2016-06-29 22:36:34 +02:00
Michele Baldessari
257855082a Disable stonith temporarily during upgrades
It is best if we disable stonith if a cluster has it configured and on,
before we call "pcs cluster stop --all", because should a service fail
to stop for whatever reason, pacemaker will fence the node where it
happened. This is unlikely to be what we want during an upgrade, as
it will make things worse.

Once the cluster is stopped we can reenable stonith (if it was enabled
to start with) in the CIB while the cluster is shut down.

Closes-Bug: #1596065

Change-Id: I38dcacfabc44539aab1f7da85168fe44a1b43a51
2016-06-24 21:08:34 +02:00
Jiri Stransky
f918bdb048 Allow to manually disable post-puppet restarts
Restarting services after Puppet is vital to ensure that config changes
get applied. However, it can sometimes be desirable to prevent these
restarts to avoid downtime, if the operator is sure that no config
changes need applying. This can be the case e.g. when scaling compute
nodes. Passing the puppet-pacemaker-no-restart.yaml environment file *in
addition* to puppet-pacemaker.yaml should allow this.

This is a stopgap solution before we have proper communication between
Puppet and Pacemaker to allow selective restarts.

Change-Id: I9c3c5c10ed6ecd5489a59d7e320c3c69af9e19f4
2016-06-14 16:10:10 +02:00
Jenkins
172ded7a40 Merge "Add ExtraConfig example that always runs on update" 2016-05-20 10:16:36 +00:00
Jenkins
3c5521d2bb Merge "Disable VIPs before stopping cluster during version upgrade" 2016-05-04 11:14:00 +00:00
Jenkins
1bbf7a27e3 Merge "Fix distinguishing between stack-create and stack-update" 2016-05-04 10:01:28 +00:00
Ian Pilcher
6e65c8fc0a Disable VIPs before stopping cluster during version upgrade
If "pcs cluster stop --all" is executed on a controller that
happens to have a VIP on the internal network, pcs may use the
VIP as the source address for communication with another cluster
node.  When pacemaker is stopped this VIP goes away, and pcs never
receives a response from the other node.  This causes pcs to hang
indefinitely; eventually the upgrade times out and fails.

Disabling the VIPs before stopping the cluster avoids this
situation.

Change-Id: I6bc59120211af28456018640033ce3763c373bbb
Closes-Bug: 1577570
2016-05-02 16:26:49 -05:00
Jenkins
6a794c4328 Merge "Update .sh references from openstack-keystone to openstack-core" 2016-04-26 17:02:34 +00:00
Jenkins
d1e59ff720 Merge "Increase galera sync timeout in yum_update.sh" 2016-04-20 08:18:15 +00:00
Jiri Stransky
f5d96bb41b Make sure openstack services are dependent on openstack-core
Previously ceilometer-notification, aodh-listener and sahara-engine
didn't have constraints that would anchor them under openstack-core
dummy resource. Such constraints are added now. (sahara-engine starting
after sahara-api, aodh-listener after aodh-evaluator, and
ceilometer-notification after openstack-core.) Openstack-core ->
heat-api constraint has been removed because heat-api depends on
ceilometer-notification, so there's a transitive dependency on
openstack-core already.

Change-Id: Ided7321ebbf2c3556726343b4bb466fd8759b43a
Closes-Bug: #1569444
2016-04-13 13:19:37 +02:00
Jiri Stransky
aa0bd9eb1b Fix distinguishing between stack-create and stack-update
Previously we tried to use UpdateIdentifier for two different things:
tell whether to perform package update, and also to tell whether the
top-level stack is being created or updated (which was incorrect and
resulted in bug 1567384, and an attempt to work around that bug resulted
in bug 1567385).

We cannot use Heat's "action" conditionals in some cases, because they
refer to the direct parent stack, which can yield undesirable results
when introducing new nested stacks or temporarily no-opping something
and then adding it back (in both these cases, "action" would be
considered "CREATE", even though the top-level stack is in "UPDATE").

So tripleoclient passes a new parameter StackAction to tell whether the
top-level stack is being created or updated, and we make use of
that. (It seems there's no better way of getting this info from within
the nested Heat stacks.)

Change-Id: Ie14ddbff15e7ed21aaa3fcdacf36e0040f912382
Depends-On: I9dc3b4cd8a6a71df34d8babf0e4c6505041f5311
Closes-Bug: #1567384
Related-Bug: #1567385
2016-04-11 14:31:42 +02:00
Giulio Fidente
2d92911838 Update .sh references from openstack-keystone to openstack-core
The update and upgrade shell scripts were still referencing the
old openstack-keystone service which got removed with
Ie26908ac9bfc0b84b6b65ae3bda711236b03d9d4

Also removes kilo and liberty specific workarounds and config changes.

Change-Id: Icc80904908ee3558930d4639a21812f14b2fd12e
2016-04-11 14:27:55 +02:00
Dan Prince
ec78afd2cb Replace extraconfig/tasks/noop.yaml w/ Heat::None
Removes the old noop nested stack template for extraconfig
tasks and instead uses OS::Heat::None. This should avoid a few
extra resource checks on create and update.

Change-Id: I5a42fc78ece2553e86385236e214aa1e3c91cd85
2016-04-08 08:06:18 -04:00
marios
706c2fe4b6 Add removal of the /etc/resolv.conf.save file for +bug/1567004
The change at https://review.openstack.org/#/c/302352/ should stop
the if up/down scripts from making changes to resolv.conf as
discussed in that review and the related bug below. However during
upgrades, as we are moving from a version of the ifcfg-vlanXX files
that don't have the PEERDNS=no added by /#/c/302352, the if up
script will restore the /etc/resolv.conf.save to /etc/resolv.conf
and overwrite it. This removes the .save file during the upgrade
init command which gets delivered to all nodes as the first stage
of a major upgrade.

Change-Id: I91dd139f43be4912c20d8661691bee2b662964d4
Related-Bug: 1567004
2016-04-07 16:50:21 +03:00
Raoul Scarazzini
05b2a200ca Filter for local nodes in check_resource function
When a TripleO-deployed Pacemaker environment has extra
customizations, such as instance HA with pacemaker_remoted or an
external arbitrator, the status of the resources for remote nodes
is "Stopped".
This leads to failures when, for example, scaling up.
This fixes the way status is checked by filtering for local nodes only.

Co-Authored-By: Giulio Fidente <gfidente@redhat.com>
Change-Id: I8dc25f5d7031c265858afd5a266fda5315ae37a0
2016-04-04 10:41:37 +02:00
Ben Nemec
4f373ea30f Restart haproxy after configuring SSL certs
If a certificate expires, the user will need to update it.  However,
because we only restart services at the end of a stack-update the
new certificate doesn't take effect until after puppet has run.
This is a problem because puppet makes OpenStack calls, which will
fail if the certificate is expired.  In that case we never get to
the service restart so the stack is wedged until the user manually
restarts haproxy.

This patch addresses the problem by reloading haproxy before puppet
runs.  This is done in a pre-puppet script for pacemaker after pacemaker
is in maintenance mode because we need to make sure it happens after all of
the certs have been installed on the controllers, but before puppet
runs.

For non-pacemaker, haproxy is simply reloaded.

Change-Id: Id5ed05b3a20d06af8ae7a3d6f859b03399b0d77d
2016-04-01 12:42:02 -04:00
Mike Burns
d2566e5b94 change the default satellite tools rpm repo.
Change-Id: I60ab36b04b8932e4dbee58e21998dc984178b41c
Bugzilla:  https://bugzilla.redhat.com/1275281
2016-03-29 12:14:22 -04:00
Mathieu Bultel
6e56f87314 Set UpdateIdentifier for upgrade converge, to prevent services down
We'd like to let the post-puppet pacemaker controller services
restart happen for the convergence step, so set the
UpdateIdentifier. However, also set PackageUpdate to noop so that
yum_update.sh doesn't run.

Since a full haproxy restart is expected, we no longer need the
systemctl reload added at Iae3bad745ecdf952a7a0314fe1375d07eb47c454
so remove that too.

Some more context at
https://bugzilla.redhat.com/show_bug.cgi?id=1321036

Co-Authored-By: marios <marios@redhat.com>
Change-Id: I31c2d97d68c97b435f63863fae2c89f18f99681d
2016-03-24 20:25:52 +02:00
Jenkins
c6249a1af2 Merge "Fix satellite registration for http or https" 2016-03-24 18:06:00 +00:00
Jenkins
0e6071d395 Merge "Add systemctl reload haproxy to the pacemaker_resource_restart.sh" 2016-03-24 18:00:09 +00:00
Jenkins
19e44d2a61 Merge "Deploy Aodh services, replacing Ceilometer Alarm" 2016-03-24 17:51:43 +00:00
marios
843d25af04 Add systemctl reload haproxy to the pacemaker_resource_restart.sh
As discussed in the related bug below, after upgrading your
environment to latest liberty the haproxy config isn't picked
up. This adds a systemctl reload haproxy in the pacemaker
resource restart we run as part of the post-puppet-pacemaker.

Related-Bug: 1561012
Change-Id: Iae3bad745ecdf952a7a0314fe1375d07eb47c454
2016-03-23 16:54:33 +02:00
Steven Hardy
8c0ba4c09e Add ExtraConfig example that always runs on update
Some user feedback indicates we need better examples showing how
we can run an extraconfig template, e.g. on post-deploy so that it
runs every update, regardless of whether a change to the config has
occurred.  We can leverage DeployIdentifier for this, just like with
the puppet deployments.

This can be tested with an example like this:

resource_registry:
  OS::TripleO::NodeExtraConfigPost: tripleo-heat-templates/extraconfig/post_deploy/example_run_on_update.yaml

Change-Id: I45d8f8093ab45c03238ec56651c437128661cb95
2016-03-23 14:29:50 +00:00
James Slagle
5117011468 Fix satellite registration for http or https
If the satellite registration url was specified with https, the curl
command to detect the satellite version would not work as expected since
-L was not passed and you get redirected to https when testing the ping
api.

To additionally handle the case where https is specified, also use curl
directly with -k to download the configuration rpm instead of using rpm
with a url.

Fixes another bug with a missing $ in the reference to the
$satellite_version variable.

Change-Id: I984fdfc415eeeed4ef29cc8d0812e1b67545d6b1
2016-03-23 10:20:16 -04:00
Pradeep Kilambi
2018c38ed4 Deploy Aodh services, replacing Ceilometer Alarm
Ceilometer Alarm is deprecated in Liberty in favor of Aodh.

This patch:
* manages Aodh Keystone resources
* deploys Aodh API under WSGI, Notifier, Listener and Evaluator
* manages new parameters to customize Aodh deployment
* uses ceilometer DB for the upgrade path
* adds pacemaker config
* adds migration logic to remove pcs resources

Depends-On: I5333faa72e52d2aa2a622ac2d4b60825aadc52b5
Depends-On: Ib6c9c4c35da3fb55e0ca8e2d5a58ebaf4204d792

Co-Authored-By: Emilien Macchi <emilien@redhat.com>

Change-Id: Ib47a22884afb032ebc1655e1a4a06bfe70249134
2016-03-20 10:27:21 -04:00
Jenkins
dfbcefa0b4 Merge "Upgrades: quiet yum upgrade on cinder nodes" 2016-03-11 13:37:58 +00:00
Jenkins
d35cb70a51 Merge "Upgrades: initialization command/snippet" 2016-03-10 18:34:01 +00:00
Jenkins
deff78b2d3 Merge "Add a ceph-storage node upgrade script for the upgrade workflow" 2016-03-10 18:32:47 +00:00
Jenkins
31ffe53d75 Merge "Upgrades: object storage node upgrade fix" 2016-03-10 15:25:56 +00:00
Jiri Stransky
d4b8297b31 Upgrades: quiet yum upgrade on cinder nodes
Yum update on cinder nodes should be quiet, as it is on controllers,
because results of these updates are sent to Heat. I mistakenly left
this out in the first patch because I used one of the standalone node
upgrade scripts as a copy/paste base for the cinder node upgrade script.

Change-Id: Id13190dc4d242317829c7994088183f52d21461d
2016-03-10 14:22:05 +01:00
Jenkins
d11ddf1d31 Merge "Upgrade of Cinder block storage nodes" 2016-03-10 12:57:37 +00:00
Jiri Stransky
d9ee847ec1 Upgrades: object storage node upgrade fix
The variables in the heredoc should be escaped because they should
evaluate only when the inner script runs, not when the outer "writer"
script runs.

Python-zaqarclient is installed for os-collect-config to work, as we do
on the other node types.

Swift-proxy is removed from list of services to stop/start, as
swift-proxy isn't supposed to run on the swift storage nodes.

Change-Id: I8426b859d11378ebdc3da94dcc090133dab0c628
2016-03-09 15:05:10 +01:00
marios
9727a27d00 Fixup systemctl_swift stop/start during the controller upgrade
During the controller upgrade in
major_upgrade_controller_pacemaker_1.sh we use systemctl to stop
all swift services and then start them again in _pacemaker_2.sh

In the case of stand-alone swift nodes the deployer may have
set ControllerEnableSwiftStorage: false so that only the
swift-proxy service is left on controllers (wrt swift). The
systemctl_swift function used during upgrades is changed to factor
this in.

Change-Id: Ib22005123429f250324df389855d0dccd2343feb
2016-03-09 15:43:40 +02:00
Jiri Stransky
4323ad1c94 Upgrades: initialization command/snippet
This allows running a command or a script snippet on all overcloud nodes
at the beginning of the upgrade. The intended use is to switch to a new
set of repositories on the overcloud. This is done differently in
different contexts (e.g. upstream vs. downstream), but generally it
should be simple enough to not warrant creation of a switchable
"UpgradeInit" resource in the resource registry, and a string
command/snippet parameter should suffice.

Change-Id: I72271170d3f53a5179b3212ec9bae9a6204e29e6
2016-03-09 13:58:20 +01:00
marios
911a81192e Add a ceph-storage node upgrade script for the upgrade workflow
This adds delivery of an upgrade script to any ceph-storage nodes
during the script delivery that comes first during the upgrade
workflow.

The controllers have the ceph-mon whilst the ceph-osds are on the
ceph-storage nodes. The ceph-mons will be updated first as part of
the heat-driven controller upgrade, and ceph-osds on ceph nodes are
upgraded with the upgrade-non-controller.sh tripleo-common script
as with compute and swift nodes.

Also slight rename for the ObjectStorageConfig/Deployment here for
consistency.

Change-Id: I12abad5548dcb019ade9273da06fe66fd97f54cc
2016-03-08 17:47:36 +02:00
Jenkins
700d65fe76 Merge "Add an environment to use a swap partition" 2016-03-08 05:12:07 +00:00