Retire repo and note new content in openstack/osops

Change-Id: I8a36e470dfcf2e5db0a702d371b98cd94082bb4d
Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
Sean McGinnis 2020-09-10 20:02:41 -05:00
parent 19576800fb
commit 537bc74d55
No known key found for this signature in database
GPG Key ID: CE7EE4BFAF8D70C8
38 changed files with 12 additions and 2538 deletions

54
.gitignore vendored

@ -1,54 +0,0 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
# C extensions
*.so
# Distribution / packaging
.Python
env/
bin/
build/
develop-eggs/
dist/
eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.cache
nosetests.xml
coverage.xml
# Translations
*.mo
# Mr Developer
.mr.developer.cfg
.project
.pydevproject
# Rope
.ropeproject
# Django stuff:
*.log
*.pot
# Sphinx documentation
docs/_build/

201
LICENSE

@ -1,201 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "{}"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright {yyyy} {name of copyright owner}
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


@ -1,19 +0,0 @@
tools-generic
=============
A repo of curated generic OpenStack Operations Tools
These have been verified to do what they say, pass our coding standards, and have been found useful by the operator community.
For contributing other tools, generally you should submit them to the `osops-tools-contrib repo <https://github.com/openstack/osops-tools-contrib>`_ first.
Please see the wiki page at https://wiki.openstack.org/wiki/Osops#Overview_moving_code
for more details about how code is promoted up to the generic repo.
Other sources of tools
----------------------
* GoDaddy: https://github.com/godaddy/openstack-puppet/tree/master/tools
* NeCTAR: https://github.com/NeCTAR-RC/nectar-tools
* CERN: https://github.com/cernops
* DreamCompute: https://github.com/dreamhost/os-maintenance-tools

12
README.rst Normal file

@ -0,0 +1,12 @@
This project is no longer maintained. Its content has now moved to the
https://opendev.org/openstack/osops repo, and further development will
continue there.
The contents of this repository are still available in the Git
source code management system. To see the contents of this
repository before it reached its end of life, please check out the
previous commit with "git checkout HEAD^1".
For any further questions, please email
openstack-discuss@lists.openstack.org or join #openstack-dev on
Freenode.


@ -1,19 +0,0 @@
Copyright (c) 2014 Go Daddy Operating Company, LLC
Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.


@ -1,101 +0,0 @@
ansible-playbooks
=================
Go Daddy Ansible playbooks for managing OpenStack infrastructure.
Also available publicly at https://github.com/godaddy/openstack-ansible
This assumes your baseline Ansible config is at `/etc/ansible`, and this repo is cloned
to `/etc/ansible/playbooks` (Specifically, the playbooks assume the path to the tasks directory is `/etc/ansible/playbooks/tasks`,
so if you are cloning this repo somewhere else, you'll need to adjust that.)
Patches/comments/complaints welcomed and encouraged! Create an issue or PR here.
Usage Details
-------------
Playbooks are "shebanged" with `#!/usr/bin/env ansible-playbook --forks 50`, so you can actually
just run them directly from the command line.
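For example (the host group name here is illustrative), running a playbook directly is intended to be equivalent to invoking it through ansible-playbook yourself:
./patching.yaml -k -K -e "hosts=compute-servers"
ansible-playbook --forks 50 patching.yaml -k -K -e "hosts=compute-servers"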
We have the concept of "worlds", which correspond to dev, test, prod, etc., servers. We use
the world terminology to avoid confusion with the Puppet environment setting (which, for us,
corresponds to a branch in our [openstack-puppet](https://github.com/godaddy/openstack-puppet) repo.) So when you see references to the world
variable, that's what it is. Right now only a couple playbooks utilize that, so for the most
part you can probably use these without worrying about defining a world variable.
puppet-run.yaml and r10k-deploy.yaml are fairly specific to our environment, and are probably
mostly irrelevant unless you're also using our openstack-puppet repo for Puppet configuration.
Basic usage for the other playbooks follows.
### template-prestage.yaml
This one is really cool, compliments of [krislindgren](http://github.com/krislindgren). It sets up a
BitTorrent swarm, seeded by the machine running Glance, to distribute a Glance image out to any number
of nova-compute nodes very quickly. So when we roll new gold images each month, we can use this to
"push" them out to all compute nodes, and avoid the first provision penalty of waiting for the image
to transfer the first time.
Caveats:
* Only works when the Glance backend is on a traditional (local or network-based) filesystem. Almost
certainly this does not work, and may not even make sense, for Swift-backed Glance.
* Firewalls or other traffic filters need to allow the BitTorrent ports through, and among, the Glance
server and all compute nodes.
* Assumes Glance images are stored at `/var/lib/glance/images` on the Glance server.
* There are some situations where this cannot be run multiple times, if the tracker or other BitTorrent
processes are still running. So use caution, and YMMV.
* This is done completely outside the scope of Glance and Nova. There is no Keystone authentication or
access controls. You must have ssh and sudo access to all machines involved for this to work.
Usage:
./template-prestage.yaml -k -K -e "image_uuid=<uuid> image_sha1=<sha1> image_md5=<md5> tracker_host=<glance server> hosts_to_update=<compute host group>"
* _image_uuid_: UUID of the image to prestage (from `nova image-list` or `glance image-list`)
* _image_sha1_: SHA1 sum of the image_uuid (this can be gotten by running: `echo -n "<image_uuid>" | sha1sum | awk '{print $1}'` on any Linux box)
* _image_md5_: MD5 sum of the image file itself (this can be gotten by running: `md5sum /var/lib/glance/images/<image_uuid> | awk '{print $1}'` on the Glance server)
* _tracker_host_: This is the Glance server host that runs the tracker
* _hosts_to_update_: This is the host group to place the image onto (a list of compute nodes)
### copy-hiera-eyaml-keys.yaml
Copies public and private keys for hiera-eyaml from the source/ansible client machine, to hosts, at
`/etc/pki/tls/private/hiera-eyaml-{public,private}_key.pkcs.pem`
./copy-hiera-eyaml-keys.yaml -k -K -e "srcdir=<srcdir> hosts=<host group>"
* _srcdir_: Source directory on the ansible client machine where the hiera-eyaml-public_key.pkcs7.pem and hiera-eyaml-private_key.pkcs7.pem keys can be found
### patching.yaml
This is a simple playbook to run `yum -y update --skip-broken` on all machines.
./patching.yaml -k -K -e "hosts=<host group>"
This also runs the remove-old-kernels.yaml task first, which removes any kernel packages from the
system which are neither 1) the currently running kernel, nor 2) the default boot kernel in GRUB.
### updatepackages.yaml
Similar to patching.yaml, but this one allows for a specification of exactly which package(s) to update.
./updatepackages.yaml -k -K -e "puppet=<true|false> package=<package spec> hosts=<host group>"
* _puppet_: If true, will also run the puppet-run.yaml task after updating the packages. Default false.
* _package spec_: Specification for what packages to update, wildcards are valid. Default '*'
### restartworld.yaml
Restarts some (or all) openstack services on hosts. Note that this uses the `tools/restartworld.sh` script
from the [godaddy/openstack-puppet](https://github.com/godaddy/openstack-puppet) repo, so you may want to look at that before trying to use this playbook.
This is also somewhat specific to our environment, as far as how we group services together (most of
them run on the "app" class of server.) So results and usefulness may vary.
./restartworld.yaml -k -K -e "class=<server class> hosts=<host group> service=<service class>"
* _class_: Server class to define what services to restart. Recognized options are app, network, and compute. See the `restartworld.sh` script referenced above for which services are on which class of server. You may want to define a group var for this (that's what we did) to automatically map servers to their appropriate server class value.
* _hosts_: Hosts on which to perform the service restarts
* _service class_: Generally, the OpenStack project name for the services to restart. Recognized options are nova,keystone,glance,ceilometer,heat,neutron,spice,els,world (where "world" means all services.)
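A concrete example (the host group name is illustrative): to restart only the nova services on the app-class servers:
./restartworld.yaml -k -K -e "class=app hosts=app-servers service=nova"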


@ -1,76 +0,0 @@
#!/usr/local/bin/ansible-playbook
#
# Change any neutron agent admin state on any specified host.
#
# This script assumes that you have an openrc file in /root/openrc
# on the ansible host where you are running this from. It also
# requires that the neutron client be installed.
#
# Author: Matt Fischer <matt@mattfischer.com>
# Copyright 2015 Matthew Fischer
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#############################################################################
#
# Note: It is up to you, the user, to limit this with -l compute, -l control,
# etc otherwise all agents will be changed.
#
# Examples:
#
# Enable all L3 Agents on all compute nodes
# change-neutron-agent-state.yaml -e admin_state=up -e agent_type=L3 -l compute
#
# Disable all L3 Agents on all control nodes
# change-neutron-agent-state.yaml -e admin_state=down -e agent_type=L3 -l control
#
# Disable all Metadata Agents on all control nodes
# change-neutron-agent-state.yaml -e admin_state=down -e agent_type=Metadata -l control
#
# NOTE: OVS is special because the shell eats the spaces
# Disable all OVS Agents on all compute nodes (this would probably be a bad idea to do)
# change-neutron-agent-state.yaml -e admin_state=down -e 'agent_type="Open vSwitch"' -l compute
---
- name: "change the L3 Agent state"
hosts: compute:control
serial: 30
gather_facts: no
connection: local
tasks:
- fail: msg="You need to pass either admin_state='up' or admin_state='down'"
name: "Ensure admin_state variable is set"
when: admin_state != "down" and admin_state != "up"
- fail: msg="Invalid agent_type, should be 'L3', 'Metadata', 'DHCP', or 'Open vSwitch'"
name: "Ensure agent_type variable is set"
when: agent_type != "L3" and agent_type != "Metadata" and agent_type != "DHCP" and agent_type != "Open vSwitch"
- local_action: shell . /root/openrc && neutron agent-list --format value --agent_type="{{ agent_type }} agent" --host={{ inventory_hostname }} -F id
name: "find agents for host"
failed_when: agent_id.rc != 0
register: agent_id
- local_action: shell . /root/openrc && neutron agent-update --admin-state-down --description "Disabled by Ansible" {{ item }}
name: "disable agent"
failed_when: result.rc != 0
register: result
with_items:
- "{{ agent_id.stdout_lines }}"
when: "admin_state=='down'"
- local_action: shell . /root/openrc && neutron agent-update --admin-state-up --description "" {{ item }}
name: "enable agent"
failed_when: result.rc != 0
register: result
with_items:
- "{{ agent_id.stdout_lines }}"
when: "admin_state=='up'"


@ -1,14 +0,0 @@
#!/usr/bin/env ansible-playbook -f 50
---
# This playbook requires the following variables:
# hosts - this is the host(s) that you are trying to run on. Any host in the hosts file is valid
#
# Author: Mike Dorman <mdorman@godaddy.com>
#
# Usage:
# ansible-playbook disable-glance-quota.yaml -k -K --extra-vars "hosts=glance-servers"
- hosts: '{{ hosts }}'
sudo: yes
tasks:
- include: ../tasks/turn-off-glance-quota.yaml


@ -1,15 +0,0 @@
#!/usr/bin/env ansible-playbook -f 50
---
# This playbook requires the following variables:
# hosts - this is the host(s) that you are trying to run on. Any host in the hosts file is valid
# value - value to set user_storage_quota to. Default 21474836480 (20 GB)
#
# Author: Mike Dorman <mdorman@godaddy.com>
#
# Usage:
# ansible-playbook enable-glance-quota.yaml -k -K --extra-vars "hosts=glance-servers value=21474836480"
- hosts: '{{ hosts }}'
sudo: yes
tasks:
- include: ../tasks/turn-on-glance-quota.yaml


@ -1,29 +0,0 @@
#!/usr/local/bin/ansible-playbook
# This playbook looks for "ghost" VMs which are qemu processes
# that are running but that have been deleted from nova.
# These can occur as a result of a failed migration or a bug
# in nova or in your nova backend. "ghost" VMs use system resources
# on your compute hosts and it's possible that your customers are
# still using them via a floating-ip but have lost ability
# to manage them.
#
# This script assumes that you have an openrc file in /root/openrc
# on the ansible host where you are running this from.
#
# Author: Matt Fischer <matt.fischer@twcable.com>
#
# Usage:
# ghost-vm-finder.yaml
---
- name: "Find Ghost VMs (exist as a qemu process, but not in nova)"
hosts: compute
serial: 10
gather_facts: no
tasks:
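# The ps pipeline below pulls the value that follows "uuid" out of each running
# qemu command line; "nova show" is then run against every UUID found, and any
# lookup that fails indicates a ghost VM (running in qemu but unknown to nova).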
- shell: "ps -ef | grep qemu | grep -v 'grep' | awk -F 'uuid' '{print $2}' | awk '{print $1}'"
name: "gather IDs from qemu"
register: qemu_list
- local_action: shell . /root/openrc && /usr/bin/nova show --minimal {{ item }}
name: "check IDs with nova"
with_items:
- "{{ qemu_list.stdout_lines }}"


@ -1,18 +0,0 @@
#!/usr/bin/env ansible-playbook --forks 50
---
#
# This playbook supports the following variables:
#
# Author: Kris Lindgren <klindgren@godaddy.com>
#
# hosts - host(s)/group(s) on which to run this playbook (REQUIRED)
#
# Example:
# To run without modifying the playbook, run ansible-playbook with the following:
# ansible-playbook run-puppet.yaml -k -K --extra-vars "hosts=compute-servers"
#
- hosts: '{{ hosts }}'
sudo: yes
tasks:
- include: /etc/ansible/playbooks/tasks/orphaned-vms.yaml


@ -1,42 +0,0 @@
#!/usr/bin/env ansible-playbook --forks 50
---
# Reboot a series of compute nodes in a rolling fashion, verifying all VMs come back up on
# each node before going on to reboot the next group.
#
# Author: Kris Lindgren <klindgren@godaddy.com>
#
# This playbook requires the following variables:
# api_server - this is the server that runs the nova-api instance that we will use to get a list of the running vm's on compute nodes
# hosts - this is the host group to perform the rolling reboot on typically would be: *-compute
# This playbook also accepts the following *OPTIONAL* variables:
# reboot_parallelism (5) - How many hosts to reboot at once
# reboot_check_port - This is the port to check to see if the server has come back online (22)
# wait_delay - This is how long to wait between checks (120 seconds)
# wait_timeout - This is the maximum time to wait until we move on (1200 seconds)
# pause_for_host_boot - This is the time to wait for the host to fully restart (3 minutes)
# Example:
# ansible-playbook compute-rolling-reboot.yaml -k -K --extra-vars "api_server=api01 hosts=compute"
- hosts: '{{ hosts }}'
sudo: yes
serial: "{{ reboot_parallelism | default('5') }}"
tasks:
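# The shell below splits each "nova list" row on whitespace and groups the
# results by compute host, producing one line per host of the form
#   <host>:<vm_uuid>+<instance_name>,<vm_uuid>+<instance_name>,...
# which the verification task below uses to confirm every VM came back up.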
- name: Gather list of all running vm's on the host
shell: source /root/keystonerc_admin; nova list --all-tenants --status Active --host {{inventory_hostname}} --fields host,OS-EXT-SRV-ATTR:instance_name,status | grep ACTIVE | awk -F" | " '{if(a[$4]){a[$4]=a[$4]","$2"+"$6} else { a[$4]=$2"+"$6}} END {for (i in a) { print i":"a[i]} }'
register: running_vms
delegate_to: '{{ api_server }}'
- include: ../tasks/rolling-reboot.yaml
- name: ensure that nova-compute is started
service: name=openstack-nova-compute state=started
register: novacompute
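# The verification shell below extracts this host's expected VM list from the
# pre-reboot snapshot, compares it against "virsh list", prints the UUID of any
# VM that did not come back, and returns non-zero if anything is missing.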
- name: Verify running vm's are still running
shell: rc=$(echo "0"); vmlist=$( echo "{{running_vms.stdout }}" | grep {{inventory_hostname }} |cut -d":" -f2,2 |awk -F"," '{for (i=1; i<=NF; i++) print $i}'); virshlist=$( virsh list | grep running | awk '{print $2}'); for i in $vmlist; do vm=$( echo $i | cut -d"+" -f2,2 ); tmp=$( echo "$virshlist" | grep $vm); if [ $? -eq 1 ]; then uuid=$( echo "$i" | cut -d"+" -f1,1); echo "$uuid"; rc=$(echo "1"); fi; done; if [ "$rc" == "1" ]; then false; else true; fi
register: vms_not_running
when: novacompute.state == "started"
- debug: msg="{{vms_not_running}}"
when: vms_not_running.rc == 1


@ -1,112 +0,0 @@
#!/usr/bin/env ansible-playbook --forks 50
---
# Distribute/prestage Glance image out to many compute nodes at once using BitTorrent
#
# Author: Kris Lindgren <klindgren@godaddy.com>
#
# Sets up a BitTorrent swarm, seeded by the machine running Glance, to distribute a Glance image out to any number
# of nova-compute nodes very quickly. So when we roll new gold images each month, we can use this to
# "push" them out to all compute nodes, and avoid the first provision penalty of waiting for the image
# to transfer the first time.
#
# Caveats:
# * Only works when the Glance backend is on a traditional (local or network-based) filesystem. Almost
# certainly this does not work, and may not even make sense, for Swift-backed Glance.
# * Firewalls or other traffic filters need to allow the BitTorrent ports through, and among, the Glance
# server and all compute nodes.
# * Assumes Glance images are stored at `/var/lib/glance/images` on the Glance server.
# * There are some situations where this cannot be run multiple times, if the tracker or other BitTorrent
# processes are still running. So use caution, and YMMV.
# * This is done completely outside the scope of Glance and Nova. There is no Keystone authentication or
# access controls. You must have ssh and sudo access to all machines involved for this to work.
#
# This playbook requires the following variables:
# image_uuid - this can be gotten from the output of either nova image-list or glance image-list for the image you want to prestage
# image_sha1 - this can be gotten by running: echo -n "<image_uuid>" | sha1sum | awk '{print $1}' on any Linux box
# image_md5 - this can be gotten by running: md5sum /var/lib/glance/images/<image_uuid> | awk '{print $1}' on the glance server
# tracker_host - this is the host that runs the tracker (also this is the same host in the first and second play)
# hosts_to_update - this is the host group to place the image onto typically *-compute
# To run without modifying the playbook for different UUIDs, run ansible-playbook with the following:
# ansible-playbook template-prestage.yaml -k -K --extra-vars "image_uuid=41009dbd-52f5-4972-b65f-c429b1d42f5f image_sha1=1b8cddc7825df74e19d0a621ce527a0272541c35 image_md5=41d45920d859a2d5bd4d1ed98adf7668 tracker_host=api01 hosts_to_update=compute"
- hosts: '{{ tracker_host }}'
sudo: yes
vars:
image_uuid: '{{ image_uuid }}'
tracker_host: '{{ tracker_host }}'
tasks:
- name: install ctorrent client
yum: name=ctorrent state=present
- name: install opentracker-ipv4
yum: name=opentracker-ipv4 state=present
- name: make sane
shell: "killall -9 opentracker-ipv4 | true; killall -9 ctorrent | true;"
- name: Start Tracker
command: "{{item}}"
with_items:
- /usr/bin/opentracker-ipv4 -m -p 6969 -P 6969 -d /var/opentracker
- name: Create bittorrent file
command: "{{item}}"
with_items:
- mkdir -p /var/www/html/torrent
- rm -rf /var/www/html/torrent/{{ image_uuid }}.torrent
- /usr/bin/ctorrent -t -s /var/www/html/torrent/{{ image_uuid }}.torrent -u http://{{ tracker_host }}:6969/announce -c Testfile /var/lib/glance/images/{{ image_uuid }}
- name: Seed Bittorrent file
command: /usr/bin/ctorrent -d -U 50000 -s /var/lib/glance/images/{{ image_uuid }} /var/www/html/torrent/{{ image_uuid }}.torrent
- hosts: '{{hosts_to_update}}'
sudo: yes
vars:
image_uuid: '{{ image_uuid }}'
image_sha1: '{{ image_sha1 }}'
image_md5: '{{ image_md5 }}'
tracker_host: '{{ tracker_host }}'
tasks:
- name: install ctorrent client
yum: name=ctorrent state=present
- name: Check if image exists
stat: path=/var/lib/nova/instances/_base/{{ image_sha1 }}
register: image
- name: make sane
shell: "killall -9 ctorrent | true; iptables -D INPUT -p tcp --dport 2704:2706 -j ACCEPT | true"
when: image.stat.exists == False
- name: Download Torrent File and run torrent
command: "{{item}}"
with_items:
- /sbin/iptables -I INPUT -p tcp --dport 2704:2706 -j ACCEPT
- /usr/bin/wget http://{{ tracker_host }}/torrent/{{ image_uuid }}.torrent
- /usr/bin/ctorrent -e 0 -m 10 -U 30000 -D 80000 -p 2706 -s /var/lib/elsprecachedir/{{ image_uuid }} {{ image_uuid }}.torrent
when: image.stat.exists == False
- name: ensure md5sum matches
shell: "md5sum /var/lib/elsprecachedir/{{ image_uuid }} | grep {{ image_md5 }}"
when: image.stat.exists == False
- name: Convert image to raw file
command: "{{item}}"
with_items:
- /usr/bin/qemu-img convert -f qcow2 -O raw /var/lib/elsprecachedir/{{ image_uuid }} /var/lib/nova/instances/_base/{{ image_sha1 }}
- /bin/chown nova:qemu /var/lib/nova/instances/_base/{{ image_sha1 }}
- /bin/chmod 644 /var/lib/nova/instances/_base/{{ image_sha1 }}
when: image.stat.exists == False
- name: Cleanup
shell: "/sbin/iptables -D INPUT -p tcp --dport 2704:2706 -j ACCEPT | true; rm -rf {{ image_uuid }}*; rm -rf /var/lib/elsprecachedir/{{ image_uuid }}; killall -9 ctorrent | true"
when: image.stat.exists == False
- hosts: '{{ tracker_host }}'
sudo: yes
vars:
image_uuid: '{{ image_uuid }}'
tasks:
- name: Kill tracker and ctorrent and remove torrent file
shell: "killall -9 ctorrent | true ; killall -9 opentracker-ipv4 | true; rm -rf /var/www/html/torrent/{{ image_uuid }}"


@ -1,3 +0,0 @@
- name: Running puppet deployment script
shell: "tools/remove-deleted-orphans.sh"


@ -1,21 +0,0 @@
---
- name: Rebooting Server
shell: sleep 2 && /sbin/shutdown -r now &
tags: reboot
- name: Waiting for port to go down from server reboot
wait_for: host={{ inventory_hostname }} port={{ reboot_check_port | default('22') }} timeout={{ wait_timeout | default('1200') }} state=stopped
connection: local
sudo: false
tags: reboot
- name: Waiting for port to come back after reboot
wait_for: host={{ inventory_hostname }} port={{ reboot_check_port | default('22') }} delay={{ wait_delay | default('120') }} timeout={{ wait_timeout | default('1200') }} state=started
connection: local
sudo: false
tags: reboot
- name: pausing to make sure host is fully booted
pause: minutes={{ pause_for_host_boot | default('3') }}
tags: reboot


@ -1,4 +0,0 @@
- name: Removing user_storage_quota setting from glance-api.conf
shell: "sed -r -i 's/^[[:space:]]*user_storage_quota/#user_storage_quota/g' /etc/glance/glance-api.conf"
- service: name=openstack-glance-api state=restarted


@ -1,3 +0,0 @@
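# The sed expression below replaces the first user_storage_quota line found in
# glance-api.conf (commented out or not) with "user_storage_quota = <value>",
# defaulting to 21474836480 bytes (20 GB); glance-api is then restarted.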
- name: Adding user_storage_quota setting to glance-api.conf
shell: "sed -r -i '0,/^#?[[:space:]]*user_storage_quota/s/^#?[[:space:]]*user_storage_quota[[:space:]]*=[[:space:]]*[[:digit:]]+/user_storage_quota = {{ value | default('21474836480') }}/' /etc/glance/glance-api.conf"
- service: name=openstack-glance-api state=restarted


@ -1,54 +0,0 @@
#!/bin/bash
# OpenStack credentials are expected to be in your environment variables
if [ -z "$OS_AUTH_URL" -o -z "$OS_PASSWORD" -o -z "$OS_USERNAME" ]; then
echo "Please set OpenStack auth environment variables."
exit 1
fi
# temp files for caching outputs
volume_ids=$(mktemp)
cinder_reported_tenants=$(mktemp)
keystone_tenants=$(mktemp)
final_report=$(mktemp)
# get a list of all cinder volumes and their owner
echo -en "Retrieving list of all volumes...\r"
# oh cinder...
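# (each line written to $volume_ids ends up as "<volume_id> <tenant_id>")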
for volume in `cinder list --all-tenants | tail -n +4 | awk '{print $2}'`; do
for line in `cinder show $volume | \
grep 'os-vol-tenant-attr:tenant_id\| id ' | awk '{print $4}'`; do
echo -en " $line" >> $volume_ids
done
echo "" >> $volume_ids
done
awk '{print $2}' < $volume_ids | sort -u > $cinder_reported_tenants
# get a list of all tenants, as reported by keystone
echo -en "Retrieving list of all tenants...\r"
keystone tenant-list | tail -n +4 | awk '{print $2}' | \
sort -u > $keystone_tenants
# some rough/poor formatting
echo "Comparing outputs to locate orphaned volumes...\r"
echo "+--------------------------------------+--------------------------------\
---+----------------------------+--------------+------+--------+"
echo "| volume_id | tenant_id \
| created_at | display_name | size | status |"
echo "+--------------------------------------+--------------------------------\
---+----------------------------+--------------+------+--------+"
for tenant_id in `comm --nocheck-order -13 \
$keystone_tenants $cinder_reported_tenants`; do
for volume_id in `grep $tenant_id $volume_ids | awk '{print $1}'`; do
echo -en "| $volume_id | $tenant_id |"
for attr in `cinder show $volume_id |\
grep ' status \| size \| display_name \| created_at ' |\
awk '{print $4}'`; do
echo -en " $attr |"
done
echo ""
done
done
# clean up after ourselves
rm $keystone_tenants $volume_ids $cinder_reported_tenants $final_report


@ -1,33 +0,0 @@
#!/usr/bin/env bash
#
# Run this script on a compute node to cleanup/remove any orphaned KVM VMs that were left behind by something.
# Run with --noop to do a dry run and not actually delete anything
#
# To populate the UUIDS value below, run the following command on a Nova api server to get list of VM UUIDs that are known to OpenStack:
# nova list --all-tenants | awk '{print $2;}' | grep -E '^[0-9a-f]+' | tr '\n' '|' | sed -r 's/\|$/\n/'
# Then paste in the results for UUIDS below OR define it in the environment before running this script.
#
# Author: Kris Lindgren <klindgren@godaddy.com>
#
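# Example of the expected format (UUIDs below are illustrative only):
# UUIDS="0b1c9476-7b0d-4be0-8c01-1c9ad9b4ff3a|91d2b967-55c8-43d2-a0b7-2c606faf6e4c"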
#UUIDS=""
if [ -z "$UUIDS" ]; then
echo "UUIDS value not defined"
exit 1
fi
for vm_uuid in `virsh list --uuid --all` ; do
echo $vm_uuid | grep -E "$UUIDS" >/dev/null
if [ $? -ne 0 ]; then
echo -n "+ $vm_uuid is NOT known to OpenStack, removing managedsave info... "
[ -z "$1" ] && virsh managedsave-remove $vm_uuid 1>/dev/null 2>&1
echo -n "destroying VM... "
[ -z "$1" ] && virsh destroy $vm_uuid 1>/dev/null 2>&1
echo -n "undefining VM... "
[ -z "$1" ] && virsh undefine $vm_uuid 1>/dev/null 2>&1
echo DONE
else
echo "* $vm_uuid is known to OpenStack, not removing."
fi
done


@ -1,140 +0,0 @@
#!/usr/bin/env ruby
#
# This script collects statistics of instances running on a libvirt-based
# compute node.
#
# It outputs these stats in a portable-ish way so that they can be stored
# in any type of backend. For example, we're storing these stats both in
# SQL and RRD. We then generate graphs through Grafana and link to the
# graphs in Horizon.
#
# The following stats are collected:
#
# * cpu usage and time
# * memory used and available
# * interface bytes, packets, errors, and drops
# * disk bytes, reqs, flushes, times, usage
#
# Output format:
# uuid stat_category k:v k:v k:v
#
# Notes:
# * This script tries to be as quiet as possible. If a stat is unable
# to be retrieved, the script either moves on to the next instance
# or prints empty values.
#
# * `nova diagnostics uuid` gives similar results, though could take
# longer to run cloud-wide. The reported memory for `diagnostics`
# looks more accurate -- I need to look into this.
#
# * cpu "usage" is only useful for short, bursty use-cases. Do not use
# it if the instance runs longer than an initial burst. To calculate
# cpu usage more accurately, focus on cpu time, real time, and number
# of cores.
#
# Any questions or comments, contact jtopjian
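# Example output (all values are illustrative only):
#   <uuid> instance instance:instance-0000abcd
#   <uuid> memory available:4194304 used:1048576
#   <uuid> cpu cpu_usage:12 cpu_time:34567.8
#   <uuid> interface interface:eth0 rx_bytes:1024 rx_packets:8 tx_bytes:2048 tx_packets:9
#   <uuid> disk disk:vda rd_req:100 rd_bytes:51200 wr_req:50 wr_bytes:25600 bytes_used:2147483648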
uuid_output = `cd /etc/libvirt/qemu; grep -H '<uuid>' instance-*.xml`
uuid_output.split("\n").each do |line|
output = []
line.gsub!(/<\/?uuid>/, '')
line.gsub!(/\s+/, '')
line.gsub!(/\.xml/, '')
(instance, uuid) = line.split(':')
# Instance ID
output << "#{uuid} instance instance:#{instance}"
# Get CPU time and memory usage
cpu_time = 0
dominfo_output = %x{ virsh dominfo #{instance} 2> /dev/null }
if $? == 0
dominfo = {}
dominfo_output.split("\n").each do |dominfo_line|
(dominfo_key, dominfo_value) = dominfo_line.downcase.split(/:\s+/)
dominfo[dominfo_key] = dominfo_value
end
next if dominfo['state'] != 'running'
cpu_time = dominfo['cpu time'].gsub('s', '')
available = dominfo['max memory'].gsub(' kib', '')
used = dominfo['used memory'].gsub(' kib', '')
output << "#{uuid} memory available:#{available} used:#{used}"
else
output << "#{uuid} memory available:0 used:0"
end
# Get CPU usage
pid = %x{ pgrep -f #{instance} 2> /dev/null }.chomp
if $? == 0
cpu_command = "ps -p #{pid} -o %cpu h"
cpu = %x{ #{cpu_command} 2> /dev/null }.gsub!(/\s+/, '')
output << "#{uuid} cpu cpu_usage:#{cpu.to_i} cpu_time:#{cpu_time}"
else
output << "#{uuid} cpu cpu_usage:0 cpu_time:0"
end
# Get interface usage
iflist_output = %x{ virsh domiflist #{instance} | grep vnet | cut -d' ' -f1 2> /dev/null }.chomp
if $? == 0
ifstat_output = %x{ virsh domifstat #{instance} #{iflist_output} 2> /dev/null }
if $? == 0
ifstats = []
ifstat_output.split("\n").each do |ifstat_line|
(interface, metric, value) = ifstat_line.split(/\s+/)
ifstats << "#{metric}:#{value}"
end
output << "#{uuid} interface interface:eth0 #{ifstats.join(' ')}"
end
end
# Get storage usage
blkstats = {}
{'disk' => 'vda', 'disk.local' => 'vdb'}.each do |disk, blk|
disk_path = nil
blkstats[blk] = []
if File.exists?("/var/lib/nova/instances/#{instance}/#{disk}")
disk_path = "/var/lib/nova/instances/#{instance}/#{disk}"
elsif File.exists?("/var/lib/nova/instances/#{uuid}/#{disk}")
disk_path = "/var/lib/nova/instances/#{uuid}/#{disk}"
end
if disk_path
blkstats[blk] = []
blkstat_output = %x{ virsh domblkstat #{instance} #{blk} 2> /dev/null }.chomp
if $? == 0
blkstat_output.split("\n").each do |blkstat_line|
(blk, metric, value) = blkstat_line.split(/\s+/)
blkstats[blk] << "#{metric}:#{value}"
end
qemu_output = %x{ qemu-img info #{disk_path} | grep ^disk | cut -d' ' -f3 2> /dev/null }.chomp
if $? == 0
if qemu_output =~ /K/
qemu_output.gsub!('K', '')
qemu_output = qemu_output.to_i * 1024
end
if qemu_output =~ /M/
qemu_output.gsub!('M', '')
qemu_output = qemu_output.to_i * 1024 * 1024
end
if qemu_output =~ /G/
qemu_output.gsub!('G', '')
qemu_output = qemu_output.to_i * 1024 * 1024 * 1024
end
blkstats[blk] << "bytes_used:#{qemu_output}"
end
end
end
end
blkstats.each do |drive, stats|
output << "#{uuid} disk disk:#{drive} #{stats.join(' ')}"
end
puts output.join("\n")
end


@ -1,40 +0,0 @@
#!/bin/bash
#
# This script will look at the configured VMs and check to make sure that
# their disk drive still exists. If not, it will remove the VM from
# libvirt. This fixes the nova errors about disks missing from VMs.
#
# Author: Kris Lindgren <klindgren@godaddy.com>
removeorphan(){
local domain
local tmp
domain=$1
tmp=$( virsh destroy $domain )
tmp=$( virsh undefine $domain )
tmp=$(virsh list --all | grep $domain )
if [ $? -eq 1 ]; then
tmp=$( ps auxwwwf | grep $domain | grep -v grep )
if [ $? -eq 1 ]; then
return 0
fi
fi
return 1
}
for i in /etc/libvirt/qemu/*.xml; do
disklocation=$( grep /var/lib/nova/instances $i | grep disk | \
cut -d"'" -f2,2)
if [ ! -e $disklocation ]; then
orphan=$(echo $i | cut -d"/" -f5,5 | cut -d"." -f1,1)
echo "$orphan does not have a disk located at: $disklocation"
echo "This is an orphan of openstack... stopping the orphaned vm."
removeorphan $orphan
if [ $? -eq 0 ]; then
echo "Domain $orphan has been shutdown and removed"
else
echo "Domain $orphan has *NOT* been shutdown and removed"
fi
fi
done


@ -1,49 +0,0 @@
#!/bin/bash
#
# Copyright 2016 Workday, Inc. All Rights Reserved.
#
# Author: Edgar Magana <edgar.magana@workday.com>
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
# This script decodes the information in /proc/cpuinfo and
# produces a human readable version displaying:
# - Total number of physical CPUs
# - Total number of logical CPUs
# - Model of the chipset
#
# Default linux file for CPU information
CPUFILE=/proc/cpuinfo
NUMPHY=`grep "physical id" $CPUFILE | sort -u | wc -l`
NUMLOG=`grep "processor" $CPUFILE | wc -l`
if [ $NUMPHY -eq 1 ]; then
echo This system has one physical CPU,
else
echo This system has $NUMPHY physical CPUs,
fi
if [ $NUMLOG -gt 1 ]; then
echo and $NUMLOG logical CPUs
NUMCORE=`grep "core id" $CPUFILE | sort -u | wc -l`
if [ $NUMCORE -gt 1 ]; then
echo For every physical CPU there are $NUMCORE cores.
fi
else
echo and one logical CPU.
fi
echo -n The CPU is a `grep "model name" $CPUFILE | sort -u | cut -d : -f 2-`
echo " with`grep "cache size" $CPUFILE | sort -u | cut -d : -f 2-` cache"


@ -1,52 +0,0 @@
# Neutron Orphan Cleanup Tools
Provides a simple set of scripts to aid in the cleanup of orphaned resources in Neutron. Current
scripts include:
* list_orphans.py - List orphaned networks, subnets, routers and floating IPs.
* delete_orphan_floatingips.py - Cleanup floating IPs without any associated ports.
* delete_tenantless_floatingips.py - Cleanup floating IPs without an associated tenant / project.
### Installation
Scripts work with Python 2.7 or newer, including Python 3. It is suggested you install any Python
scripts in a separate Python virtualenv to prevent spoiling your system's Python environment.
Create a new VirtualEnv and assume the VirtualEnv
```
virtualenv orphan_tools
source orphan_tools/bin/activate
```
Install dependencies
`pip install -r requirements.txt`
### Usage
Export OpenStack credentials as environment variables
```
export OS_USERNAME=test
export OS_PASSWORD=mys3cr3t
export OS_AUTH_URL=https://controller:5000/v2.0
export OS_TENANT_NAME=test
export OS_REGION_NAME=my_region
```
List orphaned Neutron resources
`python list_orphans.py`
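For each object type the script prints a count followed by the orphaned IDs; the output shape below is taken from the script itself, with illustrative IDs:
```
2 orphan(s) found of type networks
8d5e1a0c-8f9d-4a6b-9a1a-1c2d3e4f5a6b
f0e1d2c3-b4a5-4968-8778-695a4b3c2d1e
```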
Before you delete orphaned resources, it is recommended that you do a dry run and check that the script
proposes to delete the resources you expect:
`python delete_orphan_floatingips.py --dry-run`
Once you are happy with the items returned by the dry run, remove the --dry-run flag
to perform the deletion:
`python delete_orphan_floatingips.py`


@ -1,43 +0,0 @@
#!/usr/bin/env python
"""
This script deletes all the floatingips a user has that are not
associated with a port_id.
"""
import os
import sys
from neutronclient.v2_0 import client
def main():
dry_run = (len(sys.argv) > 1 and sys.argv[1] == '--dry-run')
try:
username = os.environ['OS_USERNAME']
tenant_name = os.environ['OS_TENANT_NAME']
password = os.environ['OS_PASSWORD']
auth_url = os.environ['OS_AUTH_URL']
region_name = None
if 'OS_REGION_NAME' in os.environ:
region_name = os.environ['OS_REGION_NAME']
except KeyError:
print("You need to source your openstack creds file first!")
sys.exit(1)
neutron = client.Client(username=username,
tenant_name=tenant_name,
password=password,
auth_url=auth_url,
region_name=region_name)
floatingips = neutron.list_floatingips()
for floatingip in floatingips['floatingips']:
if not floatingip['port_id']:
print(("Deleting floatingip %s - %s") %
(floatingip['id'], floatingip['floating_ip_address']))
if not dry_run:
neutron.delete_floatingip(floatingip['id'])
if __name__ == "__main__":
main()


@ -1,53 +0,0 @@
#!/usr/bin/env python
"""
This script deletes all the floatingips a user has that are not
associated with a tenant.
"""
import os
import sys
import keystoneclient.v2_0.client as ksclient
from neutronclient.v2_0 import client
def main():
dry_run = (len(sys.argv) > 1 and sys.argv[1] == '--dry-run')
try:
username = os.environ['OS_USERNAME']
tenant_name = os.environ['OS_TENANT_NAME']
password = os.environ['OS_PASSWORD']
auth_url = os.environ['OS_AUTH_URL']
region_name = None
if 'OS_REGION_NAME' in os.environ:
region_name = os.environ['OS_REGION_NAME']
except KeyError:
print("You need to source your openstack creds file first!")
sys.exit(1)
neutron = client.Client(username=username,
tenant_name=tenant_name,
password=password,
auth_url=auth_url,
region_name=region_name)
keystone = ksclient.Client(username=username,
tenant_name=tenant_name,
password=password,
auth_url=auth_url,
region_name=region_name)
floatingips = neutron.list_floatingips()
for floatingip in floatingips['floatingips']:
try:
keystone.tenants.get(floatingip['tenant_id'])
# If the tenant ID doesn't exist, then this object is orphaned
except ksclient.exceptions.NotFound:
print(("Deleting floatingip %s - %s") %
(floatingip['id'], floatingip['floating_ip_address']))
if not dry_run:
neutron.delete_floatingip(floatingip['id'])
if __name__ == "__main__":
main()


@ -1,57 +0,0 @@
#!/usr/bin/env python
import os
import sys
import keystoneclient.v2_0.client as ksclient
import neutronclient.v2_0.client as nclient
def get_credentials():
credentials = {}
credentials['username'] = os.environ['OS_USERNAME']
credentials['password'] = os.environ['OS_PASSWORD']
credentials['auth_url'] = os.environ['OS_AUTH_URL']
credentials['tenant_name'] = os.environ['OS_TENANT_NAME']
if 'OS_REGION_NAME' in os.environ:
credentials['region_name'] = os.environ['OS_REGION_NAME']
return credentials
CREDENTIALS = get_credentials()
NEUTRON = nclient.Client(**CREDENTIALS)
KEYSTONE = ksclient.Client(**CREDENTIALS)
def usage():
print("listorphans.py <object> where object is one or more of ")
print("'networks', 'routers', 'subnets', 'floatingips' or 'all'")
def get_tenantids():
return [tenant.id for tenant in KEYSTONE.tenants.list()]
def get_orphaned_neutron_objects(neutron_obj):
neutron_objs = getattr(NEUTRON, 'list_' + neutron_obj)()
tenantids = get_tenantids()
orphans = []
for neutron_obj in neutron_objs.get(neutron_obj):
if neutron_obj['tenant_id'] not in tenantids:
orphans.append(neutron_obj['id'])
return orphans
if __name__ == '__main__':
if len(sys.argv) > 1:
if sys.argv[1] == 'all':
neutron_objs = ['networks', 'routers', 'subnets', 'floatingips']
else:
neutron_objs = sys.argv[1:]
for neutron_obj in neutron_objs:
orphans = get_orphaned_neutron_objects(neutron_obj)
print('%s orphan(s) found of type %s' % (len(orphans),
neutron_obj))
print('\n'.join(map(str, orphans)))
else:
usage()
sys.exit(1)


@ -1,2 +0,0 @@
python-keystoneclient
python-neutronclient


@ -1,35 +0,0 @@
#!/bin/bash
usage() {
echo "Usage: $0 [-n] [-q]"
echo "-n: Dry Run. Don't update the database"
echo "-q: Quiet mode. Only show incorrect quotas"
exit 1
}
while getopts ":nq" opt ; do
case ${opt} in
n)
base_msg="[DRY RUN] "
args="${args} -n"
;;
q)
args="${args} -q"
;;
*)
usage
;;
esac
done
echo "$(date): Tenant quota correction - started"
for x in $(keystone --insecure tenant-list | awk -F' |\
' '!/^\+/ && !/\ id\ / {print $2}'); do
msg="${base_msg}Correcting quota for tenant ${x}"
echo ${msg}
python ./auto-fix-quota.py ${args} --tenant ${x}
done
echo "$(date): Tenant quota correction - finished"


@ -1,197 +0,0 @@
#!/usr/bin/python
"""
Author: amos.steven.davis@hp.com
Description: Fix nova quota usage in the nova database when the actual usage
and what nova thinks the usage is do not match.
"""
from nova import db
from nova import config
from nova import context
from nova import exception
from collections import OrderedDict
import argparse
import prettytable
def make_table(name, *args):
q = prettytable.PrettyTable(name)
q.align = "c"
q.add_row(args[0])
return q
def get_actual_usage(cntxt, tenant):
filter_object = {'deleted': '',
'project_id': tenant}
instances = db.instance_get_all_by_filters(cntxt, filter_object)
# calculate actual usage
actual_instance_count = len(instances)
actual_core_count = 0
actual_ram_count = 0
for instance in instances:
actual_core_count += instance['vcpus']
actual_ram_count += instance['memory_mb']
actual_secgroup_count = len(db.security_group_get_by_project(cntxt, tenant))
if actual_secgroup_count == 0:
actual_secgroup_count = 1 # Every tenant uses quota for default security group
return OrderedDict((
("actual_instance_count", actual_instance_count),
("actual_core_count", actual_core_count),
("actual_ram_count", actual_ram_count),
("actual_secgroup_count", actual_secgroup_count)
))
def get_incorrect_usage(cntxt, tenant):
existing_usage = db.quota_usage_get_all_by_project(cntxt, tenant)
# {u'ram': {'reserved': 0L, 'in_use': 0L},
# u'floating_ips': {'reserved': 0L, 'in_use': 1L},
# u'instances': {'reserved': 0L, 'in_use': 0L},
# u'cores': {'reserved': 0L, 'in_use': 0L},
# 'project_id': tenant,
# u'security_groups': {'reserved': 0L, 'in_use': 1L}}
#
# Get (instance_count, total_cores, total_ram) for project.
# If a usage key does not exist, fall back to the defaults below.
try:
security_groups = existing_usage["security_groups"]["in_use"]
except KeyError:
security_groups = 1
try:
instances = existing_usage["instances"]["in_use"]
except KeyError:
instances = 0
try:
cores = existing_usage["cores"]["in_use"]
except KeyError:
cores = 0
try:
ram = existing_usage["ram"]["in_use"]
except KeyError:
ram = 0
return OrderedDict((
("db_instance_count", instances),
("db_core_count", cores),
("db_ram_count", ram),
("db_secgroup_count", security_groups)
))
def fix_usage(cntxt, tenant):
# Get per-user data for this tenant since usage is now per-user
filter_object = {'project_id': tenant}
instance_info = db.instance_get_all_by_filters(cntxt, filter_object)
usage_by_resource = {}
#resource_types = ['instances', 'cores', 'ram', 'security_groups']
states_to_ignore = ['error', 'deleted', 'building']
for instance in instance_info:
user = instance['user_id']
# We need to build a list of users who have launched vm's even if the user
# no longer exists. We can't use keystone here.
if not usage_by_resource.has_key(user):
usage_by_resource[user] = {} # Record that this user has once used resources
if not instance['vm_state'] in states_to_ignore:
user_resource = usage_by_resource[user]
user_resource['instances'] = user_resource.get('instances', 0) + 1
user_resource['cores'] = user_resource.get('cores', 0) + instance['vcpus']
user_resource['ram'] = user_resource.get('ram', 0) + instance['memory_mb']
secgroup_list = db.security_group_get_by_project(cntxt, tenant)
for group in secgroup_list:
user = group.user_id
if not usage_by_resource.has_key(user):
usage_by_resource[user] = {} # Record that this user has once used resources
user_resource = usage_by_resource[user]
user_resource['security_groups'] = user_resource.get('security_groups', 0) + 1
# Correct the quota usage in the database
for user in usage_by_resource:
for resource in resource_types:
usage = usage_by_resource[user].get(resource, 0)
try:
db.quota_usage_update(cntxt, tenant, user, resource, in_use=usage)
except exception.QuotaUsageNotFound as e:
print e
print 'db.quota_usage_update(cntxt, %s, %s, %s, in_use=%s)' % \
(tenant, user, resource, usage)
def print_usage(cntxt, tenant):
actual_table_name = ["Actual Instances",
"Actual Cores",
"Actual RAM",
"Actual Security_Groups"]
# these are spaced so that the Quota & DB tables match in size
incorrect_table_name = [" DB Instances ",
" DB Cores ",
" DB RAM ",
" DB Security_Groups "]
print "############### Actual Usage (including non-active instances) ###############"
print make_table(actual_table_name, get_actual_usage(cntxt, tenant).values())
print "############### Database Usage ###############"
print make_table(incorrect_table_name, get_incorrect_usage(cntxt, tenant).values())
resource_types = ['instances', 'cores', 'ram', 'security_groups']
config.parse_args(['filename', '--config-file', '/etc/nova/nova.conf'])
# Get other arguments
parser = argparse.ArgumentParser(
description='Fix quota differences between reality and the database')
parser.add_argument('--tenant', help='Specify tenant', required=True)
parser.add_argument('-n', '--dryrun', help='Dry Run - don\'t update the database',
action="store_true")
parser.add_argument('-q', '--quiet', help='Quiet mode. Only show incorrect quotas',
action="store_true")
args = parser.parse_args()
tenant = args.tenant
# Get admin context
cxt = context.get_admin_context()
# if the actual usage & the quota tracking differ,
# update quota to match reality
try:
actual = get_actual_usage(cxt, tenant).values()
incorrect = get_incorrect_usage(cxt, tenant).values()
except:
exit(2)
if actual == incorrect:
if not args.quiet:
print_usage(cxt, tenant)
print "%s quota is OK" % tenant
exit(0)
else:
print "%s usage and database differ" % tenant
print_usage(cxt, tenant)
if args.dryrun:
print "Dry Run Mode Enabled - not correcting the quota database."
exit(1)
else:
print "Updating quota usage to reflect actual usage..\n"
fix_usage(cxt, tenant)
print_usage(cxt, tenant)
# This section can replace the final if/else statement to allow prompting for
# each tenant before changes happen
# if get_incorrect_usage(cxt,tenant).values() == get_actual_usage(cxt,tenant).values():
# print "%s quota is OK" % tenant
# else:
# if raw_input("Enter 'YES' to make the Database Usage match the Actual Usage. " \
# "This will modify the Nova database: ") != "YES":
# print "Exiting."
# exit(0)
# else:
# fix_usage(cxt,tenant,actual_table_name,incorrect_table_name)


@ -1,24 +0,0 @@
#!/bin/bash
#
# A quick and dirty script to backfill empty config drive
# images to VMs that don't already have a config drive.
#
# This is a workaround for the config drive bug described
# at https://bugs.launchpad.net/nova/+bug/1356534
#
# Author: Mike Dorman <mdorman@godaddy.com>
cd /root
mkdir -p blank
mkisofs -o blank.iso blank/ >/dev/null 2>&1
rmdir blank
for i in `ls /var/lib/nova/instances | \
grep -E '[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}'`; do
ls -l /var/lib/nova/instances/$i/disk.config
if [ ! -s /var/lib/nova/instances/$i/disk.config ]; then
echo "$i config drive doesn't exist, or is size zero."
cp -f /root/blank.iso /var/lib/nova/instances/$i/disk.config
chown qemu:qemu /var/lib/nova/instances/$i/disk.config
fi
done

View File

@ -1,22 +0,0 @@
#!/bin/bash
#
# Outputs a tab-delimited list of all VMs with these fields:
# [Hypervisor Host] [UUID] [Status] [IP Address] [Name]
#
# Author: Mike Dorman <mdorman@godaddy.com>
IFS="
"
for i in `nova list --all-tenants | grep -v '^+-' | grep -v '^| ID' |\
cut -d "|" -f 2,3,5 | sed -e "s/ *| */,/g" -e "s/^ *//g"` ; do
ID=`echo $i | cut -d, -f 1`
NAME=`echo $i | cut -d, -f 2`
STATUS=`echo $i | cut -d, -f 3`
SHOW=`nova show ${ID}`
HV=`echo "${SHOW}" | grep OS-EXT-SRV-ATTR:host | awk '{print $4;}'`
IP=`echo "${SHOW}" | grep " network" | sed -e "s/.*network *| //" -e "s/ *| *$//"`
echo -e "${HV}\t${ID}\t${STATUS}\t${IP}\t${NAME}"
done
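# Example output line for the fields above (tab-separated; the hostname,
# UUID, address and name are illustrative):
#   compute-01    4f1c9f0a-7a2b-4c3d-9e8f-001122334455    ACTIVE    10.0.0.12    web-01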

View File

@ -1,32 +0,0 @@
#!/bin/bash
#
# Lists VMs which have been orphaned from their tenant (i.e. the tenant
# was removed while VMs still existed in it).
#
# Author: Kris Lindgren <klindgren@godaddy.com>
echo "THIS SCRIPT NEED TO HAVE keystonerc sourced to work"
sleep 5
echo "Getting a list of vm's from nova..."
novavmsraw=$( nova list --all-tenants --fields name,tenant_id,user_id )
echo "done."
echo "Getting a list of tenants from keystone...."
keystoneraw=$( keystone tenant-list )
echo "done."
novatenants=$( echo "$novavmsraw" | awk '{print $6}' | sort | uniq |\
grep -v Tenant )
echo "Starting to list vm's that are no longer attached to a tenant..."
echo "Fields are:"
echo "| VM ID | \
VM Name | Tenant Id | \
User Id |"
for i in $novatenants; do
tmp=$( echo "$keystoneraw" | grep $i )
if [ $? -eq 0 ]; then
continue
else
vms=$( echo "$novavmsraw" | grep $i )
echo "$vms"
fi
done

View File

@ -1,174 +0,0 @@
#!/bin/bash
#
# Copyright 2012 Hewlett-Packard Development Company, L.P. All Rights Reserved.
#
# Author: Simon McCartney <simon.mccartney@hp.com>
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
# purge tables of "deleted" records by archiving them in sensible chunks to
# the shadow tables. This work was started in PAASINFRA-206.
#
# default to archiving all records flagged as deleted,
# use the -n option to enable dry run mode
unset DRY_RUN
# tables to archive deleted records from
DATABASE=nova
TABLES="security_group_rules security_group_instance_association \
security_groups instance_info_caches instances reservations"
FKTABLES="block_device_mapping instance_metadata instance_system_metadata \
instance_actions instance_faults virtual_interfaces fixed_ips \
security_group_instance_association migrations instance_extra"
TABLES="${TABLES} ${FKTABLES}"
## process the command line arguments
while getopts "hnad:H:u:p:" opt; do
case $opt in
h)
echo "openstack_db_archive.sh - archive records flagged as deleted\
into the shadow tables."
echo "Records are archived from the following tables:"
echo
for TABLE in ${TABLES}; do
echo " ${DATABASE}.${TABLE}"
done
echo
echo "Options:"
echo " -n dry run mode - pass --dry-run to pt-archiver"
echo " -a no safe auto increment - pass --nosafe-auto-increment" \
"to pt-archiver"
echo " -d db name"
echo " -H db hostname"
echo " -u db username"
echo " -p db password"
echo " -h (show help)"
exit 0
;;
n)
DRY_RUN="--dry-run"
;;
a)
NOSAI="--nosafe-auto-increment"
;;
d)
DATABASE=${OPTARG}
;;
H)
HOSTPT=",h=${OPTARG}"
HOST="-h ${OPTARG}"
;;
u)
USERPT=",u=${OPTARG}"
USER="-u ${OPTARG}"
;;
p)
PASSPT=",p=${OPTARG}"
PASS="-p${OPTARG}"
;;
\?)
echo "Invalid option: -$OPTARG" >&2
exit 1
;;
:)
echo "Option -$OPTARG requires an argument." >&2
exit 1
;;
esac
done
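# Example invocations (a sketch; the host name and credentials are placeholders):
#   ./openstack_db_archive.sh -n                          # dry run, local DB
#   ./openstack_db_archive.sh -d nova -H db01.example.com -u nova -p secret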
echo
echo `date` "OpenStack Database Archiver starting.."
echo
echo `date` "Purging nova.instance_actions_events of deleted instance data"
# this is back to front (only delete if you can find a record in instances
# flagged for deletion)
# --where 'EXISTS(SELECT * FROM instance_actions, instances WHERE
# instance_actions.id=instance_actions_events.action_id AND
# instance_actions.instance_uuid=instances.uuid AND instances.deleted!=0)'
TABLE=instance_actions_events
SHADOW_TABLE="shadow_${TABLE}"
pt-archiver ${DRY_RUN} ${NOSAI} --statistics --sleep-coef 0.75 \
--progress 100 --commit-each --limit 10 \
--source D=${DATABASE},t=${TABLE}${HOSTPT}${USERPT}${PASSPT} \
--no-check-charset \
--dest D=${DATABASE},t=${SHADOW_TABLE}${HOSTPT}${USERPT}${PASSPT} \
--where 'EXISTS(SELECT * FROM instance_actions, instances WHERE '\
'instance_actions.id=instance_actions_events.action_id AND '\
'instance_actions.instance_uuid=instances.uuid AND instances.deleted!=0)'
for TABLE in ${FKTABLES}; do
echo `date` "Purging nova.${TABLE} of deleted instance data"
# this is back to front (only delete if you can find a record in instances
# flagged for deletion)
# --where 'EXISTS(SELECT * FROM instances WHERE deleted!=0 AND \
# uuid='${TABLE}'.instance_uuid)'
# to delete where there is no active record:
# --where 'NOT EXISTS(SELECT * FROM instances WHERE deleted=0 AND \
# uuid='${TABLE}'.instance_uuid)'
SHADOW_TABLE="shadow_${TABLE}"
pt-archiver ${DRY_RUN} ${NOSAI} --statistics --sleep-coef 0.75 \
--progress 100 --commit-each --limit 10 \
--source D=${DATABASE},t=${TABLE}${HOSTPT}${USERPT}${PASSPT} \
--no-check-charset \
--dest D=${DATABASE},t=${SHADOW_TABLE}${HOSTPT}${USERPT}${PASSPT} \
--where 'EXISTS(SELECT * FROM instances WHERE deleted!=0 '\
'AND uuid='${TABLE}'.instance_uuid)'
done
for TABLE in ${TABLES}; do
SHADOW_TABLE="shadow_${TABLE}"
ACTIVE_RECORDS=`mysql ${HOST} ${USER} ${PASS} \
-B -e "select count(id) from ${DATABASE}.${TABLE} where deleted=0" \
| tail -1`
DELETED_RECORDS=`mysql ${HOST} ${USER} ${PASS} -B -e \
"select count(id) from ${DATABASE}.${TABLE} where deleted!=0" | tail -1`
LOCAL_ABORTS=`mysql ${HOST} ${USER} ${PASS} -B -e \
"SHOW STATUS LIKE 'wsrep_%'" | \
grep -e wsrep_local_bf_aborts -e wsrep_local_cert_failures`
echo
echo
echo `date` "Archiving ${DELETED_RECORDS} records to ${SHADOW_TABLE} from \
${TABLE}, leaving ${ACTIVE_RECORDS}"
echo `date` "LOCAL_ABORTS before"
echo ${LOCAL_ABORTS}
pt-archiver ${DRY_RUN} ${NOSAI} --statistics --progress 100 \
--commit-each --limit 10 \
--source D=${DATABASE},t=${TABLE}${HOSTPT}${USERPT}${PASSPT} \
--dest D=${DATABASE},t=${SHADOW_TABLE}${HOSTPT}${USERPT}${PASSPT} \
--ignore --no-check-charset --sleep-coef 0.75 \
--where "deleted!=0"
echo `date` "Finished archiving ${DELETED_RECORDS} to ${SHADOW_TABLE} from\
${TABLE}"
echo `date` "LOCAL_ABORTS before"
echo ${LOCAL_ABORTS}
LOCAL_ABORTS=`mysql ${HOST} ${USER} ${PASS} -B -e \
"SHOW STATUS LIKE 'wsrep_%'" | \
grep -e wsrep_local_bf_aborts -e wsrep_local_cert_failures`
echo `date` "LOCAL_ABORTS after"
echo ${LOCAL_ABORTS}
echo
done
echo
echo `date` "OpenStack Database Archiver finished."
echo

View File

@ -1,69 +0,0 @@
#!/bin/bash
#
# Copyright 2012 Hewlett-Packard Development Company, L.P. All Rights Reserved.
#
# Author: Simon McCartney <simon.mccartney@hp.com>
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
# Report on the current state of unarchived records in the main nova.* tables
DATABASE=nova
FKTABLES="block_device_mapping instance_metadata instance_system_metadata \
instance_actions instance_faults virtual_interfaces fixed_ips \
security_group_instance_association migrations instance_extra"
TABLES="${TABLES} ${FKTABLES}"
function usage {
echo "$0: Report on the current state of unarchived records in the\
main nova.* tables"
echo "Usage: $0 -d [database] -H [hostname] -u [username] -p [password]"
}
while getopts "d:H:u:p:" opt; do
case $opt in
d)
DATABASE=${OPTARG}
;;
H)
HOST="-h ${OPTARG}"
;;
u)
USER="-u ${OPTARG}"
;;
p)
PASS="-p${OPTARG}"
;;
*)
usage
exit 1
;;
esac
done
for TABLE in ${TABLES}; do
SHADOW_TABLE="shadow_${TABLE}"
ACTIVE_RECORDS=`mysql ${HOST} ${USER} ${PASS} -B -e \
"select count(id) from ${DATABASE}.${TABLE} where deleted=0" | tail -1`
DELETED_RECORDS=`mysql ${HOST} ${USER} ${PASS} -B -e \
"select count(id) from ${DATABASE}.${TABLE} where deleted!=0" | \
tail -1`
SHADOW_RECORDS=`mysql ${HOST} ${USER} ${PASS} -B -e \
"select count(id) from ${DATABASE}.${SHADOW_TABLE}" | tail -1`
TOTAL_RECORDS=`expr $ACTIVE_RECORDS + $DELETED_RECORDS + $SHADOW_RECORDS`
echo `date` "${DATABASE}.${TABLE} has ${ACTIVE_RECORDS}," \
"${DELETED_RECORDS} ready for archiving and ${SHADOW_RECORDS}" \
"already in ${SHADOW_TABLE}. Total records is ${TOTAL_RECORDS}"
done
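# Example invocation (a sketch; the script file name is hypothetical and the
# connection details are placeholders):
#   ./openstack_db_report.sh -d nova -H db01.example.com -u nova -p secret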

View File

@ -1,52 +0,0 @@
#!/bin/bash
# OpenStack credentials are expected to be in your environment variables
if [ -z "$OS_AUTH_URL" -o -z "$OS_PASSWORD" -o -z "$OS_USERNAME" ]; then
echo "Please set OpenStack auth environment variables."
exit 1
fi
# temp files used for caching outputs
vm_tenants=$(mktemp)
keystone_tenants=$(mktemp)
# get a list of all VMs in the cluster and who they belong to
echo -en "Retrieving list of all VMs...\r"
nova list --all-tenants --fields tenant_id | tail -n +4 | awk '{print $4}' |\
sort -u > $vm_tenants
total_vms=$(cat $vm_tenants | wc -l)
if [ $total_vms == 0 ]; then
echo "Zero VMs found. Exiting..."
rm -f $vm_tenants $keystone_tenants
exit 1
fi
# get a list of all tenants/projects in the cluster
echo -en "Retrieving list of all tenants...\r"
keystone tenant-list | tail -n +4 | awk '{print $2}' |\
sort -u > $keystone_tenants
total_tenants=$(cat $keystone_tenants | wc -l)
if [ $total_tenants == 0 ]; then
echo "Zero tenants found. Exiting..."
rm -f $vm_tenants $keystone_tenants
exit 1
fi
# compare all VM owners to all tenants as reported by keystone and print
# any VMs whose owner does not exist in keystone
echo -en "Comparing outputs to locate orphaned VMs....\r"
iter=0
for tenant_id in `comm --nocheck-order -13 $keystone_tenants $vm_tenants`; do
if [[ $iter == 0 ]]; then
nova list --all-tenants --tenant=$tenant_id \
--fields tenant_id,name,status,created,updated | head -n -1
let "iter++"
else
nova list --all-tenants --tenant=$tenant_id \
--fields tenant_id,name,status,created,updated | \
tail -n +4 | head -n -1
fi
done
# clean up after ourselves
rm $keystone_tenants $vm_tenants

View File

@ -1,506 +0,0 @@
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
"""
What is this ?!
---------------
This script is designed to monitor VMs resource utilization
WorkFlow
--------
1) List all domains at the host via libvirt API
2) Spawn a separate thread for each domain for periodic check of disk usage
3) Spawn a separate thread for each domain for periodic check of memory usage
4) Spawn a separate thread for each domain for periodic check of cpu usage
5) Spawn a separate thread for checking of total numbers for host
6) Wait and read log messages with stats...
How to stop "monitoring"
------------------------
Just press Ctrl+C (KeyboardInterrupt) and the script should gracefully stop
all the threads and exit.
How to configure
----------------
The script accepts one input argument - the path to a configuration file in
JSON format.
All options are optional (they have default values), so the configuration
file is optional as well.
Options:
* debug
bool, if True the logger will use DEBUG level, otherwise INFO level.
Defaults to False
* connection
str, URI of libvirt to connect
Defaults to "qemu:///system"
* disk_getinfo_method
str, The way to obtain information about the disk. There are 3
options available:
- "qemu" - using `qemu-img info` command
- "virsh" - via pulling volume pools and volumes in them by libvirt
API. like it is done in `virsh pool-list` and `virsh vol-info`
commands
- "guestfs" - mount all the disks and checks the actual size
(experimental, not checked actually)
Defaults to "qemu"
* host_check_interval
float, The interval in seconds to sleep between checking stats
Defaults to 5
* disk_check_interval
float, The interval in seconds to sleep between updating stats about disk
usage of a single VM.
Defaults to 10
* memory_check_interval
float, The interval in seconds to sleep between updating stats about ram
usage of a single VM.
Defaults to 5
* cpu_check_interval
float, The interval in seconds to sleep between updating stats about CPU
usage of a single VM.
Defaults to 1
* host_disk_utilization_alert
float, a number between 0 and 100. The host's disk usage percentage at
which a critical alert is raised
Defaults to 80
* vm_disk_utilization_alert
float, a number between 0 and 100. The VM's disk usage percentage at
which a critical alert is raised
Defaults to host_disk_utilization_alert value
* host_memory_utilization_alert
float, a number between 0 and 100. The host's RAM usage percentage at
which a critical alert is raised
Defaults to 80
* vm_memory_utilization_alert
float, a number between 0 and 100. The VM's RAM usage percentage at
which a critical alert is raised
Defaults to host_memory_utilization_alert value
"""
import collections
import logging
import sys
import subprocess
import time
import threading
import xml.etree.ElementTree
import json
import libvirt
LOG = logging.getLogger(__name__)
def set_config_defaults(config):
"""Setup all default for config options."""
config.setdefault("debug", False)
config.setdefault("connection", "qemu:///system")
config.setdefault("disk_getinfo_method", "qemu")
# intervals
config.setdefault("host_check_interval", 5)
config.setdefault("disk_check_interval", 10)
config.setdefault("memory_check_interval", 5)
config.setdefault("cpu_check_interval", 1)
# alerts
config.setdefault("host_disk_utilization_alert", 80)
config.setdefault("vm_disk_utilization_alert",
config["host_disk_utilization_alert"])
config.setdefault("host_memory_utilization_alert", 80)
config.setdefault("vm_memory_utilization_alert",
config["host_memory_utilization_alert"])
return config
class Disk(object):
_VIRSH_VOLUME_CACHE = {}
def __init__(self, vm, dump, connection, config):
self._conn = connection
self._config = config
self.vm = vm
self.dump = dump
self.path = dump.find("source").get("file")
# self.target = dump.find("target")
def _get_info_from_qemu_img(self):
output = subprocess.check_output(["qemu-img", "info", self.path])
allocation = None
capacity = None
for line in output.splitlines():
if line.startswith("virtual size"):
# it looks like `virtual size: 4.0G (4294967296 bytes)`
_w1, size, _w2 = line.rsplit(" ", 2)
allocation = int(size.replace("(", ""))
elif line.startswith("disk size"):
size = line.split(" ")[2]
try:
capacity = float(size)
except ValueError:
from oslo_utils import strutils
capacity = strutils.string_to_bytes("%sB" % size,
return_int=True)
if allocation is None or capacity is None:
raise Exception("Failed to parse output of `qemu-img info %s`." %
self.path)
return capacity, allocation
def _get_info_from_virsh_vol_info(self):
# use the class level cache to not load all pools and volumes for each
# disk
cache = self._VIRSH_VOLUME_CACHE
if self.path not in cache:
# try to load all volumes, keyed by their own path
for pool in self._conn.listAllStoragePools():
for volume in pool.listAllVolumes():
cache[volume.path()] = volume
# it should appear after load
if self.path not in cache:
raise Exception("Failed to find %s volume." % self.path)
_something, capacity, allocation = cache[self.path].info()
return capacity, allocation
def _get_info_from_guestfs(self):
import guestfs
capacity = 0
allocation = 0
g = guestfs.GuestFS()
g.add_drive_opts(self.path, format="raw", readonly=1)
g.launch()
file_systems = g.list_filesystems()
for fs in file_systems:
if fs[1] not in ["", "swap", "unknown"]:
g.mount(fs[0], "/")
st = g.statvfs("/")
capacity += (st.f_blocks * st.f_frsize)
allocation += (st.f_blocks - st.f_bfree) * st.f_frsize
g.umount_all()
g.close()
return capacity, allocation
def info(self):
LOG.debug("Fetching info of %s disk." % self.path)
if self._config["disk_getinfo_method"] == "guestfs":
return self._get_info_from_guestfs()
elif self._config["disk_getinfo_method"] == "virsh":
return self._get_info_from_virsh_vol_info()
else:
return self._get_info_from_qemu_img()
class VM(object):
def __init__(self, domain, connection, config):
self._conn = connection
self._config = config
self.id = domain.ID()
self.uuid = domain.UUIDString()
self.name = domain.name()
self.dump = xml.etree.ElementTree.fromstring(domain.XMLDesc())
self._disks = None
# leave the original object just in case
self._domain = domain
@property
def disks(self):
if self._disks is None:
self._disks = []
for disk in self.dump.findall(".//disk"):
if disk.get("device") != "disk" or disk.get("type") != "file":
continue
self._disks.append(Disk(self, disk, self._conn, self._config))
return self._disks
def memory_utilization(self):
try:
stats = self._domain.memoryStats()
except libvirt.libvirtError:
if LOG.level == logging.DEBUG:
LOG.exception("Failed to retrieve memory info from %s VM." %
self.uuid)
# return a harmless (total, used) pair to avoid a division by
# zero in the caller
return 1, 0
total = stats["actual"]
# "available" key is missed when the VM just begin launching
used = total - stats.get("available", 0)
return total, used
def cpu_utilization(self):
try:
total = self._domain.getCPUStats(total=True)[0]
except libvirt.libvirtError:
if LOG.level == logging.DEBUG:
LOG.exception("Failed to retrieve CPU timings from %s VM." %
self.uuid)
return 0
# The statistics are reported in nanoseconds.
return total["cpu_time"] / 1000000000.
class Host(object):
def __init__(self, config):
conn = libvirt.openReadOnly(config["connection"])
if conn is None:
raise Exception("Failed to open connection to %s." %
config["connection"])
self._config = config
self._conn = conn
self.vms = set()
self._stats = {}
self._stop_event = threading.Event()
def _vm_disk_utilization(self, vm, interval):
while not self._stop_event.isSet() and vm.uuid in self.vms:
total_c = 0
total_a = 0
for disk in vm.disks:
try:
capacity, allocation = disk.info()
except Exception:
if LOG.level == logging.DEBUG:
LOG.exception("Error occurred while obtaining info "
"about disk (path=%s ; vm=%s)." %
(disk.path, vm.name))
continue
usage = capacity * 100.0 / allocation
LOG.debug("%(vm)s uses %(usage).4f%% of the disk %(file)s." % {
"vm": vm.name,
"usage": usage,
"file": disk.path
})
if usage >= self._config["vm_disk_utilization_alert"]:
LOG.critical("The VM %s uses too much (%.4f%%) of it's "
"disk %s!" % (vm.name, usage, disk.path))
total_c += capacity
total_a += allocation
self._stats[vm.uuid]["disks_capacity"] = total_c
self._stats[vm.uuid]["disks_allocation"] = total_a
time.sleep(interval)
# do not include the stats of turned-off VM
self._stats[vm.uuid].pop("disks_capacity", None)
self._stats[vm.uuid].pop("disks_allocation", None)
def _vm_memory_utilization(self, vm, interval):
while not self._stop_event.isSet() and vm.uuid in self.vms:
total, used = vm.memory_utilization()
usage = used * 100.0 / total
LOG.debug("%(vm)s uses %(usage).4f%% of memory." % {
"vm": vm.name,
"usage": usage
})
if usage >= self._config["vm_memory_utilization_alert"]:
LOG.critical("The VM %s uses too much (%.4f%%) of it's "
"memory!" % (vm.name, usage))
self._stats[vm.uuid]["total_ram"] = total
self._stats[vm.uuid]["used_ram"] = used
time.sleep(interval)
# do not include the stats of turned-off VM
self._stats[vm.uuid].pop("total_ram", None)
self._stats[vm.uuid].pop("used_ram", None)
def _vm_cpu_utilization(self, vm, interval):
self._stats[vm.uuid]["cpu_load"] = collections.deque(maxlen=60)
cpu_time_0 = None
while not self._stop_event.isSet() and vm.uuid in self.vms:
cpu_time = vm.cpu_utilization()
if cpu_time_0 is not None:
usage = (100.0 * (cpu_time - cpu_time_0) / interval)
LOG.debug("%(vm)s uses %(usage).4f%% of CPU." % {
"vm": vm.name,
"usage": usage
})
self._stats[vm.uuid]["cpu_load"].append(usage)
cpu_time_0 = cpu_time
time.sleep(interval)
# do not include the stats of turned-off VM
self._stats[vm.uuid].pop("cpu_load", None)
def _check_resources(self):
"""Check resources do not exceed their limits.
Check Disk, RAM, CPU utilization of the whole host based on the
stats from VMs and alert if necessary.
"""
while not self._stop_event.isSet():
disks_capacity = sum(
[s.get("disks_capacity", 0) for s in self._stats.values()])
disks_allocation = sum(
[s.get("disks_allocation", 0) for s in self._stats.values()])
if disks_allocation != 0:
disk_usage = disks_capacity * 100.0 / disks_allocation
else:
# it is not loaded yet or no vms
disk_usage = 0
if disk_usage >= self._config["host_disk_utilization_alert"]:
LOG.critical("Host uses too much (%.4f%%) of it's disk!" %
disk_usage)
else:
LOG.info("Host uses %.4f%% of it's disk." % disk_usage)
total_ram = sum(
[s.get("total_ram", 0) for s in self._stats.values()])
used_ram = sum(
[s.get("used_ram", 0) for s in self._stats.values()])
if total_ram != 0:
ram_usage = used_ram * 100.0 / total_ram
else:
# it is not loaded yet or no vms
ram_usage = 0
if ram_usage >= self._config["host_memory_utilization_alert"]:
LOG.critical("Host uses too much (%.4f%%) of it's memory!" %
ram_usage)
else:
LOG.info("Host uses %.4f%% of it's memory." % ram_usage)
time.sleep(self._config["host_check_interval"])
def _watch_for_vms(self):
workers = []
while not self._stop_event.isSet():
processed = set()
for domain_id in (self._conn.listDomainsID() or []):
domain = self._conn.lookupByID(domain_id)
if domain.UUIDString() not in self.vms:
LOG.info("Found a new VM (uuid=%s) at the host. Starting "
"watching for it's resources." %
domain.UUIDString())
vm = VM(domain, self._conn, self._config)
self.vms.add(vm.uuid)
self._stats[vm.uuid] = {}
disk_t = threading.Thread(
target=self._vm_disk_utilization,
kwargs={
"vm": vm,
"interval": self._config["disk_check_interval"]})
disk_t.start()
workers.append(disk_t)
memory_t = threading.Thread(
target=self._vm_memory_utilization,
kwargs={
"vm": vm,
"interval": self._config["memory_check_interval"]})
memory_t.start()
workers.append(memory_t)
cpu_t = threading.Thread(
target=self._vm_cpu_utilization,
kwargs={
"vm": vm,
"interval": self._config["cpu_check_interval"]})
cpu_t.start()
workers.append(cpu_t)
# sleep a bit to desynchronize the checks (avoid checking
# disks of different VMs in the same timeframe)
time.sleep(0.5)
processed.add(domain.UUIDString())
for vm in self.vms - processed:
# stop watching for turned off VMs
LOG.info("The VM %s is shutdown now. Stop watching for it's "
"resources." % vm)
self.vms.remove(vm)
time.sleep(1)
for worker in workers:
worker.join()
def watch(self):
vms_t = threading.Thread(target=self._watch_for_vms)
vms_t.start()
checker_t = threading.Thread(target=self._check_resources)
checker_t.start()
try:
while True:
time.sleep(.1)
except KeyboardInterrupt:
self._stop_event.set()
vms_t.join()
checker_t.join()
self._conn.close()
def main():
if len(sys.argv) not in (1, 2):
print("The script expects one argument - a path to config in json "
"format.")
exit(1)
elif len(sys.argv) == 2:
if sys.argv[1] in ("--help", "help"):
print(__doc__)
exit(0)
try:
with open(sys.argv[1]) as f:
config = json.load(f)
except:
print("Failed to load json from %s." % sys.argv[1])
raise
else:
config = {}
config = set_config_defaults(config)
handler = logging.StreamHandler()
handler.setFormatter(
logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
LOG.addHandler(handler)
if config["debug"]:
LOG.setLevel(logging.DEBUG)
else:
LOG.setLevel(logging.INFO)
LOG.info("Loaded configuration:\n%s" % json.dumps(config, indent=4))
host = Host(config)
host.watch()
if __name__ == "__main__":
main()

View File

@ -1,157 +0,0 @@
#!/usr/bin/env python
"""
This script can help clear out selected rabbitmq queues and helps to
ensure that *only* transient queues are cleared (as opposed to notification
queues, which should not be cleared because of the side effects that
clearing them causes).
"""
from __future__ import print_function
import argparse
import os
import subprocess
import sys
# Taken from a *liberty* capture of the output of `rabbitmqctl list_queues`
QUEUE_ROOTS = tuple([
('cells.intercell.broadcast', True),
('cells.intercell.response', True),
('cells.intercell.targeted', True),
('cells.', True),
('cells_fanout', True),
('cert', False),
('cinder-backup', False),
('cinder-scheduler', False),
('compute', False),
('conductor', False),
('console', False),
('consoleauth', False),
('dhcp_agent', False),
('engine', False),
('heat-engine-listener', False),
('l3_agent', False),
('q-agent-notifier-dvr-update', False),
('q-agent-notifier-network-update', False),
('q-agent-notifier-port-delete', False),
('q-agent-notifier-port-update', False),
('q-agent-notifier-security_group-update', False),
('q-agent-notifier-tunnel-delete', False),
('q-agent-notifier-tunnel-update', False),
('q-l3-plugin', False),
('q-plugin', False),
# All reply queues should be ok to purge as they will either
# just timeout or retry (or that's the desired goal).
('reply_', True),
('scheduler', False),
])
# Most queues appear either as '${root}_fanout.*' queues or in the
# '${root}.some_uuid' / '${root}.some_hostname' formats, so these suffixes
# will capture all of those: they are appended to the root and match if
# the result is a prefix of a queue name.
AUTO_PREFIX_SUFFIXES = tuple(["_fanout", "."])
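# For example, with the roots above (queue names here are illustrative),
# "compute.node-01.example.com" and "scheduler_fanout_0123456789ab" would be
# selected for purging, while the bare "compute" and "scheduler" queues
# (the roots themselves) and any notification queue would be left alone.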
def prompt_for_purge(queue_count):
input = ""
while input == "":
input = raw_input("Purge %s queues? " % (queue_count))
input = input.lower().strip()
if input not in ('yes', 'no', 'y', 'n'):
print("Please enter one of 'yes' or 'no'")
input = ""
else:
if input in ['yes', 'y']:
return True
else:
return False
def should_purge_queue(queue_name, size):
for r, is_prefix in QUEUE_ROOTS:
if r == queue_name:
# Don't delete the roots themselves...
return False
# Otherwise check whether the queue name matches the root as a
# plain prefix, or matches one of the auto prefix/suffix patterns...
if is_prefix and queue_name.startswith(r):
return True
else:
for prefix_suffix in AUTO_PREFIX_SUFFIXES:
if queue_name.startswith(r + prefix_suffix):
return True
return False
def main():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("-d", "--dry_run",
action='store_true', default=False,
help="simulate purge commands but do not"
" actually run them")
parser.add_argument("-e", "--ensure_gone",
action='append', default=[],
help="ensure named queue is purged",
metavar="queue")
parser.add_argument("-n", "--no_prompt",
action='store_true', default=False,
help="skip being prompted before purging")
parser.add_argument("-u", "--username",
help=("purge via a connection"
" using given username (default=%(default)s)"),
default='guest')
parser.add_argument("-p", "--password",
help=("purge via a connection using"
" given password (default=%(default)s)"),
default='guest')
args = parser.parse_args()
if os.getuid() != 0:
# We can't run rabbitmqctl or rabbitmqadmin without root
# so make sure we have it...
print("This program must be ran as root!", file=sys.stderr)
sys.exit(1)
# This tries to get the list_queues output and then parses it to
# figure out which queues we should try to clear; this is not a formal
# interface (sadly it appears there is none), so this may break
# at some point in the future...
stdout = subprocess.check_output(['rabbitmqctl', 'list_queues'])
# The first line is expected to be 'Listing queues ...'
lines = stdout.splitlines()
first_line = lines[0]
if first_line != "Listing queues ...":
print("First line of the output of `rabbitmqctl list_queues`"
" was not as expected, avoiding further damage by exiting"
" early!", file=sys.stderr)
sys.exit(1)
goodbye_queues = []
queues = sorted(lines[1:])
if queues:
print("There are %s queues..." % (len(queues)))
for i, line in enumerate(queues):
queue_name, str_size = line.split()
if (queue_name in args.ensure_gone
or should_purge_queue(queue_name, int(str_size))):
print("%s. %s (purging)" % (i + 1, queue_name))
goodbye_queues.append(queue_name)
else:
print("%s. %s (not purging)" % (i + 1, queue_name))
if not args.dry_run and goodbye_queues:
if not args.no_prompt:
if not prompt_for_purge(len(goodbye_queues)):
sys.exit(0)
print("Executing %s purges, please wait..." % (len(goodbye_queues)))
for queue_name in goodbye_queues:
purge_cmd = [
'rabbitmqadmin', 'purge', 'queue', 'name=%s' % queue_name,
]
if args.username:
purge_cmd.extend(['-u', args.username])
if args.password:
purge_cmd.extend(['-p', args.password])
subprocess.check_output(purge_cmd)
if __name__ == '__main__':
main()

16
tox.ini
View File

@ -1,16 +0,0 @@
[tox]
minversion = 2.0
skipsdist = True
envlist = bashate
[testenv:bashate]
deps = bashate
whitelist_externals = bash
commands = bash -c "find {toxinidir} \
-not \( -type d -name .?\* -prune \) \
-not \( -type d -name contrib -prune \) \
-type f \
-not -name \*~ \
-not -name \*.md \
-name \*.sh \
-print0 | xargs -0 bashate -v"
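# The bashate environment above can be run in the usual tox way:
#   tox -e bashate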