Import RST ops-guide

Publish it as draft for now, do not translate it. Also do not
publish the mitaka/arch-design-draft version.

Change-Id: Id25e02aa0b2219fd9141d1354124386cb59bb856

@@ -44,4 +44,6 @@ declare -A SPECIAL_BOOKS=(
     ["releasenotes"]="skip"
     # Skip arch design while its being revised
     ["arch-design-draft"]="skip"
+    # Skip ops-guide while its being revised
+    ["ops-guide"]="skip"
 )
@@ -0,0 +1,30 @@
[metadata]
name = openstackopsguide
summary = OpenStack Operations Guide
author = OpenStack
author-email = openstack-docs@lists.openstack.org
home-page = http://docs.openstack.org/
classifier =
    Environment :: OpenStack
    Intended Audience :: Information Technology
    Intended Audience :: System Administrators
    License :: OSI Approved :: Apache Software License
    Operating System :: POSIX :: Linux
    Topic :: Documentation

[global]
setup-hooks =
    pbr.hooks.setup_hook

[files]

[build_sphinx]
all_files = 1
build-dir = build
source-dir = source

[wheel]
universal = 1

[pbr]
warnerrors = True
@@ -0,0 +1,30 @@
#!/usr/bin/env python
# Copyright (c) 2013 Hewlett-Packard Development Company, L.P.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# THIS FILE IS MANAGED BY THE GLOBAL REQUIREMENTS REPO - DO NOT EDIT
import setuptools

# In python < 2.7.4, a lazy loading of package `pbr` will break
# setuptools if some other modules registered functions in `atexit`.
# solution from: http://bugs.python.org/issue15881#msg170215
try:
    import multiprocessing  # noqa
except ImportError:
    pass

setuptools.setup(
    setup_requires=['pbr'],
    pbr=True)
@@ -0,0 +1,51 @@
================
Acknowledgements
================

The OpenStack Foundation supported the creation of this book with plane
tickets to Austin, lodging (including one adventurous evening without
power after a windstorm), and delicious food. For about USD $10,000, we
could collaborate intensively for a week in the same room at the
Rackspace Austin office. The authors are all members of the OpenStack
Foundation, which you can join. Go to the `Foundation web
site <https://www.openstack.org/join>`_.

We want to acknowledge our excellent host Rackers at Rackspace in
Austin:

- Emma Richards of Rackspace Guest Relations took excellent care of our
  lunch orders and even set aside a pile of sticky notes that had
  fallen off the walls.

- Betsy Hagemeier, a Fanatical Executive Assistant, took care of a room
  reshuffle and helped us settle in for the week.

- The Real Estate team at Rackspace in Austin, also known as "The
  Victors," were super responsive.

- Adam Powell in Racker IT supplied us with bandwidth each day and
  second monitors for those of us needing more screens.

- On Wednesday night we had a fun happy hour with the Austin OpenStack
  Meetup group, and Racker Katie Schmidt took great care of our group.

We also had some excellent input from outside of the room:

- Tim Bell from CERN gave us feedback on the outline before we started
  and reviewed it mid-week.

- Sébastien Han has written excellent blogs and generously gave his
  permission for re-use.

- Oisin Feeley read it, made some edits, and provided emailed feedback
  right when we asked.

Inside the book sprint room with us each day was our book sprint
facilitator Adam Hyde. Without his tireless support and encouragement,
we would have thought a book of this scope was impossible in five days.
Adam has proven the book sprint method effective again and again. He
creates both tools and faith in collaborative authoring at
`www.booksprints.net <http://www.booksprints.net/>`_.

We couldn't have pulled it off without so much supportive help and
encouragement.
@@ -0,0 +1,542 @@
=================================
Tales From the Cryp^H^H^H^H Cloud
=================================

Herein lies a selection of tales from OpenStack cloud operators. Read,
and learn from their wisdom.

Double VLAN
~~~~~~~~~~~

I was on-site in Kelowna, British Columbia, Canada setting up a new
OpenStack cloud. The deployment was fully automated: Cobbler deployed
the OS on the bare metal, bootstrapped it, and Puppet took over from
there. I had run the deployment scenario so many times in practice and
took for granted that everything was working.

On my last day in Kelowna, I was in a conference call from my hotel. In
the background, I was fooling around on the new cloud. I launched an
instance and logged in. Everything looked fine. Out of boredom, I ran
:command:`ps aux` and all of a sudden the instance locked up.

Thinking it was just a one-off issue, I terminated the instance and
launched a new one. By then, the conference call had ended and I was off to
the data center.

At the data center, I was finishing up some tasks and remembered the
lock-up. I logged into the new instance and ran :command:`ps aux` again.
It worked. Phew. I decided to run it one more time. It locked up.

After reproducing the problem several times, I came to the unfortunate
conclusion that this cloud did indeed have a problem. Even worse, my
time was up in Kelowna and I had to return to Calgary.

Where do you even begin troubleshooting something like this? An instance
that just randomly locks up when a command is issued. Is it the image?
Nope—it happens on all images. Is it the compute node? Nope—all nodes.
Is the instance locked up? No! New SSH connections work just fine!

We reached out for help. A networking engineer suggested it was an MTU
issue. Great! MTU! Something to go on! What's MTU and why would it cause
a problem?

MTU is the maximum transmission unit. It specifies the maximum number of
bytes that the interface accepts for each packet. If two interfaces have
two different MTUs, bytes might get chopped off and weird things
happen—such as random session lockups.

.. note::

   Not all packets have a size of 1500. Running the :command:`ls` command over
   SSH might only create a single packet less than 1500 bytes.
   However, running a command with heavy output, such as :command:`ps aux`,
   requires several packets of 1500 bytes.

OK, so where is the MTU issue coming from? Why haven't we seen this in
any other deployment? What's new in this situation? Well, new data
center, new uplink, new switches, new model of switches, new servers,
first time using this model of servers… so, basically everything was
new. Wonderful. We toyed around with raising the MTU at various areas:
the switches, the NICs on the compute nodes, the virtual NICs in the
instances; we even had the data center raise the MTU for our uplink
interface. Some changes worked, some didn't. This line of
troubleshooting didn't feel right, though. We shouldn't have to be
changing the MTU in these areas.

As a last resort, our network admin (Alvaro) and I sat down with
four terminal windows, a pencil, and a piece of paper. In one window, we
ran ping. In the second window, we ran ``tcpdump`` on the cloud
controller. In the third, ``tcpdump`` on the compute node. And the fourth
had ``tcpdump`` on the instance. For background, this cloud was a
multi-node, non-multi-host setup.

One cloud controller acted as a gateway to all compute nodes.
VlanManager was used for the network config. This means that the cloud
controller and all compute nodes had a different VLAN for each OpenStack
project. We used the :option:`-s` option of ``ping`` to change the packet
size. We watched as sometimes packets would fully return, sometimes they'd
only make it out and never back in, and sometimes the packets would stop at a
random point. We changed ``tcpdump`` to start displaying the hex dump of
the packet. We pinged between every combination of outside, controller,
compute, and instance.
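For reference, the ``-s`` value passed to ``ping`` is the ICMP payload size, not the full packet: the kernel adds an 8-byte ICMP header and a 20-byte IP header on top. A quick sketch of that arithmetic (illustrative, not part of the original session):

```python
# The ping -s value is the ICMP payload; the IP header (20 bytes,
# no options) and ICMP header (8 bytes) are added on top of it.
IP_HEADER = 20
ICMP_HEADER = 8

def max_ping_payload(mtu: int) -> int:
    """Largest -s value whose resulting IP packet still fits the MTU."""
    return mtu - IP_HEADER - ICMP_HEADER

assert max_ping_payload(1500) == 1472  # standard Ethernet MTU
assert max_ping_payload(9000) == 8972  # jumbo frames
```

So a sweep of ``-s`` values around 1472 is a common way to find the largest packet that survives a path with a 1500-byte MTU.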
Finally, Alvaro noticed something. When a packet from the outside hits
the cloud controller, it should not be configured with a VLAN. We
verified this as true. When the packet went from the cloud controller to
the compute node, it should only have a VLAN if it was destined for an
instance. This was still true. When the ping reply was sent from the
instance, it should be in a VLAN. True. When it came back to the cloud
controller and on its way out to the Internet, it should no longer have
a VLAN. False. Uh oh. It looked as though the VLAN part of the packet
was not being removed.

That made no sense.

While bouncing this idea around in our heads, I was randomly typing
commands on the compute node:

.. code-block:: console

   $ ip a
   …
   10: vlan100@vlan20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br100 state UP
   …

"Hey Alvaro, can you run a VLAN on top of a VLAN?"

"If you did, you'd add an extra 4 bytes to the packet…"

Then it all made sense…

.. code-block:: console

   $ grep vlan_interface /etc/nova/nova.conf
   vlan_interface=vlan20

In ``nova.conf``, ``vlan_interface`` specifies what interface OpenStack
should attach all VLANs to. The correct setting should have been:

.. code-block:: ini

   vlan_interface=bond0

as this is the server's bonded NIC.

vlan20 is the VLAN that the data center gave us for outgoing Internet
access. It's a correct VLAN and is also attached to bond0.

By mistake, I configured OpenStack to attach all tenant VLANs to vlan20
instead of bond0, thereby stacking one VLAN on top of another. This added
an extra 4 bytes to each packet and caused a packet of 1504 bytes to be
sent out, which would cause problems when it arrived at an interface that
only accepted 1500.
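The arithmetic behind the failure is small enough to check by hand; each 802.1Q VLAN tag adds 4 bytes, so stacking tags pushes a full-sized packet past the MTU. A minimal sketch (illustrative, not from the deployment itself):

```python
# Illustrative sketch of why stacked VLAN tags break a 1500-byte MTU:
# each 802.1Q VLAN tag adds 4 bytes to what the next hop must accept.
VLAN_TAG_BYTES = 4

def bytes_on_wire(packet_bytes: int, extra_vlan_tags: int) -> int:
    """Size the receiving interface must accept for one packet."""
    return packet_bytes + extra_vlan_tags * VLAN_TAG_BYTES

# With the tag stripped as expected, a full-sized packet fits...
assert bytes_on_wire(1500, extra_vlan_tags=0) == 1500

# ...but one leftover, unstripped tag makes it 1504 bytes, which an
# interface that only accepts 1500 will drop or truncate.
assert bytes_on_wire(1500, extra_vlan_tags=1) == 1504
```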
As soon as this setting was fixed, everything worked.

"The Issue"
~~~~~~~~~~~

At the end of August 2012, a post-secondary school in Alberta, Canada
migrated its infrastructure to an OpenStack cloud. As luck would have
it, within the first day or two of it running, one of their servers just
disappeared from the network. Blip. Gone.

After restarting the instance, everything was back up and running. We
reviewed the logs and saw that at some point, network communication
stopped and then everything went idle. We chalked this up to a random
occurrence.

A few nights later, it happened again.

We reviewed both sets of logs. The one thing that stood out the most was
DHCP. At the time, OpenStack, by default, set DHCP leases for one minute
(it's now two minutes). This means that every instance contacts the
cloud controller (DHCP server) to renew its fixed IP. For some reason,
this instance could not renew its IP. We correlated the instance's logs
with the logs on the cloud controller and put together a conversation:

#. Instance tries to renew IP.

#. Cloud controller receives the renewal request and sends a response.

#. Instance "ignores" the response and re-sends the renewal request.

#. Cloud controller receives the second request and sends a new
   response.

#. Instance begins sending a renewal request to ``255.255.255.255``
   since it hasn't heard back from the cloud controller.

#. The cloud controller receives the ``255.255.255.255`` request and
   sends a third response.

#. The instance finally gives up.

With this information in hand, we were sure that the problem had to do
with DHCP. We thought that for some reason, the instance wasn't getting
a new IP address and with no IP, it shut itself off from the network.

A quick Google search turned up this: `DHCP lease errors in VLAN
mode <https://lists.launchpad.net/openstack/msg11696.html>`_
(https://lists.launchpad.net/openstack/msg11696.html), which further
supported our DHCP theory.

An initial idea was to just increase the lease time. If the instance
only renewed once every week, the chances of this problem happening
would be tremendously smaller than every minute. This didn't solve the
problem, though. It was just covering the problem up.
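The scale of that tradeoff is easy to quantify. A rough sketch, assuming the common DHCP convention (RFC 2131) that a client first tries to renew at half the lease time:

```python
# Rough renewal-traffic estimate per instance, assuming the usual DHCP
# T1 renewal timer of half the lease time (RFC 2131 convention).
SECONDS_PER_DAY = 86400

def renewals_per_day(lease_seconds: int) -> float:
    t1 = lease_seconds / 2  # when the client starts trying to renew
    return SECONDS_PER_DAY / t1

# A one-minute lease means each instance renews every ~30 seconds:
assert renewals_per_day(60) == 2880

# A one-week lease shrinks the window for "The Issue" enormously,
# without actually fixing whatever kills the renewal:
assert renewals_per_day(7 * 24 * 3600) < 1
```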
We decided to have ``tcpdump`` run on this instance and see if we could
catch it in action again. Sure enough, we did.

The ``tcpdump`` looked very, very weird. In short, it looked as though
network communication stopped before the instance tried to renew its IP.
Since there is so much DHCP chatter from a one-minute lease, it's very
hard to confirm, but even with only milliseconds difference between
packets, if one packet arrives first, it arrived first, and if that
packet reported network issues, then it had to have happened before
DHCP.

Additionally, the instance in question was responsible for a very, very
large backup job each night. While "The Issue" (as we were now calling
it) didn't happen exactly when the backup happened, it was close enough
(a few hours) that we couldn't ignore it.

Further days went by and we caught The Issue in action more and more. We
found that dhclient was not running after The Issue happened. Now we were
back to thinking it was a DHCP issue. Running
:command:`/etc/init.d/networking restart` brought everything back up and
running.

Ever have one of those days where all of a sudden you get the Google
results you were looking for? Well, that's what happened here. I was
looking for information on dhclient and why it dies when it can't renew
its lease, and all of a sudden I found a bunch of OpenStack and dnsmasq
discussions that were identical to the problem we were seeing!

`Problem with Heavy Network IO and
Dnsmasq <http://www.gossamer-threads.com/lists/openstack/operators/18197>`_
(http://www.gossamer-threads.com/lists/openstack/operators/18197)

`instances losing IP address while running, due to No
DHCPOFFER <http://www.gossamer-threads.com/lists/openstack/dev/14696>`_
(http://www.gossamer-threads.com/lists/openstack/dev/14696)

Seriously, Google.

This bug report was the key to everything: `KVM images lose connectivity
with bridged
network <https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/997978>`_
(https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/997978)

It was funny to read the report. It was full of people who had some
strange network problem but didn't quite explain it in the same way.

So it was a qemu/kvm bug.

At the same time as finding the bug report, a co-worker was able to
successfully reproduce The Issue! How? He used ``iperf`` to spew a ton
of bandwidth at an instance. Within 30 minutes, the instance just
disappeared from the network.

Armed with a patched qemu and a way to reproduce, we set out to see if
we had finally solved The Issue. After 48 hours straight of hammering the
instance with bandwidth, we were confident. The rest is history. You can
search the bug report for "joe" to find my comments and actual tests.

Disappearing Images
~~~~~~~~~~~~~~~~~~~

At the end of 2012, Cybera (a nonprofit with a mandate to oversee the
development of cyberinfrastructure in Alberta, Canada) deployed an
updated OpenStack cloud for their `DAIR
project <http://www.canarie.ca/cloud/>`_
(http://www.canarie.ca/en/dair-program/about). A few days into
production, a compute node locked up. Upon rebooting the node, I checked
to see what instances were hosted on that node so I could boot them on
behalf of the customer. Luckily, only one instance.

The :command:`nova reboot` command wasn't working, so I used :command:`virsh`,
but it immediately came back with an error saying it was unable to find the
backing disk. In this case, the backing disk is the Glance image that is
copied to ``/var/lib/nova/instances/_base`` when the image is used for
the first time. Why couldn't it find it? I checked the directory and
sure enough it was gone.

I reviewed the ``nova`` database and saw the instance's entry in the
``nova.instances`` table. The image that the instance was using matched
what virsh was reporting, so no inconsistency there.

I checked Glance and noticed that this image was a snapshot that the
user created. At least that was good news—this user would have been the
only user affected.

Finally, I checked StackTach and reviewed the user's events. They had
created and deleted several snapshots—most likely experimenting.
Although the timestamps didn't match up, my conclusion was that they
launched their instance and then deleted the snapshot and it was somehow
removed from ``/var/lib/nova/instances/_base``. None of that made sense,
but it was the best I could come up with.

It turns out the reason that this compute node locked up was a hardware
issue. We removed it from the DAIR cloud and called Dell to have it
serviced. Dell arrived and began working. Somehow or another (or a fat
finger), a different compute node was bumped and rebooted. Great.

When this node fully booted, I ran through the same scenario of seeing
what instances were running so I could turn them back on. There were a
total of four. Three booted and one gave an error. It was the same error
as before: unable to find the backing disk. Seriously, what?

Again, it turns out that the image was a snapshot. The three other
instances that successfully started were standard cloud images. Was it a
problem with snapshots? That didn't make sense.

A note about DAIR's architecture: ``/var/lib/nova/instances`` is a
shared NFS mount. This means that all compute nodes have access to it,
which includes the ``_base`` directory. Another centralized area is
``/var/log/rsyslog`` on the cloud controller. This directory collects
all OpenStack logs from all compute nodes. I wondered if there were any
entries for the file that :command:`virsh` was reporting:

.. code-block:: console

   dair-ua-c03/nova.log:Dec 19 12:10:59 dair-ua-c03
   2012-12-19 12:10:59 INFO nova.virt.libvirt.imagecache
   [-] Removing base file:
   /var/lib/nova/instances/_base/7b4783508212f5d242cbf9ff56fb8d33b4ce6166_10

Ah-hah! So OpenStack was deleting it. But why?

A feature was introduced in Essex to periodically check and see if there
were any ``_base`` files not in use. If there were, OpenStack Compute
would delete them. This idea sounds innocent enough and has some good
qualities to it. But how did this feature end up turned on? It was
disabled by default in Essex, as it should be. It was `decided to be
turned on in Folsom <https://bugs.launchpad.net/nova/+bug/1029674>`_
(https://bugs.launchpad.net/nova/+bug/1029674). I cannot emphasize
enough that:

*Actions which delete things should not be enabled by default.*

Disk space is cheap these days. Data recovery is not.

Secondly, DAIR's shared ``/var/lib/nova/instances`` directory
contributed to the problem. Since all compute nodes have access to this
directory, all compute nodes periodically review the ``_base`` directory.
If there is only one instance using an image, and the node that the
instance is on is down for a few minutes, it won't be able to mark the
image as still in use. Therefore, the image seems like it's not in use
and is deleted. When the compute node comes back online, the instance
hosted on that node is unable to start.
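The race is easy to model. A simplified sketch of the cleanup decision (hypothetical logic for illustration, not the actual nova imagecache code):

```python
# Simplified model of the _base cleanup race on a shared instances
# directory. Hypothetical logic, not the actual nova imagecache code:
# each live node marks the images its instances use; any image that
# nobody marked during the periodic check is treated as unused.

def images_deleted(image_users: dict, nodes_up: set) -> set:
    """image_users maps image name -> node hosting its only instance."""
    marked_in_use = {img for img, node in image_users.items()
                     if node in nodes_up}
    return set(image_users) - marked_in_use

users = {"base-ubuntu": "c01", "user-snapshot": "c03"}

# All nodes up during the check: nothing is deleted.
assert images_deleted(users, {"c01", "c03"}) == set()

# c03 happens to be down for a few minutes during the check: the
# backing image of its instance looks unused and gets deleted.
assert images_deleted(users, {"c01"}) == {"user-snapshot"}
```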
The Valentine's Day Compute Node Massacre
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Although the title of this story is much more dramatic than the actual
event, I don't think, or hope, that I'll have the opportunity to use
"Valentine's Day Massacre" again in a title.

This past Valentine's Day, I received an alert that a compute node was
no longer available in the cloud—meaning,

.. code-block:: console

   $ nova service-list

showed this particular node in a down state.

I logged into the cloud controller and was able to both ``ping`` and SSH
into the problematic compute node, which seemed very odd. Usually if I
receive this type of alert, the compute node has totally locked up and
is inaccessible.

After a few minutes of troubleshooting, I saw the following details:

- A user recently tried launching a CentOS instance on that node

- This user was the only user on the node (new node)

- The load shot up to 8 right before I received the alert

- The bonded 10gb network device (bond0) was in a DOWN state

- The 1gb NIC was still alive and active

I looked at the status of both NICs in the bonded pair and saw that
neither was able to communicate with the switch port. Seeing as how each
NIC in the bond is connected to a separate switch, I thought that the
chance of a switch port dying on each switch at the same time was quite
improbable. I concluded that the 10gb dual-port NIC had died and needed
to be replaced. I created a ticket for the hardware support department at the
data center where the node was hosted. I felt lucky that this was a new
node and no one else was hosted on it yet.

An hour later I received the same alert, but for another compute node.
Crap. OK, now there's definitely a problem going on. Just like the
original node, I was able to log in by SSH. The bond0 NIC was DOWN but
the 1gb NIC was active.

And the best part: the same user had just tried creating a CentOS
instance. What?

I was totally confused at this point, so I texted our network admin to
see if he was available to help. He logged in to both switches and
immediately saw the problem: the switches detected spanning tree packets
coming from the two compute nodes and immediately shut the ports down to
prevent spanning tree loops:

.. code-block:: console

   Feb 15 01:40:18 SW-1 Stp: %SPANTREE-4-BLOCK_BPDUGUARD: Received BPDU packet on Port-Channel35 with BPDU guard enabled. Disabling interface. (source mac fa:16:3e:24:e7:22)
   Feb 15 01:40:18 SW-1 Ebra: %ETH-4-ERRDISABLE: bpduguard error detected on Port-Channel35.
   Feb 15 01:40:18 SW-1 Mlag: %MLAG-4-INTF_INACTIVE_LOCAL: Local interface Port-Channel35 is link down. MLAG 35 is inactive.
   Feb 15 01:40:18 SW-1 Ebra: %LINEPROTO-5-UPDOWN: Line protocol on Interface Port-Channel35 (Server35), changed state to down
   Feb 15 01:40:19 SW-1 Stp: %SPANTREE-6-INTERFACE_DEL: Interface Port-Channel35 has been removed from instance MST0
   Feb 15 01:40:19 SW-1 Ebra: %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet35 (Server35), changed state to down

He re-enabled the switch ports and the two compute nodes immediately
came back to life.

Unfortunately, this story has an open ending... we're still looking into
why the CentOS image was sending out spanning tree packets. Further,
we're researching a proper way to mitigate this from happening.
It's a bigger issue than one might think. While it's extremely important
for switches to prevent spanning tree loops, it's very problematic to
have an entire compute node cut off from the network when this happens.
If a compute node is hosting 100 instances and one of them sends a
spanning tree packet, that instance has effectively DDOS'd the other 99
instances.

This is an ongoing and hot topic in networking circles—especially with
the rise of virtualization and virtual switches.

Down the Rabbit Hole
~~~~~~~~~~~~~~~~~~~~

Users being able to retrieve console logs from running instances is a
boon for support—many times they can figure out what's going on inside
their instance and fix what's going on without bothering you.
Unfortunately, sometimes overzealous logging of failures can cause
problems of its own.

A report came in: VMs were launching slowly, or not at all. Cue the
standard checks—nothing in Nagios, but there was a spike in network
traffic towards the current master of our RabbitMQ cluster. Investigation
started, but soon the other parts of the queue cluster were leaking
memory like a sieve. Then the alert came in—the master Rabbit server
went down and connections failed over to the slave.

At that time, our control services were hosted by another team and we
didn't have much debugging information to determine what was going on
with the master, and we could not reboot it. That team noted that it
failed without alert, but managed to reboot it. After an hour, the
cluster had returned to its normal state and we went home for the day.

Continuing the diagnosis the next morning was kick-started by another
identical failure. We quickly got the message queue running again, and
tried to work out why Rabbit was suffering from so much network traffic.
Enabling debug logging on nova-api quickly brought understanding. A
``tail -f /var/log/nova/nova-api.log`` was scrolling by faster
than we'd ever seen before. CTRL+C on that and we could plainly see the
contents of a system log spewing failures over and over again - a system
log from one of our users' instances.

After finding the instance ID, we headed over to
``/var/lib/nova/instances`` to find the ``console.log``:

.. code-block:: console

   adm@cc12:/var/lib/nova/instances/instance-00000e05# wc -l console.log
   92890453 console.log
   adm@cc12:/var/lib/nova/instances/instance-00000e05# ls -sh console.log
   5.5G console.log

Sure enough, the user had been periodically refreshing the console log
page on the dashboard and the 5.5 GB file was traversing the Rabbit cluster
to get to the dashboard.

We called them and asked them to stop for a while, and they were happy
to abandon the horribly broken VM. After that, we started monitoring the
size of console logs.
To this day, `the issue <https://bugs.launchpad.net/nova/+bug/832507>`__
(https://bugs.launchpad.net/nova/+bug/832507) doesn't have a permanent
resolution, but we look forward to the discussion at the next summit.

Havana Haunted by the Dead
~~~~~~~~~~~~~~~~~~~~~~~~~~

Felix Lee of Academia Sinica Grid Computing Centre in Taiwan contributed
this story.

I just upgraded OpenStack from Grizzly to Havana 2013.2-2 using the RDO
repository and everything was running pretty well—except the EC2 API.

I noticed that the API would suffer from a heavy load and respond slowly
to particular EC2 requests, such as ``RunInstances``.

Output from ``/var/log/nova/nova-api.log`` on :term:`Havana`:

.. code-block:: console

   2014-01-10 09:11:45.072 129745 INFO nova.ec2.wsgi.server
   [req-84d16d16-3808-426b-b7af-3b90a11b83b0
   0c6e7dba03c24c6a9bce299747499e8a 7052bd6714e7460caeb16242e68124f9]
   117.103.103.29 "GET
   /services/Cloud?AWSAccessKeyId=[something]&Action=RunInstances&ClientToken=[something]&ImageId=ami-00000001&InstanceInitiatedShutdownBehavior=terminate...
   HTTP/1.1" status: 200 len: 1109 time: 138.5970151

This request took over two minutes to process, but executed quickly on
another co-existing Grizzly deployment using the same hardware and
system configuration.

Output from ``/var/log/nova/nova-api.log`` on :term:`Grizzly`:

.. code-block:: console

   2014-01-08 11:15:15.704 INFO nova.ec2.wsgi.server
   [req-ccac9790-3357-4aa8-84bd-cdaab1aa394e
   ebbd729575cb404081a45c9ada0849b7 8175953c209044358ab5e0ec19d52c37]
   117.103.103.29 "GET
   /services/Cloud?AWSAccessKeyId=[something]&Action=RunInstances&ClientToken=[something]&ImageId=ami-00000007&InstanceInitiatedShutdownBehavior=terminate...
   HTTP/1.1" status: 200 len: 931 time: 3.9426181
|
||||||
|
|
||||||
|
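
One quick way to spot requests like this is to compare the ``time:``
field at the end of each ``nova.ec2.wsgi.server`` log line. A minimal
sketch (the regex and the 30-second threshold are illustrative
assumptions, not part of the original investigation):

```python
import re

# The "time:" field is the request wall-clock time in seconds, as in the
# excerpts above (138.59 s on Havana versus 3.94 s on Grizzly).
TIME_RE = re.compile(r'time: ([\d.]+)')

def slow_requests(log_lines, threshold=30.0):
    """Return log lines whose request time exceeds the threshold."""
    hits = []
    for line in log_lines:
        match = TIME_RE.search(line)
        if match and float(match.group(1)) > threshold:
            hits.append(line)
    return hits
```
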

While monitoring system resources, I noticed a significant increase in
memory consumption while the EC2 API processed this request. I thought
it wasn't handling memory properly, possibly not releasing it. If the
API received several of these requests, memory consumption quickly grew
until the system ran out of RAM and began using swap. Each node has 48
GB of RAM, and the ``nova-api`` process would consume all of it within
minutes. Once this happened, the entire system would become unusably
slow until I restarted the nova-api service.

So, I found myself wondering what had changed in the EC2 API on Havana
that might cause this to happen. Was it a bug, or normal behavior that I
now needed to work around?

After digging into the nova (OpenStack Compute) code, I noticed two
areas in ``api/ec2/cloud.py`` potentially impacting my system:

.. code-block:: python

   instances = self.compute_api.get_all(context,
                                        search_opts=search_opts,
                                        sort_dir='asc')

   sys_metas = self.compute_api.get_all_system_metadata(
       context, search_filts=[{'key': ['EC2_client_token']},
                              {'value': [client_token]}])

Since my database contained many records, over 1 million metadata records
and over 300,000 instance records in "deleted" or "errored" states, each
search took a long time. I decided to clean up the database by first
archiving a copy for backup and then performing some deletions using the
MySQL client. For example, I ran the following SQL command to remove
rows of instances deleted for over a year:

.. code-block:: console

   mysql> delete from nova.instances where deleted=1 and terminated_at < (NOW() - INTERVAL 1 YEAR);

Performance increased greatly after deleting the old records, and my new
deployment continues to behave well.
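
The cleanup step can also be wrapped so the cutoff date is explicit and
the statement can be reviewed before it is run. This helper is
hypothetical and only mirrors the SQL shown in the story:

```python
import datetime

# Hypothetical helper mirroring the cleanup SQL above: build the DELETE
# statement with an explicit cutoff date so it can be reviewed (and the
# table archived for backup) before it is executed.
def cleanup_sql(cutoff):
    return ("DELETE FROM nova.instances "
            "WHERE deleted=1 AND terminated_at < '%s';" % cutoff.date())

one_year_ago = datetime.datetime.utcnow() - datetime.timedelta(days=365)
print(cleanup_sql(one_year_ago))
```

As in the story, archive a copy of the table before running anything
this generates.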
@ -0,0 +1,62 @@

=========
Resources
=========

OpenStack
~~~~~~~~~

- `Installation Guide for openSUSE 13.2 and SUSE Linux Enterprise
  Server 12 <http://docs.openstack.org/liberty/install-guide-obs/>`_

- `Installation Guide for Red Hat Enterprise Linux 7, CentOS 7, and
  Fedora 22 <http://docs.openstack.org/liberty/install-guide-rdo/>`_

- `Installation Guide for Ubuntu 14.04 (LTS)
  Server <http://docs.openstack.org/liberty/install-guide-ubuntu/>`_

- `OpenStack Administrator Guide <http://docs.openstack.org/admin-guide/>`_

- `OpenStack Cloud Computing Cookbook (Packt
  Publishing) <http://www.packtpub.com/openstack-cloud-computing-cookbook-second-edition/book>`_

Cloud (General)
~~~~~~~~~~~~~~~

- `“The NIST Definition of Cloud
  Computing” <http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf>`_

Python
~~~~~~

- `Dive Into Python (Apress) <http://www.diveintopython.net/>`_

Networking
~~~~~~~~~~

- `TCP/IP Illustrated, Volume 1: The Protocols, 2/E
  (Pearson) <http://www.pearsonhighered.com/educator/product/TCPIP-Illustrated-Volume-1-The-Protocols/9780321336316.page>`_

- `The TCP/IP Guide (No Starch
  Press) <http://www.nostarch.com/tcpip.htm>`_

- `“A tcpdump Tutorial and
  Primer” <http://danielmiessler.com/study/tcpdump/>`_

Systems Administration
~~~~~~~~~~~~~~~~~~~~~~

- `UNIX and Linux Systems Administration Handbook (Prentice
  Hall) <http://www.admin.com/>`_

Virtualization
~~~~~~~~~~~~~~

- `The Book of Xen (No Starch
  Press) <http://www.nostarch.com/xen.htm>`_

Configuration Management
~~~~~~~~~~~~~~~~~~~~~~~~

- `Puppet Labs Documentation <http://docs.puppetlabs.com/>`_

- `Pro Puppet (Apress) <http://www.apress.com/9781430230571>`_
@ -0,0 +1,435 @@

=====================
Working with Roadmaps
=====================

The good news: OpenStack has unprecedented transparency when it comes to
providing information about what's coming up. The bad news: each release
moves very quickly. The purpose of this appendix is to highlight some of
the useful pages to track, and take an educated guess at what is coming
up in the next release and perhaps further afield.

OpenStack follows a six-month release cycle, typically releasing in
April/May and October/November each year. At the start of each cycle,
the community gathers in a single location for a design summit. At the
summit, the features for the coming releases are discussed, prioritized,
and planned. The figure below shows an example release cycle, with dates
showing milestone releases, code freeze, and string freeze dates, along
with an example of when the summit occurs. Milestones are interim releases
within the cycle that are available as packages for download and
testing. Code freeze is putting a stop to adding new features to the
release. String freeze is putting a stop to changing any strings within
the source code.

.. image:: figures/osog_ac01.png
   :width: 100%

Information Available to You
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are several good sources of information available that you can use
to track your OpenStack development desires.

Release notes are maintained on the OpenStack wiki, and also shown here:

.. list-table::
   :widths: 25 25 25 25
   :header-rows: 1

   * - Series
     - Status
     - Releases
     - Date
   * - Liberty
     - `Under Development
       <https://wiki.openstack.org/wiki/Liberty_Release_Schedule>`_
     - 2015.2
     - Oct, 2015
   * - Kilo
     - `Current stable release, security-supported
       <https://wiki.openstack.org/wiki/Kilo_Release_Schedule>`_
     - `2015.1 <https://wiki.openstack.org/wiki/ReleaseNotes/Kilo>`_
     - Apr 30, 2015
   * - Juno
     - `Security-supported
       <https://wiki.openstack.org/wiki/Juno_Release_Schedule>`_
     - `2014.2 <https://wiki.openstack.org/wiki/ReleaseNotes/Juno>`_
     - Oct 16, 2014
   * - Icehouse
     - `End-of-life
       <https://wiki.openstack.org/wiki/Icehouse_Release_Schedule>`_
     - `2014.1 <https://wiki.openstack.org/wiki/ReleaseNotes/Icehouse>`_
     - Apr 17, 2014
   * -
     -
     - `2014.1.1 <https://wiki.openstack.org/wiki/ReleaseNotes/2014.1.1>`_
     - Jun 9, 2014
   * -
     -
     - `2014.1.2 <https://wiki.openstack.org/wiki/ReleaseNotes/2014.1.2>`_
     - Aug 8, 2014
   * -
     -
     - `2014.1.3 <https://wiki.openstack.org/wiki/ReleaseNotes/2014.1.3>`_
     - Oct 2, 2014
   * - Havana
     - End-of-life
     - `2013.2 <https://wiki.openstack.org/wiki/ReleaseNotes/Havana>`_
     - Oct 17, 2013
   * -
     -
     - `2013.2.1 <https://wiki.openstack.org/wiki/ReleaseNotes/2013.2.1>`_
     - Dec 16, 2013
   * -
     -
     - `2013.2.2 <https://wiki.openstack.org/wiki/ReleaseNotes/2013.2.2>`_
     - Feb 13, 2014
   * -
     -
     - `2013.2.3 <https://wiki.openstack.org/wiki/ReleaseNotes/2013.2.3>`_
     - Apr 3, 2014
   * -
     -
     - `2013.2.4 <https://wiki.openstack.org/wiki/ReleaseNotes/2013.2.4>`_
     - Sep 22, 2014
   * - Grizzly
     - End-of-life
     - `2013.1 <https://wiki.openstack.org/wiki/ReleaseNotes/Grizzly>`_
     - Apr 4, 2013
   * -
     -
     - `2013.1.1 <https://wiki.openstack.org/wiki/ReleaseNotes/2013.1.1>`_
     - May 9, 2013
   * -
     -
     - `2013.1.2 <https://wiki.openstack.org/wiki/ReleaseNotes/2013.1.2>`_
     - Jun 6, 2013
   * -
     -
     - `2013.1.3 <https://wiki.openstack.org/wiki/ReleaseNotes/2013.1.3>`_
     - Aug 8, 2013
   * -
     -
     - `2013.1.4 <https://wiki.openstack.org/wiki/ReleaseNotes/2013.1.4>`_
     - Oct 17, 2013
   * -
     -
     - `2013.1.5 <https://wiki.openstack.org/wiki/ReleaseNotes/2013.1.5>`_
     - Mar 20, 2015
   * - Folsom
     - End-of-life
     - `2012.2 <https://wiki.openstack.org/wiki/ReleaseNotes/Folsom>`_
     - Sep 27, 2012
   * -
     -
     - `2012.2.1 <https://wiki.openstack.org/wiki/ReleaseNotes/2012.2.1>`_
     - Nov 29, 2012
   * -
     -
     - `2012.2.2 <https://wiki.openstack.org/wiki/ReleaseNotes/2012.2.2>`_
     - Dec 13, 2012
   * -
     -
     - `2012.2.3 <https://wiki.openstack.org/wiki/ReleaseNotes/2012.2.3>`_
     - Jan 31, 2013
   * -
     -
     - `2012.2.4 <https://wiki.openstack.org/wiki/ReleaseNotes/2012.2.4>`_
     - Apr 11, 2013
   * - Essex
     - End-of-life
     - `2012.1 <https://wiki.openstack.org/wiki/ReleaseNotes/Essex>`_
     - Apr 5, 2012
   * -
     -
     - `2012.1.1 <https://wiki.openstack.org/wiki/ReleaseNotes/2012.1.1>`_
     - Jun 22, 2012
   * -
     -
     - `2012.1.2 <https://wiki.openstack.org/wiki/ReleaseNotes/2012.1.2>`_
     - Aug 10, 2012
   * -
     -
     - `2012.1.3 <https://wiki.openstack.org/wiki/ReleaseNotes/2012.1.3>`_
     - Oct 12, 2012
   * - Diablo
     - Deprecated
     - `2011.3 <https://wiki.openstack.org/wiki/ReleaseNotes/Diablo>`_
     - Sep 22, 2011
   * -
     -
     - `2011.3.1 <https://wiki.openstack.org/wiki/ReleaseNotes/2011.3.1>`_
     - Jan 19, 2012
   * - Cactus
     - Deprecated
     - `2011.2 <https://wiki.openstack.org/wiki/ReleaseNotes/Cactus>`_
     - Apr 15, 2011
   * - Bexar
     - Deprecated
     - `2011.1 <https://wiki.openstack.org/wiki/ReleaseNotes/Bexar>`_
     - Feb 3, 2011
   * - Austin
     - Deprecated
     - `2010.1 <https://wiki.openstack.org/wiki/ReleaseNotes/Austin>`_
     - Oct 21, 2010

Here are some other resources:

- `A breakdown of current features under development, with their target
  milestone <http://status.openstack.org/release/>`_

- `A list of all features, including those not yet under
  development <https://blueprints.launchpad.net/openstack>`_

- `Rough-draft design discussions ("etherpads") from the last design
  summit <https://wiki.openstack.org/wiki/Summit/Kilo/Etherpads>`_

- `List of individual code changes under
  review <https://review.openstack.org/>`_

Influencing the Roadmap
~~~~~~~~~~~~~~~~~~~~~~~

OpenStack truly welcomes your ideas (and contributions) and highly
values feedback from real-world users of the software. By learning a
little about the process that drives feature development, you can
participate and perhaps get the additions you desire.

Feature requests typically start their life in Etherpad, a collaborative
editing tool, which is used to take coordinating notes at a design
summit session specific to the feature. This then leads to the creation
of a blueprint on the Launchpad site for the particular project, which
is used to describe the feature more formally. Blueprints are then
approved by project team members, and development can begin.

Therefore, the fastest way to get your feature request up for
consideration is to create an Etherpad with your ideas and propose a
session to the design summit. If the design summit has already passed,
you may also create a blueprint directly. Read this `blog post about how
to work with blueprints
<http://vmartinezdelacruz.com/how-to-work-with-blueprints-without-losing-your-mind/>`_
from the perspective of Victoria Martínez, a developer intern.

The roadmap for the next release as it is developed can be seen at
`Releases <http://releases.openstack.org>`_.

To determine the potential features going into future releases, or to
look at features implemented previously, take a look at the existing
blueprints such as `OpenStack Compute (nova)
Blueprints <https://blueprints.launchpad.net/nova>`_, `OpenStack
Identity (keystone)
Blueprints <https://blueprints.launchpad.net/keystone>`_, and release
notes.

Aside from the direct-to-blueprint pathway, there is another very
well-regarded mechanism to influence the development roadmap: the user
survey. Found at http://openstack.org/user-survey, it allows you to
provide details of your deployments and needs, anonymously by default.
Each cycle, the user committee analyzes the results and produces a
report, including providing specific information to the technical
committee and project team leads.

Aspects to Watch
~~~~~~~~~~~~~~~~

You want to keep an eye on the areas improving within OpenStack. The
best way to "watch" roadmaps for each project is to look at the
blueprints that are being approved for work on milestone releases. You
can also learn from PTL webinars that follow the OpenStack summits twice
a year.

Driver Quality Improvements
---------------------------

A major quality push has occurred across drivers and plug-ins in Block
Storage, Compute, and Networking. Particularly, developers of Compute
and Networking drivers that require proprietary or hardware products are
now required to provide an automated external testing system for use
during the development process.

Easier Upgrades
---------------

One of the most requested features since OpenStack began (for components
other than Object Storage, which tends to "just work"): easier upgrades.
In all recent releases, internal messaging communication is versioned,
meaning services can theoretically drop back to backward-compatible
behavior. This allows you to run later versions of some components,
while keeping older versions of others.
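
The version negotiation idea behind that backward compatibility can be
illustrated with a toy sketch (this is illustrative only, not nova's
actual RPC code):

```python
# Illustrative sketch (not nova's actual RPC code): a newer sender caps
# each message at the highest version the older receiver understands, so
# mixed-version services can keep talking during a rolling upgrade.
def negotiate_version(sender_version, receiver_version):
    """Return the highest message version both sides support."""
    return min(sender_version, receiver_version)

# A 1.3-capable service talking to a 1.1-capable one falls back
# to the 1.1 message format.
print(negotiate_version((1, 3), (1, 1)))   # prints (1, 1)
```
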

In addition, database migrations are now tested with the Turbo Hipster
tool. This tool tests database migration performance on copies of
real-world user databases.

These changes have facilitated the first proper OpenStack upgrade guide,
found in :doc:`ops_upgrades`, and will continue to improve in the next
release.

Deprecation of Nova Network
---------------------------

With the introduction of the full software-defined networking stack
provided by OpenStack Networking (neutron) in the Folsom release,
development effort on the initial networking code that remains part of
the Compute component has gradually lessened. While many still use
``nova-network`` in production, there has been a long-term plan to
remove the code in favor of the more flexible and full-featured
OpenStack Networking.

An attempt was made to deprecate ``nova-network`` during the Havana
release, which was aborted due to the lack of equivalent functionality
(such as the FlatDHCP multi-host high-availability mode mentioned in
this guide), the lack of a migration path between versions, insufficient
testing, and the simplicity of ``nova-network`` for the more
straightforward use cases it traditionally supported. Though significant
effort has been made to address these concerns, ``nova-network`` was not
deprecated in the Juno release. In addition, to a limited degree,
patches to ``nova-network`` have again begun to be accepted, such as
adding a per-network settings feature and SR-IOV support in Juno.

This leaves you with an important point of decision when designing your
cloud. OpenStack Networking is robust enough to use with a small number
of limitations (performance issues in some scenarios, only basic high
availability of layer 3 systems) and provides many more features than
``nova-network``. However, if you do not have the more complex use cases
that can benefit from fuller software-defined networking capabilities,
or are uncomfortable with the new concepts introduced, ``nova-network``
may continue to be a viable option for the next 12 months.

Similarly, if you have an existing cloud and are looking to upgrade from
``nova-network`` to OpenStack Networking, you should have the option to
delay the upgrade for this period of time. However, each release of
OpenStack brings significant new innovation, and regardless of your use
of networking methodology, it is likely best to begin planning for an
upgrade within a reasonable timeframe of each release.

As mentioned, there's currently no way to cleanly migrate from
``nova-network`` to neutron. We recommend that you keep in mind what
such a migration might involve, ready for when a proper migration path
is released.

Distributed Virtual Router
~~~~~~~~~~~~~~~~~~~~~~~~~~

One of the long-time complaints surrounding OpenStack Networking was the
lack of high availability for the layer 3 components. The Juno release
introduced Distributed Virtual Router (DVR), which aims to solve this
problem.

Early indications are that it does do this well for a base set of
scenarios, such as using the ML2 plug-in with Open vSwitch, one flat
external network and VXLAN tenant networks. However, it does appear that
there are problems with the use of VLANs, IPv6, floating IPs, high
north-south traffic scenarios, and large numbers of compute nodes. It is
expected these will improve significantly with the next release, but bug
reports on specific issues are highly desirable.

Replacement of Open vSwitch Plug-in with Modular Layer 2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Modular Layer 2 plug-in is a framework allowing OpenStack Networking
to simultaneously utilize the variety of layer-2 networking technologies
found in complex real-world data centers. It currently works with the
existing Open vSwitch, Linux Bridge, and Hyper-V L2 agents and is
intended to replace and deprecate the monolithic plug-ins associated
with those L2 agents.

New API Versions
~~~~~~~~~~~~~~~~

The third version of the Compute API was broadly discussed and worked on
during the Havana and Icehouse release cycles. Current discussions
indicate that the v2 API will remain for many releases, and the next
iteration of the API will be denoted v2.1 and have similar properties to
the existing v2.0, rather than being an entirely new v3 API. This is a
great time to evaluate all the APIs and provide comments while the next
generation APIs are being defined. A new working group was formed
specifically to `improve OpenStack APIs
<https://wiki.openstack.org/wiki/API_Working_Group>`_ and create design
guidelines, which you are welcome to join.

OpenStack on OpenStack (TripleO)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This project continues to improve, and you may consider using it for
greenfield deployments, though according to the latest user survey
results it has yet to see widespread uptake.

Data processing service for OpenStack (sahara)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As a much-requested answer to big data problems, a dedicated team has
been making solid progress on a Hadoop-as-a-Service project.

Bare metal Deployment (ironic)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The bare-metal deployment has been widely lauded, and development
continues. The Juno release brought the OpenStack Bare metal driver into
the Compute project, and it aimed to deprecate the existing bare-metal
driver in Kilo. If you are a current user of the bare metal driver, a
particular blueprint to follow is `Deprecate the bare metal driver
<https://blueprints.launchpad.net/nova/+spec/deprecate-baremetal-driver>`_.

Database as a Service (trove)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The OpenStack community has had a database-as-a-service tool in
development for some time, and we saw the first integrated release of it
in Icehouse. From its release it was able to deploy database servers out
of the box in a highly available way, initially supporting only MySQL.
Juno introduced support for Mongo (including clustering), PostgreSQL,
and Couchbase, in addition to replication functionality for MySQL. In
Kilo, more advanced clustering capability was delivered, in addition to
better integration with other OpenStack components such as Networking.

Message Service (zaqar)
~~~~~~~~~~~~~~~~~~~~~~~

A service to provide queues of messages and notifications was released.

DNS service (designate)
~~~~~~~~~~~~~~~~~~~~~~~

A long-requested service to provide the ability to manipulate DNS
entries associated with OpenStack resources has gathered a following.
The designate project was also released.

Scheduler Improvements
~~~~~~~~~~~~~~~~~~~~~~

Both Compute and Block Storage rely on schedulers to determine where to
place virtual machines or volumes. In Havana, the Compute scheduler
underwent significant improvement, while in Icehouse it was the
scheduler in Block Storage that received a boost. Further down the
track, an effort that started this cycle to create a holistic scheduler
covering both should come to fruition. Some of the work done in Kilo can
be found under the `Gantt
project <https://wiki.openstack.org/wiki/Gantt/kilo>`_.

Block Storage Improvements
--------------------------

Block Storage is considered a stable project, with wide uptake and a
long track record of quality drivers. The team has discussed many areas
of work at the summits, including better error reporting, automated
discovery, and thin provisioning features.

Toward a Python SDK
-------------------

Though many successfully use the various python-\*client code as an
effective SDK for interacting with OpenStack, consistency between the
projects and documentation availability waxes and wanes. To combat this,
an `effort to improve the
experience <https://wiki.openstack.org/wiki/PythonOpenStackSDK>`_ has
started. Cross-project development efforts in OpenStack have a checkered
history, such as the `unified client
project <https://wiki.openstack.org/wiki/OpenStackClient>`_ having
several false starts. However, the early signs for the SDK project are
promising, and we expect to see results during the Juno cycle.
@ -0,0 +1,192 @@

=========
Use Cases
=========

This appendix contains a small selection of use cases from the
community, with more technical detail than usual. Further examples can
be found on the `OpenStack website <https://www.openstack.org/user-stories/>`_.

NeCTAR
~~~~~~

Who uses it: researchers from the Australian publicly funded research
sector. Use is across a wide variety of disciplines, with the purpose of
instances ranging from running simple web servers to using hundreds of
cores for high-throughput computing.

Deployment
----------

Using OpenStack Compute cells, the NeCTAR Research Cloud spans eight
sites with approximately 4,000 cores per site.

Each site runs a different configuration, as a resource cell in an
OpenStack Compute cells setup. Some sites span multiple data centers,
some use off-compute-node storage with a shared file system, and some
use on-compute-node storage with a non-shared file system. Each site
deploys the Image service with an Object Storage back end. A central
Identity, dashboard, and Compute API service are used. A login to the
dashboard triggers a SAML login with Shibboleth, which creates an
account in the Identity service with an SQL back end. An Object Storage
Global Cluster is used across several sites.

Compute nodes have 24 to 48 cores, with at least 4 GB of RAM per core
and approximately 40 GB of ephemeral storage per core.

All sites are based on Ubuntu 14.04, with KVM as the hypervisor. The
OpenStack version in use is typically the current stable version, with 5
to 10 percent back-ported code from trunk and modifications.

Resources
---------

- `OpenStack.org case
  study <https://www.openstack.org/user-stories/nectar/>`_

- `NeCTAR-RC GitHub <https://github.com/NeCTAR-RC/>`_

- `NeCTAR website <https://www.nectar.org.au/>`_

MIT CSAIL
~~~~~~~~~

Who uses it: researchers from the MIT Computer Science and Artificial
Intelligence Lab.

Deployment
----------

The CSAIL cloud is currently 64 physical nodes with a total of 768
physical cores and 3,456 GB of RAM. Persistent data storage is largely
outside the cloud on NFS, with cloud resources focused on compute
resources. There are more than 130 users in more than 40 projects,
typically running 2,000–2,500 vCPUs in 300 to 400 instances.

We initially deployed on Ubuntu 12.04 with the Essex release of
OpenStack using FlatDHCP multi-host networking.

The software stack is still Ubuntu 12.04 LTS, but now with OpenStack
Havana from the Ubuntu Cloud Archive. KVM is the hypervisor, deployed
using `FAI <http://fai-project.org/>`_ and Puppet for configuration
management. The FAI and Puppet combination is used lab-wide, not only
for OpenStack. There is a single cloud controller node, which also acts
as network controller, with the remainder of the server hardware
dedicated to compute nodes.

Host aggregates and instance-type extra specs are used to provide two
different resource allocation ratios. The default resource allocation
ratios we use are 4:1 CPU and 1.5:1 RAM. Compute-intensive workloads use
instance types that require non-oversubscribed hosts where ``cpu_ratio``
and ``ram_ratio`` are both set to 1.0. Since we have hyper-threading
enabled on our compute nodes, this provides one vCPU per CPU thread, or
two vCPUs per physical core.
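
A quick back-of-the-envelope check of those ratios (a sketch of the
arithmetic only, not CSAIL's actual tooling):

```python
# Schedulable vCPUs on a node, given its physical cores, hyper-threading
# factor, and the CPU allocation (overcommit) ratio.
def schedulable_vcpus(physical_cores, threads_per_core, cpu_ratio):
    return int(physical_cores * threads_per_core * cpu_ratio)

# A hyper-threaded 24-core node at the default 4:1 CPU overcommit.
print(schedulable_vcpus(24, 2, 4.0))   # prints 192
# The same node in a non-oversubscribed aggregate (cpu_ratio 1.0):
# one vCPU per CPU thread, or two vCPUs per physical core.
print(schedulable_vcpus(24, 2, 1.0))   # prints 48
```
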
With our upgrade to Grizzly in August 2013, we moved to OpenStack
|
||||||
|
Networking, neutron (quantum at the time). Compute nodes have
|
||||||
|
two-gigabit network interfaces and a separate management card for IPMI
|
||||||
|
management. One network interface is used for node-to-node
|
||||||
|
communications. The other is used as a trunk port for OpenStack managed
|
||||||
|
VLANs. The controller node uses two bonded 10g network interfaces for
|
||||||
|
its public IP communications. Big pipes are used here because images are
|
||||||
|
served over this port, and it is also used to connect to iSCSI storage,
|
||||||
|
back-ending the image storage and database. The controller node also has
|
||||||
|
a gigabit interface that is used in trunk mode for OpenStack managed
|
||||||
|
VLAN traffic. This port handles traffic to the dhcp-agent and
|
||||||
|
metadata-proxy.
|
||||||
|
|
||||||
|
We approximate the older ``nova-network`` multi-host HA setup by using
|
||||||
|
"provider VLAN networks" that connect instances directly to existing
|
||||||
|
publicly addressable networks and use existing physical routers as their
|
||||||
|
default gateway. This means that if our network controller goes down,
|
||||||
|
running instances still have their network available, and no single
|
||||||
|
Linux host becomes a traffic bottleneck. We are able to do this because
|
||||||
|
we have a sufficient supply of IPv4 addresses to cover all of our
|
||||||
|
instances and thus don't need NAT and don't use floating IP addresses.
|
||||||
|
We provide a single generic public network to all projects and
|
||||||
|
additional existing VLANs on a project-by-project basis as needed.
|
||||||
|
Individual projects are also allowed to create their own private GRE
|
||||||
|
based networks.
|
||||||
|
|
||||||
|
Resources
|
||||||
|
---------
|
||||||
|
|
||||||
|
- `CSAIL homepage <http://www.csail.mit.edu/>`_
|
||||||
|
|
||||||
|
DAIR
~~~~

Who uses it: DAIR is an integrated virtual environment that leverages
the CANARIE network to develop and test new information communication
technology (ICT) and other digital technologies. It combines such
digital infrastructure as advanced networking and cloud computing and
storage to create an environment for developing and testing innovative
ICT applications, protocols, and services; performing at-scale
experimentation for deployment; and facilitating a faster time to
market.

Deployment
----------

DAIR is hosted at two different data centers across Canada: one in
Alberta and the other in Quebec. It consists of a cloud controller at
each location, although one is designated the "master" controller that
is in charge of central authentication and quotas. This is done through
custom scripts and light modifications to OpenStack. DAIR is currently
running Havana.

For Object Storage, each region has a swift environment.

A NetApp appliance is used in each region for both block storage and
instance storage. There are future plans to move the instances off the
NetApp appliance and onto a distributed file system such as :term:`Ceph` or
GlusterFS.

VlanManager is used extensively for network management. All servers have
two bonded 10GbE NICs that are connected to two redundant switches. DAIR
is set up to use single-node networking where the cloud controller is
the gateway for all instances on all compute nodes. Internal OpenStack
traffic (for example, storage traffic) does not go through the cloud
controller.

Resources
---------

- `DAIR homepage <http://www.canarie.ca/cloud/>`__
CERN
~~~~

Who uses it: researchers at CERN (European Organization for Nuclear
Research) conducting high-energy physics research.

Deployment
----------

The environment is largely based on Scientific Linux 6, which is Red Hat
compatible. We use KVM as our primary hypervisor, although tests are
ongoing with Hyper-V on Windows Server 2008.

We use the Puppet Labs OpenStack modules to configure Compute, Image
service, Identity, and dashboard. Puppet is used widely for instance
configuration, and Foreman is used as a GUI for reporting and instance
provisioning.

Users and groups are managed through Active Directory and imported into
the Identity service using LDAP. CLIs are available for nova and
Euca2ools to do this.

There are three clouds currently running at CERN, totaling about 4,700
compute nodes, with approximately 120,000 cores. The CERN IT cloud aims
to expand to 300,000 cores by 2015.

Resources
---------

- `“OpenStack in Production: A tale of 3 OpenStack
  Clouds” <http://openstack-in-production.blogspot.de/2013/09/a-tale-of-3-openstack-clouds-50000.html>`_

- `“Review of CERN Data Centre
  Infrastructure” <http://cds.cern.ch/record/1457989/files/chep%202012%20CERN%20infrastructure%20final.pdf?version=1>`_

- `“CERN Cloud Infrastructure User
  Guide” <http://information-technology.web.cern.ch/book/cern-private-cloud-user-guide>`_

====================================================
Designing for Cloud Controllers and Cloud Management
====================================================

OpenStack is designed to be massively horizontally scalable, which
allows all services to be distributed widely. However, to simplify this
guide, we have decided to discuss services of a more central nature,
using the concept of a *cloud controller*. A cloud controller is just a
conceptual simplification. In the real world, you design an architecture
for your cloud controller that enables high availability so that if any
node fails, another can take over the required tasks. In reality, cloud
controller tasks are spread out across more than a single node.

The cloud controller provides the central management system for
OpenStack deployments. Typically, the cloud controller manages
authentication and sends messaging to all the systems through a message
queue.

For many deployments, the cloud controller is a single node. However, to
have high availability, you have to take a few considerations into
account, which we'll cover in this chapter.

The cloud controller manages the following services for the cloud:

Databases
    Tracks current information about users and instances, for example,
    in a database, typically one database instance managed per service

Message queue services
    All :term:`Advanced Message Queuing Protocol (AMQP)` messages for
    services are received and sent according to the queue broker

Conductor services
    Proxy requests to a database

Authentication and authorization for identity management
    Indicates which users can do what actions on certain cloud
    resources; quota management, however, is spread out among services

Image-management services
    Stores and serves images with metadata on each, for launching in the
    cloud

Scheduling services
    Indicates which resources to use first; for example, spreading out
    where instances are launched based on an algorithm

User dashboard
    Provides a web-based front end for users to consume OpenStack cloud
    services

API endpoints
    Offers each service's REST API access, where the API endpoint
    catalog is managed by the Identity service

For our example, the cloud controller has a collection of ``nova-*``
components that represent the global state of the cloud; talks to
services such as authentication; maintains information about the cloud
in a database; communicates to all compute nodes and storage
:term:`workers <worker>` through a queue; and provides API access.
Each service running on a designated cloud controller may be broken out
into separate nodes for scalability or availability.

As another example, you could use pairs of servers for a collective
cloud controller—one active, one standby—for redundant nodes providing a
given set of related services, such as:

- Front-end web for API requests, the scheduler for choosing which
  compute node to boot an instance on, Identity services, and the
  dashboard

- Database and message queue server (such as MySQL, RabbitMQ)

- Image service for the image management

Now that you see the myriad designs for controlling your cloud, read
more about the further considerations to help with your design
decisions.
Hardware Considerations
~~~~~~~~~~~~~~~~~~~~~~~

A cloud controller's hardware can be the same as a compute node, though
you may want to further specify based on the size and type of cloud that
you run.

It's also possible to use virtual machines for all or some of the
services that the cloud controller manages, such as the message queuing.
In this guide, we assume that all services are running directly on the
cloud controller.

The table below contains common considerations to review when sizing hardware
for the cloud controller design.

.. list-table:: Cloud controller hardware sizing considerations
   :widths: 50 50
   :header-rows: 1

   * - Consideration
     - Ramification
   * - How many instances will run at once?
     - Size your database server accordingly, and scale out beyond one cloud
       controller if many instances will report status at the same time and
       scheduling where a new instance starts up needs computing power.
   * - How many compute nodes will run at once?
     - Ensure that your messaging queue handles requests successfully and size
       accordingly.
   * - How many users will access the API?
     - If many users will make multiple requests, make sure that the CPU load
       for the cloud controller can handle it.
   * - How many users will access the dashboard versus the REST API directly?
     - The dashboard makes many requests, even more than the API access, so
       add even more CPU if your dashboard is the main interface for your users.
   * - How many ``nova-api`` services do you run at once for your cloud?
     - You need to size the controller with a core per service.
   * - How long does a single instance run?
     - Starting instances and deleting instances is demanding on the compute
       node but also demanding on the controller node because of all the API
       queries and scheduling needs.
   * - Does your authentication system also verify externally?
     - External systems such as LDAP or Active Directory require network
       connectivity between the cloud controller and an external authentication
       system. Also ensure that the cloud controller has the CPU power to keep
       up with requests.
Separation of Services
~~~~~~~~~~~~~~~~~~~~~~

While our example contains all central services in a single location, it
is possible and indeed often a good idea to separate services onto
different physical servers. The table below is a list of deployment
scenarios we've seen and their justifications.

.. list-table:: Deployment scenarios
   :widths: 50 50
   :header-rows: 1

   * - Scenario
     - Justification
   * - Run ``glance-*`` servers on the ``swift-proxy`` server.
     - This deployment felt that the spare I/O on the Object Storage proxy
       server was sufficient and that the Image Delivery portion of glance
       benefited from being on physical hardware and having good connectivity
       to the Object Storage back end it was using.
   * - Run a central dedicated database server.
     - This deployment used a central dedicated server to provide the databases
       for all services. This approach simplified operations by isolating
       database server updates and allowed for the simple creation of slave
       database servers for failover.
   * - Run one VM per service.
     - This deployment ran central services on a set of servers running KVM.
       A dedicated VM was created for each service (``nova-scheduler``,
       rabbitmq, database, etc). This assisted the deployment with scaling
       because administrators could tune the resources given to each virtual
       machine based on the load it received (something that was not well
       understood during installation).
   * - Use an external load balancer.
     - This deployment had an expensive hardware load balancer in its
       organization. It ran multiple ``nova-api`` and ``swift-proxy``
       servers on different physical servers and used the load balancer
       to switch between them.

One choice that always comes up is whether to virtualize. Some services,
such as ``nova-compute``, ``swift-proxy`` and ``swift-object`` servers,
should not be virtualized. However, control servers can often be happily
virtualized—the performance penalty can usually be offset by simply
running more of the service.
Database
~~~~~~~~

OpenStack Compute uses an SQL database to store and retrieve stateful
information. MySQL is the popular database choice in the OpenStack
community.

Loss of the database leads to errors. As a result, we recommend that you
cluster your database to make it failure tolerant. Configuring and
maintaining a database cluster is done outside OpenStack and is
determined by the database software you choose to use in your cloud
environment. MySQL/Galera is a popular option for MySQL-based databases.
Message Queue
~~~~~~~~~~~~~

Most OpenStack services communicate with each other using the *message
queue*. For example, Compute communicates to block storage services and
networking services through the message queue. Also, you can optionally
enable notifications for any service. RabbitMQ, Qpid, and ZeroMQ are all
popular choices for a message-queue service. In general, if the message
queue fails or becomes inaccessible, the cluster grinds to a halt and
ends up in a read-only state, with information stuck at the point where
the last message was sent. Accordingly, we recommend that you cluster
the message queue. Be aware that clustered message queues can be a pain
point for many OpenStack deployments. While RabbitMQ has native
clustering support, there have been reports of issues when running it at
a large scale. While other queuing solutions are available, such as
ZeroMQ and Qpid, ZeroMQ does not offer stateful queues. Qpid is the
messaging system of choice for Red Hat and its derivatives. Qpid does
not have native clustering capabilities and requires a supplemental
service, such as Pacemaker or Corosync. For your message queue, you need
to determine what level of data loss you are comfortable with and
whether to use an OpenStack project's ability to retry multiple MQ hosts
in the event of a failure, such as using Compute's ability to do so.
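
The "retry multiple MQ hosts" behavior mentioned above amounts to
walking a broker list until one answers. A minimal sketch, with a
hypothetical ``connect`` callable standing in for a real AMQP client:

```python
# Sketch of broker failover: try each message-queue host in turn and
# return the first live connection. The connect function and host names
# are illustrative stand-ins, not a real AMQP client API.

def connect_with_failover(hosts, connect):
    """Return the first successful connection; raise the last error if
    every host is down."""
    last_error = None
    for host in hosts:
        try:
            return connect(host)
        except ConnectionError as exc:
            last_error = exc
    raise last_error

# Example: the first broker is unreachable, the second answers.
def fake_connect(host):
    if host == "mq1.example.com":
        raise ConnectionError("mq1 unreachable")
    return f"connected to {host}"

print(connect_with_failover(["mq1.example.com", "mq2.example.com"],
                            fake_connect))
```

A real deployment would add backoff between attempts; the point is only
that clients keep a list of brokers rather than a single address.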

Conductor Services
~~~~~~~~~~~~~~~~~~

In the previous version of OpenStack, all ``nova-compute`` services
required direct access to the database hosted on the cloud controller.
This was problematic for two reasons: security and performance. With
regard to security, if a compute node is compromised, the attacker
inherently has access to the database. With regard to performance,
``nova-compute`` calls to the database are single-threaded and blocking.
This creates a performance bottleneck because database requests are
fulfilled serially rather than in parallel.

The conductor service resolves both of these issues by acting as a proxy
for the ``nova-compute`` service. Now, instead of ``nova-compute``
directly accessing the database, it contacts the ``nova-conductor``
service, and ``nova-conductor`` accesses the database on
``nova-compute``'s behalf. Since ``nova-compute`` no longer has direct
access to the database, the security issue is resolved. Additionally,
``nova-conductor`` is a nonblocking service, so requests from all
compute nodes are fulfilled in parallel.

.. note::

   If you are using ``nova-network`` and multi-host networking in your
   cloud environment, ``nova-compute`` still requires direct access to
   the database.

The ``nova-conductor`` service is horizontally scalable. To make
``nova-conductor`` highly available and fault tolerant, just launch more
instances of the ``nova-conductor`` process, either on the same server
or across multiple servers.
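
The proxy arrangement can be sketched in a few lines. The classes and
method names below are illustrative stand-ins, not nova's actual RPC
interfaces:

```python
# Minimal sketch of the conductor proxy pattern: compute nodes hold a
# handle to the conductor, never to the database, so only the conductor
# needs database credentials. Illustrative only.

class Database:
    def __init__(self):
        self._instances = {}

    def update(self, instance_id, values):
        self._instances.setdefault(instance_id, {}).update(values)
        return self._instances[instance_id]

class Conductor:
    """Runs on the controller; the only component with DB access."""
    def __init__(self, db):
        self._db = db

    def instance_update(self, instance_id, values):
        return self._db.update(instance_id, values)

class ComputeNode:
    """Talks to the conductor, not to the database."""
    def __init__(self, conductor):
        self._conductor = conductor

    def report_state(self, instance_id, state):
        return self._conductor.instance_update(instance_id,
                                               {"vm_state": state})

db = Database()
conductor = Conductor(db)
compute = ComputeNode(conductor)
print(compute.report_state("inst-1", "active"))
```

A compromised ``ComputeNode`` in this picture can only issue the calls
the conductor exposes, which is the security benefit described above.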

Application Programming Interface (API)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All public access, whether direct, through a command-line client, or
through the web-based dashboard, uses the API service. Find the API
reference at http://api.openstack.org/.

You must choose whether you want to support the Amazon EC2 compatibility
APIs, or just the OpenStack APIs. One issue you might encounter when
running both APIs is an inconsistent experience when referring to images
and instances.

For example, the EC2 API refers to instances using IDs that contain
hexadecimal, whereas the OpenStack API uses names and digits. Similarly,
the EC2 API tends to rely on DNS aliases for contacting virtual
machines, as opposed to OpenStack, which typically lists IP
addresses.

If OpenStack is not set up in the right way, it is easy to create
scenarios in which users are unable to contact their instances because
they have only an incorrect DNS alias. Despite this, EC2 compatibility
can assist users migrating to your cloud.

As with databases and message queues, having more than one :term:`API server`
is a good thing. Traditional HTTP load-balancing techniques can be used to
achieve a highly available ``nova-api`` service.

Extensions
~~~~~~~~~~

The `API
Specifications <http://docs.openstack.org/api/api-specs.html>`_ define
the core actions, capabilities, and media types of the OpenStack API. A
client can always depend on the availability of this core API, and
implementers are always required to support it in its entirety.
Requiring strict adherence to the core API allows clients to rely upon a
minimal level of functionality when interacting with multiple
implementations of the same API.

The OpenStack Compute API is extensible. An extension adds capabilities
to an API beyond those defined in the core. The introduction of new
features, MIME types, actions, states, headers, parameters, and
resources can all be accomplished by means of extensions to the core
API. This allows the introduction of new features in the API without
requiring a version change and allows the introduction of
vendor-specific niche functionality.
Scheduling
~~~~~~~~~~

The scheduling services are responsible for determining the compute or
storage node where a virtual machine or block storage volume should be
created. The scheduling services receive creation requests for these
resources from the message queue and then begin the process of
determining the appropriate node where the resource should reside. This
process is done by applying a series of user-configurable filters
against the available collection of nodes.

There are currently two schedulers: ``nova-scheduler`` for virtual
machines and ``cinder-scheduler`` for block storage volumes. Both
schedulers are able to scale horizontally, so for high-availability
purposes, or for very large or high-schedule-frequency installations,
you should consider running multiple instances of each scheduler. The
schedulers all listen to the shared message queue, so no special load
balancing is required.
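
The filter-based process described above can be sketched briefly. The
filter functions here are simplified stand-ins; nova's real filters
follow the same shape:

```python
# Stripped-down illustration of filter scheduling: a request is applied
# against a series of filters, and any host surviving them all is a
# candidate. Filter and host names are illustrative.

def enough_ram(host, request):
    return host["free_ram_mb"] >= request["ram_mb"]

def enough_vcpus(host, request):
    return host["free_vcpus"] >= request["vcpus"]

def schedule(hosts, request, filters):
    """Return hosts that pass every filter, best candidates first."""
    candidates = [h for h in hosts if all(f(h, request) for f in filters)]
    # A real scheduler then weighs the candidates; sorting by free RAM
    # is a simple stand-in for that weighing step.
    return sorted(candidates, key=lambda h: h["free_ram_mb"], reverse=True)

hosts = [
    {"name": "compute1", "free_ram_mb": 2048, "free_vcpus": 4},
    {"name": "compute2", "free_ram_mb": 8192, "free_vcpus": 1},
    {"name": "compute3", "free_ram_mb": 4096, "free_vcpus": 8},
]
request = {"ram_mb": 1024, "vcpus": 2}
print([h["name"] for h in
       schedule(hosts, request, [enough_ram, enough_vcpus])])
```

Here ``compute2`` is filtered out for lacking free vCPUs, and the
remaining hosts are ordered by free RAM.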

Images
~~~~~~

The OpenStack Image service consists of two parts: ``glance-api`` and
``glance-registry``. The former is responsible for the delivery of
images; the compute node uses it to download images from the back end.
The latter maintains the metadata information associated with virtual
machine images and requires a database.

The ``glance-api`` part is an abstraction layer that allows a choice of
back end. Currently, it supports:

OpenStack Object Storage
    Allows you to store images as objects.

File system
    Uses any traditional file system to store the images as files.

S3
    Allows you to fetch images from Amazon S3.

HTTP
    Allows you to fetch images from a web server. You cannot write
    images by using this mode.

If you have an OpenStack Object Storage service, we recommend using this
as a scalable place to store your images. You can also use a file system
with sufficient performance or Amazon S3—unless you do not need the
ability to upload new images through OpenStack, in which case the
read-only HTTP back end is also an option.
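
The abstraction layer can be pictured as a small store interface with
one driver per back end. The classes below are illustrative only;
glance's real drivers are considerably more involved:

```python
# Toy version of a pluggable image-store abstraction in the spirit of
# glance-api's back ends. Class names and the make_store helper are
# ours, not glance's.

class FilesystemStore:
    """Read-write: keeps image blobs locally (a dict stands in for disk)."""
    def __init__(self):
        self._blobs = {}
    def get(self, image_id):
        return self._blobs[image_id]
    def put(self, image_id, data):
        self._blobs[image_id] = data

class HTTPStore:
    """Read-only: images can be fetched but not written, as noted above."""
    def __init__(self, remote):
        self._remote = remote
    def get(self, image_id):
        return self._remote[image_id]
    def put(self, image_id, data):
        raise NotImplementedError("HTTP back end is read-only")

def make_store(backend, **kwargs):
    stores = {"file": FilesystemStore, "http": HTTPStore}
    return stores[backend](**kwargs)

store = make_store("file")
store.put("img-1", b"disk image bytes")
print(store.get("img-1"))
```

Callers program against ``get``/``put`` and the configured back end is
chosen once, which is what lets deployments swap storage without
touching the API layer.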

Dashboard
~~~~~~~~~

The OpenStack dashboard (horizon) provides a web-based user interface to
the various OpenStack components. The dashboard includes an end-user
area for users to manage their virtual infrastructure and an admin area
for cloud operators to manage the OpenStack environment as a
whole.

The dashboard is implemented as a Python web application that normally
runs in :term:`Apache` ``httpd``. Therefore, you may treat it the same as any
other web application, provided it can reach the API servers (including
their admin endpoints) over the network.
Authentication and Authorization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The concepts supporting OpenStack's authentication and authorization are
derived from well-understood and widely used systems of a similar
nature. Users have credentials they can use to authenticate, and they
can be a member of one or more groups (known as projects or tenants,
interchangeably).

For example, a cloud administrator might be able to list all instances
in the cloud, whereas a user can see only those in their current group.
Resource quotas, such as the number of cores that can be used, disk
space, and so on, are associated with a project.

OpenStack Identity provides authentication decisions and user attribute
information, which is then used by the other OpenStack services to
perform authorization. The policy is set in the ``policy.json`` file.
For information on how to configure these, see :doc:`ops_projects_users`.
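
Conceptually, ``policy.json`` maps each API action to a rule that is
evaluated against the caller's roles. A toy version of that lookup, with
illustrative rules rather than OpenStack's actual defaults:

```python
# Toy policy evaluation in the spirit of policy.json: each action maps
# to a rule string, checked against the caller's roles. The rules and
# the is_allowed helper are illustrative, not OpenStack's.

import json

policy = json.loads("""
{
    "compute:get_all": "",
    "compute:get_all_tenants": "role:admin"
}
""")

def is_allowed(action, roles):
    rule = policy[action]
    if rule == "":
        return True                     # empty rule: any authenticated user
    kind, _, value = rule.partition(":")
    return kind == "role" and value in roles

print(is_allowed("compute:get_all", ["member"]))          # any user may list
print(is_allowed("compute:get_all_tenants", ["member"]))  # admins only
print(is_allowed("compute:get_all_tenants", ["admin"]))
```

Real policy rules support conjunctions and attribute checks, but the
action-to-rule lookup is the core idea.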

OpenStack Identity supports different plug-ins for authentication
decisions and identity storage. Examples of these plug-ins include:

- In-memory key-value store (a simplified internal storage structure)

- SQL database (such as MySQL or PostgreSQL)

- Memcached (a distributed memory object caching system)

- LDAP (such as OpenLDAP or Microsoft's Active Directory)

Many deployments use the SQL database; however, LDAP is also a popular
choice for those with existing authentication infrastructure that needs
to be integrated.
Network Considerations
~~~~~~~~~~~~~~~~~~~~~~

Because the cloud controller handles so many different services, it must
be able to handle the amount of traffic that hits it. For example, if
you choose to host the OpenStack Image service on the cloud controller,
the cloud controller should be able to support the transferring of the
images at an acceptable speed.

As another example, if you choose to use single-host networking where
the cloud controller is the network gateway for all instances, then the
cloud controller must support the total amount of traffic that travels
between your cloud and the public Internet.

We recommend that you use a fast NIC, such as 10 Gb. You can also choose
to use two 10 Gb NICs and bond them together. While you might not be
able to get a full bonded 20 Gb speed, different transmission streams
use different NICs. For example, if the cloud controller transfers two
images, each image uses a different NIC and gets a full 10 Gb of
bandwidth.
=============
Compute Nodes
=============

In this chapter, we discuss some of the choices you need to consider
when building out your compute nodes. Compute nodes form the resource
core of the OpenStack Compute cloud, providing the processing, memory,
network, and storage resources to run instances.
Choosing a CPU
~~~~~~~~~~~~~~

The type of CPU in your compute node is a very important choice. First,
ensure that the CPU supports virtualization by way of *VT-x* for Intel
chips and *AMD-v* for AMD chips.

.. note::

   Consult the vendor documentation to check for virtualization
   support. For Intel, read `“Does my processor support Intel® Virtualization
   Technology?” <http://www.intel.com/support/processors/sb/cs-030729.htm>`_.
   For AMD, read `AMD Virtualization
   <http://www.amd.com/en-us/innovations/software-technologies/server-solution/virtualization>`_.
   Note that your CPU may support virtualization but it may be
   disabled. Consult your BIOS documentation for how to enable CPU
   features.

The number of cores that the CPU has also affects the decision. It's
common for current CPUs to have up to 12 cores. Additionally, if an
Intel CPU supports hyperthreading, those 12 cores are doubled to 24
cores. If you purchase a server that supports multiple CPUs, the number
of cores is further multiplied.
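
The multiplication at work here is simple but worth writing down:
sockets, cores per socket, and hyper-threading all compound. A quick
sketch (the helper is ours, purely for arithmetic):

```python
# Logical CPU count as seen by the hypervisor: sockets * cores per
# socket * threads per core. Illustrative helper, not an OpenStack API.

def logical_cpus(sockets, cores_per_socket, hyperthreading=True):
    threads_per_core = 2 if hyperthreading else 1
    return sockets * cores_per_socket * threads_per_core

print(logical_cpus(1, 12))   # a single 12-core CPU with hyper-threading
print(logical_cpus(2, 12))   # the same CPU in a dual-socket server
print(logical_cpus(2, 12, hyperthreading=False))
```

These logical CPUs are what the CPU allocation ratio is applied against
when the scheduler computes a host's capacity.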

**Multithread Considerations**

Hyper-Threading is Intel's proprietary simultaneous multithreading
implementation used to improve parallelization on their CPUs. You might
consider enabling Hyper-Threading to improve the performance of
multithreaded applications.

Whether you should enable Hyper-Threading on your CPUs depends upon your
use case. For example, disabling Hyper-Threading can be beneficial in
intense computing environments. We recommend that you do performance
testing with your local workload with both Hyper-Threading on and off to
determine what is more appropriate in your case.
Choosing a Hypervisor
~~~~~~~~~~~~~~~~~~~~~

A hypervisor provides software to manage virtual machine access to the
underlying hardware. The hypervisor creates, manages, and monitors
virtual machines. OpenStack Compute supports many hypervisors to various
degrees, including:

- `KVM <http://www.linux-kvm.org/page/Main_Page>`_

- `LXC <https://linuxcontainers.org/>`_

- `QEMU <http://wiki.qemu.org/Main_Page>`_

- `VMware
  ESX/ESXi <https://www.vmware.com/support/vsphere-hypervisor>`_

- `Xen <http://www.xenproject.org/>`_

- `Hyper-V <http://technet.microsoft.com/en-us/library/hh831531.aspx>`_

- `Docker <https://www.docker.com/>`_

Probably the most important factor in your choice of hypervisor is your
current usage or experience. Aside from that, there are practical
concerns to do with feature parity, documentation, and the level of
community experience.

For example, KVM is the most widely adopted hypervisor in the OpenStack
community. Besides KVM, more deployments run Xen, LXC, VMware, and
Hyper-V than the others listed. However, each of these is lacking some
feature support, or the documentation on how to use it with OpenStack
is out of date.

The best information available to support your choice is found on the
`Hypervisor Support Matrix
<http://docs.openstack.org/developer/nova/support-matrix.html>`_
and in the `configuration reference
<http://docs.openstack.org/liberty/config-reference/content/section_compute-hypervisors.html>`_.

.. note::

   It is also possible to run multiple hypervisors in a single
   deployment using host aggregates or cells. However, an individual
   compute node can run only a single hypervisor at a time.
Instance Storage Solutions
~~~~~~~~~~~~~~~~~~~~~~~~~~

As part of the procurement for a compute cluster, you must specify some
storage for the disk on which the instantiated instance runs. There are
three main approaches to providing this temporary-style storage, and it
is important to understand the implications of the choice.

They are:

- Off compute node storage—shared file system
- On compute node storage—shared file system
- On compute node storage—nonshared file system

In general, the questions you should ask when selecting storage are as
follows:

- What is the platter count you can achieve?
- Do more spindles result in better I/O despite network access?
- Which one results in the best cost-performance scenario you're aiming for?
- How do you manage the storage operationally?
Many operators use separate compute and storage hosts. Compute services
and storage services have different requirements, and compute hosts
typically require more CPU and RAM than storage hosts. Therefore, for a
fixed budget, it makes sense to have different configurations for your
compute nodes and your storage nodes: invest in CPU and RAM for the
compute nodes, and in block storage for the storage nodes.

However, if you are more restricted in the number of physical hosts you
have available for creating your cloud and you want to be able to
dedicate as many of your hosts as possible to running instances, it
makes sense to run compute and storage on the same machines.

We'll discuss the three main approaches to instance storage in the next
few sections.
Off Compute Node Storage—Shared File System
-------------------------------------------

In this option, the disks storing the running instances are hosted in
servers outside of the compute nodes.

If you use separate compute and storage hosts, you can treat your
compute hosts as "stateless." As long as you don't have any instances
currently running on a compute host, you can take it offline or wipe it
completely without having any effect on the rest of your cloud. This
simplifies maintenance for the compute hosts.

There are several advantages to this approach:

- If a compute node fails, instances are usually easily recoverable.
- Running a dedicated storage system can be operationally simpler.
- You can scale to any number of spindles.
- It may be possible to share the external storage for other purposes.

The main downsides to this approach are:

- Depending on design, heavy I/O usage from some instances can affect
  unrelated instances.
- Use of the network can decrease performance.
On Compute Node Storage—Shared File System
------------------------------------------

In this option, each compute node is specified with a significant amount
of disk space, but a distributed file system ties the disks from each
compute node into a single mount.

The main advantage of this option is that it scales to external storage
when you require additional storage.

However, this option has several downsides:

- Running a distributed file system means you lose data locality
  compared with nonshared storage.
- Recovery of instances is complicated by depending on multiple hosts.
- The chassis size of the compute node can limit the number of spindles
  able to be used in a compute node.
- Use of the network can decrease performance.
On Compute Node Storage—Nonshared File System
---------------------------------------------

In this option, each compute node is specified with enough disks to
store the instances it hosts.

There are two main reasons why this is a good idea:

- Heavy I/O usage on one compute node does not affect instances on
  other compute nodes.
- Direct I/O access can increase performance.

This approach has several downsides:

- If a compute node fails, the instances running on that node are lost.
- The chassis size of the compute node can limit the number of spindles
  able to be used in a compute node.
- Migrations of instances from one node to another are more complicated
  and rely on features that may not continue to be developed.
- If additional storage is required, this option does not scale.

Running a shared file system on a storage system apart from the compute
nodes is ideal for clouds where reliability and scalability are the most
important factors. Running a shared file system on the compute nodes
themselves may be best in a scenario where you have to deploy to
preexisting servers over whose specifications you have little to no
control. Running a nonshared file system on the compute nodes themselves
is a good option for clouds with high I/O requirements and low concern
for reliability.
Issues with Live Migration
--------------------------

Live migration is an integral part of operating a cloud. This feature
provides the ability to seamlessly move instances from one physical host
to another, which is a necessity for performing upgrades that require
reboots of the compute hosts. However, it works well only with shared
storage.

Live migration can also be done with nonshared storage, using a feature
known as *KVM live block migration*. While an earlier implementation of
block-based migration in KVM and QEMU was considered unreliable, there
is a newer, more reliable implementation of block-based live migration
as of QEMU 1.4 and libvirt 1.0.2 that is also compatible with OpenStack.
However, none of the authors of this guide have first-hand experience
using live block migration.
Choice of File System
---------------------

If you want to support shared-storage live migration, you need to
configure a distributed file system.

Possible options include:

- NFS (default for Linux)
- GlusterFS
- MooseFS
- Lustre

We've seen deployments with all of these, and recommend that you choose
the one you are most familiar with operating. If you are not familiar
with any of them, choose NFS, as it is the easiest to set up and there
is extensive community knowledge about it.
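If you do choose NFS, a minimal sketch of the shared layout is a single
export mounted at nova's default instance state path on every compute
node. The server name and export path below are hypothetical; adjust
them to your environment:

```
# /etc/fstab on each compute node (example values)
# nova's default instance path is /var/lib/nova/instances
nfs-server.example.com:/srv/nova  /var/lib/nova/instances  nfs4  defaults  0 0
```

With every compute node seeing the same mount, the instance disk files
remain visible to the destination host during a live migration.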
Overcommitting
~~~~~~~~~~~~~~

OpenStack allows you to overcommit CPU and RAM on compute nodes. This
allows you to increase the number of instances running on your cloud, at
the cost of reducing the performance of the instances. OpenStack Compute
uses the following ratios by default:

- CPU allocation ratio: 16:1
- RAM allocation ratio: 1.5:1

The default CPU allocation ratio of 16:1 means that the scheduler
allocates up to 16 virtual cores per physical core. For example, if a
physical node has 12 cores, the scheduler sees 192 available virtual
cores. With typical flavor definitions of 4 virtual cores per instance,
this ratio would provide 48 instances on a physical node.

The formula for the number of virtual instances on a compute node is
*(OR × PC) / VC*, where:

*OR*
    CPU overcommit ratio (virtual cores per physical core)

*PC*
    Number of physical cores

*VC*
    Number of virtual cores per instance

Similarly, the default RAM allocation ratio of 1.5:1 means that the
scheduler allocates instances to a physical node as long as the total
amount of RAM associated with the instances is less than 1.5 times the
amount of RAM available on the physical node.

For example, if a physical node has 48 GB of RAM, the scheduler
allocates instances to that node until the sum of the RAM associated
with the instances reaches 72 GB (such as nine instances, in the case
where each instance has 8 GB of RAM).
.. note::

   Regardless of the overcommit ratio, an instance cannot be placed
   on any physical node with fewer raw (pre-overcommit) resources than
   the instance flavor requires.

You must select the appropriate CPU and RAM allocation ratio for your
particular use case.
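The capacity arithmetic above can be sketched in a few lines of Python
(the function names are ours, for illustration only); a node's effective
capacity is the lower of the two results:

```python
def max_instances_by_cpu(physical_cores, cpu_ratio=16, vcpus_per_instance=4):
    """CPU capacity: (OR * PC) / VC, truncated to whole instances."""
    return (cpu_ratio * physical_cores) // vcpus_per_instance

def max_instances_by_ram(ram_gb, ram_ratio=1.5, ram_per_instance_gb=8):
    """RAM capacity with overcommit, truncated to whole instances."""
    return int(ram_ratio * ram_gb) // ram_per_instance_gb

# The examples from the text:
print(max_instances_by_cpu(12))   # 12 cores at 16:1, 4 vCPUs each -> 48
print(max_instances_by_ram(48))   # 48 GB at 1.5:1, 8 GB each -> 9
```

In practice the scheduler also honors the note above: an instance is
never placed on a node whose raw resources are smaller than its flavor.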
Logging
~~~~~~~

Logging is detailed more fully in :doc:`ops_logging_monitoring`.
However, it is an important design consideration to take into account
before commencing operations of your cloud.

OpenStack produces a great deal of useful logging information. For that
information to be useful for operations purposes, however, you should
consider having a central logging server to send logs to, and a log
parsing/analysis system (such as Logstash).
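As a minimal sketch of the central-logging idea, each node could forward
its syslog stream with a single rsyslog rule. The central host name is
hypothetical:

```
# /etc/rsyslog.d/60-central.conf — forward all facilities and severities
# to the central log server (use @@ instead of @ for TCP transport)
*.* @logs.example.com:514
```

A parser such as Logstash can then consume the aggregated stream on the
central host.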
Networking
~~~~~~~~~~

Networking in OpenStack is a complex, multifaceted challenge. See
:doc:`arch_network_design`.

Conclusion
~~~~~~~~~~

Compute nodes are the workhorses of your cloud and the place where your
users' applications will run. They are likely to be affected by your
decisions on what to deploy and how you deploy it. Their requirements
should be reflected in the choices you make.
===========================================
Example Architecture — OpenStack Networking
===========================================

This chapter provides an example architecture using OpenStack
Networking, also known as the Neutron project, in a highly available
environment.

Overview
~~~~~~~~

A highly available environment can be put into place if you require an
environment that can scale horizontally, or want your cloud to continue
to be operational in case of node failure. This example architecture has
been written based on the current default feature set of OpenStack
Havana, with an emphasis on high availability.
Components
----------

.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - Component
     - Details
   * - OpenStack release
     - Havana
   * - Host operating system
     - Red Hat Enterprise Linux 6.5
   * - OpenStack package repository
     - `Red Hat Distributed OpenStack (RDO) <https://repos.fedorapeople.org/repos/openstack/>`_
   * - Hypervisor
     - KVM
   * - Database
     - MySQL
   * - Message queue
     - Qpid
   * - Networking service
     - OpenStack Networking
   * - Tenant network separation
     - VLAN
   * - Image service back end
     - GlusterFS
   * - Identity driver
     - SQL
   * - Block Storage back end
     - GlusterFS
Rationale
---------

This example architecture has been selected based on the current default
feature set of OpenStack Havana, with an emphasis on high availability.
This architecture is currently being deployed in an internal Red Hat
OpenStack cloud and used to run hosted and shared services, which by
their nature must be highly available.

This architecture's components have been selected for the following
reasons:

Red Hat Enterprise Linux
    You must choose an operating system that can run on all of the
    physical nodes. This example architecture is based on Red Hat
    Enterprise Linux, which offers reliability, long-term support,
    certified testing, and is hardened. Enterprise customers, now moving
    into OpenStack usage, typically require these advantages.

RDO
    The Red Hat Distributed OpenStack package offers an easy way to
    download the most current OpenStack release that is built for the
    Red Hat Enterprise Linux platform.

KVM
    KVM is the supported hypervisor of choice for Red Hat Enterprise
    Linux (and is included in the distribution). It is feature complete
    and free from licensing charges and restrictions.

MySQL
    MySQL is used as the database back end for all databases in the
    OpenStack environment. MySQL is the supported database of choice for
    Red Hat Enterprise Linux (and is included in the distribution); the
    database is open source, scalable, and handles memory well.

Qpid
    Apache Qpid offers 100 percent compatibility with the
    :term:`Advanced Message Queuing Protocol (AMQP)` standard, and its
    broker is available for both C++ and Java.

OpenStack Networking
    OpenStack Networking offers sophisticated networking functionality,
    including Layer 2 (L2) network segregation and provider networks.

VLAN
    Using a virtual local area network offers broadcast control,
    security, and physical layer transparency. If needed, use VXLAN to
    extend your address space.

GlusterFS
    GlusterFS offers scalable storage. As your environment grows, you
    can continue to add more storage nodes (instead of being restricted,
    for example, by an expensive storage array).
Detailed Description
~~~~~~~~~~~~~~~~~~~~

Node types
----------

This section gives you a breakdown of the different nodes that make up
the OpenStack environment. A node is a physical machine that is
provisioned with an operating system and runs a defined software stack
on top of it. The table below provides node descriptions and
specifications.

.. list-table:: Node types
   :widths: 33 33 33
   :header-rows: 1

   * - Type
     - Description
     - Example hardware
   * - Controller
     - Controller nodes are responsible for running the management
       software services needed for the OpenStack environment to
       function. These nodes:

       * Provide the front door that people access as well as the API
         services that all other components in the environment talk to.
       * Run a number of services in a highly available fashion,
         utilizing Pacemaker and HAProxy to provide a virtual IP and
         load-balancing functions so all controller nodes are being
         used.
       * Supply highly available "infrastructure" services, such as
         MySQL and Qpid, that underpin all the services.
       * Provide what is known as "persistent storage" through services
         run on the host as well. This persistent storage is backed onto
         the storage nodes for reliability.

       See :ref:`controller_node`.
     - Model: Dell R620

       CPU: 2x Intel® Xeon® CPU E5-2620 0 @ 2.00 GHz

       Memory: 32 GB

       Disk: two 300 GB 10000 RPM SAS disks

       Network: two 10G network ports
   * - Compute
     - Compute nodes run the virtual machine instances in OpenStack.
       They:

       * Run the bare minimum of services needed to facilitate these
         instances.
       * Use local storage on the node for the virtual machines, so
         that no VM migration or instance recovery at node failure is
         possible.

       See :ref:`compute_node`.
     - Model: Dell R620

       CPU: 2x Intel® Xeon® CPU E5-2650 0 @ 2.00 GHz

       Memory: 128 GB

       Disk: two 600 GB 10000 RPM SAS disks

       Network: four 10G network ports (for future-proofing expansion)
   * - Storage
     - Storage nodes store all the data required for the environment,
       including disk images in the Image service library and the
       persistent storage volumes created by the Block Storage service.
       Storage nodes use GlusterFS technology to keep the data highly
       available and scalable.

       See :ref:`storage_node`.
     - Model: Dell R720xd

       CPU: 2x Intel® Xeon® CPU E5-2620 0 @ 2.00 GHz

       Memory: 64 GB

       Disk: two 500 GB 7200 RPM SAS disks and twenty-four 600 GB
       10000 RPM SAS disks

       RAID controller: PERC H710P Integrated RAID Controller, 1 GB
       NV cache

       Network: two 10G network ports
   * - Network
     - Network nodes are responsible for doing all the virtual
       networking needed for people to create public or private
       networks and uplink their virtual machines into external
       networks. Network nodes:

       * Form the only ingress and egress point for instances running
         on top of OpenStack.
       * Run all of the environment's networking services, with the
         exception of the networking API service (which runs on the
         controller node).

       See :ref:`network_node`.
     - Model: Dell R620

       CPU: 1x Intel® Xeon® CPU E5-2620 0 @ 2.00 GHz

       Memory: 32 GB

       Disk: two 300 GB 10000 RPM SAS disks

       Network: five 10G network ports
   * - Utility
     - Utility nodes are used by internal administration staff only to
       provide a number of basic system administration functions needed
       to get the environment up and running and to maintain the
       hardware, OS, and software on which it runs.

       These nodes run services such as provisioning, configuration
       management, monitoring, or GlusterFS management software. They
       are not required to scale, although these machines are usually
       backed up.
     - Model: Dell R620

       CPU: 2x Intel® Xeon® CPU E5-2620 0 @ 2.00 GHz

       Memory: 32 GB

       Disk: two 500 GB 7200 RPM SAS disks

       Network: two 10G network ports
.. _networking_layout:

Networking layout
-----------------

The network contains all the management devices for all hardware in the
environment (for example, by including Dell iDRAC7 devices for the
hardware nodes, and management interfaces for network switches). The
network is accessed by internal staff only when diagnosing or recovering
a hardware issue.

OpenStack internal network
--------------------------

This network is used for OpenStack management functions and traffic,
including services needed for the provisioning of physical nodes
(``pxe``, ``tftp``, ``kickstart``), traffic between various OpenStack
node types using OpenStack APIs and messages (for example,
``nova-compute`` talking to ``keystone`` or ``cinder-volume`` talking to
``nova-api``), and all traffic for storage data to the storage layer
underneath by the Gluster protocol. All physical nodes have at least one
network interface (typically ``eth0``) in this network. This network is
only accessible from other VLANs on port 22 (for ``ssh`` access to
manage machines).
Public Network
--------------

This network is a combination of:

- IP addresses for public-facing interfaces on the controller nodes
  (through which end users will access the OpenStack services)

- A range of publicly routable, IPv4 network addresses to be used by
  OpenStack Networking for floating IPs. You may be restricted in your
  access to IPv4 addresses; a large range of IPv4 addresses is not
  necessary.

- Routers for private networks created within OpenStack.

This network is connected to the controller nodes so users can access
the OpenStack interfaces, and connected to the network nodes to provide
VMs with publicly routable traffic functionality. The network is also
connected to the utility machines so that any utility services that need
to be made public (such as system monitoring) can be accessed.
VM traffic network
------------------

This is a closed network that is not publicly routable and is simply
used as a private, internal network for traffic between virtual machines
in OpenStack, and between the virtual machines and the network nodes
that provide L3 routes out to the public network (and floating IPs for
connections back in to the VMs). Because this is a closed network, we
use a different address space from the others to clearly define the
separation. Only Compute and OpenStack Networking nodes need to be
connected to this network.
Node connectivity
~~~~~~~~~~~~~~~~~

The following section details how the nodes are connected to the
different networks (see :ref:`networking_layout`) and what other
considerations need to be taken into account (for example, bonding) when
connecting nodes to the networks.

Initial deployment
------------------

Initially, the connection setup should revolve around keeping the
connectivity simple and straightforward in order to minimize deployment
complexity and time to deploy. The deployment shown below aims to have
1 × 10G connectivity available to all compute nodes, while still
leveraging bonding on appropriate nodes for maximum performance.
.. figure:: figures/osog_0101.png
   :alt: Basic node deployment
   :width: 100%

   Basic node deployment
Connectivity for maximum performance
------------------------------------

If the networking performance of the basic layout is not enough, you can
move to the design below, which provides 2 × 10G network links to all
instances in the environment, as well as providing more network
bandwidth to the storage layer.

.. figure:: figures/osog_0102.png
   :alt: Performance node deployment
   :width: 100%

   Performance node deployment
Node diagrams
~~~~~~~~~~~~~

The following diagrams include logical information about the different
types of nodes, indicating what services will be running on top of them
and how they interact with each other. The diagrams also illustrate how
the availability and scalability of services are achieved.

.. _controller_node:

.. figure:: figures/osog_0103.png
   :alt: Controller node
   :width: 100%

   Controller node

.. _compute_node:

.. figure:: figures/osog_0104.png
   :alt: Compute node
   :width: 100%

   Compute node

.. _network_node:

.. figure:: figures/osog_0105.png
   :alt: Network node
   :width: 100%

   Network node

.. _storage_node:

.. figure:: figures/osog_0106.png
   :alt: Storage node
   :width: 100%

   Storage node
Example Component Configuration
-------------------------------

The following tables include example configuration and considerations
for both third-party and OpenStack components:

.. list-table:: Table: Third-party component configuration
   :widths: 25 25 25 25
   :header-rows: 1

   * - Component
     - Tuning
     - Availability
     - Scalability
   * - MySQL
     - ``binlog-format = row``
     - Master/master replication. However, both nodes are not used at
       the same time. Replication keeps all nodes as close to being up
       to date as possible (although the asynchronous nature of the
       replication means a fully consistent state is not possible).
       Connections to the database only happen through a Pacemaker
       virtual IP, ensuring that most problems that occur with
       master-master replication can be avoided.
     - Not heavily considered. Once load on the MySQL server increases
       enough that scalability needs to be considered, multiple masters
       or a master/slave setup can be used.
   * - Qpid
     - ``max-connections=1000`` ``worker-threads=20``
       ``connection-backlog=10``, SASL security enabled with SASL-BASIC
       authentication
     - Qpid is added as a resource to the Pacemaker software that runs
       on the controller nodes where Qpid is situated. This ensures
       that only one Qpid instance is running at one time, and the node
       with the Pacemaker virtual IP will always be the node running
       Qpid.
     - Not heavily considered. However, Qpid can be changed to run on
       all controller nodes for scalability and availability purposes,
       and removed from Pacemaker.
   * - HAProxy
     - ``maxconn 3000``
     - HAProxy is a software layer-7 load balancer used to front all
       clustered OpenStack API components and do SSL termination.
       HAProxy can be added as a resource to the Pacemaker software
       that runs on the controller nodes where HAProxy is situated.
       This ensures that only one HAProxy instance is running at one
       time, and the node with the Pacemaker virtual IP will always be
       the node running HAProxy.
     - Not considered. HAProxy has small enough performance overheads
       that a single instance should scale enough for this level of
       workload. If extra scalability is needed, ``keepalived`` or
       other Layer-4 load balancing can be introduced to be placed in
       front of multiple copies of HAProxy.
   * - Memcached
     - ``MAXCONN="8192" CACHESIZE="30457"``
     - Memcached is a fast in-memory key-value cache software that is
       used by OpenStack components for caching data and increasing
       performance. Memcached runs on all controller nodes, ensuring
       that should one go down, another instance of Memcached is
       available.
     - Not considered. A single instance of Memcached should be able to
       scale to the desired workloads. If scalability is desired,
       HAProxy can be placed in front of Memcached (in raw ``tcp``
       mode) to utilize multiple Memcached instances for scalability.
       However, this might cause cache consistency issues.
   * - Pacemaker
     - Configured to use ``corosync`` and ``cman`` as a cluster
       communication stack/quorum manager, and as a two-node cluster.
     - Pacemaker is the clustering software used to ensure the
       availability of services running on the controller and network
       nodes:

       * Because Pacemaker is cluster software, the software itself
         handles its own availability, leveraging ``corosync`` and
         ``cman`` underneath.
       * If you use the GlusterFS native client, no virtual IP is
         needed, since the client knows all about nodes after initial
         connection and automatically routes around failures on the
         client side.
       * If you use the NFS or SMB adaptor, you will need a virtual IP
         on which to mount the GlusterFS volumes.
     - If more nodes need to be made cluster aware, Pacemaker can scale
       to 64 nodes.
   * - GlusterFS
     - ``glusterfs`` performance profile "virt" enabled on all volumes.
       Volumes are set up in two-node replication.
     - GlusterFS is a clustered file system that is run on the storage
       nodes to provide persistent scalable data storage in the
       environment. Because all connections to gluster use the
       ``gluster`` native mount points, the ``gluster`` instances
       themselves provide availability and failover functionality.
     - The scalability of GlusterFS storage can be achieved by adding
       in more storage volumes.
.. list-table:: Table: OpenStack component configuration
   :widths: 20 20 20 20 20
   :header-rows: 1

   * - Component
     - Node type
     - Tuning
     - Availability
     - Scalability
   * - Dashboard (horizon)
     - Controller
     - Configured to use Memcached as a session store, ``neutron``
       support is enabled, ``can_set_mount_point = False``
     - The dashboard is run on all controller nodes, ensuring at least one
       instance will be available in case of node failure.
       It also sits behind HAProxy, which detects when the software fails
       and routes requests around the failing instance.
     - The dashboard is run on all controller nodes, so scalability can be
       achieved with additional controller nodes. HAProxy allows scalability
       for the dashboard as more nodes are added.
   * - Identity (keystone)
     - Controller
     - Configured to use Memcached for caching and PKI for tokens.
     - Identity is run on all controller nodes, ensuring at least one
       instance will be available in case of node failure.
       Identity also sits behind HAProxy, which detects when the software
       fails and routes requests around the failing instance.
     - Identity is run on all controller nodes, so scalability can be
       achieved with additional controller nodes.
       HAProxy allows scalability for Identity as more nodes are added.
   * - Image service (glance)
     - Controller
     - ``/var/lib/glance/images`` is a GlusterFS native mount to a Gluster
       volume off the storage layer.
     - The Image service is run on all controller nodes, ensuring at least
       one instance will be available in case of node failure.
       It also sits behind HAProxy, which detects when the software fails
       and routes requests around the failing instance.
     - The Image service is run on all controller nodes, so scalability
       can be achieved with additional controller nodes. HAProxy allows
       scalability for the Image service as more nodes are added.
   * - Compute (nova)
     - Controller, Compute
     - Configured to use Qpid with ``qpid_heartbeat = 10``, configured to
       use Memcached for caching, configured to use ``libvirt``, configured
       to use ``neutron``.

       Configured ``nova-consoleauth`` to use Memcached for session
       management (so that it can have multiple copies and run behind a
       load balancer).
     - The nova API, scheduler, objectstore, cert, consoleauth, conductor,
       and vncproxy services are run on all controller nodes, ensuring at
       least one instance will be available in case of node failure.
       Compute is also behind HAProxy, which detects when the software
       fails and routes requests around the failing instance.

       The nova-compute and nova-conductor services, which run on the
       compute nodes, are only needed to run services on that node, so
       availability of those services is coupled tightly to the nodes that
       are available. As long as a compute node is up, it will have the
       needed services running on top of it.
     - The nova API, scheduler, objectstore, cert, consoleauth, conductor,
       and vncproxy services are run on all controller nodes, so scalability
       can be achieved with additional controller nodes. HAProxy allows
       scalability for Compute as more nodes are added. The scalability
       of services running on the compute nodes (compute, conductor) is
       achieved linearly by adding in more compute nodes.
   * - Block Storage (cinder)
     - Controller
     - Configured to use Qpid with ``qpid_heartbeat = 10``, configured to
       use a Gluster volume from the storage layer as the back end for
       Block Storage, using the Gluster native client.
     - Block Storage API, scheduler, and volume services are run on all
       controller nodes, ensuring at least one instance will be available
       in case of node failure. Block Storage also sits behind HAProxy,
       which detects if the software fails and routes requests around the
       failing instance.
     - Block Storage API, scheduler, and volume services are run on all
       controller nodes, so scalability can be achieved with additional
       controller nodes. HAProxy allows scalability for Block Storage as
       more nodes are added.
   * - OpenStack Networking (neutron)
     - Controller, Compute, Network
     - Configured to use Qpid with ``qpid_heartbeat = 10``, kernel
       namespace support enabled, ``tenant_network_type = vlan``,
       ``allow_overlapping_ips = true``,
       ``bridge_uplinks = br-ex:em2``, ``bridge_mappings = physnet1:br-ex``
     - The OpenStack Networking service is run on all controller nodes,
       ensuring at least one instance will be available in case of node
       failure. It also sits behind HAProxy, which detects if the software
       fails and routes requests around the failing instance.
     - The OpenStack Networking server service is run on all controller
       nodes, so scalability can be achieved with additional controller
       nodes. HAProxy allows scalability for OpenStack Networking as more
       nodes are added. Scalability of services running on the network
       nodes is not currently supported by OpenStack Networking, so they
       are not considered. One copy of the services should be sufficient
       to handle the workload. Scalability of the ``ovs-agent`` running on
       compute nodes is achieved by adding in more compute nodes as
       necessary.
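
Several of the services above are described as sitting behind HAProxy
with health checks. As a rough illustration only (the listen name,
virtual IP, and controller addresses below are hypothetical, not part of
this reference deployment), an HAProxy front end for one such API
service might look like:

.. code-block:: none

   listen keystone_api
       bind 192.0.2.100:5000
       balance roundrobin
       option httpchk GET /
       server controller1 192.0.2.101:5000 check inter 2000 rise 2 fall 3
       server controller2 192.0.2.102:5000 check inter 2000 rise 2 fall 3

The ``check`` parameters make HAProxy probe each back end every two
seconds and route requests away from an instance after three failed
checks, which is the failure-detection behavior the table relies on.
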
@ -0,0 +1,259 @@
===============================================
Example Architecture — Legacy Networking (nova)
===============================================

This particular example architecture has been upgraded from :term:`Grizzly` to
:term:`Havana` and tested in production environments where many public IP
addresses are available for assignment to multiple instances. You can
find a second example architecture that uses OpenStack Networking
(neutron) after this section. Each example offers high availability,
meaning that if a particular node goes down, another node with the same
configuration can take over the tasks so that the services continue to
be available.

Overview
~~~~~~~~

The simplest architecture you can build upon for Compute has a single
cloud controller and multiple compute nodes. The simplest architecture
for Object Storage has five nodes: one for identifying users and
proxying requests to the API, then four for storage itself to provide
enough replication for eventual consistency. This example architecture
does not dictate a particular number of nodes, but shows the thinking
and considerations that went into choosing this architecture, including
the features offered.
Components
~~~~~~~~~~

.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - Component
     - Details
   * - OpenStack release
     - Havana
   * - Host operating system
     - Ubuntu 12.04 LTS or Red Hat Enterprise Linux 6.5,
       including derivatives such as CentOS and Scientific Linux
   * - OpenStack package repository
     - `Ubuntu Cloud Archive <https://wiki.ubuntu.com/ServerTeam/CloudArchive>`_
       or `RDO <http://openstack.redhat.com/Frequently_Asked_Questions>`_
   * - Hypervisor
     - KVM
   * - Database
     - MySQL\*
   * - Message queue
     - RabbitMQ for Ubuntu; Qpid for Red Hat Enterprise Linux and derivatives
   * - Networking service
     - ``nova-network``
   * - Network manager
     - FlatDHCP
   * - Single ``nova-network`` or multi-host?
     - multi-host\*
   * - Image service (glance) back end
     - file
   * - Identity (keystone) driver
     - SQL
   * - Block Storage (cinder) back end
     - LVM/iSCSI
   * - Live Migration back end
     - Shared storage using NFS\*
   * - Object storage
     - OpenStack Object Storage (swift)
An asterisk (\*) indicates when the example architecture deviates from
the settings of a default installation. We'll offer explanations for
those deviations next.

.. note::

   The following features of OpenStack are supported by the example
   architecture documented in this guide, but are optional:

   - :term:`Dashboard`: You probably want to offer a dashboard, but your
     users may be more interested in API access only.

   - Block storage: You don't have to offer users block storage if
     their use case only needs ephemeral storage on compute nodes, for
     example.

   - Floating IP address: Floating IP addresses are public IP
     addresses that you allocate from a predefined pool to assign to
     virtual machines at launch. Floating IP addresses ensure that the
     public IP address is available whenever an instance is booted.
     Not every organization can offer thousands of public floating IP
     addresses for thousands of instances, so this feature is
     considered optional.

   - Live migration: If you need to move running virtual machine
     instances from one host to another with little or no service
     interruption, you would enable live migration, but it is
     considered optional.

   - Object storage: You may choose to store machine images on a file
     system rather than in object storage if you do not have the extra
     hardware for the required replication and redundancy that
     OpenStack Object Storage offers.
Rationale
~~~~~~~~~

This example architecture has been selected based on the current default
feature set of OpenStack Havana, with an emphasis on stability. We
believe that many clouds that currently run OpenStack in production have
made similar choices.

You must first choose the operating system that runs on all of the
physical nodes. While OpenStack is supported on several distributions of
Linux, we used *Ubuntu 12.04 LTS (Long Term Support)*, which is used by
the majority of the development community, has feature completeness
compared with other distributions, and has clear future support plans.

We recommend that you do not use the default Ubuntu OpenStack install
packages and instead use the `Ubuntu Cloud
Archive <https://wiki.ubuntu.com/ServerTeam/CloudArchive>`__. The Cloud
Archive is a package repository supported by Canonical that allows you
to upgrade to future OpenStack releases while remaining on Ubuntu 12.04.

*KVM* as a :term:`hypervisor` complements the choice of Ubuntu, being a
matched pair in terms of support, and also because of the significant degree
of attention it garners from the OpenStack development community (including
the authors, who mostly use KVM). It is also feature complete and free from
licensing charges and restrictions.

*MySQL* follows a similar trend. Despite its recent change of ownership,
this database is the most tested for use with OpenStack and is heavily
documented. We deviate from the default database, *SQLite*, because
SQLite is not an appropriate database for production usage.

The choice of *RabbitMQ* over other
:term:`AMQP <Advanced Message Queuing Protocol (AMQP)>`-compatible options
that are gaining support in OpenStack, such as ZeroMQ and Qpid, is due to its
ease of use and significant testing in production. It is also the only
option that supports features such as Compute cells. We recommend
clustering with RabbitMQ, as it is an integral component of the system
and fairly simple to implement thanks to its built-in clustering support.

As discussed in previous chapters, there are several options for
networking in OpenStack Compute. We recommend *FlatDHCP* and the use of
*Multi-Host* networking mode for high availability, running one
``nova-network`` daemon per OpenStack compute host. This provides a
robust mechanism for ensuring network interruptions are isolated to
individual compute hosts, and allows for the direct use of hardware
network gateways.

*Live Migration* is supported by way of shared storage, with *NFS* as
the distributed file system.

Acknowledging that many small-scale deployments see running Object
Storage just for the storage of virtual machine images as too costly, we
opted for the file back end in the OpenStack :term:`Image service` (glance).
If your cloud will include Object Storage, you can easily add it as a back
end.

We chose the *SQL back end for Identity* over others, such as LDAP. This
back end is simple to install and is robust. The authors acknowledge
that many installations want to bind with existing directory services
and caution careful understanding of the `array of options available
<http://docs.openstack.org/havana/config-reference/content/ch_configuring-openstack-identity.html#configuring-keystone-for-ldap-backend>`_.

Block Storage (cinder) is installed natively on external storage nodes
and uses the *LVM/iSCSI plug-in*. Most Block Storage plug-ins are tied
to particular vendor products and implementations, limiting their use to
consumers of those hardware platforms, but LVM/iSCSI is robust and
stable on commodity hardware.

While the cloud can be run without the *OpenStack Dashboard*, we
consider it to be indispensable, not just for user interaction with the
cloud, but also as a tool for operators. Additionally, the dashboard's
use of Django makes it a flexible framework for extension.
Why not use OpenStack Networking?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This example architecture does not use OpenStack Networking, because it
does not yet support multi-host networking and our organizations
(university, government) have access to a large range of
publicly-accessible IPv4 addresses.

Why use multi-host networking?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In a default OpenStack deployment, there is a single ``nova-network``
service that runs within the cloud (usually on the cloud controller)
and provides services such as
:term:`network address translation <NAT>` (NAT), :term:`DHCP`,
and :term:`DNS` to the guest instances. If the single node that runs the
``nova-network`` service goes down, you cannot access your instances,
and the instances cannot access the Internet. The single node that runs
the ``nova-network`` service can become a bottleneck if excessive
network traffic comes in and goes out of the cloud.

.. note::

   `Multi-host <http://docs.openstack.org/havana/install-guide/install/apt/content/nova-network.html>`_
   is a high-availability option for the network configuration, where
   the ``nova-network`` service is run on every compute node instead of
   running on only a single node.
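
To give a flavor of what multi-host mode involves on each compute node,
the relevant ``nova.conf`` options look roughly like the following
sketch. This is illustrative only: the interface and bridge names are
placeholders, and you should confirm option names against the
configuration reference for your release.

.. code-block:: ini

   [DEFAULT]
   # Run a nova-network service on this (and every) compute node
   multi_host = True
   network_manager = nova.network.manager.FlatDHCPManager
   # Placeholder interface/bridge names; adjust for your hardware
   public_interface = eth0
   flat_interface = eth1
   flat_network_bridge = br100

With this in place, each compute node handles NAT, DHCP, and DNS for its
own instances, so a single node failure no longer takes down networking
for the whole cloud.
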
Detailed Description
--------------------

The reference architecture consists of multiple compute nodes, a cloud
controller, an external NFS storage server for instance storage, and an
OpenStack Block Storage server for volume storage. A network time
service (:term:`Network Time Protocol <NTP>`, or NTP) synchronizes time
on all the nodes. FlatDHCPManager in multi-host mode is used for the
networking. A logical diagram for this example architecture shows which
services are running on each node:

.. image:: figures/osog_01in01.png
   :width: 100%

The cloud controller runs the dashboard, the API services, the database
(MySQL), a message queue server (RabbitMQ), the scheduler for choosing
compute resources (``nova-scheduler``), Identity services (keystone,
``nova-consoleauth``), Image services (``glance-api``,
``glance-registry``), services for console access of guests, and Block
Storage services, including the scheduler for storage resources
(``cinder-api`` and ``cinder-scheduler``).

Compute nodes are where the computing resources are held, and in our
example architecture, they run the hypervisor (KVM), libvirt (the driver
for the hypervisor, which enables live migration from node to node),
``nova-compute``, ``nova-api-metadata`` (generally only used when
running in multi-host mode, it retrieves instance-specific metadata),
``nova-vncproxy``, and ``nova-network``.

The network consists of two switches, one for the management or private
traffic, and one that covers public access, including floating IPs. To
support this, the cloud controller and the compute nodes have two
network cards. The OpenStack Block Storage and NFS storage servers only
need to access the private network and therefore only need one network
card, but multiple cards run in a bonded configuration are recommended
if possible. Floating IP access is direct to the Internet, whereas flat
IP access goes through a NAT. To envision the network traffic, use this
diagram:

.. image:: figures/osog_01in02.png
   :width: 100%
Optional Extensions
-------------------

You can extend this reference architecture as follows:

- Add additional cloud controllers (see :doc:`ops_maintenance`).

- Add an OpenStack Storage service (see the Object Storage chapter in
  the *OpenStack Installation Guide* for your distribution).

- Add additional OpenStack Block Storage hosts (see
  :doc:`ops_maintenance`).

@ -0,0 +1,11 @@
=========================================
Parting Thoughts on Architecture Examples
=========================================

With so many considerations and options available, our hope is to
provide a few clearly-marked and tested paths for your OpenStack
exploration. If you're looking for additional ideas, check out
:doc:`app_usecases`, the
`OpenStack Installation Guides <http://docs.openstack.org/#install-guides>`_, or the
`OpenStack User Stories
page <http://www.openstack.org/user-stories/>`_.

@ -0,0 +1,30 @@
=====================
Architecture Examples
=====================

To understand the possibilities that OpenStack offers, it's best to
start with basic architecture that has been tested in production
environments. We offer two examples with basic pivots on the base
operating system (Ubuntu and Red Hat Enterprise Linux) and the
networking architecture. There are other differences between these two
examples, and this guide provides reasons for each choice made.

Because OpenStack is highly configurable, with many different back ends
and network configuration options, it is difficult to write
documentation that covers all possible OpenStack deployments. Therefore,
this guide defines examples of architecture to simplify the task of
documenting, as well as to provide the scope for this guide. Both of the
offered architecture examples are currently running in production and
serving users.

.. note::

   As always, refer to the :doc:`common/glossary` if you are unclear
   about any of the terminology mentioned in architecture examples.

.. toctree::
   :maxdepth: 2

   arch_example_nova_network.rst
   arch_example_neutron.rst
   arch_example_thoughts.rst

@ -0,0 +1,290 @@
==============
Network Design
==============

OpenStack provides a rich networking environment, and this chapter
details the requirements and options to consider when designing your
cloud.

.. warning::

   If this is the first time you are deploying a cloud infrastructure
   in your organization, after reading this section, your first
   conversations should be with your networking team. Network usage in
   a running cloud is vastly different from traditional network
   deployments and has the potential to be disruptive at both a
   connectivity and a policy level.

For example, you must plan the number of IP addresses that you need for
both your guest instances and your management infrastructure.
Additionally, you must research and discuss cloud network connectivity
through proxy servers and firewalls.

In this chapter, we'll give some examples of network implementations to
consider and provide information about some of the network layouts that
OpenStack uses. Finally, we have some brief notes on the networking
services that are essential for stable operation.

Management Network
~~~~~~~~~~~~~~~~~~

A :term:`management network` (a separate network for use by your cloud
operators) typically consists of a separate switch and separate NICs
(network interface cards), and is a recommended option. This segregation
prevents system administration and the monitoring of system access from
being disrupted by traffic generated by guests.

Consider creating other private networks for communication between
internal components of OpenStack, such as the message queue and
OpenStack Compute. Using a virtual local area network (VLAN) works well
for these scenarios because it provides a method for creating multiple
virtual networks on a physical network.

Public Addressing Options
~~~~~~~~~~~~~~~~~~~~~~~~~

There are two main types of IP addresses for guest virtual machines:
fixed IPs and floating IPs. Fixed IPs are assigned to instances on boot,
whereas floating IP addresses can change their association between
instances by action of the user. Both types of IP addresses can be
either public or private, depending on your use case.

Fixed IP addresses are required, whereas it is possible to run OpenStack
without floating IPs. One of the most common use cases for floating IPs
is to provide public IP addresses to a private cloud, where there are a
limited number of IP addresses available. Another is for a public cloud
user to have a "static" IP address that can be reassigned when an
instance is upgraded or moved.

Fixed IP addresses can be private for private clouds, or public for
public clouds. When an instance terminates, its fixed IP is lost. It is
worth noting that newer users of cloud computing may find the
ephemeral nature of fixed IP addresses frustrating.
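
As a brief illustration of the floating IP workflow with the ``nova``
client of this era (the pool name, instance name, and address below are
examples only, and commands may differ in later releases):

.. code-block:: console

   $ nova floating-ip-create public
   $ nova add-floating-ip test-instance 203.0.113.10

The first command allocates an address from the ``public`` pool to your
project; the second associates it with a running instance, after which
the address can later be disassociated and reused.
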
IP Address Planning
~~~~~~~~~~~~~~~~~~~

An OpenStack installation can potentially have many subnets (ranges of
IP addresses) and different types of services in each. An IP address
plan can assist with a shared understanding of network partition
purposes and scalability. Control services can have public and private
IP addresses, and as noted above, there are a couple of options for an
instance's public addresses.

An IP address plan might be broken down into the following sections:

Subnet router
   Packets leaving the subnet go via this address, which could be a
   dedicated router or a ``nova-network`` service.

Control services public interfaces
   Public access to ``swift-proxy``, ``nova-api``, ``glance-api``, and
   horizon comes to these addresses, which could be on one side of a
   load balancer or pointing at individual machines.

Object Storage cluster internal communications
   Traffic among object/account/container servers and between these and
   the proxy server's internal interface uses this private network.

Compute and storage communications
   If ephemeral or block storage is external to the compute node, this
   network is used.

Out-of-band remote management
   If a dedicated remote access controller chip is included in servers,
   often these are on a separate network.

In-band remote management
   Often, an extra (such as 1 GB) interface on compute or storage nodes
   is used for system administrators or monitoring tools to access the
   host instead of going through the public interface.

Spare space for future growth
   Adding more public-facing control services or guest instance IPs
   should always be part of your plan.

For example, take a deployment that has both OpenStack Compute and
Object Storage, with private ranges 172.22.42.0/24 and 172.22.87.0/26
available. One way to segregate the space might be as follows:

::

   172.22.42.0/24:
   172.22.42.1 - 172.22.42.3 - subnet routers
   172.22.42.4 - 172.22.42.20 - spare for networks
   172.22.42.21 - 172.22.42.104 - Compute node remote access controllers (inc spare)
   172.22.42.105 - 172.22.42.188 - Compute node management interfaces (inc spare)
   172.22.42.189 - 172.22.42.208 - Swift proxy remote access controllers (inc spare)
   172.22.42.209 - 172.22.42.228 - Swift proxy management interfaces (inc spare)
   172.22.42.229 - 172.22.42.252 - Swift storage servers remote access controllers (inc spare)
   172.22.42.253 - 172.22.42.254 - spare
   172.22.87.0/26:
   172.22.87.1 - 172.22.87.3 - subnet routers
   172.22.87.4 - 172.22.87.24 - Swift proxy server internal interfaces (inc spare)
   172.22.87.25 - 172.22.87.63 - Swift object server internal interfaces (inc spare)

A similar approach can be taken with public IP addresses, taking note
that large, flat ranges are preferred for use with guest instance IPs.
Take into account that for some OpenStack networking options, a public
IP address in the range of a guest instance public IP address is
assigned to the ``nova-compute`` host.
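
A plan like this is easy to sanity-check programmatically. The short
sketch below (a hypothetical helper, not part of OpenStack) uses
Python's standard ``ipaddress`` module to verify that each planned range
fits inside its parent subnet and that no two ranges overlap:

.. code-block:: python

   import ipaddress

   subnet = ipaddress.ip_network("172.22.42.0/24")
   # A subset of the plan above: name -> (first address, last address)
   plan = {
       "subnet routers": ("172.22.42.1", "172.22.42.3"),
       "spare for networks": ("172.22.42.4", "172.22.42.20"),
       "compute remote access controllers": ("172.22.42.21", "172.22.42.104"),
       "compute management interfaces": ("172.22.42.105", "172.22.42.188"),
   }

   def check_plan(subnet, plan):
       """Return True if every range fits the subnet and no ranges overlap."""
       spans = []
       for name, (first, last) in plan.items():
           lo = ipaddress.ip_address(first)
           hi = ipaddress.ip_address(last)
           if lo > hi or lo not in subnet or hi not in subnet:
               return False
           spans.append((int(lo), int(hi)))
       spans.sort()
       # adjacent sorted spans overlap if one starts before the previous ends
       return all(prev[1] < cur[0] for prev, cur in zip(spans, spans[1:]))

   print(check_plan(subnet, plan))  # True

Running a check like this whenever the plan changes catches the most
common spreadsheet mistakes (typos that push a range outside the subnet,
or two ranges silently overlapping) before they reach production.
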

Network Topology
~~~~~~~~~~~~~~~~

OpenStack Compute with ``nova-network`` provides predefined network
deployment models, each with its own strengths and weaknesses. The
selection of a network manager changes your network topology, so the
choice should be made carefully. You also have a choice between the
tried-and-true legacy ``nova-network`` settings or the neutron project
for OpenStack Networking. Both offer networking for launched instances
with different implementations and requirements.

For OpenStack Networking with the neutron project, typical
configurations are documented with the idea that any setup you can
configure with real hardware you can re-create with a software-defined
equivalent. Each tenant can contain typical network elements such as
routers, and services such as :term:`DHCP`.

The following table describes the networking deployment options for both
legacy ``nova-network`` options and an equivalent neutron
configuration.

.. list-table:: Networking deployment options
   :widths: 25 25 25 25
   :header-rows: 1

   * - Network deployment model
     - Strengths
     - Weaknesses
     - Neutron equivalent
   * - Flat
     - Extremely simple topology. No DHCP overhead.
     - Requires file injection into the instance to configure network
       interfaces.
     - Configure a single bridge as the integration bridge (br-int) and
       connect it to a physical network interface with the Modular Layer 2
       (ML2) plug-in, which uses Open vSwitch by default.
   * - FlatDHCP
     - Relatively simple to deploy. Standard networking. Works with all guest
       operating systems.
     - Requires its own DHCP broadcast domain.
     - Configure DHCP agents and routing agents. Network Address Translation
       (NAT) performed outside of compute nodes, typically on one or more
       network nodes.
   * - VlanManager
     - Each tenant is isolated to its own VLANs.
     - More complex to set up. Requires its own DHCP broadcast domain.
       Requires many VLANs to be trunked onto a single port. Standard VLAN
       number limitation. Switches must support 802.1q VLAN tagging.
     - Isolated tenant networks implement some form of isolation of layer-2
       traffic between distinct networks. VLAN tagging is a key concept, where
       traffic is "tagged" with an ordinal identifier for the VLAN. Isolated
       network implementations may or may not include additional services like
       DHCP, NAT, and routing.
   * - FlatDHCP Multi-host with high availability (HA)
     - Networking failure is isolated to the VMs running on the affected
       hypervisor. DHCP traffic can be isolated within an individual host.
       Network traffic is distributed to the compute nodes.
     - More complex to set up. Compute nodes typically need IP addresses
       accessible by external networks. Options must be carefully configured
       for live migration to work with networking services.
     - Configure neutron with multiple DHCP and layer-3 agents. Network nodes
       are not able to fail over to each other, so the controller runs
       networking services, such as DHCP. Compute nodes run the ML2 plug-in
       with support for agents such as Open vSwitch or Linux Bridge.

Both ``nova-network`` and neutron services provide similar capabilities,
such as VLANs between VMs. You can also provide multiple NICs on VMs
with either service. Further discussion follows.

VLAN Configuration Within OpenStack VMs
---------------------------------------

VLAN configuration can be as simple or as complicated as desired. The
use of VLANs has the benefit of allowing each project its own subnet and
broadcast segregation from other projects. To allow OpenStack to
efficiently use VLANs, you must allocate a VLAN range (one for each
project) and turn each compute node switch port into a trunk
port.

For example, if you estimate that your cloud must support a maximum of
100 projects, pick a free VLAN range that your network infrastructure is
currently not using (such as VLAN 200–299). You must configure OpenStack
with this range and also configure your switch ports to allow VLAN
traffic from that range.
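
The planning above can be checked with a few lines of code. This is an
illustrative sketch only; the project names and the 200–299 range are
assumptions carried over from the example, not output of any OpenStack tool:

```python
# Reserve one VLAN per project from a free range. With VLANs 200-299
# available, the cloud can host at most 100 isolated projects.
vlan_range = range(200, 300)
projects = ["project-%02d" % i for i in range(100)]

# Map each project to its dedicated VLAN ID.
vlan_for_project = dict(zip(projects, vlan_range))

print(len(vlan_for_project))           # number of projects that fit
print(vlan_for_project["project-00"])  # first allocated VLAN ID
```

If you expect more projects than the range can hold, pick a larger free
range before deployment; renumbering VLANs later is disruptive.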

Multi-NIC Provisioning
----------------------

OpenStack Networking with ``neutron`` and OpenStack Compute with
``nova-network`` have the ability to assign multiple NICs to instances. For
``nova-network`` this can be done on a per-request basis, with each
additional NIC using up an entire subnet or VLAN, reducing the total
number of supported projects.

Multi-Host and Single-Host Networking
-------------------------------------

The ``nova-network`` service has the ability to operate in a multi-host
or single-host mode. Multi-host is when each compute node runs a copy of
``nova-network`` and the instances on that compute node use the compute
node as a gateway to the Internet. The compute nodes also host the
floating IPs and security groups for instances on that node. Single-host
is when a central server—for example, the cloud controller—runs the
``nova-network`` service. All compute nodes forward traffic from the
instances to the cloud controller. The cloud controller then forwards
traffic to the Internet. The cloud controller hosts the floating IPs and
security groups for all instances on all compute nodes in the
cloud.

There are benefits to both modes. Single-host has the downside of a
single point of failure. If the cloud controller is not available,
instances cannot communicate on the network. This is not true with
multi-host, but multi-host requires that each compute node has a public
IP address to communicate on the Internet. If you are not able to obtain
a significant block of public IP addresses, multi-host might not be an
option.

Services for Networking
~~~~~~~~~~~~~~~~~~~~~~~

OpenStack, like any network application, has a number of standard
considerations to apply, such as NTP and DNS.

NTP
---

Time synchronization is a critical element to ensure continued operation
of OpenStack components. Correct time is necessary to avoid errors in
instance scheduling, replication of objects in the object store, and
even matching log timestamps for debugging.

All servers running OpenStack components should be able to access an
appropriate NTP server. You may decide to set up one locally or use the
public pools available from the `Network Time Protocol
project <http://www.pool.ntp.org/en/>`_.

DNS
---

OpenStack does not currently provide DNS services, aside from the
dnsmasq daemon, which resides on ``nova-network`` hosts. You could
consider providing a dynamic DNS service to allow instances to update a
DNS entry with new IP addresses. You can also consider making a generic
forward and reverse DNS mapping for instances' IP addresses, such as
vm-203-0-113-123.example.com.
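
The generic naming scheme above is simple to generate mechanically. The
helper function and the ``example.com`` domain below are illustrative
assumptions, not part of any OpenStack API:

```python
def instance_hostname(ip, domain="example.com"):
    """Build a forward-DNS name such as vm-203-0-113-123.example.com
    by substituting dashes for the dots in the instance IP address."""
    return "vm-" + ip.replace(".", "-") + "." + domain

print(instance_hostname("203.0.113.123"))  # vm-203-0-113-123.example.com
```

Because the mapping is reversible, the same convention works for both
forward and reverse zones.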

Conclusion
~~~~~~~~~~

Armed with your IP address layout and numbers and knowledge about the
topologies and services you can use, it's now time to prepare the
network for your installation. Be sure to also check out the `OpenStack
Security Guide <http://docs.openstack.org/sec/>`_ for tips on securing
your network. We wish you a good relationship with your networking team!

===========================
Provisioning and Deployment
===========================

A critical part of a cloud's scalability is the amount of effort that it
takes to run your cloud. To minimize the operational cost of running
your cloud, set up and use an automated deployment and configuration
infrastructure with a configuration management system, such as :term:`Puppet`
or :term:`Chef`. Combined, these systems greatly reduce manual effort and the
chance for operator error.

This infrastructure includes systems to automatically install the
operating system's initial configuration and later coordinate the
configuration of all services automatically and centrally, which reduces
both manual effort and the chance for error. Examples include Ansible,
CFEngine, Chef, Puppet, and Salt. You can even use OpenStack to deploy
OpenStack, named TripleO (OpenStack On OpenStack).

Automated Deployment
~~~~~~~~~~~~~~~~~~~~

An automated deployment system installs and configures operating systems
on new servers, without intervention, after the absolute minimum amount
of manual work, including physical racking, MAC-to-IP assignment, and
power configuration. Typically, solutions rely on wrappers around PXE
boot and TFTP servers for the basic operating system install and then
hand off to an automated configuration management system.

Both Ubuntu and Red Hat Enterprise Linux include mechanisms for
configuring the operating system, including preseed and kickstart, that
you can use after a network boot. Typically, these are used to bootstrap
an automated configuration system. Alternatively, you can use an
image-based approach for deploying the operating system, such as
systemimager. You can use both approaches with a virtualized
infrastructure, such as when you run VMs to separate your control
services and physical infrastructure.

When you create a deployment plan, focus on a few vital areas because
they are very hard to modify post deployment. The next two sections talk
about configurations for:

- Disk partitioning and disk array setup for scalability

- Networking configuration just for PXE booting

Disk Partitioning and RAID
--------------------------

At the very base of any operating system are the hard drives on which
the operating system (OS) is installed.

You must complete the following configurations on the server's hard
drives:

- Partitioning, which provides greater flexibility for layout of
  operating system and swap space, as described below.

- Adding to a RAID array (RAID stands for redundant array of
  independent disks), based on the number of disks you have available,
  so that you can add capacity as your cloud grows. Some options are
  described in more detail below.

The simplest option to get started is to use one hard drive with two
partitions:

- File system to store files and directories, where all the data lives,
  including the root partition that starts and runs the system.

- Swap space to free up memory for processes, as an independent area of
  the physical disk used only for swapping and nothing else.

RAID is not used in this simplistic one-drive setup because generally
for production clouds, you want to ensure that if one disk fails,
another can take its place. Instead, for production, use more than one
disk. The number of disks determines what types of RAID arrays to build.

We recommend that you choose one of the following multiple disk options:

Option 1
   Partition all drives in the same way in a horizontal fashion, as
   shown in :ref:`partition_setup`.

   With this option, you can assign different partitions to different
   RAID arrays. You can allocate partition 1 of disk one and two to the
   ``/boot`` partition mirror. You can make partition 2 of all disks
   the root partition mirror. You can use partition 3 of all disks for
   a ``cinder-volumes`` LVM partition running on a RAID 10 array.

   .. _partition_setup:

   .. figure:: figures/osog_0201.png

      Partition setup of drives

   While you might end up with unused partitions, such as partition 1
   in disk three and four of this example, this option allows for
   maximum utilization of disk space. I/O performance might be an issue
   as a result of all disks being used for all tasks.

Option 2
   Add all raw disks to one large RAID array, either hardware or
   software based. You can partition this large array with the boot,
   root, swap, and LVM areas. This option is simple to implement and
   uses all partitions. However, disk I/O might suffer.

Option 3
   Dedicate entire disks to certain partitions. For example, you could
   allocate disk one and two entirely to the boot, root, and swap
   partitions under a RAID 1 mirror. Then, allocate disk three and four
   entirely to the LVM partition, also under a RAID 1 mirror. Disk I/O
   should be better because I/O is focused on dedicated tasks. However,
   the LVM partition is much smaller.
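
The capacity trade-off between the options can be estimated with simple
arithmetic. This sketch assumes four 1 TB disks; the disk count and sizes
are hypothetical, not taken from the examples above:

```python
disks = 4
disk_tb = 1.0

# RAID 1 (mirror): usable capacity equals a single member disk.
raid1_usable_tb = disk_tb

# RAID 10 (striped mirrors): usable capacity is half the raw total.
raid10_usable_tb = disks * disk_tb / 2

# RAID 5: one disk's worth of capacity is consumed by parity.
raid5_usable_tb = (disks - 1) * disk_tb

print(raid1_usable_tb, raid10_usable_tb, raid5_usable_tb)  # 1.0 2.0 3.0
```

Running similar numbers for your own disk counts makes it easier to
compare the options before committing to a layout.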

.. note::

   You may find that you can automate the partitioning itself. For
   example, MIT uses `Fully Automatic Installation
   (FAI) <http://fai-project.org/>`_ to do the initial PXE-based
   partition and then install using a combination of min/max and
   percentage-based partitioning.

As with most architecture choices, the right answer depends on your
environment. If you are using existing hardware, you know the disk
density of your servers and can determine some decisions based on the
options above. If you are going through a procurement process, your
users' requirements also help you determine hardware purchases. Here are
some examples from a private cloud providing web developers custom
environments at AT&T. This example is from a specific deployment, so
your existing hardware or procurement opportunity may vary from this.
AT&T uses three types of hardware in its deployment:

- Hardware for controller nodes, used for all stateless OpenStack API
  services. About 32–64 GB memory, small attached disk, one processor,
  varied number of cores, such as 6–12.

- Hardware for compute nodes. Typically 256 or 144 GB memory, two
  processors, 24 cores. 4–6 TB direct attached storage, typically in a
  RAID 5 configuration.

- Hardware for storage nodes. Typically for these, the disk space is
  optimized for the lowest cost per GB of storage while maintaining
  rack-space efficiency.

Again, the right answer depends on your environment. You have to make
your decision based on the trade-offs between space utilization,
simplicity, and I/O performance.

Network Configuration
---------------------

Network configuration is a very large topic that spans multiple areas of
this book. For now, make sure that your servers can PXE boot and
successfully communicate with the deployment server.

For example, you usually cannot configure NICs for VLANs when PXE
booting. Additionally, you usually cannot PXE boot with bonded NICs. If
you run into this scenario, consider using a simple 1 Gb switch in a
private network on which only your cloud communicates.

Automated Configuration
~~~~~~~~~~~~~~~~~~~~~~~

The purpose of automatic configuration management is to establish and
maintain the consistency of a system without human intervention. You
want to maintain consistency in your deployments so that you can have
the same cloud every time, repeatably. Proper use of automatic
configuration-management tools ensures that components of the cloud
systems are in particular states, in addition to simplifying deployment
and configuration change propagation.

These tools also make it possible to test and roll back changes, as they
are fully repeatable. Conveniently, a large body of work has been done
by the OpenStack community in this space. Puppet, a configuration
management tool, even provides official modules for OpenStack projects
in an OpenStack infrastructure system known as `Puppet
OpenStack <https://wiki.openstack.org/wiki/Puppet>`_. Chef
configuration management is provided within
https://git.openstack.org/cgit/openstack/openstack-chef-repo. Additional
configuration management systems include Juju, Ansible, and Salt. Also,
PackStack is a command-line utility for Red Hat Enterprise Linux and
derivatives that uses Puppet modules to support rapid deployment of
OpenStack on existing servers over an SSH connection.

An integral part of a configuration-management system is the items that
it controls. You should carefully consider all of the items that you
want, or do not want, to be automatically managed. For example, you may
not want to automatically format hard drives with user data.

Remote Management
~~~~~~~~~~~~~~~~~

In our experience, most operators don't sit right next to the servers
running the cloud, and many don't necessarily enjoy visiting the data
center. OpenStack should be entirely remotely configurable, but
sometimes not everything goes according to plan.

In this instance, having out-of-band access to nodes running OpenStack
components is a boon. The IPMI protocol is the de facto standard here,
and acquiring hardware that supports it is highly recommended to achieve
that lights-out data center aim.

In addition, consider remote power control. While IPMI usually controls
the server's power state, having remote access to the PDU that the
server is plugged into can really be useful for situations when
everything seems wedged.

Parting Thoughts for Provisioning and Deploying OpenStack
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can save time by understanding the use cases for the cloud you want
to create. Use cases for OpenStack are varied. Some include object
storage only; others require preconfigured compute resources to speed
development-environment setup; and others need fast provisioning of
compute resources that are already secured per tenant with private
networks. Your users may have need for highly redundant servers to make
sure their legacy applications continue to run. Perhaps a goal would be
to architect these legacy applications so that they run on multiple
instances in a cloudy, fault-tolerant way, but not make it a goal to add
to those clusters over time. Your users may indicate that they need
scaling considerations because of heavy Windows server use.

You can save resources by looking at the best fit for the hardware you
have in place already. You might have some high-density storage hardware
available. You could format and repurpose those servers for OpenStack
Object Storage. All of these considerations and input from users help
you build your use case and your deployment plan.

.. note::

   For further research about OpenStack deployment, investigate the
   supported and documented preconfigured, prepackaged installers for
   OpenStack from companies such as
   `Canonical <http://www.ubuntu.com/cloud/ubuntu-openstack>`_,
   `Cisco <http://www.cisco.com/web/solutions/openstack/index.html>`_,
   `Cloudscaling <http://www.cloudscaling.com/>`_,
   `IBM <http://www-03.ibm.com/software/products/en/smartcloud-orchestrator/>`_,
   `Metacloud <http://www.metacloud.com/>`_,
   `Mirantis <http://www.mirantis.com/>`_,
   `Piston <http://www.pistoncloud.com/>`_,
   `Rackspace <http://www.rackspace.com/cloud/private/>`_, `Red
   Hat <http://www.redhat.com/openstack/>`_,
   `SUSE <https://www.suse.com/products/suse-cloud/>`_, and
   `SwiftStack <https://www.swiftstack.com/>`_.

Conclusion
~~~~~~~~~~

The decisions you make with respect to provisioning and deployment will
affect your day-to-day, week-to-week, and month-to-month maintenance of
the cloud. Your configuration management will be able to evolve over
time. However, more thought and design need to be done for upfront
choices about deployment, disk partitioning, and network configuration.

=======
Scaling
=======

Whereas traditional applications required larger hardware to scale
("vertical scaling"), cloud-based applications typically request more,
discrete hardware ("horizontal scaling"). If your cloud is successful,
eventually you must add resources to meet the increasing demand.

To suit the cloud paradigm, OpenStack itself is designed to be
horizontally scalable. Rather than switching to larger servers, you
procure more servers and simply install identically configured services.
Ideally, you scale out and load balance among groups of functionally
identical services (for example, compute nodes or ``nova-api`` nodes)
that communicate on a message bus.

The Starting Point
~~~~~~~~~~~~~~~~~~

Determining the scalability of your cloud and how to improve it is an
exercise with many variables to balance. No one solution meets
everyone's scalability goals. However, it is helpful to track a number
of metrics. Since you can define virtual hardware templates, called
"flavors" in OpenStack, you can start to make scaling decisions based on
the flavors you'll provide. These templates define sizes for memory in
RAM, root disk size, amount of ephemeral data disk space available, and
number of cores for starters.

The default OpenStack flavors are shown in the following table.

.. list-table:: OpenStack default flavors
   :widths: 20 20 20 20 20
   :header-rows: 1

   * - Name
     - Virtual cores
     - Memory
     - Disk
     - Ephemeral
   * - m1.tiny
     - 1
     - 512 MB
     - 1 GB
     - 0 GB
   * - m1.small
     - 1
     - 2 GB
     - 10 GB
     - 20 GB
   * - m1.medium
     - 2
     - 4 GB
     - 10 GB
     - 40 GB
   * - m1.large
     - 4
     - 8 GB
     - 10 GB
     - 80 GB
   * - m1.xlarge
     - 8
     - 16 GB
     - 10 GB
     - 160 GB

The starting point for most is the core count of your cloud. By applying
some ratios, you can gather information about:

- The number of virtual machines (VMs) you expect to run,
  ``((overcommit fraction × cores) / virtual cores per instance)``

- How much storage is required ``(flavor disk size × number of instances)``

You can use these ratios to determine how much additional infrastructure
you need to support your cloud.

Here is an example using the ratios for gathering scalability
information for the number of VMs expected as well as the storage
needed. The following numbers support (200 / 2) × 16 = 1600 VM instances
and require 80 TB of storage for ``/var/lib/nova/instances``:

- 200 physical cores.

- Most instances are size m1.medium (two virtual cores, 50 GB of
  storage).

- Default CPU overcommit ratio (``cpu_allocation_ratio`` in nova.conf)
  of 16:1.

.. note::
   Regardless of the overcommit ratio, an instance cannot be placed
   on any physical node with fewer raw (pre-overcommit) resources than
   the instance flavor requires.
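
The arithmetic behind the worked example can be written out directly. The
values below are copied from the bullet list above; everything else is a
plain calculation, not an OpenStack tool:

```python
physical_cores = 200
cpu_allocation_ratio = 16   # cpu_allocation_ratio in nova.conf
vcpus_per_instance = 2      # m1.medium
disk_per_instance_gb = 50   # m1.medium root plus ephemeral storage

# (overcommit fraction x cores) / virtual cores per instance
max_instances = (cpu_allocation_ratio * physical_cores) // vcpus_per_instance

# flavor disk size x number of instances, for /var/lib/nova/instances
storage_tb = max_instances * disk_per_instance_gb / 1000.0

print(max_instances)  # 1600
print(storage_tb)     # 80.0
```

Substituting your own core counts and flavor mix into the same two lines
gives a quick first estimate of compute and storage capacity.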

However, you need more than the core count alone to estimate the load
that the API services, database servers, and queue servers are likely to
encounter. You must also consider the usage patterns of your cloud.

As a specific example, compare a cloud that supports a managed
web-hosting platform with one running integration tests for a
development project that creates one VM per code commit. In the former,
the heavy work of creating a VM happens only every few months, whereas
the latter puts constant heavy load on the cloud controller. You must
consider your average VM lifetime, as a larger number generally means
less load on the cloud controller.

Aside from the creation and termination of VMs, you must consider the
impact of users accessing the service—particularly on ``nova-api`` and
its associated database. Listing instances garners a great deal of
information and, given the frequency with which users run this
operation, a cloud with a large number of users can increase the load
significantly. This can occur even without their knowledge—leaving the
OpenStack dashboard instances tab open in the browser refreshes the list
of VMs every 30 seconds.

After you consider these factors, you can determine how many cloud
controller cores you require. A typical eight core, 8 GB of RAM server
is sufficient for up to a rack of compute nodes — given the above
caveats.

You must also consider key hardware specifications for the performance
of user VMs, as well as budget and performance needs, including storage
performance (spindles/core), memory availability (RAM/core), network
bandwidth (Gbps/core), and overall CPU performance (CPU/core).

.. note::

   For a discussion of metric tracking, including how to extract
   metrics from your cloud, see :doc:`ops_logging_monitoring`.

Adding Cloud Controller Nodes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can facilitate the horizontal expansion of your cloud by adding
nodes. Adding compute nodes is straightforward—they are easily picked up
by the existing installation. However, you must consider some important
points when you design your cluster to be highly available.

Recall that a cloud controller node runs several different services. You
can install services that communicate only using the message queue
internally—\ ``nova-scheduler`` and ``nova-console``—on a new server for
expansion. However, other integral parts require more care.

You should load balance user-facing services such as dashboard,
``nova-api``, or the Object Storage proxy. Use any standard HTTP
load-balancing method (DNS round robin, hardware load balancer, or
software such as Pound or HAProxy). One caveat with dashboard is the VNC
proxy, which uses the WebSocket protocol—something that an L7 load
balancer might struggle with. See also `Horizon session storage
<http://docs.openstack.org/developer/horizon/topics/deployment.html#session-storage>`_.

You can configure some services, such as ``nova-api`` and
``glance-api``, to use multiple processes by changing a flag in their
configuration file—allowing them to share work between multiple cores on
the one machine.

.. note::

   Several options are available for MySQL load balancing, and the
   supported AMQP brokers have built-in clustering support. Information
   on how to configure these and many of the other services can be
   found in :doc:`operations`.

Segregating Your Cloud
~~~~~~~~~~~~~~~~~~~~~~

When you want to offer users different regions, whether to satisfy legal
requirements for data storage, provide redundancy across earthquake
fault lines, or serve low-latency API calls, you segregate your cloud.
Use one of the following OpenStack methods to segregate your cloud:
*cells*, *regions*, *availability zones*, or *host aggregates*.

Each method provides different functionality and can be best divided
into two groups:

- Cells and regions, which segregate an entire cloud and result in
  running separate Compute deployments.

- :term:`Availability zones <availability zone>` and host aggregates, which
  merely divide a single Compute deployment.

The table below provides a comparison view of each segregation method currently
provided by OpenStack Compute.

.. list-table:: OpenStack segregation methods
   :widths: 20 20 20 20 20
   :header-rows: 1

   * -
     - Cells
     - Regions
     - Availability zones
     - Host aggregates
   * - **Use when you need**
     - A single :term:`API endpoint` for compute, or you require a second
       level of scheduling.
     - Discrete regions with separate API endpoints and no coordination
       between regions.
     - Logical separation within your nova deployment for physical isolation
       or redundancy.
     - To schedule a group of hosts with common features.
   * - **Example**
     - A cloud with multiple sites where you can schedule VMs "anywhere" or on
       a particular site.
     - A cloud with multiple sites, where you schedule VMs to a particular
       site and you want a shared infrastructure.
     - A single-site cloud with equipment fed by separate power supplies.
     - Scheduling to hosts with trusted hardware support.
   * - **Overhead**
     - Considered experimental. A new service, nova-cells. Each cell has a full
       nova installation except nova-api.
     - A different API endpoint for every region. Each region has a full nova
       installation.
     - Configuration changes to ``nova.conf``.
     - Configuration changes to ``nova.conf``.
   * - **Shared services**
     - Keystone, ``nova-api``
     - Keystone
     - Keystone, all nova services
     - Keystone, all nova services

Cells and Regions
-----------------

OpenStack Compute cells are designed to allow running the cloud in a
distributed fashion without having to use more complicated technologies,
or be invasive to existing nova installations. Hosts in a cloud are
partitioned into groups called *cells*. Cells are configured in a tree.
The top-level cell ("API cell") has a host that runs the ``nova-api``
service, but no ``nova-compute`` services. Each child cell runs all of
the other typical ``nova-*`` services found in a regular installation,
except for the ``nova-api`` service. Each cell has its own message queue
and database service and also runs ``nova-cells``, which manages the
communication between the API cell and child cells.

This allows a single API server to be used to control access to
multiple cloud installations. Introducing a second level of scheduling
(the cell selection), in addition to the regular ``nova-scheduler``
selection of hosts, provides greater flexibility to control where
virtual machines are run.

Unlike having a single API endpoint, regions have a separate API
endpoint per installation, allowing for a more discrete separation.
Users wanting to run instances across sites have to explicitly select a
region. However, the additional complexity of running a new service is
not required.

The OpenStack dashboard (horizon) can be configured to use multiple
regions. This can be configured through the ``AVAILABLE_REGIONS``
parameter.
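
For example, a two-region dashboard could carry a setting along these
lines in horizon's ``local_settings.py``. The endpoint URLs and region
names here are placeholders, not values from a real deployment:

```python
# Each entry pairs a Keystone endpoint URL with the region name shown
# in the dashboard's region selector.
AVAILABLE_REGIONS = [
    ("http://region-one.example.com:5000/v2.0", "RegionOne"),
    ("http://region-two.example.com:5000/v2.0", "RegionTwo"),
]
```

Users logging in to the dashboard then pick the region they want to
work in before authenticating against its Keystone endpoint.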
|
||||||
|
|
||||||
|

Availability Zones and Host Aggregates
--------------------------------------

You can use availability zones, host aggregates, or both to partition a
nova deployment. Availability zones are implemented through, and
configured in a similar way to, host aggregates. However, you use them
for different reasons.

Availability zone
~~~~~~~~~~~~~~~~~

An availability zone enables you to arrange OpenStack compute hosts
into logical groups and provides a form of physical isolation and
redundancy from other availability zones, such as by using a separate
power supply or network equipment.

You define the availability zone in which a specified compute host
resides locally on each server. An availability zone is commonly used to
identify a set of servers that have a common attribute. For instance, if
some of the racks in your data center are on a separate power source,
you can put servers in those racks in their own availability zone.
Availability zones can also help separate different classes of hardware.

When users provision resources, they can specify from which availability
zone they want their instance to be built. This allows cloud consumers
to ensure that their application resources are spread across disparate
machines to achieve high availability in the event of hardware failure.
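
For example, a user can request a specific availability zone at boot
time. In this sketch, the zone name ``az-rack1``, the flavor, and the
image UUID are placeholders:

```console
$ nova boot --availability-zone az-rack1 --image <image-uuid> \
  --flavor m1.small my-instance
```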

Host aggregates
~~~~~~~~~~~~~~~

Host aggregates enable you to partition OpenStack Compute deployments
into logical groups for load balancing and instance distribution. You
can use host aggregates to further partition an availability zone. For
example, you might use host aggregates to partition an availability
zone into groups of hosts that either share common resources, such as
storage and network, or have a special property, such as trusted
computing hardware.

A common use of host aggregates is to provide information for use with
the ``nova-scheduler``. For example, you might use a host aggregate to
group a set of hosts that share specific flavors or images.

The general case for this is setting key-value pairs in the aggregate
metadata and matching key-value pairs in the flavor's ``extra_specs``
metadata. The ``AggregateInstanceExtraSpecsFilter`` in the filter
scheduler will enforce that instances be scheduled only on hosts in
aggregates that define the same key to the same value.

An advanced use of this general concept allows different flavor types to
run with different CPU and RAM allocation ratios so that high-intensity
computing loads and low-intensity development and testing systems can
share the same cloud without either starving the high-use systems or
wasting resources on low-utilization systems. This works by setting
``metadata`` in your host aggregates and matching ``extra_specs`` in
your flavor types.

The first step is setting the aggregate metadata keys
``cpu_allocation_ratio`` and ``ram_allocation_ratio`` to a
floating-point value. The filter schedulers ``AggregateCoreFilter`` and
``AggregateRamFilter`` will use those values rather than the global
defaults in ``nova.conf`` when scheduling to hosts in the aggregate. It
is important to be cautious when using this feature, since each host can
be in multiple aggregates but should have only one allocation ratio for
each resource. It is up to you to avoid putting a host in multiple
aggregates that define different values for the same resource.

This is the first half of the equation. To get flavor types that are
guaranteed a particular ratio, you must set the ``extra_specs`` in the
flavor type to the key-value pair you want to match in the aggregate.
For example, if you define ``extra_specs`` ``cpu_allocation_ratio`` to
"1.0", then instances of that type will run in aggregates only where the
metadata key ``cpu_allocation_ratio`` is also defined as "1.0". In
practice, it is better to define an additional key-value pair in the
aggregate metadata to match on rather than match directly on
``cpu_allocation_ratio`` or ``core_allocation_ratio``. This allows
better abstraction. For example, by defining a key ``overcommit`` and
setting a value of "high", "medium", or "low", you could then tune the
numeric allocation ratios in the aggregates without also needing to
change all flavor types relating to them.
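
A minimal sketch of this pattern, assuming a hypothetical aggregate
named ``low-overcommit`` and a hypothetical flavor ``m1.guaranteed``:

```console
$ nova aggregate-create low-overcommit
$ nova aggregate-set-metadata low-overcommit overcommit=low
$ nova flavor-key m1.guaranteed set aggregate_instance_extra_specs:overcommit=low
```

With ``AggregateInstanceExtraSpecsFilter`` enabled, instances of
``m1.guaranteed`` would then land only on hosts in aggregates whose
metadata defines ``overcommit=low``.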

.. note::

   Previously, all services had an availability zone. Currently, only
   the ``nova-compute`` service has its own availability zone. Services
   such as ``nova-scheduler``, ``nova-network``, and ``nova-conductor``
   have always spanned all availability zones.

   When you run any of the following operations, the services appear in
   their own internal availability zone
   (``CONF.internal_service_availability_zone``):

   - :command:`nova host-list` (os-hosts)

   - :command:`euca-describe-availability-zones verbose`

   - :command:`nova service-list`

   The internal availability zone is hidden in
   :command:`euca-describe-availability-zones` (nonverbose).

   ``CONF.node_availability_zone`` has been renamed to
   ``CONF.default_availability_zone`` and is used only by the
   ``nova-api`` and ``nova-scheduler`` services.

   ``CONF.node_availability_zone`` still works but is deprecated.

Scalable Hardware
~~~~~~~~~~~~~~~~~

While several resources already exist to help with deploying and
installing OpenStack, it's very important to make sure that you have
your deployment planned out ahead of time. This guide presumes that you
have at least set aside a rack for the OpenStack cloud but also offers
suggestions for when and what to scale.

Hardware Procurement
--------------------

“The Cloud” has been described as a volatile environment where servers
can be created and terminated at will. While this may be true, it does
not mean that your servers must be volatile. Ensuring that your cloud's
hardware is stable and configured correctly means that your cloud
environment remains up and running. Basically, put effort into creating
a stable hardware environment so that you can host a cloud that users
may treat as unstable and volatile.

OpenStack can be deployed on any hardware supported by an
OpenStack-compatible Linux distribution. Hardware does not have to be
consistent, but it should at least have the same type of CPU to support
instance migration.

The typical hardware recommended for use with OpenStack is the standard
value-for-money offerings that most hardware vendors stock. It should be
straightforward to divide your procurement into building blocks such as
"compute," "object storage," and "cloud controller," and request as many
of these as you need. Alternatively, should you be unable to spend more,
if you have existing servers—provided they meet your performance
requirements and virtualization technology—they are quite likely to be
able to support OpenStack.

Capacity Planning
-----------------

OpenStack is designed to increase in size in a straightforward manner.
Taking into account the considerations that we've mentioned in this
chapter—particularly on the sizing of the cloud controller—it should be
possible to procure additional compute or object storage nodes as
needed. New nodes do not need to be the same specification, or even
vendor, as existing nodes.

For compute nodes, ``nova-scheduler`` will take care of differences in
sizing having to do with core count and RAM amounts; however, you should
consider that the user experience changes with differing CPU speeds.
When adding object storage nodes, a weight should be specified that
reflects the capability of the node.

Monitoring the resource usage and user growth will enable you to know
when to procure. :doc:`ops_logging_monitoring` details some useful metrics.

Burn-in Testing
---------------

The chances of failure for a server's hardware are high at the start
and the end of its life. As a result, dealing with hardware failures
while in production can be avoided by appropriate burn-in testing to
attempt to trigger the early-stage failures. The general principle is to
stress the hardware to its limits. Examples of burn-in tests include
running a CPU or disk benchmark for several days.

=================
Storage Decisions
=================

Storage is found in many parts of the OpenStack stack, and the differing
types can cause confusion to even experienced cloud engineers. This
section focuses on persistent storage options you can configure with
your cloud. It's important to understand the distinction between
:term:`ephemeral <ephemeral volume>` storage and
:term:`persistent <persistent volume>` storage.

Ephemeral Storage
~~~~~~~~~~~~~~~~~

If you deploy only the OpenStack :term:`Compute service` (nova), your users do
not have access to any form of persistent storage by default. The disks
associated with VMs are "ephemeral," meaning that (from the user's point
of view) they effectively disappear when a virtual machine is
terminated.

Persistent Storage
~~~~~~~~~~~~~~~~~~

Persistent storage means that the storage resource outlives any other
resource and is always available, regardless of the state of a running
instance.

Today, OpenStack clouds explicitly support three types of persistent
storage: *object storage*, *block storage*, and *file system storage*.

Object Storage
--------------

With object storage, users access binary objects through a REST API. You
may be familiar with Amazon S3, which is a well-known example of an
object storage system. Object storage is implemented in OpenStack by the
OpenStack Object Storage (swift) project. If your intended users need to
archive or manage large datasets, you want to provide them with object
storage. In addition, OpenStack can store your virtual machine (VM)
images inside of an object storage system, as an alternative to storing
the images on a file system.

OpenStack Object Storage provides a highly scalable, highly available
storage solution by relaxing some of the constraints of traditional file
systems. In designing and procuring for such a cluster, it is important
to understand some key concepts about its operation. Essentially, this
type of storage is built on the idea that all storage hardware fails, at
every level, at some point. Infrequently encountered failures that would
hamstring other storage systems, such as issues taking down RAID cards
or entire servers, are handled gracefully with OpenStack Object
Storage.

A good document describing the Object Storage architecture is found
within the `developer
documentation <http://docs.openstack.org/developer/swift/overview_architecture.html>`_
— read this first. Once you understand the architecture, you should know what a
proxy server does and how zones work. However, some important points are
often missed at first glance.

When designing your cluster, you must consider durability and
availability. Understand that the predominant source of these is the
spread and placement of your data, rather than the reliability of the
hardware. Consider the default value of the number of replicas, which is
three. This means that before an object is marked as having been
written, at least two copies exist—in case a single server fails to
write, the third copy may or may not yet exist when the write operation
initially returns. Altering this number increases the robustness of your
data, but reduces the amount of storage you have available. Next, look
at the placement of your servers. Consider spreading them widely
throughout your data center's network and power-failure zones. Is a zone
a rack, a server, or a disk?

Object Storage's network patterns might seem unfamiliar at first.
Consider these main traffic flows:

- Among :term:`object`, :term:`container`, and
  :term:`account servers <account server>`

- Between those servers and the proxies

- Between the proxies and your users

Object Storage is very "chatty" among servers hosting data—even a small
cluster does megabytes/second of traffic, which is predominantly, “Do
you have the object?”/“Yes I have the object!” Of course, if the answer
to the aforementioned question is negative or the request times out,
replication of the object begins.

Consider the scenario where an entire server fails and 24 TB of data
needs to be transferred "immediately" to remain at three copies—this can
put significant load on the network.

Another fact that's often forgotten is that when a new file is being
uploaded, the proxy server must write out as many streams as there are
replicas, multiplying the network traffic. For a three-replica cluster,
10 Gbps in means 30 Gbps out. Combining this with the bandwidth demands
of replication is what results in the recommendation that your private
network be of significantly higher bandwidth than your public network
need be. Oh, and OpenStack Object Storage communicates internally with
unencrypted, unauthenticated rsync for performance—you do want the
private network to be private.
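
The proxy fan-out above can be sketched as a back-of-the-envelope
calculation (the figures are illustrative, not a sizing rule):

```python
def proxy_egress_gbps(ingress_gbps, replicas):
    """Each uploaded byte is written out once per replica, so proxy
    egress toward the storage nodes scales linearly with replica count."""
    return ingress_gbps * replicas

# A default three-replica cluster turns 10 Gbps of client uploads into
# 30 Gbps of internal traffic, before any replication repair is counted.
print(proxy_egress_gbps(10, 3))
```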

The remaining point on bandwidth is the public-facing portion. The
``swift-proxy`` service is stateless, which means that you can easily
add more and use HTTP load-balancing methods to share bandwidth and
availability between them. More proxies mean more bandwidth, if your
storage can keep up.

Block Storage
-------------

Block storage (sometimes referred to as volume storage) provides users
with access to block-storage devices. Users interact with block storage
by attaching volumes to their running VM instances.

These volumes are persistent: they can be detached from one instance and
re-attached to another, and the data remains intact. Block storage is
implemented in OpenStack by the OpenStack Block Storage (cinder)
project, which supports multiple back ends in the form of drivers. Your
choice of a storage back end must be supported by a Block Storage
driver.
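
The typical volume lifecycle looks like the following sketch (the
volume name, sizes, and UUIDs are placeholders):

```console
$ cinder create --display-name my-volume 10
$ nova volume-attach <instance-uuid> <volume-uuid> /dev/vdc
$ nova volume-detach <instance-uuid> <volume-uuid>
```

The same volume can then be attached to a different instance with its
data intact.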

Most block storage drivers allow the instance to have direct access to
the underlying storage hardware's block device. This helps increase the
overall read/write I/O performance. However, support for utilizing files
as volumes is also well established, with full support for NFS,
GlusterFS, and others.

These drivers work a little differently than a traditional "block"
storage driver. On an NFS or GlusterFS file system, a single file is
created and then mapped as a "virtual" volume into the instance. This
mapping/translation is similar to how OpenStack utilizes QEMU's
file-based virtual machines stored in ``/var/lib/nova/instances``.

Shared File Systems Service
---------------------------

The Shared File Systems service provides a set of services for managing
shared file systems in a multi-tenant cloud environment. Users interact
with the Shared File Systems service by mounting remote file systems on
their instances and then using those systems for file storage and
exchange. The Shared File Systems service provides you with shares. A
share is a remote, mountable file system. You can mount a share to, and
access a share from, several hosts by several users at a time. With
shares, a user can also:

- Create a share, specifying its size, shared file system protocol, and
  visibility level.

- Create a share on either a share server or standalone, depending on
  the selected back-end mode, with or without using a share network.

- Specify access rules and security services for existing shares.

- Combine several shares in groups to keep data consistency inside the
  groups for the following safe group operations.

- Create a snapshot of a selected share or a share group for storing
  the existing shares consistently or creating new shares from that
  snapshot in a consistent way.

- Create a share from a snapshot.

- Set rate limits and quotas for specific shares and snapshots.

- View usage of share resources.

- Remove shares.

Like Block Storage, the Shared File Systems service is persistent. It
can be:

- Mounted to any number of client machines.

- Detached from one instance and attached to another without data loss.
  During this process the data are safe unless the Shared File Systems
  service itself is changed or removed.

Shares are provided by the Shared File Systems service. In OpenStack,
the Shared File Systems service is implemented by the Shared File
Systems (manila) project, which supports multiple back ends in the form
of drivers. The Shared File Systems service can be configured to
provision shares from one or more back ends. Share servers are, mostly,
virtual machines that export file shares via different protocols such as
NFS, CIFS, GlusterFS, or HDFS.
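
For instance, creating a 1 GB NFS share and granting a client access
might look like the following sketch (the share name and client IP are
placeholders):

```console
$ manila create NFS 1 --name my-share
$ manila access-allow my-share ip 10.0.0.10
```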

OpenStack Storage Concepts
~~~~~~~~~~~~~~~~~~~~~~~~~~

The table below explains the different storage concepts provided by OpenStack.

.. list-table:: OpenStack storage
   :widths: 20 20 20 20 20
   :header-rows: 1

   * -
     - Ephemeral storage
     - Block storage
     - Object storage
     - Shared File System storage
   * - Used to…
     - Run operating system and scratch space
     - Add additional persistent storage to a virtual machine (VM)
     - Store data, including VM images
     - Add additional persistent storage to a virtual machine
   * - Accessed through…
     - A file system
     - A block device that can be partitioned, formatted, and mounted
       (such as /dev/vdc)
     - The REST API
     - A Shared File Systems service share (either manila managed or an
       external one registered in manila) that can be partitioned,
       formatted, and mounted (such as /dev/vdc)
   * - Accessible from…
     - Within a VM
     - Within a VM
     - Anywhere
     - Within a VM
   * - Managed by…
     - OpenStack Compute (nova)
     - OpenStack Block Storage (cinder)
     - OpenStack Object Storage (swift)
     - OpenStack Shared File System Storage (manila)
   * - Persists until…
     - VM is terminated
     - Deleted by user
     - Deleted by user
     - Deleted by user
   * - Sizing determined by…
     - Administrator configuration of size settings, known as *flavors*
     - User specification in initial request
     - Amount of available physical storage
     - * User specification in initial request
       * Requests for extension
       * Available user-level quotas
       * Limitations applied by Administrator
   * - Encryption set by…
     - Parameter in ``nova.conf``
     - Admin establishing an `encrypted volume type
       <http://docs.openstack.org/admin-guide/dashboard_manage_volumes.html>`_,
       then user selecting an encrypted volume
     - Not yet available
     - Shared File Systems service does not apply any additional
       encryption above what the share's back-end storage provides
   * - Example of typical usage…
     - 10 GB first disk, 30 GB second disk
     - 1 TB disk
     - 10s of TBs of dataset storage
     - Depends completely on the size of back-end storage specified when
       a share was being created. In case of thin provisioning it can be
       partial space reservation (for more details see the
       `Capabilities and Extra-Specs
       <http://docs.openstack.org/developer/manila/devref/capabilities_and_extra_specs.html?highlight=extra%20specs#common-capabilities>`_
       specification)

With file-level storage, users access stored data using the operating
system's file system interface. Most users, if they have used a network
storage solution before, have encountered this form of networked
storage. In the Unix world, the most common form of this is NFS. In the
Windows world, the most common form is called CIFS (previously,
SMB).

OpenStack clouds do not present file-level storage to end users.
However, it is important to consider file-level storage for storing
instances under ``/var/lib/nova/instances`` when designing your cloud,
since you must have a shared file system if you want to support live
migration.

Choosing Storage Back Ends
~~~~~~~~~~~~~~~~~~~~~~~~~~

Users will indicate different needs for their cloud use cases. Some may
need fast access to many objects that do not change often, or want to
set a time-to-live (TTL) value on a file. Others may access only storage
that is mounted with the file system itself, but want it to be
replicated instantly when starting a new instance. For other systems,
ephemeral storage—storage that is released when a VM attached to it is
shut down—is the preferred way. When you select
:term:`storage back ends <storage back end>`,
ask the following questions on behalf of your users:

- Do my users need block storage?

- Do my users need object storage?

- Do I need to support live migration?

- Should my persistent storage drives be contained in my compute nodes,
  or should I use external storage?

- What is the platter count I can achieve? Do more spindles result in
  better I/O despite network access?

- Which one results in the best cost-performance scenario I'm aiming
  for?

- How do I manage the storage operationally?

- How redundant and distributed is the storage? What happens if a
  storage node fails? To what extent can it mitigate my data-loss
  disaster scenarios?

To deploy your storage by using only commodity hardware, you can use a
number of open-source packages, as shown in the following table.

.. list-table:: Persistent file-based storage support
   :widths: 25 25 25 25
   :header-rows: 1

   * -
     - Object
     - Block
     - File-level
   * - Swift
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     -
     -
   * - LVM
     -
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     -
   * - Ceph
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - Experimental
   * - Gluster
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
   * - NFS
     -
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
   * - ZFS
     -
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     -
   * - Sheepdog
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     -

.. note::

   This list of open source file-level shared storage solutions is not
   exhaustive; other open source solutions exist (MooseFS). Your
   organization may already have deployed a file-level shared storage
   solution that you can use.

**Storage Driver Support**

In addition to the open source technologies, there are a number of
proprietary solutions that are officially supported by OpenStack Block
Storage. They are offered by the following vendors:

- IBM (Storwize family/SVC, XIV)

- NetApp

- Nexenta

- SolidFire

You can find a matrix of the functionality provided by all of the
supported Block Storage drivers on the `OpenStack
wiki <https://wiki.openstack.org/wiki/CinderSupportMatrix>`_.

Also, you need to decide whether you want to support object storage in
your cloud. The two common use cases for providing object storage in a
compute cloud are:

- To provide users with a persistent storage mechanism

- As a scalable, reliable data store for virtual machine images

Commodity Storage Back-end Technologies
---------------------------------------

This section provides a high-level overview of the differences among the
different commodity storage back-end technologies. Depending on your
cloud users' needs, you can implement one or many of these technologies
in different combinations:

OpenStack Object Storage (swift)
    The official OpenStack Object Store implementation. It is a mature
    technology that has been used for several years in production by
    Rackspace as the technology behind Rackspace Cloud Files. As it is
    highly scalable, it is well suited to managing petabytes of storage.
    OpenStack Object Storage's advantages are better integration with
    OpenStack (it integrates with OpenStack Identity and works with the
    OpenStack dashboard interface) and better support for multiple
    data-center deployment through support of asynchronous eventual
    consistency replication.

    Therefore, if you eventually plan on distributing your storage
    cluster across multiple data centers, if you need unified accounts
    for your users for both compute and object storage, or if you want
    to control your object storage with the OpenStack dashboard, you
    should consider OpenStack Object Storage. More detail can be found
    about OpenStack Object Storage in the section below.

Ceph
    A scalable storage solution that replicates data across commodity
    storage nodes. Ceph was originally developed by one of the founders
    of DreamHost and is currently used in production there.

    Ceph was designed to expose different types of storage interfaces to
    the end user: it supports object storage, block storage, and
    file-system interfaces, although the file-system interface is not
    yet considered production-ready. Ceph supports the same API as swift
    for object storage, can be used as a back end for cinder block
    storage, and can serve as back-end storage for glance images. Ceph
    supports "thin provisioning," implemented using copy-on-write.

    This can be useful when booting from volume because a new volume can
    be provisioned very quickly. Ceph also supports keystone-based
    authentication (as of version 0.56), so it can be a seamless swap in
    for the default OpenStack swift implementation.

    Ceph's advantages are that it gives the administrator more
    fine-grained control over data distribution and replication
    strategies, enables you to consolidate your object and block
    storage, enables very fast provisioning of boot-from-volume
    instances using thin provisioning, and supports a distributed
    file-system interface, though this interface is `not yet
    recommended <http://ceph.com/docs/master/cephfs/>`_ for use in
    production deployment by the Ceph project.

    If you want to manage your object and block storage within a single
    system, or if you want to support fast boot-from-volume, you should
    consider Ceph.
Gluster
|
||||||
|
A distributed, shared file system. As of Gluster version 3.3, you
|
||||||
|
can use Gluster to consolidate your object storage and file storage
|
||||||
|
into one unified file and object storage solution, which is called
|
||||||
|
Gluster For OpenStack (GFO). GFO uses a customized version of swift
|
||||||
|
that enables Gluster to be used as the back-end storage.
|
||||||
|
|
||||||
|
The main reason to use GFO rather than regular swift is if you also
|
||||||
|
want to support a distributed file system, either to support shared
|
||||||
|
storage live migration or to provide it as a separate service to
|
||||||
|
your end users. If you want to manage your object and file storage
|
||||||
|
within a single system, you should consider GFO.
|
||||||
|
LVM
   The Logical Volume Manager is a Linux-based system that provides an
   abstraction layer on top of physical disks to expose logical volumes
   to the operating system. The LVM back end implements block storage
   as LVM logical partitions.

   On each host that will house block storage, an administrator must
   initially create a volume group dedicated to Block Storage volumes.
   Blocks are created from LVM logical volumes.

   .. note::

      LVM does *not* provide any replication. Typically,
      administrators configure RAID on nodes that use LVM as block
      storage to protect against failures of individual hard drives.
      However, RAID does not protect against a failure of the entire
      host.

ZFS
   The Solaris iSCSI driver for OpenStack Block Storage implements
   blocks as ZFS entities. ZFS is a file system that also has the
   functionality of a volume manager. This is unlike on a Linux system,
   where there is a separation of volume manager (LVM) and file system
   (such as ext3, ext4, xfs, and btrfs). ZFS has a number of
   advantages over ext4, including improved data-integrity checking.

   The ZFS back end for OpenStack Block Storage supports only
   Solaris-based systems, such as Illumos. While there is a Linux port
   of ZFS, it is not included in any of the standard Linux
   distributions, and it has not been tested with OpenStack Block
   Storage. As with LVM, ZFS does not provide replication across hosts
   on its own; you need to add a replication solution on top of ZFS if
   your cloud needs to be able to handle storage-node failures.

   We don't recommend ZFS unless you have previous experience with
   deploying it, since the ZFS back end for Block Storage requires a
   Solaris-based operating system, and we assume that your experience
   is primarily with Linux-based systems.

Sheepdog
   Sheepdog is a userspace distributed storage system. Sheepdog scales
   to several hundred nodes, and has powerful virtual disk management
   features such as snapshot, cloning, rollback, and thin provisioning.

   It is essentially an object storage system that manages disks and
   aggregates their space and performance linearly at hyper scale on
   commodity hardware. On top of its object store, Sheepdog provides an
   elastic volume service and an HTTP service. Sheepdog does not assume
   anything about the kernel version and can work nicely with
   xattr-supported file systems.
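As a conceptual aside, the copy-on-write technique behind Ceph's thin provisioning (mentioned above) can be sketched in a few lines of Python. This is a toy illustration only, not Ceph code: cloning a volume is instant because no block data is copied, and blocks are shared with the parent until one side writes.

```python
class CowVolume:
    """Toy copy-on-write volume: blocks are shared until written."""

    def __init__(self, blocks=None):
        self._parent = None
        self._blocks = blocks if blocks is not None else {}

    def clone(self):
        # Cloning is O(1): no block data is copied up front.
        child = CowVolume()
        child._parent = self
        return child

    def read(self, index):
        if index in self._blocks:
            return self._blocks[index]
        if self._parent is not None:
            return self._parent.read(index)
        return b"\x00"  # unwritten blocks read as zeros

    def write(self, index, data):
        # Writes land in this volume's private block map only.
        self._blocks[index] = data


base = CowVolume({0: b"base"})
snap = base.clone()       # instant: shares block 0 with its parent
snap.write(1, b"new")     # private to the clone; the parent is untouched
assert snap.read(0) == b"base"
assert base.read(1) == b"\x00"
```

This is why boot-from-volume can be fast on such a back end: provisioning a new volume from an image is a metadata operation, not a full copy.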

Conclusion
~~~~~~~~~~

We hope that you now have some considerations in mind and questions to
ask your future cloud users about their storage use cases. As you can
see, your storage decisions will also influence your network design for
performance and security needs. Continue with us to make more informed
decisions about your OpenStack cloud design.

@ -0,0 +1,52 @@
============
Architecture
============

Designing an OpenStack cloud is a great achievement. It requires a
robust understanding of the requirements and needs of the cloud's users
to determine the best possible configuration to meet them. OpenStack
provides a great deal of flexibility to achieve your needs, and this
part of the book aims to shine light on many of the decisions you need
to make during the process.

To design, deploy, and configure OpenStack, administrators must
understand the logical architecture. A diagram can help you envision all
the integrated services within OpenStack and how they interact with each
other.

OpenStack modules are one of the following types:

Daemon
   Runs as a background process. On Linux platforms, a daemon is usually
   installed as a service.

Script
   Installs a virtual environment and runs tests.

Command-line interface (CLI)
   Enables users to submit API calls to OpenStack services through commands.

As shown, end users can interact through the dashboard, CLIs, and APIs.
All services authenticate through a common Identity service, and
individual services interact with each other through public APIs, except
where privileged administrator commands are necessary.
:ref:`logical_architecture` shows the most common, but not the only
possible, logical architecture for an OpenStack cloud.

.. _logical_architecture:

.. figure:: figures/osog_0001.png
   :width: 100%

   Figure. OpenStack Logical Architecture

.. toctree::
   :maxdepth: 2

   arch_examples.rst
   arch_provision.rst
   arch_cloud_controller.rst
   arch_compute_nodes.rst
   arch_scaling.rst
   arch_storage.rst
   arch_network_design.rst

@ -0,0 +1 @@
../../common
@ -0,0 +1,290 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.

import os
# import sys

import openstackdocstheme

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
# sys.path.insert(0, os.path.abspath('.'))

# -- General configuration ------------------------------------------------

# If your documentation needs a minimal Sphinx version, state it here.
# needs_sphinx = '1.0'

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = []

# Add any paths that contain templates here, relative to this directory.
# templates_path = ['_templates']

# The suffix of source filenames.
source_suffix = '.rst'

# The encoding of source files.
# source_encoding = 'utf-8-sig'

# The master toctree document.
master_doc = 'index'

# General information about the project.
project = u'Operations Guide'
bug_tag = u'ops-guide'
copyright = u'2016, OpenStack contributors'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = '0.0.1'
# The full version, including alpha/beta/rc tags.
release = '0.0.1'

# A few variables have to be set for the log-a-bug feature.
# giturl: The location of conf.py on Git. Must be set manually.
# gitsha: The SHA of the most recent commit. Automatically extracted
#         from the git log.
# bug_tag: Tag for categorizing the bug. Must be set manually.
# These variables are passed to the logabug code via html_context.
giturl = u'http://git.openstack.org/cgit/openstack/openstack-manuals/tree/doc/ops-guide/source'
git_cmd = "/usr/bin/git log | head -n1 | cut -f2 -d' '"
gitsha = os.popen(git_cmd).read().strip('\n')
html_context = {"gitsha": gitsha, "bug_tag": bug_tag,
                "giturl": giturl}

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
# language = None

# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
# today = ''
# Else, today_fmt is used as the format for a strftime call.
# today_fmt = '%B %d, %Y'

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['common/cli*', 'common/nova*',
                    'common/get_started*', 'common/dashboard*']

# The reST default role (used for this markup: `text`) to use for all
# documents.
# default_role = None

# If true, '()' will be appended to :func: etc. cross-reference text.
# add_function_parentheses = True

# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
# add_module_names = True

# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
# show_authors = False

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'

# A list of ignored prefixes for module index sorting.
# modindex_common_prefix = []

# If true, keep warnings as "system message" paragraphs in the built documents.
# keep_warnings = False


# -- Options for HTML output ----------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'openstackdocs'

# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
# html_theme_options = {}

# Add any paths that contain custom themes here, relative to this directory.
html_theme_path = [openstackdocstheme.get_html_theme_path()]

# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
# html_title = None

# A shorter title for the navigation bar. Default is the same as html_title.
# html_short_title = None

# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
# html_logo = None

# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
# html_favicon = None

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
# html_static_path = []

# Add any extra paths that contain custom files (such as robots.txt or
# .htaccess) here, relative to this directory. These files are copied
# directly to the root of the documentation.
# html_extra_path = []

# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
# So that we can enable "log-a-bug" links from each output HTML page, this
# variable must be set to a format that includes year, month, day, hours and
# minutes.
html_last_updated_fmt = '%Y-%m-%d %H:%M'

# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
# html_use_smartypants = True

# Custom sidebar templates, maps document names to template names.
# html_sidebars = {}

# Additional templates that should be rendered to pages, maps page names to
# template names.
# html_additional_pages = {}

# If false, no module index is generated.
# html_domain_indices = True

# If false, no index is generated.
html_use_index = False

# If true, the index is split into individual pages for each letter.
# html_split_index = False

# If true, links to the reST sources are added to the pages.
html_show_sourcelink = False

# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
# html_show_sphinx = True

# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
# html_show_copyright = True

# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
# html_use_opensearch = ''

# This is the file name suffix for HTML files (e.g. ".xhtml").
# html_file_suffix = None

# Output file base name for HTML help builder.
htmlhelp_basename = 'ops-guide'

# If true, publish source files
html_copy_source = False

# -- Options for LaTeX output ---------------------------------------------

latex_elements = {
    # The paper size ('letterpaper' or 'a4paper').
    # 'papersize': 'letterpaper',

    # The font size ('10pt', '11pt' or '12pt').
    # 'pointsize': '10pt',

    # Additional stuff for the LaTeX preamble.
    # 'preamble': '',
}

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
    ('index', 'OpsGuide.tex', u'Operations Guide',
     u'OpenStack contributors', 'manual'),
]

# The name of an image file (relative to this directory) to place at the top of
# the title page.
# latex_logo = None

# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
# latex_use_parts = False

# If true, show page references after internal links.
# latex_show_pagerefs = False

# If true, show URL addresses after external links.
# latex_show_urls = False

# Documents to append as an appendix to all manuals.
# latex_appendices = []

# If false, no module index is generated.
# latex_domain_indices = True


# -- Options for manual page output ---------------------------------------

# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
    ('index', 'opsguide', u'Operations Guide',
     [u'OpenStack contributors'], 1)
]

# If true, show URL addresses after external links.
# man_show_urls = False


# -- Options for Texinfo output -------------------------------------------

# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
    ('index', 'OpsGuide', u'Operations Guide',
     u'OpenStack contributors', 'OpsGuide',
     'This book provides information about designing and operating '
     'OpenStack clouds.', 'Miscellaneous'),
]

# Documents to append as an appendix to all manuals.
# texinfo_appendices = []

# If false, no module index is generated.
# texinfo_domain_indices = True

# How to display URL addresses: 'footnote', 'no', or 'inline'.
# texinfo_show_urls = 'footnote'

# If true, do not generate a @detailmenu in the "Top" node's menu.
# texinfo_no_detailmenu = False

# -- Options for Internationalization output ------------------------------
locale_dirs = ['locale/']
@ -0,0 +1,60 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:cc="http://web.resource.org/cc/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:svg="http://www.w3.org/2000/svg"
   xmlns="http://www.w3.org/2000/svg"
   xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
   xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
   width="19.21315"
   height="18.294994"
   id="svg2"
   sodipodi:version="0.32"
   inkscape:version="0.45"
   sodipodi:modified="true"
   version="1.0">
  <defs
     id="defs4" />
  <sodipodi:namedview
     id="base"
     pagecolor="#ffffff"
     bordercolor="#666666"
     borderopacity="1.0"
     gridtolerance="10000"
     guidetolerance="10"
     objecttolerance="10"
     inkscape:pageopacity="0.0"
     inkscape:pageshadow="2"
     inkscape:zoom="7.9195959"
     inkscape:cx="17.757032"
     inkscape:cy="7.298821"
     inkscape:document-units="px"
     inkscape:current-layer="layer1"
     inkscape:window-width="984"
     inkscape:window-height="852"
     inkscape:window-x="148"
     inkscape:window-y="66" />
  <metadata
     id="metadata7">
    <rdf:RDF>
      <cc:Work
         rdf:about="">
        <dc:format>image/svg+xml</dc:format>
        <dc:type
           rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
      </cc:Work>
    </rdf:RDF>
  </metadata>
  <g
     inkscape:label="Layer 1"
     inkscape:groupmode="layer"
     id="layer1"
     transform="translate(-192.905,-516.02064)">
    <path
       style="fill:#000000"
       d="M 197.67968,534.31563 C 197.40468,534.31208 196.21788,532.53719 195.04234,530.37143 L 192.905,526.43368 L 193.45901,525.87968 C 193.76371,525.57497 194.58269,525.32567 195.27896,525.32567 L 196.5449,525.32567 L 197.18129,527.33076 L 197.81768,529.33584 L 202.88215,523.79451 C 205.66761,520.74678 208.88522,517.75085 210.03239,517.13691 L 212.11815,516.02064 L 207.90871,520.80282 C 205.59351,523.43302 202.45735,527.55085 200.93947,529.95355 C 199.42159,532.35625 197.95468,534.31919 197.67968,534.31563 z "
       id="path2223" />
  </g>
</svg>
@ -0,0 +1,26 @@
==========================
OpenStack Operations Guide
==========================

Abstract
~~~~~~~~

This book provides information about designing and operating OpenStack clouds.


Contents
~~~~~~~~

.. toctree::
   :maxdepth: 2

   acknowledgements.rst
   preface_ops.rst
   architecture.rst
   operations.rst
   app_usecases.rst
   app_crypt.rst
   app_roadmaps.rst
   app_resources.rst
   common/app_support.rst
   common/glossary.rst

@ -0,0 +1,41 @@
==========
Operations
==========

Congratulations! By now, you should have a solid design for your cloud.
We now recommend that you turn to the `OpenStack Installation Guides
<http://docs.openstack.org/index.html#install-guides>`_, which contain
step-by-step instructions on how to manually install the OpenStack
packages and dependencies on your cloud.

While it is important for an operator to be familiar with the steps
involved in deploying OpenStack, we also strongly encourage you to
evaluate configuration-management tools, such as :term:`Puppet` or
:term:`Chef`, which can help automate this deployment process.

In the remainder of this guide, we assume that you have successfully
deployed an OpenStack cloud and are able to perform basic operations
such as adding images, booting instances, and attaching volumes.

As your focus turns to stable operations, we recommend that you skim
the remainder of this book to get a sense of the content. Some of this
content is useful to read in advance so that you can put best practices
into effect to simplify your life in the long run. Other content is more
useful as a reference that you might turn to when an unexpected event
occurs (such as a power failure), or to troubleshoot a particular
problem.

.. toctree::
   :maxdepth: 2

   ops_lay_of_the_land.rst
   ops_projects_users.rst
   ops_user_facing_operations.rst
   ops_maintenance.rst
   ops_network_troubleshooting.rst
   ops_logging_monitoring.rst
   ops_backup_recovery.rst
   ops_customize.rst
   ops_upstream.rst
   ops_advanced_configuration.rst
   ops_upgrades.rst

@ -0,0 +1,163 @@
======================
Advanced Configuration
======================

OpenStack is intended to work well across a variety of installation
flavors, from very small private clouds to large public clouds. To
achieve this, the developers add configuration options to their code
that allow the behavior of the various components to be tweaked
depending on your needs. Unfortunately, it is not possible to cover all
possible deployments with the default configuration values.

At the time of writing, OpenStack has more than 3,000 configuration
options. You can see them documented at the
`OpenStack configuration reference
guide <http://docs.openstack.org/liberty/config-reference/content/config_overview.html>`_.
This chapter cannot hope to document all of these, but we do try to
introduce the important concepts so that you know where to go digging
for more information.

Differences Between Various Drivers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Many OpenStack projects implement a driver layer, and each of these
drivers will implement its own configuration options. For example, in
OpenStack Compute (nova), there are various hypervisor drivers
implemented—libvirt, xenserver, hyper-v, and vmware, for example. Not
all of these hypervisor drivers have the same features, and each has
different tuning requirements.

.. note::

   The currently implemented hypervisors are listed on the `OpenStack
   documentation
   website <http://docs.openstack.org/liberty/config-reference/content/section_compute-hypervisors.html>`_.
   You can see a matrix of the various features in OpenStack Compute
   (nova) hypervisor drivers in the `Hypervisor support matrix
   <http://docs.openstack.org/developer/nova/support-matrix.html>`_.

The point we are trying to make here is that just because an option
exists doesn't mean that option is relevant to your driver choices.
Normally, the documentation notes which drivers the configuration
applies to.

Implementing Periodic Tasks
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Another common concept across various OpenStack projects is that of
periodic tasks. Periodic tasks are much like cron jobs on traditional
Unix systems, but they are run inside an OpenStack process. For example,
when OpenStack Compute (nova) needs to work out what images it can
remove from its local cache, it runs a periodic task to do this.

Periodic tasks are important to understand because of limitations in the
threading model that OpenStack uses. OpenStack uses cooperative
threading in Python, which means that if something long and complicated
is running, it will block other tasks inside that process from running
unless it voluntarily yields execution to another cooperative thread.

A tangible example of this is the ``nova-compute`` process. In order to
manage the image cache with libvirt, ``nova-compute`` has a periodic
process that scans the contents of the image cache. Part of this scan is
calculating a checksum for each of the images and making sure that
checksum matches what ``nova-compute`` expects it to be. However, images
can be very large, and these checksums can take a long time to generate.
At one point, before it was reported as a bug and fixed,
``nova-compute`` would block on this task and stop responding to RPC
requests. This was visible to users as failures of operations such as
spawning or deleting instances.

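To make the blocking behavior concrete, here is a toy round-robin scheduler built on plain Python generators. This illustrates cooperative scheduling in general, not OpenStack's actual eventlet machinery: the task that performs a long computation without yielding delays every other task until it finishes.

```python
def polite_task(log):
    # Yields control back to the scheduler after each small step.
    for i in range(2):
        log.append(("polite", i))
        yield


def greedy_task(log):
    # A long computation with no yield inside: nothing else in the
    # process can run until it completes, just like an unyielding
    # periodic task.
    total = sum(range(100_000))
    log.append(("greedy", total))
    yield


def run(tasks):
    # Round-robin over the generators until all are exhausted.
    log = []
    gens = [t(log) for t in tasks]
    while gens:
        for g in list(gens):
            try:
                next(g)
            except StopIteration:
                gens.remove(g)
    return log


events = run([greedy_task, polite_task])
# The greedy task's entire computation finishes before the polite task
# gets its first turn.
assert [name for name, _ in events] == ["greedy", "polite", "polite"]
```

In real deployments the "greedy task" was the image-cache checksum scan described above; the fix was to yield periodically during long operations.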
The takeaway from this is that if you observe an OpenStack process that
appears to "stop" for a while and then continues to process normally,
you should check whether periodic tasks are the problem. One way to do
this is to disable the periodic tasks by setting their interval to zero.
Additionally, you can configure how often these periodic tasks run—in
some cases, it might make sense to run them at a different frequency
from the default.

The frequency is defined separately for each periodic task. Therefore,
to disable every periodic task in OpenStack Compute (nova), you would
need to set a number of configuration options to zero. The current list
of configuration options you would need to set to zero is:

- ``bandwidth_poll_interval``
- ``sync_power_state_interval``
- ``heal_instance_info_cache_interval``
- ``host_state_interval``
- ``image_cache_manager_interval``
- ``reclaim_instance_interval``
- ``volume_usage_poll_interval``
- ``shelved_poll_interval``
- ``shelved_offload_time``
- ``instance_delete_interval``

To set a configuration option to zero, include a line such as
``image_cache_manager_interval=0`` in your ``nova.conf`` file.

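For instance, a minimal sketch of the relevant ``nova.conf`` lines, assuming these options live in the ``[DEFAULT]`` section as they did in releases of this era (zero only the tasks you have actually identified as a problem):

```ini
[DEFAULT]
# Disable selected nova periodic tasks by zeroing their intervals
bandwidth_poll_interval = 0
sync_power_state_interval = 0
heal_instance_info_cache_interval = 0
image_cache_manager_interval = 0
```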
|
||||||
|
This list will change between releases, so please refer to your
|
||||||
|
configuration guide for up-to-date information.

Specific Configuration Topics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This section covers specific examples of configuration options you might
consider tuning. It is by no means an exhaustive list.

Security Configuration for Compute, Networking, and Storage
-----------------------------------------------------------

The `OpenStack Security Guide <http://docs.openstack.org/sec/>`_
provides a deep dive into securing an OpenStack cloud, including
SSL/TLS, key management, PKI and certificate management, data transport
and privacy concerns, and compliance.

High Availability
-----------------

The `OpenStack High Availability
Guide <http://docs.openstack.org/ha-guide/index.html>`_ offers
suggestions for elimination of a single point of failure that could
cause system downtime. While it is not a completely prescriptive
document, it offers methods and techniques for avoiding downtime and
data loss.

Enabling IPv6 Support
---------------------

You can follow the progress being made on IPv6 support by watching the
`neutron IPv6 Subteam at
work <https://wiki.openstack.org/wiki/Meetings/Neutron-IPv6-Subteam>`_.

By modifying your configuration setup, you can set up IPv6 when using
``nova-network`` for networking, and a tested setup is documented for
FlatDHCP and a multi-host configuration. The key is to make
``nova-network`` think a ``radvd`` command ran successfully. The entire
configuration is detailed in a Cybera blog post, `“An IPv6 enabled
cloud” <http://www.cybera.ca/news-and-events/tech-radar/an-ipv6-enabled-cloud/>`_.

Geographical Considerations for Object Storage
----------------------------------------------

Support for global clustering of object storage servers is available for
all supported releases. You would implement these global clusters to
ensure replication across geographic areas in case of a natural disaster
and also to ensure that users can write or access their objects more
quickly based on the closest data center. You configure a default region
with one zone for each cluster, but be sure your network (WAN) can
handle the additional request and response load between zones as you add
more zones and build a ring that handles more zones. Refer to
`Geographically Distributed
Clusters <http://docs.openstack.org/developer/swift/admin_guide.html#geographically-distributed-clusters>`_
in the documentation for additional information.
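
As a sketch of the ring configuration involved (IP addresses, ports,
device names, and weights below are placeholders, not values from this
guide), devices in each geographic cluster are added to the ring builder
with a region prefix:

.. code-block:: console

   $ swift-ring-builder object.builder add r1z1-10.0.0.1:6000/sda 100
   $ swift-ring-builder object.builder add r2z1-10.1.0.1:6000/sda 100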

===================
Backup and Recovery
===================

Standard backup best practices apply when creating your OpenStack backup
policy. For example, how often to back up your data is closely related
to how quickly you need to recover from data loss.

.. note::

   If you cannot have any data loss at all, you should also focus on a
   highly available deployment. The `OpenStack High Availability
   Guide <http://docs.openstack.org/ha-guide/index.html>`_ offers
   suggestions for elimination of a single point of failure that could
   cause system downtime. While it is not a completely prescriptive
   document, it offers methods and techniques for avoiding downtime and
   data loss.

Other backup considerations include:

- How many backups to keep?

- Should backups be kept off-site?

- How often should backups be tested?

Just as important as a backup policy is a recovery policy (or at least
recovery testing).

What to Back Up
~~~~~~~~~~~~~~~

While OpenStack is composed of many components and moving parts, backing
up the critical data is quite simple.

This chapter describes only how to back up configuration files and
databases that the various OpenStack components need to run. This
chapter does not describe how to back up objects inside Object Storage
or data contained inside Block Storage. Generally these areas are left
for users to back up on their own.

Database Backups
~~~~~~~~~~~~~~~~

The example OpenStack architecture designates the cloud controller as
the MySQL server. This MySQL server hosts the databases for nova,
glance, cinder, and keystone. With all of these databases in one place,
it's very easy to create a database backup:

.. code-block:: console

   # mysqldump --opt --all-databases > openstack.sql

If you want to back up only a single database, you can instead run:

.. code-block:: console

   # mysqldump --opt nova > nova.sql

where ``nova`` is the database you want to back up.

You can easily automate this process by creating a cron job that runs
the following script once per day:

.. code-block:: bash

   #!/bin/bash
   backup_dir="/var/lib/backups/mysql"
   filename="${backup_dir}/mysql-$(hostname)-$(date +%Y%m%d).sql.gz"
   # Dump the entire MySQL database
   /usr/bin/mysqldump --opt --all-databases | gzip > "$filename"
   # Delete backups older than 7 days
   find "$backup_dir" -ctime +7 -type f -delete

This script dumps the entire MySQL database and deletes any backups
older than seven days.
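
Since a silently corrupt dump is easy to miss, a quick integrity check
can be added. This sketch uses a scratch directory and a dummy archive so
it runs anywhere; in production, point ``backup_dir`` at
``/var/lib/backups/mysql``:

.. code-block:: bash

   #!/bin/bash
   # Illustrative integrity check for the newest backup archive.
   backup_dir=$(mktemp -d)
   echo "-- dump --" | gzip > "${backup_dir}/mysql-example-20160101.sql.gz"
   # Most recently modified backup file
   latest=$(ls -t "${backup_dir}"/*.sql.gz | head -1)
   # gzip -t exits non-zero if the archive is corrupt
   gzip -t "$latest" && echo "backup OK: $latest"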

File System Backups
~~~~~~~~~~~~~~~~~~~

This section discusses which files and directories should be backed up
regularly, organized by service.

Compute
-------

The ``/etc/nova`` directory on both the cloud controller and compute
nodes should be regularly backed up.

``/var/log/nova`` does not need to be backed up if you have all logs
going to a central area. It is highly recommended to use a central
logging server or back up the log directory.

``/var/lib/nova`` is another important directory to back up. The
exception to this is the ``/var/lib/nova/instances`` subdirectory on
compute nodes. This subdirectory contains the KVM images of running
instances. You would want to back up this directory only if you need to
maintain backup copies of all instances. Under most circumstances, you
do not need to do this, but this can vary from cloud to cloud and your
service levels. Also be aware that making a backup of a live KVM
instance can cause that instance to not boot properly if it is ever
restored from a backup.

Image Catalog and Delivery
--------------------------

``/etc/glance`` and ``/var/log/glance`` follow the same rules as their
nova counterparts.

``/var/lib/glance`` should also be backed up. Take special notice of
``/var/lib/glance/images``. If you are using a file-based back end for
glance, ``/var/lib/glance/images`` is where the images are stored and
care should be taken.

There are two ways to ensure stability with this directory. The first is
to make sure this directory is run on a RAID array. If a disk fails, the
directory is available. The second way is to use a tool such as rsync to
replicate the images to another server:

.. code-block:: console

   # rsync -az --progress /var/lib/glance/images \
     backup-server:/var/lib/glance/images/

Identity
--------

``/etc/keystone`` and ``/var/log/keystone`` follow the same rules as
other components.

``/var/lib/keystone``, although it should not contain any data being
used, can also be backed up just in case.

Block Storage
-------------

``/etc/cinder`` and ``/var/log/cinder`` follow the same rules as other
components.

``/var/lib/cinder`` should also be backed up.

Object Storage
--------------

``/etc/swift`` is very important to have backed up. This directory
contains the swift configuration files as well as the ring files and
ring :term:`builder files <builder file>`, which, if lost, render the data
on your cluster inaccessible. A best practice is to copy the builder files
to all storage nodes along with the ring files, so that multiple backup
copies are spread throughout your storage cluster.
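
For example (the host name is a placeholder), the builder and ring files
can be pushed to a storage node with a tool such as rsync:

.. code-block:: console

   # rsync -az /etc/swift/*.builder /etc/swift/*.ring.gz \
     storage-node-01:/etc/swift/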

Recovering Backups
~~~~~~~~~~~~~~~~~~

Recovering backups is a fairly simple process. To begin, first ensure
that the service you are recovering is not running. For example, to do a
full recovery of ``nova`` on the cloud controller, first stop all
``nova`` services:

.. code-block:: console

   # stop nova-api
   # stop nova-cert
   # stop nova-consoleauth
   # stop nova-novncproxy
   # stop nova-objectstore
   # stop nova-scheduler

Now you can import a previously backed-up database:

.. code-block:: console

   # mysql nova < nova.sql

You can also restore backed-up nova directories:

.. code-block:: console

   # mv /etc/nova{,.orig}
   # cp -a /path/to/backup/nova /etc/

Once the files are restored, start everything back up:

.. code-block:: console

   # start mysql
   # for i in nova-api nova-cert nova-consoleauth nova-novncproxy \
       nova-objectstore nova-scheduler
   > do
   >     start $i
   > done

Other services follow the same process, with their respective
directories and databases.

Summary
~~~~~~~

Backup and subsequent recovery is one of the first tasks system
administrators learn. However, each system has different items that need
attention. By taking care of your database, image service, and
appropriate file system locations, you can be assured that you can
handle any event requiring recovery.

=============
Customization
=============

OpenStack might not do everything you need it to do out of the box. To
add a new feature, you can follow different paths.

To take the first path, you can modify the OpenStack code directly.
Learn `how to
contribute <https://wiki.openstack.org/wiki/How_To_Contribute>`_,
follow the `code review
workflow <https://wiki.openstack.org/wiki/GerritWorkflow>`_, make your
changes, and contribute them back to the upstream OpenStack project.
This path is recommended if the feature you need requires deep
integration with an existing project. The community is always open to
contributions and welcomes new functionality that follows the
feature-development guidelines. This path still requires you to use
DevStack for testing your feature additions, so this chapter walks you
through the DevStack environment.

For the second path, you can write new features and plug them in using
changes to a configuration file. If the project where your feature would
need to reside uses the Python Paste framework, you can create
middleware for it and plug it in through configuration. There may also
be specific ways of customizing a project, such as creating a new
scheduler driver for Compute or a custom tab for the dashboard.

This chapter focuses on the second path for customizing OpenStack by
providing two examples for writing new features. The first example shows
how to modify Object Storage (swift) middleware to add a new feature,
and the second example provides a new scheduler feature for OpenStack
Compute (nova). To customize OpenStack this way, you need a development
environment. The best way to get an environment up and running quickly
is to run DevStack within your cloud.

Create an OpenStack Development Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To create a development environment, you can use DevStack. DevStack is
essentially a collection of shell scripts and configuration files that
builds an OpenStack development environment for you. You use it to
create such an environment for developing a new feature.

You can find all of the documentation at the
`DevStack <http://docs.openstack.org/developer/devstack/>`_ website.

**To run DevStack on an instance in your OpenStack cloud:**

#. Boot an instance from the dashboard or the nova command-line interface
   (CLI) with the following parameters:

   - Name: devstack

   - Image: Ubuntu 14.04 LTS

   - Memory Size: 4 GB RAM

   - Disk Size: minimum 5 GB

   If you are using the ``nova`` client, specify :option:`--flavor 3` for the
   :command:`nova boot` command to get adequate memory and disk sizes.

#. Log in and set up DevStack. Here's an example of the commands you can
   use to set up DevStack on a virtual machine:

   #. Log in to the instance:

      .. code-block:: console

         $ ssh username@my.instance.ip.address

   #. Update the virtual machine's operating system:

      .. code-block:: console

         # apt-get -y update

   #. Install git:

      .. code-block:: console

         # apt-get -y install git

   #. Clone the ``devstack`` repository:

      .. code-block:: console

         $ git clone https://git.openstack.org/openstack-dev/devstack

   #. Change to the ``devstack`` repository:

      .. code-block:: console

         $ cd devstack

#. (Optional) If you've logged in to your instance as the root user, you
   must create a "stack" user; otherwise you'll run into permission issues.
   If you've logged in as a user other than root, you can skip these steps:

   #. Run the DevStack script to create the stack user:

      .. code-block:: console

         # tools/create-stack-user.sh

   #. Give ownership of the ``devstack`` directory to the stack user:

      .. code-block:: console

         # chown -R stack:stack /root/devstack

   #. Set some permissions you can use to view the DevStack screen later:

      .. code-block:: console

         # chmod o+rwx /dev/pts/0

   #. Switch to the stack user:

      .. code-block:: console

         $ su stack

#. Edit the ``local.conf`` configuration file that controls what DevStack
   will deploy. Copy the example ``local.conf`` file at the end of this
   section (:ref:`local.conf`):

   .. code-block:: console

      $ vim local.conf

#. Run the stack script that will install OpenStack:

   .. code-block:: console

      $ ./stack.sh

#. When the stack script is done, you can open the screen session it
   started to view all of the running OpenStack services:

   .. code-block:: console

      $ screen -r stack

#. Press ``Ctrl+A`` followed by 0 to go to the first ``screen`` window.

.. note::

   - The ``stack.sh`` script takes a while to run. Perhaps you can
     take this opportunity to `join the OpenStack
     Foundation <https://www.openstack.org/join/>`__.

   - ``Screen`` is a useful program for viewing many related services
     at once. For more information, see the `GNU screen quick
     reference <http://aperiodic.net/screen/quick_reference>`__.

Now that you have an OpenStack development environment, you're free to
hack around without worrying about damaging your production deployment.
:ref:`local.conf` provides a working environment for
running OpenStack Identity, Compute, Block Storage, Image service, the
OpenStack dashboard, and Object Storage as the starting point.

.. _local.conf:

local.conf
----------

.. code-block:: bash

   [[local|localrc]]
   FLOATING_RANGE=192.168.1.224/27
   FIXED_RANGE=10.11.12.0/24
   FIXED_NETWORK_SIZE=256
   FLAT_INTERFACE=eth0
   ADMIN_PASSWORD=supersecret
   DATABASE_PASSWORD=iheartdatabases
   RABBIT_PASSWORD=flopsymopsy
   SERVICE_PASSWORD=iheartksl
   SERVICE_TOKEN=xyzpdqlazydog

Customizing Object Storage (Swift) Middleware
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OpenStack Object Storage, known as swift when reading the code, is based
on the Python `Paste <http://pythonpaste.org/>`_ framework. The best
introduction to its architecture is `A Do-It-Yourself
Framework <http://pythonpaste.org/do-it-yourself-framework.html>`_.
Because of the swift project's use of this framework, you are able to
add features to a project by placing some custom code in a project's
pipeline without having to change any of the core code.

Imagine a scenario where you have public access to one of your
containers, but what you really want is to restrict access to a
set of IPs based on a whitelist. In this example, we'll create a piece
of middleware for swift that allows access to a container from only a
set of IP addresses, as determined by the container's metadata items.
Only those IP addresses that you explicitly whitelist using the
container's metadata will be able to access the container.

.. warning::

   This example is for illustrative purposes only. It should not be
   used as a container IP whitelist solution without further
   development and extensive security testing.

When you join the screen session that ``stack.sh`` starts with
``screen -r stack``, you see a screen for each service running, which
can be a few or several, depending on how many services you configured
DevStack to run.

The asterisk (*) indicates which screen window you are viewing. This
example shows we are viewing the key (for keystone) screen window:

.. code-block:: console

   0$ shell  1$ key*  2$ horizon  3$ s-proxy  4$ s-object  5$ s-container  6$ s-account

The purpose of each screen window is as follows:

``shell``
   A shell where you can get some work done

``key*``
   The keystone service

``horizon``
   The horizon dashboard web application

``s-{name}``
   The swift services

**To create the middleware and plug it in through Paste configuration:**

All of the code for OpenStack lives in ``/opt/stack``. Go to the swift
directory in the ``shell`` screen and edit your middleware module.

#. Change to the directory where Object Storage is installed:

   .. code-block:: console

      $ cd /opt/stack/swift

#. Create the ``ip_whitelist.py`` Python source code file:

   .. code-block:: console

      $ vim swift/common/middleware/ip_whitelist.py

#. Copy the code as shown below into ``ip_whitelist.py``.
   The following code is a middleware example that
   restricts access to a container based on IP address as explained at the
   beginning of the section. Middleware passes the request on to another
   application. This example uses the swift "swob" library to wrap Web
   Server Gateway Interface (WSGI) requests and responses into objects for
   swift to interact with. When you're done, save and close the file.

   .. code-block:: python

      # vim: tabstop=4 shiftwidth=4 softtabstop=4
      # Copyright (c) 2014 OpenStack Foundation
      # All Rights Reserved.
      #
      # Licensed under the Apache License, Version 2.0 (the "License"); you may
      # not use this file except in compliance with the License. You may obtain
      # a copy of the License at
      #
      #     http://www.apache.org/licenses/LICENSE-2.0
      #
      # Unless required by applicable law or agreed to in writing, software
      # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      # License for the specific language governing permissions and limitations
      # under the License.

      import socket

      from swift.common.utils import get_logger
      from swift.proxy.controllers.base import get_container_info
      from swift.common.swob import Request, Response


      class IPWhitelistMiddleware(object):
          """
          IP Whitelist Middleware

          Middleware that allows access to a container from only a set of IP
          addresses as determined by the container's metadata items that start
          with the prefix 'allow'. E.g. allow-dev=192.168.0.20
          """

          def __init__(self, app, conf, logger=None):
              self.app = app

              if logger:
                  self.logger = logger
              else:
                  self.logger = get_logger(conf, log_route='ip_whitelist')

              self.deny_message = conf.get('deny_message', "IP Denied")
              self.local_ip = socket.gethostbyname(socket.gethostname())

          def __call__(self, env, start_response):
              """
              WSGI entry point.
              Wraps env in swob.Request object and passes it down.

              :param env: WSGI environment dictionary
              :param start_response: WSGI callable
              """
              req = Request(env)

              try:
                  version, account, container, obj = req.split_path(1, 4, True)
              except ValueError:
                  return self.app(env, start_response)

              container_info = get_container_info(
                  req.environ, self.app, swift_source='IPWhitelistMiddleware')

              remote_ip = env['REMOTE_ADDR']
              self.logger.debug("Remote IP: %(remote_ip)s",
                                {'remote_ip': remote_ip})

              meta = container_info['meta']
              allow = {k: v for k, v in meta.iteritems() if k.startswith('allow')}
              allow_ips = set(allow.values())
              allow_ips.add(self.local_ip)
              self.logger.debug("Allow IPs: %(allow_ips)s",
                                {'allow_ips': allow_ips})

              if remote_ip in allow_ips:
                  return self.app(env, start_response)
              else:
                  self.logger.debug(
                      "IP %(remote_ip)s denied access to Account=%(account)s "
                      "Container=%(container)s. Not in %(allow_ips)s", locals())
                  return Response(
                      status=403,
                      body=self.deny_message,
                      request=req)(env, start_response)


      def filter_factory(global_conf, **local_conf):
          """
          paste.deploy app factory for creating WSGI proxy apps.
          """
          conf = global_conf.copy()
          conf.update(local_conf)

          def ip_whitelist(app):
              return IPWhitelistMiddleware(app, conf)
          return ip_whitelist

   There is a lot of useful information in ``env`` and ``conf`` that you
   can use to decide what to do with the request. To find out more about
   what properties are available, you can insert the following log
   statement into the ``__init__`` method:

   .. code-block:: python

      self.logger.debug("conf = %(conf)s", locals())

   and the following log statement into the ``__call__`` method:

   .. code-block:: python

      self.logger.debug("env = %(env)s", locals())

#. To plug this middleware into the swift Paste pipeline, you edit one
   configuration file, ``/etc/swift/proxy-server.conf``:

   .. code-block:: console

      $ vim /etc/swift/proxy-server.conf

#. Find the ``[filter:ratelimit]`` section in
   ``/etc/swift/proxy-server.conf``, and copy in the following
   configuration section after it:

   .. code-block:: ini

      [filter:ip_whitelist]
      paste.filter_factory = swift.common.middleware.ip_whitelist:filter_factory
      # You can override the default log routing for this filter here:
      # set log_name = ratelimit
      # set log_facility = LOG_LOCAL0
      # set log_level = INFO
      # set log_headers = False
      # set log_address = /dev/log
      deny_message = You shall not pass!

#. Find the ``[pipeline:main]`` section in
   ``/etc/swift/proxy-server.conf``, and add ``ip_whitelist`` after
   ratelimit to the list like so. When you're done, save and close the
   file:

   .. code-block:: ini

      [pipeline:main]
      pipeline = catch_errors gatekeeper healthcheck proxy-logging cache bulk tempurl ratelimit ip_whitelist ...

#. Restart the ``swift proxy`` service to make swift use your middleware.
   Start by switching to the ``swift-proxy`` screen:

   #. Press **Ctrl+A** followed by 3.

   #. Press **Ctrl+C** to kill the service.

   #. Press Up Arrow to bring up the last command.

   #. Press Enter to run it.

#. Test your middleware with the ``swift`` CLI. Start by switching to the
   shell screen and finish by switching back to the ``swift-proxy`` screen
   to check the log output:

   #. Press **Ctrl+A** followed by 0.

   #. Make sure you're in the ``devstack`` directory:

      .. code-block:: console

         $ cd /root/devstack

   #. Source openrc to set up your environment variables for the CLI:

      .. code-block:: console

         $ source openrc

   #. Create a container called ``middleware-test``:

      .. code-block:: console

         $ swift post middleware-test

   #. Press **Ctrl+A** followed by 3 to check the log output.

   #. Among the log statements you'll see the lines:

      .. code-block:: console

         proxy-server Remote IP: my.instance.ip.address (txn: ...)
         proxy-server Allow IPs: set(['my.instance.ip.address']) (txn: ...)

      These two statements are produced by our middleware and show that the
      request was sent from our DevStack instance and was allowed.

#. Test the middleware from outside DevStack on a remote machine that has
   access to your DevStack instance:

   #. Install the ``keystone`` and ``swift`` clients on your local machine:

      .. code-block:: console

         # pip install python-keystoneclient python-swiftclient

   #. Attempt to list the objects in the ``middleware-test`` container:

      .. code-block:: console

         $ swift --os-auth-url=http://my.instance.ip.address:5000/v2.0/ \
           --os-region-name=RegionOne --os-username=demo:demo \
           --os-password=devstack list middleware-test
         Container GET failed: http://my.instance.ip.address:8080/v1/AUTH_.../
          middleware-test?format=json 403 Forbidden   You shall not pass!

   #. Press **Ctrl+A** followed by 3 to check the log output. Look at the
      swift log statements again, and among the log statements, you'll see the
      lines:

      .. code-block:: console

         proxy-server Authorizing from an overriding middleware (i.e: tempurl) (txn: ...)
         proxy-server ... IPWhitelistMiddleware
         proxy-server Remote IP: my.local.ip.address (txn: ...)
         proxy-server Allow IPs: set(['my.instance.ip.address']) (txn: ...)
         proxy-server IP my.local.ip.address denied access to Account=AUTH_... \
            Container=None. Not in set(['my.instance.ip.address']) (txn: ...)

      Here we can see that the request was denied because the remote IP
      address wasn't in the set of allowed IPs.

#. Back in your DevStack instance on the shell screen, add some metadata to
   your container to allow the request from the remote machine:

   #. Press **Ctrl+A** followed by 0.

   #. Add metadata to the container to allow the IP:

      .. code-block:: console

         $ swift post --meta allow-dev:my.local.ip.address middleware-test

   #. Now try the command from Step 10 again and it succeeds. There are no
      objects in the container, so there is nothing to list; however, there is
      also no error to report.

.. warning::

   Functional testing like this is not a replacement for proper unit
   and integration testing, but it serves to get you started.

You can follow a similar pattern in other projects that use the Python
Paste framework. Simply create a middleware module and plug it in
through configuration. The middleware runs in sequence as part of that
project's pipeline and can call out to other services as necessary. No
project core code is touched. Look for a ``pipeline`` value in the
project's ``conf`` or ``ini`` configuration files in ``/etc/<project>``
to identify projects that use Paste.
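
For example, a rough way to spot such files (paths assume default
packaged locations under ``/etc``; output varies by deployment):

.. code-block:: console

   $ grep -r "^pipeline" /etc/swift /etc/glance 2>/dev/null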
|
||||||
|
|
||||||
|
When your middleware is done, we encourage you to open source it and let
|
||||||
|
the community know on the OpenStack mailing list. Perhaps others need
|
||||||
|
the same functionality. They can use your code, provide feedback, and
|
||||||
|
possibly contribute. If enough support exists for it, perhaps you can
|
||||||
|
propose that it be added to the official swift
|
||||||
|
`middleware <https://git.openstack.org/cgit/openstack/swift/tree/swift/common/middleware>`_.
|
||||||
|
|
||||||
|
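The wrap-and-delegate pattern this walkthrough relies on can be sketched in plain WSGI, without Paste itself; ``paste.deploy`` would normally build the pipeline from the config file and call a ``filter_factory`` for you. All names here (``IPWhitelistMiddleware``, ``demo_app``, the hard-coded IP) are illustrative, not swift's actual code:

```python
class IPWhitelistMiddleware(object):
    """Reject requests whose REMOTE_ADDR is not in an allowed set."""

    def __init__(self, app, allowed_ips):
        self.app = app  # the next application in the pipeline
        self.allowed_ips = set(allowed_ips)

    def __call__(self, environ, start_response):
        if environ.get('REMOTE_ADDR') not in self.allowed_ips:
            start_response('403 Forbidden', [('Content-Type', 'text/plain')])
            return [b'Forbidden\n']
        # Delegate to the wrapped application; no core code is touched.
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Stand-in for the rest of the pipeline."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello\n']


app = IPWhitelistMiddleware(demo_app, ['10.0.0.5'])
```

Because each filter only holds a reference to the next application, filters compose in any order the ``pipeline`` line specifies.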
Customizing the OpenStack Compute (nova) Scheduler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Many OpenStack projects allow for customization of specific features
using a driver architecture. You can write a driver that conforms to a
particular interface and plug it in through configuration. For example,
you can easily plug in a new scheduler for Compute. The existing
schedulers for Compute are feature-rich and well documented at
`Scheduling <http://docs.openstack.org/liberty/config-reference/content/section_compute-scheduler.html>`_.
However, depending on your users' use cases, the existing schedulers
might not meet your requirements. You might need to create a new
scheduler.

To create a scheduler, you must inherit from the class
``nova.scheduler.driver.Scheduler``. Of the five methods that you can
override, you *must* override the two methods marked with an asterisk
(\*) below:

- ``update_service_capabilities``

- ``hosts_up``

- ``group_hosts``

- \* ``schedule_run_instance``

- \* ``select_destinations``

To demonstrate customizing OpenStack, we'll create an example of a
Compute scheduler that randomly places an instance on a subset of hosts,
depending on the originating IP address of the request and the prefix of
the hostname. Such an example could be useful when you have a group of
users on a subnet and you want all of their instances to start within
some subset of your hosts.

.. warning::

   This example is for illustrative purposes only. It should not be
   used as a scheduler for Compute without further development and
   testing.

When you join the screen session that ``stack.sh`` starts with
``screen -r stack``, you are greeted with many screen windows:

.. code-block:: console

   0$ shell*  1$ key  2$ horizon  ...  9$ n-api  ...  14$ n-sch ...

``shell``
   A shell where you can get some work done

``key``
   The keystone service

``horizon``
   The horizon dashboard web application

``n-{name}``
   The nova services

``n-sch``
   The nova scheduler service

**To create the scheduler and plug it in through configuration**

#. The code for OpenStack lives in ``/opt/stack``, so go to the ``nova``
   directory and edit your scheduler module. Change to the directory where
   ``nova`` is installed:

   .. code-block:: console

      $ cd /opt/stack/nova

#. Create the ``ip_scheduler.py`` Python source code file:

   .. code-block:: console

      $ vim nova/scheduler/ip_scheduler.py

#. The code shown below is a driver that will
   schedule servers to hosts based on IP address as explained at the
   beginning of the section. Copy the code into ``ip_scheduler.py``. When
   you're done, save and close the file.

   .. code-block:: python

      # vim: tabstop=4 shiftwidth=4 softtabstop=4
      # Copyright (c) 2014 OpenStack Foundation
      # All Rights Reserved.
      #
      # Licensed under the Apache License, Version 2.0 (the "License"); you may
      # not use this file except in compliance with the License. You may obtain
      # a copy of the License at
      #
      #      http://www.apache.org/licenses/LICENSE-2.0
      #
      # Unless required by applicable law or agreed to in writing, software
      # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      # License for the specific language governing permissions and limitations
      # under the License.

      """
      IP Scheduler implementation
      """

      import random

      from oslo.config import cfg

      from nova.compute import rpcapi as compute_rpcapi
      from nova import exception
      from nova.openstack.common import log as logging
      from nova.openstack.common.gettextutils import _
      from nova.scheduler import driver

      CONF = cfg.CONF
      CONF.import_opt('compute_topic', 'nova.compute.rpcapi')
      LOG = logging.getLogger(__name__)

      class IPScheduler(driver.Scheduler):
          """
          Implements Scheduler as a random node selector based on
          IP address and hostname prefix.
          """

          def __init__(self, *args, **kwargs):
              super(IPScheduler, self).__init__(*args, **kwargs)
              self.compute_rpcapi = compute_rpcapi.ComputeAPI()

          def _filter_hosts(self, request_spec, hosts, filter_properties,
                            hostname_prefix):
              """Filter a list of hosts based on hostname prefix."""

              hosts = [host for host in hosts if host.startswith(hostname_prefix)]
              return hosts

          def _schedule(self, context, topic, request_spec, filter_properties):
              """Picks a host that is up at random."""

              elevated = context.elevated()
              hosts = self.hosts_up(elevated, topic)
              if not hosts:
                  msg = _("Is the appropriate service running?")
                  raise exception.NoValidHost(reason=msg)

              remote_ip = context.remote_address

              if remote_ip.startswith('10.1'):
                  hostname_prefix = 'doc'
              elif remote_ip.startswith('10.2'):
                  hostname_prefix = 'ops'
              else:
                  hostname_prefix = 'dev'

              hosts = self._filter_hosts(request_spec, hosts, filter_properties,
                                         hostname_prefix)
              if not hosts:
                  msg = _("Could not find another compute")
                  raise exception.NoValidHost(reason=msg)

              host = random.choice(hosts)
              LOG.debug("Request from %(remote_ip)s scheduled to %(host)s" % locals())

              return host

          def select_destinations(self, context, request_spec, filter_properties):
              """Selects random destinations."""
              num_instances = request_spec['num_instances']
              # NOTE(timello): Returns a list of dicts with 'host', 'nodename' and
              # 'limits' as keys for compatibility with filter_scheduler.
              dests = []
              for i in range(num_instances):
                  host = self._schedule(context, CONF.compute_topic,
                                        request_spec, filter_properties)
                  host_state = dict(host=host, nodename=None, limits=None)
                  dests.append(host_state)

              if len(dests) < num_instances:
                  raise exception.NoValidHost(reason='')
              return dests

          def schedule_run_instance(self, context, request_spec,
                                    admin_password, injected_files,
                                    requested_networks, is_first_time,
                                    filter_properties, legacy_bdm_in_spec):
              """Create and run an instance or instances."""
              instance_uuids = request_spec.get('instance_uuids')
              for num, instance_uuid in enumerate(instance_uuids):
                  request_spec['instance_properties']['launch_index'] = num
                  try:
                      host = self._schedule(context, CONF.compute_topic,
                                            request_spec, filter_properties)
                      updated_instance = driver.instance_update_db(context,
                                                                   instance_uuid)
                      self.compute_rpcapi.run_instance(context,
                              instance=updated_instance, host=host,
                              requested_networks=requested_networks,
                              injected_files=injected_files,
                              admin_password=admin_password,
                              is_first_time=is_first_time,
                              request_spec=request_spec,
                              filter_properties=filter_properties,
                              legacy_bdm_in_spec=legacy_bdm_in_spec)
                  except Exception as ex:
                      # NOTE(vish): we don't reraise the exception here to make sure
                      #             that all instances in the request get set to
                      #             error properly
                      driver.handle_schedule_error(context, ex, instance_uuid,
                                                   request_spec)

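The host-selection core of ``_schedule`` (map the caller's IP to a hostname prefix, filter the up hosts, then pick one at random) can be exercised on its own. This standalone sketch strips the nova plumbing; ``pick_host`` and its ``seed`` parameter are our additions for testability, not part of the driver:

```python
import random

def pick_host(remote_ip, hosts, seed=None):
    """Mirror the hard-coded 10.1/10.2 branches of the IPScheduler above."""
    if remote_ip.startswith('10.1'):
        prefix = 'doc'
    elif remote_ip.startswith('10.2'):
        prefix = 'ops'
    else:
        prefix = 'dev'
    candidates = [h for h in hosts if h.startswith(prefix)]
    if not candidates:
        # The driver raises exception.NoValidHost here.
        raise LookupError('Could not find another compute')
    # Seedable for reproducible tests; the driver just calls random.choice().
    return random.Random(seed).choice(candidates)

hosts = ['doc01', 'doc02', 'ops01', 'dev01']
```

Requests from 10.1.x.x land only on ``doc``-prefixed hosts, 10.2.x.x on ``ops`` hosts, and everything else on ``dev`` hosts.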
There is a lot of useful information in ``context``, ``request_spec``,
and ``filter_properties`` that you can use to decide where to schedule
the instance. To find out more about what properties are available, you
can insert the following log statements into the
``schedule_run_instance`` method of the scheduler above:

.. code-block:: python

   LOG.debug("context = %(context)s" % {'context': context.__dict__})
   LOG.debug("request_spec = %(request_spec)s" % locals())
   LOG.debug("filter_properties = %(filter_properties)s" % locals())

#. To plug this scheduler into nova, edit one configuration file,
   ``/etc/nova/nova.conf``:

   .. code-block:: console

      $ vim /etc/nova/nova.conf

#. Find the ``scheduler_driver`` config and change it like so:

   .. code-block:: ini

      scheduler_driver=nova.scheduler.ip_scheduler.IPScheduler

#. Restart the nova scheduler service to make nova use your scheduler.
   Start by switching to the ``n-sch`` screen:

   #. Press **Ctrl+A** followed by 9.

   #. Press **Ctrl+A** followed by N until you reach the ``n-sch`` screen.

   #. Press **Ctrl+C** to kill the service.

   #. Press Up Arrow to bring up the last command.

   #. Press Enter to run it.

#. Test your scheduler with the nova CLI. Start by switching to the
   ``shell`` screen and finish by switching back to the ``n-sch`` screen to
   check the log output:

   #. Press **Ctrl+A** followed by 0.

   #. Make sure you're in the ``devstack`` directory:

      .. code-block:: console

         $ cd /root/devstack

   #. Source ``openrc`` to set up your environment variables for the CLI:

      .. code-block:: console

         $ source openrc

   #. Put the image ID for the only installed image into an environment
      variable:

      .. code-block:: console

         $ IMAGE_ID=`nova image-list | egrep cirros | egrep -v "kernel|ramdisk" | awk '{print $2}'`

   #. Boot a test server:

      .. code-block:: console

         $ nova boot --flavor 1 --image $IMAGE_ID scheduler-test

#. Switch back to the ``n-sch`` screen. Among the log statements, you'll
   see the line:

   .. code-block:: console

      2014-01-23 19:57:47.262 DEBUG nova.scheduler.ip_scheduler \
      [req-... demo demo] Request from 162.242.221.84 \
      scheduled to devstack-havana \
      _schedule /opt/stack/nova/nova/scheduler/ip_scheduler.py:76

.. warning::

   Functional testing like this is not a replacement for proper unit
   and integration testing, but it serves to get you started.

A similar pattern can be followed in other projects that use the driver
architecture. Simply create a module and class that conform to the
driver interface and plug it in through configuration. Your code runs
when that feature is used and can call out to other services as
necessary. No project core code is touched. Look for a "driver" value in
the project's ``.conf`` configuration files in ``/etc/<project>`` to
identify projects that use a driver architecture.

When your scheduler is done, we encourage you to open source it and let
the community know on the OpenStack mailing list. Perhaps others need
the same functionality. They can use your code, provide feedback, and
possibly contribute. If enough support exists for it, perhaps you can
propose that it be added to the official Compute
`schedulers <https://git.openstack.org/cgit/openstack/nova/tree/nova/scheduler>`_.

Customizing the Dashboard (Horizon)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The dashboard is based on the Python
`Django <https://www.djangoproject.com/>`_ web application framework.
The best guide to customizing it has already been written and can be
found at `Building on
Horizon <http://docs.openstack.org/developer/horizon/topics/tutorial.html>`_.

Conclusion
~~~~~~~~~~

When operating an OpenStack cloud, you may discover that your users can
be quite demanding. If OpenStack doesn't do what your users need, it may
be up to you to fulfill those requirements. This chapter provided you
with some options for customization and gave you the tools you need to
get started.

===============
Lay of the Land
===============

This chapter helps you set up your working environment and use it to
take a look around your cloud.

Using the OpenStack Dashboard for Administration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As a cloud administrative user, you can use the OpenStack dashboard to
create and manage projects, users, images, and flavors. Users are
allowed to create and manage images within specified projects and to
share images, depending on the Image service configuration. Typically,
the policy configuration allows admin users only to set quotas and
create and manage services. The dashboard provides an :guilabel:`Admin`
tab with a :guilabel:`System Panel` and an :guilabel:`Identity` tab.
These interfaces give you access to system information and usage as
well as to settings for configuring what
end users can do. Refer to the `OpenStack Administrator
Guide <http://docs.openstack.org/admin-guide/dashboard.html>`_ for
detailed how-to information about using the dashboard as an admin user.

Command-Line Tools
~~~~~~~~~~~~~~~~~~

We recommend using a combination of the OpenStack command-line interface
(CLI) tools and the OpenStack dashboard for administration. Some users
with a background in other cloud technologies may be using the EC2
Compatibility API, which uses naming conventions somewhat different from
the native API. We highlight those differences.

We strongly suggest that you install the command-line clients from the
`Python Package Index <https://pypi.python.org/pypi>`_ (PyPI) instead
of from the distribution packages. The clients are under heavy
development, and it is very likely at any given time that the versions of
the packages distributed by your operating-system vendor are out of
date.

The pip utility is used to manage package installation from the PyPI
archive and is available in the python-pip package in most Linux
distributions. Each OpenStack project has its own client, so depending
on which services your site runs, install some or all of the
following packages:

* python-novaclient (:term:`nova` CLI)
* python-glanceclient (:term:`glance` CLI)
* python-keystoneclient (:term:`keystone` CLI)
* python-cinderclient (:term:`cinder` CLI)
* python-swiftclient (:term:`swift` CLI)
* python-neutronclient (:term:`neutron` CLI)

Installing the Tools
--------------------

To install (or upgrade) a package from the PyPI archive with pip, run
the following command as root:

.. code-block:: console

   # pip install [--upgrade] <package-name>

To remove the package:

.. code-block:: console

   # pip uninstall <package-name>

If you need even newer versions of the clients, pip can install directly
from the upstream git repository using the :option:`-e` flag. You must specify
a name for the Python egg that is installed. For example:

.. code-block:: console

   # pip install -e git+https://git.openstack.org/openstack/python-novaclient#egg=python-novaclient

If you support the EC2 API on your cloud, you should also install the
euca2ools package or some other EC2 API tool so that you can get the
same view your users have. Using EC2 API-based tools is mostly out of
the scope of this guide, though we discuss getting credentials for use
with it.

Administrative Command-Line Tools
---------------------------------

There are also several :command:`*-manage` command-line tools. These are
installed with the project's services on the cloud controller and do not
need to be installed separately:

* :command:`glance-manage`
* :command:`keystone-manage`
* :command:`cinder-manage`

Unlike the CLI tools mentioned above, the :command:`*-manage` tools must
be run from the cloud controller, as root, because they need read access
to the config files such as ``/etc/nova/nova.conf`` and to make queries
directly against the database rather than against the OpenStack
:term:`API endpoints <API endpoint>`.

.. warning::

   The existence of the ``*-manage`` tools is a legacy issue. It is a
   goal of the OpenStack project to eventually migrate all of the
   remaining functionality in the ``*-manage`` tools into the API-based
   tools. Until that day, you need to SSH into the
   :term:`cloud controller node` to perform some maintenance operations
   that require one of the ``*-manage`` tools.

Getting Credentials
-------------------

You must have the appropriate credentials if you want to use the
command-line tools to make queries against your OpenStack cloud. By far,
the easiest way to obtain :term:`authentication` credentials to use with
command-line clients is to use the OpenStack dashboard. Select
:guilabel:`Project`, click the :guilabel:`Project` tab, and click
:guilabel:`Access & Security` on the :guilabel:`Compute` category.
On the :guilabel:`Access & Security` page, click the :guilabel:`API Access`
tab to display two buttons, :guilabel:`Download OpenStack RC File` and
:guilabel:`Download EC2 Credentials`, which let you generate files that
you can source in your shell to populate the environment variables the
command-line tools require to know where your service endpoints and your
authentication information are. The user you logged in to the dashboard
dictates the filename for the openrc file, such as ``demo-openrc.sh``.
When logged in as admin, the file is named ``admin-openrc.sh``.

The generated file looks something like this:

.. code-block:: bash

   #!/bin/bash

   # With the addition of Keystone, to use an openstack cloud you should
   # authenticate against keystone, which returns a **Token** and **Service
   # Catalog**. The catalog contains the endpoint for all services the
   # user/tenant has access to--including nova, glance, keystone, swift.
   #
   # *NOTE*: Using the 2.0 *auth api* does not mean that compute api is 2.0.
   # We use the 1.1 *compute api*
   export OS_AUTH_URL=http://203.0.113.10:5000/v2.0

   # With the addition of Keystone we have standardized on the term **tenant**
   # as the entity that owns the resources.
   export OS_TENANT_ID=98333aba48e756fa8f629c83a818ad57
   export OS_TENANT_NAME="test-project"

   # In addition to the owning entity (tenant), openstack stores the entity
   # performing the action as the **user**.
   export OS_USERNAME=demo

   # With Keystone you pass the keystone password.
   echo "Please enter your OpenStack Password: "
   read -s OS_PASSWORD_INPUT
   export OS_PASSWORD=$OS_PASSWORD_INPUT

.. warning::

   This does not save your password in plain text, which is a good
   thing. But when you source or run the script, it prompts you for
   your password and then stores your response in the environment
   variable ``OS_PASSWORD``. It is important to note that this does
   require interactivity. It is possible to store a value directly in
   the script if you require a noninteractive operation, but you then
   need to be extremely cautious with the security and permissions of
   this file.

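If you script against these variables yourself rather than through a CLI, the same prompt-on-missing behavior takes only a few lines. ``load_openstack_env`` is a hypothetical helper sketched for illustration, not part of any OpenStack client:

```python
import getpass

def load_openstack_env(environ, prompt=getpass.getpass):
    """Collect credentials from sourced openrc variables.

    Falls back to an interactive prompt for the password, mirroring
    the read -s in the generated script.
    """
    creds = {
        'auth_url': environ.get('OS_AUTH_URL'),
        'tenant_name': environ.get('OS_TENANT_NAME'),
        'username': environ.get('OS_USERNAME'),
        'password': environ.get('OS_PASSWORD'),
    }
    if not creds['password']:
        creds['password'] = prompt('OpenStack password: ')
    missing = sorted(k for k, v in creds.items() if not v)
    if missing:
        raise EnvironmentError('missing credentials: %s' % ', '.join(missing))
    return creds
```

Passing ``os.environ`` uses the real shell environment; passing a dict (as the tests do) keeps the helper noninteractive.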
EC2 compatibility credentials can be downloaded by selecting
:guilabel:`Project`, then :guilabel:`Compute`, then
:guilabel:`Access & Security`, then :guilabel:`API Access` to display the
:guilabel:`Download EC2 Credentials` button. Click the button to generate
a ZIP file with server x509 certificates and a shell script fragment.
Create a new directory in a secure location because these are live credentials
containing all the authentication information required to access your
cloud identity, unlike the default ``user-openrc``. Extract the ZIP file
here. You should have ``cacert.pem``, ``cert.pem``, ``ec2rc.sh``, and
``pk.pem``. The ``ec2rc.sh`` is similar to this:

.. code-block:: bash

   #!/bin/bash

   NOVARC=$(readlink -f "${BASH_SOURCE:-${0}}" 2>/dev/null) ||\
       NOVARC=$(python -c 'import os,sys; \
       print os.path.abspath(os.path.realpath(sys.argv[1]))' "${BASH_SOURCE:-${0}}")
   NOVA_KEY_DIR=${NOVARC%/*}
   export EC2_ACCESS_KEY=df7f93ec47e84ef8a347bbb3d598449a
   export EC2_SECRET_KEY=ead2fff9f8a344e489956deacd47e818
   export EC2_URL=http://203.0.113.10:8773/services/Cloud
   export EC2_USER_ID=42 # nova does not use user id, but bundling requires it
   export EC2_PRIVATE_KEY=${NOVA_KEY_DIR}/pk.pem
   export EC2_CERT=${NOVA_KEY_DIR}/cert.pem
   export NOVA_CERT=${NOVA_KEY_DIR}/cacert.pem
   export EUCALYPTUS_CERT=${NOVA_CERT} # euca-bundle-image seems to require this

   alias ec2-bundle-image="ec2-bundle-image --cert $EC2_CERT --privatekey \
       $EC2_PRIVATE_KEY --user 42 --ec2cert $NOVA_CERT"
   alias ec2-upload-bundle="ec2-upload-bundle -a $EC2_ACCESS_KEY -s \
       $EC2_SECRET_KEY --url $S3_URL --ec2cert $NOVA_CERT"

To put the EC2 credentials into your environment, source the
``ec2rc.sh`` file.

Inspecting API Calls
--------------------

The command-line tools can be made to show the OpenStack API calls they
make by passing the :option:`--debug` flag to them. For example:

.. code-block:: console

   # nova --debug list

This example shows the HTTP requests from the client and the responses
from the endpoints, which can be helpful in creating custom tools
written to the OpenStack API.

Using cURL for further inspection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Underlying the use of the command-line tools is the OpenStack API, which
is a RESTful API that runs over HTTP. There may be cases where you want
to interact with the API directly or need to use it because of a
suspected bug in one of the CLI tools. The best way to do this is to use
a combination of `cURL <http://curl.haxx.se/>`_ and another tool,
such as `jq <http://stedolan.github.io/jq/>`_, to parse the JSON from
the responses.

The first thing you must do is authenticate with the cloud using your
credentials to get an authentication token.

Your credentials are a combination of username, password, and tenant
(project). You can extract these values from the ``openrc.sh`` discussed
above. The token allows you to interact with your other service
endpoints without needing to reauthenticate for every request. Tokens
are typically good for 24 hours, and when the token expires, you are
alerted with a 401 (Unauthorized) response and you can request another
token.

#. Look at your OpenStack service catalog:

   .. code-block:: console

      $ curl -s -X POST http://203.0.113.10:35357/v2.0/tokens \
        -d '{"auth": {"passwordCredentials": {"username":"test-user", \
            "password":"test-password"}, \
            "tenantName":"test-project"}}' \
        -H "Content-type: application/json" | jq .

#. Read through the JSON response to get a feel for how the catalog is
   laid out.

   To make working with subsequent requests easier, store the token in
   an environment variable:

   .. code-block:: console

      $ TOKEN=`curl -s -X POST http://203.0.113.10:35357/v2.0/tokens \
        -d '{"auth": {"passwordCredentials": {"username":"test-user", \
            "password":"test-password"}, \
            "tenantName":"test-project"}}' \
        -H "Content-type: application/json" | jq -r .access.token.id`

   Now you can refer to your token on the command line as ``$TOKEN``.

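The jq filters above can also be expressed with Python's standard ``json`` module. This runs against a trimmed, made-up sample of a Keystone v2.0 token response (real responses carry many more fields, services, and endpoints):

```python
import json

sample = json.loads('''
{
  "access": {
    "token": {"id": "abc123", "expires": "2016-01-06T17:20:38Z"},
    "serviceCatalog": [
      {"name": "nova", "type": "compute",
       "endpoints": [{"publicURL": "http://203.0.113.10:8774/v2/tenant-id"}]}
    ]
  }
}
''')

# Equivalent of: jq -r .access.token.id
token = sample['access']['token']['id']

# Walk the catalog for every public compute endpoint.
compute_urls = [ep['publicURL']
                for svc in sample['access']['serviceCatalog']
                if svc['type'] == 'compute'
                for ep in svc['endpoints']]
```

The same traversal works on the live response body if you feed ``json.loads`` the output of the curl command instead of the sample string.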
#. Pick a service endpoint from your service catalog, such as compute.
   Try a request, for example, listing instances (servers):

   .. code-block:: console

      $ curl -s \
        -H "X-Auth-Token: $TOKEN" \
        http://203.0.113.10:8774/v2/98333aba48e756fa8f629c83a818ad57/servers | jq .

To discover how API requests should be structured, read the `OpenStack
API Reference <http://developer.openstack.org/api-ref.html>`_. To chew
through the responses using jq, see the `jq
Manual <http://stedolan.github.io/jq/manual/>`_.

The ``-s`` flag used in the cURL commands above prevents
the progress meter from being shown. If you are having trouble running
cURL commands, you'll want to remove it. Likewise, to help you
troubleshoot cURL commands, you can include the ``-v`` flag to show you
the verbose output. There are many more extremely useful features in
cURL; refer to the man page for all the options.

Servers and Services
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
As an administrator, you have a few ways to discover what your OpenStack
|
||||||
|
cloud looks like simply by using the OpenStack tools available. This
|
||||||
|
section gives you an idea of how to get an overview of your cloud, its
|
||||||
|
shape, size, and current state.
|
||||||
|
|
||||||
|
First, you can discover what servers belong to your OpenStack cloud by
|
||||||
|
running:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
   # nova service-list

The output looks like the following:

.. code-block:: console

   +----+------------------+-------------------+------+---------+-------+----------------------------+-----------------+
   | Id | Binary           | Host              | Zone | Status  | State | Updated_at                 | Disabled Reason |
   +----+------------------+-------------------+------+---------+-------+----------------------------+-----------------+
   | 1  | nova-cert        | cloud.example.com | nova | enabled | up    | 2016-01-05T17:20:38.000000 | -               |
   | 2  | nova-compute     | c01.example.com   | nova | enabled | up    | 2016-01-05T17:20:38.000000 | -               |
   | 3  | nova-compute     | c02.example.com   | nova | enabled | up    | 2016-01-05T17:20:38.000000 | -               |
   | 4  | nova-compute     | c03.example.com   | nova | enabled | up    | 2016-01-05T17:20:38.000000 | -               |
   | 5  | nova-compute     | c04.example.com   | nova | enabled | up    | 2016-01-05T17:20:38.000000 | -               |
   | 6  | nova-compute     | c05.example.com   | nova | enabled | up    | 2016-01-05T17:20:38.000000 | -               |
   | 7  | nova-conductor   | cloud.example.com | nova | enabled | up    | 2016-01-05T17:20:38.000000 | -               |
   | 8  | nova-cert        | cloud.example.com | nova | enabled | up    | 2016-01-05T17:20:42.000000 | -               |
   | 9  | nova-scheduler   | cloud.example.com | nova | enabled | up    | 2016-01-05T17:20:38.000000 | -               |
   | 10 | nova-consoleauth | cloud.example.com | nova | enabled | up    | 2016-01-05T17:20:35.000000 | -               |
   +----+------------------+-------------------+------+---------+-------+----------------------------+-----------------+

The output shows five compute nodes and one cloud controller. All
services are in the ``up`` state, which indicates that they are running.
If a service is in the ``down`` state, it is no longer available, and
you should troubleshoot why it is down.

If you are using cinder, run the following command to see a similar
listing:

.. code-block:: console

   # cinder-manage host list | sort
   host               zone
   c01.example.com    nova
   c02.example.com    nova
   c03.example.com    nova
   c04.example.com    nova
   c05.example.com    nova
   cloud.example.com  nova

With these two tables, you now have a good overview of what servers and
services make up your cloud.

You can also use the Identity service (keystone) to see what services
are available in your cloud and what endpoints have been configured for
them.

The following command requires your shell environment to be configured
with the proper administrative variables:

.. code-block:: console

   $ openstack catalog list
   +----------+------------+---------------------------------------------------------------------------------+
   | Name     | Type       | Endpoints                                                                       |
   +----------+------------+---------------------------------------------------------------------------------+
   | nova     | compute    | RegionOne                                                                       |
   |          |            |   publicURL: http://192.168.122.10:8774/v2/9faa845768224258808fc17a1bb27e5e     |
   |          |            |   internalURL: http://192.168.122.10:8774/v2/9faa845768224258808fc17a1bb27e5e   |
   |          |            |   adminURL: http://192.168.122.10:8774/v2/9faa845768224258808fc17a1bb27e5e      |
   |          |            |                                                                                 |
   | cinderv2 | volumev2   | RegionOne                                                                       |
   |          |            |   publicURL: http://192.168.122.10:8776/v2/9faa845768224258808fc17a1bb27e5e     |
   |          |            |   internalURL: http://192.168.122.10:8776/v2/9faa845768224258808fc17a1bb27e5e   |
   |          |            |   adminURL: http://192.168.122.10:8776/v2/9faa845768224258808fc17a1bb27e5e      |
   |          |            |                                                                                 |

The preceding output has been truncated to show only two services. You
will see one service entry for each service that your cloud provides.
Note how the endpoint domain can differ depending on the endpoint type.
Different endpoint domains per type are not required, but they can be
used for reasons such as endpoint privacy or network traffic
segregation.

You can find the version of the Compute installation by using the
nova client command:

.. code-block:: console

   # nova version-list

Diagnose Your Compute Nodes
---------------------------

You can obtain extra per-instance information about running virtual
machines, such as CPU usage, memory, disk I/O, and network I/O, by
running the :command:`nova diagnostics` command with a server ID:

.. code-block:: console

   $ nova diagnostics <serverID>

The output of this command varies depending on the hypervisor, because
hypervisors support different attributes. The following demonstrates
the difference between the two most popular hypervisors.
Here is example output when the hypervisor is Xen:

.. code-block:: console

   +----------------+-----------------+
   | Property       | Value           |
   +----------------+-----------------+
   | cpu0           | 4.3627          |
   | memory         | 1171088064.0000 |
   | memory_target  | 1171088064.0000 |
   | vbd_xvda_read  | 0.0             |
   | vbd_xvda_write | 0.0             |
   | vif_0_rx       | 3223.6870       |
   | vif_0_tx       | 0.0             |
   | vif_1_rx       | 104.4955        |
   | vif_1_tx       | 0.0             |
   +----------------+-----------------+

While the command should work with any hypervisor that is controlled
through libvirt (KVM, QEMU, or LXC), it has been tested only with KVM.
Here is example output when the hypervisor is KVM:

.. code-block:: console

   +------------------+------------+
   | Property         | Value      |
   +------------------+------------+
   | cpu0_time        | 2870000000 |
   | memory           | 524288     |
   | vda_errors       | -1         |
   | vda_read         | 262144     |
   | vda_read_req     | 112        |
   | vda_write        | 5606400    |
   | vda_write_req    | 376        |
   | vnet0_rx         | 63343      |
   | vnet0_rx_drop    | 0          |
   | vnet0_rx_errors  | 0          |
   | vnet0_rx_packets | 431        |
   | vnet0_tx         | 4905       |
   | vnet0_tx_drop    | 0          |
   | vnet0_tx_errors  | 0          |
   | vnet0_tx_packets | 45         |
   +------------------+------------+

Network Inspection
~~~~~~~~~~~~~~~~~~

To see which fixed IP networks are configured in your cloud, you can use
the :command:`nova` command-line client to get the IP ranges:

.. code-block:: console

   $ nova network-list
   +--------------------------------------+--------+--------------+
   | ID                                   | Label  | Cidr         |
   +--------------------------------------+--------+--------------+
   | 3df67919-9600-4ea8-952e-2a7be6f70774 | test01 | 10.1.0.0/24  |
   | 8283efb2-e53d-46e1-a6bd-bb2bdef9cb9a | test02 | 10.1.1.0/24  |
   +--------------------------------------+--------+--------------+

The nova command-line client can provide some additional details:

.. code-block:: console

   # nova network-list
   id IPv4        IPv6 start address DNS1 DNS2 VlanID project uuid
   1  10.1.0.0/24 None 10.1.0.3      None None 300    2725bbd beacb3f2
   2  10.1.1.0/24 None 10.1.1.3      None None 301    none    d0b1a796

This output shows that two networks are configured, each spanning a /24
subnet (256 addresses, of which 254 are usable by hosts). The first
network has been assigned to a certain project, while the second network
is still open for assignment. You can assign this network manually;
otherwise, it is automatically assigned when a project launches its
first instance.
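As a sanity check on those address counts, Python's standard ``ipaddress`` module can compute them for any CIDR (a standalone sketch, not part of the nova client):

```python
import ipaddress

# A /24 fixed-IP network like test01 above.
net = ipaddress.ip_network("10.1.0.0/24")

print(net.num_addresses)       # → 256 (total addresses in the block)
print(len(list(net.hosts())))  # → 254 (network and broadcast excluded)
```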

To find out whether any floating IPs are available in your cloud, run:

.. code-block:: console

   # nova floating-ip-list
   2725bb...59f43f 1.2.3.4  None            nova vlan20
   None            1.2.3.5  48a415...b010ff nova vlan20

Here, two floating IPs are available. The first has been allocated to a
project, while the other is unallocated.

Users and Projects
~~~~~~~~~~~~~~~~~~

To see a list of projects that have been added to the cloud, run:

.. code-block:: console

   $ openstack project list
   +----------------------------------+--------------------+
   | ID                               | Name               |
   +----------------------------------+--------------------+
   | 422c17c0b26f4fbe9449f37a5621a5e6 | alt_demo           |
   | 5dc65773519248f3a580cfe28ba7fa3f | demo               |
   | 9faa845768224258808fc17a1bb27e5e | admin              |
   | a733070a420c4b509784d7ea8f6884f7 | invisible_to_admin |
   | aeb3e976e7794f3f89e4a7965db46c1e | service            |
   +----------------------------------+--------------------+

To see a list of users, run:

.. code-block:: console

   $ openstack user list
   +----------------------------------+----------+
   | ID                               | Name     |
   +----------------------------------+----------+
   | 5837063598694771aedd66aa4cddf0b8 | demo     |
   | 58efd9d852b74b87acc6efafaf31b30e | cinder   |
   | 6845d995a57a441f890abc8f55da8dfb | glance   |
   | ac2d15a1205f46d4837d5336cd4c5f5a | alt_demo |
   | d8f593c3ae2b47289221f17a776a218b | admin    |
   | d959ec0a99e24df0b7cb106ff940df20 | nova     |
   +----------------------------------+----------+

.. note::

   Sometimes a user and a group have a one-to-one mapping. This happens
   for standard system accounts, such as cinder, glance, nova, and
   swift, or when only one user is part of a group.

Running Instances
~~~~~~~~~~~~~~~~~

To see a list of running instances, run:

.. code-block:: console

   $ nova list --all-tenants
   +-----+------------------+--------+-------------------------------------------+
   | ID  | Name             | Status | Networks                                  |
   +-----+------------------+--------+-------------------------------------------+
   | ... | Windows          | ACTIVE | novanetwork_1=10.1.1.3, 199.116.232.39    |
   | ... | cloud controller | ACTIVE | novanetwork_0=10.1.0.6; jtopjian=10.1.2.3 |
   | ... | compute node 1   | ACTIVE | novanetwork_0=10.1.0.4; jtopjian=10.1.2.4 |
   | ... | devbox           | ACTIVE | novanetwork_0=10.1.0.3                    |
   | ... | devstack         | ACTIVE | novanetwork_0=10.1.0.5                    |
   | ... | initial          | ACTIVE | nova_network=10.1.7.4, 10.1.8.4           |
   | ... | lorin-head       | ACTIVE | nova_network=10.1.7.3, 10.1.8.3           |
   +-----+------------------+--------+-------------------------------------------+

Unfortunately, this command does not tell you various details about the
running instances, such as what compute node the instance is running on,
what flavor the instance is, and so on. You can use the following
command to view details about individual instances:

.. code-block:: console

   $ nova show <uuid>

For example:

.. code-block:: console

   # nova show 81db556b-8aa5-427d-a95c-2a9a6972f630
   +-------------------------------------+-----------------------------------+
   | Property                            | Value                             |
   +-------------------------------------+-----------------------------------+
   | OS-DCF:diskConfig                   | MANUAL                            |
   | OS-EXT-SRV-ATTR:host                | c02.example.com                   |
   | OS-EXT-SRV-ATTR:hypervisor_hostname | c02.example.com                   |
   | OS-EXT-SRV-ATTR:instance_name       | instance-00000029                 |
   | OS-EXT-STS:power_state              | 1                                 |
   | OS-EXT-STS:task_state               | None                              |
   | OS-EXT-STS:vm_state                 | active                            |
   | accessIPv4                          |                                   |
   | accessIPv6                          |                                   |
   | config_drive                        |                                   |
   | created                             | 2013-02-13T20:08:36Z              |
   | flavor                              | m1.small (6)                      |
   | hostId                              | ...                               |
   | id                                  | ...                               |
   | image                               | Ubuntu 12.04 cloudimg amd64 (...) |
   | key_name                            | jtopjian-sandbox                  |
   | metadata                            | {}                                |
   | name                                | devstack                          |
   | novanetwork_0 network               | 10.1.0.5                          |
   | progress                            | 0                                 |
   | security_groups                     | [{u'name': u'default'}]           |
   | status                              | ACTIVE                            |
   | tenant_id                           | ...                               |
   | updated                             | 2013-02-13T20:08:59Z              |
   | user_id                             | ...                               |
   +-------------------------------------+-----------------------------------+

This output shows that an instance named ``devstack`` was created from
an Ubuntu 12.04 image using the ``m1.small`` flavor and is hosted on
the compute node ``c02.example.com``.

Summary
~~~~~~~

We hope you have enjoyed this quick tour of your working environment,
including how to interact with your cloud and extract useful
information. From here, you can use the `Administrator
Guide <http://docs.openstack.org/admin-guide/>`_ as your
reference for all of the command-line functionality in your cloud.

======================
Logging and Monitoring
======================

As an OpenStack cloud is composed of so many different services, there
are a large number of log files. This chapter aims to assist you in
locating and working with them, and describes other ways to track the
status of your deployment.

Where Are the Logs?
~~~~~~~~~~~~~~~~~~~

Most services use the convention of writing their log files to
subdirectories of the ``/var/log`` directory, as listed in the table
below.

.. list-table:: OpenStack log locations
   :widths: 33 33 33
   :header-rows: 1

   * - Node type
     - Service
     - Log location
   * - Cloud controller
     - ``nova-*``
     - ``/var/log/nova``
   * - Cloud controller
     - ``glance-*``
     - ``/var/log/glance``
   * - Cloud controller
     - ``cinder-*``
     - ``/var/log/cinder``
   * - Cloud controller
     - ``keystone-*``
     - ``/var/log/keystone``
   * - Cloud controller
     - ``neutron-*``
     - ``/var/log/neutron``
   * - Cloud controller
     - horizon
     - ``/var/log/apache2/``
   * - All nodes
     - misc (swift, dnsmasq)
     - ``/var/log/syslog``
   * - Compute nodes
     - libvirt
     - ``/var/log/libvirt/libvirtd.log``
   * - Compute nodes
     - Console (boot up messages) for VM instances
     - ``/var/lib/nova/instances/instance-<instance id>/console.log``
   * - Block Storage nodes
     - cinder-volume
     - ``/var/log/cinder/cinder-volume.log``

Reading the Logs
~~~~~~~~~~~~~~~~

OpenStack services use the standard logging levels, in increasing
severity: DEBUG, INFO, AUDIT, WARNING, ERROR, CRITICAL, and TRACE. That
is, messages appear in the logs only if they are more "severe" than the
configured log level, with DEBUG allowing all log statements through.
For example, TRACE is logged only if the software has a stack trace,
while INFO is logged for every message, including those that are only
informational.
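OpenStack services build on Python's standard ``logging`` module (AUDIT and TRACE are OpenStack additions to the standard level set). The threshold behavior can be seen with a minimal standalone sketch:

```python
import logging

# Configure the root logger to drop anything less severe than WARNING.
logging.basicConfig(level=logging.WARNING, format="%(levelname)s %(message)s")
log = logging.getLogger("demo")

log.debug("not shown: DEBUG is below the configured level")
log.info("not shown: INFO is below the configured level")
log.warning("shown: WARNING meets the threshold")
log.error("shown: ERROR exceeds the threshold")
```

Lowering the level to ``logging.DEBUG`` lets all four statements through, which is why DEBUG-level logging is so verbose.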

To disable DEBUG-level logging, edit the ``/etc/nova/nova.conf`` file as
follows:

.. code-block:: ini

   debug=false

Keystone is handled a little differently. To modify the logging level,
edit the ``/etc/keystone/logging.conf`` file and look at the
``logger_root`` and ``handler_file`` sections.

Logging for horizon is configured in
``/etc/openstack_dashboard/local_settings.py``. Because horizon is
a Django web application, it follows the `Django logging framework
conventions <https://docs.djangoproject.com/en/dev/topics/logging/>`_.

The first step in finding the source of an error is typically to search
for a CRITICAL, TRACE, or ERROR message, starting at the bottom of the
log file.

Here is an example of a CRITICAL log message, with the corresponding
TRACE (Python traceback) immediately following:

.. code-block:: console

   2013-02-25 21:05:51 17409 CRITICAL cinder [-] Bad or unexpected response from the storage volume backend API: volume group
   cinder-volumes doesn't exist
   2013-02-25 21:05:51 17409 TRACE cinder Traceback (most recent call last):
   2013-02-25 21:05:51 17409 TRACE cinder File "/usr/bin/cinder-volume", line 48, in <module>
   2013-02-25 21:05:51 17409 TRACE cinder service.wait()
   2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 422, in wait
   2013-02-25 21:05:51 17409 TRACE cinder _launcher.wait()
   2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 127, in wait
   2013-02-25 21:05:51 17409 TRACE cinder service.wait()
   2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 166, in wait
   2013-02-25 21:05:51 17409 TRACE cinder return self._exit_event.wait()
   2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
   2013-02-25 21:05:51 17409 TRACE cinder return hubs.get_hub().switch()
   2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 177, in switch
   2013-02-25 21:05:51 17409 TRACE cinder return self.greenlet.switch()
   2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 192, in main
   2013-02-25 21:05:51 17409 TRACE cinder result = function(*args, **kwargs)
   2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 88, in run_server
   2013-02-25 21:05:51 17409 TRACE cinder server.start()
   2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 159, in start
   2013-02-25 21:05:51 17409 TRACE cinder self.manager.init_host()
   2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 95,
   in init_host
   2013-02-25 21:05:51 17409 TRACE cinder self.driver.check_for_setup_error()
   2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/volume/driver.py", line 116,
   in check_for_setup_error
   2013-02-25 21:05:51 17409 TRACE cinder raise exception.VolumeBackendAPIException(data=exception_message)
   2013-02-25 21:05:51 17409 TRACE cinder VolumeBackendAPIException: Bad or unexpected response from the storage volume
   backend API: volume group cinder-volumes doesn't exist
   2013-02-25 21:05:51 17409 TRACE cinder

In this example, ``cinder-volumes`` failed to start and has provided a
stack trace, since its volume back end was unable to set up the storage
volume, probably because the LVM volume group that the configuration
expects does not exist.

Here is an example error log:

.. code-block:: console

   2013-02-25 20:26:33 6619 ERROR nova.openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable:
   [Errno 111] ECONNREFUSED. Trying again in 23 seconds.

In this error, a nova service has failed to connect to the RabbitMQ
server because it received a connection refused error.

Tracing Instance Requests
~~~~~~~~~~~~~~~~~~~~~~~~~

When an instance fails to behave properly, you will often have to trace
activity associated with that instance across the log files of the
various ``nova-*`` services, on both the cloud controller and the
compute nodes.

The typical way is to trace the UUID associated with an instance across
the service logs.

Consider the following example:

.. code-block:: console

   $ nova list
   +--------------------------------------+--------+--------+---------------------------+
   | ID                                   | Name   | Status | Networks                  |
   +--------------------------------------+--------+--------+---------------------------+
   | faf7ded8-4a46-413b-b113-f19590746ffe | cirros | ACTIVE | novanetwork=192.168.100.3 |
   +--------------------------------------+--------+--------+---------------------------+

Here, the ID associated with the instance is
``faf7ded8-4a46-413b-b113-f19590746ffe``. If you search for this string
on the cloud controller in the ``/var/log/nova-*.log`` files, it appears
in ``nova-api.log`` and ``nova-scheduler.log``. If you search for it on
the compute nodes in ``/var/log/nova-*.log``, it appears in
``nova-network.log`` and ``nova-compute.log``. If no ERROR or CRITICAL
messages appear, the most recent log entry that reports this UUID may
provide a hint about what has gone wrong.
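This per-node search can be scripted. The following sketch scans a directory of log files for an instance UUID; the directory path, file layout, and function name are assumptions to adapt to your deployment:

```python
import glob
import os


def find_instance_lines(log_dir, uuid):
    """Return (filename, line) pairs that mention the given instance UUID."""
    hits = []
    for path in sorted(glob.glob(os.path.join(log_dir, "*.log"))):
        with open(path, errors="replace") as f:
            for line in f:
                if uuid in line:
                    hits.append((os.path.basename(path), line.rstrip()))
    return hits


# Example invocation (hypothetical log directory):
for name, line in find_instance_lines(
        "/var/log/nova", "faf7ded8-4a46-413b-b113-f19590746ffe"):
    print(name, line)
```

Running it on each node in turn gives the same cross-service picture as the manual grep, with the reporting file named next to every matching line.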

Adding Custom Logging Statements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If there is not enough information in the existing logs, you may need to
add your own custom logging statements to the ``nova-*`` services.

The source files are located in
``/usr/lib/python2.7/dist-packages/nova``.

To add logging statements, the following lines should be near the top of
the file. For most files, they are already there:

.. code-block:: python

   from nova.openstack.common import log as logging
   LOG = logging.getLogger(__name__)

To add a DEBUG logging statement, you would do:

.. code-block:: python

   LOG.debug("This is a custom debugging statement")

You may notice that all the existing logging messages are preceded by an
underscore and surrounded by parentheses, for example:

.. code-block:: python

   LOG.debug(_("Logging statement appears here"))

This formatting is used to support translation of logging messages into
different languages using the
`gettext <https://docs.python.org/2/library/gettext.html>`_
internationalization library. You don't need to do this for your own
custom log messages. However, if you want to contribute code that
includes logging statements back to the OpenStack project, you must
surround your log messages with underscores and parentheses.
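The ``_()`` wrapper is simply gettext's translation function. A minimal standalone illustration using only the standard library (no OpenStack code involved):

```python
import gettext

# Bind gettext's translation function to the conventional _() name.
# With no translation catalog loaded, _() returns its argument
# unchanged, which is why untranslated deployments still log in English.
_ = gettext.NullTranslations().gettext

message = _("Logging statement appears here")
print(message)  # → Logging statement appears here
```

When a compiled message catalog for the operator's locale is installed, the same ``_()`` call returns the translated string instead.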

RabbitMQ Web Management Interface or rabbitmqctl
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Aside from connection failures, RabbitMQ log files are generally not
useful for debugging OpenStack-related issues. Instead, we recommend
using the RabbitMQ web management interface. Enable it on your cloud
controller:

.. code-block:: console

   # /usr/lib/rabbitmq/bin/rabbitmq-plugins enable rabbitmq_management

.. code-block:: console

   # service rabbitmq-server restart

The RabbitMQ web management interface is accessible on your cloud
controller at *http://localhost:55672*.

.. note::

   Ubuntu 12.04 installs RabbitMQ version 2.7.1, which uses port 55672.
   RabbitMQ versions 3.0 and above use port 15672 instead. You can
   check which version of RabbitMQ you have running on your local
   Ubuntu machine by doing:

   .. code-block:: console

      $ dpkg -s rabbitmq-server | grep "Version:"
      Version: 2.7.1-0ubuntu4

An alternative to enabling the RabbitMQ web management interface is to
use the ``rabbitmqctl`` commands. For example,
:command:`rabbitmqctl list_queues | grep cinder` displays any messages
left in the queue. If there are messages, it is a possible sign that
cinder services didn't connect properly to RabbitMQ and might have to be
restarted.

Items to monitor for RabbitMQ include the number of items in each of the
queues and the processing time statistics for the server.

Centrally Managing Logs
~~~~~~~~~~~~~~~~~~~~~~~

Because your cloud is most likely composed of many servers, you must
check the logs on each of those servers to properly piece an event
together. A better solution is to send the logs of all servers to a
central location so that they can all be accessed from the same place.

Ubuntu uses rsyslog as the default logging service. Since rsyslog is
natively able to send logs to a remote location, you don't have to
install anything extra to enable this feature; just modify the
configuration file. In doing this, consider running your logging over a
management network or using an encrypted VPN to avoid interception.

rsyslog Client Configuration
----------------------------

To begin, configure all OpenStack components to log to syslog in
addition to their standard log file location. Also configure each
component to log to a different syslog facility. This makes it easier to
split the logs into individual components on the central server:

``nova.conf``:

.. code-block:: ini

   use_syslog=True
   syslog_log_facility=LOG_LOCAL0

``glance-api.conf`` and ``glance-registry.conf``:

.. code-block:: ini

   use_syslog=True
   syslog_log_facility=LOG_LOCAL1

``cinder.conf``:

.. code-block:: ini

   use_syslog=True
   syslog_log_facility=LOG_LOCAL2

``keystone.conf``:

.. code-block:: ini

   use_syslog=True
   syslog_log_facility=LOG_LOCAL3

By default, Object Storage logs to syslog.

Next, create ``/etc/rsyslog.d/client.conf`` with the following line:

.. code-block:: ini

   *.* @192.168.1.10

This instructs rsyslog to send all logs to the IP listed. In this
example, the IP points to the cloud controller.
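If an application of your own should land in the same central stream, Python's standard ``SysLogHandler`` can emit records to a chosen syslog facility over UDP. This is a sketch: the server address and facility simply mirror the example configuration above, and the logger name is hypothetical:

```python
import logging
import logging.handlers

# Send log records over UDP to the central rsyslog server on port 514,
# tagged with facility local0 (the facility assigned to nova above).
handler = logging.handlers.SysLogHandler(
    address=("192.168.1.10", 514),
    facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
)
log = logging.getLogger("myapp")
log.addHandler(handler)
log.warning("shipped to the central log server")
```

Records sent this way are routed by the server-side templates exactly like the service logs, because routing is keyed on the facility.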

rsyslog Server Configuration
----------------------------

Designate a server as the central logging server. The best practice is
to choose a server that is solely dedicated to this purpose. Create a
file called ``/etc/rsyslog.d/server.conf`` with the following contents:

.. code-block:: ini

   # Enable UDP
   $ModLoad imudp
   # Listen on 192.168.1.10 only
   $UDPServerAddress 192.168.1.10
   # Port 514
   $UDPServerRun 514

   # Create logging templates for nova
   $template NovaFile,"/var/log/rsyslog/%HOSTNAME%/nova.log"
   $template NovaAll,"/var/log/rsyslog/nova.log"

   # Log everything else to syslog.log
   $template DynFile,"/var/log/rsyslog/%HOSTNAME%/syslog.log"
   *.* ?DynFile

   # Log various openstack components to their own individual file
   local0.* ?NovaFile
   local0.* ?NovaAll
   & ~

This example configuration handles the nova service only. It first
configures rsyslog to act as a server that listens on port 514. Next, it
creates a series of logging templates. Logging templates control where
received logs are stored. Using the example above, a nova log from
c01.example.com goes to the following locations:

- ``/var/log/rsyslog/c01.example.com/nova.log``

- ``/var/log/rsyslog/nova.log``

This is useful, as logs from c02.example.com go to:

- ``/var/log/rsyslog/c02.example.com/nova.log``

- ``/var/log/rsyslog/nova.log``

You have an individual log file for each compute node as well as an
aggregated log that contains nova logs from all nodes.

Monitoring
~~~~~~~~~~

There are two types of monitoring: watching for problems and watching
usage trends. The former ensures that all services are up and running,
creating a functional cloud. The latter involves monitoring resource
usage over time in order to make informed decisions about potential
bottlenecks and upgrades.

**Nagios** is an open source monitoring service. It is capable of
executing arbitrary commands to check the status of server and network
services, remotely executing arbitrary commands directly on servers, and
allowing servers to push notifications back in the form of passive
monitoring. Nagios has been around since 1999. Although newer monitoring
services are available, Nagios is a tried-and-true systems
administration staple.
|
||||||
|
|
||||||
|
Process Monitoring
------------------

A basic type of alert monitoring is to simply check and see whether a
required process is running. For example, ensure that the
``nova-api`` service is running on the cloud controller:

.. code-block:: console

   # ps aux | grep nova-api
   nova 12786 0.0 0.0 37952 1312 ?   Ss Feb11 0:00 su -s /bin/sh -c exec nova-api
   --config-file=/etc/nova/nova.conf nova
   nova 12787 0.0 0.1 135764 57400 ? S  Feb11 0:01 /usr/bin/python
   /usr/bin/nova-api --config-file=/etc/nova/nova.conf
   nova 12792 0.0 0.0 96052 22856 ?  S  Feb11 0:01 /usr/bin/python
   /usr/bin/nova-api --config-file=/etc/nova/nova.conf
   nova 12793 0.0 0.3 290688 115516 ? S Feb11 1:23 /usr/bin/python
   /usr/bin/nova-api --config-file=/etc/nova/nova.conf
   nova 12794 0.0 0.2 248636 77068 ? S  Feb11 0:04 /usr/bin/python
   /usr/bin/nova-api --config-file=/etc/nova/nova.conf
   root 24121 0.0 0.0 11688 912 pts/5 S+ 13:07 0:00 grep nova-api

You can create automated alerts for critical processes by using Nagios
and NRPE. For example, to ensure that the ``nova-compute`` process is
running on compute nodes, create an alert on your Nagios server that
looks like this:

.. code-block:: none

   define service {
       host_name c01.example.com
       check_command check_nrpe_1arg!check_nova-compute
       use generic-service
       notification_period 24x7
       contact_groups sysadmins
       service_description nova-compute
   }

Then on the actual compute node, create the following NRPE
configuration:

.. code-block:: none

   command[check_nova-compute]=/usr/lib/nagios/plugins/check_procs -c 1: \
   -a nova-compute

Nagios checks that at least one ``nova-compute`` service is running at
all times.
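
The same at-least-one-process check can also be prototyped in plain
shell without the Nagios plugin. The following is a minimal sketch: the
``check_proc`` helper name and the ``/proc`` scan are our own (Linux-only)
assumptions, not part of NRPE, but the exit codes follow the Nagios
convention (0 = OK, 2 = CRITICAL):

.. code-block:: bash

   #!/bin/sh
   # Sketch of a process check with Nagios-style exit codes. The
   # hypothetical check_proc helper scans /proc (Linux-only) for
   # processes whose command name exactly matches $1.
   check_proc() {
       name=$1
       count=0
       for comm in /proc/[0-9]*/comm; do
           [ -r "$comm" ] || continue
           read -r c < "$comm"
           [ "$c" = "$name" ] && count=$((count + 1))
       done
       if [ "$count" -ge 1 ]; then
           echo "OK: $count $name process(es) running"
           return 0
       fi
       echo "CRITICAL: $name is not running"
       return 2
   }

   check_proc nova-compute
   echo "exit code: $?"

Because NRPE interprets exit code 2 as CRITICAL, a script like this can
stand in for ``check_procs`` in the ``command[...]`` definition above.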

Resource Alerting
-----------------

Resource alerting provides notifications when one or more resources are
critically low. While the monitoring thresholds should be tuned to your
specific OpenStack environment, monitoring resource usage is not
specific to OpenStack at all—any generic type of alert will work
fine.

Some of the resources that you want to monitor include:

- Disk usage

- Server load

- Memory usage

- Network I/O

- Available vCPUs

For example, to monitor disk capacity on a compute node with Nagios, add
the following to your Nagios configuration:

.. code-block:: none

   define service {
       host_name c01.example.com
       check_command check_nrpe!check_all_disks!20% 10%
       use generic-service
       contact_groups sysadmins
       service_description Disk
   }

On the compute node, add the following to your NRPE configuration:

.. code-block:: none

   command[check_all_disks]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c \
   $ARG2$ -e

Nagios alerts you with a WARNING when any disk on the compute node is 80
percent full and with a CRITICAL when any disk is 90 percent full.
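
The same thresholds can be prototyped without Nagios. In this sketch,
the ``check_disk_usage`` helper and its argument layout are our own
assumptions; it parses POSIX ``df`` output and applies warning/critical
used-percentage thresholds:

.. code-block:: bash

   #!/bin/sh
   # Hypothetical shell equivalent of the check above: WARNING at 80%
   # used, CRITICAL at 90% used. Field 5 of the last line of
   # `df -P <path>` output is the "Use%" column.
   check_disk_usage() {
       path=$1 warn=$2 crit=$3
       set -- $(df -P "$path" | tail -n 1)
       used=${5%\%}
       if [ "$used" -ge "$crit" ]; then
           echo "CRITICAL: $path at ${used}% capacity"; return 2
       elif [ "$used" -ge "$warn" ]; then
           echo "WARNING: $path at ${used}% capacity"; return 1
       fi
       echo "OK: $path at ${used}% capacity"; return 0
   }

   check_disk_usage / 80 90
   echo "exit code: $?"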

StackTach
---------

StackTach is a tool that collects and reports the notifications sent by
``nova``. Notifications are essentially the same as logs but can be much
more detailed. Nearly all OpenStack components are capable of generating
notifications when significant events occur. Notifications are messages
placed on the OpenStack queue (generally RabbitMQ) for consumption by
downstream systems. An overview of notifications can be found at `System
Usage Data <https://wiki.openstack.org/wiki/SystemUsageData>`_.

To enable ``nova`` to send notifications, add the following to
``nova.conf``:

.. code-block:: ini

   notification_topics=monitor
   notification_driver=messagingv2

Once ``nova`` is sending notifications, install and configure StackTach.
StackTach workers for queue consumption and pipeline processing are
configured to read these notifications from RabbitMQ servers and store
them in a database. Users can query instances, requests, and servers
by using the browser interface or the command-line tool,
`Stacky <https://github.com/rackerlabs/stacky>`_. Since StackTach is
relatively new and constantly changing, installation instructions
quickly become outdated. Refer to the `StackTach Git
repo <https://git.openstack.org/cgit//openstack/stacktach>`_ for
instructions as well as a demo video. Additional details on the latest
developments can be found at the `official
page <http://stacktach.com/>`_.

Logstash
--------

Logstash is a high performance indexing and search engine for logs. Logs
from Jenkins test runs are sent to Logstash where they are indexed and
stored. Logstash facilitates reviewing logs from multiple sources in a
single test run, searching for errors or particular events within a test
run, and searching for log event trends across test runs.

There are four major layers in a Logstash setup:

- Log Pusher

- Log Indexer

- ElasticSearch

- Kibana

Each layer scales horizontally. As the number of logs grows, you can add
more log pushers, more Logstash indexers, and more ElasticSearch nodes.

Logpusher is a pair of Python scripts that first listen to Jenkins
build events and convert them into Gearman jobs. Gearman provides a
generic application framework to farm out work to other machines or
processes that are better suited to do the work. It allows you to do
work in parallel, to load balance processing, and to call functions
between languages. Logpusher then performs Gearman jobs to push log
files into Logstash. The Logstash indexer reads these log events,
filters them to remove unwanted lines, collapses multiple events
together, and parses useful information before shipping them to
ElasticSearch for storage and indexing. Kibana is a Logstash-oriented
web client for ElasticSearch.

OpenStack Telemetry
-------------------

An integrated OpenStack project (code-named :term:`ceilometer`) collects
metering and event data relating to OpenStack services. Data collected
by the Telemetry service could be used for billing. Depending on the
deployment configuration, collected data may also be accessible to
users. The Telemetry service provides a REST API documented at
http://developer.openstack.org/api-ref-telemetry-v2.html. You can read
more about the module in the `OpenStack Administrator
Guide <http://docs.openstack.org/admin-guide/telemetry.html>`_ or
in the `developer
documentation <http://docs.openstack.org/developer/ceilometer>`_.

OpenStack-Specific Resources
----------------------------

Resources such as memory, disk, and CPU are generic resources that all
servers (even non-OpenStack servers) have and are important to the
overall health of the server. When dealing with OpenStack specifically,
these resources are important for a second reason: ensuring that enough
are available to launch instances. There are a few ways you can see
OpenStack resource usage. The first is through the :command:`nova` command:

.. code-block:: console

   # nova usage-list

This command displays a list of how many instances a tenant has running
and some light usage statistics about the combined instances. This
command is useful for a quick overview of your cloud, but it doesn't
really get into a lot of details.

Next, the ``nova`` database contains three tables that store usage
information.

The ``nova.quotas`` and ``nova.quota_usages`` tables store quota
information. If a tenant's quota is different from the default quota
settings, its quota is stored in the ``nova.quotas`` table. For example:

.. code-block:: mysql

   mysql> select project_id, resource, hard_limit from quotas;
   +----------------------------------+-----------------------------+------------+
   | project_id                       | resource                    | hard_limit |
   +----------------------------------+-----------------------------+------------+
   | 628df59f091142399e0689a2696f5baa | metadata_items              |        128 |
   | 628df59f091142399e0689a2696f5baa | injected_file_content_bytes |      10240 |
   | 628df59f091142399e0689a2696f5baa | injected_files              |          5 |
   | 628df59f091142399e0689a2696f5baa | gigabytes                   |       1000 |
   | 628df59f091142399e0689a2696f5baa | ram                         |      51200 |
   | 628df59f091142399e0689a2696f5baa | floating_ips                |         10 |
   | 628df59f091142399e0689a2696f5baa | instances                   |         10 |
   | 628df59f091142399e0689a2696f5baa | volumes                     |         10 |
   | 628df59f091142399e0689a2696f5baa | cores                       |         20 |
   +----------------------------------+-----------------------------+------------+

The ``nova.quota_usages`` table keeps track of how many resources the
tenant currently has in use:

.. code-block:: mysql

   mysql> select project_id, resource, in_use from quota_usages where project_id like '628%';
   +----------------------------------+--------------+--------+
   | project_id                       | resource     | in_use |
   +----------------------------------+--------------+--------+
   | 628df59f091142399e0689a2696f5baa | instances    |      1 |
   | 628df59f091142399e0689a2696f5baa | ram          |    512 |
   | 628df59f091142399e0689a2696f5baa | cores        |      1 |
   | 628df59f091142399e0689a2696f5baa | floating_ips |      1 |
   | 628df59f091142399e0689a2696f5baa | volumes      |      2 |
   | 628df59f091142399e0689a2696f5baa | gigabytes    |     12 |
   | 628df59f091142399e0689a2696f5baa | images       |      1 |
   +----------------------------------+--------------+--------+

By comparing a tenant's hard limit with their current resource usage,
you can see their usage percentage. For example, if this tenant is using
1 floating IP out of 10, then they are using 10 percent of their
floating IP quota. Rather than doing the calculation manually, you can
use SQL or the scripting language of your choice and create a formatted
report:

.. code-block:: mysql

   +-----------------------------------+------------+------------+---------------+
   | some_tenant                                                                 |
   +-----------------------------------+------------+------------+---------------+
   | Resource                          | Used       | Limit      |               |
   +-----------------------------------+------------+------------+---------------+
   | cores                             | 1          | 20         | 5 %           |
   | floating_ips                      | 1          | 10         | 10 %          |
   | gigabytes                         | 12         | 1000       | 1 %           |
   | images                            | 1          | 4          | 25 %          |
   | injected_file_content_bytes       | 0          | 10240      | 0 %           |
   | injected_file_path_bytes          | 0          | 255        | 0 %           |
   | injected_files                    | 0          | 5          | 0 %           |
   | instances                         | 1          | 10         | 10 %          |
   | key_pairs                         | 0          | 100        | 0 %           |
   | metadata_items                    | 0          | 128        | 0 %           |
   | ram                               | 512        | 51200      | 1 %           |
   | reservation_expire                | 0          | 86400      | 0 %           |
   | security_group_rules              | 0          | 20         | 0 %           |
   | security_groups                   | 0          | 10         | 0 %           |
   | volumes                           | 2          | 10         | 20 %          |
   +-----------------------------------+------------+------------+---------------+

The preceding information was generated by using a custom script that
can be found on
`GitHub <https://github.com/cybera/novac/blob/dev/libexec/novac-quota-report>`_.

.. note::

   This script is specific to a certain OpenStack installation and must
   be modified to fit your environment. However, the logic should
   easily be transferable.
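
The per-tenant percentage can also be computed directly in SQL by
joining the two tables described above. This is a sketch only; note
that ``nova.quotas`` holds only overridden quotas, so tenants running
on default limits will not appear in the join:

.. code-block:: mysql

   SELECT q.project_id,
          q.resource,
          u.in_use,
          q.hard_limit,
          ROUND(100 * u.in_use / q.hard_limit) AS percent_used
     FROM quotas q
     JOIN quota_usages u
       ON u.project_id = q.project_id
      AND u.resource = q.resource
    WHERE q.hard_limit > 0;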

Intelligent Alerting
--------------------

Intelligent alerting can be thought of as a form of continuous
integration for operations. For example, you can easily check to see
whether the Image service is up and running by ensuring that
the ``glance-api`` and ``glance-registry`` processes are running or by
seeing whether ``glance-api`` is responding on port 9292.

But how can you tell whether images are being successfully uploaded to
the Image service? Maybe the disk that the Image service is storing the
images on is full or the S3 back end is down. You could naturally check
this by doing a quick image upload:

.. code-block:: bash

   #!/bin/bash
   #
   # assumes that reasonable credentials have been stored at
   # /root/openrc

   . /root/openrc
   wget http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-disk.img
   glance image-create --name='cirros image' --is-public=true \
   --container-format=bare --disk-format=qcow2 < cirros-0.3.4-x86_64-disk.img

By taking this script and rolling it into an alert for your monitoring
system (such as Nagios), you now have an automated way of ensuring that
image uploads to the Image Catalog are working.

.. note::

   You must remove the image after each test. Even better, test whether
   you can successfully delete an image from the Image service.

Intelligent alerting takes considerably more time to plan and implement
than the other alerts described in this chapter. A good outline to
implement intelligent alerting is:

- Review common actions in your cloud.

- Create ways to automatically test these actions.

- Roll these tests into an alerting system.

Some other examples for intelligent alerting include:

- Can instances launch and be destroyed?

- Can users be created?

- Can objects be stored and deleted?

- Can volumes be created and destroyed?

Trending
--------

Trending can give you great insight into how your cloud is performing
day to day. You can learn, for example, if a busy day was simply a rare
occurrence or if you should start adding new compute nodes.

Trending takes a slightly different approach than alerting. While
alerting is interested in a binary result (whether a check succeeds or
fails), trending records the current state of something at a certain
point in time. Once enough points in time have been recorded, you can
see how the value has changed over time.

All of the alert types mentioned earlier can also be used for trend
reporting. Some other trend examples include:

- The number of instances on each compute node

- The types of flavors in use

- The number of volumes in use

- The number of Object Storage requests each hour

- The number of ``nova-api`` requests each hour

- The I/O statistics of your storage services

As an example, recording ``nova-api`` usage can allow you to track the
need to scale your cloud controller. By keeping an eye on ``nova-api``
requests, you can determine whether you need to spawn more ``nova-api``
processes or go as far as introducing an entirely new server to run
``nova-api``. To get an approximate count of the requests, look for
standard INFO messages in ``/var/log/nova/nova-api.log``:

.. code-block:: console

   # grep INFO /var/log/nova/nova-api.log | wc

You can obtain further statistics by looking for the number of
successful requests:

.. code-block:: console

   # grep " 200 " /var/log/nova/nova-api.log | wc

By running this command periodically and keeping a record of the result,
you can create a trending report over time that shows whether your
``nova-api`` usage is increasing, decreasing, or keeping steady.
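
A minimal way to keep that record is a small cron-driven script. In the
following sketch, the ``record_api_counts`` helper and the CSV output
path are our own assumptions:

.. code-block:: bash

   #!/bin/sh
   # Hypothetical hourly cron job: append a timestamped sample of the
   # total INFO lines and the successful (HTTP 200) requests to a CSV
   # file for later trending.
   record_api_counts() {
       log=$1 out=$2
       [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
       total=$(grep -c 'INFO' "$log")
       ok=$(grep -c ' 200 ' "$log")
       echo "$(date -u +%Y-%m-%dT%H:%MZ),$total,$ok" >> "$out"
   }

   # Example invocation (paths are assumptions):
   record_api_counts /var/log/nova/nova-api.log /var/lib/nova-api-trend.csv || true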

A tool such as **collectd** can be used to store this information. While
collectd is out of the scope of this book, a good starting point would
be to use collectd to store the result as a COUNTER data type. More
information can be found in `collectd's
documentation <https://collectd.org/wiki/index.php/Data_source>`_.

Summary
~~~~~~~

For stable operations, you want to detect failure promptly and determine
causes efficiently. With a distributed system, it's even more important
to track the right items to meet a service-level target. Learning where
these logs are located in the file system or API gives you an advantage.
This chapter also showed how to read, interpret, and manipulate
information from OpenStack services so that you can monitor effectively.

===========================
Managing Projects and Users
===========================

An OpenStack cloud does not have much value without users. This chapter
covers topics that relate to managing users, projects, and quotas. It
describes users and projects as defined by version 2 of the OpenStack
Identity API.

.. warning::

   While version 3 of the Identity API is available, the client tools
   do not yet implement those calls, and most OpenStack clouds are
   still implementing Identity API v2.0.

Projects or Tenants?
~~~~~~~~~~~~~~~~~~~~

In OpenStack user interfaces and documentation, a group of users is
referred to as a :term:`project` or :term:`tenant`.
These terms are interchangeable.

The initial implementation of OpenStack Compute had its own
authentication system and used the term ``project``. When authentication
moved into the OpenStack Identity (keystone) project, it used the term
``tenant`` to refer to a group of users. Because of this legacy, some of
the OpenStack tools refer to projects and some refer to tenants.

.. note::

   This guide uses the term ``project``, unless an example shows
   interaction with a tool that uses the term ``tenant``.

Managing Projects
~~~~~~~~~~~~~~~~~

Users must be associated with at least one project, though they may
belong to many. Therefore, you should add at least one project before
adding users.

Adding Projects
---------------

To create a project through the OpenStack dashboard:

#. Log in as an administrative user.

#. Select the :guilabel:`Identity` tab in the left navigation bar.

#. Under the Identity tab, click :guilabel:`Projects`.

#. Click the :guilabel:`Create Project` button.

You are prompted for a project name and an optional, but recommended,
description. Select the checkbox at the bottom of the form to enable
this project. By default, it is enabled, as shown in
:ref:`figure_create_project`.

.. _figure_create_project:

.. figure:: figures/osog_0901.png
   :alt: Dashboard's Create Project form

   Dashboard's Create Project form

It is also possible to add project members and adjust the project
quotas. We'll discuss those actions later, but in practice, it can be
quite convenient to deal with all these operations at one time.

To add a project through the command line, you must use the OpenStack
command-line client.

.. code-block:: console

   # openstack project create demo

This command creates a project named "demo". Optionally, you can add a
description string by appending :option:`--description tenant-description`,
which can be very useful. You can also
create a project in a disabled state by appending :option:`--disable` to the
command. By default, projects are created in an enabled state.
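
For example, the following (with a hypothetical description string)
creates a disabled project with a description in one step:

.. code-block:: console

   # openstack project create --description "Test project for demos" \
     --disable demo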

Quotas
~~~~~~

To prevent system capacities from being exhausted without notification,
you can set up :term:`quotas <quota>`. Quotas are operational limits. For example,
the number of gigabytes allowed per tenant can be controlled to ensure that
a single tenant cannot consume all of the disk space. Quotas are
currently enforced at the tenant (or project) level, rather than the
user level.

.. warning::

   Without sensible quotas, a single tenant could use up all the
   available resources, so default quotas are shipped with OpenStack.
   You should pay attention to which quota settings make sense for your
   hardware capabilities.

Using the command-line interface, you can manage quotas for the
OpenStack Compute service and the Block Storage service.

Typically, default values are changed because a tenant requires more
than the OpenStack default of 10 volumes per tenant, or more than the
OpenStack default of 1 TB of disk space on a compute node.

.. note::

   To view all tenants, run:

   .. code-block:: console

      $ openstack project list
      +---------------------------------+----------+
      | ID                              | Name     |
      +---------------------------------+----------+
      | a981642d22c94e159a4a6540f70f9f8 | admin    |
      | 934b662357674c7b9f5e4ec6ded4d0e | tenant01 |
      | 7bc1dbfd7d284ec4a856ea1eb82dca8 | tenant02 |
      | 9c554aaef7804ba49e1b21cbd97d218 | services |
      +---------------------------------+----------+

Set Image Quotas
----------------

You can restrict a project's image storage by total number of bytes.
Currently, this quota is applied cloud-wide, so if you were to set an
Image quota limit of 5 GB, then every project in your cloud will only be
able to store 5 GB of images and snapshots.

To enable this feature, edit the ``/etc/glance/glance-api.conf`` file,
and under the ``[DEFAULT]`` section, add:

.. code-block:: ini

   user_storage_quota = <bytes>

For example, to restrict a project's image storage to 5 GB, do this:

.. code-block:: ini

   user_storage_quota = 5368709120
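
The byte count above is just 5 GiB expanded; shell arithmetic can be
used to double-check such values before editing the configuration:

.. code-block:: bash

   # 5 GiB expressed in bytes, matching the user_storage_quota value above
   echo $((5 * 1024 * 1024 * 1024))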
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
|
There is a configuration option in ``glance-api.conf`` that limits
|
||||||
|
the number of members allowed per image, called
|
||||||
|
``image_member_quota``, set to 128 by default. That setting is a
|
||||||
|
different quota from the storage quota.
|
||||||
|
|
||||||
|
Set Compute Service Quotas
|
||||||
|
--------------------------
|
||||||
|
|
||||||
|
As an administrative user, you can update the Compute service quotas for
|
||||||
|
an existing tenant, as well as update the quota defaults for a new
|
||||||
|
tenant.Compute Compute service See :ref:`table_compute_quota`.
|
||||||
|
|
||||||
|
.. _table_compute_quota:
|
||||||
|
|
||||||
|
.. list-table:: Compute quota descriptions
|
||||||
|
:widths: 30 40 30
|
||||||
|
:header-rows: 1
|
||||||
|
|
||||||
|
* - Quota
|
||||||
|
- Description
|
||||||
|
- Property name
|
||||||
|
* - Fixed IPs
|
||||||
|
- Number of fixed IP addresses allowed per tenant.
|
||||||
|
This number must be equal to or greater than the number
|
||||||
|
of allowed instances.
|
||||||
|
- fixed-ips
|
||||||
|
* - Floating IPs
|
||||||
|
- Number of floating IP addresses allowed per tenant.
|
||||||
|
- floating-ips
|
||||||
|
* - Injected file content bytes
|
||||||
|
- Number of content bytes allowed per injected file.
|
||||||
|
- injected-file-content-bytes
|
||||||
|
* - Injected file path bytes
|
||||||
|
- Number of bytes allowed per injected file path.
|
||||||
|
- injected-file-path-bytes
|
||||||
|
* - Injected files
|
||||||
|
- Number of injected files allowed per tenant.
|
||||||
|
- injected-files
|
||||||
|
* - Instances
|
||||||
|
- Number of instances allowed per tenant.
|
||||||
|
- instances
|
||||||
|
* - Key pairs
|
||||||
|
- Number of key pairs allowed per user.
|
||||||
|
- key-pairs
|
||||||
|
* - Metadata items
|
||||||
|
- Number of metadata items allowed per instance.
|
||||||
|
- metadata-items
|
||||||
|
* - RAM
|
||||||
|
- Megabytes of instance RAM allowed per tenant.
|
||||||
|
- ram
|
||||||
|
* - Security group rules
|
||||||
|
- Number of rules per security group.
|
||||||
|
- security-group-rules
|
||||||
|
* - Security groups
|
||||||
|
- Number of security groups per tenant.
|
||||||
|
- security-groups
|
||||||
|
* - VCPUs
|
||||||
|
- Number of instance cores allowed per tenant.
|
||||||
|
- cores
|
||||||
|
|
||||||
|
View and update compute quotas for a tenant (project)
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
As an administrative user, you can use the :command:`nova quota-*`
|
||||||
|
commands, which are provided by the
|
||||||
|
``python-novaclient`` package, to view and update tenant quotas.
|
||||||
|
|
||||||
|
**To view and update default quota values**
|
||||||
|
|
||||||
|
#. List all default quotas for all tenants, as follows:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
$ nova quota-defaults
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
$ nova quota-defaults
|
||||||
|
+-----------------------------+-------+
|
||||||
|
| Property | Value |
|
||||||
|
+-----------------------------+-------+
|
||||||
|
| metadata_items | 128 |
|
||||||
|
| injected_file_content_bytes | 10240 |
|
||||||
|
| ram | 51200 |
|
||||||
|
| floating_ips | 10 |
|
||||||
|
| key_pairs | 100 |
|
||||||
|
| instances | 10 |
|
||||||
|
| security_group_rules | 20 |
|
||||||
|
| injected_files | 5 |
|
||||||
|
| cores | 20 |
|
||||||
|
| fixed_ips | -1 |
|
||||||
|
| injected_file_path_bytes | 255 |
|
||||||
|
| security_groups | 10 |
|
||||||
|
+-----------------------------+-------+
|
||||||
|
|
||||||
|
#. Update a default value for a new tenant, as follows:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
$ nova quota-class-update default key value
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
$ nova quota-class-update default --instances 15
|
||||||
|
|
||||||
|
**To view quota values for a tenant (project)**
|
||||||
|
|
||||||
|
#. Place the tenant ID in a variable:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
$ tenant=$(openstack project list | awk '/tenantName/ {print $2}')
|
||||||
|
|
||||||
|
#. List the currently set quota values for a tenant, as follows:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
$ nova quota-show --tenant $tenant
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
$ nova quota-show --tenant $tenant
|
||||||
|
+-----------------------------+-------+
|
||||||
|
| Property | Value |
|
||||||
|
+-----------------------------+-------+
|
||||||
|
| metadata_items | 128 |
|
||||||
|
| injected_file_content_bytes | 10240 |
|
||||||
|
| ram | 51200 |
|
||||||
|
| floating_ips | 12 |
|
||||||
|
| key_pairs | 100 |
|
||||||
|
| instances | 10 |
|
||||||
|
| security_group_rules | 20 |
|
||||||
|
| injected_files | 5 |
|
||||||
|
| cores | 20 |
|
||||||
|
| fixed_ips | -1 |
|
||||||
|
| injected_file_path_bytes | 255 |
|
||||||
|
| security_groups | 10 |
|
||||||
|
+-----------------------------+-------+
|
||||||
|
|
||||||
|
**To update quota values for a tenant (project)**
|
||||||
|
|
||||||
|
#. Obtain the tenant ID, as follows:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
$ tenant=$(openstack project list | awk '/tenantName/ {print $2}')
|
||||||
|
|
||||||
|
#. Update a particular quota value, as follows:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
# nova quota-update --quotaName quotaValue tenantID
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
# nova quota-update --floating-ips 20 $tenant
|
||||||
|
# nova quota-show --tenant $tenant
|
||||||
|
+-----------------------------+-------+
|
||||||
|
| Property | Value |
|
||||||
|
+-----------------------------+-------+
|
||||||
|
| metadata_items | 128 |
|
||||||
|
| injected_file_content_bytes | 10240 |
|
||||||
|
| ram | 51200 |
|
||||||
|
| floating_ips | 20 |
|
||||||
|
| key_pairs | 100 |
|
||||||
|
| instances | 10 |
|
||||||
|
| security_group_rules | 20 |
|
||||||
|
| injected_files | 5 |
|
||||||
|
| cores | 20 |
|
||||||
|
| fixed_ips | -1 |
|
||||||
|
| injected_file_path_bytes | 255 |
|
||||||
|
| security_groups | 10 |
|
||||||
|
+-----------------------------+-------+
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
|
To view a list of options for the ``quota-update`` command, run:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
$ nova help quota-update
|
||||||
|
|
||||||
|
Set Object Storage Quotas
-------------------------

There are currently two categories of quotas for Object Storage:

Container quotas
   Limit the total size (in bytes) or number of objects that can be
   stored in a single container.

Account quotas
   Limit the total size (in bytes) that a user has available in the
   Object Storage service.

To take advantage of either container quotas or account quotas, your
Object Storage proxy server must have ``container_quotas`` or
``account_quotas`` (or both) added to the ``[pipeline:main]`` pipeline.
Each quota type also requires its own section in the
``proxy-server.conf`` file:

.. code-block:: ini

   [pipeline:main]
   pipeline = catch_errors [...] slo dlo account_quotas proxy-server

   [filter:account_quotas]
   use = egg:swift#account_quotas

   [filter:container_quotas]
   use = egg:swift#container_quotas

To view and update Object Storage quotas, use the :command:`swift` command
provided by the ``python-swiftclient`` package. Any user included in the
project can view the quotas placed on their project. To update Object
Storage quotas on a project, you must have the role of ResellerAdmin in
the project that the quota is being applied to.

To view account quotas placed on a project:

.. code-block:: console

   $ swift stat
   Account: AUTH_b36ed2d326034beba0a9dd1fb19b70f9
   Containers: 0
   Objects: 0
   Bytes: 0
   Meta Quota-Bytes: 214748364800
   X-Timestamp: 1351050521.29419
   Content-Type: text/plain; charset=utf-8
   Accept-Ranges: bytes

To apply or update account quotas on a project:

.. code-block:: console

   $ swift post -m quota-bytes:<bytes>

For example, to place a 5 GB quota on an account:

.. code-block:: console

   $ swift post -m quota-bytes:5368709120

To verify the quota, run the :command:`swift stat` command again:

.. code-block:: console

   $ swift stat
   Account: AUTH_b36ed2d326034beba0a9dd1fb19b70f9
   Containers: 0
   Objects: 0
   Bytes: 0
   Meta Quota-Bytes: 5368709120
   X-Timestamp: 1351541410.38328
   Content-Type: text/plain; charset=utf-8
   Accept-Ranges: bytes

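Container quotas are applied in the same way, by posting quota metadata
to a specific container rather than to the account. A minimal sketch,
assuming a container named ``mycontainer`` (the name and the limits here
are only illustrative):

.. code-block:: console

   $ swift post -m quota-bytes:10737418240 mycontainer
   $ swift post -m quota-count:1000 mycontainer

The first command caps the container at 10 GiB of data; the second caps
it at 1,000 objects.
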
Set Block Storage Quotas
------------------------

As an administrative user, you can update the Block Storage service
quotas for a tenant, as well as update the quota defaults for a new
tenant. See :ref:`table_block_storage_quota`.

.. _table_block_storage_quota:

.. list-table:: Table: Block Storage quota descriptions
   :widths: 50 50
   :header-rows: 1

   * - Property name
     - Description
   * - gigabytes
     - Number of volume gigabytes allowed per tenant
   * - snapshots
     - Number of Block Storage snapshots allowed per tenant
   * - volumes
     - Number of Block Storage volumes allowed per tenant

View and update Block Storage quotas for a tenant (project)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As an administrative user, you can use the :command:`cinder quota-*`
commands, which are provided by the ``python-cinderclient`` package, to
view and update tenant quotas.

**To view and update default Block Storage quota values**

#. List all default quotas for all tenants, as follows:

   .. code-block:: console

      $ cinder quota-defaults

   For example:

   .. code-block:: console

      $ cinder quota-defaults
      +-----------+-------+
      | Property  | Value |
      +-----------+-------+
      | gigabytes | 1000  |
      | snapshots | 10    |
      | volumes   | 10    |
      +-----------+-------+

#. To update a default value for a new tenant, update the property in the
   ``/etc/cinder/cinder.conf`` file.

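The default limits correspond to options in the ``[DEFAULT]`` section of
``cinder.conf``. A sketch of what such an override might look like; the
values shown are examples only, not recommendations:

.. code-block:: ini

   [DEFAULT]
   quota_volumes = 20
   quota_snapshots = 20
   quota_gigabytes = 2000
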
**To view Block Storage quotas for a tenant (project)**

#. View quotas for the tenant, as follows:

   .. code-block:: console

      # cinder quota-show tenantName

   For example:

   .. code-block:: console

      # cinder quota-show tenant01
      +-----------+-------+
      | Property  | Value |
      +-----------+-------+
      | gigabytes | 1000  |
      | snapshots | 10    |
      | volumes   | 10    |
      +-----------+-------+

**To update Block Storage quotas for a tenant (project)**

#. Place the tenant ID in a variable:

   .. code-block:: console

      $ tenant=$(openstack project list | awk '/tenantName/ {print $2}')

#. Update a particular quota value, as follows:

   .. code-block:: console

      # cinder quota-update --quotaName NewValue tenantID

   For example:

   .. code-block:: console

      # cinder quota-update --volumes 15 $tenant
      # cinder quota-show tenant01
      +-----------+-------+
      | Property  | Value |
      +-----------+-------+
      | gigabytes | 1000  |
      | snapshots | 10    |
      | volumes   | 15    |
      +-----------+-------+

User Management
~~~~~~~~~~~~~~~

The command-line tools for managing users are inconvenient to use
directly. They require issuing multiple commands to complete a single
task, and they use UUIDs rather than symbolic names for many items. In
practice, humans typically do not use these tools directly. Fortunately,
the OpenStack dashboard provides a reasonable interface to this. In
addition, many sites write custom tools for local needs to enforce local
policies and provide levels of self-service to users that aren't
currently available with packaged tools.

Creating New Users
~~~~~~~~~~~~~~~~~~

To create a user, you need the following information:

* Username
* Email address
* Password
* Primary project
* Role
* Enabled

Username and email address are self-explanatory, though your site may
have local conventions you should observe. The primary project is simply
the first project the user is associated with and must exist prior to
creating the user. Role is almost always going to be "member." Out of
the box, OpenStack comes with two roles defined:

member
   A typical user

admin
   An administrative super user, which has full permissions across all
   projects and should be used with great care

It is possible to define other roles, but doing so is uncommon.

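If you prefer the command line, the same information maps onto a single
``openstack user create`` invocation. A hedged sketch (the user name,
email address, and project below are placeholders):

.. code-block:: console

   $ openstack user create --email alice@example.com \
     --project tenant01 --password-prompt alice
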
Once you've gathered this information, creating the user in the
dashboard is just another web form similar to what we've seen before and
can be found by clicking the Users link in the Identity navigation bar
and then clicking the Create User button at the top right.

Modifying users is also done from this Users page. If you have a large
number of users, this page can get quite crowded. The Filter search box
at the top of the page can be used to limit the users listing. A form
very similar to the user creation dialog can be pulled up by selecting
Edit from the actions dropdown menu at the end of the line for the user
you are modifying.

Associating Users with Projects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Many sites run with users being associated with only one project. This
is a more conservative and simpler choice both for administration and
for users. Administratively, if a user reports a problem with an
instance or quota, it is obvious which project this relates to. Users
needn't worry about what project they are acting in if they are only in
one project. However, note that, by default, any user can affect the
resources of any other user within their project. It is also possible to
associate users with multiple projects if that makes sense for your
organization.

Associating existing users with an additional project or removing them
from an older project is done from the Projects page of the dashboard by
selecting Modify Users from the Actions column, as shown in
:ref:`figure_edit_project_members`.

From this view, you can do a number of useful things, as well as a few
dangerous ones.

The first column of this form, named All Users, includes a list of all
the users in your cloud who are not already associated with this
project. The second column shows all the users who are. These lists can
be quite long, but they can be limited by typing a substring of the
username you are looking for in the filter field at the top of the
column.

From here, click the :guilabel:`+` icon to add users to the project.
Click the :guilabel:`-` to remove them.

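The same membership changes can be made from the command line by
granting or revoking a role on the project. A sketch with placeholder
names; note that on some releases the default role is named
``_member_`` rather than ``member``:

.. code-block:: console

   $ openstack role add --user alice --project tenant01 member
   $ openstack role remove --user alice --project tenant01 member
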
.. _figure_edit_project_members:

.. figure:: figures/osog_0902.png
   :alt: Edit Project Members tab

   Edit Project Members tab

The dangerous possibility comes with the ability to change member roles.
This is the dropdown list below the username in the
:guilabel:`Project Members` list. In virtually all cases,
this value should be set to Member. This example purposefully shows
an administrative user where this value is admin.

.. warning::

   The admin is global, not per project, so granting a user the admin
   role in any project gives the user administrative rights across the
   whole cloud.

Typical use is to only create administrative users in a single project,
by convention the admin project, which is created by default during
cloud setup. If your administrative users also use the cloud to launch
and manage instances, it is strongly recommended that you use separate
user accounts for administrative access and normal operations and that
they be in distinct projects.

Customizing Authorization
-------------------------

The default :term:`authorization` settings allow administrative users
only to create resources on behalf of a different project.
OpenStack handles two kinds of authorization policies:

Operation based
   Policies specify access criteria for specific operations, possibly
   with fine-grained control over specific attributes.

Resource based
   Whether access to a specific resource might be granted or not
   according to the permissions configured for the resource (currently
   available only for the network resource). The actual authorization
   policies enforced in an OpenStack service vary from deployment to
   deployment.

The policy engine reads entries from the ``policy.json`` file. The
actual location of this file might vary from distribution to
distribution: for nova, it is typically in ``/etc/nova/policy.json``.
You can update entries while the system is running, and you do not have
to restart services. Currently, the only way to update such policies is
to edit the policy file.

The OpenStack service's policy engine matches a policy directly. A rule
indicates evaluation of the elements of such policies. For instance, in
a ``compute:create: [["rule:admin_or_owner"]]`` statement, the policy is
``compute:create``, and the rule is ``admin_or_owner``.

Policies are triggered by an OpenStack policy engine whenever one of
them matches an OpenStack API operation or a specific attribute being
used in a given operation. For instance, the engine tests the
``compute:create`` policy every time a user sends a
``POST /v2/{tenant_id}/servers`` request to the OpenStack Compute API
server. Policies can also be related to specific :term:`API extensions
<API extension>`. For instance, if a user needs an extension like
``compute_extension:rescue``, the attributes defined by the provider
extensions trigger the rule test for that operation.

An authorization policy can be composed of one or more rules. If more
rules are specified, the policy evaluates successfully if any of the
rules evaluates successfully; if an API operation matches multiple
policies, then all the policies must evaluate successfully. Also,
authorization rules are recursive. Once a rule is matched, the rule(s)
can be resolved to another rule, until a terminal rule is reached. These
are the rules defined:

Role-based rules
   Evaluate successfully if the user submitting the request has the
   specified role. For instance, ``"role:admin"`` is successful if the
   user submitting the request is an administrator.

Field-based rules
   Evaluate successfully if a field of the resource specified in the
   current request matches a specific value. For instance,
   ``"field:networks:shared=True"`` is successful if the ``shared``
   attribute of the network resource is set to ``true``.

Generic rules
   Compare an attribute in the resource with an attribute extracted
   from the user's security credentials and evaluate successfully if
   the comparison is successful. For instance,
   ``"tenant_id:%(tenant_id)s"`` is successful if the tenant identifier
   in the resource is equal to the tenant identifier of the user
   submitting the request.

Here are snippets of the default nova ``policy.json`` file:

.. code-block:: json

   {
       "context_is_admin": [["role:admin"]],
       "admin_or_owner": [["is_admin:True"], ["project_id:%(project_id)s"]],
       "default": [["rule:admin_or_owner"]],
       "compute:create": [ ],
       "compute:create:attach_network": [ ],
       "compute:create:attach_volume": [ ],
       "compute:get_all": [ ],
       "admin_api": [["is_admin:True"]],
       "compute_extension:accounts": [["rule:admin_api"]],
       "compute_extension:admin_actions": [["rule:admin_api"]],
       "compute_extension:admin_actions:pause": [["rule:admin_or_owner"]],
       "compute_extension:admin_actions:unpause": [["rule:admin_or_owner"]],
       ...
       "compute_extension:admin_actions:migrate": [["rule:admin_api"]],
       "compute_extension:aggregates": [["rule:admin_api"]],
       "compute_extension:certificates": [ ],
       ...
       "compute_extension:flavorextraspecs": [ ],
       "compute_extension:flavormanage": [["rule:admin_api"]]
   }

1. The ``admin_or_owner`` rule evaluates successfully if the current
   user is an administrator or the owner of the resource specified in
   the request (tenant identifier is equal).

2. The ``default`` policy is always evaluated if an API operation does
   not match any of the policies in ``policy.json``.

3. The ``compute_extension:flavormanage`` policy restricts the ability
   to manipulate flavors to administrators using the Admin API only.

In some cases, some operations should be restricted to administrators
only. Therefore, as a further example, let us consider how this sample
policy file could be modified in a scenario where we enable users to
create their own flavors:

.. code-block:: console

   "compute_extension:flavormanage": [ ],

Users Who Disrupt Other Users
-----------------------------

Users on your cloud can disrupt other users, sometimes intentionally and
maliciously and other times by accident. Understanding the situation
allows you to make a better decision on how to handle the
disruption.

For example, a group of users have instances that are utilizing a large
amount of compute resources for very compute-intensive tasks. This is
driving the load up on compute nodes and affecting other users. In this
situation, review your user use cases. You may find that high compute
scenarios are common, and should then plan for proper segregation in
your cloud, such as host aggregation or regions.

Another example is a user consuming a very large amount of
bandwidth. Again, the key is to
understand what the user is doing. If she naturally needs a high amount
of bandwidth, you might have to limit her transmission rate so as not to
affect other users or move her to an area with more bandwidth available.
On the other hand, maybe her instance has been hacked and is part of a
botnet launching DDOS attacks. Resolution of this issue is the same as
though any other server on your network has been hacked. Contact the
user and give her time to respond. If she doesn't respond, shut down the
instance.

A final example is if a user is hammering cloud resources repeatedly.
Contact the user and learn what he is trying to do. Maybe he doesn't
understand that what he's doing is inappropriate, or maybe there is an
issue with the resource he is trying to access that is causing his
requests to queue or lag.

Summary
~~~~~~~

One key element of systems administration that is often overlooked is
that end users are the reason systems administrators exist. Don't go the
BOFH route and terminate every user who causes an alert to go off. Work
with users to understand what they're trying to accomplish and see how
your environment can better assist them in achieving their goals. Meet
your users' needs by organizing your users into projects, applying
policies, managing quotas, and working with them.

========
Upgrades
========

With the exception of Object Storage, upgrading from one version of
OpenStack to another can take a great deal of effort. This chapter
provides some guidance on the operational aspects that you should
consider for performing an upgrade for a basic architecture.

Pre-upgrade considerations
~~~~~~~~~~~~~~~~~~~~~~~~~~

Upgrade planning
----------------

- Thoroughly review the `release
  notes <http://wiki.openstack.org/wiki/ReleaseNotes/>`_ to learn
  about new, updated, and deprecated features. Find incompatibilities
  between versions.

- Consider the impact of an upgrade to users. The upgrade process
  interrupts management of your environment including the dashboard. If
  you properly prepare for the upgrade, existing instances, networking,
  and storage should continue to operate. However, instances might
  experience intermittent network interruptions.

- Consider the approach to upgrading your environment. You can perform
  an upgrade with operational instances, but this is a dangerous
  approach. You might consider using live migration to temporarily
  relocate instances to other compute nodes while performing upgrades.
  However, you must ensure database consistency throughout the process;
  otherwise your environment might become unstable. Also, don't forget
  to provide sufficient notice to your users, including giving them
  plenty of time to perform their own backups.

- Consider adopting structure and options from the service
  configuration files and merging them with existing configuration
  files. The `OpenStack Configuration
  Reference <http://docs.openstack.org/liberty/config-reference/content/>`_
  contains new, updated, and deprecated options for most services.

- Like all major system upgrades, your upgrade could fail for one or
  more reasons. You should prepare for this situation by having the
  ability to roll back your environment to the previous release,
  including databases, configuration files, and packages. We provide an
  example process for rolling back your environment in
  :ref:`rolling_back_a_failed_upgrade`.

- Develop an upgrade procedure and assess it thoroughly by using a test
  environment similar to your production environment.

Pre-upgrade testing environment
-------------------------------

The most important step is the pre-upgrade testing. If you are upgrading
immediately after release of a new version, undiscovered bugs might
hinder your progress. Some deployers prefer to wait until the first
point release is announced. However, if you have a significant
deployment, you might follow the development and testing of the release
to ensure that bugs for your use cases are fixed.

Each OpenStack cloud is different even if you have a near-identical
architecture as described in this guide. As a result, you must still
test upgrades between versions in your environment using an approximate
clone of your environment.

However, that is not to say that it needs to be the same size or use
identical hardware as the production environment. It is important to
consider the hardware and scale of the cloud that you are upgrading. The
following tips can help you minimize the cost:

Use your own cloud
   The simplest place to start testing the next version of OpenStack is
   by setting up a new environment inside your own cloud. This might
   seem odd, especially the double virtualization used in running
   compute nodes. But it is a sure way to very quickly test your
   configuration.

Use a public cloud
   Consider using a public cloud to test the scalability limits of your
   cloud controller configuration. Most public clouds bill by the hour,
   which means it can be inexpensive to perform even a test with many
   nodes.

Make another storage endpoint on the same system
   If you use an external storage plug-in or shared file system with
   your cloud, you can test whether it works by creating a second share
   or endpoint. This allows you to test the system before entrusting
   your storage to the new version.

Watch the network
   Even at smaller-scale testing, look for excess network packets to
   determine whether something is going horribly wrong in
   inter-component communication.

To set up the test environment, you can use one of several methods:

- Do a full manual install by using the `OpenStack Installation
  Guide <http://docs.openstack.org/index.html#install-guides>`_ for
  your platform. Review the final configuration files and installed
  packages.

- Create a clone of your automated configuration infrastructure with
  changed package repository URLs.

  Alter the configuration until it works.

Either approach is valid. Use the approach that matches your experience.

An upgrade pre-testing system is excellent for getting the configuration
to work. However, it is important to note that the historical use of the
system and differences in user interaction can affect the success of
upgrades.

If possible, we highly recommend that you dump your production database
tables and test the upgrade in your development environment using this
data. Several MySQL bugs have been uncovered during database migrations
because of slight table differences between a fresh installation and
tables that migrated from one version to another. This will have an
impact on large real datasets, which you do not want to encounter during
a production outage.

Artificial scale testing can go only so far. After your cloud is
upgraded, you must pay careful attention to the performance aspects of
your cloud.

Upgrade Levels
--------------

Upgrade levels are a feature added to OpenStack Compute since the
Grizzly release to provide version locking on the RPC (Message Queue)
communications between the various Compute services.

This functionality is an important piece of the puzzle when it comes to
live upgrades and is conceptually similar to the existing API versioning
that allows OpenStack services of different versions to communicate
without issue.

Without upgrade levels, an X+1 version Compute service can receive and
understand X version RPC messages, but it can only send out X+1 version
RPC messages. For example, if a nova-conductor process has been upgraded
to X+1 version, then the conductor service will be able to understand
messages from X version nova-compute processes, but those compute
services will not be able to understand messages sent by the conductor
service.

During an upgrade, operators can add configuration options to
``nova.conf`` which lock the version of RPC messages and allow live
upgrading of the services without interruption caused by version
mismatch. The configuration options allow the specification of RPC
version numbers if desired, but release name aliases are also supported.
For example:

.. code-block:: ini

   [upgrade_levels]
   compute=X+1
   conductor=X+1
   scheduler=X+1

will keep the RPC version locked across the specified services to the
RPC version used in X+1. As all instances of a particular service are
upgraded to the newer version, the corresponding line can be removed
from ``nova.conf``.

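Using the release name alias form, an operator upgrading away from Kilo
might pin the compute RPC interface as follows while compute nodes are
still on the old release (the release name here is only an example):

.. code-block:: ini

   [upgrade_levels]
   compute = kilo
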
Using this functionality, ideally one would lock the RPC version to the
OpenStack version being upgraded from on nova-compute nodes, to ensure
that, for example, X+1 version nova-compute processes will continue to
work with X version nova-conductor processes while the upgrade
completes. Once the upgrade of nova-compute processes is complete, the
operator can move onto upgrading nova-conductor and remove the version
locking for nova-compute in ``nova.conf``.

General upgrade process
~~~~~~~~~~~~~~~~~~~~~~~

This section describes the process to upgrade a basic OpenStack
deployment based on the basic two-node architecture in the `OpenStack
Installation
Guide <http://docs.openstack.org/index.html#install-guides>`_. All
nodes must run a supported distribution of Linux with a recent kernel
and the current release packages.

Service specific upgrade instructions
-------------------------------------

* `Upgrading the Networking Service <http://docs.openstack.org/developer/neutron/devref/upgrade.html>`_

Prerequisites
-------------

- Perform some cleaning of the environment prior to starting the
  upgrade process to ensure a consistent state. For example, instances
  not fully purged from the system after deletion might cause
  indeterminate behavior.

- For environments using the OpenStack Networking service (neutron),
  verify the release version of the database. For example:

  .. code-block:: console

     # su -s /bin/sh -c "neutron-db-manage --config-file /etc/neutron/neutron.conf \
       --config-file /etc/neutron/plugins/ml2/ml2_conf.ini current" neutron

Perform a backup
----------------

#. Save the configuration files on all nodes. For example:

   .. code-block:: console

      # for i in keystone glance nova neutron openstack-dashboard cinder heat ceilometer; \
        do mkdir $i-kilo; \
        done
      # for i in keystone glance nova neutron openstack-dashboard cinder heat ceilometer; \
        do cp -r /etc/$i/* $i-kilo/; \
        done

   .. note::

      You can modify this example script on each node to handle
      different services.

#. Make a full database backup of your production data. As of Kilo,
   database downgrades are not supported, and the only method available
   to get back to a prior database version will be to restore from
   backup.

   .. code-block:: console

      # mysqldump -u root -p --opt --add-drop-database --all-databases > icehouse-db-backup.sql

   .. note::

      Consider updating your SQL server configuration as described in
      the `OpenStack Installation
      Guide <http://docs.openstack.org/index.html#install-guides>`_.

Manage repositories
-------------------

On all nodes:

#. Remove the repository for the previous release packages.

#. Add the repository for the new release packages.

#. Update the repository database.

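On Ubuntu, for example, these three steps might look like the following,
assuming the Ubuntu Cloud Archive is in use; the release names are
placeholders for the versions you are actually moving between:

.. code-block:: console

   # add-apt-repository -r cloud-archive:juno
   # add-apt-repository cloud-archive:kilo
   # apt-get update
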
Upgrade packages on each node
-----------------------------

Depending on your specific configuration, upgrading all packages might
restart or break services supplemental to your OpenStack environment.
For example, if you use the TGT iSCSI framework for Block Storage
volumes and the upgrade includes new packages for it, the package
manager might restart the TGT iSCSI services and impact connectivity to
volumes.

If the package manager prompts you to update configuration files, reject
the changes. The package manager appends a suffix to newer versions of
configuration files. Consider reviewing and adopting content from these
files.

.. note::

   You may need to explicitly install the ``ipset`` package if your
   distribution does not install it as a dependency.

Update services
---------------

To update a service on each node, you generally modify one or more
configuration files, stop the service, synchronize the database schema,
and start the service. Some services require different steps. We
recommend verifying operation of each service before proceeding to the
next service.

The order you should upgrade services, and any changes from the general
|
||||||
|
upgrade process is described below:
|
||||||
|
|
||||||
|
**Controller node**
|
||||||
|
|
||||||
|
#. OpenStack Identity - Clear any expired tokens before synchronizing
|
||||||
|
the database.
|
||||||
|
|
||||||
|
#. OpenStack Image service
|
||||||
|
|
||||||
|
#. OpenStack Compute, including networking components.
|
||||||
|
|
||||||
|
#. OpenStack Networking
|
||||||
|
|
||||||
|
#. OpenStack Block Storage
|
||||||
|
|
||||||
|
#. OpenStack dashboard - In typical environments, updating the
|
||||||
|
dashboard only requires restarting the Apache HTTP service.
|
||||||
|
|
||||||
|
#. OpenStack Orchestration
|
||||||
|
|
||||||
|
#. OpenStack Telemetry - In typical environments, updating the
|
||||||
|
Telemetry service only requires restarting the service.
|
||||||
|
|
||||||
|
#. OpenStack Compute - Edit the configuration file and restart the
|
||||||
|
service.
|
||||||
|
|
||||||
|
#. OpenStack Networking - Edit the configuration file and restart the
|
||||||
|
service.
|
||||||
|
|
||||||
|
**Compute nodes**
|
||||||
|
|
||||||
|
- OpenStack Block Storage - Updating the Block Storage service only
|
||||||
|
requires restarting the service.
|
||||||
|
|
||||||
|
**Storage nodes**
|
||||||
|
|
||||||
|
- OpenStack Networking - Edit the configuration file and restart the
|
||||||
|
service.
|
||||||
|
|
||||||
|
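The generic per-service pattern can be sketched as follows, shown for
Compute and Identity on Ubuntu; the exact service and command names are
assumptions that vary by release and distribution:

```shell
# Illustrative Ubuntu sketch of the generic per-service update pattern:
# stop the service, synchronize the database schema, start the service.
service nova-api stop
su -s /bin/sh -c "nova-manage db sync" nova
service nova-api start

# For Identity, clear expired tokens before synchronizing the database:
keystone-manage token_flush
su -s /bin/sh -c "keystone-manage db_sync" keystone
```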
Final steps
-----------

On all distributions, you must perform some final tasks to complete the
upgrade process.

#. Decrease DHCP timeouts by modifying ``/etc/nova/nova.conf`` on the
   compute nodes back to the original value for your environment.

#. Update all ``.ini`` files to match passwords and pipelines as required
   for the OpenStack release in your environment.

#. After migration, users see different results from
   :command:`nova image-list` and :command:`glance image-list`. To ensure
   users see the same images in the list commands, edit the
   ``/etc/glance/policy.json`` and ``/etc/nova/policy.json`` files to
   contain ``"context_is_admin": "role:admin"``, which limits access to
   private images for projects.

#. Verify proper operation of your environment. Then, notify your users
   that their cloud is operating normally again.

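As a quick sanity check for the policy change, you might confirm that
both policy files carry the entry after editing; this is only an
illustrative verification, not part of the official procedure:

```shell
# Illustrative check: both files should now contain the entry that
# limits access to private images to the admin role.
grep '"context_is_admin"' /etc/glance/policy.json /etc/nova/policy.json
```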
.. _rolling_back_a_failed_upgrade:

Rolling back a failed upgrade
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Upgrades involve complex operations and can fail. Before attempting any
upgrade, you should make a full database backup of your production data.
As of Kilo, database downgrades are not supported, and the only method
available to get back to a prior database version will be to restore
from backup.

This section provides guidance for rolling back to a previous release of
OpenStack. All distributions follow a similar procedure.

A common scenario is to take down production management services in
preparation for an upgrade, complete part of the upgrade process, and
discover one or more problems not encountered during testing. As a
consequence, you must roll back your environment to the original "known
good" state. You must also make sure that you did not make any state
changes after attempting the upgrade process: no new instances,
networks, storage volumes, and so on. Any of these new resources will be
in a frozen state after the databases are restored from backup.

Within this scope, you must complete these steps to successfully roll
back your environment:

#. Roll back configuration files.

#. Restore databases from backup.

#. Roll back packages.

You should verify that you have the requisite backups to restore.
Rolling back upgrades is a tricky process because distributions tend to
put much more effort into testing upgrades than downgrades. Broken
downgrades take significantly more effort to troubleshoot and resolve
than broken upgrades. Only you can weigh the risks of trying to push a
failed upgrade forward versus rolling it back. Generally, consider
rolling back as the very last option.

The following steps described for Ubuntu have worked on at least one
production environment, but they might not work for all environments.

**To perform the rollback**

#. Stop all OpenStack services.

#. Copy contents of configuration backup directories that you created
   during the upgrade process back to the ``/etc/<service>`` directory.

#. Restore databases from the ``RELEASE_NAME-db-backup.sql`` backup file
   that you created with the :command:`mysqldump` command during the
   upgrade process:

   .. code-block:: console

      # mysql -u root -p < RELEASE_NAME-db-backup.sql

#. Downgrade OpenStack packages.

   .. warning::

      Downgrading packages is by far the most complicated step; it is
      highly dependent on the distribution and the overall administration
      of the system.

#. Determine which OpenStack packages are installed on your system. Use the
   :command:`dpkg --get-selections` command. Filter for OpenStack
   packages, filter again to omit packages explicitly marked in the
   ``deinstall`` state, and save the final output to a file. For example,
   the following command covers a controller node with keystone, glance,
   nova, neutron, and cinder:

   .. code-block:: console

      # dpkg --get-selections | grep -e keystone -e glance -e nova -e neutron \
      -e cinder | grep -v deinstall | tee openstack-selections
      cinder-api                                      install
      cinder-common                                   install
      cinder-scheduler                                install
      cinder-volume                                   install
      glance                                          install
      glance-api                                      install
      glance-common                                   install
      glance-registry                                 install
      neutron-common                                  install
      neutron-dhcp-agent                              install
      neutron-l3-agent                                install
      neutron-lbaas-agent                             install
      neutron-metadata-agent                          install
      neutron-plugin-openvswitch                      install
      neutron-plugin-openvswitch-agent                install
      neutron-server                                  install
      nova-api                                        install
      nova-cert                                       install
      nova-common                                     install
      nova-conductor                                  install
      nova-consoleauth                                install
      nova-novncproxy                                 install
      nova-objectstore                                install
      nova-scheduler                                  install
      python-cinder                                   install
      python-cinderclient                             install
      python-glance                                   install
      python-glanceclient                             install
      python-keystone                                 install
      python-keystoneclient                           install
      python-neutron                                  install
      python-neutronclient                            install
      python-nova                                     install
      python-novaclient                               install

   .. note::

      Depending on the type of server, the contents and order of your
      package list might vary from this example.

#. You can determine the package versions available for reversion by using
   the :command:`apt-cache policy` command. If you removed the Grizzly
   repositories, you must first reinstall them and run :command:`apt-get update`:

   .. code-block:: console

      # apt-cache policy nova-common
      nova-common:
        Installed: 1:2013.2-0ubuntu1~cloud0
        Candidate: 1:2013.2-0ubuntu1~cloud0
        Version table:
       *** 1:2013.2-0ubuntu1~cloud0 0
              500 http://ubuntu-cloud.archive.canonical.com/ubuntu/
                  precise-updates/havana/main amd64 Packages
              100 /var/lib/dpkg/status
           1:2013.1.4-0ubuntu1~cloud0 0
              500 http://ubuntu-cloud.archive.canonical.com/ubuntu/
                  precise-updates/grizzly/main amd64 Packages
           2012.1.3+stable-20130423-e52e6912-0ubuntu1.2 0
              500 http://us.archive.ubuntu.com/ubuntu/
                  precise-updates/main amd64 Packages
              500 http://security.ubuntu.com/ubuntu/
                  precise-security/main amd64 Packages
           2012.1-0ubuntu2 0
              500 http://us.archive.ubuntu.com/ubuntu/
                  precise/main amd64 Packages

   This output shows the currently installed version of the package, the
   newest candidate version, and all available versions, along with the
   repository that contains each version. Look for the appropriate
   Grizzly version (``1:2013.1.4-0ubuntu1~cloud0`` in this case). The
   process of manually picking through this list of packages is rather
   tedious and prone to errors. You should consider using the following
   script to help with this process:

   .. code-block:: console

      # for i in `cut -f 1 openstack-selections | sed 's/neutron/quantum/;'`;
      do echo -n $i ;apt-cache policy $i | grep -B 1 grizzly |
      grep -v Packages | awk '{print "="$1}';done | tr '\n' ' ' |
      tee openstack-grizzly-versions
      cinder-api=1:2013.1.4-0ubuntu1~cloud0
      cinder-common=1:2013.1.4-0ubuntu1~cloud0
      cinder-scheduler=1:2013.1.4-0ubuntu1~cloud0
      cinder-volume=1:2013.1.4-0ubuntu1~cloud0
      glance=1:2013.1.4-0ubuntu1~cloud0
      glance-api=1:2013.1.4-0ubuntu1~cloud0
      glance-common=1:2013.1.4-0ubuntu1~cloud0
      glance-registry=1:2013.1.4-0ubuntu1~cloud0
      quantum-common=1:2013.1.4-0ubuntu1~cloud0
      quantum-dhcp-agent=1:2013.1.4-0ubuntu1~cloud0
      quantum-l3-agent=1:2013.1.4-0ubuntu1~cloud0
      quantum-lbaas-agent=1:2013.1.4-0ubuntu1~cloud0
      quantum-metadata-agent=1:2013.1.4-0ubuntu1~cloud0
      quantum-plugin-openvswitch=1:2013.1.4-0ubuntu1~cloud0
      quantum-plugin-openvswitch-agent=1:2013.1.4-0ubuntu1~cloud0
      quantum-server=1:2013.1.4-0ubuntu1~cloud0
      nova-api=1:2013.1.4-0ubuntu1~cloud0
      nova-cert=1:2013.1.4-0ubuntu1~cloud0
      nova-common=1:2013.1.4-0ubuntu1~cloud0
      nova-conductor=1:2013.1.4-0ubuntu1~cloud0
      nova-consoleauth=1:2013.1.4-0ubuntu1~cloud0
      nova-novncproxy=1:2013.1.4-0ubuntu1~cloud0
      nova-objectstore=1:2013.1.4-0ubuntu1~cloud0
      nova-scheduler=1:2013.1.4-0ubuntu1~cloud0
      python-cinder=1:2013.1.4-0ubuntu1~cloud0
      python-cinderclient=1:1.0.3-0ubuntu1~cloud0
      python-glance=1:2013.1.4-0ubuntu1~cloud0
      python-glanceclient=1:0.9.0-0ubuntu1.2~cloud0
      python-quantum=1:2013.1.4-0ubuntu1~cloud0
      python-quantumclient=1:2.2.0-0ubuntu1~cloud0
      python-nova=1:2013.1.4-0ubuntu1~cloud0
      python-novaclient=1:2.13.0-0ubuntu1~cloud0

   .. note::

      If you decide to continue this step manually, don't forget to change
      ``neutron`` to ``quantum`` where applicable.

#. Use the :command:`apt-get install` command to install specific versions of each
   package by specifying ``<package-name>=<version>``. The script in the
   previous step conveniently created a list of ``package=version`` pairs
   for you:

   .. code-block:: console

      # apt-get install `cat openstack-grizzly-versions`

   This step completes the rollback procedure. You should remove the
   upgrade release repository and run :command:`apt-get update` to prevent
   accidental upgrades until you solve whatever issue caused you to roll
   back your environment.

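Under the assumptions of this section (Ubuntu, the Grizzly example, and
configuration backups taken before the upgrade), the whole rollback can
be condensed into the following sketch; the backup paths and the service
list are illustrative, not authoritative:

```shell
# Illustrative condensed rollback sketch; adapt names and paths.
# 1. Stop all OpenStack services on the node, for example:
for svc in nova-api glance-api keystone cinder-api quantum-server; do
    service "$svc" stop
done
# 2. Restore configuration backups made before the upgrade
#    (hypothetical backup location shown).
cp -a /etc/nova.backup/. /etc/nova/
# 3. Restore databases from the pre-upgrade dump.
mysql -u root -p < grizzly-db-backup.sql
# 4. Downgrade packages to the versions collected earlier.
apt-get install `cat openstack-grizzly-versions`
```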
==================
Upstream OpenStack
==================

OpenStack is founded on a thriving community that is a source of help
and welcomes your contributions. This chapter details some of the ways
you can interact with the others involved.

Getting Help
~~~~~~~~~~~~

There are several avenues available for seeking assistance. The quickest
way is to help the community help you. Search the Q&A sites, mailing
list archives, and bug lists for issues similar to yours. If you can't
find anything, follow the directions for reporting bugs or use one of
the channels for support, which are listed below.

Your first port of call should be the official OpenStack documentation,
found on http://docs.openstack.org. You can get questions answered on
http://ask.openstack.org.

`Mailing lists <https://wiki.openstack.org/wiki/Mailing_Lists>`_ are
also a great place to get help. The wiki page has more information about
the various lists. As an operator, the main lists you should be aware of
are:

`General list <http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack>`_
    *openstack@lists.openstack.org*. The scope of this list is the
    current state of OpenStack. This is a very high-traffic mailing
    list, with many, many emails per day.

`Operators list <http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators>`_
    *openstack-operators@lists.openstack.org*. This list is intended for
    discussion among existing OpenStack cloud operators, such as
    yourself. Currently, this list is relatively low traffic, on the
    order of one email a day.

`Development list <http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev>`_
    *openstack-dev@lists.openstack.org*. The scope of this list is the
    future state of OpenStack. This is a high-traffic mailing list, with
    multiple emails per day.

We recommend that you subscribe to the general list and the operator
list, although you must set up filters to manage the volume for the
general list. You'll also find links to the mailing list archives on the
mailing list wiki page, where you can search through the discussions.

`Multiple IRC channels <https://wiki.openstack.org/wiki/IRC>`_ are
available for general questions and developer discussions. The general
discussion channel is #openstack on *irc.freenode.net*.

Reporting Bugs
~~~~~~~~~~~~~~

As an operator, you are in a very good position to report unexpected
behavior with your cloud. Since OpenStack is flexible, you may be the
only individual to report a particular issue. Every issue is important
to fix, so it is essential to learn how to easily submit a bug report.

All OpenStack projects use `Launchpad <https://launchpad.net/>`_
for bug tracking. You'll need to create an account on Launchpad before
you can submit a bug report.

Once you have a Launchpad account, reporting a bug is as simple as
identifying the project or projects that are causing the issue.
Sometimes this is more difficult than expected, but those working on the
bug triage are happy to help relocate issues if they are not in the
right place initially:

- Report a bug in
  `nova <https://bugs.launchpad.net/nova/+filebug/+login>`_.

- Report a bug in
  `python-novaclient <https://bugs.launchpad.net/python-novaclient/+filebug/+login>`_.

- Report a bug in
  `swift <https://bugs.launchpad.net/swift/+filebug/+login>`_.

- Report a bug in
  `python-swiftclient <https://bugs.launchpad.net/python-swiftclient/+filebug/+login>`_.

- Report a bug in
  `glance <https://bugs.launchpad.net/glance/+filebug/+login>`_.

- Report a bug in
  `python-glanceclient <https://bugs.launchpad.net/python-glanceclient/+filebug/+login>`_.

- Report a bug in
  `keystone <https://bugs.launchpad.net/keystone/+filebug/+login>`_.

- Report a bug in
  `python-keystoneclient <https://bugs.launchpad.net/python-keystoneclient/+filebug/+login>`_.

- Report a bug in
  `neutron <https://bugs.launchpad.net/neutron/+filebug/+login>`_.

- Report a bug in
  `python-neutronclient <https://bugs.launchpad.net/python-neutronclient/+filebug/+login>`_.

- Report a bug in
  `cinder <https://bugs.launchpad.net/cinder/+filebug/+login>`_.

- Report a bug in
  `python-cinderclient <https://bugs.launchpad.net/python-cinderclient/+filebug/+login>`_.

- Report a bug in
  `manila <https://bugs.launchpad.net/manila/+filebug/+login>`_.

- Report a bug in
  `python-manilaclient <https://bugs.launchpad.net/python-manilaclient/+filebug/+login>`_.

- Report a bug in
  `python-openstackclient <https://bugs.launchpad.net/python-openstackclient/+filebug/+login>`_.

- Report a bug in
  `horizon <https://bugs.launchpad.net/horizon/+filebug/+login>`_.

- Report a bug with the
  `documentation <https://bugs.launchpad.net/openstack-manuals/+filebug/+login>`_.

- Report a bug with the `API
  documentation <https://bugs.launchpad.net/openstack-api-site/+filebug/+login>`_.

To write a good bug report, the following process is essential. First,
search for the bug to make sure there is no bug already filed for the
same issue. If you find one, be sure to click on "This bug affects X
people. Does this bug affect you?" If you can't find the issue, then
enter the details of your report. It should at least include:

- The release, or milestone, or commit ID corresponding to the software
  that you are running

- The operating system and version where you've identified the bug

- Steps to reproduce the bug, including what went wrong

- Description of the expected results instead of what you saw

- Portions of your log files so that you include only relevant excerpts

When you do this, the bug is created with:

- Status: *New*

In the bug comments, you can contribute instructions on how to fix a
given bug, and set it to *Triaged*. Or you can directly fix it: assign
the bug to yourself, set it to *In progress*, branch the code, implement
the fix, and propose your change for merging. But let's not get ahead of
ourselves; there are bug triaging tasks as well.

Confirming and Prioritizing
---------------------------

This stage is about checking that a bug is real and assessing its
impact. Some of these steps require bug supervisor rights (usually
limited to core teams). If the bug lacks information to properly
reproduce or assess the importance of the bug, the bug is set to:

- Status: *Incomplete*

Once you have reproduced the issue (or are 100 percent confident that
this is indeed a valid bug) and have permissions to do so, set:

- Status: *Confirmed*

Core developers also prioritize the bug, based on its impact:

- Importance: <Bug impact>

The bug impacts are categorized as follows:

#. *Critical* if the bug prevents a key feature from working properly
   (regression) for all users (or without a simple workaround) or
   results in data loss

#. *High* if the bug prevents a key feature from working properly for
   some users (or with a workaround)

#. *Medium* if the bug prevents a secondary feature from working
   properly

#. *Low* if the bug is mostly cosmetic

#. *Wishlist* if the bug is not really a bug but rather a welcome change
   in behavior

If the bug contains the solution, or a patch, set the bug status to
*Triaged*.

Bug Fixing
----------

At this stage, a developer works on a fix. During that time, to avoid
duplicating the work, the developer should set:

- Status: *In Progress*

- Assignee: <yourself>

When the fix is ready, the developer proposes a change and gets the
change reviewed.

After the Change Is Accepted
----------------------------

After the change is reviewed, accepted, and lands in master, it
automatically moves to:

- Status: *Fix Committed*

When the fix makes it into a milestone or release branch, it
automatically moves to:

- Milestone: Milestone the bug was fixed in

- Status: *Fix Released*

Join the OpenStack Community
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since you've made it this far in the book, you should consider becoming
an official individual member of the community and `join the OpenStack
Foundation <https://www.openstack.org/join/>`_. The OpenStack
Foundation is an independent body providing shared resources to help
achieve the OpenStack mission by protecting, empowering, and promoting
OpenStack software and the community around it, including users,
developers, and the entire ecosystem. We all share the responsibility to
make this community the best it can possibly be, and signing up to be a
member is the first step to participating. Like the software, individual
membership within the OpenStack Foundation is free and accessible to
anyone.

How to Contribute to the Documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OpenStack documentation efforts encompass operator and administrator
docs, API docs, and user docs.

The genesis of this book was an in-person event, but now that the book
is in your hands, we want you to contribute to it. OpenStack
documentation follows the coding principles of iterative work, with bug
logging, investigating, and fixing.

Just like the code, http://docs.openstack.org is updated constantly
using the Gerrit review system, with source stored in git.openstack.org
in the `openstack-manuals
repository <https://git.openstack.org/cgit/openstack/openstack-manuals/>`_
and the `api-site
repository <https://git.openstack.org/cgit/openstack/api-site/>`_.

To review the documentation before it's published, go to the OpenStack
Gerrit server at http://review.openstack.org and search for
`project:openstack/openstack-manuals <https://review.openstack.org/#/q/status:open+project:openstack/openstack-manuals,n,z>`_
or
`project:openstack/api-site <https://review.openstack.org/#/q/status:open+project:openstack/api-site,n,z>`_.

See the `How To Contribute page on the
wiki <https://wiki.openstack.org/wiki/How_To_Contribute>`_ for more
information on the steps you need to take to submit your first
documentation review or change.

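A first documentation change typically follows the standard OpenStack
Gerrit workflow; the following sketch assumes you have a Gerrit account
set up and the ``git-review`` tool installed (the branch name and commit
message are placeholders):

```shell
# Illustrative sketch of submitting a documentation change for review.
git clone https://git.openstack.org/openstack/openstack-manuals
cd openstack-manuals
git checkout -b my-doc-fix          # placeholder branch name
# ... edit the RST source files ...
git commit -a -m "Fix typo in the Operations Guide"
git review                          # upload to review.openstack.org
```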
Security Information
~~~~~~~~~~~~~~~~~~~~

As a community, we take security very seriously and follow a specific
process for reporting potential issues. We vigilantly pursue fixes and
regularly eliminate exposures. You can report security issues you
discover through this specific process. The OpenStack Vulnerability
Management Team is a very small group of experts in vulnerability
management drawn from the OpenStack community. The team's job is
facilitating the reporting of vulnerabilities, coordinating security
fixes, and handling progressive disclosure of the vulnerability
information. Specifically, the team is responsible for the following
functions:

Vulnerability management
    All vulnerabilities discovered by community members (or users) can
    be reported to the team.

Vulnerability tracking
    The team will curate a set of vulnerability-related issues in the
    issue tracker. Some of these issues are private to the team and the
    affected product leads, but once remediation is in place, all
    vulnerabilities are public.

Responsible disclosure
    As part of our commitment to work with the security community, the
    team ensures that proper credit is given to security researchers who
    responsibly report issues in OpenStack.

We provide two ways to report issues to the OpenStack Vulnerability
Management Team, depending on how sensitive the issue is:

- Open a bug in Launchpad and mark it as a "security bug." This makes
  the bug private and accessible to only the Vulnerability Management
  Team.

- If the issue is extremely sensitive, send an encrypted email to one
  of the team's members. Find their GPG keys at `OpenStack
  Security <http://www.openstack.org/projects/openstack-security/>`_.

You can find the full list of security-oriented teams you can join at
`Security Teams <https://wiki.openstack.org/wiki/SecurityTeams>`_. The
vulnerability management process is fully documented at `Vulnerability
Management <https://wiki.openstack.org/wiki/VulnerabilityManagement>`_.

Finding Additional Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In addition to this book, there are many other sources of information
about OpenStack. The
`OpenStack website <http://www.openstack.org/>`_
is a good starting point, with
`OpenStack Docs <http://docs.openstack.org/>`_ and `OpenStack API
Docs <http://developer.openstack.org/>`_ providing technical
documentation about OpenStack. The `OpenStack
wiki <https://wiki.openstack.org/wiki/Main_Page>`_ contains a lot of
general information that cuts across the OpenStack projects, including a
list of `recommended
tools <https://wiki.openstack.org/wiki/OperationsTools>`_. Finally,
there are a number of blogs aggregated at `Planet
OpenStack <http://planet.openstack.org/>`_.

=======
Preface
=======

OpenStack is an open source platform that lets you build an
:term:`Infrastructure-as-a-Service (IaaS)<IaaS>` cloud that runs on commodity
hardware.

Introduction to OpenStack
~~~~~~~~~~~~~~~~~~~~~~~~~

OpenStack believes in open source, open design, and open development,
all in an open community that encourages participation by anyone. The
long-term vision for OpenStack is to produce a ubiquitous open source
cloud computing platform that meets the needs of public and private
cloud providers regardless of size. OpenStack services control large
pools of compute, storage, and networking resources throughout a data
center.

The technology behind OpenStack consists of a series of interrelated
projects delivering various components for a cloud infrastructure
solution. Each service provides an open API so that all of these
resources can be managed through a dashboard that gives administrators
control while empowering users to provision resources through a web
interface, a command-line client, or software development kits that
support the API. Many OpenStack APIs are extensible, meaning you can
keep compatibility with a core set of calls while providing access to
more resources and innovating through API extensions. The OpenStack
project is a global collaboration of developers and cloud computing
technologists. The project produces an open standard cloud computing
platform for both public and private clouds. By focusing on ease of
implementation, massive scalability, a variety of rich features, and
tremendous extensibility, the project aims to deliver a practical and
reliable cloud solution for all types of organizations.

Getting Started with OpenStack
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As an open source project, one of the unique aspects of OpenStack is
that it has many different levels at which you can begin to engage with
it; you don't have to do everything yourself.

Using OpenStack
---------------

You could ask, "Do I even need to build a cloud?" If you want to start
using a compute or storage service by just swiping your credit card, you
can go to eNovance, HP, Rackspace, or other organizations to start using
their public OpenStack clouds. Using their OpenStack cloud resources is
similar to accessing the publicly available Amazon Web Services Elastic
Compute Cloud (EC2) or Simple Storage Solution (S3).

Plug and Play OpenStack
-----------------------

However, the enticing part of OpenStack might be to build your own
private cloud, and there are several ways to accomplish this goal.
Perhaps the simplest of all is an appliance-style solution. You purchase
an appliance, unpack it, plug in the power and the network, and watch it
transform into an OpenStack cloud with minimal additional configuration.

However, hardware choice is important for many applications, so if that
applies to you, consider that there are several software distributions
available that you can run on servers, storage, and network products of
your choosing. Canonical (where OpenStack replaced Eucalyptus as the
default cloud option in 2011), Red Hat, and SUSE offer enterprise
OpenStack solutions and support. You may also want to take a look at
some of the specialized distributions, such as those from Rackspace,
Piston, SwiftStack, or Cloudscaling.

Alternatively, if you want someone to help guide you through the
decisions about the underlying hardware or your applications, perhaps
adding in a few features or integrating components along the way,
consider contacting one of the system integrators with OpenStack
experience, such as Mirantis or Metacloud.

If your preference is to build your own OpenStack expertise internally,
a good way to kick-start that might be to attend or arrange a training
session. The OpenStack Foundation has a `Training
Marketplace <http://www.openstack.org/marketplace/training>`_ where you
can look for nearby events. Also, the OpenStack community is `working to
produce <https://wiki.openstack.org/wiki/Training-guides>`_ open source
training materials.

Roll Your Own OpenStack
-----------------------

However, this guide has a different audience: those seeking flexibility
from the OpenStack framework by deploying do-it-yourself solutions.

OpenStack is designed for horizontal scalability, so you can easily add
new compute, network, and storage resources to grow your cloud over
|
||||||
|
time. In addition to the pervasiveness of massive OpenStack public
|
||||||
|
clouds, many organizations, such as PayPal, Intel, and Comcast, build
|
||||||
|
large-scale private clouds. OpenStack offers much more than a typical
|
||||||
|
software package because it lets you integrate a number of different
|
||||||
|
technologies to construct a cloud. This approach provides great
|
||||||
|
flexibility, but the number of options might be daunting at first.
|
||||||
|
|
||||||
|
Who This Book Is For
|
||||||
|
~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
This book is for those of you starting to run OpenStack clouds as well
|
||||||
|
as those of you who were handed an operational one and want to keep it
|
||||||
|
running well. Perhaps you're on a DevOps team, perhaps you are a system
|
||||||
|
administrator starting to dabble in the cloud, or maybe you want to get
|
||||||
|
on the OpenStack cloud team at your company. This book is for all of
|
||||||
|
you.
|
||||||
|
|
||||||
|
This guide assumes that you are familiar with a Linux distribution that
|
||||||
|
supports OpenStack, SQL databases, and virtualization. You must be
|
||||||
|
comfortable administering and configuring multiple Linux machines for
|
||||||
|
networking. You must install and maintain an SQL database and
|
||||||
|
occasionally run queries against it.
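As a minimal sketch of the kind of ad-hoc query this implies, the
following uses an in-memory SQLite database and a simplified,
hypothetical ``instances`` table as stand-ins for the MySQL or
PostgreSQL database and service schema a real deployment would have:

```python
import sqlite3

# Stand-in database: real OpenStack services use MySQL or PostgreSQL,
# and their schemas are larger; this table is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (uuid TEXT, host TEXT, vm_state TEXT)")
conn.executemany(
    "INSERT INTO instances VALUES (?, ?, ?)",
    [("a1", "compute-01", "active"),
     ("b2", "compute-01", "error"),
     ("c3", "compute-02", "active")],
)

# The sort of question an operator asks daily: how many instances
# does each compute node carry?
rows = conn.execute(
    "SELECT host, COUNT(*) FROM instances GROUP BY host ORDER BY host"
).fetchall()
print(rows)  # [('compute-01', 2), ('compute-02', 1)]
```

If writing and reading a query like this feels routine, you have the
database background this guide assumes.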

One of the most complex aspects of an OpenStack cloud is the networking
configuration. You should be familiar with concepts such as DHCP, Linux
bridges, VLANs, and iptables. You must also have access to a network
hardware expert who can configure the switches and routers required in
your OpenStack cloud.
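As a small illustration of the subnet arithmetic that sits behind those
concepts, Python's standard ``ipaddress`` module can answer the
questions that come up while debugging addressing and DHCP ranges (the
CIDR and addresses below are illustrative, not from any particular
deployment):

```python
import ipaddress

# An example tenant subnet; a DHCP agent (dnsmasq) would hand out
# addresses from the usable host range of this network.
net = ipaddress.ip_network("10.1.0.0/24")
hosts = list(net.hosts())

print(net.netmask)          # 255.255.255.0
print(hosts[0], hosts[-1])  # 10.1.0.1 10.1.0.254

# Does a VM's reported address actually fall inside the subnet?
print(ipaddress.ip_address("10.1.0.42") in net)  # True
```

Being comfortable with this level of reasoning, and with the Linux
tools that act on it, is assumed throughout the networking chapters.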

.. note::

   Cloud computing is quite an advanced topic, and this book requires a
   lot of background knowledge. However, if you are fairly new to cloud
   computing, we recommend that you make use of the :doc:`common/glossary`
   at the back of the book, as well as the online documentation for OpenStack
   and additional resources mentioned in this book in :doc:`app_resources`.

Further Reading
---------------

There are other books on the `OpenStack documentation
website <http://docs.openstack.org>`_ that can help you get the job
done.

OpenStack Installation Guides
   Describes a manual installation process, as in, by hand, without
   automation, for multiple distributions based on a packaging system:

   - `Installation Guide for openSUSE 13.2 and SUSE Linux Enterprise
     Server 12 <http://docs.openstack.org/liberty/install-guide-obs/>`_

   - `Installation Guide for Red Hat Enterprise Linux 7 and CentOS
     7 <http://docs.openstack.org/liberty/install-guide-rdo/>`_

   - `Installation Guide for Ubuntu 14.04 (LTS)
     Server <http://docs.openstack.org/liberty/install-guide-ubuntu/>`_

`OpenStack Configuration Reference <http://docs.openstack.org/liberty/config-reference/content/>`_
   Contains a reference listing of all configuration options for core
   and integrated OpenStack services by release version

`OpenStack Administrator Guide <http://docs.openstack.org/admin-guide/>`_
   Contains how-to information for managing an OpenStack cloud as
   needed for your use cases, such as storage, computing, or
   software-defined-networking

`OpenStack High Availability Guide <http://docs.openstack.org/ha-guide/index.html>`_
   Describes potential strategies for making your OpenStack services
   and related controllers and data stores highly available

`OpenStack Security Guide <http://docs.openstack.org/sec/>`_
   Provides best practices and conceptual information about securing an
   OpenStack cloud

`Virtual Machine Image Guide <http://docs.openstack.org/image-guide/>`_
   Shows you how to obtain, create, and modify virtual machine images
   that are compatible with OpenStack

`OpenStack End User Guide <http://docs.openstack.org/user-guide/>`_
   Shows OpenStack end users how to create and manage resources in an
   OpenStack cloud with the OpenStack dashboard and OpenStack client
   commands

`Networking Guide <http://docs.openstack.org/networking-guide/>`_
   This guide targets OpenStack administrators seeking to deploy and
   manage OpenStack Networking (neutron).

`OpenStack API Guide <http://developer.openstack.org/api-guide/quick-start/>`_
   A brief overview of how to send REST API requests to endpoints for
   OpenStack services

How This Book Is Organized
~~~~~~~~~~~~~~~~~~~~~~~~~~

This book is organized into two parts: the architecture decisions for
designing OpenStack clouds and the repeated operations for running
OpenStack clouds.

**Part I:**

:doc:`arch_examples`
   Because of all the decisions the other chapters discuss, this
   chapter describes the decisions made for this particular book and
   much of the justification for the example architecture.

:doc:`arch_provision`
   While this book doesn't describe installation, we do recommend
   automation for deployment and configuration, discussed in this
   chapter.

:doc:`arch_cloud_controller`
   The cloud controller is an invention for the sake of consolidating
   and describing which services run on which nodes. This chapter
   discusses hardware and network considerations as well as how to
   design the cloud controller for performance and separation of
   services.

:doc:`arch_compute_nodes`
   This chapter describes the compute nodes, which are dedicated to
   running virtual machines. Some hardware choices come into play here,
   as well as logging and networking descriptions.

:doc:`arch_scaling`
   This chapter discusses the growth of your cloud resources through
   scaling and segregation considerations.

:doc:`arch_storage`
   As with other architecture decisions, storage concepts within
   OpenStack offer many options. This chapter lays out the choices for
   you.

:doc:`arch_network_design`
   Your OpenStack cloud networking needs to fit into your existing
   networks while also enabling the best design for your users and
   administrators, and this chapter gives you in-depth information
   about networking decisions.

**Part II:**

:doc:`ops_lay_of_the_land`
   This chapter is written to let you get your hands wrapped around
   your OpenStack cloud through command-line tools and understanding
   what is already set up in your cloud.

:doc:`ops_projects_users`
   This chapter walks through user-enabling processes that all admins
   must face to manage users, give them quotas to parcel out resources,
   and so on.

:doc:`ops_user_facing_operations`
   This chapter shows you how to use OpenStack cloud resources and how
   to train your users.

:doc:`ops_maintenance`
   This chapter goes into the common failures that the authors have
   seen while running clouds in production, including troubleshooting.

:doc:`ops_network_troubleshooting`
   Because network troubleshooting is especially difficult with virtual
   resources, this chapter is chock-full of helpful tips and tricks for
   tracing network traffic, finding the root cause of networking
   failures, and debugging related services, such as DHCP and DNS.

:doc:`ops_logging_monitoring`
   This chapter shows you where OpenStack places logs and how to best
   read and manage logs for monitoring purposes.

:doc:`ops_backup_recovery`
   This chapter describes what you need to back up within OpenStack as
   well as best practices for recovering backups.

:doc:`ops_customize`
   For readers who need to get a specialized feature into OpenStack,
   this chapter describes how to use DevStack to write custom
   middleware or a custom scheduler to rebalance your resources.

:doc:`ops_upstream`
   Because OpenStack is so, well, open, this chapter is dedicated to
   helping you navigate the community and find out where you can help
   and where you can get help.

:doc:`ops_advanced_configuration`
   Much of OpenStack is driver-oriented, so you can plug in different
   solutions to the base set of services. This chapter describes some
   advanced configuration topics.

:doc:`ops_upgrades`
   This chapter provides upgrade information based on the architectures
   used in this book.

**Back matter:**

:doc:`app_usecases`
   You can read a small selection of use cases from the OpenStack
   community with some technical details and further resources.

:doc:`app_crypt`
   These are shared legendary tales of image disappearances, VM
   massacres, and crazy troubleshooting techniques that result in
   hard-learned lessons and wisdom.

:doc:`app_roadmaps`
   Read about how to track the OpenStack roadmap through the open and
   transparent development processes.

:doc:`app_resources`
   So many OpenStack resources are available online because of the
   fast-moving nature of the project, but there are also resources
   listed here that the authors found helpful while learning
   themselves.

:doc:`common/glossary`
   A list of terms used in this book is included, which is a subset of
   the larger OpenStack glossary available online.

Why and How We Wrote This Book
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We wrote this book because we have deployed and maintained OpenStack
clouds for at least a year and we wanted to share this knowledge with
others. After months of being the point people for an OpenStack cloud,
we also wanted to have a document to hand to our system administrators
so that they'd know how to operate the cloud on a daily basis—both
reactively and proactively. We wanted to provide more detailed
technical information about the decisions that deployers make along the
way.

We wrote this book to help you:

- Design and create an architecture for your first nontrivial OpenStack
  cloud. After you read this guide, you'll know which questions to ask
  and how to organize your compute, networking, and storage resources
  and the associated software packages.

- Perform the day-to-day tasks required to administer a cloud.

We wrote this book in a book sprint, which is a facilitated, rapid
development production method for books. For more information, see the
`BookSprints site <http://www.booksprints.net/>`_. Your authors cobbled
this book together in five days during February 2013, fueled by caffeine
and the best takeout food that Austin, Texas, could offer.

On the first day, we filled white boards with colorful sticky notes to
start to shape this nebulous book about how to architect and operate
clouds.

We wrote furiously from our own experiences and bounced ideas between
each other. At regular intervals we reviewed the shape and organization
of the book and further molded it, leading to what you see today.

The team includes:

Tom Fifield
   After learning about scalability in computing from particle physics
   experiments, such as ATLAS at the Large Hadron Collider (LHC) at
   CERN, Tom worked on OpenStack clouds in production to support the
   Australian public research sector. Tom currently serves as an
   OpenStack community manager and works on OpenStack documentation in
   his spare time.

Diane Fleming
   Diane works on the OpenStack API documentation tirelessly. She
   helped out wherever she could on this project.

Anne Gentle
   Anne is the documentation coordinator for OpenStack and also served
   as an individual contributor to the Google Documentation Summit in
   2011, working with the Open Street Maps team. She has worked on book
   sprints in the past, with FLOSS Manuals' Adam Hyde facilitating.
   Anne lives in Austin, Texas.

Lorin Hochstein
   An academic turned software-developer-slash-operator, Lorin worked
   as the lead architect for Cloud Services at Nimbis Services, where
   he deploys OpenStack for technical computing applications. He has
   been working with OpenStack since the Cactus release. Previously, he
   worked on high-performance computing extensions for OpenStack at
   University of Southern California's Information Sciences Institute
   (USC-ISI).

Adam Hyde
   Adam facilitated this book sprint. He also founded the book sprint
   methodology and is the most experienced book-sprint facilitator
   around. See http://www.booksprints.net for more information. Adam
   founded FLOSS Manuals—a community of some 3,000 individuals
   developing Free Manuals about Free Software. He is also the founder
   and project manager for Booktype, an open source project for
   writing, editing, and publishing books online and in print.

Jonathan Proulx
   Jon has been piloting an OpenStack cloud as a senior technical
   architect at the MIT Computer Science and Artificial Intelligence
   Lab for his researchers to have as much computing power as they
   need. He started contributing to OpenStack documentation and
   reviewing the documentation so that he could accelerate his
   learning.

Everett Toews
   Everett is a developer advocate at Rackspace making OpenStack and
   the Rackspace Cloud easy to use. Sometimes developer, sometimes
   advocate, and sometimes operator, he's built web applications,
   taught workshops, given presentations around the world, and deployed
   OpenStack for production use by academia and business.

Joe Topjian
   Joe has designed and deployed several clouds at Cybera, a nonprofit
   where they are building e-infrastructure to support entrepreneurs
   and local researchers in Alberta, Canada. He also actively maintains
   and operates these clouds as a systems architect, and his
   experiences have generated a wealth of troubleshooting skills for
   cloud environments.

OpenStack community members
   Many individual efforts keep a community book alive. Our community
   members updated content for this book year-round. Also, a year after
   the first sprint, Jon Proulx hosted a second two-day mini-sprint at
   MIT with the goal of updating the book for the latest release. Since
   the book's inception, more than 30 contributors have supported this
   book. We have a tool chain for reviews, continuous builds, and
   translations. Writers and developers continuously review patches,
   enter doc bugs, edit content, and fix doc bugs. We want to recognize
   their efforts!

   The following people have contributed to this book: Akihiro Motoki,
   Alejandro Avella, Alexandra Settle, Andreas Jaeger, Andy McCallum,
   Benjamin Stassart, Chandan Kumar, Chris Ricker, David Cramer, David
   Wittman, Denny Zhang, Emilien Macchi, Gauvain Pocentek, Ignacio
   Barrio, James E. Blair, Jay Clark, Jeff White, Jeremy Stanley, K
   Jonathan Harker, KATO Tomoyuki, Lana Brindley, Laura Alves, Lee Li,
   Lukasz Jernas, Mario B. Codeniera, Matthew Kassawara, Michael Still,
   Monty Taylor, Nermina Miller, Nigel Williams, Phil Hopkins, Russell
   Bryant, Sahid Orentino Ferdjaoui, Sandy Walsh, Sascha Peilicke, Sean
   M. Collins, Sergey Lukjanov, Shilla Saebi, Stephen Gordon, Summer
   Long, Uwe Stuehler, Vaibhav Bhatkar, Veronica Musso, Ying Chun
   "Daisy" Guo, Zhengguang Ou, and ZhiQiang Fan.

How to Contribute to This Book
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The genesis of this book was an in-person event, but now that the book
is in your hands, we want you to contribute to it. OpenStack
documentation follows the coding principles of iterative work, with bug
logging, investigating, and fixing. We also store the source content on
GitHub and invite collaborators through the OpenStack Gerrit
installation, which offers reviews. For the O'Reilly edition of this
book, we are using the company's Atlas system, which also stores source
content on GitHub and enables collaboration among contributors.

Learn more about how to contribute to the OpenStack docs at the `OpenStack
Documentation Contributor
Guide <http://docs.openstack.org/contributor-guide/>`_.

If you find a bug and can't fix it or aren't sure it's really a doc bug,
log a bug at `OpenStack
Manuals <https://bugs.launchpad.net/openstack-manuals>`_. Tag the bug
under Extra options with the ``ops-guide`` tag to indicate that the bug
is in this guide. You can assign the bug to yourself if you know how to
fix it. Also, a member of the OpenStack doc-core team can triage the doc
bug.

Conventions Used in This Book
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following typographical conventions are used in this book:

*Italic*
   Indicates new terms, URLs, email addresses, filenames, and file
   extensions.

``Constant width``
   Used for program listings, as well as within paragraphs to refer to
   program elements such as variable or function names, databases, data
   types, environment variables, statements, and keywords.

``Constant width bold``
   Shows commands or other text that should be typed literally by the
   user.

*Constant width italic*
   Shows text that should be replaced with user-supplied values or by
   values determined by context.

Command prompts
   Commands prefixed with the ``#`` prompt should be executed by the
   ``root`` user. These examples can also be executed using the
   :command:`sudo` command, if available.

   Commands prefixed with the ``$`` prompt can be executed by any user,
   including ``root``.

.. tip::

   This element signifies a tip or suggestion.

.. note::

   This element signifies a general note.

.. warning::

   This element indicates a warning or caution.

See also:

.. toctree::

   common/conventions.rst

@ -22,7 +22,8 @@ done
 # Draft guides
 # This includes guides that we publish from stable branches
 # as versioned like the networking-guide.
-for guide in networking-guide arch-design-draft config-reference; do
+for guide in networking-guide arch-design-draft config-reference \
+    ops-guide; do
     tools/build-rst.sh doc/$guide --build build \
         --target "draft/$guide" $LINKCHECK
 done

@ -31,8 +31,9 @@ function copy_to_branch {
     cp -a publish-docs/draft/* publish-docs/$BRANCH/
     # We don't need this file
     rm -f publish-docs/$BRANCH/draft-index.html
-    # We don't need Contributor Guide
-    rm -rf publish-docs/$BRANCH/contributor-guide
+    # We don't need these draft guides on the branch
+    rm -rf publish-docs/$BRANCH/arch-design-draft
+    rm -rf publish-docs/$BRANCH/ops-guide

     for f in $(find publish-docs/$BRANCH -name "atom.xml"); do
         sed -i -e "s|/draft/|/$BRANCH/|g" $f