Import RST ops-guide.

Publish it as draft for now, do not translate it.

Also do not publish mitaka/arch-design-draft version.

Change-Id: Id25e02aa0b2219fd9141d1354124386cb59bb856
Andreas Jaeger 2016-04-28 14:36:13 -05:00
parent 6da99e7619
commit 2603dc85e5
60 changed files with 15849 additions and 3 deletions


@ -44,4 +44,6 @@ declare -A SPECIAL_BOOKS=(
["releasenotes"]="skip"
# Skip arch design while it's being revised
["arch-design-draft"]="skip"
# Skip ops-guide while it's being revised
["ops-guide"]="skip"
)

doc/ops-guide/setup.cfg

@ -0,0 +1,30 @@
[metadata]
name = openstackopsguide
summary = OpenStack Operations Guide
author = OpenStack
author-email = openstack-docs@lists.openstack.org
home-page = http://docs.openstack.org/
classifier =
Environment :: OpenStack
Intended Audience :: Information Technology
Intended Audience :: System Administrators
License :: OSI Approved :: Apache Software License
Operating System :: POSIX :: Linux
Topic :: Documentation
[global]
setup-hooks =
pbr.hooks.setup_hook
[files]
[build_sphinx]
all_files = 1
build-dir = build
source-dir = source
[wheel]
universal = 1
[pbr]
warnerrors = True

doc/ops-guide/setup.py

@ -0,0 +1,30 @@
#!/usr/bin/env python
# Copyright (c) 2013 Hewlett-Packard Development Company, L.P.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# THIS FILE IS MANAGED BY THE GLOBAL REQUIREMENTS REPO - DO NOT EDIT
import setuptools
# In python < 2.7.4, a lazy loading of package `pbr` will break
# setuptools if some other modules registered functions in `atexit`.
# solution from: http://bugs.python.org/issue15881#msg170215
try:
import multiprocessing # noqa
except ImportError:
pass
setuptools.setup(
setup_requires=['pbr'],
pbr=True)


@ -0,0 +1,51 @@
================
Acknowledgements
================
The OpenStack Foundation supported the creation of this book with plane
tickets to Austin, lodging (including one adventurous evening without
power after a windstorm), and delicious food. For about USD $10,000, we
could collaborate intensively for a week in the same room at the
Rackspace Austin office. The authors are all members of the OpenStack
Foundation, which you can join. Go to the `Foundation web
site <https://www.openstack.org/join>`_.
We want to acknowledge our excellent host Rackers at Rackspace in
Austin:
- Emma Richards of Rackspace Guest Relations took excellent care of our
lunch orders and even set aside a pile of sticky notes that had
fallen off the walls.
- Betsy Hagemeier, a Fanatical Executive Assistant, took care of a room
reshuffle and helped us settle in for the week.
- The Real Estate team at Rackspace in Austin, also known as "The
Victors," were super responsive.
- Adam Powell in Racker IT supplied us with bandwidth each day and
second monitors for those of us needing more screens.
- On Wednesday night we had a fun happy hour with the Austin OpenStack
Meetup group and Racker Katie Schmidt took great care of our group.
We also had some excellent input from outside of the room:
- Tim Bell from CERN gave us feedback on the outline before we started
and reviewed it mid-week.
- Sébastien Han has written excellent blogs and generously gave his
permission for re-use.
- Oisin Feeley read it, made some edits, and provided emailed feedback
right when we asked.
Inside the book sprint room with us each day was our book sprint
facilitator Adam Hyde. Without his tireless support and encouragement,
we would have thought a book of this scope was impossible in five days.
Adam has proven the book sprint method effective again and again. He
creates both tools and faith in collaborative authoring at
`www.booksprints.net <http://www.booksprints.net/>`_.
We couldn't have pulled it off without so much supportive help and
encouragement.


@ -0,0 +1,542 @@
=================================
Tales From the Cryp^H^H^H^H Cloud
=================================
Herein lies a selection of tales from OpenStack cloud operators. Read,
and learn from their wisdom.
Double VLAN
~~~~~~~~~~~
I was on-site in Kelowna, British Columbia, Canada setting up a new
OpenStack cloud. The deployment was fully automated: Cobbler deployed
the OS on the bare metal, bootstrapped it, and Puppet took over from
there. I had run the deployment scenario so many times in practice and
took for granted that everything was working.
On my last day in Kelowna, I was in a conference call from my hotel. In
the background, I was fooling around on the new cloud. I launched an
instance and logged in. Everything looked fine. Out of boredom, I ran
:command:`ps aux` and all of a sudden the instance locked up.
Thinking it was just a one-off issue, I terminated the instance and
launched a new one. By then, the conference call ended and I was off to
the data center.
At the data center, I was finishing up some tasks and remembered the
lock-up. I logged into the new instance and ran :command:`ps aux` again.
It worked. Phew. I decided to run it one more time. It locked up.
After reproducing the problem several times, I came to the unfortunate
conclusion that this cloud did indeed have a problem. Even worse, my
time was up in Kelowna and I had to return back to Calgary.
Where do you even begin troubleshooting something like this? An instance
that just randomly locks up when a command is issued. Is it the image?
Nope—it happens on all images. Is it the compute node? Nope—all nodes.
Is the instance locked up? No! New SSH connections work just fine!
We reached out for help. A networking engineer suggested it was an MTU
issue. Great! MTU! Something to go on! What's MTU and why would it cause
a problem?
MTU is maximum transmission unit. It specifies the maximum number of
bytes that the interface accepts for each packet. If two interfaces have
two different MTUs, bytes might get chopped off and weird things
happen—such as random session lockups.
.. note::

   Not all packets have a size of 1500. Running the :command:`ls` command over
   SSH might only create a single packet less than 1500 bytes. However,
   running a command with heavy output, such as :command:`ps aux`, requires
   several packets of 1500 bytes.
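As a rough illustration (not part of the original troubleshooting session),
you can check an interface's MTU and probe the path MTU from a Linux host with
commands along these lines; the interface name and target address are
placeholders:

.. code-block:: console

   # Show the configured MTU on an interface (bond0 is an example name)
   $ ip link show bond0

   # Send a full 1500-byte packet (1472 bytes of ICMP data plus 28 bytes of
   # headers) with fragmentation prohibited; an error here suggests a smaller
   # MTU somewhere along the path
   $ ping -M do -s 1472 192.0.2.1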
OK, so where is the MTU issue coming from? Why haven't we seen this in
any other deployment? What's new in this situation? Well, new data
center, new uplink, new switches, new model of switches, new servers,
first time using this model of servers… so, basically everything was
new. Wonderful. We toyed around with raising the MTU at various areas:
the switches, the NICs on the compute nodes, the virtual NICs in the
instances, we even had the data center raise the MTU for our uplink
interface. Some changes worked, some didn't. This line of
troubleshooting didn't feel right, though. We shouldn't have to be
changing the MTU in these areas.
As a last resort, our network admin (Alvaro) and I sat down with
four terminal windows, a pencil, and a piece of paper. In one window, we
ran ping. In the second window, we ran ``tcpdump`` on the cloud
controller. In the third, ``tcpdump`` on the compute node. And the fourth
had ``tcpdump`` on the instance. For background, this cloud was a
multi-node, non-multi-host setup.
One cloud controller acted as a gateway to all compute nodes.
VlanManager was used for the network config. This means that the cloud
controller and all compute nodes had a different VLAN for each OpenStack
project. We used the :option:`-s` option of ``ping`` to change the packet
size. We watched as sometimes packets would fully return, sometimes they'd
only make it out and never back in, and sometimes the packets would stop at a
random point. We changed ``tcpdump`` to start displaying the hex dump of
the packet. We pinged between every combination of outside, controller,
compute, and instance.
Finally, Alvaro noticed something. When a packet from the outside hits
the cloud controller, it should not be configured with a VLAN. We
verified this as true. When the packet went from the cloud controller to
the compute node, it should only have a VLAN if it was destined for an
instance. This was still true. When the ping reply was sent from the
instance, it should be in a VLAN. True. When it came back to the cloud
controller and on its way out to the Internet, it should no longer have
a VLAN. False. Uh oh. It looked as though the VLAN part of the packet
was not being removed.
That made no sense.
While bouncing this idea around in our heads, I was randomly typing
commands on the compute node:
.. code-block:: console

   $ ip a
   10: vlan100@vlan20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br100 state UP
"Hey Alvaro, can you run a VLAN on top of a VLAN?"
"If you did, you'd add an extra 4 bytes to the packet…"
Then it all made sense…
.. code-block:: console

   $ grep vlan_interface /etc/nova/nova.conf
   vlan_interface=vlan20
In ``nova.conf``, ``vlan_interface`` specifies what interface OpenStack
should attach all VLANs to. The correct setting should have been:
.. code-block:: ini

   vlan_interface=bond0

This is the server's bonded NIC.
vlan20 is the VLAN that the data center gave us for outgoing Internet
access. It's a correct VLAN and is also attached to bond0.
By mistake, I configured OpenStack to attach all tenant VLANs to vlan20
instead of bond0, thereby stacking one VLAN on top of another. This added
an extra 4 bytes to each packet, so a 1504-byte packet would be sent out,
which caused problems when it arrived at an interface that only accepted
1500 bytes.
As soon as this setting was fixed, everything worked.
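In hindsight, the stacking was visible in the interface name itself: in
``vlan100@vlan20``, the part after the ``@`` is the parent device. A quick,
hypothetical way to check for this kind of misconfiguration:

.. code-block:: console

   # Show detailed link information; for a VLAN interface, the parent device
   # appears after the "@" and a parent that is itself a VLAN means stacking
   $ ip -d link show vlan100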
"The Issue"
~~~~~~~~~~~
At the end of August 2012, a post-secondary school in Alberta, Canada
migrated its infrastructure to an OpenStack cloud. As luck would have
it, within the first day or two of it running, one of their servers just
disappeared from the network. Blip. Gone.
After restarting the instance, everything was back up and running. We
reviewed the logs and saw that at some point, network communication
stopped and then everything went idle. We chalked this up to a random
occurrence.
A few nights later, it happened again.
We reviewed both sets of logs. The one thing that stood out the most was
DHCP. At the time, OpenStack, by default, set DHCP leases for one minute
(it's now two minutes). This means that every instance contacts the
cloud controller (DHCP server) to renew its fixed IP. For some reason,
this instance could not renew its IP. We correlated the instance's logs
with the logs on the cloud controller and put together a conversation:
#. Instance tries to renew IP.
#. Cloud controller receives the renewal request and sends a response.
#. Instance "ignores" the response and re-sends the renewal request.
#. Cloud controller receives the second request and sends a new
response.
#. Instance begins sending a renewal request to ``255.255.255.255``
since it hasn't heard back from the cloud controller.
#. The cloud controller receives the ``255.255.255.255`` request and
sends a third response.
#. The instance finally gives up.
With this information in hand, we were sure that the problem had to do
with DHCP. We thought that for some reason, the instance wasn't getting
a new IP address and with no IP, it shut itself off from the network.
A quick Google search turned up this: `DHCP lease errors in VLAN
mode <https://lists.launchpad.net/openstack/msg11696.html>`_
(https://lists.launchpad.net/openstack/msg11696.html) which further
supported our DHCP theory.
An initial idea was to just increase the lease time. If the instance
only renewed once every week, the chances of this problem happening
would be tremendously smaller than every minute. This didn't solve the
problem, though. It was just covering the problem up.
We decided to have ``tcpdump`` run on this instance and see if we could
catch it in action again. Sure enough, we did.
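The exact capture command is not recorded here, but limiting ``tcpdump`` to
DHCP traffic keeps the output manageable; something along these lines, with
the interface name as an assumption:

.. code-block:: console

   # Capture only DHCP traffic on the instance's interface and save it for
   # later review; eth0 is an assumed interface name
   $ sudo tcpdump -i eth0 -n -s 0 -w /tmp/dhcp.pcap port 67 or port 68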
The ``tcpdump`` looked very, very weird. In short, it looked as though
network communication stopped before the instance tried to renew its IP.
Since there is so much DHCP chatter from a one minute lease, it's very
hard to confirm it, but even with only milliseconds difference between
packets, if one packet arrives first, it arrived first, and if that
packet reported network issues, then it had to have happened before
DHCP.
Additionally, this instance in question was responsible for a very, very
large backup job each night. While "The Issue" (as we were now calling
it) didn't happen exactly when the backup happened, it was close enough
(a few hours) that we couldn't ignore it.
Further days go by and we catch The Issue in action more and more. We
find that dhclient is not running after The Issue happens. Now we're
back to thinking it's a DHCP issue. Running
``/etc/init.d/networking restart`` brings everything back up and running.
Ever have one of those days where all of a sudden you get the Google
results you were looking for? Well, that's what happened here. I was
looking for information on dhclient and why it dies when it can't renew
its lease and all of a sudden I found a bunch of OpenStack and dnsmasq
discussions that were identical to the problem we were seeing!
`Problem with Heavy Network IO and
Dnsmasq <http://www.gossamer-threads.com/lists/openstack/operators/18197>`_
(http://www.gossamer-threads.com/lists/openstack/operators/18197)
`instances losing IP address while running, due to No
DHCPOFFER <http://www.gossamer-threads.com/lists/openstack/dev/14696>`_
(http://www.gossamer-threads.com/lists/openstack/dev/14696)
Seriously, Google.
This bug report was the key to everything: `KVM images lose connectivity
with bridged
network <https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/997978>`_
(https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/997978)
It was funny to read the report. It was full of people who had some
strange network problem but didn't quite explain it in the same way.
So it was a qemu/kvm bug.
At the same time of finding the bug report, a co-worker was able to
successfully reproduce The Issue! How? He used ``iperf`` to spew a ton
of bandwidth at an instance. Within 30 minutes, the instance just
disappeared from the network.
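If you want to attempt a similar reproduction, a minimal ``iperf`` setup looks
roughly like this; the address and duration are illustrative only:

.. code-block:: console

   # Inside the suspect instance: listen for test traffic
   $ iperf -s

   # From another host: push traffic at the instance for 30 minutes
   $ iperf -c 203.0.113.10 -t 1800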
Armed with a patched qemu and a way to reproduce, we set out to see if
we had finally solved The Issue. After 48 hours straight of hammering the
instance with bandwidth, we were confident. The rest is history. You can
search the bug report for "joe" to find my comments and actual tests.
Disappearing Images
~~~~~~~~~~~~~~~~~~~
At the end of 2012, Cybera (a nonprofit with a mandate to oversee the
development of cyberinfrastructure in Alberta, Canada) deployed an
updated OpenStack cloud for their `DAIR
project <http://www.canarie.ca/cloud/>`_
(http://www.canarie.ca/en/dair-program/about). A few days into
production, a compute node locks up. Upon rebooting the node, I checked
to see what instances were hosted on that node so I could boot them on
behalf of the customer. Luckily, only one instance.
The :command:`nova reboot` command wasn't working, so I used :command:`virsh`,
but it immediately came back with an error saying it was unable to find the
backing disk. In this case, the backing disk is the Glance image that is
copied to ``/var/lib/nova/instances/_base`` when the image is used for
the first time. Why couldn't it find it? I checked the directory and
sure enough it was gone.
I reviewed the ``nova`` database and saw the instance's entry in the
``nova.instances`` table. The image that the instance was using matched
what virsh was reporting, so no inconsistency there.
I checked Glance and noticed that this image was a snapshot that the
user created. At least that was good news—this user would have been the
only user affected.
Finally, I checked StackTach and reviewed the user's events. They had
created and deleted several snapshots—most likely experimenting.
Although the timestamps didn't match up, my conclusion was that they
launched their instance and then deleted the snapshot and it was somehow
removed from ``/var/lib/nova/instances/_base``. None of that made sense,
but it was the best I could come up with.
It turns out the reason that this compute node locked up was a hardware
issue. We removed it from the DAIR cloud and called Dell to have it
serviced. Dell arrived and began working. Somehow or another (or a fat
finger), a different compute node was bumped and rebooted. Great.
When this node fully booted, I ran through the same scenario of seeing
what instances were running so I could turn them back on. There were a
total of four. Three booted and one gave an error. It was the same error
as before: unable to find the backing disk. Seriously, what?
Again, it turns out that the image was a snapshot. The three other
instances that successfully started were standard cloud images. Was it a
problem with snapshots? That didn't make sense.
A note about DAIR's architecture: ``/var/lib/nova/instances`` is a
shared NFS mount. This means that all compute nodes have access to it,
which includes the ``_base`` directory. Another centralized area is
``/var/log/rsyslog`` on the cloud controller. This directory collects
all OpenStack logs from all compute nodes. I wondered if there were any
entries for the file that :command:`virsh` is reporting:
.. code-block:: console

   dair-ua-c03/nova.log:Dec 19 12:10:59 dair-ua-c03
   2012-12-19 12:10:59 INFO nova.virt.libvirt.imagecache
   [-] Removing base file:
   /var/lib/nova/instances/_base/7b4783508212f5d242cbf9ff56fb8d33b4ce6166_10
Ah-hah! So OpenStack was deleting it. But why?
A feature was introduced in Essex to periodically check and see if there
were any ``_base`` files not in use. If there were, OpenStack Compute
would delete them. This idea sounds innocent enough and has some good
qualities to it. But how did this feature end up turned on? It was
disabled by default in Essex. As it should be. It was `decided to be
turned on in Folsom <https://bugs.launchpad.net/nova/+bug/1029674>`_
(https://bugs.launchpad.net/nova/+bug/1029674). I cannot emphasize
enough that:
*Actions which delete things should not be enabled by default.*
Disk space is cheap these days. Data recovery is not.
Secondly, DAIR's shared ``/var/lib/nova/instances`` directory
contributed to the problem. Since all compute nodes have access to this
directory, all compute nodes periodically review the ``_base`` directory.
If there is only one instance using an image, and the node that the
instance is on is down for a few minutes, it won't be able to mark the
image as still in use. Therefore, the image seems like it's not in use
and is deleted. When the compute node comes back online, the instance
hosted on that node is unable to start.
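If you would rather keep cached base files around, the cleanup can be disabled
in ``nova.conf``. As a sketch only (the option name below is the one used by
later nova releases, so verify it against your version), the desired state
looks like this:

.. code-block:: console

   # Confirm that automatic removal of unused base files is turned off;
   # the second line is the value you want to see in nova.conf
   $ grep remove_unused_base_images /etc/nova/nova.conf
   remove_unused_base_images = False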
The Valentine's Day Compute Node Massacre
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Although the title of this story is much more dramatic than the actual
event, I don't think, or hope, that I'll have the opportunity to use
"Valentine's Day Massacre" again in a title.
This past Valentine's Day, I received an alert that a compute node was
no longer available in the cloud—meaning,
.. code-block:: console

   $ nova service-list
showed this particular node in a down state.
I logged into the cloud controller and was able to both ``ping`` and SSH
into the problematic compute node which seemed very odd. Usually if I
receive this type of alert, the compute node has totally locked up and
would be inaccessible.
After a few minutes of troubleshooting, I saw the following details:
- A user recently tried launching a CentOS instance on that node
- This user was the only user on the node (new node)
- The load shot up to 8 right before I received the alert
- The bonded 10gb network device (bond0) was in a DOWN state
- The 1gb NIC was still alive and active
I looked at the status of both NICs in the bonded pair and saw that
neither was able to communicate with the switch port. Seeing as how each
NIC in the bond is connected to a separate switch, I thought that the
chance of a switch port dying on each switch at the same time was quite
improbable. I concluded that the 10gb dual port NIC had died and needed
to be replaced. I created a ticket for the hardware support department at the
data center where the node was hosted. I felt lucky that this was a new
node and no one else was hosted on it yet.
An hour later I received the same alert, but for another compute node.
Crap. OK, now there's definitely a problem going on. Just like the
original node, I was able to log in by SSH. The bond0 NIC was DOWN but
the 1gb NIC was active.
And the best part: the same user had just tried creating a CentOS
instance. What?
I was totally confused at this point, so I texted our network admin to
see if he was available to help. He logged in to both switches and
immediately saw the problem: the switches detected spanning tree packets
coming from the two compute nodes and immediately shut the ports down to
prevent spanning tree loops:
.. code-block:: console

   Feb 15 01:40:18 SW-1 Stp: %SPANTREE-4-BLOCK_BPDUGUARD: Received BPDU packet on Port-Channel35 with BPDU guard enabled. Disabling interface. (source mac fa:16:3e:24:e7:22)
   Feb 15 01:40:18 SW-1 Ebra: %ETH-4-ERRDISABLE: bpduguard error detected on Port-Channel35.
   Feb 15 01:40:18 SW-1 Mlag: %MLAG-4-INTF_INACTIVE_LOCAL: Local interface Port-Channel35 is link down. MLAG 35 is inactive.
   Feb 15 01:40:18 SW-1 Ebra: %LINEPROTO-5-UPDOWN: Line protocol on Interface Port-Channel35 (Server35), changed state to down
   Feb 15 01:40:19 SW-1 Stp: %SPANTREE-6-INTERFACE_DEL: Interface Port-Channel35 has been removed from instance MST0
   Feb 15 01:40:19 SW-1 Ebra: %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet35 (Server35), changed state to down
He re-enabled the switch ports and the two compute nodes immediately
came back to life.
Unfortunately, this story has an open ending... we're still looking into
why the CentOS image was sending out spanning tree packets. Further,
we're researching a proper way to prevent this from happening.
It's a bigger issue than one might think. While it's extremely important
for switches to prevent spanning tree loops, it's very problematic to
have an entire compute node be cut from the network when this happens.
If a compute node is hosting 100 instances and one of them sends a
spanning tree packet, that instance has effectively DDOS'd the other 99
instances.
This is an ongoing and hot topic in networking circles—especially with
the rise of virtualization and virtual switches.
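One way to at least identify the offending instance before the switch reacts
is to watch for BPDUs on the compute node itself. This is only a sketch; the
tap interface name is a placeholder:

.. code-block:: console

   # BPDUs are sent to the well-known bridge group MAC address; any frames
   # seen on an instance's tap interface identify the guest generating them
   $ sudo tcpdump -i tap0 -n -e ether dst 01:80:c2:00:00:00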
Down the Rabbit Hole
~~~~~~~~~~~~~~~~~~~~
Users being able to retrieve console logs from running instances is a
boon for support—many times they can figure out what's going on inside
their instance and fix it without bothering you.
Unfortunately, sometimes overzealous logging of failures can cause
problems of its own.
A report came in: VMs were launching slowly, or not at all. Cue the
standard checks—nothing in Nagios, but there was a spike in network
traffic towards the current master of our RabbitMQ cluster. Investigation
started, but soon the other parts of the queue cluster were leaking
memory like a sieve. Then the alert came in—the master Rabbit server
went down and connections failed over to the slave.
At that time, our control services were hosted by another team and we
didn't have much debugging information to determine what was going on
with the master, and we could not reboot it. That team noted that it
failed without alert, but managed to reboot it. After an hour, the
cluster had returned to its normal state and we went home for the day.
Continuing the diagnosis the next morning was kick-started by another
identical failure. We quickly got the message queue running again, and
tried to work out why Rabbit was suffering from so much network traffic.
Enabling debug logging on nova-api quickly brought understanding. A
``tail -f /var/log/nova/nova-api.log`` was scrolling by faster
than we'd ever seen before. CTRL+C on that and we could plainly see the
contents of a system log spewing failures over and over again - a system
log from one of our users' instances.
After finding the instance ID we headed over to
``/var/lib/nova/instances`` to find the ``console.log``:
.. code-block:: console

   adm@cc12:/var/lib/nova/instances/instance-00000e05# wc -l console.log
   92890453 console.log
   adm@cc12:/var/lib/nova/instances/instance-00000e05# ls -sh console.log
   5.5G console.log
Sure enough, the user had been periodically refreshing the console log
page on the dashboard and the 5G file was traversing the Rabbit cluster
to get to the dashboard.
We called them and asked them to stop for a while, and they were happy
to abandon the horribly broken VM. After that, we started monitoring the
size of console logs.
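The monitoring itself can be as simple as a periodic job that flags oversized
console logs; the path and threshold below are only an example:

.. code-block:: console

   # Report any instance console logs larger than 1 GB
   $ find /var/lib/nova/instances -name console.log -size +1G -exec ls -sh {} \;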
To this day, `the issue <https://bugs.launchpad.net/nova/+bug/832507>`__
(https://bugs.launchpad.net/nova/+bug/832507) doesn't have a permanent
resolution, but we look forward to the discussion at the next summit.
Havana Haunted by the Dead
~~~~~~~~~~~~~~~~~~~~~~~~~~
Felix Lee of Academia Sinica Grid Computing Centre in Taiwan contributed
this story.
I just upgraded OpenStack from Grizzly to Havana 2013.2-2 using the RDO
repository and everything was running pretty well—except the EC2 API.
I noticed that the API would suffer from a heavy load and respond slowly
to particular EC2 requests such as ``RunInstances``.
Output from ``/var/log/nova/nova-api.log`` on :term:`Havana`:
.. code-block:: console

   2014-01-10 09:11:45.072 129745 INFO nova.ec2.wsgi.server
   [req-84d16d16-3808-426b-b7af-3b90a11b83b0
   0c6e7dba03c24c6a9bce299747499e8a 7052bd6714e7460caeb16242e68124f9]
   117.103.103.29 "GET
   /services/Cloud?AWSAccessKeyId=[something]&Action=RunInstances&ClientToken=[something]&ImageId=ami-00000001&InstanceInitiatedShutdownBehavior=terminate...
   HTTP/1.1" status: 200 len: 1109 time: 138.5970151
This request took over two minutes to process, but executed quickly on
another co-existing Grizzly deployment using the same hardware and
system configuration.
Output from ``/var/log/nova/nova-api.log`` on :term:`Grizzly`:
.. code-block:: console

   2014-01-08 11:15:15.704 INFO nova.ec2.wsgi.server
   [req-ccac9790-3357-4aa8-84bd-cdaab1aa394e
   ebbd729575cb404081a45c9ada0849b7 8175953c209044358ab5e0ec19d52c37]
   117.103.103.29 "GET
   /services/Cloud?AWSAccessKeyId=[something]&Action=RunInstances&ClientToken=[something]&ImageId=ami-00000007&InstanceInitiatedShutdownBehavior=terminate...
   HTTP/1.1" status: 200 len: 931 time: 3.9426181
While monitoring system resources, I noticed a significant increase in
memory consumption while the EC2 API processed this request. I thought
it wasn't handling memory properly—possibly not releasing memory. If the
API received several of these requests, memory consumption quickly grew
until the system ran out of RAM and began using swap. Each node has 48
GB of RAM and the "nova-api" process would consume all of it within
minutes. Once this happened, the entire system would become unusably
slow until I restarted the nova-api service.
So, I found myself wondering what changed in the EC2 API on Havana that
might cause this to happen. Was it a bug or a normal behavior that I now
need to work around?
After digging into the nova (OpenStack Compute) code, I noticed two
areas in ``api/ec2/cloud.py`` potentially impacting my system:
.. code-block:: python

   instances = self.compute_api.get_all(context,
                                        search_opts=search_opts,
                                        sort_dir='asc')

   sys_metas = self.compute_api.get_all_system_metadata(
       context, search_filts=[{'key': ['EC2_client_token']},
                              {'value': [client_token]}])
Since my database contained many records—over 1 million metadata records
and over 300,000 instance records in "deleted" or "errored" states—each
search took a long time. I decided to clean up the database by first
archiving a copy for backup and then performing some deletions using the
MySQL client. For example, I ran the following SQL command to remove
rows of instances deleted for over a year:
.. code-block:: console

   mysql> delete from nova.instances where deleted=1 and terminated_at < (NOW() - INTERVAL 1 YEAR);
Performance increased greatly after deleting the old records and my new
deployment continues to behave well.
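Depending on your release, ``nova-manage`` may be able to move soft-deleted
rows into shadow tables for you instead of hand-written SQL; if the command
exists in your version, it looks something like this:

.. code-block:: console

   # Move up to 1000 soft-deleted rows per table into the shadow tables
   $ nova-manage db archive_deleted_rows --max_rows 1000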


@ -0,0 +1,62 @@
=========
Resources
=========
OpenStack
~~~~~~~~~
- `Installation Guide for openSUSE 13.2 and SUSE Linux Enterprise
Server 12 <http://docs.openstack.org/liberty/install-guide-obs/>`_
- `Installation Guide for Red Hat Enterprise Linux 7, CentOS 7, and
Fedora 22 <http://docs.openstack.org/liberty/install-guide-rdo/>`_
- `Installation Guide for Ubuntu 14.04 (LTS)
Server <http://docs.openstack.org/liberty/install-guide-ubuntu/>`_
- `OpenStack Administrator Guide <http://docs.openstack.org/admin-guide/>`_
- `OpenStack Cloud Computing Cookbook (Packt
Publishing) <http://www.packtpub.com/openstack-cloud-computing-cookbook-second-edition/book>`_
Cloud (General)
~~~~~~~~~~~~~~~
- `“The NIST Definition of Cloud
Computing” <http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf>`_
Python
~~~~~~
- `Dive Into Python (Apress) <http://www.diveintopython.net/>`_
Networking
~~~~~~~~~~
- `TCP/IP Illustrated, Volume 1: The Protocols, 2/E
(Pearson) <http://www.pearsonhighered.com/educator/product/TCPIP-Illustrated-Volume-1-The-Protocols/9780321336316.page>`_
- `The TCP/IP Guide (No Starch
Press) <http://www.nostarch.com/tcpip.htm>`_
- `“A tcpdump Tutorial and
Primer” <http://danielmiessler.com/study/tcpdump/>`_
Systems Administration
~~~~~~~~~~~~~~~~~~~~~~
- `UNIX and Linux Systems Administration Handbook (Prentice
Hall) <http://www.admin.com/>`_
Virtualization
~~~~~~~~~~~~~~
- `The Book of Xen (No Starch
Press) <http://www.nostarch.com/xen.htm>`_
Configuration Management
~~~~~~~~~~~~~~~~~~~~~~~~
- `Puppet Labs Documentation <http://docs.puppetlabs.com/>`_
- `Pro Puppet (Apress) <http://www.apress.com/9781430230571>`_


@ -0,0 +1,435 @@
=====================
Working with Roadmaps
=====================
The good news: OpenStack has unprecedented transparency when it comes to
providing information about what's coming up. The bad news: each release
moves very quickly. The purpose of this appendix is to highlight some of
the useful pages to track, and take an educated guess at what is coming
up in the next release and perhaps further afield.
OpenStack follows a six month release cycle, typically releasing in
April/May and October/November each year. At the start of each cycle,
the community gathers in a single location for a design summit. At the
summit, the features for the coming releases are discussed, prioritized,
and planned. The below figure shows an example release cycle, with dates
showing milestone releases, code freeze, and string freeze dates, along
with an example of when the summit occurs. Milestones are interim releases
within the cycle that are available as packages for download and
testing. Code freeze is putting a stop to adding new features to the
release. String freeze is putting a stop to changing any strings within
the source code.
.. image:: figures/osog_ac01.png
   :width: 100%
Information Available to You
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There are several good sources of information available that you can use
to track your OpenStack development desires.
Release notes are maintained on the OpenStack wiki, and also shown here:
.. list-table::
:widths: 25 25 25 25
:header-rows: 1
* - Series
- Status
- Releases
- Date
* - Liberty
- `Under Development
<https://wiki.openstack.org/wiki/Liberty_Release_Schedule>`_
- 2015.2
- Oct, 2015
* - Kilo
- `Current stable release, security-supported
<https://wiki.openstack.org/wiki/Kilo_Release_Schedule>`_
- `2015.1 <https://wiki.openstack.org/wiki/ReleaseNotes/Kilo>`_
- Apr 30, 2015
* - Juno
- `Security-supported
<https://wiki.openstack.org/wiki/Juno_Release_Schedule>`_
- `2014.2 <https://wiki.openstack.org/wiki/ReleaseNotes/Juno>`_
- Oct 16, 2014
* - Icehouse
- `End-of-life
<https://wiki.openstack.org/wiki/Icehouse_Release_Schedule>`_
- `2014.1 <https://wiki.openstack.org/wiki/ReleaseNotes/Icehouse>`_
- Apr 17, 2014
* -
-
- `2014.1.1 <https://wiki.openstack.org/wiki/ReleaseNotes/2014.1.1>`_
- Jun 9, 2014
* -
-
- `2014.1.2 <https://wiki.openstack.org/wiki/ReleaseNotes/2014.1.2>`_
- Aug 8, 2014
* -
-
- `2014.1.3 <https://wiki.openstack.org/wiki/ReleaseNotes/2014.1.3>`_
- Oct 2, 2014
* - Havana
- End-of-life
- `2013.2 <https://wiki.openstack.org/wiki/ReleaseNotes/Havana>`_
- Oct 17, 2013
* -
-
- `2013.2.1 <https://wiki.openstack.org/wiki/ReleaseNotes/2013.2.1>`_
- Dec 16, 2013
* -
-
- `2013.2.2 <https://wiki.openstack.org/wiki/ReleaseNotes/2013.2.2>`_
- Feb 13, 2014
* -
-
- `2013.2.3 <https://wiki.openstack.org/wiki/ReleaseNotes/2013.2.3>`_
- Apr 3, 2014
* -
-
- `2013.2.4 <https://wiki.openstack.org/wiki/ReleaseNotes/2013.2.4>`_
- Sep 22, 2014
* - Grizzly
- End-of-life
- `2013.1 <https://wiki.openstack.org/wiki/ReleaseNotes/Grizzly>`_
- Apr 4, 2013
* -
-
- `2013.1.1 <https://wiki.openstack.org/wiki/ReleaseNotes/2013.1.1>`_
- May 9, 2013
* -
-
- `2013.1.2 <https://wiki.openstack.org/wiki/ReleaseNotes/2013.1.2>`_
- Jun 6, 2013
* -
-
- `2013.1.3 <https://wiki.openstack.org/wiki/ReleaseNotes/2013.1.3>`_
- Aug 8, 2013
* -
-
- `2013.1.4 <https://wiki.openstack.org/wiki/ReleaseNotes/2013.1.4>`_
- Oct 17, 2013
* -
-
- `2013.1.5 <https://wiki.openstack.org/wiki/ReleaseNotes/2013.1.5>`_
- Mar 20, 2015
* - Folsom
- End-of-life
- `2012.2 <https://wiki.openstack.org/wiki/ReleaseNotes/Folsom>`_
- Sep 27, 2012
* -
-
- `2012.2.1 <https://wiki.openstack.org/wiki/ReleaseNotes/2012.2.1>`_
- Nov 29, 2012
* -
-
- `2012.2.2 <https://wiki.openstack.org/wiki/ReleaseNotes/2012.2.2>`_
- Dec 13, 2012
* -
-
- `2012.2.3 <https://wiki.openstack.org/wiki/ReleaseNotes/2012.2.3>`_
- Jan 31, 2013
* -
-
- `2012.2.4 <https://wiki.openstack.org/wiki/ReleaseNotes/2012.2.4>`_
- Apr 11, 2013
* - Essex
- End-of-life
- `2012.1 <https://wiki.openstack.org/wiki/ReleaseNotes/Essex>`_
- Apr 5, 2012
* -
-
- `2012.1.1 <https://wiki.openstack.org/wiki/ReleaseNotes/2012.1.1>`_
- Jun 22, 2012
* -
-
- `2012.1.2 <https://wiki.openstack.org/wiki/ReleaseNotes/2012.1.2>`_
- Aug 10, 2012
* -
-
- `2012.1.3 <https://wiki.openstack.org/wiki/ReleaseNotes/2012.1.3>`_
- Oct 12, 2012
* - Diablo
- Deprecated
- `2011.3 <https://wiki.openstack.org/wiki/ReleaseNotes/Diablo>`_
- Sep 22, 2011
* -
-
- `2011.3.1 <https://wiki.openstack.org/wiki/ReleaseNotes/2011.3.1>`_
- Jan 19, 2012
* - Cactus
- Deprecated
- `2011.2 <https://wiki.openstack.org/wiki/ReleaseNotes/Cactus>`_
- Apr 15, 2011
* - Bexar
- Deprecated
- `2011.1 <https://wiki.openstack.org/wiki/ReleaseNotes/Bexar>`_
- Feb 3, 2011
* - Austin
- Deprecated
- `2010.1 <https://wiki.openstack.org/wiki/ReleaseNotes/Austin>`_
- Oct 21, 2010
Here are some other resources:
- `A breakdown of current features under development, with their target
milestone <http://status.openstack.org/release/>`_
- `A list of all features, including those not yet under
development <https://blueprints.launchpad.net/openstack>`_
- `Rough-draft design discussions ("etherpads") from the last design
summit <https://wiki.openstack.org/wiki/Summit/Kilo/Etherpads>`_
- `List of individual code changes under
review <https://review.openstack.org/>`_
Influencing the Roadmap
~~~~~~~~~~~~~~~~~~~~~~~
OpenStack truly welcomes your ideas (and contributions) and highly
values feedback from real-world users of the software. By learning a
little about the process that drives feature development, you can
participate and perhaps get the additions you desire.
Feature requests typically start their life in Etherpad, a collaborative
editing tool, which is used to take coordinating notes at a design
summit session specific to the feature. This then leads to the creation
of a blueprint on the Launchpad site for the particular project, which
is used to describe the feature more formally. Blueprints are then
approved by project team members, and development can begin.
Therefore, the fastest way to get your feature request up for
consideration is to create an Etherpad with your ideas and propose a
session to the design summit. If the design summit has already passed,
you may also create a blueprint directly. Read this `blog post about how
to work with blueprints
<http://vmartinezdelacruz.com/how-to-work-with-blueprints-without-losing-your-mind/>`_
from the perspective of Victoria Martínez, a developer intern.
The roadmap for the next release as it is developed can be seen at
`Releases <http://releases.openstack.org>`_.
To determine the potential features going into future releases, or to
look at features implemented previously, take a look at the existing
blueprints such as `OpenStack Compute (nova)
Blueprints <https://blueprints.launchpad.net/nova>`_, `OpenStack
Identity (keystone)
Blueprints <https://blueprints.launchpad.net/keystone>`_, and release
notes.
Aside from the direct-to-blueprint pathway, there is another very
well-regarded mechanism to influence the development roadmap: the user
survey. Found at http://openstack.org/user-survey, it allows you to
provide details of your deployments and needs, anonymously by default.
Each cycle, the user committee analyzes the results and produces a
report, including providing specific information to the technical
committee and project team leads.
Aspects to Watch
~~~~~~~~~~~~~~~~
You want to keep an eye on the areas improving within OpenStack. The
best way to "watch" roadmaps for each project is to look at the
blueprints that are being approved for work on milestone releases. You
can also learn from PTL webinars that follow the OpenStack summits twice
a year.
Driver Quality Improvements
---------------------------
A major quality push has occurred across drivers and plug-ins in Block
Storage, Compute, and Networking. Particularly, developers of Compute
and Networking drivers that require proprietary or hardware products are
now required to provide an automated external testing system for use
during the development process.
Easier Upgrades
---------------
One of the most requested features since OpenStack began (for components
other than Object Storage, which tends to "just work"): easier upgrades.
In all recent releases internal messaging communication is versioned,
meaning services can theoretically drop back to backward-compatible
behavior. This allows you to run later versions of some components,
while keeping older versions of others.
In addition, database migrations are now tested with the Turbo Hipster
tool. This tool tests database migration performance on copies of
real-world user databases.
These changes have facilitated the first proper OpenStack upgrade guide,
found in :doc:`ops_upgrades`, and will continue to improve in the next
release.
Deprecation of Nova Network
---------------------------
With the introduction of the full software-defined networking stack
provided by OpenStack Networking (neutron) in the Folsom release,
development effort on the initial networking code that remains part of
the Compute component has gradually lessened. While many still use
``nova-network`` in production, there has been a long-term plan to
remove the code in favor of the more flexible and full-featured
OpenStack Networking.
An attempt was made to deprecate ``nova-network`` during the Havana
release, which was aborted due to the lack of equivalent functionality
(such as the FlatDHCP multi-host high-availability mode mentioned in
this guide), lack of a migration path between versions, insufficient
testing, and simplicity when used for the more straightforward use cases
``nova-network`` traditionally supported. Though significant effort has
been made to address these concerns, ``nova-network`` was not
deprecated in the Juno release. In addition, to a limited degree,
patches to ``nova-network`` have again begun to be accepted, such as
adding a per-network settings feature and SR-IOV support in Juno.
This leaves you with an important point of decision when designing your
cloud. OpenStack Networking is robust enough to use with a small number
of limitations (performance issues in some scenarios, only basic high
availability of layer 3 systems) and provides many more features than
``nova-network``. However, if you do not have the more complex use cases
that can benefit from fuller software-defined networking capabilities,
or are uncomfortable with the new concepts introduced, ``nova-network``
may continue to be a viable option for the next 12 months.
Similarly, if you have an existing cloud and are looking to upgrade from
``nova-network`` to OpenStack Networking, you should have the option to
delay the upgrade for this period of time. However, each release of
OpenStack brings significant new innovation, and regardless of your use
of networking methodology, it is likely best to begin planning for an
upgrade within a reasonable timeframe of each release.
As mentioned, there's currently no way to cleanly migrate from
``nova-network`` to neutron. We recommend that you keep a migration in
mind and consider what that process might involve for when a proper
migration path is released.
Distributed Virtual Router
~~~~~~~~~~~~~~~~~~~~~~~~~~
One of the long-time complaints surrounding OpenStack Networking was the
lack of high availability for the layer 3 components. The Juno release
introduced Distributed Virtual Router (DVR), which aims to solve this
problem.
Early indications are that it does do this well for a base set of
scenarios, such as using the ML2 plug-in with Open vSwitch, one flat
external network and VXLAN tenant networks. However, it does appear that
there are problems with the use of VLANs, IPv6, Floating IPs, high
north-south traffic scenarios and large numbers of compute nodes. It is
expected these will improve significantly with the next release, but bug
reports on specific issues are highly desirable.
Replacement of Open vSwitch Plug-in with Modular Layer 2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Modular Layer 2 plug-in is a framework allowing OpenStack Networking
to simultaneously utilize the variety of layer-2 networking technologies
found in complex real-world data centers. It currently works with the
existing Open vSwitch, Linux Bridge, and Hyper-V L2 agents and is
intended to replace and deprecate the monolithic plug-ins associated
with those L2 agents.
New API Versions
~~~~~~~~~~~~~~~~
The third version of the Compute API was broadly discussed and worked on
during the Havana and Icehouse release cycles. Current discussions
indicate that the V2 API will remain for many releases, and the next
iteration of the API will be denoted v2.1 and have similar properties to
the existing v2.0, rather than an entirely new v3 API. This is a great
time to evaluate all APIs and provide comments while the next-generation
APIs are being defined. A new working group was formed specifically to
`improve OpenStack APIs <https://wiki.openstack.org/wiki/API_Working_Group>`_
and create design guidelines, which you are welcome to join.
OpenStack on OpenStack (TripleO)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This project continues to improve and you may consider using it for
greenfield deployments, though according to the latest user survey
results it has yet to see widespread uptake.
Data processing service for OpenStack (sahara)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A much-requested answer to big data problems, a dedicated team has been
making solid progress on a Hadoop-as-a-Service project.
Bare metal Deployment (ironic)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The bare-metal deployment has been widely lauded, and development
continues. The Juno release brought the OpenStack Bare metal driver into
the Compute project, with the aim of deprecating the existing
bare-metal driver in Kilo. If you are a current user of the bare metal
driver, a particular blueprint to follow is `Deprecate the bare metal
driver
<https://blueprints.launchpad.net/nova/+spec/deprecate-baremetal-driver>`_.
Database as a Service (trove)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The OpenStack community has had a database-as-a-service tool in
development for some time, and we saw the first integrated release of it
in Icehouse. From its release it was able to deploy database servers out
of the box in a highly available way, initially supporting only MySQL.
Juno introduced support for Mongo (including clustering), PostgreSQL and
Couchbase, in addition to replication functionality for MySQL. In Kilo,
more advanced clustering capability was delivered, in addition to better
integration with other OpenStack components such as Networking.
Message Service (zaqar)
~~~~~~~~~~~~~~~~~~~~~~~
A service to provide queues of messages and notifications was released.
DNS service (designate)
~~~~~~~~~~~~~~~~~~~~~~~
A long-requested service to provide the ability to manipulate DNS
entries associated with OpenStack resources has gathered a following.
The designate project was also released.
Scheduler Improvements
~~~~~~~~~~~~~~~~~~~~~~
Both Compute and Block Storage rely on schedulers to determine where to
place virtual machines or volumes. In Havana, the Compute scheduler
underwent significant improvement, while in Icehouse it was the
scheduler in Block Storage that received a boost. Further down the
track, an effort that started this cycle aims to create a holistic
scheduler covering both. Some of the work that was
done in Kilo can be found under the `Gantt
project <https://wiki.openstack.org/wiki/Gantt/kilo>`_.
Block Storage Improvements
--------------------------
Block Storage is considered a stable project, with wide uptake and a
long track record of quality drivers. The team has discussed many areas
of work at the summits, including better error reporting, automated
discovery, and thin provisioning features.
Toward a Python SDK
-------------------
Though many successfully use the various python-*client code as an
effective SDK for interacting with OpenStack, consistency between the
projects and documentation availability wax and wane. To combat this,
an `effort to improve the
experience <https://wiki.openstack.org/wiki/PythonOpenStackSDK>`_ has
started. Cross-project development efforts in OpenStack have a checkered
history, such as the `unified client
project <https://wiki.openstack.org/wiki/OpenStackClient>`_ having
several false starts. However, the early signs for the SDK project are
promising, and we expect to see results during the Juno cycle.


@ -0,0 +1,192 @@
=========
Use Cases
=========
This appendix contains a small selection of use cases from the
community, with more technical detail than usual. Further examples can
be found on the `OpenStack website <https://www.openstack.org/user-stories/>`_.
NeCTAR
~~~~~~
Who uses it: researchers from the Australian publicly funded research
sector. Use is across a wide variety of disciplines, with the purpose of
instances ranging from running simple web servers to using hundreds of
cores for high-throughput computing.
Deployment
----------
Using OpenStack Compute cells, the NeCTAR Research Cloud spans eight
sites with approximately 4,000 cores per site.
Each site runs a different configuration, as resource cells in an
OpenStack Compute cells setup. Some sites span multiple data centers,
some use off-compute-node storage with a shared file system, and some
use on-compute-node storage with a non-shared file system. Each site
deploys the Image service with an Object Storage back end. A central
Identity, dashboard, and Compute API service are used. A login to the
dashboard triggers a SAML login with Shibboleth, which creates an
account in the Identity service with an SQL back end. An Object Storage
Global Cluster is used across several sites.
Compute nodes have 24 to 48 cores, with at least 4 GB of RAM per core
and approximately 40 GB of ephemeral storage per core.
All sites are based on Ubuntu 14.04, with KVM as the hypervisor. The
OpenStack version in use is typically the current stable version, with 5
to 10 percent back-ported code from trunk and modifications.
Resources
---------
- `OpenStack.org case
study <https://www.openstack.org/user-stories/nectar/>`_
- `NeCTAR-RC GitHub <https://github.com/NeCTAR-RC/>`_
- `NeCTAR website <https://www.nectar.org.au/>`_
MIT CSAIL
~~~~~~~~~
Who uses it: researchers from the MIT Computer Science and Artificial
Intelligence Lab.
Deployment
----------
The CSAIL cloud is currently 64 physical nodes with a total of 768
physical cores and 3,456 GB of RAM. Persistent data storage is largely
outside the cloud on NFS, with cloud resources focused on compute
resources. There are more than 130 users in more than 40 projects,
typically running 2,000 to 2,500 vCPUs in 300 to 400 instances.
We initially deployed on Ubuntu 12.04 with the Essex release of
OpenStack using FlatDHCP multi-host networking.
The software stack is still Ubuntu 12.04 LTS, but now with OpenStack
Havana from the Ubuntu Cloud Archive. KVM is the hypervisor, deployed
using `FAI <http://fai-project.org/>`_ and Puppet for configuration
management. The FAI and Puppet combination is used lab-wide, not only
for OpenStack. There is a single cloud controller node, which also acts
as network controller, with the remainder of the server hardware
dedicated to compute nodes.
Host aggregates and instance-type extra specs are used to provide two
different resource allocation ratios. The default resource allocation
ratios we use are 4:1 CPU and 1.5:1 RAM. Compute-intensive workloads use
instance types that require non-oversubscribed hosts where ``cpu_ratio``
and ``ram_ratio`` are both set to 1.0. Since we have hyper-threading
enabled on our compute nodes, this provides one vCPU per CPU thread, or
two vCPUs per physical core.
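The exact wiring is not shown in the original text. Assuming the aggregate
extra-specs scheduler filter is enabled, it might look roughly like the
following, where the aggregate ID, flavor name, and metadata keys are
illustrative:

.. code-block:: console

   # Tag a host aggregate with the site-specific ratio keys
   $ nova aggregate-set-metadata 1 cpu_ratio=1.0 ram_ratio=1.0

   # Require the same key on a flavor so its instances land only on those hosts
   $ nova flavor-key m1.dedicated set aggregate_instance_extra_specs:cpu_ratio=1.0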
With our upgrade to Grizzly in August 2013, we moved to OpenStack
Networking, neutron (quantum at the time). Compute nodes have
two gigabit network interfaces and a separate management card for IPMI
management. One network interface is used for node-to-node
communications. The other is used as a trunk port for OpenStack managed
VLANs. The controller node uses two bonded 10g network interfaces for
its public IP communications. Big pipes are used here because images are
served over this port, and it is also used to connect to iSCSI storage,
back-ending the image storage and database. The controller node also has
a gigabit interface that is used in trunk mode for OpenStack managed
VLAN traffic. This port handles traffic to the dhcp-agent and
metadata-proxy.
We approximate the older ``nova-network`` multi-host HA setup by using
"provider VLAN networks" that connect instances directly to existing
publicly addressable networks and use existing physical routers as their
default gateway. This means that if our network controller goes down,
running instances still have their network available, and no single
Linux host becomes a traffic bottleneck. We are able to do this because
we have a sufficient supply of IPv4 addresses to cover all of our
instances and thus don't need NAT and don't use floating IP addresses.
We provide a single generic public network to all projects and
additional existing VLANs on a project-by-project basis as needed.
Individual projects are also allowed to create their own private GRE
based networks.
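A provider network of the kind described above might be created with
something like the following; the network name, physical network label, and
VLAN ID are illustrative:

.. code-block:: console

   # Create a shared provider VLAN network mapped to an existing physical
   # segment so instances attach directly to the routable network
   $ neutron net-create public-101 --shared \
     --provider:network_type vlan \
     --provider:physical_network physnet1 \
     --provider:segmentation_id 101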
Resources
---------
- `CSAIL homepage <http://www.csail.mit.edu/>`_
DAIR
~~~~
Who uses it: DAIR is an integrated virtual environment that leverages
the CANARIE network to develop and test new information communication
technology (ICT) and other digital technologies. It combines such
digital infrastructure as advanced networking and cloud computing and
storage to create an environment for developing and testing innovative
ICT applications, protocols, and services; performing at-scale
experimentation for deployment; and facilitating a faster time to
market.
Deployment
----------
DAIR is hosted at two different data centers across Canada: one in
Alberta and the other in Quebec. It consists of a cloud controller at
each location, although one is designated the "master" controller that
is in charge of central authentication and quotas. This is done through
custom scripts and light modifications to OpenStack. DAIR is currently
running Havana.
For Object Storage, each region has a swift environment.
A NetApp appliance is used in each region for both block storage and
instance storage. There are future plans to move the instances off the
NetApp appliance and onto a distributed file system such as :term:`Ceph` or
GlusterFS.
VlanManager is used extensively for network management. All servers have
two bonded 10GbE NICs that are connected to two redundant switches. DAIR
is set up to use single-node networking where the cloud controller is
the gateway for all instances on all compute nodes. Internal OpenStack
traffic (for example, storage traffic) does not go through the cloud
controller.
Resources
---------
- `DAIR homepage <http://www.canarie.ca/cloud/>`__
CERN
~~~~
Who uses it: researchers at CERN (European Organization for Nuclear
Research) conducting high-energy physics research.
Deployment
----------
The environment is largely based on Scientific Linux 6, which is Red Hat
compatible. We use KVM as our primary hypervisor, although tests are
ongoing with Hyper-V on Windows Server 2008.
We use the Puppet Labs OpenStack modules to configure Compute, Image
service, Identity, and dashboard. Puppet is used widely for instance
configuration, and Foreman is used as a GUI for reporting and instance
provisioning.
Users and groups are managed through Active Directory and imported into
the Identity service using LDAP. CLIs are available for nova and
Euca2ools to do this.
There are three clouds currently running at CERN, totaling about 4,700
compute nodes, with approximately 120,000 cores. The CERN IT cloud aims
to expand to 300,000 cores by 2015.
Resources
---------
- `“OpenStack in Production: A tale of 3 OpenStack
Clouds” <http://openstack-in-production.blogspot.de/2013/09/a-tale-of-3-openstack-clouds-50000.html>`_
- `“Review of CERN Data Centre
Infrastructure” <http://cds.cern.ch/record/1457989/files/chep%202012%20CERN%20infrastructure%20final.pdf?version=1>`_
- `“CERN Cloud Infrastructure User
Guide” <http://information-technology.web.cern.ch/book/cern-private-cloud-user-guide>`_


@ -0,0 +1,403 @@
====================================================
Designing for Cloud Controllers and Cloud Management
====================================================
OpenStack is designed to be massively horizontally scalable, which
allows all services to be distributed widely. However, to simplify this
guide, we have decided to discuss services of a more central nature,
using the concept of a *cloud controller*. A cloud controller is just a
conceptual simplification. In the real world, you design an architecture
for your cloud controller that enables high availability so that if any
node fails, another can take over the required tasks. In reality, cloud
controller tasks are spread out across more than a single node.
The cloud controller provides the central management system for
OpenStack deployments. Typically, the cloud controller manages
authentication and sends messaging to all the systems through a message
queue.
For many deployments, the cloud controller is a single node. However, to
have high availability, you have to take a few considerations into
account, which we'll cover in this chapter.
The cloud controller manages the following services for the cloud:
Databases
Tracks current information about users and instances, for example,
in a database, typically one database instance managed per service
Message queue services
All :term:`Advanced Message Queuing Protocol (AMQP)` messages for
services are received and sent according to the queue broker
Conductor services
Proxy requests to a database
Authentication and authorization for identity management
Indicates which users can do what actions on certain cloud
resources; quota management, however, is spread out among services
Image-management services
Stores and serves images with metadata on each, for launching in the
cloud
Scheduling services
Indicates which resources to use first; for example, spreading out
where instances are launched based on an algorithm
User dashboard
Provides a web-based front end for users to consume OpenStack cloud
services
API endpoints
Offers each service's REST API access, where the API endpoint
catalog is managed by the Identity service
For our example, the cloud controller has a collection of ``nova-*``
components that represent the global state of the cloud; talks to
services such as authentication; maintains information about the cloud
in a database; communicates to all compute nodes and storage
:term:`workers <worker>` through a queue; and provides API access.
Each service running on a designated cloud controller may be broken out
into separate nodes for scalability or availability.
As another example, you could use pairs of servers for a collective
cloud controller—one active, one standby—for redundant nodes providing a
given set of related services, such as:
- Front end web for API requests, the scheduler for choosing which
compute node to boot an instance on, Identity services, and the
dashboard
- Database and message queue server (such as MySQL, RabbitMQ)
- Image service for the image management
Now that you see the myriad designs for controlling your cloud, read
more about the further considerations to help with your design
decisions.
Hardware Considerations
~~~~~~~~~~~~~~~~~~~~~~~
A cloud controller's hardware can be the same as a compute node, though
you may want to further specify based on the size and type of cloud that
you run.
It's also possible to use virtual machines for all or some of the
services that the cloud controller manages, such as the message queuing.
In this guide, we assume that all services are running directly on the
cloud controller.
The table below contains common considerations to review when sizing hardware
for the cloud controller design.
.. list-table:: Cloud controller hardware sizing considerations
:widths: 50 50
:header-rows: 1
* - Consideration
- Ramification
* - How many instances will run at once?
- Size your database server accordingly, and scale out beyond one cloud
controller if many instances will report status at the same time and
scheduling where a new instance starts up needs computing power.
* - How many compute nodes will run at once?
- Ensure that your messaging queue handles requests successfully and size
accordingly.
* - How many users will access the API?
- If many users will make multiple requests, make sure that the CPU load
for the cloud controller can handle it.
* - How many users will access the dashboard versus the REST API directly?
- The dashboard makes many requests, even more than the API access, so
add even more CPU if your dashboard is the main interface for your users.
* - How many ``nova-api`` services do you run at once for your cloud?
- You need to size the controller with a core per service.
* - How long does a single instance run?
- Starting instances and deleting instances is demanding on the compute
node but also demanding on the controller node because of all the API
queries and scheduling needs.
* - Does your authentication system also verify externally?
- External systems such as LDAP or Active Directory require network
connectivity between the cloud controller and an external authentication
system. Also ensure that the cloud controller has the CPU power to keep
up with requests.
Separation of Services
~~~~~~~~~~~~~~~~~~~~~~
While our example contains all central services in a single location, it
is possible and indeed often a good idea to separate services onto
different physical servers. The table below is a list of deployment
scenarios we've seen and their justifications.
.. list-table:: Deployment scenarios
:widths: 50 50
:header-rows: 1
* - Scenario
- Justification
* - Run ``glance-*`` servers on the ``swift-proxy`` server.
- This deployment felt that the spare I/O on the Object Storage proxy
server was sufficient and that the Image Delivery portion of glance
benefited from being on physical hardware and having good connectivity
to the Object Storage back end it was using.
* - Run a central dedicated database server.
- This deployment used a central dedicated server to provide the databases
for all services. This approach simplified operations by isolating
database server updates and allowed for the simple creation of slave
database servers for failover.
* - Run one VM per service.
- This deployment ran central services on a set of servers running KVM.
A dedicated VM was created for each service (``nova-scheduler``,
rabbitmq, database, etc). This assisted the deployment with scaling
because administrators could tune the resources given to each virtual
machine based on the load it received (something that was not well
understood during installation).
* - Use an external load balancer.
- This deployment had an expensive hardware load balancer in its
organization. It ran multiple ``nova-api`` and ``swift-proxy``
servers on different physical servers and used the load balancer
to switch between them.
One choice that always comes up is whether to virtualize. Some services,
such as ``nova-compute``, ``swift-proxy`` and ``swift-object`` servers,
should not be virtualized. However, control servers can often be happily
virtualized—the performance penalty can usually be offset by simply
running more of the service.
Database
~~~~~~~~
OpenStack Compute uses an SQL database to store and retrieve stateful
information. MySQL is the popular database choice in the OpenStack
community.
Loss of the database leads to errors. As a result, we recommend that you
cluster your database to make it failure tolerant. Configuring and
maintaining a database cluster is done outside OpenStack and is
determined by the database software you choose to use in your cloud
environment. MySQL/Galera is a popular option for MySQL-based databases.
Message Queue
~~~~~~~~~~~~~
Most OpenStack services communicate with each other using the *message
queue*. For example, Compute communicates to block storage services and
networking services through the message queue. Also, you can optionally
enable notifications for any service. RabbitMQ, Qpid, and Zeromq are all
popular choices for a message-queue service. In general, if the message
queue fails or becomes inaccessible, the cluster grinds to a halt and
ends up in a read-only state, with information stuck at the point where
the last message was sent. Accordingly, we recommend that you cluster
the message queue. Be aware that clustered message queues can be a pain
point for many OpenStack deployments. While RabbitMQ has native
clustering support, there have been reports of issues when running it at
a large scale. While other queuing solutions are available, such as Zeromq
and Qpid, Zeromq does not offer stateful queues. Qpid is the messaging
system of choice for Red Hat and its derivatives. Qpid does not have
native clustering capabilities and requires a supplemental service, such
as Pacemaker or Corosync. For your message queue, you need to determine
what level of data loss you are comfortable with and whether to use an
OpenStack project's ability to retry multiple MQ hosts in the event of a
failure, such as using Compute's ability to do so.
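As a rough illustration of what retrying multiple message queue hosts means, the
following Python sketch probes a list of broker endpoints and picks the first one
that accepts a TCP connection. The host names are placeholders, and this is not
how oslo.messaging or any OpenStack service actually implements failover; it only
shows the fallback idea.

.. code-block:: python

   import socket

   # Placeholder broker endpoints; 5672 is the default AMQP/RabbitMQ port.
   BROKERS = [("rabbit1.example.com", 5672),
              ("rabbit2.example.com", 5672),
              ("rabbit3.example.com", 5672)]

   def first_reachable_broker(brokers, timeout=2.0):
       """Return the first broker that accepts a TCP connection."""
       for host, port in brokers:
           try:
               with socket.create_connection((host, port), timeout=timeout):
                   return host, port
           except OSError:
               continue  # this broker is unreachable; try the next one
       raise RuntimeError("no message queue host is reachable")

   print("using broker %s:%d" % first_reachable_broker(BROKERS))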
Conductor Services
~~~~~~~~~~~~~~~~~~
In the previous version of OpenStack, all ``nova-compute`` services
required direct access to the database hosted on the cloud controller.
This was problematic for two reasons: security and performance. With
regard to security, if a compute node is compromised, the attacker
inherently has access to the database. With regard to performance,
``nova-compute`` calls to the database are single-threaded and blocking.
This creates a performance bottleneck because database requests are
fulfilled serially rather than in parallel.
The conductor service resolves both of these issues by acting as a proxy
for the ``nova-compute`` service. Now, instead of ``nova-compute``
directly accessing the database, it contacts the ``nova-conductor``
service, and ``nova-conductor`` accesses the database on
``nova-compute``'s behalf. Since ``nova-compute`` no longer has direct
access to the database, the security issue is resolved. Additionally,
``nova-conductor`` is a nonblocking service, so requests from all
compute nodes are fulfilled in parallel.
.. note::
If you are using ``nova-network`` and multi-host networking in your
cloud environment, ``nova-compute`` still requires direct access to
the database.
The ``nova-conductor`` service is horizontally scalable. To make
``nova-conductor`` highly available and fault tolerant, just launch more
instances of the ``nova-conductor`` process, either on the same server
or across multiple servers.
Application Programming Interface (API)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All public access, whether direct, through a command-line client, or
through the web-based dashboard, uses the API service. Find the API
reference at http://api.openstack.org/.
You must choose whether you want to support the Amazon EC2 compatibility
APIs, or just the OpenStack APIs. One issue you might encounter when
running both APIs is an inconsistent experience when referring to images
and instances.
For example, the EC2 API refers to instances using IDs that contain
hexadecimal, whereas the OpenStack API uses names and digits. Similarly,
the EC2 API tends to rely on DNS aliases for contacting virtual
machines, as opposed to OpenStack, which typically lists IP
addresses.
If OpenStack is not set up in the right way, it is simple to have
scenarios in which users are unable to contact their instances due to
having only an incorrect DNS alias. Despite this, EC2 compatibility can
assist users migrating to your cloud.
As with databases and message queues, having more than one :term:`API server`
is a good thing. Traditional HTTP load-balancing techniques can be used to
achieve a highly available ``nova-api`` service.
Extensions
~~~~~~~~~~
The `API
Specifications <http://docs.openstack.org/api/api-specs.html>`_ define
the core actions, capabilities, and media types of the OpenStack API. A
client can always depend on the availability of this core API, and
implementers are always required to support it in its entirety.
Requiring strict adherence to the core API allows clients to rely upon a
minimal level of functionality when interacting with multiple
implementations of the same API.
The OpenStack Compute API is extensible. An extension adds capabilities
to an API beyond those defined in the core. The introduction of new
features, MIME types, actions, states, headers, parameters, and
resources can all be accomplished by means of extensions to the core
API. This allows the introduction of new features in the API without
requiring a version change and allows the introduction of
vendor-specific niche functionality.
Scheduling
~~~~~~~~~~
The scheduling services are responsible for determining the compute or
storage node where a virtual machine or block storage volume should be
created. The scheduling services receive creation requests for these
resources from the message queue and then begin the process of
determining the appropriate node where the resource should reside. This
process is done by applying a series of user-configurable filters
against the available collection of nodes.
There are currently two schedulers: ``nova-scheduler`` for virtual
machines and ``cinder-scheduler`` for block storage volumes. Both
schedulers are able to scale horizontally, so for high-availability
purposes, or for very large or high-schedule-frequency installations,
you should consider running multiple instances of each scheduler. The
schedulers all listen to the shared message queue, so no special load
balancing is required.
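The filtering idea can be sketched in a few lines of Python. This is only a
conceptual model, not the actual ``nova-scheduler`` or ``cinder-scheduler``
code: each filter is a function that accepts or rejects a candidate host, and
the new resource lands on one of the hosts that survive every filter. The host
data and request values are made up for the example.

.. code-block:: python

   # Conceptual sketch of filter-based scheduling; not the real scheduler code.
   candidate_hosts = [
       {"name": "compute1", "free_ram_mb": 2048,  "free_vcpus": 2},
       {"name": "compute2", "free_ram_mb": 8192,  "free_vcpus": 8},
       {"name": "compute3", "free_ram_mb": 16384, "free_vcpus": 1},
   ]
   request = {"ram_mb": 4096, "vcpus": 2}   # resources the new instance needs

   def ram_filter(host, req):
       return host["free_ram_mb"] >= req["ram_mb"]

   def core_filter(host, req):
       return host["free_vcpus"] >= req["vcpus"]

   filters = [ram_filter, core_filter]      # user-configurable list of filters

   def passing_hosts(hosts, req):
       """Return the hosts that pass every configured filter."""
       return [h for h in hosts if all(f(h, req) for f in filters)]

   print([h["name"] for h in passing_hosts(candidate_hosts, request)])
   # ['compute2']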
Images
~~~~~~
The OpenStack Image service consists of two parts: ``glance-api`` and
``glance-registry``. The former is responsible for the delivery of
images; the compute node uses it to download images from the back end.
The latter maintains the metadata information associated with virtual
machine images and requires a database.
The ``glance-api`` part is an abstraction layer that allows a choice of
back end. Currently, it supports:
OpenStack Object Storage
Allows you to store images as objects.
File system
Uses any traditional file system to store the images as files.
S3
Allows you to fetch images from Amazon S3.
HTTP
Allows you to fetch images from a web server. You cannot write
images by using this mode.
If you have an OpenStack Object Storage service, we recommend using this
as a scalable place to store your images. You can also use a file
system with sufficient performance or, if you do not need the ability
to upload new images through OpenStack, Amazon S3.
Dashboard
~~~~~~~~~
The OpenStack dashboard (horizon) provides a web-based user interface to
the various OpenStack components. The dashboard includes an end-user
area for users to manage their virtual infrastructure and an admin area
for cloud operators to manage the OpenStack environment as a
whole.
The dashboard is implemented as a Python web application that normally
runs in :term:`Apache` ``httpd``. Therefore, you may treat it the same as any
other web application, provided it can reach the API servers (including
their admin endpoints) over the network.
Authentication and Authorization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The concepts supporting OpenStack's authentication and authorization are
derived from well-understood and widely used systems of a similar
nature. Users have credentials they can use to authenticate, and they
can be a member of one or more groups (known as projects or tenants,
interchangeably).
For example, a cloud administrator might be able to list all instances
in the cloud, whereas a user can see only those in his current group.
Resource quotas, such as the number of cores that can be used, disk
space, and so on, are associated with a project.
OpenStack Identity provides authentication decisions and user attribute
information, which is then used by the other OpenStack services to
perform authorization. The policy is set in the ``policy.json`` file.
For information on how to configure these, see :doc:`ops_projects_users`.
OpenStack Identity supports different plug-ins for authentication
decisions and identity storage. Examples of these plug-ins include:
- In-memory key-value Store (a simplified internal storage structure)
- SQL database (such as MySQL or PostgreSQL)
- Memcached (a distributed memory object caching system)
- LDAP (such as OpenLDAP or Microsoft's Active Directory)
Many deployments use the SQL database; however, LDAP is also a popular
choice for those with existing authentication infrastructure that needs
to be integrated.
Network Considerations
~~~~~~~~~~~~~~~~~~~~~~
Because the cloud controller handles so many different services, it must
be able to handle the amount of traffic that hits it. For example, if
you choose to host the OpenStack Image service on the cloud controller,
the cloud controller should be able to support the transferring of the
images at an acceptable speed.
As another example, if you choose to use single-host networking where
the cloud controller is the network gateway for all instances, then the
cloud controller must support the total amount of traffic that travels
between your cloud and the public Internet.
We recommend that you use a fast NIC, such as 10 Gbps. You can also
choose to use two 10 Gbps NICs and bond them together. While you might
not be able to get a full bonded 20 Gbps speed, different transmission
streams use different NICs. For example, if the cloud controller
transfers two images, each image uses a different NIC and gets a full
10 Gbps of bandwidth.
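To put such numbers in perspective, the short Python sketch below estimates how
long an image transfer takes at a given link speed. It deliberately ignores
protocol overhead, disk speed, and competing traffic, and the 10 GB image size
is simply an assumption for illustration.

.. code-block:: python

   def transfer_seconds(size_gb, link_gbps):
       """Rough time to move size_gb gigabytes over a link_gbps link.

       Ignores protocol overhead, disk speed, and competing traffic.
       """
       size_gigabits = size_gb * 8        # bytes -> bits
       return size_gigabits / link_gbps

   # A hypothetical 10 GB image over a single 10 Gbps NIC:
   print(transfer_seconds(10, 10))        # 8.0 seconds at line rate
   # The same image over a 1 Gbps NIC:
   print(transfer_seconds(10, 1))         # 80.0 seconds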

@ -0,0 +1,331 @@
=============
Compute Nodes
=============
In this chapter, we discuss some of the choices you need to consider
when building out your compute nodes. Compute nodes form the resource
core of the OpenStack Compute cloud, providing the processing, memory,
network and storage resources to run instances.
Choosing a CPU
~~~~~~~~~~~~~~
The type of CPU in your compute node is a very important choice. First,
ensure that the CPU supports virtualization by way of *VT-x* for Intel
chips and *AMD-v* for AMD chips.
.. note::
Consult the vendor documentation to check for virtualization
support. For Intel, read `“Does my processor support Intel® Virtualization
Technology?” <http://www.intel.com/support/processors/sb/cs-030729.htm>`_.
For AMD, read `AMD Virtualization
<http://www.amd.com/en-us/innovations/software-technologies/server-solution/virtualization>`_.
Note that your CPU may support virtualization but it may be
disabled. Consult your BIOS documentation for how to enable CPU
features.
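On a Linux host you can check for these flags directly. The sketch below reads
``/proc/cpuinfo`` and reports whether the ``vmx`` (Intel VT-x) or ``svm``
(AMD-V) flag is present; keep in mind that, as noted above, the feature can
still be disabled in the BIOS even when the flag shows up.

.. code-block:: python

   # Quick check for hardware virtualization flags on a Linux host.
   def virtualization_flags(cpuinfo_path="/proc/cpuinfo"):
       flags = set()
       with open(cpuinfo_path) as cpuinfo:
           for line in cpuinfo:
               if line.startswith("flags"):
                   flags.update(line.split(":", 1)[1].split())
       return flags & {"vmx", "svm"}

   found = virtualization_flags()
   if found:
       print("hardware virtualization flags found:", ", ".join(sorted(found)))
   else:
       print("no vmx/svm flag found; check the CPU model and BIOS settings")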
The number of cores that the CPU has also affects the decision. It's
common for current CPUs to have up to 12 cores. Additionally, if an
Intel CPU supports hyperthreading, those 12 cores are doubled to 24
cores. If you purchase a server that supports multiple CPUs, the number
of cores is further multiplied.
**Multithread Considerations**
Hyper-Threading is Intel's proprietary simultaneous multithreading
implementation used to improve parallelization on their CPUs. You might
consider enabling Hyper-Threading to improve the performance of
multithreaded applications.
Whether you should enable Hyper-Threading on your CPUs depends upon your
use case. For example, disabling Hyper-Threading can be beneficial in
intense computing environments. We recommend that you do performance
testing with your local workload with both Hyper-Threading on and off to
determine what is more appropriate in your case.
Choosing a Hypervisor
~~~~~~~~~~~~~~~~~~~~~
A hypervisor provides software to manage virtual machine access to the
underlying hardware. The hypervisor creates, manages, and monitors
virtual machines. OpenStack Compute supports many hypervisors to various
degrees, including:
- `KVM <http://www.linux-kvm.org/page/Main_Page>`_
- `LXC <https://linuxcontainers.org/>`_
- `QEMU <http://wiki.qemu.org/Main_Page>`_
- `VMware
ESX/ESXi <https://www.vmware.com/support/vsphere-hypervisor>`_
- `Xen <http://www.xenproject.org/>`_
- `Hyper-V <http://technet.microsoft.com/en-us/library/hh831531.aspx>`_
- `Docker <https://www.docker.com/>`_
Probably the most important factor in your choice of hypervisor is your
current usage or experience. Aside from that, there are practical
concerns to do with feature parity, documentation, and the level of
community experience.
For example, KVM is the most widely adopted hypervisor in the OpenStack
community. Besides KVM, more deployments run Xen, LXC, VMware, and
Hyper-V than the others listed. However, each of these lacks some
feature support, or the documentation on how to use them with OpenStack
is out of date.
The best information available to support your choice is found on the
`Hypervisor Support Matrix
<http://docs.openstack.org/developer/nova/support-matrix.html>`_
and in the `configuration reference
<http://docs.openstack.org/liberty/config-reference/content/section_compute-hypervisors.html>`_.
.. note::
It is also possible to run multiple hypervisors in a single
deployment using host aggregates or cells. However, an individual
compute node can run only a single hypervisor at a time.
Instance Storage Solutions
~~~~~~~~~~~~~~~~~~~~~~~~~~
As part of the procurement for a compute cluster, you must specify some
storage for the disk on which the instantiated instance runs. There are
three main approaches to providing this temporary-style storage, and it
is important to understand the implications of the choice.
They are:
- Off compute node storage—shared file system
- On compute node storage—shared file system
- On compute node storage—nonshared file system
In general, the questions you should ask when selecting storage are as
follows:
- What is the platter count you can achieve?
- Do more spindles result in better I/O despite network access?
- Which one results in the best cost-performance scenario you're aiming
for?
- How do you manage the storage operationally?
Many operators use separate compute and storage hosts. Compute services
and storage services have different requirements, and compute hosts
typically require more CPU and RAM than storage hosts. Therefore, for a
fixed budget, it makes sense to have different configurations for your
compute nodes and your storage nodes. Compute nodes will be invested in
CPU and RAM, and storage nodes will be invested in block storage.
However, if you are more restricted in the number of physical hosts you
have available for creating your cloud and you want to be able to
dedicate as many of your hosts as possible to running instances, it
makes sense to run compute and storage on the same machines.
We'll discuss the three main approaches to instance storage in the next
few sections.
Off Compute Node Storage—Shared File System
-------------------------------------------
In this option, the disks storing the running instances are hosted in
servers outside of the compute nodes.
If you use separate compute and storage hosts, you can treat your
compute hosts as "stateless." As long as you don't have any instances
currently running on a compute host, you can take it offline or wipe it
completely without having any effect on the rest of your cloud. This
simplifies maintenance for the compute hosts.
There are several advantages to this approach:
- If a compute node fails, instances are usually easily recoverable.
- Running a dedicated storage system can be operationally simpler.
- You can scale to any number of spindles.
- It may be possible to share the external storage for other purposes.
The main downsides to this approach are:
- Depending on design, heavy I/O usage from some instances can affect
unrelated instances.
- Use of the network can decrease performance.
On Compute Node Storage—Shared File System
------------------------------------------
In this option, each compute node is specified with a significant amount
of disk space, but a distributed file system ties the disks from each
compute node into a single mount.
The main advantage of this option is that it scales to external storage
when you require additional storage.
However, this option has several downsides:
- Running a distributed file system can make you lose your data
locality compared with nonshared storage.
- Recovery of instances is complicated by depending on multiple hosts.
- The chassis size of the compute node can limit the number of spindles
able to be used in a compute node.
- Use of the network can decrease performance.
On Compute Node Storage—Nonshared File System
---------------------------------------------
In this option, each compute node is specified with enough disks to
store the instances it hosts.
There are two main reasons why this is a good idea:
- Heavy I/O usage on one compute node does not affect instances on
other compute nodes.
- Direct I/O access can increase performance.
This has several downsides:
- If a compute node fails, the instances running on that node are lost.
- The chassis size of the compute node can limit the number of spindles
able to be used in a compute node.
- Migrations of instances from one node to another are more complicated
and rely on features that may not continue to be developed.
- If additional storage is required, this option does not scale.
Running a shared file system on a storage system apart from the compute
nodes is ideal for clouds where reliability and scalability are the most
important factors. Running a shared file system on the compute nodes
themselves may be best in a scenario where you have to deploy to
preexisting servers whose specifications you have little or no control
over. Running a nonshared file system on the compute nodes
themselves is a good option for clouds with high I/O requirements and
low concern for reliability.
Issues with Live Migration
--------------------------
We consider live migration an integral part of the operations of the
cloud. This feature provides the ability to seamlessly move instances
from one physical host to another, a necessity for performing upgrades
that require reboots of the compute hosts, but only works well with
shared storage.
Live migration can also be done with nonshared storage, using a feature
known as *KVM live block migration*. While an earlier implementation of
block-based migration in KVM and QEMU was considered unreliable, there
is a newer, more reliable implementation of block-based live migration
as of QEMU 1.4 and libvirt 1.0.2 that is also compatible with OpenStack.
However, none of the authors of this guide have first-hand experience
using live block migration.
Choice of File System
---------------------
If you want to support shared-storage live migration, you need to
configure a distributed file system.
Possible options include:
- NFS (default for Linux)
- GlusterFS
- MooseFS
- Lustre
We've seen deployments with all, and recommend that you choose the one
you are most familiar with operating. If you are not familiar with any
of these, choose NFS, as it is the easiest to set up and there is
extensive community knowledge about it.
Overcommitting
~~~~~~~~~~~~~~
OpenStack allows you to overcommit CPU and RAM on compute nodes. This
allows you to increase the number of instances you can have running on
your cloud, at the cost of reducing the performance of the instances.
OpenStack Compute uses the following ratios by default:
- CPU allocation ratio: 16:1
- RAM allocation ratio: 1.5:1
The default CPU allocation ratio of 16:1 means that the scheduler
allocates up to 16 virtual cores per physical core. For example, if a
physical node has 12 cores, the scheduler sees 192 available virtual
cores. With typical flavor definitions of 4 virtual cores per instance,
this ratio would provide 48 instances on a physical node.
The formula for the number of virtual instances on a compute node is
*(OR × PC) / VC*, where:
*OR*
CPU overcommit ratio (virtual cores per physical core)
*PC*
Number of physical cores
*VC*
Number of virtual cores per instance
Similarly, the default RAM allocation ratio of 1.5:1 means that the
scheduler allocates instances to a physical node as long as the total
amount of RAM associated with the instances is less than 1.5 times the
amount of RAM available on the physical node.
For example, if a physical node has 48 GB of RAM, the scheduler
allocates instances to that node until the sum of the RAM associated
with the instances reaches 72 GB (such as nine instances, in the case
where each instance has 8 GB of RAM).
.. note::
Regardless of the overcommit ratio, an instance cannot be placed
on any physical node with fewer raw (pre-overcommit) resources than
the instance flavor requires.
You must select the appropriate CPU and RAM allocation ratio for your
particular use case.
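The arithmetic above can be captured in a short Python sketch. The ratios are
the defaults quoted in this section; the node size and flavor values are
assumptions used only to reproduce the worked example.

.. code-block:: python

   # Estimate instance capacity of one compute node under the default
   # overcommit ratios discussed above.
   CPU_ALLOCATION_RATIO = 16.0   # virtual cores per physical core
   RAM_ALLOCATION_RATIO = 1.5    # schedulable RAM versus physical RAM

   def max_instances(physical_cores, physical_ram_gb,
                     flavor_vcpus, flavor_ram_gb):
       """Return the instance limits by CPU, by RAM, and overall."""
       by_cpu = int(CPU_ALLOCATION_RATIO * physical_cores // flavor_vcpus)
       by_ram = int(RAM_ALLOCATION_RATIO * physical_ram_gb // flavor_ram_gb)
       return by_cpu, by_ram, min(by_cpu, by_ram)

   # A node with 12 cores and 48 GB of RAM, running 4-vCPU / 8 GB flavors:
   print(max_instances(12, 48, 4, 8))   # (48, 9, 9) -> RAM is the limit here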
Logging
~~~~~~~
Logging is detailed more fully in :doc:`ops_logging_monitoring`. However,
it is an important design consideration to take into account before
commencing operations of your cloud.
OpenStack produces a great deal of useful logging information; however,
for the information to be useful for operations purposes, you should
consider having a central logging server to send logs to, and a log
parsing/analysis system (such as logstash).
Networking
~~~~~~~~~~
Networking in OpenStack is a complex, multifaceted challenge. See
:doc:`arch_network_design`.
Conclusion
~~~~~~~~~~
Compute nodes are the workhorse of your cloud and the place where your
users' applications will run. They are likely to be affected by your
decisions on what to deploy and how you deploy it. Their requirements
should be reflected in the choices you make.

@ -0,0 +1,556 @@
===========================================
Example Architecture — OpenStack Networking
===========================================
This chapter provides an example architecture using OpenStack
Networking, also known as the Neutron project, in a highly available
environment.
Overview
~~~~~~~~
A highly-available environment can be put into place if you require an
environment that can scale horizontally, or want your cloud to continue
to be operational in case of node failure. This example architecture has
been written based on the current default feature set of OpenStack
Havana, with an emphasis on high availability.
Components
----------
.. list-table::
:widths: 50 50
:header-rows: 1
* - Component
- Details
* - OpenStack release
- Havana
* - Host operating system
- Red Hat Enterprise Linux 6.5
* - OpenStack package repository
- `Red Hat Distributed OpenStack (RDO) <https://repos.fedorapeople.org/repos/openstack/>`_
* - Hypervisor
- KVM
* - Database
- MySQL
* - Message queue
- Qpid
* - Networking service
- OpenStack Networking
* - Tenant Network Separation
- VLAN
* - Image service back end
- GlusterFS
* - Identity driver
- SQL
* - Block Storage back end
- GlusterFS
Rationale
---------
This example architecture has been selected based on the current default
feature set of OpenStack Havana, with an emphasis on high availability.
This architecture is currently being deployed in an internal Red Hat
OpenStack cloud and used to run hosted and shared services, which by
their nature must be highly available.
This architecture's components have been selected for the following
reasons:
Red Hat Enterprise Linux
You must choose an operating system that can run on all of the
physical nodes. This example architecture is based on Red Hat
Enterprise Linux, which offers reliability, long-term support,
certified testing, and is hardened. Enterprise customers, now moving
into OpenStack usage, typically require these advantages.
RDO
The Red Hat Distributed OpenStack package offers an easy way to
download the most current OpenStack release that is built for the
Red Hat Enterprise Linux platform.
KVM
KVM is the supported hypervisor of choice for Red Hat Enterprise
Linux (and included in distribution). It is feature complete and
free from licensing charges and restrictions.
MySQL
MySQL is used as the database back end for all databases in the
OpenStack environment. MySQL is the supported database of choice for
Red Hat Enterprise Linux (and included in distribution); the
database is open source, scalable, and handles memory well.
Qpid
Apache Qpid offers 100 percent compatibility with the
:term:`Advanced Message Queuing Protocol (AMQP)` Standard, and its
broker is available for both C++ and Java.
OpenStack Networking
OpenStack Networking offers sophisticated networking functionality,
including Layer 2 (L2) network segregation and provider networks.
VLAN
Using a virtual local area network offers broadcast control,
security, and physical layer transparency. If needed, use VXLAN to
extend your address space.
GlusterFS
GlusterFS offers scalable storage. As your environment grows, you
can continue to add more storage nodes (instead of being restricted,
for example, by an expensive storage array).
Detailed Description
~~~~~~~~~~~~~~~~~~~~
Node types
----------
This section gives you a breakdown of the different nodes that make up
the OpenStack environment. A node is a physical machine that is
provisioned with an operating system, and running a defined software
stack on top of it. The table below provides node descriptions and
specifications.
.. list-table:: Node types
:widths: 33 33 33
:header-rows: 1
* - Type
- Description
- Example hardware
* - Controller
- Controller nodes are responsible for running the management software
services needed for the OpenStack environment to function.
These nodes:
* Provide the front door that people access as well as the API
services that all other components in the environment talk to.
* Run a number of services in a highly available fashion,
utilizing Pacemaker and HAProxy to provide a virtual IP and
load-balancing functions so all controller nodes are being used.
* Supply highly available "infrastructure" services,
such as MySQL and Qpid, that underpin all the services.
* Provide what is known as "persistent storage" through services
run on the host as well. This persistent storage is backed onto
the storage nodes for reliability.
See :ref:`controller_node`.
- Model: Dell R620
CPU: 2x Intel® Xeon® CPU E5-2620 0 @ 2.00 GHz
Memory: 32 GB
Disk: two 300 GB 10000 RPM SAS Disks
Network: two 10G network ports
* - Compute
- Compute nodes run the virtual machine instances in OpenStack. They:
* Run the bare minimum of services needed to facilitate these
instances.
* Use local storage on the node for the virtual machines so that
no VM migration or instance recovery at node failure is possible.
See :ref:`compute_node`.
- Model: Dell R620
CPU: 2x Intel® Xeon® CPU E5-2650 0 @ 2.00 GHz
Memory: 128 GB
Disk: two 600 GB 10000 RPM SAS Disks
Network: four 10G network ports (For future proofing expansion)
* - Storage
- Storage nodes store all the data required for the environment,
including disk images in the Image service library, and the
persistent storage volumes created by the Block Storage service.
Storage nodes use GlusterFS technology to keep the data highly
available and scalable.
See :ref:`storage_node`.
- Model: Dell R720xd
CPU: 2x Intel® Xeon® CPU E5-2620 0 @ 2.00 GHz
Memory: 64 GB
Disk: two 500 GB 7200 RPM SAS Disks and twenty-four 600 GB
10000 RPM SAS Disks
Raid Controller: PERC H710P Integrated RAID Controller, 1 GB NV Cache
Network: two 10G network ports
* - Network
- Network nodes are responsible for doing all the virtual networking
needed for people to create public or private networks and uplink
their virtual machines into external networks. Network nodes:
* Form the only ingress and egress point for instances running
on top of OpenStack.
* Run all of the environment's networking services, with the
exception of the networking API service (which runs on the
controller node).
See :ref:`network_node`.
- Model: Dell R620
CPU: 1x Intel® Xeon® CPU E5-2620 0 @ 2.00 GHz
Memory: 32 GB
Disk: two 300 GB 10000 RPM SAS Disks
Network: five 10G network ports
* - Utility
- Utility nodes are used by internal administration staff only to
provide a number of basic system administration functions needed
to get the environment up and running and to maintain the hardware,
OS, and software on which it runs.
These nodes run services such as provisioning, configuration
management, monitoring, or GlusterFS management software.
They are not required to scale, although these machines are
usually backed up.
- Model: Dell R620
CPU: 2x Intel® Xeon® CPU E5-2620 0 @ 2.00 GHz
Memory: 32 GB
Disk: two 500 GB 7200 RPM SAS Disks
Network: two 10G network ports
.. _networking_layout:
Networking layout
-----------------
The network contains all the management devices for all hardware in the
environment (for example, by including Dell iDrac7 devices for the
hardware nodes, and management interfaces for network switches). The
network is accessed by internal staff only when diagnosing or recovering
a hardware issue.
OpenStack internal network
--------------------------
This network is used for OpenStack management functions and traffic,
including services needed for the provisioning of physical nodes
(``pxe``, ``tftp``, ``kickstart``), traffic between various OpenStack
node types using OpenStack APIs and messages (for example,
``nova-compute`` talking to ``keystone`` or ``cinder-volume`` talking to
``nova-api``), and all traffic for storage data to the storage layer
underneath by the Gluster protocol. All physical nodes have at least one
network interface (typically ``eth0``) in this network. This network is
only accessible from other VLANs on port 22 (for ``ssh`` access to
manage machines).
Public Network
--------------
This network is a combination of:
- IP addresses for public-facing interfaces on the controller nodes
(through which end users will access the OpenStack services)
- A range of publicly routable, IPv4 network addresses to be used by
OpenStack Networking for floating IPs. You may be restricted in your
access to IPv4 addresses; a large range of IPv4 addresses is not
necessary.
- Routers for private networks created within OpenStack.
This network is connected to the controller nodes so users can access
the OpenStack interfaces, and connected to the network nodes to provide
VMs with publicly routable traffic functionality. The network is also
connected to the utility machines so that any utility services that need
to be made public (such as system monitoring) can be accessed.
VM traffic network
------------------
This is a closed network that is not publicly routable and is simply
used as a private, internal network for traffic between virtual machines
in OpenStack, and between the virtual machines and the network nodes
that provide l3 routes out to the public network (and floating IPs for
connections back in to the VMs). Because this is a closed network, we
are using a different address space from the others to clearly define the
separation. Only Compute and OpenStack Networking nodes need to be
connected to this network.
Node connectivity
~~~~~~~~~~~~~~~~~
The following section details how the nodes are connected to the
different networks (see :ref:`networking_layout`) and
what other considerations need to take place (for example, bonding) when
connecting nodes to the networks.
Initial deployment
------------------
Initially, the connection setup should revolve around keeping the
connectivity simple and straightforward in order to minimize deployment
complexity and time to deploy. The deployment shown below aims to have 1 × 10G
connectivity available to all compute nodes, while still leveraging bonding on
appropriate nodes for maximum performance.
.. figure:: figures/osog_0101.png
:alt: Basic node deployment
:width: 100%
Basic node deployment
Connectivity for maximum performance
------------------------------------
If the networking performance of the basic layout is not enough, you can
move to the design below, which provides 2 × 10G network
links to all instances in the environment as well as providing more
network bandwidth to the storage layer.
.. figure:: figures/osog_0102.png
:alt: Performance node deployment
:width: 100%
Performance node deployment
Node diagrams
~~~~~~~~~~~~~
The following diagrams include logical
information about the different types of nodes, indicating what services
will be running on top of them and how they interact with each other.
The diagrams also illustrate how the availability and scalability of
services are achieved.
.. _controller_node:
.. figure:: figures/osog_0103.png
:alt: Controller node
:width: 100%
Controller node
.. _compute_node:
.. figure:: figures/osog_0104.png
:alt: Compute node
:width: 100%
Compute node
.. _network_node:
.. figure:: figures/osog_0105.png
:alt: Network node
:width: 100%
Network node
.. _storage_node:
.. figure:: figures/osog_0106.png
:alt: Storage node
:width: 100%
Storage node
Example Component Configuration
-------------------------------
The following tables include example configuration
and considerations for both third-party and OpenStack components:
.. list-table:: Table: Third-party component configuration
:widths: 25 25 25 25
:header-rows: 1
* - Component
- Tuning
- Availability
- Scalability
* - MySQL
- ``binlog-format = row``
- Master/master replication. However, only one node is used at a
time. Replication keeps all nodes as close to being up to date
as possible (although the asynchronous nature of the replication means
a fully consistent state is not possible). Connections to the database
only happen through a Pacemaker virtual IP, ensuring that most problems
that occur with master-master replication can be avoided.
- Not heavily considered. Once load on the MySQL server increases enough
that scalability needs to be considered, multiple masters or a
master/slave setup can be used.
* - Qpid
- ``max-connections=1000`` ``worker-threads=20`` ``connection-backlog=10``,
sasl security enabled with SASL-BASIC authentication
- Qpid is added as a resource to the Pacemaker software that runs on
Controller nodes where Qpid is situated. This ensures only one Qpid
instance is running at one time, and the node with the Pacemaker
virtual IP will always be the node running Qpid.
- Not heavily considered. However, Qpid can be changed to run on all
controller nodes for scalability and availability purposes,
and removed from Pacemaker.
* - HAProxy
- ``maxconn 3000``
- HAProxy is a software layer-7 load balancer used to front door all
clustered OpenStack API components and do SSL termination.
HAProxy can be added as a resource to the Pacemaker software that
runs on the Controller nodes where HAProxy is situated.
This ensures that only one HAProxy instance is running at one time,
and the node with the Pacemaker virtual IP will always be the node
running HAProxy.
- Not considered. HAProxy has small enough performance overheads that
a single instance should scale enough for this level of workload.
If extra scalability is needed, ``keepalived`` or other Layer-4
load balancing can be introduced to be placed in front of multiple
copies of HAProxy.
* - Memcached
- ``MAXCONN="8192" CACHESIZE="30457"``
- Memcached is a fast in-memory key-value cache software that is used
by OpenStack components for caching data and increasing performance.
Memcached runs on all controller nodes, ensuring that should one go
down, another instance of Memcached is available.
- Not considered. A single instance of Memcached should be able to
scale to the desired workloads. If scalability is desired, HAProxy
can be placed in front of Memcached (in raw ``tcp`` mode) to utilize
multiple Memcached instances for scalability. However, this might
cause cache consistency issues.
* - Pacemaker
- Configured to use ``corosync`` and ``cman`` as a cluster communication
stack/quorum manager, and as a two-node cluster.
- Pacemaker is the clustering software used to ensure the availability
of services running on the controller and network nodes:
* Because Pacemaker is cluster software, the software itself handles
its own availability, leveraging ``corosync`` and ``cman``
underneath.
* If you use the GlusterFS native client, no virtual IP is needed,
since the client knows all about nodes after initial connection
and automatically routes around failures on the client side.
* If you use the NFS or SMB adaptor, you will need a virtual IP on
which to mount the GlusterFS volumes.
- If more nodes need to be made cluster aware, Pacemaker can scale to
64 nodes.
* - GlusterFS
- ``glusterfs`` performance profile "virt" enabled on all volumes.
Volumes are set up in two-node replication.
- GlusterFS is a clustered file system that is run on the storage
nodes to provide persistent scalable data storage in the environment.
Because all connections to gluster use the ``gluster`` native mount
points, the ``gluster`` instances themselves provide availability
and failover functionality.
- The scalability of GlusterFS storage can be achieved by adding in
more storage volumes.
|
.. list-table:: Table: OpenStack component configuration
:widths: 20 20 20 20 20
:header-rows: 1
* - Component
- Node type
- Tuning
- Availability
- Scalability
* - Dashboard (horizon)
- Controller
- Configured to use Memcached as a session store, ``neutron``
support is enabled, ``can_set_mount_point = False``
- The dashboard is run on all controller nodes, ensuring at least one
instance will be available in case of node failure.
It also sits behind HAProxy, which detects when the software fails
and routes requests around the failing instance.
- The dashboard is run on all controller nodes, so scalability can be
achieved with additional controller nodes. HAProxy allows scalability
for the dashboard as more nodes are added.
* - Identity (keystone)
- Controller
- Configured to use Memcached for caching and PKI for tokens.
- Identity is run on all controller nodes, ensuring at least one
instance will be available in case of node failure.
Identity also sits behind HAProxy, which detects when the software
fails and routes requests around the failing instance.
- Identity is run on all controller nodes, so scalability can be
achieved with additional controller nodes.
HAProxy allows scalability for Identity as more nodes are added.
* - Image service (glance)
- Controller
- ``/var/lib/glance/images`` is a GlusterFS native mount to a Gluster
volume off the storage layer.
- The Image service is run on all controller nodes, ensuring at least
one instance will be available in case of node failure.
It also sits behind HAProxy, which detects when the software fails
and routes requests around the failing instance.
- The Image service is run on all controller nodes, so scalability
can be achieved with additional controller nodes. HAProxy allows
scalability for the Image service as more nodes are added.
* - Compute (nova)
- Controller, Compute
- Configured to use Qpid, ``qpid_heartbeat = 10``, configured to
use Memcached for caching, configured to use ``libvirt``, configured
to use ``neutron``.
Configured ``nova-consoleauth`` to use Memcached for session
management (so that it can have multiple copies and run in a
load balancer).
- The nova API, scheduler, objectstore, cert, consoleauth, conductor,
and vncproxy services are run on all controller nodes, ensuring at
least one instance will be available in case of node failure.
Compute is also behind HAProxy, which detects when the software
fails and routes requests around the failing instance.
Nova-compute and nova-conductor services, which run on the compute
nodes, are only needed to run services on that node, so availability
of those services is coupled tightly to the nodes that are available.
As long as a compute node is up, it will have the needed services
running on top of it.
- The nova API, scheduler, objectstore, cert, consoleauth, conductor,
and vncproxy services are run on all controller nodes, so scalability
can be achieved with additional controller nodes. HAProxy allows
scalability for Compute as more nodes are added. The scalability
of services running on the compute nodes (compute, conductor) is
achieved linearly by adding in more compute nodes.
* - Block Storage (cinder)
- Controller
- Configured to use Qpid, ``qpid_heartbeat = 10``, configured to
use a Gluster volume from the storage layer as the back end for
Block Storage, using the Gluster native client.
- Block Storage API, scheduler, and volume services are run on all
controller nodes, ensuring at least one instance will be available
in case of node failure. Block Storage also sits behind HAProxy,
which detects if the software fails and routes requests around the
failing instance.
- Block Storage API, scheduler and volume services are run on all
controller nodes, so scalability can be achieved with additional
controller nodes. HAProxy allows scalability for Block Storage as
more nodes are added.
* - OpenStack Networking (neutron)
- Controller, Compute, Network
- Configured to use Qpid, ``qpid_heartbeat = 10``, kernel namespace
support enabled, ``tenant_network_type = vlan``,
``allow_overlapping_ips = true``,
``bridge_uplinks = br-ex:em2``, ``bridge_mappings = physnet1:br-ex``
- The OpenStack Networking service is run on all controller nodes,
ensuring at least one instance will be available in case of node
failure. It also sits behind HAProxy, which detects if the software
fails and routes requests around the failing instance.
- The OpenStack Networking server service is run on all controller
nodes, so scalability can be achieved with additional controller
nodes. HAProxy allows scalability for OpenStack Networking as more
nodes are added. Scalability of services running on the network
nodes is not currently supported by OpenStack Networking, so they
are not considered. One copy of the services should be sufficient
to handle the workload. Scalability of the ``ovs-agent`` running on
compute nodes is achieved by adding in more compute nodes as
necessary.

@ -0,0 +1,259 @@
===============================================
Example Architecture — Legacy Networking (nova)
===============================================
This particular example architecture has been upgraded from :term:`Grizzly` to
:term:`Havana` and tested in production environments where many public IP
addresses are available for assignment to multiple instances. You can
find a second example architecture that uses OpenStack Networking
(neutron) after this section. Each example offers high availability,
meaning that if a particular node goes down, another node with the same
configuration can take over the tasks so that the services continue to
be available.
Overview
~~~~~~~~
The simplest architecture you can build upon for Compute has a single
cloud controller and multiple compute nodes. The simplest architecture
for Object Storage has five nodes: one for identifying users and
proxying requests to the API, then four for storage itself to provide
enough replication for eventual consistency. This example architecture
does not dictate a particular number of nodes, but shows the thinking
and considerations that went into choosing this architecture including
the features offered.
Components
~~~~~~~~~~
.. list-table::
:widths: 50 50
:header-rows: 1
* - Component
- Details
* - OpenStack release
- Havana
* - Host operating system
- Ubuntu 12.04 LTS or Red Hat Enterprise Linux 6.5,
including derivatives such as CentOS and Scientific Linux
* - OpenStack package repository
- `Ubuntu Cloud Archive <https://wiki.ubuntu.com/ServerTeam/CloudArchive>`_
or `RDO <http://openstack.redhat.com/Frequently_Asked_Questions>`_
* - Hypervisor
- KVM
* - Database
- MySQL\*
* - Message queue
- RabbitMQ for Ubuntu; Qpid for Red Hat Enterprise Linux and derivatives
* - Networking service
- ``nova-network``
* - Network manager
- FlatDHCP
* - Single ``nova-network`` or multi-host?
- multi-host\*
* - Image service (glance) back end
- file
* - Identity (keystone) driver
- SQL
* - Block Storage (cinder) back end
- LVM/iSCSI
* - Live Migration back end
- Shared storage using NFS\*
* - Object storage
- OpenStack Object Storage (swift)
An asterisk (\*) indicates when the example architecture deviates from
the settings of a default installation. We'll offer explanations for
those deviations next.
.. note::
The following features of OpenStack are supported by the example
architecture documented in this guide, but are optional:
- :term:`Dashboard`: You probably want to offer a dashboard, but your
users may be more interested in API access only.
- Block storage: You don't have to offer users block storage if
their use case only needs ephemeral storage on compute nodes, for
example.
- Floating IP address: Floating IP addresses are public IP
addresses that you allocate from a predefined pool to assign to
virtual machines at launch. Floating IP addresses ensure that the
public IP address is available whenever an instance is booted.
Not every organization can offer thousands of public floating IP
addresses for thousands of instances, so this feature is
considered optional.
- Live migration: If you need to move running virtual machine
instances from one host to another with little or no service
interruption, you would enable live migration, but it is
considered optional.
- Object storage: You may choose to store machine images on a file
system rather than in object storage if you do not have the extra
hardware for the required replication and redundancy that
OpenStack Object Storage offers.
Rationale
~~~~~~~~~
This example architecture has been selected based on the current default
feature set of OpenStack Havana, with an emphasis on stability. We
believe that many clouds that currently run OpenStack in production have
made similar choices.
You must first choose the operating system that runs on all of the
physical nodes. While OpenStack is supported on several distributions of
Linux, we used *Ubuntu 12.04 LTS (Long Term Support)*, which is used by
the majority of the development community, has feature completeness
compared with other distributions and has clear future support plans.
We recommend that you do not use the default Ubuntu OpenStack install
packages and instead use the `Ubuntu Cloud
Archive <https://wiki.ubuntu.com/ServerTeam/CloudArchive>`__. The Cloud
Archive is a package repository supported by Canonical that allows you
to upgrade to future OpenStack releases while remaining on Ubuntu 12.04.
*KVM* as a :term:`hypervisor` complements the choice of Ubuntu—being a
matched pair in terms of support, and also because of the significant degree
of attention it garners from the OpenStack development community (including
the authors, who mostly use KVM). It is also feature complete, free from
licensing charges and restrictions.
*MySQL* follows a similar trend. Despite its recent change of ownership,
this database is the most tested for use with OpenStack and is heavily
documented. We deviate from the default database, *SQLite*, because
SQLite is not an appropriate database for production usage.
The choice of *RabbitMQ* over other
:term:`AMQP <Advanced Message Queuing Protocol (AMQP)>` compatible options
that are gaining support in OpenStack, such as ZeroMQ and Qpid, is due to its
ease of use and significant testing in production. It also is the only
option that supports features such as Compute cells. We recommend
clustering with RabbitMQ, as it is an integral component of the system
and fairly simple to implement due to its inbuilt nature.
As discussed in previous chapters, there are several options for
networking in OpenStack Compute. We recommend *FlatDHCP* and to use
*Multi-Host* networking mode for high availability, running one
``nova-network`` daemon per OpenStack compute host. This provides a
robust mechanism for ensuring network interruptions are isolated to
individual compute hosts, and allows for the direct use of hardware
network gateways.
*Live Migration* is supported by way of shared storage, with *NFS* as
the distributed file system.
Acknowledging that many small-scale deployments see running Object
Storage just for the storage of virtual machine images as too costly, we
opted for the file back end in the OpenStack :term:`Image service` (Glance).
If your cloud will include Object Storage, you can easily add it as a back
end.
We chose the *SQL back end for Identity* over others, such as LDAP. This
back end is simple to install and is robust. The authors acknowledge
that many installations want to bind with existing directory services
and caution careful understanding of the `array of options available
<http://docs.openstack.org/havana/config-reference/content/ch_configuring-openstack-identity.html#configuring-keystone-for-ldap-backend>`_.
Block Storage (cinder) is installed natively on external storage nodes
and uses the *LVM/iSCSI plug-in*. Most Block Storage plug-ins are tied
to particular vendor products and implementations limiting their use to
consumers of those hardware platforms, but LVM/iSCSI is robust and
stable on commodity hardware.
While the cloud can be run without the *OpenStack Dashboard*, we
consider it to be indispensable, not just for user interaction with the
cloud, but also as a tool for operators. Additionally, the dashboard's
use of Django makes it a flexible framework for extension.
Why not use OpenStack Networking?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This example architecture does not use OpenStack Networking, because it
does not yet support multi-host networking and our organizations
(university, government) have access to a large range of
publicly-accessible IPv4 addresses.
Why use multi-host networking?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In a default OpenStack deployment, there is a single ``nova-network``
service that runs within the cloud (usually on the cloud controller)
that provides services such as
:term:`network address translation <NAT>` (NAT), :term:`DHCP`,
and :term:`DNS` to the guest instances. If the single node that runs the
``nova-network`` service goes down, you cannot access your instances,
and the instances cannot access the Internet. The single node that runs
the ``nova-network`` service can become a bottleneck if excessive
network traffic comes in and goes out of the cloud.
.. note::
`Multi-host <http://docs.openstack.org/havana/install-guide/install/apt/content/nova-network.html>`_
is a high-availability option for the network configuration, where
the ``nova-network`` service is run on every compute node instead of
running on only a single node.
Detailed Description
~~~~~~~~~~~~~~~~~~~~
The reference architecture consists of multiple compute nodes, a cloud
controller, an external NFS storage server for instance storage, and an
OpenStack Block Storage server for volume storage. A network time
service (:term:`Network Time
Protocol <NTP>`, or NTP) synchronizes time on all the nodes. FlatDHCPManager in
multi-host mode is used for the networking. A logical diagram for this
example architecture shows which services are running on each node:
.. image:: figures/osog_01in01.png
:width: 100%
|
The cloud controller runs the dashboard, the API services, the database
(MySQL), a message queue server (RabbitMQ), the scheduler for choosing
compute resources (``nova-scheduler``), Identity services (keystone,
``nova-consoleauth``), Image services (``glance-api``,
``glance-registry``), services for console access of guests, and Block
Storage services, including the scheduler for storage resources
(``cinder-api`` and ``cinder-scheduler``).
Compute nodes are where the computing resources are held, and in our
example architecture, they run the hypervisor (KVM), libvirt (the driver
for the hypervisor, which enables live migration from node to node),
``nova-compute``, ``nova-api-metadata`` (generally only used when
running in multi-host mode, it retrieves instance-specific metadata),
``nova-vncproxy``, and ``nova-network``.
The network consists of two switches, one for the management or private
traffic, and one that covers public access, including floating IPs. To
support this, the cloud controller and the compute nodes have two
network cards. The OpenStack Block Storage and NFS storage servers only
need to access the private network and therefore only need one network
card, but multiple cards run in a bonded configuration are recommended
if possible. Floating IP access is direct to the Internet, whereas Flat
IP access goes through a NAT. To envision the network traffic, use this
diagram:
.. image:: figures/osog_01in02.png
:width: 100%
|
Optional Extensions
-------------------
You can extend this reference architecture as follows:
- Add additional cloud controllers (see :doc:`ops_maintenance`).
- Add an OpenStack Storage service (see the Object Storage chapter in
the *OpenStack Installation Guide* for your distribution).
- Add additional OpenStack Block Storage hosts (see
:doc:`ops_maintenance`).

View File

@ -0,0 +1,11 @@
=========================================
Parting Thoughts on Architecture Examples
=========================================
With so many considerations and options available, our hope is to
provide a few clearly-marked and tested paths for your OpenStack
exploration. If you're looking for additional ideas, check out
:doc:`app_usecases`, the
`OpenStack Installation Guides <http://docs.openstack.org/#install-guides>`_, or the
`OpenStack User Stories
page <http://www.openstack.org/user-stories/>`_.

View File

@ -0,0 +1,30 @@
=====================
Architecture Examples
=====================
To understand the possibilities that OpenStack offers, it's best to
start with a basic architecture that has been tested in production
environments. We offer two examples with basic pivots on the base
operating system (Ubuntu and Red Hat Enterprise Linux) and the
networking architecture. There are other differences between these two
examples, and this guide provides reasons for each choice made.
Because OpenStack is highly configurable, with many different back ends
and network configuration options, it is difficult to write
documentation that covers all possible OpenStack deployments. Therefore,
this guide defines examples of architecture to simplify the task of
documenting, as well as to provide the scope for this guide. Both of the
offered architecture examples are currently running in production and
serving users.
.. note::
As always, refer to the :doc:`common/glossary` if you are unclear
about any of the terminology mentioned in architecture examples.
.. toctree::
:maxdepth: 2
arch_example_nova_network.rst
arch_example_neutron.rst
arch_example_thoughts.rst

View File

@ -0,0 +1,290 @@
==============
Network Design
==============
OpenStack provides a rich networking environment, and this chapter
details the requirements and options to deliberate when designing your
cloud.
.. warning::
If this is the first time you are deploying a cloud infrastructure
in your organization, after reading this section, your first
conversations should be with your networking team. Network usage in
a running cloud is vastly different from traditional network
deployments and has the potential to be disruptive at both a
connectivity and a policy level.
For example, you must plan the number of IP addresses that you need for
both your guest instances as well as management infrastructure.
Additionally, you must research and discuss cloud network connectivity
through proxy servers and firewalls.
In this chapter, we'll give some examples of network implementations to
consider and provide information about some of the network layouts that
OpenStack uses. Finally, we have some brief notes on the networking
services that are essential for stable operation.
Management Network
~~~~~~~~~~~~~~~~~~
A :term:`management network` (a separate network for use by your cloud
operators) typically consists of a separate switch and separate NICs
(network interface cards), and is a recommended option. This segregation
prevents system administration and the monitoring of system access from
being disrupted by traffic generated by guests.
Consider creating other private networks for communication between
internal components of OpenStack, such as the message queue and
OpenStack Compute. Using a virtual local area network (VLAN) works well
for these scenarios because it provides a method for creating multiple
virtual networks on a physical network.
Public Addressing Options
~~~~~~~~~~~~~~~~~~~~~~~~~
There are two main types of IP addresses for guest virtual machines:
fixed IPs and floating IPs. Fixed IPs are assigned to instances on boot,
whereas floating IP addresses can change their association between
instances by action of the user. Both types of IP addresses can be
either public or private, depending on your use case.
Fixed IP addresses are required, whereas it is possible to run OpenStack
without floating IPs. One of the most common use cases for floating IPs
is to provide public IP addresses to a private cloud, where there are a
limited number of IP addresses available. Another is for a public cloud
user to have a "static" IP address that can be reassigned when an
instance is upgraded or moved.
Fixed IP addresses can be private for private clouds, or public for
public clouds. When an instance terminates, its fixed IP is lost. It is
worth noting that newer users of cloud computing may find their
ephemeral nature frustrating.
IP Address Planning
~~~~~~~~~~~~~~~~~~~
An OpenStack installation can potentially have many subnets (ranges of
IP addresses) and different types of services in each. An IP address
plan can assist with a shared understanding of network partition
purposes and scalability. Control services can have public and private
IP addresses, and as noted above, there are a couple of options for an
instance's public addresses.
An IP address plan might be broken down into the following sections:
Subnet router
Packets leaving the subnet go via this address, which could be a
dedicated router or a ``nova-network`` service.
Control services public interfaces
Public access to ``swift-proxy``, ``nova-api``, ``glance-api``, and
horizon come to these addresses, which could be on one side of a
load balancer or pointing at individual machines.
Object Storage cluster internal communications
Traffic among object/account/container servers and between these and
the proxy server's internal interface uses this private network.
Compute and storage communications
If ephemeral or block storage is external to the compute node, this
network is used.
Out-of-band remote management
If a dedicated remote access controller chip is included in servers,
often these are on a separate network.
In-band remote management
Often, an extra (such as 1 Gbps) interface on compute or storage nodes
is used for system administrators or monitoring tools to access the
host instead of going through the public interface.
Spare space for future growth
Adding more public-facing control services or guest instance IPs
should always be part of your plan.
For example, take a deployment that has both OpenStack Compute and
Object Storage, with private ranges 172.22.42.0/24 and 172.22.87.0/26
available. One way to segregate the space might be as follows:
::
172.22.42.0/24:
172.22.42.1 - 172.22.42.3 - subnet routers
172.22.42.4 - 172.22.42.20 - spare for networks
172.22.42.21 - 172.22.42.104 - Compute node remote access controllers
(inc spare)
172.22.42.105 - 172.22.42.188 - Compute node management interfaces (inc spare)
172.22.42.189 - 172.22.42.208 - Swift proxy remote access controllers
(inc spare)
172.22.42.209 - 172.22.42.228 - Swift proxy management interfaces (inc spare)
172.22.42.229 - 172.22.42.252 - Swift storage servers remote access controllers
(inc spare)
172.22.42.253 - 172.22.42.254 - spare
172.22.87.0/26:
172.22.87.1 - 172.22.87.3 - subnet routers
172.22.87.4 - 172.22.87.24 - Swift proxy server internal interfaces
(inc spare)
172.22.87.25 - 172.22.87.63 - Swift object server internal interfaces
(inc spare)
A similar approach can be taken with public IP addresses, taking note
that large, flat ranges are preferred for use with guest instance IPs.
Take into account that for some OpenStack networking options, a public
IP address in the range of a guest instance public IP address is
assigned to the ``nova-compute`` host.
Network Topology
~~~~~~~~~~~~~~~~
OpenStack Compute with ``nova-network`` provides predefined network
deployment models, each with its own strengths and weaknesses. The
selection of a network manager changes your network topology, so the
choice should be made carefully. You also have a choice between the
tried-and-true legacy ``nova-network`` settings or the neutron project
for OpenStack Networking. Both offer networking for launched instances
with different implementations and requirements.
For OpenStack Networking with the neutron project, typical
configurations are documented with the idea that any setup you can
configure with real hardware you can re-create with a software-defined
equivalent. Each tenant can contain typical network elements such as
routers, and services such as :term:`DHCP`.
The following table describes the networking deployment options for both
legacy ``nova-network`` options and an equivalent neutron
configuration.
.. list-table:: Networking deployment options
:widths: 25 25 25 25
:header-rows: 1
* - Network deployment model
- Strengths
- Weaknesses
- Neutron equivalent
* - Flat
- Extremely simple topology. No DHCP overhead.
- Requires file injection into the instance to configure network
interfaces.
- Configure a single bridge as the integration bridge (br-int) and
connect it to a physical network interface with the Modular Layer 2
(ML2) plug-in, which uses Open vSwitch by default.
* - FlatDHCP
- Relatively simple to deploy. Standard networking. Works with all guest
operating systems.
- Requires its own DHCP broadcast domain.
- Configure DHCP agents and routing agents. Network Address Translation
(NAT) performed outside of compute nodes, typically on one or more
network nodes.
* - VlanManager
- Each tenant is isolated to its own VLANs.
- More complex to set up. Requires its own DHCP broadcast domain.
Requires many VLANs to be trunked onto a single port. Standard VLAN
number limitation. Switches must support 802.1q VLAN tagging.
- Isolated tenant networks implement some form of isolation of layer 2
traffic between distinct networks. VLAN tagging is a key concept, where
traffic is “tagged” with an ordinal identifier for the VLAN. Isolated
network implementations may or may not include additional services like
DHCP, NAT, and routing.
* - FlatDHCP Multi-host with high availability (HA)
- Networking failure is isolated to the VMs running on the affected
hypervisor. DHCP traffic can be isolated within an individual host.
Network traffic is distributed to the compute nodes.
- More complex to set up. Compute nodes typically need IP addresses
accessible by external networks. Options must be carefully configured
for live migration to work with networking services.
- Configure neutron with multiple DHCP and layer-3 agents. Network nodes
are not able to failover to each other, so the controller runs
networking services, such as DHCP. Compute nodes run the ML2 plug-in
with support for agents such as Open vSwitch or Linux Bridge.
Both ``nova-network`` and neutron services provide similar capabilities,
such as VLAN between VMs. You also can provide multiple NICs on VMs with
either service. Further discussion follows.
VLAN Configuration Within OpenStack VMs
---------------------------------------
VLAN configuration can be as simple or as complicated as desired. The
use of VLANs has the benefit of allowing each project its own subnet and
broadcast segregation from other projects. To allow OpenStack to
efficiently use VLANs, you must allocate a VLAN range (one for each
project) and turn each compute node switch port into a trunk
port.
For example, if you estimate that your cloud must support a maximum of
100 projects, pick a free VLAN range that your network infrastructure is
currently not using (such as VLAN 200–299). You must configure OpenStack
with this range and also configure your switch ports to allow VLAN
traffic from that range.
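As a rough sketch (the interface name and range are placeholders, and
these are the legacy ``nova-network`` options rather than neutron
settings), the relevant entries in ``/etc/nova/nova.conf`` on the
compute hosts would look something like the following:

.. code-block:: console

   # Inspect the VLAN-related settings on a compute host.
   $ egrep 'network_manager|vlan' /etc/nova/nova.conf
   network_manager = nova.network.manager.VlanManager
   vlan_interface = eth1
   vlan_start = 200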
Multi-NIC Provisioning
----------------------
OpenStack Networking with ``neutron`` and OpenStack Compute with
``nova-network`` have the ability to assign multiple NICs to instances. For
``nova-network`` this can be done on a per-request basis, with each
additional NIC using up an entire subnet or VLAN, reducing the total
number of supported projects.
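For example, an instance can be booted with two NICs by repeating the
``--nic`` option (a sketch; the image and flavor names and the network
UUIDs are placeholders):

.. code-block:: console

   # Attach the instance to two networks; each --nic consumes an
   # address on the referenced network.
   $ nova boot --image ubuntu-14.04 --flavor m1.small \
     --nic net-id=NET1_UUID --nic net-id=NET2_UUID multinic-test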
Multi-Host and Single-Host Networking
-------------------------------------
The ``nova-network`` service has the ability to operate in a multi-host
or single-host mode. Multi-host is when each compute node runs a copy of
``nova-network`` and the instances on that compute node use the compute
node as a gateway to the Internet. The compute nodes also host the
floating IPs and security groups for instances on that node. Single-host
is when a central server—for example, the cloud controller—runs the
``nova-network`` service. All compute nodes forward traffic from the
instances to the cloud controller. The cloud controller then forwards
traffic to the Internet. The cloud controller hosts the floating IPs and
security groups for all instances on all compute nodes in the
cloud.
There are benefits to both modes. Single-host mode has the downside of a
single point of failure. If the cloud controller is not available,
instances cannot communicate on the network. This is not true with
multi-host, but multi-host requires that each compute node has a public
IP address to communicate on the Internet. If you are not able to obtain
a significant block of public IP addresses, multi-host might not be an
option.
Services for Networking
~~~~~~~~~~~~~~~~~~~~~~~
OpenStack, like any network application, has a number of standard
considerations to apply, such as NTP and DNS.
NTP
---
Time synchronization is a critical element to ensure continued operation
of OpenStack components. Correct time is necessary to avoid errors in
instance scheduling, replication of objects in the object store, and
even matching log timestamps for debugging.
All servers running OpenStack components should be able to access an
appropriate NTP server. You may decide to set up one locally or use the
public pools available from the `Network Time Protocol
project <http://www.pool.ntp.org/en/>`_.
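On Ubuntu, for example, installing the NTP daemon and then confirming
that it is peering with upstream servers is usually sufficient
(substitute your distribution's package manager and your own NTP
servers as appropriate):

.. code-block:: console

   # Install and start the NTP daemon.
   $ sudo apt-get install ntp
   # Confirm that peers have been selected and offsets are sane.
   $ ntpq -p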
DNS
---
OpenStack does not currently provide DNS services, aside from the
dnsmasq daemon, which resides on ``nova-network`` hosts. You could
consider providing a dynamic DNS service to allow instances to update a
DNS entry with new IP addresses. You can also consider making a generic
forward and reverse DNS mapping for instances' IP addresses, such as
vm-203-0-113-123.example.com.
Conclusion
~~~~~~~~~~
Armed with your IP address layout and numbers and knowledge about the
topologies and services you can use, it's now time to prepare the
network for your installation. Be sure to also check out the `OpenStack
Security Guide <http://docs.openstack.org/sec/>`_ for tips on securing
your network. We wish you a good relationship with your networking team!

View File

@ -0,0 +1,252 @@
===========================
Provisioning and Deployment
===========================
A critical part of a cloud's scalability is the amount of effort that it
takes to run your cloud. To minimize the operational cost of running
your cloud, set up and use an automated deployment and configuration
infrastructure with a configuration management system, such as :term:`Puppet`
or :term:`Chef`. Combined, these systems greatly reduce manual effort and the
chance for operator error.
This infrastructure includes systems to automatically install the
operating system's initial configuration and later coordinate the
configuration of all services automatically and centrally, which reduces
both manual effort and the chance for error. Examples include Ansible,
CFEngine, Chef, Puppet, and Salt. You can even use OpenStack to deploy
OpenStack, named TripleO (OpenStack On OpenStack).
Automated Deployment
~~~~~~~~~~~~~~~~~~~~
An automated deployment system installs and configures operating systems
on new servers, without intervention, after the absolute minimum amount
of manual work, including physical racking, MAC-to-IP assignment, and
power configuration. Typically, solutions rely on wrappers around PXE
boot and TFTP servers for the basic operating system install and then
hand off to an automated configuration management system.
Both Ubuntu and Red Hat Enterprise Linux include mechanisms for
configuring the operating system, including preseed and kickstart, that
you can use after a network boot. Typically, these are used to bootstrap
an automated configuration system. Alternatively, you can use an
image-based approach for deploying the operating system, such as
systemimager. You can use both approaches with a virtualized
infrastructure, such as when you run VMs to separate your control
services and physical infrastructure.
When you create a deployment plan, focus on a few vital areas because
they are very hard to modify post deployment. The next two sections talk
about configurations for:
- Disk partitioning and disk array setup for scalability
- Networking configuration just for PXE booting
Disk Partitioning and RAID
--------------------------
At the very base of any operating system are the hard drives on which
the operating system (OS) is installed.
You must complete the following configurations on the server's hard
drives:
- Partitioning, which provides greater flexibility for layout of
operating system and swap space, as described below.
- Adding to a RAID array (RAID stands for redundant array of
independent disks), based on the number of disks you have available,
so that you can add capacity as your cloud grows. Some options are
described in more detail below.
The simplest option to get started is to use one hard drive with two
partitions:
- File system to store files and directories, where all the data lives,
including the root partition that starts and runs the system.
- Swap space to free up memory for processes, as an independent area of
the physical disk used only for swapping and nothing else.
RAID is not used in this simplistic one-drive setup because generally
for production clouds, you want to ensure that if one disk fails,
another can take its place. Instead, for production, use more than one
disk. The number of disks determines what types of RAID arrays to build.
We recommend that you choose one of the following multiple disk options:
Option 1
Partition all drives in the same way in a horizontal fashion, as
shown in :ref:`partition_setup`.
With this option, you can assign different partitions to different
RAID arrays. You can allocate partition 1 of disk one and two to the
``/boot`` partition mirror. You can make partition 2 of all disks
the root partition mirror. You can use partition 3 of all disks for
a ``cinder-volumes`` LVM partition running on a RAID 10 array.
.. _partition_setup:
.. figure:: figures/osog_0201.png
Figure. Partition setup of drives
While you might end up with unused partitions, such as partition 1
in disk three and four of this example, this option allows for
maximum utilization of disk space. I/O performance might be an issue
as a result of all disks being used for all tasks.
Option 2
Add all raw disks to one large RAID array, either hardware or
software based. You can partition this large array with the boot,
root, swap, and LVM areas. This option is simple to implement and
uses all partitions. However, disk I/O might suffer.
Option 3
Dedicate entire disks to certain partitions. For example, you could
allocate disk one and two entirely to the boot, root, and swap
partitions under a RAID 1 mirror. Then, allocate disk three and four
entirely to the LVM partition, also under a RAID 1 mirror. Disk I/O
should be better because I/O is focused on dedicated tasks. However,
the LVM partition is much smaller.
.. note::
You may find that you can automate the partitioning itself. For
example, MIT uses `Fully Automatic Installation
(FAI) <http://fai-project.org/>`_ to do the initial PXE-based
partition and then install using a combination of min/max and
percentage-based partitioning.
As with most architecture choices, the right answer depends on your
environment. If you are using existing hardware, you know the disk
density of your servers and can determine some decisions based on the
options above. If you are going through a procurement process, your
user's requirements also help you determine hardware purchases. Here are
some examples from a private cloud providing web developers custom
environments at AT&T. This example is from a specific deployment, so
your existing hardware or procurement opportunity may vary from this.
AT&T uses three types of hardware in its deployment:
- Hardware for controller nodes, used for all stateless OpenStack API
services. About 32–64 GB memory, small attached disk, one processor,
varied number of cores, such as 6–12.
- Hardware for compute nodes. Typically 256 or 144 GB memory, two
processors, 24 cores. 4–6 TB direct attached storage, typically in a
RAID 5 configuration.
- Hardware for storage nodes. Typically for these, the disk space is
optimized for the lowest cost per GB of storage while maintaining
rack-space efficiency.
Again, the right answer depends on your environment. You have to make
your decision based on the trade-offs between space utilization,
simplicity, and I/O performance.
Network Configuration
---------------------
Network configuration is a very large topic that spans multiple areas of
this book. For now, make sure that your servers can PXE boot and
successfully communicate with the deployment server.
For example, you usually cannot configure NICs for VLANs when PXE
booting. Additionally, you usually cannot PXE boot with bonded NICs. If
you run into this scenario, consider using a simple 1 Gbps switch in a
private network on which only your cloud communicates.
Automated Configuration
~~~~~~~~~~~~~~~~~~~~~~~
The purpose of automatic configuration management is to establish and
maintain the consistency of a system without human intervention.
You want to maintain consistency in your deployments so that you can
have the same cloud every time, repeatably. Proper use of automatic
configuration-management tools ensures that components of the cloud
systems are in particular states, in addition to simplifying deployment,
and configuration change propagation.
These tools also make it possible to test and roll back changes, as they
are fully repeatable. Conveniently, a large body of work has been done
by the OpenStack community in this space. Puppet, a configuration
management tool, even provides official modules for OpenStack projects
in an OpenStack infrastructure system known as `Puppet
OpenStack <https://wiki.openstack.org/wiki/Puppet>`_. Chef
configuration management is provided within
https://git.openstack.org/cgit/openstack/openstack-chef-repo. Additional
configuration management systems include Juju, Ansible, and Salt. Also,
PackStack is a command-line utility for Red Hat Enterprise Linux and
derivatives that uses Puppet modules to support rapid deployment of
OpenStack on existing servers over an SSH connection.
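As an illustration of how lightweight such tooling can be, PackStack
can stand up a proof-of-concept, all-in-one environment on a Red
Hat-based host with a single command (a sketch only, not a production
deployment method):

.. code-block:: console

   # Deploy an all-in-one proof-of-concept cloud on the local host.
   $ sudo packstack --allinone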
An integral part of a configuration-management system is the item that
it controls. You should carefully consider all of the items that you
want, or do not want, to be automatically managed. For example, you may
not want to automatically format hard drives with user data.
Remote Management
~~~~~~~~~~~~~~~~~
In our experience, most operators don't sit right next to the servers
running the cloud, and many don't necessarily enjoy visiting the data
center. OpenStack should be entirely remotely configurable, but
sometimes not everything goes according to plan.
In this instance, having out-of-band access to nodes running
OpenStack components is a boon. The IPMI protocol is the de facto
standard here, and acquiring hardware that supports it is highly
recommended to achieve that lights-out data center aim.
In addition, consider remote power control as well. While IPMI usually
controls the server's power state, having remote access to the PDU that
the server is plugged into can really be useful for situations when
everything seems wedged.
Parting Thoughts for Provisioning and Deploying OpenStack
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can save time by understanding the use cases for the cloud you want
to create. Use cases for OpenStack are varied. Some include object
storage only; others require preconfigured compute resources to speed
development-environment setup; and others need fast provisioning of
compute resources that are already secured per tenant with private
networks. Your users may have need for highly redundant servers to make
sure their legacy applications continue to run. Perhaps a goal would be
to architect these legacy applications so that they run on multiple
instances in a cloudy, fault-tolerant way, but not make it a goal to add
to those clusters over time. Your users may indicate that they need
scaling considerations because of heavy Windows server use.
You can save resources by looking at the best fit for the hardware you
have in place already. You might have some high-density storage hardware
available. You could format and repurpose those servers for OpenStack
Object Storage. All of these considerations and input from users help
you build your use case and your deployment plan.
.. note::
For further research about OpenStack deployment, investigate the
supported and documented preconfigured, prepackaged installers for
OpenStack from companies such as
`Canonical <http://www.ubuntu.com/cloud/ubuntu-openstack>`_,
`Cisco <http://www.cisco.com/web/solutions/openstack/index.html>`_,
`Cloudscaling <http://www.cloudscaling.com/>`_,
`IBM <http://www-03.ibm.com/software/products/en/smartcloud-orchestrator/>`_,
`Metacloud <http://www.metacloud.com/>`_,
`Mirantis <http://www.mirantis.com/>`_,
`Piston <http://www.pistoncloud.com/>`_,
`Rackspace <http://www.rackspace.com/cloud/private/>`_, `Red
Hat <http://www.redhat.com/openstack/>`_,
`SUSE <https://www.suse.com/products/suse-cloud/>`_, and
`SwiftStack <https://www.swiftstack.com/>`_.
Conclusion
~~~~~~~~~~
The decisions you make with respect to provisioning and deployment will
affect your day-to-day, week-to-week, and month-to-month maintenance of
the cloud. Your configuration management will be able to evolve over
time. However, more thought and design need to go into upfront
choices about deployment, disk partitioning, and network configuration.

View File

@ -0,0 +1,427 @@
=======
Scaling
=======
Whereas traditional applications required larger hardware to scale
("vertical scaling"), cloud-based applications typically request more,
discrete hardware ("horizontal scaling"). If your cloud is successful,
eventually you must add resources to meet the increasing demand.
To suit the cloud paradigm, OpenStack itself is designed to be
horizontally scalable. Rather than switching to larger servers, you
procure more servers and simply install identically configured services.
Ideally, you scale out and load balance among groups of functionally
identical services (for example, compute nodes or ``nova-api`` nodes)
that communicate on a message bus.
The Starting Point
~~~~~~~~~~~~~~~~~~
Determining the scalability of your cloud and how to improve it is an
exercise with many variables to balance. No one solution meets
everyone's scalability goals. However, it is helpful to track a number
of metrics. Since you can define virtual hardware templates, called
"flavors" in OpenStack, you can start to make scaling decisions based on
the flavors you'll provide. These templates define sizes for memory in
RAM, root disk size, amount of ephemeral data disk space available, and
number of cores for starters.
The default OpenStack flavors are shown in the following table.
.. list-table:: OpenStack default flavors
:widths: 20 20 20 20 20
:header-rows: 1
* - Name
- Virtual cores
- Memory
- Disk
- Ephemeral
* - m1.tiny
- 1
- 512 MB
- 1 GB
- 0 GB
* - m1.small
- 1
- 2 GB
- 10 GB
- 20 GB
* - m1.medium
- 2
- 4 GB
- 10 GB
- 40 GB
* - m1.large
- 4
- 8 GB
- 10 GB
- 80 GB
* - m1.xlarge
- 8
- 16 GB
- 10 GB
- 160 GB
The starting point for most is the core count of your cloud. By applying
some ratios, you can gather information about:
- The number of virtual machines (VMs) you expect to run,
``((overcommit fraction × cores) / virtual cores per instance)``
- How much storage is required ``(flavor disk size × number of instances)``
You can use these ratios to determine how much additional infrastructure
you need to support your cloud.
Here is an example using the ratios for gathering scalability
information for the number of VMs expected as well as the storage
needed. The following numbers support (200 / 2) × 16 = 1600 VM instances
and require 80 TB of storage for ``/var/lib/nova/instances``:
- 200 physical cores.
- Most instances are size m1.medium (two virtual cores, 50 GB of
storage).
- Default CPU overcommit ratio (``cpu_allocation_ratio`` in nova.conf)
of 16:1.
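You can check the arithmetic for this example directly in a shell (the
50 GB per instance is the m1.medium root disk plus ephemeral disk from
the flavor table above):

.. code-block:: console

   # (overcommit ratio x physical cores) / virtual cores per instance
   $ echo $(( 16 * 200 / 2 ))
   1600
   # instances x per-instance storage in GB
   $ echo $(( 1600 * 50 ))
   80000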
.. note::
Regardless of the overcommit ratio, an instance cannot be placed
on any physical node with fewer raw (pre-overcommit) resources than
the instance flavor requires.
However, you need more than the core count alone to estimate the load
that the API services, database servers, and queue servers are likely to
encounter. You must also consider the usage patterns of your cloud.
As a specific example, compare a cloud that supports a managed
web-hosting platform with one running integration tests for a
development project that creates one VM per code commit. In the former,
the heavy work of creating a VM happens only every few months, whereas
the latter puts constant heavy load on the cloud controller. You must
consider your average VM lifetime, as a larger number generally means
less load on the cloud controller.
Aside from the creation and termination of VMs, you must consider the
impact of users accessing the service—particularly on ``nova-api`` and
its associated database. Listing instances garners a great deal of
information and, given the frequency with which users run this
operation, a cloud with a large number of users can increase the load
significantly. This can occur even without their knowledge—leaving the
OpenStack dashboard instances tab open in the browser refreshes the list
of VMs every 30 seconds.
After you consider these factors, you can determine how many cloud
controller cores you require. A typical eight-core server with 8 GB of
RAM is sufficient for up to a rack of compute nodes, given the above
caveats.
You must also consider key hardware specifications for the performance
of user VMs, as well as budget and performance needs, including storage
performance (spindles/core), memory availability (RAM/core), network
bandwidth (Gbps/core), and overall
CPU performance (CPU/core).
.. note::
For a discussion of metric tracking, including how to extract
metrics from your cloud, see :doc:`ops_logging_monitoring`.
Adding Cloud Controller Nodes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can facilitate the horizontal expansion of your cloud by adding
nodes. Adding compute nodes is straightforward—they are easily picked up
by the existing installation. However, you must consider some important
points when you design your cluster to be highly available.
Recall that a cloud controller node runs several different services. You
can install services that communicate only using the message queue
internally—\ ``nova-scheduler`` and ``nova-console``—on a new server for
expansion. However, other integral parts require more care.
You should load balance user-facing services such as dashboard,
``nova-api``, or the Object Storage proxy. Use any standard HTTP
load-balancing method (DNS round robin, hardware load balancer, or
software such as Pound or HAProxy). One caveat with dashboard is the VNC
proxy, which uses the WebSocket protocol—something that an L7 load
balancer might struggle with. See also `Horizon session storage
<http://docs.openstack.org/developer/horizon/topics/deployment.html#session-storage>`_.
You can configure some services, such as ``nova-api`` and
``glance-api``, to use multiple processes by changing a flag in their
configuration file—allowing them to share work between multiple cores on
the one machine.
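As a sketch (option names can vary between releases, so check the
configuration reference for yours), the worker counts might look like
the following:

.. code-block:: console

   # Number of nova-api (compute API) worker processes.
   $ grep osapi_compute_workers /etc/nova/nova.conf
   osapi_compute_workers = 4
   # Number of glance-api worker processes.
   $ grep '^workers' /etc/glance/glance-api.conf
   workers = 4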
.. note::
Several options are available for MySQL load balancing, and the
supported AMQP brokers have built-in clustering support. Information
on how to configure these and many of the other services can be
found in :doc:`operations`.
Segregating Your Cloud
~~~~~~~~~~~~~~~~~~~~~~
When you want to offer users different regions for legal
considerations about data storage, redundancy across earthquake fault
lines, or low-latency API calls, you segregate your cloud. Use one
of the following OpenStack methods to segregate your cloud: *cells*,
*regions*, *availability zones*, or *host aggregates*.
Each method provides different functionality and can be best divided
into two groups:
- Cells and regions, which segregate an entire cloud and result in
running separate Compute deployments.
- :term:`Availability zones <availability zone>` and host aggregates, which
merely divide a single Compute deployment.
The table below provides a comparison view of each segregation method currently
provided by OpenStack Compute.
.. list-table:: OpenStack segregation methods
:widths: 20 20 20 20 20
:header-rows: 1
* -
- Cells
- Regions
- Availability zones
- Host aggregates
* - **Use when you need**
- A single :term:`API endpoint` for compute, or you require a second
level of scheduling.
- Discrete regions with separate API endpoints and no coordination
between regions.
- Logical separation within your nova deployment for physical isolation
or redundancy.
- To schedule a group of hosts with common features.
* - **Example**
- A cloud with multiple sites where you can schedule VMs "anywhere" or on
a particular site.
- A cloud with multiple sites, where you schedule VMs to a particular
site and you want a shared infrastructure.
- A single-site cloud with equipment fed by separate power supplies.
- Scheduling to hosts with trusted hardware support.
* - **Overhead**
- Considered experimental. A new service, nova-cells. Each cell has a full
nova installation except nova-api.
- A different API endpoint for every region. Each region has a full nova
installation.
- Configuration changes to ``nova.conf``.
- Configuration changes to ``nova.conf``.
* - **Shared services**
- Keystone, ``nova-api``
- Keystone
- Keystone, All nova services
- Keystone, All nova services
Cells and Regions
-----------------
OpenStack Compute cells are designed to allow running the cloud in a
distributed fashion without having to use more complicated technologies,
or be invasive to existing nova installations. Hosts in a cloud are
partitioned into groups called *cells*. Cells are configured in a tree.
The top-level cell ("API cell") has a host that runs the ``nova-api``
service, but no ``nova-compute`` services. Each child cell runs all of
the other typical ``nova-*`` services found in a regular installation,
except for the ``nova-api`` service. Each cell has its own message queue
and database service and also runs ``nova-cells``, which manages the
communication between the API cell and child cells.
This allows a single API server to be used to control access to
multiple cloud installations. Introducing a second level of scheduling
(the cell selection), in addition to the regular ``nova-scheduler``
selection of hosts, provides greater flexibility to control where
virtual machines are run.
Unlike having a single API endpoint, regions have a separate API
endpoint per installation, allowing for a more discrete separation.
Users wanting to run instances across sites have to explicitly select a
region. However, the additional complexity of running a new service is
not required.
The OpenStack dashboard (horizon) can be configured to use multiple
regions. This can be configured through the ``AVAILABLE_REGIONS``
parameter.
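A sketch of what this looks like in the dashboard's
``local_settings.py`` (the file path and region names are examples
only; each tuple is an Identity endpoint and the display name for that
region):

.. code-block:: console

   $ grep -A 3 AVAILABLE_REGIONS /etc/openstack-dashboard/local_settings.py
   AVAILABLE_REGIONS = [
       ('https://identity.region-one.example.com:5000/v2.0', 'RegionOne'),
       ('https://identity.region-two.example.com:5000/v2.0', 'RegionTwo'),
   ]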
Availability Zones and Host Aggregates
--------------------------------------
You can use availability zones, host aggregates, or both to partition a
nova deployment.
Availability zones are implemented through and configured in a similar
way to host aggregates.
However, you use them for different reasons.
Availability zone
~~~~~~~~~~~~~~~~~
This enables you to arrange OpenStack compute hosts into logical groups
and provides a form of physical isolation and redundancy from other
availability zones, such as by using a separate power supply or network
equipment.
You define the availability zone in which a specified compute host
resides locally on each server. An availability zone is commonly used to
identify a set of servers that have a common attribute. For instance, if
some of the racks in your data center are on a separate power source,
you can put servers in those racks in their own availability zone.
Availability zones can also help separate different classes of hardware.
When users provision resources, they can specify from which availability
zone they want their instance to be built. This allows cloud consumers
to ensure that their application resources are spread across disparate
machines to achieve high availability in the event of hardware failure.
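In releases where availability zones are backed by host aggregates, a
simple way to do this is to create an aggregate that names the zone
and add the relevant hosts to it (a sketch; the aggregate, zone, host,
image, and flavor names are placeholders):

.. code-block:: console

   # Create an aggregate that exposes the availability zone "rack-a"
   # and place two compute hosts in it.
   $ nova aggregate-create rack-a-servers rack-a
   $ nova aggregate-add-host rack-a-servers compute-01
   $ nova aggregate-add-host rack-a-servers compute-02
   # Users can then request that zone at boot time.
   $ nova boot --availability-zone rack-a --image ubuntu-14.04 \
     --flavor m1.small my-instance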
Host aggregates zone
~~~~~~~~~~~~~~~~~~~~
This enables you to partition OpenStack Compute deployments into logical
groups for load balancing and instance distribution. You can use host
aggregates to further partition an availability zone. For example, you
might use host aggregates to partition an availability zone into groups
of hosts that either share common resources, such as storage and
network, or have a special property, such as trusted computing
hardware.
A common use of host aggregates is to provide information for use with
the ``nova-scheduler``. For example, you might use a host aggregate to
group a set of hosts that share specific flavors or images.
The general case for this is setting key-value pairs in the aggregate
metadata and matching key-value pairs in a flavor's ``extra_specs``
metadata. The ``AggregateInstanceExtraSpecsFilter`` in the filter
scheduler will enforce that instances be scheduled only on hosts in
aggregates that define the same key to the same value.
An advanced use of this general concept allows different flavor types to
run with different CPU and RAM allocation ratios so that high-intensity
computing loads and low-intensity development and testing systems can
share the same cloud without either starving the high-use systems or
wasting resources on low-utilization systems. This works by setting
``metadata`` in your host aggregates and matching ``extra_specs`` in
your flavor types.
The first step is setting the aggregate metadata keys
``cpu_allocation_ratio`` and ``ram_allocation_ratio`` to a
floating-point value. The scheduler filters ``AggregateCoreFilter`` and
``AggregateRamFilter`` will use those values rather than the global
defaults in ``nova.conf`` when scheduling to hosts in the aggregate. It
is important to be cautious when using this feature, since each host can
be in multiple aggregates but should have only one allocation ratio for
each resource. It is up to you to avoid putting a host in multiple
aggregates that define different values for the same resource.
This is the first half of the equation. To get flavor types that are
guaranteed a particular ratio, you must set the ``extra_specs`` in the
flavor type to the key-value pair you want to match in the aggregate.
For example, if you define ``extra_specs`` ``cpu_allocation_ratio`` to
"1.0", then instances of that type will run in aggregates only where the
metadata key ``cpu_allocation_ratio`` is also defined as "1.0." In
practice, it is better to define an additional key-value pair in the
aggregate metadata to match on rather than match directly on
``cpu_allocation_ratio`` or ``core_allocation_ratio``. This allows
better abstraction. For example, by defining a key ``overcommit`` and
setting a value of "high," "medium," or "low," you could then tune the
numeric allocation ratios in the aggregates without also needing to
change all flavor types relating to them.
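Putting the pieces together, a sketch of this pattern with the
hypothetical ``overcommit`` key might look like the following (the
aggregate, host, and flavor names are placeholders, and the
``AggregateInstanceExtraSpecsFilter``, ``AggregateCoreFilter``, and
``AggregateRamFilter`` filters must be enabled in your scheduler
configuration):

.. code-block:: console

   # Group the low-overcommit hosts and tag them with the allocation
   # ratios plus the abstract "overcommit" key.
   $ nova aggregate-create low-overcommit
   $ nova aggregate-set-metadata low-overcommit \
     cpu_allocation_ratio=1.0 ram_allocation_ratio=1.0 overcommit=low
   $ nova aggregate-add-host low-overcommit compute-20
   # Flavors that must land on those hosts match on the same key.
   $ nova flavor-key m1.dedicated set overcommit=low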
.. note::
Previously, all services had an availability zone. Currently, only
the ``nova-compute`` service has its own availability zone. Services
such as ``nova-scheduler``, ``nova-network``, and ``nova-conductor``
have always spanned all availability zones.
When you run any of the following operations, the services appear in
their own internal availability zone
(CONF.internal_service_availability_zone):
- :command:`nova host-list` (os-hosts)
- :command:`euca-describe-availability-zones verbose`
- :command:`nova service-list`
The internal availability zone is hidden in
euca-describe-availability_zones (nonverbose).
CONF.node_availability_zone has been renamed to
CONF.default_availability_zone and is used only by the
``nova-api`` and ``nova-scheduler`` services.
CONF.node_availability_zone still works but is deprecated.
Scalable Hardware
~~~~~~~~~~~~~~~~~
While several resources already exist to help with deploying and
installing OpenStack, it's very important to make sure that you have
your deployment planned out ahead of time. This guide presumes that you
have at least set aside a rack for the OpenStack cloud but also offers
suggestions for when and what to scale.
Hardware Procurement
--------------------
“The Cloud” has been described as a volatile environment where servers
can be created and terminated at will. While this may be true, it does
not mean that your servers must be volatile. Ensuring that your cloud's
hardware is stable and configured correctly means that your cloud
environment remains up and running. Basically, put effort into creating
a stable hardware environment so that you can host a cloud that users
may treat as unstable and volatile.
OpenStack can be deployed on any hardware supported by an
OpenStack-compatible Linux distribution.
Hardware does not have to be consistent, but it should at least have the
same type of CPU to support instance migration.
The typical hardware recommended for use with OpenStack is the standard
value-for-money offerings that most hardware vendors stock. It should be
straightforward to divide your procurement into building blocks such as
"compute," "object storage," and "cloud controller," and request as many
of these as you need. Alternatively, if you are unable to spend more,
existing servers are quite likely to be able to support OpenStack,
provided they meet your performance and virtualization requirements.
Capacity Planning
-----------------
OpenStack is designed to increase in size in a straightforward manner.
Taking into account the considerations that we've mentioned in this
chapter—particularly on the sizing of the cloud controller—it should be
possible to procure additional compute or object storage nodes as
needed. New nodes do not need to be the same specification, or even
vendor, as existing nodes.
For compute nodes, ``nova-scheduler`` will take care of differences in
sizing having to do with core count and RAM amounts; however, you should
consider that the user experience changes with differing CPU speeds.
When adding object storage nodes, a weight should be specified that
reflects the capability of the node.
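Weights are set when devices are added to the rings; as a sketch (the
builder file, zone, IP address, device, and weight are placeholders,
and the exact ``add`` syntax differs slightly between Swift releases):

.. code-block:: console

   # Add a larger-than-average device with a proportionally larger
   # weight, then rebalance the ring.
   $ swift-ring-builder object.builder add z3-192.0.2.15:6000/sdb1 150
   $ swift-ring-builder object.builder rebalance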
Monitoring the resource usage and user growth will enable you to know
when to procure. :doc:`ops_logging_monitoring` details some useful metrics.
Burn-in Testing
---------------
The chances of failure for the server's hardware are high at the start
and the end of its life. As a result, dealing with hardware failures
while in production can be avoided by appropriate burn-in testing to
attempt to trigger the early-stage failures. The general principle is to
stress the hardware to its limits. Examples of burn-in tests include
running a CPU or disk benchmark for several days.
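For example (a sketch; the tools shown are common choices rather than
OpenStack requirements, and the ``badblocks`` write test destroys any
data on the target disk):

.. code-block:: console

   # Load the CPU cores, I/O subsystem, and memory for three days.
   $ sudo apt-get install stress
   $ stress --cpu 8 --io 4 --vm 2 --vm-bytes 1G --timeout 72h
   # Destructive read/write surface scan of a blank data disk.
   $ sudo badblocks -wsv /dev/sdb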

View File

@ -0,0 +1,521 @@
=================
Storage Decisions
=================
Storage is found in many parts of the OpenStack stack, and the differing
types can cause confusion to even experienced cloud engineers. This
section focuses on persistent storage options you can configure with
your cloud. It's important to understand the distinction between
:term:`ephemeral <ephemeral volume>` storage and
:term:`persistent <persistent volume>` storage.
Ephemeral Storage
~~~~~~~~~~~~~~~~~
If you deploy only the OpenStack :term:`Compute service` (nova), your users do
not have access to any form of persistent storage by default. The disks
associated with VMs are "ephemeral," meaning that (from the user's point
of view) they effectively disappear when a virtual machine is
terminated.
Persistent Storage
~~~~~~~~~~~~~~~~~~
Persistent storage means that the storage resource outlives any other
resource and is always available, regardless of the state of a running
instance.
Today, OpenStack clouds explicitly support three types of persistent
storage: *object storage*, *block storage*, and *file system storage*.
Object Storage
--------------
With object storage, users access binary objects through a REST API. You
may be familiar with Amazon S3, which is a well-known example of an
object storage system. Object storage is implemented in OpenStack by the
OpenStack Object Storage (swift) project. If your intended users need to
archive or manage large datasets, you want to provide them with object
storage. In addition, OpenStack can store your virtual machine (VM)
images inside of an object storage system, as an alternative to storing
the images on a file system.
OpenStack Object Storage provides a highly scalable, highly available
storage solution by relaxing some of the constraints of traditional file
systems. In designing and procuring for such a cluster, it is important
to understand some key concepts about its operation. Essentially, this
type of storage is built on the idea that all storage hardware fails, at
every level, at some point. Infrequently encountered failures that would
hamstring other storage systems, such as issues taking down RAID cards
or entire servers, are handled gracefully with OpenStack Object
Storage.
A good document describing the Object Storage architecture is found
within the `developer
documentation <http://docs.openstack.org/developer/swift/overview_architecture.html>`_
— read this first. Once you understand the architecture, you should know what a
proxy server does and how zones work. However, some important points are
often missed at first glance.
When designing your cluster, you must consider durability and
availability. Understand that the predominant source of these is the
spread and placement of your data, rather than the reliability of the
hardware. Consider the default value of the number of replicas, which is
three. This means that before an object is marked as having been
written, at least two copies exist—in case a single server fails to
write, the third copy may or may not yet exist when the write operation
initially returns. Altering this number increases the robustness of your
data, but reduces the amount of storage you have available. Next, look
at the placement of your servers. Consider spreading them widely
throughout your data center's network and power-failure zones. Is a zone
a rack, a server, or a disk?
Object Storage's network patterns might seem unfamiliar at first.
Consider these main traffic flows:
- Among :term:`object`, :term:`container`, and
:term:`account servers <account server>`
- Between those servers and the proxies
- Between the proxies and your users
Object Storage is very "chatty" among servers hosting data—even a small
cluster does megabytes/second of traffic, which is predominantly, “Do
you have the object?”/“Yes I have the object!” Of course, if the answer
to the aforementioned question is negative or the request times out,
replication of the object begins.
Consider the scenario where an entire server fails and 24 TB of data
needs to be transferred "immediately" to remain at three copies—this can
put significant load on the network.
Another fact that's often forgotten is that when a new file is being
uploaded, the proxy server must write out as many streams as there are
replicas—giving a multiple of network traffic. For a three-replica
cluster, 10 Gbps in means 30 Gbps out. Combining this with the previous
high bandwidth demands of replication is what results in the
recommendation that your private network have significantly higher
bandwidth than your public network needs. Oh, and OpenStack Object
Storage communicates internally with
unencrypted, unauthenticated rsync for performance—you do want the
private network to be private.
The remaining point on bandwidth is the public-facing portion. The
``swift-proxy`` service is stateless, which means that you can easily
add more and use HTTP load-balancing methods to share bandwidth and
availability between them.
More proxies means more bandwidth, if your storage can keep up.
Block Storage
-------------
Block storage (sometimes referred to as volume storage) provides users
with access to block-storage devices. Users interact with block storage
by attaching volumes to their running VM instances.
These volumes are persistent: they can be detached from one instance and
re-attached to another, and the data remains intact. Block storage is
implemented in OpenStack by the OpenStack Block Storage (cinder)
project, which supports multiple back ends in the form of drivers. Your
choice of a storage back end must be supported by a Block Storage
driver.
Most block storage drivers allow the instance to have direct access to
the underlying storage hardware's block device. This helps increase the
overall read/write IO. However, support for utilizing files as volumes
is also well established, with full support for NFS, GlusterFS and
others.
These drivers work a little differently than a traditional "block"
storage driver. On an NFS or GlusterFS file system, a single file is
created and then mapped as a "virtual" volume into the instance. This
mapping/translation is similar to how OpenStack utilizes QEMU's
file-based virtual machines stored in ``/var/lib/nova/instances``.
Shared File Systems Service
---------------------------
The Shared File Systems service provides a set of services for
management of Shared File Systems in a multi-tenant cloud environment.
Users interact with the Shared File Systems service by mounting remote
file systems on their instances and then using those systems for
file storage and exchange. The Shared File Systems service provides you
with shares. A share is a remote, mountable file system. You can mount a
share to, and access a share from, several hosts by several users at a
time. With shares, users can also:
- Create a share specifying its size, shared file system protocol,
visibility level
- Create a share on either a share server or standalone, depending on
the selected back-end mode, with or without using a share network.
- Specify access rules and security services for existing shares.
- Combine several shares in groups to keep data consistent inside the
groups for safe group operations.
- Create a snapshot of a selected share or a share group for storing
the existing shares consistently or creating new shares from that
snapshot in a consistent way
- Create a share from a snapshot.
- Set rate limits and quotas for specific shares and snapshots
- View usage of share resources
- Remove shares.
Like Block Storage, the Shared File Systems service is persistent. It
can be:
- Mounted to any number of client machines.
- Detached from one instance and attached to another without data loss.
During this process the data are safe unless the Shared File Systems
service itself is changed or removed.
Shares are provided by the Shared File Systems service. In OpenStack,
the Shared File Systems service is implemented by the Shared File Systems
(manila) project, which supports multiple back-ends in the form of
drivers. The Shared File Systems service can be configured to provision
shares from one or more back-ends. Share servers are, mostly, virtual
machines that export file shares via different protocols such as NFS,
CIFS, GlusterFS, or HDFS.
OpenStack Storage Concepts
~~~~~~~~~~~~~~~~~~~~~~~~~~
The table below explains the different storage concepts provided by OpenStack.
.. list-table:: OpenStack storage
:widths: 20 20 20 20 20
:header-rows: 1
* -
- Ephemeral storage
- Block storage
- Object storage
- Shared File System storage
* - Used to…
- Run operating system and scratch space
- Add additional persistent storage to a virtual machine (VM)
- Store data, including VM images
- Add additional persistent storage to a virtual machine
* - Accessed through…
- A file system
- A block device that can be partitioned, formatted, and mounted
(such as, /dev/vdc)
- The REST API
- A Shared File Systems service share (either manila managed or an
external one registered in manila) that can be partitioned, formatted
and mounted (such as /dev/vdc)
* - Accessible from…
- Within a VM
- Within a VM
- Anywhere
- Within a VM
* - Managed by…
- OpenStack Compute (nova)
- OpenStack Block Storage (cinder)
- OpenStack Object Storage (swift)
- OpenStack Shared File System Storage (manila)
* - Persists until…
- VM is terminated
- Deleted by user
- Deleted by user
- Deleted by user
* - Sizing determined by…
- Administrator configuration of size settings, known as *flavors*
- User specification in initial request
- Amount of available physical storage
- * User specification in initial request
* Requests for extension
* Available user-level quotas
* Limitations applied by Administrator
* - Encryption set by…
- Parameter in nova.conf
- Admin establishing `encrypted volume type
<http://docs.openstack.org/admin-guide/dashboard_manage_volumes.html>`_,
then user selecting encrypted volume
- Not yet available
- Shared File Systems service does not apply any additional encryption
above what the shares back-end storage provides
* - Example of typical usage…
- 10 GB first disk, 30 GB second disk
- 1 TB disk
- 10s of TBs of dataset storage
- Depends completely on the size of back-end storage specified when
a share was being created. In case of thin provisioning it can be
partial space reservation (for more details see
`Capabilities and Extra-Specs
<http://docs.openstack.org/developer/manila/devref/capabilities_and_extra_specs.html?highlight=extra%20specs#common-capabilities>`_
specification)
With file-level storage, users access stored data using the operating
system's file system interface. Most users, if they have used a network
storage solution before, have encountered this form of networked
storage. In the Unix world, the most common form of this is NFS. In the
Windows world, the most common form is called CIFS (previously,
SMB).
OpenStack clouds do not present file-level storage to end users.
However, it is important to consider file-level storage for storing
instances under ``/var/lib/nova/instances`` when designing your cloud,
since you must have a shared file system if you want to support live
migration.
Choosing Storage Back Ends
~~~~~~~~~~~~~~~~~~~~~~~~~~
Users will indicate different needs for their cloud use cases. Some may
need fast access to many objects that do not change often, or want to
set a time-to-live (TTL) value on a file. Others may access only storage
that is mounted with the file system itself, but want it to be
replicated instantly when starting a new instance. For other systems,
ephemeral storage—storage that is released when a VM attached to it is
shut down—is the preferred way. When you select
:term:`storage back ends <storage back end>`,
ask the following questions on behalf of your users:
- Do my users need block storage?
- Do my users need object storage?
- Do I need to support live migration?
- Should my persistent storage drives be contained in my compute nodes,
or should I use external storage?
- What is the platter count I can achieve? Do more spindles result in
better I/O despite network access?
- Which one results in the best cost-performance scenario I'm aiming
for?
- How do I manage the storage operationally?
- How redundant and distributed is the storage? What happens if a
storage node fails? To what extent can it mitigate my data-loss
disaster scenarios?
To deploy your storage by using only commodity hardware, you can use a
number of open-source packages, as shown in the following table.
.. list-table:: Persistent file-based storage support
:widths: 25 25 25 25
:header-rows: 1
* -  
- Object
- Block
- File-level
* - Swift
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
-
-  
* - LVM
-
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
-  
* - Ceph
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
- Experimental
* - Gluster
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
* - NFS
-
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
* - ZFS
-
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
-  
* - Sheepdog
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
-
.. note::
This list of open source file-level shared storage solutions is not
exhaustive; other open source solutions exist (for example, MooseFS). Your
organization may already have deployed a file-level shared storage
solution that you can use.
**Storage Driver Support**
In addition to the open source technologies, there are a number of
proprietary solutions that are officially supported by OpenStack Block
Storage. They are offered by the following vendors:
- IBM (Storwize family/SVC, XIV)
- NetApp
- Nexenta
- SolidFire
You can find a matrix of the functionality provided by all of the
supported Block Storage drivers on the `OpenStack
wiki <https://wiki.openstack.org/wiki/CinderSupportMatrix>`_.
Also, you need to decide whether you want to support object storage in
your cloud. The two common use cases for providing object storage in a
compute cloud are:
- To provide users with a persistent storage mechanism
- As a scalable, reliable data store for virtual machine images
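As a sketch of the second use case, once the Image service has been configured to keep its images in Object Storage, uploading an image works just as it normally does; the file and image names here are placeholders:
.. code-block:: console
$ openstack image create --disk-format qcow2 --container-format bare \
--file ubuntu-14.04.qcow2 ubuntu-14.04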
Commodity Storage Back-end Technologies
---------------------------------------
This section provides a high-level overview of the differences among the
commodity storage back-end technologies. Depending on your
cloud users' needs, you can implement one or many of these technologies
in different combinations:
OpenStack Object Storage (swift)
The official OpenStack Object Store implementation. It is a mature
technology that has been used for several years in production by
Rackspace as the technology behind Rackspace Cloud Files. As it is
highly scalable, it is well-suited to managing petabytes of storage.
OpenStack Object Storage's advantages are better integration with
OpenStack (integrates with OpenStack Identity, works with the
OpenStack dashboard interface) and better support for multiple data
center deployment through support of asynchronous eventual
consistency replication.
Therefore, if you eventually plan on distributing your storage
cluster across multiple data centers, if you need unified accounts
for your users for both compute and object storage, or if you want
to control your object storage with the OpenStack dashboard, you
should consider OpenStack Object Storage. More detail about OpenStack
Object Storage can be found in the section below.
Ceph
A scalable storage solution that replicates data across commodity
storage nodes. Ceph was originally developed by one of the founders
of DreamHost and is currently used in production there.
Ceph was designed to expose different types of storage interfaces to
the end user: it supports object storage, block storage, and
file-system interfaces, although the file-system interface is not
yet considered production-ready. Ceph supports the same API as swift
for object storage and can be used as a back end for cinder block
storage as well as back-end storage for glance images. Ceph supports
"thin provisioning," implemented using copy-on-write.
This can be useful when booting from volume because a new volume can
be provisioned very quickly. Ceph also supports keystone-based
authentication (as of version 0.56), so it can be a seamless swap-in
for the default OpenStack swift implementation.
Ceph's advantages are that it gives the administrator more
fine-grained control over data distribution and replication
strategies, enables you to consolidate your object and block
storage, enables very fast provisioning of boot-from-volume
instances using thin provisioning, and supports a distributed
file-system interface, though this interface is `not yet
recommended <http://ceph.com/docs/master/cephfs/>`_ for use in
production deployment by the Ceph project.
If you want to manage your object and block storage within a single
system, or if you want to support fast boot-from-volume, you should
consider Ceph.
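As a rough configuration sketch only (the pool name, user, and secret UUID are assumptions you would replace with your own), pointing Block Storage at a Ceph RBD pool involves a handful of ``cinder.conf`` options along these lines:
.. code-block:: ini
[DEFAULT]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_user = cinder
rbd_secret_uuid = 00000000-0000-0000-0000-000000000000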
Gluster
A distributed, shared file system. As of Gluster version 3.3, you
can use Gluster to consolidate your object storage and file storage
into one unified file and object storage solution, which is called
Gluster For OpenStack (GFO). GFO uses a customized version of swift
that enables Gluster to be used as the back-end storage.
The main reason to use GFO rather than regular swift is if you also
want to support a distributed file system, either to support shared
storage live migration or to provide it as a separate service to
your end users. If you want to manage your object and file storage
within a single system, you should consider GFO.
LVM
The Logical Volume Manager is a Linux-based system that provides an
abstraction layer on top of physical disks to expose logical volumes
to the operating system. The LVM back-end implements block storage
as LVM logical volumes.
On each host that will house block storage, an administrator must
initially create a volume group dedicated to Block Storage volumes.
Blocks are created from LVM logical volumes.
.. note::
LVM does *not* provide any replication. Typically,
administrators configure RAID on nodes that use LVM as block
storage to protect against failures of individual hard drives.
However, RAID does not protect against a failure of the entire
host.
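A minimal sketch of that initial setup, assuming ``/dev/sdb`` is the disk (or RAID device) set aside for Block Storage and keeping the driver's default volume group name:
.. code-block:: console
# pvcreate /dev/sdb
# vgcreate cinder-volumes /dev/sdb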
ZFS
The Solaris iSCSI driver for OpenStack Block Storage implements
blocks as ZFS entities. ZFS is a file system that also has the
functionality of a volume manager. This is unlike on a Linux system,
where there is a separation of volume manager (LVM) and file system
(such as ext3, ext4, XFS, and Btrfs). ZFS has a number of
advantages over ext4, including improved data-integrity checking.
The ZFS back end for OpenStack Block Storage supports only
Solaris-based systems, such as Illumos. While there is a Linux port
of ZFS, it is not included in any of the standard Linux
distributions, and it has not been tested with OpenStack Block
Storage. As with LVM, ZFS does not provide replication across hosts
on its own; you need to add a replication solution on top of ZFS if
your cloud needs to be able to handle storage-node failures.
We don't recommend ZFS unless you have previous experience with
deploying it, since the ZFS back end for Block Storage requires a
Solaris-based operating system, and we assume that your experience
is primarily with Linux-based systems.
Sheepdog
Sheepdog is a userspace distributed storage system. Sheepdog scales
to several hundred nodes and has powerful virtual disk management
features such as snapshots, cloning, rollback, and thin provisioning.
It is essentially an object storage system that manages disks and
aggregates their space and performance linearly at scale on
commodity hardware. On top of its object store, Sheepdog provides an
elastic volume service and an HTTP service. Sheepdog makes no
assumptions about the kernel version and works with any file system
that supports extended attributes (xattr).
Conclusion
~~~~~~~~~~
We hope that you now have some considerations in mind and questions to
ask your future cloud users about their storage use cases. As you can
see, your storage decisions will also influence your network design for
performance and security needs. Continue with us to make more informed
decisions about your OpenStack cloud design.
View File
@@ -0,0 +1,52 @@
============
Architecture
============
Designing an OpenStack cloud is a great achievement. It requires a
robust understanding of the requirements and needs of the cloud's users
to determine the best possible configuration to meet them. OpenStack
provides a great deal of flexibility to achieve your needs, and this
part of the book aims to shine light on many of the decisions you need
to make during the process.
To design, deploy, and configure OpenStack, administrators must
understand the logical architecture. A diagram can help you envision all
the integrated services within OpenStack and how they interact with each
other.
OpenStack modules are one of the following types:
Daemon
Runs as a background process. On Linux platforms, a daemon is usually
installed as a service.
Script
Installs a virtual environment and runs tests.
Command-line interface (CLI)
Enables users to submit API calls to OpenStack services through commands.
As shown, end users can interact through the dashboard, CLIs, and APIs.
All services authenticate through a common Identity service, and
individual services interact with each other through public APIs, except
where privileged administrator commands are necessary.
:ref:`logical_architecture` shows the most common, but not the only logical
architecture for an OpenStack cloud.
.. _logical_architecture:
.. figure:: figures/osog_0001.png
:width: 100%
Figure. OpenStack Logical Architecture
.. toctree::
:maxdepth: 2
arch_examples.rst
arch_provision.rst
arch_cloud_controller.rst
arch_compute_nodes.rst
arch_scaling.rst
arch_storage.rst
arch_network_design.rst
1
doc/ops-guide/source/common Symbolic link
View File
@@ -0,0 +1 @@
../../common
View File
@@ -0,0 +1,290 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import os
# import sys
import openstackdocstheme
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
# sys.path.insert(0, os.path.abspath('.'))
# -- General configuration ------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = []
# Add any paths that contain templates here, relative to this directory.
# templates_path = ['_templates']
# The suffix of source filenames.
source_suffix = '.rst'
# The encoding of source files.
# source_encoding = 'utf-8-sig'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = u'Operations Guide'
bug_tag = u'ops-guide'
copyright = u'2016, OpenStack contributors'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = '0.0.1'
# The full version, including alpha/beta/rc tags.
release = '0.0.1'
# A few variables have to be set for the log-a-bug feature.
# giturl: The location of conf.py on Git. Must be set manually.
# gitsha: The SHA checksum of the bug description. Automatically extracted from git log.
# bug_tag: Tag for categorizing the bug. Must be set manually.
# These variables are passed to the logabug code via html_context.
giturl = u'http://git.openstack.org/cgit/openstack/openstack-manuals/tree/doc/ops-guide/source'
git_cmd = "/usr/bin/git log | head -n1 | cut -f2 -d' '"
gitsha = os.popen(git_cmd).read().strip('\n')
html_context = {"gitsha": gitsha, "bug_tag": bug_tag,
"giturl": giturl}
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
# language = None
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
# today = ''
# Else, today_fmt is used as the format for a strftime call.
# today_fmt = '%B %d, %Y'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['common/cli*', 'common/nova*',
'common/get_started*', 'common/dashboard*']
# The reST default role (used for this markup: `text`) to use for all
# documents.
# default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
# add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
# add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
# show_authors = False
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# A list of ignored prefixes for module index sorting.
# modindex_common_prefix = []
# If true, keep warnings as "system message" paragraphs in the built documents.
# keep_warnings = False
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'openstackdocs'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
# html_theme_options = {}
# Add any paths that contain custom themes here, relative to this directory.
html_theme_path = [openstackdocstheme.get_html_theme_path()]
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
# html_title = None
# A shorter title for the navigation bar. Default is the same as html_title.
# html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
# html_logo = None
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
# html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
# html_static_path = []
# Add any extra paths that contain custom files (such as robots.txt or
# .htaccess) here, relative to this directory. These files are copied
# directly to the root of the documentation.
# html_extra_path = []
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
# So that we can enable "log-a-bug" links from each output HTML page, this
# variable must be set to a format that includes year, month, day, hours and
# minutes.
html_last_updated_fmt = '%Y-%m-%d %H:%M'
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
# html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
# html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
# html_additional_pages = {}
# If false, no module index is generated.
# html_domain_indices = True
# If false, no index is generated.
html_use_index = False
# If true, the index is split into individual pages for each letter.
# html_split_index = False
# If true, links to the reST sources are added to the pages.
html_show_sourcelink = False
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
# html_show_sphinx = True
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
# html_show_copyright = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
# html_use_opensearch = ''
# This is the file name suffix for HTML files (e.g. ".xhtml").
# html_file_suffix = None
# Output file base name for HTML help builder.
htmlhelp_basename = 'ops-guide'
# If true, publish source files
html_copy_source = False
# -- Options for LaTeX output ---------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
# 'preamble': '',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
('index', 'OpsGuide.tex', u'Operations Guide',
u'OpenStack contributors', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
# latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
# latex_use_parts = False
# If true, show page references after internal links.
# latex_show_pagerefs = False
# If true, show URL addresses after external links.
# latex_show_urls = False
# Documents to append as an appendix to all manuals.
# latex_appendices = []
# If false, no module index is generated.
# latex_domain_indices = True
# -- Options for manual page output ---------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
('index', 'opsguide', u'Operations Guide',
[u'OpenStack contributors'], 1)
]
# If true, show URL addresses after external links.
# man_show_urls = False
# -- Options for Texinfo output -------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
('index', 'OpsGuide', u'Operations Guide',
u'OpenStack contributors', 'OpsGuide',
'This book provides information about designing and operating '
'OpenStack clouds.', 'Miscellaneous'),
]
# Documents to append as an appendix to all manuals.
# texinfo_appendices = []
# If false, no module index is generated.
# texinfo_domain_indices = True
# How to display URL addresses: 'footnote', 'no', or 'inline'.
# texinfo_show_urls = 'footnote'
# If true, do not generate a @detailmenu in the "Top" node's menu.
# texinfo_no_detailmenu = False
# -- Options for Internationalization output ------------------------------
locale_dirs = ['locale/']
Binary file not shown (new image, 3.0 KiB).
View File
@@ -0,0 +1,60 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://web.resource.org/cc/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
width="19.21315"
height="18.294994"
id="svg2"
sodipodi:version="0.32"
inkscape:version="0.45"
sodipodi:modified="true"
version="1.0">
<defs
id="defs4" />
<sodipodi:namedview
id="base"
pagecolor="#ffffff"
bordercolor="#666666"
borderopacity="1.0"
gridtolerance="10000"
guidetolerance="10"
objecttolerance="10"
inkscape:pageopacity="0.0"
inkscape:pageshadow="2"
inkscape:zoom="7.9195959"
inkscape:cx="17.757032"
inkscape:cy="7.298821"
inkscape:document-units="px"
inkscape:current-layer="layer1"
inkscape:window-width="984"
inkscape:window-height="852"
inkscape:window-x="148"
inkscape:window-y="66" />
<metadata
id="metadata7">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
</cc:Work>
</rdf:RDF>
</metadata>
<g
inkscape:label="Layer 1"
inkscape:groupmode="layer"
id="layer1"
transform="translate(-192.905,-516.02064)">
<path
style="fill:#000000"
d="M 197.67968,534.31563 C 197.40468,534.31208 196.21788,532.53719 195.04234,530.37143 L 192.905,526.43368 L 193.45901,525.87968 C 193.76371,525.57497 194.58269,525.32567 195.27896,525.32567 L 196.5449,525.32567 L 197.18129,527.33076 L 197.81768,529.33584 L 202.88215,523.79451 C 205.66761,520.74678 208.88522,517.75085 210.03239,517.13691 L 212.11815,516.02064 L 207.90871,520.80282 C 205.59351,523.43302 202.45735,527.55085 200.93947,529.95355 C 199.42159,532.35625 197.95468,534.31919 197.67968,534.31563 z "
id="path2223" />
</g>
</svg>
(New image, 2.1 KiB. The change set then adds further figure files whose contents are not shown here: several diffs suppressed by the viewer as too long or too large, plus binary image files ranging from roughly 31 KiB to 765 KiB.)
View File
@@ -0,0 +1,26 @@
==========================
OpenStack Operations Guide
==========================
Abstract
~~~~~~~~
This book provides information about designing and operating OpenStack clouds.
Contents
~~~~~~~~
.. toctree::
:maxdepth: 2
acknowledgements.rst
preface_ops.rst
architecture.rst
operations.rst
app_usecases.rst
app_crypt.rst
app_roadmaps.rst
app_resources.rst
common/app_support.rst
common/glossary.rst
View File
@@ -0,0 +1,41 @@
==========
Operations
==========
Congratulations! By now, you should have a solid design for your cloud.
We now recommend that you turn to the `OpenStack Installation Guides
<http://docs.openstack.org/index.html#install-guides>`_, which contains a
step-by-step guide on how to manually install the OpenStack packages and
dependencies on your cloud.
While it is important for an operator to be familiar with the steps
involved in deploying OpenStack, we also strongly encourage you to
evaluate configuration-management tools, such as :term:`Puppet` or
:term:`Chef`, which can help automate this deployment process.
In the remainder of this guide, we assume that you have successfully
deployed an OpenStack cloud and are able to perform basic operations
such as adding images, booting instances, and attaching volumes.
As your focus turns to stable operations, we recommend that you do skim
the remainder of this book to get a sense of the content. Some of this
content is useful to read in advance so that you can put best practices
into effect to simplify your life in the long run. Other content is more
useful as a reference that you might turn to when an unexpected event
occurs (such as a power failure), or to troubleshoot a particular
problem.
.. toctree::
:maxdepth: 2
ops_lay_of_the_land.rst
ops_projects_users.rst
ops_user_facing_operations.rst
ops_maintenance.rst
ops_network_troubleshooting.rst
ops_logging_monitoring.rst
ops_backup_recovery.rst
ops_customize.rst
ops_upstream.rst
ops_advanced_configuration.rst
ops_upgrades.rst
View File
@@ -0,0 +1,163 @@
======================
Advanced Configuration
======================
OpenStack is intended to work well across a variety of installation
flavors, from very small private clouds to large public clouds. To
achieve this, the developers add configuration options to their code
that allow the behavior of the various components to be tweaked
depending on your needs. Unfortunately, it is not possible to cover all
possible deployments with the default configuration values.
At the time of writing, OpenStack has more than 3,000 configuration
options. You can see them documented at the
`OpenStack configuration reference
guide <http://docs.openstack.org/liberty/config-reference/content/config_overview.html>`_.
This chapter cannot hope to document all of these, but we do try to
introduce the important concepts so that you know where to go digging
for more information.
Differences Between Various Drivers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Many OpenStack projects implement a driver layer, and each of these
drivers will implement its own configuration options. For example, in
OpenStack Compute (nova), there are various hypervisor drivers
implemented—libvirt, xenserver, hyper-v, and vmware, for example. Not
all of these hypervisor drivers have the same features, and each has
different tuning requirements.
.. note::
The currently implemented hypervisors are listed on the `OpenStack
documentation
website <http://docs.openstack.org/liberty/config-reference/content/section_compute-hypervisors.html>`_.
You can see a matrix of the various features in OpenStack Compute
(nova) hypervisor drivers on the OpenStack wiki at the `Hypervisor
support matrix
page <http://docs.openstack.org/developer/nova/support-matrix.html>`_.
The point we are trying to make here is that just because an option
exists doesn't mean that option is relevant to your driver choices.
Normally, the documentation notes which drivers the configuration
applies to.
Implementing Periodic Tasks
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Another common concept across various OpenStack projects is that of
periodic tasks. Periodic tasks are much like cron jobs on traditional
Unix systems, but they are run inside an OpenStack process. For example,
when OpenStack Compute (nova) needs to work out what images it can
remove from its local cache, it runs a periodic task to do this.
Periodic tasks are important to understand because of limitations in the
threading model that OpenStack uses. OpenStack uses cooperative
threading in Python, which means that if something long and complicated
is running, it will block other tasks inside that process from running
unless it voluntarily yields execution to another cooperative thread.
A tangible example of this is the ``nova-compute`` process. In order to
manage the image cache with libvirt, ``nova-compute`` has a periodic
process that scans the contents of the image cache. Part of this scan is
calculating a checksum for each of the images and making sure that
checksum matches what ``nova-compute`` expects it to be. However, images
can be very large, and these checksums can take a long time to generate.
At one point, before it was reported as a bug and fixed,
``nova-compute`` would block on this task and stop responding to RPC
requests. This was visible to users as failure of operations such as
spawning or deleting instances.
The takeaway from this is that if you observe an OpenStack process that
appears to "stop" for a while and then continues processing normally, you
should check whether periodic tasks are the problem. One way to do this
is to disable the periodic tasks by setting their interval to zero.
Additionally, you can configure how often these periodic tasks run—in
some cases, it might make sense to run them at a different frequency
from the default.
The frequency is defined separately for each periodic task. Therefore,
to disable every periodic task in OpenStack Compute (nova), you would
need to set a number of configuration options to zero. The current list
of configuration options you would need to set to zero are:
- ``bandwidth_poll_interval``
- ``sync_power_state_interval``
- ``heal_instance_info_cache_interval``
- ``host_state_interval``
- ``image_cache_manager_interval``
- ``reclaim_instance_interval``
- ``volume_usage_poll_interval``
- ``shelved_poll_interval``
- ``shelved_offload_time``
- ``instance_delete_interval``
To set a configuration option to zero, include a line such as
``image_cache_manager_interval=0`` in your ``nova.conf`` file.
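For example, a minimal ``nova.conf`` sketch that disables two of the tasks listed above (option names assumed to match your release):
.. code-block:: ini
[DEFAULT]
image_cache_manager_interval = 0
bandwidth_poll_interval = 0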
This list will change between releases, so please refer to your
configuration guide for up-to-date information.
Specific Configuration Topics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section covers specific examples of configuration options you might
consider tuning. It is by no means an exhaustive list.
Security Configuration for Compute, Networking, and Storage
-----------------------------------------------------------
The `OpenStack Security Guide <http://docs.openstack.org/sec/>`_
provides a deep dive into securing an OpenStack cloud, including
SSL/TLS, key management, PKI and certificate management, data transport
and privacy concerns, and compliance.
High Availability
-----------------
The `OpenStack High Availability
Guide <http://docs.openstack.org/ha-guide/index.html>`_ offers
suggestions for elimination of a single point of failure that could
cause system downtime. While it is not a completely prescriptive
document, it offers methods and techniques for avoiding downtime and
data loss.
Enabling IPv6 Support
---------------------
You can follow the progress being made on IPv6 support by watching the
`neutron IPv6 Subteam at
work <https://wiki.openstack.org/wiki/Meetings/Neutron-IPv6-Subteam>`_.
By modifying your configuration setup, you can set up IPv6 when using
``nova-network`` for networking, and a tested setup is documented for
FlatDHCP and a multi-host configuration. The key is to make
``nova-network`` think a ``radvd`` command ran successfully. The entire
configuration is detailed in a Cybera blog post, `“An IPv6 enabled
cloud” <http://www.cybera.ca/news-and-events/tech-radar/an-ipv6-enabled-cloud/>`_.
Geographical Considerations for Object Storage
----------------------------------------------
Support for global clustering of object storage servers is available for
all supported releases. You would implement these global clusters to
ensure replication across geographic areas in case of a natural disaster
and also to ensure that users can write or access their objects more
quickly based on the closest data center. You configure a default region
with one zone for each cluster. As you add more zones and build a ring
that handles them, make sure your wide area network (WAN) can handle the
additional request and response load between zones. Refer to
`Geographically Distributed
Clusters <http://docs.openstack.org/developer/swift/admin_guide.html#geographically-distributed-clusters>`_
in the documentation for additional information.
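To give a feel for where a region comes in, here is a hedged example of adding a device to an object ring with an explicit region; the IP address, device name, and weight are placeholders:
.. code-block:: console
$ swift-ring-builder object.builder add r2z1-10.1.0.10:6000/sdb 100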
View File
@@ -0,0 +1,203 @@
===================
Backup and Recovery
===================
Standard backup best practices apply when creating your OpenStack backup
policy. For example, how often to back up your data is closely related
to how quickly you need to recover from data loss.
.. note::
If you cannot have any data loss at all, you should also focus on a
highly available deployment. The `OpenStack High Availability
Guide <http://docs.openstack.org/ha-guide/index.html>`_ offers
suggestions for elimination of a single point of failure that could
cause system downtime. While it is not a completely prescriptive
document, it offers methods and techniques for avoiding downtime and
data loss.
Other backup considerations include:
- How many backups to keep?
- Should backups be kept off-site?
- How often should backups be tested?
Just as important as a backup policy is a recovery policy (or at least
recovery testing).
What to Back Up
~~~~~~~~~~~~~~~
While OpenStack is composed of many components and moving parts, backing
up the critical data is quite simple.
This chapter describes only how to back up configuration files and
databases that the various OpenStack components need to run. This
chapter does not describe how to back up objects inside Object Storage
or data contained inside Block Storage. Generally these areas are left
for users to back up on their own.
Database Backups
~~~~~~~~~~~~~~~~
The example OpenStack architecture designates the cloud controller as
the MySQL server. This MySQL server hosts the databases for nova,
glance, cinder, and keystone. With all of these databases in one place,
it's very easy to create a database backup:
.. code-block:: console
# mysqldump --opt --all-databases > openstack.sql
If you want to back up only a single database, you can instead run:
.. code-block:: console
# mysqldump --opt nova > nova.sql
where ``nova`` is the database you want to back up.
You can easily automate this process by creating a cron job that runs
the following script once per day:
.. code-block:: bash
#!/bin/bash
backup_dir="/var/lib/backups/mysql"
filename="${backup_dir}/mysql-`hostname`-`eval date +%Y%m%d`.sql.gz"
# Dump the entire MySQL database
/usr/bin/mysqldump --opt --all-databases | gzip > $filename
# Delete backups older than 7 days
find $backup_dir -ctime +7 -type f -delete
This script dumps the entire MySQL database and deletes any backups
older than seven days.
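Assuming you saved the script above as ``/usr/local/bin/mysql-backup.sh`` (the path is just an example), the matching cron entry could run it nightly at 2:00 a.m.:
.. code-block:: console
# crontab -e
0 2 * * * /usr/local/bin/mysql-backup.sh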
File System Backups
~~~~~~~~~~~~~~~~~~~
This section discusses which files and directories should be backed up
regularly, organized by service.
Compute
-------
The ``/etc/nova`` directory on both the cloud controller and compute
nodes should be regularly backed up.
``/var/log/nova`` does not need to be backed up if you have all logs
going to a central area. It is highly recommended to use a central
logging server or back up the log directory.
``/var/lib/nova`` is another important directory to back up. The
exception to this is the ``/var/lib/nova/instances`` subdirectory on
compute nodes. This subdirectory contains the KVM images of running
instances. You would want to back up this directory only if you need to
maintain backup copies of all instances. Under most circumstances, you
do not need to do this, but this can vary from cloud to cloud and your
service levels. Also be aware that making a backup of a live KVM
instance can cause that instance to not boot properly if it is ever
restored from a backup.
Image Catalog and Delivery
--------------------------
``/etc/glance`` and ``/var/log/glance`` follow the same rules as their
nova counterparts.
``/var/lib/glance`` should also be backed up. Take special notice of
``/var/lib/glance/images``. If you are using a file-based back end for
glance, ``/var/lib/glance/images`` is where the images are stored, so
take extra care with it.
There are two ways to ensure stability for this directory. The first is
to make sure the directory sits on a RAID array, so that the data remains
available if a disk fails. The second is to use a tool such as rsync to
replicate the images to another server:
.. code-block:: console
# rsync -az --progress /var/lib/glance/images \
backup-server:/var/lib/glance/images/
Identity
--------
``/etc/keystone`` and ``/var/log/keystone`` follow the same rules as
other components.
``/var/lib/keystone``, although it should not contain any data being
used, can also be backed up just in case.
Block Storage
-------------
``/etc/cinder`` and ``/var/log/cinder`` follow the same rules as other
components.
``/var/lib/cinder`` should also be backed up.
Object Storage
--------------
``/etc/swift`` is very important to have backed up. This directory
contains the swift configuration files as well as the ring files and
ring :term:`builder files <builder file>`, which if lost, render the data
on your cluster inaccessible. A best practice is to copy the builder files
to all storage nodes along with the ring files, so that multiple backup
copies are spread throughout your storage cluster.
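A hedged example of spreading those copies around, with ``storage01`` standing in for one of your storage nodes:
.. code-block:: console
# scp /etc/swift/*.builder /etc/swift/*.ring.gz storage01:/etc/swift/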
Recovering Backups
~~~~~~~~~~~~~~~~~~
Recovering backups is a fairly simple process. To begin, first ensure
that the service you are recovering is not running. For example, to do a
full recovery of ``nova`` on the cloud controller, first stop all
``nova`` services:
.. code-block:: console
# stop nova-api
# stop nova-cert
# stop nova-consoleauth
# stop nova-novncproxy
# stop nova-objectstore
# stop nova-scheduler
Now you can import a previously backed-up database:
.. code-block:: console
# mysql nova < nova.sql
You can also restore backed-up nova directories:
.. code-block:: console
# mv /etc/nova{,.orig}
# cp -a /path/to/backup/nova /etc/
Once the files are restored, start everything back up:
.. code-block:: console
# start mysql
# for i in nova-api nova-cert nova-consoleauth nova-novncproxy
nova-objectstore nova-scheduler
> do
> start $i
> done
Other services follow the same process, with their respective
directories and databases.
Summary
~~~~~~~
Backup and subsequent recovery is one of the first tasks system
administrators learn. However, each system has different items that need
attention. By taking care of your database, image service, and
appropriate file system locations, you can be assured that you can
handle any event requiring recovery.
View File
@@ -0,0 +1,850 @@
=============
Customization
=============
OpenStack might not do everything you need it to do out of the box. To
add a new feature, you can follow different paths.
To take the first path, you can modify the OpenStack code directly.
Learn `how to
contribute <https://wiki.openstack.org/wiki/How_To_Contribute>`_,
follow the `code review
workflow <https://wiki.openstack.org/wiki/GerritWorkflow>`_, make your
changes, and contribute them back to the upstream OpenStack project.
This path is recommended if the feature you need requires deep
integration with an existing project. The community is always open to
contributions and welcomes new functionality that follows the
feature-development guidelines. This path still requires you to use
DevStack for testing your feature additions, so this chapter walks you
through the DevStack environment.
For the second path, you can write new features and plug them in using
changes to a configuration file. If the project where your feature would
need to reside uses the Python Paste framework, you can create
middleware for it and plug it in through configuration. There may also
be specific ways of customizing a project, such as creating a new
scheduler driver for Compute or a custom tab for the dashboard.
This chapter focuses on the second path for customizing OpenStack by
providing two examples for writing new features. The first example shows
how to modify Object Storage (swift) middleware to add a new feature,
and the second example provides a new scheduler feature for OpenStack
Compute (nova). To customize OpenStack this way you need a development
environment. The best way to get an environment up and running quickly
is to run DevStack within your cloud.
Create an OpenStack Development Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To create a development environment, you can use DevStack. DevStack is
essentially a collection of shell scripts and configuration files that
builds an OpenStack development environment for you. You use it to
create such an environment for developing a new feature.
You can find all of the documentation at the
`DevStack <http://docs.openstack.org/developer/devstack/>`_ website.
**To run DevStack on an instance in your OpenStack cloud:**
#. Boot an instance from the dashboard or the nova command-line interface
(CLI) with the following parameters:
- Name: devstack
- Image: Ubuntu 14.04 LTS
- Memory Size: 4 GB RAM
- Disk Size: minimum 5 GB
If you are using the ``nova`` client, specify :option:`--flavor 3` for the
:command:`nova boot` command to get adequate memory and disk sizes.
#. Log in and set up DevStack. Here's an example of the commands you can
use to set up DevStack on a virtual machine:
#. Log in to the instance:
.. code-block:: console
$ ssh username@my.instance.ip.address
#. Update the virtual machine's operating system:
.. code-block:: console
# apt-get -y update
#. Install git:
.. code-block:: console
# apt-get -y install git
#. Clone the ``devstack`` repository:
.. code-block:: console
$ git clone https://git.openstack.org/openstack-dev/devstack
#. Change to the ``devstack`` repository:
.. code-block:: console
$ cd devstack
#. (Optional) If you've logged in to your instance as the root user, you
must create a "stack" user; otherwise you'll run into permission issues.
If you've logged in as a user other than root, you can skip these steps:
#. Run the DevStack script to create the stack user:
.. code-block:: console
# tools/create-stack-user.sh
#. Give ownership of the ``devstack`` directory to the stack user:
.. code-block:: console
# chown -R stack:stack /root/devstack
#. Set some permissions you can use to view the DevStack screen later:
.. code-block:: console
# chmod o+rwx /dev/pts/0
#. Switch to the stack user:
.. code-block:: console
$ su stack
#. Edit the ``local.conf`` configuration file that controls what DevStack
will deploy. Copy the example ``local.conf`` file at the end of this
section (:ref:`local.conf`):
.. code-block:: console
$ vim local.conf
#. Run the stack script that will install OpenStack:
.. code-block:: console
$ ./stack.sh
#. When the stack script is done, you can open the screen session it
started to view all of the running OpenStack services:
.. code-block:: console
$ screen -r stack
#. Press ``Ctrl+A`` followed by 0 to go to the first ``screen`` window.
.. note::
- The ``stack.sh`` script takes a while to run. Perhaps you can
take this opportunity to `join the OpenStack
Foundation <https://www.openstack.org/join/>`__.
- ``Screen`` is a useful program for viewing many related services
at once. For more information, see the `GNU screen quick
reference <http://aperiodic.net/screen/quick_reference>`__.
Now that you have an OpenStack development environment, you're free to
hack around without worrying about damaging your production deployment.
:ref:`local.conf` provides a working environment for
running OpenStack Identity, Compute, Block Storage, Image service, the
OpenStack dashboard, and Object Storage as the starting point.
.. _local.conf:
local.conf
----------
.. code-block:: bash
[[local|localrc]]
FLOATING_RANGE=192.168.1.224/27
FIXED_RANGE=10.11.12.0/24
FIXED_NETWORK_SIZE=256
FLAT_INTERFACE=eth0
ADMIN_PASSWORD=supersecret
DATABASE_PASSWORD=iheartdatabases
RABBIT_PASSWORD=flopsymopsy
SERVICE_PASSWORD=iheartksl
SERVICE_TOKEN=xyzpdqlazydog
Customizing Object Storage (Swift) Middleware
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OpenStack Object Storage, known as swift when reading the code, is based
on the Python `Paste <http://pythonpaste.org/>`_ framework. The best
introduction to its architecture is `A Do-It-Yourself
Framework <http://pythonpaste.org/do-it-yourself-framework.html>`_.
Because of the swift project's use of this framework, you are able to
add features to a project by placing some custom code in a project's
pipeline without having to change any of the core code.
Imagine a scenario where you have public access to one of your
containers, but what you really want is to restrict access to a
whitelisted set of IP addresses. In this example, we'll create a piece
of middleware for swift that allows access to a container from only a
set of IP addresses, as determined by the container's metadata items.
Only those IP addresses that you explicitly whitelist using the
container's metadata will be able to access the container.
.. warning::
This example is for illustrative purposes only. It should not be
used as a container IP whitelist solution without further
development and extensive security testing.
When you join the screen session that ``stack.sh`` starts with
``screen -r stack``, you see a screen for each service running, which
can be a few or several, depending on how many services you configured
DevStack to run.
The asterisk * indicates which screen window you are viewing. This
example shows we are viewing the key (for keystone) screen window:
.. code-block:: console
0$ shell 1$ key* 2$ horizon 3$ s-proxy 4$ s-object 5$ s-container 6$ s-account
The purpose of each screen window is as follows:
``shell``
A shell where you can get some work done
``key*``
The keystone service
``horizon``
The horizon dashboard web application
``s-{name}``
The swift services
**To create the middleware and plug it in through Paste configuration:**
All of the code for OpenStack lives in ``/opt/stack``. Go to the swift
directory in the ``shell`` screen and edit your middleware module.
#. Change to the directory where Object Storage is installed:
.. code-block:: console
$ cd /opt/stack/swift
#. Create the ``ip_whitelist.py`` Python source code file:
.. code-block:: console
$ vim swift/common/middleware/ip_whitelist.py
#. Copy the code as shown below into ``ip_whitelist.py``.
The following code is a middleware example that
restricts access to a container based on IP address as explained at the
beginning of the section. Middleware passes the request on to another
application. This example uses the swift "swob" library to wrap Web
Server Gateway Interface (WSGI) requests and responses into objects for
swift to interact with. When you're done, save and close the file.
.. code-block:: python
# vim: tabstop=4 shiftwidth=4 softtabstop=4
# Copyright (c) 2014 OpenStack Foundation
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import socket
from swift.common.utils import get_logger
from swift.proxy.controllers.base import get_container_info
from swift.common.swob import Request, Response
class IPWhitelistMiddleware(object):
"""
IP Whitelist Middleware
Middleware that allows access to a container from only a set of IP
addresses as determined by the container's metadata items that start
with the prefix 'allow'. E.G. allow-dev=192.168.0.20
"""
def __init__(self, app, conf, logger=None):
self.app = app
if logger:
self.logger = logger
else:
self.logger = get_logger(conf, log_route='ip_whitelist')
self.deny_message = conf.get('deny_message', "IP Denied")
self.local_ip = socket.gethostbyname(socket.gethostname())
def __call__(self, env, start_response):
"""
WSGI entry point.
Wraps env in swob.Request object and passes it down.
:param env: WSGI environment dictionary
:param start_response: WSGI callable
"""
req = Request(env)
try:
version, account, container, obj = req.split_path(1, 4, True)
except ValueError:
return self.app(env, start_response)
container_info = get_container_info(
req.environ, self.app, swift_source='IPWhitelistMiddleware')
remote_ip = env['REMOTE_ADDR']
self.logger.debug("Remote IP: %(remote_ip)s",
{'remote_ip': remote_ip})
meta = container_info['meta']
allow = {k:v for k,v in meta.iteritems() if k.startswith('allow')}
allow_ips = set(allow.values())
allow_ips.add(self.local_ip)
self.logger.debug("Allow IPs: %(allow_ips)s",
{'allow_ips': allow_ips})
if remote_ip in allow_ips:
return self.app(env, start_response)
else:
self.logger.debug(
"IP %(remote_ip)s denied access to Account=%(account)s "
"Container=%(container)s. Not in %(allow_ips)s", locals())
return Response(
status=403,
body=self.deny_message,
request=req)(env, start_response)
def filter_factory(global_conf, **local_conf):
"""
paste.deploy app factory for creating WSGI proxy apps.
"""
conf = global_conf.copy()
conf.update(local_conf)
def ip_whitelist(app):
return IPWhitelistMiddleware(app, conf)
return ip_whitelist
There is a lot of useful information in ``env`` and ``conf`` that you
can use to decide what to do with the request. To find out more about
what properties are available, you can insert the following log
statement into the ``__init__`` method:
.. code-block:: python
self.logger.debug("conf = %(conf)s", locals())
and the following log statement into the ``__call__`` method:
.. code-block:: python
self.logger.debug("env = %(env)s", locals())
#. To plug this middleware into the swift Paste pipeline, you edit one
configuration file, ``/etc/swift/proxy-server.conf``:
.. code-block:: console
$ vim /etc/swift/proxy-server.conf
#. Find the ``[filter:ratelimit]`` section in
``/etc/swift/proxy-server.conf``, and copy in the following
configuration section after it:
.. code-block:: ini
[filter:ip_whitelist]
paste.filter_factory = swift.common.middleware.ip_whitelist:filter_factory
# You can override the default log routing for this filter here:
# set log_name = ratelimit
# set log_facility = LOG_LOCAL0
# set log_level = INFO
# set log_headers = False
# set log_address = /dev/log
deny_message = You shall not pass!
#. Find the ``[pipeline:main]`` section in
``/etc/swift/proxy-server.conf``, and add ``ip_whitelist`` after
ratelimit to the list like so. When you're done, save and close the
file:
.. code-block:: ini
[pipeline:main]
pipeline = catch_errors gatekeeper healthcheck proxy-logging cache bulk tempurl ratelimit ip_whitelist ...
#. Restart the ``swift proxy`` service to make swift use your middleware.
Start by switching to the ``swift-proxy`` screen:
#. Press **Ctrl+A** followed by 3.
#. Press **Ctrl+C** to kill the service.
#. Press Up Arrow to bring up the last command.
#. Press Enter to run it.
#. Test your middleware with the ``swift`` CLI. Start by switching to the
shell screen and finish by switching back to the ``swift-proxy`` screen
to check the log output:
#. Press **Ctrl+A** followed by 0.
#. Make sure you're in the ``devstack`` directory:
.. code-block:: console
$ cd /root/devstack
#. Source openrc to set up your environment variables for the CLI:
.. code-block:: console
$ source openrc
#. Create a container called ``middleware-test``:
.. code-block:: console
$ swift post middleware-test
#. Press **Ctrl+A** followed by 3 to check the log output.
#. Among the log statements you'll see the lines:
.. code-block:: ini
proxy-server Remote IP: my.instance.ip.address (txn: ...)
proxy-server Allow IPs: set(['my.instance.ip.address']) (txn: ...)
These two statements are produced by our middleware and show that the
request was sent from our DevStack instance and was allowed.
#. Test the middleware from outside DevStack on a remote machine that has
access to your DevStack instance:
#. Install the ``keystone`` and ``swift`` clients on your local machine:
.. code-block:: console
# pip install python-keystoneclient python-swiftclient
#. Attempt to list the objects in the ``middleware-test`` container:
.. code-block:: console
$ swift --os-auth-url=http://my.instance.ip.address:5000/v2.0/ \
--os-region-name=RegionOne --os-username=demo:demo \
--os-password=devstack list middleware-test
Container GET failed: http://my.instance.ip.address:8080/v1/AUTH_.../
middleware-test?format=json 403 Forbidden   You shall not pass!
#. Press **Ctrl+A** followed by 3 to check the log output. Look at the
swift log statements again, and among the log statements, you'll see the
lines:
.. code-block:: console
proxy-server Authorizing from an overriding middleware (i.e: tempurl) (txn: ...)
proxy-server ... IPWhitelistMiddleware
proxy-server Remote IP: my.local.ip.address (txn: ...)
proxy-server Allow IPs: set(['my.instance.ip.address']) (txn: ...)
proxy-server IP my.local.ip.address denied access to Account=AUTH_... \
Container=None. Not in set(['my.instance.ip.address']) (txn: ...)
Here we can see that the request was denied because the remote IP
address wasn't in the set of allowed IPs.
#. Back in your DevStack instance on the shell screen, add some metadata to
your container to allow the request from the remote machine:
#. Press **Ctrl+A** followed by 0.
#. Add metadata to the container to allow the IP:
.. code-block:: console
$ swift post --meta allow-dev:my.local.ip.address middleware-test
#. Now try the command from Step 10 again and it succeeds. There are no
objects in the container, so there is nothing to list; however, there is
also no error to report.
.. warning::
Functional testing like this is not a replacement for proper unit
and integration testing, but it serves to get you started.
You can follow a similar pattern in other projects that use the Python
Paste framework. Simply create a middleware module and plug it in
through configuration. The middleware runs in sequence as part of that
project's pipeline and can call out to other services as necessary. No
project core code is touched. Look for a ``pipeline`` value in the
project's ``conf`` or ``ini`` configuration files in ``/etc/<project>``
to identify projects that use Paste.
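For example, a quick (and deliberately rough, since file locations differ between projects and releases) way to spot Paste pipelines on a node:
.. code-block:: console
$ grep -rl "pipeline =" /etc/nova /etc/glance /etc/keystone /etc/swift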
When your middleware is done, we encourage you to open source it and let
the community know on the OpenStack mailing list. Perhaps others need
the same functionality. They can use your code, provide feedback, and
possibly contribute. If enough support exists for it, perhaps you can
propose that it be added to the official swift
`middleware <https://git.openstack.org/cgit/openstack/swift/tree/swift/common/middleware>`_.
Customizing the OpenStack Compute (nova) Scheduler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Many OpenStack projects allow for customization of specific features
using a driver architecture. You can write a driver that conforms to a
particular interface and plug it in through configuration. For example,
you can easily plug in a new scheduler for Compute. The existing
schedulers for Compute are feature full and well documented at
`Scheduling <http://docs.openstack.org/liberty/config-reference/content/section_compute-scheduler.html>`_.
However, depending on your users' use cases, the existing schedulers
might not meet your requirements. You might need to create a new
scheduler.
To create a scheduler, you must inherit from the class
``nova.scheduler.driver.Scheduler``. Of the five methods that you can
override, you *must* override the two methods marked with an asterisk
(\*) below:
- ``update_service_capabilities``
- ``hosts_up``
- ``group_hosts``
- \* ``schedule_run_instance``
- \* ``select_destinations``
To demonstrate customizing OpenStack, we'll create an example of a
Compute scheduler that randomly places an instance on a subset of hosts,
depending on the originating IP address of the request and the prefix of
the hostname. Such an example could be useful when you have a group of
users on a subnet and you want all of their instances to start within
some subset of your hosts.
.. warning::
This example is for illustrative purposes only. It should not be
used as a scheduler for Compute without further development and
testing.
When you join the screen session that ``stack.sh`` starts with
``screen -r stack``, you are greeted with many screen windows:
.. code-block:: console
0$ shell*  1$ key  2$ horizon  ...  9$ n-api  ...  14$ n-sch ...
``shell``
A shell where you can get some work done
``key``
The keystone service
``horizon``
The horizon dashboard web application
``n-{name}``
The nova services
``n-sch``
The nova scheduler service
**To create the scheduler and plug it in through configuration**
#. The code for OpenStack lives in ``/opt/stack``, so go to the ``nova``
directory and edit your scheduler module. Change to the directory where
``nova`` is installed:
.. code-block:: console
$ cd /opt/stack/nova
#. Create the ``ip_scheduler.py`` Python source code file:
.. code-block:: console
$ vim nova/scheduler/ip_scheduler.py
#. The code shown below is a driver that will
schedule servers to hosts based on IP address as explained at the
beginning of the section. Copy the code into ``ip_scheduler.py``. When
you're done, save and close the file.
.. code-block:: python
# vim: tabstop=4 shiftwidth=4 softtabstop=4
# Copyright (c) 2014 OpenStack Foundation
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
"""
IP Scheduler implementation
"""
import random
from oslo.config import cfg
from nova.compute import rpcapi as compute_rpcapi
from nova import exception
from nova.openstack.common import log as logging
from nova.openstack.common.gettextutils import _
from nova.scheduler import driver
CONF = cfg.CONF
CONF.import_opt('compute_topic', 'nova.compute.rpcapi')
LOG = logging.getLogger(__name__)
class IPScheduler(driver.Scheduler):
"""
Implements Scheduler as a random node selector based on
IP address and hostname prefix.
"""
def __init__(self, *args, **kwargs):
super(IPScheduler, self).__init__(*args, **kwargs)
self.compute_rpcapi = compute_rpcapi.ComputeAPI()
def _filter_hosts(self, request_spec, hosts, filter_properties,
hostname_prefix):
"""Filter a list of hosts based on hostname prefix."""
hosts = [host for host in hosts if host.startswith(hostname_prefix)]
return hosts
def _schedule(self, context, topic, request_spec, filter_properties):
"""Picks a host that is up at random."""
elevated = context.elevated()
hosts = self.hosts_up(elevated, topic)
if not hosts:
msg = _("Is the appropriate service running?")
raise exception.NoValidHost(reason=msg)
remote_ip = context.remote_address
if remote_ip.startswith('10.1'):
hostname_prefix = 'doc'
elif remote_ip.startswith('10.2'):
hostname_prefix = 'ops'
else:
hostname_prefix = 'dev'
hosts = self._filter_hosts(request_spec, hosts, filter_properties,
hostname_prefix)
if not hosts:
msg = _("Could not find another compute")
raise exception.NoValidHost(reason=msg)
host = random.choice(hosts)
LOG.debug("Request from %(remote_ip)s scheduled to %(host)s" % locals())
return host
def select_destinations(self, context, request_spec, filter_properties):
"""Selects random destinations."""
num_instances = request_spec['num_instances']
# NOTE(timello): Returns a list of dicts with 'host', 'nodename' and
# 'limits' as keys for compatibility with filter_scheduler.
dests = []
for i in range(num_instances):
host = self._schedule(context, CONF.compute_topic,
request_spec, filter_properties)
host_state = dict(host=host, nodename=None, limits=None)
dests.append(host_state)
if len(dests) < num_instances:
raise exception.NoValidHost(reason='')
return dests
def schedule_run_instance(self, context, request_spec,
admin_password, injected_files,
requested_networks, is_first_time,
filter_properties, legacy_bdm_in_spec):
"""Create and run an instance or instances."""
instance_uuids = request_spec.get('instance_uuids')
for num, instance_uuid in enumerate(instance_uuids):
request_spec['instance_properties']['launch_index'] = num
try:
host = self._schedule(context, CONF.compute_topic,
request_spec, filter_properties)
updated_instance = driver.instance_update_db(context,
instance_uuid)
self.compute_rpcapi.run_instance(context,
instance=updated_instance, host=host,
requested_networks=requested_networks,
injected_files=injected_files,
admin_password=admin_password,
is_first_time=is_first_time,
request_spec=request_spec,
filter_properties=filter_properties,
legacy_bdm_in_spec=legacy_bdm_in_spec)
except Exception as ex:
# NOTE(vish): we don't reraise the exception here to make sure
# that all instances in the request get set to
# error properly
driver.handle_schedule_error(context, ex, instance_uuid,
request_spec)
There is a lot of useful information in ``context``, ``request_spec``,
and ``filter_properties`` that you can use to decide where to schedule
the instance. To find out more about what properties are available, you
can insert the following log statements into the
``schedule_run_instance`` method of the scheduler above:
.. code-block:: python
LOG.debug("context = %(context)s" % {'context': context.__dict__})
LOG.debug("request_spec = %(request_spec)s" % locals())
LOG.debug("filter_properties = %(filter_properties)s" % locals())
#. To plug this scheduler into nova, edit one configuration file,
``/etc/nova/nova.conf``:
.. code-block:: console
$ vim /etc/nova/nova.conf
#. Find the ``scheduler_driver`` config and change it like so:
.. code-block:: ini
scheduler_driver=nova.scheduler.ip_scheduler.IPScheduler
#. Restart the nova scheduler service to make nova use your scheduler.
Start by switching to the ``n-sch`` screen:
#. Press **Ctrl+A** followed by 9.
#. Press **Ctrl+A** followed by N until you reach the ``n-sch`` screen.
#. Press **Ctrl+C** to kill the service.
#. Press Up Arrow to bring up the last command.
#. Press Enter to run it.
#. Test your scheduler with the nova CLI. Start by switching to the
``shell`` screen and finish by switching back to the ``n-sch`` screen to
check the log output:
#. Press **Ctrl+A** followed by 0.
#. Make sure you're in the ``devstack`` directory:
.. code-block:: console
$ cd /root/devstack
#. Source ``openrc`` to set up your environment variables for the CLI:
.. code-block:: console
$ source openrc
#. Put the image ID for the only installed image into an environment
variable:
.. code-block:: console
$ IMAGE_ID=`nova image-list | egrep cirros | egrep -v "kernel|ramdisk" | awk '{print $2}'`
#. Boot a test server:
.. code-block:: console
$ nova boot --flavor 1 --image $IMAGE_ID scheduler-test
#. Switch back to the ``n-sch`` screen. Among the log statements, you'll
see the line:
.. code-block:: console
2014-01-23 19:57:47.262 DEBUG nova.scheduler.ip_scheduler \
[req-... demo demo] Request from 162.242.221.84 \
scheduled to devstack-havana \
_schedule /opt/stack/nova/nova/scheduler/ip_scheduler.py:76
.. warning::
Functional testing like this is not a replacement for proper unit
and integration testing, but it serves to get you started.
A similar pattern can be followed in other projects that use the driver
architecture. Simply create a module and class that conform to the
driver interface and plug it in through configuration. Your code runs
when that feature is used and can call out to other services as
necessary. No project core code is touched. Look for a "driver" value in
the project's ``.conf`` configuration files in ``/etc/<project>`` to
identify projects that use a driver architecture.
When your scheduler is done, we encourage you to open source it and let
the community know on the OpenStack mailing list. Perhaps others need
the same functionality. They can use your code, provide feedback, and
possibly contribute. If enough support exists for it, perhaps you can
propose that it be added to the official Compute
`schedulers <https://git.openstack.org/cgit/openstack/nova/tree/nova/scheduler>`_.
Customizing the Dashboard (Horizon)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The dashboard is based on the Python
`Django <https://www.djangoproject.com/>`_ web application framework.
The best guide to customizing it has already been written and can be
found at `Building on
Horizon <http://docs.openstack.org/developer/horizon/topics/tutorial.html>`_.
Conclusion
~~~~~~~~~~
When operating an OpenStack cloud, you may discover that your users can
be quite demanding. If OpenStack doesn't do what your users need, it may
be up to you to fulfill those requirements. This chapter provided you
with some options for customization and gave you the tools you need to
get started.
===============
Lay of the Land
===============
This chapter helps you set up your working environment and use it to
take a look around your cloud.
Using the OpenStack Dashboard for Administration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As a cloud administrative user, you can use the OpenStack dashboard to
create and manage projects, users, images, and flavors. Users are
allowed to create and manage images within specified projects and to
share images, depending on the Image service configuration. Typically,
the policy configuration allows admin users only to set quotas and
create and manage services. The dashboard provides an :guilabel:`Admin`
tab with a :guilabel:`System Panel` and an :guilabel:`Identity` tab.
These interfaces give you access to system information and usage as
well as to settings for configuring what
end users can do. Refer to the `OpenStack Administrator
Guide <http://docs.openstack.org/admin-guide/dashboard.html>`_ for
detailed how-to information about using the dashboard as an admin user.
Command-Line Tools
~~~~~~~~~~~~~~~~~~
We recommend using a combination of the OpenStack command-line interface
(CLI) tools and the OpenStack dashboard for administration. Some users
with a background in other cloud technologies may be using the EC2
Compatibility API, which uses naming conventions somewhat different from
the native API. We highlight those differences.
We strongly suggest that you install the command-line clients from the
`Python Package Index <https://pypi.python.org/pypi>`_ (PyPI) instead
of from the distribution packages. The clients are under heavy
development, and it is very likely that, at any given time, the versions
of the packages distributed by your operating-system vendor are out of
date.
The pip utility is used to manage package installation from the PyPI
archive and is available in the python-pip package in most Linux
distributions. Each OpenStack project has its own client, so depending
on which services your site runs, install some or all of the
following packages:
* python-novaclient (:term:`nova` CLI)
* python-glanceclient (:term:`glance` CLI)
* python-keystoneclient (:term:`keystone` CLI)
* python-cinderclient (:term:`cinder` CLI)
* python-swiftclient (:term:`swift` CLI)
* python-neutronclient (:term:`neutron` CLI)
Installing the Tools
--------------------
To install (or upgrade) a package from the PyPI archive with pip, run
the following command as root:
.. code-block:: console
# pip install [--upgrade] <package-name>
To remove the package:
.. code-block:: console
# pip uninstall <package-name>
If you need even newer versions of the clients, pip can install directly
from the upstream git repository using the :option:`-e` flag. You must specify
a name for the Python egg that is installed. For example:
.. code-block:: console
# pip install -e git+https://git.openstack.org/openstack/python-novaclient#egg=python-novaclient
If you support the EC2 API on your cloud, you should also install the
euca2ools package or some other EC2 API tool so that you can get the
same view your users have. Using EC2 API-based tools is mostly out of
the scope of this guide, though we discuss getting credentials for use
with it.
Administrative Command-Line Tools
---------------------------------
There are also several :command:`*-manage` command-line tools. These are
installed with the project's services on the cloud controller and do not
need to be installed separately:
* :command:`glance-manage`
* :command:`keystone-manage`
* :command:`cinder-manage`
Unlike the CLI tools mentioned above, the :command:`*-manage` tools must
be run from the cloud controller, as root, because they need read access
to the config files such as ``/etc/nova/nova.conf`` and to make queries
directly against the database rather than against the OpenStack
:term:`API endpoints <API endpoint>`.
.. warning::
The existence of the ``*-manage`` tools is a legacy issue. It is a
goal of the OpenStack project to eventually migrate all of the
remaining functionality in the ``*-manage`` tools into the API-based
tools. Until that day, you need to SSH into the
:term:`cloud controller node` to perform some maintenance operations
that require one of the ``*-manage`` tools.
Getting Credentials
-------------------
You must have the appropriate credentials if you want to use the
command-line tools to make queries against your OpenStack cloud. By far,
the easiest way to obtain :term:`authentication` credentials to use with
command-line clients is to use the OpenStack dashboard. Select
:guilabel:`Project`, click the :guilabel:`Project` tab, and click
:guilabel:`Access & Security` on the :guilabel:`Compute` category.
On the :guilabel:`Access & Security` page, click the :guilabel:`API Access`
tab to display two buttons, :guilabel:`Download OpenStack RC File` and
:guilabel:`Download EC2 Credentials`, which let you generate files that
you can source in your shell to populate the environment variables the
command-line tools require to know where your service endpoints and your
authentication information are. The user you are logged in to the dashboard
as dictates the filename of the openrc file, such as ``demo-openrc.sh``.
When logged in as admin, the file is named ``admin-openrc.sh``.
The generated file looks something like this:
.. code-block:: bash
#!/bin/bash
# With the addition of Keystone, to use an openstack cloud you should
# authenticate against keystone, which returns a **Token** and **Service
# Catalog**. The catalog contains the endpoint for all services the
# user/tenant has access to--including nova, glance, keystone, swift.
#
# *NOTE*: Using the 2.0 *auth api* does not mean that compute api is 2.0.
# We use the 1.1 *compute api*
export OS_AUTH_URL=http://203.0.113.10:5000/v2.0
# With the addition of Keystone we have standardized on the term **tenant**
# as the entity that owns the resources.
export OS_TENANT_ID=98333aba48e756fa8f629c83a818ad57
export OS_TENANT_NAME="test-project"
# In addition to the owning entity (tenant), openstack stores the entity
# performing the action as the **user**.
export OS_USERNAME=demo
# With Keystone you pass the keystone password.
echo "Please enter your OpenStack Password: "
read -s OS_PASSWORD_INPUT
export OS_PASSWORD=$OS_PASSWORD_INPUT
.. warning::
This does not save your password in plain text, which is a good
thing. But when you source or run the script, it prompts you for
your password and then stores your response in the environment
variable ``OS_PASSWORD``. It is important to note that this does
require interactivity. It is possible to store a value directly in
the script if you require a noninteractive operation, but you then
need to be extremely cautious with the security and permissions of
this file.
EC2 compatibility credentials can be downloaded by selecting
:guilabel:`Project`, then :guilabel:`Compute`, then
:guilabel:`Access & Security`, then :guilabel:`API Access` to display the
:guilabel:`Download EC2 Credentials` button. Click the button to generate
a ZIP file with server x509 certificates and a shell script fragment.
Create a new directory in a secure location because these are live credentials
containing all the authentication information required to access your
cloud identity, unlike the default ``user-openrc``. Extract the ZIP file
here. You should have ``cacert.pem``, ``cert.pem``, ``ec2rc.sh``, and
``pk.pem``. The ``ec2rc.sh`` is similar to this:
.. code-block:: bash
#!/bin/bash
NOVARC=$(readlink -f "${BASH_SOURCE:-${0}}" 2>/dev/null) ||\
NOVARC=$(python -c 'import os,sys; \
print os.path.abspath(os.path.realpath(sys.argv[1]))' "${BASH_SOURCE:-${0}}")
NOVA_KEY_DIR=${NOVARC%/*}
export EC2_ACCESS_KEY=df7f93ec47e84ef8a347bbb3d598449a
export EC2_SECRET_KEY=ead2fff9f8a344e489956deacd47e818
export EC2_URL=http://203.0.113.10:8773/services/Cloud
export EC2_USER_ID=42 # nova does not use user id, but bundling requires it
export EC2_PRIVATE_KEY=${NOVA_KEY_DIR}/pk.pem
export EC2_CERT=${NOVA_KEY_DIR}/cert.pem
export NOVA_CERT=${NOVA_KEY_DIR}/cacert.pem
export EUCALYPTUS_CERT=${NOVA_CERT} # euca-bundle-image seems to require this
alias ec2-bundle-image="ec2-bundle-image --cert $EC2_CERT --privatekey \
$EC2_PRIVATE_KEY --user 42 --ec2cert $NOVA_CERT"
alias ec2-upload-bundle="ec2-upload-bundle -a $EC2_ACCESS_KEY -s \
$EC2_SECRET_KEY --url $S3_URL --ec2cert $NOVA_CERT"
To put the EC2 credentials into your environment, source the
``ec2rc.sh`` file.
Inspecting API Calls
--------------------
The command-line tools can be made to show the OpenStack API calls they
make by passing the :option:`--debug` flag to them. For example:
.. code-block:: console
# nova --debug list
This example shows the HTTP requests from the client and the responses
from the endpoints, which can be helpful in creating custom tools
written to the OpenStack API.
Using cURL for further inspection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Underlying the use of the command-line tools is the OpenStack API, which
is a RESTful API that runs over HTTP. There may be cases where you want
to interact with the API directly or need to use it because of a
suspected bug in one of the CLI tools. The best way to do this is to use
a combination of `cURL <http://curl.haxx.se/>`_ and another tool,
such as `jq <http://stedolan.github.io/jq/>`_, to parse the JSON from
the responses.
The first thing you must do is authenticate with the cloud using your
credentials to get an authentication token.
Your credentials are a combination of username, password, and tenant
(project). You can extract these values from the ``openrc.sh`` discussed
above. The token allows you to interact with your other service
endpoints without needing to reauthenticate for every request. Tokens
are typically good for 24 hours, and when the token expires, you are
alerted with a 401 (Unauthorized) response and you can request another
token.
#. Look at your OpenStack service catalog:
.. code-block:: console
$ curl -s -X POST http://203.0.113.10:35357/v2.0/tokens \
-d '{"auth": {"passwordCredentials": {"username":"test-user", \
"password":"test-password"}, \
"tenantName":"test-project"}}' \
-H "Content-type: application/json" | jq .
#. Read through the JSON response to get a feel for how the catalog is
laid out.
To make working with subsequent requests easier, store the token in
an environment variable:
.. code-block:: console
$ TOKEN=`curl -s -X POST http://203.0.113.10:35357/v2.0/tokens \
-d '{"auth": {"passwordCredentials": {"username":"test-user", \
"password":"test-password"}, \
"tenantName":"test-project"}}' \
-H "Content-type: application/json" |  jq -r .access.token.id`
Now you can refer to your token on the command line as ``$TOKEN``.
#. Pick a service endpoint from your service catalog, such as compute.
Try a request, for example, listing instances (servers):
.. code-block:: console
$ curl -s \
-H "X-Auth-Token: $TOKEN" \
http://203.0.113.10:8774/v2/98333aba48e756fa8f629c83a818ad57/servers | jq .
To discover how API requests should be structured, read the `OpenStack
API Reference <http://developer.openstack.org/api-ref.html>`_. To chew
through the responses using jq, see the `jq
Manual <http://stedolan.github.io/jq/manual/>`_.
The ``-s`` flag used in the cURL commands above prevents
the progress meter from being shown. If you are having trouble running
cURL commands, you'll want to remove it. Likewise, to help you
troubleshoot cURL commands, you can include the ``-v`` flag to show you
the verbose output. There are many more extremely useful features in
cURL; refer to the man page for all the options.
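If you prefer to script this flow rather than chain cURL commands
together, the same two requests can be made from Python. The following is
a hedged sketch using the ``requests`` library; the endpoint, credentials,
and tenant ID are the placeholder values from the cURL examples above:
.. code-block:: python
   import json

   import requests

   # Placeholder values from the examples above; replace for a real cloud.
   AUTH_URL = 'http://203.0.113.10:35357/v2.0/tokens'
   COMPUTE_URL = 'http://203.0.113.10:8774/v2/98333aba48e756fa8f629c83a818ad57'

   payload = {
       'auth': {
           'passwordCredentials': {
               'username': 'test-user',
               'password': 'test-password',
           },
           'tenantName': 'test-project',
       }
   }

   # Request a token from the Identity service.
   resp = requests.post(AUTH_URL, data=json.dumps(payload),
                        headers={'Content-type': 'application/json'})
   token = resp.json()['access']['token']['id']

   # Use the token to list servers from the Compute endpoint.
   servers = requests.get(COMPUTE_URL + '/servers',
                          headers={'X-Auth-Token': token})
   print(json.dumps(servers.json(), indent=2))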
Servers and Services
--------------------
As an administrator, you have a few ways to discover what your OpenStack
cloud looks like simply by using the OpenStack tools available. This
section gives you an idea of how to get an overview of your cloud, its
shape, size, and current state.
First, you can discover what servers belong to your OpenStack cloud by
running:
.. code-block:: console
# nova service-list
The output looks like the following:
.. code-block:: console
+----+------------------+-------------------+------+---------+-------+----------------------------+-----------------+
| Id | Binary           | Host              | Zone | Status  | State | Updated_at                 | Disabled Reason |
+----+------------------+-------------------+------+---------+-------+----------------------------+-----------------+
| 1  | nova-cert        | cloud.example.com | nova | enabled | up    | 2016-01-05T17:20:38.000000 | -               |
| 2  | nova-compute     | c01.example.com   | nova | enabled | up    | 2016-01-05T17:20:38.000000 | -               |
| 3  | nova-compute     | c02.example.com   | nova | enabled | up    | 2016-01-05T17:20:38.000000 | -               |
| 4  | nova-compute     | c03.example.com   | nova | enabled | up    | 2016-01-05T17:20:38.000000 | -               |
| 5  | nova-compute     | c04.example.com   | nova | enabled | up    | 2016-01-05T17:20:38.000000 | -               |
| 6  | nova-compute     | c05.example.com   | nova | enabled | up    | 2016-01-05T17:20:38.000000 | -               |
| 7  | nova-conductor   | cloud.example.com | nova | enabled | up    | 2016-01-05T17:20:38.000000 | -               |
| 8  | nova-cert        | cloud.example.com | nova | enabled | up    | 2016-01-05T17:20:42.000000 | -               |
| 9  | nova-scheduler   | cloud.example.com | nova | enabled | up    | 2016-01-05T17:20:38.000000 | -               |
| 10 | nova-consoleauth | cloud.example.com | nova | enabled | up    | 2016-01-05T17:20:35.000000 | -               |
+----+------------------+-------------------+------+---------+-------+----------------------------+-----------------+
The output shows that there are five compute nodes and one cloud
controller. You see all the services in the up state, which indicates that
the services are up and running. If a service is in a down state, it is
no longer available. This is an indication that you
should troubleshoot why the service is down.
If you are using cinder, run the following command to see a similar
listing:
.. code-block:: console
# cinder-manage host list | sort
host zone
c01.example.com nova
c02.example.com nova
c03.example.com nova
c04.example.com nova
c05.example.com nova
cloud.example.com nova
With these two tables, you now have a good overview of what servers and
services make up your cloud.
You can also use the Identity service (keystone) to see what services
are available in your cloud as well as what endpoints have been
configured for the services.
The following command requires you to have your shell environment
configured with the proper administrative variables:
.. code-block:: console
$ openstack catalog list
+----------+------------+---------------------------------------------------------------------------------+
| Name | Type | Endpoints |
+----------+------------+---------------------------------------------------------------------------------+
| nova | compute | RegionOne |
| | | publicURL: http://192.168.122.10:8774/v2/9faa845768224258808fc17a1bb27e5e |
| | | internalURL: http://192.168.122.10:8774/v2/9faa845768224258808fc17a1bb27e5e |
| | | adminURL: http://192.168.122.10:8774/v2/9faa845768224258808fc17a1bb27e5e |
| | | |
| cinderv2 | volumev2 | RegionOne |
| | | publicURL: http://192.168.122.10:8776/v2/9faa845768224258808fc17a1bb27e5e |
| | | internalURL: http://192.168.122.10:8776/v2/9faa845768224258808fc17a1bb27e5e |
| | | adminURL: http://192.168.122.10:8776/v2/9faa845768224258808fc17a1bb27e5e |
| | | |
The preceding output has been truncated to show only two services. You
will see one service entry for each service that your cloud provides.
Note how the endpoint domain can be different depending on the endpoint
type. Different endpoint domains per type are not required, but this can
be done for different reasons, such as endpoint privacy or network
traffic segregation.
You can find the version of the Compute installation by using the
nova client command:
.. code-block:: console
# nova version-list
Diagnose Your Compute Nodes
---------------------------
You can obtain extra information about virtual machines that are
running—their CPU usage, the memory, the disk I/O or network I/O—per
instance, by running the :command:`nova diagnostics` command with a server ID:
.. code-block:: console
$ nova diagnostics <serverID>
The output of this command varies depending on the hypervisor because
hypervisors support different attributes. The following demonstrates
the difference between the two most popular hypervisors.
Here is example output when the hypervisor is Xen:
.. code-block:: console
+----------------+-----------------+
| Property | Value |
+----------------+-----------------+
| cpu0 | 4.3627 |
| memory | 1171088064.0000 |
| memory_target | 1171088064.0000 |
| vbd_xvda_read | 0.0 |
| vbd_xvda_write | 0.0 |
| vif_0_rx | 3223.6870 |
| vif_0_tx | 0.0 |
| vif_1_rx | 104.4955 |
| vif_1_tx | 0.0 |
+----------------+-----------------+
While the command should work with any hypervisor that is controlled
through libvirt (KVM, QEMU, or LXC), it has been tested only with KVM.
Here is the example output when the hypervisor is KVM:
.. code-block:: console
+------------------+------------+
| Property | Value |
+------------------+------------+
| cpu0_time | 2870000000 |
| memory | 524288 |
| vda_errors | -1 |
| vda_read | 262144 |
| vda_read_req | 112 |
| vda_write | 5606400 |
| vda_write_req | 376 |
| vnet0_rx | 63343 |
| vnet0_rx_drop | 0 |
| vnet0_rx_errors | 0 |
| vnet0_rx_packets | 431 |
| vnet0_tx | 4905 |
| vnet0_tx_drop | 0 |
| vnet0_tx_errors | 0 |
| vnet0_tx_packets | 45 |
+------------------+------------+
Network Inspection
~~~~~~~~~~~~~~~~~~
To see which fixed IP networks are configured in your cloud, you can use
the :command:`nova` command-line client to get the IP ranges:
.. code-block:: console
$ nova network-list
+--------------------------------------+--------+--------------+
| ID | Label | Cidr |
+--------------------------------------+--------+--------------+
| 3df67919-9600-4ea8-952e-2a7be6f70774 | test01 | 10.1.0.0/24 |
| 8283efb2-e53d-46e1-a6bd-bb2bdef9cb9a | test02 | 10.1.1.0/24 |
+--------------------------------------+--------+--------------+
The :command:`nova-manage` tool can provide some additional details:
.. code-block:: console
# nova-manage network list
id   IPv4          IPv6   start address   DNS1   DNS2   VlanID   project   uuid
1    10.1.0.0/24   None   10.1.0.3        None   None   300      2725bbd   beacb3f2
2    10.1.1.0/24   None   10.1.1.3        None   None   301      none      d0b1a796
This output shows that two networks are configured, each network
spanning a /24 subnet (256 addresses). The first network has been assigned
to a certain project, while the second network is still open for
assignment. You can assign this network manually; otherwise, it is
automatically assigned when a project launches its first instance.
To find out whether any floating IPs are available in your cloud, run:
.. code-block:: console
# nova-manage floating list
2725bb...59f43f  1.2.3.4  None             nova  vlan20
None             1.2.3.5  48a415...b010ff  nova  vlan20
Here, two floating IPs are available. The first has been allocated to a
project, while the other is unallocated.
Users and Projects
~~~~~~~~~~~~~~~~~~
To see a list of projects that have been added to the cloud, run:
.. code-block:: console
$ openstack project list
+----------------------------------+--------------------+
| ID | Name |
+----------------------------------+--------------------+
| 422c17c0b26f4fbe9449f37a5621a5e6 | alt_demo |
| 5dc65773519248f3a580cfe28ba7fa3f | demo |
| 9faa845768224258808fc17a1bb27e5e | admin |
| a733070a420c4b509784d7ea8f6884f7 | invisible_to_admin |
| aeb3e976e7794f3f89e4a7965db46c1e | service |
+----------------------------------+--------------------+
To see a list of users, run:
.. code-block:: console
$ openstack user list
+----------------------------------+----------+
| ID | Name |
+----------------------------------+----------+
| 5837063598694771aedd66aa4cddf0b8 | demo |
| 58efd9d852b74b87acc6efafaf31b30e | cinder |
| 6845d995a57a441f890abc8f55da8dfb | glance |
| ac2d15a1205f46d4837d5336cd4c5f5a | alt_demo |
| d8f593c3ae2b47289221f17a776a218b | admin |
| d959ec0a99e24df0b7cb106ff940df20 | nova |
+----------------------------------+----------+
.. note::
Sometimes a user and a group have a one-to-one mapping. This happens
for standard system accounts, such as cinder, glance, nova, and
swift, or when only one user is part of a group.
Running Instances
~~~~~~~~~~~~~~~~~
To see a list of running instances, run:
.. code-block:: console
$ nova list --all-tenants
+-----+------------------+--------+-------------------------------------------+
| ID | Name | Status | Networks |
+-----+------------------+--------+-------------------------------------------+
| ... | Windows | ACTIVE | novanetwork_1=10.1.1.3, 199.116.232.39 |
| ... | cloud controller | ACTIVE | novanetwork_0=10.1.0.6; jtopjian=10.1.2.3 |
| ... | compute node 1 | ACTIVE | novanetwork_0=10.1.0.4; jtopjian=10.1.2.4 |
| ... | devbox | ACTIVE | novanetwork_0=10.1.0.3 |
| ... | devstack | ACTIVE | novanetwork_0=10.1.0.5 |
| ... | initial | ACTIVE | nova_network=10.1.7.4, 10.1.8.4 |
| ... | lorin-head | ACTIVE | nova_network=10.1.7.3, 10.1.8.3 |
+-----+------------------+--------+-------------------------------------------+
Unfortunately, this command does not tell you various details about the
running instances, such as what compute node the instance is running on,
what flavor the instance is, and so on. You can use the following
command to view details about individual instances:
.. code-block:: console
$ nova show <uuid>
For example:
.. code-block:: console
# nova show 81db556b-8aa5-427d-a95c-2a9a6972f630
+-------------------------------------+-----------------------------------+
| Property | Value |
+-------------------------------------+-----------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-SRV-ATTR:host | c02.example.com |
| OS-EXT-SRV-ATTR:hypervisor_hostname | c02.example.com |
| OS-EXT-SRV-ATTR:instance_name | instance-00000029 |
| OS-EXT-STS:power_state | 1 |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| accessIPv4 | |
| accessIPv6 | |
| config_drive | |
| created | 2013-02-13T20:08:36Z |
| flavor | m1.small (6) |
| hostId | ... |
| id | ... |
| image | Ubuntu 12.04 cloudimg amd64 (...) |
| key_name | jtopjian-sandbox |
| metadata | {} |
| name | devstack |
| novanetwork_0 network | 10.1.0.5 |
| progress | 0 |
| security_groups | [{u'name': u'default'}] |
| status | ACTIVE |
| tenant_id | ... |
| updated | 2013-02-13T20:08:59Z |
| user_id | ... |
+-------------------------------------+-----------------------------------+
This output shows that an instance named ``devstack`` was created from
an Ubuntu 12.04 image using a flavor of ``m1.small`` and is hosted on
the compute node ``c02.example.com``.
Summary
~~~~~~~
We hope you have enjoyed this quick tour of your working environment,
including how to interact with your cloud and extract useful
information. From here, you can use the `Administrator
Guide <http://docs.openstack.org/admin-guide/>`_ as your
reference for all of the command-line functionality in your cloud.
======================
Logging and Monitoring
======================
As an OpenStack cloud is composed of so many different services, there
are a large number of log files. This chapter aims to assist you in
locating and working with them and describes other ways to track the
status of your deployment.
Where Are the Logs?
~~~~~~~~~~~~~~~~~~~
Most services use the convention of writing their log files to
subdirectories of the ``/var/log`` directory, as listed in the
table below.
.. list-table:: OpenStack log locations
:widths: 33 33 33
:header-rows: 1
* - Node type
- Service
- Log location
* - Cloud controller
- ``nova-*``
- ``/var/log/nova``
* - Cloud controller
- ``glance-*``
- ``/var/log/glance``
* - Cloud controller
- ``cinder-*``
- ``/var/log/cinder``
* - Cloud controller
- ``keystone-*``
- ``/var/log/keystone``
* - Cloud controller
- ``neutron-*``
- ``/var/log/neutron``
* - Cloud controller
- horizon
- ``/var/log/apache2/``
* - All nodes
- misc (swift, dnsmasq)
- ``/var/log/syslog``
* - Compute nodes
- libvirt
- ``/var/log/libvirt/libvirtd.log``
* - Compute nodes
- Console (boot up messages) for VM instances:
- ``/var/lib/nova/instances/instance-<instance id>/console.log``
* - Block Storage nodes
- cinder-volume
- ``/var/log/cinder/cinder-volume.log``
Reading the Logs
~~~~~~~~~~~~~~~~
OpenStack services use the standard logging levels, at increasing
severity: DEBUG, INFO, AUDIT, WARNING, ERROR, CRITICAL, and TRACE. That
is, messages only appear in the logs if they are more "severe" than the
particular log level, with DEBUG allowing all log statements through.
For example, TRACE is logged only if the software has a stack trace,
while INFO is logged for every message including those that are only for
information.
To disable DEBUG-level logging, edit the ``/etc/nova/nova.conf`` file as follows:
.. code-block:: ini
debug=false
Keystone is handled a little differently. To modify the logging level,
edit the ``/etc/keystone/logging.conf`` file and look at the
``logger_root`` and ``handler_file`` sections.
Logging for horizon is configured in
``/etc/openstack_dashboard/local_settings.py``. Because horizon is
a Django web application, it follows the `Django Logging framework
conventions <https://docs.djangoproject.com/en/dev/topics/logging/>`_.
The first step in finding the source of an error is typically to search
for a CRITICAL, TRACE, or ERROR message in the log starting at the
bottom of the log file.
Here is an example of a CRITICAL log message, with the corresponding
TRACE (Python traceback) immediately following:
.. code-block:: console
2013-02-25 21:05:51 17409 CRITICAL cinder [-] Bad or unexpected response from the storage volume backend API: volume group
cinder-volumes doesn't exist
2013-02-25 21:05:51 17409 TRACE cinder Traceback (most recent call last):
2013-02-25 21:05:51 17409 TRACE cinder File "/usr/bin/cinder-volume", line 48, in <module>
2013-02-25 21:05:51 17409 TRACE cinder service.wait()
2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 422, in wait
2013-02-25 21:05:51 17409 TRACE cinder _launcher.wait()
2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 127, in wait
2013-02-25 21:05:51 17409 TRACE cinder service.wait()
2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 166, in wait
2013-02-25 21:05:51 17409 TRACE cinder return self._exit_event.wait()
2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
2013-02-25 21:05:51 17409 TRACE cinder return hubs.get_hub().switch()
2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 177, in switch
2013-02-25 21:05:51 17409 TRACE cinder return self.greenlet.switch()
2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 192, in main
2013-02-25 21:05:51 17409 TRACE cinder result = function(*args, **kwargs)
2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 88, in run_server
2013-02-25 21:05:51 17409 TRACE cinder server.start()
2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 159, in start
2013-02-25 21:05:51 17409 TRACE cinder self.manager.init_host()
2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 95,
in init_host
2013-02-25 21:05:51 17409 TRACE cinder self.driver.check_for_setup_error()
2013-02-25 21:05:51 17409 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/volume/driver.py", line 116,
in check_for_setup_error
2013-02-25 21:05:51 17409 TRACE cinder raise exception.VolumeBackendAPIException(data=exception_message)
2013-02-25 21:05:51 17409 TRACE cinder VolumeBackendAPIException: Bad or unexpected response from the storage volume
backend API: volume group cinder-volumes doesn't exist
2013-02-25 21:05:51 17409 TRACE cinder
In this example, ``cinder-volumes`` failed to start and has provided a
stack trace, since its volume back end has been unable to set up the
storage volume—probably because the LVM volume that is expected from the
configuration does not exist.
Here is an example error log:
.. code-block:: console
2013-02-25 20:26:33 6619 ERROR nova.openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable:
[Errno 111] ECONNREFUSED. Trying again in 23 seconds.
In this error, a nova service has failed to connect to the RabbitMQ
server because it got a connection refused error.
Tracing Instance Requests
~~~~~~~~~~~~~~~~~~~~~~~~~
When an instance fails to behave properly, you will often have to trace
activity associated with that instance across the log files of various
``nova-*`` services and across both the cloud controller and compute
nodes.
The typical way is to trace the UUID associated with an instance across
the service logs.
Consider the following example:
.. code-block:: console
$ nova list
+--------------------------------------+--------+--------+---------------------------+
| ID                                   | Name   | Status | Networks                  |
+--------------------------------------+--------+--------+---------------------------+
| faf7ded8-4a46-413b-b113-f19590746ffe | cirros | ACTIVE | novanetwork=192.168.100.3 |
+--------------------------------------+--------+--------+---------------------------+
Here, the ID associated with the instance is
``faf7ded8-4a46-413b-b113-f19590746ffe``. If you search for this string
on the cloud controller in the ``/var/log/nova-*.log`` files, it appears
in ``nova-api.log`` and ``nova-scheduler.log``. If you search for this
on the compute nodes in ``/var/log/nova-*.log``, it appears in
``nova-network.log`` and ``nova-compute.log``. If no ERROR or CRITICAL
messages appear, the most recent log entry that reports this may provide
a hint about what has gone wrong.
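If you trace instances often, a small helper script can save some typing.
The following is a minimal sketch that searches one host's nova logs for
a given UUID; the log directory and the UUID are the example values used
in this section:
.. code-block:: python
   import glob

   INSTANCE_UUID = 'faf7ded8-4a46-413b-b113-f19590746ffe'

   # Scan every nova log on this host and print matching lines, prefixed
   # with the file name so entries from nova-api, nova-scheduler, and so
   # on are easy to tell apart.
   for path in sorted(glob.glob('/var/log/nova/*.log')):
       with open(path) as log_file:
           for line in log_file:
               if INSTANCE_UUID in line:
                   print('%s: %s' % (path, line.rstrip()))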
Adding Custom Logging Statements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If there is not enough information in the existing logs, you may need to
add your own custom logging statements to the ``nova-*``
services.
The source files are located in
``/usr/lib/python2.7/dist-packages/nova``.
To add logging statements, the following line should be near the top of
the file. For most files, these should already be there:
.. code-block:: python
from nova.openstack.common import log as logging
LOG = logging.getLogger(__name__)
To add a DEBUG logging statement, you would do:
.. code-block:: python
LOG.debug("This is a custom debugging statement")
You may notice that all the existing logging messages are preceded by an
underscore and surrounded by parentheses, for example:
.. code-block:: python
LOG.debug(_("Logging statement appears here"))
This formatting is used to support translation of logging messages into
different languages using the
`gettext <https://docs.python.org/2/library/gettext.html>`_
internationalization library. You don't need to do this for your own
custom log messages. However, if you want to contribute the code back to
the OpenStack project that includes logging statements, you must
surround your log messages with underscores and parentheses.
RabbitMQ Web Management Interface or rabbitmqctl
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Aside from connection failures, RabbitMQ log files are generally not
useful for debugging OpenStack related issues. Instead, we recommend you
use the RabbitMQ web management interface. Enable it on your cloud
controller:
.. code-block:: console
# /usr/lib/rabbitmq/bin/rabbitmq-plugins enable rabbitmq_management
.. code-block:: console
# service rabbitmq-server restart
The RabbitMQ web management interface is accessible on your cloud
controller at *http://localhost:55672*.
.. note::
Ubuntu 12.04 installs RabbitMQ version 2.7.1, which uses port 55672.
RabbitMQ versions 3.0 and above use port 15672 instead. You can
check which version of RabbitMQ you have running on your local
Ubuntu machine by doing:
.. code-block:: console
$ dpkg -s rabbitmq-server | grep "Version:"
Version: 2.7.1-0ubuntu4
An alternative to enabling the RabbitMQ web management interface is to
use the ``rabbitmqctl`` commands. For example,
:command:`rabbitmqctl list_queues | grep cinder` displays any messages left in
the queue. If there are messages, it's a possible sign that cinder
services didn't connect properly to rabbitmq and might have to be
restarted.
Items to monitor for RabbitMQ include the number of items in each of the
queues and the processing time statistics for the server.
Centrally Managing Logs
~~~~~~~~~~~~~~~~~~~~~~~
Because your cloud is most likely composed of many servers, you must
check logs on each of those servers to properly piece an event together.
A better solution is to send the logs of all servers to a central
location so that they can all be accessed from the same
area.
Ubuntu uses rsyslog as the default logging service. Since it is natively
able to send logs to a remote location, you don't have to install
anything extra to enable this feature, just modify the configuration
file. In doing this, consider running your logging over a management
network or using an encrypted VPN to avoid interception.
rsyslog Client Configuration
----------------------------
To begin, configure all OpenStack components to log to syslog in
addition to their standard log file location. Also configure each
component to log to a different syslog facility. This makes it easier to
split the logs into individual components on the central server:
``nova.conf``:
.. code-block:: ini
use_syslog=True
syslog_log_facility=LOG_LOCAL0
``glance-api.conf`` and ``glance-registry.conf``:
.. code-block:: ini
use_syslog=True
syslog_log_facility=LOG_LOCAL1
``cinder.conf``:
.. code-block:: ini
use_syslog=True
syslog_log_facility=LOG_LOCAL2
``keystone.conf``:
.. code-block:: ini
use_syslog=True
syslog_log_facility=LOG_LOCAL3
By default, Object Storage logs to syslog.
Next, create ``/etc/rsyslog.d/client.conf`` with the following line:
.. code-block:: ini
*.* @192.168.1.10
This instructs rsyslog to send all logs to the IP listed. In this
example, the IP points to the cloud controller.
rsyslog Server Configuration
----------------------------
Designate a server as the central logging server. The best practice is
to choose a server that is solely dedicated to this purpose. Create a
file called ``/etc/rsyslog.d/server.conf`` with the following contents:
.. code-block:: ini
# Enable UDP
$ModLoad imudp
# Listen on 192.168.1.10 only
$UDPServerAddress 192.168.1.10
# Port 514
$UDPServerRun 514
# Create logging templates for nova
$template NovaFile,"/var/log/rsyslog/%HOSTNAME%/nova.log"
$template NovaAll,"/var/log/rsyslog/nova.log"
# Log everything else to syslog.log
$template DynFile,"/var/log/rsyslog/%HOSTNAME%/syslog.log"
*.* ?DynFile
# Log various openstack components to their own individual file
local0.* ?NovaFile
local0.* ?NovaAll
& ~
This example configuration handles the nova service only. It first
configures rsyslog to act as a server that runs on port 514. Next, it
creates a series of logging templates. Logging templates control where
received logs are stored. Using the last example, a nova log from
c01.example.com goes to the following locations:
- ``/var/log/rsyslog/c01.example.com/nova.log``
- ``/var/log/rsyslog/nova.log``
This is useful, as logs from c02.example.com go to:
- ``/var/log/rsyslog/c02.example.com/nova.log``
- ``/var/log/rsyslog/nova.log``
You have an individual log file for each compute node as well as an
aggregated log that contains nova logs from all nodes.
Monitoring
~~~~~~~~~~
There are two types of monitoring: watching for problems and watching
usage trends. The former ensures that all services are up and running,
creating a functional cloud. The latter involves monitoring resource
usage over time in order to make informed decisions about potential
bottlenecks and upgrades.
**Nagios** is an open source monitoring service. It's capable of executing
arbitrary commands to check the status of server and network services,
remotely executing arbitrary commands directly on servers, and allowing
servers to push notifications back in the form of passive monitoring.
Nagios has been around since 1999. Although newer monitoring services
are available, Nagios is a tried-and-true systems administration
staple.
Process Monitoring
------------------
A basic type of alert monitoring is to simply check and see whether a
required process is running. For example, ensure that
the ``nova-api`` service is running on the cloud controller:
.. code-block:: console
# ps aux | grep nova-api
nova 12786 0.0 0.0 37952 1312 ? Ss Feb11 0:00 su -s /bin/sh -c exec nova-api
--config-file=/etc/nova/nova.conf nova
nova 12787 0.0 0.1 135764 57400 ? S Feb11 0:01 /usr/bin/python
/usr/bin/nova-api --config-file=/etc/nova/nova.conf
nova 12792 0.0 0.0 96052 22856 ? S Feb11 0:01 /usr/bin/python
/usr/bin/nova-api --config-file=/etc/nova/nova.conf
nova 12793 0.0 0.3 290688 115516 ? S Feb11 1:23 /usr/bin/python
/usr/bin/nova-api --config-file=/etc/nova/nova.conf
nova 12794 0.0 0.2 248636 77068 ? S Feb11 0:04 /usr/bin/python
/usr/bin/nova-api --config-file=/etc/nova/nova.conf
root 24121 0.0 0.0 11688 912 pts/5 S+ 13:07 0:00 grep nova-api
You can create automated alerts for critical processes by using Nagios
and NRPE. For example, to ensure that the ``nova-compute`` process is
running on compute nodes, create an alert on your Nagios server that
looks like this:
.. code-block:: none
define service {
host_name c01.example.com
check_command check_nrpe_1arg!check_nova-compute
use generic-service
notification_period 24x7
contact_groups sysadmins
service_description nova-compute
}
Then on the actual compute node, create the following NRPE
configuration:
.. code-block:: none
command[check_nova-compute]=/usr/lib/nagios/plugins/check_procs -c 1: \
-a nova-compute
Nagios checks that at least one ``nova-compute`` service is running at
all times.
Resource Alerting
-----------------
Resource alerting provides notifications when one or more resources are
critically low. While the monitoring thresholds should be tuned to your
specific OpenStack environment, monitoring resource usage is not
specific to OpenStack at all—any generic type of alert will work
fine.
Some of the resources that you want to monitor include:
- Disk usage
- Server load
- Memory usage
- Network I/O
- Available vCPUs
For example, to monitor disk capacity on a compute node with Nagios, add
the following to your Nagios configuration:
.. code-block:: none
define service {
host_name c01.example.com
check_command check_nrpe!check_all_disks!20% 10%
use generic-service
contact_groups sysadmins
service_description Disk
}
On the compute node, add the following to your NRPE configuration:
.. code-block:: none
command[check_all_disks]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c \
$ARG2$ -e
Nagios alerts you with a WARNING when any disk on the compute node is 80
percent full and CRITICAL when 90 percent is full.
StackTach
---------
StackTach is a tool that collects and reports the notifications sent by
``nova``. Notifications are essentially the same as logs but can be much
more detailed. Nearly all OpenStack components are capable of generating
notifications when significant events occur. Notifications are messages
placed on the OpenStack queue (generally RabbitMQ) for consumption by
downstream systems. An overview of notifications can be found at `System
Usage
Data <https://wiki.openstack.org/wiki/SystemUsageData>`_.
To enable ``nova`` to send notifications, add the following to
``nova.conf``:
.. code-block:: ini
notification_topics=monitor
notification_driver=messagingv2
Once ``nova`` is sending notifications, install and configure StackTach.
StackTach workers for queue consumption and pipeline processing are
configured to read these notifications from RabbitMQ servers and store
them in a database. Users can query instances, requests, and servers
by using the browser interface or the command-line tool,
`Stacky <https://github.com/rackerlabs/stacky>`_. Since StackTach is
relatively new and constantly changing, installation instructions
quickly become outdated. Please refer to the `StackTach Git
repo <https://git.openstack.org/cgit//openstack/stacktach>`_ for
instructions as well as a demo video. Additional details on the latest
developments can be discovered at the `official
page <http://stacktach.com/>`_.
Logstash
--------
Logstash is a high performance indexing and search engine for logs. Logs
from Jenkins test runs are sent to logstash where they are indexed and
stored. Logstash facilitates reviewing logs from multiple sources in a
single test run, searching for errors or particular events within a test
run, and searching for log event trends across test runs.
There are four major layers in a Logstash setup:
- Log Pusher
- Log Indexer
- ElasticSearch
- Kibana
Each layer scales horizontally. As the number of logs grows, you can add
more log pushers, more Logstash indexers, and more ElasticSearch nodes.
Logpusher is a pair of Python scripts that first listens to Jenkins
build events and converts them into Gearman jobs. Gearman provides a
generic application framework to farm out work to other machines or
processes that are better suited to do the work. It allows you to do
work in parallel, to load balance processing, and to call functions
between languages. Logpusher then runs Gearman jobs to push the log
files into Logstash. The Logstash indexer reads these log events,
filters them to remove unwanted lines, collapses multiple events
together, and parses useful information before shipping them to
ElasticSearch for storage and indexing. Kibana is a Logstash-oriented
web client for ElasticSearch.
OpenStack Telemetry
-------------------
An integrated OpenStack project (code-named :term:`ceilometer`) collects
metering and event data relating to OpenStack services. Data collected
by the Telemetry service could be used for billing. Depending on the
deployment configuration, collected data may also be accessible to
users. The Telemetry service provides a
REST API documented at
http://developer.openstack.org/api-ref-telemetry-v2.html. You can read
more about the module in the `OpenStack Administrator
Guide <http://docs.openstack.org/admin-guide/telemetry.html>`_ or
in the `developer
documentation <http://docs.openstack.org/developer/ceilometer>`_.
OpenStack-Specific Resources
----------------------------
Resources such as memory, disk, and CPU are generic resources that all
servers (even non-OpenStack servers) have and are important to the
overall health of the server. When dealing with OpenStack specifically,
these resources are important for a second reason: ensuring that enough
are available to launch instances. There are a few ways you can see
OpenStack resource usage. The first is through the :command:`nova` command:
.. code-block:: console
# nova usage-list
This command displays a list of how many instances a tenant has running
and some light usage statistics about the combined instances. This
command is useful for a quick overview of your cloud, but it doesn't
really get into a lot of details.
Next, the ``nova`` database contains three tables that store usage
information.
The ``nova.quotas`` and ``nova.quota_usages`` tables store quota
information. If a tenant's quota is different from the default quota
settings, its quota is stored in the ``nova.quotas`` table. For example:
.. code-block:: mysql
mysql> select project_id, resource, hard_limit from quotas;
+----------------------------------+-----------------------------+------------+
| project_id | resource | hard_limit |
+----------------------------------+-----------------------------+------------+
| 628df59f091142399e0689a2696f5baa | metadata_items | 128 |
| 628df59f091142399e0689a2696f5baa | injected_file_content_bytes | 10240 |
| 628df59f091142399e0689a2696f5baa | injected_files | 5 |
| 628df59f091142399e0689a2696f5baa | gigabytes | 1000 |
| 628df59f091142399e0689a2696f5baa | ram | 51200 |
| 628df59f091142399e0689a2696f5baa | floating_ips | 10 |
| 628df59f091142399e0689a2696f5baa | instances | 10 |
| 628df59f091142399e0689a2696f5baa | volumes | 10 |
| 628df59f091142399e0689a2696f5baa | cores | 20 |
+----------------------------------+-----------------------------+------------+
The ``nova.quota_usages`` table keeps track of how many resources the
tenant currently has in use:
.. code-block:: mysql
mysql> select project_id, resource, in_use from quota_usages where project_id like '628%';
+----------------------------------+--------------+--------+
| project_id | resource | in_use |
+----------------------------------+--------------+--------+
| 628df59f091142399e0689a2696f5baa | instances | 1 |
| 628df59f091142399e0689a2696f5baa | ram | 512 |
| 628df59f091142399e0689a2696f5baa | cores | 1 |
| 628df59f091142399e0689a2696f5baa | floating_ips | 1 |
| 628df59f091142399e0689a2696f5baa | volumes | 2 |
| 628df59f091142399e0689a2696f5baa | gigabytes | 12 |
| 628df59f091142399e0689a2696f5baa | images | 1 |
+----------------------------------+--------------+--------+
By comparing a tenant's hard limit with their current resource usage,
you can see their usage percentage. For example, if this tenant is using
1 floating IP out of 10, then they are using 10 percent of their
floating IP quota. Rather than doing the calculation manually, you can
use SQL or the scripting language of your choice and create a formatted
report:
.. code-block:: mysql
+----------------------------------+------------+-------------+---------------+
| some_tenant |
+-----------------------------------+------------+------------+---------------+
| Resource | Used | Limit | |
+-----------------------------------+------------+------------+---------------+
| cores | 1 | 20 | 5 % |
| floating_ips | 1 | 10 | 10 % |
| gigabytes | 12 | 1000 | 1 % |
| images | 1 | 4 | 25 % |
| injected_file_content_bytes | 0 | 10240 | 0 % |
| injected_file_path_bytes | 0 | 255 | 0 % |
| injected_files | 0 | 5 | 0 % |
| instances | 1 | 10 | 10 % |
| key_pairs | 0 | 100 | 0 % |
| metadata_items | 0 | 128 | 0 % |
| ram | 512 | 51200 | 1 % |
| reservation_expire | 0 | 86400 | 0 % |
| security_group_rules | 0 | 20 | 0 % |
| security_groups | 0 | 10 | 0 % |
| volumes | 2 | 10 | 20 % |
+-----------------------------------+------------+------------+---------------+
The preceding information was generated by using a custom script that
can be found on
`GitHub <https://github.com/cybera/novac/blob/dev/libexec/novac-quota-report>`_.
.. note::
This script is specific to a certain OpenStack installation and must
be modified to fit your environment. However, the logic should
easily be transferable.
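As a starting point, the core of that calculation fits in a few lines.
The following is a hedged sketch that assumes direct MySQL access to the
``nova`` database through the PyMySQL driver; the connection values and
project ID are placeholders taken from the examples above:
.. code-block:: python
   import pymysql

   conn = pymysql.connect(host='localhost', user='nova',
                          password='password', db='nova')
   cursor = conn.cursor()
   project_id = '628df59f091142399e0689a2696f5baa'

   # Quotas that differ from the defaults live in the quotas table.
   cursor.execute('SELECT resource, hard_limit FROM quotas '
                  'WHERE project_id = %s', (project_id,))
   limits = dict(cursor.fetchall())

   # Current consumption lives in the quota_usages table.
   cursor.execute('SELECT resource, in_use FROM quota_usages '
                  'WHERE project_id = %s', (project_id,))
   for resource, in_use in cursor.fetchall():
       if limits.get(resource):
           percent = 100.0 * in_use / limits[resource]
           print('%-28s %6d / %-6d (%3.0f %%)' % (resource, in_use,
                                                  limits[resource], percent))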
Intelligent Alerting
--------------------
Intelligent alerting can be thought of as a form of continuous
integration for operations. For example, you can easily check to see
whether the Image service is up and running by ensuring that
the ``glance-api`` and ``glance-registry`` processes are running or by
seeing whether ``glance-api`` is responding on port 9292.
But how can you tell whether images are being successfully uploaded to
the Image service? Maybe the disk that Image service is storing the
images on is full or the S3 back end is down. You could naturally check
this by doing a quick image upload:
.. code-block:: bash
#!/bin/bash
#
# assumes that reasonable credentials have been stored at
# /root/auth
. /root/openrc
wget http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-disk.img
glance image-create --name='cirros image' --is-public=true \
--container-format=bare --disk-format=qcow2 < cirros-0.3.4-x86_64-disk.img
By taking this script and rolling it into an alert for your monitoring
system (such as Nagios), you now have an automated way of ensuring that
image uploads to the Image Catalog are working.
.. note::
You must remove the image after each test. Even better, test whether
you can successfully delete an image from the Image service.
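A sketch of such a round-trip check, reusing the credentials file and CirrOS image from the upload example above, might look like the following (the ``awk`` expression assumes the client prints the new image ID in its output table):

.. code-block:: bash

   #!/bin/bash
   # Minimal sketch: upload a test image, then delete it again.
   # A failure at either step should trigger an alert.
   set -e
   . /root/openrc

   IMAGE_ID=$(glance image-create --name='monitoring test image' \
       --is-public=false --container-format=bare --disk-format=qcow2 \
       < cirros-0.3.4-x86_64-disk.img | awk '/ id /{print $4}')

   glance image-delete "$IMAGE_ID"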
Intelligent alerting takes considerably more time to plan and implement
than the other alerts described in this chapter. A good outline to
implement intelligent alerting is:
- Review common actions in your cloud.
- Create ways to automatically test these actions.
- Roll these tests into an alerting system.
Some other examples of intelligent alerting include the following (a sketch of the first check appears after this list):
- Can instances launch and be destroyed?
- Can users be created?
- Can objects be stored and deleted?
- Can volumes be created and destroyed?
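For instance, the first check in this list could be scripted roughly as follows; the flavor and image names are placeholders for whatever exists in your cloud, and the credentials file is assumed to be in place:

.. code-block:: bash

   #!/bin/bash
   # Minimal sketch: boot a throwaway instance, wait until it is ACTIVE,
   # then delete it. A failure at any step should trigger an alert.
   set -e
   . /root/openrc

   nova boot --flavor m1.tiny --image cirros-0.3.4-x86_64 \
       --poll monitoring-canary
   nova delete monitoring-canary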
Trending
--------
Trending can give you great insight into how your cloud is performing
day to day. You can learn, for example, if a busy day was simply a rare
occurrence or if you should start adding new compute nodes.
Trending takes a slightly different approach than alerting. While
alerting is interested in a binary result (whether a check succeeds or
fails), trending records the current state of something at a certain
point in time. Once enough points in time have been recorded, you can
see how the value has changed over time.
All of the alert types mentioned earlier can also be used for trend
reporting. Some other trend examples include:
- The number of instances on each compute node
- The types of flavors in use
- The number of volumes in use
- The number of Object Storage requests each hour
- The number of ``nova-api`` requests each hour
- The I/O statistics of your storage services
As an example, recording ``nova-api`` usage can allow you to track the
need to scale your cloud controller. By keeping an eye on ``nova-api``
requests, you can determine whether you need to spawn more ``nova-api``
processes or go as far as introducing an entirely new server to run
``nova-api``. To get an approximate count of the requests, look for
standard INFO messages in ``/var/log/nova/nova-api.log``:
.. code-block:: console
# grep INFO /var/log/nova/nova-api.log | wc
You can obtain further statistics by looking for the number of
successful requests:
.. code-block:: console
# grep " 200 " /var/log/nova/nova-api.log | wc
By running this command periodically and keeping a record of the result,
you can create a trending report over time that shows whether your
``nova-api`` usage is increasing, decreasing, or keeping steady.
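One lightweight way to keep such a record is a cron job that appends a timestamped count to a file; the following sketch assumes the default log location and a writable output path of your choosing:

.. code-block:: bash

   #!/bin/bash
   # Minimal sketch: append a timestamped nova-api request count,
   # suitable for running hourly from cron.
   LOG=/var/log/nova/nova-api.log
   OUT=/var/lib/monitoring/nova-api-trend.csv

   COUNT=$(grep -c INFO "$LOG")
   echo "$(date +%Y-%m-%dT%H:%M:%S),$COUNT" >> "$OUT"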
A tool such as **collectd** can be used to store this information. While
collectd is out of the scope of this book, a good starting point would
be to use collectd to store the result as a COUNTER data type. More
information can be found in `collectd's
documentation <https://collectd.org/wiki/index.php/Data_source>`_.
Summary
~~~~~~~
For stable operations, you want to detect failure promptly and determine
causes efficiently. With a distributed system, it's even more important
to track the right items to meet a service-level target. Learning where
these logs are located in the file system or API gives you an advantage.
This chapter also showed how to read, interpret, and manipulate
information from OpenStack services so that you can monitor effectively.
===========================
Managing Projects and Users
===========================
An OpenStack cloud does not have much value without users. This chapter
covers topics related to managing users, projects, and quotas, and it
describes users and projects as defined by version 2 of the
OpenStack Identity API.
.. warning::
While version 3 of the Identity API is available, the client tools
do not yet implement those calls, and most OpenStack clouds are
still implementing Identity API v2.0.
Projects or Tenants?
~~~~~~~~~~~~~~~~~~~~
In OpenStack user interfaces and documentation, a group of users is
referred to as a :term:`project` or :term:`tenant`.
These terms are interchangeable.
The initial implementation of OpenStack Compute had its own
authentication system and used the term ``project``. When authentication
moved into the OpenStack Identity (keystone) project, it used the term
``tenant`` to refer to a group of users. Because of this legacy, some of
the OpenStack tools refer to projects and some refer to tenants.
.. note::
This guide uses the term ``project``, unless an example shows
interaction with a tool that uses the term ``tenant``.
Managing Projects
~~~~~~~~~~~~~~~~~
Users must be associated with at least one project, though they may
belong to many. Therefore, you should add at least one project before
adding users.
Adding Projects
---------------
To create a project through the OpenStack dashboard:
#. Log in as an administrative user.
#. Select the :guilabel:`Identity` tab in the left navigation bar.
#. Under Identity tab, click :guilabel:`Projects`.
#. Click the :guilabel:`Create Project` button.
You are prompted for a project name and an optional, but recommended,
description. Select the checkbox at the bottom of the form to enable
this project. By default, it is enabled, as shown in
:ref:`figure_create_project`.
.. _figure_create_project:
.. figure:: figures/osog_0901.png
:alt: Dashboard's Create Project form
Figure Dashboard's Create Project form
It is also possible to add project members and adjust the project
quotas. We'll discuss those actions later, but in practice, it can be
quite convenient to deal with all these operations at one time.
To add a project through the command line, you must use the OpenStack
command line client.
.. code-block:: console
# openstack project create demo
This command creates a project named "demo." Optionally, you can add a
description string by appending :option:`--description tenant-description`,
which can be very useful. You can also
create a project in a disabled state by appending :option:`--disable` to the
command. By default, projects are created in an enabled state.
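For example, the following commands, shown here only as a sketch, create one project with a description and another in a disabled state:

.. code-block:: console

   # openstack project create --description "Tenant for the marketing team" marketing
   # openstack project create --disable staging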
Quotas
~~~~~~
To prevent system capacities from being exhausted without notification,
you can set up :term:`quotas <quota>`. Quotas are operational limits. For example,
the number of gigabytes allowed per tenant can be controlled to ensure that
a single tenant cannot consume all of the disk space. Quotas are
currently enforced at the tenant (or project) level, rather than the
user level.
.. warning::
Because without sensible quotas a single tenant could use up all the
available resources, default quotas are shipped with OpenStack. You
should pay attention to which quota settings make sense for your
hardware capabilities.
Using the command-line interface, you can manage quotas for the
OpenStack Compute service and the Block Storage service.
Typically, default values are changed because a tenant requires more
than the OpenStack default of 10 volumes per tenant, or more than the
OpenStack default of 1 TB of disk space on a compute node.
.. note::
To view all tenants, run:
.. code-block:: console
$ openstack project list
+---------------------------------+----------+
| ID | Name |
+---------------------------------+----------+
| a981642d22c94e159a4a6540f70f9f8 | admin |
| 934b662357674c7b9f5e4ec6ded4d0e | tenant01 |
| 7bc1dbfd7d284ec4a856ea1eb82dca8 | tenant02 |
| 9c554aaef7804ba49e1b21cbd97d218 | services |
+---------------------------------+----------+
Set Image Quotas
----------------
You can restrict a project's image storage by total number of bytes.
Currently, this quota is applied cloud-wide, so if you were to set an
Image quota limit of 5 GB, then all projects in your cloud would be able
to store only 5 GB of images and snapshots.
To enable this feature, edit the ``/etc/glance/glance-api.conf`` file,
and under the ``[DEFAULT]`` section, add:
.. code-block:: ini
user_storage_quota = <bytes>
For example, to restrict a project's image storage to 5 GB, do this:
.. code-block:: ini
user_storage_quota = 5368709120
.. note::
There is a configuration option in ``glance-api.conf`` that limits
the number of members allowed per image, called
``image_member_quota``, set to 128 by default. That setting is a
different quota from the storage quota.
Set Compute Service Quotas
--------------------------
As an administrative user, you can update the Compute service quotas for
an existing tenant, as well as update the quota defaults for a new
tenant. See :ref:`table_compute_quota`.
.. _table_compute_quota:
.. list-table:: Compute quota descriptions
:widths: 30 40 30
:header-rows: 1
* - Quota
- Description
- Property name
* - Fixed IPs
- Number of fixed IP addresses allowed per tenant.
This number must be equal to or greater than the number
of allowed instances.
- fixed-ips
* - Floating IPs
- Number of floating IP addresses allowed per tenant.
- floating-ips
* - Injected file content bytes
- Number of content bytes allowed per injected file.
- injected-file-content-bytes
* - Injected file path bytes
- Number of bytes allowed per injected file path.
- injected-file-path-bytes
* - Injected files
- Number of injected files allowed per tenant.
- injected-files
* - Instances
- Number of instances allowed per tenant.
- instances
* - Key pairs
- Number of key pairs allowed per user.
- key-pairs
* - Metadata items
- Number of metadata items allowed per instance.
- metadata-items
* - RAM
- Megabytes of instance RAM allowed per tenant.
- ram
* - Security group rules
- Number of rules per security group.
- security-group-rules
* - Security groups
- Number of security groups per tenant.
- security-groups
* - VCPUs
- Number of instance cores allowed per tenant.
- cores
View and update compute quotas for a tenant (project)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As an administrative user, you can use the :command:`nova quota-*`
commands, which are provided by the
``python-novaclient`` package, to view and update tenant quotas.
**To view and update default quota values**
#. List all default quotas for all tenants, as follows:
.. code-block:: console
$ nova quota-defaults
For example:
.. code-block:: console
$ nova quota-defaults
+-----------------------------+-------+
| Property | Value |
+-----------------------------+-------+
| metadata_items | 128 |
| injected_file_content_bytes | 10240 |
| ram | 51200 |
| floating_ips | 10 |
| key_pairs | 100 |
| instances | 10 |
| security_group_rules | 20 |
| injected_files | 5 |
| cores | 20 |
| fixed_ips | -1 |
| injected_file_path_bytes | 255 |
| security_groups | 10 |
+-----------------------------+-------+
#. Update a default value for a new tenant, as follows:
.. code-block:: console
$ nova quota-class-update default key value
For example:
.. code-block:: console
$ nova quota-class-update default --instances 15
**To view quota values for a tenant (project)**
#. Place the tenant ID in a variable:
.. code-block:: console
$ tenant=$(openstack project list | awk '/tenantName/ {print $2}')
#. List the currently set quota values for a tenant, as follows:
.. code-block:: console
$ nova quota-show --tenant $tenant
For example:
.. code-block:: console
$ nova quota-show --tenant $tenant
+-----------------------------+-------+
| Property | Value |
+-----------------------------+-------+
| metadata_items | 128 |
| injected_file_content_bytes | 10240 |
| ram | 51200 |
| floating_ips | 12 |
| key_pairs | 100 |
| instances | 10 |
| security_group_rules | 20 |
| injected_files | 5 |
| cores | 20 |
| fixed_ips | -1 |
| injected_file_path_bytes | 255 |
| security_groups | 10 |
+-----------------------------+-------+
**To update quota values for a tenant (project)**
#. Obtain the tenant ID, as follows:
.. code-block:: console
$ tenant=$(openstack project list | awk '/tenantName/ {print $2}')
#. Update a particular quota value, as follows:
.. code-block:: console
# nova quota-update --quotaName quotaValue tenantID
For example:
.. code-block:: console
# nova quota-update --floating-ips 20 $tenant
# nova quota-show --tenant $tenant
+-----------------------------+-------+
| Property | Value |
+-----------------------------+-------+
| metadata_items | 128 |
| injected_file_content_bytes | 10240 |
| ram | 51200 |
| floating_ips | 20 |
| key_pairs | 100 |
| instances | 10 |
| security_group_rules | 20 |
| injected_files | 5 |
| cores | 20 |
| fixed_ips | -1 |
| injected_file_path_bytes | 255 |
| security_groups | 10 |
+-----------------------------+-------+
.. note::
To view a list of options for the ``quota-update`` command, run:
.. code-block:: console
$ nova help quota-update
Set Object Storage Quotas
-------------------------
There are currently two categories of quotas for Object Storage:
Container quotas
Limit the total size (in bytes) or number of objects that can be
stored in a single container.
Account quotas
Limit the total size (in bytes) that a user has available in the
Object Storage service.
To take advantage of either container quotas or account quotas, your
Object Storage proxy server must have ``container_quotas`` or
``account_quotas`` (or both) added to the ``[pipeline:main]`` pipeline.
Each quota type also requires its own section in the
``proxy-server.conf`` file:
.. code-block:: ini
[pipeline:main]
pipeline = catch_errors [...] slo dlo account_quotas proxy-server
[filter:account_quotas]
use = egg:swift#account_quotas
[filter:container_quotas]
use = egg:swift#container_quotas
To view and update Object Storage quotas, use the :command:`swift` command
provided by the ``python-swiftclient`` package. Any user included in the
project can view the quotas placed on their project. To update Object
Storage quotas on a project, you must have the role of ResellerAdmin in
the project that the quota is being applied to.
To view account quotas placed on a project:
.. code-block:: console
$ swift stat
Account: AUTH_b36ed2d326034beba0a9dd1fb19b70f9
Containers: 0
Objects: 0
Bytes: 0
Meta Quota-Bytes: 214748364800
X-Timestamp: 1351050521.29419
Content-Type: text/plain; charset=utf-8
Accept-Ranges: bytes
To apply or update account quotas on a project:
.. code-block:: console
$ swift post -m quota-bytes:<bytes>
For example, to place a 5 GB quota on an account:
.. code-block:: console
$ swift post -m quota-bytes:5368709120
To verify the quota, run the :command:`swift stat` command again:
.. code-block:: console
$ swift stat
Account: AUTH_b36ed2d326034beba0a9dd1fb19b70f9
Containers: 0
Objects: 0
Bytes: 0
Meta Quota-Bytes: 5368709120
X-Timestamp: 1351541410.38328
Content-Type: text/plain; charset=utf-8
Accept-Ranges: bytes
Set Block Storage Quotas
------------------------
As an administrative user, you can update the Block Storage service
quotas for a tenant, as well as update the quota defaults for a new
tenant. See :ref:`table_block_storage_quota`.
.. _table_block_storage_quota:
.. list-table:: Table: Block Storage quota descriptions
:widths: 50 50
:header-rows: 1
* - Property name
- Description
* - gigabytes
- Number of volume gigabytes allowed per tenant.
* - snapshots
- Number of Block Storage snapshots allowed per tenant.
* - volumes
- Number of Block Storage volumes allowed per tenant.
View and update Block Storage quotas for a tenant (project)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As an administrative user, you can use the :command:`cinder quota-*`
commands, which are provided by the
``python-cinderclient`` package, to view and update tenant quotas.
**To view and update default Block Storage quota values**
#. List all default quotas for all tenants, as follows:
.. code-block:: console
$ cinder quota-defaults
For example:
.. code-block:: console
$ cinder quota-defaults
+-----------+-------+
| Property | Value |
+-----------+-------+
| gigabytes | 1000 |
| snapshots | 10 |
| volumes | 10 |
+-----------+-------+
#. To update a default value for a new tenant, update the property in the
``/etc/cinder/cinder.conf`` file; a quick way to check the relevant
options is shown below.
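As a quick sketch, you can check whether these defaults are set explicitly in the file (the ``quota_*`` option names shown are the usual Block Storage default settings; commented-out options will not appear in the output):

.. code-block:: console

   # grep -E '^quota_(volumes|snapshots|gigabytes)' /etc/cinder/cinder.conf
   quota_volumes = 10
   quota_snapshots = 10
   quota_gigabytes = 1000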
**To view Block Storage quotas for a tenant (project)**
#. View quotas for the tenant, as follows:
.. code-block:: console
# cinder quota-show tenantName
For example:
.. code-block:: console
# cinder quota-show tenant01
+-----------+-------+
| Property | Value |
+-----------+-------+
| gigabytes | 1000 |
| snapshots | 10 |
| volumes | 10 |
+-----------+-------+
**To update Block Storage quotas for a tenant (project)**
#. Place the tenant ID in a variable:
.. code-block:: console
$ tenant=$(openstack project list | awk '/tenantName/ {print $2}')
#. Update a particular quota value, as follows:
.. code-block:: console
# cinder quota-update --quotaName NewValue tenantID
For example:
.. code-block:: console
# cinder quota-update --volumes 15 $tenant
# cinder quota-show tenant01
+-----------+-------+
| Property | Value |
+-----------+-------+
| gigabytes | 1000 |
| snapshots | 10 |
| volumes | 15 |
+-----------+-------+
User Management
~~~~~~~~~~~~~~~
The command-line tools for managing users are inconvenient to use
directly. They require issuing multiple commands to complete a single
task, and they use UUIDs rather than symbolic names for many items. In
practice, humans typically do not use these tools directly. Fortunately,
the OpenStack dashboard provides a reasonable interface to this. In
addition, many sites write custom tools for local needs to enforce local
policies and provide levels of self-service to users that aren't
currently available with packaged tools.
Creating New Users
~~~~~~~~~~~~~~~~~~
To create a user, you need the following information:
* Username
* Email address
* Password
* Primary project
* Role
* Enabled
Username and email address are self-explanatory, though your site may
have local conventions you should observe. The primary project is simply
the first project the user is associated with and must exist prior to
creating the user. Role is almost always going to be "member." Out of
the box, OpenStack comes with two roles defined:
member
A typical user
admin
An administrative super user, which has full permissions across all
projects and should be used with great care
It is possible to define other roles, but doing so is uncommon.
Once you've gathered this information, creating the user in the
dashboard is just another web form similar to what we've seen before and
can be found by clicking the Users link in the Identity navigation bar
and then clicking the Create User button at the top right.
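The same result can be achieved from the command line; the following is a sketch in which the user name, password, and e-mail address are placeholders, and where the member role may be named ``_member_`` or ``Member`` depending on your deployment:

.. code-block:: console

   $ openstack user create --project demo --password examplePassword \
     --email alice@example.com alice
   $ openstack role add --project demo --user alice _member_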
Modifying users is also done from this Users page. If you have a large
number of users, this page can get quite crowded. The Filter search box
at the top of the page can be used to limit the users listing. A form
very similar to the user creation dialog can be pulled up by selecting
Edit from the actions dropdown menu at the end of the line for the user
you are modifying.
Associating Users with Projects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Many sites run with users being associated with only one project. This
is a more conservative and simpler choice both for administration and
for users. Administratively, if a user reports a problem with an
instance or quota, it is obvious which project this relates to. Users
needn't worry about what project they are acting in if they are only in
one project. However, note that, by default, any user can affect the
resources of any other user within their project. It is also possible to
associate users with multiple projects if that makes sense for your
organization.
Associating existing users with an additional project or removing them
from an older project is done from the Projects page of the dashboard by
selecting Modify Users from the Actions column, as shown in
:ref:`figure_edit_project_members`.
From this view, you can do a number of useful things, as well as a few
dangerous ones.
The first column of this form, named All Users, includes a list of all
the users in your cloud who are not already associated with this
project. The second column shows all the users who are. These lists can
be quite long, but they can be limited by typing a substring of the
username you are looking for in the filter field at the top of the
column.
From here, click the :guilabel:`+` icon to add users to the project.
Click the :guilabel:`-` to remove them.
.. _figure_edit_project_members:
.. figure:: figures/osog_0902.png
:alt: Edit Project Members tab
Edit Project Members tab
The dangerous possibility comes with the ability to change member roles.
This is the dropdown list below the username in the
:guilabel:`Project Members` list. In virtually all cases,
this value should be set to Member. This example purposefully shows
an administrative user where this value is admin.
.. warning::
The admin is global, not per project, so granting a user the admin
role in any project gives the user administrative rights across the
whole cloud.
Typical use is to only create administrative users in a single project,
by convention the admin project, which is created by default during
cloud setup. If your administrative users also use the cloud to launch
and manage instances, it is strongly recommended that you use separate
user accounts for administrative access and normal operations and that
they be in distinct projects.
Customizing Authorization
-------------------------
The default :term:`authorization` settings allow administrative users
only to create resources on behalf of a different project.
OpenStack handles two kinds of authorization policies:
Operation based
Policies specify access criteria for specific operations, possibly
with fine-grained control over specific attributes.
Resource based
Whether access to a specific resource might be granted or not
according to the permissions configured for the resource (currently
available only for the network resource). The actual authorization
policies enforced in an OpenStack service vary from deployment to
deployment.
The policy engine reads entries from the ``policy.json`` file. The
actual location of this file might vary from distribution to
distribution: for nova, it is typically in ``/etc/nova/policy.json``.
You can update entries while the system is running, and you do not have
to restart services. Currently, the only way to update such policies is
to edit the policy file.
The OpenStack service's policy engine matches a policy directly. A rule
indicates evaluation of the elements of such policies. For instance, in
a ``compute:create: [["rule:admin_or_owner"]]`` statement, the policy is
``compute:create``, and the rule is ``admin_or_owner``.
Policies are triggered by an OpenStack policy engine whenever one of
them matches an OpenStack API operation or a specific attribute being
used in a given operation. For instance, the engine tests the
``compute:create`` policy every time a user sends a
``POST /v2/{tenant_id}/servers`` request to the OpenStack Compute API
server. Policies can be also related to specific :term:`API extensions
<API extension>`. For instance, if a user needs an extension like
``compute_extension:rescue``, the attributes defined by the provider
extensions trigger the rule test for that operation.
An authorization policy can be composed of one or more rules. If multiple
rules are specified, evaluation of the policy is successful if any of the
rules evaluates successfully; if an API operation matches multiple policies,
then all of those policies must evaluate successfully. Authorization
rules are also recursive: once a rule is matched, it can resolve to
another rule, until a terminal rule is reached. The following rules are
defined:
Role-based rules
Evaluate successfully if the user submitting the request has the
specified role. For instance, ``"role:admin"`` is successful if the
user submitting the request is an administrator.
Field-based rules
Evaluate successfully if a field of the resource specified in the
current request matches a specific value. For instance,
``"field:networks:shared=True"`` is successful if the attribute
shared of the network resource is set to ``true``.
Generic rules
Compare an attribute in the resource with an attribute extracted
from the user's security credentials and evaluates successfully if
the comparison is successful. For instance,
``"tenant_id:%(tenant_id)s"`` is successful if the tenant identifier
in the resource is equal to the tenant identifier of the user
submitting the request.
Here are snippets of the default nova ``policy.json`` file:
.. code-block:: json
{
"context_is_admin": [["role:admin"]],
"admin_or_owner": [["is_admin:True"], ["project_id:%(project_id)s"]], ~~~~(1)~~~~
"default": [["rule:admin_or_owner"]], ~~~~(2)~~~~
"compute:create": [ ],
"compute:create:attach_network": [ ],
"compute:create:attach_volume": [ ],
"compute:get_all": [ ],
"admin_api": [["is_admin:True"]],
"compute_extension:accounts": [["rule:admin_api"]],
"compute_extension:admin_actions": [["rule:admin_api"]],
"compute_extension:admin_actions:pause": [["rule:admin_or_owner"]],
"compute_extension:admin_actions:unpause": [["rule:admin_or_owner"]],
...
"compute_extension:admin_actions:migrate": [["rule:admin_api"]],
"compute_extension:aggregates": [["rule:admin_api"]],
"compute_extension:certificates": [ ],
...
"compute_extension:flavorextraspecs": [ ],
"compute_extension:flavormanage": [["rule:admin_api"]], ~~~~(3)~~~~
}
1. Shows a rule that evaluates successfully if the current user is an
administrator or the owner of the resource specified in the request
(tenant identifier is equal).
2. Shows the default policy, which is always evaluated if an API
operation does not match any of the policies in ``policy.json``.
3. Shows a policy restricting the ability to manipulate flavors to
administrators using the Admin API only.
In some cases, some operations should be restricted to administrators
only. Therefore, as a further example, let us consider how this sample
policy file could be modified in a scenario where we enable users to
create their own flavors:
.. code-block:: console
"compute_extension:flavormanage": [ ],
Users Who Disrupt Other Users
-----------------------------
Users on your cloud can disrupt other users, sometimes intentionally and
maliciously and other times by accident. Understanding the situation
allows you to make a better decision on how to handle the
disruption.
For example, a group of users have instances that are utilizing a large
amount of compute resources for very compute-intensive tasks. This is
driving the load up on compute nodes and affecting other users. In this
situation, review your user use cases. You may find that high compute
scenarios are common, and should then plan for proper segregation in
your cloud, such as host aggregation or regions.
Another example is a user consuming a very large amount of bandwidth.
Again, the key is to understand what the user is doing. If she naturally
needs a high amount of bandwidth, you might have to limit her
transmission rate so as not to affect other users, or move her to an
area with more bandwidth available.
On the other hand, maybe her instance has been hacked and is part of a
botnet launching DDoS attacks. Resolution of this issue is the same as
for any other server on your network that has been hacked. Contact the
user and give her time to respond. If she doesn't respond, shut down the
instance.
A final example is if a user is hammering cloud resources repeatedly.
Contact the user and learn what he is trying to do. Maybe he doesn't
understand that what he's doing is inappropriate, or maybe there is an
issue with the resource he is trying to access that is causing his
requests to queue or lag.
Summary
~~~~~~~
One key element of systems administration that is often overlooked is
that end users are the reason systems administrators exist. Don't go the
BOFH route and terminate every user who causes an alert to go off. Work
with users to understand what they're trying to accomplish and see how
your environment can better assist them in achieving their goals. Meet
your users' needs by organizing them into projects, applying
policies, managing quotas, and working with them.
========
Upgrades
========
With the exception of Object Storage, upgrading from one version of
OpenStack to another can take a great deal of effort. This chapter
provides some guidance on the operational aspects that you should
consider for performing an upgrade for a basic architecture.
Pre-upgrade considerations
~~~~~~~~~~~~~~~~~~~~~~~~~~
Upgrade planning
----------------
- Thoroughly review the `release
notes <http://wiki.openstack.org/wiki/ReleaseNotes/>`_ to learn
about new, updated, and deprecated features. Find incompatibilities
between versions.
- Consider the impact of an upgrade to users. The upgrade process
interrupts management of your environment including the dashboard. If
you properly prepare for the upgrade, existing instances, networking,
and storage should continue to operate. However, instances might
experience intermittent network interruptions.
- Consider the approach to upgrading your environment. You can perform
an upgrade with operational instances, but this is a dangerous
approach. You might consider using live migration to temporarily
relocate instances to other compute nodes while performing upgrades.
However, you must ensure database consistency throughout the process;
otherwise your environment might become unstable. Also, don't forget
to provide sufficient notice to your users, including giving them
plenty of time to perform their own backups.
- Consider adopting structure and options from the service
configuration files and merging them with existing configuration
files. The `OpenStack Configuration
Reference <http://docs.openstack.org/liberty/config-reference/content/>`_
contains new, updated, and deprecated options for most services.
- Like all major system upgrades, your upgrade could fail for one or
more reasons. You should prepare for this situation by having the
ability to roll back your environment to the previous release,
including databases, configuration files, and packages. We provide an
example process for rolling back your environment in
:ref:`rolling_back_a_failed_upgrade`.
- Develop an upgrade procedure and assess it thoroughly by using a test
environment similar to your production environment.
Pre-upgrade testing environment
-------------------------------
The most important step is the pre-upgrade testing. If you are upgrading
immediately after release of a new version, undiscovered bugs might
hinder your progress. Some deployers prefer to wait until the first
point release is announced. However, if you have a significant
deployment, you might follow the development and testing of the release
to ensure that bugs for your use cases are fixed.
Each OpenStack cloud is different even if you have a near-identical
architecture as described in this guide. As a result, you must still
test upgrades between versions in your environment using an approximate
clone of your environment.
However, that is not to say that it needs to be the same size or use
identical hardware as the production environment. It is important to
consider the hardware and scale of the cloud that you are upgrading. The
following tips can help you minimize the cost:
Use your own cloud
The simplest place to start testing the next version of OpenStack is
by setting up a new environment inside your own cloud. This might
seem odd, especially the double virtualization used in running
compute nodes. But it is a sure way to very quickly test your
configuration.
Use a public cloud
Consider using a public cloud to test the scalability limits of your
cloud controller configuration. Most public clouds bill by the hour,
which means it can be inexpensive to perform even a test with many
nodes.
Make another storage endpoint on the same system
If you use an external storage plug-in or shared file system with
your cloud, you can test whether it works by creating a second share
or endpoint. This allows you to test the system before entrusting
the new version on to your storage.
Watch the network
Even at smaller-scale testing, look for excess network packets to
determine whether something is going horribly wrong in
inter-component communication.
To set up the test environment, you can use one of several methods:
- Do a full manual install by using the `OpenStack Installation
Guide <http://docs.openstack.org/index.html#install-guides>`_ for
your platform. Review the final configuration files and installed
packages.
- Create a clone of your automated configuration infrastructure with
changed package repository URLs.
Alter the configuration until it works.
Either approach is valid. Use the approach that matches your experience.
An upgrade pre-testing system is excellent for getting the configuration
to work. However, it is important to note that the historical use of the
system and differences in user interaction can affect the success of
upgrades.
If possible, we highly recommend that you dump your production database
tables and test the upgrade in your development environment using this
data. Several MySQL bugs have been uncovered during database migrations
because of slight table differences between a fresh installation and
tables that migrated from one version to another. This can have an
impact on large, real datasets, which you do not want to discover
during a production outage.
Artificial scale testing can go only so far. After your cloud is
upgraded, you must pay careful attention to the performance aspects of
your cloud.
Upgrade Levels
--------------
Upgrade levels are a feature added to OpenStack Compute since the
Grizzly release to provide version locking on the RPC (Message Queue)
communications between the various Compute services.
This functionality is an important piece of the puzzle when it comes to
live upgrades and is conceptually similar to the existing API versioning
that allows OpenStack services of different versions to communicate
without issue.
Without upgrade levels, an X+1 version Compute service can receive and
understand X version RPC messages, but it can only send out X+1 version
RPC messages. For example, if a nova-conductor process has been upgraded
to X+1 version, then the conductor service will be able to understand
messages from X version nova-compute processes, but those compute
services will not be able to understand messages sent by the conductor
service.
During an upgrade, operators can add configuration options to
``nova.conf`` which lock the version of RPC messages and allow live
upgrading of the services without interruption caused by version
mismatch. The configuration options allow the specification of RPC
version numbers if desired, but release name aliases are also supported.
For example:
.. code-block:: ini
[upgrade_levels]
compute=X+1
conductor=X+1
scheduler=X+1
will keep the RPC version locked across the specified services to the
RPC version used in X+1. As all instances of a particular service are
upgraded to the newer version, the corresponding line can be removed
from ``nova.conf``.
Using this functionality, ideally one would lock the RPC version to the
OpenStack version being upgraded from on nova-compute nodes, to ensure
that, for example X+1 version nova-compute processes will continue to
work with X version nova-conductor processes while the upgrade
completes. Once the upgrade of nova-compute processes is complete, the
operator can move onto upgrading nova-conductor and remove the version
locking for nova-compute in ``nova.conf``.
General upgrade process
~~~~~~~~~~~~~~~~~~~~~~~
This section describes the process to upgrade a basic OpenStack
deployment based on the basic two-node architecture in the `OpenStack
Installation
Guide <http://docs.openstack.org/index.html#install-guides>`_. All
nodes must run a supported distribution of Linux with a recent kernel
and the current release packages.
Service specific upgrade instructions
-------------------------------------
* `Upgrading the Networking Service <http://docs.openstack.org/developer/neutron/devref/upgrade.html>`_
Prerequisites
-------------
- Perform some cleaning of the environment prior to starting the
upgrade process to ensure a consistent state. For example, instances
not fully purged from the system after deletion might cause
indeterminate behavior.
- For environments using the OpenStack Networking service (neutron),
verify the release version of the database. For example:
.. code-block:: console
# su -s /bin/sh -c "neutron-db-manage --config-file /etc/neutron/neutron.conf \
--config-file /etc/neutron/plugins/ml2/ml2_conf.ini current" neutron
Perform a backup
----------------
#. Save the configuration files on all nodes. For example:
.. code-block:: console
# for i in keystone glance nova neutron openstack-dashboard cinder heat ceilometer; \
do mkdir $i-kilo; \
done
# for i in keystone glance nova neutron openstack-dashboard cinder heat ceilometer; \
do cp -r /etc/$i/* $i-kilo/; \
done
.. note::
You can modify this example script on each node to handle different
services.
#. Make a full database backup of your production data. As of Kilo,
database downgrades are not supported, and the only method available to
get back to a prior database version will be to restore from backup.
.. code-block:: console
# mysqldump -u root -p --opt --add-drop-database --all-databases > icehouse-db-backup.sql
.. note::
Consider updating your SQL server configuration as described in the
`OpenStack Installation
Guide <http://docs.openstack.org/index.html#install-guides>`_.
Manage repositories
-------------------
On all nodes, perform the following steps (an Ubuntu example follows the list):
#. Remove the repository for the previous release packages.
#. Add the repository for the new release packages.
#. Update the repository database.
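On Ubuntu with the Ubuntu Cloud Archive, these steps might look roughly like the following sketch (the file name and release names are assumptions for an example Kilo-to-Liberty upgrade):

.. code-block:: console

   # rm /etc/apt/sources.list.d/cloudarchive-kilo.list
   # add-apt-repository cloud-archive:liberty
   # apt-get update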
Upgrade packages on each node
-----------------------------
Depending on your specific configuration, upgrading all packages might
restart or break services supplemental to your OpenStack environment.
For example, if you use the TGT iSCSI framework for Block Storage
volumes and the upgrade includes new packages for it, the package
manager might restart the TGT iSCSI services and impact connectivity to
volumes.
If the package manager prompts you to update configuration files, reject
the changes. The package manager appends a suffix to newer versions of
configuration files. Consider reviewing and adopting content from these
files.
.. note::
You may need to explicitly install the ``ipset`` package if your
distribution does not install it as a dependency.
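On Ubuntu, the package upgrade itself is typically a single step, shown here as a sketch; RPM-based distributions use the equivalent ``yum`` or ``zypper`` commands:

.. code-block:: console

   # apt-get update
   # apt-get dist-upgrade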
Update services
---------------
To update a service on each node, you generally modify one or more
configuration files, stop the service, synchronize the database schema,
and start the service. Some services require different steps. We
recommend verifying operation of each service before proceeding to the
next service.
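As a sketch of this pattern, an update of the Identity service on Ubuntu might look like the following; service names, init commands, and the need to flush expired tokens first (see the controller-node list below) vary by service and distribution:

.. code-block:: console

   # service keystone stop
   # su -s /bin/sh -c "keystone-manage token_flush" keystone
   # su -s /bin/sh -c "keystone-manage db_sync" keystone
   # service keystone start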
The order in which you should upgrade services, and any changes from the
general upgrade process, are described below:
**Controller node**
#. OpenStack Identity - Clear any expired tokens before synchronizing
the database.
#. OpenStack Image service
#. OpenStack Compute, including networking components.
#. OpenStack Networking
#. OpenStack Block Storage
#. OpenStack dashboard - In typical environments, updating the
dashboard only requires restarting the Apache HTTP service.
#. OpenStack Orchestration
#. OpenStack Telemetry - In typical environments, updating the
Telemetry service only requires restarting the service.
#. OpenStack Compute - Edit the configuration file and restart the
service.
#. OpenStack Networking - Edit the configuration file and restart the
service.
**Compute nodes**
- OpenStack Block Storage - Updating the Block Storage service only
requires restarting the service.
**Storage nodes**
- OpenStack Networking - Edit the configuration file and restart the
service.
Final steps
-----------
On all distributions, you must perform some final tasks to complete the
upgrade process.
#. Decrease DHCP timeouts by modifying ``/etc/nova/nova.conf`` on the
compute nodes back to the original value for your environment.
#. Update all ``.ini`` files to match passwords and pipelines as required
for the OpenStack release in your environment.
#. After migration, users see different results from
:command:`nova image-list` and :command:`glance image-list`. To ensure
users see the same images in the list
commands, edit the ``/etc/glance/policy.json`` and
``/etc/nova/policy.json`` files to contain
``"context_is_admin": "role:admin"``, which limits access to private
images for projects.
#. Verify proper operation of your environment. Then, notify your users
that their cloud is operating normally again.
.. _rolling_back_a_failed_upgrade:
Rolling back a failed upgrade
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Upgrades involve complex operations and can fail. Before attempting any
upgrade, you should make a full database backup of your production data.
As of Kilo, database downgrades are not supported, and the only method
available to get back to a prior database version will be to restore
from backup.
This section provides guidance for rolling back to a previous release of
OpenStack. All distributions follow a similar procedure.
A common scenario is this: you take down the production management
services in preparation for an upgrade, complete part of the upgrade
process, and then discover one or more problems that were not
encountered during testing. As a consequence, you must roll back your
environment to the original "known good" state. Make sure that you did
not make any state changes after attempting the upgrade process: no new
instances, networks, or storage volumes should have been created. Any
such new resources will be in a frozen state after the databases are
restored from backup.
Within this scope, you must complete these steps to successfully roll
back your environment:
#. Roll back configuration files.
#. Restore databases from backup.
#. Roll back packages.
You should verify that you have the requisite backups to restore.
Rolling back upgrades is a tricky process because distributions tend to
put much more effort into testing upgrades than downgrades. Broken
downgrades take significantly more effort to troubleshoot and resolve
than broken upgrades. Only you can weigh the risks of trying to push a
failed upgrade forward versus rolling it back. Generally, consider
rolling back as the very last option.
The following steps described for Ubuntu have worked on at least one
production environment, but they might not work for all environments.
**To perform the rollback**
#. Stop all OpenStack services.
#. Copy contents of configuration backup directories that you created
during the upgrade process back to ``/etc/<service>`` directory.
#. Restore databases from the ``RELEASE_NAME-db-backup.sql`` backup file
that you created with the :command:`mysqldump` command during the upgrade
process:
.. code-block:: console
# mysql -u root -p < RELEASE_NAME-db-backup.sql
#. Downgrade OpenStack packages.
.. warning::
Downgrading packages is by far the most complicated step; it is
highly dependent on the distribution and the overall administration
of the system.
#. Determine which OpenStack packages are installed on your system. Use the
:command:`dpkg --get-selections` command. Filter for OpenStack
packages, filter again to omit packages explicitly marked in the
``deinstall`` state, and save the final output to a file. For example,
the following command covers a controller node with keystone, glance,
nova, neutron, and cinder:
.. code-block:: console
# dpkg --get-selections | grep -e keystone -e glance -e nova -e neutron \
-e cinder | grep -v deinstall | tee openstack-selections
cinder-api install
cinder-common install
cinder-scheduler install
cinder-volume install
glance install
glance-api install
glance-common install
glance-registry install
neutron-common install
neutron-dhcp-agent install
neutron-l3-agent install
neutron-lbaas-agent install
neutron-metadata-agent install
neutron-plugin-openvswitch install
neutron-plugin-openvswitch-agent install
neutron-server install
nova-api install
nova-cert install
nova-common install
nova-conductor install
nova-consoleauth install
nova-novncproxy install
nova-objectstore install
nova-scheduler install
python-cinder install
python-cinderclient install
python-glance install
python-glanceclient install
python-keystone install
python-keystoneclient install
python-neutron install
python-neutronclient install
python-nova install
python-novaclient install
.. note::
Depending on the type of server, the contents and order of your
package list might vary from this example.
#. You can determine the package versions available for reversion by using
the ``apt-cache policy`` command. If you removed the Grizzly
repositories, you must first reinstall them and run ``apt-get update``:
.. code-block:: console
# apt-cache policy nova-common
nova-common:
Installed: 1:2013.2-0ubuntu1~cloud0
Candidate: 1:2013.2-0ubuntu1~cloud0
Version table:
*** 1:2013.2-0ubuntu1~cloud0 0
500 http://ubuntu-cloud.archive.canonical.com/ubuntu/
precise-updates/havana/main amd64 Packages
100 /var/lib/dpkg/status
1:2013.1.4-0ubuntu1~cloud0 0
500 http://ubuntu-cloud.archive.canonical.com/ubuntu/
precise-updates/grizzly/main amd64 Packages
2012.1.3+stable-20130423-e52e6912-0ubuntu1.2 0
500 http://us.archive.ubuntu.com/ubuntu/
precise-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu/
precise-security/main amd64 Packages
2012.1-0ubuntu2 0
500 http://us.archive.ubuntu.com/ubuntu/
precise/main amd64 Packages
This tells us the currently installed version of the package, the
newest candidate version, and all available versions, along with the
repository that contains each one. Look for the appropriate Grizzly
version, ``1:2013.1.4-0ubuntu1~cloud0`` in this case. The process of
manually picking through this list of packages is rather tedious and
prone to errors. You should consider using the following script to help
with this process:
.. code-block:: console
# for i in `cut -f 1 openstack-selections | sed 's/neutron/quantum/;'`;
do echo -n $i ;apt-cache policy $i | grep -B 1 grizzly |
grep -v Packages | awk '{print "="$1}';done | tr '\n' ' ' |
tee openstack-grizzly-versions
cinder-api=1:2013.1.4-0ubuntu1~cloud0
cinder-common=1:2013.1.4-0ubuntu1~cloud0
cinder-scheduler=1:2013.1.4-0ubuntu1~cloud0
cinder-volume=1:2013.1.4-0ubuntu1~cloud0
glance=1:2013.1.4-0ubuntu1~cloud0
glance-api=1:2013.1.4-0ubuntu1~cloud0
glance-common=1:2013.1.4-0ubuntu1~cloud0
glance-registry=1:2013.1.4-0ubuntu1~cloud0
quantum-common=1:2013.1.4-0ubuntu1~cloud0
quantum-dhcp-agent=1:2013.1.4-0ubuntu1~cloud0
quantum-l3-agent=1:2013.1.4-0ubuntu1~cloud0
quantum-lbaas-agent=1:2013.1.4-0ubuntu1~cloud0
quantum-metadata-agent=1:2013.1.4-0ubuntu1~cloud0
quantum-plugin-openvswitch=1:2013.1.4-0ubuntu1~cloud0
quantum-plugin-openvswitch-agent=1:2013.1.4-0ubuntu1~cloud0
quantum-server=1:2013.1.4-0ubuntu1~cloud0
nova-api=1:2013.1.4-0ubuntu1~cloud0
nova-cert=1:2013.1.4-0ubuntu1~cloud0
nova-common=1:2013.1.4-0ubuntu1~cloud0
nova-conductor=1:2013.1.4-0ubuntu1~cloud0
nova-consoleauth=1:2013.1.4-0ubuntu1~cloud0
nova-novncproxy=1:2013.1.4-0ubuntu1~cloud0
nova-objectstore=1:2013.1.4-0ubuntu1~cloud0
nova-scheduler=1:2013.1.4-0ubuntu1~cloud0
python-cinder=1:2013.1.4-0ubuntu1~cloud0
python-cinderclient=1:1.0.3-0ubuntu1~cloud0
python-glance=1:2013.1.4-0ubuntu1~cloud0
python-glanceclient=1:0.9.0-0ubuntu1.2~cloud0
python-quantum=1:2013.1.4-0ubuntu1~cloud0
python-quantumclient=1:2.2.0-0ubuntu1~cloud0
python-nova=1:2013.1.4-0ubuntu1~cloud0
python-novaclient=1:2.13.0-0ubuntu1~cloud0
.. note::
If you decide to continue this step manually, don't forget to change
``neutron`` to ``quantum`` where applicable.
#. Use the :command:`apt-get install` command to install specific versions of each
package by specifying ``<package-name>=<version>``. The script in the
previous step conveniently created a list of ``package=version`` pairs
for you:
.. code-block:: console
# apt-get install `cat openstack-grizzly-versions`
This step completes the rollback procedure. You should remove the
upgrade release repository and run :command:`apt-get update` to prevent
accidental upgrades until you solve whatever issue caused you to roll
back your environment.
==================
Upstream OpenStack
==================
OpenStack is founded on a thriving community that is a source of help
and welcomes your contributions. This chapter details some of the ways
you can interact with the others involved.
Getting Help
~~~~~~~~~~~~
There are several avenues available for seeking assistance. The quickest
way is to help the community help you. Search the Q&A sites, mailing
list archives, and bug lists for issues similar to yours. If you can't
find anything, follow the directions for reporting bugs or use one of
the channels for support, which are listed below.
Your first port of call should be the official OpenStack documentation,
found on http://docs.openstack.org. You can get questions answered on
http://ask.openstack.org.
`Mailing lists <https://wiki.openstack.org/wiki/Mailing_Lists>`_ are
also a great place to get help. The wiki page has more information about
the various lists. As an operator, the main lists you should be aware of
are:
`General list <http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack>`_
*openstack@lists.openstack.org*. The scope of this list is the
current state of OpenStack. This is a very high-traffic mailing
list, with many, many emails per day.
`Operators list <http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators>`_
*openstack-operators@lists.openstack.org.* This list is intended for
discussion among existing OpenStack cloud operators, such as
yourself. Currently, this list is relatively low traffic, on the
order of one email a day.
`Development list <http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev>`_
*openstack-dev@lists.openstack.org*. The scope of this list is the
future state of OpenStack. This is a high-traffic mailing list, with
multiple emails per day.
We recommend that you subscribe to the general list and the operator
list, although you must set up filters to manage the volume for the
general list. You'll also find links to the mailing list archives on the
mailing list wiki page, where you can search through the discussions.
`Multiple IRC channels <https://wiki.openstack.org/wiki/IRC>`_ are
available for general questions and developer discussions. The general
discussion channel is #openstack on *irc.freenode.net*.
Reporting Bugs
~~~~~~~~~~~~~~
As an operator, you are in a very good position to report unexpected
behavior with your cloud. Since OpenStack is flexible, you may be the
only individual to report a particular issue. Every issue is important
to fix, so it is essential to learn how to easily submit a bug
report.
All OpenStack projects use `Launchpad <https://launchpad.net/>`_
for bug tracking. You'll need to create an account on Launchpad before you
can submit a bug report.
Once you have a Launchpad account, reporting a bug is as simple as
identifying the project or projects that are causing the issue.
Sometimes this is more difficult than expected, but those working on the
bug triage are happy to help relocate issues if they are not in the
right place initially:
- Report a bug in
`nova <https://bugs.launchpad.net/nova/+filebug/+login>`_.
- Report a bug in
`python-novaclient <https://bugs.launchpad.net/python-novaclient/+filebug/+login>`_.
- Report a bug in
`swift <https://bugs.launchpad.net/swift/+filebug/+login>`_.
- Report a bug in
`python-swiftclient <https://bugs.launchpad.net/python-swiftclient/+filebug/+login>`_.
- Report a bug in
`glance <https://bugs.launchpad.net/glance/+filebug/+login>`_.
- Report a bug in
`python-glanceclient <https://bugs.launchpad.net/python-glanceclient/+filebug/+login>`_.
- Report a bug in
`keystone <https://bugs.launchpad.net/keystone/+filebug/+login>`_.
- Report a bug in
`python-keystoneclient <https://bugs.launchpad.net/python-keystoneclient/+filebug/+login>`_.
- Report a bug in
`neutron <https://bugs.launchpad.net/neutron/+filebug/+login>`_.
- Report a bug in
`python-neutronclient <https://bugs.launchpad.net/python-neutronclient/+filebug/+login>`_.
- Report a bug in
`cinder <https://bugs.launchpad.net/cinder/+filebug/+login>`_.
- Report a bug in
`python-cinderclient <https://bugs.launchpad.net/python-cinderclient/+filebug/+login>`_.
- Report a bug in
`manila <https://bugs.launchpad.net/manila/+filebug/+login>`_.
- Report a bug in
`python-manilaclient <https://bugs.launchpad.net/python-manilaclient/+filebug/+login>`_.
- Report a bug in
`python-openstackclient <https://bugs.launchpad.net/python-openstackclient/+filebug/+login>`_.
- Report a bug in
`horizon <https://bugs.launchpad.net/horizon/+filebug/+login>`_.
- Report a bug with the
`documentation <https://bugs.launchpad.net/openstack-manuals/+filebug/+login>`_.
- Report a bug with the `API
documentation <https://bugs.launchpad.net/openstack-api-site/+filebug/+login>`_.
To write a good bug report, the following process is essential. First,
search for the bug to make sure there is no bug already filed for the
same issue. If you find one, be sure to click on "This bug affects X
people. Does this bug affect you?" If you can't find the issue, then
enter the details of your report. It should at least include:
- The release, or milestone, or commit ID corresponding to the software
that you are running
- The operating system and version where you've identified the bug
- Steps to reproduce the bug, including what went wrong
- Description of the expected results instead of what you saw
- Portions of your log files so that you include only relevant excerpts
When you do this, the bug is created with:
- Status: *New*
In the bug comments, you can contribute instructions on how to fix a
given bug, and set it to *Triaged*. Or you can directly fix it: assign
the bug to yourself, set it to *In progress*, branch the code, implement
the fix, and propose your change for merging. But let's not get ahead of
ourselves; there are bug triaging tasks as well.
Confirming and Prioritizing
---------------------------
This stage is about checking that a bug is real and assessing its
impact. Some of these steps require bug supervisor rights (usually
limited to core teams). If the bug lacks information to properly
reproduce or assess the importance of the bug, the bug is set to:
- Status: *Incomplete*
Once you have reproduced the issue (or are 100 percent confident that
this is indeed a valid bug) and have permissions to do so, set:
- Status: *Confirmed*
Core developers also prioritize the bug, based on its impact:
- Importance: <Bug impact>
The bug impacts are categorized as follows:
#. *Critical* if the bug prevents a key feature from working properly
(regression) for all users (or without a simple workaround) or
results in data loss
#. *High* if the bug prevents a key feature from working properly for
some users (or with a workaround)
#. *Medium* if the bug prevents a secondary feature from working
properly
#. *Low* if the bug is mostly cosmetic
#. *Wishlist* if the bug is not really a bug but rather a welcome change
in behavior
If the bug contains the solution, or a patch, set the bug status to
*Triaged*.
Bug Fixing
----------
At this stage, a developer works on a fix. During that time, to avoid
duplicating the work, the developer should set:
- Status: *In Progress*
- Assignee: <yourself>
When the fix is ready, the developer proposes a change and gets the
change reviewed.
After the Change Is Accepted
----------------------------
After the change is reviewed, accepted, and lands in master, it
automatically moves to:
- Status: *Fix Committed*
When the fix makes it into a milestone or release branch, it
automatically moves to:
- Milestone: Milestone the bug was fixed in
- Status: *Fix Released*
Join the OpenStack Community
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Since you've made it this far in the book, you should consider becoming
an official individual member of the community and `join the OpenStack
Foundation <https://www.openstack.org/join/>`_. The OpenStack
Foundation is an independent body providing shared resources to help
achieve the OpenStack mission by protecting, empowering, and promoting
OpenStack software and the community around it, including users,
developers, and the entire ecosystem. We all share the responsibility to
make this community the best it can possibly be, and signing up to be a
member is the first step to participating. Like the software, individual
membership within the OpenStack Foundation is free and accessible to
anyone.
How to Contribute to the Documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OpenStack documentation efforts encompass operator and administrator
docs, API docs, and user docs.
The genesis of this book was an in-person event, but now that the book
is in your hands, we want you to contribute to it. OpenStack
documentation follows the coding principles of iterative work, with bug
logging, investigating, and fixing.
Just like the code, http://docs.openstack.org is updated constantly
using the Gerrit review system, with source stored in git.openstack.org
in the `openstack-manuals
repository <https://git.openstack.org/cgit/openstack/openstack-manuals/>`_
and the `api-site
repository <https://git.openstack.org/cgit/openstack/api-site/>`_.
To review the documentation before it's published, go to the OpenStack
Gerrit server at http://review.openstack.org and search for
`project:openstack/openstack-manuals <https://review.openstack.org/#/q/status:open+project:openstack/openstack-manuals,n,z>`_
or
`project:openstack/api-site <https://review.openstack.org/#/q/status:open+project:openstack/api-site,n,z>`_.
See the `How To Contribute page on the
wiki <https://wiki.openstack.org/wiki/How_To_Contribute>`_ for more
information on the steps you need to take to submit your first
documentation review or change.
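As a rough sketch of what a first documentation change looks like
(assuming you have a Launchpad account, have installed the ``git-review``
tool, and have completed the contributor agreement; the branch name and
commit message below are only examples):

.. code-block:: console

   $ git clone https://git.openstack.org/openstack/openstack-manuals
   $ cd openstack-manuals
   $ git checkout -b fix-ops-guide-typo
   (edit the RST source under doc/)
   $ git commit -a -m "Fix typo in ops-guide"
   $ git review

The :command:`git review` step pushes the change to Gerrit, where it goes
through the same review workflow as code.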
Security Information
~~~~~~~~~~~~~~~~~~~~
As a community, we take security very seriously and follow a specific
process for reporting potential issues. We vigilantly pursue fixes and
regularly eliminate exposures. You can report security issues you
discover through this specific process. The OpenStack Vulnerability
Management Team is a very small group of experts in vulnerability
management drawn from the OpenStack community. The team's job is
facilitating the reporting of vulnerabilities, coordinating security
fixes, and handling progressive disclosure of the vulnerability
information. Specifically, the team is responsible for the following
functions:
Vulnerability management
All vulnerabilities discovered by community members (or users) can
be reported to the team.
Vulnerability tracking
The team will curate a set of vulnerability-related issues in the
issue tracker. Some of these issues are private to the team and the
affected product leads, but once remediation is in place, all
vulnerabilities are public.
Responsible disclosure
As part of our commitment to work with the security community, the
team ensures that proper credit is given to security researchers who
responsibly report issues in OpenStack.
We provide two ways to report issues to the OpenStack Vulnerability
Management Team, depending on how sensitive the issue is:
- Open a bug in Launchpad and mark it as a "security bug." This makes
the bug private and accessible only to the Vulnerability Management
Team.
- If the issue is extremely sensitive, send an encrypted email to one
of the team's members. Find their GPG keys at `OpenStack
Security <http://www.openstack.org/projects/openstack-security/>`_.
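If you choose the encrypted email route, a minimal GnuPG sketch looks
like the following, where ``member-key.asc``, ``report.txt``, and the
recipient address are placeholders for the team member's published key,
your write-up, and the member's email address:

.. code-block:: console

   $ gpg --import member-key.asc
   $ gpg --encrypt --sign --armor -r security-team-member@example.com report.txt

The resulting ``report.txt.asc`` file is what you paste into or attach to
your email.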
You can find the full list of security-oriented teams you can join at
`Security Teams <https://wiki.openstack.org/wiki/SecurityTeams>`_. The
vulnerability management process is fully documented at `Vulnerability
Management <https://wiki.openstack.org/wiki/VulnerabilityManagement>`_.
Finding Additional Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In addition to this book, there are many other sources of information
about OpenStack. The
`OpenStack website <http://www.openstack.org/>`_
is a good starting point, with
`OpenStack Docs <http://docs.openstack.org/>`_ and `OpenStack API
Docs <http://developer.openstack.org/>`_ providing technical
documentation about OpenStack. The `OpenStack
wiki <https://wiki.openstack.org/wiki/Main_Page>`_ contains a lot of
general information that cuts across the OpenStack projects, including a
list of `recommended
tools <https://wiki.openstack.org/wiki/OperationsTools>`_. Finally,
there are a number of blogs aggregated at `Planet
OpenStack <http://planet.openstack.org/>`_.

File diff suppressed because it is too large

View File

@ -0,0 +1,500 @@
=======
Preface
=======
OpenStack is an open source platform that lets you build an
:term:`Infrastructure-as-a-Service (IaaS)<IaaS>` cloud that runs on commodity
hardware.
Introduction to OpenStack
~~~~~~~~~~~~~~~~~~~~~~~~~
OpenStack believes in open source, open design, and open development,
all in an open community that encourages participation by anyone. The
long-term vision for OpenStack is to produce a ubiquitous open source
cloud computing platform that meets the needs of public and private
cloud providers regardless of size. OpenStack services control large
pools of compute, storage, and networking resources throughout a data
center.
The technology behind OpenStack consists of a series of interrelated
projects delivering various components for a cloud infrastructure
solution. Each service provides an open API so that all of these
resources can be managed through a dashboard that gives administrators
control while empowering users to provision resources through a web
interface, a command-line client, or software development kits that
support the API. Many OpenStack APIs are extensible, meaning you can
keep compatibility with a core set of calls while providing access to
more resources and innovating through API extensions. The OpenStack
project is a global collaboration of developers and cloud computing
technologists. The project produces an open standard cloud computing
platform for both public and private clouds. By focusing on ease of
implementation, massive scalability, a variety of rich features, and
tremendous extensibility, the project aims to deliver a practical and
reliable cloud solution for all types of organizations.
Getting Started with OpenStack
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As an open source project, one of the unique aspects of OpenStack is
that it has many different levels at which you can begin to engage with
it—you don't have to do everything yourself.
Using OpenStack
---------------
You could ask, "Do I even need to build a cloud?" If you want to start
using a compute or storage service by just swiping your credit card, you
can go to eNovance, HP, Rackspace, or other organizations to start using
their public OpenStack clouds. Using their OpenStack cloud resources is
similar to accessing the publicly available Amazon Web Services Elastic
Compute Cloud (EC2) or Simple Storage Service (S3).
Plug and Play OpenStack
-----------------------
However, the enticing part of OpenStack might be to build your own
private cloud, and there are several ways to accomplish this goal.
Perhaps the simplest of all is an appliance-style solution. You purchase
an appliance, unpack it, plug in the power and the network, and watch it
transform into an OpenStack cloud with minimal additional configuration.
However, hardware choice is important for many applications, so if that
applies to you, consider that there are several software distributions
available that you can run on servers, storage, and network products of
your choosing. Canonical (where OpenStack replaced Eucalyptus as the
default cloud option in 2011), Red Hat, and SUSE offer enterprise
OpenStack solutions and support. You may also want to take a look at
some of the specialized distributions, such as those from Rackspace,
Piston, SwiftStack, or Cloudscaling.
Alternatively, if you want someone to help guide you through the
decisions about the underlying hardware or your applications, perhaps
adding in a few features or integrating components along the way,
consider contacting one of the system integrators with OpenStack
experience, such as Mirantis or Metacloud.
If your preference is to build your own OpenStack expertise internally,
a good way to kick-start that might be to attend or arrange a training
session. The OpenStack Foundation has a `Training
Marketplace <http://www.openstack.org/marketplace/training>`_ where you
can look for nearby events. Also, the OpenStack community is `working to
produce <https://wiki.openstack.org/wiki/Training-guides>`_ open source
training materials.
Roll Your Own OpenStack
-----------------------
However, this guide has a different audience—those seeking flexibility
from the OpenStack framework by deploying do-it-yourself solutions.
OpenStack is designed for horizontal scalability, so you can easily add
new compute, network, and storage resources to grow your cloud over
time. In addition to the pervasiveness of massive OpenStack public
clouds, many organizations, such as PayPal, Intel, and Comcast, build
large-scale private clouds. OpenStack offers much more than a typical
software package because it lets you integrate a number of different
technologies to construct a cloud. This approach provides great
flexibility, but the number of options might be daunting at first.
Who This Book Is For
~~~~~~~~~~~~~~~~~~~~
This book is for those of you starting to run OpenStack clouds as well
as those of you who were handed an operational one and want to keep it
running well. Perhaps you're on a DevOps team, perhaps you are a system
administrator starting to dabble in the cloud, or maybe you want to get
on the OpenStack cloud team at your company. This book is for all of
you.
This guide assumes that you are familiar with a Linux distribution that
supports OpenStack, SQL databases, and virtualization. You must be
comfortable administering and configuring multiple Linux machines for
networking. You must install and maintain an SQL database and
occasionally run queries against it.
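The queries involved are usually simple. For example, with a MySQL back
end (which databases and tables exist depends on the services you deploy
and the release you run):

.. code-block:: console

   $ mysql -u root -p -e "SHOW DATABASES;"
   $ mysql -u root -p nova -e "SELECT COUNT(*) FROM instances;"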
One of the most complex aspects of an OpenStack cloud is the networking
configuration. You should be familiar with concepts such as DHCP, Linux
bridges, VLANs, and iptables. You must also have access to a network
hardware expert who can configure the switches and routers required in
your OpenStack cloud.
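As a quick self-check, the following commands are the kind of thing this
book assumes you can read and interpret (a sketch only; bridge and
interface names vary by deployment):

.. code-block:: console

   $ brctl show            # Linux bridges and their attached interfaces
   $ ip addr show          # addresses, VLAN sub-interfaces, and link state
   # iptables -L -n -v     # packet filter rules and their counters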
.. note::
Cloud computing is quite an advanced topic, and this book requires a
lot of background knowledge. However, if you are fairly new to cloud
computing, we recommend that you make use of the :doc:`common/glossary`
at the back of the book, as well as the online documentation for OpenStack
and additional resources mentioned in this book in :doc:`app_resources`.
Further Reading
---------------
There are other books on the `OpenStack documentation
website <http://docs.openstack.org>`_ that can help you get the job
done.
OpenStack Installation Guides
Describes a manual installation process (that is, by hand, without
automation) for multiple distributions based on a packaging system:
- `Installation Guide for openSUSE 13.2 and SUSE Linux Enterprise
Server
12 <http://docs.openstack.org/liberty/install-guide-obs/>`_
- `Installation Guide for Red Hat Enterprise Linux 7 and CentOS
7 <http://docs.openstack.org/liberty/install-guide-rdo/>`_
- `Installation Guide for Ubuntu 14.04 (LTS)
Server <http://docs.openstack.org/liberty/install-guide-ubuntu/>`_
`OpenStack Configuration Reference <http://docs.openstack.org/liberty/config-reference/content/>`_
Contains a reference listing of all configuration options for core
and integrated OpenStack services by release version
`OpenStack Administrator Guide <http://docs.openstack.org/admin-guide/>`_
Contains how-to information for managing an OpenStack cloud as
needed for your use cases, such as storage, computing, or
software-defined networking
`OpenStack High Availability Guide <http://docs.openstack.org/ha-guide/index.html>`_
Describes potential strategies for making your OpenStack services
and related controllers and data stores highly available
`OpenStack Security Guide <http://docs.openstack.org/sec/>`_
Provides best practices and conceptual information about securing an
OpenStack cloud
`Virtual Machine Image Guide <http://docs.openstack.org/image-guide/>`_
Shows you how to obtain, create, and modify virtual machine images
that are compatible with OpenStack
`OpenStack End User Guide <http://docs.openstack.org/user-guide/>`_
Shows OpenStack end users how to create and manage resources in an
OpenStack cloud with the OpenStack dashboard and OpenStack client
commands
`Networking Guide <http://docs.openstack.org/networking-guide/>`_
This guide targets OpenStack administrators seeking to deploy and
manage OpenStack Networking (neutron).
`OpenStack API Guide <http://developer.openstack.org/api-guide/quick-start/>`_
A brief overview of how to send REST API requests to endpoints for
OpenStack services
How This Book Is Organized
~~~~~~~~~~~~~~~~~~~~~~~~~~
This book is organized into two parts: the architecture decisions for
designing OpenStack clouds and the repeated operations for running
OpenStack clouds.
**Part I:**
:doc:`arch_examples`
Because of all the decisions the other chapters discuss, this
chapter describes the decisions made for this particular book and
much of the justification for the example architecture.
:doc:`arch_provision`
While this book doesn't describe installation, we do recommend
automation for deployment and configuration, discussed in this
chapter.
:doc:`arch_cloud_controller`
The cloud controller is an invention for the sake of consolidating
and describing which services run on which nodes. This chapter
discusses hardware and network considerations as well as how to
design the cloud controller for performance and separation of
services.
:doc:`arch_compute_nodes`
This chapter describes the compute nodes, which are dedicated to
running virtual machines. Some hardware choices come into play here,
as well as logging and networking descriptions.
:doc:`arch_scaling`
This chapter discusses the growth of your cloud resources through
scaling and segregation considerations.
:doc:`arch_storage`
As with other architecture decisions, storage concepts within
OpenStack offer many options. This chapter lays out the choices for
you.
:doc:`arch_network_design`
Your OpenStack cloud networking needs to fit into your existing
networks while also enabling the best design for your users and
administrators, and this chapter gives you in-depth information
about networking decisions.
**Part II:**
:doc:`ops_lay_of_the_land`
This chapter is written to let you get your hands wrapped around
your OpenStack cloud through command-line tools and understanding
what is already set up in your cloud.
:doc:`ops_projects_users`
This chapter walks through user-enabling processes that all admins
must face to manage users, give them quotas to parcel out resources,
and so on.
:doc:`ops_user_facing_operations`
This chapter shows you how to use OpenStack cloud resources and how
to train your users.
:doc:`ops_maintenance`
This chapter goes into the common failures that the authors have
seen while running clouds in production, including troubleshooting.
:doc:`ops_network_troubleshooting`
Because network troubleshooting is especially difficult with virtual
resources, this chapter is chock-full of helpful tips and tricks for
tracing network traffic, finding the root cause of networking
failures, and debugging related services, such as DHCP and DNS.
:doc:`ops_logging_monitoring`
This chapter shows you where OpenStack places logs and how to best
read and manage logs for monitoring purposes.
:doc:`ops_backup_recovery`
This chapter describes what you need to back up within OpenStack as
well as best practices for recovering backups.
:doc:`ops_customize`
For readers who need to get a specialized feature into OpenStack,
this chapter describes how to use DevStack to write custom
middleware or a custom scheduler to rebalance your resources.
:doc:`ops_upstream`
Because OpenStack is so, well, open, this chapter is dedicated to
helping you navigate the community and find out where you can help
and where you can get help.
:doc:`ops_advanced_configuration`
Much of OpenStack is driver-oriented, so you can plug in different
solutions to the base set of services. This chapter describes some
advanced configuration topics.
:doc:`ops_upgrades`
This chapter provides upgrade information based on the architectures
used in this book.
**Back matter:**
:doc:`app_usecases`
You can read a small selection of use cases from the OpenStack
community with some technical details and further resources.
:doc:`app_crypt`
These are shared legendary tales of image disappearances, VM
massacres, and crazy troubleshooting techniques that result in
hard-learned lessons and wisdom.
:doc:`app_roadmaps`
Read about how to track the OpenStack roadmap through the open and
transparent development processes.
:doc:`app_resources`
So many OpenStack resources are available online because of the
fast-moving nature of the project, but there are also resources
listed here that the authors found helpful while learning
themselves.
:doc:`common/glossary`
A list of terms used in this book is included, which is a subset of
the larger OpenStack glossary available online.
Why and How We Wrote This Book
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We wrote this book because we have deployed and maintained OpenStack
clouds for at least a year and we wanted to share this knowledge with
others. After months of being the point people for an OpenStack cloud,
we also wanted to have a document to hand to our system administrators
so that they'd know how to operate the cloud on a daily basis—both
reactively and proactively. We wanted to provide more detailed
technical information about the decisions that deployers make along the
way.
We wrote this book to help you:
- Design and create an architecture for your first nontrivial OpenStack
cloud. After you read this guide, you'll know which questions to ask
and how to organize your compute, networking, and storage resources
and the associated software packages.
- Perform the day-to-day tasks required to administer a cloud.
We wrote this book in a book sprint, which is a facilitated, rapid
development production method for books. For more information, see the
`BookSprints site <http://www.booksprints.net/>`_. Your authors cobbled
this book together in five days during February 2013, fueled by caffeine
and the best takeout food that Austin, Texas, could offer.
On the first day, we filled white boards with colorful sticky notes to
start to shape this nebulous book about how to architect and operate
clouds:
We wrote furiously from our own experiences and bounced ideas between
each other. At regular intervals we reviewed the shape and organization
of the book and further molded it, leading to what you see today.
The team includes:
Tom Fifield
After learning about scalability in computing from particle physics
experiments, such as ATLAS at the Large Hadron Collider (LHC) at
CERN, Tom worked on OpenStack clouds in production to support the
Australian public research sector. Tom currently serves as an
OpenStack community manager and works on OpenStack documentation in
his spare time.
Diane Fleming
Diane works on the OpenStack API documentation tirelessly. She
helped out wherever she could on this project.
Anne Gentle
Anne is the documentation coordinator for OpenStack and also served
as an individual contributor to the Google Documentation Summit in
2011, working with the Open Street Maps team. She has worked on book
sprints in the past, with FLOSS Manuals founder Adam Hyde facilitating.
Anne lives in Austin, Texas.
Lorin Hochstein
An academic turned software-developer-slash-operator, Lorin worked
as the lead architect for Cloud Services at Nimbis Services, where
he deployed OpenStack for technical computing applications. He has
been working with OpenStack since the Cactus release. Previously, he
worked on high-performance computing extensions for OpenStack at
University of Southern California's Information Sciences Institute
(USC-ISI).
Adam Hyde
Adam facilitated this book sprint. He also founded the book sprint
methodology and is the most experienced book-sprint facilitator
around. See http://www.booksprints.net for more information. Adam
founded FLOSS Manuals—a community of some 3,000 individuals
developing Free Manuals about Free Software. He is also the founder
and project manager for Booktype, an open source project for
writing, editing, and publishing books online and in print.
Jonathan Proulx
Jon has been piloting an OpenStack cloud as a senior technical
architect at the MIT Computer Science and Artificial Intelligence
Lab for his researchers to have as much computing power as they
need. He started contributing to OpenStack documentation and
reviewing the documentation so that he could accelerate his
learning.
Everett Toews
Everett is a developer advocate at Rackspace making OpenStack and
the Rackspace Cloud easy to use. Sometimes developer, sometimes
advocate, and sometimes operator, he's built web applications,
taught workshops, given presentations around the world, and deployed
OpenStack for production use by academia and business.
Joe Topjian
Joe has designed and deployed several clouds at Cybera, a nonprofit
where they are building e-infrastructure to support entrepreneurs
and local researchers in Alberta, Canada. He also actively maintains
and operates these clouds as a systems architect, and his
experiences have generated a wealth of troubleshooting skills for
cloud environments.
OpenStack community members
Many individual efforts keep a community book alive. Our community
members updated content for this book year-round. Also, a year after
the first sprint, Jon Proulx hosted a second two-day mini-sprint at
MIT with the goal of updating the book for the latest release. Since
the book's inception, more than 30 contributors have supported this
book. We have a tool chain for reviews, continuous builds, and
translations. Writers and developers continuously review patches,
enter doc bugs, edit content, and fix doc bugs. We want to recognize
their efforts!
The following people have contributed to this book: Akihiro Motoki,
Alejandro Avella, Alexandra Settle, Andreas Jaeger, Andy McCallum,
Benjamin Stassart, Chandan Kumar, Chris Ricker, David Cramer, David
Wittman, Denny Zhang, Emilien Macchi, Gauvain Pocentek, Ignacio
Barrio, James E. Blair, Jay Clark, Jeff White, Jeremy Stanley, K
Jonathan Harker, KATO Tomoyuki, Lana Brindley, Laura Alves, Lee Li,
Lukasz Jernas, Mario B. Codeniera, Matthew Kassawara, Michael Still,
Monty Taylor, Nermina Miller, Nigel Williams, Phil Hopkins, Russell
Bryant, Sahid Orentino Ferdjaoui, Sandy Walsh, Sascha Peilicke, Sean
M. Collins, Sergey Lukjanov, Shilla Saebi, Stephen Gordon, Summer
Long, Uwe Stuehler, Vaibhav Bhatkar, Veronica Musso, Ying Chun
"Daisy" Guo, Zhengguang Ou, and ZhiQiang Fan.
How to Contribute to This Book
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The genesis of this book was an in-person event, but now that the book
is in your hands, we want you to contribute to it. OpenStack
documentation follows the coding principles of iterative work, with bug
logging, investigating, and fixing. We also store the source content on
GitHub and invite collaborators through the OpenStack Gerrit
installation, which offers reviews. For the O'Reilly edition of this
book, we are using the company's Atlas system, which also stores source
content on GitHub and enables collaboration among contributors.
Learn more about how to contribute to the OpenStack docs at `OpenStack
Documentation Contributor
Guide <http://docs.openstack.org/contributor-guide/>`_.
If you find a bug and can't fix it or aren't sure it's really a doc bug,
log a bug at `OpenStack
Manuals <https://bugs.launchpad.net/openstack-manuals>`_. Tag the bug
under Extra options with the ``ops-guide`` tag to indicate that the bug
is in this guide. You can assign the bug to yourself if you know how to
fix it. Also, a member of the OpenStack doc-core team can triage the doc
bug.
Conventions Used in This Book
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following typographical conventions are used in this book:
*Italic*
Indicates new terms, URLs, email addresses, filenames, and file
extensions.
``Constant width``
Used for program listings, as well as within paragraphs to refer to
program elements such as variable or function names, databases, data
types, environment variables, statements, and keywords.
``Constant width bold``
Shows commands or other text that should be typed literally by the
user.
Constant width italic
Shows text that should be replaced with user-supplied values or by
values determined by context.
Command prompts
Commands prefixed with the ``#`` prompt should be executed by the
``root`` user. These examples can also be executed using the
:command:`sudo` command, if available.
Commands prefixed with the ``$`` prompt can be executed by any user,
including ``root``.
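For example, the following pair of commands (chosen only to illustrate
the prompts) would be run as ``root`` and as an unprivileged user,
respectively:

.. code-block:: console

   # service nova-api restart
   $ nova list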
.. tip::
This element signifies a tip or suggestion.
.. note::
This element signifies a general note.
.. warning::
This element indicates a warning or caution.
See also:
.. toctree::
common/conventions.rst

View File

@ -22,7 +22,8 @@ done
# Draft guides
# This includes guides that we publish from stable branches
# as versioned like the networking-guide.
for guide in networking-guide arch-design-draft config-reference; do
for guide in networking-guide arch-design-draft config-reference \
ops-guide; do
tools/build-rst.sh doc/$guide --build build \
--target "draft/$guide" $LINKCHECK
done

View File

@ -31,8 +31,9 @@ function copy_to_branch {
cp -a publish-docs/draft/* publish-docs/$BRANCH/
# We don't need this file
rm -f publish-docs/$BRANCH/draft-index.html
# We don't need Contributor Guide
rm -rf publish-docs/$BRANCH/contributor-guide
# We don't need these draft guides on the branch
rm -rf publish-docs/$BRANCH/arch-design-draft
rm -rf publish-docs/$BRANCH/ops-guide
for f in $(find publish-docs/$BRANCH -name "atom.xml"); do
sed -i -e "s|/draft/|/$BRANCH/|g" $f