<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0"
xml:id="ch_getting-started-with-openstack">
<title>Getting Started with OpenStack</title>
<para>OpenStack is a collection of open source technologies that provides massively scalable
cloud computing software. OpenStack currently develops two related projects:
OpenStack Compute, which offers computing power through virtual machine and network
management, and OpenStack Object Storage, which provides redundant, scalable object
storage. Closely related to the OpenStack Compute project is the Image Service
project, named Glance. OpenStack can be used by corporations, service providers, VARs, SMBs,
researchers, and global data centers looking to deploy large-scale private or public
clouds. </para>
<section xml:id="what-is-openstack">
<title>What is OpenStack?</title>
<para>OpenStack offers open source software to build public and private clouds. OpenStack is
a community and a project as well as open source software to help organizations run
clouds for virtual computing or storage. OpenStack contains a collection of open source
projects that are community-maintained including OpenStack Compute (code-named Nova),
OpenStack Object Storage (code-named Swift), and OpenStack Image Service (code-named
Glance). OpenStack provides an operating platform, or toolkit, for orchestrating clouds. </para>
<para>OpenStack is more easily defined once the concepts of cloud computing become
apparent, but we are on a mission: to provide scalable, elastic cloud computing for
both public and private clouds, large and small. At the heart of our mission is a
pair of basic requirements: clouds must be simple to implement and massively
scalable.</para>
<para>If you are new to OpenStack, you will undoubtedly have questions about installation,
deployment, and usage. It can seem overwhelming at first, but there are
places to get information to guide you and to help resolve any issues you may run into
while getting started. Because the project is so new and constantly changing, be
aware of the revision time for all information. If you are reading a document that is a
few months old and you feel that it isn't entirely accurate, then please let us know
through the mailing list at <link xlink:href="https://launchpad.net/~openstack"
>https://launchpad.net/~openstack</link> so it can be updated or removed. </para>
</section>
<section xml:id="components-of-openstack"><title>Components of OpenStack</title>
<para>There are currently three main components of OpenStack: Compute, Object Storage, and
Image Service. Let's look at each in turn.</para>
<para>OpenStack Compute is a cloud fabric controller, used to start up virtual instances for
either a user or a group. It's also used to configure networking for each instance or
for a project that contains multiple instances. </para>
<para>OpenStack Object Storage is a massively scalable, large-capacity system for storing
objects, with built-in redundancy and failover. Object Storage has a variety of
applications, such as backing up or archiving data, serving graphics or videos
(streaming data to a user's browser), storing secondary or tertiary static data,
developing new applications with data storage integration, storing data when predicting
storage capacity is difficult, and creating the elasticity and flexibility of
cloud-based storage for your web applications.</para>
<para>OpenStack Image Service is a lookup and retrieval system for virtual machine images.
It can be configured in three ways: using OpenStack Object Storage to store images; using
Amazon's Simple Storage Service (S3) directly; or using S3 with Object
Storage as the intermediary for S3 access.</para>
<para>The following diagram shows the basic relationships between the projects, how they
relate to each other, and how they can fulfill the goals of open source cloud computing. </para>
<informalfigure>
<mediaobject>
<imageobject>
<imagedata fileref="figures/OpenStackCore.png"/>
</imageobject>
</mediaobject></informalfigure>
</section>
<section xml:id="openstack-architecture-overview"><title>OpenStack Project Architecture Overview</title>
<para>by <link xlink:href="http://ken.pepple.info">Ken Pepple</link></para>
<para>Before we dive into the conceptual and logical architecture, let's take a second to explain the OpenStack project: </para>
<blockquote><para>OpenStack is a collection of open source technologies delivering a massively scalable cloud operating system.</para></blockquote>
<para>You can think of it as software to power your own Infrastructure as a Service (IaaS) offering like <link xlink:href="http://aws.amazon.com">Amazon Web Services</link>. It currently encompasses three main projects:</para>
<itemizedlist>
<listitem><para><link xlink:href="https://launchpad.net/swift">Swift</link>, which provides object/blob storage. This is roughly analogous to Rackspace Cloud Files (from which it is derived) or Amazon S3.</para></listitem>
<listitem><para><link xlink:href="https://launchpad.net/glance">Glance</link>, which provides discovery, storage and retrieval of virtual machine images for OpenStack Nova.</para></listitem>
<listitem><para><link xlink:href="https://launchpad.net/nova">Nova</link>, which provides virtual servers upon
demand. This is similar to Rackspace Cloud Servers or Amazon EC2.</para></listitem>
</itemizedlist>
<para>While these three projects provide the core of the cloud infrastructure, OpenStack is open and
evolving: <link xlink:href="http://wiki.openstack.org/Projects">there will be more
projects</link> (there are already related projects for <link
xlink:href="https://launchpad.net/horizon">web interfaces</link> and a
<link xlink:href="http://wiki.openstack.org/QueueService">queue service</link>).
With that brief introduction, let's delve into a conceptual architecture and then
examine how OpenStack Compute could map to it. </para>
<section xml:id="cloud-provider-conceptual-architecture">
<info><author><personname><firstname>Ken</firstname><surname>Pepple</surname></personname></author><title>Cloud Provider Conceptual Architecture</title></info>
<para>Imagine that we are going to build our own IaaS cloud and offer it to customers. To achieve this, we would need to provide several high-level features:</para>
<orderedlist>
<listitem><para>Allow application owners to register for our cloud services, view their usage and see their bill (basic customer relationship management functionality)</para></listitem>
<listitem><para>Allow Developers/DevOps folks to create and store custom images for their applications (basic build-time functionality)</para></listitem>
<listitem><para>Allow DevOps/Developers to launch, monitor and terminate instances (basic run-time functionality)</para></listitem>
<listitem><para>Allow the Cloud Operator to configure and operate the cloud infrastructure</para></listitem>
</orderedlist>
<para>While there are certainly many other features that we
would need to offer (especially if we were to follow a
more complete industry framework like <link
xlink:href="http://www.tmforum.org/BusinessProcessFramework/1647/home.html"
>eTOM</link>), these four get to the very heart of
providing IaaS. Assuming that you agree with these
four top-level features, you might put together a
conceptual architecture that looks something like
this:</para>
<informalfigure><mediaobject><imageobject><imagedata scale="70" fileref="figures/nova-cactus-conceptual.png"/></imageobject></mediaobject></informalfigure>
<para>In this model, I've imagined four sets of users (developers, devops, owners and operators)
that need to interact with the cloud and then separated out the functionality needed
for each. From there, I've followed a pretty common tiered approach to the
architecture (presentation, logic and resources) with two orthogonal areas
(integration and management). Let's explore each a little further: </para>
<itemizedlist>
<listitem><para>As with presentation layers in more typical application architectures, components here interact with users to accept and present information. In this layer, you will find web portals to provide graphical interfaces for non-developers and API endpoints for developers. For more advanced architectures, you might find load balancing, console proxies, security and naming services present here also.</para></listitem>
<listitem><para>The logic tier would provide the intelligence and control functionality for our cloud. This tier would house orchestration (workflow for complex tasks), scheduling (determining mapping of jobs to resources), policy (quotas and such), image registry (metadata about instance images) and logging (events and metering). </para></listitem>
<listitem><para>There will need to be integration functions within the architecture. It is assumed that most service providers will already have customer identity and billing systems. Any cloud architecture would need to integrate with these systems.</para></listitem>
<listitem><para>As with any complex environment, we will need a management tier to operate the environment. This should include an API to access the cloud administration features as well as some forms of monitoring. It is likely that the monitoring functionality will take the form of integration into an existing tool. While I've highlighted monitoring and an admin API for our fictional provider, in a more complete architecture you would see a vast array of operational support functions like provisioning and configuration management.</para></listitem>
<listitem><para>Finally, since this is a compute cloud, we will need actual compute, network and storage resources to provide to our customers. This tier provides these services, whether they be servers, network switches, network attached storage or other resources.</para></listitem>
</itemizedlist>
<para>With this model in place, let's shift gears and look at OpenStack Compute's logical
architecture.</para></section>
<section xml:id="openstack-nova-logical-architecture"><title>OpenStack Compute Logical Architecture</title>
<para>Now that we've looked at a proposed conceptual architecture, let's see how OpenStack Compute
is logically architected. At the time of this writing, Cactus was the newest release
(which means if you are viewing this after around July 2011, this may be out of
date). There are several logical components of the OpenStack Compute architecture, but
the majority of these components are custom-written Python daemons of two
varieties:</para>
<itemizedlist>
<listitem><para>WSGI applications to receive and mediate API calls (<code>nova-api</code>, <code>glance-api</code>, etc.)</para></listitem>
<listitem><para>Worker daemons to carry out orchestration tasks (<code>nova-compute</code>, <code>nova-network</code>, <code>nova-scheduler</code>, etc.)</para></listitem>
</itemizedlist>
<para>However, two essential pieces of the logical architecture are neither custom written nor Python based: the messaging queue and the database. These two components facilitate the asynchronous orchestration of complex tasks through message passing and information sharing. Putting this all together we get a picture like this:</para>
<informalfigure><mediaobject><imageobject><imagedata scale="70" fileref="figures/nova-cactus-logical.png"/></imageobject></mediaobject></informalfigure>
<para>This diagram is complicated but not overly informative, as it can be summed up in three sentences:</para>
<itemizedlist>
<listitem><para>End users (DevOps, Developers and even other OpenStack components) talk to
<code>nova-api</code> to interface with OpenStack Compute</para></listitem>
<listitem><para>OpenStack Compute daemons exchange info through the queue (actions) and database (information)
to carry out API requests</para></listitem>
<listitem><para>OpenStack Glance is basically a completely separate infrastructure which OpenStack Compute
interfaces with through the Glance API</para></listitem>
</itemizedlist>
<para>Now that we see the overview of the processes and their interactions, let's take a closer look at each component.</para>
<itemizedlist>
<listitem><para>The <code>nova-api</code> daemon is the heart of OpenStack Compute. You may see it
illustrated on many pictures of OpenStack Compute as API and “Cloud
Controller”. While this is partly true, cloud controller is really just a
class (specifically the CloudController in trunk/nova/api/ec2/cloud.py)
within the <code>nova-api</code> daemon. It provides an endpoint for all API
queries (either <link xlink:href="http://docs.rackspacecloud.com/api/"
>OpenStack API</link> or <link
xlink:href="http://docs.amazonwebservices.com/AWSEC2/latest/APIReference/"
>EC2 API</link>), initiates most of the orchestration activities (such
as running an instance) and also enforces some policy (mostly quota
checks).</para></listitem>
<listitem><para>The <code>nova-scheduler</code> process is conceptually the simplest piece of code in OpenStack
Compute: it takes a virtual machine instance request from the queue and
determines where it should run (specifically, which compute server host it
should run on). In practice, however, I am sure this will grow to be the most
complex, as it needs to factor in the current state of the entire cloud
infrastructure and apply complicated algorithms to ensure efficient usage. To
that end, <code>nova-scheduler</code> implements a pluggable architecture
that lets you choose (or write) your own algorithm for scheduling.
Currently, there are several to choose from (simple, chance, etc.) and it is
an area of hot development for future releases of OpenStack
Compute.</para></listitem>
<listitem><para>The <code>nova-compute</code> process is primarily a worker daemon that creates and terminates virtual machine instances. The process by which it does so is fairly complex (<link xlink:href="http://www.laurentluce.com/?p=227">see this blog post by Laurent Luce for the gritty details</link>) but the basics are simple: accept actions from the queue and then perform a series of system commands (like launching a KVM instance) to carry them out while updating state in the database.</para></listitem>
<listitem><para>As you can gather by the name, <code>nova-volume</code> manages the creation, attaching and detaching of persistent volumes to compute instances (similar functionality to <link xlink:href="http://aws.amazon.com/ebs/">Amazon's Elastic Block Store</link>). It can use volumes from a variety of providers, such as iSCSI.</para></listitem>
<listitem><para>The <code>nova-network</code> worker daemon is very similar to <code>nova-compute</code> and <code>nova-volume</code>. It accepts networking tasks from the queue and then performs tasks to manipulate the network (such as setting up bridging interfaces or changing <code>iptables</code> rules).</para></listitem>
<listitem><para>The queue provides a central hub for passing messages between daemons. This is currently implemented with <link xlink:href="http://www.rabbitmq.com/">RabbitMQ</link>, but theoretically could be any <link xlink:href="http://www.amqp.org/confluence/display/AMQP/Advanced+Message+Queuing+Protocol">AMQP message queue</link> supported by the Python <link xlink:href="http://barryp.org/software/py-amqplib/">amqplib</link>.</para></listitem>
<listitem><para>The <link xlink:href="http://en.wikipedia.org/wiki/SQL">SQL database</link> stores most of the
build-time and run-time state for a cloud infrastructure. This includes the
instance types that are available for use, instances in use, networks
available and projects. Theoretically, OpenStack Compute can support any
database supported by <link xlink:href="http://www.sqlalchemy.org/"
>SQLAlchemy</link> but the only databases currently being widely used
are <link xlink:href="http://www.sqlite.org/">sqlite3</link> (only
appropriate for test and development work), <link
xlink:href="http://mysql.com/">MySQL</link> and <link
xlink:href="http://www.postgresql.org/">PostgreSQL</link>.</para></listitem><listitem><para>OpenStack Glance is a separate project from OpenStack Compute, but as shown above,
complementary. While it is an optional part of the overall compute
architecture, I can't imagine that most OpenStack Compute installations will
not be using it (or a complementary product). There are three pieces to
Glance: <code>glance-api</code>, <code>glance-registry</code> and the image
store. As you can probably guess, <code>glance-api</code> accepts API calls,
much like <code>nova-api</code>, and the actual image blobs are placed in
the image store. The <code>glance-registry</code> stores and retrieves
metadata about images. The image store can be a number of different object
stores, including OpenStack Object Storage (Swift).</para></listitem><listitem><para>Finally, another optional project that we will need for our fictional service provider is a
user dashboard. I have picked the OpenStack Dashboard here, but there are
also several other web front ends available for OpenStack Compute. The
OpenStack Dashboard provides a web interface into OpenStack Compute to give
application developers and devops staff similar functionality to the API. It
is currently implemented as a <link
xlink:href="http://www.djangoproject.com/">Django</link> web
application.</para></listitem></itemizedlist><para>This logical architecture represents just one way to architect OpenStack Compute. With its
pluggable architecture, we could easily swap out OpenStack Glance with another image
service or use another dashboard. In the coming releases of OpenStack, expect to see
more modularization of the code especially in the network and volume areas.</para></section>
<section xml:id="nova-conceptual-mapping"><title>Nova Conceptual Mapping</title>
<para>Now that we've seen a conceptual architecture for a fictional cloud provider and examined the logical architecture of OpenStack Nova, it is fairly easy to map the OpenStack components to the conceptual areas to see what we are lacking:</para>
<informalfigure><mediaobject><imageobject><imagedata scale="50" fileref="figures/nova-cactus-conceptual-coverage.png"/></imageobject></mediaobject></informalfigure>
<para>As you can see from the illustration, I've overlaid logical components of OpenStack Nova, Glance and Dashboard to denote functional coverage. For each of the overlays, I've added the name of the logical component within the project that provides the functionality. While all of these judgements are highly subjective, you can see that we have majority coverage of the functional areas with a few notable exceptions:</para>
<itemizedlist>
<listitem><para>The largest gap in our functional coverage is logging and billing. At the moment, OpenStack Nova doesn't have a billing component that can mediate logging events, rate the logs and create/present bills. That being said, most service providers will already have one (or <emphasis>many</emphasis>) of these so the focus is really on the logging and integration with billing. This could be remedied in a variety of ways: augmentations of the code (which should happen in the next release “Diablo”), integration with commercial products or services (perhaps <link xlink:href="http://www.zuora.com/">Zuora</link>) or custom log parsing. </para></listitem>
<listitem><para>Identity is also a point which will likely need to be augmented. Unless we are running a stock
LDAP for our identity system, we will need to integrate our solution with
OpenStack Compute. Having said that, this is true of almost all cloud
solutions.</para></listitem>
<listitem><para>The customer portal will also be an integration point. While OpenStack Compute provides a user
dashboard (to see running instances, launch new instances, etc.), it doesn't
provide an interface to allow application owners to sign up for service,
track their bills and lodge trouble tickets. Again, this is probably
something that is already in place at our imaginary service provider. </para></listitem>
<listitem><para>Ideally, the Admin API would replicate all functionality that we'd be able to do via the
command line interface (which in this case is mostly exposed through the
nova-manage command). This will get better in the Diablo release with the
<link xlink:href="http://wiki.openstack.org/NovaAdminAPI">Admin
API</link> work.</para></listitem><listitem><para>Cloud monitoring and operations will be an important area of focus for our service provider. A
key to any good operations approach is good tooling. While OpenStack Compute
provides nova-instancemonitor, which tracks compute node utilization, we're
really going to need a number of third-party tools for monitoring. </para></listitem>
<listitem><para>Policy is an extremely important area but very provider specific. Everything from quotas
(which are supported) to quality of service (QoS) to privacy controls can
fall under this. I've given OpenStack Nova partial coverage here, but that
might vary depending on the intricacies of the provider's needs. For the
record, the Cactus release of OpenStack Compute provides quotas for instances
(number and cores used), volumes (size and number), floating IP addresses and
metadata.</para></listitem>
<listitem><para>Scheduling within OpenStack Compute is fairly rudimentary for larger installations today. The
pluggable scheduler supports chance (random host assignment), simple (least
loaded) and zone (random nodes within an availability zone). As with most
areas on this list, this will be greatly augmented in Diablo. In development
are distributed schedulers and schedulers that understand heterogeneous
hosts (for support of GPUs and differing CPU architectures).</para></listitem></itemizedlist>
<para>As you can see, OpenStack Compute provides a fair basis for our mythical service provider, as
long as that provider is willing to do some integration here and
there. </para>
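<para>As a rough illustration only, the chance and simple scheduling strategies described in this
section can be sketched in a few lines of Python. The function names, the
<code>SCHEDULERS</code> table and the host load figures are hypothetical and are not taken
from the OpenStack Compute code; the sketch assumes that load is simply the number of
instances running on each host.</para>
<programlisting language="python"><![CDATA[
import random


def chance_scheduler(hosts):
    """Chance strategy: pick any known host at random, ignoring load."""
    return random.choice(sorted(hosts))


def simple_scheduler(hosts):
    """Simple strategy: pick the host with the fewest running instances."""
    return min(sorted(hosts), key=lambda name: hosts[name])


# Hypothetical registry standing in for the pluggable scheduler drivers.
SCHEDULERS = {"chance": chance_scheduler, "simple": simple_scheduler}


def schedule(driver, hosts):
    """Return the host name chosen by the named strategy."""
    return SCHEDULERS[driver](hosts)


# Hypothetical load figures: host name mapped to running instance count.
hosts = {"compute1": 7, "compute2": 2, "compute3": 5}
print(schedule("simple", hosts))   # least loaded host: compute2
print(schedule("chance", hosts))   # any one of the three hosts
]]></programlisting>
<para>Keeping each strategy behind the same one-argument interface is what makes it possible to
swap one for another without touching the caller, which is the point of a pluggable
scheduler.</para>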
<para>Note that since the time of this writing, OpenStack Identity Service has been
added.</para></section></section>
<section xml:id="why-cloud">
<title>Why Cloud?</title>
<para>In data centers today, many computers suffer the same underutilization in computing
power and networking bandwidth. For example, projects may need a large amount of
computing capacity to complete a computation, but no longer need the computing power
after completing the computation. You want cloud computing when you want a service
that's available on-demand with the flexibility to bring it up or down through
automation or with little intervention. The phrase "cloud computing" is often
represented with a diagram that contains a cloud-like shape indicating a layer where
responsibility for service goes from user to provider. The cloud in these types of
diagrams contains the services that afford computing power harnessed to get work done.
Much like the electrical power we receive each day, cloud computing provides subscribers
or users with access to a shared collection of computing resources: networks for
transfer, servers for storage, and applications or services for completing tasks. </para>
<para>These are the compelling features of a cloud:</para>
<itemizedlist spacing="compact">
<listitem>
<para>On-demand self-service: Users can provision servers and networks with little
human intervention. </para></listitem>
<listitem>
<para>Network access: Any computing capabilities are available over the network.
Many different devices are allowed access through standardized mechanisms. </para></listitem>
<listitem>
<para>Resource pooling: Multiple users can access clouds that serve other consumers
according to demand. </para></listitem>
<listitem>
<para>Elasticity: Provisioning is rapid and scales out or in based on need. </para></listitem>
<listitem>
<para>Metered or measured service: Just like utilities that are paid for by the
hour, clouds should optimize resource use and control it for the level of
service or type of service, such as storage or processing.</para></listitem>
</itemizedlist>
<para>Cloud computing offers different service models depending on the capabilities a
consumer may require. </para>
<itemizedlist>
<listitem><para>SaaS: Software as a Service. Provides the consumer the ability to use the software
in a cloud environment, such as web-based email. </para></listitem>
<listitem><para>PaaS: Platform as a Service. Provides the consumer the ability to deploy
applications through a programming language or tools supported by the cloud platform
provider. An example of platform as a service is an Eclipse/Java programming
platform provided with no downloads required. </para></listitem>
<listitem><para>IaaS: Infrastructure as a Service. Provides infrastructure such as compute
instances, network connections, and storage so that people can run any software or
operating system. </para></listitem>
</itemizedlist>
<para>When you hear terms such as public cloud or private cloud, these refer to the
deployment model for the cloud. A private cloud operates for a single organization, but
can be managed on-premise or off-premise. A public cloud has an infrastructure that is
available to the general public or a large industry group and is likely owned by a cloud
services company. NIST also defines a community cloud as one shared by several
organizations supporting a specific community with shared concerns. </para>
<para>Clouds can also be described as hybrid. A hybrid cloud can be a deployment model, as a
composition of both public and private clouds, or a hybrid model for cloud computing may
involve both virtual and physical servers. </para>
<para>What have people done with cloud computing? Cloud
computing can help with large-scale computing needs or can
lead consolidation efforts by virtualizing servers to make
more use of existing hardware and potentially release old
hardware from service. People also use cloud computing for
collaboration because of its high availability through
networked computers. Productivity suites for word
processing, number crunching, email communications,
and more are also available through cloud computing. Cloud
computing also makes additional storage available to the cloud
user, avoiding the need for additional hard drives on each
user's desktop and enabling access to huge data storage
capacity online in the cloud. </para>
<para>For a more detailed discussion of cloud computing's essential
characteristics and its models of service and deployment, see <link
xlink:href="http://www.nist.gov/itl/cloud/"
>http://www.nist.gov/itl/cloud/</link>, published by the US
National Institute of Standards and Technology.</para>
</section>
</chapter>