openstack-manuals/doc/src/docbkx/common/getstart.xml

<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0"
    xml:id="ch_getting-started-with-openstack">

    <title>Getting Started with OpenStack</title>
    <para>OpenStack is a collection of open source technology that provides massively scalable open
        source cloud computing software. Currently OpenStack develops two related projects:
        OpenStack Compute, which offers computing power through virtual machine and network
        management, and OpenStack Object Storage which is software for redundant, scalable object
        storage capacity. Closely related to the OpenStack Compute project is the Image Service
        project, named Glance. OpenStack can be used by corporations, service providers, VARS, SMBs,
        researchers, and global data centers looking to deploy large-scale cloud deployments for
        private or public clouds. </para>
    <section xml:id="what-is-openstack">
        <title>What is OpenStack?</title>
        <para>OpenStack offers open source software to build public and private clouds. OpenStack is
            a community and a project as well as open source software to help organizations run
            clouds for virtual computing or storage. OpenStack contains a collection of open source
            projects that are community-maintained including OpenStack Compute (code-named Nova),
            OpenStack Object Storage (code-named Swift), and OpenStack Image Service (code-named
            Glance). OpenStack provides an operating platform, or toolkit, for orchestrating clouds. </para>
        <para>OpenStack is more easily defined once the concepts of cloud computing become
            apparent, but we are on a mission: to provide scalable, elastic cloud computing for
            both public and private clouds, large and small. At the heart of our mission is a
            pair of basic requirements: clouds must be simple to implement and massively
            scalable.</para>
        <para>If you are new to OpenStack, you will undoubtedly have questions about installation,
            deployment, and usage. It can seem overwhelming at first. But don't fear, there are
            places to get information to guide you and to help resolve any issues you may run into
            during the on-ramp process. Because the project is so new and constantly changing, be
            aware of the revision time for all information. If you are reading a document that is a
            few months old and you feel that it isn't entirely accurate, then please let us know
            through the mailing list at <link xlink:href="https://launchpad.net/~openstack"
                >https://launchpad.net/~openstack</link> so it can be updated or removed. </para>
    </section>
    <section xml:id="components-of-openstack"><title>Components of OpenStack</title>
        <para>There are currently three main components of OpenStack: Compute, Object Storage, and
            Image Service. Let's look at each in turn.</para>
        <para>OpenStack Compute is a cloud fabric controller, used to start up virtual instances for
            either a user or a group. It's also used to configure networking for each instance or
            project that contains multiple instances for a particular project. </para>
        <para>OpenStack Object Storage is a system to store objects in a massively scalable large
            capacity system with built-in redundancy and failover. Object Storage has a variety of
            applications, such as backing up or archiving data, serving graphics or videos
            (streaming data to a user’s browser), storing secondary or tertiary static data,
            developing new applications with data storage integration, storing data when predicting
            storage capacity is difficult, and creating the elasticity and flexibility of
            cloud-based storage for your web applications.</para>
        <para>OpenStack Image Service is a lookup and retrieval system for virtual machine images.
            It can be configured in three ways: using OpenStack Object Store to store images; using
            Amazon's Simple Storage Solution (S3) storage directly; or using S3 storage with Object
            Store as the intermediate for S3 access.</para>
        <para>The following diagram shows the basic relationships between the projects, how they
            relate to each other, and how they can fulfill the goals of open source cloud computing. </para>

            <informalfigure>
            <mediaobject>
                    <imageobject>

                <imagedata fileref="figures/OpenStackCore.png"/>
            </imageobject>
        </mediaobject></informalfigure>
    </section>
    <section xml:id="openstack-architecture-overview"><title>OpenStack Project Architecture Overview</title>
        <para>by <link xlink:href="http://ken.pepple.info">Ken Pepple</link></para><para>Before we dive into the conceptual and logic architecture, let’s take a second to explain the OpenStack project: </para><blockquote><para>OpenStack is a collection of open source technologies delivering a massively scalable cloud operating system.</para></blockquote><para>You can think of it as software to power your own Infrastructure as a Service (IaaS) offering like <link xlink:href="http://aws.amazon.com">Amazon Web Services</link>. It currently encompasses three main projects:</para><itemizedlist><listitem><para><link xlink:href="https://launchpad.net/swift">Swift</link> which provides object/blob storage. This is roughly analogous to Rackspace Cloud Files (from which it is derived) or Amazon S3.</para></listitem><listitem><para><link xlink:href="https://launchpad.net/glance">Glance</link> which provides discovery, storage and retrieval of virtual machine images for OpenStack Nova.</para></listitem><listitem><para><link xlink:href="https://launchpad.net/nova">Nova</link> which provides virtual servers upon
                    demand. This is similar to Rackspace Cloud Servers or Amazon EC2.</para></listitem></itemizedlist><para>While these three projects provide the core of the cloud infrastructure, OpenStack is open and
            evolving — <link xlink:href="http://wiki.openstack.org/Projects">there will be more
                projects</link> (there are already related projects for <link
                xlink:href="https://launchpad.net/horizon">web interfaces</link> and a
                <link xlink:href="http://wiki.openstack.org/QueueService">queue service</link>).
            With that brief introduction, let’s delve into a conceptual architecture and then
            examine how OpenStack Compute could map to it. </para>
        <section xml:id="cloud-provider-conceptual-architecture">
            <info><author><personname><firstname>Ken</firstname><lineage>Pepple</lineage></personname></author><title>Cloud Provider Conceptual Architecture</title></info><para>Imagine that we are going to build our own IaaS cloud and offer it to customers. To achieve this, we would need to provide several high level features:</para><orderedlist><listitem><para>Allow application owners to register for our cloud services, view their usage and see their bill (basic customer relations management functionality)</para></listitem><listitem><para>Allow Developers/DevOps folks to create and store custom images for their applications (basic build-time functionality)</para></listitem><listitem><para>Allow DevOps/Developers to launch, monitor and terminate instances (basic run-time functionality)</para></listitem><listitem><para>Allow the Cloud Operator to configure and operate the cloud infrastructure</para></listitem></orderedlist><para>While there are certainly many, many other features that we
                would need to offer (especially if we were to follow a
                more complete industry framework like <link
                    xlink:href="http://www.tmforum.org/BusinessProcessFramework/1647/home.html"
                    >eTOM</link>), these four get to the very heart of
                providing IaaS. Now assuming that you agree with these
                four top level features, you might put together a
                conceptual architecture that looks something like
                this:</para>
                <informalfigure><mediaobject><imageobject><imagedata scale="70" fileref="figures/nova-cactus-conceptual.png"/></imageobject></mediaobject></informalfigure>
                <para>In this model, I’ve imagined four sets of users (developers, devops, owners and operators)
                that need to interact with the cloud and then separated out the functionality needed
                for each. From there, I’ve followed a pretty common tiered approach to the
                architecture (presentation, logic and resources) with two orthogonal areas
                (integration and management). Let’s explore each a little further: </para><itemizedlist><listitem><para>As with presentation layers in more typical application architectures, components here interact with users to accept and present information. In this layer, you will find web portals to provide graphical interfaces for non-developers and API endpoints for developers. For more advanced architectures, you might find load balancing, console proxies, security and naming services present here also.</para></listitem><listitem><para>The logic tier would provide the intelligence and control functionality for our cloud. This tier would house orchestration (workflow for complex tasks), scheduling (determining mapping of jobs to resources), policy (quotas and such) , image registry (metadata about instance images), logging (events and metering). </para></listitem><listitem><para>There will need to be integration functions within the architecture. It is assumed that most service providers will already have a customer identity and billing systems. Any cloud architecture would need to integrate with these systems.</para></listitem><listitem><para>As with any complex environment, we will need a management tier to operate the environment. This should include an API to access the cloud administration features as well as some forms of monitoring. It is likely that the monitoring functionality will take the form of integration into an existing tool. While I’ve highlighted monitoring and an admin API for our fictional provider, in a more complete architecture you would see a vast array of operational support functions like provisioning and configuration management.</para></listitem><listitem><para>Finally, since this is a compute cloud, we will need actual compute, network and storage resources to provide to our customers. This tier provides these services, whether they be servers, network switches, network attached storage or other resources.</para></listitem></itemizedlist><para>With this model in place, let’s shift gears and look at OpenStack Compute’s logical
                    architecture.</para></section><section xml:id="openstack-nova-logical-architecture"><title>OpenStack Compute Logical Architecture</title><para>Now that we’ve looked at a proposed conceptual architecture, let’s see how OpenStack Compute
                is logically architected. At the time of this writing, Cactus was the newest release
                (which means if you are viewing this after around July 2011, this may be out of
                date). There are several logical components of OpenStack Compute architecture but
                the majority of these components are custom written python daemons of two
                varieties:</para><itemizedlist><listitem><para>WSGI applications to receive and mediate API calls (<code>nova-api</code>, <code>glance-api</code>, etc.)</para></listitem><listitem><para>Worker daemons to carry out orchestration tasks (<code>nova-compute</code>, <code>nova-network</code>, <code>nova-schedule</code>, etc.)</para></listitem></itemizedlist><para>However, there are two essential pieces of the logical architecture are neither custom written nor Python based: the messaging queue and the database. These two components facilitate the asynchronous orchestration of complex tasks through message passing and information sharing. Putting this all together we get a picture like this:</para><informalfigure><mediaobject><imageobject><imagedata scale="70"  fileref="figures/nova-cactus-logical.png"/></imageobject></mediaobject></informalfigure><para>This complicated, but not overly informative, diagram as it can be summed up in three sentences:</para><itemizedlist><listitem><para>End users (DevOps, Developers and even other OpenStack components) talk to
                            <code>nova-api</code> to interface with OpenStack Compute</para></listitem><listitem><para>OpenStack Compute daemons exchange info through the queue (actions) and database (information)
                        to carry out API requests</para></listitem><listitem><para>OpenStack Glance is basically a completely separate infrastructure which OpenStack Compute
                        interfaces through the Glance API</para></listitem></itemizedlist><para>Now that we see the overview of the processes and their interactions, let’s take a closer look at each component.</para><itemizedlist><listitem><para>The <code>nova-api</code> daemon is the heart of the OpenStack Compute. You may see it
                        illustrated on many pictures of OpenStack Compute as API and “Cloud
                        Controller”. While this is partly true, cloud controller is really just a
                        class (specifically the CloudController in trunk/nova/api/ec2/cloud.py)
                        within the <code>nova-api</code> daemon. It provides an endpoint for all API
                        queries (either <link xlink:href="http://docs.rackspacecloud.com/api/"
                            >OpenStack API</link> or <link
                            xlink:href="http://docs.amazonwebservices.com/AWSEC2/latest/APIReference/"
                            >EC2 API</link>), initiates most of the orchestration activities (such
                        as running an instance) and also enforces some policy (mostly quota
                        checks).</para></listitem><listitem><para>The <code>nova-schedule</code> process is conceptually the simplest piece of code in OpenStack
                        Compute: take a virtual machine instance request from the queue and
                        determines where it should run (specifically, which compute server host it
                        should run on). In practice however, I am sure this will grow to be the most
                        complex as it needs to factor in current state of the entire cloud
                        infrastructure and apply complicated algorithm to ensure efficient usage. To
                        that end, <code>nova-schedule</code> implements a pluggable architecture
                        that let’s you choose (or write) your own algorithm for scheduling.
                        Currently, there are several to choose from (simple, chance, etc) and it is
                        a area of hot development for the future releases of OpenStack
                        Compute.</para></listitem><listitem><para>The <code>nova-compute</code> process is primarily a worker daemon that creates and terminates virtual machine instances. The process by which it does so is fairly complex (<link xlink:href="http://www.laurentluce.com/?p=227">see this blog post by Laurence Luce for the gritty details</link>) but the basics are simple: accept actions from the queue and then perform a series of system commands (like launching a KVM instance) to carry them out while updating state in the database.</para></listitem><listitem><para>As you can gather by the name, <code>nova-volume</code> manages the creation, attaching and detaching of persistent volumes to compute instances (similar functionality to <link xlink:href="http://aws.amazon.com/ebs/">Amazon’s Elastic Block Storage</link>). It can use volumes from a variety of providers, such as iSCSI.</para></listitem><listitem><para>The <code>nova-network</code> worker daemon is very similar to <code>nova-compute</code> and <code>nova-volume</code>. It accepts networking tasks from the queue and then performs tasks to manipulate the network (such as setting up bridging interfaces or changing <code>iptables</code> rules).</para></listitem><listitem><para>The queue provides a central hub for passing messages between daemons. This is currently implemented with <link xlink:href="http://www.rabbitmq.com/">RabbitMQ</link> today, but theoretically could be any <link xlink:href="http://www.amqp.org/confluence/display/AMQP/Advanced+Message+Queuing+Protocol">AMPQ message queue</link> supported by the python <link xlink:href="http://barryp.org/software/py-amqplib/">ampqlib</link>.</para></listitem><listitem><para>The <link xlink:href="http://en.wikipedia.org/wiki/SQL">SQL database</link> stores most of the
                        build-time and run-time state for a cloud infrastructure. This includes the
                        instance types that are available for use, instances in use, networks
                        available and projects. Theoretically, OpenStack Compute can support any
                        database supported by <link xlink:href="http://www.sqlalchemy.org/"
                            >SQL-Alchemy</link> but the only databases currently being widely used
                        are <link xlink:href="http://www.sqlite.org/">sqlite3</link> (only
                        appropriate for test and development work), <link
                            xlink:href="http://mysql.com/">MySQL</link> and <link
                            xlink:href="http://www.postgresql.org/">PostgreSQL</link>.</para></listitem><listitem><para>OpenStack Glance is a separate project from OpenStack Compute, but as shown above,
                        complimentary. While it is an optional part of the overall compute
                        architecture, I can’t imagine that most OpenStack Compute installations will
                        not be using it (or a complimentary product). There are three pieces to
                        Glance: <code>glance-api</code>, <code>glance-registry</code> and the image
                        store. As you can probably guess, <code>glance-api</code> accepts API calls,
                        much like <code>nova-api</code>, and the actual image blobs are placed in
                        the image store. The <code>glance-registry</code> stores and retrieves
                        metadata about images. The image store can be a number of different object
                        stores, include OpenStack Object Storage (Swift).</para></listitem><listitem><para>Finally, another optional project that we will need for our fictional service provider is an
                        user dashboard. I have picked the OpenStack Dashboard here, but there are
                        also several other web front ends available for OpenStack Compute. The
                        OpenStack Dashboard provides a web interface into OpenStack Compute to give
                        application developers and devops staff similar functionality to the API. It
                        is currently implemented as a <link
                            xlink:href="http://www.djangoproject.com/">Django</link> web
                        application.</para></listitem></itemizedlist><para>This logical architecture represents just one way to architect OpenStack Compute. With its
                pluggable architecture, we could easily swap out OpenStack Glance with another image
                service or use another dashboard. In the coming releases of OpenStack, expect to see
                more modularization of the code especially in the network and volume areas.</para></section>
        <section xml:id="nova-conceptual-mapping"><title>Nova Conceptual Mapping</title><para>Now that we’ve seen a conceptual architecture for a fictional cloud provider and examined the logical architecture of OpenStack Nova, it is fairly easy to map the OpenStack components to the conceptual areas to see what we are lacking:</para><informalfigure><mediaobject><imageobject><imagedata scale="50" fileref="figures/nova-cactus-conceptual-coverage.png"/></imageobject></mediaobject></informalfigure><para>As you can see from the illustration, I’ve overlaid logical components of OpenStack Nova, Glance and Dashboard to denote functional coverage. For each of the overlays, I’ve added the name of the logical component within the project that provides the functionality. While all of these judgements are highly subjective, you can see that we have a majority coverage of the functional areas with a few notable exceptions:</para><itemizedlist><listitem><para>The largest gap in our functional coverage is logging and billing. At the moment, OpenStack Nova doesn’t have a billing component that can mediate logging events, rate the logs and create/present bills. That being said, most service providers will already have one (or <emphasis>many</emphasis>) of these so the focus is really on the logging and integration with billing. This could be remedied in a variety of ways: augmentations of the code (which should happen in the next release “Diablo”), integration with commercial products or services (perhaps <link xlink:href="http://www.zuora.com/">Zuora</link>) or custom log parsing. </para></listitem><listitem><para>Identity is also a point which will likely need to be augmented. Unless we are running a stock
                        LDAP for our identity system, we will need to integrate our solution with
                        OpenStack Compute. Having said that, this is true of almost all cloud
                        solutions.</para></listitem><listitem><para>The customer portal will also be an integration point. While OpenStack Compute provides a user
                        dashboard (to see running instance, launch new instances, etc.), it doesn’t
                        provide an interface to allow application owners to signup for service,
                        track their bills and lodge trouble tickets. Again, this is probably
                        something that it is already in place at our imaginary service provider. </para></listitem><listitem><para>Ideally, the Admin API would replicate all functionality that we’d be able to do via the
                        command line interface (which in this case is mostly exposed through the
                        nova-manage command). This will get better in the Diablo release with the
                            <link xlink:href="http://wiki.openstack.org/NovaAdminAPI">Admin
                            API</link> work.</para></listitem><listitem><para>Cloud monitoring and operations will be an important area of focus for our service provider. A
                        key to any good operations approach is good tooling. While OpenStack Compute
                        provides nova-instancemonitor, which tracks compute node utilization, we’re
                        really going to need a number of third party tools for monitoring. </para></listitem><listitem><para>Policy is an extremely important area but very provider specific. Everything from quotas
                        (which are supported) to quality of service (QoS) to privacy controls can
                        fall under this. I’ve given OpenStack Nova partial coverage here, but that
                        might vary depending on the intricacies of the providers needs. For the
                        record, the Cactus release of OpenStack Compute provides quotas for instances
                        (number and cores used, volumes (size and number), floating IP addresses and
                        metadata.</para></listitem><listitem><para>Scheduling within OpenStack Compute is fairly rudimentary for larger installations today. The
                        pluggable scheduler supports chance (random host assignment), simple (least
                        loaded) and zone (random nodes within an availability zone). As within most
                        areas on this list, this will be greatly augmented in Diablo. In development
                        are distributed schedulers and schedulers that understand heterogeneous
                        hosts (for support of GPUs and differing CPU architectures).</para></listitem></itemizedlist><para>As you can see, OpenStack Compute provides a fair basis for our mythical service provider, as
                long as the mythical service providers are willing to do some integration here and
                there. </para>
            <para>Note that since the time of this writing, OpenStack Identity Service has been
                added.</para></section></section>
    <section xml:id="why-cloud">
        <title>Why Cloud?</title>
        <para>In data centers today, many computers suffer the same underutilization in computing
            power and networking bandwidth. For example, projects may need a large amount of
            computing capacity to complete a computation, but no longer need the computing power
            after completing the computation. You want cloud computing when you want a service
            that's available on-demand with the flexibility to bring it up or down through
            automation or with little intervention. The phrase "cloud computing" is often
            represented with a diagram that contains a cloud-like shape indicating a layer where
            responsibility for service goes from user to provider. The cloud in these types of
            diagrams contains the services that afford computing power harnessed to get work done.
            Much like the electrical power we receive each day, cloud computing provides subscribers
            or users with access to a shared collection of computing resources: networks for
            transfer, servers for storage, and applications or services for completing tasks. </para>
        <para>These are the compelling features of a cloud:</para>
        <itemizedlist spacing="compact">
            <listitem>
                <para>On-demand self-service: Users can provision servers and networks with little
                    human intervention. </para></listitem>
            <listitem>
                <para>Network access: Any computing capabilities are available over the network.
                    Many different devices are allowed access through standardized mechanisms. </para></listitem>
            <listitem>
                <para>Resource pooling: Multiple users can access clouds that serve other consumers
                    according to demand. </para></listitem>
            <listitem>
                <para>Elasticity: Provisioning is rapid and scales out or in based on need. </para></listitem>
            <listitem>
                <para>Metered or measured service: Just like utilities that are paid for by the
                    hour, clouds should optimize resource use and control it for the level of
                    service or type of servers such as storage or processing.</para></listitem>
        </itemizedlist>
        <para>Cloud computing offers different service models depending on the capabilities a
            consumer may require. </para>
        <itemizedlist>
            <listitem><para>SaaS: Software as a Service. Provides the consumer the ability to use the software
                in a cloud environment, such as web-based email for example. </para></listitem>
            <listitem><para>PaaS: Platform as a Service. Provides the consumer the ability to deploy
                applications through a programming language or tools supported by the cloud platform
                provider. An example of platform as a service is an Eclipse/Java programming
                platform provided with no downloads required. </para></listitem>
            <listitem><para>IaaS: Infrastructure as a Service. Provides infrastructure such as computer
                instances, network connections, and storage so that people can run any software or
                operating system. </para></listitem>
        </itemizedlist>
        <para>When you hear terms such as public cloud or private cloud, these refer to the
            deployment model for the cloud. A private cloud operates for a single organization, but
            can be managed on-premise or off-premise. A public cloud has an infrastructure that is
            available to the general public or a large industry group and is likely owned by a cloud
            services company. The NIST also defines community cloud as shared by several
            organizations supporting a specific community with shared concerns. </para>
        <para>Clouds can also be described as hybrid. A hybrid cloud can be a deployment model, as a
            composition of both public and private clouds, or a hybrid model for cloud computing may
            involve both virtual and physical servers. </para>
        <para>What have people done with cloud computing? Cloud
            computing can help with large-scale computing needs or can
            lead consolidation efforts by virtualizing servers to make
            more use of existing hardware and potentially release old
            hardware from service. People also use cloud computing for
            collaboration because of its high availability through
            networked computers. Productivity suites for word
            processing, number crunching, and email communications,
            and more are also available through cloud computing. Cloud
            computing also avails additional storage to the cloud
            user, avoiding the need for additional hard drives on each
            user's desktop and enabling access to huge data storage
            capacity online in the cloud. </para>
        <para>For a more detailed discussion of cloud computing's essential
            characteristics and its models of service and deployment, see <link
                xlink:href="http://www.nist.gov/itl/cloud/"
                >http://www.nist.gov/itl/cloud/</link>, published by the US
            National Institute of Standards and Technology.</para>
    </section>
</chapter>