===============
Server concepts
===============

For the OpenStack Compute API, a server is a virtual machine (VM) instance,
a physical machine or a container.

Server status
~~~~~~~~~~~~~

TODO: This section's content is old; we need to update the status list.
The ``task_state`` and ``vm_state`` values, which are exposed to
administrators, need descriptions to help users understand the difference.

You can filter the list of servers by image, flavor, name, and status
through the respective query parameters.

A server contains a ``status`` attribute that indicates the current server
state. You can filter on the server status when you make a list servers
request. The server status is returned in the response body and is one of
the following values:

**Server status values**

- ``ACTIVE``: The server is active.

- ``BUILD``: The server has not yet finished the original build process.

- ``DELETED``: The server is deleted.

- ``ERROR``: The server is in error.

- ``HARD_REBOOT``: The server is hard rebooting. This is equivalent to
  pulling the power plug on a physical server, plugging it back in, and
  rebooting it.

- ``MIGRATING``: The server is migrating. This is caused by a
  live migration (moving a server that is active) action.

- ``PASSWORD``: The password is being reset on the server.

- ``PAUSED``: The server is paused.

- ``REBOOT``: The server is in a soft reboot state. A reboot command
  was passed to the operating system.

- ``REBUILD``: The server is currently being rebuilt from an image.

- ``RESCUE``: The server is in rescue mode.

- ``RESIZE``: The server is performing the differential copy of data that
  changed during its initial copy. The server is down for this stage.

- ``REVERT_RESIZE``: The resize or migration of a server failed for
  some reason. The destination server is being cleaned up and the
  original source server is restarting.

- ``SHELVED``: The server is in shelved state. Depending on the shelve
  offload time, the server will be automatically offloaded.

- ``SHELVED_OFFLOADED``: The shelved server is offloaded (removed from the
  compute host) and needs an unshelve action before it can be used again.

- ``SHUTOFF``: The server was powered down by the user, but not through
  the OpenStack Compute API. For example, the user issued a ``shutdown -h``
  command from within the server. If the OpenStack Compute manager detects
  that the VM was powered down, it transitions the server to the
  ``SHUTOFF`` status. If you use the OpenStack Compute API to restart the
  server, it might be deleted first, depending on the value in the
  ``shutdown_terminate`` database field on the Instance model.

- ``SOFT_DELETED``: The server is marked as deleted but is kept in the
  cloud for a configurable period, during which an authorized user can
  restore it to a normal state. When the time expires, the server is
  deleted permanently.

- ``SUSPENDED``: The server is suspended, either by request or
  necessity. This status appears for only the following hypervisors:
  XenServer/XCP, KVM, and ESXi. Administrative users may suspend a
  server if it is infrequently used or to perform system maintenance.
  When you suspend a server, its state is stored on disk, all
  memory is written to disk, and the server is stopped.
  Suspending a server is similar to placing a device in hibernation;
  memory and vCPUs become available to create other servers.

- ``UNKNOWN``: The state of the server is unknown. Contact your cloud
  provider.

- ``VERIFY_RESIZE``: The system is awaiting confirmation that the server
  is operational after a move or resize.

The compute provisioning algorithm has an anti-affinity property that
attempts to spread customer VMs across hosts. Under certain situations,
VMs from the same customer might be placed on the same host. The
``hostId`` attribute represents the host your server runs on and can be
used to detect this situation if it is relevant to your application.

.. note:: ``hostId`` is unique *per account* and is not globally unique.

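As a rough illustration of the ``hostId`` check described above, the
sketch groups servers from a list servers response by ``hostId`` and
reports any host shared by more than one server. It assumes each server
has been decoded into a dict with ``name`` and ``hostId`` keys and that an
empty ``hostId`` means the server has not been placed yet; the sample
values are hypothetical. Because ``hostId`` is unique per account, only
compare servers that belong to the same account.

.. code:: python

    from collections import defaultdict


    def colocated_servers(servers):
        """Map each shared hostId to the names of the servers running on it."""
        by_host = defaultdict(list)
        for server in servers:
            host_id = server.get("hostId")
            if not host_id:  # assumption: empty hostId means not yet placed
                continue
            by_host[host_id].append(server["name"])
        # Keep only hosts that run more than one of the given servers.
        return {host: names for host, names in by_host.items() if len(names) > 1}


    servers = [
        {"name": "web1", "hostId": "a1b2"},
        {"name": "web2", "hostId": "a1b2"},
        {"name": "db1", "hostId": "c3d4"},
        {"name": "new", "hostId": ""},
    ]
    print(colocated_servers(servers))  # {'a1b2': ['web1', 'web2']}

An empty result means that, as far as ``hostId`` shows, none of the given
servers share a host.
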
Server creation
~~~~~~~~~~~~~~~

Status transition: ``BUILD`` to ``ACTIVE``, or ``BUILD`` to ``ERROR`` (on
error).

When you create a server, the operation asynchronously provisions a new
server. The progress of this operation depends on several factors,
including the location of the requested image, network I/O, host load,
and the selected flavor. You can check the progress of the request by
performing a **GET** on ``/servers/{id}``, which returns a ``progress``
attribute (from 0% to 100% complete). The full URL to the newly created
server is returned through the ``Location`` header and is available as a
``self`` and ``bookmark`` link in the server representation. Note that
when creating a server, only the server ID, its links, and the
administrative password are guaranteed to be returned in the response.
You can retrieve additional attributes by performing subsequent **GET**
operations on the server.

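The asynchronous creation flow can be sketched as a polling loop. This is
a minimal sketch, not a client library: ``get_server`` is a hypothetical
caller-supplied function that performs **GET** on ``/servers/{id}`` and
returns the decoded ``server`` object as a dict, and the default timeout
and interval are arbitrary.

.. code:: python

    import time


    def wait_for_server(get_server, timeout=600.0, interval=5.0,
                        clock=time.monotonic, sleep=time.sleep):
        """Poll until the server leaves BUILD; return it once it is ACTIVE."""
        deadline = clock() + timeout
        while True:
            server = get_server()
            status = server["status"]
            if status == "ACTIVE":
                return server
            if status == "ERROR":
                raise RuntimeError(f"server {server.get('id')} is in ERROR")
            if clock() >= deadline:
                raise TimeoutError(
                    f"server still {status} ({server.get('progress')}%) "
                    f"after {timeout} seconds")
            sleep(interval)

Passing ``clock`` and ``sleep`` in as parameters keeps the loop testable
without real waiting; a fake ``get_server`` that returns ``BUILD`` twice
and then ``ACTIVE`` makes the function return on the third poll.
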
Server query
~~~~~~~~~~~~

Nova allows both general users and administrators to filter the server
query result by using query options.

For general users, ``reservation_id``, ``name``, ``status``, ``image``,
``flavor``, ``ip``, ``changes-since``, and ``ip6`` (microversion 2.5) are
the supported options. Other options are silently ignored by nova, which
only records them in a debug log.

For administrators, more fields can be used. The ``all_tenants`` option
allows the servers owned by all tenants to be reported (otherwise only
the servers associated with the calling tenant are included in the
response). Additionally, the filter is applied to the database schema
definition of ``class Instance``. For example, because there is a field
named ``locked`` in the schema, the filter can use ``locked`` as a search
option to filter servers.

There are also some special options, such as ``changes-since``, that are
interpreted by nova.

- **General user & Administrator supported options**

  General user supported options are listed above. Administrators can
  use almost all options except the parameters for sorting and
  pagination.

.. code::

  Precondition:
  There are 2 servers in the cloud with the following info:

  {
      "servers": [
          {
              "name": "t1",
              "locked": "true",
              ...
          },
          {
              "name": "t2",
              "locked": "false",
              ...
          }
      ]
  }

**Example: General user query server with administrator only options**

.. code::

  Request with non-administrator context:
  GET /servers/detail?locked=1
  Note that 'locked' is not returned through the API layer

  Response:
  {
      "servers": [
          {
              "name": "t1",
              ...
          },
          {
              "name": "t2",
              ...
          }
      ]
  }

**Example: Administrator query server with administrator only options**

.. code::

  Request with administrator context:
  GET /servers/detail?locked=1

  Response:
  {
      "servers": [
          {
              "name": "t1",
              ...
          }
      ]
  }

- **Exact matching and regex matching of the search options**

  Depending on the name of a filter, matching for that filter is performed
  using either exact matching or regular expression matching.
  ``project_id``, ``user_id``, ``image_ref``, ``vm_state``,
  ``instance_type_id``, ``uuid``, ``metadata``, ``host``, and
  ``system_metadata`` are the options that are applied by exact matching
  when filtering.

**Example: User query server using exact matching on host**

.. code::

  Precondition:
  Request with administrator context:
  GET /servers/detail

  Response:
  {
      "servers": [
          {
              "name": "t1",
              "OS-EXT-SRV-ATTR:host": "devstack",
              ...
          },
          {
              "name": "t2",
              "OS-EXT-SRV-ATTR:host": "devstack1",
              ...
          }
      ]
  }

  Request with administrator context:
  GET /servers/detail?host=devstack

  Response:
  {
      "servers": [
          {
              "name": "t1",
              "OS-EXT-SRV-ATTR:host": "devstack",
              ...
          }
      ]
  }

**Example: Query server using regex matching on name**

.. code::

  Precondition:
  Request with administrator context:
  GET /servers/detail

  Response:
  {
      "servers": [
          {
              "name": "test11",
              ...
          },
          {
              "name": "test21",
              ...
          },
          {
              "name": "t1",
              ...
          },
          {
              "name": "t14",
              ...
          }
      ]
  }

  Request with administrator context:
  GET /servers/detail?name=t1

  Response:
  {
      "servers": [
          {
              "name": "test11",
              ...
          },
          {
              "name": "t1",
              ...
          },
          {
              "name": "t14",
              ...
          }
      ]
  }

**Example: User query server using exact matching on host and regex
matching on name**

.. code::

  Precondition:
  Request with administrator context:
  GET /servers/detail

  Response:
  {
      "servers": [
          {
              "name": "test1",
              "OS-EXT-SRV-ATTR:host": "devstack",
              ...
          },
          {
              "name": "t2",
              "OS-EXT-SRV-ATTR:host": "devstack1",
              ...
          },
          {
              "name": "test3",
              "OS-EXT-SRV-ATTR:host": "devstack1",
              ...
          }
      ]
  }

  Request with administrator context:
  GET /servers/detail?host=devstack1&name=test

  Response:
  {
      "servers": [
          {
              "name": "test3",
              "OS-EXT-SRV-ATTR:host": "devstack1",
              ...
          }
      ]
  }

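The matching rules above can be modelled client-side to predict what a
filtered query returns. This is a simplified sketch, not nova's
implementation (the database layer performs the real matching): it treats
servers as plain dicts, uses a ``host`` key rather than the
``OS-EXT-SRV-ATTR:host`` attribute shown in the responses, and
approximates regex matching with Python's ``re.search``.

.. code:: python

    import re
    from urllib.parse import urlencode

    # Filters listed above as exact-match; all others are treated as regexes.
    EXACT_MATCH_FILTERS = {
        "project_id", "user_id", "image_ref", "vm_state", "instance_type_id",
        "uuid", "metadata", "host", "system_metadata",
    }


    def server_matches(server, filters):
        """Return True if the server satisfies every filter."""
        for key, wanted in filters.items():
            actual = server.get(key)
            if actual is None:
                return False
            if key in EXACT_MATCH_FILTERS:
                if actual != wanted:
                    return False
            elif re.search(wanted, str(actual)) is None:
                return False
        return True


    def filter_names(servers, filters):
        return [s["name"] for s in servers if server_matches(s, filters)]


    def detail_url(filters):
        """Build the GET /servers/detail path and query string."""
        return "/servers/detail?" + urlencode(filters)


    servers = [
        {"name": "test1", "host": "devstack"},
        {"name": "t2", "host": "devstack1"},
        {"name": "test3", "host": "devstack1"},
    ]
    print(detail_url({"host": "devstack1", "name": "test"}))
    # /servers/detail?host=devstack1&name=test
    print(filter_names(servers, {"host": "devstack1", "name": "test"}))
    # ['test3']

In this model ``name=t1`` matches ``test11`` and ``t14`` as well as
``t1``, because an unanchored pattern matches anywhere in the value.
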
- **Special keys are used to tweak the query**

  ``changes-since`` returns instances updated after the given time,
  ``deleted`` returns (or excludes) deleted instances, and ``soft_deleted``
  modifies the behavior of ``deleted`` to either include or exclude
  instances whose ``vm_state`` is ``SOFT_DELETED``. Please see:
  :doc:`polling_changes-since_parameter`

**Example: User query server with special keys changes-since**

.. code::

  Precondition:
  GET /servers/detail

  Response:
  {
      "servers": [
          {
              "name": "t1",
              "updated": "2015-12-15T15:55:52Z",
              ...
          },
          {
              "name": "t2",
              "updated": "2015-12-17T15:55:52Z",
              ...
          }
      ]
  }

  GET /servers/detail?changes-since=2015-12-16T15:55:52Z

  Response:
  {
      "servers": [
          {
              "name": "t2",
              "updated": "2015-12-17T15:55:52Z",
              ...
          }
      ]
  }

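To illustrate the ``changes-since`` example, the sketch builds the query
with ``urllib.parse.urlencode``, which percent-encodes the timestamp
instead of wrapping it in quotes, and applies the same cutoff to a list of
servers client-side. Treating the boundary as inclusive is an assumption
of this sketch; see :doc:`polling_changes-since_parameter` for the
authoritative behavior.

.. code:: python

    from datetime import datetime, timezone
    from urllib.parse import urlencode

    TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


    def parse_utc(value):
        """Parse a timestamp such as 2015-12-16T15:55:52Z as aware UTC."""
        return datetime.strptime(value, TIMESTAMP_FORMAT).replace(
            tzinfo=timezone.utc)


    def changes_since_query(since):
        return "/servers/detail?" + urlencode({"changes-since": since})


    def updated_since(servers, since):
        """Names of servers whose 'updated' time is at or after the cutoff."""
        cutoff = parse_utc(since)
        return [s["name"] for s in servers if parse_utc(s["updated"]) >= cutoff]


    servers = [
        {"name": "t1", "updated": "2015-12-15T15:55:52Z"},
        {"name": "t2", "updated": "2015-12-17T15:55:52Z"},
    ]
    print(changes_since_query("2015-12-16T15:55:52Z"))
    # /servers/detail?changes-since=2015-12-16T15%3A55%3A52Z
    print(updated_since(servers, "2015-12-16T15:55:52Z"))
    # ['t2']
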
Server actions
~~~~~~~~~~~~~~

- **Reboot**

  Use this function to perform either a soft or hard reboot of a
  server. With a soft reboot, the operating system is signaled to
  restart, which allows for a graceful shutdown of all processes. A
  hard reboot is the equivalent of power cycling the server. The
  virtualization platform should ensure that the reboot action has
  completed successfully even in cases in which the underlying
  domain/VM is paused or halted/stopped.

- **Rebuild**

  Use this function to remove all data on the server and replace it
  with the specified image. The server ID and IP addresses remain the
  same.

- **Evacuate**

  Should a nova-compute service actually go offline, it can no longer
  report the status of any of the servers on it. This means they will be
  listed in an ``ACTIVE`` state forever.

  Evacuate is a workaround for this that lets an administrator
  forcibly rebuild these servers on another node. It makes
  no guarantees that the host was actually down, so fencing is
  left as an exercise to the deployer.

- **Resize** (including **Confirm resize**, **Revert resize**)

  Use this function to convert an existing server to a different
  flavor, in essence, scaling the server up or down. The original
  server is saved for a period of time to allow rollback if there is a
  problem. All resizes should be tested and explicitly confirmed, at
  which time the original server is removed. The resized server may be
  automatically confirmed based on the administrator's configuration of
  the deployment.

  The Confirm resize action deletes the old server in the virt layer.
  The server spawned in the virt layer is used from then on.
  Conversely, the Revert resize action deletes the new server
  spawned in the virt layer and reverts all changes. The original server
  is used from then on.

  Also, there is a periodic task configured by the
  ``resize_confirm_window`` configuration option (in seconds). If this
  value is not 0, nova-compute checks whether the server has been in the
  resized state for longer than ``resize_confirm_window`` and, if so,
  automatically confirms the resize of the server.

- **Pause**, **Unpause**

  You can pause a server by making a pause request. This request stores
  the state of the VM in RAM. A paused server continues to run in a
  frozen state.

  Unpause returns a paused server back to an active state.

- **Suspend**, **Resume**

  Administrative users might want to suspend a server if it is
  infrequently used or to perform system maintenance. When you suspend
  a server, its VM state is stored on disk, all memory is written to
  disk, and the virtual machine is stopped. Suspending a server is
  similar to placing a device in hibernation; memory and vCPUs become
  available to create other servers.

  Resume returns a suspended server to an active state.

- **Snapshot**

  You can save the current state of the server root disk and upload it
  to the glance image repository. A server can later be booted again
  using this saved image.

- **Backup**

  You can use the backup method to store the server's current state in
  the glance repository. At the same time, old snapshots are removed
  based on the given ``daily`` or ``weekly`` type.

- **Start**

  Power on the server.

- **Stop**

  Power off the server.

- **Delete**, **Restore**

  Delete powers off the given server, detaches all resources associated
  with the server, such as networks and volumes, and then deletes the
  server.

  The configuration option ``reclaim_instance_interval`` (in seconds)
  decides whether a deleted server is kept in the system. If this value
  is greater than 0, the server is not deleted immediately; instead it is
  put into a queue until it is too old (the time since deletion is greater
  than ``reclaim_instance_interval``). An administrator can use the
  Restore action to recover the server from the delete queue. If the
  deleted server remains longer than ``reclaim_instance_interval``, it is
  deleted by the compute service automatically.

- **Shelve**, **Shelve offload**, **Unshelve**

  Shelving a server indicates it will not be needed for some time and may
  be temporarily removed from the hypervisors. This allows its resources
  to be freed up for use by someone else.

  By default, the configuration option ``shelved_offload_time`` is 0 and
  the shelved server is removed from the hypervisor immediately after the
  shelve operation. Otherwise, the resources are kept for the value of
  ``shelved_offload_time`` (in seconds) so that an unshelve action during
  that period is faster; a periodic task then removes the server from the
  hypervisor after ``shelved_offload_time`` has passed. Set
  ``shelved_offload_time`` to -1 to never offload.

  Shelve powers off the given server and takes a snapshot if it is booted
  from an image. The server can then be offloaded from the compute host
  and its resources deallocated. Offloading is done immediately if the
  server is booted from a volume, but if it is booted from an image the
  offload can be delayed for some time or indefinitely, leaving the image
  on disk and the resources still allocated.

  Shelve offload is used to explicitly remove a shelved server that has
  been left on a host. This action can only be used on a shelved server
  and is usually performed by an administrator.

  Unshelve is the reverse operation of Shelve. It builds and boots the
  server again, on a new scheduled host if it was offloaded, using the
  shelved image in the glance repository if booted from an image.

- **Lock**, **Unlock**

  Lock a server so that no further actions are allowed on it. By default,
  only the server's owner or an administrator can lock the server, and an
  administrator can override the owner's lock.

  Unlock unlocks a server in the locked state so that additional
  operations can be performed on it. By default, only the owner or an
  administrator can unlock the server.

- **Rescue**, **Unrescue**

  The rescue operation starts a server in a special configuration whereby
  it is booted from a special root disk image. This enables the tenant to
  try to restore a broken guest system.

  Unrescue is the reverse action of Rescue. The server spawned from the
  special root image is deleted.

- **Set administrator password**

  Sets the root/administrator password for the given server. It uses an
  optionally installed agent to set the administrator password.

- **Migrate**, **Live migrate**

  Migrate is usually used by administrators. It moves a server to another
  host; it uses the resize action but with the same flavor, so during
  migration the server is powered off and rebuilt on another host.

  Live migrate also moves a server from one host to another, but it
  generally does not power off the server, so the server does not suffer
  downtime. Administrators may use this to evacuate servers from a host
  that needs to undergo maintenance tasks.

- **Trigger crash dump**

  Trigger crash dump is usually used by either an administrator or the
  server's owner. It dumps the memory image as a dump file into the given
  server and then reboots the kernel. This feature depends on the trigger
  settings (for example, NMI) in the server.

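The time windows that govern automatic resize confirmation, reclaiming
soft-deleted servers, and offloading shelved servers can be summarized as
three small predicates. This is a hedged sketch of the rules as described
in this section, not nova's periodic task code: the function names are
hypothetical, timestamps are assumed to be naive UTC ``datetime`` values,
and whether each boundary is inclusive is an assumption.

.. code:: python

    from datetime import datetime, timedelta


    def should_auto_confirm(resized_at, now, resize_confirm_window):
        """A window of 0 disables automatic resize confirmation."""
        if resize_confirm_window <= 0:
            return False
        return now - resized_at > timedelta(seconds=resize_confirm_window)


    def should_reclaim(deleted_at, now, reclaim_instance_interval):
        """An interval <= 0 means the server is deleted at once, not queued."""
        if reclaim_instance_interval <= 0:
            return True
        return now - deleted_at > timedelta(seconds=reclaim_instance_interval)


    def should_offload(shelved_at, now, shelved_offload_time):
        """0 offloads immediately; a negative value (documented: -1) never does."""
        if shelved_offload_time < 0:
            return False
        if shelved_offload_time == 0:
            return True
        return now - shelved_at > timedelta(seconds=shelved_offload_time)


    now = datetime(2016, 1, 1, 12, 0, 0)
    print(should_auto_confirm(now - timedelta(hours=2), now, 3600))  # True
    print(should_offload(now - timedelta(days=30), now, -1))         # False
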
Server passwords
~~~~~~~~~~~~~~~~

You can specify a password when you create the server through the
optional ``adminPass`` attribute. The specified password must meet the
complexity requirements set by your OpenStack Compute provider. The
server might enter an ``ERROR`` state if the complexity requirements are
not met. In this case, a client can issue a change password action to
reset the server password.

If a password is not specified, a randomly generated password is
assigned and returned in the response object. This password is
guaranteed to meet the security requirements set by the compute
provider. For security reasons, the password is not returned in
subsequent **GET** calls.

Server metadata
~~~~~~~~~~~~~~~

Custom server metadata can also be supplied at launch time. The maximum
size of the metadata key and value is 255 bytes each. The maximum number
of key-value pairs that can be supplied per server is determined by the
compute provider and may be queried via the ``maxServerMeta`` absolute
limit.

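Before launching a server, a client can check its metadata against these
limits. A minimal sketch, assuming the limit is read from the ``absolute``
section of a **GET** ``/limits`` response that has already been decoded
into a dict; the helper names and the sample limit value are
hypothetical.

.. code:: python

    MAX_META_BYTES = 255  # per key and per value, as stated above


    def max_server_meta(limits_response):
        """Extract maxServerMeta from a decoded GET /limits response."""
        return limits_response["limits"]["absolute"]["maxServerMeta"]


    def metadata_problems(metadata, max_pairs):
        """Return readable problems; an empty list means the metadata fits."""
        problems = []
        if len(metadata) > max_pairs:
            problems.append(f"{len(metadata)} pairs exceed the limit of {max_pairs}")
        for key, value in metadata.items():
            if len(key.encode("utf-8")) > MAX_META_BYTES:
                problems.append(f"key {key[:16]!r}... exceeds {MAX_META_BYTES} bytes")
            if len(value.encode("utf-8")) > MAX_META_BYTES:
                problems.append(f"value for {key[:16]!r} exceeds {MAX_META_BYTES} bytes")
        return problems


    limits = {"limits": {"absolute": {"maxServerMeta": 128}}}
    print(metadata_problems({"role": "web"}, max_server_meta(limits)))  # []

Sizes are measured on the UTF-8 encoding, so a key made of multi-byte
characters reaches the 255-byte limit with fewer than 255 characters.
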
Block Device Mapping
~~~~~~~~~~~~~~~~~~~~

TODO: Add some description about BDM.

Scheduler Hints
~~~~~~~~~~~~~~~

TODO: Add a description of how to customize the scheduling policy for
server booting.

Server Consoles
~~~~~~~~~~~~~~~

TODO: We have multiple endpoints for consoles; we should explain them.

Server networks
~~~~~~~~~~~~~~~

Networks to which the server connects can also be supplied at launch
time. One or more networks can be specified. Users can also specify a
specific port on the network or the fixed IP address to assign to the
server interface.

Considerations
~~~~~~~~~~~~~~

- The maximum size limit for personality file contents refers to the
  number of bytes in the decoded data and not the number of characters
  in the encoded data.

- The maximum number of file path/content pairs that you can supply is
  also determined by the compute provider and is defined by the
  ``maxPersonality`` absolute limit.

- The absolute limit, ``maxPersonalitySize``, is a byte limit that is
  guaranteed to apply to all images in the deployment. Providers can
  set additional per-image personality limits.

- The file injection might not occur until after the server is built and
  booted.

- After file injection, personality files are accessible by only system
  administrators. For example, on Linux, all files have root and the root
  group as the owner and group owner, respectively, and allow user and
  group read access only (octal 440).

Server access addresses
~~~~~~~~~~~~~~~~~~~~~~~

In a hybrid environment, the IP address of a server might not be
controlled by the underlying implementation. Instead, the access IP
address might be part of the dedicated hardware; for example, a
router/NAT device. In this case, the addresses provided by the
implementation cannot actually be used to access the server (from
outside the local LAN). Here, a separate *access address* may be
assigned at creation time to provide access to the server. This address
may not be directly bound to a network interface on the server and may
not necessarily appear when a server's addresses are queried.
Nonetheless, clients that must access the server directly are encouraged
to do so via an access address. In the example below, an IPv4 address is
assigned at creation time.

**Example: Create server with access IP: JSON request**

.. code::

  {
      "server": {
          "name": "new-server-test",
          "imageRef": "52415800-8b69-11e0-9b19-734f6f006e54",
          "flavorRef": "52415800-8b69-11e0-9b19-734f1195ff37",
          "accessIPv4": "67.23.10.132"
      }
  }

.. note:: Both IPv4 and IPv6 addresses may be used as access addresses and
   both addresses may be assigned simultaneously as illustrated below.
   Access addresses may be updated after a server has been created.

**Example: Create server with multiple access IPs: JSON request**

.. code::

  {
      "server": {
          "name": "new-server-test",
          "imageRef": "52415800-8b69-11e0-9b19-734f6f006e54",
          "flavorRef": "52415800-8b69-11e0-9b19-734f1195ff37",
          "accessIPv4": "67.23.10.132",
          "accessIPv6": "::babe:67.23.10.132"
      }
  }

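A client might assemble the request bodies above programmatically. The
sketch is an illustration that assumes the image and flavor references
are already known; it uses Python's ``ipaddress`` module only to reject
malformed access addresses and sends the address strings unchanged, since
normalizing them could rewrite the embedded-IPv4 form of the IPv6 example.

.. code:: python

    import ipaddress
    import json


    def create_server_body(name, image_ref, flavor_ref,
                           access_ipv4=None, access_ipv6=None):
        """Return a JSON create-server request with optional access addresses."""
        server = {"name": name, "imageRef": image_ref, "flavorRef": flavor_ref}
        if access_ipv4 is not None:
            ipaddress.IPv4Address(access_ipv4)  # raises ValueError if malformed
            server["accessIPv4"] = access_ipv4
        if access_ipv6 is not None:
            ipaddress.IPv6Address(access_ipv6)  # raises ValueError if malformed
            server["accessIPv6"] = access_ipv6
        return json.dumps({"server": server}, indent=4)


    print(create_server_body(
        "new-server-test",
        "52415800-8b69-11e0-9b19-734f6f006e54",
        "52415800-8b69-11e0-9b19-734f1195ff37",
        access_ipv4="67.23.10.132",
        access_ipv6="::babe:67.23.10.132",
    ))
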
Moving servers
~~~~~~~~~~~~~~

There are several actions that may result in a server moving from one
compute host to another, including shelve, resize, migrations and
evacuate. The following use cases demonstrate the intention of the
actions and the consequences for operational procedures.

Cloud operator needs to move a server
-------------------------------------

Sometimes a cloud operator may need to redistribute workloads for
operational purposes. For example, the operator may need to remove
a compute host for maintenance or deploy a kernel security patch that
requires the host to be rebooted.

The operator has two actions available for deliberately moving
workloads: cold migration (moving a server that is not active)
and live migration (moving a server that is active).

Cold migration moves a server from one host to another by copying its
state, local storage and network configuration to new resources
allocated on a new host selected by scheduling policies. The operation is
relatively quick as the server is not changing its state during the copy
process. The user does not have access to the server during the operation.

Live migration moves a server from one host to another while it
is active, so it is constantly changing its state during the action.
As a result it can take considerably longer than cold migration.
During the action the server is online and accessible, but only
a limited set of management actions are available to the user.

The following are common patterns for employing migrations in
a cloud:

- **Host maintenance**

  If a compute host is to be removed from the cloud, all its servers
  will need to be moved to other hosts. In this case it is normal for
  the rest of the cloud to absorb the workload, redistributing
  the servers by rescheduling them.

  To prepare the host, it is disabled so that it does not receive
  any further servers. Then each server is migrated to a new
  host by cold or live migration, depending on the state of the
  server. When complete, the host is ready to be removed.

- **Rolling updates**

  Often it is necessary to perform an update on all compute hosts
  which requires them to be rebooted. In this case it is not
  strictly necessary to move inactive servers because they
  will be available after the reboot. However, active servers would
  be impacted by the reboot. Live migration will allow them to
  continue operation.

  In this case a rolling approach can be taken by starting with an
  empty compute host that has been updated and rebooted. Another host
  that has not yet been updated is disabled and all its servers are
  migrated to the new host. When the migrations are complete, the
  new host continues normal operation. The old host will be empty
  and can be updated and rebooted. It then becomes the new target for
  another round of migrations.

  This process can be repeated until the whole cloud has been updated,
  usually using a pool of empty hosts instead of just one.

- **Resource optimization**

  To reduce energy usage, some cloud operators will try to move
  servers so they fit into the minimum number of hosts, allowing
  some hosts to be turned off.

  Sometimes higher performance might be wanted, so servers are
  spread out between the hosts to minimize resource contention.

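The rolling update pattern can be expressed as a short planning loop.
This sketch only produces an ordered list of operator steps from a
hypothetical inventory (host name to server names); it does not call any
API, and it assumes a single spare host that has already been updated,
rather than the pool of empty hosts usually used in practice.

.. code:: python

    def plan_rolling_update(hosts, spare):
        """Return the ordered steps for updating every host in ``hosts``.

        hosts: mapping of not-yet-updated host name -> list of server names.
        spare: an empty host that has already been updated and rebooted.
        """
        steps = []
        target = spare
        for host in sorted(hosts):  # sorted only to make the plan deterministic
            steps.append(f"disable {host}")
            for server in hosts[host]:
                steps.append(f"migrate {server} from {host} to {target}")
            steps.append(f"update and reboot {host}")
            # The freshly updated, now empty host receives the next round.
            target = host
        return steps


    for step in plan_rolling_update({"h1": ["vm-a", "vm-b"], "h2": ["vm-c"]}, "h0"):
        print(step)

Each host is emptied into the previously updated one, so at any moment
only one host is out of service.
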
Migrating a server is not normally a choice that is available to
the cloud user because the user is not normally aware of compute
hosts. Management of the cloud and how servers are provisioned
in it is the responsibility of the cloud operator.

Recover from a failed compute host
----------------------------------

Sometimes a compute host may fail. This is a rare occurrence, but when
it happens during normal operation the servers running on the host may
be lost. In this case the operator may recreate the servers on the
remaining compute hosts using the evacuate action.

In distributed systems with asynchronous communication, perfect failure
detection is provably impossible. Usually, when a host is considered to
have failed, it should be excluded from the cloud, and any virtual
networking or storage associated with servers on the failed host should
be isolated from it. These steps are called fencing the host. Initiating
these actions is outside the scope of Nova.

Once the host has been fenced, its servers can be recreated on other
hosts without worry of the old incarnations reappearing and trying to
access shared resources. It is usual to redistribute the servers
from a failed host by rescheduling them.

Please note, this operation can result in data loss for the user's server.
As there is no access to the original server, if there were any disks stored
on local storage, that data will be lost. Evacuate does the same operation
as a rebuild. It downloads any images from glance and creates new
blank ephemeral disks. Any disks that were volumes, or on shared storage,
are reconnected. There should be no data loss for those disks.
This is why fencing the host is important, to ensure volumes and shared
storage are not corrupted by two servers writing simultaneously.

Evacuating a server is solely in the domain of the cloud operator because
it must be performed in coordination with other operational procedures to
be safe. A user is not normally aware of compute hosts but is adversely
affected by their failure.

User resizes server to get more resources
-----------------------------------------

Sometimes a user may want to change the flavor of a server, e.g. change
the quantity of CPUs, disk, memory or any other resource. This is done
by restarting the server with a new flavor. As the server is being
moved, it is normal to reschedule the server to another host
(although resize to the same host is an option for the operator).

Resize involves shutting down the server, finding a host that has
the correct resources for the new flavor size, and moving the current
server (including all storage) to the new host. Once the server
has been given the appropriate resources to match the new flavor,
the server is started again.

After the resize operation, when the user is happy their server is
working correctly after the resize, the user calls Confirm Resize.
This deletes the 'before-the-resize' server that was kept on the source host.
Alternatively, the user can call Revert Resize to delete the new
resized server and restore the old one that was stored on the source
host. If the user does not manually confirm the resize within a
configured time period, the resize is automatically confirmed, to
free up the space the old server is using on the source host.

As with shelving, resize provides the cloud operator with an
opportunity to redistribute workloads across the cloud according
to the operator's scheduling policy, providing the same benefits as
above.

Resizing a server is not normally a choice that is available to
the cloud operator because it changes the nature of the server
being provided to the user.

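The resize, confirm, and revert flow can be traced through the server
status values listed earlier. This is a simplified model, assuming the
server starts ``ACTIVE`` and that the hypothetical events ``migrated`` and
``reverted`` mark the end of the copy and of the revert; it ignores
failures and servers that were stopped before the resize.

.. code:: python

    # (status, event) -> next status; the event names are hypothetical labels.
    RESIZE_TRANSITIONS = {
        ("ACTIVE", "resize"): "RESIZE",
        ("RESIZE", "migrated"): "VERIFY_RESIZE",
        ("VERIFY_RESIZE", "confirm"): "ACTIVE",
        ("VERIFY_RESIZE", "revert"): "REVERT_RESIZE",
        ("REVERT_RESIZE", "reverted"): "ACTIVE",
    }


    def run_events(status, events):
        """Apply events in order, rejecting transitions the model forbids."""
        for event in events:
            key = (status, event)
            if key not in RESIZE_TRANSITIONS:
                raise ValueError(f"cannot {event!r} while {status}")
            status = RESIZE_TRANSITIONS[key]
        return status


    print(run_events("ACTIVE", ["resize", "migrated", "confirm"]))  # ACTIVE

Automatic confirmation after the configured window corresponds, in this
model, to the ``confirm`` event being issued by nova instead of the user.
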
User doesn't want to be charged when not using a server
-------------------------------------------------------

Sometimes a user does not require a server to be active for a while,
perhaps over a weekend or at certain times of day.
Ideally they don't want to be billed for those resources.
Just powering down a server does not free up any resources,
but shelving a server does free up resources to be used by other users.
This makes it feasible for a cloud operator to offer a discount when
a server is shelved.

When the user shelves a server, the operator can choose to remove it
from the compute hosts, i.e. the operator can offload the shelved server.
When the user's server is unshelved, it is scheduled to a new
host according to the operator's policies for distributing workloads
across the compute hosts, including taking disabled hosts into account.
This will contribute to increased overall capacity, freeing hosts that
are earmarked for maintenance and providing contiguous blocks
of resources on single hosts due to moving out old servers.

Shelving a server is not normally a choice that is available to
the cloud operator because it affects the availability of the server
being provided to the user.

Configure Guest OS
~~~~~~~~~~~~~~~~~~

Metadata API
------------

TODO

Config Drive
------------

TODO

User data
---------

A user data file is a special key in the metadata service that holds a file
that cloud-aware applications in the server can access.

Nova has two ways to send user data to the deployed server. One is the
metadata service, which lets the server access its metadata through a
predefined IP address (169.254.169.254). The other is the config drive,
which wraps the metadata into an ISO 9660 or VFAT format disk so that the
deployed server can consume it with tools such as cloud-init during its
boot process.

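The sketch shows both halves of the user data flow under stated
assumptions: the client side Base64-encodes a script for the
``user_data`` attribute of a create-server request, and the guest side
reads it back from the metadata service. The
``/openstack/latest/user_data`` path is assumed to be the OpenStack-format
metadata location reachable from inside the server, so
``fetch_user_data`` only works there; the image and flavor values are
placeholders.

.. code:: python

    import base64
    import json
    import urllib.request

    # Assumed OpenStack-format metadata path; reachable only inside a server.
    USER_DATA_URL = "http://169.254.169.254/openstack/latest/user_data"


    def encode_user_data(text):
        """Base64-encode user data for the create-server request body."""
        return base64.b64encode(text.encode("utf-8")).decode("ascii")


    def create_body_with_user_data(name, image_ref, flavor_ref, script):
        server = {"name": name, "imageRef": image_ref, "flavorRef": flavor_ref,
                  "user_data": encode_user_data(script)}
        return json.dumps({"server": server})


    def fetch_user_data(timeout=5.0):
        """Run inside the guest: return the bytes served by the metadata service."""
        with urllib.request.urlopen(USER_DATA_URL, timeout=timeout) as response:
            return response.read()


    script = "#!/bin/sh\necho hello > /tmp/hello\n"
    print(create_body_with_user_data("demo", "IMAGE_ID", "FLAVOR_ID", script))
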
Server personality
------------------

You can customize the personality of a server by injecting data
into its file system. For example, you might want to insert ssh keys,
set configuration files, or store data that you want to retrieve from
inside the server. This feature provides a minimal amount of
launch-time personalization. If you require significant customization,
create a custom image.

Follow these guidelines when you inject files:

- The maximum size of the file path data is 255 bytes.

- Encode the file contents as a Base64 string. The maximum size of the
  file contents is determined by the compute provider and may vary
  based on the image that is used to create the server.

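The guidelines above, together with the limits in the Considerations
section, can be checked before sending a request. A minimal sketch,
assuming the ``maxPersonality`` and ``maxPersonalitySize`` values have
already been read from the absolute limits and that file contents are
given as bytes; the helper name is hypothetical. The size check is
applied to the decoded bytes, not to the Base64 text.

.. code:: python

    import base64

    MAX_PATH_BYTES = 255


    def build_personality(files, max_personality, max_personality_size):
        """Turn {path: bytes} into the personality list for a create request."""
        if len(files) > max_personality:
            raise ValueError(
                f"{len(files)} files exceed maxPersonality={max_personality}")
        entries = []
        for path, contents in files.items():
            if len(path.encode("utf-8")) > MAX_PATH_BYTES:
                raise ValueError(f"path exceeds {MAX_PATH_BYTES} bytes: {path[:32]}...")
            # The byte limit applies to the decoded data, so check before encoding.
            if len(contents) > max_personality_size:
                raise ValueError(f"{path}: {len(contents)} bytes exceed "
                                 f"maxPersonalitySize={max_personality_size}")
            entries.append({"path": path,
                            "contents": base64.b64encode(contents).decode("ascii")})
        return entries


    print(build_personality({"/etc/motd": b"hello\n"}, 5, 10240))
    # [{'path': '/etc/motd', 'contents': 'aGVsbG8K'}]
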