<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns="http://docbook.org/ns/docbook"
  xmlns:db="http://docbook.org/ns/docbook" version="5.0"
  xml:id="ch013_node-bootstrapping">
  <?dbhtml stop-chunking?>
  <title>Integrity Life-cycle</title>
  <para>We define integrity life cycle as a deliberate process that
    provides assurance that we are always running the expected
    software with the expected configurations throughout the cloud.
    This process begins with secure bootstrapping and is maintained
    through configuration management and security monitoring. This
    chapter provides recommendations on how to approach the integrity
    life-cycle process.</para>
  <section xml:id="ch013_node-bootstrapping-idp44768">
    <title>Secure Bootstrapping</title>
    <para>Nodes in the cloud -- including compute, storage, network,
      service, and hybrid nodes -- should have an automated
      provisioning process. This ensures that nodes are provisioned
      consistently and correctly. This also facilitates security
      patching, upgrading, bug fixing, and other critical changes.
      Since this process installs new software that runs at the
      highest privilege levels in the cloud, it is important to verify
      that the correct software is installed. This includes the
      earliest stages of the boot process.</para>
    <para>There are a variety of technologies that enable verification
      of these early boot stages. These typically require hardware
      support such as the trusted platform module (TPM), Intel Trusted
      Execution Technology (TXT), dynamic root of trust measurement
      (DRTM), and Unified Extensible Firmware Interface (UEFI) secure
      boot. In this book, we will refer to all of these collectively
      as <emphasis>secure boot technologies</emphasis>. We recommend
      using secure boot, while acknowledging that many of the pieces
      necessary to deploy this require advanced technical skills in
      order to customize the tools for each environment. Utilizing
      secure boot will require deeper integration and customization
      than many of the other recommendations in this guide. TPM
      technology, while common in business-class laptops and desktops
      for several years, is only now becoming available in servers
      together with supporting BIOS. Proper planning is essential to a
      successful secure boot deployment.</para>
    <para>A complete tutorial on secure boot deployment is beyond the
      scope of this book. Instead, here we provide a framework for how
      to integrate secure boot technologies with the typical node
      provisioning process. For additional details, cloud architects
      should refer to the related specifications and software
      configuration manuals.</para>
    <section xml:id="ch013_node-bootstrapping-idp48720">
      <title>Node Provisioning</title>
      <para>Nodes should use Preboot eXecution Environment (PXE) for
        provisioning. This significantly reduces the effort required
        for redeploying nodes. The typical process involves the node
        receiving various boot stages (i.e., progressively more
        complex software to execute) from a server.</para>
      <para><inlinemediaobject>
          <imageobject role="html">
            <imagedata contentdepth="203" contentwidth="274"
              fileref="static/node-provisioning-pxe.png" format="PNG"
              scalefit="1"/>
          </imageobject>
          <imageobject role="fo">
            <imagedata contentdepth="100%"
              fileref="static/node-provisioning-pxe.png" format="PNG"
              scalefit="1" width="100%"/>
          </imageobject>
        </inlinemediaobject></para>
      <para>We recommend using a separate, isolated network within the
        management security domain for provisioning. This network will
        handle all PXE traffic, along with the subsequent boot stage
        downloads depicted above. Note that the node boot process
        begins with two insecure operations: DHCP and TFTP. Then the
        boot process downloads, over SSL, the information required to
        continue booting; this might include an initramfs and a
        kernel. The process concludes by downloading the remaining
        data needed to deploy the node. This may be an operating
        system installer, a basic install managed by
        <link xlink:href="http://www.opscode.com/chef/">Chef</link>
        or <link xlink:href="https://puppetlabs.com/">Puppet</link>,
        or even a complete file system image that is written directly
        to disk.</para>
      <para>While utilizing SSL during the PXE boot process is
        somewhat more challenging, common PXE firmware projects, such
        as iPXE, provide this support. Typically this involves
        building the PXE firmware with knowledge of the allowed SSL
        certificate chain(s) so that it can properly validate the
        server certificate. This raises the bar for an attacker by
        limiting the number of insecure, plain text network
        operations.</para>
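      <para>To make that validation step concrete, the following
        Python sketch performs the equivalent server-certificate check
        that a certificate-aware firmware build carries out. The host
        name and CA bundle path are hypothetical placeholders, not
        part of any particular PXE project.</para>
      <programlisting language="python"><![CDATA[
# Illustrative sketch only: the kind of certificate-chain check an
# SSL-enabled PXE firmware performs, shown in Python. The host name
# and CA bundle path are hypothetical placeholders.
import socket
import ssl

PROVISIONING_HOST = "provision.example.com"          # hypothetical server
ALLOWED_CA_BUNDLE = "/etc/pki/provisioning-ca.pem"   # hypothetical path

# Trust only the provisioning CA chain, mirroring a firmware build
# that embeds the allowed certificate chain(s).
context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
context.load_verify_locations(cafile=ALLOWED_CA_BUNDLE)
context.verify_mode = ssl.CERT_REQUIRED
context.check_hostname = True

with socket.create_connection((PROVISIONING_HOST, 443)) as sock:
    # The handshake fails unless the server presents a certificate
    # signed by the embedded CA, so a rogue deployment server is
    # rejected before any boot stage is downloaded.
    with context.wrap_socket(sock,
                             server_hostname=PROVISIONING_HOST) as tls:
        print("Validated server:", tls.getpeercert().get("subject"))
]]></programlisting>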
    </section>
    <section xml:id="ch013_node-bootstrapping-idp58144">
      <title>Verified Boot</title>
      <para>In general, there are two different strategies for
        verifying the boot process. Traditional <emphasis>secure
        boot</emphasis> will validate the code run at each step in
        the process, and stop the boot if code is incorrect.
        <emphasis>Boot attestation</emphasis> will record which code
        is run at each step, and provide this information to another
        machine as proof that the boot process completed as expected.
        In both cases, the first step is to measure each piece of code
        before it is run. In this context, a measurement is
        effectively a SHA-1 hash of the code, taken before it is
        executed. The hash is stored in a platform configuration
        register (PCR) in the TPM.</para>
      <para>Note: SHA-1 is used here because this is what the TPM
        chips support.</para>
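      <para>As an illustration, the following Python sketch models the
        extend operation a TPM 1.2 applies when storing measurements:
        the new PCR value is the SHA-1 hash of the old value
        concatenated with the measurement. This models the arithmetic
        only; it does not talk to a real TPM.</para>
      <programlisting language="python"><![CDATA[
# Sketch of how a TPM 1.2 PCR accumulates measurements. Each piece of
# boot code is hashed before execution, and the PCR is "extended":
#   PCR_new = SHA1(PCR_old || SHA1(code))
import hashlib

PCR_SIZE = 20  # SHA-1 digest length in bytes

def measure(code: bytes) -> bytes:
    """Measurement of a boot stage: the SHA-1 hash of its code."""
    return hashlib.sha1(code).digest()

def extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM extend: hash the current PCR value with the measurement."""
    return hashlib.sha1(pcr + measurement).digest()

# PCRs start at all zeros after a platform reset.
pcr = bytes(PCR_SIZE)
for stage in (b"CRTM", b"BIOS", b"bootloader", b"kernel"):
    pcr = extend(pcr, measure(stage))
print("Final PCR value:", pcr.hex())
]]></programlisting>
      <para>Because each extend folds the previous value into the
        next, the final PCR value commits to the entire ordered
        sequence of boot stages, not just the last one.</para>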
      <para>Each TPM has at least 24 PCRs. The TCG Generic Server
        Specification, v1.0, March 2005, defines the PCR assignments
        for boot-time integrity measurements. The table below shows a
        typical PCR configuration. The context indicates whether the
        values are determined based on the node hardware (firmware) or
        the software provisioned onto the node. Some values are
        influenced by firmware versions, disk sizes, and other
        low-level information. Therefore, it is important to have good
        practices in place around configuration management to ensure
        that each system deployed is configured exactly as
        desired.</para>
      <informaltable rules="all" width="80%">
        <colgroup>
          <col/>
          <col/>
          <col/>
        </colgroup>
        <tbody>
          <tr>
            <td><para><emphasis role="bold">Register</emphasis></para></td>
            <td><para><emphasis role="bold">What Is Measured</emphasis></para></td>
            <td><para><emphasis role="bold">Context</emphasis></para></td>
          </tr>
          <tr>
            <td><para>PCR-00</para></td>
            <td><para>Core Root of Trust Measurement (CRTM), BIOS
              code, Host platform extensions</para></td>
            <td><para>Hardware</para></td>
          </tr>
          <tr>
            <td><para>PCR-01</para></td>
            <td><para>Host Platform Configuration</para></td>
            <td><para>Hardware</para></td>
          </tr>
          <tr>
            <td><para>PCR-02</para></td>
            <td><para>Option ROM Code</para></td>
            <td><para>Hardware</para></td>
          </tr>
          <tr>
            <td><para>PCR-03</para></td>
            <td><para>Option ROM Configuration and Data</para></td>
            <td><para>Hardware</para></td>
          </tr>
          <tr>
            <td><para>PCR-04</para></td>
            <td><para>Initial Program Loader (IPL) Code. For example,
              master boot record.</para></td>
            <td><para>Software</para></td>
          </tr>
          <tr>
            <td><para>PCR-05</para></td>
            <td><para>IPL Code Configuration and Data</para></td>
            <td><para>Software</para></td>
          </tr>
          <tr>
            <td><para>PCR-06</para></td>
            <td><para>State Transition and Wake Events</para></td>
            <td><para>Software</para></td>
          </tr>
          <tr>
            <td><para>PCR-07</para></td>
            <td><para>Host Platform Manufacturer Control</para></td>
            <td><para>Software</para></td>
          </tr>
          <tr>
            <td><para>PCR-08</para></td>
            <td><para>Platform specific, often Kernel, Kernel
              Extensions, and Drivers</para></td>
            <td><para>Software</para></td>
          </tr>
          <tr>
            <td><para>PCR-09</para></td>
            <td><para>Platform specific, often Initramfs</para></td>
            <td><para>Software</para></td>
          </tr>
          <tr>
            <td><para>PCR-10 to PCR-23</para></td>
            <td><para>Platform specific</para></td>
            <td><para>Software</para></td>
          </tr>
        </tbody>
      </informaltable>
      <para>At the time of this writing, very few clouds are using
        secure boot technologies in a production environment. As a
        result, these technologies are still somewhat immature. We
        recommend planning carefully in terms of hardware selection.
        For example, ensure that you have a TPM and Intel TXT support.
        Then verify how the node hardware vendor populates the PCR
        values, for example, which values will be available for
        validation. Typically the PCR values listed under the software
        context in the table above are the ones that a cloud architect
        has direct control over. But even these may change as the
        software in the cloud is upgraded. Configuration management
        should be linked into the PCR policy engine to ensure that the
        validation is always up to date.</para>
      <para>Each manufacturer must provide the BIOS and firmware code
        for their servers. Different servers, hypervisors, and
        operating systems will choose to populate different PCRs. In
        most real world deployments, it will be impossible to validate
        every PCR against a known good quantity ("golden
        measurement"). Experience has shown that, even within a single
        vendor's product line, the measurement process for a given PCR
        may not be consistent. We recommend establishing a baseline
        for each server and monitoring the PCR values for unexpected
        changes. Third-party software may be available to assist in
        the TPM provisioning and monitoring process, depending upon
        your chosen hypervisor solution.</para>
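      <para>A minimal monitoring sketch follows, assuming a TPM 1.2
        driver that exposes PCR values through sysfs. The sysfs path
        and baseline location are placeholders that vary by kernel and
        deployment.</para>
      <programlisting language="python"><![CDATA[
# Sketch of baseline PCR monitoring. The sysfs path below is a
# placeholder; where (and whether) the kernel exposes PCR values
# depends on the TPM driver and kernel version.
import json

PCRS_PATH = "/sys/class/tpm/tpm0/device/pcrs"          # placeholder
BASELINE_PATH = "/var/lib/integrity/pcr-baseline.json"  # hypothetical

def read_pcrs(path=PCRS_PATH):
    """Parse lines such as 'PCR-00: 3A 3F ...' into a dict."""
    pcrs = {}
    with open(path) as f:
        for line in f:
            name, _, value = line.partition(":")
            pcrs[name.strip()] = value.split()
    return pcrs

current = read_pcrs()
with open(BASELINE_PATH) as f:
    baseline = json.load(f)

for name, value in sorted(current.items()):
    if baseline.get(name) != value:
        # An unexpected change warrants investigation before the
        # node is trusted with workloads.
        print("ALERT: %s changed from baseline" % name)
]]></programlisting>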
      <para>The initial program loader (IPL) code will most likely be
        the PXE firmware, assuming the node deployment strategy
        outlined above. Therefore, the secure boot or boot attestation
        process can measure all of the early stage boot code (BIOS,
        firmware, and the like), the PXE firmware, and the node
        kernel. Ensuring that each node has the correct versions of
        these pieces installed provides a solid foundation on which to
        build the rest of the node software stack.</para>
      <para>Depending on the strategy selected, in the event of a
        failure the node will either fail to boot or it can report the
        failure back to another entity in the cloud. For secure boot,
        the node will fail to boot and a provisioning service within
        the management security domain must recognize this and log the
        event. For boot attestation, the node will already be running
        when the failure is detected. In this case the node should be
        immediately quarantined by disabling its network access. Then
        the event should be analyzed for the root cause. In either
        case, policy should dictate how to proceed after a failure. A
        cloud may automatically attempt to re-provision a node a
        certain number of times. Or it may immediately notify a cloud
        administrator to investigate the problem. The right policy
        here will be deployment and failure mode specific.</para>
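      <para>The following Python sketch shows one possible shape for
        such a policy: quarantine, bounded re-provisioning, then
        escalation. The helper functions passed in are hypothetical
        stand-ins for deployment-specific hooks.</para>
      <programlisting language="python"><![CDATA[
# Hedged sketch of one possible failure policy: retry provisioning a
# bounded number of times, then page an administrator. The callables
# are hypothetical stand-ins for deployment-specific integrations.
MAX_REPROVISION_ATTEMPTS = 3

def handle_boot_validation_failure(node, reprovision, quarantine,
                                   notify_admin, log_event):
    """Apply policy after a secure boot or attestation failure."""
    log_event("boot validation failed", node)
    # For attestation failures the node is already running, so cut
    # its network access before anything else.
    quarantine(node)
    for attempt in range(1, MAX_REPROVISION_ATTEMPTS + 1):
        if reprovision(node):
            log_event("re-provisioned on attempt %d" % attempt, node)
            return
    # Automated recovery failed; escalate to a human.
    notify_admin("node failed %d re-provisioning attempts"
                 % MAX_REPROVISION_ATTEMPTS, node)
]]></programlisting>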
    </section>
    <section xml:id="ch013_node-bootstrapping-idp3728">
      <title>Node Hardening</title>
      <para>At this point we know that the node has booted with the
        correct kernel and underlying components. There are many paths
        for hardening a given operating system deployment. The
        specifics of these steps are outside of the scope of this
        book. We recommend following the guidance from a hardening
        guide specific to your operating system. For example, the
        <link xlink:href="http://iase.disa.mil/stigs/">security
        technical implementation guides</link> (STIG) and the <link
        xlink:href="http://www.nsa.gov/ia/mitigation_guidance/security_configuration_guides/"
        >NSA guides</link> are useful starting places.</para>
      <para>The nature of the nodes makes additional hardening
        possible. We recommend the following additional steps for
        production nodes:</para>
      <itemizedlist>
        <listitem>
          <para>Use a read-only file system where possible. Ensure
            that writeable file systems do not permit execution. This
            can be handled through the mount options provided in
            <literal>/etc/fstab</literal>; a verification sketch
            follows this list.</para>
        </listitem>
        <listitem>
          <para>Use a mandatory access control policy to contain the
            instances, the node services, and any other critical
            processes and data on the node. See the discussions on
            sVirt / SELinux and AppArmor below.</para>
        </listitem>
        <listitem>
          <para>Remove any unnecessary software packages. This should
            result in a very stripped down installation because a
            compute node has a relatively small number of
            dependencies.</para>
        </listitem>
      </itemizedlist>
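      <para>The following Python sketch checks the mount-option
        guidance from the first item above by reading the standard
        Linux <literal>/proc/mounts</literal>; the set of file system
        types to skip is illustrative.</para>
      <programlisting language="python"><![CDATA[
# Sketch of a post-boot check that writable file systems are mounted
# with noexec, in the spirit of the fstab guidance above. The skip
# list of pseudo file systems is illustrative, not exhaustive.
SKIP_TYPES = {"proc", "sysfs", "devpts", "cgroup"}

def check_mounts(mounts_file="/proc/mounts"):
    with open(mounts_file) as f:
        for line in f:
            device, mountpoint, fstype, options = line.split()[:4]
            if fstype in SKIP_TYPES:
                continue
            opts = set(options.split(","))
            # Writable file systems should not permit execution.
            if "rw" in opts and "noexec" not in opts:
                print("WARNING: %s on %s is writable and executable"
                      % (device, mountpoint))

check_mounts()
]]></programlisting>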
      <para>Finally, the node kernel should have a mechanism to
        validate that the rest of the node starts in a known good
        state. This provides the necessary link from the boot
        validation process to validating the entire system. The steps
        for doing this will be deployment specific. As an example, a
        kernel module could verify a hash over the blocks comprising
        the file system before mounting it using <link
        xlink:href="https://code.google.com/p/cryptsetup/wiki/DMVerity"
        >dm-verity</link>.</para>
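      <para>As a hedged illustration, the following Python sketch
        drives dm-verity through the <literal>veritysetup</literal>
        utility from the cryptsetup project. The device paths are
        placeholders, and the command syntax should be confirmed
        against your installed version.</para>
      <programlisting language="python"><![CDATA[
# Illustrative sketch of driving dm-verity via veritysetup. Device
# paths are placeholders; confirm syntax against your veritysetup.
import subprocess

DATA_DEV = "/dev/sdb1"  # placeholder: device holding the file system
HASH_DEV = "/dev/sdb2"  # placeholder: device holding the hash tree

# One-time step at image creation: build the hash tree and record
# the root hash, which must then be stored somewhere trusted.
out = subprocess.check_output(["veritysetup", "format",
                               DATA_DEV, HASH_DEV])
root_hash = None
for line in out.decode().splitlines():
    if line.startswith("Root hash:"):
        root_hash = line.split(":", 1)[1].strip()
if root_hash is None:
    raise RuntimeError("could not parse root hash from output")

# At boot: create the verified mapping. Reads through the mapped
# device fail if any block does not match the recorded hash tree.
subprocess.check_call(["veritysetup", "create", "rootfs",
                       DATA_DEV, HASH_DEV, root_hash])
]]></programlisting>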
    </section>
  </section>
  <section xml:id="ch013_node-bootstrapping-idp11376">
    <title>Runtime Verification</title>
    <para>Once the node is running, we need to ensure that it remains
      in a good state over time. Broadly speaking, this includes both
      configuration management and security monitoring. The goals for
      each of these areas are different. By checking both, we achieve
      higher assurance that the system is operating as desired. We
      discuss configuration management in the management section, and
      security monitoring below.</para>
    <section xml:id="ch013_node-bootstrapping-idp135504">
      <title>Intrusion Detection System</title>
      <para>Host-based intrusion detection tools are also useful for
        automated validation of the cloud internals. There are a wide
        variety of host-based intrusion detection tools available.
        Some are open source projects that are freely available, while
        others are commercial. Typically these tools analyze data from
        a variety of sources and produce security alerts based on rule
        sets and/or training. Typical capabilities include log
        analysis, file integrity checking, policy monitoring, and
        rootkit detection. More advanced -- often custom -- tools can
        validate that in-memory process images match the on-disk
        executable and validate the execution state of a running
        process.</para>
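      <para>To make the file integrity checking capability concrete,
        here is a minimal Python sketch of the core mechanism: hash a
        tree of files and compare the digests against a stored
        baseline. The paths are illustrative; production tools add
        policy, scheduling, and alerting on top of this.</para>
      <programlisting language="python"><![CDATA[
# Minimal sketch of file integrity checking: hash a tree of files
# and compare against a stored baseline. Paths are illustrative.
import hashlib
import json
import os

WATCHED_DIR = "/etc"                                    # illustrative
BASELINE = "/var/lib/integrity/file-baseline.json"      # hypothetical

def hash_tree(root):
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    digests[path] = hashlib.sha256(f.read()).hexdigest()
            except OSError:
                pass  # sockets, transient files, permission errors
    return digests

current = hash_tree(WATCHED_DIR)
with open(BASELINE) as f:
    baseline = json.load(f)

for path in sorted(set(baseline) | set(current)):
    if baseline.get(path) != current.get(path):
        print("ALERT: %s added, removed, or modified" % path)
]]></programlisting>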
      <para>One critical policy decision for a cloud architect is what
        to do with the output from a security monitoring tool. There
        are effectively two options. The first is to alert a human to
        investigate and/or take corrective action. This could be done
        by including the security alert in a log or events feed for
        cloud administrators. The second option is to have the cloud
        take some form of remedial action automatically, in addition
        to logging the event. Remedial actions could include anything
        from re-installing a node to making a minor service
        configuration change. However, automated remedial action can
        be challenging due to the possibility of false
        positives.</para>
      <para>False positives occur when the security monitoring tool
        produces a security alert for a benign event. Due to the
        nature of security monitoring tools, false positives will most
        certainly occur from time to time. Typically a cloud
        administrator can tune security monitoring tools to reduce the
        false positives, but this may also reduce the overall
        detection rate at the same time. These classic trade-offs must
        be understood and accounted for when setting up a security
        monitoring system in the cloud.</para>
      <para>The selection and configuration of a host-based intrusion
        detection tool is highly deployment specific. We recommend
        starting by exploring the following open source projects,
        which implement a variety of host-based intrusion detection
        and file monitoring features:</para>
      <itemizedlist>
        <listitem>
          <para><link xlink:href="http://www.ossec.net/"
            >OSSEC</link></para>
        </listitem>
        <listitem>
          <para><link xlink:href="http://la-samhna.de/samhain/"
            >Samhain</link></para>
        </listitem>
        <listitem>
          <para><link
            xlink:href="http://sourceforge.net/projects/tripwire/"
            >Tripwire</link></para>
        </listitem>
        <listitem>
          <para><link xlink:href="http://aide.sourceforge.net/"
            >AIDE</link></para>
        </listitem>
      </itemizedlist>
      <para>Network intrusion detection tools complement the
        host-based tools. OpenStack does not have a specific network
        IDS built in, but OpenStack's networking component, Neutron,
        provides a plug-in mechanism to enable different technologies
        via the Neutron API. This plug-in architecture allows tenants
        to develop API extensions to insert and configure their own
        advanced networking services, such as a firewall, an intrusion
        detection system, or a VPN between the VMs.</para>
      <para>Similar to host-based tools, the selection and
        configuration of a network-based intrusion detection tool is
        deployment specific. <link xlink:href="http://www.snort.org/"
        >Snort</link> is the leading open source networking
        intrusion detection tool, and a good starting place to learn
        more.</para>
      <para>There are a few important security considerations for
        network and host-based intrusion detection systems:</para>
      <itemizedlist>
        <listitem>
          <para>It is important to consider the placement of the
            network IDS on the cloud (for example, adding it to the
            network boundary and/or around sensitive networks). The
            placement depends on your network environment, but be sure
            to monitor the impact the IDS may have on your services,
            which will vary depending on where you choose to add it.
            Encrypted traffic, such as SSL, cannot generally be
            inspected for content by a network IDS. However, the
            network IDS may still provide some benefit in identifying
            anomalous unencrypted traffic on the network.</para>
        </listitem>
        <listitem>
          <para>In some deployments it may be necessary to add a
            host-based IDS on sensitive components that sit on
            security domain bridges. A host-based IDS may detect
            anomalous activity caused by compromised or unauthorized
            processes on the component. The IDS should transmit alert
            and log information over the management network.</para>
        </listitem>
      </itemizedlist>
    </section>
  </section>
</chapter>