From 0aa8f16e1f4a23a26e5213ea2fd7c57d1d4d04a0 Mon Sep 17 00:00:00 2001
From: Stephen Finucane
Date: Wed, 16 Nov 2016 18:28:07 +0000
Subject: [PATCH] [admin-guide] Add huge page documentation

Document what huge pages are, why you may wish to use them, and how you
can go about doing so. A minor typo fix is included in the
'compute-cpu-topologies' to ensure "huge pages" is the term used
throughout all docs.

Change-Id: I08b27f70bbcae69fa2bc4d9319ddd27ba706b496
---
 doc/admin-guide/source/compute-adv-config.rst |   5 +-
 .../source/compute-cpu-topologies.rst         |  10 +-
 doc/admin-guide/source/compute-huge-pages.rst | 239 ++++++++++++++++++
 3 files changed, 247 insertions(+), 7 deletions(-)
 create mode 100644 doc/admin-guide/source/compute-huge-pages.rst

diff --git a/doc/admin-guide/source/compute-adv-config.rst b/doc/admin-guide/source/compute-adv-config.rst
index 09b228ef18..2bbed97394 100644
--- a/doc/admin-guide/source/compute-adv-config.rst
+++ b/doc/admin-guide/source/compute-adv-config.rst
@@ -23,5 +23,6 @@ instance for these kind of workloads.
 .. toctree::
    :maxdepth: 2

-   compute-pci-passthrough.rst
-   compute-cpu-topologies.rst
+   compute-pci-passthrough
+   compute-cpu-topologies
+   compute-huge-pages

diff --git a/doc/admin-guide/source/compute-cpu-topologies.rst b/doc/admin-guide/source/compute-cpu-topologies.rst
index 4f947a99a5..cc64ee552b 100644
--- a/doc/admin-guide/source/compute-cpu-topologies.rst
+++ b/doc/admin-guide/source/compute-cpu-topologies.rst
@@ -56,11 +56,11 @@ also should be local. Finally, PCI devices are directly associated with
 specific NUMA nodes for the purposes of DMA. Instances that use PCI or SR-IOV
 devices should be placed on the NUMA node associated with these devices.

-By default, an instance floats across all NUMA nodes on a host. NUMA
-awareness can be enabled implicitly through the use of hugepages or pinned
-CPUs or explicitly through the use of flavor extra specs or image metadata.
-In all cases, the ``NUMATopologyFilter`` filter must be enabled. Details on
-this filter are provided in `Scheduling`_ configuration guide.
+By default, an instance floats across all NUMA nodes on a host. NUMA awareness
+can be enabled implicitly through the use of huge pages or pinned CPUs or
+explicitly through the use of flavor extra specs or image metadata. In all
+cases, the ``NUMATopologyFilter`` filter must be enabled. Details on this
+filter are provided in the `Scheduling`_ configuration guide.

 .. caution::

diff --git a/doc/admin-guide/source/compute-huge-pages.rst b/doc/admin-guide/source/compute-huge-pages.rst
new file mode 100644
index 0000000000..ea4e65cd8c
--- /dev/null
+++ b/doc/admin-guide/source/compute-huge-pages.rst
@@ -0,0 +1,239 @@
+.. _compute-huge-pages:
+
+==========
+Huge pages
+==========
+
+The huge page feature in OpenStack provides important performance improvements
+for applications that are highly memory IO-bound.
+
+.. note::
+
+   Huge pages may also be referred to as hugepages or large pages, depending
+   on the source. These terms are synonyms.
+
+Pages, the TLB and huge pages
+-----------------------------
+
+Pages
+  Physical memory is segmented into a series of contiguous regions called
+  pages. Each page contains a number of bytes, referred to as the page size.
+  The system retrieves memory by accessing entire pages, rather than byte by
+  byte.
+
+Translation Lookaside Buffer (TLB)
+  A TLB is used to map the virtual addresses of pages to the physical
+  addresses in actual memory. The TLB is a cache and is not limitless, storing
+  only the most recently or frequently accessed pages. During normal
+  operation, processes will sometimes attempt to retrieve pages that are not
+  stored in the cache. This is known as a TLB miss and results in a delay as
+  the processor walks the page tables to find the missing address mapping.
+
+Huge Pages
+  The standard page size in x86 systems is 4 kB.
+  This is optimal for general purpose computing but larger page sizes - 2 MB
+  and 1 GB - are also available. These larger page sizes are known as huge
+  pages. Huge pages result in less efficient memory usage, as a process will
+  not generally use all memory available in each page. However, use of huge
+  pages will result in fewer overall pages and a reduced risk of TLB misses.
+  For processes that have significant memory requirements or are memory
+  intensive, the benefits of huge pages frequently outweigh the drawbacks.
+
+Persistent Huge Pages
+  On Linux hosts, persistent huge pages are huge pages that are reserved
+  upfront. HugeTLB provides the mechanism for this upfront configuration of
+  huge pages. HugeTLB allows for the allocation of varying quantities of
+  different huge page sizes. Allocation can be made at boot time or run time.
+  Refer to the `Linux hugetlbfs guide`_ for more information.
+
+Transparent Huge Pages (THP)
+  On Linux hosts, transparent huge pages are huge pages that are automatically
+  provisioned based on process requests. Transparent huge pages are
+  provisioned on a best effort basis, attempting to provision 2 MB huge pages
+  if available but falling back to 4 kB small pages if not. However, no
+  upfront configuration is necessary. Refer to the `Linux THP guide`_ for more
+  information.
+
+Enabling huge pages on the host
+-------------------------------
+
+Persistent huge pages are required owing to their guaranteed availability.
+However, persistent huge pages are not enabled by default in most
+environments. The steps for enabling huge pages differ from platform to
+platform and only the steps for Linux hosts are described here. On Linux
+hosts, the number of persistent huge pages on the host can be queried by
+checking ``/proc/meminfo``:
+
+.. code-block:: console
+
+   $ grep Huge /proc/meminfo
+   AnonHugePages:         0 kB
+   ShmemHugePages:        0 kB
+   HugePages_Total:       0
+   HugePages_Free:        0
+   HugePages_Rsvd:        0
+   HugePages_Surp:        0
+   Hugepagesize:       2048 kB
+
+In this instance, there are 0 persistent huge pages (``HugePages_Total``) and
+0 transparent huge pages (``AnonHugePages``) allocated. Huge pages can be
+allocated at boot time or run time. Huge pages require a contiguous area of
+memory - memory that gets increasingly fragmented the longer a host is
+running. Identifying contiguous areas of memory is an issue for all huge page
+sizes, but it is particularly problematic for larger huge page sizes such as
+1 GB huge pages. Allocating huge pages at boot time will ensure the correct
+number of huge pages is always available, while allocating them at run time
+can fail if memory has become too fragmented.
+
+To allocate huge pages at boot time, the kernel boot parameters must be
+extended to include some huge page-specific parameters. This can be achieved
+by modifying ``/etc/default/grub`` and appending the ``hugepagesz``,
+``hugepages``, and ``transparent_hugepage=never`` arguments to
+``GRUB_CMDLINE_LINUX``. To allocate, for example, 2048 persistent 2 MB huge
+pages at boot time, run:
+
+.. code-block:: console
+
+   # echo 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX hugepagesz=2M hugepages=2048 transparent_hugepage=never"' >> /etc/default/grub
+   $ grep GRUB_CMDLINE_LINUX /etc/default/grub
+   GRUB_CMDLINE_LINUX="..."
+   GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX hugepagesz=2M hugepages=2048 transparent_hugepage=never"
+
+.. important::
+
+   Persistent huge pages are not usable by standard host OS processes. Ensure
+   enough free, non-huge page memory is reserved for these processes.
+
+Regenerate the GRUB configuration (using, for example, ``update-grub`` or
+``grub2-mkconfig``, depending on the distribution), reboot the host, then
+validate that huge pages are now available:
+
+.. code-block:: console
+
+   $ grep "Huge" /proc/meminfo
+   AnonHugePages:         0 kB
+   ShmemHugePages:        0 kB
+   HugePages_Total:    2048
+   HugePages_Free:     2048
+   HugePages_Rsvd:        0
+   HugePages_Surp:        0
+   Hugepagesize:       2048 kB
+
+There are now 2048 2 MB huge pages, totalling 4 GB of huge pages. These huge
+pages must be mounted. On most platforms, this happens automatically. To
+verify that the huge pages are mounted, run:
+
+.. code-block:: console
+
+   # mount | grep huge
+   hugetlbfs on /dev/hugepages type hugetlbfs (rw)
+
+In this instance, the huge pages are mounted at ``/dev/hugepages``. This mount
+point varies from platform to platform. If the above command did not return
+anything, the huge pages must be mounted manually. To mount the huge pages at
+``/dev/hugepages``, run:
+
+.. code-block:: console
+
+   # mkdir -p /dev/hugepages
+   # mount -t hugetlbfs hugetlbfs /dev/hugepages
+
+There are many more ways to configure huge pages, including allocating huge
+pages at run time, specifying varying allocations for different huge page
+sizes, or allocating huge pages from memory affinitized to different NUMA
+nodes. For more information on configuring huge pages on Linux hosts, refer to
+the `Linux hugetlbfs guide`_.
+
+Customizing instance huge pages allocations
+-------------------------------------------
+
+.. important::
+
+   The functionality described below is currently only supported by the
+   libvirt/KVM driver.
+
+.. important::
+
+   For performance reasons, configuring huge pages for an instance will
+   implicitly result in a NUMA topology being configured for the instance.
+   Configuring a NUMA topology for an instance requires enablement of
+   ``NUMATopologyFilter``. Refer to :doc:`compute-cpu-topologies` for more
+   information.
+
+By default, an instance does not use huge pages for its underlying memory.
+However, huge pages can bring important or required performance improvements
+for some workloads.
+Huge pages must be requested explicitly through the use of flavor extra specs
+or image metadata. To request an instance use huge pages, run:
+
+.. code-block:: console
+
+   $ openstack flavor set m1.large --property hw:mem_page_size=large
+
+Different platforms offer different huge page sizes. For example, x86-based
+platforms offer 2 MB and 1 GB huge page sizes. Specific huge page sizes can
+also be requested, with or without a unit suffix. The unit suffix must be one
+of: Kb(it), Kib(it), Mb(it), Mib(it), Gb(it), Gib(it), Tb(it), Tib(it), KB,
+KiB, MB, MiB, GB, GiB, TB, TiB. Where a unit suffix is not provided,
+kilobytes are assumed. To request an instance to use 2 MB huge pages, run one
+of:
+
+.. code-block:: console
+
+   $ openstack flavor set m1.large --property hw:mem_page_size=2Mb
+
+.. code-block:: console
+
+   $ openstack flavor set m1.large --property hw:mem_page_size=2048
+
+Enabling huge pages for an instance can have negative consequences for other
+instances by consuming limited huge pages resources. To explicitly request
+an instance use small pages, run:
+
+.. code-block:: console
+
+   $ openstack flavor set m1.large --property hw:mem_page_size=small
+
+.. note::
+
+   Explicitly requesting any page size will still result in a NUMA topology
+   being applied to the instance, as described earlier in this document.
+
+Finally, to leave the decision of huge or small pages to the compute driver,
+run:
+
+.. code-block:: console
+
+   $ openstack flavor set m1.large --property hw:mem_page_size=any
+
+For more information about the syntax for ``hw:mem_page_size``, refer to the
+`Flavors`_ guide.
+
+Applications are frequently packaged as images. For applications that require
+the IO performance improvements that huge pages provide, configure image
+metadata to ensure instances always request the specific page size regardless
+of flavor. To configure an image to use 1 GB huge pages, run:
+
+.. code-block:: console
+
+   $ openstack image set [IMAGE_ID] --property hw_mem_page_size=1GB
+
+Image metadata takes precedence over flavor extra specs. Thus, configuring
+competing page sizes causes an exception. By setting a ``small`` page size
+through image metadata, administrators can prevent users requesting huge pages
+in flavors and impacting resource utilization. To configure this page size,
+run:
+
+.. code-block:: console
+
+   $ openstack image set [IMAGE_ID] --property hw_mem_page_size=small
+
+.. note::
+
+   Explicitly requesting any page size will still result in a NUMA topology
+   being applied to the instance, as described earlier in this document.
+
+For more information about image metadata, refer to the `Image metadata`_
+guide.
+
+.. Links
+.. _`Linux THP guide`: https://www.kernel.org/doc/Documentation/vm/transhuge.txt
+.. _`Linux hugetlbfs guide`: https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
+.. _`Flavors`: http://docs.openstack.org/admin-guide/compute-flavors.html
+.. _`Image metadata`: http://docs.openstack.org/image-guide/image-metadata.html
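
A rough way to see why the huge page sizes discussed in this patch reduce TLB
pressure: count how many page mappings are needed to cover the 4 GB reserved
in the example above at each page size. The snippet below is an illustrative
sketch only (plain POSIX shell arithmetic, not part of the patch itself):

```shell
# Illustrative only: number of pages (and hence worst-case TLB entries)
# needed to map a 4 GiB allocation at each page size discussed above.
alloc=$((4 * 1024 * 1024 * 1024))   # 4 GiB, as reserved in the example
for size_kb in 4 2048 1048576; do   # 4 kB, 2 MB, and 1 GB pages
    echo "${size_kb} kB pages: $((alloc / (size_kb * 1024)))"
done
```

With 4 kB pages, a 4 GiB allocation needs 1048576 mappings, versus 2048 with
2 MB pages and just 4 with 1 GB pages, which is why larger pages sharply
reduce the chance of a TLB miss.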