update spec to reflect the change in implementation

We are now removing the "small" request. This change also
syncs the specs.

Change-Id: I4840d566de53ad14663341aaae59a00e9e74a79a
Author: Sahid Orentino Ferdjaoui
Date: 2015-04-09 09:44:47 -04:00
Committer: Tony Breeds
Parent: b492942744
Commit: 99c5a64646


@@ -72,13 +72,12 @@ Proposed change
The flavor extra specs will be enhanced to support a new parameter
- * hw:mem_page_size=large|small|any|2MB|1GB
+ * hw:mem_page_size=large|any|2MB|1GB
In absence of any page size setting in the flavor, the current behaviour of
using the small, default, page size will continue. A setting of 'large' says
to only use larger page sizes for guest RAM, eg either 2MB or 1GB on x86;
- 'small' says to only use the small page sizes, eg 4k on x86, and is the
- default; 'any' means to leave policy up to the compute driver implementation to
+ 'any' means to leave policy up to the compute driver implementation to
decide. When seeing 'any' the libvirt driver might try to find large pages,
but fall back to small pages; other drivers may choose alternate policies
for 'any'. Finally an explicit page size can be set if the workload has very
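As a purely illustrative aid (this helper and its unit handling are assumptions, not Nova's actual code), the remaining extra spec values could be normalised roughly like this, passing the symbolic policies through and converting explicit sizes to KiB:

    # Illustrative sketch only -- not Nova's implementation.
    # Normalises hw:mem_page_size to 'large', 'any', or a size in KiB.
    UNITS_KB = {'KB': 1, 'MB': 1024, 'GB': 1024 * 1024}

    def parse_mem_page_size(value):
        value = value.strip()
        if value.lower() in ('large', 'any'):
            return value.lower()
        for suffix, factor in UNITS_KB.items():
            if value.upper().endswith(suffix):
                return int(value[:-len(suffix)]) * factor
        return int(value)  # a bare number is assumed to already be in KiB

    for spec in ('large', 'any', '2MB', '1GB'):
        print(spec, '->', parse_mem_page_size(spec))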
@@ -89,7 +88,7 @@ would require.
The property defined for the flavor can also be set against the image, but
the use of large pages would only be honoured if the flavor already had a
- policy of 'large' or 'any'. ie if the flavor said 'small', or a specific
+ policy of 'large' or 'any'. ie if the flavor said a specific
numeric page size, the image would not be permitted to override this to access
other large page sizes. Such invalid override in the image would result in
an exception being raised and the attempt to boot the instance resulting in
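The override rule can be sketched as follows; the exception and function names here are hypothetical, chosen only to make the constraint concrete, and the real driver code may differ:

    # Hypothetical sketch of the flavor/image override rule described above.
    class MemPageSizeForbidden(Exception):
        pass

    def get_requested_page_size(flavor_setting, image_setting=None):
        """Return the effective page size request for the guest."""
        if image_setting is None:
            return flavor_setting
        if flavor_setting in ('large', 'any'):
            # The image may narrow a symbolic policy to something specific.
            return image_setting
        # The flavor pinned an explicit size; the image may not override it.
        raise MemPageSizeForbidden(
            'image page size %r conflicts with flavor %r'
            % (image_setting, flavor_setting))

    print(get_requested_page_size('large', 2048))  # -> 2048
    print(get_requested_page_size('any'))          # -> 'any'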
@@ -123,19 +122,18 @@ involves altering the logic done in that blueprint, so that instead of just
looking at free memory in each NUMA node, it instead looks at the free page
count for the desired page size.
- As illustrated later in this document each host will be reporting on all
- page sizes available and this information will be available to the scheduler.
- So when it interprets 'small', it will consider the smallest page size
- reported by the compute node. Conversely when interpreting 'large' it will
- consider any page size except the smallest one. This obviously implies that
- there is potential for 'large' and 'small' to have different meanings
- depending on the host being considered. For the use cases where this would
- be a problem, an explicit page size would be requested instead of using
- these symbolic named sizes. It will also have to consider whether the flavor
- memory size is a multiple of the page size. If the instance is using
- multiple NUMA nodes, it will have to consider whether the RAM in each
- guest node is a multiple of the page size, rather than the total memory
- size.
+ As illustrated later in this document each host will be reporting on
+ all page sizes available and this information will be available to the
+ scheduler. When interpreting 'large' it will consider any page size
+ except the smallest one. This obviously implies that there is
+ potential for 'large' and 'small' to have different meanings depending
+ on the host being considered. For the use cases where this would be a
+ problem, an explicit page size would be requested instead of using
+ these symbolic named sizes. It will also have to consider whether the
+ flavor memory size is a multiple of the page size. If the instance is
+ using multiple NUMA nodes, it will have to consider whether the RAM in
+ each guest node is a multiple of the page size, rather than the total
+ memory size.
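To make the interpretation above concrete, here is a rough sketch (assumed helpers, not the actual scheduler filter code) of how 'large', 'any' and an explicit size map onto the page sizes a host reports, together with the per-guest-node divisibility check:

    # Rough sketch, not the real NUMA fitting code. Page sizes are in KiB,
    # guest node memory in MiB.
    def candidate_page_sizes(host_page_sizes_kb, request):
        """Map a page size request onto the sizes one host actually offers."""
        sizes = sorted(host_page_sizes_kb)
        if request == 'large':
            return sizes[1:]  # every size except the smallest one
        if request == 'any':
            return sizes
        return [request] if request in sizes else []

    def fits_guest_node(page_size_kb, guest_node_mem_mb):
        """Each guest NUMA node's RAM must be a whole number of pages."""
        return (guest_node_mem_mb * 1024) % page_size_kb == 0

    host_sizes = [4, 2048, 1048576]  # 4k, 2M and 1G pages on x86
    for size in candidate_page_sizes(host_sizes, 'large'):
        print(size, fits_guest_node(size, guest_node_mem_mb=4096))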
Alternatives
------------
@@ -167,20 +165,20 @@ availability per node. So it would then look like
total = 10737418240
free = 3221225472
},
- mempages = {
-    4096 = {
-       total = 262144
-       free = 262144
-    }
-    2097152 = {
-       total = 1024
-       free = 1024
-    }
-    1073741824 = {
-       total = 7
-       free = 0
-    }
- }
+ mempages = [{
+    size_kb = 4,
+    total = 262144,
+    used = 262144,
+ }, {
+    size_kb = 2048,
+    total = 1024,
+    used = 1024,
+ }, {
+    size_kb = 1048576,
+    total = 7,
+    used = 0,
+ }
+ ]
distances = [ 10, 20],
},
{
@@ -190,53 +188,25 @@ availability per node. So it would then look like
total = 10737418240
free = 5368709120
},
- mempages = {
-    4096 = {
-       total = 262144
-       free = 262144
-    }
-    2097152 = {
-       total = 1024
-       free = 1024
-    }
-    1073741824 = {
-       total = 7
-       free = 2
-    }
- }
+ mempages = [{
+    size_kb = 4,
+    total = 262144,
+    used = 512,
+ }, {
+    size_kb = 2048,
+    total = 1024,
+    used = 128,
+ }, {
+    size_kb = 1048576,
+    total = 7,
+    used = 4,
+ }
+ ]
distances = [ 20, 10],
}
],
}
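To illustrate how a consumer such as the scheduler might read the new size_kb/total/used layout shown above, here is a small sketch (the dict literal simply mirrors the example; the function names are illustrative, not part of the proposal):

    # Sketch only: a node shaped like the second NUMA node in the example.
    node1 = {
        'mempages': [
            {'size_kb': 4, 'total': 262144, 'used': 512},
            {'size_kb': 2048, 'total': 1024, 'used': 128},
            {'size_kb': 1048576, 'total': 7, 'used': 4},
        ],
    }

    def free_pages(node, size_kb):
        """Free page count for one page size, derived from total and used."""
        for entry in node['mempages']:
            if entry['size_kb'] == size_kb:
                return entry['total'] - entry['used']
        return 0

    def can_fit(node, size_kb, mem_mb):
        """True if mem_mb of guest RAM can be backed by free size_kb pages."""
        if (mem_mb * 1024) % size_kb:
            return False
        return (mem_mb * 1024) // size_kb <= free_pages(node, size_kb)

    print(can_fit(node1, 1048576, 3072))  # needs 3 x 1G pages, 3 free -> True
    print(can_fit(node1, 1048576, 4096))  # needs 4 x 1G pages, 3 free -> False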
- The data provided to the extensible resource tracker would be similarly
- enhanced to include this page info in a flattened format, which can be
- efficiently queried based on the key name:
- * hw_numa_nodes=2
- * hw_numa_node0_cpus=4
- * hw_numa_node0_mem_total=10737418240
- * hw_numa_node0_mem_avail=3221225472
- * hw_numa_node0_mem_page_total_4=262144
- * hw_numa_node0_mem_page_avail_4=262144
- * hw_numa_node0_mem_page_total_2048=1024
- * hw_numa_node0_mem_page_avail_2048=1024
- * hw_numa_node0_mem_page_total_1048576=7
- * hw_numa_node0_mem_page_avail_1048576=0
- * hw_numa_node0_distance_node0=10
- * hw_numa_node0_distance_node1=20
- * hw_numa_node1_cpus=4
- * hw_numa_node1_mem_total=10737418240
- * hw_numa_node1_mem_avail=5368709120
- * hw_numa_node1_mem_page_total_4=262144
- * hw_numa_node1_mem_page_avail_4=262144
- * hw_numa_node1_mem_page_total_2048=1024
- * hw_numa_node1_mem_page_avail_2048=1024
- * hw_numa_node1_mem_page_total_1048576=7
- * hw_numa_node1_mem_page_avail_1048576=2
- * hw_numa_node1_distance_node0=20
- * hw_numa_node1_distance_node1=10
REST API impact
---------------
@@ -331,10 +301,6 @@ Dependencies
won't be fully realized until per-NUMA node large page allocation can be
done.
- * Extensible resource tracker
-   https://blueprints.launchpad.net/nova/+spec/extensible-resource-tracking
Testing
=======