
Ronelle Landy 457819072f Update Overload standard deviation doc

Bug #2113862 details a number of suggested
corrections and additions to the Workload
Stabilization doc. This patch adds those
suggested changes.

Closes-Bug: #2113862
Assisted-By: Cursor (claude-3.5-sonnet)
Change-Id: I4131a304c064d2ea397b2447025c7edf69a56e2a
Signed-off-by: Ronelle Landy <rlandy@redhat.com>

2025-08-21 11:09:46 -04:00


Workload Stabilization Strategy

Synopsis

display name: Workload stabilization

goal: workload_balancing

watcher.decision_engine.strategy.strategies.workload_stabilization.WorkloadStabilization

Requirements

Metrics

The workload_stabilization strategy requires the following metrics:

metric	description
`instance_ram_usage`	RAM usage of an instance, as a float in megabytes
`instance_cpu_usage`	CPU usage of an instance, as a float between 0 and 100 giving the total CPU usage as a percentage
`host_ram_usage`	RAM usage of a compute node, as a float in megabytes
`host_cpu_usage`	CPU usage of a compute node, as a float between 0 and 100 giving the total CPU usage as a percentage

Cluster data model

Watcher's default Compute cluster data model:

watcher.decision_engine.model.collector.nova.NovaClusterDataModelCollector

Actions

Watcher's default actions:

action	description
`migration`	watcher.applier.actions.migration.Migrate

Planner

Watcher's default planner:

watcher.decision_engine.planner.weight.WeightPlanner

Configuration

Strategy parameters are:

parameter	type	default value	description
`metrics`	array	["instance_cpu_usage", "instance_ram_usage"]	Metrics used to measure the cluster load.
`thresholds`	object	{"instance_cpu_usage": 0.2, "instance_ram_usage": 0.2}	Dict where each key is a metric and each value is a trigger value. The strategy only looks for an action plan when the standard deviation of the usage of at least one resource listed in metrics, computed across hosts on usage normalized between 0 and 1, is higher than its threshold. The standard deviation is 0 for a perfectly balanced cluster and 0.5 for a totally unbalanced one, which should be the maximum value.
`weights`	object	{"instance_cpu_usage_weight": 1.0, "instance_ram_usage_weight": 1.0}	These weights are used to calculate the common standard deviation when optimizing resource usage. Each weight name is the metric name followed by the _weight suffix. A higher value gives that metric more priority when calculating the optimal resulting cluster distribution.
`instance_metrics`	object	{"instance_cpu_usage": "host_cpu_usage", "instance_ram_usage": "host_ram_usage"}	Maps each instance metric listed in the metrics parameter to the compute node metric that measures the same resource.
`host_choice`	string	retry	Method used to choose hosts when analyzing destinations for instances. The available methods are cycle, retry and fullsearch. cycle iterates over the hosts in a cycle. retry picks hosts at random (the number of hosts is set by the retry_count option). fullsearch returns every host in the list.
`retry_count`	number	1	Number of hosts picked at random by the retry method.
`periods`	object	{"instance": 720, "node": 600}	Time window, in seconds, over which resource usage statistics are collected for instance and host metrics. Watcher uses the most recent window to calculate resource usage.
`granularity`	number	300	NOT RECOMMENDED TO MODIFY: The time between two measures in an aggregated timeseries of a metric.
`aggregation_method`	object	{"instance": "mean", "compute_node": "mean"}	NOT RECOMMENDED TO MODIFY: Function used to aggregate multiple measures into an aggregated value.
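
The three host_choice methods can be pictured with the short sketch below. It is one plausible reading of the descriptions in the table, not Watcher's implementation: the function name candidate_hosts and the start and rng arguments are hypothetical and exist only for illustration.

```python
import random
from itertools import cycle, islice


def candidate_hosts(hosts, host_choice="retry", retry_count=1, start=0, rng=None):
    """Return the destination hosts to consider for one instance.

    Illustrative only: this helper, `start` and `rng` are assumptions
    made for the example and are not part of Watcher.
    """
    hosts = list(hosts)
    rng = rng or random
    if host_choice == "fullsearch":
        # Consider every host in the list.
        return hosts
    if host_choice == "retry":
        # Consider `retry_count` hosts picked at random, without repeats.
        return rng.sample(hosts, min(retry_count, len(hosts)))
    if host_choice == "cycle":
        # Walk the host list in order, wrapping around from `start`.
        return list(islice(cycle(hosts), start, start + len(hosts)))
    raise ValueError(f"unknown host_choice: {host_choice!r}")
```

With the default retry method and retry_count of 1, only a single randomly chosen host is examined per instance, which is cheap but may miss a better destination. fullsearch examines every host at a higher cost.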

Efficacy Indicator

Global efficacy indicator:

watcher.decision_engine.goal.efficacy.specs.WorkloadBalancing.get_global_efficacy_indicator

Other efficacy indicators of the goal are:

instance_migrations_count: The number of VM migrations to be performed
instances_count: The total number of instances audited by the strategy
standard_deviation_after_audit: The standard deviation resulting from the audit
standard_deviation_before_audit: The original standard deviation before the audit

Algorithm

A description of the overload algorithm and the role of standard deviation is available here: https://specs.openstack.org/openstack/watcher-specs/specs/newton/implemented/sd-strategy.html
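
As a rough illustration of how the standard deviation relates to the thresholds and weights parameters, the sketch below computes the population standard deviation of normalized host loads. It is a minimal sketch under stated assumptions, not the strategy's code: all function names are hypothetical, CPU usage is normalized by dividing the 0 to 100 percentage by 100, and the weighted sum is an assumed way of combining per-metric deviations.

```python
from statistics import pstdev


def normalize_cpu(host_cpu_usage):
    """Scale CPU percentages (0..100) to the 0..1 range."""
    return [value / 100.0 for value in host_cpu_usage]


def metric_sd(normalized_loads):
    """Population standard deviation of per-host loads in 0..1."""
    return pstdev(normalized_loads)


def exceeds_threshold(sd_by_metric, thresholds):
    """True when any metric's deviation is above its trigger value."""
    return any(
        sd > thresholds[metric]
        for metric, sd in sd_by_metric.items()
        if metric in thresholds
    )


def weighted_sd(sd_by_metric, weights):
    """Assumed combination: sum of deviations scaled by `<metric>_weight`."""
    return sum(sd * weights[metric + "_weight"] for metric, sd in sd_by_metric.items())


# Every host at the same load gives a deviation of 0.
balanced = metric_sd(normalize_cpu([50, 50, 50, 50]))
# Half the hosts idle and half saturated gives the maximum, 0.5.
unbalanced = metric_sd(normalize_cpu([0, 100, 0, 100]))
```

RAM usage is reported in megabytes, so normalizing it would also require each host's total memory, which is not shown here.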

How to use it?

$ openstack optimize audittemplate create \
  at1 workload_balancing --strategy workload_stabilization

$ openstack optimize audit create -a at1 \
  -p thresholds='{"instance_ram_usage": 0.05}' \
  -p metrics='["instance_ram_usage"]'
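
The -p values above are JSON documents wrapped in shell quotes, which is easy to get wrong by hand. The sketch below shows one way to build the same command line from Python data; the helper name audit_create_command is hypothetical, and it only covers array and object parameters such as thresholds and metrics.

```python
import json
import shlex


def audit_create_command(template, **params):
    """Build an `openstack optimize audit create` command line.

    Hypothetical helper for illustration: each parameter value is
    serialized as JSON and the whole `name=value` pair is shell-quoted.
    """
    parts = ["openstack", "optimize", "audit", "create", "-a", template]
    for name, value in params.items():
        parts += ["-p", f"{name}={json.dumps(value)}"]
    return " ".join(shlex.quote(part) for part in parts)


command = audit_create_command(
    "at1",
    thresholds={"instance_ram_usage": 0.05},
    metrics=["instance_ram_usage"],
)
print(command)
```

The quoting differs slightly from the hand-written form (the whole name=value pair is quoted rather than only the value), but the shell passes the same arguments to the client.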

External Links

None
