watcher/doc/source/strategies/workload-stabilization.rst
Ronelle Landy 457819072f Update Overload standard deviation doc
Bug #2113862 details a number of suggested
corrections and additions to the Workload
Stabilization doc. This patch adds those
suggested changes.

Closes-Bug: #2113862
Assisted-By: Cursor (claude-3.5-sonnet)
Change-Id: I4131a304c064d2ea397b2447025c7edf69a56e2a
Signed-off-by: Ronelle Landy <rlandy@redhat.com>
2025-08-21 11:09:46 -04:00


Workload Stabilization Strategy

Synopsis

display name: Workload stabilization

goal: workload_balancing

watcher.decision_engine.strategy.strategies.workload_stabilization.WorkloadStabilization

Requirements

Metrics

The workload_stabilization strategy requires the following metrics:

==================  ==========================================================
metric              description
==================  ==========================================================
instance_ram_usage  RAM usage of an instance, as a float in megabytes
instance_cpu_usage  Total CPU usage of an instance, as a float percentage
                    between 0 and 100
host_ram_usage      RAM usage of a compute node, as a float in megabytes
host_cpu_usage      Total CPU usage of a compute node, as a float percentage
                    between 0 and 100
==================  ==========================================================

Cluster data model

Watcher's default Compute cluster data model:

watcher.decision_engine.model.collector.nova.NovaClusterDataModelCollector

Actions

Watcher's default actions:

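=========  =========================================
action     description
=========  =========================================
migration  watcher.applier.actions.migration.Migrate
=========  =========================================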

Planner

Watcher's default planner:

watcher.decision_engine.planner.weight.WeightPlanner

Configuration

Strategy parameters are:

Each entry below gives the parameter name, its type, its default value, and a description, in that order.

metrics

array

["instance_cpu_usage", "instance_ram_usage"]

Metrics used as rates of cluster loads.

thresholds

object

{"instance_cpu_usage": 0.2, "instance_ram_usage": 0.2}

Dict where each key is a metric and each value is a trigger value. The strategy looks for an action plan only when the standard deviation of the usage of one of the resources listed in metrics, taken as a normalized usage between 0 and 1 across the hosts, is higher than its threshold. The standard deviation of a perfectly balanced cluster is 0, while that of a totally unbalanced cluster is 0.5, which is the maximum value.

weights

object

{"instance_cpu_usage_weight": 1.0, "instance_ram_usage_weight": 1.0}

These weights are used to calculate the common standard deviation when optimizing resource usage. Each weight name consists of the metric name followed by the _weight suffix. A higher value means the metric is given more priority when calculating the optimal resulting cluster distribution.

instance_metrics

object

{"instance_cpu_usage": "host_cpu_usage", "instance_ram_usage": "host_ram_usage"}

Maps each instance metric listed in the metrics parameter to the compute node metric that measures the same resource on the host.

host_choice

string

retry

Method used to choose hosts when analyzing destinations for instances. The available methods are cycle, retry and fullsearch. Cycle iterates over the hosts in a cycle. Retry picks hosts at random (the count is defined by the retry_count option). Fullsearch returns every host in the list.

retry_count

number

1

Number of hosts returned at random when host_choice is retry.

periods

object

{"instance": 720, "node": 600}

Time window, in seconds, over which statistical values of resource usage are collected for instance and host metrics. Watcher uses the most recent period to calculate resource usage.

granularity

number

300

NOT RECOMMENDED TO MODIFY: The time between two measures in an aggregated timeseries of a metric.

aggregation_method

object

{"instance": "mean", "compute_node": "mean"}

NOT RECOMMENDED TO MODIFY: Function used to aggregate multiple measures into an aggregated value.
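
The three host_choice methods described above can be illustrated with a small generator. This is only a sketch of the described behaviour, not Watcher's code: the function name `yield_candidates`, the `rng` argument and the host names are hypothetical, and the assumption is that one batch of candidate hosts is produced per instance being examined.

```python
import itertools
import random


def yield_candidates(hosts, host_choice="retry", retry_count=1, rng=random):
    """Yield one list of candidate destination hosts per instance examined.

    Hypothetical helper illustrating the cycle, retry and fullsearch methods.
    """
    if host_choice == "cycle":
        # Walk the host list round-robin, one host per step.
        for host in itertools.cycle(hosts):
            yield [host]
    elif host_choice == "retry":
        # Pick retry_count distinct hosts at random each time.
        while True:
            yield rng.sample(hosts, retry_count)
    elif host_choice == "fullsearch":
        # Always consider every host.
        while True:
            yield list(hosts)
    else:
        raise ValueError(f"unknown host_choice: {host_choice!r}")


hosts = ["node-1", "node-2", "node-3"]  # example names, not real hosts

cycle = yield_candidates(hosts, "cycle")
print([next(cycle) for _ in range(4)])

print(next(yield_candidates(hosts, "retry", retry_count=2)))
print(next(yield_candidates(hosts, "fullsearch")))
```

With cycle, the fourth batch wraps around to the first host again. Retry trades coverage for speed, while fullsearch evaluates every host at the cost of more simulated placements.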

Efficacy Indicator

Global efficacy indicator:

watcher.decision_engine.goal.efficacy.specs.WorkloadBalancing.get_global_efficacy_indicator

Other efficacy indicators of the goal are:

  • instance_migrations_count: The number of VM migrations to be performed
  • instances_count: The total number of instances audited by the strategy
  • standard_deviation_after_audit: The resulting standard deviation after the audit
  • standard_deviation_before_audit: The original standard deviation before the audit

Algorithm

A description of the overload algorithm and the role of the standard deviation is available here: https://specs.openstack.org/openstack/watcher-specs/specs/newton/implemented/sd-strategy.html
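
As a rough illustration of how the thresholds and weights parameters relate to the standard deviation, the sketch below computes the deviation of normalized host loads and compares it with a threshold. It is not Watcher's implementation: the function names and sample loads are hypothetical, host usage is assumed to be already normalized to the range 0 to 1, the population form of the standard deviation is assumed (which is what makes 0.5 the maximum), and the weighted sum is an assumed way of combining the per-metric values.

```python
from statistics import pstdev


def normalized_sd(host_loads):
    """Population standard deviation of per-host usage values in [0, 1]."""
    return pstdev(host_loads)


def needs_rebalancing(loads_by_metric, thresholds):
    """Return True if any metric's deviation across hosts exceeds its threshold."""
    return any(
        normalized_sd(loads_by_metric[metric]) > limit
        for metric, limit in thresholds.items()
    )


def weighted_sd(loads_by_metric, weights):
    """Weighted sum of per-metric deviations; weight keys are '<metric>_weight'."""
    return sum(
        weights[f"{metric}_weight"] * normalized_sd(loads)
        for metric, loads in loads_by_metric.items()
    )


# Hypothetical normalized loads for a four-host cluster.
balanced = {"instance_cpu_usage": [0.5, 0.5, 0.5, 0.5]}
unbalanced = {"instance_cpu_usage": [0.0, 0.0, 1.0, 1.0]}
thresholds = {"instance_cpu_usage": 0.2}

print(normalized_sd(balanced["instance_cpu_usage"]))    # 0.0
print(normalized_sd(unbalanced["instance_cpu_usage"]))  # 0.5
print(needs_rebalancing(balanced, thresholds))          # False
print(needs_rebalancing(unbalanced, thresholds))        # True

mixed = {
    "instance_cpu_usage": [0.0, 0.0, 1.0, 1.0],
    "instance_ram_usage": [0.5, 0.5, 0.5, 0.5],
}
weights = {"instance_cpu_usage_weight": 1.0, "instance_ram_usage_weight": 1.0}
print(weighted_sd(mixed, weights))                      # 0.5
```

Half the hosts idle and half fully loaded gives the maximum deviation of 0.5, so a threshold of 0.2 is crossed well before the cluster reaches that extreme.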

How to use it?

$ openstack optimize audittemplate create \
  at1 workload_balancing --strategy workload_stabilization

$ openstack optimize audit create -a at1 \
  -p thresholds='{"instance_ram_usage": 0.05}' \
  -p metrics='["instance_ram_usage"]'
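
The values passed with -p above are JSON documents, so quoting mistakes are easy to make when typing them by hand. The following sketch builds the same arguments from Python data; the helper name `audit_parameter_args` is hypothetical, and the script only prints the command line rather than calling Watcher.

```python
import json
import shlex


def audit_parameter_args(parameters):
    """Render strategy parameters as shell-quoted `-p name='<json>'` arguments."""
    return [
        f"-p {name}={shlex.quote(json.dumps(value))}"
        for name, value in parameters.items()
    ]


params = {
    "thresholds": {"instance_ram_usage": 0.05},
    "metrics": ["instance_ram_usage"],
}

command = "openstack optimize audit create -a at1 " + " ".join(
    audit_parameter_args(params)
)
print(command)
```

The printed command matches the audit creation example above and can be pasted into a shell.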
