diff --git a/doc/source/admin/availability-zones.rst b/doc/source/admin/availability-zones.rst
new file mode 100644
index 0000000000..0e257aaa2f
--- /dev/null
+++ b/doc/source/admin/availability-zones.rst
@@ -0,0 +1,581 @@
+.. meta::
+   :keywords: availability zones, AZ, fault tolerance, conductor groups,
+              shards, resource partitioning, high availability, scaling,
+              multiple deployments, nova compute
+
+==========================================
+Availability Zones and Resource Isolation
+==========================================
+
+Overview
+========
+
+While Ironic does not implement traditional OpenStack Availability Zones like
+Nova and Neutron, it provides a **three-tier approach** to resource
+partitioning and isolation that achieves comparable availability zone
+functionality:
+
+* **Multiple Ironic Deployments**: Completely separate Ironic services
+  targeted by different Nova compute nodes
+* **Conductor Groups**: Physical/geographical resource partitioning within
+  a deployment
+* **Shards**: Logical grouping for operational scaling within a deployment
+
+This document explains how these mechanisms work together and how to achieve
+sophisticated availability zone functionality across your infrastructure.
+This document does **not** cover the similar effect which can be achieved
+through the use of API-level Role Based Access Control via the
+``owner`` and ``lessee`` fields.
+
+.. contents:: Table of Contents
+   :local:
+   :depth: 2
+
+Comparison with Other OpenStack Services
+========================================
+
++------------------+-------------------+------------------------+
+| Service          | Mechanism         | Purpose                |
++==================+===================+========================+
+| Nova             | Availability      | Instance placement     |
+|                  | Zones (host       | across fault domains   |
+|                  | aggregates)       |                        |
++------------------+-------------------+------------------------+
+| Neutron          | Agent AZs         | Network service HA     |
++------------------+-------------------+------------------------+
+| **Ironic**       | **Multiple        | **Complete service     |
+|                  | Deployments +     | isolation + physical   |
+|                  | Conductor Groups  | partitioning +         |
+|                  | + Shards**        | operational scaling**  |
++------------------+-------------------+------------------------+
+
+Ironic's Three-Tier Approach
+============================
+
+Tier 1: Multiple Ironic Deployments
+-----------------------------------
+
+The highest level of isolation involves running **completely separate
+Ironic services** that Nova and other API users can target independently.
+
+**Use Cases**:
+
+* Complete geographic separation (different regions/countries)
+* Regulatory compliance requiring full data isolation
+* Independent upgrade cycles and operational teams
+
+**Implementation**: Configure separate Nova compute services to target
+different Ironic deployments using Nova's Ironic driver configuration.
+
+**Benefits**:
+
+* Complete fault isolation - failure of one deployment doesn't affect others
+* Independent scaling, upgrades, and maintenance
+* Different operational policies per deployment
+* Complete API endpoint separation
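+
+For day-to-day client access, each deployment can also be recorded as a
+separate cloud in ``clouds.yaml``. The snippet below is a minimal sketch,
+assuming hypothetical names and endpoints, and uses openstacksdk's
+per-service ``baremetal_endpoint_override`` key to pin the Bare Metal
+endpoint for each cloud; credentials are omitted for brevity:
+
+.. code-block:: yaml
+
+   # clouds.yaml - one entry per Ironic deployment (names are examples)
+   clouds:
+     ironic-east:
+       auth:
+         auth_url: https://keystone-east.example.com/v3
+       baremetal_endpoint_override: https://ironic-east.example.com
+     ironic-west:
+       auth:
+         auth_url: https://keystone-west.example.com/v3
+       baremetal_endpoint_override: https://ironic-west.example.com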
+
+Tier 2: Conductor Groups (Physical Partitioning)
+------------------------------------------------
+
+Within a single Ironic deployment, conductor groups provide
+**physical resource partitioning**.
+
+**Use Cases**:
+
+* Separate nodes by datacenter/availability zone within a region
+* Partition nodes across conductor groups to manage conductor load
+* Isolate hardware types or vendors
+* Create fault domains for high availability
+* Manage nodes with different network connectivity
+
+Conductor groups control **which conductor manages which nodes**.
+Each conductor can be assigned to a specific group, and will only
+manage nodes that belong to the same group.
+
+A classic challenge with Ironic is that it can manage far more bare metal
+nodes than a single ``nova-compute`` service is designed to support. The
+primary answer to this issue is to leverage shards first, and then continue
+to evolve based upon operational needs.
+
+See: :doc:`conductor-groups` for detailed configuration.
+
+.. _availability-zones-shards:
+
+Tier 3: Shards (Logical Partitioning)
+-------------------------------------
+
+The finest level of granularity for **operational and client-side grouping**.
+
+**Use Cases**:
+
+* Horizontal scaling of operations
+* Parallelize maintenance tasks
+* Create logical groupings for different teams
+
+Shards can be used by clients, including Nova, to limit the scope of their
+requests to a logical, declared subset of nodes, which prevents multiple
+``nova-compute`` services from seeing and working with the same node.
+
+.. note::
+   Shards are client-side constructs - Ironic itself does not use shard
+   values internally.
+
+.. versionadded:: 1.82
+   Shard support was added in API version 1.82.
+
+.. warning::
+   Once set, a shard should not be changed. Nova's model of leveraging the
+   Ironic API does not permit this value to be changed after the fact.
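+
+As a brief end-to-end sketch of shard usage (node and shard names below are
+hypothetical), nodes are first assigned to a shard via the Ironic API, and
+the ``nova-compute`` service that should own them then declares the same
+shard in its configuration:
+
+.. code-block:: bash
+
+   # Assign a batch of nodes to a shard (names are examples)
+   baremetal node set node-01 --shard rack1-batch-a
+   baremetal node set node-02 --shard rack1-batch-a
+
+   # Review how nodes are distributed across shards
+   baremetal shard list
+
+.. code-block:: ini
+
+   # nova.conf for the matching nova-compute service
+   [ironic]
+   shard = rack1-batch-a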
+
+Common Deployment Patterns
+==========================
+
+Pattern 1: Multi-Region with Complete Isolation
+-----------------------------------------------
+
+**Use Case**: Global deployment with regulatory compliance
+
+**Implementation**:
+
+- **Multiple Deployments**: ``ironic-us-east``, ``ironic-eu-west``, ``ironic-apac``
+- **Nova Configuration**: Separate compute services per region
+- **Conductor Groups**: Optional within each deployment
+- **Shards**: Operational grouping within regions
+
+**Example Nova Configuration**:
+
+.. code-block:: ini
+
+   # nova-compute for US East region
+   [ironic]
+   auth_url = https://keystone-us-east.example.com/v3
+   endpoint_override = https://ironic-us-east.example.com
+
+   # nova-compute for EU West region
+   [ironic]
+   auth_url = https://keystone-eu-west.example.com/v3
+   endpoint_override = https://ironic-eu-west.example.com
+
+.. note::
+   The ``endpoint_override`` configuration above is provided for
+   illustrative purposes, to stress that the endpoints would be
+   distinctly different.
+
+Pattern 2: Single Region with Datacenter Separation
+---------------------------------------------------
+
+**Use Case**: Metro deployment across multiple datacenters
+
+**Implementation**:
+
+- **Single Deployment**: One Ironic service
+- **Conductor Groups**: ``datacenter-1``, ``datacenter-2``, ``datacenter-3``
+- **Nova Configuration**: Target specific conductor groups
+- **Shards**: Optional operational grouping
+
+In this case, we don't expect BMC management network access to occur between
+datacenters. Thus each datacenter is configured with its own group of
+conductors.
+
+**Example Configuration**:
+
+.. code-block:: bash
+
+   # Configure Nova compute to target specific conductor group
+   [ironic]
+   conductor_group = datacenter-1
+
+   # Configure conductors (ironic.conf)
+   [conductor]
+   conductor_group = datacenter-1
+
+   # Assign nodes
+   baremetal node set <node> --conductor-group datacenter-1
+
+.. note::
+   Some larger operators who leverage conductor groups have suggested
+   that it is sometimes logical to have a conductor set without a
+   ``conductor_group`` set. This helps prevent orphaning nodes, because
+   Ironic routes all changes to the conductor which presently manages
+   the node.
+
+Pattern 3: Operational Scaling Within Datacenters
+-------------------------------------------------
+
+**Use Case**: Large deployment requiring parallel operations
+
+**Implementation**:
+
+- **Single Deployment**: One Ironic service
+- **Conductor Groups**: By datacenter or hardware type
+- **Shards**: Operational batches for maintenance/upgrades
+- **Nova Configuration**: May target specific conductor groups
+
+**Example**:
+
+.. code-block:: bash
+
+   # Set up conductor groups by hardware
+   baremetal node set <node> --conductor-group dell-servers
+   baremetal node set <node> --conductor-group hpe-servers
+
+   # Create operational shards for maintenance
+   baremetal node set <node> --shard maintenance-batch-1
+   baremetal node set <node> --shard maintenance-batch-2
+
+Pattern 4: Hybrid Multi-Tier Approach
+-------------------------------------
+
+**Use Case**: Complex enterprise deployment
+
+**Implementation**: All three tiers working together
+
+**Example Architecture**:
+
+.. code-block:: bash
+
+   # Deployment 1: Production East Coast
+   # Nova compute service targets ironic-prod-east
+   [ironic]
+   endpoint_override = https://ironic-prod-east.example.com
+   conductor_group = datacenter-east
+
+   # Within this deployment:
+   baremetal node set <node> --conductor-group datacenter-east --shard prod-batch-a
+
+   # Deployment 2: Production West Coast
+   # Nova compute service targets ironic-prod-west
+   [ironic]
+   endpoint_override = https://ironic-prod-west.example.com
+   conductor_group = datacenter-west
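+
+If each deployment is additionally recorded in ``clouds.yaml``, as sketched
+under Tier 1, an operator can point the same command at either deployment by
+selecting the cloud. This is a sketch only; it assumes matching
+``clouds.yaml`` entries named after the deployments:
+
+.. code-block:: bash
+
+   # Inspect each deployment's portion of the architecture
+   baremetal --os-cloud ironic-prod-east node list --conductor-group datacenter-east
+   baremetal --os-cloud ironic-prod-west node list --conductor-group datacenter-west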
+
+Nova Integration and Configuration
+==================================
+
+Targeting Multiple Ironic Deployments
+-------------------------------------
+
+Nova's Ironic driver can be configured to target different Ironic services:
+
+**Per-Compute Service Configuration**:
+
+.. code-block:: ini
+
+   # /etc/nova/nova.conf on compute-service-1
+   [ironic]
+   auth_url = https://keystone-region1.example.com/v3
+   endpoint_override = https://ironic-region1.example.com
+   conductor_group = region1-zone1
+
+   # /etc/nova/nova.conf on compute-service-2
+   [ironic]
+   auth_url = https://keystone-region2.example.com/v3
+   endpoint_override = https://ironic-region2.example.com
+   conductor_group = region2-zone1
+
+**Advanced Options**:
+
+.. code-block:: ini
+
+   [ironic]
+   # Target specific conductor group within deployment
+   conductor_group = datacenter-east
+
+   # Target specific shard within deployment
+   shard = production-nodes
+
+   # Connection retry configuration
+   api_max_retries = 60
+   api_retry_interval = 2
+
+.. seealso::
+   `Nova Ironic Hypervisor Configuration `_
+   for complete Nova configuration details.
+
+Scaling Considerations
+----------------------
+
+**Nova Compute Service Scaling**:
+
+* A single nova-compute service can handle several hundred Ironic nodes
+  efficiently.
+* Consider multiple compute services for >1000 nodes per deployment.
+  Nova-compute is modeled on keeping a relatively small number of
+  "instances" per nova-compute process. For example, 250 baremetal nodes.
+* One nova-compute process per conductor group or shard is expected.
+* A node's ``conductor_group`` can be changed at any time, provided it is
+  not referenced by a nova-compute service's configuration. A shard should
+  never be changed once it has been introduced to a nova-compute process.
+
+**Multi-Deployment Benefits**:
+
+* Independent scaling per deployment
+* Isolated failure domains
+* Different operational schedules
+
+Integration Considerations
+==========================
+
+Network Considerations
+----------------------
+
+Ironic's partitioning works alongside physical network configuration:
+
+* Physical networks can span multiple conductor groups
+* Consider network topology when designing conductor group boundaries
+* Ensure network connectivity between conductors and their assigned nodes
+
+.. seealso::
+   :doc:`networking` for detailed network configuration guidance
+
+Nova Placement and Scheduling
+-----------------------------
+
+When using Ironic with Nova:
+
+* Nova's availability zones operate independently of Ironic's partitioning
+* Use resource classes and traits for capability-based scheduling
+
+.. seealso::
+   :doc:`../install/configure-nova-flavors` for flavor and scheduling configuration
+
+API Client Usage
+================
+
+Working Across Multiple Deployments
+-----------------------------------
+
+When managing multiple Ironic deployments, use separate client configurations:
+
+.. code-block:: bash
+
+   # Configure client for deployment 1
+   export OS_AUTH_URL=https://keystone-east.example.com/v3
+   export OS_ENDPOINT_OVERRIDE=https://ironic-east.example.com
+   baremetal node list
+
+   # Configure client for deployment 2
+   export OS_AUTH_URL=https://keystone-west.example.com/v3
+   export OS_ENDPOINT_OVERRIDE=https://ironic-west.example.com
+   baremetal node list
+
+Filtering by Conductor Group
+----------------------------
+
+.. code-block:: bash
+
+   # List nodes by conductor group
+   baremetal node list --conductor-group datacenter-east
+
+   # List ports by node conductor group
+   baremetal port list --conductor-group datacenter-east
+
+Filtering by Shard
+------------------
+
+.. code-block:: bash
+
+   # List nodes by shard
+   baremetal node list --shard batch-a
+
+   # Get shard distribution
+   baremetal shard list
+
+   # Find nodes without a shard assignment
+   baremetal node list --unsharded
+
+Combined Filtering Within Deployments
+-------------------------------------
+
+.. code-block:: bash
+
+   # Within a single deployment, filter by conductor group and shard
+   baremetal node list --conductor-group datacenter-1 --shard maintenance-batch-a
+
+   # Set both conductor group and shard on a node
+   baremetal node set <node> --conductor-group datacenter-east --shard batch-a
+
+   # Get overview of resource distribution
+   baremetal shard list
+   baremetal conductor list
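+
+Bulk changes are commonly scripted around these filters. The loop below is a
+sketch, assuming hypothetical group and shard names and a ``baremetal``
+client already configured against the target deployment; it assigns every
+node in one conductor group to a shard in a single pass:
+
+.. code-block:: bash
+
+   # Tag all nodes in a conductor group with an operational shard
+   for node in $(baremetal node list --conductor-group datacenter-1 -f value -c UUID); do
+       baremetal node set "${node}" --shard maintenance-batch-a
+   done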
+
+Best Practices
+==============
+
+Deployment Strategy Planning
+----------------------------
+
+1. **Assess isolation requirements**: Determine if you need complete service separation
+2. **Plan geographic distribution**: Use multiple deployments for true regional separation
+3. **Design conductor groups**: Align with physical/network boundaries
+4. **Implement shard strategy**: Plan for operational efficiency
+5. **Configure Nova appropriately**: Match Nova compute services to your architecture
+
+Operational Considerations
+--------------------------
+
+**Multiple Deployments**:
+
+* Maintain consistent tooling across deployments
+* Plan for cross-deployment migrations if needed
+* Monitor each deployment independently
+* Coordinate upgrade schedules
+
+**Within Deployments**:
+
+* Monitor shard distribution: ``baremetal shard list``
+* Ensure conductor redundancy per group
+* Align network topology with conductor groups
+* Automate shard management for balance
+
+**Nova Integration**:
+
+* Plan compute service distribution across deployments
+* Monitor nova-compute to Ironic node ratios
+* Test failover scenarios between compute services
+
+Naming Conventions
+------------------
+
+Naming patterns can be defined by the infrastructure operator, and below
+are some basic suggestions which may be relevant based upon operational
+requirements.
+
+**Conductor Groups**:
+
+* Geographic: ``datacenter-east``, ``region-us-west``, ``rack-01``
+* Hardware: ``dell-servers``, ``hpe-gen10``, ``gpu-nodes``
+* Network: ``vlan-100``, ``isolated-network``
+
+**Shards**:
+
+* Operational: ``maintenance-batch-1``, ``upgrade-group-a``
+* Size-based: ``small-nodes``, ``large-memory``
+* Temporal: ``weekend-maintenance``, ``business-hours``
+
+Decision Matrix
+---------------
+
+Choose your approach based on requirements:
+
++-------------------------+-------------------+-----------------+---------------+
+| **Requirement**         | **Multiple        | **Conductor     | **Shards**    |
+|                         | Deployments**     | Groups**        |               |
++=========================+===================+=================+===============+
+| Complete isolation      | ✓ Best            | ✓ Good          | ✗ No          |
++-------------------------+-------------------+-----------------+---------------+
+| Independent upgrades    | ✓ Complete        | ✓ Partial       | ✗ No          |
++-------------------------+-------------------+-----------------+---------------+
+| Geographic separation   | ✓ Best            | ✓ Good          | ✗ No          |
++-------------------------+-------------------+-----------------+---------------+
+| Operational scaling     | ✗ Overhead        | ✓ Good          | ✓ Best        |
++-------------------------+-------------------+-----------------+---------------+
+| Resource efficiency     | ✗ Lower           | ✓ Good          | ✓ Best        |
++-------------------------+-------------------+-----------------+---------------+
+
+Troubleshooting
+===============
+
+Multiple Deployment Issues
+--------------------------
+
+**Connectivity Problems**:
+
+.. code-block:: bash
+
+   # Test connectivity to each deployment
+   baremetal --os-endpoint-override https://ironic-east.example.com node list
+   baremetal --os-endpoint-override https://ironic-west.example.com node list
+
+**Nova Configuration Issues**:
+
+.. code-block:: bash
+
+   # Check Nova compute service registration
+   openstack compute service list --service nova-compute
+
+   # Verify Nova can reach Ironic
+   grep -i ironic /var/log/nova/nova-compute.log
+
+**Cross-Deployment Node Migration**:
+
+.. code-block:: bash
+
+   # Export node data from source deployment
+   baremetal node show <node> --fields all
+
+   # Import to destination deployment (manual process)
+   # Note: Requires careful planning and may need custom tooling
+
+Common Issues Within Deployments
+--------------------------------
+
+**Orphaned nodes**: Nodes without matching conductor groups cannot be managed
+
+.. code-block:: bash
+
+   # Find nodes without conductor groups
+   baremetal node list --conductor-group ""
+
+   # List available conductor groups
+   baremetal conductor list
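+
+Going a step further, the conductor groups referenced by nodes can be
+cross-checked against the groups that currently have a registered conductor.
+The sketch below is an assumption-laden example (it relies on the
+``--fields`` and ``-f value`` output options and on GNU ``comm``); any group
+it prints has nodes but no conductor available to manage them:
+
+.. code-block:: bash
+
+   # Conductor groups referenced by nodes
+   baremetal node list --fields conductor_group -f value | sort -u > /tmp/node-groups
+   # Conductor groups with a registered conductor
+   baremetal conductor list -f value -c "Conductor Group" | sort -u > /tmp/conductor-groups
+   # Groups present in the first list but not the second
+   comm -23 /tmp/node-groups /tmp/conductor-groups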
+
+**Unbalanced shards**: Monitor node distribution across shards
+
+.. code-block:: bash
+
+   # Check shard distribution
+   baremetal shard list
+
+   # Find heavily loaded shards
+   baremetal node list --shard <shard> | wc -l
+
+**Missing conductor groups**: Ensure all groups have active conductors
+
+.. code-block:: bash
+
+   # Check conductor status
+   baremetal conductor list
+
+   # Verify conductor group configuration
+   # Check ironic.conf [conductor] conductor_group setting
+
+Migration Scenarios
+-------------------
+
+**Moving nodes between conductor groups**:
+
+.. code-block:: bash
+
+   # Move node to different conductor group
+   baremetal node set <node> --conductor-group new-group
+
+**Reassigning shards**:
+
+.. code-block:: bash
+
+   # Change node shard assignment
+   baremetal node set <node> --shard new-shard
+
+   # Remove shard assignment
+   baremetal node unset <node> --shard
+
+.. warning::
+   Shards should never be changed once a nova-compute service has
+   identified a node in Ironic. Changing a shard at this point is
+   an unsupported action. As such, Ironic's API RBAC policy restricts
+   these actions to a "System-Scoped Admin" user. Normal admin users
+   are denied this capability due to this restriction and the way
+   shards are consumed on the nova-compute side.
+
+See Also
+========
+
+* :doc:`conductor-groups` - Detailed conductor group configuration
+* :doc:`networking` - Physical network considerations
+* :doc:`../install/refarch/index` - Reference architectures
+* :doc:`multitenancy` - Multi-tenant deployments
+* :doc:`tuning` - Performance tuning considerations
+* `Nova Ironic Driver Documentation `_
+* `Nova Ironic Configuration Options `_
+
diff --git a/doc/source/admin/conductor-groups.rst b/doc/source/admin/conductor-groups.rst
index b9e515b8c8..e80f77781c 100644
--- a/doc/source/admin/conductor-groups.rst
+++ b/doc/source/admin/conductor-groups.rst
@@ -4,9 +4,18 @@
 Conductor Groups
 ================
 
+.. seealso::
+   For a complete guide on achieving availability zone functionality,
+   see :doc:`availability-zones`.
+
 Overview
 ========
 
+Conductor groups provide **physical resource partitioning** in Ironic,
+similar to Nova's availability zones but focused on conductor-level management.
+They work alongside :ref:`shards <availability-zones-shards>` to provide
+complete resource isolation and operational scaling capabilities.
+
 Large-scale operators tend to have needs that involve creating
 well-defined and delineated resources. In some cases, these systems
 may reside close by or in faraway locations. The reasoning may be simple
@@ -64,3 +73,18 @@ expression.
 #. As desired and as needed, the remaining conductors can be updated
    with the first two steps. Please be mindful of the constraints covered
    earlier in the document related to the ability to manage nodes.
+
+Advanced Usage with Multiple Deployments
+========================================
+
+Conductor groups work within a single Ironic deployment. For complete
+service isolation across geographic regions or regulatory boundaries,
+consider using :doc:`multiple Ironic deployments <availability-zones>`
+targeted by different Nova compute services.
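+
+As a minimal sketch (endpoint and group names are hypothetical), each
+nova-compute service in such a layout pins both the deployment it talks to
+and the conductor group it consumes:
+
+.. code-block:: ini
+
+   [ironic]
+   endpoint_override = https://ironic-east.example.com
+   conductor_group = datacenter-east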
+ +See Also +======== + +* :doc:`availability-zones` - Complete availability zone strategy +* :doc:`networking` - Physical network considerations +* :doc:`../install/refarch/index` - Reference architectures diff --git a/doc/source/admin/index.rst b/doc/source/admin/index.rst index 7933f0d8cf..b14304f5c7 100644 --- a/doc/source/admin/index.rst +++ b/doc/source/admin/index.rst @@ -12,8 +12,12 @@ the services. features operation architecture + availability-zones .. toctree:: :hidden: deploy-steps + conductor-groups + networking + multitenancy diff --git a/doc/source/admin/networking.rst b/doc/source/admin/networking.rst index e38a6ee146..41495db021 100644 --- a/doc/source/admin/networking.rst +++ b/doc/source/admin/networking.rst @@ -258,6 +258,11 @@ it is needed. Physical networks ----------------- +.. note:: + Physical networks work alongside :doc:`conductor-groups` and + :doc:`availability-zones` to provide complete resource partitioning. + Consider all three mechanisms when designing your network topology. + An Ironic port may be associated with a physical network using its ``physical_network`` field. Ironic uses this information when mapping between virtual ports in Neutron and physical ports and diff --git a/doc/source/install/refarch/common.rst b/doc/source/install/refarch/common.rst index a095ce35ca..eef32e5d20 100644 --- a/doc/source/install/refarch/common.rst +++ b/doc/source/install/refarch/common.rst @@ -23,6 +23,10 @@ components. provisioning logic lives. The following considerations are the most important when deciding on the way to deploy it: + .. tip:: + For large deployments, consider :doc:`../../admin/availability-zones` + to understand conductor groups and shards for resource partitioning. + * The conductor manages a certain proportion of nodes, distributed to it via a hash ring. This includes constantly polling these nodes for their current power state and hardware sensor data (if enabled and supported @@ -264,6 +268,11 @@ reliability and performance. There is some tolerance for a larger number per conductor. However, it was reported [1]_ [2]_ that reliability degrades when handling approximately 300 bare metal nodes per conductor. +.. note:: + For very large deployments, consider using :doc:`../../admin/availability-zones` + strategies such as conductor groups to distribute load across multiple + conductors, or even multiple Ironic deployments for complete isolation. + Disk space ^^^^^^^^^^