.. This release note was created to address review https://review.opendev.org/c/starlingx/docs/+/862596
.. The Release Notes will be updated and a separate gerrit review will be sent out
.. Ignore the contents in this RN except for the updates stated in the comment above
.. _release-notes:
.. The Stx 10.0 RN is WIP and not ready for review.
.. Removed appearances of Armada as its not supported
===================
R10.0 Release Notes
===================
.. rubric:: |context|
StarlingX is a fully integrated edge cloud software stack that provides
everything needed to deploy an edge cloud on one, two, or up to 100 servers.
This section describes the new capabilities, known limitations and workarounds,
fixed defects, and deprecation information in StarlingX 10.0.
.. contents::
:local:
:depth: 1
---------
ISO image
---------
The pre-built ISO (Debian) for StarlingX 10.0 is located at the
``StarlingX mirror`` repo:
https://mirror.starlingx.windriver.com/mirror/starlingx/release/10.0.0/debian/monolithic/outputs/iso/
------------------------------
Source Code for StarlingX 10.0
------------------------------
The source code for StarlingX 10.0 is available on the r/stx.10.0
branch in the `StarlingX repositories <https://opendev.org/starlingx>`_.
----------
Deployment
----------
To deploy StarlingX 10.0, see `Consuming StarlingX <https://docs.starlingx.io/introduction/consuming.html>`_.
For detailed installation instructions, see `StarlingX 10.0 Installation Guides <https://docs.starlingx.io/deploy_install_guides/index-install-e083ca818006.html>`_.
.. Ghada / Greg please confirm if all features listed here are required in Stx 10.0
-----------------------------
New Features and Enhancements
-----------------------------
The sections below provide a detailed list of new features and links to the
associated user guides (if applicable).
.. start-new-features-r10
****************************
Platform Component Upversion
****************************
The ``auto_update`` attribute supported for |prod| applications
enables apps to be automatically updated when a new app version tarball is
installed on a system.
**See**: https://wiki.openstack.org/wiki/StarlingX/Containers/Applications/AppIntegration
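For illustration, an application can opt in to automatic updates through the
``upgrades`` section of its ``metadata.yaml``. The snippet below is a minimal
sketch with illustrative name and version values; see the wiki page above for
the full set of supported fields.

.. code-block:: none

    app_name: my-app        # illustrative application name
    app_version: 1.1-0      # illustrative version
    upgrades:
      auto_update: true     # update automatically when a newer tarball is installed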
The following platform component versions have been updated in |prod| 10.0.
- sriov-fec-operator 2.9.0
- kubernetes-power-manager 2.5.1
- kubevirt-app: 1.1.0
- security-profiles-operator 0.8.7
- nginx-ingress-controller
- ingress-nginx 4.12.1
- secret-observer 0.1.1
- auditd 1.0.5
- snmp 1.0.3
- cert-manager 1.15.3
- ceph-csi-rbd 3.11.0
- node-interface-metrics-exporter 0.1.3
- node-feature-discovery 0.16.4
- app-rook-ceph
- rook-ceph 1.13.7
- rook-ceph-cluster 1.13.7
- rook-ceph-floating-monitor 1.0.0
- rook-ceph-provisioner 2.0.0
- dell-storage
- csi-powerstore 2.10.0
- csi-unity 2.10.0
- csi-powerscale 2.10.0
- csi-powerflex 2.10.1
- csi-powermax 2.10.0
- csm-replication 1.8.0
- csm-observability 1.8.0
- csm-resiliency 1.9.0
- portieris 0.13.16
- metrics-server 3.12.1 (0.7.1)
- FluxCD helm-controller 1.0.1 (for Helm 3.12.2)
- power-metrics
- cadvisor 0.50.0
- telegraf 1.1.30
- security-profiles-operator 0.8.7
- vault
- vault 1.14.0
- vault-manager 1.0.1
- oidc-auth-apps
- oidc-auth-secret-observer (secret-observer 0.1.7, app 1.0)
- oidc-dex (dex 0.20.0, app 2.41.1)
- oidc-oidc-client (oidc-client 0.1.23, app 1.0)
- platform-integ-apps
- ceph-csi-cephfs 3.11.0
- ceph-pools-audit 0.2.0
- app-istio
- istio-operator 1.22.1
- kiali-server 1.85.0
- harbor 1.12.4
- ptp-notification 2.0.55
- intel-device-plugins-operator
- intel-device-plugins-operator 0.30.3
- intel-device-plugins-qat 0.30.1
- intel-device-plugins-gpu 0.30.0
- intel-device-plugins-dsa 0.30.1
- secret-observer 0.1.1
- node-interface-metrics-exporter 0.1.3
- oran-o2 2.0.4
- helm 3.14.4 for K8s 1.21 - 1.29
- Redfish Tool 1.1.8-1
**See**: :ref:`Application Reference <application-reference-8916dfe370cd>`
********************
Kubernetes Upversion
********************
|prod-long| Release |this-ver| supports Kubernetes 1.29.2.
*****************************************
Distributed Cloud Scalability Improvement
*****************************************
|prod| System Controller scalability has been improved in |prod| 10.0, increasing
both the maximum number of managed nodes (5,000) and the maximum number of
parallel operations.
****************************************
Unified Software Delivery and Management
****************************************
In |prod| 10.0, the Software Patching functionality and the
Software Upgrades functionality have been re-designed into a single Unified
Software Management framework. There is now a single procedure for managing
the deployment of new software, regardless of whether the new software is a
new Patch Release or a new Major Release: the same APIs/CLIs, the same
procedures, the same |VIM| / Host Orchestration strategies, and the same
Distributed Cloud / Subcloud Orchestration strategies are used in both cases.
**See**: :ref:`appendix-commands-replaced-by-usm-for-updates-and-upgrades-835629a1f5b8`
for a detailed list of deprecated commands and new commands.
*******************************************
Infrastructure Management Component Updates
*******************************************
In |prod| 10.0, the new Unified Software Management framework
supports enhanced Patch Release packaging and enhanced Major Release deployments.
Patch Release packaging has been simplified to deliver new or modified Debian
packages, instead of the cryptic OSTree build differences used previously.
This allows Patch Release content to be inspected and validated prior to
deployment, and provides flexibility for future Patch Release packaging.
Major Release deployments have been enhanced to fully leverage OSTree. An
OSTree deploy is now used to update the host software. The new software's
root filesystem can be installed on the host while the host is still running
the software of the old root filesystem. The host is then simply rebooted
into the new software's root filesystem. This provides a significant
improvement in both the upgrade duration and the upgrade service impact
(especially for |AIO-SX| systems), as previously an upgraded host needed to
have its disks/root filesystem wiped and the software re-installed.
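The manual host software deployment flow under the new framework follows the
general sequence sketched below; this is an illustrative outline only, and the
exact commands, options, and prechecks are described in the linked procedures.

.. code-block:: none

    # Upload the new Patch or Major Release
    ~(keystone_admin)$ software upload <release-file>

    # Start the deployment and deploy it to each host
    ~(keystone_admin)$ software deploy start <release-id>
    ~(keystone_admin)$ software deploy host <hostname>

    # Activate and complete the deployment
    ~(keystone_admin)$ software deploy activate
    ~(keystone_admin)$ software deploy complete

    # Monitor progress at any point
    ~(keystone_admin)$ software deploy show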
**See**
- :ref:`patch-release-deployment-before-bootstrap-and-commissioning-of-7d0a97144db8`
- :ref:`manual-host-software-deployment-ee17ec6f71a4`
- :ref:`manual-removal-host-software-deployment-24f47e80e518`
- :ref:`manual-rollback-host-software-deployment-9295ce1e6e29`
***********************************************************
Unified Software Management - Rollback Orchestration AIO-SX
***********************************************************
|VIM| Patch Orchestration has been enhanced to support the abort and rollback of
a Patch Release software deployment. |VIM| Patch Orchestration rollback will
automate the abort and rollback steps across all hosts of a Cloud configuration.
.. note::
In |prod| 10.0, |VIM| Patch Orchestration Rollback is only
supported for |AIO-SX| configurations.
In |prod-long| 10.0, |VIM| Patch Orchestration Rollback is only
supported if the Patch Release software deployment was aborted or
failed prior to the 'software deploy activate' step. If the Patch Release
software deployment is at or beyond the 'software deploy activate' step,
an install plus restore of the Cloud is required in order to roll back
the Patch Release deployment.
**See**: :ref:`orchestrated-rollback-host-software-deployment-c6b12f13a8a1`
***********************************
Enhancements to Full Debian Support
***********************************
The kernel can be switched between the standard and lowlatency variants at runtime.
**See**: :ref:`Modify the Kernel using the CLI <modify-the-kernel-in-the-cli-39f25220ec1b>`
*********************************************************
Support for Kernel Live Patching (for possible scenarios)
*********************************************************
|prod-long| supports kernel live patching, which enables critical functions to
be fixed without rebooting the system, keeping the system functional and running.
The live-patching modules are built into the upgraded |prod-long| binary patch.
The upgraded binary patch is generated as the in-service (non-reboot-required) type.
The kernel modules are matched with the correct kernel release version
during binary patch upgrading.
The relevant kernel modules can be found under
``/lib/modules/<release-kernel-version>/extra/kpatch``.
During binary patch upgrading, the user space tool ``kpatch`` is
used for:
- installing the kernel module to ${installdir}
- loading (insmod) the kernel module into the running kernel
- unloading (rmmod) the kernel module from the running kernel
- uninstalling the kernel module from ${installdir}
- listing the enabled live patch kernel modules
**************************
Subcloud Phased Deployment
**************************
Subclouds can be deployed in individual phases. Instead of using a single
operation, a subcloud can be deployed by executing each phase individually,
as sketched below. Users have the flexibility to proactively abort the
deployment based on their needs. When the deployment is resumed, previously
installed content remains valid.
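A hedged sketch of the phase-by-phase flow using the ``dcmanager`` client is
shown below; the subcommand names are assumptions based on the phased
deployment CLI and should be verified against the linked procedure.

.. code-block:: none

    # Create the subcloud and run each deployment phase individually
    ~(keystone_admin)$ dcmanager subcloud deploy create <subcloud> [options]
    ~(keystone_admin)$ dcmanager subcloud deploy install <subcloud>
    ~(keystone_admin)$ dcmanager subcloud deploy bootstrap <subcloud>
    ~(keystone_admin)$ dcmanager subcloud deploy config <subcloud>

    # A deployment can be aborted and later resumed; previously installed
    # content remains valid
    ~(keystone_admin)$ dcmanager subcloud deploy abort <subcloud>
    ~(keystone_admin)$ dcmanager subcloud deploy resume <subcloud>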
**See**: :ref:`Install a Subcloud in Phases <install-a-subcloud-in-phases-0ce5f6fbf696>`
******************************
Kubernetes Local Client Access
******************************
You can configure Kubernetes access for a user logged in to the active
controller either through SSH or by using the system console.
**See**: :ref:`configure-kubernetes-local-client-access`
*******************************
Kubernetes Remote Client Access
*******************************
Access to the Kubernetes cluster from outside the controller can be done
using the remote CLI container or directly from the host.
**See**: :ref:`configure-kubernetes-remote-client-access`
**************************************************
IPv4/IPv6 Dual Stack support for Platform Networks
**************************************************
Migration of a single-stack deployment to a dual-stack network deployment does
not cause service disruptions.
Dual-stack networking allows IPv4 and IPv6 addresses to be used simultaneously,
or each IP version to continue being used independently. To accomplish
this, platform networks can be associated with one or two address pools, one for
each IP version (IPv4 or IPv6). The first pool is linked to the network
upon creation and cannot be subsequently removed. The second pool can be added or
removed to transition the system between dual-stack and single-stack modes.
**See**: :ref:`dual-stack-support-318550fd91b5`
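As a rough illustration, a second address pool can be created and associated
with an existing platform network from the CLI. The command names below are
assumptions based on the address-pool commands and should be checked against
the linked dual-stack guide; the pool name and prefix are purely illustrative.

.. code-block:: none

    # Create an IPv6 address pool (name and prefix are illustrative)
    ~(keystone_admin)$ system addrpool-add management-ipv6 fd01:: 64

    # Associate the new pool with the management network to enable dual-stack
    ~(keystone_admin)$ system network-addrpool-assign mgmt management-ipv6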
*********************************
Run Kata Containers in Kubernetes
*********************************
There are two methods to run Kata Containers in Kubernetes: by runtime class or
by annotation. Runtime class is supported in Kubernetes since v1.12.0, and it
is the recommended method for running Kata Containers.
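A minimal sketch of the runtime class method is shown below; the RuntimeClass
name ``kata-qemu`` and the pod details are assumptions for illustration only
(list the available names with ``kubectl get runtimeclass``).

.. code-block:: none

    apiVersion: v1
    kind: Pod
    metadata:
      name: busybox-kata            # illustrative pod name
    spec:
      runtimeClassName: kata-qemu   # assumed RuntimeClass name
      containers:
      - name: busybox
        image: busybox:stable
        command: ["sleep", "3600"]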
**See**: :ref:`kata_container`
***************************************************
External DNS Alternative: Adding Local Host Entries
***************************************************
You can configure user-defined host entries for external resources that are not
maintained by |DNS| records resolvable by the external |DNS| server(s) (i.e.
``nameservers`` in ``system dns-show/dns-modify``). This functionality enables
the configuration of local host records, supplementing hosts resolvable by
external |DNS| server(s).
**See**: :ref:`user-host-entries-configuration-9ad4c060eb15`
*******************************************
Power Metrics Enablement - vRAN Integration
*******************************************
|prod| 10.0 integrates an enhanced power metrics tool with
reduced impact on vRAN field deployments.
Power Metrics may increase scheduling latency due to perf and |MSR|
readings. A latency impact of around 3 µs on average was observed, plus
spikes with significant increases in maximum latency values.
There was also an impact on kernel processing time. Applications that
run with priorities at or above 50 on real-time kernel isolated CPUs should
allow kernel services to run, to avoid unexpected system behavior.
**See**: :ref:`install-power-metrics-application-a12de3db7478`
******************************************
Crash dump File Size Setting Enhancements
******************************************
The Linux kernel can be configured to perform a crash dump and reboot in
response to specific serious events. A crash dump event produces a
crash dump report with a bundle of files that represents the state of the kernel
at the time of the event, which is useful for post-event root cause analysis.
The crash dump files generated by Linux kdump (by default, during kernel panics)
are managed by the crashDumpMgr utility. The utility saves crash dump files, but
previously used a fixed configuration when saving them. To provide more flexible
handling, the crashDumpMgr utility is enhanced to support the following
configuration parameters that control the storage and rotation of crash
dump files.
- Maximum Files: New configuration parameter for the number of saved crash
dump files (default 4).
- Maximum Size: Limit the maximum size of an individual crash dump file
(support for unlimited, default 5GB).
- Maximum Used: Limit the maximum storage used by saved crash dump files
(support for unlimited, default unlimited).
- Minimum Available: Limit the minimum available storage on the crash dump
file system (restricted to minimum 1GB, default 10%).
The service parameters must be specified using the following service hierarchy.
It is recommended to model the parameters after the platform coredump service
parameters for consistency.
.. code-block:: none
platform crashdump <parameter>=<value>
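For example, a value could be set with the standard service-parameter commands;
the parameter name ``max_files`` below is illustrative only, and the supported
parameter names are listed in the linked customization guide.

.. code-block:: none

    ~(keystone_admin)$ system service-parameter-add platform crashdump max_files=6
    ~(keystone_admin)$ system service-parameter-apply platform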
**See**: :ref:`customize-crashdumpmanager-46e0d32891a0`
.. Michel Desjardins please confirm if this is applicable?
***********************************************
Subcloud Install or Restore of Previous Release
***********************************************
The |prod| |this-ver| system controller supports fresh install or restore of
both |prod| 9.0 and |prod| |this-ver| subclouds.
If the upgrade is from |prod| 9.0 to a higher release, the **prestage status**
and **prestage versions** fields in the output of the
:command:`dcmanager subcloud list` command will be empty, regardless of whether
the deployment status of the subcloud was ``prestage-complete`` before the upgrade.
These fields will only be updated with values if you run ``subcloud prestage``
or ``prestage orchestration`` again.
**See**: :ref:`Subclouds Previous Major Release Management <subclouds-previous-release-management-5e986615cb4b>`
**For non-prestaged subcloud remote installations**
The ISO imported via ``load-import --active`` should always be at the same patch
level as the system controller. This is to ensure that the subcloud boot image
aligns with the patch level of the load to be installed on the subcloud.
**See**: :ref:`installing-a-subcloud-using-redfish-platform-management-service`
**For prestaged remote subcloud installations**
The ISO imported via ``load-import --inactive`` should be at the same patch level
as the system controller. If the system controller is patched after subclouds
have been prestaged, it is recommended to repeat the prestaging for each
subcloud. This is to ensure that the subcloud boot image aligns with the patch
level of the load to be installed on the subcloud.
**See**: :ref:`prestaging-prereqs`
****************************************
WAD Users Access Right Control via Group
****************************************
You can configure an |LDAP| / |WAD| user with 'sys_protected' group or 'sudo all'.
- an |LDAP| / |WAD| user in 'sys_protected' group on |prod-long|
- is equivalent to the special 'sysadmin' bootstrap user
- via "source /etc/platform/openrc"
- has Keystone admin/admin identity and credentials, and
- has Kubernetes /etc/kubernetes/admin.conf credentials
- only a small number of users have this capability
- an |LDAP| / |WAD| user with 'sudo all' capability on |prod-long|
- can perform the following |prod|-type operations:
- sw_patch to unauthenticated endpoint
- docker/crictl to communicate with the respective daemons
- using some utilities - like show-certs.sh, license-install (recovery only)
- IP configuration for local network setup
- password changes of Linux users (i.e. local LDAP)
- access to restricted files, including some logs
- manual reboots
The local |LDAP| server by default serves both HTTPS on port 636 and HTTP on
port 389.
The HTTPS server certificate is issued by cert-manager ClusterIssuer
``system-local-ca`` and is managed internally by cert-manager. The certificate
will be automatically renewed when the expiration date approaches. The
certificate is called ``system-openldap-local-certificate`` with its secret
having the same name ``system-openldap-local-certificate`` in the
``deployment`` namespace. The server certificate and private key files are
stored in the ``/etc/ldap/certs/`` system directory.
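For reference, the certificate resources named in this section can be inspected
with standard commands; the commands below are illustrative and assume the
default names and namespace described above.

.. code-block:: none

    # cert-manager Certificate and its Secret (namespace 'deployment')
    ~(keystone_admin)$ kubectl get certificate system-openldap-local-certificate -n deployment
    ~(keystone_admin)$ kubectl get secret system-openldap-local-certificate -n deployment

    # Server certificate and private key files on the controller
    ~(keystone_admin)$ sudo ls /etc/ldap/certs/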
**See**:
- :ref:`local-ldap-certificates-4e1df1e39341`
- :ref:`sssd-support-5fb6c4b0320b`
- :ref:`create-ldap-linux-accounts`
****************************************************************************************
Accessing Collect Command with 'sudo' privileges and membership in 'sys_protected' Group
****************************************************************************************
|prod| 10.0 adds support to run ``Collect`` from any
local |LDAP| or remote |WAD| user account that has 'sudo' capability and is a
member of the 'sys_protected' group.
The ``Collect`` tool continues to be supported from the 'sysadmin' user account,
and can also be run from any other successfully created |LDAP| or |WAD| account
with 'sudo' capability and membership in the 'sys_protected' group.
For security reasons, passwordless 'sudo' remains unsupported.
.. Eric McDonald please confirm if this is supported in Stx 10.0
********************************
Support for Intel In-tree Driver
********************************
The system supports both in-tree and out-of-tree versions of the Intel ``ice``,
``i40e``, and ``iavf`` drivers. On initial installation, the system uses the
default out-of-tree driver version. You can switch between the in-tree and
out-of-tree driver versions. For further details:
**See**: :ref:`intel-driver-version-c6e3fa384ff7`
.. note::
The ice in-tree driver does not support SyncE/GNSS deployments.
**************************
Password Rules Enhancement
**************************
You can check the current password expiry settings by running the
:command:`chage -l <username>` command, replacing ``<username>`` with the name
of the user whose password expiry settings you wish to view.
You can also change password expiry settings by running the
:command:`sudo chage -M <days_to_expiry> <username>` command.
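For example (the username ``jdoe`` is hypothetical):

.. code-block:: none

    # View the password aging settings for a user
    $ sudo chage -l jdoe

    # Set the password expiry period to 90 days
    $ sudo chage -M 90 jdoe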
The following new password rules apply:
1. Passwords must be at least 12 characters long.
2. The password must contain at least one letter, one number, and one special
character.
3. The past 5 passwords cannot be reused.
4. The password expiration period is user-configurable and defaults to 90 days.
**See**:
- :ref:`linux-accounts-password-3dcad436dce4`
- :ref:`starlingx-system-accounts-system-account-password-rules`
- :ref:`system-account-password-rules`
*******************************************************************************
Management Network Reconfiguration after Deployment Completion Phase 1 |AIO-SX|
*******************************************************************************
|prod| 10.0 supports changes to the management IP addresses
for a standalone |AIO-SX| and for an |AIO-SX| subcloud after the node is
completely deployed.
**See**:
- :ref:`Manage Management Network Parameters for a Standalone AIO-SX <manage-management-network-parameters-for-a-standalone-aiosx-18c7aaace64d>`
- :ref:`Manage Subcloud Management Network Parameters <manage-subcloud-management-network-parameters-ffde7da356dc>`
****************************
Networking Statistic Support
****************************
The Node Interface Metrics Exporter application is designed to fetch and
display node statistics in a Kubernetes environment. It deploys an Interface
Metrics Exporter DaemonSet on all nodes with the
``starlingx.io/interface-metrics=true`` node label. It uses the Netlink library
to gather data directly from the kernel, offering real-time insights into node
performance.
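A node can be opted in or out of metrics collection with the host label
commands; the host name ``worker-0`` below is illustrative.

.. code-block:: none

    # Label a node so the DaemonSet schedules the exporter on it
    ~(keystone_admin)$ system host-label-assign worker-0 starlingx.io/interface-metrics=true

    # Remove the label to stop running the exporter on that node
    ~(keystone_admin)$ system host-label-remove worker-0 starlingx.io/interface-metrics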
**See**: :ref:`node-interface-metrics-exporter-application-d98b2707c7e9`
*****************************************************
Add Existing Cloud as Subcloud Without Reinstallation
*****************************************************
The subcloud enrollment feature converts a system that was factory pre-installed,
or initially deployed as a standalone cloud, into a subcloud of a |DC|.
Factory pre-installed standalone systems are installed locally in the
factory, and can later be deployed and configured on-site as a |DC| subcloud
without re-installing the system.
**See**: :ref:`Enroll a Factory Installed Non Distributed Standalone System as a Subcloud <enroll-a-factory-installed-nonminusdc-standalone-system-as-a-s-87b2fbf81be3>`
********************************************
Rook Support for freshly Installed StarlingX
********************************************
The new Rook Ceph application will be used for deploying the latest version of
Ceph via Rook.
Rook Ceph is an orchestrator that provides a containerized solution for
Ceph Storage with a specialized Kubernetes Operator to automate the management
of the cluster. It is an alternative solution to the bare metal Ceph storage.
See https://rook.io/docs/rook/latest-release/Getting-Started/intro/ for more
details.
The deployment model is the topology strategy that defines the storage backend
capabilities of the deployment. The deployment model dictates what the storage
solution will look like by defining rules for the placement of storage
cluster elements.
Enhanced Availability for Ceph on AIO-DX
****************************************
Ceph on |AIO-DX| now works with 3 Ceph monitors providing High Availability and
enhancing uptime and resilience.
Available Deployment Models
***************************
Each deployment model works with different deployment strategies and rules to
fit different needs. The following models are available to meet the
requirements of your cluster:
- Controller Model (default)
- Dedicated Model
- Open Model
**See**: :ref:`Deployment Models and Services for Rook Ceph <deployment-models-for-rook-ceph-b855bd0108cf>`
Storage Backend
***************
Configuration of the storage backend defines the deployment model's
characteristics and main settings.
Migration with Rook container based Ceph Installations
******************************************************
When you migrate an |AIO-SX| to an |AIO-DX| subcloud with Rook container-based
Ceph installations in |prod| 10.0, you need to follow the
additional procedural steps below:
.. rubric:: |proc|
After you configure controller-1, follow the steps below:
#. Add a new Ceph monitor on controller-1.
.. code-block:: none
~(keystone_admin)$ system host-fs-add controller-1 ceph=<size>
#. Add an |OSD| on controller-1.
#. List the host's disks and identify the disks you want to use for Ceph |OSDs|.
Ensure you note the |UUIDs|.
.. code-block:: none
~(keystone_admin)$ system host-disk-list controller-1
#. Add disks as an |OSD| storage.
.. code-block:: none
~(keystone_admin)$ system host-stor-add controller-1 osd <disk-uuid>
#. List |OSD| storage devices.
.. code-block:: none
~(keystone_admin)$ system host-stor-list controller-1
Unlock controller-1 and follow the steps below:
#. Wait until Ceph is updated with two active monitors. To verify the updates,
run the :command:`ceph -s` command and ensure the output shows
`mon: 2 daemons, quorum a,b`. This confirms that both monitors are active.
.. code-block:: none
~(keystone_admin)$ ceph -s
cluster:
id: c55813c6-4ce5-470b-b9f5-e3c1fa0c35b1
health: HEALTH_WARN
insufficient standby MDS daemons available
services:
mon: 2 daemons, quorum a,b (age 2m)
mgr: a(active, since 114s), standbys: b
mds: 1/1 daemons up
osd: 4 osds: 4 up (since 46s), 4 in (since 65s)
#. Add the floating monitor.
.. code-block:: none
~(keystone_admin)$ system host-lock controller-1
~(keystone_admin)$ system controllerfs-add ceph-float=<size>
~(keystone_admin)$ system host-unlock controller-1
Wait for the controller to reset and come back up to an operational state.
#. Re-apply the ``rook-ceph`` application.
.. code-block:: none
~(keystone_admin)$ system application-apply rook-ceph
To Install and Uninstall Rook Ceph
**********************************
**See**:
- :ref:`Install Rook Ceph <install-rook-ceph-a7926a1f9b70>`
- :ref:`Uninstall Rook Ceph <uninstall-rook-ceph-cbb046746782>`
Performance Configurations on Rook Ceph
***************************************
When using Rook Ceph it is important to consider resource allocation and
configuration adjustments to ensure optimal performance. Rook introduces
additional management overhead compared to a traditional bare-metal Ceph setup
and needs more infrastructure resources.
**See**: :ref:`performance-configurations-rook-ceph-9e719a652b02`
**********************************************************************************
Protecting against L2 Network Attackers - Securing local traffic on MGMT networks
**********************************************************************************
A new security solution is introduced for the |prod-long| inter-host management
network. It protects LOCAL traffic on the MGMT network, which is used for
private/internal infrastructure management of the |prod-long| cluster, against
attackers with direct access to local |prod-long| L2 VLANs. This includes
protection against both passive and active attackers accessing private/internal
data, which could put the security of the cluster at risk:
- passive attackers snooping traffic on L2 VLANs (MGMT), and
- active attackers attempting to connect to private internal endpoints on
|prod-long| L2 interfaces (MGMT) on |prod| hosts.
IPsec is a set of communication rules or protocols for setting up secure
connections over a network. |prod| utilizes IPsec to protect local traffic
on the internal management network of multi-node systems.
|prod| uses strongSwan as the IPsec implementation. strongSwan is an
opensource IPsec solution. See https://strongswan.org/ for more details.
For the most part, IPsec on |prod| is transparent to users.
**See**:
- :ref:`IPsec Overview <ipsec-overview-680c2dcfbf3b>`
- :ref:`Configure and Enable IPsec <ipsec-configuration-and-enabling-f70964bc49d1>`
- :ref:`IPSec Certificates <ipsec-certificates-2c0655a2a888>`
- :ref:`IPSec CLIs <ipsec-clis-5f38181d077f>`
**********************************************************
Vault application support for running on application cores
**********************************************************
By default, the Vault application's pods run on platform cores.
If ``kube-cpu-mgr-policy`` is set to ``static`` and the label
``app.starlingx.io/component`` is overridden for the Vault namespace or pods,
there are two requirements:
- The Vault server pods need to be restarted as directed by Hashicorp Vault
documentation. Restart each of the standby server pods in turn, then restart
the active server pod.
- Ensure that sufficient hosts with worker function are available to run the
Vault server pods on application cores.
**See**:
- :ref:`Kubernetes CPU Manager Policies <kubernetes-cpu-manager-policies>`.
- :ref:`System backup, System and Storage Restore <index-backup-kub-699e0d16c076>`.
- :ref:`Run Hashicorp Vault Restore Playbook Remotely <run-hashicorp-vault-restore-playbook-remotely-436250ea3ed7>`.
- :ref:`Run Hashicorp Vault Restore Playbook Locally on the Controller <run-hashicorp-vault-restore-playbook-locally-on-the-controller-10daacd4abdc>`.
Restart the Vault Server pods
*****************************
The Vault server pods do not restart automatically. If the pods are to be
re-labelled to switch execution from platform to application cores, or vice-versa,
then the pods need to be restarted.
Under Kubernetes, the pods are restarted using the :command:`kubectl delete pod`
command. See the Hashicorp Vault documentation for the recommended procedure for
restarting server pods in an |HA| configuration:
https://support.hashicorp.com/hc/en-us/articles/23744227055635-How-to-safely-restart-a-Vault-cluster-running-on-Kubernetes.
Ensure that sufficient hosts are available to run the server pods on application cores
**************************************************************************************
A standard cluster with fewer than three worker nodes does not support Vault |HA|
on the application cores. In this configuration (fewer than three cluster hosts
with worker function):
- When setting label app.starlingx.io/component=application with the Vault
app already applied in |HA| configuration (3 Vault server pods), ensure that
there are 3 nodes with worker function to support the |HA| configuration.
- When applying Vault for the first time with ``app.starlingx.io/component``
set to "application": ensure that the server replicas value is also set to 1 for
a non-HA configuration. The replicas for the Vault server are overridden in both
the Vault Helm chart and the Vault manager Helm chart:
.. code-block:: none
cat <<EOF > vault_overrides.yaml
server:
extraLabels:
app.starlingx.io/component: application
ha:
replicas: 1
injector:
extraLabels:
app.starlingx.io/component: application
EOF
cat <<EOF > vault-manager_overrides.yaml
manager:
extraLabels:
app.starlingx.io/component: application
server:
ha:
replicas: 1
EOF
$ system helm-override-update vault vault vault --values vault_overrides.yaml
$ system helm-override-update vault vault-manager vault --values vault-manager_overrides.yaml
******************************************************
Component Based Upgrade and Update - VIM Orchestration
******************************************************
|VIM| Patch Orchestration in StarlingX 10.0 has been updated to interwork with
the new underlying Unified Software Management APIs.
As before, |VIM| Patch Orchestration automates the patching of software across
all hosts of a Cloud configuration. All Cloud configurations are supported;
|AIO-SX|, |AIO-DX|, |AIO-DX| with worker nodes, Standard configuration with controller
storage and Standard configuration with dedicated storage.
.. note::
This includes the automation of both applying a Patch and removing a Patch.
**See**
- :ref:`orchestrated-deployment-host-software-deployment-d234754c7d20`
- :ref:`orchestrated-removal-host-software-deployment-3f542895daf8`
**********************************************************
Subcloud Remote Install, Upgrade and Prestaging Adaptation
**********************************************************
StarlingX 10.0 supports a software management upgrade/update process
that does not require re-installation. The procedure for upgrading a system is
simplified since the existing filesystem and associated release configuration
remain intact in the version-controlled paths (e.g. /opt/platform/config/<version>).
In addition, the /var and /etc directories are retained, meaning that
updates can be done directly as part of the software migration procedure. This
eliminates the need to perform a backup and restore procedure for |AIO-SX|
based systems. In addition, the rollback procedure can revert to the
existing versioned or saved configuration if an error occurs and the system
must be reverted to the older software release.
With this change, prestaging for an upgrade will involve populating a new ostree
deployment directory in preparation for an atomic upgrade and pulling new container
image versions into the local container registry. Since the system is not
reinstalled, there is no requirement to save container images to a protected
partition during the prestaging process; the new container images can be
populated directly in the local container registry.
**See**: :ref:`prestage-a-subcloud-using-dcmanager-df756866163f`
********************************************************
Update Default Certificate Configuration on Installation
********************************************************
You can configure default certificates during install for both standalone and
Distributed Cloud systems.
**New bootstrap overrides for system-local-ca (Platform Issuer)**
- You can customize the Platform Issuer (system-local-ca) used to sign the platform
certificates with an external Intermediate |CA| from bootstrap, using the new
bootstrap overrides.
**See**: :ref:`Platform Issuer <ansible_bootstrap_configs_platform_issuer>`
.. note::
It is recommended to configure these overrides. If they are not configured,
``system-local-ca`` will be configured using a locally auto-generated
Kubernetes Root |CA|.
**REST API / Horizon GUI and Docker Registry certificates are issued during bootstrap**
- The certificates for StarlingX REST APIs / Horizon GUI access and Local
Docker Registry will be automatically issued by ``system-local-ca`` during
bootstrap. They will be anchored to ``system-local-ca`` Root CA public
certificate, so only this certificate needs to be added in the user list of
trusted CAs.
**HTTPS enabled by default for StarlingX REST API access**
- The system is now configured by default with HTTPS enabled for access to
StarlingX API and the Horizon GUI. The certificate used to secure this will be
anchored to ``system-local-ca`` Root |CA| public certificate.
**Playbook to update system-local-ca and re-sign the renamed platform certificates**
- The ``migrate_platform_certificates_to_certmanager.yml`` playbook is renamed
to ``update_platform_certificates.yml``.
**External certificates provided in bootstrap overrides can now be provided as
base64 strings, such that they can be securely stored with Ansible Vault**
- The following bootstrap overrides for certificate data **CAN** be provided as
the certificate / key converted into single line base64 strings instead of the
filepath for the certificate / key:
- ssl_ca_cert
- k8s_root_ca_cert and k8s_root_ca_key
- etcd_root_ca_cert and etcd_root_ca_key
- system_root_ca_cert, system_local_ca_cert and system_local_ca_key
.. note::
You can secure the certificate data in an encrypted bootstrap
overrides file using Ansible Vault.
The base64 string can be obtained using the :command:`base64 -w0 <cert_file>`
command. The string can be included in the overrides YAML file
(secured via Ansible Vault), and the insecurely managed ``cert_file``
can then be removed from the system.
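As a brief illustration of this workflow (the file name and override shown are
examples based on the list above):

.. code-block:: none

    # Convert a CA certificate to a single-line base64 string
    $ base64 -w0 ssl_ca.crt

    # In the bootstrap overrides file (secured with Ansible Vault), provide the
    # string instead of a file path, for example:
    ssl_ca_cert: <single-line base64 string produced above>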
***************************************************
Dell CSI Driver Support - Test with Dell PowerStore
***************************************************
|prod| 10.0 introduces a new system application that supports
Kubernetes CSM/CSI for Dell Storage Platforms. With this application you
can communicate with Dell PowerScale, PowerMax, PowerFlex, PowerStore and
Unity XT Storage Platforms to provision |PVCs| and use them in Kubernetes
stateful applications.
**See**: :ref:`Dell Storage File System Provisioner <index-storage-kub-e797132c87a8>`
for details on installation and configurations.
************************************************
O-RAN O2 IMS and DMS Interface Compliancy Update
************************************************
With the new updates to the Infrastructure Management Services (IMS) and
Deployment Management Services (DMS) in the O-RAN O2 J-release, OAuth2 and mTLS
are mandatory options. The application is fully compliant with the latest O-RAN
O2 IMS interface specification (R003-v05.00) and the O2 DMS interface Kubernetes
profile (R003-v04.00). Kubernetes Secrets are no longer required.
The services implemented include:
- O2 API with mTLS enabled
- O2 API supported OAuth2.0
- Compliance with O2 IMS and DMS specs
**See**: :ref:`oran-o2-application-b50a0c899e66`
***************************************************
Configure Liveness Probes for PTP Notification Pods
***************************************************
Helm overrides can be used to configure liveness probes for ``ptp-notification``
containers.
**See**: :ref:`configure-liveness-probes`
*************************
Intel QAT and GPU Plugins
*************************
The |QAT| and |GPU| applications provide a set of plugins developed by Intel
to facilitate the use of Intel hardware features in Kubernetes clusters.
These plugins are designed to enable and optimize the use of Intel-specific
hardware capabilities in a Kubernetes environment.
Intel |GPU| plugin enables Kubernetes clusters to utilize Intel GPUs for
hardware acceleration of various workloads.
Intel® QuickAssist Technology (Intel® QAT) accelerates cryptographic workloads
by offloading the data to hardware capable of optimizing those functions.
The following QAT and GPU plugins are supported in |prod| 10.0.
**See**:
- :ref:`intel-device-plugins-operator-application-overview-c5de2a6212ae`
- :ref:`gpu-device-plugin-configuration-615e2f6edfba`
- :ref:`qat-device-plugin-configuration-616551306371`
******************************************
Support for Sapphire Rapids Integrated QAT
******************************************
Intel 4th generation Xeon Scalable Processor (Sapphire Rapids) support has been
introduced in |prod| 10.0.
- Drivers for QAT Gen 4 Intel Xeon Gold Scalable processor (Sapphire Rapids)
- Intel Xeon Gold 6428N
**************************************************
Sapphire Rapids Data Streaming Accelerator Support
**************************************************
Intel® |DSA| is a high-performance data copy and transformation accelerator
integrated into Intel® processors starting with 4th Generation Intel® Xeon®
processors. It is targeted for optimizing streaming data movement and
transformation operations common with applications for high-performance
storage, networking, persistent memory, and various data processing
applications.
**See**: :ref:`data-streaming-accelerator-db88a67c930c`
*************************
DPDK Private Mode Support
*************************
For the purpose of enabling and using ``needVhostNet``, |SRIOV| needs to be
configured on a worker host.
**See**: :ref:`provisioning-sr-iov-interfaces-using-the-cli`
******************************
|SRIOV| |FEC| Operator Support
******************************
|FEC| Operator 2.9.0 is adopted based on Intel recommendations, offering features
for various Intel hardware accelerators used in field deployments.
**See**: :ref:`configure-sriov-fec-operator-to-enable-hw-accelerators-for-hosted-vran-containarized-workloads`
******************************************************
Support for Advanced VMs on Stx Platform with KubeVirt
******************************************************
The KubeVirt system application kubevirt-app-1.1.0 in |prod-long| includes:
KubeVirt, Containerized Data Importer (CDI) v1.58.0, and the Virtctl client tool.
|prod| 10.0 supports enhancements for this application. The documentation
describes the KubeVirt architecture, provides steps to install KubeVirt, and
gives examples for effective implementation in your environment.
**See**:
- :ref:`index-kubevirt-f1bfd2a21152`
***************************************************
Support Harbor Registry (Harbor System Application)
***************************************************
Harbor registry is integrated as a System Application. End users can use Harbor,
running on |prod-long|, for holding and managing their container images. The
Harbor registry is currently not used by the platform.
Harbor is an open-source registry that secures artifacts with policies and
role-based access control, ensures images are scanned and free from
vulnerabilities, and signs images as trusted. Harbor has evolved into a
complete |OCI|-compliant cloud-native artifact registry.
With Harbor v2.0, users can manage images, manifest lists, Helm charts,
|CNABs|, |OPAs|, and other artifacts that adhere to the |OCI| image specification.
It also allows pulling, pushing, deleting, tagging, replicating, and
scanning these artifacts. Signing images and manifest lists is also
possible.
.. note::
When using local |LDAP| for authentication of the Harbor system application,
you cannot use local |LDAP| groups for authorization; use only individual
local |LDAP| users for authorization.
**See**: :ref:`harbor-as-system-app-1d1e3ec59823`
**************************
Support for DTLS over SCTP
**************************
DTLS (Datagram Transport Layer Security) v1.2 is supported in |prod| 10.0.
1. The |SCTP| module is now autoloaded by default.
2. The socket buffer size values have been increased:
Old values (in bytes):
- net.core.rmem_max=425984
- net.core.wmem_max=212992
New values (in bytes):
- net.core.rmem_max=10485760
- net.core.wmem_max=10485760
3. To enable each |SCTP| socket association to have its own buffer space, the
socket accounting policies have been updated as follows:
- net.sctp.sndbuf_policy=1
- net.sctp.rcvbuf_policy=1
4. |SCTP| authentication is now enabled by default.
Old value:
- net.sctp.auth_enable=0
New value:
- net.sctp.auth_enable=1
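The active values can be confirmed on a host with ``sysctl``, for example:

.. code-block:: none

    $ sysctl net.core.rmem_max net.core.wmem_max
    $ sysctl net.sctp.sndbuf_policy net.sctp.rcvbuf_policy net.sctp.auth_enable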
***********************************************************
Banner Information Automation during Subcloud Bootstrapping
***********************************************************
Users can now customize and automate banner information for subclouds during
system commissioning and installation.
You can customize the pre-login message (issue) and post-login |MOTD| across
the entire |prod| cluster during system commissioning and installation.
**See**: :ref:`Brand the Login Banner During Commissioning <branding-the-login-banner-during-commissioning>`
.. end-new-features-r10
----------------
Hardware Updates
----------------
**See**:
- :ref:`Kubernetes Verified Commercial Hardware <verified-commercial-hardware>`
----------
Bug status
----------
**********
Fixed bugs
**********
This release provides fixes for a number of defects. Refer to the StarlingX bug
database to review the R10.0 `Fixed Bugs <https://bugs.launchpad.net/starlingx/+bugs?field.searchtext=&orderby=-importance&field.status%3Alist=FIXRELEASED&assignee_option=any&field.assignee=&field.bug_reporter=&field.bug_commenter=&field.subscriber=&field.structural_subscriber=&field.tag=stx.10.0&field.tags_combinator=ANY&field.has_cve.used=&field.omit_dupes.used=&field.omit_dupes=on&field.affects_me.used=&field.has_patch.used=&field.has_branches.used=&field.has_branches=on&field.has_no_branches.used=&field.has_no_branches=on&field.has_blueprints.used=&field.has_blueprints=on&field.has_no_blueprints.used=&field.has_no_blueprints=on&search=Search>`_.
.. All please confirm if any Limitations need to be removed / added for Stx 10.0.
----------------------------------------
Known Limitations and Procedural Changes
----------------------------------------
The following are known limitations you may encounter with your |prod| 10.0
and earlier releases. Workarounds are suggested where applicable.
.. note::
These limitations are considered temporary and will likely be resolved in
a future release.
.. contents:: |minitoc|
:local:
:depth: 1
************************************
Ceph Daemon Crash and Health Warning
************************************
After a Ceph daemon crash, an alarm is raised advising you to verify Ceph health.
Running ``ceph -s`` displays the following message:
.. code-block::
cluster:
id: <id>
health: HEALTH_WARN
1 daemons have recently crashed
One or more Ceph daemons have crashed, and the crash has not yet been
archived or acknowledged by the administrator.
**Procedural Changes**: Archive the crash to clear the health check warning
and the alarm.
1. List the timestamp/uuid crash-ids for all new crash information:
.. code-block:: none
[sysadmin@controller-0 ~(keystone_admin)]$ ceph crash ls-new
2. Display details of a saved crash.
.. code-block:: none
[sysadmin@controller-0 ~(keystone_admin)]$ ceph crash info <crash-id>
3. Archive the crash so it no longer appears in ``ceph crash ls-new`` output.
.. code-block:: none
[sysadmin@controller-0 ~(keystone_admin)]$ ceph crash archive <crash-id>
4. After archiving the crash, make sure the recent crash is not displayed.
.. code-block:: none
[sysadmin@controller-0 ~(keystone_admin)]$ ceph crash ls-new
5. If more than one crash needs to be archived, run the following command.
.. code-block:: none
[sysadmin@controller-0 ~(keystone_admin)]$ ceph crash archive-all
********************************
Rook Ceph Application Limitation
********************************
After applying the Rook Ceph application in an |AIO-DX| configuration, the
``800.001 - Storage Alarm Condition: HEALTH_WARN`` alarm may be triggered.
**Procedural Changes**: Restart the pod of the monitor associated with the
slow operations detected by Ceph (check ``ceph -s``).
*********************************************************************
Subcloud failed during rehoming while creating RootCA update strategy
*********************************************************************
Subcloud rehoming may fail while creating the RootCA update strategy.
**Procedural Changes**: Delete the subcloud from the new System Controller and
rehome it again.
**************************************************
RSA required to be the platform issuer private key
**************************************************
The ``system-local-ca`` issuer needs to use an RSA certificate/key. The use
of other types of private keys is currently not supported during bootstrap
or with the ``Update system-local-ca or Migrate Platform Certificates to use
Cert Manager`` procedures.
**Procedural Changes**: N/A.
*****************************************************
Host lock/unlock may interfere with application apply
*****************************************************
Host lock and unlock operations may interfere with applications that are in
the applying state.
**Procedural Changes**: Re-applying or removing / installing applications may be
required. Application status can be checked using the :command:`system application-list`
command.
****************************************************
Add / delete operations on pods may result in errors
****************************************************
Under some circumstances, add / delete operations on pods may result in pods
staying in the ContainerCreating/Terminating state and reporting an
'error getting ClusterInformation: connection is unauthorized: Unauthorized' error.
This error may also prevent users from locking the host.
**Procedural Changes**: If this error occurs, run the
:command:`kubectl describe pod -n <namespace> <pod name>` command. The following
message is displayed:
`error getting ClusterInformation: connection is unauthorized: Unauthorized`
.. note::
There is a known issue with the Calico CNI that may occur in rare
occasions if the Calico token required for communication with the
kube-apiserver becomes out of sync due to |NTP| skew or issues refreshing
the token.
**Procedural Changes**: Delete the calico-node pod (causing it to automatically
restart) using the following commands:
.. code-block:: none
$ kubectl get pods -n kube-system --show-labels | grep calico
$ kubectl delete pods -n kube-system -l k8s-app=calico-node
******************************************
Deploy does not fail after a system reboot
******************************************
Deploy does not fail after a system reboot.
**Procedural Changes**: Run the
:command:`sudo software-deploy-set-failed --hostname/-h <hostname> --confirm`
utility to manually move the deploy and deploy host to a failed state after a
failover, power loss, network outage, etc. You can only run this
utility with root privileges on the active controller.
The utility displays the current state and warns the user about the next steps
to be taken in case the user needs to continue executing the utility. It also
displays the new states and the next operation to be executed.
*********************************
Rook-ceph application limitations
*********************************
This section documents the following known limitations you may encounter with
the rook-ceph application, and the procedural changes that you can use to
resolve them.
**Remove all OSDs in a host**
The procedure to remove |OSDs| will not work as expected when removing all
|OSDs| from a host. The Ceph cluster gets stuck in ``HEALTH_WARN`` state.
.. note::
Use the Procedural change only if the cluster is stuck in ``HEALTH_WARN``
state after removing all |OSDs| on a host.
**Procedural Changes**:
1. Check the cluster health status.
.. code-block:: none
$ ceph status
2. Check the crushmap tree.
.. code-block:: none
$ ceph osd tree
3. Remove the host(s) that appear empty in the output of the previous command.
.. code-block:: none
$ ceph osd crush remove <hostname>
4. Check the cluster health status.
.. code-block:: none
$ ceph status
**Use the rook-ceph apply command when a host with OSD is in offline state**
Applying **rook-ceph** will not allocate the |OSDs| correctly if the host is
offline.
.. note::
Use either of the procedural changes below only if the |OSDs| are not
allocated in the Ceph cluster.
**Procedural Changes 1**:
1. Check if the |OSD| is not in the crushmap tree.
.. code-block:: none
$ ceph osd tree
2. Restart the rook-ceph operator pod.
.. code-block:: none
$ kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=0
$ kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=1
.. note::
Wait about 5 minutes to let the operator try to recover the |OSDs|.
3. Check if the |OSDs| have been added to the crushmap tree.
.. code-block:: none
$ ceph osd tree
**Procedural Changes 2**:
1. Check if the |OSD| is not in the crushmap tree, OR it is in the crushmap tree
but not allocated in the correct location (within a host).
.. code-block:: none
$ ceph osd tree
2. Lock the host.
.. code-block:: none
$ system host-lock <hostname>
Wait for the host to be locked.
3. Get the list of |OSDs| from the host inventory.
.. code-block:: none
$ system host-stor-list <hostname>
4. Remove the |OSDs| from the inventory.
.. code-block:: none
$ system host-stor-delete <stor_uuid>
5. Reapply the rook-ceph application.
.. code-block:: none
$ system application-apply rook-ceph
Wait for the |OSD| prepare pods to be recreated.
.. code-block:: none
$ kubectl get pods -n rook-ceph -l app=rook-ceph-osd-prepare -w
6. Add the |OSDs| to the inventory.
.. code-block:: none
$ system host-stor-add <hostname> <disk_uuid>
7. Reapply the rook-ceph application.
.. code-block:: none
$ system application-apply rook-ceph
Wait for new |OSD| pods to be created and running.
.. code-block:: none
$ kubectl get pods -n rook-ceph -l app=rook-ceph-osd -w
*****************************************************************************************************************
Unable to set maximum VFs for NICs using out-of-tree ice driver v1.14.9.2 on systems with a large number of cores
*****************************************************************************************************************
On systems with a large number of cores (>= 32 physical cores / 64 threads),
it is not possible to set the maximum number of |VFs| (32) for NICs using the
out-of-tree ice driver v1.14.9.2.
If the issue is encountered, the following error logs will be reported in kern.log:
.. code-block:: none
[ 83.322344] ice 0000:51:00.1: Only 59 MSI-X interrupts available for SR-IOV. Not enough to support minimum of 2 MSI-X interrupts per VF for 32 VFs
[ 83.322362] ice 0000:51:00.1: Not enough resources for 32 VFs, err -28. Try with fewer number of VFs
The impacted NICs are:
- Intel E810
- Silicom STS2
**Procedural Changes**: Reduce the number of configured |VFs|. To determine the
maximum number of supported |VFs|:
- Check /sys/class/net/<interface name>/device/sriov_vf_total_msix.
Example:
.. code-block:: none
cat /sys/class/net/enp81s0f0/device/sriov_vf_total_msix
59
- Calculate the maximum number of |VFs| as sriov_vf_total_msix / 2.
Example:
.. code-block:: none
max_VFs = 59/2 = 29
*****************************************************************
Critical alarm 800.001 after Backup and Restore on AIO-SX Systems
*****************************************************************
A Critical alarm 800.001 may be triggered after running the Restore
Playbook. The alarm details are as follows:
.. code-block:: none
~(keystone_admin)]$ fm alarm-list
+-------+----------------------------------------------------------------------+--------------------------------------+----------+---------------+
| Alarm | Reason Text | Entity ID | Severity | Time Stamp |
| ID | | | | |
+-------+----------------------------------------------------------------------+--------------------------------------+----------+---------------+
| 800. | Storage Alarm Condition: HEALTH_ERR. Please check 'ceph -s' for more | cluster= | critical | 2024-08-29T06 |
| 001 | details. | 96ebcfd4-3ea5-4114-b473-7fd0b4a65616 | | :57:59.701792 |
| | | | | |
+-------+----------------------------------------------------------------------+--------------------------------------+----------+---------------+
**Procedural Changes**: To clear this alarm run the following commands:
.. note::
Applies only to |AIO-SX| systems.
.. code-block:: none
FS_NAME=kube-cephfs
METADATA_POOL_NAME=kube-cephfs-metadata
DATA_POOL_NAME=kube-cephfs-data
# Ensure that the Ceph MDS is stopped
sudo rm -f /etc/pmon.d/ceph-mds.conf
sudo /etc/init.d/ceph stop mds
# Recover MDS state from filesystem
ceph fs new ${FS_NAME} ${METADATA_POOL_NAME} ${DATA_POOL_NAME} --force
# Try to recover from some common errors
sudo ceph fs reset ${FS_NAME} --yes-i-really-mean-it
cephfs-journal-tool --rank=${FS_NAME}:0 event recover_dentries summary
cephfs-journal-tool --rank=${FS_NAME}:0 journal reset
cephfs-table-tool ${FS_NAME}:0 reset session
cephfs-table-tool ${FS_NAME}:0 reset snap
cephfs-table-tool ${FS_NAME}:0 reset inode
sudo /etc/init.d/ceph start mds
*******************************************************************************
Error installing Rook Ceph on |AIO-DX| with host-fs-add before controllerfs-add
*******************************************************************************
When you provision controller-0 manually prior to unlock, the following sequence
of commands fails:
.. code-block:: none
~(keystone_admin)]$ system storage-backend-add ceph-rook --confirmed
~(keystone_admin)]$ system host-fs-add controller-0 ceph=20
~(keystone_admin)]$ system controllerfs-add ceph-float=20
The following error occurs when you run the :command:`controllerfs-add` command:
"Failed to create controller filesystem ceph-float: controllers have pending
LVG updates, please retry again later".
**Procedural Changes**: To avoid this issue, run the commands in the following
sequence:
.. code-block:: none
~(keystone_admin)]$ system storage-backend-add ceph-rook --confirmed
~(keystone_admin)]$ system controllerfs-add ceph-float=20
~(keystone_admin)]$ system host-fs-add controller-0 ceph=20
***********************************************************
Intermittent installation of Rook-Ceph on Distributed Cloud
***********************************************************
If the rook-ceph installation fails, it is due to
``ceph-mgr-provision`` not being provisioned correctly.
**Procedural Changes**: Remove the failed application using the
:command:`system application-remove rook-ceph --force` command, and then
re-initiate the rook-ceph installation.
********************************************************************
Authorization based on Local LDAP Groups is not supported for Harbor
********************************************************************
When using Local |LDAP| for authentication of the new Harbor system application,
you cannot use Local |LDAP| Groups for authorization; you can only use individual
Local |LDAP| users for authorization.
**Procedural Changes**: Use only individual Local LDAP users for specifying
authorization.
***************************************************
Vault application is not supported during Bootstrap
***************************************************
The Vault application cannot be configured during Bootstrap.
**Procedural Changes**:
The application must be configured after the platform nodes are unlocked /
enabled / available, a storage backend is configured, and ``platform-integ-apps``
is applied. If Vault is to be run in |HA| configuration (3 vault server pods)
then at least three controller / worker nodes must be unlocked / enabled / available.
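An illustrative sketch of applying the application post-bootstrap, assuming it
is listed as ``vault`` in :command:`system application-list` and that
``platform-integ-apps`` is already applied:

.. code-block:: none

   ~(keystone_admin)]$ system application-list
   ~(keystone_admin)]$ system application-apply vault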
******************************************
cert-manager cm-acme-http-solver pod fails
******************************************
On a multinode setup, when you deploy an acme issuer to issue a certificate,
the ``cm-acme-http-solver`` pod might fail and stay in the "ImagePullBackOff" state
due to the following defect: https://github.com/cert-manager/cert-manager/issues/5959.
**Procedural Changes**:
1. If you are using the namespace "test", create a docker-registry secret
"testkey" with local registry credentials in the "test" namespace.
.. code-block:: none
~(keystone_admin)]$ kubectl create secret docker-registry testkey --docker-server=registry.local:9001 --docker-username=admin --docker-password=Password*1234 -n test
2. Use the secret "testkey" in the issuer spec as follows:
.. code-block:: none
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
name: stepca-issuer
namespace: test
spec:
acme:
server: https://test.com:8080/acme/acme/directory
skipTLSVerify: true
email: test@test.com
privateKeySecretRef:
name: stepca-issuer
solvers:
- http01:
ingress:
podTemplate:
spec:
imagePullSecrets:
- name: testkey
class: nginx
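For example, assuming the issuer spec above is saved as ``stepca-issuer.yaml``
(the file name is illustrative), apply it and confirm that the solver pod
starts in the "test" namespace:

.. code-block:: none

   ~(keystone_admin)]$ kubectl apply -f stepca-issuer.yaml
   ~(keystone_admin)]$ kubectl get pods -n test | grep cm-acme-http-solver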
**************************************************************
ptp-notification application is not supported during bootstrap
**************************************************************
- Deployment of ``ptp-notification`` during bootstrap time is not supported due
to dependencies on the system |PTP| configuration which is handled
post-bootstrap.
**Procedural Changes**: N/A.
- The :command:`helm-chart-attribute-modify` command is not supported for
``ptp-notification`` because the application consists of a single chart.
Disabling the chart would render ``ptp-notification`` non-functional.
See :ref:`sysconf-application-commands-and-helm-overrides` for details on
this command.
**Procedural Changes**: N/A.
******************************************
Harbor cannot be deployed during bootstrap
******************************************
The Harbor application cannot be deployed during bootstrap due to the bootstrap
deployment dependencies such as early availability of storage class.
**Procedural Changes**: N/A.
********************
Kubevirt Limitations
********************
The following limitations apply to Kubevirt in |prod| 10.0:
- **Limitation**: Kubernetes does not provide CPU Manager detection.
**Procedural Changes**: Add the ``CPUManager`` feature gate to the KubeVirt custom resource:
.. code-block:: none
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
name: kubevirt
namespace: kubevirt
spec:
configuration:
developerConfiguration:
featureGates:
- LiveMigration
- Macvtap
- Snapshot
- CPUManager
Check the label using the following command:
.. code-block:: none
~(keystone_admin)]$ kubectl describe node | grep cpumanager
where ``cpumanager=true``
- **Limitation**: Huge pages do not show up under ``cat /proc/meminfo`` inside a
guest VM, although the resources are consumed on the host. For example,
if a VM uses 4 GB of huge pages, the host shows the same 4 GB of huge
pages used. The huge page memory is exposed as normal memory to the VM.
**Procedural Changes**: You need to configure Huge pages inside the guest
OS.
See :ref:`Installation Guides <index-install-e083ca818006>` for more details.
- **Limitation**: Virtual machines using Persistent Volume Claim (PVC) must
have a shared ReadWriteMany (RWX) access mode to be live migrated.
**Procedural Changes**: Ensure |PVC| is created with RWX.
.. code-block::
$ virtctl image-upload --pvc-name=cirros-vm-disk-test-2 --pvc-size=500Mi --storage-class=cephfs --access-mode=ReadWriteMany --image-path=/home/sysadmin/Kubevirt-GA-testing/latest-manifest/kubevirt-GA-testing/cirros-0.5.1-x86_64-disk.img --uploadproxy-url=https://10.111.54.246 --insecure
.. note::
- Live migration is not allowed with a pod network binding of bridge
interface type.
- Live migration requires ports 49152, 49153 to be available in the
virt-launcher pod. If these ports are explicitly specified in the
masquerade interface, live migration will not function.
- For live migration with |SRIOV| interface:
- specify networkData: in cloud-init, so that when the VM moves to another node
it does not lose its IP configuration
- specify the nameserver and internal |FQDNs| to connect to the cluster metadata
server, otherwise cloud-init will not work
- fix the MAC address, otherwise when the VM moves to another node the MAC
address will change and cause problems establishing the link
Example:
.. code-block:: none
cloudInitNoCloud:
networkData: |
ethernets:
sriov-net1:
addresses:
- 128.224.248.152/23
gateway: 128.224.248.1
match:
macAddress: "02:00:00:00:00:01"
nameservers:
addresses:
- 10.96.0.10
search:
- default.svc.cluster.local
- svc.cluster.local
- cluster.local
set-name: sriov-link-enabled
version: 2
- **Limitation**: Snapshot |CRDs| and controllers are not present by default
and need to be installed on |prod-long|.
**Procedural Changes**: To install snapshot |CRDs| and controllers on
Kubernetes, run the following commands:
- kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml
- kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
- kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
- kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
- kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml
Additionally, create ``VolumeSnapshotClass`` for Cephfs and RBD:
.. code-block:: none
cat <<EOF>cephfs-storageclass.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
name: csi-cephfsplugin-snapclass
driver: cephfs.csi.ceph.com
parameters:
clusterID: 60ee9439-6204-4b11-9b02-3f2c2f0a4344
csi.storage.k8s.io/snapshotter-secret-name: ceph-pool-kube-cephfs-data
csi.storage.k8s.io/snapshotter-secret-namespace: default
deletionPolicy: Delete
EOF
.. code-block:: none
cat <<EOF>rbd-storageclass.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
name: csi-rbdplugin-snapclass
driver: rbd.csi.ceph.com
parameters:
clusterID: 60ee9439-6204-4b11-9b02-3f2c2f0a4344
csi.storage.k8s.io/snapshotter-secret-name: ceph-pool-kube-rbd
csi.storage.k8s.io/snapshotter-secret-namespace: default
deletionPolicy: Delete
EOF
.. note::
Get the cluster ID from: ``kubectl describe sc cephfs, rbd``. A usage sketch
for creating a ``VolumeSnapshot`` with these classes is shown after this list.
- **Limitation**: Live migration is not possible when using configmap as a
filesystem. Currently, virtual machine instances (VMIs) cannot be live migrated as
``virtiofs`` does not support live migration.
**Procedural Changes**: N/A.
- **Limitation**: Live migration is not possible when a VM is using secret
exposed as a filesystem. Currently, virtual machine instances cannot be
live migrated since ``virtiofs`` does not support live migration.
**Procedural Changes**: N/A.
- **Limitation**: Live migration will not work when a VM is using
ServiceAccount exposed as a file system. Currently, VMIs cannot be live
migrated since ``virtiofs`` does not support live migration.
**Procedural Changes**: N/A.
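For the snapshot limitation above, once the snapshot |CRDs|, controller, and
``VolumeSnapshotClass`` resources are in place, a snapshot of an existing |PVC|
can be requested. The following is a sketch only; the |PVC| name ``rbd-pvc`` is
illustrative:

.. code-block:: none

   apiVersion: snapshot.storage.k8s.io/v1
   kind: VolumeSnapshot
   metadata:
     name: rbd-pvc-snapshot
   spec:
     volumeSnapshotClassName: csi-rbdplugin-snapclass
     source:
       persistentVolumeClaimName: rbd-pvc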
*************************************
synce4l CLI options are not supported
*************************************
SyncE configuration using ``synce4l`` is not supported in |prod|
10.0.
The service type of ``synce4l`` in the :command:`ptp-instance-add` command
is not supported in |prod-long| 10.0.
**Procedural Changes**: N/A.
***************************************************************************
Kubernetes Pod Core Dump Handler may fail due to a missing Kubernetes token
***************************************************************************
In certain cases the Kubernetes Pod Core Dump Handler may fail due to a missing
Kubernetes token, which disables per-pod coredump configuration and limits
namespace access. If application coredumps are not being generated, verify
whether the k8s-coredump token is empty in the configuration file
``/etc/k8s-coredump-conf.json`` using the following command:
.. code-block:: none
~(keystone_admin)]$ sudo cat /etc/k8s-coredump-conf.json
{
"k8s_coredump_token": ""
}
**Procedural Changes**: If the k8s-coredump token is empty in the configuration
file and the kube-apiserver is verified to be responsive, re-run the
``create-k8s-account.sh`` script to generate the appropriate token after a
successful connection to kube-apiserver, using the following commands:
.. code-block:: none
~(keystone_admin)]$ sudo chmod +x /etc/k8s-coredump/create-k8s-account.sh
~(keystone_admin)]$ sudo /etc/k8s-coredump/create-k8s-account.sh
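After re-running the script, you can verify that the token is now populated;
for example:

.. code-block:: none

   ~(keystone_admin)]$ sudo cat /etc/k8s-coredump-conf.json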
**Limitations from previous releases**
*************************************
Impact of Kubernetes Upgrade to v1.24
*************************************
In Kubernetes v1.24 support for the ``RemoveSelfLink`` feature gate was removed.
In previous releases of |prod| this has been set to "false" for backward
compatibility, but this is no longer an option and it is now hardcoded to "true".
**Procedural Changes**: Any application that relies on this feature gate being disabled
(i.e., assumes the existence of the "self link") must be updated before
upgrading to Kubernetes v1.24.
******************************************
Console Session Issues during Installation
******************************************
After bootstrap and before unlocking the controller, if the console session times
out (or the user logs out), ``systemd`` does not work properly. ``fm``,
``sysinv``, and ``mtcAgent`` do not initialize.
**Procedural Changes**: If the console times out or the user logs out between bootstrap
and unlock of controller-0, then, to recover from this issue, you must
re-install the ISO.
************************************************
PTP O-RAN Spec Compliant Timing API Notification
************************************************
- The ``v1 API`` only supports monitoring a single ptp4l + phc2sys instance.
**Procedural Changes**: Ensure the system is not configured with multiple instances
when using the v1 API.
- The O-RAN Cloud Notification specification defines a ``/././sync`` API v2 endpoint
intended to allow a client to subscribe to all notifications from a node. This
endpoint is not supported in StarlingX.
**Procedural Changes**: A specific subscription for each resource type must be
created instead.
- ``v1 / v2``
- v1: Support for monitoring a single ptp4l instance per host - no other
services can be queried/subscribed to.
- v2: The API conforms to O-RAN.WG6.O-Cloud Notification API-v02.01
with the following exceptions, which are not supported in StarlingX:
- O-RAN SyncE Lock-Status-Extended notifications
- O-RAN SyncE Clock Quality Change notifications
- O-RAN Custom cluster names
**Procedural Changes**: See the respective PTP-notification v1 and v2 document
subsections for further details.
v1: https://docs.starlingx.io/api-ref/ptp-notification-armada-app/api_ptp_notifications_definition_v1.html
v2: https://docs.starlingx.io/api-ref/ptp-notification-armada-app/api_ptp_notifications_definition_v2.html
**************************************************************************
Upper case characters in host names cause issues with kubernetes labelling
**************************************************************************
Upper case characters in host names cause issues with kubernetes labelling.
**Procedural Changes**: Host names should be lower case.
***********************
Installing a Debian ISO
***********************
The disks and disk partitions need to be wiped before the install.
Installing a Debian ISO may fail with a message that the system is
in emergency mode if the disks and disk partitions are not
completely wiped before the install, especially if the server was
previously running a CentOS ISO.
**Procedural Changes**: Before starting any Debian install, the disks must first
be completely wiped using the following commands for each disk (for example,
sda, sdb):
.. code-block:: none
sudo wipedisk
# Show the current partition table
sudo sgdisk -p /dev/sda
# Clear the partition table
sudo sgdisk -o /dev/sda
.. note::
The above commands must be run before any Debian install. The above
commands must also be run if the same lab is used for CentOS installs after
the lab was previously running a Debian ISO.
**********************************
Security Audit Logging for K8s API
**********************************
A custom policy file can only be created at bootstrap in ``apiserver_extra_volumes``.
If a custom policy file was configured at bootstrap, then after bootstrap the
user has the option to configure the parameter ``audit-policy-file`` to either
this custom policy file (``/etc/kubernetes/my-audit-policy-file.yml``) or the
default policy file ``/etc/kubernetes/default-audit-policy.yaml``. If no
custom policy file was configured at bootstrap, then the user can only
configure the parameter ``audit-policy-file`` to the default policy file.
Only the parameter ``audit-policy-file`` is configurable after bootstrap, so
the other parameters (``audit-log-path``, ``audit-log-maxsize``,
``audit-log-maxage`` and ``audit-log-maxbackup``) cannot be changed at
runtime.
**Procedural Changes**: NA
**See**: :ref:`kubernetes-operator-command-logging-663fce5d74e7`.
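As a sketch only (the service parameter section name and apply command shown
here are assumptions; follow the linked procedure for the authoritative steps),
the ``audit-policy-file`` parameter can be set at runtime as follows:

.. code-block:: none

   ~(keystone_admin)]$ system service-parameter-add kubernetes kube_apiserver audit-policy-file=/etc/kubernetes/my-audit-policy-file.yml
   ~(keystone_admin)]$ system service-parameter-apply kubernetes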
******************************************
PTP is not supported on Broadcom 57504 NIC
******************************************
|PTP| is not supported on the Broadcom 57504 NIC.
**Procedural Changes**: None. Do not configure |PTP| instances on the Broadcom 57504
NIC.
************************************************************************************************
Deploying an App using nginx controller fails with internal error after controller.name override
************************************************************************************************
A Helm override of ``controller.name`` for the nginx-ingress-controller app may
result in errors when creating ingress resources later on.
Example of Helm override:
.. code-block:: none
cat <<EOF> values.yml
controller:
name: notcontroller
EOF
~(keystone_admin)$ system helm-override-update nginx-ingress-controller ingress-nginx kube-system --values values.yml
+----------------+-----------------------+
| Property | Value |
+----------------+-----------------------+
| name | ingress-nginx |
| namespace | kube-system |
| user_overrides | controller: |
| | name: notcontroller |
| | |
+----------------+-----------------------+
~(keystone_admin)$ system application-apply nginx-ingress-controller
**Procedural Changes**: NA
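While no supported procedural change is documented, one possible recovery
sketch (assuming the default chart and namespace shown above) is to delete the
user override and re-apply the application so that the default
``controller.name`` is restored:

.. code-block:: none

   ~(keystone_admin)$ system helm-override-delete nginx-ingress-controller ingress-nginx kube-system
   ~(keystone_admin)$ system application-apply nginx-ingress-controller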
****************************************
Optimization with a Large number of OSDs
****************************************
Because Storage nodes are not optimized by default, you may need to tune your
Ceph configuration for balanced operation across deployments with a high number
of |OSDs|. Otherwise, the following alarm may be generated even if the
installation succeeds:
800.001 - Storage Alarm Condition: HEALTH_WARN. Please check 'ceph -s'
**Procedural Changes**: To optimize your storage nodes with a large number of |OSDs|, it
is recommended to use the following commands:
.. code-block:: none
~(keystone_admin)]$ ceph osd pool set kube-rbd pg_num 256
~(keystone_admin)]$ ceph osd pool set kube-rbd pgp_num 256
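You can then verify the new placement group settings and the overall cluster
health, for example:

.. code-block:: none

   ~(keystone_admin)]$ ceph osd pool get kube-rbd pg_num
   ~(keystone_admin)]$ ceph osd pool get kube-rbd pgp_num
   ~(keystone_admin)]$ ceph -s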
***************
BPF is disabled
***************
|BPF| cannot be used in the PREEMPT_RT/low latency kernel, due to the inherent
incompatibility between PREEMPT_RT and |BPF|, see, https://lwn.net/Articles/802884/.
The following packages, among others, might be affected when PREEMPT_RT and
|BPF| are used together:
- libpcap
- libnet
- dnsmasq
- qemu
- nmap-ncat
- libv4l
- elfutils
- iptables
- tcpdump
- iproute
- gdb
- valgrind
- kubernetes
- cni
- strace
- mariadb
- libvirt
- dpdk
- libteam
- libseccomp
- binutils
- libbpf
- dhcp
- lldpd
- containernetworking-plugins
- golang
- i40e
- ice
**Procedural Changes**: It is recommended not to use |BPF| with the real-time kernel.
If required, it can still be used, for example, for debugging only.
***********************
Control Group parameter
***********************
The control group (cgroup) parameter **kmem.limit_in_bytes** has been
deprecated, and results in the following message in the kernel's log buffer
(dmesg) during boot-up and/or during the Ansible bootstrap procedure:
"kmem.limit_in_bytes is deprecated and will be removed. Please report your
use case to linux-mm@kvack.org if you depend on this functionality." This
parameter is used by a number of software packages in |prod-long|, including,
but not limited to, **systemd, docker, containerd, libvirt** etc.
**Procedural Changes**: NA. This is only a warning message about the future deprecation
of an interface.
.. Chris F please confirm if this is applicable?
****************************************************
Kubernetes Taint on Controllers for Standard Systems
****************************************************
In Standard systems, a Kubernetes taint is applied to controller nodes in order
to prevent application pods from being scheduled on those nodes, since
controllers in Standard systems are intended ONLY for platform services.
If application pods MUST run on controllers, a Kubernetes toleration of the
taint can be specified in the application's pod specifications.
**Procedural Changes**: Customer applications that need to run on controllers on
Standard systems will need to be enabled/configured for Kubernetes toleration
in order to ensure the applications continue working after an upgrade from
|prod| 6.0 to future |prod-long| releases. It is suggested to add
the Kubernetes toleration to your application prior to upgrading to |prod|
8.0.
You can specify toleration for a pod through the pod specification (PodSpec).
For example:
.. code-block:: none
spec:
....
template:
....
spec:
tolerations:
- key: "node-role.kubernetes.io/master"
operator: "Exists"
effect: "NoSchedule"
- key: "node-role.kubernetes.io/control-plane"
operator: "Exists"
effect: "NoSchedule"
**See**: `Taints and Tolerations <https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/>`__.
***************************************************************
Storage Nodes are not considered part of the Kubernetes cluster
***************************************************************
When running the :command:`system kube-host-upgrade-list` command, the output
displays only controller and worker hosts that have control-plane and kubelet
components. Storage nodes do not have any of these components and so are not
considered part of the Kubernetes cluster.
**Procedural Changes**: Do not include Storage nodes as part of the Kubernetes upgrade.
**************************************
Application Pods with SRIOV Interfaces
**************************************
Application Pods with |SRIOV| Interfaces require a **restart-on-reboot: "true"**
label in their pod spec template.
Pods with |SRIOV| interfaces may fail to start after a platform restore or
Simplex upgrade and persist in the **Container Creating** state due to missing
PCI address information in the CNI configuration.
**Procedural Changes**: Application pods that require |SRIOV| should add the label
**restart-on-reboot: "true"** to their pod spec template metadata. All pods with
this label will be deleted and recreated after system initialization, therefore
all pods must be restartable and managed by a Kubernetes controller
\(i.e. DaemonSet, Deployment or StatefulSet) for auto recovery.
Pod Spec template example:
.. code-block:: none
template:
metadata:
labels:
tier: node
app: sriovdp
restart-on-reboot: "true"
**************************************
Storage Nodes Recovery on Power Outage
**************************************
Storage nodes take 10-15 minutes longer to recover in the event of a full
power outage.
**Procedural Changes**: NA
*********************************
Ceph Recovery on an AIO-DX System
*********************************
In certain instances Ceph may not recover on an |AIO-DX| system and remains
in the down state when viewed using the
:command:`ceph -s` command; for example, if an |OSD| comes up after a controller
reboot and a swact occurs, or due to other possible causes such as hardware
failure of the disk or the entire host, a power outage, or a switch going down.
**Procedural Changes**: There is no specific command or procedure that solves
the problem for all possible causes. Each case needs to be analyzed individually
to find the root cause of the problem and the solution. It is recommended to
contact Customer Support at,
`http://www.windriver.com/support <http://www.windriver.com/support>`__.
*******************************************************************
Cert-manager does not work with uppercase letters in IPv6 addresses
*******************************************************************
Cert-manager does not work with uppercase letters in IPv6 addresses.
**Procedural Changes**: Replace the uppercase letters in IPv6 addresses with lowercase
letters.
.. code-block:: none
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: oidc-auth-apps-certificate
namespace: test
spec:
secretName: oidc-auth-apps-certificate
dnsNames:
- ahost.com
ipAddresses:
- fe80::903a:1c1a:e802:11e4
issuerRef:
name: cloudplatform-interca-issuer
kind: Issuer
*******************************
Kubernetes Root CA Certificates
*******************************
Kubernetes does not properly support **k8s_root_ca_cert** and **k8s_root_ca_key**
being an Intermediate CA.
**Procedural Changes**: Accept internally generated **k8s_root_ca_cert/key** or
customize only with a Root CA certificate and key.
************************
Windows Active Directory
************************
.. _general-limitations-and-workarounds-ul-x3q-j3x-dmb:
- **Limitation**: The Kubernetes API does not support uppercase IPv6 addresses.
**Procedural Changes**: The issuer_url IPv6 address must be specified as
lowercase.
- **Limitation**: The refresh token does not work.
**Procedural Changes**: If the token expires, manually replace the ID token. For
more information, see, :ref:`Configure Kubernetes Client Access
<configure-kubernetes-client-access>`.
- **Limitation**: TLS error logs are reported in the **oidc-dex** container
on subclouds. These logs should not have any system impact.
**Procedural Changes**: NA
.. Stx LP Bug: https://bugs.launchpad.net/starlingx/+bug/1846418 Won't fix.
.. To be addressed in a future update.
************
BMC Password
************
The |BMC| password cannot be updated.
**Procedural Changes**: In order to update the |BMC| password, de-provision the |BMC|,
and then re-provision it again with the new password.
****************************************
Application Fails After Host Lock/Unlock
****************************************
In some situations, application may fail to apply after host lock/unlock due to
previously evicted pods.
**Procedural Changes**: Use the :command:`kubectl delete` command to delete the evicted
pods and reapply the application.
***************************************
Application Apply Failure if Host Reset
***************************************
If an application apply is in progress and a host is reset, the apply will
likely fail. A re-apply may be required once the host recovers and the system
is stable.
**Procedural Changes**: Once the host recovers and the system is stable,
re-apply the application.
*************************
Platform CPU Usage Alarms
*************************
Alarms may occur indicating platform cpu usage is greater than 90% if a large
number of pods are configured using liveness probes that run every second.
**Procedural Changes**: To mitigate either reduce the frequency for the liveness
probes or increase the number of platform cores.
*******************
Pods Using isolcpus
*******************
The isolcpus feature currently does not support allocation of thread siblings
for CPU requests (i.e., a physical thread + its HT sibling).
**Procedural Changes**: For optimal results, if hyperthreading is enabled then
isolcpus should be allocated in multiples of two in order to ensure that both
|SMT| siblings are allocated to the same container.
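An illustrative pod spec sketch that requests isolated CPUs in multiples of
two; the extended resource name ``windriver.com/isolcpus`` and the image are
assumptions, so confirm the resource name advertised on your nodes with
:command:`kubectl describe node`:

.. code-block:: none

   apiVersion: v1
   kind: Pod
   metadata:
     name: isolcpus-example
   spec:
     containers:
     - name: app
       image: registry.local:9001/mycompany/app:1.0   # illustrative image
       resources:
         requests:
           windriver.com/isolcpus: 2
         limits:
           windriver.com/isolcpus: 2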
***********************************************************
Restrictions on the Size of Persistent Volume Claims (PVCs)
***********************************************************
There is a limitation on the size of Persistent Volume Claims (PVCs) that can
be used for all |prod| Releases.
**Procedural Changes**: It is recommended that all |PVCs| should be a minimum size of
1GB. For more information, see,
https://bugs.launchpad.net/starlingx/+bug/1814595.
***************************************************************
Sub-Numa Cluster Configuration not Supported on Skylake Servers
***************************************************************
Sub-Numa cluster configuration is not supported on Skylake servers.
**Procedural Changes**: For servers with Skylake Gold or Platinum CPUs, Sub-|NUMA|
clustering must be disabled in the BIOS.
*****************************************************************
The ptp-notification-demo App is Not a System-Managed Application
*****************************************************************
The ptp-notification-demo app is provided for demonstration purposes only.
Therefore, it is not supported on typical platform operations such as Upgrades
and Backup and Restore.
**Procedural Changes**: NA
*************************************************************************
Deleting image tags in registry.local may delete tags under the same name
*************************************************************************
When deleting image tags in the registry.local docker registry, you should be
aware that the deletion of an **<image-name:tag-name>** will delete all tags
under the specified <image-name> that have the same 'digest' as the specified
<image-name:tag-name>. For more information, see, :ref:`Delete Image Tags in
the Docker Registry <delete-image-tags-in-the-docker-registry-8e2e91d42294>`.
**Procedural Changes**: NA
****************************************************************************
Unable to create Kubernetes Upgrade Strategy for Subclouds using Horizon GUI
****************************************************************************
When creating a Kubernetes Upgrade Strategy for a
subcloud using the Horizon GUI, it fails and displays the following error:
.. code-block:: none
kube upgrade pre-check: Invalid kube version(s), left: (v1.24.4), right:
(1.24.4)
**Procedural Changes**: Use the following steps to create the strategy:
.. rubric:: |proc|
#. Create a strategy for subcloud Kubernetes upgrade using the
:command:`dcmanager kube-upgrade-strategy create --to-version <version>` command.
#. Apply the strategy using the Horizon GUI or the CLI using the command
:command:`dcmanager kube-upgrade-strategy apply`.
**See**: :ref:`apply-a-kubernetes-upgrade-strategy-using-horizon-2bb24c72e947`
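A representative command sequence (the version value is illustrative):

.. code-block:: none

   ~(keystone_admin)]$ dcmanager kube-upgrade-strategy create --to-version v1.24.4
   ~(keystone_admin)]$ dcmanager kube-upgrade-strategy apply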
**********************************************
Power Metrics Application in Real Time Kernels
**********************************************
When executing the Power Metrics application on real-time
kernels, the overall scheduling latency may increase due to inter-core
interruptions caused by reading the MSRs (Model-Specific Registers).
Under intensive workloads the kernel may not be able to handle the MSR-read
interruptions, which stalls data collection because it is not scheduled on the
affected core.
**Procedural Changes**: N/A.
***********************************************
k8s-coredump only supports lowercase annotation
***********************************************
Creating a K8s pod core dump fails when the ``starlingx.io/core_pattern``
parameter is set with upper case characters in the pod manifest. This results
in the pod being unable to find the target directory and failing to create the
coredump file.
**Procedural Changes**: The ``starlingx.io/core_pattern`` parameter only accepts
lower case characters for the path and file name where the core dump is saved.
**See**: :ref:`kubernetes-pod-coredump-handler-54d27a0fd2ec`.
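A minimal sketch of the annotation in a pod manifest; the path, file pattern,
and image are illustrative, and the annotation value must use only lower case
characters:

.. code-block:: none

   apiVersion: v1
   kind: Pod
   metadata:
     name: coredump-example
     annotations:
       starlingx.io/core_pattern: "/var/log/coredump/core.%e.%p"
   spec:
     containers:
     - name: app
       image: registry.local:9001/mycompany/app:1.0   # illustrative image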
***********************
NetApp Permission Error
***********************
When installing/upgrading to Trident 20.07.1 and later, and Kubernetes version
1.17 or higher, new volumes created will not be writable if:
- The storageClass does not specify ``parameter.fsType``
- The pod using the requested |PVC| has an ``fsGroup`` enforced as part of a
Security constraint
**Procedural Changes**: Specify ``parameter.fsType`` in the ``localhost.yml`` file under
``netapp_k8s_storageclasses`` parameters as below.
The following example shows a minimal configuration in ``localhost.yml``:
.. code-block::
ansible_become_pass: xx43U~a96DN*m.?
trident_setup_dir: /tmp/trident
netapp_k8s_storageclasses:
- metadata:
name: netapp-nas-backend
provisioner: netapp.io/trident
parameters:
backendType: "ontap-nas"
fsType: "nfs"
netapp_k8s_snapshotstorageclasses:
- metadata:
name: csi-snapclass
**See**: :ref:`Configure an External NetApp Deployment as the Storage Backend <configure-an-external-netapp-deployment-as-the-storage-backend>`
********************************
Huge Page Limitation on Postgres
********************************
The Debian postgres version supports huge pages and, by default, uses one huge
page if it is available on the system, decreasing the number of available huge
pages by one.
**Procedural Changes**: The huge page setting must be disabled by setting
``/etc/postgresql/postgresql.conf: "huge_pages = off"``. The postgres service
needs to be restarted using the Service Manager :command:`sudo sm-restart service postgres`
command.
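A sketch of the change; the :command:`sed` edit is just one way to set the
option, and any editor can be used:

.. code-block:: none

   sudo sed -i 's/^#\?huge_pages.*/huge_pages = off/' /etc/postgresql/postgresql.conf
   sudo sm-restart service postgres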
.. Warning::
This procedural change is not persistent; if the host is rebooted,
it will need to be applied again. This will be fixed in a future release.
************************************************
Password Expiry does not work on LDAP user login
************************************************
On Debian, the warning message is not displayed for Active Directory users
when a user logs in and the password is nearing expiry. Similarly, when a
user's password has already expired, the password change prompt is not
displayed on login.
**Procedural Changes**: It is recommended that users rely on Directory administration
tools for "Windows Active Directory" servers to handle password updates,
reminders and expiration. It is also recommended that passwords should be
updated every 3 months.
.. note::
The expired password can be reset via Active Directory by IT administrators.
***************************************
Silicom TimeSync (STS) card limitations
***************************************
* Silicom and Intel based Time Sync NICs may not be deployed on the same system
due to conflicting time sync services and operations.
|PTP| configuration for Silicom TimeSync (STS) cards is handled separately
from |prod| host |PTP| configuration and may result in configuration
conflicts if both are used at the same time.
The sts-silicom application provides a dedicated ``phc2sys`` instance which
synchronizes the local system clock to the Silicom TimeSync (STS) card. Users
should ensure that ``phc2sys`` is not configured via |prod| |PTP| Host
Configuration when the sts-silicom application is in use.
Additionally, if |prod| |PTP| Host Configuration is being used in parallel
for non-STS NICs, users should ensure that all ``ptp4l`` instances do not use
conflicting ``domainNumber`` values.
* When the Silicom TimeSync (STS) card is configured in timing mode using the
sts-silicom application, the card goes through an initialization process on
application apply and server reboots. The ports will bounce up and down
several times during the initialization process, causing network traffic
disruption. Therefore, configuring the platform networks on the Silicom
TimeSync (STS) card is not supported since it will cause platform
instability.
**Procedural Changes**: N/A.
***********************************
N3000 Image in the containerd cache
***********************************
The |prod-long| system without an N3000 image in the containerd cache fails to
configure during a reboot cycle, and results in a failed / disabled node.
The N3000 device requires a reset early in the startup sequence. The reset is
done by the n3000-opae image. The image is automatically downloaded on bootstrap
and is expected to be in the cache to allow the reset to succeed. If the image
is not in the cache for any reason, the image cannot be downloaded as
``registry.local`` is not up yet at this point in the startup. This will result
in the impacted host going through multiple reboot cycles and coming up in an
enabled/degraded state. To avoid this issue:
1. Ensure that the docker filesystem is properly engineered to avoid the image
being automatically removed by the system if flagged as unused.
For instructions to resize the filesystem, see
:ref:`Increase Controller Filesystem Storage Allotments Using the CLI <increase-controller-filesystem-storage-allotments-using-the-cli>`
2. Do not manually prune the N3000 image.
**Procedural Changes**: Use the procedure below.
.. rubric:: |proc|
#. Lock the node.
.. code-block:: none
~(keystone_admin)]$ system host-lock controller-0
#. Pull the (N3000) required image into the ``containerd`` cache.
.. code-block:: none
~(keystone_admin)]$ crictl pull registry.local:9001/docker.io/starlingx/n3000-opae:stx.8.0-v1.0.2
#. Unlock the node.
.. code-block:: none
~(keystone_admin)]$ system host-unlock controller-0
.. Henrique please confirm if this is applicable in 10.0??
*****************
Quartzville Tools
*****************
The :command:`celo64e` and :command:`nvmupdate64e` commands are not
supported in StarlingX due to a known issue in Quartzville tools that crashes
the host.
**Procedural Changes**: Reboot the host using the boot screen menu.
*******************************************************************************************************
``ptp4l`` error "timed out while polling for tx timestamp" reported for NICs using the Intel ice driver
*******************************************************************************************************
NICs using the Intel® ice driver may report the following error in the ``ptp4l``
logs, which results in a |PTP| port switching to ``FAULTY`` before
re-initializing.
.. note::
|PTP| ports frequently switching to ``FAULTY`` may degrade the accuracy of
the |PTP| timing.
.. code-block:: none
ptp4l[80330.489]: timed out while polling for tx timestamp
ptp4l[80330.489]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
.. note::
This is due to a limitation with the Intel® ice driver as the driver cannot
guarantee the time interval to return the timestamp to the ``ptp4l`` user
space process which results in the occasional timeout error message.
**Procedural Changes**: Intel recommends increasing the
``tx_timestamp_timeout`` parameter in the ``ptp4l`` config. The increased
timeout value gives the ice driver more time to provide the timestamp to
the ``ptp4l`` user space process. Timeout values of 50 ms and 700 ms have been
validated. However, you can use a different value if it is more suitable
for your system.
.. code-block:: none
~(keystone_admin)]$ system ptp-instance-parameter-add <instance_name> tx_timestamp_timeout=700
~(keystone_admin)]$ system ptp-instance-apply
.. note::
The ``ptp4l`` timeout error log may also be caused by other underlying
issues, such as NIC port instability. Therefore, it is recommended to
confirm the NIC port is stable before adjusting the timeout values.
***************************************************
Cert-manager accepts only short hand IPv6 addresses
***************************************************
Cert-manager accepts only short hand IPv6 addresses.
**Procedural Changes**: You must use the following rules when defining IPv6 addresses
to be used by Cert-manager.
- all letters must be in lower case
- each group of hexadecimal values must not have any leading 0s
(use :12: instead of :0012:)
- the longest sequence of consecutive all-zero fields must be short handed
with ``::``
- ``::`` must not be used to short hand an IPv6 address with 7 groups of hexadecimal
values, use :0: instead of ``::``
.. note::
Use the rules above to set the IPv6 address related to the management
and |OAM| network in the Ansible bootstrap overrides file, localhost.yml.
.. code-block:: none
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: oidc-auth-apps-certificate
namespace: test
spec:
secretName: oidc-auth-apps-certificate
dnsNames:
- ahost.com
ipAddresses:
- fe80:12:903a:1c1a:e802::11e4
issuerRef:
name: cloudplatform-interca-issuer
kind: Issuer
.. Stx LP Bug: https://bugs.launchpad.net/starlingx/+bug/1846418 Won't fix.
.. To be addressed in a future update.
.. All please confirm if all these have been removed from the StarlingX 10.0?
------------------
Deprecated Notices
------------------
***************
Bare metal Ceph
***************
Host-based Ceph will be deprecated in a future release. Adoption
of Rook-Ceph is recommended for new deployments as some host-based Ceph
deployments may not be upgradable.
*********************************************************
No support for system_platform_certificate.subject_prefix
*********************************************************
|prod| 10.0 no longer supports ``system_platform_certificate.subject_prefix``.
This was an optional field used to add a prefix to further identify the
certificate, for example, |prod|.
***************************************************
Static Configuration for Hardware Accelerator Cards
***************************************************
Static configuration for hardware accelerator cards is deprecated in
|prod| 10.0 and will be discontinued in future releases.
Use |FEC| operator instead.
**See** :ref:`Switch between Static Method Hardware Accelerator and SR-IOV FEC Operator <switch-between-static-method-hardware-accelerator-and-srminusi-5f893343ee15>`
****************************************
N3000 FPGA Firmware Update Orchestration
****************************************
The N3000 |FPGA| Firmware Update Orchestration has been deprecated in |prod|
10.0. For more information, see :ref:`n3000-overview`.
********************
show-certs.sh Script
********************
The ``show-certs.sh`` script that is available when you ssh to a controller is
deprecated in |prod| 10.0.
The new response format of the :command:`system certificate-list` REST API / CLI
now provides the same information as provided by ``show-certs.sh``.
***************
Kubernetes APIs
***************
Kubernetes APIs that will be removed in K8s 1.25 are listed below:
**See**: https://kubernetes.io/docs/reference/using-api/deprecation-guide/#v1-25
***********************
ptp-notification v1 API
***********************
The ptp-notification v1 API can still be used in |prod| 10.0.
The v1 API will be removed in a future release and only the O-RAN Compliant
Notification API (ptp-notification v2 API) will be supported.
.. note::
It is recommended that all new deployments use the O-RAN Compliant
Notification API (ptp-notification v2 API).
-------------------
Removed in Stx 10.0
-------------------
``kube-ignore-isol-cpus`` is no longer supported in |prod| 10.0.
*******************
Pod Security Policy
*******************
Pod Security Policy (PSP) is removed in |prod| 10.0 and
K8s v1.25 and ONLY applies if running on K8s v1.24 or earlier. Instead of
using Pod Security Policy, you can enforce similar restrictions on Pods
using Pod Security Admission Controller (PSAC) supporting K8s v1.25.
.. note::
Although |prod| 10.0 still supports K8s v1.24 which supports
|PSP|, |prod| 10.0 has removed the |prod| default |PSP| policies,
roles and role-bindings that made |PSP| usable in |prod|. It is important
to note that |prod| 10.0 is officially NOT supporting the use
of |PSP| in its Kubernetes deployment.
.. important::
Upgrades
- |PSP| usage should be removed from hosted applications and converted to the
|PSA| Controller before the upgrade to |prod| 10.0.
.. - On 'upgrade activate or complete' of the upgrade to |prod|
.. 10.0, ALL |PSP| policies and all previously auto-generated ClusterRoles
.. and ClusterRoleBindings associated with |PSP| policies will be removed.
- Using the :command:`system application-update` command for Platform
applications will remove the use of roles or rolebindings dealing with
|PSP| policies.
- |PSA| Controller mechanisms should be configured to enforce the constraints that
the previous PSP policies were enforcing.
**See**: :ref:`Pod Security Admission Controller <pod-security-admission-controller-8e9e6994100f>`
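For example, |PSA| restrictions comparable to a restrictive |PSP| policy can be
enforced per namespace with the standard Kubernetes Pod Security labels (the
namespace name is illustrative):

.. code-block:: none

   ~(keystone_admin)]$ kubectl label namespace my-app pod-security.kubernetes.io/enforce=restricted
   ~(keystone_admin)]$ kubectl label namespace my-app pod-security.kubernetes.io/warn=restricted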
*******************************
System certificate CLI Commands
*******************************
The following commands are removed in |prod| 10.0 and replaced
by:
- ``system certificate-install -m ssl <pemfile>``
has been replaced by an automatically installed 'system-restapi-gui-certificate'
CERTIFICATE (in the 'deployment' namespace) which can be modified using the
'update_platform_certificates' Ansible playbook
- ``system certificate-install -m openstack <pemfile>``
has been replaced by 'system os-certificate-install <pemfile>'
- ``system certificate-install -m docker_registry <pemfile>``
has been replaced by an automatically installed 'system-registry-local-certificate'
CERTIFICATE (in the 'deployment' namespace) which can be modified using the
'update_platform_certificates' Ansible playbook
- ``system certificate-install -m ssl_ca <pemfile>`` and
``system certificate-uninstall -m ssl_ca <pemfile>``
have been replaced by:
- ``'system ca-certificate-install <pemfile>'``
- ``'system ca-certificate-uninstall <uuid>'``
.. _appendix-commands-replaced-by-usm-for-updates-and-upgrades-835629a1f5b8:
------------------------------------------------------------------------
Appendix A - Commands replaced by USM for Updates (Patches) and Upgrades
------------------------------------------------------------------------
.. toctree::
:maxdepth: 1
**********************************
Manually Managing Software Patches
**********************************
The ``sudo sw-patch`` commands for manually managing software patches have
been replaced by ``software`` commands as listed below:
The following commands for manually managing software patches are no longer
supported:
- sw-patch upload <patch file>
- sw-patch upload-dir <directory with patch files>
- sw-patch query
- sw-patch show <patch-id>
- sw-patch apply <patch-id>
- sw-patch query-hosts
- sw-patch host-install <hostname>
- sw-patch host-install-async <hostname>
- sw-patch remove <patch-id>
- sw-patch delete <patch-id>
- sw-patch what-requires <patch-id>
- sw-patch query-dependencies <patch-id>
- sw-patch is-applied <patch-id>
- sw-patch is-available <patch-id>
- sw-patch install-local
- sw-patch drop-host <hostname-or-id>
- sw-patch commit <patch-id>
Software patching is now manually managed by the ``software`` commands
described in the :ref:`Manual Host Software Deployment <manual-removal-host-software-deployment-24f47e80e518>`
procedure; a representative command sequence is sketched after the list below.
- software upload <patch file>
- software upload-dir <directory with patch files>
- software list
- software delete <patch-release-id>
- software show <patch-release-id>
- software deploy precheck <patch-release-id>
- software deploy start <patch-release-id>
- software deploy show
- software deploy host <hostname>
- software deploy host-rollback <hostname>
- software deploy localhost
- software deploy host-list
- software deploy activate
- software deploy complete
- software deploy delete
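A representative sequence for deploying a single patch release on an |AIO-SX|
host; the release ID and file name are illustrative, and host lock/unlock
steps may be required depending on the patch:

.. code-block:: none

   ~(keystone_admin)]$ software upload starlingx-10.0.1.patch
   ~(keystone_admin)]$ software deploy precheck starlingx-10.0.1
   ~(keystone_admin)]$ software deploy start starlingx-10.0.1
   ~(keystone_admin)]$ software deploy host controller-0
   ~(keystone_admin)]$ software deploy activate
   ~(keystone_admin)]$ software deploy complete
   ~(keystone_admin)]$ software deploy delete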
************************
Manual Software Upgrades
************************
The ``system load-delete/import/list/show``,
``system upgrade-start/show/activate/abort/abort-complete/complete`` and
``system host-upgrade/upgrade-list/downgrade`` commands for manually managing
software upgrades have been replaced by ``software`` commands.
The following commands for manually managing software upgrades are no longer
supported:
- system load-import <major-release-ISO> <ISO-signature-file>
- system load-list
- system load-show <load-id>
- system load-delete <load-id>
- system upgrade-start
- system upgrade-show
- system host-upgrade <hostname>
- system host-upgrade-list
- system upgrade-activate
- system upgrade-complete
- system upgrade-abort
- system host-downgrade <hostname>
- system upgrade-abort-complete
Software upgrade is now manually managed by the ``software`` commands described
in the :ref:`manual-host-software-deployment-ee17ec6f71a4`
procedure.
- software upload <patch file>
- software upload-dir <directory with patch files>
- software list
- software delete <patch-release-id>
- software show <patch-release-id>
- software deploy precheck <patch-release-id>
- software deploy start <patch-release-id>
- software deploy show
- software deploy host <hostname>
- software deploy localhost
- software deploy host-list
- software deploy activate
- software deploy complete
- software deploy delete
- software deploy abort
- software deploy host-rollback <hostname>
- software deploy activate-rollback
*********************************
Orchestration of Software Patches
*********************************
The ``sw-manager patch-strategy-create/apply/show/abort/delete`` commands for
managing the orchestration of software patches have been replaced by
``sw-manager sw-deploy-strategy-create/apply/show/abort/delete`` commands.
The following commands for managing the orchestration of software patches are
no longer supported:
- sw-manager patch-strategy create ... <options> ...
- sw-manager patch-strategy show
- sw-manager patch-strategy apply
- sw-manager patch-strategy abort
- sw-manager patch-strategy delete
Orchestrated software patching is now managed by the
``sw-manager sw-deploy-strategy-create/apply/show/abort/delete`` commands
described in the :ref:`orchestrated-deployment-host-software-deployment-d234754c7d20`
procedure.
- sw-manager sw-deploy-strategy create <patch-release-id> ... <options> ...
- sw-manager sw-deploy-strategy show
- sw-manager sw-deploy-strategy apply
- sw-manager sw-deploy-strategy abort
- sw-manager sw-deploy-strategy delete
**********************************
Orchestration of Software Upgrades
**********************************
The ``sw-manager upgrade-strategy-create/apply/show/abort/delete`` commands for
managing the orchestration of software upgrades have been replaced by
``sw-manager sw-deploy-strategy-create/apply/show/abort/delete`` commands.
The following commands for managing the orchestration of software upgrades are
no longer supported:
- sw-manager upgrade-strategy create ... <options> ...
- sw-manager upgrade-strategy show
- sw-manager upgrade-strategy apply
- sw-manager upgrade-strategy abort
- sw-manager upgrade-strategy delete
Orchestrated software upgrade is now managed by the
``sw-manager sw-deploy-strategy-create/apply/show/abort/delete`` commands
described in the :ref:`orchestrated-deployment-host-software-deployment-d234754c7d20`
procedure.
- sw-manager sw-deploy-strategy create <major-release-id> ... <options> ...
- sw-manager sw-deploy-strategy show
- sw-manager sw-deploy-strategy apply
- sw-manager sw-deploy-strategy abort
- sw-manager sw-deploy-strategy delete
--------------------------------------
Release Information for other versions
--------------------------------------
You can find details about a release on the specific release page.
.. To change the 9.0 link
.. list-table::
* - Version
- Release Date
- Notes
- Status
* - StarlingX R10.0
- 2025-02
- https://docs.starlingx.io/r/stx.10.0/releasenotes/index.html
- Maintained
* - StarlingX R9.0
- 2024-03
- https://docs.starlingx.io/r/stx.9.0/releasenotes/index.html
- Maintained
* - StarlingX R8.0
- 2023-02
- https://docs.starlingx.io/r/stx.8.0/releasenotes/index.html
- Maintained
* - StarlingX R7.0
- 2022-07
- https://docs.starlingx.io/r/stx.7.0/releasenotes/index.html
- Maintained
* - StarlingX R6.0
- 2021-12
- https://docs.starlingx.io/r/stx.6.0/releasenotes/index.html
- Maintained
* - StarlingX R5.0.1
- 2021-09
- https://docs.starlingx.io/r/stx.5.0/releasenotes/index.html
- :abbr:`EOL (End of Life)`
* - StarlingX R5.0
- 2021-05
- https://docs.starlingx.io/r/stx.5.0/releasenotes/index.html
- :abbr:`EOL (End of Life)`
* - StarlingX R4.0
- 2020-08
-
- :abbr:`EOL (End of Life)`
* - StarlingX R3.0
- 2019-12
-
- :abbr:`EOL (End of Life)`
* - StarlingX R2.0.1
- 2019-10
-
- :abbr:`EOL (End of Life)`
* - StarlingX R2.0
- 2019-09
-
- :abbr:`EOL (End of Life)`
* - StarlingX R1.0
- 2018-10
-
- :abbr:`EOL (End of Life)`
StarlingX follows the release maintenance timelines in the `StarlingX Release
Plan <https://wiki.openstack.org/wiki/StarlingX/Release_Plan#Release_Maintenance>`_.
The Status column uses `OpenStack maintenance phase <https://docs.openstack.org/
project-team-guide/stable-branches.html#maintenance-phases>`_ definitions.