This commit introduces an in-memory, dictionary-based token caching
mechanism to reduce the number of token requests made to subclouds'
identity APIs.

The caching is implemented by subclassing the v3.Password
authentication class, which normally issues the HTTP token requests to
the identity API. The subclass first checks whether a valid, non-expired
token is already cached and, if so, returns it. Otherwise, it performs
the actual request and caches the new token for future use.
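
The check-then-fetch logic described above can be sketched without keystoneauth1. This is a minimal, library-free illustration, not the change's implementation: the real code subclasses v3.Password, whereas here the identity request is a hypothetical `fetch` callable returning a token and its expiry time. The `TokenCache` name, the cache key, and the 60-second early-expiry `margin` are all illustrative assumptions.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class TokenCache:
    """In-memory, dict-based token cache (hypothetical sketch).

    fetch(key) stands in for the real identity request and must return
    (token, expires_at) with expires_at in epoch seconds.
    """

    def __init__(
        self,
        fetch: Callable[[Hashable], Tuple[str, float]],
        clock: Callable[[], float] = time.time,
        margin: float = 60.0,
    ):
        self._fetch = fetch
        self._clock = clock
        # Treat tokens as expired `margin` seconds early so a token is not
        # sent just as it expires.
        self._margin = margin
        self._tokens: Dict[Hashable, Tuple[str, float]] = {}

    def get_token(self, key: Hashable) -> str:
        cached = self._tokens.get(key)
        if cached is not None and cached[1] - self._margin > self._clock():
            return cached[0]  # hit: no request to the identity API
        token, expires_at = self._fetch(key)  # miss or expired: real request
        self._tokens[key] = (token, expires_at)
        return token

    def invalidate(self, key: Hashable) -> bool:
        # Drop the entry so the next get_token() call fetches a new token.
        return self._tokens.pop(key, None) is not None


# Usage with a fake clock and a fake identity request.
now = [1000.0]
calls = []


def fake_fetch(key):
    calls.append(key)
    return "token-%d" % len(calls), now[0] + 3600


cache = TokenCache(fake_fetch, clock=lambda: now[0])
first = cache.get_token(("subcloud1", "admin"))   # miss: fetches token-1
second = cache.get_token(("subcloud1", "admin"))  # hit: served from cache
now[0] += 3600                                     # token-1 has now expired
third = cache.get_token(("subcloud1", "admin"))   # miss: fetches token-2
```

Keying the dictionary on whatever identifies the credentials (in this sketch, a subcloud and user pair) lets every caller in the same process that targets the same subcloud reuse one token.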

Cached tokens can become invalid before they expire when all fernet
keys are rotated (e.g., during the initial sync between a subcloud and
the system controller). The cache leverages Keystone's session
reauthentication mechanism to automatically invalidate a cached token
when a request using it is rejected, so that a new token is obtained
and cached.
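
In keystoneauth1, when an authenticated request is rejected with HTTP 401, the session can invalidate the auth plugin's token and retry once with a newly obtained one; that is the path the cache relies on. The simulation below only mimics that behavior without keystoneauth1: `Identity`, `request_with_reauth`, and the status codes returned by `send` are hypothetical stand-ins, and fernet key rotation is modeled simply as invalidating every previously issued token.

```python
from typing import Callable, Dict


class Identity:
    """Hypothetical identity service that issues and validates tokens."""

    def __init__(self):
        self.issued = 0
        self.valid = set()

    def issue(self) -> str:
        self.issued += 1
        token = "token-%d" % self.issued
        self.valid.add(token)
        return token

    def rotate_all_keys(self) -> None:
        # Rotating every fernet key leaves no key able to decrypt previously
        # issued tokens, so they become invalid before their expiry time.
        self.valid.clear()


def request_with_reauth(
    cache: Dict[str, str],
    key: str,
    identity: Identity,
    send: Callable[[str], int],
) -> int:
    """Send with the cached token; on HTTP 401 drop it and retry once."""
    token = cache.get(key)
    if token is None:
        token = cache[key] = identity.issue()
    status = send(token)
    if status == 401:
        del cache[key]  # invalidate the stale cached token
        token = cache[key] = identity.issue()
        status = send(token)
    return status


identity = Identity()
cache: Dict[str, str] = {}


def send(token: str) -> int:
    # Hypothetical subcloud API: accepts only currently valid tokens.
    return 200 if token in identity.valid else 401


first = request_with_reauth(cache, "subcloud1", identity, send)   # issues token-1
identity.rotate_all_keys()                                        # token-1 invalid
second = request_with_reauth(cache, "subcloud1", identity, send)  # 401, then token-2
```

Because the retry happens only after a rejection, the cache never has to know in advance that keys were rotated.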

This commit also raises the open file descriptor limit for the DC
orchestrator service. With sessions, TCP connections are reused rather
than closed immediately after each request, so more sockets, and
therefore more file descriptors, remain open at the same time.
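
The change does not say how the limit is raised, so the sketch below only shows one way to observe it on Linux: it reads the process's `RLIMIT_NOFILE` soft and hard limits with Python's standard `resource` module and counts the entries in `/proc/self/fd`. The function name `fd_usage` is illustrative.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for this process (Linux only).

    The count comes from /proc/self/fd and includes the descriptor that
    os.listdir() opens while reading that directory.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft, hard


open_fds, soft, hard = fd_usage()
print("%d open file descriptors (soft limit %s, hard limit %s)" % (open_fds, soft, hard))
```

For a running service such as the DC orchestrator, the same figures can be read from `/proc/<pid>/limits` and `/proc/<pid>/fd`.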

Test Plan:
01. PASS - Deploy a subcloud and verify token caching behavior.
02. PASS - Deploy a subcloud with remote install, ensuring the token
cache works.
03. PASS - Prestage a subcloud for install and software deployment,
validating token caching during the process.
04. PASS - Run prestage orchestration and verify proper use of the
token cache.
05. PASS - Manage a subcloud for the first time and verify that the
initial sync functions as expected. Ensure fernet key rotation
causes cached tokens to be invalidated, and confirm reauthentication
requests are made.
06. PASS - Unmanage a subcloud, rotate all fernet keys manually, then
manage the subcloud again. Verify token invalidation and
reauthentication function as expected.
07. PASS - Create a subcloud backup and ensure no token cache issues
arise.
08. PASS - Restore a subcloud from backup and verify proper
functionality of the token cache.
09. PASS - Deploy an N-1 subcloud and validate token caching for this
subcloud.
10. PASS - Verify that audits correctly identify an N-1 subcloud
without the USM patch as missing the USM service.
11. PASS - Apply the USM patch to the N-1 subcloud and verify that
the audit detects the USM service and prestage orchestration for
software deployment functions correctly.
12. PASS - Test DC orchestration audit and sync by creating a new
OpenStack user, and verify the user is replicated to the subcloud.
13. PASS - Apply a patch to subclouds using software deployment
orchestration, verifying token cache performance.
14. PASS - Test dcmanager API commands that send requests to
subclouds (e.g., 'dcmanager subcloud show <subcloud> --details'),
ensuring token cache is used.
15. PASS - Conduct a soak test of all DC services to verify token
expiration, renewal, and cache behavior over extended use.
16. PASS - Monitor TCP connections to ensure they are properly
closed after each use, preventing lingering open connections during
token caching or HTTP request handling.
17. PASS - Run end-to-end geo-redundancy operation and verify that it
completes successfully.
18. PASS - Run kube rootca update orchestration and verify that it
completes successfully.
19. PASS - Verify that the number of POST token requests made by the DC
audit to the subcloud per hour is equal to the number of DC audit
workers on the system controller.
20. PASS - Monitor the number of open file descriptors to ensure it
does not reach the new limit while executing a DC kube rootca
update strategy with the maximum number of supported subclouds.
Additionally, verify that all sessions are closed after the
strategy is complete.
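
The expectation in test 19 follows from each worker keeping its own in-memory cache. The simulation below assumes that every DC audit worker is a separate process that audits the subcloud at least once per token lifetime, that tokens live 3600 seconds (Keystone's default `[token] expiration`), and uses hypothetical values for the worker count and audit interval.

```python
TOKEN_LIFETIME = 3600  # seconds; assumed Keystone default [token] expiration
AUDIT_INTERVAL = 20    # seconds between audits of one subcloud (hypothetical)
WORKERS = 4            # number of DC audit worker processes (hypothetical)
WINDOW = 3600          # observation window: one hour


def token_requests(workers, window, interval, lifetime, cached=True):
    """Count token POSTs one subcloud receives during `window` seconds.

    Each worker audits the subcloud every `interval` seconds. With
    cached=True, each worker keeps its own per-process cache and only
    requests a token when its cached one is missing or expired.
    """
    requests = 0
    for _ in range(workers):
        expires_at = None  # each process starts with an empty cache
        for t in range(0, window, interval):
            if not cached or expires_at is None or t >= expires_at:
                requests += 1
                expires_at = t + lifetime
    return requests


with_cache = token_requests(WORKERS, WINDOW, AUDIT_INTERVAL, TOKEN_LIFETIME)
without_cache = token_requests(
    WORKERS, WINDOW, AUDIT_INTERVAL, TOKEN_LIFETIME, cached=False
)
```

With these illustrative values the subcloud receives 4 token requests in the hour with caching, versus 720 if every audit requested its own token.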

Closes-Bug: 2084490
Change-Id: Ie3c17f58c09ae08df8cd9f0c92f50ab0c556c263
Signed-off-by: Gustavo Herzmann <gustavo.herzmann@windriver.com>
113 lines
3.7 KiB
Python
# Copyright (c) 2018-2021, 2024 Wind River Systems, Inc.
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
#

import fmclient
from keystoneauth1 import session as ks_session
from oslo_log import log

from dccommon import consts as dccommon_consts
from dccommon.drivers import base

LOG = log.getLogger(__name__)
API_VERSION = "1"


class FmClient(base.DriverBase):
    """Fault Management driver."""

    def __init__(
        self,
        region: str,
        session: ks_session.Session,
        endpoint_type=dccommon_consts.KS_ENDPOINT_DEFAULT,
        endpoint: str = None,
        token: str = None,
    ):
        self.region_name = region

        # If the token is specified, use it instead of using the session
        if token:
            if not endpoint:
                endpoint = session.get_endpoint(
                    service_type=dccommon_consts.ENDPOINT_TYPE_FM,
                    region_name=region,
                    interface=endpoint_type,
                )
            session = None

        self.fm = fmclient.Client(
            API_VERSION,
            session=session,
            region_name=region,
            endpoint_type=endpoint_type,
            endpoint=endpoint,
            auth_token=token,
        )

    def get_alarm_summary(self):
        """Get this region alarm summary"""
        try:
            LOG.debug("get_alarm_summary region %s" % self.region_name)
            alarms = self.fm.alarm.summary()
        except Exception as e:
            LOG.error("get_alarm_summary exception={}".format(e))
            raise e
        return alarms

    def get_alarms_by_id(self, alarm_id):
        """Get list of this region alarms for a particular alarm_id"""
        try:
            LOG.debug("get_alarms_by_id %s, region %s" % (alarm_id, self.region_name))
            alarms = self.fm.alarm.list(
                q=fmclient.common.options.cli_to_array("alarm_id=" + alarm_id),
                include_suppress=True,
            )
        except Exception as e:
            LOG.error("get_alarms_by_id exception={}".format(e))
            raise e
        return alarms

    def get_alarms_by_ids(self, alarm_id_list):
        """Get list of this region alarms for a list of alarm_ids"""
        try:
            LOG.debug(
                "get_alarms_by_ids %s, region %s" % (alarm_id_list, self.region_name)
            )
            # The FM API does not support querying multiple alarm IDs at once,
            # so make one call per ID and concatenate the results.
            alarms = []
            for alarm_id in alarm_id_list:
                alarms.extend(
                    self.fm.alarm.list(
                        q=fmclient.common.options.cli_to_array("alarm_id=" + alarm_id),
                        include_suppress=True,
                    )
                )
        except Exception as e:
            LOG.error("get_alarms_by_ids exception={}".format(e))
            raise e
        return alarms

    def get_alarms(self):
        """Get this region alarms"""

        try:
            LOG.debug("get_alarms region %s" % self.region_name)
            alarms = self.fm.alarm.list(include_suppress=True)
        except Exception as e:
            LOG.error("get_alarms exception={}".format(e))
            raise e
        return alarms
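
Because `get_alarms_by_ids` issues one `list()` call per alarm ID, its behavior can be checked without a live FM endpoint by substituting a stub for the alarm manager. The sketch below mirrors that loop in a standalone function (named after the method, but not the driver itself). It replaces the `cli_to_array()` query with a plain dictionary and uses `unittest.mock.Mock` with hypothetical alarm IDs.

```python
from unittest import mock


def get_alarms_by_ids(alarm_api, alarm_id_list):
    """Standalone mirror of FmClient.get_alarms_by_ids (sketch).

    alarm_api stands in for the fmclient alarm manager; the query is a
    plain dict here rather than the cli_to_array() output the driver uses.
    """
    alarms = []
    for alarm_id in alarm_id_list:
        # One list() call per ID, because FM cannot filter on several IDs
        # in a single query.
        alarms.extend(alarm_api.list(q={"alarm_id": alarm_id}, include_suppress=True))
    return alarms


alarm_api = mock.Mock()
alarm_api.list.side_effect = lambda q, include_suppress: ["alarm " + q["alarm_id"]]
result = get_alarms_by_ids(alarm_api, ["100.101", "250.001"])
```

Nothing in the loop deduplicates its input, so an ID listed twice is queried twice and its alarms appear twice in the result.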