diff --git a/charms/tempest-k8s/CONTRIBUTING.md b/charms/tempest-k8s/CONTRIBUTING.md new file mode 100644 index 00000000..7782c00f --- /dev/null +++ b/charms/tempest-k8s/CONTRIBUTING.md @@ -0,0 +1,25 @@ +# Contributing + +To make contributions to this charm, you'll need a working [development setup](https://juju.is/docs/sdk/dev-setup). + +## Testing and Development + +This project uses `tox` for managing test environments. There are some pre-configured environments +that can be used for linting and formatting code when you're preparing contributions to the charm. +Please see the tox.ini file in the root of this repository. + +For example: + +``` +tox -e fmt +tox -e pep8 +tox -e cover +``` + +## Build the charm + +Change to the root of this repository and run: + +``` +tox -e build -- tempest-k8s +``` diff --git a/charms/tempest-k8s/LICENSE b/charms/tempest-k8s/LICENSE new file mode 100644 index 00000000..af93a234 --- /dev/null +++ b/charms/tempest-k8s/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. 
For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. 
The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!) The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright 2023 Chi Wai CHAN
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
diff --git a/charms/tempest-k8s/README.md b/charms/tempest-k8s/README.md
new file mode 100644
index 00000000..2a8c5dc2
--- /dev/null
+++ b/charms/tempest-k8s/README.md
@@ -0,0 +1,20 @@
+
+
+# tempest-k8s
+
+Tempest provides a set of integration tests to be run, in ad hoc
+or periodic fashion, against a live OpenStack cluster for OpenStack API
+validation, scenarios, and other specific tests useful in validating an
+OpenStack deployment.
+
+## Other resources
+
+- [tempest-k8s](https://charmhub.io/tempest-k8s) on Charmhub
diff --git a/charms/tempest-k8s/charmcraft.yaml b/charms/tempest-k8s/charmcraft.yaml
new file mode 100644
index 00000000..9b5807ba
--- /dev/null
+++ b/charms/tempest-k8s/charmcraft.yaml
@@ -0,0 +1,126 @@
+type: "charm"
+bases:
+  - build-on:
+      - name: "ubuntu"
+        channel: "22.04"
+    run-on:
+      - name: "ubuntu"
+        channel: "22.04"
+parts:
+  update-certificates:
+    plugin: nil
+    override-build: |
+      apt update
+      apt install -y ca-certificates
+      update-ca-certificates
+
+  charm:
+    after: [update-certificates]
+    build-packages:
+      - git
+      - libffi-dev
+      - libssl-dev
+      - rustc
+      - cargo
+      - pkg-config
+    charm-binary-python-packages:
+      - cryptography
+      - jsonschema
+      - pydantic<2.0
+      - jinja2
+
+name: tempest-k8s
+summary: OpenStack integration test suite (tempest)
+description: |
+  Tempest provides a set of integration tests to be run, in ad hoc
+  or periodic fashion, against a live OpenStack cluster for OpenStack API
+  validation, scenarios, and other specific tests useful in validating an
+  OpenStack deployment.
+
+assumes:
+  - k8s-api
+  - juju >= 3.1
+
+links:
+  source: https://opendev.org/openstack/sunbeam-charms
+  issues: https://bugs.launchpad.net/sunbeam-charms
+
+containers:
+  tempest:
+    resource: tempest-image
+
+resources:
+  tempest-image:
+    type: oci-image
+    description: OCI image for tempest
+    # ghcr.io/canonical/tempest:2023.2
+    upstream-source: ghcr.io/canonical/tempest:2023.2
+
+requires:
+  identity-ops:
+    interface: keystone-resources
+  logging:
+    interface: loki_push_api
+
+provides:
+  grafana-dashboard:
+    interface: grafana_dashboard
+
+peers:
+  peers:
+    interface: tempest-peer
+
+config:
+  options:
+    schedule:
+      type: string
+      default: "off"
+      description: |
+        The cron-like schedule to define when to run tempest. When the value is
+        "off" (case-insensitive), periodic checks will be disabled. This is
+        the default.
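+
+        For example (illustrative only), to run the periodic checks daily
+        at midnight:
+
+            juju config tempest-k8s schedule="0 0 * * *"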
+
+actions:
+  validate:
+    description: |
+      Run a set of tempest tests.
+
+      Tests can be filtered using parameters: regex, exclude-regex, and test-list.
+      These parameters are optional; if none are given, all tests will run.
+
+      Provided parameters narrow down the tests that will run.
+      For example, `regex="one two" exclude-regex=three test-list=list1`
+      will run tests that are:
+      - found in test list "list1"
+      - AND match regex "one" or "two"
+      - AND don't match regex "three"
+    params:
+      regex:
+        type: string
+        default: ""
+        description: |
+          A list of regexes, whitespace separated, used to select tests from the list.
+          Tests matching any of the regexes will be selected.
+
+          If no value is provided (the default), all tests will be selected.
+
+          To run the equivalent of tempest smoke tests (`tempest run --smoke`),
+          use `regex=smoke`.
+      exclude-regex:
+        type: string
+        default: ""
+        description: |
+          A single regex to exclude tests.
+          Any test that matches this regex will be excluded from the final list.
+      serial:
+        type: boolean
+        default: false
+        description: Run tests serially. By default, tests run in parallel.
+      test-list:
+        type: string
+        default: ""
+        description: |
+          Use a predefined test list. See `get-lists` for available test lists.
+
+  get-lists:
+    description: List existing test lists, to be used with the validate action.
diff --git a/charms/tempest-k8s/rebuild b/charms/tempest-k8s/rebuild
new file mode 100644
index 00000000..fe45819a
--- /dev/null
+++ b/charms/tempest-k8s/rebuild
@@ -0,0 +1,3 @@
+# This file is used to trigger a build.
+# Change uuid to trigger a new build.
+c3b9c7c9-2bd4-4df1-a1df-89c729b34eb6
diff --git a/charms/tempest-k8s/requirements.txt b/charms/tempest-k8s/requirements.txt
new file mode 100644
index 00000000..48fb8600
--- /dev/null
+++ b/charms/tempest-k8s/requirements.txt
@@ -0,0 +1,9 @@
+ops
+jinja2
+lightkube
+lightkube-models
+# COS requirement
+cosl
+
+# From ops_sunbeam
+tenacity
diff --git a/charms/tempest-k8s/src/charm.py b/charms/tempest-k8s/src/charm.py
new file mode 100755
index 00000000..08d73f31
--- /dev/null
+++ b/charms/tempest-k8s/src/charm.py
@@ -0,0 +1,231 @@
+#!/usr/bin/env python3
+#
+# Copyright 2024 Canonical Ltd.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Tempest Operator Charm.
+
+This charm provides Tempest as part of an OpenStack deployment.
+"""
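+
+# Illustrative examples (assuming a Juju 3.x client and a deployed unit
+# named tempest-k8s/0) of invoking the actions this charm defines:
+#
+#   juju run tempest-k8s/0 get-lists
+#   juju run tempest-k8s/0 validate regex=smoke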
+
+import logging
+from typing import (
+    Dict,
+    List,
+)
+
+import ops
+import ops.charm
+import ops.pebble
+import ops_sunbeam.charm as sunbeam_charm
+import ops_sunbeam.container_handlers as sunbeam_chandlers
+import ops_sunbeam.core as sunbeam_core
+import ops_sunbeam.relation_handlers as sunbeam_rhandlers
+from handlers import (
+    GrafanaDashboardRelationHandler,
+    LoggingRelationHandler,
+    TempestPebbleHandler,
+    TempestUserIdentityRelationHandler,
+)
+from ops.main import (
+    main,
+)
+from ops.model import (
+    ActiveStatus,
+    BlockedStatus,
+    MaintenanceStatus,
+)
+from utils.constants import (
+    CONTAINER,
+    TEMPEST_CONCURRENCY,
+    TEMPEST_CONF,
+    TEMPEST_HOME,
+    TEMPEST_LIST_DIR,
+    TEMPEST_OUTPUT,
+    TEMPEST_TEST_ACCOUNTS,
+    TEMPEST_WORKSPACE,
+    TEMPEST_WORKSPACE_PATH,
+)
+
+logger = logging.getLogger(__name__)
+
+
+class TempestOperatorCharm(sunbeam_charm.OSBaseOperatorCharmK8S):
+    """Charm the service."""
+
+    _state = ops.framework.StoredState()
+    service_name = "tempest"
+
+    mandatory_relations = {"identity-ops"}
+
+    def __init__(self, framework: ops.framework.Framework) -> None:
+        """Run the constructor."""
+        # config for openstack, used by tempest
+        super().__init__(framework)
+        self.framework.observe(
+            self.on.validate_action, self._on_validate_action
+        )
+        self.framework.observe(
+            self.on.get_lists_action, self._on_get_lists_action
+        )
+
+    @property
+    def container_configs(self) -> List[sunbeam_core.ContainerConfigFile]:
+        """Container configuration files for the operator."""
+        return [
+            # crontab is owned and run by root
+            sunbeam_core.ContainerConfigFile("/etc/crontab", "root", "root"),
+            # Only give exec access to root and tempest user
+            # for these wrappers, simply for principle of least privilege.
+            sunbeam_core.ContainerConfigFile(
+                "/usr/local/sbin/tempest-run-wrapper",
+                "root",
+                "tempest",
+                0o750,
+            ),
+            sunbeam_core.ContainerConfigFile(
+                "/usr/local/sbin/tempest-init",
+                "root",
+                "tempest",
+                0o750,
+            ),
+        ]
+
+    def get_pebble_handlers(self) -> List[sunbeam_chandlers.PebbleHandler]:
+        """Pebble handlers for the operator."""
+        return [
+            TempestPebbleHandler(
+                self,
+                CONTAINER,
+                self.service_name,
+                self.container_configs,
+                self.template_dir,
+                self.configure_charm,
+            )
+        ]
+
+    def get_relation_handlers(self) -> List[sunbeam_rhandlers.RelationHandler]:
+        """Relation handlers for the service."""
+        handlers = super().get_relation_handlers()
+        self.user_id_ops = TempestUserIdentityRelationHandler(
+            self,
+            "identity-ops",
+            self.configure_charm,
+            mandatory="identity-ops" in self.mandatory_relations,
+        )
+        handlers.append(self.user_id_ops)
+        self.loki = LoggingRelationHandler(
+            self,
+            "logging",
+            self.configure_charm,
+            mandatory="logging" in self.mandatory_relations,
+        )
+        handlers.append(self.loki)
+        self.grafana = GrafanaDashboardRelationHandler(
+            self,
+            "grafana-dashboard",
+            self.configure_charm,
+            mandatory="grafana-dashboard" in self.mandatory_relations,
+        )
+        handlers.append(self.grafana)
+        return handlers
+
+    def _get_environment_for_tempest(self) -> Dict[str, str]:
+        """Return a dictionary of environment variables.
+
+        To be used with pebble commands that run tempest discover, etc.
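+
+        The OS_* values are drawn from the credential secret shared over
+        the identity-ops relation; the TEMPEST_* values are fixed paths
+        defined in utils.constants.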
+ """ + logger.debug("Retrieving OpenStack credentials") + credential = self.user_id_ops.get_user_credential() + return { + "OS_REGION_NAME": "RegionOne", + "OS_IDENTITY_API_VERSION": "3", + "OS_AUTH_VERSION": "3", + "OS_AUTH_URL": credential.get("auth-url"), + "OS_USERNAME": credential.get("username"), + "OS_PASSWORD": credential.get("password"), + "OS_USER_DOMAIN_NAME": credential.get("domain-name"), + "OS_PROJECT_NAME": credential.get("project-name"), + "OS_PROJECT_DOMAIN_NAME": credential.get("domain-name"), + "OS_DOMAIN_NAME": credential.get("domain-name"), + "TEMPEST_CONCURRENCY": TEMPEST_CONCURRENCY, + "TEMPEST_CONF": TEMPEST_CONF, + "TEMPEST_HOME": TEMPEST_HOME, + "TEMPEST_LIST_DIR": TEMPEST_LIST_DIR, + "TEMPEST_OUTPUT": TEMPEST_OUTPUT, + "TEMPEST_TEST_ACCOUNTS": TEMPEST_TEST_ACCOUNTS, + "TEMPEST_WORKSPACE": TEMPEST_WORKSPACE, + "TEMPEST_WORKSPACE_PATH": TEMPEST_WORKSPACE_PATH, + } + + def post_config_setup(self) -> None: + """Configuration steps after services have been setup. + + NOTE: this will be improved in future to avoid running unnecessarily. + """ + logger.debug("Running post config setup") + self.status.set(MaintenanceStatus("tempest init in progress")) + pebble = self.pebble_handler() + + logger.debug("Ready to init tempest environment") + env = self._get_environment_for_tempest() + try: + pebble.init_tempest(env) + except RuntimeError: + self.status.set( + BlockedStatus("tempest init failed, see logs for more info") + ) + return + + self.status.set(ActiveStatus("")) + logger.debug("Finish post config setup") + + def pebble_handler(self) -> TempestPebbleHandler: + """Get the pebble handler.""" + return self.get_named_pebble_handler(CONTAINER) + + def _on_validate_action(self, event: ops.charm.ActionEvent) -> None: + """Run tempest action.""" + serial: bool = event.params["serial"] + regexes: List[str] = event.params["regex"].strip().split() + exclude_regex: str = event.params["exclude-regex"].strip() + test_list: str = event.params["test-list"].strip() + + env = self._get_environment_for_tempest() + try: + output = self.pebble_handler().run_tempest_tests( + regexes, exclude_regex, test_list, serial, env + ) + except RuntimeError as e: + event.fail(str(e)) + # still print the message, + # because it could be a lot of output from tempest, + # and we want it neatly formatted + print(e) + return + print(output) + + def _on_get_lists_action(self, event: ops.charm.ActionEvent) -> None: + """List tempest test lists action.""" + try: + lists = self.pebble_handler().get_test_lists() + except RuntimeError as e: + event.fail(str(e)) + return + # display neatly to the user. This will also end up in the action output results.stdout + print("\n".join(lists)) + + +if __name__ == "__main__": + main(TempestOperatorCharm) diff --git a/charms/tempest-k8s/src/handlers.py b/charms/tempest-k8s/src/handlers.py new file mode 100644 index 00000000..64245dcc --- /dev/null +++ b/charms/tempest-k8s/src/handlers.py @@ -0,0 +1,519 @@ +# Copyright 2024 Canonical Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Handlers for the tempest charm."""
+import hashlib
+import json
+import logging
+import re
+import secrets
+import string
+from functools import (
+    wraps,
+)
+from typing import (
+    Callable,
+    Dict,
+    FrozenSet,
+    List,
+    Optional,
+)
+
+import charms.grafana_k8s.v0.grafana_dashboard as grafana_dashboard
+import charms.loki_k8s.v1.loki_push_api as loki_push_api
+import ops
+import ops.model
+import ops.pebble
+import ops_sunbeam.container_handlers as sunbeam_chandlers
+import ops_sunbeam.relation_handlers as sunbeam_rhandlers
+from utils.constants import (
+    OPENSTACK_DOMAIN,
+    OPENSTACK_PROJECT,
+    OPENSTACK_ROLE,
+    OPENSTACK_USER,
+    TEMPEST_HOME,
+    TEMPEST_LIST_DIR,
+    TEMPEST_OUTPUT,
+)
+
+logger = logging.getLogger(__name__)
+
+
+def assert_ready(f):
+    """Decorator for gating pebble handler methods for readiness.
+
+    Raise a runtime error if the pebble handler is not ready.
+    """
+
+    @wraps(f)
+    def wrapper(self, *args, **kwargs):
+        if not self.pebble_ready:
+            raise RuntimeError("pebble is not ready")
+        return f(self, *args, **kwargs)
+
+    return wrapper
+
+
+class TempestPebbleHandler(sunbeam_chandlers.ServicePebbleHandler):
+    """Pebble handler for the container."""
+
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.container = self.charm.unit.get_container(self.container_name)
+
+    def get_layer(self) -> dict:
+        """Pebble configuration layer for the container."""
+        return {
+            "summary": "Periodic cloud validation service",
+            "description": "Pebble config layer for periodic cloud validation job",
+            "services": {
+                # Note: cron service is started when the charm is ready,
+                # but the cronjobs will only be configured to run
+                # when the right conditions are met
+                # (e.g. observability connected, configuration set to run).
+                self.service_name: {
+                    "override": "replace",
+                    "summary": "Running tempest periodically",
+                    # Must run cron in foreground to be managed by pebble
+                    "command": "cron -f",
+                    "user": "root",
+                    "group": "root",
+                    "startup": "enabled",
+                },
+            },
+        }
+
+    @assert_ready
+    def get_test_lists(self) -> List[str]:
+        """Get the filenames of available test lists."""
+        files = self.container.list_files(TEMPEST_LIST_DIR)
+        return [x.name for x in files]
+
+    @assert_ready
+    def init_tempest(self, env: Dict[str, str]):
+        """Init the openstack environment for tempest.
+
+        Raise a RuntimeError if something goes wrong.
+        """
+        # Pebble runs cron, which runs tempest periodically
+        # when periodic checks are enabled.
+        # This ensures that tempest gets the env, inherited from cron.
+        layer = self.get_layer()
+        layer["services"][self.service_name]["environment"] = env
+        self.container.add_layer(self.service_name, layer, combine=True)
+
+        try:
+            self.execute(
+                ["tempest-init"],
+                user="tempest",
+                group="tempest",
+                working_dir=TEMPEST_HOME,
+                exception_on_error=True,
+                environment=env,
+            )
+        except ops.pebble.ExecError as e:
+            if e.stdout:
+                for line in e.stdout.splitlines():
+                    logger.error(" %s", line)
+            raise RuntimeError("tempest init failed")
+
+    @assert_ready
+    def run_tempest_tests(
+        self,
+        regexes: List[str],
+        exclude_regex: str,
+        test_list: str,
+        serial: bool,
+        env: Dict[str, str],
+    ) -> str:
+        """Wrapper for running a set of tempest tests.
+
+        Return the output as a string.
+        Raises a RuntimeError if something goes wrong.
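+
+        For example (hypothetical values): regexes=["smoke"], serial=False
+        selects tests matching "smoke" and runs them in parallel, roughly
+        the equivalent of `tempest run --smoke`.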
+ """ + # validation before running anything + for r in [*regexes, exclude_regex]: + try: + re.compile(r) + except re.error as e: + raise RuntimeError(f"{r!r} is an invalid regex: {e}") + + if test_list and test_list not in self.get_test_lists(): + raise RuntimeError( + f"'{test_list}' is not a known test list. " + "Please run list-tests action to view available lists." + ) + + # now build the command line for tempest + serial_args = ["--serial" if serial else "--parallel"] + regex_args = ["--regex", " ".join(regexes)] if regexes else [] + exclude_regex_args = ( + ["--exclude-regex", exclude_regex] if exclude_regex else [] + ) + list_args = ( + ["--load-list", TEMPEST_LIST_DIR + "/" + test_list] + if test_list + else [] + ) + args = [ + "tempest-run-wrapper", + *serial_args, + *regex_args, + *exclude_regex_args, + *list_args, + ] + + try: + output = self.execute( + args, + user="tempest", + group="tempest", + working_dir=TEMPEST_HOME, + exception_on_error=True, + environment=env, + ) + except ops.pebble.ExecError as e: + if e.stdout: + output = f"{e.stdout}\n\n{e.stderr}" + else: + output = e.stderr + raise RuntimeError(output) + + return output + + +class TempestUserIdentityRelationHandler(sunbeam_rhandlers.RelationHandler): + """Relation handler for identity ops.""" + + CREDENTIALS_SECRET_PREFIX = "tempest-user-identity-resource-" + CONFIGURE_SECRET_PREFIX = "configure-credential-" + + resource_identifiers: FrozenSet[str] = frozenset( + { + "name", + "domain", + "project", + } + ) + + def __init__( + self, + charm: ops.CharmBase, + relation_name: str, + callback_f: Callable, + mandatory: bool, + ): + super().__init__(charm, relation_name, callback_f, mandatory) + self.charm = charm + + @property + def ready(self) -> bool: + """Whether the relation is ready.""" + content = self.get_user_credential() + if content and content.get("auth-url") is not None: + return True + return False + + @property + def label(self) -> str: + """Secret label to share over keystone resource relation.""" + return self.CREDENTIALS_SECRET_PREFIX + OPENSTACK_USER + + def setup_event_handler(self) -> ops.Object: + """Configure event handlers for the relation.""" + import charms.keystone_k8s.v0.identity_resource as id_ops + + logger.debug("Setting up Identity Resource event handler") + ops_svc = id_ops.IdentityResourceRequires( + self.charm, + self.relation_name, + ) + self.framework.observe( + ops_svc.on.provider_ready, + self._on_provider_ready, + ) + self.framework.observe( + ops_svc.on.provider_goneaway, + self._on_provider_goneaway, + ) + self.framework.observe( + ops_svc.on.response_available, + self._on_response_available, + ) + return ops_svc + + def get_user_credential(self) -> Optional[dict]: + """Retrieve the user credential.""" + credentials_id = self.charm.leader_get(self.label) + if not credentials_id: + logger.warning("Failed to get openstack credential for tempest.") + return None + secret = self.model.get_secret(id=credentials_id) + return secret.get_content() + + def _hash_ops(self, ops: list) -> str: + """Hash ops request.""" + return hashlib.sha256(json.dumps(ops).encode()).hexdigest() + + def _ensure_credential(self) -> str: + """Ensure the credential exists and return the secret id.""" + credentials_id = self.charm.leader_get(self.label) + + # If it exists and the credentials have already been set, + # simply return the id + if credentials_id: + secret = self.model.get_secret(id=credentials_id) + content = secret.get_content() + if "password" in content: + return credentials_id + + # Otherwise, 
generate and save the credentials. + return self._set_secret( + { + "username": OPENSTACK_USER, + "password": self._generate_password(18), + "project-name": OPENSTACK_PROJECT, + "domain-name": OPENSTACK_DOMAIN, + }, + ) + + def _set_secret(self, entries: Dict[str, str]) -> str: + """Create or update a secret.""" + credential_id = self.charm.leader_get(self.label) + + # update secret if credential_id exists + if credential_id: + secret = self.model.get_secret(id=credential_id) + content = secret.get_content() + content.update(entries) + if content != secret.get_content(): + secret.set_content(content) + return credential_id + + # create new secret if credential_id does not exist + credential_secret = self.model.app.add_secret( + entries, + label=self.label, + ) + self.charm.leader_set({self.label: credential_secret.id}) + return credential_secret.id + + def _generate_password(self, length: int) -> str: + """Utility function to generate secure random string for password.""" + alphabet = string.ascii_letters + string.digits + return "".join(secrets.choice(alphabet) for i in range(length)) + + def _grant_ops_secret(self, relation: ops.Relation) -> None: + """Grant ops secret.""" + secret = self.model.get_secret(id=self._ensure_credential()) + secret.grant(relation) + + def _setup_tempest_resource_ops(self) -> List[dict]: + """Set up openstack resource ops.""" + credential_id = self._ensure_credential() + credential_secret = self.model.get_secret(id=credential_id) + content = credential_secret.get_content() + username = content.get("username") + password = content.get("password") + setup_ops = [ + { + "name": "create_role", + "params": { + "name": OPENSTACK_ROLE, + }, + }, + { + "name": "create_domain", + "params": { + "name": OPENSTACK_DOMAIN, + "enable": True, + }, + }, + { + "name": "create_project", + "params": { + "name": OPENSTACK_PROJECT, + "domain": "{{ create_domain[0].id }}", + }, + }, + { + "name": "create_user", + "params": { + "name": username, + "password": password, + "domain": "{{ create_domain[0].id }}", + }, + }, + { + "name": "grant_role", + "params": { + "role": "{{ create_role[0].id }}", + "domain": "{{ create_domain[0].id }}", + "user": "{{ create_user[0].id }}", + "user_domain": "{{ create_domain[0].id }}", + }, + }, + { + "name": "grant_role", + "params": { + "role": "{{ create_role[0].id }}", + "user": "{{ create_user[0].id }}", + "user_domain": "{{ create_domain[0].id }}", + "project": "{{ create_project[0].id }}", + "project_domain": "{{ create_domain[0].id }}", + }, + }, + ] + return setup_ops + + def _list_endpoint_ops(self) -> List[dict]: + """List endpoint ops.""" + list_endpoint_ops = [ + { + "name": "list_endpoint", + "params": {"name": "keystone", "interface": "admin"}, + }, + ] + return list_endpoint_ops + + def _teardown_tempest_resource_ops(self) -> List[dict]: + """Tear down openstack resource ops.""" + teardown_ops = [ + { + "name": "show_domain", + "params": { + "name": OPENSTACK_DOMAIN, + }, + }, + { + "name": "update_domain", + "params": { + "domain": "{{ show_domain[0].id }}", + "enable": False, + }, + }, + { + "name": "delete_domain", + "params": { + "name": "{{ show_domain[0].id }}", + }, + }, + ] + return teardown_ops + + def _setup_tempest_resource_request(self) -> dict: + """Set up openstack resource for tempest.""" + ops = [] + ops.extend(self._teardown_tempest_resource_ops()) + ops.extend(self._setup_tempest_resource_ops()) + ops.extend(self._list_endpoint_ops()) + request = { + "id": self._hash_ops(ops), + "tag": "setup_tempest_resource", + 
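+            # ops runs teardown first, then setup, then the endpoint
+            # listing, so a repeated setup request starts from a clean slate.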
"ops": ops, + } + return request + + def _teardown_tempest_resource_request(self) -> dict: + """Tear down openstack resources for tempest.""" + ops = [] + ops.extend(self._teardown_tempest_resource_ops()) + request = { + "id": self._hash_ops(ops), + "tag": "teardown_tempest_resource", + "ops": ops, + } + return request + + def _process_list_endpoint_response(self, response: dict) -> None: + """Process extra ops request: `_list_endpoint_ops`.""" + for op in response.get("ops", []): + if op.get("name") != "list_endpoint": + continue + if op.get("return-code") != 0: + logger.warning("List endpoint ops failed.") + return + for endpoint in op.get("value", {}): + auth_url = endpoint.get("url") + if auth_url is not None: + self._set_secret({"auth-url": auth_url}) + return + + def _on_provider_ready(self, event) -> None: + """Handles response available events.""" + if not self.model.unit.is_leader(): + return + logger.info("Identity ops provider ready: setup tempest resources") + self.interface.request_ops(self._setup_tempest_resource_request()) + self._grant_ops_secret(event.relation) + self.callback_f(event) + + def _on_response_available(self, event) -> None: + """Handles response available events.""" + if not self.model.unit.is_leader(): + return + logger.info("Handle response from identity ops") + + response = self.interface.response + logger.info("%s", json.dumps(response, indent=4)) + self._process_list_endpoint_response(response) + self.callback_f(event) + + def _on_provider_goneaway(self, event) -> None: + """Handle gone_away event.""" + if not self.model.unit.is_leader(): + return + logger.info( + "Identity ops provider gone away: teardown tempest resources" + ) + self.callback_f(event) + + +class GrafanaDashboardRelationHandler(sunbeam_rhandlers.RelationHandler): + """Relation handler for grafana-dashboard relation.""" + + def setup_event_handler(self) -> ops.framework.Object: + """Configure event handlers for the relation.""" + logger.debug("Setting up Grafana Dashboards Provider event handler") + interface = grafana_dashboard.GrafanaDashboardProvider( + self.charm, + relation_name=self.relation_name, + dashboards_path="src/grafana_dashboards", + ) + return interface + + @property + def ready(self) -> bool: + """Determine with the relation is ready for use.""" + return True + + +class LoggingRelationHandler(sunbeam_rhandlers.RelationHandler): + """Relation handler for logging relation.""" + + def setup_event_handler(self) -> ops.framework.Object: + """Configure event handlers for the relation.""" + logger.debug("Setting up Logging Provider event handler") + interface = loki_push_api.LogProxyConsumer( + self.charm, + recursive=True, + relation_name=self.relation_name, + alert_rules_path="src/loki_alert_rules", + logs_scheme={"tempest": {"log-files": [TEMPEST_OUTPUT]}}, + ) + return interface + + @property + def ready(self) -> bool: + """Determine with the relation is ready for use.""" + return True diff --git a/charms/tempest-k8s/src/templates/crontab.j2 b/charms/tempest-k8s/src/templates/crontab.j2 new file mode 100644 index 00000000..7062ddfb --- /dev/null +++ b/charms/tempest-k8s/src/templates/crontab.j2 @@ -0,0 +1,17 @@ +# Do not change this file, this file is managed by juju. This is a dedicated +# system-wide crontab for running tempest periodically. + +SHELL=/bin/sh + +# Example of job definition: +# .---------------- minute (0 - 59) +# | .------------- hour (0 - 23) +# | | .---------- day of month (1 - 31) +# | | | .------- month (1 - 12) OR jan,feb,mar,apr ... 
+# | | | | .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat +# | | | | | +# * * * * * user-name command to be executed +{% if options.schedule.casefold() != "off" %} +# Note that the process lock is shared between ad hoc check and the periodic check. +{{ options.schedule }} tempest tempest-run-wrapper --load-list /tempest_test_lists/readonly-quick +{% endif %} diff --git a/charms/tempest-k8s/src/templates/tempest-init.j2 b/charms/tempest-k8s/src/templates/tempest-init.j2 new file mode 100644 index 00000000..a7adce5e --- /dev/null +++ b/charms/tempest-k8s/src/templates/tempest-init.j2 @@ -0,0 +1,21 @@ +#!/bin/bash +# Do not change this file, this file is managed by juju. + +# set -e is important, to ensure the script bails out +# if there are issues, such as lock not acquired, +# or failure in one of the tempest steps. +set -ex + +(flock -n 9 || (echo "lock could not be acquired"; exit 1) + +# clean up before initialising everything +rm -rf "$TEMPEST_WORKSPACE_PATH" +rm -rf "${TEMPEST_HOME}/.tempest" + +tempest init --name "$TEMPEST_WORKSPACE" "$TEMPEST_WORKSPACE_PATH" + +discover-tempest-config --out "$TEMPEST_CONF" + +tempest account-generator -r "$TEMPEST_CONCURRENCY" -c "$TEMPEST_CONF" "$TEMPEST_TEST_ACCOUNTS" + +) 9>/var/lock/tempest diff --git a/charms/tempest-k8s/src/templates/tempest-run-wrapper.j2 b/charms/tempest-k8s/src/templates/tempest-run-wrapper.j2 new file mode 100644 index 00000000..01033fc0 --- /dev/null +++ b/charms/tempest-k8s/src/templates/tempest-run-wrapper.j2 @@ -0,0 +1,17 @@ +#!/bin/bash +# Do not change this file, this file is managed by juju. + +# set -e is important, to ensure the script bails out +# if there are issues, such as lock not acquired, +# or failure in one of the tempest steps. +set -ex + +(flock -n 9 || (echo "lock could not be acquired"; exit 1) + +discover-tempest-config --test-accounts "$TEMPEST_TEST_ACCOUNTS" --out "$TEMPEST_CONF" + +TMP_FILE="$(mktemp)" +tempest run --workspace "$TEMPEST_WORKSPACE" "$@" 2>&1 | tee "$TMP_FILE" +mv "$TMP_FILE" "$TEMPEST_OUTPUT" + +) 9>/var/lock/tempest diff --git a/charms/tempest-k8s/src/utils/__init__.py b/charms/tempest-k8s/src/utils/__init__.py new file mode 100644 index 00000000..ee9c7b74 --- /dev/null +++ b/charms/tempest-k8s/src/utils/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2024 Canonical Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Utility modules for tempest-k8s charm.""" diff --git a/charms/tempest-k8s/src/utils/constants.py b/charms/tempest-k8s/src/utils/constants.py new file mode 100644 index 00000000..63e30aa6 --- /dev/null +++ b/charms/tempest-k8s/src/utils/constants.py @@ -0,0 +1,32 @@ +# Copyright 2024 Canonical Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Constants for the tempest charm.""" +CONTAINER = "tempest" + +TEMPEST_CONCURRENCY = "4" +TEMPEST_HOME = "/var/lib/tempest" +TEMPEST_WORKSPACE_PATH = f"{TEMPEST_HOME}/workspace" +TEMPEST_CONF = f"{TEMPEST_WORKSPACE_PATH}/etc/tempest.conf" +TEMPEST_TEST_ACCOUNTS = f"{TEMPEST_WORKSPACE_PATH}/test_accounts.yaml" +TEMPEST_LIST_DIR = "/tempest_test_lists" +# this file will contain the output from tempest's latest test run +TEMPEST_OUTPUT = f"{TEMPEST_WORKSPACE_PATH}/tempest-output.log" +# This is the workspace name registered with tempest. +# It will be saved in a file in $HOME/.tempest/ +TEMPEST_WORKSPACE = "tempest" + +OPENSTACK_USER = "tempest" +OPENSTACK_DOMAIN = "tempest" +OPENSTACK_PROJECT = "tempest-CloudValidation" +OPENSTACK_ROLE = "admin" diff --git a/charms/tempest-k8s/tests/unit/__init__.py b/charms/tempest-k8s/tests/unit/__init__.py new file mode 100644 index 00000000..6f98ef2f --- /dev/null +++ b/charms/tempest-k8s/tests/unit/__init__.py @@ -0,0 +1,15 @@ +# Copyright 2024 Canonical Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Unit tests for tempest operator.""" diff --git a/charms/tempest-k8s/tests/unit/test_tempest_charm.py b/charms/tempest-k8s/tests/unit/test_tempest_charm.py new file mode 100644 index 00000000..033e8df1 --- /dev/null +++ b/charms/tempest-k8s/tests/unit/test_tempest_charm.py @@ -0,0 +1,389 @@ +#!/usr/bin/env python3 + +# Copyright 2024 Canonical Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
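+
+# These unit tests are typically exercised through the repository's tox
+# environments (see tox.ini at the repository root), e.g. `tox -e cover`.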
+
+"""Unit tests for Tempest operator."""
+
+import json
+import pathlib
+
+import charm
+import mock
+import ops_sunbeam.test_utils as test_utils
+import yaml
+from utils.constants import (
+    CONTAINER,
+    TEMPEST_HOME,
+)
+
+TEST_TEMPEST_ENV = {
+    "OS_REGION_NAME": "RegionOne",
+    "OS_IDENTITY_API_VERSION": "3",
+    "OS_AUTH_VERSION": "3",
+    "OS_AUTH_URL": "http://10.6.0.23/openstack-keystone/v3",
+    "OS_USERNAME": "tempest",
+    "OS_PASSWORD": "password",
+    "OS_USER_DOMAIN_NAME": "tempest",
+    "OS_PROJECT_NAME": "tempest-CloudValidation",
+    "OS_PROJECT_DOMAIN_NAME": "tempest",
+    "OS_DOMAIN_NAME": "tempest",
+    "TEMPEST_CONCURRENCY": "4",
+    "TEMPEST_CONF": "/var/lib/tempest/workspace/etc/tempest.conf",
+    "TEMPEST_HOME": "/var/lib/tempest",
+    "TEMPEST_LIST_DIR": "/tempest_test_lists",
+    "TEMPEST_OUTPUT": "/var/lib/tempest/workspace/tempest-output.log",
+    "TEMPEST_TEST_ACCOUNTS": "/var/lib/tempest/workspace/test_accounts.yaml",
+    "TEMPEST_WORKSPACE": "tempest",
+    "TEMPEST_WORKSPACE_PATH": "/var/lib/tempest/workspace",
+}
+
+
+charmcraft = (
+    pathlib.Path(__file__).parents[2] / "charmcraft.yaml"
+).read_text()
+config = yaml.dump(yaml.safe_load(charmcraft)["config"])
+actions = yaml.dump(yaml.safe_load(charmcraft)["actions"])
+
+
+class _TempestTestOperatorCharm(charm.TempestOperatorCharm):
+    """Test Operator Charm for Tempest operator."""
+
+    def __init__(self, framework):
+        self.seen_events = []
+        super().__init__(framework)
+
+    def _log_event(self, event):
+        self.seen_events.append(type(event).__name__)
+
+    def configure_charm(self, event):
+        super().configure_charm(event)
+        self._log_event(event)
+
+
+class TestTempestOperatorCharm(test_utils.CharmTestCase):
+    """Class for testing the tempest charm."""
+
+    PATCHES = []
+
+    def setUp(self):
+        """Set up Tempest tests."""
+        super().setUp(charm, self.PATCHES)
+        self.harness = test_utils.get_harness(
+            _TempestTestOperatorCharm,
+            container_calls=self.container_calls,
+            charm_metadata=charmcraft,
+            charm_config=config,
+            charm_actions=actions,
+        )
+
+        self.addCleanup(self.harness.cleanup)
+        self.harness.begin()
+        self.harness.set_leader()
+
+    def add_identity_ops_relation(self, harness):
+        """Add identity resource relation."""
+        rel_id = harness.add_relation("identity-ops", "keystone")
+        harness.add_relation_unit(rel_id, "keystone/0")
+        harness.charm.user_id_ops.callback_f = mock.Mock()
+        harness.charm.user_id_ops.get_user_credential = mock.Mock(
+            return_value={
+                "username": "tempest",
+                "password": "password",
+                "domain-name": "tempest",
+                "project-name": "tempest-CloudValidation",
+                "auth-url": "http://10.6.0.23/openstack-keystone/v3",
+            },
+        )
+
+        # Only show the list_endpoint ops for simplicity
+        harness.update_relation_data(
+            rel_id,
+            "keystone",
+            {
+                "response": json.dumps(
+                    {
+                        "id": "c8e02ce67f57057d1a0d6660c6571361eea1a03d749d021d33e13ea4b0a7982a",
+                        "tag": "setup_tempest_resource",
+                        "ops": [
+                            {
+                                "name": "some_other_ops",
+                                "return-code": 0,
+                                "value": "",
+                            },
+                            {
+                                "name": "list_endpoint",
+                                "return-code": 0,
+                                "value": [
+                                    {
+                                        "id": "68c4eba8b01f41829d30cf2519998883",
+                                        "service_id": "b2a08eea7699460e838f7cce97529e55",
+                                        "interface": "admin",
+                                        "region": "RegionOne",
+                                        "url": "http://10.152.183.48:5000/v3",
+                                        "enabled": True,
+                                    }
+                                ],
+                            },
+                        ],
+                    }
+                )
+            },
+        )
+        return rel_id
+
+    def add_logging_relation(self, harness):
+        """Add logging relation."""
+        rel_id = harness.add_relation("logging", "loki")
+        harness.add_relation_unit(rel_id, "loki/0")
+        harness.charm.loki.interface = mock.Mock()
+        return rel_id
+
+    def add_grafana_dashboard_relation(self, harness):
+        """Add grafana dashboard relation."""
+        rel_id = harness.add_relation("grafana-dashboard", "grafana")
+        harness.add_relation_unit(rel_id, "grafana/0")
+        harness.charm.grafana.interface = mock.Mock()
+        return rel_id
+
+    def test_pebble_ready_handler(self):
+        """Test Pebble ready event is captured."""
+        self.assertEqual(self.harness.charm.seen_events, [])
+        test_utils.set_all_pebbles_ready(self.harness)
+        self.assertEqual(self.harness.charm.seen_events, ["PebbleReadyEvent"])
+
+    def test_all_relations(self):
+        """Test all integrations ready and okay for operator."""
+        test_utils.set_all_pebbles_ready(self.harness)
+        logging_rel_id = self.add_logging_relation(self.harness)
+        identity_ops_rel_id = self.add_identity_ops_relation(self.harness)
+        grafana_dashboard_rel_id = self.add_grafana_dashboard_relation(
+            self.harness
+        )
+
+        self.harness.update_config({"schedule": "0 0 */7 * *"})
+
+        config_files = [
+            "/etc/crontab",
+            "/usr/local/sbin/tempest-run-wrapper",
+            "/usr/local/sbin/tempest-init",
+        ]
+        for f in config_files:
+            self.check_file(charm.CONTAINER, f)
+
+        self.assertEqual(self.harness.charm.status.message(), "")
+        self.assertEqual(self.harness.charm.status.status.name, "active")
+
+        self.harness.remove_relation(logging_rel_id)
+        self.harness.remove_relation(identity_ops_rel_id)
+        self.harness.remove_relation(grafana_dashboard_rel_id)
+
+    def test_validate_action_invalid_regex(self):
+        """Test validate action with invalid regex provided."""
+        test_utils.set_all_pebbles_ready(self.harness)
+        logging_rel_id = self.add_logging_relation(self.harness)
+        identity_ops_rel_id = self.add_identity_ops_relation(self.harness)
+        grafana_dashboard_rel_id = self.add_grafana_dashboard_relation(
+            self.harness
+        )
+
+        action_event = mock.Mock()
+        action_event.params = {
+            "serial": False,
+            "regex": "test(",
+            "exclude-regex": "",
+            "test-list": "",
+        }
+        self.harness.charm._on_validate_action(action_event)
+        action_event.fail.assert_called_with(
+            "'test(' is an invalid regex: missing ), unterminated subpattern at position 4"
+        )
+
+        self.harness.remove_relation(logging_rel_id)
+        self.harness.remove_relation(identity_ops_rel_id)
+        self.harness.remove_relation(grafana_dashboard_rel_id)
+
+    def test_validate_action_invalid_list(self):
+        """Test validate action with invalid list provided."""
+        test_utils.set_all_pebbles_ready(self.harness)
+        logging_rel_id = self.add_logging_relation(self.harness)
+        identity_ops_rel_id = self.add_identity_ops_relation(self.harness)
+        grafana_dashboard_rel_id = self.add_grafana_dashboard_relation(
+            self.harness
+        )
+
+        file1 = mock.Mock()
+        file1.name = "file_1"
+        file2 = mock.Mock()
+        file2.name = "file_2"
+        self.harness.charm.pebble_handler().container.list_files = mock.Mock(
+            return_value=[file1, file2]
+        )
+
+        action_event = mock.Mock()
+        action_event.params = {
+            "serial": False,
+            "regex": "",
+            "exclude-regex": "",
+            "test-list": "nonexistent",
+        }
+        self.harness.charm._on_validate_action(action_event)
+        action_event.fail.assert_called_with(
+            "'nonexistent' is not a known test list. Please run list-tests action to view available lists."
+ ) + + self.harness.remove_relation(logging_rel_id) + self.harness.remove_relation(identity_ops_rel_id) + self.harness.remove_relation(grafana_dashboard_rel_id) + + def test_validate_action_success(self): + """Test validate action with default params.""" + test_utils.set_all_pebbles_ready(self.harness) + logging_rel_id = self.add_logging_relation(self.harness) + identity_ops_rel_id = self.add_identity_ops_relation(self.harness) + grafana_dashboard_rel_id = self.add_grafana_dashboard_relation( + self.harness + ) + + file1 = mock.Mock() + file1.name = "file_1" + file2 = mock.Mock() + file2.name = "file_2" + self.harness.charm.pebble_handler().container.list_files = mock.Mock( + return_value=[file1, file2] + ) + exec_mock = mock.Mock() + self.harness.charm.pebble_handler().execute = exec_mock + + action_event = mock.Mock() + action_event.params = { + "serial": False, + "regex": "", + "exclude-regex": "", + "test-list": "", + } + self.harness.charm._on_validate_action(action_event) + action_event.fail.assert_not_called() + exec_mock.assert_called_with( + ["tempest-run-wrapper", "--parallel"], + user="tempest", + group="tempest", + working_dir=TEMPEST_HOME, + exception_on_error=True, + environment=TEST_TEMPEST_ENV, + ) + + self.harness.remove_relation(logging_rel_id) + self.harness.remove_relation(identity_ops_rel_id) + self.harness.remove_relation(grafana_dashboard_rel_id) + + def test_validate_action_params(self): + """Test validate action with more params.""" + test_utils.set_all_pebbles_ready(self.harness) + logging_rel_id = self.add_logging_relation(self.harness) + identity_ops_rel_id = self.add_identity_ops_relation(self.harness) + grafana_dashboard_rel_id = self.add_grafana_dashboard_relation( + self.harness + ) + + file1 = mock.Mock() + file1.name = "file_1" + file2 = mock.Mock() + file2.name = "file_2" + self.harness.charm.pebble_handler().container.list_files = mock.Mock( + return_value=[file1, file2] + ) + exec_mock = mock.Mock() + self.harness.charm.pebble_handler().execute = exec_mock + + action_event = mock.Mock() + action_event.params = { + "serial": True, + "regex": "re1 re2", + "exclude-regex": "excludethis", + "test-list": "file_1", + } + self.harness.charm._on_validate_action(action_event) + action_event.fail.assert_not_called() + exec_mock.assert_called_with( + [ + "tempest-run-wrapper", + "--serial", + "--regex", + "re1 re2", + "--exclude-regex", + "excludethis", + "--load-list", + "/tempest_test_lists/file_1", + ], + user="tempest", + group="tempest", + working_dir=TEMPEST_HOME, + exception_on_error=True, + environment=TEST_TEMPEST_ENV, + ) + + self.harness.remove_relation(logging_rel_id) + self.harness.remove_relation(identity_ops_rel_id) + self.harness.remove_relation(grafana_dashboard_rel_id) + + def test_get_list_action(self): + """Test get-list action.""" + test_utils.set_all_pebbles_ready(self.harness) + logging_rel_id = self.add_logging_relation(self.harness) + identity_ops_rel_id = self.add_identity_ops_relation(self.harness) + grafana_dashboard_rel_id = self.add_grafana_dashboard_relation( + self.harness + ) + + file1 = mock.Mock() + file1.name = "file_1" + file2 = mock.Mock() + file2.name = "file_2" + self.harness.charm.pebble_handler().container.list_files = mock.Mock( + return_value=[file1, file2] + ) + + action_event = mock.Mock() + self.harness.charm._on_get_lists_action(action_event) + action_event.fail.assert_not_called() + + self.harness.remove_relation(logging_rel_id) + self.harness.remove_relation(identity_ops_rel_id) + 
self.harness.remove_relation(grafana_dashboard_rel_id) + + def test_get_list_action_not_ready(self): + """Test get-list action when pebble is not ready.""" + test_utils.set_all_pebbles_ready(self.harness) + logging_rel_id = self.add_logging_relation(self.harness) + identity_ops_rel_id = self.add_identity_ops_relation(self.harness) + grafana_dashboard_rel_id = self.add_grafana_dashboard_relation( + self.harness + ) + + file1 = mock.Mock() + file1.name = "file_1" + file2 = mock.Mock() + file2.name = "file_2" + self.harness.charm.unit.get_container(CONTAINER).can_connect = ( + mock.Mock(return_value=False) + ) + + action_event = mock.Mock() + self.harness.charm._on_get_lists_action(action_event) + action_event.fail.assert_called_with("pebble is not ready") + + self.harness.remove_relation(logging_rel_id) + self.harness.remove_relation(identity_ops_rel_id) + self.harness.remove_relation(grafana_dashboard_rel_id) diff --git a/common.sh b/common.sh index d101e8bf..b384eead 100644 --- a/common.sh +++ b/common.sh @@ -142,6 +142,12 @@ EXTERNAL_OVN_RELAY_LIBS=( "observability_libs" ) +EXTERNAL_TEMPEST_LIBS=( + "observability_libs" + "grafana_k8s" + "loki_k8s" +) + # Config template parts for each component. CONFIG_TEMPLATES_AODH=( "section-database" @@ -282,6 +288,7 @@ declare -A INTERNAL_LIBS=( [ovn-central-k8s]=${INTERNAL_OVN_CENTRAL_LIBS[@]} [ovn-relay-k8s]=${INTERNAL_OVN_CENTRAL_LIBS[@]} [placement-k8s]=${INTERNAL_KEYSTONE_LIBS[@]} + [tempest-k8s]=${INTERNAL_KEYSTONE_LIBS[@]} ) declare -A EXTERNAL_LIBS=( @@ -309,6 +316,7 @@ declare -A EXTERNAL_LIBS=( [ovn-central-k8s]=${EXTERNAL_OVN_CENTRAL_LIBS[@]} [ovn-relay-k8s]=${EXTERNAL_OVN_RELAY_LIBS[@]} [placement-k8s]=${EXTERNAL_AODH_LIBS[@]} + [tempest-k8s]=${EXTERNAL_TEMPEST_LIBS[@]} ) declare -A CONFIG_TEMPLATES=( @@ -336,6 +344,7 @@ declare -A CONFIG_TEMPLATES=( [ovn-central-k8s]=${NULL_ARRAY[@]} [ovn-relay-k8s]=${NULL_ARRAY[@]} [placement-k8s]=${CONFIG_TEMPLATES_PLACEMENT[@]} + [tempest-k8s]=${NULL_ARRAY[@]} ) diff --git a/fetch_libs.sh b/fetch_libs.sh index 9e04d4c4..7ff0b8d0 100755 --- a/fetch_libs.sh +++ b/fetch_libs.sh @@ -5,6 +5,8 @@ pushd libs/external echo "INFO: Fetching libs from charmhub." charmcraft fetch-lib charms.data_platform_libs.v0.data_interfaces charmcraft fetch-lib charms.grafana_k8s.v0.grafana_auth +charmcraft fetch-lib charms.loki_k8s.v1.loki_push_api +charmcraft fetch-lib charms.observability_libs.v0.juju_topology charmcraft fetch-lib charms.observability_libs.v1.kubernetes_service_patch charmcraft fetch-lib charms.operator_libs_linux.v2.snap charmcraft fetch-lib charms.prometheus_k8s.v0.prometheus_scrape diff --git a/libs/external/lib/charms/loki_k8s/v1/loki_push_api.py b/libs/external/lib/charms/loki_k8s/v1/loki_push_api.py new file mode 100644 index 00000000..5fd404bf --- /dev/null +++ b/libs/external/lib/charms/loki_k8s/v1/loki_push_api.py @@ -0,0 +1,2430 @@ +#!/usr/bin/env python3 +# Copyright 2023 Canonical Ltd. +# See LICENSE file for licensing details. +# +# Learn more at: https://juju.is/docs/sdk + +r"""## Overview. + +This document explains how to use the two principal objects this library provides: + +- `LokiPushApiProvider`: This object is meant to be used by any Charmed Operator that needs to +implement the provider side of the `loki_push_api` relation interface. For instance, a Loki charm. +The provider side of the relation represents the server side, to which logs are being pushed. 
+
+- `LokiPushApiConsumer`: This object is meant to be used by any Charmed Operator that needs to
+send logs to Loki by implementing the consumer side of the `loki_push_api` relation interface.
+For instance, a Promtail or Grafana agent charm which needs to send logs to Loki.
+
+- `LogProxyConsumer`: This object can be used by any Charmed Operator which needs to
+send telemetry, such as logs, to Loki through a Log Proxy by implementing the consumer side of the
+`loki_push_api` relation interface.
+
+Filtering logs in Loki is largely performed on the basis of labels. In the Juju ecosystem, Juju
+topology labels are used to uniquely identify the workload which generates telemetry like logs.
+
+In order to control the labels on the pushed logs, this object adds a Pebble layer
+that runs Promtail in the workload container, injecting Juju topology labels into the
+logs on the fly.
+
+## LokiPushApiProvider Library Usage
+
+This object may be used by any Charmed Operator which implements the `loki_push_api` interface.
+For instance, Loki or Grafana Agent.
+
+For this purpose a charm needs to instantiate the `LokiPushApiProvider` object with one mandatory
+and several optional arguments.
+
+- `charm`: A reference to the parent (Loki) charm.
+
+- `relation_name`: The name of the relation that the charm uses to interact
+  with its clients, which implement `LokiPushApiConsumer` or `LogProxyConsumer`.
+
+  If provided, this relation name must match a provided relation in metadata.yaml with the
+  `loki_push_api` interface.
+
+  The default relation name is "logging" for `LokiPushApiConsumer` and "log-proxy" for
+  `LogProxyConsumer`.
+
+  For example, a provider's `metadata.yaml` file may look as follows:
+
+  ```yaml
+  provides:
+    logging:
+      interface: loki_push_api
+  ```
+
+  Subsequently, a Loki charm may instantiate the `LokiPushApiProvider` in its constructor as
+  follows:
+
+      from charms.loki_k8s.v1.loki_push_api import LokiPushApiProvider
+      from loki_server import LokiServer
+      ...
+
+      class LokiOperatorCharm(CharmBase):
+          ...
+
+          def __init__(self, *args):
+              super().__init__(*args)
+              ...
+              external_url = urlparse(self._external_url)
+              self.loki_provider = LokiPushApiProvider(
+                  self,
+                  address=external_url.hostname or self.hostname,
+                  port=external_url.port or 80,
+                  scheme=external_url.scheme,
+                  path=f"{external_url.path}/loki/api/v1/push",
+              )
+              ...
+
+- `port`: Loki Push Api endpoint port. Default value: `3100`.
+- `scheme`: Loki Push Api endpoint scheme (`http` or `https`). Default value: `http`.
+- `address`: Loki Push Api endpoint address. Default value: `localhost`.
+- `path`: Loki Push Api endpoint path. Default value: `loki/api/v1/push`.
+
+
+The `LokiPushApiProvider` object has several responsibilities:
+
+1. Set the URL of the Loki Push API in the relation application data bag; the URL
+   must be unique to all instances (e.g. using a load balancer).
+
+2. Set the Promtail binary URL (`promtail_binary_zip_url`) so that clients using the
+   `LogProxyConsumer` object can download and configure it.
+
+3. Process the metadata of the consumer application, provided via the
+   "metadata" field of the consumer data bag, which is used to annotate the
+   alert rules (see next point). An example of "metadata" is the following:
+
+    {'model': 'loki',
+     'model_uuid': '0b7d1071-ded2-4bf5-80a3-10a81aeb1386',
+     'application': 'promtail-k8s'
+    }
+4. Process alert rules set into the relation by the `LokiPushApiConsumer`
+   objects, e.g.:
+
+    '{
+        "groups": [{
+            "name": "loki_0b7d1071-ded2-4bf5-80a3-10a81aeb1386_promtail-k8s_alerts",
+            "rules": [{
+                "alert": "HighPercentageError",
+                "expr": "sum(rate({app=\\"foo\\", env=\\"production\\"} |= \\"error\\" [5m]))
+                    by (job) \\n  /\\nsum(rate({app=\\"foo\\", env=\\"production\\"}[5m]))
+                    by (job)\\n  > 0.05
+                    \\n", "for": "10m",
+                "labels": {
+                    "severity": "page",
+                    "juju_model": "loki",
+                    "juju_model_uuid": "0b7d1071-ded2-4bf5-80a3-10a81aeb1386",
+                    "juju_application": "promtail-k8s"
+                },
+                "annotations": {
+                    "summary": "High request latency"
+                }
+            }]
+        }]
+    }'
+
+
+Once these alert rules are sent over relation data, the `LokiPushApiProvider` object
+stores these files in the directory `/loki/rules` inside the Loki charm container. After
+storing alert rules files, the object will check alert rules by querying the Loki API
+endpoint: [`loki/api/v1/rules`](https://grafana.com/docs/loki/latest/api/#list-rule-groups).
+If there are changes in the alert rules, a `loki_push_api_alert_rules_changed` event will
+be emitted with details about the `RelationEvent` which triggered it.
+
+These events should be observed in the charm that uses `LokiPushApiProvider`:
+
+```python
+    def __init__(self, *args):
+        super().__init__(*args)
+        ...
+        self.loki_provider = LokiPushApiProvider(self)
+        self.framework.observe(
+            self.loki_provider.on.loki_push_api_alert_rules_changed,
+            self._loki_push_api_alert_rules_changed,
+        )
+```
+
+
+## LokiPushApiConsumer Library Usage
+
+This Loki charm interacts with its clients using the Loki charm library. Charms
+seeking to send logs to Loki must do so using the `LokiPushApiConsumer` object from
+this charm library.
+
+> **NOTE**: `LokiPushApiConsumer` also depends on an additional charm library.
+>
+> Ensure you run `charmcraft fetch-lib charms.observability_libs.v0.juju_topology`
+> when using this library.
+
+For the simplest use cases, using the `LokiPushApiConsumer` object only requires
+instantiating it, typically in the constructor of your charm (the one which
+sends logs).
+
+```python
+from charms.loki_k8s.v1.loki_push_api import LokiPushApiConsumer
+
+class LokiClientCharm(CharmBase):
+
+    def __init__(self, *args):
+        super().__init__(*args)
+        ...
+        self._loki_consumer = LokiPushApiConsumer(self)
+```
+
+The `LokiPushApiConsumer` constructor takes two arguments:
+
+- A reference to the parent (LokiClientCharm) charm.
+
+- Optionally, the name of the relation that the Loki charm uses to interact
+  with its clients. If provided, this relation name must match a required
+  relation in metadata.yaml with the `loki_push_api` interface.
+
+  This argument is not required if your metadata.yaml has precisely one
+  required relation with the `loki_push_api` interface, as the
+  library will automatically resolve the relation name by inspecting the
+  charm's metadata.
+
+Any time the relation between a Loki provider charm and a Loki consumer charm is
+established, a `LokiPushApiEndpointJoined` event is fired.
+On the consumer side, it is possible to observe this event with:
+
+```python
+
+self.framework.observe(
+    self._loki_consumer.on.loki_push_api_endpoint_joined,
+    self._on_loki_push_api_endpoint_joined,
+)
+```
+
+Any time there are departures in relations between the consumer charm and Loki,
+the consumer charm is informed through a `LokiPushApiEndpointDeparted` event, for instance:
+
+```python
+self.framework.observe(
+    self._loki_consumer.on.loki_push_api_endpoint_departed,
+    self._on_loki_push_api_endpoint_departed,
+)
+```
+
+The consumer charm can then choose to update its configuration in both situations.
+
+Note that LokiPushApiConsumer does not add any labels automatically on its own. In
+order to better integrate with the Canonical Observability Stack, you may want to configure your
+software to add Juju topology labels. The
+[observability-libs](https://charmhub.io/observability-libs) library can be used to get topology
+labels in charm code. See :func:`LogProxyConsumer._scrape_configs` for an example of how
+to do this with promtail.
+
+## LogProxyConsumer Library Usage
+
+Let's say that we have a workload charm that produces logs, and we need to send those logs to a
+workload implementing the `loki_push_api` interface, such as `Loki` or `Grafana Agent`.
+
+Adopting this object in a Charmed Operator consists of two steps:
+
+1. Use the `LogProxyConsumer` class by instantiating it in the `__init__` method of the charmed
+   operator. There are two ways to get logs into promtail. You can give it a list of files to
+   read, or you can write to it using the syslog protocol.
+
+   For example:
+
+   ```python
+   from charms.loki_k8s.v1.loki_push_api import LogProxyConsumer
+
+   ...
+
+       def __init__(self, *args):
+           ...
+           self._log_proxy = LogProxyConsumer(
+               self,
+               logs_scheme={
+                   "workload-a": {
+                       "log-files": ["/tmp/workload-a-1.log", "/tmp/workload-a-2.log"],
+                       "syslog-port": 1514,
+                   },
+                   "workload-b": {"log-files": ["/tmp/workload-b.log"], "syslog-port": 1515},
+               },
+               relation_name="log-proxy",
+           )
+           self.framework.observe(
+               self._log_proxy.on.promtail_digest_error,
+               self._promtail_error,
+           )
+
+       def _promtail_error(self, event):
+           logger.error(event.message)
+           self.unit.status = BlockedStatus(event.message)
+   ```
+
+   Any time the relation between a provider charm and a LogProxy consumer charm is
+   established, a `LogProxyEndpointJoined` event is fired. On the consumer side, it is
+   possible to observe this event with:
+
+   ```python
+
+   self.framework.observe(
+       self._log_proxy.on.log_proxy_endpoint_joined,
+       self._on_log_proxy_endpoint_joined,
+   )
+   ```
+
+   Any time there are departures in relations between the consumer charm and the provider,
+   the consumer charm is informed through a `LogProxyEndpointDeparted` event, for instance:
+
+   ```python
+   self.framework.observe(
+       self._log_proxy.on.log_proxy_endpoint_departed,
+       self._on_log_proxy_endpoint_departed,
+   )
+   ```
+
+   The consumer charm can then choose to update its configuration in both situations.
+
+   Note that:
+
+   - You can configure your syslog software using `localhost` as the address and the method
+     `LogProxyConsumer.syslog_port("container_name")` to get the port, or, alternatively,
+     if you are using rsyslog you may use the method
+     `LogProxyConsumer.rsyslog_config("container_name")`.
+2. Modify the `metadata.yaml` file to add:
+
+   - The `log-proxy` relation in the `requires` section:
+     ```yaml
+     requires:
+       log-proxy:
+         interface: loki_push_api
+         optional: true
+     ```
+
+Once the library is implemented in a Charmed Operator and a relation is established with
+the charm that implements the `loki_push_api` interface, the library will inject a
+Pebble layer that runs Promtail in the workload container to send logs.
+
+By default, the promtail binary injected into the container will be downloaded from the internet.
+If, for any reason, the container has limited network access, you may allow charm administrators
+to provide their own promtail binary at runtime by adding the following snippet to your charm
+metadata:
+
+```yaml
+resources:
+  promtail-bin:
+    type: file
+    description: Promtail binary for logging
+    filename: promtail-linux
+```
+
+Which would then allow operators to deploy the charm this way:
+
+```
+juju deploy \
+    ./your_charm.charm \
+    --resource promtail-bin=/tmp/promtail-linux-amd64
+```
+
+If a different resource name is used, it can be specified with the `promtail_resource_name`
+argument to the `LogProxyConsumer` constructor.
+
+The object can emit a `PromtailDigestError` event when:
+
+- The promtail binary cannot be downloaded.
+- The sha256 checksum of the promtail binary does not match.
+
+The object can raise a `ContainerNotFoundError` exception when:
+
+- No `container_name` parameter has been specified and the Pod has more than 1 container.
+
+`PromtailDigestError` events can be monitored via:
+
+```python
+    self.framework.observe(
+        self._loki_consumer.on.promtail_digest_error,
+        self._promtail_error,
+    )
+
+    def _promtail_error(self, event):
+        logger.error(event.message)
+        self.unit.status = BlockedStatus(event.message)
+```
+
+## Alerting Rules
+
+This charm library also supports gathering alerting rules from all related Loki client
+charms and enabling corresponding alerts within the Loki charm. Alert rules are
+automatically gathered by the `LokiPushApiConsumer` object from a directory conventionally
+named `loki_alert_rules`.
+
+This directory must reside at the top level in the `src` folder of the
+consumer charm. Each file in this directory is assumed to be a single alert rule
+in YAML format. The file name must have the `.rule` extension.
+The format of this alert rule conforms to the
+[Loki docs](https://grafana.com/docs/loki/latest/rules/#alerting-rules).
+
+An example of the contents of one such file is shown below.
+
+```yaml
+alert: HighPercentageError
+expr: |
+  sum(rate({%%juju_topology%%} |= "error" [5m])) by (job)
+    /
+  sum(rate({%%juju_topology%%}[5m])) by (job)
+    > 0.05
+for: 10m
+labels:
+    severity: page
+annotations:
+    summary: High request latency
+```
+
+It is **critical** to use the `%%juju_topology%%` filter in the expression for the alert
+rule shown above. This filter is a stub that is automatically replaced by
+`LokiPushApiConsumer` with the Loki client's Juju topology (application, model, and model
+UUID). Such a topology filter is essential to ensure that alert rules submitted by one
+provider charm generate alerts only for that same charm.
+
+The Loki charm may be related to multiple Loki client charms. Without this filter,
+rules submitted by one provider charm would also result in corresponding alerts for other
+provider charms. Hence, every alert rule expression must include such a topology filter stub.
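+
+As an illustration of what this substitution amounts to, the following is a minimal
+sketch. It is not the library's actual implementation (which delegates to
+`CosTool.inject_label_matchers`); the `inject_topology` helper and the hard-coded
+topology dict below are assumptions made purely for this example:
+
+```python
+def inject_topology(expr: str, topology: dict) -> str:
+    """Replace the %%juju_topology%% stub with LogQL label matchers (example only)."""
+    matchers = ", ".join('{}="{}"'.format(key, value) for key, value in topology.items())
+    return expr.replace("%%juju_topology%%", matchers)
+
+expr = 'sum(rate({%%juju_topology%%} |= "error" [5m])) by (job)'
+topology = {
+    "juju_model": "loki",
+    "juju_model_uuid": "0b7d1071-ded2-4bf5-80a3-10a81aeb1386",
+    "juju_application": "promtail-k8s",
+}
+# Prints the expression with the stub expanded into topology label matchers, e.g.:
+#   sum(rate({juju_model="loki", juju_model_uuid="0b7d1071-...",
+#   juju_application="promtail-k8s"} |= "error" [5m])) by (job)
+print(inject_topology(expr, topology))
+```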
+
+Gathering alert rules and generating rule files within the Loki charm is easily done using
+the `alerts()` method of `LokiPushApiProvider`. Alerts generated by Loki will automatically
+include Juju topology labels. These labels indicate the source of the alert.
+
+The following labels are automatically added to every alert:
+
+- `juju_model`
+- `juju_model_uuid`
+- `juju_application`
+
+
+If an alert rules file does not contain the keys `alert` or `expr`, or there is no alert
+rules file in `alert_rules_path`, a `loki_push_api_alert_rules_error` event is emitted.
+
+To handle these situations, the event must be observed in the `LokiClientCharm` charm.py file:
+
+```python
+class LokiClientCharm(CharmBase):
+
+    def __init__(self, *args):
+        super().__init__(*args)
+        ...
+        self._loki_consumer = LokiPushApiConsumer(self)
+
+        self.framework.observe(
+            self._loki_consumer.on.loki_push_api_alert_rules_error,
+            self._alert_rules_error
+        )
+
+    def _alert_rules_error(self, event):
+        self.unit.status = BlockedStatus(event.message)
+```
+
+## Relation Data
+
+The Loki charm uses both application and unit relation data to obtain information regarding
+the Loki Push API and alert rules.
+
+Units of the consumer charm send their alert rules over app relation data using the
+`alert_rules` key.
+"""
+
+import json
+import logging
+import os
+import platform
+import re
+import socket
+import subprocess
+import tempfile
+import typing
+from copy import deepcopy
+from gzip import GzipFile
+from hashlib import sha256
+from io import BytesIO
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, Union
+from urllib import request
+from urllib.error import HTTPError
+
+import yaml
+from charms.observability_libs.v0.juju_topology import JujuTopology
+from ops.charm import (
+    CharmBase,
+    HookEvent,
+    RelationBrokenEvent,
+    RelationCreatedEvent,
+    RelationDepartedEvent,
+    RelationEvent,
+    RelationJoinedEvent,
+    RelationRole,
+    WorkloadEvent,
+)
+from ops.framework import EventBase, EventSource, Object, ObjectEvents
+from ops.model import Container, ModelError, Relation
+from ops.pebble import APIError, ChangeError, Layer, PathError, ProtocolError
+
+# The unique Charmhub library identifier, never change it
+LIBID = "bf76f23cdd03464b877c52bd1d2f563e"
+
+# Increment this major API version when introducing breaking changes
+LIBAPI = 1
+
+# Increment this PATCH version before using `charmcraft publish-lib` or reset
+# to 0 if you are raising the major API version
+LIBPATCH = 0
+
+logger = logging.getLogger(__name__)
+
+RELATION_INTERFACE_NAME = "loki_push_api"
+DEFAULT_RELATION_NAME = "logging"
+DEFAULT_ALERT_RULES_RELATIVE_PATH = "./src/loki_alert_rules"
+DEFAULT_LOG_PROXY_RELATION_NAME = "log-proxy"
+
+PROMTAIL_BASE_URL = "https://github.com/canonical/loki-k8s-operator/releases/download"
+# To update the Promtail version you only need to change PROMTAIL_VERSION and
+# update all sha256 sums in PROMTAIL_BINARIES. To support a new architecture
+# you only need to add a new key value pair for the architecture in PROMTAIL_BINARIES.
+PROMTAIL_VERSION = "v2.5.0"
+PROMTAIL_BINARIES = {
+    "amd64": {
+        "filename": "promtail-static-amd64",
+        "zipsha": "543e333b0184e14015a42c3c9e9e66d2464aaa66eca48b29e185a6a18f67ab6d",
+        "binsha": "17e2e271e65f793a9fbe81eab887b941e9d680abe82d5a0602888c50f5e0cac9",
+    },
+}
+
+# Paths in `charm` container
+BINARY_DIR = "/tmp"
+
+# Paths in `workload` container
+WORKLOAD_BINARY_DIR = "/opt/promtail"
+WORKLOAD_CONFIG_DIR = "/etc/promtail"
+WORKLOAD_CONFIG_FILE_NAME = "promtail_config.yaml"
+WORKLOAD_CONFIG_PATH = "{}/{}".format(WORKLOAD_CONFIG_DIR, WORKLOAD_CONFIG_FILE_NAME)
+WORKLOAD_POSITIONS_PATH = "{}/positions.yaml".format(WORKLOAD_BINARY_DIR)
+WORKLOAD_SERVICE_NAME = "promtail"
+
+# These are the initial port values. As we can have more than one container,
+# we use odd and even numbers to avoid collisions.
+# Each new container adds 2 to the previous value.
+HTTP_LISTEN_PORT_START = 9080  # even start port
+GRPC_LISTEN_PORT_START = 9095  # odd start port
+
+
+class RelationNotFoundError(ValueError):
+    """Raised if there is no relation with the given name."""
+
+    def __init__(self, relation_name: str):
+        self.relation_name = relation_name
+        self.message = "No relation named '{}' found".format(relation_name)
+
+        super().__init__(self.message)
+
+
+class RelationInterfaceMismatchError(Exception):
+    """Raised if the relation with the given name has a different interface."""
+
+    def __init__(
+        self,
+        relation_name: str,
+        expected_relation_interface: str,
+        actual_relation_interface: str,
+    ):
+        self.relation_name = relation_name
+        self.expected_relation_interface = expected_relation_interface
+        self.actual_relation_interface = actual_relation_interface
+        self.message = (
+            "The '{}' relation has '{}' as interface rather than the expected '{}'".format(
+                relation_name, actual_relation_interface, expected_relation_interface
+            )
+        )
+        super().__init__(self.message)
+
+
+class RelationRoleMismatchError(Exception):
+    """Raised if the relation with the given name has a different direction."""
+
+    def __init__(
+        self,
+        relation_name: str,
+        expected_relation_role: RelationRole,
+        actual_relation_role: RelationRole,
+    ):
+        self.relation_name = relation_name
+        self.expected_relation_role = expected_relation_role
+        self.actual_relation_role = actual_relation_role
+        self.message = "The '{}' relation has role '{}' rather than the expected '{}'".format(
+            relation_name, repr(actual_relation_role), repr(expected_relation_role)
+        )
+        super().__init__(self.message)
+
+
+def _validate_relation_by_interface_and_direction(
+    charm: CharmBase,
+    relation_name: str,
+    expected_relation_interface: str,
+    expected_relation_role: RelationRole,
+):
+    """Verify that a relation has the necessary characteristics.
+
+    Verifies that the `relation_name` provided: (1) exists in metadata.yaml,
+    (2) declares as interface the interface name passed as
+    `expected_relation_interface`, and (3) has the right "direction", i.e., it
+    is a relation that `charm` provides or requires.
+
+    Args:
+        charm: a `CharmBase` object to scan for the matching relation.
+        relation_name: the name of the relation to be verified.
+        expected_relation_interface: the interface name to be matched by the
+            relation named `relation_name`.
+        expected_relation_role: whether the `relation_name` must be either
+            provided or required by `charm`.
+
+    Raises:
+        RelationNotFoundError: If there is no relation in the charm's metadata.yaml
+            with the same name as provided via `relation_name` argument.
+        RelationInterfaceMismatchError: The relation with the same name as provided
+            via `relation_name` argument does not have the same relation interface
+            as specified via the `expected_relation_interface` argument.
+        RelationRoleMismatchError: If the relation with the same name as provided
+            via `relation_name` argument does not have the same role as specified
+            via the `expected_relation_role` argument.
+    """
+    if relation_name not in charm.meta.relations:
+        raise RelationNotFoundError(relation_name)
+
+    relation = charm.meta.relations[relation_name]
+
+    actual_relation_interface = relation.interface_name
+    if actual_relation_interface != expected_relation_interface:
+        raise RelationInterfaceMismatchError(
+            relation_name,
+            expected_relation_interface,
+            actual_relation_interface,  # pyright: ignore
+        )
+
+    if expected_relation_role == RelationRole.provides:
+        if relation_name not in charm.meta.provides:
+            raise RelationRoleMismatchError(
+                relation_name, RelationRole.provides, RelationRole.requires
+            )
+    elif expected_relation_role == RelationRole.requires:
+        if relation_name not in charm.meta.requires:
+            raise RelationRoleMismatchError(
+                relation_name, RelationRole.requires, RelationRole.provides
+            )
+    else:
+        raise Exception("Unexpected RelationDirection: {}".format(expected_relation_role))
+
+
+class InvalidAlertRulePathError(Exception):
+    """Raised if the alert rules folder cannot be found or is otherwise invalid."""
+
+    def __init__(
+        self,
+        alert_rules_absolute_path: Path,
+        message: str,
+    ):
+        self.alert_rules_absolute_path = alert_rules_absolute_path
+        self.message = message
+
+        super().__init__(self.message)
+
+
+def _is_official_alert_rule_format(rules_dict: dict) -> bool:
+    """Check if alert rules are in the upstream format supported by Loki.
+
+    Alert rules in dictionary format are in "official" form if they
+    contain a "groups" key, since this implies they contain a list of
+    alert rule groups.
+
+    Args:
+        rules_dict: a set of alert rules in Python dictionary format
+
+    Returns:
+        True if alert rules are in official Loki file format.
+    """
+    return "groups" in rules_dict
+
+
+def _is_single_alert_rule_format(rules_dict: dict) -> bool:
+    """Check if alert rules are in single rule format.
+
+    The Loki charm library supports reading of alert rules in a
+    custom format that consists of a single alert rule per file. This
+    does not conform to the official Loki alert rule file format
+    which requires that each alert rules file consists of a list of
+    alert rule groups and each group consists of a list of alert
+    rules.
+
+    Alert rules in dictionary form are considered to be in single rule
+    format if, at a minimum, they contain the two keys corresponding to the
+    alert rule name and alert expression.
+
+    Returns:
+        True if the alert rule is in single rule file format.
+    """
+    # one alert rule per file
+    return set(rules_dict) >= {"alert", "expr"}
+
+
+class AlertRules:
+    """Utility class for amalgamating Loki alert rule files and injecting juju topology.
+
+    An `AlertRules` object supports aggregating alert rules from files and directories in both
+    official and single rule file formats using the `add_path()` method. All the alert rules
+    read are annotated with Juju topology labels and amalgamated into a single data structure
+    in the form of a Python dictionary using the `as_dict()` method. Such a dictionary can be
+    easily dumped into JSON format and exchanged over relation data. The dictionary can also
+    be dumped into YAML format and written directly into an alert rules file that is read by
+    Loki. Note that multiple `AlertRules` objects must not be written into the same file,
+    since Loki allows only a single list of alert rule groups per alert rules file.
+
+    The official Loki format is a YAML file conforming to the Loki documentation
+    (https://grafana.com/docs/loki/latest/api/#list-rule-groups).
+    The custom single rule format is a subsection of the official YAML, having a single alert
+    rule, effectively "one alert per file".
+    """
+
+    # This class uses the following terminology for the various parts of a rule file:
+    # - alert rules file: the entire groups[] yaml, including the "groups:" key.
+    # - alert groups (plural): the list of groups[] (a list, i.e. no "groups:" key) - it is a list
+    #   of dictionaries that have the "name" and "rules" keys.
+    # - alert group (singular): a single dictionary that has the "name" and "rules" keys.
+    # - alert rules (plural): all the alerts in a given alert group - a list of dictionaries with
+    #   the "alert" and "expr" keys.
+    # - alert rule (singular): a single dictionary that has the "alert" and "expr" keys.
+
+    def __init__(self, topology: Optional[JujuTopology] = None):
+        """Build an alert rule object.
+
+        Args:
+            topology: a `JujuTopology` instance that is used to annotate all alert rules.
+        """
+        self.topology = topology
+        self.tool = CosTool(None)
+        self.alert_groups = []  # type: List[dict]
+
+    def _from_file(self, root_path: Path, file_path: Path) -> List[dict]:
+        """Read a rules file from path, injecting juju topology.
+
+        Args:
+            root_path: full path to the root rules folder (used only for generating group name)
+            file_path: full path to a *.rule file.
+
+        Returns:
+            A list of dictionaries representing the rules file, if the file is valid (the
+            structure is formed by `yaml.safe_load` of the file); an empty list otherwise.
+ """ + with file_path.open() as rf: + # Load a list of rules from file then add labels and filters + try: + rule_file = yaml.safe_load(rf) or {} + + except Exception as e: + logger.error("Failed to read alert rules from %s: %s", file_path.name, e) + return [] + + if _is_official_alert_rule_format(rule_file): + alert_groups = rule_file["groups"] + elif _is_single_alert_rule_format(rule_file): + # convert to list of alert groups + # group name is made up from the file name + alert_groups = [{"name": file_path.stem, "rules": [rule_file]}] + else: + # invalid/unsupported + reason = "file is empty" if not rule_file else "unexpected file structure" + logger.error("Invalid rules file (%s): %s", reason, file_path.name) + return [] + + # update rules with additional metadata + for alert_group in alert_groups: + # update group name with topology and sub-path + alert_group["name"] = self._group_name( + str(root_path), + str(file_path), + alert_group["name"], + ) + + # add "juju_" topology labels + for alert_rule in alert_group["rules"]: + if "labels" not in alert_rule: + alert_rule["labels"] = {} + + if self.topology: + alert_rule["labels"].update(self.topology.label_matcher_dict) + # insert juju topology filters into a prometheus alert rule + # logql doesn't like empty matchers, so add a job matcher which hits + # any string as a "wildcard" which the topology labels will + # filter down + alert_rule["expr"] = self.tool.inject_label_matchers( + re.sub(r"%%juju_topology%%", r'job=~".+"', alert_rule["expr"]), + self.topology.label_matcher_dict, + ) + + return alert_groups + + def _group_name( + self, + root_path: typing.Union[Path, str], + file_path: typing.Union[Path, str], + group_name: str, + ) -> str: + """Generate group name from path and topology. + + The group name is made up of the relative path between the root dir_path, the file path, + and topology identifier. + + Args: + root_path: path to the root rules dir. + file_path: path to rule file. + group_name: original group name to keep as part of the new augmented group name + + Returns: + New group name, augmented by juju topology and relative path. + """ + file_path = Path(file_path) if not isinstance(file_path, Path) else file_path + root_path = Path(root_path) if not isinstance(root_path, Path) else root_path + rel_path = file_path.parent.relative_to(root_path.as_posix()) + + # We should account for both absolute paths and Windows paths. Convert it to a POSIX + # string, strip off any leading /, then join it + + path_str = "" + if not rel_path == Path("."): + # Get rid of leading / and optionally drive letters so they don't muck up + # the template later, since Path.parts returns them. The 'if relpath.is_absolute ...' + # isn't even needed since re.sub doesn't throw exceptions if it doesn't match, so it's + # optional, but it makes it clear what we're doing. + + # Note that Path doesn't actually care whether the path is valid just to instantiate + # the object, so we can happily strip that stuff out to make templating nicer + rel_path = Path( + re.sub(r"^([A-Za-z]+:)?/", "", rel_path.as_posix()) + if rel_path.is_absolute() + else str(rel_path) + ) + + # Get rid of relative path characters in the middle which both os.path and pathlib + # leave hanging around. 
We could use path.resolve(), but that would lead to very + # long template strings when rules come from pods and/or other deeply nested charm + # paths + path_str = "_".join(filter(lambda x: x not in ["..", "/"], rel_path.parts)) + + # Generate group name: + # - name, from juju topology + # - suffix, from the relative path of the rule file; + group_name_parts = [self.topology.identifier] if self.topology else [] + group_name_parts.extend([path_str, group_name, "alerts"]) + # filter to remove empty strings + return "_".join(filter(lambda x: x, group_name_parts)) + + @classmethod + def _multi_suffix_glob( + cls, dir_path: Path, suffixes: List[str], recursive: bool = True + ) -> list: + """Helper function for getting all files in a directory that have a matching suffix. + + Args: + dir_path: path to the directory to glob from. + suffixes: list of suffixes to include in the glob (items should begin with a period). + recursive: a flag indicating whether a glob is recursive (nested) or not. + + Returns: + List of files in `dir_path` that have one of the suffixes specified in `suffixes`. + """ + all_files_in_dir = dir_path.glob("**/*" if recursive else "*") + return list(filter(lambda f: f.is_file() and f.suffix in suffixes, all_files_in_dir)) + + def _from_dir(self, dir_path: Path, recursive: bool) -> List[dict]: + """Read all rule files in a directory. + + All rules from files for the same directory are loaded into a single + group. The generated name of this group includes juju topology. + By default, only the top directory is scanned; for nested scanning, pass `recursive=True`. + + Args: + dir_path: directory containing *.rule files (alert rules without groups). + recursive: flag indicating whether to scan for rule files recursively. + + Returns: + a list of dictionaries representing prometheus alert rule groups, each dictionary + representing an alert group (structure determined by `yaml.safe_load`). + """ + alert_groups = [] # type: List[dict] + + # Gather all alerts into a list of groups + for file_path in self._multi_suffix_glob(dir_path, [".rule", ".rules"], recursive): + alert_groups_from_file = self._from_file(dir_path, file_path) + if alert_groups_from_file: + logger.debug("Reading alert rule from %s", file_path) + alert_groups.extend(alert_groups_from_file) + + return alert_groups + + def add_path(self, path_str: str, *, recursive: bool = False): + """Add rules from a dir path. + + All rules from files are aggregated into a data structure representing a single rule file. + All group names are augmented with juju topology. + + Args: + path_str: either a rules file or a dir of rules files. + recursive: whether to read files recursively or not (no impact if `path` is a file). + + Raises: + InvalidAlertRulePathError: if the provided path is invalid. + """ + path = Path(path_str) # type: Path + if path.is_dir(): + self.alert_groups.extend(self._from_dir(path, recursive)) + elif path.is_file(): + self.alert_groups.extend(self._from_file(path.parent, path)) + else: + logger.debug("The alerts file does not exist: %s", path) + + def as_dict(self) -> dict: + """Return standard alert rules file in dict representation. + + Returns: + a dictionary containing a single list of alert rule groups. + The list of alert rule groups is provided as value of the + "groups" dictionary key. 
+        """
+        return {"groups": self.alert_groups} if self.alert_groups else {}
+
+
+def _resolve_dir_against_charm_path(charm: CharmBase, *path_elements: str) -> str:
+    """Resolve the provided path items against the directory of the main file.
+
+    Look up the directory of the `main.py` file being executed. This is normally
+    going to be the charm.py file of the charm including this library. Then, resolve
+    the provided path elements and, if the resulting path exists and is a directory,
+    return its absolute path; otherwise, raise an exception.
+
+    Raises:
+        InvalidAlertRulePathError, if the path does not exist or is not a directory.
+    """
+    charm_dir = Path(str(charm.charm_dir))
+    if not charm_dir.exists() or not charm_dir.is_dir():
+        # Operator Framework does not currently expose a robust
+        # way to determine the top level charm source directory
+        # that is consistent across deployed charms and unit tests
+        # Hence for unit tests the current working directory is used
+        # TODO: update this logic when the following ticket is resolved
+        # https://github.com/canonical/operator/issues/643
+        charm_dir = Path(os.getcwd())
+
+    alerts_dir_path = charm_dir.absolute().joinpath(*path_elements)
+
+    if not alerts_dir_path.exists():
+        raise InvalidAlertRulePathError(alerts_dir_path, "directory does not exist")
+    if not alerts_dir_path.is_dir():
+        raise InvalidAlertRulePathError(alerts_dir_path, "is not a directory")
+
+    return str(alerts_dir_path)
+
+
+class NoRelationWithInterfaceFoundError(Exception):
+    """No relations with the given interface are found in the charm meta."""
+
+    def __init__(self, charm: CharmBase, relation_interface: Optional[str] = None):
+        self.charm = charm
+        self.relation_interface = relation_interface
+        self.message = (
+            "No relations with interface '{}' found in the meta of the '{}' charm".format(
+                relation_interface, charm.meta.name
+            )
+        )
+
+        super().__init__(self.message)
+
+
+class MultipleRelationsWithInterfaceFoundError(Exception):
+    """Multiple relations with the given interface are found in the charm meta."""
+
+    def __init__(self, charm: CharmBase, relation_interface: str, relations: list):
+        self.charm = charm
+        self.relation_interface = relation_interface
+        self.relations = relations
+        self.message = (
+            "Multiple relations with interface '{}' found in the meta of the '{}' charm.".format(
+                relation_interface, charm.meta.name
+            )
+        )
+        super().__init__(self.message)
+
+
+class LokiPushApiEndpointDeparted(EventBase):
+    """Event emitted when Loki departed."""
+
+
+class LokiPushApiEndpointJoined(EventBase):
+    """Event emitted when Loki joined."""
+
+
+class LokiPushApiAlertRulesChanged(EventBase):
+    """Event emitted if there is a change in the alert rules."""
+
+    def __init__(self, handle, relation, relation_id, app=None, unit=None):
+        """Pretend we are almost like a RelationEvent.
+
+        Fields to serialize:
+            {
+                "relation_name": ,
+                "relation_id": ,
+                "app_name": ,
+                "unit_name":
+            }
+
+        In this way, we can transparently use `RelationEvent.snapshot()` to pass
+        it back if we need to log it.
+ """ + super().__init__(handle) + self.relation = relation + self.relation_id = relation_id + self.app = app + self.unit = unit + + def snapshot(self) -> Dict: + """Save event information.""" + if not self.relation: + return {} + snapshot = {"relation_name": self.relation.name, "relation_id": self.relation.id} + if self.app: + snapshot["app_name"] = self.app.name + if self.unit: + snapshot["unit_name"] = self.unit.name + return snapshot + + def restore(self, snapshot: dict): + """Restore event information.""" + self.relation = self.framework.model.get_relation( + snapshot["relation_name"], snapshot["relation_id"] + ) + app_name = snapshot.get("app_name") + if app_name: + self.app = self.framework.model.get_app(app_name) + else: + self.app = None + unit_name = snapshot.get("unit_name") + if unit_name: + self.unit = self.framework.model.get_unit(unit_name) + else: + self.unit = None + + +class InvalidAlertRuleEvent(EventBase): + """Event emitted when alert rule files are not parsable. + + Enables us to set a clear status on the provider. + """ + + def __init__(self, handle, errors: str = "", valid: bool = False): + super().__init__(handle) + self.errors = errors + self.valid = valid + + def snapshot(self) -> Dict: + """Save alert rule information.""" + return { + "valid": self.valid, + "errors": self.errors, + } + + def restore(self, snapshot): + """Restore alert rule information.""" + self.valid = snapshot["valid"] + self.errors = snapshot["errors"] + + +class LokiPushApiEvents(ObjectEvents): + """Event descriptor for events raised by `LokiPushApiProvider`.""" + + loki_push_api_endpoint_departed = EventSource(LokiPushApiEndpointDeparted) + loki_push_api_endpoint_joined = EventSource(LokiPushApiEndpointJoined) + loki_push_api_alert_rules_changed = EventSource(LokiPushApiAlertRulesChanged) + alert_rule_status_changed = EventSource(InvalidAlertRuleEvent) + + +class LokiPushApiProvider(Object): + """A LokiPushApiProvider class.""" + + on = LokiPushApiEvents() # pyright: ignore + + def __init__( + self, + charm, + relation_name: str = DEFAULT_RELATION_NAME, + *, + port: Union[str, int] = 3100, + scheme: str = "http", + address: str = "localhost", + path: str = "loki/api/v1/push", + ): + """A Loki service provider. + + Args: + charm: a `CharmBase` instance that manages this + instance of the Loki service. + relation_name: an optional string name of the relation between `charm` + and the Loki charmed service. The default is "logging". + It is strongly advised not to change the default, so that people + deploying your charm will have a consistent experience with all + other charms that consume metrics endpoints. + port: an optional port of the Loki service (default is "3100"). + scheme: an optional scheme of the Loki API URL (default is "http"). + address: an optional address of the Loki service (default is "localhost"). + path: an optional path of the Loki API URL (default is "loki/api/v1/push") + + Raises: + RelationNotFoundError: If there is no relation in the charm's metadata.yaml + with the same name as provided via `relation_name` argument. + RelationInterfaceMismatchError: The relation with the same name as provided + via `relation_name` argument does not have the `loki_push_api` relation + interface. + RelationRoleMismatchError: If the relation with the same name as provided + via `relation_name` argument does not have the `RelationRole.requires` + role. 
+        """
+        _validate_relation_by_interface_and_direction(
+            charm, relation_name, RELATION_INTERFACE_NAME, RelationRole.provides
+        )
+        super().__init__(charm, relation_name)
+        self._charm = charm
+        self._relation_name = relation_name
+        self._tool = CosTool(self)
+        self.port = int(port)
+        self.scheme = scheme
+        self.address = address
+        self.path = path
+
+        events = self._charm.on[relation_name]
+        self.framework.observe(self._charm.on.upgrade_charm, self._on_lifecycle_event)
+        self.framework.observe(events.relation_joined, self._on_logging_relation_joined)
+        self.framework.observe(events.relation_changed, self._on_logging_relation_changed)
+        self.framework.observe(events.relation_departed, self._on_logging_relation_departed)
+        self.framework.observe(events.relation_broken, self._on_logging_relation_broken)
+
+    def _on_lifecycle_event(self, _):
+        # Upgrade event or other charm-level event
+        should_update = False
+        for relation in self._charm.model.relations[self._relation_name]:
+            # Don't accidentally flip a True result back.
+            should_update = should_update or self._process_logging_relation_changed(relation)
+        if should_update:
+            # We don't have a RelationEvent, so build it up by hand
+            first_rel = self._charm.model.relations[self._relation_name][0]
+            self.on.loki_push_api_alert_rules_changed.emit(
+                relation=first_rel,
+                relation_id=first_rel.id,
+            )
+
+    def _on_logging_relation_joined(self, event: RelationJoinedEvent):
+        """Set basic data on relation joins.
+
+        Set the promtail binary URL, which will not change, and anything
+        else which may be required but is static.
+
+        Args:
+            event: a `CharmEvent` in response to which the consumer
+                charm must set its relation data.
+        """
+        if self._charm.unit.is_leader():
+            event.relation.data[self._charm.app].update(self._promtail_binary_url)
+            logger.debug("Saved promtail binary url: %s", self._promtail_binary_url)
+
+    def _on_logging_relation_changed(self, event: HookEvent):
+        """Handle changes in related consumers.
+
+        Handle any changes in the relation between Loki
+        and its consumer charms.
+
+        Args:
+            event: a `CharmEvent` in response to which the consumer
+                charm must update its relation data.
+        """
+        should_update = self._process_logging_relation_changed(event.relation)  # pyright: ignore
+        if should_update:
+            self.on.loki_push_api_alert_rules_changed.emit(
+                relation=event.relation,  # pyright: ignore
+                relation_id=event.relation.id,  # pyright: ignore
+                app=self._charm.app,
+                unit=self._charm.unit,
+            )
+
+    def _on_logging_relation_broken(self, event: RelationBrokenEvent):
+        """Remove alert rules files when a consumer charm leaves the relation with Loki.
+
+        Args:
+            event: a `CharmEvent` in response to which the Loki
+                charm must update its relation data.
+        """
+        self.on.loki_push_api_alert_rules_changed.emit(
+            relation=event.relation,
+            relation_id=event.relation.id,
+            app=self._charm.app,
+            unit=self._charm.unit,
+        )
+
+    def _on_logging_relation_departed(self, event: RelationDepartedEvent):
+        """Remove alert rules files when a consumer charm leaves the relation with Loki.
+
+        Args:
+            event: a `CharmEvent` in response to which the Loki
+                charm must update its relation data.
+        """
+        self.on.loki_push_api_alert_rules_changed.emit(
+            relation=event.relation,
+            relation_id=event.relation.id,
+            app=self._charm.app,
+            unit=self._charm.unit,
+        )
+
+    def _should_update_alert_rules(self, relation) -> bool:
+        """Determine whether alert rules should be regenerated.
+
+        If there are alert rules in the relation data bag, tell the charm
+        whether to regenerate them based on the boolean returned here.
+        """
+        if relation.data.get(relation.app).get("alert_rules", None) is not None:
+            return True
+        return False
+
+    def _process_logging_relation_changed(self, relation: Relation) -> bool:
+        """Handle changes in related consumers.
+
+        Anytime there are changes in relations between Loki and its consumer
+        charms, Loki sets the `loki_push_api` data in the relation. It sets
+        the endpoint appropriately and, if there are alert rules present in
+        the relation, lets the caller know. Loki also generates alert rules
+        files based on what the consumer charms forward.
+
+        Args:
+            relation: the `Relation` instance to update.
+
+        Returns:
+            A boolean indicating whether an event should be emitted, so we
+            only emit one on lifecycle events.
+        """
+        relation.data[self._charm.unit]["public_address"] = socket.getfqdn() or ""
+        self.update_endpoint(relation=relation)
+        return self._should_update_alert_rules(relation)
+
+    @property
+    def _promtail_binary_url(self) -> dict:
+        """URL from which the Promtail binary can be downloaded."""
+        # construct promtail binary url paths from parts
+        promtail_binaries = {}
+        for arch, info in PROMTAIL_BINARIES.items():
+            info["url"] = "{}/promtail-{}/{}.gz".format(
+                PROMTAIL_BASE_URL, PROMTAIL_VERSION, info["filename"]
+            )
+            promtail_binaries[arch] = info
+
+        return {"promtail_binary_zip_url": json.dumps(promtail_binaries)}
+
+    def update_endpoint(self, url: str = "", relation: Optional[Relation] = None) -> None:
+        """Programmatically trigger an update of the endpoint in unit relation data.
+
+        This method should be used when the charm relying on this library needs
+        to update the relation data in response to something occurring outside
+        the `logging` relation lifecycle, e.g., in case of a
+        host address change because the charmed operator becomes connected to an
+        Ingress after the `logging` relation is established.
+
+        Args:
+            url: An optional url value to update relation data.
+            relation: An optional instance of `class:ops.model.Relation` to update.
+        """
+        # if no relation is specified update all of them
+        if not relation:
+            if not self._charm.model.relations.get(self._relation_name):
+                return
+
+            relations_list = self._charm.model.relations.get(self._relation_name)
+        else:
+            relations_list = [relation]
+
+        endpoint = self._endpoint(url or self._url)
+
+        for relation in relations_list:
+            relation.data[self._charm.unit].update({"endpoint": json.dumps(endpoint)})
+
+        logger.debug("Saved endpoint in unit relation data")
+
+    @property
+    def _url(self) -> str:
+        """Get the local Loki Push API url.
+
+        Return the URL to Loki, including the port number, but without the endpoint subpath.
+        """
+        return "http://{}:{}".format(socket.getfqdn(), self.port)
+
+    def _endpoint(self, url) -> dict:
+        """Get the Loki push API endpoint for a given url.
+
+        Args:
+            url: A loki unit URL.
+
+        Returns:
+            A dict with the Loki Push API endpoint url.
+        """
+        endpoint = "/loki/api/v1/push"
+        return {"url": url.rstrip("/") + endpoint}
+
+    @property
+    def alerts(self) -> dict:  # noqa: C901
+        """Fetch alerts for all relations.
+
+        A Loki alert rules file consists of a list of "groups". Each
+        group consists of a list of alerts (`rules`) that are sequentially
+        executed. This method returns all the alert rules provided by each
+        related client charm. These rules may be used to generate a
+        separate alert rules file for each relation since the returned list
+        of alert groups is indexed by relation ID.
Also for each relation ID + associated scrape metadata such as Juju model, UUID and application + name are provided so a unique name may be generated for the rules + file. For each relation the structure of data returned is a dictionary + with four keys + + - groups + - model + - model_uuid + - application + + The value of the `groups` key is such that it may be used to generate + a Loki alert rules file directly using `yaml.dump` but the + `groups` key itself must be included as this is required by Loki, + for example as in `yaml.dump({"groups": alerts["groups"]})`. + + Currently only accepts a list of rules and these + rules are all placed into a single group, even though Loki itself + allows for multiple groups within a single alert rules file. + + Returns: + a dictionary of alert rule groups and associated scrape + metadata indexed by relation ID. + """ + alerts = {} # type: Dict[str, dict] # mapping b/w juju identifiers and alert rule files + for relation in self._charm.model.relations[self._relation_name]: + if not relation.units or not relation.app: + continue + + alert_rules = json.loads(relation.data[relation.app].get("alert_rules", "{}")) + if not alert_rules: + continue + + alert_rules = self._inject_alert_expr_labels(alert_rules) + + identifier, topology = self._get_identifier_by_alert_rules(alert_rules) + if not topology: + try: + metadata = json.loads(relation.data[relation.app]["metadata"]) + identifier = JujuTopology.from_dict(metadata).identifier + alerts[identifier] = self._tool.apply_label_matchers(alert_rules) # type: ignore + + except KeyError as e: + logger.debug( + "Relation %s has no 'metadata': %s", + relation.id, + e, + ) + + if not identifier: + logger.error( + "Alert rules were found but no usable group or identifier was present." + ) + continue + + _, errmsg = self._tool.validate_alert_rules(alert_rules) + if errmsg: + relation.data[self._charm.app]["event"] = json.dumps({"errors": errmsg}) + continue + + alerts[identifier] = alert_rules + + return alerts + + def _get_identifier_by_alert_rules( + self, rules: dict + ) -> Tuple[Union[str, None], Union[JujuTopology, None]]: + """Determine an appropriate dict key for alert rules. + + The key is used as the filename when writing alerts to disk, so the structure + and uniqueness is important. + + Args: + rules: a dict of alert rules + Returns: + A tuple containing an identifier, if found, and a JujuTopology, if it could + be constructed. + """ + if "groups" not in rules: + logger.debug("No alert groups were found in relation data") + return None, None + + # Construct an ID based on what's in the alert rules if they have labels + for group in rules["groups"]: + try: + labels = group["rules"][0]["labels"] + topology = JujuTopology( + # Don't try to safely get required constructor fields. There's already + # a handler for KeyErrors + model_uuid=labels["juju_model_uuid"], + model=labels["juju_model"], + application=labels["juju_application"], + unit=labels.get("juju_unit", ""), + charm_name=labels.get("juju_charm", ""), + ) + return topology.identifier, topology + except KeyError: + logger.debug("Alert rules were found but no usable labels were present") + continue + + logger.warning( + "No labeled alert rules were found, and no 'scrape_metadata' " + "was available. Using the alert group name as filename." 
+ ) + try: + for group in rules["groups"]: + return group["name"], None + except KeyError: + logger.debug("No group name was found to use as identifier") + + return None, None + + def _inject_alert_expr_labels(self, rules: Dict[str, Any]) -> Dict[str, Any]: + """Iterate through alert rules and inject topology into expressions. + + Args: + rules: a dict of alert rules + """ + if "groups" not in rules: + return rules + + modified_groups = [] + for group in rules["groups"]: + # Copy off rules, so we don't modify an object we're iterating over + rules_copy = group["rules"] + for idx, rule in enumerate(rules_copy): + labels = rule.get("labels") + + if labels: + try: + topology = JujuTopology( + # Don't try to safely get required constructor fields. There's already + # a handler for KeyErrors + model_uuid=labels["juju_model_uuid"], + model=labels["juju_model"], + application=labels["juju_application"], + unit=labels.get("juju_unit", ""), + charm_name=labels.get("juju_charm", ""), + ) + + # Inject topology and put it back in the list + rule["expr"] = self._tool.inject_label_matchers( + re.sub(r"%%juju_topology%%,?", "", rule["expr"]), + topology.label_matcher_dict, + ) + except KeyError: + # Some required JujuTopology key is missing. Just move on. + pass + + group["rules"][idx] = rule + + modified_groups.append(group) + + rules["groups"] = modified_groups + return rules + + +class ConsumerBase(Object): + """Consumer's base class.""" + + def __init__( + self, + charm: CharmBase, + relation_name: str = DEFAULT_RELATION_NAME, + alert_rules_path: str = DEFAULT_ALERT_RULES_RELATIVE_PATH, + recursive: bool = False, + skip_alert_topology_labeling: bool = False, + ): + super().__init__(charm, relation_name) + self._charm = charm + self._relation_name = relation_name + self.topology = JujuTopology.from_charm(charm) + + try: + alert_rules_path = _resolve_dir_against_charm_path(charm, alert_rules_path) + except InvalidAlertRulePathError as e: + logger.debug( + "Invalid Loki alert rules folder at %s: %s", + e.alert_rules_absolute_path, + e.message, + ) + self._alert_rules_path = alert_rules_path + self._skip_alert_topology_labeling = skip_alert_topology_labeling + + self._recursive = recursive + + def _handle_alert_rules(self, relation): + if not self._charm.unit.is_leader(): + return + + alert_rules = ( + AlertRules(None) if self._skip_alert_topology_labeling else AlertRules(self.topology) + ) + alert_rules.add_path(self._alert_rules_path, recursive=self._recursive) + alert_rules_as_dict = alert_rules.as_dict() + + relation.data[self._charm.app]["metadata"] = json.dumps(self.topology.as_dict()) + relation.data[self._charm.app]["alert_rules"] = json.dumps( + alert_rules_as_dict, + sort_keys=True, # sort, to prevent unnecessary relation_changed events + ) + + @property + def loki_endpoints(self) -> List[dict]: + """Fetch Loki Push API endpoints sent from LokiPushApiProvider through relation data. 
+ + Returns: + A list of dictionaries with Loki Push API endpoints, for instance: + [ + {"url": "http://loki1:3100/loki/api/v1/push"}, + {"url": "http://loki2:3100/loki/api/v1/push"}, + ] + """ + endpoints = [] # type: list + + for relation in self._charm.model.relations[self._relation_name]: + for unit in relation.units: + if unit.app == self._charm.app: + # This is a peer unit + continue + + endpoint = relation.data[unit].get("endpoint") + if endpoint: + deserialized_endpoint = json.loads(endpoint) + endpoints.append(deserialized_endpoint) + + return endpoints + + +class LokiPushApiConsumer(ConsumerBase): + """Loki Consumer class.""" + + on = LokiPushApiEvents() # pyright: ignore + + def __init__( + self, + charm: CharmBase, + relation_name: str = DEFAULT_RELATION_NAME, + alert_rules_path: str = DEFAULT_ALERT_RULES_RELATIVE_PATH, + recursive: bool = True, + skip_alert_topology_labeling: bool = False, + ): + """Construct a Loki charm client. + + The `LokiPushApiConsumer` object provides configurations to a Loki client charm, such as + the Loki API endpoint to push logs. It is intended for workloads that can speak + loki_push_api (https://grafana.com/docs/loki/latest/api/#push-log-entries-to-loki), such + as grafana-agent. + (If you only need to forward a few workload log files, then use LogProxyConsumer.) + + `LokiPushApiConsumer` can be instantiated as follows: + + self._loki_consumer = LokiPushApiConsumer(self) + + Args: + charm: a `CharmBase` object that manages this `LokiPushApiConsumer` object. + Typically, this is `self` in the instantiating class. + relation_name: the string name of the relation interface to look up. + If `charm` has exactly one relation with this interface, the relation's + name is returned. If none or multiple relations with the provided interface + are found, this method will raise either a NoRelationWithInterfaceFoundError or + MultipleRelationsWithInterfaceFoundError exception, respectively. + alert_rules_path: a string indicating a path where alert rules can be found + recursive: Whether to scan for rule files recursively. + skip_alert_topology_labeling: whether to skip the alert topology labeling. + + Raises: + RelationNotFoundError: If there is no relation in the charm's metadata.yaml + with the same name as provided via `relation_name` argument. + RelationInterfaceMismatchError: The relation with the same name as provided + via `relation_name` argument does not have the `loki_push_api` relation + interface. + RelationRoleMismatchError: If the relation with the same name as provided + via `relation_name` argument does not have the `RelationRole.provides` + role. + + Emits: + loki_push_api_endpoint_joined: This event is emitted when the relation between the + Charmed Operator that instantiates `LokiPushApiProvider` (Loki charm for instance) + and the Charmed Operator that instantiates `LokiPushApiConsumer` is established. + loki_push_api_endpoint_departed: This event is emitted when the relation between the + Charmed Operator that implements `LokiPushApiProvider` (Loki charm for instance) + and the Charmed Operator that implements `LokiPushApiConsumer` is removed. + loki_push_api_alert_rules_error: This event is emitted when an invalid alert rules + file is encountered or if `alert_rules_path` is empty. 
+ """ + _validate_relation_by_interface_and_direction( + charm, relation_name, RELATION_INTERFACE_NAME, RelationRole.requires + ) + super().__init__( + charm, relation_name, alert_rules_path, recursive, skip_alert_topology_labeling + ) + events = self._charm.on[relation_name] + self.framework.observe(self._charm.on.upgrade_charm, self._on_lifecycle_event) + self.framework.observe(events.relation_joined, self._on_logging_relation_joined) + self.framework.observe(events.relation_changed, self._on_logging_relation_changed) + self.framework.observe(events.relation_departed, self._on_logging_relation_departed) + + def _on_lifecycle_event(self, _: HookEvent): + """Update require relation data on charm upgrades and other lifecycle events. + + Args: + event: a `CharmEvent` in response to which the consumer + charm must update its relation data. + """ + # Upgrade event or other charm-level event + self._reinitialize_alert_rules() + self.on.loki_push_api_endpoint_joined.emit() + + def _on_logging_relation_joined(self, event: RelationJoinedEvent): + """Handle changes in related consumers. + + Update relation data and emit events when a relation is established. + + Args: + event: a `CharmEvent` in response to which the consumer + charm must update its relation data. + + Emits: + loki_push_api_endpoint_joined: Once the relation is established, this event is emitted. + loki_push_api_alert_rules_error: This event is emitted when an invalid alert rules + file is encountered or if `alert_rules_path` is empty. + """ + # Alert rules will not change over the lifecycle of a charm, and do not need to be + # constantly set on every relation_changed event. Leave them here. + self._handle_alert_rules(event.relation) + self.on.loki_push_api_endpoint_joined.emit() + + def _on_logging_relation_changed(self, event: RelationEvent): + """Handle changes in related consumers. + + Anytime there are changes in the relation between Loki + and its consumers charms. + + Args: + event: a `CharmEvent` in response to which the consumer + charm must update its relation data. + + Emits: + loki_push_api_endpoint_joined: Once the relation is established, this event is emitted. + loki_push_api_alert_rules_error: This event is emitted when an invalid alert rules + file is encountered or if `alert_rules_path` is empty. + """ + if self._charm.unit.is_leader(): + ev = json.loads(event.relation.data[event.app].get("event", "{}")) + + if ev: + valid = bool(ev.get("valid", True)) + errors = ev.get("errors", "") + + if valid and not errors: + self.on.alert_rule_status_changed.emit(valid=valid) + else: + self.on.alert_rule_status_changed.emit(valid=valid, errors=errors) + + self.on.loki_push_api_endpoint_joined.emit() + + def _reinitialize_alert_rules(self): + """Reloads alert rules and updates all relations.""" + for relation in self._charm.model.relations[self._relation_name]: + self._handle_alert_rules(relation) + + def _process_logging_relation_changed(self, relation: Relation): + self._handle_alert_rules(relation) + self.on.loki_push_api_endpoint_joined.emit() + + def _on_logging_relation_departed(self, _: RelationEvent): + """Handle departures in related providers. + + Anytime there are departures in relations between the consumer charm and Loki + the consumer charm is informed, through a `LokiPushApiEndpointDeparted` event. + The consumer charm can then choose to update its configuration. 
+        """
+        # Provide default to avoid throwing, as in some complicated scenarios with
+        # upgrades and hook failures we might not have data in the storage
+        self.on.loki_push_api_endpoint_departed.emit()
+
+
+class ContainerNotFoundError(Exception):
+    """Raised if the specified container does not exist."""
+
+    def __init__(self):
+        msg = "The specified container does not exist."
+        self.message = msg
+
+        super().__init__(self.message)
+
+
+class PromtailDigestError(EventBase):
+    """Event emitted when there is an error with Promtail initialization."""
+
+    def __init__(self, handle, message):
+        super().__init__(handle)
+        self.message = message
+
+    def snapshot(self):
+        """Save message information."""
+        return {"message": self.message}
+
+    def restore(self, snapshot):
+        """Restore message information."""
+        self.message = snapshot["message"]
+
+
+class LogProxyEndpointDeparted(EventBase):
+    """Event emitted when a Log Proxy has departed."""
+
+
+class LogProxyEndpointJoined(EventBase):
+    """Event emitted when a Log Proxy joins."""
+
+
+class LogProxyEvents(ObjectEvents):
+    """Event descriptor for events raised by `LogProxyConsumer`."""
+
+    promtail_digest_error = EventSource(PromtailDigestError)
+    log_proxy_endpoint_departed = EventSource(LogProxyEndpointDeparted)
+    log_proxy_endpoint_joined = EventSource(LogProxyEndpointJoined)
+
+
+class LogProxyConsumer(ConsumerBase):
+    """LogProxyConsumer class.
+
+    The `LogProxyConsumer` object provides a method for attaching `promtail` to
+    a workload in order to generate structured logging data from applications
+    which traditionally log to syslog or do not have native Loki integration.
+    The `LogProxyConsumer` can be instantiated as follows:
+
+        self._log_proxy = LogProxyConsumer(
+            self,
+            logs_scheme={
+                "workload-a": {
+                    "log-files": ["/tmp/workload-a-1.log", "/tmp/workload-a-2.log"],
+                    "syslog-port": 1514,
+                },
+                "workload-b": {"log-files": ["/tmp/workload-b.log"], "syslog-port": 1515},
+            },
+            relation_name="log-proxy",
+        )
+
+    Args:
+        charm: a `CharmBase` object that manages this `LogProxyConsumer` object.
+            Typically, this is `self` in the instantiating class.
+        logs_scheme: a dict mapping each container name to its list of log files
+            and, optionally, its syslog port.
+        relation_name: the string name of the relation to use, as declared in the
+            charm's metadata.yaml. The relation must use the `loki_push_api`
+            interface and have the `requires` role.
+        alert_rules_path: an optional path for the location of alert rules
+            files. Defaults to "./src/loki_alert_rules",
+            resolved from the directory hosting the charm entry file.
+            The alert rules are automatically updated on charm upgrade.
+        recursive: whether to scan for rule files recursively.
+        promtail_resource_name: an optional name of the promtail resource from
+            metadata, if it has been modified and attached.
+        insecure_skip_verify: whether to skip SSL verification.
+
+    Raises:
+        RelationNotFoundError: If there is no relation in the charm's metadata.yaml
+            with the same name as provided via the `relation_name` argument.
+        RelationInterfaceMismatchError: If the relation with the same name as provided
+            via the `relation_name` argument does not have the `loki_push_api`
+            relation interface.
+        RelationRoleMismatchError: If the relation with the same name as provided
+            via the `relation_name` argument does not have the `RelationRole.requires`
+            role.
+    """
+
+    on = LogProxyEvents()  # pyright: ignore
+
+    def __init__(
+        self,
+        charm,
+        *,
+        logs_scheme=None,
+        relation_name: str = DEFAULT_LOG_PROXY_RELATION_NAME,
+        alert_rules_path: str = DEFAULT_ALERT_RULES_RELATIVE_PATH,
+        recursive: bool = False,
+        promtail_resource_name: Optional[str] = None,
+        insecure_skip_verify: bool = False,
+    ):
+        super().__init__(charm, relation_name, alert_rules_path, recursive)
+        self._charm = charm
+        self._logs_scheme = logs_scheme or {}
+        self._relation_name = relation_name
+        self.topology = JujuTopology.from_charm(charm)
+        self._promtail_resource_name = promtail_resource_name or "promtail-bin"
+        self.insecure_skip_verify = insecure_skip_verify
+        self._promtails_ports = self._generate_promtails_ports(logs_scheme)
+
+        # architecture used for promtail binary
+        arch = platform.processor()
+        self._arch = "amd64" if arch == "x86_64" else arch
+
+        events = self._charm.on[relation_name]
+        self.framework.observe(events.relation_created, self._on_relation_created)
+        self.framework.observe(events.relation_changed, self._on_relation_changed)
+        self.framework.observe(events.relation_departed, self._on_relation_departed)
+        self._observe_pebble_ready()
+
+    def _observe_pebble_ready(self):
+        for container in self._containers.keys():
+            snake_case_container_name = container.replace("-", "_")
+            self.framework.observe(
+                getattr(self._charm.on, f"{snake_case_container_name}_pebble_ready"),
+                self._on_pebble_ready,
+            )
+
+    def _on_pebble_ready(self, event: WorkloadEvent):
+        """Event handler for `pebble_ready`."""
+        if self.model.relations[self._relation_name]:
+            self._setup_promtail(event.workload)
+
+    def _on_relation_created(self, _: RelationCreatedEvent) -> None:
+        """Event handler for `relation_created`."""
+        for container in self._containers.values():
+            if container.can_connect():
+                self._setup_promtail(container)
+
+    def _on_relation_changed(self, event: RelationEvent) -> None:
+        """Event handler for `relation_changed`.
+
+        Args:
+            event: The event object `RelationChangedEvent`.
+        """
+        self._handle_alert_rules(event.relation)
+
+        if self._charm.unit.is_leader():
+            ev = json.loads(event.relation.data[event.app].get("event", "{}"))
+
+            if ev:
+                valid = bool(ev.get("valid", True))
+                errors = ev.get("errors", "")
+
+                if valid and not errors:
+                    self.on.alert_rule_status_changed.emit(valid=valid)
+                else:
+                    self.on.alert_rule_status_changed.emit(valid=valid, errors=errors)
+
+        for container in self._containers.values():
+            if not container.can_connect():
+                continue
+            if self.model.relations[self._relation_name]:
+                if "promtail" not in container.get_plan().services:
+                    self._setup_promtail(container)
+                    continue
+
+                new_config = self._promtail_config(container.name)
+                if new_config != self._current_config(container):
+                    container.push(
+                        WORKLOAD_CONFIG_PATH, yaml.safe_dump(new_config), make_dirs=True
+                    )
+
+                # Loki may send endpoints late. Don't necessarily start promtail;
+                # there may be no clients yet.
+                if new_config["clients"]:
+                    container.restart(WORKLOAD_SERVICE_NAME)
+                    self.on.log_proxy_endpoint_joined.emit()
+            else:
+                self.on.promtail_digest_error.emit("No promtail client endpoints available!")
+
+    def _on_relation_departed(self, _: RelationEvent) -> None:
+        """Event handler for `relation_departed`.
+
+        Args:
+            event: The event object `RelationDepartedEvent`.
+ """ + for container in self._containers.values(): + if not container.can_connect(): + continue + if not self._charm.model.relations[self._relation_name]: + container.stop(WORKLOAD_SERVICE_NAME) + continue + + new_config = self._promtail_config(container.name) + if new_config != self._current_config(container): + container.push(WORKLOAD_CONFIG_PATH, yaml.safe_dump(new_config), make_dirs=True) + + if new_config["clients"]: + container.restart(WORKLOAD_SERVICE_NAME) + else: + container.stop(WORKLOAD_SERVICE_NAME) + self.on.log_proxy_endpoint_departed.emit() + + def _add_pebble_layer(self, workload_binary_path: str, container: Container) -> None: + """Adds Pebble layer that manages Promtail service in Workload container. + + Args: + workload_binary_path: string providing path to promtail binary in workload container. + container: container into which the layer is to be added. + """ + pebble_layer = Layer( + { + "summary": "promtail layer", + "description": "pebble config layer for promtail", + "services": { + WORKLOAD_SERVICE_NAME: { + "override": "replace", + "summary": WORKLOAD_SERVICE_NAME, + "command": f"{workload_binary_path} {self._cli_args}", + "startup": "disabled", + } + }, + } + ) + container.add_layer(container.name, pebble_layer, combine=True) + + def _create_directories(self, container: Container) -> None: + """Creates the directories for Promtail binary and config file.""" + container.make_dir(path=WORKLOAD_BINARY_DIR, make_parents=True) + container.make_dir(path=WORKLOAD_CONFIG_DIR, make_parents=True) + + def _obtain_promtail(self, promtail_info: dict, container: Container) -> None: + """Obtain promtail binary from an attached resource or download it. + + Args: + promtail_info: dictionary containing information about promtail binary + that must be used. The dictionary must have three keys + - "filename": filename of promtail binary + - "zipsha": sha256 sum of zip file of promtail binary + - "binsha": sha256 sum of unpacked promtail binary + container: container into which promtail is to be obtained. + """ + workload_binary_path = os.path.join(WORKLOAD_BINARY_DIR, promtail_info["filename"]) + if self._promtail_attached_as_resource: + self._push_promtail_if_attached(container, workload_binary_path) + return + + if self._promtail_must_be_downloaded(promtail_info): + self._download_and_push_promtail_to_workload(container, promtail_info) + else: + binary_path = os.path.join(BINARY_DIR, promtail_info["filename"]) + self._push_binary_to_workload(container, binary_path, workload_binary_path) + + def _push_binary_to_workload( + self, container: Container, binary_path: str, workload_binary_path: str + ) -> None: + """Push promtail binary into workload container. + + Args: + binary_path: path in charm container from which promtail binary is read. + workload_binary_path: path in workload container to which promtail binary is pushed. + container: container into which promtail is to be uploaded. + """ + with open(binary_path, "rb") as f: + container.push(workload_binary_path, f, permissions=0o755, make_dirs=True) + logger.debug("The promtail binary file has been pushed to the workload container.") + + @property + def _promtail_attached_as_resource(self) -> bool: + """Checks whether Promtail binary is attached to the charm or not. + + Returns: + a boolean representing whether Promtail binary is attached as a resource or not. 
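+
+        For example, the resource name defaults to "promtail-bin" (see
+        `__init__`); with an application named "my-app" (the application and
+        file names here are illustrative), a binary could be attached with:
+
+            juju attach-resource my-app promtail-bin=./promtail-static-amd64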
+        """
+        try:
+            self._charm.model.resources.fetch(self._promtail_resource_name)
+            return True
+        except ModelError:
+            return False
+        except NameError as e:
+            if "invalid resource name" in str(e):
+                return False
+            raise
+
+    def _push_promtail_if_attached(self, container: Container, workload_binary_path: str) -> bool:
+        """Push the Promtail binary attached to the charm into the workload container.
+
+        Args:
+            workload_binary_path: string specifying expected path of promtail
+                in workload container
+            container: container into which promtail is to be pushed.
+
+        Returns:
+            True, once the attached binary has been pushed.
+        """
+        logger.info("Promtail binary file has been obtained from an attached resource.")
+        resource_path = self._charm.model.resources.fetch(self._promtail_resource_name)
+        self._push_binary_to_workload(container, resource_path, workload_binary_path)
+        return True
+
+    def _promtail_must_be_downloaded(self, promtail_info: dict) -> bool:
+        """Checks whether the promtail binary must be downloaded or not.
+
+        Args:
+            promtail_info: dictionary containing information about promtail binary
+                that must be used. The dictionary must have three keys
+                - "filename": filename of promtail binary
+                - "zipsha": sha256 sum of zip file of promtail binary
+                - "binsha": sha256 sum of unpacked promtail binary
+
+        Returns:
+            a boolean representing whether the Promtail binary must be downloaded or not.
+        """
+        binary_path = os.path.join(BINARY_DIR, promtail_info["filename"])
+        if not self._is_promtail_binary_in_charm(binary_path):
+            return True
+
+        if not self._sha256sums_matches(binary_path, promtail_info["binsha"]):
+            return True
+
+        logger.debug("Promtail binary file is already in the charm container.")
+        return False
+
+    def _sha256sums_matches(self, file_path: str, sha256sum: str) -> bool:
+        """Checks whether a file's sha256sum matches a specific sha256sum.
+
+        Args:
+            file_path: A string representing the file's path.
+            sha256sum: The sha256sum against which we want to verify.
+
+        Returns:
+            a boolean representing whether the file's sha256sum matches
+            the given sha256sum.
+        """
+        try:
+            with open(file_path, "rb") as f:
+                file_bytes = f.read()
+                result = sha256(file_bytes).hexdigest()
+
+                if result != sha256sum:
+                    msg = "File sha256sum mismatch, expected:'{}' but got '{}'".format(
+                        sha256sum, result
+                    )
+                    logger.debug(msg)
+                    return False
+
+                return True
+        except (APIError, FileNotFoundError):
+            msg = "File: '{}' could not be opened".format(file_path)
+            logger.error(msg)
+            return False
+
+    def _is_promtail_binary_in_charm(self, binary_path: str) -> bool:
+        """Check if the Promtail binary is already stored in the charm container.
+
+        Args:
+            binary_path: string path of promtail binary to check
+
+        Returns:
+            a boolean representing whether Promtail is present or not.
+        """
+        return Path(binary_path).is_file()
+
+    def _download_and_push_promtail_to_workload(
+        self, container: Container, promtail_info: dict
+    ) -> None:
+        """Downloads a Promtail zip file and pushes the binary to the workload.
+
+        Args:
+            promtail_info: dictionary containing information about the promtail binary
+                that must be used. The dictionary must have the following keys
+                - "url": URL from which the promtail binary can be downloaded
+                - "filename": filename of promtail binary
+                - "zipsha": sha256 sum of zip file of promtail binary
+                - "binsha": sha256 sum of unpacked promtail binary
+            container: container into which promtail is to be uploaded.
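+
+        As an illustration only (every value below is a placeholder, not a real
+        release artifact), a well-formed `promtail_info` dict looks like:
+
+            {
+                "url": "https://example.com/promtail-static-amd64.gz",
+                "filename": "promtail-static-amd64",
+                "zipsha": "<sha256 of the .gz file>",
+                "binsha": "<sha256 of the unpacked binary>",
+            }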
+        """
+        with request.urlopen(promtail_info["url"]) as r:
+            file_bytes = r.read()
+            file_path = os.path.join(BINARY_DIR, promtail_info["filename"] + ".gz")
+            with open(file_path, "wb") as f:
+                f.write(file_bytes)
+                logger.info(
+                    "Promtail binary zip file has been downloaded and stored in: %s",
+                    file_path,
+                )
+
+            decompressed_file = GzipFile(fileobj=BytesIO(file_bytes))
+            binary_path = os.path.join(BINARY_DIR, promtail_info["filename"])
+            with open(binary_path, "wb") as outfile:
+                outfile.write(decompressed_file.read())
+                logger.debug("Promtail binary file has been downloaded.")
+
+        workload_binary_path = os.path.join(WORKLOAD_BINARY_DIR, promtail_info["filename"])
+        self._push_binary_to_workload(container, binary_path, workload_binary_path)
+
+    @property
+    def _cli_args(self) -> str:
+        """Return the cli arguments to pass to promtail.
+
+        Returns:
+            The arguments as a string
+        """
+        return "-config.file={}".format(WORKLOAD_CONFIG_PATH)
+
+    def _current_config(self, container) -> dict:
+        """Return the current Promtail configuration, read from the workload container.
+
+        Returns:
+            A dict containing Promtail configuration.
+        """
+        if not container.can_connect():
+            logger.debug("Could not connect to promtail container!")
+            return {}
+        try:
+            raw_current = container.pull(WORKLOAD_CONFIG_PATH).read()
+            return yaml.safe_load(raw_current)
+        except (ProtocolError, PathError) as e:
+            logger.warning(
+                "Could not check the current promtail configuration due to "
+                "a failure in retrieving the file: %s",
+                e,
+            )
+            return {}
+
+    def _promtail_config(self, container_name: str) -> dict:
+        """Generates the config file for Promtail.
+
+        Reference: https://grafana.com/docs/loki/latest/send-data/promtail/configuration
+        """
+        config = {"clients": self._clients_list()}
+        if self.insecure_skip_verify:
+            for client in config["clients"]:
+                client["tls_config"] = {"insecure_skip_verify": True}
+
+        config.update(self._server_config(container_name))
+        config.update(self._positions)
+        config.update(self._scrape_configs(container_name))
+        return config
+
+    def _clients_list(self) -> list:
+        """Generates a list of clients for use in the promtail config.
+
+        Returns:
+            A list of endpoints
+        """
+        return self.loki_endpoints
+
+    def _server_config(self, container_name: str) -> dict:
+        """Generates the server section of the Promtail config file.
+
+        Returns:
+            A dict representing the `server` section.
+        """
+        return {
+            "server": {
+                "http_listen_port": self._promtails_ports[container_name]["http_listen_port"],
+                "grpc_listen_port": self._promtails_ports[container_name]["grpc_listen_port"],
+            }
+        }
+
+    @property
+    def _positions(self) -> dict:
+        """Generates the positions section of the Promtail config file.
+
+        Returns:
+            A dict representing the `positions` section.
+        """
+        return {"positions": {"filename": WORKLOAD_POSITIONS_PATH}}
+
+    def _scrape_configs(self, container_name: str) -> dict:
+        """Generates the scrape_configs section of the Promtail config file.
+
+        Returns:
+            A dict representing the `scrape_configs` section.
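+
+        As an illustration (actual label values depend on the deployment), the
+        file-scraping part of the generated section has roughly this shape:
+
+            scrape_configs:
+            - job_name: system
+              static_configs:
+              - targets: [localhost]
+                labels:
+                  job: juju_<topology identifier>
+                  container: <container name>
+                  __path__: <log file path>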
+ """ + job_name = f"juju_{self.topology.identifier}" + + # The new JujuTopology doesn't include unit, but LogProxyConsumer should have it + common_labels = { + f"juju_{k}": v + for k, v in self.topology.as_dict(remapped_keys={"charm_name": "charm"}).items() + } + common_labels["container"] = container_name + scrape_configs = [] + + # Files config + labels = common_labels.copy() + labels.update( + { + "job": job_name, + "__path__": "", + } + ) + config = {"targets": ["localhost"], "labels": labels} + scrape_config = { + "job_name": "system", + "static_configs": self._generate_static_configs(config, container_name), + } + scrape_configs.append(scrape_config) + + # Syslog config + syslog_port = self._logs_scheme.get(container_name, {}).get("syslog-port") + if syslog_port: + relabel_mappings = [ + "severity", + "facility", + "hostname", + "app_name", + "proc_id", + "msg_id", + ] + syslog_labels = common_labels.copy() + syslog_labels.update({"job": f"{job_name}_syslog"}) + syslog_config = { + "job_name": "syslog", + "syslog": { + "listen_address": f"127.0.0.1:{syslog_port}", + "label_structured_data": True, + "labels": syslog_labels, + }, + "relabel_configs": [ + {"source_labels": [f"__syslog_message_{val}"], "target_label": val} + for val in relabel_mappings + ] + + [{"action": "labelmap", "regex": "__syslog_message_sd_(.+)"}], + } + scrape_configs.append(syslog_config) # type: ignore + + return {"scrape_configs": scrape_configs} + + def _generate_static_configs(self, config: dict, container_name: str) -> list: + """Generates static_configs section. + + Returns: + - a list of dictionaries representing static_configs section + """ + static_configs = [] + + for _file in self._logs_scheme.get(container_name, {}).get("log-files", []): + conf = deepcopy(config) + conf["labels"]["__path__"] = _file + static_configs.append(conf) + + return static_configs + + def _setup_promtail(self, container: Container) -> None: + # Use the first + relations = self._charm.model.relations[self._relation_name] + if len(relations) > 1: + logger.debug( + "Multiple log_proxy relations. 
Getting Promtail from application {}".format( + relations[0].app.name + ) + ) + relation = relations[0] + promtail_binaries = json.loads( + relation.data[relation.app].get("promtail_binary_zip_url", "{}") + ) + if not promtail_binaries: + return + + self._create_directories(container) + self._ensure_promtail_binary(promtail_binaries, container) + + container.push( + WORKLOAD_CONFIG_PATH, + yaml.safe_dump(self._promtail_config(container.name)), + make_dirs=True, + ) + + workload_binary_path = os.path.join( + WORKLOAD_BINARY_DIR, promtail_binaries[self._arch]["filename"] + ) + self._add_pebble_layer(workload_binary_path, container) + + if self._current_config(container).get("clients"): + try: + container.restart(WORKLOAD_SERVICE_NAME) + except ChangeError as e: + self.on.promtail_digest_error.emit(str(e)) + else: + self.on.log_proxy_endpoint_joined.emit() + else: + self.on.promtail_digest_error.emit("No promtail client endpoints available!") + + def _ensure_promtail_binary(self, promtail_binaries: dict, container: Container): + if self._is_promtail_installed(promtail_binaries[self._arch], container): + return + + try: + self._obtain_promtail(promtail_binaries[self._arch], container) + except HTTPError as e: + msg = f"Promtail binary couldn't be downloaded - {str(e)}" + logger.warning(msg) + self.on.promtail_digest_error.emit(msg) + + def _is_promtail_installed(self, promtail_info: dict, container: Container) -> bool: + """Determine if promtail has already been installed to the container. + + Args: + promtail_info: dictionary containing information about promtail binary + that must be used. The dictionary must at least contain a key + "filename" giving the name of promtail binary + container: container in which to check whether promtail is installed. + """ + workload_binary_path = f"{WORKLOAD_BINARY_DIR}/{promtail_info['filename']}" + try: + container.list_files(workload_binary_path) + except (APIError, FileNotFoundError): + return False + return True + + def _generate_promtails_ports(self, logs_scheme) -> dict: + return { + container: { + "http_listen_port": HTTP_LISTEN_PORT_START + 2 * i, + "grpc_listen_port": GRPC_LISTEN_PORT_START + 2 * i, + } + for i, container in enumerate(logs_scheme.keys()) + } + + def syslog_port(self, container_name: str) -> str: + """Gets the port on which promtail is listening for syslog in this container. + + Returns: + A str representing the port + """ + return str(self._logs_scheme.get(container_name, {}).get("syslog-port")) + + def rsyslog_config(self, container_name: str) -> str: + """Generates a config line for use with rsyslog. 
+ + Returns: + The rsyslog config line as a string + """ + return 'action(type="omfwd" protocol="tcp" target="127.0.0.1" port="{}" Template="RSYSLOG_SyslogProtocol23Format" TCP_Framing="octet-counted")'.format( + self._logs_scheme.get(container_name, {}).get("syslog-port") + ) + + @property + def _containers(self) -> Dict[str, Container]: + return {cont: self._charm.unit.get_container(cont) for cont in self._logs_scheme.keys()} + + +class CosTool: + """Uses cos-tool to inject label matchers into alert rule expressions and validate rules.""" + + _path = None + _disabled = False + + def __init__(self, charm): + self._charm = charm + + @property + def path(self): + """Lazy lookup of the path of cos-tool.""" + if self._disabled: + return None + if not self._path: + self._path = self._get_tool_path() + if not self._path: + logger.debug("Skipping injection of juju topology as label matchers") + self._disabled = True + return self._path + + def apply_label_matchers(self, rules) -> dict: + """Will apply label matchers to the expression of all alerts in all supplied groups.""" + if not self.path: + return rules + for group in rules["groups"]: + rules_in_group = group.get("rules", []) + for rule in rules_in_group: + topology = {} + # if the user for some reason has provided juju_unit, we'll need to honor it + # in most cases, however, this will be empty + for label in [ + "juju_model", + "juju_model_uuid", + "juju_application", + "juju_charm", + "juju_unit", + ]: + if label in rule["labels"]: + topology[label] = rule["labels"][label] + + rule["expr"] = self.inject_label_matchers(rule["expr"], topology) + return rules + + def validate_alert_rules(self, rules: dict) -> Tuple[bool, str]: + """Will validate correctness of alert rules, returning a boolean and any errors.""" + if not self.path: + logger.debug("`cos-tool` unavailable. Not validating alert correctness.") + return True, "" + + with tempfile.TemporaryDirectory() as tmpdir: + rule_path = Path(tmpdir + "/validate_rule.yaml") + + # Smash "our" rules format into what upstream actually uses, which is more like: + # + # groups: + # - name: foo + # rules: + # - alert: SomeAlert + # expr: up + # - alert: OtherAlert + # expr: up + transformed_rules = {"groups": []} # type: ignore + for rule in rules["groups"]: + transformed_rules["groups"].append(rule) + + rule_path.write_text(yaml.dump(transformed_rules)) + args = [str(self.path), "--format", "logql", "validate", str(rule_path)] + # noinspection PyBroadException + try: + self._exec(args) + return True, "" + except subprocess.CalledProcessError as e: + logger.debug("Validating the rules failed: %s", e.output) + return False, ", ".join([line for line in e.output if "error validating" in line]) + + def inject_label_matchers(self, expression, topology) -> str: + """Add label matchers to an expression.""" + if not topology: + return expression + if not self.path: + logger.debug("`cos-tool` unavailable. 
Leaving expression unchanged: %s", expression)
+            return expression
+        args = [str(self.path), "--format", "logql", "transform"]
+        args.extend(
+            ["--label-matcher={}={}".format(key, value) for key, value in topology.items()]
+        )
+
+        args.extend(["{}".format(expression)])
+        # noinspection PyBroadException
+        try:
+            return self._exec(args)
+        except subprocess.CalledProcessError as e:
+            logger.debug('Applying the expression failed: "%s", falling back to the original', e)
+            return expression
+
+    def _get_tool_path(self) -> Optional[Path]:
+        arch = platform.processor()
+        arch = "amd64" if arch == "x86_64" else arch
+        res = "cos-tool-{}".format(arch)
+        try:
+            path = Path(res).resolve()
+            path.chmod(0o777)
+            return path
+        except NotImplementedError:
+            logger.debug("System lacks support for chmod")
+        except FileNotFoundError:
+            logger.debug('Could not locate cos-tool at: "{}"'.format(res))
+        return None
+
+    def _exec(self, cmd) -> str:
+        result = subprocess.run(cmd, check=True, stdout=subprocess.PIPE)
+        output = result.stdout.decode("utf-8").strip()
+        return output
diff --git a/libs/external/lib/charms/observability_libs/v0/juju_topology.py b/libs/external/lib/charms/observability_libs/v0/juju_topology.py
new file mode 100644
index 00000000..a79e5d43
--- /dev/null
+++ b/libs/external/lib/charms/observability_libs/v0/juju_topology.py
@@ -0,0 +1,301 @@
+# Copyright 2022 Canonical Ltd.
+# See LICENSE file for licensing details.
+"""## Overview.
+
+This document explains how to use the `JujuTopology` class to
+create and consume topology information from Juju in a consistent manner.
+
+The goal of the Juju topology is to uniquely identify a piece
+of software running across any of your Juju-managed deployments.
+This is achieved by combining the following four elements:
+
+- Model name
+- Model UUID
+- Application name
+- Unit identifier
+
+For a more in-depth description of the concept, as well as a
+walk-through of its use case in observability, see
+[this blog post](https://juju.is/blog/model-driven-observability-part-2-juju-topology-metrics)
+on the Juju blog.
+
+## Library Usage
+
+This library may be used to create and consume `JujuTopology` objects.
+The `JujuTopology` class provides three ways to create instances:
+
+### Using the `from_charm` method
+
+Enables instantiation by supplying the charm as an argument. When
+creating topology objects for the current charm, this is the recommended
+approach.
+
+```python
+topology = JujuTopology.from_charm(self)
+```
+
+### Using the `from_dict` method
+
+Allows for instantiation using a dictionary of relation data, like the
+`scrape_metadata` from Prometheus or the labels of an alert rule. When
+creating topology objects for remote charms, this is the recommended
+approach.
+
+```python
+scrape_metadata = json.loads(relation.data[relation.app].get("scrape_metadata", "{}"))
+topology = JujuTopology.from_dict(scrape_metadata)
+```
+
+### Using the class constructor
+
+Enables instantiation using whatever values you want. While this
+is useful in some very specific cases, it is almost certainly not
+what you are looking for: setting these values manually may
+result in observability metrics which do not uniquely identify a
+charm, undermining accurate usage reporting, alerting,
+horizontal scaling, and other use cases.
+
+```python
+topology = JujuTopology(
+    model="some-juju-model",
+    model_uuid="00000000-0000-4000-8000-000000000001",
+    application="fancy-juju-application",
+    unit="fancy-juju-application/0",
+    charm_name="fancy-juju-application-k8s",
+)
+```
+
+"""
+from collections import OrderedDict
+from typing import Dict, List, Optional
+from uuid import UUID
+
+# The unique Charmhub library identifier, never change it
+LIBID = "bced1658f20f49d28b88f61f83c2d232"
+
+LIBAPI = 0
+LIBPATCH = 6
+
+
+class InvalidUUIDError(Exception):
+    """Invalid UUID was provided."""
+
+    def __init__(self, uuid: str):
+        self.message = "'{}' is not a valid UUID.".format(uuid)
+        super().__init__(self.message)
+
+
+class JujuTopology:
+    """JujuTopology is used for storing, generating and formatting juju topology information.
+
+    DEPRECATED: This class is deprecated. Use `pip install cosl` and
+    `from cosl.juju_topology import JujuTopology` instead.
+    """
+
+    def __init__(
+        self,
+        model: str,
+        model_uuid: str,
+        application: str,
+        unit: Optional[str] = None,
+        charm_name: Optional[str] = None,
+    ):
+        """Build a JujuTopology object.
+
+        A `JujuTopology` object is used for storing and transforming
+        Juju topology information. This information is used to
+        annotate Prometheus scrape jobs and alert rules. Such
+        annotation, when applied to scrape jobs, helps in identifying
+        the source of the scraped metrics. On the other hand, when
+        applied to alert rules, topology information ensures that
+        evaluation of alert expressions is restricted to the source
+        (charm) from which the alert rules were obtained.
+
+        Args:
+            model: a string name of the Juju model
+            model_uuid: a globally unique string identifier for the Juju model
+            application: an application name as a string
+            unit: a unit name as a string
+            charm_name: name of charm as a string
+        """
+        if not self.is_valid_uuid(model_uuid):
+            raise InvalidUUIDError(model_uuid)
+
+        self._model = model
+        self._model_uuid = model_uuid
+        self._application = application
+        self._charm_name = charm_name
+        self._unit = unit
+
+    def is_valid_uuid(self, uuid):
+        """Validate the supplied UUID against the Juju Model UUID pattern.
+
+        Args:
+            uuid: string that needs to be checked if it is valid v4 UUID.
+
+        Returns:
+            True if parameter is a valid v4 UUID, False otherwise.
+        """
+        try:
+            return str(UUID(uuid, version=4)) == uuid
+        except (ValueError, TypeError):
+            return False
+
+    @classmethod
+    def from_charm(cls, charm):
+        """Creates a JujuTopology instance by using the model data available on a charm object.
+
+        Args:
+            charm: a `CharmBase` object for which the `JujuTopology` will be constructed
+
+        Returns:
+            a `JujuTopology` object.
+        """
+        return cls(
+            model=charm.model.name,
+            model_uuid=charm.model.uuid,
+            application=charm.model.app.name,
+            unit=charm.model.unit.name,
+            charm_name=charm.meta.name,
+        )
+
+    @classmethod
+    def from_dict(cls, data: dict):
+        """Factory method for creating `JujuTopology` children from a dictionary.
+
+        Args:
+            data: a dictionary with five keys providing topology information. The keys are
+                - "model"
+                - "model_uuid"
+                - "application"
+                - "unit"
+                - "charm_name"
+                `unit` and `charm_name` may be empty, but will result in more limited
+                labels. However, this allows us to support charms without workloads.
+
+        Returns:
+            a `JujuTopology` object.
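+
+        For example (values illustrative; the UUID merely needs to be a valid
+        v4 UUID):
+
+            topology = JujuTopology.from_dict(
+                {
+                    "model": "some-model",
+                    "model_uuid": "00000000-0000-4000-8000-000000000000",
+                    "application": "some-app",
+                }
+            )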
+        """
+        return cls(
+            model=data["model"],
+            model_uuid=data["model_uuid"],
+            application=data["application"],
+            unit=data.get("unit", ""),
+            charm_name=data.get("charm_name", ""),
+        )
+
+    def as_dict(
+        self,
+        *,
+        remapped_keys: Optional[Dict[str, str]] = None,
+        excluded_keys: Optional[List[str]] = None,
+    ) -> OrderedDict:
+        """Format the topology information into an ordered dict.
+
+        Keeping the dictionary ordered is important to be able to
+        compare dicts without having to resort to deep comparisons.
+
+        Args:
+            remapped_keys: A dictionary mapping old key names to new key names,
+                which will be substituted in the returned dict.
+            excluded_keys: A list of key names to exclude from the returned dict.
+        """
+        ret = OrderedDict(
+            [
+                ("model", self.model),
+                ("model_uuid", self.model_uuid),
+                ("application", self.application),
+                ("unit", self.unit),
+                ("charm_name", self.charm_name),
+            ]
+        )
+        if excluded_keys:
+            ret = OrderedDict({k: v for k, v in ret.items() if k not in excluded_keys})
+
+        if remapped_keys:
+            ret = OrderedDict(
+                (remapped_keys.get(k), v) if remapped_keys.get(k) else (k, v) for k, v in ret.items()  # type: ignore
+            )
+
+        return ret
+
+    @property
+    def identifier(self) -> str:
+        """Format the topology information into a terse string.
+
+        This crops the model UUID, making it unsuitable for comparisons against
+        anything but other identifiers. Mainly to be used as a display name or file
+        name where long strings might become an issue.
+
+        >>> JujuTopology( \
+                model = "a-model", \
+                model_uuid = "00000000-0000-4000-8000-000000000000", \
+                application = "some-app", \
+                unit = "some-app/1" \
+            ).identifier
+        'a-model_00000000_some-app'
+        """
+        parts = self.as_dict(
+            excluded_keys=["unit", "charm_name"],
+        )
+
+        parts["model_uuid"] = self.model_uuid_short
+        values = parts.values()
+
+        return "_".join([str(val) for val in values]).replace("/", "_")
+
+    @property
+    def label_matcher_dict(self) -> Dict[str, str]:
+        """Format the topology information into a dict with keys having 'juju_' as prefix.
+
+        Relabelled topology never includes the unit as it would then only match
+        the leader unit (i.e. the unit that produced the dict).
+        """
+        items = self.as_dict(
+            remapped_keys={"charm_name": "charm"},
+            excluded_keys=["unit"],
+        ).items()
+
+        return {"juju_{}".format(key): value for key, value in items if value}
+
+    @property
+    def label_matchers(self) -> str:
+        """Format the topology information into a promql/logql label matcher string.
+
+        Topology label matchers should never include the unit as it
+        would then only match the leader unit (i.e. the unit that
+        produced the matchers).
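+
+        For example (values illustrative), a topology might render as:
+
+            juju_model="some-model", juju_model_uuid="00000000-0000-4000-8000-000000000000",
+            juju_application="some-app", juju_charm="some-charm"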
+
+        """
+        items = self.label_matcher_dict.items()
+        return ", ".join(['{}="{}"'.format(key, value) for key, value in items if value])
+
+    @property
+    def model(self) -> str:
+        """Getter for the juju model value."""
+        return self._model
+
+    @property
+    def model_uuid(self) -> str:
+        """Getter for the juju model uuid value."""
+        return self._model_uuid
+
+    @property
+    def model_uuid_short(self) -> str:
+        """Getter for the juju model uuid value, truncated to the first eight characters."""
+        return self._model_uuid[:8]
+
+    @property
+    def application(self) -> str:
+        """Getter for the juju application value."""
+        return self._application
+
+    @property
+    def charm_name(self) -> Optional[str]:
+        """Getter for the juju charm name value."""
+        return self._charm_name
+
+    @property
+    def unit(self) -> Optional[str]:
+        """Getter for the juju unit value."""
+        return self._unit
diff --git a/ops-sunbeam/ops_sunbeam/test_utils.py b/ops-sunbeam/ops_sunbeam/test_utils.py
index 7c006d47..c9577282 100644
--- a/ops-sunbeam/ops_sunbeam/test_utils.py
+++ b/ops-sunbeam/ops_sunbeam/test_utils.py
@@ -652,6 +652,7 @@ def get_harness(
     charm_metadata: str = None,
     container_calls: dict = None,
     charm_config: str = None,
+    charm_actions: str = None,
     initial_charm_config: dict = None,
 ) -> Harness:
     """Return a testing harness."""
@@ -759,7 +760,7 @@ def get_harness(
     with open(metadata_file) as f:
         charm_metadata = f.read()
 
-    harness = Harness(charm_class, meta=charm_metadata, config=charm_config)
+    harness = Harness(charm_class, meta=charm_metadata, config=charm_config, actions=charm_actions)
     harness._backend = _OSTestingModelBackend(
         harness._unit_name, harness._meta, harness._get_config(charm_config)
     )
diff --git a/zuul.d/jobs.yaml b/zuul.d/jobs.yaml
index 5dafbad5..916f09d2 100644
--- a/zuul.d/jobs.yaml
+++ b/zuul.d/jobs.yaml
@@ -10,6 +10,18 @@
       - rebuild
     vars:
       charm: keystone-k8s
+- job:
+    name: charm-build-tempest-k8s
+    description: Build sunbeam tempest-k8s charm
+    run: playbooks/charm/build.yaml
+    timeout: 3600
+    match-on-config-updates: false
+    files:
+      - ops-sunbeam/ops_sunbeam/*
+      - charms/tempest-k8s/*
+      - rebuild
+    vars:
+      charm: tempest-k8s
 - job:
     name: charm-build-glance-k8s
     description: Build sunbeam glance-k8s charm
diff --git a/zuul.d/project-templates.yaml b/zuul.d/project-templates.yaml
index d7398edf..0e5fcd6a 100644
--- a/zuul.d/project-templates.yaml
+++ b/zuul.d/project-templates.yaml
@@ -84,6 +84,8 @@
             nodeset: ubuntu-jammy
         - charm-build-sunbeam-clusterd:
             nodeset: ubuntu-jammy
+        - charm-build-tempest-k8s:
+            nodeset: ubuntu-jammy
     gate:
       fail-fast: true
      jobs:
@@ -135,6 +137,8 @@
             nodeset: ubuntu-jammy
         - charm-build-sunbeam-clusterd:
             nodeset: ubuntu-jammy
+        - charm-build-tempest-k8s:
+            nodeset: ubuntu-jammy
 - project-template:
     name: charm-publish-jobs