Rewrite charm in operator framework

This is a major breaking change. In the process, also:

- move all processing from juju status to actions
  (run the actions to get data; the status line will be minimal)
- switch to COS integration for the iperf benchmarks,
  replacing the legacy prometheus support

It should be mostly at feature parity with the original magpie charm,
but some things still need improving and iterating on,
such as the spec for the data returned from actions,
and actual functional tests.

Change-Id: I289d4e7a0dd373c5c6f2471ab710e754c167ab8c
Samuel Allan 2024-07-05 14:19:11 +09:30
parent 3f5e833adc
commit 5acbc4e5ba
49 changed files with 3369 additions and 3280 deletions

.gitignore
@@ -1,9 +1,10 @@
build
.tox
layers
interfaces
trusty
.testrepository
__pycache__
.stestr
venv/
build/
*.charm
.tox/
.coverage
cover/
__pycache__/
*.py[cod]
.idea
.vscode/

@@ -1,3 +0,0 @@
[DEFAULT]
test_path=./unit_tests
top_dir=./

@@ -1,5 +1,4 @@
- project:
templates:
- openstack-python3-charm-zed-jobs
- openstack-python3-charm-jobs
- openstack-cover-jobs

CONTRIBUTING.md (new file)
@@ -0,0 +1,34 @@
# Contributing
To make contributions to this charm, you'll need a working [development setup](https://juju.is/docs/sdk/dev-setup).
You can create an environment for development with `tox`:
```shell
tox devenv -e integration
source venv/bin/activate
```
## Testing
This project uses `tox` for managing test environments. There are some pre-configured environments
that can be used for linting and formatting code when you're preparing contributions to the charm:
```shell
tox run -e format # update your code according to linting rules
tox run -e lint # code style
tox run -e static # static type checking
tox run -e unit # unit tests
tox run -e integration # integration tests
tox # runs 'format', 'lint', 'static', and 'unit' environments
```
## Build the charm
Build the charm in this git repository using:
```shell
charmcraft pack
```
<!-- You may want to include any contribution/style guidelines in this document -->

@@ -187,7 +187,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Copyright 2023 Ubuntu
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.

@@ -1 +0,0 @@
src/README.md

README.md (new file)
@@ -0,0 +1,456 @@
# Magpie
Magpie is a charm used for testing the networking of a Juju provider/substrate.
It provides tools for testing:
- DNS functionality
- network connectivity between nodes (iperf, ping)
- network benchmarking
- MTU
- local hostname lookup
## Usage
Deploy the charm to two or more units,
then run the provided actions to retrieve debug information about the nodes or run network diagnostic tests.
```
juju deploy magpie -n 3
juju actions magpie
juju run magpie/leader info
juju run magpie/leader ping
# etc.
```
Check the charm config before deploying for values you may wish to tweak,
and see the parameters accepted by each action.
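Because the checks now run as actions rather than being baked into `juju status`,
the results can also be collected programmatically.
A minimal sketch, assuming Juju 3.x JSON output and that the action data lands under
the usual `results` key (the exact data spec is still being iterated on):
```python
import json
import subprocess


def run_action(unit: str, action: str, **params: str) -> dict:
    """Run a magpie action and return whatever Juju reports as its results."""
    cmd = ["juju", "run", "--format=json", unit, action]
    cmd += [f"{key}={value}" for key, value in params.items()]
    output = json.loads(subprocess.check_output(cmd, text=True))
    # Juju 3.x keys the JSON output by unit name; adjust if your version differs.
    return output[next(iter(output))].get("results", {})


print(run_action("magpie/leader", "info"))
print(run_action("magpie/leader", "ping", tries="5"))
```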
## TODO: document each action and the expected results
## Network spaces
If you use network spaces in your Juju deployment (as you should), use
`--bind '<space-name> magpie=<space-name>'` to force magpie to test that
particular network space.
It is possible to deploy several magpie charms
(as different Juju applications) to the same server,
each in a different network space.
Example:
```
juju deploy magpie magpie-space1 --bind "space1 magpie=space1" -n 5 --to 0,2,1,4,3
juju deploy magpie magpie-space2 --bind "space2 magpie=space2" -n 3 --to 3,2,0
juju deploy magpie magpie-space3 --bind "space3 magpie=space3" -n 4 --to 3,2,1,0
juju deploy magpie magpie-space4 --bind "space4 magpie=space4" -n 4 --to 3,2,1,0
```
## Benchmarking network with iperf and grafana
This assumes Juju 3.1 or newer.
Step 1, deploy COS:
```
# Deploy COS on microk8s.
# https://charmhub.io/topics/canonical-observability-stack/tutorials/install-microk8s
juju bootstrap microk8s microk8s
juju add-model cos
juju deploy cos-lite
# Expose the endpoints for the magpie model to consume.
juju offer grafana:grafana-dashboard
juju offer prometheus:receive-remote-write
```
Step 2, deploy magpie and relate it to COS:
```
juju switch <controller for cloud to be benchmarked>
juju add-model magpie
juju consume microk8s:cos.prometheus
juju consume microk8s:cos.grafana
# adjust as required - deploy from Charmhub:
juju deploy magpie -n 3
# or deploy a locally built charm instead:
# juju deploy ./magpie_ubuntu-22.04-amd64.charm -n 3
juju deploy grafana-agent --channel edge
juju relate magpie grafana-agent
juju relate grafana-agent prometheus
juju relate grafana-agent grafana
```
Step 3, run the iperf action and view results in grafana:
```
# adjust as needed
juju run magpie/0 iperf
# you may wish to run against one unit pair at a time:
juju run magpie/0 iperf units=magpie/1
juju run magpie/0 iperf units=magpie/2
# etc.
```
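If you want to script the one-pair-at-a-time runs, a small loop along these lines works
(the unit names and the `total-run-time` value are examples only):
```python
import subprocess

source = "magpie/0"
targets = ["magpie/1", "magpie/2"]  # the peer units to test against

for target in targets:
    # One target per run keeps each iperf result (and the corresponding
    # grafana panels) attributable to a single unit pair.
    subprocess.run(
        ["juju", "run", source, "iperf", f"units={target}", "total-run-time=120"],
        check=True,
    )
```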
Obtain details to access grafana from COS:
```
juju show-unit -m microk8s:cos catalogue/0 --format json | jq -r '.["catalogue/0"]."relation-info"[] | select(."application-data".name == "Grafana") | ."application-data".url'
juju config -m microk8s:cos grafana admin_user
juju run -m microk8s:cos grafana/0 get-admin-password
```
Find the dashboard titled "Magpie Network Benchmarking",
and limit the time range as required.
## Bonded links testing and troubleshooting
Network bonding enables the combination of two or more network interfaces into a single bonded
(logical) interface, which increases the bandwidth and provides redundancy. While Magpie does some
sanity checks and can reveal some configuration problems, this part of the README contains some
advanced troubleshooting information which might be useful when identifying and fixing an issue.
There are seven bonding modes:
### `balance-rr`
Round-robin policy: Transmit packets in sequential order from the first available slave through the
last. This mode provides load balancing and fault tolerance.
### `active-backup`
Active-backup policy: Only one slave in the bond is active. A different slave becomes active if, and
only if, the active slave fails. The bond's MAC address is externally visible on only one port
(network adapter) to avoid confusing the switch. This mode provides fault tolerance. The primary
option affects the behavior of this mode.
### `balance-xor`
XOR policy: Transmit based on a selectable hashing algorithm. The default policy is a simple
source+destination MAC address algorithm. Alternate transmit policies may be selected via the
`xmit_hash_policy` option. This mode provides load balancing and fault tolerance.
### `broadcast`
Broadcast policy: transmits everything on all slave interfaces. This mode provides fault tolerance.
### `802.3ad` (LACP)
Link Aggregation Control Protocol (IEEE 802.3ad LACP) is a control protocol that automatically
detects multiple links between two LACP enabled devices and configures them to use their maximum
possible bandwidth by automatically trunking the links together. This mode has a prerequisite -
the switch(es) ports should have LACP configured and enabled.
### `balance-tlb`
Adaptive transmit load balancing: channel bonding that does not require any special switch support.
The outgoing traffic is distributed according to the current load (computed relative to the speed)
on each slave. Incoming traffic is received by the current slave. If the receiving slave fails,
another slave takes over the MAC address of the failed receiving slave.
### `balance-alb`
Adaptive load balancing: includes balance-tlb plus receive load balancing (rlb) for IPV4 traffic,
and does not require any special switch support. The receive load balancing is achieved by ARP
negotiation.
The most commonly used modes are `active-backup` and `802.3ad` (LACP). While active-backup
does not require any third-party configuration, it has its own cons - for example, it can't multiply
the total bandwidth of the link, while an 802.3ad-based bond can utilize all bond members, therefore
multiplying the bandwidth. However, in order to get a fully working LACP link, an appropriate
configuration has to be done on both the actor (link initiator) and partner (switch) sides. Any
misconfiguration could lead to link loss or instability, therefore it's very important to have
correct settings applied to both sides of the link.
A quick overview of the LACP link status can be obtained by reading the
`/proc/net/bonding/<bond_name>` file.
```
$ sudo cat /proc/net/bonding/bondM
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 82:23:80:a1:a9:d3
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 15
Partner Key: 201
Partner Mac Address: 02:01:00:00:01:01
Slave Interface: eno3
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 3c:ec:ef:19:eb:30
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 82:23:80:a1:a9:d3
port key: 15
port priority: 255
port number: 1
port state: 63
details partner lacp pdu:
system priority: 65534
system mac address: 02:01:00:00:01:01
oper key: 201
port priority: 1
port number: 12
port state: 63
Slave Interface: eno1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 3c:ec:ef:19:eb:2e
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 82:23:80:a1:a9:d3
port key: 15
port priority: 255
port number: 2
port state: 63
details partner lacp pdu:
system priority: 65534
system mac address: 02:01:00:00:01:01
oper key: 201
port priority: 1
port number: 1012
port state: 63
```
The key things an operator should look at are:
- LACP rate
- Actor Churn State
- Partner Churn State
- Port State
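To pull just those fields out of `/proc/net/bonding/<bond_name>` without reading the whole file,
a small sketch like the following (independent of the charm; run as root if needed, as with the
`sudo cat` example below) can help:
```python
from pathlib import Path

# Prefixes of the LACP-relevant lines discussed above.
INTERESTING = ("LACP rate", "Actor Churn State", "Partner Churn State", "port state")


def bond_summary(bond: str) -> list[str]:
    """Return the LACP-relevant lines from /proc/net/bonding/<bond>."""
    text = Path(f"/proc/net/bonding/{bond}").read_text()
    return [line.strip() for line in text.splitlines() if line.strip().startswith(INTERESTING)]


for line in bond_summary("bondM"):  # bond name taken from the example below
    print(line)
```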
### LACP rate
The Link Aggregation Control Protocol (LACP) provides a standardized means for exchanging
information between Partner Systems on a link to allow their Link Aggregation Control instances to
reach agreement on the identity of the LAG to which the link belongs, move the link to that LAG, and
enable its transmission and reception functions in an orderly manner. The protocol depends upon the
transmission of information and state, rather than the transmission of commands. LACPDUs (LACP Data
Unit) sent by the first party (the Actor) convey to the second party (the Actor's protocol Partner)
what the Actor knows, both about its own state and that of the Partner.
Periodic transmission of LACPDUs occurs if the LACP Activity control of either the Actor or the
Partner is Active LACP. These periodic transmissions will occur at either a slow or fast
transmission rate depending upon the expressed LACP_Timeout preference (Long Timeout or Short
Timeout) of the Partner System.
### Actor/Partner Churn State
In general, "Churned" port status means that the parties are unable to reach agreement upon the
desired state of a link. Under normal operation of the protocol, such a resolution would be reached
very rapidly; continued failure to reach agreement can be symptomatic of component failure, of the
presence of non-standard devices on the link concerned, or of mis-configuration. Hence, detection of
such failures is signalled by the Churn Detection algorithm to the operator in order to prompt
administrative action to further resolution.
### Port State
Both the Actor and Partner states are variables, encoded as individual bits within a single octet,
as follows.
0) LACP_Activity: Device intends to transmit periodically in order to find potential
members for the aggregate. Active LACP is encoded as a 1; Passive LACP as a 0.
1) LACP_Timeout: This flag indicates the Timeout control value with regard to this link. Short
Timeout is encoded as a 1; Long Timeout as a 0.
2) Aggregability: This flag indicates that the system considers this link to be Aggregateable; i.e.,
a potential candidate for aggregation. If FALSE (encoded as a 0), the link is considered to be
Individual; i.e., this link can be operated only as an individual link. Aggregatable is encoded as a
1; Individual is encoded as a 0.
3) Synchronization: Indicates that the bond on the transmitting machine is in sync with what's being
advertised in the LACP frames, meaning the link has been allocated to the correct LAG, the group has
been associated with a compatible Aggregator, and the identity of the LAG is consistent with the
System ID and operational Key information transmitted. "In Sync" is encoded as a 1; "Out of sync" is
encoded as a 0.
4) Collecting: Bond is accepting traffic received on this port, collection of incoming frames on
this link is definitely enabled and is not expected to be disabled in the absence of administrative
changes or changes in received protocol information. True is encoded as a 1; False is encoded as a
0.
5) Distributing: Bond is sending traffic using this port. Same as above, but for egress
traffic. True is encoded as a 1; False is encoded as a 0.
6) Defaulted: Determines whether the receiving bond is using default (administratively defined)
parameters or the information received in an LACP PDU. Default settings are encoded as a 1;
info from an LACP PDU is encoded as a 0.
7) Expired: Whether the bond is in the expired state. Yes is encoded as a 1; No as a 0.
In the example output above, both of the port states are equal to 63. Let's decode:
```
$ python3
Python 3.8.4 (default, Jul 17 2020, 15:44:37)
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> bin(63)
'0b111111'
```
Reading from right to left:
- LACP Activity: Active
- LACP Timeout: Short
- Aggregability: Link is Aggregatable
- Synchronization: Link in sync
- Collecting: True - bond is accepting the traffic
- Distributing: True - bond is sending the traffic
- Defaulted: Info received from LACP PDU
- Expired: False - link is not expired
The above status represents the **fully healthy bond** without any LACP-related issues. Also, for
the operators' convenience, the [lacp_decoder.py](src/tools/lacp_decoder.py) script could be used to
quickly convert the status to some human-friendly format.
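In essence, such a decoder just walks the bits listed above; a minimal sketch of the idea
(not the bundled script):
```python
# Bit positions follow the list above: bit 0 is LACP_Activity, bit 7 is Expired.
FLAGS = [
    ("LACP Activity", "Active", "Passive"),
    ("LACP Timeout", "Short", "Long"),
    ("Aggregability", "Aggregatable", "Individual"),
    ("Synchronization", "Link in sync", "Link out of sync"),
    ("Collecting", "Accepting ingress traffic", "Rejecting ingress traffic"),
    ("Distributing", "Sending egress traffic", "Not sending egress traffic"),
    ("Defaulted", "Using defaulted (admin) parameters", "Using info from LACP PDU"),
    ("Expired", "Expired", "Not expired"),
]


def decode_port_state(state: int) -> None:
    """Print the meaning of each bit in an LACP port state octet."""
    for bit, (name, if_set, if_clear) in enumerate(FLAGS):
        print(f"{name}: {if_set if state & (1 << bit) else if_clear}")


decode_port_state(63)  # the healthy example above: bits 0-5 set, 6 and 7 clear
```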
However, situations where one of the links is misconfigured do happen, so let's assume
we have the following:
```
$ sudo cat /proc/net/bonding/bondm
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: b4:96:91:6d:20:fc
Active Aggregator Info:
Aggregator ID: 2
Number of ports: 1
Actor Key: 9
Partner Key: 32784
Partner Mac Address: 00:23:04:ee:be:66
Slave Interface: enp197s0f2
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b4:96:91:6d:20:fe
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: churned
Partner Churn State: none
Actor Churned Count: 1
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: b4:96:91:6d:20:fc
port key: 7
port priority: 255
port number: 1
port state: 7
details partner lacp pdu:
system priority: 32667
system mac address: 00:23:04:ee:be:66
oper key: 32784
port priority: 32768
port number: 16661
port state: 13
Slave Interface: enp197s0f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b4:96:91:6d:20:fc
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: b4:96:91:6d:20:fc
port key: 9
port priority: 255
port number: 2
port state: 63
details partner lacp pdu:
system priority: 32667
system mac address: 00:23:04:ee:be:66
oper key: 32784
port priority: 32768
port number: 277
port state: 63
```
As we can see, one of the links has different port states for the partner and the actor, while the second
one has 63 for both - meaning the first one is problematic and we need to dig deeper into the
problem.
Let's decode both statuses using the mentioned script:
```
$ python ./lacp_decoder.py 7 13
(Equal for both ports) LACP Activity: Active LACP
LACP Timeout: Short (Port 1) / Long (Port 2)
(Equal for both ports) Aggregability: Aggregatable
Synchronization: Link out of sync (Port 1) / Link in sync (Port 2)
(Equal for both ports) Collecting: Ingress traffic: Rejecting
(Equal for both ports) Distributing: Egress traffic: Not sending
(Equal for both ports) Is Defaulted: Settings are received from LACP PDU
(Equal for both ports) Link Expiration: No
```
The above output means that there are two differences between these statuses: LACP Timeout and
Synchronization. That means two things:
1) The Partner side (the switch side in most cases) has an incorrectly configured LACP timeout
control. To resolve this, an operator has to either change the LACP rate on the Actor (e.g. a
server) side to "Slow", or adjust the Partner (e.g. switch) LACP rate to "Fast".
2) The Partner side considers this physical link to be part of a different link aggregation group.
The switch config has to be revisited and the link aggregation group members verified again,
ensuring there are no extra or wrong links configured as part of the single LAG.
After addressing the above issues, the port state will change to 63, which means "LACP link is fully
functional".
## Bugs
Please report bugs on [Launchpad](https://bugs.launchpad.net/charm-magpie/+filebug).
For general questions please refer to the OpenStack [Charm Guide](https://docs.openstack.org/charm-guide/latest/).

actions.yaml (new file)
@@ -0,0 +1,89 @@
iperf:
description: |
Run iperf
params:
units:
default: ""
type: string
description: Space separated list of units. If empty string, will run against all peer units.
batch-time:
type: integer
default: 10
description: |
Maps to iperf -t option, time in seconds to transmit traffic
concurrency-progression:
type: string
default: "2 4 8"
description: |
Space separated list of concurrencies to use. An equal amount of time will be spent on each concurrency.
total-run-time:
type: integer
default: 600
description: |
Total run time for iperf test in seconds, per target unit.
min-speed:
default: "0"
description: |
Minimum transfer speed in integer mbit/s required to pass the test. "0" disables.
This can also be set to an integer percentage value (eg. "80%"),
which will be interpreted as a percentage of the link speed.
Useful in mixed link speed environments.
Likewise, "0%" disables.
type: string
info:
description: |
Retrieve all the information and data about the node as json data.
params:
required-mtu:
default: 0
type: integer
description: Desired MTU for all nodes - warn if the unit MTU is different (accounting for encapsulation). 0 disables mtu match checking.
bonds-to-check:
default: AUTO
description: Comma separated list of expected bonds or AUTO to check all available bonds.
type: string
lacp-passive-mode:
default: false
description: Set to true if switches are in LACP passive mode.
type: boolean
ping:
description: |
Ping each of the related magpie units and return the results.
params:
timeout:
default: 2
description: Timeout in seconds per ICMP request
type: integer
tries:
default: 20
description: Number of ICMP packets per ping
type: integer
interval:
default: 0.05
description: Number of seconds to wait between sending each packet
type: number
minimum: 0
required-mtu:
default: 0
type: integer
description: Desired MTU for all nodes - warn if the unit MTU is different (accounting for encapsulation). 0 disables mtu match checking.
dns:
description: |
Run dns checks against all peer nodes
params:
server:
default: ""
description: Provide a custom dns server. Uses unit default DNS server by default.
type: string
tries:
default: 1
description: Number of DNS resolution attempts per query
type: integer
timeout:
default: 5
description: Timeout in seconds per DNS query try
type: integer

@@ -1,4 +0,0 @@
libffi-dev [platform:dpkg]
libpq-dev [platform:dpkg]
libxml2-dev [platform:dpkg]
libxslt1-dev [platform:dpkg]

@@ -1,113 +1,11 @@
type: charm
# This file configures Charmcraft.
# See https://juju.is/docs/sdk/charmcraft-config for guidance.
parts:
charm:
source: src/
plugin: reactive
reactive-charm-build-arguments:
- --binary-wheels-from-source
- --verbose
build-packages:
- libpython3-dev
build-snaps:
- charm
build-environment:
- CHARM_INTERFACES_DIR: $CRAFT_PROJECT_DIR/interfaces/
- CHARM_LAYERS_DIR: $CRAFT_PROJECT_DIR/layers/
type: charm
bases:
- build-on:
- name: ubuntu
channel: "20.04"
architectures: [amd64]
- name: ubuntu
channel: "22.04"
run-on:
- name: ubuntu
channel: "20.04"
architectures: [amd64]
- build-on:
- name: ubuntu
channel: "20.04"
architectures: [s390x]
run-on:
- name: ubuntu
channel: "20.04"
architectures: [s390x]
- build-on:
- name: ubuntu
channel: "20.04"
architectures: [ppc64el]
run-on:
- name: ubuntu
channel: "20.04"
architectures: [ppc64el]
- build-on:
- name: ubuntu
channel: "20.04"
architectures: [arm64]
run-on:
- name: ubuntu
channel: "20.04"
architectures: [arm64]
- build-on:
- name: ubuntu
channel: "22.04"
architectures: [amd64]
run-on:
- name: ubuntu
channel: "22.04"
architectures: [amd64]
- build-on:
- name: ubuntu
channel: "22.04"
architectures: [s390x]
run-on:
- name: ubuntu
channel: "22.04"
architectures: [s390x]
- build-on:
- name: ubuntu
channel: "22.04"
architectures: [ppc64el]
run-on:
- name: ubuntu
channel: "22.04"
architectures: [ppc64el]
- build-on:
- name: ubuntu
channel: "22.04"
architectures: [arm64]
run-on:
- name: ubuntu
channel: "22.04"
architectures: [arm64]
- build-on:
- name: ubuntu
channel: "23.10"
architectures: [amd64]
run-on:
- name: ubuntu
channel: "23.10"
architectures: [amd64]
- build-on:
- name: ubuntu
channel: "23.10"
architectures: [s390x]
run-on:
- name: ubuntu
channel: "23.10"
architectures: [s390x]
- build-on:
- name: ubuntu
channel: "23.10"
architectures: [ppc64el]
run-on:
- name: ubuntu
channel: "23.10"
architectures: [ppc64el]
- build-on:
- name: ubuntu
channel: "23.10"
architectures: [arm64]
run-on:
- name: ubuntu
channel: "23.10"
architectures: [arm64]
- name: ubuntu
channel: "22.04"

config.yaml (new file)
@@ -0,0 +1,5 @@
options:
iperf_listen_cidr:
default: ""
type: string
description: Network cidr to use for iperf listener. Changing this option will only take effect on a new deployment.

@@ -1,54 +0,0 @@
<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN"
"http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
<svg version="1.0" xmlns="http://www.w3.org/2000/svg"
width="96.000000pt" height="96.000000pt" viewBox="0 0 96.000000 96.000000"
preserveAspectRatio="xMidYMid meet">
<metadata>
Created by potrace 1.10, written by Peter Selinger 2001-2011
</metadata>
<g transform="translate(0.000000,96.000000) scale(0.100000,-0.100000)"
fill="#000000" stroke="none">
<path d="M62 913 c-15 -16 -18 -20 -6 -10 12 10 34 22 50 27 25 7 26 8 7 9
-13 1 -35 -11 -51 -26z"/>
<path d="M826 929 c59 -17 104 -84 104 -156 0 -13 5 -23 10 -23 15 0 12 65 -6
110 -20 49 -61 80 -106 79 l-33 -1 31 -9z"/>
<path d="M625 822 c-56 -2 -76 -8 -103 -29 -100 -76 -92 -236 16 -304 125 -79
282 8 282 157 0 54 -14 91 -48 127 -40 43 -69 52 -147 49z m0 -31 c-3 -5 -6
-32 -6 -60 -2 -49 -2 -50 -23 -36 -27 20 -39 19 -67 -3 -19 -16 -22 -27 -20
-71 2 -56 -17 -68 -30 -20 -17 64 22 160 76 187 25 13 78 15 70 3z m72 -13 c6
-7 11 -30 11 -51 1 -21 4 -42 7 -48 11 -17 25 1 25 32 0 16 5 29 10 29 12 0
13 -50 1 -68 -4 -8 -19 -12 -32 -10 -21 3 -24 9 -27 56 -3 44 -6 53 -20 50
-14 -3 -17 -17 -18 -96 -2 -106 -6 -122 -32 -122 -29 0 -42 22 -40 69 1 44 -4
57 -21 46 -6 -3 -11 -35 -11 -71 0 -38 -4 -64 -10 -64 -7 0 -10 27 -8 73 l3
72 30 0 c29 0 30 -1 33 -53 3 -44 6 -53 20 -50 14 3 17 18 20 98 3 111 5 120
30 120 11 0 24 -6 29 -12z m48 -7 c3 -5 1 -12 -4 -15 -5 -3 -11 1 -15 9 -6 16
9 21 19 6z m50 -130 c-9 -111 -71 -161 -190 -153 -47 3 -61 16 -44 40 13 17
17 18 33 7 39 -29 93 25 84 84 -4 29 4 41 16 23 10 -16 65 0 81 24 8 13 18 24
20 24 3 0 3 -22 0 -49z"/>
<path d="M86 739 c-23 -18 -41 -26 -69 -28 -5 -1 -5 -5 -2 -11 3 -5 21 -10 39
-10 49 0 56 -14 56 -107 0 -63 5 -93 19 -122 23 -44 74 -91 119 -110 18 -7 32
-20 32 -27 0 -27 -44 -103 -54 -95 -6 4 -18 1 -27 -9 -14 -15 -10 -16 58 -16
68 0 90 6 40 12 -12 1 -25 3 -29 3 -5 1 -8 8 -8 16 0 12 6 12 48 -1 70 -23
106 -29 98 -15 -4 6 -18 11 -32 11 -19 0 -24 4 -20 18 3 9 10 36 16 59 9 38
14 42 49 48 48 8 132 -13 243 -61 112 -48 167 -65 197 -57 16 4 22 2 17 -6 -5
-8 0 -8 17 -2 14 6 29 6 38 0 11 -6 8 -9 -11 -10 -17 -1 -19 -3 -7 -6 14 -4
16 -12 11 -46 -15 -101 -51 -132 -177 -153 l-82 -13 85 5 c50 3 99 13 120 23
42 20 76 84 83 153 5 49 2 52 -48 64 -23 5 -185 67 -255 96 -61 26 -93 49
-170 122 -52 49 -125 111 -161 138 -38 28 -72 62 -79 79 -17 40 -66 79 -99 79
-16 0 -40 -9 -55 -21z m44 -24 c-8 -9 -9 -15 -2 -15 6 0 14 5 17 10 13 21 35
9 65 -36 35 -49 34 -55 -6 -60 -14 -1 -34 -10 -45 -20 -18 -17 -19 -16 -19 22
-1 21 -7 50 -14 63 -14 26 -11 51 6 51 6 0 5 -6 -2 -15z m103 -199 c25 -53 43
-78 63 -88 16 -7 30 -22 32 -32 9 -43 -64 -49 -117 -9 -69 52 -87 89 -67 136
11 28 41 67 51 67 1 0 19 -33 38 -74z m115 22 c94 -46 112 -63 55 -52 -49 9
-142 56 -168 84 l-20 23 25 -8 c14 -4 62 -25 108 -47z m70 -88 c23 0 43 -10
68 -33 l37 -33 -59 0 c-33 1 -69 -3 -81 -7 -19 -7 -22 -4 -25 20 -2 19 -14 35
-40 52 -59 38 -48 55 15 26 28 -14 67 -25 85 -25z m-73 -151 l0 -44 -37 0
c-21 0 -38 1 -38 3 0 2 9 17 21 33 11 16 18 33 15 38 -7 10 13 21 29 17 6 -2
10 -23 10 -47z"/>
<path d="M168 223 c6 -2 18 -2 25 0 6 3 1 5 -13 5 -14 0 -19 -2 -12 -5z"/>
<path d="M1 174 c0 -11 3 -14 6 -6 3 7 2 16 -1 19 -3 4 -6 -2 -5 -13z"/>
<path d="M45 61 c28 -33 69 -48 150 -55 l70 -5 -75 13 c-88 15 -107 22 -140
50 l-25 21 20 -24z"/>
</g>
</svg>

@@ -0,0 +1,842 @@
# Copyright 2023 Canonical Ltd.
# See LICENSE file for licensing details.
r"""## Overview.
This library can be used to manage the cos_agent relation interface:
- `COSAgentProvider`: Use in machine charms that need to have a workload's metrics
or logs scraped, or forward rule files or dashboards to Prometheus, Loki or Grafana through
the Grafana Agent machine charm.
- `COSAgentConsumer`: Used in the Grafana Agent machine charm to manage the requirer side of
the `cos_agent` interface.
## COSAgentProvider Library Usage
Grafana Agent machine Charmed Operator interacts with its clients using the cos_agent library.
Charms seeking to send telemetry, must do so using the `COSAgentProvider` object from
this charm library.
Using the `COSAgentProvider` object only requires instantiating it,
typically in the `__init__` method of your charm (the one which sends telemetry).
The constructor of `COSAgentProvider` has only one required and nine optional parameters:
```python
def __init__(
self,
charm: CharmType,
relation_name: str = DEFAULT_RELATION_NAME,
metrics_endpoints: Optional[List[_MetricsEndpointDict]] = None,
metrics_rules_dir: str = "./src/prometheus_alert_rules",
logs_rules_dir: str = "./src/loki_alert_rules",
recurse_rules_dirs: bool = False,
log_slots: Optional[List[str]] = None,
dashboard_dirs: Optional[List[str]] = None,
refresh_events: Optional[List] = None,
scrape_configs: Optional[Union[List[Dict], Callable]] = None,
):
```
### Parameters
- `charm`: The instance of the charm that instantiates `COSAgentProvider`, typically `self`.
- `relation_name`: If your charmed operator uses a relation name other than `cos-agent` to use
the `cos_agent` interface, this is where you have to specify that.
- `metrics_endpoints`: In this parameter you can specify the metrics endpoints that Grafana Agent
machine Charmed Operator will scrape. The configs of this list will be merged with the configs
from `scrape_configs`.
- `metrics_rules_dir`: The directory in which the Charmed Operator stores its metrics alert rules
files.
- `logs_rules_dir`: The directory in which the Charmed Operator stores its logs alert rules files.
- `recurse_rules_dirs`: This parameters set whether Grafana Agent machine Charmed Operator has to
search alert rules files recursively in the previous two directories or not.
- `log_slots`: Snap slots to connect to for scraping logs in the form ["snap-name:slot", ...].
- `dashboard_dirs`: List of directories where the dashboards are stored in the Charmed Operator.
- `refresh_events`: List of events on which to refresh relation data.
- `scrape_configs`: List of standard scrape_configs dicts or a callable that returns the list in
case the configs need to be generated dynamically. The contents of this list will be merged
with the configs from `metrics_endpoints`.
### Example 1 - Minimal instrumentation:
In order to use this object the following should be in the `charm.py` file.
```python
from charms.grafana_agent.v0.cos_agent import COSAgentProvider
...
class TelemetryProviderCharm(CharmBase):
def __init__(self, *args):
...
self._grafana_agent = COSAgentProvider(self)
```
### Example 2 - Full instrumentation:
In order to use this object the following should be in the `charm.py` file.
```python
from charms.grafana_agent.v0.cos_agent import COSAgentProvider
...
class TelemetryProviderCharm(CharmBase):
def __init__(self, *args):
...
self._grafana_agent = COSAgentProvider(
self,
relation_name="custom-cos-agent",
metrics_endpoints=[
# specify "path" and "port" to scrape from localhost
{"path": "/metrics", "port": 9000},
{"path": "/metrics", "port": 9001},
{"path": "/metrics", "port": 9002},
],
metrics_rules_dir="./src/alert_rules/prometheus",
logs_rules_dir="./src/alert_rules/loki",
recurse_rules_dirs=True,
log_slots=["my-app:slot"],
dashboard_dirs=["./src/dashboards_1", "./src/dashboards_2"],
refresh_events=["update-status", "upgrade-charm"],
scrape_configs=[
{
"job_name": "custom_job",
"metrics_path": "/metrics",
"authorization": {"credentials": "bearer-token"},
"static_configs": [
{
"targets": ["localhost:9003"]},
"labels": {"key": "value"},
},
],
},
]
)
```
### Example 3 - Dynamic scrape configs generation:
Pass a function to the `scrape_configs` to decouple the generation of the configs
from the instantiation of the COSAgentProvider object.
```python
from charms.grafana_agent.v0.cos_agent import COSAgentProvider
...
class TelemetryProviderCharm(CharmBase):
def generate_scrape_configs(self):
return [
{
"job_name": "custom",
"metrics_path": "/metrics",
"static_configs": [{"targets": ["localhost:9000"]}],
},
]
def __init__(self, *args):
...
self._grafana_agent = COSAgentProvider(
self,
scrape_configs=self.generate_scrape_configs,
)
```
## COSAgentConsumer Library Usage
This object may be used by any Charmed Operator which gathers telemetry data by
implementing the consumer side of the `cos_agent` interface.
For instance Grafana Agent machine Charmed Operator.
For this purpose the charm needs to instantiate the `COSAgentConsumer` object with one mandatory
and two optional arguments.
### Parameters
- `charm`: A reference to the parent (Grafana Agent machine) charm.
- `relation_name`: The name of the relation that the charm uses to interact
with its clients that provides telemetry data using the `COSAgentProvider` object.
If provided, this relation name must match a provided relation in metadata.yaml with the
`cos_agent` interface.
The default value of this argument is "cos-agent".
- `refresh_events`: List of events on which to refresh relation data.
### Example 1 - Minimal instrumentation:
In order to use this object the following should be in the `charm.py` file.
```python
from charms.grafana_agent.v0.cos_agent import COSAgentConsumer
...
class GrafanaAgentMachineCharm(GrafanaAgentCharm)
def __init__(self, *args):
...
self._cos = COSAgentRequirer(self)
```
### Example 2 - Full instrumentation:
In order to use this object the following should be in the `charm.py` file.
```python
from charms.grafana_agent.v0.cos_agent import COSAgentConsumer
...
class GrafanaAgentMachineCharm(GrafanaAgentCharm)
def __init__(self, *args):
...
self._cos = COSAgentRequirer(
self,
relation_name="cos-agent-consumer",
refresh_events=["update-status", "upgrade-charm"],
)
```
"""
import base64
import json
import logging
import lzma
from collections import namedtuple
from itertools import chain
from pathlib import Path
from typing import TYPE_CHECKING, Any, Callable, ClassVar, Dict, List, Optional, Set, Union
import pydantic
from cosl import JujuTopology
from cosl.rules import AlertRules
from ops.charm import RelationChangedEvent
from ops.framework import EventBase, EventSource, Object, ObjectEvents
from ops.model import Relation, Unit
from ops.testing import CharmType
if TYPE_CHECKING:
try:
from typing import TypedDict
class _MetricsEndpointDict(TypedDict):
path: str
port: int
except ModuleNotFoundError:
_MetricsEndpointDict = Dict # pyright: ignore
LIBID = "dc15fa84cef84ce58155fb84f6c6213a"
LIBAPI = 0
LIBPATCH = 6
PYDEPS = ["cosl", "pydantic < 2"]
DEFAULT_RELATION_NAME = "cos-agent"
DEFAULT_PEER_RELATION_NAME = "peers"
DEFAULT_SCRAPE_CONFIG = {
"static_configs": [{"targets": ["localhost:80"]}],
"metrics_path": "/metrics",
}
logger = logging.getLogger(__name__)
SnapEndpoint = namedtuple("SnapEndpoint", "owner, name")
class GrafanaDashboard(str):
"""Grafana Dashboard encoded json; lzma-compressed."""
# TODO Replace this with a custom type when pydantic v2 released (end of 2023 Q1?)
# https://github.com/pydantic/pydantic/issues/4887
@staticmethod
def _serialize(raw_json: Union[str, bytes]) -> "GrafanaDashboard":
if not isinstance(raw_json, bytes):
raw_json = raw_json.encode("utf-8")
encoded = base64.b64encode(lzma.compress(raw_json)).decode("utf-8")
return GrafanaDashboard(encoded)
def _deserialize(self) -> Dict:
try:
raw = lzma.decompress(base64.b64decode(self.encode("utf-8"))).decode()
return json.loads(raw)
except json.decoder.JSONDecodeError as e:
logger.error("Invalid Dashboard format: %s", e)
return {}
def __repr__(self):
"""Return string representation of self."""
return "<GrafanaDashboard>"
class CosAgentProviderUnitData(pydantic.BaseModel):
"""Unit databag model for `cos-agent` relation."""
# The following entries are the same for all units of the same principal.
# Note that the same grafana agent subordinate may be related to several apps.
# this needs to make its way to the gagent leader
metrics_alert_rules: dict
log_alert_rules: dict
dashboards: List[GrafanaDashboard]
subordinate: Optional[bool]
# The following entries may vary across units of the same principal app.
# this data does not need to be forwarded to the gagent leader
metrics_scrape_jobs: List[Dict]
log_slots: List[str]
# when this whole datastructure is dumped into a databag, it will be nested under this key.
# while not strictly necessary (we could have it 'flattened out' into the databag),
# this simplifies working with the model.
KEY: ClassVar[str] = "config"
class CosAgentPeersUnitData(pydantic.BaseModel):
"""Unit databag model for `peers` cos-agent machine charm peer relation."""
# We need the principal unit name and relation metadata to be able to render identifiers
# (e.g. topology) on the leader side, after all the data moves into peer data (the grafana
# agent leader can only see its own principal, because it is a subordinate charm).
principal_unit_name: str
principal_relation_id: str
principal_relation_name: str
# The only data that is forwarded to the leader is data that needs to go into the app databags
# of the outgoing o11y relations.
metrics_alert_rules: Optional[dict]
log_alert_rules: Optional[dict]
dashboards: Optional[List[GrafanaDashboard]]
# when this whole datastructure is dumped into a databag, it will be nested under this key.
# while not strictly necessary (we could have it 'flattened out' into the databag),
# this simplifies working with the model.
KEY: ClassVar[str] = "config"
@property
def app_name(self) -> str:
"""Parse out the app name from the unit name.
TODO: Switch to using `model_post_init` when pydantic v2 is released?
https://github.com/pydantic/pydantic/issues/1729#issuecomment-1300576214
"""
return self.principal_unit_name.split("/")[0]
class COSAgentProvider(Object):
"""Integration endpoint wrapper for the provider side of the cos_agent interface."""
def __init__(
self,
charm: CharmType,
relation_name: str = DEFAULT_RELATION_NAME,
metrics_endpoints: Optional[List["_MetricsEndpointDict"]] = None,
metrics_rules_dir: str = "./src/prometheus_alert_rules",
logs_rules_dir: str = "./src/loki_alert_rules",
recurse_rules_dirs: bool = False,
log_slots: Optional[List[str]] = None,
dashboard_dirs: Optional[List[str]] = None,
refresh_events: Optional[List] = None,
*,
scrape_configs: Optional[Union[List[dict], Callable]] = None,
):
"""Create a COSAgentProvider instance.
Args:
charm: The `CharmBase` instance that is instantiating this object.
relation_name: The name of the relation to communicate over.
metrics_endpoints: List of endpoints in the form [{"path": path, "port": port}, ...].
This argument is a simplified form of the `scrape_configs`.
The contents of this list will be merged with the contents of `scrape_configs`.
metrics_rules_dir: Directory where the metrics rules are stored.
logs_rules_dir: Directory where the logs rules are stored.
recurse_rules_dirs: Whether to recurse into rule paths.
log_slots: Snap slots to connect to for scraping logs
in the form ["snap-name:slot", ...].
dashboard_dirs: Directory where the dashboards are stored.
refresh_events: List of events on which to refresh relation data.
scrape_configs: List of standard scrape_configs dicts or a callable
that returns the list in case the configs need to be generated dynamically.
The contents of this list will be merged with the contents of `metrics_endpoints`.
"""
super().__init__(charm, relation_name)
dashboard_dirs = dashboard_dirs or ["./src/grafana_dashboards"]
self._charm = charm
self._relation_name = relation_name
self._metrics_endpoints = metrics_endpoints or []
self._scrape_configs = scrape_configs or []
self._metrics_rules = metrics_rules_dir
self._logs_rules = logs_rules_dir
self._recursive = recurse_rules_dirs
self._log_slots = log_slots or []
self._dashboard_dirs = dashboard_dirs
self._refresh_events = refresh_events or [self._charm.on.config_changed]
events = self._charm.on[relation_name]
self.framework.observe(events.relation_joined, self._on_refresh)
self.framework.observe(events.relation_changed, self._on_refresh)
for event in self._refresh_events:
self.framework.observe(event, self._on_refresh)
def _on_refresh(self, event):
"""Trigger the class to update relation data."""
relations = self._charm.model.relations[self._relation_name]
for relation in relations:
# Before a principal is related to the grafana-agent subordinate, we'd get
# ModelError: ERROR cannot read relation settings: unit "zk/2": settings not found
# Add a guard to make sure it doesn't happen.
if relation.data and self._charm.unit in relation.data:
# Subordinate relations can communicate only over unit data.
try:
data = CosAgentProviderUnitData(
metrics_alert_rules=self._metrics_alert_rules,
log_alert_rules=self._log_alert_rules,
dashboards=self._dashboards,
metrics_scrape_jobs=self._scrape_jobs,
log_slots=self._log_slots,
subordinate=self._charm.meta.subordinate,
)
relation.data[self._charm.unit][data.KEY] = data.json()
except (
pydantic.ValidationError,
json.decoder.JSONDecodeError,
) as e:
logger.error("Invalid relation data provided: %s", e)
@property
def _scrape_jobs(self) -> List[Dict]:
"""Return a prometheus_scrape-like data structure for jobs.
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
"""
if callable(self._scrape_configs):
scrape_configs = self._scrape_configs()
else:
# Create a copy of the user scrape_configs, since we will mutate this object
scrape_configs = self._scrape_configs.copy()
# Convert "metrics_endpoints" to standard scrape_configs, and add them in
for endpoint in self._metrics_endpoints:
scrape_configs.append(
{
"metrics_path": endpoint["path"],
"static_configs": [{"targets": [f"localhost:{endpoint['port']}"]}],
}
)
scrape_configs = scrape_configs or [DEFAULT_SCRAPE_CONFIG]
# Augment job name to include the app name and a unique id (index)
for idx, scrape_config in enumerate(scrape_configs):
scrape_config["job_name"] = "_".join(
[self._charm.app.name, str(idx), scrape_config.get("job_name", "default")]
)
return scrape_configs
@property
def _metrics_alert_rules(self) -> Dict:
"""Use (for now) the prometheus_scrape AlertRules to initialize this."""
alert_rules = AlertRules(
query_type="promql", topology=JujuTopology.from_charm(self._charm)
)
alert_rules.add_path(self._metrics_rules, recursive=self._recursive)
return alert_rules.as_dict()
@property
def _log_alert_rules(self) -> Dict:
"""Use (for now) the loki_push_api AlertRules to initialize this."""
alert_rules = AlertRules(query_type="logql", topology=JujuTopology.from_charm(self._charm))
alert_rules.add_path(self._logs_rules, recursive=self._recursive)
return alert_rules.as_dict()
@property
def _dashboards(self) -> List[GrafanaDashboard]:
dashboards: List[GrafanaDashboard] = []
for d in self._dashboard_dirs:
for path in Path(d).glob("*"):
dashboard = GrafanaDashboard._serialize(path.read_bytes())
dashboards.append(dashboard)
return dashboards
class COSAgentDataChanged(EventBase):
"""Event emitted by `COSAgentRequirer` when relation data changes."""
class COSAgentValidationError(EventBase):
"""Event emitted by `COSAgentRequirer` when there is an error in the relation data."""
def __init__(self, handle, message: str = ""):
super().__init__(handle)
self.message = message
def snapshot(self) -> Dict:
"""Save COSAgentValidationError source information."""
return {"message": self.message}
def restore(self, snapshot):
"""Restore COSAgentValidationError source information."""
self.message = snapshot["message"]
class COSAgentRequirerEvents(ObjectEvents):
"""`COSAgentRequirer` events."""
data_changed = EventSource(COSAgentDataChanged)
validation_error = EventSource(COSAgentValidationError)
class MultiplePrincipalsError(Exception):
"""Custom exception for when there are multiple principal applications."""
pass
class COSAgentRequirer(Object):
"""Integration endpoint wrapper for the Requirer side of the cos_agent interface."""
on = COSAgentRequirerEvents() # pyright: ignore
def __init__(
self,
charm: CharmType,
*,
relation_name: str = DEFAULT_RELATION_NAME,
peer_relation_name: str = DEFAULT_PEER_RELATION_NAME,
refresh_events: Optional[List[str]] = None,
):
"""Create a COSAgentRequirer instance.
Args:
charm: The `CharmBase` instance that is instantiating this object.
relation_name: The name of the relation to communicate over.
peer_relation_name: The name of the peer relation to communicate over.
refresh_events: List of events on which to refresh relation data.
"""
super().__init__(charm, relation_name)
self._charm = charm
self._relation_name = relation_name
self._peer_relation_name = peer_relation_name
self._refresh_events = refresh_events or [self._charm.on.config_changed]
events = self._charm.on[relation_name]
self.framework.observe(
events.relation_joined, self._on_relation_data_changed
) # TODO: do we need this?
self.framework.observe(events.relation_changed, self._on_relation_data_changed)
for event in self._refresh_events:
self.framework.observe(event, self.trigger_refresh) # pyright: ignore
# Peer relation events
# A peer relation is needed as it is the only mechanism for exchanging data across
# subordinate units.
# self.framework.observe(
# self.on[self._peer_relation_name].relation_joined, self._on_peer_relation_joined
# )
peer_events = self._charm.on[peer_relation_name]
self.framework.observe(peer_events.relation_changed, self._on_peer_relation_changed)
@property
def peer_relation(self) -> Optional["Relation"]:
"""Helper function for obtaining the peer relation object.
Returns: peer relation object
(NOTE: would return None if called too early, e.g. during install).
"""
return self.model.get_relation(self._peer_relation_name)
def _on_peer_relation_changed(self, _):
# Peer data is used for forwarding data from principal units to the grafana agent
# subordinate leader, for updating the app data of the outgoing o11y relations.
if self._charm.unit.is_leader():
self.on.data_changed.emit() # pyright: ignore
def _on_relation_data_changed(self, event: RelationChangedEvent):
# Peer data is the only means of communication between subordinate units.
if not self.peer_relation:
event.defer()
return
cos_agent_relation = event.relation
if not event.unit or not cos_agent_relation.data.get(event.unit):
return
principal_unit = event.unit
# Coherence check
units = cos_agent_relation.units
if len(units) > 1:
# should never happen
raise ValueError(
f"unexpected error: subordinate relation {cos_agent_relation} "
f"should have exactly one unit"
)
if not (raw := cos_agent_relation.data[principal_unit].get(CosAgentProviderUnitData.KEY)):
return
if not (provider_data := self._validated_provider_data(raw)):
return
# Copy data from the principal relation to the peer relation, so the leader could
# follow up.
# Save the originating unit name, so it could be used for topology later on by the leader.
data = CosAgentPeersUnitData( # peer relation databag model
principal_unit_name=event.unit.name,
principal_relation_id=str(event.relation.id),
principal_relation_name=event.relation.name,
metrics_alert_rules=provider_data.metrics_alert_rules,
log_alert_rules=provider_data.log_alert_rules,
dashboards=provider_data.dashboards,
)
self.peer_relation.data[self._charm.unit][
f"{CosAgentPeersUnitData.KEY}-{event.unit.name}"
] = data.json()
# We can't easily tell if the data that was changed is limited to only the data
# that goes into peer relation (in which case, if this is not a leader unit, we wouldn't
# need to emit `on.data_changed`), so we're emitting `on.data_changed` either way.
self.on.data_changed.emit() # pyright: ignore
def _validated_provider_data(self, raw) -> Optional[CosAgentProviderUnitData]:
try:
return CosAgentProviderUnitData(**json.loads(raw))
except (pydantic.ValidationError, json.decoder.JSONDecodeError) as e:
self.on.validation_error.emit(message=str(e)) # pyright: ignore
return None
def trigger_refresh(self, _):
"""Trigger a refresh of relation data."""
# FIXME: Figure out what we should do here
self.on.data_changed.emit() # pyright: ignore
@property
def _principal_unit(self) -> Optional[Unit]:
"""Return the principal unit for a relation.
Assumes that the relation is of type subordinate.
Relies on the fact that, for subordinate relations, the only remote unit visible to
*this unit* is the principal unit that this unit is attached to.
"""
if relations := self._principal_relations:
# Technically it's a list, but for subordinates there can only be one relation
principal_relation = next(iter(relations))
if units := principal_relation.units:
# Technically it's a list, but for subordinates there can only be one
return next(iter(units))
return None
@property
def _principal_relations(self):
relations = []
for relation in self._charm.model.relations[self._relation_name]:
if not json.loads(relation.data[next(iter(relation.units))]["config"]).get(
["subordinate"], False
):
relations.append(relation)
if len(relations) > 1:
logger.error(
"Multiple applications claiming to be principal. Update the cos-agent library in the client application charms."
)
raise MultiplePrincipalsError("Multiple principal applications.")
return relations
@property
def _remote_data(self) -> List[CosAgentProviderUnitData]:
"""Return a list of remote data from each of the related units.
Assumes that the relation is of type subordinate.
Relies on the fact that, for subordinate relations, the only remote unit visible to
*this unit* is the principal unit that this unit is attached to.
"""
all_data = []
for relation in self._charm.model.relations[self._relation_name]:
if not relation.units:
continue
unit = next(iter(relation.units))
if not (raw := relation.data[unit].get(CosAgentProviderUnitData.KEY)):
continue
if not (provider_data := self._validated_provider_data(raw)):
continue
all_data.append(provider_data)
return all_data
def _gather_peer_data(self) -> List[CosAgentPeersUnitData]:
"""Collect data from the peers.
Returns a trimmed-down list of CosAgentPeersUnitData.
"""
relation = self.peer_relation
# Ensure that whatever context we're running this in, we take the necessary precautions:
if not relation or not relation.data or not relation.app:
return []
# Iterate over all peer unit data and only collect every principal once.
peer_data: List[CosAgentPeersUnitData] = []
app_names: Set[str] = set()
for unit in chain((self._charm.unit,), relation.units):
if not relation.data.get(unit):
continue
for unit_name in relation.data.get(unit): # pyright: ignore
if not unit_name.startswith(CosAgentPeersUnitData.KEY):
continue
raw = relation.data[unit].get(unit_name)
if raw is None:
continue
data = CosAgentPeersUnitData(**json.loads(raw))
# Have we already seen this principal app?
if (app_name := data.app_name) in app_names:
continue
peer_data.append(data)
app_names.add(app_name)
return peer_data
@property
def metrics_alerts(self) -> Dict[str, Any]:
"""Fetch metrics alerts."""
alert_rules = {}
seen_apps: List[str] = []
for data in self._gather_peer_data():
if rules := data.metrics_alert_rules:
app_name = data.app_name
if app_name in seen_apps:
continue # dedup!
seen_apps.append(app_name)
# This is only used for naming the file, so be as specific as we can be
identifier = JujuTopology(
model=self._charm.model.name,
model_uuid=self._charm.model.uuid,
application=app_name,
# For the topology unit, we could use `data.principal_unit_name`, but that unit
# name may not be very stable: `_gather_peer_data` de-duplicates by app name so
# the exact unit name that turns up first in the iterator may vary from time to
# time. So using the grafana-agent unit name instead.
unit=self._charm.unit.name,
).identifier
alert_rules[identifier] = rules
return alert_rules
@property
def metrics_jobs(self) -> List[Dict]:
"""Parse the relation data contents and extract the metrics jobs."""
scrape_jobs = []
for data in self._remote_data:
for job in data.metrics_scrape_jobs:
# In #220, relation schema changed from a simplified dict to the standard
# `scrape_configs`.
# This is to ensure backwards compatibility with Providers older than v0.5.
if "path" in job and "port" in job and "job_name" in job:
job = {
"job_name": job["job_name"],
"metrics_path": job["path"],
"static_configs": [{"targets": [f"localhost:{job['port']}"]}],
}
scrape_jobs.append(job)
return scrape_jobs
@property
def snap_log_endpoints(self) -> List[SnapEndpoint]:
"""Fetch logging endpoints exposed by related snaps."""
plugs = []
for data in self._remote_data:
targets = data.log_slots
if targets:
for target in targets:
if target in plugs:
logger.warning(
f"plug {target} already listed. "
"The same snap is being passed from multiple "
"endpoints; this should not happen."
)
else:
plugs.append(target)
endpoints = []
for plug in plugs:
if ":" not in plug:
logger.error(f"invalid plug definition received: {plug}. Ignoring...")
else:
endpoint = SnapEndpoint(*plug.split(":"))
endpoints.append(endpoint)
return endpoints
@property
def logs_alerts(self) -> Dict[str, Any]:
"""Fetch log alerts."""
alert_rules = {}
seen_apps: List[str] = []
for data in self._gather_peer_data():
if rules := data.log_alert_rules:
# This is only used for naming the file, so be as specific as we can be
app_name = data.app_name
if app_name in seen_apps:
continue # dedup!
seen_apps.append(app_name)
identifier = JujuTopology(
model=self._charm.model.name,
model_uuid=self._charm.model.uuid,
application=app_name,
# For the topology unit, we could use `data.principal_unit_name`, but that unit
# name may not be very stable: `_gather_peer_data` de-duplicates by app name so
# the exact unit name that turns up first in the iterator may vary from time to
# time. So using the grafana-agent unit name instead.
unit=self._charm.unit.name,
).identifier
alert_rules[identifier] = rules
return alert_rules
@property
def dashboards(self) -> List[Dict[str, str]]:
"""Fetch dashboards as encoded content.
Dashboards are assumed not to vary across units of the same primary.
"""
dashboards: List[Dict[str, Any]] = []
seen_apps: List[str] = []
for data in self._gather_peer_data():
app_name = data.app_name
if app_name in seen_apps:
continue # dedup!
seen_apps.append(app_name)
for encoded_dashboard in data.dashboards or ():
content = GrafanaDashboard(encoded_dashboard)._deserialize()
title = content.get("title", "no_title")
dashboards.append(
{
"relation_id": data.principal_relation_id,
# We have the remote charm name - use it for the identifier
"charm": f"{data.principal_relation_name}-{app_name}",
"content": content,
"title": title,
}
)
return dashboards

@@ -1 +0,0 @@
src/metadata.yaml

metadata.yaml (new file)
@@ -0,0 +1,21 @@
# This file populates the Overview on Charmhub.
# See https://juju.is/docs/sdk/metadata-reference for a checklist and guidance.
name: magpie
summary: Magpie layer to test networking - ICMP and DNS
maintainer: OpenStack Charmers <openstack-charmers@lists.ubuntu.com>
description: |
Magpie will check ICMP, DNS, MTU and rx/tx speed between itself and any
peer units deployed - deploy more than one magpie unit for meaningful results.
tags: [testing, CI]
provides:
# https://charmhub.io/grafana-agent/libraries/cos_agent
cos-agent:
interface: cos_agent
peers:
magpie:
interface: magpie2
series:
- focal
- jammy
- lunar
- mantic

@@ -1,13 +1,8 @@
- project:
templates:
- charm-unit-jobs-py38
- charm-unit-jobs-py310
check:
jobs:
- focal
- jammy
vars:
needs_charm_build: true
charm_build_name: magpie
build_type: charmcraft
charmcraft_channel: 2.x/edge
charmcraft_channel: 2.x/stable

pyproject.toml (new file)
@@ -0,0 +1,47 @@
# Testing tools configuration
[tool.coverage.run]
branch = true
[tool.coverage.report]
show_missing = true
[tool.pytest.ini_options]
minversion = "6.0"
log_cli_level = "INFO"
# Formatting tools configuration
[tool.black]
line-length = 99
target-version = ["py38"]
# Linting tools configuration
[lint]
line-length = 99
select = ["E", "W", "F", "C", "N", "D", "I001"]
extend-ignore = [
"C901",
"D203",
"D204",
"D213",
"D215",
"D400",
"D404",
"D406",
"D407",
"D408",
"D409",
"D413",
]
ignore = ["E501", "D107"]
extend-exclude = ["__pycache__", "*.egg_info"]
per-file-ignores = {"tests/*" = ["D100","D101","D102","D103","D104"]}
[lint.mccabe]
max-complexity = 10
[tool.codespell]
skip = "build,lib,venv,icon.svg,.tox,.git,.mypy_cache,.ruff_cache,.coverage,cover"
[tool.pyright]
include = ["src/**.py", "tests/**.py"]

@@ -1,5 +0,0 @@
# This file is used to trigger rebuilds
# when dependencies of the charm change,
# but nothing in the charm needs to.
# simply change the uuid to something new
53cb6df6-1178-11ec-b383-bf4fe629ca15

@@ -1,20 +1,10 @@
# This file is managed centrally by release-tools and should not be modified
# within individual charm repos. See the 'global' dir contents for available
# choices of *requirements.txt files for OpenStack Charms:
# https://github.com/openstack-charmers/release-tools
#
# NOTE(lourot): This might look like a duplication of test-requirements.txt but
# some tox targets use only test-requirements.txt whereas charm-build uses only
# requirements.txt
setuptools<50.0.0 # https://github.com/pypa/setuptools/commit/04e3df22df840c6bb244e9b27bc56750c44b7c85
ops ~= 2.4
netifaces ~= 0.11.0
netaddr ~= 0.8.0
pyyaml ~= 6.0.1
psutil ~= 5.9.5
prometheus-client ~= 0.17.1
# NOTE: newer versions of cryptography require a Rust compiler to build,
# see
# * https://github.com/openstack-charmers/zaza/issues/421
# * https://mail.python.org/pipermail/cryptography-dev/2021-January/001003.html
#
cryptography<3.4
git+https://github.com/juju/charm-tools.git
simplejson
# for lib/charms/grafana_agent/v0/cos_agent.py
cosl
pydantic < 2

@@ -1,202 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View File

@ -1,425 +0,0 @@
# Overview
Magpie is a charm used for testing the networking of a Juju provider/substrate.
Simply deploy the Magpie charm to at least two units and watch the status messages and
debug logs.
Magpie will test:
- DNS functionality
- Local hostname lookup
- ICMP between peers
- MTU between leader and clients
- Transfer between leader and clients
Note: **MTU and transfer speed are tested with iperf2**
Status messages will show the unit numbers that have issues - if there are
no problems, there will not be a verbose status message.
All strings, queries, and actions are logged in the Juju logs.
# MTU Notes
The MTU size reported by iperf is sometimes 8 or 12 bytes less than the configured
MTU on the interface. This is due to TCP options not being included in the measurement,
and therefore we ignore that difference and report everything OK.
# Workload Status
In addition to ICMP and DNS status messages, if a networking problem is
detected, the workload status of the agent which has found the issues
will be set to blocked.
# Reactive States
This layer will set the following states:
- **`magpie-icmp.failed`** ICMP has failed to one or more units in the peer relation.
- **`magpie-dns.failed`** DNS has failed to one or more units in the peer relation.
Note: work stopped on these states as it is currently unlikely magpie will be consumed
as a layer.
Please open an issue against this github repo if more states are required.
# Usage
```
juju deploy magpie -n 2
juju deploy magpie -n 1 --to lxd:1
```
This charm supports several config values for tuning behaviour.
Please refer to ./src/config.yaml or run `juju config magpie`.
Example of adjusting config:
```
juju config magpie dns_server=8.8.8.8 required_mtu=9000 min_speed=1000
```
## Network spaces
If you use network spaces in your Juju deployment (as you should) use
`--bind '<space-name> magpie=<space-name>'` to force magpie to test that
particular network space.
It is possible to deploy several magpie charms
(as different Juju applications) to the same server each in a different
network space.
Example:
```
juju deploy -m magpie cs:~openstack-charmers/magpie magpie-space1 --bind "space1 magpie=space1" -n 5 --to 0,2,1,4,3
juju deploy -m magpie cs:~openstack-charmers/magpie magpie-space2 --bind "space2 magpie=space2" -n 3 --to 3,2,0
juju deploy -m magpie cs:~openstack-charmers/magpie magpie-space3 --bind "space3 magpie=space3" -n 4 --to 3,2,1,0
juju deploy -m magpie cs:~openstack-charmers/magpie magpie-space4 --bind "space4 magpie=space4" -n 4 --to 3,2,1,0
```
## Bonded links testing and troubleshooting
Network bonding enables the combination of two or more network interfaces into a single-bonded
(logical) interface, which increases the bandwidth and provides redundancy. While Magpie does some
sanity checks and can reveal some configuration problems, this part of the README contains some
advanced troubleshooting information that may be useful when identifying and fixing bonding issues.
There are seven bonding modes:
### `balance-rr`
Round-robin policy: Transmit packets in sequential order from the first available slave through the
last. This mode provides load balancing and fault tolerance.
### `active-backup`
Active-backup policy: Only one slave in the bond is active. A different slave becomes active if, and
only if, the active slave fails. The bond's MAC address is externally visible on only one port
(network adapter) to avoid confusing the switch. This mode provides fault tolerance. The primary
option affects the behavior of this mode.
### `balance-xor`
XOR policy: Transmit based on a selectable hashing algorithm. The default policy is a simple
source+destination MAC address algorithm. Alternate transmit policies may be selected via the
`xmit_hash_policy` option. This mode provides load balancing and fault tolerance.
### `broadcast`
Broadcast policy: transmits everything on all slave interfaces. This mode provides fault tolerance.
### `802.3ad` (LACP)
Link Aggregation Control Protocol (IEEE 802.3ad LACP) is a control protocol that automatically
detects multiple links between two LACP enabled devices and configures them to use their maximum
possible bandwidth by automatically trunking the links together. This mode has a prerequisite -
the switch(es) ports should have LACP configured and enabled.
### `balance-tlb`
Adaptive transmit load balancing: channel bonding that does not require any special switch support.
The outgoing traffic is distributed according to the current load (computed relative to the speed)
on each slave. Incoming traffic is received by the current slave. If the receiving slave fails,
another slave takes over the MAC address of the failed receiving slave.
### `balance-alb`
Adaptive load balancing: includes balance-tlb plus receive load balancing (rlb) for IPV4 traffic,
and does not require any special switch support. The receive load balancing is achieved by ARP
negotiation.
The most commonly used modes are `active-backup` and `802.3ad` (LACP). While active-backup
does not require any third-party configuration, it has its own cons - for example, it cannot multiply
the total bandwidth of the link, whereas an 802.3ad-based bond can utilize all bond members, thereby
multiplying the bandwidth. However, in order to get a fully working LACP link, an appropriate
configuration has to be applied on both the actor (link initiator) and partner (switch) side. Any
misconfiguration could lead to link loss or instability, therefore it's very important to have
correct settings applied to both sides of the link.
A quick overview of the LACP link status could be obtained by reading the
`/proc/net/bonding/<bond_name>` file.
```
$ sudo cat /proc/net/bonding/bondM
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 82:23:80:a1:a9:d3
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 15
Partner Key: 201
Partner Mac Address: 02:01:00:00:01:01
Slave Interface: eno3
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 3c:ec:ef:19:eb:30
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 82:23:80:a1:a9:d3
port key: 15
port priority: 255
port number: 1
port state: 63
details partner lacp pdu:
system priority: 65534
system mac address: 02:01:00:00:01:01
oper key: 201
port priority: 1
port number: 12
port state: 63
Slave Interface: eno1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 3c:ec:ef:19:eb:2e
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 82:23:80:a1:a9:d3
port key: 15
port priority: 255
port number: 2
port state: 63
details partner lacp pdu:
system priority: 65534
system mac address: 02:01:00:00:01:01
oper key: 201
port priority: 1
port number: 1012
port state: 63
```
The key things an operator should take a look at are (a short sketch for extracting these fields follows the list):
- LACP rate
- Actor Churn State
- Partner Churn State
- Port State
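For convenience, these fields can also be pulled out programmatically. The following is a minimal sketch only - it assumes the bond is named `bondM` as in the example output above, relies on the field labels shown there, and may need elevated privileges to read the proc file:
```python
import re
from pathlib import Path

def lacp_summary(bond: str = "bondM") -> dict:
    """Collect the LACP-related fields worth checking from /proc/net/bonding/<bond>."""
    text = Path(f"/proc/net/bonding/{bond}").read_text()
    fields = ("LACP rate", "Actor Churn State", "Partner Churn State", "port state")
    # Each field can appear several times (per slave or per LACPDU section),
    # so collect every occurrence in file order.
    return {
        field: re.findall(rf"^\s*{re.escape(field)}: (.+)$", text, re.MULTILINE | re.IGNORECASE)
        for field in fields
    }

if __name__ == "__main__":
    for field, values in lacp_summary().items():
        print(f"{field}: {values}")
```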
### LACP rate
The Link Aggregation Control Protocol (LACP) provides a standardized means for exchanging
information between Partner Systems on a link to allow their Link Aggregation Control instances to
reach agreement on the identity of the LAG to which the link belongs, move the link to that LAG, and
enable its transmission and reception functions in an orderly manner. The protocol depends upon the
transmission of information and state, rather than the transmission of commands. LACPDUs (LACP Data
Unit) sent by the first party (the Actor) convey to the second party (the Actor's protocol Partner)
what the Actor knows, both about its own state and that of the Partner.
Periodic transmission of LACPDUs occurs if the LACP Activity control of either the Actor or the
Partner is Active LACP. These periodic transmissions will occur at either a slow or fast
transmission rate depending upon the expressed LACP_Timeout preference (Long Timeout or Short
Timeout) of the Partner System.
### Actor/Partner Churn State
In general, "Churned" port status means that the parties are unable to reach agreement upon the
desired state of a link. Under normal operation of the protocol, such a resolution would be reached
very rapidly; continued failure to reach agreement can be symptomatic of component failure, of the
presence of non-standard devices on the link concerned, or of mis-configuration. Hence, detection of
such failures is signalled by the Churn Detection algorithm to the operator in order to prompt
administrative action towards resolution.
### Port State
Both the Actor and Partner states are variables, encoded as individual bits within a single octet,
as follows.
0) LACP_Activity: Device intends to transmit periodically in order to find potential
members for the aggregate. Active LACP is encoded as a 1; Passive LACP as a 0.
1) LACP_Timeout: This flag indicates the Timeout control value with regard to this link. Short
Timeout is encoded as a 1; Long Timeout as a 0.
2) Aggregability: This flag indicates that the system considers this link to be Aggregatable; i.e.,
a potential candidate for aggregation. If FALSE (encoded as a 0), the link is considered to be
Individual; i.e., this link can be operated only as an individual link. Aggregatable is encoded as a
1; Individual is encoded as a 0.
3) Synchronization: Indicates that the bond on the transmitting machine is in sync with what's being
advertised in the LACP frames, meaning the link has been allocated to the correct LAG, the group has
been associated with a compatible Aggregator, and the identity of the LAG is consistent with the
System ID and operational Key information transmitted. "In Sync" is encoded as a 1; "Out of sync" is
encoded as a 0.
4) Collecting: Bond is accepting traffic received on this port, collection of incoming frames on
this link is definitely enabled and is not expected to be disabled in the absence of administrative
changes or changes in received protocol information. True is encoded as a 1; False is encoded as a
0.
5) Distributing: Bond is sending traffic using this port. Same as above, but for egress
traffic. True is encoded as a 1; False is encoded as a 0.
6) Defaulted: Indicates whether the receiving bond is using default (administratively defined)
partner parameters or information received in an LACP PDU. Default settings are encoded as a 1;
information from an LACP PDU is encoded as a 0.
7) Expired: Whether the bond is in the expired state. Yes is encoded as a 1; No as a 0.
In the example output above, both of the port states are equal to 63. Let's decode:
```
$ python3
Python 3.8.4 (default, Jul 17 2020, 15:44:37)
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> bin(63)
'0b111111'
```
Reading from right to left (bit 0 first):
LACP Activity: Active
LACP Timeout: Short
Aggregability: Link is Aggregatable
Synchronization: Link in sync
Collecting: True - bond is accepting the traffic
Distributing: True - bond is sending the traffic
Defaulted: Info received from LACP PDU
Expired: False - link is not expired
The above status represents a **fully healthy bond** without any LACP-related issues. Also, for
the operator's convenience, the [lacp_decoder.py](src/tools/lacp_decoder.py) script can be used to
quickly convert the status to a human-friendly format.
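At its core, that script just tests each bit of the state octet against the flag list above. A minimal sketch of the same idea (flag names as defined earlier; this is an illustration, not the full output of `lacp_decoder.py`):
```python
FLAGS = (
    "LACP Activity",     # bit 0
    "LACP Timeout",      # bit 1
    "Aggregability",     # bit 2
    "Synchronization",   # bit 3
    "Collecting",        # bit 4
    "Distributing",      # bit 5
    "Is Defaulted",      # bit 6
    "Link Expiration",   # bit 7
)

def decode_port_state(port_state: int) -> dict:
    """Map each LACP flag name to its bit value, bit 0 (LACP Activity) first."""
    return {name: (port_state >> bit) & 1 for bit, name in enumerate(FLAGS)}

print(decode_port_state(63))
# Bits 0-5 are set: active LACP, short timeout, aggregatable, in sync,
# collecting and distributing - i.e. a healthy LACP member port.
```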
However, situations where one of the links is misconfigured do happen too, so let's assume
we have the following:
```
$ sudo cat /proc/net/bonding/bondm
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: b4:96:91:6d:20:fc
Active Aggregator Info:
Aggregator ID: 2
Number of ports: 1
Actor Key: 9
Partner Key: 32784
Partner Mac Address: 00:23:04:ee:be:66
Slave Interface: enp197s0f2
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b4:96:91:6d:20:fe
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: churned
Partner Churn State: none
Actor Churned Count: 1
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: b4:96:91:6d:20:fc
port key: 7
port priority: 255
port number: 1
port state: 7
details partner lacp pdu:
system priority: 32667
system mac address: 00:23:04:ee:be:66
oper key: 32784
port priority: 32768
port number: 16661
port state: 13
Slave Interface: enp197s0f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b4:96:91:6d:20:fc
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: b4:96:91:6d:20:fc
port key: 9
port priority: 255
port number: 2
port state: 63
details partner lacp pdu:
system priority: 32667
system mac address: 00:23:04:ee:be:66
oper key: 32784
port priority: 32768
port number: 277
port state: 63
```
As we can see, one of the links has different port states for the actor and the partner, while the second
one has 63 for both - meaning the first link is problematic and needs a closer look.
Let's decode both statuses using the mentioned script:
```
$ python ./lacp_decoder.py 7 13
(Equal for both ports) LACP Activity: Active LACP
LACP Timeout: Short (Port 1) / Long (Port 2)
(Equal for both ports) Aggregability: Aggregatable
Synchronization: Link out of sync (Port 1) / Link in sync (Port 2)
(Equal for both ports) Collecting: Ingress traffic: Rejecting
(Equal for both ports) Distributing: Egress traffic: Not sending
(Equal for both ports) Is Defaulted: Settings are received from LACP PDU
(Equal for both ports) Link Expiration: No
```
The above output means that there are two differences between these statuses: LACP Timeout and
Synchronization. That means two things:
1) the Partner side (the switch side in most cases) has incorrectly configured LACP timeout
control. To resolve this, an operator has to either change the LACP rate on the Actor (e.g. a
server) side to "Slow", or adjust the Partner (e.g. switch) LACP rate to "Fast".
2) the Partner side considers this physical link to be part of a different link aggregation group. The
switch config has to be revisited and the link aggregation group members need to be verified again,
ensuring there are no extra or wrong links configured as part of the single LAG.
After addressing the above issues, the port state will change to 63, which means "LACP link is fully
functional".
# Bugs
Please report bugs on [Launchpad](https://bugs.launchpad.net/charm-magpie/+filebug).
For general questions please refer to the OpenStack [Charm Guide](https://docs.openstack.org/charm-guide/latest/).

View File

@ -1,43 +0,0 @@
listen:
description: |
Instruct unit to listen
properties:
network-cidr:
type: string
description: Network cidr to use for iperf
listener-count:
type: integer
description: Number of listeners to start
advertise:
description: |
Advertise addresses
run-iperf:
description: |
Run iperf
properties:
network-cidr:
type: string
description: Network cidr to use for iperf
units:
type: string
description: Space separated list of units
iperf-batch-time:
type: integer
default: 10
description: |
Maps to iperf -t option, time in seconds to transmit traffic
concurrency-progression:
type: [integer, string]
default: "2 4 8"
description: |
Space separated list of concurrency values for each batch
total-run-time:
type: integer
default: 600
description: |
Total run time for iperf test in seconds
tag:
type: string
default: default
description: |
Tag to use when publishing metrics

View File

@ -1,89 +0,0 @@
#!/usr/local/sbin/charm-env python3
# Copyright 2020 Canonical Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import json
import os
import sys
# Load modules from $CHARM_DIR/lib
sys.path.append('lib')
from charms.layer import basic
basic.bootstrap_charm_deps()
basic.init_config_states()
import charms.reactive as reactive
import charmhelpers.core.hookenv as hookenv
from charms.layer.magpie_tools import Iperf
IPERF_BASE_PORT = 5001
def listen(*args):
action_config = hookenv.action_get()
cidr = action_config.get('network-cidr')
listener_count = action_config.get('listener-count') or 1
magpie = reactive.relations.endpoint_from_flag('magpie.joined')
iperf = Iperf()
for port in range(IPERF_BASE_PORT, IPERF_BASE_PORT + int(listener_count)):
iperf.listen(cidr=cidr, port=port)
magpie.set_iperf_server_ready()
reactive.set_state('iperf.listening')
def advertise(*args):
magpie = reactive.relations.endpoint_from_flag('magpie.joined')
magpie.advertise_addresses()
def run_iperf(*args):
action_config = hookenv.action_get()
cidr = action_config.get('network-cidr')
units = action_config.get('units', '').split()
magpie = reactive.relations.endpoint_from_flag('magpie.joined')
nodes = {ip: name
for name, ip in magpie.get_nodes(cidr=cidr)
if not units or name in units}
iperf = Iperf()
results = iperf.batch_hostcheck(
nodes,
action_config.get('total-run-time'),
action_config.get('iperf-batch-time'),
[int(i) for i in str(
action_config.get('concurrency-progression')
).split()],
tag=action_config.get('tag'))
hookenv.action_set({
"output": json.dumps(results)})
# Actions to function mapping, to allow for illegal python action names that
# can map to a python function.
ACTIONS = {
"listen": listen,
"advertise": advertise,
"run-iperf": run_iperf,
}
def main(args):
action_name = os.path.basename(args[0])
action = ACTIONS[action_name]
action(args)
if __name__ == "__main__":
sys.exit(main(sys.argv))

View File

@ -1 +0,0 @@
actions.py

View File

@ -1 +0,0 @@
actions.py

View File

@ -1 +0,0 @@
actions.py

156
src/charm.py Executable file
View File

@ -0,0 +1,156 @@
#!/usr/bin/env python3
# Copyright 2023 Ubuntu
# See LICENSE file for licensing details.
#
# Learn more at: https://juju.is/docs/sdk
"""Charm for Magpie."""
import json
import logging
import os
from typing import Dict, List
import ops
from charms.grafana_agent.v0.cos_agent import COSAgentProvider
from magpie_tools import (
CollectDataConfig,
DnsConfig,
HostWithIp,
Iperf,
PingConfig,
check_dns,
check_ping,
collect_local_data,
configure_lldpd,
)
from ops.model import ActiveStatus
logger = logging.getLogger(__name__)
class MagpieCharm(ops.CharmBase):
"""Charm the service."""
def __init__(self, *args):
super().__init__(*args)
self.framework.observe(self.on.iperf_action, self._on_iperf_action)
self.framework.observe(self.on.info_action, self._on_info_action)
self.framework.observe(self.on.ping_action, self._on_ping_action)
self.framework.observe(self.on.dns_action, self._on_dns_action)
self.framework.observe(self.on.update_status, self._on_update_status)
self.framework.observe(self.on.install, self._on_install)
self.framework.observe(self.on.start, self._on_start)
self.framework.observe(self.on.magpie_relation_changed, self._on_peers_changed)
self.framework.observe(self.on.config_changed, self._on_config_changed)
self._grafana_agent = COSAgentProvider(
self,
metrics_endpoints=[
{"path": "/metrics", "port": 80},
],
dashboard_dirs=["./src/grafana_dashboards"],
)
def _on_install(self, event):
os.system("apt update")
os.system("apt install -y iperf")
def _on_start(self, event):
iperf = Iperf(self.model.name, self.app.name, self.model.unit.name, with_prometheus=False)
cidr: str = self.config.get("iperf_listen_cidr") # type: ignore
fallback_bind_address: str = str(self.model.get_binding("magpie").network.bind_address) # type: ignore
iperf.listen(cidr, fallback_bind_address)
self._on_update_status(event)
configure_lldpd()
def _on_config_changed(self, event):
pass
def _on_peers_changed(self, event):
self._on_update_status(event)
def _get_peer_units(self) -> Dict[ops.model.Unit, dict]: # unit -> unit relation data
units = {}
for relation in self.model.relations["magpie"]:
for unit in relation.units: # Set[Unit]
units[unit] = relation.data[unit]
return units
def _on_update_status(self, event):
n_peers = len(self._get_peer_units())
self.unit.status = ActiveStatus(f'Ready, with {n_peers} peer{"s" if n_peers != 1 else ""}')
def _on_iperf_action(self, event):
total_run_time = event.params["total-run-time"]
batch_time = event.params["batch-time"]
concurrency_progression = [int(i) for i in event.params["concurrency-progression"].split()]
filter_units = event.params["units"].split()
min_speed = event.params["min-speed"]
with_prometheus = len(self.model.relations["cos-agent"]) > 0
units = []
for host_with_ip in self._get_peer_addresses():
if not filter_units or host_with_ip.name in filter_units:
units.append(host_with_ip)
iperf = Iperf(self.model.name, self.app.name, self.model.unit.name, with_prometheus)
results = iperf.batch_hostcheck(
units,
total_run_time,
batch_time,
concurrency_progression,
min_speed,
)
data = json.dumps(results, indent=2)
event.set_results({"output": data})
def _on_info_action(self, event):
local_ip: str = str(self.model.get_binding("magpie").network.ingress_addresses[0]) # type: ignore
data = json.dumps(
collect_local_data(
CollectDataConfig(
required_mtu=event.params["required-mtu"],
bonds_to_check=event.params["bonds-to-check"],
lacp_passive_mode=event.params["lacp-passive-mode"],
local_ip=local_ip,
)
),
indent=2,
)
event.set_results({"output": data})
def _get_peer_addresses(self) -> List[HostWithIp]:
addresses = []
for unit, data in self._get_peer_units().items():
ip = data.get("ingress-address")
if ip:
addresses.append(HostWithIp(name=unit.name, ip=ip))
return addresses
def _on_ping_action(self, event):
data: Dict[str, str] = check_ping(
self._get_peer_addresses(),
PingConfig(
timeout=event.params["timeout"],
tries=event.params["tries"],
interval=event.params["interval"],
required_mtu=event.params["required-mtu"],
),
)
event.set_results({"output": json.dumps(data, indent=2)})
def _on_dns_action(self, event):
data = check_dns(
self._get_peer_addresses(),
DnsConfig(
server=event.params["server"],
tries=event.params["tries"],
timeout=event.params["timeout"],
),
)
event.set_results({"output": json.dumps(data, indent=2)})
if __name__ == "__main__": # pragma: nocover
ops.main(MagpieCharm) # type: ignore

View File

@ -1,96 +0,0 @@
options:
check_bonds:
default: AUTO
description: Comma separated list of expected bonds or AUTO to check all available bonds.
type: string
use_lldp:
default: false
description: Enable LLDP agent and collect data
type: boolean
check_port_description:
default: false
description: Check LLDP port description to match hostname
type: boolean
check_iperf:
default: true
description: Execute iperf network performance test
type: boolean
check_dns:
default: true
description: Check if peers are resolvable
type: boolean
check_local_hostname:
default: true
description: Check if local hostname is resolvable
type: boolean
dns_server:
default: ''
description: Use unit default DNS server
type: string
dns_tries:
default: 1
description: Number of DNS resolution attempts per query
type: int
dns_time:
default: 5
description: Timeout in seconds per DNS query try
type: int
lacp_passive_mode:
default: false
description: Set to true if switches are in LACP passive mode.
type: boolean
ping_timeout:
default: 2
description: Timeout in seconds per ICMP request
type: int
ping_tries:
default: 20
description: Number of ICMP packets per ping
type: int
ping_interval:
default: 0.05
description: Number of seconds to wait between sending each packet
type: float
ping_mesh_mode:
default: true
description: |
If true: each unit will ping each other unit.
If false: only the leader unit will ping each other unit.
type: boolean
supress_status:
default: False
description: Enable this if you intend to consume this layer - suppresses status messages
type: boolean
required_mtu:
default: 0
description: Desired MTU for all nodes - block if the unit MTU is different (accounting for encapsulation). 0 disables.
type: int
min_speed:
default: '0'
description: |
Minimum transfer speed in integer mbit/s required to pass the test. 0 disables.
This can also be set to an integer percentage value (eg. '80%'),
which will be interpreted as a percentage of the link speed.
Useful in mixed link speed environments.
Likewise, '0%' disables.
type: string
iperf_duration:
default: 1
description: |
Time in seconds to run iperf to test the transfer speed. Larger
value can be set to mitigate the impact of CPU power saving
features especially on faster links such as 50G.
type: int
source:
default: distro
type: string
description: |
Repository to add to unit before installing any dependencies.
May be one of the following:
distro (default)
ppa:somecustom/ppa (PPA name must include UCA OpenStack Release name)
deb url sources entry|key id
or a supported Ubuntu Cloud Archive pocket.

View File

View File

@ -0,0 +1,457 @@
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": 1,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": "${prometheusds}",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"min": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "bps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 0
},
"id": 2,
"options": {
"legend": {
"calcs": [
"mean",
"lastNotNull",
"max",
"min"
],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"pluginVersion": "9.2.1",
"targets": [
{
"expr": "sum(avg_over_time(magpie_iperf_bandwidth[30s]))",
"interval": "",
"legendFormat": "bandwidth",
"queryType": "randomWalk",
"refId": "A"
}
],
"title": "iperf client bandwidth (total)",
"type": "timeseries"
},
{
"datasource": "${prometheusds}",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"min": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "bps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 0
},
"id": 4,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"pluginVersion": "9.2.1",
"targets": [
{
"editorMode": "code",
"expr": "avg_over_time(magpie_iperf_bandwidth[600s])",
"interval": "",
"legendFormat": "{{src}} -> {{dest}}",
"queryType": "randomWalk",
"range": true,
"refId": "A"
}
],
"title": "iperf bandwidth (unit)",
"type": "timeseries"
},
{
"datasource": "${prometheusds}",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "normal"
},
"thresholdsStyle": {
"mode": "off"
}
},
"links": [],
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "Bps"
},
"overrides": [
{
"matcher": {
"id": "byRegexp",
"options": "/In .*/"
},
"properties": [
{
"id": "color",
"value": {
"fixedColor": "#629E51",
"mode": "fixed"
}
}
]
},
{
"matcher": {
"id": "byRegexp",
"options": "/Out .*/"
},
"properties": [
{
"id": "color",
"value": {
"fixedColor": "#1F78C1",
"mode": "fixed"
}
},
{
"id": "custom.fillOpacity",
"value": 0
},
{
"id": "custom.lineWidth",
"value": 2
}
]
}
]
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 8
},
"id": 6,
"links": [],
"options": {
"legend": {
"calcs": [
"mean",
"lastNotNull",
"max",
"min"
],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"pluginVersion": "9.2.1",
"targets": [
{
"expr": "sum(irate(node_network_receive_bytes_total[5m]))",
"format": "time_series",
"hide": false,
"instant": false,
"interval": "",
"intervalFactor": 2,
"legendFormat": "received",
"metric": "net_by",
"refId": "A",
"step": 4
},
{
"expr": "sum(irate(node_network_transmit_bytes_total[5m]))",
"format": "time_series",
"hide": false,
"interval": "",
"intervalFactor": 2,
"legendFormat": "sent",
"refId": "B",
"step": 4
}
],
"title": "Network throughput",
"type": "timeseries"
},
{
"datasource": "${prometheusds}",
"description": "",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 8
},
"id": 8,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"editorMode": "code",
"expr": "magpie_iperf_concurrency",
"legendFormat": "{{src}} -> {{dest}}",
"range": true,
"refId": "A"
}
],
"title": "Concurrency",
"type": "timeseries"
}
],
"refresh": "10s",
"schemaVersion": 37,
"style": "dark",
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-30m",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "Magpie Network Benchmarking",
"uid": "YzR4rgBGz",
"version": 17,
"weekStart": ""
}

View File

@ -1,11 +0,0 @@
repo: git@github.com:openstack-charmers/magpie-layer.git
includes: [
'layer:basic',
'interface:magpie',
'layer:leadership',
'interface:http'
]
options:
basic:
use_venv: True
include_system_packages: False

File diff suppressed because it is too large Load Diff

1045
src/magpie_tools.py Normal file

File diff suppressed because it is too large Load Diff

View File

@ -1,18 +0,0 @@
name: magpie
summary: Magpie layer to test networking - ICMP and DNS
maintainer: Andrew McLeod <andrew.mcleod@canonical.com>
description: |
Magpie will check ICMP, DNS, MTU and rx/tx speed between itself and any
peer units deployed - deploy more than one magpie unit for meaningful results.
tags: [testing, CI]
provides:
prometheus-target:
interface: http
peers:
magpie:
interface: magpie
series:
- focal
- jammy
- lunar
- mantic

View File

@ -1,136 +0,0 @@
# Copyright 2020 Canonical Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=unused-argument
from charms.reactive import when, when_not, set_state, remove_state
from charmhelpers.core import hookenv
from charms.layer.magpie_tools import check_nodes, safe_status, Iperf, Lldp
import charmhelpers.contrib.openstack.utils as os_utils
import charmhelpers.fetch as fetch
def _set_states(check_result):
if 'fail' in check_result['icmp']:
set_state('magpie-icmp.failed')
else:
remove_state('magpie-icmp.failed')
if 'fail' in check_result['dns']:
set_state('magpie-dns.failed')
else:
remove_state('magpie-dns.failed')
@when_not('charm.installed')
def install():
"""Configure APT source.
The many permutations of package source syntaxes in use does not allow us
to simply call `add-apt-repository` on the unit and we need to make use
of `charmhelpers.fetch.add_source` for this to be universally useful.
"""
source, key = os_utils.get_source_and_pgp_key(
hookenv.config().get('source', 'distro'))
fetch.add_source(source, key)
fetch.apt_update(fatal=True)
# The ``magpie`` charm is used as principal for functional tests with some
# subordinate charms. Install the ``openstack-release`` package when
# available to allow the functional test code to determine installed UCA
# versions.
fetch.apt_install(fetch.filter_installed_packages(['openstack-release']),
fatal=False, quiet=True)
fetch.apt_install(fetch.filter_installed_packages(['iperf']),
fatal=True, quiet=True)
set_state('charm.installed')
@when('charm.installed')
@when_not('lldp.installed')
def install_lldp_pkg():
if hookenv.config().get('use_lldp'):
lldp = Lldp()
lldp.install()
lldp.enable()
set_state('lldp.installed')
@when_not('magpie.joined')
def no_peers():
safe_status('waiting', 'Waiting for peers...')
@when('magpie.joined')
@when_not('leadership.is_leader', 'iperf.checked')
def check_check_state(magpie):
'''
Servers should only update their status after iperf has checked them
'''
if magpie.get_iperf_checked():
for units in magpie.get_iperf_checked():
if units and hookenv.local_unit() in units:
set_state('iperf.checked')
@when('magpie.joined', 'leadership.is_leader')
@when_not('iperf.servers.ready')
def leader_wait_servers_ready(magpie):
'''
Don't do any iperf checks until the servers are listening
'''
nodes = sorted(magpie.get_nodes())
iperf_ready_nodes = sorted(magpie.check_ready_iperf_servers())
if nodes == iperf_ready_nodes:
set_state('iperf.servers.ready')
else:
remove_state('iperf.servers.ready')
@when('magpie.joined')
@when_not('leadership.is_leader', 'iperf.listening')
def listen_for_checks(magpie):
'''
If im not the leader, and im not listening, then listen
'''
iperf = Iperf()
iperf.listen()
magpie.set_iperf_server_ready()
set_state('iperf.listening')
@when('iperf.servers.ready', 'magpie.joined', 'leadership.is_leader')
def client_check_hosts(magpie):
'''
Once the iperf servers are listening, do the checks
'''
nodes = magpie.get_nodes()
_set_states(check_nodes(nodes, is_leader=True))
magpie.set_iperf_checked()
@when('magpie.joined', 'iperf.checked')
@when_not('leadership.is_leader')
def check_all_node(magpie):
'''
Now that the iperf checks have been done, we can update our status
'''
nodes = magpie.get_nodes()
_set_states(check_nodes(nodes))
@when('prometheus-target.available')
def advertise_metric_port(target):
'''
Advertise prometheus metric port used during action execution
'''
target.configure(port="8088")

View File

@ -1,9 +0,0 @@
# This file is managed centrally by release-tools and should not be modified
# within individual charm repos. See the 'global' dir contents for available
# choices of *requirements.txt files for OpenStack Charms:
# https://github.com/openstack-charmers/release-tools
#
# Functional Test Requirements (let Zaza's dependencies solve all dependencies here!)
git+https://github.com/openstack-charmers/zaza.git#egg=zaza
git+https://github.com/openstack-charmers/zaza-openstack-tests.git#egg=zaza.openstack

View File

@ -1,7 +0,0 @@
local_overlay_enabled: False
series: focal
applications:
magpie:
num_units: 3
charm: ../../../magpie_ubuntu-20.04-amd64.charm

View File

@ -1,7 +0,0 @@
local_overlay_enabled: False
series: jammy
applications:
magpie:
num_units: 3
charm: ../../../magpie_ubuntu-22.04-amd64.charm

View File

@ -1,7 +0,0 @@
local_overlay_enabled: False
series: lunar
applications:
magpie:
num_units: 3
charm: ../../../magpie_ubuntu-23.04-amd64.charm

View File

@ -1,7 +0,0 @@
local_overlay_enabled: False
series: mantic
applications:
magpie:
num_units: 3
charm: ../../../magpie_ubuntu-23.10-amd64.charm

View File

@ -1,24 +0,0 @@
charm_name: magpie
gate_bundles:
- focal
- jammy
dev_bundles:
- lunar
- mantic
smoke_bundles:
- jammy
target_deploy_status:
magpie:
workload-status-message-prefix: "icmp ok"
tests:
- zaza.openstack.charm_tests.magpie.tests.MagpieTest
tests_options:
force_deploy:
- lunar
- mantic

51
src/tools/lacp_decoder.py Executable file → Normal file
View File

@ -13,51 +13,45 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tool to decode and help debug LACP port states.
See README.md for more information.
"""
import argparse
def status_decoder(status):
"""Extract the bits from the status integer into a list we can work with easier."""
decoded_status = [(status >> bit) & 1 for bit in range(8 - 1, -1, -1)]
decoded_status.reverse()
return decoded_status
def main(args):
"""Run the application."""
try:
port_state = int(args.port_state)
except (TypeError, ValueError):
raise Exception('port_state has to be integer')
raise Exception("port_state has to be integer")
if args.second_port_state:
try:
second_port_state = int(args.second_port_state)
except (TypeError, ValueError):
raise Exception('second_port_state has to be integer')
raise Exception("second_port_state has to be integer")
else:
second_port_state = None
states = {
0: {
"name": "LACP Activity",
1: "Active LACP",
0: "Passive LACP"
},
1: {
"name": "LACP Timeout",
1: "Short",
0: "Long"
},
0: {"name": "LACP Activity", 1: "Active LACP", 0: "Passive LACP"},
1: {"name": "LACP Timeout", 1: "Short", 0: "Long"},
2: {
"name": "Aggregability",
1: "Aggregatable",
0: "Individual",
},
3: {
"name": "Synchronization",
1: "Link in sync",
0: "Link out of sync"
},
3: {"name": "Synchronization", 1: "Link in sync", 0: "Link out of sync"},
4: {
"name": "Collecting",
1: "Ingress traffic: Accepting",
@ -66,37 +60,30 @@ def main(args):
5: {
"name": "Distributing",
1: "Egress traffic: Sending",
0: "Egress trafic: Not sending"
0: "Egress traffic: Not sending",
},
6: {
"name": "Is Defaulted",
1: "Defaulted settings",
0: "Settings are received from LACP PDU"
0: "Settings are received from LACP PDU",
},
7: {
"name": "Link Expiration",
1: "Yes",
0: "No"
}
7: {"name": "Link Expiration", 1: "Yes", 0: "No"},
}
status = status_decoder(port_state)
for i, entry in enumerate(status):
status_string = "{0}: {1}".format(states[i]['name'], states[i][entry])
status_string = "{0}: {1}".format(states[i]["name"], states[i][entry])
if second_port_state:
second_status = status_decoder(second_port_state)
if entry == second_status[i]:
status_string = "(Equal for both ports) {0}".format(
status_string)
status_string = "(Equal for both ports) {0}".format(status_string)
else:
status_string += " (Port 1) / {0} (Port 2)".format(
states[i][second_status[i]])
status_string += " (Port 1) / {0} (Port 2)".format(states[i][second_status[i]])
print(status_string)
if __name__ == '__main__':
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("port_state")
parser.add_argument("second_port_state", nargs='?', default=None)
parser.add_argument("second_port_state", nargs="?", default=None)
main(parser.parse_args())

View File

@ -1,55 +0,0 @@
# Source charm (with zaza): ./src/tox.ini
# This file is managed centrally by release-tools and should not be modified
# within individual charm repos. See the 'global' dir contents for available
# choices of tox.ini for OpenStack Charms:
# https://github.com/openstack-charmers/release-tools
[tox]
envlist = pep8
# NOTE: Avoid build/test env pollution by not enabling sitepackages.
sitepackages = False
# NOTE: Avoid false positives by not skipping missing interpreters.
skip_missing_interpreters = False
[testenv]
# We use tox mainly for virtual environment management for test requirements
# and do not install the charm code as a Python package into that environment.
# Ref: https://tox.wiki/en/latest/config.html#skip_install
skip_install = True
setenv = VIRTUAL_ENV={envdir}
PYTHONHASHSEED=0
allowlist_externals = juju
passenv =
HOME
TERM
CS_*
OS_*
TEST_*
deps = -r{toxinidir}/test-requirements.txt
[testenv:pep8]
basepython = python3
commands = charm-proof
[testenv:func-noop]
basepython = python3
commands =
functest-run-suite --help
[testenv:func]
basepython = python3
commands =
functest-run-suite --keep-model
[testenv:func-smoke]
basepython = python3
commands =
functest-run-suite --keep-model --smoke
[testenv:func-target]
basepython = python3
commands =
functest-run-suite --keep-model --bundle {posargs}
[testenv:venv]
commands = {posargs}

View File

@ -1,8 +0,0 @@
# charmhelpers.contrib.openstack.utils pulls in a dep that require this
netifaces
prometheus_client
psutil
git+https://github.com/openstack/charms.openstack.git#egg=charms.openstack
git+https://github.com/juju/charm-helpers.git#egg=charmhelpers

View File

@ -1,37 +1,13 @@
# This file is managed centrally by release-tools and should not be modified
# within individual charm repos. See the 'global' dir contents for available
# choices of *requirements.txt files for OpenStack Charms:
# https://github.com/openstack-charmers/release-tools
#
pyparsing<3.0.0 # aodhclient is pinned in zaza and needs pyparsing < 3.0.0, but cffi also needs it, so pin here.
# static analysis
black
ruff
codespell
pyright
stestr>=2.2.0
# unit tests
pytest
coverage[toml]
# Dependency of stestr. Workaround for
# https://github.com/mtreinish/stestr/issues/145
cliff<3.0.0
requests>=2.18.4
charms.reactive
mock>=1.2
nose>=1.3.7
coverage>=3.6
git+https://github.com/openstack/charms.openstack.git#egg=charms.openstack
#
# Revisit for removal / mock improvement:
#
# NOTE(lourot): newer versions of cryptography require a Rust compiler to build,
# see
# * https://github.com/openstack-charmers/zaza/issues/421
# * https://mail.python.org/pipermail/cryptography-dev/2021-January/001003.html
#
netifaces # vault
psycopg2-binary # vault
tenacity # vault
pbr==5.6.0 # vault
cryptography<3.4 # vault, keystone-saml-mellon
lxml # keystone-saml-mellon
hvac # vault, barbican-vault
psutil # cinder-lvm
# integration tests
juju
pytest-operator

View File

@ -0,0 +1,34 @@
#!/usr/bin/env python3
# Copyright 2023 Ubuntu
# See LICENSE file for licensing details.
import asyncio
import logging
from pathlib import Path
import pytest
import yaml
from pytest_operator.plugin import OpsTest
logger = logging.getLogger(__name__)
METADATA = yaml.safe_load(Path("./metadata.yaml").read_text())
APP_NAME = METADATA["name"]
@pytest.mark.abort_on_fail
async def test_build_and_deploy(ops_test: OpsTest):
"""Build the charm-under-test and deploy it together with related charms.
Assert on the unit status before any relations/configurations take place.
"""
# Build and deploy charm from local source folder
charm = await ops_test.build_charm(".")
# Deploy the charm and wait for active/idle status
await asyncio.gather(
ops_test.model.deploy(charm, application_name=APP_NAME),
ops_test.model.wait_for_idle(
apps=[APP_NAME], status="active", raise_on_blocked=True, timeout=1000
),
)

57
tests/unit/test_charm.py Normal file
View File

@ -0,0 +1,57 @@
# Copyright 2023 Ubuntu
# See LICENSE file for licensing details.
#
# Learn more about testing at: https://juju.is/docs/sdk/testing
from unittest.mock import Mock, call
import ops
import ops.testing
import pytest
from charm import MagpieCharm
from magpie_tools import status_for_speed_check
@pytest.fixture
def harness():
harness = ops.testing.Harness(MagpieCharm)
harness.begin()
yield harness
harness.cleanup()
@pytest.fixture
def os_system_mock(monkeypatch):
mock = Mock()
monkeypatch.setattr("charm.os.system", mock)
return mock
def test_example(harness, os_system_mock):
harness.charm.on.install.emit()
assert os_system_mock.call_count == 2
os_system_mock.assert_has_calls([call("apt update"), call("apt install -y iperf")])
def test_status_for_speed_check():
assert status_for_speed_check("0", 123, 150) == {"message": "min-speed disabled", "ok": True}
assert status_for_speed_check("0%", 123, 150) == {"message": "min-speed disabled", "ok": True}
assert status_for_speed_check(":P", 123, 150) == {
"message": "invalid min_speed: :P",
"ok": False,
}
assert status_for_speed_check("1", 10, 400) == {"message": "10 >= 1 mbit/s", "ok": True}
assert status_for_speed_check("12", 10, 400) == {
"message": "failed: 10 < 12 mbit/s",
"ok": False,
}
assert status_for_speed_check("50%", 100, 400) == {
"message": "failed: 100 < 200 mbit/s",
"ok": False,
}
assert status_for_speed_check("50%", 200, 400) == {"message": "200 >= 200 mbit/s", "ok": True}
assert status_for_speed_check("50%", 300, 400) == {"message": "300 >= 200 mbit/s", "ok": True}
assert status_for_speed_check("50%", 300, -1) == {
"message": "unknown, link speed undefined",
"ok": False,
}

153
tox.ini

@ -1,110 +1,95 @@
# Source charm: ./tox.ini
# This file is managed centrally by release-tools and should not be modified
# within individual charm repos. See the 'global' dir contents for available
# choices of tox.ini for OpenStack Charms:
# https://github.com/openstack-charmers/release-tools
# Copyright 2023 Ubuntu
# See LICENSE file for licensing details.
[tox]
envlist = pep8,py3
# NOTE: Avoid build/test env pollution by not enabling sitepackages.
sitepackages = False
# NOTE: Avoid false positives by not skipping missing interpreters.
skip_missing_interpreters = False
no_package = True
skip_missing_interpreters = True
env_list = pep8, cover
min_version = 4.0.0
[vars]
src_path = {tox_root}/src
tests_path = {tox_root}/tests
;lib_path = {tox_root}/lib/charms/operator_name_with_underscores
all_path = {[vars]src_path} {[vars]tests_path}
[testenv]
# We use tox mainly for virtual environment management for test requirements
# and do not install the charm code as a Python package into that environment.
# Ref: https://tox.wiki/en/latest/config.html#skip_install
skip_install = True
setenv = VIRTUAL_ENV={envdir}
PYTHONHASHSEED=0
TERM=linux
CHARM_LAYERS_DIR={toxinidir}/layers
CHARM_INTERFACES_DIR={toxinidir}/interfaces
JUJU_REPOSITORY={toxinidir}/build
passenv =
no_proxy
http_proxy
https_proxy
CHARM_INTERFACES_DIR
CHARM_LAYERS_DIR
JUJU_REPOSITORY
allowlist_externals =
charmcraft
bash
tox
set_env =
PYTHONPATH = {tox_root}/lib:{[vars]src_path}
PYTHONBREAKPOINT=pdb.set_trace
PY_COLORS=1
pass_env =
PYTHONPATH
CHARM_BUILD_DIR
MODEL_SETTINGS
deps =
-r{toxinidir}/requirements.txt
-r {tox_root}/requirements.txt
-r {tox_root}/test-requirements.txt
[testenv:build]
basepython = python3
# charmcraft clean is done to ensure that
# `tox -e build` always performs a clean, repeatable build.
# For faster rebuilds during development,
# directly run `charmcraft -v pack && ./rename.sh`.
deps =
allowlist_externals =
charmcraft
commands =
charmcraft clean
charmcraft -v pack
charmcraft clean
[testenv:build-reactive]
basepython = python3
[testenv:format]
description = Apply coding style standards to code
commands =
charm-build --log-level DEBUG --use-lock-file-branches --binary-wheels-from-source -o {toxinidir}/build/builds src {posargs}
black {[vars]all_path}
ruff check --fix {[vars]all_path}
[testenv:add-build-lock-file]
basepython = python3
[testenv:pep8]
description = Code style and other linting
commands =
charm-build --log-level DEBUG --write-lock-file -o {toxinidir}/build/builds src {posargs}
codespell {tox_root}
ruff check {[vars]all_path}
black --check --diff {[vars]all_path}
[testenv:static]
description = Static typing analysis
commands =
pyright {[vars]all_path}
[testenv:py3]
basepython = python3
deps = -r{toxinidir}/test-requirements.txt
commands = stestr run --slowest {posargs}
description = Run unit tests
commands =
pytest --tb native -v -s {posargs} {[vars]tests_path}/unit
[testenv:py39]
basepython = python3.9
description = Run unit tests
commands =
pytest --tb native -v -s {posargs} {[vars]tests_path}/unit
[testenv:py310]
basepython = python3.10
deps = -r{toxinidir}/test-requirements.txt
commands = stestr run --slowest {posargs}
description = Run unit tests
commands =
pytest --tb native -v -s {posargs} {[vars]tests_path}/unit
[testenv:pep8]
basepython = python3
deps = flake8==3.9.2
git+https://github.com/juju/charm-tools.git
commands = flake8 {posargs} src unit_tests
[testenv:py311]
basepython = python3.11
description = Run unit tests
commands =
pytest --tb native -v -s {posargs} {[vars]tests_path}/unit
[testenv:py312]
basepython = python3.12
description = Run unit tests
commands =
pytest --tb native -v -s {posargs} {[vars]tests_path}/unit
[testenv:cover]
# Technique based heavily upon
# https://github.com/openstack/nova/blob/master/tox.ini
basepython = python3
deps = -r{toxinidir}/requirements.txt
-r{toxinidir}/test-requirements.txt
setenv =
{[testenv]setenv}
PYTHON=coverage run
description = Run unit tests
commands =
coverage erase
stestr run --slowest {posargs}
coverage combine
coverage html -d cover
coverage xml -o cover/coverage.xml
coverage run --source={[vars]src_path},{[vars]tests_path} -m pytest --tb native -v -s {posargs} {[vars]tests_path}/unit
coverage report
coverage html --directory cover
[coverage:run]
branch = True
concurrency = multiprocessing
parallel = True
source =
.
omit =
.tox/*
*/charmhelpers/*
unit_tests/*
[testenv:venv]
basepython = python3
commands = {posargs}
[flake8]
# E402 ignore necessary for path append before sys module import in actions
ignore = E402,W503,W504
[testenv:integration]
description = Run integration tests
commands =
pytest -v -s --tb native --log-cli-level=INFO {posargs} {[vars]tests_path}/integration

28
unit_tests/__init__.py

@ -1,28 +0,0 @@
# Copyright 2016 Canonical Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import mock
import sys
sys.path.append('src')
sys.path.append('src/lib')
# Mock out charmhelpers so that we can test without it.
import charms_openstack.test_mocks # noqa
charms_openstack.test_mocks.mock_charmhelpers()
psutil_mock = mock.MagicMock()
sys.modules['psutil'] = psutil_mock
prometheus_client_mock = mock.MagicMock()
sys.modules['prometheus_client'] = prometheus_client_mock

513
unit_tests/test_magpie_tools.py

@ -1,513 +0,0 @@
from unittest.mock import (
patch,
mock_open,
MagicMock,
)
import lib.charms.layer.magpie_tools as magpie_tools
from unit_tests.test_utils import patch_open, CharmTestCase, async_test
import netifaces
LACP_STATE_SLOW_ACTIVE = '61'
LACP_STATE_FAST_ACTIVE = '63'
LACP_STATE_SLOW_PASSIVE = '60'
def mocked_open_lacp_port_state(actor, partner):
def the_actual_mock(path):
if (
path ==
"/sys/class/net/test/bonding_slave/ad_actor_oper_port_state"
):
return mock_open(read_data=actor)(path)
elif (
path ==
"/sys/class/net/test/bonding_slave/ad_partner_oper_port_state"
):
return mock_open(read_data=partner)(path)
return the_actual_mock
class TestMagpieTools(CharmTestCase):
def setUp(self):
super(TestMagpieTools, self).setUp()
self.obj = self.tools = magpie_tools
self.patches = [
'hookenv',
]
self.patch_all()
self.maxDiff = None
def test_safe_status(self):
self.hookenv.config.return_value = {
'supress_status': False}
self.tools.safe_status('active', 'awesome')
self.hookenv.status_set.assert_called_once_with(
'active', 'awesome')
self.hookenv.status_set.reset_mock()
self.hookenv.config.return_value = {
'supress_status': True}
self.tools.safe_status('active', 'awesome')
self.assertFalse(self.hookenv.status_set.called)
def test_status_for_speed_check(self):
self.assertEqual(
magpie_tools.status_for_speed_check('0', 123, 150),
', 123 mbit/s'
)
self.assertEqual(
magpie_tools.status_for_speed_check('0%', 123, 150),
', 123 mbit/s'
)
self.assertEqual(
magpie_tools.status_for_speed_check(':P', 123, 150),
", invalid min_speed: ':P'"
)
self.assertEqual(
magpie_tools.status_for_speed_check('1', 10, 400),
', speed ok: 10 mbit/s'
)
self.assertEqual(
magpie_tools.status_for_speed_check('12', 10, 400),
', speed failed: 10 < 12 mbit/s'
)
self.assertEqual(
magpie_tools.status_for_speed_check('50%', 100, 400),
', speed failed: 100 < 200 mbit/s'
)
self.assertEqual(
magpie_tools.status_for_speed_check('50%', 200, 400),
', speed ok: 200 mbit/s'
)
self.assertEqual(
magpie_tools.status_for_speed_check('50%', 300, 400),
', speed ok: 300 mbit/s'
)
self.assertEqual(
magpie_tools.status_for_speed_check('50%', 300, -1),
', speed failed: link speed undefined'
)
@patch('lib.charms.layer.magpie_tools.open',
mock_open(read_data=LACP_STATE_SLOW_ACTIVE))
def test_check_lacp_port_state_match_default(self):
self.hookenv.config.return_value = {}
self.assertIsNone(magpie_tools.check_lacp_port_state('test'))
@patch('lib.charms.layer.magpie_tools.open',
mock_open(read_data=LACP_STATE_SLOW_ACTIVE))
def test_check_lacp_port_state_match_explicit_active(self):
self.hookenv.config.return_value = {'lacp_passive_mode': False}
self.assertIsNone(magpie_tools.check_lacp_port_state('test'))
@patch('lib.charms.layer.magpie_tools.open',
mock_open(read_data=LACP_STATE_SLOW_ACTIVE))
def test_check_lacp_port_state_match_passive(self):
self.hookenv.config.return_value = {'lacp_passive_mode': True}
self.assertIsNone(magpie_tools.check_lacp_port_state('test'))
@patch('lib.charms.layer.magpie_tools.open')
def test_check_lacp_port_state_passive_expected_mismatch(self, open_):
open_.side_effect = mocked_open_lacp_port_state(
LACP_STATE_SLOW_ACTIVE, LACP_STATE_SLOW_PASSIVE
)
self.hookenv.config.return_value = {'lacp_passive_mode': True}
self.assertIsNone(magpie_tools.check_lacp_port_state('test'))
@patch('lib.charms.layer.magpie_tools.open')
def test_check_lacp_port_state_passive_default(self, open_):
open_.side_effect = mocked_open_lacp_port_state(
LACP_STATE_SLOW_ACTIVE, LACP_STATE_SLOW_PASSIVE
)
self.hookenv.config.return_value = {}
self.assertEqual(
magpie_tools.check_lacp_port_state('test'),
'lacp_port_state_mismatch')
@patch('lib.charms.layer.magpie_tools.open')
def test_check_lacp_port_state_passive_configured_active(self, open_):
open_.side_effect = mocked_open_lacp_port_state(
LACP_STATE_SLOW_ACTIVE, LACP_STATE_SLOW_PASSIVE
)
self.hookenv.config.return_value = {'lacp_passive_mode': False}
self.assertEqual(
magpie_tools.check_lacp_port_state('test'),
'lacp_port_state_mismatch')
@patch('lib.charms.layer.magpie_tools.open')
def test_check_lacp_port_state_passive_unexpected_mismatch(self, open_):
open_.side_effect = mocked_open_lacp_port_state(
LACP_STATE_FAST_ACTIVE, LACP_STATE_SLOW_PASSIVE
)
self.hookenv.config.return_value = {'lacp_passive_mode': True}
self.assertEqual(
magpie_tools.check_lacp_port_state('test'),
'lacp_port_state_mismatch')
def test_get_link_speed(self):
# Normal operation
with patch_open() as (mock_open, mock_file):
mock_file.read.return_value = b'1000'
self.assertEqual(
1000,
magpie_tools.get_link_speed('eth0'),
)
mock_open.assert_called_once_with('/sys/class/net/eth0/speed')
# Invalid argument
with patch_open() as (mock_open, mock_file):
mock_open.side_effect = OSError()
self.assertEqual(
-1,
magpie_tools.get_link_speed('eth0'),
)
@async_test
@patch(
"lib.charms.layer.magpie_tools.get_iface_mac",
lambda _: "de:ad:be:ef:01:01"
)
@patch(
"lib.charms.layer.magpie_tools.get_dest_mac",
lambda _, __: "de:ad:be:ef:02:02"
)
@patch(
"lib.charms.layer.magpie_tools.ch_ip.get_iface_from_addr",
lambda _: "de:ad:be:ef:03:03"
)
@patch(
"lib.charms.layer.magpie_tools.get_src_ip_from_dest",
lambda _: "192.168.2.2"
)
@patch("lib.charms.layer.magpie_tools.run")
async def test_run_iperf(self, mock_run):
async def mocked_run(cmd):
return """
19700101000000,192.168.2.2,60266,192.168.2.1,5001,2,0.0-10.1,95158332,75301087
19700101000000,192.168.2.2,60268,192.168.2.1,5001,1,0.0-10.1,61742908,27989222
"""
mock_run.side_effect = mocked_run
result = await magpie_tools.run_iperf(
"mynode", "192.168.2.1", "10", "2"
)
mock_run.assert_called_once_with(
"iperf -t10 -c 192.168.2.1 --port 5001 -P2 --reportstyle c"
)
self.assertEqual(result, {
"GBytes_transferred": 0.146,
"Mbits_per_second": 98,
"bits_per_second": 103290309,
"concurrency": "2",
"dest_ip": "192.168.2.1",
"dest_node": "mynode",
"dest_port": "5001",
"session": [2, 1],
"src_ip": "192.168.2.2",
"src_port": [60266, 60268],
"time_interval": "0.0-10.1",
"timestamp": "19700101000000",
"transferred_bytes": 156901240,
"src_mac": "de:ad:be:ef:01:01",
"dest_mac": "de:ad:be:ef:02:02",
"src_interface": "de:ad:be:ef:03:03",
})
@patch('netifaces.AF_LINK', 17)
@patch.object(netifaces, 'ifaddresses')
@patch.object(netifaces, 'interfaces')
def test_get_iface_mac(self, mock_interfaces, mock_addresses):
mock_interfaces.return_value = [
'lo',
'enp0s31f6',
'eth0',
'bond0',
'br0'
]
mock_addresses.return_value = {
17: [{'addr': 'c8:5b:76:80:86:01'}],
2: [{'addr': '192.168.123.45', 'netmask': '255.255.255.0'}],
}
# with interface listed by netifaces
self.assertEqual(
magpie_tools.get_iface_mac('bond0'),
'c8:5b:76:80:86:01',
)
# with unknown interface
self.assertEqual(
'',
magpie_tools.get_iface_mac('wronginterface0')
)
@patch('subprocess.PIPE', None)
@patch('subprocess.run')
def test_get_dest_mac(self, mock_subprocess):
mock_stdout = MagicMock()
mock_stdout.configure_mock(
**{
'stdout.decode.return_value': '[{"dst":"192.168.12.1",'
'"lladdr":"dc:fb:02:d1:28:18","state":["REACHABLE"]}]'
}
)
mock_subprocess.return_value = mock_stdout
self.assertEqual(
magpie_tools.get_dest_mac("eth0", "192.168.12.1"),
'dc:fb:02:d1:28:18',
)
@patch('subprocess.PIPE', None)
@patch('subprocess.run')
def test_get_src_ip_from_dest(self, mock_subprocess):
mock_stdout = MagicMock()
mock_stdout.configure_mock(
**{
'stdout.decode.return_value': '[{"dst":"192.168.12.1",'
'"dev":"enp5s0","prefsrc":"192.168.12.15","flags":[],'
'"uid":1000,"cache":[]}]'
}
)
mock_subprocess.return_value = mock_stdout
self.assertEqual(
magpie_tools.get_src_ip_from_dest("192.168.12.1"),
'192.168.12.15',
)
def test_parse_dig_yaml(self):
output = """
-
type: MESSAGE
message:
response_message_data:
ANSWER_SECTION:
- 99.0.0.10.in-addr.arpa. 30 IN PTR example.com.
"""
result, stderr = magpie_tools.parse_dig_yaml(
output,
"",
1,
30,
is_reverse_query=True,
)
self.assertEqual(result, 'example.com')
self.assertEqual(stderr, 0)
@patch('subprocess.check_output')
def test_parse_dig_yaml_calls_resolves_cname(self, mock_subprocess):
output = "-\n type: MESSAGE\n"
output += " message:\n"
output += " response_message_data:\n"
output += " ANSWER_SECTION:\n"
output += " - 99.0.0.10.in-addr.arpa. 30 IN CNAME"
output += " 99.1-25.0.0.10.in-addr.arpa"
rev_response = """
-
type: MESSAGE
message:
response_message_data:
ANSWER_SECTION:
- 99.0.0.10.in-addr.arpa. 30 IN PTR example.com.
"""
mock_subprocess.side_effect = [
bytes(rev_response, "utf-8")
]
result, stderr = magpie_tools.parse_dig_yaml(
output,
"",
1,
30,
is_reverse_query=True,
)
self.assertEqual(result, 'example.com')
self.assertEqual(stderr, 0)
@patch('subprocess.check_output')
def test_forward_dns_good(self, mock_subprocess):
ip = "10.0.0.99"
unit_id = "magpie/0"
self.hookenv.config.return_value = {
"dns_server": "127.0.0.1",
"dns_tries": "1",
"dns_time": "3"
}
rev_response = """
-
type: MESSAGE
message:
response_message_data:
ANSWER_SECTION:
- 99.0.0.10.in-addr.arpa. 30 IN PTR example.com.
"""
fwd_response = """
-
type: MESSAGE
message:
response_message_data:
ANSWER_SECTION:
- example.com. 30 IN A 10.0.0.99
"""
mock_subprocess.side_effect = [
bytes(rev_response, "utf-8"), # for reverse_dns
bytes(fwd_response, "utf-8") # for forward_dns
]
norev, nofwd, nomatch = magpie_tools.check_dns([(unit_id, ip)])
self.assertEqual(
norev, [], "Reverse lookup failed for IP {}".format(ip))
self.assertEqual(
nofwd, [], ("Forward lookup failed for IP {}, "
"faked to example.com".format(ip)))
self.assertEqual(
nomatch, [], "Reverse and forward lookups didn't match")
@patch('subprocess.check_output')
def test_forward_dns_multiple_ips(self, mock_subprocess):
ip = "10.0.0.99"
unit_id = "magpie/0"
self.hookenv.config.return_value = {
"dns_server": "127.0.0.1",
"dns_tries": "1",
"dns_time": "3"
}
rev_response = """
-
type: MESSAGE
message:
response_message_data:
ANSWER_SECTION:
- 99.0.0.10.in-addr.arpa. 30 IN PTR example.com.
"""
fwd_response = """
-
type: MESSAGE
message:
response_message_data:
ANSWER_SECTION:
- example.com. 30 IN A 10.0.0.99
- example.com. 30 IN A 10.1.0.99
- example.com. 30 IN A 10.2.0.99
"""
mock_subprocess.side_effect = [
bytes(rev_response, "utf-8"), # for reverse_dns
bytes(fwd_response, "utf-8") # for forward_dns
]
norev, nofwd, nomatch = magpie_tools.check_dns([(unit_id, ip)])
self.assertEqual(
norev, [], "Reverse lookup failed for IP {}".format(ip))
self.assertEqual(
nofwd, [], ("Forward lookup failed for IP {}, "
"faked to example.com".format(ip))
)
self.assertEqual(
nomatch, [], "Reverse and forward lookups didn't match")
self.hookenv.log.assert_any_call(
"Forward result for unit_id: 0, "
"ip: 10.0.0.99\n10.1.0.99\n10.2.0.99, exitcode: 0"
)
self.hookenv.log.assert_any_call(
"Original IP and Forward MATCH OK for unit_id: 0, "
"Original: 10.0.0.99, "
"Forward: ['10.0.0.99', '10.1.0.99', '10.2.0.99']", "INFO"
)
@patch('subprocess.check_output')
def test_cname_dns_is_followed(self, mock_subprocess):
ip = "10.0.0.99"
unit_id = "magpie/0"
self.hookenv.config.return_value = {
"dns_server": "127.0.0.1",
"dns_tries": "1",
"dns_time": "3",
}
rev_response = "-\n"
rev_response += " type: MESSAGE\n"
rev_response += " message:\n"
rev_response += " response_message_data:\n"
rev_response += " ANSWER_SECTION:\n"
rev_response += " - 99.0.0.10.in-addr.arpa. 30 IN CNAME"
rev_response += " 99.1-25.0.0.10.in-addr.arpa."
cname_response = """
-
type: MESSAGE
message:
response_message_data:
ANSWER_SECTION:
- 99.0-25.0.10.in-addr.arpa. 30 IN PTR example.com.
- 99.0-25.0.10.in-addr.arpa. 30 IN PTR other.example.com.
"""
fwd_response_1 = """
-
type: MESSAGE
message:
response_message_data:
ANSWER_SECTION:
- example.com. 30 IN A 10.0.0.99
"""
fwd_response_2 = """
-
type: MESSAGE
message:
response_message_data:
ANSWER_SECTION:
- other.example.com. 30 IN A 10.0.0.99
"""
mock_subprocess.side_effect = [
bytes(rev_response, "utf-8"), # for reverse_dns
bytes(cname_response, "utf-8"), # for resolve_cname
bytes(fwd_response_1, "utf-8"), # for forward_dns
bytes(fwd_response_2, "utf-8") # for forward_dns
]
norev, nofwd, nomatch = magpie_tools.check_dns([(unit_id, ip)])
self.assertEqual(
norev, [], "Reverse lookup failed for IP {}".format(ip))
self.assertEqual(
nofwd, [], ("Forward lookup failed for IP {}, "
"faked to example.com".format(ip))
)
self.assertEqual(
nomatch, [], "Reverse and forward lookups didn't match")
self.hookenv.log.assert_any_call(
"Forward result for unit_id: 0, "
"ip: 10.0.0.99, exitcode: 0"
)
self.hookenv.log.assert_any_call(
"Original IP and Forward MATCH OK for unit_id: 0, "
"Original: 10.0.0.99, "
"Forward: ['10.0.0.99']", "INFO"
)
@patch('subprocess.check_output')
def test_check_dns_gracefully_handles_no_answer(self, mock_subprocess):
ip = "10.0.0.99"
unit_id = "magpie/0"
self.hookenv.config.return_value = {
"dns_server": "127.0.0.1",
"dns_tries": "1",
"dns_time": "3"
}
rev_response = """
-
type: MESSAGE
message:
response_message_data: {}
"""
fwd_response = """
-
type: MESSAGE
message:
response_message_data: {}
"""
mock_subprocess.side_effect = [
bytes(rev_response, "utf-8"), # for reverse_dns
bytes(fwd_response, "utf-8") # for forward_dns
]
norev, nofwd, nomatch = magpie_tools.check_dns([(unit_id, ip)])
self.assertEqual(
norev, ['0'], "Reverse lookup had an answer for {}".format(ip))
self.assertEqual(
nofwd, [], ("Forward lookup failed for IP {}, "
"faked to example.com".format(ip)))
self.assertEqual(
nomatch, [], "Reverse and forward lookups didn't match")

89
unit_tests/test_utils.py

@ -1,89 +0,0 @@
import asyncio
import contextlib
import io
import mock
import unittest
import unittest.mock
@contextlib.contextmanager
def patch_open():
'''Patch open() to allow mocking both open() itself and the file that is
yielded.
Yields the mock for "open" and "file", respectively.'''
mock_open = mock.MagicMock(spec=open)
mock_file = mock.MagicMock(spec=io.FileIO)
@contextlib.contextmanager
def stub_open(*args, **kwargs):
mock_open(*args, **kwargs)
yield mock_file
with mock.patch('builtins.open', stub_open):
yield mock_open, mock_file
def async_test(f):
"""
A decorator to test async functions within a synchronous environment.
see https://stackoverflow.com/questions/23033939/
"""
def wrapper(*args, **kwargs):
coro = asyncio.coroutine(f)
future = coro(*args, **kwargs)
loop = asyncio.get_event_loop()
loop.run_until_complete(future)
return wrapper
class CharmTestCase(unittest.TestCase):
def setUp(self):
self._patches = {}
self._patches_start = {}
def tearDown(self):
for k, v in self._patches.items():
v.stop()
setattr(self, k, None)
self._patches = None
self._patches_start = None
def _patch(self, method):
_m = unittest.mock.patch.object(self.obj, method)
mock = _m.start()
self.addCleanup(_m.stop)
return mock
def patch_all(self):
for method in self.patches:
setattr(self, method, self._patch(method))
def patch_object(self, obj, attr, return_value=None, name=None, new=None,
**kwargs):
if name is None:
name = attr
if new is not None:
mocked = mock.patch.object(obj, attr, new=new, **kwargs)
else:
mocked = mock.patch.object(obj, attr, **kwargs)
self._patches[name] = mocked
started = mocked.start()
if new is None:
started.return_value = return_value
self._patches_start[name] = started
setattr(self, name, started)
def patch(self, item, return_value=None, name=None, new=None, **kwargs):
if name is None:
raise RuntimeError("Must pass 'name' to .patch()")
if new is not None:
mocked = mock.patch(item, new=new, **kwargs)
else:
mocked = mock.patch(item, **kwargs)
self._patches[name] = mocked
started = mocked.start()
if new is None:
started.return_value = return_value
self._patches_start[name] = started