Rewrite charm in operator framework
This is a major breaking change too. In the process, also:

- move all processing from juju status to actions (run the actions to get data; the status line will be minimal)
- switch to COS integration, no longer legacy prometheus for the iperf benchmarks

It should be mostly at feature parity with the original magpie charm, but some things still need improving and iterating on, such as the spec for data returned from actions, and actual functional tests.

Change-Id: I289d4e7a0dd373c5c6f2471ab710e754c167ab8c
Parent: 3f5e833adc
Commit: 5acbc4e5ba

.gitignore (vendored)
@@ -1,9 +1,10 @@
-build
-.tox
-layers
-interfaces
-trusty
-.testrepository
-__pycache__
-.stestr
 venv/
+build/
+*.charm
+.tox/
+.coverage
+cover/
+__pycache__/
+*.py[cod]
+.idea
+.vscode/

.stestr.conf (deleted file)
@@ -1,3 +0,0 @@
-[DEFAULT]
-test_path=./unit_tests
-top_dir=./

.zuul.yaml
@@ -1,5 +1,4 @@
 - project:
     templates:
-      - openstack-python3-charm-zed-jobs
       - openstack-python3-charm-jobs
       - openstack-cover-jobs

CONTRIBUTING.md (new file)
@@ -0,0 +1,34 @@
# Contributing

To make contributions to this charm, you'll need a working [development setup](https://juju.is/docs/sdk/dev-setup).

You can create an environment for development with `tox`:

```shell
tox devenv -e integration
source venv/bin/activate
```

## Testing

This project uses `tox` for managing test environments. There are some pre-configured environments
that can be used for linting and formatting code when you're preparing contributions to the charm:

```shell
tox run -e format        # update your code according to linting rules
tox run -e lint          # code style
tox run -e static        # static type checking
tox run -e unit          # unit tests
tox run -e integration   # integration tests
tox                      # runs 'format', 'lint', 'static', and 'unit' environments
```

## Build the charm

Build the charm in this git repository using:

```shell
charmcraft pack
```

<!-- You may want to include any contribution/style guidelines in this document -->

LICENSE
@@ -187,7 +187,7 @@
same "printed page" as the copyright notice for easier
|
||||
identification within third-party archives.
|
||||
|
||||
Copyright [yyyy] [name of copyright owner]
|
||||
Copyright 2023 Ubuntu
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
|
README.md (new file)
@@ -0,0 +1,456 @@
# Magpie

Magpie is a charm used for testing the networking of a Juju provider/substrate.

It provides tools for testing:

- DNS functionality
- network connectivity between nodes (iperf, ping)
- network benchmarking
- MTU
- local hostname lookup

## Usage

Deploy the charm to two or more units,
then run the provided actions to retrieve debug information about the nodes or run network diagnostic tests.

```
juju deploy magpie -n 3

juju actions magpie
juju run magpie/leader info
juju run magpie/leader ping
# etc.
```

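Action results can also be collected programmatically. Below is a minimal, illustrative sketch, assuming the Juju 3.x CLI and JSON output via `--format=json`; the exact keys in the returned data are still being iterated on in this rewrite, so treat the parsed structure as opaque:

```python
# Illustrative sketch: run a magpie action through the Juju CLI and parse the results.
# Assumes the Juju 3.x "juju run <unit> <action>" syntax and --format=json support.
import json
import subprocess


def run_action(unit: str, action: str, **params: str) -> dict:
    """Run a charm action on a unit and return the parsed JSON results."""
    cmd = ["juju", "run", "--format=json", unit, action]
    cmd += [f"{key}={value}" for key, value in params.items()]
    output = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(output)


if __name__ == "__main__":
    # e.g. ping all peers from magpie/0, with hypothetical parameter values
    print(run_action("magpie/0", "ping", timeout="5", tries="10"))
```
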
Check the charm config before deploying for values you may wish to tweak,
and see the parameters accepted by each action.

## TODO: document each action and the expected results

## Network spaces

If you use network spaces in your Juju deployment (as you should), use
`--bind '<space-name> magpie=<space-name>'` to force magpie to test that
particular network space.

It is possible to deploy several magpie charms
(as different Juju applications) to the same server, each in a different
network space.

Example:

```
juju deploy magpie magpie-space1 --bind "space1 magpie=space1" -n 5 --to 0,2,1,4,3
juju deploy magpie magpie-space2 --bind "space2 magpie=space2" -n 3 --to 3,2,0
juju deploy magpie magpie-space3 --bind "space3 magpie=space3" -n 4 --to 3,2,1,0
juju deploy magpie magpie-space4 --bind "space4 magpie=space4" -n 4 --to 3,2,1,0
```

## Benchmarking network with iperf and grafana

This assumes Juju 3.1.

Step 1, deploy COS:

```
# Deploy COS on microk8s.
# https://charmhub.io/topics/canonical-observability-stack/tutorials/install-microk8s
juju bootstrap microk8s microk8s
juju add-model cos
juju deploy cos-lite

# Expose the endpoints for the magpie model to consume.
juju offer grafana:grafana-dashboard
juju offer prometheus:receive-remote-write
```

Step 2, deploy magpie and relate it to COS:

```
juju switch <controller for cloud to be benchmarked>
juju add-model magpie

juju consume microk8s:cos.prometheus
juju consume microk8s:cos.grafana

# adjust as required - deploy from Charmhub, or a locally built charm:
juju deploy magpie -n 3
juju deploy ./magpie_ubuntu-22.04-amd64.charm -n 3

juju deploy grafana-agent --channel edge
juju relate magpie grafana-agent
juju relate grafana-agent prometheus
juju relate grafana-agent grafana
```

Step 3, run the iperf action and view the results in grafana:

```
# adjust as needed
juju run magpie/0 iperf

# you may wish to run against one unit pair at a time:
juju run magpie/0 iperf units=magpie/1
juju run magpie/0 iperf units=magpie/2
# etc.
```

Obtain details to access grafana from COS:

```
juju show-unit -m microk8s:cos catalogue/0 --format json | jq -r '.["catalogue/0"]."relation-info"[] | select(."application-data".name == "Grafana") | ."application-data".url'
juju config -m microk8s:cos grafana admin_user
juju run -m microk8s:cos grafana/0 get-admin-password
```

Find the dashboard titled "Magpie Network Benchmarking",
and limit the time range as required.

## Bonded links testing and troubleshooting

Network bonding enables the combination of two or more network interfaces into a single bonded
(logical) interface, which increases bandwidth and provides redundancy. While Magpie performs some
sanity checks and can reveal some configuration problems, this part of the README contains
advanced troubleshooting information that may be useful when identifying and fixing such issues.

There are seven bonding modes:

### `balance-rr`

Round-robin policy: Transmit packets in sequential order from the first available slave through the
last. This mode provides load balancing and fault tolerance.

### `active-backup`

Active-backup policy: Only one slave in the bond is active. A different slave becomes active if, and
only if, the active slave fails. The bond's MAC address is externally visible on only one port
(network adapter) to avoid confusing the switch. This mode provides fault tolerance. The primary
option affects the behavior of this mode.

### `balance-xor`

XOR policy: Transmit based on a selectable hashing algorithm. The default policy is a simple
source+destination MAC address algorithm. Alternate transmit policies may be selected via the
`xmit_hash_policy` option. This mode provides load balancing and fault tolerance.

### `broadcast`

Broadcast policy: transmits everything on all slave interfaces. This mode provides fault tolerance.

### `802.3ad` (LACP)

Link Aggregation Control Protocol (IEEE 802.3ad LACP) is a control protocol that automatically
detects multiple links between two LACP-enabled devices and configures them to use their maximum
possible bandwidth by automatically trunking the links together. This mode has a prerequisite -
the switch ports must have LACP configured and enabled.

### `balance-tlb`

Adaptive transmit load balancing: channel bonding that does not require any special switch support.
The outgoing traffic is distributed according to the current load (computed relative to the speed)
on each slave. Incoming traffic is received by the current slave. If the receiving slave fails,
another slave takes over the MAC address of the failed receiving slave.

### `balance-alb`

Adaptive load balancing: includes balance-tlb plus receive load balancing (rlb) for IPv4 traffic,
and does not require any special switch support. The receive load balancing is achieved by ARP
negotiation.

The most commonly used modes are `active-backup` and `802.3ad` (LACP). While active-backup
does not require any third-party configuration, it has its cons - for example, it can't multiply
the total bandwidth of the link, whereas an 802.3ad-based bond can utilize all bond members, thereby
multiplying the bandwidth. However, in order to get a fully working LACP link, appropriate
configuration has to be done on both the actor (link initiator) and partner (switch) side. Any
misconfiguration could lead to link loss or instability, so it's very important to have
correct settings applied to both sides of the link.

A quick overview of the LACP link status can be obtained by reading the
`/proc/net/bonding/<bond_name>` file.

```
$ sudo cat /proc/net/bonding/bondM
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 82:23:80:a1:a9:d3
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 2
        Actor Key: 15
        Partner Key: 201
        Partner Mac Address: 02:01:00:00:01:01

Slave Interface: eno3
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 3c:ec:ef:19:eb:30
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 82:23:80:a1:a9:d3
    port key: 15
    port priority: 255
    port number: 1
    port state: 63
details partner lacp pdu:
    system priority: 65534
    system mac address: 02:01:00:00:01:01
    oper key: 201
    port priority: 1
    port number: 12
    port state: 63

Slave Interface: eno1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 3c:ec:ef:19:eb:2e
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 82:23:80:a1:a9:d3
    port key: 15
    port priority: 255
    port number: 2
    port state: 63
details partner lacp pdu:
    system priority: 65534
    system mac address: 02:01:00:00:01:01
    oper key: 201
    port priority: 1
    port number: 1012
    port state: 63
```

The key things an operator should look at are:

- LACP rate
- Actor Churn State
- Partner Churn State
- Port State

### LACP rate

The Link Aggregation Control Protocol (LACP) provides a standardized means for exchanging
information between Partner Systems on a link to allow their Link Aggregation Control instances to
reach agreement on the identity of the LAG to which the link belongs, move the link to that LAG, and
enable its transmission and reception functions in an orderly manner. The protocol depends upon the
transmission of information and state, rather than the transmission of commands. LACPDUs (LACP Data
Units) sent by the first party (the Actor) convey to the second party (the Actor's protocol Partner)
what the Actor knows, both about its own state and that of the Partner.

Periodic transmission of LACPDUs occurs if the LACP Activity control of either the Actor or the
Partner is Active LACP. These periodic transmissions will occur at either a slow or fast
transmission rate depending upon the expressed LACP_Timeout preference (Long Timeout or Short
Timeout) of the Partner System.

### Actor/Partner Churn State

In general, a "Churned" port status means that the parties are unable to reach agreement upon the
desired state of a link. Under normal operation of the protocol, such a resolution would be reached
very rapidly; continued failure to reach agreement can be symptomatic of component failure, of the
presence of non-standard devices on the link concerned, or of mis-configuration. Hence, detection of
such failures is signalled by the Churn Detection algorithm to the operator in order to prompt
administrative action toward resolution.

### Port State

Both the Actor and the Partner states are variables, encoded as individual bits within a single octet,
as follows.

0) LACP_Activity: Device intends to transmit periodically in order to find potential
members for the aggregate. Active LACP is encoded as a 1; Passive LACP as a 0.
1) LACP_Timeout: This flag indicates the Timeout control value with regard to this link. Short
Timeout is encoded as a 1; Long Timeout as a 0.
2) Aggregability: This flag indicates that the system considers this link to be Aggregateable; i.e.,
a potential candidate for aggregation. If FALSE (encoded as a 0), the link is considered to be
Individual; i.e., this link can be operated only as an individual link. Aggregatable is encoded as a
1; Individual is encoded as a 0.
3) Synchronization: Indicates that the bond on the transmitting machine is in sync with what's being
advertised in the LACP frames, meaning the link has been allocated to the correct LAG, the group has
been associated with a compatible Aggregator, and the identity of the LAG is consistent with the
System ID and operational Key information transmitted. "In Sync" is encoded as a 1; "Out of sync" is
encoded as a 0.
4) Collecting: The bond is accepting traffic received on this port; collection of incoming frames on
this link is definitely enabled and is not expected to be disabled in the absence of administrative
changes or changes in received protocol information. True is encoded as a 1; False is encoded as a 0.
5) Distributing: The bond is sending traffic using this port. Same as above, but for egress
traffic. True is encoded as a 1; False is encoded as a 0.
6) Defaulted: Indicates whether the receiving bond is using default (administratively defined)
parameters for the partner, rather than information received in an LACP PDU. Defaulted settings are
encoded as a 1; information received from an LACP PDU as a 0.
7) Expired: Indicates whether the bond is in the expired state. Yes is encoded as a 1; No as a 0.

In the example output above, both of the port states are equal to 63. Let's decode:

```
$ python3
Python 3.8.4 (default, Jul 17 2020, 15:44:37)
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> bin(63)
'0b111111'
```

Reading from right to left:

LACP Activity: Active
LACP Timeout: Short
Aggregability: Link is Aggregatable
Synchronization: Link in sync
Collecting: True - bond is accepting traffic
Distributing: True - bond is sending traffic
Defaulted: Info received from LACP PDU
Expired: False - link is not expired

The above status represents a **fully healthy bond** without any LACP-related issues. Also, for
the operators' convenience, the [lacp_decoder.py](src/tools/lacp_decoder.py) script can be used to
quickly convert the status to a human-friendly format.

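For illustration, here is a minimal sketch of such a decoder, assuming the bit layout described
above (the actual `lacp_decoder.py` in `src/tools/` may differ in interface and wording):

```python
# Illustrative sketch of an LACP port-state decoder; bit layout as described above.
# The real src/tools/lacp_decoder.py may differ.
import sys

# (flag name, meaning when the bit is 1, meaning when the bit is 0); bit 0 first.
FLAGS = [
    ("LACP Activity", "Active", "Passive"),
    ("LACP Timeout", "Short", "Long"),
    ("Aggregability", "Aggregatable", "Individual"),
    ("Synchronization", "Link in sync", "Link out of sync"),
    ("Collecting", "Accepting ingress traffic", "Rejecting ingress traffic"),
    ("Distributing", "Sending egress traffic", "Not sending egress traffic"),
    ("Defaulted", "Using defaulted (administrative) settings", "Settings received from LACP PDU"),
    ("Expired", "Yes", "No"),
]


def decode(state: int) -> None:
    """Print the meaning of each bit of an LACP port-state octet."""
    for bit, (name, if_set, if_clear) in enumerate(FLAGS):
        print(f"{name}: {if_set if state & (1 << bit) else if_clear}")


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(f"Port state {arg}:")
        decode(int(arg))
```

Running it with `63` reproduces the healthy-bond reading above.
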
However, situations where one of the links is misconfigured do happen, so let's assume
we have the following:

```
$ sudo cat /proc/net/bonding/bondm
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: b4:96:91:6d:20:fc
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 1
        Actor Key: 9
        Partner Key: 32784
        Partner Mac Address: 00:23:04:ee:be:66

Slave Interface: enp197s0f2
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b4:96:91:6d:20:fe
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: churned
Partner Churn State: none
Actor Churned Count: 1
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: b4:96:91:6d:20:fc
    port key: 7
    port priority: 255
    port number: 1
    port state: 7
details partner lacp pdu:
    system priority: 32667
    system mac address: 00:23:04:ee:be:66
    oper key: 32784
    port priority: 32768
    port number: 16661
    port state: 13

Slave Interface: enp197s0f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b4:96:91:6d:20:fc
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: b4:96:91:6d:20:fc
    port key: 9
    port priority: 255
    port number: 2
    port state: 63
details partner lacp pdu:
    system priority: 32667
    system mac address: 00:23:04:ee:be:66
    oper key: 32784
    port priority: 32768
    port number: 277
    port state: 63
```

As we can see, one of the links has different port states for actor and partner, while the second
one has 63 for both - meaning the first one is problematic, and we need to dig further into this
problem.

Let's decode both of its statuses using the mentioned script:

```
$ python ./lacp_decoder.py 7 13
(Equal for both ports) LACP Activity: Active LACP
LACP Timeout: Short (Port 1) / Long (Port 2)
(Equal for both ports) Aggregability: Aggregatable
Synchronization: Link out of sync (Port 1) / Link in sync (Port 2)
(Equal for both ports) Collecting: Ingress traffic: Rejecting
(Equal for both ports) Distributing: Egress traffic: Not sending
(Equal for both ports) Is Defaulted: Settings are received from LACP PDU
(Equal for both ports) Link Expiration: No
```

The above output means that there are two differences between these statuses: LACP Timeout and
Synchronization. That means two things:

1) The Partner side (the switch side in most cases) has an incorrectly configured LACP timeout
control. To resolve this, an operator has to either change the LACP rate from the Actor (e.g. a
server) side to "Slow", or adjust the Partner (e.g. switch) LACP rate to "Fast".
2) The Partner side considers this physical link a part of a different link aggregation group. The
switch config has to be revisited and the link aggregation group members verified again,
ensuring there are no extra or wrong links configured as part of the single LAG.

After addressing the above issues, the port state will change to 63, which means "LACP link is fully
functional".

## Bugs

Please report bugs on [Launchpad](https://bugs.launchpad.net/charm-magpie/+filebug).

For general questions please refer to the OpenStack [Charm Guide](https://docs.openstack.org/charm-guide/latest/).

actions.yaml (new file)
@@ -0,0 +1,89 @@
iperf:
  description: |
    Run iperf
  params:
    units:
      default: ""
      type: string
      description: Space separated list of units. If empty string, will run against all peer units.
    batch-time:
      type: integer
      default: 10
      description: |
        Maps to iperf -t option, time in seconds to transmit traffic
    concurrency-progression:
      type: string
      default: "2 4 8"
      description: |
        Space separated list of concurrencies to use. An equal amount of time will be spent on each concurrency.
    total-run-time:
      type: integer
      default: 600
      description: |
        Total run time for iperf test in seconds, per target unit.
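    # For example: with the defaults above (total-run-time 600 and
    # concurrency-progression "2 4 8"), each of the three concurrency levels
    # runs for 600 / 3 = 200 seconds against each target unit.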
    min-speed:
      default: "0"
      description: |
        Minimum transfer speed in integer mbit/s required to pass the test. "0" disables.

        This can also be set to an integer percentage value (eg. "80%"),
        which will be interpreted as a percentage of the link speed.
        Useful in mixed link speed environments.
        Likewise, "0%" disables.
      type: string
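      # For example: on a 10000 mbit/s link, min-speed "80%" requires a measured
      # speed of at least 8000 mbit/s for the test to pass.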

info:
  description: |
    Retrieve all the information and data about the node as json data.
  params:
    required-mtu:
      default: 0
      type: integer
      description: Desired MTU for all nodes - warn if the unit MTU is different (accounting for encapsulation). 0 disables mtu match checking.
    bonds-to-check:
      default: AUTO
      description: Comma separated list of expected bonds or AUTO to check all available bonds.
      type: string
    lacp-passive-mode:
      default: false
      description: Set to true if switches are in LACP passive mode.
      type: boolean

ping:
  description: |
    Ping each of the related magpie units and return the results.
  params:
    timeout:
      default: 2
      description: Timeout in seconds per ICMP request
      type: integer
    tries:
      default: 20
      description: Number of ICMP packets per ping
      type: integer
    interval:
      default: 0.05
      description: Number of seconds to wait between sending each packet
      type: number
      minimum: 0
    required-mtu:
      default: 0
      type: integer
      description: Desired MTU for all nodes - warn if the unit MTU is different (accounting for encapsulation). 0 disables mtu match checking.

dns:
  description: |
    Run dns checks against all peer nodes
  params:
    server:
      default: ""
      description: Provide a custom dns server. Uses unit default DNS server by default.
      type: string
    tries:
      default: 1
      description: Number of DNS resolution attempts per query
      type: integer
    timeout:
      default: 5
      description: Timeout in seconds per DNS query try
      type: integer

bindep.txt (deleted file)
@@ -1,4 +0,0 @@
-libffi-dev [platform:dpkg]
-libpq-dev [platform:dpkg]
-libxml2-dev [platform:dpkg]
-libxslt1-dev [platform:dpkg]

charmcraft.yaml
@@ -1,113 +1,11 @@
type: charm
# This file configures Charmcraft.
# See https://juju.is/docs/sdk/charmcraft-config for guidance.

parts:
  charm:
    source: src/
    plugin: reactive
    reactive-charm-build-arguments:
      - --binary-wheels-from-source
      - --verbose
    build-packages:
      - libpython3-dev
    build-snaps:
      - charm
    build-environment:
      - CHARM_INTERFACES_DIR: $CRAFT_PROJECT_DIR/interfaces/
      - CHARM_LAYERS_DIR: $CRAFT_PROJECT_DIR/layers/
type: charm
bases:
  - build-on:
      - name: ubuntu
        channel: "20.04"
        architectures: [amd64]
      - name: ubuntu
        channel: "22.04"
    run-on:
      - name: ubuntu
        channel: "20.04"
        architectures: [amd64]
  - build-on:
      - name: ubuntu
        channel: "20.04"
        architectures: [s390x]
    run-on:
      - name: ubuntu
        channel: "20.04"
        architectures: [s390x]
  - build-on:
      - name: ubuntu
        channel: "20.04"
        architectures: [ppc64el]
    run-on:
      - name: ubuntu
        channel: "20.04"
        architectures: [ppc64el]
  - build-on:
      - name: ubuntu
        channel: "20.04"
        architectures: [arm64]
    run-on:
      - name: ubuntu
        channel: "20.04"
        architectures: [arm64]
  - build-on:
      - name: ubuntu
        channel: "22.04"
        architectures: [amd64]
    run-on:
      - name: ubuntu
        channel: "22.04"
        architectures: [amd64]
  - build-on:
      - name: ubuntu
        channel: "22.04"
        architectures: [s390x]
    run-on:
      - name: ubuntu
        channel: "22.04"
        architectures: [s390x]
  - build-on:
      - name: ubuntu
        channel: "22.04"
        architectures: [ppc64el]
    run-on:
      - name: ubuntu
        channel: "22.04"
        architectures: [ppc64el]
  - build-on:
      - name: ubuntu
        channel: "22.04"
        architectures: [arm64]
    run-on:
      - name: ubuntu
        channel: "22.04"
        architectures: [arm64]
  - build-on:
      - name: ubuntu
        channel: "23.10"
        architectures: [amd64]
    run-on:
      - name: ubuntu
        channel: "23.10"
        architectures: [amd64]
  - build-on:
      - name: ubuntu
        channel: "23.10"
        architectures: [s390x]
    run-on:
      - name: ubuntu
        channel: "23.10"
        architectures: [s390x]
  - build-on:
      - name: ubuntu
        channel: "23.10"
        architectures: [ppc64el]
    run-on:
      - name: ubuntu
        channel: "23.10"
        architectures: [ppc64el]
  - build-on:
      - name: ubuntu
        channel: "23.10"
        architectures: [arm64]
    run-on:
      - name: ubuntu
        channel: "23.10"
        architectures: [arm64]
  - name: ubuntu
    channel: "22.04"

config.yaml (new file)
@@ -0,0 +1,5 @@
options:
  iperf_listen_cidr:
    default: ""
    type: string
    description: Network cidr to use for iperf listener. Changing this option will only take effect on a new deployment.
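    # Illustrative example value: iperf_listen_cidr: "10.0.0.0/24" would have the
    # iperf listener use the unit address that falls within that network.
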
icon.svg (deleted file)
@@ -1,54 +0,0 @@
<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN"
"http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
<svg version="1.0" xmlns="http://www.w3.org/2000/svg"
width="96.000000pt" height="96.000000pt" viewBox="0 0 96.000000 96.000000"
preserveAspectRatio="xMidYMid meet">
<metadata>
Created by potrace 1.10, written by Peter Selinger 2001-2011
</metadata>
<g transform="translate(0.000000,96.000000) scale(0.100000,-0.100000)"
fill="#000000" stroke="none">
<path d="M62 913 c-15 -16 -18 -20 -6 -10 12 10 34 22 50 27 25 7 26 8 7 9
-13 1 -35 -11 -51 -26z"/>
<path d="M826 929 c59 -17 104 -84 104 -156 0 -13 5 -23 10 -23 15 0 12 65 -6
110 -20 49 -61 80 -106 79 l-33 -1 31 -9z"/>
<path d="M625 822 c-56 -2 -76 -8 -103 -29 -100 -76 -92 -236 16 -304 125 -79
282 8 282 157 0 54 -14 91 -48 127 -40 43 -69 52 -147 49z m0 -31 c-3 -5 -6
-32 -6 -60 -2 -49 -2 -50 -23 -36 -27 20 -39 19 -67 -3 -19 -16 -22 -27 -20
-71 2 -56 -17 -68 -30 -20 -17 64 22 160 76 187 25 13 78 15 70 3z m72 -13 c6
-7 11 -30 11 -51 1 -21 4 -42 7 -48 11 -17 25 1 25 32 0 16 5 29 10 29 12 0
13 -50 1 -68 -4 -8 -19 -12 -32 -10 -21 3 -24 9 -27 56 -3 44 -6 53 -20 50
-14 -3 -17 -17 -18 -96 -2 -106 -6 -122 -32 -122 -29 0 -42 22 -40 69 1 44 -4
57 -21 46 -6 -3 -11 -35 -11 -71 0 -38 -4 -64 -10 -64 -7 0 -10 27 -8 73 l3
72 30 0 c29 0 30 -1 33 -53 3 -44 6 -53 20 -50 14 3 17 18 20 98 3 111 5 120
30 120 11 0 24 -6 29 -12z m48 -7 c3 -5 1 -12 -4 -15 -5 -3 -11 1 -15 9 -6 16
9 21 19 6z m50 -130 c-9 -111 -71 -161 -190 -153 -47 3 -61 16 -44 40 13 17
17 18 33 7 39 -29 93 25 84 84 -4 29 4 41 16 23 10 -16 65 0 81 24 8 13 18 24
20 24 3 0 3 -22 0 -49z"/>
<path d="M86 739 c-23 -18 -41 -26 -69 -28 -5 -1 -5 -5 -2 -11 3 -5 21 -10 39
-10 49 0 56 -14 56 -107 0 -63 5 -93 19 -122 23 -44 74 -91 119 -110 18 -7 32
-20 32 -27 0 -27 -44 -103 -54 -95 -6 4 -18 1 -27 -9 -14 -15 -10 -16 58 -16
68 0 90 6 40 12 -12 1 -25 3 -29 3 -5 1 -8 8 -8 16 0 12 6 12 48 -1 70 -23
106 -29 98 -15 -4 6 -18 11 -32 11 -19 0 -24 4 -20 18 3 9 10 36 16 59 9 38
14 42 49 48 48 8 132 -13 243 -61 112 -48 167 -65 197 -57 16 4 22 2 17 -6 -5
-8 0 -8 17 -2 14 6 29 6 38 0 11 -6 8 -9 -11 -10 -17 -1 -19 -3 -7 -6 14 -4
16 -12 11 -46 -15 -101 -51 -132 -177 -153 l-82 -13 85 5 c50 3 99 13 120 23
42 20 76 84 83 153 5 49 2 52 -48 64 -23 5 -185 67 -255 96 -61 26 -93 49
-170 122 -52 49 -125 111 -161 138 -38 28 -72 62 -79 79 -17 40 -66 79 -99 79
-16 0 -40 -9 -55 -21z m44 -24 c-8 -9 -9 -15 -2 -15 6 0 14 5 17 10 13 21 35
9 65 -36 35 -49 34 -55 -6 -60 -14 -1 -34 -10 -45 -20 -18 -17 -19 -16 -19 22
-1 21 -7 50 -14 63 -14 26 -11 51 6 51 6 0 5 -6 -2 -15z m103 -199 c25 -53 43
-78 63 -88 16 -7 30 -22 32 -32 9 -43 -64 -49 -117 -9 -69 52 -87 89 -67 136
11 28 41 67 51 67 1 0 19 -33 38 -74z m115 22 c94 -46 112 -63 55 -52 -49 9
-142 56 -168 84 l-20 23 25 -8 c14 -4 62 -25 108 -47z m70 -88 c23 0 43 -10
68 -33 l37 -33 -59 0 c-33 1 -69 -3 -81 -7 -19 -7 -22 -4 -25 20 -2 19 -14 35
-40 52 -59 38 -48 55 15 26 28 -14 67 -25 85 -25z m-73 -151 l0 -44 -37 0
c-21 0 -38 1 -38 3 0 2 9 17 21 33 11 16 18 33 15 38 -7 10 13 21 29 17 6 -2
10 -23 10 -47z"/>
<path d="M168 223 c6 -2 18 -2 25 0 6 3 1 5 -13 5 -14 0 -19 -2 -12 -5z"/>
<path d="M1 174 c0 -11 3 -14 6 -6 3 7 2 16 -1 19 -3 4 -6 -2 -5 -13z"/>
<path d="M45 61 c28 -33 69 -48 150 -55 l70 -5 -75 13 c-88 15 -107 22 -140
50 l-25 21 20 -24z"/>
</g>
</svg>

lib/charms/grafana_agent/v0/cos_agent.py (new file)
@@ -0,0 +1,842 @@
# Copyright 2023 Canonical Ltd.
# See LICENSE file for licensing details.

r"""## Overview.

This library can be used to manage the cos_agent relation interface:

- `COSAgentProvider`: Use in machine charms that need to have a workload's metrics
or logs scraped, or forward rule files or dashboards to Prometheus, Loki or Grafana through
the Grafana Agent machine charm.

- `COSAgentConsumer`: Used in the Grafana Agent machine charm to manage the requirer side of
the `cos_agent` interface.


## COSAgentProvider Library Usage

Grafana Agent machine Charmed Operator interacts with its clients using the cos_agent library.
Charms seeking to send telemetry must do so using the `COSAgentProvider` object from
this charm library.

Using the `COSAgentProvider` object only requires instantiating it,
typically in the `__init__` method of your charm (the one which sends telemetry).

The constructor of `COSAgentProvider` has only one required and nine optional parameters:

```python
    def __init__(
        self,
        charm: CharmType,
        relation_name: str = DEFAULT_RELATION_NAME,
        metrics_endpoints: Optional[List[_MetricsEndpointDict]] = None,
        metrics_rules_dir: str = "./src/prometheus_alert_rules",
        logs_rules_dir: str = "./src/loki_alert_rules",
        recurse_rules_dirs: bool = False,
        log_slots: Optional[List[str]] = None,
        dashboard_dirs: Optional[List[str]] = None,
        refresh_events: Optional[List] = None,
        scrape_configs: Optional[Union[List[Dict], Callable]] = None,
    ):
```

### Parameters

- `charm`: The instance of the charm that instantiates `COSAgentProvider`, typically `self`.

- `relation_name`: If your charmed operator uses a relation name other than `cos-agent` to use
    the `cos_agent` interface, this is where you have to specify that.

- `metrics_endpoints`: In this parameter you can specify the metrics endpoints that Grafana Agent
    machine Charmed Operator will scrape. The configs of this list will be merged with the configs
    from `scrape_configs`.

- `metrics_rules_dir`: The directory in which the Charmed Operator stores its metrics alert rules
    files.

- `logs_rules_dir`: The directory in which the Charmed Operator stores its logs alert rules files.

- `recurse_rules_dirs`: This parameter sets whether Grafana Agent machine Charmed Operator has to
    search alert rules files recursively in the previous two directories or not.

- `log_slots`: Snap slots to connect to for scraping logs in the form ["snap-name:slot", ...].

- `dashboard_dirs`: List of directories where the dashboards are stored in the Charmed Operator.

- `refresh_events`: List of events on which to refresh relation data.

- `scrape_configs`: List of standard scrape_configs dicts or a callable that returns the list in
    case the configs need to be generated dynamically. The contents of this list will be merged
    with the configs from `metrics_endpoints`.


### Example 1 - Minimal instrumentation:

In order to use this object the following should be in the `charm.py` file.

```python
from charms.grafana_agent.v0.cos_agent import COSAgentProvider
...
class TelemetryProviderCharm(CharmBase):
    def __init__(self, *args):
        ...
        self._grafana_agent = COSAgentProvider(self)
```

### Example 2 - Full instrumentation:

In order to use this object the following should be in the `charm.py` file.

```python
from charms.grafana_agent.v0.cos_agent import COSAgentProvider
...
class TelemetryProviderCharm(CharmBase):
    def __init__(self, *args):
        ...
        self._grafana_agent = COSAgentProvider(
            self,
            relation_name="custom-cos-agent",
            metrics_endpoints=[
                # specify "path" and "port" to scrape from localhost
                {"path": "/metrics", "port": 9000},
                {"path": "/metrics", "port": 9001},
                {"path": "/metrics", "port": 9002},
            ],
            metrics_rules_dir="./src/alert_rules/prometheus",
            logs_rules_dir="./src/alert_rules/loki",
            recurse_rules_dirs=True,
            log_slots=["my-app:slot"],
            dashboard_dirs=["./src/dashboards_1", "./src/dashboards_2"],
            refresh_events=["update-status", "upgrade-charm"],
            scrape_configs=[
                {
                    "job_name": "custom_job",
                    "metrics_path": "/metrics",
                    "authorization": {"credentials": "bearer-token"},
                    "static_configs": [
                        {
                            "targets": ["localhost:9003"],
                            "labels": {"key": "value"},
                        },
                    ],
                },
            ]
        )
```

### Example 3 - Dynamic scrape configs generation:

Pass a function to the `scrape_configs` to decouple the generation of the configs
from the instantiation of the COSAgentProvider object.

```python
from charms.grafana_agent.v0.cos_agent import COSAgentProvider
...

class TelemetryProviderCharm(CharmBase):
    def generate_scrape_configs(self):
        return [
            {
                "job_name": "custom",
                "metrics_path": "/metrics",
                "static_configs": [{"targets": ["localhost:9000"]}],
            },
        ]

    def __init__(self, *args):
        ...
        self._grafana_agent = COSAgentProvider(
            self,
            scrape_configs=self.generate_scrape_configs,
        )
```

## COSAgentConsumer Library Usage

This object may be used by any Charmed Operator which gathers telemetry data by
implementing the consumer side of the `cos_agent` interface.
For instance Grafana Agent machine Charmed Operator.

For this purpose the charm needs to instantiate the `COSAgentConsumer` object with one mandatory
and two optional arguments.

### Parameters

- `charm`: A reference to the parent (Grafana Agent machine) charm.

- `relation_name`: The name of the relation that the charm uses to interact
    with its clients that provide telemetry data using the `COSAgentProvider` object.

    If provided, this relation name must match a provided relation in metadata.yaml with the
    `cos_agent` interface.
    The default value of this argument is "cos-agent".

- `refresh_events`: List of events on which to refresh relation data.


### Example 1 - Minimal instrumentation:

In order to use this object the following should be in the `charm.py` file.

```python
from charms.grafana_agent.v0.cos_agent import COSAgentRequirer
...
class GrafanaAgentMachineCharm(GrafanaAgentCharm):
    def __init__(self, *args):
        ...
        self._cos = COSAgentRequirer(self)
```


### Example 2 - Full instrumentation:

In order to use this object the following should be in the `charm.py` file.

```python
from charms.grafana_agent.v0.cos_agent import COSAgentRequirer
...
class GrafanaAgentMachineCharm(GrafanaAgentCharm):
    def __init__(self, *args):
        ...
        self._cos = COSAgentRequirer(
            self,
            relation_name="cos-agent-consumer",
            refresh_events=["update-status", "upgrade-charm"],
        )
```
"""

import base64
import json
import logging
import lzma
from collections import namedtuple
from itertools import chain
from pathlib import Path
from typing import TYPE_CHECKING, Any, Callable, ClassVar, Dict, List, Optional, Set, Union

import pydantic
from cosl import JujuTopology
from cosl.rules import AlertRules
from ops.charm import RelationChangedEvent
from ops.framework import EventBase, EventSource, Object, ObjectEvents
from ops.model import Relation, Unit
from ops.testing import CharmType

if TYPE_CHECKING:
    try:
        from typing import TypedDict

        class _MetricsEndpointDict(TypedDict):
            path: str
            port: int

    except ModuleNotFoundError:
        _MetricsEndpointDict = Dict  # pyright: ignore

LIBID = "dc15fa84cef84ce58155fb84f6c6213a"
LIBAPI = 0
LIBPATCH = 6

PYDEPS = ["cosl", "pydantic < 2"]

DEFAULT_RELATION_NAME = "cos-agent"
DEFAULT_PEER_RELATION_NAME = "peers"
DEFAULT_SCRAPE_CONFIG = {
    "static_configs": [{"targets": ["localhost:80"]}],
    "metrics_path": "/metrics",
}

logger = logging.getLogger(__name__)
SnapEndpoint = namedtuple("SnapEndpoint", "owner, name")


class GrafanaDashboard(str):
    """Grafana Dashboard encoded json; lzma-compressed."""

    # TODO Replace this with a custom type when pydantic v2 released (end of 2023 Q1?)
    # https://github.com/pydantic/pydantic/issues/4887
    @staticmethod
    def _serialize(raw_json: Union[str, bytes]) -> "GrafanaDashboard":
        if not isinstance(raw_json, bytes):
            raw_json = raw_json.encode("utf-8")
        encoded = base64.b64encode(lzma.compress(raw_json)).decode("utf-8")
        return GrafanaDashboard(encoded)

    def _deserialize(self) -> Dict:
        try:
            raw = lzma.decompress(base64.b64decode(self.encode("utf-8"))).decode()
            return json.loads(raw)
        except json.decoder.JSONDecodeError as e:
            logger.error("Invalid Dashboard format: %s", e)
            return {}

    def __repr__(self):
        """Return string representation of self."""
        return "<GrafanaDashboard>"


class CosAgentProviderUnitData(pydantic.BaseModel):
    """Unit databag model for `cos-agent` relation."""

    # The following entries are the same for all units of the same principal.
    # Note that the same grafana agent subordinate may be related to several apps.
    # this needs to make its way to the gagent leader
    metrics_alert_rules: dict
    log_alert_rules: dict
    dashboards: List[GrafanaDashboard]
    subordinate: Optional[bool]

    # The following entries may vary across units of the same principal app.
    # this data does not need to be forwarded to the gagent leader
    metrics_scrape_jobs: List[Dict]
    log_slots: List[str]

    # when this whole datastructure is dumped into a databag, it will be nested under this key.
    # while not strictly necessary (we could have it 'flattened out' into the databag),
    # this simplifies working with the model.
    KEY: ClassVar[str] = "config"


class CosAgentPeersUnitData(pydantic.BaseModel):
    """Unit databag model for `peers` cos-agent machine charm peer relation."""

    # We need the principal unit name and relation metadata to be able to render identifiers
    # (e.g. topology) on the leader side, after all the data moves into peer data (the grafana
    # agent leader can only see its own principal, because it is a subordinate charm).
    principal_unit_name: str
    principal_relation_id: str
    principal_relation_name: str

    # The only data that is forwarded to the leader is data that needs to go into the app databags
    # of the outgoing o11y relations.
    metrics_alert_rules: Optional[dict]
    log_alert_rules: Optional[dict]
    dashboards: Optional[List[GrafanaDashboard]]

    # when this whole datastructure is dumped into a databag, it will be nested under this key.
    # while not strictly necessary (we could have it 'flattened out' into the databag),
    # this simplifies working with the model.
    KEY: ClassVar[str] = "config"

    @property
    def app_name(self) -> str:
        """Parse out the app name from the unit name.

        TODO: Switch to using `model_post_init` when pydantic v2 is released?
        https://github.com/pydantic/pydantic/issues/1729#issuecomment-1300576214
        """
        return self.principal_unit_name.split("/")[0]


class COSAgentProvider(Object):
    """Integration endpoint wrapper for the provider side of the cos_agent interface."""

    def __init__(
        self,
        charm: CharmType,
        relation_name: str = DEFAULT_RELATION_NAME,
        metrics_endpoints: Optional[List["_MetricsEndpointDict"]] = None,
        metrics_rules_dir: str = "./src/prometheus_alert_rules",
        logs_rules_dir: str = "./src/loki_alert_rules",
        recurse_rules_dirs: bool = False,
        log_slots: Optional[List[str]] = None,
        dashboard_dirs: Optional[List[str]] = None,
        refresh_events: Optional[List] = None,
        *,
        scrape_configs: Optional[Union[List[dict], Callable]] = None,
    ):
        """Create a COSAgentProvider instance.

        Args:
            charm: The `CharmBase` instance that is instantiating this object.
            relation_name: The name of the relation to communicate over.
            metrics_endpoints: List of endpoints in the form [{"path": path, "port": port}, ...].
                This argument is a simplified form of the `scrape_configs`.
                The contents of this list will be merged with the contents of `scrape_configs`.
            metrics_rules_dir: Directory where the metrics rules are stored.
            logs_rules_dir: Directory where the logs rules are stored.
            recurse_rules_dirs: Whether to recurse into rule paths.
            log_slots: Snap slots to connect to for scraping logs
                in the form ["snap-name:slot", ...].
            dashboard_dirs: Directory where the dashboards are stored.
            refresh_events: List of events on which to refresh relation data.
            scrape_configs: List of standard scrape_configs dicts or a callable
                that returns the list in case the configs need to be generated dynamically.
                The contents of this list will be merged with the contents of `metrics_endpoints`.
        """
        super().__init__(charm, relation_name)
        dashboard_dirs = dashboard_dirs or ["./src/grafana_dashboards"]

        self._charm = charm
        self._relation_name = relation_name
        self._metrics_endpoints = metrics_endpoints or []
        self._scrape_configs = scrape_configs or []
        self._metrics_rules = metrics_rules_dir
        self._logs_rules = logs_rules_dir
        self._recursive = recurse_rules_dirs
        self._log_slots = log_slots or []
        self._dashboard_dirs = dashboard_dirs
        self._refresh_events = refresh_events or [self._charm.on.config_changed]

        events = self._charm.on[relation_name]
        self.framework.observe(events.relation_joined, self._on_refresh)
        self.framework.observe(events.relation_changed, self._on_refresh)
        for event in self._refresh_events:
            self.framework.observe(event, self._on_refresh)

    def _on_refresh(self, event):
        """Trigger the class to update relation data."""
        relations = self._charm.model.relations[self._relation_name]

        for relation in relations:
            # Before a principal is related to the grafana-agent subordinate, we'd get
            # ModelError: ERROR cannot read relation settings: unit "zk/2": settings not found
            # Add a guard to make sure it doesn't happen.
            if relation.data and self._charm.unit in relation.data:
                # Subordinate relations can communicate only over unit data.
                try:
                    data = CosAgentProviderUnitData(
                        metrics_alert_rules=self._metrics_alert_rules,
                        log_alert_rules=self._log_alert_rules,
                        dashboards=self._dashboards,
                        metrics_scrape_jobs=self._scrape_jobs,
                        log_slots=self._log_slots,
                        subordinate=self._charm.meta.subordinate,
                    )
                    relation.data[self._charm.unit][data.KEY] = data.json()
                except (
                    pydantic.ValidationError,
                    json.decoder.JSONDecodeError,
                ) as e:
                    logger.error("Invalid relation data provided: %s", e)

    @property
    def _scrape_jobs(self) -> List[Dict]:
        """Return a prometheus_scrape-like data structure for jobs.

        https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
        """
        if callable(self._scrape_configs):
            scrape_configs = self._scrape_configs()
        else:
            # Create a copy of the user scrape_configs, since we will mutate this object
            scrape_configs = self._scrape_configs.copy()

        # Convert "metrics_endpoints" to standard scrape_configs, and add them in
        for endpoint in self._metrics_endpoints:
            scrape_configs.append(
                {
                    "metrics_path": endpoint["path"],
                    "static_configs": [{"targets": [f"localhost:{endpoint['port']}"]}],
                }
            )

        scrape_configs = scrape_configs or [DEFAULT_SCRAPE_CONFIG]

        # Augment job name to include the app name and a unique id (index)
        for idx, scrape_config in enumerate(scrape_configs):
            scrape_config["job_name"] = "_".join(
                [self._charm.app.name, str(idx), scrape_config.get("job_name", "default")]
            )

        return scrape_configs

    @property
    def _metrics_alert_rules(self) -> Dict:
        """Use (for now) the prometheus_scrape AlertRules to initialize this."""
        alert_rules = AlertRules(
            query_type="promql", topology=JujuTopology.from_charm(self._charm)
        )
        alert_rules.add_path(self._metrics_rules, recursive=self._recursive)
        return alert_rules.as_dict()

    @property
    def _log_alert_rules(self) -> Dict:
        """Use (for now) the loki_push_api AlertRules to initialize this."""
        alert_rules = AlertRules(query_type="logql", topology=JujuTopology.from_charm(self._charm))
        alert_rules.add_path(self._logs_rules, recursive=self._recursive)
        return alert_rules.as_dict()

    @property
    def _dashboards(self) -> List[GrafanaDashboard]:
        dashboards: List[GrafanaDashboard] = []
        for d in self._dashboard_dirs:
            for path in Path(d).glob("*"):
                dashboard = GrafanaDashboard._serialize(path.read_bytes())
                dashboards.append(dashboard)
        return dashboards


class COSAgentDataChanged(EventBase):
    """Event emitted by `COSAgentRequirer` when relation data changes."""


class COSAgentValidationError(EventBase):
    """Event emitted by `COSAgentRequirer` when there is an error in the relation data."""

    def __init__(self, handle, message: str = ""):
        super().__init__(handle)
        self.message = message

    def snapshot(self) -> Dict:
        """Save COSAgentValidationError source information."""
        return {"message": self.message}

    def restore(self, snapshot):
        """Restore COSAgentValidationError source information."""
        self.message = snapshot["message"]


class COSAgentRequirerEvents(ObjectEvents):
    """`COSAgentRequirer` events."""

    data_changed = EventSource(COSAgentDataChanged)
    validation_error = EventSource(COSAgentValidationError)


class MultiplePrincipalsError(Exception):
    """Custom exception for when there are multiple principal applications."""

    pass


class COSAgentRequirer(Object):
    """Integration endpoint wrapper for the Requirer side of the cos_agent interface."""

    on = COSAgentRequirerEvents()  # pyright: ignore

    def __init__(
        self,
        charm: CharmType,
        *,
        relation_name: str = DEFAULT_RELATION_NAME,
        peer_relation_name: str = DEFAULT_PEER_RELATION_NAME,
        refresh_events: Optional[List[str]] = None,
    ):
        """Create a COSAgentRequirer instance.

        Args:
            charm: The `CharmBase` instance that is instantiating this object.
            relation_name: The name of the relation to communicate over.
            peer_relation_name: The name of the peer relation to communicate over.
            refresh_events: List of events on which to refresh relation data.
        """
        super().__init__(charm, relation_name)
        self._charm = charm
        self._relation_name = relation_name
        self._peer_relation_name = peer_relation_name
        self._refresh_events = refresh_events or [self._charm.on.config_changed]

        events = self._charm.on[relation_name]
        self.framework.observe(
            events.relation_joined, self._on_relation_data_changed
        )  # TODO: do we need this?
        self.framework.observe(events.relation_changed, self._on_relation_data_changed)
        for event in self._refresh_events:
            self.framework.observe(event, self.trigger_refresh)  # pyright: ignore

        # Peer relation events
        # A peer relation is needed as it is the only mechanism for exchanging data across
        # subordinate units.
        # self.framework.observe(
        #     self.on[self._peer_relation_name].relation_joined, self._on_peer_relation_joined
        # )
        peer_events = self._charm.on[peer_relation_name]
        self.framework.observe(peer_events.relation_changed, self._on_peer_relation_changed)

    @property
    def peer_relation(self) -> Optional["Relation"]:
        """Helper function for obtaining the peer relation object.

        Returns: peer relation object
        (NOTE: would return None if called too early, e.g. during install).
        """
        return self.model.get_relation(self._peer_relation_name)

    def _on_peer_relation_changed(self, _):
        # Peer data is used for forwarding data from principal units to the grafana agent
        # subordinate leader, for updating the app data of the outgoing o11y relations.
        if self._charm.unit.is_leader():
            self.on.data_changed.emit()  # pyright: ignore

    def _on_relation_data_changed(self, event: RelationChangedEvent):
        # Peer data is the only means of communication between subordinate units.
        if not self.peer_relation:
            event.defer()
            return

        cos_agent_relation = event.relation
        if not event.unit or not cos_agent_relation.data.get(event.unit):
            return
        principal_unit = event.unit

        # Coherence check
        units = cos_agent_relation.units
        if len(units) > 1:
            # should never happen
            raise ValueError(
                f"unexpected error: subordinate relation {cos_agent_relation} "
                f"should have exactly one unit"
            )

        if not (raw := cos_agent_relation.data[principal_unit].get(CosAgentProviderUnitData.KEY)):
            return

        if not (provider_data := self._validated_provider_data(raw)):
            return

        # Copy data from the principal relation to the peer relation, so the leader could
        # follow up.
        # Save the originating unit name, so it could be used for topology later on by the leader.
        data = CosAgentPeersUnitData(  # peer relation databag model
            principal_unit_name=event.unit.name,
            principal_relation_id=str(event.relation.id),
            principal_relation_name=event.relation.name,
            metrics_alert_rules=provider_data.metrics_alert_rules,
            log_alert_rules=provider_data.log_alert_rules,
            dashboards=provider_data.dashboards,
        )
        self.peer_relation.data[self._charm.unit][
            f"{CosAgentPeersUnitData.KEY}-{event.unit.name}"
        ] = data.json()

        # We can't easily tell if the data that was changed is limited to only the data
        # that goes into peer relation (in which case, if this is not a leader unit, we wouldn't
        # need to emit `on.data_changed`), so we're emitting `on.data_changed` either way.
        self.on.data_changed.emit()  # pyright: ignore

    def _validated_provider_data(self, raw) -> Optional[CosAgentProviderUnitData]:
        try:
            return CosAgentProviderUnitData(**json.loads(raw))
        except (pydantic.ValidationError, json.decoder.JSONDecodeError) as e:
            self.on.validation_error.emit(message=str(e))  # pyright: ignore
            return None

    def trigger_refresh(self, _):
        """Trigger a refresh of relation data."""
        # FIXME: Figure out what we should do here
        self.on.data_changed.emit()  # pyright: ignore

    @property
    def _principal_unit(self) -> Optional[Unit]:
        """Return the principal unit for a relation.

        Assumes that the relation is of type subordinate.
        Relies on the fact that, for subordinate relations, the only remote unit visible to
        *this unit* is the principal unit that this unit is attached to.
        """
        if relations := self._principal_relations:
            # Technically it's a list, but for subordinates there can only be one relation
            principal_relation = next(iter(relations))
            if units := principal_relation.units:
                # Technically it's a list, but for subordinates there can only be one
                return next(iter(units))

        return None

@property
|
||||
def _principal_relations(self):
|
||||
relations = []
|
||||
for relation in self._charm.model.relations[self._relation_name]:
|
||||
            if not json.loads(relation.data[next(iter(relation.units))]["config"]).get(
                "subordinate", False
            ):
                relations.append(relation)
        if len(relations) > 1:
            logger.error(
                "Multiple applications claiming to be principal. "
                "Update the cos-agent library in the client application charms."
            )
            raise MultiplePrincipalsError("Multiple principal applications.")
        return relations

    @property
    def _remote_data(self) -> List[CosAgentProviderUnitData]:
        """Return a list of remote data from each of the related units.

        Assumes that the relation is of type subordinate.
        Relies on the fact that, for subordinate relations, the only remote unit visible to
        *this unit* is the principal unit that this unit is attached to.
        """
        all_data = []

        for relation in self._charm.model.relations[self._relation_name]:
            if not relation.units:
                continue
            unit = next(iter(relation.units))
            if not (raw := relation.data[unit].get(CosAgentProviderUnitData.KEY)):
                continue
            if not (provider_data := self._validated_provider_data(raw)):
                continue
            all_data.append(provider_data)

        return all_data

    def _gather_peer_data(self) -> List[CosAgentPeersUnitData]:
        """Collect data from the peers.

        Returns a trimmed-down list of CosAgentPeersUnitData.
        """
        relation = self.peer_relation

        # Ensure that whatever context we're running this in, we take the necessary precautions:
        if not relation or not relation.data or not relation.app:
            return []

        # Iterate over all peer unit data and only collect every principal once.
        peer_data: List[CosAgentPeersUnitData] = []
        app_names: Set[str] = set()

        for unit in chain((self._charm.unit,), relation.units):
            if not relation.data.get(unit):
                continue

            for unit_name in relation.data.get(unit):  # pyright: ignore
                if not unit_name.startswith(CosAgentPeersUnitData.KEY):
                    continue
                raw = relation.data[unit].get(unit_name)
                if raw is None:
                    continue
                data = CosAgentPeersUnitData(**json.loads(raw))
                # Have we already seen this principal app?
                if (app_name := data.app_name) in app_names:
                    continue
                peer_data.append(data)
                app_names.add(app_name)

        return peer_data

    @property
    def metrics_alerts(self) -> Dict[str, Any]:
        """Fetch metrics alerts."""
        alert_rules = {}

        seen_apps: List[str] = []
        for data in self._gather_peer_data():
            if rules := data.metrics_alert_rules:
                app_name = data.app_name
                if app_name in seen_apps:
                    continue  # dedup!
                seen_apps.append(app_name)
                # This is only used for naming the file, so be as specific as we can be
                identifier = JujuTopology(
                    model=self._charm.model.name,
                    model_uuid=self._charm.model.uuid,
                    application=app_name,
                    # For the topology unit, we could use `data.principal_unit_name`, but that unit
                    # name may not be very stable: `_gather_peer_data` de-duplicates by app name so
                    # the exact unit name that turns up first in the iterator may vary from time to
                    # time. So using the grafana-agent unit name instead.
                    unit=self._charm.unit.name,
                ).identifier

                alert_rules[identifier] = rules

        return alert_rules

    @property
    def metrics_jobs(self) -> List[Dict]:
        """Parse the relation data contents and extract the metrics jobs."""
        scrape_jobs = []
        for data in self._remote_data:
            for job in data.metrics_scrape_jobs:
                # In #220, relation schema changed from a simplified dict to the standard
                # `scrape_configs`.
                # This is to ensure backwards compatibility with Providers older than v0.5.
                if "path" in job and "port" in job and "job_name" in job:
                    job = {
                        "job_name": job["job_name"],
                        "metrics_path": job["path"],
                        "static_configs": [{"targets": [f"localhost:{job['port']}"]}],
                    }

                scrape_jobs.append(job)

        return scrape_jobs

    @property
    def snap_log_endpoints(self) -> List[SnapEndpoint]:
        """Fetch logging endpoints exposed by related snaps."""
        plugs = []
        for data in self._remote_data:
            targets = data.log_slots
            if targets:
                for target in targets:
                    if target in plugs:
                        logger.warning(
                            f"plug {target} already listed. "
                            "The same snap is being passed from multiple "
                            "endpoints; this should not happen."
                        )
                    else:
                        plugs.append(target)

        endpoints = []
        for plug in plugs:
            if ":" not in plug:
                logger.error(f"invalid plug definition received: {plug}. Ignoring...")
            else:
                endpoint = SnapEndpoint(*plug.split(":"))
                endpoints.append(endpoint)
        return endpoints

    @property
    def logs_alerts(self) -> Dict[str, Any]:
        """Fetch log alerts."""
        alert_rules = {}
        seen_apps: List[str] = []

        for data in self._gather_peer_data():
            if rules := data.log_alert_rules:
                # This is only used for naming the file, so be as specific as we can be
                app_name = data.app_name
                if app_name in seen_apps:
                    continue  # dedup!
                seen_apps.append(app_name)

                identifier = JujuTopology(
                    model=self._charm.model.name,
                    model_uuid=self._charm.model.uuid,
                    application=app_name,
                    # For the topology unit, we could use `data.principal_unit_name`, but that unit
                    # name may not be very stable: `_gather_peer_data` de-duplicates by app name so
                    # the exact unit name that turns up first in the iterator may vary from time to
                    # time. So using the grafana-agent unit name instead.
                    unit=self._charm.unit.name,
                ).identifier

                alert_rules[identifier] = rules

        return alert_rules

    @property
    def dashboards(self) -> List[Dict[str, str]]:
        """Fetch dashboards as encoded content.

        Dashboards are assumed not to vary across units of the same primary.
        """
        dashboards: List[Dict[str, Any]] = []

        seen_apps: List[str] = []
        for data in self._gather_peer_data():
            app_name = data.app_name
            if app_name in seen_apps:
                continue  # dedup!
            seen_apps.append(app_name)

            for encoded_dashboard in data.dashboards or ():
                content = GrafanaDashboard(encoded_dashboard)._deserialize()

                title = content.get("title", "no_title")

                dashboards.append(
                    {
                        "relation_id": data.principal_relation_id,
                        # We have the remote charm name - use it for the identifier
                        "charm": f"{data.principal_relation_name}-{app_name}",
                        "content": content,
                        "title": title,
                    }
                )

        return dashboards
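A note on the backwards-compatibility branch in `metrics_jobs` above: the translation from the pre-#220 simplified job dict to the standard Prometheus `scrape_configs` shape is easiest to see in isolation. A minimal standalone sketch (the `normalize_job` name is ours, not the library's):

```python
# Sketch: translate the old simplified job dict into the standard
# `scrape_configs` shape, as `metrics_jobs` does above.
def normalize_job(job: dict) -> dict:
    if "path" in job and "port" in job and "job_name" in job:
        return {
            "job_name": job["job_name"],
            "metrics_path": job["path"],
            "static_configs": [{"targets": [f"localhost:{job['port']}"]}],
        }
    return job  # already in scrape_configs form


print(normalize_job({"job_name": "magpie", "path": "/metrics", "port": 80}))
# {'job_name': 'magpie', 'metrics_path': '/metrics',
#  'static_configs': [{'targets': ['localhost:80']}]}
```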
@ -1 +0,0 @@
src/metadata.yaml
21
metadata.yaml
Normal file
21
metadata.yaml
Normal file
@ -0,0 +1,21 @@
# This file populates the Overview on Charmhub.
# See https://juju.is/docs/sdk/metadata-reference for a checklist and guidance.
name: magpie
summary: Magpie layer to test networking - ICMP and DNS
maintainer: OpenStack Charmers <openstack-charmers@lists.ubuntu.com>
description: |
  Magpie will check ICMP, DNS, MTU and rx/tx speed between itself and any
  peer units deployed - deploy more than one magpie unit for meaningful results.
tags: [testing, CI]
provides:
  # https://charmhub.io/grafana-agent/libraries/cos_agent
  cos-agent:
    interface: cos_agent
peers:
  magpie:
    interface: magpie2
series:
  - focal
  - jammy
  - lunar
  - mantic
@ -1,13 +1,8 @@
- project:
    templates:
      - charm-unit-jobs-py38
      - charm-unit-jobs-py310
    check:
      jobs:
        - focal
        - jammy
    vars:
      needs_charm_build: true
      charm_build_name: magpie
      build_type: charmcraft
      charmcraft_channel: 2.x/edge
      charmcraft_channel: 2.x/stable
47
pyproject.toml
Normal file
47
pyproject.toml
Normal file
@ -0,0 +1,47 @@
# Testing tools configuration
[tool.coverage.run]
branch = true

[tool.coverage.report]
show_missing = true

[tool.pytest.ini_options]
minversion = "6.0"
log_cli_level = "INFO"

# Formatting tools configuration
[tool.black]
line-length = 99
target-version = ["py38"]

# Linting tools configuration
[lint]
line-length = 99
select = ["E", "W", "F", "C", "N", "D", "I001"]
extend-ignore = [
    "C901",
    "D203",
    "D204",
    "D213",
    "D215",
    "D400",
    "D404",
    "D406",
    "D407",
    "D408",
    "D409",
    "D413",
]
ignore = ["E501", "D107"]
extend-exclude = ["__pycache__", "*.egg_info"]
per-file-ignores = {"tests/*" = ["D100","D101","D102","D103","D104"]}

[lint.mccabe]
max-complexity = 10

[tool.codespell]
skip = "build,lib,venv,icon.svg,.tox,.git,.mypy_cache,.ruff_cache,.coverage,cover"

[tool.pyright]
include = ["src/**.py", "tests/**.py"]
5
rebuild
5
rebuild
@ -1,5 +0,0 @@
# This file is used to trigger rebuilds
# when dependencies of the charm change,
# but nothing in the charm needs to.
# simply change the uuid to something new
53cb6df6-1178-11ec-b383-bf4fe629ca15
@ -1,20 +1,10 @@
# This file is managed centrally by release-tools and should not be modified
# within individual charm repos. See the 'global' dir contents for available
# choices of *requirements.txt files for OpenStack Charms:
# https://github.com/openstack-charmers/release-tools
#
# NOTE(lourot): This might look like a duplication of test-requirements.txt but
# some tox targets use only test-requirements.txt whereas charm-build uses only
# requirements.txt
setuptools<50.0.0  # https://github.com/pypa/setuptools/commit/04e3df22df840c6bb244e9b27bc56750c44b7c85
ops ~= 2.4
netifaces ~= 0.11.0
netaddr ~= 0.8.0
pyyaml ~= 6.0.1
psutil ~= 5.9.5
prometheus-client ~= 0.17.1

# NOTE: newer versions of cryptography require a Rust compiler to build,
# see
# * https://github.com/openstack-charmers/zaza/issues/421
# * https://mail.python.org/pipermail/cryptography-dev/2021-January/001003.html
#
cryptography<3.4

git+https://github.com/juju/charm-tools.git

simplejson
# for lib/charms/grafana_agent/v0/cos_agent.py
cosl
pydantic < 2
202
src/LICENSE
202
src/LICENSE
@ -1,202 +0,0 @@

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!) The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
425
src/README.md
425
src/README.md
@ -1,425 +0,0 @@
# Overview

Magpie is a charm used for testing the networking of a Juju provider/substrate.
Simply deploy the Magpie charm to at least two units and watch the status messages
and debug logs.

Magpie will test:

- DNS functionality
- Local hostname lookup
- ICMP between peers
- MTU between leader and clients
- Transfer speed between leader and clients

Note: **MTU and transfer speed are tested with iperf2**

Status messages will show the unit numbers that have issues - if there are
no problems, there will not be a verbose status message.

All strings, queries, and actions are logged in the Juju logs.

# MTU Notes

The MTU size reported by iperf is sometimes 8 or 12 bytes less than the configured
MTU on the interface. This is because TCP options are not included in the measurement,
so we ignore that difference and report everything OK.
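The tolerance described above amounts to a check like the following (a simplified sketch, not the charm's actual code):

```python
# Simplified sketch of the MTU comparison described above: treat a
# shortfall of up to 12 bytes (TCP options) as a pass.
TCP_OPTIONS_SLACK = 12  # bytes; iperf may under-report by 8 or 12

def mtu_ok(configured_mtu: int, iperf_reported_mtu: int) -> bool:
    return 0 <= configured_mtu - iperf_reported_mtu <= TCP_OPTIONS_SLACK

assert mtu_ok(9000, 8988)      # 12 bytes short: TCP options, still OK
assert not mtu_ok(9000, 1500)  # genuinely mismatched MTU
```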
# Workload Status

In addition to ICMP and DNS status messages, if a networking problem is
detected, the workload status of the agent which has found the issues
will be set to blocked.

# Reactive States

This layer will set the following states:

- **`magpie-icmp.failed`** ICMP has failed to one or more units in the peer relation.
- **`magpie-dns.failed`** DNS has failed to one or more units in the peer relation.

Note: work stopped on these states as it is currently unlikely magpie will be consumed
as a layer.
Please open an issue against this GitHub repo if more states are required.

# Usage

```
juju deploy magpie -n 2
juju deploy magpie -n 1 --to lxd:1
```

This charm supports several config values for tuning behaviour.
Please refer to ./src/config.yaml or run `juju config magpie`.

Example of adjusting config:

```
juju config magpie dns_server=8.8.8.8 required_mtu=9000 min_speed=1000
```

## Network spaces

If you use network spaces in your Juju deployment (as you should) use
`--bind '<space-name> magpie=<space-name>'` to force magpie to test that
particular network space.

It is possible to deploy several magpie charms
(as different Juju applications) to the same server, each in a different
network space.

Example:

```
juju deploy -m magpie cs:~openstack-charmers/magpie magpie-space1 --bind "space1 magpie=space1" -n 5 --to 0,2,1,4,3
juju deploy -m magpie cs:~openstack-charmers/magpie magpie-space2 --bind "space2 magpie=space2" -n 3 --to 3,2,0
juju deploy -m magpie cs:~openstack-charmers/magpie magpie-space3 --bind "space3 magpie=space3" -n 4 --to 3,2,1,0
juju deploy -m magpie cs:~openstack-charmers/magpie magpie-space4 --bind "space4 magpie=space4" -n 4 --to 3,2,1,0
```

## Bonded links testing and troubleshooting

Network bonding enables the combination of two or more network interfaces into a single bonded
(logical) interface, which increases bandwidth and provides redundancy. While Magpie does some
sanity checks and could reveal some configuration problems, this part of the README contains some
advanced troubleshooting information which might be useful while identifying and fixing an issue.

There are six bonding modes:

### `balance-rr`

Round-robin policy: Transmit packets in sequential order from the first available slave through the
last. This mode provides load balancing and fault tolerance.

### `active-backup`

Active-backup policy: Only one slave in the bond is active. A different slave becomes active if, and
only if, the active slave fails. The bond's MAC address is externally visible on only one port
(network adapter) to avoid confusing the switch. This mode provides fault tolerance. The primary
option affects the behavior of this mode.

### `balance-xor`

XOR policy: Transmit based on a selectable hashing algorithm. The default policy is a simple
source+destination MAC address algorithm. Alternate transmit policies may be selected via the
`xmit_hash_policy` option, described below. This mode provides load balancing and fault tolerance.

### `broadcast`

Broadcast policy: transmits everything on all slave interfaces. This mode provides fault tolerance.

### `802.3ad` (LACP)

Link Aggregation Control Protocol (IEEE 802.3ad LACP) is a control protocol that automatically
detects multiple links between two LACP-enabled devices and configures them to use their maximum
possible bandwidth by automatically trunking the links together. This mode has a prerequisite -
the switch(es) ports should have LACP configured and enabled.

### `balance-tlb`

Adaptive transmit load balancing: channel bonding that does not require any special switch support.
The outgoing traffic is distributed according to the current load (computed relative to the speed)
on each slave. Incoming traffic is received by the current slave. If the receiving slave fails,
another slave takes over the MAC address of the failed receiving slave.

### `balance-alb`

Adaptive load balancing: includes balance-tlb plus receive load balancing (rlb) for IPv4 traffic,
and does not require any special switch support. The receive load balancing is achieved by ARP
negotiation.

The most commonly used modes are `active-backup` and `802.3ad` (LACP), and while active-backup
does not require any third-party configuration, it has its own cons - for example, it can't multiply
the total bandwidth of the link, while an 802.3ad-based bond could utilize all bond members, therefore
multiplying the bandwidth. However, in order to get a fully working LACP link, an appropriate
configuration has to be done both on the actor (link initiator) and partner (switch) side. Any
misconfiguration could lead to link loss or instability, therefore it's very important to have
correct settings applied to both sides of the link.

A quick overview of the LACP link status could be obtained by reading the
`/proc/net/bonding/<bond_name>` file.

```
$ sudo cat /proc/net/bonding/bondM
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 82:23:80:a1:a9:d3
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 2
        Actor Key: 15
        Partner Key: 201
        Partner Mac Address: 02:01:00:00:01:01

Slave Interface: eno3
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 3c:ec:ef:19:eb:30
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 82:23:80:a1:a9:d3
    port key: 15
    port priority: 255
    port number: 1
    port state: 63
details partner lacp pdu:
    system priority: 65534
    system mac address: 02:01:00:00:01:01
    oper key: 201
    port priority: 1
    port number: 12
    port state: 63

Slave Interface: eno1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 3c:ec:ef:19:eb:2e
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 82:23:80:a1:a9:d3
    port key: 15
    port priority: 255
    port number: 2
    port state: 63
details partner lacp pdu:
    system priority: 65534
    system mac address: 02:01:00:00:01:01
    oper key: 201
    port priority: 1
    port number: 1012
    port state: 63
```

The key things an operator should take a look at are:

- LACP rate
- Actor Churn State
- Partner Churn State
- Port State

### LACP rate

The Link Aggregation Control Protocol (LACP) provides a standardized means for exchanging
information between Partner Systems on a link to allow their Link Aggregation Control instances to
reach agreement on the identity of the LAG to which the link belongs, move the link to that LAG, and
enable its transmission and reception functions in an orderly manner. The protocol depends upon the
transmission of information and state, rather than the transmission of commands. LACPDUs (LACP Data
Unit) sent by the first party (the Actor) convey to the second party (the Actor's protocol Partner)
what the Actor knows, both about its own state and that of the Partner.

Periodic transmission of LACPDUs occurs if the LACP Activity control of either the Actor or the
Partner is Active LACP. These periodic transmissions will occur at either a slow or fast
transmission rate depending upon the expressed LACP_Timeout preference (Long Timeout or Short
Timeout) of the Partner System.

### Actor/Partner Churn State

In general, a "Churned" port status means that the parties are unable to reach agreement upon the
desired state of a link. Under normal operation of the protocol, such a resolution would be reached
very rapidly; continued failure to reach agreement can be symptomatic of component failure, of the
presence of non-standard devices on the link concerned, or of mis-configuration. Hence, detection of
such failures is signalled by the Churn Detection algorithm to the operator in order to prompt
administrative action to further resolution.

### Port State

Both the Actor and Partner states are variables, encoded as individual bits within a single octet,
as follows.

0) LACP_Activity: Device intends to transmit periodically in order to find potential
members for the aggregate. Active LACP is encoded as a 1; Passive LACP as a 0.
1) LACP_Timeout: This flag indicates the Timeout control value with regard to this link. Short
Timeout is encoded as a 1; Long Timeout as a 0.
2) Aggregability: This flag indicates that the system considers this link to be Aggregateable; i.e.,
a potential candidate for aggregation. If FALSE (encoded as a 0), the link is considered to be
Individual; i.e., this link can be operated only as an individual link. Aggregatable is encoded as a
1; Individual is encoded as a 0.
3) Synchronization: Indicates that the bond on the transmitting machine is in sync with what's being
advertised in the LACP frames, meaning the link has been allocated to the correct LAG, the group has
been associated with a compatible Aggregator, and the identity of the LAG is consistent with the
System ID and operational Key information transmitted. "In Sync" is encoded as a 1; "Out of sync" is
encoded as a 0.
4) Collecting: Bond is accepting traffic received on this port; collection of incoming frames on
this link is definitely enabled and is not expected to be disabled in the absence of administrative
changes or changes in received protocol information. True is encoded as a 1; False is encoded as a
0.
5) Distributing: Bond is sending traffic using this port. Same as above, but for egress
traffic. True is encoded as a 1; False is encoded as a 0.
6) Defaulted: Determines whether the receiving bond is using default (administratively defined)
parameters, or the information was received in an LACP PDU. Default settings are encoded as a 1,
LACP PDU is encoded as 0.
7) Expired: Is the bond in the expired state. Yes encoded as a 1, No encoded as a 0.

In the example output above, both of the port states are equal to 63. Let's decode:

```
$ python3
Python 3.8.4 (default, Jul 17 2020, 15:44:37)
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> bin(63)
'0b111111'
```

Reading right to left:

LACP Activity: Active
LACP Timeout: Short
Aggregability: Link is Aggregatable
Synchronization: Link in sync
Collecting: True - bond is accepting the traffic
Distributing: True - bond is sending the traffic
Defaulted: Info received from LACP PDU
Expired: False - link is not expired

The above status represents the **fully healthy bond** without any LACP-related issues. Also, for
the operators' convenience, the [lacp_decoder.py](src/tools/lacp_decoder.py) script could be used to
quickly convert the status to some human-friendly format.
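If the script is not at hand, the same decoding can be done inline. A minimal sketch of the bit layout listed above (the `decode_port_state` helper is illustrative, not part of the charm):

```python
# Decode an LACP port state octet into the eight flags described above
# (bit 0 is LACP_Activity, bit 7 is Expired).
FLAGS = [
    "LACP_Activity", "LACP_Timeout", "Aggregability", "Synchronization",
    "Collecting", "Distributing", "Defaulted", "Expired",
]

def decode_port_state(state: int) -> dict:
    return {name: bool(state & (1 << bit)) for bit, name in enumerate(FLAGS)}

print(decode_port_state(63))
# {'LACP_Activity': True, 'LACP_Timeout': True, 'Aggregability': True,
#  'Synchronization': True, 'Collecting': True, 'Distributing': True,
#  'Defaulted': False, 'Expired': False}
```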
However, situations where one of the links is misconfigured happen too, so let's assume
we have the following:

```
$ sudo cat /proc/net/bonding/bondm
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: b4:96:91:6d:20:fc
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 1
        Actor Key: 9
        Partner Key: 32784
        Partner Mac Address: 00:23:04:ee:be:66

Slave Interface: enp197s0f2
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b4:96:91:6d:20:fe
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: churned
Partner Churn State: none
Actor Churned Count: 1
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: b4:96:91:6d:20:fc
    port key: 7
    port priority: 255
    port number: 1
    port state: 7
details partner lacp pdu:
    system priority: 32667
    system mac address: 00:23:04:ee:be:66
    oper key: 32784
    port priority: 32768
    port number: 16661
    port state: 13

Slave Interface: enp197s0f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b4:96:91:6d:20:fc
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: b4:96:91:6d:20:fc
    port key: 9
    port priority: 255
    port number: 2
    port state: 63
details partner lacp pdu:
    system priority: 32667
    system mac address: 00:23:04:ee:be:66
    oper key: 32784
    port priority: 32768
    port number: 277
    port state: 63
```

As we can see, one of the links has different port states for the partner and actor, while the second
one has 63 for both - meaning the first one is problematic, and we'd need to dig further into this
problem.

Let's decode both of the statuses using the mentioned script:

```
$ python ./lacp-decoder.py 7 13
(Equal for both ports) LACP Activity: Active LACP
LACP Timeout: Short (Port 1) / Long (Port 2)
(Equal for both ports) Aggregability: Aggregatable
Synchronization: Link out of sync (Port 1) / Link in sync (Port 2)
(Equal for both ports) Collecting: Ingress traffic: Rejecting
(Equal for both ports) Distributing: Egress traffic: Not sending
(Equal for both ports) Is Defaulted: Settings are received from LACP PDU
(Equal for both ports) Link Expiration: No
```

The above output means that there are two differences between these statuses: LACP Timeout and
Synchronization. That means two things:

1) the Partner side (a switch side in most cases) has an incorrectly configured LACP timeout
control. To resolve this, an operator has to either change the LACP rate from the Actor (e.g. a
server) side to "Slow", or adjust the Partner (e.g. switch) LACP rate to "Fast".
2) the Partner side considers this physical link a part of a different link aggregation group. The
switch config has to be revisited and the link aggregation group members need to be verified again,
ensuring there are no extra or wrong links configured as part of the single LAG.

After addressing the above issues, the port state will change to 63, which means "LACP link is fully
functional".

# Bugs

Please report bugs on [Launchpad](https://bugs.launchpad.net/charm-magpie/+filebug).

For general questions please refer to the OpenStack [Charm Guide](https://docs.openstack.org/charm-guide/latest/).
@ -1,43 +0,0 @@
listen:
  description: |
    Instruct unit to listen
  properties:
    network-cidr:
      type: string
      description: Network CIDR to use for iperf
    listener-count:
      type: integer
      description: Number of listeners to start
advertise:
  description: |
    Advertise addresses
run-iperf:
  description: |
    Run iperf
  properties:
    network-cidr:
      type: string
      description: Network CIDR to use for iperf
    units:
      type: string
      description: Space separated list of units
    iperf-batch-time:
      type: integer
      default: 10
      description: |
        Maps to the iperf -t option, time in seconds to transmit traffic
    concurrency-progression:
      type: [integer, string]
      default: "2 4 8"
      description: |
        Space separated list of concurrency values, one for each batch
    total-run-time:
      type: integer
      default: 600
      description: |
        Total run time for the iperf test in seconds
    tag:
      type: string
      default: default
      description: |
        Tag to use when publishing metrics
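For a sense of how `total-run-time`, `iperf-batch-time` and `concurrency-progression` combine in `batch_hostcheck` (used by `actions.py` below), here is a rough sketch of the batching loop. This is assumed behaviour for illustration, not the actual implementation:

```python
# Sketch: cycle through the concurrency progression, transmitting for
# `batch_time` seconds per batch, until `total_run_time` is spent.
import itertools
import time

def run_batches(total_run_time: int, batch_time: int, progression: list) -> None:
    deadline = time.monotonic() + total_run_time
    for concurrency in itertools.cycle(progression):
        if time.monotonic() >= deadline:
            break
        # Stand-in for one iperf batch, roughly `iperf -t <batch_time> -P <concurrency>`
        print(f"running iperf for {batch_time}s with {concurrency} parallel streams")
        time.sleep(batch_time)

# With the action defaults (600, 10, "2 4 8") this would cycle 2, 4, 8, 2, ...
run_batches(total_run_time=30, batch_time=10, progression=[2, 4, 8])
```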
@ -1,89 +0,0 @@
#!/usr/local/sbin/charm-env python3

# Copyright 2020 Canonical Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import os
import sys

# Load modules from $CHARM_DIR/lib
sys.path.append('lib')

from charms.layer import basic
basic.bootstrap_charm_deps()
basic.init_config_states()

import charms.reactive as reactive
import charmhelpers.core.hookenv as hookenv
from charms.layer.magpie_tools import Iperf

IPERF_BASE_PORT = 5001


def listen(*args):
    action_config = hookenv.action_get()
    cidr = action_config.get('network-cidr')
    listener_count = action_config.get('listener-count') or 1
    magpie = reactive.relations.endpoint_from_flag('magpie.joined')
    iperf = Iperf()
    for port in range(IPERF_BASE_PORT, IPERF_BASE_PORT + int(listener_count)):
        iperf.listen(cidr=cidr, port=port)
    magpie.set_iperf_server_ready()
    reactive.set_state('iperf.listening')


def advertise(*args):
    magpie = reactive.relations.endpoint_from_flag('magpie.joined')
    magpie.advertise_addresses()


def run_iperf(*args):
    action_config = hookenv.action_get()
    cidr = action_config.get('network-cidr')
    units = action_config.get('units', '').split()
    magpie = reactive.relations.endpoint_from_flag('magpie.joined')
    nodes = {ip: name
             for name, ip in magpie.get_nodes(cidr=cidr)
             if not units or name in units}
    iperf = Iperf()
    results = iperf.batch_hostcheck(
        nodes,
        action_config.get('total-run-time'),
        action_config.get('iperf-batch-time'),
        [int(i) for i in str(
            action_config.get('concurrency-progression')
        ).split()],
        tag=action_config.get('tag'))
    hookenv.action_set({
        "output": json.dumps(results)})


# Actions to function mapping, to allow for illegal python action names that
# can map to a python function.
ACTIONS = {
    "listen": listen,
    "advertise": advertise,
    "run-iperf": run_iperf,
}


def main(args):
    action_name = os.path.basename(args[0])
    action = ACTIONS[action_name]
    action(args)


if __name__ == "__main__":
    sys.exit(main(sys.argv))
@ -1 +0,0 @@
actions.py
@ -1 +0,0 @@
actions.py
@ -1 +0,0 @@
actions.py
156
src/charm.py
Executable file
156
src/charm.py
Executable file
@ -0,0 +1,156 @@
#!/usr/bin/env python3
# Copyright 2023 Ubuntu
# See LICENSE file for licensing details.
#
# Learn more at: https://juju.is/docs/sdk

"""Charm for Magpie."""

import json
import logging
import os
from typing import Dict, List

import ops
from charms.grafana_agent.v0.cos_agent import COSAgentProvider
from magpie_tools import (
    CollectDataConfig,
    DnsConfig,
    HostWithIp,
    Iperf,
    PingConfig,
    check_dns,
    check_ping,
    collect_local_data,
    configure_lldpd,
)
from ops.model import ActiveStatus

logger = logging.getLogger(__name__)


class MagpieCharm(ops.CharmBase):
    """Charm the service."""

    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(self.on.iperf_action, self._on_iperf_action)
        self.framework.observe(self.on.info_action, self._on_info_action)
        self.framework.observe(self.on.ping_action, self._on_ping_action)
        self.framework.observe(self.on.dns_action, self._on_dns_action)
        self.framework.observe(self.on.update_status, self._on_update_status)
        self.framework.observe(self.on.install, self._on_install)
        self.framework.observe(self.on.start, self._on_start)
        self.framework.observe(self.on.magpie_relation_changed, self._on_peers_changed)
        self.framework.observe(self.on.config_changed, self._on_config_changed)

        self._grafana_agent = COSAgentProvider(
            self,
            metrics_endpoints=[
                {"path": "/metrics", "port": 80},
            ],
            dashboard_dirs=["./src/grafana_dashboards"],
        )

    def _on_install(self, event):
        os.system("apt update")
        os.system("apt install -y iperf")

    def _on_start(self, event):
        iperf = Iperf(self.model.name, self.app.name, self.model.unit.name, with_prometheus=False)
        cidr: str = self.config.get("iperf_listen_cidr")  # type: ignore
        fallback_bind_address: str = str(self.model.get_binding("magpie").network.bind_address)  # type: ignore
        iperf.listen(cidr, fallback_bind_address)
        self._on_update_status(event)
        configure_lldpd()

    def _on_config_changed(self, event):
        pass

    def _on_peers_changed(self, event):
        self._on_update_status(event)

    def _get_peer_units(self) -> Dict[ops.model.Unit, dict]:  # unit -> unit relation data
        units = {}
        for relation in self.model.relations["magpie"]:
            for unit in relation.units:  # Set[Unit]
                units[unit] = relation.data[unit]
        return units

    def _on_update_status(self, event):
        n_peers = len(self._get_peer_units())
        self.unit.status = ActiveStatus(f'Ready, with {n_peers} peer{"s" if n_peers != 1 else ""}')

    def _on_iperf_action(self, event):
        total_run_time = event.params["total-run-time"]
        batch_time = event.params["batch-time"]
        concurrency_progression = [int(i) for i in event.params["concurrency-progression"].split()]
        filter_units = event.params["units"].split()
        min_speed = event.params["min-speed"]
        with_prometheus = len(self.model.relations["cos-agent"]) > 0

        units = []
        for host_with_ip in self._get_peer_addresses():
            if not filter_units or host_with_ip.name in filter_units:
                units.append(host_with_ip)

        iperf = Iperf(self.model.name, self.app.name, self.model.unit.name, with_prometheus)
        results = iperf.batch_hostcheck(
            units,
            total_run_time,
            batch_time,
            concurrency_progression,
            min_speed,
        )
        data = json.dumps(results, indent=2)
        event.set_results({"output": data})

    def _on_info_action(self, event):
        local_ip: str = str(self.model.get_binding("magpie").network.ingress_addresses[0])  # type: ignore
        data = json.dumps(
            collect_local_data(
                CollectDataConfig(
                    required_mtu=event.params["required-mtu"],
                    bonds_to_check=event.params["bonds-to-check"],
                    lacp_passive_mode=event.params["lacp-passive-mode"],
                    local_ip=local_ip,
                )
            ),
            indent=2,
        )
        event.set_results({"output": data})

    def _get_peer_addresses(self) -> List[HostWithIp]:
        addresses = []
        for unit, data in self._get_peer_units().items():
            ip = data.get("ingress-address")
            if ip:
                addresses.append(HostWithIp(name=unit.name, ip=ip))
        return addresses

    def _on_ping_action(self, event):
        data: Dict[str, str] = check_ping(
            self._get_peer_addresses(),
            PingConfig(
                timeout=event.params["timeout"],
                tries=event.params["tries"],
                interval=event.params["interval"],
                required_mtu=event.params["required-mtu"],
            ),
        )
        event.set_results({"output": json.dumps(data, indent=2)})

    def _on_dns_action(self, event):
        data = check_dns(
            self._get_peer_addresses(),
            DnsConfig(
                server=event.params["server"],
                tries=event.params["tries"],
                timeout=event.params["timeout"],
            ),
        )
        event.set_results({"output": json.dumps(data, indent=2)})


if __name__ == "__main__":  # pragma: nocover
    ops.main(MagpieCharm)  # type: ignore
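Each action above returns its payload JSON-encoded under a single `output` results key (see the `event.set_results({"output": ...})` calls). A small sketch of how a caller might decode it; the `parse_action_output` helper is hypothetical, not part of the charm:

```python
# Sketch: round-trip the "output" results key that every magpie action sets.
import json

def parse_action_output(results: dict) -> dict:
    """Decode the 'output' key set via event.set_results() above."""
    return json.loads(results["output"])

# e.g. the per-unit results map returned by `juju run magpie/0 dns --format=json`
# contains {"output": "..."}, which this helper turns back into a dict.
print(parse_action_output({"output": json.dumps({"magpie/1": "ok"})}))
```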
@ -1,96 +0,0 @@
options:
  check_bonds:
    default: AUTO
    description: Comma separated list of expected bonds or AUTO to check all available bonds.
    type: string
  use_lldp:
    default: false
    description: Enable LLDP agent and collect data
    type: boolean
  check_port_description:
    default: false
    description: Check LLDP port description to match hostname
    type: boolean
  check_iperf:
    default: true
    description: Execute iperf network performance test
    type: boolean
  check_dns:
    default: true
    description: Check if peers are resolvable
    type: boolean
  check_local_hostname:
    default: true
    description: Check if local hostname is resolvable
    type: boolean
  dns_server:
    default: ''
    description: Use unit default DNS server
    type: string
  dns_tries:
    default: 1
    description: Number of DNS resolution attempts per query
    type: int
  dns_time:
    default: 5
    description: Timeout in seconds per DNS query try
    type: int
  lacp_passive_mode:
    default: false
    description: Set to true if switches are in LACP passive mode.
    type: boolean
  ping_timeout:
    default: 2
    description: Timeout in seconds per ICMP request
    type: int
  ping_tries:
    default: 20
    description: Number of ICMP packets per ping
    type: int
  ping_interval:
    default: 0.05
    description: Number of seconds to wait between sending each packet
    type: float
  ping_mesh_mode:
    default: true
    description: |
      If true: each unit will ping each other unit.
      If false: only the leader unit will ping each other unit.
    type: boolean
  supress_status:
    default: False
    description: Enable this if you intend to consume this layer - suppresses status messages
    type: boolean
  required_mtu:
    default: 0
    description: Desired MTU for all nodes - block if the unit MTU is different (accounting for encapsulation). 0 disables.
    type: int
  min_speed:
    default: '0'
    description: |
      Minimum transfer speed in integer mbit/s required to pass the test. 0 disables.

      This can also be set to an integer percentage value (eg. '80%'),
      which will be interpreted as a percentage of the link speed.
      Useful in mixed link speed environments.
      Likewise, '0%' disables.
    type: string
  iperf_duration:
    default: 1
    description: |
      Time in seconds to run iperf to test the transfer speed. Larger
      value can be set to mitigate the impact of CPU power saving
      features especially on faster links such as 50G.
    type: int
  source:
    default: distro
    type: string
    description: |
      Repository to add to unit before installing any dependencies.

      May be one of the following:

        distro (default)
        ppa:somecustom/ppa (PPA name must include UCA OpenStack Release name)
        deb url sources entry|key id
        or a supported Ubuntu Cloud Archive pocket.
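The `min_speed` semantics described above (an absolute mbit/s floor, or a percentage of the link speed) can be sketched as follows; this is an illustrative helper, not the charm's actual parser:

```python
# Sketch of the min_speed semantics: a plain integer is an absolute
# mbit/s floor, while 'N%' is relative to the link speed.
def min_speed_mbits(min_speed: str, link_speed_mbits: int) -> float:
    if min_speed.endswith("%"):
        return link_speed_mbits * int(min_speed[:-1]) / 100
    return float(min_speed)

assert min_speed_mbits("1000", 10000) == 1000.0  # absolute floor
assert min_speed_mbits("80%", 10000) == 8000.0   # 80% of a 10G link
assert min_speed_mbits("0", 10000) == 0.0        # disabled
```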
457
src/grafana_dashboards/magpie_benchmarking.json
Normal file
457
src/grafana_dashboards/magpie_benchmarking.json
Normal file
@ -0,0 +1,457 @@
|
||||
{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": 1,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "datasource": "${prometheusds}",
      "fieldConfig": {
        "defaults": {
          "color": {"mode": "palette-classic"},
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {"legend": false, "tooltip": false, "viz": false},
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {"type": "linear"},
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {"group": "A", "mode": "none"},
            "thresholdsStyle": {"mode": "off"}
          },
          "mappings": [],
          "min": 0,
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {"color": "green", "value": null},
              {"color": "red", "value": 80}
            ]
          },
          "unit": "bps"
        },
        "overrides": []
      },
      "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
      "id": 2,
      "options": {
        "legend": {
          "calcs": ["mean", "lastNotNull", "max", "min"],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {"mode": "multi", "sort": "none"}
      },
      "pluginVersion": "9.2.1",
      "targets": [
        {
          "expr": "sum(avg_over_time(magpie_iperf_bandwidth[30s]))",
          "interval": "",
          "legendFormat": "bandwidth",
          "queryType": "randomWalk",
          "refId": "A"
        }
      ],
      "title": "iperf client bandwidth (total)",
      "type": "timeseries"
    },
    {
      "datasource": "${prometheusds}",
      "fieldConfig": {
        "defaults": {
          "color": {"mode": "palette-classic"},
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 0,
            "gradientMode": "none",
            "hideFrom": {"legend": false, "tooltip": false, "viz": false},
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {"type": "linear"},
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {"group": "A", "mode": "none"},
            "thresholdsStyle": {"mode": "off"}
          },
          "mappings": [],
          "min": 0,
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {"color": "green", "value": null},
              {"color": "red", "value": 80}
            ]
          },
          "unit": "bps"
        },
        "overrides": []
      },
      "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
      "id": 4,
      "options": {
        "legend": {"calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true},
        "tooltip": {"mode": "multi", "sort": "none"}
      },
      "pluginVersion": "9.2.1",
      "targets": [
        {
          "editorMode": "code",
          "expr": "avg_over_time(magpie_iperf_bandwidth[600s])",
          "interval": "",
          "legendFormat": "{{src}} -> {{dest}}",
          "queryType": "randomWalk",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "iperf bandwidth (unit)",
      "type": "timeseries"
    },
    {
      "datasource": "${prometheusds}",
      "fieldConfig": {
        "defaults": {
          "color": {"mode": "palette-classic"},
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {"legend": false, "tooltip": false, "viz": false},
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {"type": "linear"},
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {"group": "A", "mode": "normal"},
            "thresholdsStyle": {"mode": "off"}
          },
          "links": [],
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {"color": "green", "value": null},
              {"color": "red", "value": 80}
            ]
          },
          "unit": "Bps"
        },
        "overrides": [
          {
            "matcher": {"id": "byRegexp", "options": "/In .*/"},
            "properties": [
              {"id": "color", "value": {"fixedColor": "#629E51", "mode": "fixed"}}
            ]
          },
          {
            "matcher": {"id": "byRegexp", "options": "/Out .*/"},
            "properties": [
              {"id": "color", "value": {"fixedColor": "#1F78C1", "mode": "fixed"}},
              {"id": "custom.fillOpacity", "value": 0},
              {"id": "custom.lineWidth", "value": 2}
            ]
          }
        ]
      },
      "gridPos": {"h": 8, "w": 12, "x": 0, "y": 8},
      "id": 6,
      "links": [],
      "options": {
        "legend": {
          "calcs": ["mean", "lastNotNull", "max", "min"],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {"mode": "multi", "sort": "none"}
      },
      "pluginVersion": "9.2.1",
      "targets": [
        {
          "expr": "sum(irate(node_network_receive_bytes_total[5m]))",
          "format": "time_series",
          "hide": false,
          "instant": false,
          "interval": "",
          "intervalFactor": 2,
          "legendFormat": "received",
          "metric": "net_by",
          "refId": "A",
          "step": 4
        },
        {
          "expr": "sum(irate(node_network_transmit_bytes_total[5m]))",
          "format": "time_series",
          "hide": false,
          "interval": "",
          "intervalFactor": 2,
          "legendFormat": "sent",
          "refId": "B",
          "step": 4
        }
      ],
      "title": "Network throughput",
      "type": "timeseries"
    },
    {
      "datasource": "${prometheusds}",
      "description": "",
      "fieldConfig": {
        "defaults": {
          "color": {"mode": "palette-classic"},
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 0,
            "gradientMode": "none",
            "hideFrom": {"legend": false, "tooltip": false, "viz": false},
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {"type": "linear"},
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {"group": "A", "mode": "none"},
            "thresholdsStyle": {"mode": "off"}
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {"color": "green", "value": null},
              {"color": "red", "value": 80}
            ]
          }
        },
        "overrides": []
      },
      "gridPos": {"h": 8, "w": 12, "x": 12, "y": 8},
      "id": 8,
      "options": {
        "legend": {"calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true},
        "tooltip": {"mode": "single", "sort": "none"}
      },
      "targets": [
        {
          "editorMode": "code",
          "expr": "magpie_iperf_concurrency",
          "legendFormat": "{{src}} -> {{dest}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Concurrency",
      "type": "timeseries"
    }
  ],
  "refresh": "10s",
  "schemaVersion": 37,
  "style": "dark",
  "tags": [],
  "templating": {"list": []},
  "time": {"from": "now-30m", "to": "now"},
  "timepicker": {},
  "timezone": "",
  "title": "Magpie Network Benchmarking",
  "uid": "YzR4rgBGz",
  "version": 17,
  "weekStart": ""
}
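The dashboard keys off two charm-exported series, `magpie_iperf_bandwidth` and `magpie_iperf_concurrency`, labelled by `src` and `dest` (see the `{{src}} -> {{dest}}` legends). The actual exporter lives in `src/magpie_tools.py`, whose diff is suppressed below, so the following is only a rough sketch of how such series could be published with `prometheus_client`; the `Gauge` setup and helper are assumptions, while the metric names, labels, and port 8088 come from this commit (the old reactive charm advertised 8088 as its metrics port):

```python
# Sketch only: publishing the magpie_iperf_* series that the dashboard graphs.
from prometheus_client import Gauge, start_http_server

# Label names match the dashboard legends: "{{src}} -> {{dest}}".
iperf_bandwidth = Gauge(
    "magpie_iperf_bandwidth", "iperf bandwidth in bits/s", ["src", "dest"]
)
iperf_concurrency = Gauge(
    "magpie_iperf_concurrency", "parallel iperf streams", ["src", "dest"]
)


def publish_result(src: str, dest: str, bits_per_second: int, concurrency: int):
    """Record one iperf run so Prometheus can scrape it."""
    iperf_bandwidth.labels(src=src, dest=dest).set(bits_per_second)
    iperf_concurrency.labels(src=src, dest=dest).set(concurrency)


start_http_server(8088)  # metrics port the old reactive charm advertised
```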
@@ -1,11 +0,0 @@
repo: git@github.com:openstack-charmers/magpie-layer.git
includes: [
  'layer:basic',
  'interface:magpie',
  'layer:leadership',
  'interface:http'
]
options:
  basic:
    use_venv: True
    include_system_packages: False
File diff suppressed because it is too large
1045 src/magpie_tools.py Normal file
File diff suppressed because it is too large
@@ -1,18 +0,0 @@
name: magpie
summary: Magpie layer to test networking - ICMP and DNS
maintainer: Andrew McLeod <andrew.mcleod@canonical.com>
description: |
  Magpie will check ICMP, DNS, MTU and rx/tx speed between itself and any
  peer units deployed - deploy more than one magpie unit for meaningful results.
tags: [testing, CI]
provides:
  prometheus-target:
    interface: http
peers:
  magpie:
    interface: magpie
series:
  - focal
  - jammy
  - lunar
  - mantic
@@ -1,136 +0,0 @@
# Copyright 2020 Canonical Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# pylint: disable=unused-argument
from charms.reactive import when, when_not, set_state, remove_state
from charmhelpers.core import hookenv
from charms.layer.magpie_tools import check_nodes, safe_status, Iperf, Lldp

import charmhelpers.contrib.openstack.utils as os_utils
import charmhelpers.fetch as fetch


def _set_states(check_result):
    if 'fail' in check_result['icmp']:
        set_state('magpie-icmp.failed')
    else:
        remove_state('magpie-icmp.failed')
    if 'fail' in check_result['dns']:
        set_state('magpie-dns.failed')
    else:
        remove_state('magpie-dns.failed')


@when_not('charm.installed')
def install():
    """Configure APT source.

    The many permutations of package source syntaxes in use does not allow us
    to simply call `add-apt-repository` on the unit and we need to make use
    of `charmhelpers.fetch.add_source` for this to be universally useful.
    """
    source, key = os_utils.get_source_and_pgp_key(
        hookenv.config().get('source', 'distro'))
    fetch.add_source(source, key)
    fetch.apt_update(fatal=True)
    # The ``magpie`` charm is used as principle for functional tests with some
    # subordinate charms. Install the ``openstack-release`` package when
    # available to allow the functional test code to determine installed UCA
    # versions.
    fetch.apt_install(fetch.filter_installed_packages(['openstack-release']),
                      fatal=False, quiet=True)
    fetch.apt_install(fetch.filter_installed_packages(['iperf']),
                      fatal=True, quiet=True)
    set_state('charm.installed')


@when('charm.installed')
@when_not('lldp.installed')
def install_lldp_pkg():
    if hookenv.config().get('use_lldp'):
        lldp = Lldp()
        lldp.install()
        lldp.enable()
        set_state('lldp.installed')


@when_not('magpie.joined')
def no_peers():
    safe_status('waiting', 'Waiting for peers...')


@when('magpie.joined')
@when_not('leadership.is_leader', 'iperf.checked')
def check_check_state(magpie):
    '''
    Servers should only update their status after iperf has checked them
    '''
    if magpie.get_iperf_checked():
        for units in magpie.get_iperf_checked():
            if units and hookenv.local_unit() in units:
                set_state('iperf.checked')


@when('magpie.joined', 'leadership.is_leader')
@when_not('iperf.servers.ready')
def leader_wait_servers_ready(magpie):
    '''
    Don't do any iperf checks until the servers are listening
    '''
    nodes = sorted(magpie.get_nodes())
    iperf_ready_nodes = sorted(magpie.check_ready_iperf_servers())
    if nodes == iperf_ready_nodes:
        set_state('iperf.servers.ready')
    else:
        remove_state('iperf.servers.ready')


@when('magpie.joined')
@when_not('leadership.is_leader', 'iperf.listening')
def listen_for_checks(magpie):
    '''
    If im not the leader, and im not listening, then listen
    '''
    iperf = Iperf()
    iperf.listen()
    magpie.set_iperf_server_ready()
    set_state('iperf.listening')


@when('iperf.servers.ready', 'magpie.joined', 'leadership.is_leader')
def client_check_hosts(magpie):
    '''
    Once the iperf servers are listening, do the checks
    '''
    nodes = magpie.get_nodes()
    _set_states(check_nodes(nodes, is_leader=True))
    magpie.set_iperf_checked()


@when('magpie.joined', 'iperf.checked')
@when_not('leadership.is_leader')
def check_all_node(magpie):
    '''
    Now that the iperf checks have been done, we can update our status
    '''
    nodes = magpie.get_nodes()
    _set_states(check_nodes(nodes))


@when('prometheus-target.available')
def advertise_metric_port(target):
    '''
    Advertise prometheus metric port used during action execution
    '''
    target.configure(port="8088")
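For contrast with the removed reactive handlers above: the replacement `src/charm.py` is suppressed in this diff, but the new unit tests (added below) pin down its install behaviour. A minimal sketch of what the operator-framework handler could look like under those assumptions; the class name `MagpieCharm` and the two `os.system` calls come from the tests, everything else is illustrative:

```python
# Sketch of an operator-framework install handler; not the literal src/charm.py.
import os

from ops.charm import CharmBase, InstallEvent
from ops.main import main


class MagpieCharm(CharmBase):
    """Skeleton of the new charm's install handling (illustrative only)."""

    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(self.on.install, self._on_install)

    def _on_install(self, event: InstallEvent) -> None:
        # tests/unit/test_charm.py asserts exactly these two calls, in order.
        os.system("apt update")
        os.system("apt install -y iperf")


if __name__ == "__main__":
    main(MagpieCharm)
```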
@@ -1,9 +0,0 @@
# This file is managed centrally by release-tools and should not be modified
# within individual charm repos. See the 'global' dir contents for available
# choices of *requirements.txt files for OpenStack Charms:
#     https://github.com/openstack-charmers/release-tools
#

# Functional Test Requirements (let Zaza's dependencies solve all dependencies here!)
git+https://github.com/openstack-charmers/zaza.git#egg=zaza
git+https://github.com/openstack-charmers/zaza-openstack-tests.git#egg=zaza.openstack
@@ -1,7 +0,0 @@
local_overlay_enabled: False

series: focal
applications:
  magpie:
    num_units: 3
    charm: ../../../magpie_ubuntu-20.04-amd64.charm
@@ -1,7 +0,0 @@
local_overlay_enabled: False

series: jammy
applications:
  magpie:
    num_units: 3
    charm: ../../../magpie_ubuntu-22.04-amd64.charm
@@ -1,7 +0,0 @@
local_overlay_enabled: False

series: lunar
applications:
  magpie:
    num_units: 3
    charm: ../../../magpie_ubuntu-23.04-amd64.charm
@@ -1,7 +0,0 @@
local_overlay_enabled: False

series: mantic
applications:
  magpie:
    num_units: 3
    charm: ../../../magpie_ubuntu-23.10-amd64.charm
@@ -1,24 +0,0 @@
charm_name: magpie

gate_bundles:
  - focal
  - jammy

dev_bundles:
  - lunar
  - mantic

smoke_bundles:
  - jammy

target_deploy_status:
  magpie:
    workload-status-message-prefix: "icmp ok"

tests:
  - zaza.openstack.charm_tests.magpie.tests.MagpieTest

tests_options:
  force_deploy:
    - lunar
    - mantic
51 src/tools/lacp_decoder.py Executable file → Normal file
@@ -13,51 +13,45 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tool to decode and help debug LACP port states.

See README.md for more information.
"""

import argparse


def status_decoder(status):
    """Extract the bits from the status integer into a list we can work with easier."""
    decoded_status = [(status >> bit) & 1 for bit in range(8 - 1, -1, -1)]
    decoded_status.reverse()
    return decoded_status


def main(args):
    """Run the application."""
    try:
        port_state = int(args.port_state)
    except (TypeError, ValueError):
        raise Exception('port_state has to be integer')
        raise Exception("port_state has to be integer")

    if args.second_port_state:
        try:
            second_port_state = int(args.second_port_state)
        except (TypeError, ValueError):
            raise Exception('second_port_state has to be integer')
            raise Exception("second_port_state has to be integer")
    else:
        second_port_state = None

    states = {
        0: {
            "name": "LACP Activity",
            1: "Active LACP",
            0: "Passive LACP"
        },
        1: {
            "name": "LACP Timeout",
            1: "Short",
            0: "Long"
        },
        0: {"name": "LACP Activity", 1: "Active LACP", 0: "Passive LACP"},
        1: {"name": "LACP Timeout", 1: "Short", 0: "Long"},
        2: {
            "name": "Aggregability",
            1: "Aggregatable",
            0: "Individual",
        },
        3: {
            "name": "Synchronization",
            1: "Link in sync",
            0: "Link out of sync"
        },
        3: {"name": "Synchronization", 1: "Link in sync", 0: "Link out of sync"},
        4: {
            "name": "Collecting",
            1: "Ingress traffic: Accepting",
@@ -66,37 +60,30 @@ def main(args):
        5: {
            "name": "Distributing",
            1: "Egress traffic: Sending",
            0: "Egress trafic: Not sending"
            0: "Egress traffic: Not sending",
        },
        6: {
            "name": "Is Defaulted",
            1: "Defaulted settings",
            0: "Settings are received from LACP PDU"
            0: "Settings are received from LACP PDU",
        },
        7: {
            "name": "Link Expiration",
            1: "Yes",
            0: "No"
        }

        7: {"name": "Link Expiration", 1: "Yes", 0: "No"},
    }
    status = status_decoder(port_state)

    for i, entry in enumerate(status):
        status_string = "{0}: {1}".format(states[i]['name'], states[i][entry])
        status_string = "{0}: {1}".format(states[i]["name"], states[i][entry])
        if second_port_state:
            second_status = status_decoder(second_port_state)
            if entry == second_status[i]:
                status_string = "(Equal for both ports) {0}".format(
                    status_string)
                status_string = "(Equal for both ports) {0}".format(status_string)
            else:
                status_string += " (Port 1) / {0} (Port 2)".format(
                    states[i][second_status[i]])
                status_string += " (Port 1) / {0} (Port 2)".format(states[i][second_status[i]])
        print(status_string)


if __name__ == '__main__':
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("port_state")
    parser.add_argument("second_port_state", nargs='?', default=None)
    parser.add_argument("second_port_state", nargs="?", default=None)
    main(parser.parse_args())
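The hunk above pairs each pre-formatting line with its black-reformatted replacement; behaviour is unchanged. As a worked example of the decoder itself: port state 61 is the healthy state the old unit tests call `LACP_STATE_SLOW_ACTIVE`, and re-running `status_decoder` standalone shows why:

```python
# Standalone re-run of status_decoder for port state 61 (0b00111101).
def status_decoder(status):
    decoded_status = [(status >> bit) & 1 for bit in range(8 - 1, -1, -1)]
    decoded_status.reverse()
    return decoded_status

# Bits are listed LSB-first after the reverse():
assert status_decoder(61) == [1, 0, 1, 1, 1, 1, 0, 0]
# -> Active LACP, Long timeout, Aggregatable, Link in sync,
#    Ingress: Accepting, Egress: Sending, settings from LACP PDU, not expired.
```

From the CLI the same decode is `./src/tools/lacp_decoder.py 61`, with an optional second port state to compare two ports side by side.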
55 src/tox.ini
@@ -1,55 +0,0 @@
# Source charm (with zaza): ./src/tox.ini
# This file is managed centrally by release-tools and should not be modified
# within individual charm repos. See the 'global' dir contents for available
# choices of tox.ini for OpenStack Charms:
#     https://github.com/openstack-charmers/release-tools

[tox]
envlist = pep8
# NOTE: Avoid build/test env pollution by not enabling sitepackages.
sitepackages = False
# NOTE: Avoid false positives by not skipping missing interpreters.
skip_missing_interpreters = False

[testenv]
# We use tox mainly for virtual environment management for test requirements
# and do not install the charm code as a Python package into that environment.
# Ref: https://tox.wiki/en/latest/config.html#skip_install
skip_install = True
setenv = VIRTUAL_ENV={envdir}
         PYTHONHASHSEED=0
allowlist_externals = juju
passenv =
  HOME
  TERM
  CS_*
  OS_*
  TEST_*
deps = -r{toxinidir}/test-requirements.txt

[testenv:pep8]
basepython = python3
commands = charm-proof

[testenv:func-noop]
basepython = python3
commands =
    functest-run-suite --help

[testenv:func]
basepython = python3
commands =
    functest-run-suite --keep-model

[testenv:func-smoke]
basepython = python3
commands =
    functest-run-suite --keep-model --smoke

[testenv:func-target]
basepython = python3
commands =
    functest-run-suite --keep-model --bundle {posargs}

[testenv:venv]
commands = {posargs}
@@ -1,8 +0,0 @@
# charmhelpers.contrib.openstack.utils pulls in a dep that require this
netifaces
prometheus_client
psutil

git+https://github.com/openstack/charms.openstack.git#egg=charms.openstack

git+https://github.com/juju/charm-helpers.git#egg=charmhelpers
@@ -1,37 +1,13 @@
# This file is managed centrally by release-tools and should not be modified
# within individual charm repos. See the 'global' dir contents for available
# choices of *requirements.txt files for OpenStack Charms:
#     https://github.com/openstack-charmers/release-tools
#
pyparsing<3.0.0  # aodhclient is pinned in zaza and needs pyparsing < 3.0.0, but cffi also needs it, so pin here.
# static analysis
black
ruff
codespell
pyright

stestr>=2.2.0
# unit tests
pytest
coverage[toml]

# Dependency of stestr. Workaround for
# https://github.com/mtreinish/stestr/issues/145
cliff<3.0.0

requests>=2.18.4
charms.reactive

mock>=1.2

nose>=1.3.7
coverage>=3.6
git+https://github.com/openstack/charms.openstack.git#egg=charms.openstack
#
# Revisit for removal / mock improvement:
#
# NOTE(lourot): newer versions of cryptography require a Rust compiler to build,
# see
# * https://github.com/openstack-charmers/zaza/issues/421
# * https://mail.python.org/pipermail/cryptography-dev/2021-January/001003.html
#
netifaces         # vault
psycopg2-binary   # vault
tenacity          # vault
pbr==5.6.0        # vault
cryptography<3.4  # vault, keystone-saml-mellon
lxml              # keystone-saml-mellon
hvac              # vault, barbican-vault
psutil            # cinder-lvm
# integration tests
juju
pytest-operator
34 tests/integration/test_charm.py Normal file
@@ -0,0 +1,34 @@
#!/usr/bin/env python3
# Copyright 2023 Ubuntu
# See LICENSE file for licensing details.

import asyncio
import logging
from pathlib import Path

import pytest
import yaml
from pytest_operator.plugin import OpsTest

logger = logging.getLogger(__name__)

METADATA = yaml.safe_load(Path("./metadata.yaml").read_text())
APP_NAME = METADATA["name"]


@pytest.mark.abort_on_fail
async def test_build_and_deploy(ops_test: OpsTest):
    """Build the charm-under-test and deploy it together with related charms.

    Assert on the unit status before any relations/configurations take place.
    """
    # Build and deploy charm from local source folder
    charm = await ops_test.build_charm(".")

    # Deploy the charm and wait for active/idle status
    await asyncio.gather(
        ops_test.model.deploy(charm, application_name=APP_NAME),
        ops_test.model.wait_for_idle(
            apps=[APP_NAME], status="active", raise_on_blocked=True, timeout=1000
        ),
    )
57 tests/unit/test_charm.py Normal file
@@ -0,0 +1,57 @@
# Copyright 2023 Ubuntu
# See LICENSE file for licensing details.
#
# Learn more about testing at: https://juju.is/docs/sdk/testing

from unittest.mock import Mock, call

import ops
import ops.testing
import pytest
from charm import MagpieCharm
from magpie_tools import status_for_speed_check


@pytest.fixture
def harness():
    harness = ops.testing.Harness(MagpieCharm)
    harness.begin()
    yield harness
    harness.cleanup()


@pytest.fixture
def os_system_mock(monkeypatch):
    mock = Mock()
    monkeypatch.setattr("charm.os.system", mock)
    return mock


def test_example(harness, os_system_mock):
    harness.charm.on.install.emit()
    assert os_system_mock.call_count == 2
    os_system_mock.assert_has_calls([call("apt update"), call("apt install -y iperf")])


def test_status_for_speed_check():
    assert status_for_speed_check("0", 123, 150) == {"message": "min-speed disabled", "ok": True}
    assert status_for_speed_check("0%", 123, 150) == {"message": "min-speed disabled", "ok": True}
    assert status_for_speed_check(":P", 123, 150) == {
        "message": "invalid min_speed: :P",
        "ok": False,
    }
    assert status_for_speed_check("1", 10, 400) == {"message": "10 >= 1 mbit/s", "ok": True}
    assert status_for_speed_check("12", 10, 400) == {
        "message": "failed: 10 < 12 mbit/s",
        "ok": False,
    }
    assert status_for_speed_check("50%", 100, 400) == {
        "message": "failed: 100 < 200 mbit/s",
        "ok": False,
    }
    assert status_for_speed_check("50%", 200, 400) == {"message": "200 >= 200 mbit/s", "ok": True}
    assert status_for_speed_check("50%", 300, 400) == {"message": "300 >= 200 mbit/s", "ok": True}
    assert status_for_speed_check("50%", 300, -1) == {
        "message": "unknown, link speed undefined",
        "ok": False,
    }
153 tox.ini
@@ -1,110 +1,95 @@
# Source charm: ./tox.ini
# This file is managed centrally by release-tools and should not be modified
# within individual charm repos. See the 'global' dir contents for available
# choices of tox.ini for OpenStack Charms:
#     https://github.com/openstack-charmers/release-tools
# Copyright 2023 Ubuntu
# See LICENSE file for licensing details.

[tox]
envlist = pep8,py3
# NOTE: Avoid build/test env pollution by not enabling sitepackages.
sitepackages = False
# NOTE: Avoid false positives by not skipping missing interpreters.
skip_missing_interpreters = False
no_package = True
skip_missing_interpreters = True
env_list = pep8, cover
min_version = 4.0.0

[vars]
src_path = {tox_root}/src
tests_path = {tox_root}/tests
;lib_path = {tox_root}/lib/charms/operator_name_with_underscores
all_path = {[vars]src_path} {[vars]tests_path}

[testenv]
# We use tox mainly for virtual environment management for test requirements
# and do not install the charm code as a Python package into that environment.
# Ref: https://tox.wiki/en/latest/config.html#skip_install
skip_install = True
setenv = VIRTUAL_ENV={envdir}
         PYTHONHASHSEED=0
         TERM=linux
         CHARM_LAYERS_DIR={toxinidir}/layers
         CHARM_INTERFACES_DIR={toxinidir}/interfaces
         JUJU_REPOSITORY={toxinidir}/build
passenv =
  no_proxy
  http_proxy
  https_proxy
  CHARM_INTERFACES_DIR
  CHARM_LAYERS_DIR
  JUJU_REPOSITORY
allowlist_externals =
  charmcraft
  bash
  tox
set_env =
    PYTHONPATH = {tox_root}/lib:{[vars]src_path}
    PYTHONBREAKPOINT=pdb.set_trace
    PY_COLORS=1
pass_env =
    PYTHONPATH
    CHARM_BUILD_DIR
    MODEL_SETTINGS
deps =
  -r{toxinidir}/requirements.txt
    -r {tox_root}/requirements.txt
    -r {tox_root}/test-requirements.txt

[testenv:build]
basepython = python3
# charmcraft clean is done to ensure that
# `tox -e build` always performs a clean, repeatable build.
# For faster rebuilds during development,
# directly run `charmcraft -v pack && ./rename.sh`.
deps =
allowlist_externals =
  charmcraft
commands =
  charmcraft clean
  charmcraft -v pack
  charmcraft clean

[testenv:build-reactive]
basepython = python3
[testenv:format]
description = Apply coding style standards to code
commands =
  charm-build --log-level DEBUG --use-lock-file-branches --binary-wheels-from-source -o {toxinidir}/build/builds src {posargs}
    black {[vars]all_path}
    ruff check --fix {[vars]all_path}

[testenv:add-build-lock-file]
basepython = python3
[testenv:pep8]
description = Code style and other linting
commands =
  charm-build --log-level DEBUG --write-lock-file -o {toxinidir}/build/builds src {posargs}
    codespell {tox_root}
    ruff check {[vars]all_path}
    black --check --diff {[vars]all_path}

[testenv:static]
description = Static typing analysis
commands =
    pyright {[vars]all_path}

[testenv:py3]
basepython = python3
deps = -r{toxinidir}/test-requirements.txt
commands = stestr run --slowest {posargs}
description = Run unit tests
commands =
    pytest --tb native -v -s {posargs} {[vars]tests_path}/unit

[testenv:py39]
basepython = python3.9
description = Run unit tests
commands =
    pytest --tb native -v -s {posargs} {[vars]tests_path}/unit

[testenv:py310]
basepython = python3.10
deps = -r{toxinidir}/test-requirements.txt
commands = stestr run --slowest {posargs}
description = Run unit tests
commands =
    pytest --tb native -v -s {posargs} {[vars]tests_path}/unit

[testenv:pep8]
basepython = python3
deps = flake8==3.9.2
       git+https://github.com/juju/charm-tools.git
commands = flake8 {posargs} src unit_tests
[testenv:py311]
basepython = python3.11
description = Run unit tests
commands =
    pytest --tb native -v -s {posargs} {[vars]tests_path}/unit

[testenv:py312]
basepython = python3.12
description = Run unit tests
commands =
    pytest --tb native -v -s {posargs} {[vars]tests_path}/unit

[testenv:cover]
# Technique based heavily upon
# https://github.com/openstack/nova/blob/master/tox.ini
basepython = python3
deps = -r{toxinidir}/requirements.txt
       -r{toxinidir}/test-requirements.txt
setenv =
    {[testenv]setenv}
    PYTHON=coverage run
description = Run unit tests
commands =
    coverage erase
    stestr run --slowest {posargs}
    coverage combine
    coverage html -d cover
    coverage xml -o cover/coverage.xml
    coverage run --source={[vars]src_path},{[vars]tests_path} -m pytest --tb native -v -s {posargs} {[vars]tests_path}/unit
    coverage report
    coverage html --directory cover

[coverage:run]
branch = True
concurrency = multiprocessing
parallel = True
source =
    .
omit =
    .tox/*
    */charmhelpers/*
    unit_tests/*

[testenv:venv]
basepython = python3
commands = {posargs}

[flake8]
# E402 ignore necessary for path append before sys module import in actions
ignore = E402,W503,W504
[testenv:integration]
description = Run integration tests
commands =
    pytest -v -s --tb native --log-cli-level=INFO {posargs} {[vars]tests_path}/integration
@@ -1,28 +0,0 @@
# Copyright 2016 Canonical Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import mock
import sys

sys.path.append('src')
sys.path.append('src/lib')

# Mock out charmhelpers so that we can test without it.
import charms_openstack.test_mocks  # noqa
charms_openstack.test_mocks.mock_charmhelpers()

psutil_mock = mock.MagicMock()
sys.modules['psutil'] = psutil_mock
prometheus_client_mock = mock.MagicMock()
sys.modules['prometheus_client'] = prometheus_client_mock
@@ -1,513 +0,0 @@
from unittest.mock import (
    patch,
    mock_open,
    MagicMock,
)

import lib.charms.layer.magpie_tools as magpie_tools
from unit_tests.test_utils import patch_open, CharmTestCase, async_test
import netifaces

LACP_STATE_SLOW_ACTIVE = '61'
LACP_STATE_FAST_ACTIVE = '63'
LACP_STATE_SLOW_PASSIVE = '60'


def mocked_open_lacp_port_state(actor, partner):
    def the_actual_mock(path):
        if path == "/sys/class/net/test/bonding_slave/ad_actor_oper_port_state":
            return mock_open(read_data=actor)(path)
        elif path == "/sys/class/net/test/bonding_slave/ad_partner_oper_port_state":
            return mock_open(read_data=partner)(path)
    return the_actual_mock


class TestMagpieTools(CharmTestCase):

    def setUp(self):
        super(TestMagpieTools, self).setUp()
        self.obj = self.tools = magpie_tools
        self.patches = ['hookenv']
        self.patch_all()
        self.maxDiff = None

    def test_safe_status(self):
        self.hookenv.config.return_value = {'supress_status': False}
        self.tools.safe_status('active', 'awesome')
        self.hookenv.status_set.assert_called_once_with('active', 'awesome')
        self.hookenv.status_set.reset_mock()
        self.hookenv.config.return_value = {'supress_status': True}
        self.tools.safe_status('active', 'awesome')
        self.assertFalse(self.hookenv.status_set.called)

    def test_status_for_speed_check(self):
        self.assertEqual(
            magpie_tools.status_for_speed_check('0', 123, 150),
            ', 123 mbit/s')
        self.assertEqual(
            magpie_tools.status_for_speed_check('0%', 123, 150),
            ', 123 mbit/s')
        self.assertEqual(
            magpie_tools.status_for_speed_check(':P', 123, 150),
            ", invalid min_speed: ':P'")
        self.assertEqual(
            magpie_tools.status_for_speed_check('1', 10, 400),
            ', speed ok: 10 mbit/s')
        self.assertEqual(
            magpie_tools.status_for_speed_check('12', 10, 400),
            ', speed failed: 10 < 12 mbit/s')
        self.assertEqual(
            magpie_tools.status_for_speed_check('50%', 100, 400),
            ', speed failed: 100 < 200 mbit/s')
        self.assertEqual(
            magpie_tools.status_for_speed_check('50%', 200, 400),
            ', speed ok: 200 mbit/s')
        self.assertEqual(
            magpie_tools.status_for_speed_check('50%', 300, 400),
            ', speed ok: 300 mbit/s')
        self.assertEqual(
            magpie_tools.status_for_speed_check('50%', 300, -1),
            ', speed failed: link speed undefined')

    @patch('lib.charms.layer.magpie_tools.open',
           mock_open(read_data=LACP_STATE_SLOW_ACTIVE))
    def test_check_lacp_port_state_match_default(self):
        self.hookenv.config.return_value = {}
        self.assertIsNone(magpie_tools.check_lacp_port_state('test'))

    @patch('lib.charms.layer.magpie_tools.open',
           mock_open(read_data=LACP_STATE_SLOW_ACTIVE))
    def test_check_lacp_port_state_match_explicit_active(self):
        self.hookenv.config.return_value = {'lacp_passive_mode': False}
        self.assertIsNone(magpie_tools.check_lacp_port_state('test'))

    @patch('lib.charms.layer.magpie_tools.open',
           mock_open(read_data=LACP_STATE_SLOW_ACTIVE))
    def test_check_lacp_port_state_match_passive(self):
        self.hookenv.config.return_value = {'lacp_passive_mode': True}
        self.assertIsNone(magpie_tools.check_lacp_port_state('test'))

    @patch('lib.charms.layer.magpie_tools.open')
    def test_check_lacp_port_state_passive_expected_mismatch(self, open_):
        open_.side_effect = mocked_open_lacp_port_state(
            LACP_STATE_SLOW_ACTIVE, LACP_STATE_SLOW_PASSIVE)
        self.hookenv.config.return_value = {'lacp_passive_mode': True}
        self.assertIsNone(magpie_tools.check_lacp_port_state('test'))

    @patch('lib.charms.layer.magpie_tools.open')
    def test_check_lacp_port_state_passive_default(self, open_):
        open_.side_effect = mocked_open_lacp_port_state(
            LACP_STATE_SLOW_ACTIVE, LACP_STATE_SLOW_PASSIVE)
        self.hookenv.config.return_value = {}
        self.assertEqual(
            magpie_tools.check_lacp_port_state('test'),
            'lacp_port_state_mismatch')

    @patch('lib.charms.layer.magpie_tools.open')
    def test_check_lacp_port_state_passive_configured_active(self, open_):
        open_.side_effect = mocked_open_lacp_port_state(
            LACP_STATE_SLOW_ACTIVE, LACP_STATE_SLOW_PASSIVE)
        self.hookenv.config.return_value = {'lacp_passive_mode': False}
        self.assertEqual(
            magpie_tools.check_lacp_port_state('test'),
            'lacp_port_state_mismatch')

    @patch('lib.charms.layer.magpie_tools.open')
    def test_check_lacp_port_state_passive_unexpected_mismatch(self, open_):
        open_.side_effect = mocked_open_lacp_port_state(
            LACP_STATE_FAST_ACTIVE, LACP_STATE_SLOW_PASSIVE)
        self.hookenv.config.return_value = {'lacp_passive_mode': True}
        self.assertEqual(
            magpie_tools.check_lacp_port_state('test'),
            'lacp_port_state_mismatch')

    def test_get_link_speed(self):
        # Normal operation
        with patch_open() as (mock_open, mock_file):
            mock_file.read.return_value = b'1000'
            self.assertEqual(1000, magpie_tools.get_link_speed('eth0'))
            mock_open.assert_called_once_with('/sys/class/net/eth0/speed')
        # Invalid argument
        with patch_open() as (mock_open, mock_file):
            mock_open.side_effect = OSError()
            self.assertEqual(-1, magpie_tools.get_link_speed('eth0'))

    @async_test
    @patch("lib.charms.layer.magpie_tools.get_iface_mac",
           lambda _: "de:ad:be:ef:01:01")
    @patch("lib.charms.layer.magpie_tools.get_dest_mac",
           lambda _, __: "de:ad:be:ef:02:02")
    @patch("lib.charms.layer.magpie_tools.ch_ip.get_iface_from_addr",
           lambda _: "de:ad:be:ef:03:03")
    @patch("lib.charms.layer.magpie_tools.get_src_ip_from_dest",
           lambda _: "192.168.2.2")
    @patch("lib.charms.layer.magpie_tools.run")
    async def test_run_iperf(self, mock_run):

        async def mocked_run(cmd):
            return """
19700101000000,192.168.2.2,60266,192.168.2.1,5001,2,0.0-10.1,95158332,75301087
19700101000000,192.168.2.2,60268,192.168.2.1,5001,1,0.0-10.1,61742908,27989222
"""

        mock_run.side_effect = mocked_run
        result = await magpie_tools.run_iperf(
            "mynode", "192.168.2.1", "10", "2")

        mock_run.assert_called_once_with(
            "iperf -t10 -c 192.168.2.1 --port 5001 -P2 --reportstyle c")
        self.assertEqual(result, {
            "GBytes_transferred": 0.146,
            "Mbits_per_second": 98,
            "bits_per_second": 103290309,
            "concurrency": "2",
            "dest_ip": "192.168.2.1",
            "dest_node": "mynode",
            "dest_port": "5001",
            "session": [2, 1],
            "src_ip": "192.168.2.2",
            "src_port": [60266, 60268],
            "time_interval": "0.0-10.1",
            "timestamp": "19700101000000",
            "transferred_bytes": 156901240,
            "src_mac": "de:ad:be:ef:01:01",
            "dest_mac": "de:ad:be:ef:02:02",
            "src_interface": "de:ad:be:ef:03:03",
        })

    @patch('netifaces.AF_LINK', 17)
    @patch.object(netifaces, 'ifaddresses')
    @patch.object(netifaces, 'interfaces')
    def test_get_iface_mac(self, mock_interfaces, mock_addresses):
        mock_interfaces.return_value = [
            'lo', 'enp0s31f6', 'eth0', 'bond0', 'br0']
        mock_addresses.return_value = {
            17: [{'addr': 'c8:5b:76:80:86:01'}],
            2: [{'addr': '192.168.123.45', 'netmask': '255.255.255.0'}],
        }

        # with interface listed by netifaces
        self.assertEqual(
            magpie_tools.get_iface_mac('bond0'), 'c8:5b:76:80:86:01')
        # with unknown interface
        self.assertEqual(
            '', magpie_tools.get_iface_mac('wronginterface0'))

    @patch('subprocess.PIPE', None)
    @patch('subprocess.run')
    def test_get_dest_mac(self, mock_subprocess):
        mock_stdout = MagicMock()
        mock_stdout.configure_mock(**{
            'stdout.decode.return_value': '[{"dst":"192.168.12.1",'
            '"lladdr":"dc:fb:02:d1:28:18","state":["REACHABLE"]}]'
        })
        mock_subprocess.return_value = mock_stdout
        self.assertEqual(
            magpie_tools.get_dest_mac("eth0", "192.168.12.1"),
            'dc:fb:02:d1:28:18')

    @patch('subprocess.PIPE', None)
    @patch('subprocess.run')
    def test_get_src_ip_from_dest(self, mock_subprocess):
        mock_stdout = MagicMock()
        mock_stdout.configure_mock(**{
            'stdout.decode.return_value': '[{"dst":"192.168.12.1",'
            '"dev":"enp5s0","prefsrc":"192.168.12.15","flags":[],'
            '"uid":1000,"cache":[]}]'
        })
        mock_subprocess.return_value = mock_stdout
        self.assertEqual(
            magpie_tools.get_src_ip_from_dest("192.168.12.1"),
            '192.168.12.15')

    def test_parse_dig_yaml(self):
        output = """
-
  type: MESSAGE
  message:
    response_message_data:
      ANSWER_SECTION:
        - 99.0.0.10.in-addr.arpa. 30 IN PTR example.com.
"""
        result, stderr = magpie_tools.parse_dig_yaml(
            output, "", 1, 30, is_reverse_query=True)
        self.assertEqual(result, 'example.com')
        self.assertEqual(stderr, 0)

    @patch('subprocess.check_output')
    def test_parse_dig_yaml_calls_resolves_cname(self, mock_subprocess):
        output = "-\n  type: MESSAGE\n"
        output += "  message:\n"
        output += "    response_message_data:\n"
        output += "      ANSWER_SECTION:\n"
        output += "        - 99.0.0.10.in-addr.arpa. 30 IN CNAME"
        output += " 99.1-25.0.0.10.in-addr.arpa"

        rev_response = """
-
  type: MESSAGE
  message:
    response_message_data:
      ANSWER_SECTION:
        - 99.0.0.10.in-addr.arpa. 30 IN PTR example.com.
"""
        mock_subprocess.side_effect = [bytes(rev_response, "utf-8")]
        result, stderr = magpie_tools.parse_dig_yaml(
            output, "", 1, 30, is_reverse_query=True)
        self.assertEqual(result, 'example.com')
        self.assertEqual(stderr, 0)

    @patch('subprocess.check_output')
    def test_forward_dns_good(self, mock_subprocess):
        ip = "10.0.0.99"
        unit_id = "magpie/0"
        self.hookenv.config.return_value = {
            "dns_server": "127.0.0.1",
            "dns_tries": "1",
            "dns_time": "3"
        }
        rev_response = """
-
  type: MESSAGE
  message:
    response_message_data:
      ANSWER_SECTION:
        - 99.0.0.10.in-addr.arpa. 30 IN PTR example.com.
"""
        fwd_response = """
-
  type: MESSAGE
  message:
    response_message_data:
      ANSWER_SECTION:
        - example.com. 30 IN A 10.0.0.99
"""
        mock_subprocess.side_effect = [
            bytes(rev_response, "utf-8"),  # for reverse_dns
            bytes(fwd_response, "utf-8")   # for forward_dns
        ]
        norev, nofwd, nomatch = magpie_tools.check_dns([(unit_id, ip)])
        self.assertEqual(
            norev, [], "Reverse lookup failed for IP {}".format(ip))
        self.assertEqual(
            nofwd, [], ("Forward lookup failed for IP {}, "
                        "faked to example.com".format(ip)))
        self.assertEqual(
            nomatch, [], "Reverse and forward lookups didn't match")

    @patch('subprocess.check_output')
    def test_forward_dns_multiple_ips(self, mock_subprocess):
        ip = "10.0.0.99"
        unit_id = "magpie/0"
        self.hookenv.config.return_value = {
            "dns_server": "127.0.0.1",
            "dns_tries": "1",
            "dns_time": "3"
        }
        rev_response = """
-
  type: MESSAGE
  message:
    response_message_data:
      ANSWER_SECTION:
        - 99.0.0.10.in-addr.arpa. 30 IN PTR example.com.
"""
        fwd_response = """
-
  type: MESSAGE
  message:
    response_message_data:
      ANSWER_SECTION:
        - example.com. 30 IN A 10.0.0.99
        - example.com. 30 IN A 10.1.0.99
        - example.com. 30 IN A 10.2.0.99
"""
        mock_subprocess.side_effect = [
            bytes(rev_response, "utf-8"),  # for reverse_dns
            bytes(fwd_response, "utf-8")   # for forward_dns
        ]
        norev, nofwd, nomatch = magpie_tools.check_dns([(unit_id, ip)])
        self.assertEqual(
            norev, [], "Reverse lookup failed for IP {}".format(ip))
        self.assertEqual(
            nofwd, [], ("Forward lookup failed for IP {}, "
                        "faked to example.com".format(ip)))
        self.assertEqual(
            nomatch, [], "Reverse and forward lookups didn't match")
        self.hookenv.log.assert_any_call(
            "Forward result for unit_id: 0, "
            "ip: 10.0.0.99\n10.1.0.99\n10.2.0.99, exitcode: 0")
        self.hookenv.log.assert_any_call(
            "Original IP and Forward MATCH OK for unit_id: 0, "
            "Original: 10.0.0.99, "
            "Forward: ['10.0.0.99', '10.1.0.99', '10.2.0.99']", "INFO")

    @patch('subprocess.check_output')
    def test_cname_dns_is_followed(self, mock_subprocess):
        ip = "10.0.0.99"
        unit_id = "magpie/0"
        self.hookenv.config.return_value = {
            "dns_server": "127.0.0.1",
            "dns_tries": "1",
            "dns_time": "3",
        }
        rev_response = "-\n"
        rev_response += "  type: MESSAGE\n"
        rev_response += "  message:\n"
        rev_response += "    response_message_data:\n"
        rev_response += "      ANSWER_SECTION:\n"
        rev_response += "        - 99.0.0.10.in-addr.arpa. 30 IN CNAME"
        rev_response += " 99.1-25.0.0.10.in-addr.arpa."
        cname_response = """
-
  type: MESSAGE
  message:
    response_message_data:
      ANSWER_SECTION:
        - 99.0-25.0.10.in-addr.arpa. 30 IN PTR example.com.
        - 99.0-25.0.10.in-addr.arpa. 30 IN PTR other.example.com.
"""
        fwd_response_1 = """
-
  type: MESSAGE
  message:
    response_message_data:
      ANSWER_SECTION:
        - example.com. 30 IN A 10.0.0.99
"""
        fwd_response_2 = """
-
  type: MESSAGE
  message:
    response_message_data:
      ANSWER_SECTION:
        - other.example.com. 30 IN A 10.0.0.99
"""
        mock_subprocess.side_effect = [
            bytes(rev_response, "utf-8"),    # for reverse_dns
            bytes(cname_response, "utf-8"),  # for resolve_cname
            bytes(fwd_response_1, "utf-8"),  # for forward_dns
            bytes(fwd_response_2, "utf-8")   # for forward_dns
        ]
        norev, nofwd, nomatch = magpie_tools.check_dns([(unit_id, ip)])
        self.assertEqual(
            norev, [], "Reverse lookup failed for IP {}".format(ip))
        self.assertEqual(
            nofwd, [], ("Forward lookup failed for IP {}, "
                        "faked to example.com".format(ip)))
        self.assertEqual(
            nomatch, [], "Reverse and forward lookups didn't match")
        self.hookenv.log.assert_any_call(
            "Forward result for unit_id: 0, "
            "ip: 10.0.0.99, exitcode: 0")
        self.hookenv.log.assert_any_call(
            "Original IP and Forward MATCH OK for unit_id: 0, "
            "Original: 10.0.0.99, "
            "Forward: ['10.0.0.99']", "INFO")

    @patch('subprocess.check_output')
    def test_check_dns_gracefully_handles_no_answer(self, mock_subprocess):
        ip = "10.0.0.99"
        unit_id = "magpie/0"
        self.hookenv.config.return_value = {
            "dns_server": "127.0.0.1",
            "dns_tries": "1",
            "dns_time": "3"
        }
        rev_response = """
-
  type: MESSAGE
  message:
    response_message_data: {}
"""
        fwd_response = """
-
  type: MESSAGE
  message:
    response_message_data: {}
"""
        mock_subprocess.side_effect = [
            bytes(rev_response, "utf-8"),  # for reverse_dns
            bytes(fwd_response, "utf-8")   # for forward_dns
        ]
        norev, nofwd, nomatch = magpie_tools.check_dns([(unit_id, ip)])
        self.assertEqual(
            norev, ['0'], "Reverse lookup had an answer for {}".format(ip))
        self.assertEqual(
            nofwd, [], ("Forward lookup failed for IP {}, "
                        "faked to example.com".format(ip)))
        self.assertEqual(
            nomatch, [], "Reverse and forward lookups didn't match")
@@ -1,89 +0,0 @@
import asyncio
import contextlib
import io
import mock
import unittest
import unittest.mock


@contextlib.contextmanager
def patch_open():
    '''Patch open() to allow mocking both open() itself and the file that is
    yielded.
    Yields the mock for "open" and "file", respectively.'''
    mock_open = mock.MagicMock(spec=open)
    mock_file = mock.MagicMock(spec=io.FileIO)

    @contextlib.contextmanager
    def stub_open(*args, **kwargs):
        mock_open(*args, **kwargs)
        yield mock_file

    with mock.patch('builtins.open', stub_open):
        yield mock_open, mock_file


def async_test(f):
    """
    A decorator to test async functions within a synchronous environment.

    see https://stackoverflow.com/questions/23033939/
    """
    def wrapper(*args, **kwargs):
        coro = asyncio.coroutine(f)
        future = coro(*args, **kwargs)
        loop = asyncio.get_event_loop()
        loop.run_until_complete(future)
    return wrapper


class CharmTestCase(unittest.TestCase):

    def setUp(self):
        self._patches = {}
        self._patches_start = {}

    def tearDown(self):
        for k, v in self._patches.items():
            v.stop()
            setattr(self, k, None)
        self._patches = None
        self._patches_start = None

    def _patch(self, method):
        _m = unittest.mock.patch.object(self.obj, method)
        mock = _m.start()
        self.addCleanup(_m.stop)
        return mock

    def patch_all(self):
        for method in self.patches:
            setattr(self, method, self._patch(method))

    def patch_object(self, obj, attr, return_value=None, name=None, new=None,
                     **kwargs):
        if name is None:
            name = attr
        if new is not None:
            mocked = mock.patch.object(obj, attr, new=new, **kwargs)
        else:
            mocked = mock.patch.object(obj, attr, **kwargs)
        self._patches[name] = mocked
        started = mocked.start()
        if new is None:
            started.return_value = return_value
        self._patches_start[name] = started
        setattr(self, name, started)

    def patch(self, item, return_value=None, name=None, new=None, **kwargs):
        if name is None:
            raise RuntimeError("Must pass 'name' to .patch()")
        if new is not None:
            mocked = mock.patch(item, new=new, **kwargs)
        else:
            mocked = mock.patch(item, **kwargs)
        self._patches[name] = mocked
        started = mocked.start()
        if new is None:
            started.return_value = return_value
        self._patches_start[name] = started
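One more reason this scaffolding had to go: the removed `async_test` helper builds on `asyncio.coroutine`, which was deprecated in Python 3.8 and removed entirely in 3.11, so it no longer runs on current interpreters. A modern equivalent (a sketch, not part of this commit) would simply wrap `asyncio.run`:

```python
# Sketch of a 3.11-compatible replacement for the removed async_test helper.
import asyncio
import functools


def async_test(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        # asyncio.run creates and tears down an event loop per test call.
        return asyncio.run(f(*args, **kwargs))
    return wrapper
```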