Add SNMP v3 Support to StarlingX

This story also containerizes the StarlingX SNMP solution,
introducing a new snmp armada application, and changes the
configuration of SNMP from sysinv CLI/RESTAPIs to helm chart
overrides.

Story: 2008132
Task: 40856

Change-Id: I8dd3a8b7bf43ef0bf87df480cf79d9542d2a4d95
Signed-off-by: Gustavo Dobro <gustavo.dobro@windriver.com>
Gustavo Dobro 2020-09-23 00:09:20 -03:00
parent 5b3e10cb1c
commit acedefa12a
1 changed files with 333 additions and 0 deletions

@ -0,0 +1,333 @@
..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License. https://creativecommons.org/licenses/by/3.0/legalcode

==============
SNMPv3 Support
==============

Storyboard: https://storyboard.openstack.org/#!/story/2008132
This story upgrades Net-SNMP in the StarlingX solution to version 5.8 in order
to support both SNMP v2c and v3, and delivers Net-SNMP as a containerized
solution.
Problem description
===================
Users want the ability to manage the StarlingX solution with SNMP v2c and v3,
but StarlingX currently does not support SNMPv3. Infrastructure management
shall meet the following requirements:
* support for both SNMPv2c and SNMPv3,
* read-only access for all v2c communities and all v3 users,
* support for SNMP GET, GETNEXT, GETBULK, SNMPv2c traps and SNMPv3 traps; note
  that there is NO support for SNMPv3 informs,
* all v2c communities and v3 users have access to the entire OID tree, with no
  support for configuring custom views (VACM),
* support for the basic standard MIB for SNMP entities, limited to the
  following:

  * System Group, .iso.org.dod.internet.mgmt.mib-2.system,
  * SNMP Group, .iso.org.dod.internet.mgmt.mib-2.snmp,
  * coldStart and warmStart traps,

* support for the following Enterprise Registration and Alarm MIBs:

  * https://opendev.org/starlingx/fault/src/branch/r/stx.4.0/snmp-ext/sources/mibs/wrsEnterpriseReg.mib.txt
  * https://opendev.org/starlingx/fault/src/branch/r/stx.4.0/snmp-ext/sources/mibs/wrsAlarmMib.mib.txt

* SNMPv3 security levels supported: noAuthNoPriv, authNoPriv, authPriv,

  * MD5 for auth and DES for priv, as supported by Net-SNMP,

* NO support for SNMP SET.
Net-SNMP, an open source project, provides features that cover all of the
requirements listed above. More information is available at
http://www.Net-SNMP.org/docs/readmefiles.html.
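As a rough illustration of the standard-MIB scope listed in the requirements, the sketch below maps the symbolic System and SNMP group names to their numeric OIDs and checks which group a requested OID falls under. The helper names (``parse_oid``, ``standard_group``) are hypothetical and are not part of Net-SNMP or StarlingX; only the numeric OID values come from the standard MIB-2 and SNMPv2-MIB definitions.

```python
# Numeric forms of the standard subtrees named in the requirements.
SYSTEM_GROUP = (1, 3, 6, 1, 2, 1, 1)          # .iso.org.dod.internet.mgmt.mib-2.system
SNMP_GROUP = (1, 3, 6, 1, 2, 1, 11)           # .iso.org.dod.internet.mgmt.mib-2.snmp
COLD_START = (1, 3, 6, 1, 6, 3, 1, 1, 5, 1)   # snmpTraps.coldStart
WARM_START = (1, 3, 6, 1, 6, 3, 1, 1, 5, 2)   # snmpTraps.warmStart


def parse_oid(text):
    """Convert a dotted numeric OID such as '.1.3.6.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip(".").split("."))


def in_subtree(oid, root):
    """Return True when oid equals root or lies underneath it."""
    return oid[:len(root)] == root


def standard_group(oid_text):
    """Hypothetical helper: name the standard group an OID belongs to, if any."""
    oid = parse_oid(oid_text)
    if in_subtree(oid, SYSTEM_GROUP):
        return "system"
    if in_subtree(oid, SNMP_GROUP):
        return "snmp"
    return None
```

Comparing integer tuples instead of raw strings matters here: as text, ``1.3.6.1.2.1.11`` starts with ``1.3.6.1.2.1.1``, so a naive string-prefix check would wrongly place the SNMP group inside the System group.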
In addition to providing SNMPv3 support, this story will also containerize the
StarlingX SNMP solution. This is consistent with the long-term direction of
StarlingX to containerize more of its flock components.
Use Cases
---------
* End user wants to monitor StarlingX infrastructure alarms and logs via SNMP
  v2c and/or v3 from their SNMP manager.
* End user wants to use SNMP v2c and/or v3 GET/GETNEXT to get the contents of
  the ActiveAlarmTable and the EventLogTable in the wrsAlarmMib.
* End user wants to receive the SNMP v2c and/or v3 traps defined in the
  wrsAlarmMib.
Proposed change
===============
SNMP integration
----------------
The StarlingX platform currently supports SNMP v2c in a non-containerized
solution running on the host of the controller/master nodes. It uses the
dynamically loaded snmpd plugin approach to bind the host-based FM get methods
to the appropriate nodes of the OID tree in the host-based Net-SNMP process. It
uses the snmptrap CLI, invoked from the host-based FM alarm/log collection
code, to generate SNMP traps. Finally, it uses StarlingX REST APIs / CLIs to
configure v2c communities and v2c trap destinations.
The StarlingX SNMP solution will change to use Net-SNMP's extended
MasterAgent/SubAgent integration. This accommodates Net-SNMP being
containerized while the FM application, which supports the wrsAlarmMib, is
either host-based (current) or containerized (future). Specifically, Net-SNMP
will run in a container as the MasterAgent, and a containerized FM-SubAgent
will be implemented to interact with the host-based FM application's postgres
DB tables. The (containerized) FM-SubAgent will internally use the existing
cgtsAgentplugin logic (through fmcommon.so) to bind the existing host-based FM
query methods to the appropriate local OID trees (alarms and events) within the
SubAgent code, and will trigger the SubAgent to register those OID subtrees
with the Net-SNMP MasterAgent.
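The registration step described above can be pictured with a small, self-contained model of a master agent that routes each request to the subagent owning the most specific registered subtree. This is only a conceptual sketch, not Net-SNMP's implementation or API: the ``MasterAgent`` class, the subagent label and the enterprise subtree value are all hypothetical placeholders.

```python
class MasterAgent:
    """Toy model of a master agent that tracks subtree registrations."""

    def __init__(self):
        # Maps a registered subtree root (tuple of ints) to a subagent label.
        self._registrations = {}

    def register(self, root, subagent):
        """Record that a subagent serves every OID under root."""
        self._registrations[tuple(root)] = subagent

    def route(self, oid):
        """Return the subagent with the longest matching subtree, or None."""
        oid = tuple(oid)
        best_root, best_agent = None, None
        for root, subagent in self._registrations.items():
            matches = oid[:len(root)] == root
            if matches and (best_root is None or len(root) > len(best_root)):
                best_root, best_agent = root, subagent
        return best_agent


# Placeholder subtree standing in for the alarm MIB; not the real wrs OID.
ALARM_SUBTREE = (1, 3, 6, 1, 4, 1, 99999, 1)

master = MasterAgent()
master.register((1, 3, 6, 1, 2, 1), "master-builtin")   # standard MIB-2 objects
master.register(ALARM_SUBTREE, "fm-subagent")           # hypothetical label
```

With these registrations, a GET for an OID under ``ALARM_SUBTREE`` is routed to ``fm-subagent``, a GET under MIB-2 stays with the master agent, and anything else is unclaimed.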
A containerized FM-Trap-SubAgent will be implemented to interact with the
host-based FM application's log handling and the Net-SNMP MasterAgent.
Specifically, the host-based FM-Mgr trap handling code will forward the
alarm/log data to the FM-Trap-SubAgent (if configured), and the
FM-Trap-SubAgent will use the Net-SNMP subagent APIs to generate traps and send
them to the Net-SNMP MasterAgent for distribution to the configured trap
destinations.
V2C Communities, V3 users and Trap Destinations will be configured through
override values in the Net-SNMP helm chart, which will be part of the new
Net-SNMP system application. The existing StarlingX REST APIs / CLIs for SNMP
configuration will be removed.
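To make the override-driven configuration more concrete, here is a minimal sketch that turns a dictionary of override values into snmpd.conf directives. The override keys (``communities``, ``users``, ``trap_sinks`` and the per-user fields) are assumptions made for illustration; the actual chart values are not defined in this spec. The directives themselves (``rocommunity``, ``createUser``, ``rouser``, ``trap2sink``) are standard Net-SNMP snmpd.conf directives, and only read-only access, MD5 and DES are emitted, in line with the requirements.

```python
# snmpd.conf 'rouser' takes noauth/auth/priv as its minimum security level.
LEVELS = {"noAuthNoPriv": "noauth", "authNoPriv": "auth", "authPriv": "priv"}


def render_snmpd_conf(overrides):
    """Render snmpd.conf text from a hypothetical helm override dictionary."""
    lines = []
    for community in overrides.get("communities", []):
        # Read-only v2c community with access to the whole OID tree.
        lines.append("rocommunity {}".format(community))
    for user in overrides.get("users", []):
        level = user["level"]
        create = "createUser {}".format(user["name"])
        if level in ("authNoPriv", "authPriv"):
            create += ' MD5 "{}"'.format(user["auth_passphrase"])
        if level == "authPriv":
            create += ' DES "{}"'.format(user["priv_passphrase"])
        lines.append(create)
        # Read-only v3 user at the requested security level.
        lines.append("rouser {} {}".format(user["name"], LEVELS[level]))
    for sink in overrides.get("trap_sinks", []):
        # v2c trap destination; v3 trap sessions are omitted from this sketch.
        lines.append("trap2sink {} {}".format(sink["host"], sink["community"]))
    return "\n".join(lines) + "\n"


example_overrides = {
    "communities": ["public"],
    "users": [
        {"name": "monitor", "level": "authPriv",
         "auth_passphrase": "authpass123", "priv_passphrase": "privpass123"},
        {"name": "guest", "level": "noAuthNoPriv"},
    ],
    "trap_sinks": [{"host": "10.10.10.1", "community": "public"}],
}
rendered = render_snmpd_conf(example_overrides)
```

An unknown security level raises ``KeyError`` here, which is one simple way to reject invalid override data before it reaches the agent.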
The Net-SNMP helm chart will use a Kubernetes deployment with
liveness/readiness probes. Net-SNMP does not support an active/active
deployment, so the Kubernetes deployment will be limited to a single replica
and will rely on Kubernetes dead-host detection times and dead-container
detection times (through the liveness/readiness probes) to restart failed SNMP
containers.
For networking, the nginx-ingress-controller in the platform will be used to
direct ingress traffic from UDP port 161 to the internal Net-SNMP ClusterIP
kubernetes service.
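The ingress-nginx controller can expose UDP ports through a ConfigMap (commonly named ``udp-services``) whose keys are external ports and whose values have the form ``namespace/service:port``. Assuming the platform controller is configured that way, the entry for SNMP could be built as sketched below. The namespace and service name are hypothetical placeholders, not names defined by this spec.

```python
def udp_service_entry(external_port, namespace, service, service_port):
    """Build one data entry for an ingress-nginx UDP services ConfigMap."""
    return {str(external_port): "{}/{}:{}".format(namespace, service, service_port)}


def udp_services_configmap(name, namespace, entries):
    """Assemble a ConfigMap manifest (as a dict) from a list of data entries."""
    data = {}
    for entry in entries:
        data.update(entry)
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": data,
    }


# Placeholder names: 'kube-system' and 'snmpd-service' are assumptions.
snmp_entry = udp_service_entry(161, "kube-system", "snmpd-service", 161)
manifest = udp_services_configmap("udp-services", "kube-system", [snmp_entry])
```

The resulting dictionary can be serialized to YAML or JSON by whatever tooling applies the manifest.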
For Distributed Cloud configurations, the syncing of SNMP trap destination and
community configuration across subclouds will be removed. Each subcloud will
need to be configured for SNMP independently, through the SNMP Helm chart /
Armada application.
Packaging & installation
------------------------
A new optional SNMP system application (Armada manifest and Helm chart)
will be developed. This will include:
* The building of a Net-SNMP MasterAgent container image within StarlingX,
  delivered in the Docker Hub StarlingX repo,
* The building of an FM-SubAgent container image (for handling SNMP GETs, etc.)
  within StarlingX, delivered in the Docker Hub StarlingX repo,
* The building of an FM-Trap-SubAgent container image (for handling SNMP traps)
  within StarlingX, delivered in the Docker Hub StarlingX repo,
* An Armada manifest containing a reference to a single Helm chart for the
  Net-SNMP MasterAgent container, the FM-SubAgent container and the
  FM-Trap-SubAgent container, and
* A Helm chart for the Net-SNMP MasterAgent container, the FM-SubAgent
  container and the FM-Trap-SubAgent container.
The Net-SNMP Armada application tarball will be packaged as an RPM in the
StarlingX ISO such that the application tarball is installed (but not uploaded
or applied) as part of the StarlingX install.
Alternatives
------------
The existing Net-SNMP integration in StarlingX could have been extended to
support SNMPv3, by adding new V3 Users and V3 Trap Destinations to the StarlingX
REST APIs / CLIs. However, given the long-term direction for StarlingX to
containerize its flock components and given that the SNMP solution is
relatively isolated, it was decided to containerize the SNMP solution and
leverage Helm for deployment and configuration of Net-SNMP.
For High Availability, to improve switchover times on failure, we may look at
leveraging Kubernetes leader election to run Net-SNMP active/standby within a
deployment of two replicas.
There are other commercial and open source alternatives to Net-SNMP. However,
Net-SNMP is the SNMP tool installed in the current StarlingX implementation and
has been integrated with StarlingX successfully. It is a mature open source
project with more than 20 years in the market and many releases, and it has an
active user and developer community.
Data model impact
-----------------
The existing StarlingX data model for SNMP configuration will be removed,
specifically the postgres DB tables and sysinv CLI/REST APIs for the SNMP v2c
community table and the SNMP v2c trap destination table. SNMP configuration
will now be done through Helm chart overrides of the Net-SNMP system
application.

Since SNMP support is already provided by Net-SNMP 5.7.2 in StarlingX, there
are no changes to the internal Net-SNMP data model. The changes focus on
containerizing Net-SNMP 5.8 inside the StarlingX solution.

Additionally, since SNMP support will be provided by this new optional Armada
application, SNMP will not be running after a fresh install until the
application is uploaded and applied.
REST API impact
---------------
The following REST APIs for configuring SNMP will be removed:
* https://docs.starlingx.io/api-ref/config/api-ref-sysinv-v1-config.html#snmp-communities
* https://docs.StarlingX.io/api-ref/config/api-ref-sysinv-v1-config.html#snmp-trap-destinations
SNMP configuration will now be done through Helm chart overrides of the
Net-SNMP system application.
Security impact
---------------
Support for SNMPv3 provides improved security over the current SNMPv2c support.
SNMPv3 provides both user/password authentication and encryption of SNMP PDUs,
whereas SNMPv2c provides only a clear-text community-string check and no
encryption.

Net-SNMP is already part of the StarlingX solution. Upgrading the Net-SNMP
version and adding SNMP v3 support does not impact security by exposing a new
API for configuration or usage.
Other end user impact
---------------------
End users gain the ability to optionally use SNMPv3 instead of SNMPv2c for
monitoring StarlingX alarms and logs.
Performance Impact
------------------
The solution containerizes Net-SNMP and modifies the trap-sending code to
support v3 traps in addition to v2c traps. No impact on performance is
expected.
Other deployer impact
---------------------
Configuration of SNMP will be done through Helm Chart overrides as opposed to
StarlingX REST APIs / CLIs.
Developer impact
----------------
This may impact the work currently being done to containerize portions of
FM code. This work is covered by a different Storyboard Story and has yet to be
merged.
Upgrade impact
--------------
The SNMP solution will not cover the upgrade scenario from STX 4.0 (old
StarlingX implementation) to STX 5.0 (new StarlingX implementation). The
rationale is that SNMP is not a system-critical service and the amount of SNMP
configuration that would need to be re-entered is extremely small.

As a result, after a software upgrade from STX 4.0 to STX 5.0, any existing
SNMP configuration from the STX 4.0 deployment will be lost. After finishing
the software upgrade to STX 5.0, the new SNMP Armada application will need to
be installed and the old SNMP configuration re-entered as Helm overrides for
this application.

Software upgrades from STX 5.0 to future releases will be supported with no
configuration loss.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
* Gustavo Dobro (PL)
* Jose Infanzon (TL)
Repos Impacted
--------------
* Net-SNMP-armada-app (new repo)
* config
* config-files
* distcloud
* fault
Work Items
----------
* Create a new repo for the new application 'SNMP',
* Create the SNMP helm chart, containing the Net-SNMP MasterAgent container,
  the FM-SubAgent container and the FM-Trap-SubAgent container,

  * With helm chart override values for configuring Net-SNMP and adding
    additional MIBs,

* Define the required Armada manifest,
* Build the new SNMP Armada tarball and package it in an RPM,
* Build and deliver the Net-SNMP MasterAgent container image,
* Implement a system override plugin for the SNMP Armada application in order
  to determine the FM DB connection values from the current system
  configuration and pass those details to the Net-SNMP MasterAgent container
  through a helm chart override,

  * Only required depending on the number of replicas supported,

* Remove the existing StarlingX REST API and CLI commands related to SNMP
  configuration,
* Implement the FM-SubAgent container image and support for SNMP GET/GETNEXT,
* Implement the FM-Trap-SubAgent container image for generating traps within
  the context of a SubAgent,
* Implement changes to the host-based FM-Mgr's asynchronously generated
  alarm/log handling to send alarm/log data to the FM-Trap-SubAgent, if
  configured,
* Remove the existing host-based Net-SNMP implementation,
* Update the existing documentation.
Dependencies
============
None
Testing
=======
* SNMP pods should return to a ready state after being restarted, as indicated
  by 'kubectl get pods'.
* User overrides should be available for various parameters, including SNMP
  configuration.
* Users should be able to perform SNMP GET/GETBULK/WALK operations with SNMP
  v2c and v3.
* Configure an SNMP trap destination and check that SNMP v2c and v3 traps are
  sent.
* Validate that coldStart and warmStart traps are being sent.
* Validate that all existing StarlingX REST API / CLI commands related to SNMP
  are removed and that the documentation is updated.
* Validate the documentation on configuring SNMP.
* Verify that on StarlingX install, the new SNMP application is installed but
  NOT uploaded and NOT applied.
* Verify system behaviour (e.g. log/alarm handling) with the SNMP application
  NOT applied.
* Verify system behaviour with the SNMP application applied, and with v2c
  communities, v3 users and trap destinations defined.
* Verify system behaviour after removing the SNMP application.
* Test system behaviour when incorrect snmpd.conf data is specified in helm
  chart overrides, and document the procedure for a user to verify that the
  SNMP application applied without error and, if there was an error, how to
  find information about it.
* Test on all system configurations (AIO-SX, AIO-DX, Standard and DC).
* Test controller switchovers (failures and manual) on dual-controller systems.
* Test Dead-Office-Recovery.
* Test that the upgrade from STX 4.0 to STX 5.0 removes the STX 4.0 SNMP
  configuration and that the SNMP Armada application can be installed and
  configured on STX 5.0 after the upgrade.
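For the GET tests with v2c and v3, a test harness might build Net-SNMP ``snmpget`` command lines along the lines of the sketch below. The function names are hypothetical, and the host, user and passphrases are placeholder values. The flags (``-v``, ``-c``, ``-l``, ``-u``, ``-a``, ``-A``, ``-x``, ``-X``) are standard Net-SNMP command-line options, and MD5/DES are used because those are the protocols listed in the requirements. The functions only construct argument lists and do not run anything.

```python
V3_LEVELS = ("noAuthNoPriv", "authNoPriv", "authPriv")


def snmpget_v2c(host, community, oid):
    """Build the argv for an SNMP v2c GET."""
    return ["snmpget", "-v", "2c", "-c", community, host, oid]


def snmpget_v3(host, user, level, oid, auth_pass=None, priv_pass=None):
    """Build the argv for an SNMP v3 GET at the given security level."""
    if level not in V3_LEVELS:
        raise ValueError("unsupported security level: {}".format(level))
    argv = ["snmpget", "-v", "3", "-l", level, "-u", user]
    if level in ("authNoPriv", "authPriv"):
        if auth_pass is None:
            raise ValueError("auth_pass is required for {}".format(level))
        argv += ["-a", "MD5", "-A", auth_pass]
    if level == "authPriv":
        if priv_pass is None:
            raise ValueError("priv_pass is required for authPriv")
        argv += ["-x", "DES", "-X", priv_pass]
    return argv + [host, oid]


# sysUpTime.0 in the System group; host and credentials are placeholders.
SYS_UPTIME = "1.3.6.1.2.1.1.3.0"
v2c_cmd = snmpget_v2c("10.10.10.2", "public", SYS_UPTIME)
v3_cmd = snmpget_v3("10.10.10.2", "monitor", "authPriv", SYS_UPTIME,
                    auth_pass="authpass123", priv_pass="privpass123")
```

The returned lists can be passed to ``subprocess.run`` in a real test; running the same OID at each of the three security levels covers the v3 matrix from the requirements.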
Documentation Impact
====================
Documentation will be updated with the user override configuration parameters
and the availability of SNMP v3 in StarlingX.
References
==========
Feature storyboard: https://storyboard.openstack.org/#!/story/2008132
Net-SNMP: http://www.Net-SNMP.org/
History
=======
.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - STX 5.0
     - Introduced