This story upgrades Net-SNMP to version 5.8 in the StarlingX solution in order to support SNMP v2c and v3, and provides a containerized Net-SNMP solution.
Users want the ability to manage the StarlingX solution with SNMP v2c and v3. StarlingX currently does not support SNMPv3. Infrastructure management shall meet the following requirements:
Net-SNMP's features cover all of the requirements mentioned above. Net-SNMP is an open source project. More information is available at http://www.Net-SNMP.org/docs/readmefiles.html.
In addition to providing SNMPv3 support, this story will also containerize the StarlingX SNMP solution. This is consistent with the long-term direction of StarlingX, which is to containerize more of the StarlingX flock components.
The StarlingX platform currently supports SNMP v2c in a non-containerized solution running on the host of the controller/master nodes. It uses the dynamic-loading snmpd plugin approach to bind the host-based FM get methods to the appropriate nodes of the OID tree in the host-based Net-SNMP process. It uses the snmptrap CLI, invoked from the host-based FM alarm/log collection code, to generate SNMP traps. Finally, it uses StarlingX REST APIs / CLIs to configure V2C Communities and V2 Trap Destinations.
The StarlingX SNMP solution will change to use Net-SNMP's extended MasterAgent/SubAgent integration. This accommodates Net-SNMP being containerized while the FM application, which supports the wrsAlarmMib, is either host-based (current) or containerized (future). Specifically, Net-SNMP will run in a container as the MasterAgent, and a containerized FM-SubAgent will be implemented to interact with the host-based FM application's postgres DB tables. The (containerized) FM-SubAgent will internally use the existing cgtsAgentplugin logic (through fmcommon.so) to bind the existing host-based FM query methods to the appropriate local OID trees (alarms and events) within the SubAgent code, and to trigger the SubAgent to register those OID subtrees with the Net-SNMP MasterAgent.
A containerized FM-Trap-SubAgent will be implemented to interact with the host-based FM application's log handling and the Net-SNMP MasterAgent. Specifically, the host-based FM-Mgr trap handling code will forward the alarm/log data to the FM-Trap-SubAgent (if configured), and the FM-Trap-SubAgent will use the Net-SNMP subagent APIs to generate traps and send them to the Net-SNMP MasterAgent for distribution to the configured trap destinations.
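The subtree registration described above can be pictured with a small toy model. This is only a sketch of the general idea (a master agent forwarding each request to the subagent that registered the longest matching OID prefix), not the Net-SNMP implementation. The class name, the subagent labels and the OIDs under `1.3.6.1.4.1.99999` are placeholders chosen for illustration; they are not the real wrsAlarmMib OIDs.

```python
from typing import Dict, Optional, Tuple

Oid = Tuple[int, ...]


def parse_oid(text: str) -> Oid:
    """Convert a dotted OID string such as '1.3.6.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip(".").split("."))


class ToyMasterAgent:
    """Toy model of a master agent that routes OIDs to registered subagents."""

    def __init__(self) -> None:
        self._registrations: Dict[Oid, str] = {}

    def register(self, subtree: str, subagent: str) -> None:
        """Record that `subagent` serves every OID below `subtree`."""
        self._registrations[parse_oid(subtree)] = subagent

    def route(self, oid: str) -> Optional[str]:
        """Return the subagent with the longest matching subtree, if any."""
        target = parse_oid(oid)
        best_subtree: Optional[Oid] = None
        for subtree in self._registrations:
            if target[: len(subtree)] == subtree:
                if best_subtree is None or len(subtree) > len(best_subtree):
                    best_subtree = subtree
        if best_subtree is None:
            return None
        return self._registrations[best_subtree]


if __name__ == "__main__":
    master = ToyMasterAgent()
    # Placeholder subtrees standing in for the alarm and event OID trees.
    master.register("1.3.6.1.4.1.99999.1", "fm-subagent/alarms")
    master.register("1.3.6.1.4.1.99999.2", "fm-subagent/events")
    print(master.route("1.3.6.1.4.1.99999.1.5.0"))
    print(master.route("1.3.6.1.2.1.1.3.0"))
```

A request for an OID outside every registered subtree returns `None` here; in a real deployment the master agent would answer such requests itself or report that the object does not exist.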
V2C Communities, V3 users and Trap Destinations will be configured through override values in the Net-SNMP helm chart, which will be part of the new Net-SNMP system application. The existing StarlingX REST APIs / CLIs for SNMP configuration will be removed.
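As an illustration of how such override values might map onto agent configuration, the sketch below renders a few standard `snmpd.conf` directives (`rocommunity`, `createUser`, `rouser`, `trap2sink`) from a dictionary. The override key names (`communities`, `v3_users`, `trap_destinations`) and the function name are hypothetical; the actual Helm chart schema is not defined in this section, and the chart may generate its configuration differently.

```python
from typing import Any, Dict, List


def render_snmpd_conf(overrides: Dict[str, Any]) -> str:
    """Render snmpd.conf lines from a hypothetical overrides dictionary."""
    lines: List[str] = []

    # SNMP v2c: read-only community strings.
    for community in overrides.get("communities", []):
        lines.append(f"rocommunity {community}")

    # SNMP v3: create each user, then grant read-only access that
    # requires both authentication and privacy (encryption).
    for user in overrides.get("v3_users", []):
        lines.append(
            "createUser {name} {auth_proto} {auth_pass} "
            "{priv_proto} {priv_pass}".format(**user)
        )
        lines.append(f"rouser {user['name']} priv")

    # SNMP v2c trap destinations.
    for dest in overrides.get("trap_destinations", []):
        lines.append(
            f"trap2sink {dest['host']}:{dest['port']} {dest['community']}"
        )

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are made-up examples.
    example = {
        "communities": ["example_community"],
        "v3_users": [
            {
                "name": "example_user",
                "auth_proto": "SHA",
                "auth_pass": "example-auth-pass",
                "priv_proto": "AES",
                "priv_pass": "example-priv-pass",
            }
        ],
        "trap_destinations": [
            {"host": "192.0.2.10", "port": 162, "community": "example_community"}
        ],
    }
    print(render_snmpd_conf(example), end="")
```

The sketch does no quoting or validation; passphrases containing spaces, or shorter than the minimum length Net-SNMP accepts, would need extra handling.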
The Net-SNMP Helm chart will use a Kubernetes deployment with liveness/readiness probes. Net-SNMP does not support an active/active deployment, so the Kubernetes deployment will be limited to a single replica. Recovery of failed SNMP containers will rely on Kubernetes dead-host detection and dead-container detection (through the liveness/readiness probes) to restart them.
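A minimal sketch of what a single-replica deployment with probes could look like, expressed as a Python dictionary in the shape of a Kubernetes `apps/v1` Deployment. The image name, labels, probe timings and the use of an `exec` probe running `snmpget` against `sysUpTime.0` are all assumptions made for illustration; the section does not specify how the chart's probes are implemented.

```python
import json
from typing import Any, Dict, List


def build_snmp_deployment(
    name: str, image: str, probe_command: List[str]
) -> Dict[str, Any]:
    """Build a Deployment manifest limited to one replica, with exec probes."""
    probe = {
        "exec": {"command": probe_command},
        "initialDelaySeconds": 5,  # assumed value
        "periodSeconds": 10,       # assumed value
    }
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # Net-SNMP does not run active/active, so only one replica.
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [
                                {"containerPort": 161, "protocol": "UDP"}
                            ],
                            "livenessProbe": dict(probe),
                            "readinessProbe": dict(probe),
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical probe: query sysUpTime.0 from the local agent.
    command = [
        "snmpget", "-v", "2c", "-c", "example_community",
        "127.0.0.1", "1.3.6.1.2.1.1.3.0",
    ]
    manifest = build_snmp_deployment("snmp", "example/net-snmp:5.8", command)
    print(json.dumps(manifest, indent=2))
```

An `exec` probe is used in this sketch because the agent listens on UDP, which a TCP socket probe cannot check.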
For networking, the nginx-ingress-controller in the platform will be used to direct ingress traffic from UDP port 161 to the internal Net-SNMP ClusterIP kubernetes service.
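The NGINX ingress controller exposes UDP services through a ConfigMap whose keys are external ports and whose values have the form `namespace/service:port`. The sketch below builds such a ConfigMap for port 161. The ConfigMap name, namespaces and service name are hypothetical placeholders, and how the platform's controller is actually wired to this ConfigMap is not covered here.

```python
import json
from typing import Any, Dict


def udp_service_entry(
    external_port: int, namespace: str, service: str, service_port: int
) -> Dict[str, str]:
    """Map an external UDP port to 'namespace/service:port'."""
    return {str(external_port): f"{namespace}/{service}:{service_port}"}


def build_udp_services_configmap(
    name: str, namespace: str, entries: Dict[str, str]
) -> Dict[str, Any]:
    """Wrap UDP port mappings in a ConfigMap manifest."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": dict(entries),
    }


if __name__ == "__main__":
    # Placeholder names: the real service and namespaces may differ.
    entry = udp_service_entry(161, "example-namespace", "snmp-service", 161)
    configmap = build_udp_services_configmap(
        "udp-services", "example-ingress-namespace", entry
    )
    print(json.dumps(configmap, indent=2))
```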
For Distributed Cloud configuration, the syncing of SNMP trap destination and community configuration across subclouds will be removed. Each subcloud will need to be configured for SNMP independently, through the SNMP Helm chart / Armada application.
A new optional ‘SNMP’ system application (Armada manifest and Helm chart) will be developed. This will include:
The Net-SNMP Armada application tarball will be packaged as an RPM in the StarlingX ISO such that the application tarball is installed (but not uploaded or applied) as part of the StarlingX install.
The existing Net-SNMP integration in StarlingX could have been extended to support SNMPv3, by adding new V3 Users and V3 Trap Destinations to the StarlingX REST APIs / CLIs. However, given the long-term direction for StarlingX to containerize its flock components and given that the SNMP solution is relatively isolated, it was decided to containerize the SNMP solution and leverage Helm for deployment and configuration of Net-SNMP.
For High Availability, to improve switchover times on failure, we may look at leveraging Kubernetes leader election to run Net-SNMP active/standby within a deployment of replica=2.
There are other commercial and open source alternatives to Net-SNMP. However, Net-SNMP is the SNMP tool installed in the current StarlingX implementation and has been integrated with StarlingX successfully. It is a mature open source project with more than 20 years in the market and many releases, and it has an active user and developer community.
The existing StarlingX data model for SNMP configuration will be removed, i.e. specifically the postgres DB tables and sysinv CLI / REST APIs for the SNMP V2C Community table and the SNMP V2C Trap Destination table. SNMP configuration will now be done through Helm chart overrides of the Net-SNMP system application.
Since SNMP support is already provided by Net-SNMP 5.7.2 in StarlingX, there are no changes to the internal Net-SNMP data model. The changes will focus on containerizing Net-SNMP 5.8 inside the StarlingX solution. Additionally, since SNMP support will be provided by this new optional Armada application, it will not be included in a fresh install.
The following REST APIs for configuring SNMP will be removed:
SNMP Configuration will now be done through Helm Chart override of the Net-SNMP system application.
Support for SNMPv3 provides improved security over the current SNMPv2C support. SNMPv3 provides both secure user/password authentication and encryption of SNMP PDUs. SNMPv2C provides only a clear text password/community-string check and no encryption.
Net-SNMP is already in use in the StarlingX solution, and the changes to upgrade the Net-SNMP version and add SNMP v3 support do not affect security, because they do not expose a new API for configuration or usage.
Ability to optionally use SNMPv3 instead of SNMPv2 for monitoring StarlingX Alarms and Logs.
The solution containerizes Net-SNMP and modifies the trap-sending code to support SNMP v3 traps in addition to v2c traps. No impact on performance is expected from these changes.
Configuration of SNMP will be done through Helm Chart overrides as opposed to StarlingX REST APIs / CLIs.
This may impact the work currently being done to containerize portions of FM code. This work is covered by a different Storyboard Story and has yet to be merged.
The SNMP solution will not cover the upgrade scenario from STX 4.0 (old StarlingX implementation) to STX 5.0 (new StarlingX implementation). The rationale is that SNMP is not a system-critical service and the amount of SNMP configuration that would need to be re-entered is extremely small.
The resulting behaviour for a software upgrade from STX 4.0 to STX 5.0 is that any existing SNMP configuration from the STX 4.0 deployment will be lost. After the software upgrade to STX 5.0 is finished, the new SNMP Armada application will need to be installed and the old SNMP configuration re-entered as Helm overrides for this new SNMP Armada application.
Software upgrades from STX 5.0 to future releases will be supported with no configuration loss.
Documentation will be updated with the user override configuration parameters and the availability of SNMP v3 in StarlingX.
Feature storyboard: https://storyboard.openstack.org/#!/story/2008132