From 2aafaa80486da0deb8c1fd118b40f7b3e5a91982 Mon Sep 17 00:00:00 2001
From: Samuel Pilla <sp516w@att.com>
Date: Mon, 24 Jun 2019 08:47:33 -0500
Subject: [PATCH] (armada) Chart Time Metrics

Change-Id: I121d8fcf050a83cbcf01a14c1543d11a0b04ea2a
---
 specs/approved/armada_time_metrics.rst | 156 +++++++++++++++++++++++++
 1 file changed, 156 insertions(+)
 create mode 100644 specs/approved/armada_time_metrics.rst

diff --git a/specs/approved/armada_time_metrics.rst b/specs/approved/armada_time_metrics.rst
new file mode 100644
index 0000000..70ccd55
--- /dev/null
+++ b/specs/approved/armada_time_metrics.rst
@@ -0,0 +1,156 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+
+=======================================
+Time Performance Metrics for Each Chart
+=======================================
+
+Add time performance metrics for charts, including deployment, upgrade, wait,
+and test time, as well as the time consumed by documents or resources, where
+applicable.
+
+Problem description
+===================
+
+Armada currently records no time metrics for chart deployments, upgrades,
+tests, or other actions. As a result, the time needed to deploy an environment
+is unknown, which can make it difficult to plan deployment or upgrade windows
+for charts. Logging time metrics that can be scraped by CRD and Prometheus
+would make deployments and upgrades more predictable and would show when
+charts are not behaving as intended.
+
+Use Cases
+---------
+
+Knowing how long a chart takes to deploy or upgrade can streamline these
+processes in future deployments or upgrades. It makes chart deployment and
+upgrade times predictable and helps find inconsistencies within those
+deployments and upgrades, which can pinpoint the chart(s) causing errors.
+
+Proposed change
+===============
+
+Add time metrics to the `ChartBuilder`, `ChartDeploy`, and `ChartDelete`
+classes. The timers will be built with the Python standard library `time`
+module, and the results will be written to the logs for use or analysis.
+
+These metrics cover the full deployment, upgrade, wait, install, and delete
+time for charts processed by Armada. Each metric will be logged with a date and
+timestamp, the chart name, and the action performed, as in the following::
+
+    Ingress DEPLOYMENT start: 2019-06-25 12:34:56 UTC
+    ...
+    Ingress DEPLOYMENT complete: 2019-06-25 13:57:09 UTC
+    Ingress DEPLOYMENT duration: 01:22:13
+
+As shown, the logs include the chart name, the action the chart is performing,
+the status of the action, and the date and time of each stage, followed by the
+duration at the end. If an error occurs, `complete` is replaced with `error`.
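+
+A minimal sketch of how the start, complete/error, and duration lines shown
+above could be produced by a context manager follows. The helper name
+`chart_timer`, its signature, and the use of `time.monotonic()` for the elapsed
+time (so that system clock adjustments do not skew the duration) are
+assumptions for illustration, not existing Armada code::
+
+    import contextlib
+    import datetime
+    import logging
+    import time
+
+    LOG = logging.getLogger(__name__)
+
+
+    @contextlib.contextmanager
+    def chart_timer(chart_name, action):
+        """Hypothetical helper: log start, complete/error, and duration."""
+        def stamp():
+            # Wall-clock UTC time for the start/complete/error lines.
+            return datetime.datetime.now(datetime.timezone.utc).strftime(
+                '%Y-%m-%d %H:%M:%S UTC')
+
+        LOG.info('%s %s start: %s', chart_name, action, stamp())
+        # Monotonic clock for the elapsed time only.
+        start = time.monotonic()
+        status = 'complete'
+        try:
+            yield
+        except Exception:
+            # On failure the status word becomes "error"; re-raise the error.
+            status = 'error'
+            raise
+        finally:
+            elapsed = time.monotonic() - start
+            LOG.info('%s %s %s: %s', chart_name, action, status, stamp())
+            LOG.info('%s %s duration: %s', chart_name, action,
+                     time.strftime('%H:%M:%S', time.gmtime(elapsed)))
+
+Wrapping an action as `with chart_timer('Ingress', 'DEPLOYMENT'):` would emit
+the three lines in the example, with `error` in place of `complete` if the
+wrapped code raises.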
+
+To log these metrics, the deployment code will need to be changed to record the
+required timestamps and then log the start, completion or error, and duration
+of each chart action.
+
+Example:
+
+chart_deploy.py::
+
+    def execute(self, chart, cg_test_all_charts, prefix, known_releases):
+        namespace = chart.get('namespace')
+        release = chart.get('release')
+        release_name = r.release_prefixer(prefix, release)
+        LOG.info('Processing Chart, release=%s', release_name)
+        # Requires `import time` at module level.
+        start_time = time.time()
+    ...
+        # Let the logger do the formatting; durations are shown as HH:MM:SS.
+        LOG.info('Chart deployment/update completed in %s',
+                 time.strftime('%H:%M:%S',
+                               time.gmtime(time.time() - start_time)))
+
+        start_time = time.time()
+        # Wait
+        timer = int(round(deadline - time.time()))
+        chart_wait.wait(timer)
+
+        LOG.info('Chart wait completed in %s',
+                 time.strftime('%H:%M:%S',
+                               time.gmtime(time.time() - start_time)))
+
+        start_time = time.time()
+        # Test
+        just_deployed = ('install' in result) or ('upgrade' in result)
+    ...
+        if run_test:
+            self._test_chart(release_name, test_handler)
+
+        LOG.info('Chart test completed in %s',
+                 time.strftime('%H:%M:%S',
+                               time.gmtime(time.time() - start_time)))
+    ...
+
+The resulting log entries would then look similar to the following::
+
+    2019-07-01 00:00:00.000 0 INFO armada.handlers.chart_deploy [-] [chart=chart-name] Beginning chart deployment
+    ...
+    2019-07-01 00:00:00.000 0 INFO armada.handlers.chart_deploy [-] [chart=chart-name] SUCCESS chart deployment complete in 00:00:00.000
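+
+The sample log above reports durations with millisecond precision
+(`00:00:00.000`), which `time.strftime('%H:%M:%S', ...)` cannot produce on its
+own. One possible formatter, sketched here under the hypothetical name
+`format_duration`, splits the elapsed seconds manually::
+
+    def format_duration(seconds):
+        """Hypothetical helper: format elapsed seconds as HH:MM:SS.mmm."""
+        # Work in whole milliseconds to avoid floating-point carry issues.
+        total_ms = int(round(seconds * 1000))
+        hours, rem = divmod(total_ms, 3600 * 1000)
+        minutes, rem = divmod(rem, 60 * 1000)
+        secs, millis = divmod(rem, 1000)
+        # Unlike time.gmtime(), hours are not wrapped at 24.
+        return '%02d:%02d:%02d.%03d' % (hours, minutes, secs, millis)
+
+It could then stand in for the `time.strftime(...)` calls, for example
+`LOG.info('Chart test completed in %s', format_duration(time.time() - start_time))`.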
+
+Prometheus can then scrape the metrics from the logs, provided that scraping is
+enabled in the Prometheus section of the chart's values.yaml::
+
+    monitoring:
+      prometheus:
+        enabled: true
+        node_exporter:
+          scrape: true
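+
+Because the durations are written as plain log lines, any component that turns
+them into Prometheus metrics would first need to parse them back into numbers.
+A minimal sketch, assuming the `<chart> <ACTION> duration: HH:MM:SS` format
+shown earlier; the function name `parse_durations` and the regular expression
+are illustrative only::
+
+    import re
+
+    # Matches lines such as "Ingress DEPLOYMENT duration: 01:22:13".
+    DURATION_RE = re.compile(
+        r'(?P<chart>\S+) (?P<action>[A-Z]+) duration: '
+        r'(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})')
+
+
+    def parse_durations(lines):
+        """Yield (chart, action, seconds) for each duration line found."""
+        for line in lines:
+            match = DURATION_RE.search(line)
+            if match:
+                seconds = (int(match.group('h')) * 3600
+                           + int(match.group('m')) * 60
+                           + int(match.group('s')))
+                yield match.group('chart'), match.group('action'), seconds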
+
+Alternatives
+------------
+
+1. A simplistic alternative is to log only timestamps for each action that
+   occurs on a chart. While similar to the proposed change, it would not show
+   an elapsed time, only the start and end points.
+
+2. Another alternative is to use the `datetime` module instead of the `time`
+   module. This provides very similar functionality for getting the elapsed
+   time of chart deployment, update, wait, test, etc. It takes slightly more
+   effort to convert the `timedelta` object, produced by subtracting two
+   `datetime` objects, into a string format for the log.
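+
+To illustrate the trade-off in the second alternative, the sketch below
+computes the 01:22:13 duration from the earlier example log both ways. The
+timestamps come from that example; the variable names are arbitrary::
+
+    import datetime
+    import time
+
+    start = datetime.datetime(2019, 6, 25, 12, 34, 56)
+    end = datetime.datetime(2019, 6, 25, 13, 57, 9)
+
+    # datetime: subtracting two datetimes yields a timedelta.
+    delta = end - start
+    datetime_str = str(delta)  # "H:MM:SS", hour not zero-padded
+
+    # time: format the elapsed seconds through a struct_time.
+    elapsed = delta.total_seconds()
+    time_str = time.strftime('%H:%M:%S', time.gmtime(elapsed))
+
+    # Past 24 hours the approaches diverge: gmtime() wraps the hour,
+    # while timedelta reports the extra day.
+    long_run = 25 * 3600
+    wrapped = time.strftime('%H:%M:%S', time.gmtime(long_run))
+    with_days = str(datetime.timedelta(seconds=long_run))
+
+Here `datetime_str` is `1:22:13` and `time_str` is `01:22:13`, while for the
+25-hour run `wrapped` is `01:00:00` and `with_days` is `1 day, 1:00:00`. Both
+are adequate for actions that finish within a day; the `timedelta` form needs
+extra formatting only if zero-padded hours are wanted.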
+
+
+Security Impact
+---------------
+None
+
+Notifications Impact
+--------------------
+
+Extra notifications displaying deployment or upgrade times.
+
+Other End User Impact
+---------------------
+None
+
+Performance Impact
+------------------
+None
+
+Other Deployer Impact
+---------------------
+None
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Work Items
+----------
+
+Dependencies
+============
+None
+
+Documentation Impact
+====================
+None