From 2aafaa80486da0deb8c1fd118b40f7b3e5a91982 Mon Sep 17 00:00:00 2001
From: Samuel Pilla
Date: Mon, 24 Jun 2019 08:47:33 -0500
Subject: [PATCH] (armada) Chart Time Metrics

Change-Id: I121d8fcf050a83cbcf01a14c1543d11a0b04ea2a
---
 specs/approved/armada_time_metrics.rst | 156 +++++++++++++++++++++++++
 1 file changed, 156 insertions(+)
 create mode 100644 specs/approved/armada_time_metrics.rst

diff --git a/specs/approved/armada_time_metrics.rst b/specs/approved/armada_time_metrics.rst
new file mode 100644
index 0000000..70ccd55
--- /dev/null
+++ b/specs/approved/armada_time_metrics.rst
@@ -0,0 +1,156 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+
+=======================================
+Time Performance Metrics for Each Chart
+=======================================
+
+Allow time performance metrics on charts, including deployment time, upgrade
+time, wait time, test time, and, where applicable, the time consumed by docs
+or resources.
+
+Problem description
+===================
+
+Armada currently records no time metrics for chart deployments, upgrades,
+tests, or other actions. As a result, there is no known duration for deploying
+an environment, which can constrain when charts can be deployed or upgraded.
+Logging time metrics that can be scraped by CRD and Prometheus will make
+deployments and upgrades more predictable and will show when charts are not
+behaving as intended.
+
+Use Cases
+---------
+
+Knowing how long a chart takes to deploy or upgrade can streamline these
+processes in future deployments or upgrades. It makes chart deployment and
+upgrade times predictable and exposes inconsistencies between deployments and
+upgrades, which can help pinpoint the chart(s) causing errors.
+
+Proposed change
+===============
+
+Add time metrics to the ``ChartBuilder``, ``ChartDeploy``, and ``ChartDelete``
+classes. The timers will be built on the Python standard library ``time``
+module, and the results will be written to the logs for later use or analysis.
+
+These metrics cover the full deployment, upgrade, wait, install, and delete
+time for charts deployed through Armada. Each will be logged with a date and
+timestamp, the chart name, and the action performed, for example::
+
+    Ingress DEPLOYMENT start: 2019-06-25 12:34:56 UTC
+    ...
+    Ingress DEPLOYMENT complete: 2019-06-25 13:57:09 UTC
+    Ingress DEPLOYMENT duration: 01:22:13
+
+As shown, each log line contains the chart name, the action being performed,
+the status of the action, and the date and time of that stage, followed by the
+total duration at the end. If the action fails, ``complete`` is replaced with
+``error``.
+
+To log these metrics, the deployment handlers will need to be changed to
+record the required timestamps and then log the start, completion or error,
+and duration of each chart action.
+
+Example:
+
+chart_deploy.py::
+
+    import time
+
+    ...
+
+    def execute(self, chart, cg_test_all_charts, prefix, known_releases):
+        namespace = chart.get('namespace')
+        release = chart.get('release')
+        release_name = r.release_prefixer(prefix, release)
+        LOG.info('Processing Chart, release=%s', release_name)
+
+        # Deploy or upgrade
+        start_time = time.time()
+        ...
+        LOG.info('Chart deployment/update completed in %s',
+                 time.strftime('%H:%M:%S',
+                               time.gmtime(time.time() - start_time)))
+
+        # Wait
+        start_time = time.time()
+        timer = int(round(deadline - time.time()))
+        chart_wait.wait(timer)
+
+        LOG.info('Chart wait completed in %s',
+                 time.strftime('%H:%M:%S',
+                               time.gmtime(time.time() - start_time)))
+
+        # Test
+        start_time = time.time()
+        just_deployed = ('install' in result) or ('upgrade' in result)
+        ...
+        if run_test:
+            self._test_chart(release_name, test_handler)
+
+        LOG.info('Chart test completed in %s',
+                 time.strftime('%H:%M:%S',
+                               time.gmtime(time.time() - start_time)))
+        ...
+
+The logs will then contain the time metrics as follows::
+
+    2019-07-01 00:00:00.000 0 INFO armada.handlers.chart_deploy [-] [chart=chart-name] Beginning chart deployment
+    ...
+    2019-07-01 00:00:00.000 0 INFO armada.handlers.chart_deploy [-] [chart=chart-name] SUCCESS chart deployment complete in 00:00:00
+
+Prometheus can then scrape the metrics from the logs, provided that the chart
+has enabled it in the Prometheus section of the chart's ``values.yaml``::
+
+    monitoring:
+      prometheus:
+        enabled: true
+        node_exporter:
+          scrape: true
+
+Alternatives
+------------
+
+1. A simpler alternative is to log only a timestamp for each action performed
+   on a chart. While similar to the proposed change, it would not show the
+   elapsed time, only the start and end points.
+
+2. Another alternative is to use the ``datetime`` library instead of the
+   ``time`` library. It provides very similar functionality for measuring the
+   elapsed time of chart deployment, update, wait, test, etc., but it takes
+   slightly more effort to convert the ``timedelta`` object produced by
+   subtracting two ``datetime`` objects into a string for the log.
+
+
+Security Impact
+---------------
+
+None
+
+Notifications Impact
+--------------------
+
+Extra log notifications displaying deployment or upgrade time.
+
+Other End User Impact
+---------------------
+
+None
+
+Performance Impact
+------------------
+
+None
+
+Other Deployer Impact
+---------------------
+
+None
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Work Items
+----------
+
+Dependencies
+============
+
+None
+
+Documentation Impact
+====================
+
+None
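
A minimal sketch of how the start/complete/duration log format under Proposed change could be produced by one reusable helper instead of repeating the `start_time` bookkeeping per stage, with the `datetime`/`timedelta` alternative shown alongside. This is an illustration under assumptions, not Armada code: `timed_action`, `format_duration`, `format_timedelta`, `TS_FORMAT`, and the injectable `log` parameter are all hypothetical; only the logger name and the log line layout come from the spec.

```python
import contextlib
import logging
import time
from datetime import datetime, timedelta, timezone

# Logger name mirrors the handler module shown in the spec's log sample.
LOG = logging.getLogger("armada.handlers.chart_deploy")
# Hypothetical timestamp layout matching "2019-06-25 12:34:56 UTC".
TS_FORMAT = "%Y-%m-%d %H:%M:%S UTC"


def format_duration(seconds):
    """Render elapsed seconds as zero-padded HH:MM:SS; hours may exceed 24."""
    hours, remainder = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(remainder, 60)
    return "%02d:%02d:%02d" % (hours, minutes, secs)


def format_timedelta(delta):
    """The datetime alternative: a timedelta still needs manual formatting."""
    return format_duration(delta.total_seconds())


@contextlib.contextmanager
def timed_action(chart_name, action, log=LOG.info):
    """Log start, complete/error and duration lines around one chart action.

    Hypothetical helper, not part of Armada; `log` is injectable for testing.
    """
    log("%s %s start: %s", chart_name, action,
        datetime.now(timezone.utc).strftime(TS_FORMAT))
    start = time.monotonic()  # monotonic: immune to wall-clock adjustments
    status = "error"  # overwritten only if the wrapped block finishes normally
    try:
        yield
        status = "complete"
    finally:
        elapsed = time.monotonic() - start
        log("%s %s %s: %s", chart_name, action, status,
            datetime.now(timezone.utc).strftime(TS_FORMAT))
        log("%s %s duration: %s", chart_name, action, format_duration(elapsed))
```

Each stage could then be wrapped as `with timed_action(release_name, "WAIT"): chart_wait.wait(timer)`. Two design choices differ from the spec's inline example: elapsed time comes from `time.monotonic()`, which is not affected by system clock changes the way `time.time()` is, and hours are formatted directly rather than through `time.strftime('%H:%M:%S', time.gmtime(...))`, which wraps back to `00` after 24 hours.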