Remove project content on master branch
This is step 2b of the repository deprecation process as described in [1].
Project deprecation has been announced here [2].

[1] https://docs.openstack.org/project-team-guide/repository.html#step-2b-remove-project-content
[2] http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016814.html

Depends-On: https://review.opendev.org/751983
Change-Id: I83bb2821d64a4dddd569ff9939aa78d271834f08
parent 326483ee4c
commit 811acd76c9
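The retired content remains reachable from Git history; as the updated README in this diff puts it, check out the previous commit with "git checkout HEAD^1". A minimal sketch, assuming the repository URL published on opendev.org and using the parent hash from the metadata above:

    # Clone the repository and check out the last revision that still
    # contains the full project content (the parent of this change).
    git clone https://opendev.org/openstack/monasca-transform
    cd monasca-transform
    git checkout HEAD^1    # equivalently: git checkout 326483ee4c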
.gitignore
@ -1,9 +0,0 @@
.idea
AUTHORS
ChangeLog
monasca_transform.egg-info
tools/vagrant/.vagrant
doc/build/*
.stestr
.tox
*.pyc
.stestr.conf
@ -1,3 +0,0 @@
[DEFAULT]
test_path=${OS_TEST_PATH:-./tests/unit}
top_dir=./
.zuul.yaml
@ -1,14 +0,0 @@
- project:
    templates:
      - build-openstack-docs-pti
      - check-requirements
      - openstack-cover-jobs
      - openstack-lower-constraints-jobs
      - openstack-python3-victoria-jobs
    check:
      jobs:
        - legacy-tempest-dsvm-monasca-transform-python35-functional:
            voting: false
            irrelevant-files:
              - ^(test-|)requirements.txt$
              - ^setup.cfg$
LICENSE
@ -1,175 +0,0 @@
|
||||
Apache License
|
||||
Version 2.0, January 2004
|
||||
http://www.apache.org/licenses/
|
||||
|
||||
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
|
||||
|
||||
1. Definitions.
|
||||
|
||||
"License" shall mean the terms and conditions for use, reproduction,
|
||||
and distribution as defined by Sections 1 through 9 of this document.
|
||||
|
||||
"Licensor" shall mean the copyright owner or entity authorized by
|
||||
the copyright owner that is granting the License.
|
||||
|
||||
"Legal Entity" shall mean the union of the acting entity and all
|
||||
other entities that control, are controlled by, or are under common
|
||||
control with that entity. For the purposes of this definition,
|
||||
"control" means (i) the power, direct or indirect, to cause the
|
||||
direction or management of such entity, whether by contract or
|
||||
otherwise, or (ii) ownership of fifty percent (50%) or more of the
|
||||
outstanding shares, or (iii) beneficial ownership of such entity.
|
||||
|
||||
"You" (or "Your") shall mean an individual or Legal Entity
|
||||
exercising permissions granted by this License.
|
||||
|
||||
"Source" form shall mean the preferred form for making modifications,
|
||||
including but not limited to software source code, documentation
|
||||
source, and configuration files.
|
||||
|
||||
"Object" form shall mean any form resulting from mechanical
|
||||
transformation or translation of a Source form, including but
|
||||
not limited to compiled object code, generated documentation,
|
||||
and conversions to other media types.
|
||||
|
||||
"Work" shall mean the work of authorship, whether in Source or
|
||||
Object form, made available under the License, as indicated by a
|
||||
copyright notice that is included in or attached to the work
|
||||
(an example is provided in the Appendix below).
|
||||
|
||||
"Derivative Works" shall mean any work, whether in Source or Object
|
||||
form, that is based on (or derived from) the Work and for which the
|
||||
editorial revisions, annotations, elaborations, or other modifications
|
||||
represent, as a whole, an original work of authorship. For the purposes
|
||||
of this License, Derivative Works shall not include works that remain
|
||||
separable from, or merely link (or bind by name) to the interfaces of,
|
||||
the Work and Derivative Works thereof.
|
||||
|
||||
"Contribution" shall mean any work of authorship, including
|
||||
the original version of the Work and any modifications or additions
|
||||
to that Work or Derivative Works thereof, that is intentionally
|
||||
submitted to Licensor for inclusion in the Work by the copyright owner
|
||||
or by an individual or Legal Entity authorized to submit on behalf of
|
||||
the copyright owner. For the purposes of this definition, "submitted"
|
||||
means any form of electronic, verbal, or written communication sent
|
||||
to the Licensor or its representatives, including but not limited to
|
||||
communication on electronic mailing lists, source code control systems,
|
||||
and issue tracking systems that are managed by, or on behalf of, the
|
||||
Licensor for the purpose of discussing and improving the Work, but
|
||||
excluding communication that is conspicuously marked or otherwise
|
||||
designated in writing by the copyright owner as "Not a Contribution."
|
||||
|
||||
"Contributor" shall mean Licensor and any individual or Legal Entity
|
||||
on behalf of whom a Contribution has been received by Licensor and
|
||||
subsequently incorporated within the Work.
|
||||
|
||||
2. Grant of Copyright License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
copyright license to reproduce, prepare Derivative Works of,
|
||||
publicly display, publicly perform, sublicense, and distribute the
|
||||
Work and such Derivative Works in Source or Object form.
|
||||
|
||||
3. Grant of Patent License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
(except as stated in this section) patent license to make, have made,
|
||||
use, offer to sell, sell, import, and otherwise transfer the Work,
|
||||
where such license applies only to those patent claims licensable
|
||||
by such Contributor that are necessarily infringed by their
|
||||
Contribution(s) alone or by combination of their Contribution(s)
|
||||
with the Work to which such Contribution(s) was submitted. If You
|
||||
institute patent litigation against any entity (including a
|
||||
cross-claim or counterclaim in a lawsuit) alleging that the Work
|
||||
or a Contribution incorporated within the Work constitutes direct
|
||||
or contributory patent infringement, then any patent licenses
|
||||
granted to You under this License for that Work shall terminate
|
||||
as of the date such litigation is filed.
|
||||
|
||||
4. Redistribution. You may reproduce and distribute copies of the
|
||||
Work or Derivative Works thereof in any medium, with or without
|
||||
modifications, and in Source or Object form, provided that You
|
||||
meet the following conditions:
|
||||
|
||||
(a) You must give any other recipients of the Work or
|
||||
Derivative Works a copy of this License; and
|
||||
|
||||
(b) You must cause any modified files to carry prominent notices
|
||||
stating that You changed the files; and
|
||||
|
||||
(c) You must retain, in the Source form of any Derivative Works
|
||||
that You distribute, all copyright, patent, trademark, and
|
||||
attribution notices from the Source form of the Work,
|
||||
excluding those notices that do not pertain to any part of
|
||||
the Derivative Works; and
|
||||
|
||||
(d) If the Work includes a "NOTICE" text file as part of its
|
||||
distribution, then any Derivative Works that You distribute must
|
||||
include a readable copy of the attribution notices contained
|
||||
within such NOTICE file, excluding those notices that do not
|
||||
pertain to any part of the Derivative Works, in at least one
|
||||
of the following places: within a NOTICE text file distributed
|
||||
as part of the Derivative Works; within the Source form or
|
||||
documentation, if provided along with the Derivative Works; or,
|
||||
within a display generated by the Derivative Works, if and
|
||||
wherever such third-party notices normally appear. The contents
|
||||
of the NOTICE file are for informational purposes only and
|
||||
do not modify the License. You may add Your own attribution
|
||||
notices within Derivative Works that You distribute, alongside
|
||||
or as an addendum to the NOTICE text from the Work, provided
|
||||
that such additional attribution notices cannot be construed
|
||||
as modifying the License.
|
||||
|
||||
You may add Your own copyright statement to Your modifications and
|
||||
may provide additional or different license terms and conditions
|
||||
for use, reproduction, or distribution of Your modifications, or
|
||||
for any such Derivative Works as a whole, provided Your use,
|
||||
reproduction, and distribution of the Work otherwise complies with
|
||||
the conditions stated in this License.
|
||||
|
||||
5. Submission of Contributions. Unless You explicitly state otherwise,
|
||||
any Contribution intentionally submitted for inclusion in the Work
|
||||
by You to the Licensor shall be under the terms and conditions of
|
||||
this License, without any additional terms or conditions.
|
||||
Notwithstanding the above, nothing herein shall supersede or modify
|
||||
the terms of any separate license agreement you may have executed
|
||||
with Licensor regarding such Contributions.
|
||||
|
||||
6. Trademarks. This License does not grant permission to use the trade
|
||||
names, trademarks, service marks, or product names of the Licensor,
|
||||
except as required for reasonable and customary use in describing the
|
||||
origin of the Work and reproducing the content of the NOTICE file.
|
||||
|
||||
7. Disclaimer of Warranty. Unless required by applicable law or
|
||||
agreed to in writing, Licensor provides the Work (and each
|
||||
Contributor provides its Contributions) on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
||||
implied, including, without limitation, any warranties or conditions
|
||||
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
|
||||
PARTICULAR PURPOSE. You are solely responsible for determining the
|
||||
appropriateness of using or redistributing the Work and assume any
|
||||
risks associated with Your exercise of permissions under this License.
|
||||
|
||||
8. Limitation of Liability. In no event and under no legal theory,
|
||||
whether in tort (including negligence), contract, or otherwise,
|
||||
unless required by applicable law (such as deliberate and grossly
|
||||
negligent acts) or agreed to in writing, shall any Contributor be
|
||||
liable to You for damages, including any direct, indirect, special,
|
||||
incidental, or consequential damages of any character arising as a
|
||||
result of this License or out of the use or inability to use the
|
||||
Work (including but not limited to damages for loss of goodwill,
|
||||
work stoppage, computer failure or malfunction, or any and all
|
||||
other commercial damages or losses), even if such Contributor
|
||||
has been advised of the possibility of such damages.
|
||||
|
||||
9. Accepting Warranty or Additional Liability. While redistributing
|
||||
the Work or Derivative Works thereof, You may choose to offer,
|
||||
and charge a fee for, acceptance of support, warranty, indemnity,
|
||||
or other liability obligations and/or rights consistent with this
|
||||
License. However, in accepting such obligations, You may act only
|
||||
on Your own behalf and on Your sole responsibility, not on behalf
|
||||
of any other Contributor, and only if You agree to indemnify,
|
||||
defend, and hold each Contributor harmless for any liability
|
||||
incurred by, or claims asserted against, such Contributor by reason
|
||||
of your accepting any such warranty or additional liability.
|
||||
|
README.rst
@ -1,110 +1,16 @@
Team and repository tags
========================

.. image:: https://governance.openstack.org/tc/badges/monasca-transform.svg
    :target: https://governance.openstack.org/tc/reference/tags/index.html

- `Monasca Transform`_

  - `Use Cases handled by Monasca Transform`_
  - `Operation`_
  - `Architecture`_
  - `To set up the development environment`_
  - `Generic aggregation components`_
  - `Create a new aggregation pipeline example`_
  - `Original proposal and blueprint`_

Monasca Transform
=================

monasca-transform is a data driven aggregation engine which collects,
groups and aggregates existing individual Monasca metrics according to
business requirements and publishes new transformed (derived) metrics to
the Monasca Kafka queue.
This project is no longer maintained.

- Since the new transformed metrics are published as any other metric
  in Monasca, alarms can be set and triggered on the transformed
  metric.
The contents of this repository are still available in the Git
source code management system. To see the contents of this
repository before it reached its end of life, please check out the
previous commit with "git checkout HEAD^1".

- Monasca Transform uses `Apache Spark`_ to aggregate data. `Apache
  Spark`_ is a highly scalable, fast, in-memory, fault tolerant and
  parallel data processing framework. All monasca-transform components
  are implemented in Python and use Spark’s `PySpark Python API`_ to
  interact with Spark.
Older versions of this project are still supported and available in stable
branches.

- Monasca Transform does transformation and aggregation of incoming
  metrics in two phases.

  - In the first phase spark streaming application is set to retrieve
    in data from kafka at a configurable *stream interval* (default
    *stream_inteval* is 10 minutes) and write the data aggregated for
    *stream interval* to *pre_hourly_metrics* topic in kafka.

  - In the second phase, which is kicked off every hour, all metrics
    in *metrics_pre_hourly* topic in Kafka are aggregated again, this
    time over a larger interval of an hour. These hourly aggregated
    metrics published to *metrics* topic in kafka.

Use Cases handled by Monasca Transform
--------------------------------------

Please refer to **Problem Description** section on the
`Monasca/Transform wiki`_

Operation
---------

Please refer to **How Monasca Transform Operates** section on the
`Monasca/Transform wiki`_

Architecture
------------

Please refer to **Architecture** and **Logical processing data flow**
sections on the `Monasca/Transform wiki`_

To set up the development environment
-------------------------------------

The monasca-transform uses `DevStack`_ as a common dev environment. See
the `README.md`_ in the devstack directory for details on how to include
monasca-transform in a DevStack deployment.

Generic aggregation components
------------------------------

Monasca Transform uses a set of generic aggregation components which can
be assembled in to an aggregation pipeline.

Please refer to the
`generic-aggregation-components`_
document for information on list of generic aggregation components
available.

Create a new aggregation pipeline example
-----------------------------------------

Generic aggregation components make it easy to build new aggregation
pipelines for different Monasca metrics.

This create a `new aggregation pipeline`_ example shows how to create
*pre_transform_specs* and *transform_specs* to create an aggregation
pipeline for a new set of Monasca metrics, while leveraging existing set
of generic aggregation components.

Original proposal and blueprint
-------------------------------

Original proposal: `Monasca/Transform-proposal`_

Blueprint: `monasca-transform blueprint`_

.. _Apache Spark: https://spark.apache.org
.. _generic-aggregation-components: docs/generic-aggregation-components.md
.. _PySpark Python API: https://spark.apache.org/docs/latest/api/python/index.html
.. _Monasca/Transform wiki: https://wiki.openstack.org/wiki/Monasca/Transform
.. _DevStack: https://docs.openstack.org/devstack/latest/
.. _README.md: devstack/README.md
.. _new aggregation pipeline: docs/create-new-aggregation-pipeline.md
.. _Monasca/Transform-proposal: https://wiki.openstack.org/wiki/Monasca/Transform-proposal
.. _monasca-transform blueprint: https://blueprints.launchpad.net/monasca/+spec/monasca-transform
For any further questions, please email
openstack-discuss@lists.openstack.org or join #openstack-monasca on
Freenode.
@ -1,206 +0,0 @@
|
||||
# Monasca-transform DevStack Plugin
|
||||
|
||||
The Monasca-transform DevStack plugin is tested only on Ubuntu 16.04 (Xenial).
|
||||
|
||||
A short cut to running monasca-transform in devstack is implemented with vagrant.
|
||||
|
||||
## Variables
|
||||
* DATABASE_PASSWORD(default: *secretmysql*) - password to upload monasca-transform schema
|
||||
* MONASCA_TRANSFORM_DB_PASSWORD(default: *password*) - password for m-transform user
|
||||
|
||||
## To run monasca-transform using the provided vagrant environment
|
||||
|
||||
### Using any changes made locally to monasca-transform
|
||||
|
||||
cd tools/vagrant
|
||||
vagrant up
|
||||
vagrant ssh
|
||||
cd devstack
|
||||
./stack.sh
|
||||
|
||||
The devstack vagrant environment is set up to share the monasca-transform
|
||||
directory with the vm, copy it and commit any changes in the vm copy. This is
|
||||
because the devstack deploy process checks out the master branch to
|
||||
|
||||
/opt/stack
|
||||
|
||||
and deploys using that. Changes made by the user need to be committed in order
|
||||
to be used in the devstack instance. It is important therefore that changes
|
||||
should not be pushed from the vm as the unevaluated commit would be pushed.
|
||||
|
||||
N.B. If you are running with virtualbox you may find that the `./stack.sh` fails with the filesystem becoming read only. There is a work around:
|
||||
|
||||
1. vagrant up --no-provision && vagrant halt
|
||||
2. open virtualbox gui
|
||||
3. open target vm settings and change storage controller from SCSI to SATA
|
||||
4. vagrant up
|
||||
|
||||
### Using the upstream committed state of monasca-transform
|
||||
|
||||
This should operate the same as for any other devstack plugin. However, to use
|
||||
the plugin from the upstream repo with the vagrant environment as described
|
||||
above it is sufficient to do:
|
||||
|
||||
cd tools/vagrant
|
||||
vagrant up
|
||||
vagrant ssh
|
||||
cd devstack
|
||||
vi local.conf
|
||||
|
||||
and change the line
|
||||
|
||||
enable_plugin monasca-transform /home/ubuntu/monasca-transform
|
||||
|
||||
to
|
||||
|
||||
enable_plugin monasca-transform https://opendev.org/openstack/monasca-transform
|
||||
|
||||
before running
|
||||
|
||||
./stack.sh
|
||||
|
||||
### Connecting to devstack
|
||||
|
||||
The host key changes with each ```vagrant destroy```/```vagrant up``` cycle so
|
||||
it is necessary to manage host key verification for your workstation:
|
||||
|
||||
ssh-keygen -R 192.168.15.6
|
||||
|
||||
The devstack vm vagrant up process generates a private key which can be used for
|
||||
passwordless ssh to the host as follows:
|
||||
|
||||
cd tools/vagrant
|
||||
ssh -i .vagrant/machines/default/virtualbox/private_key ubuntu@192.168.15.6
|
||||
|
||||
### Running tox on devstack
|
||||
|
||||
Once the deploy is up use the following commands to set up tox.
|
||||
|
||||
sudo su monasca-transform
|
||||
cd /opt/stack/monasca-transform
|
||||
virtualenv .venv
|
||||
. .venv/bin/activate
|
||||
pip install tox
|
||||
tox
|
||||
|
||||
### Updating the code for dev
|
||||
|
||||
To regenerate the environment for development purposes a script is provided
|
||||
on the devstack instance at
|
||||
/opt/stack/monasca-transform/tools/vagrant/refresh_monasca_transform.sh
|
||||
To run the refresh_monasca_transform.sh script on devstack instance
|
||||
|
||||
cd /opt/stack/monasca-transform
|
||||
tools/vagrant/refresh_monasca_transform.sh
|
||||
|
||||
(note: to use/run tox after running this script, the
|
||||
"Running tox on devstack" steps above have to be re-executed)
|
||||
|
||||
This mostly re-does the work of the devstack plugin, updating the code from the
|
||||
shared directory, regenerating the venv and the zip that is passed to spark
|
||||
during the spark-submit call. The configuration and the transform and
|
||||
pre transform specs in the database are updated with fresh copies, along
|
||||
with driver and service python code.
|
||||
|
||||
If refresh_monasca_transform.sh script completes successfully you should see
|
||||
a message like the following in the console.
|
||||
|
||||
***********************************************
|
||||
* *
|
||||
* SUCCESS!! refresh monasca transform done. *
|
||||
* *
|
||||
***********************************************
|
||||
|
||||
### Development workflow
|
||||
|
||||
Here are the normal steps a developer can take to make any code changes. It is
|
||||
essential that the developer runs all tests in functional tests in a devstack
|
||||
environment before submitting any changes for review/merge.
|
||||
|
||||
Please follow steps mentioned in
|
||||
"To run monasca-transform using the provided vagrant environment" section above
|
||||
to create a devstack VM environment before following steps below:
|
||||
|
||||
1. Make code changes on the host machine (e.g. ~/monasca-transform)
|
||||
2. vagrant ssh (to connect to the devstack VM)
|
||||
3. cd /opt/stack/monasca-transform
|
||||
4. tools/vagrant/refresh_monasca_transform.sh (See "Updating the code for dev"
|
||||
section above)
|
||||
5. cd /opt/stack/monasca-transform (since monasca-transform folder
|
||||
gets recreated in Step 4. above)
|
||||
6. tox -e pep8
|
||||
7. tox -e py27
|
||||
8. tox -e functional
|
||||
|
||||
Note: It is mandatory to run functional unit tests before submitting any changes
|
||||
for review/merge. These can be currently be run only in a devstack VM since tests
|
||||
need access to Apache Spark libraries. This is accomplished by setting
|
||||
SPARK_HOME environment variable which is being done in tox.ini.
|
||||
|
||||
export SPARK_HOME=/opt/spark/current
|
||||
|
||||
#### How to find and fix test failures ?
|
||||
|
||||
To find which tests failed after running functional tests (After you have run
|
||||
functional tests as per steps in Development workflow)
|
||||
|
||||
export OS_TEST_PATH=tests/functional
|
||||
export SPARK_HOME=/opt/spark/current
|
||||
source .tox/functional/bin/activate
|
||||
testr run
|
||||
testr failing (to get list of tests that failed)
|
||||
|
||||
You can add
|
||||
|
||||
import pdb
|
||||
pdb.set_trace()
|
||||
|
||||
in test or in code where you want to start python debugger.
|
||||
|
||||
Run test using
|
||||
|
||||
python -m testtools.run <test>
|
||||
|
||||
For example:
|
||||
|
||||
python -m testtools.run \
|
||||
tests.functional.usage.test_host_cpu_usage_component_second_agg.SparkTest
|
||||
|
||||
Reference: https://wiki.openstack.org/wiki/Testr
|
||||
|
||||
## Access Spark Streaming and Spark Master/Worker User Interface
|
||||
|
||||
In a devstack environment ports on which Spark Streaming UI (4040), Spark Master(18080)
|
||||
and Spark Worker (18081) UI are available are forwarded to the host and are
|
||||
accessible from the host machine.
|
||||
|
||||
http://<host_machine_ip>:4040/ (Note: Spark Streaming UI,
|
||||
is available only when
|
||||
monasca-transform application
|
||||
is running)
|
||||
http://<host_machine_ip>:18080/ (Spark Master UI)
|
||||
http://<host_machine_ip>:18081/ (Spark Worker UI)
|
||||
|
||||
## To run monasca-transform using a different deployment technology
|
||||
|
||||
Monasca-transform requires supporting services, such as Kafka and
|
||||
Zookeeper, also are set up. So just adding "enable_plugin monasca-transform"
|
||||
to a default DevStack local.conf is not sufficient to configure a working
|
||||
DevStack deployment unless these services are also added.
|
||||
|
||||
Please reference the devstack/settings file for an example of a working list of
|
||||
plugins and services as used by the Vagrant deployment.
|
||||
|
||||
## WIP
|
||||
|
||||
This is a work in progress. There are a number of improvements necessary to
|
||||
improve value as a development tool.
|
||||
|
||||
|
||||
###TODO
|
||||
|
||||
1. Shorten initial deploy
|
||||
Currently the services deployed are the default set plus all of monasca. It's
|
||||
quite possible that not all of this is necessary to develop monasca-transform.
|
||||
So some services may be dropped in order to shorten the deploy.
|
||||
|
@ -1,20 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
activate_this_file = "/opt/monasca/transform/venv/bin/activate_this.py"
|
||||
exec(open(activate_this_file).read(), dict(__file__=activate_this_file))
|
||||
|
||||
from monasca_transform.driver.mon_metrics_kafka import invoke
|
||||
|
||||
invoke()
|
@ -1,94 +0,0 @@
|
||||
#!/bin/bash
|
||||
### BEGIN INIT INFO
|
||||
# Provides: {{ service_name }}
|
||||
# Required-Start:
|
||||
# Required-Stop:
|
||||
# Default-Start: {{ service_start_levels }}
|
||||
# Default-Stop:
|
||||
# Short-Description: {{ service_name }}
|
||||
# Description:
|
||||
### END INIT INFO
|
||||
|
||||
|
||||
service_is_running()
|
||||
{
|
||||
if [ -e {{ service_pid_file }} ]; then
|
||||
PID=$(cat {{ service_pid_file }})
|
||||
if $(ps $PID > /dev/null 2>&1); then
|
||||
return 0
|
||||
else
|
||||
echo "Found obsolete PID file for {{ service_name }}...deleting it"
|
||||
rm {{ service_pid_file }}
|
||||
return 1
|
||||
fi
|
||||
else
|
||||
return 1
|
||||
fi
|
||||
}
|
||||
|
||||
|
||||
case $1 in
|
||||
start)
|
||||
echo "Starting {{ service_name }}..."
|
||||
if service_is_running; then
|
||||
echo "{{ service_name }} is already running"
|
||||
exit 0
|
||||
fi
|
||||
|
||||
echo "
|
||||
|
||||
_/_/ _/_/ _/_/_/_/ _/_/ _/ _/_/_/_/ _/_/_/_/ _/_/_/_/ _/_/_/_/
|
||||
_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/
|
||||
_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/
|
||||
_/ _/ _/ _/ _/ _/ _/ _/ _/ _/_/_/_/ _/_/_/_/ _/ _/_/_/_/ _/_/_/_/
|
||||
_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/
|
||||
_/ _/_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/
|
||||
_/ _/ _/ _/_/_/_/ _/ _/_/ _/ _/ _/_/_/_/ _/_/_/_/ _/ _/
|
||||
|
||||
|
||||
_/_/_/_/ _/_/_/ _/_/_/_/ _/_/ _/ _/_/_/_/ _/_/_/_/ _/_/_/_/ _/_/_/ _/_/ _/_/
|
||||
_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/
|
||||
_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/
|
||||
_/ _/ _/ _/_/_/_/ _/ _/ _/ _/_/_/_/ _/_/_/ _/ _/ _/ _/ _/ _/ _/ _/
|
||||
_/ _/_/_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/_/_/ _/ _/ _/ _/
|
||||
_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/_/ _/
|
||||
_/ _/ _/ _/ _/ _/ _/_/ _/_/_/_/ _/ _/_/_/_/ _/ _/ _/ _/ _/
|
||||
|
||||
" >> {{ service_log_dir }}/{{ service_name }}.log
|
||||
|
||||
nohup sudo -u {{ service_user }} {{ virtualenv_location }}/bin/python \
|
||||
{{ service_dir }}/{{ service_file }} \
|
||||
>> {{ service_log_dir }}/{{ service_name }}.log \
|
||||
2>> {{ service_log_dir }}/{{ service_name }}.log &
|
||||
PID=$(echo $!)
|
||||
if [ -z $PID ]; then
|
||||
echo "{{ service_name }} failed to start"
|
||||
else
|
||||
echo $PID > {{ service_pid_file }}
|
||||
echo "{{ service_name }} is running"
|
||||
fi
|
||||
;;
|
||||
stop)
|
||||
echo "Stopping {{ service_name }}..."
|
||||
if service_is_running; then
|
||||
PID=$(cat {{ service_pid_file }})
|
||||
sudo kill -- -$(ps -o pgid= $PID | grep -o '[0-9]*')
|
||||
rm {{ service_pid_file }}
|
||||
echo "{{ service_name }} is stopped"
|
||||
else
|
||||
echo "{{ service_name }} is not running"
|
||||
exit 0
|
||||
fi
|
||||
;;
|
||||
status)
|
||||
if service_is_running; then
|
||||
echo "{{ service_name }} is running"
|
||||
else
|
||||
echo "{{ service_name }} is not running"
|
||||
fi
|
||||
;;
|
||||
restart)
|
||||
$0 stop
|
||||
$0 start
|
||||
;;
|
||||
esac
|
@ -1,88 +0,0 @@
|
||||
[DEFAULTS]
|
||||
|
||||
[repositories]
|
||||
offsets = monasca_transform.mysql_offset_specs:MySQLOffsetSpecs
|
||||
data_driven_specs = monasca_transform.data_driven_specs.mysql_data_driven_specs_repo:MySQLDataDrivenSpecsRepo
|
||||
offsets_max_revisions = 10
|
||||
|
||||
[database]
|
||||
server_type = mysql:thin
|
||||
host = localhost
|
||||
database_name = monasca_transform
|
||||
username = m-transform
|
||||
password = password
|
||||
|
||||
[messaging]
|
||||
adapter = monasca_transform.messaging.adapter:KafkaMessageAdapter
|
||||
topic = metrics
|
||||
brokers=192.168.15.6:9092
|
||||
publish_region = useast
|
||||
publish_kafka_project_id=d2cb21079930415a9f2a33588b9f2bb6
|
||||
adapter_pre_hourly = monasca_transform.messaging.adapter:KafkaMessageAdapterPreHourly
|
||||
topic_pre_hourly = metrics_pre_hourly
|
||||
|
||||
[stage_processors]
|
||||
pre_hourly_processor_enabled = True
|
||||
|
||||
[pre_hourly_processor]
|
||||
late_metric_slack_time = 600
|
||||
enable_instance_usage_df_cache = True
|
||||
instance_usage_df_cache_storage_level = MEMORY_ONLY_SER_2
|
||||
enable_batch_time_filtering = True
|
||||
data_provider=monasca_transform.processor.pre_hourly_processor:PreHourlyProcessorDataProvider
|
||||
effective_batch_revision=2
|
||||
|
||||
#
|
||||
# Configurable values for the monasca-transform service
|
||||
#
|
||||
[service]
|
||||
|
||||
# The address of the mechanism being used for election coordination
|
||||
coordinator_address = kazoo://localhost:2181
|
||||
|
||||
# The name of the coordination/election group
|
||||
coordinator_group = monasca-transform
|
||||
|
||||
# How long the candidate should sleep between election result
|
||||
# queries (in seconds)
|
||||
election_polling_frequency = 15
|
||||
|
||||
# Whether debug-level log entries should be included in the application
|
||||
# log. If this setting is false, info-level will be used for logging.
|
||||
enable_debug_log_entries = true
|
||||
|
||||
# The path for the monasca-transform Spark driver
|
||||
spark_driver = /opt/monasca/transform/lib/driver.py
|
||||
|
||||
# the location for the transform-service log
|
||||
service_log_path=/var/log/monasca/transform/
|
||||
|
||||
# the filename for the transform-service log
|
||||
service_log_filename=monasca-transform.log
|
||||
|
||||
# Whether Spark event logging should be enabled (true/false)
|
||||
spark_event_logging_enabled = true
|
||||
|
||||
# A list of jars which Spark should use
|
||||
spark_jars_list = /opt/spark/current/assembly/target/scala-2.10/jars/spark-streaming-kafka-0-8_2.10-2.2.0.jar,/opt/spark/current/assembly/target/scala-2.10/jars/scala-library-2.10.6.jar,/opt/spark/current/assembly/target/scala-2.10/jars/kafka_2.10-0.8.1.1.jar,/opt/spark/current/assembly/target/scala-2.10/jars/metrics-core-2.2.0.jar,/opt/spark/current/assembly/target/scala-2.10/jars/drizzle-jdbc-1.3.jar
|
||||
|
||||
# A list of where the Spark master(s) should run
|
||||
spark_master_list = spark://localhost:7077
|
||||
|
||||
# spark_home for the environment
|
||||
spark_home = /opt/spark/current
|
||||
|
||||
# Python files for Spark to use
|
||||
spark_python_files = /opt/monasca/transform/lib/monasca-transform.zip
|
||||
|
||||
# How often the stream should be read (in seconds)
|
||||
stream_interval = 600
|
||||
|
||||
# The working directory for monasca-transform
|
||||
work_dir = /var/run/monasca/transform
|
||||
|
||||
# enable caching of record store df
|
||||
enable_record_store_df_cache = True
|
||||
|
||||
# set spark storage level for record store df cache
|
||||
record_store_df_cache_storage_level = MEMORY_ONLY_SER_2
|
@ -1,30 +0,0 @@
CREATE DATABASE IF NOT EXISTS `monasca_transform` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
USE `monasca_transform`;

SET foreign_key_checks = 0;

CREATE TABLE IF NOT EXISTS `kafka_offsets` (
    `id` INTEGER AUTO_INCREMENT NOT NULL,
    `topic` varchar(128) NOT NULL,
    `until_offset` BIGINT NULL,
    `from_offset` BIGINT NULL,
    `app_name` varchar(128) NOT NULL,
    `partition` integer NOT NULL,
    `batch_time` varchar(20) NOT NULL,
    `last_updated` varchar(20) NOT NULL,
    `revision` integer NOT NULL,
    PRIMARY KEY (`id`, `app_name`, `topic`, `partition`, `revision`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;


CREATE TABLE IF NOT EXISTS `transform_specs` (
    `metric_id` varchar(128) NOT NULL,
    `transform_spec` varchar(2048) NOT NULL,
    PRIMARY KEY (`metric_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

CREATE TABLE IF NOT EXISTS `pre_transform_specs` (
    `event_type` varchar(128) NOT NULL,
    `pre_transform_spec` varchar(2048) NOT NULL,
    PRIMARY KEY (`event_type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
@ -1,12 +0,0 @@
description "Monasca Transform"

start on runlevel [2345]
stop on runlevel [!2345]

respawn

limit nofile 32768 32768

expect daemon

exec /etc/monasca/transform/init/start-monasca-transform.sh
@ -1,29 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
import sys
|
||||
|
||||
activate_this_file = "/opt/monasca/transform/venv/bin/activate_this.py"
|
||||
exec(open(activate_this_file).read(), dict(__file__=activate_this_file))
|
||||
|
||||
from monasca_transform.service.transform_service import main_service
|
||||
|
||||
|
||||
def main():
|
||||
main_service()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
sys.exit(0)
|
@ -1,3 +0,0 @@
#!/usr/bin/env bash
cd /
/opt/monasca/transform/venv/bin/python /etc/monasca/transform/init/service_runner.py
@ -1,30 +0,0 @@
|
||||
spark.driver.extraClassPath /opt/spark/current/assembly/target/scala-2.10/jars/drizzle-jdbc-1.3.jar
|
||||
spark.executor.extraClassPath /opt/spark/current/assembly/target/scala-2.10/jars/drizzle-jdbc-1.3.jar
|
||||
|
||||
spark.blockManager.port 7100
|
||||
spark.broadcast.port 7105
|
||||
spark.cores.max 1
|
||||
spark.driver.memory 512m
|
||||
spark.driver.port 7110
|
||||
spark.eventLog.dir /var/log/spark/events
|
||||
spark.executor.cores 1
|
||||
spark.executor.memory 512m
|
||||
spark.executor.port 7115
|
||||
spark.fileserver.port 7120
|
||||
spark.python.worker.memory 16m
|
||||
spark.speculation true
|
||||
spark.speculation.interval 200
|
||||
spark.sql.shuffle.partitions 32
|
||||
spark.worker.cleanup.enabled True
|
||||
spark.cleaner.ttl 900
|
||||
spark.sql.ui.retainedExecutions 10
|
||||
spark.streaming.ui.retainedBatches 10
|
||||
spark.worker.ui.retainedExecutors 10
|
||||
spark.worker.ui.retainedDrivers 10
|
||||
spark.ui.retainedJobs 10
|
||||
spark.ui.retainedStages 10
|
||||
spark.driver.extraJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/gc_driver.log
|
||||
spark.executor.extraJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/gc_executor.log
|
||||
spark.executor.logs.rolling.maxRetainedFiles 6
|
||||
spark.executor.logs.rolling.strategy time
|
||||
spark.executor.logs.rolling.time.interval hourly
|
@ -1,18 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
|
||||
export SPARK_LOCAL_IP=127.0.0.1
|
||||
export SPARK_MASTER_IP=127.0.0.1
|
||||
export SPARK_MASTER_PORT=7077
|
||||
export SPARK_MASTERS=127.0.0.1:7077
|
||||
export SPARK_MASTER_WEBUI_PORT=18080
|
||||
|
||||
export SPARK_WORKER_PORT=7078
|
||||
export SPARK_WORKER_WEBUI_PORT=18081
|
||||
export SPARK_WORKER_DIR=/var/run/spark/work
|
||||
|
||||
export SPARK_WORKER_MEMORY=2g
|
||||
export SPARK_WORKER_CORES=2
|
||||
|
||||
export SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.logDirectory=file://var/log/spark/events -Dspark.history.ui.port=18082"
|
||||
export SPARK_LOG_DIR=/var/log/spark
|
||||
export SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=127.0.0.1:2181 -Dspark.deploy.zookeeper.dir=/var/run/spark"
|
@ -1,12 +0,0 @@
[Unit]
Description=Spark Master
After=zookeeper.service

[Service]
User=spark
Group=spark
ExecStart=/etc/spark/init/start-spark-master.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
@ -1,18 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
|
||||
export SPARK_LOCAL_IP=127.0.0.1
|
||||
export SPARK_MASTER_IP=127.0.0.1
|
||||
export SPARK_MASTER_PORT=7077
|
||||
export SPARK_MASTERS=127.0.0.1:7077
|
||||
export SPARK_MASTER_WEBUI_PORT=18080
|
||||
|
||||
export SPARK_WORKER_PORT=7078
|
||||
export SPARK_WORKER_WEBUI_PORT=18081
|
||||
export SPARK_WORKER_DIR=/var/run/spark/work
|
||||
|
||||
export SPARK_WORKER_MEMORY=2g
|
||||
export SPARK_WORKER_CORES=1
|
||||
export SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS -Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=900 -Dspark.worker.cleanup.appDataTtl=1*24*3600"
|
||||
export SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.logDirectory=file://var/log/spark/events -Dspark.history.ui.port=18082"
|
||||
export SPARK_LOG_DIR=/var/log/spark
|
||||
export SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=127.0.0.1:2181 -Dspark.deploy.zookeeper.dir=/var/run/spark"
|
@ -1,9 +0,0 @@
[Unit]
Description=Spark Worker
After=zookeeper.service

[Service]
User=spark
Group=spark
ExecStart=/etc/spark/init/start-spark-worker.sh
Restart=on-failure
@ -1,14 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
. /opt/spark/current/conf/spark-env.sh
|
||||
export EXEC_CLASS=org.apache.spark.deploy.master.Master
|
||||
export INSTANCE_ID=1
|
||||
export SPARK_CLASSPATH=/etc/spark/conf/:/opt/spark/current/assembly/target/scala-2.10/jars/*
|
||||
export log="$SPARK_LOG_DIR/spark-spark-"$EXEC_CLASS"-"$INSTANCE_ID"-127.0.0.1.out"
|
||||
export SPARK_HOME=/opt/spark/current
|
||||
|
||||
# added for spark 2
|
||||
export PYTHONPATH="${SPARK_HOME}/python:${PYTHONPATH}"
|
||||
export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.4-src.zip:${PYTHONPATH}"
|
||||
export SPARK_SCALA_VERSION="2.10"
|
||||
|
||||
/usr/bin/java -cp "$SPARK_CLASSPATH" $SPARK_DAEMON_JAVA_OPTS -Xms1g -Xmx1g "$EXEC_CLASS" --ip "$SPARK_MASTER_IP" --port "$SPARK_MASTER_PORT" --webui-port "$SPARK_MASTER_WEBUI_PORT" --properties-file "/etc/spark/conf/spark-defaults.conf"
|
@ -1,17 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
. /opt/spark/current/conf/spark-worker-env.sh
|
||||
export EXEC_CLASS=org.apache.spark.deploy.worker.Worker
|
||||
export INSTANCE_ID=1
|
||||
export SPARK_CLASSPATH=/etc/spark/conf/:/opt/spark/current/assembly/target/scala-2.10/jars/*
|
||||
export log="$SPARK_LOG_DIR/spark-spark-"$EXEC_CLASS"-"$INSTANCE_ID"-127.0.0.1.out"
|
||||
export SPARK_HOME=/opt/spark/current
|
||||
|
||||
# added for spark 2.1.1
|
||||
export PYTHONPATH="${SPARK_HOME}/python:${PYTHONPATH}"
|
||||
export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.4-src.zip:${PYTHONPATH}"
|
||||
export SPARK_SCALA_VERSION="2.10"
|
||||
|
||||
/usr/bin/java -cp "$SPARK_CLASSPATH" $SPARK_DAEMON_JAVA_OPTS -Xms1g -Xmx1g "$EXEC_CLASS" --host $SPARK_LOCAL_IP --cores $SPARK_WORKER_CORES --memory $SPARK_WORKER_MEMORY --port "$SPARK_WORKER_PORT" -d "$SPARK_WORKER_DIR" --webui-port "$SPARK_WORKER_WEBUI_PORT" --properties-file "/etc/spark/conf/spark-defaults.conf" spark://$SPARK_MASTERS
|
||||
|
||||
|
||||
|
@ -1,485 +0,0 @@
|
||||
# (C) Copyright 2015 Hewlett Packard Enterprise Development Company LP
|
||||
# Copyright 2016 FUJITSU LIMITED
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
||||
# implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
#
|
||||
|
||||
# Monasca-transform DevStack plugin
|
||||
#
|
||||
# Install and start Monasca-transform service in devstack
|
||||
#
|
||||
# To enable Monasca-transform in devstack add an entry to local.conf that
|
||||
# looks like
|
||||
#
|
||||
# [[local|localrc]]
|
||||
# enable_plugin monasca-transform https://opendev.org/openstack/monasca-transform
|
||||
#
|
||||
# By default all Monasca services are started (see
|
||||
# devstack/settings). To disable a specific service use the
|
||||
# disable_service function. For example to turn off notification:
|
||||
#
|
||||
# disable_service monasca-notification
|
||||
#
|
||||
# Several variables set in the localrc section adjust common behaviors
|
||||
# of Monasca (see within for additional settings):
|
||||
#
|
||||
# EXAMPLE VARS HERE
|
||||
|
||||
# Save trace setting
|
||||
XTRACE=$(set +o | grep xtrace)
|
||||
set -o xtrace
|
||||
|
||||
ERREXIT=$(set +o | grep errexit)
|
||||
set -o errexit
|
||||
|
||||
# monasca-transform database password
|
||||
export MONASCA_TRANSFORM_DB_PASSWORD=${MONASCA_TRANSFORM_DB_PASSWORD:-"password"}
|
||||
|
||||
export MONASCA_TRANSFORM_FILES="${DEST}"/monasca-transform/devstack/files
|
||||
export DOWNLOADS_DIRECTORY=${DOWNLOADS_DIRECTORY:-"/home/${USER}/downloads"}
|
||||
|
||||
function pre_install_monasca_transform {
|
||||
:
|
||||
}
|
||||
|
||||
function pre_install_spark {
|
||||
for SPARK_JAVA_LIB in "${SPARK_JAVA_LIBS[@]}"
|
||||
do
|
||||
SPARK_LIB_NAME=`echo ${SPARK_JAVA_LIB} | sed 's/.*\///'`
|
||||
download_through_cache ${MAVEN_REPO}/${SPARK_JAVA_LIB} ${SPARK_LIB_NAME}
|
||||
done
|
||||
|
||||
for SPARK_JAR in "${SPARK_JARS[@]}"
|
||||
do
|
||||
SPARK_JAR_NAME=`echo ${SPARK_JAR} | sed 's/.*\///'`
|
||||
download_through_cache ${MAVEN_REPO}/${SPARK_JAR} ${SPARK_JAR_NAME}
|
||||
done
|
||||
|
||||
download_through_cache ${APACHE_MIRROR}/spark/spark-${SPARK_VERSION}/${SPARK_TARBALL_NAME} ${SPARK_TARBALL_NAME} 1000
|
||||
|
||||
|
||||
}
|
||||
|
||||
function install_java_libs {
|
||||
|
||||
pushd /opt/spark/current/assembly/target/scala-2.10/jars/
|
||||
for SPARK_JAVA_LIB in "${SPARK_JAVA_LIBS[@]}"
|
||||
do
|
||||
SPARK_LIB_NAME=`echo ${SPARK_JAVA_LIB} | sed 's/.*\///'`
|
||||
copy_from_cache ${SPARK_LIB_NAME}
|
||||
done
|
||||
popd
|
||||
}
|
||||
|
||||
function install_spark_jars {
|
||||
|
||||
# create a directory for jars
|
||||
mkdir -p /opt/spark/current/assembly/target/scala-2.10/jars
|
||||
|
||||
# copy jars to new location
|
||||
pushd /opt/spark/current/assembly/target/scala-2.10/jars
|
||||
for SPARK_JAR in "${SPARK_JARS[@]}"
|
||||
do
|
||||
SPARK_JAR_NAME=`echo ${SPARK_JAR} | sed 's/.*\///'`
|
||||
copy_from_cache ${SPARK_JAR_NAME}
|
||||
done
|
||||
|
||||
# copy all jars except spark and scala to assembly/target/scala_2.10/jars
|
||||
find /opt/spark/current/jars/ -type f ! \( -iname 'spark*' -o -iname 'scala*' -o -iname 'jackson-module-scala*' -o -iname 'json4s-*' -o -iname 'breeze*' -o -iname 'spire*' -o -iname 'macro-compat*' -o -iname 'shapeless*' -o -iname 'machinist*' -o -iname 'chill*' \) -exec cp {} . \;
|
||||
|
||||
# rename jars directory
|
||||
mv /opt/spark/current/jars/ /opt/spark/current/jars_original
|
||||
popd
|
||||
}
|
||||
|
||||
function copy_from_cache {
|
||||
resource_name=$1
|
||||
target_directory=${2:-"./."}
|
||||
cp ${DOWNLOADS_DIRECTORY}/${resource_name} ${target_directory}/.
|
||||
}
|
||||
|
||||
function download_through_cache {
|
||||
resource_location=$1
|
||||
resource_name=$2
|
||||
resource_timeout=${3:-"300"}
|
||||
if [[ ! -d ${DOWNLOADS_DIRECTORY} ]]; then
|
||||
_safe_permission_operation mkdir -p ${DOWNLOADS_DIRECTORY}
|
||||
_safe_permission_operation chown ${USER} ${DOWNLOADS_DIRECTORY}
|
||||
fi
|
||||
pushd ${DOWNLOADS_DIRECTORY}
|
||||
if [[ ! -f ${resource_name} ]]; then
|
||||
curl -m ${resource_timeout} --retry 3 --retry-delay 5 ${resource_location} -o ${resource_name}
|
||||
fi
|
||||
popd
|
||||
}
|
||||
|
||||
function unstack_monasca_transform {
|
||||
|
||||
echo_summary "Unstack Monasca-transform"
|
||||
stop_process "monasca-transform" || true
|
||||
|
||||
}
|
||||
|
||||
function delete_monasca_transform_files {
|
||||
|
||||
sudo rm -rf /opt/monasca/transform || true
|
||||
sudo rm /etc/monasca-transform.conf || true
|
||||
|
||||
MONASCA_TRANSFORM_DIRECTORIES=("/var/log/monasca/transform" "/var/run/monasca/transform" "/etc/monasca/transform/init")
|
||||
|
||||
for MONASCA_TRANSFORM_DIRECTORY in "${MONASCA_TRANSFORM_DIRECTORIES[@]}"
|
||||
do
|
||||
sudo rm -rf ${MONASCA_TRANSFORM_DIRECTORY} || true
|
||||
done
|
||||
|
||||
}
|
||||
|
||||
function drop_monasca_transform_database {
|
||||
sudo mysql -u$DATABASE_USER -p$DATABASE_PASSWORD -h$MYSQL_HOST -e "drop database monasca_transform; drop user 'm-transform'@'%' from mysql.user; drop user 'm-transform'@'localhost' from mysql.user;" || echo "Failed to drop database 'monasca_transform' and/or user 'm-transform' from mysql database, you may wish to do this manually."
|
||||
}
|
||||
|
||||
function unstack_spark {
|
||||
|
||||
echo_summary "Unstack Spark"
|
||||
|
||||
stop_spark_worker
|
||||
|
||||
stop_spark_master
|
||||
|
||||
}
|
||||
|
||||
function stop_spark_worker {
|
||||
|
||||
stop_process "spark-worker"
|
||||
|
||||
}
|
||||
|
||||
function stop_spark_master {
|
||||
|
||||
stop_process "spark-master"
|
||||
|
||||
}
|
||||
|
||||
function clean_spark {
|
||||
echo_summary "Clean spark"
|
||||
set +o errexit
|
||||
delete_spark_start_scripts
|
||||
delete_spark_upstart_definitions
|
||||
unlink_spark_commands
|
||||
delete_spark_directories
|
||||
sudo rm -rf `readlink /opt/spark/current` || true
|
||||
sudo rm -rf /opt/spark || true
|
||||
sudo userdel spark || true
|
||||
sudo groupdel spark || true
|
||||
set -o errexit
|
||||
}
|
||||
|
||||
function clean_monasca_transform {
|
||||
set +o errexit
|
||||
delete_monasca_transform_files
|
||||
sudo rm /etc/init/monasca-transform.conf || true
|
||||
sudo rm -rf /etc/monasca/transform || true
|
||||
drop_monasca_transform_database
|
||||
set -o errexit
|
||||
}
|
||||
|
||||
function create_spark_directories {
|
||||
|
||||
for SPARK_DIRECTORY in "${SPARK_DIRECTORIES[@]}"
|
||||
do
|
||||
sudo mkdir -p ${SPARK_DIRECTORY}
|
||||
sudo chown ${USER} ${SPARK_DIRECTORY}
|
||||
sudo chmod 755 ${SPARK_DIRECTORY}
|
||||
done
|
||||
|
||||
|
||||
}
|
||||
|
||||
function delete_spark_directories {
|
||||
|
||||
for SPARK_DIRECTORY in "${SPARK_DIRECTORIES[@]}"
|
||||
do
|
||||
sudo rm -rf ${SPARK_DIRECTORY} || true
|
||||
done
|
||||
|
||||
}
|
||||
|
||||
|
||||
function link_spark_commands_to_usr_bin {
|
||||
|
||||
SPARK_COMMANDS=("spark-submit" "spark-class" "spark-shell" "spark-sql")
|
||||
for SPARK_COMMAND in "${SPARK_COMMANDS[@]}"
|
||||
do
|
||||
sudo ln -sf /opt/spark/current/bin/${SPARK_COMMAND} /usr/bin/${SPARK_COMMAND}
|
||||
done
|
||||
|
||||
}
|
||||
|
||||
function unlink_spark_commands {
|
||||
|
||||
SPARK_COMMANDS=("spark-submit" "spark-class" "spark-shell" "spark-sql")
|
||||
for SPARK_COMMAND in "${SPARK_COMMANDS[@]}"
|
||||
do
|
||||
sudo unlink /usr/bin/${SPARK_COMMAND} || true
|
||||
done
|
||||
|
||||
}
|
||||
|
||||
function copy_and_link_config {
|
||||
|
||||
SPARK_ENV_FILES=("spark-env.sh" "spark-worker-env.sh" "spark-defaults.conf")
|
||||
for SPARK_ENV_FILE in "${SPARK_ENV_FILES[@]}"
|
||||
do
|
||||
cp -f "${MONASCA_TRANSFORM_FILES}"/spark/"${SPARK_ENV_FILE}" /etc/spark/conf/.
|
||||
ln -sf /etc/spark/conf/"${SPARK_ENV_FILE}" /opt/spark/current/conf/"${SPARK_ENV_FILE}"
|
||||
done
|
||||
|
||||
}
|
||||
|
||||
function copy_spark_start_scripts {
|
||||
|
||||
SPARK_START_SCRIPTS=("start-spark-master.sh" "start-spark-worker.sh")
|
||||
for SPARK_START_SCRIPT in "${SPARK_START_SCRIPTS[@]}"
|
||||
do
|
||||
cp -f "${MONASCA_TRANSFORM_FILES}"/spark/"${SPARK_START_SCRIPT}" /etc/spark/init/.
|
||||
chmod 755 /etc/spark/init/"${SPARK_START_SCRIPT}"
|
||||
done
|
||||
}
|
||||
|
||||
function delete_spark_start_scripts {
|
||||
|
||||
SPARK_START_SCRIPTS=("start-spark-master.sh" "start-spark-worker.sh")
|
||||
for SPARK_START_SCRIPT in "${SPARK_START_SCRIPTS[@]}"
|
||||
do
|
||||
rm /etc/spark/init/"${SPARK_START_SCRIPT}" || true
|
||||
done
|
||||
}
|
||||
|
||||
|
||||
function install_monasca_transform {
|
||||
|
||||
echo_summary "Install Monasca-Transform"
|
||||
|
||||
create_monasca_transform_directories
|
||||
copy_monasca_transform_files
|
||||
create_monasca_transform_venv
|
||||
|
||||
sudo cp -f "${MONASCA_TRANSFORM_FILES}"/monasca-transform/start-monasca-transform.sh /etc/monasca/transform/init/.
|
||||
sudo chmod +x /etc/monasca/transform/init/start-monasca-transform.sh
|
||||
sudo cp -f "${MONASCA_TRANSFORM_FILES}"/monasca-transform/service_runner.py /etc/monasca/transform/init/.
|
||||
|
||||
}
|
||||
|
||||
|
||||
function create_monasca_transform_directories {
|
||||
|
||||
MONASCA_TRANSFORM_DIRECTORIES=("/var/log/monasca/transform" "/opt/monasca/transform" "/opt/monasca/transform/lib" "/var/run/monasca/transform" "/etc/monasca/transform/init")
|
||||
|
||||
for MONASCA_TRANSFORM_DIRECTORY in "${MONASCA_TRANSFORM_DIRECTORIES[@]}"
|
||||
do
|
||||
sudo mkdir -p ${MONASCA_TRANSFORM_DIRECTORY}
|
||||
sudo chown ${USER} ${MONASCA_TRANSFORM_DIRECTORY}
|
||||
chmod 755 ${MONASCA_TRANSFORM_DIRECTORY}
|
||||
done
|
||||
|
||||
}
|
||||
|
||||
function get_id () {
|
||||
echo `"$@" | grep ' id ' | awk '{print $4}'`
|
||||
}
|
||||
|
||||
function ascertain_admin_project_id {
|
||||
|
||||
source ~/devstack/openrc admin admin
|
||||
export ADMIN_PROJECT_ID=$(get_id openstack project show mini-mon)
|
||||
}
|
||||
|
||||
function copy_monasca_transform_files {
|
||||
|
||||
cp -f "${MONASCA_TRANSFORM_FILES}"/monasca-transform/service_runner.py /opt/monasca/transform/lib/.
|
||||
sudo cp -f "${MONASCA_TRANSFORM_FILES}"/monasca-transform/monasca-transform.conf /etc/.
|
||||
cp -f "${MONASCA_TRANSFORM_FILES}"/monasca-transform/driver.py /opt/monasca/transform/lib/.
|
||||
${DEST}/monasca-transform/scripts/create_zip.sh
|
||||
cp -f "${DEST}"/monasca-transform/scripts/monasca-transform.zip /opt/monasca/transform/lib/.
|
||||
${DEST}/monasca-transform/scripts/generate_ddl_for_devstack.sh
|
||||
cp -f "${MONASCA_TRANSFORM_FILES}"/monasca-transform/monasca-transform_mysql.sql /opt/monasca/transform/lib/.
|
||||
cp -f "${MONASCA_TRANSFORM_FILES}"/monasca-transform/transform_specs.sql /opt/monasca/transform/lib/.
|
||||
cp -f "${MONASCA_TRANSFORM_FILES}"/monasca-transform/pre_transform_specs.sql /opt/monasca/transform/lib/.
|
||||
touch /var/log/monasca/transform/monasca-transform.log
|
||||
|
||||
# set variables in configuration files
|
||||
iniset -sudo /etc/monasca-transform.conf database password "$MONASCA_TRANSFORM_DB_PASSWORD"
|
||||
|
||||
iniset -sudo /etc/monasca-transform.conf messaging brokers "$SERVICE_HOST:9092"
|
||||
iniset -sudo /etc/monasca-transform.conf messaging publish_region "$REGION_NAME"
|
||||
}
|
||||
|
||||
function create_monasca_transform_venv {
|
||||
|
||||
sudo chown -R ${USER} ${DEST}/monasca-transform
|
||||
virtualenv /opt/monasca/transform/venv ;
|
||||
. /opt/monasca/transform/venv/bin/activate ;
|
||||
pip install -e "${DEST}"/monasca-transform/ ;
|
||||
deactivate
|
||||
|
||||
}
|
||||
|
||||
function create_and_populate_monasca_transform_database {
|
||||
# must login as root@localhost
|
||||
mysql -u$DATABASE_USER -p$DATABASE_PASSWORD -h$MYSQL_HOST < /opt/monasca/transform/lib/monasca-transform_mysql.sql || echo "Did the schema change? This process will fail on schema changes."
|
||||
|
||||
# set grants for m-transform user (needs to be done from localhost)
|
||||
mysql -u$DATABASE_USER -p$DATABASE_PASSWORD -h$MYSQL_HOST -e "GRANT ALL ON monasca_transform.* TO 'm-transform'@'%' IDENTIFIED BY '${MONASCA_TRANSFORM_DB_PASSWORD}';"
|
||||
mysql -u$DATABASE_USER -p$DATABASE_PASSWORD -h$MYSQL_HOST -e "GRANT ALL ON monasca_transform.* TO 'm-transform'@'localhost' IDENTIFIED BY '${MONASCA_TRANSFORM_DB_PASSWORD}';"
|
||||
|
||||
# copy rest of files after grants are ready
|
||||
mysql -um-transform -p$MONASCA_TRANSFORM_DB_PASSWORD -h$MYSQL_HOST < /opt/monasca/transform/lib/pre_transform_specs.sql
|
||||
mysql -um-transform -p$MONASCA_TRANSFORM_DB_PASSWORD -h$MYSQL_HOST < /opt/monasca/transform/lib/transform_specs.sql
|
||||
}
|
||||
|
||||
function install_spark {
|
||||
|
||||
echo_summary "Install Spark"
|
||||
|
||||
sudo mkdir /opt/spark || true
|
||||
|
||||
sudo chown -R ${USER} /opt/spark
|
||||
|
||||
tar -xzf ${DOWNLOADS_DIRECTORY}/${SPARK_TARBALL_NAME} -C /opt/spark/
|
||||
|
||||
ln -sf /opt/spark/${SPARK_HADOOP_VERSION} /opt/spark/current
|
||||
|
||||
install_spark_jars
|
||||
|
||||
install_java_libs
|
||||
|
||||
create_spark_directories
|
||||
|
||||
link_spark_commands_to_usr_bin
|
||||
|
||||
copy_and_link_config
|
||||
|
||||
copy_spark_start_scripts
|
||||
|
||||
}
|
||||
|
||||
function extra_spark {
|
||||
|
||||
start_spark_master
|
||||
start_spark_worker
|
||||
|
||||
}
|
||||
|
||||
function start_spark_worker {
|
||||
|
||||
run_process "spark-worker" "/etc/spark/init/start-spark-worker.sh"
|
||||
|
||||
}
|
||||
|
||||
function start_spark_master {
|
||||
|
||||
run_process "spark-master" "/etc/spark/init/start-spark-master.sh"
|
||||
|
||||
}
|
||||
|
||||
function post_config_monasca_transform {
|
||||
|
||||
create_and_populate_monasca_transform_database
|
||||
|
||||
}
|
||||
|
||||
function post_config_spark {
|
||||
:
|
||||
}
|
||||
|
||||
function extra_monasca_transform {
|
||||
|
||||
/opt/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 64 --topic metrics_pre_hourly
|
||||
|
||||
ascertain_admin_project_id
|
||||
sudo sed -i "s/publish_kafka_project_id=d2cb21079930415a9f2a33588b9f2bb6/publish_kafka_project_id=${ADMIN_PROJECT_ID}/g" /etc/monasca-transform.conf
|
||||
start_monasca_transform
|
||||
|
||||
}
|
||||
|
||||
function start_monasca_transform {
|
||||
run_process "monasca-transform" "/etc/monasca/transform/init/start-monasca-transform.sh"
|
||||
# systemd unit file updates
|
||||
local unitfile="$SYSTEMD_DIR/devstack@monasca-transform.service"
|
||||
local after_service="devstack@zookeeper.service devstack@spark-master.service devstack@spark-worker.service"
|
||||
iniset -sudo "$unitfile" "Unit" "After" "$after_service"
|
||||
iniset -sudo "$unitfile" "Service" "Type" "simple"
|
||||
iniset -sudo "$unitfile" "Service" "LimitNOFILE" "32768"
|
||||
# reset KillMode for monasca-transform, as spawns several child processes
|
||||
iniset -sudo "$unitfile" "Service" "KillMode" "control-group"
|
||||
sudo systemctl daemon-reload
|
||||
}
|
||||
|
||||
# check for service enabled
|
||||
if is_service_enabled monasca-transform; then
|
||||
|
||||
if [[ "$1" == "stack" && "$2" == "pre-install" ]]; then
|
||||
# Set up system services
|
||||
echo_summary "Configuring Spark system services"
|
||||
pre_install_spark
|
||||
echo_summary "Configuring Monasca-transform system services"
|
||||
pre_install_monasca_transform
|
||||
|
||||
elif [[ "$1" == "stack" && "$2" == "install" ]]; then
|
||||
# Perform installation of service source
|
||||
echo_summary "Installing Spark"
|
||||
install_spark
|
||||
echo_summary "Installing Monasca-transform"
|
||||
install_monasca_transform
|
||||
|
||||
elif [[ "$1" == "stack" && "$2" == "post-config" ]]; then
|
||||
# Configure after the other layer 1 and 2 services have been configured
|
||||
echo_summary "Configuring Spark"
|
||||
post_config_spark
|
||||
echo_summary "Configuring Monasca-transform"
|
||||
post_config_monasca_transform
|
||||
|
||||
elif [[ "$1" == "stack" && "$2" == "extra" ]]; then
|
||||
# Initialize and start the Monasca service
|
||||
echo_summary "Initializing Spark"
|
||||
extra_spark
|
||||
echo_summary "Initializing Monasca-transform"
|
||||
extra_monasca_transform
|
||||
fi
|
||||
|
||||
if [[ "$1" == "unstack" ]]; then
|
||||
echo_summary "Unstacking Monasca-transform"
|
||||
unstack_monasca_transform
|
||||
echo_summary "Unstacking Spark"
|
||||
unstack_spark
|
||||
fi
|
||||
|
||||
if [[ "$1" == "clean" ]]; then
|
||||
# Remove state and transient data
|
||||
# Remember clean.sh first calls unstack.sh
|
||||
echo_summary "Cleaning Monasca-transform"
|
||||
clean_monasca_transform
|
||||
echo_summary "Cleaning Spark"
|
||||
clean_spark
|
||||
fi
|
||||
|
||||
else
|
||||
echo_summary "Monasca-transform not enabled"
|
||||
fi
|
||||
|
||||
# Restore errexit
|
||||
$ERREXIT
|
||||
|
||||
# Restore xtrace
|
||||
$XTRACE
|
@ -1,57 +0,0 @@
|
||||
#!/bin/bash -xe
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
# This script is executed inside post_test_hook function in devstack gate
|
||||
|
||||
function generate_testr_results {
|
||||
if [ -f .testrepository/0 ]; then
|
||||
sudo .tox/functional/bin/testr last --subunit > $WORKSPACE/testrepository.subunit
|
||||
sudo mv $WORKSPACE/testrepository.subunit $BASE/logs/testrepository.subunit
|
||||
sudo /usr/os-testr-env/bin/subunit2html $BASE/logs/testrepository.subunit $BASE/logs/testr_results.html
|
||||
sudo gzip -9 $BASE/logs/testrepository.subunit
|
||||
sudo gzip -9 $BASE/logs/testr_results.html
|
||||
sudo chown $USER:$USER $BASE/logs/testrepository.subunit.gz $BASE/logs/testr_results.html.gz
|
||||
sudo chmod a+r $BASE/logs/testrepository.subunit.gz $BASE/logs/testr_results.html.gz
|
||||
fi
|
||||
}
|
||||
|
||||
export MONASCA_TRANSFORM_DIR="$BASE/new/monasca-transform"
|
||||
|
||||
export MONASCA_TRANSFORM_LOG_DIR="/var/log/monasca/transform/"
|
||||
|
||||
# Go to the monasca-transform dir
|
||||
cd $MONASCA_TRANSFORM_DIR
|
||||
|
||||
if [[ -z "$STACK_USER" ]]; then
|
||||
export STACK_USER=stack
|
||||
fi
|
||||
|
||||
sudo chown -R $STACK_USER:stack $MONASCA_TRANSFORM_DIR
|
||||
|
||||
# create a log dir
|
||||
sudo mkdir -p $MONASCA_TRANSFORM_LOG_DIR
|
||||
sudo chown -R $STACK_USER:stack $MONASCA_TRANSFORM_LOG_DIR
|
||||
|
||||
# Run tests
|
||||
echo "Running monasca-transform functional test suite"
|
||||
set +e
|
||||
|
||||
|
||||
sudo -E -H -u ${STACK_USER:-${USER}} tox -efunctional
|
||||
EXIT_CODE=$?
|
||||
set -e
|
||||
|
||||
# Collect and parse result
|
||||
generate_testr_results
|
||||
exit $EXIT_CODE
|
@ -1,84 +0,0 @@
|
||||
#
|
||||
# (C) Copyright 2015 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
||||
# implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
#
|
||||
disable_service horizon
|
||||
disable_service monasca-thresh
|
||||
|
||||
enable_service monasca
|
||||
enable_service monasca-influxdb
|
||||
enable_service monasca-storm
|
||||
enable_service zookeeper
|
||||
enable_service monasca-kafka
|
||||
enable_service monasca-api
|
||||
enable_service monasca-persister
|
||||
enable_service monasca-agent
|
||||
enable_service monasca-cli
|
||||
|
||||
enable_service monasca-transform
|
||||
enable_service spark-master
|
||||
enable_service spark-worker
|
||||
|
||||
#
|
||||
# Dependent Software Versions
|
||||
#
|
||||
|
||||
# spark vars
|
||||
SPARK_DIRECTORIES=("/var/spark" "/var/log/spark" "/var/log/spark/events" "/var/run/spark" "/var/run/spark/work" "/etc/spark/conf" "/etc/spark/init" )
|
||||
SPARK_VERSION=${SPARK_VERSION:-2.2.0}
|
||||
HADOOP_VERSION=${HADOOP_VERSION:-2.7}
|
||||
SPARK_HADOOP_VERSION=spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION
|
||||
SPARK_TARBALL_NAME=${SPARK_HADOOP_VERSION}.tgz
|
||||
MAVEN_REPO=${MAVEN_REPO:-https://repo1.maven.org/maven2}
|
||||
APACHE_MIRROR=${APACHE_MIRROR:-http://archive.apache.org/dist/}
|
||||
|
||||
# Kafka deb consists of the version of scala plus the version of kafka
|
||||
BASE_KAFKA_VERSION=${BASE_KAFKA_VERSION:-0.8.1.1}
|
||||
SCALA_VERSION=${SCALA_VERSION:-2.10}
|
||||
KAFKA_VERSION=${KAFKA_VERSION:-${SCALA_VERSION}-${BASE_KAFKA_VERSION}}
|
||||
SPARK_JAVA_LIBS=("org/apache/kafka/kafka_2.10/0.8.1.1/kafka_2.10-0.8.1.1.jar" "com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar" "org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar" "org/scala-lang/scala-compiler/2.10.6/scala-compiler-2.10.6.jar" "org/scala-lang/scala-reflect/2.10.6/scala-reflect-2.10.6.jar" "org/scala-lang/scalap/2.10.6/scalap-2.10.6.jar" "org/apache/spark/spark-streaming-kafka-0-8_2.10/${SPARK_VERSION}/spark-streaming-kafka-0-8_2.10-${SPARK_VERSION}.jar" "org/drizzle/jdbc/drizzle-jdbc/1.3/drizzle-jdbc-1.3.jar" "com/fasterxml/jackson/module/jackson-module-scala_2.10/2.6.5/jackson-module-scala_2.10-2.6.5.jar" "org/json4s/json4s-jackson_2.10/3.2.11/json4s-jackson_2.10-3.2.11.jar" "org/json4s/json4s-core_2.10/3.2.11/json4s-core_2.10-3.2.11.jar" "org/json4s/json4s-ast_2.10/3.2.11/json4s-ast_2.10-3.2.11.jar" "org/scalanlp/breeze-macros_2.10/0.13.1/breeze-macros_2.10-0.13.1.jar" "org/spire-math/spire_2.10/0.13.0/spire_2.10-0.13.0.jar" "org/typelevel/macro-compat_2.10/1.1.1/macro-compat_2.10-1.1.1.jar" "com/chuusai/shapeless_2.10/2.3.2/shapeless_2.10-2.3.2.jar" "org/spire-math/spire-macros_2.10/0.13.0/spire-macros_2.10-0.13.0.jar" "org/typelevel/machinist_2.10/0.6.1/machinist_2.10-0.6.1.jar" "org/scalanlp/breeze_2.10/0.13.1/breeze_2.10-0.13.1.jar" "com/twitter/chill_2.10/0.8.0/chill_2.10-0.8.0.jar" "com/twitter/chill-java/0.8.0/chill-java-0.8.0.jar")
|
||||
|
||||
# Get Spark 2.2 jars compiled with Scala 2.10 from mvn
|
||||
SPARK_JARS=("org/apache/spark/spark-catalyst_2.10/${SPARK_VERSION}/spark-catalyst_2.10-2.2.0.jar" "org/apache/spark/spark-core_2.10/${SPARK_VERSION}/spark-core_2.10-2.2.0.jar" "org/apache/spark/spark-graphx_2.10/${SPARK_VERSION}/spark-graphx_2.10-2.2.0.jar" "org/apache/spark/spark-launcher_2.10/${SPARK_VERSION}/spark-launcher_2.10-2.2.0.jar" "org/apache/spark/spark-mllib_2.10/${SPARK_VERSION}/spark-mllib_2.10-2.2.0.jar" "org/apache/spark/spark-mllib-local_2.10/${SPARK_VERSION}/spark-mllib-local_2.10-2.2.0.jar" "org/apache/spark/spark-network-common_2.10/${SPARK_VERSION}/spark-network-common_2.10-2.2.0.jar" "org/apache/spark/spark-network-shuffle_2.10/${SPARK_VERSION}/spark-network-shuffle_2.10-2.2.0.jar" "org/apache/spark/spark-repl_2.10/${SPARK_VERSION}/spark-repl_2.10-2.2.0.jar" "org/apache/spark/spark-sketch_2.10/${SPARK_VERSION}/spark-sketch_2.10-2.2.0.jar" "org/apache/spark/spark-sql_2.10/${SPARK_VERSION}/spark-sql_2.10-2.2.0.jar" "org/apache/spark/spark-streaming_2.10/${SPARK_VERSION}/spark-streaming_2.10-2.2.0.jar" "org/apache/spark/spark-tags_2.10/${SPARK_VERSION}/spark-tags_2.10-2.2.0.jar" "org/apache/spark/spark-unsafe_2.10/${SPARK_VERSION}/spark-unsafe_2.10-2.2.0.jar" "org/apache/spark/spark-yarn_2.10/${SPARK_VERSION}/spark-yarn_2.10-2.2.0.jar")
|
||||
|
||||
# monasca-api stuff
|
||||
|
||||
VERTICA_VERSION=${VERTICA_VERSION:-7.2.1-0}
|
||||
CASSANDRA_VERSION=${CASSANDRA_VERSION:-37x}
|
||||
STORM_VERSION=${STORM_VERSION:-1.0.2}
|
||||
GO_VERSION=${GO_VERSION:-"1.7.1"}
|
||||
NODE_JS_VERSION=${NODE_JS_VERSION:-"4.0.0"}
|
||||
NVM_VERSION=${NVM_VERSION:-"0.32.1"}
|
||||
|
||||
# Repository settings
|
||||
MONASCA_API_REPO=${MONASCA_API_REPO:-${GIT_BASE}/openstack/monasca-api.git}
|
||||
MONASCA_API_BRANCH=${MONASCA_API_BRANCH:-master}
|
||||
MONASCA_API_DIR=${DEST}/monasca-api
|
||||
|
||||
MONASCA_PERSISTER_REPO=${MONASCA_PERSISTER_REPO:-${GIT_BASE}/openstack/monasca-persister.git}
|
||||
MONASCA_PERSISTER_BRANCH=${MONASCA_PERSISTER_BRANCH:-master}
|
||||
MONASCA_PERSISTER_DIR=${DEST}/monasca-persister
|
||||
|
||||
MONASCA_CLIENT_REPO=${MONASCA_CLIENT_REPO:-${GIT_BASE}/openstack/python-monascaclient.git}
|
||||
MONASCA_CLIENT_BRANCH=${MONASCA_CLIENT_BRANCH:-master}
|
||||
MONASCA_CLIENT_DIR=${DEST}/python-monascaclient
|
||||
|
||||
MONASCA_AGENT_REPO=${MONASCA_AGENT_REPO:-${GIT_BASE}/openstack/monasca-agent.git}
|
||||
MONASCA_AGENT_BRANCH=${MONASCA_AGENT_BRANCH:-master}
|
||||
MONASCA_AGENT_DIR=${DEST}/monasca-agent
|
||||
|
||||
MONASCA_COMMON_REPO=${MONASCA_COMMON_REPO:-${GIT_BASE}/openstack/monasca-common.git}
|
||||
MONASCA_COMMON_BRANCH=${MONASCA_COMMON_BRANCH:-master}
|
||||
MONASCA_COMMON_DIR=${DEST}/monasca-common
|
@ -1,15 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
|
||||
MAVEN_STUB="https://repo1.maven.org/maven2"
|
||||
SPARK_JAVA_LIBS=("org/apache/kafka/kafka_2.10/0.8.1.1/kafka_2.10-0.8.1.1.jar" "org/scala-lang/scala-library/2.10.1/scala-library-2.10.1.jar" "com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar" "org/apache/spark/spark-streaming-kafka_2.10/1.6.0/spark-streaming-kafka_2.10-1.6.0.jar")
|
||||
for SPARK_JAVA_LIB in "${SPARK_JAVA_LIBS[@]}"
|
||||
do
|
||||
echo Would fetch ${MAVEN_STUB}/${SPARK_JAVA_LIB}
|
||||
done
|
||||
|
||||
for SPARK_JAVA_LIB in "${SPARK_JAVA_LIBS[@]}"
|
||||
do
|
||||
SPARK_LIB_NAME=$(echo ${SPARK_JAVA_LIB} | sed 's/.*\///')
|
||||
echo Got lib ${SPARK_LIB_NAME}
|
||||
|
||||
done
|
@ -1,258 +0,0 @@
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
||||
# implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
#
|
||||
# monasca-transform documentation build configuration file, created by
|
||||
# sphinx-quickstart on Mon Jan 9 12:02:59 2012.
|
||||
#
|
||||
# This file is execfile()d with the current directory set to its
|
||||
# containing dir.
|
||||
#
|
||||
# Note that not all possible configuration values are present in this
|
||||
# autogenerated file.
|
||||
#
|
||||
# All configuration values have a default; values that are commented out
|
||||
# serve to show the default.
|
||||
|
||||
import os
|
||||
import subprocess
|
||||
import sys
|
||||
import warnings
|
||||
|
||||
# If extensions (or modules to document with autodoc) are in another directory,
|
||||
# add these directories to sys.path here. If the directory is relative to the
|
||||
# documentation root, use os.path.abspath to make it absolute, like shown here.
|
||||
sys.path.insert(0, os.path.abspath('../../'))
|
||||
sys.path.insert(0, os.path.abspath('../'))
|
||||
sys.path.insert(0, os.path.abspath('./'))
|
||||
|
||||
# -- General configuration ----------------------------------------------------
|
||||
|
||||
# If your documentation needs a minimal Sphinx version, state it here.
|
||||
# needs_sphinx = '1.0'
|
||||
|
||||
# Add any Sphinx extension module names here, as strings. They can be
|
||||
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
|
||||
# ones.
|
||||
extensions = ['sphinx.ext.autodoc',
|
||||
'sphinx.ext.todo',
|
||||
'sphinx.ext.coverage',
|
||||
'sphinx.ext.viewcode',
|
||||
]
|
||||
|
||||
todo_include_todos = True
|
||||
|
||||
# The suffix of source filenames.
|
||||
source_suffix = '.rst'
|
||||
|
||||
# The encoding of source files.
|
||||
# source_encoding = 'utf-8-sig'
|
||||
|
||||
# The master toctree document.
|
||||
master_doc = 'index'
|
||||
|
||||
# General information about the project.
|
||||
project = u'monasca-transform'
|
||||
copyright = u'2016, OpenStack Foundation'
|
||||
|
||||
# The language for content autogenerated by Sphinx. Refer to documentation
|
||||
# for a list of supported languages.
|
||||
# language = None
|
||||
|
||||
# There are two options for replacing |today|: either, you set today to some
|
||||
# non-false value, then it is used:
|
||||
# today = ''
|
||||
# Else, today_fmt is used as the format for a strftime call.
|
||||
# today_fmt = '%B %d, %Y'
|
||||
|
||||
# List of patterns, relative to source directory, that match files and
|
||||
# directories to ignore when looking for source files.
|
||||
exclude_patterns = ['old']
|
||||
|
||||
# The reST default role (used for this markup: `text`) to use for all
|
||||
# documents.
|
||||
# default_role = None
|
||||
|
||||
# If true, '()' will be appended to :func: etc. cross-reference text.
|
||||
# add_function_parentheses = True
|
||||
|
||||
# If true, the current module name will be prepended to all description
|
||||
# unit titles (such as .. function::).
|
||||
# add_module_names = True
|
||||
|
||||
# If true, sectionauthor and moduleauthor directives will be shown in the
|
||||
# output. They are ignored by default.
|
||||
show_authors = True
|
||||
|
||||
# The name of the Pygments (syntax highlighting) style to use.
|
||||
pygments_style = 'sphinx'
|
||||
|
||||
# A list of ignored prefixes for module index sorting.
|
||||
modindex_common_prefix = ['monasca-transform.']
|
||||
|
||||
# -- Options for man page output --------------------------------------------
|
||||
|
||||
|
||||
# -- Options for HTML output --------------------------------------------------
|
||||
|
||||
# The theme to use for HTML and HTML Help pages. See the documentation for
|
||||
# a list of builtin themes.
|
||||
# html_theme_path = ["."]
|
||||
# html_theme = '_theme'
|
||||
|
||||
# Theme options are theme-specific and customize the look and feel of a theme
|
||||
# further. For a list of options available for each theme, see the
|
||||
# documentation.
|
||||
# html_theme_options = {}
|
||||
|
||||
# Add any paths that contain custom themes here, relative to this directory.
|
||||
# html_theme_path = []
|
||||
|
||||
# The name for this set of Sphinx documents. If None, it defaults to
|
||||
# "<project> v<release> documentation".
|
||||
# html_title = None
|
||||
|
||||
# A shorter title for the navigation bar. Default is the same as html_title.
|
||||
# html_short_title = None
|
||||
|
||||
# The name of an image file (relative to this directory) to place at the top
|
||||
# of the sidebar.
|
||||
# html_logo = None
|
||||
|
||||
# The name of an image file (within the static path) to use as favicon of the
|
||||
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
|
||||
# pixels large.
|
||||
# html_favicon = None
|
||||
|
||||
# Add any paths that contain custom static files (such as style sheets) here,
|
||||
# relative to this directory. They are copied after the builtin static files,
|
||||
# so a file named "default.css" will overwrite the builtin "default.css".
|
||||
# html_static_path = ['_static']
|
||||
|
||||
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
|
||||
# using the given strftime format.
|
||||
# html_last_updated_fmt = '%b %d, %Y'
|
||||
git_cmd = ["git", "log", "--pretty=format:'%ad, commit %h'", "--date=local",
|
||||
"-n1"]
|
||||
try:
|
||||
html_last_updated_fmt = subprocess.check_output(git_cmd).decode('utf-8')
|
||||
except Exception:
|
||||
warnings.warn('Cannot get last updated time from git repository. '
|
||||
'Not setting "html_last_updated_fmt".')
|
||||
|
||||
# If true, SmartyPants will be used to convert quotes and dashes to
|
||||
# typographically correct entities.
|
||||
# html_use_smartypants = True
|
||||
|
||||
# Custom sidebar templates, maps document names to template names.
|
||||
# html_sidebars = {}
|
||||
|
||||
# Additional templates that should be rendered to pages, maps page names to
|
||||
# template names.
|
||||
# html_additional_pages = {}
|
||||
|
||||
# If false, no module index is generated.
|
||||
# html_domain_indices = True
|
||||
|
||||
# If false, no index is generated.
|
||||
# html_use_index = True
|
||||
|
||||
# If true, the index is split into individual pages for each letter.
|
||||
# html_split_index = False
|
||||
|
||||
# If true, links to the reST sources are added to the pages.
|
||||
# html_show_sourcelink = True
|
||||
|
||||
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
|
||||
# html_show_sphinx = True
|
||||
|
||||
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
|
||||
# html_show_copyright = True
|
||||
|
||||
# If true, an OpenSearch description file will be output, and all pages will
|
||||
# contain a <link> tag referring to it. The value of this option must be the
|
||||
# base URL from which the finished HTML is served.
|
||||
# html_use_opensearch = ''
|
||||
|
||||
# This is the file name suffix for HTML files (e.g. ".xhtml").
|
||||
# html_file_suffix = None
|
||||
|
||||
# Output file base name for HTML help builder.
|
||||
htmlhelp_basename = 'monasca-transformdoc'
|
||||
|
||||
|
||||
# -- Options for LaTeX output -------------------------------------------------
|
||||
|
||||
latex_elements = {
|
||||
# The paper size ('letterpaper' or 'a4paper').
|
||||
# 'papersize': 'letterpaper',
|
||||
|
||||
# The font size ('10pt', '11pt' or '12pt').
|
||||
# 'pointsize': '10pt',
|
||||
|
||||
# Additional stuff for the LaTeX preamble.
|
||||
# 'preamble': '',
|
||||
}
|
||||
|
||||
# Grouping the document tree into LaTeX files. List of tuples (source
|
||||
# start file, target name, title, author, documentclass
|
||||
# [howto/manual]).
|
||||
latex_documents = [
|
||||
('index', 'monasca-transform.tex', u'Monasca-transform Documentation',
|
||||
u'OpenStack', 'manual'),
|
||||
]
|
||||
|
||||
# The name of an image file (relative to this directory) to place at the top of
|
||||
# the title page.
|
||||
# latex_logo = None
|
||||
|
||||
# For "manual" documents, if this is true, then toplevel headings are parts,
|
||||
# not chapters.
|
||||
# latex_use_parts = False
|
||||
|
||||
# If true, show page references after internal links.
|
||||
# latex_show_pagerefs = False
|
||||
|
||||
# If true, show URL addresses after external links.
|
||||
# latex_show_urls = False
|
||||
|
||||
# Documents to append as an appendix to all manuals.
|
||||
# latex_appendices = []
|
||||
|
||||
# If false, no module index is generated.
|
||||
# latex_domain_indices = True
|
||||
|
||||
|
||||
# -- Options for Texinfo output -----------------------------------------------
|
||||
|
||||
# Grouping the document tree into Texinfo files. List of tuples
|
||||
# (source start file, target name, title, author,
|
||||
# dir menu entry, description, category)
|
||||
texinfo_documents = [
|
||||
('index', 'monasca-transform', u'Monasca-transform Documentation',
|
||||
u'OpenStack', 'monasca-transform', 'One line description of project.',
|
||||
'Miscellaneous'),
|
||||
]
|
||||
|
||||
# Documents to append as an appendix to all manuals.
|
||||
# texinfo_appendices = []
|
||||
|
||||
# If false, no module index is generated.
|
||||
# texinfo_domain_indices = True
|
||||
|
||||
# How to display URL addresses: 'footnote', 'no', or 'inline'.
|
||||
# texinfo_show_urls = 'footnote'
|
||||
|
||||
|
||||
# Example configuration for intersphinx: refer to the Python standard library.
|
||||
# intersphinx_mapping = {'http://docs.python.org/': None}
|
@ -1,24 +0,0 @@
|
||||
..
|
||||
Copyright 2016 OpenStack Foundation
|
||||
All Rights Reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
not use this file except in compliance with the License. You may obtain
|
||||
a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
License for the specific language governing permissions and limitations
|
||||
under the License.
|
||||
|
||||
=================
|
||||
Monasca-transform
|
||||
=================
|
||||
|
||||
.. toctree::
|
||||
|
||||
api/autoindex.rst
|
||||
|
@ -1,329 +0,0 @@
|
||||
Team and repository tags
|
||||
========================
|
||||
|
||||
[![Team and repository tags](https://governance.openstack.org/badges/monasca-transform.svg)](https://governance.openstack.org/reference/tags/index.html)
|
||||
|
||||
<!-- Change things from this point on -->
|
||||
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
|
||||
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
|
||||
|
||||
|
||||
- [Create a new aggregation pipeline](#create-a-new-aggregation-pipeline)
|
||||
- [Using existing generic aggregation components](#using-existing-generic-aggregation-components)
|
||||
|
||||
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
|
||||
|
||||
<!-- Change things from this point on -->
|
||||
|
||||
# Create a new aggregation pipeline
|
||||
|
||||
Monasca Transform allows you to create a new aggregation by creating a *pre_transform_spec* and
|
||||
*transform_spec* for any set of Monasca metrics. This page gives you steps on how to create a new
|
||||
aggregation pipeline and test the pipeline in your DevStack environment.
|
||||
|
||||
A prerequisite for following the steps on this page is that you have already created a devstack
development environment for Monasca Transform, following the instructions in
[devstack/README.md](devstack/README.md).
|
||||
|
||||
|
||||
## Using existing generic aggregation components ##
|
||||
|
||||
Most use cases fall into this category, where you should be able to create a new
aggregation for a new set of metrics using the existing set of generic aggregation components.
|
||||
|
||||
Let's consider a use case where we want to find out
|
||||
|
||||
* Maximum time monasca-agent takes to submit metrics over a period of an hour across all hosts
|
||||
|
||||
* Maximum time monasca-agent takes to submit metrics over a period of an hour per host.
|
||||
|
||||
We know that monasca-agent on each host generates a small number of
|
||||
[monasca-agent metrics](https://github.com/openstack/monasca-agent/blob/master/docs/Plugins.md).
|
||||
|
||||
The metric we are interested in is
|
||||
|
||||
* **"monasca.collection_time_sec"**: Amount of time that the collector took for this collection run
|
||||
|
||||
**Steps:**
|
||||
|
||||
* **Step 1**: Identify the monasca metric to be aggregated from the Kafka topic
|
||||
```
|
||||
/opt/kafka_2.11-0.9.0.1/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic metrics | grep "monasca.collection_time_sec"
|
||||
|
||||
{"metric":{"timestamp":1523323485360.6650390625,"name":"monasca.collection_time_sec",
|
||||
"dimensions":{"hostname":"devstack","component":"monasca-agent",
|
||||
"service":"monitoring"},"value":0.0340659618, "value_meta":null},
|
||||
"meta":{"region":"RegionOne","tenantId":"d6bece1bbeff47c1b8734cd4e544dc02"},
|
||||
"creation_time":1523323489}
|
||||
```
|
||||
Note: "hostname" is available as a dimension, which we will use to find maximum collection time for each host.
|
||||
|
||||
* **Step 2**: Create a **pre_transform_spec**
|
||||
|
||||
"pre_transform_spec" drives the pre-processing of monasca metric to record store format. Look
|
||||
for existing example in
|
||||
"/monasca-transform-source/monasca_transform/data_driven_specs/pre_transform_specs/pre_transform_specs.json"
|
||||
|
||||
**pre_transform_spec**
|
||||
```
|
||||
{
|
||||
"event_processing_params":{
|
||||
"set_default_zone_to":"1",
|
||||
"set_default_geolocation_to":"1",
|
||||
"set_default_region_to":"W"
|
||||
},
|
||||
"event_type":"monasca.collection_time_sec", <-- EDITED
|
||||
"metric_id_list":["monasca_collection_host"], <-- EDITED
|
||||
"required_raw_fields_list":["creation_time", "metric.dimensions.hostname"], <--EDITED
|
||||
}
|
||||
```
|
||||
Let's look at all the fields that were edited (marked as `<-- EDITED` above):
|
||||
|
||||
**event_type**: set to "monasca.collection_time_sec". These are the metrics we want to
|
||||
transform/aggregate.
|
||||
|
||||
**metric_id_list**: set to ["monasca_collection_host"]. This is a transformation spec
identifier. During pre-processing, the record generator generates additional "record_store" data for
each item in this list. (To be renamed to transform_spec_list in the future.)
|
||||
|
||||
**required_raw_fields_list**: set to ["creation_time", "metric.dimensions.hostname"].
This lists the fields in the incoming metric that are required. During validation,
pre-processing drops any metric that is missing one of these required fields (a small
sketch of this dotted-path check follows the note below).
|
||||
|
||||
**Note:** "metric_id" is a misnomer; it is not really a metric identifier but rather an identifier
for the transformation spec. This will be changed to transform_spec_id in the future.
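
The dotted names in "required_raw_fields_list" (for example "metric.dimensions.hostname") refer to
nested fields of the incoming metric JSON. As a rough, illustrative sketch of the validation idea
only (this is not the actual monasca-transform validator), a check could look like:
```
import json

def field_present(metric, dotted_path):
    """Return True if the dotted path resolves to a non-empty value."""
    value = metric
    for part in dotted_path.split("."):
        if not isinstance(value, dict) or part not in value:
            return False
        value = value[part]
    return value not in (None, "")

metric = json.loads('{"metric": {"name": "monasca.collection_time_sec",'
                    ' "dimensions": {"hostname": "devstack"}},'
                    ' "creation_time": 1523323489}')

required = ["creation_time", "metric.dimensions.hostname"]
print(all(field_present(metric, path) for path in required))  # prints True
```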
|
||||
|
||||
* **Step 3**: Create a "transform_spec" to find maximum metric value for each host
|
||||
|
||||
"transform_spec" drives the aggregation of record store data created during pre-processing
|
||||
to final aggregated metric. Look for existing example in
|
||||
"/monasca-transform-source/monasca_transform/data_driven_specs/transform_specs/transform_specs.json"
|
||||
|
||||
**transform_spec**
|
||||
```
|
||||
{
|
||||
"aggregation_params_map":{
|
||||
|
||||
"aggregation_pipeline":{
|
||||
"source":"streaming",
|
||||
"usage":"fetch_quantity", <-- EDITED
|
||||
"setters":["set_aggregated_metric_name","set_aggregated_period"], <-- EDITED
|
||||
"insert":["insert_data_pre_hourly"] <-- EDITED
|
||||
},
|
||||
|
||||
"aggregated_metric_name":"monasca.collection_time_sec_host_agg", <-- EDITED
|
||||
"aggregation_period":"hourly", <-- EDITED
|
||||
"aggregation_group_by_list": ["host"],
|
||||
"usage_fetch_operation": "max", <-- EDITED
|
||||
"filter_by_list": [],
|
||||
"dimension_list":["aggregation_period","host"], <-- EDITED
|
||||
|
||||
"pre_hourly_operation":"max",
|
||||
"pre_hourly_group_by_list":["default"]},
|
||||
|
||||
"metric_group":"monasca_collection_host", <-- EDITED
|
||||
"metric_id":"monasca_collection_host" <-- EDITED
|
||||
}
|
||||
```
|
||||
Let's look at all the fields that were edited (marked as `<-- EDITED` above):
|
||||
|
||||
aggregation pipeline fields
|
||||
|
||||
* **usage**: set to "fetch_quantity", i.e. use the "fetch_quantity" generic aggregation component. This
component takes "aggregation_group_by_list", "usage_fetch_operation" and "filter_by_list" as
parameters (a conceptual sketch follows at the end of this step).
|
||||
* **aggregation_group_by_list** set to ["host"], since we want to find the monasca agent
collection time for each host.
|
||||
* **usage_fetch_operation** set to "max", since we want to find the maximum value of the
monasca agent collection time.
|
||||
* **filter_by_list** set to []. Since we don't want to filter out or ignore any metrics (based on,
say, a particular host or set of hosts).
|
||||
|
||||
* **setters**: set to ["set_aggregated_metric_name","set_aggregated_period"]. These components set the
aggregated metric name and the aggregation period in the final aggregated metric.
|
||||
* **set_aggregated_metric_name** sets final aggregated metric name. This setter component takes
|
||||
"aggregated_metric_name" as a parameter.
|
||||
* **aggregated_metric_name**: set to "monasca.collection_time_sec_host_agg"
|
||||
* **set_aggregated_period** sets final aggregated metric period. This setter component takes
|
||||
"aggregation_period" as a parameter.
|
||||
* **aggregation_period**: set to "hourly"
|
||||
|
||||
* **insert**: set to ["insert_data_pre_hourly"]. These components are responsible for
transforming instance usage data records to the final metric format and writing the data to a kafka
topic.
|
||||
* **insert_data_pre_hourly** writes the data to the "metrics_pre_hourly" kafka topic, which gets
processed by the pre hourly processor every hour.
|
||||
|
||||
pre hourly processor fields
|
||||
|
||||
* **pre_hourly_operation** set to "max"
|
||||
Find the hourly maximum value from records that were written to "metrics_pre_hourly" topic
|
||||
|
||||
* **pre_hourly_group_by_list** set to ["default"]
|
||||
|
||||
transformation spec identifier fields
|
||||
|
||||
* **metric_group** set to "monasca_collection_host". Group identifier for this transformation
|
||||
spec
|
||||
|
||||
* **metric_id** set to "monasca_collection_host". Identifier for this transformation spec.
|
||||
|
||||
**Note:** "metric_group" and "metric_id" are misnomers; they are not really metric identifiers but
rather identifiers for the transformation spec. They will be changed to "transform_group" and
"transform_spec_id" in the future. (Please see Story
[2001815](https://storyboard.openstack.org/#!/story/2001815))
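
Conceptually, "fetch_quantity" with "usage_fetch_operation" set to "max" and
"aggregation_group_by_list" set to ["host"] behaves like a grouped maximum over the record store
data. The following is only an illustrative PySpark sketch of that idea, using column names from
the record store data format; it is not the component's actual code:
```
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("fetch-quantity-sketch").getOrCreate()

# Toy record store rows: one row per incoming metric (host, event_quantity)
record_store_df = spark.createDataFrame(
    [("devstack", 0.034), ("devstack", 0.068), ("node-2", 0.021)],
    ["host", "event_quantity"])

# Grouped maximum: roughly what usage_fetch_operation "max" with
# aggregation_group_by_list ["host"] computes for this use case
record_store_df.groupBy("host") \
    .agg(F.max("event_quantity").alias("quantity")) \
    .show()
```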
|
||||
|
||||
* **Step 4**: Create a "transform_spec" to find maximum metric value across all hosts
|
||||
|
||||
Now let's create another transformation spec to find maximum metric value across all hosts.
|
||||
|
||||
**transform_spec**
|
||||
```
|
||||
{
|
||||
"aggregation_params_map":{
|
||||
|
||||
"aggregation_pipeline":{
|
||||
"source":"streaming",
|
||||
"usage":"fetch_quantity", <-- EDITED
|
||||
"setters":["set_aggregated_metric_name","set_aggregated_period"], <-- EDITED
|
||||
"insert":["insert_data_pre_hourly"] <-- EDITED
|
||||
},
|
||||
|
||||
"aggregated_metric_name":"monasca.collection_time_sec_all_agg", <-- EDITED
|
||||
"aggregation_period":"hourly", <-- EDITED
|
||||
"aggregation_group_by_list": [],
|
||||
"usage_fetch_operation": "max", <-- EDITED
|
||||
"filter_by_list": [],
|
||||
"dimension_list":["aggregation_period"], <-- EDITED
|
||||
|
||||
"pre_hourly_operation":"max",
|
||||
"pre_hourly_group_by_list":["default"]},
|
||||
|
||||
"metric_group":"monasca_collection_all", <-- EDITED
|
||||
"metric_id":"monasca_collection_all" <-- EDITED
|
||||
}
|
||||
```
|
||||
|
||||
The transformation spec above is almost identical to the transformation spec created in **Step 3**,
with a few changes.
|
||||
|
||||
**aggregation_group_by_list** is set to [], i.e. an empty list, since we want to find the maximum value
across all hosts (considering all the incoming metric data; see the short sketch at the end of this step).
|
||||
|
||||
**aggregated_metric_name** is set to "monasca.collection_time_sec_all_agg".
|
||||
|
||||
**metric_group** is set to "monasca_collection_all", since we need a new transformation spec
group identifier.
|
||||
|
||||
**metric_id** is set to "monasca_collection_all", since we need a new transformation spec
|
||||
identifier.
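
With "aggregation_group_by_list" set to [], the same operation runs over all records instead of
per host. Continuing the illustrative sketch from Step 3 (again, not the component's real code):
```
import pyspark.sql.functions as F

# No group-by columns: the maximum is taken over all incoming records,
# yielding a single aggregated value across all hosts.
record_store_df.agg(F.max("event_quantity").alias("quantity")).show()
```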
|
||||
|
||||
* **Step 5**: Update "pre_transform_spec" with new transformation spec identifier
|
||||
|
||||
In **Step 4** we created a new transformation spec with a new "metric_id", namely
"monasca_collection_all". We now have to update the "pre_transform_spec" that we
created in **Step 2** with the new "metric_id" by adding it to the "metric_id_list".
|
||||
|
||||
**pre_transform_spec**
|
||||
```
|
||||
{
|
||||
"event_processing_params":{
|
||||
"set_default_zone_to":"1",
|
||||
"set_default_geolocation_to":"1",
|
||||
"set_default_region_to":"W"
|
||||
},
|
||||
"event_type":"monasca.collection_time_sec",
|
||||
"metric_id_list":["monasca_collection_host", "monasca_collection_all"], <-- EDITED
|
||||
"required_raw_fields_list":["creation_time", "metric.dimensions.hostname"],
|
||||
}
|
||||
```
|
||||
Thus we were able to add an additional transformation/aggregation pipeline for the same incoming
monasca metric very easily.
|
||||
|
||||
* **Step 6**: Update "pre_transform_spec" and "transform_spec"
|
||||
|
||||
* Edit
|
||||
"/monasca-transform-source/monasca_transform/data_driven_specs/pre_transform_specs/pre_transform_specs.json"
|
||||
and add the following line.
|
||||
|
||||
```
|
||||
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"monasca.collection_time_sec","metric_id_list":["monasca_collection_host","monasca_collection_all"],"required_raw_fields_list":["creation_time","metric.dimensions.hostname"]}
|
||||
```
|
||||
|
||||
**Note:** Each line does not end with a comma (the file is not one big json document).
|
||||
|
||||
* Edit
|
||||
"/monasca-transform-source/monasca_transform/data_driven_specs/transform_specs/transform_specs.json"
|
||||
and add the following lines.
|
||||
|
||||
```
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["set_aggregated_metric_name","set_aggregated_period"],"insert":["insert_data_pre_hourly"]},"aggregated_metric_name":"monasca.collection_time_sec_host_agg","aggregation_period":"hourly","aggregation_group_by_list":["host"],"usage_fetch_operation":"max","filter_by_list":[],"dimension_list":["aggregation_period","host"],"pre_hourly_operation":"max","pre_hourly_group_by_list":["default"]},"metric_group":"monasca_collection_host","metric_id":"monasca_collection_host"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["set_aggregated_metric_name","set_aggregated_period"],"insert":["insert_data_pre_hourly"]},"aggregated_metric_name":"monasca.collection_time_sec_all_agg","aggregation_period":"hourly","aggregation_group_by_list":[],"usage_fetch_operation":"max","filter_by_list":[],"dimension_list":["aggregation_period"],"pre_hourly_operation":"max","pre_hourly_group_by_list":["default"]},"metric_group":"monasca_collection_all","metric_id":"monasca_collection_all"}
|
||||
```
|
||||
|
||||
* Run "refresh_monasca_transform.sh" script as documented in devstack
|
||||
[README](devstack/README.md) to refresh the specs in the database.
|
||||
```
|
||||
vagrant@devstack:~$ cd /opt/stack/monasca-transform
|
||||
vagrant@devstack:/opt/stack/monasca-transform$ tools/vagrant/refresh_monasca_transform.sh
|
||||
```
|
||||
|
||||
If successful, you should see this message.
|
||||
```
|
||||
***********************************************
|
||||
* *
|
||||
* SUCCESS!! refresh monasca transform done. *
|
||||
* *
|
||||
***********************************************
|
||||
```
|
||||
* **Step 7**: Verifying results
|
||||
|
||||
To verify that new aggregated metrics are being produced, you can look at the "metrics_pre_hourly"
topic in kafka. By default, monasca-transform fires off a batch every 10 minutes, so you should
see metrics in the intermediate "instance_usage" data format being published to that topic every 10
minutes.
|
||||
```
|
||||
/opt/kafka_2.11-0.9.0.1/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic metrics_pre_hourly
|
||||
|
||||
{"usage_hour":"06","geolocation":"NA","record_count":40.0,"app":"NA","deployment":"NA","resource_uuid":"NA",
|
||||
"pod_name":"NA","usage_minute":"NA","service_group":"NA","lastrecord_timestamp_string":"2018-04-1106:29:49",
|
||||
"user_id":"NA","zone":"NA","namespace":"NA","usage_date":"2018-04-11","daemon_set":"NA","processing_meta":{
|
||||
"event_type":"NA","metric_id":"monasca_collection_all"},
|
||||
"firstrecord_timestamp_unix":1523427604.208577,"project_id":"NA","lastrecord_timestamp_unix":1523428189.718174,
|
||||
"aggregation_period":"hourly","host":"NA","container_name":"NA","interface":"NA",
|
||||
"aggregated_metric_name":"monasca.collection_time_sec_all_agg","tenant_id":"NA","region":"NA",
|
||||
"firstrecord_timestamp_string":"2018-04-11 06:20:04","quantity":0.0687000751}
|
||||
|
||||
{"usage_hour":"06","geolocation":"NA","record_count":40.0,"app":"NA","deployment":"NA","resource_uuid":"NA",
|
||||
"pod_name":"NA","usage_minute":"NA","service_group":"NA","lastrecord_timestamp_string":"2018-04-11 06:29:49",
|
||||
"user_id":"NA","zone":"NA","namespace":"NA","usage_date":"2018-04-11","daemon_set":"NA","processing_meta":{
|
||||
"event_type":"NA","metric_id":"monasca_collection_host"},"firstrecord_timestamp_unix":1523427604.208577,
|
||||
"project_id":"NA","lastrecord_timestamp_unix":1523428189.718174,"aggregation_period":"hourly",
|
||||
"host":"devstack","container_name":"NA","interface":"NA",
|
||||
"aggregated_metric_name":"monasca.collection_time_sec_host_agg","tenant_id":"NA","region":"NA",
|
||||
"firstrecord_timestamp_string":"2018-04-11 06:20:04","quantity":0.0687000751}
|
||||
```
|
||||
|
||||
Similarly, to verify that final aggregated metrics are being published by the pre hourly processor,
you can look at the "metrics" topic in kafka. By default the pre hourly processor (which processes
metrics from the "metrics_pre_hourly" topic) runs 10 minutes past the top of the hour.
|
||||
```
|
||||
/opt/kafka_2.11-0.9.0.1/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic metrics | grep "_agg"
|
||||
|
||||
{"metric":{"timestamp":1523459468616,"value_meta":{"firstrecord_timestamp_string":"2018-04-11 14:00:13",
|
||||
"lastrecord_timestamp_string":"2018-04-11 14:59:46","record_count":239.0},"name":"monasca.collection_time_sec_host_agg",
|
||||
"value":0.1182248592,"dimensions":{"aggregation_period":"hourly","host":"devstack"}},
|
||||
"meta":{"region":"useast","tenantId":"df89c3db21954b08b0516b4b60b8baff"},"creation_time":1523459468}
|
||||
|
||||
{"metric":{"timestamp":1523455872740,"value_meta":{"firstrecord_timestamp_string":"2018-04-11 13:00:10",
|
||||
"lastrecord_timestamp_string":"2018-04-11 13:59:58","record_count":240.0},"name":"monasca.collection_time_sec_all_agg",
|
||||
"value":0.0898442268,"dimensions":{"aggregation_period":"hourly"}},
|
||||
"meta":{"region":"useast","tenantId":"df89c3db21954b08b0516b4b60b8baff"},"creation_time":1523455872}
|
||||
```
|
||||
|
||||
As you can see, monasca-transform created two new aggregated metrics named
"monasca.collection_time_sec_host_agg" and "monasca.collection_time_sec_all_agg". The "value_meta"
section has three fields: "firstrecord_timestamp_string", "lastrecord_timestamp_string" and
"record_count". These fields are for informational purposes only. They show the timestamp of the first metric,
the timestamp of the last metric and the number of metrics that went into the calculation of the aggregated
metric.
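
For a quick programmatic check you can also consume the "metrics" topic from Python and keep only
the aggregated metrics. This is only a convenience sketch; it assumes the kafka-python package is
installed and that Kafka is listening on localhost:9092 (neither is part of monasca-transform itself):
```
import json

from kafka import KafkaConsumer  # assumes the kafka-python package

consumer = KafkaConsumer("metrics",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest")

for message in consumer:
    envelope = json.loads(message.value)
    metric = envelope["metric"]
    # Aggregated metric names produced above end with "_agg"
    if metric["name"].endswith("_agg"):
        print(metric["name"], metric["value"], metric.get("value_meta"))
```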
|
@ -1,109 +0,0 @@
|
||||
Team and repository tags
|
||||
========================
|
||||
|
||||
[![Team and repository tags](https://governance.openstack.org/badges/monasca-transform.svg)](https://governance.openstack.org/reference/tags/index.html)
|
||||
|
||||
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
|
||||
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
|
||||
|
||||
|
||||
- [Monasca Transform Data Formats](#monasca-transform-data-formats)
|
||||
- [Record Store Data Format](#record-store-data-format)
|
||||
- [Instance Usage Data Format](#instance-usage-data-format)
|
||||
- [References](#references)
|
||||
|
||||
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
|
||||
|
||||
# Monasca Transform Data Formats
|
||||
|
||||
There are two data formats used by monasca transform. The following sections describe the schema
|
||||
(Spark's DataFrame[1] schema) for the two formats.
|
||||
|
||||
Note: These are internal formats used by Monasca Transform when aggregating data. If you are a user
|
||||
who wants to create a new aggregation pipeline using an existing framework, you don't need to know or
|
||||
care about these two formats.
|
||||
|
||||
As a developer who wants to write new aggregation components, you might need to know how to
enhance the record store or instance usage data format with any additional fields you need, and how
to write new aggregation components that aggregate data from those additional fields.
|
||||
|
||||
**Source Metric**
|
||||
|
||||
This is an example monasca metric. A monasca metric is transformed into the `record_store` data format and
later transformed/aggregated using re-usable generic aggregation components to derive the
`instance_usage` data format.
|
||||
|
||||
Example of a monasca metric:
|
||||
|
||||
```
|
||||
{"metric":{"timestamp":1523323485360.6650390625,
|
||||
"name":"monasca.collection_time_sec",
|
||||
"dimensions":{"hostname":"devstack",
|
||||
"component":"monasca-agent",
|
||||
"service":"monitoring"},
|
||||
"value":0.0340659618,
|
||||
"value_meta":null},
|
||||
"meta":{"region":"RegionOne","tenantId":"d6bece1bbeff47c1b8734cd4e544dc02"},
|
||||
"creation_time":1523323489}
|
||||
```
|
||||
|
||||
## Record Store Data Format ##
|
||||
|
||||
Data Frame Schema:
|
||||
|
||||
| Column Name | Column Data Type | Description |
|
||||
| :---------- | :--------------- | :---------- |
|
||||
| event_quantity | `pyspark.sql.types.DoubleType` | mapped to `metric.value`|
|
||||
| event_timestamp_unix | `pyspark.sql.types.DoubleType` | calculated as `metric.timestamp`/`1000` from source metric|
|
||||
| event_timestamp_string | `pyspark.sql.types.StringType` | mapped to `metric.timestamp` from the source metric|
|
||||
| event_type | `pyspark.sql.types.StringType` | placeholder for the future. mapped to `metric.name` from source metric|
|
||||
| event_quantity_name | `pyspark.sql.types.StringType` | mapped to `metric.name` from source metric|
|
||||
| resource_uuid | `pyspark.sql.types.StringType` | mapped to `metric.dimensions.instanceId` or `metric.dimensions.resource_id` from source metric |
|
||||
| tenant_id | `pyspark.sql.types.StringType` | mapped to `metric.dimensions.tenant_id` or `metric.dimensions.tenantid` or `metric.dimensions.project_id` |
|
||||
| user_id | `pyspark.sql.types.StringType` | mapped to `meta.userId` |
|
||||
| region | `pyspark.sql.types.StringType` | placeholder for the future. mapped to `meta.region`, defaults to `event_processing_params.set_default_region_to` (`pre_transform_spec`) |
|
||||
| zone | `pyspark.sql.types.StringType` | placeholder for the future. mapped to `meta.zone`, defaults to `event_processing_params.set_default_zone_to` (`pre_transform_spec`) |
|
||||
| host | `pyspark.sql.types.StringType` | mapped to `metric.dimensions.hostname` or `metric.value_meta.host` |
|
||||
| project_id | `pyspark.sql.types.StringType` | mapped to metric tenant_id |
|
||||
| event_date | `pyspark.sql.types.StringType` | "YYYY-mm-dd". Extracted from `metric.timestamp` |
|
||||
| event_hour | `pyspark.sql.types.StringType` | "HH". Extracted from `metric.timestamp` |
|
||||
| event_minute | `pyspark.sql.types.StringType` | "MM". Extracted from `metric.timestamp` |
|
||||
| event_second | `pyspark.sql.types.StringType` | "SS". Extracted from `metric.timestamp` |
|
||||
| metric_group | `pyspark.sql.types.StringType` | identifier for transform spec group |
|
||||
| metric_id | `pyspark.sql.types.StringType` | identifier for transform spec |
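
For reference, the column names and Spark types above map directly onto a DataFrame schema
declaration. The following is an abbreviated, illustrative declaration only (a handful of the
columns, not the project's actual schema code):
```
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

# Abbreviated record store schema; the real schema carries every column
# listed in the table above.
record_store_schema = StructType([
    StructField("event_quantity", DoubleType(), True),
    StructField("event_timestamp_unix", DoubleType(), True),
    StructField("event_type", StringType(), True),
    StructField("host", StringType(), True),
    StructField("metric_group", StringType(), True),
    StructField("metric_id", StringType(), True),
])
print(record_store_schema.simpleString())
```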
|
||||
|
||||
## Instance Usage Data Format ##
|
||||
|
||||
Data Frame Schema:
|
||||
|
||||
| Column Name | Column Data Type | Description |
|
||||
| :---------- | :--------------- | :---------- |
|
||||
| tenant_id | `pyspark.sql.types.StringType` | project_id, defaults to `NA` |
|
||||
| user_id | `pyspark.sql.types.StringType` | user_id, defaults to `NA`|
|
||||
| resource_uuid | `pyspark.sql.types.StringType` | resource_id, defaults to `NA`|
|
||||
| geolocation | `pyspark.sql.types.StringType` | placeholder for future, defaults to `NA`|
|
||||
| region | `pyspark.sql.types.StringType` | placeholder for future, defaults to `NA`|
|
||||
| zone | `pyspark.sql.types.StringType` | placeholder for future, defaults to `NA`|
|
||||
| host | `pyspark.sql.types.StringType` | compute hostname, defaults to `NA`|
|
||||
| project_id | `pyspark.sql.types.StringType` | project_id, defaults to `NA`|
|
||||
| aggregated_metric_name | `pyspark.sql.types.StringType` | aggregated metric name, defaults to `NA`|
|
||||
| firstrecord_timestamp_string | `pyspark.sql.types.StringType` | timestamp of the first metric used to derive this aggregated metric|
|
||||
| lastrecord_timestamp_string | `pyspark.sql.types.StringType` | timestamp of the last metric used to derive this aggregated metric|
|
||||
| usage_date | `pyspark.sql.types.StringType` | "YYYY-mm-dd" date|
|
||||
| usage_hour | `pyspark.sql.types.StringType` | "HH" hour|
|
||||
| usage_minute | `pyspark.sql.types.StringType` | "MM" minute|
|
||||
| aggregation_period | `pyspark.sql.types.StringType` | "hourly" or "minutely" |
|
||||
| firstrecord_timestamp_unix | `pyspark.sql.types.DoubleType` | epoch timestamp of the first metric used to derive this aggregated metric |
|
||||
| lastrecord_timestamp_unix | `pyspark.sql.types.DoubleType` | epoch timestamp of the last metric used to derive this aggregated metric |
|
||||
| quantity | `pyspark.sql.types.DoubleType` | aggregated metric quantity |
|
||||
| record_count | `pyspark.sql.types.DoubleType` | number of source metrics that were used to derive this aggregated metric. For informational purposes only. |
|
||||
| processing_meta | `pyspark.sql.types.MapType(pyspark.sql.types.StringType, pyspark.sql.types.StringType, True)` | Key-Value pairs to store additional information, to aid processing |
|
||||
| extra_data_map | `pyspark.sql.types.MapType(pyspark.sql.types.StringType, pyspark.sql.types.StringType, True)` | Key-value pairs that store the group by column names and values |
|
||||
|
||||
## References
|
||||
|
||||
[1] [Spark SQL, DataFrames and Datasets
|
||||
Guide](https://spark.apache.org/docs/latest/sql-programming-guide.html)
|
||||
|
||||
[2] [Spark
|
||||
DataTypes](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.types.DataType)
|
@ -1,705 +0,0 @@
|
||||
Team and repository tags
|
||||
========================
|
||||
|
||||
[![Team and repository tags](https://governance.openstack.org/badges/monasca-transform.svg)](https://governance.openstack.org/reference/tags/index.html)
|
||||
|
||||
<!-- Change things from this point on -->
|
||||
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
|
||||
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
|
||||
- [Monasca Transform Generic Aggregation Components](#monasca-transform-generic-aggregation-components)
|
||||
|
||||
- [Monasca Transform Generic Aggregation Components](#monasca-transform-generic-aggregation-components)
|
||||
- [Introduction](#introduction)
|
||||
- [1: Conversion of incoming metrics to record store data format](#1-conversion-of-incoming-metrics-to-record-store-data-format)
|
||||
- [Pre Transform Spec](#pre-transform-spec)
|
||||
- [2: Data aggregation using generic aggregation components](#2-data-aggregation-using-generic-aggregation-components)
|
||||
- [Transform Specs](#transform-specs)
|
||||
- [aggregation_params_map](#aggregation_params_map)
|
||||
- [aggregation_pipeline](#aggregation_pipeline)
|
||||
- [Other parameters](#other-parameters)
|
||||
- [metric_group and metric_id](#metric_group-and-metric_id)
|
||||
- [Generic Aggregation Components](#generic-aggregation-components)
|
||||
- [Usage Components](#usage-components)
|
||||
- [fetch_quantity](#fetch_quantity)
|
||||
- [fetch_quantity_util](#fetch_quantity_util)
|
||||
- [calculate_rate](#calculate_rate)
|
||||
- [Setter Components](#setter-components)
|
||||
- [set_aggregated_metric_name](#set_aggregated_metric_name)
|
||||
- [set_aggregated_period](#set_aggregated_period)
|
||||
- [rollup_quantity](#rollup_quantity)
|
||||
- [Insert Components](#insert-components)
|
||||
- [insert_data](#insert_data)
|
||||
- [insert_data_pre_hourly](#insert_data_pre_hourly)
|
||||
- [Processors](#processors)
|
||||
- [pre_hourly_processor](#pre_hourly_processor)
|
||||
- [Special notation](#special-notation)
|
||||
- [pre_transform spec](#pre_transform-spec)
|
||||
- [transform spec](#transform-spec)
|
||||
- [Putting it all together](#putting-it-all-together)
|
||||
|
||||
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
|
||||
# Monasca Transform Generic Aggregation Components
|
||||
|
||||
# Introduction
|
||||
|
||||
Monasca Transform uses the standard ETL (Extract-Transform-Load) design pattern to aggregate monasca
metrics and uses an innovative data/configuration driven mechanism to drive processing. It accomplishes
data aggregation in two distinct steps, each driven by external configuration specifications,
namely *pre_transform_spec* and *transform_spec*.
|
||||
|
||||
## 1: Conversion of incoming metrics to record store data format ##
|
||||
|
||||
In the first step, the incoming metrics are converted into a canonical data format called record
store data, using *pre_transform_spec*.
|
||||
|
||||
This logical processing data flow is explained in more detail in [Monasca/Transform wiki: Logical
|
||||
processing data flow section: Conversion to record store
|
||||
format](https://wiki.openstack.org/wiki/Monasca/Transform#Logical_processing_data_flow) and includes
|
||||
following operations:
|
||||
|
||||
* identifying metrics that are required (in other words, filtering out unwanted metrics)
|
||||
|
||||
* validation and extraction of essential data in the metric
|
||||
|
||||
* generating multiple records for incoming metrics if they are to be aggregated in multiple ways,
|
||||
and finally
|
||||
|
||||
* conversion of the incoming metrics to the canonical record store data format. Please refer to the record
store section in [Data Formats](data_formats.md) for more information on the record store format.
|
||||
|
||||
### Pre Transform Spec ###
|
||||
|
||||
Example *pre_transform_spec* for metric
|
||||
|
||||
```
|
||||
{
|
||||
"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},
|
||||
"event_type":"cpu.total_logical_cores",
|
||||
"metric_id_list":["cpu_total_all","cpu_total_host","cpu_util_all","cpu_util_host"],
|
||||
"required_raw_fields_list":["creation_time"],
|
||||
}
|
||||
```
|
||||
|
||||
*List of fields*
|
||||
|
||||
| field name | values | description |
|
||||
| :--------- | :----- | :---------- |
|
||||
| event_processing_params | Set default field values `set_default_zone_to`, `set_default_geolocation_to`, `set_default_region_to`| Set default values for certain fields in the record store data |
|
||||
| event_type | Name of the metric | identifies metric that needs to be aggregated |
|
||||
| metric_id_list | List of `metric_id`'s | List of identifiers, should match `metric_id` in transform specs. This is used by record generation step to generate multiple records if this metric is to be aggregated in multiple ways|
|
||||
| required_raw_fields_list | List of `field`'s | List of fields (use [Special notation](#special-notation)) that are required in the incoming metric, used for validating incoming metric. The validator checks if field is present and is not empty. If the field is absent or empty the validator filters such metrics out from aggregation. |
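
To make the role of `metric_id_list` above concrete: during record generation, one incoming metric
is fanned out into one record store row per listed identifier, so that each transform spec gets its
own copy to aggregate. A rough illustration of the idea only (not the actual implementation):
```
def generate_records(record, metric_id_list):
    """Emit one copy of the record per transform spec identifier."""
    for metric_id in metric_id_list:
        fanned_out = dict(record)
        fanned_out["metric_id"] = metric_id
        yield fanned_out

record = {"event_type": "cpu.total_logical_cores", "event_quantity": 8.0}
for row in generate_records(record, ["cpu_total_all", "cpu_total_host"]):
    print(row)
```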
|
||||
|
||||
## 2: Data aggregation using generic aggregation components ##
|
||||
|
||||
In the second step, the canonical record store data is aggregated using *transform_spec*. Each
|
||||
*transform_spec* defines a series of generic aggregation components, which are specified in
|
||||
`aggregation_params_map.aggregation_pipeline` section. (See *transform_spec* example below).
|
||||
|
||||
Any parameters used by the generic aggregation components are also specified in the
|
||||
`aggregation_params_map` section (See *Other parameters* e.g. `aggregated_metric_name`, `aggregation_period`,
|
||||
`aggregation_group_by_list` etc. in *transform_spec* example below)
|
||||
|
||||
### Transform Specs ###
|
||||
|
||||
Example *transform_spec* for metric
|
||||
```
|
||||
{"aggregation_params_map":{
|
||||
"aggregation_pipeline":{
|
||||
"source":"streaming",
|
||||
"usage":"fetch_quantity",
|
||||
"setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],
|
||||
"insert":["prepare_data","insert_data_pre_hourly"]
|
||||
},
|
||||
"aggregated_metric_name":"cpu.total_logical_cores_agg",
|
||||
"aggregation_period":"hourly",
|
||||
"aggregation_group_by_list": ["host", "metric_id", "tenant_id"],
|
||||
"usage_fetch_operation": "avg",
|
||||
"filter_by_list": [],
|
||||
"setter_rollup_group_by_list": [],
|
||||
"setter_rollup_operation": "sum",
|
||||
"dimension_list":["aggregation_period","host","project_id"],
|
||||
"pre_hourly_operation":"avg",
|
||||
"pre_hourly_group_by_list":["default"]
|
||||
},
|
||||
"metric_group":"cpu_total_all",
|
||||
"metric_id":"cpu_total_all"
|
||||
}
|
||||
```
|
||||
|
||||
#### aggregation_params_map ####
|
||||
|
||||
This section specifies the *aggregation_pipeline* and *Other parameters* (used by the generic aggregation
components in the *aggregation_pipeline*).
|
||||
|
||||
##### aggregation_pipeline #####
|
||||
|
||||
Specifies generic aggregation components that should be used to process incoming metrics.
|
||||
|
||||
Note: generic aggregation components are re-usable and can be used to build different aggregation
|
||||
pipelines as required.
|
||||
|
||||
*List of fields*
|
||||
|
||||
| field name | values | description |
|
||||
| :--------- | :----- | :---------- |
|
||||
| source | ```streaming``` | source is ```streaming```. In the future this can be used to specify a component which can fetch data directly from monasca datastore |
|
||||
| usage | ```fetch_quantity```, ```fetch_quantity_util```, ```calculate_rate``` | [Usage Components](https://github.com/openstack/monasca-transform/tree/master/monasca_transform/component/usage)|
|
||||
| setters | ```pre_hourly_calculate_rate```, ```rollup_quantity```, ```set_aggregated_metric_name```, ```set_aggregated_period``` | [Setter Components](https://github.com/openstack/monasca-transform/tree/master/monasca_transform/component/setter)|
|
||||
| insert | ```insert_data```, ```insert_data_pre_hourly``` | [Insert Components](https://github.com/openstack/monasca-transform/tree/master/monasca_transform/component/insert)|
|
||||
|
||||
|
||||
##### Other parameters #####
|
||||
|
||||
Specifies parameters that generic aggregation components use to process and aggregate data.
|
||||
|
||||
*List of Other parameters*
|
||||
|
||||
| Parameter Name | Values | Description | Used by |
|
||||
| :------------- | :----- | :---------- | :------ |
|
||||
| aggregated_metric_name| e.g. "cpu.total_logical_cores_agg" | Name of the aggregated metric | [set_aggregated_metric_name](#set_aggregated_metric_name) |
|
||||
| aggregation_period | "hourly", "minutely" or "secondly" | Period over which to aggregate data | [fetch_quantity](#fetch_quantity), [fetch_quantity_util](#fetch_quantity_util), [calculate_rate](#calculate_rate), [set_aggregated_period](#set_aggregated_period), [rollup_quantity](#rollup_quantity) |
|
||||
| aggregation_group_by_list | e.g. "project_id", "hostname" | Group `record_store` data by these columns. Please also see [Special notation](#special-notation) below | [fetch_quantity](#fetch_quantity), [fetch_quantity_util](#fetch_quantity_util), [calculate_rate](#calculate_rate) |
|
||||
| usage_fetch_operation | e.g. "sum" | After the data is grouped by `aggregation_group_by_list`, perform this operation to find the aggregated metric value | [fetch_quantity](#fetch_quantity), [fetch_quantity_util](#fetch_quantity_util), [calculate_rate](#calculate_rate) |
|
||||
| filter_by_list | Filter regex | Filter data using regex on a `record_store` column value| [fetch_quantity](#fetch_quantity), [fetch_quantity_util](#fetch_quantity_util), [calculate_rate](#calculate_rate) |
|
||||
| setter_rollup_group_by_list | e.g. "project_id" | Group `instance_usage` data by these columns for the rollup operation. Please also see [Special notation](#special-notation) below | [rollup_quantity](#rollup_quantity) |
|
||||
| setter_rollup_operation | e.g. "avg" | After data is grouped by `setter_rollup_group_by_list`, perform this operation to find aggregated metric value | [rollup_quantity](#rollup_quantity) |
|
||||
| dimension_list | e.g. "aggregation_period", "host", "project_id" | List of fields which specify dimensions in aggregated metric. Please also see [Special notation](#special-notation) below | [insert_data](#insert_data), [insert_data_pre_hourly](#insert_data_pre_hourly)|
|
||||
| pre_hourly_group_by_list | e.g. "default" | List of `instance usage data` fields to do a group by operation to aggregate data. Please also see [Special notation](#special-notation) below | [pre_hourly_processor](#pre_hourly_processor) |
|
||||
| pre_hourly_operation | e.g. "avg" | When aggregating data published to `metrics_pre_hourly` every hour, perform this operation to find hourly aggregated metric value | [pre_hourly_processor](#pre_hourly_processor) |
|
||||
|
||||
### metric_group and metric_id ###
|
||||
|
||||
Specifies a metric or list of metrics from the record store data, which will be processed by this
|
||||
*transform_spec*. Note: This can be a single metric or a group of metrics that will be combined to
|
||||
produce the final aggregated metric.
|
||||
|
||||
*List of fields*
|
||||
|
||||
| field name | values | description |
|
||||
| :--------- | :----- | :---------- |
|
||||
| metric_group | unique transform spec group identifier | group identifier for this transform spec e.g. "cpu_total_all" |
|
||||
| metric_id | unique transform spec identifier | identifier for this transform spec e.g. "cpu_total_all" |
|
||||
|
||||
**Note:** "metric_id" is a misnomer, it is not really a metric group/or metric identifier but rather
|
||||
identifier for transformation spec. This will be changed to "transform_spec_id" in the future.
|
||||
|
||||
## Generic Aggregation Components ##
|
||||
|
||||
*List of Generic Aggregation Components*
|
||||
|
||||
### Usage Components ###
|
||||
|
||||
All usage components implement a method
|
||||
|
||||
```
|
||||
def usage(transform_context, record_store_df):
|
||||
..
|
||||
..
|
||||
return instance_usage_df
|
||||
```
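
As a rough sketch only (the class below is illustrative and is not taken from the monasca-transform
source; a real component would also extend the project's usage base class), a custom usage component
has this shape:

```
class PassThroughUsage(object):
    """Hypothetical usage component sketch: returns the incoming data unchanged.

    A real usage component would group record_store_df by the transform spec's
    aggregation_group_by_list, apply usage_fetch_operation and return the
    result as an instance_usage dataframe.
    """

    @staticmethod
    def usage(transform_context, record_store_df):
        instance_usage_df = record_store_df
        return instance_usage_df
```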
|
||||
|
||||
#### fetch_quantity ####
|
||||
|
||||
This component groups record store records by `aggregation_group_by_list`, sorts within each
|
||||
group by the timestamp field, and finds the usage based on `usage_fetch_operation`. Optionally, this
|
||||
component also takes `filter_by_list` to include or exclude certain records from the usage
|
||||
calculation.
|
||||
|
||||
*Other parameters*
|
||||
|
||||
* **aggregation_group_by_list**
|
||||
|
||||
List of fields to group by.
|
||||
|
||||
Possible values: any set of fields in record store data. Please also see [Special notation](#special-notation).
|
||||
|
||||
Example:
|
||||
|
||||
```
|
||||
"aggregation_group_by_list": ["tenant_id"]
|
||||
```
|
||||
* **usage_fetch_operation**
|
||||
|
||||
Operation to be performed on grouped data set.
|
||||
|
||||
*Possible values:* "sum", "max", "min", "avg", "latest", "oldest"
|
||||
|
||||
* **aggregation_period**
|
||||
|
||||
Period to aggregate by.
|
||||
|
||||
*Possible values:* 'daily', 'hourly', 'minutely', 'secondly'.
|
||||
|
||||
Example:
|
||||
|
||||
```
|
||||
"aggregation_period": "hourly"
|
||||
```
|
||||
|
||||
* **filter_by_list**
|
||||
|
||||
Filter (include or exclude) record store data as specified.
|
||||
|
||||
Example:
|
||||
|
||||
```
|
||||
filter_by_list": "[{"field_to_filter": "hostname",
|
||||
"filter_expression": "comp-(\d)+",
|
||||
"filter_operation": "include"}]
|
||||
```
|
||||
|
||||
OR
|
||||
|
||||
```
|
||||
filter_by_list": "[{"field_to_filter": "hostname",
|
||||
"filter_expression": "controller-(\d)+",
|
||||
"filter_operation": "exclude"}]
|
||||
```
|
||||
|
||||
#### fetch_quantity_util ####
|
||||
|
||||
This component finds the utilized quantity based on *total_quantity* and *idle_perc* using
|
||||
the following calculation:
|
||||
|
||||
```
|
||||
utilized_quantity = (100 - idle_perc) * total_quantity / 100
|
||||
```
|
||||
|
||||
where,
|
||||
|
||||
* **total_quantity** data, identified by `usage_fetch_util_quantity_event_type` parameter and
|
||||
|
||||
* **idle_perc** data, identified by `usage_fetch_util_idle_perc_event_type` parameter
|
||||
|
||||
This component initially groups record store records by `aggregation_group_by_list` and
|
||||
`event_type`, sorts within each group by the timestamp field, and calculates `total_quantity` and
|
||||
`idle_perc` values based on `usage_fetch_operation`. `utilized_quantity` is then calculated
|
||||
using the formula given above.
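
As a quick illustration with hypothetical numbers, assuming a group where `total_quantity` is 8
(logical cores) and `idle_perc` is 75, the utilized quantity works out as follows:

```
utilized_quantity = (100 - 75) * 8 / 100 = 2
```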
|
||||
|
||||
*Other parameters*
|
||||
|
||||
* **aggregation_group_by_list**
|
||||
|
||||
List of fields to group by.
|
||||
|
||||
Possible values: any set of fields in record store data. Please also see [Special notation](#special-notation) below.
|
||||
|
||||
Example:
|
||||
|
||||
```
|
||||
"aggregation_group_by_list": ["tenant_id"]
|
||||
```
|
||||
* **usage_fetch_operation**
|
||||
|
||||
Operation to be performed on grouped data set
|
||||
|
||||
*Possible values:* "sum", "max", "min", "avg", "latest", "oldest"
|
||||
|
||||
* **aggregation_period**
|
||||
|
||||
Period to aggregate by.
|
||||
|
||||
*Possible values:* 'daily', 'hourly', 'minutely', 'secondly'.
|
||||
|
||||
Example:
|
||||
|
||||
```
|
||||
"aggregation_period": "hourly"
|
||||
```
|
||||
|
||||
* **filter_by_list**
|
||||
|
||||
Filter (include or exclude) record store data as specified
|
||||
|
||||
Example:
|
||||
|
||||
```
|
||||
filter_by_list": "[{"field_to_filter": "hostname",
|
||||
"filter_expression": "comp-(\d)+",
|
||||
"filter_operation": "include"}]
|
||||
```
|
||||
|
||||
OR
|
||||
|
||||
```
|
||||
filter_by_list": "[{"field_to_filter": "hostname",
|
||||
"filter_expression": "controller-(\d)+",
|
||||
"filter_operation": "exclude"}]
|
||||
```
|
||||
|
||||
* **usage_fetch_util_quantity_event_type**
|
||||
|
||||
event type (metric name) to identify data which will be used to calculate `total_quantity`
|
||||
|
||||
*Possible values:* metric name
|
||||
|
||||
Example:
|
||||
|
||||
```
|
||||
"usage_fetch_util_quantity_event_type": "cpu.total_logical_cores"
|
||||
```
|
||||
|
||||
|
||||
* **usage_fetch_util_idle_perc_event_type**
|
||||
|
||||
event type (metric name) to identify data which will be used to calculate `idle_perc`
|
||||
|
||||
*Possible values:* metric name
|
||||
|
||||
Example:
|
||||
|
||||
```
|
||||
"usage_fetch_util_idle_perc_event_type": "cpu.idle_perc"
|
||||
```
|
||||
|
||||
#### calculate_rate ####
|
||||
|
||||
This component finds the rate of change of quantity (in percent) over a time period using
|
||||
the following calculation:
|
||||
|
||||
```
|
||||
rate_of_change (in percent) = ((oldest_quantity - latest_quantity)/oldest_quantity) * 100
|
||||
```
|
||||
|
||||
where,
|
||||
|
||||
* **oldest_quantity**: oldest (or earliest) `average` quantity if there are multiple quantities in a
|
||||
group for a given time period.
|
||||
|
||||
* **latest_quantity**: latest `average` quantity if there are multiple quantities in a group
|
||||
for a given time period
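
As a quick illustration with hypothetical numbers, if `oldest_quantity` is 4000 and
`latest_quantity` is 3000 for the time period, the rate of change works out as follows:

```
rate_of_change (in percent) = ((4000 - 3000) / 4000) * 100 = 25
```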
|
||||
|
||||
*Other parameters*
|
||||
|
||||
* **aggregation_group_by_list**
|
||||
|
||||
List of fields to group by.
|
||||
|
||||
Possible values: any set of fields in record store data. Please also see [Special notation](#special-notation) below.
|
||||
|
||||
Example:
|
||||
|
||||
```
|
||||
"aggregation_group_by_list": ["tenant_id"]
|
||||
```
|
||||
* **usage_fetch_operation**
|
||||
|
||||
Operation to be performed on grouped data set
|
||||
|
||||
*Possible values:* "sum", "max", "min", "avg", "latest", "oldest"
|
||||
|
||||
* **aggregation_period**
|
||||
|
||||
Period to aggregate by.
|
||||
|
||||
*Possible values:* 'daily', 'hourly', 'minutely', 'secondly'.
|
||||
|
||||
Example:
|
||||
|
||||
```
|
||||
"aggregation_period": "hourly"
|
||||
```
|
||||
|
||||
* **filter_by_list**
|
||||
|
||||
Filter (include or exclude) record store data as specified
|
||||
|
||||
Example:
|
||||
|
||||
```
|
||||
filter_by_list": "[{"field_to_filter": "hostname",
|
||||
"filter_expression": "comp-(\d)+",
|
||||
"filter_operation": "include"}]
|
||||
```
|
||||
|
||||
OR
|
||||
|
||||
```
|
||||
filter_by_list": "[{"field_to_filter": "hostname",
|
||||
"filter_expression": "controller-(\d)+",
|
||||
"filter_operation": "exclude"}]
|
||||
```
|
||||
|
||||
|
||||
### Setter Components ###
|
||||
|
||||
All setter components implement a method
|
||||
|
||||
```
|
||||
def setter(transform_context, instance_usage_df):
|
||||
..
|
||||
..
|
||||
return instance_usage_df
|
||||
```
|
||||
|
||||
#### set_aggregated_metric_name ####
|
||||
|
||||
This component sets final aggregated metric name by setting `aggregated_metric_name` field in
|
||||
`instance_usage` data.
|
||||
|
||||
*Other parameters*
|
||||
|
||||
* **aggregated_metric_name**
|
||||
|
||||
Name of the aggregated metric being generated.
|
||||
|
||||
*Possible values:* any aggregated metric name. Convention is to end the metric name
|
||||
with "_agg".
|
||||
|
||||
Example:
|
||||
```
|
||||
"aggregated_metric_name":"cpu.total_logical_cores_agg"
|
||||
```
|
||||
|
||||
#### set_aggregated_period ####
|
||||
|
||||
This component sets the final aggregation period by setting the `aggregation_period` field in
|
||||
`instance_usage` data.
|
||||
|
||||
*Other parameters*
|
||||
|
||||
* **aggregation_period**
|
||||
|
||||
Period to set on the aggregated metric.
|
||||
|
||||
*Possible values:* 'daily', 'hourly', 'minutely', 'secondly'.
|
||||
|
||||
Example:
|
||||
```
|
||||
"aggregation_period": "hourly"
|
||||
```
|
||||
|
||||
**Note:** If you are publishing metrics to the *metrics_pre_hourly* kafka topic using the
|
||||
`insert_data_pre_hourly` component (see the *insert_data_pre_hourly* component below),
|
||||
`aggregation_period` will have to be set to `hourly`, since by default all data in the
|
||||
*metrics_pre_hourly* topic gets aggregated every hour by the `Pre Hourly Processor` (see the
|
||||
`Processors` section below).
|
||||
|
||||
#### rollup_quantity ####
|
||||
|
||||
This component groups `instance_usage` records by `setter_rollup_group_by_list`, sorts within each
|
||||
group by the timestamp field, and rolls up the quantity based on `setter_rollup_operation`.
|
||||
|
||||
*Other parameters*
|
||||
|
||||
* **setter_rollup_group_by_list**
|
||||
|
||||
List of fields to group by.
|
||||
|
||||
Possible values: any set of fields in record store data. Please also see [Special notation](#special-notation) below.
|
||||
|
||||
Example:
|
||||
```
|
||||
"setter_rollup_group_by_list": ["tenant_id"]
|
||||
```
|
||||
* **setter_rollup_operation**
|
||||
|
||||
Operation to be performed on grouped data set
|
||||
|
||||
*Possible values:* "sum", "max", "min", "avg"
|
||||
|
||||
Example:
|
||||
```
|
||||
"setter_fetch_operation": "avg"
|
||||
```
|
||||
|
||||
* **aggregation_period**
|
||||
|
||||
Period to aggregate by.
|
||||
|
||||
*Possible values:* 'daily', 'hourly', 'minutely', 'secondly'.
|
||||
|
||||
Example:
|
||||
|
||||
```
|
||||
"aggregation_period": "hourly"
|
||||
```
|
||||
|
||||
### Insert Components ###
|
||||
|
||||
All insert components implement a method
|
||||
|
||||
```
|
||||
def insert(transform_context, instance_usage_df):
|
||||
..
|
||||
..
|
||||
return instance_usage_df
|
||||
```
|
||||
|
||||
#### insert_data ####
|
||||
|
||||
This component converts `instance_usage` data into monasca metric format and writes the metric to
|
||||
`metrics` topic in kafka.
|
||||
|
||||
*Other parameters*
|
||||
|
||||
* **dimension_list**
|
||||
|
||||
List of fields in `instance_usage` data that should be converted to monasca metric dimensions.
|
||||
|
||||
*Possible values:* any fields in `instance_usage` data or use [Special notation](#special-notation) below.
|
||||
|
||||
Example:
|
||||
```
|
||||
"dimension_list":["aggregation_period","host","project_id"]
|
||||
```
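
For reference, a metric written by this component to the `metrics` topic has roughly the following
shape (the values below are illustrative; `value_meta`, `meta` and the timestamps are filled in by
the insert component from the instance usage data and the messaging configuration):

```
{"metric":{"name":"cpu.total_logical_cores_agg",
           "dimensions":{"aggregation_period":"hourly",
                         "host":"all",
                         "project_id":"all"},
           "timestamp":1456858016000,
           "value":24.0,
           "value_meta":{"record_count":12,
                         "firstrecord_timestamp_string":"2016-03-01 18:00:02",
                         "lastrecord_timestamp_string":"2016-03-01 18:59:48"}},
 "meta":{"tenantId":"d2cb21079930415a9f2a33588b9f2bb6",
         "region":"useast"},
 "creation_time":1456858034}
```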
|
||||
|
||||
#### insert_data_pre_hourly ####
|
||||
|
||||
This component converts `instance_usage` data into monasca metric format and writes the metric to
|
||||
`metrics_pre_hourly` topic in kafka.
|
||||
|
||||
*Other parameters*
|
||||
|
||||
* **dimension_list**
|
||||
|
||||
List of fields in `instance_usage` data that should be converted to monasca metric dimensions.
|
||||
|
||||
*Possible values:* any fields in `instance_usage` data
|
||||
|
||||
Example:
|
||||
```
|
||||
"dimension_list":["aggregation_period","host","project_id"]
|
||||
```
|
||||
|
||||
## Processors ##
|
||||
|
||||
Processors are special components that process data from a kafka topic at the desired time
|
||||
interval. These are different from generic aggregation components since they process data from a
|
||||
specific kafka topic.
|
||||
|
||||
All processor components implement the following methods
|
||||
|
||||
```
|
||||
def get_app_name(self):
|
||||
[...]
|
||||
return app_name
|
||||
|
||||
def is_time_to_run(self, current_time):
|
||||
if current_time > last_invoked + 1:
|
||||
return True
|
||||
else:
|
||||
return False
|
||||
|
||||
def run_processor(self, time):
|
||||
# do work...
|
||||
```
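
As an illustrative sketch only (the class below is hypothetical and is not the actual Pre Hourly
Processor implementation), a processor that runs at most once per hour could look like this:

```
import datetime


class ExampleHourlyProcessor(object):
    """Hypothetical processor sketch: runs at most once per hour."""

    def __init__(self):
        self.last_invoked = None

    def get_app_name(self):
        return "example_hourly_processor"

    def is_time_to_run(self, current_time):
        # run on the first invocation, or when at least an hour has passed
        if self.last_invoked is None:
            return True
        return current_time - self.last_invoked >= datetime.timedelta(hours=1)

    def run_processor(self, time):
        # a real processor would fetch and aggregate data from its kafka
        # topic here; this sketch only records when it last ran
        self.last_invoked = time
```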
|
||||
|
||||
### pre_hourly_processor ###
|
||||
|
||||
The Pre Hourly Processor runs every hour and aggregates `instance_usage` data published to the
|
||||
`metrics_pre_hourly` topic.
|
||||
|
||||
By default the Pre Hourly Processor is set to run 10 minutes after the top of the hour and processes
|
||||
data from the previous hour; for example, the 10:10 run aggregates data published between 09:00 and
10:00. `instance_usage` data is grouped by `pre_hourly_group_by_list`.
|
||||
|
||||
*Other parameters*
|
||||
|
||||
* **pre_hourly_group_by_list**
|
||||
|
||||
List of fields to group by.
|
||||
|
||||
Possible values: any set of fields in `instance_usage` data, or `default`. Please also see
|
||||
[Special notation](#special-notation) below.
|
||||
|
||||
Note: setting to `default` will group `instance_usage` data by `tenant_id`, `user_id`,
|
||||
`resource_uuid`, `geolocation`, `region`, `zone`, `host`, `project_id`,
|
||||
`aggregated_metric_name`, `aggregation_period`
|
||||
|
||||
Example:
|
||||
```
|
||||
"pre_hourly_group_by_list": ["tenant_id"]
|
||||
```
|
||||
|
||||
OR
|
||||
|
||||
```
|
||||
"pre_hourly_group_by_list": ["default"]
|
||||
```
|
||||
|
||||
* **pre_hourly_operation**
|
||||
|
||||
Operation to be performed on grouped data set.
|
||||
|
||||
*Possible values:* "sum", "max", "min", "avg", "rate"
|
||||
|
||||
Example:
|
||||
|
||||
```
|
||||
"pre_hourly_operation": "avg"
|
||||
```
|
||||
|
||||
|
||||
## Special notation ##
|
||||
|
||||
### pre_transform spec ###
|
||||
|
||||
To specify `required_raw_fields_list`, please use the special notation
|
||||
`dimensions#{$field_name}`, `meta#{$field_name}` or `value_meta#{$field_name}` to refer to any field in the
|
||||
dimensions, meta or value_meta fields of the incoming raw metric.
|
||||
|
||||
For example, if you want to check that for a particular metric a dimension called "pod_name" is
|
||||
present and non-empty, simply add `dimensions#pod_name` to the
|
||||
`required_raw_fields_list`.
|
||||
|
||||
Example `pre_transform` spec
|
||||
```
|
||||
{"event_processing_params":{"set_default_zone_to":"1",
|
||||
"set_default_geolocation_to":"1",
|
||||
"set_default_region_to":"W"},
|
||||
"event_type":"pod.net.in_bytes_sec",
|
||||
"metric_id_list":["pod_net_in_b_per_sec_per_namespace"],
|
||||
"required_raw_fields_list":["creation_time",
|
||||
"meta#tenantId",
|
||||
"dimensions#namespace",
|
||||
"dimensions#pod_name",
|
||||
"dimensions#app"]
|
||||
}
|
||||
```
|
||||
|
||||
### transform spec ###
|
||||
|
||||
To specify `aggregation_group_by_list`, `setter_rollup_group_by_list`, `pre_hourly_group_by_list` or
|
||||
`dimension_list`, you can also use the special notation `dimensions#{$field_name}`, `meta#{$field_name}`
|
||||
or `value_meta#{$field_name}` to refer to any field in the dimensions, meta or value_meta fields of the
|
||||
incoming raw metric.
|
||||
|
||||
For example, the following `transform_spec` will aggregate by the "app", "namespace" and "pod_name"
|
||||
dimensions, then roll up the aggregated data by the "namespace" dimension, and write the final
|
||||
aggregated metric with "app", "namespace" and "pod_name" dimensions. Note that "app" and "pod_name"
|
||||
will be set to "all", since the final rollup operation was based only on the "namespace" dimension.
|
||||
|
||||
```
|
||||
{
|
||||
"aggregation_params_map":{
|
||||
"aggregation_pipeline":{"source":"streaming",
|
||||
"usage":"fetch_quantity",
|
||||
"setters":["rollup_quantity",
|
||||
"set_aggregated_metric_name",
|
||||
"set_aggregated_period"],
|
||||
"insert":["prepare_data",
|
||||
"insert_data_pre_hourly"]},
|
||||
"aggregated_metric_name":"pod.net.in_bytes_sec_agg",
|
||||
"aggregation_period":"hourly",
|
||||
"aggregation_group_by_list": ["tenant_id",
|
||||
"dimensions#app",
|
||||
"dimensions#namespace",
|
||||
"dimensions#pod_name"],
|
||||
"usage_fetch_operation": "avg",
|
||||
"filter_by_list": [],
|
||||
"setter_rollup_group_by_list":["dimensions#namespace"],
|
||||
"setter_rollup_operation": "sum",
|
||||
"dimension_list":["aggregation_period",
|
||||
"dimensions#app",
|
||||
"dimensions#namespace",
|
||||
"dimensions#pod_name"],
|
||||
"pre_hourly_operation":"avg",
|
||||
"pre_hourly_group_by_list":["aggregation_period",
|
||||
"dimensions#namespace]'"]},
|
||||
"metric_group":"pod_net_in_b_per_sec_per_namespace",
|
||||
"metric_id":"pod_net_in_b_per_sec_per_namespace"}
|
||||
```
|
||||
|
||||
# Putting it all together
|
||||
Please refer to [Create a new aggregation pipeline](create-new-aggregation-pipeline.md) document to
|
||||
create a new aggregation pipeline.
|
@ -1,89 +0,0 @@
|
||||
[DEFAULTS]
|
||||
|
||||
[repositories]
|
||||
offsets = monasca_transform.mysql_offset_specs:MySQLOffsetSpecs
|
||||
data_driven_specs = monasca_transform.data_driven_specs.mysql_data_driven_specs_repo:MySQLDataDrivenSpecsRepo
|
||||
offsets_max_revisions = 10
|
||||
|
||||
[database]
|
||||
server_type = mysql:thin
|
||||
host = localhost
|
||||
database_name = monasca_transform
|
||||
username = m-transform
|
||||
password = password
|
||||
|
||||
[messaging]
|
||||
adapter = monasca_transform.messaging.adapter:KafkaMessageAdapter
|
||||
topic = metrics
|
||||
brokers = localhost:9092
|
||||
publish_kafka_project_id = d2cb21079930415a9f2a33588b9f2bb6
|
||||
publish_region = useast
|
||||
adapter_pre_hourly = monasca_transform.messaging.adapter:KafkaMessageAdapterPreHourly
|
||||
topic_pre_hourly = metrics_pre_hourly
|
||||
|
||||
[stage_processors]
|
||||
enable_pre_hourly_processor = True
|
||||
|
||||
[pre_hourly_processor]
|
||||
enable_instance_usage_df_cache = True
|
||||
instance_usage_df_cache_storage_level = MEMORY_ONLY_SER_2
|
||||
enable_batch_time_filtering = True
|
||||
effective_batch_revision=2
|
||||
|
||||
#
|
||||
# Configurable values for the monasca-transform service
|
||||
#
|
||||
[service]
|
||||
|
||||
# The address of the mechanism being used for election coordination
|
||||
coordinator_address = kazoo://localhost:2181
|
||||
|
||||
# The name of the coordination/election group
|
||||
coordinator_group = monasca-transform
|
||||
|
||||
# How long the candidate should sleep between election result
|
||||
# queries (in seconds)
|
||||
election_polling_frequency = 15
|
||||
|
||||
# Whether debug-level log entries should be included in the application
|
||||
# log. If this setting is false, info-level will be used for logging.
|
||||
enable_debug_log_entries = true
|
||||
|
||||
# The path for the setup file to be executed
|
||||
setup_file = /opt/stack/monasca-transform/setup.py
|
||||
|
||||
# The target of the setup file
|
||||
setup_target = bdist_egg
|
||||
|
||||
# The path for the monasca-transform Spark driver
|
||||
spark_driver = /opt/stack/monasca-transform/monasca_transform/driver/mon_metrics_kafka.py
|
||||
|
||||
# the location for the transform-service log
|
||||
service_log_path=/var/log/monasca/transform/
|
||||
|
||||
# the filename for the transform-service log
|
||||
service_log_filename=monasca-transform.log
|
||||
|
||||
# Whether Spark event logging should be enabled (true/false)
|
||||
spark_event_logging_enabled = true
|
||||
|
||||
# A list of jars which Spark should use
|
||||
spark_jars_list = /opt/spark/current/assembly/target/scala-2.10/jars/spark-streaming-kafka-0-8_2.10-2.1.1.jar,/opt/spark/current/assembly/target/scala-2.10/jars/scala-library-2.10.6.jar,/opt/spark/current/assembly/target/scala-2.10/jars/kafka_2.10-0.8.1.1.jar,/opt/spark/current/assembly/target/scala-2.10/jars/metrics-core-2.2.0.jar,/opt/spark/current/assembly/target/scala-2.10/jars/drizzle-jdbc-1.3.jar
|
||||
|
||||
# A list of where the Spark master(s) should run
|
||||
spark_master_list = spark://localhost:7077
|
||||
|
||||
# spark_home for the environment
|
||||
spark_home = /opt/spark/current
|
||||
|
||||
# Python files for Spark to use
|
||||
spark_python_files = /opt/stack/monasca-transform/dist/monasca_transform-0.0.1.egg
|
||||
|
||||
# How often the stream should be read (in seconds)
|
||||
stream_interval = 600
|
||||
|
||||
# The working directory for monasca-transform
|
||||
work_dir = /opt/stack/monasca-transform
|
||||
|
||||
enable_record_store_df_cache = True
|
||||
record_store_df_cache_storage_level = MEMORY_ONLY_SER_2
|
@ -1,84 +0,0 @@
|
||||
alabaster==0.7.10
|
||||
Babel==2.5.3
|
||||
certifi==2018.1.18
|
||||
chardet==3.0.4
|
||||
cliff==2.11.0
|
||||
cmd2==0.8.1
|
||||
contextlib2==0.5.5
|
||||
coverage==4.0
|
||||
debtcollector==1.19.0
|
||||
docutils==0.14
|
||||
enum-compat==0.0.2
|
||||
eventlet==0.20.0
|
||||
extras==1.0.0
|
||||
fasteners==0.14.1
|
||||
fixtures==3.0.0
|
||||
flake8==2.5.5
|
||||
future==0.16.0
|
||||
futurist==1.6.0
|
||||
greenlet==0.4.13
|
||||
hacking==1.1.0
|
||||
idna==2.6
|
||||
imagesize==1.0.0
|
||||
iso8601==0.1.12
|
||||
Jinja2==2.10
|
||||
kazoo==2.4.0
|
||||
linecache2==1.0.0
|
||||
MarkupSafe==1.0
|
||||
mccabe==0.2.1
|
||||
monasca-common==2.7.0
|
||||
monotonic==1.4
|
||||
msgpack==0.5.6
|
||||
netaddr==0.7.19
|
||||
netifaces==0.10.6
|
||||
nose==1.3.7
|
||||
os-testr==1.0.0
|
||||
oslo.concurrency==3.26.0
|
||||
oslo.config==5.2.0
|
||||
oslo.context==2.20.0
|
||||
oslo.i18n==3.20.0
|
||||
oslo.log==3.36.0
|
||||
oslo.policy==1.34.0
|
||||
oslo.serialization==2.25.0
|
||||
oslo.service==1.24.0
|
||||
oslo.utils==3.36.0
|
||||
Paste==2.0.3
|
||||
PasteDeploy==1.5.2
|
||||
pbr==2.0.0
|
||||
pep8==1.5.7
|
||||
prettytable==0.7.2
|
||||
psutil==3.2.2
|
||||
pycodestyle==2.5.0
|
||||
pyflakes==0.8.1
|
||||
Pygments==2.2.0
|
||||
pyinotify==0.9.6
|
||||
PyMySQL==0.7.6
|
||||
pyparsing==2.2.0
|
||||
pyperclip==1.6.0
|
||||
python-dateutil==2.7.0
|
||||
python-mimeparse==1.6.0
|
||||
python-subunit==1.2.0
|
||||
pytz==2018.3
|
||||
PyYAML==3.12
|
||||
repoze.lru==0.7
|
||||
requests==2.18.4
|
||||
rfc3986==1.1.0
|
||||
Routes==2.4.1
|
||||
six==1.10.0
|
||||
snowballstemmer==1.2.1
|
||||
Sphinx==1.6.2
|
||||
sphinxcontrib-websupport==1.0.1
|
||||
SQLAlchemy==1.0.10
|
||||
stestr==2.0.0
|
||||
stevedore==1.20.0
|
||||
tabulate==0.8.2
|
||||
tenacity==4.9.0
|
||||
testtools==2.3.0
|
||||
tooz==1.58.0
|
||||
traceback2==1.4.0
|
||||
ujson==1.35
|
||||
unittest2==1.1.0
|
||||
urllib3==1.22
|
||||
voluptuous==0.11.1
|
||||
WebOb==1.7.4
|
||||
wrapt==1.10.11
|
@ -1,41 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from collections import namedtuple
|
||||
|
||||
|
||||
class Component(object):
|
||||
|
||||
SOURCE_COMPONENT_TYPE = "source"
|
||||
USAGE_COMPONENT_TYPE = "usage"
|
||||
SETTER_COMPONENT_TYPE = "setter"
|
||||
INSERT_COMPONENT_TYPE = "insert"
|
||||
|
||||
DEFAULT_UNAVAILABLE_VALUE = "NA"
|
||||
|
||||
|
||||
InstanceUsageDataAggParamsBase = namedtuple('InstanceUsageDataAggParams',
|
||||
['instance_usage_data',
|
||||
'agg_params'])
|
||||
|
||||
|
||||
class InstanceUsageDataAggParams(InstanceUsageDataAggParamsBase):
|
||||
"""A tuple which is a wrapper containing the instance usage data and aggregation params
|
||||
|
||||
    namedtuple contains:
|
||||
|
||||
instance_usage_data - instance usage
|
||||
agg_params - aggregation params dict
|
||||
|
||||
"""
|
@ -1,50 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
import logging
|
||||
|
||||
LOG = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class ComponentUtils(object):
|
||||
|
||||
@staticmethod
|
||||
def _get_group_by_period_list(aggregation_period):
|
||||
"""get a list of columns for an aggregation period."""
|
||||
group_by_period_list = []
|
||||
if (aggregation_period == "daily"):
|
||||
group_by_period_list = ["event_date"]
|
||||
elif (aggregation_period == "hourly"):
|
||||
group_by_period_list = ["event_date", "event_hour"]
|
||||
elif (aggregation_period == "minutely"):
|
||||
group_by_period_list = ["event_date", "event_hour", "event_minute"]
|
||||
elif (aggregation_period == "secondly"):
|
||||
group_by_period_list = ["event_date", "event_hour",
|
||||
"event_minute", "event_second"]
|
||||
return group_by_period_list
|
||||
|
||||
@staticmethod
|
||||
def _get_instance_group_by_period_list(aggregation_period):
|
||||
"""get a list of columns for an aggregation period."""
|
||||
group_by_period_list = []
|
||||
if (aggregation_period == "daily"):
|
||||
group_by_period_list = ["usage_date"]
|
||||
elif (aggregation_period == "hourly"):
|
||||
group_by_period_list = ["usage_date", "usage_hour"]
|
||||
elif (aggregation_period == "minutely"):
|
||||
group_by_period_list = ["usage_date", "usage_hour", "usage_minute"]
|
||||
elif (aggregation_period == "secondly"):
|
||||
group_by_period_list = ["usage_date", "usage_hour",
|
||||
"usage_minute", "usage_second"]
|
||||
return group_by_period_list
|
@ -1,204 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
import abc
|
||||
import json
|
||||
import time
|
||||
|
||||
from monasca_common.validation import metrics as metric_validator
|
||||
from monasca_transform.component import Component
|
||||
from monasca_transform.config.config_initializer import ConfigInitializer
|
||||
from monasca_transform.log_utils import LogUtils
|
||||
from monasca_transform.transform.transform_utils import InstanceUsageUtils
|
||||
|
||||
from oslo_config import cfg
|
||||
|
||||
ConfigInitializer.basic_config()
|
||||
log = LogUtils.init_logger(__name__)
|
||||
|
||||
|
||||
class InsertComponent(Component):
|
||||
|
||||
@abc.abstractmethod
|
||||
def insert(transform_context, instance_usage_df):
|
||||
raise NotImplementedError(
|
||||
"Class %s doesn't implement setter(instance_usage_df,"
|
||||
" transform_spec_df)"
|
||||
% __name__)
|
||||
|
||||
@staticmethod
|
||||
def get_component_type():
|
||||
return Component.INSERT_COMPONENT_TYPE
|
||||
|
||||
@staticmethod
|
||||
def _validate_metric(metric):
|
||||
"""validate monasca metric."""
|
||||
try:
|
||||
# validate metric part, without the wrapper
|
||||
metric_validator.validate(metric["metric"])
|
||||
except Exception as e:
|
||||
log.info("Metric %s is invalid: Exception : %s"
|
||||
% (json.dumps(metric), str(e)))
|
||||
return False
|
||||
return True
|
||||
|
||||
@staticmethod
|
||||
def _prepare_metric(instance_usage_dict, agg_params):
|
||||
"""transform instance usage rdd to a monasca metric.
|
||||
|
||||
example metric:
|
||||
|
||||
{"metric":{"name":"host_alive_status",
|
||||
"dimensions":{"hostname":"mini-mon",
|
||||
"observer_host":"devstack",
|
||||
"test_type":"ssh"},
|
||||
"timestamp":1456858016000,
|
||||
"value":1.0,
|
||||
"value_meta":{"error":
|
||||
"Unable to open socket to host mini-mon"}
|
||||
},
|
||||
"meta":{"tenantId":"8eadcf71fc5441d8956cb9cbb691704e",
|
||||
"region":"useast"},
|
||||
"creation_time":1456858034
|
||||
}
|
||||
|
||||
"""
|
||||
|
||||
current_epoch_seconds = time.time()
|
||||
current_epoch_milliseconds = current_epoch_seconds * 1000
|
||||
|
||||
log.debug(instance_usage_dict)
|
||||
|
||||
# extract dimensions
|
||||
dimension_list = agg_params["dimension_list"]
|
||||
dimensions_part = InstanceUsageUtils.extract_dimensions(instance_usage_dict,
|
||||
dimension_list)
|
||||
|
||||
meta_part = {}
|
||||
|
||||
# TODO(someone) determine the appropriate tenant ID to use. For now,
|
||||
# what works is to use the same tenant ID as other metrics specify in
|
||||
# their kafka messages (and this appears to change each time mini-mon
|
||||
# is re-installed). The long term solution is to have HLM provide
|
||||
# a usable tenant ID to us in a configurable way. BTW, without a
|
||||
# proper/valid tenant ID, aggregated metrics don't get persisted
|
||||
# to the Monasca DB.
|
||||
meta_part["tenantId"] = cfg.CONF.messaging.publish_kafka_project_id
|
||||
meta_part["region"] = cfg.CONF.messaging.publish_region
|
||||
|
||||
value_meta_part = {"record_count": instance_usage_dict.get(
|
||||
"record_count", 0),
|
||||
"firstrecord_timestamp_string":
|
||||
instance_usage_dict.get(
|
||||
"firstrecord_timestamp_string",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE),
|
||||
"lastrecord_timestamp_string":
|
||||
instance_usage_dict.get(
|
||||
"lastrecord_timestamp_string",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)}
|
||||
|
||||
metric_part = {"name": instance_usage_dict.get(
|
||||
"aggregated_metric_name"),
|
||||
"dimensions": dimensions_part,
|
||||
"timestamp": int(current_epoch_milliseconds),
|
||||
"value": instance_usage_dict.get(
|
||||
"quantity", 0.0),
|
||||
"value_meta": value_meta_part}
|
||||
|
||||
metric = {"metric": metric_part,
|
||||
"meta": meta_part,
|
||||
"creation_time": int(current_epoch_seconds)}
|
||||
|
||||
log.debug(metric)
|
||||
|
||||
return metric
|
||||
|
||||
@staticmethod
|
||||
def _get_metric(row, agg_params):
|
||||
"""write data to kafka. extracts and formats metric data and write s the data to kafka"""
|
||||
instance_usage_dict = {"tenant_id": row.tenant_id,
|
||||
"user_id": row.user_id,
|
||||
"resource_uuid": row.resource_uuid,
|
||||
"geolocation": row.geolocation,
|
||||
"region": row.region,
|
||||
"zone": row.zone,
|
||||
"host": row.host,
|
||||
"project_id": row.project_id,
|
||||
"aggregated_metric_name":
|
||||
row.aggregated_metric_name,
|
||||
"quantity": row.quantity,
|
||||
"firstrecord_timestamp_string":
|
||||
row.firstrecord_timestamp_string,
|
||||
"lastrecord_timestamp_string":
|
||||
row.lastrecord_timestamp_string,
|
||||
"record_count": row.record_count,
|
||||
"usage_date": row.usage_date,
|
||||
"usage_hour": row.usage_hour,
|
||||
"usage_minute": row.usage_minute,
|
||||
"aggregation_period":
|
||||
row.aggregation_period,
|
||||
"extra_data_map":
|
||||
row.extra_data_map}
|
||||
metric = InsertComponent._prepare_metric(instance_usage_dict,
|
||||
agg_params)
|
||||
return metric
|
||||
|
||||
@staticmethod
|
||||
def _get_instance_usage_pre_hourly(row,
|
||||
metric_id):
|
||||
"""write data to kafka. extracts and formats metric data and writes the data to kafka"""
|
||||
# retrieve the processing meta from the row
|
||||
processing_meta = row.processing_meta
|
||||
|
||||
# add transform spec metric id to the processing meta
|
||||
if processing_meta:
|
||||
processing_meta["metric_id"] = metric_id
|
||||
else:
|
||||
processing_meta = {"metric_id": metric_id}
|
||||
|
||||
instance_usage_dict = {"tenant_id": row.tenant_id,
|
||||
"user_id": row.user_id,
|
||||
"resource_uuid": row.resource_uuid,
|
||||
"geolocation": row.geolocation,
|
||||
"region": row.region,
|
||||
"zone": row.zone,
|
||||
"host": row.host,
|
||||
"project_id": row.project_id,
|
||||
"aggregated_metric_name":
|
||||
row.aggregated_metric_name,
|
||||
"quantity": row.quantity,
|
||||
"firstrecord_timestamp_string":
|
||||
row.firstrecord_timestamp_string,
|
||||
"lastrecord_timestamp_string":
|
||||
row.lastrecord_timestamp_string,
|
||||
"firstrecord_timestamp_unix":
|
||||
row.firstrecord_timestamp_unix,
|
||||
"lastrecord_timestamp_unix":
|
||||
row.lastrecord_timestamp_unix,
|
||||
"record_count": row.record_count,
|
||||
"usage_date": row.usage_date,
|
||||
"usage_hour": row.usage_hour,
|
||||
"usage_minute": row.usage_minute,
|
||||
"aggregation_period":
|
||||
row.aggregation_period,
|
||||
"processing_meta": processing_meta,
|
||||
"extra_data_map": row.extra_data_map}
|
||||
return instance_usage_dict
|
||||
|
||||
@staticmethod
|
||||
def _write_metrics_from_partition(partlistiter):
|
||||
"""iterate through all rdd elements in partition and write metrics to kafka"""
|
||||
for part in partlistiter:
|
||||
agg_params = part.agg_params
|
||||
row = part.instance_usage_data
|
||||
InsertComponent._write_metric(row, agg_params)
|
@ -1,65 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from monasca_transform.component.insert import InsertComponent
|
||||
from monasca_transform.config.config_initializer import ConfigInitializer
|
||||
from monasca_transform.messaging.adapter import KafkaMessageAdapter
|
||||
|
||||
|
||||
class KafkaInsert(InsertComponent):
|
||||
"""Insert component that writes instance usage data to kafka queue"""
|
||||
|
||||
@staticmethod
|
||||
def insert(transform_context, instance_usage_df):
|
||||
"""write instance usage data to kafka"""
|
||||
|
||||
# object to init config
|
||||
ConfigInitializer.basic_config()
|
||||
|
||||
transform_spec_df = transform_context.transform_spec_df_info
|
||||
|
||||
agg_params = transform_spec_df.select(
|
||||
"aggregation_params_map.dimension_list").collect()[0].asDict()
|
||||
|
||||
# Approach # 1
|
||||
# using foreachPartition to iterate through elements in an
|
||||
# RDD is the recommended approach so as to not overwhelm kafka with the
|
||||
# zillion connections (but in our case the MessageAdapter does
|
||||
# store the adapter_impl so we should not create many producers)
|
||||
|
||||
# using foreachpartitions was causing some serialization/cpickle
|
||||
# problems where few libs like kafka.SimpleProducer and oslo_config.cfg
|
||||
# were not available in foreachPartition method
|
||||
#
|
||||
# removing _write_metrics_from_partition for now in favor of
|
||||
# Approach # 2
|
||||
#
|
||||
|
||||
# instance_usage_df_agg_params = instance_usage_df.rdd.map(
|
||||
# lambda x: InstanceUsageDataAggParams(x,
|
||||
# agg_params))
|
||||
# instance_usage_df_agg_params.foreachPartition(
|
||||
# DummyInsert._write_metrics_from_partition)
|
||||
|
||||
# Approach # 2
|
||||
# using collect() to fetch all elements of an RDD and write to
|
||||
# kafka
|
||||
|
||||
for instance_usage_row in instance_usage_df.collect():
|
||||
metric = InsertComponent._get_metric(
|
||||
instance_usage_row, agg_params)
|
||||
# validate metric part
|
||||
if InsertComponent._validate_metric(metric):
|
||||
KafkaMessageAdapter.send_metric(metric)
|
||||
return instance_usage_df
|
@ -1,44 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from monasca_transform.component.insert import InsertComponent
|
||||
from monasca_transform.config.config_initializer import ConfigInitializer
|
||||
from monasca_transform.messaging.adapter import KafkaMessageAdapterPreHourly
|
||||
|
||||
|
||||
class KafkaInsertPreHourly(InsertComponent):
|
||||
"""Insert component that writes instance usage data to kafka queue"""
|
||||
|
||||
@staticmethod
|
||||
def insert(transform_context, instance_usage_df):
|
||||
"""write instance usage data to kafka"""
|
||||
|
||||
# object to init config
|
||||
ConfigInitializer.basic_config()
|
||||
|
||||
transform_spec_df = transform_context.transform_spec_df_info
|
||||
|
||||
agg_params = transform_spec_df.select(
|
||||
"metric_id").\
|
||||
collect()[0].asDict()
|
||||
metric_id = agg_params["metric_id"]
|
||||
|
||||
for instance_usage_row in instance_usage_df.collect():
|
||||
instance_usage_dict = \
|
||||
InsertComponent._get_instance_usage_pre_hourly(
|
||||
instance_usage_row,
|
||||
metric_id)
|
||||
KafkaMessageAdapterPreHourly.send_metric(instance_usage_dict)
|
||||
|
||||
return instance_usage_df
|
@ -1,28 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from monasca_transform.component.insert import InsertComponent
|
||||
|
||||
|
||||
class PrepareData(InsertComponent):
|
||||
"""prepare for insert component validates instance usage data before calling Insert component"""
|
||||
@staticmethod
|
||||
def insert(transform_context, instance_usage_df):
|
||||
"""write instance usage data to kafka"""
|
||||
|
||||
#
|
||||
# TODO(someone) add instance usage data validation
|
||||
#
|
||||
|
||||
return instance_usage_df
|
@ -1,31 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
import abc
|
||||
from monasca_transform.component import Component
|
||||
|
||||
|
||||
class SetterComponent(Component):
|
||||
|
||||
@abc.abstractmethod
|
||||
def setter(transform_context, instance_usage_df):
|
||||
raise NotImplementedError(
|
||||
"Class %s doesn't implement setter(instance_usage_df,"
|
||||
" transform_context)"
|
||||
% __name__)
|
||||
|
||||
@staticmethod
|
||||
def get_component_type():
|
||||
"""get component type."""
|
||||
return Component.SETTER_COMPONENT_TYPE
|
@ -1,125 +0,0 @@
|
||||
# (c) Copyright 2016 Hewlett Packard Enterprise Development LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from pyspark.sql import functions
|
||||
from pyspark.sql import SQLContext
|
||||
|
||||
from monasca_transform.component.setter import SetterComponent
|
||||
from monasca_transform.transform.transform_utils import InstanceUsageUtils
|
||||
|
||||
import json
|
||||
|
||||
|
||||
class PreHourlyCalculateRateException(Exception):
|
||||
"""Exception thrown when doing pre-hourly rate calculations
|
||||
|
||||
Attributes:
|
||||
value: string representing the error
|
||||
"""
|
||||
|
||||
def __init__(self, value):
|
||||
self.value = value
|
||||
|
||||
def __str__(self):
|
||||
return repr(self.value)
|
||||
|
||||
|
||||
class PreHourlyCalculateRate(SetterComponent):
|
||||
|
||||
@staticmethod
|
||||
def _calculate_rate(instance_usage_df):
|
||||
instance_usage_data_json_list = []
|
||||
|
||||
try:
|
||||
sorted_oldest_ascending_df = instance_usage_df.sort(
|
||||
functions.asc("processing_meta.oldest_timestamp_string"))
|
||||
|
||||
sorted_latest_descending_df = instance_usage_df.sort(
|
||||
functions.desc("processing_meta.latest_timestamp_string"))
|
||||
|
||||
# Calculate the rate change by percentage
|
||||
oldest_dict = sorted_oldest_ascending_df.collect()[0].asDict()
|
||||
oldest_quantity = float(oldest_dict[
|
||||
"processing_meta"]["oldest_quantity"])
|
||||
|
||||
latest_dict = sorted_latest_descending_df.collect()[0].asDict()
|
||||
latest_quantity = float(latest_dict[
|
||||
"processing_meta"]["latest_quantity"])
|
||||
|
||||
rate_percentage = 100 * (
|
||||
(oldest_quantity - latest_quantity) / oldest_quantity)
|
||||
|
||||
# get any extra data
|
||||
extra_data_map = getattr(sorted_oldest_ascending_df.collect()[0],
|
||||
"extra_data_map", {})
|
||||
except Exception as e:
|
||||
raise PreHourlyCalculateRateException(
|
||||
"Exception occurred in pre-hourly rate calculation. Error: %s"
|
||||
% str(e))
|
||||
# create a new instance usage dict
|
||||
instance_usage_dict = {"tenant_id":
|
||||
latest_dict.get("tenant_id", "all"),
|
||||
"user_id":
|
||||
latest_dict.get("user_id", "all"),
|
||||
"resource_uuid":
|
||||
latest_dict.get("resource_uuid", "all"),
|
||||
"geolocation":
|
||||
latest_dict.get("geolocation", "all"),
|
||||
"region":
|
||||
latest_dict.get("region", "all"),
|
||||
"zone":
|
||||
latest_dict.get("zone", "all"),
|
||||
"host":
|
||||
latest_dict.get("host", "all"),
|
||||
"project_id":
|
||||
latest_dict.get("project_id", "all"),
|
||||
"aggregated_metric_name":
|
||||
latest_dict["aggregated_metric_name"],
|
||||
"quantity": rate_percentage,
|
||||
"firstrecord_timestamp_unix":
|
||||
oldest_dict["firstrecord_timestamp_unix"],
|
||||
"firstrecord_timestamp_string":
|
||||
oldest_dict["firstrecord_timestamp_string"],
|
||||
"lastrecord_timestamp_unix":
|
||||
latest_dict["lastrecord_timestamp_unix"],
|
||||
"lastrecord_timestamp_string":
|
||||
latest_dict["lastrecord_timestamp_string"],
|
||||
"record_count": oldest_dict["record_count"] +
|
||||
latest_dict["record_count"],
|
||||
"usage_date": latest_dict["usage_date"],
|
||||
"usage_hour": latest_dict["usage_hour"],
|
||||
"usage_minute": latest_dict["usage_minute"],
|
||||
"aggregation_period":
|
||||
latest_dict["aggregation_period"],
|
||||
"extra_data_map": extra_data_map
|
||||
}
|
||||
|
||||
instance_usage_data_json = json.dumps(instance_usage_dict)
|
||||
instance_usage_data_json_list.append(instance_usage_data_json)
|
||||
|
||||
# convert to rdd
|
||||
spark_context = instance_usage_df.rdd.context
|
||||
return spark_context.parallelize(instance_usage_data_json_list)
|
||||
|
||||
@staticmethod
|
||||
def do_rate_calculation(instance_usage_df):
|
||||
instance_usage_json_rdd = PreHourlyCalculateRate._calculate_rate(
|
||||
instance_usage_df)
|
||||
|
||||
sql_context = SQLContext.getOrCreate(instance_usage_df.rdd.context)
|
||||
instance_usage_trans_df = InstanceUsageUtils.create_df_from_json_rdd(
|
||||
sql_context,
|
||||
instance_usage_json_rdd)
|
||||
|
||||
return instance_usage_trans_df
|
@ -1,261 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from pyspark.sql import SQLContext
|
||||
|
||||
import datetime
|
||||
|
||||
from monasca_transform.component import Component
|
||||
from monasca_transform.component.component_utils import ComponentUtils
|
||||
from monasca_transform.component.setter import SetterComponent
|
||||
from monasca_transform.transform.transform_utils import InstanceUsageUtils
|
||||
|
||||
import json
|
||||
|
||||
|
||||
class RollupQuantityException(Exception):
|
||||
"""Exception thrown when doing quantity rollup
|
||||
|
||||
Attributes:
|
||||
value: string representing the error
|
||||
"""
|
||||
|
||||
def __init__(self, value):
|
||||
self.value = value
|
||||
|
||||
def __str__(self):
|
||||
return repr(self.value)
|
||||
|
||||
|
||||
class RollupQuantity(SetterComponent):
|
||||
|
||||
@staticmethod
|
||||
def _supported_rollup_operations():
|
||||
return ["sum", "max", "min", "avg"]
|
||||
|
||||
@staticmethod
|
||||
def _is_valid_rollup_operation(operation):
|
||||
if operation in RollupQuantity._supported_rollup_operations():
|
||||
return True
|
||||
else:
|
||||
return False
|
||||
|
||||
@staticmethod
|
||||
def _rollup_quantity(instance_usage_df,
|
||||
setter_rollup_group_by_list,
|
||||
setter_rollup_operation):
|
||||
|
||||
instance_usage_data_json_list = []
|
||||
|
||||
# check if operation is valid
|
||||
if not RollupQuantity.\
|
||||
_is_valid_rollup_operation(setter_rollup_operation):
|
||||
raise RollupQuantityException(
|
||||
"Operation %s is not supported" % setter_rollup_operation)
|
||||
|
||||
# call required operation on grouped data
|
||||
# e.g. sum, max, min, avg etc
|
||||
agg_operations_map = {
|
||||
"quantity": str(setter_rollup_operation),
|
||||
"firstrecord_timestamp_unix": "min",
|
||||
"lastrecord_timestamp_unix": "max",
|
||||
"record_count": "sum"}
|
||||
|
||||
# do a group by
|
||||
grouped_data = instance_usage_df.groupBy(
|
||||
*setter_rollup_group_by_list)
|
||||
rollup_df = grouped_data.agg(agg_operations_map)
|
||||
|
||||
for row in rollup_df.collect():
|
||||
|
||||
# first record timestamp
|
||||
earliest_record_timestamp_unix = getattr(
|
||||
row, "min(firstrecord_timestamp_unix)",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
earliest_record_timestamp_string = \
|
||||
datetime.datetime.utcfromtimestamp(
|
||||
earliest_record_timestamp_unix).strftime(
|
||||
'%Y-%m-%d %H:%M:%S')
|
||||
|
||||
# last record_timestamp
|
||||
latest_record_timestamp_unix = getattr(
|
||||
row, "max(lastrecord_timestamp_unix)",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
latest_record_timestamp_string = \
|
||||
datetime.datetime.utcfromtimestamp(
|
||||
latest_record_timestamp_unix).strftime('%Y-%m-%d %H:%M:%S')
|
||||
|
||||
# record count
|
||||
record_count = getattr(row, "sum(record_count)", 0.0)
|
||||
|
||||
# quantity
|
||||
# get expression that will be used to select quantity
|
||||
# from rolled up data
|
||||
select_quant_str = "".join((setter_rollup_operation, "(quantity)"))
|
||||
quantity = getattr(row, select_quant_str, 0.0)
|
||||
|
||||
try:
|
||||
processing_meta = row.processing_meta
|
||||
except AttributeError:
|
||||
processing_meta = {}
|
||||
|
||||
# create a column name, value pairs from grouped data
|
||||
extra_data_map = InstanceUsageUtils.grouped_data_to_map(row,
|
||||
setter_rollup_group_by_list)
|
||||
|
||||
# convert column names, so that values can be accessed by components
|
||||
# later in the pipeline
|
||||
extra_data_map = InstanceUsageUtils.prepare_extra_data_map(extra_data_map)
|
||||
|
||||
# create a new instance usage dict
|
||||
instance_usage_dict = {"tenant_id": getattr(row, "tenant_id",
|
||||
"all"),
|
||||
"user_id":
|
||||
getattr(row, "user_id", "all"),
|
||||
"resource_uuid":
|
||||
getattr(row, "resource_uuid", "all"),
|
||||
"geolocation":
|
||||
getattr(row, "geolocation", "all"),
|
||||
"region":
|
||||
getattr(row, "region", "all"),
|
||||
"zone":
|
||||
getattr(row, "zone", "all"),
|
||||
"host":
|
||||
getattr(row, "host", "all"),
|
||||
"project_id":
|
||||
getattr(row, "tenant_id", "all"),
|
||||
"aggregated_metric_name":
|
||||
getattr(row, "aggregated_metric_name",
|
||||
"all"),
|
||||
"quantity":
|
||||
quantity,
|
||||
"firstrecord_timestamp_unix":
|
||||
earliest_record_timestamp_unix,
|
||||
"firstrecord_timestamp_string":
|
||||
earliest_record_timestamp_string,
|
||||
"lastrecord_timestamp_unix":
|
||||
latest_record_timestamp_unix,
|
||||
"lastrecord_timestamp_string":
|
||||
latest_record_timestamp_string,
|
||||
"record_count": record_count,
|
||||
"usage_date":
|
||||
getattr(row, "usage_date", "all"),
|
||||
"usage_hour":
|
||||
getattr(row, "usage_hour", "all"),
|
||||
"usage_minute":
|
||||
getattr(row, "usage_minute", "all"),
|
||||
"aggregation_period":
|
||||
getattr(row, "aggregation_period",
|
||||
"all"),
|
||||
"processing_meta": processing_meta,
|
||||
"extra_data_map": extra_data_map
|
||||
}
|
||||
|
||||
instance_usage_data_json = json.dumps(instance_usage_dict)
|
||||
instance_usage_data_json_list.append(instance_usage_data_json)
|
||||
|
||||
# convert to rdd
|
||||
spark_context = instance_usage_df.rdd.context
|
||||
return spark_context.parallelize(instance_usage_data_json_list)
|
||||
|
||||
@staticmethod
|
||||
def setter(transform_context, instance_usage_df):
|
||||
|
||||
transform_spec_df = transform_context.transform_spec_df_info
|
||||
|
||||
# get rollup operation (sum, max, avg, min)
|
||||
agg_params = transform_spec_df.select(
|
||||
"aggregation_params_map.setter_rollup_operation").\
|
||||
collect()[0].asDict()
|
||||
setter_rollup_operation = agg_params["setter_rollup_operation"]
|
||||
|
||||
instance_usage_trans_df = RollupQuantity.setter_by_operation(
|
||||
transform_context,
|
||||
instance_usage_df,
|
||||
setter_rollup_operation)
|
||||
|
||||
return instance_usage_trans_df
|
||||
|
||||
@staticmethod
|
||||
def setter_by_operation(transform_context, instance_usage_df,
|
||||
setter_rollup_operation):
|
||||
|
||||
transform_spec_df = transform_context.transform_spec_df_info
|
||||
|
||||
# get fields we want to group by for a rollup
|
||||
agg_params = transform_spec_df.select(
|
||||
"aggregation_params_map.setter_rollup_group_by_list"). \
|
||||
collect()[0].asDict()
|
||||
setter_rollup_group_by_list = agg_params["setter_rollup_group_by_list"]
|
||||
|
||||
# get aggregation period
|
||||
agg_params = transform_spec_df.select(
|
||||
"aggregation_params_map.aggregation_period").collect()[0].asDict()
|
||||
aggregation_period = agg_params["aggregation_period"]
|
||||
group_by_period_list = \
|
||||
ComponentUtils._get_instance_group_by_period_list(
|
||||
aggregation_period)
|
||||
|
||||
# group by columns list
|
||||
group_by_columns_list = \
|
||||
group_by_period_list + setter_rollup_group_by_list
|
||||
|
||||
# prepare for group by
|
||||
group_by_columns_list = InstanceUsageUtils.prepare_instance_usage_group_by_list(
|
||||
group_by_columns_list)
|
||||
|
||||
# perform rollup operation
|
||||
instance_usage_json_rdd = RollupQuantity._rollup_quantity(
|
||||
instance_usage_df,
|
||||
group_by_columns_list,
|
||||
str(setter_rollup_operation))
|
||||
|
||||
sql_context = SQLContext.getOrCreate(instance_usage_df.rdd.context)
|
||||
instance_usage_trans_df = InstanceUsageUtils.create_df_from_json_rdd(
|
||||
sql_context,
|
||||
instance_usage_json_rdd)
|
||||
|
||||
return instance_usage_trans_df
|
||||
|
||||
@staticmethod
|
||||
def do_rollup(setter_rollup_group_by_list,
|
||||
aggregation_period,
|
||||
setter_rollup_operation,
|
||||
instance_usage_df):
|
||||
|
||||
# get aggregation period
|
||||
group_by_period_list = \
|
||||
ComponentUtils._get_instance_group_by_period_list(
|
||||
aggregation_period)
|
||||
|
||||
# group by columns list
|
||||
group_by_columns_list = group_by_period_list + \
|
||||
setter_rollup_group_by_list
|
||||
|
||||
# prepare for group by
|
||||
group_by_columns_list = InstanceUsageUtils.prepare_instance_usage_group_by_list(
|
||||
group_by_columns_list)
|
||||
|
||||
# perform rollup operation
|
||||
instance_usage_json_rdd = RollupQuantity._rollup_quantity(
|
||||
instance_usage_df,
|
||||
group_by_columns_list,
|
||||
str(setter_rollup_operation))
|
||||
|
||||
sql_context = SQLContext.getOrCreate(instance_usage_df.rdd.context)
|
||||
instance_usage_trans_df = InstanceUsageUtils.create_df_from_json_rdd(
|
||||
sql_context,
|
||||
instance_usage_json_rdd)
|
||||
|
||||
return instance_usage_trans_df
|
@ -1,97 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from pyspark.sql import SQLContext
|
||||
|
||||
from monasca_transform.component import InstanceUsageDataAggParams
|
||||
from monasca_transform.component.setter import SetterComponent
|
||||
from monasca_transform.transform.transform_utils import InstanceUsageUtils
|
||||
|
||||
import json
|
||||
|
||||
|
||||
class SetAggregatedMetricName(SetterComponent):
|
||||
"""setter component that sets final aggregated metric name.
|
||||
|
||||
aggregated metric name is available as a parameter 'aggregated_metric_name'
|
||||
in aggregation_params in metric processing driver table.
|
||||
"""
|
||||
@staticmethod
|
||||
def _set_aggregated_metric_name(instance_usage_agg_params):
|
||||
|
||||
row = instance_usage_agg_params.instance_usage_data
|
||||
|
||||
agg_params = instance_usage_agg_params.agg_params
|
||||
|
||||
try:
|
||||
processing_meta = row.processing_meta
|
||||
except AttributeError:
|
||||
processing_meta = {}
|
||||
|
||||
# get any extra data
|
||||
extra_data_map = getattr(row, "extra_data_map", {})
|
||||
|
||||
instance_usage_dict = {"tenant_id": row.tenant_id,
|
||||
"user_id": row.user_id,
|
||||
"resource_uuid": row.resource_uuid,
|
||||
"geolocation": row.geolocation,
|
||||
"region": row.region,
|
||||
"zone": row.zone,
|
||||
"host": row.host,
|
||||
"project_id": row.project_id,
|
||||
"aggregated_metric_name":
|
||||
agg_params["aggregated_metric_name"],
|
||||
"quantity": row.quantity,
|
||||
"firstrecord_timestamp_unix":
|
||||
row.firstrecord_timestamp_unix,
|
||||
"firstrecord_timestamp_string":
|
||||
row.firstrecord_timestamp_string,
|
||||
"lastrecord_timestamp_unix":
|
||||
row.lastrecord_timestamp_unix,
|
||||
"lastrecord_timestamp_string":
|
||||
row.lastrecord_timestamp_string,
|
||||
"record_count": row.record_count,
|
||||
"usage_date": row.usage_date,
|
||||
"usage_hour": row.usage_hour,
|
||||
"usage_minute": row.usage_minute,
|
||||
"aggregation_period": row.aggregation_period,
|
||||
"processing_meta": processing_meta,
|
||||
"extra_data_map": extra_data_map}
|
||||
|
||||
instance_usage_data_json = json.dumps(instance_usage_dict)
|
||||
|
||||
return instance_usage_data_json
|
||||
|
||||
@staticmethod
|
||||
def setter(transform_context, instance_usage_df):
|
||||
"""set the aggregated metric name field for elements in instance usage rdd"""
|
||||
|
||||
transform_spec_df = transform_context.transform_spec_df_info
|
||||
|
||||
agg_params = transform_spec_df.select(
|
||||
"aggregation_params_map.aggregated_metric_name").collect()[0].\
|
||||
asDict()
|
||||
|
||||
instance_usage_df_agg_params = instance_usage_df.rdd.map(
|
||||
lambda x: InstanceUsageDataAggParams(x, agg_params))
|
||||
|
||||
instance_usage_json_rdd = instance_usage_df_agg_params.map(
|
||||
SetAggregatedMetricName._set_aggregated_metric_name)
|
||||
|
||||
sql_context = SQLContext.getOrCreate(instance_usage_df.rdd.context)
|
||||
|
||||
instance_usage_trans_df = InstanceUsageUtils.create_df_from_json_rdd(
|
||||
sql_context,
|
||||
instance_usage_json_rdd)
|
||||
return instance_usage_trans_df
|
@ -1,97 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from pyspark.sql import SQLContext
|
||||
|
||||
from monasca_transform.component import InstanceUsageDataAggParams
|
||||
from monasca_transform.component.setter import SetterComponent
|
||||
from monasca_transform.transform.transform_utils import InstanceUsageUtils
|
||||
|
||||
import json
|
||||
|
||||
|
||||
class SetAggregatedPeriod(SetterComponent):
|
||||
"""setter component that sets final aggregated metric name.
|
||||
|
||||
aggregated metric name is available as a parameter 'aggregated_metric_name'
|
||||
in aggregation_params in metric processing driver table.
|
||||
"""
|
||||
@staticmethod
|
||||
def _set_aggregated_period(instance_usage_agg_params):
|
||||
|
||||
row = instance_usage_agg_params.instance_usage_data
|
||||
|
||||
agg_params = instance_usage_agg_params.agg_params
|
||||
|
||||
try:
|
||||
processing_meta = row.processing_meta
|
||||
except AttributeError:
|
||||
processing_meta = {}
|
||||
|
||||
# get any extra data
|
||||
extra_data_map = getattr(row, "extra_data_map", {})
|
||||
|
||||
instance_usage_dict = {"tenant_id": row.tenant_id,
|
||||
"user_id": row.user_id,
|
||||
"resource_uuid": row.resource_uuid,
|
||||
"geolocation": row.geolocation,
|
||||
"region": row.region,
|
||||
"zone": row.zone,
|
||||
"host": row.host,
|
||||
"project_id": row.project_id,
|
||||
"aggregated_metric_name":
|
||||
row.aggregated_metric_name,
|
||||
"quantity": row.quantity,
|
||||
"firstrecord_timestamp_unix":
|
||||
row.firstrecord_timestamp_unix,
|
||||
"firstrecord_timestamp_string":
|
||||
row.firstrecord_timestamp_string,
|
||||
"lastrecord_timestamp_unix":
|
||||
row.lastrecord_timestamp_unix,
|
||||
"lastrecord_timestamp_string":
|
||||
row.lastrecord_timestamp_string,
|
||||
"record_count": row.record_count,
|
||||
"usage_date": row.usage_date,
|
||||
"usage_hour": row.usage_hour,
|
||||
"usage_minute": row.usage_minute,
|
||||
"aggregation_period":
|
||||
agg_params["aggregation_period"],
|
||||
"processing_meta": processing_meta,
|
||||
"extra_data_map": extra_data_map}
|
||||
|
||||
instance_usage_data_json = json.dumps(instance_usage_dict)
|
||||
|
||||
return instance_usage_data_json
|
||||
|
||||
@staticmethod
|
||||
def setter(transform_context, instance_usage_df):
|
||||
"""set the aggregated metric name field for elements in instance usage rdd"""
|
||||
|
||||
transform_spec_df = transform_context.transform_spec_df_info
|
||||
|
||||
agg_params = transform_spec_df.select(
|
||||
"aggregation_params_map.aggregation_period").collect()[0].asDict()
|
||||
|
||||
instance_usage_df_agg_params = instance_usage_df.rdd.map(
|
||||
lambda x: InstanceUsageDataAggParams(x, agg_params))
|
||||
|
||||
instance_usage_json_rdd = instance_usage_df_agg_params.map(
|
||||
SetAggregatedPeriod._set_aggregated_period)
|
||||
|
||||
sql_context = SQLContext.getOrCreate(instance_usage_df.rdd.context)
|
||||
|
||||
instance_usage_trans_df = InstanceUsageUtils.create_df_from_json_rdd(
|
||||
sql_context,
|
||||
instance_usage_json_rdd)
|
||||
return instance_usage_trans_df
|
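Both SetAggregatedMetricName and SetAggregatedPeriod follow the same row-to-JSON pattern: convert each instance usage Row to a dict, overwrite one field from the driver table's aggregation_params, dump it to JSON, and rebuild a dataframe from the resulting RDD. A stripped-down sketch of that pattern with a toy dataframe and a reduced field set (a modern SparkSession instead of the SQLContext used above; names are illustrative assumptions):

import json

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("setter-sketch").getOrCreate()

instance_usage_df = spark.createDataFrame(
    [("project-1", 42.0)], ["project_id", "quantity"])

agg_params = {"aggregation_period": "hourly"}

def _set_aggregated_period(row):
    # copy the row, override a single field, return it as a JSON string
    usage_dict = row.asDict()
    usage_dict["aggregation_period"] = agg_params["aggregation_period"]
    return json.dumps(usage_dict)

instance_usage_json_rdd = instance_usage_df.rdd.map(_set_aggregated_period)
spark.read.json(instance_usage_json_rdd).show()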
@ -1,30 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
import abc
|
||||
from monasca_transform.component import Component
|
||||
|
||||
|
||||
class UsageComponent(Component):
|
||||
|
||||
@abc.abstractmethod
|
||||
def usage(transform_context, record_store_df):
|
||||
raise NotImplementedError(
|
||||
"Class %s doesn't implement setter(instance_usage_df,"
|
||||
" transform_spec_df)"
|
||||
% __name__)
|
||||
|
||||
@staticmethod
|
||||
def get_component_type():
|
||||
return Component.USAGE_COMPONENT_TYPE
|
@ -1,164 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from pyspark.sql import SQLContext
|
||||
|
||||
from monasca_transform.component import Component
|
||||
from monasca_transform.component.setter.rollup_quantity import RollupQuantity
|
||||
from monasca_transform.component.usage.fetch_quantity import FetchQuantity
|
||||
from monasca_transform.component.usage import UsageComponent
|
||||
from monasca_transform.transform.transform_utils import InstanceUsageUtils
|
||||
|
||||
import json
|
||||
|
||||
|
||||
class CalculateRateException(Exception):
|
||||
"""Exception thrown when calculating rate
|
||||
|
||||
Attributes:
|
||||
value: string representing the error
|
||||
"""
|
||||
|
||||
def __init__(self, value):
|
||||
self.value = value
|
||||
|
||||
def __str__(self):
|
||||
return repr(self.value)
|
||||
|
||||
|
||||
class CalculateRate(UsageComponent):
|
||||
|
||||
@staticmethod
|
||||
def usage(transform_context, record_store_df):
|
||||
"""Method to return instance usage dataframe:
|
||||
|
||||
It groups together record store records by
|
||||
provided group by columns list, sorts within the group by event
|
||||
timestamp field, calculates the rate of change between the
|
||||
oldest and latest values, and returns the resultant value as an
|
||||
instance usage dataframe
|
||||
"""
|
||||
instance_usage_data_json_list = []
|
||||
|
||||
transform_spec_df = transform_context.transform_spec_df_info
|
||||
|
||||
# get aggregated metric name
|
||||
agg_params = transform_spec_df.select(
|
||||
"aggregation_params_map.aggregated_metric_name"). \
|
||||
collect()[0].asDict()
|
||||
aggregated_metric_name = agg_params["aggregated_metric_name"]
|
||||
|
||||
# get aggregation period
|
||||
agg_params = transform_spec_df.select(
|
||||
"aggregation_params_map.aggregation_period").collect()[0].asDict()
|
||||
aggregation_period = agg_params["aggregation_period"]
|
||||
|
||||
# Fetch the latest quantities
|
||||
latest_instance_usage_df = \
|
||||
FetchQuantity().usage_by_operation(transform_context,
|
||||
record_store_df,
|
||||
"avg")
|
||||
|
||||
# Roll up the latest quantities
|
||||
latest_rolled_up_instance_usage_df = \
|
||||
RollupQuantity().setter_by_operation(transform_context,
|
||||
latest_instance_usage_df,
|
||||
"sum")
|
||||
|
||||
# Fetch the oldest quantities
|
||||
oldest_instance_usage_df = \
|
||||
FetchQuantity().usage_by_operation(transform_context,
|
||||
record_store_df,
|
||||
"oldest")
|
||||
|
||||
# Roll up the oldest quantities
|
||||
oldest_rolled_up_instance_usage_df = \
|
||||
RollupQuantity().setter_by_operation(transform_context,
|
||||
oldest_instance_usage_df,
|
||||
"sum")
|
||||
|
||||
# Calculate the rate change by percentage
|
||||
oldest_dict = oldest_rolled_up_instance_usage_df.collect()[0].asDict()
|
||||
oldest_quantity = float(oldest_dict['quantity'])
|
||||
|
||||
latest_dict = latest_rolled_up_instance_usage_df.collect()[0].asDict()
|
||||
latest_quantity = float(latest_dict['quantity'])
|
||||
|
||||
rate_percentage = \
|
||||
((oldest_quantity - latest_quantity) / oldest_quantity) * 100
|
||||
|
||||
# create a new instance usage dict
|
||||
instance_usage_dict = {"tenant_id":
|
||||
latest_dict.get("tenant_id", "all"),
|
||||
"user_id":
|
||||
latest_dict.get("user_id", "all"),
|
||||
"resource_uuid":
|
||||
latest_dict.get("resource_uuid", "all"),
|
||||
"geolocation":
|
||||
latest_dict.get("geolocation", "all"),
|
||||
"region":
|
||||
latest_dict.get("region", "all"),
|
||||
"zone":
|
||||
latest_dict.get("zone", "all"),
|
||||
"host":
|
||||
latest_dict.get("host", "all"),
|
||||
"project_id":
|
||||
latest_dict.get("project_id", "all"),
|
||||
"aggregated_metric_name":
|
||||
aggregated_metric_name,
|
||||
"quantity": rate_percentage,
|
||||
"firstrecord_timestamp_unix":
|
||||
oldest_dict["firstrecord_timestamp_unix"],
|
||||
"firstrecord_timestamp_string":
|
||||
oldest_dict["firstrecord_timestamp_string"],
|
||||
"lastrecord_timestamp_unix":
|
||||
latest_dict["lastrecord_timestamp_unix"],
|
||||
"lastrecord_timestamp_string":
|
||||
latest_dict["lastrecord_timestamp_string"],
|
||||
"record_count": oldest_dict["record_count"] +
|
||||
latest_dict["record_count"],
|
||||
"usage_date": latest_dict["usage_date"],
|
||||
"usage_hour": latest_dict["usage_hour"],
|
||||
"usage_minute": latest_dict["usage_minute"],
|
||||
"aggregation_period": aggregation_period,
|
||||
"processing_meta":
|
||||
{"event_type":
|
||||
latest_dict.get("event_type",
|
||||
Component.
|
||||
DEFAULT_UNAVAILABLE_VALUE),
|
||||
"oldest_timestamp_string":
|
||||
oldest_dict[
|
||||
"firstrecord_timestamp_string"],
|
||||
"oldest_quantity": oldest_quantity,
|
||||
"latest_timestamp_string":
|
||||
latest_dict[
|
||||
"lastrecord_timestamp_string"],
|
||||
"latest_quantity": latest_quantity
|
||||
}
|
||||
}
|
||||
|
||||
instance_usage_data_json = json.dumps(instance_usage_dict)
|
||||
instance_usage_data_json_list.append(instance_usage_data_json)
|
||||
spark_context = record_store_df.rdd.context
|
||||
|
||||
instance_usage_rdd = \
|
||||
spark_context.parallelize(instance_usage_data_json_list)
|
||||
|
||||
sql_context = SQLContext\
|
||||
.getOrCreate(record_store_df.rdd.context)
|
||||
instance_usage_df = InstanceUsageUtils.create_df_from_json_rdd(
|
||||
sql_context,
|
||||
instance_usage_rdd)
|
||||
|
||||
return instance_usage_df
|
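The quantity emitted above is just the percentage change between the rolled-up oldest and latest values. A worked example of that formula with made-up numbers:

# Worked example of the rate calculation above; the quantities are made up.
oldest_quantity = 2000.0   # sum of the "oldest" rolled-up quantities
latest_quantity = 1800.0   # sum of the "latest" (avg) rolled-up quantities

rate_percentage = ((oldest_quantity - latest_quantity) / oldest_quantity) * 100
print(rate_percentage)     # 10.0 -> the quantity dropped by 10% over the window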
@ -1,478 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from collections import namedtuple
|
||||
import datetime
|
||||
|
||||
from pyspark.sql import functions
|
||||
from pyspark.sql import SQLContext
|
||||
|
||||
|
||||
from monasca_transform.component import Component
|
||||
from monasca_transform.component.component_utils import ComponentUtils
|
||||
from monasca_transform.component.usage import UsageComponent
|
||||
from monasca_transform.transform.grouping.group_sort_by_timestamp \
|
||||
import GroupSortbyTimestamp
|
||||
from monasca_transform.transform.grouping.group_sort_by_timestamp_partition \
|
||||
import GroupSortbyTimestampPartition
|
||||
from monasca_transform.transform.transform_utils import InstanceUsageUtils
|
||||
from monasca_transform.transform.transform_utils import RecordStoreUtils
|
||||
|
||||
import json
|
||||
|
||||
|
||||
class FetchQuantityException(Exception):
|
||||
"""Exception thrown when fetching quantity
|
||||
|
||||
Attributes:
|
||||
value: string representing the error
|
||||
"""
|
||||
|
||||
def __init__(self, value):
|
||||
self.value = value
|
||||
|
||||
def __str__(self):
|
||||
return repr(self.value)
|
||||
|
||||
|
||||
GroupedDataNamedTuple = namedtuple("GroupedDataWithOperation",
|
||||
["grouped_data",
|
||||
"usage_fetch_operation",
|
||||
"group_by_columns_list"])
|
||||
|
||||
|
||||
class GroupedDataNamedTuple(GroupedDataNamedTuple):
|
||||
"""A tuple which is a wrapper containing record store data and the usage operation
|
||||
|
||||
namedtuple contains:
|
||||
|
||||
grouped_data - grouped record store data
|
||||
usage_fetch_operation - operation to be performed on
|
||||
grouped data group_by_columns_list - list of group by columns
|
||||
"""
|
||||
|
||||
|
||||
class FetchQuantity(UsageComponent):
|
||||
|
||||
@staticmethod
|
||||
def _supported_fetch_operations():
|
||||
return ["sum", "max", "min", "avg", "latest", "oldest"]
|
||||
|
||||
@staticmethod
|
||||
def _is_valid_fetch_operation(operation):
|
||||
"""return true if its a valid fetch operation"""
|
||||
if operation in FetchQuantity._supported_fetch_operations():
|
||||
return True
|
||||
else:
|
||||
return False
|
||||
|
||||
@staticmethod
|
||||
def _get_latest_oldest_quantity(grouped_data_named_tuple):
|
||||
"""Get quantity for each group.
|
||||
|
||||
It performs the requested usage operation and returns instance usage data.
|
||||
"""
|
||||
# row
|
||||
grouping_results = grouped_data_named_tuple.\
|
||||
grouped_data
|
||||
|
||||
# usage fetch operation
|
||||
usage_fetch_operation = grouped_data_named_tuple.\
|
||||
usage_fetch_operation
|
||||
|
||||
# group_by_columns_list
|
||||
group_by_columns_list = grouped_data_named_tuple.\
|
||||
group_by_columns_list
|
||||
|
||||
group_by_dict = grouping_results.grouping_key_dict
|
||||
|
||||
#
|
||||
tenant_id = group_by_dict.get("tenant_id",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
resource_uuid = group_by_dict.get("resource_uuid",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
user_id = group_by_dict.get("user_id",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
|
||||
geolocation = group_by_dict.get("geolocation",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
region = group_by_dict.get("region",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
zone = group_by_dict.get("zone", Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
host = group_by_dict.get("host", Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
|
||||
usage_date = group_by_dict.get("event_date",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
usage_hour = group_by_dict.get("event_hour",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
usage_minute = group_by_dict.get("event_minute",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
|
||||
aggregated_metric_name = group_by_dict.get(
|
||||
"aggregated_metric_name", Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
|
||||
# stats
|
||||
agg_stats = grouping_results.results
|
||||
|
||||
# get quantity for this host
|
||||
quantity = None
|
||||
if (usage_fetch_operation == "latest"):
|
||||
quantity = agg_stats["lastrecord_quantity"]
|
||||
elif usage_fetch_operation == "oldest":
|
||||
quantity = agg_stats["firstrecord_quantity"]
|
||||
|
||||
firstrecord_timestamp_unix = agg_stats["firstrecord_timestamp_unix"]
|
||||
firstrecord_timestamp_string = \
|
||||
agg_stats["firstrecord_timestamp_string"]
|
||||
lastrecord_timestamp_unix = agg_stats["lastrecord_timestamp_unix"]
|
||||
lastrecord_timestamp_string = agg_stats["lastrecord_timestamp_string"]
|
||||
record_count = agg_stats["record_count"]
|
||||
|
||||
# aggregation period
|
||||
aggregation_period = Component.DEFAULT_UNAVAILABLE_VALUE
|
||||
|
||||
# event type
|
||||
event_type = group_by_dict.get("event_type",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
|
||||
# add group by fields data to extra data map
|
||||
# get existing extra_data_map if any
|
||||
extra_data_map = group_by_dict.get("extra_data_map", {})
|
||||
for column_name in group_by_columns_list:
|
||||
column_value = group_by_dict.get(column_name, Component.
|
||||
DEFAULT_UNAVAILABLE_VALUE)
|
||||
extra_data_map[column_name] = column_value
|
||||
|
||||
instance_usage_dict = {"tenant_id": tenant_id, "user_id": user_id,
|
||||
"resource_uuid": resource_uuid,
|
||||
"geolocation": geolocation, "region": region,
|
||||
"zone": zone, "host": host,
|
||||
"aggregated_metric_name":
|
||||
aggregated_metric_name,
|
||||
"quantity": quantity,
|
||||
"firstrecord_timestamp_unix":
|
||||
firstrecord_timestamp_unix,
|
||||
"firstrecord_timestamp_string":
|
||||
firstrecord_timestamp_string,
|
||||
"lastrecord_timestamp_unix":
|
||||
lastrecord_timestamp_unix,
|
||||
"lastrecord_timestamp_string":
|
||||
lastrecord_timestamp_string,
|
||||
"record_count": record_count,
|
||||
"usage_date": usage_date,
|
||||
"usage_hour": usage_hour,
|
||||
"usage_minute": usage_minute,
|
||||
"aggregation_period": aggregation_period,
|
||||
"processing_meta": {"event_type": event_type},
|
||||
"extra_data_map": extra_data_map
|
||||
}
|
||||
instance_usage_data_json = json.dumps(instance_usage_dict)
|
||||
|
||||
return instance_usage_data_json
|
||||
|
||||
@staticmethod
|
||||
def _get_quantity(grouped_data_named_tuple):
|
||||
|
||||
# row
|
||||
row = grouped_data_named_tuple.grouped_data
|
||||
|
||||
# usage fetch operation
|
||||
usage_fetch_operation = grouped_data_named_tuple.\
|
||||
usage_fetch_operation
|
||||
|
||||
# group by columns list
|
||||
|
||||
group_by_columns_list = grouped_data_named_tuple.\
|
||||
group_by_columns_list
|
||||
|
||||
# first record timestamp # FIXME: beginning of epoch?
|
||||
earliest_record_timestamp_unix = getattr(
|
||||
row, "min(event_timestamp_unix_for_min)",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
earliest_record_timestamp_string = \
|
||||
datetime.datetime.utcfromtimestamp(
|
||||
earliest_record_timestamp_unix).strftime(
|
||||
'%Y-%m-%d %H:%M:%S')
|
||||
|
||||
# last record_timestamp # FIXME: beginning of epoch?
|
||||
latest_record_timestamp_unix = getattr(
|
||||
row, "max(event_timestamp_unix_for_max)",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
latest_record_timestamp_string = \
|
||||
datetime.datetime.utcfromtimestamp(
|
||||
latest_record_timestamp_unix).strftime('%Y-%m-%d %H:%M:%S')
|
||||
|
||||
# record count
|
||||
record_count = getattr(row, "count(event_timestamp_unix)", 0.0)
|
||||
|
||||
# quantity
|
||||
# get expression that will be used to select quantity
|
||||
# from rolled up data
|
||||
select_quant_str = "".join((usage_fetch_operation, "(event_quantity)"))
|
||||
quantity = getattr(row, select_quant_str, 0.0)
|
||||
|
||||
# create a column name, value pairs from grouped data
|
||||
extra_data_map = InstanceUsageUtils.grouped_data_to_map(row,
|
||||
group_by_columns_list)
|
||||
|
||||
# convert column names, so that values can be accessed by components
|
||||
# later in the pipeline
|
||||
extra_data_map = InstanceUsageUtils.prepare_extra_data_map(extra_data_map)
|
||||
|
||||
# create a new instance usage dict
|
||||
instance_usage_dict = {"tenant_id": getattr(row, "tenant_id",
|
||||
Component.
|
||||
DEFAULT_UNAVAILABLE_VALUE),
|
||||
"user_id":
|
||||
getattr(row, "user_id",
|
||||
Component.
|
||||
DEFAULT_UNAVAILABLE_VALUE),
|
||||
"resource_uuid":
|
||||
getattr(row, "resource_uuid",
|
||||
Component.
|
||||
DEFAULT_UNAVAILABLE_VALUE),
|
||||
"geolocation":
|
||||
getattr(row, "geolocation",
|
||||
Component.
|
||||
DEFAULT_UNAVAILABLE_VALUE),
|
||||
"region":
|
||||
getattr(row, "region",
|
||||
Component.
|
||||
DEFAULT_UNAVAILABLE_VALUE),
|
||||
"zone":
|
||||
getattr(row, "zone",
|
||||
Component.
|
||||
DEFAULT_UNAVAILABLE_VALUE),
|
||||
"host":
|
||||
getattr(row, "host",
|
||||
Component.
|
||||
DEFAULT_UNAVAILABLE_VALUE),
|
||||
"project_id":
|
||||
getattr(row, "tenant_id",
|
||||
Component.
|
||||
DEFAULT_UNAVAILABLE_VALUE),
|
||||
"aggregated_metric_name":
|
||||
getattr(row, "aggregated_metric_name",
|
||||
Component.
|
||||
DEFAULT_UNAVAILABLE_VALUE),
|
||||
"quantity":
|
||||
quantity,
|
||||
"firstrecord_timestamp_unix":
|
||||
earliest_record_timestamp_unix,
|
||||
"firstrecord_timestamp_string":
|
||||
earliest_record_timestamp_string,
|
||||
"lastrecord_timestamp_unix":
|
||||
latest_record_timestamp_unix,
|
||||
"lastrecord_timestamp_string":
|
||||
latest_record_timestamp_string,
|
||||
"record_count": record_count,
|
||||
"usage_date":
|
||||
getattr(row, "event_date",
|
||||
Component.
|
||||
DEFAULT_UNAVAILABLE_VALUE),
|
||||
"usage_hour":
|
||||
getattr(row, "event_hour",
|
||||
Component.
|
||||
DEFAULT_UNAVAILABLE_VALUE),
|
||||
"usage_minute":
|
||||
getattr(row, "event_minute",
|
||||
Component.
|
||||
DEFAULT_UNAVAILABLE_VALUE),
|
||||
"aggregation_period":
|
||||
getattr(row, "aggregation_period",
|
||||
Component.
|
||||
DEFAULT_UNAVAILABLE_VALUE),
|
||||
"processing_meta": {"event_type": getattr(
|
||||
row, "event_type",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)},
|
||||
"extra_data_map": extra_data_map
|
||||
}
|
||||
|
||||
instance_usage_data_json = json.dumps(instance_usage_dict)
|
||||
return instance_usage_data_json
|
||||
|
||||
@staticmethod
|
||||
def usage(transform_context, record_store_df):
|
||||
"""Method to return the latest quantity as an instance usage dataframe:
|
||||
|
||||
It groups together record store records by
|
||||
provided group by columns list, sorts within the group by event
|
||||
timestamp field, applies group stats udf and returns the latest
|
||||
quantity as an instance usage dataframe
|
||||
"""
|
||||
transform_spec_df = transform_context.transform_spec_df_info
|
||||
|
||||
# get rollup operation (sum, max, avg, min)
|
||||
agg_params = transform_spec_df.select(
|
||||
"aggregation_params_map.usage_fetch_operation").\
|
||||
collect()[0].asDict()
|
||||
usage_fetch_operation = agg_params["usage_fetch_operation"]
|
||||
|
||||
instance_usage_df = FetchQuantity.usage_by_operation(
|
||||
transform_context, record_store_df, usage_fetch_operation)
|
||||
|
||||
return instance_usage_df
|
||||
|
||||
@staticmethod
|
||||
def usage_by_operation(transform_context, record_store_df,
|
||||
usage_fetch_operation):
|
||||
"""Returns the latest quantity as a instance usage dataframe
|
||||
|
||||
It groups together record store records by
|
||||
provided group by columns list , sorts within the group by event
|
||||
timestamp field, applies group stats udf and returns the latest
|
||||
quantity as an instance usage dataframe
|
||||
"""
|
||||
transform_spec_df = transform_context.transform_spec_df_info
|
||||
|
||||
# check if operation is valid
|
||||
if not FetchQuantity. \
|
||||
_is_valid_fetch_operation(usage_fetch_operation):
|
||||
raise FetchQuantityException(
|
||||
"Operation %s is not supported" % usage_fetch_operation)
|
||||
|
||||
# get aggregation period
|
||||
agg_params = transform_spec_df.select(
|
||||
"aggregation_params_map.aggregation_period").collect()[0].asDict()
|
||||
aggregation_period = agg_params["aggregation_period"]
|
||||
group_by_period_list = ComponentUtils._get_group_by_period_list(
|
||||
aggregation_period)
|
||||
|
||||
# retrieve filter specifications
|
||||
agg_params = transform_spec_df.select(
|
||||
"aggregation_params_map.filter_by_list"). \
|
||||
collect()[0].asDict()
|
||||
filter_by_list = \
|
||||
agg_params["filter_by_list"]
|
||||
|
||||
# if filter(s) have been specified, apply them one at a time
|
||||
if filter_by_list:
|
||||
for filter_element in filter_by_list:
|
||||
field_to_filter = filter_element["field_to_filter"]
|
||||
filter_expression = filter_element["filter_expression"]
|
||||
filter_operation = filter_element["filter_operation"]
|
||||
|
||||
if (field_to_filter and
|
||||
filter_expression and
|
||||
filter_operation and
|
||||
(filter_operation == "include" or
|
||||
filter_operation == "exclude")):
|
||||
if filter_operation == "include":
|
||||
match = True
|
||||
else:
|
||||
match = False
|
||||
# apply the specified filter to the record store
|
||||
record_store_df = record_store_df.where(
|
||||
functions.col(str(field_to_filter)).rlike(
|
||||
str(filter_expression)) == match)
|
||||
else:
|
||||
raise FetchQuantityException(
|
||||
"Encountered invalid filter details: "
|
||||
"field to filter = %s, filter expression = %s, "
|
||||
"filter operation = %s. All values must be "
|
||||
"supplied and filter operation must be either "
|
||||
"'include' or 'exclude'." % (field_to_filter,
|
||||
filter_expression,
|
||||
filter_operation))
|
||||
|
||||
# get what we want to group by
|
||||
agg_params = transform_spec_df.select(
|
||||
"aggregation_params_map.aggregation_group_by_list"). \
|
||||
collect()[0].asDict()
|
||||
aggregation_group_by_list = agg_params["aggregation_group_by_list"]
|
||||
|
||||
# group by columns list
|
||||
group_by_columns_list = group_by_period_list + \
|
||||
aggregation_group_by_list
|
||||
|
||||
# prepare group by columns list
|
||||
group_by_columns_list = RecordStoreUtils.prepare_recordstore_group_by_list(
|
||||
group_by_columns_list)
|
||||
|
||||
instance_usage_json_rdd = None
|
||||
if (usage_fetch_operation == "latest" or
|
||||
usage_fetch_operation == "oldest"):
|
||||
|
||||
grouped_rows_rdd = None
|
||||
|
||||
# FIXME:
|
||||
# select group by method
|
||||
IS_GROUP_BY_PARTITION = False
|
||||
|
||||
if (IS_GROUP_BY_PARTITION):
|
||||
# GroupSortbyTimestampPartition is a more scalable approach
|
||||
# since it creates groups using repartitioning and sorting
|
||||
# but is disabled
|
||||
|
||||
# number of groups should be more than what is expected
|
||||
# this might be hard to guess. Setting this to a very
|
||||
# high number is adversely affecting performance
|
||||
num_of_groups = 100
|
||||
grouped_rows_rdd = \
|
||||
GroupSortbyTimestampPartition. \
|
||||
fetch_group_latest_oldest_quantity(
|
||||
record_store_df, transform_spec_df,
|
||||
group_by_columns_list,
|
||||
num_of_groups)
|
||||
else:
|
||||
# group using key-value pair RDD's groupByKey()
|
||||
grouped_rows_rdd = \
|
||||
GroupSortbyTimestamp. \
|
||||
fetch_group_latest_oldest_quantity(
|
||||
record_store_df, transform_spec_df,
|
||||
group_by_columns_list)
|
||||
|
||||
grouped_data_rdd_with_operation = grouped_rows_rdd.map(
|
||||
lambda x:
|
||||
GroupedDataNamedTuple(x,
|
||||
str(usage_fetch_operation),
|
||||
group_by_columns_list))
|
||||
|
||||
instance_usage_json_rdd = \
|
||||
grouped_data_rdd_with_operation.map(
|
||||
FetchQuantity._get_latest_oldest_quantity)
|
||||
else:
|
||||
record_store_df_int = \
|
||||
record_store_df.select(
|
||||
record_store_df.event_timestamp_unix.alias(
|
||||
"event_timestamp_unix_for_min"),
|
||||
record_store_df.event_timestamp_unix.alias(
|
||||
"event_timestamp_unix_for_max"),
|
||||
"*")
|
||||
|
||||
# for standard sum, max, min, avg operations on grouped data
|
||||
agg_operations_map = {
|
||||
"event_quantity": str(usage_fetch_operation),
|
||||
"event_timestamp_unix_for_min": "min",
|
||||
"event_timestamp_unix_for_max": "max",
|
||||
"event_timestamp_unix": "count"}
|
||||
|
||||
# do a group by
|
||||
grouped_data = record_store_df_int.groupBy(*group_by_columns_list)
|
||||
grouped_record_store_df = grouped_data.agg(agg_operations_map)
|
||||
|
||||
grouped_data_rdd_with_operation = grouped_record_store_df.rdd.map(
|
||||
lambda x:
|
||||
GroupedDataNamedTuple(x,
|
||||
str(usage_fetch_operation),
|
||||
group_by_columns_list))
|
||||
|
||||
instance_usage_json_rdd = grouped_data_rdd_with_operation.map(
|
||||
FetchQuantity._get_quantity)
|
||||
|
||||
sql_context = SQLContext.getOrCreate(record_store_df.rdd.context)
|
||||
instance_usage_df = \
|
||||
InstanceUsageUtils.create_df_from_json_rdd(sql_context,
|
||||
instance_usage_json_rdd)
|
||||
return instance_usage_df
|
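For the plain sum/max/min/avg path above, the heavy lifting is Spark's dict form of agg(): one map computes the quantity operation, the min/max of the duplicated timestamp columns and the record count in a single pass, and the results come back under names like 'avg(event_quantity)'. A compact sketch of that branch; the SparkSession, toy record-store rows and the single 'host' group-by column are illustrative assumptions:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("fetch-quantity-sketch").getOrCreate()

record_store_df = spark.createDataFrame(
    [("host1", 1000, 4.0), ("host1", 1060, 6.0), ("host2", 1000, 8.0)],
    ["host", "event_timestamp_unix", "event_quantity"])

# duplicate the timestamp column so min and max can be taken in the same
# agg() call, mirroring event_timestamp_unix_for_min / _for_max above
record_store_df_int = record_store_df.select(
    record_store_df.event_timestamp_unix.alias("event_timestamp_unix_for_min"),
    record_store_df.event_timestamp_unix.alias("event_timestamp_unix_for_max"),
    "*")

usage_fetch_operation = "avg"
agg_operations_map = {
    "event_quantity": usage_fetch_operation,
    "event_timestamp_unix_for_min": "min",
    "event_timestamp_unix_for_max": "max",
    "event_timestamp_unix": "count"}

grouped_record_store_df = record_store_df_int.groupBy("host").agg(agg_operations_map)
grouped_record_store_df.show()
# result columns are avg(event_quantity), min(...), max(...), count(...), which
# is why _get_quantity reads them back with getattr(row, "avg(event_quantity)")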
@ -1,280 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from pyspark.sql.functions import col
|
||||
from pyspark.sql.functions import when
|
||||
from pyspark.sql import SQLContext
|
||||
|
||||
from monasca_transform.component import Component
|
||||
from monasca_transform.component.component_utils import ComponentUtils
|
||||
from monasca_transform.component.usage.fetch_quantity import FetchQuantity
|
||||
from monasca_transform.component.usage import UsageComponent
|
||||
|
||||
from monasca_transform.transform.transform_utils import InstanceUsageUtils
|
||||
|
||||
import json
|
||||
|
||||
|
||||
class FetchQuantityUtilException(Exception):
|
||||
"""Exception thrown when fetching quantity
|
||||
|
||||
Attributes:
|
||||
value: string representing the error
|
||||
"""
|
||||
|
||||
def __init__(self, value):
|
||||
self.value = value
|
||||
|
||||
def __str__(self):
|
||||
return repr(self.value)
|
||||
|
||||
|
||||
class FetchQuantityUtil(UsageComponent):
|
||||
|
||||
@staticmethod
|
||||
def _supported_fetch_quantity_util_operations():
|
||||
# The results of "sum", "max", and "min" don't make sense and/or
|
||||
# may be misleading (the latter two due to the metrics which are
|
||||
# used as input to the utilization calculation potentially not
|
||||
# being from the same time period...e.g., one being from the
|
||||
# beginning of the streaming interval and the other being from
|
||||
# the end.
|
||||
return ["avg", "latest", "oldest"]
|
||||
|
||||
@staticmethod
|
||||
def _is_valid_fetch_quantity_util_operation(operation):
|
||||
"""return true if its a valid fetch operation"""
|
||||
if operation in FetchQuantityUtil.\
|
||||
_supported_fetch_quantity_util_operations():
|
||||
return True
|
||||
else:
|
||||
return False
|
||||
|
||||
@staticmethod
|
||||
def _format_quantity_util(row):
|
||||
"""Converts calculated utilized quantity to an instance usage format
|
||||
|
||||
Calculation based on idle percentage
|
||||
"""
|
||||
#
|
||||
tenant_id = getattr(row, "tenant_id", "all")
|
||||
resource_uuid = getattr(row, "resource_uuid",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
user_id = getattr(row, "user_id",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
geolocation = getattr(row, "geolocation",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
region = getattr(row, "region", Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
zone = getattr(row, "zone", Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
host = getattr(row, "host", "all")
|
||||
|
||||
usage_date = getattr(row, "usage_date",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
usage_hour = getattr(row, "usage_hour",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
usage_minute = getattr(row, "usage_minute",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
|
||||
aggregated_metric_name = getattr(row, "aggregated_metric_name",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
|
||||
# get utilized quantity
|
||||
quantity = row.utilized_quantity
|
||||
|
||||
firstrecord_timestamp_unix = \
|
||||
getattr(row, "firstrecord_timestamp_unix",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
firstrecord_timestamp_string = \
|
||||
getattr(row, "firstrecord_timestamp_string",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
lastrecord_timestamp_unix = \
|
||||
getattr(row, "lastrecord_timestamp_unix",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
lastrecord_timestamp_string = \
|
||||
getattr(row, "lastrecord_timestamp_string",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
record_count = getattr(row, "record_count",
|
||||
Component.DEFAULT_UNAVAILABLE_VALUE)
|
||||
|
||||
# aggregation period
|
||||
aggregation_period = Component.DEFAULT_UNAVAILABLE_VALUE
|
||||
|
||||
# get extra_data_map, if any
|
||||
extra_data_map = getattr(row, "extra_data_map", {})
|
||||
# filter out event_type
|
||||
extra_data_map_filtered = \
|
||||
{key: extra_data_map[key] for key in list(extra_data_map)
|
||||
if key != 'event_type'}
|
||||
|
||||
instance_usage_dict = {"tenant_id": tenant_id, "user_id": user_id,
|
||||
"resource_uuid": resource_uuid,
|
||||
"geolocation": geolocation, "region": region,
|
||||
"zone": zone, "host": host,
|
||||
"aggregated_metric_name":
|
||||
aggregated_metric_name,
|
||||
"quantity": quantity,
|
||||
"firstrecord_timestamp_unix":
|
||||
firstrecord_timestamp_unix,
|
||||
"firstrecord_timestamp_string":
|
||||
firstrecord_timestamp_string,
|
||||
"lastrecord_timestamp_unix":
|
||||
lastrecord_timestamp_unix,
|
||||
"lastrecord_timestamp_string":
|
||||
lastrecord_timestamp_string,
|
||||
"record_count": record_count,
|
||||
"usage_date": usage_date,
|
||||
"usage_hour": usage_hour,
|
||||
"usage_minute": usage_minute,
|
||||
"aggregation_period": aggregation_period,
|
||||
"extra_data_map": extra_data_map_filtered}
|
||||
|
||||
instance_usage_data_json = json.dumps(instance_usage_dict)
|
||||
|
||||
return instance_usage_data_json
|
||||
|
||||
@staticmethod
|
||||
def usage(transform_context, record_store_df):
|
||||
"""Method to return instance usage dataframe:
|
||||
|
||||
It groups together record store records by
|
||||
provided group by columns list, sorts within the group by event
|
||||
timestamp field, applies group stats udf and returns the latest
|
||||
quantity as an instance usage dataframe
|
||||
|
||||
This component groups records by event_type (a.k.a. metric name)
|
||||
and expects two kinds of records in record_store data
|
||||
total quantity records - the total available quantity
|
||||
e.g. cpu.total_logical_cores
|
||||
idle perc records - percentage that is idle
|
||||
e.g. cpu.idle_perc
|
||||
|
||||
To calculate the utilized quantity this component uses following
|
||||
formula:
|
||||
|
||||
utilized quantity = (100 - idle_perc) * total_quantity / 100
|
||||
|
||||
"""
|
||||
|
||||
sql_context = SQLContext.getOrCreate(record_store_df.rdd.context)
|
||||
|
||||
transform_spec_df = transform_context.transform_spec_df_info
|
||||
|
||||
# get rollup operation (sum, max, avg, min)
|
||||
agg_params = transform_spec_df.select(
|
||||
"aggregation_params_map.usage_fetch_operation"). \
|
||||
collect()[0].asDict()
|
||||
usage_fetch_operation = agg_params["usage_fetch_operation"]
|
||||
|
||||
# check if operation is valid
|
||||
if not FetchQuantityUtil. \
|
||||
_is_valid_fetch_quantity_util_operation(usage_fetch_operation):
|
||||
raise FetchQuantityUtilException(
|
||||
"Operation %s is not supported" % usage_fetch_operation)
|
||||
|
||||
# get the quantities for idle perc and quantity
|
||||
instance_usage_df = FetchQuantity().usage(
|
||||
transform_context, record_store_df)
|
||||
|
||||
# get aggregation period for instance usage dataframe
|
||||
agg_params = transform_spec_df.select(
|
||||
"aggregation_params_map.aggregation_period").collect()[0].asDict()
|
||||
aggregation_period = agg_params["aggregation_period"]
|
||||
group_by_period_list = ComponentUtils.\
|
||||
_get_instance_group_by_period_list(aggregation_period)
|
||||
|
||||
# get what we want to group by
|
||||
agg_params = transform_spec_df.select(
|
||||
"aggregation_params_map.aggregation_group_by_list").\
|
||||
collect()[0].asDict()
|
||||
aggregation_group_by_list = agg_params["aggregation_group_by_list"]
|
||||
|
||||
# group by columns list
|
||||
group_by_columns_list = group_by_period_list + \
|
||||
aggregation_group_by_list
|
||||
|
||||
# get quantity event type
|
||||
agg_params = transform_spec_df.select(
|
||||
"aggregation_params_map.usage_fetch_util_quantity_event_type").\
|
||||
collect()[0].asDict()
|
||||
usage_fetch_util_quantity_event_type = \
|
||||
agg_params["usage_fetch_util_quantity_event_type"]
|
||||
|
||||
# check if driver parameter is provided
|
||||
if usage_fetch_util_quantity_event_type is None or \
|
||||
usage_fetch_util_quantity_event_type == "":
|
||||
raise FetchQuantityUtilException(
|
||||
"Driver parameter '%s' is missing"
|
||||
% "usage_fetch_util_quantity_event_type")
|
||||
|
||||
# get idle perc event type
|
||||
agg_params = transform_spec_df.select(
|
||||
"aggregation_params_map.usage_fetch_util_idle_perc_event_type").\
|
||||
collect()[0].asDict()
|
||||
usage_fetch_util_idle_perc_event_type = \
|
||||
agg_params["usage_fetch_util_idle_perc_event_type"]
|
||||
|
||||
# check if driver parameter is provided
|
||||
if usage_fetch_util_idle_perc_event_type is None or \
|
||||
usage_fetch_util_idle_perc_event_type == "":
|
||||
raise FetchQuantityUtilException(
|
||||
"Driver parameter '%s' is missing"
|
||||
% "usage_fetch_util_idle_perc_event_type")
|
||||
|
||||
# get quantity records dataframe
|
||||
event_type_quantity_clause = "processing_meta.event_type='%s'" \
|
||||
% usage_fetch_util_quantity_event_type
|
||||
quantity_df = instance_usage_df.select('*').where(
|
||||
event_type_quantity_clause).alias("quantity_df_alias")
|
||||
|
||||
# get idle perc records dataframe
|
||||
event_type_idle_perc_clause = "processing_meta.event_type='%s'" \
|
||||
% usage_fetch_util_idle_perc_event_type
|
||||
idle_perc_df = instance_usage_df.select('*').where(
|
||||
event_type_idle_perc_clause).alias("idle_perc_df_alias")
|
||||
|
||||
# join quantity records with idle perc records
|
||||
# create a join condition without the event_type
|
||||
cond = [item for item in group_by_columns_list
|
||||
if item != 'event_type']
|
||||
quant_idle_perc_df = quantity_df.join(idle_perc_df, cond, 'left')
|
||||
|
||||
#
|
||||
# Find utilized quantity based on idle percentage
|
||||
#
|
||||
# utilized quantity = (100 - idle_perc) * total_quantity / 100
|
||||
#
|
||||
quant_idle_perc_calc_df = quant_idle_perc_df.select(
|
||||
col("quantity_df_alias.*"),
|
||||
when(col("idle_perc_df_alias.quantity") != 0.0,
|
||||
(100.0 - col(
|
||||
"idle_perc_df_alias.quantity")) * col(
|
||||
"quantity_df_alias.quantity") / 100.0)
|
||||
.otherwise(col("quantity_df_alias.quantity"))
|
||||
.alias("utilized_quantity"),
|
||||
|
||||
col("quantity_df_alias.quantity")
|
||||
.alias("total_quantity"),
|
||||
|
||||
col("idle_perc_df_alias.quantity")
|
||||
.alias("idle_perc"))
|
||||
|
||||
instance_usage_json_rdd = \
|
||||
quant_idle_perc_calc_df.rdd.map(
|
||||
FetchQuantityUtil._format_quantity_util)
|
||||
|
||||
instance_usage_df = \
|
||||
InstanceUsageUtils.create_df_from_json_rdd(sql_context,
|
||||
instance_usage_json_rdd)
|
||||
|
||||
return instance_usage_df
|
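The utilization formula in the docstring above is easy to check by hand; a worked example with made-up values:

# utilized quantity = (100 - idle_perc) * total_quantity / 100  (values made up)
total_quantity = 16.0    # e.g. cpu.total_logical_cores reported for a host
idle_perc = 75.0         # e.g. cpu.idle_perc for the same host and period

utilized_quantity = (100.0 - idle_perc) * total_quantity / 100.0
print(utilized_quantity)  # 4.0 -> four of the sixteen logical cores were in use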
@ -1,154 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from oslo_config import cfg
|
||||
|
||||
|
||||
class ConfigInitializer(object):
|
||||
|
||||
@staticmethod
|
||||
def basic_config(default_config_files=None):
|
||||
cfg.CONF.reset()
|
||||
ConfigInitializer.load_repositories_options()
|
||||
ConfigInitializer.load_database_options()
|
||||
ConfigInitializer.load_messaging_options()
|
||||
ConfigInitializer.load_service_options()
|
||||
ConfigInitializer.load_stage_processors_options()
|
||||
ConfigInitializer.load_pre_hourly_processor_options()
|
||||
if not default_config_files:
|
||||
default_config_files = ['/etc/monasca-transform.conf']
|
||||
cfg.CONF(args=[],
|
||||
project='monasca_transform',
|
||||
default_config_files=default_config_files)
|
||||
|
||||
@staticmethod
|
||||
def load_repositories_options():
|
||||
repo_opts = [
|
||||
cfg.StrOpt(
|
||||
'offsets',
|
||||
default='monasca_transform.offset_specs:JSONOffsetSpecs',
|
||||
help='Repository for offset persistence'
|
||||
),
|
||||
cfg.StrOpt(
|
||||
'data_driven_specs',
|
||||
default='monasca_transform.data_driven_specs.'
|
||||
'json_data_driven_specs_repo:JSONDataDrivenSpecsRepo',
|
||||
help='Repository for metric and event data_driven_specs'
|
||||
),
|
||||
cfg.IntOpt('offsets_max_revisions', default=10,
|
||||
help="Max revisions of offsets for each application")
|
||||
]
|
||||
repo_group = cfg.OptGroup(name='repositories', title='repositories')
|
||||
cfg.CONF.register_group(repo_group)
|
||||
cfg.CONF.register_opts(repo_opts, group=repo_group)
|
||||
|
||||
@staticmethod
|
||||
def load_database_options():
|
||||
db_opts = [
|
||||
cfg.StrOpt('server_type'),
|
||||
cfg.StrOpt('host'),
|
||||
cfg.StrOpt('database_name'),
|
||||
cfg.StrOpt('username'),
|
||||
cfg.StrOpt('password'),
|
||||
cfg.BoolOpt('use_ssl', default=False),
|
||||
cfg.StrOpt('ca_file')
|
||||
]
|
||||
mysql_group = cfg.OptGroup(name='database', title='database')
|
||||
cfg.CONF.register_group(mysql_group)
|
||||
cfg.CONF.register_opts(db_opts, group=mysql_group)
|
||||
|
||||
@staticmethod
|
||||
def load_messaging_options():
|
||||
messaging_options = [
|
||||
cfg.StrOpt('adapter',
|
||||
default='monasca_transform.messaging.adapter:'
|
||||
'KafkaMessageAdapter',
|
||||
help='Message adapter implementation'),
|
||||
cfg.StrOpt('topic', default='metrics',
|
||||
help='Messaging topic'),
|
||||
cfg.StrOpt('brokers',
|
||||
default='192.168.10.4:9092',
|
||||
help='Messaging brokers'),
|
||||
cfg.StrOpt('publish_kafka_project_id',
|
||||
default='111111',
|
||||
help='publish aggregated metrics tenant'),
|
||||
cfg.StrOpt('publish_region',
|
||||
default='useast',
|
||||
help='publish aggregated metrics region'),
|
||||
cfg.StrOpt('adapter_pre_hourly',
|
||||
default='monasca_transform.messaging.adapter:'
|
||||
'KafkaMessageAdapterPreHourly',
|
||||
help='Message adapter implementation'),
|
||||
cfg.StrOpt('topic_pre_hourly', default='metrics_pre_hourly',
|
||||
help='Messaging topic pre hourly')
|
||||
]
|
||||
messaging_group = cfg.OptGroup(name='messaging', title='messaging')
|
||||
cfg.CONF.register_group(messaging_group)
|
||||
cfg.CONF.register_opts(messaging_options, group=messaging_group)
|
||||
|
||||
@staticmethod
|
||||
def load_service_options():
|
||||
service_opts = [
|
||||
cfg.StrOpt('coordinator_address'),
|
||||
cfg.StrOpt('coordinator_group'),
|
||||
cfg.FloatOpt('election_polling_frequency'),
|
||||
cfg.BoolOpt('enable_debug_log_entries', default=False),
|
||||
cfg.StrOpt('setup_file'),
|
||||
cfg.StrOpt('setup_target'),
|
||||
cfg.StrOpt('spark_driver'),
|
||||
cfg.StrOpt('service_log_path'),
|
||||
cfg.StrOpt('service_log_filename',
|
||||
default='monasca-transform.log'),
|
||||
cfg.StrOpt('spark_event_logging_dest'),
|
||||
cfg.StrOpt('spark_event_logging_enabled'),
|
||||
cfg.StrOpt('spark_jars_list'),
|
||||
cfg.StrOpt('spark_master_list'),
|
||||
cfg.StrOpt('spark_python_files'),
|
||||
cfg.IntOpt('stream_interval'),
|
||||
cfg.StrOpt('work_dir'),
|
||||
cfg.StrOpt('spark_home'),
|
||||
cfg.BoolOpt('enable_record_store_df_cache'),
|
||||
cfg.StrOpt('record_store_df_cache_storage_level')
|
||||
]
|
||||
service_group = cfg.OptGroup(name='service', title='service')
|
||||
cfg.CONF.register_group(service_group)
|
||||
cfg.CONF.register_opts(service_opts, group=service_group)
|
||||
|
||||
@staticmethod
|
||||
def load_stage_processors_options():
|
||||
app_opts = [
|
||||
cfg.BoolOpt('pre_hourly_processor_enabled'),
|
||||
]
|
||||
app_group = cfg.OptGroup(name='stage_processors',
|
||||
title='stage_processors')
|
||||
cfg.CONF.register_group(app_group)
|
||||
cfg.CONF.register_opts(app_opts, group=app_group)
|
||||
|
||||
@staticmethod
|
||||
def load_pre_hourly_processor_options():
|
||||
app_opts = [
|
||||
cfg.IntOpt('late_metric_slack_time', default=600),
|
||||
cfg.StrOpt('data_provider',
|
||||
default='monasca_transform.processor.'
|
||||
'pre_hourly_processor:'
|
||||
'PreHourlyProcessorDataProvider'),
|
||||
cfg.BoolOpt('enable_instance_usage_df_cache'),
|
||||
cfg.StrOpt('instance_usage_df_cache_storage_level'),
|
||||
cfg.BoolOpt('enable_batch_time_filtering'),
|
||||
cfg.IntOpt('effective_batch_revision', default=2)
|
||||
]
|
||||
app_group = cfg.OptGroup(name='pre_hourly_processor',
|
||||
title='pre_hourly_processor')
|
||||
cfg.CONF.register_group(app_group)
|
||||
cfg.CONF.register_opts(app_opts, group=app_group)
|
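ConfigInitializer only registers option groups; the values come from an oslo.config file such as /etc/monasca-transform.conf. A minimal sketch of how a caller would load and read them. The module path in the import and the printed values are assumptions for illustration (the defaults shown match the options registered above):

# Assumption: ConfigInitializer lives at monasca_transform.config.config_initializer
# and /etc/monasca-transform.conf defines the groups registered above.
from oslo_config import cfg

from monasca_transform.config.config_initializer import ConfigInitializer

ConfigInitializer.basic_config(
    default_config_files=['/etc/monasca-transform.conf'])

print(cfg.CONF.messaging.topic)                              # 'metrics' unless overridden
print(cfg.CONF.repositories.offsets_max_revisions)           # 10 by default
print(cfg.CONF.pre_hourly_processor.late_metric_slack_time)  # 600 by default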
@ -1,43 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
import abc
|
||||
from monasca_common.simport import simport
|
||||
from oslo_config import cfg
|
||||
import six
|
||||
|
||||
|
||||
class DataDrivenSpecsRepoFactory(object):
|
||||
|
||||
data_driven_specs_repo = None
|
||||
|
||||
@staticmethod
|
||||
def get_data_driven_specs_repo():
|
||||
if not DataDrivenSpecsRepoFactory.data_driven_specs_repo:
|
||||
DataDrivenSpecsRepoFactory.data_driven_specs_repo = simport.load(
|
||||
cfg.CONF.repositories.data_driven_specs)()
|
||||
return DataDrivenSpecsRepoFactory.data_driven_specs_repo
|
||||
|
||||
|
||||
@six.add_metaclass(abc.ABCMeta)
|
||||
class DataDrivenSpecsRepo(object):
|
||||
|
||||
transform_specs_type = 'transform_specs'
|
||||
pre_transform_specs_type = 'pre_transform_specs'
|
||||
|
||||
@abc.abstractmethod
|
||||
def get_data_driven_specs(self, sql_context=None, type=None):
|
||||
raise NotImplementedError(
|
||||
"Class %s doesn't implement get_data_driven_specs(self, type=None)"
|
||||
% self.__class__.__name__)
|
@ -1,76 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
import json
|
||||
|
||||
from pyspark.sql import DataFrameReader
|
||||
|
||||
from monasca_transform.data_driven_specs.data_driven_specs_repo \
|
||||
import DataDrivenSpecsRepo
|
||||
from monasca_transform.db.db_utils import DbUtil
|
||||
|
||||
|
||||
class MySQLDataDrivenSpecsRepo(DataDrivenSpecsRepo):
|
||||
|
||||
transform_specs_data_frame = None
|
||||
pre_transform_specs_data_frame = None
|
||||
|
||||
def get_data_driven_specs(self, sql_context=None,
|
||||
data_driven_spec_type=None):
|
||||
data_driven_spec = None
|
||||
if self.transform_specs_type == data_driven_spec_type:
|
||||
if not self.transform_specs_data_frame:
|
||||
self.generate_transform_specs_data_frame(
|
||||
spark_context=sql_context._sc,
|
||||
sql_context=sql_context)
|
||||
data_driven_spec = self.transform_specs_data_frame
|
||||
elif self.pre_transform_specs_type == data_driven_spec_type:
|
||||
if not self.pre_transform_specs_data_frame:
|
||||
self.generate_pre_transform_specs_data_frame(
|
||||
spark_context=sql_context._sc,
|
||||
sql_context=sql_context)
|
||||
data_driven_spec = self.pre_transform_specs_data_frame
|
||||
return data_driven_spec
|
||||
|
||||
def generate_transform_specs_data_frame(self, spark_context=None,
|
||||
sql_context=None):
|
||||
|
||||
data_frame_reader = DataFrameReader(sql_context)
|
||||
transform_specs_data_frame = data_frame_reader.jdbc(
|
||||
DbUtil.get_java_db_connection_string(),
|
||||
'transform_specs'
|
||||
)
|
||||
data = []
|
||||
for item in transform_specs_data_frame.collect():
|
||||
spec = json.loads(item['transform_spec'])
|
||||
data.append(json.dumps(spec))
|
||||
|
||||
data_frame = sql_context.read.json(spark_context.parallelize(data))
|
||||
self.transform_specs_data_frame = data_frame
|
||||
|
||||
def generate_pre_transform_specs_data_frame(self, spark_context=None,
|
||||
sql_context=None):
|
||||
|
||||
data_frame_reader = DataFrameReader(sql_context)
|
||||
pre_transform_specs_data_frame = data_frame_reader.jdbc(
|
||||
DbUtil.get_java_db_connection_string(),
|
||||
'pre_transform_specs'
|
||||
)
|
||||
data = []
|
||||
for item in pre_transform_specs_data_frame.collect():
|
||||
spec = json.loads(item['pre_transform_spec'])
|
||||
data.append(json.dumps(spec))
|
||||
|
||||
data_frame = sql_context.read.json(spark_context.parallelize(data))
|
||||
self.pre_transform_specs_data_frame = data_frame
|
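The spec repositories are resolved indirectly: DataDrivenSpecsRepoFactory reads the 'repositories.data_driven_specs' option and loads the configured class (a JSON-file based repo or the MySQL one above) via simport. A usage sketch; it assumes the service configuration has already been loaded, and it uses the data_driven_spec_type keyword from the MySQL implementation above, which differs from the 'type' name in the abstract signature:

from pyspark.sql import SparkSession
from pyspark.sql import SQLContext

from monasca_transform.data_driven_specs.data_driven_specs_repo \
    import DataDrivenSpecsRepo
from monasca_transform.data_driven_specs.data_driven_specs_repo \
    import DataDrivenSpecsRepoFactory

spark = SparkSession.builder.master("local[1]").appName("specs-repo-sketch").getOrCreate()
sql_context = SQLContext(spark.sparkContext)

# the factory lazily loads whichever repo class the config points at
data_driven_specs_repo = DataDrivenSpecsRepoFactory.get_data_driven_specs_repo()

transform_specs_df = data_driven_specs_repo.get_data_driven_specs(
    sql_context=sql_context,
    data_driven_spec_type=DataDrivenSpecsRepo.transform_specs_type)

transform_specs_df.select(
    "metric_id", "aggregation_params_map.aggregated_metric_name").show()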
@ -1,17 +0,0 @@
|
||||
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"mem.total_mb","metric_id_list":["mem_total_all"],"required_raw_fields_list":["creation_time"]}
|
||||
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"mem.usable_mb","metric_id_list":["mem_usable_all"],"required_raw_fields_list":["creation_time"]}
|
||||
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"vm.mem.total_mb","metric_id_list":["vm_mem_total_mb_all","vm_mem_total_mb_project"],"required_raw_fields_list":["creation_time","dimensions#tenant_id","dimensions#resource_id"]}
|
||||
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"vm.mem.used_mb","metric_id_list":["vm_mem_used_mb_all","vm_mem_used_mb_project"],"required_raw_fields_list":["creation_time","dimensions#tenant_id","dimensions#resource_id"]}
|
||||
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"nova.vm.mem.total_allocated_mb","metric_id_list":["nova_vm_mem_total_all"],"required_raw_fields_list":["creation_time"]}
|
||||
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"disk.total_space_mb","metric_id_list":["disk_total_all"],"required_raw_fields_list":["creation_time"]}
|
||||
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"disk.total_used_space_mb","metric_id_list":["disk_usable_all"],"required_raw_fields_list":["creation_time"]}
|
||||
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"nova.vm.disk.total_allocated_gb","metric_id_list":["nova_disk_total_allocated_gb_all"],"required_raw_fields_list":["creation_time"]}
|
||||
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"vm.disk.allocation","metric_id_list":["vm_disk_allocation_all","vm_disk_allocation_project"],"required_raw_fields_list":["creation_time","dimensions#tenant_id","dimensions#resource_id"]}
|
||||
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"cpu.total_logical_cores","metric_id_list":["cpu_total_all","cpu_total_host","cpu_util_all","cpu_util_host"],"required_raw_fields_list":["creation_time"]}
|
||||
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"cpu.idle_perc","metric_id_list":["cpu_util_all","cpu_util_host"],"required_raw_fields_list":["creation_time"]}
|
||||
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"vcpus","metric_id_list":["vcpus_all","vcpus_project"],"required_raw_fields_list":["creation_time","dimensions#project_id","dimensions#resource_id"]}
|
||||
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"vm.cpu.utilization_perc","metric_id_list":["vm_cpu_util_perc_project"],"required_raw_fields_list":["creation_time","dimensions#tenant_id","dimensions#resource_id"]}
|
||||
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"nova.vm.cpu.total_allocated","metric_id_list":["nova_vm_cpu_total_all"],"required_raw_fields_list":["creation_time"]}
|
||||
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"swiftlm.diskusage.host.val.size","metric_id_list":["swift_total_all","swift_total_host"],"required_raw_fields_list":["creation_time", "dimensions#hostname", "dimensions#mount"]}
|
||||
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"swiftlm.diskusage.host.val.avail","metric_id_list":["swift_avail_all","swift_avail_host","swift_usage_rate"],"required_raw_fields_list":["creation_time", "dimensions#hostname", "dimensions#mount"]}
|
||||
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"storage.objects.size","metric_id_list":["storage_objects_size_all"],"required_raw_fields_list":["creation_time", "dimensions#project_id"]}
|
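# Illustrative sketch (not part of the original specs file): a plain-Python
# look at the structure of one pre_transform_spec row from the list above --
# the event_type it matches, the intermediate metric_id_list it fans out to,
# and the raw fields that must be present for a metric to be accepted.
import json

spec = json.loads(
    '{"event_processing_params":{"set_default_zone_to":"1",'
    '"set_default_geolocation_to":"1","set_default_region_to":"W"},'
    '"event_type":"mem.total_mb","metric_id_list":["mem_total_all"],'
    '"required_raw_fields_list":["creation_time"]}')

print("event_type:               ", spec["event_type"])
print("metric_id_list:           ", spec["metric_id_list"])
print("required_raw_fields_list: ", spec["required_raw_fields_list"])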
@ -1,26 +0,0 @@
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"mem.total_mb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"mem_total_all","metric_id":"mem_total_all"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"mem.usable_mb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"mem_usable_all","metric_id":"mem_usable_all"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"vm.mem.total_mb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id", "resource_uuid"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"vm_mem_total_mb_all","metric_id":"vm_mem_total_mb_all"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"vm.mem.total_mb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id", "resource_uuid"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":["tenant_id"],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"vm_mem_total_mb_project","metric_id":"vm_mem_total_mb_project"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"vm.mem.used_mb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id", "resource_uuid"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"vm_mem_used_mb_all","metric_id":"vm_mem_used_mb_all"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"vm.mem.used_mb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id", "resource_uuid"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":["tenant_id"],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"vm_mem_used_mb_project","metric_id":"vm_mem_used_mb_project"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"nova.vm.mem.total_allocated_mb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list": [],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"nova_vm_mem_total_all","metric_id":"nova_vm_mem_total_all"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"disk.total_space_mb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"disk_total_all","metric_id":"disk_total_all"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"disk.total_used_space_mb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"disk_usable_all","metric_id":"disk_usable_all"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"nova.vm.disk.total_allocated_gb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"nova_disk_total_allocated_gb_all","metric_id":"nova_disk_total_allocated_gb_all"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"vm.disk.allocation_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id", "resource_uuid"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"vm_disk_allocation_all","metric_id":"vm_disk_allocation_all"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"vm.disk.allocation_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id", "resource_uuid"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":["tenant_id"],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"vm_disk_allocation_project","metric_id":"vm_disk_allocation_project"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"cpu.total_logical_cores_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list": [],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"cpu_total_all","metric_id":"cpu_total_all"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"cpu.total_logical_cores_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":["host"],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"cpu_total_host","metric_id":"cpu_total_host"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity_util","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"cpu.utilized_logical_cores_agg","aggregation_period":"hourly","aggregation_group_by_list": ["event_type", "host"],"usage_fetch_operation": "avg","usage_fetch_util_quantity_event_type": "cpu.total_logical_cores","usage_fetch_util_idle_perc_event_type": "cpu.idle_perc","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"cpu_util_all","metric_id":"cpu_util_all"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity_util","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"cpu.utilized_logical_cores_agg","aggregation_period":"hourly","aggregation_group_by_list": ["event_type", "host"],"usage_fetch_operation": "avg","usage_fetch_util_quantity_event_type": "cpu.total_logical_cores","usage_fetch_util_idle_perc_event_type": "cpu.idle_perc","filter_by_list": [],"setter_rollup_group_by_list":["host"],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"cpu_util_host","metric_id":"cpu_util_host"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"vcpus_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "dimensions#tenant_id", "dimensions#resource_uuid"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"vcpus_all","metric_id":"vcpus_all"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"vcpus_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id", "resource_uuid"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":["tenant_id"],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"vcpus_project","metric_id":"vcpus_project"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"vm.cpu.utilization_perc_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id", "resource_uuid"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":["tenant_id"],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"vm_cpu_util_perc_project","metric_id":"vm_cpu_util_perc_project"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"nova.vm.cpu.total_allocated_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list": [],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"nova_vm_cpu_total_all","metric_id":"nova_vm_cpu_total_all"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"swiftlm.diskusage.val.size_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "dimensions#mount"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"swift_total_all","metric_id":"swift_total_all"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"swiftlm.diskusage.val.size_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "dimensions#mount"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":["host"],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"swift_total_host","metric_id":"swift_total_host"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"swiftlm.diskusage.val.avail_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "dimensions#mount"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"swift_avail_all","metric_id":"swift_avail_all"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"swiftlm.diskusage.val.avail_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "dimensions#mount"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":["host"],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"swift_avail_host","metric_id":"swift_avail_host"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"calculate_rate","setters":["set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"swiftlm.diskusage.rate_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "dimensions#mount"],"filter_by_list": [],"setter_rollup_group_by_list": [],"dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"rate","pre_hourly_group_by_list":["default"]},"metric_group":"swift_avail_rate","metric_id":"swift_usage_rate"}
|
||||
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"storage.objects.size_agg","aggregation_period":"hourly","aggregation_group_by_list": ["metric_id", "tenant_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"sum","pre_hourly_group_by_list":["default"]},"metric_group":"storage_objects_size_all","metric_id":"storage_objects_size_all"}
|
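# Illustrative sketch (not part of the original specs file): unpack an
# abbreviated version of one transform_spec row above to show how the
# aggregation pipeline is described as data -- a usage component, a list of
# setter components and a list of insert components, keyed by metric_id.
import json

spec = json.loads(
    '{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming",'
    '"usage":"fetch_quantity","setters":["rollup_quantity",'
    '"set_aggregated_metric_name","set_aggregated_period"],'
    '"insert":["prepare_data","insert_data_pre_hourly"]},'
    '"aggregated_metric_name":"mem.total_mb_agg","aggregation_period":"hourly"},'
    '"metric_group":"mem_total_all","metric_id":"mem_total_all"}')

pipeline = spec["aggregation_params_map"]["aggregation_pipeline"]
print("usage:  ", pipeline["usage"])
print("setters:", " -> ".join(pipeline["setters"]))
print("insert: ", " -> ".join(pipeline["insert"]))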
@ -1,54 +0,0 @@
|
||||
# (c) Copyright 2016 Hewlett Packard Enterprise Development LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
from oslo_config import cfg
|
||||
|
||||
|
||||
class DbUtil(object):
|
||||
|
||||
@staticmethod
|
||||
def get_python_db_connection_string(config=cfg.CONF):
|
||||
database_name = config.database.database_name
|
||||
database_server = config.database.host
|
||||
database_uid = config.database.username
|
||||
database_pwd = config.database.password
|
||||
|
||||
if config.database.use_ssl:
|
||||
db_ssl = "?ssl_ca=%s" % config.database.ca_file
|
||||
else:
|
||||
db_ssl = ''
|
||||
|
||||
return 'mysql+pymysql://%s:%s@%s/%s%s' % (
|
||||
database_uid,
|
||||
database_pwd,
|
||||
database_server,
|
||||
database_name,
|
||||
db_ssl)
|
||||
|
||||
@staticmethod
|
||||
def get_java_db_connection_string(config=cfg.CONF):
|
||||
|
||||
ssl_params = ''
|
||||
if config.database.use_ssl:
|
||||
ssl_params = "&useSSL=%s&requireSSL=%s" % (
|
||||
config.database.use_ssl, config.database.use_ssl
|
||||
)
|
||||
# FIXME I don't like this, find a better way of managing the conn
|
||||
return 'jdbc:%s://%s/%s?user=%s&password=%s%s' % (
|
||||
config.database.server_type,
|
||||
config.database.host,
|
||||
config.database.database_name,
|
||||
config.database.username,
|
||||
config.database.password,
|
||||
ssl_params,
|
||||
)
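# Illustrative sketch (not part of the original module): the shape of the two
# connection strings DbUtil builds above, using made-up credentials and a
# minimal stand-in for the oslo.config object so it runs without a real
# monasca-transform configuration. SSL suffixes are omitted for brevity.
from types import SimpleNamespace

config = SimpleNamespace(database=SimpleNamespace(
    database_name="monasca_transform", host="db.example.org",
    username="m-transform", password="secret", server_type="mysql"))

python_url = 'mysql+pymysql://%s:%s@%s/%s' % (
    config.database.username, config.database.password,
    config.database.host, config.database.database_name)
java_url = 'jdbc:%s://%s/%s?user=%s&password=%s' % (
    config.database.server_type, config.database.host,
    config.database.database_name, config.database.username,
    config.database.password)

print(python_url)  # mysql+pymysql://m-transform:secret@db.example.org/monasca_transform
print(java_url)    # jdbc:mysql://db.example.org/monasca_transform?user=m-transform&password=secret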
|
@ -1,570 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from pyspark import SparkConf
|
||||
from pyspark import SparkContext
|
||||
|
||||
from pyspark.streaming.kafka import KafkaUtils
|
||||
from pyspark.streaming.kafka import TopicAndPartition
|
||||
from pyspark.streaming import StreamingContext
|
||||
|
||||
from pyspark.sql.functions import explode
|
||||
from pyspark.sql.functions import from_unixtime
|
||||
from pyspark.sql.functions import when
|
||||
from pyspark.sql import SQLContext
|
||||
|
||||
import logging
|
||||
from monasca_common.simport import simport
|
||||
from oslo_config import cfg
|
||||
import time
|
||||
|
||||
from monasca_transform.component.usage.fetch_quantity import \
|
||||
FetchQuantityException
|
||||
from monasca_transform.component.usage.fetch_quantity_util import \
|
||||
FetchQuantityUtilException
|
||||
from monasca_transform.config.config_initializer import ConfigInitializer
|
||||
from monasca_transform.log_utils import LogUtils
|
||||
from monasca_transform.transform.builder.generic_transform_builder \
|
||||
import GenericTransformBuilder
|
||||
|
||||
from monasca_transform.data_driven_specs.data_driven_specs_repo \
|
||||
import DataDrivenSpecsRepo
|
||||
|
||||
from monasca_transform.data_driven_specs.data_driven_specs_repo \
|
||||
import DataDrivenSpecsRepoFactory
|
||||
|
||||
from monasca_transform.processor.pre_hourly_processor import PreHourlyProcessor
|
||||
|
||||
from monasca_transform.transform import RddTransformContext
|
||||
from monasca_transform.transform.storage_utils import \
|
||||
InvalidCacheStorageLevelException
|
||||
from monasca_transform.transform.storage_utils import StorageUtils
|
||||
from monasca_transform.transform.transform_utils import MonMetricUtils
|
||||
from monasca_transform.transform.transform_utils import PreTransformSpecsUtils
|
||||
from monasca_transform.transform import TransformContextUtils
|
||||
|
||||
ConfigInitializer.basic_config()
|
||||
log = LogUtils.init_logger(__name__)
|
||||
|
||||
|
||||
class MonMetricsKafkaProcessor(object):
|
||||
|
||||
@staticmethod
|
||||
def log_debug(message):
|
||||
print(message)
|
||||
log.debug(message)
|
||||
|
||||
@staticmethod
|
||||
def store_offset_ranges(batch_time, rdd):
|
||||
if rdd.isEmpty():
|
||||
MonMetricsKafkaProcessor.log_debug(
|
||||
"storeOffsetRanges: nothing to process...")
|
||||
return rdd
|
||||
else:
|
||||
my_offset_ranges = rdd.offsetRanges()
|
||||
transform_context = \
|
||||
TransformContextUtils.get_context(offset_info=my_offset_ranges,
|
||||
batch_time_info=batch_time
|
||||
)
|
||||
rdd_transform_context = \
|
||||
rdd.map(lambda x: RddTransformContext(x, transform_context))
|
||||
return rdd_transform_context
|
||||
|
||||
@staticmethod
|
||||
def print_offset_ranges(my_offset_ranges):
|
||||
for o in my_offset_ranges:
|
||||
print("printOffSetRanges: %s %s %s %s" % (
|
||||
o.topic, o.partition, o.fromOffset, o.untilOffset))
|
||||
|
||||
@staticmethod
|
||||
def get_kafka_stream(topic, streaming_context):
|
||||
offset_specifications = simport.load(cfg.CONF.repositories.offsets)()
|
||||
app_name = streaming_context.sparkContext.appName
|
||||
saved_offset_spec = offset_specifications.get_kafka_offsets(app_name)
|
||||
if len(saved_offset_spec) < 1:
|
||||
|
||||
MonMetricsKafkaProcessor.log_debug(
|
||||
"No saved offsets available..."
|
||||
"connecting to kafka without specifying offsets")
|
||||
kvs = KafkaUtils.createDirectStream(
|
||||
streaming_context, [topic],
|
||||
{"metadata.broker.list": cfg.CONF.messaging.brokers})
|
||||
|
||||
return kvs
|
||||
|
||||
else:
|
||||
from_offsets = {}
|
||||
for key, value in saved_offset_spec.items():
|
||||
if key.startswith("%s_%s" % (app_name, topic)):
|
||||
# spec_app_name = value.get_app_name()
|
||||
spec_topic = value.get_topic()
|
||||
spec_partition = int(value.get_partition())
|
||||
# spec_from_offset = value.get_from_offset()
|
||||
spec_until_offset = value.get_until_offset()
|
||||
# composite_key = "%s_%s_%s" % (spec_app_name,
|
||||
# spec_topic,
|
||||
# spec_partition)
|
||||
# partition = saved_offset_spec[composite_key]
|
||||
from_offsets[
|
||||
TopicAndPartition(spec_topic, spec_partition)
|
||||
] = int(spec_until_offset)
|
||||
|
||||
MonMetricsKafkaProcessor.log_debug(
|
||||
"get_kafka_stream: calling createDirectStream :"
|
||||
" topic:{%s} : start " % topic)
|
||||
for key, value in from_offsets.items():
|
||||
MonMetricsKafkaProcessor.log_debug(
|
||||
"get_kafka_stream: calling createDirectStream : "
|
||||
"offsets : TopicAndPartition:{%s,%s}, value:{%s}" %
|
||||
(str(key._topic), str(key._partition), str(value)))
|
||||
MonMetricsKafkaProcessor.log_debug(
|
||||
"get_kafka_stream: calling createDirectStream : "
|
||||
"topic:{%s} : done" % topic)
|
||||
|
||||
kvs = KafkaUtils.createDirectStream(
|
||||
streaming_context, [topic],
|
||||
{"metadata.broker.list": cfg.CONF.messaging.brokers},
|
||||
from_offsets)
|
||||
return kvs
|
||||
|
||||
@staticmethod
|
||||
def save_rdd_contents(rdd):
|
||||
file_name = "".join((
|
||||
"/vagrant_home/uniq_metrics",
|
||||
'-', time.strftime("%Y-%m-%d-%H-%M-%S"),
|
||||
'-', str(rdd.id),
|
||||
'.log'))
|
||||
rdd.saveAsTextFile(file_name)
|
||||
|
||||
@staticmethod
|
||||
def save_kafka_offsets(current_offsets, app_name,
|
||||
batch_time_info):
|
||||
"""save current offsets to offset specification."""
|
||||
|
||||
offset_specs = simport.load(cfg.CONF.repositories.offsets)()
|
||||
|
||||
for o in current_offsets:
|
||||
MonMetricsKafkaProcessor.log_debug(
|
||||
"saving: OffSetRanges: %s %s %s %s, "
|
||||
"batch_time_info: %s" % (
|
||||
o.topic, o.partition, o.fromOffset, o.untilOffset,
|
||||
str(batch_time_info)))
|
||||
# add new offsets, update revision
|
||||
offset_specs.add_all_offsets(app_name,
|
||||
current_offsets,
|
||||
batch_time_info)
|
||||
|
||||
@staticmethod
|
||||
def reset_kafka_offsets(app_name):
|
||||
"""delete all offsets from the offset specification."""
|
||||
# load the configured offsets repository
|
||||
offset_specs = simport.load(cfg.CONF.repositories.offsets)()
|
||||
offset_specs.delete_all_kafka_offsets(app_name)
|
||||
|
||||
@staticmethod
|
||||
def _validate_raw_mon_metrics(row):
|
||||
|
||||
required_fields = row.required_raw_fields_list
|
||||
|
||||
# prepare the list of required fields, converting it to row attribute
# syntax so the values can be retrieved
|
||||
required_fields = PreTransformSpecsUtils.prepare_required_raw_fields_list(
|
||||
required_fields)
|
||||
|
||||
invalid_list = []
|
||||
for required_field in required_fields:
|
||||
required_field_value = None
|
||||
|
||||
# Look for the field in the first layer of the row
|
||||
try:
|
||||
required_field_value = eval(".".join(("row", required_field)))
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
if required_field_value is None \
|
||||
or required_field_value == "":
|
||||
invalid_list.append((required_field,
|
||||
required_field_value))
|
||||
|
||||
if len(invalid_list) <= 0:
|
||||
return row
|
||||
else:
|
||||
for field_name, field_value in invalid_list:
|
||||
MonMetricsKafkaProcessor.log_debug(
|
||||
"_validate_raw_mon_metrics : found invalid field : ** %s: %s" % (
|
||||
field_name, field_value))
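# Illustrative sketch (not part of the original module): a simplified version
# of the required-field check above, using getattr chains on namedtuples
# instead of PreTransformSpecsUtils plus eval. The dotted paths used here are
# only an approximation of the prepared field syntax.
from collections import namedtuple

Dimensions = namedtuple("Dimensions", ["tenant_id", "resource_id"])
Row = namedtuple("Row", ["creation_time", "dimensions"])

row = Row(creation_time="2016-08-17 12:00:00",
          dimensions=Dimensions(tenant_id="", resource_id="vm-1"))

def lookup(obj, dotted_path):
    # walk "dimensions.tenant_id" one attribute at a time
    for part in dotted_path.split("."):
        obj = getattr(obj, part, None)
        if obj is None:
            return None
    return obj

invalid = [path for path in ("creation_time", "dimensions.tenant_id")
           if not lookup(row, path)]
print("invalid fields:", invalid)  # ['dimensions.tenant_id'] -- empty tenant_id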
|
||||
|
||||
@staticmethod
|
||||
def process_metric(transform_context, record_store_df):
|
||||
"""process (aggregate) metric data from record_store data
|
||||
|
||||
All the parameters to drive processing should be available
|
||||
in transform_spec_df dataframe.
|
||||
"""
|
||||
# call processing chain
|
||||
return GenericTransformBuilder.do_transform(
|
||||
transform_context, record_store_df)
|
||||
|
||||
@staticmethod
|
||||
def process_metrics(transform_context, record_store_df):
|
||||
"""start processing (aggregating) metrics"""
|
||||
#
|
||||
# look in record_store_df for list of metrics to be processed
|
||||
#
|
||||
metric_ids_df = record_store_df.select("metric_id").distinct()
|
||||
metric_ids_to_process = [row.metric_id
|
||||
for row in metric_ids_df.collect()]
|
||||
|
||||
data_driven_specs_repo = DataDrivenSpecsRepoFactory.\
|
||||
get_data_driven_specs_repo()
|
||||
sqlc = SQLContext.getOrCreate(record_store_df.rdd.context)
|
||||
transform_specs_df = data_driven_specs_repo.get_data_driven_specs(
|
||||
sql_context=sqlc,
|
||||
data_driven_spec_type=DataDrivenSpecsRepo.transform_specs_type)
|
||||
|
||||
for metric_id in metric_ids_to_process:
|
||||
transform_spec_df = transform_specs_df.select(
|
||||
["aggregation_params_map", "metric_id"]
|
||||
).where(transform_specs_df.metric_id == metric_id)
|
||||
source_record_store_df = record_store_df.select("*").where(
|
||||
record_store_df.metric_id == metric_id)
|
||||
|
||||
# set transform_spec_df in TransformContext
|
||||
transform_context = \
|
||||
TransformContextUtils.get_context(
|
||||
transform_context_info=transform_context,
|
||||
transform_spec_df_info=transform_spec_df)
|
||||
|
||||
try:
|
||||
agg_inst_usage_df = (
|
||||
MonMetricsKafkaProcessor.process_metric(
|
||||
transform_context, source_record_store_df))
|
||||
|
||||
# if running in debug mode, write out the aggregated metric
|
||||
# name just processed (along with the count of how many of
|
||||
# these were aggregated) to the application log.
|
||||
if log.isEnabledFor(logging.DEBUG):
|
||||
agg_inst_usage_collection = agg_inst_usage_df.collect()
|
||||
collection_len = len(agg_inst_usage_collection)
|
||||
if collection_len > 0:
|
||||
agg_inst_usage_dict = (
|
||||
agg_inst_usage_collection[0].asDict())
|
||||
log.debug("Submitted pre-hourly aggregated metric: "
|
||||
"%s (%s)",
|
||||
agg_inst_usage_dict[
|
||||
"aggregated_metric_name"],
|
||||
str(collection_len))
|
||||
except FetchQuantityException:
|
||||
raise
|
||||
except FetchQuantityUtilException:
|
||||
raise
|
||||
except Exception as e:
|
||||
MonMetricsKafkaProcessor.log_debug(
|
||||
"Exception raised in metric processing for metric: " +
|
||||
str(metric_id) + ". Error: " + str(e))
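# Illustrative sketch (not part of the original module): the dispatch pattern
# used by process_metrics above, reduced to plain Python -- group record-store
# rows by metric_id, pair each group with its transform spec, and hand the
# pair to a processing callable. The spec and records are made up.
from collections import defaultdict

transform_specs = {"mem_total_all": {"aggregated_metric_name": "mem.total_mb_agg"}}
record_store = [
    {"metric_id": "mem_total_all", "event_quantity": 2048.0},
    {"metric_id": "mem_total_all", "event_quantity": 4096.0},
]

by_metric_id = defaultdict(list)
for record in record_store:
    by_metric_id[record["metric_id"]].append(record)

def process_metric(spec, records):
    total = sum(r["event_quantity"] for r in records)
    return spec["aggregated_metric_name"], total

for metric_id, records in by_metric_id.items():
    print(process_metric(transform_specs[metric_id], records))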
|
||||
|
||||
@staticmethod
|
||||
def rdd_to_recordstore(rdd_transform_context_rdd):
|
||||
|
||||
if rdd_transform_context_rdd.isEmpty():
|
||||
MonMetricsKafkaProcessor.log_debug(
|
||||
"rdd_to_recordstore: nothing to process...")
|
||||
else:
|
||||
|
||||
sql_context = SQLContext.getOrCreate(
|
||||
rdd_transform_context_rdd.context)
|
||||
data_driven_specs_repo = DataDrivenSpecsRepoFactory.\
|
||||
get_data_driven_specs_repo()
|
||||
pre_transform_specs_df = data_driven_specs_repo.\
|
||||
get_data_driven_specs(
|
||||
sql_context=sql_context,
|
||||
data_driven_spec_type=DataDrivenSpecsRepo.
|
||||
pre_transform_specs_type)
|
||||
|
||||
#
|
||||
# extract second column containing raw metric data
|
||||
#
|
||||
raw_mon_metrics = rdd_transform_context_rdd.map(
|
||||
lambda nt: nt.rdd_info[1])
|
||||
|
||||
#
|
||||
# convert raw metric data rdd to dataframe rdd
|
||||
#
|
||||
raw_mon_metrics_df = \
|
||||
MonMetricUtils.create_mon_metrics_df_from_json_rdd(
|
||||
sql_context,
|
||||
raw_mon_metrics)
|
||||
|
||||
#
|
||||
# filter out unwanted metrics and keep metrics we are interested in
|
||||
#
|
||||
cond = [
|
||||
raw_mon_metrics_df.metric["name"] ==
|
||||
pre_transform_specs_df.event_type]
|
||||
filtered_metrics_df = raw_mon_metrics_df.join(
|
||||
pre_transform_specs_df, cond)
|
||||
|
||||
#
|
||||
# validate filtered metrics to check if required fields
|
||||
# are present and not empty
|
||||
# In order to apply the filter function, the data frame had to be
# converted to a plain rdd. After validation the rdd is
# converted back to a dataframe
|
||||
#
|
||||
# FIXME: find a way to apply filter function on dataframe rdd data
|
||||
validated_mon_metrics_rdd = filtered_metrics_df.rdd.filter(
|
||||
MonMetricsKafkaProcessor._validate_raw_mon_metrics)
|
||||
validated_mon_metrics_df = sql_context.createDataFrame(
|
||||
validated_mon_metrics_rdd, filtered_metrics_df.schema)
|
||||
|
||||
#
|
||||
# record generator
|
||||
# generate a new intermediate metric record when a given metric's
# metric_id_list in the pre_transform_specs table defines several
# intermediate metrics.
# intermediate metrics are a convenient way to
# process the (aggregated) metric in multiple ways by making a copy
# of the source data for each processing
|
||||
#
|
||||
gen_mon_metrics_df = validated_mon_metrics_df.select(
|
||||
validated_mon_metrics_df.meta,
|
||||
validated_mon_metrics_df.metric,
|
||||
validated_mon_metrics_df.event_processing_params,
|
||||
validated_mon_metrics_df.event_type,
|
||||
explode(validated_mon_metrics_df.metric_id_list).alias(
|
||||
"this_metric_id"))
|
||||
|
||||
#
|
||||
# transform metrics data to record_store format
|
||||
# record store format is the common format which will serve as
|
||||
# source to aggregation processing.
|
||||
# converting the metric to common standard format helps in writing
|
||||
# generic aggregation routines driven by configuration parameters
|
||||
# and can be reused
|
||||
#
|
||||
record_store_df = gen_mon_metrics_df.select(
|
||||
(gen_mon_metrics_df.metric.timestamp / 1000).alias(
|
||||
"event_timestamp_unix"),
|
||||
from_unixtime(
|
||||
gen_mon_metrics_df.metric.timestamp / 1000).alias(
|
||||
"event_timestamp_string"),
|
||||
gen_mon_metrics_df.event_type.alias("event_type"),
|
||||
gen_mon_metrics_df.event_type.alias("event_quantity_name"),
|
||||
(gen_mon_metrics_df.metric.value / 1.0).alias(
|
||||
"event_quantity"),
|
||||
|
||||
# resource_uuid
|
||||
when(gen_mon_metrics_df.metric.dimensions.instanceId != '',
|
||||
gen_mon_metrics_df.metric.dimensions.instanceId).when(
|
||||
gen_mon_metrics_df.metric.dimensions.resource_id != '',
|
||||
gen_mon_metrics_df.metric.dimensions.resource_id).
|
||||
otherwise('NA').alias("resource_uuid"),
|
||||
|
||||
# tenant_id
|
||||
when(gen_mon_metrics_df.metric.dimensions.tenantId != '',
|
||||
gen_mon_metrics_df.metric.dimensions.tenantId).when(
|
||||
gen_mon_metrics_df.metric.dimensions.tenant_id != '',
|
||||
gen_mon_metrics_df.metric.dimensions.tenant_id).when(
|
||||
gen_mon_metrics_df.metric.dimensions.project_id != '',
|
||||
gen_mon_metrics_df.metric.dimensions.project_id).otherwise(
|
||||
'NA').alias("tenant_id"),
|
||||
|
||||
# user_id
|
||||
when(gen_mon_metrics_df.meta.userId != '',
|
||||
gen_mon_metrics_df.meta.userId).otherwise('NA').alias(
|
||||
"user_id"),
|
||||
|
||||
# region
|
||||
when(gen_mon_metrics_df.meta.region != '',
|
||||
gen_mon_metrics_df.meta.region).when(
|
||||
gen_mon_metrics_df.event_processing_params
|
||||
.set_default_region_to != '',
|
||||
gen_mon_metrics_df.event_processing_params
|
||||
.set_default_region_to).otherwise(
|
||||
'NA').alias("region"),
|
||||
|
||||
# zone
|
||||
when(gen_mon_metrics_df.meta.zone != '',
|
||||
gen_mon_metrics_df.meta.zone).when(
|
||||
gen_mon_metrics_df.event_processing_params
|
||||
.set_default_zone_to != '',
|
||||
gen_mon_metrics_df.event_processing_params
|
||||
.set_default_zone_to).otherwise(
|
||||
'NA').alias("zone"),
|
||||
|
||||
# host
|
||||
when(gen_mon_metrics_df.metric.dimensions.hostname != '',
|
||||
gen_mon_metrics_df.metric.dimensions.hostname).when(
|
||||
gen_mon_metrics_df.metric.value_meta.host != '',
|
||||
gen_mon_metrics_df.metric.value_meta.host).otherwise(
|
||||
'NA').alias("host"),
|
||||
|
||||
# event_date
|
||||
from_unixtime(gen_mon_metrics_df.metric.timestamp / 1000,
|
||||
'yyyy-MM-dd').alias("event_date"),
|
||||
# event_hour
|
||||
from_unixtime(gen_mon_metrics_df.metric.timestamp / 1000,
|
||||
'HH').alias("event_hour"),
|
||||
# event_minute
|
||||
from_unixtime(gen_mon_metrics_df.metric.timestamp / 1000,
|
||||
'mm').alias("event_minute"),
|
||||
# event_second
|
||||
from_unixtime(gen_mon_metrics_df.metric.timestamp / 1000,
|
||||
'ss').alias("event_second"),
|
||||
# TODO(ashwin): rename to transform_spec_group
|
||||
gen_mon_metrics_df.this_metric_id.alias("metric_group"),
|
||||
# TODO(ashwin): rename to transform_spec_id
|
||||
gen_mon_metrics_df.this_metric_id.alias("metric_id"),
|
||||
|
||||
# metric dimensions
|
||||
gen_mon_metrics_df.meta.alias("meta"),
|
||||
# metric dimensions
|
||||
gen_mon_metrics_df.metric.dimensions.alias("dimensions"),
|
||||
# metric value_meta
|
||||
gen_mon_metrics_df.metric.value_meta.alias("value_meta"))
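# Illustrative sketch (not part of the original module): the record-store
# mapping above applied to a single made-up monasca metric envelope in plain
# Python. It shows the timestamp split and the 'NA' defaulting for missing
# dimensions; the real code expresses this with Spark column functions.
import datetime

raw = {"metric": {"name": "mem.total_mb", "timestamp": 1471461000000,
                  "value": 2048.0, "dimensions": {"hostname": "compute-1"}},
       "meta": {"region": "region-a", "tenantId": "", "userId": ""}}

ts = datetime.datetime.fromtimestamp(raw["metric"]["timestamp"] / 1000.0,
                                     tz=datetime.timezone.utc)
dims = raw["metric"]["dimensions"]
record = {
    "event_timestamp_unix": raw["metric"]["timestamp"] / 1000.0,
    "event_type": raw["metric"]["name"],
    "event_quantity": raw["metric"]["value"] / 1.0,
    "tenant_id": raw["meta"]["tenantId"] or "NA",
    "host": dims.get("hostname") or "NA",
    "event_date": ts.strftime("%Y-%m-%d"),
    "event_hour": ts.strftime("%H"),
}
print(record)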
|
||||
|
||||
#
|
||||
# get transform context
|
||||
#
|
||||
rdd_transform_context = rdd_transform_context_rdd.first()
|
||||
transform_context = rdd_transform_context.transform_context_info
|
||||
|
||||
#
|
||||
# cache record store rdd
|
||||
#
|
||||
if cfg.CONF.service.enable_record_store_df_cache:
|
||||
storage_level_prop = \
|
||||
cfg.CONF.service.record_store_df_cache_storage_level
|
||||
try:
|
||||
storage_level = StorageUtils.get_storage_level(
|
||||
storage_level_prop)
|
||||
except InvalidCacheStorageLevelException as storage_error:
|
||||
storage_error.value += \
|
||||
" (as specified in " \
|
||||
"service.record_store_df_cache_storage_level)"
|
||||
raise
|
||||
record_store_df.persist(storage_level)
|
||||
|
||||
#
|
||||
# start processing metrics available in record_store data
|
||||
#
|
||||
MonMetricsKafkaProcessor.process_metrics(transform_context,
|
||||
record_store_df)
|
||||
|
||||
# remove df from cache
|
||||
if cfg.CONF.service.enable_record_store_df_cache:
|
||||
record_store_df.unpersist()
|
||||
|
||||
#
|
||||
# extract kafka offsets and batch processing time
|
||||
# stored in transform_context and save offsets
|
||||
#
|
||||
offsets = transform_context.offset_info
|
||||
|
||||
# batch time
|
||||
batch_time_info = \
|
||||
transform_context.batch_time_info
|
||||
|
||||
MonMetricsKafkaProcessor.save_kafka_offsets(
|
||||
offsets, rdd_transform_context_rdd.context.appName,
|
||||
batch_time_info)
|
||||
|
||||
# call pre hourly processor, if its time to run
|
||||
if (cfg.CONF.stage_processors.pre_hourly_processor_enabled and
|
||||
PreHourlyProcessor.is_time_to_run(batch_time_info)):
|
||||
PreHourlyProcessor.run_processor(
|
||||
record_store_df.rdd.context,
|
||||
batch_time_info)
|
||||
|
||||
@staticmethod
|
||||
def transform_to_recordstore(kvs):
|
||||
"""Transform metrics data from kafka to record store format.
|
||||
|
||||
extracts, validates, filters and generates data from Kafka, keeping only
data that has to be aggregated. The generate step produces multiple
records for the same incoming metric if the metric has multiple
intermediate metrics defined, so that each intermediate metric can
be processed independently.
|
||||
"""
|
||||
# capture kafka offset ranges for this batch
|
||||
# http://spark.apache.org/docs/latest/streaming-kafka-integration.html
|
||||
# Note that the typecast to HasOffsetRanges will only succeed if it is
|
||||
# done in the first method called on the directKafkaStream, not later
|
||||
# down a chain of methods. You can use transform() instead of
|
||||
# foreachRDD() as your first method call in order to access offsets,
|
||||
# then call further Spark methods. However, be aware that the
|
||||
# one-to-one mapping between RDD partition and Kafka partition does not
|
||||
# remain after any methods that shuffle or repartition,
|
||||
# e.g. reduceByKey() or window()
|
||||
kvs.transform(
|
||||
MonMetricsKafkaProcessor.store_offset_ranges
|
||||
).foreachRDD(MonMetricsKafkaProcessor.rdd_to_recordstore)
|
||||
|
||||
|
||||
def invoke():
|
||||
# initialize basic configuration
|
||||
ConfigInitializer.basic_config()
|
||||
|
||||
# app name
|
||||
application_name = "mon_metrics_kafka"
|
||||
|
||||
my_spark_conf = SparkConf().setAppName(application_name)
|
||||
|
||||
spark_context = SparkContext(conf=my_spark_conf)
|
||||
|
||||
# read at the configured interval
|
||||
spark_streaming_context = \
|
||||
StreamingContext(spark_context, cfg.CONF.service.stream_interval)
|
||||
|
||||
kafka_stream = MonMetricsKafkaProcessor.get_kafka_stream(
|
||||
cfg.CONF.messaging.topic,
|
||||
spark_streaming_context)
|
||||
|
||||
# transform to recordstore
|
||||
MonMetricsKafkaProcessor.transform_to_recordstore(kafka_stream)
|
||||
|
||||
# catch interrupt, stop streaming context gracefully
|
||||
# signal.signal(signal.SIGINT, signal_handler)
|
||||
|
||||
# start processing
|
||||
spark_streaming_context.start()
|
||||
|
||||
# FIXME: stop spark context to relinquish resources
|
||||
|
||||
# FIXME: specify cores, so as not to use all the resources on the cluster.
|
||||
|
||||
# FIXME: HA deploy multiple masters, may be one on each control node
|
||||
|
||||
try:
|
||||
# Wait for the Spark driver to "finish"
|
||||
spark_streaming_context.awaitTermination()
|
||||
except Exception as e:
|
||||
MonMetricsKafkaProcessor.log_debug(
|
||||
"Exception raised during Spark execution : " + str(e))
|
||||
# One exception that can occur here is the result of the saved
|
||||
# kafka offsets being obsolete/out of range. Delete the saved
|
||||
# offsets to improve the chance of success on the next execution.
|
||||
|
||||
# TODO(someone) prevent deleting all offsets for an application,
|
||||
# but just the latest revision
|
||||
MonMetricsKafkaProcessor.log_debug(
|
||||
"Deleting saved offsets for chance of success on next execution")
|
||||
|
||||
MonMetricsKafkaProcessor.reset_kafka_offsets(application_name)
|
||||
|
||||
# delete pre hourly processor offsets
|
||||
if cfg.CONF.stage_processors.pre_hourly_processor_enabled:
|
||||
PreHourlyProcessor.reset_kafka_offsets()
|
||||
|
||||
if __name__ == "__main__":
|
||||
invoke()
|
@ -1,53 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
import logging
|
||||
from oslo_config import cfg
|
||||
|
||||
|
||||
class LogUtils(object):
|
||||
"""util methods for logging"""
|
||||
|
||||
@staticmethod
|
||||
def log_debug(message):
|
||||
log = logging.getLogger(__name__)
|
||||
print(message)
|
||||
log.debug(message)
|
||||
|
||||
@staticmethod
|
||||
def who_am_i(obj):
|
||||
sep = "*" * 10
|
||||
debugstr = "\n".join((sep, "name: %s " % type(obj).__name__))
|
||||
debugstr = "\n".join((debugstr, "type: %s" % (type(obj))))
|
||||
debugstr = "\n".join((debugstr, "dir: %s" % (dir(obj)), sep))
|
||||
LogUtils.log_debug(debugstr)
|
||||
|
||||
@staticmethod
|
||||
def init_logger(logger_name):
|
||||
|
||||
# initialize logger
|
||||
log = logging.getLogger(logger_name)
|
||||
_h = logging.FileHandler('%s/%s' % (
|
||||
cfg.CONF.service.service_log_path,
|
||||
cfg.CONF.service.service_log_filename))
|
||||
_h.setFormatter(logging.Formatter("'%(asctime)s - %(pathname)s:"
|
||||
"%(lineno)s - %(levelname)s"
|
||||
" - %(message)s'"))
|
||||
log.addHandler(_h)
|
||||
if cfg.CONF.service.enable_debug_log_entries:
|
||||
log.setLevel(logging.DEBUG)
|
||||
else:
|
||||
log.setLevel(logging.INFO)
|
||||
|
||||
return log
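# Illustrative sketch (not part of the original module): the same handler and
# formatter wiring as init_logger above, but pointed at a file in the current
# directory so it runs outside a monasca-transform deployment.
import logging

log = logging.getLogger("log_utils_demo")
_h = logging.FileHandler("log_utils_demo.log")  # written to the current directory
_h.setFormatter(logging.Formatter("'%(asctime)s - %(pathname)s:"
                                  "%(lineno)s - %(levelname)s"
                                  " - %(message)s'"))
log.addHandler(_h)
log.setLevel(logging.DEBUG)
log.debug("logger initialized")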
|
@ -1,85 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
import abc
|
||||
import json
|
||||
from monasca_common.kafka_lib.client import KafkaClient
|
||||
from monasca_common.kafka_lib.producer import SimpleProducer
|
||||
from monasca_common.simport import simport
|
||||
from oslo_config import cfg
|
||||
|
||||
|
||||
class MessageAdapter(object):
|
||||
|
||||
@abc.abstractmethod
|
||||
def do_send_metric(self, metric):
|
||||
raise NotImplementedError(
|
||||
"Class %s doesn't implement do_send_metric(self, metric)"
|
||||
% self.__class__.__name__)
|
||||
|
||||
|
||||
class KafkaMessageAdapter(MessageAdapter):
|
||||
|
||||
adapter_impl = None
|
||||
|
||||
def __init__(self):
|
||||
client_for_writing = KafkaClient(cfg.CONF.messaging.brokers)
|
||||
self.producer = SimpleProducer(client_for_writing)
|
||||
self.topic = cfg.CONF.messaging.topic
|
||||
|
||||
@staticmethod
|
||||
def init():
|
||||
# load the configured messaging adapter implementation
|
||||
KafkaMessageAdapter.adapter_impl = simport.load(
|
||||
cfg.CONF.messaging.adapter)()
|
||||
|
||||
def do_send_metric(self, metric):
|
||||
self.producer.send_messages(
|
||||
self.topic,
|
||||
json.dumps(metric, separators=(',', ':')))
|
||||
return
|
||||
|
||||
@staticmethod
|
||||
def send_metric(metric):
|
||||
if not KafkaMessageAdapter.adapter_impl:
|
||||
KafkaMessageAdapter.init()
|
||||
KafkaMessageAdapter.adapter_impl.do_send_metric(metric)
|
||||
|
||||
|
||||
class KafkaMessageAdapterPreHourly(MessageAdapter):
|
||||
|
||||
adapter_impl = None
|
||||
|
||||
def __init__(self):
|
||||
client_for_writing = KafkaClient(cfg.CONF.messaging.brokers)
|
||||
self.producer = SimpleProducer(client_for_writing)
|
||||
self.topic = cfg.CONF.messaging.topic_pre_hourly
|
||||
|
||||
@staticmethod
|
||||
def init():
|
||||
# load the configured pre-hourly messaging adapter implementation
|
||||
KafkaMessageAdapterPreHourly.adapter_impl = simport.load(
|
||||
cfg.CONF.messaging.adapter_pre_hourly)()
|
||||
|
||||
def do_send_metric(self, metric):
|
||||
self.producer.send_messages(
|
||||
self.topic,
|
||||
json.dumps(metric, separators=(',', ':')))
|
||||
return
|
||||
|
||||
@staticmethod
|
||||
def send_metric(metric):
|
||||
if not KafkaMessageAdapterPreHourly.adapter_impl:
|
||||
KafkaMessageAdapterPreHourly.init()
|
||||
KafkaMessageAdapterPreHourly.adapter_impl.do_send_metric(metric)
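# Illustrative sketch (not part of the original module): the compact JSON
# serialization used by do_send_metric above, shown without a Kafka broker.
# The metric envelope is made up but follows the monasca metric shape.
import json

metric = {"metric": {"name": "mem.total_mb_agg", "value": 6144.0,
                     "timestamp": 1471461600000,
                     "dimensions": {"aggregation_period": "hourly"}},
          "meta": {"region": "region-a", "tenantId": "None"}}

payload = json.dumps(metric, separators=(',', ':'))
print(payload)  # no spaces after ',' or ':' keeps the Kafka message small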
|
@ -1,197 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
import datetime
|
||||
from oslo_config import cfg
|
||||
from sqlalchemy import create_engine
|
||||
from sqlalchemy import desc
|
||||
from sqlalchemy.ext.automap import automap_base
|
||||
from sqlalchemy.orm import sessionmaker
|
||||
|
||||
from monasca_transform.db.db_utils import DbUtil
|
||||
from monasca_transform.offset_specs import OffsetSpec
|
||||
from monasca_transform.offset_specs import OffsetSpecs
|
||||
|
||||
Base = automap_base()
|
||||
|
||||
|
||||
class MySQLOffsetSpec(Base, OffsetSpec):
|
||||
__tablename__ = 'kafka_offsets'
|
||||
|
||||
def __str__(self):
|
||||
return "%s,%s,%s,%s,%s,%s,%s,%s" % (str(self.id),
|
||||
str(self.topic),
|
||||
str(self.partition),
|
||||
str(self.until_offset),
|
||||
str(self.from_offset),
|
||||
str(self.batch_time),
|
||||
str(self.last_updated),
|
||||
str(self.revision))
|
||||
|
||||
|
||||
class MySQLOffsetSpecs(OffsetSpecs):
|
||||
|
||||
def __init__(self):
|
||||
|
||||
db = create_engine(DbUtil.get_python_db_connection_string(),
|
||||
isolation_level="READ UNCOMMITTED")
|
||||
|
||||
if cfg.CONF.service.enable_debug_log_entries:
|
||||
db.echo = True
|
||||
|
||||
# reflect the tables
|
||||
Base.prepare(db, reflect=True)
|
||||
|
||||
Session = sessionmaker(bind=db)
|
||||
self.session = Session()
|
||||
|
||||
# keep these many offset versions around
|
||||
self.MAX_REVISIONS = cfg.CONF.repositories.offsets_max_revisions
|
||||
|
||||
def _manage_offset_revisions(self):
|
||||
"""manage offset versions"""
|
||||
distinct_offset_specs = self.session.query(
|
||||
MySQLOffsetSpec).group_by(MySQLOffsetSpec.app_name,
|
||||
MySQLOffsetSpec.topic,
|
||||
MySQLOffsetSpec.partition
|
||||
).all()
|
||||
|
||||
for distinct_offset_spec in distinct_offset_specs:
|
||||
ordered_versions = self.session.query(
|
||||
MySQLOffsetSpec).filter_by(
|
||||
app_name=distinct_offset_spec.app_name,
|
||||
topic=distinct_offset_spec.topic,
|
||||
partition=distinct_offset_spec.partition).order_by(
|
||||
desc(MySQLOffsetSpec.id)).all()
|
||||
|
||||
revision = 1
|
||||
for version_spec in ordered_versions:
|
||||
version_spec.revision = revision
|
||||
revision = revision + 1
|
||||
|
||||
# delete any revisions in excess of the required maximum
|
||||
self.session.query(MySQLOffsetSpec).filter(
|
||||
MySQLOffsetSpec.revision > self.MAX_REVISIONS).delete(
|
||||
synchronize_session="fetch")
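# Illustrative sketch (not part of the original module): the revision pruning
# done by _manage_offset_revisions above, on plain dicts instead of SQLAlchemy
# rows. The newest offsets get revision 1 and anything beyond MAX_REVISIONS
# is dropped. Sample data is made up.
MAX_REVISIONS = 3

offsets = [{"id": i, "app_name": "mon_metrics_kafka", "topic": "metrics",
            "partition": 0, "until_offset": 100 * i} for i in range(1, 6)]

# order newest first (highest id), renumber revisions from 1
ordered = sorted(offsets, key=lambda o: o["id"], reverse=True)
for revision, offset in enumerate(ordered, start=1):
    offset["revision"] = revision

kept = [o for o in ordered if o["revision"] <= MAX_REVISIONS]
print([(o["id"], o["revision"]) for o in kept])  # [(5, 1), (4, 2), (3, 3)]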
|
||||
|
||||
def get_kafka_offsets(self, app_name):
|
||||
return {'%s_%s_%s' % (
|
||||
offset.get_app_name(), offset.get_topic(), offset.get_partition()
|
||||
): offset for offset in self.session.query(MySQLOffsetSpec).filter(
|
||||
MySQLOffsetSpec.app_name == app_name,
|
||||
MySQLOffsetSpec.revision == 1).all()}
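# Illustrative sketch (not part of the original module): the composite key
# format returned by get_kafka_offsets above, and how the driver's
# get_kafka_stream filters it by the "<app_name>_<topic>" prefix. Offset
# values are made up.
saved_offsets = {
    "mon_metrics_kafka_metrics_0": {"partition": 0, "until_offset": 500},
    "mon_metrics_kafka_metrics_1": {"partition": 1, "until_offset": 480},
}

app_name, topic = "mon_metrics_kafka", "metrics"
prefix = "%s_%s" % (app_name, topic)
from_offsets = {spec["partition"]: spec["until_offset"]
                for key, spec in saved_offsets.items() if key.startswith(prefix)}
print(from_offsets)  # {0: 500, 1: 480}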
|
||||
|
||||
def get_kafka_offsets_by_revision(self, app_name, revision):
|
||||
return {'%s_%s_%s' % (
|
||||
offset.get_app_name(), offset.get_topic(), offset.get_partition()
|
||||
): offset for offset in self.session.query(MySQLOffsetSpec).filter(
|
||||
MySQLOffsetSpec.app_name == app_name,
|
||||
MySQLOffsetSpec.revision == revision).all()}
|
||||
|
||||
def get_most_recent_batch_time_from_offsets(self, app_name, topic):
|
||||
try:
|
||||
# get partition 0 as a representative of all others
|
||||
offset = self.session.query(MySQLOffsetSpec).filter(
|
||||
MySQLOffsetSpec.app_name == app_name,
|
||||
MySQLOffsetSpec.topic == topic,
|
||||
MySQLOffsetSpec.partition == 0,
|
||||
MySQLOffsetSpec.revision == 1).one()
|
||||
most_recent_batch_time = datetime.datetime.strptime(
|
||||
offset.get_batch_time(),
|
||||
'%Y-%m-%d %H:%M:%S')
|
||||
except Exception:
|
||||
most_recent_batch_time = None
|
||||
|
||||
return most_recent_batch_time
|
||||
|
||||
def delete_all_kafka_offsets(self, app_name):
|
||||
try:
|
||||
self.session.query(MySQLOffsetSpec).filter(
|
||||
MySQLOffsetSpec.app_name == app_name).delete()
|
||||
self.session.commit()
|
||||
except Exception:
|
||||
# Seems like there isn't much that can be done in this situation
|
||||
pass
|
||||
|
||||
def add_all_offsets(self, app_name, offsets,
|
||||
batch_time_info):
|
||||
"""add offsets. """
|
||||
try:
|
||||
|
||||
# batch time
|
||||
batch_time = \
|
||||
batch_time_info.strftime(
|
||||
'%Y-%m-%d %H:%M:%S')
|
||||
|
||||
# last updated
|
||||
last_updated = \
|
||||
datetime.datetime.now().strftime(
|
||||
'%Y-%m-%d %H:%M:%S')
|
||||
|
||||
NEW_REVISION_NO = -1
|
||||
|
||||
for o in offsets:
|
||||
offset_spec = MySQLOffsetSpec(
|
||||
topic=o.topic,
|
||||
app_name=app_name,
|
||||
partition=o.partition,
|
||||
from_offset=o.fromOffset,
|
||||
until_offset=o.untilOffset,
|
||||
batch_time=batch_time,
|
||||
last_updated=last_updated,
|
||||
revision=NEW_REVISION_NO)
|
||||
self.session.add(offset_spec)
|
||||
|
||||
# manage versions
|
||||
self._manage_offset_revisions()
|
||||
|
||||
self.session.commit()
|
||||
except Exception:
|
||||
self.session.rollback()
|
||||
raise
|
||||
|
||||
def add(self, app_name, topic, partition,
|
||||
from_offset, until_offset, batch_time_info):
|
||||
"""add offset info. """
|
||||
try:
|
||||
# batch time
|
||||
batch_time = \
|
||||
batch_time_info.strftime(
|
||||
'%Y-%m-%d %H:%M:%S')
|
||||
|
||||
# last updated
|
||||
last_updated = \
|
||||
datetime.datetime.now().strftime(
|
||||
'%Y-%m-%d %H:%M:%S')
|
||||
|
||||
NEW_REVISION_NO = -1
|
||||
|
||||
offset_spec = MySQLOffsetSpec(
|
||||
topic=topic,
|
||||
app_name=app_name,
|
||||
partition=partition,
|
||||
from_offset=from_offset,
|
||||
until_offset=until_offset,
|
||||
batch_time=batch_time,
|
||||
last_updated=last_updated,
|
||||
revision=NEW_REVISION_NO)
|
||||
|
||||
self.session.add(offset_spec)
|
||||
|
||||
# manage versions
|
||||
self._manage_offset_revisions()
|
||||
|
||||
self.session.commit()
|
||||
except Exception:
|
||||
self.session.rollback()
|
||||
raise
|
@ -1,101 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
import abc
|
||||
import six
|
||||
|
||||
|
||||
class OffsetSpec(object):
|
||||
|
||||
def __init__(self, app_name=None, topic=None, partition=None,
|
||||
from_offset=None, until_offset=None,
|
||||
batch_time=None, last_updated=None,
|
||||
revision=None):
|
||||
|
||||
self.app_name = app_name
|
||||
self.topic = topic
|
||||
self.partition = partition
|
||||
self.from_offset = from_offset
|
||||
self.until_offset = until_offset
|
||||
self.batch_time = batch_time
|
||||
self.last_updated = last_updated
|
||||
self.revision = revision
|
||||
|
||||
def get_app_name(self):
|
||||
return self.app_name
|
||||
|
||||
def get_topic(self):
|
||||
return self.topic
|
||||
|
||||
def get_partition(self):
|
||||
return self.partition
|
||||
|
||||
def get_from_offset(self):
|
||||
return self.from_offset
|
||||
|
||||
def get_until_offset(self):
|
||||
return self.until_offset
|
||||
|
||||
def get_batch_time(self):
|
||||
return self.batch_time
|
||||
|
||||
def get_last_updated(self):
|
||||
return self.last_updated
|
||||
|
||||
def get_revision(self):
|
||||
return self.revision
|
||||
|
||||
|
||||
@six.add_metaclass(abc.ABCMeta)
|
||||
class OffsetSpecs(object):
|
||||
"""Class representing offset specs to help recover.
|
||||
|
||||
From where processing should pick up in case of failure
|
||||
"""
|
||||
|
||||
@abc.abstractmethod
|
||||
def add(self, app_name, topic, partition,
|
||||
from_offset, until_offset, batch_time_info):
|
||||
raise NotImplementedError(
|
||||
"Class %s doesn't implement add(self, app_name, topic, "
|
||||
"partition, from_offset, until_offset, batch_time,"
|
||||
"last_updated, revision)"
|
||||
% self.__class__.__name__)
|
||||
|
||||
@abc.abstractmethod
|
||||
def add_all_offsets(self, app_name, offsets, batch_time_info):
|
||||
raise NotImplementedError(
|
||||
"Class %s doesn't implement add(self, app_name, topic, "
|
||||
"partition, from_offset, until_offset, batch_time,"
|
||||
"last_updated, revision)"
|
||||
% self.__class__.__name__)
|
||||
|
||||
@abc.abstractmethod
|
||||
def get_kafka_offsets(self, app_name):
|
||||
raise NotImplementedError(
|
||||
"Class %s doesn't implement get_kafka_offsets()"
|
||||
% self.__class__.__name__)
|
||||
|
||||
@abc.abstractmethod
|
||||
def delete_all_kafka_offsets(self, app_name):
|
||||
raise NotImplementedError(
|
||||
"Class %s doesn't implement delete_all_kafka_offsets()"
|
||||
% self.__class__.__name__)
|
||||
|
||||
@abc.abstractmethod
|
||||
def get_most_recent_batch_time_from_offsets(self, app_name, topic):
|
||||
raise NotImplementedError(
|
||||
"Class %s doesn't implement "
|
||||
"get_most_recent_batch_time_from_offsets()"
|
||||
% self.__class__.__name__)
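# Illustrative only: a minimal in-memory OffsetSpecs implementation,
# sketched here to show how the abstract interface above is meant to be
# used. It is not part of monasca-transform (the real implementation is
# the MySQL-backed repository earlier in this diff); the class name and
# key format are hypothetical.
class InMemoryOffsetSpecs(OffsetSpecs):

    def __init__(self):
        self.specs = {}

    def add(self, app_name, topic, partition,
            from_offset, until_offset, batch_time_info):
        key = "_".join((app_name, topic, str(partition)))
        self.specs[key] = OffsetSpec(
            app_name=app_name, topic=topic, partition=partition,
            from_offset=from_offset, until_offset=until_offset,
            batch_time=batch_time_info.strftime('%Y-%m-%d %H:%M:%S'))

    def add_all_offsets(self, app_name, offsets, batch_time_info):
        # offsets are pyspark OffsetRange-like objects
        for o in offsets:
            self.add(app_name, o.topic, o.partition,
                     o.fromOffset, o.untilOffset, batch_time_info)

    def get_kafka_offsets(self, app_name):
        return {key: spec for key, spec in self.specs.items()
                if spec.get_app_name() == app_name}

    def delete_all_kafka_offsets(self, app_name):
        self.specs = {key: spec for key, spec in self.specs.items()
                      if spec.get_app_name() != app_name}

    def get_most_recent_batch_time_from_offsets(self, app_name, topic):
        # a real implementation would parse the stored batch_time string
        # back into a datetime (see the MySQL-backed repository)
        times = [spec.get_batch_time() for spec in self.specs.values()
                 if spec.get_app_name() == app_name and
                 spec.get_topic() == topic]
        return max(times) if times else None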
|
@ -1,39 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
import abc
|
||||
|
||||
|
||||
class Processor(object):
|
||||
"""processor object """
|
||||
|
||||
@abc.abstractmethod
|
||||
def get_app_name(self):
|
||||
"""get name of this application. Will be used to store offsets in database"""
|
||||
raise NotImplementedError(
|
||||
"Class %s doesn't implement get_app_name()"
|
||||
% self.__class__.__name__)
|
||||
|
||||
@abc.abstractmethod
|
||||
def is_time_to_run(self, current_time):
|
||||
"""return True if its time to run this processor"""
|
||||
raise NotImplementedError(
|
||||
"Class %s doesn't implement is_time_to_run()"
|
||||
% self.__class__.__name__)
|
||||
|
||||
@abc.abstractmethod
|
||||
def run_processor(self, time):
|
||||
"""Run application"""
|
||||
raise NotImplementedError(
|
||||
"Class %s doesn't implement run_processor()"
|
||||
% self.__class__.__name__)
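# Illustrative only: a minimal Processor subclass, sketched here to show
# the contract defined by the abstract methods above. It is not part of
# monasca-transform; the NoopProcessor name is hypothetical.
class NoopProcessor(Processor):

    def get_app_name(self):
        return "noop_processor"

    def is_time_to_run(self, current_time):
        # always ready in this sketch; a real processor would consult
        # its last recorded run time
        return True

    def run_processor(self, time):
        # a real processor would aggregate and publish metrics here
        pass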
|
@ -1,617 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
from monasca_common.kafka_lib.client import KafkaClient
|
||||
from monasca_common.kafka_lib.common import OffsetRequest
|
||||
|
||||
from pyspark.sql import SQLContext
|
||||
from pyspark.streaming.kafka import KafkaUtils
|
||||
from pyspark.streaming.kafka import OffsetRange
|
||||
|
||||
import datetime
|
||||
import logging
|
||||
from monasca_common.simport import simport
|
||||
from oslo_config import cfg
|
||||
|
||||
from monasca_transform.component.insert.kafka_insert import KafkaInsert
|
||||
from monasca_transform.component.setter.pre_hourly_calculate_rate import \
|
||||
PreHourlyCalculateRate
|
||||
from monasca_transform.component.setter.rollup_quantity import RollupQuantity
|
||||
from monasca_transform.config.config_initializer import ConfigInitializer
|
||||
from monasca_transform.data_driven_specs.data_driven_specs_repo \
|
||||
import DataDrivenSpecsRepo
|
||||
from monasca_transform.data_driven_specs.data_driven_specs_repo \
|
||||
import DataDrivenSpecsRepoFactory
|
||||
from monasca_transform.log_utils import LogUtils
|
||||
from monasca_transform.processor import Processor
|
||||
from monasca_transform.processor.processor_util import PreHourlyProcessorUtil
|
||||
from monasca_transform.processor.processor_util import ProcessUtilDataProvider
|
||||
from monasca_transform.transform.storage_utils import \
|
||||
InvalidCacheStorageLevelException
|
||||
from monasca_transform.transform.storage_utils import StorageUtils
|
||||
from monasca_transform.transform.transform_utils import InstanceUsageUtils
|
||||
from monasca_transform.transform import TransformContextUtils
|
||||
|
||||
ConfigInitializer.basic_config()
|
||||
log = LogUtils.init_logger(__name__)
|
||||
|
||||
|
||||
class PreHourlyProcessorDataProvider(ProcessUtilDataProvider):
|
||||
|
||||
def get_last_processed(self):
|
||||
offset_specifications = PreHourlyProcessor.get_offset_specs()
|
||||
app_name = PreHourlyProcessor.get_app_name()
|
||||
topic = PreHourlyProcessor.get_kafka_topic()
|
||||
most_recent_batch_time = (
|
||||
offset_specifications.get_most_recent_batch_time_from_offsets(
|
||||
app_name, topic))
|
||||
return most_recent_batch_time
|
||||
|
||||
|
||||
class PreHourlyProcessor(Processor):
|
||||
"""Publish metrics in kafka
|
||||
|
||||
Processor to process usage data published to metrics_pre_hourly topic a
|
||||
and publish final rolled up metrics to metrics topic in kafka.
|
||||
"""
|
||||
|
||||
@staticmethod
|
||||
def save_kafka_offsets(current_offsets,
|
||||
batch_time_info):
|
||||
"""save current offsets to offset specification."""
|
||||
|
||||
offset_specs = simport.load(cfg.CONF.repositories.offsets)()
|
||||
|
||||
app_name = PreHourlyProcessor.get_app_name()
|
||||
|
||||
for o in current_offsets:
|
||||
log.debug(
|
||||
"saving: OffSetRanges: %s %s %s %s, "
|
||||
"batch_time_info: %s" % (
|
||||
o.topic, o.partition, o.fromOffset, o.untilOffset,
|
||||
str(batch_time_info)))
|
||||
# add new offsets, update revision
|
||||
offset_specs.add_all_offsets(app_name,
|
||||
current_offsets,
|
||||
batch_time_info)
|
||||
|
||||
@staticmethod
|
||||
def reset_kafka_offsets():
|
||||
"""delete all offsets from the offset specification."""
|
||||
|
||||
app_name = PreHourlyProcessor.get_app_name()
|
||||
|
||||
# get the offsets from global var
|
||||
offset_specs = simport.load(cfg.CONF.repositories.offsets)()
|
||||
offset_specs.delete_all_kafka_offsets(app_name)
|
||||
|
||||
@staticmethod
|
||||
def get_app_name():
|
||||
"""get name of this application. Will be used to store offsets in database"""
|
||||
return "mon_metrics_kafka_pre_hourly"
|
||||
|
||||
@staticmethod
|
||||
def get_kafka_topic():
|
||||
"""get name of kafka topic for transformation."""
|
||||
return "metrics_pre_hourly"
|
||||
|
||||
@staticmethod
|
||||
def is_time_to_run(check_time):
|
||||
return PreHourlyProcessorUtil.is_time_to_run(check_time)
|
||||
|
||||
@staticmethod
|
||||
def _get_offsets_from_kafka(brokers,
|
||||
topic,
|
||||
offset_time):
|
||||
"""get dict representing kafka offsets."""
|
||||
# get client
|
||||
client = KafkaClient(brokers)
|
||||
|
||||
# get partitions for a topic
|
||||
partitions = client.topic_partitions[topic]
|
||||
|
||||
# https://cwiki.apache.org/confluence/display/KAFKA/
|
||||
# A+Guide+To+The+Kafka+Protocol#
|
||||
# AGuideToTheKafkaProtocol-OffsetRequest
|
||||
MAX_OFFSETS = 1
|
||||
offset_requests = [OffsetRequest(topic,
|
||||
part_name,
|
||||
offset_time,
|
||||
MAX_OFFSETS) for part_name
|
||||
in partitions.keys()]
|
||||
|
||||
offsets_responses = client.send_offset_request(offset_requests)
|
||||
|
||||
offset_dict = {}
|
||||
for response in offsets_responses:
|
||||
key = "_".join((response.topic,
|
||||
str(response.partition)))
|
||||
offset_dict[key] = response
|
||||
|
||||
return offset_dict
|
||||
|
||||
@staticmethod
|
||||
def _parse_saved_offsets(app_name, topic, saved_offset_spec):
|
||||
"""get dict representing saved offsets."""
|
||||
offset_dict = {}
|
||||
for key, value in saved_offset_spec.items():
|
||||
if key.startswith("%s_%s" % (app_name, topic)):
|
||||
spec_app_name = value.get_app_name()
|
||||
spec_topic = value.get_topic()
|
||||
spec_partition = int(value.get_partition())
|
||||
spec_from_offset = value.get_from_offset()
|
||||
spec_until_offset = value.get_until_offset()
|
||||
key = "_".join((spec_topic,
|
||||
str(spec_partition)))
|
||||
offset_dict[key] = (spec_app_name,
|
||||
spec_topic,
|
||||
spec_partition,
|
||||
spec_from_offset,
|
||||
spec_until_offset)
|
||||
return offset_dict
|
||||
|
||||
@staticmethod
|
||||
def _get_new_offset_range_list(brokers, topic):
|
||||
"""get offset range from earliest to latest."""
|
||||
offset_range_list = []
|
||||
|
||||
# https://cwiki.apache.org/confluence/display/KAFKA/
|
||||
# A+Guide+To+The+Kafka+Protocol#
|
||||
# AGuideToTheKafkaProtocol-OffsetRequest
|
||||
GET_LATEST_OFFSETS = -1
|
||||
latest_dict = PreHourlyProcessor._get_offsets_from_kafka(
|
||||
brokers, topic, GET_LATEST_OFFSETS)
|
||||
|
||||
GET_EARLIEST_OFFSETS = -2
|
||||
earliest_dict = PreHourlyProcessor._get_offsets_from_kafka(
|
||||
brokers, topic, GET_EARLIEST_OFFSETS)
|
||||
|
||||
for item in latest_dict:
|
||||
until_offset = latest_dict[item].offsets[0]
|
||||
from_offset = earliest_dict[item].offsets[0]
|
||||
partition = latest_dict[item].partition
|
||||
topic = latest_dict[item].topic
|
||||
offset_range_list.append(OffsetRange(topic,
|
||||
partition,
|
||||
from_offset,
|
||||
until_offset))
|
||||
|
||||
return offset_range_list
|
||||
|
||||
@staticmethod
|
||||
def _get_offset_range_list(brokers,
|
||||
topic,
|
||||
app_name,
|
||||
saved_offset_spec):
|
||||
"""get offset range from saved offset to latest."""
|
||||
offset_range_list = []
|
||||
|
||||
# https://cwiki.apache.org/confluence/display/KAFKA/
|
||||
# A+Guide+To+The+Kafka+Protocol#
|
||||
# AGuideToTheKafkaProtocol-OffsetRequest
|
||||
GET_LATEST_OFFSETS = -1
|
||||
latest_dict = PreHourlyProcessor._get_offsets_from_kafka(
|
||||
brokers, topic, GET_LATEST_OFFSETS)
|
||||
|
||||
GET_EARLIEST_OFFSETS = -2
|
||||
earliest_dict = PreHourlyProcessor._get_offsets_from_kafka(
|
||||
brokers, topic, GET_EARLIEST_OFFSETS)
|
||||
|
||||
saved_dict = PreHourlyProcessor._parse_saved_offsets(
|
||||
app_name, topic, saved_offset_spec)
|
||||
|
||||
for item in latest_dict:
|
||||
# saved spec
|
||||
(spec_app_name,
|
||||
spec_topic_name,
|
||||
spec_partition,
|
||||
spec_from_offset,
|
||||
spec_until_offset) = saved_dict[item]
|
||||
|
||||
# until
|
||||
until_offset = latest_dict[item].offsets[0]
|
||||
|
||||
# from
|
||||
if spec_until_offset is not None and int(spec_until_offset) >= 0:
|
||||
from_offset = spec_until_offset
|
||||
else:
|
||||
from_offset = earliest_dict[item].offsets[0]
|
||||
|
||||
partition = latest_dict[item].partition
|
||||
topic = latest_dict[item].topic
|
||||
offset_range_list.append(OffsetRange(topic,
|
||||
partition,
|
||||
from_offset,
|
||||
until_offset))
|
||||
|
||||
return offset_range_list
|
||||
|
||||
@staticmethod
|
||||
def get_processing_offset_range_list(processing_time):
|
||||
"""Get offset range to fetch data from.
|
||||
|
||||
The range will last from the last saved offsets to current offsets
|
||||
available. If there are no last saved offsets available in the
|
||||
database the starting offsets will be set to the earliest
|
||||
available in kafka.
|
||||
"""
|
||||
|
||||
offset_specifications = PreHourlyProcessor.get_offset_specs()
|
||||
|
||||
# get application name, will be used to get offsets from database
|
||||
app_name = PreHourlyProcessor.get_app_name()
|
||||
|
||||
saved_offset_spec = offset_specifications.get_kafka_offsets(app_name)
|
||||
|
||||
# get kafka topic to fetch data
|
||||
topic = PreHourlyProcessor.get_kafka_topic()
|
||||
|
||||
if len(saved_offset_spec) < 1:
|
||||
|
||||
log.debug(
|
||||
"No saved offsets available..."
|
||||
"connecting to kafka and fetching "
|
||||
"from earliest available offset ...")
|
||||
|
||||
offset_range_list = PreHourlyProcessor._get_new_offset_range_list(
|
||||
cfg.CONF.messaging.brokers,
|
||||
topic)
|
||||
else:
|
||||
log.debug(
|
||||
"Saved offsets available..."
|
||||
"connecting to kafka and fetching from saved offset ...")
|
||||
|
||||
offset_range_list = PreHourlyProcessor._get_offset_range_list(
|
||||
cfg.CONF.messaging.brokers,
|
||||
topic,
|
||||
app_name,
|
||||
saved_offset_spec)
|
||||
return offset_range_list
|
||||
|
||||
@staticmethod
|
||||
def get_offset_specs():
|
||||
"""get offset specifications."""
|
||||
return simport.load(cfg.CONF.repositories.offsets)()
|
||||
|
||||
@staticmethod
|
||||
def get_effective_offset_range_list(offset_range_list):
|
||||
"""Get effective batch offset range.
|
||||
|
||||
Effective batch offset range covers offsets starting
|
||||
from effective batch revision (defined by effective_batch_revision
|
||||
config property). By default this method will set the
|
||||
pyspark Offset.fromOffset for each partition
|
||||
to have value older than the latest revision
|
||||
(defaults to latest -1) so that prehourly processor has access
|
||||
to entire data for the hour. This will also account for and cover
|
||||
any early arriving data (data that arrives before the start hour).
|
||||
"""
|
||||
|
||||
offset_specifications = PreHourlyProcessor.get_offset_specs()
|
||||
|
||||
app_name = PreHourlyProcessor.get_app_name()
|
||||
|
||||
topic = PreHourlyProcessor.get_kafka_topic()
|
||||
|
||||
# start offset revision
|
||||
effective_batch_revision = cfg.CONF.pre_hourly_processor.\
|
||||
effective_batch_revision
|
||||
|
||||
effective_batch_spec = offset_specifications\
|
||||
.get_kafka_offsets_by_revision(app_name,
|
||||
effective_batch_revision)
|
||||
|
||||
# get latest revision, if penultimate is unavailable
|
||||
if not effective_batch_spec:
|
||||
log.debug("effective batch spec: offsets: revision %s unavailable,"
|
||||
" getting the latest revision instead..." % (
|
||||
effective_batch_revision))
|
||||
# not available
|
||||
effective_batch_spec = offset_specifications.get_kafka_offsets(
|
||||
app_name)
|
||||
|
||||
effective_batch_offsets = PreHourlyProcessor._parse_saved_offsets(
|
||||
app_name, topic,
|
||||
effective_batch_spec)
|
||||
|
||||
# for debugging
|
||||
for effective_key in effective_batch_offsets.keys():
|
||||
effective_offset = effective_batch_offsets.get(effective_key,
|
||||
None)
|
||||
(effect_app_name,
|
||||
effect_topic_name,
|
||||
effect_partition,
|
||||
effect_from_offset,
|
||||
effect_until_offset) = effective_offset
|
||||
log.debug(
|
||||
"effective batch offsets (from db):"
|
||||
" OffSetRanges: %s %s %s %s" % (
|
||||
effect_topic_name, effect_partition,
|
||||
effect_from_offset, effect_until_offset))
|
||||
|
||||
# effective batch revision
|
||||
effective_offset_range_list = []
|
||||
for offset_range in offset_range_list:
|
||||
part_topic_key = "_".join((offset_range.topic,
|
||||
str(offset_range.partition)))
|
||||
effective_offset = effective_batch_offsets.get(part_topic_key,
|
||||
None)
|
||||
if effective_offset:
|
||||
(effect_app_name,
|
||||
effect_topic_name,
|
||||
effect_partition,
|
||||
effect_from_offset,
|
||||
effect_until_offset) = effective_offset
|
||||
|
||||
log.debug(
|
||||
"Extending effective offset range:"
|
||||
" OffSetRanges: %s %s %s-->%s %s" % (
|
||||
effect_topic_name, effect_partition,
|
||||
offset_range.fromOffset,
|
||||
effect_from_offset,
|
||||
effect_until_offset))
|
||||
|
||||
effective_offset_range_list.append(
|
||||
OffsetRange(offset_range.topic,
|
||||
offset_range.partition,
|
||||
effect_from_offset,
|
||||
offset_range.untilOffset))
|
||||
else:
|
||||
effective_offset_range_list.append(
|
||||
OffsetRange(offset_range.topic,
|
||||
offset_range.partition,
|
||||
offset_range.fromOffset,
|
||||
offset_range.untilOffset))
|
||||
|
||||
# return effective offset range list
|
||||
return effective_offset_range_list
|
||||
|
||||
@staticmethod
|
||||
def fetch_pre_hourly_data(spark_context,
|
||||
offset_range_list):
|
||||
"""get metrics pre hourly data from offset range list."""
|
||||
|
||||
for o in offset_range_list:
|
||||
log.debug(
|
||||
"fetch_pre_hourly: offset_range_list:"
|
||||
" OffSetRanges: %s %s %s %s" % (
|
||||
o.topic, o.partition, o.fromOffset, o.untilOffset))
|
||||
|
||||
effective_offset_list = PreHourlyProcessor.\
|
||||
get_effective_offset_range_list(offset_range_list)
|
||||
|
||||
for o in effective_offset_list:
|
||||
log.debug(
|
||||
"fetch_pre_hourly: effective_offset_range_list:"
|
||||
" OffSetRanges: %s %s %s %s" % (
|
||||
o.topic, o.partition, o.fromOffset, o.untilOffset))
|
||||
|
||||
# get kafka stream over the same offsets
|
||||
pre_hourly_rdd = KafkaUtils.createRDD(spark_context,
|
||||
{"metadata.broker.list":
|
||||
cfg.CONF.messaging.brokers},
|
||||
effective_offset_list)
|
||||
return pre_hourly_rdd
|
||||
|
||||
@staticmethod
|
||||
def pre_hourly_to_instance_usage_df(pre_hourly_rdd):
|
||||
"""convert raw pre hourly data into instance usage dataframe."""
|
||||
#
|
||||
# extract second column containing instance usage data
|
||||
#
|
||||
instance_usage_rdd = pre_hourly_rdd.map(
|
||||
lambda iud: iud[1])
|
||||
|
||||
#
|
||||
# convert usage data rdd to instance usage df
|
||||
#
|
||||
sqlc = SQLContext.getOrCreate(pre_hourly_rdd.context)
|
||||
instance_usage_df = InstanceUsageUtils.create_df_from_json_rdd(
|
||||
sqlc, instance_usage_rdd)
|
||||
|
||||
if cfg.CONF.pre_hourly_processor.enable_batch_time_filtering:
|
||||
instance_usage_df = (
|
||||
PreHourlyProcessor.filter_out_records_not_in_current_batch(
|
||||
instance_usage_df))
|
||||
|
||||
return instance_usage_df
|
||||
|
||||
@staticmethod
|
||||
def filter_out_records_not_in_current_batch(instance_usage_df):
|
||||
"""Filter out any records which don't pertain to the current batch
|
||||
|
||||
(i.e., records before or after the
|
||||
batch currently being processed).
|
||||
"""
|
||||
# get the most recent batch time from the stored offsets
|
||||
|
||||
offset_specifications = PreHourlyProcessor.get_offset_specs()
|
||||
app_name = PreHourlyProcessor.get_app_name()
|
||||
topic = PreHourlyProcessor.get_kafka_topic()
|
||||
most_recent_batch_time = (
|
||||
offset_specifications.get_most_recent_batch_time_from_offsets(
|
||||
app_name, topic))
|
||||
|
||||
if most_recent_batch_time:
|
||||
# batches can fire after late metrics slack time, not necessarily
|
||||
# at the top of the hour
|
||||
most_recent_batch_time_truncated = most_recent_batch_time.replace(
|
||||
minute=0, second=0, microsecond=0)
|
||||
log.debug("filter out records before : %s" % (
|
||||
most_recent_batch_time_truncated.strftime(
|
||||
'%Y-%m-%dT%H:%M:%S')))
|
||||
# filter out records before current batch
|
||||
instance_usage_df = instance_usage_df.filter(
|
||||
instance_usage_df.lastrecord_timestamp_string >=
|
||||
most_recent_batch_time_truncated)
|
||||
|
||||
# determine the timestamp of the most recent top-of-the-hour (which
|
||||
# is the end of the current batch).
|
||||
current_time = datetime.datetime.now()
|
||||
truncated_timestamp_to_current_hour = current_time.replace(
|
||||
minute=0, second=0, microsecond=0)
|
||||
|
||||
# filter out records after current batch
|
||||
log.debug("filter out records after : %s" % (
|
||||
truncated_timestamp_to_current_hour.strftime(
|
||||
'%Y-%m-%dT%H:%M:%S')))
|
||||
instance_usage_df = instance_usage_df.filter(
|
||||
instance_usage_df.firstrecord_timestamp_string <
|
||||
truncated_timestamp_to_current_hour)
|
||||
|
||||
return instance_usage_df
|
||||
|
||||
@staticmethod
|
||||
def process_instance_usage(transform_context, instance_usage_df):
|
||||
"""Second stage aggregation.
|
||||
|
||||
Aggregate instance usage rdd
|
||||
data and write results to metrics topic in kafka.
|
||||
"""
|
||||
|
||||
transform_spec_df = transform_context.transform_spec_df_info
|
||||
|
||||
#
|
||||
# do a rollup operation
|
||||
#
|
||||
agg_params = (transform_spec_df.select(
|
||||
"aggregation_params_map.pre_hourly_group_by_list")
|
||||
.collect()[0].asDict())
|
||||
pre_hourly_group_by_list = agg_params["pre_hourly_group_by_list"]
|
||||
|
||||
if (len(pre_hourly_group_by_list) == 1 and
|
||||
pre_hourly_group_by_list[0] == "default"):
|
||||
pre_hourly_group_by_list = ["tenant_id", "user_id",
|
||||
"resource_uuid",
|
||||
"geolocation", "region", "zone",
|
||||
"host", "project_id",
|
||||
"aggregated_metric_name",
|
||||
"aggregation_period"]
|
||||
|
||||
# get aggregation period
|
||||
agg_params = transform_spec_df.select(
|
||||
"aggregation_params_map.aggregation_period").collect()[0].asDict()
|
||||
aggregation_period = agg_params["aggregation_period"]
|
||||
|
||||
# get second stage operation
|
||||
agg_params = (transform_spec_df.select(
|
||||
"aggregation_params_map.pre_hourly_operation")
|
||||
.collect()[0].asDict())
|
||||
pre_hourly_operation = agg_params["pre_hourly_operation"]
|
||||
|
||||
if pre_hourly_operation != "rate":
|
||||
instance_usage_df = RollupQuantity.do_rollup(
|
||||
pre_hourly_group_by_list, aggregation_period,
|
||||
pre_hourly_operation, instance_usage_df)
|
||||
else:
|
||||
instance_usage_df = PreHourlyCalculateRate.do_rate_calculation(
|
||||
instance_usage_df)
|
||||
|
||||
# insert metrics
|
||||
instance_usage_df = KafkaInsert.insert(transform_context,
|
||||
instance_usage_df)
|
||||
return instance_usage_df
|
||||
|
||||
@staticmethod
|
||||
def do_transform(instance_usage_df):
|
||||
"""start processing (aggregating) metrics"""
|
||||
#
|
||||
# look in instance_usage_df for list of metrics to be processed
|
||||
#
|
||||
metric_ids_df = instance_usage_df.select(
|
||||
"processing_meta.metric_id").distinct()
|
||||
|
||||
metric_ids_to_process = [row.metric_id
|
||||
for row in metric_ids_df.collect()]
|
||||
|
||||
data_driven_specs_repo = (
|
||||
DataDrivenSpecsRepoFactory.get_data_driven_specs_repo())
|
||||
sqlc = SQLContext.getOrCreate(instance_usage_df.rdd.context)
|
||||
transform_specs_df = data_driven_specs_repo.get_data_driven_specs(
|
||||
sql_context=sqlc,
|
||||
data_driven_spec_type=DataDrivenSpecsRepo.transform_specs_type)
|
||||
|
||||
for metric_id in metric_ids_to_process:
|
||||
transform_spec_df = transform_specs_df.select(
|
||||
["aggregation_params_map", "metric_id"]
|
||||
).where(transform_specs_df.metric_id == metric_id)
|
||||
source_instance_usage_df = instance_usage_df.select("*").where(
|
||||
instance_usage_df.processing_meta.metric_id == metric_id)
|
||||
|
||||
# set transform_spec_df in TransformContext
|
||||
transform_context = TransformContextUtils.get_context(
|
||||
transform_spec_df_info=transform_spec_df)
|
||||
|
||||
agg_inst_usage_df = PreHourlyProcessor.process_instance_usage(
|
||||
transform_context, source_instance_usage_df)
|
||||
|
||||
# if running in debug mode, write out the aggregated metric
|
||||
# name just processed (along with the count of how many of these
|
||||
# were aggregated) to the application log.
|
||||
if log.isEnabledFor(logging.DEBUG):
|
||||
agg_inst_usage_collection = agg_inst_usage_df.collect()
|
||||
collection_len = len(agg_inst_usage_collection)
|
||||
if collection_len > 0:
|
||||
agg_inst_usage_dict = agg_inst_usage_collection[0].asDict()
|
||||
log.debug("Submitted hourly aggregated metric: %s (%s)",
|
||||
agg_inst_usage_dict["aggregated_metric_name"],
|
||||
str(collection_len))
|
||||
|
||||
@staticmethod
|
||||
def run_processor(spark_context, processing_time):
|
||||
"""Process data in metrics_pre_hourly queue
|
||||
|
||||
Starting from the last saved offsets, else start from earliest
|
||||
offsets available
|
||||
"""
|
||||
|
||||
offset_range_list = (
|
||||
PreHourlyProcessor.get_processing_offset_range_list(
|
||||
processing_time))
|
||||
|
||||
# get pre hourly data
|
||||
pre_hourly_rdd = PreHourlyProcessor.fetch_pre_hourly_data(
|
||||
spark_context, offset_range_list)
|
||||
|
||||
# get instance usage df
|
||||
instance_usage_df = PreHourlyProcessor.pre_hourly_to_instance_usage_df(
|
||||
pre_hourly_rdd)
|
||||
|
||||
#
|
||||
# cache instance usage df
|
||||
#
|
||||
if cfg.CONF.pre_hourly_processor.enable_instance_usage_df_cache:
|
||||
storage_level_prop = (
|
||||
cfg.CONF.pre_hourly_processor
|
||||
.instance_usage_df_cache_storage_level)
|
||||
try:
|
||||
storage_level = StorageUtils.get_storage_level(
|
||||
storage_level_prop)
|
||||
except InvalidCacheStorageLevelException as storage_error:
|
||||
storage_error.value += (" (as specified in "
|
||||
"pre_hourly_processor"
|
||||
".instance_usage_df_cache"
|
||||
"_storage_level)")
|
||||
raise
|
||||
instance_usage_df.persist(storage_level)
|
||||
|
||||
# aggregate pre hourly data
|
||||
PreHourlyProcessor.do_transform(instance_usage_df)
|
||||
|
||||
# remove cache
|
||||
if cfg.CONF.pre_hourly_processor.enable_instance_usage_df_cache:
|
||||
instance_usage_df.unpersist()
|
||||
|
||||
# save latest metrics_pre_hourly offsets in the database
|
||||
PreHourlyProcessor.save_kafka_offsets(offset_range_list,
|
||||
processing_time)
|
@ -1,112 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
import abc
|
||||
import datetime
|
||||
from monasca_common.simport import simport
|
||||
from oslo_config import cfg
|
||||
|
||||
from monasca_transform.log_utils import LogUtils
|
||||
|
||||
|
||||
log = LogUtils.init_logger(__name__)
|
||||
|
||||
|
||||
class PreHourlyProcessorUtil(object):
|
||||
|
||||
data_provider = None
|
||||
|
||||
@staticmethod
|
||||
def get_last_processed():
|
||||
return PreHourlyProcessorUtil.get_data_provider().get_last_processed()
|
||||
|
||||
@staticmethod
|
||||
def get_data_provider():
|
||||
if not PreHourlyProcessorUtil.data_provider:
|
||||
PreHourlyProcessorUtil.data_provider = simport.load(
|
||||
cfg.CONF.pre_hourly_processor.data_provider)()
|
||||
return PreHourlyProcessorUtil.data_provider
|
||||
|
||||
@staticmethod
|
||||
def is_time_to_run(check_date_time):
|
||||
"""return True if its time to run this processor.
|
||||
|
||||
It is time to run the processor if:
|
||||
The processor has no previous recorded run time.
|
||||
It is more than the configured 'late_metric_slack_time' (to allow
|
||||
for the arrival of tardy metrics) past the hour and the processor
|
||||
has not yet run for this hour
|
||||
"""
|
||||
|
||||
check_hour = int(datetime.datetime.strftime(check_date_time, '%H'))
|
||||
check_date = check_date_time.replace(minute=0, second=0,
|
||||
microsecond=0, hour=0)
|
||||
slack = datetime.timedelta(
|
||||
seconds=cfg.CONF.pre_hourly_processor.late_metric_slack_time)
|
||||
|
||||
top_of_the_hour_date_time = check_date_time.replace(
|
||||
minute=0, second=0, microsecond=0)
|
||||
earliest_acceptable_run_date_time = top_of_the_hour_date_time + slack
|
||||
last_processed_date_time = PreHourlyProcessorUtil.get_last_processed()
|
||||
if last_processed_date_time:
|
||||
last_processed_hour = int(
|
||||
datetime.datetime.strftime(
|
||||
last_processed_date_time, '%H'))
|
||||
last_processed_date = last_processed_date_time.replace(
|
||||
minute=0, second=0, microsecond=0, hour=0)
|
||||
else:
|
||||
last_processed_date = None
|
||||
last_processed_hour = None
|
||||
|
||||
if (check_hour == last_processed_hour and
|
||||
last_processed_date == check_date):
|
||||
earliest_acceptable_run_date_time = (
|
||||
top_of_the_hour_date_time +
|
||||
datetime.timedelta(hours=1) +
|
||||
slack
|
||||
)
|
||||
log.debug(
|
||||
"Pre-hourly task check: Now date: %s, "
|
||||
"Date last processed: %s, Check time = %s, "
|
||||
"Last processed at %s (hour = %s), "
|
||||
"Earliest acceptable run time %s "
|
||||
"(based on configured pre hourly late metrics slack time of %s "
|
||||
"seconds)" % (
|
||||
check_date,
|
||||
last_processed_date,
|
||||
check_date_time,
|
||||
last_processed_date_time,
|
||||
last_processed_hour,
|
||||
earliest_acceptable_run_date_time,
|
||||
cfg.CONF.pre_hourly_processor.late_metric_slack_time
|
||||
))
|
||||
# run pre hourly processor only once from the
|
||||
# configured time after the top of the hour
|
||||
if (not last_processed_date_time or (
|
||||
((not check_hour == last_processed_hour) or
|
||||
(check_date > last_processed_date)) and
|
||||
check_date_time >= earliest_acceptable_run_date_time)):
|
||||
log.debug("Pre-hourly: Yes, it's time to process")
|
||||
return True
|
||||
log.debug("Pre-hourly: No, it's NOT time to process")
|
||||
return False
|
||||
|
||||
|
||||
class ProcessUtilDataProvider(object):
|
||||
|
||||
@abc.abstractmethod
|
||||
def get_last_processed(self):
|
||||
"""return data on last run of processor"""
|
||||
raise NotImplementedError(
|
||||
"Class %s doesn't implement is_time_to_run()"
|
||||
% self.__class__.__name__)
|
@ -1,297 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
import os
|
||||
import psutil
|
||||
import signal
|
||||
import socket
|
||||
import subprocess
|
||||
import sys
|
||||
import threading
|
||||
import time
|
||||
import traceback
|
||||
|
||||
from oslo_config import cfg
|
||||
from oslo_log import log
|
||||
from oslo_service import loopingcall
|
||||
from oslo_service import service as os_service
|
||||
from tooz import coordination
|
||||
|
||||
from monasca_transform.config.config_initializer import ConfigInitializer
|
||||
from monasca_transform.log_utils import LogUtils
|
||||
|
||||
CONF = cfg.CONF
|
||||
|
||||
SPARK_SUBMIT_PROC_NAME = "spark-submit"
|
||||
|
||||
|
||||
def main():
|
||||
transform_service = TransformService()
|
||||
transform_service.start()
|
||||
|
||||
|
||||
def shutdown_all_threads_and_die():
|
||||
"""Shut down all threads and exit process.
|
||||
|
||||
Hit it with a hammer to kill all threads and die.
|
||||
"""
|
||||
LOG = log.getLogger(__name__)
|
||||
LOG.info('Monasca Transform service stopping...')
|
||||
os._exit(1)
|
||||
|
||||
|
||||
def get_process(proc_name):
|
||||
"""Get process given string in process cmd line."""
|
||||
LOG = log.getLogger(__name__)
|
||||
proc = None
|
||||
try:
|
||||
for pr in psutil.process_iter():
|
||||
for args in pr.cmdline():
|
||||
if proc_name in args.split(" "):
|
||||
proc = pr
|
||||
return proc
|
||||
except BaseException:
|
||||
# pass
|
||||
LOG.error("Error fetching {%s} process..." % proc_name)
|
||||
return None
|
||||
|
||||
|
||||
def stop_spark_submit_process():
|
||||
"""Stop spark submit program."""
|
||||
LOG = log.getLogger(__name__)
|
||||
try:
|
||||
# get the driver proc
|
||||
pr = get_process(SPARK_SUBMIT_PROC_NAME)
|
||||
|
||||
if pr:
|
||||
# terminate (SIGTERM) spark driver proc
|
||||
for cpr in pr.children(recursive=False):
|
||||
LOG.info("Terminate child pid {%s} ..." % str(cpr.pid))
|
||||
cpr.terminate()
|
||||
|
||||
# terminate spark submit proc
|
||||
LOG.info("Terminate pid {%s} ..." % str(pr.pid))
|
||||
pr.terminate()
|
||||
|
||||
except Exception as e:
|
||||
LOG.error("Error killing spark submit "
|
||||
"process: got exception: {%s}" % str(e))
|
||||
|
||||
|
||||
class Transform(os_service.Service):
|
||||
"""Class used with Openstack service."""
|
||||
|
||||
LOG = log.getLogger(__name__)
|
||||
|
||||
def __init__(self, threads=1):
|
||||
super(Transform, self).__init__(threads)
|
||||
|
||||
def signal_handler(self, signal_number, stack_frame):
|
||||
# Catch stop requests and appropriately shut down
|
||||
shutdown_all_threads_and_die()
|
||||
|
||||
def start(self):
|
||||
try:
|
||||
# Register to catch stop requests
|
||||
signal.signal(signal.SIGTERM, self.signal_handler)
|
||||
|
||||
main()
|
||||
|
||||
except BaseException:
|
||||
self.LOG.exception("Monasca Transform service "
|
||||
"encountered fatal error. "
|
||||
"Shutting down all threads and exiting")
|
||||
shutdown_all_threads_and_die()
|
||||
|
||||
def stop(self):
|
||||
stop_spark_submit_process()
|
||||
super(Transform, self).stop()
|
||||
|
||||
|
||||
class TransformService(threading.Thread):
|
||||
|
||||
previously_running = False
|
||||
LOG = log.getLogger(__name__)
|
||||
|
||||
def __init__(self):
|
||||
super(TransformService, self).__init__()
|
||||
|
||||
self.coordinator = None
|
||||
|
||||
self.group = CONF.service.coordinator_group
|
||||
|
||||
# A unique name used for establishing election candidacy
|
||||
self.my_host_name = socket.getfqdn()
|
||||
|
||||
# periodic check
|
||||
leader_check = loopingcall.FixedIntervalLoopingCall(
|
||||
self.periodic_leader_check)
|
||||
leader_check.start(interval=float(
|
||||
CONF.service.election_polling_frequency))
|
||||
|
||||
def check_if_still_leader(self):
|
||||
"""Return true if the this host is the leader"""
|
||||
leader = None
|
||||
try:
|
||||
leader = self.coordinator.get_leader(self.group).get()
|
||||
except BaseException:
|
||||
self.LOG.info('No leader elected yet for group %s' %
|
||||
(self.group))
|
||||
if leader and self.my_host_name == leader:
|
||||
return True
|
||||
# default
|
||||
return False
|
||||
|
||||
def periodic_leader_check(self):
|
||||
self.LOG.debug("Called periodic_leader_check...")
|
||||
try:
|
||||
if self.previously_running:
|
||||
if not self.check_if_still_leader():
|
||||
|
||||
# stop spark submit process
|
||||
stop_spark_submit_process()
|
||||
|
||||
# stand down as a leader
|
||||
try:
|
||||
self.coordinator.stand_down_group_leader(
|
||||
self.group)
|
||||
except BaseException as e:
|
||||
self.LOG.info("Host %s cannot stand down as "
|
||||
"leader for group %s: "
|
||||
"got exception {%s}" %
|
||||
(self.my_host_name, self.group,
|
||||
str(e)))
|
||||
# reset state
|
||||
self.previously_running = False
|
||||
except BaseException as e:
|
||||
self.LOG.info("periodic_leader_check: "
|
||||
"caught unhandled exception: {%s}" % str(e))
|
||||
|
||||
def when_i_am_elected_leader(self, event):
|
||||
"""Callback when this host gets elected leader."""
|
||||
|
||||
# set running state
|
||||
self.previously_running = True
|
||||
|
||||
self.LOG.info("Monasca Transform service running on %s "
|
||||
"has been elected leader" % str(self.my_host_name))
|
||||
|
||||
if CONF.service.spark_python_files:
|
||||
pyfiles = (" --py-files %s"
|
||||
% CONF.service.spark_python_files)
|
||||
else:
|
||||
pyfiles = ''
|
||||
|
||||
event_logging_dest = ''
|
||||
if (CONF.service.spark_event_logging_enabled and
|
||||
CONF.service.spark_event_logging_dest):
|
||||
event_logging_dest = (
|
||||
"--conf spark.eventLog.dir="
|
||||
"file://%s" %
|
||||
CONF.service.spark_event_logging_dest)
|
||||
|
||||
# Build the command to start the Spark driver
|
||||
spark_cmd = "".join((
|
||||
"export SPARK_HOME=",
|
||||
CONF.service.spark_home,
|
||||
" && ",
|
||||
"spark-submit --master ",
|
||||
CONF.service.spark_master_list,
|
||||
" --conf spark.eventLog.enabled=",
|
||||
CONF.service.spark_event_logging_enabled,
|
||||
event_logging_dest,
|
||||
" --jars " + CONF.service.spark_jars_list,
|
||||
pyfiles,
|
||||
" " + CONF.service.spark_driver))
|
||||
|
||||
# Start the Spark driver
|
||||
# (specify shell=True in order to
|
||||
# correctly handle wildcards in the spark_cmd)
|
||||
subprocess.call(spark_cmd, shell=True)
|
||||
|
||||
def run(self):
|
||||
|
||||
self.LOG.info('The host of this Monasca Transform service is ' +
|
||||
self.my_host_name)
|
||||
|
||||
# Loop until the service is stopped
|
||||
while True:
|
||||
|
||||
try:
|
||||
|
||||
self.previously_running = False
|
||||
|
||||
# Start an election coordinator
|
||||
self.coordinator = coordination.get_coordinator(
|
||||
CONF.service.coordinator_address, self.my_host_name)
|
||||
|
||||
self.coordinator.start()
|
||||
|
||||
# Create a coordination/election group
|
||||
try:
|
||||
request = self.coordinator.create_group(self.group)
|
||||
request.get()
|
||||
except coordination.GroupAlreadyExist:
|
||||
self.LOG.info('Group %s already exists' % self.group)
|
||||
|
||||
# Join the coordination/election group
|
||||
try:
|
||||
request = self.coordinator.join_group(self.group)
|
||||
request.get()
|
||||
except coordination.MemberAlreadyExist:
|
||||
self.LOG.info('Host already joined to group %s as %s' %
|
||||
(self.group, self.my_host_name))
|
||||
|
||||
# Announce the candidacy and wait to be elected
|
||||
self.coordinator.watch_elected_as_leader(
|
||||
self.group,
|
||||
self.when_i_am_elected_leader)
|
||||
|
||||
while self.previously_running is False:
|
||||
self.LOG.debug('Monasca Transform service on %s is '
|
||||
'checking election results...'
|
||||
% self.my_host_name)
|
||||
self.coordinator.heartbeat()
|
||||
self.coordinator.run_watchers()
|
||||
if self.previously_running is True:
|
||||
try:
|
||||
# Leave/exit the coordination/election group
|
||||
request = self.coordinator.leave_group(self.group)
|
||||
request.get()
|
||||
except coordination.MemberNotJoined:
|
||||
self.LOG.info("Host has not yet "
|
||||
"joined group %s as %s" %
|
||||
(self.group, self.my_host_name))
|
||||
time.sleep(float(CONF.service.election_polling_frequency))
|
||||
|
||||
self.coordinator.stop()
|
||||
|
||||
except BaseException as e:
|
||||
# catch any unhandled exception and continue
|
||||
self.LOG.info("Ran into unhandled exception: {%s}" % str(e))
|
||||
self.LOG.info("Going to restart coordinator again...")
|
||||
traceback.print_exc()
|
||||
|
||||
|
||||
def main_service():
|
||||
"""Method to use with Openstack service."""
|
||||
ConfigInitializer.basic_config()
|
||||
LogUtils.init_logger(__name__)
|
||||
launcher = os_service.ServiceLauncher(cfg.CONF, restart_method='mutate')
|
||||
launcher.launch_service(Transform())
|
||||
launcher.wait()
|
||||
|
||||
# Used if run without the OpenStack service.
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
@ -1,91 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from collections import namedtuple
|
||||
|
||||
|
||||
TransformContextBase = namedtuple("TransformContext",
|
||||
["config_info",
|
||||
"offset_info",
|
||||
"transform_spec_df_info",
|
||||
"batch_time_info"])
|
||||
|
||||
|
||||
class TransformContext(TransformContextBase):
|
||||
"""A tuple which contains all the configuration information to drive processing
|
||||
|
||||
namedtuple contains:
|
||||
|
||||
config_info - configuration information from oslo config
|
||||
offset_info - current kafka offset information
|
||||
transform_spec_df - processing information from
|
||||
transform_spec aggregation driver table
|
||||
batch_datetime_info - current batch processing datetime
|
||||
"""
|
||||
|
||||
RddTransformContextBase = namedtuple("RddTransformContext",
|
||||
["rdd_info",
|
||||
"transform_context_info"])
|
||||
|
||||
|
||||
class RddTransformContext(RddTransformContextBase):
|
||||
"""A tuple which is a wrapper containing the RDD and transform_context
|
||||
|
||||
namedtuple contains:
|
||||
|
||||
rdd_info - rdd
|
||||
transform_context_info - transform context
|
||||
"""
|
||||
|
||||
|
||||
class TransformContextUtils(object):
|
||||
"""utility method to get TransformContext"""
|
||||
|
||||
@staticmethod
|
||||
def get_context(transform_context_info=None,
|
||||
config_info=None,
|
||||
offset_info=None,
|
||||
transform_spec_df_info=None,
|
||||
batch_time_info=None):
|
||||
|
||||
if transform_context_info is None:
|
||||
return TransformContext(config_info,
|
||||
offset_info,
|
||||
transform_spec_df_info,
|
||||
batch_time_info)
|
||||
else:
|
||||
if config_info is None or config_info == "":
|
||||
# get from passed in transform_context
|
||||
config_info = transform_context_info.config_info
|
||||
|
||||
if offset_info is None or offset_info == "":
|
||||
# get from passed in transform_context
|
||||
offset_info = transform_context_info.offset_info
|
||||
|
||||
if transform_spec_df_info is None or \
|
||||
transform_spec_df_info == "":
|
||||
# get from passed in transform_context
|
||||
transform_spec_df_info = \
|
||||
transform_context_info.transform_spec_df_info
|
||||
|
||||
if batch_time_info is None or \
|
||||
batch_time_info == "":
|
||||
# get from passed in transform_context
|
||||
batch_time_info = \
|
||||
transform_context_info.batch_time_info
|
||||
|
||||
return TransformContext(config_info,
|
||||
offset_info,
|
||||
transform_spec_df_info,
|
||||
batch_time_info)
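# Illustrative usage (hypothetical variables): build a context for one
# metric's transform spec and the current batch time, e.g.
#   transform_context = TransformContextUtils.get_context(
#       transform_spec_df_info=transform_spec_df,
#       batch_time_info=batch_time)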
|
@ -1,131 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
from monasca_transform.log_utils import LogUtils
|
||||
|
||||
from stevedore import extension
|
||||
|
||||
|
||||
class GenericTransformBuilder(object):
|
||||
"""Build transformation pipeline
|
||||
|
||||
Based on aggregation_pipeline spec in metric processing
|
||||
configuration
|
||||
"""
|
||||
|
||||
_MONASCA_TRANSFORM_USAGE_NAMESPACE = 'monasca_transform.usage'
|
||||
_MONASCA_TRANSFORM_SETTER_NAMESPACE = 'monasca_transform.setter'
|
||||
_MONASCA_TRANSFORM_INSERT_NAMESPACE = 'monasca_transform.insert'
|
||||
|
||||
@staticmethod
|
||||
def log_load_extension_error(manager, entry_point, error):
|
||||
LogUtils.log_debug("GenericTransformBuilder: "
|
||||
"log load extension error: manager: {%s},"
|
||||
"entry_point: {%s}, error: {%s}"
|
||||
% (str(manager),
|
||||
str(entry_point),
|
||||
str(error)))
|
||||
|
||||
@staticmethod
|
||||
def _get_usage_component_manager():
|
||||
"""stevedore extension manager for usage components."""
|
||||
return extension.ExtensionManager(
|
||||
namespace=GenericTransformBuilder
|
||||
._MONASCA_TRANSFORM_USAGE_NAMESPACE,
|
||||
on_load_failure_callback=GenericTransformBuilder.
|
||||
log_load_extension_error,
|
||||
invoke_on_load=False)
|
||||
|
||||
@staticmethod
|
||||
def _get_setter_component_manager():
|
||||
"""stevedore extension manager for setter components."""
|
||||
return extension.ExtensionManager(
|
||||
namespace=GenericTransformBuilder.
|
||||
_MONASCA_TRANSFORM_SETTER_NAMESPACE,
|
||||
on_load_failure_callback=GenericTransformBuilder.
|
||||
log_load_extension_error,
|
||||
invoke_on_load=False)
|
||||
|
||||
@staticmethod
|
||||
def _get_insert_component_manager():
|
||||
"""stevedore extension manager for insert components."""
|
||||
return extension.ExtensionManager(
|
||||
namespace=GenericTransformBuilder.
|
||||
_MONASCA_TRANSFORM_INSERT_NAMESPACE,
|
||||
on_load_failure_callback=GenericTransformBuilder.
|
||||
log_load_extension_error,
|
||||
invoke_on_load=False)
|
||||
|
||||
@staticmethod
|
||||
def _parse_transform_pipeline(transform_spec_df):
|
||||
"""Parse aggregation pipeline from metric processing configuration"""
|
||||
|
||||
# get aggregation pipeline df
|
||||
aggregation_pipeline_df = transform_spec_df\
|
||||
.select("aggregation_params_map.aggregation_pipeline")
|
||||
|
||||
# call components
|
||||
source_row = aggregation_pipeline_df\
|
||||
.select("aggregation_pipeline.source").collect()[0]
|
||||
source = source_row.source
|
||||
|
||||
usage_row = aggregation_pipeline_df\
|
||||
.select("aggregation_pipeline.usage").collect()[0]
|
||||
usage = usage_row.usage
|
||||
|
||||
setter_row_list = aggregation_pipeline_df\
|
||||
.select("aggregation_pipeline.setters").collect()
|
||||
setter_list = [setter_row.setters for setter_row in setter_row_list]
|
||||
|
||||
insert_row_list = aggregation_pipeline_df\
|
||||
.select("aggregation_pipeline.insert").collect()
|
||||
insert_list = [insert_row.insert for insert_row in insert_row_list]
|
||||
return (source, usage, setter_list[0], insert_list[0])
|
||||
|
||||
@staticmethod
|
||||
def do_transform(transform_context,
|
||||
record_store_df):
|
||||
"""Method to return instance usage dataframe
|
||||
|
||||
Build a dynamic aggregation pipeline
|
||||
and call components to process record store dataframe
|
||||
"""
|
||||
transform_spec_df = transform_context.transform_spec_df_info
|
||||
(source,
|
||||
usage,
|
||||
setter_list,
|
||||
insert_list) = GenericTransformBuilder.\
|
||||
_parse_transform_pipeline(transform_spec_df)
|
||||
|
||||
# FIXME: source is a placeholder for non-streaming source
|
||||
# in the future?
|
||||
|
||||
usage_component = GenericTransformBuilder.\
|
||||
_get_usage_component_manager()[usage].plugin
|
||||
|
||||
instance_usage_df = usage_component.usage(transform_context,
|
||||
record_store_df)
|
||||
|
||||
for setter in setter_list:
|
||||
setter_component = GenericTransformBuilder.\
|
||||
_get_setter_component_manager()[setter].plugin
|
||||
instance_usage_df = setter_component.setter(transform_context,
|
||||
instance_usage_df)
|
||||
|
||||
for insert in insert_list:
|
||||
insert_component = GenericTransformBuilder.\
|
||||
_get_insert_component_manager()[insert].plugin
|
||||
instance_usage_df = insert_component.insert(transform_context,
|
||||
instance_usage_df)
|
||||
|
||||
return instance_usage_df
|
@ -1,67 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from collections import namedtuple
|
||||
|
||||
RecordStoreWithGroupByBase = namedtuple("RecordStoreWithGroupBy",
|
||||
["record_store_data",
|
||||
"group_by_columns_list"])
|
||||
|
||||
|
||||
class RecordStoreWithGroupBy(RecordStoreWithGroupByBase):
|
||||
"""A tuple which is a wrapper containing record store data and the group by columns
|
||||
|
||||
namedtuple contains:
|
||||
|
||||
record_store_data - record store data
|
||||
group_by_columns_list - group by columns list
|
||||
"""
|
||||
|
||||
GroupingResultsBase = namedtuple("GroupingResults",
|
||||
["grouping_key",
|
||||
"results",
|
||||
"grouping_key_dict"])
|
||||
|
||||
|
||||
class GroupingResults(GroupingResultsBase):
|
||||
"""A tuple which is a wrapper containing grouping key and grouped result set
|
||||
|
||||
namedtuple contains:
|
||||
|
||||
grouping_key - group by key
|
||||
results - grouped results
|
||||
grouping_key_dict - group by key as dictionary
|
||||
"""
|
||||
|
||||
|
||||
class Grouping(object):
|
||||
"""Base class for all grouping classes."""
|
||||
|
||||
@staticmethod
|
||||
def _parse_grouping_key(grouping_str):
|
||||
"""parse grouping key
|
||||
|
||||
which in "^key1=value1^key2=value2..." format
|
||||
into a dictionary of key value pairs
|
||||
"""
|
||||
group_by_dict = {}
|
||||
#
|
||||
# convert key=value^key1=value1 string into a dict
|
||||
#
|
||||
for key_val_pair in grouping_str.split("^"):
|
||||
if "=" in key_val_pair:
|
||||
key_val = key_val_pair.split("=")
|
||||
group_by_dict[key_val[0]] = key_val[1]
|
||||
|
||||
return group_by_dict
|
@ -1,176 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from monasca_transform.transform.grouping import Grouping
|
||||
from monasca_transform.transform.grouping import GroupingResults
|
||||
from monasca_transform.transform.grouping import RecordStoreWithGroupBy
|
||||
|
||||
|
||||
class GroupSortbyTimestamp(Grouping):
|
||||
|
||||
@staticmethod
|
||||
def log_debug(logStr):
|
||||
print(logStr)
|
||||
# LOG.debug(logStr)
|
||||
|
||||
@staticmethod
|
||||
def _prepare_for_group_by(record_store_with_group_by_rdd):
|
||||
"""creates a new rdd where:
|
||||
|
||||
the first element of each row
|
||||
contains array of grouping key and event timestamp fields.
|
||||
Grouping key and event timestamp fields are used by
|
||||
partitioning and sorting function to partition the data
|
||||
by grouping key and then sort the elements in a group by the
|
||||
timestamp
|
||||
"""
|
||||
|
||||
# get the record store data and group by columns
|
||||
record_store_data = record_store_with_group_by_rdd.record_store_data
|
||||
|
||||
group_by_columns_list = \
|
||||
record_store_with_group_by_rdd.group_by_columns_list
|
||||
|
||||
# construct a group by key
|
||||
# key1=value1^key2=value2^...
|
||||
group_by_key_value = ""
|
||||
for gcol in group_by_columns_list:
|
||||
|
||||
if gcol.startswith('dimensions.'):
|
||||
gcol = "dimensions['%s']" % (gcol.split('.')[-1])
|
||||
elif gcol.startswith('meta.'):
|
||||
gcol = "meta['%s']" % (gcol.split('.')[-1])
|
||||
elif gcol.startswith('value_meta.'):
|
||||
gcol = "value_meta['%s']" % (gcol.split('.')[-1])
|
||||
|
||||
gcolval = eval(".".join(("record_store_data",
|
||||
gcol)))
|
||||
group_by_key_value = \
|
||||
"^".join((group_by_key_value,
|
||||
"=".join((gcol, gcolval))))
|
||||
|
||||
# return a key-value rdd
|
||||
return [group_by_key_value, record_store_data]
|
||||
|
||||
@staticmethod
|
||||
def _sort_by_timestamp(result_iterable):
|
||||
# LOG.debug(whoami(result_iterable.data[0]))
|
||||
|
||||
# sort list might cause OOM, if the group has lots of items
|
||||
# use group_sort_by_timestamp_partitions module instead if you run
|
||||
# into OOM
|
||||
sorted_list = sorted(result_iterable.data,
|
||||
key=lambda row: row.event_timestamp_string)
|
||||
return sorted_list
|
||||
|
||||
@staticmethod
|
||||
def _group_sort_by_timestamp(record_store_df, group_by_columns_list):
|
||||
# convert the dataframe rdd to normal rdd and add the group by column
|
||||
# list
|
||||
record_store_with_group_by_rdd = record_store_df.rdd.\
|
||||
map(lambda x: RecordStoreWithGroupBy(x, group_by_columns_list))
|
||||
|
||||
# convert rdd into key-value rdd
|
||||
record_store_with_group_by_rdd_key_val = \
|
||||
record_store_with_group_by_rdd.\
|
||||
map(GroupSortbyTimestamp._prepare_for_group_by)
|
||||
|
||||
first_step = record_store_with_group_by_rdd_key_val.groupByKey()
|
||||
record_store_rdd_grouped_sorted = first_step.mapValues(
|
||||
GroupSortbyTimestamp._sort_by_timestamp)
|
||||
|
||||
return record_store_rdd_grouped_sorted
|
||||
|
||||
@staticmethod
|
||||
def _get_group_first_last_quantity_udf(grouplistiter):
|
||||
"""Return stats that include:
|
||||
|
||||
first row key, first_event_timestamp,
|
||||
first event quantity, last_event_timestamp and last event quantity
|
||||
"""
|
||||
first_row = None
|
||||
last_row = None
|
||||
|
||||
# extract key and value list
|
||||
group_key = grouplistiter[0]
|
||||
grouped_values = grouplistiter[1]
|
||||
|
||||
count = 0.0
|
||||
for row in grouped_values:
|
||||
|
||||
# set the first row
|
||||
if first_row is None:
|
||||
first_row = row
|
||||
|
||||
# set the last row
|
||||
last_row = row
|
||||
count = count + 1
|
||||
|
||||
first_event_timestamp_unix = None
|
||||
first_event_timestamp_string = None
|
||||
first_event_quantity = None
|
||||
|
||||
if first_row is not None:
|
||||
first_event_timestamp_unix = first_row.event_timestamp_unix
|
||||
first_event_timestamp_string = first_row.event_timestamp_string
|
||||
first_event_quantity = first_row.event_quantity
|
||||
|
||||
last_event_timestamp_unix = None
|
||||
last_event_timestamp_string = None
|
||||
last_event_quantity = None
|
||||
|
||||
if last_row is not None:
|
||||
last_event_timestamp_unix = last_row.event_timestamp_unix
|
||||
last_event_timestamp_string = last_row.event_timestamp_string
|
||||
last_event_quantity = last_row.event_quantity
|
||||
|
||||
results_dict = {"firstrecord_timestamp_unix":
|
||||
first_event_timestamp_unix,
|
||||
"firstrecord_timestamp_string":
|
||||
first_event_timestamp_string,
|
||||
"firstrecord_quantity": first_event_quantity,
|
||||
"lastrecord_timestamp_unix":
|
||||
last_event_timestamp_unix,
|
||||
"lastrecord_timestamp_string":
|
||||
last_event_timestamp_string,
|
||||
"lastrecord_quantity": last_event_quantity,
|
||||
"record_count": count}
|
||||
|
||||
group_key_dict = Grouping._parse_grouping_key(group_key)
|
||||
|
||||
return GroupingResults(group_key, results_dict, group_key_dict)
|
||||
|
||||
@staticmethod
|
||||
def fetch_group_latest_oldest_quantity(record_store_df,
|
||||
transform_spec_df,
|
||||
group_by_columns_list):
|
||||
"""Function to group record store data
|
||||
|
||||
Sort by timestamp within group
|
||||
and get first and last timestamp along with quantity within each group
|
||||
|
||||
This function uses the key-value pair RDD's groupByKey function to do the group by
|
||||
"""
|
||||
# group and order elements in group
|
||||
record_store_grouped_data_rdd = \
|
||||
GroupSortbyTimestamp._group_sort_by_timestamp(
|
||||
record_store_df, group_by_columns_list)
|
||||
|
||||
# find stats for a group
|
||||
record_store_grouped_rows = \
|
||||
record_store_grouped_data_rdd.\
|
||||
map(GroupSortbyTimestamp.
|
||||
_get_group_first_last_quantity_udf)
|
||||
|
||||
return record_store_grouped_rows
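# --- Illustrative sketch (not part of the original module) ---
# The group-then-sort pattern used above, reduced to a standalone PySpark
# snippet. The host names, timestamps and quantities are made-up values and
# the snippet assumes a local PySpark installation; only the key-by /
# groupByKey / mapValues(sorted) shape mirrors the code being removed.
from pyspark import SparkContext

sc = SparkContext("local[1]", "group-sort-sketch")

rows = [("host1", 1000, 5.0), ("host1", 900, 3.0), ("host2", 950, 7.0)]
rdd = sc.parallelize(rows)

# key each row by host, group, then sort every group's rows by timestamp
grouped = (rdd.map(lambda r: (r[0], r))
              .groupByKey()
              .mapValues(lambda group: sorted(group, key=lambda r: r[1])))

print(grouped.collect())
sc.stop()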
|
@ -1,227 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from monasca_transform.transform.grouping import Grouping
|
||||
from monasca_transform.transform.grouping import GroupingResults
|
||||
from monasca_transform.transform.grouping import RecordStoreWithGroupBy
|
||||
|
||||
|
||||
class GroupSortbyTimestampPartition(Grouping):
|
||||
|
||||
@staticmethod
|
||||
def log_debug(logStr):
|
||||
print(logStr)
|
||||
# LOG.debug(logStr)
|
||||
|
||||
@staticmethod
|
||||
def _get_group_first_last_quantity_udf(partition_list_iter):
|
||||
"""User defined function to go through a list of partitions.
|
||||
|
||||
Each partition contains elements for a group. All the elements are sorted by
|
||||
timestamp.
|
||||
|
||||
The stats include first row key, first_event_timestamp,
|
||||
first event quantity, last_event_timestamp and last event quantity
|
||||
"""
|
||||
first_row = None
|
||||
last_row = None
|
||||
|
||||
count = 0.0
|
||||
for row in partition_list_iter:
|
||||
|
||||
# set the first row
|
||||
if first_row is None:
|
||||
first_row = row
|
||||
|
||||
# set the last row
|
||||
last_row = row
|
||||
count = count + 1
|
||||
|
||||
first_event_timestamp_unix = None
|
||||
first_event_timestamp_string = None
|
||||
first_event_quantity = None
|
||||
first_row_key = None
|
||||
if first_row is not None:
|
||||
first_event_timestamp_unix = first_row[1].event_timestamp_unix
|
||||
first_event_timestamp_string = first_row[1].event_timestamp_string
|
||||
first_event_quantity = first_row[1].event_quantity
|
||||
|
||||
# extract the grouping_key from composite grouping_key
|
||||
# composite grouping key is a list, where first item is the
|
||||
# grouping key and second item is the event_timestamp_string
|
||||
first_row_key = first_row[0][0]
|
||||
|
||||
last_event_timestamp_unix = None
|
||||
last_event_timestamp_string = None
|
||||
last_event_quantity = None
|
||||
if last_row is not None:
|
||||
last_event_timestamp_unix = last_row[1].event_timestamp_unix
|
||||
last_event_timestamp_string = last_row[1].event_timestamp_string
|
||||
last_event_quantity = last_row[1].event_quantity
|
||||
|
||||
results_dict = {"firstrecord_timestamp_unix":
|
||||
first_event_timestamp_unix,
|
||||
"firstrecord_timestamp_string":
|
||||
first_event_timestamp_string,
|
||||
"firstrecord_quantity": first_event_quantity,
|
||||
"lastrecord_timestamp_unix":
|
||||
last_event_timestamp_unix,
|
||||
"lastrecord_timestamp_string":
|
||||
last_event_timestamp_string,
|
||||
"lastrecord_quantity": last_event_quantity,
|
||||
"record_count": count}
|
||||
|
||||
first_row_key_dict = Grouping._parse_grouping_key(first_row_key)
|
||||
|
||||
yield [GroupingResults(first_row_key, results_dict,
|
||||
first_row_key_dict)]
|
||||
|
||||
@staticmethod
|
||||
def _prepare_for_group_by(record_store_with_group_by_rdd):
|
||||
"""Creates a new rdd where:
|
||||
|
||||
The first element of each row contains an array of the grouping
key and the event timestamp fields.

The grouping key and event timestamp fields are used by the
partitioning and sorting functions to partition the data by
grouping key and then sort the elements in a group by timestamp.
|
||||
"""
|
||||
|
||||
# get the record store data and group by columns
|
||||
record_store_data = record_store_with_group_by_rdd.record_store_data
|
||||
|
||||
group_by_columns_list = \
|
||||
record_store_with_group_by_rdd.group_by_columns_list
|
||||
|
||||
# construct a group by key
|
||||
# key1=value1^key2=value2^...
|
||||
group_by_key_value = ""
|
||||
for gcol in group_by_columns_list:
|
||||
group_by_key_value = \
|
||||
"^".join((group_by_key_value,
|
||||
"=".join((gcol, eval(".".join(("record_store_data",
|
||||
gcol)))))))
|
||||
|
||||
# return a key-value rdd
|
||||
# key is a composite key which consists of grouping key and
|
||||
# event_timestamp_string
|
||||
return [[group_by_key_value,
|
||||
record_store_data.event_timestamp_string], record_store_data]
|
||||
|
||||
@staticmethod
|
||||
def _get_partition_by_group(group_composite):
|
||||
"""Get a hash of the grouping key,
|
||||
|
||||
which is then used by partitioning
|
||||
function to get partition where the groups data should end up in.
|
||||
It uses hash % num_partitions to get partition
|
||||
"""
|
||||
# FIXME: find out if the hash function in python gives the same value on
|
||||
# different machines
|
||||
# Look at using portable_hash method in spark rdd
|
||||
grouping_key = group_composite[0]
|
||||
grouping_key_hash = hash(grouping_key)
|
||||
# log_debug("group_by_sort_by_timestamp_partition: got hash : %s" \
|
||||
# % str(returnhash))
|
||||
return grouping_key_hash
|
||||
|
||||
@staticmethod
|
||||
def _sort_by_timestamp(group_composite):
|
||||
"""get timestamp which will be used to sort grouped data"""
|
||||
event_timestamp_string = group_composite[1]
|
||||
return event_timestamp_string
|
||||
|
||||
@staticmethod
|
||||
def _group_sort_by_timestamp_partition(record_store_df,
|
||||
group_by_columns_list,
|
||||
num_of_groups):
|
||||
"""It does a group by and then sorts all the items within the group by event timestamp."""
|
||||
# convert the dataframe rdd to normal rdd and add the group by
|
||||
# column list
|
||||
record_store_with_group_by_rdd = record_store_df.rdd.\
|
||||
map(lambda x: RecordStoreWithGroupBy(x, group_by_columns_list))
|
||||
|
||||
# prepare the data for repartitionAndSortWithinPartitions function
|
||||
record_store_rdd_prepared = \
|
||||
record_store_with_group_by_rdd.\
|
||||
map(GroupSortbyTimestampPartition._prepare_for_group_by)
|
||||
|
||||
# repartition data based on a grouping key and sort the items within
|
||||
# group by timestamp
|
||||
# give high number of partitions
|
||||
# numPartitions > number of groups expected, so that each group gets
|
||||
# allocated a separate partition
|
||||
record_store_rdd_partitioned_sorted = \
|
||||
record_store_rdd_prepared.\
|
||||
repartitionAndSortWithinPartitions(
|
||||
numPartitions=num_of_groups,
|
||||
partitionFunc=GroupSortbyTimestampPartition.
|
||||
_get_partition_by_group,
|
||||
keyfunc=GroupSortbyTimestampPartition.
|
||||
_sort_by_timestamp)
|
||||
|
||||
return record_store_rdd_partitioned_sorted
|
||||
|
||||
@staticmethod
|
||||
def _remove_none_filter(row):
|
||||
"""remove any rows which have None as grouping key
|
||||
|
||||
[GroupingResults(grouping_key="key1", results={})] rows get created
|
||||
when partition does not get any grouped data assigned to it
|
||||
"""
|
||||
if len(row[0].results) > 0 and row[0].grouping_key is not None:
|
||||
return row
|
||||
|
||||
@staticmethod
|
||||
def fetch_group_first_last_quantity(record_store_df,
|
||||
transform_spec_df,
|
||||
group_by_columns_list,
|
||||
num_of_groups):
|
||||
"""Function to group record store data
|
||||
|
||||
Sort by timestamp within group
|
||||
and get first and last timestamp along with quantity within each group
|
||||
To do the group by it uses a custom partitioning function which creates a new
partition for each group and uses the RDD's repartitionAndSortWithinPartitions
function to do the grouping and sorting within the group.

This is more scalable than just using the RDD's group_by: the group is not
materialized into a list and stored in memory, but instead the RDD's built-in
partitioning capability is used to do the sort. num_of_groups should be larger
than the number of expected groups, otherwise the same partition can get used
for two groups, which will cause incorrect results.
|
||||
"""
|
||||
|
||||
# group and order elements in group using repartition
|
||||
record_store_grouped_data_rdd = \
|
||||
GroupSortbyTimestampPartition.\
|
||||
_group_sort_by_timestamp_partition(record_store_df,
|
||||
group_by_columns_list,
|
||||
num_of_groups)
|
||||
|
||||
# do some operations on all elements in the group
|
||||
grouping_results_tuple_with_none = \
|
||||
record_store_grouped_data_rdd.\
|
||||
mapPartitions(GroupSortbyTimestampPartition.
|
||||
_get_group_first_last_quantity_udf)
|
||||
|
||||
# filter all rows which have no data (where grouping key is None) and
|
||||
# convert results into grouping results tuple
|
||||
grouping_results_tuple1 = grouping_results_tuple_with_none.\
|
||||
filter(GroupSortbyTimestampPartition._remove_none_filter)
|
||||
|
||||
grouping_results_tuple = grouping_results_tuple1.map(lambda x: x[0])
|
||||
|
||||
return grouping_results_tuple
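# --- Illustrative sketch (not part of the original module) ---
# The repartitionAndSortWithinPartitions pattern used above, reduced to a
# standalone PySpark snippet. Keys are (grouping_key, timestamp_string)
# pairs; the sample rows and num_partitions are made-up values and the
# snippet assumes a local PySpark installation.
from pyspark import SparkContext

sc = SparkContext("local[1]", "repartition-sort-sketch")

num_partitions = 4  # should exceed the number of expected groups

rows = [(("host1", "t2"), 2.0), (("host2", "t1"), 7.0), (("host1", "t1"), 1.0)]
rdd = sc.parallelize(rows)

partitioned = rdd.repartitionAndSortWithinPartitions(
    numPartitions=num_partitions,
    # rows with the same grouping key land in the same partition
    partitionFunc=lambda key: hash(key[0]),
    # rows within a partition are sorted by the timestamp part of the key
    keyfunc=lambda key: key[1])

print(partitioned.glom().collect())
sc.stop()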
|
@ -1,62 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from pyspark import StorageLevel
|
||||
|
||||
|
||||
class InvalidCacheStorageLevelException(Exception):
|
||||
"""Exception thrown when an invalid cache storage level is encountered
|
||||
|
||||
Attributes:
|
||||
value: string representing the error
|
||||
"""
|
||||
|
||||
def __init__(self, value):
|
||||
self.value = value
|
||||
|
||||
def __str__(self):
|
||||
return repr(self.value)
|
||||
|
||||
|
||||
class StorageUtils(object):
|
||||
"""storage util functions"""
|
||||
|
||||
@staticmethod
|
||||
def get_storage_level(storage_level_str):
|
||||
"""get pyspark storage level from storage level string"""
|
||||
if (storage_level_str == "DISK_ONLY"):
|
||||
return StorageLevel.DISK_ONLY
|
||||
elif (storage_level_str == "DISK_ONLY_2"):
|
||||
return StorageLevel.DISK_ONLY_2
|
||||
elif (storage_level_str == "MEMORY_AND_DISK"):
|
||||
return StorageLevel.MEMORY_AND_DISK
|
||||
elif (storage_level_str == "MEMORY_AND_DISK_2"):
|
||||
return StorageLevel.MEMORY_AND_DISK_2
|
||||
elif (storage_level_str == "MEMORY_AND_DISK_SER"):
|
||||
return StorageLevel.MEMORY_AND_DISK_SER
|
||||
elif (storage_level_str == "MEMORY_AND_DISK_SER_2"):
|
||||
return StorageLevel.MEMORY_AND_DISK_SER_2
|
||||
elif (storage_level_str == "MEMORY_ONLY"):
|
||||
return StorageLevel.MEMORY_ONLY
|
||||
elif (storage_level_str == "MEMORY_ONLY_2"):
|
||||
return StorageLevel.MEMORY_ONLY_2
|
||||
elif (storage_level_str == "MEMORY_ONLY_SER"):
|
||||
return StorageLevel.MEMORY_ONLY_SER
|
||||
elif (storage_level_str == "MEMORY_ONLY_SER_2"):
|
||||
return StorageLevel.MEMORY_ONLY_SER_2
|
||||
elif (storage_level_str == "OFF_HEAP"):
|
||||
return StorageLevel.OFF_HEAP
|
||||
else:
|
||||
raise InvalidCacheStorageLevelException(
|
||||
"Unrecognized cache storage level: %s" % storage_level_str)
|
@ -1,533 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from pyspark.sql import SQLContext
|
||||
from pyspark.sql.types import ArrayType
|
||||
from pyspark.sql.types import DoubleType
|
||||
from pyspark.sql.types import MapType
|
||||
from pyspark.sql.types import StringType
|
||||
from pyspark.sql.types import StructField
|
||||
from pyspark.sql.types import StructType
|
||||
|
||||
from monasca_transform.component import Component
|
||||
|
||||
|
||||
class TransformUtils(object):
|
||||
"""utility methods for different kinds of data."""
|
||||
|
||||
@staticmethod
|
||||
def _rdd_to_df(rdd, schema):
|
||||
"""convert rdd to dataframe using schema."""
|
||||
spark_context = rdd.context
|
||||
sql_context = SQLContext.getOrCreate(spark_context)
|
||||
if schema is None:
|
||||
df = sql_context.createDataFrame(rdd)
|
||||
else:
|
||||
df = sql_context.createDataFrame(rdd, schema)
|
||||
return df
|
||||
|
||||
|
||||
class InstanceUsageUtils(TransformUtils):
|
||||
"""utility methods to transform instance usage data."""
|
||||
@staticmethod
|
||||
def _get_instance_usage_schema():
|
||||
"""get instance usage schema."""
|
||||
|
||||
# Initialize columns for all string fields
|
||||
columns = ["tenant_id", "user_id", "resource_uuid",
|
||||
"geolocation", "region", "zone", "host", "project_id",
|
||||
"aggregated_metric_name", "firstrecord_timestamp_string",
|
||||
"lastrecord_timestamp_string",
|
||||
"usage_date", "usage_hour", "usage_minute",
|
||||
"aggregation_period"]
|
||||
|
||||
columns_struct_fields = [StructField(field_name, StringType(), True)
|
||||
for field_name in columns]
|
||||
|
||||
# Add columns for non-string fields
|
||||
columns_struct_fields.append(StructField("firstrecord_timestamp_unix",
|
||||
DoubleType(), True))
|
||||
columns_struct_fields.append(StructField("lastrecord_timestamp_unix",
|
||||
DoubleType(), True))
|
||||
columns_struct_fields.append(StructField("quantity",
|
||||
DoubleType(), True))
|
||||
columns_struct_fields.append(StructField("record_count",
|
||||
DoubleType(), True))
|
||||
|
||||
columns_struct_fields.append(StructField("processing_meta",
|
||||
MapType(StringType(),
|
||||
StringType(),
|
||||
True),
|
||||
True))
|
||||
|
||||
columns_struct_fields.append(StructField("extra_data_map",
|
||||
MapType(StringType(),
|
||||
StringType(),
|
||||
True),
|
||||
True))
|
||||
schema = StructType(columns_struct_fields)
|
||||
|
||||
return schema
|
||||
|
||||
@staticmethod
|
||||
def create_df_from_json_rdd(sql_context, jsonrdd):
|
||||
"""create instance usage df from json rdd."""
|
||||
schema = InstanceUsageUtils._get_instance_usage_schema()
|
||||
instance_usage_schema_df = sql_context.read.json(jsonrdd, schema)
|
||||
return instance_usage_schema_df
|
||||
|
||||
@staticmethod
|
||||
def prepare_instance_usage_group_by_list(group_by_list):
|
||||
"""Prepare group by list.
|
||||
|
||||
If the group by list contains any instances of "dimensions#", "meta#" or "value_meta#" then
|
||||
prefix the column name with "extra_data_map." since those columns are available in
the extra_data_map column.
|
||||
|
||||
"""
|
||||
return [InstanceUsageUtils.prepare_group_by_item(item) for item in group_by_list]
|
||||
|
||||
@staticmethod
|
||||
def prepare_group_by_item(item):
|
||||
"""Prepare group by list item.
|
||||
|
||||
Converts any special "dimensions#", "meta#" or "value_meta#" occurrences into
Spark SQL syntax that retrieves the data from the extra_data_map column.
|
||||
"""
|
||||
if (item.startswith("dimensions#") or
|
||||
item.startswith("meta#") or
|
||||
item.startswith("value_meta#")):
|
||||
return ".".join(("extra_data_map", item))
|
||||
else:
|
||||
return item
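# Illustrative check of the mapping above, with hypothetical inputs; assumes
# the package is still installed so the class can be imported.
from monasca_transform.transform.transform_utils import InstanceUsageUtils

assert InstanceUsageUtils.prepare_group_by_item("dimensions#hostname") == \
    "extra_data_map.dimensions#hostname"
assert InstanceUsageUtils.prepare_group_by_item("tenant_id") == "tenant_id"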
|
||||
|
||||
@staticmethod
|
||||
def prepare_extra_data_map(extra_data_map):
|
||||
"""Prepare extra data map.
|
||||
|
||||
Replace any occurrences of "dimensions.", "meta." or "value_meta."
with "dimensions#", "meta#" or "value_meta#" in extra_data_map.
|
||||
|
||||
"""
|
||||
prepared_extra_data_map = {}
|
||||
for column_name in list(extra_data_map):
|
||||
column_value = extra_data_map[column_name]
|
||||
if column_name.startswith("dimensions."):
|
||||
column_name = column_name.replace("dimensions.", "dimensions#")
|
||||
elif column_name.startswith("meta."):
|
||||
column_name = column_name.replace("meta.", "meta#")
|
||||
elif column_name.startswith("value_meta."):
|
||||
column_name = column_name.replace("value_meta.", "value_meta#")
|
||||
elif column_name.startswith("extra_data_map."):
|
||||
column_name = column_name.replace("extra_data_map.", "")
|
||||
prepared_extra_data_map[column_name] = column_value
|
||||
return prepared_extra_data_map
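# Illustrative check of the key rewriting above, with a hypothetical input map.
from monasca_transform.transform.transform_utils import InstanceUsageUtils

assert InstanceUsageUtils.prepare_extra_data_map(
    {"dimensions.hostname": "host1", "meta.tenantId": "t1"}) == \
    {"dimensions#hostname": "host1", "meta#tenantId": "t1"}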
|
||||
|
||||
@staticmethod
|
||||
def grouped_data_to_map(row, group_by_columns_list):
|
||||
"""Iterate through group by column values from grouped data set and extract any values.
|
||||
|
||||
Return a dictionary which contains the original group by column name and value pairs, if they
|
||||
are available from the grouped data set.
|
||||
|
||||
"""
|
||||
extra_data_map = getattr(row, "extra_data_map", {})
|
||||
# add group by fields data to extra data map
|
||||
for column_name in group_by_columns_list:
|
||||
column_value = getattr(row, column_name, Component.
|
||||
DEFAULT_UNAVAILABLE_VALUE)
|
||||
if (column_value == Component.DEFAULT_UNAVAILABLE_VALUE and
|
||||
(column_name.startswith("dimensions.") or
|
||||
column_name.startswith("meta.") or
|
||||
column_name.startswith("value_meta.") or
|
||||
column_name.startswith("extra_data_map."))):
|
||||
split_column_name = column_name.split(".", 1)[-1]
|
||||
column_value = getattr(row, split_column_name, Component.
|
||||
DEFAULT_UNAVAILABLE_VALUE)
|
||||
extra_data_map[column_name] = column_value
|
||||
return extra_data_map
|
||||
|
||||
@staticmethod
|
||||
def extract_dimensions(instance_usage_dict, dimension_list):
|
||||
"""Extract dimensions from instance usage.
|
||||
|
||||
"""
|
||||
dimensions_part = {}
|
||||
# extra_data_map
|
||||
extra_data_map = instance_usage_dict.get("extra_data_map", {})
|
||||
|
||||
for dim in dimension_list:
|
||||
value = instance_usage_dict.get(dim)
|
||||
if value is None:
|
||||
# lookup for value in extra_data_map
|
||||
if len(list(extra_data_map)) > 0:
|
||||
value = extra_data_map.get(dim, "all")
|
||||
if dim.startswith("dimensions#"):
|
||||
dim = dim.replace("dimensions#", "")
|
||||
elif dim.startswith("meta#"):
|
||||
dim = dim.replace("meta#", "")
|
||||
elif dim.startswith("value_meta#"):
|
||||
dim = dim.replace("value_meta#", "")
|
||||
dimensions_part[dim] = value
|
||||
|
||||
return dimensions_part
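# Illustrative check of the dimension extraction above, with hypothetical data.
from monasca_transform.transform.transform_utils import InstanceUsageUtils

usage = {"host": "host1",
         "extra_data_map": {"dimensions#hostname": "host1"}}
assert InstanceUsageUtils.extract_dimensions(
    usage, ["host", "dimensions#hostname"]) == \
    {"host": "host1", "hostname": "host1"}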
|
||||
|
||||
|
||||
class RecordStoreUtils(TransformUtils):
|
||||
"""utility methods to transform record store data."""
|
||||
@staticmethod
|
||||
def _get_record_store_df_schema():
|
||||
"""get instance usage schema."""
|
||||
|
||||
columns = ["event_timestamp_string",
|
||||
"event_type", "event_quantity_name",
|
||||
"event_status", "event_version",
|
||||
"record_type", "resource_uuid", "tenant_id",
|
||||
"user_id", "region", "zone",
|
||||
"host", "project_id",
|
||||
"event_date", "event_hour", "event_minute",
|
||||
"event_second", "metric_group", "metric_id"]
|
||||
|
||||
columns_struct_fields = [StructField(field_name, StringType(), True)
|
||||
for field_name in columns]
|
||||
|
||||
# Add columns for non-string fields
|
||||
columns_struct_fields.insert(0,
|
||||
StructField("event_timestamp_unix",
|
||||
DoubleType(), True))
|
||||
columns_struct_fields.insert(0,
|
||||
StructField("event_quantity",
|
||||
DoubleType(), True))
|
||||
|
||||
# map to metric meta
|
||||
columns_struct_fields.append(StructField("meta",
|
||||
MapType(StringType(),
|
||||
StringType(),
|
||||
True),
|
||||
True))
|
||||
# map to dimensions
|
||||
columns_struct_fields.append(StructField("dimensions",
|
||||
MapType(StringType(),
|
||||
StringType(),
|
||||
True),
|
||||
True))
|
||||
# map to value_meta
|
||||
columns_struct_fields.append(StructField("value_meta",
|
||||
MapType(StringType(),
|
||||
StringType(),
|
||||
True),
|
||||
True))
|
||||
|
||||
schema = StructType(columns_struct_fields)
|
||||
|
||||
return schema
|
||||
|
||||
@staticmethod
|
||||
def recordstore_rdd_to_df(record_store_rdd):
|
||||
"""convert record store rdd to a dataframe."""
|
||||
schema = RecordStoreUtils._get_record_store_df_schema()
|
||||
return TransformUtils._rdd_to_df(record_store_rdd, schema)
|
||||
|
||||
@staticmethod
|
||||
def create_df_from_json(sql_context, jsonpath):
|
||||
"""create a record store df from json file."""
|
||||
schema = RecordStoreUtils._get_record_store_df_schema()
|
||||
record_store_df = sql_context.read.json(jsonpath, schema)
|
||||
return record_store_df
|
||||
|
||||
@staticmethod
|
||||
def prepare_recordstore_group_by_list(group_by_list):
|
||||
"""Prepare record store group by list.
|
||||
|
||||
If the group by list contains any instances of "dimensions#", "meta#" or "value_meta#" then
|
||||
convert into proper dotted notation, since original raw "dimensions", "meta" and
|
||||
"value_meta" are available in record_store data.
|
||||
|
||||
"""
|
||||
return [RecordStoreUtils.prepare_group_by_item(item) for item in group_by_list]
|
||||
|
||||
@staticmethod
|
||||
def prepare_group_by_item(item):
|
||||
"""Prepare record store item for group by.
|
||||
|
||||
Converts any special "dimensions#", "meta#" or "value_meta#" occurrences into
"dimensions.", "meta." and "value_meta." respectively.
|
||||
|
||||
"""
|
||||
if item.startswith("dimensions#"):
|
||||
item = item.replace("dimensions#", "dimensions.")
|
||||
elif item.startswith("meta#"):
|
||||
item = item.replace("meta#", "meta.")
|
||||
elif item.startswith("value_meta#"):
|
||||
item = item.replace("value_meta#", "value_meta.")
|
||||
return item
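# Illustrative check of the mapping above, with hypothetical inputs.
from monasca_transform.transform.transform_utils import RecordStoreUtils

assert RecordStoreUtils.prepare_group_by_item("dimensions#hostname") == \
    "dimensions.hostname"
assert RecordStoreUtils.prepare_group_by_item("meta#tenantId") == "meta.tenantId"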
|
||||
|
||||
|
||||
class TransformSpecsUtils(TransformUtils):
|
||||
"""utility methods to transform_specs."""
|
||||
|
||||
@staticmethod
|
||||
def _get_transform_specs_df_schema():
|
||||
"""get transform_specs df schema."""
|
||||
|
||||
# FIXME: change when transform_specs df is finalized
|
||||
source = StructField("source", StringType(), True)
|
||||
usage = StructField("usage", StringType(), True)
|
||||
setters = StructField("setters", ArrayType(StringType(),
|
||||
containsNull=False), True)
|
||||
insert = StructField("insert", ArrayType(StringType(),
|
||||
containsNull=False), True)
|
||||
|
||||
aggregation_params_map = \
|
||||
StructField("aggregation_params_map",
|
||||
StructType([StructField("aggregation_period",
|
||||
StringType(), True),
|
||||
StructField("dimension_list",
|
||||
ArrayType(StringType(),
|
||||
containsNull=False),
|
||||
True),
|
||||
StructField("aggregation_group_by_list",
|
||||
ArrayType(StringType(),
|
||||
containsNull=False),
|
||||
True),
|
||||
StructField("usage_fetch_operation",
|
||||
StringType(),
|
||||
True),
|
||||
StructField("filter_by_list",
|
||||
ArrayType(MapType(StringType(),
|
||||
StringType(),
|
||||
True)
|
||||
)
|
||||
),
|
||||
StructField(
|
||||
"usage_fetch_util_quantity_event_type",
|
||||
StringType(),
|
||||
True),
|
||||
|
||||
StructField(
|
||||
"usage_fetch_util_idle_perc_event_type",
|
||||
StringType(),
|
||||
True),
|
||||
|
||||
StructField("setter_rollup_group_by_list",
|
||||
ArrayType(StringType(),
|
||||
containsNull=False),
|
||||
True),
|
||||
StructField("setter_rollup_operation",
|
||||
StringType(), True),
|
||||
StructField("aggregated_metric_name",
|
||||
StringType(), True),
|
||||
StructField("pre_hourly_group_by_list",
|
||||
ArrayType(StringType(),
|
||||
containsNull=False),
|
||||
True),
|
||||
StructField("pre_hourly_operation",
|
||||
StringType(), True),
|
||||
StructField("aggregation_pipeline",
|
||||
StructType([source, usage,
|
||||
setters, insert]),
|
||||
True)
|
||||
]), True)
|
||||
metric_id = StructField("metric_id", StringType(), True)
|
||||
|
||||
schema = StructType([aggregation_params_map, metric_id])
|
||||
|
||||
return schema
|
||||
|
||||
@staticmethod
|
||||
def transform_specs_rdd_to_df(transform_specs_rdd):
|
||||
"""convert transform_specs rdd to a dataframe."""
|
||||
schema = TransformSpecsUtils._get_transform_specs_df_schema()
|
||||
return TransformUtils._rdd_to_df(transform_specs_rdd, schema)
|
||||
|
||||
@staticmethod
|
||||
def create_df_from_json(sql_context, jsonpath):
|
||||
"""create a metric processing df from json file."""
|
||||
schema = TransformSpecsUtils._get_transform_specs_df_schema()
|
||||
transform_specs_df = sql_context.read.json(jsonpath, schema)
|
||||
return transform_specs_df
|
||||
|
||||
|
||||
class MonMetricUtils(TransformUtils):
|
||||
"""utility methods to transform raw metric."""
|
||||
|
||||
@staticmethod
|
||||
def _get_mon_metric_json_schema():
|
||||
"""get the schema of the incoming monasca metric."""
|
||||
|
||||
metric_struct_field = StructField(
|
||||
"metric",
|
||||
StructType([StructField("dimensions",
|
||||
MapType(StringType(),
|
||||
StringType(),
|
||||
True),
|
||||
True),
|
||||
StructField("value_meta",
|
||||
MapType(StringType(),
|
||||
StringType(),
|
||||
True),
|
||||
True),
|
||||
StructField("name", StringType(), True),
|
||||
StructField("timestamp", StringType(), True),
|
||||
StructField("value", StringType(), True)]), True)
|
||||
|
||||
meta_struct_field = StructField("meta",
|
||||
MapType(StringType(),
|
||||
StringType(),
|
||||
True),
|
||||
True)
|
||||
|
||||
creation_time_struct_field = StructField("creation_time",
|
||||
StringType(), True)
|
||||
|
||||
schema = StructType([creation_time_struct_field,
|
||||
meta_struct_field, metric_struct_field])
|
||||
return schema
|
||||
|
||||
@staticmethod
|
||||
def create_mon_metrics_df_from_json_rdd(sql_context, jsonrdd):
|
||||
"""create mon metrics df from json rdd."""
|
||||
schema = MonMetricUtils._get_mon_metric_json_schema()
|
||||
mon_metrics_df = sql_context.read.json(jsonrdd, schema)
|
||||
return mon_metrics_df
|
||||
|
||||
|
||||
class PreTransformSpecsUtils(TransformUtils):
|
||||
"""utility methods to transform pre_transform_specs"""
|
||||
|
||||
@staticmethod
|
||||
def _get_pre_transform_specs_df_schema():
|
||||
"""get pre_transform_specs df schema."""
|
||||
|
||||
# FIXME: change when pre_transform_specs df is finalized
|
||||
|
||||
event_type = StructField("event_type", StringType(), True)
|
||||
|
||||
metric_id_list = StructField("metric_id_list",
|
||||
ArrayType(StringType(),
|
||||
containsNull=False),
|
||||
True)
|
||||
required_raw_fields_list = StructField("required_raw_fields_list",
|
||||
ArrayType(StringType(),
|
||||
containsNull=False),
|
||||
True)
|
||||
|
||||
event_processing_params = \
|
||||
StructField("event_processing_params",
|
||||
StructType([StructField("set_default_zone_to",
|
||||
StringType(), True),
|
||||
StructField("set_default_geolocation_to",
|
||||
StringType(), True),
|
||||
StructField("set_default_region_to",
|
||||
StringType(), True),
|
||||
]), True)
|
||||
|
||||
schema = StructType([event_processing_params, event_type,
|
||||
metric_id_list, required_raw_fields_list])
|
||||
|
||||
return schema
|
||||
|
||||
@staticmethod
|
||||
def pre_transform_specs_rdd_to_df(pre_transform_specs_rdd):
|
||||
"""convert pre_transform_specs processing rdd to a dataframe."""
|
||||
schema = PreTransformSpecsUtils._get_pre_transform_specs_df_schema()
|
||||
return TransformUtils._rdd_to_df(pre_transform_specs_rdd, schema)
|
||||
|
||||
@staticmethod
|
||||
def create_df_from_json(sql_context, jsonpath):
|
||||
"""create a pre_transform_specs df from json file."""
|
||||
schema = PreTransformSpecsUtils._get_pre_transform_specs_df_schema()
|
||||
pre_transform_specs_df = sql_context.read.json(jsonpath, schema)
|
||||
return pre_transform_specs_df
|
||||
|
||||
@staticmethod
|
||||
def prepare_required_raw_fields_list(group_by_list):
|
||||
"""Prepare required fields list.
|
||||
|
||||
If the group by list contains any instances of "dimensions#field", "meta#field" or
|
||||
"value_meta#field" then convert them into metric.dimensions["field"] syntax.
|
||||
|
||||
"""
|
||||
return [PreTransformSpecsUtils.prepare_required_raw_item(item) for item in group_by_list]
|
||||
|
||||
@staticmethod
|
||||
def prepare_required_raw_item(item):
|
||||
"""Prepare required field item.
|
||||
|
||||
Converts any special "dimensions#", "meta#" or "value_meta#" occurrences into
Spark RDD syntax that fetches the field value.
|
||||
|
||||
"""
|
||||
if item.startswith("dimensions#"):
|
||||
field_name = item.replace("dimensions#", "")
|
||||
return "metric.dimensions['%s']" % field_name
|
||||
elif item.startswith("meta#"):
|
||||
field_name = item.replace("meta#", "")
|
||||
return "meta['%s']" % field_name
|
||||
elif item.startswith("value_meta#"):
|
||||
field_name = item.replace("value_meta#", "")
|
||||
return "metric.value_meta['%s']" % field_name
|
||||
else:
|
||||
return item
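# Illustrative check of the field syntax conversion above, with hypothetical
# inputs.
from monasca_transform.transform.transform_utils import PreTransformSpecsUtils

assert PreTransformSpecsUtils.prepare_required_raw_item("dimensions#hostname") == \
    "metric.dimensions['hostname']"
assert PreTransformSpecsUtils.prepare_required_raw_item("meta#tenantId") == \
    "meta['tenantId']"
assert PreTransformSpecsUtils.prepare_required_raw_item("value_meta#port") == \
    "metric.value_meta['port']"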
|
||||
|
||||
|
||||
class GroupingResultsUtils(TransformUtils):
|
||||
"""utility methods to transform record store data."""
|
||||
@staticmethod
|
||||
def _get_grouping_results_df_schema(group_by_column_list):
|
||||
"""get grouping results schema."""
|
||||
|
||||
group_by_field_list = [StructField(field_name, StringType(), True)
|
||||
for field_name in group_by_column_list]
|
||||
|
||||
# Initialize columns for string fields
|
||||
columns = ["firstrecord_timestamp_string",
|
||||
"lastrecord_timestamp_string"]
|
||||
|
||||
columns_struct_fields = [StructField(field_name, StringType(), True)
|
||||
for field_name in columns]
|
||||
|
||||
# Add columns for non-string fields
|
||||
columns_struct_fields.append(StructField("firstrecord_timestamp_unix",
|
||||
DoubleType(), True))
|
||||
columns_struct_fields.append(StructField("lastrecord_timestamp_unix",
|
||||
DoubleType(), True))
|
||||
columns_struct_fields.append(StructField("firstrecord_quantity",
|
||||
DoubleType(), True))
|
||||
columns_struct_fields.append(StructField("lastrecord_quantity",
|
||||
DoubleType(), True))
|
||||
columns_struct_fields.append(StructField("record_count",
|
||||
DoubleType(), True))
|
||||
|
||||
instance_usage_schema_part = StructType(columns_struct_fields)
|
||||
|
||||
grouping_results = \
|
||||
StructType([StructField("grouping_key",
|
||||
StringType(), True),
|
||||
StructField("results",
|
||||
instance_usage_schema_part,
|
||||
True),
|
||||
StructField("grouping_key_dict",
|
||||
StructType(group_by_field_list))])
|
||||
|
||||
# schema = \
|
||||
# StructType([StructField("GroupingResults", grouping_results)])
|
||||
return grouping_results
|
||||
|
||||
@staticmethod
|
||||
def grouping_results_rdd_to_df(grouping_results_rdd, group_by_list):
|
||||
"""convert record store rdd to a dataframe."""
|
||||
schema = GroupingResultsUtils._get_grouping_results_df_schema(
|
||||
group_by_list)
|
||||
return TransformUtils._rdd_to_df(grouping_results_rdd, schema)
|
@ -1,6 +0,0 @@
|
||||
---
|
||||
upgrade:
|
||||
- |
|
||||
Python 2.7 support has been dropped. Last release of monasca-transform
|
||||
to support python 2.7 is OpenStack Train. The minimum version of Python now
|
||||
supported by monasca-transform is Python 3.6.
|
@ -1,14 +0,0 @@
|
||||
# The order of packages is significant, because pip processes them in the order
|
||||
# of appearance. Changing the order has an impact on the overall integration
|
||||
# process, which may cause wedges in the gate later.
|
||||
pbr!=2.1.0,>=2.0.0 # Apache-2.0
|
||||
psutil>=3.2.2 # BSD
|
||||
PyMySQL>=0.7.6 # MIT License
|
||||
six>=1.10.0 # MIT
|
||||
SQLAlchemy!=1.1.5,!=1.1.6,!=1.1.7,!=1.1.8,>=1.0.10 # MIT
|
||||
stevedore>=1.20.0 # Apache-2.0
|
||||
monasca-common>=2.7.0 # Apache-2.0
|
||||
oslo.config>=5.2.0 # Apache-2.0
|
||||
oslo.log>=3.36.0 # Apache-2.0
|
||||
oslo.service!=1.28.1,>=1.24.0 # Apache-2.0
|
||||
tooz>=1.58.0 # Apache-2.0
|
@ -1,21 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from zipfile import PyZipFile
|
||||
|
||||
|
||||
with PyZipFile("monasca-transform.zip", "w") as spark_submit_zipfile:
|
||||
spark_submit_zipfile.writepy(
|
||||
"../monasca_transform"
|
||||
)
|
@ -1,15 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
|
||||
SCRIPT_HOME=$(dirname $(readlink -f $BASH_SOURCE))
|
||||
pushd $SCRIPT_HOME
|
||||
|
||||
echo "create_zip.py: creating a zip file at ../monasca_transform/monasca-transform.zip..."
|
||||
python create_zip.py
|
||||
rc=$?
|
||||
if [[ $rc == 0 ]]; then
|
||||
echo "created zip file at ../monasca_transfom/monasca-transform.zip sucessfully"
|
||||
else
|
||||
echo "error creating zip file at ../monasca_transform/monasca-transform.zip, bailing out"
|
||||
exit 1
|
||||
fi
|
||||
popd
|
@ -1,110 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
"""Generator for ddl
|
||||
-t type of output to generate - either 'pre_transform_spec' or 'transform_spec'
|
||||
-o output path
|
||||
-i path to template file
|
||||
"""
|
||||
|
||||
import getopt
|
||||
import json
|
||||
import os.path
|
||||
import sys
|
||||
|
||||
|
||||
class Generator(object):
|
||||
|
||||
key_name = None
|
||||
|
||||
def generate(self, template_path, source_json_path, output_path):
|
||||
print("Generating content at %s with template at %s, using key %s" % (
|
||||
output_path, template_path, self.key_name))
|
||||
data = []
|
||||
with open(source_json_path) as f:
|
||||
for line in f:
|
||||
json_line = json.loads(line)
|
||||
data_line = '(\'%s\',\n\'%s\')' % (
|
||||
json_line[self.key_name], json.dumps(json_line))
|
||||
data.append(str(data_line))
|
||||
print(data)
|
||||
with open(template_path) as f:
|
||||
template = f.read()
|
||||
with open(output_path, 'w') as write_file:
|
||||
write_file.write(template)
|
||||
for record in data:
|
||||
write_file.write(record)
|
||||
write_file.write(',\n')
|
||||
write_file.seek(-2, 1)
|
||||
write_file.truncate()
|
||||
write_file.write(';')
|
||||
|
||||
|
||||
class TransformSpecsGenerator(Generator):
|
||||
|
||||
key_name = 'metric_id'
|
||||
|
||||
|
||||
class PreTransformSpecsGenerator(Generator):
|
||||
|
||||
key_name = 'event_type'
|
||||
|
||||
|
||||
def main():
|
||||
# parse command line options
|
||||
try:
|
||||
opts, args = getopt.getopt(sys.argv[1:], "ht:o:i:s:")
|
||||
print('Opts = %s' % opts)
|
||||
print('Args = %s' % args)
|
||||
except getopt.error as msg:
|
||||
print(msg)
|
||||
print("for help use --help")
|
||||
sys.exit(2)
|
||||
script_type = None
|
||||
template_path = None
|
||||
source_json_path = None
|
||||
output_path = None
|
||||
# process options
|
||||
for o, a in opts:
|
||||
if o in ("-h", "--help"):
|
||||
print(__doc__)
|
||||
sys.exit(0)
|
||||
elif o == "-t":
|
||||
script_type = a
|
||||
if a not in ('pre_transform_spec', 'transform_spec'):
|
||||
print('Incorrect output type specified: \'%s\'.\n %s' % (
|
||||
a, __doc__))
|
||||
sys.exit(1)
|
||||
elif o == "-i":
|
||||
template_path = a
|
||||
if not os.path.isfile(a):
|
||||
print('Cannot find template file at %s' % a)
|
||||
sys.exit(1)
|
||||
elif o == "-o":
|
||||
output_path = a
|
||||
elif o == "-s":
|
||||
source_json_path = a
|
||||
|
||||
print("Called with type = %s, template_path = %s, source_json_path %s"
|
||||
" and output_path = %s" % (
|
||||
script_type, template_path, source_json_path, output_path))
|
||||
generator = None
|
||||
if script_type == 'pre_transform_spec':
|
||||
generator = PreTransformSpecsGenerator()
|
||||
elif script_type == 'transform_spec':
|
||||
generator = TransformSpecsGenerator()
|
||||
generator.generate(template_path, source_json_path, output_path)
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
@ -1,6 +0,0 @@
|
||||
DELETE FROM `monasca_transform`.`pre_transform_specs`;
|
||||
|
||||
INSERT IGNORE INTO `monasca_transform`.`pre_transform_specs`
|
||||
(`event_type`,
|
||||
`pre_transform_spec`)
|
||||
VALUES
|
@ -1,6 +0,0 @@
|
||||
DELETE FROM `monasca_transform`.`transform_specs`;
|
||||
|
||||
INSERT IGNORE INTO `monasca_transform`.`transform_specs`
|
||||
(`metric_id`,
|
||||
`transform_spec`)
|
||||
VALUES
|
@ -1,31 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
|
||||
SCRIPT_HOME=$(dirname $(readlink -f $BASH_SOURCE))
|
||||
pushd $SCRIPT_HOME
|
||||
|
||||
PRE_TRANSFORM_SPECS_JSON="../monasca_transform/data_driven_specs/pre_transform_specs/pre_transform_specs.json"
|
||||
PRE_TRANSFORM_SPECS_SQL="ddl/pre_transform_specs.sql"
|
||||
|
||||
TRANSFORM_SPECS_JSON="../monasca_transform/data_driven_specs/transform_specs/transform_specs.json"
|
||||
TRANSFORM_SPECS_SQL="ddl/transform_specs.sql"
|
||||
|
||||
echo "converting {$PRE_TRANSFORM_SPECS_JSON} to {$PRE_TRANSFORM_SPECS_SQL} ..."
|
||||
python ddl/generate_ddl.py -t pre_transform_spec -i ddl/pre_transform_specs_template.sql -s "$PRE_TRANSFORM_SPECS_JSON" -o "$PRE_TRANSFORM_SPECS_SQL"
|
||||
rc=$?
|
||||
if [[ $rc == 0 ]]; then
|
||||
echo "converting {$PRE_TRANSFORM_SPECS_JSON} to {$PRE_TRANSFORM_SPECS_SQL} sucessfully..."
|
||||
else
|
||||
echo "error in converting {$PRE_TRANSFORM_SPECS_JSON} to {$PRE_TRANSFORM_SPECS_SQL}, bailing out"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "converting {$TRANSFORM_SPECS_JSON} to {$TRANSFORM_SPECS_SQL}..."
|
||||
python ddl/generate_ddl.py -t transform_spec -i ddl/transform_specs_template.sql -s "$TRANSFORM_SPECS_JSON" -o "$TRANSFORM_SPECS_SQL"
|
||||
rc=$?
|
||||
if [[ $rc == 0 ]]; then
|
||||
echo "converting {$TRANSFORM_SPECS_JSON} to {$TRANSFORM_SPECS_SQL} sucessfully..."
|
||||
else
|
||||
echo "error in converting {$TRANSFORM_SPECS_JSON} to {$TRANSFORM_SPECS_SQL}, bailing out"
|
||||
exit 1
|
||||
fi
|
||||
popd
|
@ -1,11 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
|
||||
SCRIPT_HOME=$(dirname $(readlink -f $BASH_SOURCE))
|
||||
pushd $SCRIPT_HOME
|
||||
|
||||
./generate_ddl.sh
|
||||
|
||||
cp ddl/pre_transform_specs.sql ../devstack/files/monasca-transform/pre_transform_specs.sql
|
||||
cp ddl/transform_specs.sql ../devstack/files/monasca-transform/transform_specs.sql
|
||||
|
||||
popd
|
@ -1,20 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
|
||||
|
||||
SCRIPT_HOME=$(dirname $(readlink -f $BASH_SOURCE))
|
||||
pushd $SCRIPT_HOME
|
||||
pushd ../
|
||||
rm -rf build monasca-transform.egg-info dist
|
||||
python setup.py bdist_egg
|
||||
|
||||
found_egg=`ls dist`
|
||||
echo
|
||||
echo
|
||||
echo Created egg file at dist/$found_egg
|
||||
dev=dev
|
||||
find_dev_index=`expr index $found_egg $dev`
|
||||
new_filename=${found_egg:0:$find_dev_index - 1 }egg
|
||||
echo Copying dist/$found_egg to dist/$new_filename
|
||||
cp dist/$found_egg dist/$new_filename
|
||||
popd
|
||||
popd
|
@ -1,3 +0,0 @@
|
||||
#!/bin/bash
|
||||
JARS_PATH="/opt/spark/current/lib/spark-streaming-kafka.jar,/opt/spark/current/lib/scala-library-2.10.1.jar,/opt/spark/current/lib/kafka_2.10-0.8.1.1.jar,/opt/spark/current/lib/metrics-core-2.2.0.jar"
|
||||
pyspark --master spark://192.168.10.4:7077 --jars $JARS_PATH
|
@ -1,19 +0,0 @@
|
||||
#!/bin/bash
|
||||
SCRIPT_HOME=$(dirname $(readlink -f $BASH_SOURCE))
|
||||
pushd $SCRIPT_HOME
|
||||
pushd ../
|
||||
|
||||
JARS_PATH="/opt/spark/current/lib/spark-streaming-kafka.jar,/opt/spark/current/lib/scala-library-2.10.1.jar,/opt/spark/current/lib/kafka_2.10-0.8.1.1.jar,/opt/spark/current/lib/metrics-core-2.2.0.jar,/usr/share/java/mysql.jar"
|
||||
export SPARK_HOME=/opt/spark/current/
|
||||
# There is a known issue where obsolete kafka offsets can cause the
|
||||
# driver to crash. However when this occurs, the saved offsets get
|
||||
# deleted such that the next execution should be successful. Therefore,
|
||||
# create a loop to run spark-submit for two iterations or until
|
||||
# control-c is pressed.
|
||||
COUNTER=0
|
||||
while [ $COUNTER -lt 2 ]; do
|
||||
spark-submit --supervise --master spark://192.168.10.4:7077,192.168.10.5:7077 --conf spark.eventLog.enabled=true --jars $JARS_PATH --py-files dist/$new_filename /opt/monasca/transform/lib/driver.py || break
|
||||
let COUNTER=COUNTER+1
|
||||
done
|
||||
popd
|
||||
popd
|
49
setup.cfg
49
setup.cfg
@ -1,49 +0,0 @@
|
||||
[metadata]
|
||||
name=monasca_transform
|
||||
summary=Data Aggregation and Transformation component for Monasca
|
||||
description-file = README.rst
|
||||
author= OpenStack
|
||||
author-email = openstack-discuss@lists.openstack.org
|
||||
home-page=https://wiki.openstack.org/wiki/Monasca/Transform
|
||||
python-requires = >=3.6
|
||||
classifier =
|
||||
Environment :: OpenStack
|
||||
Intended Audience :: Information Technology
|
||||
Intended Audience :: System Administrators
|
||||
License :: OSI Approved :: Apache Software License
|
||||
Operating System :: POSIX :: Linux
|
||||
Programming Language :: Python
|
||||
Programming Language :: Python :: Implementation :: CPython
|
||||
Programming Language :: Python :: 3 :: Only
|
||||
Programming Language :: Python :: 3
|
||||
Programming Language :: Python :: 3.6
|
||||
Programming Language :: Python :: 3.7
|
||||
|
||||
[files]
|
||||
packages =
|
||||
monasca_transform
|
||||
|
||||
[entry_points]
|
||||
monasca_transform.usage =
|
||||
calculate_rate = monasca_transform.component.usage.calculate_rate:CalculateRate
|
||||
fetch_quantity = monasca_transform.component.usage.fetch_quantity:FetchQuantity
|
||||
fetch_quantity_util = monasca_transform.component.usage.fetch_quantity_util:FetchQuantityUtil
|
||||
|
||||
monasca_transform.setter =
|
||||
set_aggregated_metric_name = monasca_transform.component.setter.set_aggregated_metric_name:SetAggregatedMetricName
|
||||
set_aggregated_period = monasca_transform.component.setter.set_aggregated_period:SetAggregatedPeriod
|
||||
rollup_quantity = monasca_transform.component.setter.rollup_quantity:RollupQuantity
|
||||
|
||||
monasca_transform.insert =
|
||||
prepare_data = monasca_transform.component.insert.prepare_data:PrepareData
|
||||
insert_data = monasca_transform.component.insert.kafka_insert:KafkaInsert
|
||||
insert_data_pre_hourly = monasca_transform.component.insert.kafka_insert_pre_hourly:KafkaInsertPreHourly
|
||||
|
||||
[pbr]
|
||||
warnerrors = True
|
||||
autodoc_index_modules = True
|
||||
|
||||
[build_sphinx]
|
||||
all_files = 1
|
||||
build-dir = doc/build
|
||||
source-dir = doc/source
|
20
setup.py
20
setup.py
@ -1,20 +0,0 @@
|
||||
# Copyright (c) 2013 Hewlett-Packard Development Company, L.P.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
||||
# implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import setuptools
|
||||
|
||||
setuptools.setup(
|
||||
setup_requires=['pbr>=2.0.0'],
|
||||
pbr=True)
|
@ -1,14 +0,0 @@
|
||||
# The order of packages is significant, because pip processes them in the order
|
||||
# of appearance. Changing the order has an impact on the overall integration
|
||||
# process, which may cause wedges in the gate later.
|
||||
# mock object framework
|
||||
hacking>=1.1.0,<1.2.0 # Apache-2.0
|
||||
flake8<2.6.0,>=2.5.4 # MIT
|
||||
nose>=1.3.7 # LGPL
|
||||
fixtures>=3.0.0 # Apache-2.0/BSD
|
||||
pycodestyle==2.5.0 # MIT License
|
||||
stestr>=2.0.0 # Apache-2.0
|
||||
# required to build documentation
|
||||
sphinx!=1.6.6,!=1.6.7,>=1.6.2,!=2.1.0 # BSD
|
||||
# computes code coverage percentages
|
||||
coverage!=4.4,>=4.0 # Apache-2.0
|
@ -1,27 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
# Add the location of Spark to the path
|
||||
# TODO(someone) Does the "/opt/spark/current" location need to be configurable?
|
||||
import os
|
||||
import sys
|
||||
|
||||
try:
|
||||
sys.path.append(os.path.join("/opt/spark/current", "python"))
|
||||
sys.path.append(os.path.join("/opt/spark/current",
|
||||
"python", "lib", "py4j-0.10.4-src.zip"))
|
||||
except KeyError:
|
||||
print("Error adding Spark location to the path")
|
||||
# TODO(someone) not sure what action is appropriate
|
||||
sys.exit(1)
|
@ -1,87 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
from unittest import mock
|
||||
|
||||
from pyspark.sql import SQLContext
|
||||
|
||||
from monasca_transform.config.config_initializer import ConfigInitializer
|
||||
from monasca_transform.transform.builder.generic_transform_builder \
|
||||
import GenericTransformBuilder
|
||||
from monasca_transform.transform.transform_utils import RecordStoreUtils
|
||||
from monasca_transform.transform.transform_utils import TransformSpecsUtils
|
||||
from monasca_transform.transform import TransformContextUtils
|
||||
|
||||
from tests.functional.spark_context_test import SparkContextTest
|
||||
from tests.functional.test_resources.mem_total_all.data_provider \
|
||||
import DataProvider
|
||||
from tests.functional.test_resources.mock_component_manager \
|
||||
import MockComponentManager
|
||||
|
||||
|
||||
class TransformBuilderTest(SparkContextTest):
|
||||
|
||||
def setUp(self):
|
||||
super(TransformBuilderTest, self).setUp()
|
||||
# configure the system with a dummy messaging adapter
|
||||
ConfigInitializer.basic_config(
|
||||
default_config_files=[
|
||||
'tests/functional/test_resources/config/test_config.conf'])
|
||||
|
||||
@mock.patch('monasca_transform.transform.builder.generic_transform_builder'
|
||||
'.GenericTransformBuilder._get_insert_component_manager')
|
||||
@mock.patch('monasca_transform.transform.builder.generic_transform_builder'
|
||||
'.GenericTransformBuilder._get_setter_component_manager')
|
||||
@mock.patch('monasca_transform.transform.builder.generic_transform_builder'
|
||||
'.GenericTransformBuilder._get_usage_component_manager')
|
||||
def test_transform_builder(self,
|
||||
usage_manager,
|
||||
setter_manager,
|
||||
insert_manager):
|
||||
|
||||
usage_manager.return_value = MockComponentManager.get_usage_cmpt_mgr()
|
||||
setter_manager.return_value = \
|
||||
MockComponentManager.get_setter_cmpt_mgr()
|
||||
insert_manager.return_value = \
|
||||
MockComponentManager.get_insert_cmpt_mgr()
|
||||
|
||||
record_store_json_path = DataProvider.record_store_path
|
||||
|
||||
metric_proc_json_path = DataProvider.transform_spec_path
|
||||
|
||||
sql_context = SQLContext.getOrCreate(self.spark_context)
|
||||
record_store_df = \
|
||||
RecordStoreUtils.create_df_from_json(sql_context,
|
||||
record_store_json_path)
|
||||
|
||||
transform_spec_df = TransformSpecsUtils.create_df_from_json(
|
||||
sql_context, metric_proc_json_path)
|
||||
|
||||
transform_context = TransformContextUtils.get_context(
|
||||
transform_spec_df_info=transform_spec_df,
|
||||
batch_time_info=self.get_dummy_batch_time())
|
||||
|
||||
# invoke the generic transformation builder
|
||||
instance_usage_df = GenericTransformBuilder.do_transform(
|
||||
transform_context, record_store_df)
|
||||
|
||||
result_list = [(row.usage_date, row.usage_hour,
|
||||
row.tenant_id, row.host, row.quantity,
|
||||
row.aggregated_metric_name)
|
||||
for row in instance_usage_df.rdd.collect()]
|
||||
|
||||
expected_result = [('2016-02-08', '18', 'all',
|
||||
'all', 12946.0,
|
||||
'mem.total_mb_agg')]
|
||||
|
||||
self.assertCountEqual(result_list, expected_result)
|
@ -1,72 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from oslo_config import cfg
|
||||
|
||||
from monasca_transform.component.insert import InsertComponent
|
||||
from tests.functional.messaging.adapter import DummyAdapter
|
||||
|
||||
|
||||
class DummyInsert(InsertComponent):
|
||||
"""Insert component that writes metric data to kafka queue"""
|
||||
|
||||
@staticmethod
|
||||
def insert(transform_context, instance_usage_df):
|
||||
"""write instance usage data to kafka"""
|
||||
|
||||
transform_spec_df = transform_context.transform_spec_df_info
|
||||
|
||||
agg_params = transform_spec_df.select("aggregation_params_map"
|
||||
".dimension_list"
|
||||
).collect()[0].asDict()
|
||||
|
||||
cfg.CONF.set_override(
|
||||
'adapter',
|
||||
'tests.functional.messaging.adapter:DummyAdapter',
|
||||
group='messaging')
|
||||
|
||||
# Approach 1
|
||||
# using foreachPartition to iterate through elements in an
|
||||
# RDD is the recommended approach so as to not overwhelm kafka with the
|
||||
# zillion connections (but in our case the MessageAdapter does
|
||||
# store the adapter_impl so we should not create many producers)
|
||||
|
||||
# using foreachpartitions was causing some serialization (cpickle)
|
||||
# problems where few libs like kafka.SimpleProducer and oslo_config.cfg
|
||||
# were not available
|
||||
#
|
||||
# removing _write_metrics_from_partition for now in favor of
|
||||
# Approach 2
|
||||
#
|
||||
|
||||
# instance_usage_df_agg_params = instance_usage_df.rdd.map(
|
||||
# lambda x: InstanceUsageDataAggParams(x,
|
||||
# agg_params))
|
||||
# instance_usage_df_agg_params.foreachPartition(
|
||||
# DummyInsert._write_metrics_from_partition)
|
||||
|
||||
#
|
||||
# Approach # 2
|
||||
#
|
||||
# using collect() to fetch all elements of an RDD
|
||||
# and write to kafka
|
||||
#
|
||||
|
||||
for instance_usage_row in instance_usage_df.collect():
|
||||
metric = InsertComponent._get_metric(instance_usage_row,
|
||||
agg_params)
|
||||
# validate metric part
|
||||
if InsertComponent._validate_metric(metric):
|
||||
DummyAdapter.send_metric(metric)
|
||||
return instance_usage_df
|
@ -1,69 +0,0 @@
|
||||
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
# not use this file except in compliance with the License. You may obtain
|
||||
# a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
# License for the specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from oslo_config import cfg
|
||||
|
||||
from monasca_transform.component.insert import InsertComponent
|
||||
from tests.functional.messaging.adapter import DummyAdapter
|
||||
|
||||
|
||||
class DummyInsertPreHourly(InsertComponent):
|
||||
"""Insert component that writes metric data to kafka queue"""
|
||||
|
||||
@staticmethod
|
||||
def insert(transform_context, instance_usage_df):
|
||||
"""write instance usage data to kafka"""
|
||||
|
||||
transform_spec_df = transform_context.transform_spec_df_info
|
||||
|
||||
agg_params = transform_spec_df.select("metric_id"
|
||||
).collect()[0].asDict()
|
||||
metric_id = agg_params['metric_id']
|
||||
|
||||
cfg.CONF.set_override(
|
||||
'adapter',
|
||||
'tests.functional.messaging.adapter:DummyAdapter',
|
||||
group='messaging')
|
||||
# Approach 1
|
||||
# using foreachPartition to iterate through elements in an
|
||||
# RDD is the recommended approach so as to not overwhelm kafka with the
|
||||
# zillion connections (but in our case the MessageAdapter does
|
||||
# store the adapter_impl so we should not create many producers)
|
||||
|
||||
# using foreachpartitions was causing some serialization (cpickle)
|
||||
# problems where few libs like kafka.SimpleProducer and oslo_config.cfg
|
||||
# were not available
|
||||
#
|
||||
# removing _write_metrics_from_partition for now in favor of
|
||||
# Approach 2
|
||||
#
|
||||
|
||||
# instance_usage_df_agg_params = instance_usage_df.rdd.map(
|
||||
# lambda x: InstanceUsageDataAggParams(x,
|
||||
# agg_params))
|
||||
# instance_usage_df_agg_params.foreachPartition(
|
||||
# DummyInsert._write_metrics_from_partition)
|
||||
|
||||
#
|
||||
# Approach # 2
|
||||
#
|
||||
# using collect() to fetch all elements of an RDD
|
||||
# and write to kafka
|
||||
#
|
||||
|
||||
for instance_usage_row in instance_usage_df.collect():
|
||||
instance_usage_dict = InsertComponent\
|
||||
._get_instance_usage_pre_hourly(instance_usage_row, metric_id)
|
||||
DummyAdapter.send_metric(instance_usage_dict)
|
||||
return instance_usage_df
|