Remove project content on master branch

This is step 2b of the repository deprecation process as described in [1].
The project deprecation has been announced here [2].

[1] https://docs.openstack.org/project-team-guide/repository.html#step-2b-remove-project-content
[2] http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016814.html

Depends-On: https://review.opendev.org/751983
Change-Id: I83bb2821d64a4dddd569ff9939aa78d271834f08
Witek Bedyk 2020-09-15 10:08:54 +02:00
parent 326483ee4c
commit 811acd76c9
184 changed files with 10 additions and 21945 deletions

.gitignore

@ -1,9 +0,0 @@
.idea
AUTHORS
ChangeLog
monasca_transform.egg-info
tools/vagrant/.vagrant
doc/build/*
.stestr
.tox
*.pyc


@ -1,3 +0,0 @@
[DEFAULT]
test_path=${OS_TEST_PATH:-./tests/unit}
top_dir=./


@ -1,14 +0,0 @@
- project:
templates:
- build-openstack-docs-pti
- check-requirements
- openstack-cover-jobs
- openstack-lower-constraints-jobs
- openstack-python3-victoria-jobs
check:
jobs:
- legacy-tempest-dsvm-monasca-transform-python35-functional:
voting: false
irrelevant-files:
- ^(test-|)requirements.txt$
- ^setup.cfg$

LICENSE

@ -1,175 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.


@ -1,110 +1,16 @@
Team and repository tags
========================
.. image:: https://governance.openstack.org/tc/badges/monasca-transform.svg
:target: https://governance.openstack.org/tc/reference/tags/index.html
- `Monasca Transform`_
- `Use Cases handled by Monasca Transform`_
- `Operation`_
- `Architecture`_
- `To set up the development environment`_
- `Generic aggregation components`_
- `Create a new aggregation pipeline example`_
- `Original proposal and blueprint`_
Monasca Transform
=================
monasca-transform is a data-driven aggregation engine which collects,
groups and aggregates existing individual Monasca metrics according to
business requirements and publishes new transformed (derived) metrics to
the Monasca Kafka queue.
This project is no longer maintained.
- Since the new transformed metrics are published as any other metric
in Monasca, alarms can be set and triggered on the transformed
metric.
The contents of this repository are still available in the Git
source code management system. To see the contents of this
repository before it reached its end of life, please check out the
previous commit with "git checkout HEAD^1".
- Monasca Transform uses `Apache Spark`_ to aggregate data. `Apache
Spark`_ is a highly scalable, fast, in-memory, fault-tolerant and
parallel data processing framework. All monasca-transform components
are implemented in Python and use Spark's `PySpark Python API`_ to
interact with Spark.
Older versions of this project are still supported and available in stable
branches.
- Monasca Transform does transformation and aggregation of incoming
metrics in two phases.
- In the first phase, the Spark Streaming application retrieves
data from Kafka at a configurable *stream interval* (the default
*stream_interval* is 10 minutes) and writes the data aggregated over
the *stream interval* to the *metrics_pre_hourly* topic in Kafka.
- In the second phase, which is kicked off every hour, all metrics
in the *metrics_pre_hourly* topic in Kafka are aggregated again, this
time over the larger interval of an hour. These hourly aggregated
metrics are published to the *metrics* topic in Kafka.
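The following is a minimal, self-contained sketch of this two-phase roll-up
idea using only the Python standard library. It is purely illustrative: the
metric names and values are made up, and it does not reflect the actual
PySpark implementation, topic handling or metric format used by
monasca-transform::

    import collections

    STREAM_INTERVAL = 600    # seconds (10 minutes), like the default stream_interval
    HOURLY_INTERVAL = 3600   # seconds

    def aggregate(metrics, interval):
        """Group (timestamp, name, value) tuples into interval buckets and sum them."""
        buckets = collections.defaultdict(float)
        for timestamp, name, value in metrics:
            bucket = (int(timestamp // interval) * interval, name)
            buckets[bucket] += value
        return [(ts, name, total) for (ts, name), total in sorted(buckets.items())]

    # Phase 1: aggregate raw metrics over the stream interval ("pre-hourly" data).
    raw = [(0, "cpu.total_time_sec", 1.0), (300, "cpu.total_time_sec", 2.0),
           (700, "cpu.total_time_sec", 4.0)]
    pre_hourly = aggregate(raw, STREAM_INTERVAL)

    # Phase 2: aggregate the pre-hourly data again over a one-hour interval.
    hourly = aggregate(pre_hourly, HOURLY_INTERVAL)
    print(pre_hourly)  # [(0, 'cpu.total_time_sec', 3.0), (600, 'cpu.total_time_sec', 4.0)]
    print(hourly)      # [(0, 'cpu.total_time_sec', 7.0)]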
Use Cases handled by Monasca Transform
--------------------------------------
Please refer to **Problem Description** section on the
`Monasca/Transform wiki`_
Operation
---------
Please refer to **How Monasca Transform Operates** section on the
`Monasca/Transform wiki`_
Architecture
------------
Please refer to **Architecture** and **Logical processing data flow**
sections on the `Monasca/Transform wiki`_
To set up the development environment
-------------------------------------
monasca-transform uses `DevStack`_ as a common dev environment. See
the `README.md`_ in the devstack directory for details on how to include
monasca-transform in a DevStack deployment.
Generic aggregation components
------------------------------
Monasca Transform uses a set of generic aggregation components which can
be assembled into an aggregation pipeline.
Please refer to the
`generic-aggregation-components`_
document for the list of generic aggregation components
available.
Create a new aggregation pipeline example
-----------------------------------------
Generic aggregation components make it easy to build new aggregation
pipelines for different Monasca metrics.
This `new aggregation pipeline`_ example shows how to create
*pre_transform_specs* and *transform_specs* to build an aggregation
pipeline for a new set of Monasca metrics, while leveraging the existing set
of generic aggregation components.
Original proposal and blueprint
-------------------------------
Original proposal: `Monasca/Transform-proposal`_
Blueprint: `monasca-transform blueprint`_
.. _Apache Spark: https://spark.apache.org
.. _generic-aggregation-components: docs/generic-aggregation-components.md
.. _PySpark Python API: https://spark.apache.org/docs/latest/api/python/index.html
.. _Monasca/Transform wiki: https://wiki.openstack.org/wiki/Monasca/Transform
.. _DevStack: https://docs.openstack.org/devstack/latest/
.. _README.md: devstack/README.md
.. _new aggregation pipeline: docs/create-new-aggregation-pipeline.md
.. _Monasca/Transform-proposal: https://wiki.openstack.org/wiki/Monasca/Transform-proposal
.. _monasca-transform blueprint: https://blueprints.launchpad.net/monasca/+spec/monasca-transform
For any further questions, please email
openstack-discuss@lists.openstack.org or join #openstack-monasca on
Freenode.


@ -1,206 +0,0 @@
# Monasca-transform DevStack Plugin
The Monasca-transform DevStack plugin is tested only on Ubuntu 16.04 (Xenial).
A shortcut for running monasca-transform in DevStack is provided with Vagrant.
## Variables
* DATABASE_PASSWORD (default: *secretmysql*) - password used to load the monasca-transform schema
* MONASCA_TRANSFORM_DB_PASSWORD (default: *password*) - password for the m-transform user
## To run monasca-transform using the provided vagrant environment
### Using any changes made locally to monasca-transform
cd tools/vagrant
vagrant up
vagrant ssh
cd devstack
./stack.sh
The devstack vagrant environment is set up to share the monasca-transform
directory with the VM, copy it and commit any changes in the VM copy. This is
because the devstack deploy process checks out the master branch to
/opt/stack and deploys using that. Changes made by the user need to be
committed in order to be used in the devstack instance. It is therefore
important that changes are not pushed from the VM, as the unreviewed local
commit would be pushed.
N.B. If you are running with VirtualBox you may find that `./stack.sh` fails with the filesystem becoming read-only. There is a workaround:
1. vagrant up --no-provision && vagrant halt
2. open virtualbox gui
3. open target vm settings and change storage controller from SCSI to SATA
4. vagrant up
### Using the upstream committed state of monasca-transform
This should operate the same as for any other devstack plugin. However, to use
the plugin from the upstream repo with the vagrant environment as described
above it is sufficient to do:
cd tools/vagrant
vagrant up
vagrant ssh
cd devstack
vi local.conf
and change the line
enable_plugin monasca-transform /home/ubuntu/monasca-transform
to
enable_plugin monasca-transform https://opendev.org/openstack/monasca-transform
before running
./stack.sh
### Connecting to devstack
The host key changes with each ```vagrant destroy```/```vagrant up``` cycle so
it is necessary to manage host key verification for your workstation:
ssh-keygen -R 192.168.15.6
The devstack VM `vagrant up` process generates a private key which can be used for
passwordless ssh to the VM as follows:
cd tools/vagrant
ssh -i .vagrant/machines/default/virtualbox/private_key ubuntu@192.168.15.6
### Running tox on devstack
Once the deploy is up use the following commands to set up tox.
sudo su monasca-transform
cd /opt/stack/monasca-transform
virtualenv .venv
. .venv/bin/activate
pip install tox
tox
### Updating the code for dev
To regenerate the environment for development purposes a script is provided
on the devstack instance at
/opt/stack/monasca-transform/tools/vagrant/refresh_monasca_transform.sh
To run the refresh_monasca_transform.sh script on devstack instance
cd /opt/stack/monasca-transform
tools/vagrant/refresh_monasca_transform.sh
(note: to use/run tox after running this script, the
"Running tox on devstack" steps above have to be re-executed)
This mostly re-does the work of the devstack plugin, updating the code from the
shared directory, regenerating the venv and the zip that is passed to Spark
during the spark-submit call. The configuration and the transform and
pre-transform specs in the database are updated with fresh copies, along
with the driver and service Python code.
If the refresh_monasca_transform.sh script completes successfully you should see
a message like the following in the console.
***********************************************
* *
* SUCCESS!! refresh monasca transform done. *
* *
***********************************************
### Development workflow
Here are the normal steps a developer can take to make any code changes. It is
essential that the developer runs all functional tests in a devstack
environment before submitting any changes for review/merge.
Please follow the steps mentioned in the
"To run monasca-transform using the provided vagrant environment" section above
to create a devstack VM environment before following the steps below:
1. Make code changes on the host machine (e.g. ~/monasca-transform)
2. vagrant ssh (to connect to the devstack VM)
3. cd /opt/stack/monasca-transform
4. tools/vagrant/refresh_monasca_transform.sh (See "Updating the code for dev"
section above)
5. cd /opt/stack/monasca-transform (since monasca-transform folder
gets recreated in Step 4. above)
6. tox -e pep8
7. tox -e py27
8. tox -e functional
Note: It is mandatory to run the functional tests before submitting any changes
for review/merge. These can currently be run only in a devstack VM, since the tests
need access to the Apache Spark libraries. This is accomplished by setting the
SPARK_HOME environment variable, which is done in tox.ini.
export SPARK_HOME=/opt/spark/current
#### How to find and fix test failures?
To find which tests failed after running the functional tests (as per the steps
in the Development workflow section above):
export OS_TEST_PATH=tests/functional
export SPARK_HOME=/opt/spark/current
source .tox/functional/bin/activate
testr run
testr failing (to get list of tests that failed)
You can add

    import pdb
    pdb.set_trace()

in a test, or in the code, at the point where you want to start the Python debugger.
Run a single test using
python -m testtools.run <test>
For example:
python -m testtools.run \
tests.functional.usage.test_host_cpu_usage_component_second_agg.SparkTest
Reference: https://wiki.openstack.org/wiki/Testr
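As an illustration only, here is a minimal self-contained test module (a
hypothetical `test_example.py`, not part of this repository) that can be run
with `python -m testtools.run` and shows where a pdb breakpoint could go,
assuming the testtools package is installed:

    # test_example.py -- illustrative only
    import testtools

    class ExampleTest(testtools.TestCase):

        def test_sum(self):
            data = [1, 2, 3]
            # import pdb; pdb.set_trace()  # uncomment to drop into the debugger here
            self.assertEqual(6, sum(data))

Run it with

    python -m testtools.run test_example.ExampleTest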
## Access the Spark Streaming and Spark Master/Worker user interfaces
In a devstack environment the ports for the Spark Streaming UI (4040), the Spark Master
UI (18080) and the Spark Worker UI (18081) are forwarded to the host and are
accessible from the host machine.
http://<host_machine_ip>:4040/ (Note: the Spark Streaming UI is available only when the monasca-transform application is running)
http://<host_machine_ip>:18080/ (Spark Master UI)
http://<host_machine_ip>:18081/ (Spark Worker UI)
## To run monasca-transform using a different deployment technology
Monasca-transform requires that supporting services, such as Kafka and
Zookeeper, are also set up. Just adding "enable_plugin monasca-transform"
to a default DevStack local.conf is therefore not sufficient to configure a working
DevStack deployment; these services must be added as well.
Please reference the devstack/settings file for an example of a working list of
plugins and services as used by the Vagrant deployment.
## WIP
This is a work in progress. There are a number of improvements needed to
increase its value as a development tool.
### TODO
1. Shorten the initial deploy
Currently the services deployed are the default set plus all of Monasca. It's
quite possible that not all of this is necessary to develop monasca-transform.
So some services may be dropped in order to shorten the deploy.


@ -1,20 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
activate_this_file = "/opt/monasca/transform/venv/bin/activate_this.py"
exec(open(activate_this_file).read(), dict(__file__=activate_this_file))
from monasca_transform.driver.mon_metrics_kafka import invoke
invoke()


@ -1,94 +0,0 @@
#!/bin/bash
### BEGIN INIT INFO
# Provides: {{ service_name }}
# Required-Start:
# Required-Stop:
# Default-Start: {{ service_start_levels }}
# Default-Stop:
# Short-Description: {{ service_name }}
# Description:
### END INIT INFO
service_is_running()
{
if [ -e {{ service_pid_file }} ]; then
PID=$(cat {{ service_pid_file }})
if $(ps $PID > /dev/null 2>&1); then
return 0
else
echo "Found obsolete PID file for {{ service_name }}...deleting it"
rm {{ service_pid_file }}
return 1
fi
else
return 1
fi
}
case $1 in
start)
echo "Starting {{ service_name }}..."
if service_is_running; then
echo "{{ service_name }} is already running"
exit 0
fi
echo "
_/_/ _/_/ _/_/_/_/ _/_/ _/ _/_/_/_/ _/_/_/_/ _/_/_/_/ _/_/_/_/
_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/
_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/
_/ _/ _/ _/ _/ _/ _/ _/ _/ _/_/_/_/ _/_/_/_/ _/ _/_/_/_/ _/_/_/_/
_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/
_/ _/_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/
_/ _/ _/ _/_/_/_/ _/ _/_/ _/ _/ _/_/_/_/ _/_/_/_/ _/ _/
_/_/_/_/ _/_/_/ _/_/_/_/ _/_/ _/ _/_/_/_/ _/_/_/_/ _/_/_/_/ _/_/_/ _/_/ _/_/
_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/
_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/
_/ _/ _/ _/_/_/_/ _/ _/ _/ _/_/_/_/ _/_/_/ _/ _/ _/ _/ _/ _/ _/ _/
_/ _/_/_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/_/_/ _/ _/ _/ _/
_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/_/ _/
_/ _/ _/ _/ _/ _/ _/_/ _/_/_/_/ _/ _/_/_/_/ _/ _/ _/ _/ _/
" >> {{ service_log_dir }}/{{ service_name }}.log
nohup sudo -u {{ service_user }} {{ virtualenv_location }}/bin/python \
{{ service_dir }}/{{ service_file }} \
>> {{ service_log_dir }}/{{ service_name }}.log \
2>> {{ service_log_dir }}/{{ service_name }}.log &
PID=$(echo $!)
if [ -z $PID ]; then
echo "{{ service_name }} failed to start"
else
echo $PID > {{ service_pid_file }}
echo "{{ service_name }} is running"
fi
;;
stop)
echo "Stopping {{ service_name }}..."
if service_is_running; then
PID=$(cat {{ service_pid_file }})
sudo kill -- -$(ps -o pgid= $PID | grep -o '[0-9]*')
rm {{ service_pid_file }}
echo "{{ service_name }} is stopped"
else
echo "{{ service_name }} is not running"
exit 0
fi
;;
status)
if service_is_running; then
echo "{{ service_name }} is running"
else
echo "{{ service_name }} is not running"
fi
;;
restart)
$0 stop
$0 start
;;
esac


@ -1,88 +0,0 @@
[DEFAULTS]
[repositories]
offsets = monasca_transform.mysql_offset_specs:MySQLOffsetSpecs
data_driven_specs = monasca_transform.data_driven_specs.mysql_data_driven_specs_repo:MySQLDataDrivenSpecsRepo
offsets_max_revisions = 10
[database]
server_type = mysql:thin
host = localhost
database_name = monasca_transform
username = m-transform
password = password
[messaging]
adapter = monasca_transform.messaging.adapter:KafkaMessageAdapter
topic = metrics
brokers=192.168.15.6:9092
publish_region = useast
publish_kafka_project_id=d2cb21079930415a9f2a33588b9f2bb6
adapter_pre_hourly = monasca_transform.messaging.adapter:KafkaMessageAdapterPreHourly
topic_pre_hourly = metrics_pre_hourly
[stage_processors]
pre_hourly_processor_enabled = True
[pre_hourly_processor]
late_metric_slack_time = 600
enable_instance_usage_df_cache = True
instance_usage_df_cache_storage_level = MEMORY_ONLY_SER_2
enable_batch_time_filtering = True
data_provider=monasca_transform.processor.pre_hourly_processor:PreHourlyProcessorDataProvider
effective_batch_revision=2
#
# Configurable values for the monasca-transform service
#
[service]
# The address of the mechanism being used for election coordination
coordinator_address = kazoo://localhost:2181
# The name of the coordination/election group
coordinator_group = monasca-transform
# How long the candidate should sleep between election result
# queries (in seconds)
election_polling_frequency = 15
# Whether debug-level log entries should be included in the application
# log. If this setting is false, info-level will be used for logging.
enable_debug_log_entries = true
# The path for the monasca-transform Spark driver
spark_driver = /opt/monasca/transform/lib/driver.py
# the location for the transform-service log
service_log_path=/var/log/monasca/transform/
# the filename for the transform-service log
service_log_filename=monasca-transform.log
# Whether Spark event logging should be enabled (true/false)
spark_event_logging_enabled = true
# A list of jars which Spark should use
spark_jars_list = /opt/spark/current/assembly/target/scala-2.10/jars/spark-streaming-kafka-0-8_2.10-2.2.0.jar,/opt/spark/current/assembly/target/scala-2.10/jars/scala-library-2.10.6.jar,/opt/spark/current/assembly/target/scala-2.10/jars/kafka_2.10-0.8.1.1.jar,/opt/spark/current/assembly/target/scala-2.10/jars/metrics-core-2.2.0.jar,/opt/spark/current/assembly/target/scala-2.10/jars/drizzle-jdbc-1.3.jar
# A list of where the Spark master(s) should run
spark_master_list = spark://localhost:7077
# spark_home for the environment
spark_home = /opt/spark/current
# Python files for Spark to use
spark_python_files = /opt/monasca/transform/lib/monasca-transform.zip
# How often the stream should be read (in seconds)
stream_interval = 600
# The working directory for monasca-transform
work_dir = /var/run/monasca/transform
# enable caching of record store df
enable_record_store_df_cache = True
# set spark storage level for record store df cache
record_store_df_cache_storage_level = MEMORY_ONLY_SER_2
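As a rough illustration of how settings like the ones above could be consumed,
the following sketch reads a few of the options with Python's standard
configparser module. It is illustrative only and is not necessarily how the
monasca-transform service itself loads its configuration:

    # read_transform_conf.py -- illustrative sketch, not part of monasca-transform
    import configparser

    parser = configparser.ConfigParser()
    parser.read("/etc/monasca-transform.conf")

    brokers = parser.get("messaging", "brokers", fallback="localhost:9092")
    topic = parser.get("messaging", "topic", fallback="metrics")
    stream_interval = parser.getint("service", "stream_interval", fallback=600)

    print("Kafka brokers:   %s" % brokers)
    print("Publish topic:   %s" % topic)
    print("Stream interval: %d seconds" % stream_interval)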


@ -1,30 +0,0 @@
CREATE DATABASE IF NOT EXISTS `monasca_transform` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
USE `monasca_transform`;
SET foreign_key_checks = 0;
CREATE TABLE IF NOT EXISTS `kafka_offsets` (
`id` INTEGER AUTO_INCREMENT NOT NULL,
`topic` varchar(128) NOT NULL,
`until_offset` BIGINT NULL,
`from_offset` BIGINT NULL,
`app_name` varchar(128) NOT NULL,
`partition` integer NOT NULL,
`batch_time` varchar(20) NOT NULL,
`last_updated` varchar(20) NOT NULL,
`revision` integer NOT NULL,
PRIMARY KEY (`id`, `app_name`, `topic`, `partition`, `revision`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE IF NOT EXISTS `transform_specs` (
`metric_id` varchar(128) NOT NULL,
`transform_spec` varchar(2048) NOT NULL,
PRIMARY KEY (`metric_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE IF NOT EXISTS `pre_transform_specs` (
`event_type` varchar(128) NOT NULL,
`pre_transform_spec` varchar(2048) NOT NULL,
PRIMARY KEY (`event_type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
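To illustrate how the kafka_offsets table above is used conceptually, here is
a small self-contained sketch that creates a similar table in SQLite and
records the offset range processed for one topic partition. It is illustrative
only: the real table lives in MySQL, uses a composite primary key, and is
managed by the MySQLOffsetSpecs repository configured in
monasca-transform.conf; the values below are made up:

    # offsets_sketch.py -- illustrative only; SQLite in place of the MySQL schema above
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE kafka_offsets (
            id INTEGER PRIMARY KEY AUTOINCREMENT,  -- simplified primary key
            topic TEXT NOT NULL,
            until_offset INTEGER,
            from_offset INTEGER,
            app_name TEXT NOT NULL,
            "partition" INTEGER NOT NULL,
            batch_time TEXT NOT NULL,
            last_updated TEXT NOT NULL,
            revision INTEGER NOT NULL
        )""")

    # Record the offset range processed for one topic partition in one batch.
    conn.execute("""
        INSERT INTO kafka_offsets
            (topic, until_offset, from_offset, app_name, "partition",
             batch_time, last_updated, revision)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
        ("metrics", 2000, 1000, "mon_metrics_kafka", 0,
         "2020-09-15 10:00:00", "2020-09-15 10:10:00", 1))

    print(conn.execute(
        'SELECT topic, "partition", from_offset, until_offset'
        ' FROM kafka_offsets').fetchone())
    # ('metrics', 0, 1000, 2000)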


@ -1,12 +0,0 @@
description "Monasca Transform"
start on runlevel [2345]
stop on runlevel [!2345]
respawn
limit nofile 32768 32768
expect daemon
exec /etc/monasca/transform/init/start-monasca-transform.sh


@ -1,29 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import sys
activate_this_file = "/opt/monasca/transform/venv/bin/activate_this.py"
exec(open(activate_this_file).read(), dict(__file__=activate_this_file))
from monasca_transform.service.transform_service import main_service
def main():
main_service()
if __name__ == "__main__":
main()
sys.exit(0)


@ -1,3 +0,0 @@
#!/usr/bin/env bash
cd /
/opt/monasca/transform/venv/bin/python /etc/monasca/transform/init/service_runner.py


@ -1,30 +0,0 @@
spark.driver.extraClassPath /opt/spark/current/assembly/target/scala-2.10/jars/drizzle-jdbc-1.3.jar
spark.executor.extraClassPath /opt/spark/current/assembly/target/scala-2.10/jars/drizzle-jdbc-1.3.jar
spark.blockManager.port 7100
spark.broadcast.port 7105
spark.cores.max 1
spark.driver.memory 512m
spark.driver.port 7110
spark.eventLog.dir /var/log/spark/events
spark.executor.cores 1
spark.executor.memory 512m
spark.executor.port 7115
spark.fileserver.port 7120
spark.python.worker.memory 16m
spark.speculation true
spark.speculation.interval 200
spark.sql.shuffle.partitions 32
spark.worker.cleanup.enabled True
spark.cleaner.ttl 900
spark.sql.ui.retainedExecutions 10
spark.streaming.ui.retainedBatches 10
spark.worker.ui.retainedExecutors 10
spark.worker.ui.retainedDrivers 10
spark.ui.retainedJobs 10
spark.ui.retainedStages 10
spark.driver.extraJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/gc_driver.log
spark.executor.extraJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/gc_executor.log
spark.executor.logs.rolling.maxRetainedFiles 6
spark.executor.logs.rolling.strategy time
spark.executor.logs.rolling.time.interval hourly
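Settings like the ones above can also be supplied programmatically. The
following sketch, assuming the pyspark package is installed, builds a SparkConf
with a few of these values; it is illustrative only and is not how
monasca-transform or the Spark daemons load this properties file:

    # spark_conf_sketch.py -- illustrative only
    from pyspark import SparkConf

    conf = (SparkConf()
            .setMaster("spark://localhost:7077")
            .setAppName("monasca-transform-example")
            .set("spark.cores.max", "1")
            .set("spark.executor.memory", "512m")
            .set("spark.sql.shuffle.partitions", "32"))

    for key, value in conf.getAll():
        print(key, "=", value)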


@ -1,18 +0,0 @@
#!/usr/bin/env bash
export SPARK_LOCAL_IP=127.0.0.1
export SPARK_MASTER_IP=127.0.0.1
export SPARK_MASTER_PORT=7077
export SPARK_MASTERS=127.0.0.1:7077
export SPARK_MASTER_WEBUI_PORT=18080
export SPARK_WORKER_PORT=7078
export SPARK_WORKER_WEBUI_PORT=18081
export SPARK_WORKER_DIR=/var/run/spark/work
export SPARK_WORKER_MEMORY=2g
export SPARK_WORKER_CORES=2
export SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.logDirectory=file://var/log/spark/events -Dspark.history.ui.port=18082"
export SPARK_LOG_DIR=/var/log/spark
export SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=127.0.0.1:2181 -Dspark.deploy.zookeeper.dir=/var/run/spark"


@ -1,12 +0,0 @@
[Unit]
Description=Spark Master
After=zookeeper.service
[Service]
User=spark
Group=spark
ExecStart=/etc/spark/init/start-spark-master.sh
Restart=on-failure
[Install]
WantedBy=multi-user.target


@ -1,18 +0,0 @@
#!/usr/bin/env bash
export SPARK_LOCAL_IP=127.0.0.1
export SPARK_MASTER_IP=127.0.0.1
export SPARK_MASTER_PORT=7077
export SPARK_MASTERS=127.0.0.1:7077
export SPARK_MASTER_WEBUI_PORT=18080
export SPARK_WORKER_PORT=7078
export SPARK_WORKER_WEBUI_PORT=18081
export SPARK_WORKER_DIR=/var/run/spark/work
export SPARK_WORKER_MEMORY=2g
export SPARK_WORKER_CORES=1
export SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS -Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=900 -Dspark.worker.cleanup.appDataTtl=1*24*3600"
export SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.logDirectory=file://var/log/spark/events -Dspark.history.ui.port=18082"
export SPARK_LOG_DIR=/var/log/spark
export SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=127.0.0.1:2181 -Dspark.deploy.zookeeper.dir=/var/run/spark"


@ -1,9 +0,0 @@
[Unit]
Description=Spark Worker
After=zookeeper.service
[Service]
User=spark
Group=spark
ExecStart=/etc/spark/init/start-spark-worker.sh
Restart=on-failure


@ -1,14 +0,0 @@
#!/usr/bin/env bash
. /opt/spark/current/conf/spark-env.sh
export EXEC_CLASS=org.apache.spark.deploy.master.Master
export INSTANCE_ID=1
export SPARK_CLASSPATH=/etc/spark/conf/:/opt/spark/current/assembly/target/scala-2.10/jars/*
export log="$SPARK_LOG_DIR/spark-spark-"$EXEC_CLASS"-"$INSTANCE_ID"-127.0.0.1.out"
export SPARK_HOME=/opt/spark/current
# added for spark 2
export PYTHONPATH="${SPARK_HOME}/python:${PYTHONPATH}"
export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.4-src.zip:${PYTHONPATH}"
export SPARK_SCALA_VERSION="2.10"
/usr/bin/java -cp "$SPARK_CLASSPATH" $SPARK_DAEMON_JAVA_OPTS -Xms1g -Xmx1g "$EXEC_CLASS" --ip "$SPARK_MASTER_IP" --port "$SPARK_MASTER_PORT" --webui-port "$SPARK_MASTER_WEBUI_PORT" --properties-file "/etc/spark/conf/spark-defaults.conf"


@ -1,17 +0,0 @@
#!/usr/bin/env bash
. /opt/spark/current/conf/spark-worker-env.sh
export EXEC_CLASS=org.apache.spark.deploy.worker.Worker
export INSTANCE_ID=1
export SPARK_CLASSPATH=/etc/spark/conf/:/opt/spark/current/assembly/target/scala-2.10/jars/*
export log="$SPARK_LOG_DIR/spark-spark-"$EXEC_CLASS"-"$INSTANCE_ID"-127.0.0.1.out"
export SPARK_HOME=/opt/spark/current
# added for spark 2.1.1
export PYTHONPATH="${SPARK_HOME}/python:${PYTHONPATH}"
export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.4-src.zip:${PYTHONPATH}"
export SPARK_SCALA_VERSION="2.10"
/usr/bin/java -cp "$SPARK_CLASSPATH" $SPARK_DAEMON_JAVA_OPTS -Xms1g -Xmx1g "$EXEC_CLASS" --host $SPARK_LOCAL_IP --cores $SPARK_WORKER_CORES --memory $SPARK_WORKER_MEMORY --port "$SPARK_WORKER_PORT" -d "$SPARK_WORKER_DIR" --webui-port "$SPARK_WORKER_WEBUI_PORT" --properties-file "/etc/spark/conf/spark-defaults.conf" spark://$SPARK_MASTERS


@ -1,485 +0,0 @@
# (C) Copyright 2015 Hewlett Packard Enterprise Development Company LP
# Copyright 2016 FUJITSU LIMITED
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Monasca-transform DevStack plugin
#
# Install and start Monasca-transform service in devstack
#
# To enable Monasca-transform in devstack add an entry to local.conf that
# looks like
#
# [[local|localrc]]
# enable_plugin monasca-transform https://opendev.org/openstack/monasca-transform
#
# By default all Monasca services are started (see
# devstack/settings). To disable a specific service use the
# disable_service function. For example to turn off notification:
#
# disable_service monasca-notification
#
# Several variables set in the localrc section adjust common behaviors
# of Monasca (see within for additional settings):
#
# EXAMPLE VARS HERE
# Save trace setting
XTRACE=$(set +o | grep xtrace)
set -o xtrace
ERREXIT=$(set +o | grep errexit)
set -o errexit
# monasca-transform database password
export MONASCA_TRANSFORM_DB_PASSWORD=${MONASCA_TRANSFORM_DB_PASSWORD:-"password"}
export MONASCA_TRANSFORM_FILES="${DEST}"/monasca-transform/devstack/files
export DOWNLOADS_DIRECTORY=${DOWNLOADS_DIRECTORY:-"/home/${USER}/downloads"}
function pre_install_monasca_transform {
:
}
function pre_install_spark {
for SPARK_JAVA_LIB in "${SPARK_JAVA_LIBS[@]}"
do
SPARK_LIB_NAME=`echo ${SPARK_JAVA_LIB} | sed 's/.*\///'`
download_through_cache ${MAVEN_REPO}/${SPARK_JAVA_LIB} ${SPARK_LIB_NAME}
done
for SPARK_JAR in "${SPARK_JARS[@]}"
do
SPARK_JAR_NAME=`echo ${SPARK_JAR} | sed 's/.*\///'`
download_through_cache ${MAVEN_REPO}/${SPARK_JAR} ${SPARK_JAR_NAME}
done
download_through_cache ${APACHE_MIRROR}/spark/spark-${SPARK_VERSION}/${SPARK_TARBALL_NAME} ${SPARK_TARBALL_NAME} 1000
}
function install_java_libs {
pushd /opt/spark/current/assembly/target/scala-2.10/jars/
for SPARK_JAVA_LIB in "${SPARK_JAVA_LIBS[@]}"
do
SPARK_LIB_NAME=`echo ${SPARK_JAVA_LIB} | sed 's/.*\///'`
copy_from_cache ${SPARK_LIB_NAME}
done
popd
}
function install_spark_jars {
# create a directory for jars
mkdir -p /opt/spark/current/assembly/target/scala-2.10/jars
# copy jars to new location
pushd /opt/spark/current/assembly/target/scala-2.10/jars
for SPARK_JAR in "${SPARK_JARS[@]}"
do
SPARK_JAR_NAME=`echo ${SPARK_JAR} | sed 's/.*\///'`
copy_from_cache ${SPARK_JAR_NAME}
done
# copy all jars except spark and scala to assembly/target/scala_2.10/jars
find /opt/spark/current/jars/ -type f ! \( -iname 'spark*' -o -iname 'scala*' -o -iname 'jackson-module-scala*' -o -iname 'json4s-*' -o -iname 'breeze*' -o -iname 'spire*' -o -iname 'macro-compat*' -o -iname 'shapeless*' -o -iname 'machinist*' -o -iname 'chill*' \) -exec cp {} . \;
# rename jars directory
mv /opt/spark/current/jars/ /opt/spark/current/jars_original
popd
}
function copy_from_cache {
resource_name=$1
target_directory=${2:-"./."}
cp ${DOWNLOADS_DIRECTORY}/${resource_name} ${target_directory}/.
}
function download_through_cache {
resource_location=$1
resource_name=$2
resource_timeout=${3:-"300"}
if [[ ! -d ${DOWNLOADS_DIRECTORY} ]]; then
_safe_permission_operation mkdir -p ${DOWNLOADS_DIRECTORY}
_safe_permission_operation chown ${USER} ${DOWNLOADS_DIRECTORY}
fi
pushd ${DOWNLOADS_DIRECTORY}
if [[ ! -f ${resource_name} ]]; then
curl -m ${resource_timeout} --retry 3 --retry-delay 5 ${resource_location} -o ${resource_name}
fi
popd
}
function unstack_monasca_transform {
echo_summary "Unstack Monasca-transform"
stop_process "monasca-transform" || true
}
function delete_monasca_transform_files {
sudo rm -rf /opt/monasca/transform || true
sudo rm /etc/monasca-transform.conf || true
MONASCA_TRANSFORM_DIRECTORIES=("/var/log/monasca/transform" "/var/run/monasca/transform" "/etc/monasca/transform/init")
for MONASCA_TRANSFORM_DIRECTORY in "${MONASCA_TRANSFORM_DIRECTORIES[@]}"
do
sudo rm -rf ${MONASCA_TRANSFORM_DIRECTORY} || true
done
}
function drop_monasca_transform_database {
sudo mysql -u$DATABASE_USER -p$DATABASE_PASSWORD -h$MYSQL_HOST -e "drop database monasca_transform; drop user 'm-transform'@'%' from mysql.user; drop user 'm-transform'@'localhost' from mysql.user;" || echo "Failed to drop database 'monasca_transform' and/or user 'm-transform' from mysql database, you may wish to do this manually."
}
function unstack_spark {
echo_summary "Unstack Spark"
stop_spark_worker
stop_spark_master
}
function stop_spark_worker {
stop_process "spark-worker"
}
function stop_spark_master {
stop_process "spark-master"
}
function clean_spark {
echo_summary "Clean spark"
set +o errexit
delete_spark_start_scripts
delete_spark_upstart_definitions
unlink_spark_commands
delete_spark_directories
sudo rm -rf `readlink /opt/spark/current` || true
sudo rm -rf /opt/spark || true
sudo userdel spark || true
sudo groupdel spark || true
set -o errexit
}
function clean_monasca_transform {
set +o errexit
delete_monasca_transform_files
sudo rm /etc/init/monasca-transform.conf || true
sudo rm -rf /etc/monasca/transform || true
drop_monasca_transform_database
set -o errexit
}
function create_spark_directories {
for SPARK_DIRECTORY in "${SPARK_DIRECTORIES[@]}"
do
sudo mkdir -p ${SPARK_DIRECTORY}
sudo chown ${USER} ${SPARK_DIRECTORY}
sudo chmod 755 ${SPARK_DIRECTORY}
done
}
function delete_spark_directories {
for SPARK_DIRECTORY in "${SPARK_DIRECTORIES[@]}"
do
sudo rm -rf ${SPARK_DIRECTORY} || true
done
}
function link_spark_commands_to_usr_bin {
SPARK_COMMANDS=("spark-submit" "spark-class" "spark-shell" "spark-sql")
for SPARK_COMMAND in "${SPARK_COMMANDS[@]}"
do
sudo ln -sf /opt/spark/current/bin/${SPARK_COMMAND} /usr/bin/${SPARK_COMMAND}
done
}
function unlink_spark_commands {
SPARK_COMMANDS=("spark-submit" "spark-class" "spark-shell" "spark-sql")
for SPARK_COMMAND in "${SPARK_COMMANDS[@]}"
do
sudo unlink /usr/bin/${SPARK_COMMAND} || true
done
}
function copy_and_link_config {
SPARK_ENV_FILES=("spark-env.sh" "spark-worker-env.sh" "spark-defaults.conf")
for SPARK_ENV_FILE in "${SPARK_ENV_FILES[@]}"
do
cp -f "${MONASCA_TRANSFORM_FILES}"/spark/"${SPARK_ENV_FILE}" /etc/spark/conf/.
ln -sf /etc/spark/conf/"${SPARK_ENV_FILE}" /opt/spark/current/conf/"${SPARK_ENV_FILE}"
done
}
function copy_spark_start_scripts {
SPARK_START_SCRIPTS=("start-spark-master.sh" "start-spark-worker.sh")
for SPARK_START_SCRIPT in "${SPARK_START_SCRIPTS[@]}"
do
cp -f "${MONASCA_TRANSFORM_FILES}"/spark/"${SPARK_START_SCRIPT}" /etc/spark/init/.
chmod 755 /etc/spark/init/"${SPARK_START_SCRIPT}"
done
}
function delete_spark_start_scripts {
SPARK_START_SCRIPTS=("start-spark-master.sh" "start-spark-worker.sh")
for SPARK_START_SCRIPT in "${SPARK_START_SCRIPTS[@]}"
do
rm /etc/spark/init/"${SPARK_START_SCRIPT}" || true
done
}
function install_monasca_transform {
echo_summary "Install Monasca-Transform"
create_monasca_transform_directories
copy_monasca_transform_files
create_monasca_transform_venv
sudo cp -f "${MONASCA_TRANSFORM_FILES}"/monasca-transform/start-monasca-transform.sh /etc/monasca/transform/init/.
sudo chmod +x /etc/monasca/transform/init/start-monasca-transform.sh
sudo cp -f "${MONASCA_TRANSFORM_FILES}"/monasca-transform/service_runner.py /etc/monasca/transform/init/.
}
function create_monasca_transform_directories {
MONASCA_TRANSFORM_DIRECTORIES=("/var/log/monasca/transform" "/opt/monasca/transform" "/opt/monasca/transform/lib" "/var/run/monasca/transform" "/etc/monasca/transform/init")
for MONASCA_TRANSFORM_DIRECTORY in "${MONASCA_TRANSFORM_DIRECTORIES[@]}"
do
sudo mkdir -p ${MONASCA_TRANSFORM_DIRECTORY}
sudo chown ${USER} ${MONASCA_TRANSFORM_DIRECTORY}
chmod 755 ${MONASCA_TRANSFORM_DIRECTORY}
done
}
function get_id () {
echo `"$@" | grep ' id ' | awk '{print $4}'`
}
function ascertain_admin_project_id {
source ~/devstack/openrc admin admin
export ADMIN_PROJECT_ID=$(get_id openstack project show mini-mon)
}
function copy_monasca_transform_files {
cp -f "${MONASCA_TRANSFORM_FILES}"/monasca-transform/service_runner.py /opt/monasca/transform/lib/.
sudo cp -f "${MONASCA_TRANSFORM_FILES}"/monasca-transform/monasca-transform.conf /etc/.
cp -f "${MONASCA_TRANSFORM_FILES}"/monasca-transform/driver.py /opt/monasca/transform/lib/.
${DEST}/monasca-transform/scripts/create_zip.sh
cp -f "${DEST}"/monasca-transform/scripts/monasca-transform.zip /opt/monasca/transform/lib/.
${DEST}/monasca-transform/scripts/generate_ddl_for_devstack.sh
cp -f "${MONASCA_TRANSFORM_FILES}"/monasca-transform/monasca-transform_mysql.sql /opt/monasca/transform/lib/.
cp -f "${MONASCA_TRANSFORM_FILES}"/monasca-transform/transform_specs.sql /opt/monasca/transform/lib/.
cp -f "${MONASCA_TRANSFORM_FILES}"/monasca-transform/pre_transform_specs.sql /opt/monasca/transform/lib/.
touch /var/log/monasca/transform/monasca-transform.log
# set variables in configuration files
iniset -sudo /etc/monasca-transform.conf database password "$MONASCA_TRANSFORM_DB_PASSWORD"
iniset -sudo /etc/monasca-transform.conf messaging brokers "$SERVICE_HOST:9092"
iniset -sudo /etc/monasca-transform.conf messaging publish_region "$REGION_NAME"
}
function create_monasca_transform_venv {
sudo chown -R ${USER} ${DEST}/monasca-transform
virtualenv /opt/monasca/transform/venv ;
. /opt/monasca/transform/venv/bin/activate ;
pip install -e "${DEST}"/monasca-transform/ ;
deactivate
}
function create_and_populate_monasca_transform_database {
# must login as root@localhost
mysql -u$DATABASE_USER -p$DATABASE_PASSWORD -h$MYSQL_HOST < /opt/monasca/transform/lib/monasca-transform_mysql.sql || echo "Did the schema change? This process will fail on schema changes."
# set grants for m-transform user (needs to be done from localhost)
mysql -u$DATABASE_USER -p$DATABASE_PASSWORD -h$MYSQL_HOST -e "GRANT ALL ON monasca_transform.* TO 'm-transform'@'%' IDENTIFIED BY '${MONASCA_TRANSFORM_DB_PASSWORD}';"
mysql -u$DATABASE_USER -p$DATABASE_PASSWORD -h$MYSQL_HOST -e "GRANT ALL ON monasca_transform.* TO 'm-transform'@'localhost' IDENTIFIED BY '${MONASCA_TRANSFORM_DB_PASSWORD}';"
# copy rest of files after grants are ready
mysql -um-transform -p$MONASCA_TRANSFORM_DB_PASSWORD -h$MYSQL_HOST < /opt/monasca/transform/lib/pre_transform_specs.sql
mysql -um-transform -p$MONASCA_TRANSFORM_DB_PASSWORD -h$MYSQL_HOST < /opt/monasca/transform/lib/transform_specs.sql
}
function install_spark {
echo_summary "Install Spark"
sudo mkdir /opt/spark || true
sudo chown -R ${USER} /opt/spark
tar -xzf ${DOWNLOADS_DIRECTORY}/${SPARK_TARBALL_NAME} -C /opt/spark/
ln -sf /opt/spark/${SPARK_HADOOP_VERSION} /opt/spark/current
install_spark_jars
install_java_libs
create_spark_directories
link_spark_commands_to_usr_bin
copy_and_link_config
copy_spark_start_scripts
}
function extra_spark {
start_spark_master
start_spark_worker
}
function start_spark_worker {
run_process "spark-worker" "/etc/spark/init/start-spark-worker.sh"
}
function start_spark_master {
run_process "spark-master" "/etc/spark/init/start-spark-master.sh"
}
function post_config_monasca_transform {
create_and_populate_monasca_transform_database
}
function post_config_spark {
:
}
function extra_monasca_transform {
/opt/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 64 --topic metrics_pre_hourly
ascertain_admin_project_id
sudo sed -i "s/publish_kafka_project_id=d2cb21079930415a9f2a33588b9f2bb6/publish_kafka_project_id=${ADMIN_PROJECT_ID}/g" /etc/monasca-transform.conf
start_monasca_transform
}
function start_monasca_transform {
run_process "monasca-transform" "/etc/monasca/transform/init/start-monasca-transform.sh"
# systemd unit file updates
local unitfile="$SYSTEMD_DIR/devstack@monasca-transform.service"
local after_service="devstack@zookeeper.service devstack@spark-master.service devstack@spark-worker.service"
iniset -sudo "$unitfile" "Unit" "After" "$after_service"
iniset -sudo "$unitfile" "Service" "Type" "simple"
iniset -sudo "$unitfile" "Service" "LimitNOFILE" "32768"
# reset KillMode for monasca-transform, as spawns several child processes
iniset -sudo "$unitfile" "Service" "KillMode" "control-group"
sudo systemctl daemon-reload
}
# check for service enabled
if is_service_enabled monasca-transform; then
if [[ "$1" == "stack" && "$2" == "pre-install" ]]; then
# Set up system services
echo_summary "Configuring Spark system services"
pre_install_spark
echo_summary "Configuring Monasca-transform system services"
pre_install_monasca_transform
elif [[ "$1" == "stack" && "$2" == "install" ]]; then
# Perform installation of service source
echo_summary "Installing Spark"
install_spark
echo_summary "Installing Monasca-transform"
install_monasca_transform
elif [[ "$1" == "stack" && "$2" == "post-config" ]]; then
# Configure after the other layer 1 and 2 services have been configured
echo_summary "Configuring Spark"
post_config_spark
echo_summary "Configuring Monasca-transform"
post_config_monasca_transform
elif [[ "$1" == "stack" && "$2" == "extra" ]]; then
# Initialize and start the Monasca service
echo_summary "Initializing Spark"
extra_spark
echo_summary "Initializing Monasca-transform"
extra_monasca_transform
fi
if [[ "$1" == "unstack" ]]; then
echo_summary "Unstacking Monasca-transform"
unstack_monasca_transform
echo_summary "Unstacking Spark"
unstack_spark
fi
if [[ "$1" == "clean" ]]; then
# Remove state and transient data
# Remember clean.sh first calls unstack.sh
echo_summary "Cleaning Monasca-transform"
clean_monasca_transform
echo_summary "Cleaning Spark"
clean_spark
fi
else
echo_summary "Monasca-transform not enabled"
fi
#Restore errexit
$ERREXIT
# Restore xtrace
$XTRACE


@ -1,57 +0,0 @@
#!/bin/bash -xe
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
# This script is executed inside post_test_hook function in devstack gate
function generate_testr_results {
if [ -f .testrepository/0 ]; then
sudo .tox/functional/bin/testr last --subunit > $WORKSPACE/testrepository.subunit
sudo mv $WORKSPACE/testrepository.subunit $BASE/logs/testrepository.subunit
sudo /usr/os-testr-env/bin/subunit2html $BASE/logs/testrepository.subunit $BASE/logs/testr_results.html
sudo gzip -9 $BASE/logs/testrepository.subunit
sudo gzip -9 $BASE/logs/testr_results.html
sudo chown $USER:$USER $BASE/logs/testrepository.subunit.gz $BASE/logs/testr_results.html.gz
sudo chmod a+r $BASE/logs/testrepository.subunit.gz $BASE/logs/testr_results.html.gz
fi
}
export MONASCA_TRANSFORM_DIR="$BASE/new/monasca-transform"
export MONASCA_TRANSFORM_LOG_DIR="/var/log/monasca/transform/"
# Go to the monasca-transform dir
cd $MONASCA_TRANSFORM_DIR
if [[ -z "$STACK_USER" ]]; then
export STACK_USER=stack
fi
sudo chown -R $STACK_USER:stack $MONASCA_TRANSFORM_DIR
# create a log dir
sudo mkdir -p $MONASCA_TRANSFORM_LOG_DIR
sudo chown -R $STACK_USER:stack $MONASCA_TRANSFORM_LOG_DIR
# Run tests
echo "Running monasca-transform functional test suite"
set +e
sudo -E -H -u ${STACK_USER:-${USER}} tox -efunctional
EXIT_CODE=$?
set -e
# Collect and parse result
generate_testr_results
exit $EXIT_CODE


@ -1,84 +0,0 @@
#
# (C) Copyright 2015 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
disable_service horizon
disable_service monasca-thresh
enable_service monasca
enable_service monasca-influxdb
enable_service monasca-storm
enable_service zookeeper
enable_service monasca-kafka
enable_service monasca-api
enable_service monasca-persister
enable_service monasca-agent
enable_service monasca-cli
enable_service monasca-transform
enable_service spark-master
enable_service spark-worker
#
# Dependent Software Versions
#
# spark vars
SPARK_DIRECTORIES=("/var/spark" "/var/log/spark" "/var/log/spark/events" "/var/run/spark" "/var/run/spark/work" "/etc/spark/conf" "/etc/spark/init" )
SPARK_VERSION=${SPARK_VERSION:-2.2.0}
HADOOP_VERSION=${HADOOP_VERSION:-2.7}
SPARK_HADOOP_VERSION=spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION
SPARK_TARBALL_NAME=${SPARK_HADOOP_VERSION}.tgz
MAVEN_REPO=${MAVEN_REPO:-https://repo1.maven.org/maven2}
APACHE_MIRROR=${APACHE_MIRROR:-http://archive.apache.org/dist/}
# Kafka deb consists of the version of scala plus the version of kafka
BASE_KAFKA_VERSION=${BASE_KAFKA_VERSION:-0.8.1.1}
SCALA_VERSION=${SCALA_VERSION:-2.10}
KAFKA_VERSION=${KAFKA_VERSION:-${SCALA_VERSION}-${BASE_KAFKA_VERSION}}
SPARK_JAVA_LIBS=("org/apache/kafka/kafka_2.10/0.8.1.1/kafka_2.10-0.8.1.1.jar" "com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar" "org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar" "org/scala-lang/scala-compiler/2.10.6/scala-compiler-2.10.6.jar" "org/scala-lang/scala-reflect/2.10.6/scala-reflect-2.10.6.jar" "org/scala-lang/scalap/2.10.6/scalap-2.10.6.jar" "org/apache/spark/spark-streaming-kafka-0-8_2.10/${SPARK_VERSION}/spark-streaming-kafka-0-8_2.10-${SPARK_VERSION}.jar" "org/drizzle/jdbc/drizzle-jdbc/1.3/drizzle-jdbc-1.3.jar" "com/fasterxml/jackson/module/jackson-module-scala_2.10/2.6.5/jackson-module-scala_2.10-2.6.5.jar" "org/json4s/json4s-jackson_2.10/3.2.11/json4s-jackson_2.10-3.2.11.jar" "org/json4s/json4s-core_2.10/3.2.11/json4s-core_2.10-3.2.11.jar" "org/json4s/json4s-ast_2.10/3.2.11/json4s-ast_2.10-3.2.11.jar" "org/scalanlp/breeze-macros_2.10/0.13.1/breeze-macros_2.10-0.13.1.jar" "org/spire-math/spire_2.10/0.13.0/spire_2.10-0.13.0.jar" "org/typelevel/macro-compat_2.10/1.1.1/macro-compat_2.10-1.1.1.jar" "com/chuusai/shapeless_2.10/2.3.2/shapeless_2.10-2.3.2.jar" "org/spire-math/spire-macros_2.10/0.13.0/spire-macros_2.10-0.13.0.jar" "org/typelevel/machinist_2.10/0.6.1/machinist_2.10-0.6.1.jar" "org/scalanlp/breeze_2.10/0.13.1/breeze_2.10-0.13.1.jar" "com/twitter/chill_2.10/0.8.0/chill_2.10-0.8.0.jar" "com/twitter/chill-java/0.8.0/chill-java-0.8.0.jar")
# Get Spark 2.2 jars compiled with Scala 2.10 from mvn
SPARK_JARS=("org/apache/spark/spark-catalyst_2.10/${SPARK_VERSION}/spark-catalyst_2.10-2.2.0.jar" "org/apache/spark/spark-core_2.10/${SPARK_VERSION}/spark-core_2.10-2.2.0.jar" "org/apache/spark/spark-graphx_2.10/${SPARK_VERSION}/spark-graphx_2.10-2.2.0.jar" "org/apache/spark/spark-launcher_2.10/${SPARK_VERSION}/spark-launcher_2.10-2.2.0.jar" "org/apache/spark/spark-mllib_2.10/${SPARK_VERSION}/spark-mllib_2.10-2.2.0.jar" "org/apache/spark/spark-mllib-local_2.10/${SPARK_VERSION}/spark-mllib-local_2.10-2.2.0.jar" "org/apache/spark/spark-network-common_2.10/${SPARK_VERSION}/spark-network-common_2.10-2.2.0.jar" "org/apache/spark/spark-network-shuffle_2.10/${SPARK_VERSION}/spark-network-shuffle_2.10-2.2.0.jar" "org/apache/spark/spark-repl_2.10/${SPARK_VERSION}/spark-repl_2.10-2.2.0.jar" "org/apache/spark/spark-sketch_2.10/${SPARK_VERSION}/spark-sketch_2.10-2.2.0.jar" "org/apache/spark/spark-sql_2.10/${SPARK_VERSION}/spark-sql_2.10-2.2.0.jar" "org/apache/spark/spark-streaming_2.10/${SPARK_VERSION}/spark-streaming_2.10-2.2.0.jar" "org/apache/spark/spark-tags_2.10/${SPARK_VERSION}/spark-tags_2.10-2.2.0.jar" "org/apache/spark/spark-unsafe_2.10/${SPARK_VERSION}/spark-unsafe_2.10-2.2.0.jar" "org/apache/spark/spark-yarn_2.10/${SPARK_VERSION}/spark-yarn_2.10-2.2.0.jar")
# monasca-api stuff
VERTICA_VERSION=${VERTICA_VERSION:-7.2.1-0}
CASSANDRA_VERSION=${CASSANDRA_VERSION:-37x}
STORM_VERSION=${STORM_VERSION:-1.0.2}
GO_VERSION=${GO_VERSION:-"1.7.1"}
NODE_JS_VERSION=${NODE_JS_VERSION:-"4.0.0"}
NVM_VERSION=${NVM_VERSION:-"0.32.1"}
# Repository settings
MONASCA_API_REPO=${MONASCA_API_REPO:-${GIT_BASE}/openstack/monasca-api.git}
MONASCA_API_BRANCH=${MONASCA_API_BRANCH:-master}
MONASCA_API_DIR=${DEST}/monasca-api
MONASCA_PERSISTER_REPO=${MONASCA_PERSISTER_REPO:-${GIT_BASE}/openstack/monasca-persister.git}
MONASCA_PERSISTER_BRANCH=${MONASCA_PERSISTER_BRANCH:-master}
MONASCA_PERSISTER_DIR=${DEST}/monasca-persister
MONASCA_CLIENT_REPO=${MONASCA_CLIENT_REPO:-${GIT_BASE}/openstack/python-monascaclient.git}
MONASCA_CLIENT_BRANCH=${MONASCA_CLIENT_BRANCH:-master}
MONASCA_CLIENT_DIR=${DEST}/python-monascaclient
MONASCA_AGENT_REPO=${MONASCA_AGENT_REPO:-${GIT_BASE}/openstack/monasca-agent.git}
MONASCA_AGENT_BRANCH=${MONASCA_AGENT_BRANCH:-master}
MONASCA_AGENT_DIR=${DEST}/monasca-agent
MONASCA_COMMON_REPO=${MONASCA_COMMON_REPO:-${GIT_BASE}/openstack/monasca-common.git}
MONASCA_COMMON_BRANCH=${MONASCA_COMMON_BRANCH:-master}
MONASCA_COMMON_DIR=${DEST}/monasca-common


@ -1,15 +0,0 @@
#!/usr/bin/env bash
MAVEN_STUB="https://repo1.maven.org/maven2"
SPARK_JAVA_LIBS=("org/apache/kafka/kafka_2.10/0.8.1.1/kafka_2.10-0.8.1.1.jar" "org/scala-lang/scala-library/2.10.1/scala-library-2.10.1.jar" "com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar" "org/apache/spark/spark-streaming-kafka_2.10/1.6.0/spark-streaming-kafka_2.10-1.6.0.jar")
for SPARK_JAVA_LIB in "${SPARK_JAVA_LIBS[@]}"
do
echo Would fetch ${MAVEN_STUB}/${SPARK_JAVA_LIB}
done
for SPARK_JAVA_LIB in "${SPARK_JAVA_LIBS[@]}"
do
SPARK_LIB_NAME=`echo ${SPARK_JAVA_LIB} | sed 's/.*\///'`
echo Got lib ${SPARK_LIB_NAME}
done

View File

@ -1,258 +0,0 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# monasca-transform documentation build configuration file, created by
# sphinx-quickstart on Mon Jan 9 12:02:59 2012.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import os
import subprocess
import sys
import warnings
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
sys.path.insert(0, os.path.abspath('../../'))
sys.path.insert(0, os.path.abspath('../'))
sys.path.insert(0, os.path.abspath('./'))
# -- General configuration ----------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.autodoc',
'sphinx.ext.todo',
'sphinx.ext.coverage',
'sphinx.ext.viewcode',
]
todo_include_todos = True
# The suffix of source filenames.
source_suffix = '.rst'
# The encoding of source files.
# source_encoding = 'utf-8-sig'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = u'monasca-transform'
copyright = u'2016, OpenStack Foundation'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
# language = None
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
# today = ''
# Else, today_fmt is used as the format for a strftime call.
# today_fmt = '%B %d, %Y'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['old']
# The reST default role (used for this markup: `text`) to use for all
# documents.
# default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
# add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
# add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
show_authors = True
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# A list of ignored prefixes for module index sorting.
modindex_common_prefix = ['monasca-transform.']
# -- Options for man page output --------------------------------------------
# -- Options for HTML output --------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
# html_theme_path = ["."]
# html_theme = '_theme'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
# html_theme_options = {}
# Add any paths that contain custom themes here, relative to this directory.
# html_theme_path = []
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
# html_title = None
# A shorter title for the navigation bar. Default is the same as html_title.
# html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
# html_logo = None
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
# html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
# html_static_path = ['_static']
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
# html_last_updated_fmt = '%b %d, %Y'
git_cmd = ["git", "log", "--pretty=format:'%ad, commit %h'", "--date=local",
"-n1"]
try:
html_last_updated_fmt = subprocess.check_output(git_cmd).decode('utf-8')
except Exception:
warnings.warn('Cannot get last updated time from git repository. '
'Not setting "html_last_updated_fmt".')
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
# html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
# html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
# html_additional_pages = {}
# If false, no module index is generated.
# html_domain_indices = True
# If false, no index is generated.
# html_use_index = True
# If true, the index is split into individual pages for each letter.
# html_split_index = False
# If true, links to the reST sources are added to the pages.
# html_show_sourcelink = True
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
# html_show_sphinx = True
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
# html_show_copyright = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
# html_use_opensearch = ''
# This is the file name suffix for HTML files (e.g. ".xhtml").
# html_file_suffix = None
# Output file base name for HTML help builder.
htmlhelp_basename = 'monasca-transformdoc'
# -- Options for LaTeX output -------------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
# 'preamble': '',
}
# Grouping the document tree into LaTeX files. List of tuples (source
# start file, target name, title, author, documentclass
# [howto/manual]).
latex_documents = [
('index', 'monasca-transform.tex', u'Monasca-transform Documentation',
u'OpenStack', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
# latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
# latex_use_parts = False
# If true, show page references after internal links.
# latex_show_pagerefs = False
# If true, show URL addresses after external links.
# latex_show_urls = False
# Documents to append as an appendix to all manuals.
# latex_appendices = []
# If false, no module index is generated.
# latex_domain_indices = True
# -- Options for Texinfo output -----------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
('index', 'monasca-transform', u'Monasca-transform Documentation',
u'OpenStack', 'monasca-transform', 'One line description of project.',
'Miscellaneous'),
]
# Documents to append as an appendix to all manuals.
# texinfo_appendices = []
# If false, no module index is generated.
# texinfo_domain_indices = True
# How to display URL addresses: 'footnote', 'no', or 'inline'.
# texinfo_show_urls = 'footnote'
# Example configuration for intersphinx: refer to the Python standard library.
# intersphinx_mapping = {'http://docs.python.org/': None}

View File

@ -1,24 +0,0 @@
..
Copyright 2016 OpenStack Foundation
All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License. You may obtain
a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License.
=================
Monasca-transform
=================
.. toctree::
api/autoindex.rst

View File

@ -1,329 +0,0 @@
Team and repository tags
========================
[![Team and repository tags](https://governance.openstack.org/badges/monasca-transform.svg)](https://governance.openstack.org/reference/tags/index.html)
<!-- Change things from this point on -->
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
- [Create a new aggregation pipeline](#create-a-new-aggregation-pipeline)
- [Using existing generic aggregation components](#using-existing-generic-aggregation-components)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
<!-- Change things from this point on -->
# Create a new aggregation pipeline
Monasca Transform allows you to create a new aggregation by creating a *pre_transform_spec* and a
*transform_spec* for any set of Monasca metrics. This page walks you through creating a new
aggregation pipeline and testing it in your DevStack environment.
A prerequisite for following the steps on this page is that you have already created a DevStack
development environment for Monasca Transform, following the instructions in
[devstack/README.md](devstack/README.md).
## Using existing generic aggregation components ##
Most use cases fall into this category: you should be able to create a new
aggregation for a new set of metrics using the existing set of generic aggregation components.
Let's consider a use case where we want to find out
* Maximum time monasca-agent takes to submit metrics over a period of an hour across all hosts
* Maximum time monasca-agent takes to submit metrics over a period of an hour per host.
We know that monasca-agent on each host generates a small number of
[monasca-agent metrics](https://github.com/openstack/monasca-agent/blob/master/docs/Plugins.md).
The metric we are interested in is
* **"monasca.collection_time_sec"**: Amount of time that the collector took for this collection run
**Steps:**
* **Step 1**: Identify the monasca metric to be aggregated from the Kafka topic
```
/opt/kafka_2.11-0.9.0.1/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic metrics | grep "monasca.collection_time_sec"
{"metric":{"timestamp":1523323485360.6650390625,"name":"monasca.collection_time_sec",
"dimensions":{"hostname":"devstack","component":"monasca-agent",
"service":"monitoring"},"value":0.0340659618, "value_meta":null},
"meta":{"region":"RegionOne","tenantId":"d6bece1bbeff47c1b8734cd4e544dc02"},
"creation_time":1523323489}
```
Note: "hostname" is available as a dimension, which we will use to find maximum collection time for each host.
* **Step 2**: Create a **pre_transform_spec**
"pre_transform_spec" drives the pre-processing of monasca metric to record store format. Look
for existing example in
"/monasca-transform-source/monasca_transform/data_driven_specs/pre_transform_specs/pre_transform_specs.json"
**pre_transform_spec**
```
{
"event_processing_params":{
"set_default_zone_to":"1",
"set_default_geolocation_to":"1",
"set_default_region_to":"W"
},
"event_type":"monasca.collection_time_sec", <-- EDITED
"metric_id_list":["monasca_collection_host"], <-- EDITED
"required_raw_fields_list":["creation_time", "metric.dimensions.hostname"], <--EDITED
}
```
Let's look at all the fields that were edited (marked as `<-- EDITED` above):
**event_type**: set to "monasca.collection_time_sec". These are the metrics we want to
transform/aggregate.
**metric_id_list**: set to ["monasca_collection_host"]. This is a transformation spec
identifier. During pre-processing, the record generator generates additional "record_store" data for
each item in this list. (To be renamed to transform_spec_list)
**required_raw_fields_list**: set to ["creation_time", "metric.dimensions.hostname"].
This lists the fields in the incoming metrics that are required. During validation, pre-processing
filters out metrics that are missing any of the required fields. A small illustrative sketch of this
check follows the note below.
**Note:** "metric_id" is a misnomer; it is not really a metric identifier but rather an identifier
for the transformation spec. This will be changed to transform_spec_id in the future.
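The sketch below is illustrative only (it is not the monasca-transform validator); it shows how a
dot-separated required field path such as `metric.dimensions.hostname` could be checked against an
incoming metric during validation:
```
# Illustrative only: a simplified stand-in for the validation step described
# above, not the actual monasca-transform implementation.
def has_required_fields(metric, required_raw_fields_list):
    for path in required_raw_fields_list:
        value = metric
        for key in path.split("."):
            if not isinstance(value, dict) or key not in value:
                return False          # required field is missing
            value = value[key]
        if value in (None, ""):
            return False              # required field is present but empty
    return True

metric = {"creation_time": 1523323489,
          "metric": {"dimensions": {"hostname": "devstack"}}}
print(has_required_fields(metric, ["creation_time", "metric.dimensions.hostname"]))  # True
```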
* **Step 3**: Create a "transform_spec" to find maximum metric value for each host
"transform_spec" drives the aggregation of record store data created during pre-processing
to the final aggregated metric. Look at the existing examples in
"/monasca-transform-source/monasca_transform/data_driven_specs/transform_specs/transform_specs.json"
**transform_spec**
```
{
"aggregation_params_map":{
"aggregation_pipeline":{
"source":"streaming",
"usage":"fetch_quantity", <-- EDITED
"setters":["set_aggregated_metric_name","set_aggregated_period"], <-- EDITED
"insert":["insert_data_pre_hourly"] <-- EDITED
},
"aggregated_metric_name":"monasca.collection_time_sec_host_agg", <-- EDITED
"aggregation_period":"hourly", <-- EDITED
"aggregation_group_by_list": ["host"],
"usage_fetch_operation": "max", <-- EDITED
"filter_by_list": [],
"dimension_list":["aggregation_period","host"], <-- EDITED
"pre_hourly_operation":"max",
"pre_hourly_group_by_list":["default"]},
"metric_group":"monasca_collection_host", <-- EDITED
"metric_id":"monasca_collection_host" <-- EDITED
}
```
Let's look at all the fields that were edited (marked as `<-- EDITED` above):
aggregation pipeline fields
* **usage**: set to "fetch_quantity", the generic aggregation component to use. This
component takes "aggregation_group_by_list", "usage_fetch_operation" and "filter_by_list" as
parameters.
* **aggregation_group_by_list** set to ["host"], since we want to find the monasca agent
collection time for each host.
* **usage_fetch_operation** set to "max", since we want to find the maximum value of the
monasca agent collection time.
* **filter_by_list** set to [], since we don't want to filter out or ignore any metrics (based on,
say, a particular host or set of hosts).
* **setters**: set to ["set_aggregated_metric_name","set_aggregated_period"] These components set
aggregated metric name and aggregation period in final aggregated metric.
* **set_aggregated_metric_name** sets final aggregated metric name. This setter component takes
"aggregated_metric_name" as a parameter.
* **aggregated_metric_name**: set to "monasca.collection_time_sec_host_agg"
* **set_aggregated_period** sets final aggregated metric period. This setter component takes
"aggregation_period" as a parameter.
* **aggregation_period**: set to "hourly"
* **insert**: set to ["insert_data_pre_hourly"]. These components are responsible for
transforming instance usage data records to the final metric format and writing the data to a kafka
topic.
* **insert_data_pre_hourly** writes the data to the "metrics_pre_hourly" kafka topic, which gets
processed by the pre hourly processor every hour.
pre hourly processor fields
* **pre_hourly_operation** set to "max"
Find the hourly maximum value from records that were written to "metrics_pre_hourly" topic
* **pre_hourly_group_by_list** set to ["default"]
transformation spec identifier fields
* **metric_group** set to "monasca_collection_host". Group identifier for this transformation
spec
* **metric_id** set to "monasca_collection_host". Identifier for this transformation spec.
**Note:** metric_group" and "metric_id" are misnomers, it is not really a metric identifier but
rather identifier for transformation spec. This will be changed to "transform_group" and
"transform_spec_id" in the future. (Please see Story
[2001815](https://storyboard.openstack.org/#!/story/2001815))
* **Step 4**: Create a "transform_spec" to find maximum metric value across all hosts
Now let's create another transformation spec to find maximum metric value across all hosts.
**transform_spec**
```
{
"aggregation_params_map":{
"aggregation_pipeline":{
"source":"streaming",
"usage":"fetch_quantity", <-- EDITED
"setters":["set_aggregated_metric_name","set_aggregated_period"], <-- EDITED
"insert":["insert_data_pre_hourly"] <-- EDITED
},
"aggregated_metric_name":"monasca.collection_time_sec_all_agg", <-- EDITED
"aggregation_period":"hourly", <-- EDITED
"aggregation_group_by_list": [],
"usage_fetch_operation": "max", <-- EDITED
"filter_by_list": [],
"dimension_list":["aggregation_period"], <-- EDITED
"pre_hourly_operation":"max",
"pre_hourly_group_by_list":["default"]},
"metric_group":"monasca_collection_all", <-- EDITED
"metric_id":"monasca_collection_all" <-- EDITED
}
```
The transformation spec above is almost identical to the transformation spec created in **Step 3**,
with a few changes.
**aggregation_group_by_list** is set to [], i.e. an empty list, since we want to find the maximum value
across all hosts (considering all the incoming metric data).
**aggregated_metric_name** is set to "monasca.collection_time_sec_all_agg".
**metric_group** is set to "monasca_collection_all", since we need a new transformation spec
group identifier.
**metric_id** is set to "monasca_collection_all", since we need a new transformation spec
identifier.
* **Step 5**: Update "pre_transform_spec" with new transformation spec identifier
In **Step 4** we created a new transformation spec with a new "metric_id", namely
"monasca_collection_all". We now have to update the "pre_transform_spec" that we
created in **Step 2** with the new "metric_id" by adding it to the "metric_id_list".
**pre_transform_spec**
```
{
"event_processing_params":{
"set_default_zone_to":"1",
"set_default_geolocation_to":"1",
"set_default_region_to":"W"
},
"event_type":"monasca.collection_time_sec",
"metric_id_list":["monasca_collection_host", "monasca_collection_all"], <-- EDITED
"required_raw_fields_list":["creation_time", "metric.dimensions.hostname"],
}
```
Thus we were able to add an additional transformation or aggregation pipeline for the same incoming
monasca metric very easily.
* **Step 6**: Update "pre_transform_spec" and "transform_spec"
* Edit
"/monasca-transform-source/monasca_transform/data_driven_specs/pre_transform_specs/pre_transform_specs.json"
and add the following line.
```
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"monasca.collection_time_sec","metric_id_list":["monasca_collection_host","monasca_collection_all"],"required_raw_fields_list":["creation_time"]}
```
**Note:** Each line does not end with a comma (the file is not one big json document).
* Edit
"/monasca-transform-source/monasca_transform/data_driven_specs/transform_specs/transform_specs.json"
and add the following lines.
```
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["set_aggregated_metric_name","set_aggregated_period"],"insert":["insert_data_pre_hourly"]},"aggregated_metric_name":"monasca.collection_time_sec_host_agg","aggregation_period":"hourly","aggregation_group_by_list":["host"],"usage_fetch_operation":"max","filter_by_list":[],"dimension_list":["aggregation_period","host"],"pre_hourly_operation":"max","pre_hourly_group_by_list":["default"]},"metric_group":"monasca_collection_host","metric_id":"monasca_collection_host"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["set_aggregated_metric_name","set_aggregated_period"],"insert":["insert_data_pre_hourly"]},"aggregated_metric_name":"monasca.collection_time_sec_all_agg","aggregation_period":"hourly","aggregation_group_by_list":[],"usage_fetch_operation":"max","filter_by_list":[],"dimension_list":["aggregation_period"],"pre_hourly_operation":"max","pre_hourly_group_by_list":["default"]},"metric_group":"monasca_collection_all","metric_id":"monasca_collection_all"}
```
* Run "refresh_monasca_transform.sh" script as documented in devstack
[README](devstack/README.md) to refresh the specs in the database.
```
vagrant@devstack:~$ cd /opt/stack/monasca-transform
vagrant@devstack:/opt/stack/monasca-transform$ tools/vagrant/refresh_monasca_transform.sh
```
If successful, you should see this message.
```
***********************************************
* *
* SUCCESS!! refresh monasca transform done. *
* *
***********************************************
```
* **Step 7**: Verifying results
To verify that new aggregated metrics are being produced, you can look at the "metrics_pre_hourly"
topic in kafka. By default, monasca-transform fires off a batch every 10 minutes, so you should
see metrics in intermediate "instance_usage" data format being published to that topic every 10
minutes.
```
/opt/kafka_2.11-0.9.0.1/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic metrics_pre_hourly
{"usage_hour":"06","geolocation":"NA","record_count":40.0,"app":"NA","deployment":"NA","resource_uuid":"NA",
"pod_name":"NA","usage_minute":"NA","service_group":"NA","lastrecord_timestamp_string":"2018-04-1106:29:49",
"user_id":"NA","zone":"NA","namespace":"NA","usage_date":"2018-04-11","daemon_set":"NA","processing_meta":{
"event_type":"NA","metric_id":"monasca_collection_all"},
"firstrecord_timestamp_unix":1523427604.208577,"project_id":"NA","lastrecord_timestamp_unix":1523428189.718174,
"aggregation_period":"hourly","host":"NA","container_name":"NA","interface":"NA",
"aggregated_metric_name":"monasca.collection_time_sec_all_agg","tenant_id":"NA","region":"NA",
"firstrecord_timestamp_string":"2018-04-11 06:20:04","quantity":0.0687000751}
{"usage_hour":"06","geolocation":"NA","record_count":40.0,"app":"NA","deployment":"NA","resource_uuid":"NA",
"pod_name":"NA","usage_minute":"NA","service_group":"NA","lastrecord_timestamp_string":"2018-04-11 06:29:49",
"user_id":"NA","zone":"NA","namespace":"NA","usage_date":"2018-04-11","daemon_set":"NA","processing_meta":{
"event_type":"NA","metric_id":"monasca_collection_host"},"firstrecord_timestamp_unix":1523427604.208577,
"project_id":"NA","lastrecord_timestamp_unix":1523428189.718174,"aggregation_period":"hourly",
"host":"devstack","container_name":"NA","interface":"NA",
"aggregated_metric_name":"monasca.collection_time_sec_host_agg","tenant_id":"NA","region":"NA",
"firstrecord_timestamp_string":"2018-04-11 06:20:04","quantity":0.0687000751}
```
Similarly, to verify that final aggregated metrics are being published by the pre hourly processor,
you can look at the "metrics" topic in kafka. By default, the pre hourly processor (which processes
metrics from the "metrics_pre_hourly" topic) runs 10 minutes past the top of the hour.
```
/opt/kafka_2.11-0.9.0.1/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic metrics | grep "_agg"
{"metric":{"timestamp":1523459468616,"value_meta":{"firstrecord_timestamp_string":"2018-04-11 14:00:13",
"lastrecord_timestamp_string":"2018-04-11 14:59:46","record_count":239.0},"name":"monasca.collection_time_sec_host_agg",
"value":0.1182248592,"dimensions":{"aggregation_period":"hourly","host":"devstack"}},
"meta":{"region":"useast","tenantId":"df89c3db21954b08b0516b4b60b8baff"},"creation_time":1523459468}
{"metric":{"timestamp":1523455872740,"value_meta":{"firstrecord_timestamp_string":"2018-04-11 13:00:10",
"lastrecord_timestamp_string":"2018-04-11 13:59:58","record_count":240.0},"name":"monasca.collection_time_sec_all_agg",
"value":0.0898442268,"dimensions":{"aggregation_period":"hourly"}},
"meta":{"region":"useast","tenantId":"df89c3db21954b08b0516b4b60b8baff"},"creation_time":1523455872}
```
As you can see, monasca-transform created two new aggregated metrics named
"monasca.collection_time_sec_host_agg" and "monasca.collection_time_sec_all_agg". The "value_meta"
section has three fields: "firstrecord_timestamp", "lastrecord_timestamp" and
"record_count". These fields are for informational purposes only. They show the timestamp of the first metric,
the timestamp of the last metric and the number of metrics that went into the calculation of the aggregated
metric.

View File

@ -1,109 +0,0 @@
Team and repository tags
========================
[![Team and repository tags](https://governance.openstack.org/badges/monasca-transform.svg)](https://governance.openstack.org/reference/tags/index.html)
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
- [Monasca Transform Data Formats](#monasca-transform-data-formats)
- [Record Store Data Format](#record-store-data-format)
- [Instance Usage Data Format](#instance-usage-data-format)
- [References](#references)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
# Monasca Transform Data Formats
There are two data formats used by monasca transform. The following sections describe the schema
(Spark's DataFrame[1] schema) for the two formats.
Note: These are internal formats used by Monasca Transform when aggregating data. If you are a user
who wants to create a new aggregation pipeline using the existing framework, you don't need to know
or care about these two formats.
As a developer, if you want to write new aggregation components, you might need to know how to
enhance the record store data format or instance usage data format with additional fields that you
may need, and how to write new aggregation components that aggregate data from those additional fields.
**Source Metric**
A monasca metric is transformed into the `record_store` data format and
later transformed/aggregated using re-usable generic aggregation components to derive the
`instance_usage` data format.
Example of a monasca metric:
```
{"metric":{"timestamp":1523323485360.6650390625,
"name":"monasca.collection_time_sec",
"dimensions":{"hostname":"devstack",
"component":"monasca-agent",
"service":"monitoring"},
"value":0.0340659618,
"value_meta":null},
"meta":{"region":"RegionOne","tenantId":"d6bece1bbeff47c1b8734cd4e544dc02"},
"creation_time":1523323489}
```
## Record Store Data Format ##
Data Frame Schema:
| Column Name | Column Data Type | Description |
| :---------- | :--------------- | :---------- |
| event_quantity | `pyspark.sql.types.DoubleType` | mapped to `metric.value`|
| event_timestamp_unix | `pyspark.sql.types.DoubleType` | calculated as `metric.timestamp`/`1000` from source metric|
| event_timestamp_string | `pyspark.sql.types.StringType` | mapped to `metric.timestamp` from the source metric|
| event_type | `pyspark.sql.types.StringType` | placeholder for the future. mapped to `metric.name` from source metric|
| event_quantity_name | `pyspark.sql.types.StringType` | mapped to `metric.name` from source metric|
| resource_uuid | `pyspark.sql.types.StringType` | mapped to `metric.dimensions.instanceId` or `metric.dimensions.resource_id` from source metric |
| tenant_id | `pyspark.sql.types.StringType` | mapped to `metric.dimensions.tenant_id` or `metric.dimensions.tenantid` or `metric.dimensions.project_id` |
| user_id | `pyspark.sql.types.StringType` | mapped to `meta.userId` |
| region | `pyspark.sql.types.StringType` | placeholder for the future. mapped to `meta.region`, defaults to `event_processing_params.set_default_region_to` (`pre_transform_spec`) |
| zone | `pyspark.sql.types.StringType` | placeholder for the future. mapped to `meta.zone`, defaults to `event_processing_params.set_default_zone_to` (`pre_transform_spec`) |
| host | `pyspark.sql.types.StringType` | mapped to `metric.dimensions.hostname` or `metric.value_meta.host` |
| project_id | `pyspark.sql.types.StringType` | mapped to metric tenant_id |
| event_date | `pyspark.sql.types.StringType` | "YYYY-mm-dd". Extracted from `metric.timestamp` |
| event_hour | `pyspark.sql.types.StringType` | "HH". Extracted from `metric.timestamp` |
| event_minute | `pyspark.sql.types.StringType` | "MM". Extracted from `metric.timestamp` |
| event_second | `pyspark.sql.types.StringType` | "SS". Extracted from `metric.timestamp` |
| metric_group | `pyspark.sql.types.StringType` | identifier for transform spec group |
| metric_id | `pyspark.sql.types.StringType` | identifier for transform spec |
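As an illustration only (this is not the project's schema definition code), a subset of the record
store columns above could be declared as a Spark DataFrame schema like this:
```
# Illustrative subset of the record store schema described in the table above.
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

record_store_schema = StructType([
    StructField("event_quantity", DoubleType(), True),
    StructField("event_timestamp_unix", DoubleType(), True),
    StructField("event_timestamp_string", StringType(), True),
    StructField("event_type", StringType(), True),
    StructField("event_quantity_name", StringType(), True),
    StructField("tenant_id", StringType(), True),
    StructField("host", StringType(), True),
    StructField("event_date", StringType(), True),
    StructField("event_hour", StringType(), True),
    StructField("metric_group", StringType(), True),
    StructField("metric_id", StringType(), True),
])
```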
## Instance Usage Data Format ##
Data Frame Schema:
| Column Name | Column Data Type | Description |
| :---------- | :--------------- | :---------- |
| tenant_id | `pyspark.sql.types.StringType` | project_id, defaults to `NA` |
| user_id | `pyspark.sql.types.StringType` | user_id, defaults to `NA`|
| resource_uuid | `pyspark.sql.types.StringType` | resource_id, defaults to `NA`|
| geolocation | `pyspark.sql.types.StringType` | placeholder for future, defaults to `NA`|
| region | `pyspark.sql.types.StringType` | placeholder for future, defaults to `NA`|
| zone | `pyspark.sql.types.StringType` | placeholder for future, defaults to `NA`|
| host | `pyspark.sql.types.StringType` | compute hostname, defaults to `NA`|
| project_id | `pyspark.sql.types.StringType` | project_id, defaults to `NA`|
| aggregated_metric_name | `pyspark.sql.types.StringType` | aggregated metric name, defaults to `NA`|
| firstrecord_timestamp_string | `pyspark.sql.types.StringType` | timestamp of the first metric used to derive this aggregated metric|
| lastrecord_timestamp_string | `pyspark.sql.types.StringType` | timestamp of the last metric used to derive this aggregated metric|
| usage_date | `pyspark.sql.types.StringType` | "YYYY-mm-dd" date|
| usage_hour | `pyspark.sql.types.StringType` | "HH" hour|
| usage_minute | `pyspark.sql.types.StringType` | "MM" minute|
| aggregation_period | `pyspark.sql.types.StringType` | "hourly" or "minutely" |
| firstrecord_timestamp_unix | `pyspark.sql.types.DoubleType` | epoch timestamp of the first metric used to derive this aggregated metric |
| lastrecord_timestamp_unix | `pyspark.sql.types.DoubleType` | epoch timestamp of the last metric used to derive this aggregated metric |
| quantity | `pyspark.sql.types.DoubleType` | aggregated metric quantity |
| record_count | `pyspark.sql.types.DoubleType` | number of source metrics that were used to derive this aggregated metric. For informational purposes only. |
| processing_meta | `pyspark.sql.types.MapType(pyspark.sql.types.StringType, pyspark.sql.types.StringType, True)` | Key-Value pairs to store additional information, to aid processing |
| extra_data_map | `pyspark.sql.types.MapType(pyspark.sql.types.StringType, pyspark.sql.types.StringType, True)` | Key-Value pairs to store group by column key value pair |
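The two map-typed columns at the end of the table can be expressed as follows (again an
illustrative sketch, not the project's schema code):
```
# Illustrative: the two map-typed columns described in the table above.
from pyspark.sql.types import MapType, StringType, StructField, StructType

map_columns = StructType([
    StructField("processing_meta",
                MapType(StringType(), StringType(), True), True),
    StructField("extra_data_map",
                MapType(StringType(), StringType(), True), True),
])
```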
## References
[1] [Spark SQL, DataFrames and Datasets
Guide](https://spark.apache.org/docs/latest/sql-programming-guide.html)
[2] [Spark
DataTypes](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.types.DataType)

View File

@ -1,705 +0,0 @@
Team and repository tags
========================
[![Team and repository tags](https://governance.openstack.org/badges/monasca-transform.svg)](https://governance.openstack.org/reference/tags/index.html)
<!-- Change things from this point on -->
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
- [Monasca Transform Generic Aggregation Components](#monasca-transform-generic-aggregation-components)
- [Monasca Transform Generic Aggregation Components](#monasca-transform-generic-aggregation-components)
- [Introduction](#introduction)
- [1: Conversion of incoming metrics to record store data format](#1-conversion-of-incoming-metrics-to-record-store-data-format)
- [Pre Transform Spec](#pre-transform-spec)
- [2: Data aggregation using generic aggregation components](#2-data-aggregation-using-generic-aggregation-components)
- [Transform Specs](#transform-specs)
- [aggregation_params_map](#aggregation_params_map)
- [aggregation_pipeline](#aggregation_pipeline)
- [Other parameters](#other-parameters)
- [metric_group and metric_id](#metric_group-and-metric_id)
- [Generic Aggregation Components](#generic-aggregation-components)
- [Usage Components](#usage-components)
- [fetch_quantity](#fetch_quantity)
- [fetch_quantity_util](#fetch_quantity_util)
- [calculate_rate](#calculate_rate)
- [Setter Components](#setter-components)
- [set_aggregated_metric_name](#set_aggregated_metric_name)
- [set_aggregated_period](#set_aggregated_period)
- [rollup_quantity](#rollup_quantity)
- [Insert Components](#insert-components)
- [insert_data](#insert_data)
- [insert_data_pre_hourly](#insert_data_pre_hourly)
- [Processors](#processors)
- [pre_hourly_processor](#pre_hourly_processor)
- [Special notation](#special-notation)
- [pre_transform spec](#pre_transform-spec)
- [transform spec](#transform-spec)
- [Putting it all together](#putting-it-all-together)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
# Monasca Transform Generic Aggregation Components
# Introduction
Monasca Transform uses the standard ETL (Extract-Transform-Load) design pattern to aggregate monasca
metrics and uses an innovative data/configuration driven mechanism to drive processing. It accomplishes
data aggregation in two distinct steps, each driven by external configuration specifications,
namely *pre_transform_spec* and *transform_spec*.
## 1: Conversion of incoming metrics to record store data format ##
In the first step, the incoming metrics are converted into a canonical data format called record
store data using *pre_transform_spec*.
This logical processing data flow is explained in more detail in [Monasca/Transform wiki: Logical
processing data flow section: Conversion to record store
format](https://wiki.openstack.org/wiki/Monasca/Transform#Logical_processing_data_flow) and includes
the following operations:
* identifying metrics that are required (or in other words, filtering out unwanted metrics)
* validation and extraction of essential data in the metric
* generating multiple records for incoming metrics if they are to be aggregated in multiple ways,
and finally
* conversion of the incoming metrics to canonical record store data format. Please refer to record
store section in [Data Formats](data_formats.md) for more information on record store format.
### Pre Transform Spec ###
Example *pre_transform_spec* for metric
```
{
"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},
"event_type":"cpu.total_logical_cores",
"metric_id_list":["cpu_total_all","cpu_total_host","cpu_util_all","cpu_util_host"],
"required_raw_fields_list":["creation_time"],
}
```
*List of fields*
| field name | values | description |
| :--------- | :----- | :---------- |
| event_processing_params | Set default field values `set_default_zone_to`, `set_default_geolocation_to`, `set_default_region_to`| Set default values for certain fields in the record store data |
| event_type | Name of the metric | identifies metric that needs to be aggregated |
| metric_id_list | List of `metric_id`'s | List of identifiers, should match `metric_id` in transform specs. This is used by record generation step to generate multiple records if this metric is to be aggregated in multiple ways|
| required_raw_fields_list | List of `field`'s | List of fields (use [Special notation](#special-notation)) that are required in the incoming metric, used for validating incoming metric. The validator checks if field is present and is not empty. If the field is absent or empty the validator filters such metrics out from aggregation. |
## 2: Data aggregation using generic aggregation components ##
In the second step, the canonical record store data is aggregated using *transform_spec*. Each
*transform_spec* defines a series of generic aggregation components, which are specified in the
`aggregation_params_map.aggregation_pipeline` section (see the *transform_spec* example below).
Any parameters used by the generic aggregation components are also specified in the
`aggregation_params_map` section (see *Other parameters*, e.g. `aggregated_metric_name`, `aggregation_period`,
`aggregation_group_by_list` etc. in the *transform_spec* example below).
### Transform Specs ###
Example *transform_spec* for metric
```
{"aggregation_params_map":{
"aggregation_pipeline":{
"source":"streaming",
"usage":"fetch_quantity",
"setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],
"insert":["prepare_data","insert_data_pre_hourly"]
},
"aggregated_metric_name":"cpu.total_logical_cores_agg",
"aggregation_period":"hourly",
"aggregation_group_by_list": ["host", "metric_id", "tenant_id"],
"usage_fetch_operation": "avg",
"filter_by_list": [],
"setter_rollup_group_by_list": [],
"setter_rollup_operation": "sum",
"dimension_list":["aggregation_period","host","project_id"],
"pre_hourly_operation":"avg",
"pre_hourly_group_by_list":["default"]
},
"metric_group":"cpu_total_all",
"metric_id":"cpu_total_all"
}
```
#### aggregation_params_map ####
This section specifies the *aggregation_pipeline* and *Other parameters* (used by the generic
aggregation components in the *aggregation_pipeline*).
##### aggregation_pipeline #####
Specifies generic aggregation components that should be used to process incoming metrics.
Note: generic aggregation components are re-usable and can be used to build different aggregation
pipelines as required.
*List of fields*
| field name | values | description |
| :--------- | :----- | :---------- |
| source | ```streaming``` | source is ```streaming```. In the future this can be used to specify a component which can fetch data directly from monasca datastore |
| usage | ```fetch_quantity```, ```fetch_quantity_util```, ```calculate_rate``` | [Usage Components](https://github.com/openstack/monasca-transform/tree/master/monasca_transform/component/usage)|
| setters | ```pre_hourly_calculate_rate```, ```rollup_quantity```, ```set_aggregated_metric_name```, ```set_aggregated_period``` | [Setter Components](https://github.com/openstack/monasca-transform/tree/master/monasca_transform/component/setter)|
| insert | ```insert_data```, ```insert_data_pre_hourly``` | [Insert Components](https://github.com/openstack/monasca-transform/tree/master/monasca_transform/component/insert)|
##### Other parameters #####
Specifies parameters that generic aggregation components use to process and aggregate data.
*List of Other parameters*
| Parameter Name | Values | Description | Used by |
| :------------- | :----- | :---------- | :------ |
| aggregated_metric_name| e.g. "cpu.total_logical_cores_agg" | Name of the aggregated metric | [set_aggregated_metric_name](#set_aggregated_metric_name) |
| aggregation_period |"hourly", "minutely" or "secondly" | Period over which to aggregate data. | [fetch_quantity](#fetch_quantity), [fetch_quantity_util](#fetch_quantity_util), [calculate_rate](#calculate_rate), [set_aggregated_period](#set_aggregated_period), [rollup_quantity](#rollup_quantity) |[fetch_quantity](#fetch_quantity), [fetch_quantity_util](#fetch_quantity_util), [calculate_rate](#calculate_rate) |
| aggregation_group_by_list | e.g. "project_id", "hostname" | Group `record_store` data with these columns. Please also see [Special notation](#special-notation) below | [fetch_quantity](#fetch_quantity), [fetch_quantity_util](#fetch_quantity_util), [calculate_rate](#calculate_rate) |
| usage_fetch_operation | e.g "sum" | After the data is grouped by `aggregation_group_by_list`, perform this operation to find the aggregated metric value | [fetch_quantity](#fetch_quantity), [fetch_quantity_util](#fetch_quantity_util), [calculate_rate](#calculate_rate) |
| filter_by_list | Filter regex | Filter data using regex on a `record_store` column value| [fetch_quantity](#fetch_quantity), [fetch_quantity_util](#fetch_quantity_util), [calculate_rate](#calculate_rate) |
| setter_rollup_group_by_list | e.g. "project_id" | Group `instance_usage` data with these columns for the rollup operation. Please also see [Special notation](#special-notation) below | [rollup_quantity](#rollup_quantity) |
| setter_rollup_operation | e.g. "avg" | After data is grouped by `setter_rollup_group_by_list`, perform this operation to find aggregated metric value | [rollup_quantity](#rollup_quantity) |
| dimension_list | e.g. "aggregation_period", "host", "project_id" | List of fields which specify dimensions in aggregated metric. Please also see [Special notation](#special-notation) below | [insert_data](#insert_data), [insert_data_pre_hourly](#insert_data_pre_hourly)|
| pre_hourly_group_by_list | e.g. "default" | List of `instance usage data` fields to do a group by operation to aggregate data. Please also see [Special notation](#special-notation) below | [pre_hourly_processor](#pre_hourly_processor) |
| pre_hourly_operation | e.g. "avg" | When aggregating data published to `metrics_pre_hourly` every hour, perform this operation to find hourly aggregated metric value | [pre_hourly_processor](#pre_hourly_processor) |
### metric_group and metric_id ###
Specifies a metric or list of metrics from the record store data, which will be processed by this
*transform_spec*. Note: This can be a single metric or a group of metrics that will be combined to
produce the final aggregated metric.
*List of fields*
| field name | values | description |
| :--------- | :----- | :---------- |
| metric_group | unique transform spec group identifier | group identifier for this transform spec e.g. "cpu_total_all" |
| metric_id | unique transform spec identifier | identifier for this transform spec e.g. "cpu_total_all" |
**Note:** "metric_id" is a misnomer, it is not really a metric group/or metric identifier but rather
identifier for transformation spec. This will be changed to "transform_spec_id" in the future.
## Generic Aggregation Components ##
*List of Generic Aggregation Components*
### Usage Components ###
All usage components implement a method
```
def usage(transform_context, record_store_df):
..
..
return instance_usage_df
```
#### fetch_quantity ####
This component groups record store records by `aggregation_group_by_list`, sorts within each
group by the timestamp field, and finds usage based on `usage_fetch_operation`. Optionally this
component also takes `filter_by_list` to include or exclude certain records from the usage
calculation. A minimal PySpark sketch of this grouping appears after the parameter list below.
*Other parameters*
* **aggregation_group_by_list**
List of fields to group by.
Possible values: any set of fields in record store data. Please also see [Special notation](#special-notation).
Example:
```
"aggregation_group_by_list": ["tenant_id"]
```
* **usage_fetch_operation**
Operation to be performed on grouped data set.
*Possible values:* "sum", "max", "min", "avg", "latest", "oldest"
* **aggregation_period**
Period to aggregate by.
*Possible values:* 'daily', 'hourly', 'minutely', 'secondly'.
Example:
```
"aggregation_period": "hourly"
```
* **filter_by_list**
Filter (include or exclude) record store data as specified.
Example:
```
filter_by_list": "[{"field_to_filter": "hostname",
"filter_expression": "comp-(\d)+",
"filter_operation": "include"}]
```
OR
```
filter_by_list": "[{"field_to_filter": "hostname",
"filter_expression": "controller-(\d)+",
"filter_operation": "exclude"}]
```
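The following is a minimal PySpark sketch of what this grouping and usage operation conceptually
does (illustrative only, not the actual `fetch_quantity` implementation); the column names are
assumed from the record store format:
```
# Illustrative sketch of fetch_quantity with
# aggregation_group_by_list=["tenant_id"] and usage_fetch_operation="max".
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fetch_quantity_sketch").getOrCreate()
record_store_df = spark.createDataFrame(
    [("t1", "host-1", 0.034), ("t1", "host-2", 0.051), ("t2", "host-1", 0.012)],
    ["tenant_id", "host", "event_quantity"])

usage_df = (record_store_df
            .groupBy("tenant_id")
            .agg(F.max("event_quantity").alias("quantity")))
usage_df.show()
```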
#### fetch_quantity_util ####
This component finds the utilized quantity based on *total_quantity* and *idle_perc* using the
following calculation
```
utilized_quantity = (100 - idle_perc) * total_quantity / 100
```
where,
* **total_quantity** data, identified by `usage_fetch_util_quantity_event_type` parameter and
* **idle_perc** data, identified by `usage_fetch_util_idle_perc_event_type` parameter
This component initially groups record store records by `aggregation_group_by_list` and
`event_type`, sorts within each group by the timestamp field, and calculates `total_quantity` and
`idle_perc` values based on `usage_fetch_operation`. `utilized_quantity` is then calculated
using the formula given above, as shown in the worked example below.
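A quick worked example of the utilization formula (illustrative values only):
```
# Worked example of the utilization formula above (illustrative values).
total_quantity = 8.0      # e.g. cpu.total_logical_cores
idle_perc = 75.0          # e.g. cpu.idle_perc
utilized_quantity = (100 - idle_perc) * total_quantity / 100
print(utilized_quantity)  # 2.0
```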
*Other parameters*
* **aggregation_group_by_list**
List of fields to group by.
Possible values: any set of fields in record store data. Please also see [Special notation](#special-notation) below.
Example:
```
"aggregation_group_by_list": ["tenant_id"]
```
* **usage_fetch_operation**
Operation to be performed on grouped data set
*Possible values:* "sum", "max", "min", "avg", "latest", "oldest"
* **aggregation_period**
Period to aggregate by.
*Possible values:* 'daily', 'hourly', 'minutely', 'secondly'.
Example:
```
"aggregation_period": "hourly"
```
* **filter_by_list**
Filter (include or exclude) record store data as specified
Example:
```
filter_by_list": "[{"field_to_filter": "hostname",
"filter_expression": "comp-(\d)+",
"filter_operation": "include"}]
```
OR
```
filter_by_list": "[{"field_to_filter": "hostname",
"filter_expression": "controller-(\d)+",
"filter_operation": "exclude"}]
```
* **usage_fetch_util_quantity_event_type**
event type (metric name) to identify data which will be used to calculate `total_quantity`
*Possible values:* metric name
Example:
```
"usage_fetch_util_quantity_event_type": "cpu.total_logical_cores"
```
* **usage_fetch_util_idle_perc_event_type**
event type (metric name) to identify data which will be used to calculate `idle_perc`
*Possible values:* metric name
Example:
```
"usage_fetch_util_idle_perc_event_type": "cpu.idle_perc"
```
#### calculate_rate ####
This component finds the rate of change of quantity (in percent) over a time period using the
following calculation
```
rate_of_change (in percent) = ((oldest_quantity - latest_quantity)/oldest_quantity) * 100
```
where,
* **oldest_quantity**: oldest (or earliest) `average` quantity if there are multiple quantities in a
group for a given time period.
* **latest_quantity**: latest `average` quantity if there are multiple quantities in a group
for a given time period.
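A quick worked example of the rate-of-change formula (illustrative values only):
```
# Worked example of the rate-of-change formula above (illustrative values).
oldest_quantity = 200.0   # earliest average quantity in the period
latest_quantity = 180.0   # latest average quantity in the period
rate_of_change = ((oldest_quantity - latest_quantity) / oldest_quantity) * 100
print(rate_of_change)     # 10.0
```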
*Other parameters*
* **aggregation_group_by_list**
List of fields to group by.
Possible values: any set of fields in record store data. Please also see [Special notation](#special-notation) below.
Example:
```
"aggregation_group_by_list": ["tenant_id"]
```
* **usage_fetch_operation**
Operation to be performed on grouped data set
*Possible values:* "sum", "max", "min", "avg", "latest", "oldest"
* **aggregation_period**
Period to aggregate by.
*Possible values:* 'daily', 'hourly', 'minutely', 'secondly'.
Example:
```
"aggregation_period": "hourly"
```
* **filter_by_list**
Filter (include or exclude) record store data as specified
Example:
```
filter_by_list": "[{"field_to_filter": "hostname",
"filter_expression": "comp-(\d)+",
"filter_operation": "include"}]
```
OR
```
filter_by_list": "[{"field_to_filter": "hostname",
"filter_expression": "controller-(\d)+",
"filter_operation": "exclude"}]
```
### Setter Components ###
All setter components implement a method
```
def setter(transform_context, instance_usage_df):
..
..
return instance_usage_df
```
#### set_aggregated_metric_name ####
This component sets the final aggregated metric name by setting the `aggregated_metric_name` field in
`instance_usage` data.
*Other parameters*
* **aggregated_metric_name**
Name of the aggregated metric being generated.
*Possible values:* any aggregated metric name. Convention is to end the metric name
with "_agg".
Example:
```
"aggregated_metric_name":"cpu.total_logical_cores_agg"
```
#### set_aggregated_period ####
This component sets the final aggregated metric period by setting the `aggregation_period` field in
`instance_usage` data.
*Other parameters*
* **aggregation_period**
Period to set on the aggregated metric.
*Possible values:* 'daily', 'hourly', 'minutely', 'secondly'.
Example:
```
"aggregation_period": "hourly"
```
**Note** If you are publishing metrics to the *metrics_pre_hourly* kafka topic using the
`insert_data_pre_hourly` component (see the *insert_data_pre_hourly* component below),
`aggregation_period` will have to be set to `hourly`, since by default all data in the
*metrics_pre_hourly* topic gets aggregated every hour by the `Pre Hourly Processor` (see the
`Processors` section below).
#### rollup_quantity ####
This component groups `instance_usage` records by `setter_rollup_group_by_list`, sorts within each
group by the timestamp field, and finds usage based on `setter_fetch_operation`.
*Other parameters*
* **setter_rollup_group_by_list**
List of fields to group by.
Possible values: any set of fields in record store data. Please also see [Special notation](#special-notation) below.
Example:
```
"setter_rollup_group_by_list": ["tenant_id"]
```
* **setter_fetch_operation**
Operation to be performed on grouped data set
*Possible values:* "sum", "max", "min", "avg"
Example:
```
"setter_fetch_operation": "avg"
```
* **aggregation_period**
Period to aggregate by.
*Possible values:* 'daily', 'hourly', 'minutely', 'secondly'.
Example:
```
"aggregation_period": "hourly"
```
### Insert Components ###
All insert components implement a method
```
def insert(transform_context, instance_usage_df):
..
..
return instance_usage_df
```
#### insert_data ####
This component converts `instance_usage` data into monasca metric format and writes the metric to
the `metrics` topic in kafka.
*Other parameters*
* **dimension_list**
List of fields in `instance_usage` data that should be converted to monasca metric dimensions.
*Possible values:* any fields in `instance_usage` data or use [Special notation](#special-notation) below.
Example:
```
"dimension_list":["aggregation_period","host","project_id"]
```
#### insert_data_pre_hourly ####
This component converts `instance_usage` data into monasca metric format and writes the metric to
the `metrics_pre_hourly` topic in kafka.
*Other parameters*
* **dimension_list**
List of fields in `instance_usage` data that should be converted to monasca metric dimensions.
*Possible values:* any fields in `instance_usage` data
Example:
```
"dimension_list":["aggregation_period","host","project_id"]
```
## Processors ##
Processors are special components that process data from a kafka topic at the desired time
interval. They differ from generic aggregation components in that they process data from a
specific kafka topic.
All processor components implement the following methods
```
def get_app_name(self):
[...]
return app_name
def is_time_to_run(self, current_time):
if current_time > last_invoked + 1:
return True
else:
return False
def run_processor(self, time):
# do work...
```
### pre_hourly_processor ###
The Pre Hourly Processor runs every hour and aggregates `instance_usage` data published to the
`metrics_pre_hourly` topic.
By default the Pre Hourly Processor is set to run 10 minutes after the top of the hour and processes
data from the previous hour. `instance_usage` data is grouped by `pre_hourly_group_by_list`
*Other parameters*
* **pre_hourly_group_by_list**
List of fields to group by.
Possible values: any set of fields in `instance_usage` data or `default`. Please also see
[Special notation](#special-notation) below.
Note: setting to `default` will group `instance_usage` data by `tenant_id`, `user_id`,
`resource_uuid`, `geolocation`, `region`, `zone`, `host`, `project_id`,
`aggregated_metric_name`, `aggregation_period`
Example:
```
"pre_hourly_group_by_list": ["tenant_id"]
```
OR
```
"pre_hourly_group_by_list": ["default"]
```
* **pre_hourly_operation**
Operation to be performed on grouped data set.
*Possible values:* "sum", "max", "min", "avg", "rate"
Example:
```
"pre_hourly_operation": "avg"
```
## Special notation ##
### pre_transform spec ###
To specify `required_raw_fields_list`, please use the special notation
`dimensions#{$field_name}`, `meta#{$field_name}` or `value_meta#{$field_name}` to refer to any field in
the dimensions, meta or value_meta sections of the incoming raw metric.
For example, if you want to check that for a particular metric a dimension called "pod_name" is
present and non-empty, then simply add `dimensions#pod_name` to the
`required_raw_fields_list`.
Example `pre_transform` spec
```
{"event_processing_params":{"set_default_zone_to":"1",
"set_default_geolocation_to":"1",
"set_default_region_to":"W"},
"event_type":"pod.net.in_bytes_sec",
"metric_id_list":["pod_net_in_b_per_sec_per_namespace"],
"required_raw_fields_list":["creation_time",
"meta#tenantId",
"dimensions#namespace",
"dimensions#pod_name",
"dimensions#app"]
}
```
### transform spec ###
To specify `aggregation_group_by_list`, `setter_rollup_group_by_list`, `pre_hourly_group_by_list`,
`dimension_list`, you can also use the special notation `dimensions#{$field_name}`, `meta#{$field_name}`
or `value_meta#{$field_name}` to refer to any field in the dimensions, meta or value_meta sections of the
incoming raw metric.
For example, the following `transform_spec` will aggregate by "app", "namespace" and "pod_name"
dimensions, then will do a rollup of the aggregated data by "namespace" dimension, and write final
aggregated metric with "app", "namespace" and "pod_name" dimensions. Note that "app" and "pod_name"
will be set to "all" since the final rollup operation was done only based on "namespace" dimension.
```
{
"aggregation_params_map":{
"aggregation_pipeline":{"source":"streaming",
"usage":"fetch_quantity",
"setters":["rollup_quantity",
"set_aggregated_metric_name",
"set_aggregated_period"],
"insert":["prepare_data",
"insert_data_pre_hourly"]},
"aggregated_metric_name":"pod.net.in_bytes_sec_agg",
"aggregation_period":"hourly",
"aggregation_group_by_list": ["tenant_id",
"dimensions#app",
"dimensions#namespace",
"dimensions#pod_name"],
"usage_fetch_operation": "avg",
"filter_by_list": [],
"setter_rollup_group_by_list":["dimensions#namespace"],
"setter_rollup_operation": "sum",
"dimension_list":["aggregation_period",
"dimensions#app",
"dimensions#namespace",
"dimensions#pod_name"],
"pre_hourly_operation":"avg",
"pre_hourly_group_by_list":["aggregation_period",
"dimensions#namespace]'"]},
"metric_group":"pod_net_in_b_per_sec_per_namespace",
"metric_id":"pod_net_in_b_per_sec_per_namespace"}
```
# Putting it all together
Please refer to [Create a new aggregation pipeline](create-new-aggregation-pipeline.md) document to
create a new aggregation pipeline.

View File

@ -1,89 +0,0 @@
[DEFAULTS]
[repositories]
offsets = monasca_transform.mysql_offset_specs:MySQLOffsetSpecs
data_driven_specs = monasca_transform.data_driven_specs.mysql_data_driven_specs_repo:MySQLDataDrivenSpecsRepo
offsets_max_revisions = 10
[database]
server_type = mysql:thin
host = localhost
database_name = monasca_transform
username = m-transform
password = password
[messaging]
adapter = monasca_transform.messaging.adapter:KafkaMessageAdapter
topic = metrics
brokers = localhost:9092
publish_kafka_project_id = d2cb21079930415a9f2a33588b9f2bb6
publish_region = useast
adapter_pre_hourly = monasca_transform.messaging.adapter:KafkaMessageAdapterPreHourly
topic_pre_hourly = metrics_pre_hourly
[stage_processors]
enable_pre_hourly_processor = True
[pre_hourly_processor]
enable_instance_usage_df_cache = True
instance_usage_df_cache_storage_level = MEMORY_ONLY_SER_2
enable_batch_time_filtering = True
effective_batch_revision=2
#
# Configurable values for the monasca-transform service
#
[service]
# The address of the mechanism being used for election coordination
coordinator_address = kazoo://localhost:2181
# The name of the coordination/election group
coordinator_group = monasca-transform
# How long the candidate should sleep between election result
# queries (in seconds)
election_polling_frequency = 15
# Whether debug-level log entries should be included in the application
# log. If this setting is false, info-level will be used for logging.
enable_debug_log_entries = true
# The path for the setup file to be executed
setup_file = /opt/stack/monasca-transform/setup.py
# The target of the setup file
setup_target = bdist_egg
# The path for the monasca-transform Spark driver
spark_driver = /opt/stack/monasca-transform/monasca_transform/driver/mon_metrics_kafka.py
# the location for the transform-service log
service_log_path=/var/log/monasca/transform/
# the filename for the transform-service log
service_log_filename=monasca-transform.log
# Whether Spark event logging should be enabled (true/false)
spark_event_logging_enabled = true
# A list of jars which Spark should use
spark_jars_list = /opt/spark/current/assembly/target/scala-2.10/jars/spark-streaming-kafka-0-8_2.10-2.1.1.jar,/opt/spark/current/assembly/target/scala-2.10/jars/scala-library-2.10.6.jar,/opt/spark/current/assembly/target/scala-2.10/jars/kafka_2.10-0.8.1.1.jar,/opt/spark/current/assembly/target/scala-2.10/jars/metrics-core-2.2.0.jar,/opt/spark/current/assembly/target/scala-2.10/jars/drizzle-jdbc-1.3.jar
# A list of where the Spark master(s) should run
spark_master_list = spark://localhost:7077
# spark_home for the environment
spark_home = /opt/spark/current
# Python files for Spark to use
spark_python_files = /opt/stack/monasca-transform/dist/monasca_transform-0.0.1.egg
# How often the stream should be read (in seconds)
stream_interval = 600
# The working directory for monasca-transform
work_dir = /opt/stack/monasca-transform
enable_record_store_df_cache = True
record_store_df_cache_storage_level = MEMORY_ONLY_SER_2
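The service itself consumes this file through oslo.config, but for a quick check the same options can be read with the standard library. A minimal sketch, assuming the file has been saved locally under a hypothetical name:

```python
# Minimal sketch: read a few options from the configuration shown above
# using only the standard library (the running service uses oslo.config).
import configparser

config = configparser.ConfigParser()
config.read("monasca-transform.conf")  # hypothetical local path

print(config.get("messaging", "brokers"))            # localhost:9092
print(config.get("messaging", "topic_pre_hourly"))   # metrics_pre_hourly
print(config.getboolean("stage_processors", "enable_pre_hourly_processor"))  # True
print(config.getint("service", "stream_interval"))   # 600
```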


@ -1,84 +0,0 @@
alabaster==0.7.10
Babel==2.5.3
certifi==2018.1.18
chardet==3.0.4
cliff==2.11.0
cmd2==0.8.1
contextlib2==0.5.5
coverage==4.0
debtcollector==1.19.0
docutils==0.14
enum-compat==0.0.2
eventlet==0.20.0
extras==1.0.0
fasteners==0.14.1
fixtures==3.0.0
flake8==2.5.5
future==0.16.0
futurist==1.6.0
greenlet==0.4.13
hacking==1.1.0
idna==2.6
imagesize==1.0.0
iso8601==0.1.12
Jinja2==2.10
kazoo==2.4.0
linecache2==1.0.0
MarkupSafe==1.0
mccabe==0.2.1
monasca-common==2.7.0
monotonic==1.4
msgpack==0.5.6
netaddr==0.7.19
netifaces==0.10.6
nose==1.3.7
os-testr==1.0.0
oslo.concurrency==3.26.0
oslo.config==5.2.0
oslo.context==2.20.0
oslo.i18n==3.20.0
oslo.log==3.36.0
oslo.policy==1.34.0
oslo.serialization==2.25.0
oslo.service==1.24.0
oslo.utils==3.36.0
Paste==2.0.3
PasteDeploy==1.5.2
pbr==2.0.0
pep8==1.5.7
prettytable==0.7.2
psutil==3.2.2
pycodestyle==2.5.0
pyflakes==0.8.1
Pygments==2.2.0
pyinotify==0.9.6
PyMySQL==0.7.6
pyparsing==2.2.0
pyperclip==1.6.0
python-dateutil==2.7.0
python-mimeparse==1.6.0
python-subunit==1.2.0
pytz==2018.3
PyYAML==3.12
repoze.lru==0.7
requests==2.18.4
rfc3986==1.1.0
Routes==2.4.1
six==1.10.0
snowballstemmer==1.2.1
Sphinx==1.6.2
sphinxcontrib-websupport==1.0.1
SQLAlchemy==1.0.10
stestr==2.0.0
stevedore==1.20.0
tabulate==0.8.2
tenacity==4.9.0
testtools==2.3.0
tooz==1.58.0
traceback2==1.4.0
ujson==1.35
unittest2==1.1.0
urllib3==1.22
voluptuous==0.11.1
WebOb==1.7.4
wrapt==1.10.11


@ -1,41 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from collections import namedtuple
class Component(object):
SOURCE_COMPONENT_TYPE = "source"
USAGE_COMPONENT_TYPE = "usage"
SETTER_COMPONENT_TYPE = "setter"
INSERT_COMPONENT_TYPE = "insert"
DEFAULT_UNAVAILABLE_VALUE = "NA"
InstanceUsageDataAggParamsBase = namedtuple('InstanceUsageDataAggParams',
['instance_usage_data',
'agg_params'])
class InstanceUsageDataAggParams(InstanceUsageDataAggParamsBase):
"""A tuple which is a wrapper containing the instance usage data and aggregation params
namedtuple contains:
instance_usage_data - instance usage
agg_params - aggregation params dict
"""


@ -1,50 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import logging
LOG = logging.getLogger(__name__)
class ComponentUtils(object):
@staticmethod
def _get_group_by_period_list(aggregation_period):
"""get a list of columns for an aggregation period."""
group_by_period_list = []
if (aggregation_period == "daily"):
group_by_period_list = ["event_date"]
elif (aggregation_period == "hourly"):
group_by_period_list = ["event_date", "event_hour"]
elif (aggregation_period == "minutely"):
group_by_period_list = ["event_date", "event_hour", "event_minute"]
elif (aggregation_period == "secondly"):
group_by_period_list = ["event_date", "event_hour",
"event_minute", "event_second"]
return group_by_period_list
@staticmethod
def _get_instance_group_by_period_list(aggregation_period):
"""get a list of columns for an aggregation period."""
group_by_period_list = []
if (aggregation_period == "daily"):
group_by_period_list = ["usage_date"]
elif (aggregation_period == "hourly"):
group_by_period_list = ["usage_date", "usage_hour"]
elif (aggregation_period == "minutely"):
group_by_period_list = ["usage_date", "usage_hour", "usage_minute"]
elif (aggregation_period == "secondly"):
group_by_period_list = ["usage_date", "usage_hour",
"usage_minute", "usage_second"]
return group_by_period_list
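A quick sketch of the column lists these helpers produce (assuming the package is importable):

```python
# Minimal sketch: the group-by column lists returned for each aggregation period.
from monasca_transform.component.component_utils import ComponentUtils

assert ComponentUtils._get_group_by_period_list("hourly") == [
    "event_date", "event_hour"]
assert ComponentUtils._get_instance_group_by_period_list("minutely") == [
    "usage_date", "usage_hour", "usage_minute"]
print("period column lists look as expected")
```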


@ -1,204 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import abc
import json
import time
from monasca_common.validation import metrics as metric_validator
from monasca_transform.component import Component
from monasca_transform.config.config_initializer import ConfigInitializer
from monasca_transform.log_utils import LogUtils
from monasca_transform.transform.transform_utils import InstanceUsageUtils
from oslo_config import cfg
ConfigInitializer.basic_config()
log = LogUtils.init_logger(__name__)
class InsertComponent(Component):
@abc.abstractmethod
def insert(transform_context, instance_usage_df):
raise NotImplementedError(
"Class %s doesn't implement setter(instance_usage_df,"
" transform_spec_df)"
% __name__)
@staticmethod
def get_component_type():
return Component.INSERT_COMPONENT_TYPE
@staticmethod
def _validate_metric(metric):
"""validate monasca metric."""
try:
# validate metric part, without the wrapper
metric_validator.validate(metric["metric"])
except Exception as e:
log.info("Metric %s is invalid: Exception : %s"
% (json.dumps(metric), str(e)))
return False
return True
@staticmethod
def _prepare_metric(instance_usage_dict, agg_params):
"""transform instance usage rdd to a monasca metric.
example metric:
{"metric":{"name":"host_alive_status",
"dimensions":{"hostname":"mini-mon",
"observer_host":"devstack",
"test_type":"ssh"},
"timestamp":1456858016000,
"value":1.0,
"value_meta":{"error":
"Unable to open socket to host mini-mon"}
},
"meta":{"tenantId":"8eadcf71fc5441d8956cb9cbb691704e",
"region":"useast"},
"creation_time":1456858034
}
"""
current_epoch_seconds = time.time()
current_epoch_milliseconds = current_epoch_seconds * 1000
log.debug(instance_usage_dict)
# extract dimensions
dimension_list = agg_params["dimension_list"]
dimensions_part = InstanceUsageUtils.extract_dimensions(instance_usage_dict,
dimension_list)
meta_part = {}
# TODO(someone) determine the appropriate tenant ID to use. For now,
# what works is to use the same tenant ID as other metrics specify in
# their kafka messages (and this appears to change each time mini-mon
# is re-installed). The long term solution is to have HLM provide
# a usable tenant ID to us in a configurable way. BTW, without a
# proper/valid tenant ID, aggregated metrics don't get persisted
# to the Monasca DB.
meta_part["tenantId"] = cfg.CONF.messaging.publish_kafka_project_id
meta_part["region"] = cfg.CONF.messaging.publish_region
value_meta_part = {"record_count": instance_usage_dict.get(
"record_count", 0),
"firstrecord_timestamp_string":
instance_usage_dict.get(
"firstrecord_timestamp_string",
Component.DEFAULT_UNAVAILABLE_VALUE),
"lastrecord_timestamp_string":
instance_usage_dict.get(
"lastrecord_timestamp_string",
Component.DEFAULT_UNAVAILABLE_VALUE)}
metric_part = {"name": instance_usage_dict.get(
"aggregated_metric_name"),
"dimensions": dimensions_part,
"timestamp": int(current_epoch_milliseconds),
"value": instance_usage_dict.get(
"quantity", 0.0),
"value_meta": value_meta_part}
metric = {"metric": metric_part,
"meta": meta_part,
"creation_time": int(current_epoch_seconds)}
log.debug(metric)
return metric
@staticmethod
def _get_metric(row, agg_params):
"""write data to kafka. extracts and formats metric data and write s the data to kafka"""
instance_usage_dict = {"tenant_id": row.tenant_id,
"user_id": row.user_id,
"resource_uuid": row.resource_uuid,
"geolocation": row.geolocation,
"region": row.region,
"zone": row.zone,
"host": row.host,
"project_id": row.project_id,
"aggregated_metric_name":
row.aggregated_metric_name,
"quantity": row.quantity,
"firstrecord_timestamp_string":
row.firstrecord_timestamp_string,
"lastrecord_timestamp_string":
row.lastrecord_timestamp_string,
"record_count": row.record_count,
"usage_date": row.usage_date,
"usage_hour": row.usage_hour,
"usage_minute": row.usage_minute,
"aggregation_period":
row.aggregation_period,
"extra_data_map":
row.extra_data_map}
metric = InsertComponent._prepare_metric(instance_usage_dict,
agg_params)
return metric
@staticmethod
def _get_instance_usage_pre_hourly(row,
metric_id):
"""write data to kafka. extracts and formats metric data and writes the data to kafka"""
# retrieve the processing meta from the row
processing_meta = row.processing_meta
# add transform spec metric id to the processing meta
if processing_meta:
processing_meta["metric_id"] = metric_id
else:
processing_meta = {"metric_id": metric_id}
instance_usage_dict = {"tenant_id": row.tenant_id,
"user_id": row.user_id,
"resource_uuid": row.resource_uuid,
"geolocation": row.geolocation,
"region": row.region,
"zone": row.zone,
"host": row.host,
"project_id": row.project_id,
"aggregated_metric_name":
row.aggregated_metric_name,
"quantity": row.quantity,
"firstrecord_timestamp_string":
row.firstrecord_timestamp_string,
"lastrecord_timestamp_string":
row.lastrecord_timestamp_string,
"firstrecord_timestamp_unix":
row.firstrecord_timestamp_unix,
"lastrecord_timestamp_unix":
row.lastrecord_timestamp_unix,
"record_count": row.record_count,
"usage_date": row.usage_date,
"usage_hour": row.usage_hour,
"usage_minute": row.usage_minute,
"aggregation_period":
row.aggregation_period,
"processing_meta": processing_meta,
"extra_data_map": row.extra_data_map}
return instance_usage_dict
@staticmethod
def _write_metrics_from_partition(partlistiter):
"""iterate through all rdd elements in partition and write metrics to kafka"""
for part in partlistiter:
agg_params = part.agg_params
row = part.instance_usage_data
InsertComponent._write_metric(row, agg_params)
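A hedged sketch of the validation step `_validate_metric` performs, assuming `monasca_common` is installed; the metric values below are made up for illustration:

```python
# Minimal sketch: validate the bare "metric" part of an envelope, the same
# call _validate_metric makes (values below are illustrative only).
from monasca_common.validation import metrics as metric_validator

metric = {"name": "pod.net.in_bytes_sec_agg",
          "dimensions": {"aggregation_period": "hourly",
                         "namespace": "monitoring"},
          "timestamp": 1456858016000,
          "value": 1.0}

metric_validator.validate(metric)  # raises an exception if the metric is invalid
print("metric passed validation")
```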


@ -1,65 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from monasca_transform.component.insert import InsertComponent
from monasca_transform.config.config_initializer import ConfigInitializer
from monasca_transform.messaging.adapter import KafkaMessageAdapter
class KafkaInsert(InsertComponent):
"""Insert component that writes instance usage data to kafka queue"""
@staticmethod
def insert(transform_context, instance_usage_df):
"""write instance usage data to kafka"""
# object to init config
ConfigInitializer.basic_config()
transform_spec_df = transform_context.transform_spec_df_info
agg_params = transform_spec_df.select(
"aggregation_params_map.dimension_list").collect()[0].asDict()
# Approach # 1
# using foreachPartition to iterate through elements in an
# RDD is the recommended approach so as to not overwhelm kafka with the
# zillion connections (but in our case the MessageAdapter does
# store the adapter_impl so we should not create many producers)
# using foreachPartition was causing some serialization/cPickle
# problems where a few libs like kafka.SimpleProducer and oslo_config.cfg
# were not available in the foreachPartition method
#
# removing _write_metrics_from_partition for now in favor of
# Approach # 2
#
# instance_usage_df_agg_params = instance_usage_df.rdd.map(
# lambda x: InstanceUsageDataAggParams(x,
# agg_params))
# instance_usage_df_agg_params.foreachPartition(
# DummyInsert._write_metrics_from_partition)
# Approach # 2
# using collect() to fetch all elements of an RDD and write to
# kafka
for instance_usage_row in instance_usage_df.collect():
metric = InsertComponent._get_metric(
instance_usage_row, agg_params)
# validate metric part
if InsertComponent._validate_metric(metric):
KafkaMessageAdapter.send_metric(metric)
return instance_usage_df
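The comments above contrast a per-partition writer with the simpler `collect()` loop that was kept. A minimal local sketch of the per-partition pattern, assuming `pyspark` is installed; `print()` stands in for the Kafka producer so the example stays self-contained:

```python
# Minimal sketch: the foreachPartition pattern described above, with print()
# standing in for a Kafka producer.
from pyspark.sql import SparkSession


def write_partition(rows):
    # In the real component a single producer would be reused for the whole
    # partition; here each row is just printed.
    for row in rows:
        print("would send:", row.asDict())


spark = (SparkSession.builder.master("local[1]")
         .appName("kafka-insert-sketch").getOrCreate())
df = spark.createDataFrame(
    [(1.0, "metric_one_agg"), (2.0, "metric_two_agg")],
    ["quantity", "aggregated_metric_name"])
df.rdd.foreachPartition(write_partition)
spark.stop()
```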


@ -1,44 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from monasca_transform.component.insert import InsertComponent
from monasca_transform.config.config_initializer import ConfigInitializer
from monasca_transform.messaging.adapter import KafkaMessageAdapterPreHourly
class KafkaInsertPreHourly(InsertComponent):
"""Insert component that writes instance usage data to kafka queue"""
@staticmethod
def insert(transform_context, instance_usage_df):
"""write instance usage data to kafka"""
# object to init config
ConfigInitializer.basic_config()
transform_spec_df = transform_context.transform_spec_df_info
agg_params = transform_spec_df.select(
"metric_id").\
collect()[0].asDict()
metric_id = agg_params["metric_id"]
for instance_usage_row in instance_usage_df.collect():
instance_usage_dict = \
InsertComponent._get_instance_usage_pre_hourly(
instance_usage_row,
metric_id)
KafkaMessageAdapterPreHourly.send_metric(instance_usage_dict)
return instance_usage_df


@ -1,28 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from monasca_transform.component.insert import InsertComponent
class PrepareData(InsertComponent):
"""prepare for insert component validates instance usage data before calling Insert component"""
@staticmethod
def insert(transform_context, instance_usage_df):
"""write instance usage data to kafka"""
#
# TODO(someone) add instance usage data validation
#
return instance_usage_df


@ -1,31 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import abc
from monasca_transform.component import Component
class SetterComponent(Component):
@abc.abstractmethod
def setter(transform_context, instance_usage_df):
raise NotImplementedError(
"Class %s doesn't implement setter(instance_usage_df,"
" transform_context)"
% __name__)
@staticmethod
def get_component_type():
"""get component type."""
return Component.SETTER_COMPONENT_TYPE


@ -1,125 +0,0 @@
# (c) Copyright 2016 Hewlett Packard Enterprise Development LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from pyspark.sql import functions
from pyspark.sql import SQLContext
from monasca_transform.component.setter import SetterComponent
from monasca_transform.transform.transform_utils import InstanceUsageUtils
import json
class PreHourlyCalculateRateException(Exception):
"""Exception thrown when doing pre-hourly rate calculations
Attributes:
value: string representing the error
"""
def __init__(self, value):
self.value = value
def __str__(self):
return repr(self.value)
class PreHourlyCalculateRate(SetterComponent):
@staticmethod
def _calculate_rate(instance_usage_df):
instance_usage_data_json_list = []
try:
sorted_oldest_ascending_df = instance_usage_df.sort(
functions.asc("processing_meta.oldest_timestamp_string"))
sorted_latest_descending_df = instance_usage_df.sort(
functions.desc("processing_meta.latest_timestamp_string"))
# Calculate the rate change by percentage
oldest_dict = sorted_oldest_ascending_df.collect()[0].asDict()
oldest_quantity = float(oldest_dict[
"processing_meta"]["oldest_quantity"])
latest_dict = sorted_latest_descending_df.collect()[0].asDict()
latest_quantity = float(latest_dict[
"processing_meta"]["latest_quantity"])
rate_percentage = 100 * (
(oldest_quantity - latest_quantity) / oldest_quantity)
# get any extra data
extra_data_map = getattr(sorted_oldest_ascending_df.collect()[0],
"extra_data_map", {})
except Exception as e:
raise PreHourlyCalculateRateException(
"Exception occurred in pre-hourly rate calculation. Error: %s"
% str(e))
# create a new instance usage dict
instance_usage_dict = {"tenant_id":
latest_dict.get("tenant_id", "all"),
"user_id":
latest_dict.get("user_id", "all"),
"resource_uuid":
latest_dict.get("resource_uuid", "all"),
"geolocation":
latest_dict.get("geolocation", "all"),
"region":
latest_dict.get("region", "all"),
"zone":
latest_dict.get("zone", "all"),
"host":
latest_dict.get("host", "all"),
"project_id":
latest_dict.get("project_id", "all"),
"aggregated_metric_name":
latest_dict["aggregated_metric_name"],
"quantity": rate_percentage,
"firstrecord_timestamp_unix":
oldest_dict["firstrecord_timestamp_unix"],
"firstrecord_timestamp_string":
oldest_dict["firstrecord_timestamp_string"],
"lastrecord_timestamp_unix":
latest_dict["lastrecord_timestamp_unix"],
"lastrecord_timestamp_string":
latest_dict["lastrecord_timestamp_string"],
"record_count": oldest_dict["record_count"] +
latest_dict["record_count"],
"usage_date": latest_dict["usage_date"],
"usage_hour": latest_dict["usage_hour"],
"usage_minute": latest_dict["usage_minute"],
"aggregation_period":
latest_dict["aggregation_period"],
"extra_data_map": extra_data_map
}
instance_usage_data_json = json.dumps(instance_usage_dict)
instance_usage_data_json_list.append(instance_usage_data_json)
# convert to rdd
spark_context = instance_usage_df.rdd.context
return spark_context.parallelize(instance_usage_data_json_list)
@staticmethod
def do_rate_calculation(instance_usage_df):
instance_usage_json_rdd = PreHourlyCalculateRate._calculate_rate(
instance_usage_df)
sql_context = SQLContext.getOrCreate(instance_usage_df.rdd.context)
instance_usage_trans_df = InstanceUsageUtils.create_df_from_json_rdd(
sql_context,
instance_usage_json_rdd)
return instance_usage_trans_df
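The rate used here is the percentage drop between the oldest and latest rolled-up quantities, worked through with made-up numbers:

```python
# Minimal sketch of the rate formula above with illustrative quantities.
oldest_quantity = 1000.0
latest_quantity = 925.0
rate_percentage = 100 * ((oldest_quantity - latest_quantity) / oldest_quantity)
print(rate_percentage)  # 7.5
```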


@ -1,261 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from pyspark.sql import SQLContext
import datetime
from monasca_transform.component import Component
from monasca_transform.component.component_utils import ComponentUtils
from monasca_transform.component.setter import SetterComponent
from monasca_transform.transform.transform_utils import InstanceUsageUtils
import json
class RollupQuantityException(Exception):
"""Exception thrown when doing quantity rollup
Attributes:
value: string representing the error
"""
def __init__(self, value):
self.value = value
def __str__(self):
return repr(self.value)
class RollupQuantity(SetterComponent):
@staticmethod
def _supported_rollup_operations():
return ["sum", "max", "min", "avg"]
@staticmethod
def _is_valid_rollup_operation(operation):
if operation in RollupQuantity._supported_rollup_operations():
return True
else:
return False
@staticmethod
def _rollup_quantity(instance_usage_df,
setter_rollup_group_by_list,
setter_rollup_operation):
instance_usage_data_json_list = []
# check if operation is valid
if not RollupQuantity.\
_is_valid_rollup_operation(setter_rollup_operation):
raise RollupQuantityException(
"Operation %s is not supported" % setter_rollup_operation)
# call required operation on grouped data
# e.g. sum, max, min, avg etc
agg_operations_map = {
"quantity": str(setter_rollup_operation),
"firstrecord_timestamp_unix": "min",
"lastrecord_timestamp_unix": "max",
"record_count": "sum"}
# do a group by
grouped_data = instance_usage_df.groupBy(
*setter_rollup_group_by_list)
rollup_df = grouped_data.agg(agg_operations_map)
for row in rollup_df.collect():
# first record timestamp
earliest_record_timestamp_unix = getattr(
row, "min(firstrecord_timestamp_unix)",
Component.DEFAULT_UNAVAILABLE_VALUE)
earliest_record_timestamp_string = \
datetime.datetime.utcfromtimestamp(
earliest_record_timestamp_unix).strftime(
'%Y-%m-%d %H:%M:%S')
# last record_timestamp
latest_record_timestamp_unix = getattr(
row, "max(lastrecord_timestamp_unix)",
Component.DEFAULT_UNAVAILABLE_VALUE)
latest_record_timestamp_string = \
datetime.datetime.utcfromtimestamp(
latest_record_timestamp_unix).strftime('%Y-%m-%d %H:%M:%S')
# record count
record_count = getattr(row, "sum(record_count)", 0.0)
# quantity
# get expression that will be used to select quantity
# from rolled up data
select_quant_str = "".join((setter_rollup_operation, "(quantity)"))
quantity = getattr(row, select_quant_str, 0.0)
try:
processing_meta = row.processing_meta
except AttributeError:
processing_meta = {}
# create a column name, value pairs from grouped data
extra_data_map = InstanceUsageUtils.grouped_data_to_map(row,
setter_rollup_group_by_list)
# convert column names, so that values can be accessed by components
# later in the pipeline
extra_data_map = InstanceUsageUtils.prepare_extra_data_map(extra_data_map)
# create a new instance usage dict
instance_usage_dict = {"tenant_id": getattr(row, "tenant_id",
"all"),
"user_id":
getattr(row, "user_id", "all"),
"resource_uuid":
getattr(row, "resource_uuid", "all"),
"geolocation":
getattr(row, "geolocation", "all"),
"region":
getattr(row, "region", "all"),
"zone":
getattr(row, "zone", "all"),
"host":
getattr(row, "host", "all"),
"project_id":
getattr(row, "tenant_id", "all"),
"aggregated_metric_name":
getattr(row, "aggregated_metric_name",
"all"),
"quantity":
quantity,
"firstrecord_timestamp_unix":
earliest_record_timestamp_unix,
"firstrecord_timestamp_string":
earliest_record_timestamp_string,
"lastrecord_timestamp_unix":
latest_record_timestamp_unix,
"lastrecord_timestamp_string":
latest_record_timestamp_string,
"record_count": record_count,
"usage_date":
getattr(row, "usage_date", "all"),
"usage_hour":
getattr(row, "usage_hour", "all"),
"usage_minute":
getattr(row, "usage_minute", "all"),
"aggregation_period":
getattr(row, "aggregation_period",
"all"),
"processing_meta": processing_meta,
"extra_data_map": extra_data_map
}
instance_usage_data_json = json.dumps(instance_usage_dict)
instance_usage_data_json_list.append(instance_usage_data_json)
# convert to rdd
spark_context = instance_usage_df.rdd.context
return spark_context.parallelize(instance_usage_data_json_list)
@staticmethod
def setter(transform_context, instance_usage_df):
transform_spec_df = transform_context.transform_spec_df_info
# get rollup operation (sum, max, avg, min)
agg_params = transform_spec_df.select(
"aggregation_params_map.setter_rollup_operation").\
collect()[0].asDict()
setter_rollup_operation = agg_params["setter_rollup_operation"]
instance_usage_trans_df = RollupQuantity.setter_by_operation(
transform_context,
instance_usage_df,
setter_rollup_operation)
return instance_usage_trans_df
@staticmethod
def setter_by_operation(transform_context, instance_usage_df,
setter_rollup_operation):
transform_spec_df = transform_context.transform_spec_df_info
# get fields we want to group by for a rollup
agg_params = transform_spec_df.select(
"aggregation_params_map.setter_rollup_group_by_list"). \
collect()[0].asDict()
setter_rollup_group_by_list = agg_params["setter_rollup_group_by_list"]
# get aggregation period
agg_params = transform_spec_df.select(
"aggregation_params_map.aggregation_period").collect()[0].asDict()
aggregation_period = agg_params["aggregation_period"]
group_by_period_list = \
ComponentUtils._get_instance_group_by_period_list(
aggregation_period)
# group by columns list
group_by_columns_list = \
group_by_period_list + setter_rollup_group_by_list
# prepare for group by
group_by_columns_list = InstanceUsageUtils.prepare_instance_usage_group_by_list(
group_by_columns_list)
# perform rollup operation
instance_usage_json_rdd = RollupQuantity._rollup_quantity(
instance_usage_df,
group_by_columns_list,
str(setter_rollup_operation))
sql_context = SQLContext.getOrCreate(instance_usage_df.rdd.context)
instance_usage_trans_df = InstanceUsageUtils.create_df_from_json_rdd(
sql_context,
instance_usage_json_rdd)
return instance_usage_trans_df
@staticmethod
def do_rollup(setter_rollup_group_by_list,
aggregation_period,
setter_rollup_operation,
instance_usage_df):
# get aggregation period
group_by_period_list = \
ComponentUtils._get_instance_group_by_period_list(
aggregation_period)
# group by columns list
group_by_columns_list = group_by_period_list + \
setter_rollup_group_by_list
# prepare for group by
group_by_columns_list = InstanceUsageUtils.prepare_instance_usage_group_by_list(
group_by_columns_list)
# perform rollup operation
instance_usage_json_rdd = RollupQuantity._rollup_quantity(
instance_usage_df,
group_by_columns_list,
str(setter_rollup_operation))
sql_context = SQLContext.getOrCreate(instance_usage_df.rdd.context)
instance_usage_trans_df = InstanceUsageUtils.create_df_from_json_rdd(
sql_context,
instance_usage_json_rdd)
return instance_usage_trans_df
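A minimal local sketch (assuming `pyspark` is installed) of the dictionary-style `groupBy().agg()` call used here, which is why the rolled-up values are read back with names like `sum(quantity)` and `min(firstrecord_timestamp_unix)`:

```python
# Minimal sketch: dict-style aggregation over toy data, mirroring the
# agg_operations_map used by RollupQuantity.
from pyspark.sql import SparkSession

spark = (SparkSession.builder.master("local[1]")
         .appName("rollup-sketch").getOrCreate())
df = spark.createDataFrame(
    [("monitoring", 2.0, 100, 160, 1), ("monitoring", 4.0, 120, 180, 1)],
    ["namespace", "quantity", "firstrecord_timestamp_unix",
     "lastrecord_timestamp_unix", "record_count"])

rollup = df.groupBy("namespace").agg(
    {"quantity": "sum",
     "firstrecord_timestamp_unix": "min",
     "lastrecord_timestamp_unix": "max",
     "record_count": "sum"})

row = rollup.collect()[0]
print(getattr(row, "sum(quantity)"))                    # 6.0
print(getattr(row, "min(firstrecord_timestamp_unix)"))  # 100
spark.stop()
```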


@ -1,97 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from pyspark.sql import SQLContext
from monasca_transform.component import InstanceUsageDataAggParams
from monasca_transform.component.setter import SetterComponent
from monasca_transform.transform.transform_utils import InstanceUsageUtils
import json
class SetAggregatedMetricName(SetterComponent):
"""setter component that sets final aggregated metric name.
aggregated metric name is available as a parameter 'aggregated_metric_name'
in aggregation_params in metric processing driver table.
"""
@staticmethod
def _set_aggregated_metric_name(instance_usage_agg_params):
row = instance_usage_agg_params.instance_usage_data
agg_params = instance_usage_agg_params.agg_params
try:
processing_meta = row.processing_meta
except AttributeError:
processing_meta = {}
# get any extra data
extra_data_map = getattr(row, "extra_data_map", {})
instance_usage_dict = {"tenant_id": row.tenant_id,
"user_id": row.user_id,
"resource_uuid": row.resource_uuid,
"geolocation": row.geolocation,
"region": row.region,
"zone": row.zone,
"host": row.host,
"project_id": row.project_id,
"aggregated_metric_name":
agg_params["aggregated_metric_name"],
"quantity": row.quantity,
"firstrecord_timestamp_unix":
row.firstrecord_timestamp_unix,
"firstrecord_timestamp_string":
row.firstrecord_timestamp_string,
"lastrecord_timestamp_unix":
row.lastrecord_timestamp_unix,
"lastrecord_timestamp_string":
row.lastrecord_timestamp_string,
"record_count": row.record_count,
"usage_date": row.usage_date,
"usage_hour": row.usage_hour,
"usage_minute": row.usage_minute,
"aggregation_period": row.aggregation_period,
"processing_meta": processing_meta,
"extra_data_map": extra_data_map}
instance_usage_data_json = json.dumps(instance_usage_dict)
return instance_usage_data_json
@staticmethod
def setter(transform_context, instance_usage_df):
"""set the aggregated metric name field for elements in instance usage rdd"""
transform_spec_df = transform_context.transform_spec_df_info
agg_params = transform_spec_df.select(
"aggregation_params_map.aggregated_metric_name").collect()[0].\
asDict()
instance_usage_df_agg_params = instance_usage_df.rdd.map(
lambda x: InstanceUsageDataAggParams(x, agg_params))
instance_usage_json_rdd = instance_usage_df_agg_params.map(
SetAggregatedMetricName._set_aggregated_metric_name)
sql_context = SQLContext.getOrCreate(instance_usage_df.rdd.context)
instance_usage_trans_df = InstanceUsageUtils.create_df_from_json_rdd(
sql_context,
instance_usage_json_rdd)
return instance_usage_trans_df


@ -1,97 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from pyspark.sql import SQLContext
from monasca_transform.component import InstanceUsageDataAggParams
from monasca_transform.component.setter import SetterComponent
from monasca_transform.transform.transform_utils import InstanceUsageUtils
import json
class SetAggregatedPeriod(SetterComponent):
"""setter component that sets final aggregated metric name.
aggregated metric name is available as a parameter 'aggregated_metric_name'
in aggregation_params in metric processing driver table.
"""
@staticmethod
def _set_aggregated_period(instance_usage_agg_params):
row = instance_usage_agg_params.instance_usage_data
agg_params = instance_usage_agg_params.agg_params
try:
processing_meta = row.processing_meta
except AttributeError:
processing_meta = {}
# get any extra data
extra_data_map = getattr(row, "extra_data_map", {})
instance_usage_dict = {"tenant_id": row.tenant_id,
"user_id": row.user_id,
"resource_uuid": row.resource_uuid,
"geolocation": row.geolocation,
"region": row.region,
"zone": row.zone,
"host": row.host,
"project_id": row.project_id,
"aggregated_metric_name":
row.aggregated_metric_name,
"quantity": row.quantity,
"firstrecord_timestamp_unix":
row.firstrecord_timestamp_unix,
"firstrecord_timestamp_string":
row.firstrecord_timestamp_string,
"lastrecord_timestamp_unix":
row.lastrecord_timestamp_unix,
"lastrecord_timestamp_string":
row.lastrecord_timestamp_string,
"record_count": row.record_count,
"usage_date": row.usage_date,
"usage_hour": row.usage_hour,
"usage_minute": row.usage_minute,
"aggregation_period":
agg_params["aggregation_period"],
"processing_meta": processing_meta,
"extra_data_map": extra_data_map}
instance_usage_data_json = json.dumps(instance_usage_dict)
return instance_usage_data_json
@staticmethod
def setter(transform_context, instance_usage_df):
"""set the aggregated metric name field for elements in instance usage rdd"""
transform_spec_df = transform_context.transform_spec_df_info
agg_params = transform_spec_df.select(
"aggregation_params_map.aggregation_period").collect()[0].asDict()
instance_usage_df_agg_params = instance_usage_df.rdd.map(
lambda x: InstanceUsageDataAggParams(x, agg_params))
instance_usage_json_rdd = instance_usage_df_agg_params.map(
SetAggregatedPeriod._set_aggregated_period)
sql_context = SQLContext.getOrCreate(instance_usage_df.rdd.context)
instance_usage_trans_df = InstanceUsageUtils.create_df_from_json_rdd(
sql_context,
instance_usage_json_rdd)
return instance_usage_trans_df


@ -1,30 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import abc
from monasca_transform.component import Component
class UsageComponent(Component):
@abc.abstractmethod
def usage(transform_context, record_store_df):
raise NotImplementedError(
"Class %s doesn't implement setter(instance_usage_df,"
" transform_spec_df)"
% __name__)
@staticmethod
def get_component_type():
return Component.USAGE_COMPONENT_TYPE


@ -1,164 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from pyspark.sql import SQLContext
from monasca_transform.component import Component
from monasca_transform.component.setter.rollup_quantity import RollupQuantity
from monasca_transform.component.usage.fetch_quantity import FetchQuantity
from monasca_transform.component.usage import UsageComponent
from monasca_transform.transform.transform_utils import InstanceUsageUtils
import json
class CalculateRateException(Exception):
"""Exception thrown when calculating rate
Attributes:
value: string representing the error
"""
def __init__(self, value):
self.value = value
def __str__(self):
return repr(self.value)
class CalculateRate(UsageComponent):
@staticmethod
def usage(transform_context, record_store_df):
"""Method to return instance usage dataframe:
It groups together record store records by
provided group by columns list,sorts within the group by event
timestamp field, calculates the rate of change between the
oldest and latest values, and returns the resultant value as an
instance usage dataframe
"""
instance_usage_data_json_list = []
transform_spec_df = transform_context.transform_spec_df_info
# get aggregated metric name
agg_params = transform_spec_df.select(
"aggregation_params_map.aggregated_metric_name"). \
collect()[0].asDict()
aggregated_metric_name = agg_params["aggregated_metric_name"]
# get aggregation period
agg_params = transform_spec_df.select(
"aggregation_params_map.aggregation_period").collect()[0].asDict()
aggregation_period = agg_params["aggregation_period"]
# Fetch the latest quantities
latest_instance_usage_df = \
FetchQuantity().usage_by_operation(transform_context,
record_store_df,
"avg")
# Roll up the latest quantities
latest_rolled_up_instance_usage_df = \
RollupQuantity().setter_by_operation(transform_context,
latest_instance_usage_df,
"sum")
# Fetch the oldest quantities
oldest_instance_usage_df = \
FetchQuantity().usage_by_operation(transform_context,
record_store_df,
"oldest")
# Roll up the oldest quantities
oldest_rolled_up_instance_usage_df = \
RollupQuantity().setter_by_operation(transform_context,
oldest_instance_usage_df,
"sum")
# Calculate the rate change by percentage
oldest_dict = oldest_rolled_up_instance_usage_df.collect()[0].asDict()
oldest_quantity = float(oldest_dict['quantity'])
latest_dict = latest_rolled_up_instance_usage_df.collect()[0].asDict()
latest_quantity = float(latest_dict['quantity'])
rate_percentage = \
((oldest_quantity - latest_quantity) / oldest_quantity) * 100
# create a new instance usage dict
instance_usage_dict = {"tenant_id":
latest_dict.get("tenant_id", "all"),
"user_id":
latest_dict.get("user_id", "all"),
"resource_uuid":
latest_dict.get("resource_uuid", "all"),
"geolocation":
latest_dict.get("geolocation", "all"),
"region":
latest_dict.get("region", "all"),
"zone":
latest_dict.get("zone", "all"),
"host":
latest_dict.get("host", "all"),
"project_id":
latest_dict.get("project_id", "all"),
"aggregated_metric_name":
aggregated_metric_name,
"quantity": rate_percentage,
"firstrecord_timestamp_unix":
oldest_dict["firstrecord_timestamp_unix"],
"firstrecord_timestamp_string":
oldest_dict["firstrecord_timestamp_string"],
"lastrecord_timestamp_unix":
latest_dict["lastrecord_timestamp_unix"],
"lastrecord_timestamp_string":
latest_dict["lastrecord_timestamp_string"],
"record_count": oldest_dict["record_count"] +
latest_dict["record_count"],
"usage_date": latest_dict["usage_date"],
"usage_hour": latest_dict["usage_hour"],
"usage_minute": latest_dict["usage_minute"],
"aggregation_period": aggregation_period,
"processing_meta":
{"event_type":
latest_dict.get("event_type",
Component.
DEFAULT_UNAVAILABLE_VALUE),
"oldest_timestamp_string":
oldest_dict[
"firstrecord_timestamp_string"],
"oldest_quantity": oldest_quantity,
"latest_timestamp_string":
latest_dict[
"lastrecord_timestamp_string"],
"latest_quantity": latest_quantity
}
}
instance_usage_data_json = json.dumps(instance_usage_dict)
instance_usage_data_json_list.append(instance_usage_data_json)
spark_context = record_store_df.rdd.context
instance_usage_rdd = \
spark_context.parallelize(instance_usage_data_json_list)
sql_context = SQLContext\
.getOrCreate(record_store_df.rdd.context)
instance_usage_df = InstanceUsageUtils.create_df_from_json_rdd(
sql_context,
instance_usage_rdd)
return instance_usage_df


@ -1,478 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from collections import namedtuple
import datetime
from pyspark.sql import functions
from pyspark.sql import SQLContext
from monasca_transform.component import Component
from monasca_transform.component.component_utils import ComponentUtils
from monasca_transform.component.usage import UsageComponent
from monasca_transform.transform.grouping.group_sort_by_timestamp \
import GroupSortbyTimestamp
from monasca_transform.transform.grouping.group_sort_by_timestamp_partition \
import GroupSortbyTimestampPartition
from monasca_transform.transform.transform_utils import InstanceUsageUtils
from monasca_transform.transform.transform_utils import RecordStoreUtils
import json
class FetchQuantityException(Exception):
"""Exception thrown when fetching quantity
Attributes:
value: string representing the error
"""
def __init__(self, value):
self.value = value
def __str__(self):
return repr(self.value)
GroupedDataNamedTuple = namedtuple("GroupedDataWithOperation",
["grouped_data",
"usage_fetch_operation",
"group_by_columns_list"])
class GroupedDataNamedTuple(GroupedDataNamedTuple):
"""A tuple which is a wrapper containing record store data and the usage operation
namedtuple contains:
grouped_data - grouped record store data
usage_fetch_operation - operation to be performed on grouped data
group_by_columns_list - list of group by columns
"""
class FetchQuantity(UsageComponent):
@staticmethod
def _supported_fetch_operations():
return ["sum", "max", "min", "avg", "latest", "oldest"]
@staticmethod
def _is_valid_fetch_operation(operation):
"""return true if its a valid fetch operation"""
if operation in FetchQuantity._supported_fetch_operations():
return True
else:
return False
@staticmethod
def _get_latest_oldest_quantity(grouped_data_named_tuple):
"""Get quantity for each group.
by performing the requested usage operation, and return instance usage data.
"""
# row
grouping_results = grouped_data_named_tuple.\
grouped_data
# usage fetch operation
usage_fetch_operation = grouped_data_named_tuple.\
usage_fetch_operation
# group_by_columns_list
group_by_columns_list = grouped_data_named_tuple.\
group_by_columns_list
group_by_dict = grouping_results.grouping_key_dict
#
tenant_id = group_by_dict.get("tenant_id",
Component.DEFAULT_UNAVAILABLE_VALUE)
resource_uuid = group_by_dict.get("resource_uuid",
Component.DEFAULT_UNAVAILABLE_VALUE)
user_id = group_by_dict.get("user_id",
Component.DEFAULT_UNAVAILABLE_VALUE)
geolocation = group_by_dict.get("geolocation",
Component.DEFAULT_UNAVAILABLE_VALUE)
region = group_by_dict.get("region",
Component.DEFAULT_UNAVAILABLE_VALUE)
zone = group_by_dict.get("zone", Component.DEFAULT_UNAVAILABLE_VALUE)
host = group_by_dict.get("host", Component.DEFAULT_UNAVAILABLE_VALUE)
usage_date = group_by_dict.get("event_date",
Component.DEFAULT_UNAVAILABLE_VALUE)
usage_hour = group_by_dict.get("event_hour",
Component.DEFAULT_UNAVAILABLE_VALUE)
usage_minute = group_by_dict.get("event_minute",
Component.DEFAULT_UNAVAILABLE_VALUE)
aggregated_metric_name = group_by_dict.get(
"aggregated_metric_name", Component.DEFAULT_UNAVAILABLE_VALUE)
# stats
agg_stats = grouping_results.results
# get quantity for this host
quantity = None
if (usage_fetch_operation == "latest"):
quantity = agg_stats["lastrecord_quantity"]
elif usage_fetch_operation == "oldest":
quantity = agg_stats["firstrecord_quantity"]
firstrecord_timestamp_unix = agg_stats["firstrecord_timestamp_unix"]
firstrecord_timestamp_string = \
agg_stats["firstrecord_timestamp_string"]
lastrecord_timestamp_unix = agg_stats["lastrecord_timestamp_unix"]
lastrecord_timestamp_string = agg_stats["lastrecord_timestamp_string"]
record_count = agg_stats["record_count"]
# aggregation period
aggregation_period = Component.DEFAULT_UNAVAILABLE_VALUE
# event type
event_type = group_by_dict.get("event_type",
Component.DEFAULT_UNAVAILABLE_VALUE)
# add group by fields data to extra data map
# get existing extra_data_map if any
extra_data_map = group_by_dict.get("extra_data_map", {})
for column_name in group_by_columns_list:
column_value = group_by_dict.get(column_name, Component.
DEFAULT_UNAVAILABLE_VALUE)
extra_data_map[column_name] = column_value
instance_usage_dict = {"tenant_id": tenant_id, "user_id": user_id,
"resource_uuid": resource_uuid,
"geolocation": geolocation, "region": region,
"zone": zone, "host": host,
"aggregated_metric_name":
aggregated_metric_name,
"quantity": quantity,
"firstrecord_timestamp_unix":
firstrecord_timestamp_unix,
"firstrecord_timestamp_string":
firstrecord_timestamp_string,
"lastrecord_timestamp_unix":
lastrecord_timestamp_unix,
"lastrecord_timestamp_string":
lastrecord_timestamp_string,
"record_count": record_count,
"usage_date": usage_date,
"usage_hour": usage_hour,
"usage_minute": usage_minute,
"aggregation_period": aggregation_period,
"processing_meta": {"event_type": event_type},
"extra_data_map": extra_data_map
}
instance_usage_data_json = json.dumps(instance_usage_dict)
return instance_usage_data_json
@staticmethod
def _get_quantity(grouped_data_named_tuple):
# row
row = grouped_data_named_tuple.grouped_data
# usage fetch operation
usage_fetch_operation = grouped_data_named_tuple.\
usage_fetch_operation
# group by columns list
group_by_columns_list = grouped_data_named_tuple.\
group_by_columns_list
# first record timestamp # FIXME: beginning of epoch?
earliest_record_timestamp_unix = getattr(
row, "min(event_timestamp_unix_for_min)",
Component.DEFAULT_UNAVAILABLE_VALUE)
earliest_record_timestamp_string = \
datetime.datetime.utcfromtimestamp(
earliest_record_timestamp_unix).strftime(
'%Y-%m-%d %H:%M:%S')
# last record_timestamp # FIXME: beginning of epoch?
latest_record_timestamp_unix = getattr(
row, "max(event_timestamp_unix_for_max)",
Component.DEFAULT_UNAVAILABLE_VALUE)
latest_record_timestamp_string = \
datetime.datetime.utcfromtimestamp(
latest_record_timestamp_unix).strftime('%Y-%m-%d %H:%M:%S')
# record count
record_count = getattr(row, "count(event_timestamp_unix)", 0.0)
# quantity
# get expression that will be used to select quantity
# from rolled up data
select_quant_str = "".join((usage_fetch_operation, "(event_quantity)"))
quantity = getattr(row, select_quant_str, 0.0)
# create a column name, value pairs from grouped data
extra_data_map = InstanceUsageUtils.grouped_data_to_map(row,
group_by_columns_list)
# convert column names, so that values can be accessed by components
# later in the pipeline
extra_data_map = InstanceUsageUtils.prepare_extra_data_map(extra_data_map)
# create a new instance usage dict
instance_usage_dict = {"tenant_id": getattr(row, "tenant_id",
Component.
DEFAULT_UNAVAILABLE_VALUE),
"user_id":
getattr(row, "user_id",
Component.
DEFAULT_UNAVAILABLE_VALUE),
"resource_uuid":
getattr(row, "resource_uuid",
Component.
DEFAULT_UNAVAILABLE_VALUE),
"geolocation":
getattr(row, "geolocation",
Component.
DEFAULT_UNAVAILABLE_VALUE),
"region":
getattr(row, "region",
Component.
DEFAULT_UNAVAILABLE_VALUE),
"zone":
getattr(row, "zone",
Component.
DEFAULT_UNAVAILABLE_VALUE),
"host":
getattr(row, "host",
Component.
DEFAULT_UNAVAILABLE_VALUE),
"project_id":
getattr(row, "tenant_id",
Component.
DEFAULT_UNAVAILABLE_VALUE),
"aggregated_metric_name":
getattr(row, "aggregated_metric_name",
Component.
DEFAULT_UNAVAILABLE_VALUE),
"quantity":
quantity,
"firstrecord_timestamp_unix":
earliest_record_timestamp_unix,
"firstrecord_timestamp_string":
earliest_record_timestamp_string,
"lastrecord_timestamp_unix":
latest_record_timestamp_unix,
"lastrecord_timestamp_string":
latest_record_timestamp_string,
"record_count": record_count,
"usage_date":
getattr(row, "event_date",
Component.
DEFAULT_UNAVAILABLE_VALUE),
"usage_hour":
getattr(row, "event_hour",
Component.
DEFAULT_UNAVAILABLE_VALUE),
"usage_minute":
getattr(row, "event_minute",
Component.
DEFAULT_UNAVAILABLE_VALUE),
"aggregation_period":
getattr(row, "aggregation_period",
Component.
DEFAULT_UNAVAILABLE_VALUE),
"processing_meta": {"event_type": getattr(
row, "event_type",
Component.DEFAULT_UNAVAILABLE_VALUE)},
"extra_data_map": extra_data_map
}
instance_usage_data_json = json.dumps(instance_usage_dict)
return instance_usage_data_json
@staticmethod
def usage(transform_context, record_store_df):
"""Method to return the latest quantity as an instance usage dataframe:
It groups together record store records by
provided group by columns list, sorts within the group by event
timestamp field, applies group stats udf and returns the latest
quantity as an instance usage dataframe
"""
transform_spec_df = transform_context.transform_spec_df_info
# get rollup operation (sum, max, avg, min)
agg_params = transform_spec_df.select(
"aggregation_params_map.usage_fetch_operation").\
collect()[0].asDict()
usage_fetch_operation = agg_params["usage_fetch_operation"]
instance_usage_df = FetchQuantity.usage_by_operation(
transform_context, record_store_df, usage_fetch_operation)
return instance_usage_df
@staticmethod
def usage_by_operation(transform_context, record_store_df,
usage_fetch_operation):
"""Returns the latest quantity as a instance usage dataframe
It groups together record store records by
provided group by columns list , sorts within the group by event
timestamp field, applies group stats udf and returns the latest
quantity as an instance usage dataframe
"""
transform_spec_df = transform_context.transform_spec_df_info
# check if operation is valid
if not FetchQuantity. \
_is_valid_fetch_operation(usage_fetch_operation):
raise FetchQuantityException(
"Operation %s is not supported" % usage_fetch_operation)
# get aggregation period
agg_params = transform_spec_df.select(
"aggregation_params_map.aggregation_period").collect()[0].asDict()
aggregation_period = agg_params["aggregation_period"]
group_by_period_list = ComponentUtils._get_group_by_period_list(
aggregation_period)
# retrieve filter specifications
agg_params = transform_spec_df.select(
"aggregation_params_map.filter_by_list"). \
collect()[0].asDict()
filter_by_list = \
agg_params["filter_by_list"]
# if filter(s) have been specified, apply them one at a time
if filter_by_list:
for filter_element in filter_by_list:
field_to_filter = filter_element["field_to_filter"]
filter_expression = filter_element["filter_expression"]
filter_operation = filter_element["filter_operation"]
if (field_to_filter and
filter_expression and
filter_operation and
(filter_operation == "include" or
filter_operation == "exclude")):
if filter_operation == "include":
match = True
else:
match = False
# apply the specified filter to the record store
record_store_df = record_store_df.where(
functions.col(str(field_to_filter)).rlike(
str(filter_expression)) == match)
else:
raise FetchQuantityException(
"Encountered invalid filter details: "
"field to filter = %s, filter expression = %s, "
"filter operation = %s. All values must be "
"supplied and filter operation must be either "
"'include' or 'exclude'." % (field_to_filter,
filter_expression,
filter_operation))
# get what we want to group by
agg_params = transform_spec_df.select(
"aggregation_params_map.aggregation_group_by_list"). \
collect()[0].asDict()
aggregation_group_by_list = agg_params["aggregation_group_by_list"]
# group by columns list
group_by_columns_list = group_by_period_list + \
aggregation_group_by_list
# prepare group by columns list
group_by_columns_list = RecordStoreUtils.prepare_recordstore_group_by_list(
group_by_columns_list)
instance_usage_json_rdd = None
if (usage_fetch_operation == "latest" or
usage_fetch_operation == "oldest"):
grouped_rows_rdd = None
# FIXME:
# select group by method
IS_GROUP_BY_PARTITION = False
if (IS_GROUP_BY_PARTITION):
# GroupSortbyTimestampPartition is a more scalable
# since it creates groups using repartitioning and sorting
# but is disabled
# number of groups should be more than what is expected
# this might be hard to guess. Setting this to a very
# high number is adversely affecting performance
num_of_groups = 100
grouped_rows_rdd = \
GroupSortbyTimestampPartition. \
fetch_group_latest_oldest_quantity(
record_store_df, transform_spec_df,
group_by_columns_list,
num_of_groups)
else:
# group using key-value pair RDD's groupByKey()
grouped_rows_rdd = \
GroupSortbyTimestamp. \
fetch_group_latest_oldest_quantity(
record_store_df, transform_spec_df,
group_by_columns_list)
grouped_data_rdd_with_operation = grouped_rows_rdd.map(
lambda x:
GroupedDataNamedTuple(x,
str(usage_fetch_operation),
group_by_columns_list))
instance_usage_json_rdd = \
grouped_data_rdd_with_operation.map(
FetchQuantity._get_latest_oldest_quantity)
else:
record_store_df_int = \
record_store_df.select(
record_store_df.event_timestamp_unix.alias(
"event_timestamp_unix_for_min"),
record_store_df.event_timestamp_unix.alias(
"event_timestamp_unix_for_max"),
"*")
# for standard sum, max, min, avg operations on grouped data
agg_operations_map = {
"event_quantity": str(usage_fetch_operation),
"event_timestamp_unix_for_min": "min",
"event_timestamp_unix_for_max": "max",
"event_timestamp_unix": "count"}
# do a group by
grouped_data = record_store_df_int.groupBy(*group_by_columns_list)
grouped_record_store_df = grouped_data.agg(agg_operations_map)
grouped_data_rdd_with_operation = grouped_record_store_df.rdd.map(
lambda x:
GroupedDataNamedTuple(x,
str(usage_fetch_operation),
group_by_columns_list))
instance_usage_json_rdd = grouped_data_rdd_with_operation.map(
FetchQuantity._get_quantity)
sql_context = SQLContext.getOrCreate(record_store_df.rdd.context)
instance_usage_df = \
InstanceUsageUtils.create_df_from_json_rdd(sql_context,
instance_usage_json_rdd)
return instance_usage_df
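A minimal local sketch (assuming `pyspark` is installed) of the include/exclude filter applied above: rows are kept when the regular-expression match result equals the desired flag:

```python
# Minimal sketch: the rlike-based include/exclude filter used for
# filter_by_list entries, over toy data.
from pyspark.sql import SparkSession, functions

spark = (SparkSession.builder.master("local[1]")
         .appName("filter-sketch").getOrCreate())
df = spark.createDataFrame([("mini-mon",), ("devstack",)], ["host"])

match = True  # "include"; use False for "exclude"
filtered = df.where(functions.col("host").rlike("^mini") == match)
filtered.show()  # only the mini-mon row for "include"
spark.stop()
```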


@ -1,280 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from pyspark.sql.functions import col
from pyspark.sql.functions import when
from pyspark.sql import SQLContext
from monasca_transform.component import Component
from monasca_transform.component.component_utils import ComponentUtils
from monasca_transform.component.usage.fetch_quantity import FetchQuantity
from monasca_transform.component.usage import UsageComponent
from monasca_transform.transform.transform_utils import InstanceUsageUtils
import json
class FetchQuantityUtilException(Exception):
"""Exception thrown when fetching quantity
Attributes:
value: string representing the error
"""
def __init__(self, value):
self.value = value
def __str__(self):
return repr(self.value)
class FetchQuantityUtil(UsageComponent):
@staticmethod
def _supported_fetch_quantity_util_operations():
# The results of "sum", "max", and "min" don't make sense and/or
# may be misleading (the latter two due to the metrics which are
# used as input to the utilization calculation potentially not
# being from the same time period...e.g., one being from the
# beginning of the streaming interval and the other being from
# the end).
return ["avg", "latest", "oldest"]
@staticmethod
def _is_valid_fetch_quantity_util_operation(operation):
"""return true if its a valid fetch operation"""
if operation in FetchQuantityUtil.\
_supported_fetch_quantity_util_operations():
return True
else:
return False
@staticmethod
def _format_quantity_util(row):
"""Converts calculated utilized quantity to an instance usage format
Calculation based on idle percentage
"""
#
tenant_id = getattr(row, "tenant_id", "all")
resource_uuid = getattr(row, "resource_uuid",
Component.DEFAULT_UNAVAILABLE_VALUE)
user_id = getattr(row, "user_id",
Component.DEFAULT_UNAVAILABLE_VALUE)
geolocation = getattr(row, "geolocation",
Component.DEFAULT_UNAVAILABLE_VALUE)
region = getattr(row, "region", Component.DEFAULT_UNAVAILABLE_VALUE)
zone = getattr(row, "zone", Component.DEFAULT_UNAVAILABLE_VALUE)
host = getattr(row, "host", "all")
usage_date = getattr(row, "usage_date",
Component.DEFAULT_UNAVAILABLE_VALUE)
usage_hour = getattr(row, "usage_hour",
Component.DEFAULT_UNAVAILABLE_VALUE)
usage_minute = getattr(row, "usage_minute",
Component.DEFAULT_UNAVAILABLE_VALUE)
aggregated_metric_name = getattr(row, "aggregated_metric_name",
Component.DEFAULT_UNAVAILABLE_VALUE)
# get utilized quantity
quantity = row.utilized_quantity
firstrecord_timestamp_unix = \
getattr(row, "firstrecord_timestamp_unix",
Component.DEFAULT_UNAVAILABLE_VALUE)
firstrecord_timestamp_string = \
getattr(row, "firstrecord_timestamp_string",
Component.DEFAULT_UNAVAILABLE_VALUE)
lastrecord_timestamp_unix = \
getattr(row, "lastrecord_timestamp_unix",
Component.DEFAULT_UNAVAILABLE_VALUE)
lastrecord_timestamp_string = \
getattr(row, "lastrecord_timestamp_string",
Component.DEFAULT_UNAVAILABLE_VALUE)
record_count = getattr(row, "record_count",
Component.DEFAULT_UNAVAILABLE_VALUE)
# aggregation period
aggregation_period = Component.DEFAULT_UNAVAILABLE_VALUE
# get extra_data_map, if any
extra_data_map = getattr(row, "extra_data_map", {})
# filter out event_type
extra_data_map_filtered = \
{key: extra_data_map[key] for key in list(extra_data_map)
if key != 'event_type'}
instance_usage_dict = {"tenant_id": tenant_id, "user_id": user_id,
"resource_uuid": resource_uuid,
"geolocation": geolocation, "region": region,
"zone": zone, "host": host,
"aggregated_metric_name":
aggregated_metric_name,
"quantity": quantity,
"firstrecord_timestamp_unix":
firstrecord_timestamp_unix,
"firstrecord_timestamp_string":
firstrecord_timestamp_string,
"lastrecord_timestamp_unix":
lastrecord_timestamp_unix,
"lastrecord_timestamp_string":
lastrecord_timestamp_string,
"record_count": record_count,
"usage_date": usage_date,
"usage_hour": usage_hour,
"usage_minute": usage_minute,
"aggregation_period": aggregation_period,
"extra_data_map": extra_data_map_filtered}
instance_usage_data_json = json.dumps(instance_usage_dict)
return instance_usage_data_json
@staticmethod
def usage(transform_context, record_store_df):
"""Method to return instance usage dataframe:
It groups together record store records by
provided group by columns list, sorts within the group by event
timestamp field, applies group stats udf and returns the latest
quantity as a instance usage dataframe
This component does groups records by event_type (a.k.a metric name)
and expects two kinds of records in record_store data
total quantity records - the total available quantity
e.g. cpu.total_logical_cores
idle perc records - percentage that is idle
e.g. cpu.idle_perc
To calculate the utilized quantity this component uses following
formula:
utilized quantity = (100 - idle_perc) * total_quantity / 100
"""
sql_context = SQLContext.getOrCreate(record_store_df.rdd.context)
transform_spec_df = transform_context.transform_spec_df_info
# get the usage fetch operation (avg, latest or oldest)
agg_params = transform_spec_df.select(
"aggregation_params_map.usage_fetch_operation"). \
collect()[0].asDict()
usage_fetch_operation = agg_params["usage_fetch_operation"]
# check if operation is valid
if not FetchQuantityUtil. \
_is_valid_fetch_quantity_util_operation(usage_fetch_operation):
raise FetchQuantityUtilException(
"Operation %s is not supported" % usage_fetch_operation)
# fetch the quantity records (both total quantity and idle perc)
instance_usage_df = FetchQuantity().usage(
transform_context, record_store_df)
# get aggregation period for instance usage dataframe
agg_params = transform_spec_df.select(
"aggregation_params_map.aggregation_period").collect()[0].asDict()
aggregation_period = agg_params["aggregation_period"]
group_by_period_list = ComponentUtils.\
_get_instance_group_by_period_list(aggregation_period)
# get what we want to group by
agg_params = transform_spec_df.select(
"aggregation_params_map.aggregation_group_by_list").\
collect()[0].asDict()
aggregation_group_by_list = agg_params["aggregation_group_by_list"]
# group by columns list
group_by_columns_list = group_by_period_list + \
aggregation_group_by_list
# get quantity event type
agg_params = transform_spec_df.select(
"aggregation_params_map.usage_fetch_util_quantity_event_type").\
collect()[0].asDict()
usage_fetch_util_quantity_event_type = \
agg_params["usage_fetch_util_quantity_event_type"]
# check if driver parameter is provided
if usage_fetch_util_quantity_event_type is None or \
usage_fetch_util_quantity_event_type == "":
raise FetchQuantityUtilException(
"Driver parameter '%s' is missing"
% "usage_fetch_util_quantity_event_type")
# get idle perc event type
agg_params = transform_spec_df.select(
"aggregation_params_map.usage_fetch_util_idle_perc_event_type").\
collect()[0].asDict()
usage_fetch_util_idle_perc_event_type = \
agg_params["usage_fetch_util_idle_perc_event_type"]
# check if driver parameter is provided
if usage_fetch_util_idle_perc_event_type is None or \
usage_fetch_util_idle_perc_event_type == "":
raise FetchQuantityUtilException(
"Driver parameter '%s' is missing"
% "usage_fetch_util_idle_perc_event_type")
# get quantity records dataframe
event_type_quantity_clause = "processing_meta.event_type='%s'" \
% usage_fetch_util_quantity_event_type
quantity_df = instance_usage_df.select('*').where(
event_type_quantity_clause).alias("quantity_df_alias")
# get idle perc records dataframe
event_type_idle_perc_clause = "processing_meta.event_type='%s'" \
% usage_fetch_util_idle_perc_event_type
idle_perc_df = instance_usage_df.select('*').where(
event_type_idle_perc_clause).alias("idle_perc_df_alias")
# join quantity records with idle perc records
# create a join condition without the event_type
cond = [item for item in group_by_columns_list
if item != 'event_type']
quant_idle_perc_df = quantity_df.join(idle_perc_df, cond, 'left')
#
# Find utilized quantity based on idle percentage
#
# utilized quantity = (100 - idle_perc) * total_quantity / 100
#
quant_idle_perc_calc_df = quant_idle_perc_df.select(
col("quantity_df_alias.*"),
when(col("idle_perc_df_alias.quantity") != 0.0,
(100.0 - col(
"idle_perc_df_alias.quantity")) * col(
"quantity_df_alias.quantity") / 100.0)
.otherwise(col("quantity_df_alias.quantity"))
.alias("utilized_quantity"),
col("quantity_df_alias.quantity")
.alias("total_quantity"),
col("idle_perc_df_alias.quantity")
.alias("idle_perc"))
instance_usage_json_rdd = \
quant_idle_perc_calc_df.rdd.map(
FetchQuantityUtil._format_quantity_util)
instance_usage_df = \
InstanceUsageUtils.create_df_from_json_rdd(sql_context,
instance_usage_json_rdd)
return instance_usage_df
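
A hedged, standalone illustration of the utilization formula documented above (all numbers made up): with total_quantity = 8 logical cores and idle_perc = 75, utilized quantity = (100 - 75) * 8 / 100 = 2.0. The same when()/otherwise() expression on toy data, outside the component:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when

spark = SparkSession.builder.master("local[1]").appName("util-sketch").getOrCreate()
# hypothetical per-host totals and idle percentages
df = spark.createDataFrame([("host1", 8.0, 75.0), ("host2", 4.0, 0.0)],
                           ["host", "total_quantity", "idle_perc"])
# utilized quantity = (100 - idle_perc) * total_quantity / 100,
# falling back to the total when idle_perc is 0 (the otherwise() branch)
df.select("host",
          when(col("idle_perc") != 0.0,
               (100.0 - col("idle_perc")) * col("total_quantity") / 100.0)
          .otherwise(col("total_quantity"))
          .alias("utilized_quantity")).show()  # host1 -> 2.0, host2 -> 4.0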


@ -1,154 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from oslo_config import cfg
class ConfigInitializer(object):
@staticmethod
def basic_config(default_config_files=None):
cfg.CONF.reset()
ConfigInitializer.load_repositories_options()
ConfigInitializer.load_database_options()
ConfigInitializer.load_messaging_options()
ConfigInitializer.load_service_options()
ConfigInitializer.load_stage_processors_options()
ConfigInitializer.load_pre_hourly_processor_options()
if not default_config_files:
default_config_files = ['/etc/monasca-transform.conf']
cfg.CONF(args=[],
project='monasca_transform',
default_config_files=default_config_files)
@staticmethod
def load_repositories_options():
repo_opts = [
cfg.StrOpt(
'offsets',
default='monasca_transform.offset_specs:JSONOffsetSpecs',
help='Repository for offset persistence'
),
cfg.StrOpt(
'data_driven_specs',
default='monasca_transform.data_driven_specs.'
'json_data_driven_specs_repo:JSONDataDrivenSpecsRepo',
help='Repository for metric and event data_driven_specs'
),
cfg.IntOpt('offsets_max_revisions', default=10,
help="Max revisions of offsets for each application")
]
repo_group = cfg.OptGroup(name='repositories', title='repositories')
cfg.CONF.register_group(repo_group)
cfg.CONF.register_opts(repo_opts, group=repo_group)
@staticmethod
def load_database_options():
db_opts = [
cfg.StrOpt('server_type'),
cfg.StrOpt('host'),
cfg.StrOpt('database_name'),
cfg.StrOpt('username'),
cfg.StrOpt('password'),
cfg.BoolOpt('use_ssl', default=False),
cfg.StrOpt('ca_file')
]
mysql_group = cfg.OptGroup(name='database', title='database')
cfg.CONF.register_group(mysql_group)
cfg.CONF.register_opts(db_opts, group=mysql_group)
@staticmethod
def load_messaging_options():
messaging_options = [
cfg.StrOpt('adapter',
default='monasca_transform.messaging.adapter:'
'KafkaMessageAdapter',
help='Message adapter implementation'),
cfg.StrOpt('topic', default='metrics',
help='Messaging topic'),
cfg.StrOpt('brokers',
default='192.168.10.4:9092',
help='Messaging brokers'),
cfg.StrOpt('publish_kafka_project_id',
default='111111',
help='publish aggregated metrics tenant'),
cfg.StrOpt('publish_region',
default='useast',
help='publish aggregated metrics region'),
cfg.StrOpt('adapter_pre_hourly',
default='monasca_transform.messaging.adapter:'
'KafkaMessageAdapterPreHourly',
help='Message adapter implementation'),
cfg.StrOpt('topic_pre_hourly', default='metrics_pre_hourly',
help='Messaging topic pre hourly')
]
messaging_group = cfg.OptGroup(name='messaging', title='messaging')
cfg.CONF.register_group(messaging_group)
cfg.CONF.register_opts(messaging_options, group=messaging_group)
@staticmethod
def load_service_options():
service_opts = [
cfg.StrOpt('coordinator_address'),
cfg.StrOpt('coordinator_group'),
cfg.FloatOpt('election_polling_frequency'),
cfg.BoolOpt('enable_debug_log_entries', default=False),
cfg.StrOpt('setup_file'),
cfg.StrOpt('setup_target'),
cfg.StrOpt('spark_driver'),
cfg.StrOpt('service_log_path'),
cfg.StrOpt('service_log_filename',
default='monasca-transform.log'),
cfg.StrOpt('spark_event_logging_dest'),
cfg.StrOpt('spark_event_logging_enabled'),
cfg.StrOpt('spark_jars_list'),
cfg.StrOpt('spark_master_list'),
cfg.StrOpt('spark_python_files'),
cfg.IntOpt('stream_interval'),
cfg.StrOpt('work_dir'),
cfg.StrOpt('spark_home'),
cfg.BoolOpt('enable_record_store_df_cache'),
cfg.StrOpt('record_store_df_cache_storage_level')
]
service_group = cfg.OptGroup(name='service', title='service')
cfg.CONF.register_group(service_group)
cfg.CONF.register_opts(service_opts, group=service_group)
@staticmethod
def load_stage_processors_options():
app_opts = [
cfg.BoolOpt('pre_hourly_processor_enabled'),
]
app_group = cfg.OptGroup(name='stage_processors',
title='stage_processors')
cfg.CONF.register_group(app_group)
cfg.CONF.register_opts(app_opts, group=app_group)
@staticmethod
def load_pre_hourly_processor_options():
app_opts = [
cfg.IntOpt('late_metric_slack_time', default=600),
cfg.StrOpt('data_provider',
default='monasca_transform.processor.'
'pre_hourly_processor:'
'PreHourlyProcessorDataProvider'),
cfg.BoolOpt('enable_instance_usage_df_cache'),
cfg.StrOpt('instance_usage_df_cache_storage_level'),
cfg.BoolOpt('enable_batch_time_filtering'),
cfg.IntOpt('effective_batch_revision', default=2)
]
app_group = cfg.OptGroup(name='pre_hourly_processor',
title='pre_hourly_processor')
cfg.CONF.register_group(app_group)
cfg.CONF.register_opts(app_opts, group=app_group)
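
A minimal usage sketch of the initializer above, assuming a deployed installation where /etc/monasca-transform.conf (the default assumed by basic_config) exists; the printed values depend entirely on that configuration.

from oslo_config import cfg
from monasca_transform.config.config_initializer import ConfigInitializer

# registers the repositories/database/messaging/service/stage_processors/
# pre_hourly_processor option groups, then loads the default config file
ConfigInitializer.basic_config()
print(cfg.CONF.messaging.brokers)                    # e.g. 192.168.10.4:9092
print(cfg.CONF.repositories.offsets_max_revisions)   # 10 unless overridden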


@ -1,43 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import abc
from monasca_common.simport import simport
from oslo_config import cfg
import six
class DataDrivenSpecsRepoFactory(object):
data_driven_specs_repo = None
@staticmethod
def get_data_driven_specs_repo():
if not DataDrivenSpecsRepoFactory.data_driven_specs_repo:
DataDrivenSpecsRepoFactory.data_driven_specs_repo = simport.load(
cfg.CONF.repositories.data_driven_specs)()
return DataDrivenSpecsRepoFactory.data_driven_specs_repo
@six.add_metaclass(abc.ABCMeta)
class DataDrivenSpecsRepo(object):
transform_specs_type = 'transform_specs'
pre_transform_specs_type = 'pre_transform_specs'
@abc.abstractmethod
def get_data_driven_specs(self, sql_context=None,
data_driven_spec_type=None):
raise NotImplementedError(
"Class %s doesn't implement get_data_driven_specs("
"self, sql_context=None, data_driven_spec_type=None)"
% self.__class__.__name__)


@ -1,76 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import json
from pyspark.sql import DataFrameReader
from monasca_transform.data_driven_specs.data_driven_specs_repo \
import DataDrivenSpecsRepo
from monasca_transform.db.db_utils import DbUtil
class MySQLDataDrivenSpecsRepo(DataDrivenSpecsRepo):
transform_specs_data_frame = None
pre_transform_specs_data_frame = None
def get_data_driven_specs(self, sql_context=None,
data_driven_spec_type=None):
data_driven_spec = None
if self.transform_specs_type == data_driven_spec_type:
if not self.transform_specs_data_frame:
self.generate_transform_specs_data_frame(
spark_context=sql_context._sc,
sql_context=sql_context)
data_driven_spec = self.transform_specs_data_frame
elif self.pre_transform_specs_type == data_driven_spec_type:
if not self.pre_transform_specs_data_frame:
self.generate_pre_transform_specs_data_frame(
spark_context=sql_context._sc,
sql_context=sql_context)
data_driven_spec = self.pre_transform_specs_data_frame
return data_driven_spec
def generate_transform_specs_data_frame(self, spark_context=None,
sql_context=None):
data_frame_reader = DataFrameReader(sql_context)
transform_specs_data_frame = data_frame_reader.jdbc(
DbUtil.get_java_db_connection_string(),
'transform_specs'
)
data = []
for item in transform_specs_data_frame.collect():
spec = json.loads(item['transform_spec'])
data.append(json.dumps(spec))
data_frame = sql_context.read.json(spark_context.parallelize(data))
self.transform_specs_data_frame = data_frame
def generate_pre_transform_specs_data_frame(self, spark_context=None,
sql_context=None):
data_frame_reader = DataFrameReader(sql_context)
pre_transform_specs_data_frame = data_frame_reader.jdbc(
DbUtil.get_java_db_connection_string(),
'pre_transform_specs'
)
data = []
for item in pre_transform_specs_data_frame.collect():
spec = json.loads(item['pre_transform_spec'])
data.append(json.dumps(spec))
data_frame = sql_context.read.json(spark_context.parallelize(data))
self.pre_transform_specs_data_frame = data_frame


@ -1,17 +0,0 @@
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"mem.total_mb","metric_id_list":["mem_total_all"],"required_raw_fields_list":["creation_time"]}
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"mem.usable_mb","metric_id_list":["mem_usable_all"],"required_raw_fields_list":["creation_time"]}
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"vm.mem.total_mb","metric_id_list":["vm_mem_total_mb_all","vm_mem_total_mb_project"],"required_raw_fields_list":["creation_time","dimensions#tenant_id","dimensions#resource_id"]}
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"vm.mem.used_mb","metric_id_list":["vm_mem_used_mb_all","vm_mem_used_mb_project"],"required_raw_fields_list":["creation_time","dimensions#tenant_id","dimensions#resource_id"]}
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"nova.vm.mem.total_allocated_mb","metric_id_list":["nova_vm_mem_total_all"],"required_raw_fields_list":["creation_time"]}
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"disk.total_space_mb","metric_id_list":["disk_total_all"],"required_raw_fields_list":["creation_time"]}
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"disk.total_used_space_mb","metric_id_list":["disk_usable_all"],"required_raw_fields_list":["creation_time"]}
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"nova.vm.disk.total_allocated_gb","metric_id_list":["nova_disk_total_allocated_gb_all"],"required_raw_fields_list":["creation_time"]}
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"vm.disk.allocation","metric_id_list":["vm_disk_allocation_all","vm_disk_allocation_project"],"required_raw_fields_list":["creation_time","dimensions#tenant_id","dimensions#resource_id"]}
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"cpu.total_logical_cores","metric_id_list":["cpu_total_all","cpu_total_host","cpu_util_all","cpu_util_host"],"required_raw_fields_list":["creation_time"]}
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"cpu.idle_perc","metric_id_list":["cpu_util_all","cpu_util_host"],"required_raw_fields_list":["creation_time"]}
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"vcpus","metric_id_list":["vcpus_all","vcpus_project"],"required_raw_fields_list":["creation_time","dimensions#project_id","dimensions#resource_id"]}
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"vm.cpu.utilization_perc","metric_id_list":["vm_cpu_util_perc_project"],"required_raw_fields_list":["creation_time","dimensions#tenant_id","dimensions#resource_id"]}
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"nova.vm.cpu.total_allocated","metric_id_list":["nova_vm_cpu_total_all"],"required_raw_fields_list":["creation_time"]}
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"swiftlm.diskusage.host.val.size","metric_id_list":["swift_total_all","swift_total_host"],"required_raw_fields_list":["creation_time", "dimensions#hostname", "dimensions#mount"]}
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"swiftlm.diskusage.host.val.avail","metric_id_list":["swift_avail_all","swift_avail_host","swift_usage_rate"],"required_raw_fields_list":["creation_time", "dimensions#hostname", "dimensions#mount"]}
{"event_processing_params":{"set_default_zone_to":"1","set_default_geolocation_to":"1","set_default_region_to":"W"},"event_type":"storage.objects.size","metric_id_list":["storage_objects_size_all"],"required_raw_fields_list":["creation_time", "dimensions#project_id"]}


@ -1,26 +0,0 @@
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"mem.total_mb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"mem_total_all","metric_id":"mem_total_all"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"mem.usable_mb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"mem_usable_all","metric_id":"mem_usable_all"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"vm.mem.total_mb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id", "resource_uuid"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"vm_mem_total_mb_all","metric_id":"vm_mem_total_mb_all"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"vm.mem.total_mb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id", "resource_uuid"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":["tenant_id"],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"vm_mem_total_mb_project","metric_id":"vm_mem_total_mb_project"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"vm.mem.used_mb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id", "resource_uuid"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"vm_mem_used_mb_all","metric_id":"vm_mem_used_mb_all"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"vm.mem.used_mb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id", "resource_uuid"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":["tenant_id"],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"vm_mem_used_mb_project","metric_id":"vm_mem_used_mb_project"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"nova.vm.mem.total_allocated_mb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list": [],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"nova_vm_mem_total_all","metric_id":"nova_vm_mem_total_all"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"disk.total_space_mb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"disk_total_all","metric_id":"disk_total_all"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"disk.total_used_space_mb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"disk_usable_all","metric_id":"disk_usable_all"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"nova.vm.disk.total_allocated_gb_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"nova_disk_total_allocated_gb_all","metric_id":"nova_disk_total_allocated_gb_all"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"vm.disk.allocation_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id", "resource_uuid"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"vm_disk_allocation_all","metric_id":"vm_disk_allocation_all"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"vm.disk.allocation_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id", "resource_uuid"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":["tenant_id"],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"vm_disk_allocation_project","metric_id":"vm_disk_allocation_project"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"cpu.total_logical_cores_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list": [],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"cpu_total_all","metric_id":"cpu_total_all"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"cpu.total_logical_cores_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":["host"],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"cpu_total_host","metric_id":"cpu_total_host"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity_util","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"cpu.utilized_logical_cores_agg","aggregation_period":"hourly","aggregation_group_by_list": ["event_type", "host"],"usage_fetch_operation": "avg","usage_fetch_util_quantity_event_type": "cpu.total_logical_cores","usage_fetch_util_idle_perc_event_type": "cpu.idle_perc","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"cpu_util_all","metric_id":"cpu_util_all"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity_util","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"cpu.utilized_logical_cores_agg","aggregation_period":"hourly","aggregation_group_by_list": ["event_type", "host"],"usage_fetch_operation": "avg","usage_fetch_util_quantity_event_type": "cpu.total_logical_cores","usage_fetch_util_idle_perc_event_type": "cpu.idle_perc","filter_by_list": [],"setter_rollup_group_by_list":["host"],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"cpu_util_host","metric_id":"cpu_util_host"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"vcpus_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "dimensions#tenant_id", "dimensions#resource_uuid"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"vcpus_all","metric_id":"vcpus_all"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"vcpus_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id", "resource_uuid"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":["tenant_id"],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"vcpus_project","metric_id":"vcpus_project"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"vm.cpu.utilization_perc_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "tenant_id", "resource_uuid"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":["tenant_id"],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"vm_cpu_util_perc_project","metric_id":"vm_cpu_util_perc_project"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"nova.vm.cpu.total_allocated_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list": [],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"nova_vm_cpu_total_all","metric_id":"nova_vm_cpu_total_all"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"swiftlm.diskusage.val.size_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "dimensions#mount"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"swift_total_all","metric_id":"swift_total_all"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"swiftlm.diskusage.val.size_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "dimensions#mount"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":["host"],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"swift_total_host","metric_id":"swift_total_host"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"swiftlm.diskusage.val.avail_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "dimensions#mount"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"swift_avail_all","metric_id":"swift_avail_all"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"swiftlm.diskusage.val.avail_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "dimensions#mount"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":["host"],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"avg","pre_hourly_group_by_list":["default"]},"metric_group":"swift_avail_host","metric_id":"swift_avail_host"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"calculate_rate","setters":["set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"swiftlm.diskusage.rate_agg","aggregation_period":"hourly","aggregation_group_by_list": ["host", "metric_id", "dimensions#mount"],"filter_by_list": [],"setter_rollup_group_by_list": [],"dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"rate","pre_hourly_group_by_list":["default"]},"metric_group":"swift_avail_rate","metric_id":"swift_usage_rate"}
{"aggregation_params_map":{"aggregation_pipeline":{"source":"streaming","usage":"fetch_quantity","setters":["rollup_quantity","set_aggregated_metric_name","set_aggregated_period"],"insert":["prepare_data","insert_data_pre_hourly"]},"aggregated_metric_name":"storage.objects.size_agg","aggregation_period":"hourly","aggregation_group_by_list": ["metric_id", "tenant_id"],"usage_fetch_operation": "avg","filter_by_list": [],"setter_rollup_group_by_list":[],"setter_rollup_operation": "sum","dimension_list":["aggregation_period","host","project_id"],"pre_hourly_operation":"sum","pre_hourly_group_by_list":["default"]},"metric_group":"storage_objects_size_all","metric_id":"storage_objects_size_all"}


@ -1,54 +0,0 @@
# (c) Copyright 2016 Hewlett Packard Enterprise Development LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from oslo_config import cfg
class DbUtil(object):
@staticmethod
def get_python_db_connection_string(config=cfg.CONF):
database_name = config.database.database_name
database_server = config.database.host
database_uid = config.database.username
database_pwd = config.database.password
if config.database.use_ssl:
db_ssl = "?ssl_ca=%s" % config.database.ca_file
else:
db_ssl = ''
return 'mysql+pymysql://%s:%s@%s/%s%s' % (
database_uid,
database_pwd,
database_server,
database_name,
db_ssl)
@staticmethod
def get_java_db_connection_string(config=cfg.CONF):
ssl_params = ''
if config.database.use_ssl:
ssl_params = "&useSSL=%s&requireSSL=%s" % (
config.database.use_ssl, config.database.use_ssl
)
# FIXME I don't like this, find a better way of managing the conn
return 'jdbc:%s://%s/%s?user=%s&password=%s%s' % (
config.database.server_type,
config.database.host,
config.database.database_name,
config.database.username,
config.database.password,
ssl_params,
)
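
The two helpers above only format connection strings from the [database] options. A standalone sketch with placeholder credentials (no real database or config file involved); the option registration mirrors ConfigInitializer.load_database_options():

from oslo_config import cfg
from monasca_transform.db.db_utils import DbUtil

conf = cfg.ConfigOpts()
db_group = cfg.OptGroup(name='database', title='database')
conf.register_group(db_group)
conf.register_opts([cfg.StrOpt('server_type'), cfg.StrOpt('host'),
                    cfg.StrOpt('database_name'), cfg.StrOpt('username'),
                    cfg.StrOpt('password'),
                    cfg.BoolOpt('use_ssl', default=False),
                    cfg.StrOpt('ca_file')], group=db_group)
conf(args=[], default_config_files=[])
for opt, value in [('server_type', 'mysql'), ('host', 'localhost'),
                   ('database_name', 'monasca_transform'),
                   ('username', 'm_transform'), ('password', 'secret')]:
    conf.set_override(opt, value, group='database')
print(DbUtil.get_python_db_connection_string(config=conf))
# -> mysql+pymysql://m_transform:secret@localhost/monasca_transform
print(DbUtil.get_java_db_connection_string(config=conf))
# -> jdbc:mysql://localhost/monasca_transform?user=m_transform&password=secret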


@ -1,570 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from pyspark import SparkConf
from pyspark import SparkContext
from pyspark.streaming.kafka import KafkaUtils
from pyspark.streaming.kafka import TopicAndPartition
from pyspark.streaming import StreamingContext
from pyspark.sql.functions import explode
from pyspark.sql.functions import from_unixtime
from pyspark.sql.functions import when
from pyspark.sql import SQLContext
import logging
from monasca_common.simport import simport
from oslo_config import cfg
import time
from monasca_transform.component.usage.fetch_quantity import \
FetchQuantityException
from monasca_transform.component.usage.fetch_quantity_util import \
FetchQuantityUtilException
from monasca_transform.config.config_initializer import ConfigInitializer
from monasca_transform.log_utils import LogUtils
from monasca_transform.transform.builder.generic_transform_builder \
import GenericTransformBuilder
from monasca_transform.data_driven_specs.data_driven_specs_repo \
import DataDrivenSpecsRepo
from monasca_transform.data_driven_specs.data_driven_specs_repo \
import DataDrivenSpecsRepoFactory
from monasca_transform.processor.pre_hourly_processor import PreHourlyProcessor
from monasca_transform.transform import RddTransformContext
from monasca_transform.transform.storage_utils import \
InvalidCacheStorageLevelException
from monasca_transform.transform.storage_utils import StorageUtils
from monasca_transform.transform.transform_utils import MonMetricUtils
from monasca_transform.transform.transform_utils import PreTransformSpecsUtils
from monasca_transform.transform import TransformContextUtils
ConfigInitializer.basic_config()
log = LogUtils.init_logger(__name__)
class MonMetricsKafkaProcessor(object):
@staticmethod
def log_debug(message):
print(message)
log.debug(message)
@staticmethod
def store_offset_ranges(batch_time, rdd):
if rdd.isEmpty():
MonMetricsKafkaProcessor.log_debug(
"storeOffsetRanges: nothing to process...")
return rdd
else:
my_offset_ranges = rdd.offsetRanges()
transform_context = \
TransformContextUtils.get_context(offset_info=my_offset_ranges,
batch_time_info=batch_time
)
rdd_transform_context = \
rdd.map(lambda x: RddTransformContext(x, transform_context))
return rdd_transform_context
@staticmethod
def print_offset_ranges(my_offset_ranges):
for o in my_offset_ranges:
print("printOffSetRanges: %s %s %s %s" % (
o.topic, o.partition, o.fromOffset, o.untilOffset))
@staticmethod
def get_kafka_stream(topic, streaming_context):
offset_specifications = simport.load(cfg.CONF.repositories.offsets)()
app_name = streaming_context.sparkContext.appName
saved_offset_spec = offset_specifications.get_kafka_offsets(app_name)
if len(saved_offset_spec) < 1:
MonMetricsKafkaProcessor.log_debug(
"No saved offsets available..."
"connecting to kafka without specifying offsets")
kvs = KafkaUtils.createDirectStream(
streaming_context, [topic],
{"metadata.broker.list": cfg.CONF.messaging.brokers})
return kvs
else:
from_offsets = {}
for key, value in saved_offset_spec.items():
if key.startswith("%s_%s" % (app_name, topic)):
# spec_app_name = value.get_app_name()
spec_topic = value.get_topic()
spec_partition = int(value.get_partition())
# spec_from_offset = value.get_from_offset()
spec_until_offset = value.get_until_offset()
# composite_key = "%s_%s_%s" % (spec_app_name,
# spec_topic,
# spec_partition)
# partition = saved_offset_spec[composite_key]
from_offsets[
TopicAndPartition(spec_topic, spec_partition)
] = int(spec_until_offset)
MonMetricsKafkaProcessor.log_debug(
"get_kafka_stream: calling createDirectStream :"
" topic:{%s} : start " % topic)
for key, value in from_offsets.items():
MonMetricsKafkaProcessor.log_debug(
"get_kafka_stream: calling createDirectStream : "
"offsets : TopicAndPartition:{%s,%s}, value:{%s}" %
(str(key._topic), str(key._partition), str(value)))
MonMetricsKafkaProcessor.log_debug(
"get_kafka_stream: calling createDirectStream : "
"topic:{%s} : done" % topic)
kvs = KafkaUtils.createDirectStream(
streaming_context, [topic],
{"metadata.broker.list": cfg.CONF.messaging.brokers},
from_offsets)
return kvs
@staticmethod
def save_rdd_contents(rdd):
file_name = "".join((
"/vagrant_home/uniq_metrics",
'-', time.strftime("%Y-%m-%d-%H-%M-%S"),
'-', str(rdd.id),
'.log'))
rdd.saveAsTextFile(file_name)
@staticmethod
def save_kafka_offsets(current_offsets, app_name,
batch_time_info):
"""save current offsets to offset specification."""
offset_specs = simport.load(cfg.CONF.repositories.offsets)()
for o in current_offsets:
MonMetricsKafkaProcessor.log_debug(
"saving: OffSetRanges: %s %s %s %s, "
"batch_time_info: %s" % (
o.topic, o.partition, o.fromOffset, o.untilOffset,
str(batch_time_info)))
# add new offsets, update revision
offset_specs.add_all_offsets(app_name,
current_offsets,
batch_time_info)
@staticmethod
def reset_kafka_offsets(app_name):
"""delete all offsets from the offset specification."""
# load the offset specifications repository
offset_specs = simport.load(cfg.CONF.repositories.offsets)()
offset_specs.delete_all_kafka_offsets(app_name)
@staticmethod
def _validate_raw_mon_metrics(row):
required_fields = row.required_raw_fields_list
# prepare the list of required fields in rdd syntax so their values can be retrieved
required_fields = PreTransformSpecsUtils.prepare_required_raw_fields_list(
required_fields)
invalid_list = []
for required_field in required_fields:
required_field_value = None
# Look for the field in the first layer of the row
try:
required_field_value = eval(".".join(("row", required_field)))
except Exception:
pass
if required_field_value is None \
or required_field_value == "":
invalid_list.append((required_field,
required_field_value))
if len(invalid_list) <= 0:
return row
else:
for field_name, field_value in invalid_list:
MonMetricsKafkaProcessor.log_debug(
"_validate_raw_mon_metrics : found invalid field : ** %s: %s" % (
field_name, field_value))
@staticmethod
def process_metric(transform_context, record_store_df):
"""process (aggregate) metric data from record_store data
All the parameters to drive processing should be available
in the transform_spec_df dataframe.
"""
# call processing chain
return GenericTransformBuilder.do_transform(
transform_context, record_store_df)
@staticmethod
def process_metrics(transform_context, record_store_df):
"""start processing (aggregating) metrics"""
#
# look in record_store_df for list of metrics to be processed
#
metric_ids_df = record_store_df.select("metric_id").distinct()
metric_ids_to_process = [row.metric_id
for row in metric_ids_df.collect()]
data_driven_specs_repo = DataDrivenSpecsRepoFactory.\
get_data_driven_specs_repo()
sqlc = SQLContext.getOrCreate(record_store_df.rdd.context)
transform_specs_df = data_driven_specs_repo.get_data_driven_specs(
sql_context=sqlc,
data_driven_spec_type=DataDrivenSpecsRepo.transform_specs_type)
for metric_id in metric_ids_to_process:
transform_spec_df = transform_specs_df.select(
["aggregation_params_map", "metric_id"]
).where(transform_specs_df.metric_id == metric_id)
source_record_store_df = record_store_df.select("*").where(
record_store_df.metric_id == metric_id)
# set transform_spec_df in TransformContext
transform_context = \
TransformContextUtils.get_context(
transform_context_info=transform_context,
transform_spec_df_info=transform_spec_df)
try:
agg_inst_usage_df = (
MonMetricsKafkaProcessor.process_metric(
transform_context, source_record_store_df))
# if running in debug mode, write out the aggregated metric
# name just processed (along with the count of how many of
# these were aggregated) to the application log.
if log.isEnabledFor(logging.DEBUG):
agg_inst_usage_collection = agg_inst_usage_df.collect()
collection_len = len(agg_inst_usage_collection)
if collection_len > 0:
agg_inst_usage_dict = (
agg_inst_usage_collection[0].asDict())
log.debug("Submitted pre-hourly aggregated metric: "
"%s (%s)",
agg_inst_usage_dict[
"aggregated_metric_name"],
str(collection_len))
except FetchQuantityException:
raise
except FetchQuantityUtilException:
raise
except Exception as e:
MonMetricsKafkaProcessor.log_debug(
"Exception raised in metric processing for metric: " +
str(metric_id) + ". Error: " + str(e))
@staticmethod
def rdd_to_recordstore(rdd_transform_context_rdd):
if rdd_transform_context_rdd.isEmpty():
MonMetricsKafkaProcessor.log_debug(
"rdd_to_recordstore: nothing to process...")
else:
sql_context = SQLContext.getOrCreate(
rdd_transform_context_rdd.context)
data_driven_specs_repo = DataDrivenSpecsRepoFactory.\
get_data_driven_specs_repo()
pre_transform_specs_df = data_driven_specs_repo.\
get_data_driven_specs(
sql_context=sql_context,
data_driven_spec_type=DataDrivenSpecsRepo.
pre_transform_specs_type)
#
# extract second column containing raw metric data
#
raw_mon_metrics = rdd_transform_context_rdd.map(
lambda nt: nt.rdd_info[1])
#
# convert raw metric data rdd to dataframe rdd
#
raw_mon_metrics_df = \
MonMetricUtils.create_mon_metrics_df_from_json_rdd(
sql_context,
raw_mon_metrics)
#
# filter out unwanted metrics and keep metrics we are interested in
#
cond = [
raw_mon_metrics_df.metric["name"] ==
pre_transform_specs_df.event_type]
filtered_metrics_df = raw_mon_metrics_df.join(
pre_transform_specs_df, cond)
#
# validate filtered metrics to check if required fields
# are present and not empty
# In order to apply the filter function, the dataframe had to be
# converted to a plain rdd. After validation the rdd is
# converted back to a dataframe
#
# FIXME: find a way to apply filter function on dataframe rdd data
validated_mon_metrics_rdd = filtered_metrics_df.rdd.filter(
MonMetricsKafkaProcessor._validate_raw_mon_metrics)
validated_mon_metrics_df = sql_context.createDataFrame(
validated_mon_metrics_rdd, filtered_metrics_df.schema)
#
# record generator
# generate a new intermediate metric record if a given metric's
# metric_id_list in the pre_transform_specs table has several
# intermediate metrics defined.
# intermediate metrics are a convenient way to process (aggregate)
# a metric in multiple ways by making a copy of the source data
# for each processing path
#
gen_mon_metrics_df = validated_mon_metrics_df.select(
validated_mon_metrics_df.meta,
validated_mon_metrics_df.metric,
validated_mon_metrics_df.event_processing_params,
validated_mon_metrics_df.event_type,
explode(validated_mon_metrics_df.metric_id_list).alias(
"this_metric_id"))
#
# transform metrics data to record_store format
# record store format is the common format which serves as the
# source for aggregation processing.
# converting the metric to a common standard format helps in
# writing generic, reusable aggregation routines driven by
# configuration parameters
#
record_store_df = gen_mon_metrics_df.select(
(gen_mon_metrics_df.metric.timestamp / 1000).alias(
"event_timestamp_unix"),
from_unixtime(
gen_mon_metrics_df.metric.timestamp / 1000).alias(
"event_timestamp_string"),
gen_mon_metrics_df.event_type.alias("event_type"),
gen_mon_metrics_df.event_type.alias("event_quantity_name"),
(gen_mon_metrics_df.metric.value / 1.0).alias(
"event_quantity"),
# resource_uuid
when(gen_mon_metrics_df.metric.dimensions.instanceId != '',
gen_mon_metrics_df.metric.dimensions.instanceId).when(
gen_mon_metrics_df.metric.dimensions.resource_id != '',
gen_mon_metrics_df.metric.dimensions.resource_id).
otherwise('NA').alias("resource_uuid"),
# tenant_id
when(gen_mon_metrics_df.metric.dimensions.tenantId != '',
gen_mon_metrics_df.metric.dimensions.tenantId).when(
gen_mon_metrics_df.metric.dimensions.tenant_id != '',
gen_mon_metrics_df.metric.dimensions.tenant_id).when(
gen_mon_metrics_df.metric.dimensions.project_id != '',
gen_mon_metrics_df.metric.dimensions.project_id).otherwise(
'NA').alias("tenant_id"),
# user_id
when(gen_mon_metrics_df.meta.userId != '',
gen_mon_metrics_df.meta.userId).otherwise('NA').alias(
"user_id"),
# region
when(gen_mon_metrics_df.meta.region != '',
gen_mon_metrics_df.meta.region).when(
gen_mon_metrics_df.event_processing_params
.set_default_region_to != '',
gen_mon_metrics_df.event_processing_params
.set_default_region_to).otherwise(
'NA').alias("region"),
# zone
when(gen_mon_metrics_df.meta.zone != '',
gen_mon_metrics_df.meta.zone).when(
gen_mon_metrics_df.event_processing_params
.set_default_zone_to != '',
gen_mon_metrics_df.event_processing_params
.set_default_zone_to).otherwise(
'NA').alias("zone"),
# host
when(gen_mon_metrics_df.metric.dimensions.hostname != '',
gen_mon_metrics_df.metric.dimensions.hostname).when(
gen_mon_metrics_df.metric.value_meta.host != '',
gen_mon_metrics_df.metric.value_meta.host).otherwise(
'NA').alias("host"),
# event_date
from_unixtime(gen_mon_metrics_df.metric.timestamp / 1000,
'yyyy-MM-dd').alias("event_date"),
# event_hour
from_unixtime(gen_mon_metrics_df.metric.timestamp / 1000,
'HH').alias("event_hour"),
# event_minute
from_unixtime(gen_mon_metrics_df.metric.timestamp / 1000,
'mm').alias("event_minute"),
# event_second
from_unixtime(gen_mon_metrics_df.metric.timestamp / 1000,
'ss').alias("event_second"),
# TODO(ashwin): rename to transform_spec_group
gen_mon_metrics_df.this_metric_id.alias("metric_group"),
# TODO(ashwin): rename to transform_spec_id
gen_mon_metrics_df.this_metric_id.alias("metric_id"),
# metric dimensions
gen_mon_metrics_df.meta.alias("meta"),
# metric dimensions
gen_mon_metrics_df.metric.dimensions.alias("dimensions"),
# metric value_meta
gen_mon_metrics_df.metric.value_meta.alias("value_meta"))
#
# get transform context
#
rdd_transform_context = rdd_transform_context_rdd.first()
transform_context = rdd_transform_context.transform_context_info
#
# cache record store rdd
#
if cfg.CONF.service.enable_record_store_df_cache:
storage_level_prop = \
cfg.CONF.service.record_store_df_cache_storage_level
try:
storage_level = StorageUtils.get_storage_level(
storage_level_prop)
except InvalidCacheStorageLevelException as storage_error:
storage_error.value += \
" (as specified in " \
"service.record_store_df_cache_storage_level)"
raise
record_store_df.persist(storage_level)
#
# start processing metrics available in record_store data
#
MonMetricsKafkaProcessor.process_metrics(transform_context,
record_store_df)
# remove df from cache
if cfg.CONF.service.enable_record_store_df_cache:
record_store_df.unpersist()
#
# extract kafka offsets and batch processing time
# stored in transform_context and save offsets
#
offsets = transform_context.offset_info
# batch time
batch_time_info = \
transform_context.batch_time_info
MonMetricsKafkaProcessor.save_kafka_offsets(
offsets, rdd_transform_context_rdd.context.appName,
batch_time_info)
# call pre hourly processor, if it's time to run
if (cfg.CONF.stage_processors.pre_hourly_processor_enabled and
PreHourlyProcessor.is_time_to_run(batch_time_info)):
PreHourlyProcessor.run_processor(
record_store_df.rdd.context,
batch_time_info)
@staticmethod
def transform_to_recordstore(kvs):
"""Transform metrics data from kafka to record store format.
Extracts, validates, filters and generates data from kafka to keep
only data that has to be aggregated. The generate step produces
multiple records for the same incoming metric if the metric has
multiple intermediate metrics defined, so that each of the
intermediate metrics can be processed independently.
"""
# save offsets in global var myOffsetRanges
# http://spark.apache.org/docs/latest/streaming-kafka-integration.html
# Note that the typecast to HasOffsetRanges will only succeed if it is
# done in the first method called on the directKafkaStream, not later
# down a chain of methods. You can use transform() instead of
# foreachRDD() as your first method call in order to access offsets,
# then call further Spark methods. However, be aware that the
# one-to-one mapping between RDD partition and Kafka partition does not
# remain after any methods that shuffle or repartition,
# e.g. reduceByKey() or window()
kvs.transform(
MonMetricsKafkaProcessor.store_offset_ranges
).foreachRDD(MonMetricsKafkaProcessor.rdd_to_recordstore)
def invoke():
# object to keep track of offsets
ConfigInitializer.basic_config()
# app name
application_name = "mon_metrics_kafka"
my_spark_conf = SparkConf().setAppName(application_name)
spark_context = SparkContext(conf=my_spark_conf)
# read at the configured interval
spark_streaming_context = \
StreamingContext(spark_context, cfg.CONF.service.stream_interval)
kafka_stream = MonMetricsKafkaProcessor.get_kafka_stream(
cfg.CONF.messaging.topic,
spark_streaming_context)
# transform to recordstore
MonMetricsKafkaProcessor.transform_to_recordstore(kafka_stream)
# catch interrupt, stop streaming context gracefully
# signal.signal(signal.SIGINT, signal_handler)
# start processing
spark_streaming_context.start()
# FIXME: stop spark context to relinquish resources
# FIXME: specify cores, so as not to use all the resources on the cluster.
# FIXME: HA deploy multiple masters, may be one on each control node
try:
# Wait for the Spark driver to "finish"
spark_streaming_context.awaitTermination()
except Exception as e:
MonMetricsKafkaProcessor.log_debug(
"Exception raised during Spark execution : " + str(e))
# One exception that can occur here is the result of the saved
# kafka offsets being obsolete/out of range. Delete the saved
# offsets to improve the chance of success on the next execution.
# TODO(someone) prevent deleting all offsets for an application,
# but just the latest revision
MonMetricsKafkaProcessor.log_debug(
"Deleting saved offsets for chance of success on next execution")
MonMetricsKafkaProcessor.reset_kafka_offsets(application_name)
# delete pre hourly processor offsets
if cfg.CONF.stage_processors.pre_hourly_processor_enabled:
PreHourlyProcessor.reset_kafka_offsets()
if __name__ == "__main__":
invoke()
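# Hedged illustration (not part of the original module): the offset-capture
# pattern described in transform_to_recordstore() above, reduced to its
# essentials. The names `direct_stream` and `process_rdd` are hypothetical
# placeholders for a direct Kafka DStream and an RDD-processing callback.
def _offset_capture_pattern_example(direct_stream, process_rdd):
    def attach_offsets(rdd):
        # offsetRanges() is only valid on the first, un-shuffled RDD
        # produced by the direct Kafka stream
        offsets = rdd.offsetRanges()
        return rdd.map(lambda kv: (offsets, kv))
    # transform() runs first so offsets are captured before any shuffle
    direct_stream.transform(attach_offsets).foreachRDD(process_rdd)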

View File

@ -1,53 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import logging
from oslo_config import cfg
class LogUtils(object):
"""util methods for logging"""
@staticmethod
def log_debug(message):
log = logging.getLogger(__name__)
print(message)
log.debug(message)
@staticmethod
def who_am_i(obj):
sep = "*" * 10
debugstr = "\n".join((sep, "name: %s " % type(obj).__name__))
debugstr = "\n".join((debugstr, "type: %s" % (type(obj))))
debugstr = "\n".join((debugstr, "dir: %s" % (dir(obj)), sep))
LogUtils.log_debug(debugstr)
@staticmethod
def init_logger(logger_name):
# initialize logger
log = logging.getLogger(logger_name)
_h = logging.FileHandler('%s/%s' % (
cfg.CONF.service.service_log_path,
cfg.CONF.service.service_log_filename))
_h.setFormatter(logging.Formatter("'%(asctime)s - %(pathname)s:"
"%(lineno)s - %(levelname)s"
" - %(message)s'"))
log.addHandler(_h)
if cfg.CONF.service.enable_debug_log_entries:
log.setLevel(logging.DEBUG)
else:
log.setLevel(logging.INFO)
return log
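# Minimal usage sketch (illustrative only); assumes the oslo.config options
# read by init_logger (service.service_log_path, service.service_log_filename
# and service.enable_debug_log_entries) have already been registered and set.
def _log_utils_example():
    logger = LogUtils.init_logger(__name__)
    logger.info("transform service starting")
    # who_am_i() dumps the type and attributes of any object for debugging
    LogUtils.who_am_i(logger)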

View File

@ -1,85 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import abc
import json
from monasca_common.kafka_lib.client import KafkaClient
from monasca_common.kafka_lib.producer import SimpleProducer
from monasca_common.simport import simport
from oslo_config import cfg
class MessageAdapter(object):
@abc.abstractmethod
def do_send_metric(self, metric):
raise NotImplementedError(
"Class %s doesn't implement do_send_metric(self, metric)"
% self.__class__.__name__)
class KafkaMessageAdapter(MessageAdapter):
adapter_impl = None
def __init__(self):
client_for_writing = KafkaClient(cfg.CONF.messaging.brokers)
self.producer = SimpleProducer(client_for_writing)
self.topic = cfg.CONF.messaging.topic
@staticmethod
def init():
# object to keep track of offsets
KafkaMessageAdapter.adapter_impl = simport.load(
cfg.CONF.messaging.adapter)()
def do_send_metric(self, metric):
self.producer.send_messages(
self.topic,
json.dumps(metric, separators=(',', ':')))
return
@staticmethod
def send_metric(metric):
if not KafkaMessageAdapter.adapter_impl:
KafkaMessageAdapter.init()
KafkaMessageAdapter.adapter_impl.do_send_metric(metric)
class KafkaMessageAdapterPreHourly(MessageAdapter):
adapter_impl = None
def __init__(self):
client_for_writing = KafkaClient(cfg.CONF.messaging.brokers)
self.producer = SimpleProducer(client_for_writing)
self.topic = cfg.CONF.messaging.topic_pre_hourly
@staticmethod
def init():
# object to keep track of offsets
KafkaMessageAdapterPreHourly.adapter_impl = simport.load(
cfg.CONF.messaging.adapter_pre_hourly)()
def do_send_metric(self, metric):
self.producer.send_messages(
self.topic,
json.dumps(metric, separators=(',', ':')))
return
@staticmethod
def send_metric(metric):
if not KafkaMessageAdapterPreHourly.adapter_impl:
KafkaMessageAdapterPreHourly.init()
KafkaMessageAdapterPreHourly.adapter_impl.do_send_metric(metric)
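# Illustrative sketch (not part of the original module); assumes
# messaging.brokers, messaging.topic and messaging.adapter are configured and
# a Kafka broker is reachable. send_metric() lazily loads the adapter class
# named by messaging.adapter via simport and publishes the metric as JSON.
def _send_example_metric():
    example_metric = {
        "metric": {"name": "example.metric", "value": 1.0,
                   "timestamp": 1600000000000},
        "meta": {"tenantId": "example-tenant", "region": "example-region"},
    }
    KafkaMessageAdapter.send_metric(example_metric)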

View File

@ -1,197 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import datetime
from oslo_config import cfg
from sqlalchemy import create_engine
from sqlalchemy import desc
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import sessionmaker
from monasca_transform.db.db_utils import DbUtil
from monasca_transform.offset_specs import OffsetSpec
from monasca_transform.offset_specs import OffsetSpecs
Base = automap_base()
class MySQLOffsetSpec(Base, OffsetSpec):
__tablename__ = 'kafka_offsets'
def __str__(self):
return "%s,%s,%s,%s,%s,%s,%s,%s" % (str(self.id),
str(self.topic),
str(self.partition),
str(self.until_offset),
str(self.from_offset),
str(self.batch_time),
str(self.last_updated),
str(self.revision))
class MySQLOffsetSpecs(OffsetSpecs):
def __init__(self):
db = create_engine(DbUtil.get_python_db_connection_string(),
isolation_level="READ UNCOMMITTED")
if cfg.CONF.service.enable_debug_log_entries:
db.echo = True
# reflect the tables
Base.prepare(db, reflect=True)
Session = sessionmaker(bind=db)
self.session = Session()
# keep these many offset versions around
self.MAX_REVISIONS = cfg.CONF.repositories.offsets_max_revisions
def _manage_offset_revisions(self):
"""manage offset versions"""
distinct_offset_specs = self.session.query(
MySQLOffsetSpec).group_by(MySQLOffsetSpec.app_name,
MySQLOffsetSpec.topic,
MySQLOffsetSpec.partition
).all()
for distinct_offset_spec in distinct_offset_specs:
ordered_versions = self.session.query(
MySQLOffsetSpec).filter_by(
app_name=distinct_offset_spec.app_name,
topic=distinct_offset_spec.topic,
partition=distinct_offset_spec.partition).order_by(
desc(MySQLOffsetSpec.id)).all()
revision = 1
for version_spec in ordered_versions:
version_spec.revision = revision
revision = revision + 1
# delete any revisions in excess of the configured maximum
self.session.query(MySQLOffsetSpec).filter(
MySQLOffsetSpec.revision > self.MAX_REVISIONS).delete(
synchronize_session="fetch")
def get_kafka_offsets(self, app_name):
return {'%s_%s_%s' % (
offset.get_app_name(), offset.get_topic(), offset.get_partition()
): offset for offset in self.session.query(MySQLOffsetSpec).filter(
MySQLOffsetSpec.app_name == app_name,
MySQLOffsetSpec.revision == 1).all()}
def get_kafka_offsets_by_revision(self, app_name, revision):
return {'%s_%s_%s' % (
offset.get_app_name(), offset.get_topic(), offset.get_partition()
): offset for offset in self.session.query(MySQLOffsetSpec).filter(
MySQLOffsetSpec.app_name == app_name,
MySQLOffsetSpec.revision == revision).all()}
def get_most_recent_batch_time_from_offsets(self, app_name, topic):
try:
# get partition 0 as a representative of all others
offset = self.session.query(MySQLOffsetSpec).filter(
MySQLOffsetSpec.app_name == app_name,
MySQLOffsetSpec.topic == topic,
MySQLOffsetSpec.partition == 0,
MySQLOffsetSpec.revision == 1).one()
most_recent_batch_time = datetime.datetime.strptime(
offset.get_batch_time(),
'%Y-%m-%d %H:%M:%S')
except Exception:
most_recent_batch_time = None
return most_recent_batch_time
def delete_all_kafka_offsets(self, app_name):
try:
self.session.query(MySQLOffsetSpec).filter(
MySQLOffsetSpec.app_name == app_name).delete()
self.session.commit()
except Exception:
# Seems like there isn't much that can be done in this situation
pass
def add_all_offsets(self, app_name, offsets,
batch_time_info):
"""add offsets. """
try:
# batch time
batch_time = \
batch_time_info.strftime(
'%Y-%m-%d %H:%M:%S')
# last updated
last_updated = \
datetime.datetime.now().strftime(
'%Y-%m-%d %H:%M:%S')
NEW_REVISION_NO = -1
for o in offsets:
offset_spec = MySQLOffsetSpec(
topic=o.topic,
app_name=app_name,
partition=o.partition,
from_offset=o.fromOffset,
until_offset=o.untilOffset,
batch_time=batch_time,
last_updated=last_updated,
revision=NEW_REVISION_NO)
self.session.add(offset_spec)
# manage versions
self._manage_offset_revisions()
self.session.commit()
except Exception:
self.session.rollback()
raise
def add(self, app_name, topic, partition,
from_offset, until_offset, batch_time_info):
"""add offset info. """
try:
# batch time
batch_time = \
batch_time_info.strftime(
'%Y-%m-%d %H:%M:%S')
# last updated
last_updated = \
datetime.datetime.now().strftime(
'%Y-%m-%d %H:%M:%S')
NEW_REVISION_NO = -1
offset_spec = MySQLOffsetSpec(
topic=topic,
app_name=app_name,
partition=partition,
from_offset=from_offset,
until_offset=until_offset,
batch_time=batch_time,
last_updated=last_updated,
revision=NEW_REVISION_NO)
self.session.add(offset_spec)
# manage versions
self._manage_offset_revisions()
self.session.commit()
except Exception:
self.session.rollback()
raise
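# Illustrative sketch (not part of the original module); assumes a reachable
# MySQL database with the kafka_offsets table and configured repositories
# options. Every add() inserts a row with revision -1,
# _manage_offset_revisions() then renumbers revisions per (app, topic,
# partition) with 1 as the most recent, and get_kafka_offsets() returns only
# revision 1 entries.
def _offset_specs_example(batch_time):
    specs = MySQLOffsetSpecs()
    specs.add(app_name="example_app", topic="metrics", partition=0,
              from_offset=0, until_offset=100, batch_time_info=batch_time)
    latest = specs.get_kafka_offsets("example_app")
    return latest.get("example_app_metrics_0")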

View File

@ -1,101 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import abc
import six
class OffsetSpec(object):
def __init__(self, app_name=None, topic=None, partition=None,
from_offset=None, until_offset=None,
batch_time=None, last_updated=None,
revision=None):
self.app_name = app_name
self.topic = topic
self.partition = partition
self.from_offset = from_offset
self.until_offset = until_offset
self.batch_time = batch_time
self.last_updated = last_updated
self.revision = revision
def get_app_name(self):
return self.app_name
def get_topic(self):
return self.topic
def get_partition(self):
return self.partition
def get_from_offset(self):
return self.from_offset
def get_until_offset(self):
return self.until_offset
def get_batch_time(self):
return self.batch_time
def get_last_updated(self):
return self.last_updated
def get_revision(self):
return self.revision
@six.add_metaclass(abc.ABCMeta)
class OffsetSpecs(object):
"""Class representing offset specs to help recover.
From where processing should pick up in case of failure
"""
@abc.abstractmethod
def add(self, app_name, topic, partition,
from_offset, until_offset, batch_time_info):
raise NotImplementedError(
"Class %s doesn't implement add(self, app_name, topic, "
"partition, from_offset, until_offset, batch_time,"
"last_updated, revision)"
% self.__class__.__name__)
@abc.abstractmethod
def add_all_offsets(self, app_name, offsets, batch_time_info):
raise NotImplementedError(
"Class %s doesn't implement add(self, app_name, topic, "
"partition, from_offset, until_offset, batch_time,"
"last_updated, revision)"
% self.__class__.__name__)
@abc.abstractmethod
def get_kafka_offsets(self, app_name):
raise NotImplementedError(
"Class %s doesn't implement get_kafka_offsets()"
% self.__class__.__name__)
@abc.abstractmethod
def delete_all_kafka_offsets(self, app_name):
raise NotImplementedError(
"Class %s doesn't implement delete_all_kafka_offsets()"
% self.__class__.__name__)
@abc.abstractmethod
def get_most_recent_batch_time_from_offsets(self, app_name, topic):
raise NotImplementedError(
"Class %s doesn't implement "
"get_most_recent_batch_time_from_offsets()"
% self.__class__.__name__)
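# Hedged sketch (not part of the original module): the smallest conforming
# OffsetSpecs implementation, useful mainly for unit tests. It keeps state in
# a dict keyed the same way as the MySQL implementation and ignores revisions.
class InMemoryOffsetSpecs(OffsetSpecs):
    def __init__(self):
        self.offsets = {}
    def add(self, app_name, topic, partition,
            from_offset, until_offset, batch_time_info):
        key = "%s_%s_%s" % (app_name, topic, partition)
        self.offsets[key] = OffsetSpec(
            app_name=app_name, topic=topic, partition=partition,
            from_offset=from_offset, until_offset=until_offset,
            batch_time=batch_time_info)
    def add_all_offsets(self, app_name, offsets, batch_time_info):
        for o in offsets:
            self.add(app_name, o.topic, o.partition,
                     o.fromOffset, o.untilOffset, batch_time_info)
    def get_kafka_offsets(self, app_name):
        return {k: v for k, v in self.offsets.items()
                if k.startswith(app_name)}
    def delete_all_kafka_offsets(self, app_name):
        self.offsets = {k: v for k, v in self.offsets.items()
                        if not k.startswith(app_name)}
    def get_most_recent_batch_time_from_offsets(self, app_name, topic):
        # a real implementation would track batch times; None is treated by
        # callers as "no previous batch"
        return None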

View File

@ -1,39 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import abc
class Processor(object):
"""processor object """
@abc.abstractmethod
def get_app_name(self):
"""get name of this application. Will be used to store offsets in database"""
raise NotImplementedError(
"Class %s doesn't implement get_app_name()"
% self.__class__.__name__)
@abc.abstractmethod
def is_time_to_run(self, current_time):
"""return True if its time to run this processor"""
raise NotImplementedError(
"Class %s doesn't implement is_time_to_run()"
% self.__class__.__name__)
@abc.abstractmethod
def run_processor(self, time):
"""Run application"""
raise NotImplementedError(
"Class %s doesn't implement run_processor()"
% self.__class__.__name__)
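# Hedged sketch (not part of the original module): a minimal conforming
# Processor subclass, shown only to make the contract above concrete. The
# top-of-the-hour schedule used here is an assumption, not project behaviour.
class ExampleProcessor(Processor):
    def get_app_name(self):
        return "example_processor"
    def is_time_to_run(self, current_time):
        # run once at the top of every hour
        return current_time.minute == 0
    def run_processor(self, time):
        print("running %s at %s" % (self.get_app_name(), time))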

View File

@ -1,617 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from monasca_common.kafka_lib.client import KafkaClient
from monasca_common.kafka_lib.common import OffsetRequest
from pyspark.sql import SQLContext
from pyspark.streaming.kafka import KafkaUtils
from pyspark.streaming.kafka import OffsetRange
import datetime
import logging
from monasca_common.simport import simport
from oslo_config import cfg
from monasca_transform.component.insert.kafka_insert import KafkaInsert
from monasca_transform.component.setter.pre_hourly_calculate_rate import \
PreHourlyCalculateRate
from monasca_transform.component.setter.rollup_quantity import RollupQuantity
from monasca_transform.config.config_initializer import ConfigInitializer
from monasca_transform.data_driven_specs.data_driven_specs_repo \
import DataDrivenSpecsRepo
from monasca_transform.data_driven_specs.data_driven_specs_repo \
import DataDrivenSpecsRepoFactory
from monasca_transform.log_utils import LogUtils
from monasca_transform.processor import Processor
from monasca_transform.processor.processor_util import PreHourlyProcessorUtil
from monasca_transform.processor.processor_util import ProcessUtilDataProvider
from monasca_transform.transform.storage_utils import \
InvalidCacheStorageLevelException
from monasca_transform.transform.storage_utils import StorageUtils
from monasca_transform.transform.transform_utils import InstanceUsageUtils
from monasca_transform.transform import TransformContextUtils
ConfigInitializer.basic_config()
log = LogUtils.init_logger(__name__)
class PreHourlyProcessorDataProvider(ProcessUtilDataProvider):
def get_last_processed(self):
offset_specifications = PreHourlyProcessor.get_offset_specs()
app_name = PreHourlyProcessor.get_app_name()
topic = PreHourlyProcessor.get_kafka_topic()
most_recent_batch_time = (
offset_specifications.get_most_recent_batch_time_from_offsets(
app_name, topic))
return most_recent_batch_time
class PreHourlyProcessor(Processor):
"""Publish metrics in kafka
Processor to process usage data published to metrics_pre_hourly topic a
and publish final rolled up metrics to metrics topic in kafka.
"""
@staticmethod
def save_kafka_offsets(current_offsets,
batch_time_info):
"""save current offsets to offset specification."""
offset_specs = simport.load(cfg.CONF.repositories.offsets)()
app_name = PreHourlyProcessor.get_app_name()
for o in current_offsets:
log.debug(
"saving: OffSetRanges: %s %s %s %s, "
"batch_time_info: %s" % (
o.topic, o.partition, o.fromOffset, o.untilOffset,
str(batch_time_info)))
# add new offsets, update revision
offset_specs.add_all_offsets(app_name,
current_offsets,
batch_time_info)
@staticmethod
def reset_kafka_offsets():
"""delete all offsets from the offset specification."""
app_name = PreHourlyProcessor.get_app_name()
# get the offsets from global var
offset_specs = simport.load(cfg.CONF.repositories.offsets)()
offset_specs.delete_all_kafka_offsets(app_name)
@staticmethod
def get_app_name():
"""get name of this application. Will be used to store offsets in database"""
return "mon_metrics_kafka_pre_hourly"
@staticmethod
def get_kafka_topic():
"""get name of kafka topic for transformation."""
return "metrics_pre_hourly"
@staticmethod
def is_time_to_run(check_time):
return PreHourlyProcessorUtil.is_time_to_run(check_time)
@staticmethod
def _get_offsets_from_kafka(brokers,
topic,
offset_time):
"""get dict representing kafka offsets."""
# get client
client = KafkaClient(brokers)
# get partitions for a topic
partitions = client.topic_partitions[topic]
# https://cwiki.apache.org/confluence/display/KAFKA/
# A+Guide+To+The+Kafka+Protocol#
# AGuideToTheKafkaProtocol-OffsetRequest
MAX_OFFSETS = 1
offset_requests = [OffsetRequest(topic,
part_name,
offset_time,
MAX_OFFSETS) for part_name
in partitions.keys()]
offsets_responses = client.send_offset_request(offset_requests)
offset_dict = {}
for response in offsets_responses:
key = "_".join((response.topic,
str(response.partition)))
offset_dict[key] = response
return offset_dict
@staticmethod
def _parse_saved_offsets(app_name, topic, saved_offset_spec):
"""get dict representing saved offsets."""
offset_dict = {}
for key, value in saved_offset_spec.items():
if key.startswith("%s_%s" % (app_name, topic)):
spec_app_name = value.get_app_name()
spec_topic = value.get_topic()
spec_partition = int(value.get_partition())
spec_from_offset = value.get_from_offset()
spec_until_offset = value.get_until_offset()
key = "_".join((spec_topic,
str(spec_partition)))
offset_dict[key] = (spec_app_name,
spec_topic,
spec_partition,
spec_from_offset,
spec_until_offset)
return offset_dict
@staticmethod
def _get_new_offset_range_list(brokers, topic):
"""get offset range from earliest to latest."""
offset_range_list = []
# https://cwiki.apache.org/confluence/display/KAFKA/
# A+Guide+To+The+Kafka+Protocol#
# AGuideToTheKafkaProtocol-OffsetRequest
GET_LATEST_OFFSETS = -1
latest_dict = PreHourlyProcessor._get_offsets_from_kafka(
brokers, topic, GET_LATEST_OFFSETS)
GET_EARLIEST_OFFSETS = -2
earliest_dict = PreHourlyProcessor._get_offsets_from_kafka(
brokers, topic, GET_EARLIEST_OFFSETS)
for item in latest_dict:
until_offset = latest_dict[item].offsets[0]
from_offset = earliest_dict[item].offsets[0]
partition = latest_dict[item].partition
topic = latest_dict[item].topic
offset_range_list.append(OffsetRange(topic,
partition,
from_offset,
until_offset))
return offset_range_list
@staticmethod
def _get_offset_range_list(brokers,
topic,
app_name,
saved_offset_spec):
"""get offset range from saved offset to latest."""
offset_range_list = []
# https://cwiki.apache.org/confluence/display/KAFKA/
# A+Guide+To+The+Kafka+Protocol#
# AGuideToTheKafkaProtocol-OffsetRequest
GET_LATEST_OFFSETS = -1
latest_dict = PreHourlyProcessor._get_offsets_from_kafka(
brokers, topic, GET_LATEST_OFFSETS)
GET_EARLIEST_OFFSETS = -2
earliest_dict = PreHourlyProcessor._get_offsets_from_kafka(
brokers, topic, GET_EARLIEST_OFFSETS)
saved_dict = PreHourlyProcessor._parse_saved_offsets(
app_name, topic, saved_offset_spec)
for item in latest_dict:
# saved spec
(spec_app_name,
spec_topic_name,
spec_partition,
spec_from_offset,
spec_until_offset) = saved_dict[item]
# until
until_offset = latest_dict[item].offsets[0]
# from
if spec_until_offset is not None and int(spec_until_offset) >= 0:
from_offset = spec_until_offset
else:
from_offset = earliest_dict[item].offsets[0]
partition = latest_dict[item].partition
topic = latest_dict[item].topic
offset_range_list.append(OffsetRange(topic,
partition,
from_offset,
until_offset))
return offset_range_list
@staticmethod
def get_processing_offset_range_list(processing_time):
"""Get offset range to fetch data from.
The range extends from the last saved offsets to the current offsets
available in kafka. If there are no saved offsets in the
database, the starting offsets are set to the earliest
available in kafka.
"""
offset_specifications = PreHourlyProcessor.get_offset_specs()
# get application name, will be used to get offsets from database
app_name = PreHourlyProcessor.get_app_name()
saved_offset_spec = offset_specifications.get_kafka_offsets(app_name)
# get kafka topic to fetch data
topic = PreHourlyProcessor.get_kafka_topic()
if len(saved_offset_spec) < 1:
log.debug(
"No saved offsets available..."
"connecting to kafka and fetching "
"from earliest available offset ...")
offset_range_list = PreHourlyProcessor._get_new_offset_range_list(
cfg.CONF.messaging.brokers,
topic)
else:
log.debug(
"Saved offsets available..."
"connecting to kafka and fetching from saved offset ...")
offset_range_list = PreHourlyProcessor._get_offset_range_list(
cfg.CONF.messaging.brokers,
topic,
app_name,
saved_offset_spec)
return offset_range_list
@staticmethod
def get_offset_specs():
"""get offset specifications."""
return simport.load(cfg.CONF.repositories.offsets)()
@staticmethod
def get_effective_offset_range_list(offset_range_list):
"""Get effective batch offset range.
Effective batch offset range covers offsets starting
from the effective batch revision (defined by the
effective_batch_revision config property). By default this method sets
the pyspark OffsetRange.fromOffset for each partition
to a value older than the latest revision
(defaults to latest - 1) so that the pre-hourly processor has access
to the entire data for the hour. This also accounts for and covers
any early arriving data (data that arrives before the start of the hour).
"""
offset_specifications = PreHourlyProcessor.get_offset_specs()
app_name = PreHourlyProcessor.get_app_name()
topic = PreHourlyProcessor.get_kafka_topic()
# start offset revision
effective_batch_revision = cfg.CONF.pre_hourly_processor.\
effective_batch_revision
effective_batch_spec = offset_specifications\
.get_kafka_offsets_by_revision(app_name,
effective_batch_revision)
# get latest revision, if penultimate is unavailable
if not effective_batch_spec:
log.debug("effective batch spec: offsets: revision %s unavailable,"
" getting the latest revision instead..." % (
effective_batch_revision))
# not available
effective_batch_spec = offset_specifications.get_kafka_offsets(
app_name)
effective_batch_offsets = PreHourlyProcessor._parse_saved_offsets(
app_name, topic,
effective_batch_spec)
# for debugging
for effective_key in effective_batch_offsets.keys():
effective_offset = effective_batch_offsets.get(effective_key,
None)
(effect_app_name,
effect_topic_name,
effect_partition,
effect_from_offset,
effect_until_offset) = effective_offset
log.debug(
"effective batch offsets (from db):"
" OffSetRanges: %s %s %s %s" % (
effect_topic_name, effect_partition,
effect_from_offset, effect_until_offset))
# effective batch revision
effective_offset_range_list = []
for offset_range in offset_range_list:
part_topic_key = "_".join((offset_range.topic,
str(offset_range.partition)))
effective_offset = effective_batch_offsets.get(part_topic_key,
None)
if effective_offset:
(effect_app_name,
effect_topic_name,
effect_partition,
effect_from_offset,
effect_until_offset) = effective_offset
log.debug(
"Extending effective offset range:"
" OffSetRanges: %s %s %s-->%s %s" % (
effect_topic_name, effect_partition,
offset_range.fromOffset,
effect_from_offset,
effect_until_offset))
effective_offset_range_list.append(
OffsetRange(offset_range.topic,
offset_range.partition,
effect_from_offset,
offset_range.untilOffset))
else:
effective_offset_range_list.append(
OffsetRange(offset_range.topic,
offset_range.partition,
offset_range.fromOffset,
offset_range.untilOffset))
# return effective offset range list
return effective_offset_range_list
@staticmethod
def fetch_pre_hourly_data(spark_context,
offset_range_list):
"""get metrics pre hourly data from offset range list."""
for o in offset_range_list:
log.debug(
"fetch_pre_hourly: offset_range_list:"
" OffSetRanges: %s %s %s %s" % (
o.topic, o.partition, o.fromOffset, o.untilOffset))
effective_offset_list = PreHourlyProcessor.\
get_effective_offset_range_list(offset_range_list)
for o in effective_offset_list:
log.debug(
"fetch_pre_hourly: effective_offset_range_list:"
" OffSetRanges: %s %s %s %s" % (
o.topic, o.partition, o.fromOffset, o.untilOffset))
# get kafka stream over the same offsets
pre_hourly_rdd = KafkaUtils.createRDD(spark_context,
{"metadata.broker.list":
cfg.CONF.messaging.brokers},
effective_offset_list)
return pre_hourly_rdd
@staticmethod
def pre_hourly_to_instance_usage_df(pre_hourly_rdd):
"""convert raw pre hourly data into instance usage dataframe."""
#
# extract second column containing instance usage data
#
instance_usage_rdd = pre_hourly_rdd.map(
lambda iud: iud[1])
#
# convert usage data rdd to instance usage df
#
sqlc = SQLContext.getOrCreate(pre_hourly_rdd.context)
instance_usage_df = InstanceUsageUtils.create_df_from_json_rdd(
sqlc, instance_usage_rdd)
if cfg.CONF.pre_hourly_processor.enable_batch_time_filtering:
instance_usage_df = (
PreHourlyProcessor.filter_out_records_not_in_current_batch(
instance_usage_df))
return instance_usage_df
@staticmethod
def filter_out_records_not_in_current_batch(instance_usage_df):
"""Filter out any records which don't pertain to the current batch
(i.e., records before or after the
batch currently being processed).
"""
# get the most recent batch time from the stored offsets
offset_specifications = PreHourlyProcessor.get_offset_specs()
app_name = PreHourlyProcessor.get_app_name()
topic = PreHourlyProcessor.get_kafka_topic()
most_recent_batch_time = (
offset_specifications.get_most_recent_batch_time_from_offsets(
app_name, topic))
if most_recent_batch_time:
# batches can fire after late metrics slack time, not necessarily
# at the top of the hour
most_recent_batch_time_truncated = most_recent_batch_time.replace(
minute=0, second=0, microsecond=0)
log.debug("filter out records before : %s" % (
most_recent_batch_time_truncated.strftime(
'%Y-%m-%dT%H:%M:%S')))
# filter out records before current batch
instance_usage_df = instance_usage_df.filter(
instance_usage_df.lastrecord_timestamp_string >=
most_recent_batch_time_truncated)
# determine the timestamp of the most recent top-of-the-hour (which
# is the end of the current batch).
current_time = datetime.datetime.now()
truncated_timestamp_to_current_hour = current_time.replace(
minute=0, second=0, microsecond=0)
# filter out records after current batch
log.debug("filter out records after : %s" % (
truncated_timestamp_to_current_hour.strftime(
'%Y-%m-%dT%H:%M:%S')))
instance_usage_df = instance_usage_df.filter(
instance_usage_df.firstrecord_timestamp_string <
truncated_timestamp_to_current_hour)
return instance_usage_df
@staticmethod
def process_instance_usage(transform_context, instance_usage_df):
"""Second stage aggregation.
Aggregate instance usage rdd
data and write results to metrics topic in kafka.
"""
transform_spec_df = transform_context.transform_spec_df_info
#
# do a rollup operation
#
agg_params = (transform_spec_df.select(
"aggregation_params_map.pre_hourly_group_by_list")
.collect()[0].asDict())
pre_hourly_group_by_list = agg_params["pre_hourly_group_by_list"]
if (len(pre_hourly_group_by_list) == 1 and
pre_hourly_group_by_list[0] == "default"):
pre_hourly_group_by_list = ["tenant_id", "user_id",
"resource_uuid",
"geolocation", "region", "zone",
"host", "project_id",
"aggregated_metric_name",
"aggregation_period"]
# get aggregation period
agg_params = transform_spec_df.select(
"aggregation_params_map.aggregation_period").collect()[0].asDict()
aggregation_period = agg_params["aggregation_period"]
# get second stage operation
agg_params = (transform_spec_df.select(
"aggregation_params_map.pre_hourly_operation")
.collect()[0].asDict())
pre_hourly_operation = agg_params["pre_hourly_operation"]
if pre_hourly_operation != "rate":
instance_usage_df = RollupQuantity.do_rollup(
pre_hourly_group_by_list, aggregation_period,
pre_hourly_operation, instance_usage_df)
else:
instance_usage_df = PreHourlyCalculateRate.do_rate_calculation(
instance_usage_df)
# insert metrics
instance_usage_df = KafkaInsert.insert(transform_context,
instance_usage_df)
return instance_usage_df
@staticmethod
def do_transform(instance_usage_df):
"""start processing (aggregating) metrics"""
#
# look in instance_usage_df for list of metrics to be processed
#
metric_ids_df = instance_usage_df.select(
"processing_meta.metric_id").distinct()
metric_ids_to_process = [row.metric_id
for row in metric_ids_df.collect()]
data_driven_specs_repo = (
DataDrivenSpecsRepoFactory.get_data_driven_specs_repo())
sqlc = SQLContext.getOrCreate(instance_usage_df.rdd.context)
transform_specs_df = data_driven_specs_repo.get_data_driven_specs(
sql_context=sqlc,
data_driven_spec_type=DataDrivenSpecsRepo.transform_specs_type)
for metric_id in metric_ids_to_process:
transform_spec_df = transform_specs_df.select(
["aggregation_params_map", "metric_id"]
).where(transform_specs_df.metric_id == metric_id)
source_instance_usage_df = instance_usage_df.select("*").where(
instance_usage_df.processing_meta.metric_id == metric_id)
# set transform_spec_df in TransformContext
transform_context = TransformContextUtils.get_context(
transform_spec_df_info=transform_spec_df)
agg_inst_usage_df = PreHourlyProcessor.process_instance_usage(
transform_context, source_instance_usage_df)
# if running in debug mode, write out the aggregated metric
# name just processed (along with the count of how many of these
# were aggregated) to the application log.
if log.isEnabledFor(logging.DEBUG):
agg_inst_usage_collection = agg_inst_usage_df.collect()
collection_len = len(agg_inst_usage_collection)
if collection_len > 0:
agg_inst_usage_dict = agg_inst_usage_collection[0].asDict()
log.debug("Submitted hourly aggregated metric: %s (%s)",
agg_inst_usage_dict["aggregated_metric_name"],
str(collection_len))
@staticmethod
def run_processor(spark_context, processing_time):
"""Process data in metrics_pre_hourly queue
Starting from the last saved offsets, else start from earliest
offsets available
"""
offset_range_list = (
PreHourlyProcessor.get_processing_offset_range_list(
processing_time))
# get pre hourly data
pre_hourly_rdd = PreHourlyProcessor.fetch_pre_hourly_data(
spark_context, offset_range_list)
# get instance usage df
instance_usage_df = PreHourlyProcessor.pre_hourly_to_instance_usage_df(
pre_hourly_rdd)
#
# cache instance usage df
#
if cfg.CONF.pre_hourly_processor.enable_instance_usage_df_cache:
storage_level_prop = (
cfg.CONF.pre_hourly_processor
.instance_usage_df_cache_storage_level)
try:
storage_level = StorageUtils.get_storage_level(
storage_level_prop)
except InvalidCacheStorageLevelException as storage_error:
storage_error.value += (" (as specified in "
"pre_hourly_processor"
".instance_usage_df_cache"
"_storage_level)")
raise
instance_usage_df.persist(storage_level)
# aggregate pre hourly data
PreHourlyProcessor.do_transform(instance_usage_df)
# remove cache
if cfg.CONF.pre_hourly_processor.enable_instance_usage_df_cache:
instance_usage_df.unpersist()
# save latest metrics_pre_hourly offsets in the database
PreHourlyProcessor.save_kafka_offsets(offset_range_list,
processing_time)
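# Illustrative sketch (not part of the original module); assumes an existing
# SparkContext `spark_context`, configured kafka/database options, and a
# batch_time_info datetime supplied by the streaming driver. This mirrors how
# the streaming driver invokes the pre-hourly stage after each batch.
def _run_pre_hourly_if_due(spark_context, batch_time_info):
    if (cfg.CONF.stage_processors.pre_hourly_processor_enabled and
            PreHourlyProcessor.is_time_to_run(batch_time_info)):
        PreHourlyProcessor.run_processor(spark_context, batch_time_info)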

View File

@ -1,112 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import abc
import datetime
from monasca_common.simport import simport
from oslo_config import cfg
from monasca_transform.log_utils import LogUtils
log = LogUtils.init_logger(__name__)
class PreHourlyProcessorUtil(object):
data_provider = None
@staticmethod
def get_last_processed():
return PreHourlyProcessorUtil.get_data_provider().get_last_processed()
@staticmethod
def get_data_provider():
if not PreHourlyProcessorUtil.data_provider:
PreHourlyProcessorUtil.data_provider = simport.load(
cfg.CONF.pre_hourly_processor.data_provider)()
return PreHourlyProcessorUtil.data_provider
@staticmethod
def is_time_to_run(check_date_time):
"""return True if its time to run this processor.
It is time to run the processor if:
The processor has no previous recorded run time.
It is more than the configured 'late_metric_slack_time' (to allow
for the arrival of tardy metrics) past the hour and the processor
has not yet run for this hour
"""
check_hour = int(datetime.datetime.strftime(check_date_time, '%H'))
check_date = check_date_time.replace(minute=0, second=0,
microsecond=0, hour=0)
slack = datetime.timedelta(
seconds=cfg.CONF.pre_hourly_processor.late_metric_slack_time)
top_of_the_hour_date_time = check_date_time.replace(
minute=0, second=0, microsecond=0)
earliest_acceptable_run_date_time = top_of_the_hour_date_time + slack
last_processed_date_time = PreHourlyProcessorUtil.get_last_processed()
if last_processed_date_time:
last_processed_hour = int(
datetime.datetime.strftime(
last_processed_date_time, '%H'))
last_processed_date = last_processed_date_time.replace(
minute=0, second=0, microsecond=0, hour=0)
else:
last_processed_date = None
last_processed_hour = None
if (check_hour == last_processed_hour and
last_processed_date == check_date):
earliest_acceptable_run_date_time = (
top_of_the_hour_date_time +
datetime.timedelta(hours=1) +
slack
)
log.debug(
"Pre-hourly task check: Now date: %s, "
"Date last processed: %s, Check time = %s, "
"Last processed at %s (hour = %s), "
"Earliest acceptable run time %s "
"(based on configured pre hourly late metrics slack time of %s "
"seconds)" % (
check_date,
last_processed_date,
check_date_time,
last_processed_date_time,
last_processed_hour,
earliest_acceptable_run_date_time,
cfg.CONF.pre_hourly_processor.late_metric_slack_time
))
# run pre hourly processor only once from the
# configured time after the top of the hour
if (not last_processed_date_time or (
((not check_hour == last_processed_hour) or
(check_date > last_processed_date)) and
check_date_time >= earliest_acceptable_run_date_time)):
log.debug("Pre-hourly: Yes, it's time to process")
return True
log.debug("Pre-hourly: No, it's NOT time to process")
return False
class ProcessUtilDataProvider(object):
@abc.abstractmethod
def get_last_processed(self):
"""return data on last run of processor"""
raise NotImplementedError(
"Class %s doesn't implement is_time_to_run()"
% self.__class__.__name__)
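# Worked example (illustrative only) of the check above, assuming
# late_metric_slack_time = 600 seconds and that the processor last ran for
# the 09:00 hour at 09:12:
#
#   check at 10:05 -> earliest acceptable run time is 10:10 (10:00 + slack),
#                     10:05 < 10:10, so is_time_to_run() returns False
#   check at 10:15 -> hour has changed and 10:15 >= 10:10,
#                     so is_time_to_run() returns True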

View File

@ -1,297 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import os
import psutil
import signal
import socket
import subprocess
import sys
import threading
import time
import traceback
from oslo_config import cfg
from oslo_log import log
from oslo_service import loopingcall
from oslo_service import service as os_service
from tooz import coordination
from monasca_transform.config.config_initializer import ConfigInitializer
from monasca_transform.log_utils import LogUtils
CONF = cfg.CONF
SPARK_SUBMIT_PROC_NAME = "spark-submit"
def main():
transform_service = TransformService()
transform_service.start()
def shutdown_all_threads_and_die():
"""Shut down all threads and exit process.
Hit it with a hammer to kill all threads and die.
"""
LOG = log.getLogger(__name__)
LOG.info('Monasca Transform service stopping...')
os._exit(1)
def get_process(proc_name):
"""Get process given string in process cmd line."""
LOG = log.getLogger(__name__)
proc = None
try:
for pr in psutil.process_iter():
for args in pr.cmdline():
if proc_name in args.split(" "):
proc = pr
return proc
except BaseException:
# pass
LOG.error("Error fetching {%s} process..." % proc_name)
return None
def stop_spark_submit_process():
"""Stop spark submit program."""
LOG = log.getLogger(__name__)
try:
# get the driver proc
pr = get_process(SPARK_SUBMIT_PROC_NAME)
if pr:
# terminate (SIGTERM) spark driver proc
for cpr in pr.children(recursive=False):
LOG.info("Terminate child pid {%s} ..." % str(cpr.pid))
cpr.terminate()
# terminate spark submit proc
LOG.info("Terminate pid {%s} ..." % str(pr.pid))
pr.terminate()
except Exception as e:
LOG.error("Error killing spark submit "
"process: got exception: {%s}" % str(e))
class Transform(os_service.Service):
"""Class used with Openstack service."""
LOG = log.getLogger(__name__)
def __init__(self, threads=1):
super(Transform, self).__init__(threads)
def signal_handler(self, signal_number, stack_frame):
# Catch stop requests and appropriately shut down
shutdown_all_threads_and_die()
def start(self):
try:
# Register to catch stop requests
signal.signal(signal.SIGTERM, self.signal_handler)
main()
except BaseException:
self.LOG.exception("Monasca Transform service "
"encountered fatal error. "
"Shutting down all threads and exiting")
shutdown_all_threads_and_die()
def stop(self):
stop_spark_submit_process()
super(os_service.Service, self).stop()
class TransformService(threading.Thread):
previously_running = False
LOG = log.getLogger(__name__)
def __init__(self):
super(TransformService, self).__init__()
self.coordinator = None
self.group = CONF.service.coordinator_group
# A unique name used for establishing election candidacy
self.my_host_name = socket.getfqdn()
# periodic check
leader_check = loopingcall.FixedIntervalLoopingCall(
self.periodic_leader_check)
leader_check.start(interval=float(
CONF.service.election_polling_frequency))
def check_if_still_leader(self):
"""Return true if the this host is the leader"""
leader = None
try:
leader = self.coordinator.get_leader(self.group).get()
except BaseException:
self.LOG.info('No leader elected yet for group %s' %
(self.group))
if leader and self.my_host_name == leader:
return True
# default
return False
def periodic_leader_check(self):
self.LOG.debug("Called periodic_leader_check...")
try:
if self.previously_running:
if not self.check_if_still_leader():
# stop spark submit process
stop_spark_submit_process()
# stand down as a leader
try:
self.coordinator.stand_down_group_leader(
self.group)
except BaseException as e:
self.LOG.info("Host %s cannot stand down as "
"leader for group %s: "
"got exception {%s}" %
(self.my_host_name, self.group,
str(e)))
# reset state
self.previously_running = False
except BaseException as e:
self.LOG.info("periodic_leader_check: "
"caught unhandled exception: {%s}" % str(e))
def when_i_am_elected_leader(self, event):
"""Callback when this host gets elected leader."""
# set running state
self.previously_running = True
self.LOG.info("Monasca Transform service running on %s "
"has been elected leader" % str(self.my_host_name))
if CONF.service.spark_python_files:
pyfiles = (" --py-files %s"
% CONF.service.spark_python_files)
else:
pyfiles = ''
event_logging_dest = ''
if (CONF.service.spark_event_logging_enabled and
CONF.service.spark_event_logging_dest):
event_logging_dest = (
"--conf spark.eventLog.dir="
"file://%s" %
CONF.service.spark_event_logging_dest)
# Build the command to start the Spark driver
spark_cmd = "".join((
"export SPARK_HOME=",
CONF.service.spark_home,
" && ",
"spark-submit --master ",
CONF.service.spark_master_list,
" --conf spark.eventLog.enabled=",
CONF.service.spark_event_logging_enabled,
event_logging_dest,
" --jars " + CONF.service.spark_jars_list,
pyfiles,
" " + CONF.service.spark_driver))
# Start the Spark driver
# (specify shell=True in order to
# correctly handle wildcards in the spark_cmd)
subprocess.call(spark_cmd, shell=True)
def run(self):
self.LOG.info('The host of this Monasca Transform service is ' +
self.my_host_name)
# Loop until the service is stopped
while True:
try:
self.previously_running = False
# Start an election coordinator
self.coordinator = coordination.get_coordinator(
CONF.service.coordinator_address, self.my_host_name)
self.coordinator.start()
# Create a coordination/election group
try:
request = self.coordinator.create_group(self.group)
request.get()
except coordination.GroupAlreadyExist:
self.LOG.info('Group %s already exists' % self.group)
# Join the coordination/election group
try:
request = self.coordinator.join_group(self.group)
request.get()
except coordination.MemberAlreadyExist:
self.LOG.info('Host already joined to group %s as %s' %
(self.group, self.my_host_name))
# Announce the candidacy and wait to be elected
self.coordinator.watch_elected_as_leader(
self.group,
self.when_i_am_elected_leader)
while self.previously_running is False:
self.LOG.debug('Monasca Transform service on %s is '
'checking election results...'
% self.my_host_name)
self.coordinator.heartbeat()
self.coordinator.run_watchers()
if self.previously_running is True:
try:
# Leave/exit the coordination/election group
request = self.coordinator.leave_group(self.group)
request.get()
except coordination.MemberNotJoined:
self.LOG.info("Host has not yet "
"joined group %s as %s" %
(self.group, self.my_host_name))
time.sleep(float(CONF.service.election_polling_frequency))
self.coordinator.stop()
except BaseException as e:
# catch any unhandled exception and continue
self.LOG.info("Ran into unhandled exception: {%s}" % str(e))
self.LOG.info("Going to restart coordinator again...")
traceback.print_exc()
def main_service():
"""Method to use with Openstack service."""
ConfigInitializer.basic_config()
LogUtils.init_logger(__name__)
launcher = os_service.ServiceLauncher(cfg.CONF, restart_method='mutate')
launcher.launch_service(Transform())
launcher.wait()
# Used if run without the OpenStack service launcher.
if __name__ == "__main__":
sys.exit(main())
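# Illustrative example (assumed values, not project defaults) of the single
# command string assembled in when_i_am_elected_leader() above, with
# spark_home=/opt/spark, spark_master_list=spark://master:7077,
# spark_event_logging_enabled=false, no event log dir, one jar and no
# py-files; shown wrapped here for readability only:
#
#   export SPARK_HOME=/opt/spark && spark-submit --master spark://master:7077
#     --conf spark.eventLog.enabled=false
#     --jars /opt/spark/lib/spark-streaming-kafka.jar
#     /opt/monasca-transform/driver.py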

View File

@ -1,91 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from collections import namedtuple
TransformContextBase = namedtuple("TransformContext",
["config_info",
"offset_info",
"transform_spec_df_info",
"batch_time_info"])
class TransformContext(TransformContextBase):
"""A tuple which contains all the configuration information to drive processing
namedtuple contains:
config_info - configuration information from oslo config
offset_info - current kafka offset information
transform_spec_df_info - processing information from the
transform_spec aggregation driver table
batch_time_info - current batch processing datetime
"""
RddTransformContextBase = namedtuple("RddTransformContext",
["rdd_info",
"transform_context_info"])
class RddTransformContext(RddTransformContextBase):
"""A tuple which is a wrapper containing the RDD and transform_context
namedtuple contains:
rdd_info - rdd
transform_context_info - transform context
"""
class TransformContextUtils(object):
"""utility method to get TransformContext"""
@staticmethod
def get_context(transform_context_info=None,
config_info=None,
offset_info=None,
transform_spec_df_info=None,
batch_time_info=None):
if transform_context_info is None:
return TransformContext(config_info,
offset_info,
transform_spec_df_info,
batch_time_info)
else:
if config_info is None or config_info == "":
# get from passed in transform_context
config_info = transform_context_info.config_info
if offset_info is None or offset_info == "":
# get from passed in transform_context
offset_info = transform_context_info.offset_info
if transform_spec_df_info is None or \
transform_spec_df_info == "":
# get from passed in transform_context
transform_spec_df_info = \
transform_context_info.transform_spec_df_info
if batch_time_info is None or \
batch_time_info == "":
# get from passed in transform_context
batch_time_info = \
transform_context_info.batch_time_info
return TransformContext(config_info,
offset_info,
transform_spec_df_info,
batch_time_info)
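# Illustrative sketch (not part of the original module): get_context() either
# builds a fresh TransformContext or overlays any non-empty arguments on top
# of an existing one. The argument names below are hypothetical placeholders.
def _transform_context_example(existing_context, new_batch_time):
    # carry config, offsets and transform spec over from existing_context,
    # replacing only the batch time for the current processing window
    return TransformContextUtils.get_context(
        transform_context_info=existing_context,
        batch_time_info=new_batch_time)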

View File

@ -1,131 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from monasca_transform.log_utils import LogUtils
from stevedore import extension
class GenericTransformBuilder(object):
"""Build transformation pipeline
Based on aggregation_pipeline spec in metric processing
configuration
"""
_MONASCA_TRANSFORM_USAGE_NAMESPACE = 'monasca_transform.usage'
_MONASCA_TRANSFORM_SETTER_NAMESPACE = 'monasca_transform.setter'
_MONASCA_TRANSFORM_INSERT_NAMESPACE = 'monasca_transform.insert'
@staticmethod
def log_load_extension_error(manager, entry_point, error):
LogUtils.log_debug("GenericTransformBuilder: "
"log load extension error: manager: {%s},"
"entry_point: {%s}, error: {%s}"
% (str(manager),
str(entry_point),
str(error)))
@staticmethod
def _get_usage_component_manager():
"""stevedore extension manager for usage components."""
return extension.ExtensionManager(
namespace=GenericTransformBuilder
._MONASCA_TRANSFORM_USAGE_NAMESPACE,
on_load_failure_callback=GenericTransformBuilder.
log_load_extension_error,
invoke_on_load=False)
@staticmethod
def _get_setter_component_manager():
"""stevedore extension manager for setter components."""
return extension.ExtensionManager(
namespace=GenericTransformBuilder.
_MONASCA_TRANSFORM_SETTER_NAMESPACE,
on_load_failure_callback=GenericTransformBuilder.
log_load_extension_error,
invoke_on_load=False)
@staticmethod
def _get_insert_component_manager():
"""stevedore extension manager for insert components."""
return extension.ExtensionManager(
namespace=GenericTransformBuilder.
_MONASCA_TRANSFORM_INSERT_NAMESPACE,
on_load_failure_callback=GenericTransformBuilder.
log_load_extension_error,
invoke_on_load=False)
@staticmethod
def _parse_transform_pipeline(transform_spec_df):
"""Parse aggregation pipeline from metric processing configuration"""
# get aggregation pipeline df
aggregation_pipeline_df = transform_spec_df\
.select("aggregation_params_map.aggregation_pipeline")
# call components
source_row = aggregation_pipeline_df\
.select("aggregation_pipeline.source").collect()[0]
source = source_row.source
usage_row = aggregation_pipeline_df\
.select("aggregation_pipeline.usage").collect()[0]
usage = usage_row.usage
setter_row_list = aggregation_pipeline_df\
.select("aggregation_pipeline.setters").collect()
setter_list = [setter_row.setters for setter_row in setter_row_list]
insert_row_list = aggregation_pipeline_df\
.select("aggregation_pipeline.insert").collect()
insert_list = [insert_row.insert for insert_row in insert_row_list]
return (source, usage, setter_list[0], insert_list[0])
@staticmethod
def do_transform(transform_context,
record_store_df):
"""Method to return instance usage dataframe
Build a dynamic aggregation pipeline
and call components to process record store dataframe
"""
transform_spec_df = transform_context.transform_spec_df_info
(source,
usage,
setter_list,
insert_list) = GenericTransformBuilder.\
_parse_transform_pipeline(transform_spec_df)
# FIXME: source is a placeholder for non-streaming source
# in the future?
usage_component = GenericTransformBuilder.\
_get_usage_component_manager()[usage].plugin
instance_usage_df = usage_component.usage(transform_context,
record_store_df)
for setter in setter_list:
setter_component = GenericTransformBuilder.\
_get_setter_component_manager()[setter].plugin
instance_usage_df = setter_component.setter(transform_context,
instance_usage_df)
for insert in insert_list:
insert_component = GenericTransformBuilder.\
_get_insert_component_manager()[insert].plugin
instance_usage_df = insert_component.insert(transform_context,
instance_usage_df)
return instance_usage_df
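# Illustrative sketch (not part of the original module): listing the plugin
# names stevedore can resolve for each pipeline stage. Which names appear
# depends on the entry points registered under the three namespaces above.
def _list_available_pipeline_components():
    return {
        "usage": GenericTransformBuilder._get_usage_component_manager().names(),
        "setter": GenericTransformBuilder._get_setter_component_manager().names(),
        "insert": GenericTransformBuilder._get_insert_component_manager().names(),
    }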

View File

@ -1,67 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from collections import namedtuple
RecordStoreWithGroupByBase = namedtuple("RecordStoreWithGroupBy",
["record_store_data",
"group_by_columns_list"])
class RecordStoreWithGroupBy(RecordStoreWithGroupByBase):
"""A tuple which is a wrapper containing record store data and the group by columns
namedtuple contains:
record_store_data - record store data
group_by_columns_list - group by columns list
"""
GroupingResultsBase = namedtuple("GroupingResults",
["grouping_key",
"results",
"grouping_key_dict"])
class GroupingResults(GroupingResultsBase):
"""A tuple which is a wrapper containing grouping key and grouped result set
namedtuple contains:
grouping_key - group by key
results - grouped results
grouping_key_dict - group by key as dictionary
"""
class Grouping(object):
"""Base class for all grouping classes."""
@staticmethod
def _parse_grouping_key(grouping_str):
"""parse grouping key
which is in "^key1=value1^key2=value2..." format
into a dictionary of key-value pairs
"""
group_by_dict = {}
#
# convert key=value^key1=value1 string into a dict
#
for key_val_pair in grouping_str.split("^"):
if "=" in key_val_pair:
key_val = key_val_pair.split("=")
group_by_dict[key_val[0]] = key_val[1]
return group_by_dict
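# Worked example (illustrative only) of the grouping-key format handled by
# _parse_grouping_key(); the keys and values used here are made up.
def _parse_grouping_key_example():
    parsed = Grouping._parse_grouping_key("^host=node-1^project_id=abc")
    # the leading "^" produces an empty token, which is skipped because it
    # contains no "="
    assert parsed == {"host": "node-1", "project_id": "abc"}
    return parsed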

View File

@ -1,176 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from monasca_transform.transform.grouping import Grouping
from monasca_transform.transform.grouping import GroupingResults
from monasca_transform.transform.grouping import RecordStoreWithGroupBy
class GroupSortbyTimestamp(Grouping):
@staticmethod
def log_debug(logStr):
print(logStr)
# LOG.debug(logStr)
@staticmethod
def _prepare_for_group_by(record_store_with_group_by_rdd):
"""creates a new rdd where:
the first element of each row
contains array of grouping key and event timestamp fields.
Grouping key and event timestamp fields are used by
partitioning and sorting function to partition the data
by grouping key and then sort the elements in a group by the
timestamp
"""
# get the record store data and group by columns
record_store_data = record_store_with_group_by_rdd.record_store_data
group_by_columns_list = \
record_store_with_group_by_rdd.group_by_columns_list
# construct a group by key
# key1=value1^key2=value2^...
group_by_key_value = ""
for gcol in group_by_columns_list:
if gcol.startswith('dimensions.'):
gcol = "dimensions['%s']" % (gcol.split('.')[-1])
elif gcol.startswith('meta.'):
gcol = "meta['%s']" % (gcol.split('.')[-1])
elif gcol.startswith('value_meta.'):
gcol = "value_meta['%s']" % (gcol.split('.')[-1])
gcolval = eval(".".join(("record_store_data",
gcol)))
group_by_key_value = \
"^".join((group_by_key_value,
"=".join((gcol, gcolval))))
# return a key-value pair
return [group_by_key_value, record_store_data]
@staticmethod
def _sort_by_timestamp(result_iterable):
# LOG.debug(whoami(result_iterable.data[0]))
# sort list might cause OOM, if the group has lots of items
# use group_sort_by_timestamp_partitions module instead if you run
# into OOM
sorted_list = sorted(result_iterable.data,
key=lambda row: row.event_timestamp_string)
return sorted_list
@staticmethod
def _group_sort_by_timestamp(record_store_df, group_by_columns_list):
# convert the dataframe rdd to normal rdd and add the group by column
# list
record_store_with_group_by_rdd = record_store_df.rdd.\
map(lambda x: RecordStoreWithGroupBy(x, group_by_columns_list))
# convert rdd into key-value rdd
record_store_with_group_by_rdd_key_val = \
record_store_with_group_by_rdd.\
map(GroupSortbyTimestamp._prepare_for_group_by)
first_step = record_store_with_group_by_rdd_key_val.groupByKey()
record_store_rdd_grouped_sorted = first_step.mapValues(
GroupSortbyTimestamp._sort_by_timestamp)
return record_store_rdd_grouped_sorted
@staticmethod
def _get_group_first_last_quantity_udf(grouplistiter):
"""Return stats that include:
first row key, first_event_timestamp,
first event quantity, last_event_timestamp and last event quantity
"""
first_row = None
last_row = None
# extract key and value list
group_key = grouplistiter[0]
grouped_values = grouplistiter[1]
count = 0.0
for row in grouped_values:
# set the first row
if first_row is None:
first_row = row
# set the last row
last_row = row
count = count + 1
first_event_timestamp_unix = None
first_event_timestamp_string = None
first_event_quantity = None
if first_row is not None:
first_event_timestamp_unix = first_row.event_timestamp_unix
first_event_timestamp_string = first_row.event_timestamp_string
first_event_quantity = first_row.event_quantity
last_event_timestamp_unix = None
last_event_timestamp_string = None
last_event_quantity = None
if last_row is not None:
last_event_timestamp_unix = last_row.event_timestamp_unix
last_event_timestamp_string = last_row.event_timestamp_string
last_event_quantity = last_row.event_quantity
results_dict = {"firstrecord_timestamp_unix":
first_event_timestamp_unix,
"firstrecord_timestamp_string":
first_event_timestamp_string,
"firstrecord_quantity": first_event_quantity,
"lastrecord_timestamp_unix":
last_event_timestamp_unix,
"lastrecord_timestamp_string":
last_event_timestamp_string,
"lastrecord_quantity": last_event_quantity,
"record_count": count}
group_key_dict = Grouping._parse_grouping_key(group_key)
return GroupingResults(group_key, results_dict, group_key_dict)
@staticmethod
def fetch_group_latest_oldest_quantity(record_store_df,
transform_spec_df,
group_by_columns_list):
"""Function to group record store data
Sort by timestamp within group
and get first and last timestamp along with quantity within each group
This function uses key-value pair rdd's groupBy function to do group_by
"""
# group and order elements in group
record_store_grouped_data_rdd = \
GroupSortbyTimestamp._group_sort_by_timestamp(
record_store_df, group_by_columns_list)
# find stats for a group
record_store_grouped_rows = \
record_store_grouped_data_rdd.\
map(GroupSortbyTimestamp.
_get_group_first_last_quantity_udf)
return record_store_grouped_rows
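
For reference, a minimal standalone sketch of the pattern the removed class implements: build a key-value RDD keyed by a "key=value" grouping string, group with groupByKey, then sort each group's rows by timestamp. The toy Record namedtuple and the sample rows are assumptions for illustration only, not part of the removed module.

# Hedged sketch of the groupByKey-then-sort pattern used by GroupSortbyTimestamp.
from collections import namedtuple

from pyspark import SparkContext

# toy record type standing in for the record store rows (assumption)
Record = namedtuple("Record", ["host", "event_timestamp_string", "event_quantity"])

sc = SparkContext.getOrCreate()
records = [
    Record("host1", "2016-02-08 18:00:05", 1.0),
    Record("host1", "2016-02-08 18:00:01", 2.0),
    Record("host2", "2016-02-08 18:00:03", 4.0),
]

# key is "host=<value>", mirroring the "^"-joined "key=value" grouping key format
keyed = sc.parallelize(records).map(lambda r: ("=".join(("host", r.host)), r))

# group by the key, then sort each group's rows by the timestamp string
grouped_sorted = keyed.groupByKey().mapValues(
    lambda rows: sorted(rows, key=lambda r: r.event_timestamp_string))

for key, rows in grouped_sorted.collect():
    print(key, [r.event_quantity for r in rows])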

View File

@ -1,227 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from monasca_transform.transform.grouping import Grouping
from monasca_transform.transform.grouping import GroupingResults
from monasca_transform.transform.grouping import RecordStoreWithGroupBy
class GroupSortbyTimestampPartition(Grouping):
@staticmethod
def log_debug(logStr):
print(logStr)
# LOG.debug(logStr)
@staticmethod
def _get_group_first_last_quantity_udf(partition_list_iter):
"""User defined function to go through a list of partitions.
Each partition contains elements for a group. All the elements are sorted by
timestamp.
The stats include first row key, first_event_timestamp,
first event quantity, last_event_timestamp and last event quantity
"""
first_row = None
last_row = None
count = 0.0
for row in partition_list_iter:
# set the first row
if first_row is None:
first_row = row
# set the last row
last_row = row
count = count + 1
first_event_timestamp_unix = None
first_event_timestamp_string = None
first_event_quantity = None
first_row_key = None
if first_row is not None:
first_event_timestamp_unix = first_row[1].event_timestamp_unix
first_event_timestamp_string = first_row[1].event_timestamp_string
first_event_quantity = first_row[1].event_quantity
# extract the grouping_key from composite grouping_key
# composite grouping key is a list, where first item is the
# grouping key and second item is the event_timestamp_string
first_row_key = first_row[0][0]
last_event_timestamp_unix = None
last_event_timestamp_string = None
last_event_quantity = None
if last_row is not None:
last_event_timestamp_unix = last_row[1].event_timestamp_unix
last_event_timestamp_string = last_row[1].event_timestamp_string
last_event_quantity = last_row[1].event_quantity
results_dict = {"firstrecord_timestamp_unix":
first_event_timestamp_unix,
"firstrecord_timestamp_string":
first_event_timestamp_string,
"firstrecord_quantity": first_event_quantity,
"lastrecord_timestamp_unix":
last_event_timestamp_unix,
"lastrecord_timestamp_string":
last_event_timestamp_string,
"lastrecord_quantity": last_event_quantity,
"record_count": count}
first_row_key_dict = Grouping._parse_grouping_key(first_row_key)
yield [GroupingResults(first_row_key, results_dict,
first_row_key_dict)]
@staticmethod
def _prepare_for_group_by(record_store_with_group_by_rdd):
"""Creates a new rdd where:
The first element of each row contains array of grouping
key and event timestamp fields.
Grouping key and event timestamp fields are used by
partitioning and sorting function to partition the data
by grouping key and then sort the elements in a group by the
timestamp
"""
# get the record store data and group by columns
record_store_data = record_store_with_group_by_rdd.record_store_data
group_by_columns_list = \
record_store_with_group_by_rdd.group_by_columns_list
# construct a group by key
# key1=value1^key2=value2^...
group_by_key_value = ""
for gcol in group_by_columns_list:
group_by_key_value = \
"^".join((group_by_key_value,
"=".join((gcol, eval(".".join(("record_store_data",
gcol)))))))
# return a key-value rdd
# key is a composite key which consists of grouping key and
# event_timestamp_string
return [[group_by_key_value,
record_store_data.event_timestamp_string], record_store_data]
@staticmethod
def _get_partition_by_group(group_composite):
"""Get a hash of the grouping key,
which is then used by partitioning
function to get partition where the groups data should end up in.
It uses hash % num_partitions to get partition
"""
# FIXME: find out if the hash function in python gives the same value on
# different machines
# Look at using portable_hash method in spark rdd
grouping_key = group_composite[0]
grouping_key_hash = hash(grouping_key)
# log_debug("group_by_sort_by_timestamp_partition: got hash : %s" \
# % str(returnhash))
return grouping_key_hash
@staticmethod
def _sort_by_timestamp(group_composite):
"""get timestamp which will be used to sort grouped data"""
event_timestamp_string = group_composite[1]
return event_timestamp_string
@staticmethod
def _group_sort_by_timestamp_partition(record_store_df,
group_by_columns_list,
num_of_groups):
"""It does a group by and then sorts all the items within the group by event timestamp."""
# convert the dataframe rdd to normal rdd and add the group by
# column list
record_store_with_group_by_rdd = record_store_df.rdd.\
map(lambda x: RecordStoreWithGroupBy(x, group_by_columns_list))
# prepare the data for repartitionAndSortWithinPartitions function
record_store_rdd_prepared = \
record_store_with_group_by_rdd.\
map(GroupSortbyTimestampPartition._prepare_for_group_by)
# repartition data based on a grouping key and sort the items within
# group by timestamp
# give high number of partitions
# numPartitions > number of groups expected, so that each group gets
# allocated a separate partition
record_store_rdd_partitioned_sorted = \
record_store_rdd_prepared.\
repartitionAndSortWithinPartitions(
numPartitions=num_of_groups,
partitionFunc=GroupSortbyTimestampPartition.
_get_partition_by_group,
keyfunc=GroupSortbyTimestampPartition.
_sort_by_timestamp)
return record_store_rdd_partitioned_sorted
@staticmethod
def _remove_none_filter(row):
"""remove any rows which have None as grouping key
[GroupingResults(grouping_key="key1", results={})] rows get created
when partition does not get any grouped data assigned to it
"""
if len(row[0].results) > 0 and row[0].grouping_key is not None:
return row
@staticmethod
def fetch_group_first_last_quantity(record_store_df,
transform_spec_df,
group_by_columns_list,
num_of_groups):
"""Function to group record store data
Sort by timestamp within group
and get first and last timestamp along with quantity within each group
To do group by it uses custom partitioning function which creates a new
partition for each group and uses RDD's repartitionAndSortWithinPartitions
function to do the grouping and sorting within the group.
This is more scalable than just using RDD's group_by as using this
technique group is not materialized into a list and stored in memory, but rather
it uses RDD's in built partitioning capability to do the sort num_of_groups should
be more than expected groups, otherwise the same
partition can get used for two groups which will cause incorrect results.
"""
# group and order elements in group using repartition
record_store_grouped_data_rdd = \
GroupSortbyTimestampPartition.\
_group_sort_by_timestamp_partition(record_store_df,
group_by_columns_list,
num_of_groups)
# do some operations on all elements in the group
grouping_results_tuple_with_none = \
record_store_grouped_data_rdd.\
mapPartitions(GroupSortbyTimestampPartition.
_get_group_first_last_quantity_udf)
# filter all rows which have no data (where grouping key is None) and
# convert results into grouping results tuple
grouping_results_tuple1 = grouping_results_tuple_with_none.\
filter(GroupSortbyTimestampPartition._remove_none_filter)
grouping_results_tuple = grouping_results_tuple1.map(lambda x: x[0])
return grouping_results_tuple
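
For reference, a minimal standalone sketch of the repartitionAndSortWithinPartitions call the removed class builds: a composite (grouping key, timestamp) key, a partition function that hashes only the grouping key, and a keyfunc that sorts by the timestamp part. The toy rows and the local SparkContext are assumptions for illustration.

# Hedged sketch of partition-per-group grouping with timestamp-sorted rows.
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# (composite key, value): composite key = (grouping key, event_timestamp_string)
rows = [
    (("host=host1", "2016-02-08 18:00:05"), 1.0),
    (("host=host1", "2016-02-08 18:00:01"), 2.0),
    (("host=host2", "2016-02-08 18:00:03"), 4.0),
]
rdd = sc.parallelize(rows)

num_of_groups = 2  # should be >= the number of expected groups

partitioned_sorted = rdd.repartitionAndSortWithinPartitions(
    numPartitions=num_of_groups,
    # partition only on the grouping key so a whole group lands in one partition
    partitionFunc=lambda composite_key: hash(composite_key[0]),
    # sort rows inside each partition by the timestamp part of the key
    keyfunc=lambda composite_key: composite_key[1])

# each inner list holds one partition, sorted by timestamp
print(partitioned_sorted.glom().collect())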

View File

@ -1,62 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from pyspark import StorageLevel
class InvalidCacheStorageLevelException(Exception):
"""Exception thrown when an invalid cache storage level is encountered
Attributes:
value: string representing the error
"""
def __init__(self, value):
self.value = value
def __str__(self):
return repr(self.value)
class StorageUtils(object):
"""storage util functions"""
@staticmethod
def get_storage_level(storage_level_str):
"""get pyspark storage level from storage level string"""
if (storage_level_str == "DISK_ONLY"):
return StorageLevel.DISK_ONLY
elif (storage_level_str == "DISK_ONLY_2"):
return StorageLevel.DISK_ONLY_2
elif (storage_level_str == "MEMORY_AND_DISK"):
return StorageLevel.MEMORY_AND_DISK
elif (storage_level_str == "MEMORY_AND_DISK_2"):
return StorageLevel.MEMORY_AND_DISK_2
elif (storage_level_str == "MEMORY_AND_DISK_SER"):
return StorageLevel.MEMORY_AND_DISK_SER
elif (storage_level_str == "MEMORY_AND_DISK_SER_2"):
return StorageLevel.MEMORY_AND_DISK_SER_2
elif (storage_level_str == "MEMORY_ONLY"):
return StorageLevel.MEMORY_ONLY
elif (storage_level_str == "MEMORY_ONLY_2"):
return StorageLevel.MEMORY_ONLY_2
elif (storage_level_str == "MEMORY_ONLY_SER"):
return StorageLevel.MEMORY_ONLY_SER
elif (storage_level_str == "MEMORY_ONLY_SER_2"):
return StorageLevel.MEMORY_ONLY_SER_2
elif (storage_level_str == "OFF_HEAP"):
return StorageLevel.OFF_HEAP
else:
raise InvalidCacheStorageLevelException(
"Unrecognized cache storage level: %s" % storage_level_str)

View File

@ -1,533 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from pyspark.sql import SQLContext
from pyspark.sql.types import ArrayType
from pyspark.sql.types import DoubleType
from pyspark.sql.types import MapType
from pyspark.sql.types import StringType
from pyspark.sql.types import StructField
from pyspark.sql.types import StructType
from monasca_transform.component import Component
class TransformUtils(object):
"""utility methods for different kinds of data."""
@staticmethod
def _rdd_to_df(rdd, schema):
"""convert rdd to dataframe using schema."""
spark_context = rdd.context
sql_context = SQLContext.getOrCreate(spark_context)
if schema is None:
df = sql_context.createDataFrame(rdd)
else:
df = sql_context.createDataFrame(rdd, schema)
return df
class InstanceUsageUtils(TransformUtils):
"""utility methods to transform instance usage data."""
@staticmethod
def _get_instance_usage_schema():
"""get instance usage schema."""
# Initialize columns for all string fields
columns = ["tenant_id", "user_id", "resource_uuid",
"geolocation", "region", "zone", "host", "project_id",
"aggregated_metric_name", "firstrecord_timestamp_string",
"lastrecord_timestamp_string",
"usage_date", "usage_hour", "usage_minute",
"aggregation_period"]
columns_struct_fields = [StructField(field_name, StringType(), True)
for field_name in columns]
# Add columns for non-string fields
columns_struct_fields.append(StructField("firstrecord_timestamp_unix",
DoubleType(), True))
columns_struct_fields.append(StructField("lastrecord_timestamp_unix",
DoubleType(), True))
columns_struct_fields.append(StructField("quantity",
DoubleType(), True))
columns_struct_fields.append(StructField("record_count",
DoubleType(), True))
columns_struct_fields.append(StructField("processing_meta",
MapType(StringType(),
StringType(),
True),
True))
columns_struct_fields.append(StructField("extra_data_map",
MapType(StringType(),
StringType(),
True),
True))
schema = StructType(columns_struct_fields)
return schema
@staticmethod
def create_df_from_json_rdd(sql_context, jsonrdd):
"""create instance usage df from json rdd."""
schema = InstanceUsageUtils._get_instance_usage_schema()
instance_usage_schema_df = sql_context.read.json(jsonrdd, schema)
return instance_usage_schema_df
@staticmethod
def prepare_instance_usage_group_by_list(group_by_list):
"""Prepare group by list.
If the group by list contains any instances of "dimensions#", "meta#" or "value_meta#" then
prefix the column name with "extra_data_map." since those columns are available in
the extra_data_map column.
"""
return [InstanceUsageUtils.prepare_group_by_item(item) for item in group_by_list]
@staticmethod
def prepare_group_by_item(item):
"""Prepare group by list item.
Replaces any special "dimensions#", "meta#" or "value_meta#" occurrences with
spark sql syntax to retrieve data from the extra_data_map column.
"""
if (item.startswith("dimensions#") or
item.startswith("meta#") or
item.startswith("value_meta#")):
return ".".join(("extra_data_map", item))
else:
return item
@staticmethod
def prepare_extra_data_map(extra_data_map):
"""Prepare extra data map.
Replace any occurrences of "dimensions.", "meta." or "value_meta."
with "dimensions#", "meta#" or "value_meta#" in extra_data_map.
"""
prepared_extra_data_map = {}
for column_name in list(extra_data_map):
column_value = extra_data_map[column_name]
if column_name.startswith("dimensions."):
column_name = column_name.replace("dimensions.", "dimensions#")
elif column_name.startswith("meta."):
column_name = column_name.replace("meta.", "meta#")
elif column_name.startswith("value_meta."):
column_name = column_name.replace("value_meta.", "value_meta#")
elif column_name.startswith("extra_data_map."):
column_name = column_name.replace("extra_data_map.", "")
prepared_extra_data_map[column_name] = column_value
return prepared_extra_data_map
@staticmethod
def grouped_data_to_map(row, group_by_columns_list):
"""Iterate through group by column values from grouped data set and extract any values.
Return a dictionary which contains the original group by column name and value pairs, if they
are available from the grouped data set.
"""
extra_data_map = getattr(row, "extra_data_map", {})
# add group by fields data to extra data map
for column_name in group_by_columns_list:
column_value = getattr(row, column_name, Component.
DEFAULT_UNAVAILABLE_VALUE)
if (column_value == Component.DEFAULT_UNAVAILABLE_VALUE and
(column_name.startswith("dimensions.") or
column_name.startswith("meta.") or
column_name.startswith("value_meta.") or
column_name.startswith("extra_data_map."))):
split_column_name = column_name.split(".", 1)[-1]
column_value = getattr(row, split_column_name, Component.
DEFAULT_UNAVAILABLE_VALUE)
extra_data_map[column_name] = column_value
return extra_data_map
@staticmethod
def extract_dimensions(instance_usage_dict, dimension_list):
"""Extract dimensions from instance usage.
"""
dimensions_part = {}
# extra_data_map
extra_data_map = instance_usage_dict.get("extra_data_map", {})
for dim in dimension_list:
value = instance_usage_dict.get(dim)
if value is None:
# lookup for value in extra_data_map
if len(list(extra_data_map)) > 0:
value = extra_data_map.get(dim, "all")
if dim.startswith("dimensions#"):
dim = dim.replace("dimensions#", "")
elif dim.startswith("meta#"):
dim = dim.replace("meta#", "")
elif dim.startswith("value_meta#"):
dim = dim.replace("value_meta#", "")
dimensions_part[dim] = value
return dimensions_part
class RecordStoreUtils(TransformUtils):
"""utility methods to transform record store data."""
@staticmethod
def _get_record_store_df_schema():
"""get instance usage schema."""
columns = ["event_timestamp_string",
"event_type", "event_quantity_name",
"event_status", "event_version",
"record_type", "resource_uuid", "tenant_id",
"user_id", "region", "zone",
"host", "project_id",
"event_date", "event_hour", "event_minute",
"event_second", "metric_group", "metric_id"]
columns_struct_fields = [StructField(field_name, StringType(), True)
for field_name in columns]
# Add columns for non-string fields
columns_struct_fields.insert(0,
StructField("event_timestamp_unix",
DoubleType(), True))
columns_struct_fields.insert(0,
StructField("event_quantity",
DoubleType(), True))
# map to metric meta
columns_struct_fields.append(StructField("meta",
MapType(StringType(),
StringType(),
True),
True))
# map to dimensions
columns_struct_fields.append(StructField("dimensions",
MapType(StringType(),
StringType(),
True),
True))
# map to value_meta
columns_struct_fields.append(StructField("value_meta",
MapType(StringType(),
StringType(),
True),
True))
schema = StructType(columns_struct_fields)
return schema
@staticmethod
def recordstore_rdd_to_df(record_store_rdd):
"""convert record store rdd to a dataframe."""
schema = RecordStoreUtils._get_record_store_df_schema()
return TransformUtils._rdd_to_df(record_store_rdd, schema)
@staticmethod
def create_df_from_json(sql_context, jsonpath):
"""create a record store df from json file."""
schema = RecordStoreUtils._get_record_store_df_schema()
record_store_df = sql_context.read.json(jsonpath, schema)
return record_store_df
@staticmethod
def prepare_recordstore_group_by_list(group_by_list):
"""Prepare record store group by list.
If the group by list contains any instances of "dimensions#", "meta#" or "value_meta#" then
convert them into proper dotted notation, since the original raw "dimensions", "meta" and
"value_meta" are available in record_store data.
"""
return [RecordStoreUtils.prepare_group_by_item(item) for item in group_by_list]
@staticmethod
def prepare_group_by_item(item):
"""Prepare record store item for group by.
Replaces any special "dimensions#", "meta#" or "value_meta#" occurrences with
"dimensions.", "meta." and "value_meta.".
"""
if item.startswith("dimensions#"):
item = item.replace("dimensions#", "dimensions.")
elif item.startswith("meta#"):
item = item.replace("meta#", "meta.")
elif item.startswith("value_meta#"):
item = item.replace("value_meta#", "value_meta.")
return item
class TransformSpecsUtils(TransformUtils):
"""utility methods to transform_specs."""
@staticmethod
def _get_transform_specs_df_schema():
"""get transform_specs df schema."""
# FIXME: change when transform_specs df is finalized
source = StructField("source", StringType(), True)
usage = StructField("usage", StringType(), True)
setters = StructField("setters", ArrayType(StringType(),
containsNull=False), True)
insert = StructField("insert", ArrayType(StringType(),
containsNull=False), True)
aggregation_params_map = \
StructField("aggregation_params_map",
StructType([StructField("aggregation_period",
StringType(), True),
StructField("dimension_list",
ArrayType(StringType(),
containsNull=False),
True),
StructField("aggregation_group_by_list",
ArrayType(StringType(),
containsNull=False),
True),
StructField("usage_fetch_operation",
StringType(),
True),
StructField("filter_by_list",
ArrayType(MapType(StringType(),
StringType(),
True)
)
),
StructField(
"usage_fetch_util_quantity_event_type",
StringType(),
True),
StructField(
"usage_fetch_util_idle_perc_event_type",
StringType(),
True),
StructField("setter_rollup_group_by_list",
ArrayType(StringType(),
containsNull=False),
True),
StructField("setter_rollup_operation",
StringType(), True),
StructField("aggregated_metric_name",
StringType(), True),
StructField("pre_hourly_group_by_list",
ArrayType(StringType(),
containsNull=False),
True),
StructField("pre_hourly_operation",
StringType(), True),
StructField("aggregation_pipeline",
StructType([source, usage,
setters, insert]),
True)
]), True)
metric_id = StructField("metric_id", StringType(), True)
schema = StructType([aggregation_params_map, metric_id])
return schema
@staticmethod
def transform_specs_rdd_to_df(transform_specs_rdd):
"""convert transform_specs rdd to a dataframe."""
schema = TransformSpecsUtils._get_transform_specs_df_schema()
return TransformUtils._rdd_to_df(transform_specs_rdd, schema)
@staticmethod
def create_df_from_json(sql_context, jsonpath):
"""create a metric processing df from json file."""
schema = TransformSpecsUtils._get_transform_specs_df_schema()
transform_specs_df = sql_context.read.json(jsonpath, schema)
return transform_specs_df
class MonMetricUtils(TransformUtils):
"""utility methods to transform raw metric."""
@staticmethod
def _get_mon_metric_json_schema():
"""get the schema of the incoming monasca metric."""
metric_struct_field = StructField(
"metric",
StructType([StructField("dimensions",
MapType(StringType(),
StringType(),
True),
True),
StructField("value_meta",
MapType(StringType(),
StringType(),
True),
True),
StructField("name", StringType(), True),
StructField("timestamp", StringType(), True),
StructField("value", StringType(), True)]), True)
meta_struct_field = StructField("meta",
MapType(StringType(),
StringType(),
True),
True)
creation_time_struct_field = StructField("creation_time",
StringType(), True)
schema = StructType([creation_time_struct_field,
meta_struct_field, metric_struct_field])
return schema
@staticmethod
def create_mon_metrics_df_from_json_rdd(sql_context, jsonrdd):
"""create mon metrics df from json rdd."""
schema = MonMetricUtils._get_mon_metric_json_schema()
mon_metrics_df = sql_context.read.json(jsonrdd, schema)
return mon_metrics_df
class PreTransformSpecsUtils(TransformUtils):
"""utility methods to transform pre_transform_specs"""
@staticmethod
def _get_pre_transform_specs_df_schema():
"""get pre_transform_specs df schema."""
# FIXME: change when pre_transform_specs df is finalized
event_type = StructField("event_type", StringType(), True)
metric_id_list = StructField("metric_id_list",
ArrayType(StringType(),
containsNull=False),
True)
required_raw_fields_list = StructField("required_raw_fields_list",
ArrayType(StringType(),
containsNull=False),
True)
event_processing_params = \
StructField("event_processing_params",
StructType([StructField("set_default_zone_to",
StringType(), True),
StructField("set_default_geolocation_to",
StringType(), True),
StructField("set_default_region_to",
StringType(), True),
]), True)
schema = StructType([event_processing_params, event_type,
metric_id_list, required_raw_fields_list])
return schema
@staticmethod
def pre_transform_specs_rdd_to_df(pre_transform_specs_rdd):
"""convert pre_transform_specs processing rdd to a dataframe."""
schema = PreTransformSpecsUtils._get_pre_transform_specs_df_schema()
return TransformUtils._rdd_to_df(pre_transform_specs_rdd, schema)
@staticmethod
def create_df_from_json(sql_context, jsonpath):
"""create a pre_transform_specs df from json file."""
schema = PreTransformSpecsUtils._get_pre_transform_specs_df_schema()
pre_transform_specs_df = sql_context.read.json(jsonpath, schema)
return pre_transform_specs_df
@staticmethod
def prepare_required_raw_fields_list(group_by_list):
"""Prepare required fields list.
If the group by list contains any instances of "dimensions#field", "meta#field" or
"value_meta#field" then convert them into metric.dimensions["field"] syntax.
"""
return [PreTransformSpecsUtils.prepare_required_raw_item(item) for item in group_by_list]
@staticmethod
def prepare_required_raw_item(item):
"""Prepare required field item.
Replaces any special "dimensions#", "meta#" or "value_meta#" occurrences with
spark rdd syntax to fetch the field value.
"""
if item.startswith("dimensions#"):
field_name = item.replace("dimensions#", "")
return "metric.dimensions['%s']" % field_name
elif item.startswith("meta#"):
field_name = item.replace("meta#", "")
return "meta['%s']" % field_name
elif item.startswith("value_meta#"):
field_name = item.replace("value_meta#", "")
return "metric.value_meta['%s']" % field_name
else:
return item
class GroupingResultsUtils(TransformUtils):
"""utility methods to transform record store data."""
@staticmethod
def _get_grouping_results_df_schema(group_by_column_list):
"""get grouping results schema."""
group_by_field_list = [StructField(field_name, StringType(), True)
for field_name in group_by_column_list]
# Initialize columns for string fields
columns = ["firstrecord_timestamp_string",
"lastrecord_timestamp_string"]
columns_struct_fields = [StructField(field_name, StringType(), True)
for field_name in columns]
# Add columns for non-string fields
columns_struct_fields.append(StructField("firstrecord_timestamp_unix",
DoubleType(), True))
columns_struct_fields.append(StructField("lastrecord_timestamp_unix",
DoubleType(), True))
columns_struct_fields.append(StructField("firstrecord_quantity",
DoubleType(), True))
columns_struct_fields.append(StructField("lastrecord_quantity",
DoubleType(), True))
columns_struct_fields.append(StructField("record_count",
DoubleType(), True))
instance_usage_schema_part = StructType(columns_struct_fields)
grouping_results = \
StructType([StructField("grouping_key",
StringType(), True),
StructField("results",
instance_usage_schema_part,
True),
StructField("grouping_key_dict",
StructType(group_by_field_list))])
# schema = \
# StructType([StructField("GroupingResults", grouping_results)])
return grouping_results
@staticmethod
def grouping_results_rdd_to_df(grouping_results_rdd, group_by_list):
"""convert record store rdd to a dataframe."""
schema = GroupingResultsUtils._get_grouping_results_df_schema(
group_by_list)
return TransformUtils._rdd_to_df(grouping_results_rdd, schema)
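
For reference, a pure-Python sketch of the column-name conventions the utilities above convert between: "dimensions#x", "meta#x" and "value_meta#x" items become "extra_data_map.dimensions#x" for instance usage data and "dimensions.x" for record store data. The helper names and the sample group by list are assumptions for illustration.

# Hedged sketch of the group-by item rewriting done by InstanceUsageUtils and
# RecordStoreUtils; no Spark needed.
def prepare_instance_usage_item(item):
    # instance usage data keeps these fields inside the extra_data_map column
    if item.startswith(("dimensions#", "meta#", "value_meta#")):
        return ".".join(("extra_data_map", item))
    return item

def prepare_record_store_item(item):
    # record store data exposes raw dimensions/meta/value_meta map columns
    for prefix in ("dimensions#", "meta#", "value_meta#"):
        if item.startswith(prefix):
            return item.replace("#", ".", 1)
    return item

group_by_list = ["host", "dimensions#device", "meta#tenantId"]
print([prepare_instance_usage_item(i) for i in group_by_list])
# ['host', 'extra_data_map.dimensions#device', 'extra_data_map.meta#tenantId']
print([prepare_record_store_item(i) for i in group_by_list])
# ['host', 'dimensions.device', 'meta.tenantId']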

View File

@ -1,6 +0,0 @@
---
upgrade:
- |
Python 2.7 support has been dropped. The last release of monasca-transform
to support Python 2.7 is OpenStack Train. The minimum version of Python now
supported by monasca-transform is Python 3.6.

View File

@ -1,14 +0,0 @@
# The order of packages is significant, because pip processes them in the order
# of appearance. Changing the order has an impact on the overall integration
# process, which may cause wedges in the gate later.
pbr!=2.1.0,>=2.0.0 # Apache-2.0
psutil>=3.2.2 # BSD
PyMySQL>=0.7.6 # MIT License
six>=1.10.0 # MIT
SQLAlchemy!=1.1.5,!=1.1.6,!=1.1.7,!=1.1.8,>=1.0.10 # MIT
stevedore>=1.20.0 # Apache-2.0
monasca-common>=2.7.0 # Apache-2.0
oslo.config>=5.2.0 # Apache-2.0
oslo.log>=3.36.0 # Apache-2.0
oslo.service!=1.28.1,>=1.24.0 # Apache-2.0
tooz>=1.58.0 # Apache-2.0

View File

@ -1,21 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from zipfile import PyZipFile
with PyZipFile("monasca-transform.zip", "w") as spark_submit_zipfile:
spark_submit_zipfile.writepy(
"../monasca_transform"
)

View File

@ -1,15 +0,0 @@
#!/usr/bin/env bash
SCRIPT_HOME=$(dirname $(readlink -f $BASH_SOURCE))
pushd $SCRIPT_HOME
echo "create_zip.py: creating a zip file at ../monasca_transform/monasca-transform.zip..."
python create_zip.py
rc=$?
if [[ $rc == 0 ]]; then
echo "created zip file at ../monasca_transfom/monasca-transform.zip sucessfully"
else
echo "error creating zip file at ../monasca_transform/monasca-transform.zip, bailing out"
exit 1
fi
popd

View File

@ -1,110 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
"""Generator for ddl
-t type of output to generate - either 'pre_transform_spec' or 'transform_spec'
-o output path
-i path to template file
-s path to source json file
"""
import getopt
import json
import os.path
import sys
class Generator(object):
key_name = None
def generate(self, template_path, source_json_path, output_path):
print("Generating content at %s with template at %s, using key %s" % (
output_path, template_path, self.key_name))
data = []
with open(source_json_path) as f:
for line in f:
json_line = json.loads(line)
data_line = '(\'%s\',\n\'%s\')' % (
json_line[self.key_name], json.dumps(json_line))
data.append(str(data_line))
print(data)
with open(template_path) as f:
template = f.read()
with open(output_path, 'w') as write_file:
write_file.write(template)
# join the records with ',\n' and terminate with ';'
# (a non-zero relative seek()/truncate() on a text-mode file is not
# supported in Python 3, which the old per-record write loop relied on)
write_file.write(',\n'.join(data))
write_file.write(';')
class TransformSpecsGenerator(Generator):
key_name = 'metric_id'
class PreTransformSpecsGenerator(Generator):
key_name = 'event_type'
def main():
# parse command line options
try:
opts, args = getopt.getopt(sys.argv[1:], "ht:o:i:s:")
print('Opts = %s' % opts)
print('Args = %s' % args)
except getopt.error as msg:
print(msg)
print("for help use --help")
sys.exit(2)
script_type = None
template_path = None
source_json_path = None
output_path = None
# process options
for o, a in opts:
if o in ("-h", "--help"):
print(__doc__)
sys.exit(0)
elif o == "-t":
script_type = a
if a not in ('pre_transform_spec', 'transform_spec'):
print('Incorrect output type specified: \'%s\'.\n %s' % (
a, __doc__))
sys.exit(1)
elif o == "-i":
template_path = a
if not os.path.isfile(a):
print('Cannot find template file at %s' % a)
sys.exit(1)
elif o == "-o":
output_path = a
elif o == "-s":
source_json_path = a
print("Called with type = %s, template_path = %s, source_json_path %s"
" and output_path = %s" % (
script_type, template_path, source_json_path, output_path))
generator = None
if script_type == 'pre_transform_spec':
generator = PreTransformSpecsGenerator()
elif script_type == 'transform_spec':
generator = TransformSpecsGenerator()
generator.generate(template_path, source_json_path, output_path)
if __name__ == '__main__':
main()
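
For reference, a pure-Python sketch of what the generator above produces: each JSON spec line becomes a ('<key>', '<json>') tuple appended after the INSERT template, comma separated and terminated with ';'. The inline template text and the toy spec line are assumptions for illustration.

# Hedged sketch of the DDL assembly performed by Generator.generate().
import json

template = ("DELETE FROM `monasca_transform`.`transform_specs`;\n"
            "INSERT IGNORE INTO `monasca_transform`.`transform_specs`\n"
            "(`metric_id`,\n`transform_spec`)\nVALUES\n")

spec_lines = ['{"metric_id": "mem_total_all", "aggregation_params_map": {}}']

data = []
for line in spec_lines:
    json_line = json.loads(line)
    # one VALUES row per spec line: (key, full json document)
    data.append("('%s',\n'%s')" % (json_line["metric_id"],
                                   json.dumps(json_line)))

sql = template + ",\n".join(data) + ";"
print(sql)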

View File

@ -1,6 +0,0 @@
DELETE FROM `monasca_transform`.`pre_transform_specs`;
INSERT IGNORE INTO `monasca_transform`.`pre_transform_specs`
(`event_type`,
`pre_transform_spec`)
VALUES

View File

@ -1,6 +0,0 @@
DELETE FROM `monasca_transform`.`transform_specs`;
INSERT IGNORE INTO `monasca_transform`.`transform_specs`
(`metric_id`,
`transform_spec`)
VALUES

View File

@ -1,31 +0,0 @@
#!/usr/bin/env bash
SCRIPT_HOME=$(dirname $(readlink -f $BASH_SOURCE))
pushd $SCRIPT_HOME
PRE_TRANSFORM_SPECS_JSON="../monasca_transform/data_driven_specs/pre_transform_specs/pre_transform_specs.json"
PRE_TRANSFORM_SPECS_SQL="ddl/pre_transform_specs.sql"
TRANSFORM_SPECS_JSON="../monasca_transform/data_driven_specs/transform_specs/transform_specs.json"
TRANSFORM_SPECS_SQL="ddl/transform_specs.sql"
echo "converting {$PRE_TRANSFORM_SPECS_JSON} to {$PRE_TRANSFORM_SPECS_SQL} ..."
python ddl/generate_ddl.py -t pre_transform_spec -i ddl/pre_transform_specs_template.sql -s "$PRE_TRANSFORM_SPECS_JSON" -o "$PRE_TRANSFORM_SPECS_SQL"
rc=$?
if [[ $rc == 0 ]]; then
echo "converting {$PRE_TRANSFORM_SPECS_JSON} to {$PRE_TRANSFORM_SPECS_SQL} sucessfully..."
else
echo "error in converting {$PRE_TRANSFORM_SPECS_JSON} to {$PRE_TRANSFORM_SPECS_SQL}, bailing out"
exit 1
fi
echo "converting {$TRANSFORM_SPECS_JSON} to {$TRANSFORM_SPECS_SQL}..."
python ddl/generate_ddl.py -t transform_spec -i ddl/transform_specs_template.sql -s "$TRANSFORM_SPECS_JSON" -o "$TRANSFORM_SPECS_SQL"
rc=$?
if [[ $rc == 0 ]]; then
echo "converting {$TRANSFORM_SPECS_JSON} to {$TRANSFORM_SPECS_SQL} sucessfully..."
else
echo "error in converting {$TRANSFORM_SPECS_JSON} to {$TRANSFORM_SPECS_SQL}, bailing out"
exit 1
fi
popd

View File

@ -1,11 +0,0 @@
#!/usr/bin/env bash
SCRIPT_HOME=$(dirname $(readlink -f $BASH_SOURCE))
pushd $SCRIPT_HOME
./generate_ddl.sh
cp ddl/pre_transform_specs.sql ../devstack/files/monasca-transform/pre_transform_specs.sql
cp ddl/transform_specs.sql ../devstack/files/monasca-transform/transform_specs.sql
popd

View File

@ -1,20 +0,0 @@
#!/usr/bin/env bash
SCRIPT_HOME=$(dirname $(readlink -f $BASH_SOURCE))
pushd $SCRIPT_HOME
pushd ../
rm -rf build monasca-transform.egg-info dist
python setup.py bdist_egg
found_egg=`ls dist`
echo
echo
echo Created egg file at dist/$found_egg
dev=dev
find_dev_index=`expr index $found_egg $dev`
new_filename=${found_egg:0:$find_dev_index - 1 }egg
echo Copying dist/$found_egg to dist/$new_filename
cp dist/$found_egg dist/$new_filename
popd
popd

View File

@ -1,3 +0,0 @@
#!/bin/bash
JARS_PATH="/opt/spark/current/lib/spark-streaming-kafka.jar,/opt/spark/current/lib/scala-library-2.10.1.jar,/opt/spark/current/lib/kafka_2.10-0.8.1.1.jar,/opt/spark/current/lib/metrics-core-2.2.0.jar"
pyspark --master spark://192.168.10.4:7077 --jars $JARS_PATH

View File

@ -1,19 +0,0 @@
#!/bin/bash
SCRIPT_HOME=$(dirname $(readlink -f $BASH_SOURCE))
pushd $SCRIPT_HOME
pushd ../
JARS_PATH="/opt/spark/current/lib/spark-streaming-kafka.jar,/opt/spark/current/lib/scala-library-2.10.1.jar,/opt/spark/current/lib/kafka_2.10-0.8.1.1.jar,/opt/spark/current/lib/metrics-core-2.2.0.jar,/usr/share/java/mysql.jar"
export SPARK_HOME=/opt/spark/current/
# There is a known issue where obsolete kafka offsets can cause the
# driver to crash. However when this occurs, the saved offsets get
# deleted such that the next execution should be successful. Therefore,
# create a loop to run spark-submit for two iterations or until
# control-c is pressed.
COUNTER=0
while [ $COUNTER -lt 2 ]; do
spark-submit --supervise --master spark://192.168.10.4:7077,192.168.10.5:7077 --conf spark.eventLog.enabled=true --jars $JARS_PATH --py-files dist/$new_filename /opt/monasca/transform/lib/driver.py || break
let COUNTER=COUNTER+1
done
popd
popd

View File

@ -1,49 +0,0 @@
[metadata]
name=monasca_transform
summary=Data Aggregation and Transformation component for Monasca
description-file = README.rst
author= OpenStack
author-email = openstack-discuss@lists.openstack.org
home-page=https://wiki.openstack.org/wiki/Monasca/Transform
python-requires = >=3.6
classifier =
Environment :: OpenStack
Intended Audience :: Information Technology
Intended Audience :: System Administrators
License :: OSI Approved :: Apache Software License
Operating System :: POSIX :: Linux
Programming Language :: Python
Programming Language :: Python :: Implementation :: CPython
Programming Language :: Python :: 3 :: Only
Programming Language :: Python :: 3
Programming Language :: Python :: 3.6
Programming Language :: Python :: 3.7
[files]
packages =
monasca_transform
[entry_points]
monasca_transform.usage =
calculate_rate = monasca_transform.component.usage.calculate_rate:CalculateRate
fetch_quantity = monasca_transform.component.usage.fetch_quantity:FetchQuantity
fetch_quantity_util = monasca_transform.component.usage.fetch_quantity_util:FetchQuantityUtil
monasca_transform.setter =
set_aggregated_metric_name = monasca_transform.component.setter.set_aggregated_metric_name:SetAggregatedMetricName
set_aggregated_period = monasca_transform.component.setter.set_aggregated_period:SetAggregatedPeriod
rollup_quantity = monasca_transform.component.setter.rollup_quantity:RollupQuantity
monasca_transform.insert =
prepare_data = monasca_transform.component.insert.prepare_data:PrepareData
insert_data = monasca_transform.component.insert.kafka_insert:KafkaInsert
insert_data_pre_hourly = monasca_transform.component.insert.kafka_insert_pre_hourly:KafkaInsertPreHourly
[pbr]
warnerrors = True
autodoc_index_modules = True
[build_sphinx]
all_files = 1
build-dir = doc/build
source-dir = doc/source
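
For reference, a hedged sketch of how one of the entry points declared above could be resolved with stevedore (which is listed in requirements.txt); the namespace and name come from setup.cfg, but the exact loading code used by monasca-transform is not shown in this diff.

# Hedged sketch: load the fetch_quantity usage component via its entry point.
from stevedore import driver

mgr = driver.DriverManager(
    namespace='monasca_transform.usage',
    name='fetch_quantity',
    invoke_on_load=False)

# mgr.driver is the FetchQuantity class registered in setup.cfg
print(mgr.driver)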

View File

@ -1,20 +0,0 @@
# Copyright (c) 2013 Hewlett-Packard Development Company, L.P.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import setuptools
setuptools.setup(
setup_requires=['pbr>=2.0.0'],
pbr=True)

View File

@ -1,14 +0,0 @@
# The order of packages is significant, because pip processes them in the order
# of appearance. Changing the order has an impact on the overall integration
# process, which may cause wedges in the gate later.
# mock object framework
hacking>=1.1.0,<1.2.0 # Apache-2.0
flake8<2.6.0,>=2.5.4 # MIT
nose>=1.3.7 # LGPL
fixtures>=3.0.0 # Apache-2.0/BSD
pycodestyle==2.5.0 # MIT License
stestr>=2.0.0 # Apache-2.0
# required to build documentation
sphinx!=1.6.6,!=1.6.7,>=1.6.2,!=2.1.0 # BSD
# computes code coverage percentages
coverage!=4.4,>=4.0 # Apache-2.0

View File

View File

@ -1,27 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
# Add the location of Spark to the path
# TODO(someone) Does the "/opt/spark/current" location need to be configurable?
import os
import sys
try:
sys.path.append(os.path.join("/opt/spark/current", "python"))
sys.path.append(os.path.join("/opt/spark/current",
"python", "lib", "py4j-0.10.4-src.zip"))
except KeyError:
print("Error adding Spark location to the path")
# TODO(someone) not sure what action is appropriate
sys.exit(1)

View File

@ -1,87 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from unittest import mock
from pyspark.sql import SQLContext
from monasca_transform.config.config_initializer import ConfigInitializer
from monasca_transform.transform.builder.generic_transform_builder \
import GenericTransformBuilder
from monasca_transform.transform.transform_utils import RecordStoreUtils
from monasca_transform.transform.transform_utils import TransformSpecsUtils
from monasca_transform.transform import TransformContextUtils
from tests.functional.spark_context_test import SparkContextTest
from tests.functional.test_resources.mem_total_all.data_provider \
import DataProvider
from tests.functional.test_resources.mock_component_manager \
import MockComponentManager
class TransformBuilderTest(SparkContextTest):
def setUp(self):
super(TransformBuilderTest, self).setUp()
# configure the system with a dummy messaging adapter
ConfigInitializer.basic_config(
default_config_files=[
'tests/functional/test_resources/config/test_config.conf'])
@mock.patch('monasca_transform.transform.builder.generic_transform_builder'
'.GenericTransformBuilder._get_insert_component_manager')
@mock.patch('monasca_transform.transform.builder.generic_transform_builder'
'.GenericTransformBuilder._get_setter_component_manager')
@mock.patch('monasca_transform.transform.builder.generic_transform_builder'
'.GenericTransformBuilder._get_usage_component_manager')
def test_transform_builder(self,
usage_manager,
setter_manager,
insert_manager):
usage_manager.return_value = MockComponentManager.get_usage_cmpt_mgr()
setter_manager.return_value = \
MockComponentManager.get_setter_cmpt_mgr()
insert_manager.return_value = \
MockComponentManager.get_insert_cmpt_mgr()
record_store_json_path = DataProvider.record_store_path
metric_proc_json_path = DataProvider.transform_spec_path
sql_context = SQLContext.getOrCreate(self.spark_context)
record_store_df = \
RecordStoreUtils.create_df_from_json(sql_context,
record_store_json_path)
transform_spec_df = TransformSpecsUtils.create_df_from_json(
sql_context, metric_proc_json_path)
transform_context = TransformContextUtils.get_context(
transform_spec_df_info=transform_spec_df,
batch_time_info=self.get_dummy_batch_time())
# invoke the generic transformation builder
instance_usage_df = GenericTransformBuilder.do_transform(
transform_context, record_store_df)
result_list = [(row.usage_date, row.usage_hour,
row.tenant_id, row.host, row.quantity,
row.aggregated_metric_name)
for row in instance_usage_df.rdd.collect()]
expected_result = [('2016-02-08', '18', 'all',
'all', 12946.0,
'mem.total_mb_agg')]
self.assertCountEqual(result_list, expected_result)

View File

@ -1,72 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from oslo_config import cfg
from monasca_transform.component.insert import InsertComponent
from tests.functional.messaging.adapter import DummyAdapter
class DummyInsert(InsertComponent):
"""Insert component that writes metric data to kafka queue"""
@staticmethod
def insert(transform_context, instance_usage_df):
"""write instance usage data to kafka"""
transform_spec_df = transform_context.transform_spec_df_info
agg_params = transform_spec_df.select("aggregation_params_map"
".dimension_list"
).collect()[0].asDict()
cfg.CONF.set_override(
'adapter',
'tests.functional.messaging.adapter:DummyAdapter',
group='messaging')
# Approach 1
# using foreachPartition to iterate through elements in an
# RDD is the recommended approach so as to not overwhelm kafka with the
# zillion connections (but in our case the MessageAdapter does
# store the adapter_impl so we should not create many producers)
# using foreachPartition was causing some serialization (cpickle)
# problems where a few libs like kafka.SimpleProducer and oslo_config.cfg
# were not available
#
# removing _write_metrics_from_partition for now in favor of
# Approach 2
#
# instance_usage_df_agg_params = instance_usage_df.rdd.map(
# lambda x: InstanceUsageDataAggParams(x,
# agg_params))
# instance_usage_df_agg_params.foreachPartition(
# DummyInsert._write_metrics_from_partition)
#
# Approach # 2
#
# using collect() to fetch all elements of an RDD
# and write to kafka
#
for instance_usage_row in instance_usage_df.collect():
metric = InsertComponent._get_metric(instance_usage_row,
agg_params)
# validate metric part
if InsertComponent._validate_metric(metric):
DummyAdapter.send_metric(metric)
return instance_usage_df
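
For reference, a hedged sketch of "Approach 1" described in the comments above: writing rows out per partition with foreachPartition instead of collect(). The send() helper and the toy dataframe are assumptions; send() merely stands in for DummyAdapter.send_metric.

# Hedged sketch of the foreachPartition write pattern.
from pyspark.sql import SparkSession

def send(metric):
    # hypothetical sink; in the removed test code this is DummyAdapter.send_metric
    print(metric)

def write_partition(rows):
    # one adapter/connection per partition instead of one per row
    for row in rows:
        send(row.asDict())

spark = SparkSession.builder.master("local[2]").getOrCreate()
df = spark.createDataFrame([("mem.total_mb_agg", 12946.0)],
                           ["aggregated_metric_name", "quantity"])
df.rdd.foreachPartition(write_partition)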

View File

@ -1,69 +0,0 @@
# Copyright 2016 Hewlett Packard Enterprise Development Company LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from oslo_config import cfg
from monasca_transform.component.insert import InsertComponent
from tests.functional.messaging.adapter import DummyAdapter
class DummyInsertPreHourly(InsertComponent):
"""Insert component that writes metric data to kafka queue"""
@staticmethod
def insert(transform_context, instance_usage_df):
"""write instance usage data to kafka"""
transform_spec_df = transform_context.transform_spec_df_info
agg_params = transform_spec_df.select("metric_id"
).collect()[0].asDict()
metric_id = agg_params['metric_id']
cfg.CONF.set_override(
'adapter',
'tests.functional.messaging.adapter:DummyAdapter',
group='messaging')
# Approach 1
# using foreachPartition to iterate through elements in an
# RDD is the recommended approach so as to not overwhelm kafka with the
# zillion connections (but in our case the MessageAdapter does
# store the adapter_impl so we should not create many producers)
# using foreachPartition was causing some serialization (cpickle)
# problems where a few libs like kafka.SimpleProducer and oslo_config.cfg
# were not available
#
# removing _write_metrics_from_partition for now in favor of
# Approach 2
#
# instance_usage_df_agg_params = instance_usage_df.rdd.map(
# lambda x: InstanceUsageDataAggParams(x,
# agg_params))
# instance_usage_df_agg_params.foreachPartition(
# DummyInsert._write_metrics_from_partition)
#
# Approach # 2
#
# using collect() to fetch all elements of an RDD
# and write to kafka
#
for instance_usage_row in instance_usage_df.collect():
instance_usage_dict = InsertComponent\
._get_instance_usage_pre_hourly(instance_usage_row, metric_id)
DummyAdapter.send_metric(instance_usage_dict)
return instance_usage_df

Some files were not shown because too many files have changed in this diff