There are several invalid cases that Rally previously ignored and that
will soon become validation errors:
* transmitting None where dict is expected
* setting the 'name' of entities. This is restricted, since Rally
generates pseudo-random names that can be filtered by the cleanup
mechanism. Currently, Rally overrides the 'name' parameter silently,
but this will become an error soon.
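To make the two cases concrete, here is a hypothetical workload snippet
(not taken from this change; the keys and the helper below are examples
only) showing both soon-to-be-invalid patterns next to a valid variant:

```python
# Hypothetical workload fragments illustrating the two soon-to-be-invalid
# cases described above; key names mirror Rally task structure but the
# values and the checker are illustrative only.
invalid_workload = {
    "args": {
        # Setting 'name' is restricted: Rally generates pseudo-random
        # names so its cleanup mechanism can find leftover resources.
        "name": "my-custom-server",
    },
    # None where a dict is expected -> will become a validation error.
    "context": None,
}

valid_workload = {
    "args": {},     # let Rally generate the resource name itself
    "context": {},  # an (empty) dict instead of None
}

def violations(workload):
    """Return the problems named in the text above, if any."""
    problems = []
    if workload.get("context") is None:
        problems.append("context must be a dict, not None")
    if "name" in workload.get("args", {}):
        problems.append("'name' must not be set; Rally overrides it")
    return problems
```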
Not long ago we enabled a Rally scenario booting VMs in the neutron gate
so that we could collect osprofiler reports about it. The Rally scenario
we re-used was previously triggered only by changes in the
rally-openstack repo, so I could not collect data about its failure
rate. Now that it runs frequently in the neutron gate, this scenario
actually seems to be quite unstable (usually timing out while waiting
for the VM to become ACTIVE):
http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:\"rally.exceptions.TimeoutException: Rally tired waiting\" AND build_name:\"neutron-rally-task\" AND voting:1&from=864000s
Since we only want to run this scenario for the osprofiler report,
we can get rid of the gate instability by allowing a 100% failure rate
in the scenario's SLA.
This commit makes use of recently merged functionality in rally
to create multiple security group rules per security group and list
them. This will help us identify API performance regressions with
respect to security group rules.
The option to define the name of the floating network enables more
flexibility when using the task file as-is.
Signed-off-by: Juha Kosonen <email@example.com>
The Rally team moved the OpenStack plugins to a separate repository;
the in-tree code is now deprecated and will be removed soon.
This patch changes several imports to use the latest available code.
The Rally team finally added native Zuul v3 jobs (with a set of
separate roles, etc.), and to simplify maintenance it would be nice
to use them.
are deprecated in Rally 0.10.0. Instead of being called directly,
validators should be applied via the new decorator
'rally.common.validation.add', and this commit switches to using
validators in the new way.
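As a self-contained sketch of the decorator-based registration pattern
(an analogue of what 'rally.common.validation.add' does in spirit, not
Rally's actual implementation; all names below are invented for
illustration):

```python
# Illustrative analogue of decorator-driven validator registration.
# Names and signatures are simplified and NOT Rally's real API.
VALIDATORS = {}

def register_validator(name):
    """Register a validator class under a symbolic name."""
    def wrapper(cls):
        VALIDATORS[name] = cls
        return cls
    return wrapper

def add(name, **kwargs):
    """Attach a named validator to a plugin class instead of calling
    the validator function directly inside the plugin."""
    def wrapper(plugin_cls):
        checks = getattr(plugin_cls, "_validators", [])
        plugin_cls._validators = checks + [(VALIDATORS[name], kwargs)]
        return plugin_cls
    return wrapper

@register_validator("required_params")
class RequiredParams:
    def __init__(self, params):
        self.params = params

    def validate(self, config):
        missing = [p for p in self.params if p not in config]
        return "missing: %s" % ", ".join(missing) if missing else None

@add("required_params", params=["image", "flavor"])
class BootServerScenario:
    """A plugin whose config is checked before execution."""

def validate(plugin_cls, config):
    """Run every validator attached to the plugin; return first error."""
    for validator_cls, kwargs in getattr(plugin_cls, "_validators", []):
        err = validator_cls(**kwargs).validate(config)
        if err:
            return err
    return None
```

The point of the pattern is that validation becomes declarative
metadata on the plugin class rather than imperative calls scattered
through plugin code.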
When at least one service named q-* is present in ENABLED_SERVICES,
devstack uses lib/neutron-legacy to configure services regardless of
how the other services are deployed (e.g. with lib/neutron). This
breaks deployments that use lib/neutron.
Switching to the new names doesn't change anything substantial, because
the devstack plugin handles both variants equally; it does, however,
allow the new devstack neutron library to be used.
The new task format was introduced recently. It unifies different
sections and makes things a bit simpler.
A Rally task consists of subtasks; there must be at least one.
A subtask is a group of workloads. Soon it will be possible to define
a single SLA for all workloads in a subtask and, even more, to use
contexts executed once for a group of workloads (i.e. create temporary
users or a network once per group rather than for each workload).
A workload is a combination of different plugins to be executed for a
test. The most important are the scenario plugin (what will be executed
in each iteration), the runner (how the load should be generated) and
the contexts (what resources should be pre-created before the workload).
One scenario with different runners/contexts can create different load.
To distinguish them, a new "description" section for workloads was
introduced. It allows adding a custom description for a workload, which
will be displayed in the report files. If the "description" section is
missing, the scenario's description is used instead.
Also, note that the "failure_rate: 0" SLA is now the default, so there
is no need to specify it.
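To make the structure concrete, here is a hypothetical task in the new
format, written as a Python dict (Rally also accepts JSON task files);
the scenario name, runner settings and all values are illustrative
only, not from this change:

```python
# Hypothetical task in the new format: a task holds a list of subtasks,
# and each subtask groups one or more workloads.
task = {
    "version": 2,
    "title": "Example task in the new format",
    "subtasks": [
        {
            "title": "Network CRUD under light load",
            "workloads": [
                {
                    # Custom description shown in the report files; if
                    # omitted, the scenario's own description is used.
                    "description": "Serial run, small load",
                    "scenario": {
                        "NeutronNetworks.create_and_list_networks": {}
                    },
                    # Runner: how the load is generated.
                    "runner": {
                        "constant": {"times": 8, "concurrency": 2}
                    },
                    # Contexts: resources pre-created for the workload.
                    "contexts": {
                        "users": {"tenants": 1, "users_per_tenant": 1}
                    },
                    # No "sla" section: "failure_rate: 0" is the default.
                },
            ],
        },
    ],
}

# A task must contain at least one subtask.
assert len(task["subtasks"]) >= 1
```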
Since I9d3bafa075631a3f48cbd3627a4cc1a5a859cce2 in Rally, the platform
should be part of the context name (context@platform).
Otherwise a warning message (or even breakage, because of
I10ac687f9f420dcf0d907b51d5d9303f68d35719) may be triggered.
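For illustration (hypothetical snippet, not from this change), renaming
a context to the platform-qualified form looks like this, assuming the
OpenStack platform suffix:

```python
# Illustrative only: the same context, before and after adding the
# platform suffix required by the change referenced above.
old_contexts = {"users": {"tenants": 2, "users_per_tenant": 1}}
new_contexts = {"users@openstack": {"tenants": 2, "users_per_tenant": 1}}

def has_platform_suffix(context_name):
    """A context name in the new form is 'name@platform'."""
    return "@" in context_name
```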
This test is executed 4 times, and creating 1000 Neutron ports just to
run this scenario 4 times results in this job taking 15 minutes to
complete its iterations.
This cuts the count in half, to 125 per execution, to halve the run
time and ensure we don't get too close to the gate timeout.
This was done once before in 817a19c4b9, but since it was part of
another SLA change it was unfortunately reverted along with that
change.
New validation now enforces that times is >= concurrency because times
is total number of *runs*, not the number of concurrent *run sets*.
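The constraint can be stated directly (a minimal sketch, assuming a
constant-style runner; the function name is invented for illustration):

```python
def runner_config_valid(times, concurrency):
    """New validation: 'times' is the total number of runs, so it must
    cover at least one full set of concurrent runners."""
    return times >= concurrency
```

For example, 8 total runs with 4 at a time is valid, while 2 runs with
4 concurrent workers is not, since the extra workers would have nothing
to execute.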
In a normal gate run these are returning in 2 seconds each
on average. Let's reduce the SLA of these from 15 to 5 now
to help prevent future performance regressions in this area.
During a normal run, the top three scenarios account for slightly
more than a half hour of runtime. Sample numbers:
Scenario                                      Load Duration  Full Duration
NeutronNetworks.create_and_update_subnets           562.189      1,182.400
NeutronTrunks.create_and_list_trunk_subports        427.475        600.721
NeutronNetworks.create_and_list_ports               310.167        540.144
This patch reduces the resources created by each of the 3 by 75%. This
should save us an additional ~20 minutes during a normal gate run, which
should change our window for timeout from approximately a half hour on a
normal node to about 50 minutes.
This additional buffer should hopefully be enough to reduce the failure
rate for the rally job when it gets scheduled to a slow node.
This allows us to configure neutron when running the rally job in
the gate. This effort stems from patch . Blame Kevin for not
wanting to squash the two together.
The previous configuration of the task was taking up
to a half hour to run. Between this task and the others,
it was eating up all of our gate time, leaving no room
to add new jobs.
This reduces the number of runs five-fold, from 40 to 8. This still
gives us a reasonable number to average over, especially since each
run creates 100 ports.
Previous runs show that creating ports under this high load ends up
taking >5 secs per port on average. Let's set it to 4, which is still
double the api_worker count in some cases for the current gate.
Quoting the quota devref:
For a reservation to be successful, the total amount of resources requested,
plus the total amount of resources reserved, plus the total amount of resources
already stored in the database should not exceed the project's quota limit.
This means that in the absolute worst case scenario with 20 concurrent
workers, 19 could have made reservations, committed resources, but not
yet cleared their reservation. Because of the outstanding reservation
and the resources created by the 19 workers, they will all be
double-counted until their reservation is cleared (or it expires).
This adjusts the rally scenarios to handle the double-count for
This increases the rally port and network count to 100
and enables quotas to exercise the quota engine to
better simulate a real system.
Additionally, it reduces the SLA requirements because of
regressions that have snuck in throughout the cycle. As
they are fixed these should be reduced back down.
* Since 24 Nov 2014 we have added a lot of Neutron benchmarks.
Running more Neutron-related benchmarks in the Neutron gate allows us
to avoid performance regressions and races.
* Neutron benchmarks are described here:
It's quite simple code; feel free to take a look.
* All changes in concurrency and times are related to optimization.
* To get a description of the benchmarks use:
rally info find NeutronNetworks.create_and_update_networks
Related bug: #1419723
*) Rename rally-scenarios, which is quite misleading, to rally-jobs.
rally-jobs makes much more sense, because it actually contains files
related to the Rally job.
*) Update rally-jobs/README.rst to add more info.
*) Update rally-jobs/plugins/README.rst to explain plugins.
*) Add a new directory, rally-jobs/extra; this directory is copied
in gates and can be used for files that are required by some of