This patch reduces memory usage for Cinder Volume and Backup services by
tuning glibc.
The specific tuning consists of disabling the per-thread arenas and
disabling dynamic thresholds.
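As a rough illustration of this kind of tuning (not necessarily the exact
settings this patch applies), the sketch below builds the environment for a
service process. MALLOC_ARENA_MAX=1 restricts glibc to a single arena, and,
per mallopt(3), explicitly setting MALLOC_MMAP_THRESHOLD_ or
MALLOC_TRIM_THRESHOLD_ turns off glibc's dynamic adjustment of those
thresholds. The numeric values and the tuned_environment helper are
assumptions made for the example.

```python
import os

# Assumed, illustrative values; the patch may use different numbers.
GLIBC_TUNING = {
    # Limit glibc to one arena instead of per-thread arenas.
    "MALLOC_ARENA_MAX": "1",
    # Setting either threshold explicitly disables dynamic thresholds.
    "MALLOC_MMAP_THRESHOLD_": "131072",
    "MALLOC_TRIM_THRESHOLD_": "262144",
}


def tuned_environment(base_env=None):
    """Return a copy of base_env (or os.environ) with the tuning applied."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(GLIBC_TUNING)
    return env
```

The resulting mapping could be passed as the env argument when spawning the
service, or translated into container environment settings.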
The Cinder Backup service suffers from high water mark memory usage and
uses excessive memory. As an example, after just 10 restore operations
the service uses almost 1GB of RAM and never frees it afterwards.
With this patch the memory consumption of the service is reduced down to
almost 130MB. If we add a revert from Cinder (Change-Id
I43a20c8687f12bc52b014611cc6977c4c3ca212c) it goes down to 100MB during
my tests.
The problem is even worse on real deployments, where with this patch we
have seen peak memory usage go down from 2.9GB to 1.2GB, and the high
water mark issue fixed: the service previously stayed constantly at 2GB
and now goes down to 140MB.
This glibc tuning is not applied to all Python services because I
haven't done proper testing on them and at first glance they don't seem
to show such large improvements.
This is the equivalent of the devstack proposed patch from Change-Id
Ic9030d01468b3189350f83b04a8d1d346c489d3c
Related-bug: #1908805
Change-Id: I65b32f4ce3fddeb694fb33ca65076d45d23a3bb6
This commit updates the default tripleo_containers jinja template
splitting off the Ceph related container images.
With this new approach pulling the ceph containers is optional,
and can be avoided by setting the 'ceph_images' boolean to False.
To make this possible, a new jinja template processing approach
has been introduced, and a template basedir parameter (required
by the jinja loader) has been added to the BaseImageManager.
In particular:
- the 'template_dir' parameter represents the location path to the
j2 templates that can be included in the main tripleo containers
template; a default location (which matches with the default j2
path) has been added, but if nothing is passed the old behavior
is maintained;
- Two more 'ceph_' prefixed containers, required to deploy the Ceph
Ingress daemon, are added; they are supposed to match the
tripleo-heat-templates 'OS::TripleO::Services::CephIngress' service.
The Ingress daemon won’t be baked into the Ceph daemon container,
hence `tripleo container image prepare` should be executed to pull
the new container images/tags in the undercloud, as is done for the
Ceph Dashboard and the regular Ceph image.
Change-Id: I7e337596b653cf635f07a36606e9f673044402a3
Now that the whole-disk image is being deployed, mounting an image
using tools like kpartx or qemu-nbd is much more involved, requiring
knowledge of the image LVM volumes and their intended mount points.
The scripts tripleo-mount-image and tripleo-unmount-image will mount
the contents of an overcloud image file using qemu-nbd, making it available
for chroot, or other read/write image operations. The scripts handle
partition images (overcloud-full.qcow2) as well as the whole-disk image
(overcloud-hardened-uefi-full.qcow2) with its multiple LVM volume mount
points.
qemu-nbd was chosen over kpartx as downstream documentation[1] has
standardized on this tool for mount based image modifications.
tripleo-unmount-image is a symlink to tripleo-mount-image and behaves
differently based on the script name.
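The name-based dispatch described above can be sketched as follows. This is
a Python illustration only, not the scripts' actual implementation; the
select_action helper and its return values are hypothetical.

```python
import os


def select_action(argv0):
    """Pick the action from the name the script was invoked as."""
    name = os.path.basename(argv0)
    if name == "tripleo-unmount-image":
        return "unmount"
    # Any other name, including tripleo-mount-image, means mount.
    return "mount"
```

A caller would typically pass sys.argv[0], so invoking the script through
the symlink selects the unmount path without a separate code base.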
Blueprint: whole-disk-default
[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/bare_metal_provisioning/booting-from-cinder-volumes
Change-Id: I3267b4ae5200eeed333a9518865260d23315f52c
When removing support for the "mac" field in nodes
json a check was added to raise an error when using
the removed field. There was a typo in the field name.
This fixes the typo, and the validation.
Also fix the same typo in the release note.
Change-Id: I518c854af6d8e2853fe661902b101a4cdecff2a7
Related-Bug: 1934133
In change I74d4178dbb0cfe8c934ce15e3e7c9bb1c469de10
the "macs" field in nodes_json was deprecated and
replaced by the "ports" field.
It has been several cycles with a deprecation warning
in the logs. Remove convert_nodes_json_mac_to_ports
which kept backward compatibility.
Depends-On: Ia4fa4b950114c5fcc787a7ffb957360c65c850c9
Depends-On: Ib520d5e8366159917076c35cf80efb4b5fcffca6
Change-Id: I559cc656758843a8b1432069952b8d917fc4649a
This removes the skopeo based container image uploader which was
deprecated back in Stein. It doesn't work with our image-serve registry
so let's drop this code and dependency.
Change-Id: I23157ad3a96553047a52f10ed203f71366da49ef
Add file to the reno documentation build to show release notes for
stable/wallaby.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/wallaby.
Sem-Ver: feature
Change-Id: I9adc416e36a1126a0a466d453430f08cd2dafe2b
This change updates the artifact tooling to ensure that we're not
assuming we have access to a Swift deployment on the undercloud.
The tooling will now sync files locally and push them to target
host(s) during deployment time.
Depends-On: I5d18cf334c1bc4011db968fbeb4f9e41869611cd
Change-Id: If7307bfd61456aaebd28aae20ac5c9025c25d68c
Signed-off-by: Kevin Carter <kecarter@redhat.com>
The tempest container was used with validate-tempest, which is now
deprecated, and all the jobs running upstream are using os_tempest,
which doesn't rely on containerized tempest, but only on the rpm
packages and/or installing tempest directly from git.
The tempest container image is no longer being used.
Depends-On: https://review.opendev.org/c/openstack/tripleo-quickstart/+/778447
Related-Bug: #1916875
Change-Id: I4dc869019a04bf092f8c219e470be55f3bc682d3
The qemu user on the host gets created using uid/gid 107. Certificates
on the host, but also the vhost-user sockets created by ovs use this
uid/gid. With the move to TCIB images the uid/gid reverted to the kolla
defaults and the previous override was dropped. As a result, e.g. the
qemu processes fail to use the libvirt-vnc bind mounted certificates.
This change brings back the previous override of the qemu user
uid/gid.
Closes-Bug: #1903508
Related-Bug: #1900986
Change-Id: I54b9d9f341b521b415a6dccc6c78ae7a77821f6f
Add file to the reno documentation build to show release notes for
stable/victoria.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/victoria.
Change-Id: Iab01a9be7bfeb30842189b62b681fd42c2a69d30
Sem-Ver: feature
In older versions we used the overcloud_containers.yaml.j2 to perform
the service mapping to container names. These containers have
image_source: kolla. Since we've implemented tcib, we have a new
image_source: tripleo which should be used.
Change-Id: I79a9bb831339567fad0eecf5aed4af74ceb07b57
Currently modify_only_with_labels has to query the container
registry to see what containers we should modify. We can use the source
from our service to container mapping to limit which containers we
want to update. In the upstream CI, we only want to modify the kolla or
tripleo containers as we want to prevent ceph and other related
containers from being updated.
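A minimal sketch of this source-based filtering, assuming each entry of the
service to container mapping is a dict with an image_source key. The
imagename key and the containers_to_modify helper are assumptions for the
example, not the code added by this change.

```python
def containers_to_modify(service_containers, sources=("kolla", "tripleo")):
    """Return image names whose image_source is in the allowed sources."""
    return [
        entry["imagename"]
        for entry in service_containers
        if entry.get("image_source") in sources
    ]
```

Entries with another source, such as ceph, are left out without any
registry query.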
Change-Id: I4bff2b96f7b13bde808f929c3567dcf167f1eacd
Related-Bug: #1889122
When specifying ContainerImagePrepare, if a tag is explicitly
provided in a set, the tag_from_label functionality will not be run as
we use the defined tag for the containers. Previously we would still
attempt tag lookups even if we wanted a specific tag.
This would also cause failures during the deployment when using a
namespace that didn't have the defined tag_from_label format but we
defined a specific tag we wanted to use, e.g. using a tag with an md5 vs
a tag_from_label with numbers in it.
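The intended behavior can be sketched as below, assuming an entry shaped
like a ContainerImagePrepare item with a set mapping and an optional
tag_from_label. The resolve_tag helper and the lookup callback are
hypothetical names used only for illustration.

```python
def resolve_tag(entry, lookup_tag_from_label):
    """Prefer an explicit tag; only fall back to the label lookup."""
    explicit_tag = entry.get("set", {}).get("tag")
    if explicit_tag:
        # An explicit tag wins, so no label lookup is attempted.
        return explicit_tag
    label_format = entry.get("tag_from_label")
    if label_format:
        return lookup_tag_from_label(label_format)
    return None
```

Because the lookup is never called when a tag is set, a namespace that lacks
the tag_from_label format no longer causes a failure in that case.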
Change-Id: I4966641aed1a21be60d915ea58dda78b80fe0e1f
Partial-Bug: #1889122
From now on, limit_hosts will take precedence over blacklisted_hostnames,
and therefore Ansible won't be run with two --limit options if both limit
hosts and blacklisted hostnames are in use. When we want to run Ansible on
specific hosts, we will ignore the blacklisted nodes and assume we know
what we are doing. In the case of the scale-down scenario, the unreachable
nodes are ignored.
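The precedence rule can be sketched as follows. The ansible_limit helper is
hypothetical, and expressing the blacklist as a colon-separated list of
"!host" exclusion patterns is an assumption about how the limit is built.

```python
def ansible_limit(limit_hosts=None, blacklisted_hostnames=None):
    """Return the single value to pass to --limit, or None for no limit."""
    if limit_hosts:
        # limit_hosts takes precedence; the blacklist is ignored.
        return limit_hosts
    if blacklisted_hostnames:
        return ":".join("!%s" % host for host in blacklisted_hostnames)
    return None
```

Whatever combination of parameters is given, at most one limit value is
produced.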
Note: this change also adds unit test coverage for both parameters.
Change-Id: I2e9fc7b9e9005fce7d956f1b936054e540b39849
Closes-Bug: #1857298
When no tag is set for an entry in ContainerImagePrepare, the default
tag from container-images/container_image_prepare_defaults.yaml will be
assumed, which is typically current-tripleo, or a release version such
as 16.0.
In most cases, this tag will exist. However, when using a satellite with
a content view that has been filtered on tags, it likely won't exist
since content views are often used with container image versions that
are not the latest.
In the case where no tag is set in the entry in ContainerImagePrepare,
and the default tag does not exist in the container repo, the latest tag
from the repo will be assumed instead.
Change-Id: I985ef22c340c4071866c8c51bf303a6f4ee7713c
Closes-Bug: #1886547
Signed-off-by: James Slagle <jslagle@redhat.com>
Switch to openstackdocstheme 2.2.1 and reno 3.1.0 versions. Using
these versions will notably:
* Allow linking from HTML to the PDF document
* Allow parallel building of documents
* Fix some rendering problems
Update Sphinx version as well.
Change pygments_style to 'native' since old theme version always used
'native' and the theme now respects the setting and using 'sphinx' can
lead to some strange rendering.
openstackdocstheme renames some variables, so follow the renames
before the next release removes them. A couple of variables are also
not needed anymore, remove them.
See also
http://lists.openstack.org/pipermail/openstack-discuss/2020-May/014971.html
Change-Id: I3532fe039e754c3518a25b72094bf421d1f1c299
Add file to the reno documentation build to show release notes for
stable/ussuri.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/ussuri.
Change-Id: I381116b615b0e92e529c7dfb989ca1800e9b4463
Sem-Ver: feature
Extend the Jinja2 environment with a raise method so that
we can add logic in Jinja2 and raise errors.
In a Jinja2 template the raise method can be used like this:
{%- if condition %}
{{ raise('MESSAGE') }}
{%- endif %}
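On the Python side, one way to expose such a method is to register a helper
as a global of the environment. This is a minimal sketch assuming the jinja2
package is installed; the raise_helper and render names, and the choice of
TemplateError as the exception type, are assumptions and may differ from
what this change implements.

```python
import jinja2


def raise_helper(message):
    """Raise an error from inside a template."""
    raise jinja2.exceptions.TemplateError(message)


def render(source, **context):
    """Render source with a 'raise' global available to the template."""
    env = jinja2.Environment()
    env.globals["raise"] = raise_helper
    return env.from_string(source).render(**context)


TEMPLATE = "{%- if fail %}{{ raise('MESSAGE') }}{%- endif %}ok"
```

Rendering TEMPLATE with fail set to a false value produces the normal
output, while a true value makes the render call raise the error with the
given message.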
Change-Id: Idaa1d570b129aa6c8c117b8087d1aad7ae987a47
Skydive is not supported anymore and is causing promotion inconsistencies
with the containers list.
Remove the references from the containers template and tests.
Change-Id: Id069e6150d12b6e32f3206efbfecf1c4886a0e3c
Added image yaml config files to build an overcloud-ceph image that does
not have the HA, openvswitch or openstack client related packages
installed by default. This image is useful for dedicated ceph nodes that
do not have any openstack related services on them.
Change-Id: I4e14a49c428b8b7530f49218b413c795e777851b
This patch does three main things:
- drop the ultra-verbose output (set -x), adding a new param we can use
when calling the healthcheck directly
- move away from the "ss" calls and the multiple pipes used to filter
its output, using "lsof" and native filtering
- rewrite the "ps" call in order to use ps native filters instead of
piping through grep
These changes should make the healthchecks stronger, and avoid some
weird issues due to the pipes, and lower the amount of logs while
keeping the important information visible.
In order to get verbose mode when running the healthcheck directly, you
can do as follows:
podman exec -u root -ti <container> bash
HEALTHCHECK_DEBUG=1 ./openstack/healthcheck
or
podman exec -u root -e "HEALTHCHECK_DEBUG=1" <container> /openstack/healthcheck
and enjoy a nice debug output.
Change-Id: I137fe3211043b00b553db26b2f5930f98373496d
The task to add the RootStackName parameter to plan-environment.yaml in
the plan was only in the create_deployment_plan workflow. This patch
also adds it to the update_deployment_plan workflow so that the
parameter will be present in the environment for both stack create and
update.
Change-Id: I560eaff21c22d2ab70c657b35a3d43a76003e6ba
Closes-Bug: #1853362
This patch adds checks for the replicators.
It also removes some unused or invalid code from other checks. Checking
the modification time of the recon files is not enough: these might also be
changed by other Swift processes and are not a good indicator for stuck
processes.
Co-Authored-By: Christian Schwede <cschwede@redhat.com>
Co-Authored-By: Emilien Macchi <emilien@redhat.com>
Change-Id: Ib15f1ec4766bf4d64a2860422c230e4d514bc224
Add file to the reno documentation build to show release notes for
stable/train.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/train.
Change-Id: I6ff5f9436ae81ab8c29bf469701f6b6b28ce4c47
Sem-Ver: feature
The openstack CLI doesn't negotiate a microversion. Live migration and
multiattach are 2 examples of operations which require arcane
incantations to make them work correctly, and therefore usually don't.
This adds ``OS_COMPUTE_API_VERSION=2.latest`` to the overcloudrc
file to fix it.
Change-Id: I6bb4ef5d3d0e53b12f8636b998d7f7c2426c2b60
We switched to the python image uploader in Stein and would like to
remove the Skopeo based uploader in a future release as it may not
function correctly with our undercloud container registry.
Change-Id: I470f1778a7042ca8a4bcd5542dd53e29ad0c4eee
When deploying a large number of nodes, the create_admin_via_ssh
workflow could fail due to the large amount of ansible output generated.
This patch updates the tripleo.ansible-playbook action in the workflow
with trash_output:true so that the output is not saved in the mistral
DB.
There is a log file saved already in case the output is needed for debug
purposes.
Change-Id: I078b22fb0a0e7116f87419b444b8b4039db73ef8
Closes-Bug: #1842102
Previously, trash_output was not honored if a queue was not being used
to post messages.
This patch changes the behavior so that trash_output will be honored
even if a queue is not being used, and all stdout/stderr will be
discarded.
Change-Id: I4fccfa0cb2a5382a52d63598f66dae446ff29c25
Closes-Bug: #1842102
Some options are now automatically configured by the version 1.20:
- project
- html_last_updated_fmt
- latex_engine
- latex_elements
- version
- release.
Change-Id: Ia6d98f5649e0ad86d9c5939ef3cd43e5c74b1f23
In case of cell stacks we need to pass redis_vip as an input
to be able to use redis on the central controllers. This
moves the redis_vip setting to all_nodes and only sets it if
it is not an additional cell.
Depends-On: https://review.opendev.org/672940
Change-Id: I7ca94dff4acf0816708110b9fe6f78d19dcc7b4d
Prepare for bumping ansible-lint by solving a few linting errors:
- unnamed tasks
- use of shell instead of module
- newlines between tasks (visual)
- boolean comparison
- when clauses that can be split
- missing galaxy_info sections in meta.yml
- spaces around jinja {{ variables }}
- lack of pipefail on shell blocks with pipe
- duplicate dictionary keys
Change-Id: I73ed9a031bd579bc6213923edb9c4288d0302454
Needed-By: https://review.opendev.org/#/c/665445/
This change fixes a few linting errors which are discovered by newer
linters.
- bashate: consistent 4 chars indentation
- python unnamed Exceptions
- python space around operators
- python space after # comments
- python unused imports
- python unknown escapes (errors after py36)
- python double newline before methods
Change-Id: I5d2f37d1c820b1983355be60c09de581a72e08e0
Needed-By: https://review.opendev.org/#/c/665445/
This change will allow a developer to run bindep via tox to install all of
the required system packages needed to run general tests.
> Integration `tox -e bindep`
This will also allow CI to use the bindep file to install basic required
packages as needed.
Change-Id: I83e5e4f8dd5bb9acd4e6b21bf86f729bfc5447d4
Signed-off-by: Kevin Carter <kecarter@redhat.com>