By default Python configures SIGPIPE to be SIG_IGN, which means to
ignore the signal. We don't want that as it causes problems when
journald restarts and our log calls start triggering SIGPIPEs.
Instead, we want to allow the SIGPIPE to kill the process so it can
be restarted by systemd.
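
A minimal sketch of restoring the default SIGPIPE action, assuming it is called once at startup from the main thread. The helper name restore_default_sigpipe is illustrative and is not taken from the change itself.

```python
import signal


def restore_default_sigpipe():
    """Undo Python's startup default of ignoring SIGPIPE."""
    # SIG_DFL restores the OS default action, which terminates the process
    # when it writes to a closed pipe or socket.
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)


restore_default_sigpipe()
```

With the default action restored, a write to a broken pipe ends the process instead of raising an exception inside a log call, so a supervisor such as systemd can restart it.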
When the os-collect-config process is started on multiple systems at the
same time, the polling intervals can line up to cause performance
problems against the configuration source. To reduce the impact, this
change adds a splay option to allow the operator to configure a random
delay prior to the polling to attempt to offset the polling
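
A rough sketch of how a splay could be applied before a poll. The function names, the uniform distribution, and the injectable sleep parameter are assumptions made for illustration; the change itself only states that a random delay is configurable.

```python
import random
import time


def splay_delay(splay, rng=random):
    """Return a random delay in [0, splay] seconds; 0 disables the splay."""
    if splay <= 0:
        return 0.0
    return rng.uniform(0, splay)


def poll_with_splay(poll, splay, sleep=time.sleep):
    # Sleep for the random offset first, then run the poll itself.
    sleep(splay_delay(splay))
    return poll()
```

Because each host draws its own delay, hosts started at the same moment no longer hit the configuration source at the same instant.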
There are two things at play here. First, the logic added in
4cfeb28d12 is no longer needed
because our initial sleep time is very low and increases
gradually up to the max.
Second, I'm proposing that we avoid re-execing unless the config
file actually changes. Not re-execing will give us the option
to optimize os-collect-config for some long-running collectors
(like zaqar websockets). Also, os-collect-config updates would
already be handled by packaging restarts and/or other deployment
system changes anyway.
This isn't quite right and broke on stable/liberty. Pushing
a revert in case I3c22d77dece399d21ab94783b74990789a1e1481
doesn't actually fix the problem. We should probably merge
whichever passes first.
This reverts commit 69653318f4.
The old oslo-incubator log module isn't maintained (and doesn't even
exist anymore), so we don't really want to be using it. It appears
this was the only incubator module we were actually using, so this
allows us to remove all of the unmaintained incubator code.
This patch updates os-collect-config so that the sleep interval
time is shortened if changes are detected. This should decrease
deployment time when using Heat templates which use depends_on
to step through a sequence of software deployment resources.
The new initial sleep is set to 1 second and doubles
(sleep_time *= 2) until it reaches the default
sleep interval again.
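
The doubling can be sketched as below. This is only an illustration: the function name, the min_sleep parameter, and the choice to reset to the short sleep whenever a change is detected are assumptions, not the patch's actual code.

```python
def next_sleep_time(sleep_time, changed, polling_interval, min_sleep=1):
    """Return the sleep to use before the next poll."""
    if changed:
        # Assumption: a detected change drops back to the short sleep.
        return min_sleep
    # Otherwise double the sleep, capped at the configured interval.
    return min(sleep_time * 2, polling_interval)


# Walk the schedule for a 30 second interval with no further changes.
schedule = []
current = 1
for _ in range(6):
    current = next_sleep_time(current, False, 30)
    schedule.append(current)
```

Under these assumptions a chain of dependent deployments is polled every second or two while it is making progress, and the process settles back to the normal interval once nothing changes.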
This is required so that a swift-enabled TripleO undercloud can switch
to polling for metadata from a TempURL rather than heat.
The Oslo libraries have moved all of their code out of the 'oslo'
namespace package into per-library packages. The namespace package was
retained during kilo for backwards compatibility, but will be removed by
the liberty-2 milestone. This change removes the use of the namespace
package, replacing it with the new package names.
The patches in the libraries will be put on hold until application
patches have landed, or L2, whichever comes first. At that point, new
versions of the libraries without namespace packages will be released as
a major version update.
Please merge this patch, or an equivalent, before L2 to avoid problems
with those library releases.
This change implements a collector which does an HTTP GET via
python requests to fetch the metadata.
It should work with any GET-able URL, however it is designed to
work with Swift TempURLs.
Swift objects are eventually consistent, so the Last-Modified header
is checked on each poll and the metadata is not fetched unless
Last-Modified is newer than at the previous successful poll.
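
The Last-Modified comparison might look like the sketch below. It takes the header value as a string and leaves the HTTP call out; the function name and the use of email.utils for parsing are assumptions for illustration.

```python
from email.utils import parsedate_to_datetime


def should_fetch(last_modified_header, previous):
    """Return (fetch, parsed) for a Last-Modified header value.

    previous is the datetime recorded at the last successful poll,
    or None if nothing has been fetched yet.
    """
    current = parsedate_to_datetime(last_modified_header)
    # Fetch only when the object is strictly newer than what we last saw.
    fetch = previous is None or current > previous
    return fetch, current


first, seen = should_fetch("Wed, 21 Oct 2015 07:28:00 GMT", None)
stale, _ = should_fetch("Wed, 21 Oct 2015 07:27:00 GMT", seen)
```

An older or equal timestamp is treated as a stale replica and skipped, so the collector never replaces newer metadata with an older copy.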
This collector will be enabled for OS::Nova::Server
software_config_transport: POLL_TEMP_URL which is available
in the Juno release of Heat. Using POLL_TEMP_URL will result
in no metadata polling load on heat, which has historically been
an issue with tripleo scalability.
When we detect a failed command we log ERROR but we do not return an
error status. This makes it difficult for programs which may run
os-collect-config to detect whether a run was successful.
This only applies to runs which are performed with the --one-time
argument, as this is the straightforward case.
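
A hedged sketch of surfacing the failure as an exit status in the one-time case. The run_once helper and the fixed return value of 1 are illustrative assumptions, not the project's real function or status code.

```python
import subprocess
import sys


def run_once(command):
    """Run command once and return a process exit status (0 on success)."""
    try:
        subprocess.check_call(command)
    except subprocess.CalledProcessError as exc:
        # Log the failure and also report it to the caller.
        print("Command failed with status %d" % exc.returncode,
              file=sys.stderr)
        return 1
    return 0
```

A caller would pass the result to sys.exit() so that a wrapping script or service manager can tell a failed run from a successful one.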
The local collector is not in DEFAULT_COLLECTORS, but should be usable
explicitly. It, however, suffers from a bug where only
DEFAULT_COLLECTORS are allowed through.
This collector will collect data from the local system, allowing image
builds or simple processes to influence the metadata.
implements bp tripleo-juno-occ-localdatasource
The configuration will dictate whether or not something is configured.
If it is not, this is a normal state and should not be logged as a
This reverts commit 6b478e9d90.
We will break anybody who is expecting CFN to be tried in all
circumstances with this. We probably just need to base which collectors
to try on what configuration we have, and not log warnings on
Previously we were relying on the CFN compatibility API. This makes the
native Heat version the default.
Note that we want to keep full coverage, which is why we are explicitly
adding cfn back in during tests.
This collector uses keystoneclient and heatclient to poll for the
configured resource metadata.
Changes were required to test_collect to allow collectors which needed
to fake something other than requests.
Before this, the exploded deployments that the cfn collector produced
would not ever be committed, and thus would always appear to have been
changed. This resulted in os-collect-config running the command
This requires some refactoring so that we commit changes to the cache
based on what was actually written, rather than just the static list of
With the new OS::Heat::StructuredDeployment resource, each Metadata
section may have multiple "deployments" in it. With this, we will return
a list with tuples of key and content to write to the cache.
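
One way to picture the returned list is sketched below. The "deployments", "name" and "config" keys and the function name are assumptions made for illustration; the real metadata layout may differ.

```python
def explode_deployments(name, metadata, deployments_key="deployments"):
    """Return [(key, content), ...] for the collector and each deployment."""
    # The collector's own metadata always comes first.
    results = [(name, metadata)]
    for deployment in metadata.get(deployments_key, []):
        # Assumed shape: each deployment carries a name and a config body.
        results.append((deployment["name"], deployment["config"]))
    return results


example = explode_deployments(
    "cfn",
    {"deployments": [{"name": "dep1", "config": {"a": 1}}]},
)
```

Each tuple can then be written to the cache under its own key, so every deployment ends up in a separate file.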
While this is called a "cache", it is important for it to survive. On
reboot, servers may need what was in the cfn config to restore complex
We introduce a new command line option, --backup-cachedir, that will
default to the old path, /var/run/os-collect-config. This will keep
things working for any tools that have been hard coded to use the old
We pass the list of json files containing the collected metadata to
os-refresh-config using the OS_CONFIG_FILES env variable, so it's a
pretty useful piece of information to log.
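
A small sketch of building and logging that environment. The colon separator and the build_env helper are assumptions for illustration; only the OS_CONFIG_FILES variable name comes from the text above.

```python
import logging
import os

logger = logging.getLogger("collect-sketch")


def build_env(config_files, base_env=None):
    """Return a copy of the environment with OS_CONFIG_FILES set."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: paths are joined with ':' like a PATH-style variable.
    env["OS_CONFIG_FILES"] = ":".join(config_files)
    logger.info("OS_CONFIG_FILES=%s", env["OS_CONFIG_FILES"])
    return env
```

The returned mapping would be passed as env= when launching the command, and the log line records exactly which files the command was given.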
The initial value of 300 seconds was a conservative estimate. However,
the requests and responses are fairly small, so we can drop the polling
interval significantly and still maintain a high degree of network
scalability. Measurements of the responses from the ec2 and cfn servers
with typical workloads show that, at 30 second intervals, 100 servers
will generate around 26kB/s of requests and about 66kB/s of responses.
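
As a back-of-the-envelope check, the aggregate figures can be turned into per-server, per-poll sizes. This assumes the load is spread evenly and each server completes one poll cycle per interval; the helper name is illustrative.

```python
def per_poll_kilobytes(aggregate_kb_per_s, interval_s, servers):
    """Convert an aggregate rate into kB moved per server per poll cycle."""
    return aggregate_kb_per_s * interval_s / servers


# Figures quoted above: 100 servers polling every 30 seconds.
request_kb = per_poll_kilobytes(26, 30, 100)
response_kb = per_poll_kilobytes(66, 30, 100)
```

Under those assumptions each poll cycle costs roughly 7.8 kB of requests and 19.8 kB of responses per server, and the aggregate rate scales linearly with the number of servers.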
Without this change, if a user runs os-collect-config --force, it will
lock the user in an infinite loop running the command over and over with
very little chance to cancel. There are no compelling use cases for that
behavior, and it is extremely inconvenient, so having --force imply
--one-time improves usability of os-collect-config for users.
The use case for --print is an administrator wanting to view the
metadata that os-collect-config sees without running any commands.
Fixes bug #1213195
This is a useful debugging and/or system fixer tool for instances where
metadata has not changed but one needs to re-run the configuration.
Fixes bug #1223693
Keep a hash of the config file for os-collect-config and, if it changes
during a failed run, rerun immediately (without sleeping), effectively
causing new nodes to be ready 5 minutes earlier.
Because the cfn credentials are placed into os-collect-config.conf by
os-apply-config and are not in place the first time os-collect-config is
run, the first run of os-collect-config results in an error; o-c-c then
sleeps for 5 minutes before running successfully the second time.
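
The hash check could be sketched as follows. SHA-256 and both helper names are assumptions for illustration; the change only says that a hash of the config file is kept.

```python
import hashlib


def file_hash(path):
    """Return a hex digest of the file contents, or None if it is missing."""
    try:
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()
    except FileNotFoundError:
        return None


def should_retry_immediately(path, hash_before_run, run_failed):
    # Skip the sleep only when the run failed and the config changed under it.
    return run_failed and file_hash(path) != hash_before_run
```

The main loop would record file_hash() before each run and call should_retry_immediately() afterwards, so a config file written by the failed run triggers a second attempt right away.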
Fixes bug #1219186
On a system with o-c-c installed by pip, the binary generated by PBR
calls __main__() directly. The code that sets up logging should be placed
in __main__(); otherwise it will be bypassed, resulting in missing log
messages.
The point of delaying the commit of data to the cache is that we want to
make sure the command succeeds before giving up on the data changes.
This will ensure that we keep trying the command with any given change
to the metadata until it succeeds.
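
A minimal sketch of the delayed commit, assuming new data is staged next to the cache file and renamed into place only after the command succeeds. The ".pending" suffix and the function names are illustrative, not the project's actual cache layout.

```python
import os
import subprocess


def stage(path, data):
    """Write new data beside the cache file without replacing it."""
    pending = path + ".pending"
    with open(pending, "w") as f:
        f.write(data)
    return pending


def commit(path, pending):
    # os.replace swaps the staged file into place in one step.
    os.replace(pending, path)


def apply_change(path, data, command):
    """Stage data, run command, and commit only if the command succeeds."""
    pending = stage(path, data)
    if subprocess.call(command) != 0:
        # Leave the cache untouched so the same change is seen again.
        return False
    commit(path, pending)
    return True
```

Because a failed command leaves the committed cache unchanged, the next poll still sees the metadata as changed and the command is tried again.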
After we have run a command and committed, re-execute ourselves. This
ensures that we will get any configurations that may have come from
underlying commands. Also re-execute if os-collect-config is sent HUP.
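
A rough sketch of re-executing on HUP. The flag dictionary, the handler name, and the use of sys.executable with os.execv are assumptions for illustration; the actual change may re-execute differently.

```python
import os
import signal
import sys

state = {"reexec": False}


def request_reexec(signum, frame):
    # Only record the request; the main loop acts on it at a safe point.
    state["reexec"] = True


def reexec_self(argv=None):
    """Replace the current process with a fresh copy of itself."""
    argv = list(sys.argv if argv is None else argv)
    # execv does not return on success; the new process re-reads its config.
    os.execv(sys.executable, [sys.executable] + argv)


signal.signal(signal.SIGHUP, request_reexec)
```

The main loop would call reexec_self() after a successful command and commit, or whenever state["reexec"] has been set by the handler.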
This makes os-collect-config stay resident and prepares it for more
event-based operation once the Heat API is ready for that via longpoll,
callbacks, or something else.
Positional arguments now specify which collectors to use. This allows
disabling a collector if it is problematic, and also re-ordering of the
emitted $OS_CONFIG_FILES from the default order if necessary.
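
The positional-argument behaviour can be sketched with argparse as a stand-in for the project's own option handling. The default list shown here is an assumption for illustration only.

```python
import argparse

# Assumed defaults for illustration; the real default list may differ.
DEFAULT_COLLECTORS = ["ec2", "cfn"]


def parse_collectors(argv):
    """Return the collectors to run, in the order given on the command line."""
    parser = argparse.ArgumentParser(prog="collect-sketch")
    parser.add_argument("collectors", nargs="*",
                        help="collectors to run, in order")
    args = parser.parse_args(argv)
    # No positional arguments means fall back to the defaults.
    return args.collectors or list(DEFAULT_COLLECTORS)
```

Passing a single collector name disables the others, and listing the names in a different order changes the order of the files emitted in $OS_CONFIG_FILES.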