765 Commits

Author SHA1 Message Date
Ricardo Carrillo Cruz
5f9d039bbe Set max-servers on infracloud to 0
If we set it to -1, Nodepool won't build images; 0 will do the trick.

(credits to clarkb who pointed this out to me)

Change-Id: I683aa5ff5f26d67a6cdfd8a8ec5c98a84d33e91a
2016-08-26 17:49:24 +02:00
Jenkins
fdf45b54df Merge "Revert "Remove infracloud"" 2016-08-26 14:44:38 +00:00
Kevin Carter
f08d4d496e
Add initialize-urandom to fedora 24
This will seed entropy for systemd on newer kernels.

Change-Id: Id66ef992b2bffa049a91bba9b1d68d803bab81f5
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2016-08-26 09:28:37 -05:00
Sean Dague
01039b1bde This has caused 240+ failures in the last 24 hours
There seem to be connectivity issues with internap mtl. It's causing a
failure in about 1 in 4 gate patches. We should pull it out of rotation.

Change-Id: I3392d98fb019db67243c9e8f039e7e313df31842
2016-08-26 08:50:04 -04:00
Jenkins
6c3417c2ad Merge "Don't use devuser for zuul-worker" 2016-08-26 12:19:21 +00:00
Ricardo Carrillo Cruz
d5a839be2c Revert "Remove infracloud"
This reverts commit 2976214a6f29b553feb9c2cd08d679fc50d1f6da.

Change-Id: Iacb32ff722227abe9c7f55bb529680e1c53bf64e
2016-08-26 11:50:59 +02:00
Jenkins
b1a6720886 Merge "Disable rax-iad due to launch failure rate" 2016-08-26 08:00:41 +00:00
Jenkins
1ad3b8f381 Merge "Revert "Double rax-ord boot-timeout value"" 2016-08-26 07:57:46 +00:00
Paul Belanger
d4b4db784c
Disable rax-iad due to launch failure rate
Since 2016-08-04 we've seen a big spike in launch failures for
rax-iad. With the recent increase in capacity, we can turn this region
off until we find the time to work with rackspace.

http://grafana.openstack.org/dashboard/db/nodepool-rackspace?from=1469582782174&to=1472174782174

Change-Id: I5d35da2b26c7338016177b767bc34d2d06d08b2a
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-08-25 21:23:47 -04:00
Paul Belanger
1747547fb6
Revert "Double rax-ord boot-timeout value"
This didn't fix our timeout issues. We'll have to open a ticket with
rackspace to properly debug this.

This reverts commit 92d3c7cc0cac56380d03b3bbd1d36f5084278111.

Change-Id: I969fbd057b0c694b148d6a99485e95bb76cf7669
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-08-25 21:15:12 -04:00
Paul Belanger
d9393d0bf6
Add internap-mtl01 to nodepool.o.o
Thanks to internap, we have another region to bring online for
nodepool. We have access to 120 nodes and possibly more.

We'll need to restart nodepool-builder too, since this is a new cloud.

Change-Id: If9e6c0bede223aa01d49054eff3660066e84af9d
Depends-On: Iee808936a65e0f0c794f8c46c086f83e52d0251e
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-08-25 19:32:10 -04:00
Jenkins
cca18cb482 Merge "Remove Fedora 24 work-around kernel" 2016-08-25 21:53:49 +00:00
Jenkins
90963bd2c4 Merge "Bump ubuntu-(trusty|xenial) to 20 min-ready" 2016-08-25 19:21:33 +00:00
Paul Belanger
850ca67952
Bump ubuntu-(trusty|xenial) to 20 min-ready
With over 1,199 nodes available to use, let's raise both trusty and
xenial to 20 again. This is the first time xenial will be 20; however,
ubuntu-trusty was decreased in:

  I4215296836b7ec3781cddd80fd2ae224063541d6

so this is a partial revert. Once we fully migrate devstack jobs to
xenial, I believe we could drop trusty down again.

Change-Id: I41876e5850dc96914437dbb068212fdd9eeff550
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-08-25 14:54:04 -04:00
Paul Belanger
894a78eded
Revert "Increase rh1 max-servers to 75"
Because of issues attaching FIPs with shade, decrease back to 50. This
should let tripleo-test-cloud-rh1 limp along until we restart
nodepool.

This reverts commit fc54caa39e715afd5730e84b13f3a8c09b36136f.

Change-Id: I9e2bcf899546befd8c7613867a8169d63201902c
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-08-25 12:41:10 -04:00
Ian Wienand
e6db1891b3 Remove Fedora 24 work-around kernel
The workarounds in my custom kernel (from
Iafe6d88e3ac7a2ea23553a5011df920a2ee3317d and
I0769f005da1931658a5fb9e627983ed30c11d212) are incorporated in the
latest upstream release.

Change-Id: Ibb2e2045ce813b4e69447fb5c896a2e0dfd4b1ec
2016-08-25 16:02:16 +10:00
Jenkins
5fd5812c5a Merge "Include dib-builddate.txt for configure_mirror.sh" 2016-08-24 18:30:23 +00:00
Jenkins
cf99919c70 Merge "Run host lookup first for configure_mirror.sh" 2016-08-24 18:29:35 +00:00
Paul Belanger
92d3c7cc0c
Double rax-ord boot-timeout value
We are seeing a large number of subnodes failing to boot in rax-ord;
let's double our boot-timeout to see if this solves some of the
problem.

Change-Id: I9f036228a927ba21b7f03b6137b6d7a7346b5aff
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-08-23 21:02:00 -04:00
Paul Belanger
d623c337ba
Include dib-builddate.txt for configure_mirror.sh
Update our script to provide additional debug information, such as the
builddate of our DIB images.

Change-Id: Ie399b1ac9cd6c6ea4372378ae7b7bf930fac16a3
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-08-23 11:24:58 -04:00
Paul Belanger
a44be1116e
Run host lookup first for configure_mirror.sh
A cosmetic change to reduce the debug output on DNS failures. First
check for valid DNS, then proceed with configuration.

Change-Id: I3370a155bf6b49088c398dc9d4d572b6210b84b2
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-08-23 11:21:49 -04:00
Jenkins
a286f8c8d4 Merge "Add smarter dns checking for configure_mirror.sh" 2016-08-22 18:08:58 +00:00
Jenkins
e529542020 Merge "Increase the quota in the OSIC cloud1" 2016-08-22 18:02:07 +00:00
Clark Boylan
945b10084d Ensure ntpdate is on our test images
We want to be able to set the time in big steps at the beginning of test
runs and one option for doing so is with ntpdate. Ensure ntpdate is
installed on all our images by putting it into the infra package needs
element.

Note that the package name appears to be the same across ubuntu, centos,
and fedora.

Change-Id: Ib3fd4afe5a89d8a799cc15c57254aaf11b6aa3e5
2016-08-19 13:16:35 -07:00
Paul Belanger
bf315be8c5
Add After=network.target for urandom.service
By making initialize-urandom.service work the same way glean.service
does, we can ensure both services run.  Today, glean will report a
dependency error, which breaks networking on ubuntu-xenial.

Change-Id: Ia7e26166323bd398edd000e70368928e758f22d3
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-08-19 14:32:28 -04:00
Doug Wiegley
3fa95e8d61 Temporarily remove osic nodes for ubuntu-precise
They appear to be causing job failures similar to what we saw recently on Trusty
when OSIC switched to IPv6:

Example:
http://logs.openstack.org/94/357094/1/check/gate-infra-puppet-apply-ubuntu-precise/ef0a551/console.html

Change-Id: Idf3f806d7fd3464c5045855d11602cef7ab03548
2016-08-19 12:50:37 +02:00
Paul Belanger
779f3d8109
Fix file permissions with initialize-urandom element
We overlooked setting the proper permissions on both our
initialize-urandom python and systemd scripts.

Change-Id: I6da27a049954961c9333ebeb48382f8b175dc2d9
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-08-18 23:09:10 -04:00
Clark Boylan
01851f8a4a Remove OSIC from 2-node label types
Currently in OSIC nodepool/shade write out the public IPv6 addrs to the
private addr info files in /etc/nodepool. This is problematic because we
can't run our OVS + VXLAN tunnels over IPv6 (yay networking). Avoid
problems with multinode testing that relies on OVS + VXLAN over the
"private" addrs by removing OSIC from the 2-node labels.

There is work in progress to fix this behavior in nodepool/shade so that
the private info files use the correct IPv4 addresses which we can run
tunnels over no problem.

Change-Id: Ib1eae2f8d254de191d1b687491a3939c2ff273e6
2016-08-18 15:01:33 -07:00
Jenkins
68fcf0822d Merge "Add glue to get initialize-urandom installed" 2016-08-18 20:08:07 +00:00
Kevin Carter
a6a8aa0c4d
Increase the quota in the OSIC cloud1
With recent changes to stabilize instances running on IPv6, we should now be
able to run with more quota. In this change we're doubling the old quota,
and should everything hold up, we'll further increase the quota as needed.

Change-Id: If2380774bb5bde4baccd6186f3a91f4686476faf
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2016-08-18 11:12:24 -05:00
Jenkins
880a8a0b1e Merge "Up the debian-jessie min-ready" 2016-08-18 14:57:35 +00:00
Jenkins
64e0f4dba7 Merge "Resolve DNS soon in configure_mirror.sh" 2016-08-18 13:43:31 +00:00
Jenkins
5af47dc3f4 Merge "Use ip6tables if nodepool is using IPv6" 2016-08-18 13:43:21 +00:00
Thomas Goirand
b421e08f89 Up the debian-jessie min-ready
This patch increases the min-ready nodes for Jessie from 1 to 3,
because otherwise we have to wait a really long time until
a build starts. This can be blocking, especially when we are
waiting for a fix of a build-dependency.

Change-Id: I5888cd6b9d0aa67a99bf9d32e85cd18347542e68
2016-08-18 14:53:33 +02:00
Paul Belanger
45408a003d
Revert "Disable trusty nodes on OSIC"
We've included the sysctl fix for ipv6 and new images have been
uploaded.

This reverts commit 075358c04a348219f5dfa54c6f9d95600dd8168c.

Change-Id: I764a388def9f473d7b644f5c53930e8e451b13af
Depends-On: If3bb0fd690673a6d93114e6aebddb5985344b437
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-08-17 22:36:45 -04:00
Monty Taylor
b195f8ef6e
Add glue to get initialize-urandom installed
Here we are installing our python app and setting up systemd. Our
service should run after haveged and before unbound.

Change-Id: I4f9b24f217f271b64f324c922948c54c46cb1110
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-08-17 21:21:20 -04:00
James E. Blair
6020537b2c Nodepool elements: Add a script to initialize urandom
In our Xenial images, we see unbound take a while to start because
it uses openssl which uses the getrandom call which can block during
early boot if the nonblocking random number generator is not yet
initialized.

This script uses haveged to quickly initialize the generator.

This commit only includes the script, a later commit will add the
rest of the necessary install steps to the element.

Change-Id: I09d18a0bad6c380fd149660ebfdaf6c12730dc74
2016-08-17 15:22:26 -07:00
Paul Belanger
79120078ea
Use ip6tables if nodepool is using IPv6
Change-Id: Ifebc8061dbbc99eed47938c7401b5220fd62d19a
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-08-17 13:41:47 -04:00
Jenkins
50bfc129ee Merge "Disabled IPv6 privacy extensions" 2016-08-17 12:40:01 +00:00
Jenkins
0888787f72 Merge "Disable trusty nodes on OSIC" 2016-08-17 11:56:57 +00:00
Ricardo Carrillo Cruz
075358c04a Disable trusty nodes on OSIC
We are seeing a lot of errors when cloning repos.
The errors occur with the OSIC+Trusty combination, so remove those nodes for now.

Change-Id: I4854dbcb3e17529b1ace04c32a72aba3724d5b36
2016-08-17 12:50:55 +02:00
Kevin Carter
95821ab951
Disabled IPv6 privacy extensions
IPv6 privacy extensions can cause issues by preferring a temporary
address over a public one. This preference may limit connectivity
in certain situations. An example of a connectivity issue can be
seen where the command ``traceroute6`` fails or misses all hops
while other traffic to a given domain with an "AAAA" record may
succeed. To resolve this issue, the IPv6 privacy extensions have
been disabled.

Related-Bug: #1068756
Change-Id: If3bb0fd690673a6d93114e6aebddb5985344b437
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2016-08-16 23:18:23 -05:00
Paul Belanger
fce3a76c22
Add smarter dns checking for configure_mirror.sh
After noticing ubuntu-xenial launch failures in osic-cloud1, it looks
like our unbound service is taking up to 1 minute to start properly.
So wait up to 30 seconds for host to time out, and try 10 times. This gives
us a 300 second timeout window to configure DNS properly.

Change-Id: Id0432c91cc853fb4ecab43da991948c2e9d84b7d
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-08-15 22:20:17 -04:00
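The retry arithmetic described in fce3a76c22 (up to 30 seconds per lookup, 10 tries, 300 seconds worst case) can be sketched roughly as follows. This is a minimal Python illustration only. The actual configure_mirror.sh is a shell script, and its exact logic is not shown here. The names `wait_for_dns`, `worst_case_window` and `Lookup`, along with the hostname `mirror.example.org`, are hypothetical. The injected `lookup` callable stands in for the script's call to `host`, and it is assumed to enforce the per-try timeout itself.

```python
from typing import Callable

# Hypothetical lookup signature: lookup(hostname, timeout_seconds) returns
# True if the name resolved within the timeout. In configure_mirror.sh this
# role is played by a call to the `host` command.
Lookup = Callable[[str, float], bool]


def worst_case_window(attempts: int, per_try_timeout: float) -> float:
    """Longest total wait if every attempt runs into its timeout."""
    return attempts * per_try_timeout


def wait_for_dns(lookup: Lookup, hostname: str,
                 attempts: int = 10, per_try_timeout: float = 30.0) -> int:
    """Retry `lookup` up to `attempts` times.

    Each try is bounded by `per_try_timeout` (enforced by `lookup` itself),
    so the total wait never exceeds worst_case_window(attempts, per_try_timeout).
    Returns the 1-based number of the attempt that succeeded, or 0 if every
    attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if lookup(hostname, per_try_timeout):
            return attempt
    return 0


if __name__ == "__main__":
    # Fake lookup that fails twice (e.g. unbound still starting), then resolves.
    results = iter([False, False, True])
    print(wait_for_dns(lambda host, timeout: next(results), "mirror.example.org"))
    print(worst_case_window(10, 30.0))
```

Because the lookup is injected, the sketch can be exercised without network access. The real script's behaviour may differ, for example in whether it also sleeps between tries.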
Jenkins
e7d35022c7 Merge "Further F24 kernel update" 2016-08-16 02:10:35 +00:00
Ian Wienand
1b187f9b80 Further F24 kernel update
As described, I missed that we only keep *one* kernel during dib
build, so as soon as the upstream package updates, it suddenly becomes
the latest kernel and kicks our custom version out.

Guess what happened in the hours between me committing
I0769f005da1931658a5fb9e627983ed30c11d212 and the next dib build.

This will install the current latest kernel with the required patch.
As described in the comment, I have the fix committed upstream so we
can remove this whole thing when fedora rebuilds for the next stable
release (even if the patch isn't in the official stable tree yet).

Change-Id: Iafe6d88e3ac7a2ea23553a5011df920a2ee3317d
2016-08-16 10:02:40 +10:00
Jenkins
b1e38c347f Merge "Add IPv6 DNS support" 2016-08-15 23:21:25 +00:00
Kevin Carter
9740607509
Raised max instance in the OSIC
With IPv6 implemented and the gate seemingly running well
enough, we'd like to raise the maximum number of instances available
to 256.

Change-Id: Icd6b0ac22bb4739eff4221c2041e408085634b5b
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2016-08-15 14:34:35 -05:00
Paul Belanger
3023cc4d79
Add IPv6 DNS support
Now that osic-cloud1 is only using IPv6 public IPs, we can also add
IPv6 support for unbound.

Change-Id: I9da5a06fdbea04b322cddf6c7e6e829e47492d4c
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-08-15 12:58:28 -04:00
Jenkins
eab632cbfc Merge "Increase the quota in the OSIC cloud1" 2016-08-13 23:21:48 +00:00
Paul Belanger
599c3a9da4
Resolve DNS soon in configure_mirror.sh
Because of how often we run into nodes having DNS issues, a large amount
of debug info is dumped into nodepool logs when the ready-script
fails.

As such, move our DNS checks sooner so the ready-script failure output
is minimized.

Change-Id: I003f8ee816f279ef30075524c1b806e93c56e7f9
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-08-12 15:32:04 -04:00