I recently applied a new kernel on BHS1. If everything is fine with
that, I propose to apply the same one on GRA1, which should help fix
some of the timeout errors.
Change-Id: I489f8b84871c18f2dad079cae5b53fb1a504f1bd
Signed-off-by: Arnaud Morin <arnaud.morin@corp.ovh.com>
Set ovh-bhs1 max-servers to 150. OVH (thank you amorin) have debugged
and corrected a memory leak there that we believe to be the cause of the
test node slowness.
Frickler and I have run fio tests on VMs running on each hypervisor in
the region and the results look healthy. We've also run spot tests of
devstack and tempest, which also look good.
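For reference, a minimal sketch of the sort of fio check run against the
rootfs (the job parameters here are assumptions, not the exact invocation):

    # Sequential write test against a file on the rootfs
    fio --name=seqwrite --filename=/var/tmp/fiotest --size=1G \
        --rw=write --bs=1M --direct=1 --group_reporting
    rm -f /var/tmp/fiotest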
Change-Id: If6fd5a6194a9996e8b031f74918f373dc7bbe758
This patch drops the VEXXHOST-specific flavors from the Montreal
region because the entire SJC datacenter has *supported* and very
reliable nested virtualization.
It also bumps max-servers to 10 in order to supply more results.
Change-Id: I6383772d6d1e1bca3a759692bf20d373baf588c6
We are seeing excessive job timeouts in this region [0]; disable it
until we can get more stable performance there again.
[0] https://ethercalc.openstack.org/jg8f4p7jow5o
Change-Id: I7969cca2cdd99526294a4bf7a0f44f059823dae7
We are debugging slow nodes in bhs1. Looking at dstat data, we clearly
have some jobs that end up spending a lot of CPU time in the sys and wai
columns while other similar jobs do not.
One thought was that this is due to an unhappy hypervisor or two, but
amorin has dug in and found that these slow jobs run on multiple unique
hypervisors, implying that isn't likely.
My next thought is that we are our own noisy neighbors. Reducing
max-servers should improve things if that is indeed the case.
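As a rough illustration, the dstat sampling looked something like this
(interval and output path are assumptions):

    # Record CPU (usr/sys/idl/wai), disk and network stats every 10s to CSV
    # so slow and fast job runs can be compared after the fact
    dstat --time --cpu --disk --net --output /var/log/dstat-csv.log 10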
Change-Id: Idd7804778a141d38da38b739294c6c6a62016053
I'd like to isolate one host from the aggregate, but to do that cleanly
it's better to first reduce the number of instances nodepool is trying
to boot; this will avoid needless "no valid host found" errors.
Change-Id: Iddbfba1c3093e9f128c41db91d6b5b3e1d467ce8
Signed-off-by: Arnaud Morin <arnaud.morin@corp.ovh.com>
This reverts commit 3f40af4296.
This can be approved once the slow disk performance in this region is
resolved.
Change-Id: Idda585116ae9dc09b55f6794ab5ee7bda47f455a
We've gotten reports of frequent slow job runs in the BHS1 region
leading to job timeouts. Further investigation indicates these
instances top out around 10-15MB/sec for contiguous writes to their
rootfs, while instances booted from the same image and flavor in GRA1
see 250MB/sec or better with the same write patterns. Disable BHS1
in nodepool for now while we work with OVH staff to see if they can
determine the root cause.
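A quick way to reproduce the comparison described above, run on a node in
each region (block size, count and path are assumptions):

    # Reports sequential write throughput to the rootfs in MB/s
    dd if=/dev/zero of=/var/tmp/ddtest bs=1M count=1024 conv=fdatasync
    rm -f /var/tmp/ddtest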
Change-Id: I8b9a79b64dd7da6d3a33f24797ca597bd2426c86
We've gotten reports of frequent slow job runs in the BHS1 region
leading to job timeouts and OVH staff have confirmed we're running a
CPU oversubscription ratio of 2:1 there, so try dropping our
utilization by half to confirm whether this could be due to CPU
contention during peak load.
Change-Id: If7e5f3c0dec71813f5bcb974a0217dc031801115
This name was incorrectly added in
I428d46565921e018ac01cbd9c64b4be60c44f3d5; it's supposed to just be
arm64ci.
Change-Id: Iaae8db611acf317770eaea3b4caf1d3e403e1d54
openSUSE 15.0 does not have libffi48-devel, instead we can use
libffi-devel. Install libffi48-devel only on openSUSE 42.3.
This was triggered by the failure in https://review.openstack.org/617282
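A minimal sketch of the package selection this describes, assuming a simple
release check (not the literal form of the change):

    # Pick the right libffi development package per openSUSE release
    source /etc/os-release
    if [ "${VERSION_ID}" = "42.3" ]; then
        zypper --non-interactive install libffi48-devel
    else
        zypper --non-interactive install libffi-devel
    fi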
Change-Id: I2207d69bd837a7249476b4a20025f41df3a7bc84
Credentials are populated (Ib96d14008ab3b8b7c12429d7432eaa485c404bb2)
and mirror.nrt1.arm64ci.openstack.org is alive, so everything is ready
to go.
We have a quota of 40 cores & 96GB RAM; the c1.large flavor is 8 cores /
8GB. We should be able to fit 5 CI servers to start with.
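Back-of-the-envelope sizing from those numbers: 40 cores / 8 vCPU per
c1.large allows 5 servers, while 96GB / 8GB allows 12, so cores are the
binding limit and max-servers starts at 5.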
Change-Id: I428d46565921e018ac01cbd9c64b4be60c44f3d5
This should happen at the same time as we switch the zuul scheduler over
to the new zk cluster, and after the nodepool builders have populated
image data on the new zk cluster.
This gets us off the old nodepool.o.o server and onto the newer HA
cluster.
Change-Id: I9cea03f726d4acb21ad5584f8db7a4d15bc556db
Switch over the nodepool builders to our newer zk cluster from the old
single node zk cluster. We will stop building images that the launchers
can see before the launchers move, but this lets us preseed the new
cluster with up-to-date image data.
Once the images are built with records in the new zk cluster we can
switch over the zuul scheduler and the launchers to this newer cluster.
Change-Id: I95ca326095decc03cf279383fa48dbdfc56ed8c8
This partially reverts commit
bfdd3e6a42.
After fruitful discussions with amorin in IRC, we have nodes working
again in this region. This puts a small load back on so we can monitor
it for a while. A follow-on will do a full revert so we don't forget.
Story: #2004090
Task: #27492
Change-Id: Id01f85fcee150f9360f508b09003a8d0043155bd
The mirror keeps getting shut down, which leads to jobs failing in
pre-run and restarting. This is just thrashing things and could lead to
failures. Let's disable the region until we understand the problem.
Change-Id: Ied3fd534dc029868fb770280c01bb564078c5a3d
The cloud has grown significantly over the past few months and we will
begin scaling max-servers slowly to fill capacity.
Change-Id: I8ead8e56ce5c54ac1ab286fe23f703d50760a560
A new version was stabilized on the 5th that allows for more complex
SSL usage.
Also, alphabetize the USE flag definitions based on package name.
Change-Id: Ie6f3f8462e98ca24879db9ef942ec81072330323
We've patched stackviz to work under python3 properly but we are still
pulling an old tarball for stackviz that was built last year. The legacy
job that built the file at this location seems to have been removed.
Switch to the new dist/ location which appears to be correct based on
tarball file sizes.
Someone who understands stackviz better than I do should confirm that
this new location is the correct one.
Change-Id: If659a6f1fb50d288afed75e3f4975f7a4d140d35
This is being done for capacity reasons. We'll be bringing back
the region with 100+ VMs after the changes are complete, which
should be in less than 2 weeks.
Change-Id: I549386c3ae0c3611eb50f8ffe6ad657d1f7bb443
This patch adds a small number of instances that include the
following specifications:
- 6 (dedicated) threads
- 60GB memory
- 225GB PCIe NVMe storage
- NVIDIA K80 GPU
This should hopefully help in adding CI coverage for vGPU
support.
Change-Id: If5b8f9cd305e2fd51b8dab315e4804ce7c628dfd
Times are hard. Gates are long. Let's help flush them out.
Please revert this once we've cleared the gate.
Change-Id: Idf0d8a784f11aa4004a909ca911782f7c7496763
We are seeing a problem on Fedora where it appears that, on hosts
without configured ipv6, unbound chooses to send queries via the ipv6
forwarders and then returns DNS failures.
An upstream issue has been filed [1], but it remains unclear exactly
why this happens on Fedora but not on other platforms.
However, having ipv6 forwarders is not always correct. Not all our
platforms have glean support for ipv6 configuration, nor do all our
providers provide ipv6 transit.
Therefore, ipv4 is the lowest common denominator across all platforms.
Even those that are "ipv6 only" still provide ipv4 via NAT --
originally it was the unreliability of this NAT transit that led to
unbound being used in the first place. It should be noted that in
almost all jobs, the configure-unbound role [2] called from the base
job will rewrite the forwarding information and configure ipv4/6
correctly depending on the node & provider support. Thus this only
really affects some of the openstack-zuul-jobs/system-config
integration jobs, where we start out without unbound configured
because we're actually *testing* the unbound configuration role.
An additional complication is that we want to keep backwards
compatibility and populate the settings if
NODEPOOL_STATIC_NAMESERVER_V6 is explicitly set -- this is sometimes
required if you are building infra-style images inside a corporate
network that disallows outbound DNS queries, for example.
Thus, by default, only populate ipv4 forwarders unless we are explicitly
asked to add ipv6 via the new variable or the static v6 nameservers are
explicitly specified.
[1] https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=4188
[2] http://git.openstack.org/cgit/openstack-infra/openstack-zuul-jobs/tree/roles/configure-unbound
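To make the default concrete, the rendered forwarding configuration ends up
looking roughly like the following; the resolver addresses and target path
shown are illustrative assumptions:

    # IPv4 forwarders only by default; v6 entries are added only when
    # explicitly requested or when NODEPOOL_STATIC_NAMESERVER_V6 is set
    cat > /etc/unbound/forwarding.conf <<'EOF'
    forward-zone:
      name: "."
      forward-addr: 208.67.222.222
      forward-addr: 8.8.8.8
    EOF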
Change-Id: If060455e163266b2c3e72b4a2ac2838a61859496
This reverts commit 19e7cf09d9.
The issues in OVH BHS1 around networking configuration have been worked
around with updates to glean and to the label configuration in zuul.
New images are in place for each supported image in BHS1. We can go
ahead and start using this region again.
I have manually tested this by booting an ubuntu-xenial node with
glean_ignore_interfaces='True' set in metadata, and the networking comes
up as expected using DHCP. The mirror in that region is reachable from
this test node.
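For anyone re-checking this, the manual boot boiled down to something like
the following; the flavor and network names are placeholders:

    # Boot a test node with the glean override set as instance metadata
    openstack server create \
        --image ubuntu-xenial \
        --flavor test-flavor \
        --network test-net \
        --property glean_ignore_interfaces=True \
        glean-dhcp-test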
Change-Id: I29746686217a62709c4afc6656d95829ace6fb3b
Instruct glean via metadata properties to ignore the config drive
network_data.json interface data on OVH and instead fall back to DHCP.
This is necessary because, post-upgrade, the OVH config drive
network_data.json provides inaccurate network configuration details and
DHCP is actually what is needed there for working l2 networking.
Change-Id: I51f16d34a96ee8d964e8b540ce5113a662a56f6d
This reverts commit 756a8f43f7, which
was where we re-enabled OVH BHS1 after maintenance. I strongly
suspect that this has something to do with the issues ...
It appears that VMs in BHS1 cannot communicate with the mirror.
From a sample host 158.69.64.62 to mirror01.bhs1.ovh.openstack.org:
---
root@ubuntu-bionic-ovh-bhs1-0002154210:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether fa:16:3e:1b:4b:32 brd ff:ff:ff:ff:ff:ff
inet 158.69.64.62/19 brd 158.69.95.255 scope global ens3
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fe1b:4b32/64 scope link
valid_lft forever preferred_lft forever
root@ubuntu-bionic-ovh-bhs1-0002154210:~# traceroute -n mirror01.bhs1.ovh.openstack.org
traceroute to mirror01.bhs1.ovh.openstack.org (158.69.80.87), 30 hops max, 60 byte packets
1 158.69.64.62 2140.650 ms !H 2140.627 ms !H 2140.615 ms !H
root@ubuntu-bionic-ovh-bhs1-0002154210:~# ping mirror01.bhs1.ovh.openstack.org
PING mirror01.bhs1.ovh.openstack.org (158.69.80.87) 56(84) bytes of data.
From ubuntu-bionic-ovh-bhs1-0002154210 (158.69.64.62) icmp_seq=1 Destination Host Unreachable
From ubuntu-bionic-ovh-bhs1-0002154210 (158.69.64.62) icmp_seq=2 Destination Host Unreachable
From ubuntu-bionic-ovh-bhs1-0002154210 (158.69.64.62) icmp_seq=3 Destination Host Unreachable
--- mirror01.bhs1.ovh.openstack.org ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3049ms
---
However, *external* access to the mirror host and all other hosts
seems fine. It appears to be an internal OVH BHS1 networking issue.
I have raised ticket #9721374795 with OVH about this issue. It needs
to be escalated, so it is currently pending (further details should come
to infra-root@openstack.org).
In the meantime, all jobs are failing in the region. Disable it
until we have a solution.
Change-Id: I748ca1c10d98cc2d7acf2e1821d4d0f886db86eb