octavia/releasenotes/notes/fix-health-check-db-outage-279b0bc1d0039312.yaml
Carlos Goncalves 92473ce210 Make health checks resilient to DB outages
Octavia is struggling with proper handling of DB connectivity issues
bringing down all running loadbalancers. Octavia tries to failover
amphorae and can fail in one of the following stages:

1. Octavia can't create new amphora because Nova isn't ready yet after
   DB outage. Nova-API throws 500, Octavia nukes amphora instance and
   won't try to recreate it again.
2. Octavia tries to recreate amphora instance but it gets stuck in
   PENDING_CREATE forever.
3. Octavia fails completely reporting DB connection issues, leaving some
   amphoras in error, some in pending_delete as bellow: It affects also
   HA deployments.

This patch fixes that by wrapping the DB check for health, waiting for
the connection to be re-established and sleeping off the full
"heartbeat_timeout" interval.

Story: 2003575
Task: 24871

Change-Id: I7b30cd31e1ce0cf9dab61484f4404f1c6ccddd5e
2018-09-11 12:21:22 -06:00

7 lines
214 B
YAML

---
fixes:
- |
Fixed an issue when Octavia cannot reach the database (all database
instances are down) bringing down all running loadbalancers. The Health
Manager is more resilient to DB outages now.