Close OVN VIP race by adding an ordering constraint

Currently there is a race in the high availability of OVN when resetting a controller. Specifically, the VIP that OVN uses (the internal_api VIP by default) only has a colocation constraint with the master role of the ovn-dbs resource.

This leaves the following race open:
1) We reboot ctrl-0 hosting the master role of ovn-dbs
2) OVN becomes master on ctrl-1 from pacemaker's POV (but the promotion operation running in the background is not completed)
3) OVN VIP moves to ctrl-1 even though it is still in slave mode (there is only a colocation constraint between the VIP and the master role for OVN)
4) OVN controllers on the overcloud connect to the VIP but it is in read-only mode because it was a slave
5) OVN controllers that connected at 4) stay in read-only mode forever until they get restarted manually

With the addition of this constraint we force the VIP to move only after the master role has been promoted. This makes it much less likely for a client to connect to the VIP and get a read-only DB in the background. With only this patch applied I did not manage to reproduce the issue (even after 7 reboots of controllers).

Note that there is still a small race window possible because the current OVN resource agent has a bug: it promotes a resource to master after issuing the promotion command to the DB but without waiting for this promotion to complete. A patch for OVN-ra will also be submitted, but from initial testing this change seems to be largely sufficient.

Also note that this change introduces a small, less desirable side effect: a failover of the internal VIP will now take a bit longer because it will happen only after ovn-dbs gets promoted to master. We plan to take care of this fully by decoupling the OVN VIP from the internal_api one.

This change addresses the immediate issue related to ovn_controllers being stuck in read-only mode due to premature promotion. (OVN upstream is discussing how to make connections to a read-only VIP trigger a reconnection eventually.)

Closes-Bug: #1835830
Change-Id: I3fa07e28c4e37197890664d12a265f1673c780f2
(cherry picked from commit 5c10f33197)
(cherry picked from commit dc4bb7e7cb)
(cherry picked from commit bcea8ea1aa)

parent d19ebc0511
commit a0cfe0afdd

@@ -159,8 +159,18 @@ sb_master_port=${sb_db_port} manage_northd=yes inactive_probe_interval=180000",
         tries => $pcs_tries,
       }
 
+      pacemaker::constraint::order { "${ovndb_vip_resource_name}-with-${ovndb_servers_resource_name}":
+        first_resource => 'ovn-dbs-bundle',
+        second_resource => "${ovndb_vip_resource_name}",
+        first_action => 'promote',
+        second_action => 'start',
+        constraint_params => 'kind=Optional',
+        tries => $pcs_tries,
+      }
+
       Pacemaker::Resource::Bundle['ovn-dbs-bundle']
         -> Pacemaker::Resource::Ocf["${ovndb_servers_resource_name}"]
+          -> Pacemaker::Constraint::Order["${ovndb_vip_resource_name}-with-${ovndb_servers_resource_name}"]
           -> Pacemaker::Constraint::Colocation["${ovndb_vip_resource_name}-with-${ovndb_servers_resource_name}"]
     }
   }
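
The race described in the commit message, and what the ordering constraint changes, can be sketched with a small toy model. This is only an illustration under stated assumptions, not pacemaker's actual scheduling logic: the `Node` class, its two flags, and the `vip_may_start` helper are hypothetical names introduced here, and the model ignores the remaining window where the resource agent returns before the DB has finished promoting.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy view of a controller as seen by the cluster manager (hypothetical)."""
    name: str
    promote_requested: bool = False  # node is considered master (step 2 of the race)
    promote_done: bool = False       # the promote action has actually completed


def vip_may_start(node: Node, ordered: bool) -> bool:
    """Return True if the VIP would be allowed to start on `node` in this model.

    ordered=False models the colocation constraint alone.
    ordered=True additionally models "promote ovn-dbs-bundle, then start VIP".
    """
    # Colocation only says "VIP lives where the master role is".
    colocated = node.promote_requested
    if not ordered:
        return colocated
    # With the ordering constraint the VIP also waits for the promote action.
    return colocated and node.promote_done


if __name__ == "__main__":
    ctrl1 = Node("ctrl-1", promote_requested=True)  # promotion still in flight
    print("colocation only :", vip_may_start(ctrl1, ordered=False))
    print("with ordering   :", vip_may_start(ctrl1, ordered=True))
    ctrl1.promote_done = True
    print("after promotion :", vip_may_start(ctrl1, ordered=True))
```

In this model the colocation-only case lets the VIP start while the promotion is still in flight, which corresponds to steps 3) and 4) of the race; with ordering, the VIP start is deferred until the promote action has completed.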