Reduce lock contention on subnets

HINT: This isn't a clean backport, as we keep the subnet in-use field.
We can't backport the db update that would remove the field.
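
For context, the field would be dropped upstream by a schema migration,
which is why it has to stay on the stable branch. A hypothetical sketch of
that kind of contract migration (not the actual upstream migration):

    # Hypothetical sketch of a contract migration dropping the now-unused
    # column; not the actual upstream migration.
    from alembic import op


    def upgrade():
        op.drop_column('subnets', 'in_use')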

In [1] a lock was introduced with the goal of preventing subnets from
being deleted while ports are being created in them in parallel.
This was achieved by acquiring an exclusive lock on the row of the
subnet in the Subnet table when adding/modifying a port or deleting
the subnet.

However, as this was an exclusive lock, it also prevented concurrent
port modifications on the same subnet from happening. This can cause
performance issues in environments with large shared subnets (e.g. a
large external subnet).

To reduce the lock contention for this case we split the lock into two
parts:

* For normal port operations we will acquire a shared lock on the
  row of the subnet. This allows multiple such operations to happen in
  parallel.
* For deleting a subnet we will acquire an exclusive lock on the row of
  the subnet. This lock can not be acquired while any shared lock is
  currently held on the row.

With this we maintain the same locking guarantees as before, but reduce
the amount of lock contention and thereby improve throughput.
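
As an illustration, the two lock modes map to the following SQLAlchemy
calls (a minimal sketch; `session`, `Subnet` and `subnet_id` are
placeholders rather than the exact neutron call sites):

    # Port create/update path: shared row lock ("LOCK IN SHARE MODE" or
    # "FOR SHARE" depending on the backend); many can be held in parallel.
    subnet = (session.query(Subnet)
              .filter_by(id=subnet_id)
              .enable_eagerloads(False)
              .with_for_update(read=True)
              .first())

    # Subnet delete path: exclusive row lock ("SELECT ... FOR UPDATE");
    # waits for all shared locks and blocks new ones while it is held.
    subnet = (session.query(Subnet)
              .filter_by(id=subnet_id)
              .enable_eagerloads(False)
              .with_for_update()
              .first())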

The performance improvement can be measured using rally test [2]
(improving from 21 to 18 seconds).
Alternatively, it can be tested with 250 parallel curl calls that each
create a port in the same network. This improves from 113s to 42s.
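
A rough way to reproduce the parallel port-creation test in Python
(endpoint, token and network ID are placeholders; the original test
simply used curl):

    # Rough reproduction of the parallel port-create benchmark.
    # NEUTRON_URL, the token and NETWORK_ID are placeholders.
    import concurrent.futures
    import time

    import requests

    NEUTRON_URL = "http://controller:9696/v2.0/ports"
    HEADERS = {"X-Auth-Token": "<token>", "Content-Type": "application/json"}
    NETWORK_ID = "<shared-network-uuid>"


    def create_port(i):
        body = {"port": {"network_id": NETWORK_ID, "name": "lock-test-%d" % i}}
        return requests.post(NEUTRON_URL, json=body, headers=HEADERS).status_code


    start = time.time()
    with concurrent.futures.ThreadPoolExecutor(max_workers=250) as pool:
        results = list(pool.map(create_port, range(250)))
    print("%d ports created in %.1fs" % (results.count(201), time.time() - start))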

[1]: https://review.opendev.org/c/openstack/neutron/+/713045
[2]: https://github.com/openstack/rally-openstack/blob/master/samples/tasks/scenarios/neutron/create-and-delete-ports.json

Closes-Bug: #2009055
Change-Id: I31b1a9c2f986f59fee0da265acebbd88d2f8e4f8
(cherry picked from commit c0af5b3b5e)
Author: Felix Huettner, 2023-03-01 16:14:18 +01:00 (committed by Stefan Hoffmann)
parent 86bc3761d1
commit d25c129ec2
3 changed files with 40 additions and 14 deletions


@@ -75,7 +75,7 @@ LOG = logging.getLogger(__name__)
 def _ensure_subnet_not_used(context, subnet_id):
-    models_v2.Subnet.lock_register(
+    models_v2.Subnet.write_lock_register(
         context, exc.SubnetInUse(subnet_id=subnet_id), id=subnet_id)
     try:
         registry.publish(


@@ -683,7 +683,7 @@ class IpamBackendMixin(db_base_plugin_common.DbBasePluginCommon):
         msg = ('This subnet is being modified by another concurrent '
                'operation')
         for subnet in subnets:
-            subnet.lock_register(
+            subnet.read_lock_register(
                 context, exc.SubnetInUse(subnet_id=subnet.id, reason=msg),
                 id=subnet.id)
         subnet_dicts = [self._make_subnet_dict(subnet, context=context)


@@ -33,25 +33,51 @@ from neutron.db import rbac_db_models
 class HasInUse(object):
     """NeutronBaseV2 mixin, to add the flag "in_use" to a DB model.
-    The content of this flag (boolean) parameter is not relevant. The goal of
-    this field is to be used in a write transaction to mark a DB register as
-    "in_use". Writing any value on this DB parameter will lock the container
-    register. At the end of the DB transaction, the DB engine will check if
-    this register was modified or deleted. In such case, the transaction will
-    fail and won't be committed.
+    The goal of this class is to allow users to lock specific database rows
+    with a shared or exclusive lock (without necessarily introducing a change
+    in the table itself). Having these locks allows the DB engine to prevent
+    concurrent modifications (e.g. the deletion of a resource while we are
+    currently adding a new dependency on the resource).
-    "lock_register" is the method to write the register "in_use" column.
-    Because the lifespan of this DB lock is the DB transaction, there isn't an
-    unlock method. The lock will finish once the transaction ends.
+    "read_lock_register" takes a shared DB lock on the row specified by the
+    filters. The lock is automatically released once the transaction ends.
+    You can have any number of parallel read locks on the same DB row, but
+    you can not have any write lock in parallel.
+    "write_lock_register" takes an exclusive DB lock on the row specified by
+    the filters. The lock is automatically released on transaction commit.
+    You may only have one write lock on each row at a time. It therefore
+    blocks all other read and write locks on this row.
     """
+    # keep this value to not need to update the database schema
+    # only at backport
     in_use = sa.Column(sa.Boolean(), nullable=False,
                        server_default=sql.false(), default=False)

     @classmethod
-    def lock_register(cls, context, exception, **filters):
+    def write_lock_register(cls, context, exception, **filters):
+        # we use `with_for_update()` to include `FOR UPDATE` in the sql
+        # statement.
+        # we need to set `enable_eagerloads(False)` so that we do not try to
+        # load attached resources (e.g. standardattributes) as this breaks the
+        # `FOR UPDATE` statement.
         num_reg = context.session.query(
-            cls).filter_by(**filters).update({'in_use': True})
-        if num_reg != 1:
+            cls).filter_by(**filters).enable_eagerloads(
+            False).with_for_update().first()
+        if num_reg is None:
             raise exception
+
+    @classmethod
+    def read_lock_register(cls, context, exception, **filters):
+        # we use `with_for_update(read=True)` to include `LOCK IN SHARE MODE`
+        # in the sql statement.
+        # we need to set `enable_eagerloads(False)` so that we do not try to
+        # load attached resources (e.g. standardattributes) as this breaks the
+        # `LOCK IN SHARE MODE` statement.
+        num_reg = context.session.query(
+            cls).filter_by(**filters).enable_eagerloads(
+            False).with_for_update(read=True).first()
+        if num_reg is None:
+            raise exception
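
A usage sketch of the two helpers, mirroring (in simplified form) the call
sites changed above; the transaction handling and function names here are
illustrative only:

    from neutron_lib import exceptions as exc
    from neutron_lib.db import api as db_api

    from neutron.db import models_v2


    def add_port_to_subnet(context, subnet_id):
        with db_api.CONTEXT_WRITER.using(context):
            # Shared lock: many port operations can hold this in parallel,
            # but a concurrent subnet delete has to wait for them.
            models_v2.Subnet.read_lock_register(
                context, exc.SubnetInUse(subnet_id=subnet_id), id=subnet_id)
            # ... create the port within the same transaction ...


    def delete_subnet(context, subnet_id):
        with db_api.CONTEXT_WRITER.using(context):
            # Exclusive lock: waits until all shared locks are released and
            # blocks new ones, so no port can be added concurrently.
            models_v2.Subnet.write_lock_register(
                context, exc.SubnetInUse(subnet_id=subnet_id), id=subnet_id)
            # ... delete the subnet within the same transaction ...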