Handle online_data_migrations exceptions

When online_data_migrations raise exceptions, nova/cinder-manage catches
the exceptions, prints fairly useless "something didn't work" messages,
and moves on. Two issues:

1) The user(/admin) has no way to see what actually failed (exception
   detail is not logged)

2) The command returns exit status 0, as if all possible migrations have
   been completed successfully - this can cause failures to get missed,
   especially if automated

This change adds logging of the exceptions, and introduces a new exit
status of 2, which indicates that no updates took effect in the last
batch attempt, but some are (still) failing, which requires intervention.

Change-Id: Ib684091af0b19e62396f6becc78c656c49a60504
Closes-Bug: #1796192
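
As a rough illustration of the exit-status rule this commit message describes, the decision can be sketched as a small pure function. This is only a sketch, not the code in the change: the function name `exit_status` and its parameters are hypothetical, chosen to match the `ran`, `exceptions` and `unlimited` variables that appear in the diff.

```python
def exit_status(ran, exceptions, unlimited):
    """Illustrative sketch of the exit-status decision (hypothetical helper).

    ran        -- number of rows migrated in the last batch attempted
    exceptions -- True if any migration raised in the last batch
    unlimited  -- True if --max-count was not given
    """
    # Failures are fatal when nothing else made progress in the last batch,
    # or when the unlimited loop has already run everything it could.
    if exceptions and (unlimited or not ran):
        return 2
    # Otherwise: 1 means "work was done, run again", 0 means "nothing left".
    return 1 if ran else 0
```

With `--max-count`, a failing migration alongside one that still makes progress yields 1, and the same failure with no remaining progress yields 2.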
This commit is contained in:
parent
1c8b14d24c
commit
3eea37b85b
@ -86,16 +86,25 @@ Nova Database
|
|||||||
Lists and optionally deletes database records where instance_uuid is NULL.
|
Lists and optionally deletes database records where instance_uuid is NULL.
|
||||||
|
|
||||||
``nova-manage db online_data_migrations [--max-count]``
|
``nova-manage db online_data_migrations [--max-count]``
|
||||||
Perform data migration to update all live data. Return exit code 0 if
|
Perform data migration to update all live data.
|
||||||
migrations were successful or exit code 1 for partial updates. This command
|
|
||||||
should be called after upgrading database schema and nova services on all
|
|
||||||
controller nodes. If the command exits with partial updates (exit code 1)
|
|
||||||
the command will need to be called again.
|
|
||||||
|
|
||||||
``--max-count`` controls the maximum number of objects to migrate in a given
|
``--max-count`` controls the maximum number of objects to migrate in a given
|
||||||
call. If not specified, migration will occur in batches of 50 until fully
|
call. If not specified, migration will occur in batches of 50 until fully
|
||||||
complete.
|
complete.
|
||||||
|
|
||||||
|
Returns exit code 0 if no (further) updates are possible, 1 if the ``--max-count``
|
||||||
|
option was used and some updates were completed successfully (even if others generated
|
||||||
|
errors), 2 if some updates generated errors and no other migrations were able to take
|
||||||
|
effect in the last batch attempted, or 127 if invalid input is provided (e.g.
|
||||||
|
non-numeric max-count).
|
||||||
|
|
||||||
|
This command should be called after upgrading database schema and nova services on
|
||||||
|
all controller nodes. If it exits with partial updates (exit status 1) it should
|
||||||
|
be called again, even if some updates initially generated errors, because some updates
|
||||||
|
may depend on others having completed. If it exits with status 2, intervention is
|
||||||
|
required to resolve the issue causing remaining updates to fail. It should be
|
||||||
|
considered successfully completed only when the exit status is 0.
|
||||||
|
|
||||||
``nova-manage db ironic_flavor_migration [--all] [--host] [--node] [--resource_class]``
|
``nova-manage db ironic_flavor_migration [--all] [--host] [--node] [--resource_class]``
|
||||||
Perform the ironic flavor migration process against the database
|
Perform the ironic flavor migration process against the database
|
||||||
while services are offline. This is `not recommended` for most
|
while services are offline. This is `not recommended` for most
|
||||||
|
@ -140,14 +140,18 @@ same time.
|
|||||||
``nova-manage db online_data_migrations --max-count <number>``. Note
|
``nova-manage db online_data_migrations --max-count <number>``. Note
|
||||||
that you can use the ``--max-count`` argument to reduce the load this
|
that you can use the ``--max-count`` argument to reduce the load this
|
||||||
operation will place on the database, which allows you to run a
|
operation will place on the database, which allows you to run a
|
||||||
small chunk of the migrations until all of the work is done. Each
|
small chunk of the migrations until all of the work is done. The chunk size
|
||||||
time it is run, it will show a summary of completed and remaining
|
you should use depends on your infrastructure and how much additional load
|
||||||
records. You run this command until you see completed and
|
you can impose on the database. To reduce load, perform smaller batches
|
||||||
remaining records as zeros. The chunk size you should use depend
|
with delays between chunks. To reduce time to completion, run larger batches.
|
||||||
on your infrastructure and how much additional load you can
|
Each time it is run, the command will show a summary of completed and remaining
|
||||||
impose on the database. To reduce load, perform smaller batches
|
records. If using the ``--max-count`` option, the command should be rerun
|
||||||
with delays between chunks. To reduce time to completion, run
|
while it returns exit status 1 (which indicates that some migrations took
|
||||||
larger batches.
|
effect, and more work may remain to be done), even if some migrations
|
||||||
|
produce errors. If all possible migrations have completed and some are
|
||||||
|
still producing errors, exit status 2 will be returned. In this case, the
|
||||||
|
cause of the errors should be investigated and resolved. Migrations should be
|
||||||
|
considered successfully completed only when the command returns exit status 0.
|
||||||
|
|
||||||
* At this point, you must also ensure you update the configuration, to stop
|
* At this point, you must also ensure you update the configuration, to stop
|
||||||
using any deprecated features or options, and perform any required work
|
using any deprecated features or options, and perform any required work
|
||||||
|
@ -75,6 +75,8 @@ from nova.virt import ironic
|
|||||||
|
|
||||||
CONF = nova.conf.CONF
|
CONF = nova.conf.CONF
|
||||||
|
|
||||||
|
LOG = logging.getLogger(__name__)
|
||||||
|
|
||||||
QUOTAS = quota.QUOTAS
|
QUOTAS = quota.QUOTAS
|
||||||
|
|
||||||
# Keep this list sorted and one entry per line for readability.
|
# Keep this list sorted and one entry per line for readability.
|
||||||
@ -670,14 +672,18 @@ Error: %s""") % six.text_type(e))
|
|||||||
|
|
||||||
def _run_migration(self, ctxt, max_count):
|
def _run_migration(self, ctxt, max_count):
|
||||||
ran = 0
|
ran = 0
|
||||||
|
exceptions = False
|
||||||
migrations = {}
|
migrations = {}
|
||||||
for migration_meth in self.online_migrations:
|
for migration_meth in self.online_migrations:
|
||||||
count = max_count - ran
|
count = max_count - ran
|
||||||
try:
|
try:
|
||||||
found, done = migration_meth(ctxt, count)
|
found, done = migration_meth(ctxt, count)
|
||||||
except Exception:
|
except Exception:
|
||||||
print(_("Error attempting to run %(method)s") % dict(
|
msg = (_("Error attempting to run %(method)s") % dict(
|
||||||
method=migration_meth))
|
method=migration_meth))
|
||||||
|
print(msg)
|
||||||
|
LOG.exception(msg)
|
||||||
|
exceptions = True
|
||||||
found = done = 0
|
found = done = 0
|
||||||
|
|
||||||
name = migration_meth.__name__
|
name = migration_meth.__name__
|
||||||
@ -691,7 +697,7 @@ Error: %s""") % six.text_type(e))
|
|||||||
ran += done
|
ran += done
|
||||||
if ran >= max_count:
|
if ran >= max_count:
|
||||||
break
|
break
|
||||||
return migrations
|
return migrations, exceptions
|
||||||
|
|
||||||
@args('--max-count', metavar='<number>', dest='max_count',
|
@args('--max-count', metavar='<number>', dest='max_count',
|
||||||
help='Maximum number of objects to consider')
|
help='Maximum number of objects to consider')
|
||||||
@ -713,8 +719,9 @@ Error: %s""") % six.text_type(e))
|
|||||||
|
|
||||||
ran = None
|
ran = None
|
||||||
migration_info = {}
|
migration_info = {}
|
||||||
|
exceptions = False
|
||||||
while ran is None or ran != 0:
|
while ran is None or ran != 0:
|
||||||
migrations = self._run_migration(ctxt, max_count)
|
migrations, exceptions = self._run_migration(ctxt, max_count)
|
||||||
ran = 0
|
ran = 0
|
||||||
for name in migrations:
|
for name in migrations:
|
||||||
migration_info.setdefault(name, (0, 0))
|
migration_info.setdefault(name, (0, 0))
|
||||||
@ -734,6 +741,18 @@ Error: %s""") % six.text_type(e))
|
|||||||
t.add_row([name, info[0], info[1]])
|
t.add_row([name, info[0], info[1]])
|
||||||
print(t)
|
print(t)
|
||||||
|
|
||||||
|
# NOTE(imacdonn): In the "unlimited" case, the loop above will only
|
||||||
|
# terminate when all possible migrations have been effected. If we're
|
||||||
|
# still getting exceptions, there's a problem that requires
|
||||||
|
# intervention. In the max-count case, exceptions are only considered
|
||||||
|
# fatal if no work was done by any other migrations ("not ran"),
|
||||||
|
# because otherwise work may still remain to be done, and that work
|
||||||
|
# may resolve dependencies for the failing migrations.
|
||||||
|
if exceptions and (unlimited or not ran):
|
||||||
|
print(_("Some migrations failed unexpectedly. Check log for "
|
||||||
|
"details."))
|
||||||
|
return 2
|
||||||
|
|
||||||
# TODO(mriedem): Potentially add another return code for
|
# TODO(mriedem): Potentially add another return code for
|
||||||
# "there are more migrations, but not completable right now"
|
# "there are more migrations, but not completable right now"
|
||||||
return ran and 1 or 0
|
return ran and 1 or 0
|
||||||
|
@ -799,13 +799,38 @@ Running batches of 50 until complete
|
|||||||
self.assertEqual(0, total[0])
|
self.assertEqual(0, total[0])
|
||||||
self.assertEqual([50, 50, 50, 50], runs)
|
self.assertEqual([50, 50, 50, 50], runs)
|
||||||
|
|
||||||
def test_online_migrations_error(self):
|
@mock.patch('nova.context.get_admin_context')
|
||||||
fake_migration = mock.MagicMock()
|
def test_online_migrations_error(self, mock_get_context):
|
||||||
fake_migration.side_effect = Exception
|
good_remaining = [50]
|
||||||
fake_migration.__name__ = 'fake'
|
|
||||||
command_cls = self._fake_db_command((fake_migration,))
|
def good_migration(context, count):
|
||||||
|
self.assertEqual(mock_get_context.return_value, context)
|
||||||
|
found = good_remaining[0]
|
||||||
|
done = min(found, count)
|
||||||
|
good_remaining[0] -= done
|
||||||
|
return found, done
|
||||||
|
|
||||||
|
bad_migration = mock.MagicMock()
|
||||||
|
bad_migration.side_effect = test.TestingException
|
||||||
|
bad_migration.__name__ = 'bad'
|
||||||
|
|
||||||
|
command_cls = self._fake_db_command((bad_migration, good_migration))
|
||||||
command = command_cls()
|
command = command_cls()
|
||||||
command.online_data_migrations(None)
|
|
||||||
|
# bad_migration raises an exception, but it could be because
|
||||||
|
# good_migration had not completed yet. We should get 1 in this case,
|
||||||
|
# because some work was done, and the command should be reiterated.
|
||||||
|
self.assertEqual(1, command.online_data_migrations(max_count=50))
|
||||||
|
|
||||||
|
# When running this for the second time, there's no work left for
|
||||||
|
# good_migration to do, but bad_migration still fails - should
|
||||||
|
# get 2 this time.
|
||||||
|
self.assertEqual(2, command.online_data_migrations(max_count=50))
|
||||||
|
|
||||||
|
# When --max-count is not used, we should get 2 if all possible
|
||||||
|
# migrations completed but some raise exceptions
|
||||||
|
good_remaining = [125]
|
||||||
|
self.assertEqual(2, command.online_data_migrations(None))
|
||||||
|
|
||||||
def test_online_migrations_bad_max(self):
|
def test_online_migrations_bad_max(self):
|
||||||
self.assertEqual(127,
|
self.assertEqual(127,
|
||||||
@ -817,19 +842,19 @@ Running batches of 50 until complete
|
|||||||
|
|
||||||
def test_online_migrations_no_max(self):
|
def test_online_migrations_no_max(self):
|
||||||
with mock.patch.object(self.commands, '_run_migration') as rm:
|
with mock.patch.object(self.commands, '_run_migration') as rm:
|
||||||
rm.return_value = {}
|
rm.return_value = {}, False
|
||||||
self.assertEqual(0,
|
self.assertEqual(0,
|
||||||
self.commands.online_data_migrations())
|
self.commands.online_data_migrations())
|
||||||
|
|
||||||
def test_online_migrations_finished(self):
|
def test_online_migrations_finished(self):
|
||||||
with mock.patch.object(self.commands, '_run_migration') as rm:
|
with mock.patch.object(self.commands, '_run_migration') as rm:
|
||||||
rm.return_value = {}
|
rm.return_value = {}, False
|
||||||
self.assertEqual(0,
|
self.assertEqual(0,
|
||||||
self.commands.online_data_migrations(max_count=5))
|
self.commands.online_data_migrations(max_count=5))
|
||||||
|
|
||||||
def test_online_migrations_not_finished(self):
|
def test_online_migrations_not_finished(self):
|
||||||
with mock.patch.object(self.commands, '_run_migration') as rm:
|
with mock.patch.object(self.commands, '_run_migration') as rm:
|
||||||
rm.return_value = {'mig': (10, 5)}
|
rm.return_value = {'mig': (10, 5)}, False
|
||||||
self.assertEqual(1,
|
self.assertEqual(1,
|
||||||
self.commands.online_data_migrations(max_count=5))
|
self.commands.online_data_migrations(max_count=5))
|
||||||
|
|
||||||
|
@ -0,0 +1,13 @@
|
|||||||
|
---
|
||||||
|
upgrade:
|
||||||
|
- |
|
||||||
|
The ``nova-manage db online_data_migrations`` command now returns exit
|
||||||
|
status 2 in the case where some migrations failed (raised exceptions) and
|
||||||
|
no others were completed successfully from the last batch attempted. This
|
||||||
|
should be considered a fatal condition that requires intervention. Exit
|
||||||
|
status 1 will be returned in the case where the ``--max-count`` option was
|
||||||
|
used and some migrations failed but others succeeded (updated at least one
|
||||||
|
row), because more work may remain for the non-failing migrations, and
|
||||||
|
their completion may be a dependency for the failing ones. The command
|
||||||
|
should be reiterated while it returns exit status 1, and considered
|
||||||
|
completed successfully only when it returns exit status 0.
|
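
The release note above says the command should be rerun while it returns exit status 1. One way an automated upgrade script might follow that advice is sketched below. This is a minimal sketch under stated assumptions: `run_until_done` and `nova_manage_batch` are hypothetical helper names, and the attempt limit and delay are illustrative choices, not part of nova. Only the `nova-manage db online_data_migrations --max-count` invocation and the meaning of exit statuses 0, 1 and 2 come from the change itself.

```python
import subprocess
import time


def nova_manage_batch(max_count=50):
    """Run one batch and return the command's exit status (hypothetical helper)."""
    return subprocess.call(
        ['nova-manage', 'db', 'online_data_migrations',
         '--max-count', str(max_count)])


def run_until_done(run_once, max_attempts=100, delay=0.0):
    """Call run_once() while it returns 1; return the first other status.

    0 means all migrations completed, 2 means intervention is required.
    max_attempts is an assumed safety limit so automation cannot loop forever.
    """
    for _ in range(max_attempts):
        status = run_once()
        if status != 1:
            return status
        # Optional pause between batches to limit database load.
        time.sleep(delay)
    raise RuntimeError('migrations still incomplete after %d attempts'
                       % max_attempts)
```

A caller would pass `nova_manage_batch` (or a `functools.partial` of it) as `run_once` and treat any result other than 0 as a failure to investigate.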