295 lines
11 KiB
Markdown
295 lines
11 KiB
Markdown
Ordering issues when monkey patching in Python
|
|
==============================================
|
|
|
|
In my recent post about [safely applying monkey patches in Python](
|
|
11-safely-applying-monkey-patches-in-python.md), I mentioned how one of the
|
|
issues that arises is when a monkey patch is applied. Specifically, if the
|
|
module you need to monkey patch has already been imported and was being
|
|
used by other code, that it could have created a local reference to a
|
|
target function you wish to wrap, in its own namespace. So although your
|
|
monkey patch would work fine where the original function was used direct
|
|
from the module, you would not cover where it was used via a local
|
|
reference.
|
|
|
|
Coincidentally, Ned Batchelder recently [posted](
|
|
http://nedbatchelder.com//blog/201503/finding_temp_file_creators.html)
|
|
about using monkey patching to debug an issue where temporary directories
|
|
were not being cleaned up properly. Ned described this exact issue in
|
|
relation to wanting to monkey patch the 'mkdtemp()' function from the
|
|
'tempfile' module. In that case he was able to find an alternate place
|
|
within the private implementation for the module to patch so as to avoid
|
|
the problem. Using some internal function like this may not always be
|
|
possible however.
|
|
|
|
What I want to start discussing with this post is mechanisms one can use
|
|
from wrapt to deal with this issue of ordering. A major part of the
|
|
solution is what are called post import hooks. This is a mechanism which
|
|
was described in [PEP 369](
|
|
https://www.python.org/dev/peps/pep-0369/) and although it never made it
|
|
into the Python core, it is still possible to graft this ability into
|
|
Python using existing APIs. From this we can then add additional
|
|
capabilities for discovering monkey patching code and automatically apply
|
|
it when modules are imported, before other modules get the module and so
|
|
before they can create a reference to a function in their own namespace.
|
|
|
|
Post import hook mechanism
|
|
--------------------------
|
|
|
|
In PEP 369, a primary use case presented was illustrated by the example:
|
|
|
|
```
|
|
import imp
|
|
|
|
@imp.when_imported('decimal')
|
|
def register(decimal):
|
|
Inexact.register(decimal.Decimal)
|
|
```
|
|
|
|
The basic idea is that when this code was seen it would cause a callback to
|
|
be registered within the Python import system such that when the 'decimal'
|
|
module was imported, that the 'register()' function which the decorator had
|
|
been applied to, would be called. The argument to the 'register()' function
|
|
would be the reference to the module the registration had been against. The
|
|
function could then perform some action against the module before it was
|
|
returned to whatever code originally requested the import.
|
|
|
|
Instead of using the decorator '@imp.when_imported' decorator, one could
|
|
also explicitly use the 'imp.register_post_import_hook()' function to
|
|
register a post import hook.
|
|
|
|
```
|
|
import imp
|
|
|
|
def register(decimal):
|
|
Inexact.register(decimal.Decimal)
|
|
|
|
imp.register_post_import_hook(register, 'decimal')
|
|
```
|
|
|
|
Although PEP 369 was never incorporated into Python, the wrapt module
|
|
provides implementations for both the decorator and the function, but
|
|
within the 'wrapt' module rather than 'imp'.
|
|
|
|
Now what neither the decorator or the function really solved alone was the
|
|
ordering issue. That is, you still had the problem that these could be
|
|
triggered after the target module had already been imported. In this case
|
|
the post import hook function would still be called, albeit for our case
|
|
too late to get in before the reference to the function we want to monkey
|
|
patch had been created in a different namespace.
|
|
|
|
The simplest solution to this problem is to modify the main Python script
|
|
for your application and setup all the post import hook registrations you
|
|
need as the absolute very first thing that is done. That is, before any
|
|
other modules are imported from your application or even modules from the
|
|
standard library used to parse any command line arguments.
|
|
|
|
Even if you are able to do this, because though the registration functions
|
|
require an actual callable, it does mean you are preloading the code to
|
|
perform all the monkey patches. This could be a problem if they in turn had
|
|
to import further modules as the state of your application may not yet have
|
|
been setup such that those imports would succeed.
|
|
|
|
They say though that one level of indirection can solve all problems and
|
|
this is an example of where that principle can be applied. That is, rather
|
|
than import the monkey patching code, you can setup a registration which
|
|
would only lazily load the monkey patching code itself if the module to be
|
|
patched was imported, and then execute it.
|
|
|
|
```
|
|
import sys
|
|
|
|
from wrapt import register_post_import_hook
|
|
|
|
def load_and_execute(name):
|
|
def _load_and_execute(target_module):
|
|
__import__(name)
|
|
patch_module = sys.modules[name]
|
|
getattr(patch_module, 'apply_patch')(target_module)
|
|
return _load_and_execute
|
|
|
|
register_post_import_hook(load_and_execute('patch_tempfile'), 'tempfile')
|
|
```
|
|
|
|
In the module file 'patch_tempfile.py' we would now have:
|
|
|
|
```
|
|
from wrapt import wrap_function_wrapper
|
|
|
|
def _mkdtemp_wrapper(wrapped, instance, args, kwargs):
|
|
print 'calling', wrapped.__name__
|
|
return wrapped(*args, **kwargs)
|
|
|
|
def apply_patch(module):
|
|
print 'patching', module.__name__
|
|
wrap_function_wrapper(module, 'mkdtemp', _mkdtemp_wrapper)
|
|
```
|
|
|
|
Running the first script with the interactive interpreter so as to leave us
|
|
in the interpreter, we can then show what happens when we import the
|
|
'tempfile' module and execute the 'mkdtemp()' function.
|
|
|
|
```
|
|
$ python -i lazyloader.py
|
|
>>> import tempfile
|
|
patching tempfile
|
|
>>> tempfile.mkdtemp()
|
|
calling mkdtemp
|
|
'/var/folders/0p/4vcv19pj5d72m_bx0h40sw340000gp/T/tmpfB8r20'
|
|
```
|
|
|
|
In other words, unlike how most monkey patching is done, we aren't forcibly
|
|
importing a module in order to apply the monkey patches on the basis it
|
|
might be used. Instead the monkey patching code stays dormant and unused
|
|
until the target module is later imported. If the target module is never
|
|
imported, the monkey patch code for that module is itself not even
|
|
imported.
|
|
|
|
Discovery of post import hooks
|
|
------------------------------
|
|
|
|
Post import hooks as described provide a slightly better way of setting up
|
|
monkey patches so they are applied. This is because they are only activated
|
|
if the target module containing the function to be patched is even
|
|
imported. This avoids unnecessarily importing modules you may not even use,
|
|
and which otherwise would increase memory usage of your application.
|
|
|
|
Ordering is still important and as a result it is important to ensure that
|
|
any post import hook registrations are setup before any other modules are
|
|
imported. You also need to modify your application code every time you want
|
|
to change what monkey patches are applied. This latter point could be
|
|
inconvenient if only wanting to add monkey patches infrequently for the
|
|
purposes of debugging issues.
|
|
|
|
A solution to the latter issue is to separate out monkey patches into
|
|
separately installed modules and use a registration mechanism to announce
|
|
their availability. Python applications could then have common boiler plate
|
|
code executed at the very start which discovers based on supplied
|
|
configuration what monkey patches should be applied. The registration
|
|
mechanism would then allow the monkey patch modules to be discovered at
|
|
runtime.
|
|
|
|
One particular registration mechanism which can be used here is
|
|
'setuptools' entry points. Using this we can package up monkey patches so
|
|
they could be separately installed ready for use. The structure of such a
|
|
package would be:
|
|
|
|
```
|
|
setup.py
|
|
src/__init__.py
|
|
src/tempfile_debugging.py
|
|
```
|
|
|
|
The 'setup.py' file for this package will be:
|
|
|
|
```
|
|
from setuptools import setup
|
|
|
|
NAME = 'wrapt_patches.tempfile_debugging'
|
|
|
|
def patch_module(module, function=None):
|
|
function = function or 'patch_%s' % module.replace('.', '_')
|
|
return '%s = %s:%s' % (module, NAME, function)
|
|
|
|
ENTRY_POINTS = [
|
|
patch_module('tempfile'),
|
|
]
|
|
|
|
setup_kwargs = dict(
|
|
name = NAME,
|
|
version = '0.1',
|
|
packages = ['wrapt_patches'],
|
|
package_dir = {'wrapt_patches': 'src'},
|
|
entry_points = { NAME: ENTRY_POINTS },
|
|
)
|
|
|
|
setup(**setup_kwargs)
|
|
```
|
|
|
|
As a convention so that our monkey patch modules are easily identifiable we
|
|
use a namespace package. The parent package in this case will be
|
|
'wrapt_patches' since we are working with wrapt specifically.
|
|
|
|
The name for this specific package will be
|
|
'wrapt_patches.tempfile_debugging' as the theoretical intent is that we are
|
|
going to create some monkey patches to help us debug use of the 'tempfile'
|
|
module, along the lines of what Ned described in his blog post.
|
|
|
|
The key part of the 'setup.py' file is the definition of the
|
|
'entry_points'. This will be set to a dictionary mapping the package name
|
|
to a list of definitions listing what Python modules this package contains
|
|
monkey patches for.
|
|
|
|
The 'src/__init__.py' file will then contain:
|
|
|
|
```
|
|
import pkgutil
|
|
__path__ = pkgutil.extend_path(__path__, __name__)
|
|
```
|
|
|
|
as is required when creating a namespace package.
|
|
|
|
Finally, the monkey patches will actually be contained in
|
|
'src/tempfile_debugging.py' and for now is much like what we had before.
|
|
|
|
```
|
|
from wrapt import wrap_function_wrapper
|
|
|
|
def _mkdtemp_wrapper(wrapped, instance, args, kwargs):
|
|
print 'calling', wrapped.__name__
|
|
return wrapped(*args, **kwargs)
|
|
|
|
def patch_tempfile(module):
|
|
print 'patching', module.__name__
|
|
wrap_function_wrapper(module, 'mkdtemp', _mkdtemp_wrapper)
|
|
```
|
|
|
|
With the package defined we would install it into the Python installation
|
|
or virtual environment being used.
|
|
|
|
In place now of the explicit registrations which we previously added at the
|
|
very start of the Python application main script file, we would instead
|
|
add:
|
|
|
|
```
|
|
import os
|
|
|
|
from wrapt import discover_post_import_hooks
|
|
|
|
patches = os.environ.get('WRAPT_PATCHES')
|
|
|
|
if patches:
|
|
for name in patches.split(','):
|
|
name = name.strip()
|
|
if name:
|
|
print 'discover', name
|
|
discover_post_import_hooks(name)
|
|
```
|
|
|
|
If we were to run the application with no specific configuration to enable
|
|
the monkey patches then nothing would happen. If however they were enabled,
|
|
then they would be automatically discovered and applied as necessary.
|
|
|
|
```
|
|
$ WRAPT_PATCHES=wrapt_patches.tempfile_debugging python -i entrypoints.py
|
|
discover wrapt_patches.tempfile_debugging
|
|
>>> import tempfile
|
|
patching tempfile
|
|
```
|
|
|
|
What would be ideal is if PEP 369 ever did make it into the core of Python
|
|
that a similar bootstrapping mechanism be incorporated into Python itself
|
|
so that it was possible to force registration of monkey patches very early
|
|
during interpreter initialisation. Having this in place we would have a
|
|
guaranteed way of addressing the ordering issue when doing monkey patching.
|
|
|
|
As that doesn't exist right now, what we did in this case was modify our
|
|
Python application to add the bootstrap code ourselves. This is fine where
|
|
you control the Python application you want to be able to potentially apply
|
|
monkey patches to, but what if you wanted to monkey patch a third party
|
|
application and you didn't want to have to modify its code. What are the
|
|
options in that case?
|
|
|
|
As it turns out there are some tricks that can be used in that case. I will
|
|
discuss such options for monkey patching a Python application you can't
|
|
actually modify in my next blog post on this topic of monkey patching.
|