November 2015 – Increasing information density

I am in the middle of refactoring a lot of code I wrote several years ago for a project that relies heavily on module-level constants (such as database connection configurations). To me, defining constants at the top of a module, followed by several functions built on them that I use later in the code, just sounds much more Pythonic than wrapping all the internals in a class that is dynamically initialized every time one of its methods is needed elsewhere.

However, I have been running into more and more issues with this approach. The first came when I tried to use Sphinx autodoc to extract the API documentation for my project. Sphinx imports modules one by one in order to extract the docstrings and generate API documentation from them. Things can get messy when it does so on the development machine, but they get worse when the whole process runs in an environment that lacks the third-party software needed for, say, a proper database connection. In my case I got hurt by Read the Docs (RTFD) and had to solve the problem with environment variables.

import os
on_rtd = os.environ.get('READTHEDOCS', None) == 'True'

This, however, led to production code polluted with switches that were there only to prevent constants from being initialized. On top of that, a couple of months down the road, when I started using Travis CI and writing unit tests, this practice came back to bite me again. Whenever I imported a module containing functions that interacted with the database, it automatically pulled in the module responsible for the database connection, which then attempted to connect to a database that was not necessarily present in the Travis CI sandbox, and that I was not particularly eager to exercise while testing a completely unrelated function.
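To make the problem concrete, here is a minimal sketch of what such a switch might look like around a module-level constant. The names `io_disabled`, `open_connection` and `CONNECTION` are hypothetical, and `UNITTEST` is a variable the test runner would have to set itself; only `READTHEDOCS` comes from the snippet above.

```python
import os


def io_disabled(environ=os.environ):
    """Return True when running on Read the Docs or under the test runner."""
    # READTHEDOCS is the variable checked earlier in this post; UNITTEST is an
    # assumption of this sketch and must be set by the test runner itself.
    return (environ.get("READTHEDOCS") == "True"
            or environ.get("UNITTEST") == "True")


def open_connection():
    """Hypothetical stand-in for the real database connection call."""
    return "live connection"


# Module-level constant: only initialized when a database is expected.
CONNECTION = None if io_disabled() else open_connection()
```

Every function that touches `CONNECTION` then has to cope with it being `None`, which is exactly the kind of pollution I would like to avoid.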

In response to that, I can see several possible ways of managing it:

Keep using environment variables in the production code. Rely on RTFD to supply the READTHEDOCS environment variable and set a UNITTEST environment variable when the unit-testing framework starts. Check for those environment variables each time we are about to perform an IO operation and mock it if they are set.
Instead of environment variables, use an active configs pattern: import configs.py and read/write variables within it from the modules that import it.
Pull all the active behavior from the modules into class initialization routines and perform the initialization in the __init__.py for classes, once again depending on what is going on.
Use the dynamic nature of Python to monkey-patch the actual DB connection module before it gets imported in the subsequent code.
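As an illustration of the third option, here is one possible reading of it: a small class that only connects the first time the connection is actually needed, so that importing the module has no side effects. The `Database` class and the `fake_connect` callable are hypothetical names made up for this sketch, not code from my project.

```python
class Database:
    """Hypothetical wrapper that connects on first use, not at import time."""

    def __init__(self, connect):
        self._connect = connect    # callable that opens the real connection
        self._connection = None

    @property
    def connection(self):
        # Open the connection lazily and reuse it afterwards.
        if self._connection is None:
            self._connection = self._connect()
        return self._connection


calls = []


def fake_connect():
    """Stand-in for the real connection routine; records each call."""
    calls.append("connected")
    return object()


db = Database(fake_connect)   # nothing is connected at this point
first = db.connection         # the connection is opened here
second = db.connection        # and reused here
```

The price is that every caller now goes through an object instead of a plain module-level constant.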

Following a question I asked on Stack Overflow, it seems that the last option is the most recommended one, because it does not increase the complexity of the production code: it only moves elements into the module that implements the unit tests.

I think what I would really need in Python is a pre-import patch that replaces some functions BEFORE they are imported in a given environment. All in all, it leaves me with the uneasy feeling that, unlike many other parts of Python, the import system isn’t as well thought through as it should be. If I had to put together a wish list for the Python import system, these two suggestions would be the biggest ones:

Import context replacement wrappers:

@patch(numpy, mocked_numpy)
import module_that_uses_numpy

Proper relative imports (there should always be only one right way of doing it):

<MyApp.scripts.user_friendly_script.py>

from MyApp.source.utilities.IO_Manager import my_read_fle

[... rest of the module ..]


> cd MyApp/scripts/
> python user_friendly_script.py
   Works properly!

Compare that to the current way things are implemented:

> python -m MyApp.scripts.user_friendly_script
   Works properly!
> cd MyApp/scripts/
> python user_friendly_script.py
   Fails...
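The failing case comes from the fact that, when a file is run as a script, Python puts the script's own directory (here MyApp/scripts/) at the front of sys.path rather than the directory that contains MyApp. A common workaround, sketched below under the assumption that the script sits two levels below that directory, is to fix sys.path by hand before the MyApp imports. The helper name `add_package_parent_to_path` is made up for this example.

```python
import sys
from pathlib import Path


def add_package_parent_to_path(script_file, levels_up=2):
    """Put the directory that contains the top-level package on sys.path."""
    # For .../MyApp/scripts/user_friendly_script.py, parents[2] is the
    # directory that contains MyApp itself.
    parent = str(Path(script_file).resolve().parents[levels_up])
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent
```

The helper has to live in the script itself and be called as `add_package_parent_to_path(__file__)` before any `from MyApp...` import, which is precisely the kind of boilerplate I would rather not need.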

It seems, however, that pre-import patching of modules is possible in Python, even if it is not that easy to implement.

After digging through this blog post, it seems that once a module has been imported, it is inserted into the `sys.modules` dictionary, which caches it for future imports. In other words, if I want to do run-time patching, I need to inject a mock object into that dictionary under the name used in the import statement, the one that triggers the side effect of connecting to the database.
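A minimal sketch of that injection, using `unittest.mock.patch.dict` so that `sys.modules` is restored afterwards. The module names `db_connection` and `user_queries`, and the inline source standing in for a production module, are all hypothetical; in a real test the production module would simply be imported inside the `with` block.

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for the module that connects to the database on import.
fake_db = types.ModuleType("db_connection")
fake_db.connection = mock.MagicMock(name="connection")
fake_db.connection.query.return_value = [("alice",)]

# Source of a hypothetical production module that depends on db_connection.
PRODUCTION_SOURCE = """
from db_connection import connection

def list_users():
    return [row[0] for row in connection.query("SELECT name FROM users")]
"""

# While the patch is active, "import db_connection" finds the fake module in
# sys.modules and never reaches the real one.
with mock.patch.dict(sys.modules, {"db_connection": fake_db}):
    user_queries = types.ModuleType("user_queries")
    exec(PRODUCTION_SOURCE, user_queries.__dict__)

print(user_queries.list_users())
```

The production code stays untouched; all the trickery lives in the test module.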

Given that modifying sys.modules has the potential to break the Python import machinery, a (relatively) saner way to inject a mock module would be to insert a finder object into sys.meta_path, which won’t break the core Python import mechanics. This can be achieved by implementing a find_spec() method (find_module() before Python 3.4) on a finder class derived from importlib.abc.MetaPathFinder. However, this API seems to be specific to Python 3.4 and later, so on older versions we might need to run an alternative import from a path that would instead patch the normal module behavior and mock the database connection.
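Here is a rough sketch of what such a finder could look like on Python 3.4 or later. The class name `MockModuleFinder` and the module name `db_connection` are hypothetical, and the mocked module only gets a single `connection` attribute; a real test would populate whatever the production code expects.

```python
import importlib
import importlib.abc
import importlib.machinery
import sys
from unittest import mock


class MockModuleFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve a mocked module in place of the selected module names."""

    def __init__(self, names):
        self.names = set(names)

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.names:
            # Claim the module and act as its loader as well.
            return importlib.machinery.ModuleSpec(fullname, self)
        return None  # let the regular finders handle everything else

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        # Populate the empty module with a mocked connection object.
        module.connection = mock.MagicMock(name="connection")


finder = MockModuleFinder(["db_connection"])
sys.meta_path.insert(0, finder)
try:
    db_connection = importlib.import_module("db_connection")
finally:
    sys.meta_path.remove(finder)
```

Note that the mocked module still ends up cached in sys.modules, but it gets there through the regular import machinery rather than by direct manipulation of the dictionary.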

Let’s see if I will manage to pull this one off…

Python: importing, monkey-patching and unittesting