Problems with a major programming language version bump (Python 2>3)

After about 10 years after the initial Python 3 release and about six months after the end of Python 2 support I have finally bumped my largest and longest-running project to Python 3. Or at least I think so. Until I find some other bug in a rare execution path.

BioFlow is a python project of mine that I have been on-and-off running and maintaining since 2013 – by now almost 7 years. Heavily dependent on high-performance scientific computing libraries and python libaries providing bindings to them (cough scikits.sparse cough), despite Python 3 being out for a couple of years by the time I started working on it none of the libraries I depended supported it yet. So I got on with Python 2 and rolled for it for a number of years. By that time, with several refactors, feature creep and optimization, with about 6.5 k LOC, 2.5 k Lines of comments, 665 commits over 7 years and a solid 30% of test coverage, it is a middle-of-the road python work-horse library.

As many other people running scientific computing libraries I did see a number of things impacting the code: bugs being introduced into the libraries I depended on (hello library version pinning), performance degradation due to anti-spectre attacks on Intel CPUs, libraries disappearing for good (RIP bulbs), databases discontinuing support for the means accessing them I was using (why, oh why neo4j did you drop REST) or host system just crapping itself trying to install the old Fortran libraries that have not yet been properly packaged for it (hello Docker).

Overall, it taught me a number of things about programming craftsmanship, writing quality code and debugging code I forgot the details about. But that’s a topic for another post – back to Python 2 to 3 transition.

Python 2 was working just fine for me, but with its end of life coming near, proper async support and type hinting being added to it, a switch to Python 3 seemed like a logical thing to do to ensure a long-term support.

After several attempts to keep a codebase in Python 2 consistent with Python 3

So I forked off a 2to3 branch and ran the 2to3 script in the main library. At first it seemed that it should have solved most of issues:

  • print xyz was turned into print(xyz)
  • dict.iteritems() was turned into dict.items()
  • izip became zip
  • dict.keys() when fed to an enumerator was turned into list(dict.keys())
  • reader.next() was turned into next(reader)

So I gladly tried to start running my test suite, only to discover that it was completely broken:

  • string.lower("XYZ") now was “XYZ".lower()
  • file("fname", 'w') was now an open("fname", 'w')
  • but sometimes also open("fname", 'wr')
  • and sometimes open("fname", 'rt') or open("fname", 'rb') or open("fname, 'wb'), depending purely on the ingesting library
  • AssertDictEqual or assertItemsEqual (a staple in my unit test suite) disappeared into thin air (guess assertCountEqual will now have to do…)
  • wtf is even with pickle dumps ????

Not to be forgotten that to switch to Python 3 I had to unfreeze dependencies for the libraries I was building on top, which came with its own cans of worms:

  • object.properties[property] now became an object._properties[property] in one of the libraries I heavily depended on (god bless whoever invented Ctrl-F and PyCharm for it’s context-aware class/object usage/definition search)
  • json dumps all of a sudden now require an explicit encoding, just as hashlib digests

And finally, after running for a couple of weeks my library, some previously un-executed branch triggered a bunch of exception arising from the fact that in Python 2 / meant an integer division, unless a float was involved, whereas for Python 3 / is always a float division and an // is needed to trigger an integer division.

I can be in part blamed for those issues. A code with complete unit test coverage would have caught all of the exceptions in the unit-test phase and the integration tests would have caught problems in rare codepaths.

The problem is that no real-life library have a total unit-test or coverage library. Python 3 transition trench warfare hell have killed a number of popular python projects – for instance Gourmet recipe manager (I used to use myself). For hell’s sake, even DropBox, who employs Guido himself and runs a multi-billion business on an almost pure Python stack waited until end 2018 and took about a year to roll-over.

The reality is that the debugging of a major language version bump is **really** different from anything a codebase encounters in its lifetime.

When you write a new feature, you test it out as you develop. Bugs appear as you add lines of code and you can track them down. When a dependencies craps out, the bugs that appear are related to it. It is possible to wrap it and isolate the difference in its response to calls. Debugging is localized and traceable. When you refactor, you change the model of the problem and code organization in your head. The bugs that appear are once again directly triggered by your code modifications.

When the underlying language changes, the bugs appear **everywhere**. You don’t know which line of code could be at the bugged one, and you miss bugs because some bugs obscure other bugs. So you have to do pass after pass after pass of your entire codebase, spending weeks and months tracking exceptions as they pop up and never sure if you have corrected all the bugs yet. It is hard, because you need to load the entire codebase in your head to search for bugs, be aware of the corner cases. It is demoralizing, because you are just trying to get to the point where your code already was, without improving it in any way possible.

It is pretty much a trench warfare hell – stuck in the same place, without clear advantage gained by debugging excursions at the limit of your mental capacities. It is unsurprising that a number of projects never made it to Python 3, especially niche ones made by non-developers and for non-developers – the kind of projects that made Python 2 a loved, universal language that surely would have a library that could solve your niche problem. The problem is so severe in the scientific community, that there is a serious conversation in Nature about starting to use Python 2.7 to maximise projects reproductibility, given it is guaranteed it will never change/

What could have been improved? As a rank-and-file (non-professional) developer of a niche, moderately complex library here’s a couple of things that would have my life **a lot** easier while bumping the Python version:

  • Provide a tool akin to 2to3 and make it default path. It was far from perfect – sure. But it hammered out the bulk of the differences and allowed to code to at least start executing and me – to start catching bugs.
  • Unlike 2to3, it needs to annotate potential problems in the code it could not resolve. 'rt' vs 'rb' was a direct consequence for the text vs byte separation in Python 3 and it was clear problems will arise with that. Same thing for / vs //. 2to3 should have at least high-lighted potential for problems. For me my workflow, adding a # TODO: potential conflict that needs resolution would have gone a loooooong way.
  • Better even, roll out a syntax change in the old language version that will allow the developer to explicitly resolve the ambiguity so that the automated upgrade tools can get more out of the library
  • Don’t touch the unittest functions. They are the lifeblood of the debugging of the library after the language bump. If they bail out, getting them to work would require figuring out how the code they are covering works once again and defeats their purpose.
  • Make sure that the most wide-spread libraries in your ecosystem have performed a roll-over before pushing others to do the same.
  • Those libraries need to provide a “bump” version: aka with exactly the same call syntax from the users code, they would return exactly the same results both in the previous and the new version of the language. Aka the libraries should not be bumping their own major version at the same time they bump the supported langage version.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.