Dependency of a dependency of a dependency

Or why cool projects often fail to get traction

Today I tried to install a project I have been working for a while on a new machine. It relies heavily on storing and quering data in “social network” manner, and hence not necessarily very well adapted to the relational databases. When I was staring to work on it back in the early 2013, I was still a fairly inexperienced programmer, so I decided to go with a new technology to underlie it neo4j graph database. And since I was coding in Python and fairly familiar with the excellent SQLAlchemy ORM and was looking for something similar to work with graph databases my choice fell on the bulbflow framework by James Thronotn. I complemented it with JPype native binding to python for quick insertion of the data. After the first couple of months of developer’s bliss and everything working as expected and being build as fast as humanely possible, I realized that things were not going to be as fun as I initially expected.

Python 3 was not compatible with JPype library that I was accessing to rapidly insert data into neo4j from Python. In addition to that JPype was quickly dropping out of support and was in general too hard to set up, so I had to drop it down.
Bulbflow framework in reality relied on the Gremlin/Groovy Tinkerpop stack implementation in the neo4j database, was working over a REST interface and had no support for batching. Despite several promises of implementation of batching by it’s creator and maintainer, it never came to life and I found myself involved in a re-implementation that would follow that principles. Unfortunately I had not enough experience with programming to develop a library back then, nor enough time to do it. I had instead to settle for a slow insertion cycle (that was more than compensated for by the gain of time on retrieval)
A year later, neo4j released the 2.0 version and dropped the Gremlin/Groovy stack I relied on to run my code. They had however the generosity of leaving the historical 1.9 maintenance branch going, so provided that I had already poured along the lines of three month full-time into configuration and debugging of my code to work with that architecture, I decided to stick with 1.9 and maintain them
Yesterday (two and a half years after start of development, when I had the time to pour the equivalent of six more month of full-time into the development of that project), I realized that the only version of 1.9 neo4j still available for download to common of mortals that will not know how to use maven to assemble the project from GitHub repository is crashing with a “Perm Gen: java heap out of memory” exception. Realistically, provided that I am one of the few people still using 1.9.9 community edition branch and one of the even fewer people likely to run into this problem, I don’t expect developers will dig through all the details to find the place where the error is occurring and correct it. So at that point, my best bet is to put onto GitHub a 1.9.6 neo4j and link to it from my project, hoping that neo4j developers will show enough understanding to not pull it down

All in all, the experience isn’t that terrible, but one thing is for sure. Next time I will be building a project I would see myself maintain in a year’s time and installing on several machines, I will think twice before using a relatively “new” technology, even if it is promising and offers x10 performance gain. Simply because I won’t know how it will be breaking and changing in the upcoming five years and what kind of efforts it will require for me to maintain the dependencies of my project.

Increasing information density

Evolving ideas

Leave a Reply Cancel reply