Jupyter/IPython notebooks

Having written this up for Hacker News a couple of weeks ago, here is a recap with some updates:

I am a computational biologist with a heavy emphasis on data analysis. I tried Jupyter a couple of years ago, and here are my concerns with it compared to my usual workflow (PyCharm + pure Python + pickle to store the results of heavy processing).

  1. Extracting functions is harder
  2. Your git commits become completely borked
  3. Opening some data-heavy notebooks is nigh impossible once they have been shut down
  4. Importing other local modules is pretty non-trivial
  5. Refactoring is pretty hard
  6. Sphinx autodoc extraction is pretty much out of the picture
  7. Non-deterministic re-runs: depending on the cell execution order, you can get very different results. That’s an issue when you come back to your code a couple of months later and try to figure out what you did to get there.
  8. Connecting to the IPython notebook, even from environments like PyCharm, is highly non-trivial, and so is mapping it to the OS filesystem
  9. It is hard to impossible to inspect the contents of an IPython notebook hosted on GitHub due to encoding snafus

There are likely workarounds for most of these problems, but with my standard workflow they are non-issues to begin with.

In my experience, Jupyter is pretty good if you rely only on existing libraries that you are piecing together, but once you need to do more involved development work, you are screwed.
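For comparison, the pickle step of my usual workflow boils down to something like the helper below: a heavy computation runs once, its result is pickled to disk, and any later run (including one months later) loads it back instead of recomputing. This is a minimal sketch under stated assumptions: the `cached` helper, the `cache` directory and the hash-based file naming are hypothetical, and the arguments are assumed to have a stable `repr`.

```python
import hashlib
import pickle
from pathlib import Path


def cached(func, *args, cache_dir="cache"):
    """Return func(*args), reusing a pickled result from an earlier run if present.

    Hypothetical helper: the file name combines the function name with a hash
    of repr(args), so arguments are assumed to have a stable repr
    (numbers, strings, file paths...).
    """
    key = hashlib.sha256(repr(args).encode("utf-8")).hexdigest()[:16]
    path = Path(cache_dir) / f"{func.__name__}_{key}.pkl"
    if path.exists():
        # Heavy processing already done in a previous run: load the stored result.
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = func(*args)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

A script would call it as `result = cached(heavy_step, "sample1.csv")`, with `heavy_step` standing for any expensive function. Deleting a `.pkl` file forces that step to be recomputed; note that the cache is not invalidated when `heavy_step` itself changes, so stale files have to be cleared by hand. Since the script always runs top to bottom, there is no hidden cell-order state to reconstruct.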

Recommendation engine lock-in

YouTube’s recommendation engine, at least in my experience, has three modes:
– Suggest channels whose content I’ve already watched
– Suggest content I’ve already watched, to watch again
– Suggest new additions to playlists of which I’ve already watched several videos

Unfortunately, while it works very well when I’ve just discovered a couple of new channels and have their content picked and pushed to me, it fails to deliver the experience of discovery: it overfits my recent preferences, locking me into videos similar to what I have already watched instead of suggesting new content and new types of content I might be interested in. I experience the same problem with Quora’s recommendation engine (a couple of upvotes, and my whole feed becomes almost exclusively army weapon tech).

I feel that the creators of recommendation engines should abandon their blind faith in general algorithms and figure out how to build feeds that are interesting and engaging across several of a user’s categories of interest, while also covering the different reasons I might be looking for something to watch (what everyone else is watching, so I have something to discuss with my friends; discovering something new; following up on topics I am already interested in; …).
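The idea does not require fancy modelling to get started. A minimal sketch, assuming each candidate video has already been tagged with the reason it would be recommended (the reason labels and the function name are hypothetical): instead of ranking everything on a single score, the feed takes turns between reasons, so follow-ups on what I already watch cannot crowd out discovery or trending items.

```python
from collections import defaultdict
from itertools import zip_longest

_GAP = object()  # filler for reasons that have run out of candidates


def diversified_feed(candidates, size):
    """Return up to `size` items, taking turns between recommendation reasons.

    candidates: (item, reason) pairs, with each reason's items already ranked
    best-first. Reasons are visited round-robin in order of first appearance.
    """
    by_reason = defaultdict(list)
    for item, reason in candidates:
        by_reason[reason].append(item)
    feed = []
    for row in zip_longest(*by_reason.values(), fillvalue=_GAP):
        for item in row:
            if len(feed) >= size:
                return feed
            if item is not _GAP:
                feed.append(item)
    return feed


candidates = [
    ("follow-up 1", "follow-up"),
    ("follow-up 2", "follow-up"),
    ("follow-up 3", "follow-up"),
    ("trending 1", "trending"),
    ("discovery 1", "discovery"),
]
print(diversified_feed(candidates, 4))
# → ['follow-up 1', 'trending 1', 'discovery 1', 'follow-up 2']
```

Round-robin is the crudest possible mixing rule; weighting the reasons, or learning those weights per user, would be the natural next step.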

Health Data interpretation

I used to like the Tactio Health app back in the day, before the introduction of Apple HealthKit.

However, after getting a newer iPhone and installing the app on it, I realized that although Tactio Health was reading tons of data from the Health app, it was only writing weight back to it. So all of my blood pressure measurements, blood test results and the like were locked inside the app, which had no intention of sharing them.

Scanning the App Store for apps that would cover that angle led me to a realization: there are tons of copycat apps in slightly different flavors, all covering four major directions: workout tracking/guidance, weight loss/gain, period tracking, and baby-related apps.

All in all, there are no lifestyle tracking apps that keep an eye on your habits and warn you when you are slipping into a lifestyle that would lead to dire health consequences. And there is even less collaboration between the apps that try to do it, Tactio Health being a case in point.

More interestingly, it looks like there is no market right now for that kind of app: either the users are already bent on keeping their health intact and don’t need any reminders, or they are so hopelessly behind that the “you are too bad” tone of the current apps is way too discouraging.

At the same time, I can understand users’ reluctance to put their health data out there in the wild, knowing that this data could potentially be used to deny them coverage in the future or drive their premiums up.

Food/activity tracking apps

I am back to trying to quantify my life, and I am running into the same problems I have always had with activity and food trackers. They are simply not made to encourage people to change and to maintain those changes. Just a couple of problems to start with:

  • The activity trackers suggest at least ~150 minutes of cardio per week. For a new user who is just switching from a sedentary lifestyle to an active one, this is deadly: the most they can manage is about 60 minutes of cardio per week for the first month and a half. Aiming for 150 minutes is a guaranteed recipe for giving up after the first week or so, either for lack of willpower or because they hurt themselves ramping up too fast. A better approach would be to monitor the user for ~2 weeks at the start of each uninterrupted stretch of activity, then suggest a ramp-up that gradually improves their habits in a way that sticks in the long run.
  • In my own experience, a lot of people end up in pretty bad shape not necessarily because they don’t know any better, but because they don’t have the time: work and other obligations constantly push self-care to the bottom of their priority list. Many activity/food tracking solutions require a lot of active input from the user and, because of that, tend to have low adherence rates, especially in the long term. A much better option would be long-term monitoring that requires almost no input.
  • Specifically for food trackers: there is no unified product database and no way to log a fraction of a product consumed. In some of them I found teas that supposedly contain cholesterol (WTF?), yet I couldn’t see what was actually in a product unless I checked its label myself.
  • And, as usual, current trackers are deplorable at measuring anything beyond calories. A lot of “healthy” foods are healthy not so much because they contain fewer calories, but because they are rich in micronutrients and vitamins that cover the body’s needs and prevent cravings in the long run.
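A minimal sketch of that ramp-up, assuming the tracker logs daily cardio minutes: the first ~2 weeks only establish a baseline, and the weekly target then grows by a small fixed step instead of jumping straight to 150 minutes. The function names and the 5-minute weekly step are hypothetical choices, not a clinical recommendation; with a 30-minute baseline they reach 60 minutes after six weeks, roughly the first month and a half mentioned above.

```python
def weekly_baseline(daily_minutes):
    """Average weekly cardio minutes over the monitoring window (e.g. 14 days)."""
    if not daily_minutes:
        raise ValueError("need at least one day of monitoring data")
    return round(sum(daily_minutes) * 7 / len(daily_minutes))


def ramp_up_plan(baseline, weeks, step=5, goal=150):
    """Weekly targets growing by `step` minutes per week, capped at `goal`."""
    return [min(goal, baseline + step * week) for week in range(1, weeks + 1)]


log = [0, 15, 0, 0, 15, 0, 0] * 2  # two weeks of daily cardio minutes
baseline = weekly_baseline(log)    # 30 minutes per week
print(ramp_up_plan(baseline, 6))   # → [35, 40, 45, 50, 55, 60]
```

A linear step is the simplest rule that never asks for a big jump; a real tracker could also slow the ramp down whenever the user misses targets.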

Bonus point: the Apple Health app unifies the different apps. That doesn’t seem like much, but it stitches all the apps together, making sure information flows across the health app ecosystem. I can log an activity once instead of 3-4 times as before, and still benefit from the best of all the apps without having to deal with the worst.

Sleep monitors and internet of things

I think sleep monitors should not require the user to actively switch them on every night. Instead, monitoring should run in the background, the way the GPS or pedometer in your phone tracks walking distance.

Hence I see a tool with the following two functions:

  • movement detection, to compute sleep quality
  • light detection, to figure out when you are sleeping or could potentially be sleeping
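Running in the background, those two signals can be combined with very little logic. A minimal sketch, assuming the phone stores one sample per minute of ambient light and accelerometer activity; the `Sample` type and both thresholds are hypothetical placeholders that would need calibration on real sensors.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One minute of background sensor data (hypothetical format)."""
    lux: float       # ambient light level
    movement: float  # accelerometer activity count


def sleep_summary(samples, dark_lux=5.0, still_movement=0.1):
    """Return (minutes in the dark, fraction of those minutes spent still).

    Light detection marks the minutes when you could be sleeping; movement
    detection then scores how restful those minutes were.
    """
    dark = [s for s in samples if s.lux < dark_lux]
    if not dark:
        return 0, 0.0
    still = sum(1 for s in dark if s.movement < still_movement)
    return len(dark), still / len(dark)


night = [Sample(200, 1.0)] * 2 + [Sample(1, 0.0)] * 3 + [Sample(1, 0.5)]
print(sleep_summary(night))  # → (4, 0.75)
```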

Usability of adhesion systems

Catch-22 on the website of a pretty large health insurer:
– you need to make the first payment to get your card;
– to make a payment, you need to log in;
– to log in, you first need to register;
– to register, you need your adhesion (membership) number;
– to get your adhesion number, you first need your card.

Best part? When I tried calling, I had to wait ~1 hour to get connected to the right person, with every branch of the phone tree telling me to go to the website to do everything I needed. And after waiting all that time, I was told I had to wait until the invoice was generated.

Moral:

  1. Ask for the user’s action only when your system is ready for it and the action is likely to succeed.
  2. Let your users create an account that is recognized from the get-go, even if it means there will be nothing to show on it yet.
  3. Have a collection point where reports of malfunctions in your “happy system” end up.
  4. Log every failure to use the interface properly, progressively build a database of corner cases, and adjust your system’s fall-backs to account for them.
  5. Always run usability tests to check that there is no catch-22 that will waste your tech support’s time.
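Point 5 can be partly automated before any user gets stuck. A minimal sketch, assuming the sign-up flow is written down as a mapping from each step to the steps it requires (the step names and the mapping are a hypothetical model of the insurer’s flow above): a catch-22 is then simply a cycle, which a depth-first search finds.

```python
def find_catch22(requires):
    """Return one cycle of steps in a step -> prerequisites mapping, or None.

    Depth-first search: reaching a step that is still on the current path
    means the path loops back on itself, so no user can ever complete it.
    """
    on_path, done, path = set(), set(), []

    def visit(step):
        on_path.add(step)
        path.append(step)
        for pre in requires.get(step, ()):
            if pre in on_path:
                return path[path.index(pre):] + [pre]
            if pre not in done:
                cycle = visit(pre)
                if cycle:
                    return cycle
        on_path.discard(step)
        path.pop()
        done.add(step)
        return None

    for step in requires:
        if step not in done:
            cycle = visit(step)
            if cycle:
                return cycle
    return None


signup = {
    "make first payment": ["log in"],
    "log in": ["register"],
    "register": ["adhesion number"],
    "adhesion number": ["card"],
    "card": ["make first payment"],
}
print(find_catch22(signup))
# → ['make first payment', 'log in', 'register', 'adhesion number', 'card', 'make first payment']
```

Sending the card without requiring a payment first (`"card": []`) breaks the cycle, and `find_catch22` then returns `None`.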

Bonus points for the website: I am holding a paper invoice in my hands, but the website shows no generated invoice I could pay. Final bonus point: COMIC SANS. On the main USER-facing GUI page. Overriding the other “sane” typefaces.

Dependency of a dependency of a dependency

Or why cool projects often fail to get traction

Today I tried to install a project I have been working on for a while on a new machine. It relies heavily on storing and querying data in a “social network” manner, and hence is not very well suited to relational databases. When I started working on it back in early 2013, I was still a fairly inexperienced programmer, so I decided to build it on a new technology: the neo4j graph database. And since I was coding in Python, was fairly familiar with the excellent SQLAlchemy ORM and was looking for something similar to work with graph databases, my choice fell on the Bulbflow framework by James Thornton. I complemented it with JPype, a native Python-to-Java bridge, for quick insertion of the data. After the first couple of months of developer’s bliss, with everything working as expected and being built as fast as humanly possible, I realized that things were not going to be as fun as I had initially expected.

  • The JPype library I was using to rapidly insert data into neo4j from Python was not compatible with Python 3. On top of that, JPype was quickly falling out of support and was generally too hard to set up, so I had to drop it.
  • The Bulbflow framework in reality relied on the Gremlin/Groovy TinkerPop stack implementation in the neo4j database, worked over a REST interface and had no support for batching. Despite several promises from its creator and maintainer to implement batching, it never came to life, and I found myself attempting a re-implementation along those principles. Unfortunately, back then I had neither enough programming experience to develop a library nor enough time to do it. Instead I had to settle for a slow insertion cycle (more than compensated for by the time gained on retrieval).
  • A year later, neo4j released version 2.0 and dropped the Gremlin/Groovy stack my code relied on. They did, however, have the generosity to keep the historical 1.9 maintenance branch going, so, given that I had already poured something like three months of full-time work into configuring and debugging my code for that architecture, I decided to stick with 1.9 and maintain it.
  • Yesterday (two and a half years after the start of development, during which I had poured the equivalent of six more months of full-time work into the project), I realized that the only 1.9 version of neo4j still available for download to mere mortals who don’t know how to use Maven to build the project from the GitHub repository crashes with a “Perm Gen: java heap out of memory” exception. Realistically, given that I am one of the few people still using the 1.9.9 community edition branch, and one of even fewer likely to run into this problem, I don’t expect the developers to dig through all the details to find where the error occurs and fix it. So at this point, my best bet is to put neo4j 1.9.6 onto GitHub and link to it from my project, hoping that the neo4j developers will be understanding enough not to pull it down.
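The batching I was missing is simple in principle: group many insertions into one request, so that the overhead of a REST round-trip is paid once per batch rather than once per record. A minimal, library-agnostic sketch; `send` is a hypothetical stand-in for whatever actually talks to the database, not part of Bulbflow’s API, and the batch size of 500 is an arbitrary assumption.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def insert_all(records, send, batch_size=500):
    """Insert records with one round-trip per batch instead of one per record.

    `send` is a hypothetical hook performing a single request (for example one
    REST call carrying many insert statements). Returns the number of trips.
    """
    trips = 0
    for chunk in chunks(records, batch_size):
        send(chunk)  # one round-trip carrying the whole batch
        trips += 1
    return trips


sent = []
print(insert_all(range(1200), sent.append))  # → 3 round-trips instead of 1200
print([len(batch) for batch in sent])        # → [500, 500, 200]
```

Python 3.12 also ships `itertools.batched`, which does the same chunking but yields tuples.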

All in all, the experience isn’t that terrible, but one thing is for sure: the next time I build a project that I expect to still be maintaining a year later and installing on several machines, I will think twice before using a relatively “new” technology, even if it is promising and offers a 10x performance gain. Simply because I can’t know how it will break and change over the next five years, or how much effort it will take me to maintain my project’s dependencies.

Usability of fitness trackers: lots of improvement in sight

Fitness trackers and other wearable tech are gaining more and more momentum, but because of the ostrich effect (the cognitive bias of avoiding information we expect to be unpleasant) they are not reaching the populations that would benefit most from them. And, as usual, ReadWriteWeb is pretty good at pointing this out in simple language.

To sum up, current fitness tracking has several shortcomings for the population it should be targeting:

  • It is pretty expensive. A fitness band that only tracks steps can cost somewhere between $50 and $150. If you want something more comprehensive, such as one of Garmin’s multisport watches, you are looking at $300-$500. Hardly an impulse purchase for someone earning under $30k a year with kids to feed. Yet they are the group at highest risk of obesity and cardiovascular disease.
  • They generate a LOT of data that is hard to interpret unless you have some background as a trained athlete. Knowing your Vmax and your heart-rate descent profile after an effort is pretty cool and useful for monitoring your health and fitness, but you will never know how to interpret them unless someone explains it to you or you already know it from a previous athletic career.
  • They provide no pull-in. As anyone with a bank account knows, savings come from repeated effort over time, and the same holds for health capital. However, as anyone with a bank account also knows, when you hit hard financial times you check your account much less often than when everything is going well, simply because it is rewarding in the latter case and painful in the former. Same thing with health: people who are in poor health but ready to work on it are self-conscious about it, and need additional motivation to carry them through the periods when no progress is happening.
  • They do not respond to an immediate worry: they are one of those “good to have” products whose absence does not create an “I need it RIGHT NOW” feeling.

 

With that in mind, I decided to participate in MedHacks 1.0 last weekend. My goal was to develop something that would provide an emergency warning for users who are either at high risk of stroke or having one, so that they would not end up isolated while having a stroke. My team and I managed to hack together a proof-of-concept prototype in about 24 hours, which took us to the finals. To do this, we used an audio mixing board to amplify the EKG signal, Audacity to acquire the data on a computer, FFT and pattern matching to extract the signal and filter out loss-of-contact issues, and we built an Android app that could send out a message or call for help if the pattern changed.

Now, those are very simple components that could be compressed into a single sensor woven into a T-shirt, with the data beamed to a phone for analysis in the background. We would need some machine learning to detect the most common anomalies, followed by validation of the acquired EKGs by human experts.

However, the combination of a cheap, persistently monitoring device and an app able to exploit its data opens up large possibilities for fitness tracking among those who need it most.

  • The reason to buy and use the monitoring device is no longer fitness. It is basic safety. And it can be offered as a gift by someone who is worried about your health.
  • The basic functionality is really clear. If something is going wrong with you, we warn you. If something is going really wrong, we warn someone who can check on you or come to your rescue.
  • We can build upon the basic functionality, introducing our users to the dynamics of fitness the way games introduce competitive challenges: gradually, leaving you time to learn at your own pace.
  • We have very precise access to the amount of effort. Your heart rate will show whether you are doing a strenuous directed activity, and we can guide you through it.
  • We were able to build a prototype with very common materials. Compression and mass production should allow us to hit the lowest market range, at a price where a smart piece of athletic clothing costs only marginally more than the same “non-smart” one.
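The signal-processing side of the MedHacks prototype can be sketched in a few lines. A minimal sketch, assuming the EKG arrives as a list of samples at a known sampling rate `fs`: a flat window is treated as lost contact, the heart rate is read off the strongest peak of the spectrum, and an alert is raised when it drifts too far from the user’s baseline. The function names, thresholds and this simple rate-change rule (a stand-in for the pattern matching we actually used) are all assumptions, and a naive DFT replaces the FFT to keep the code dependency-free. On a real EKG the strongest peak can be a harmonic rather than the heart rate itself, so treat this as a toy.

```python
import cmath
import math


def dominant_bpm(signal, fs, lo_hz=0.7, hi_hz=3.5):
    """Heart rate in beats/min from the strongest spectral peak in [lo_hz, hi_hz].

    A naive DFT restricted to the heart-rate band stands in for the FFT;
    numpy.fft.rfft would compute the whole spectrum much faster.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    k_lo = max(1, math.ceil(lo_hz * n / fs))
    k_hi = math.floor(hi_hz * n / fs)
    if k_lo > k_hi:
        raise ValueError("window too short to resolve the heart-rate band")
    # Power of each frequency bin k, which sits at k * fs / n Hz.
    powers = {
        k: abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                   for t, x in enumerate(centered))) ** 2
        for k in range(k_lo, k_hi + 1)
    }
    best_k = max(powers, key=powers.get)
    return best_k * fs / n * 60


def check_window(signal, fs, baseline_bpm, tolerance=0.25, min_std=0.05):
    """Classify one window as 'no contact', 'alert' or 'ok' (thresholds assumed)."""
    mean = sum(signal) / len(signal)
    std = math.sqrt(sum((x - mean) ** 2 for x in signal) / len(signal))
    if std < min_std:
        return "no contact"  # flat line: electrode slipped off, not an emergency
    bpm = dominant_bpm(signal, fs)
    if abs(bpm - baseline_bpm) > tolerance * baseline_bpm:
        return "alert"  # rate far from the user's baseline: notify someone
    return "ok"


fs = 50  # samples per second (assumed)
resting = [math.sin(2 * math.pi * 1.2 * t / fs) for t in range(10 * fs)]  # synthetic 72 bpm
print(check_window(resting, fs, baseline_bpm=72))  # → ok
```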

Sound interesting? I am looking for someone with clinical experience in heart disease, a hardware hacker with experience in wearables, and someone to drive consumer prospecting and sales.

Competition does not always bring quality: case study of shopping apps

The problem is simple. I would like an app to help me manage my shopping list.

Until now I have been using an AwesomeNote notebook full of “todo” boxes, with a separate note for each shopping session. This was kinda working OK, but could be better.

First, I realized I had plenty of unchecked boxes left over from previous shopping sessions that I might still want to be aware of when shopping. What would be really cool is a single, overall checklist where an item disappears once I check it off, until I add it again next time. Or, even better, until it pops back up by itself: it shouldn’t be hard to predict what I buy weekly or even monthly and add it automatically.
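The predictive part does not need much sophistication to be useful. A minimal sketch, assuming the app keeps a history of purchase dates per item (the data layout, the function name and the three-purchase cut-off are hypothetical): an item comes back onto the list once its average re-purchase interval has elapsed.

```python
from datetime import date, timedelta


def due_items(history, today, min_purchases=3):
    """Items whose usual re-purchase interval has elapsed by `today`.

    history maps each item to its purchase dates, oldest first. The interval
    is the mean gap between past purchases; items bought fewer than
    `min_purchases` times are not predicted at all.
    """
    due = []
    for item, dates in history.items():
        if len(dates) < min_purchases:
            continue
        gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
        interval = timedelta(days=sum(gaps) / len(gaps))
        if today >= dates[-1] + interval:
            due.append(item)
    return due


history = {
    "milk": [date(2015, 3, 2), date(2015, 3, 9), date(2015, 3, 16)],
    "olive oil": [date(2015, 1, 5), date(2015, 2, 5), date(2015, 3, 5)],
    "birthday candles": [date(2015, 3, 10)],
}
print(due_items(history, date(2015, 3, 23)))  # → ['milk']
```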

Second, I realized that my shopping list was context-dependent. I might do most of my grocery shopping in one place, but sometimes I need something specific from a different shop that I don’t visit that often. By the time I get there, the note I made is buried deep underneath my other shopping lists. Some location awareness would be pretty cool.
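Location awareness could be as simple as attaching a shop’s coordinates to each list and surfacing the lists of shops close to where I am. A minimal sketch using the haversine great-circle distance, assuming coordinates are known for each shop; the data layout, the 200 m radius and the coordinates in the example are all hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))  # 6371 km: mean Earth radius


def lists_nearby(shop_lists, lat, lon, radius_km=0.2):
    """Items from every shop list whose shop lies within `radius_km` of (lat, lon)."""
    items = []
    for shop_lat, shop_lon, shop_items in shop_lists.values():
        if haversine_km(lat, lon, shop_lat, shop_lon) <= radius_km:
            items.extend(shop_items)
    return items


shop_lists = {
    "usual grocery store": (40.7410, -73.9897, ["milk", "eggs"]),
    "hardware store": (40.7510, -73.9897, ["wood screws"]),
}
print(lists_nearby(shop_lists, 40.7511, -73.9896))  # → ['wood screws']
```

On a phone, the operating system’s geofencing would be the natural way to trigger this check without polling the GPS.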

Finally, I kinda don’t like typing too much, especially the same thing over and over. A nice autocomplete, or even an intelligent UI that cuts the time I spend in the app, would be pretty cool.

Having a pretty good picture of what I wanted (a location-aware shopping list app with a quick UI and predictive analytics), I set out to find one. There are literally thousands of them all over the App Store; there should be at least one that fits my needs, no?

Nope. Despite all the power of Google and AppCrawler, I am still looking for the one I want.

TO BE CONTINUED…

Mathematica: encapsulation impossible

Among the most frustrating languages I’ve encountered so far, Mathematica definitely ranks pretty high. R, the master troll of statistical languages, pales in comparison. As I write this post, I have just spent two hours trying to wrap a function that I managed to get working in the main namespace into a Module that I could call with given parameters. Not that I am a beginner programmer, or unfamiliar with LISP, symbolic languages or metaprogramming. Quite the opposite. Despite its awesome potential and regular media attention, Mathematica is an incredibly hard language to program in properly, whatever your background.

Impossible functionalization.

So I have just spent two hours trying to rewrite three lines of code I was already using in a stand-alone notebook. In theory (according to Mathematica’s documentation), it should be pretty simple: define a “Module[{variables}, operations]”, then replace operations with the commands from my notebook that I want to encapsulate, and variables with the variables I want to be able to change in order to modify the behavior of my code.

The problem is that it never worked. And no matter how deep I dug into the documentation of Module[.., ..] and of the individual commands, I could not figure out why.

You have an error somewhere, but I won’t tell where

This is one of the main sources of frustration and failure when debugging. Mathematica returns errors WITHOUT A STACK, which means that all you get is the name of the error and a link to the official documentation, which explains in very general terms (20 lines or less) where the error might come from.

The problem is that since your error most likely won’t occur until the execution stack hits the internals of other functions, by the time your error is raised and returned to you, you have no freaking idea of:

a) Where the error was raised
b) What arguments raised it
c) What you need to do to get the desired behavior

And since the API/implementation of individual functions is nowhere to be found, your best bet is to start randomly changing your code until it works. Or to google different combinations of your code and/or errors, hoping that someone has already run into an error similar to yours under similar conditions and found out how to fix it.

Which actually blows the ratio of questions asked about the Wolfram Language, compared to the output it produces, way out of proportion:

Yup. The only programming language to have its own, separate and very active Stack Exchange site, and yet REALLY, REALLY inferior to MATLAB and R, its closest domain-specific cousins. In fact, in terms of output, it is buried among languages you have probably never heard of.
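For contrast, this is what a stack buys you in Python: every exception carries a traceback listing the frames it passed through, so (a) where the error was raised and (b) with which arguments can be read straight off it. A small illustration with made-up functions (`share`, `shares` and `where_it_failed` are hypothetical names, not a library API):

```python
def share(part, total):
    return part / total


def shares(counts):
    total = sum(counts.values())
    result = {}
    for key, value in counts.items():
        result[key] = share(value, total)
    return result


def where_it_failed(exc):
    """Function name, line number and local variables of the innermost frame."""
    tb = exc.__traceback__
    while tb.tb_next is not None:  # walk down to where the error was raised
        tb = tb.tb_next
    frame = tb.tb_frame
    return frame.f_code.co_name, tb.tb_lineno, dict(frame.f_locals)


try:
    shares({"a": 0, "b": 0})
except ZeroDivisionError as exc:
    name, line, local_vars = where_it_failed(exc)
    print(name, local_vars)  # → share {'part': 0, 'total': 0}
```

Even without any helper, Python’s default error printout already shows the file, function and line of every frame between the call and the failure.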

You might have an error, but I won’t tell you

In addition to returning stackless errors, Mathematica is a fail-late language: it silently converts and transforms data to force it through a function until something finally fails. Each of these two error-management techniques is already pretty nasty on its own, and both have been weeded out of most commonly used languages, so their combination is pretty disastrous.

However, Mathematica does not stop there in making error detection a challenge. It has several underlying basic operation models, such as rewriting, substitution and evaluation, which look like the same concept but do very different things to exactly the same data. And they are mixed arbitrarily and NEVER EXPLICITLY MENTIONED IN THE DOCUMENTATION.

These multiple basic operations are what make the language powerful and well suited to abstraction and mathematical computation. But since they are mixed arbitrarily without proper documentation, the number of errors they generate and the amount of debugging they require are pretty insane, and largely offset the comfort they provide.
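To make the fail-late/fail-fast distinction concrete, here is a small Python illustration with made-up functions (`mean_fail_late` and `mean_fail_fast` are hypothetical): the fail-late version pushes whatever it gets through a conversion and returns a plausible-looking wrong number, while the fail-fast version stops at the entry point and names the offending argument.

```python
def mean_fail_late(values):
    # Coerces whatever it gets: "1" becomes 1.0 and True becomes 1.0, so a
    # wrong input yields a plausible-looking wrong number instead of an error.
    numbers = [float(v) for v in values]
    return sum(numbers) / len(numbers)


def mean_fail_fast(values):
    """Reject bad input at the entry point, naming the offending element."""
    if not values:
        raise ValueError("mean_fail_fast() needs at least one value")
    for i, v in enumerate(values):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise TypeError(f"values[{i}] is {v!r} ({type(v).__name__}), expected a number")
    return sum(values) / len(values)


print(mean_fail_late(["1", "2", True]))  # → 1.3333333333333333, silently
try:
    mean_fail_fast(["1", "2", True])
except TypeError as err:
    print(err)  # → values[0] is '1' (str), expected a number
```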

No undo or version control

Almost as frustrating as Mathematica’s errors is the execution model of the Wolfram Language. Mathematica notebooks (and hence the code you are writing) are first-class objects: objects the language itself reasons about, and which may get modified extensively upon execution. Which is an awesome idea.

What is much less awesome is the implementation of that idea. In particular, because the notebook can be modified extensively upon execution, reconstructing what the code looked like before the previous operation may be impossible. So Mathematica discards the whole notion of code tracking.

Yes, you read it right.

Any edit to the code is permanent. There is also absolutely no integration with version control, which turns an occasional fat-fingered delete-and-evaluate into a critical error that can make you lose hours of work. Unless you have 400 files to which you’ve “saved as” the notebook every five minutes.

You just don’t get it

All in all, this leaves a pretty consistent impression that the language designers had absolutely no consideration for the user, valuing the user’s work (code) much less than their own, as shown by the complete absence of safeguards of any kind, of proper error tracking and of proper code modification tracking. All of which made their work of creating and maintaining the language much easier, at the expense of making the user’s work much, much harder.

A normal language would get past such an initial rough period and be smoothed out by a base of contributors and a flow of feedback from users. However, Mathematica is a closed-source language developed by a select few, who snub users’ input and, instead of improving the language based on it, persist in explaining to those trying to give them feedback how users “just don’t get it”.

For sure, Mathematica has a lot of great power to it. Unfortunately, that power remains, and will remain, inaccessible to the vast majority of commoners because of an impossible syntax, naming conventions and a debugging experience straight from an era when just pointing to the line of code where an error occurred was waaay beyond the horizon of the possible.