Arguing with Turing is a losing game (or why predicting some things is hard)

FiveThirtyEight’s 2016 election forecast and Nassim Taleb

Back in August 2016, Nassim Nicholas Taleb and Nate Silver had a massive and very public run-in on Twitter. Taleb – the author of “Black Swan” and “Antifragile” – and Nate Silver – the founder of FiveThirtyEight – were arguably the two most famous statisticians alive at the time, so their spat ended up attracting quite a lot of attention.

The bottom line of the spat was that for Taleb, the FiveThirtyEight 2016 US presidential election model was overly confident. As events in the race between Clinton and Trump swung voting intentions one way or another (mostly in response to quips from Trump), Nate’s model’s predictions and confidence intervals swung too widely to be considered a proper forecast.

Taleb did have a point. Data-based forecasts, be they based on polls alone or on more considerations, are supposed to be successive estimations of a well-defined variable vector – aka the voting split per state, translating into electoral college votes for each candidate on election day. Any subsequent estimation of that variable vector with more data (such as fresher polls) is supposed to be a refinement of the prior estimation. We expect the uncertainty range of a subsequent estimation to be smaller than that of the prior estimation, and to be included in the prior estimation’s uncertainty range – at least most of the time, modulo expected errors.

In the vein of the central thesis of “Black Swan”, Taleb’s shtick with the 2016 election models – and especially FiveThirtyEight’s – was that they were prone to overconfidence in their predictions and underestimated the probability of rare events that would throw them off. As such, according to him, the only correct estimation technique would be a very conservative one – giving each candidate a 50/50 chance of winning right until the night of the election, when it would jump to 100%.

Nassim Taleb’s “Rigorous” “forecasting”

The thing is that Nassim Nicholas Taleb is technically correct here. And Nate Silver was actually very much aware of that himself before Taleb showed up.

You see, the whole point of Nate Silver creating the FiveThirtyEight website and going for rigorous statistics to predict game and election outcomes was precisely that mainstream analysts had a hard time with probabilities and would go either for a 50/50 (too close to call) or a 100% sure, never in between (link). The reality is that there is a whole world in between and that such rounding errors will cost you dearly. As anyone dealing with inherently probabilistic events painfully learns, on an 80% win bet you still lose 20% of the time – and have to pay dearly for it.

Back in 2016, Nate’s approach was to make forecasts based on the current state of the polls and an estimation of how much that could be expected to change if the candidates were doing about the same things as candidates in past elections. Assuming the candidates didn’t do anything out of the ordinary and the environment didn’t change drastically, his prediction would do well and behave like an actual forecast.

Nate Silver | Talks at Google | The Signal and the Noise
Nate explains what a forecast is to him

The issue is that you can never step into the same river twice. Candidates will try new things – especially if models show them losing when they don’t do anything out of the ordinary.

Hence Taleb is technically correct. Nate Silver’s model predictions will always be overly confident and overly sensitive to changes resulting from the candidates’ actions or from previously unseen changes in the environment.

Decidability, Turing, Solomonoff and Gödel

However, Taleb’s technical correctness is completely beside the point. Unless you are God himself (or Solomonoff’s demon – more about that in a second), you can’t make forecasts that are accurately calibrated.

In fact, any forecast you can make is either trivial or epistemologically wrong – provably.

Let’s for a second imagine ourselves physicists and take a “spherical cow in a vacuum”, extremely simplified model of the world. All events are 100% deterministic, everyone obeys well-defined, known rules based on which they make their decisions, and the starting conditions are known. Basically, if we had a sufficiently powerful computer, we could run a perfect simulation of the universe and come up with the exact prediction of the state of the world, and hence of the election result.

A forecast in this situation seems trivial, no? Well, actually no. As Alan Turing – the father of computing – proved in 1936, unless you run the simulation, you cannot in general predict even whether a process (such as a voter making up their mind) will be over by election day. Henry Gordon Rice was even more radical and in 1951 proved a theorem that can be summarized as “all non-trivial properties of simulation runs cannot be predicted”.

In computer science, forecasting whether a process will be over is called the halting problem and is the prime example of a problem that is undecidable (aka there is no method to predict the outcome in advance). Those of you with an interest in mathematics might have noted a relationship to Gödel’s incompleteness theorem – which states that undecidable statements will exist for any set of postulates complex enough to embed basic arithmetic.
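For the programmers in the audience, the core of Turing’s argument fits in a few lines. The sketch below is a paraphrase of the standard diagonalization proof in Python, not any real forecasting code: if a general-purpose halts() oracle existed, we could write a program that contradicts its own prediction.

```python
# Sketch of Turing's diagonalization argument (not the 1936 proof itself, just its
# skeleton in Python). Suppose someone hands us a perfect oracle:
def halts(func, arg):
    """Hypothetical oracle: returns True if func(arg) eventually stops."""
    raise NotImplementedError("no such general-purpose oracle can exist")

def contrarian(func):
    # Do the opposite of whatever the oracle predicts about func run on itself.
    if halts(func, func):
        while True:      # predicted to halt -> loop forever
            pass
    return "halted"      # predicted to loop -> halt immediately

# Feeding contrarian to itself forces halts() to be wrong either way:
# if halts(contrarian, contrarian) is True, contrarian(contrarian) loops forever;
# if it is False, contrarian(contrarian) halts. Hence no such halts() can exist.
```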

Things only go downhill once we add some probabilities into the mix.

For those of you who have done some physics, you have probably recognized the extremely simplified model of the world above as Laplace’s demon and have been screaming at the screen about quantum mechanics and how that’s impossible. You are absolutely correct.

That was one of the reasons that pushed Ray Solomonoff (also known for Kolmogorov-Solomonoff-Chaitin complexity) to create something he called algorithmic probability, and a probabilistic counterpart to Laplace’s demon – Solomonoff’s demon.

In the algorithmic probability framework, any possible theory about the world has some non-null probability associated with it and is able to predict something about the state of the world. To perform an accurate prediction of the probability of an event, you need to calculate the probability of that event according to all theories, weighted by the probability of each theory:

P(E) = \sum_{H} P(E \mid H) \, P(H)

Single event prediction according to Solomonoff (conditional probabilities): E is an event, H is a theory, and the sum runs over all possible theories.

Fortunately for us, Solomonoff also provided a (Bayesian) way of learning the probabilities of events according to theories, and the probabilities of the theories themselves, based on prior events:

P(H \mid D) = \frac{P(D \mid H) \, P(H)}{P(D \mid H) \, P(H) + \sum_{T \neq H} P(D \mid T) \, P(T)}

Learning from a single new data point according to Solomonoff: D is the new data, H is a theory, and the sum runs over all other possible theories T.

Solomonoff was nice enough to provide us with two critical results about his learning scheme. First, that it is admissible – aka optimal: there is no other learning scheme that can do better. Second, that it is uncomputable. Given that a single learning or prediction step requires an iteration over the entire infinite set of all possible theories, only a being from a thought experiment – Solomonoff’s demon – is actually able to do it.
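To make the contrast concrete, here is a toy, finite truncation of that idea in Python – a handful of invented hypotheses about a coin instead of the infinite set of all computable theories that Solomonoff’s demon would need. The formulas are the two above; everything else (the hypothesis set, the priors, the data stream) is made up purely for illustration.

```python
# Toy, finite truncation of Solomonoff-style prediction: a handful of hypotheses
# about a coin's bias instead of the infinite set of all computable theories.
biases = [0.1, 0.3, 0.5, 0.7, 0.9]            # hypotheses H: P(heads) under each
prior = {b: 1 / len(biases) for b in biases}  # P(H), uniform for the toy example

def update(beliefs, observed_heads):
    """Bayesian update P(H | D) after observing a single coin flip D."""
    likelihood = {b: (b if observed_heads else 1 - b) for b in beliefs}
    evidence = sum(likelihood[b] * beliefs[b] for b in beliefs)  # sum over all H
    return {b: likelihood[b] * beliefs[b] / evidence for b in beliefs}

def predict_heads(beliefs):
    """P(E | data) = sum over H of P(E | H) * P(H | data)."""
    return sum(b * beliefs[b] for b in beliefs)

posterior = prior
for flip in [True, True, False, True]:         # invented data stream
    posterior = update(posterior, flip)
print(f"probability of heads on the next flip: {predict_heads(posterior):.3f}")
```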

This means that any computable prediction methodology is necessarily incomplete. In other terms, it is guaranteed not to take into account enough theories for its predictions to be accurately calibrated. It’s not a bug or an error – it’s a law.

So when Nassim Taleb says that Nate Silver’s model overestimates its confidence, he is technically correct, but also stating something trivial and tautological. Worse than that: Solomonoff also proved that in the general case we can’t evaluate by how much our predictions are off. We can’t possibly quantify what we are leaving on the table by using only a finite subset of theories about the universe.

The fact that we cannot evaluate in advance by how much our predictions are off basically means that Taleb’s own recommendations about investments are essentially worthless. Yes, his strategy will do better when there are huge bad events that the market consensus did not expect. It will, however, do much worse if they don’t happen.

Basically, for any forecast, the best we can do is something along the lines of Nate Silver’s now-cast plus some knowledge gleaned from the most frequent events that occurred in the past. And that “frequent” part is important. What made the FiveThirtyEight model particularly bullish about Trump in 2016 (and in retrospect most correct) was its assumption that polling errors are correlated across states. Without it, it would have given Trump a 0.2% chance to win instead of the 30% it ended up giving him. Modelers were able to infer and calibrate this correlation because it showed up in every prior election cycle.
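A toy Monte Carlo makes the correlation point concrete. To be clear, this is an invented illustration and not the FiveThirtyEight model: ten hypothetical swing states polled two points against a candidate, with the same per-state error spread either kept fully independent or split into a shared national miss plus state-level noise.

```python
# Toy illustration of why correlated polling errors matter (all numbers invented;
# this is not the FiveThirtyEight model). Ten identical swing states where the
# trailing candidate needs to carry at least 6, with polls showing them 2 points behind.
import random

STATE_SD = 3.0      # state-specific polling error, in points
NATIONAL_SD = 3.0   # shared national polling error, in points

def upset_probability(correlated, trials=50_000, margin=-2.0, needed=6, states=10):
    wins = 0
    for _ in range(trials):
        national_miss = random.gauss(0, NATIONAL_SD)
        carried = 0
        for _ in range(states):
            if correlated:
                error = national_miss + random.gauss(0, STATE_SD)
            else:
                # same per-state spread, but no shared component
                error = random.gauss(0, (NATIONAL_SD ** 2 + STATE_SD ** 2) ** 0.5)
            if margin + error > 0:
                carried += 1
        if carried >= needed:
            wins += 1
    return wins / trials

print("independent errors:", upset_probability(correlated=False))
print("correlated errors: ", upset_probability(correlated=True))
```

With independent errors the state misses mostly cancel out and the upset is rare; with a shared national miss, one bad polling cycle flips many states at once and the upset probability jumps by an order of magnitude.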

What they couldn’t have properly calibrated (or included, or thought of for that matter) were one-off events that would radically change everything. Think about it. Among the things that were possible, although exceedingly improbable, in 2016 were:

  • Antifa militia killing Donald Trump
  • Alt-right militia killing Hillary
  • Alt-right terrorist group inspired by Trump’s speeches blowing up the One World Trade Center, causing massive outrage against him
  • Trump making an off-hand remark about grabbing someone’s something that would outrage the majority of his potential electorate
  • Rain in a swing state, meaning 1,000 fewer Democratic voters decide at the last second not to vote because Hillary’s victory seems assured
  • The FBI director publicly speaking about Hillary’s email scandal once again, painting her as a corrupt politician and causing massive discouragement among potential Democratic voters
  • Aliens showing up and abducting everyone
  • ….

Some of them were predictable, since they had happened before (rain), and were built into the model’s uncertainty. Other factors were impossible to predict. In fact, if in November 2015 you had claimed that Donald Trump would become the next president due to a last-minute FBI intervention, you would have been referred to the nearest mental health specialist for urgent help.

2020 Election

So why write about that spat now, in 2020 – years after the fact?

First, it’s a presidential election year in the US again, but this time around with even more potential for unexpected turns. Just as in 2016, Nate’s model is drawing criticism, except this time for underestimating Trump’s chances to win instead of overestimating them. While Nate and the FiveThirtyEight team are trying to account for the new events and be exceedingly clear about what their predictions are, they are limited in what they can reliably quantify. Their model – provably – cannot predict all the extraordinary circumstances that can still happen in the upcoming weeks.

There is still time for the Trump campaign to discourage enough of the female vote by painting Biden as a rapist – especially if the FBI reports on an ongoing investigation in the week before the election. There is still time for a miracle vaccine to be pushed to the market. This is not an ordinary election. Many things can happen and none of them is exactly predictable. There is still room for new modes of voter suppression to pop up, and for Amy Barrett to be nominated to make sure the Supreme Court will judge the suppression legal.

And no prediction can account for them all. Nate and the FiveThirtyEight team are the best applied statisticians in politics since Robert Abelson, but they can’t decide the undecidable.

2020 COVID 19 models

Second – and perhaps most importantly – 2020 was the year of COVID-19 and of the forecasts associated with it. Back in March and April, you had to be either extremely lazy or very epistemologically humble not to try to build your own model of COVID-19 spread, or not to criticize others’ models.

Some takes and models were such absolute hot garbage that the fact they could come from people I had respected and admired in the past plunged me into a genuine existential dread.

But putting those aside, even excellent models and knowledgeable experts were constantly off – and by quite a lot. Let’s take as an example expert predictions from March 16th about how many total cases the CDC would have reported by March 29th in the US (from FiveThirtyEight, for a change). Remember – March 16th was when only 3,500 cases had been reported in the US, but Italy was already out of ICU beds and not resuscitating anyone above 65, France was going into confinement and China was still mid-lockdown.

Consensus: ~10,000–20,000 total cases, with the 80% upper bound at 81,000.
Reality: 139,000 cases

Yeah. The CDC reported about 19,000 cases on March 29th alone. The total was almost 10 times higher than the consensus range and almost double the 80% upper bound. But the reason they were off wasn’t because experts are wrong and their models are bad. Just like the election models, they can’t account for factors that are impossible to predict. In this specific case, what made all the difference was that the CDC’s own COVID-19 tests were defective, and the FDA didn’t start issuing emergency use authorizations for other manufacturers’ tests until around March 16th. The failure of the best-financed, most reliable and most up-to-date public health agency to produce reliable tests for a world-threatening pandemic for two months would have been an insane hypothesis to make, given the CDC’s prior track record.

More generally, all attempts to predict the evolution of the COVID-19 pandemic in developed countries ran into the same problem. While we were able to learn rapidly how the virus was spreading and what needed to be done to limit its spread, the actions that would be taken by the people and governments of developed countries were a total wild card. Unlike the election forecasts, we didn’t have any data from prior pandemics in developed countries to build and calibrate models on.

Nowhere is this more visible than in the state-of-the-art models. Take Youyang Gu’s covid19-projections forecasting website, which uses a methodology similar to the one Nate Silver uses to predict election outcomes (parameter fitting based on prior experience plus some mechanistic insight into the underlying process). It is arguably the best-performing one. And yet it failed spectacularly, even just a month in advance.

Youyang Gu’s forecast for deaths/day in early July
Youyang Gu’s forecast for deaths/day in early August. His July prediction was off by about 50-100%. I also wouldn’t be as optimistic about November.

And once again, it’s not Youyang Gu’s fault. A large part of that failure is imputable to Florida and other Southern states doing everything to under-report the number of cases and deaths attributed to COVID-19.

The model is doing as well as it could, given what we know about the disease and how people have behaved in the past. It’s just that no model can solve the halting problem of “when will the government decide to do something about the pandemic, what will it be, and what will people decide to do about it”. Basically, the “interventions” part of the projections from Richard Neher’s lab’s covid19-scenarios is, as of now, undecidable.

Once you know the interventions, predicting the rest reliably is doable.

Another high-profile example of expert model failure is the Global Health Security Index ranking of the countries best prepared for a pandemic, from October 2019. The US and the UK received respectively the best and second-best scores for pandemic preparedness. Yet, if we look today (end of October 2020) at COVID-19 deaths per capita among developed countries, they are a close second- and third-worst performers. They are not an exception. France ranked better than Germany, yet it has 5 times the number of deaths. China, Vietnam and New Zealand – all thought to be around the 50/195 mark – are now the best performers, with a whopping 200, 1,700 and 140 times fewer deaths per capita than the US – the supposed best performer. All because of differences in leadership approach and in how decisive and relentless the government response to the pandemic was.

The GHS index is not useless – it was just based on the previously implicit expectation that governments of countries affected by outbreaks would aggressively intervene and do everything in their capacity to stop the pandemic. After all, the USA in the past went as far as to collaborate with the USSR, at the height of the Cold War, to develop and distribute the polio vaccine. That expectation was also in line with what was observed during the 2009–2010 swine flu pandemic and during the HIV pandemic (at least once it became clear it was hitting everyone, not just minorities), and with what the WHO was seeing in developing countries hit by pandemics that did have a functioning government.

Instead of a conclusion

Forecasting is hard – especially when it is quantitative. In most cases, predicting deterministically what will happen is impossible. Probabilistic predictions can only base themselves on what happened frequently enough in the past to allow them to be properly calibrated.

They cannot – provably – account for all possible events, even if we had a perfectly deterministic and known model of the world. We can’t even estimate what the models are leaving on the table when they perform their predictions.

So the next time you criticize an expert or a modeler and point out a shortcoming in their model, think for a second. Are you criticizing them for not having solved the halting problem? For not having bypassed Rice’s theorem? For not having built an axiom set that accounts for a previously un-encountered event? Would the model you want require a Laplace’s or a Solomonoff’s demon to actually run it? Are you – in the end – betting against Turing?

Because if the answer is yes, you are – provably – going to lose.

===========================

A couple of Notes

An epistemologically wrong prediction can be right, but for the wrong reason – a bit like “even a stopped clock shows the correct time twice a day”. You can pull off a one-off, maybe a second one if you are really lucky, but that’s about it. That’s one of the reasons why any good model can only take into account events that have occurred sufficiently frequently in the past: to insert them into the model, we need to quantify their effect magnitude, uncertainty and probability of occurrence. Many pundits using single-parameter models in the 2016 election to predict a certain and overwhelming win for Hillary learnt that the hard way.

Turing’s halting problem and Rice’s theorem have some caveats. It is in fact possible to predict some properties of algorithms and programs, assuming they are well-behaved in some sense. In a lot of cases this involves them not having loops, or having a finite and small number of branches. There is an entire field of computer science dedicated to developing methods to prove things about algorithms, and to writing algorithms in which the properties of interest are “trivial” and can be predicted.

Another exception to Turing’s halting problem and Rice’s theorem is given by the law of large numbers. If we have seen a large number of algorithms and the properties of their outcomes, we can make reasonably good predictions about the statistics of those properties in the future. For instance, we cannot compute the trajectories of individual molecules in a volume, but we can have a really good idea of their temperature. Similarly, we can’t predict whether a given person will become infected before a given day or whether they will die of an infection, but we can predict the average number of infections and the 95% confidence interval for a population as a whole – assuming we have seen enough outcomes and know the epidemiological interventions and adherence. That’s the reason, for instance, why we are pretty darn sure about how to limit COVID-19 spread.
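As a toy illustration of that “statistics of outcomes” point, here is a tiny branching-process simulation. Every parameter in it is invented; the only point is that single runs are all over the place while the average over many runs is stable and comes with a meaningful confidence interval.

```python
# Toy branching-process sketch: we cannot say who gets infected, but with enough
# runs the population-level numbers settle down (all parameters invented).
import random
import statistics

OFFSPRING = [0, 1, 2, 3]             # how many new cases one case generates
WEIGHTS = [0.30, 0.30, 0.25, 0.15]   # invented distribution, mean R ~ 1.25

def outbreak_size(generations=8, seed_cases=5, cap=50_000):
    """Total cases of one stochastic outbreak over a fixed number of generations."""
    total = current = seed_cases
    for _ in range(generations):
        if current == 0 or total > cap:
            break
        current = sum(random.choices(OFFSPRING, WEIGHTS, k=current))
        total += current
    return total

runs = [outbreak_size() for _ in range(2_000)]
mean, sd = statistics.mean(runs), statistics.stdev(runs)
print(f"single runs ranged from {min(runs)} to {max(runs)} cases")
print(f"the average, however, is stable: {mean:.0f} ± {1.96 * sd / len(runs) ** 0.5:.0f} (95% CI)")
```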

Swiss Cheese defense in depth against COVID19 by Ian M. Mackay

To push the point above further, scientists can often make really good predictions from existing data. Basically, that’s how the scientific process works – only theories with confirmed good predictive value are allowed to survive. For instance, back in February we didn’t know for sure how probable the hypothesis “COVID-19 spreads less as temperature and humidity increase” was, nor what the effect on spread would be. We had, however, a really good idea that interventions such as masks, hand washing, social distancing, contact tracing, at-risk contact quarantine, rapid testing and isolation would be effective – because of all we have learnt about viral upper respiratory tract diseases over the last century, and the complete carnage of the theories that were not good enough at predicting what would happen next.

There is, however, an important caveat to the “large numbers” approach. Before the law of large numbers kicks in and modelers’ probabilistic models become quantitative, stochastic fluctuations – for which models really can’t account – dominate, and stochastic events – such as super-spreader events – play an outsized role, making forecasts particularly brittle. For instance, in France the early super-spreader event in Mulhouse (mid-February) meant that Alsace-Moselle took the brunt of the ICU deaths in the first COVID-19 wave, followed closely by the cramped and cosmopolitan Paris region. One single event, one single case. Predicting it and its effect would have been impossible, even with China-grade mass surveillance.

March COVID-19 deaths in France, from The Economist (“Mobilising against a pandemic – France’s Napoleonic approach to covid-19”)

If you want to hear a more rigorous and coherent version of this discussion about decidability, information, Laplace-Bayes-Solomonoff and automated model design (aka how ML/AI meshes with all of that), you can check out Lê Nguyên Hoang and El Mahdi El Mhamdi. A lot of what I mention is presented rigorously here, or in French – here.

Problems with a major programming language version bump (Python 2>3)

About 10 years after the initial Python 3 release and about six months after the end of Python 2 support, I have finally bumped my largest and longest-running project to Python 3. Or at least I think so. Until I find some other bug in a rare execution path.

BioFlow is a Python project of mine that I have been on-and-off running and maintaining since 2013 – by now almost 7 years. It depends heavily on high-performance scientific computing libraries and the Python libraries providing bindings to them (cough scikits.sparse cough), and despite Python 3 being out for a couple of years by the time I started working on it, none of the libraries I depended on supported it yet. So I went with Python 2 and rolled with it for a number of years. By now, after several refactors, feature creep and optimization, with about 6.5k LOC, 2.5k lines of comments, 665 commits over 7 years and a solid 30% test coverage, it is a middle-of-the-road Python workhorse library.

As many other people running scientific computing libraries, I did see a number of things impact the code: bugs being introduced into the libraries I depended on (hello library version pinning), performance degradation due to anti-Spectre mitigations on Intel CPUs, libraries disappearing for good (RIP bulbs), databases discontinuing support for the means of accessing them that I was using (why, oh why, neo4j, did you drop REST), or the host system just crapping itself trying to install the old Fortran libraries that had not yet been properly packaged for it (hello Docker).

Overall, it taught me a number of things about programming craftsmanship, writing quality code and debugging code I had forgotten the details of. But that’s a topic for another post – back to the Python 2 to 3 transition.

Python 2 was working just fine for me, but with its end of life coming near, and with proper async support and type hinting being added to Python 3, a switch seemed like the logical thing to do to ensure long-term support.

After several attempts to keep the codebase in Python 2 consistent with Python 3, I decided to finally take the plunge.

So I forked off a 2to3 branch and ran the 2to3 script on the main library. At first, it seemed to have solved most of the issues:

  • print xyz was turned into print(xyz)
  • dict.iteritems() was turned into dict.items()
  • izip became zip
  • dict.keys() when fed to an enumerator was turned into list(dict.keys())
  • reader.next() was turned into next(reader)

So I gladly tried to start running my test suite, only to discover that it was completely broken:

  • string.lower("XYZ") was now "XYZ".lower()
  • file("fname", 'w') was now open("fname", 'w')
  • but sometimes also open("fname", 'wr')
  • and sometimes open("fname", 'rt') or open("fname", 'rb') or open("fname", 'wb'), depending purely on the ingesting library
  • assertDictEqual survived, but assertItemsEqual (a staple in my unit test suite) disappeared into thin air (guess assertCountEqual will now have to do…)
  • and wtf is even going on with pickle dumps????

Not to be forgotten: to switch to Python 3, I had to unfreeze the dependencies for the libraries I was building on top of, which came with its own can of worms:

  • object.properties[property] now became object._properties[property] in one of the libraries I heavily depended on (god bless whoever invented Ctrl-F, and PyCharm for its context-aware class/object usage/definition search)
  • json dumps all of a sudden require an explicit encoding, just like hashlib digests (a minimal illustration of the text/bytes split behind this follows below)
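For the curious, the text/bytes split behind that last item boils down to something like this (the payload is an invented example):

```python
# Minimal illustration of the Python 3 text/bytes split behind those breakages.
import hashlib
import json

payload = {"node": "protein_A", "weight": 0.75}   # invented example data

# In Python 2, hashlib.md5(json.dumps(payload)) worked because str was already bytes;
# in Python 3, hashing a str raises TypeError until you .encode() it explicitly.
digest = hashlib.md5(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()
print(digest)
```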

And finally, after my library had been running for a couple of weeks, some previously un-executed branch triggered a bunch of exceptions arising from the fact that in Python 2, / meant integer division unless a float was involved, whereas in Python 3, / is always float division and // is needed to trigger integer division.
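This is the kind of silent behaviour change that no mechanical translation catches, because nothing crashes until the affected code path actually runs. A minimal illustration (the variable names are invented):

```python
# The division change in a nutshell: same expression, different results.
matrix_rows, n_workers = 7, 2

chunk = matrix_rows / n_workers        # Python 2: 3 (integer division); Python 3: 3.5
safe_chunk = matrix_rows // n_workers  # 3 in both versions: explicit floor division

# Anything downstream that used `chunk` as a list index or a range() bound worked
# in Python 2 and raises TypeError in Python 3 ("float cannot be interpreted as an
# integer") - but only when that code path actually executes.
print(chunk, safe_chunk)
```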

I can be partly blamed for those issues. Code with complete unit-test coverage would have caught all of these exceptions in the unit-test phase, and integration tests would have caught the problems in rare code paths.

The problem is that no real-life library has total unit-test or integration-test coverage. The trench-warfare hell of the Python 3 transition has killed a number of popular Python projects – for instance the Gourmet recipe manager (which I used to use myself). For hell’s sake, even Dropbox, which employs Guido himself and runs a multi-billion-dollar business on an almost pure Python stack, waited until late 2018 and took about a year to roll over.

The reality is that the debugging of a major language version bump is **really** different from anything a codebase encounters in its lifetime.

When you write a new feature, you test it as you develop. Bugs appear as you add lines of code, and you can track them down. When a dependency craps out, the bugs that appear are related to it; it is possible to wrap it and isolate the difference in its responses to calls. Debugging is localized and traceable. When you refactor, you change the model of the problem and the code organization in your head. The bugs that appear are once again directly triggered by your code modifications.

When the underlying language changes, the bugs appear **everywhere**. You don’t know which line of code could be the bugged one, and you miss bugs because some bugs obscure other bugs. So you have to do pass after pass after pass over your entire codebase, spending weeks and months tracking exceptions as they pop up, never sure whether you have corrected all the bugs yet. It is hard, because you need to load the entire codebase into your head to search for bugs and be aware of the corner cases. It is demoralizing, because you are just trying to get back to the point where your code already was, without improving it in any way.

It is pretty much trench-warfare hell – stuck in the same place, without any clear advantage gained by debugging excursions at the limit of your mental capacities. It is unsurprising that a number of projects never made it to Python 3, especially niche ones made by non-developers for non-developers – the kind of projects that made Python 2 a loved, universal language that would surely have a library to solve your niche problem. The problem is so severe in the scientific community that there is a serious conversation in Nature about sticking with Python 2.7 to maximize project reproducibility, given that it is guaranteed to never change.

What could have been improved? As a rank-and-file (non-professional) developer of a niche, moderately complex library, here are a couple of things that would have made my life **a lot** easier while bumping the Python version:

  • Provide a tool akin to 2to3 and make it the default path. It was far from perfect – sure. But it hammered out the bulk of the differences and allowed the code to at least start executing, and me – to start catching bugs.
  • Unlike 2to3, such a tool needs to annotate the potential problems in the code it could not resolve. 'rt' vs 'rb' was a direct consequence of the text vs bytes separation in Python 3, and it was clear problems would arise from it. Same thing for / vs //. 2to3 should have at least highlighted the potential for problems. For my workflow, adding a # TODO: potential conflict that needs resolution would have gone a loooooong way (a toy sketch of such an annotation pass follows this list).
  • Better yet, roll out a syntax change in the old language version that allows the developer to explicitly resolve the ambiguity, so that the automated upgrade tools can get more out of the library.
  • Don’t touch the unittest functions. They are the lifeblood of debugging the library after the language bump. If they bail out, getting them to work again requires figuring out once more how the code they cover works, which defeats their purpose.
  • Make sure that the most widespread libraries in your ecosystem have performed the roll-over before pushing others to do the same.
  • Those libraries need to provide a “bump” version: aka, with exactly the same call syntax from the user’s code, they return exactly the same results in both the previous and the new version of the language. Aka, libraries should not be bumping their own major version at the same time they bump the supported language version.
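To give an idea of what I mean by “annotate, don’t guess”, here is a toy sketch of such a pass built on the standard ast module. It only flags the two ambiguities that bit me – plain divisions and open() calls without an explicit mode – and a real tool would obviously need to cover far more; the default file name is hypothetical.

```python
# Toy sketch of an "annotate, don't guess" helper: flag every plain division and
# every open() call without an explicit mode for manual review during a 2-to-3 port.
import ast
import sys

class AmbiguityFinder(ast.NodeVisitor):
    def __init__(self):
        self.findings = []

    def visit_BinOp(self, node):
        if isinstance(node.op, ast.Div):
            self.findings.append((node.lineno, "TODO: / becomes float division in Python 3 - intended?"))
        self.generic_visit(node)

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name) and node.func.id == "open":
            has_mode = len(node.args) > 1 or any(kw.arg == "mode" for kw in node.keywords)
            if not has_mode:
                self.findings.append((node.lineno, "TODO: open() without a mode - text or bytes?"))
        self.generic_visit(node)

source_path = sys.argv[1] if len(sys.argv) > 1 else "some_module.py"  # hypothetical default
with open(source_path) as source_file:
    tree = ast.parse(source_file.read())

finder = AmbiguityFinder()
finder.visit(tree)
for lineno, message in finder.findings:
    print(f"{source_path}:{lineno}: {message}")
```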

On masks and SARS-CoV-2

This comment was initially a response to a YouTube video from Tech Ingredients – a channel I have thoroughly enjoyed in the past for its in-depth dives into the scientific and engineering aspects of various engineering-heavy DIY projects. Unfortunately, I am afraid that the panic around COVID-19 has prevented a lot of people from thinking straight, and I could not help but disagree with the section on masks.

==

Hey there – engineer turned biomedical scientist here. I absolutely love your videos and have been enjoying them a lot, but I believe that in this specific domain I have enough experience to point out what appears to me to have been overlooked, and what is likely to drastically change your recommendation on masks.

First of all, operating room masks and standard medical masks are extremely different beasts – if anything, their capacity to filter out small particles, close in size to the droplets that transport COVID-19 over the longest distances, is much closer to that of N95s than to that of standard medical masks:

masks filtration efficiency

Standard medical masks let through about 70% of droplets on the smaller end of those that can carry SARS-CoV-2. A decrease in exposure of that magnitude has not been associated with a statistically significant reduction in contagion rates for any respiratory-transmitted disease.

So why are standard medical masks recommended for sick people? The main reason is that in order to get into the air, the viral particles need to be aerosolized by the coughing/sneezing/speaking of a contaminated person. The mask does not do well at preventing small particles from getting in and out, but it will prevent, at least partially, the aerosolization – especially of the larger droplets, which contain more viruses and hence are more dangerous.

Now, that means that if you really want to protect yourself, rather than using a mask, even a surgical one, it’s much better to use a full face shield – while useless against aerosolized particles suspended in the air, it will protect you from the largest and most dangerous droplets.

Why do medical personnel need them?
The reality is that without N95 masks and in immediate contact with patients, the risk of them getting infected is pretty high even in what are considered “safe” areas – as is the risk of passing the virus to their colleagues and patients in those “safe” areas. If left to spread, due to the over-representation of serious cases in the hospital environment, it is not impossible that the virus would evolve towards forms that lead to more serious symptoms. Even if we can’t fully protect the medical personnel, preventing those of them who are asymptomatic from spreading the virus is critical for everyone (besides, masks are also for patients – if you look at pictures from China, all patients wear them).

Second, why did the WHO not recommend the use of N95 masks to the general public at the beginning of this outbreak, whereas they did so for SARS-CoV in the 2002–2004 outbreak almost as soon as it became known to the West?

Unlike the first SARS-CoV, SARS-CoV-2 does not remain suspended in aerosols for prolonged periods of time: it does not form clouds of aerosolized particles that remain in suspension and can infect someone passing through the cloud hours after the patient who spread it has left. For SARS-CoV-2, the droplets fall to the ground fairly rapidly – within a couple of meters and a couple of minutes (where they can be picked up – hence hand washing and gloves). Because of that, unlike SARS-CoV, SARS-CoV-2 transmission is mostly driven by direct face-to-face contact, with virus-containing droplets landing on the faces of people in close contact.

The situation changes in hospitals and ICU wards – with a number of patients constantly aerosolizing, small particles do not have time to fall, and the medical personnel are within a couple of meters of patients due to space constraints. However, even in the current conditions, N95 masks are only used for aerosol-generating procedures, such as patient intubation.

Once again, for most people, a face shield, keeping several meters of distance and keeping your hands clean and away from your face are the absolute best bang for the buck, with everything else having significantly diminishing returns.

==

PS: since I wrote this post, a number of science journalists have done an excellent job of researching the subject in depth and writing up their findings in an accessible manner:

In addition to that, a Nature Medicine study has recently been published, indicating that while masks are really good at preventing the formation of large droplets (yay), when it comes to small droplets (the type that can float for a little while), they are not that great for influenza. The good news is that for coronaviruses, since few droplets of that size are formed, masks work great at containing any type of viral particle emission: Nature Medicine Study.

Scale-Free networks nonsense or Science vs Pseudo-Science

(this article’s title is a nod to Lior Pachter’s vitriolic arc of three articles with similar titles)

Over the last couple of days I was engaged in a debate with Lê from Science4All about what exactly science is, which spun off from his interview with an evolutionary psychologist and my own view of evolutionary psychology, in its current state, as a pseudo-science.

While not always easy and at times quite heated, this conversation was quite enlightening and led me to try to lay down my criteria.

The recent paper about scale-free networks not being that widespread in the real world (which I first got the gist of from Lior Pachter’s blog back in 2015) helped me formalize a little better what I feel a pseudo-science is.

Just like the models and theories within the scientific method itself, an approach being scientific is not something that is defined or proved. Instead, similarly to the NIST definition of random numbers through a series of tests that all need to be passed, a scientific approach is a lot of the time defined by what it is not, whereas a pseudo-science is something that tries to pass itself off as a scientific method but fails one or several tests.

Here are some of my rules of thumb for the criteria defining pseudo-science:

The model is significantly more complicated than what the existing data and prior knowledge warrant. This is particularly true for generative models not building on deep pre-existing knowledge of the components.

The theory is a transplant from another domain where it worked well, without all the correlated complexity and without justification that the transposition is still valid. Evolutionary psychology is a transplant from molecular evolutionary theory.

The success in another domain is advanced as the main argument for the applicability/correctness of the theory in the new domain.

The model claims are non-falsifiable.

The model is not incremental/emergent from a prior model.

There are no closely related, competing models that are considered when the model is applied to make choices.

The cases where the model fails are not defined and are not acknowledged. For evolutionary psychology – the modification of the environment by humans themselves; the same goes for scale-free networks.

Back-tracking on the claims without changing the final conclusion. This is different from refining the model, where a change in the model gets propagated to the final conclusion and that conclusion is then re-compared with reality. Sometimes amendments are made to the model for it to align with reality again, but at least for a period, the model is still considered false.

Support by a cloud of plausible but refuted claims, rather than a couple of strong claims that are currently hard to attack.

The defining feature of pseudo-science, however – especially compared to merely faulty science – is its refusal to accept criticism of or limitations to the theory and to change its predictions accordingly. It always needs to fit the final maxim, no matter the data.

Synergy from the boot on Ubuntu

This one seemed to be quite trivial per the official blog, but the whole pipeline gets a bit more complicated once SSL enters the game. Here is how I made it work with Synergy on Ubuntu 14.04:

  • Configure the server and the client with the GUI application
  • Make sure SSL server certificate fingerprint was stored in the ~/.synergy/SSL/Fingerprints/TrustedServers.txt
  • Run sudo -su myself /usr/bin/synergyc -f --enable-crypto my.server.ip.address
  • After that check everything was working with sudo /usr/bin/synergyc -d DEBUG2 -f --enable-crypto my.server.ip.address
  • Finally add the greeter-setup-script=sudo /usr/bin/synergyc --enable-crypto my.server.ip.address line into the /etc/lightdm/lightdm.conf file under the [SeatDefaults] section

Why you shouldn’t do it

Despite the convenience, there seemed to be a bit of interference with keyboard commands and command interpretation on my side, so since my two computers are side by side and I have a USB button switch from before I got Synergy, I’ve decided to manually start Synergy every time I log in.

Writing a research paper with ReStructuredText

Why?

As a part of my life as a Ph.D. student, I have to read a large number of scientific papers, and I have seen a couple of them being written. What struck me was that these papers have a great deal of internal structure (bibliographic references, references to figures, definitions, adaptation to a particular public, journal, etc.). However, they are written all at once, usually in a Word document, in a way where all that structure lives – and can only be verified – in the writer’s head.

As a programmer, I am used to dealing with organizing complex structured text as well – my source code. However, experience has shown me that relying on what is inside your brain to keep the structure of your project together only works for a couple of lines of code. Beyond that, I have no choice but to rely on compilers, linters, static analysis tools and IDE tooling to help me deal with the logical structure of my program and to prevent me from planting logical bombs and destroying one aspect of my work while I am focusing on another. An even bigger problem is keeping myself motivated while writing a program over several months and learning the new tools I need to do it efficiently.

From that perspective, writing papers in a Word file is very similar to trying to implement high-level abstractions in very low-level code. In fact, so low-level that you are not allowed to import from other files (say, to automatically declare common abbreviations in your paper), define functions, or import upon execution. Basically, the only thing you can rely on is the manuscript reviews by the writers and by the editors/reviewers of the journals where the paper is submitted. Well, unless they get stuck in an infinite loop.


So I decided to give it a try and write my paper in the same way I would write a program: using modules, declarations, import, compilation. And also throwing in some comments, version control and ways to organize revisions.

Why ReStructuredText?

I loved working with it when I was documenting my Python projects with Sphinx. It allows quite a lot of the operations I was looking for, such as the .. include:: and .. replace:: directives. It is well supported in the PyCharm editor I am using, allowing me to fire up my favorite dark theme and go fullscreen to remain distraction-free. It can also be translated with pandoc both to .docx for my non-techy professor and to LaTeX for my mathy collaborator.

It also allowed me to type formulas in raw LaTeX notation quite easily by using the .. math:: directive.

How did it go?

Not too bad so far. I had some problems with pandoc’s ability to convert my .rst file tree into .docx, especially when it came to failing on the .. include:: directives and on citation formatting (link corresponding issues). There was also some issue with rendering .png images in the .docx format (link issue). In the end, I had to translate the .rst files into HTML with the rst2html tool and then the HTML into .docx. For now, I am still trying to see how much of an advantage it is giving me.
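For reference, the workaround pipeline boils down to something like the sketch below (file names are hypothetical, and it assumes docutils and pandoc are installed):

```python
# Rough sketch of the .rst -> HTML -> .docx workaround (file names are hypothetical;
# assumes the docutils package and the pandoc tool are installed).
import subprocess
from docutils.core import publish_file

# Step 1: let docutils resolve includes, substitutions and references - and fail
# loudly on dangling citations - while producing HTML.
publish_file(source_path="paper_main.rst",
             destination_path="paper_main.html",
             writer_name="html")

# Step 2: hand the HTML to pandoc to get the .docx my co-authors actually read.
subprocess.run(["pandoc", "paper_main.html", "-o", "paper_main.docx"], check=True)
```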

After some writing, I’ve noticed that I am now being saved from dangling references. For instance, at some point I wrote a reference [KimKishnoy2013]_ in my text, and while writing the bibliography I realized the paper came out in 2012, so I defined it there as .. [KimKishnoy2012]. The rst compilation engine then threw an error about Unknown target name: "kimkishnoy2013". Yay, no more dead references! The same thing is true for references defined in the bibliography but not used in the text.

Now, a shortcoming of this method of writing is that inter-part transitions do not come naturally. This can be easily mitigated, once the writer’s block has been overcome by writing all the parts separately, by opening the compiled HTML or .docx document and editing the elements so that they align properly.

An additional difference from the tools that have been developed to review code is that the information density and consistency of programming languages is closer to mathematical notation than to human-readable text, with all the redundancy necessary for proper understanding. A word change or a line change is a big deal in programming; it isn’t so important in writing, and all the annotation and diff tools built for code are not very useful there.

On the other hand, this is related to the fact that human language is still a low-level language: Git won’t be as useful for reviewing prose as it is for reviewing programming languages, much as it isn’t very useful for reviewing binaries.

Over time, two significant problems emerged with this approach. The first is incorporating revisions. Since the other people in the revision pipeline are using MS Word’s built-in review tools, in order to address every single revision I have to find the location in the text tree file where the revision needs to be made, then correct it. Doing it once is pretty straightforward. Doing it hundreds upon hundreds of times, across tens of revisions by different collaborators, is another thing altogether and is pretty tiresome.

The second problem is related to the first. When the revisions are more major and require re-writing and re-organizing entire parts, I need to go and edit the parts one by one, then figure out which part’s contents go into which new part. Which is a lot of side work for not a lot of added value in the end.

What is still missing?

  • Conditional rendering rules. There are some tags that I would want to see when I am rendering my document for proofreading and correction (like part names, my comments, reviewer comments), but not in the final “build”.

  • Linters. I would really like to run something like the Hemingway app on my text, as well as some kind of Clonedigger to figure out where I tend to repeat myself over and over again and make sure I only repeat myself when I want to – in other terms, automated tools for proofreading the text and figuring out how well it is understood. It seems that I am not the only one to have had the idea: Proselint’s creators seem to have had the same one and implemented a linter for prose in Python. Something I am quite excited about, even though such tools are still in their infancy because of how liberal spoken language is compared to programming languages. We will likely see a lot of improvements in the future with the developments in NLP and machine learning. Here are a couple of other linter functions I could see being really useful:

    • Check that the first sentence of each paragraph describes what the paragraph will be about
    • Check that all the sentences can be parsed (subject-verb-object-qualifiers)
    • Check that there is no abrupt interruption in the words used between close sentences.
    • Check for the digressions and build a narration tree.
  • Words outside common vocabulary that are undefined. I tend to write for several very different publics, about topics they are sometimes very knowledgeable about and sometimes not really. The catch is that since I am writing about these topics, I am knowledgeable about them – to the point that sometimes I fail to realize that some words might need a definition. If I had an app that showed me the rare words I introduce and don’t define, I could rapidly adapt my paper to a different public or, as reviewers like to ask, “unpack a bit” (a toy sketch of such a check follows this list).

  • Chaining with other apps. There are already applications that do a great job at structuring citations and references to the desired format. I need to find a way to pipe the results of the .rst text compilation into them, so that they can adapt the citation framework in a way that is consistent with the publication we are writing.

  • A skeptic module. I am writing scientific papers. Ideally, every assertion in the introduction has to be backed by an appropriate citation, and every paragraph in the methods and results sections has to be backed by data, either in the figures or in the supplementary data.

  • Proper management of mathematical formulas. They tend to get really broken. LaTeX is the answer, but it would be nice if the renderings of formulas could also be translated into HTML or docx, which has its own set of mathematical notation (MS Office always had to do it differently from open-source standards).

  • A way to write from requirements. In software we have something we refer to as unit tests: pre-defined behaviors we want our code to implement. As a rule of thumb, the accumulation of unit tests is a tedious process that is nonetheless critical for building a system and validating that, upon change, our software still behaves the way we expect it to. In writing, we want to transmit a certain set of concepts to our readers, but because human time is so valuable, we regularly fail at that task – especially when we fail to see that the 100th+ revision leaves a concept referred to in the paper no longer defined. Checking this by hand is a little like acting as a computer and executing the software in your head: it works well for small pieces, but edge cases and what you already know about what the program should do really get in the way.
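As a taste of how little it would take to get started on the “rare, undefined words” check from the list above, here is a toy sketch. The file name is hypothetical, and a real tool would want a proper frequency corpus rather than counts from the manuscript itself.

```python
# Toy sketch of the "rare, undefined words" check: flag longer words that appear
# only once or twice in the manuscript as candidates for a definition or unpacking.
import re
from collections import Counter

def rare_terms(text, max_occurrences=2, min_length=8):
    words = re.findall(r"[A-Za-z][A-Za-z-]+", text.lower())
    counts = Counter(words)
    return sorted(word for word, count in counts.items()
                  if count <= max_occurrences and len(word) >= min_length)

with open("paper_main.rst") as manuscript:   # hypothetical file name
    for term in rare_terms(manuscript.read()):
        print(f"possibly needs a definition: {term}")
```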

Dependency of a dependency of a dependency

Or why cool projects often fail to get traction

Today I tried to install a project I have been working on for a while onto a new machine. It relies heavily on storing and querying data in a “social network” manner, and is hence not necessarily very well adapted to relational databases. When I was starting to work on it back in early 2013, I was still a fairly inexperienced programmer, so I decided to go with a new technology to underlie it: the neo4j graph database. And since I was coding in Python, was fairly familiar with the excellent SQLAlchemy ORM and was looking for something similar to work with graph databases, my choice fell on the bulbflow framework by James Thornton. I complemented it with JPype native bindings to Python for quick insertion of the data. After the first couple of months of developer’s bliss, with everything working as expected and being built as fast as humanly possible, I realized that things were not going to be as fun as I initially expected.

  • Python 3 was not compatible with the JPype library that I was using to rapidly insert data into neo4j from Python. In addition, JPype was quickly dropping out of support and was in general too hard to set up, so I had to drop it.
  • The bulbflow framework in reality relied on the Gremlin/Groovy Tinkerpop stack implemented in the neo4j database, worked over a REST interface and had no support for batching. Despite several promises from its creator and maintainer to implement batching, it never came to life, and I found myself involved in a re-implementation that would follow the same principles. Unfortunately, back then I had neither enough programming experience to develop a library nor enough time to do it. I had instead to settle for a slow insertion cycle (which was more than compensated for by the time gained on retrieval).
  • A year later, neo4j released version 2.0 and dropped the Gremlin/Groovy stack I relied on to run my code. They did, however, have the generosity to leave the historical 1.9 maintenance branch going, so, given that I had already poured something along the lines of three months of full-time work into configuring and debugging my code to work with that architecture, I decided to stick with 1.9 and maintain it.
  • Yesterday (two and a half years after the start of development, by which point I had poured the equivalent of six more months of full-time work into the project), I realized that the only version of neo4j 1.9 still available for download to common mortals – those who do not know how to use Maven to assemble the project from the GitHub repository – crashes with a “Perm Gen: java heap out of memory” exception. Realistically, given that I am one of the few people still using the 1.9.9 community edition branch, and one of the even fewer likely to run into this problem, I don’t expect the developers to dig through all the details to find the place where the error occurs and correct it. So at this point, my best bet is to put a neo4j 1.9.6 build onto GitHub and link to it from my project, hoping that the neo4j developers will show enough understanding not to pull it down.

All in all, the experience isn’t that terrible, but one thing is for sure: the next time I build a project that I can see myself maintaining in a year’s time and installing on several machines, I will think twice before using a relatively “new” technology, even if it is promising and offers a 10x performance gain. Simply because I won’t know how it will break and change in the upcoming five years, and what kind of effort it will require from me to maintain my project’s dependencies.

Usability of fitness trackers: lots of improvement in sight

Fitness trackers and other wearable tech are gaining more and more momentum, but because of the ostrich cognitive bias they are absolutely not reaching the populations that would benefit most from them. And as per usual, ReadWriteWeb is pretty good at pointing this out in simple language.

To sum up, current fitness tracking has several shortcomings for the population it should target:

  • It is pretty expensive. A fitness band that does just step tracking can cost somewhere between $50 and $150. If you are trying to get something more comprehensive, such as one of Garmin’s multisport watches, you are looking at somewhere in the $300–$500 range. Hardly an impulse purchase for someone who is making under $30k a year and has kids to feed on that. Yet they are the group at highest risk of obesity and cardiovascular disease.
  • They generate a LOT of data that is hard to interpret unless you have some background as a trained athlete. Knowing your Vmax and heart-rate descent profile following an effort is pretty cool and useful for monitoring your health and fitness, but you will never know how to do it unless someone explains it to you or you already know it from a previous athletic career.
  • They do not provide any pull-in. As anyone with a bank account knows, savings come from repeated effort over time. Same with health capital. However, as anyone with a bank account also knows, when you hit hard financial times you watch your bank account much less than during the times when everything is going well – just because it is rewarding in the latter case and painful in the former. Same thing with health: people who lack health but are ready to work on it are self-conscious about it and need an additional driving motivation to last through the periods where no progress is happening.
  • They do not respond to an immediate worry and are one of those products that are “good to have”, but whose absence does not lead to an “I need it RIGHT NOW” feeling.

 

With that in mind, I decided to participate in MedHacks 1.0 last weekend. My goal was to develop something that would provide an emergency warning for users who are either at high risk of stroke or undergoing one, so that they would not end up isolated while having a stroke. With my team, we managed to hack together a proof-of-concept prototype in about 24 hours, which took us into the finals. To do this, we used an audio mixing board to amplify the signal, Audacity to acquire the data on a computer, FFT and pattern matching to retrieve the data and filter out loss-of-contact issues, and built an Android app that was able to send out a message/call for help if the pattern changed.

Now, these are very simple components that could be compressed into a single sensor woven into a T-shirt and beamed to a phone for analysis in the background. We would need some machine learning to detect the most common anomalies, followed by validation of the acquired EKG by human experts.
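For the technically inclined, the signal path we hacked together amounts, in spirit, to something like the sketch below. It is a toy reconstruction: the thresholds, window sizes and synthetic signals are invented, and a real detector would need validated training data and clinical sign-off.

```python
# Crude sketch of the signal path, in spirit: window the signal, take the FFT,
# compare the spectrum to a baseline and flag large deviations or lost contact.
# All thresholds, window sizes and synthetic signals are invented for illustration.
import numpy as np

SAMPLE_RATE = 250          # Hz, plausible for a single-lead EKG front-end
WINDOW_SECONDS = 4

def spectrum(window):
    """Magnitude spectrum of one window of samples."""
    return np.abs(np.fft.rfft(window * np.hanning(len(window))))

def flag_anomaly(window, baseline_spectrum, deviation_threshold=3.0, contact_threshold=1e-3):
    if np.std(window) < contact_threshold:          # flat line -> electrode lost contact
        return "loss of contact"
    deviation = np.linalg.norm(spectrum(window) - baseline_spectrum)
    if deviation > deviation_threshold * np.linalg.norm(baseline_spectrum):
        return "pattern change - alert a contact person"
    return None

# Usage sketch: baseline from a known-good stretch, then stream windows through.
rng = np.random.default_rng(0)
t = np.arange(SAMPLE_RATE * WINDOW_SECONDS) / SAMPLE_RATE
baseline = spectrum(np.sin(2 * np.pi * 1.2 * t) + 0.05 * rng.standard_normal(t.size))
print(flag_anomaly(np.zeros(t.size), baseline))             # -> "loss of contact"
print(flag_anomaly(np.sin(2 * np.pi * 1.2 * t), baseline))  # -> None (close to baseline)
```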

However, the combination of a cheap, persistently monitoring device and an app able to exploit it opens up large possibilities for fitness tracking for those who need it most.

  • The reason to purchase and use the monitoring device is not fitness anymore – it is basic safety. And it can be offered by someone who is worried about your health.
  • The basic functionality is really clear. If something is going wrong with you, we will warn you. If something is going really wrong, we will warn someone who can check on you or come to your rescue.
  • We can build upon the basic functionality, introducing our users to the dynamics of fitness in a similar way to how games introduce competitive challenges: gradually, leaving you the time to learn at your own pace.
  • We have very precise access to the amount of effort: your heart rhythm will follow if you are doing a strenuous directed activity, and we will guide you in it.
  • We were able to build a prototype with very common materials. Compression and mass production will allow us to hit the lowest market range, at a price where you are paying only marginally more for a smart piece of athletic clothing than for the same “non-smart” piece of clothing.

Sound interesting? I am looking for someone with clinical experience in heart disease, a hardware hacker with experience in wearables, and someone to drive consumer prospecting and sales.