Arguing with Turing is a losing game (or why predicting some things is hard)

FiveThirtyEight’s 2016 election forecast and Nassim Taleb

Back in August 2016, Nassim Nicholas Taleb and Nate Silver had a massive and very public run-in on Twitter. Taleb – the author of “Black Swan” and “Antifragile” – and Nate – the founder of FiveThirtyEight – were arguably the two most famous statisticians alive at the time, so their spat ended up attracting quite a lot of attention.

The bottom line of the spat was that for Taleb, the FiveThirtyEight 2016 US presidential election model was overly confident. As the events in the race between Clinton and Trump swung voting intentions one way or another (mostly in response to quips from Trump), the model’s predictions and confidence intervals swung too widely for it to be considered a proper forecast.

Taleb did have a point. Data-based forecasts, be they based on the polls alone or on more considerations, are supposed to be successive estimations of a well-defined variable vector – aka voting splits per state, translating into electoral college votes for each candidate on the day of the election. Any subsequent estimation of that variable vector with more data (such as fresher polls) is supposed to be a refinement of the prior estimation. We expect the uncertainty range of the subsequent estimations to be smaller than that of the prior estimation and to fall within the prior prediction’s uncertainty range – at least most of the time, modulo expected errors.

In the vein of the central thesis of “Black Swan”, Taleb’s shtick about the 2016 election models – and especially FiveThirtyEight’s – was that they were overconfident in their predictions and underestimated the probability of rare events that would throw them off. As such, according to him, the only correct estimation technique would be a very conservative one – giving every candidate a 50% chance of winning right until the night of the election, when it would jump to 100%.

Nassim Taleb’s “Rigorous” “forecasting”

The thing is that Nassim Nicholas Taleb is technically correct here. And Nate Silver was actually very much aware of that himself before Taleb showed up.

You see, the whole point of Nate Silver creating the FiveThirtyEight website and going for rigorous statistics to predict games and election outcomes was specifically the fact that mainstream analysts had a hard time with probabilities and would go either for a 50/50 (too close to call) or a 100% sure, never in-between (link). The reality is that there is a whole world in between and that such rounding errors would cost you dearly. As anyone dealing with inherently probabilistic events would painfully learn, for an 80% win bet, you still lose 20% of the time – and have to pay dearly for it.

Back in 2016, Nate’s approach was to make forecasts based on the current state of the polls and the estimation of how much that could be expected to change if the candidates were doing about the same thing as the candidates in past elections. Assuming the candidates don’t do anything out of the ordinary and the environment doesn’t change drastically, his prediction would do well and behave like an actual forecast.

Nate Silver| Talks at Google| The Signal and The Noise
Nate explains what a forecast is to him

The issue is that you can never enter the same river twice. Candidates will try new things – especially if models show them as losing if they don’t do anything out of the ordinary.

Hence Taleb is technically correct. Nate Silver’s model predictions will always be overly confident and overly sensitive to changes resulting from candidates’ actions or previously unseen changes in the environment.

Decidability, Turing, Solomonoff and Gödel

However, Taleb’s technical correctness is completely beside the point. Unless you are God himself (or Solomonoff’s Demon – more about that in a second), you can’t make forecasts that are accurately calibrated.

In fact, any forecast you can make is either trivial or epistemologically wrong – provably.

Let’s for a second imagine ourselves physicists and get a “spherical cow in vacuum” extremely simplified model of the world. All the events are 100% deterministic, everyone obeys well-defined known rules, based on which they will make their decisions, and the starting conditions are known. Basically, if we had a sufficiently powerful computer, we could run the perfect simulation of the universe and come up with the exact prediction of the state of the world and hence of the election result.

Forecasting in this situation seems trivial, no? Well, actually no. As Alan Turing – the father of computing – proved in 1936, unless you run the simulation, in general you cannot even predict whether a process (such as a voter making up his mind) will be over by election day. Henry Gordon Rice was even more radical and in 1951 proved a theorem that can be summarized as “All non-trivial properties of simulation runs cannot be predicted”.

In computer science, forecasting whether a process will be over is called the halting problem and is known as a prime example of a problem that is undecidable (aka, there is no general method to predict the outcome in advance). Those of you with an interest in mathematics might have noted a relationship to Gödel’s incompleteness theorem – stating that undecidable statements will exist for any consistent set of postulates that is complex enough to embed basic arithmetic.
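
To make the “you have to run it to know” point a bit more tangible, here is a minimal sketch. It is an illustration, not Turing’s proof: the Collatz iteration is an assumed stand-in for “a process”, and the function name `collatz_steps` and its step budget are hypothetical. Nobody has proven that this loop finishes for every starting number, so the only general tool is to simulate with a budget and answer “don’t know” when the budget runs out:

```python
from typing import Optional


def collatz_steps(n: int, max_steps: int) -> Optional[int]:
    """Simulate the Collatz process starting from n.

    Returns the number of steps needed to reach 1, or None if the
    step budget is exhausted first (meaning "we could not tell").
    """
    steps = 0
    while n != 1:
        if steps >= max_steps:
            return None  # budget exhausted: no verdict either way
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


print(collatz_steps(6, max_steps=100))    # 8
print(collatz_steps(27, max_steps=1000))  # 111
print(collatz_steps(27, max_steps=50))    # None
```

A `None` here says nothing about whether the process would eventually finish; it only says the budget was too small, which is exactly the position a forecaster is in.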

Things only go downhill once we add some probabilities into the mix.

Those of you who have done some physics have probably recognized the extremely simplified model of the world above as Laplace’s demon and have been screaming at the screen about quantum mechanics and how that’s impossible. You are absolutely correct.

That was one of the reasons that pushed Ray Solomonoff (also known for the Kolmogorov-Solomonoff-Chaitin complexity) to create something he called Algorithmic Probability and a probabilistic counterpart to Laplace’s demon – Solomonoff’s demon.

In the Algorithmic Probability framework, any possible theory about the world that can exist has some non-zero probability associated with it and is able to predict something about the state of the world. To perform an accurate prediction about the probability of an event, you need to calculate the probability of that event according to all theories, weighted by the probability of each theory:

Single event prediction according to Solomonoff (conditional probabilities)
E is event, H is a theory, sum is over all possible theories
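
Reading the caption literally, the prediction rule is the law of total probability taken over theories. A reconstruction under that assumption (E and H are the caption’s symbols; the exact notation is assumed):

```latex
P(E) = \sum_{H} P(E \mid H)\, P(H)
```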

Fortunately for us, Solomonoff also provided a (Bayesian) way of learning the probabilities of events according to theories and probabilities of theories themselves, based on the prior events:

Learning from a single new data point according to Solomonoff
D is new data, H is a theory, sum is over all possible theories T, but excluding theory H
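
Taking the caption at face value, this is Bayes’ rule with the normalizing sum split into the theory H itself and every other theory T. A reconstruction under that assumption (the notation is assumed, the symbols are the caption’s):

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D \mid H)\, P(H) + \sum_{T \neq H} P(D \mid T)\, P(T)}
```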

Solomonoff was nice enough to provide us with two critical results about his learning. First, that it was admissible – aka optimal. In other terms, there is no other learning mode that could do better. The second – that it was uncomputable. Given that a single learning or prediction step requires an iteration over the entire infinite set of all possible theories, only a being from a thought experiment – Solomonoff’s demon – is actually able to do it.

This means that any computable prediction methodology is necessarily incomplete. In other terms, it is guaranteed not to take into account enough theories for its predictions to be accurate. It’s not a bug or an error – it’s a law.
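
A computable learner can only ever carry a finite list of theories. Here is a toy sketch of what that truncation looks like, assuming just three made-up theories about a coin. The theory names, the uniform prior and the helpers `predict` and `update` are all illustrative, not anything Solomonoff specified:

```python
from fractions import Fraction


def predict(prior, likelihood):
    """P(E) = sum over theories H of P(E | H) * P(H)."""
    return sum(likelihood[h] * p for h, p in prior.items())


def update(prior, likelihood):
    """Bayes' rule: P(H | D) for every theory H in the finite list."""
    evidence = predict(prior, likelihood)
    return {h: likelihood[h] * p / evidence for h, p in prior.items()}


# Each hypothetical theory is summarized by the P(heads) it predicts.
p_heads_given_theory = {
    "fair": Fraction(1, 2),
    "heads_biased": Fraction(3, 4),
    "tails_biased": Fraction(1, 4),
}
prior = {theory: Fraction(1, 3) for theory in p_heads_given_theory}

p_heads_before = predict(prior, p_heads_given_theory)
posterior = update(prior, p_heads_given_theory)  # after observing one heads
p_heads_after = predict(posterior, p_heads_given_theory)

print(p_heads_before, posterior["heads_biased"], p_heads_after)  # 1/2 1/2 7/12
```

An outcome that none of the listed theories allows for (the coin landing on its edge, say) keeps probability zero no matter how much data comes in.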

So when Nassim Taleb says that Nate Silver’s model overestimates its confidence, he is technically correct, but the statement is also completely trivial and tautological. Worse than that: Solomonoff also proved that in the general case, we can’t evaluate by how much our predictions are off. We can’t possibly quantify what we are leaving on the table by using only a finite subset of theories about the universe.

The fact that we cannot evaluate in advance by how much we are off in our predictions means that all of Taleb’s own recommendations about investments are basically worthless. Yes, his strategy will do better when there are huge bad events that the market consensus did not expect. It will, however, do much worse if they don’t happen.

Basically, for any forecast, the best we can do is something along the lines of Nate Silver’s now-cast + some knowledge gleaned from the most frequent events that occurred in the past. And that “frequent” part is important. What made the FiveThirtyEight model particularly bullish about Trump in 2016 (and in retrospect most correct) was its assumption of correlation of polling errors across the states. Without it, it would have given Trump a 0.2% chance to win instead of the 30% it ended up giving him. Modelers were able to infer and calibrate this correlation because it occurred in every prior election cycle.
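
To see why that one assumption matters so much, here is a toy Monte Carlo sketch. It is not the FiveThirtyEight model: the 11 equally weighted states, the 3-point lead and the error sizes are made-up numbers, and `underdog_win_probability` is a hypothetical helper. The total polling error per state has the same spread in both runs; only the share of it that is common to all states changes:

```python
import math
import random


def underdog_win_probability(shared_sd, state_sd, n_states=11, lead=3.0,
                             n_sims=20_000, seed=0):
    """Fraction of simulated elections in which the trailing candidate
    carries a majority of states.

    Each state's polling error is one error shared by all states plus an
    independent state-level error; the underdog takes a state when the
    combined error exceeds the leader's polling lead.
    """
    rng = random.Random(seed)
    wins = 0
    for _sim in range(n_sims):
        shared_error = rng.gauss(0.0, shared_sd)
        states_won = sum(
            1 for _state in range(n_states)
            if shared_error + rng.gauss(0.0, state_sd) > lead
        )
        if states_won > n_states // 2:
            wins += 1
    return wins / n_sims


total_sd = 3.0
# All of the error is state-specific: states miss independently.
independent = underdog_win_probability(shared_sd=0.0, state_sd=total_sd)
# Most of the error is shared: when polls miss, they miss everywhere at once.
shared_sd = 2.5
correlated = underdog_win_probability(
    shared_sd=shared_sd, state_sd=math.sqrt(total_sd**2 - shared_sd**2)
)
print(f"independent errors: {independent:.4f}, correlated errors: {correlated:.4f}")
```

With independent errors the underdog needs many separate lucky breaks at once; with a shared error a single systematic polling miss is enough, so the correlated run should come out far higher.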

What they couldn’t have properly calibrated (or included or thought of for that matter) were one-off events that would radically change everything. Think about it. Among the things that were possible, although exceedingly improbable, in 2016 were:

  • Antifa militia killing Donald Trump
  • Alt-right militia killing Hillary
  • Alt-right terrorist group inspired by Trump’s speeches blowing up the One World Trade Center, causing massive outrage against him
  • Trump making an off-hand remark about grabbing someone’s something that would outrage the majority of his potential electorate
  • It rains in a swing state, meaning 1000 fewer Democratic voters decide at the last second not to vote because Hillary’s victory is assured
  • FBI director publicly speaking about Hillary’s email scandal once again, painting her as a corrupt politician and causing massive discouragement among potential Democratic voters
  • Aliens showing up and abducting everyone
  • ….

Some of them were predictable, since they happened before (rain) and were built into the model’s uncertainty. Other factors were impossible to predict. In fact, if in November 2015 you had claimed that Donald Trump would become the next president due to a last-minute FBI intervention, you would have been referred to the nearest mental health specialist for urgent help.

2020 Election

So why write about that spat now, in 2020 – years after the fact?

First, it’s a presidential election year in the US, again, but this time around with even more potential for unexpected turns. Just as in 2016, Nate’s model is drawing criticism, except this time for underestimating Trump’s chances to win instead of overestimating them. While Nate and the FiveThirtyEight team are trying to account for the new events and be exceedingly clear about what their predictions are, they are limited in what they can reliably quantify. Their model – provably – cannot predict all the extraordinary circumstances that still can happen in the upcoming weeks.

There is still time for the Trump campaign to discourage enough of the female vote by painting Biden as a rapist – especially if the FBI reports in the week before the election about an ongoing investigation. There is still time for a miracle vaccine to be pushed to the market. This is not an ordinary election. Many things can happen and none of them is exactly predictable. There is still room for new modes of voter suppression to pop up and for Amy Barrett to be nominated to make sure the Supreme Court will judge the suppression legal.

And no prediction can account for them all. Nate and the FiveThirtyEight team are the best applied statisticians in politics since Robert Abelson, but they can’t decide the undecidable.

2020 COVID 19 models

Second – and perhaps most importantly – 2020 was a year of COVID 19 and of the forecasts associated with it. Back in March and April, you had to be either extremely lazy or very epistemologically humble not to try to build your own models of COVID19 spread or not to criticize others’ models.

Some takes and models were such absolute hot garbage that the fact they could come from people I had respected and admired in the past plunged me into genuine existential dread.

But putting those aside, even excellent models and knowledgeable experts were constantly off – and by quite a lot as well. Let’s take the example of expert predictions from March 16th about how many total cases they thought the CDC would have reported by March 29th in the US (from FiveThirtyEight for a change). Remember – March 16th was the time when only 3500 cases were reported in the US, but Italy was already out of ICU beds and was not resuscitating anyone above 65, France was going into confinement and China was still mid-lockdown.

Consensus: ~ 10 000 – 20 000 cases total with 80% upper bound at 81 000.
Reality: 139 000 cases

Yeah. The CDC reported about 19 000 cases on March 29th alone. The total was roughly 7 to 14 times the consensus range and almost double the 80% upper bound. But the reason they were off wasn’t that the experts are wrong and their models are bad. Just like the election models, they can’t account for factors that are impossible to predict. In this specific case, what made all the difference was that the CDC’s own COVID 19 tests were defective, and the FDA didn’t start issuing emergency use authorizations for other manufacturers’ tests until around the 16th of March. The failure of the best-financed, most reliable and most up-to-date public health agency to produce reliable tests for a world-threatening pandemic for two months would have been an insane hypothesis to make given the prior track record of the CDC.
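
For the record, the size of the miss can be checked with a few lines of arithmetic. The numbers are the ones quoted above; the variable names are just for illustration:

```python
reported_total = 139_000
consensus_low, consensus_high = 10_000, 20_000
upper_bound_80 = 81_000

# How many times larger reality was than each end of the consensus range.
ratio_vs_consensus_high = reported_total / consensus_high
ratio_vs_consensus_low = reported_total / consensus_low
# How far reality overshot the 80% upper bound.
ratio_vs_upper_bound = reported_total / upper_bound_80

print(round(ratio_vs_consensus_high, 2))  # 6.95
print(round(ratio_vs_consensus_low, 2))   # 13.9
print(round(ratio_vs_upper_bound, 2))     # 1.72
```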

More generally, all the attempts to predict the evolution of the COVID19 pandemic in the developed countries ran into the same problem. While we were able to learn rapidly how the virus was spreading and what needed to be done to limit its spread, the actions that would be taken by people and governments of developed countries were a total wild card. Unlike with election forecasts, we don’t have any data from prior pandemics in developed countries to build and calibrate models upon.

Nowhere is it more visible than in the state-of-the-art models. Here is Youyang Gu’s COVID19 projections forecasting website, using a methodology similar to the one Nate Silver uses to predict election outcomes (parameter fitting based on prior experience + some mechanistic insight about the underlying process). It is arguably the best performing one. And yet, it failed spectacularly, even a month in advance.

Youyang Gu’s forecast for deaths/day in early July
Youyang Gu’s forecast for deaths/day in early August. His July prediction was off by about 50-100%. I also wouldn’t be as optimistic about November.

And once again, it’s not the fault of Youyang Gu. A large part of that failure is attributable to Florida and other Southern states doing everything to under-report the number of cases and deaths attributed to COVID19.

The model is doing as well as it could, given what we know about the disease and how people have behaved themselves in the past. It’s just that no model can solve the halting problem of “when will the government decide to do something about the pandemic, what it will be and what people will decide to do about it”. Basically the “interventions” part of projections from Richard Neher’s lab COVID19-scenarios is as of now undecidable.

Once you know the interventions, predicting the rest reliably is doable.

Another high-profile example of expert model failure is the Global Health Security Index ranking of the countries best prepared for a pandemic, from October 2019. The US and the UK received respectively the best and second best scores for pandemic preparedness. Yet, if we look today (end of October 2020) at COVID19 deaths per capita among developed countries, they are a close second and third worst performers. They are not an exception. France ranked better than Germany. Yet it has 5 times the number of deaths. China, Vietnam and New Zealand – all thought to be around the 50/195 mark – are now the best performers, with a whopping 200, 1700 and 140 times fewer deaths per capita than the US – the supposed best performer. All because of differences in the leadership approach and in how decisive and relentless the government’s response to the pandemic was.

The GHS index is not useless – it was just based on the previously implicit expectation that governments of countries affected by outbreaks would aggressively intervene and do all in their capacity to stop the pandemic. After all, the USA in the past went as far as to collaborate with the USSR, at the height of the Cold War, to develop and distribute the polio vaccine. That is also what was observed during the 2010 Swine flu pandemic and during the HIV pandemic (at least when it became clear it was hitting everyone, not just the minorities), and it is in line with what the WHO was seeing in developing countries hit by pandemics that did have a government.

Instead of a conclusion

Forecasting is hard – especially when it is quantitative. In most cases, predicting what will happen deterministically is impossible. Probabilistic predictions can only base themselves on what happened in the past frequently enough to allow them to be properly calibrated.

They cannot – provably – account for all the possible events, even if we had a perfectly deterministic and known model of the world. We can’t even estimate what the models are leaving on the table when they try to perform their predictions.

So the next time you criticize an expert or modeler and point out a shortcoming in their model, think for a second. Are you criticizing them for not having solved the halting problem? Not having bypassed Rice’s theorem? Having built an axiom set that does not account for a previously unencountered event? Would the model you want require a Laplace or a Solomonoff Demon to actually run it? Are you – in the end – betting against Turing?

Because if the answer is yes, you are – provably – going to lose.

===========================

A couple of Notes

An epistemologically wrong prediction can be right, but for the wrong reason. A bit like “even a stopped clock shows the correct time twice a day”. You can pull off a one-off, maybe a second one if you are really lucky, but that’s about it. That’s one of the reasons for which any good model can only take into account events that have occurred sufficiently frequently in the past. To insert them into the model, we need to quantify their effect magnitude, uncertainty and probability of occurrence. Many pundits using single-parameter models in the 2016 election to predict a certain and overwhelming win for Hillary learnt it the hard way.

Turing’s halting problem and Rice’s theorem have some caveats. It is in fact possible to predict some properties of algorithms and programs, assuming they are well-behaved in some sense. In a lot of cases it involves them not having loops or having a finite and small number of branches. There is an entire field of computer science dedicated to developing methods to prove things about algorithms and to writing algorithms in which properties of interest are “trivial” and can be predicted.

Another exception to Turing’s halting problem and Rice’s theorem is given by the law of large numbers. If we have seen a large number of algorithms and seen the properties of their outcomes, we can make reasonably good predictions about the statistics of those properties in the future. For instance we cannot compute the trajectories of individual molecules in a volume, but we can have a really good idea about their temperature. Similarly, we can’t predict if a given person will become infected before a given day or if they will die of an infection, but we can predict the average number of infections and the 95% confidence interval for a population as a whole – assuming we did see enough outcomes and know the epidemiological interventions and adherence. That’s the reason, for instance, why we are pretty darn sure about how to limit COVID19 spread.

Swiss Cheese defense in depth against COVID19 by Ian M. Mackay

To push the point above further, scientists can often make really good predictions from existing data. Basically, that’s how the scientific process works – only theories with confirmed good predictive value are allowed to survive. For instance, back in February we didn’t know for sure what the probability of the hypothesis “COVID19 spreads less as temperature and humidity increase” was, nor what its effect on spread would be. We had, however, a really good idea that interventions such as masks, hand washing, social distancing, contact tracing, at-risk contacts quarantine, rapid testing and isolation would be effective – because of all we learnt about viral upper respiratory tract diseases over the last century and the complete carnage of the theories that were not good enough at allowing us to predict what would happen next.

There is however an important caveat to the “large numbers” approach. Before the laws of large numbers kick in and modelers’ probabilistic models become quantitative, stochastic fluctuations – for which models really can’t account – dominate, and stochastic events – such as super spreader events – play an outsized role, making forecasts particularly brittle. For instance, in France the early super spreader event in Mulhouse (mid-February) meant that Alsace-Moselle took the brunt of the ICU deaths in the first COVID19 wave, followed closely by the cramped and cosmopolitan Paris region. One single event, one single case. Predicting it and its effect would have been impossible, even with China-grade mass surveillance.

Mobilising against a pandemic - France's Napoleonic approach to covid-19 |  Europe | The Economist
March COVID19 deaths in France from The Economist
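
The two regimes described above (large numbers versus a handful of early cases) can be sketched with a toy simulation. It assumes every person gets infected independently with the same fixed probability, which is precisely the assumption super spreader events break; the 2% probability, the group sizes and the helper name `relative_spread` are made up for illustration:

```python
import random
import statistics


def relative_spread(population, p_infection=0.02, n_runs=200, seed=1):
    """Run-to-run variability of the total number of infections,
    expressed as standard deviation divided by the mean."""
    rng = random.Random(seed)
    totals = [
        sum(1 for _person in range(population) if rng.random() < p_infection)
        for _run in range(n_runs)
    ]
    return statistics.stdev(totals) / statistics.mean(totals)


small_group = relative_spread(population=50)
large_group = relative_spread(population=5_000)
print(f"50 people: {small_group:.2f}, 5000 people: {large_group:.2f}")
```

For the small group the run-to-run noise is about as large as the expected count itself, while for the large group it shrinks to a small fraction of it. No individual outcome became more predictable; only the aggregate did.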

If you want to hear a more rigorous and coherent version of the discussion about decidability, information, Laplace-Bayes-Solomonoff and about automated model design (aka how ML/AI mesh with all of that), you can check out Lê Nguyên Hoang and El Mahdi El Mhamdi. A lot of what I mention is presented rigorously here, or in French – here.

Problems with a major programming language version bump (Python 2>3)

About 10 years after the initial Python 3 release and about six months after the end of Python 2 support, I have finally bumped my largest and longest-running project to Python 3. Or at least I think so. Until I find some other bug in a rare execution path.

BioFlow is a Python project of mine that I have been running and maintaining on and off since 2013 – by now almost 7 years. It is heavily dependent on high-performance scientific computing libraries and the Python libraries providing bindings to them (cough scikits.sparse cough), and despite Python 3 having been out for a couple of years by the time I started working on it, none of the libraries I depended on supported it yet. So I got on with Python 2 and rolled with it for a number of years. By now, after several refactors, feature creep and optimization, with about 6.5 k LOC, 2.5 k lines of comments, 665 commits over 7 years and a solid 30% of test coverage, it is a middle-of-the-road Python work-horse library.

Like many other people running scientific computing libraries, I did see a number of things impacting the code: bugs being introduced into the libraries I depended on (hello library version pinning), performance degradation due to anti-Spectre mitigations on Intel CPUs, libraries disappearing for good (RIP bulbs), databases discontinuing support for the means of accessing them I was using (why, oh why neo4j did you drop REST) or the host system just crapping itself trying to install the old Fortran libraries that have not yet been properly packaged for it (hello Docker).

Overall, it taught me a number of things about programming craftsmanship, writing quality code and debugging code I forgot the details about. But that’s a topic for another post – back to Python 2 to 3 transition.

Python 2 was working just fine for me, but with its end of life coming near, and with proper async support and type hinting being added to Python 3, a switch to Python 3 seemed like a logical thing to do to ensure long-term support.

After several attempts to keep a codebase in Python 2 consistent with Python 3, I forked off a 2to3 branch and ran the 2to3 script on the main library. At first it seemed that it should have solved most of the issues:

  • print xyz was turned into print(xyz)
  • dict.iteritems() was turned into dict.items()
  • izip became zip
  • dict.keys() when fed to an enumerator was turned into list(dict.keys())
  • reader.next() was turned into next(reader)
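
For reference, this is roughly what those rewrites look like once applied, as a small self-contained Python 3 snippet (the variable names and the data are made up for illustration):

```python
inventory = {"apples": 3, "pears": 5}
names = ["a", "b"]
scores = [1, 2]

print(inventory)                       # print is a function, not a statement
pairs = sorted(inventory.items())      # items() replaces iteritems()
zipped = list(zip(names, scores))      # zip is lazy, as izip used to be

# keys() returns a view in Python 3, hence the defensive list(...) wrapper
indexed_keys = list(enumerate(list(inventory.keys())))

reader = iter(["header", "row1"])
first_line = next(reader)              # next(reader) replaces reader.next()
```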

So I gladly tried to start running my test suite, only to discover that it was completely broken:

  • string.lower("XYZ") now was "XYZ".lower()
  • file("fname", 'w') was now an open("fname", 'w')
  • but sometimes also open("fname", 'wr')
  • and sometimes open("fname", 'rt') or open("fname", 'rb') or open("fname", 'wb'), depending purely on the ingesting library
  • assertItemsEqual (a staple in my unit test suite, alongside assertDictEqual) disappeared into thin air (guess assertCountEqual will now have to do…)
  • wtf is even with pickle dumps ????
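
Most of the items above trace back to Python 3 separating text (str) from bytes. A minimal sketch of the modes that end up being needed, using a temporary directory and a made-up record (the file names and the data are illustrative only):

```python
import os
import pickle
import tempfile

record = {"gene": "TP53", "weight": 0.5}  # hypothetical payload

with tempfile.TemporaryDirectory() as tmp_dir:
    text_path = os.path.join(tmp_dir, "notes.txt")

    # Text mode: str in, str out, with an explicit encoding.
    with open(text_path, "wt", encoding="utf-8") as handle:
        handle.write("XYZ".lower())
    with open(text_path, "rt", encoding="utf-8") as handle:
        text_back = handle.read()

    # Binary mode: the very same file read back as raw bytes.
    with open(text_path, "rb") as handle:
        bytes_back = handle.read()

    # pickle produces bytes, so its files have to be opened in 'wb' / 'rb'.
    pickle_path = os.path.join(tmp_dir, "record.pkl")
    with open(pickle_path, "wb") as handle:
        pickle.dump(record, handle)
    with open(pickle_path, "rb") as handle:
        record_back = pickle.load(handle)

print(repr(text_back), repr(bytes_back), record_back == record)
```

Which of the two modes a given library expects is not something the interpreter can guess, which is presumably why an automatic converter leaves it to the developer.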

Not to be forgotten: to switch to Python 3 I had to unfreeze the dependencies I was building on top of, which came with their own cans of worms:

  • object.properties[property] now became an object._properties[property] in one of the libraries I heavily depended on (god bless whoever invented Ctrl-F and PyCharm for its context-aware class/object usage/definition search)
  • json dumps all of a sudden now require an explicit encoding, just as hashlib digests
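
The json and hashlib item is the same text-versus-bytes split seen from another angle. A minimal sketch, assuming a made-up payload and SHA-256 as the digest (the original code may well have used a different hash):

```python
import hashlib
import json

payload = {"node": "A", "score": 1}  # hypothetical data

# json.dumps returns text (str) in Python 3 ...
serialized = json.dumps(payload, sort_keys=True)

# ... while hashlib only accepts bytes, so the text has to be encoded explicitly.
digest = hashlib.sha256(serialized.encode("utf-8")).hexdigest()

print(serialized)
print(len(digest))  # 64 hex characters for SHA-256
```

Passing `serialized` to `hashlib.sha256` directly raises a `TypeError` in Python 3, whereas the Python 2 str type was already a byte string.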

And finally, after running my library for a couple of weeks, some previously un-executed branch triggered a bunch of exceptions arising from the fact that in Python 2 / meant an integer division, unless a float was involved, whereas in Python 3 / is always a float division and a // is needed to trigger an integer division.
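
In Python 3 terms, the difference looks like this (the numbers and names are arbitrary):

```python
total_items = 7
n_chunks = 2

true_quotient = total_items / n_chunks    # always a float in Python 3
floor_quotient = total_items // n_chunks  # integer (floor) division

print(true_quotient, floor_quotient)      # 3.5 3
print(6 / 2)                              # 3.0, a float even when it divides evenly
print(-7 // 2)                            # -4, floor division rounds down
```

A float quietly landing where an index or a count used to be is the kind of thing that only blows up once that particular branch actually runs.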

I can be in part blamed for those issues. A code with complete unit test coverage would have caught all of the exceptions in the unit-test phase and the integration tests would have caught problems in rare codepaths.

The problem is that no real-life library has total unit-test or integration-test coverage. The Python 3 transition trench warfare hell has killed a number of popular Python projects – for instance the Gourmet recipe manager (which I used to use myself). For hell’s sake, even Dropbox, which employed Guido himself and runs a multi-billion business on an almost pure Python stack, waited until the end of 2018 and took about a year to roll over.

The reality is that the debugging of a major language version bump is **really** different from anything a codebase encounters in its lifetime.

When you write a new feature, you test it out as you develop. Bugs appear as you add lines of code and you can track them down. When a dependency craps out, the bugs that appear are related to it. It is possible to wrap it and isolate the difference in its response to calls. Debugging is localized and traceable. When you refactor, you change the model of the problem and code organization in your head. The bugs that appear are once again directly triggered by your code modifications.

When the underlying language changes, the bugs appear **everywhere**. You don’t know which line of code could be the bugged one, and you miss bugs because some bugs obscure other bugs. So you have to do pass after pass after pass of your entire codebase, spending weeks and months tracking exceptions as they pop up, never sure if you have corrected all the bugs yet. It is hard, because you need to load the entire codebase in your head to search for bugs and be aware of the corner cases. It is demoralizing, because you are just trying to get to the point where your code already was, without improving it in any way.

It is pretty much a trench warfare hell – stuck in the same place, without clear advantage gained by debugging excursions at the limit of your mental capacities. It is unsurprising that a number of projects never made it to Python 3, especially niche ones made by non-developers and for non-developers – the kind of projects that made Python 2 a loved, universal language that surely would have a library that could solve your niche problem. The problem is so severe in the scientific community that there is a serious conversation in Nature about starting to use Python 2.7 to maximise project reproducibility, given it is guaranteed it will never change.

What could have been improved? As a rank-and-file (non-professional) developer of a niche, moderately complex library, here’s a couple of things that would have made my life **a lot** easier while bumping the Python version:

  • Provide a tool akin to 2to3 and make it the default path. It was far from perfect – sure. But it hammered out the bulk of the differences and allowed the code to at least start executing and me – to start catching bugs.
  • Unlike 2to3, it needs to annotate potential problems in the code that it could not resolve. 'rt' vs 'rb' was a direct consequence of the text vs byte separation in Python 3 and it was clear problems would arise with that. Same thing for / vs //. 2to3 should have at least highlighted the potential for problems. For my workflow, adding a # TODO: potential conflict that needs resolution would have gone a loooooong way.
  • Better even, roll out a syntax change in the old language version that will allow the developer to explicitly resolve the ambiguity so that the automated upgrade tools can get more out of the library
  • Don’t touch the unittest functions. They are the lifeblood of the debugging of the library after the language bump. If they bail out, getting them to work would require figuring out how the code they are covering works once again and defeats their purpose.
  • Make sure that the most wide-spread libraries in your ecosystem have performed a roll-over before pushing others to do the same.
  • Those libraries need to provide a “bump” version: aka with exactly the same call syntax from the user’s code, they would return exactly the same results both in the previous and the new version of the language. Aka the libraries should not be bumping their own major version at the same time they bump the supported language version.
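
To give an idea of what the “annotate what you could not resolve” wish could look like, here is a rough sketch built on the standard `ast` module. It is a hypothetical helper, not part of 2to3 or any existing tool; `flag_upgrade_risks` and the TODO wording are made up, and since it parses with the Python 3 grammar it only works on source that is already syntactically valid Python 3 (for instance after a 2to3 pass):

```python
import ast


def flag_upgrade_risks(source: str) -> list:
    """Return (line_number, message) pairs for constructs whose meaning
    changed between Python 2 and 3 and that deserve a manual look."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Div):
            findings.append(
                (node.lineno, "TODO: '/' is true division in Python 3; was '//' intended?")
            )
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "open"
            and len(node.args) < 2
            and not any(keyword.arg == "mode" for keyword in node.keywords)
        ):
            findings.append(
                (node.lineno, "TODO: open() without an explicit mode; text or bytes?")
            )
    return sorted(findings)


example = (
    "half = total / 2\n"
    "handle = open('data.bin')\n"
    "safe = open('data.txt', 'rt')\n"
)
findings = flag_upgrade_risks(example)
for line_number, message in findings:
    print(f"line {line_number}: {message}")
```

On the example above it flags lines 1 and 2 and leaves line 3 alone. It cannot tell which divisions are actually wrong; it only points at the places where a human has to decide.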

On masks and SARS-CoV-2

This comment was initially a response to a YouTube video from Tech Ingredients – a channel I have in the past thoroughly enjoyed for their in-depth dives into the scientific and engineering aspects of various engineering-heavy DIY projects. Unfortunately, I am afraid that panic around COVID19 has prevented a lot of people from thinking straight and I could not but disagree with the section on masks.

==

Hey there – Engineer turned biomedical scientist here. I absolutely love your videos and have been enjoying them a lot, but I believe that in this specific domain I should have enough experience to point out what appears to me to have been overlooked and is likely to change drastically your recommendation on masks.

First of all, the operating room masks and the standard medical masks are extremely different beasts – if anything, their capacity to filter out small particles, close in size to the droplets transporting COVID19 over the longest distances, is much closer to that of N95s than to that of standard medical masks:

masks filtration efficiency

The standard medical masks let through about 70% of droplets on the smaller end of those that can carry SARS-CoV-2. A decrease in exposure of such magnitude has not been associated with a statistically significant reduction in contagion rates in any respiratory transmitted disease.

So why are standard medical masks recommended for sick people? The main reason is that, in order to get into the air, the viral particles need to be aerosolized by a contaminated person coughing, sneezing or speaking. The mask does not do well at preventing small particles from getting in and out, but it will at least partially prevent the aerosolization, especially for larger droplets, which contain more viruses and are hence more dangerous.

Now, that means that if you really want to protect yourself, rather than using a mask, even a surgical one, it’s much better to use a full face shield – while useless against aerosolized particles suspended in the air, it will protect you from the largest and most dangerous droplets.

Why do medical people need them?
The reality is that without N95 masks and in immediate contact with patients, the risk of medical staff getting infected is pretty high even in what are considered “safe” areas – as is the risk of them passing the virus on to their colleagues and patients in those “safe” areas. If it is left to spread, due to the over-representation of serious cases in the hospital environment, it is not impossible that the virus will evolve into forms that lead to more serious symptoms. Even if we can’t protect the medical personnel, preventing those of them who are asymptomatic from spreading the virus is critical for everyone (besides – masks are also for patients – if you look at pictures from China, all patients wear them).

Second, why did the WHO not recommend the use of N95 masks to the general public at the beginning of this outbreak, whereas they did so for SARS-CoV during the 2002-2004 outbreak almost as soon as it became known to the West?

Unlike the first SARS-CoV, SARS-CoV-2 does not remain suspended in aerosols for prolonged periods of time: it does not form clouds of aerosolized particles that remain in suspension and can infect someone who passes through the cloud hours after the patient who spread it has left. For SARS-CoV-2, the droplets fall to the ground fairly rapidly – within a couple of meters and a couple of minutes (where they can be picked up – hence hand washing and gloves). Due to that, unlike SARS-CoV, SARS-CoV-2 transmission is mostly driven by direct face-to-face contact, with virus-containing droplets landing on the faces of people in direct contact.

The situation changes in hospitals and ICU wards – with a number of patients constantly aerosolizing, small particles do not have the time to fall, and the medical personnel are less than a couple of meters from patients due to space constraints. However, even in the current conditions, N95 masks are only used in aerosol-generating procedures, such as patient intubation.

Once again, for most people, a face shield, keeping several meters of distance and keeping your hands clean and away from your face are the absolute best bang for the buck there is, with everything else having significantly decreasing returns.

==

PS: since I wrote this, a number of science journalists have done an excellent job of researching the subject in depth and writing up their findings in an accessible manner:

In addition to that, a Nature study has recently been published, indicating that while masks are really good at preventing the formation of large droplets (yay), when it comes to the formation of small droplets (the type that can float for a little bit), they are not that great for influenza. The great news is that for coronavirus, since few droplets of that size are formed, masks work great at containing any type of viral particle emission: Nature Medicine Study.

Scale-Free networks nonsense or Science vs Pseudo-Science

(this article’s title is a nod to Lior Pachter’s vitriolic arc of 3 articles with a similar title)

Over the last couple of days I was engaged in a debate with Lê from Science4All about what exactly science was, that spun off from his interview with an evolutionary psychologist and my own vision of evolutionary psychology in its current state as a pseudo-science.

While not always easy and at times quite heated, this conversation was quite enlightening and led me to try to lay down

Following the recent paper about scale-free networks not being that widespread in practice (the gist of which I first got from Lior Pachter’s blog back in 2015) helped me to formalize a little better what I feel a pseudo-science is.

Just like the models and theories within the scientific method itself, something being a scientific approach is not defined or proved. Instead, similarly to the NIST definition of random numbers through a series of tests that all need to be passed, a scientific approach is often defined by what it is not, whereas pseudo-science is defined as something that tries to pass itself off as a scientific method but fails one or several tests.

Here are some of my rules of thumb for the criteria defining pseudo-science:

The model is significantly more complicated than what the existing data and prior knowledge would warrant. This is particularly true for generative models that do not build on deep pre-existing knowledge of the components.

The theory is a transplant from another domain where it worked well, without all the correlated complexity and without justifying that the transposition is still valid. Evolutionary psychology is a transplant from molecular evolutionary theory.

The success in another domain is advanced as the main argument for the applicability/correctness of the theory in the new domain.

The model claims are non-falsifiable.

The model is not incremental/emergent from a prior model.

There are no closely related, competing models that are considered upon application to choices.

The cases where the model fails are not defined and are not acknowledged. Evo psy – modification of the environment by humans. Scale-Free networks.

Back-tracking on the claims without changing the final conclusion. This is different from refining the model, where the change in the model gets propagated to the final conclusion and that conclusion is then re-compared with reality. Sometimes the model is then mended so that it aligns with reality again, but at least for a period the model is considered false.

Support by a cloud of plausible but refuted claims, rather than by a couple of strong claims that are currently hard to attack.

The defining feature of pseudo-science, however, especially compared to faulty science, is its refusal to accept criticism of, or limitations to, the theory and to change its predictions accordingly. It always needs to fit the final maxim, no matter the data.

Jupyter/Ipython notebooks

After writing it down a couple of weeks ago for Hacker News, here is the recap and some updates:

I am a computational biologist with a heavy emphasis on data analysis. I did try Jupyter a couple of years ago and here are my concerns with it, compared to my usual flow (PyCharm + pure Python + pickle to store the results of heavy processing).

  1. Extracting functions is harder
  2. Your git commits become completely borked
  3. Opening some data-heavy notebooks is nigh impossible once they have been shut down
  4. Importing other modules you have locally is pretty non-trivial
  5. Refactoring is pretty hard
  6. Sphinx for autodoc extraction is pretty much out of the picture
  7. Non-deterministic re-runs – depending on the cell execution order you can get very different results. That’s an issue when you come back to your code a couple of months later and try to figure out what you did to get there.
  8. Connecting to the IPython notebook, even from environments like PyCharm, is highly non-trivial, just like the mapping to the OS filesystem
  9. Hard to impossible to inspect the contents of the ipython notebook when it’s hosted on Github due to the encoding snafus
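
Point 7 is easy to reproduce outside of a notebook. Here is a minimal sketch in which each function stands in for a hypothetical notebook cell that mutates shared state; the cell names and the 0.25 threshold are made up for the illustration:

```python
# Each function stands in for one hypothetical notebook cell sharing global state.
state = {"values": [1.0, 2.0, 3.0, 4.0]}


def cell_normalize():
    """Cell A: scale the values so that they sum to 1."""
    total = sum(state["values"])
    state["values"] = [v / total for v in state["values"]]


def cell_drop_small():
    """Cell B: keep only the values that are at least 0.25."""
    state["values"] = [v for v in state["values"] if v >= 0.25]


def run(order):
    """Reset the shared state, then run the cells in the given order."""
    state["values"] = [1.0, 2.0, 3.0, 4.0]
    for cell in order:
        cell()
    return state["values"]


top_down = run([cell_normalize, cell_drop_small])      # two values survive
out_of_order = run([cell_drop_small, cell_normalize])  # all four values survive
print(top_down)
print(out_of_order)
```

The same two cells give a different final state depending only on the order in which they were executed, which is exactly what a plain top-to-bottom script rules out.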

There are likely work-arounds for most of these problems, but the issue is that with my standard workflow they are non-issues to start with.
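
For reference, the pickle part of that workflow can be as small as the sketch below. The helper name `cached`, the cache file name and the `heavy_processing` stand-in are all hypothetical, chosen only for the illustration:

```python
import os
import pickle
import tempfile


def cached(path, compute):
    """Return the pickled result stored at `path`, computing and storing it if absent."""
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    result = compute()
    with open(path, "wb") as handle:
        pickle.dump(result, handle)
    return result


calls = []  # records how many times the heavy step actually ran


def heavy_processing():
    """Hypothetical stand-in for an expensive analysis step."""
    calls.append(1)
    return {"mean": sum(range(1000)) / 1000}


cache_path = os.path.join(tempfile.mkdtemp(), "heavy_result.pkl")
first = cached(cache_path, heavy_processing)   # computes and writes the pickle
second = cached(cache_path, heavy_processing)  # loads from disk, no recomputation
```

The script can then be re-run from top to bottom at any time: the heavy step is skipped as long as the pickle exists, and deleting the file forces a recomputation.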

In my experience, Jupyter is pretty good if you rely only on existing libraries that you are piecing together, but once you need to do more involved development work, you are screwed.

How to upgrade MediaWiki – approximate 2018 guide

Unfortunately, unlike WordPress, MediaWiki doesn’t come with a single-button update. Perhaps because of that, perhaps because of my laziness, I have been postponing the updates of my MediaWiki websites for over five years by now. However, in the light of recent vulnerability revelations, I have finally decided to upgrade my installations and started trying to figure out what exactly I needed, given that I only have web interfaces and FTP access to the websites I manage.

First of all, this link gives a good overview of the whole process. For my specific case, I was upgrading to 1.30, which required a number of edits to the config file, explained here. Now, what seemed to be needed was that, after backing up my database (done for me by my hosting provider) and files (which I could do over FTP), I just had to take the files from the latest release version (REL1_30 in my case – DO NOT DO IT, see edit below), copy them to the directories via FTP and then run the database update script at wiki.mywebsite.org/mw-config/. Seems pretty easy, right?

Nope, not so fast! The problem is that this distribution does not contain a crucial directory that you need to run the installation and without which your wiki installation will fail with a 500 code without leaving anything in the error logs of the server.

This step isn’t really mentioned in the installation guide, but you actually need to remove the existing /vendor folder in your installation over FTP, fetch the latest version for your build with a git clone https://gerrit.wikimedia.org/r/p/mediawiki/vendor.git into a /vendor folder on your machine and then upload the files to your server.

Only after that step can you connect to /mw-config/ and finish upgrading the wiki.

So yeah, let’s hope that in the not-so-distant future MediaWiki will have the same handy ‘update now’ button as WordPress. Because something is telling me that there are A LOT of outdated MediaWiki installs out there…
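
If you would rather script the clone-and-upload part than click through an FTP client, here is a rough sketch using only the Python standard library. The FTP host, credentials and remote path in the usage comment are placeholders, and removing the old /vendor folder on the server is not covered; this is an illustration under those assumptions, not a tested deployment tool:

```python
import os
import posixpath
import subprocess
from ftplib import FTP, error_perm

VENDOR_REPO = "https://gerrit.wikimedia.org/r/p/mediawiki/vendor.git"  # URL from the post


def clone_vendor(target_dir):
    """Clone the vendor repository into target_dir (needs git and network access)."""
    subprocess.run(["git", "clone", VENDOR_REPO, target_dir], check=True)


def plan_upload(local_root, remote_root):
    """List the remote directories to create and the (local, remote) file pairs."""
    remote_dirs, file_pairs = [], []
    for current, dirnames, filenames in os.walk(local_root):
        # Skip the git metadata and walk the rest in a stable order.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        relative = os.path.relpath(current, local_root)
        if relative == ".":
            remote_current = remote_root
        else:
            remote_current = posixpath.join(remote_root, *relative.split(os.sep))
        remote_dirs.append(remote_current)
        for name in sorted(filenames):
            file_pairs.append((os.path.join(current, name),
                               posixpath.join(remote_current, name)))
    return remote_dirs, file_pairs


def upload(ftp, remote_dirs, file_pairs):
    """Create the directories, then upload every file in binary mode."""
    for remote_dir in remote_dirs:
        try:
            ftp.mkd(remote_dir)
        except error_perm:
            pass  # most likely the directory already exists
    for local_path, remote_file in file_pairs:
        with open(local_path, "rb") as handle:
            ftp.storbinary("STOR " + remote_file, handle)


# Hypothetical usage (host, login and paths are placeholders):
# clone_vendor("vendor")
# with FTP("ftp.mywebsite.org") as ftp:
#     ftp.login("user", "password")
#     upload(ftp, *plan_upload("vendor", "/wiki/vendor"))
```

Planning the upload separately from performing it makes it possible to check the list of remote paths before anything is sent to the server.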

Edit:

After spending a couple additional hours dealing with additional issues: do not use the “core” build, but instead download the complete one, including all the skins, extensions and vendor files from here.

Recommendation engine lock-in

Youtube’s recommendation engine, at least in my experience, has three modes:
– Suggest the channels whose content I’ve already watched
– Suggest content I’ve already watched, to watch again
– Suggest new updates on the playlists of which I’ve already watched several videos

Unfortunately, while it works very well when I’ve just discovered a couple of new channels and have their content chosen and pushed to me, it fails to deliver the experience of discovery – it’s overfitting my latest preferences, locking me into videos similar to what I have watched instead of suggesting new content and new types of content I might be interested in. And I experience the same problem with Quora’s recommendation engine (a couple of upvotes and my feed is almost exclusively army weapon tech).

I feel like the recommendation engine creators should abandon their blind faith in general algorithms and try to figure out how to create feeds that are interesting and engaging with respect to several categories of interest of their user, as well as covering several reasons I might be seeking a recommendation for what to watch (what everyone else is watching – to have something to discuss with my friends; discovering something new; following up on topics I am already interested in, …)

Synergy from the boot on Ubuntu

This one seemed to be quite trivial per the official blog, but the whole pipeline gets a bit more complicated once SSL enters the game. Here is how I made it work with Synergy and Ubuntu 14.04:

  • Configure the server and the client with the GUI application
  • Make sure the SSL server certificate fingerprint is stored in ~/.synergy/SSL/Fingerprints/TrustedServers.txt
  • Run sudo -su myself /usr/bin/synergyc -f --enable-crypto my.server.ip.address
  • After that check everything was working with sudo /usr/bin/synergyc -d DEBUG2 -f --enable-crypto my.server.ip.address
  • Finally add the greeter-setup-script=sudo /usr/bin/synergyc --enable-crypto my.server.ip.address line into the /etc/lightdm/lightdm.conf file under the [SeatDefaults] section
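
For the last step, here is a small sketch of making that edit programmatically with Python’s `configparser` instead of by hand. The helper name and the sample configuration contents are hypothetical, and note that `configparser` drops any comments present in the file, so this works on the text and leaves writing it back (as root) up to you:

```python
import configparser
import io


def add_greeter_script(conf_text, command):
    """Return lightdm.conf text with greeter-setup-script set under [SeatDefaults]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("SeatDefaults"):
        parser.add_section("SeatDefaults")
    parser.set("SeatDefaults", "greeter-setup-script", command)
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


# Hypothetical existing contents of /etc/lightdm/lightdm.conf
original = "[SeatDefaults]\nuser-session=ubuntu\n"
command = "sudo /usr/bin/synergyc --enable-crypto my.server.ip.address"
updated = add_greeter_script(original, command)
print(updated)
```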

Why shouldn’t you do it?

Despite the convenience, there seemed to be a bit of interference with the keyboard commands and their interpretation on my side, so since my two computers are side by side and since I have a USB button switch from before I got Synergy, I’ve decided to manually start Synergy every time I log in.

Linux server security

DISCLAIMER: I AM NOT AN INFOSEC EXPERT. THIS ARTICLE IS MORE OF A MEMO FOR MYSELF. IF YOU LOSE DATA OR HAVE A BREACH, I BEAR NO RESPONSIBILITY FOR IT.

Now, because of all the occasions on which I had to act as a makeshift sysadmin, I did end up reading a number of policies and picking up some advice that I wanted to group in a single place, if only for my own memory.

Installation:

  • Use an SELinux-enabled distro
  • Use an intrusion prevention tool, such as Fail2Ban
  • Configure primary and secondary DNS
  • Switch away from password-protected SSH to key-based SSH log-in. Disable root login altogether (/etc/ssh/sshd_config, PermitRootLogin no). Here is an Ubuntu/OpenSSH guide.
  • Remove network super-service packages
  • Disable Telnet and FTP (SFTP should be used)
  • use chroot where available, notably for webservers and FTP servers
  • encrypt the filesystem
  • disable remote root login
  • disable sudo su – all the root actions need to be done with a sudo
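
As a memo for the SSH items above, here is a minimal sketch of a check that could be run against the contents of /etc/ssh/sshd_config. The function name and the sample excerpt are hypothetical, checking `PasswordAuthentication no` is my assumption about how the switch to key-based log-in gets enforced, and `Match` blocks are ignored, so treat it as a quick sanity check rather than an audit:

```python
def audit_sshd_config(text):
    """Return warnings for settings that contradict the checklist above."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) == 2:
            # sshd keeps the first value it reads for a keyword;
            # keywords are case-insensitive.
            settings.setdefault(parts[0].lower(), parts[1].strip().lower())
    warnings = []
    if settings.get("permitrootlogin") != "no":
        warnings.append("PermitRootLogin is not set to 'no'")
    if settings.get("passwordauthentication") != "no":
        warnings.append("PasswordAuthentication is not set to 'no'")
    return warnings


# Hypothetical excerpt of /etc/ssh/sshd_config
sample = """
# Authentication
PermitRootLogin yes
PasswordAuthentication no
"""
findings = audit_sshd_config(sample)
print(findings)
```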

Audit:

  • Once the server has been built, run Lynis. It will audit your system and suggest additional steps to protect your machine
  • Force multi-factor authentication for root users, especially via SSH. Here is a tutorial from Digital Ocean.

Watching the logs:

If you have more than one logging system to watch:

Configuring PyCharm for remote development

I do most of my programming from my Windows laptop and/or desktop computer. However, in order to be able to develop anything sane, I need to operate fully in Linux. I used to have to dual-boot or even to have two machines, but now that I have access to a stable server I can safely ssh into, I would rather just use my IDE to develop directly on it. Luckily enough for me, PyCharm has an option for it.

How to do this is pretty straightforward and well explained on the PyCharm blog and in the docs explaining how to configure a remote server that is not a Vagrant box.

There are three steps in the configuration:

  • setting up the deployment server and auto-update
  • setting up the remote interpreter
  • setting up the run configuration

Setting up the deployment server:

Tools | Deployment | Configuration > configure your sftp server, go ahead and perform the root autodetection (usually /home/uname) and uncheck the “available only for this project” option. You will need that last option in order to configure the remote interpreter. Then go into the mappings and set up the equivalence mappings for the project, but be aware that the home from the previous screen, if filled in, will be prepended to any path you try to map to on the remote server. So if you want your project to go to /home/uname/PycharmProjects/my_project and your root is /home/uname/, the path you are mapping to needs to be /PycharmProjects/my_project.

Now, head to Tools | Deployment and enable the automatic upload, so that every edit you make on your machine is constantly uploaded to the remote server.
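
The prepending rule is easy to get wrong, so here is a tiny sketch that models it. This is only an illustration of the behaviour described above, not PyCharm’s actual code, and the `remote_path` helper is a made-up name:

```python
import posixpath


def remote_path(root, mapping):
    """Model of the rule above: the deployment root is prepended to the mapped path."""
    return posixpath.join(root, mapping.lstrip("/"))


root = "/home/uname/"
good = remote_path(root, "/PycharmProjects/my_project")
bad = remote_path(root, "/home/uname/PycharmProjects/my_project")
print(good)  # /home/uname/PycharmProjects/my_project
print(bad)   # /home/uname/home/uname/PycharmProjects/my_project
```

Mapping to the full absolute path ends up duplicating the root, which is why the mapping has to be given relative to it.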

Setting up the remote interpreter:

Head to File | Settings | Project | Interpreter, click on the cogwheel and click on add remote. At that point, by default, PyCharm will fill in the properties from the “deployment configuration”. In my case I needed to tweak the Python interpreter path a bit, since I use Anaconda Python (scientific computing). If, like me, you use Anaconda2 and store it in your home directory, you will need to replace the interpreter path with /home/uname/anaconda/bin/python. At that point, just click save and you are good for this part.

Setting up the run configuration:

With the previous two steps finished, go into Run | Edit configuration, add the main running script to the Script field, check that the Python interpreter is configured to be the remote one, then click on the three small dots next to the “path mappings” field and fill it out, at least with the location of the script on your machine mapped to its location on the remote.

That’s it, you are good to go!