Scale-free networks nonsense, or Science vs Pseudo-Science

(this article’s title is a nod to Lior Pachter’s vitriolic arc of three articles with similar titles)

Over the last couple of days I was engaged in a debate with Lê from Science4All about what exactly science is, which spun off from his interview with an evolutionary psychologist and my own view of evolutionary psychology, in its current state, as a pseudo-science.

While not always easy, and at times quite animated, this conversation was quite enlightening and led me to try to lay down what I believe separates science from pseudo-science.

The recent paper about scale-free networks not being as widespread in real environments as claimed (whose gist I first got from Lior Pachter’s blog back in 2015) helped me formalize a little better what I feel a pseudo-science is.

Just like the models and theories within the scientific method itself, being a scientific approach is not something that is positively defined or proved. Instead, similarly to the NIST definition of random numbers via a series of tests that all need to be passed, a scientific approach is often defined by what it is not, whereas pseudo-science is defined as something that tries to pass itself off as a scientific method but fails one or several of the tests.

Here are some of my rules of thumb for the criteria defining pseudo-science:

The model is significantly more complicated than what the existing data and prior knowledge warrant. This is particularly true for generative models that do not build on deep pre-existing knowledge of the components.

The theory is a transplant from another domain where it worked well, without all the correlated complexity and without justification that the transposition is still valid. Evolutionary psychology, for instance, is a transplant from molecular evolutionary theory.

The success in another domain is advanced as the main argument for the applicability/correctness of the theory in the new domain.

The model claims are non-falsifiable.

The model is not incremental/emergent from a prior model.

No closely related, competing models are considered when the model is applied to concrete choices.

The cases where the model fails are neither defined nor acknowledged. For evolutionary psychology – the modification of the environment by humans; similarly for scale-free networks.

Back-tracking on claims without changing the final conclusion. This is different from refining the model, where a change in the model gets propagated to the final conclusion and that conclusion is then re-compared with reality. Sometimes mends are made to the model for it to align with reality again, but at least for a period, the model is considered false.

Support by a cloud of plausible but refuted claims rather than a couple of strong claims that are currently hard to attack.

The defining feature of pseudo-science, however, especially compared to merely faulty science, is its refusal to accept criticism of or limitations to the theory and change its predictions accordingly. It always needs to fit the final maxim, no matter the data.

Jupyter/Ipython notebooks

After writing this down a couple of weeks ago for Hacker News, here is a recap with some updates:

I am a computational biologist with a heavy emphasis on data analysis. I tried Jupyter a couple of years ago and here are my concerns with it, compared to my usual flow (PyCharm + pure Python + pickle to store results of heavy processing).

  1. Extracting functions is harder
  2. Your git commits become completely borked
  3. Opening some data-heavy notebooks is nigh impossible once they have been shut down
  4. Importing other local modules is pretty non-trivial.
  5. Refactoring is pretty hard
  6. Sphinx for autodoc extraction is pretty much out of the picture
  7. Non-deterministic re-runs – depending on the cell execution order you can get very different results. That’s an issue when you come back to your code a couple of months later and try to figure out what you did to get there.
  8. Connecting to the IPython notebook, even from environments like PyCharm, is highly non-trivial, as is the mapping to the OS filesystem
  9. Inspecting the contents of an IPython notebook hosted on GitHub is hard to impossible due to encoding snafus

There are likely work-arounds for most of these problems, but the issue is that with my standard workflow they are non-issues to start with.
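For what it’s worth, the pickle part of that workflow amounts to one small caching helper. Here is a minimal sketch – the `cached` helper and the file names are my own illustrative choices, not any particular library’s API:

```python
import os
import pickle

def cached(path, compute):
    """Return the pickled result at `path` if it exists;
    otherwise run `compute`, pickle its result, and return it."""
    if os.path.exists(path):
        with open(path, 'rb') as source:
            return pickle.load(source)
    result = compute()
    with open(path, 'wb') as sink:
        pickle.dump(result, sink)
    return result

# The heavy step runs once; later runs of the script just load
# the pickle instead of recomputing.
# counts = cached('counts.pkl', lambda: heavy_processing(raw_data))
```

This keeps re-runs deterministic: the script always executes top to bottom, and the expensive intermediates survive between sessions.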

In my experience, Jupyter is pretty good if you rely only on existing libraries that you are piecing together, but once you need to do more involved development work, you are screwed.

How to upgrade MediaWiki – approximate 2018 guide

Unfortunately, unlike WordPress, MediaWiki doesn’t come with a one-button update. Perhaps because of that, perhaps because of my laziness, I have been postponing updates of my MediaWiki websites for over five years by now. However, in light of recent vulnerability revelations, I have finally decided to upgrade my installations and started trying to figure out what exactly I needed, given I only have web interfaces and FTP access to the website I manage.

First of all, this link gives a good overview of the whole process. In my specific case, I was upgrading to 1.30, which required a number of edits to the config file, explained here. What seemed to be needed was that after backing up my database (done for me by my hosting provider) and files (which I could do over FTP), I just had to take the files from the latest release version (REL1_30 in my case – DO NOT DO IT, see the edit below), copy them to the directories via FTP, and then run the database update script at wiki.mywebsite.org/mw-config/. Seems pretty easy, right?

Nope, not so fast! The problem is that this distribution does not contain a crucial directory that you need to run the installation, and without it your wiki installation will fail with a 500 code without leaving anything in the server’s error logs.

This step isn’t really mentioned in the installation guide, but you actually need to remove the existing /vendor folder in your installation over FTP, fetch the latest vendor files for your release with a git clone https://gerrit.wikimedia.org/r/p/mediawiki/vendor.git into a /vendor folder on your machine, and then upload the files to your server.

Only after that step can you connect to /mw-config/ and finish upgrading the wiki.

So yeah, let’s hope that in a not-so-distant future MediaWiki will have the same handy ‘update now’ button as WordPress. Because something tells me that there are A LOT of outdated MediaWiki installs out there…

Edit:

After spending a couple of additional hours dealing with further issues: do not use the “core” build; instead, download the complete one, including all the skins, extensions and vendor files, from here.

Recommendation engine lock-in

Youtube’s recommendation engine, at least in my experience, has three modes:
– Suggest channels whose content I’ve already watched
– Suggest content I’ve already watched, to watch again
– Suggest new updates to playlists from which I’ve already watched several videos

Unfortunately, while it works very well when I’ve just discovered a couple of new channels and want their content chosen and pushed to me, it fails to deliver the experience of discovery – it overfits my recent preferences, locking me into videos similar to what I have already watched instead of suggesting new content, and new types of content, I might be interested in. I experience the same problem with Quora’s recommendation engine (a couple of upvotes and all of my feed is almost exclusively army weapon tech).

I feel like recommendation engine creators should abandon their blind faith in general-purpose algorithms and try to figure out how to create feeds that are interesting and engaging across several categories of their user’s interests, as well as covering the several reasons one might be seeking a recommendation of what to watch (what everyone else is watching, to have something to discuss with friends; discovering something new; following up on topics I am already interested in, …).
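To make the idea concrete, here is a toy sketch of what such a feed builder could look like. The round-robin scheme, the 20% exploration share, and all the names are my own assumptions, not any platform’s actual algorithm:

```python
import itertools

def diversified_feed(known_interests, exploration_pool, n_slots=10, explore_share=0.2):
    """Fill a feed by cycling round-robin over the user's known interest
    categories, reserving a share of slots for never-seen content."""
    n_explore = max(1, round(n_slots * explore_share))
    n_known = n_slots - n_explore
    feed = []
    queues = [list(items) for items in known_interests.values()]
    for queue in itertools.cycle(queues):
        # stop once the known-interest quota is filled or everything is used up
        if len(feed) >= n_known or not any(queues):
            break
        if queue:
            feed.append(queue.pop(0))
    # dedicate the remaining slots to discovery
    feed.extend(exploration_pool[:n_explore])
    return feed
```

The point is not this specific mixing rule, but that the feed is engineered around multiple interest categories plus an explicit discovery budget, rather than a single engagement score.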

Synergy from boot on Ubuntu

This one seemed to be quite trivial per the official blog, but the whole pipeline gets a bit more complicated once SSL enters the game. Here is how I made it work with Synergy on Ubuntu 14.04:

  • Configure the server and the client with the GUI application
  • Make sure SSL server certificate fingerprint was stored in the ~/.synergy/SSL/Fingerprints/TrustedServers.txt
  • Run sudo -su myself /usr/bin/synergyc -f --enable-crypto my.server.ip.address
  • After that check everything was working with sudo /usr/bin/synergyc -d DEBUG2 -f --enable-crypto my.server.ip.address
  • Finally add the greeter-setup-script=sudo /usr/bin/synergyc --enable-crypto my.server.ip.address line into the /etc/lightdm/lightdm.conf file under the [SeatDefaults] section

Why you shouldn’t do it

Despite the convenience, there seemed to be a bit of interference with keyboard commands and command interpretation on my side, so since my two computers sit side by side and I have a USB button switch from before I got Synergy, I’ve decided to manually start Synergy every time I log in.

Linux server security

DISCLAIMER: I AM NOT AN INFOSEC EXPERT. THIS ARTICLE IS MORE OF A MEMO FOR MYSELF. IF YOU LOSE DATA OR HAVE A BREACH, I BEAR NO RESPONSIBILITY FOR IT.

Now, because of all the occasions on which I had to act as a makeshift sysadmin, I ended up reading a number of policies and picking up some advice I wanted to group in a single place, if only for my own memory.

Installation:

  • Use an SELinux-enabled distro
  • Use an intrusion prevention tool, such as Fail2Ban
  • Configure primary and secondary DNS
  • Switch away from password-protected SSH to key-based SSH log-in. Disable root login altogether (/etc/ssh/sshd_config, PermitRootLogin no). Here is an Ubuntu/OpenSSH guide.
  • Remove network super-service packages
  • Disable Telnet and FTP (SFTP should be used)
  • use chroot where available, notably for webservers and FTP servers
  • encrypt the filesystem
  • disable remote root login
  • disable sudo su – all root actions need to be done with sudo

Audit:

  • Once the server has been built, run Lynis. It will audit your system and suggest additional steps to protect your machine
  • Force multi-factor authentication for root accounts, especially via SSH. Here is a tutorial from DigitalOcean.

Watching the logs:

If you have more than one logging system to watch:

Configuring PyCharm for remote development

I do most of my programming from my Windows laptop and/or desktop computer. However, in order to be able to develop anything sane, I need to operate fully in Linux. I used to have to dual-boot or even keep two machines, but now that I have access to a stable server I can safely SSH into, I would rather just use my IDE to develop directly on it. Luckily for me, PyCharm has an option for that.

The how-to is pretty straightforward and well explained on the PyCharm blog, with docs on configuring a remote server that is not a Vagrant box.

There are three steps in the configuration:

  • setting up the deployment server and auto-update
  • setting up the remote interpreter
  • setting up the run configuration

Setting up the deployment server:

Tools | Deployment | Configuration > configure your SFTP server, go ahead and perform the root autodetection (usually /home/uname), and uncheck “Available only for this project”. You will need that last option unchecked in order to configure the remote interpreter. Then go into the mappings and set the equivalence mappings for the project, but be aware that the root from the previous screen, if filled, will be prepended to any path you try to map to on the remote server. So if you want your project to go to /home/uname/PycharmProjects/my_project and your root is /home/uname/, the path you map to needs to be /PycharmProjects/my_project.

Now head to Tools | Deployment and click “Automatic upload”, so that every edit you make on your machine is automatically uploaded to the remote server.

Setting up the remote interpreter:

Head to File | Settings | Project | Interpreter, click on the cogwheel and click “Add remote”. By default, PyCharm will fill in the properties from the deployment configuration. In my case I needed to tweak the Python interpreter path a bit, since I use Anaconda Python (scientific computing). If, like me, you use Anaconda2 and store it in your home directory, you will need to replace the interpreter path with /home/uname/anaconda/bin/python. At that point, just click save and you are good for this part.

Setting up the run configuration:

With the previous two steps finished, go into Run | Edit configuration, add the main running script to the Script field, check that the Python interpreter is configured to be the remote one, then click on the three small dots next to the “Path mappings” field and fill it out, at least with the location of the script on your machine mapped to its location on the remote.

That’s it, you are good to go!

Health Data interpretation

I used to like to use Tactio Health App back in the day, before the introduction of the Apple Health Kit.

However, after getting a more modern iPhone and installing it there, I realized that despite the fact that Tactio Health was reading tons of data from the Health app, it was only writing weight back to it. So all of my blood pressure measurements, blood analyses, and so on were locked inside the app, and it had no intention of sharing them.

Scanning the App Store for apps that would cover that angle actually led me to a realization – there are tons of copycat apps with slightly different flavors covering four major directions: workout tracking/guidance, weight loss/gain, period tracking, and baby-related apps.

All in all, there are no lifestyle tracking apps to keep an eye on your habits and warn when you are getting into a lifestyle that would lead to dire health consequences. And there is even less collaboration between apps that try to do it – and Tactio Health is a case in point.

More interestingly, it looks like there is no market right now for that kind of app – either the users are already bent on keeping their health intact and don’t need any reminders, or they are so hopelessly behind that the “you are too bad” tone of the current apps is way too discouraging.

At the same time, I can understand the reticence of the users to put their health data out there, in the wild, while knowing that potentially this data can be used to deny them coverage in the future or drive their premiums up.

Food/activity tracking apps

I am back to trying to get insight into quantifying my life and am running into the same problem I always used to experience with activity/food trackers in the past. They are simply not made to encourage people to make and maintain changes. Just a couple of problems to start with:

  • The activity tracking suggests at least ~150 minutes of cardio per week. For a new user just switching from a sedentary lifestyle to an active one, this will be deadly: the most they can carry out is about 60 minutes of cardio for the first month and a half. Trying to get to 150 minutes is a guaranteed recipe for failing to adhere beyond the first week or so, either for lack of will or because they will hurt themselves by ramping up too fast. A better way would be to take ~2 weeks of monitoring of uninterrupted sessions, then suggest a ramp-up that gradually improves the user’s habits in a way that sticks in the long run.
  • In my own experience, the reason a lot of people end up in pretty bad shape is not necessarily that they don’t know any better; it’s that they don’t have the time, with work and other occupations constantly making self-care slide to the end of their priority list. A lot of activity/food tracking solutions require a lot of active input from the user and, because of that, tend to have a low adherence rate, especially in the long term. A much better option would be long-term monitoring that requires almost no active input.
  • Specifically for the food trackers – the lack of a unified repository of products, and of the ability to enter fractional amounts consumed. In some of them I was able to find teas that contain cholesterol (WTF?), but wasn’t able to see what was actually in a product unless I reviewed the labels.
  • And as per usual, the current state of the trackers is deplorable when it comes to measuring anything beyond calories. A lot of “healthy” foods are healthy not so much because they contain fewer calories, but because they contain a lot of micro-elements and vitamins that cover nutritional needs and prevent cravings in the long run.
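The monitoring-then-ramp-up idea from the first point can be sketched in a few lines. This is purely hypothetical: the 10% weekly increase, the 10-minute floor, and the function name are illustrative assumptions, not an actual training guideline:

```python
def rampup_plan(baseline_minutes, target_minutes=150, weekly_increase=0.10, n_weeks=16):
    """Weekly cardio targets that grow ~10% per week from the user's
    monitored baseline instead of demanding the full target on day one."""
    plan = []
    minutes = max(baseline_minutes, 10.0)  # never suggest starting from zero
    for week in range(1, n_weeks + 1):
        plan.append((week, round(min(minutes, target_minutes))))
        minutes *= 1 + weekly_increase
    return plan

# e.g. a user monitored at 60 min/week ramps gradually toward 150
```

The key design choice is that the starting point comes from the monitoring period, so the first suggested week is always achievable for that particular user.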

Bonus point: the Apple Health app unifying different apps. That doesn’t seem like much, but it definitely stitches all the apps together, making sure information flows inside the Health app ecosystem, allowing me to log an activity once, as opposed to 3-4 times before, and still benefit from the best of all the apps without having to deal with the worst.

Sleep monitors and internet of things

I do think that sleep monitors should not require an active action from the user to activate them every night. Instead, it should be something that runs in the background – like the GPS or pedometer in your phone for walking-distance monitoring.

Hence I see a tool with the two following functions:

  • movement detection for the quality of sleep computation
  • light detection, in order to figure out when you are sleeping or could be potentially sleeping
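A minimal sketch of how those two signals could be combined into a passive detector follows. The thresholds, units, and names are illustrative assumptions, not a real device’s API:

```python
def likely_asleep(movement, lux, movement_threshold=0.05, lux_threshold=10.0):
    """Heuristic: asleep when the room is dark and the accelerometer is quiet."""
    return movement < movement_threshold and lux < lux_threshold

def sleep_quality(samples):
    """Fraction of sampled minutes classified as asleep – a crude quality proxy
    computed entirely in the background, with no user action required."""
    if not samples:
        return 0.0
    flags = [likely_asleep(movement, lux) for movement, lux in samples]
    return sum(flags) / len(flags)
```

Both functions run on data the device can collect anyway, which is the whole point: no nightly “start sleep tracking” button.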