Installing a parallel Python 2.7 stack + a couple of modules under CentOS 6

On CentOS 5 and 6 you unfortunately cannot replace the default version of Python with a newer one, because the package manager “yum” depends on it. The only way to go is an altinstall, which puts the new interpreter alongside the system one. The following article describes it really well:

http://toomuchdata.com/2012/06/25/how-to-install-python-2-7-3-on-centos-6-2/

In order to make the python2.7 command available to root (typically for module installation), add /usr/local/bin, where the altinstall places python2.7, to root’s PATH:

PATH=$PATH:/usr/local/bin
export PATH
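A quick way to check that the interpreter is actually reachable once /usr/local/bin is on the PATH is sketched below. This is only an illustration: the helper name `find_interpreter` is made up for this post, and it relies on `shutil.which`, which exists in Python 3.3+ only, so it has to be run with a newer interpreter than the 2.7 being installed.

```python
import os
import shutil


def find_interpreter(name="python2.7", extra_dir="/usr/local/bin"):
    """Look for `name` on the current PATH extended with `extra_dir`.

    Returns the full path of the executable, or None if it is not found.
    Both default values are assumptions taken from the altinstall setup above.
    """
    search_path = os.pathsep.join([os.environ.get("PATH", ""), extra_dir])
    return shutil.which(name, path=search_path)
```

If `find_interpreter()` returns `None`, the altinstall most likely went to a different prefix than /usr/local.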

Now you can safely install all the fancy python modules you want. Well, almost all.

Building SciPy against an alternative install of Python isn’t really a piece of cake either, since it first requires installing the LAPACK, ATLAS and BLAS packages, which is not entirely straightforward for complete newcomers. This tutorial explains exactly how to do it:

http://www.shocksolution.com/2011/08/how-to-build-scippy-with-python-2-7-2-on-centos5/

Btw, once you’ve installed all the modules listed in the link above, you can just do

sudo pip install scipy

and wait until it finishes compiling.

Enjoy!

Mastering Groovy

So, since I want to work with neo4j through bulbs, it seems that I have no other option but to use Groovy Gremlin.

Installation of Groovy in Eclipse: through the Marketplace. Quite easy.

First attempt at using it: install Gremlin from TinkerPop and access it from the Groovy shell in Eclipse. After about an hour of furious googling, it seems that a couple of libraries need to be added to the Groovy shell classpath to launch Gremlin from within Groovy:

gremlin$ groovysh -cp $GREMLIN_HOME/lib/gremlin-groovy-2.3.0.jar:$GREMLIN_HOME/lib/gremlin-java-2.3.0.jar:$GREMLIN_HOME/lib/pipes-2.3.0.jar:$GREMLIN_HOME/lib/common-1.7.jar:$GREMLIN_HOME/lib/groovy-1.8.9.jar

To do the same thing from Eclipse, go to Project > Properties > Java Build Path > Add External JARs and add:

  • $GREMLIN_HOME/lib/gremlin-groovy-2.3.0.jar
  • $GREMLIN_HOME/lib/gremlin-java-2.3.0.jar
  • $GREMLIN_HOME/lib/pipes-2.3.0.jar
  • $GREMLIN_HOME/lib/common-1.7.jar
  • $GREMLIN_HOME/lib/groovy-1.8.9.jar

Murky waters of systems biology

    I am currently trying to parse the Reactome.org OWL database file into a format more suited to my needs. So far I have been experiencing some major difficulties because of a lack of rigor in the organisation of classes in this resource, at least in the BioPAX .owl export file.

    First, the obscure use of the “memberPhysicalEntity” attribute. Some of the proteins, complexes and smallMolecules are in fact whole classes of proteins, often with undefined functions, yet they are mentioned in the reactions. This means I have to track them down by hand and use proxy objects (which are not real groups of proteins).
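Tracking these generic entities down does not have to be fully manual. Below is a minimal sketch, assuming the export is BioPAX Level 3 RDF/XML with the usual `bp` and `rdf` namespaces; the function name, the sample document and the IDs in it are made up for illustration and are not taken from the Reactome file.

```python
import xml.etree.ElementTree as ET

# Namespaces used by BioPAX Level 3 RDF/XML (assumed to match the export).
BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Tiny hypothetical document, only here to show the expected shape.
SAMPLE = b"""<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:Protein rdf:ID="Protein1">
    <bp:memberPhysicalEntity rdf:resource="#Protein2"/>
    <bp:memberPhysicalEntity rdf:resource="#Protein3"/>
  </bp:Protein>
  <bp:Protein rdf:ID="Protein2"/>
  <bp:Protein rdf:ID="Protein3"/>
</rdf:RDF>"""


def find_generic_entities(owl_source):
    """Map each entity ID to the IDs listed under its memberPhysicalEntity.

    `owl_source` is a file name or a binary file object.
    Entities without any memberPhysicalEntity child are left out.
    """
    root = ET.parse(owl_source).getroot()
    generic = {}
    for element in root:  # top-level individuals of the RDF document
        entity_id = element.get("{%s}ID" % RDF) or element.get("{%s}about" % RDF)
        members = [
            child.get("{%s}resource" % RDF).lstrip("#")
            for child in element.findall("{%s}memberPhysicalEntity" % BP)
            if child.get("{%s}resource" % RDF)
        ]
        if entity_id and members:
            generic[entity_id] = members
    return generic
```

Calling `find_generic_entities(io.BytesIO(SAMPLE))` (with `import io`) lists `Protein1` as a generic entity with two members. Deciding which of those are real protein groups still has to be done by hand.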

    Second, a mixture of:

    • Physical entities defined by a unique structure, for instance proteins as defined by a UniProt identifier
    • Instantiated physical entities: the ones that carry a post-translational modification or are localized to a particular cellular compartment
    • Fragments of physical entities, for instance alpha chains of different proteins
    • Collections of fragments of physical entities, such as the collection of all the alpha chains found within the database

    Third, some redundancy in the use of OWL terms. For instance, Catalysis, Control, TemplateReactionRegulation and Degradation are all used in a similar fashion, even though Catalysis is a modulation of the kinetic barrier of a reaction while the others can completely perturb the reaction. What is the reason for this redundancy of terms? It is not very clear…

    Fourth, a lack of information essential to a class in the class description, aka the “headless Horseman” problem:

    • Post-translational modifications on a protein, provided without location (i.e. somewhere) and without type (some modification).

    Last, but not least, lots of “floating” compounds. There are about 6000 compounds (Proteins, Complexes or PhysicalEntities) that are referenced by exactly one reaction. This means they participate in, or regulate, only that single reaction and no other, which seems quite unrealistic to me.
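Counting such “floating” compounds boils down to inverting the reaction-to-entity mapping. A minimal sketch, assuming the reactions have already been parsed into a plain dict (the dict layout and the function name are assumptions for illustration, not the format of the Reactome file):

```python
from collections import defaultdict


def floating_entities(reactions):
    """Return the entities that are referenced by exactly one reaction.

    `reactions` maps a reaction ID to an iterable of the entity IDs it
    references (participants and regulators alike).
    """
    referenced_by = defaultdict(set)
    for reaction_id, entity_ids in reactions.items():
        for entity_id in entity_ids:
            referenced_by[entity_id].add(reaction_id)
    # A set per entity means repeated mentions inside one reaction count once.
    return sorted(
        entity_id
        for entity_id, reaction_ids in referenced_by.items()
        if len(reaction_ids) == 1
    )
```

For instance, with `{"R1": ["A", "B"], "R2": ["B", "C"], "R3": ["C"]}` only `A` is floating, since `B` and `C` each appear in two reactions.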

    I have now spent about two weeks getting it all in order, and I’ll try to publish the resulting cleaned file once I am done.

    Linkage between eczema and low platelet count

    Ever since my master’s course on advanced immunology by Nicole Harris at EPFL, I have had the impression that platelets play an important role in the immune response but are still a completely under-explored domain. This impression was confirmed when I was working with Cedric Merlot on building a predictive systems biology method for predicting systemic drug effects based on multiple protein interactions. In fact, as a side result, our method suggested a strong linkage between platelets and serotonin, and also pointed out that it might be heavily linked with the rest of the immune response system.

    While reading a paper on a completely unrelated subject, I came across a syndrome that looks like it is caused by a single protein, but that links eczema and low platelet count. Now, that gets really interesting.

    Wiskott–Aldrich syndrome (WAS) is a rare X-linked recessive disease characterized by eczema, thrombocytopenia (low platelet count), immune deficiency, and bloody diarrhea (secondary to the thrombocytopenia).

    A couple of concepts of efficient programming

    As I am trying to use python to build a rather large software solution for use in bioinformatics, I am slowly realizing that there are lots of concepts I really need, that were never taught in my CS courses. Among them:

    Python – D3.js integration

    Python provides a pretty common framework for biological data analysis, and D3.js is one of the most common platforms for visualising large amounts of data. So I have been looking for a way to make them work together. The post linked below gives a pretty decent introduction to D3.js visualisation for people totally unfamiliar with JavaScript. It also suggests a possible way of interfacing D3.js with Python: Python writes the data, in JSON format, as a JavaScript pseudo-library into the root folder containing the HTML page.
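The file-based idea can be sketched roughly as follows. This is my own minimal illustration, not the code from the linked post: the function name, the `data.js` file name and the `data` variable name are all assumptions.

```python
import json
import os


def write_data_js(data, folder, filename="data.js", var_name="data"):
    """Dump `data` as a JavaScript file that defines one global variable.

    The HTML page in `folder` can then load it with a plain <script> tag
    and hand the variable to D3.js. Returns the path of the written file.
    """
    path = os.path.join(folder, filename)
    with open(path, "w") as handle:
        # json.dumps output is also a valid JavaScript literal here.
        handle.write("var %s = %s;\n" % (var_name, json.dumps(data)))
    return path
```

The upside of this approach is that no server-side logic is needed; the downside is that the file has to be regenerated every time the data changes.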

    An alternative approach is to send the JSON directly to the webpage’s JavaScript via Python’s built-in server, but this requires a little bit more work. In any case, I will be looking at it in more depth shortly.
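A rough idea of what the server-based approach could look like, using only the standard library. This is a sketch under assumptions: it uses the Python 3 module name `http.server` (on Python 2.7 the equivalents live in `BaseHTTPServer` and `SimpleHTTPServer`), and the `/data.json` route, the `DATA` placeholder and `make_server` are names I made up.

```python
import json
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Placeholder payload; in practice this would come from the analysis code.
DATA = [{"name": "A", "value": 1}, {"name": "B", "value": 2}]


class JSONHandler(SimpleHTTPRequestHandler):
    """Serve DATA as JSON on /data.json, and static files for anything else."""

    def do_GET(self):
        if self.path != "/data.json":
            # Fall back to serving files (HTML page, scripts) from the
            # current working directory.
            super().do_GET()
            return
        body = json.dumps(DATA).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Build a server bound to localhost; port 0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), JSONHandler)
```

Running `make_server().serve_forever()` from the folder containing the HTML page should let the page fetch the data with something like `d3.json("/data.json", callback)`, since the page and the data then come from the same origin.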

    http://blog.nextgenetics.net/?e=7

    Update1: I tried to follow the path suggested in the link, but it didn’t quite work for me.

    Update2: And as usual, the problem was not with the tutorial but with the chair-keyboard interface on the user’s side: I put the d3.min.js library in a folder my server had no permission to access, so it didn’t get loaded and the script didn’t get executed. Using the following snippet instead to import the d3.min.js library works perfectly fine:

    <script src="http://d3js.org/d3.v3.min.js" charset="utf-8"></script>