September 2013 – Increasing information density

Installing TitanDB on a personal machine

September 18, 2013 (updated March 16, 2014), Andrei K

Just to play around.

Step 1: Install HBase:

Follow http://hbase.apache.org/book/quickstart.html, using these configuration variables:

hbase.rootdir = /opt/hadoop_hb/hbase
hbase.zookeeper.property.dataDir = /opt/hadoop_hb/zookeeper

Putting the data under the /opt/ directory lets other users (such as dedicated database users) access the necessary files without them getting mixed up with my /usr/ directory files.
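The two variables above live in HBase's hbase-site.xml. A minimal sketch of that file, written from the shell, might look like the following. CONF_DIR is a name introduced here for illustration, and the file:// prefix on hbase.rootdir follows the quickstart's standalone example rather than the values exactly as listed above, so treat both as assumptions.

```shell
# Sketch: write a standalone hbase-site.xml with the two variables above.
# CONF_DIR is hypothetical; point it at the conf/ directory of your HBase install.
# WARNING: this overwrites any existing hbase-site.xml in that directory.
CONF_DIR="${CONF_DIR:-${HBASE_HOME:-.}/conf}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/hbase-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- file:// prefix assumed, as in the quickstart's standalone example -->
    <value>file:///opt/hadoop_hb/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/hadoop_hb/zookeeper</value>
  </property>
</configuration>
EOF
```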

Attention: since /opt/ belongs to root, don’t forget to run

sudo mkdir /opt/hadoop_hb
sudo chown <your_username> /opt/hadoop_hb

if you want to play with HBase from its shell.

Attention: if you are using Ubuntu, you will need to modify the machine's loopback entries so that /etc/hosts looks like:

127.0.0.1 localhost 
127.0.0.1 your_machine_name
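To see whether /etc/hosts already has this shape, a small check along these lines may help. The function name hosts_loopback_ok is made up for this sketch, and it only looks for the machine name on a 127.0.0.1 line, nothing more.

```shell
# hosts_loopback_ok FILE NAME: succeed if FILE maps NAME to 127.0.0.1.
# (Hypothetical helper, not part of HBase or Ubuntu.)
hosts_loopback_ok() {
  awk -v name="$2" '
    $1 == "127.0.0.1" {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break          # stop at a trailing comment
        if ($i == name) found = 1
      }
    }
    END { exit !found }
  ' "$1"
}

# uname -n prints the machine name
if hosts_loopback_ok /etc/hosts "$(uname -n)"; then
  echo "loopback entry looks fine"
else
  echo "edit /etc/hosts as shown above"
fi
```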

Now you can start HBase by typing

$HBASE_HOME/bin/start-hbase.sh

and check if it is running by typing in your browser

http://localhost:60010

(unless you’ve changed the default port HBase listens on)

Step 2: Install Elasticsearch:

For this, download the elasticsearch.deb package from the official Elasticsearch download page and run

sudo dpkg -i elasticsearch.deb

This will install Elasticsearch on your machine and add it to the services launched at startup. Now you can check that it is working by typing in your browser (unless you’ve changed the default ports):

http://localhost:9200

Step 3: Install TitanDB:

Once HBase has been installed, download the TitanDB-HBase .tar.gz and unpack it into your directory of choice. Once you’re done, you can connect to it via Gremlin by typing

 gremlin> g = TitanFactory.open('bin/hbase-es.local')

To start it as part of the embedded Rexster server, type:

./bin/titan.sh config/titan-server-rexster.xml bin/hbase-es.local

Now you can check that the server is up and running by typing in your browser

http://localhost:8182/graphs
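The three browser checks in this post can also be run from a terminal. Here is a rough sketch, assuming curl is installed and the default ports are unchanged; check_up is a helper name invented for this example.

```shell
# check_up URL: succeed if URL answers an HTTP request within 5 seconds
# (hypothetical helper; --fail makes curl return non-zero on HTTP errors >= 400)
check_up() {
  curl --silent --fail --max-time 5 --output /dev/null "$1"
}

# HBase master UI, Elasticsearch, Rexster (default ports from this post)
for url in http://localhost:60010 http://localhost:9200 http://localhost:8182/graphs; do
  if check_up "$url"; then
    echo "up:   $url"
  else
    echo "down: $url"
  fi
done
```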

You’re done!

Correct way of modifying the PATH variable in Ubuntu

September 18, 2013 (updated September 4, 2015), Andrei K

Although many tutorials recommend modifying ~/.bashrc to make a permanent change to PATH for a given user, this is not the way to go. According to the Ubuntu StackExchange, the way to go is to use the ~/.pam_environment file, which is meant specifically for such modifications.

However, note that you have to follow the pam_environment-specific syntax and thus type

PATH DEFAULT=${PATH}:/path/to/wherever/your/binaries/are
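If you would rather not edit the file by hand, something like the sketch below appends that line once and leaves the file alone on repeated runs. The function name add_pam_path is an invention for this example, and the change is only picked up at the next login.

```shell
# add_pam_path DIR [FILE]: append "PATH DEFAULT=${PATH}:DIR" to FILE
# (default ~/.pam_environment) unless that exact line is already present.
# (Hypothetical helper written for this post.)
add_pam_path() {
  dir="$1"
  file="${2:-$HOME/.pam_environment}"
  # \$ keeps ${PATH} literal: pam_env expands it, not the shell
  line="PATH DEFAULT=\${PATH}:$dir"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Usage would then be: add_pam_path /path/to/wherever/your/binaries/are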

Reproducibility in high-throughput and computational biology:

September 18, 2013 (updated March 16, 2014), Andrei K

Just found out about the Potti scandal at Duke (a primer for those who have never heard of it: http://en.wikipedia.org/wiki/Anil_Potti).

Currently watching http://videolectures.net/cancerbioinformatics2010_baggerly_irrh/. Some of the extraordinary quotes (approximate, though):

If, after a computational analysis, you give a biologist a single gene, until now unrelated to cancer, that correlates with an increased risk of cancer, you will most likely hear something like “No, you’ve got stroma contamination over here: I’ve been studying this gene for years now and I know perfectly well that it is completely uncorrelated with cancer”

If, after a computational analysis, you give a biologist a list of hundreds of genes, and you say: here is the genetic signature of cancer, it is most likely that he will just agree with you, because “yeah, this one seems to correlate with that one, so yeah, that makes sense”.

=> This is precisely why I am developing the information flow framework for drug discovery and clinical biology: to make biological sense of lists of hundreds of perturbed genes.

Forensic bioinformatics: here are the raw data, here are the final results. Let’s try to figure out how to get from the raw data to the results, disregarding what they said they did in the supplementary data.

=> Idea: use the chemotherapeutic drug against the 60-cell-line panel to determine specificity, and see whether it correlates with the biological knowledge we have about those cell lines

Let’s use metagenes!!! As mathematicians, we know them as PCA, but well, let’s call them metagenes.

Their list and ours: you might see the pattern. Yes, the gene IDs are offset by 1.

So, we had a look at the software they were using and its documentation. If you want to read the docs, go to my website, because it was me who wrote them, since there were none!

Most review committees in biological journals are made up of biologists: they will skip all the parts related to the microarray analysis, jump to the results, and check whether the computational biology results agree with the wet-lab results.
