Wednesday, 17 September 2014 at 1 pm

In my previous entry, I showed how to add a toggle code cell button to your IPython notebook. Someone in the comments had a great solution where a code snippet is added to the custom.js file. His code is located here:

However, it seems a lot of people wanted the published notebook (NBViewer) to be able to hide the code cells as well.

It turns out it is possible to run JavaScript in the notebook if you import the HTML class from IPython:

from IPython.display import HTML

In a code cell, add this:

HTML('''<script>
code_show = true;
function code_toggle() {
  if (code_show) {
    $('div.input').hide();
  } else {
    $('div.input').show();
  }
  code_show = !code_show;
}
$(document).ready(code_toggle);
</script>
The raw code for this IPython notebook is by default hidden for easier reading.
To toggle on/off the raw code, click <a href="javascript:code_toggle()">here</a>.''')

When you run this code cell, by default, all the code cells will now be hidden. But you can toggle it on and off by clicking on the link. This toggle link will also be present in the published (NBViewer) version.

Here is an example of an IPython notebook with this toggle link:

*Note that the above link doesn't actually contain the raw toggle script. I keep all my IPython-specific Python scripts in a Python library and import the script from there.
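One way such a library could look is sketched below. The module name `nb_tools.py`, the constant `TOGGLE_SNIPPET` and the function `code_toggle_html` are hypothetical names chosen for illustration, and the `div.input` selector is an assumption about how the notebook marks up code cells.

```python
# Hypothetical helper module, e.g. saved as nb_tools.py (the name is an assumption).

TOGGLE_SNIPPET = '''<script>
code_show = true;
function code_toggle() {
  if (code_show) {
    $('div.input').hide();  // assumed selector for code cell inputs
  } else {
    $('div.input').show();
  }
  code_show = !code_show;
}
$(document).ready(code_toggle);
</script>
The raw code for this IPython notebook is by default hidden for easier reading.
To toggle on/off the raw code, click <a href="javascript:code_toggle()">here</a>.'''


def code_toggle_html():
    """Return an IPython HTML object that hides code cells and adds a toggle link."""
    # Imported lazily so the module can still be imported where IPython is missing.
    from IPython.display import HTML
    return HTML(TOGGLE_SNIPPET)
```

In a notebook, a cell containing `from nb_tools import code_toggle_html` followed by `code_toggle_html()` as its last expression should then render the toggle link.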

  Monday, 30 June 2014 at 1 pm

Trinity is a popular transcriptome assembler developed at the Broad Institute. It consists of three main programs (Inchworm, Chrysalis, Butterfly) that process and assemble raw reads into a transcriptome.

In the Chrysalis step of the program, contigs are bundled together based on k-mer overlap and paired-end read information. These bundled contigs, called "components" by Trinity, are then represented as a de Bruijn graph, allowing Butterfly to find the various traversal paths that ultimately represent possible transcripts.
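To make the idea concrete, here is a toy sketch of a de Bruijn graph and its traversal paths. It is a generic textbook construction, not Trinity's implementation: it ignores read support and paired-end information, and the function names and the two toy sequences are made up for illustration.

```python
from collections import defaultdict


def build_de_bruijn(sequences, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def enumerate_paths(graph, node, path=None):
    """Yield each path from `node` that ends where no unvisited successor remains."""
    path = (path or []) + [node]
    successors = [n for n in sorted(graph.get(node, ())) if n not in path]
    if not successors:
        yield path
        return
    for nxt in successors:
        yield from enumerate_paths(graph, nxt, path)


def spell(path):
    """Collapse a path of overlapping (k-1)-mers back into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


# Two toy "isoforms" that share a prefix and a suffix but differ in the middle.
reads = ["ATGGCGTACCTGA", "ATGGCGTTCCTGA"]
graph = build_de_bruijn(reads, k=5)
transcripts = sorted(spell(p) for p in enumerate_paths(graph, "ATGG"))
print(transcripts)  # ['ATGGCGTACCTGA', 'ATGGCGTTCCTGA']
```

The graph branches at the node `GCGT` and rejoins at `CCTG`, so the two traversal paths spell out the two input sequences.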

Visualizing the de Bruijn graph can be very informative. Here is my IPython notebook for rendering the de Bruijn graph of Trinity components: notebook

The notebook contains two functions that can:

  • Render the graph as a simplified network of essential nodes and highlight all probable paths as described by Butterfly.
  • Render all nodes of the graph as green circles with the root node in red.

  Friday, 30 May 2014 at 2 pm

I recently started using IPython notebook and it has quickly become essential to my workflow. However, when discussing data informally with colleagues, I wanted the code cells hidden so as not to distract from the figures.

The newest IPython notebook version does not allow executing JavaScript in markdown cells, so adding a new markdown cell with the following JavaScript code will no longer hide your code cells:


I also didn't want to resort to adding an ugly extra markdown cell, so I went into the IPython jinja template and added a new menu item under "view" for toggling the code cells. 

The IPython notebook template on my OSX laptop was located at:


I added a new JavaScript toggle function inside the script tag. Where it used to be:

<script type="text/javascript">
// MathJax disabled, set as null to distingish from *missing* MathJax,
// where it will be undefined, and should prompt a dialog later.
window.mathjax_url = "{{mathjax_url}}";

I changed it to the following (the added code is the code_toggle function at the end):

<script type="text/javascript">
// MathJax disabled, set as null to distingish from *missing* MathJax,
// where it will be undefined, and should prompt a dialog later.
window.mathjax_url = "{{mathjax_url}}";

code_show = true;
function code_toggle() {
  if (code_show) {
    $('div.input').hide();
  } else {
    $('div.input').show();
  }
  code_show = !code_show;
}

Then under the view menu:

<ul id="view_menu" class="dropdown-menu">
<li id="toggle_header"....

I added a new menu item:

<li id="toggle_toolbar"
title="Show/Hide code cells">
<a href="javascript:code_toggle()">Toggle Code Cells</a></li>

Save the notebook.html file and restart your IPython notebook. You should now have a new third menu item under "view" that lets you toggle the code cells.

This is a pretty dirty hack though and it might conflict with future updates. So please make a backup version of the notebook.html before you modify it.
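Making that backup takes only a few lines of Python. This is a minimal sketch: `backup_file` is a hypothetical helper, and it refuses to overwrite an existing backup so that a second run cannot replace the pristine copy with an already modified template.

```python
import shutil
from pathlib import Path


def backup_file(path):
    """Copy `path` to `<name>.bak` in the same directory and return the backup path."""
    source = Path(path)
    backup = source.with_name(source.name + ".bak")
    if backup.exists():
        # Keep the first (unmodified) copy rather than clobbering it.
        raise FileExistsError(f"refusing to overwrite existing backup: {backup}")
    shutil.copy2(source, backup)  # copy2 also preserves timestamps
    return backup
```

Call it as `backup_file("/path/to/notebook.html")`, where the path is a placeholder for wherever the template lives on your machine.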

  Wednesday, 07 May 2014 at 2 pm

It's over. This was the thought that went through my head as I walked out of Hacker School's front door around midnight. After a few drinks over the preceding couple of hours at the end-of-term party, my steps were heavier than normal.

What is it?

Hacker School is a three-month programmers' retreat where a group of like-minded and motivated people gather in a room to learn as much as they can about programming. This is not unique. Organized workshops and retreats exist for many professions and careers. However, what makes Hacker School different from others is its singular devotion to learning and community. The more cynical among you might question the organizers' sincerity, as there are obvious secondary motivations in the form of recruitment fees for the organizers and landing a job in the tech industry for the attendees. Take it as you will; I did not find these practical motivations to be obstructive during my stay.

  Tuesday, 17 December 2013 at 12 pm

I got an acceptance e-mail from Hacker School last night after a short written application and two interviews. Hacker School is a workshop for programmers. It aims to be a safe environment for people at different skill levels to come together and learn. You spend three months in New York working by yourself or collaborating with other like-minded people on whatever projects interest you. It might sound kind of self-indulgent, but so is graduate school in some sense.

They accept a very diverse group of people from what I've read. Maybe I'll meet another bioinformatician there. I am looking forward to seeing how this goes.

Now I just have to finish writing this thesis...

  Tuesday, 10 December 2013 at 5 pm

Bayes' theorem is perhaps the best-known theorem in the statistics of conditional probabilities. It goes like this:

P(A|B) = P(B|A) * P(A) / P(B)
P(A) means the probability of an outcome named 'A'. P(A|B) means the probability of an outcome named 'A' given that the outcome named 'B' has occurred.
In this post, I'll present a couple of intuitions about this theorem.
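As a quick numerical illustration of the formula, here is a small sketch with made-up numbers for a diagnostic test. The function name `posterior` and all three probabilities are assumptions for the example. P(B) is not given directly, so it is expanded with the law of total probability.

```python
from fractions import Fraction


def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A) using Bayes' theorem."""
    # Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b


# Made-up numbers: A = "has the condition", B = "test is positive".
p_a = Fraction(1, 100)              # P(A): 1% prevalence
p_b_given_a = Fraction(95, 100)     # P(B|A): chance of a positive test if A is true
p_b_given_not_a = Fraction(5, 100)  # P(B|not A): false positive rate

p_a_given_b = posterior(p_a, p_b_given_a, p_b_given_not_a)
print(p_a_given_b, float(p_a_given_b))  # 19/118, roughly 0.161
```

Even with a fairly accurate test, P(A|B) stays low here because the prior P(A) is small, so false positives outnumber true positives.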

  Thursday, 05 December 2013 at 10 am

I came across this pop science article yesterday:

The author argues that a gene-centric perspective of evolution, made popular by Dawkins with "The Selfish Gene", is not correct and we should focus our attention on other mechanisms such as gene expression. 

The fallacy in his argument stems from a misunderstanding of what Dawkins was trying to present. "The Selfish Gene" can basically be boiled down to: "The most basic unit of heredity is a gene".

This idea is only gene-centric in the sense that we think the gene is the most fundamental unit of heredity. Biologists understand there are many layers of complexity (including gene expression) above genes that ultimately contribute to the phenotype. There is plenty of research done at the level of gene expression networks, protein translation, protein folding, cell organization, tissue engineering, etc.

A more valid argument against "The Selfish Gene" is its use of the term "gene". The definition of a gene is becoming murkier than ever (here is a great paper on this: ). Perhaps the most basic unit of heredity should be any genomic feature that contributes to the phenotype, whatever that may be.

  Friday, 20 September 2013 at 1 pm

Enrichment analysis is applied when you have categorical data associated with your dataset, for example gene ontology, pfam families, molecular pathways, enzymatic activity, etc. The gist of the analysis is to see whether a certain category (GO term, pfam…) is over-represented in a subset of your data.

Let’s take an example. Let’s say I have:

  • A transcriptome of 20,000 genes.
  • 400 genes out of 20,000 are categorized as “cell cycle”.
  • We found 1,000 genes to be differentially expressed under a certain condition.
  • 300 genes have the “cell cycle” category out of the 1,000 differentially expressed genes.

What is the significance of this? In other words, if we pick 1,000 genes randomly from the total pool of 20,000 genes, what are the chances that at least 300 of them will have the cell cycle category?

In this post I will go through the basics of how enrichment analysis is performed and some thoughts on how informative this analysis is as applied to biological systems.
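One common way to frame the question in the example is as a hypergeometric tail probability: drawing 1,000 genes without replacement from 20,000, of which 400 carry the category, how likely is a result at least as extreme as the 300 observed? The sketch below assumes that framing (the excerpt itself does not name a test), and `hypergeom_log10_tail` is a hypothetical helper. It returns log10 of the probability because the value can be too small to represent as an ordinary float.

```python
import math


def hypergeom_log10_tail(population, category_total, sample_size, observed):
    """Return log10 of P(X >= observed), where X counts category genes in a
    sample of `sample_size` drawn without replacement from `population` genes,
    `category_total` of which carry the category."""
    upper = min(category_total, sample_size)
    # Number of possible samples containing at least `observed` category genes.
    favourable = sum(
        math.comb(category_total, k)
        * math.comb(population - category_total, sample_size - k)
        for k in range(observed, upper + 1)
    )
    if favourable == 0:
        return -math.inf
    total = math.comb(population, sample_size)
    # Exact integer counts, converted to logs only at the end.
    return math.log10(favourable) - math.log10(total)


# Numbers from the example: 20,000 genes, 400 "cell cycle", 1,000 DE, 300 overlap.
expected_by_chance = 1000 * 400 / 20000  # 20 genes
log10_p = hypergeom_log10_tail(20000, 400, 1000, 300)
print(expected_by_chance, log10_p)
```

With only about 20 cell cycle genes expected by chance, seeing 300 gives an extremely negative `log10_p`, meaning the overlap is far beyond what random sampling would produce.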