Tuesday, 17 December 2013 at 12 pm

I got an acceptance e-mail from Hacker School last night after a short written application and 2 interviews. Hacker School is a workshop for programmers. It aims to be a safe environment for people at different skill levels to come together and learn. You spend 3 months in New York working by yourself or collaborating with other like-minded people on whatever projects interest you. It might sound kind of self-indulgent, but so is graduate school in some sense.

They accept a very diverse group of people from what I've read. Maybe I'll meet another bioinformatician there. I am looking forward to seeing how this goes.

Now I just have to finish writing this thesis...




  Tuesday, 10 December 2013 at 5 pm

Bayes' theorem is perhaps the best-known theorem in the statistics of conditional probabilities. It goes like this:

P(A|B) = P(B|A) * P(A) / P(B)
 
P(A) means the probability of an outcome named 'A'. P(A|B) means the probability of an outcome named 'A' given that an outcome named 'B' has occurred.
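To make the formula concrete, here is a minimal sketch in JavaScript. The numbers are made up purely for illustration (a condition 'A' with a 1% base rate and a test result 'B'), and the function name is my own choice for this example.

```javascript
// Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
function bayesPosterior(pBGivenA, pA, pB) {
    return pBGivenA * pA / pB;
}

// Illustrative (made-up) numbers:
var pA = 0.01;          // P(A): base rate of outcome A
var pBGivenA = 0.9;     // P(B|A): chance of B when A has occurred
var pBGivenNotA = 0.05; // P(B|not A): chance of B when A has not occurred

// P(B) follows from the law of total probability.
var pB = pBGivenA * pA + pBGivenNotA * (1 - pA);

// P(A|B) comes out at roughly 0.15, much lower than P(B|A) = 0.9,
// because A is rare to begin with.
var pAGivenB = bayesPosterior(pBGivenA, pA, pB);
console.log(pAGivenB);
```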
 
In this post, I'll present a couple of intuitions about this theorem.



  Thursday, 05 December 2013 at 10 am

I came across this pop science article yesterday:

http://aeon.co/magazine/nature-and-cosmos/why-its-time-to-lay-the-selfish-gene-to-rest/

The author argues that the gene-centric view of evolution, made popular by Dawkins with "The Selfish Gene", is incorrect and that we should focus our attention on other mechanisms such as gene expression.

The flaw in his argument stems from a misunderstanding of what Dawkins was trying to present. "The Selfish Gene" can basically be boiled down to: "The most basic unit of heredity is a gene".

This idea is only gene-centric in the sense that we think the gene is the most fundamental unit of heredity. Biologists understand there are many layers of complexity (including gene expression) above genes that ultimately contribute to the phenotype. There is plenty of research done at the level of gene expression networks, protein translation, protein folding, cell organization, tissue engineering, etc.

A more valid argument against "The Selfish Gene" is its use of the term "gene". The definition of a gene is becoming murkier than ever (here is a great paper on this: http://genome.cshlp.org/content/23/12/1961.full?rss=1). Perhaps the most basic unit of heredity should be any genomic feature that contributes to the phenotype, whatever that may be.




  Friday, 20 September 2013 at 1 pm

Enrichment analysis is applied when you have categorical data associated with your dataset, for example gene ontology (GO) terms, Pfam families, molecular pathways, enzymatic activity, etc. The gist of the analysis is to see whether a certain category (a GO term, a Pfam family, and so on) is over-represented in a subset of your data.

Let’s take an example. Let’s say I have:

  • A transcriptome of 20,000 genes.
  • 400 genes out of 20,000 are categorized as “cell cycle”.
  • We found 1,000 genes to be differentially expressed under a certain condition.
  • 300 genes have the “cell cycle” category out of the 1,000 differentially expressed genes.

What is the significance of this? In other words, if we pick 1,000 genes randomly from the total pool of 20,000 genes, what are the chances that at least 300 of them will have the cell cycle category?
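One common way to answer this kind of question is a hypergeometric test (the one-sided form of Fisher's exact test). The sketch below assumes that model and plugs in the numbers from the example above. The function names are my own, and the result is returned as a natural log because the probability is far too small to be comfortable to work with directly.

```javascript
// Table of ln(i!) for i = 0..n, built by summing logs to avoid overflow.
function logFactorialTable(n) {
    var table = [0];
    for (var i = 1; i <= n; i++) {
        table.push(table[i - 1] + Math.log(i));
    }
    return table;
}

// Natural log of P(X >= observed), where X is the number of tagged genes
// in a random draw of `drawn` genes from `population` genes, of which
// `tagged` carry the category (hypergeometric distribution).
function logEnrichmentPValue(population, tagged, drawn, observed) {
    var lf = logFactorialTable(population);
    function logChoose(n, k) {
        return lf[n] - lf[k] - lf[n - k];
    }
    // Range of overlap sizes that are possible and at least as large as observed.
    var lo = Math.max(observed, 0, drawn - (population - tagged));
    var hi = Math.min(tagged, drawn);
    var logTerms = [];
    for (var k = lo; k <= hi; k++) {
        logTerms.push(logChoose(tagged, k) +
                      logChoose(population - tagged, drawn - k) -
                      logChoose(population, drawn));
    }
    if (logTerms.length === 0) {
        return -Infinity; // the observed overlap is impossible
    }
    // Log-sum-exp: add the probabilities without leaving log space.
    var maxLog = Math.max.apply(null, logTerms);
    var sum = 0;
    for (var j = 0; j < logTerms.length; j++) {
        sum += Math.exp(logTerms[j] - maxLog);
    }
    return maxLog + Math.log(sum);
}

// Numbers from the example: 20,000 genes, 400 "cell cycle",
// 1,000 differentially expressed, 300 of those are "cell cycle".
var logP = logEnrichmentPValue(20000, 400, 1000, 300);
console.log('log10(p) =', logP / Math.LN10);
```

For comparison, a random draw of 1,000 genes is expected to contain only 1,000 × 400 / 20,000 = 20 cell cycle genes, so an overlap of 300 is an extreme result under this model.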

In this post I will go through the basics of how enrichment analysis is performed and some thoughts on how informative this analysis is as applied to biological systems.




  Monday, 02 September 2013 at 5 pm

I've been attending the UK NGS/Genomic Sciences meetings since they started 4 years ago. There are great talks every year, but this year they were able to get Clive Brown to give the keynote talk about Oxford Nanopore. For people in the NGS field, I don't think I need to say much about what Nanopore is (check out Oxford Nanopore's website for more details).

Before the talk, Clive put up a slide telling people he would prefer no tweets about the talk since he would be covering a great deal of technical detail (which he did). I found that kind of strange. It seems like he doesn't want the content of his talk to be public? Why not just have all of us sign an NDA if that's the case? However, I will comply with his request and will not write much about the technical aspects of his talk. Instead, I will talk about what I think about Oxford Nanopore and its potential impact on the field.




  Saturday, 06 July 2013 at 10 pm

I put some finishing touches on Seeker: Annotation Viewer last week. It visualizes sequence features such as protein domains, primers, etc. Now I am working on a genome browser. Here is an extremely early prototype (there are around 1.8 MB of files to load):

http://www.nextgenetics.net/tools/browser/browser.html

It should work on the latest versions of Chrome/Safari/Firefox. It will most likely NOT work on IE or Opera. Hopefully, this won't crash your browser. This is completely client-side. You can distribute these files on a USB stick and anyone with a modern browser will be able to open it.

The loaded data is human chromosome 1 parsed from a .gtf file downloaded from UCSC. The parsed data is around 1 MB (980 KB). These interactions are possible right now:

  • Dragging on the tracks will allow you to scroll through the reference chromosome
  • WASD movement. Press 'A' to scroll left, 'D' to scroll right, 'W' to scroll up, 'S' to scroll down. Anyone who plays computer games should be familiar with this layout.
  • Clicking on the bottom overview bar (blue bar) will let you jump to that position.
  • You can also click and drag on the bottom bar, but depending on how good your computer is, it might be jittery.
  • The line graph on the bottom overview bar represents feature density. The higher the amplitude, the more features there are at that locus.
  • Right now it's displaying 1 million base pair windows. I've tested up to 5 million with little trouble on my early 2012 MacBook Pro. I'll probably set the maximum window size to 1 million.
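As a rough illustration of what a feature density track involves, here is a minimal sketch that counts features per fixed-size bin along the chromosome. This is not necessarily how the browser computes it: the function name, the `start` field and the choice to bin each feature by its start position are all assumptions made for the example.

```javascript
// Count how many features start in each bin of `binSize` base pairs.
// `features` is an array of objects with a numeric `start` position
// (hypothetical field name; 0-based coordinates assumed).
function featureDensity(features, chromosomeLength, binSize) {
    var binCount = Math.ceil(chromosomeLength / binSize);
    var counts = [];
    for (var i = 0; i < binCount; i++) {
        counts.push(0);
    }
    features.forEach(function (feature) {
        var bin = Math.floor(feature.start / binSize);
        // Ignore features that fall outside the chromosome.
        if (bin >= 0 && bin < binCount) {
            counts[bin] += 1;
        }
    });
    return counts;
}

// Made-up features on a 30 bp "chromosome" with 10 bp bins.
var density = featureDensity(
    [{start: 0}, {start: 5}, {start: 10}, {start: 25}], 30, 10);
console.log(density); // [ 2, 1, 1 ]
```

The resulting counts can be drawn directly as the heights of a line graph, one point per bin.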

I'll go into more detail about how the rendering works in the future. I've implemented a "rubber-banding" scrolling system instead of the normal Google Maps style tiling system.




  Monday, 01 July 2013 at 09 am

I've wanted to learn how to build web apps with WebGL ever since I saw the crazy Unreal Engine port to HTML5 and WebGL (as a side note, three.js is a very popular JavaScript 3D library that leverages WebGL). It has a lot of potential for data visualizations. Imagine a genome browser running on a GPU. It would be able to render millions of objects easily.

Today I came across a developer preview of a framework that uses WebGL for data visualizations and web workers for multi-threading:

http://superconductor.github.io/superconductor/

It is only a developer preview. But it looks extremely cool.

Of course, the downside (as with anything running in a browser) is cross-browser compatibility. The framework also seems to use WebCL, which doesn't seem like it will be widely adopted anytime soon. Perhaps someone can make a modified node-webkit?




  Thursday, 20 June 2013 at 8 pm

After several rounds of refactoring, version 1.0 of the annotation viewer is finished. You can use the app here:

http://www.nextgenetics.net/tools/anno_view/annotator.html

Input to the app right now is either an HMMScan domain table result or a tab-delimited file. The tab-delimited file is formatted with 5 columns: sequence name, feature name, start position, end position, sequence length. Sample input data is included in the app for clarity.
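For illustration, here is a minimal sketch of how a 5-column tab-delimited file like this could be parsed into plain JavaScript objects. It is not the app's actual parser: the function name and the property names are assumptions for the example, and the sample rows are made up.

```javascript
// Parse tab-delimited text with 5 columns per line:
// sequence name, feature name, start position, end position, sequence length.
function parseFeatureTable(text) {
    return text.split(/\r?\n/)
        .filter(function (line) {
            return line.trim() !== ''; // skip blank lines
        })
        .map(function (line) {
            var fields = line.split('\t');
            if (fields.length !== 5) {
                throw new Error('Expected 5 tab-separated columns, got ' +
                                fields.length + ': ' + line);
            }
            return {
                sequenceName: fields[0],
                featureName: fields[1],
                start: parseInt(fields[2], 10),
                end: parseInt(fields[3], 10),
                sequenceLength: parseInt(fields[4], 10)
            };
        });
}

// Made-up sample input.
var sample = 'seq1\tdomainA\t10\t250\t400\n' +
             'seq2\tprimerA\t5\t25\t120\n';
var features = parseFeatureTable(sample);
console.log(features[0].featureName, features[0].start, features[0].end);
```

Because the result is just an array of native objects, it can be handed straight to a rendering library or serialized as JSON.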

I am not sure how cross-browser it is. It was developed mostly with Chrome in mind; however, it should work on the latest versions of Chrome/Safari/Firefox.

On the technical side of things, this web app uses D3.js heavily for the SVG rendering and many DOM manipulations. All I can say is that D3.js is almost magical in how fast it re-renders objects. I also rolled my own MVC system instead of going with popular frameworks such as backbone.js or angular.js. It was definitely an eye-opening experience to see how much work goes into these MVC systems.

My MVC system is not really a full MVC. A more proper description is a view-centric MVC.

Components like menus, checkboxes, sliders, and drop-downs were built with a data binding system that allows them to react to changes in the data. These components are the view of the MVC pattern. However, there are no formal models in this system, hence "view-centric". Data are just native JavaScript objects or arrays, allowing JSON-typed input. When the data is bound to a view, methods are added to the data that allow it to update the view on data change. Yes, I am aware that adding methods to the data object is dirty and a hack.

Here is an example of this view-centric MVC system:

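// Plain data object holding the label text and the checked state.
var data = {'name': 'next gen sequencing conference', 'attending': false};
// Bind the label to data.name and the checkbox state to data.attending.
var checkbox = new seeker.checkbox()
    .bind({'text': data, 'checkbox': data}, {'text': 'name', 'checkbox': 'attending'});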

The data is an object consisting of two key:value pairs. To bind this piece of data to a checkbox so that the label corresponds to "name" and the checkbox itself corresponds to "attending", we use the .bind function. This function takes two argument objects: data and keys.

There are specific keys that let the checkbox component understand which data corresponds to the label and which to the checkbox. Both the 'text' key, which corresponds to the label, and the 'checkbox' key, which corresponds to the checkbox, are bound to the "data" object. The keys in that object that correspond to 'text' and 'checkbox' are 'name' and 'attending'.
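To show the general idea of this kind of binding, here is a stripped-down sketch. It is not the Seeker code: `SketchCheckbox`, `__views` and `__set` are hypothetical names invented for the example, and there is no DOM involved so that it can run anywhere.

```javascript
// Hypothetical stand-in for a view component (not seeker.checkbox).
function SketchCheckbox() {
    this.label = '';
    this.checked = false;
}

// sources: which data object feeds each part of the view.
// keys: which key of that object each part reads.
SketchCheckbox.prototype.bind = function (sources, keys) {
    var view = this;
    this.sources = sources;
    this.keys = keys;
    Object.keys(sources).forEach(function (part) {
        var obj = sources[part];
        // Add the update hook to the data object itself (the hack).
        // Non-enumerable, so it stays out of JSON.stringify and for...in.
        if (!obj.__set) {
            Object.defineProperty(obj, '__views', {value: []});
            Object.defineProperty(obj, '__set', {
                value: function (key, newValue) {
                    this[key] = newValue;
                    this.__views.forEach(function (v) { v.update(); });
                }
            });
        }
        if (obj.__views.indexOf(view) === -1) {
            obj.__views.push(view);
        }
    });
    this.update();
    return this;
};

// Re-read the bound values; a real component would redraw itself here.
SketchCheckbox.prototype.update = function () {
    this.label = this.sources.text[this.keys.text];
    this.checked = this.sources.checkbox[this.keys.checkbox];
};

var boundData = {'name': 'next gen sequencing conference', 'attending': false};
var sketch = new SketchCheckbox()
    .bind({'text': boundData, 'checkbox': boundData},
          {'text': 'name', 'checkbox': 'attending'});

boundData.__set('attending', true); // the view picks up the change
console.log(sketch.label, sketch.checked);
```

Note that the data object keeps a reference to every view bound to it. Without a matching unbind step that removes the view from that list, discarded views can never be garbage collected.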

This system is a bit unwieldy in argument construction. I might have to mess around with that part to make it more elegant. I also still have to optimize data unbinding. I am sure there are tons of memory leaks right now.






