Wednesday, 28 March 2012 at 2:17 pm

View the demo here
Download source code here

There are many ways of displaying gene ontology (GO) information. The most common way is to use software like Cytoscape to generate a graph diagram of GO with terms of interest highlighted. However, it is much more interesting to graph your data in an interactive way.

Here is a demo graphing only a small subset of GO. The demo starts with the root GO term, biological process, and allows you to click on child terms (in orange) to navigate down the GO graph. It is a force-directed graph, so you can also drag the nodes around.

The two links on the top (fix and unfix) will turn off/on the force physics. There are also two sample GO term links that'll allow you to jump to the specified GO term.

Since the entire JSON-formatted data of GO (biological process only) is around 4 MB, this demo does not contain all possible GO terms. You can download a version with all GO terms here.

The graph was rendered using D3.js. The data was parsed and formatted with Python. I am not going to go into great detail about the source code, as there are too many things to cover. I'll just talk generally about the steps in creating this visualization.

Parsing the .obo file

The first step is to parse the .obo file for all the terms and possible relationships. Refer to my blog entry about parsing the .obo file. The script described in the post will need to be modified to parse other relationship types. 

The parser needs to output JSON. As I mentioned in my blog entry on visualizations with D3, there are restrictions on loading local data. However, loading the JSON-formatted data as a pseudo-JavaScript library will bypass this restriction.
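A minimal sketch of what the "pseudo-JavaScript library" output could look like on the Python side: the parser serializes its data with `json.dumps` and wraps it in a variable assignment, so the page can pull it in with an ordinary script tag. The variable name `GO_DATA`, the helper `to_js_library`, and the sample payload are assumptions for illustration, not the names used in the demo.

```python
import json


def to_js_library(data, var_name="GO_DATA"):
    """Wrap JSON-serializable data in a JavaScript variable assignment."""
    # Compact separators drop optional whitespace to keep the file small.
    payload = json.dumps(data, separators=(",", ":"))
    return f"var {var_name} = {payload};\n"


# Hypothetical miniature payload; the real parser output would go here.
sample = {"terms": [{"id": "GO:0008150", "name": "biological_process"}]}
js_source = to_js_library(sample)
print(js_source)
```

Saving `js_source` to a `.js` file and referencing that file from a script tag would make `GO_DATA` available as a global variable in the page.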

The data structure I chose consists of 3 parts:

  • An index hash that maps each GO id to its position number in the data array
  • An array containing each term and its immediate parents/children
  • A hash describing the 7 possible relationship types

It is possible to put all the data into one giant array. But using one data array with 2 indexing hashes reduces the size of the output file significantly, as the data array only has to refer to an index position instead of the entire GO id.
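The layout described above might be built roughly as follows. This is a sketch under assumptions: the keys `index`, `data`, `relations`, `p`, and `c` and the function `build_structure` are hypothetical, and only two relationship types are shown rather than all 7.

```python
# Hypothetical parser output: GO id -> name and parents grouped by relationship.
parsed = {
    "GO:0008150": {"name": "biological_process", "parents": {}},
    "GO:0009987": {"name": "cellular process", "parents": {"is_a": ["GO:0008150"]}},
    "GO:0007049": {"name": "cell cycle", "parents": {"is_a": ["GO:0009987"]}},
}


def build_structure(parsed, rel_names):
    """Build the index hash, the data array, and the relationship hash."""
    ids = sorted(parsed)
    index = {go_id: pos for pos, go_id in enumerate(ids)}
    rel_code = {name: code for code, name in enumerate(rel_names)}
    data = [{"id": g, "name": parsed[g]["name"], "p": [], "c": []} for g in ids]
    for go_id in ids:
        child_pos = index[go_id]
        for rel, parents in parsed[go_id]["parents"].items():
            for parent_id in parents:
                parent_pos = index[parent_id]
                # Store [position, relationship code] pairs instead of GO ids.
                data[child_pos]["p"].append([parent_pos, rel_code[rel]])
                data[parent_pos]["c"].append([child_pos, rel_code[rel]])
    relations = {code: name for name, code in rel_code.items()}
    return {"index": index, "data": data, "relations": relations}


go = build_structure(parsed, ["is_a", "part_of"])
print(go["data"][go["index"]["GO:0009987"]])
```

Each parent or child reference is a pair of small integers rather than a 10-character GO id plus a relationship name, which is where the size saving comes from.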


The JavaScript code consists of 4 main portions:

  • Functions for accessing the data. Essentially recursive functions for getting descendants and ancestors.
  • A function for creating a D3 data structure from input GO ids, which is just an array of nodes and links.
  • A function for drawing and updating the graph.
  • Initialization statements that specify the SVG elements and relevant variables.
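The demo implements these portions in JavaScript. To stay in the same language as the parser, here is a Python sketch of the logic behind the first two: a recursive walk that collects ancestors or descendants, and a function that turns a set of terms into nodes and links. The `DATA` array, the keys `p` and `c`, and the function names are assumptions, and the link direction (child to parent) is an arbitrary choice.

```python
# Hypothetical miniature data array: each entry lists parent and child positions.
DATA = [
    {"name": "biological_process", "p": [], "c": [1]},
    {"name": "cellular process", "p": [0], "c": [2]},
    {"name": "cell cycle", "p": [1], "c": []},
]


def collect(pos, key, seen=None):
    """Recursively gather positions reachable via key ("p" or "c")."""
    if seen is None:
        seen = set()
    for nxt in DATA[pos][key]:
        if nxt not in seen:  # a term can be reached by several paths
            seen.add(nxt)
            collect(nxt, key, seen)
    return seen


def build_graph(positions):
    """Return nodes and links restricted to the given positions."""
    ordered = sorted(positions)
    node_index = {pos: i for i, pos in enumerate(ordered)}
    nodes = [{"name": DATA[pos]["name"]} for pos in ordered]
    # Links refer to nodes by their index in the nodes array.
    links = [
        {"source": node_index[pos], "target": node_index[parent]}
        for pos in ordered
        for parent in DATA[pos]["p"]
        if parent in node_index
    ]
    return {"nodes": nodes, "links": links}


ancestors = collect(2, "p")          # ancestors of "cell cycle"
graph = build_graph(ancestors | {2})
print(graph)
```

The `seen` set matters because GO is a directed acyclic graph rather than a tree, so the same ancestor can be reached through more than one parent.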

D3 pretty much does all the heavy lifting of rendering the graph and simulating the physics. You can read up on the force-directed graph API here. All you have to worry about is data manipulation and structure. These code segments are all commented in the .html file.

Potential usage

This demo is just a functional showcase of how this data can be visualized. The code could be modified to display gene enrichment data, for example by:

  • Coloring or sizing the nodes based on over-representation/significance value.
  • Pruning the graph to show only enriched GO terms and their common ancestors.
  • Attaching a weight value to closely related enriched GO terms by changing the force of their links so the nodes cluster together more.
  • Adding more mouse interactivity by showing tooltips of enriched genes and other statistics.
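As a rough illustration of the first two ideas, the data preparation could look something like the Python sketch below. Everything here is hypothetical: the `PARENTS` map, the p-value, and the radius formula are made up for the example, and `prune` keeps every ancestor of any enriched term, which is only one possible reading of "common ancestors".

```python
import math

# Hypothetical child -> parents map for a handful of terms.
PARENTS = {
    "GO:0008150": [],
    "GO:0009987": ["GO:0008150"],
    "GO:0007049": ["GO:0009987"],
    "GO:0008152": ["GO:0008150"],
}


def ancestors_of(go_id):
    """Collect all ancestors of a term with an explicit stack."""
    found = set()
    stack = list(PARENTS[go_id])
    while stack:
        term = stack.pop()
        if term not in found:
            found.add(term)
            stack.extend(PARENTS[term])
    return found


def prune(enriched):
    """Keep enriched terms plus every ancestor of any of them."""
    keep = set(enriched)
    for go_id in enriched:
        keep |= ancestors_of(go_id)
    return keep


def node_radius(p_value, base=4.0, scale=3.0):
    """Map smaller p-values to larger radii via -log10(p)."""
    return base + scale * -math.log10(p_value)


enriched = {"GO:0007049": 0.001}  # made-up enrichment result
kept = prune(enriched)
radii = {go_id: node_radius(p) for go_id, p in enriched.items()}
print(sorted(kept), radii)
```

The resulting set of terms and radii could then be written out alongside the GO data and used by the drawing code when creating the nodes.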