Thursday, 22 March 2012 at 2:41 pm

View the demo here
HTML source is at the bottom of the post

Computers and the internet have changed academia in dramatic ways, from greater sharing of data to a larger sense of community. Science journals are now all digitized and available online, either through your web browser or downloadable as a .pdf.

Even with all the technology available for presenting data, most published papers still only contain static figures. I am not undervaluing the importance of having nicely formatted figures and graphs. But I do want to show how data can be presented with all the tools available now. 

Science papers are generally viewed on a computer through a web browser like Chrome, Firefox, or Safari, which use javascript/html/css for displaying information. Therefore, browser languages are ideal for ensuring accessibility of your data. Javascript is often touted as the most prevalent programming language in the world, since nearly every computer has a browser and nearly every browser can interpret javascript.

Here are a bunch of examples of interactive figures made using browser technologies, specifically D3.js.


How a web page displays information

A brief primer on javascript, html, css, and how they are interpreted by the browser:

  • HTML is a markup language. It is used to place elements onto the web page in the form of tables, text, buttons, etc.
  • CSS is a supplement to HTML in the sense that it gives finer control over the visual styling of web page elements, i.e. borders, colors, margins, size, position, etc.
  • Javascript is the workhorse of an interactive website. Modern websites are not just static pages. Interactive elements like forms, buttons, and animations are all controlled through javascript.

The typical structure of a web page consists of a .html file describing page elements, .css files that describe the style of page elements, and javascript files that give function to the page elements. CSS and javascript (JS) files are linked in the .html file, which is in turn interpreted and displayed/executed by the browser.

Here is an example of a web page displaying a red 'hello world' link. When the link is clicked on, an alert box pops up with the same message:

<html>
<head>
<style>
.title {
color:red;
font-size:20pt;
}
</style>
<script>
function pop() {
alert('hello world');
}
</script>
</head>
<body>
<center>
<a onClick="pop();" class='title'>hello world</a>
</center>
</body>
</html>

I am not going to go into more detail than this about scripting for the browser. There is plenty of information on the web for learning html/css/javascript.


Javascript and D3.js

D3.js is a javascript library released mid 2011 by Mike Bostock, who previously created the Protovis library. There are several examples and tutorials in the documentation section of the official website. D3.js is much more than a visualization library, as it is essentially a DOM manipulator. However, for the sake of this post, I am only going to discuss its visualization functions.

A good understanding of the SVG specification is required to render figures with D3.js. SVG is a good format to use because it is a vector graphics format, where geometric primitives are used to describe elements of the image instead of raw pixels. Scaling a vector graphic up or zooming in to it will not result in pixelation.
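
To make the idea of geometric primitives concrete, here is a minimal sketch that builds a small SVG document as a string in plain javascript. The function name svgDocument and the two shapes are made up for illustration; the element and attribute names (rect, circle, x, y, width, height, cx, cy, r, fill) come from the SVG specification.

```javascript
// Build a tiny SVG document as a string. Each shape is described by its
// geometry (position, size, radius) rather than by individual pixels.
function svgDocument(width, height, shapes) {
  var open = '<svg xmlns="http://www.w3.org/2000/svg" width="' + width +
             '" height="' + height + '">';
  return open + shapes.join('') + '</svg>';
}

var markup = svgDocument(200, 100, [
  '<rect x="10" y="10" width="80" height="50" fill="lightblue"/>',
  '<circle cx="150" cy="50" r="30" fill="darkgray"/>'
]);

console.log(markup);
```

Saving the printed text in a file with a .svg extension and opening it in a browser should show a rectangle and a circle that stay sharp at any zoom level.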


Data input

There are several limitations to using a browser as a platform. One of them is file I/O. Since browsers were designed to display data from external servers, access to your local file system is restricted for security purposes. Modern browsers enforce the same-origin policy (SOP), and most of them block a page opened from the local file system from requesting other local files through javascript. There are ways to bypass it, either by:

  • starting your browser with a command-line flag that disables web security
  • starting a simple http server with python that serves the data from localhost
    python -m SimpleHTTPServer 8888
    (on python 3: python3 -m http.server 8888)

In my opinion, the biggest advantage of using a browser to visualize data is the accessibility for users. I personally prefer to input JSON formatted data using script injections instead of asking users to change their configuration. In this method of data input, the raw data is loaded as a pseudo-javascript library and interpreted by the browser. This bypasses SOP because javascript files are allowed to be loaded locally.

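What's great about the JSON format is that it's natively interpreted by the browser and looks almost the same as the string representation of simple python data structures. For lists and dictionaries that hold only numbers and plain strings, the output of str() is a valid javascript literal, although it is not strict JSON (python prints single quotes, and values such as True and None are spelled differently in javascript); json.dumps() from the standard library handles those cases. For example this python script would print a JSON-like data structure: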

data = []
for i in range(5):
  entry = {'id': i, 'info': 'NA'}
  data.append(entry)
print(str(data))  # python 3 syntax; the key order below is from python 3.7+
--
[{'id': 0, 'info': 'NA'}, {'id': 1, 'info': 'NA'}, {'id': 2, 'info': 'NA'}, {'id': 3, 'info': 'NA'}, {'id': 4, 'info': 'NA'}]

Adding a javascript variable declaration would make it directly interpreted by the browser:

print('myData = ' + str(data) + ';')
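
The same "data as a script" idea can be sketched from the javascript side. This is only an illustration, assuming Node.js or any javascript console; toDataScript and myData are hypothetical names. For ordinary data, the text produced by JSON.stringify is also a valid javascript expression, so prefixing a variable declaration turns it into a loadable script:

```javascript
// Turn a data structure into the text of a small script file that
// declares one global variable holding the data.
function toDataScript(variableName, data) {
  return 'var ' + variableName + ' = ' + JSON.stringify(data) + ';';
}

var data = [];
for (var i = 0; i < 5; i++) {
  data.push({ id: i, info: 'NA' });
}

var scriptText = toDataScript('myData', data);
console.log(scriptText);

// Evaluating the text gives back an equivalent structure, which is roughly
// what happens when the browser loads the file through a script tag.
var roundTrip = new Function(scriptText + ' return myData;')();
console.log(roundTrip.length); // 5
```

Writing scriptText to a file such as data.js gives a file that can be linked with a script tag, just like the python output.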


Getting started

To create a simple bar chart, first download the D3.js library. Create a bare-bones html file and link the D3.js library. Notice the empty script tags in the body of the file. This is where the D3.js render statements will go. We want the body to be loaded before we run the render statements. Placing them at the end of the body will make sure of that:

<html>
<head>
<script type="text/javascript" src="d3.js"></script>
</head>
<body>
<script type="text/javascript"></script>
</body>
</html>

Here is some sample data we are going to use to create a simple bar chart. This is mock data of 10 samples and counts of differentially expressed genes for each sample. You can either save it as 'data.js' and put it in the same directory as the .html file, or copy and paste it directly into the .html file surrounded by script tags.

var de = [{'count': 728, 'name': 'sample0'}, {'count': 824, 'name': 'sample1'}, {'count': 963, 'name': 'sample2'}, {'count': 927, 'name': 'sample3'}, {'count': 221, 'name': 'sample4'}, {'count': 574, 'name': 'sample5'}, {'count': 733, 'name': 'sample6'}, {'count': 257, 'name': 'sample7'}, {'count': 879, 'name': 'sample8'}, {'count': 620, 'name': 'sample9'}];

If you choose to save as 'data.js', link this data in the previously created .html page by adding this script tag. The variable 'de' will then contain the bar chart data.

<script type="text/javascript" src="data.js"></script>


Using the D3.js library

D3.js is essentially a web page (DOM) element manipulator. It can be used to bind data to web page elements and set their css properties according to the data. In terms of rendering figures, it is used to bind data to svg elements, i.e. rectangles, circles, lines, and paths, and to define the visual properties of these svg elements.

We first append an svg element to the body:

var mySVG = d3.select("body").append("svg");

We need to define the dimensions of the svg element. Luckily, function chaining is supported and used heavily in D3.js:

mySVG 
  .attr("width", 500) 
  .attr("height", 500);
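
Chaining works because each setter returns the object it was called on. The sketch below is not D3's implementation; FakeSelection is a made-up stand-in that only stores attributes in a plain object to show the pattern.

```javascript
// A toy object that mimics the chaining style of a D3 selection.
function FakeSelection() {
  this.attributes = {};
}

// Store the attribute, then return the same object so that another
// call can be chained onto the result.
FakeSelection.prototype.attr = function (name, value) {
  this.attributes[name] = value;
  return this;
};

var fakeSVG = new FakeSelection()
  .attr('width', 500)
  .attr('height', 500);

console.log(fakeSVG.attributes.width, fakeSVG.attributes.height); // 500 500
```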

We've defined the width and height of the figure to be 500 pixels. The raw counts data ranges from 0 to 1000, so we cannot just use the raw numbers as pixel heights for the bars or they will be cut off beyond the dimensions of the svg element. D3 has several useful functions for scaling data into correct pixel dimensions. We are going to use the d3.scale.linear function:

var heightScale = d3.scale.linear()
  .domain([0, 963])
  .range([0, 400]);

Domain is the range of data values you want to scale, and range is the range of pixels you want to scale the data to. The scaling functions in D3 return a function that you can assign to a variable. This function takes in your raw data values as input and returns the scaled pixel value.
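
To see what such a scaling function does with a number, here is a minimal plain-javascript sketch of linear interpolation between a domain and a range. It is not D3's code; linearScale and toPixels are names made up for this illustration.

```javascript
// Return a function that maps a value from the domain interval onto the
// range interval by linear interpolation.
function linearScale(domain, range) {
  var d0 = domain[0], d1 = domain[1];
  var r0 = range[0], r1 = range[1];
  return function (value) {
    var t = (value - d0) / (d1 - d0); // relative position within the domain
    return r0 + t * (r1 - r0);
  };
}

var toPixels = linearScale([0, 963], [0, 400]);
console.log(toPixels(0));   // 0
console.log(toPixels(963)); // 400
```

A count halfway through the domain lands halfway through the range, which is all the bar chart needs.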

Notice that I used 963 as the max value for the domain, as it is the largest value in the raw data. Another way to find the largest value would be to use one of D3's many array methods:

var heightScale = d3.scale.linear()
  .domain([0, d3.max(de,function(d) { return d.count;})])
  .range([0, 400]);

The d3.max method takes in an array of data and an optional accessor function, and returns the maximum value.
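
The role of the accessor can be sketched without D3. The helper below, maxBy, is a hypothetical name and only an approximation of the behavior described above: it applies the accessor to each element and keeps the largest result.

```javascript
// Find the largest value in an array. If an accessor is given, it is
// applied to each element first to pull out the value to compare.
function maxBy(array, accessor) {
  var best;
  for (var i = 0; i < array.length; i++) {
    var value = accessor ? accessor(array[i], i) : array[i];
    if (best === undefined || value > best) {
      best = value;
    }
  }
  return best;
}

var sample = [
  { count: 728, name: 'sample0' },
  { count: 824, name: 'sample1' },
  { count: 963, name: 'sample2' }
];

console.log(maxBy(sample, function (d) { return d.count; })); // 963
console.log(maxBy([3, 9, 4]));                                 // 9
```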

Now that we have all the elements and scaling functions set up, we can write the main rendering statements. The first thing to do is to create an svg rectangle for every element of the data array.

var myBars = mySVG.selectAll('rect')
  .data(de)
  .enter()
  .append('svg:rect');

The 'selectAll' statement selects all rectangle elements in the svg, of which there are none right now. The 'data' statement binds each element of the 'de' array to a rectangle element. But because there are no rectangles currently, none of the data will be bound.

The enter() function allows you to select any unbound data. In this case, all the data elements in the array are unbound so all elements are selected and a corresponding rectangle is appended. Now there will be 10 rectangles, each bound with an element from the 'de' array.
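
The bookkeeping behind this step can be sketched in plain javascript. This is a simplified illustration of matching data to elements by index, not D3's implementation; joinByIndex and the placeholder strings standing in for elements are made up.

```javascript
// Pair existing elements with data by index and report what is left over.
function joinByIndex(elements, data) {
  var shared = Math.min(elements.length, data.length);
  return {
    update: data.slice(0, shared), // data that found an existing element
    enter: data.slice(shared),     // data with no element yet
    exit: elements.slice(shared)   // elements with no data
  };
}

var counts = [728, 824, 963];

// No rectangles exist yet, so every datum ends up in the enter group.
var first = joinByIndex([], counts);
console.log(first.enter.length); // 3

// With two rectangles already present, only the third datum enters.
var second = joinByIndex(['rect', 'rect'], counts);
console.log(second.update.length); // 2
console.log(second.enter.length);  // 1
```

In the bar chart the first case applies: the svg starts with no rectangles, so all 10 data elements enter and each gets a new rectangle.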

The 'myBars' variable now refers to these 10 rectangles. We need to set the visual properties of each bar to produce a bar chart:

myBars
  .attr('width',20)
  .attr('height',function(d,i) {return heightScale(d.count);});

The 'attr' function takes in 2 arguments. The first is the name of the attribute we want to assign a value to. The second is the value to assign. The first 'attr' statement assigns a value of 20 to the 'width' attribute of each rectangle. The second 'attr' statement is a bit more complicated if you do not have a good understanding of javascript. It passes an anonymous function instead of a fixed value. D3 runs the anonymous function once for each rectangle and assigns the returned value to that rectangle's 'height' attribute.

This anonymous function is called in the scope of a rectangle, so it is also able to access the element of the array bound to that rectangle. The 'd' and 'i' arguments of the anonymous function refer to the data element and the index of the element in the 'de' array.
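
How a function-valued attribute gets resolved can be sketched with plain objects standing in for rectangles. This is an illustration only: setAttr and the attrs property are made up, while storing the bound datum in a __data__ property follows what D3 does on real DOM elements.

```javascript
// Set an attribute on every node. A function value is called once per node
// with the node's bound datum and its index, and with the node as "this".
function setAttr(nodes, name, value) {
  nodes.forEach(function (node, i) {
    node.attrs[name] = (typeof value === 'function')
      ? value.call(node, node.__data__, i)
      : value;
  });
  return nodes;
}

// Toy stand-ins for two rectangles, each carrying one bound datum.
var nodes = [
  { __data__: { count: 10 }, attrs: {} },
  { __data__: { count: 20 }, attrs: {} }
];

setAttr(nodes, 'width', 20);
setAttr(nodes, 'height', function (d, i) { return d.count * 2; });

console.log(nodes[0].attrs.height); // 20
console.log(nodes[1].attrs.height); // 40
```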

The purpose of the anonymous function assigned to the 'height' attribute is to take the count data of the 'de' array element and scale it to pixel dimensions using the 'heightScale' d3.scale function we created previously.

Now that the width and height of the bars are defined for each rectangle, we have to set the position of each bar.

myBars
  .attr('x',function(d,i) {return (i * 22) + 100;})
  .attr('y',function(d,i) {return 400 - heightScale(d.count);});

It is important to note that the x,y coordinates do not follow the usual cartesian convention. The origin (0,0) is the top left corner of the SVG element. The X coordinate increases to the right and the Y coordinate increases as you move down.

The X attribute of each rectangle is defined as the index of the bound data multiplied by 22. Since the width of each rectangle is 20, this creates a spacing of 2 pixels between bars. The statement also adds 100 to each X coordinate to create a padding of 100 pixels on the left for axis tick labels. The Y attribute is defined as 400 minus the height of the bar because the point of placement of each rectangle (anchor point) is located at the top left corner of the rectangle.
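
The arithmetic for one bar can be checked without a browser. The sketch below recomputes the same x, y, width, and height values in plain javascript for the first three samples; barGeometry and the constant names are made up for this illustration, and the scaling is written out by hand instead of using heightScale.

```javascript
var CHART_HEIGHT = 400;  // pixel height that the largest count maps to
var BAR_WIDTH = 20;
var BAR_STEP = 22;       // bar width plus a 2 pixel gap
var LEFT_PADDING = 100;  // room for the axis tick labels

// Compute the rectangle geometry for each datum.
function barGeometry(data, maxCount) {
  return data.map(function (d, i) {
    var height = d.count / maxCount * CHART_HEIGHT;
    return {
      x: i * BAR_STEP + LEFT_PADDING,
      y: CHART_HEIGHT - height, // top edge, because y grows downward
      width: BAR_WIDTH,
      height: height
    };
  });
}

var bars = barGeometry([
  { count: 728, name: 'sample0' },
  { count: 824, name: 'sample1' },
  { count: 963, name: 'sample2' }
], 963);

console.log(bars[1].x); // 122
console.log(bars[2].y); // 0, the tallest bar starts at the top of the chart
```

Every bar's top edge plus its height adds up to 400, which is what makes all bars sit on the same baseline.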

Now all that is left is to draw the axis and labels:

mySVG.selectAll(".xLabel")
  .data(de)
  .enter().append("svg:text")
  .attr("x", function(d,i) {return 113 + (i * 22);})
  .attr("y", 435)
  .attr("text-anchor", "middle") 
  .text(function(d,i) {return d.name;})
  .attr('transform',function(d,i) {return 'rotate(-90,' + (113 + (i * 22)) + ',435)';});

mySVG.selectAll(".yLabel")
  .data(heightScale.ticks(10))
  .enter().append("svg:text")
  .attr('x',50)
  .attr('y',function(d) {return 400 - heightScale(d);})
  .attr("text-anchor", "end") 
  .text(function(d) {return d;});

mySVG.selectAll(".yTicks")
  .data(heightScale.ticks(10))
  .enter().append("svg:line")
  .attr('x1','100')
  .attr('y1',function(d) {return 400 - heightScale(d);})
  .attr('x2',320)
  .attr('y2',function(d) {return 400 - heightScale(d);})
  .style('stroke','black');

Note that the selector functions in d3 accept CSS selectors, so elements can be selected by class (as in '.xLabel' above), id, or attribute, not just by element type. The heightScale.ticks() function takes in the approximate number of ticks you want and returns an array of round numbers within the domain.
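
One common way to pick round tick values is to choose a step of 1, 2, or 5 times a power of ten. The sketch below is a simplified illustration of that idea, with thresholds chosen for this example; niceTicks is a made-up name and the output is not guaranteed to match what heightScale.ticks() returns for every input.

```javascript
// Generate round tick values between min and max, aiming for roughly
// "count" ticks by picking a step of 1, 2, or 5 times a power of ten.
function niceTicks(min, max, count) {
  var span = max - min;
  var step = Math.pow(10, Math.floor(Math.log(span / count) / Math.LN10));
  var err = count / span * step;
  if (err <= 0.15) step *= 10;
  else if (err <= 0.35) step *= 5;
  else if (err <= 0.75) step *= 2;

  var ticks = [];
  // Start at the first multiple of step that is not below min.
  for (var v = Math.ceil(min / step) * step; v <= max; v += step) {
    ticks.push(v);
  }
  return ticks;
}

console.log(niceTicks(0, 963, 10).join(', '));
// 0, 100, 200, 300, 400, 500, 600, 700, 800, 900
```

The loop adds the step repeatedly, which is fine for whole-number steps like these but can accumulate rounding error for fractional steps.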


Putting it all together

Here is the complete .html file code. I've added some extra styling to make it easier on the eyes. Click here for a live demo of the following code.

<html>
<head>
<script type="text/javascript" src="d3.v2.js"></script>
<style>
.fig {
font-family:Arial;
font-size:10pt;
color:darkgray;
}
</style>
</head>
<body>
<script type="text/javascript">
var de = [{'count': 728, 'name': 'sample0'}, {'count': 824, 'name': 'sample1'}, {'count': 963, 'name': 'sample2'}, {'count': 927, 'name': 'sample3'}, {'count': 221, 'name': 'sample4'}, {'count': 574, 'name': 'sample5'}, {'count': 733, 'name': 'sample6'}, {'count': 257, 'name': 'sample7'}, {'count': 879, 'name': 'sample8'}, {'count': 620, 'name': 'sample9'}];
var mySVG = d3.select("body")
.append("svg")
.attr("width", 500)
.attr("height", 500)
.style('position','absolute')
.style('top','50px')
.style('left','40px')
.attr('class','fig');
var heightScale = d3.scale.linear()
.domain([0, d3.max(de,function(d) { return d.count;})])
.range([0, 400]);
mySVG.selectAll(".xLabel")
.data(de)
.enter().append("svg:text")
.attr("x", function(d,i) {return 113 + (i * 22);})
.attr("y", 435)
.attr("text-anchor", "middle")
.text(function(d,i) {return d.name;})
.attr('transform',function(d,i) {return 'rotate(-90,' + (113 + (i * 22)) + ',435)';});

mySVG.selectAll(".yLabel")
.data(heightScale.ticks(10))
.enter().append("svg:text")
.attr('x',80)
.attr('y',function(d) {return 400 - heightScale(d);})
.attr("text-anchor", "end")
.text(function(d) {return d;});

mySVG.selectAll(".yTicks")
.data(heightScale.ticks(10))
.enter().append("svg:line")
.attr('x1','90')
.attr('y1',function(d) {return 400 - heightScale(d);})
.attr('x2',320)
.attr('y2',function(d) {return 400 - heightScale(d);})
.style('stroke','lightgray');

var myBars = mySVG.selectAll('rect')
.data(de)
.enter()
.append('svg:rect')
.attr('width',20)
.attr('height',function(d,i) {return heightScale(d.count);})
.attr('x',function(d,i) {return (i * 22) + 100;})
.attr('y',function(d,i) {return 400 - heightScale(d.count);})
.style('fill','lightblue');
</script>
</body>
</html>


What's next

This is just the tip of the iceberg of what can be done with D3.js. In the next few weeks, I'll post more entries on:







