Sunday, 10 February 2013 at 1:24 pm

There has been a lot of heated discussion about what exactly the role of bioinformatics is and what contributions it has made to the biological sciences. The discussion started with the re-discovery of Fred Ross's "farewell to bioinformatics" blog entry, posted in the middle of last year, in which he used many colorful words to describe the inadequacies of the field. From there, it was posted on several biology, bioinformatics, and programming news aggregator sites (this post on BioStar tracks the discussions), sparking debates.

I can't claim to be very experienced in the bioinformatics field; I am still trying to finish my PhD. However, I have been a hobbyist programmer for quite a while now, and I've also got a decent amount of experience in academia as a lab technician, out-sourced programmer, lab manager, and grad student.

So here are my two cents on this discussion.

I think the crux of the controversy can be distilled down to these three quotes from Fred Ross's blog entry:

1. "Bioinformatics is an attempt to make molecular biology relevant to reality. All the molecular biologists, devoid of skills beyond those of a laboratory technician, cried out for the mathematicians and programmers to magically extract science from their mountain of shitty results."

Bioinformatics is an attempt to quantify the massive amount of molecular data into something we can gain insight from. Quantification can be anything: counting reads mapped to a reference, classifying gene function into a controlled vocabulary of terms (gene ontology), scoring gel band intensity...

The basic idea is that we are transforming traditionally hard-to-quantify data into numbers, and in doing so, we will lose some information. The method and rationale behind that transformation are probably what anger Fred Ross.
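
To make that concrete, here is a minimal sketch of one such transformation: collapsing mapped reads into per-gene counts. The gene intervals and read positions are invented for illustration; the point is that once a read becomes a single increment in a count, its exact position (and any strand or quality information it carried) is gone.

```python
# Minimal sketch: reduce mapped reads to per-gene counts.
# Gene intervals and read positions are invented for illustration.

genes = {"geneA": (100, 500), "geneB": (600, 900)}  # gene -> (start, end)
reads = [120, 130, 450, 610, 611, 950]              # mapped read start positions

counts = {name: 0 for name in genes}
for pos in reads:
    for name, (start, end) in genes.items():
        if start <= pos <= end:
            counts[name] += 1

# Each read collapses to a single increment; everything else about it is lost.
print(counts)
```

The unmapped read at position 950 simply vanishes from the counts, which is exactly the kind of judgment call ("ignore that as noise?") discussed below.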

Before I go on, I want to talk about how people in computer science and biological sciences are trained.

Programming is extremely logical and consistent, which makes sense, since computers are man-made, artificial machines. We made computers deterministic so we can interact with them in a meaningful way. Programmers are trained to be completely reductionist and explicit in their thinking because computers are too "stupid" to interpret fuzziness.

Biologists are trained to try to do what computers are too "stupid" to do. Biologists obviously also follow the scientific method; however, gaps in knowledge have to be filled in an educated manner to produce further thought, which can then result in some kind of theory that is tested and refined. Of course, interpreting the fuzziness and filling in the gaps will not always be completely objective. That's where collecting more data and logical discussion come in to refine the interpretation. That's science.

I think every bioinformatician has, at some point in his/her career, had doubts about how to interpret a piece of molecular data. Is it really "right" to standardize my data? Is it really "right" to say this relates to that biological function? Is it really "right" to ignore that as noise (this is probably the most popular question)?

To say, "this is shitty data, we should just dump it because no amount of computational methods will give you anything objectively significant," will give you 0% progress. However, delving into the shitty data and trying to glean anything from it through educated guesses will perhaps give you 1% progress, which is better than nothing.

Imagine trying to run Darwin's data through a computational pipeline.

A valid complaint can be made when papers publish that 1% progress and tout it as more significant than it really is. But that is more a problem with academic culture than with the scientific process.

I think Fred's anger stems from the fundamental differences in how biologists and computer scientists are trained to think. I don't think one is more valid than the other. Having the knowledge and skill to broadly interpret fuzziness is just as important as being able to explicitly and objectively analyze quantitative data. Bioinformatics is an awkward synthesis of the two ways of thinking. I personally find it hard to switch from the reductionist, explicit thinking mode of data analysis to the biological interpretation mode of reading papers.

2. "..bioinformatics found a way to survive: obfuscation. By making the tools unusable, by inventing file format after file format, by seeking out the most brittle techniques and the slowest languages, by not publishing their algorithms and making their results impossible to replicate, the field managed to reduce its productivity by at least 90%, probably closer to 99%."

This is more of a practical issue with the field of bioinformatics. There is no question that software development and format standardization are problems in the field. A large part of that can probably be attributed to how academics view software development and maintenance: they think it's not worth their time.

I see software maintenance and usability in bioinformatics as analogous to keeping reagents and organism lines available in molecular biology. Perhaps this comes down to the very common issue of people not valuing intellectual property because it's not physical (web designers can probably sympathize). Hopefully a solution will come eventually, and institutions analogous to the Bloomington fly stock center or the Iowa hybridoma bank will pop up to centralize the development and maintenance of code.

I agree with Fred on this point.

3. "There are only two computationally difficult problems in bioinformatics, sequence alignment and phylogenetic tree construction."

It's a valid point. But so is saying, "There are only two difficult problems in football (soccer): kicking the ball and running."

While it is probably correct that most problems in bioinformatics can be reduced to a problem in alignment or tree construction, skipping over all the complexities of co-opting alignment and tree construction for other purposes is short-sighted.
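
Even "just" alignment already hides real algorithmic content. As a reminder of what the reduction actually involves, here is a minimal sketch of the textbook Needleman-Wunsch global alignment score (a standard dynamic-programming formulation, not any particular tool's implementation; the scoring values are arbitrary choices for illustration):

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score via Needleman-Wunsch dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against an empty string costs one gap per character.
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    # Each cell takes the best of: match/mismatch on the diagonal,
    # or a gap in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]

print(nw_score("GATTACA", "GCATGCU"))  # prints 0 with these scores
```

Every design decision here (scoring scheme, gap model, global vs. local) is one of the "complexities" that gets glossed over when alignment is dismissed as a solved primitive.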


Information technology has obviously transformed the world in terms of communication and the fundamental way we think about data. It is an equalizer in the sense that it has provided an avenue of creative expression for anyone with an internet connection. Ideas flourished, and technology start-ups spearheaded progress.

Programmers thrive in this environment, where innovation is king and new methods pop up every day. There is a very distinct spirit of adventure that is incomparable to other fields. Perhaps it is because IT is so ingrained in every aspect of our lives that, as programmers, we can feel our practical impact resonate clearly.

I think it is this spirit that programmers find lacking in biology and don't find enough of in bioinformatics. I do agree with the sentiment that research science can sometimes take a "frog in a well" perspective and that progress can be slow. The time-scale of progress in IT compared to research science is vastly different. Bioinformatics is very much still in its "start-up" stage compared to a technology start-up.

I guess the question is: would you rather come back to the field in 20 years, when bioinformatics will have matured, or do you want to be involved in the tumultuous beginnings of a new synthesis of sciences?