Performing principal component analysis with matplotlib is extremely easy.

The input is a 2d numpy array where columns are the dimensions you want reduced and rows are samples. For example, a set of transcriptome data:

Condition1 Condition2 Condition3 Condition 4 Condition5

geneA 231 321 4 221 2312

geneB 211 4 53 34 53

geneC 4 343 .. .. ..

geneD 43 .. .. .. ..

.. .. .. .. .. ..

Here is all you need to do a PCA (assuming you have already gotten your data into a native python 2d array):

from matplotlib.mlab import PCA

#construct your numpy array of data

myData = numpy.array(data)

results = PCA(myData)

#this will return an array of variance percentages for each component

results.fracs

#this will return a 2d array of the data projected into PCA space

results.Y

Here is a great how-to on how to plot a 3d graph using matplotlib and Tk. Here is the code from the how-to plotting the PCA results from above:

import matplotlib.pyplot as plt from mpl_toolkits.mplot3d import Axes3D

x = []

y = []

z = []

for item in result.Y:

x.append(item[0])

y.append(item[1])

z.append(item[2])

plt.close('all') # close all latent plotting windows fig1 = plt.figure() # Make a plotting figure

ax = Axes3D(fig1) # use the plotting figure to create a Axis3D object.

pltData = [x,y,z]

ax.scatter(pltData[0], pltData[1], pltData[2], 'bo') # make a scatter plot of blue dots from the data

# make simple, bare axis lines through space:

xAxisLine = ((min(pltData[0]), max(pltData[0])), (0, 0), (0,0)) # 2 points make the x-axis line at the data extrema along x-axis

ax.plot(xAxisLine[0], xAxisLine[1], xAxisLine[2], 'r') # make a red line for the x-axis.

yAxisLine = ((0, 0), (min(pltData[1]), max(pltData[1])), (0,0)) # 2 points make the y-axis line at the data extrema along y-axis

ax.plot(yAxisLine[0], yAxisLine[1], yAxisLine[2], 'r') # make a red line for the y-axis.

zAxisLine = ((0, 0), (0,0), (min(pltData[2]), max(pltData[2]))) # 2 points make the z-axis line at the data extrema along z-axis

ax.plot(zAxisLine[0], zAxisLine[1], zAxisLine[2], 'r') # make a red line for the z-axis.

# label the axes

ax.set_xlabel("x-axis label")

ax.set_ylabel("y-axis label")

ax.set_zlabel("y-axis label")

ax.set_title("The title of the plot")

plt.show() # show the plot

by Damian Kao | permalink | Tips | Two comments Used tags: python |

XSQ files are binary files conforming to the HDF5 format used to store ABI SOLiD's color-space and quality data. Manipulating the data can be done with NGS_plumbing and XSQ_tools. If you just want to do a simple converstion or basic stats on the XSQ file, I recommend to just use NGS_plumbing. In this post, I will go through how the data is stored in the XSQ file and how to access it with python.

by Damian Kao | permalink | Methods | No comments Used tags: color_space, xsq |

Information theory as an interdiciplinary field consisting of engineering, physics, bioinformatics, mathematics, and many others started with Claude E. Shannon's 1948 paper, "A mathematical theory of communication". Advancements in this field have been instrumental in improving communication across the world from data storage on disc drives to satellite communcations.

In this blog post, I will briefly go over what information theory is all about in an intuitve way and it's practical application to the field of bioinformatics and evolution. I am not an expert on information theory, so I welcome any corrections.

by Damian Kao | permalink | Concept | No comments Used tags: evolution, information, theory |