Tuesday, 31 July 2012 at 2:05 pm
by Damian Kao
Performing principal component analysis with matplotlib is extremely easy.
The input is a 2D numpy array in which rows are samples and columns are the dimensions you want to reduce. For example, a set of transcriptome data:
        Condition1  Condition2  Condition3  Condition4  Condition5
geneA   231         321         4           221         2312
geneB   211         4           53          34          53
geneC   4           343         ..          ..          ..
geneD   43          ..          ..          ..          ..
..      ..          ..          ..          ..          ..
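As a minimal sketch of how such a table maps onto the expected input, the snippet below builds the array from only the two rows that are fully listed above (geneA and geneB); the elided rows are left out rather than guessed. The variable names (`conditions`, `rows`, `my_data`) are illustrative and not part of the original post.

```python
import numpy as np

# Column labels: each condition is one dimension.
conditions = ["Condition1", "Condition2", "Condition3", "Condition4", "Condition5"]

# Only the rows that are complete in the table; each gene is one sample.
rows = {
    "geneA": [231, 321, 4, 221, 2312],
    "geneB": [211, 4, 53, 34, 53],
}

gene_names = list(rows)
my_data = np.array([rows[g] for g in gene_names], dtype=float)

print(my_data.shape)  # (samples, dimensions)
```

A real PCA needs far more samples than this; the point here is only the orientation of the array, with one row per gene and one column per condition.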
Here is all you need to do a PCA (assuming you have already gotten your data into a native python 2d array):
import numpy
from matplotlib.mlab import PCA  # note: removed in matplotlib 3.1, so this needs an older release

# construct your numpy array of data
myData = numpy.array(data)
results = PCA(myData)

# array with the fraction of total variance explained by each component
results.fracs

# 2D array of the data projected into PCA space
results.Y
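To make it clearer what the two attributes above contain, here is a rough NumPy-only sketch of the same idea. It is not the matplotlib implementation; `pca_sketch` is a hypothetical helper, the demo data is random, and the standardization step is an assumption chosen to resemble a correlation-based PCA.

```python
import numpy as np

def pca_sketch(data, standardize=True):
    """Return (fracs, projected) for a samples-by-dimensions array."""
    a = np.asarray(data, dtype=float)
    a = a - a.mean(axis=0)              # center each column
    if standardize:
        a = a / a.std(axis=0)           # unit variance per column (assumes no constant column)
    # SVD of the centered data: the rows of vt are the principal axes
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    variances = s ** 2
    fracs = variances / variances.sum() # fraction of variance per component
    projected = a @ vt.T                # data expressed in PCA space
    return fracs, projected

rng = np.random.default_rng(0)
demo = rng.normal(size=(20, 5))         # made-up data: 20 samples, 5 dimensions
fracs, projected = pca_sketch(demo)
print(fracs)
print(projected.shape)
```

Here `fracs` plays the role of `results.fracs` and `projected` the role of `results.Y`; the signs of individual components may differ between implementations.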
Here is a great how-to on how to plot a 3d graph using matplotlib and Tk. Here is the code from the how-to plotting the PCA results from above:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older matplotlib

# collect the first three principal components of each sample
x = []
y = []
z = []
for item in results.Y:
    x.append(item[0])
    y.append(item[1])
    z.append(item[2])

plt.close('all')  # close all latent plotting windows
fig1 = plt.figure()  # make a plotting figure
ax = fig1.add_subplot(111, projection='3d')  # create a 3D axes on the plotting figure
pltData = [x, y, z]
ax.scatter(pltData[0], pltData[1], pltData[2], c='b', marker='o')  # make a scatter plot of blue dots from the data

# make simple, bare axis lines through space:
xAxisLine = ((min(pltData[0]), max(pltData[0])), (0, 0), (0, 0))  # 2 points make the x-axis line at the data extrema along x-axis
ax.plot(xAxisLine[0], xAxisLine[1], xAxisLine[2], 'r')  # make a red line for the x-axis.
yAxisLine = ((0, 0), (min(pltData[1]), max(pltData[1])), (0, 0))  # 2 points make the y-axis line at the data extrema along y-axis
ax.plot(yAxisLine[0], yAxisLine[1], yAxisLine[2], 'r')  # make a red line for the y-axis.
zAxisLine = ((0, 0), (0, 0), (min(pltData[2]), max(pltData[2])))  # 2 points make the z-axis line at the data extrema along z-axis
ax.plot(zAxisLine[0], zAxisLine[1], zAxisLine[2], 'r')  # make a red line for the z-axis.

# label the axes
ax.set_xlabel("x-axis label")
ax.set_ylabel("y-axis label")
ax.set_zlabel("z-axis label")
ax.set_title("The title of the plot")
plt.show()  # show the plot
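If only the first two components are of interest, a flat plot is simpler. The sketch below is one possible way to do it, not code from the how-to: the `projected` array is random stand-in data for `results.Y`, the output filename is hypothetical, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
projected = rng.normal(size=(30, 3))  # made-up stand-in for results.Y

fig, ax = plt.subplots()
# the first two columns are the first two principal components
ax.scatter(projected[:, 0], projected[:, 1], c="b", marker="o")
ax.axhline(0, color="r")  # red line along the first component
ax.axvline(0, color="r")  # red line along the second component
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_title("First two principal components")
fig.savefig("pca_2d.png")  # hypothetical output file
```

Slicing columns with `projected[:, 0]` also avoids the explicit loop that fills the `x`, `y` and `z` lists.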
two comments
Thanks for such a great post.
I have a small question: how can I plot a 2D graph instead of a 3D one after reducing my dimensions with your script?
I would appreciate any help.
The PCA implementation in matplotlib is buggy, or at least unorthodox: instead of decomposing the covariance matrix it decomposes the correlation matrix, so the distribution of eigenvalues, for example, is different.
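The difference this comment describes can be illustrated with a small NumPy sketch. The data is made up, with three independent columns on deliberately different scales, and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
# made-up data: three independent columns with standard deviations of about 1, 10 and 100
data = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])

# eigenvalues of the covariance matrix, largest first
cov_eigs = np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1]
# eigenvalues of the correlation matrix, largest first
corr_eigs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

cov_fracs = cov_eigs / cov_eigs.sum()
corr_fracs = corr_eigs / corr_eigs.sum()
print(cov_fracs)   # dominated by the large-scale column
print(corr_fracs)  # much more even, since every column is rescaled to unit variance
```

Which of the two is preferable depends on whether the columns share comparable units; decomposing the correlation matrix amounts to standardizing each column before the PCA.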