PCA in R: code examples using prcomp and princomp

I have used PCA in R several times and I always get confused about the use of prcomp and princomp.

The following R code compares the two functions and the usual tasks done with each:

############################################################
# PCA IN R
# example on using prcomp and princomp in R
# See these blog posts and pages:
# http://stats.stackexchange.com/questions/104306/what-is-the-difference-between-loadings-and-correlation-loadings-in-pca-and
# http://www.sthda.com/english/wiki/principal-component-analysis-in-r-prcomp-vs-princomp-r-software-and-data-mining
#
# prcomp is preferred, see: http://stats.stackexchange.com/questions/20101/what-is-the-difference-between-r-functions-prcomp-and-princomp

############################################################
# OBTAIN PRINCIPAL COMPONENTS
data(mtcars)
pc1 = prcomp(mtcars, center=TRUE, scale.=TRUE)$x[,1:2] # principal component scores from centered and scaled data
pc1
pc2 = prcomp(mtcars, center=TRUE, scale.=FALSE)$x[,1:2] # principal component scores, centered but not scaled
pc2
pc3 = princomp(mtcars)$scores[,1:2] # same values as prcomp(mtcars, center=TRUE, scale.=FALSE)$x[,1:2], up to a possible sign change
pc3
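
A quick sign-insensitive check (my own addition, not in the functions' documentation) that pc2 and pc3 really agree:

max(abs(abs(pc2) - abs(pc3))) # ~0: equal up to a per-column sign flip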

############################################################
# OBTAIN LOADINGS
load1 = prcomp(mtcars)$rotation[,1:2]
load1
load2 = princomp(mtcars)$loadings[,1:2] # same values as prcomp(mtcars)$rotation[,1:2], up to a possible sign change
load2

############################################################
# OBTAIN ORIGINAL DATA
mtcars
mtcars_2 = prcomp(mtcars)$x %*% t(prcomp(mtcars)$rotation)
mtcars_2 # not the same as mtcars: the subtracted column means are still missing
mtcars_3 = princomp(mtcars)$scores %*% t(princomp(mtcars)$loadings)
mtcars_3 # not the same as mtcars, for the same reason
mtcars_2_c = prcomp(mtcars)$x %*% t(prcomp(mtcars)$rotation) + matrix(rep(prcomp(mtcars)$center, each=nrow(mtcars)), nrow=nrow(mtcars))
mtcars_2_c # after adding back the previously subtracted means, the result is the same as mtcars
mtcars_3_c = princomp(mtcars)$scores %*% t(princomp(mtcars)$loadings) + matrix(rep(princomp(mtcars)$center, each=nrow(mtcars)), nrow=nrow(mtcars))
mtcars_3_c # same as mtcars
# if scaling was used, you also need to multiply by the scale factor before adding the center back: http://stackoverflow.com/questions/29783790/how-to-reverse-pca-in-prcomp-to-get-original-data
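
For completeness, here is a minimal sketch of that reversal (my own example following the linked answer; p_s and back are just illustrative names): undo the scaling with the stored $scale before adding $center back.

p_s = prcomp(mtcars, center=TRUE, scale.=TRUE)
back = p_s$x %*% t(p_s$rotation) # still centered and scaled
back = sweep(back, 2, p_s$scale, "*") # multiply each column by its scale factor
back = sweep(back, 2, p_s$center, "+") # add the column means back
max(abs(back - as.matrix(mtcars))) # ~0: mtcars recovered up to floating point error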

SVD and PCA in R

A guide to doing SVD and PCA in R.

First, some good documents about PCA and SVD:

http://www.math.ucsd.edu/~gptesler/283/slides/pca_f13-handout.pdf … about a genetics application, VERY GOOD

http://www.cs.cmu.edu/~jimeng/papers/ICMLtutorial.pdf … a generic document about tensors in machine learning, but it has a very interesting initial chapter about SVD and PCA

https://www.cs.princeton.edu/picasso/mats/PCA-Tutorial-Intuition_jp.pdf

Imagine you have a matrix Y (m x n) with m rows and n columns, and suppose m >= n. For example, it can be a matrix of images whose rows hold the vectorised pixel intensities, or a gene-expression matrix whose rows are genes. The columns hold the samples, so, for example, different columns correspond to different tissue samples or different images.

The SVD of Y is the matrix factorization Y = UDV', where the prime denotes transpose.

In R the SVD is obtained with:

s = svd(Y)
s$u # the matrix U
s$d # the diagonal of D, as a vector
s$v # the matrix V (not its transpose)
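
As a quick sanity check that the factorization recomposes the matrix exactly, here is a toy example of my own (Y0 and s0 are just illustrative names):

set.seed(1)
Y0 = matrix(rnorm(20), nrow=5) # toy 5x4 matrix, m >= n
s0 = svd(Y0)
max(abs(Y0 - s0$u %*% diag(s0$d) %*% t(s0$v))) # ~0: U D V' reproduces Y0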

Before applying svd, the matrix Y has to be detrended, that is, we have to subtract the mean. Note that prcomp (used below) treats rows as observations and columns as variables, so to reproduce its output we have to center each column:

Y = sweep(Y, 2, colMeans(Y)) # subtract each column's mean
s = svd(Y) # recompute the svd on the detrended Y

Once this is done, the principal components are the columns of s$v obtained by applying svd to the detrended Y.

These can also be obtained using the function prcomp in R:

p = prcomp(Y)

and the principal components are in p$rotation. Try this plot to see that the columns of s$v and p$rotation are equal (up to a possible sign change):

plot(s$v[,1], p$rotation[,1]) # same for the other columns

Before applying prcomp it is not necessary to detrend, because prcomp centers the data itself (center=TRUE is its default).

The PCs can be s$v or s$u, depending on the problem and on whether we are working with Y or Y'. In general there is confusion about which ones are "the PCs" and which eigenvectors you take to project into the new low-dimensional space, and documents disagree with each other about this.

In principle, the projection is done with the values obtained from svd (s$v or s$u) or their transposes. These values are usually called the principal components, but other papers call those same values multiplied by s$d the principal components; in that case there can also be an additional factor of 1/(m-1), which accounts for the divisor used to obtain the covariance matrix. See the first reference, where the differences are presented quite clearly.

IN GENERAL: form YY' and Y'Y and take the one with the lower dimension (this is usually the covariance matrix that makes sense). If you used YY' the PCs are the columns of U, and if you used Y'Y they are the columns of V.
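
To make these conventions concrete, a small check of my own, reusing the detrended Y, s and p from above: the projections in p$x are U multiplied by D, and prcomp's standard deviations are the singular values with the 1/(m-1) covariance factor applied.

m = nrow(Y)
max(abs(p$x - s$u %*% diag(s$d))) # ~0 here (in general columns can differ by a sign)
max(abs(p$sdev - s$d / sqrt(m - 1))) # ~0: sdev is d scaled by 1/sqrt(m-1)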

Here is some R code you can use to see the relation between prcomp and svd (it assumes the detrended Y, s and p from above):

u = s$u
d = s$d
v = s$v
View(p$rotation)
View(s$v)
View(abs(s$v) - abs(p$rotation)) # ~0: equal up to a possible sign flip per column

sqrt(colSums(p$rotation^2)) # all 1: the loading vectors are unit vectors
View(Y %*% p$rotation - p$x) # ~0: p$x holds the projections of (centered) Y on the PCs

Making sense of PCA and its difference from OLS

Look at these (a small sketch of the difference follows the links):

http://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues

http://math.stackexchange.com/questions/1146/intuitive-way-to-understand-principal-component-analysis

http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf

http://www.cerebralmastication.com/2010/09/principal-component-analysis-pca-vs-ordinary-least-squares-ols-a-visual-explination/
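
The last link contrasts PCA with ordinary least squares; as a minimal sketch of that point (my own code, not taken from the post): OLS minimises vertical distances to the line, while the first principal component minimises perpendicular distances, so the fitted directions differ.

set.seed(1)
x = rnorm(100)
y = 2 * x + rnorm(100)
ols_slope = unname(coef(lm(y ~ x))[2]) # OLS: minimises vertical errors
v1 = prcomp(cbind(x, y))$rotation[, 1] # first PC direction: minimises perpendicular errors
pca_slope = unname(v1[2] / v1[1])
c(ols = ols_slope, pca = pca_slope) # the PCA slope is steeper than the OLS slope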
