Some notes on Principal Component Analysis

Principal Component Analysis (PCA) is an ordination method that, given N explanatory variables, creates a new set of N variables, the principal components (PCs), each a linear combination of the original ones. The PCs have two main characteristics: 1) they are all orthogonal to each other, and therefore uncorrelated (though not necessarily independent), and 2) they are ranked by importance: the first PC is the one that explains the most variability, the Nth is the one that explains the least. Because of these features, PCA is often used to reduce the dimensionality of multivariate data by keeping only the first two or three PCs, which explain the most variability.
It is also possible to determine the relative contribution of each of the original variables to each PC.
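
The properties above can be written compactly. The following is a sketch of the standard covariance-based formulation; the symbols (X for the column-centred data matrix with n rows and N columns, S, V, Λ, Z) are notation introduced here, not taken from the text above.

\[
S = \frac{1}{n-1} X^{\top} X = V \Lambda V^{\top}, \qquad
V^{\top} V = I, \qquad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_N), \quad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N \ge 0
\]

\[
Z = X V, \qquad
\frac{1}{n-1} Z^{\top} Z = V^{\top} S V = \Lambda, \qquad
\text{proportion of variance of PC } k = \frac{\lambda_k}{\sum_{j=1}^{N} \lambda_j}
\]

The columns of Z are the PCs. Their covariance matrix Λ is diagonal, which is why they are uncorrelated, and the ordering of the eigenvalues gives the ranking by explained variability. The k-th column of V holds the weights (loadings) of the original variables in PC k, which is how the relative contribution of each variable can be read off.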

Click here for a very good, interactive explanation of the idea behind PCA.

A practical example using the software R on the “iris” dataset follows.
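
As a minimal sketch of how such an analysis might start (not necessarily the steps of the full example), the code below uses base R's `prcomp` on the four numeric columns of the built-in `iris` data frame. Standardising the variables with `scale. = TRUE` and the variable names `var_explained` and `scores_2d` are assumptions made here for illustration.

```r
# PCA on the four numeric measurements of the built-in iris data.
# center/scale. standardise each variable first (an assumption of this sketch).
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Proportion of the total variance explained by each PC (PC1 first).
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Loadings: the weight of each original variable in each PC.
print(pca$rotation)

# Scores on different PCs are uncorrelated: off-diagonal entries are ~0.
print(round(cor(pca$x), 3))

# Dimensionality reduction: keep only the scores on the first two PCs.
scores_2d <- pca$x[, 1:2]
plot(scores_2d, col = iris$Species, pch = 19, xlab = "PC1", ylab = "PC2")
```

Whether to scale is a judgment call: all four iris measurements are in centimetres, so an unscaled PCA (on the covariance rather than the correlation matrix) would also be defensible and would give somewhat different loadings.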