Dean Adams, Iowa State University
Say we have obtained 8 linear measurements from a set of birds, and wish to examine patterns of variation in these data (the data are actually the Bumpus data). Obviously, one cannot draw an 8-dimensional data space, so what to do?
One option is to plot pairs of variables and perform some mental gymnastics to stitch these together:
But can we instead have a summary plot of the entire data space?
Ordination approaches provide a solution
Principal component analysis (PCA) is two things: (1) a singular-value decomposition (SVD) of a symmetric matrix and (2) projection of mean-centered or standardized data onto eigenvectors from SVD.
Using the mean-centered data \(\small\mathbf{Y}_c\), we calculate:
\[\small\hat{\mathbf{\Sigma}}=\frac{\mathbf{Y}^{T}_c\mathbf{Y}_c}{n-1}\]
We then decompose the covariance matrix via eigen-analysis (SVD):
\[\small\hat{\mathbf{\Sigma}}=\mathbf{U} \mathbf{\Lambda} \mathbf{U}^T\]
This step identifies a set of orthogonal axes describing directions of variation. These are the columns of \(\small\mathbf{U}\). Next we project the data onto these axes as:
\[\small\mathbf{P} = \mathbf{Y}_c\mathbf{U}\]
Here, \(\small\mathbf{P}\) represents the set of projection scores (principal component scores) on each of the PC axes. These are used in the subsequent plot.
Finally, the proportion of variation explained by the \(\small{i}^{th}\) principal component (column of \(\small\mathbf{U}\)) is \[\small\frac{\lambda_i}{\mathrm{tr}\,\mathbf{\Lambda}}\]
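As a concrete illustration, here is a minimal R sketch of these steps on hypothetical data (any \(n \times p\) matrix will do); `prcomp()` performs the same computation:

```r
# PCA "by hand": covariance matrix, eigen-analysis, projection (hypothetical data)
set.seed(1)
Y  <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)   # n = 50 specimens, p = 4 traits
Yc <- scale(Y, center = TRUE, scale = FALSE)       # mean-centered data

S   <- t(Yc) %*% Yc / (nrow(Yc) - 1)               # covariance matrix (same as cov(Y))
eig <- eigen(S)                                    # eigen-analysis of the symmetric matrix
U   <- eig$vectors                                 # columns are the PC axes
P   <- Yc %*% U                                    # PC (projection) scores

eig$values / sum(eig$values)                       # proportion of variation per PC
# compare with prcomp(Y): same scores up to column sign flips
```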
Note that one can alternatively perform SVD directly on \(\small\mathbf{Y}_c\):
\[\small{\mathbf{Y}_c}=\mathbf{V} \mathbf{D} \mathbf{U}^T\]
Here, the squared singular values in \(\small\mathbf{D}^2\) are proportional to the variation explained by each PC axis, and the PC axes themselves are the right singular vectors, found in the columns of \(\small\mathbf{U}\).
PC scores are found as:
\[\small\mathbf{P} = \mathbf{VD}\]
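The same result via SVD of the mean-centered data; note that R's `svd()` returns components named `u`, `d`, and `v`, which correspond to \(\small\mathbf{V}\), \(\small\mathbf{D}\), and \(\small\mathbf{U}\) in the notation above (hypothetical data, not the original code):

```r
set.seed(1)
Y  <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)
Yc <- scale(Y, center = TRUE, scale = FALSE)

s <- svd(Yc)             # Yc = (s$u) diag(s$d) t(s$v)
P <- s$u %*% diag(s$d)   # PC scores, identical (up to sign) to Yc %*% eigen(cov(Y))$vectors
s$d^2 / sum(s$d^2)       # proportion of variation per PC (d^2/(n-1) are the eigenvalues)
```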
Matrix decompositions provide mechanisms for many multivariate methods, including ordinations, linear model coefficient calculations, and matrix covariation analyses. Here we describe two we have just used:
For a \(\small{p} \times p\) square matrix, \(\small\mathbf{S}\), there exists a vector of \(\small{p}\) values, \(\small\mathbf{\lambda}\), such that,
\[\small\mathbf{Sv} =\lambda\mathbf{v}\] which can also be written (in a more computationally feasible way) as \[\small\left( \mathbf{S}-\lambda\mathbf{I} \right) \mathbf{v} = 0\]
Solving for each possible eigenvector (\(\small\mathbf{v}\)) and eigenvalue (\(\small\lambda\)) via the characteristic polynomial is called "eigen analysis". This is done systematically by setting the determinant \(\small\left| \mathbf{S}-\lambda\mathbf{I} \right|\) equal to zero, which yields a polynomial with up to \(\small{p}\) solutions for \(\small\lambda\). All \(\small{p}\) solutions are positive real numbers if \(\small\mathbf{S}\) is positive-definite; some will be 0 if \(\small\mathbf{S}\) is not full-rank; and some can be negative if \(\small\mathbf{S}\) is not symmetric (not a certainty, but common with non-metric association matrices, where the similarity of A to B is not the same as that of B to A). Once the values of \(\small\lambda\) are found, the corresponding vectors, \(\small\mathbf{v}\), are estimated such that the equation above holds true.
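A small R illustration of eigen analysis, verifying the defining equation \(\small\mathbf{Sv} = \lambda\mathbf{v}\) for a hypothetical symmetric matrix:

```r
S <- matrix(c(4.0, 2.0, 1.0,
              2.0, 3.0, 0.5,
              1.0, 0.5, 2.0), nrow = 3, byrow = TRUE)   # symmetric, positive-definite

e <- eigen(S)
e$values                        # the p eigenvalues, ordered largest to smallest
e$vectors                       # columns are the eigenvectors v

S %*% e$vectors[, 1]            # S v_1 ...
e$values[1] * e$vectors[, 1]    # ... equals lambda_1 v_1
```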
\[\small\mathbf{Y}_{n \times p} =\mathbf{U}_{n \times p'} \mathbf{\Lambda}_{p' \times p'} \mathbf{V}_{p \times p'}^T\]
\(\small\mathbf{U}\) (left singular vectors) and \(\small\mathbf{V}\) (right singular vectors) are solved via numerical approximation algorithms. The number of singular values is \(\small{p}'= \min(n,p)\). If \(\small\mathbf{Y}\) is not full-rank, some of the singular values in \(\small\mathbf{\Lambda}\) will be 0. The singular values are ordered from largest to smallest.
\[\small\mathbf{S}_{p \times p} =\mathbf{U}_{p \times p} \mathbf{\Lambda}_{p \times p} \mathbf{U}_{p \times p}^T\]
\(\small\mathbf{U}\) (the left and right singular vectors) is solved via numerical approximation algorithms. If \(\small\mathbf{S}\) is symmetric and positive-definite, \(\small\mathbf{U}\) is the matrix of eigenvectors and \(\small\mathbf{\Lambda}\) the matrix of eigenvalues, equal to those found from eigen analysis using the characteristic polynomial. This is a “short-cut” method, often employed for principal component analysis when it is certain that \(\small\mathbf{S}\) is symmetric and positive-definite.
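A quick check of this short-cut in R, on a hypothetical covariance matrix: `svd()` and `eigen()` return the same decomposition when \(\small\mathbf{S}\) is symmetric and positive-definite.

```r
set.seed(10)
S <- cov(matrix(rnorm(100 * 3), ncol = 3))   # a 3 x 3 symmetric, positive-definite matrix

eigen(S)$values
svd(S)$d                                     # same values (to numerical precision)
# svd(S)$u and eigen(S)$vectors also match, up to column sign flips
```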
Let’s look at the dimensions of all components through this analysis:
1: \(\small\mathbf{Y}_c\): The data matrix is of dimension \(\small{n}\times{p}\)
2: \(\small\hat{\mathbf{\Sigma}}=\frac{\mathbf{Y}^{T}_c\mathbf{Y}_c}{n-1}\): The covariance matrix is of dimension \(\small{p}\times{p}\)
3: \(\small\hat{\mathbf{\Sigma}}=\mathbf{U} \mathbf{\Lambda} \mathbf{U}^T\): The matrix of eigenvectors \(\small\mathbf{U}\) is of dimension \(\small{p}\times{p}\)
4: \(\small\mathbf{P} = \mathbf{Y}_c\mathbf{U}\): The matrix of projected scores is of dimension \(\small{n}\times{p}\)
So conceptually, PCA takes a \(\small{p}\)-dimensional data space, finds \(\small{p}\) new orthogonal directions within that space, and rigidly rotates the data to these new axes to arrive at the PCA projected space.
In some cases, some dimensions may have 0% variation (\(\small\lambda_i=0\))
When \(\small{N > p}\), there are (at most) \(\small{p}\) PC axes with variation; when \(\small{p > N}\), there are (at most) \(\small{N-1}\) PC axes with variation (when using metric data)
Fewer PCs with variation may also occur when there is redundancy in the data
## [1] 3.475031e+00 9.889196e-01 2.220446e-15
Note that we have 3 variables, but only 2 dimensions of data (the third eigenvalue is numerically zero)
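This situation is easy to reproduce (a sketch with hypothetical data; the exact values will differ from the output above):

```r
set.seed(2)
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- x1 + x2                 # redundant: a linear combination of the other two
Y  <- cbind(x1, x2, x3)

eigen(cov(Y))$values          # third eigenvalue is ~0: only 2 dimensions of variation
```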
If a variable varies much more than others, it dominates the PCA
One solution is to standardize the variables first, since cov(Y.std) = cor(Y): a PCA of standardized data is equivalent to a PCA of the correlation matrix.
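A minimal sketch of this equivalence (hypothetical data with very unequal variances):

```r
set.seed(3)
Y <- cbind(rnorm(50, sd = 10), rnorm(50, sd = 1), rnorm(50, sd = 0.1))

Y.std <- scale(Y)                                           # center and scale to unit variance
all.equal(cov(Y.std), cor(Y), check.attributes = FALSE)     # TRUE

p.cov <- prcomp(Y)                   # PCA of the covariance matrix
p.cor <- prcomp(Y, scale. = TRUE)    # PCA of the correlation matrix
p.cov$sdev^2 / sum(p.cov$sdev^2)     # PC1 dominated by the high-variance variable
p.cor$sdev^2 / sum(p.cor$sdev^2)     # variation spread far more evenly
```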
Returning to the Bumpus data, we have 8 linear measurements from 136 birds. Thus, \(\small\mathbf{Y}_{c}\) is a \(\small{136}\times{8}\) dimensional matrix. A PCA of these data is below:
## [1] 0.825568587 0.113439756 0.031562141 0.018573009 0.005248828 0.002640375
## [7] 0.001771733 0.001195571
## TL AE WT BHL HL FL TTL
## 0.46642125 0.86020181 0.14948852 0.06078853 0.06403447 0.05835265 0.09008610
## SW
## 0.02900088
Males and females are colored blue and red, respectively. PC1 describes 82% of the variation and PC2 describes 11%. The first PC is driven largely by AE and TL.
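A sketch of how output like that above can be produced with `prcomp()`; the object names `bumpus.Y` (the \(136 \times 8\) measurement matrix) and `bumpus.sex` are assumptions for illustration, not the original code:

```r
pca <- prcomp(bumpus.Y)                 # covariance-based PCA (hypothetical object name)
pca$sdev^2 / sum(pca$sdev^2)            # proportion of variation per PC
pca$rotation[, 1]                       # PC1 loadings

plot(pca$x[, 1:2], asp = 1, xlab = "PC1", ylab = "PC2",
     col = ifelse(bumpus.sex == "m", "blue", "red"))   # sex coding assumed
```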
Shepard’s plot of distances:
Note that larger distances are more accurately represented, while there is more ‘slop’ in the smaller distances. The reason is that objects similar in the PC1-PC2 projection may still differ in the higher PC dimensions.
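A sketch of how such a Shepard-type comparison is built, using hypothetical data: plot the full-space Euclidean distances against the distances in the PC1-PC2 projection.

```r
set.seed(4)
Y <- matrix(rnorm(60 * 8), nrow = 60, ncol = 8)
P <- prcomp(Y)$x

D.full <- dist(Y)          # Euclidean distances in the original 8-D space
D.proj <- dist(P[, 1:2])   # distances in the 2-D PCA projection

plot(D.full, D.proj, xlab = "Original distance", ylab = "Distance in PC1-PC2 plot")
abline(0, 1)               # projected distances can only shrink relative to the originals
```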
PCA is helpful for exploring multivariate space and it may suggest patterns
It is descriptive! It is nothing more than a rigid rotation of the multivariate data space
No hypotheses are tested
Using the correlation vs. covariance matrix has important implications
It preserves the Euclidean distances among objects in the original multivariate data space
Very useful for summarizing the results of hypothesis-testing approaches applied to multivariate data (e.g., MANOVA)
Wait, what???
It seems that, apart from an axis reflection (which means nothing), PCA and this distance-based method (PCoA) yielded identical results
What in the WORLD is going on???
For metric spaces, PCoA provides an identical ordination to PCA (PCoA is sometimes called Metric Multidimensional Scaling). This is most interesting, as PCA is based on the \(\small{p}\times{p}\) trait covariance matrix, while PCoA is based on the \(\small{n}\times{n}\) specimen distance matrix. Yet the relationships among objects in the two plots are absolutely identical!
PCA and PCoA preserve distances among objects
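A minimal demonstration with hypothetical data: classical PCoA of the Euclidean distance matrix (`cmdscale()`) recovers the PCA scores.

```r
set.seed(5)
Y <- matrix(rnorm(40 * 5), nrow = 40, ncol = 5)

pca  <- prcomp(Y)$x[, 1:2]          # from the p x p covariance matrix
pcoa <- cmdscale(dist(Y), k = 2)    # from the n x n Euclidean distance matrix

all.equal(abs(pca), abs(pcoa), check.attributes = FALSE)   # TRUE: identical up to axis reflection
```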
Non-metric MDS generates an ordination in which similar objects are close together and dissimilar objects are far apart
Preserves the relative order of the objects, but not the distances themselves
Objective is to find a low dimensional representation of the data space
\[\small\mathrm{stress} =\sqrt{\frac{\sum\left(D_{fitted}-\hat{D}_{fitted}\right)^2}{\sum D_{fitted}^2}}\]
Pretty similar, but NOT identical.
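A sketch of an NMDS ordination using MASS::isoMDS on hypothetical data (vegan::metaMDS is a common alternative); this is not the code used for the plot above.

```r
library(MASS)
set.seed(6)
Y <- matrix(rnorm(40 * 5), nrow = 40, ncol = 5)

D    <- dist(Y)              # any dissimilarity matrix may be used here
nmds <- isoMDS(D, k = 2)     # iteratively adjusts the configuration to minimize stress

nmds$stress                  # final stress (isoMDS reports it as a percentage)
plot(nmds$points, asp = 1, xlab = "NMDS1", ylab = "NMDS2")
```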
Correspondence analysis (CA): an ordination for count and frequency data (sometimes called reciprocal averaging)
Very frequently used in ecology (community data, species presence etc)
Preserves the \(\small\chi^2\) distance among objects (a weighted Euclidean distance of conditional probabilities)
\[\small{D}_{\chi^2}=\sqrt{\sum\frac{1}{y_k}(\frac{y_{ki}}{y_i}-\frac{y_{kj}}{y_j})^2}\]
Provides a test of independence of rows and columns
Elements of \(\small\mathbf{\overline{Q}}\): \(\small{q}_{ij}=[\frac{p_{ij}-p_{i+}p_{j+}}{\sqrt{p_{i+}p_{j+}}}]\)
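A minimal sketch of these computations on a small hypothetical count table; the CA axes come from an SVD of \(\small\mathbf{\overline{Q}}\):

```r
tab <- matrix(c(10,  2,  0,
                 4,  8,  3,
                 0,  3, 12), nrow = 3, byrow = TRUE)   # hypothetical counts

P.f  <- tab / sum(tab)                            # relative frequencies p_ij
r.w  <- rowSums(P.f)                              # row margins p_i+
c.w  <- colSums(P.f)                              # column margins p_j+
Qbar <- (P.f - r.w %o% c.w) / sqrt(r.w %o% c.w)   # elements q_ij as in the formula above

s <- svd(Qbar)
s$d^2                      # inertia per CA axis (the last value is ~0)
sum(s$d^2) * sum(tab)      # equals the chi-squared statistic for the table
# row and column scores are obtained by rescaling s$u and s$v by the margins and singular values
```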
Dune meadow vegetation data from the Dutch island of Terschelling
rows: 20 randomly selected 2x2 m plots sampled on the island in 1982 for a dune conservation project
columns: plant species
values: cover class (standardized estimate of density)
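A sketch of a CA of these data using the vegan package (an assumption here, not necessarily the original code); `cca()` with a single community matrix performs correspondence analysis:

```r
library(vegan)
data(dune)                    # 20 plots x 30 species, cover-class values

dune.ca <- cca(dune)          # correspondence analysis
summary(eigenvals(dune.ca))   # variation (inertia) explained per CA axis
plot(dune.ca)                 # joint ordination of plots and species
```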
Although most ordination methods construct mathematically orthogonal axes, real life is not constrained to be orthogonal!
This may create problems, e.g. the “arch effect” (aka the “horseshoe” effect)
The \(\small{2}^{nd}\) axis is an arched function of the 1st axis (common in ecological data, e.g. species counts in sites along an environmental gradient)
Detrending approaches are arbitrary (how does one choose segments? What order polynomial should be used?)
DC2 is now completely meaningless (proximities of objects on DC2 cannot be interpreted)
The DCA plot is distorted and no longer represents distances among objects
Detrending eliminates (hides) the arch effect, which may in itself be useful information (i.e., the arch IS the pattern in the data!)
Detrending should absolutely be avoided!
Ordination approaches are very useful for visualizing patterns in high-dimensional data
Methods that preserve distances and directions (PCA, PCoA) retain more of the original pattern, and have greater utility
For non-MVN data, be certain to use an appropriate distance measure or an appropriate ordination tool