08: Ordination Methods

General Overview

Dean Adams, Iowa State University

Exploratory Multivariate Methods

Motivation for Ordination

Ordination and Dimension Reduction

Some Ordination Methods

Principal Components Analysis (PCA)

PCA: Conceptual Visualization

PCA: Conceptual Visualization

PCA Computations

Principal component analysis (PCA) consists of two steps: (1) a singular-value decomposition (SVD) of a symmetric matrix, and (2) projection of the mean-centered or standardized data onto the eigenvectors from that SVD.

Using the mean-centered data \(\small\mathbf{Y}_c\), we calculate the covariance matrix:

\[\small\hat{\mathbf{\Sigma}}=\frac{\mathbf{Y}^{T}_c\mathbf{Y}_c}{n-1}\]

We then decompose the covariance matrix via eigen-analysis (SVD):

\[\small\hat{\mathbf{\Sigma}}=\mathbf{U} \mathbf{\Lambda} \mathbf{U}^T\]

This step identifies a set of orthogonal axes describing directions of variation. These are the columns of \(\small\mathbf{U}\). Next we project the data onto these axes as:

\[\small\mathbf{P} = \mathbf{Y}_c\mathbf{U}\]

Here, \(\small\mathbf{P}\) represents the set of projection scores (principal component scores) on each of the PC axes. These are used in the subsequent plot.

Finally, the proportion of variation explained by the \(\small{i}^{th}\) principal component (column of \(\small\mathbf{U}\)) is \[\small\frac{\lambda_i}{\mathrm{tr}(\mathbf{\Lambda})}\]
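
A minimal sketch of these computations in R, using simulated data (object names are illustrative):

set.seed(1)
Y <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)  # n = 50 specimens, p = 4 traits
Yc <- scale(Y, center = TRUE, scale = FALSE)     # mean-center each column
S <- crossprod(Yc) / (nrow(Yc) - 1)              # covariance matrix: t(Yc) %*% Yc / (n - 1)
E <- eigen(S)                                    # eigen-analysis (SVD of a symmetric matrix)
U <- E$vectors                                   # columns are the PC axes
P <- Yc %*% U                                    # projected (PC) scores
E$values / sum(E$values)                         # proportion of variation per PC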

PCA: Alternative Computations

Note that one can alternatively perform SVD directly on \(\small\mathbf{Y}_c\):

\[\small{\mathbf{Y}_c}=\mathbf{V} \mathbf{D} \mathbf{U}^T\]

Here, the squared singular values in \(\small\mathbf{D}^2\) are proportional to the eigenvalues (\(\small\mathbf{D}^2=(n-1)\mathbf{\Lambda}\)), so they yield the percent variation explained by each PC axis. The PC axes themselves are the right singular vectors: the columns of \(\small\mathbf{U}\).

PC scores are found as:

\[\small\mathbf{P} = \mathbf{VD}\]

NOTE: eigenanalysis of the inner product \(\small\mathbf{Y}^{T}_c\mathbf{Y}_c\) yields \(\small\mathbf{U}\), while eigenanalysis of the outer product \(\small\mathbf{Y}_c\mathbf{Y}^{T}_c\) yields \(\small\mathbf{V}\)
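
Continuing the sketch above, the same quantities can be obtained from svd() applied directly to the centered data. Note that R's svd() returns components named u, d, and v, where v corresponds to \(\small\mathbf{U}\) above (the right singular vectors) and u corresponds to \(\small\mathbf{V}\):

s <- svd(Yc)                  # Yc = V D U^T in the notation above
P.svd <- s$u %*% diag(s$d)    # PC scores (V D); equals Yc %*% s$v, up to sign
s$d^2 / sum(s$d^2)            # proportion of variation per PC, matching the eigenvalue version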

Side Comment: Matrix Decompositions

Matrix decompositions provide mechanisms for many multivariate methods, including ordinations, linear model coefficient calculations, and matrix covariation analyses. Here we describe two we have just used:

Eigen analysis

For a \(\small{p} \times p\) square matrix, \(\small\mathbf{S}\), there exists a vector of \(\small{p}\) values, \(\small\mathbf{\lambda}\), such that

\[\small\mathbf{Sv} =\lambda\mathbf{v}\] which can also be written (in a more computationally feasible way) as \[\small\left( \mathbf{S}-\lambda\mathbf{I} \right) \mathbf{v} = 0\]

Solving for each possible eigenvector (\(\small\mathbf{v}\)) and eigenvalue (\(\small\lambda\)) via a characteristic polynomial is called “eigen analysis”. This is done systematically by first setting the determinant, \(\small\left| \mathbf{S}-\lambda\mathbf{I} \right|\), to zero. This yields an equation with up to \(\small{p}\) solutions for \(\small\lambda\) that balance the equation. Up to \(\small{p}\) real values exist if \(\small\mathbf{S}\) is positive-definite. Some of these values may be 0 if \(\small\mathbf{S}\) is not full-rank, and some can be negative if \(\small\mathbf{S}\) is not symmetric (this is not a certainty, but can be common with non-metric association matrices, i.e., where the similarity of A to B is not the same as the similarity of B to A). Once the values of \(\small\lambda\) are found, the vectors, \(\small\mathbf{v}\), are estimated such that the equation above holds.
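
A small illustration in R: eigen-analysis of a symmetric matrix, with a check that \(\small\mathbf{Sv}=\lambda\mathbf{v}\) holds (the matrix values are illustrative):

S <- matrix(c(2, 1, 1, 3), nrow = 2)  # a symmetric, positive-definite matrix
E <- eigen(S)
lambda <- E$values[1]                 # first (largest) eigenvalue
v <- E$vectors[, 1]                   # corresponding eigenvector
S %*% v                               # equals lambda * v
lambda * v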

Matrix Decompositions (cont.)

Singular Value decomposition of a Rectangular Matrix

\[\small\mathbf{Y}_{n \times p} =\mathbf{U}_{n \times p'} \mathbf{\Lambda}_{p' \times p'} \mathbf{V}_{p' \times p'}^T\]

\(\small\mathbf{U}\) (left singular vectors) and \(\small\mathbf{V}\) (right singular vectors) are solved via numerical approximation algorithms. The number of singular values is \(\small{p}'= \min(n,p)\). If \(\small\mathbf{Y}\) is not full-rank, some of the singular values in \(\small\mathbf{\Lambda}\) will be 0. The singular values are ordered from largest to smallest.

Singular Value decomposition of a Symmetric Matrix

\[\small\mathbf{S}_{p \times p} =\mathbf{U}_{p \times p} \mathbf{\Lambda}_{p \times p} \mathbf{U}_{p \times p}^T\]

\(\small\mathbf{U}\) (the left and right singular vectors) is solved via numerical approximation algorithms. If \(\small\mathbf{S}\) is symmetric and positive-definite, \(\small\mathbf{U}\) is the matrix of eigenvectors and \(\small\mathbf{\Lambda}\) the matrix of eigenvalues, equal to those found from eigen analysis using the characteristic polynomial. This is a “short-cut” method, often employed for principal component analysis when it is certain that \(\small\mathbf{S}\) is symmetric and positive-definite.
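
A brief sketch contrasting the two cases in R (matrices simulated for illustration):

Y <- matrix(rnorm(6 * 3), nrow = 6)   # rectangular: n = 6, p = 3
svd(Y)$d                              # p' = min(n, p) = 3 singular values, largest first

S <- crossprod(Y)                     # symmetric, positive semi-definite p x p matrix
all.equal(svd(S)$d, eigen(S)$values)  # singular values equal the eigenvalues
svd(S)$u                              # matches eigen(S)$vectors, up to sign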

PCA: Mathematical Comments

Let’s look at the dimensions of all components through this analysis:

1: \(\small\mathbf{Y}_c\): The data matrix is of dimension \(\small{n}\times{p}\)

2: \(\small\hat{\mathbf{\Sigma}}=\frac{\mathbf{Y}^{T}_c\mathbf{Y}_c}{n-1}\): The covariance matrix is of dimension \(\small{p}\times{p}\)

3: \(\small\hat{\mathbf{\Sigma}}=\mathbf{U} \mathbf{\Lambda} \mathbf{U}^T\): The matrix of eigenvectors \(\small\mathbf{U}\) is of dimension \(\small{p}\times{p}\)

4: \(\small\mathbf{P} = \mathbf{Y}_c\mathbf{U}\): The matrix of projected scores is of dimension \(\small{n}\times{p}\)

So conceptually, PCA takes a \(\small{p}\)-dimensional data space, finds \(\small{p}\) new directions within that space, and rigidly rotates the data to these new axes to arrive at the PCA projected space.
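
A quick dimension check of each component (simulated data; names illustrative):

n <- 50; p <- 4
Yc <- scale(matrix(rnorm(n * p), n, p), scale = FALSE)
U <- eigen(cov(Yc))$vectors
P <- Yc %*% U
dim(Yc); dim(cov(Yc)); dim(U); dim(P)  # n x p, p x p, p x p, n x p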

PCA: General Comments

PCA Interpretation

PCA and Dimensionality

In some cases, some dimensions may have 0% variation (\(\small\lambda_i=0\))

When \(\small{N > p}\), there are (at most) \(\small{p}\) PC axes with variation. When \(\small{p > N}\), there are (at most) \(\small{N-1}\) PC axes with variation (when using metric data)

Fewer PCs with variation may also occur when there is redundancy in the data

a <- rnorm(10)
b <- rnorm(10)
c <- a + b                      # third variable is a linear combination of the first two
cov.dat <- cov(cbind(a, b, c))  # 3 x 3 covariance matrix
eigen(cov.dat)$values           # third eigenvalue is (numerically) zero
## [1] 3.475031e+00 9.889196e-01 2.220446e-15

Note we have 3 variables, but only 2 dimensions of data

PCA: Effect of Trait Scale

If a variable varies much more than others, it dominates the PCA

(“more” is relative: 0 – 100 cm is numerically “more” than 0 – 1 m, even though the two ranges are physically identical)
NOTE: a) and b) are equivalent, i.e., cov(Y.std) = cor(Y)
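
A quick check of this equivalence (simulated traits on very different scales; names illustrative):

set.seed(2)
Y <- cbind(len = rnorm(30, 50, 10), mass = rnorm(30, 5, 0.5))  # traits on different scales
Y.std <- scale(Y)                                              # center and scale to unit variance
all.equal(cov(Y.std), cor(Y), check.attributes = FALSE)        # TRUE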

PCA: Example

Returning to the Bumpus data, we have 8 linear measurements from 136 birds. Thus, \(\small\mathbf{Y}_{c}\) is a \(\small{136}\times{8}\) dimensional matrix. A PCA of these data is below:

## [1] 0.825568587 0.113439756 0.031562141 0.018573009 0.005248828 0.002640375
## [7] 0.001771733 0.001195571
##         TL         AE         WT        BHL         HL         FL        TTL 
## 0.46642125 0.86020181 0.14948852 0.06078853 0.06403447 0.05835265 0.09008610 
##         SW 
## 0.02900088

Males and females are colored blue and red, respectively. PC1 describes 82% of the variation and PC2 describes 11%. The first PC is explained largely by AE and TL.
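
The values above could be produced by something like the following sketch; the objects bumpus (a matrix of the 8 measurements) and sex.col (a vector of colors by sex) are assumed names, not objects defined here:

pca <- prcomp(bumpus)              # PCA of the mean-centered measurements ('bumpus' assumed)
pca$sdev^2 / sum(pca$sdev^2)       # proportion of variation per PC (first block of output above)
pca$rotation[, 1]                  # trait loadings on PC1 (second block of output above)
plot(pca$x[, 1:2], col = sex.col)  # PC1-PC2 ordination, colored by sex ('sex.col' assumed)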

PCA Example (Cont.)

Shepard’s plot of distances:

Note that larger distances are more accurately represented, while there is more ‘slop’ in the smaller distances. The reason is that objects that appear similar in the PC1-PC2 projection may still differ in the higher PC dimensions.
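
A sketch of how such a Shepard-style plot can be built, continuing the (assumed) pca object from the previous sketch:

D.full <- dist(pca$x)        # distances in the full PC space (identical to the original space)
D.2d <- dist(pca$x[, 1:2])   # distances in the PC1-PC2 projection only
plot(D.full, D.2d, xlab = "Full-space distance", ylab = "PC1-PC2 distance")
abline(0, 1)                 # points below the line are distances shrunk by the projection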

Principal Components Analysis: Thoughts

PCA is helpful for exploring multivariate space and it may suggest patterns

It is descriptive! It is nothing more than a rigid rotation of the multivariate data space

No hypotheses are tested

Using the correlation vs. covariance matrix has important implications

It preserves the Euclidean distances among objects in the original multivariate data space

Very useful for summarizing the results of hypothesis-testing approaches applied to multivariate data (e.g., MANOVA)

Ordination: PCA Bi-Plot

Using Distances: PCoA: Principal Coordinates Analysis

Using Distances: PCoA: Principal Coordinates Analysis

Wait, what???

PCA Vs. PCoA

It seems that, apart from an axis reflection (which means nothing), PCA and this distance-based method (PCoA) yielded identical results

What in the WORLD is going on???

Principal Coordinates Analysis (PCoA)

For metric spaces, PCoA provides an identical ordination to PCA (PCoA is sometimes called Metric Multidimensional Scaling). This is most interesting, as PCA is based on the \(\small{p}\times{p}\) trait covariance matrix, while PCoA is based on the \(\small{n}\times{n}\) specimen distance matrix. Yet the relationships among objects in the two plots are absolutely identical!
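
A sketch of this equivalence in R, using cmdscale() for PCoA on Euclidean distances of simulated data:

set.seed(3)
Yc <- scale(matrix(rnorm(25 * 4), nrow = 25), scale = FALSE)
pc.scores <- prcomp(Yc)$x                 # PCA: from the p x p covariance matrix
pcoa.scores <- cmdscale(dist(Yc), k = 4)  # PCoA: from the n x n Euclidean distance matrix
all.equal(abs(pc.scores), abs(pcoa.scores), check.attributes = FALSE)  # identical up to reflection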

Principal Coordinates Analysis (PCoA)

NMDS: Nonmetric Multidimensional Scaling

PCA and PCoA preserve distances among objects

Non-metric MDS generates an ordination in which similar objects are close together and dissimilar objects are far apart

Preserves the relative order of the objects, but not the distances themselves

Objective is to find a low dimensional representation of the data space

NMDS: Procedure

\[\small{stress} =\sqrt{\sum(D_{fitted}-\hat{D}_{fitted})^2/ \sum{(D_{fitted})^2}}\]
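
A minimal NMDS sketch using MASS::isoMDS() (vegan::metaMDS() is a common alternative); the data here are simulated for illustration:

library(MASS)
set.seed(4)
Y <- matrix(rnorm(20 * 5), nrow = 20)
D <- dist(Y)               # distance matrix among objects
nmds <- isoMDS(D, k = 2)   # iteratively adjusts a 2-D configuration to minimize stress
nmds$stress                # final stress (reported as a percentage by isoMDS)
plot(nmds$points, xlab = "NMDS 1", ylab = "NMDS 2")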

NMDS: Example

Pretty similar, but NOT identical.

NMDS: Comments

Correspondence Analysis (CA)

Ordination for count and frequency data (sometimes called reciprocal averaging)

Very frequently used in ecology (community data, species presence, etc.)

Preserves the \(\small\chi^2\) distance among objects (a weighted Euclidean distance of conditional probabilities)

\[\small{D}_{\chi^2}(i,j)=\sqrt{\sum_{k=1}^{p}\frac{1}{y_{+k}}\left(\frac{y_{ik}}{y_{i+}}-\frac{y_{jk}}{y_{j+}}\right)^2}\] where \(\small{i}\) and \(\small{j}\) index objects (rows), \(\small{k}\) indexes variables (columns), and \(\small{+}\) denotes summation over the replaced index

Provides a test of independence of rows and columns

Correspondence Analysis: Procedure

Elements of \(\small\mathbf{\overline{Q}}\): \(\small{q}_{ij}=\frac{p_{ij}-p_{i+}p_{+j}}{\sqrt{p_{i+}p_{+j}}}\), where \(\small{p}_{i+}\) and \(\small{p}_{+j}\) are the row and column totals of the relative frequencies \(\small{p}_{ij}\)
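
A small sketch of this computation on an illustrative count table, with the CA axes obtained from the SVD of \(\small\mathbf{\overline{Q}}\):

tab <- matrix(c(10, 2, 4, 1, 8, 6, 0, 3, 12), nrow = 3)  # sites x species counts (illustrative)
P <- tab / sum(tab)                               # relative frequencies p_ij
r <- rowSums(P); cl <- colSums(P)                 # row totals p_i+ and column totals p_+j
Qbar <- (P - outer(r, cl)) / sqrt(outer(r, cl))   # elements q_ij as defined above
s <- svd(Qbar)                                    # CA axes from the SVD of Q-bar
s$d^2                                             # principal inertias (eigenvalues)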

Correspondence Analysis: Example

Dune meadow vegetation data from the Dutch island of Terschelling

rows: 20 randomly selected 2 x 2 m plots, sampled on the island in 1982 for a dune conservation project

columns: plant species

values: cover class (standardized estimate of density)
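
A minimal sketch using the vegan package, whose bundled dune data set matches this description; cca() with no constraining variables performs an (unconstrained) correspondence analysis:

library(vegan)
data(dune)            # 20 plots x 30 species, cover-class values
dune.ca <- cca(dune)  # CA of the site-by-species table
summary(dune.ca)      # eigenvalues plus site and species scores
plot(dune.ca)         # joint plot of sites and species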

Correspondence Analysis: Issues

Although most ordination methods construct mathematically orthogonal axes, real life is not constrained to be orthogonal!

This may create problems, e.g. the “arch effect” (aka the “horseshoe” effect)

The \(\small{2}^{nd}\) axis is an arched function of the 1st axis (common in ecological data, e.g. species counts in sites along an environmental gradient)

Detrended Correspondence Analysis (DCA)

DCA: Comments

For a detailed critique, see Wartenberg, Ferson, and Rohlf (1987). Am. Nat. 129:434-448.

Ordination: Conclusions