10: Multivariate Association and Canonical Ordination Methods

Dean Adams, Iowa State University

Variable Covariation

Univariate covariation, correlation

\[\small{cov}_{y_{1},y_{2}}=\sigma_{y_{1},y_{2}}=\frac{1}{n-1} \sum_{i=1}^{n}\left(y_{i1}-\bar{y}_{1}\right) \left(y_{i2}-\bar{y}_{2}\right)\]

\[\small{cor}_{y_{1},y_{2}}=r_{1,2}=\frac{1}{n-1} \sum_{i=1}^{n}\frac{\left(y_{i1}-\bar{y_1}\right)}{s_{1}} \frac{\left(y_{i2}-\bar{y_2}\right)}{s_{2}}\]

Variable Covariation: Example

What is the relationship between mass-specific metabolic rate and body mass across 580 mammal species? (Data from Capellini et al. 2010. Ecol.)

Variable Covariation

Univariate covariation, correlation

Sums of squares (SS) and sums of cross-products (SC):

\(\small{SS}_{y_1}=\sum_{i=1}^{n}\left(y_{i1}-\bar{y_1}\right)^{2}=\sum_{i=1}^{n}\left(y_{i1}-\bar{y_1}\right)\left(y_{i1}-\bar{y_1}\right)\)

\(\small{SS}_{y_2}=\sum_{i=1}^{n}\left(y_{i2}-\bar{y_2}\right)^{2}=\sum_{i=1}^{n}\left(y_{i2}-\bar{y_2}\right)\left(y_{i2}-\bar{y_2}\right)\)

\(\small{SC}_{y_1y_2}=\sum_{i=1}^{n}\left(y_{i1}-\bar{y_1}\right)\left(y_{i2}-\bar{y_2}\right)\)

Variance components

\(\small{s}^{2}_{y_1}=\sigma^{2}_{y_1}=\frac{\sum_{i=1}^{n}\left(y_{i1}-\bar{y_1}\right)^{2}}{n-1}\)

\(\small{s}^{2}_{y_2}=\sigma^{2}_{y_2}=\frac{\sum_{i=1}^{n}\left(y_{i2}-\bar{y_2}\right)^{2}}{n-1}\)

\(\small{cov}_{y_1y_2}=\sigma_{y_1y_2}=\frac{\sum_{i=1}^{n}\left(y_{i1}-\bar{y_1}\right)\left(y_{i2}-\bar{y_2}\right)}{n-1}\)

Re-express:

\[\small{cor}_{y_{1},y_{2}}=r_{1,2}=\frac{1}{n-1} \sum_{i=1}^{n}\frac{\left(y_{i1}-\bar{y_1}\right)}{s_{1}} \frac{\left(y_{i2}-\bar{y_2}\right)}{s_{2}}=\]

\[\small{r}_{1,2}=\frac{1}{n-1}\sum_{i=1}^{n}\frac{\left(y_{i1}-\bar{y_1}\right)}{\sqrt{\frac{SS_{y_1}}{n-1}}}\frac{\left(y_{i2}-\bar{y_2}\right)}{\sqrt{\frac{SS_{y_2}}{n-1}}}=\]

\[\small{r}_{1,2}=\frac{SC_{y_1y_2}}{\sqrt{SS_{y_1}SS_{y_2}}}=\frac{\sigma_{y_1y_2}}{\sqrt{\sigma^{2}_{y_1}\sigma^{2}_{y_2}}}=\frac{\sigma_{y_1y_2}}{\sigma_{y_1}\sigma_{y_2}}\]
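
This algebra is easy to verify numerically. Below is a minimal R sketch (not part of the original example; `y1` and `y2` are simulated) showing that \(\small{SC_{y_1y_2}}/\sqrt{SS_{y_1}SS_{y_2}}\) reproduces the built-in correlation:

```r
# Minimal sketch: correlation from sums of squares and cross-products
set.seed(1)
y1 <- rnorm(50)
y2 <- 0.5 * y1 + rnorm(50)

d1 <- y1 - mean(y1)                   # deviations from the mean
d2 <- y2 - mean(y2)
SS1 <- sum(d1^2); SS2 <- sum(d2^2)    # sums of squares
SC  <- sum(d1 * d2)                   # sum of cross-products

SC / sqrt(SS1 * SS2)                  # r from SS and SC
cor(y1, y2)                           # same value from the built-in function
SC / (length(y1) - 1); cov(y1, y2)    # covariance, two ways
```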

Variable Covariation: Example

X: independent (predictor) variable; Y: dependent (response) variable

\(\small{SS}_{x}=2997.23\); \(\small{SS}_{y}=371.48\); \(\small{SC}_{xy}=-925.99\)

\(\small{r}_{xy}=\frac{-925.99}{\sqrt{2997.23*371.48}}=-0.877\)

Strength of association: \(\small{r}^{2}=0.770\)

That’s great, but what about correlations of two SETS of variables?

First Pass: Mantel Tests

Consider two data matrices \(\small\mathbf{X}\) & \(\small\mathbf{Y}\). The dispersion of objects in each may be described by the distance matrices, \(\small\mathbf{D}_X\) & \(\small\mathbf{D}_Y\). A Mantel test is used to evaluate the association between two or more distance matrices.

First, obtain: \(\small{z}_M= \sum{\mathbf{X}_{i}\mathbf{Y}_{i}}\) where \(\small\mathbf{X}\) & \(\small\mathbf{Y}\) are the unfolded distance matrices

Next, obtain the Mantel correlation coefficient: \(\small{r}_M = \frac{z_M}{[n(n-1)/2]-1}\)

Assess significance of \(\small{r_M}\) via permutation, where the rows and columns of one distance matrix are permuted together.

However, Mantel tests can suffer from inflated type I error rates and low power, and show substantial bias when errors are autocorrelated. Thus, while widely used, their drawbacks are sufficient to warrant a look at alternative approaches. We won’t discuss them further (see: Oden and Sokal 1992. J. Classif.; Legendre 2000. J. Stat. Comp. Simul.; Harmon and Glor 2010. Evol.; Guillot and Rousset 2013. Methods Ecol. Evol.)
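
For completeness, a Mantel test is typically run directly on the two distance matrices. A minimal sketch with simulated data (using the vegan package; the caveats above still apply) might look like:

```r
# Sketch: Mantel test of association between two distance matrices (vegan)
library(vegan)

set.seed(1)
X <- matrix(rnorm(30 * 4), nrow = 30)   # 30 objects, 4 variables
Y <- matrix(rnorm(30 * 3), nrow = 30)   # 30 objects, 3 variables

DX <- dist(X)                           # distance matrix for block X
DY <- dist(Y)                           # distance matrix for block Y

mantel(DX, DY, permutations = 999)      # r_M and its permutation p-value
```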

Variable Covariation: Another Perspective

Another way to approach this is via matrix algebra

\(\small{Y}_{1}\) = log(mass) & \(\small{Y}_{2}\) = VO\(_{2}\)

\[\mathbf{Y}=[\mathbf{y}_{1}\mathbf{y}_{2}]\]

Let \(\small\mathbf{1}^{T}=[1 1 1 ... 1]\) then…

\[\small\bar{\mathbf{y}}^{T}=\left(\mathbf{1}^{T}\mathbf{1}\right)^{-1}\mathbf{1}^{T}\mathbf{Y}\]

\[\small\bar{\mathbf{Y}}=\mathbf{1}\left(\mathbf{1}^{T}\mathbf{1}\right)^{-1}\mathbf{1}^{T}\mathbf{Y}\]

\[\small\mathbf{E}=\mathbf{Y}_{c}=\mathbf{Y}-\bar{\mathbf{Y}}\]

\[\small\mathbf{SSCP}=\mathbf{E}^{T}\mathbf{E}=\mathbf{Y}_{c}^{T}\mathbf{Y}_{c}\]

\[\mathbf{SSCP}=\begin{bmatrix} \mathbf{SS}_{y_{1}} & \mathbf{SC}_{y_{1}y_{2}} \\ \mathbf{SC}_{y_{2}y_{1}} & \mathbf{SS}_{y_{2}} \end{bmatrix}\]

\(\small{SSCP}\) is a matrix of sums of squares and cross-products (found using deviations from the mean)
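
A minimal R sketch of this matrix-algebra route (with simulated stand-ins for the two traits) is:

```r
# Sketch: SSCP matrix via centering with the vector of ones
set.seed(1)
Y <- cbind(y1 = rnorm(20), y2 = rnorm(20))   # stand-ins for log(mass) and VO2
n <- nrow(Y)

ones <- matrix(1, n, 1)                                        # column of 1s
Ybar <- ones %*% solve(t(ones) %*% ones) %*% t(ones) %*% Y     # matrix of means
E    <- Y - Ybar                                               # centered data (Y_c)
SSCP <- t(E) %*% E                                             # SS on diagonal, SC off-diagonal
SSCP
```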

From SSCP to a Covariance Matrix

\[\mathbf{SSCP}=\begin{bmatrix} \mathbf{SS}_{y_{1}} & \mathbf{SC}_{y_{1}y_{2}} \\ \mathbf{SC}_{y_{2}y_{1}} & \mathbf{SS}_{y_{2}} \end{bmatrix}\]

The covariance matrix is simply the \(\small{SSCP}\) standardized by \(\small{n-1}\):

\[\hat{\mathbf{\Sigma}}=\frac{\mathbf{E}^{T}\mathbf{E}}{n-1}\]

\[\hat{\mathbf{\Sigma}}=\frac{\begin{bmatrix} \mathbf{SS}_{y_{1}} & \mathbf{SC}_{y_{1}y_{2}} \\ \mathbf{SC}_{y_{2}y_{1}} & \mathbf{SS}_{y_{2}} \end{bmatrix}}{n-1}\]

The covariance matrix is:

\[\hat{\mathbf{\Sigma}}=\begin{bmatrix} \sigma^{2}_{y_{1}} & \sigma_{y_{1}y_{2}} \\ \sigma_{y_{2}y_{1}} & \sigma^{2}_{y_{2}} \end{bmatrix}\]
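
Numerically (a small sketch with simulated data), dividing the SSCP matrix by \(\small{n-1}\) reproduces `cov()`:

```r
# Sketch: covariance matrix as SSCP / (n - 1)
set.seed(1)
Y <- cbind(y1 = rnorm(20), y2 = rnorm(20))
E <- scale(Y, center = TRUE, scale = FALSE)   # centered data
SSCP  <- t(E) %*% E
Sigma <- SSCP / (nrow(Y) - 1)

Sigma        # covariance matrix from the SSCP matrix
cov(Y)       # same result from the built-in function
```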

From Covariance to Correlation

Correlations are simply standardized covariances

\[\small{cov}_{y_{1},y_{2}}=\sigma_{y_{1},y_{2}}=\frac{1}{n-1} \sum_{i=1}^{n}\left(y_{i1}-\bar{y_1}\right) \left(y_{i2}-\bar{y_2}\right)\]

\[\small{cor}_{y_{1},y_{2}}=r_{1,2}=\frac{1}{n-1} \sum_{i=1}^{n}\frac{\left(y_{i1}-\bar{y_1}\right)}{s_{1}} \frac{\left(y_{i2}-\bar{y_2}\right)}{s_{2}}\]

In matrix notation let:

\[\hat{\mathbf{\Sigma}}=\begin{bmatrix} \sigma^{2}_{y_{1}} & \sigma_{y_{1}y_{2}} \\ \sigma_{y_{2}y_{1}} & \sigma^{2}_{y_{2}} \end{bmatrix}\]

\[\mathbf{V}=\begin{bmatrix} \sigma^{2}_{y_{1}} & 0 \\ 0 & \sigma^{2}_{y_{2}} \end{bmatrix}\]

Then: \[\mathbf{R}=\mathbf{V}^{-\frac{1}{2}}\hat{\mathbf{\Sigma}}\mathbf{V}^{-\frac{1}{2}}=\begin{bmatrix} 1 & r_{y_{1}y_{2}} \\ r_{y_{2}y_{1}} & 1 \end{bmatrix}\]

Note: the covariance matrix of standardized variables (\(\small{z}=\frac{y-\bar{y}}{\sigma_{y}}\)) is the correlation matrix (because correlations are standardized covariances!)
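
A short R sketch of both points (standardizing the covariance matrix, and taking the covariance of z-scores), with simulated data:

```r
# Sketch: R = V^(-1/2) %*% Sigma %*% V^(-1/2), and cov of z-scores = correlation matrix
set.seed(1)
Y <- cbind(y1 = rnorm(20), y2 = rnorm(20))
Sigma  <- cov(Y)
Vmhalf <- diag(1 / sqrt(diag(Sigma)))   # V^(-1/2): diagonal matrix of 1/sd

Vmhalf %*% Sigma %*% Vmhalf             # correlation matrix via standardization
cor(Y)                                  # same result
cov(scale(Y))                           # covariance of z-scores is the correlation matrix
```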

Covariation Example Revisited

\[\small\mathbf{SSCP}=\begin{bmatrix} \mathbf{SS}_{y_{1}} & \mathbf{SC}_{y_{1}y_{2}} \\ \mathbf{SC}_{y_{2}y_{1}} & \mathbf{SS}_{y_{2}} \end{bmatrix} = \begin{bmatrix} 2997.23 & -925.99 \\ -925.99 & 371.48 \end{bmatrix}\]

\[\small\hat{\mathbf{\Sigma}}=\begin{bmatrix} \sigma^{2}_{y_{1}} & \sigma_{y_{1}y_{2}} \\ \sigma_{y_{2}y_{1}} & \sigma^{2}_{y_{2}} \end{bmatrix} = \begin{bmatrix} 5.167 & -1.596 \\ -1.596 & 0.640 \end{bmatrix}\]

\[\small\mathbf{R}=\begin{bmatrix} 1 & -0.877 \\ -0.877 & 1 \end{bmatrix}\]

Generalization: Two Blocks

Consider \(\small\mathbf{Y}_{1}\) and \(\small\mathbf{Y}_{2}\) as matrices not vectors:

\[\mathbf{Y}=[\mathbf{Y}_{1}\mathbf{Y}_{2}]\]

We can still envision \(\small\mathbf{SSCP}\), \(\small\hat{\mathbf{\Sigma}}\), and \(\small\mathbf{R}\)

\[\hat{\mathbf{\Sigma}}=\begin{bmatrix} \mathbf{S}_{11} & \mathbf{S}_{12} \\ \mathbf{S}_{21} & \mathbf{S}_{22} \end{bmatrix}\]

\[\mathbf{R}=\begin{bmatrix} \mathbf{R}_{11} & \mathbf{R}_{12} \\ \mathbf{R}_{21} & \mathbf{R}_{22} \end{bmatrix}\]

\(\small\hat{\mathbf{\Sigma}}\) and \(\small\mathbf{R}\) now describe covariation and correlations between BLOCKS of variables

What’s in a \(\small\hat{\mathbf{\Sigma}}\) matrix?

Dimensionality of \(\small\hat{\mathbf{\Sigma}}\) and \(\small\mathbf{R}\) determined by dimensions of \(\small\mathbf{Y}_{1}\) and \(\small\mathbf{Y}_{2}\)

\(\small\mathbf{Y}_{1}\) is \(\small{n} \times {p}_{1}\) dimensions

\(\small\mathbf{Y}_{2}\) is \(\small{n} \times {p}_{2}\) dimensions

Therefore \(\small\hat{\mathbf{\Sigma}}\) is \(\small({p}_{1}+{p}_{2}) \times ({p}_{1}+{p}_{2})\) dimensions

Blocks can have different numbers of variables (\(\small{p}_{1}\neq{p}_{2}\))

The Components of \(\small\hat{\mathbf{\Sigma}}\)

Different sub-blocks within \(\small\hat{\mathbf{\Sigma}}\) describe distinct components of trait covariation

\(\small\mathbf{S}_{11}\): covariation of variables in \(\small\mathbf{Y}_{1}\)

\(\small\mathbf{S}_{22}\): covariation of variables in \(\small\mathbf{Y}_{2}\)

\(\small\mathbf{S}_{21}=\mathbf{S}_{12}^{T}\): covariation between \(\small\mathbf{Y}_{1}\) and \(\small\mathbf{Y}_{2}\)

\(\small\mathbf{S}_{21}=\mathbf{S}_{12}^{T}\) is the multivariate equivalent of \(\small\sigma_{21}\)

Can we generalize an association measure from \(\small\mathbf{S}_{21}\)?

Escoffier’s RV Coefficient

Recall our derivation of the correlation coefficient

\[\small{r}_{y_1,y_2}=\frac{1}{n-1}\sum_{i=1}^{n}\frac{\left(y_{i1}-\bar{y_1}\right)}{s_{1}}\frac{\left(y_{i2}-\bar{y_2}\right)}{s_{2}}\]

\[\small{r}_{y_1,y_2}=\frac{\sigma_{y_1y_2}}{\sigma_{y_1}\sigma_{y_2}}\]

The squared correlation is then:

\[\small{r}^{2}=\frac{\sigma^{2}_{y_1y_2}}{\sigma^{2}_{y_1}\sigma^{2}_{y_2}}\]

Analogously, for multivariate data one may consider:

\[\small{RV}=\frac{tr(\mathbf{S}_{12}\mathbf{S}_{21})}{\sqrt{tr(\mathbf{S}_{11}\mathbf{S}_{11})tr(\mathbf{S}_{22}\mathbf{S}_{22})}}\]

Range of \(\small\mathbf{RV}\): \(\small{0}\rightarrow{1}\)

Note: ‘tr’ signifies trace (sum of diagonal elements)
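
A minimal R sketch of the RV coefficient computed from the sub-blocks of \(\small\hat{\mathbf{\Sigma}}\) (simulated blocks with \(\small{p_1=3}\) and \(\small{p_2=2}\) variables):

```r
# Sketch: Escoffier's RV coefficient from the block-partitioned covariance matrix
set.seed(1)
Y1 <- matrix(rnorm(30 * 3), nrow = 30)
Y2 <- matrix(rnorm(30 * 2), nrow = 30)

S   <- cov(cbind(Y1, Y2))
p1  <- ncol(Y1)
S11 <- S[1:p1, 1:p1]            # within-block covariation for Y1
S22 <- S[-(1:p1), -(1:p1)]      # within-block covariation for Y2
S12 <- S[1:p1, -(1:p1)]         # between-block covariation
S21 <- t(S12)

RV <- sum(diag(S12 %*% S21)) /
  sqrt(sum(diag(S11 %*% S11)) * sum(diag(S22 %*% S22)))
RV
```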

Escoffier’s RV Coefficient: Comments

The RV coefficient is analogous to \(\small{r}^{2}\) but it is not a strict mathematical generalization

\(\small{r}^{2}=\frac{\sigma^{2}_{y_1y_2}}{\sigma^{2}_{y_1}\sigma^{2}_{y_2}}\) VS. \(\small{RV}=\frac{tr(\mathbf{S}_{12}\mathbf{S}_{21})}{\sqrt{tr(\mathbf{S}_{11}\mathbf{S}_{11})tr(\mathbf{S}_{22}\mathbf{S}_{22})}}\)

The numerator of \(\small{r}^{2}\) & \(\small{RV}\) describes the covariation between \(\small\mathbf{Y}_{1}\) & \(\small\mathbf{Y}_{2}\)

The denominator of \(\small{r}^{2}\) & \(\small{RV}\) describes variation within \(\small\mathbf{Y}_{1}\) & \(\small\mathbf{Y}_{2}\)

Thus, \(\small{RV}\) (like \(\small{r}^{2}\)) is a ratio of between-block relative to within-block variation

However, because each \(\small\mathbf{S}\) is a covariance matrix, the components of \(\small{RV}\) are squared variances and covariances rather than variances as in \(\small{r}^{2}\) (hence the range of \(\small{RV}\): \(\small{0}\rightarrow{1}\))

It is not at all obvious what a ‘squared covariance’ represents in a statistical sense, as variances are already summed squared deviations from the mean (see Bookstein 2016. Evol. Biol.).

Nonetheless, \(\small{RV}\) is often considered a multivariate measure of association between \(\small\mathbf{Y}_{1}\) and \(\small\mathbf{Y}_{2}\)

RV Coefficient: Example

Pecos pupfish (shape obtained using geometric morphometrics)

Is there an association between head shape and body shape?

\[\small{RV}=\frac{tr(\mathbf{S}_{12}\mathbf{S}_{21})}{\sqrt{tr(\mathbf{S}_{11}\mathbf{S}_{11})tr(\mathbf{S}_{22}\mathbf{S}_{22})}}=0.607\]

\[\small\sqrt{RV}=0.779\]

Conclusion: head shape and body shape appear to be correlated (associated). However, we still require a formal test of this summary measure!

Multivariate Association: Partial Least Squares

Another way to summarize the covariation between blocks is via Partial Least Squares (PLS)

\(\small\mathbf{S}_{21}=\mathbf{S}_{12}^{T}\): expresses covariation between \(\small\mathbf{Y}_{1}\) and \(\small\mathbf{Y}_{2}\)

We decompose the information in \(\small\mathbf{S}_{12}\) via singular value decomposition to find the rotational solution (directions) describing the greatest covariation between \(\small\mathbf{Y}_{1}\) and \(\small\mathbf{Y}_{2}\):

\[\small\mathbf{S}_{12}=\mathbf{U\Lambda{V}}^T\]

\(\small\mathbf{U}\) is the matrix of left singular vectors, which are eigenvectors of \(\small\mathbf{Y}_{1}\) aligned in the direction of maximum covariation between \(\small\mathbf{Y}_{1}\) and \(\small\mathbf{Y}_{2}\)

\(\small\mathbf{V}\) is the matrix of right singular vectors, which are eigenvectors of \(\small\mathbf{Y}_{2}\) aligned in the direction of maximum covariation between \(\small\mathbf{Y}_{1}\) and \(\small\mathbf{Y}_{2}\)

\(\small\mathbf{\Lambda}\) contains the singular values (\(\small\lambda\)), which describe the covariation between corresponding pairs of singular vectors in \(\small\mathbf{U}\) and \(\small\mathbf{V}\)

Note: \(\small\frac{\lambda^{2}_{1}}{\sum\lambda^{2}_{i}}\times100\) gives the percent of covariation explained by the first pair of axes

PLS Correlation

\[\small\mathbf{S}_{12}=\mathbf{U\Lambda{V}}^T\]

Ordination scores are found by projecting the centered data onto the vectors in \(\small\mathbf{U}\) and \(\small\mathbf{V}\)

\[\small\mathbf{P}_{1}=\mathbf{Y}_{1}\mathbf{U}\]

\[\small\mathbf{P}_{2}=\mathbf{Y}_{2}\mathbf{V}\]

The first columns of \(\small\mathbf{P}_{1}\) and \(\small\mathbf{P}_{2}\) describe the maximal covariation between \(\small\mathbf{Y}_{1}\) and \(\small\mathbf{Y}_{2}\)

The correlation between \(\small\mathbf{P}_{11}\) and \(\small\mathbf{P}_{21}\) is the PLS-correlation

\[\small{r}_{PLS}={cor}_{P_{11}P_{21}}\]
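
A compact R sketch of two-block PLS from first principles (simulated blocks stand in for the trait data):

```r
# Sketch: two-block PLS via SVD of the between-block covariance matrix S12
set.seed(1)
n  <- 40
Y1 <- matrix(rnorm(n * 3), nrow = n)
Y2 <- matrix(rnorm(n * 2), nrow = n)

Y1c <- scale(Y1, scale = FALSE)          # center each block
Y2c <- scale(Y2, scale = FALSE)
S12 <- crossprod(Y1c, Y2c) / (n - 1)     # between-block covariance matrix

s  <- svd(S12)                           # S12 = U %*% diag(d) %*% t(V)
P1 <- Y1c %*% s$u                        # projection scores for block 1
P2 <- Y2c %*% s$v                        # projection scores for block 2

cor(P1[, 1], P2[, 1])                    # r_PLS (first pair of axes)
s$d^2 / sum(s$d^2) * 100                 # percent covariation per pair of axes
```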

Partial Least Squares: Example

\(\tiny{RV}=\frac{tr(\mathbf{S}_{12}\mathbf{S}_{21})}{\sqrt{tr(\mathbf{S}_{11}\mathbf{S}_{11})tr(\mathbf{S}_{22}\mathbf{S}_{22})}}=0.607\) and \(\tiny\sqrt{RV}=0.779\)

\(\small{r}_{PLS}={cor}_{P_{11}P_{21}}=0.917\)
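
The output below was generated with geomorph’s two-block PLS function; a sketch of the call (assuming `y` and `x` hold the head-shape and body-shape blocks for the same specimens, as shown in the Call line of the output) is:

```r
# Sketch: two-block PLS of head shape vs. body shape with geomorph
library(geomorph)

PLS <- two.b.pls(A1 = y, A2 = x, iter = 999, print.progress = FALSE)
summary(PLS)   # r-PLS, effect size (Z), and permutation p-value
plot(PLS)      # block-1 vs. block-2 PLS scores
```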

## Data in either A1 or A2 do not have names.  It is assumed data in both A1 and A2 are ordered the same.
## 
## Call:
## two.b.pls(A1 = y, A2 = x, iter = 999, print.progress = FALSE) 
## 
## 
## 
## r-PLS: 0.917
## 
## Effect Size (Z): 5.4019
## 
## P-value: 0.001
## 
## Based on 1000 random permutations

Again, head shape and body shape appear to be correlated, but we still require a formal test of this summary measure!

Evaluating Multivariate Associations

We now have two potential test measures of multivariate correlation

\[\small{RV}=\frac{tr(\mathbf{S}_{12}\mathbf{S}_{21})}{\sqrt{tr(\mathbf{S}_{11}\mathbf{S}_{11})tr(\mathbf{S}_{22}\mathbf{S}_{22})}}\]

\[\small{r}_{PLS}={cor}_{P_{11}P_{21}}\]

Two questions emerge:

1: How do we assess significance of the test measures?

2: Is one approach preferable over the other?

We will use permutation methods (RRPP) for assessing significance

Permutation Tests for Multivariate Association

Test statistics:

\(\small\hat\rho=\sqrt{RV}\) and \(\small\hat\rho={r}_{PLS}\)

H0: \(\small\rho=0\)

H1: \(\small\rho>0\)

RRPP Approach:

1: Represent \(\small\mathbf{Y}_{1}\) and \(\small\mathbf{Y}_{2}\) as deviations from their means (H0)

2: Estimate \(\small\hat\rho=\sqrt{RV}_{obs}\) and \(\small\hat\rho={r}_{PLS_{obs}}\)

3: Permute rows of \(\small\mathbf{Y}_{2}\), obtain \(\small\hat\rho=\sqrt{RV}_{rand}\) and \(\small\hat\rho={r}_{PLS_{rand}}\)

4: Repeat many times to generate the sampling distribution (see the sketch below)
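
A bare-bones R sketch of this permutation procedure for \(\small{r}_{PLS}\) (simulated data; the observed value is counted as one permutation):

```r
# Sketch: permutation (RRPP-style) test of the PLS correlation
set.seed(1)
n  <- 40
Y1 <- matrix(rnorm(n * 3), nrow = n)
Y2 <- matrix(rnorm(n * 2), nrow = n)

pls_r <- function(A, B) {                       # r_PLS for the first pair of axes
  Ac <- scale(A, scale = FALSE); Bc <- scale(B, scale = FALSE)
  s  <- svd(crossprod(Ac, Bc) / (nrow(A) - 1))
  cor(drop(Ac %*% s$u[, 1]), drop(Bc %*% s$v[, 1]))
}

r_obs  <- pls_r(Y1, Y2)
r_rand <- replicate(999, pls_r(Y1, Y2[sample(n), , drop = FALSE]))  # permute rows of Y2

mean(c(r_obs, r_rand) >= r_obs)                 # permutation p-value
```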

Permutation Tests for RV and rPLS: Example

Test statistics:

\(\small\hat\rho=\sqrt{RV}\) and \(\small\hat\rho={r}_{PLS}\)

H0: \(\small\rho=0\)

H1: \(\small\rho>0\)

For the pupfish dataset, both are significant at p = 0.001

Compare the permutation distributions of the two statistics with one another (minus the observed value in this case)

All things considered, \(\small{r}_{PLS}\) performs better.

PLS: Another example

Matrix Association Methods: Complications

Because \(\small\mathbf{RV}\) has a finite range of \(\small{0}\rightarrow{1}\), with 0 representing no covariation between blocks, it may seem intuitive to qualitatively compare \(\small\mathbf{RV}\) values across datasets

THIS TEMPTATION SHOULD BE AVOIDED!

With random MVN data, \(\small\mathbf{RV}\) varies with both n and p!

Direct comparisons of \(\small{RV}\) values across datasets are therefore not useful

Comparing Association Patterns Across Data Sets

Similar patterns are found with \({r}_{PLS}\):

However, conversion to effect sizes eliminates trends with n and p, allowing meaningful comparisons

where: \(\small{Z}=\frac{r_{PLS_{obs}}-\mu_{r_{PLS_{rand}}}}{\sigma_{r_{PLS_{rand}}}}\)

Effect sizes are then compared as: \(\small{Z}_{12}=\frac{\vert{(r_{PLS1}-\mu_{r_{PLS1}})-(r_{PLS2}-\mu_{r_{PLS2}})}\vert}{\sqrt{\sigma^2_{r_{PLS1}}+\sigma^2_{r_{PLS2}}}}\)

As we have shown previously, statistical evaluation of effect sizes is paramount to proper permutational approaches to multivariate statistics! (DCA and MLC assert that effect sizes from RRPP represent the path forward for recalcitrant problems in high-dimensional data analysis)
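
A small sketch of the effect-size calculation from a permutation distribution (placeholder values below; `r_obs` and `r_rand` would come from the procedure already described). In geomorph, a comparison of this kind is available via `compare.pls()` for fitted PLS objects (check the package documentation for the exact interface).

```r
# Sketch: effect size Z from a permutation distribution of r_PLS (placeholder values)
set.seed(1)
r_rand <- runif(999, 0.20, 0.50)   # placeholder null distribution of r_PLS
r_obs  <- 0.70                     # placeholder observed r_PLS

Z <- (r_obs - mean(r_rand)) / sd(r_rand)
Z

# Comparing two datasets (placeholder summaries), following the Z_12 formula above
Z12 <- abs((0.70 - 0.35) - (0.55 - 0.33)) / sqrt(0.05^2 + 0.06^2)
Z12
```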

Comparing Association Patterns: Example

Morphological Integration describes the covariation among sets of variables. Such patterns are often disrupted in organisms living in non-pristine habitats.

Mediterranean wall lizards (Podarcis) live in both rural and urban areas. The question, then, is whether patterns of association in cranial morphology differ between habitats, and between juvenile and adult individuals.

Levels of covariation (integration) are NOT different across habitats, but are lower in juveniles as compared to adults.

Multivariate Association: Summary

Various approaches to multivariate association

PLS is the most general approach and has the fewest mathematical constraints

Canonical Ordination Methods

Considers association of two matrices: \(\small\mathbf{X}\) & \(\small\mathbf{Y}\)

Provides visualization (ordination) from Y~X (differing from PCA/PCoA in this manner)

Recall univariate multiple regression: \(\small{Y}=\beta_{0}+\beta_{1}X_{1}+\beta_{2}X_{2}+\dots+\epsilon\)

Here, the predicted values \(\small\mathbf{\hat{Y}}\) represent the best ordination of \(\small\mathbf{Y}\) along the regression

This regression maximizes the \(\small{R}^2\) between \(\small\mathbf{Y}\) & \(\small\mathbf{\hat{Y}}\), and represents the optimal least squares relationship

Canonical analyses share this property for multivariate \(\small\mathbf{Y}\), and generate ordinations of \(\small\mathbf{Y}\) constrained by the maximal LS relationship to \(\small\mathbf{X}\)

They can be considered ‘regression-based’ methods for describing covariation between \(\small\mathbf{X}\) & \(\small\mathbf{Y}\): \(\small\mathbf{S}_{XY}\)

Several canonical ordination methods exist, including: Canonical Variates Analysis (CVA) / Discriminant Analysis (DA); Redundancy Analysis (RDA); Canonical Correspondence Analysis (CCA)

Canonical Variates Analysis (CVA)

METHOD COMMONLY MISUSED BY BIOLOGISTS

Historical note: Fisher developed 2-group Discriminant Analysis (DA) in 1936; Rao (1948; 1952) generalized it to CVA for > 2 groups

Canonical Variates Analysis: What it Does

CVA rotates and shears the data space into a space of normalized canonical axes (within-group variation becomes circular)

Classification with CVA

Classify objects (known or unknown) to groups (VERY useful)

Obtain the Mahalanobis distance from each object to the group means: \(\small{D}^{2}_{Mahal}=\mathbf{(Y-\overline{Y})^TV^{-1}(Y-\overline{Y})}\), where \(\small\mathbf{V}\) is the pooled within-group covariance matrix

Assign each object to the group to which it is closest (this weights distance by the variation in each dimension)

Determine mis-classification rates

Note: ideally, classification rates are estimated from a second dataset (cross-validation)
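
A brief, self-contained sketch of discriminant-analysis classification with cross-validation, using MASS::lda and the built-in iris data as a stand-in (not the pupfish example):

```r
# Sketch: linear discriminant classification with leave-one-out cross-validation
library(MASS)

fit <- lda(Species ~ ., data = iris, CV = TRUE)   # CV = TRUE gives leave-one-out predictions
tab <- table(observed = iris$Species, predicted = fit$class)
tab
1 - sum(diag(tab)) / sum(tab)                     # cross-validated misclassification rate
```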

CVA: Example

Pupfish (simplified data)

Left: Actual data (PCA). Right: CVA plot (distortion of actual dataspace)

CVA: Comments

CVA: Further Issues

CV axes are not orthogonal in the original dataspace

Also, linear discrimination ONLY forms a linear plane IFF within-group covariances are identical (shown as ‘equal probability’ classification lines below)

CVA: Misleading Inference

Increasing the number of variables enhances PERCEIVED group differences, even when groups are perfectly overlapping!

CVA: Conclusions

Redundancy Analysis (RDA)

RDA is an ordination of the predicted values \(\small\hat{\mathbf{Y}}\) from the multivariate regression of \(\small\mathbf{Y}\) on \(\small\mathbf{X}\): the covariance matrix of \(\small\hat{\mathbf{Y}}\) is computed and then eigen-decomposed (i.e., a PCA of \(\small\hat{\mathbf{Y}}\))

\[\small\mathbf{S}_{\hat{\mathbf{Y}}^T\hat{\mathbf{Y}}}=\frac{1}{N-1}{\hat{\mathbf{Y}}^T\hat{\mathbf{Y}}}\]

\[\small\mathbf{S}_{\hat{\mathbf{Y}}^T\hat{\mathbf{Y}}}=\mathbf{C\Lambda{C}^T}\]

Note: Canonical correspondence analysis (CCA) is conceptually the same, but designed for frequency data.
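
A minimal sketch of RDA as a PCA of the fitted values from a multivariate linear model (simulated `Y` and predictors stand in for real data; vegan’s `rda()` provides a full implementation):

```r
# Sketch: RDA-style ordination = PCA of Y-hat from the model Y ~ X
set.seed(1)
n <- 50
X <- data.frame(x1 = rnorm(n), group = gl(2, n / 2))
Y <- matrix(rnorm(n * 4), nrow = n)
Y[, 1] <- Y[, 1] + 2 * X$x1                  # build in some signal

fit  <- lm(Y ~ x1 + group, data = X)         # multivariate linear model
Yhat <- fitted(fit)                          # predicted values
scores <- prcomp(Yhat)$x                     # PCA of Y-hat: RDA-style axes
head(scores[, 1:2])

# vegan::rda(Y ~ x1 + group, data = X)       # standard implementation (for comparison)
```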

RDA: Example

Several RDA ordinations of pupfish:

Plots are quite different depending on the model!

RDA: Which model to use?

RDA should serve as a visual guide for interpreting patterns, RELATIVE to a particular model. The decision should thus be based on evaluating the models first (not just plotting the RDA ordination).

For the pupfish data, evaluate the full MANCOVA, then visualize from the reduced model:

##                Df     SS     MS     Rsq         F       Z Pr(>F)   
## SL              1 632.53 632.53 0.91317 1326.4627  9.3185  0.001 **
## species         1  20.03  20.03 0.02892   42.0070  4.8100  0.001 **
## sex             1   7.33   7.33 0.01059   15.3816  4.2227  0.001 **
## SL:species      1   6.18   6.18 0.00892   12.9539  3.6882  0.001 **
## SL:sex          1   0.65   0.65 0.00094    1.3679  0.6042  0.282   
## species:sex     1   0.35   0.35 0.00050    0.7318 -0.0435  0.516   
## SL:species:sex  1   0.81   0.81 0.00117    1.6941  0.9719  0.160   
## Residuals      52  24.80   0.48 0.03580                            
## Total          59 692.68                                           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
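
A sketch of the kind of call that produces a table like the one above, using RRPP with simulated data (the variables `SL`, `species`, and `sex` mimic the pupfish design; they are placeholders, not the actual data):

```r
# Sketch: factorial MANCOVA via RRPP, yielding an ANOVA table like the one above
library(RRPP)

set.seed(1)
n <- 60
dat <- rrpp.data.frame(
  Y       = matrix(rnorm(n * 6), nrow = n),            # multivariate response
  SL      = rnorm(n, mean = 50, sd = 5),               # size covariate
  species = factor(rep(c("A", "B"), each = n / 2)),
  sex     = factor(rep(c("F", "M"), times = n / 2))
)

fit <- lm.rrpp(Y ~ SL * species * sex, data = dat, iter = 999, print.progress = FALSE)
anova(fit)    # a reduced model (dropping non-significant terms) would then be visualized
```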

RDA Ordination: Comments

Visualizing variation in residuals from a linear model is quite useful

But it is easy to misinterpret (wittingly or unwittingly): if the wrong model \(\small\mathbf{Y}\sim\mathbf{X}\) is used, interpreting patterns in the RDA ordination is not helpful

Always consider the RDA ordination alongside a visualization of patterns in the original dataspace via PCA/PCoA

PCA provides a visualization of the ‘raw’ dataspace; RDA provides a visualization of the predicted values from a model (both are useful if done correctly).