07: Interaction Terms and Trajectory Analysis

Dean Adams, Iowa State University

From Factorial Models to Trajectories

We have seen that factorial models enable us to attribute variation to differing factors, and in the case of categorical factors, determine whether levels (groups) differ in some way. For instance, the plot below shows 4 groups which appear distinct in their trait attributes. However, as biologists, we have additional information about these groups; for instance, they may be males and females from each of two populations. Thus, levels of one factor sex are really linked via the other factor population.

From Factorial Models to Trajectories

We have seen that factorial models enable us to attribute variation to differing factors, and in the case of categorical factors, determine whether levels (groups) differ in some way. For instance, the plot below shows 4 groups which appear distinct in their trait attributes. However, as biologists, we have additional information about these groups; for instance, they may be males and females from each of two populations. Thus, levels of one factor sex are really linked via the other factor population.

Might there be additional information we could glean statistically from viewing these as linked data (trajectories)?

Trajectory Analysis

Trajectory analysis is a means of quantifying and comparing attributes of these linkages. In the case below, the vectors linking males and females form sexual dimorphism vectors. These represent vectors of change in trait values between levels (the sexes in this case), and these may be treated as a second-order dataset that may be compared across populations. Trajectory analysis accomplishes this task.

Vector and Trajectory Methods Discussed Today were Originally Described in: Collyer and Adams. Ecology. 2007; Adams and Collyer. Evolution. 2007; Adams and Collyer. Evolution. 2009; Collyer and Adams. Hystrix. 2013.

Trajectory Analysis

In the case of factorial models, one might consider two orders of possible factor-level analyses:

Order Vectors Test
First \(\mathbf{\hat{y}}_i\) Compare means, given appropriate null (reduced) model, to ascertain a logical explanation for significant factor interactions. (We have done this already. e.g., pairwise comparisons.)
Second \(\mathbf{\hat{y}}_i - \mathbf{\hat{y}}_j\) Compare change in means across levels of factor B among levels of factor A. Second-order analyses target the pattern of change and analyze the geometric attributes of trajectories that describe the change.

As an example, with the pupfish body shape data, a first-order analysis seeks to understand if the variation among the means of the sex:population groups was meaningful, after accounting for general population differences and sexual dimorphism. A second-order analysis might define sexual dimorphism vectors and compare their geometric attributes in a specific way, to help explain why the variation is meaningful.

Interaction Terms in Linear Models

Let’s take a step back and reconsider model interaction terms. Say we have Y~A+B+A:B.

The interaction term A:B measures the joint effect of main effects A & B. It identifies whether responses to A dependent on level of B

Interactions are Are VERY common in biology. For instance, the example below shows 2 species in 2 environments (Factors A & B). Species 1 has a higher growth rate in the moist environment, while species 2 has a higher growth rate in the dry environment. Statistically, such patterns would be identified by a significant species:environment interaction.

Note: the study of trade-offs and reaction norms in evolutionary ecology is essentially the study of interactions!

Understanding Interactions

Significant interactions identify a joint response of factors (response to Factor B depends on level in Factor A)

Interpreting interactions for univariate data is straightforward: simply plot them!

Understanding Interactions (Cont.)

For two traits, more complicated variants are possible, but a 2D plot still suffices for visual inspection.

Interactions to Trajectories

In the previous plots, we have ‘linked’ levels of one factor across another. For two levels, that produces vectors, whose patterns we compared visually. More formally, we should use quantitative summaries. Consider the following:

Here the white and blue dots (\(\small\mathbf{\hat{y}}_{i,1}^T\) and \(\mathbf{\hat{y}}_{i,2}^T\)) correspond to levels within each group (e.g., sexes within each population). Likewise, the vector between them is defined as, \(\small\mathbf{\hat{y}}_{i,2}^T - \mathbf{\hat{y}}_{i,1}^T\). These represent Change Vectors describing the difference in mean trait values between levels within each group.

For our model, Y~population+sex+population:sex, they represent vectors describing the sexual dimorphism within each population.

The question then is: do these vectors differ, and if so how?

Attributes of Change Vectors (2-Point Trajectories)

Once we have defined the change vectors for each group, we require summary measures of them. In any dimension, vectors have two attributes: a length and a direction.

First, the length of each change vector is found as: \(\small\Vert{D}_1\Vert=\left( \mathbf{\Delta \hat{y}}_1^T \mathbf{\Delta \hat{y}_1} \right)^{1/2}\). Where: \(\small{\mathbf{\Delta \hat{y}}_1}=\mathbf{\hat{y}}_{1,2}^T - \mathbf{\hat{y}}_{1,1}^T\). This is the length of \(\small{D}_1\). Vector lengths are obtained for all vectors.

Next, we calculate two test statistics: one that describes the difference in vector magnitudes, and another the difference in their angular direction:

\[\small MD = \left| d_1 - d_2 \right| = \left| \left( \mathbf{\Delta \hat{y}}_1^T \mathbf{\Delta \hat{y}_1} \right)^{1/2} - \left(\mathbf{\Delta \hat{y}}_2^T \mathbf{\Delta \hat{y}}_2 \right)^{1/2} \right| \]

\[\small\theta = \frac{\mathbf{\Delta \hat{y}}_1^T \mathbf{\Delta \hat{y}}_2 }{d_1d_2}\]

Here, \(\small{MD}\) represents the magnitude difference between vector lengths, and \(\small\theta\) represents the angular difference between vector directions.

Hypothesis Tests

Evaluating \(\small{MD}\) and \(\small\theta\) involves something we have performed previously RRPP with factorial models. The procedure is as follows, and is based on a comparison of variation from \(\mathbf{X}_F\) & \(\mathbf{X}_R\) where:

\(\mathbf{X}_F\) is: Y~A + B + A:B

\(\mathbf{X}_R\) is: Y~A + B

Estimate \(\small\mathbf{X}_{R}\) \(\small\mathbf{X}_{F}\)
Coefficients \(\small\hat{\mathbf{\beta_R}}=\left ( \mathbf{X}_R^{T} \mathbf{X}_R\right )^{-1}\left ( \mathbf{X}_R^{T} \mathbf{Y}\right )\) \(\small\hat{\mathbf{\beta_F}}=\left ( \mathbf{X}_F^{T} \mathbf{X}_F\right )^{-1}\left ( \mathbf{X}_F^{T} \mathbf{Y}\right )\)
Predicted Values \(\small\hat{\mathbf{Y}}_R=\mathbf{X}_R\hat{\mathbf{\beta}}_R\) \(\small\hat{\mathbf{Y}}_F=\mathbf{X}_F\hat{\mathbf{\beta}}_F\)
Model Residuals \(\small\hat{\mathbf{E}}_R=\mathbf{Y}-\hat{\mathbf{Y}}_R\) \(\small\hat{\mathbf{E}}_F=\mathbf{Y}-\hat{\mathbf{Y}}_F\)

For \(\mathbf{X}_F\), \(\small{MD}\) and \(\small\theta\) are obtained from \(\small\hat{\mathbf{Y}}_F\). Then:

Trajectory Analysis: Comments

Testing \(\small{MD}\) and \(\small\theta\) evaluates context dependent trait changes. “Is there greater sexual dimorphism in one population as compared to another?”

This is based on vector attributes, which embody aspects of interaction terms.

In fact, RRPP used in this manner is explicitly a test of interaction term attributes (think of the RRPP procedure we utilized: Y~A+B+A:B versus Y~A+B).

If the interaction term from the manova is not significant, this implies that there is no context-dependent pattern in the data: understanding the main effects in the model is sufficient.

Note also, that standard pairwise comparisons ‘leave some information on the table’ that vector analysis evaluates. Pairwise comparisons are at best tests of magnitude between two groups. They do NOT evaluate differences in distances between pairs of groups, nor directional differences. Only trajectory/vector analysis accomplishes this!

Trajectory Analysis: Familiar Example

What we have not yet established is whether second-order statistics generated by RRPP are appropriate (they are). We will examine that shortly, but let’s first get a sense how this works. This example uses the R package, RRPP with the Pupfish data. from RRPP.

data(Pupfish)
fit <- lm.rrpp(coords ~ Pop * Sex, data = Pupfish, iter = 999, print.progress = FALSE)
TA <- trajectory.analysis(fit, groups = Pupfish$Pop, 
                          traj.pts = Pupfish$Sex, print.progress = FALSE)

Trajectory Analysis: Familiar Example (Cont.)

# Magnitude difference (absolute difference between path distances)
summary(TA, attribute = "MD") 
## 
## Trajectory analysis
## 
## 1000 permutations.
## 
## Points projected onto trajectory PCs
## 
## Trajectories:
## Trajectories hidden (use show.trajectories = TRUE to view)
## 
## Observed path distances by group
## 
##      Marsh   Sinkhole 
## 0.04611590 0.02568508 
## 
## Pairwise absolute differences in path distances, plus statistics
##                         d UCL (95%)        Z Pr > d
## Marsh:Sinkhole 0.02043082  0.012731 2.671911  0.001
# Correlations (angles) between trajectories
summary(TA, attribute = "TC", angle.type = "deg") 
## 
## Trajectory analysis
## 
## 1000 permutations.
## 
## Points projected onto trajectory PCs
## 
## Trajectories:
## Trajectories hidden (use show.trajectories = TRUE to view)
## 
## Pairwise correlations between trajectories, plus statistics
##                        r    angle UCL (95%)       Z Pr > angle
## Marsh:Sinkhole 0.7393716 42.32209  24.12074 3.54113      0.001

Trajectory Analysis: Familiar Example (Cont.)

TP <- plot(TA, pch = as.numeric(Pupfish$Pop) + 20, bg = as.numeric(Pupfish$Sex),
           cex = 1, col = "black")
add.trajectories(TP, traj.pch = c(21, 22), start.bg = 1, end.bg = 2)
legend("topright", levels(Pupfish$Pop), pch =  c(21, 22), pt.bg = 1)

The trajectory analysis and plot demonstrate that sexual dimorphism differs between the two populations. One population (Marsh) displays greater SD than the other. Also, the direction of that sexual dimorphism is not the same in the data space.

Trajectory Analysis: Example 2

Another example in salamanders, regarding evolutionary phenotypic responses of two species to competition (design: 2 species in 2 environments).

Here, the magnitude of vectors was not different but the direction was. The interpretation in this case is that both species displayed similar amounts of phenotypic evolution in response to competition, but the direction differed. In this case, the species diverged evolutionarily: a pattern consistent with character displacement.

Trajectory Analysis With Covariates

Earlier examples of trajectory analysis assumed no covariate, visually:

However, for many hypotheses, one must account for variation due to a covariate while simultaneously assessing patterns of multivariate change.

Adams and Collyer Evolution (2007) demonstrated that simply incorporating the covariate into the model permitted accomplishes this analytical goal.

Mathematically this is the model: Y~cov+A+B+A:B.

Trajectory Analysis With Covariates (Cont.)

The method works exactly as expected!

Trajectory Analysis: Generalizations

Now we have to consider that a trajectory might contain more than two points. Two-point trajectories are great because they have two attributes - length and direction - with which we have learned how to handle for statistical tests. A three point trajectory is a sequence of two vectors; a four-point trajectory is a sequence of three vectors; etc. We could go crazy trying to analyze every pairwise vector of every pairwise difference in a pairwise fashion, or further consider the attributes of trajectories.

Trajectories have as attributes

  1. Length (of a vector between two points, of the sum of vector lengths - the path distance if three or more points)
  2. Direction (angular change from a reference, of vectors for two points, or of first PCs for three or more points)
  3. Shape (if three or more points, the cumulative angular and length changes of consecutive vectors)

Trajectory Analysis (Cont.)

Trajectory Statistics

Attribute Compared 2-point Statistic >2-point Statistic
Length \(MD = \mid {d}_1 - {d}_2 \mid\) \(MD = \mid \sum d_1 - \sum d_2 \mid\)
Angle \(\theta = \frac{\mathbf{\Delta \hat{y}}_1^T \mathbf{\Delta \hat{y}}_2 }{d_1d_2}\) \(\theta = \mathbf{u}_{1,1}^T \mathbf{u}_{2,1}\), where \(\mathbf{u}_{i,1}\) refers to the first eigenvector of the \(i^{th}\) group, which happens to already be unit size.
Shape NA \(D_P = \left(\left( \mathbf{\hat{z}}_1^T - \mathbf{\hat{z}}_2^T \right) \left( \mathbf{\hat{z}}_1 - \mathbf{\hat{z}}_2 \right) \right)^{1/2}\), where \(\mathbf{\hat{z}}\) is a vectorized form of the matrix, \(\mathbf{\hat{Y}}\), which has been centered, “standardized” by matrix size, and rotated. \(D_P\) is Procrustes distance.

Generalized Procrustes Analysis

Generalized Procrustes Analysis (Cont.)

Generalized Procrustes Analysis (Cont.)

Generalized Procrustes Analysis (Cont.)

Trajectory Statistics

Attribute Compared 2-point Statistic >2-point Statistic
Length \(MD = \mid {d}_1 - {d}_2 \mid\) \(MD = \mid \sum d_1 - \sum d_2 \mid\)
Angle \(\theta = \frac{\mathbf{\Delta \hat{y}}_1^T \mathbf{\Delta \hat{y}}_2 }{d_1d_2}\) \(\theta = \mathbf{u}_{1,1}^T \mathbf{u}_{2,1}\), where \(\mathbf{u}_{i,1}\) refers to the first eigenvector of the \(i^{th}\) group, which happens to already be unit size.
Shape NA \(D_P = \left(\left( \mathbf{\hat{z}}_1^T - \mathbf{\hat{z}}_2^T \right) \left( \mathbf{\hat{z}}_1 - \mathbf{\hat{z}}_2 \right) \right)^{1/2}\), where \(\mathbf{\hat{z}}\) is a vectorized form of the matrix, \(\mathbf{\hat{Y}}\), which has been centered, “standardized” by matrix size, and rotated. \(D_P\) is Procrustes distance.

Trajectory Analysis (Cont.)

Each of these statistics could be considered test statistics in a hypothesis test. The expected value of each is 0 under a null hypothesis of no difference in geometric attribute. The question now is whether these second-order statistics behave as we would hope for hypothesis tests. The way to test this is to create data with known properties and see what happens.

Trajectory Analysis (Cont.)

Trajectory Analysis (Cont.)

Trajectory Analysis (Cont.)

Trajectory Analysis (Cont.)

Trajectory Analysis (Cont.)

Trajectory Analysis (Cont.)

Trajectory Analysis (Cont.)

Conclusion: Using RRPP, trajectory analysis provides appropriate pairwise test statistics. Therefore, in addition to ANOVA and pairwise tests with first-order statistics, one can perform a trajectory analysis with the same random permutations from RRPP, for factorial models.

Trajectory Analysis: Illustrative Example

Trajectory Analysis: Illustrative Example

Trajectory Analysis: Illustrative Example

Trajectory Analysis: Illustrative Example

Trajectory Analysis: Examples of Its Use in The Scientific Literature

Trajectory Analysis: Examples of Its Use in The Scientific Literature

Trajectory Analysis: Examples of Its Use in The Scientific Literature

Trajectory Analysis: Examples of Its Use in The Scientific Literature

Trajectory Analysis: Examples of Its Use in The Scientific Literature

Trajectory Analysis: Examples of Its Use in The Scientific Literature

Trajectory Analysis: Examples of Its Use in The Scientific Literature

Trajectory Analysis: Examples of Its Use in The Scientific Literature

A recent review in Ann. Rev. Ecol. Evol. Syst. (2018) highly recommends use of trajectory analysis for evaluating patterns of parallel and non-parallel evolution!

Trajectory Analysis: Summary and Other Comments

Also, angles might be strange if trajectories change direction a lot (maybe returning to initial states in later states), such that describing a general direction does not make sense.

We recommend focusing on effect sizes and considering all attributes, together.