03: ANOVA Models

General Overview

Dean Adams, Iowa State University

Analysis of Variance (ANOVA)

A linear model where the response variable \(\small\mathbf{Y}\) is continuous, and \(\small\mathbf{X}\) contains one or more categorical factors (predictor variables). This is a comparison of groups

\[\mathbf{Y}=\mathbf{X}\mathbf{\beta } +\mathbf{E}\]

This formulation appears different, but really \(\mathbf{\beta}\) is a vector containing \(\small{\mu}\) and \(\small{\alpha}_{i}\), so the model is the same

Assumptions of ANOVA

General Computations

\[\small{SST}=\sum_{i=1}^{a}\sum_{j=1}^{n_a}\left({Y}_{ij}-\overline{\overline{Y}}\right)^{2}\]

\[\small{MS}={\sigma}^{2}=\frac{1}{n-1}SS\]

Single-Factor ANOVA

Assessing Significance

Single-Factor ANOVA: Example

Do male and female sparrows differ in total length?

anova(lm(bumpus$TL~bumpus$sex))
## Analysis of Variance Table
## 
## Response: bumpus$TL
##             Df  Sum Sq Mean Sq F value    Pr(>F)    
## bumpus$sex   1  187.49 187.491  16.483 8.304e-05 ***
## Residuals  134 1524.24  11.375                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Special Case: Two Groups (t-test or ANOVA?)

Group <- gl(2,5)
Y <- c(5,4,4,4,3,7,5,6,6,6)
t.test(Y~Group)
## 
##  Welch Two Sample t-test
## 
## data:  Y by Group
## t = -4.4721, df = 8, p-value = 0.002077
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
##  -3.0312764 -0.9687236
## sample estimates:
## mean in group 1 mean in group 2 
##               4               6
anova(lm(Y~Group))
## Analysis of Variance Table
## 
## Response: Y
##           Df Sum Sq Mean Sq F value   Pr(>F)   
## Group      1     10    10.0      20 0.002077 **
## Residuals  8      4     0.5                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Special Case: T-test vs. ANOVA

\[\tiny{t}^{2}=\frac{\left(\overline{Y}_{1}-\overline{Y}_{2}\right)^{2}}{s_{p}^{2}\left(\frac{1}{n_{1}}+\frac{1}{n_{2}}\right)}=\frac{n\left(\overline{Y}_{1}-\overline{Y}_{2}\right)^{2}}{2s_{p}^{2}}\] - Numerator is equivalent to numerator in ANOVA:

\[\tiny{n}\left(\overline{Y}_{1}-\overline{Y}_{2}\right)^{2}=\left({n\left(\overline{Y}_{1}-\overline{\overline{Y}}\right)^{2}+n\left(\overline{Y}_{2}-\overline{\overline{Y}}\right)^{2}}\right)=\sum_{i=1}^{2}n\left({\overline{Y}_{i}-\overline{\overline{Y}}}\right)^{2}\]

ANOVA: More Than Two Groups

Note: the binary coding of levels within factors is typically performed by the statistical software, not the user. But it important to understand what is occuring, so one may properly interpret model coefficients

ANOVA Model Matrix Examples

x1 <- gl(3,3)
x1
## [1] 1 1 1 2 2 2 3 3 3
## Levels: 1 2 3
model.matrix(~x1) # with intercept
##   (Intercept) x12 x13
## 1           1   0   0
## 2           1   0   0
## 3           1   0   0
## 4           1   1   0
## 5           1   1   0
## 6           1   1   0
## 7           1   0   1
## 8           1   0   1
## 9           1   0   1
## attr(,"assign")
## [1] 0 1 1
## attr(,"contrasts")
## attr(,"contrasts")$x1
## [1] "contr.treatment"
model.matrix(~x1+0) # without intercept
##   x11 x12 x13
## 1   1   0   0
## 2   1   0   0
## 3   1   0   0
## 4   0   1   0
## 5   0   1   0
## 6   0   1   0
## 7   0   0   1
## 8   0   0   1
## 9   0   0   1
## attr(,"assign")
## [1] 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$x1
## [1] "contr.treatment"

Model Parameters

What do the parameters \(\beta\) represent in ANOVA?

\(\beta\) contains the first group mean, and differences from this mean to the others

Model Parameters

What do the parameters \(\beta\) represent in ANOVA?

\(\beta\) contains the first group mean, and differences from this mean to the others

Example:

y <- c(6, 4, 0, 2, 3, 3, 4, 7 )
gp <- factor(c(1,1,2,2,3,3,4,4))
coef(lm(y~gp))
## (Intercept)         gp2         gp3         gp4 
##         5.0        -4.0        -2.0         0.5
tapply(y,gp,mean)
##   1   2   3   4 
## 5.0 1.0 3.0 5.5

ANOVA: Post-Hoc Tests

Post-Hoc Comparisons by Permutation

Experiment-Wise Error Rate

ANOVA Example: Bumpus Data

ANOVA Example: Bumpus Data

gp <- as.factor(paste(bumpus$sex,bumpus$surv))
anova(lm(bumpus$TL~gp))
## Analysis of Variance Table
## 
## Response: bumpus$TL
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## gp          3  369.49 123.163  12.112 4.712e-07 ***
## Residuals 132 1342.25  10.169                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pairwise.t.test(bumpus$TL, gp, p.adj = "none")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  bumpus$TL and gp 
## 
##         f FALSE f TRUE  m FALSE
## f TRUE  0.257   -       -      
## m FALSE 1.2e-05 3.4e-07 -      
## m TRUE  0.273   0.025   7.9e-05
## 
## P value adjustment method: none

ANOVA Example: Permutations

res <-summary(pairwise(lm.rrpp(TL~gp, data = mydat,print.progress = FALSE),groups = gp),test.type = "dist", stat.table = FALSE)

res$pairwise.tables$D  #Distance matrix
##           f FALSE   f TRUE  m FALSE    m TRUE
## f FALSE 0.0000000 1.047619 3.654762 0.8263305
## f TRUE  1.0476190 0.000000 4.702381 1.8739496
## m FALSE 3.6547619 4.702381 0.000000 2.8284314
## m TRUE  0.8263305 1.873950 2.828431 0.0000000
res$pairwise.tables$Z  #Effect size
##           f FALSE    f TRUE  m FALSE    m TRUE
## f FALSE 0.0000000 0.5913741 3.128898 0.5217306
## f TRUE  0.5913741 0.0000000 3.505043 1.6195253
## m FALSE 3.1288979 3.5050431 0.000000 2.7630346
## m TRUE  0.5217306 1.6195253 2.763035 0.0000000
res$pairwise.tables$P  #Significance
##         f FALSE f TRUE m FALSE m TRUE
## f FALSE  1.0000 0.3115   0.001  0.331
## f TRUE   0.3115 1.0000   0.001  0.043
## m FALSE  0.0010 0.0010   1.000  0.001
## m TRUE   0.3310 0.0430   0.001  1.000

Factorial ANOVA

ANOVA multiple \(\small{X}\) variables (e.g, species AND sex)

Note: Factors with 3+ levels will have multiple \(\small\beta\) per factor.

Factorial Models

Factor Interactions

Factorial ANOVA: General

\[\tiny{Y}=\beta_{0}+X_{1}\beta_{1}+X_{2}\beta_{2}+X_{3}\beta_{3}+X_{1:2}\beta_{1:2}+X_{1:3}\beta_{1:3}+X_{2:3}\beta_{2:3}+X_{1:2:3}\beta_{1:2:3}+\epsilon\]

Rules for pooling MSA*B with MSE depend upon significance, the type of effect (fixed or random), and several other considerations (See Box 10.3 in Biometry for details)

Assessing Significance

Factorial ANOVA: Example

anova(lm(TL~sex*surv))
## Analysis of Variance Table
## 
## Response: TL
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## sex         1  187.49 187.491 18.4384 3.374e-05 ***
## surv        1  157.74 157.738 15.5123 0.0001321 ***
## sex:surv    1   24.26  24.260  2.3858 0.1248339    
## Residuals 132 1342.25  10.169                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pairwise.t.test(TL, gp, p.adj = "none")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  TL and gp 
## 
##         f FALSE f TRUE  m FALSE
## f TRUE  0.257   -       -      
## m FALSE 1.2e-05 3.4e-07 -      
## m TRUE  0.273   0.025   7.9e-05
## 
## P value adjustment method: none

Factorial ANOVA Example: Permutations

anova(lm.rrpp(TL~sex*surv, data = mydat,print.progress = FALSE))$table
##            Df      SS      MS     Rsq       F      Z Pr(>F)   
## sex         1  187.49 187.491 0.10953 18.4384 3.0977  0.001 **
## surv        1  157.74 157.738 0.09215 15.5123 2.9439  0.001 **
## sex:surv    1   24.26  24.260 0.01417  2.3858 1.2237  0.114   
## Residuals 132 1342.25  10.169 0.78414                         
## Total     135 1711.74                                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(pairwise(lm.rrpp(TL~gp, data = mydat,print.progress = FALSE),groups = gp), test.type = "dist")
## 
## Pairwise comparisons
## 
## Groups: f FALSE f TRUE m FALSE m TRUE 
## 
## RRPP: 1000 permutations
## 
## LS means:
## Vectors hidden (use show.vectors = TRUE to view)
## 
## Pairwise distances between means, plus statistics
##                         d UCL (95%)         Z Pr > d
## f FALSE:f TRUE  1.0476190  1.893452 0.5913741 0.3115
## f FALSE:m FALSE 3.6547619  1.738294 3.1288979 0.0010
## f FALSE:m TRUE  0.8263305  1.730462 0.5217306 0.3310
## f TRUE:m FALSE  4.7023810  1.904762 3.5050431 0.0010
## f TRUE:m TRUE   1.8739496  1.827311 1.6195253 0.0430
## m FALSE:m TRUE  2.8284314  1.568873 2.7630346 0.0010

Fixed Effects and Random Effects

Expected Mean Squares

Nested Effects Models

No interaction of A:B obtained when nested terms are used, because SS(A/B) = SSB + SSA:B

Nested Anova: Example

An example

anova(lm.rrpp(TL~sex/surv, print.progress = FALSE), error = c("sex:surv", "Residuals"))$table
##            Df      SS      MS     Rsq      F      Z Pr(>F)   
## sex         1  187.49 187.491 0.10953 2.0604 0.4877  0.309   
## sex:surv    2  182.00  90.999 0.10632 8.9491 3.2517  0.001 **
## Residuals 132 1342.25  10.169 0.78414                        
## Total     135 1711.74                                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Compared with the same data as a factorial model

anova(lm.rrpp(TL~sex*surv, print.progress = FALSE))$table
##            Df      SS      MS     Rsq       F      Z Pr(>F)   
## sex         1  187.49 187.491 0.10953 18.4384 3.0977  0.001 **
## surv        1  157.74 157.738 0.09215 15.5123 2.9439  0.001 **
## sex:surv    1   24.26  24.260 0.01417  2.3858 1.2237  0.114   
## Residuals 132 1342.25  10.169 0.78414                         
## Total     135 1711.74                                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Factorial or Nested?

Linear Mixed Models

Linear Mixed Model: Example

Subjects differ by slope and intercept (fit random slope/intercept model)

fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
anova(fm1)
## Analysis of Variance Table
##      npar Sum Sq Mean Sq F value
## Days    1  30031   30031  45.853
summary(fm1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ Days + (Days | Subject)
##    Data: sleepstudy
## 
## REML criterion at convergence: 1743.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9536 -0.4634  0.0231  0.4634  5.1793 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  Subject  (Intercept) 612.10   24.741       
##           Days         35.07    5.922   0.07
##  Residual             654.94   25.592       
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  251.405      6.825  36.838
## Days          10.467      1.546   6.771
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.138

Complex Experimental Designs

See Statistics Dept. for how to generate such designs

Type I, Type II, Type III Sums-of-Squares

anova(lm(bumpus$TL~bumpus$sex*bumpus$surv))
## Analysis of Variance Table
## 
## Response: bumpus$TL
##                         Df  Sum Sq Mean Sq F value    Pr(>F)    
## bumpus$sex               1  187.49 187.491 18.4384 3.374e-05 ***
## bumpus$surv              1  157.74 157.738 15.5123 0.0001321 ***
## bumpus$sex:bumpus$surv   1   24.26  24.260  2.3858 0.1248339    
## Residuals              132 1342.25  10.169                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm(bumpus$TL~bumpus$surv*bumpus$sex))
## Analysis of Variance Table
## 
## Response: bumpus$TL
##                         Df  Sum Sq Mean Sq F value    Pr(>F)    
## bumpus$surv              1  106.88 106.876 10.5105  0.001503 ** 
## bumpus$sex               1  238.35 238.353 23.4403 3.545e-06 ***
## bumpus$surv:bumpus$sex   1   24.26  24.260  2.3858  0.124834    
## Residuals              132 1342.25  10.169                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple Model Effects (Type I SS)

For a model containing two factors (sex, and population), consider the following model comparisons (\(\small{H}_{F}\) vs. \(\small{H}_{R}\)):

Effect Reduced Full Purpose
Sex Null Null + Sex Determine if Sex mean difference is “significant” compared to an overall mean
Pop Null + Sex Null + Sex + Pop Determine if population differences are significant, after accounting for general difference between males and females
Sex:Pop Null + Sex + Pop Null + Sex + Pop + Sex:Pop Determine if sexual dimorphism varies between populations, after accounting for general difference in sex and population.
Full model Null Null + Sex + Pop + Sex:Pop Ascertain if the model is signficant (if not, the other parts should not be considered significant either)

Estimating \(SS\) between models for the first three model comparisons is called Sequential (Type I) \(SS\) estimation. We start with a null model and build up to the fullest model.

Multiple Model Effects (Type II SS)

Alternatively, model comparisons (\(\small{H}_{F}\) vs. \(\small{H}_{R}\)) could be constructed as follows:

Effect Reduced Full Purpose
Sex Null + Pop Null + Pop + Sex Determine if Sex mean difference is “significant”, after accounting for population mean difference
Pop Null + Sex Null + Sex + Pop Determine if population differences are significant, after accounting for general difference between males and females
Sex:Pop Null + Sex + Pop Null + Sex + Pop + Sex:Pop Determine if sexual dimorphism varies between populations, after accounting for general difference in sex and population.
Full model Null Null + Sex + Pop + Sex:Pop Ascertain if the model is signficant (if not, the other parts should not be considered significant either)

Estimating \(SS\) between models for the first three model comparisons is called Conditional (Type II) \(SS\) estimation. We analyze effects with respect to other effects, but do not include interactions with targeted effects.

Multiple Model Effects (Type III SS)

Or, model comparisons (\(\small{H}_{F}\) vs. \(\small{H}_{R}\)) could be constructed as follows:

Effect Reduced Full Purpose
Sex Null + Pop + Pop:Sex Null + Pop + Sex + Pop:Sex Determine if Sex mean difference is “significant”, after accounting for all other effects
Pop Null + Sex + Pop:Sex Null + Sex + Pop + Pop:Sex Determine if population differences are significant, after accounting for all other effects.
Sex:Pop Null + Sex + Pop Null + Sex + Pop + Sex:Pop Determine if sexual dimorphism varies between populations, after accounting for general difference in sex and population.
Full model Null Null + Sex + Pop + Sex:Pop Ascertain if the model is signficant (if not, the other parts should not be considered significant either)

Estimating \(SS\) between models for the first three model comparisons is called Marginal (Type III) \(SS\) estimation. Whereas Type I \(SS\) evaluates the importance of effects through inclusion, Type III \(SS\) evaluates the importance of effects through exclusion.

Multiple Model Effects (comments)

Critical thinking is required here, and for evaluating all models! One must fully understand the set of full and reduced models that one is investigating!