04: Regression Models

General Overview

Dean Adams, Iowa State University

Regression

A linear model where the response variable \(\small\mathbf{Y}\) is continuous, and \(\small\mathbf{X}\) contains one or more continuous covariates (predictor variables)

\[\mathbf{Y}=\mathbf{X}\mathbf{\beta } +\mathbf{E}\]
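For reference, the least-squares estimate of the coefficients follows directly from this model (a standard result; \(\small\mathbf{X}\) here includes a column of ones for the intercept):

\[\hat{\mathbf{\beta }}=\left ( \mathbf{X}^{T}\mathbf{X}\right )^{-1}\mathbf{X}^{T}\mathbf{Y}\]

with predicted values \(\small\hat{\mathbf{Y}}=\mathbf{X}\hat{\mathbf{\beta }}\) and residuals \(\small\mathbf{E}=\mathbf{Y}-\hat{\mathbf{Y}}\).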

Assumptions of Regression

1: Independence: the residuals \(\small\epsilon_{ij}\) of objects must be independent of one another

2: Normality: the residuals \(\small\epsilon_{ij}\) must be normally distributed

3: Homoscedasticity: the residual variance must be equal across the range of \(\small{X}\)

4: \(\small{X}\) values are independent and measured without error

General Computations

Computations: Sums-of-Squares
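The familiar decomposition underlying these computations can be written as:

\[SS_{T}=\sum_{i=1}^{n}\left(Y_{i}-\overline{Y}\right)^{2}=SS_{M}+SS_{E}\]

where \(\small{SS}_{M}=\sum(\hat{Y}_{i}-\overline{Y})^{2}\) is the model (explained) sum-of-squares and \(\small{SS}_{E}=\sum(Y_{i}-\hat{Y}_{i})^{2}\) is the residual sum-of-squares.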

Assessing Significance

Regression: Parameter Tests
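Significance of the slope can be assessed with either an F-ratio or, equivalently for simple regression, a t-test of the coefficient:

\[F=\frac{SS_{M}/df_{M}}{SS_{E}/df_{E}}=\frac{MS_{M}}{MS_{E}} \qquad t=\frac{\hat{\beta}_{1}}{SE(\hat{\beta}_{1})}\]

where \(\small{SE(\hat{\beta}_{1})}=\sqrt{MS_{E}/\sum(X_{i}-\overline{X})^{2}}\), and \(\small{t^{2}=F}\) when \(\small{df}_{M}=1\).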

Regression Example: Bumpus Data

model1<-lm(Y~X1)
anova(model1)
## Analysis of Variance Table
## 
## Response: Y
##            Df   Sum Sq  Mean Sq F value    Pr(>F)    
## X1          1 0.032412 0.032412  124.01 < 2.2e-16 ***
## Residuals 134 0.035023 0.000261                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model1$coefficients
## (Intercept)          X1 
##   1.3044593   0.6847984

Regression Example: Permutations

model2 <- lm.rrpp(Y~X1, print.progress = FALSE, data = mydat)
anova(model2)$table
##            Df       SS       MS     Rsq      F      Z Pr(>F)   
## X1          1 0.032412 0.032412 0.48063 124.01 5.9368  0.001 **
## Residuals 134 0.035023 0.000261 0.51937                        
## Total     135 0.067435                                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coef(model2)
##                  [,1]
## (Intercept) 1.3044593
## X1          0.6847984
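
Conceptually, significance here comes from comparing the observed F to F-values obtained after randomizing residuals of the reduced model (RRPP). A minimal sketch of the idea in Python on hypothetical data (not the lm.rrpp implementation; for a single-predictor model the reduced model is intercept-only, so randomizing its residuals amounts to shuffling \(\small\mathbf{Y}\)):

```python
import numpy as np

rng = np.random.default_rng(0)

def f_stat(x, y):
    """F statistic for a simple linear regression of y on x."""
    n = len(y)
    b = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # OLS slope
    resid = y - (y.mean() + b * (x - x.mean()))
    ss_err = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    # model df = 1, so MS_model = SS_model
    return (ss_tot - ss_err) / (ss_err / (n - 2))

# hypothetical data with a true slope of 0.7
x = rng.normal(size=50)
y = 0.7 * x + rng.normal(scale=0.5, size=50)

f_obs = f_stat(x, y)

# for a single-predictor model the reduced (null) model is intercept-only,
# so permuting its residuals is equivalent to shuffling y
f_null = [f_stat(x, rng.permutation(y)) for _ in range(999)]
p_val = (1 + sum(f >= f_obs for f in f_null)) / (1 + 999)
print(f_obs, p_val)
```

The observed F falls far in the tail of the permutation distribution, mirroring the `Pr(>F) = 0.001` reported by lm.rrpp with 1000 permutations.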

Regression Vs. Correlation

Regression Vs. Correlation Cont.

Model II Regression: Error in Y & X

Model II Regression

Model II Regression: Example

## RMA was not requested: it will not be computed.
##   Method  Intercept     Slope Angle (degrees) P-perm (1-tailed)
## 1    OLS  1.3044593 0.6847984        34.40328             0.001
## 2     MA -0.3329159 0.9824064        44.49152             0.001
## 3    SMA -0.3624212 0.9877692        44.64746                NA
##   Method 2.5%-Intercept 97.5%-Intercept 2.5%-Slope 97.5%-Slope
## 1    OLS      0.6352913       1.9736273  0.5631720   0.8064248
## 2     MA     -1.3892010       0.5532596  0.8213358   1.1743959
## 3    SMA     -1.0726264       0.2656984  0.8736027   1.1168556
## RMA was not requested: it will not be computed.
## No permutation test will be performed

Model II Regression: Comments

Multiple Regression

NOTE: One must consider multicollinearity: the correlation of \(\small{X}\) variables with one another. This can affect statistical inference.
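One common diagnostic for multicollinearity is the variance inflation factor, \(\small{VIF_{j}=1/(1-R^{2}_{j})}\), where \(\small{R^{2}_{j}}\) comes from regressing \(\small{X_{j}}\) on the remaining predictors. A hand-rolled sketch in Python on hypothetical data (illustrative only):

```python
import numpy as np

def vif(X):
    """Variance inflation factors: VIF_j = 1 / (1 - R^2_j), where R^2_j is
    from regressing column j of X on the remaining columns."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out.append(1 / (1 - r2))
    return np.array(out)

# hypothetical predictors: x2 is strongly correlated with x1
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.3, size=100)
print(vif(np.column_stack([x1, x2])))  # both VIFs are far above 1
```

Large VIFs (a common rule of thumb flags values above ~10) indicate that a coefficient's variance is inflated by its correlation with other predictors.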

Multiple Regression Coefficients

Multiple Regression: Example

anova(lm(Y~X1+X2))
## Analysis of Variance Table
## 
## Response: Y
##            Df   Sum Sq  Mean Sq F value    Pr(>F)    
## X1          1 0.032412 0.032412 137.118 < 2.2e-16 ***
## X2          1 0.003585 0.003585  15.168 0.0001551 ***
## Residuals 133 0.031438 0.000236                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple Regression: Permutation

anova(lm.rrpp(Y~X1+X2,print.progress = FALSE, data=mydat))$table
##            Df       SS       MS     Rsq       F      Z Pr(>F)   
## X1          1 0.032412 0.032412 0.48063 137.118 6.1079  0.001 **
## X2          1 0.003585 0.003585 0.05317  15.168 2.9586  0.001 **
## Residuals 133 0.031438 0.000236 0.46620                         
## Total     135 0.067435                                          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
cor(X1, X2)
## [1] 0.573966
anova(lm.rrpp(Y~X1+X2,print.progress = FALSE, data=mydat,SS.type = "II"))$table
##            Df       SS        MS     Rsq      F      Z Pr(>F)   
## X1          1 0.012782 0.0127819 0.18954 54.074 4.4652  0.001 **
## X2          1 0.003585 0.0035853 0.05317 15.168 2.9586  0.001 **
## Residuals 133 0.031438 0.0002364 0.46620                        
## Total     135 0.067435                                          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple Regression: Adding Additional Explanatory Variables

CAREFUL WITH BIOLOGICAL INTERPRETATION!!! One can over-fit models, or identify a best-fitting model (in terms of some statistical criterion) that is biologically nonsensical

Comparing Regression Lines

\[\tiny{F}=\frac{\left(\beta_{1}-\beta_{2}\right)^{2}}{\overline{s}^{2}_{YX}\frac{\sum\left(X_{1}-\overline{X}_{1}\right)^{2}+\sum\left(X_{2}-\overline{X}_{2}\right)^{2}}{\sum\left(X_{1}-\overline{X}_{1}\right)^{2}\sum\left(X_{2}-\overline{X}_{2}\right)^{2}}}\]

Where \(\small{\overline{s}^{2}_{YX}}\) is the weighted average of the residual variances \(\small{s^{2}_{YX}}\) from the two regressions, and \(\small{df}=1,(n_{1}+n_{2}-4)\)

Procedure can be generalized to compare > 2 regression lines (see Biometry)
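As a worked illustration, the F-ratio above can be computed directly; a sketch in Python on hypothetical data (the function name is invented for this example, and scipy is assumed to be available for the F-distribution p-value):

```python
import numpy as np
from scipy import stats

def compare_slopes(x1, y1, x2, y2):
    """F test for equality of two regression slopes; df = 1, (n1 + n2 - 4)."""
    def fit(x, y):
        b = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # OLS slope
        resid = y - (y.mean() + b * (x - x.mean()))
        return b, np.sum(resid ** 2), np.sum((x - x.mean()) ** 2)
    b1, sse1, ssx1 = fit(x1, y1)
    b2, sse2, ssx2 = fit(x2, y2)
    df_err = len(x1) + len(x2) - 4
    s2 = (sse1 + sse2) / df_err                          # pooled residual variance
    F = (b1 - b2) ** 2 / (s2 * (1 / ssx1 + 1 / ssx2))
    return F, stats.f.sf(F, 1, df_err)

# hypothetical data: two groups with clearly different slopes
rng = np.random.default_rng(2)
x1 = rng.normal(size=40); y1 = 1.0 * x1 + rng.normal(scale=0.2, size=40)
x2 = rng.normal(size=40); y2 = -1.0 * x2 + rng.normal(scale=0.2, size=40)
F, p = compare_slopes(x1, y1, x2, y2)
print(F, p)
```

Note that \(\small\frac{a+b}{ab}=\frac{1}{a}+\frac{1}{b}\), so the denominator here matches the ratio of sums-of-squares in the formula above.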

ANCOVA: Analysis of Covariance

ANCOVA: Computations
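In scalar form, the common-slopes ANCOVA model for observation \(\small{j}\) in group \(\small{i}\) can be written as:

\[Y_{ij}=\mu+\alpha_{i}+\beta X_{ij}+\epsilon_{ij}\]

where \(\small\alpha_{i}\) is the effect of group \(\small{i}\) and \(\small\beta\) is the shared slope for the covariate; the heterogeneous-slopes (interaction) model replaces \(\small\beta\) with group-specific slopes \(\small\beta_{i}\).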

Note: Factors with 3+ groups will have multiple \(\small\beta\) per factor.

NOTE: If the interaction term is not significant (i.e., no evidence of heterogeneous slopes), remove it and re-evaluate \(\small{Y}\) using a common-slopes model to compare least-squares group means (while accounting for the covariate)

Model Parameters (\(\beta\))

\(\beta\) contains components of adjusted least-squares (LS) means and group slopes

y <- c(6, 4, 0, 2, 3, 3, 4, 7)
x <- c(7, 8, 2, 3, 5, 4, 3, 6)
gp <- factor(c(1, 1, 1, 1, 2, 2, 2, 2))
df <- data.frame(x = x, y = y, gp = gp)

fit <- lm(y~x*gp, data = df); coef(fit)
## (Intercept)           x         gp2       x:gp2 
##  -0.8461538   0.7692308   1.0461538   0.1307692
# gp 2 coefficients 
c(coef(fit)[1]+coef(fit)[3], coef(fit)[2]+coef(fit)[4])
## (Intercept)           x 
##         0.2         0.9
res.gp <- by(df, gp, function(d) lm(y ~ x, data = d))  # separate fit per group
sapply(res.gp, coef)
##                      1   2
## (Intercept) -0.8461538 0.2
## x            0.7692308 0.9

Assessing Significance

ANCOVA: What are We Doing?

NOTE: In the former case (i.e., a significant cov:group interaction), our hypothesis has often been changed by the data. That is, we may have initiated the analysis as “Are the groups different even when I account for X?” but our data have told us that we instead must focus on the slope differences among groups, not differences among group means!

ANCOVA: Example

anova(lm(Y~X2*SexBySurv))
## Analysis of Variance Table
## 
## Response: Y
##               Df   Sum Sq   Mean Sq F value    Pr(>F)    
## X2             1 0.023215 0.0232149 77.3890 8.139e-15 ***
## SexBySurv      3 0.005172 0.0017239  5.7467  0.001014 ** 
## X2:SexBySurv   3 0.000651 0.0002172  0.7239  0.539496    
## Residuals    128 0.038397 0.0003000                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANCOVA: Example Cont.

anova(lm(Y~X2+SexBySurv))  # COMMON SLOPES MODEL
## Analysis of Variance Table
## 
## Response: Y
##            Df   Sum Sq   Mean Sq F value    Pr(>F)    
## X2          1 0.023215 0.0232149 77.8814 6.015e-15 ***
## SexBySurv   3 0.005172 0.0017239  5.7833 0.0009591 ***
## Residuals 131 0.039048 0.0002981                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANCOVA: Pairwise Comparisons

pairwise.t.test(model.ancova$fitted.values, SexBySurv, p.adj = "none")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  model.ancova$fitted.values and SexBySurv 
## 
##         f FALSE f TRUE  m FALSE
## f TRUE  0.028   -       -      
## m FALSE 4.2e-15 < 2e-16 -      
## m TRUE  0.029   1.6e-05 1.0e-12
## 
## P value adjustment method: none

ANCOVA via Permutation

model.anc2 <- lm.rrpp(Y~X2*SexBySurv, print.progress = FALSE, data = mydat)
anova(model.anc2)$table
##               Df       SS        MS     Rsq       F       Z Pr(>F)   
## X2             1 0.023215 0.0232149 0.34426 77.3890  5.2846  0.001 **
## SexBySurv      3 0.005172 0.0017239 0.07669  5.7467  3.0048  0.002 **
## X2:SexBySurv   3 0.000651 0.0002172 0.00966  0.7239 -0.0862  0.530   
## Residuals    128 0.038397 0.0003000 0.56939                          
## Total        135 0.067435                                            
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model.anc3 <- lm.rrpp(Y~X2+SexBySurv, print.progress = FALSE, data = mydat)
anova(model.anc3)$table
##            Df       SS        MS     Rsq       F      Z Pr(>F)   
## X2          1 0.023215 0.0232149 0.34426 77.8814 5.2836  0.001 **
## SexBySurv   3 0.005172 0.0017239 0.07669  5.7833 3.0176  0.002 **
## Residuals 131 0.039048 0.0002981 0.57905                         
## Total     135 0.067435                                           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pairwise Tests via Permutation

res <- summary(pairwise(model.anc3, groups = SexBySurv), test.type = "dist", stat.table = FALSE) 
##             f FALSE      f TRUE    m FALSE      m TRUE
## f FALSE 0.000000000 0.001283828 0.01595601 0.004128217
## f TRUE  0.001283828 0.000000000 0.01723984 0.005412045
## m FALSE 0.015956012 0.017239840 0.00000000 0.011827794
## m TRUE  0.004128217 0.005412045 0.01182779 0.000000000
##            f FALSE     f TRUE  m FALSE    m TRUE
## f FALSE  0.0000000 -0.9612166 2.772571 0.4670408
## f TRUE  -0.9612166  0.0000000 2.578038 0.6999764
## m FALSE  2.7725709  2.5780381 0.000000 2.3954073
## m TRUE   0.4670408  0.6999764 2.395407 0.0000000
##         f FALSE f TRUE m FALSE m TRUE
## f FALSE   1.000  0.819   0.002  0.337
## f TRUE    0.819  1.000   0.002  0.256
## m FALSE   0.002  0.002   1.000  0.002
## m TRUE    0.337  0.256   0.002  1.000

Pairwise Tests via Permutation Cont.

PW <- pairwise(model.anc3, groups = SexBySurv)
summary(PW, test.type = "VC", angle.type = "deg")
## 
## Pairwise comparisons
## 
## Groups: f FALSE f TRUE m FALSE m TRUE 
## 
## RRPP: 1000 permutations
## 
## LS means:
## Vectors hidden (use show.vectors = TRUE to view)
## 
## Pairwise statistics based on mean vector correlations
##                 r angle    UCL (95%)          Z Pr > angle
## f FALSE:f TRUE  1     0 8.537736e-07 -0.4814688     0.5955
## f FALSE:m FALSE 1     0 8.537736e-07 -0.4969422     0.6005
## f FALSE:m TRUE  1     0 8.537736e-07 -0.4889152     0.5980
## f TRUE:m FALSE  1     0 8.537736e-07 -0.4727927     0.5930
## f TRUE:m TRUE   1     0 8.537736e-07 -0.4735833     0.5930
## m FALSE:m TRUE  1     0 8.537736e-07 -0.5034108     0.6020
NOTE: For illustrative purposes only, as this particular example did not display a significant cov:group interaction

ANCOVA: Common Errors

Other Regression Models