Spatial Statistics

library(spatstat)

## Loading required package: spatstat.data

## Loading required package: spatstat.geom

## spatstat.geom 2.1-0

## Loading required package: spatstat.core

## Loading required package: nlme

## Loading required package: rpart

## spatstat.core 2.1-2

## Loading required package: spatstat.linnet

## spatstat.linnet 2.1-1

## 
## spatstat 2.1-0       (nickname: 'Comedic violence') 
## For an introduction to spatstat, type 'beginner'

library(RRPP)
library(vegan)

## Loading required package: permute

## Loading required package: lattice

## 
## Attaching package: 'lattice'

## The following object is masked from 'package:spatstat.core':
## 
##     panel.histogram

## This is vegan 2.5-7

library(spdep)

## Loading required package: sp

## Loading required package: spData

## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`

## Loading required package: sf

## Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1

## Registered S3 method overwritten by 'spdep':
##   method   from
##   plot.mst ape

library(gstat)

## 
## Attaching package: 'gstat'

## The following object is masked from 'package:spatstat.core':
## 
##     idw

Motivation

Today we examine methods that concern geographic space. Some assess whether or not there is spatial autocorrelation in data, others account for spatial autocorrelation in subsequent analyses. The latter can be accomplished using a generalized least squares model, or implemented as a weighted GLM. These approaches model spatial non-independence in the error term of the GLM, and can assess the relationship between two sets of variables (Y~X) while accounting for spatial non-independence.

The observant student will note that this description, and the method used, is identical to what is done in phylogenetic comparative methods, where phylogenetic non-independence is accounted for during the analysis of Y~X.

1: Point Patterns: Ripley’s K

Our first analysis takes spatial data and examines point patterns for underdispersion or overdispersion:

#1: Ripley's K
data(cells)
plot(cells)

plot(Kest(cells))   #shows K for various models: isotropic, poisson, etc.

E<-envelope(cells,Kest,nsim=100,rank=2)

## Generating 100 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
## 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
## 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,  100.
## 
## Done.

plot(E) # plot shows that observed are underdispersed at small spatial scales

The plot shows that the observed data are underdispersed at small spatial scales.

Spatially Autocorrelated Data

Next we examine various patterns of spatial autocorrelation between one or more variables and geographic location. Typically these approaches are used on empirical data, but for this example we will simulate data with spatial autocorrelation. Our data contain a continuous variable and groups (ecotype) simulated across the landscape:

lat<-runif(50,0,5); long<-runif(50,0,5)
g<-cbind(lat,long)  #create an XY spatial grid
y <- sqrt(diag(g%*%t(g))) + rnorm(nrow(g))  #a spatially-autocorrelated variable
        # Valuee is associated with distance from origin
t <- sample(rep(c(1,2),length(y)/2))  #2 ecological 'groups' constrained by spatial contingency
sc <- 0.0   #spatial contingency: 0->1
for(i in 1:length(y)){
  z = scale(y,scale=sd(y))
  crit = 1-sc
  crit =c(-crit/2,crit/2)*3
  if(z[i]<=min(crit)) t[i]=1
  if(z[i]>=max(crit)) t[i]=2 
}

plot(g, pch=21, bg=t, cex=y, asp=1, main="Species diversity proprotional to circle size 
Color designates ecological type")

2: Covariation of Data with Geography

One pattern that can be quantitatively examined is the extent to which some variable covaries with geography. A Mantel test may be used for this (NOTE: recall the caveats of the Mantel test from week 10!)

mantel(dist(g), dist(y), permutations = 9999)  # Mantel association

## 
## Mantel statistic based on Pearson's product-moment correlation 
## 
## Call:
## mantel(xdis = dist(g), ydis = dist(y), permutations = 9999) 
## 
## Mantel statistic r: 0.467 
##       Significance: 1e-04 
## 
## Upper quantiles of permutations (null model):
##    90%    95%  97.5%    99% 
## 0.0631 0.0856 0.1044 0.1269 
## Permutation: free
## Number of permutations: 9999

mantel.partial(dist(t), dist(y), dist(g), permutations = 9999)  #3-way Mantel holding group constant

## 
## Partial Mantel statistic based on Pearson's product-moment correlation 
## 
## Call:
## mantel.partial(xdis = dist(t), ydis = dist(y), zdis = dist(g),      permutations = 9999) 
## 
## Mantel statistic r: 0.001081 
##       Significance: 0.3725 
## 
## Upper quantiles of permutations (null model):
##    90%    95%  97.5%    99% 
## 0.0286 0.0442 0.0619 0.0849 
## Permutation: free
## Number of permutations: 9999

3: Spatial Autocorrelation

One may also assess the level of spatial autocorrelation in a variable; that is, the degree to which spatially proximate samples display similar values.

W<-tri2nb(g) #weights with Delauney tesselation
moran.test(y,nb2listw(W))   #positive autocorrelation

## 
##  Moran I test under randomisation
## 
## data:  y  
## weights: nb2listw(W)    
## 
## Moran I statistic standard deviate = 7.3888, p-value = 7.409e-14
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.576083533      -0.020408163       0.006517219

Semivariogram

One can also plot the degree of autocorrelation relative to distance as a semivariogram.

df <- data.frame(g,y)
coordinates(df) = ~g
res <- variogram(y~1,df)
plot(res, type = "b", main = "Variogram: y")

Now plot this against a Gaussian decay model:

VarMdl <- vgm(psill=2, model="Gau", nugget=0.1, range=1)
plot(res, model=VarMdl)

4: Accounting for Spatial Autocorrelation

One may also be interested in determining the relationship between a set of dependent and independent variables while accounting for spatial non-independence. Here the linear model is of the form: Y~X|Geog, where Geog is modeled in the error covariance. As with phylogenetic comparative methods (week 11), this is accomplished using GLM (or more generally GLS).

Note that with spatial data, there are multiple ways to model the spatial covariance, based on how one expects spatial autocorrelation to decay with distance. We discussed numerous models in class: several are shown in the example below.

ols.fit <- lm(y~t, x=T)  #spatial proximity not considered
summary(ols.fit)

## 
## Call:
## lm(formula = y ~ t, x = T)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0213 -1.4194  0.0082  1.8534  3.5484 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   1.8102     0.9527   1.900   0.0634 .
## t             0.7989     0.6026   1.326   0.1912  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.13 on 48 degrees of freedom
## Multiple R-squared:  0.03533,    Adjusted R-squared:  0.01523 
## F-statistic: 1.758 on 1 and 48 DF,  p-value: 0.1912

anova(ols.fit)

## Analysis of Variance Table
## 
## Response: y
##           Df  Sum Sq Mean Sq F value Pr(>F)
## t          1   7.977  7.9771  1.7577 0.1912
## Residuals 48 217.842  4.5384

# GLS models with different spatial autocorrelation structure
  # Warning!  False convergences possible
t <- as.factor(t)
geo=data.frame(g,t,y)

library(nlme)
gls.fit.exp = gls(y~t, data=geo, correlation=corExp(form=~lat+long)) #exponential
gls.fit.gaus = gls(y~t, data=geo,  correlation=corGaus(form=~lat+long)) #gaussian
gls.fit.spher = gls(y~t, data=geo,  correlation=corSpher(form=~lat+long)) #spherical
gls.fit.lin = gls(y~t, data=geo,  correlation=corLin(form=~lat+long)) #linear

# look at coeffcients: VERY DIFFERENT when spatial non-independence considered
ols.fit

## 
## Call:
## lm(formula = y ~ t, x = T)
## 
## Coefficients:
## (Intercept)            t  
##      1.8102       0.7989

gls.fit.exp

## Generalized least squares fit by REML
##   Model: y ~ t 
##   Data: geo 
##   Log-restricted-likelihood: -107.6296
## 
## Coefficients:
## (Intercept)          t2 
##   2.6090185   0.7988533 
## 
## Correlation Structure: Exponential spatial correlation
##  Formula: ~lat + long 
##  Parameter estimate(s):
##        range 
## 0.0007017057 
## Degrees of freedom: 50 total; 48 residual
## Residual standard error: 2.130347

gls.fit.gaus

## Generalized least squares fit by REML
##   Model: y ~ t 
##   Data: geo 
##   Log-restricted-likelihood: -107.6296
## 
## Coefficients:
## (Intercept)          t2 
##   2.6090185   0.7988533 
## 
## Correlation Structure: Gaussian spatial correlation
##  Formula: ~lat + long 
##  Parameter estimate(s):
##     range 
## 0.0028644 
## Degrees of freedom: 50 total; 48 residual
## Residual standard error: 2.130347

gls.fit.spher

## Generalized least squares fit by REML
##   Model: y ~ t 
##   Data: geo 
##   Log-restricted-likelihood: -107.6296
## 
## Coefficients:
## (Intercept)          t2 
##   2.6090185   0.7988533 
## 
## Correlation Structure: Spherical spatial correlation
##  Formula: ~lat + long 
##  Parameter estimate(s):
##      range 
## 0.01218098 
## Degrees of freedom: 50 total; 48 residual
## Residual standard error: 2.130347

gls.fit.lin

## Generalized least squares fit by REML
##   Model: y ~ t 
##   Data: geo 
##   Log-restricted-likelihood: -107.6296
## 
## Coefficients:
## (Intercept)          t2 
##   2.6090185   0.7988533 
## 
## Correlation Structure: Linear spatial correlation
##  Formula: ~lat + long 
##  Parameter estimate(s):
##      range 
## 0.01217965 
## Degrees of freedom: 50 total; 48 residual
## Residual standard error: 2.130347

# model comparisons
AIC(gls.fit.exp, gls.fit.gaus,gls.fit.lin)

##              df      AIC
## gls.fit.exp   4 223.2592
## gls.fit.gaus  4 223.2592
## gls.fit.lin   4 223.2592

    #Careful when y is multivariate, see Model Selection lecture
   ## Exponential decay of spatial dependence is best model
# Anova
anova(gls.fit.exp)

## Denom. DF: 48 
##             numDF  F-value p-value
## (Intercept)     1 99.71343  <.0001
## t               1  1.75769  0.1912

anova(gls.fit.gaus)

## Denom. DF: 48 
##             numDF  F-value p-value
## (Intercept)     1 99.71343  <.0001
## t               1  1.75769  0.1912

anova(gls.fit.spher)

## Denom. DF: 48 
##             numDF  F-value p-value
## (Intercept)     1 99.71343  <.0001
## t               1  1.75769  0.1912

anova(gls.fit.lin)

## Denom. DF: 48 
##             numDF  F-value p-value
## (Intercept)     1 99.71343  <.0001
## t               1  1.75769  0.1912

If one has a spatial covariance matrix, this can also be used in RRPP. An advantage here is that accounting for spatial covariance works identically for both univariate and multivariate response variables. Here we show two examples. First, we estimate a simple spatial covariance matrix based on geographic distance:

# Spatial autocorrelation (from lat/long)
spatCov <- function(x){
  x <- as.matrix(x)
  if(ncol(x) != 2) stop("Need two columns for lat long!")
  d <- dist(x)
  P <- cmdscale(d, 2)
  tcrossprod(P)/(ncol(P) - 1)
}

#example  
spat.cov<-spatCov(g)
rdf <- rrpp.data.frame(g=g, y=y,t=t, spat.cov = spat.cov)
res <- lm.rrpp(y~t,Cov = spat.cov, data = rdf, print.progress = FALSE)

## 
## Warning: singular covariance matrix. Proceed with caution

anova(res)

## 
## Analysis of Variance, using Residual Randomization
## Permutation procedure: Randomization of null model residuals 
## Number of permutations: 1000 
## Estimation method: Generalized Least-Squares (via OLS projection) 
## Sums of Squares and Cross-products: Type I 
## Effect sizes (Z) based on F distributions
## 
##           Df         SS         MS     Rsq     F      Z Pr(>F)   
## t          1 1.6363e+13 1.6363e+13 0.78212 172.3 5.3947  0.001 **
## Residuals 48 4.5585e+12 9.4968e+10 0.21788                       
## Total     49 2.0922e+13                                          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Call: lm.rrpp(f1 = y ~ t, data = rdf, Cov = spat.cov, print.progress = FALSE)

However, this approach does not incorporate the decay of spatial dependency with distance (see Cressie’s work on spatial statistics and signal decay). A better approach is to extract the spatial covariance matrix from an nlme model, and use this in RRPP:

#exponential matrix
my.exp <- corExp(1, form=~g) #g is lat/long from above
my.exp <- Initialize(my.exp,geo)
mycor <- corMatrix(my.exp)

anova(lm.rrpp(y~t,Cov = mycor, data = rdf, print.progress = FALSE))

## 
## Analysis of Variance, using Residual Randomization
## Permutation procedure: Randomization of null model residuals 
## Number of permutations: 1000 
## Estimation method: Generalized Least-Squares (via OLS projection) 
## Sums of Squares and Cross-products: Type I 
## Effect sizes (Z) based on F distributions
## 
##           Df     SS      MS     Rsq      F      Z Pr(>F)   
## t          1 198.83 198.829 0.28322 18.966 2.8364  0.002 **
## Residuals 48 503.20  10.483 0.71678                        
## Total     49 702.03                                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Call: lm.rrpp(f1 = y ~ t, data = rdf, Cov = mycor, print.progress = FALSE)