library(spatstat)
## Loading required package: spatstat.data
## Loading required package: spatstat.geom
## spatstat.geom 2.1-0
## Loading required package: spatstat.core
## Loading required package: nlme
## Loading required package: rpart
## spatstat.core 2.1-2
## Loading required package: spatstat.linnet
## spatstat.linnet 2.1-1
##
## spatstat 2.1-0 (nickname: 'Comedic violence')
## For an introduction to spatstat, type 'beginner'
library(RRPP)
library(vegan)
## Loading required package: permute
## Loading required package: lattice
##
## Attaching package: 'lattice'
## The following object is masked from 'package:spatstat.core':
##
## panel.histogram
## This is vegan 2.5-7
library(spdep)
## Loading required package: sp
## Loading required package: spData
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
## Loading required package: sf
## Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
## Registered S3 method overwritten by 'spdep':
## method from
## plot.mst ape
library(gstat)
##
## Attaching package: 'gstat'
## The following object is masked from 'package:spatstat.core':
##
## idw
Today we examine methods that concern geographic space. Some assess whether or not there is spatial autocorrelation in data, others account for spatial autocorrelation in subsequent analyses. The latter can be accomplished using a generalized least squares model, or implemented as a weighted GLM. These approaches model spatial non-independence in the error term of the GLM, and can assess the relationship between two sets of variables (Y~X) while accounting for spatial non-independence.
The observant student will note that this description, and the method used, is identical to what is done in phylogenetic comparative methods, where phylogenetic non-independence is accounted for during the analysis of Y~X.
Our first analysis takes spatial data and examines point patterns for underdispersion or overdispersion:
#1: Ripley's K
data(cells)
plot(cells)
plot(Kest(cells)) #shows K for various models: isotropic, poisson, etc.
E<-envelope(cells,Kest,nsim=100,rank=2)
## Generating 100 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
## 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
## 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100.
##
## Done.
plot(E) # plot shows that observed are underdispersed at small spatial scales
The plot shows that the observed data are underdispersed at small spatial scales.
One pattern that can be quantitatively examined is the extent to which some variable covaries with geography. A Mantel test may be used for this (NOTE: recall the caveats of the Mantel test from week 10!)
mantel(dist(g), dist(y), permutations = 9999) # Mantel association
##
## Mantel statistic based on Pearson's product-moment correlation
##
## Call:
## mantel(xdis = dist(g), ydis = dist(y), permutations = 9999)
##
## Mantel statistic r: 0.467
## Significance: 1e-04
##
## Upper quantiles of permutations (null model):
## 90% 95% 97.5% 99%
## 0.0631 0.0856 0.1044 0.1269
## Permutation: free
## Number of permutations: 9999
mantel.partial(dist(t), dist(y), dist(g), permutations = 9999) #3-way Mantel holding group constant
##
## Partial Mantel statistic based on Pearson's product-moment correlation
##
## Call:
## mantel.partial(xdis = dist(t), ydis = dist(y), zdis = dist(g), permutations = 9999)
##
## Mantel statistic r: 0.001081
## Significance: 0.3725
##
## Upper quantiles of permutations (null model):
## 90% 95% 97.5% 99%
## 0.0286 0.0442 0.0619 0.0849
## Permutation: free
## Number of permutations: 9999
One may also assess the level of spatial autocorrelation in a variable; that is, the degree to which spatially proximate samples display similar values.
W<-tri2nb(g) #weights with Delauney tesselation
moran.test(y,nb2listw(W)) #positive autocorrelation
##
## Moran I test under randomisation
##
## data: y
## weights: nb2listw(W)
##
## Moran I statistic standard deviate = 7.3888, p-value = 7.409e-14
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.576083533 -0.020408163 0.006517219
One can also plot the degree of autocorrelation relative to distance as a semivariogram.
df <- data.frame(g,y)
coordinates(df) = ~g
res <- variogram(y~1,df)
plot(res, type = "b", main = "Variogram: y")
Now plot this against a Gaussian decay model:
VarMdl <- vgm(psill=2, model="Gau", nugget=0.1, range=1)
plot(res, model=VarMdl)
One may also be interested in determining the relationship between a set of dependent and independent variables while accounting for spatial non-independence. Here the linear model is of the form: Y~X|Geog, where Geog is modeled in the error covariance. As with phylogenetic comparative methods (week 11), this is accomplished using GLM (or more generally GLS).
Note that with spatial data, there are multiple ways to model the spatial covariance, based on how one expects spatial autocorrelation to decay with distance. We discussed numerous models in class: several are shown in the example below.
ols.fit <- lm(y~t, x=T) #spatial proximity not considered
summary(ols.fit)
##
## Call:
## lm(formula = y ~ t, x = T)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.0213 -1.4194 0.0082 1.8534 3.5484
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.8102 0.9527 1.900 0.0634 .
## t 0.7989 0.6026 1.326 0.1912
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.13 on 48 degrees of freedom
## Multiple R-squared: 0.03533, Adjusted R-squared: 0.01523
## F-statistic: 1.758 on 1 and 48 DF, p-value: 0.1912
anova(ols.fit)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## t 1 7.977 7.9771 1.7577 0.1912
## Residuals 48 217.842 4.5384
# GLS models with different spatial autocorrelation structure
# Warning! False convergences possible
t <- as.factor(t)
geo=data.frame(g,t,y)
library(nlme)
gls.fit.exp = gls(y~t, data=geo, correlation=corExp(form=~lat+long)) #exponential
gls.fit.gaus = gls(y~t, data=geo, correlation=corGaus(form=~lat+long)) #gaussian
gls.fit.spher = gls(y~t, data=geo, correlation=corSpher(form=~lat+long)) #spherical
gls.fit.lin = gls(y~t, data=geo, correlation=corLin(form=~lat+long)) #linear
# look at coeffcients: VERY DIFFERENT when spatial non-independence considered
ols.fit
##
## Call:
## lm(formula = y ~ t, x = T)
##
## Coefficients:
## (Intercept) t
## 1.8102 0.7989
gls.fit.exp
## Generalized least squares fit by REML
## Model: y ~ t
## Data: geo
## Log-restricted-likelihood: -107.6296
##
## Coefficients:
## (Intercept) t2
## 2.6090185 0.7988533
##
## Correlation Structure: Exponential spatial correlation
## Formula: ~lat + long
## Parameter estimate(s):
## range
## 0.0007017057
## Degrees of freedom: 50 total; 48 residual
## Residual standard error: 2.130347
gls.fit.gaus
## Generalized least squares fit by REML
## Model: y ~ t
## Data: geo
## Log-restricted-likelihood: -107.6296
##
## Coefficients:
## (Intercept) t2
## 2.6090185 0.7988533
##
## Correlation Structure: Gaussian spatial correlation
## Formula: ~lat + long
## Parameter estimate(s):
## range
## 0.0028644
## Degrees of freedom: 50 total; 48 residual
## Residual standard error: 2.130347
gls.fit.spher
## Generalized least squares fit by REML
## Model: y ~ t
## Data: geo
## Log-restricted-likelihood: -107.6296
##
## Coefficients:
## (Intercept) t2
## 2.6090185 0.7988533
##
## Correlation Structure: Spherical spatial correlation
## Formula: ~lat + long
## Parameter estimate(s):
## range
## 0.01218098
## Degrees of freedom: 50 total; 48 residual
## Residual standard error: 2.130347
gls.fit.lin
## Generalized least squares fit by REML
## Model: y ~ t
## Data: geo
## Log-restricted-likelihood: -107.6296
##
## Coefficients:
## (Intercept) t2
## 2.6090185 0.7988533
##
## Correlation Structure: Linear spatial correlation
## Formula: ~lat + long
## Parameter estimate(s):
## range
## 0.01217965
## Degrees of freedom: 50 total; 48 residual
## Residual standard error: 2.130347
# model comparisons
AIC(gls.fit.exp, gls.fit.gaus,gls.fit.lin)
## df AIC
## gls.fit.exp 4 223.2592
## gls.fit.gaus 4 223.2592
## gls.fit.lin 4 223.2592
#Careful when y is multivariate, see Model Selection lecture
## Exponential decay of spatial dependence is best model
# Anova
anova(gls.fit.exp)
## Denom. DF: 48
## numDF F-value p-value
## (Intercept) 1 99.71343 <.0001
## t 1 1.75769 0.1912
anova(gls.fit.gaus)
## Denom. DF: 48
## numDF F-value p-value
## (Intercept) 1 99.71343 <.0001
## t 1 1.75769 0.1912
anova(gls.fit.spher)
## Denom. DF: 48
## numDF F-value p-value
## (Intercept) 1 99.71343 <.0001
## t 1 1.75769 0.1912
anova(gls.fit.lin)
## Denom. DF: 48
## numDF F-value p-value
## (Intercept) 1 99.71343 <.0001
## t 1 1.75769 0.1912
If one has a spatial covariance matrix, this can also be used in RRPP. An advantage here is that accounting for spatial covariance works identically for both univariate and multivariate response variables. Here we show two examples. First, we estimate a simple spatial covariance matrix based on geographic distance:
# Spatial autocorrelation (from lat/long)
spatCov <- function(x){
x <- as.matrix(x)
if(ncol(x) != 2) stop("Need two columns for lat long!")
d <- dist(x)
P <- cmdscale(d, 2)
tcrossprod(P)/(ncol(P) - 1)
}
#example
spat.cov<-spatCov(g)
rdf <- rrpp.data.frame(g=g, y=y,t=t, spat.cov = spat.cov)
res <- lm.rrpp(y~t,Cov = spat.cov, data = rdf, print.progress = FALSE)
##
## Warning: singular covariance matrix. Proceed with caution
anova(res)
##
## Analysis of Variance, using Residual Randomization
## Permutation procedure: Randomization of null model residuals
## Number of permutations: 1000
## Estimation method: Generalized Least-Squares (via OLS projection)
## Sums of Squares and Cross-products: Type I
## Effect sizes (Z) based on F distributions
##
## Df SS MS Rsq F Z Pr(>F)
## t 1 1.6363e+13 1.6363e+13 0.78212 172.3 5.3947 0.001 **
## Residuals 48 4.5585e+12 9.4968e+10 0.21788
## Total 49 2.0922e+13
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call: lm.rrpp(f1 = y ~ t, data = rdf, Cov = spat.cov, print.progress = FALSE)
However, this approach does not incorporate the decay of spatial dependency with distance (see Cressie’s work on spatial statistics and signal decay). A better approach is to extract the spatial covariance matrix from an nlme model, and use this in RRPP:
#exponential matrix
my.exp <- corExp(1, form=~g) #g is lat/long from above
my.exp <- Initialize(my.exp,geo)
mycor <- corMatrix(my.exp)
anova(lm.rrpp(y~t,Cov = mycor, data = rdf, print.progress = FALSE))
##
## Analysis of Variance, using Residual Randomization
## Permutation procedure: Randomization of null model residuals
## Number of permutations: 1000
## Estimation method: Generalized Least-Squares (via OLS projection)
## Sums of Squares and Cross-products: Type I
## Effect sizes (Z) based on F distributions
##
## Df SS MS Rsq F Z Pr(>F)
## t 1 198.83 198.829 0.28322 18.966 2.8364 0.002 **
## Residuals 48 503.20 10.483 0.71678
## Total 49 702.03
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call: lm.rrpp(f1 = y ~ t, data = rdf, Cov = mycor, print.progress = FALSE)