Dean Adams, Iowa State University
Point Patterns
Mantel
Partial Mantel
GLS
Comparisons: Multiple methods - which is best?
Summary: Areas for future exploration
Biological organisms inhabit physical space
Several interesting questions relate to geography/space
How are my observations distributed?
Are my data associated with geography?
Are my data spatially autocorrelated?
Can I account for spatial autocorrelation in my analysis?
The distribution of objects in space can itself be of interest
Possible distributions: locations are random, they are more clumped than expected, they are more regular (dispersed) than expected
\(\small{1}^{st}\) order nearest neighbor index (index of aggregation: R)
\(\small{R<1}\): clumped
\(\small{R=1}\): random
\(\small{R>1}\): dispersed
Test for significance by converting to an effect size (\(\small{Z}\)-score)
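A minimal sketch of the nearest neighbor index, assuming the index meant here is the Clark and Evans form: observed mean nearest-neighbor distance divided by its expectation under randomness, \(\small{1/(2\sqrt{\lambda})}\), with standard error \(\small{0.26136/\sqrt{n\lambda}}\). No edge correction is applied, and the function name and toy grid are illustrative only.

```python
import math

def clark_evans(points, area):
    """Return (R, z) for a list of (x, y) points in a study region of the given area."""
    n = len(points)
    # Observed distance from each point to its nearest neighbor
    nn = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
          for i, p in enumerate(points)]
    r_obs = sum(nn) / n
    density = n / area
    r_exp = 1.0 / (2.0 * math.sqrt(density))   # expected mean distance under randomness
    se = 0.26136 / math.sqrt(n * density)      # standard error of the expected distance
    return r_obs / r_exp, (r_obs - r_exp) / se

# Toy pattern: a perfectly regular 5 x 5 grid in a 5 x 5 region
grid = [(i + 0.5, j + 0.5) for i in range(5) for j in range(5)]
R, z = clark_evans(grid, area=25.0)
print(R, z)
```

For this regular grid every nearest neighbor is one unit away and the random expectation is 0.5, so R = 2 (dispersed).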
\[\small{K=\frac{E(pts)}{\lambda}}\]
where \(\small{E(pts)}\) is the expected number of points in some area (e.g., within a given distance of a focal point), and \(\small\lambda\) is the density of points
\(\small{K}\) describes the number of points in some area relative to expectation
May be used to test hypotheses of clustering or regular distribution (assessed via stochastic simulation)
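A rough sketch of this idea, assuming a naive estimator of \(\small{K}\) at distance \(\small{d}\) (no edge correction, a common \(\small{n(n-1)}\) normalization) and a simulation envelope built from uniformly random patterns. The function names, the square study region, and the toy data are all assumptions for illustration.

```python
import math
import random

def ripley_k(points, area, d):
    """Naive K(d): mean count of other points within d of a point, divided by density."""
    n = len(points)
    close = sum(1 for i, p in enumerate(points)
                for j, q in enumerate(points)
                if i != j and math.dist(p, q) <= d)
    return area * close / (n * (n - 1))

def csr_envelope(n, side, d, n_sim=199, seed=1):
    """Min and max K(d) over n_sim uniformly random patterns in a side x side square."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sim):
        pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
        sims.append(ripley_k(pts, side * side, d))
    return min(sims), max(sims)

# Toy clumped pattern: two tight clusters in a 10 x 10 square
clumped = ([(2 + 0.1 * i, 2 + 0.1 * j) for i in range(3) for j in range(3)] +
           [(8 + 0.1 * i, 8 + 0.1 * j) for i in range(3) for j in range(3)])
k_obs = ripley_k(clumped, area=100.0, d=1.0)
lo, hi = csr_envelope(n=len(clumped), side=10.0, d=1.0)
print(k_obs, lo, hi)
```

An observed value above the simulated envelope suggests clustering at that distance; a value below it suggests a regular pattern.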
Quadrat method:
Break area into \(\small{n}\) quadrats and count \(\small{X}\) objects in each
Calculate Index of Dispersion: \(\small{ID=\frac{\sigma^2_X}{\overline{X}}}\)
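As a small sketch, the index of dispersion can be computed directly from quadrat counts. The counts below are made up for illustration; under a random (Poisson) pattern the variance equals the mean, so values near 1 suggest randomness, values below 1 regularity, and values above 1 clumping.

```python
from statistics import mean, variance

def index_of_dispersion(counts):
    """Variance-to-mean ratio of quadrat counts (sample variance, n - 1 denominator)."""
    return variance(counts) / mean(counts)

even = [2, 2, 2, 2, 2, 2]      # identical counts in every quadrat: regular pattern
patchy = [0, 6, 0, 0, 6, 0]    # most quadrats empty, a few crowded: clumped pattern
print(index_of_dispersion(even))
print(index_of_dispersion(patchy))
```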
Contiguous quadrat method
Break area into successively smaller quadrats nested one within the other, such that each quadrat is half the size of its ‘parent’
Count objects in each quadrat
For each quadrat size (r) calculate: \(\small{G_r=2T_r-T_{2r}}\), where \(\small{T_r}\) is the sum of squares (SS) for the \(\small{r}^{th}\) quadrat size
Test each quadrat size using \(\small\frac{G_r}{G_1}\), tested against \(\small{F_{N/2r,N/2}}\)
Use distances among objects to determine whether they are farther apart, or closer together than expected by chance (randomness)
Plant-Plant Distance (W): Calculate distance between objects and their \(\small{1}^{st}\), \(\small{2}^{nd}\), \(\small{3}^{rd}\) (etc.) nearest neighbors
Point-Plant Distance (X): Calculate distance between randomly chosen locations and their \(\small{1}^{st}\), \(\small{2}^{nd}\), \(\small{3}^{rd}\) (etc.) nearest neighbor objects
Point-Plant-Plant Distance (Y): choose random location, find the nearest object to it, and calculate the distance from that object to its \(\small{1}^{st}\), \(\small{2}^{nd}\), \(\small{3}^{rd}\) (etc.) nearest neighbors
Determining whether the distribution of objects differs from random (Poisson) amounts to asking how probable it is that another object lies within an area of radius W (or X, Y)
MANY different tests for randomness have been proposed based on these measurements
One can also compare observed data to randomly generated point patterns
Here we see data (\(\small\mathbf{Y}\)) that are influenced by spatial location. Such patterns exhibit spatial autocorrelation.
Here are some questions one may consider:
How are my observations distributed in space? (point patterns: discussed previously)
Are the data (\(\small\mathbf{Y}\)) associated with geography? Or more precisely, are changes found among subjects associated with changes in geography?
Are the data (\(\small\mathbf{Y}\)) spatially autocorrelated? (NOTE: not the same as associated: a more precise meaning)
If my data (\(\small\mathbf{Y}\)) have spatial autocorrelation, how can I assess ecological hypotheses while taking this into consideration (i.e., without geography becoming a confounding variable)?
Of the various methods to address the previous concerns, which are suited for what kind of circumstances?
Let us again examine this pattern. Here we see that species diversity and geography seem associated.
One could assess this with a Mantel test, using distances between subjects for geography \(\small\mathbf{X}\) and species diversity \(\small\mathbf{Y}\)
Obtain: \(\small{z}_M= \sum{\mathbf{X}_{i}\mathbf{Y}_{i}}\) where \(\small\mathbf{X}\) & \(\small\mathbf{Y}\) are the unfolded distance matrices (each standardized to mean 0 and unit variance)
Estimate the Mantel correlation coefficient: \(\small{r}_M = \frac{z_M}{[n(n-1)/2]-1}\), where \(\small{n}\) is the number of subjects
Assess significance of \(\small{r_M}\) via permutation, where the rows and columns of one distance matrix are permuted together.
##
## Mantel statistic based on Pearson's product-moment correlation
##
## Call:
## mantel(xdis = dist(g), ydis = dist(y), permutations = 999)
##
## Mantel statistic r: 0.4739
## Significance: 0.001
##
## Upper quantiles of permutations (null model):
## 90% 95% 97.5% 99%
## 0.0624 0.0869 0.1008 0.1247
## Permutation: free
## Number of permutations: 999
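The Mantel procedure (unfold and standardize the distances, sum the cross-products, permute rows and columns together) can be sketched as follows. This is an illustrative reimplementation under those assumptions, not the routine that produced the output above; all function names and the toy data are hypothetical.

```python
import math
import random

def unfold(dmat):
    """Lower triangle of a square distance matrix as a flat list."""
    return [dmat[i][j] for i in range(len(dmat)) for j in range(i)]

def standardize(v):
    """Center to mean 0 and scale to unit sample variance."""
    m = sum(v) / len(v)
    s = math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))
    return [(a - m) / s for a in v]

def mantel_r(xd, yd):
    """r_M = z_M / (d - 1), where d = n(n-1)/2 is the number of pairwise distances."""
    x, y = standardize(unfold(xd)), standardize(unfold(yd))
    return sum(a * b for a, b in zip(x, y)) / (len(x) - 1)

def mantel_test(xd, yd, permutations=999, seed=1):
    rng = random.Random(seed)
    r_obs = mantel_r(xd, yd)
    n = len(yd)
    hits = 1  # the observed value counts as one member of the null distribution
    for _ in range(permutations):
        p = list(range(n))
        rng.shuffle(p)
        # Apply the same permutation to rows and columns of one matrix
        yp = [[yd[p[i]][p[j]] for j in range(n)] for i in range(n)]
        if mantel_r(xd, yp) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)

def dist_matrix(coords):
    return [[math.dist(a, b) for b in coords] for a in coords]

# Toy data: eight sites and a variable that tracks location
sites = [(0, 0), (1, 0), (0, 2), (3, 1), (4, 4), (2, 5), (6, 3), (5, 6)]
geo = dist_matrix(sites)
trait = dist_matrix([(x + 0.5 * y,) for x, y in sites])
r, p = mantel_test(geo, trait)
print(r, p)
```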
If \(\small\mathbf{X}\) and \(\small\mathbf{Y}\) both covary with geography \(\small\mathbf{Z}\), one can consider partialing out the effect of geography
Conceptually: \(\small\mathbf{Y}\sim\mathbf{X}|\mathbf{Z}\)
Assesses relationship of \(\small\mathbf{Y}\) and \(\small\mathbf{X}\) holding \(\small\mathbf{Z}\) constant
##
## Partial Mantel statistic based on Pearson's product-moment correlation
##
## Call:
## mantel.partial(xdis = dist(t), ydis = dist(y), zdis = dist(g), permutations = 999)
##
## Mantel statistic r: 0.02606
## Significance: 0.111
##
## Upper quantiles of permutations (null model):
## 90% 95% 97.5% 99%
## 0.0273 0.0436 0.0532 0.0700
## Permutation: free
## Number of permutations: 999
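One common way to hold \(\small\mathbf{Z}\) constant is the first-order partial correlation computed on the unfolded distance vectors. The sketch below shows only that statistic (significance would still come from permutation); whether it matches the exact computation behind the output above is an assumption, and the vectors are hypothetical.

```python
import math

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical unfolded distance vectors (one entry per pair of subjects)
x_dist = [1.0, 2.0, 3.0, 4.0]
y_dist = [2.0, 1.0, 4.0, 3.0]
z_dist = [2.0, 0.0, 0.0, 2.0]   # uncorrelated with both x_dist and y_dist
print(partial_r(x_dist, y_dist, z_dist))
```

Because `z_dist` is uncorrelated with the other two vectors in this toy case, the partial correlation equals the plain correlation between `x_dist` and `y_dist`.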
Mantel tests seem appropriate for all manner of spatial hypothesis testing. But remember:
They suffer from inflated type I error, low power, and can have significant bias, particularly with autocorrelated error
Other approaches should be used for these sorts of hypotheses
What can produce spatial autocorrelation (SAC)?
In general statistical terms, we can consider the problem as a linear model:
\[\small\mathbf{Y}=\mathbf{X}\hat{\beta}+\epsilon\]
Here, \(\small\epsilon\) is not iid: the independence assumption is not met (there is SAC). Thus, we should model the error as something like \(\small\epsilon\sim\mathcal{N}(0,\mathbf{\Sigma})\), where \(\small\mathbf{\Sigma}\) embodies the expected spatial covariation between subjects.
This leads to two steps:
Spatial Autocorrelation: lack of independence of values based on spatial properties (self-correlation)
Can test whether observed values at one locality depend (at least in part) on those at neighboring localities.
Knowing the spatial autocorrelation ‘structure’ informs one how to model ecological factors
Procedure:
For categorical data (A,B)
For continuous data, one still measures \(\small\sum{(weights \times{data})}\)
Moran’s I: \(\small{I}=\frac{n\sum\sum{w_{ij}(y_i-\bar{y})(y_j-\bar{y})}}{W\sum(y_i-\bar{y})^2}\)
Geary’s c: \(\small{c}=\frac{(n-1)\sum\sum{w_{ij}(y_i-y_j)^2}}{2W\sum(y_i-\bar{y})^2}\)
where \(\small\bar{y}\) is the mean of the data, \(\small{w_{ij}}\) is the weight for the pair of localities \(\small{i,j}\), and \(\small{W}\) is the sum of all weights
For both, significance based on normal approximation (see Legendre et al. 2012): \(\small{z}_I=\frac{I_{obs}-E(I)}{\sigma_I}\) where \(\small{E(I)}= -\frac{1}{n-1}\). Also, \(\small{E(c)=1}\)
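A small sketch of both statistics, assuming deviations are taken from the mean of the data and that the weights are supplied as a square matrix with zeros on the diagonal. The transect data and neighbor weights are made up for illustration.

```python
def morans_i(y, w):
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    W = sum(sum(row) for row in w)            # sum of all weights
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return n * num / (W * sum(d * d for d in dev))

def gearys_c(y, w):
    n = len(y)
    ybar = sum(y) / n
    W = sum(sum(row) for row in w)
    num = sum(w[i][j] * (y[i] - y[j]) ** 2 for i in range(n) for j in range(n))
    return (n - 1) * num / (2 * W * sum((v - ybar) ** 2 for v in y))

# Four sites along a transect; adjacent sites are neighbors (weight 1)
y = [1.0, 2.0, 3.0, 4.0]
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i(y, w), gearys_c(y, w))
print(-1 / (len(y) - 1))   # E(I) when there is no autocorrelation
```

Here neighboring sites have similar values, so Moran's I (1/3) lies above its expectation and Geary's c (0.3) lies below 1, both indicating positive autocorrelation.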
The semivariogram for the simulated data, with a fitted Gaussian model
Sokal and Thompson (1987) examined distribution and attributes of Aralia nudicaulis (an understory plant), including: fecundity, density, canopy cover, and percent female
Found significant spatial autocorrelation for most variables (not fecundity), none of which displayed the same pattern
Thus, the variables exhibited autocorrelation, but were not spatially associated with one another
\[\small\mathbf{Y}=\mathbf{X}\hat{\beta}+\epsilon\]
Here, \(\small\epsilon\) is not iid: the independence assumption is not met (there is SAC). Thus, we should model the error as something like \(\small\epsilon\sim\mathcal{N}(0,\mathbf{\Sigma})\), where \(\small\mathbf{\Sigma}\) embodies the expected spatial covariation between subjects.
Thus, given an estimate of \(\small\mathbf{\Sigma}\), we can fit the model as: \(\small\hat{\mathbf{\beta }}=\left ( \mathbf{X}^{T} \mathbf{\Sigma}^{-1} \mathbf{X}\right )^{-1}\left ( \mathbf{X}^{T} \mathbf{\Sigma}^{-1}\mathbf{Y}\right )\)
The question is: what is \(\small\mathbf{\Sigma}\)?
It turns out there are various ways to parameterize \(\small\mathbf{\Sigma}\), but it is always an \(\small{n\times{n}}\) matrix with values estimating the expected covariance between subjects as based on the geographic distance between them.
In general, \(\small\mathbf{\Sigma}\) describes the expected covariance among locations, with coefficients that depend on distance (i.e., the decay of expected similarity as a function of distance)
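Taking some \(\small\mathbf{\Sigma}\) as given, the GLS estimator \(\small\hat{\beta}=(\mathbf{X}^{T}\mathbf{\Sigma}^{-1}\mathbf{X})^{-1}(\mathbf{X}^{T}\mathbf{\Sigma}^{-1}\mathbf{Y})\) can be sketched in plain Python. The small linear-algebra helpers, the toy data, and the particular \(\small\mathbf{\Sigma}\) used here are assumptions for illustration; in practice one would rely on a statistics package.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting (b is a matrix)."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]   # augmented copy [a | b]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        m[c] = [v / m[c][c] for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def gls(X, y, sigma):
    """beta_hat = (X' Sigma^-1 X)^-1 (X' Sigma^-1 y)."""
    Y = [[v] for v in y]
    si_X = solve(sigma, X)      # Sigma^-1 X
    si_Y = solve(sigma, Y)      # Sigma^-1 y
    Xt = transpose(X)
    beta = solve(matmul(Xt, si_X), matmul(Xt, si_Y))
    return [b[0] for b in beta]

# Toy data: intercept plus one predictor, four sites
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.2, 4.9, 7.1]
# Hypothetical Sigma: covariance 0.5 ** |i - j| between sites i and j
sigma = [[0.5 ** abs(i - j) for j in range(4)] for i in range(4)]
print(gls(X, y, sigma))
```

With \(\small\mathbf{\Sigma}\) set to the identity matrix, the same function returns the ordinary least-squares estimates.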
MANY models have been proposed to describe spatial non-independence. Here are a few:
\[\small{Exponential}= \sigma^2e^{-r/d}\]
\[\small{Gaussian}= \sigma^2e^{-(r/d)^2}\]
\[\small{Spherical}= \sigma^2\left(1-\frac{2}{\pi}\left(\frac{r}{d}\sqrt{1-\frac{r^2}{d^2}}+\sin^{-1}\frac{r}{d}\right)\right)\]
where \(\small{r}\) is the distance between a pair of subjects, and \(\small{d}\) is a range parameter governing the distance over which the expected covariance (correlation) decays.
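A sketch of how such models turn site coordinates into \(\small\mathbf{\Sigma}\), taking \(\small{r}\) as the distance between two subjects, \(\small{d}\) as a range parameter, and each model as a function that decays with distance (so the Gaussian exponent is \(\small{-(r/d)^2}\)). Setting the spherical form to zero beyond \(\small{r=d}\) is an added assumption, and the function names and coordinates are hypothetical.

```python
import math

def cov_exponential(r, d, sigma2=1.0):
    return sigma2 * math.exp(-r / d)

def cov_gaussian(r, d, sigma2=1.0):
    return sigma2 * math.exp(-(r / d) ** 2)

def cov_spherical(r, d, sigma2=1.0):
    if r >= d:
        return 0.0   # assumption: no covariance beyond the range d
    u = r / d
    return sigma2 * (1 - (2 / math.pi) * (u * math.sqrt(1 - u * u) + math.asin(u)))

def build_sigma(coords, cov, d, sigma2=1.0):
    """n x n matrix of expected covariances from pairwise geographic distances."""
    return [[cov(math.dist(a, b), d, sigma2) for b in coords] for a in coords]

sites = [(0, 0), (1, 0), (0, 2), (3, 1)]
sigma = build_sigma(sites, cov_exponential, d=2.0)
for row in sigma:
    print([round(v, 3) for v in row])
```

Each model gives \(\small\sigma^2\) on the diagonal (distance zero) and smaller values for more distant pairs.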
Results imply that the GLS approach (i.e., a spatially-weighted model) performs quite well!
Spatial context is hugely important to consider in biology
How are my observations distributed?
Are my data associated with geography?
Are my data spatially autocorrelated?
Can I account for spatial autocorrelation in my analysis?