11 Correlation structures
11.1 General Idea
Except for the random effects, we have so far assumed that residual errors are independent. However, this need not always be the case - we may find that residuals show autocorrelation.
Correlation means that the values of one variable are systematically related to the values of another. Autocorrelation means that data points of one variable that are close to each other have similar values. This implies that autocorrelation is only defined if there is a "distance relationship" between observations.
Autocorrelation can occur whenever there is a distance relationship between observations. Common examples of such distance relationships include:
- Random effects (distance = group membership).
- Spatial distance.
- Temporal distance.
- Phylogenetic distance.
Here is a visualization from Roberts et al., 2016 (reproduced as OA, copyright: the authors).
11.1.1 Models to deal with autocorrelation
If we find autocorrelation in the residuals of our model, it can have several reasons, which we can address with different model structures.
In the context of regression models, we are never interested in the autocorrelation of the response / predictors per se, but only in the autocorrelation of the residuals. Thus, it doesn't make sense to conclude that you need a spatial model just because the response shows a spatially autocorrelated signal.
Autocorrelation can occur because we have a correlated misfit, i.e. there is a trend in the given space (e.g. time, space, phylogeny). If this is the case, de-trending the model (with a linear regression term or a spline) will remove the residual autocorrelation. We should therefore always de-trend first, before we consider moving to a model with a residual correlation structure.
Only after accounting for the trend should we test whether there is residual spatial / temporal / phylogenetic autocorrelation. If that is the case, we would usually use a so-called conditional autoregressive (CAR) structure. In these models, we make parametric assumptions about how the correlation between data points falls off with distance. When we speak about spatial / temporal / phylogenetic regressions, we usually refer to these or similar models.
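For intuition, a minimal sketch (not from the text): a common parametric choice, used by corExp later in this chapter, is exponential decay, where the correlation between two residuals at distance d is exp(-d / range). With a hypothetical range of 10 distance units:
# correlation between residuals as a function of their distance
curve(exp(-x / 10), from = 0, to = 50,
      xlab = "distance between observations", ylab = "correlation")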
11.1.2 R implementation
To de-trend, you can just use standard regression terms or splines on time or space. For the rest of this chapter, we will concentrate on how to specify “real” correlation structures. However, in the case studies, you should always de-trend first.
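As a minimal sketch of why this matters (simulated data, all names hypothetical): a pure trend with independent noise produces residual autocorrelation in an intercept-only model, which disappears once the trend is included.
# data with a linear time trend and independent noise
set.seed(42)
t_obs = 1:200
y_obs = 0.05 * t_obs + rnorm(200)
acf(residuals(lm(y_obs ~ 1)))     # trend masquerades as autocorrelation
acf(residuals(lm(y_obs ~ t_obs))) # after de-trending, the ACF is flat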
To account for "real" autocorrelation of residuals, similar to the variance modelling, we can add correlation structures:
- for normal responses in nlme::gls, see https://stat.ethz.ch/R-manual/R-devel/library/nlme/html/corClasses.html
- for GLMs using glmmTMB, see https://cran.r-project.org/web/packages/glmmTMB/vignettes/covstruct.html
The following pages provide examples and further comments on how to do this.
Especially for spatial models, both nlme and glmmTMB are relatively slow. There are a large number of specialized packages that deal in particular with spatial models, including MASS::glmmPQL, brms, INLA, spaMM, and many more. To keep things simple and concentrate on the principles, however, we will stick with the packages you already know.
11.2 Temporal Correlation Structures
To introduce temporal autoregressive models, let's simulate some data first. The simplest (and most common) temporal structure is the AR1 model, i.e. an autoregressive model with lag 1. The AR1 assumes that the next data point (or residual) is a weighted mean of the last data point and a normally distributed random deviate \(\epsilon_t\):
\[ x_{t+1} = a \cdot x_t + (1-a) \cdot \epsilon_t \]
Let’s simulate some data according to this model
# simulate temporally autocorrelated data
AR1sim <- function(n, a){
  x = rep(NA, n)
  x[1] = 0
  for(i in 2:n){
    x[i] = a * x[i-1] + (1-a) * rnorm(1)
  }
  return(x)
}
set.seed(123)
obs = AR1sim(1000, 0.9)
plot(obs)
As we can see, we have a temporal correlation here. As we have not modeled / specified any further predictors, the correlation in the signal will translate directly into correlation of the residuals if we fit a model:
fit = lm(obs ~ 1)
summary(fit)
Call:
lm(formula = obs ~ 1)
Residuals:
Min 1Q Median 3Q Max
-0.58485 -0.14315 0.00559 0.13729 0.66317
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.017019 0.006745 2.523 0.0118 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2133 on 999 degrees of freedom
Note that the estimate of the intercept is significant, although we started the simulation at zero. This is a consequence of the unmodeled autocorrelation: with correlated observations, the effective sample size is much smaller than n, so the standard error of the naive lm is too optimistic. Let's look at the residuals, which have the same autocorrelation as the data.
plot(residuals(fit))
We can quantify the autocorrelation with the acf function, which estimates the correlation between observations as a function of their lag (temporal distance). Note that although we modeled only a lag of 1, we will get correlations at many lags, because the correlation effect "trickles down".
acf(residuals(fit))
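As a plausibility check (not part of the original analysis): for an AR1 process with coefficient a, the theoretical autocorrelation at lag k is a^k, which we can overlay on the empirical ACF:
# overlay the theoretical AR1 autocorrelation 0.9^k in red
lines(0:30, 0.9^(0:30), col = "red", lwd = 2)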
To check what the actual underlying model is, it may be useful to plot the partial autocorrelation function, which is estimated by fitting autoregressive models of successively higher orders and checking their residuals.
pacf(residuals(fit))
Here, we see that we actually only have a correlation at lag 1. You can also check for temporal autocorrelation with the DHARMa package:
library(DHARMa)
This is DHARMa 0.4.7. For overview type '?DHARMa'. For recent changes, type news(package = 'DHARMa')
testTemporalAutocorrelation(fit, time = 1:1000)
Durbin-Watson test
data: simulationOutput$scaledResiduals ~ 1
DW = 0.25973, p-value < 2.2e-16
alternative hypothesis: true autocorrelation is not 0
Remember: in general, for spatial / temporal data, there are two processes that can create residual autocorrelation:
- There is a misfit trend in time / space, which creates correlation in the residuals.
- There truly is a residual correlation, remaining after accounting for the trend.
Unfortunately, the distinction between a larger trend and a correlation is quite fluid. Nevertheless, one should always first check for and remove the trend, typically by including time/space as a predictor, potentially in a flexible way (GAMs come in handy). After this is done, we can fit a model with a temporally/spatially correlated error.
Let’s see how we can fit the AR1 model to data. First, with nlme
library(nlme)
fitGLS = gls(obs ~ 1, corr = corAR1(0.771, form = ~ 1))
summary(fitGLS)
Generalized least squares fit by REML
Model: obs ~ 1
Data: NULL
AIC BIC logLik
-1773.104 -1758.384 889.552
Correlation Structure: AR(1)
Formula: ~1
Parameter estimate(s):
Phi
0.8865404
Coefficients:
Value Std.Error t-value p-value
(Intercept) 0.01620833 0.02741235 0.5912784 0.5545
Standardized residuals:
Min Q1 Med Q3 Max
-2.72610052 -0.66439347 0.02987808 0.64460052 3.09922846
Residual standard error: 0.2142402
Degrees of freedom: 1000 total; 999 residual
Second, with glmmTMB
library(glmmTMB)
time <- factor(1:1000)        # time variable
group = factor(rep(1, 1000))  # group (for multiple time series)

fitGLMMTMB = glmmTMB(obs ~ ar1(time + 0 | group))
Warning in glmmTMB(obs ~ ar1(time + 0 | group)): use of the 'data' argument is
recommended
summary(fitGLMMTMB)
Family: gaussian ( identity )
Formula: obs ~ ar1(time + 0 | group)
AIC BIC logLik deviance df.resid
-1776.7 -1757.1 892.4 -1784.7 996
Random effects:
Conditional model:
Groups Name Variance Std.Dev. Corr
group time1 0.0449381 0.21199 0.89 (ar1)
Residual 0.0002047 0.01431
Number of obs: 1000, groups: group, 1
Dispersion estimate for gaussian family (sigma^2): 0.000205
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.01619 0.02740 0.591 0.555
If you check the results, you can see that
- Both models correctly estimate the AR1 parameter.
- The intercept is non-significant in both models, as expected.
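If you want to extract the estimated AR1 parameters programmatically rather than reading them off the summaries, a quick sketch using standard accessors:
# nlme: Phi of the fitted correlation structure
coef(fitGLS$modelStruct$corStruct, unconstrained = FALSE)
# glmmTMB: the ar1 correlation is reported as part of the random-effect structure
VarCorr(fitGLMMTMB)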
As mentioned earlier, first de-trend, and then add a correlation structure if there is still autocorrelation. After both steps, we should no longer see any pattern in the conditional residuals. Unfortunately, checking the conditional residuals is a bit complicated, because glmmTMB does not support conditional simulations; lme4 supports them, but does not support correlation structures. However, there is a workaround.
Let’s start with a small simulation with a time trend and autocorrelation:
time = 1:1000/100
y = time + 2*(sin(time/0.4)) + rnorm(1000)
data = data.frame(y = y, time = time, timeF = as.factor(1:1000), group = as.factor(1))
plot(y ~ time)
- Detrend
fit1 = glmmTMB(y ~ time, data = data)
res = simulateResiduals(fit1, plot = TRUE)
testTemporalAutocorrelation(res, time = data$time)
Durbin-Watson test
data: simulationOutput$scaledResiduals ~ 1
DW = 0.66181, p-value < 2.2e-16
alternative hypothesis: true autocorrelation is not 0
The test for temporal autocorrelation is significant -> add an autoregressive structure.
- Add autoregressive structure
fit2 = glmmTMB(y ~ time + ar1(0 + timeF | group), data = data)
res = simulateResiduals(fit2, plot = TRUE)
The residual plot did not change because glmmTMB:::simulate.glmmTMB
does not generate conditional predictions. But we can generate them ourselves:
- Create conditional predictions and simulations
We can create a custom DHARMa object with our own simulations:
pred = predict(fit2, re.form = NULL)
simulations = sapply(1:250, function(i) rnorm(1000, pred, summary(fit2)$sigma))
res = createDHARMa(simulations, data$y, pred)
plot(res)
Voila, the residuals look good now!
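As a final confirmation (output not shown here), we could re-run the autocorrelation test on the conditional residuals:
# the Durbin-Watson test should now come out non-significant
testTemporalAutocorrelation(res, time = data$time)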
11.2.1 Exercise - hurricanes revisited?
11.3 Spatial Correlation Structures
Spatial models work very similarly to temporal models. This time, we start directly with an example, using a data set on the thickness of coal seams, which we try to predict with a spatial (soil) predictor.
library(EcoData)
plot(thick ~ soil, data = thickness)
Let’s fit a simple LM to this
fit = lm(thick ~ soil, data = thickness)
summary(fit)
Call:
lm(formula = thick ~ soil, data = thickness)
Residuals:
Min 1Q Median 3Q Max
-6.0414 -1.1975 0.0876 1.4836 4.9584
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 31.9420 3.1570 10.118 1.54e-15 ***
soil 2.2552 0.8656 2.605 0.0111 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.278 on 73 degrees of freedom
Multiple R-squared: 0.08508, Adjusted R-squared: 0.07254
F-statistic: 6.788 on 1 and 73 DF, p-value: 0.01111
DHARMa checks:
res = simulateResiduals(fit)
testSpatialAutocorrelation(res, x = thickness$north, y = thickness$east)
DHARMa Moran's I test for distance-based autocorrelation
data: res
observed = 0.210870, expected = -0.013514, sd = 0.021940, p-value <
2.2e-16
alternative hypothesis: Distance-based autocorrelation
For spatial data, we often look at spatial variograms, which are similar to an acf but in spatial directions
library(gstat)
tann.dir.vgm = variogram(residuals(fit) ~ 1,
                         loc = ~ east + north, data = thickness,
                         alpha = c(0, 45, 90, 135))
plot(tann.dir.vgm)
Both the DHARMa plots and the variograms are more indicative of a spatial trend. Let's remove this with a 2d-spline, a so-called tensor product spline:
library(mgcv)
fit1 = gam(thick ~ soil + te(east, north), data = thickness)
summary(fit1)
Family: gaussian
Link function: identity
Formula:
thick ~ soil + te(east, north)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 39.68933 0.26498 149.780 <2e-16 ***
soil 0.12363 0.07275 1.699 0.0952 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
te(east,north) 21.09 22.77 721.3 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.996 Deviance explained = 99.7%
GCV = 0.033201 Scale est. = 0.022981 n = 75
plot(fit1, pages = 0, lwd = 2)
We can check the model again, and the problem is gone
res = simulateResiduals(fit1)
testSpatialAutocorrelation(res, x = thickness$north, y = thickness$east)
DHARMa Moran's I test for distance-based autocorrelation
data: res
observed = -0.024242, expected = -0.013514, sd = 0.021860, p-value =
0.6236
alternative hypothesis: Distance-based autocorrelation
Almost the same, but simpler:
fit = lm(thick ~ soil + north + I(north^2), data = thickness)
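Whether this simpler polynomial de-trending is sufficient can be checked in the same way as above (output not shown):
# re-check residual spatial autocorrelation for the polynomial model
res = simulateResiduals(fit)
testSpatialAutocorrelation(res, x = thickness$north, y = thickness$east)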
If we had still seen a signal, we would have fit an autoregressive model. Here it's not necessary, but just to show you the syntax - first nlme:
fit2 = gls(thick ~ soil, correlation = corExp(form = ~ east + north), data = thickness)
summary(fit2)
Generalized least squares fit by REML
Model: thick ~ soil
Data: thickness
AIC BIC logLik
164.3474 173.5092 -78.17368
Correlation Structure: Exponential spatial correlation
Formula: ~east + north
Parameter estimate(s):
range
719.4122
Coefficients:
Value Std.Error t-value p-value
(Intercept) 42.81488 5.314542 8.056176 0.0000
soil 0.02662 0.199737 0.133289 0.8943
Correlation:
(Intr)
soil -0.12
Standardized residuals:
Min Q1 Med Q3 Max
-1.5811122 -0.7276873 -0.5028102 -0.2092991 0.3217326
Residual standard error: 5.573088
Degrees of freedom: 75 total; 73 residual
Second, for glmmTMB. Here, we again have to prepare the data first
thickness$pos <- numFactor(thickness$east, thickness$north)
thickness$group <- factor(rep(1, nrow(thickness)))

fit3 = glmmTMB(thick ~ soil + exp(pos + 0 | group), data = thickness)
The output of summary is a bit chunky, which is why I suppress it here
summary(fit3)
If you wonder why such a large correlation matrix is displayed: both the ar1 and the exp(pos + 0 | group) structures impose a particular correlation structure on the random effects. By default, glmmTMB shows correlations of random effects if they are estimated. For the AR1 structure, the developers apparently suppressed this printout and just show the estimate of the AR1 parameter. For the spatial structure, however, this feature is not implemented, so you see the entire correlation structure, which is, admittedly, less helpful and should be changed.
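If you only want the fixed-effect estimates without the long random-effect printout, one option is:
# extract only the fixed effects from the glmmTMB fit
fixef(fit3)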
11.3.1 Exercise - does agriculture influence plant species richness?
11.4 Phylogenetic Structures (PGLS)
This is mostly taken from https://lukejharmon.github.io/ilhabela/instruction/2015/07/03/PGLS/. The two datasets associated with this example are in the EcoData package.
Perform analysis:
library(EcoData)
library(ape)
library(geiger)
library(nlme)
library(phytools)
library(DHARMa)
To plot the phylogenetic tree, use
plot(anolisTree)
Regress species traits
# Check whether names are matching in both files.
name.check(anolisTree, anolisData)
$tree_not_data
[1] "ahli" "alayoni" "alfaroi" "aliniger"
[5] "allisoni" "allogus" "altitudinalis" "alumina"
[9] "alutaceus" "angusticeps" "argenteolus" "argillaceus"
[13] "armouri" "bahorucoensis" "baleatus" "baracoae"
[17] "barahonae" "barbatus" "barbouri" "bartschi"
[21] "bremeri" "breslini" "brevirostris" "caudalis"
[25] "centralis" "chamaeleonides" "chlorocyanus" "christophei"
[29] "clivicola" "coelestinus" "confusus" "cooki"
[33] "cristatellus" "cupeyalensis" "cuvieri" "cyanopleurus"
[37] "cybotes" "darlingtoni" "distichus" "dolichocephalus"
[41] "equestris" "etheridgei" "eugenegrahami" "evermanni"
[45] "fowleri" "garmani" "grahami" "guafe"
[49] "guamuhaya" "guazuma" "gundlachi" "haetianus"
[53] "hendersoni" "homolechis" "imias" "inexpectatus"
[57] "insolitus" "isolepis" "jubar" "krugi"
[61] "lineatopus" "longitibialis" "loysiana" "lucius"
[65] "luteogularis" "macilentus" "marcanoi" "marron"
[69] "mestrei" "monticola" "noblei" "occultus"
[73] "olssoni" "opalinus" "ophiolepis" "oporinus"
[77] "paternus" "placidus" "poncensis" "porcatus"
[81] "porcus" "pulchellus" "pumilis" "quadriocellifer"
[85] "reconditus" "ricordii" "rubribarbus" "sagrei"
[89] "semilineatus" "sheplani" "shrevei" "singularis"
[93] "smallwoodi" "strahmi" "stratulus" "valencienni"
[97] "vanidicus" "vermiculatus" "websteri" "whitemani"
$data_not_tree
[1] "1" "10" "100" "11" "12" "13" "14" "15" "16" "17" "18" "19"
[13] "2" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "3"
[25] "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "4" "40"
[37] "41" "42" "43" "44" "45" "46" "47" "48" "49" "5" "50" "51"
[49] "52" "53" "54" "55" "56" "57" "58" "59" "6" "60" "61" "62"
[61] "63" "64" "65" "66" "67" "68" "69" "7" "70" "71" "72" "73"
[73] "74" "75" "76" "77" "78" "79" "8" "80" "81" "82" "83" "84"
[85] "85" "86" "87" "88" "89" "9" "90" "91" "92" "93" "94" "95"
[97] "96" "97" "98" "99"
# Plot traits.
plot(anolisData[, c("awesomeness", "hostility")])
plot(hostility ~ awesomeness, data = anolisData)
fit = lm(hostility ~ awesomeness, data = anolisData)
summary(fit)
Call:
lm(formula = hostility ~ awesomeness, data = anolisData)
Residuals:
Min 1Q Median 3Q Max
-0.7035 -0.3065 -0.0416 0.2440 0.7884
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.10843 0.03953 2.743 0.00724 **
awesomeness -0.88116 0.03658 -24.091 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3807 on 98 degrees of freedom
Multiple R-squared: 0.8555, Adjusted R-squared: 0.8541
F-statistic: 580.4 on 1 and 98 DF, p-value: < 2.2e-16
abline(fit)
Check for phylogenetic signal in residuals.
# Calculate weight matrix for phylogenetic distance.
w = 1/cophenetic(anolisTree)
diag(w) = 0
Moran.I(residuals(fit), w)
$observed
[1] 0.05067625
$expected
[1] -0.01010101
$sd
[1] 0.00970256
$p.value
[1] 3.751199e-10
Conclusion: there is a phylogenetic signal in the residuals, so a normal lm will not work.
You can also check this with DHARMa; this approach also works for GLMMs:
res = simulateResiduals(fit)
testSpatialAutocorrelation(res, distMat = cophenetic(anolisTree))
DHARMa Moran's I test for distance-based autocorrelation
data: res
observed = 0.0509093, expected = -0.0101010, sd = 0.0097304, p-value =
3.609e-10
alternative hypothesis: Distance-based autocorrelation
An old-school method to deal with this problem are the so-called Phylogenetically Independent Contrasts (PICs) (Felsenstein, J. (1985) "Phylogenies and the comparative method", American Naturalist, 125, 1-15). The idea is to transform the data in such a way that an lm is still appropriate. For completeness, I show the method here.
# Extract columns.
host = anolisData[, "hostility"]
awe = anolisData[, "awesomeness"]

# Give them names.
names(host) = names(awe) = rownames(anolisData)

# Calculate PICs.
hPic = pic(host, anolisTree)
Warning in pic(host, anolisTree): the names of argument 'x' and the tip labels
of the tree did not match: the former were ignored in the analysis.
aPic = pic(awe, anolisTree)
Warning in pic(awe, anolisTree): the names of argument 'x' and the tip labels of
the tree did not match: the former were ignored in the analysis.
# Make a model.
picModel = lm(hPic ~ aPic - 1)
summary(picModel) # Yes, significant.
Call:
lm(formula = hPic ~ aPic - 1)
Residuals:
Min 1Q Median 3Q Max
-1.30230 -0.23485 0.06003 0.34772 0.92222
Coefficients:
Estimate Std. Error t value Pr(>|t|)
aPic -0.91964 0.03887 -23.66 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4263 on 98 degrees of freedom
Multiple R-squared: 0.851, Adjusted R-squared: 0.8495
F-statistic: 559.9 on 1 and 98 DF, p-value: < 2.2e-16
# plot results.
plot(hPic ~ aPic)
abline(a = 0, b = coef(picModel))
Now, new school, with a PGLS
pglsModel = gls(hostility ~ awesomeness,
                correlation = corBrownian(phy = anolisTree, form = ~ species),
                data = anolisData, method = "ML")
summary(pglsModel)
Generalized least squares fit by maximum likelihood
Model: hostility ~ awesomeness
Data: anolisData
AIC BIC logLik
42.26092 50.07643 -18.13046
Correlation Structure: corBrownian
Formula: ~species
Parameter estimate(s):
numeric(0)
Coefficients:
Value Std.Error t-value p-value
(Intercept) 0.1158895 0.12500397 0.927087 0.3562
awesomeness -0.9196414 0.03886501 -23.662451 0.0000
Correlation:
(Intr)
awesomeness -0.065
Standardized residuals:
Min Q1 Med Q3 Max
-1.49512017 -0.75193433 -0.06672209 0.56527753 2.04613817
Residual standard error: 0.4220369
Degrees of freedom: 100 total; 98 residual
coef(pglsModel)
(Intercept) awesomeness
0.1158895 -0.9196414
plot(hostility ~ awesomeness, data = anolisData)
abline(pglsModel, col = "red")
OK, same result, but PGLS is WAY more flexible than PICs. For example, we can include a discrete predictor:
pglsModel2 = gls(hostility ~ ecomorph,
                 correlation = corBrownian(phy = anolisTree, form = ~ species),
                 data = anolisData, method = "ML")
summary(pglsModel2)
Generalized least squares fit by maximum likelihood
Model: hostility ~ ecomorph
Data: anolisData
AIC BIC logLik
235.1126 255.954 -109.5563
Correlation Structure: corBrownian
Formula: ~species
Parameter estimate(s):
numeric(0)
Coefficients:
Value Std.Error t-value p-value
(Intercept) 0.2280018 0.3630767 0.6279713 0.5316
ecomorphGB -0.2737370 0.2128984 -1.2857635 0.2017
ecomorphT -0.2773801 0.3872137 -0.7163490 0.4756
ecomorphTC -0.5457771 0.2449466 -2.2281475 0.0283
ecomorphTG -0.2645627 0.2084928 -1.2689297 0.2076
ecomorphTW -0.5388436 0.2370223 -2.2733878 0.0253
ecomorphU -0.3013944 0.2264264 -1.3310922 0.1864
Correlation:
(Intr) ecmrGB ecmrpT ecmrTC ecmrTG ecmrTW
ecomorphGB -0.385
ecomorphT -0.276 0.360
ecomorphTC -0.369 0.626 0.349
ecomorphTG -0.426 0.638 0.431 0.608
ecomorphTW -0.372 0.626 0.377 0.588 0.641
ecomorphU -0.395 0.597 0.394 0.587 0.647 0.666
Standardized residuals:
Min Q1 Med Q3 Max
-2.57909973 -0.62394508 0.03716963 0.49997446 2.33859983
Residual standard error: 1.05295
Degrees of freedom: 100 total; 93 residual
anova(pglsModel2)
Denom. DF: 93
numDF F-value p-value
(Intercept) 1 0.0555807 0.8141
ecomorph 6 1.2170027 0.3046
# We can even include multiple predictors:
pglsModel3 = gls(hostility ~ ecomorph * awesomeness,
                 correlation = corBrownian(phy = anolisTree, form = ~ species),
                 data = anolisData, method = "ML")
summary(pglsModel3)
Generalized least squares fit by maximum likelihood
Model: hostility ~ ecomorph * awesomeness
Data: anolisData
AIC BIC logLik
53.36917 92.44673 -11.68459
Correlation Structure: corBrownian
Formula: ~species
Parameter estimate(s):
numeric(0)
Coefficients:
Value Std.Error t-value p-value
(Intercept) 0.2740102 0.14336154 1.911323 0.0593
ecomorphGB -0.2079698 0.08757937 -2.374644 0.0198
ecomorphT -0.1751884 0.15478802 -1.131795 0.2609
ecomorphTC -0.2030466 0.10752002 -1.888454 0.0623
ecomorphTG -0.1260964 0.08339737 -1.511994 0.1342
ecomorphTW -0.1600076 0.09700188 -1.649531 0.1027
ecomorphU -0.1244498 0.09457082 -1.315943 0.1917
awesomeness -1.0131496 0.08971063 -11.293529 0.0000
ecomorphGB:awesomeness 0.0750120 0.08289316 0.904924 0.3680
ecomorphT:awesomeness 0.1373797 0.11770513 1.167152 0.2464
ecomorphTC:awesomeness 0.1161086 0.11490811 1.010447 0.3151
ecomorphTG:awesomeness 0.1666831 0.09824670 1.696577 0.0934
ecomorphTW:awesomeness 0.0120495 0.11532810 0.104480 0.9170
ecomorphU:awesomeness 0.0283477 0.10510376 0.269711 0.7880
Correlation:
(Intr) ecmrGB ecmrpT ecmrTC ecmrTG ecmrTW ecmrpU awsmns
ecomorphGB -0.398
ecomorphT -0.289 0.372
ecomorphTC -0.361 0.598 0.357
ecomorphTG -0.435 0.647 0.447 0.579
ecomorphTW -0.377 0.644 0.391 0.579 0.657
ecomorphU -0.403 0.589 0.424 0.546 0.658 0.666
awesomeness -0.104 0.123 0.045 0.078 0.046 0.005 0.108
ecomorphGB:awesomeness 0.129 -0.280 -0.095 -0.171 -0.151 -0.191 -0.184 -0.682
ecomorphT:awesomeness 0.082 -0.085 -0.074 -0.071 -0.036 -0.011 -0.111 -0.716
ecomorphTC:awesomeness 0.102 -0.120 -0.092 -0.359 -0.079 -0.091 -0.136 -0.695
ecomorphTG:awesomeness 0.090 -0.073 -0.023 -0.058 -0.056 -0.036 -0.140 -0.811
ecomorphTW:awesomeness 0.051 -0.124 0.029 -0.054 -0.023 -0.052 -0.006 -0.666
ecomorphU:awesomeness 0.101 -0.129 -0.129 -0.143 -0.133 -0.122 -0.283 -0.672
ecmGB: ecmrT: ecmTC: ecmTG: ecmTW:
ecomorphGB
ecomorphT
ecomorphTC
ecomorphTG
ecomorphTW
ecomorphU
awesomeness
ecomorphGB:awesomeness
ecomorphT:awesomeness 0.516
ecomorphTC:awesomeness 0.519 0.530
ecomorphTG:awesomeness 0.611 0.684 0.609
ecomorphTW:awesomeness 0.535 0.536 0.482 0.569
ecomorphU:awesomeness 0.515 0.535 0.644 0.626 0.480
Standardized residuals:
Min Q1 Med Q3 Max
-1.6656909 -0.7164061 -0.1305515 0.6718348 1.7699106
Residual standard error: 0.3956912
Degrees of freedom: 100 total; 86 residual
anova(pglsModel3)
Denom. DF: 86
numDF F-value p-value
(Intercept) 1 0.3640 0.5479
ecomorph 6 7.9691 <.0001
awesomeness 1 517.8319 <.0001
ecomorph:awesomeness 6 0.8576 0.5295
We can also assume that the error structure follows an Ornstein-Uhlenbeck model rather than Brownian motion. When trying this, however, I noted that the model does not converge due to a scaling problem. We can apply a quick fix by making the branch lengths longer; this will not affect the analysis other than rescaling a nuisance parameter. Below, we first fit Pagel's lambda model (corPagel) on the rescaled tree as a comparison, and then the Ornstein-Uhlenbeck model (corMartins).
tempTree = anolisTree
tempTree$edge.length = tempTree$edge.length * 100

pglsModelLambda = gls(hostility ~ awesomeness,
                      correlation = corPagel(1, phy = tempTree, fixed = FALSE,
                                             form = ~ species),
                      data = anolisData, method = "ML")
summary(pglsModelLambda)
Generalized least squares fit by maximum likelihood
Model: hostility ~ awesomeness
Data: anolisData
AIC BIC logLik
43.64714 54.06782 -17.82357
Correlation Structure: corPagel
Formula: ~species
Parameter estimate(s):
lambda
1.01521
Coefficients:
Value Std.Error t-value p-value
(Intercept) 0.1170472 0.12862370 0.909997 0.3651
awesomeness -0.9248858 0.03870928 -23.893129 0.0000
Correlation:
(Intr)
awesomeness -0.062
Standardized residuals:
Min Q1 Med Q3 Max
-1.46625592 -0.74557818 -0.06456682 0.54645141 2.02371257
Residual standard error: 0.4317018
Degrees of freedom: 100 total; 98 residual
pglsModelOU = gls(hostility ~ awesomeness,
                  correlation = corMartins(1, phy = tempTree, form = ~ species),
                  data = anolisData)
summary(pglsModelOU)
Generalized least squares fit by REML
Model: hostility ~ awesomeness
Data: anolisData
AIC BIC logLik
50.7625 61.10237 -21.38125
Correlation Structure: corMartins
Formula: ~species
Parameter estimate(s):
alpha
0.003194918
Coefficients:
Value Std.Error t-value p-value
(Intercept) 0.1179388 0.4300640 0.274236 0.7845
awesomeness -0.9148437 0.0384949 -23.765320 0.0000
Correlation:
(Intr)
awesomeness -0.02
Standardized residuals:
Min Q1 Med Q3 Max
-1.11558553 -0.54574106 -0.05696661 0.40461428 1.48285458
Residual standard error: 0.5740297
Degrees of freedom: 100 total; 98 residual
Other example: http://schmitzlab.info/pgls.html.
For fitting PGLS with various models, you should also consider the caper package.
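For illustration, a sketch of how the Brownian-motion PGLS above might look in caper - untested here, and assuming anolisData contains a species column that matches the tip labels (the same column used in the form argument above):
library(caper)
# combine tree and data; names.col links data rows to tip labels
cdat = comparative.data(anolisTree, anolisData, names.col = species)
pgls(hostility ~ awesomeness, data = cdat)                 # lambda = 1, i.e. Brownian motion
pgls(hostility ~ awesomeness, data = cdat, lambda = "ML")  # estimate lambda, cf. corPagel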
11.4.1 Exercise - Does bird song change with altitude?
11.4.1.1 Exercise
11.5 Multivariate GLMs
In recent years, multivariate GLMs, in particular the multivariate probit model and latent variable versions thereof, have become popular for the analysis of community data. The keyword here is "joint species distribution models" (jSDMs).
Briefly, what we want a jSDM to do is to fit a vector of responses (could be abundance or presence / absence data) as a function of environmental predictors and a covariance between the responses. The model is thus
\[ y \sim f(x) + \Sigma \]
where \(y\) is the response vector, \(f(x)\) is the matrix of environmental effects, and \(\Sigma\) is the covariance matrix between the responses that we want to estimate.
You can fit these models directly in lme4 or glmmTMB by implementing a random effect per species:
lmer(abundance ~ 0 + species + env:species + (0 + species | site))
glmmTMB(abundance ~ 0 + species + env:species + (0 + species | site))
However, these so-called full-rank models have a lot of degrees of freedom and are slow to compute if the number of species gets large. It has therefore become common to fit rank-reduced versions of these models, using a latent-variable reparameterization of the model above. A latent-variable version of this model can be fit in glmmTMB via this syntax:
glmmTMB(abundance ~ 0 + species + env:species + rr(species + 0 | site, d = 2))
The parameter d controls the number of latent variables.
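To make the syntax concrete, here is a minimal self-contained sketch with simulated data - all names are illustrative, and the simulation does not build in a species covariance; it only demonstrates the model specification:
library(glmmTMB)
set.seed(1)
nSites = 50
nSpecies = 8
env = rnorm(nSites)
dat = expand.grid(site = factor(1:nSites), species = factor(1:nSpecies))
dat$env = env[dat$site]
dat$abundance = rpois(nrow(dat), lambda = exp(0.3 * dat$env))

# rank-reduced jSDM with d = 2 latent variables
fitRR = glmmTMB(abundance ~ 0 + species + env:species + rr(species + 0 | site, d = 2),
                family = poisson, data = dat)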
Of course, there are many more specialized packages for fitting latent-variable jSDMs in R, including Hmsc, gllvm, or sjSDM, but I find it instructive to place these models within the general topic of correlation structures, and to realize that we can fit them with the same methods and packages as all other correlation structures.