Testing for under-/overdispersion in GLMs/GLMMs
with DHARMa
Acompanying example script for the paper “Dispersion tests for generalized linear mixed-effects models”.
Our aim is to give instructions and show examples for testing for
under/overdispersion for GLMs (base R functions glm) and
GLMMs (package lme4) using the DHARMa
package.
Users can substitute the simulated data and model by their own data
and model objects1 to use the dispersion test function from
the DHARMa package.
Loading used packages:
library(DHARMa) # dispersion test and other models' diagnostic
library(lme4) # for GLMMs
set.seed(2025) # for reproducible examplesDispersion tests in DHARMa
Dispersion tests in DHARMa are performed using the
function testDispersion. You can test for:
Both under- or overdispersion with the argument
alternative = "two.sided"(two sided test).Underdispersion only with
alternative = "less".Overdispersion only with
alternative = "greater".
Parametric Pearson residuals test
The parametric Pearson residuals test is peformed when using the
argument type = "PearsonChisq, when providing the model
object in the first argument position (simulationOutput),
for example:
Note that parametric Perason Pearson residuals test is biased
for GLMMs towards underdispersion (more details in the main
study). Tests with alternative = "two.sided" or
alternative = "less" are therefore not reliable. If you
have random effects in your model, we recommend to test only for
overdispersion with alternative = 'greater'.
Nonparametric Pearson residuals test
For the nonparametric Pearson residuals test, it is necessary a first
step in generating simulations from the model using the function
simulateResiduals with the argument
refit = TRUE. This way, the function will record the
Pearson residuals from the simulated data from parametric bootstrapping.
Afterwards, use the resulting object in testDispersion with
type = "DHARMa":
The number of simulations is set in simulateResiduals()
with the argument n, which is by default 250. If your model
is complex and the computational time is too long for the parametric
bootstrapping, we recommend you to decrease the number of simulations
(however, don’t go lower than 100), or use the simulation-based response
variance test explained below.
Simulation-based response variance test
The simulation-based response variance test is available through the
same functions as before, but for simulateResiduals() with
argument refit = F (the default of the function):
For GLMMs, the options for simulating results
conditionally on the random effects is currently
available only for models fitted with the lme4 package by
using the argument re.form = NULL, which is passed to the
function simulate.merMod() in lme4:
Note that the default of the simulateResiduals() for
GLMMs is to simulate unconditionally on the random
effects, which would be equal to re.form = NA or
re.form = ~0:
Example GLM Poisson
Well fitted data
Generating data:
poisData <- createData(sampleSize = 200, intercept = 0, fixedEffects = 1,
randomEffectVariance = 0, # no RE variance
overdispersion = 0, # no overdispersion added
family = poisson())
plot(observedResponse ~ Environment1, data = poisData)Modeling with glm function from base R:
poisModel <- glm(observedResponse ~ Environment1, family = poisson,
data = poisData)
summary(poisModel)##
## Call:
## glm(formula = observedResponse ~ Environment1, family = poisson,
## data = poisData)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.02182 0.07501 0.291 0.771
## Environment1 1.00746 0.12370 8.144 3.81e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 291.40 on 199 degrees of freedom
## Residual deviance: 220.69 on 198 degrees of freedom
## AIC: 537.67
##
## Number of Fisher Scoring iterations: 5
Testing dispersion
Note that all tests present similar dispersion statistics and non-significant p-values, as expected.
- Nonparametric Pearson residuals test:
##
## DHARMa nonparametric dispersion test via mean deviance residual fitted
## vs. simulated-refitted
##
## data: res
## dispersion = 1.0244, p-value = 0.816
## alternative hypothesis: two.sided
- Parametric Pearson residuals test (two-sided test)
## Note that the chi2 test on Pearson residuals is biased for MIXED models towards underdispersion. Tests with alternative = two.sided or less are therefore not reliable. If you have random effects in your model, I recommend to test only with alternative = 'greater', i.e. test for overdispersion, or else use the DHARMa default tests which are unbiased. See help for details.
##
## Parametric dispersion test via mean Pearson-chisq statistic
##
## data: poisModel
## dispersion = 1.0196, df = 198, p-value = 0.8204
## alternative hypothesis: two.sided
The same result with:
## Note that the chi2 test on Pearson residuals is biased for MIXED models towards underdispersion. Tests with alternative = two.sided or less are therefore not reliable. If you have random effects in your model, I recommend to test only with alternative = 'greater', i.e. test for overdispersion, or else use the DHARMa default tests which are unbiased. See help for details.
##
## Parametric dispersion test via mean Pearson-chisq statistic
##
## data: res
## dispersion = 1.0196, df = 198, p-value = 0.8204
## alternative hypothesis: two.sided
- Simulation-based response variance test
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.97973, p-value = 0.928
## alternative hypothesis: two.sided
Overdispersed data
Generating overdispersed data:
poisData <- createData(sampleSize = 200, intercept = 0, fixedEffects = 1,
randomEffectVariance = 0, # no RE variance
overdispersion = 1, # overdispersion added
family = poisson())
plot(observedResponse ~ Environment1, data = poisData)Modeling
poisModel <- glm(observedResponse ~ Environment1, family = poisson,
data = poisData)
summary(poisModel)##
## Call:
## glm(formula = observedResponse ~ Environment1, family = poisson,
## data = poisData)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.47570 0.05893 8.073 6.89e-16 ***
## Environment1 0.85115 0.09909 8.590 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 675.89 on 199 degrees of freedom
## Residual deviance: 598.03 on 198 degrees of freedom
## AIC: 921.55
##
## Number of Fisher Scoring iterations: 6
Testing dispersion
Notice that all tests presented similar dispersion parameters and significant p-values, as expected.
- Nonparametric Pearson residuals test:
##
## DHARMa nonparametric dispersion test via mean deviance residual fitted
## vs. simulated-refitted
##
## data: res
## dispersion = 4.213, p-value < 2.2e-16
## alternative hypothesis: two.sided
- Parametric Pearson residuals test (two-sided test)
## Note that the chi2 test on Pearson residuals is biased for MIXED models towards underdispersion. Tests with alternative = two.sided or less are therefore not reliable. If you have random effects in your model, I recommend to test only with alternative = 'greater', i.e. test for overdispersion, or else use the DHARMa default tests which are unbiased. See help for details.
##
## Parametric dispersion test via mean Pearson-chisq statistic
##
## data: poisModel
## dispersion = 4.2158, df = 198, p-value < 2.2e-16
## alternative hypothesis: two.sided
- Simulation-based response variance test
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 4.8204, p-value < 2.2e-16
## alternative hypothesis: two.sided
Example GLMMs Poisson
Well fitted data
Generating data with random intercepts:
poisDataMM <- createData(sampleSize = 200, intercept = 0, fixedEffects = 1,
randomEffectVariance = 1, # added RE variance
overdispersion = 0, # no overdispersion added
family = poisson())
plot(observedResponse ~ Environment1, col = group, data = poisDataMM)Modeling using lme4:
poisModelMM <- glmer(observedResponse ~ Environment1 + (1|group), family = poisson,
data = poisDataMM)
summary(poisModelMM)## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: observedResponse ~ Environment1 + (1 | group)
## Data: poisDataMM
##
## AIC BIC logLik -2*log(L) df.resid
## 493.3 503.2 -243.7 487.3 197
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4786 -0.6734 -0.3481 0.4307 3.7842
##
## Random effects:
## Groups Name Variance Std.Dev.
## group (Intercept) 0.513 0.7162
## Number of obs: 200, groups: group, 10
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.1380 0.2419 -0.571 0.568
## Environment1 0.9750 0.1194 8.166 3.17e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Environmnt1 -0.112
Testing dispersion
Note that all dispersion statistics presented values below the expected value of 1 (around 0.88), while the simulation-based unconditional test presented the lowest dispersion statistics (0.65).
The two-sided parametric Pearson residuals test was marginaly significant (p = 0.06), but towards underdispersion (the opposite expectation). When testing only for overdispersion with the same test, the p-value was non-significant.
- Nonparametric Pearson residuals test:
##
## DHARMa nonparametric dispersion test via mean deviance residual fitted
## vs. simulated-refitted
##
## data: res
## dispersion = 0.88397, p-value = 0.312
## alternative hypothesis: two.sided
- Parametric Pearson residuals test (two-sided test)
## Note that the chi2 test on Pearson residuals is biased for MIXED models towards underdispersion. Tests with alternative = two.sided or less are therefore not reliable. If you have random effects in your model, I recommend to test only with alternative = 'greater', i.e. test for overdispersion, or else use the DHARMa default tests which are unbiased. See help for details.
##
## Parametric dispersion test via mean Pearson-chisq statistic
##
## data: poisModelMM
## dispersion = 0.81883, df = 197, p-value = 0.05934
## alternative hypothesis: two.sided
- Parametric Pearson residuals test (only
overdispersion test -
alternative = "greater")
##
## Parametric dispersion test via mean Pearson-chisq statistic
##
## data: poisModelMM
## dispersion = 0.81883, df = 197, p-value = 0.9703
## alternative hypothesis: greater
- Simulation-based response variance test: conditional simulations
res <- simulateResiduals(poisModelMM, refit = F, re.form = NULL)
testDispersion(res, type = "DHARMa")##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.87512, p-value = 0.424
## alternative hypothesis: two.sided
3b. SSimulation-based response variance test: unconditional simulations
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.64473, p-value = 0.712
## alternative hypothesis: two.sided
Overdispersed data
Generating data with overdispersion and random intercepts:
poisDataMM <- createData(sampleSize = 200, intercept = 0, fixedEffects = 1,
randomEffectVariance = 1, # added RE variance
overdispersion = 1, # added overdispersion
family = poisson())
plot(observedResponse ~ Environment1, col = group, data = poisDataMM)Modeling using lme4:
poisModelMM <- glmer(observedResponse ~ Environment1 + (1|group), family = poisson,
data = poisDataMM)
summary(poisModelMM)## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: observedResponse ~ Environment1 + (1 | group)
## Data: poisDataMM
##
## AIC BIC logLik -2*log(L) df.resid
## 801.4 811.3 -397.7 795.4 197
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2328 -0.9363 -0.4312 0.5823 8.6896
##
## Random effects:
## Groups Name Variance Std.Dev.
## group (Intercept) 0.6571 0.8106
## Number of obs: 200, groups: group, 10
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2855 0.2670 1.069 0.285
## Environment1 1.1803 0.1006 11.729 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Environmnt1 -0.100
Testing dispersion
Notice that the nonparameric Pearson residuals test presented some warning messages of convergence failure in models refited by the bootstrapping procedure. Convergence and other model fitting problems may arrise when applying parametric bootstrapping in GLMMs, even with a simple GLMM as the one of the example (just one random intercept). Convergence problems tend to be increase for more complex models.
The dispersion statistics were similar across tests, except for the simulation-based unconditional test, which was also the only model to present a non-significant p-value (very low power).
- Nonparametric Pearson residuals test:
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0206084 (tol = 0.002, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.017267 (tol = 0.002, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0168244 (tol = 0.002, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0208834 (tol = 0.002, component 1)
##
## DHARMa nonparametric dispersion test via mean deviance residual fitted
## vs. simulated-refitted
##
## data: res
## dispersion = 2.5017, p-value < 2.2e-16
## alternative hypothesis: two.sided
- Parametric Pearson residuals test (two-sided test)
## Note that the chi2 test on Pearson residuals is biased for MIXED models towards underdispersion. Tests with alternative = two.sided or less are therefore not reliable. If you have random effects in your model, I recommend to test only with alternative = 'greater', i.e. test for overdispersion, or else use the DHARMa default tests which are unbiased. See help for details.
##
## Parametric dispersion test via mean Pearson-chisq statistic
##
## data: poisModelMM
## dispersion = 2.3472, df = 197, p-value < 2.2e-16
## alternative hypothesis: two.sided
- Parametric Pearson residuals test (only
overdispersion test -
alternative = "greater")
##
## Parametric dispersion test via mean Pearson-chisq statistic
##
## data: poisModelMM
## dispersion = 2.3472, df = 197, p-value < 2.2e-16
## alternative hypothesis: greater
- Simulation-based response variance test: conditional simulations
res <- simulateResiduals(poisModelMM, refit = F, re.form = NULL)
testDispersion(res, type = "DHARMa")##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 2.661, p-value < 2.2e-16
## alternative hypothesis: two.sided
- Simulation-based response variance test: unconditional simulations
# Simulation-based conditional dispersion test
res <- simulateResiduals(poisModelMM, refit = F, re.form = NA)
testDispersion(res, type = "DHARMa")##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 1.3285, p-value = 0.36
## alternative hypothesis: two.sided
Session info for code reproducibility
## R version 4.4.1 (2024-06-14)
## Platform: aarch64-apple-darwin20
## Running under: macOS 15.6.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] pt_BR.UTF-8/pt_BR.UTF-8/pt_BR.UTF-8/C/pt_BR.UTF-8/pt_BR.UTF-8
##
## time zone: Europe/Berlin
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] lme4_1.1-37 Matrix_1.7-2 DHARMa_0.4.7
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-167 cli_3.6.5 knitr_1.50 rlang_1.1.6
## [5] xfun_0.51 reformulas_0.4.0 minqa_1.2.8 jsonlite_2.0.0
## [9] htmltools_0.5.8.1 sass_0.4.9 rmarkdown_2.29 grid_4.4.1
## [13] evaluate_1.0.3 jquerylib_0.1.4 MASS_7.3-64 rmdformats_1.0.4
## [17] fastmap_1.2.0 yaml_2.3.10 lifecycle_1.0.4 bookdown_0.42
## [21] compiler_4.4.1 Rcpp_1.1.0 rstudioapi_0.17.1 lattice_0.22-6
## [25] digest_0.6.37 nloptr_2.2.1 R6_2.6.1 Rdpack_2.6.3
## [29] splines_4.4.1 rbibutils_2.3 bslib_0.9.0 tools_4.4.1
## [33] boot_1.3-31 cachem_1.1.0
Please verify the currently supported modeling packages in DHARMa.↩︎