Advanced Regression Models with R
Preface
If we have a response variable and want to understand how it is influenced by one or several factors, we will typically use a regression model. The aim of this course is to enable you to run such a regression analysis at the quality expected of a “real” scientific study. To do so, you will have to master a number of skills, in particular:
- Understanding the fundamental statistical indicators in regression analysis (p-value, estimator) and their quality (power, bias, error, coverage),
- Understanding what a causal effect means in a regression context, and what this means for experimental design and model selection,
- Knowing all building blocks of the “advanced GLMM framework” that helps us to correctly model our data, e.g. GLMs, random effects, GAMs, correlation structures, …
- Knowing standard non-parametric evaluation methods for regression models, such as the bootstrap (parametric and non-parametric) and cross-validation,
- and the ability to use all of these methods in an applied data analysis.
This is what we will mainly practice in this course. Don’t worry if you think that this sounds too simple. We could spend an entire week on understanding the p-value alone and still only scratch the surface. If you have an overview of what can be done with regression models and are confident in running a realistic scientific analysis on your own, we have achieved a lot.
This course assumes basic prior knowledge of statistical methods (tests, regressions, p-value, power, CIs, …) and the ability to apply them in R. At the University of Regensburg, this knowledge is taught in the Bachelor’s biology lecture “Statistik und Bioinformatik” (lecture notes in German here) and the block course “Introduction to statistics in R”. If you didn’t take these or comparable courses, you should at least try to get a basic understanding of R before proceeding with this book. This lecture series from MarinStatsLectures could be a good start. There is also an appendix in this book that covers the most common functions for data manipulation and plotting in R.
If you have comments, questions or suggestions regarding this book, please submit them here.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Note that some elements of this work (embedded videos, graphics) may be under a separate licence and are thus not covered by this licence.