3  ANOVA and other tests on the LM

3.1 The idea of ANOVA

ANOVA stands for ANalysis Of VAriance. The basic idea is to find out how much of the signal (variance) is explained by different factors. We already gave a short introduction to ANOVA in the section on categorical predictors.

The problem with explaining ANOVA is that the term is loaded with historical meanings and explanations that are no longer relevant. It used to be that ANOVA was a stand-alone method used for experimental designs with different treatments, that ANOVA assumed a normal distribution and partitioned sums of squares, that there were repeated-measures ANOVAs, and so on. These varieties of ANOVA partly still exist, but in general, there is a much simpler and more general explanation of ANOVA:

Modern explanation: ANOVA is not a statistical model, but a hypothesis test that can be performed on top of any regression model. This test measures how much model fit improves when a predictor is added, and whether this improvement is significant.

3.1.1 An example

As an example, here is an ANOVA (function aov()) performed on the fit of a linear model:

fit = lm(Ozone ~ Wind + Temp, data = airquality)
summary(aov(fit))
             Df Sum Sq Mean Sq F value   Pr(>F)    
Wind          1  45284   45284   94.81  < 2e-16 ***
Temp          1  25886   25886   54.20 3.15e-11 ***
Residuals   113  53973     478                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
37 observations deleted due to missingness

In the standard ANOVA for the LM, model fit is measured by the reduction in the residual sum of squares. Let’s look at the example above. In the ANOVA table, we (virtually) start with an intercept-only model. The table tells us that adding Wind to the model reduces the Sum Sq by 45284, and adding Temp reduces the Sum Sq by another 25886, which leaves us with a residual sum of squares of 53973. From this, we can also conclude that the total sum of squares of the response is 53973 + 25886 + 45284 = 125143. Let’s check this:

sum((fit$model$Ozone - mean(fit$model$Ozone))^2)
[1] 125143.1

Thus, we can conclude that the R2 associated with each model component is

45284/125143 # Wind
[1] 0.361858
25886/125143 # Temp
[1] 0.2068514
53973/125143 # Residuals
[1] 0.4312906
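As a quick consistency check, the fractions for Wind and Temp should add up to the multiple R2 that summary() reports for this model:

```r
# The explained SumSq fractions sum to the model's multiple R2
fit = lm(Ozone ~ Wind + Temp, data = airquality)
(45284 + 25886) / 125143  # ~ 0.569
summary(fit)$r.squared    # same value (up to rounding of the SumSq)
```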

Moreover, the ANOVA table performs tests to see if the improvement in model fit is significant against a null model. This is important because, as mentioned before (in particular in the chapter on model selection), adding a predictor always improves model fit.

To interpret the p-values, consider that H0 is that the simpler model is true; thus, we test whether the improvement in model fit is larger than what we would expect if the predictor had no effect.

3.1.2 Customizing the partitioning

There are a number of cases where it can make sense to perform an ANOVA for larger parts of the model. Consider, for example, the following regression:

fit = lm(Ozone ~ Wind + I(Wind^2) + Temp + I(Temp^2), data = airquality)

Maybe we would like to ask how much variance is explained by Wind + Wind^2, and how much by Temp + Temp^2. In this case, we can perform a custom ANOVA, using the anova() function that we already introduced in the section on model selection via likelihood ratio tests (LRTs).

m0 = lm(Ozone ~ 1, data = airquality)
m1 = lm(Ozone ~ Wind + I(Wind^2) , data = airquality)
m2 = lm(Ozone ~ Wind + I(Wind^2) + Temp + I(Temp^2), data = airquality)
anova(m0, m1, m2)
Analysis of Variance Table

Model 1: Ozone ~ 1
Model 2: Ozone ~ Wind + I(Wind^2)
Model 3: Ozone ~ Wind + I(Wind^2) + Temp + I(Temp^2)
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1    115 125143                                  
2    113  64360  2     60783 81.397 < 2.2e-16 ***
3    111  41445  2     22915 30.686 2.464e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.2 Fundamental issues in ANOVA

There are four basic problems that we will come back to when generalizing this principle across a range of models:

  1. How should we measure “improvement of model fit”? Traditionally, improvement is measured by the reduction of the residual sum of squares, but for GLMs, we will have to expand this definition.
  2. How should we test if the improvement in fit is significant? For simple models, this is not much of a problem.
  3. How should we partition variance if predictors are collinear, so that the order in which predictors are included matters?
  4. Should we correct for complexity?

Let’s look at these problems one by one:

3.2.1 Measuring model fit

The definition of model fit via sum of squares makes sense as long as we work with linear models.

For GLMs, this definition doesn’t make sense any more. A number of so-called pseudo-R2 metrics have been proposed. Most of them are based on the likelihood (as a measure of model fit) and try to recover as far as possible the properties of an R2 for the linear model.

A common metric is McFadden’s pseudo-R2, which is defined as 1 - LogL(M)/LogL(M0), where M is our model and M0 is an intercept-only model.
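As a sketch of the computation (fitting Poisson GLMs to the airquality data purely for illustration):

```r
# McFadden's pseudo-R2 = 1 - logL(M) / logL(M0)
m  = glm(Ozone ~ Wind + Temp, data = airquality, family = poisson)
m0 = glm(Ozone ~ 1, data = airquality, family = poisson)
1 - as.numeric(logLik(m)) / as.numeric(logLik(m0))
```

Like the R2 of the LM, this metric is 0 if the model explains nothing beyond the intercept, and approaches 1 for a (near-)perfect fit.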

3.2.2 Testing if the improvement is significant

The test used in our example above is an F-test. The F-test is used for models that assume a normal distribution, and can be interpreted as a special case of the likelihood ratio test (LRT), which we already used in the chapter on model selection. An LRT compares two nested models, with M0 being the simpler model, and works as follows:

  1. H0: M0 is true.
  2. Test statistic: the likelihood ratio -2 log[L(M0)/L(M1)].
  3. Then, under relatively broad conditions, the test statistic is chi-squared distributed, with df equal to the difference in residual df (number of parameters) between the models M1 and M0.
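As a sketch, here is this LRT carried out by hand for the two models from our earlier example:

```r
# Hand-rolled LRT: compare 2 * the log-likelihood difference to a chi-squared
m0 = lm(Ozone ~ Wind, data = airquality)
m1 = lm(Ozone ~ Wind + Temp, data = airquality)
LR = 2 * (as.numeric(logLik(m1)) - as.numeric(logLik(m0)))
pchisq(LR, df = 1, lower.tail = FALSE)  # df = 1, as M1 has one extra parameter
```

The p-value is very small, in line with the F-test above; note that for LMs the F-test is exact, while the chi-squared distribution of the LRT is an asymptotic approximation.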

This setup works for LMs and GLMs, but runs into problems when it is unclear how many df a model has. This is in particular the case for mixed models. In this case, one can resort to simulated LRTs. Simulated LRTs are a special case of the bootstrap, which is a very general and popular method to generate nonparametric confidence intervals and null distributions. We will talk in detail about these methods in the chapter on nonparametric methods, but because this method is crucial for mixed models, I want to briefly explain it here already:

The difference between a parametric and a nonparametric test is that the latter does not make assumptions about the distribution of the test statistic (point 3 above), but instead generates this distribution from the data. The parametric bootstrap does this in the following way:

  1. Simulate data from H0 (= the fitted model M0).
  2. Re-fit M0 and M1, and calculate the likelihood ratio.
  3. Repeat n times to get an idea of the expected increase in likelihood when moving to M1 under the assumption that M0 is correct.
library(DHARMa)
This is DHARMa 0.4.7. For overview type '?DHARMa'. For recent changes, type news(package = 'DHARMa')
m0 = lm(Ozone ~ Wind , data = airquality)
m1 = lm(Ozone ~ Wind + Temp, data = airquality)
simulateLRT(m0, m1)


    DHARMa simulated LRT

data:  m0: m0 m1: m1
LogL(M1/M0) = 22.723, p-value < 2.2e-16
alternative hypothesis: M1 describes the data better than M0
# for comparison
anova(m0, m1)
Analysis of Variance Table

Model 1: Ozone ~ Wind
Model 2: Ozone ~ Wind + Temp
  Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
1    114 79859                                  
2    113 53973  1     25886 54.196 3.149e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
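For illustration, here is a hand-rolled version of this simulated LRT (a sketch of the general idea; DHARMa’s internal implementation may differ in details):

```r
set.seed(123)
m0 = lm(Ozone ~ Wind, data = airquality)
m1 = lm(Ozone ~ Wind + Temp, data = airquality)
obsLR = 2 * (as.numeric(logLik(m1)) - as.numeric(logLik(m0)))

nullLR = replicate(200, {
  dat = model.frame(m1)
  dat$Ozone = simulate(m0)[[1]]       # step 1: simulate data from H0 (= M0)
  r0 = lm(Ozone ~ Wind, data = dat)   # step 2: re-fit both models
  r1 = lm(Ozone ~ Wind + Temp, data = dat)
  2 * (as.numeric(logLik(r1)) - as.numeric(logLik(r0)))
})                                    # step 3: repeat for the null distribution

mean(nullLR >= obsLR)                 # simulated p-value, effectively 0 here
```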

3.2.3 Partitioning variance if order matters

Another key problem in ANOVA, and a source of much confusion, is how to deal with the fact that often, the order in which model components are added matters. Compare the results of our previous ANOVA with this one, where I only flipped the order of Temp and Wind:

fit = lm(Ozone ~ Temp + Wind, data = airquality)
summary(aov(fit))
             Df Sum Sq Mean Sq F value   Pr(>F)    
Temp          1  61033   61033  127.78  < 2e-16 ***
Wind          1  10137   10137   21.22 1.08e-05 ***
Residuals   113  53973     478                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
37 observations deleted due to missingness

The result is markedly different, and the reason is that the aov() function performs a so-called type I ANOVA. The type I ANOVA adds variables in the order in which they appear in the model formula, and because Temp and Wind are collinear, the variable that is added first absorbs variation from the other, and thus appears to explain more of the response.

There are other types of ANOVA that avoid this problem. The so-called type II ANOVA shows, for each variable, only the part of the variance that is uniquely attributable to that variable:

car::Anova(fit, type = "II")
Anova Table (Type II tests)

Response: Ozone
          Sum Sq  Df F value    Pr(>F)    
Temp       25886   1  54.196 3.149e-11 ***
Wind       10137   1  21.223 1.080e-05 ***
Residuals  53973 113                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
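The type II Sum Sq can also be computed by hand, as the reduction in RSS when a variable is added last to the model (a quick sketch for Wind):

```r
# Wind's type II SumSq = RSS of the model without Wind - RSS of the full model
mFull   = lm(Ozone ~ Temp + Wind, data = airquality)
mNoWind = lm(Ozone ~ Temp, data = airquality)
sum(residuals(mNoWind)^2) - sum(residuals(mFull)^2)  # ~ 10137, as in the table
```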

There is also type III, which is like type II but additionally avoids a similar problem for interactions, i.e. it discards the variance that is shared between main effects and interactions. Note that numeric variables should be centered and categorical variables should have orthogonal contrasts when running a type III ANOVA (see below).

car::Anova(fit, type = "III")
Anova Table (Type III tests)

Response: Ozone
            Sum Sq  Df F value    Pr(>F)    
(Intercept)   4335   1  9.0763  0.003196 ** 
Temp         25886   1 54.1960 3.149e-11 ***
Wind         10137   1 21.2230 1.080e-05 ***
Residuals    53973 113                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here is an overview of the situation for 2 predictors A and B and their interaction. The upper left figure corresponds to the case with no collinearity between the variables. The figure on the top right (and similarly types I - III) shows the three possible types of ANOVA for variables with collinearity. The “overlap” between the circles depicts the shared part, i.e. the variability that can be expressed by either variable (due to collinearity). Note that the shares in types II and III do not add up to 1, as there is a kind of “dark variation” that we cannot securely attribute to either variable.

Which type of ANOVA is most appropriate? Unfortunately, the answer is both “it depends on what you want to know” and “there is a lot of discussion about this topic”. I would say the following:

  • If you have an orthogonal, balanced design, it doesn’t matter.

  • If not, type I is really only appropriate if you explicitly want the order dependence. There may be some rare cases for this, e.g. if you want to say: my regression should first explain everything it can by variable A, and then I only want to see if variable B adds something on top.

  • If you want to test for significance and partition R2 in a fair way, you should choose either a type II or a type III ANOVA. Discussions about which is preferable are ongoing, see e.g. (Hector, Von Felten, and Schmid 2010). As you will see when reading this, most of these discussions revolve around your exact definition of the “R2 of an interaction” and what exactly you want to test.

Caution

Note that, apart from collinearity and balance, the centering of the variables can also affect the shared component between main effects and interactions, and thus the variance partitioned in a type III ANOVA. To understand this, consider the following: in a regression x1*x2 where both numeric predictors x1, x2 are centered, the interaction cannot create an average main effect in either x1 or x2, because each predictor has an equal amount of positive and negative values. If x1, x2 are uncentered, however, the interaction term alone can make the regression predict increasing values with increasing x1 on average.

For that reason, working with uncentered predictors leads to an increased shared variance between x1, x2 and the interaction x1:x2, and thus to lower Sum Sq values for the main effects in the type III ANOVA. You can see this by running the following code:

fit = lm(Ozone ~ scale(Temp) * scale(Wind), data = airquality)
fit2 = lm(Ozone ~ Temp * Wind, data = airquality)

car::Anova(fit, type = "II")
car::Anova(fit2, type = "II")


car::Anova(fit, type = "III")
car::Anova(fit2, type = "III")

For categorical variables, the same thing can happen - the choice of contrasts (see section on contrasts) will influence the amount of variation shared between the main effects and the interactions. When using orthogonal contrasts, this shared component is minimized, because the interaction columns are orthogonal to the main effect columns and thus cannot absorb variation attributable to them. Let’s do a small example, where we compare a regression with treatment contrasts (which are not orthogonal) to Helmert contrasts (which are orthogonal):

airquality$fMonth = as.factor(airquality$Month)
fit1 = lm(Ozone ~ Temp * fMonth , data = airquality)
fit2 = lm(Ozone ~ Temp * fMonth , data = airquality,
          contrasts = list(fMonth = "contr.helmert"))

car::Anova(fit1, type = "III")
car::Anova(fit2, type = "III")

Let’s compare the results to an ANOVA without an interaction (note that in this case, types II / III are identical for all sensible contrasts):

fit1 = lm(Ozone ~ Temp + fMonth , data = airquality)
car::Anova(fit1, type = "III")

If you run this, you will see that the Helmert contrasts (any other orthogonal contrasts would do as well) match the results of the model without interaction much better, because they de-correlate the variance explained by the main effects and the interactions. Because of this, it is usually advised to use orthogonal contrasts (note: the default treatment contrasts are not orthogonal) when running a type III ANOVA with 2-way interactions between two categorical variables.

Exercise

Try out the difference between type I, II and III ANOVAs for the airquality data set, either for the simple Wind + Temp model or for more complicated models. If you want to see the effects of a type III ANOVA, you need to add an interaction (see next section).

Let’s use the model with Wind, Temp, Solar.R, and their interactions. Wind and Temp, and Solar.R and Temp are collinear.


Call:
lm(formula = Ozone ~ (scale(Wind) + scale(Temp) + scale(Solar.R))^2, 
    data = airquality)

Residuals:
    Min      1Q  Median      3Q     Max 
-38.685 -11.727  -2.169   7.360  91.244 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)    
(Intercept)                  38.025      2.052  18.535  < 2e-16 ***
scale(Wind)                 -11.801      2.087  -5.654 1.38e-07 ***
scale(Temp)                  15.689      2.172   7.222 8.78e-11 ***
scale(Solar.R)                8.664      2.157   4.016 0.000112 ***
scale(Wind):scale(Temp)      -5.377      1.966  -2.735 0.007332 ** 
scale(Wind):scale(Solar.R)   -2.294      2.122  -1.081 0.282122    
scale(Temp):scale(Solar.R)    4.315      2.084   2.070 0.040889 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 19.17 on 104 degrees of freedom
  (42 observations deleted due to missingness)
Multiple R-squared:  0.6863,    Adjusted R-squared:  0.6682 
F-statistic: 37.93 on 6 and 104 DF,  p-value: < 2.2e-16

Difference between type I and type II ANOVA:

                            Df Sum Sq Mean Sq F value   Pr(>F)    
scale(Wind)                  1  45694   45694 124.386  < 2e-16 ***
scale(Temp)                  1  25119   25119  68.376 4.74e-13 ***
scale(Solar.R)               1   2986    2986   8.129  0.00526 ** 
scale(Wind):scale(Temp)      1   7227    7227  19.673 2.29e-05 ***
scale(Wind):scale(Solar.R)   1    996     996   2.710  0.10271    
scale(Temp):scale(Solar.R)   1   1575    1575   4.287  0.04089 *  
Residuals                  104  38205     367                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
42 observations deleted due to missingness
Anova Table (Type II tests)

Response: Ozone
                           Sum Sq  Df F value    Pr(>F)    
scale(Wind)                 11680   1 31.7934 1.483e-07 ***
scale(Temp)                 18996   1 51.7098 1.021e-10 ***
scale(Solar.R)               3618   1  9.8497  0.002211 ** 
scale(Wind):scale(Temp)      2748   1  7.4810  0.007332 ** 
scale(Wind):scale(Solar.R)    429   1  1.1689  0.282122    
scale(Temp):scale(Solar.R)   1575   1  4.2867  0.040889 *  
Residuals                   38205 104                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Only the Temp:Solar.R interaction is not affected!

Difference between type II and type III ANOVA:

Anova Table (Type III tests)

Response: Ozone
                           Sum Sq  Df  F value    Pr(>F)    
(Intercept)                126210   1 343.5613 < 2.2e-16 ***
scale(Wind)                 11745   1  31.9706 1.383e-07 ***
scale(Temp)                 19159   1  52.1545 8.782e-11 ***
scale(Solar.R)               5926   1  16.1323 0.0001118 ***
scale(Wind):scale(Temp)      2748   1   7.4810 0.0073323 ** 
scale(Wind):scale(Solar.R)    429   1   1.1689 0.2821218    
scale(Temp):scale(Solar.R)   1575   1   4.2867 0.0408890 *  
Residuals                   38205 104                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Type II and III results are nearly identical when the variables are scaled (centered).

3.2.4 Correcting partitions for complexity

A last problem is that model components with more complexity (more df) will always tend to explain more variance. We can see this when fitting a model with a separate mean for each day:

m = lm(Ozone ~ as.factor(Day) , data = airquality)
summary(aov(m))
               Df Sum Sq Mean Sq F value Pr(>F)
as.factor(Day) 30  41931    1398   1.428  0.104
Residuals      85  83212     979               
37 observations deleted due to missingness

The ANOVA tells us that the variable Day explains around 1/3 of the variation in the data, although it is not even significant. Because of this problem, the summary.lm() function reports both a raw and an adjusted R2 for each model. The adjusted R2 tries to correct the R2 for the complexity of the model.

summary(m)

Call:
lm(formula = Ozone ~ as.factor(Day), data = airquality)

Residuals:
    Min      1Q  Median      3Q     Max 
-82.667 -16.250  -2.175  16.450  71.333 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)         77.75      15.64   4.970 3.43e-06 ***
as.factor(Day)2    -34.75      22.12  -1.571  0.11997    
as.factor(Day)3    -44.50      22.12  -2.011  0.04746 *  
as.factor(Day)4    -15.42      23.90  -0.645  0.52058    
as.factor(Day)5    -29.08      23.90  -1.217  0.22696    
as.factor(Day)6    -36.25      22.12  -1.638  0.10502    
as.factor(Day)7    -23.55      20.99  -1.122  0.26501    
as.factor(Day)8    -20.75      22.12  -0.938  0.35096    
as.factor(Day)9    -16.35      20.99  -0.779  0.43815    
as.factor(Day)10   -28.42      23.90  -1.189  0.23770    
as.factor(Day)11   -52.25      27.10  -1.928  0.05716 .  
as.factor(Day)12   -55.00      22.12  -2.486  0.01488 *  
as.factor(Day)13   -54.35      20.99  -2.589  0.01131 *  
as.factor(Day)14   -48.42      23.90  -2.026  0.04589 *  
as.factor(Day)15   -65.08      23.90  -2.723  0.00784 ** 
as.factor(Day)16   -47.55      20.99  -2.265  0.02603 *  
as.factor(Day)17   -41.15      20.99  -1.961  0.05321 .  
as.factor(Day)18   -53.15      20.99  -2.532  0.01317 *  
as.factor(Day)19   -42.55      20.99  -2.027  0.04577 *  
as.factor(Day)20   -48.35      20.99  -2.304  0.02369 *  
as.factor(Day)21   -65.00      22.12  -2.938  0.00425 ** 
as.factor(Day)22   -63.42      23.90  -2.654  0.00950 ** 
as.factor(Day)23   -57.75      27.10  -2.131  0.03596 *  
as.factor(Day)24   -36.75      22.12  -1.661  0.10038    
as.factor(Day)25    18.92      23.90   0.792  0.43080    
as.factor(Day)26   -36.75      23.90  -1.538  0.12780    
as.factor(Day)27   -25.75      34.98  -0.736  0.46370    
as.factor(Day)28   -29.00      22.12  -1.311  0.19346    
as.factor(Day)29   -20.00      22.12  -0.904  0.36856    
as.factor(Day)30    -7.00      22.12  -0.316  0.75248    
as.factor(Day)31   -17.42      23.90  -0.729  0.46811    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 31.29 on 85 degrees of freedom
  (37 observations deleted due to missingness)
Multiple R-squared:  0.3351,    Adjusted R-squared:  0.1004 
F-statistic: 1.428 on 30 and 85 DF,  p-value: 0.1039

Here, we see that the adjusted R2 is considerably lower than the raw R2, although not zero.
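The adjustment applied by summary.lm() follows the standard formula 1 - (1 - R2) * (n - 1) / (n - p - 1), and we can reproduce the reported value by hand:

```r
# Reproducing the adjusted R2 from the summary output above
R2 = 0.3351          # raw R2 reported above
n = 116; p = 30      # observations used, and model df (factor levels - 1)
1 - (1 - R2) * (n - 1) / (n - p - 1)  # ~ 0.100, matching the summary
```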

In general, reliably adjusting R2 components for complexity in an ANOVA is very complicated. I would recommend the following:

  • If you report raw components (which is the default in most papers), consider that this is a description of your model, not an inference about the true variance created by the respective factor, and note in your interpretation that variables or model components with more df will always tend to explain more variance.

  • Alternatively, if the true variance is really crucial for your study, you can adjust R2 by a null expectation, similar to the parametric bootstrap for the LRT. However, because you subtract what is expected under H0, this will be a conservative estimate.
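A minimal sketch of such a null-expectation adjustment for the Day example above (here using a permutation of the response to generate H0 data; simulating from the fitted intercept-only model would work similarly):

```r
set.seed(1)
dat = airquality[!is.na(airquality$Ozone), ]
obsR2 = summary(lm(Ozone ~ as.factor(Day), data = dat))$r.squared

nullR2 = replicate(200, {
  d = dat
  d$Ozone = sample(d$Ozone)  # shuffle response: Day has no true effect
  summary(lm(Ozone ~ as.factor(Day), data = d))$r.squared
})

obsR2 - mean(nullR2)  # R2 minus its H0 expectation (a conservative estimate)
```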

3.3 Relationship between lm and other hypotheses tests

ANOVA is just one example of the close relationship between the lm and other null hypothesis significance tests. ANOVA can either be seen as a special test (traditional view), or as a special evaluation of a fitted lm (modern view). I would recommend taking the modern view, as it easily generalizes to GLMs and other more advanced regression models such as GLMMs.

It is interesting to consider more generally which hypothesis tests can be obtained by looking at certain results or tests on the linear model. As you can see in the table below, there are many classical hypothesis tests that are equivalently or nearly equivalently tested by a linear model.

Common statistical tests with linear models:

Hypothesis                        Test                  R function                           Equivalent linear model
Two groups differ in their means  unpaired t-test       t.test(y_1, y_2)                     lm(c(y_1, y_2) ~ c(g_1, g_2))
Two groups differ in their means  paired t-test         t.test(y_1, y_2, paired = TRUE)      lm(y_2 - y_1 ~ 1)
y depends on continuous x         Pearson correlation   cor.test(x, y)                       lm(y ~ x) (only near-equivalent)
y depends on continuous x         Spearman correlation  cor.test(x, y, method = "spearman")  lm(rank(y) ~ rank(x)) (only near-equivalent)
y differs between groups          one-way ANOVA         aov(y ~ group)                       lm(y ~ group) with subsequent ANOVA
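To see one of these equivalences in action, here is a quick sketch (with simulated data) showing that the equal-variance unpaired t-test and the corresponding lm yield the same p-value:

```r
set.seed(42)
y1 = rnorm(20, mean = 0)
y2 = rnorm(20, mean = 1)
p_t  = t.test(y1, y2, var.equal = TRUE)$p.value
p_lm = summary(lm(c(y1, y2) ~ rep(c("a", "b"), each = 20)))$coefficients[2, 4]
c(p_t, p_lm)  # identical p-values
```

Note that t.test() defaults to the Welch test (var.equal = FALSE), which is only near-equivalent to the lm.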

Based on this, one might think that it doesn’t matter whether we use a regression model or a hypothesis test. However, the regression model has several advantages:

  • It provides a richer output.

  • It provides the same output for all cases, with clear ideas about how to check residuals etc.

  • It allows us to adjust for the effects of covariates.

3.4 Case studies

3.4.1 Plant Height

Let’s look at our plant height case study from the previous chapter. Which type of ANOVA should you run to analyze the model? Compare results to “less appropriate” alternatives.

library(EcoData)
fit = lm(loght ~ temp * lat, data = plantHeight)
summary(fit)

Call:
lm(formula = loght ~ temp * lat, data = plantHeight)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.97905 -0.45112  0.01062  0.42852  1.74054 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.917e-01  5.617e-01   0.519    0.604
temp         2.596e-02  2.040e-02   1.273    0.205
lat         -8.948e-03  9.711e-03  -0.921    0.358
temp:lat     7.738e-05  3.061e-04   0.253    0.801

Residual standard error: 0.6869 on 174 degrees of freedom
Multiple R-squared:  0.2504,    Adjusted R-squared:  0.2375 
F-statistic: 19.37 on 3 and 174 DF,  p-value: 6.95e-11

We have correlations between the variables and an interaction, so by default, we should probably run a type II or III ANOVA. Note that for type III, you should center the predictors to reduce collinearities (and thus the shared fractions between main effects and interactions):

plantHeight$stemp = scale(plantHeight$temp, scale = FALSE) # only center variable
plantHeight$slat = scale(plantHeight$lat, scale = FALSE)
fit = lm(loght ~ stemp * slat, data = plantHeight)
summary(fit)

Call:
lm(formula = loght ~ stemp * slat, data = plantHeight)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.97905 -0.45112  0.01062  0.42852  1.74054 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.694e-01  6.771e-02   6.932 7.78e-11 ***
stemp        2.838e-02  1.543e-02   1.839   0.0676 .  
slat        -7.700e-03  8.020e-03  -0.960   0.3383    
stemp:slat   7.738e-05  3.061e-04   0.253   0.8007    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6869 on 174 degrees of freedom
Multiple R-squared:  0.2504,    Adjusted R-squared:  0.2375 
F-statistic: 19.37 on 3 and 174 DF,  p-value: 6.95e-11

The type II ANOVA discards the shared fractions between the main effects. Note that because of the high collinearity, the R2 assigned to each predictor is extremely low.

print(car::Anova(fit, type = "II"))
Anova Table (Type II tests)

Response: loght
           Sum Sq  Df F value  Pr(>F)  
stemp       1.888   1  4.0014 0.04702 *
slat        0.421   1  0.8922 0.34620  
stemp:slat  0.030   1  0.0639 0.80074  
Residuals  82.089 174                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The type III ANOVA also discards the shared fractions between the interactions and main effects. For this reason, the main effects are even lower than in type II.

print(car::Anova(fit, type = "III"))
Anova Table (Type III tests)

Response: loght
            Sum Sq  Df F value    Pr(>F)    
(Intercept) 22.670   1 48.0522 7.782e-11 ***
stemp        1.596   1  3.3834   0.06756 .  
slat         0.435   1  0.9218   0.33834    
stemp:slat   0.030   1  0.0639   0.80074    
Residuals   82.089 174                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Generally, because of its order dependence under collinearity, you should be hesitant to run a type I ANOVA. However, there can be reasons to do it. For example, in this case, you might argue that temperature is a more plausible or mechanistic predictor than latitude. For that reason, you could say that you want to explain everything by temperature first, and then see if latitude can explain something on top of temperature. If that is your goal, you can also run a type I ANOVA.

summary(aov(fit))
             Df Sum Sq Mean Sq F value   Pr(>F)    
stemp         1  26.97  26.967  57.160 2.21e-12 ***
slat          1   0.42   0.421   0.892    0.346    
stemp:slat    1   0.03   0.030   0.064    0.801    
Residuals   174  82.09   0.472                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As a last add-on, let’s see what happens when we use the uncentered variables:

fit2 = lm(loght ~ temp * lat, data = plantHeight)
print(car::Anova(fit2, type = "II"))
Anova Table (Type II tests)

Response: loght
          Sum Sq  Df F value  Pr(>F)  
temp       1.888   1  4.0014 0.04702 *
lat        0.421   1  0.8922 0.34620  
temp:lat   0.030   1  0.0639 0.80074  
Residuals 82.089 174                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(car::Anova(fit2, type = "III"))
Anova Table (Type III tests)

Response: loght
            Sum Sq  Df F value Pr(>F)
(Intercept)  0.127   1  0.2696 0.6043
temp         0.764   1  1.6198 0.2048
lat          0.401   1  0.8490 0.3581
temp:lat     0.030   1  0.0639 0.8007
Residuals   82.089 174               

Note that in this case, the main effects drop more strongly between type II and type III, because there are stronger correlations between the main effects and the interaction. This is an example of a more general point: if you run type II / III ANOVAs, you should do everything you can to reduce correlations between the variables. For categorical variables, one way to do so is to use orthogonal contrasts.