Model comparison

Prof. Maria Tackett

Oct 29, 2024

Announcements

  • HW 03 due Thursday at 11:59pm

  • Project: Exploratory data analysis due Thursday at 11:59pm

  • Looking ahead

    • Project presentations November 11

    • Statistics experience due Tuesday, November 26

Computing set up

# load packages
library(tidyverse)   # for data wrangling and visualization
library(tidymodels)  # for modeling; includes tidy() and glance()
library(knitr)       # for kable() tables
library(patchwork)   # for combining plots
library(kableExtra)  # for formatting tables

# set default theme in ggplot2
ggplot2::theme_set(ggplot2::theme_bw())

Topics

  • ANOVA for Multiple Linear Regression

  • Nested (Partial) F Test

  • AIC & BIC

Restaurant tips

What affects the amount customers tip at a restaurant?

  • Response:
    • Tip: amount of the tip
  • Predictors:
    • Party: number of people in the party
    • Meal: time of day (Lunch, Dinner, Late Night)
    • Age: age category of person paying the bill (Yadult, Middle, SenCit)

Response Variable

Predictor Variables

Response vs. Predictors

Restaurant tips: model

model1 <- lm(Tip ~ Party + Age, data = tips)
tidy(model1, conf.int = TRUE) |>
  kable(format = "markdown", digits = 3)
term         estimate  std.error  statistic  p.value  conf.low  conf.high
(Intercept)     0.838      0.397      2.112    0.036     0.055      1.622
Party           1.837      0.124     14.758    0.000     1.591      2.083
AgeSenCit       0.379      0.410      0.925    0.356    -0.430      1.189
AgeYadult      -1.009      0.408     -2.475    0.014    -1.813     -0.204
Is this the best model to explain variation in Tips?

Test for overall significance

Test for overall significance: Hypotheses

We can conduct a hypothesis test using the ANOVA table to determine if there is at least one non-zero coefficient in the model

\[ \begin{aligned} &H_0: \beta_1 = \dots = \beta_p = 0\\ &H_a: \beta_j \neq 0 \text{ for at least one }j \end{aligned} \]

For the tips data: \[ \begin{aligned} &H_0: \beta_1 = \beta_2 = \beta_3 = 0\\ &H_a: \beta_j \neq 0 \text{ for at least one }j \end{aligned} \]

Test for overall significance: Test statistic

Source      Df    Sum Sq    Mean Sq   F Stat   Pr(> F)
Model        3   1226.664   408.888   98.284   0
Residuals  165    686.444     4.160
Total      168   1913.108


Test statistic: Ratio of explained to unexplained variability

\[ F = \frac{\text{Mean Square Model}}{\text{Mean Square Residuals}} \]

The test statistic follows an \(F\) distribution with \(p\) and \(n - p - 1\) degrees of freedom

Test for overall significance: P-value

\[ \text{P-value} = \text{Pr}(F > \text{F Stat}) \]
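
As a quick check, the F statistic and p-value can be computed directly from the values in the ANOVA table for the tips model using pf(); a minimal sketch (the degrees of freedom 3 and 165 come from the table):

# F statistic: ratio of Mean Square Model to Mean Square Residuals
f_stat <- 408.888 / 4.16

# p-value: area under the F(3, 165) distribution to the right of the F statistic
pf(f_stat, df1 = 3, df2 = 165, lower.tail = FALSE)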

Test for overall significance: Conclusion

\[ \begin{aligned} &H_0: \beta_1 = \beta_2 = \beta_3 = 0\\ &H_a: \beta_j \neq 0 \text{ for at least one }j \end{aligned} \]

Source      Df    Sum Sq    Mean Sq   F Stat   Pr(> F)
Model        3   1226.664   408.888   98.284   0
Residuals  165    686.444     4.160
Total      168   1913.108


What is the conclusion from this hypothesis test?

Why use overall F test?

Why do we use the overall F test instead of just looking at the tests for individual coefficients?

Suppose we have a model such that \(p = 100\) and \(H_0: \beta_1 = \dots = \beta_{100} = 0\) is true

  • About 5% of the p-values for individual coefficients will be below 0.05 by chance.

  • So we expect to see about 5 small p-values even if no linear association actually exists.

  • Therefore, it is very likely we will see at least one small p-value by chance (see the simulation sketch below).

  • The F-test does not have this problem because it accounts for the number of predictors: there is only a 5% chance of getting a p-value below 0.05 if no linear relationship truly exists.
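
To see this in practice, we can simulate a response that has no relationship to any of 100 predictors and look at the individual p-values; a minimal sketch (the sample size and seed are arbitrary choices for illustration):

set.seed(1234)

n <- 500   # observations
p <- 100   # predictors, none related to the response

# generate predictors and response independently, so H_0 is true
sim_data <- as.data.frame(matrix(rnorm(n * p), nrow = n))
sim_data$y <- rnorm(n)

sim_fit <- lm(y ~ ., data = sim_data)

# proportion of individual coefficient p-values below 0.05 (roughly 5%)
indiv_pvalues <- summary(sim_fit)$coefficients[-1, "Pr(>|t|)"]
mean(indiv_pvalues < 0.05)

# overall F test p-value: small in only about 5% of repeated samples when H_0 is true
glance(sim_fit)$p.value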

Testing subset of coefficients

  • Sometimes we want to test whether a subset of coefficients are all equal to 0

  • This is often the case when we want to test

    • whether a categorical variable with \(k\) levels is a significant predictor of the response
    • whether the interaction between a categorical and quantitative variable is significant
  • To do so, we will use the Nested (Partial) F-test

Nested (Partial) F Test

  • Suppose we have a full and reduced model:

\[\begin{aligned}&\text{Full}: y = \beta_0 + \beta_1 x_1 + \dots + \beta_q x_q + \beta_{q+1} x_{q+1} + \dots + \beta_p x_p \\ &\text{Reduced}: y = \beta_0 + \beta_1 x_1 + \dots + \beta_q x_q\end{aligned}\]

  • We want to test whether any of the variables \(x_{q+1}, x_{q+2}, \ldots, x_p\) are significant predictors. To do so, we will test the hypothesis:

    \[\begin{aligned}&H_0: \beta_{q+1} = \beta_{q+2} = \dots = \beta_p = 0 \\ &H_a: \text{at least one }\beta_j \text{ is not equal to 0}\end{aligned}\]

Nested F Test

  • The test statistic for this test is

\[F = \frac{(SSR_{reduced} - SSR_{full})\big/\text{# predictors tested}}{SSR_{full}\big/(n-p_{full}-1)}\]

  • Calculate the p-value using the F distribution with df1 = # predictors tested and df2 = \((n-p_{full}-1)\)

Is Meal a significant predictor of tips?

term            estimate
(Intercept)        1.254
Party              1.808
AgeSenCit          0.390
AgeYadult         -0.505
MealLate Night    -1.632
MealLunch         -0.612

Tips: Nested F test

\[\begin{aligned}&H_0: \beta_{\text{Late Night}} = \beta_{\text{Lunch}} = 0\\ &H_a: \text{at least one }\beta_j \text{ is not equal to 0}\end{aligned}\]

reduced <- lm(Tip ~ Party + Age, data = tips)
full <- lm(Tip ~ Party + Age + Meal, data = tips)


# Nested F test in R
anova(reduced, full)

Tips: Nested F test

Res.Df   RSS       Df   Sum of Sq   F       Pr(>F)
   165   686.444
   163   622.979    2      63.465   8.303   0

F Stat: \(\frac{(686.444 - 622.979)/2}{622.979/(169 - 5 - 1)} = 8.303\)

P-value: Pr(F > 8.303) = 0.0003, calculated using an F distribution with 2 and 163 degrees of freedom
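
The same statistic and p-value can be reproduced by hand from the residual sums of squares in the ANOVA output; a quick sketch:

# nested F statistic from the reduced and full model residual sums of squares
f_stat <- ((686.444 - 622.979) / 2) / (622.979 / (169 - 5 - 1))

# p-value from the F distribution with 2 and 163 degrees of freedom
pf(f_stat, df1 = 2, df2 = 163, lower.tail = FALSE)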

The data provide sufficient evidence to conclude that at least one coefficient associated with Meal is not zero. Therefore, Meal is a significant predictor of Tips.

Model with Meal

term            estimate  std.error  statistic  p.value  conf.low  conf.high
(Intercept)        1.254      0.394      3.182    0.002     0.476      2.032
Party              1.808      0.121     14.909    0.000     1.568      2.047
AgeSenCit          0.390      0.394      0.990    0.324    -0.388      1.168
AgeYadult         -0.505      0.412     -1.227    0.222    -1.319      0.308
MealLate Night    -1.632      0.407     -4.013    0.000    -2.435     -0.829
MealLunch         -0.612      0.402     -1.523    0.130    -1.405      0.181

Including interactions

Does the effect of Party differ based on the Meal time?

term                  estimate
(Intercept)              1.276
Party                    1.795
AgeSenCit                0.401
AgeYadult               -0.470
MealLate Night          -1.845
MealLunch               -0.461
Party:MealLate Night     0.111
Party:MealLunch         -0.050

Nested F test for interactions

Let’s use a Nested F test to determine if Party*Meal is statistically significant.

reduced <- lm(Tip ~ Party + Age + Meal, data = tips)
full <- lm(Tip ~ Party + Age + Meal + Meal * Party, 
           data = tips)
kable(anova(reduced, full), format = "markdown", digits = 3) |>
  row_spec(2, background = "#dce5b2")
Res.Df   RSS       Df   Sum of Sq   F       Pr(>F)
   163   622.979
   161   621.965    2       1.014   0.131   0.877

Final model for now

We conclude that the effect of Party does not differ based on Meal. Therefore, we will use the original model that only includes the main effects.

term            estimate  std.error  statistic  p.value
(Intercept)        1.254      0.394      3.182    0.002
Party              1.808      0.121     14.909    0.000
AgeSenCit          0.390      0.394      0.990    0.324
AgeYadult         -0.505      0.412     -1.227    0.222
MealLate Night    -1.632      0.407     -4.013    0.000
MealLunch         -0.612      0.402     -1.523    0.130

Model comparison using AIC and BIC

Tips: Comparing models

Let’s compare two models:

model1 <- lm(Tip ~ Party + Age + Meal, data = tips)
glance(model1) |> select(r.squared, adj.r.squared)
# A tibble: 1 × 2
  r.squared adj.r.squared
      <dbl>         <dbl>
1     0.674         0.664


model2 <- lm(Tip ~ Party + Age + Meal + Day, data = tips)
glance(model2) |> select(r.squared, adj.r.squared)
# A tibble: 1 × 2
  r.squared adj.r.squared
      <dbl>         <dbl>
1     0.683         0.662

AIC & BIC

Akaike’s Information Criterion (AIC): \[AIC = n\log(SSR) + 2(p+1)\]

Schwarz’s Bayesian Information Criterion (BIC): \[BIC = n\log(SSR) + \log(n)\times(p+1)\]

AIC & BIC

\[\begin{aligned} & AIC = \color{blue}{n\log(SSR)} \color{black}{ + 2(p+1)} \\ & BIC = \color{blue}{n\log(SSR)} \color{black}{+ \log(n)\times(p+1) }\end{aligned}\]


First Term: Generally decreases as p increases, since adding predictors cannot increase SSR

AIC & BIC

\[\begin{aligned} & AIC = n\log(SSR) + \color{blue}{2(p+1)} \\ & BIC = n\log(SSR) + \color{blue}{\log(n)\times(p+1)} \end{aligned}\]


Second Term: Increases as p increases; this term penalizes model complexity

Using AIC & BIC

\[\begin{aligned} & AIC = n\log(SSR) + \color{red}{2(p+1)} \\ & BIC = n\log(SSR) + \color{red}{\log(n)\times(p+1)} \end{aligned}\]

  • Choose the model with the smaller value of AIC or BIC (see the sketch below for computing them from a fitted model)

  • If \(n \geq 8\), the penalty for BIC is larger than that of AIC, so BIC tends to favor more parsimonious models (i.e. models with fewer terms)
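
Given these definitions, AIC and BIC can be computed directly from a fitted model. The sketch below uses a helper function, aic_from_ssr(), written for illustration; note that R's built-in AIC() and BIC() (and glance()) are based on the log-likelihood, so their values differ from these formulas by an additive shift that depends only on n and therefore does not change which model is preferred.

# AIC and BIC computed from the formulas above (illustrative helper, not base R)
aic_from_ssr <- function(model) {
  ssr <- sum(resid(model)^2)       # sum of squared residuals
  n   <- length(resid(model))      # number of observations
  p   <- length(coef(model)) - 1   # number of predictor terms
  c(AIC = n * log(ssr) + 2 * (p + 1),
    BIC = n * log(ssr) + log(n) * (p + 1))
}

aic_from_ssr(lm(Tip ~ Party + Age + Meal, data = tips))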

Tips: AIC & BIC

model1 <- lm(Tip ~ Party + Age + Meal, data = tips)
glance(model1) |> select(AIC, BIC)
# A tibble: 1 × 2
    AIC   BIC
  <dbl> <dbl>
1  714.  736.


model2 <- lm(Tip ~ Party + Age + Meal + Day, data = tips)
glance(model2) |> select(AIC, BIC)
# A tibble: 1 × 2
    AIC   BIC
  <dbl> <dbl>
1  720.  757.

Which model do you choose?

Parsimony and Occam’s razor

  • The principle of parsimony is attributed to William of Occam (early 14th-century English nominalist philosopher), who insisted that, given a set of equally good explanations for a given phenomenon, the correct explanation is the simplest explanation.

  • Called Occam’s razor because he “shaved” his explanations down to the bare minimum

  • Parsimony in modeling:

    • models should have as few parameters as possible

    • linear models should be preferred to non-linear models

    • experiments relying on few assumptions should be preferred to those relying on many

    • models should be pared down until they are minimal adequate

    • simple explanations should be preferred to complex explanations

In pursuit of Occam’s razor

  • Occam’s razor states that among competing hypotheses that predict equally well, the one with the fewest assumptions should be selected

  • Model selection follows this principle

  • We only want to add another variable to the model if the addition of that variable brings something valuable in terms of predictive power to the model

  • In other words, we prefer the simplest best model, i.e. parsimonious model

Alternate views

Sometimes a simple model will outperform a more complex model . . . Nevertheless, I believe that deliberately limiting the complexity of the model is not fruitful when the problem is evidently complex. Instead, if a simple model is found that outperforms some particular complex model, the appropriate response is to define a different complex model that captures whatever aspect of the problem led to the simple model performing well.


Radford Neal, Bayesian Learning for Neural Networks

Recap

  • ANOVA for Multiple Linear Regression

  • Nested F Test

  • AIC & BIC