Inference for regression

cont’d

Prof. Maria Tackett

Sep 24, 2024

Announcements

  • Project

    • Research questions due Thursday at 11:59pm

    • Proposal due Thursday, October 3 at 11:59pm

  • Lab 03 due Thursday, October 3 at 11:59pm

  • Statistics experience due Tue, Nov 26 at 11:59pm

Topics

  • Understand statistical inference in the context of regression

  • Describe the assumptions for regression

  • Understand connection between distribution of residuals and inferential procedures

  • Conduct inference on a single coefficient

  • Conduct inference on the overall regression model

Computing setup

# load packages
library(tidyverse)   # data wrangling and visualization
library(tidymodels)  # modeling and inference
library(knitr)       # formatted tables
library(kableExtra)  # enhanced table styling
library(patchwork)   # combining plots

# set default theme in ggplot2
ggplot2::theme_set(ggplot2::theme_bw())

Data: NCAA Football expenditures

Today’s data come from Equity in Athletics Data Analysis and include information about sports expenditures and revenues for colleges and universities in the United States. This data set was featured in a March 2022 Tidy Tuesday.

We will focus on the 2019 - 2020 season expenditures on football for institutions in the NCAA Division I FBS. The variables are:

  • total_exp_m: Total expenditures on football in the 2019 - 2020 academic year (in millions USD)

  • enrollment_th: Total student enrollment in the 2019 - 2020 academic year (in thousands)

  • type: institution type (Public or Private)

football <- read_csv("data/ncaa-football-exp.csv")
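A quick check of the variables after loading (a minimal sketch; glimpse() prints each column’s type and first few values):

# preview the variables described above
glimpse(football)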

Univariate EDA

Bivariate EDA

Regression model

exp_fit <- lm(total_exp_m ~ enrollment_th + type, data = football)
tidy(exp_fit) |>
  kable(digits = 3)
term           estimate  std.error  statistic  p.value
(Intercept)      19.332      2.984      6.478        0
enrollment_th     0.780      0.110      7.074        0
typePublic      -13.226      3.153     -4.195        0


For every additional 1,000 students, we expect the institution’s total expenditures on football to increase by $780,000, on average, holding institution type constant.

Inference for regression

Statistical inference

  • Statistical inference provides methods and tools so we can use the single observed sample to make valid statements (inferences) about the population it comes from

  • For our inferences to be valid, the sample should be representative (ideally random) of the population we’re interested in

Image source: Eugene Morgan © Penn State

Inference for linear regression

  • Inference based on ANOVA

    • Hypothesis test for the statistical significance of the overall regression model

    • Hypothesis test for a subset of coefficients

  • Inference for a single coefficient \(\beta_j\)

    • Hypothesis test for a coefficient \(\beta_j\)

    • Confidence interval for a coefficient \(\beta_j\)

Linear regression model

\[ \begin{aligned} \mathbf{y} &= \text{Model} + \text{Error} \\[5pt] &= f(\mathbf{X}) + \boldsymbol{\epsilon} \\[5pt] &= E(\mathbf{y}|\mathbf{X}) + \boldsymbol{\epsilon} \\[5pt] &= \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \end{aligned} \]

  • We have discussed multiple ways to find the least squares estimates of \(\boldsymbol{\beta} = \begin{bmatrix}\beta_0 \\ \vdots \\ \beta_p\end{bmatrix}\)

    • None of these approaches depend on the distribution of \(\boldsymbol{\epsilon}\)
  • Now we will use statistical inference to draw conclusions about \(\boldsymbol{\beta}\) that depend on particular assumptions about the distribution of \(\boldsymbol{\epsilon}\)
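One of those approaches is the matrix solution \(\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\); a minimal sketch in R, which should match coef(exp_fit):

# design matrix and response for the football model
X <- model.matrix(total_exp_m ~ enrollment_th + type, data = football)
y <- football$total_exp_m

# least squares estimates via the normal equations
solve(t(X) %*% X) %*% t(X) %*% y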

Linear regression model

\[ \mathbf{y}|\mathbf{X} \sim N(\mathbf{X}\boldsymbol{\beta}, \sigma_\epsilon^2\mathbf{I}) \]

Image source: Introduction to the Practice of Statistics (5th ed)

Expected value of \(\mathbf{y}\)

Let \(\mathbf{b} = \begin{bmatrix}b_1 \\ \vdots \\b_p\end{bmatrix}\) be a \(p \times 1\) vector of random variables.


Then \(E(\mathbf{b}) = E\begin{bmatrix}b_1 \\ \vdots \\ b_p\end{bmatrix} = \begin{bmatrix}E(b_1) \\ \vdots \\ E(b_p)\end{bmatrix}\)


Use this to find \(E(\mathbf{y}|\mathbf{X})\).
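A sketch of the derivation, using \(E(\boldsymbol{\epsilon}) = \mathbf{0}\) and the fact that \(\mathbf{X}\boldsymbol{\beta}\) is constant given \(\mathbf{X}\):

\[ E(\mathbf{y}|\mathbf{X}) = E(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}) = \mathbf{X}\boldsymbol{\beta} + E(\boldsymbol{\epsilon}) = \mathbf{X}\boldsymbol{\beta} \]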

Variance

Let \(\mathbf{b} = \begin{bmatrix}b_1 \\ \vdots \\b_p\end{bmatrix}\) be a \(p \times 1\) vector of independent random variables.


Then \(Var(\mathbf{b}) = \begin{bmatrix}Var(b_1) & 0 & \dots & 0 \\ 0 & Var(b_2) & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & Var(b_p)\end{bmatrix}\)


Use this to find \(Var(\mathbf{y}|\mathbf{X})\).
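A sketch of the derivation, assuming the errors are independent with common variance \(\sigma^2_\epsilon\) (so the off-diagonal entries are 0) and that \(\mathbf{X}\boldsymbol{\beta}\) is constant given \(\mathbf{X}\):

\[ Var(\mathbf{y}|\mathbf{X}) = Var(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}) = Var(\boldsymbol{\epsilon}) = \sigma^2_\epsilon\mathbf{I} \]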

Assumptions of regression

\[ \mathbf{y}|\mathbf{X} \sim N(\mathbf{X}\boldsymbol{\beta}, \sigma_\epsilon^2\mathbf{I}) \]

Image source: Introduction to the Practice of Statistics (5th ed)
  1. Linearity: There is a linear relationship between the response and predictor variables.
  2. Constant Variance: The variability about the least squares line is generally constant.
  3. Normality: The distribution of the residuals is approximately normal.
  4. Independence: The residuals are independent from one another.

Estimating \(\sigma^2_{\epsilon}\)

  • Once we fit the model, we can use the residuals to estimate \(\sigma_{\epsilon}^2\)

  • \(\hat{\sigma}^2_{\epsilon}\) is needed for hypothesis testing and constructing confidence intervals for regression

\[ \hat{\sigma}^2_\epsilon = \frac{\sum\limits_{i=1}^n(y_i - \hat{y}_i)^2}{n-p-1} = \frac{\sum\limits_{i=1}^ne_i^2}{n - p - 1} = \frac{SSR}{n - p - 1} \]

  • The regression standard error \(\hat{\sigma}_{\epsilon}\) is a measure of the average distance between the observations and regression line

\[ \hat{\sigma}_\epsilon = \sqrt{\frac{SSR}{n - p - 1}} \]
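A minimal sketch of this calculation for the football model; the same value is reported in the sigma column of glance():

# regression standard error computed from the residuals
n <- nrow(football)
p <- 2   # number of predictors in the model
sqrt(sum(resid(exp_fit)^2) / (n - p - 1))

# same value from the model summary
glance(exp_fit)$sigma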

Inference for a single coefficient

Inference for \(\beta_j\)

We often want to conduct inference on individual model coefficients

  • Hypothesis test: Is there a linear relationship between the response and \(x_j\)?

  • Confidence interval: What is a plausible range of values \(\beta_j\) can take?

But first we need to understand the distribution of \(\hat{\beta}_j\)

Sampling distribution of \(\hat{\beta}_j\)

\[ \hat{\boldsymbol{\beta}} \sim N(\boldsymbol{\beta}, \sigma^2_\epsilon(\mathbf{X}^T\mathbf{X})^{-1}) \]

Let \(\mathbf{C} = (\mathbf{X}^T\mathbf{X})^{-1}\). Then, for each coefficient \(\hat{\beta}_j\),

  • \(E(\hat{\beta}_j) = \beta_j\), the \(j^{th}\) element of \(\boldsymbol{\beta}\)

  • \(Var(\hat{\beta}_j) = \sigma^2_{\epsilon}C_{jj}\)

  • \(Cov(\hat{\beta}_i, \hat{\beta}_j) = \sigma^2_{\epsilon}C_{ij}\)
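In R, vcov() returns the estimated matrix \(\hat{\sigma}^2_\epsilon\mathbf{C}\) for a fitted model; a quick sketch (the square roots of the diagonal should match the std.error column from tidy()):

# estimated variance-covariance matrix of the coefficients
vcov(exp_fit)

# standard errors: square roots of the diagonal entries
sqrt(diag(vcov(exp_fit)))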

Hypothesis test for \(\beta_j\)

Steps for a hypothesis test

  1. State the null and alternative hypotheses.
  2. Calculate a test statistic.
  3. Calculate the p-value.
  4. State the conclusion.

Hypothesis test for \(\beta_j\): Hypotheses

We will generally test the hypotheses:

  • Null Hypothesis: \(H_0: \beta_j = 0\)

    • There is no linear relationship between \(x_j\) and \(y\) after accounting for the other variables in the model
  • Alternative hypothesis: \(H_a: \beta_j \neq 0\)

    • There is a linear relationship between \(x_j\) and \(y\) after accounting for the other variables in the model

Hypothesis test for \(\beta_j\): Test statistic

Test statistic: Number of standard errors the estimate is away from the null hypothesized value

\[ \text{Test Statistic} = \frac{\text{Estimate} - \text{Null}}{\text{Standard error}} \]


\[T = \frac{\hat{\beta}_j - 0}{SE(\hat{\beta}_j)} ~ = ~\frac{\hat{\beta}_j - 0}{\sqrt{\hat{\sigma}^2_\epsilon C_{jj}}} ~\sim ~ t_{n-p-1} \]

Hypothesis test for \(\beta_j\): P-value

The p-value is the probability of observing a test statistic at least as far from the null value as the one observed (in the direction of the alternative hypothesis), given the null hypothesis is true

\[ \text{p-value} = P(|t| > |\text{test statistic}|), \]

calculated from a \(t\) distribution with \(n - p - 1\) degrees of freedom
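A quick sketch of this calculation for enrollment_th, using the test statistic \(t = 7.074\) with \(n - p - 1 = 124\) degrees of freedom from the tidy() output earlier (the result is effectively 0, matching the table):

# two-sided p-value for enrollment_th
2 * pt(7.074, df = 124, lower.tail = FALSE)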

Understanding the p-value

Magnitude of p-value     Interpretation
p-value < 0.01           strong evidence against \(H_0\)
0.01 < p-value < 0.05    moderate evidence against \(H_0\)
0.05 < p-value < 0.1     weak evidence against \(H_0\)
p-value > 0.1            effectively no evidence against \(H_0\)

These are general guidelines. The strength of evidence depends on the context of the problem.

Hypothesis test for \(\beta_j\): Conclusion

There are two parts to the conclusion

  • Make a conclusion by comparing the p-value to a predetermined decision-making threshold called the significance level (\(\alpha\) level)

    • If \(\text{p-value} < \alpha\): Reject \(H_0\)

    • If \(\text{p-value} \geq \alpha\): Fail to reject \(H_0\)

  • State the conclusion in the context of the data

Application exercise

Confidence interval for \(\beta_j\)

Confidence interval for \(\beta_j\)

  • A plausible range of values for a population parameter is called a confidence interval

  • Using only a single point estimate is like fishing in a murky lake with a spear, and using a confidence interval is like fishing with a net

    • We can throw a spear where we saw a fish, but we will probably miss; if we toss a net in that area, we have a good chance of catching the fish

    • Similarly, if we report a point estimate, we probably will not hit the exact population parameter, but if we report a range of plausible values we have a good shot at capturing the parameter

What “confidence” means

  • We will construct \(C\%\) confidence intervals.

    • The confidence level impacts the width of the interval


  • “Confident” means if we were to take repeated samples of the same size as our data, fit regression lines using the same predictors, and calculate \(C\%\) CIs for the coefficient of \(x_j\), then \(C\%\) of those intervals will contain the true value of the coefficient \(\beta_j\)


  • Balance precision and accuracy when selecting a confidence level

Confidence interval for \(\beta_j\)

\[ \text{Estimate} \pm \text{ (critical value) } \times \text{SE} \]


\[ \hat{\beta}_j \pm t^* \times SE(\hat{\beta}_j) \]

where \(t^*\) is calculated from a \(t\) distribution with \(n-p-1\) degrees of freedom

Confidence interval: Critical value

# confidence level: 95%
qt(0.975, df = nrow(football) - 2 - 1)
[1] 1.97928


# confidence level: 90%
qt(0.95, df = nrow(football) - 2 - 1)
[1] 1.657235


# confidence level: 99%
qt(0.995, df = nrow(football) - 2 - 1)
[1] 2.61606

95% CI for \(\beta_j\): Calculation

term           estimate  std.error  statistic  p.value
(Intercept)      19.332      2.984      6.478        0
enrollment_th     0.780      0.110      7.074        0
typePublic      -13.226      3.153     -4.195        0
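A sketch of the calculation for enrollment_th by hand, using the rounded estimate and standard error from the table (the result is close to the interval from tidy() on the next slide, up to rounding):

# 95% CI for the coefficient of enrollment_th
t_star <- qt(0.975, df = nrow(football) - 2 - 1)
0.780 + c(-1, 1) * t_star * 0.110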

95% CI for \(\beta_j\) in R

tidy(exp_fit, conf.int = TRUE, conf.level = 0.95) |> 
  kable(digits = 3)
term           estimate  std.error  statistic  p.value  conf.low  conf.high
(Intercept)      19.332      2.984      6.478        0    13.426     25.239
enrollment_th     0.780      0.110      7.074        0     0.562      0.999
typePublic      -13.226      3.153     -4.195        0   -19.466     -6.986


Interpretation: We are 95% confident that for each additional 1,000 students enrolled, the institution’s total expenditures on football increase by $562,000 to $999,000, on average, holding institution type constant.

Test for overall significance

Test for overall significance: Hypotheses

We can conduct a hypothesis test using the ANOVA table to determine if there is at least one non-zero coefficient in the model

\[ \begin{aligned} &H_0: \beta_1 = \dots = \beta_p = 0\\ &H_a: \beta_j \neq 0 \text{ for at least one }j \end{aligned} \]

For the football data

\[ \begin{aligned} &H_0: \beta_1 = \beta_2 = 0\\ &H_a: \beta_j \neq 0 \text{ for at least one }j \end{aligned} \]

Test for overall significance: Test statistic

Source      Df     Sum Sq   Mean Sq   F Stat  Pr(>F)
Model        2   7138.591  3569.296   26.628       0
Residuals  124  16621.344   134.043
Total      126  23759.935


Test statistic: Ratio of explained to unexplained variability

\[ F = \frac{\text{Mean Square Model}}{\text{Mean Square Residuals}} \]

The test statistic follows an \(F\) distribution with \(p\) and \(n - p - 1\) degrees of freedom
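A quick check of the F statistic, plugging in the sums of squares from the table above (the football_anova object printed on a later slide is assumed to hold this table):

# mean squares from the ANOVA table
ms_model <- 7138.591 / 2       # df = p = 2
ms_resid <- 16621.344 / 124    # df = n - p - 1 = 124

# F statistic: approximately 26.628, matching the table
ms_model / ms_resid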

Test for overall significance: P-value

\[ \text{P-value} = \text{Pr}(F > \text{F Stat}) \]
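A sketch of this calculation with pf(), using the F statistic and degrees of freedom from the table above (the result is effectively 0):

# upper-tail probability from the F distribution
pf(26.628, df1 = 2, df2 = 124, lower.tail = FALSE)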

Test for overall significance: Conclusion

\[ \begin{aligned} &H_0: \beta_1 = \beta_2 = 0\\ &H_a: \beta_j \neq 0 \text{ for at least one }j \end{aligned} \]

football_anova |>
  kable(digits = 3)
Source      Df     Sum Sq   Mean Sq   F Stat  Pr(>F)
Model        2   7138.591  3569.296   26.628       0
Residuals  124  16621.344   134.043
Total      126  23759.935

What is the conclusion from this hypothesis test?

Recap

  • Introduced statistical inference in the context of regression

  • Described the assumptions for regression

  • Connected the distribution of residuals and inferential procedures

  • Conducted inference on a single coefficient

  • Conducted inference on the overall regression model