Inference for regression

Prof. Maria Tackett

Sep 19, 2024

term	estimate	std.error	statistic
(Intercept)	19.332	2.984	6.478
enrollment_th	0.780	0.110	7.074
typePublic	-13.226	3.153	-4.195

Linear regression model

$\begin{array}{r} Y = X β + ϵ, ϵ \sim N (0, σ_{ϵ}^{2} I) \end{array}$

such that the errors are independent and normally distributed.

Independent: Knowing the error term for one observation doesn’t tell you anything about the error term for another observation
Normally distributed: Tell us the shape of the distribution of residuals

What else do we know about the distribution of the residuals based on this equation?

$y | X \sim N (X β, σ_{ϵ}^{2} I)$

Image source: Introduction to the Practice of Statistics (5th ed)

Let $b = [\begin{matrix} b_{1} \\ ⋮ \\ b_{p} \end{matrix}]$ be a $p \times 1$ vector of random variables.

Then $E (b) = E [\begin{matrix} b_{1} \\ ⋮ \\ b_{p} \end{matrix}] = [\begin{matrix} E (b_{1}) \\ ⋮ \\ E (b_{p}) \end{matrix}]$

Use this to find $E (y | X)$ .

$y | X \sim N (X β, σ_{ϵ}^{2} I)$

Linearity: There is a linear relationship between the response and predictor variables.
Constant Variance: The variability about the least squares line is generally constant.
Normality: The distribution of the residuals is approximately normal.
Independence: The residuals are independent from one another.

Magnitude of p-value	Interpretation
p-value < 0.01	strong evidence against $H_{0}$
0.01 < p-value < 0.05	moderate evidence against $H_{0}$
0.05 < p-value < 0.1	weak evidence against $H_{0}$
p-value > 0.1	effectively no evidence against $H_{0}$

Inference for regression Prof. Maria Tackett Sep 19, 2024