Sep 05, 2024
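When we have a quantitative response, $Y$, and a single quantitative predictor, $X$, we can use a simple linear regression model to describe the relationship between them:
$$Y = \beta_0 + \beta_1 X + \epsilon$$
Suppose we have $n$ observations, so that $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ for $i = 1, \ldots, n$. Stacking these equations gives the matrix form $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$:
$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{bmatrix}$$
What are the dimensions of $\mathbf{y}$, $\mathbf{X}$, $\boldsymbol{\beta}$, and $\boldsymbol{\epsilon}$?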
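We use the sum of squared residuals (also called “sum of squared errors”) to find the least squares line:
$$SSR = \sum_{i=1}^{n} e_i^2 = \mathbf{e}^T\mathbf{e}$$
where $\mathbf{e} = \mathbf{y} - \mathbf{X}\boldsymbol{\beta}$ is the vector of residuals.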
What is the dimension of SSR?
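One way to explore this question is to compute the SSR with matrix operations and inspect its dimensions. The sketch below uses made-up toy values for `x` and `y` and an arbitrarily chosen coefficient vector `beta`; none of these come from the lecture data.

```r
# Toy data (hypothetical values, for illustration only)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Design matrix: a column of 1s for the intercept, then the predictor
X <- cbind(1, x)

# A candidate coefficient vector (intercept = 0, slope = 2), chosen arbitrarily
beta <- c(0, 2)

# Residual vector e = y - X beta, stored as an n x 1 matrix
e <- y - X %*% beta

# SSR = e^T e
ssr <- t(e) %*% e

dim(e)    # 5 1
dim(ssr)  # 1 1
```

The product of a $1 \times n$ row vector and an $n \times 1$ column vector is a $1 \times 1$ matrix, so the SSR is a scalar.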
What is
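We want to find values of $\boldsymbol{\beta} = \begin{bmatrix} \beta_0 & \beta_1 \end{bmatrix}^T$ that minimize the sum of squared residuals:
$$\mathbf{e}^T\mathbf{e} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = \mathbf{y}^T\mathbf{y} - 2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}$$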
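The least squares estimators must satisfy the condition that the gradient of $SSR = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$ with respect to $\boldsymbol{\beta}$ is zero:
$$\nabla_{\boldsymbol{\beta}} SSR = -2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{0}$$
Solving, and assuming $\mathbf{X}^T\mathbf{X}$ is invertible, gives
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
The Hessian, $\nabla^2_{\boldsymbol{\beta}} SSR = 2\mathbf{X}^T\mathbf{X}$, is positive definite when the columns of $\mathbf{X}$ are linearly independent. Therefore we have found the minimizing point.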
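The two conditions for a minimum (zero gradient, positive definite Hessian) can be checked numerically. This is a minimal sketch on made-up toy data; the variable names `gradient`, `hessian`, and `eigenvalues` are illustrative and not from the lecture.

```r
# Toy data (hypothetical values, for illustration only)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
X <- cbind(1, x)

# Least squares estimate from the normal equations: (X^T X) beta_hat = X^T y
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# Gradient of SSR at beta_hat: -2 X^T y + 2 X^T X beta_hat (numerically ~0)
gradient <- -2 * t(X) %*% y + 2 * t(X) %*% X %*% beta_hat

# Hessian of SSR: 2 X^T X; all-positive eigenvalues mean it is positive definite
hessian <- 2 * t(X) %*% X
eigenvalues <- eigen(hessian, symmetric = TRUE)$values

max(abs(gradient))    # should be close to zero
all(eigenvalues > 0)  # TRUE for these toy values
```

Here `solve(A, b)` solves the linear system directly, which avoids forming the inverse explicitly.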
Let’s go back to the Duke Forest data. We want to use the matrix representation to fit a model of the form:
$$price = \beta_0 + \beta_1 \times area + \epsilon$$
Use the `model.matrix()` function to get $\mathbf{X}$.
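As a rough illustration of what `model.matrix()` returns for a one-predictor model, the sketch below builds the design matrix from a small made-up data frame. The `homes` data frame and its values are hypothetical stand-ins for `duke_forest`, used so the example runs without extra packages.

```r
# Toy data frame (hypothetical values standing in for duke_forest)
homes <- data.frame(
  area  = c(1500, 2000, 2500, 3000),
  price = c(350000, 420000, 510000, 600000)
)

# Design matrix for price ~ area: an intercept column of 1s and the predictor
X <- model.matrix(price ~ area, data = homes)

# Response vector
y <- homes$price

dim(X)       # n rows (here 4) and 2 columns
colnames(X)  # "(Intercept)" "area"
```

With the real data, the same call with `data = duke_forest` should give one row per house.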
Matrix functions in R. Let `A` and `B` be matrices:
- `t(A)`: transpose of `A`
- `solve(A)`: inverse of `A`
- `A %*% B`: multiply `A` and `B`
The model fit with `lm()`:
```r
duke_forest_model <- lm(price ~ area, data = duke_forest)
tidy(duke_forest_model) |> kable(digits = 3)
```
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 116652.325 | 53302.463 | 2.188 | 0.031 |
area | 159.483 | 18.171 | 8.777 | 0.000 |
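The estimates from `lm()` should match the closed-form least squares estimate $(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ computed with `t()`, `solve()`, and `%*%`. The sketch below checks this on a small made-up data frame (`homes`, with hypothetical values) rather than on `duke_forest`, so the numbers will not match the table above.

```r
# Toy data frame (hypothetical values standing in for duke_forest)
homes <- data.frame(
  area  = c(1500, 2000, 2500, 3000),
  price = c(350000, 420000, 510000, 600000)
)

# Design matrix and response vector
X <- model.matrix(price ~ area, data = homes)
y <- homes$price

# Matrix formula: beta_hat = (X^T X)^{-1} X^T y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y

# Same model fit with lm()
fit <- lm(price ~ area, data = homes)

# Compare the two sets of estimates, allowing for small numerical differences
all.equal(as.numeric(beta_hat), unname(coef(fit)), tolerance = 1e-6)
```

Replacing `homes` with `duke_forest` should reproduce the intercept and slope reported in the table, up to rounding.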
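Now that we have $\hat{\boldsymbol{\beta}}$, we can compute the predicted values:
$$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = \mathbf{H}\mathbf{y}$$
Hat matrix: $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$
Recall that the residuals are the difference between the observed and predicted values:
$$\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{H}\mathbf{y} = (\mathbf{I} - \mathbf{H})\mathbf{y}$$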
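The hat matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ maps the observed responses to the predicted values, and $(\mathbf{I} - \mathbf{H})\mathbf{y}$ gives the residuals. The following is a minimal sketch on made-up toy data that compares these matrix results with `fitted()` and `resid()` from `lm()`; the data values are hypothetical.

```r
# Toy data (hypothetical values, for illustration only)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
X <- cbind(1, x)

# Hat matrix: H = X (X^T X)^{-1} X^T
H <- X %*% solve(t(X) %*% X) %*% t(X)

# Predicted values: y_hat = H y
y_hat <- H %*% y

# Residuals: e = (I - H) y
e <- (diag(nrow(X)) - H) %*% y

# Compare with lm()
fit <- lm(y ~ x)
all.equal(as.numeric(y_hat), unname(fitted(fit)))
all.equal(as.numeric(e), unname(resid(fit)))

# H is a projection matrix, so applying it twice changes nothing
max(abs(H %*% H - H))  # should be close to zero
```

The diagonal entries of `H` sum to the number of columns of `X` (here 2), which can be checked with `sum(diag(H))`.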
Multiple linear regression
See Sep 10 prepare