SLR: Matrix representation

Prof. Maria Tackett

Sep 05, 2024

Topics

  • Matrix representation for simple linear regression
    • Model form
    • Least squares estimates
    • Predicted (fitted) values
    • Residuals
  • Matrix representation in R

Matrix representation of simple linear regression

SLR: Statistical model (population)

When we have a quantitative response, \(Y\), and a single quantitative predictor, \(X\), we can use a simple linear regression model to describe the relationship between \(Y\) and \(X\). \[\large{Y = \beta_0 + \beta_1 X + \epsilon}, \hspace{8mm} \epsilon \sim N(0, \sigma_{\epsilon}^2)\]


  • \(\beta_1\): Population (true) slope of the relationship between \(X\) and \(Y\)
  • \(\beta_0\): Population (true) intercept of the relationship between \(X\) and \(Y\)
  • \(\epsilon\): Error, assumed to be Normally distributed with mean 0 and variance \(\sigma_{\epsilon}^2\); see the simulation sketch below
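
To make the model concrete, here is a minimal simulation sketch. The values \(\beta_0 = 2\), \(\beta_1 = 3\), and \(\sigma_{\epsilon} = 1\) are hypothetical choices for illustration, not estimates from any data.

# simulate n = 100 observations from Y = beta_0 + beta_1 X + epsilon
set.seed(20240905)
n_sim <- 100
x_sim <- runif(n_sim, min = 0, max = 10)
epsilon_sim <- rnorm(n_sim, mean = 0, sd = 1)  # epsilon ~ N(0, sigma^2 = 1)
y_sim <- 2 + 3 * x_sim + epsilon_sim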

SLR in matrix form

Suppose we have \(n\) observations.

\[ \underbrace{ \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} }_ {\mathbf{y}} \hspace{3mm} = \hspace{3mm} \underbrace{ \begin{bmatrix} 1 &x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} }_{\mathbf{X}} \hspace{2mm} \underbrace{ \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} }_{\boldsymbol{\beta}} \hspace{3mm} + \hspace{3mm} \underbrace{ \begin{bmatrix} \epsilon_1 \\ \vdots\\ \epsilon_n \end{bmatrix} }_\boldsymbol{\epsilon} \]


What are the dimensions of \(\mathbf{y}\), \(\mathbf{X}\), \(\boldsymbol{\beta}\), and \(\boldsymbol{\epsilon}\)?
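
As a quick dimension check: \(\mathbf{y}\) is \(n \times 1\), \(\mathbf{X}\) is \(n \times 2\), \(\boldsymbol{\beta}\) is \(2 \times 1\), and \(\boldsymbol{\epsilon}\) is \(n \times 1\). A minimal sketch with a made-up toy example (\(n = 3\)):

# toy example, n = 3 (numbers are made up for illustration)
y_toy <- c(2.1, 3.9, 6.2)
X_toy <- cbind(1, c(1, 2, 3))  # column of 1s, then the predictor
dim(X_toy)                     # 3 rows, 2 columns
length(y_toy)                  # 3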

Sum of squared residuals

We use the sum of squared residuals (also called the “sum of squared errors”) to find the least squares line:

\[ SSR = \sum_{i=1}^{n} e_i^2 = \mathbf{e}^T\mathbf{e} = (\mathbf{y} - \hat{\mathbf{y}})^T(\mathbf{y} - \hat{\mathbf{y}}) \]


  • What is the dimension of SSR?

  • What is \(\hat{\mathbf{y}}\) in terms of \(\mathbf{y}\), \(\mathbf{X}\), and/or \(\boldsymbol{\beta}\) ?
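
Since \(\mathbf{e}\) is \(n \times 1\), the product \(\mathbf{e}^T\mathbf{e}\) is \((1 \times n)(n \times 1) = 1 \times 1\): the SSR is a scalar. A sketch of the computation, continuing the toy example above with a hypothetical candidate \(\boldsymbol{\beta}\):

# SSR for a hypothetical candidate beta = (0, 2)
beta_toy <- c(0, 2)
e_toy <- y_toy - X_toy %*% beta_toy  # residual vector, n x 1
t(e_toy) %*% e_toy                   # 1 x 1 matrix, i.e., the scalar SSR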

Minimize sum of squared residuals

We want to find values of \(\boldsymbol{\beta} = \begin{bmatrix}\beta_0 \\ \beta_1 \end{bmatrix}\) that minimize the sum of squared residuals \[ \begin{aligned} \mathbf{e}^T\mathbf{e} &= (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \\[10pt] &= (\mathbf{y}^T - \boldsymbol{\beta}^T\mathbf{X}^T)(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\\[10pt] &=\mathbf{y}^T\mathbf{y} - \mathbf{y}^T\mathbf{X}\boldsymbol{\beta} - \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}\\[10pt] &=\mathbf{y}^T\mathbf{y} - 2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} \end{aligned} \]

The last step uses the fact that \(\mathbf{y}^T\mathbf{X}\boldsymbol{\beta}\) is a scalar (\(1 \times 1\)), so it equals its own transpose \(\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y}\).

Least squares estimators

\[ SSR = \mathbf{e}^T\mathbf{e} =\mathbf{y}^T\mathbf{y} - 2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} \]


The least squares estimators must satisfy

\[ \nabla_{\boldsymbol{\beta}} SSR = -2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = 0 \]


Rearranging gives the normal equations \(\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\mathbf{y}\). When \(\mathbf{X}\) is full rank, \(\mathbf{X}^T\mathbf{X}\) is invertible, and solving for \(\boldsymbol{\beta}\) yields

\[ \color{#993399}{\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}} \]

Did we find a minimum?

\[ \nabla^2_{\boldsymbol{\beta}} SSR = 2\mathbf{X}^T\mathbf{X} \]


  • \(\mathbf{X}\) is full rank \(\Rightarrow\) \(\mathbf{X}^T\mathbf{X}\) is positive definite

  • Therefore \(\hat{\boldsymbol{\beta}}\) is the unique minimizer of the sum of squared residuals (a quick numeric check is sketched below)
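
A quick numeric check of positive definiteness, sketched with the toy design matrix from earlier (any full-rank \(\mathbf{X}\) behaves the same way):

# eigenvalues of t(X) %*% X are all positive when X is full rank
eigen(t(X_toy) %*% X_toy)$values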

Matrix representation in R

Obtain \(\mathbf{y}\) vector

Let’s go back to the Duke Forest data. We want to use the matrix representation to fit a model of the form:

\[ price = \beta_0 + \beta_1 ~ area + \epsilon, \hspace{5mm} \epsilon \sim N(0, \sigma^2_\epsilon) \]

Get \(\mathbf{y}\), the vector of responses

y <- duke_forest$price


Let’s look at the first 10 observations of \(\mathbf{y}\)

y[1:10]
 [1] 1520000 1030000  420000  680000  428500  456000 1270000  557450  697500
[10]  650000

Obtain \(\mathbf{X}\) matrix

Use the model.matrix() function to get \(\mathbf{X}\)

X <- model.matrix(price ~ area, data = duke_forest)


Let’s look at the first 10 rows of \(\mathbf{X}\)

X[1:10,]
   (Intercept) area
1            1 6040
2            1 4475
3            1 1745
4            1 2091
5            1 1772
6            1 1950
7            1 3909
8            1 2841
9            1 3924
10           1 2173
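
As a sanity check before computing \(\hat{\boldsymbol{\beta}}\), the dimensions should match the matrix form above:

dim(X)     # n rows, 2 columns (intercept and area)
length(y)  # n, matching nrow(X)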

Calculate \(\hat{\boldsymbol{\beta}}\)

Matrix functions in R. Let \(\mathbf{A}\) and \(\mathbf{B}\) be matrices

  • t(A): transpose \(\mathbf{A}\)
  • solve(A): inverse of \(\mathbf{A}\)
  • A %*% B: multiply \(\mathbf{A}\) and \(\mathbf{B}\)

Now let’s calculate \(\hat{\boldsymbol{\beta}}\)

# compute beta_hat = (X^T X)^(-1) X^T y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
beta_hat
                   [,1]
(Intercept) 116652.3251
area           159.4833
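
An equivalent and more numerically stable sketch solves the normal equations \(\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\mathbf{y}\) directly rather than explicitly inverting \(\mathbf{X}^T\mathbf{X}\); crossprod(X) computes t(X) %*% X:

# solve the normal equations without forming the inverse
beta_hat2 <- solve(crossprod(X), crossprod(X, y))
beta_hat2  # should match beta_hat above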

Compare to result from lm

duke_forest_model <- lm(price ~ area, data = duke_forest)
tidy(duke_forest_model) |> kable(digits = 3)  # tidy() from broom, kable() from knitr
term         estimate    std.error  statistic  p.value
(Intercept)  116652.325  53302.463      2.188    0.031
area            159.483     18.171      8.777    0.000


beta_hat 
                   [,1]
(Intercept) 116652.3251
area           159.4833

Predicted values and residuals

Predicted (fitted) values

Now that we have \(\hat{\boldsymbol{\beta}}\), let’s predict values of \(\mathbf{y}\) using the model

\[ \hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \underbrace{\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T}_{\mathbf{H}}\mathbf{y} = \mathbf{H}\mathbf{y} \]

Hat matrix: \(\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\)

  • \(\mathbf{H}\) is an \(n\times n\) matrix
  • Maps vector of observed values \(\mathbf{y}\) to a vector of fitted values \(\hat{\mathbf{y}}\)
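
A minimal sketch computing \(\mathbf{H}\) and the fitted values directly. (For this data set, \(n\) is small enough to form the \(n \times n\) matrix explicitly; for large \(n\) you would avoid building \(\mathbf{H}\).)

# hat matrix and fitted values
H <- X %*% solve(t(X) %*% X) %*% t(X)
y_hat <- H %*% y
head(drop(y_hat))                # fitted values from the matrix form
head(fitted(duke_forest_model))  # should match lm()'s fitted values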

Residuals

Recall that the residuals are the difference between the observed and predicted values

\[ \begin{aligned} \mathbf{e} &= \mathbf{y} - \hat{\mathbf{y}}\\[10pt] & = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \\[10pt] & = \mathbf{y} - \mathbf{H}\mathbf{y} \\[10pt] & = (\mathbf{I} - \mathbf{H})\mathbf{y} \end{aligned} \]

\[ \color{#993399}{\mathbf{e} = (\mathbf{I} - \mathbf{H})\mathbf{y}} \]
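
A sketch verifying that both forms give the same residuals, and that they match lm() (using \(\mathbf{H}\) computed above):

# residuals two ways
e1 <- drop(y - X %*% beta_hat)         # y - X beta_hat
e2 <- drop((diag(nrow(X)) - H) %*% y)  # (I - H) y
all.equal(e1, e2)                                        # should be TRUE
all.equal(unname(e1), unname(resid(duke_forest_model)))  # should be TRUE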

Recap

  • Introduced matrix representation for simple linear regression
    • Model form
    • Least squares estimates
    • Predicted (fitted) values
    • Residuals
  • Used R for matrix calculations

Next class