ANOVA
Analysis of Variance (ANOVA): a technique for partitioning the variability in \(Y\) by its sources
Total variability (Response)
Summary statistics of price:

| min | median | max | mean | sd |
|---|---|---|---|---|
| 95000 | 540000 | 1520000 | 559898.7 | 225448.1 |
Partition sources of variability in price
Total variability (Response)
\[\text{Sum of Squares Total (SST)} = \sum_{i=1}^n(y_i - \bar{y})^2 = (n-1)s_y^2\]
Explained variability (Model)
\[\text{Sum of Squares Model (SSM)} = \sum_{i = 1}^{n}(\hat{y}_i - \bar{y})^2\]
Unexplained variability (Residuals)
\[\text{Sum of Squares Residuals (SSR)} = \sum_{i = 1}^{n}(y_i - \hat{y}_i)^2\]
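The three definitions can be checked numerically. The Python sketch below fits a least-squares line with an intercept and computes SST, SSM, and SSR from their definitions. It also confirms that \(\text{SST} = (n-1)s_y^2\) using `statistics.variance`, which divides by \(n-1\). The data are a hypothetical toy example, not the Duke Forest houses. The helper names `fit_slr` and `sums_of_squares` are illustrative and do not come from any library.

```python
from statistics import mean, variance

def fit_slr(x, y):
    """Least-squares intercept and slope for the line y_hat = b0 + b1 * x (hypothetical helper)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sums_of_squares(y, y_hat):
    """Return (SST, SSM, SSR) for observed responses y and fitted values y_hat."""
    y_bar = mean(y)
    total = sum((yi - y_bar) ** 2 for yi in y)                # SST
    model = sum((fi - y_bar) ** 2 for fi in y_hat)            # SSM
    resid = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # SSR
    return total, model, resid

# Hypothetical toy data (NOT the Duke Forest houses): x stands in for area, y for price.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_slr(x, y)
y_hat = [b0 + b1 * xi for xi in x]
sst, ssm, ssr = sums_of_squares(y, y_hat)

print(f"SST={sst:.2f}  SSM={ssm:.2f}  SSR={ssr:.2f}")  # SST=6.00  SSM=3.60  SSR=2.40
# SST also equals (n - 1) * s_y^2, because statistics.variance uses the n - 1 denominator.
print(f"(n-1)*s_y^2={(len(y) - 1) * variance(y):.2f}")  # (n-1)*s_y^2=6.00
```

For this toy fit, SSM + SSR = 3.60 + 2.40, which equals SST = 6.00.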
Sum of Squares
\[
\begin{aligned}
\color{#407E99}{SST} \hspace{5mm}&= &\color{#993399}{SSM} &\hspace{5mm} + &\color{#8BB174}{SSR} \\[10pt]
\color{#407E99}{\sum_{i=1}^n(y_i - \bar{y})^2} \hspace{5mm}&= &\color{#993399}{\sum_{i = 1}^{n}(\hat{y}_i - \bar{y})^2} &\hspace{5mm}+ &\color{#8BB174}{\sum_{i = 1}^{n}(y_i - \hat{y}_i)^2}
\end{aligned}
\]
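The identity holds because a cross-product term vanishes. The following is a short derivation. It assumes a simple linear regression \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\) fit by least squares with an intercept, and it writes \(e_i = y_i - \hat{y}_i\) for the residuals:

\[
\begin{aligned}
y_i - \bar{y} &= (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i) \\[6pt]
\sum_{i=1}^n(y_i - \bar{y})^2 &= \sum_{i=1}^n(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^n(y_i - \hat{y}_i)^2 + 2\sum_{i=1}^n(\hat{y}_i - \bar{y})(y_i - \hat{y}_i) \\[6pt]
\sum_{i=1}^n(\hat{y}_i - \bar{y})(y_i - \hat{y}_i) &= \hat{\beta}_0\sum_{i=1}^n e_i + \hat{\beta}_1\sum_{i=1}^n x_i e_i - \bar{y}\sum_{i=1}^n e_i = 0
\end{aligned}
\]

The last line is zero because the least-squares normal equations force \(\sum_{i=1}^n e_i = 0\) and \(\sum_{i=1}^n x_i e_i = 0\). Without an intercept, the first condition can fail, and then SST need not equal SSM + SSR.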
\(R^2\)
The coefficient of determination, \(R^2\), is the proportion of variation in the response, \(Y\), that is explained by the regression model.
\[\large{R^2 = \frac{SSM}{SST} = 1 - \frac{SSR}{SST}}\]
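For a least-squares fit with an intercept, both forms of the formula give the same number because SST = SSM + SSR. The following minimal Python sketch computes \(R^2\) both ways. It uses hypothetical toy data, not the Duke Forest houses. The function name `r_squared` is illustrative only.

```python
from statistics import mean

def r_squared(x, y):
    """R^2 for a least-squares simple linear regression of y on x,
    returned as (SSM / SST, 1 - SSR / SST)."""
    x_bar, y_bar = mean(x), mean(y)
    # Closed-form least-squares slope and intercept.
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variability
    ssm = sum((fi - y_bar) ** 2 for fi in y_hat)            # explained by the model
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # left in the residuals
    return ssm / sst, 1 - ssr / sst

# Hypothetical toy data, not the Duke Forest houses.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2_model, r2_resid = r_squared(x, y)
print(round(r2_model, 3), round(r2_resid, 3))  # 0.6 0.6
```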
What is the range of \(R^2\)? Does \(R^2\) have units?
Interpreting \(R^2\)
Submit your response to the following question on Ed Discussion.
The \(R^2\) of the model predicting price from area for houses in Duke Forest is 44.5%. Which of the following is the correct interpretation of this value?
- Area correctly predicts 44.5% of price for houses in Duke Forest.
- 44.5% of the variability in price for houses in Duke Forest can be explained by area.
- 44.5% of the variability in area for houses in Duke Forest can be explained by price.
- 44.5% of the time price for houses in Duke Forest can be predicted by area.
Do you think this model is useful for explaining variability in the price of Duke Forest houses?
🔗 https://edstem.org/us/courses/62513/discussion/629888