Properties of estimators

Prof. Maria Tackett

Sep 26, 2024

Data: NCAA Football expenditures

Today’s data come from Equity in Athletics Data Analysis and includes information about sports expenditures and revenues for colleges and universities in the United States. This data set was featured in a March 2022 Tidy Tuesday.

We will focus on the 2019 - 2020 season expenditures on football for institutions in the NCAA - Division 1 FBS. The variables are :

total_exp_m: Total expenditures on football in the 2019 - 2020 academic year (in millions USD)
enrollment_th: Total student enrollment in the 2019 - 2020 academic year (in thousands)
type: institution type (Public or Private)

term	estimate	std.error	statistic
(Intercept)	19.332	2.984	6.478
enrollment_th	0.780	0.110	7.074
typePublic	-13.226	3.153	-4.195

Confidence interval for $β_{j}$

A plausible range of values for a population parameter is called a confidence interval
Using only a single point estimate is like fishing in a murky lake with a spear, and using a confidence interval is like fishing with a net
- We can throw a spear where we saw a fish but we will probably miss, if we toss a net in that area, we have a good chance of catching the fish
- Similarly, if we report a point estimate, we probably will not hit the exact population parameter, but if we report a range of plausible values we have a good shot at capturing the parameter

What “confidence” means

We will construct $C %$ confidence intervals
- The confidence level impacts the width of the interval
“Confidence” means if we were to take repeated samples of the same size as our data, fit regression lines using the same predictors, and calculate $C %$ CIs for the coefficient of $x_{j}$ , then $C %$ of those intervals will contain the true value of the coefficient $β_{j}$
Need to balance precision and accuracy when selecting a confidence level

term	estimate	std.error	statistic
(Intercept)	19.332	2.984	6.478
enrollment_th	0.780	0.110	7.074
typePublic	-13.226	3.153	-4.195

term	estimate	std.error	statistic	conf.low	conf.high
(Intercept)	19.332	2.984	6.478	13.426	25.239
enrollment_th	0.780	0.110	7.074	0.562	0.999
typePublic	-13.226	3.153	-4.195	-19.466	-6.986

Properties of $\hat{β}$

Motivation

We have discussed how to use least squares to find an estimator of $\hat{β}$
How do we know whether our least squares estimator is a “good” estimator?
When we consider what makes an estimator “good”, we’ll look at three criteria:
- Bias
- Variance
- Mean squared error
We’ll take a look at these over the course of a few lectures and motivate why we might prefer using least squares to compute $\hat{β}$ versus other methods

Bias and variance

Each time we take a sample of size $n$ , we can find the least squares estimator (throw dart at target)
Suppose we take many independent samples of size $n$ and find the least squares estimator for each sample (throw many darts at the target). Ideally,
- The estimators are centered at the true parameter (unbiased)
- The estimators are clustered around the true parameter (unbiased with low variance)

Let’s take a look at the mean and variance of the least squares estimator

1 / 31

Properties of estimators Prof. Maria Tackett Sep 26, 2024

Properties of estimators
Announcements
Topics
Computing setup
Data: NCAA Football expenditures
Regression model
Inference for $β_{j}$
Confidence interval for $β_{j}$
Confidence interval for $β_{j}$
What “confidence” means
Confidence interval for $β_{j}$
Computing $t^{*}$ in R
95% CI for coefficient of enrollment
Interpreting the CI
Computing CI in R
Properties of $\hat{β}$
Motivation
Bias and variance
Bias and variance
Bias and variance
Expected value of $\hat{β}$
Finding expected value and variance
Expected value of $\hat{β}$
Expected value of $\hat{β}$
Variance of $\hat{β}$
Variance of $\hat{β}$
What do we mean by “linear”?
“Linear” regression model
Identify the linear regression model
Identify the linear regression model
Recap