Logistic Regression: Prediction

Prof. Maria Tackett

Nov 12, 2024

Announcements

  • Project: Draft report due + peer review in December 2 lab

  • Statistics experience due Tuesday, November 26

  • HW 04 released on Thursday

Computational set up

library(tidyverse)
library(tidymodels)
library(pROC)       # make ROC curves
library(knitr)
library(kableExtra)

# set default theme in ggplot2
ggplot2::theme_set(ggplot2::theme_bw())

Topics

  • Calculating predicted probabilities from the logistic regression model

  • Using predicted probabilities to classify observations

  • Make decisions and assess model performance using

    • Confusion matrix
    • ROC curve

Data: Risk of coronary heart disease

This data set is from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. We want to examine the relationship between various health characteristics and the risk of having heart disease.

  • high_risk: 1 = High risk of having heart disease in next 10 years, 0 = Not high risk of having heart disease in next 10 years

  • age: Age at exam time (in years)

  • totChol: Total cholesterol (in mg/dL)

  • currentSmoker: 0 = nonsmoker; 1 = smoker

Modeling risk of coronary heart disease

term            estimate  std.error  statistic  p.value  conf.low  conf.high
(Intercept)       -6.638      0.372    -17.860    0.000    -7.374     -5.917
age                0.082      0.006     14.430    0.000     0.071      0.093
totChol            0.002      0.001      2.001    0.045     0.000      0.004
currentSmoker1     0.457      0.092      4.951    0.000     0.277      0.639
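
For reference, a minimal sketch of how this model could be fit (it assumes the data are in a data frame called heart_disease with the variables described above):

# fit the logistic regression model
heart_disease_fit <- glm(high_risk ~ age + totChol + currentSmoker,
                         data = heart_disease, family = binomial)

# coefficient table with 95% confidence intervals
tidy(heart_disease_fit, conf.int = TRUE)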

Prediction and classification

  • We are often interested in using the model to classify observations, i.e., predict whether a given observation will have a 1 or 0 response

  • For each observation

    • Use the logistic regression model to calculate the predicted log-odds that the response for the \(i^{th}\) observation is 1
    • Use the log-odds to calculate the predicted probability that the \(i^{th}\) observation is 1
    • Then, use the predicted probability to classify the observation as having a 1 or 0 response using some predefined threshold (a compact sketch of all three steps follows this list)
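
Putting the steps together, a minimal sketch in R (assuming a fitted model heart_disease_fit and a 0.5 threshold; each step is unpacked on the following slides):

augment(heart_disease_fit) |>
  mutate(
    odds       = exp(.fitted),                       # .fitted = predicted log-odds
    pred_prob  = odds / (1 + odds),                  # predicted probability
    pred_class = if_else(pred_prob > 0.5, "1", "0")  # classify at threshold 0.5
  )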

Augmented data frame

augment(heart_disease_fit)
# A tibble: 4,190 × 10
   high_risk   age totChol currentSmoker .fitted .resid     .hat .sigma  .cooksd
   <fct>     <dbl>   <dbl> <fct>           <dbl>  <dbl>    <dbl>  <dbl>    <dbl>
 1 0            39     195 0              -3.06  -0.302 0.000594  0.890  6.94e-6
 2 0            46     250 0              -2.38  -0.420 0.000543  0.890  1.25e-5
 3 0            48     245 1              -1.77  -0.560 0.000527  0.890  2.24e-5
 4 1            61     225 1              -0.751  1.51  0.00164   0.889  8.70e-4
 5 0            46     285 1              -1.86  -0.539 0.000830  0.890  3.25e-5
 6 0            43     228 0              -2.67  -0.366 0.000546  0.890  9.43e-6
 7 1            63     205 0              -1.08   1.66  0.00154   0.889  1.15e-3
 8 0            45     313 1              -1.88  -0.532 0.00127   0.890  4.86e-5
 9 0            52     260 0              -1.87  -0.535 0.000542  0.890  2.08e-5
10 0            43     225 1              -2.22  -0.454 0.000532  0.890  1.44e-5
# ℹ 4,180 more rows
# ℹ 1 more variable: .std.resid <dbl>

Predicted log-odds

heart_disease_aug <- augment(heart_disease_fit)

heart_disease_aug |>
  select(.fitted) |>
  slice(1:5)
# A tibble: 5 × 1
  .fitted
    <dbl>
1  -3.06 
2  -2.38 
3  -1.77 
4  -0.751
5  -1.86 

Observation 1

\[ \text{predicted log-odds} = \log(\hat{\omega}) = \log\Big(\frac{\hat{\pi}}{1- \hat{\pi}}\Big) = -3.06 \]

Predicted odds

# A tibble: 5 × 1
  .fitted
    <dbl>
1  -3.06 
2  -2.38 
3  -1.77 
4  -0.751
5  -1.86 

Observation 1

\[ \text{predicted odds} = \hat{\omega} = \frac{\hat{\pi}}{1- \hat{\pi}} = \exp\{-3.06\} = 0.0469 \]

Predicted probability

# A tibble: 5 × 1
  .fitted
    <dbl>
1  -3.06 
2  -2.38 
3  -1.77 
4  -0.751
5  -1.86 

Observation 1

\[ \text{predicted prob.} = \hat{\pi} = \frac{\hat{\omega}}{1+\hat{\omega}} = \frac{\exp\{-3.06\}}{1 + \exp\{-3.06\}}= 0.045 \]
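
As a quick check of this arithmetic in R (plogis() is base R's inverse-logit function):

exp(-3.06) / (1 + exp(-3.06))  # 0.0448, i.e., 0.045
plogis(-3.06)                  # same result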

Would you classify this individual as high risk \((\hat{y} = 1)\) or not high risk \((\hat{y} = 0)\)?

Another individual

# A tibble: 5 × 1
  .fitted
    <dbl>
1  -3.06 
2  -2.38 
3  -1.77 
4  -0.751
5  -1.86 

Observation 4

\[ \text{predicted prob.} = \hat{\pi} = \frac{\hat{\omega}}{1+\hat{\omega}} = \frac{\exp\{-0.751\}}{1 + \exp\{-0.751\}}= 0.321 \]

Would you classify this individual as high risk \((\hat{y} = 1)\) or not high risk \((\hat{y} = 0)\)?

Predicted probabilities in R

We can calculate predicted probabilities using the predict.glm() function. Use type = "response" to get probabilities.


predict.glm(heart_disease_fit, type = "response")

Predicted probabilities for Observations 1-5

         1          2          3          4          5 
0.04459439 0.08445209 0.14523257 0.32065849 0.13515474 

Predictions in R

pred_prob <- predict.glm(heart_disease_fit, type = "response")

heart_disease_aug <- heart_disease_aug |> 
  bind_cols(pred_prob = pred_prob)

heart_disease_aug |>
  select(high_risk, .fitted, pred_prob) |>
  slice(1:5)
# A tibble: 5 × 3
  high_risk .fitted pred_prob
  <fct>       <dbl>     <dbl>
1 0          -3.06     0.0446
2 0          -2.38     0.0845
3 0          -1.77     0.145 
4 1          -0.751    0.321 
5 0          -1.86     0.135 
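
Equivalently, since .fitted is on the log-odds scale, the predicted probabilities can be added in a single mutate() step (a sketch using base R's plogis()):

heart_disease_aug <- augment(heart_disease_fit) |>
  mutate(pred_prob = plogis(.fitted))  # inverse logit of the log-odds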

Classifying observations

You would like to determine a threshold for classifying individuals as high risk or not high risk.

What considerations would you make in determining the threshold?

Classify using 0.5 as threshold

We can use a threshold of 0.5 to classify observations.

  • If \(\hat{\pi} > 0.5\), classify as 1

  • If \(\hat{\pi} \leq 0.5\), classify as 0
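
A minimal sketch of how the pred_class column shown below could be created, assuming heart_disease_aug contains pred_prob:

heart_disease_aug <- heart_disease_aug |>
  mutate(pred_class = factor(if_else(pred_prob > 0.5, "1", "0")))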

# A tibble: 5 × 4
  high_risk .fitted pred_prob pred_class
  <fct>       <dbl>     <dbl> <fct>     
1 0          -3.06     0.0446 0         
2 0          -2.38     0.0845 0         
3 0          -1.77     0.145  0         
4 1          -0.751    0.321  0         
5 0          -1.86     0.135  0         

Confusion matrix

A confusion matrix is a \(2 \times 2\) table that compares the predicted and actual classes. We can produce this matrix using the conf_mat() function in the yardstick package (part of tidymodels).


heart_disease_aug |>
  conf_mat(high_risk, pred_class) 
          Truth
Prediction    0    1
         0 3553  635
         1    2    0

Visualize confusion matrix

heart_conf_mat <- heart_disease_aug |>
  conf_mat(high_risk, pred_class)

autoplot(heart_conf_mat, type = "heatmap")

Using the confusion matrix

          Truth
Prediction    0    1
         0 3553  635
         1    2    0


The accuracy of this model with a classification threshold of 0.5 is

\[ \text{accuracy} = \frac{3553 + 0}{3553 + 635 + 2 + 0} = 0.848 \]

Using the confusion matrix

          Truth
Prediction    0    1
         0 3553  635
         1    2    0


The misclassification rate of this model with a threshold of 0.5 is

\[ \text{misclassification} = \frac{635 + 2}{3553 + 635 + 2 + 0} = 0.152 \]
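
Both metrics (and several others) can be computed directly in R; a sketch using yardstick:

# accuracy from the augmented data (truth, then estimate)
accuracy(heart_disease_aug, truth = high_risk, estimate = pred_class)

# or compute a suite of metrics from the stored confusion matrix
summary(heart_conf_mat)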

Using the confusion matrix

          Truth
Prediction    0    1
         0 3553  635
         1    2    0


Accuracy is 0.848 and the misclassification rate is 0.152.

  • What is the limitation of solely relying on accuracy and misclassification to assess the model performance?

  • What is the limitation of using a single confusion matrix to assess the model performance?

Sensitivity and specificity

True/false positive/negative

                                                                   Not high risk \((y_i = 0)\)   High risk \((y_i = 1)\)
Classified not high risk \((\hat{\pi}_i \leq \text{threshold})\)   True negative (TN)            False negative (FN)
Classified high risk \((\hat{\pi}_i > \text{threshold})\)          False positive (FP)           True positive (TP)


  • \(\text{accuracy} = \frac{TN + TP}{TN + TP + FN + FP}\)

  • \(\text{misclassification} = \frac{FN + FP}{TN+ TP + FN + FP}\)

False negative rate

                                                                   Not high risk \((y_i = 0)\)   High risk \((y_i = 1)\)
Classified not high risk \((\hat{\pi}_i \leq \text{threshold})\)   True negative (TN)            False negative (FN)
Classified high risk \((\hat{\pi}_i > \text{threshold})\)          False positive (FP)           True positive (TP)


False negative rate: Proportion of actual positives that were classified as negatives

  • P(classified not high risk | high risk) = \(\frac{FN}{TP + FN}\)

False positive rate

                                                                   Not high risk \((y_i = 0)\)   High risk \((y_i = 1)\)
Classified not high risk \((\hat{\pi}_i \leq \text{threshold})\)   True negative (TN)            False negative (FN)
Classified high risk \((\hat{\pi}_i > \text{threshold})\)          False positive (FP)           True positive (TP)


False positive rate: Proportion of actual negatives that were classified as positives

  • P(classified high risk | not high risk) = \(\frac{FP}{TN + FP}\)

Sensitivity

                                                                   Not high risk \((y_i = 0)\)   High risk \((y_i = 1)\)
Classified not high risk \((\hat{\pi}_i \leq \text{threshold})\)   True negative (TN)            False negative (FN)
Classified high risk \((\hat{\pi}_i > \text{threshold})\)          False positive (FP)           True positive (TP)


Sensitivity: Proportion of actual positives that were correctly classified as positive

  • Also known as true positive rate (TPR) and recall

  • P(classified high risk | high risk) = \(\frac{TP}{TP + FN}\) = 1 − False negative rate

Specificity

                                                                   Not high risk \((y_i = 0)\)   High risk \((y_i = 1)\)
Classified not high risk \((\hat{\pi}_i \leq \text{threshold})\)   True negative (TN)            False negative (FN)
Classified high risk \((\hat{\pi}_i > \text{threshold})\)          False positive (FP)           True positive (TP)


Specificity: Proportion of actual negatives that were correctly classified as negative

  • P(classified not high risk | not high risk) = \(\frac{TN}{TN + FP}\) = 1 − False positive rate

Practice

          Truth
Prediction    0    1
         0 3553  635
         1    2    0

Calculate the

  • False negative rate
  • False positive rate
  • Sensitivity
  • Specificity
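
You can check your answers with yardstick (a sketch; event_level = "second" treats 1, the second factor level, as the positive class):

sensitivity(heart_disease_aug, high_risk, pred_class, event_level = "second")
specificity(heart_disease_aug, high_risk, pred_class, event_level = "second")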

Using metrics to select model and threshold

Metric                               Guidance for use
Accuracy                             For balanced data, use only in combination with other metrics.
                                     Avoid using for imbalanced data.
Sensitivity (true positive rate)     Use when false negatives are more “expensive” than false positives.
False positive rate                  Use when false positives are more “expensive” than false negatives.
Precision = \(\frac{TP}{TP + FP}\)   Use when it’s important for positive predictions to be accurate.

This table is a modification of work created and shared by Google in the Google Machine Learning Crash Course.

Choosing a classification threshold

A doctor plans to use your model to determine which patients are high risk for heart disease. The doctor will recommend a treatment plan for high risk patients.

  • Would you want sensitivity to be high or low? What about specificity?

  • What are the trade-offs associated with each decision?

ROC curve

So far, model assessment has depended on both the model and the selected classification threshold. The receiver operating characteristic (ROC) curve allows us to assess model performance across the full range of thresholds.

  • x-axis: 1 - Specificity (False positive rate)

  • y-axis: Sensitivity (True positive rate)

Which corner of the plot indicates the best model performance?


ROC curve in R

# calculate sensitivity and specificity at each threshold
roc_curve_data <- heart_disease_aug |>
  roc_curve(high_risk, .fitted, 
            event_level = "second") 

# plot roc curve
autoplot(roc_curve_data)

ROC curve in R

# A tibble: 5 × 3
  .threshold specificity sensitivity
       <dbl>       <dbl>       <dbl>
1    -Inf       0                  1
2      -3.63    0                  1
3      -3.61    0.000281           1
4      -3.54    0.000844           1
5      -3.52    0.00113            1

The thresholds are on the log-odds scale because we passed .fitted to roc_curve(); log-odds are a monotone function of the predicted probabilities, so the resulting ROC curve is identical.

Area under the curve

The area under the curve (AUC) can be used to assess how well the logistic model fits the data.

  • AUC = 0.5: the model is a very bad fit (no better than a coin flip)

  • AUC close to 1: the model is a good fit

heart_disease_aug |>
  roc_auc(high_risk, .fitted,
    event_level = "second"
  )
# A tibble: 1 × 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 roc_auc binary         0.695

Recap

  • Calculated predicted probabilities from the logistic regression model

  • Used predicted probabilities to classify observations

  • Made decisions and assessed model performance using

    • Confusion matrix
    • ROC curve

Further reading

Classification module in Google Machine Learning Crash Course