Linear Statistical Models

These are the lecture notes for Linear Statistical Models (208780). The course aims to give students hands-on experience with R programming for Bayesian regression and its applications to statistical modeling and causal inference.

Author

Donlapark Ponnoprat

Published

March 6, 2022

Preface

These are lecture notes that I wrote for the master's course Linear Statistical Models (208780) in Winter 2022.

After teaching statistics for a couple of years, I observed that many of our master's students lack the programming skills needed to apply what they learn in class to real-world problems. This motivated me to redesign the course to be more practice-oriented, with complete, hands-on code examples in R.

The first two sections of the notes closely follow the textbook Regression and Other Stories (Gelman, Hill, and Vehtari 2020). This excellent book covers linear regression from fitting and prediction to diagnostics and the practical issues that arise in applied work. It also includes numerous code examples throughout.

The third section covers causal inference, which focuses mainly on estimating the effect of a treatment on an outcome. We shall see that, with careful experimental design and covariate “adjustment”, causal questions can be answered using linear regression. The lecture notes for this section again follow Gelman, Hill, and Vehtari (2020). Additional topics such as tests for assumptions, panel data, and synthetic control follow Cunningham (2021), Huntington-Klein (2021), and Facure (2020).
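To give a flavor of what covariate adjustment means, here is a minimal sketch in base R on simulated data. Everything in it is an assumption made for illustration: the variable names `x`, `z`, and `y`, the data-generating process, and the use of `lm()` in place of the Bayesian fitting functions used later in the notes. The adjustment recovers the treatment effect here only because the single confounder `x` is observed and the model is correctly specified.

```r
set.seed(1)
n <- 2000

# Simulated data (illustrative only):
x <- rnorm(n)                        # pre-treatment covariate (confounder)
z <- rbinom(n, 1, plogis(x))         # treatment is more likely when x is large
y <- 1 + 2 * z + 3 * x + rnorm(n)    # true treatment effect is 2 by construction

# Naive comparison of treated and untreated units ignores the confounder
naive <- unname(coef(lm(y ~ z))["z"])

# Adjusting for x by including it as a predictor in the regression
adjusted <- unname(coef(lm(y ~ z + x))["z"])

print(c(naive = naive, adjusted = adjusted))
```

In this simulation the naive estimate overstates the effect because treated units tend to have larger `x`, while the adjusted estimate should land close to the true value of 2.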

The last section covers conformal prediction, a relatively new technique for constructing prediction intervals under minimal statistical assumptions. The materials in this section closely follow the lecture notes of Stats 300C taught by Emmanuel Candès at Stanford University (Candès 2022) and the references therein.
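As a preview, the split version of conformal prediction can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the data are simulated, the model is an ordinary `lm()` fit, the conformity score is the absolute residual, and names such as `scores` and `q_hat` are made up for this sketch. The only statistical assumption the method itself relies on is that the calibration points and the new point are exchangeable.

```r
set.seed(1)
n <- 1000

# Simulated data (illustrative only)
x <- runif(n, -2, 2)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

# Split the data: one half to fit the model, the other half to calibrate
idx_train <- sample(n, n / 2)
train <- dat[idx_train, ]
calib <- dat[-idx_train, ]

fit <- lm(y ~ x, data = train)

# Conformity scores: absolute residuals on the calibration set
scores <- abs(calib$y - predict(fit, newdata = calib))

# Take the ceiling((m + 1) * (1 - alpha))-th smallest score as the half-width
alpha <- 0.1
m <- length(scores)
k <- ceiling((m + 1) * (1 - alpha))
q_hat <- sort(scores)[k]

# Prediction intervals at a few new points
new_x <- data.frame(x = c(-1, 0, 1))
pred <- predict(fit, newdata = new_x)
interval <- data.frame(x = new_x$x, lower = pred - q_hat, upper = pred + q_hat)
print(interval)
```

The resulting intervals have the same width everywhere, which is a consequence of using the absolute residual as the score; other scores lead to intervals that adapt to the predictor.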

Comments and suggestions are welcome; please reach me through my homepage.

Contents

Preface

Linear regression

  1. Basic regression
  2. Linear regression with a single predictor
  3. Fitting linear regression
  4. Prediction and Bayesian inference
  5. Linear regression with multiple predictors
  6. Model diagnostics and evaluation
  7. Logarithmic transformations
  8. Comparing regression models

Generalized linear models

  1. Logistic regression
  2. Logistic regression with multiple predictors
  3. Diagnostics of logistic regression models
  4. Generalized linear models
  5. Poststratification: regression with non-representative sample

Causal inference

  1. Basics of causal inference
  2. Causal inference with regression
  3. Causal inference with observational data
  4. Subclassification and propensity score matching
  5. Instrumental variables
  6. Regression discontinuity
  7. Difference-in-differences
  8. Panel data
  9. Synthetic control
  10. Use cases of causal inference in industry

Conformal prediction

  1. Full & split conformal prediction
  2. Jackknife+, CV+, and quantile regression
  3. Conformal prediction for classification

References

Candès, Emmanuel. 2022. “Lecture Notes of Stats 300C: Theory of Statistics.” 2022. https://candes.su.domains/teaching/stats300c/lectures.
Cunningham, Scott. 2021. Causal Inference: The Mixtape. Yale University Press.
Facure, Matheus. 2020. “Python Causality Handbook.” 2020. https://matheusfacure.github.io/python-causality-handbook/landing-page.
Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and Other Stories. Analytical Methods for Social Research. Cambridge University Press. https://doi.org/10.1017/9781139161879.
Huntington-Klein, Nick. 2021. The Effect: An Introduction to Research Design and Causality. CRC Press.