Making rigorous conclusions
In this part we introduce modelling and statistical inference for making data-based conclusions.
We discuss building, interpreting, and selecting models; visualizing interaction effects; and prediction and model validation.
Statistical inference is introduced from a simulation-based perspective, and the Central Limit Theorem is discussed very briefly to lay the foundation for future coursework in statistics.
The RStudio Cloud workspace for the Data Science Course in a Box project is here.
You can join the workspace and play around with the sample application exercises.
Slides, videos, and application exercises
Modelling data
Unit 4 - Deck 1: The language of models
Unit 4 - Deck 2: Fitting and interpreting models
Unit 4 - Deck 3: Modelling nonlinear relationships
Unit 4 - Deck 4: Models with multiple predictors
Unit 4 - Deck 5: More models with multiple predictors
Classification and model building
Unit 4 - Deck 6: Logistic regression
Unit 4 - Deck 7: Prediction and overfitting
Unit 4 - Deck 8: Feature engineering
Model validation
Unit 4 - Deck 9: Cross validation
The Office + Feature engineering, Pt. 1
The Office + Cross validation, Pt. 2
Uncertainty quantification
Unit 4 - Deck 10: Quantifying uncertainty
Unit 4 - Deck 11: Bootstrapping
Unit 4 - Deck 12: Hypothesis testing
Unit 4 - Deck 13: Inference overview
Labs
Lab 10: Grading the professor, Pt. 1
Fitting and interpreting simple linear regression models
Lab 11: Grading the professor, Pt. 2
Fitting and interpreting multiple linear regression models
Lab 12: Smoking while pregnant
Constructing confidence intervals, conducting hypothesis tests, and interpreting results in the context of the data
Homework assignments
HW 7: Bike rentals in DC
Exploratory data analysis, and fitting and interpreting models
HW 8: Exploring the GSS
Fitting and interpreting models
HW 9: Modelling the GSS
Model validation and inference