Exploring data
This unit focuses on data visualization and data wrangling. Specifically, we cover the fundamentals of data and data visualization, confounding variables, and Simpson’s paradox, as well as tidy data, data import, data cleaning, and data curation. We end the unit with web scraping and introduce iteration in preparation for the next unit. Students are also introduced to the course toolkit in this unit: R, RStudio, R Markdown, Git, and GitHub.
An RStudio Cloud workspace is available for the Data Science Course in a Box project. You can join the workspace and experiment with the application exercises.
Slides, videos, and application exercises
Visualising data
Wrangling and tidying data
Importing and recoding data
Communicating data science results effectively
Web scraping and programming
Labs
Lab 2: Plastic waste
Introduction to working with data in R with the tidyverse
Lab 4: La Quinta is Spanish for ‘next to Denny’s’, Pt. 1
Visualizing spatial data
Lab 5: La Quinta is Spanish for ‘next to Denny’s’, Pt. 2
Wrangling spatial data
Lab 7: Simpson’s paradox
Data visualization, confounding, and multivariable relationships
Lab 8: University of Edinburgh Art Collection
Web scraping, functions, and iteration
Homework assignments
HW 1: Pet names
Introduction to working with data in R with the tidyverse
HW 4: What should I major in?
More data wrangling, summarizing, and visualization