7 Exploring data

This unit focuses on data visualization and data wrangling. Specifically, we cover the fundamentals of data and data visualization, confounding variables, and Simpson’s paradox, as well as the concept of tidy data, data import, data cleaning, and data curation. We end the unit with web scraping and introduce functions and iteration in preparation for the next unit. This unit also introduces students to the toolkit: R, RStudio, R Markdown, Git, and GitHub.

The RStudio Cloud workspace for the Data Science Course in a Box project is here. You can join the workspace and try out the application exercises.

7.1 Slides, videos, and application exercises

7.1.1 Visualising data

Unit 2 - Deck 1: Data and visualisation

Unit 2 - Deck 2: Visualising data with ggplot2

Unit 2 - Deck 3: Visualising numerical data

Unit 2 - Deck 4: Visualising categorical data

StarWars + Dataviz

7.1.2 Wrangling and tidying data

Unit 2 - Deck 5: Tidy data

JSS :: Tidy data

Unit 2 - Deck 6: Grammar of data wrangling

Unit 2 - Deck 7: Working with a single data frame

Unit 2 - Deck 8: Working with multiple data frames

Unit 2 - Deck 9: Tidying data

Hotels + Data wrangling

7.1.3 Importing and recoding data

Unit 2 - Deck 10: Data types

Unit 2 - Deck 11: Data classes

Unit 2 - Deck 12: Importing data

Unit 2 - Deck 13: Recoding data

Hotels + Data types

Nobels + Sales + Data import

7.1.4 Communicating data science results effectively

Unit 2 - Deck 14: Tips for effective data visualization

Brexit + Telling stories with dataviz

Unit 2 - Deck 15: Scientific studies and confounding

Unit 2 - Deck 16: Simpson’s paradox

Unit 2 - Deck 17: Doing data science

7.1.5 Web scraping and programming

Unit 2 - Deck 18: Web scraping

Unit 2 - Deck 19: Scraping top 250 movies on IMDB

Unit 2 - Deck 20: Web scraping considerations

IMDB + Web scraping

Unit 2 - Deck 21: Functions

Unit 2 - Deck 22: Iteration

7.2 Labs

Lab 1: Hello R

Introduction to R, R Markdown, Git, and GitHub

Lab 2: Plastic waste

Introduction to working with data in R with the tidyverse

Lab 3: Nobel laureates

Data wrangling and tidying

Lab 4: La Quinta is Spanish for ‘next to Denny’s’, Pt. 1

Visualizing spatial data

Lab 5: La Quinta is Spanish for ‘next to Denny’s’, Pt. 2

Wrangling spatial data

Lab 6: Sad plots

Critiquing and improving data visualisations

Lab 7: Simpson’s paradox

Data visualisation, confounding, and multivariable relationships

Lab 8: University of Edinburgh Art Collection

Web scraping, functions, and iteration

7.3 Homework assignments

HW 1: Pet names

Introduction to working with data in R with the tidyverse

HW 2: Edinburgh Airbnb rentals

Data visualisation with the tidyverse

HW 3: Road traffic accidents

Data wrangling, tidying, and visualization

HW 4: What should I major in?

More data wrangling, summarizing, and visualization

HW 5: Legos

More data wrangling, summarizing, and visualization

HW 6: Money in politics

Web scraping, functions, and iteration