Chapter 7 Exploring data

This unit focuses on data visualization and data wrangling. Specifically, we cover the fundamentals of data and data visualization, confounding variables, and Simpson’s paradox, as well as the concept of tidy data, data import, data cleaning, and data curation. We end the unit with web scraping and introduce the idea of iteration in preparation for the next unit. Students are also introduced to the toolkit in this unit: R, RStudio, R Markdown, Git, and GitHub.

The RStudio Cloud workspace for the Data Science Course in a Box project is here. You can join the workspace and try out the application exercises.

7.1 Slides & application exercises

Unit 1 - Deck 4: Building plots for various data types

[Slides] [Source]

Unit 1 - Deck 5: Tidy data and data wrangling

[Slides] [Source]

Unit 1 - Deck 6: Joining data from multiple sources

[Slides] [Source]

Unit 1 - Deck 7: Data tidying and reshaping

[Slides] [Source]

Unit 1 - Deck 10: Tips for effective data visualization

[Slides] [Source]

Unit 1 - Deck 12: Communicating data science results effectively

[Slides] [Source]

Unit 1 - Deck 13: Web scraping

[Slides] [Source]

7.2 Labs

Lab 1: Hello R

Introduction to R, R Markdown, Git, and GitHub

[Instructions] [Source] [Starter]

Lab 2: Plastic waste

Introduction to working with data in R with the tidyverse

[Instructions] [Source] [Starter]

Lab 3: Nobel laureates

Data wrangling and tidying

[Instructions] [Source] [Starter]

Lab 4: La Quinta is Spanish for ‘next to Denny’s’, Pt. 1

Visualizing spatial data

[Instructions] [Source]

Lab 5: La Quinta is Spanish for ‘next to Denny’s’, Pt. 2

Wrangling spatial data

[Instructions] [Source]

Lab 6: Ugly charts

Critiquing and improving data visualizations

[Instructions] [Source] [Starter]

Lab 7: Simpson’s paradox

Data visualization, confounding, and multivariable relationships

[Instructions] [Source] [Starter]

Lab 8: University of Edinburgh Art Collection

Web scraping, functions, and iteration

[Instructions] [Source] [Starter]

7.3 Homework assignments

HW 1: Edinburgh Airbnb rentals

Introduction to working with data in R with the tidyverse

[Instructions] [Source] [Starter]

HW 2: North Carolina bike crashes

Data wrangling, tidying, and visualization

[Instructions] [Source] [Starter]

HW 3: What should I major in?

Data wrangling, summarizing, and visualization

[Instructions] [Source] [Starter]

HW 4: Legos and instructors

Data wrangling, summarizing, and visualization

[Instructions] [Source] [Starter]

HW 5: Money in politics

Web scraping, functions, and iteration

[Instructions] [Source] [Starter]