Meet the toolkit: programming



Data Science in a Box

1 / 23

Course toolkit


Course operation

  • introds.org
  • Learn
  • Zoom
  • Teams
  • Piazza

Doing data science

  • Programming:
    • R
    • RStudio
    • tidyverse
    • R Markdown
  • Version control and collaboration:
    • Git
    • GitHub
2 / 23

Learning goals

By the end of the course, you will be able to...

  • gain insight from data
  • gain insight from data, reproducibly
  • gain insight from data, reproducibly, using modern programming tools and techniques
  • gain insight from data, reproducibly and collaboratively, using modern programming tools and techniques
  • gain insight from data, reproducibly (with literate programming and version control) and collaboratively, using modern programming tools and techniques
3 / 23

Reproducible data analysis

4 / 23

Reproducibility checklist

What does it mean for a data analysis to be "reproducible"?

Near-term goals:

  • Are the tables and figures reproducible from the code and data?
  • Does the code actually do what you think it does?
  • In addition to what was done, is it clear why it was done?

Long-term goals:

  • Can the code be used for other data?
  • Can you extend the code to do other things?
5 / 23

Toolkit for reproducibility

  • Scriptability → R
  • Literate programming (code, narrative, output in one place) → R Markdown
  • Version control → Git / GitHub
6 / 23

R and RStudio

7 / 23

R and RStudio

  • R is an open-source statistical programming language
  • R is also an environment for statistical computing and graphics
  • It's easily extensible with packages

  • RStudio is a convenient interface for R called an IDE (integrated development environment), e.g. "I write R code in the RStudio IDE"
  • RStudio is not a requirement for programming with R, but it's very commonly used by R programmers and data scientists
8 / 23

R packages

  • Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data [1]

  • As of September 2020, there are over 16,000 R packages available on CRAN (the Comprehensive R Archive Network) [2]

  • We're going to work with a small (but important) subset of these!

[1] Wickham and Bryan, R Packages.

[2] CRAN contributed packages.

9 / 23

Tour: R and RStudio

10 / 23

A short list (for now) of R essentials

  • Functions are (most often) verbs, followed by what they will be applied to in parentheses:
do_this(to_this)
do_that(to_this, to_that, with_those)
  • Packages are installed once with the install.packages function and then loaded with the library function, once per session:
install.packages("package_name")
library(package_name)
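
As a small sketch of the two points above, the lines below call a couple of base R functions and load a package. The object names (values, avg, rounded) are made up for illustration, and the install.packages call is left commented out because it needs an internet connection and only has to be run once.

```r
# Functions are verbs applied to what sits inside the parentheses.
values <- c(2, 4, 9)                   # c() combines values into a vector
avg <- mean(values)                    # mean() is applied to `values`
rounded <- round(10 / 3, digits = 2)   # extra arguments adjust the behaviour

# install.packages("tidyverse")        # install once (commented out: needs internet)
library(stats)                         # load an installed package for this session

print(avg)
print(rounded)
```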
11 / 23

R essentials (continued)

  • Columns (variables) in data frames are accessed with $:
dataframe$var_name
  • Object documentation can be accessed with ?
?mean
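
A minimal sketch of column access, assuming a small made-up data frame called films (the data frame and its values are hypothetical, not course data). The help lookup is shown as a comment because it is meant for an interactive session.

```r
# A tiny, hypothetical data frame with two columns (variables)
films <- data.frame(
  title  = c("Film A", "Film B", "Film C"),
  budget = c(10, 20, 60)
)

budgets <- films$budget        # $ pulls out one column as a vector
avg_budget <- mean(budgets)    # the column can then be passed to a function
print(avg_budget)

# ?mean                        # in an interactive session, opens the help page for mean()
```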
12 / 23

tidyverse

  • The tidyverse is an opinionated collection of R packages designed for data science
  • All packages share an underlying philosophy and a common grammar
13 / 23

rmarkdown

  • rmarkdown and the various packages that support it enable R users to write their code and prose in reproducible computational documents
  • We will generally refer to R Markdown documents (with .Rmd extension), e.g. "Do this in your R Markdown document" and rarely discuss loading the rmarkdown package

14 / 23

R Markdown

15 / 23

R Markdown

  • Fully reproducible reports: each time you knit, the analysis is run from the beginning
  • Simple markdown syntax for text
  • Code goes in chunks, delimited by three backticks; narrative goes outside of chunks
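
For illustration, a minimal R Markdown document might look like the sketch below. The title and the text are placeholders, not part of the course materials; the parts to notice are the header at the top, the narrative written in plain markdown, and the R chunk fenced by three backticks.

````markdown
---
title: "Example report"
output: html_document
---

This sentence is narrative, written in plain markdown.

```{r}
# This is an R chunk: the code runs when the document is knit
x <- 2
x * 3
```

The output of the chunk appears right below it in the knitted report.
````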
16 / 23

Tour: R Markdown

17 / 23

Environments

The environment of your R Markdown document is separate from the Console!

Remember this, and expect it to bite you a few times as you're learning to work with R Markdown!

18 / 23

Environments

First, run the following in the console

x <- 2
x * 3

All looks good, eh?

Then, add the following in an R chunk in your R Markdown document

x * 3

What happens? Why the error?
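
One way to picture what is going on, as a rough sketch rather than a description of how knitting is actually implemented: the code below emulates the separate environment of the document with new.env(). The names fresh, console_result, knit_result, and fixed_result are made up for this illustration.

```r
# In the Console: x is defined here, so this works
x <- 2
console_result <- x * 3

# Emulate the document's separate environment: it can see base R
# functions such as `*`, but not objects created in the Console
fresh <- new.env(parent = baseenv())

# Evaluating x * 3 there fails because x was never defined in it
knit_result <- tryCatch(
  eval(quote(x * 3), envir = fresh),
  error = function(e) conditionMessage(e)
)
print(knit_result)

# The fix: define x inside the document itself (here, inside `fresh`)
assign("x", 2, envir = fresh)
fixed_result <- eval(quote(x * 3), envir = fresh)
print(fixed_result)
```

The practical takeaway is that every object a chunk uses has to be created somewhere earlier in the same R Markdown document, not only in the Console.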

19 / 23

R Markdown help

R Markdown Cheat Sheet
Help -> Cheatsheets

Markdown Quick Reference
Help -> Markdown Quick Reference

20 / 23

How will we use R Markdown?

  • Every assignment / report / project / etc. is an R Markdown document
  • You'll always have a template R Markdown document to start with
  • The amount of scaffolding in the template will decrease over the semester
21 / 23

What's with all the hexes?

Mitchell O'Hara-Wild, useR! 2018 feature wall

22 / 23

Your turn: AE 02 - Bechdel + R Markdown

  • The Bechdel test asks whether a work of fiction features at least two women who talk to each other about something other than a man; the two women must also be named characters.
  • Go to RStudio Cloud and start the assignment AE 02 - Bechdel + R Markdown.
  • Open and knit the R Markdown document bechdel.Rmd, review the document, and fill in the blanks.
23 / 23
