Tidy data



Data Science in a Box

1 / 6

Tidy data

Happy families are all alike; every unhappy family is unhappy in its own way.

Leo Tolstoy

Characteristics of tidy data:

  • Each variable forms a column.
  • Each observation forms a row.
  • Each type of observational unit forms a table.
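
As a rough illustration of the three characteristics above, the sketch below reshapes a small made-up table with `tidyr::pivot_longer()`. The table, its column names (`name`, `2020`, `2021`) and its values are hypothetical and are not taken from the slides; the example assumes the `tidyr` and `tibble` packages are installed.

```r
library(tidyr)   # pivot_longer()
library(tibble)  # tribble()

# Hypothetical untidy table: the variable "year" is spread across column names.
untidy <- tribble(
  ~name, ~`2020`, ~`2021`,
  "Ana",     150,     152,
  "Ben",     171,     173
)

# Tidy version: each variable (name, year, height) forms a column and
# each observation (one person in one year) forms a row.
tidy <- pivot_longer(
  untidy,
  cols      = -name,     # reshape every column except name
  names_to  = "year",    # old column names become values of year
  values_to = "height"   # old cell values become values of height
)

print(tidy)
```

After reshaping, `year` is a column in its own right, so it can be used directly in `group_by()` or `filter()`.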

Characteristics of untidy data:

!@#$%^&*()

2 / 6

What makes this data not tidy?

3 / 6

Displaying vs. summarising data

library(dplyr)  # provides starwars, select(), group_by(), summarize(), %>%

starwars %>%
  select(name, height, mass)

## # A tibble: 87 × 3
##   name           height  mass
##   <chr>           <int> <dbl>
## 1 Luke Skywalker    172    77
## 2 C-3PO             167    75
## 3 R2-D2              96    32
## 4 Darth Vader       202   136
## 5 Leia Organa       150    49
## 6 Owen Lars         178   120
## # … with 81 more rows

starwars %>%
  group_by(gender) %>%
  summarize(
    avg_ht = mean(height, na.rm = TRUE)
  )

## # A tibble: 3 × 2
##   gender    avg_ht
##   <chr>      <dbl>
## 1 feminine    165.
## 2 masculine   177.
## 3 <NA>        181.
6 / 6
