
Visualising data with ggplot2



Data Science in a Box


ggplot2 ❤️ 🐧


ggplot2 ∈ tidyverse

  • ggplot2 is the tidyverse's data visualization package
  • The structure of the code for plots can be summarized as
ggplot(data = [dataset],
       mapping = aes(x = [x-variable],
                     y = [y-variable])) +
  geom_xxx() +
  other options

Data: Palmer Penguins

Measurements for penguin species, island in the Palmer Archipelago, size (flipper length, body mass, bill dimensions), and sex.

library(palmerpenguins)
glimpse(penguins)
## Rows: 344
## Columns: 8
## $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adeli…
## $ island            <fct> Torgersen, Torgersen, Torgersen, Torg…
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.…
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.…
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195…
## $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 362…
## $ sex               <fct> male, female, female, NA, female, mal…
## $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2…

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm, y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = "Bill depth (mm)", y = "Bill length (mm)",
       colour = "Species")
## Warning: Removed 2 rows containing missing values (geom_point).
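The warning comes from the two penguins with no bill measurements (row 4 of the `glimpse()` output is one of them). A minimal sketch for checking this, assuming ggplot2 and palmerpenguins are installed: count the rows where either plotted variable is missing, then pass `na.rm = TRUE` to `geom_point()` so those rows are dropped without the warning.

```r
library(ggplot2)
library(palmerpenguins)

# Rows where bill depth or bill length is missing; ggplot2 drops these
n_missing <- sum(!complete.cases(penguins[, c("bill_depth_mm", "bill_length_mm")]))
n_missing

# Same plot, but geom_point() removes the missing rows silently
p <- ggplot(data = penguins,
            mapping = aes(x = bill_depth_mm, y = bill_length_mm,
                          colour = species)) +
  geom_point(na.rm = TRUE)
p
```

Silencing the warning does not change the plot; it only makes the removal explicit in the code, so it is worth doing once you know why the rows are missing.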

Coding out loud


Start with the penguins data frame

ggplot(data = penguins)


Start with the penguins data frame, map bill depth to the x-axis

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm))


Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis.

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm))


Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point.

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm)) +
  geom_point()


Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the colour of each point.

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     colour = species)) +
  geom_point()


Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the colour of each point. Title the plot "Bill depth and length".

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  labs(title = "Bill depth and length")


Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the colour of each point. Title the plot "Bill depth and length", add the subtitle "Dimensions for Adelie, Chinstrap, and Gentoo Penguins".

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins")


Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the colour of each point. Title the plot "Bill depth and length", add the subtitle "Dimensions for Adelie, Chinstrap, and Gentoo Penguins", label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively.

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = "Bill depth (mm)", y = "Bill length (mm)")


Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the colour of each point. Title the plot "Bill depth and length", add the subtitle "Dimensions for Adelie, Chinstrap, and Gentoo Penguins", label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively, label the legend "Species".

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = "Bill depth (mm)", y = "Bill length (mm)",
       colour = "Species")


Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the colour of each point. Title the plot "Bill depth and length", add the subtitle "Dimensions for Adelie, Chinstrap, and Gentoo Penguins", label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively, label the legend "Species", and add a caption for the data source.

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = "Bill depth (mm)", y = "Bill length (mm)",
       colour = "Species",
       caption = "Source: Palmer Station LTER / palmerpenguins package")


Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the colour of each point. Title the plot "Bill depth and length", add the subtitle "Dimensions for Adelie, Chinstrap, and Gentoo Penguins", label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively, label the legend "Species", and add a caption for the data source. Finally, use a discrete colour scale that is designed to be perceived by viewers with common forms of colour blindness.

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = "Bill depth (mm)", y = "Bill length (mm)",
       colour = "Species",
       caption = "Source: Palmer Station LTER / palmerpenguins package") +
  scale_colour_viridis_d()


ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = "Bill depth (mm)", y = "Bill length (mm)",
       colour = "Species",
       caption = "Source: Palmer Station LTER / palmerpenguins package") +
  scale_colour_viridis_d()
## Warning: Removed 2 rows containing missing values (geom_point).

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis.

Represent each observation with a point and map species to the colour of each point.

Title the plot "Bill depth and length", add the subtitle "Dimensions for Adelie, Chinstrap, and Gentoo Penguins", label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively, label the legend "Species", and add a caption for the data source.

Finally, use a discrete colour scale that is designed to be perceived by viewers with common forms of colour blindness.


Argument names

You can omit the names of the first two arguments when building plots with ggplot().

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  scale_colour_viridis_d()

ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           colour = species)) +
  geom_point() +
  scale_colour_viridis_d()
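Omitting the names works because R matches unnamed arguments by position, and `data` and `mapping` are the first two arguments of `ggplot()`. Layer functions such as `geom_point()` list them the other way round (mapping first), so the same shortcut inside a geom means something different. A quick check with base R's `formals()`, assuming ggplot2 is installed:

```r
library(ggplot2)

# ggplot() expects data first, then mapping
names(formals(ggplot))[1:2]

# geom_point() expects mapping first, then data
names(formals(geom_point))[1:2]
```

This is why `ggplot(penguins, aes(...))` is safe, while passing a data frame as the first unnamed argument of a geom is not.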

Aesthetics


Aesthetics options

Commonly used characteristics of plotting characters that can be mapped to a specific variable in the data are

  • colour
  • shape
  • size
  • alpha (transparency)
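The full list of aesthetics a geom understands is in the Aesthetics section of its help page, e.g. `?geom_point`. As a sketch, the same list can be pulled from ggplot2's `GeomPoint` object; `GeomPoint$aesthetics()` belongs to the ggproto extension interface rather than the everyday plotting API, so treat its exact output as version-dependent.

```r
library(ggplot2)

# Aesthetics that geom_point() accepts (required, optional, and "group")
point_aes <- GeomPoint$aesthetics()
point_aes

# Check that the four aesthetics listed above are among them
c("colour", "shape", "size", "alpha") %in% point_aes
```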

Colour

ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           colour = species)) +
  geom_point() +
  scale_colour_viridis_d()


Shape

Mapped to a different variable than colour

ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           colour = species,
           shape = island)) +
  geom_point() +
  scale_colour_viridis_d()


Shape

Mapped to the same variable as colour

ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           colour = species,
           shape = species)) +
  geom_point() +
  scale_colour_viridis_d()


Size

ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           colour = species,
           shape = species,
           size = body_mass_g)) +
  geom_point() +
  scale_colour_viridis_d()


Alpha

ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           colour = species,
           shape = species,
           size = body_mass_g,
           alpha = flipper_length_mm)) +
  geom_point() +
  scale_colour_viridis_d()


Mapping

ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           size = body_mass_g,
           alpha = flipper_length_mm)) +
  geom_point()

Setting

ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm)) +
  geom_point(size = 2, alpha = 0.5)


Mapping vs. setting

  • Mapping: Determine the size, alpha, etc. of points based on the values of a variable in the data

    • goes into aes()
  • Setting: Determine the size, alpha, etc. of points not based on the values of a variable in the data

    • goes into geom_*() (this was geom_point() in the previous example, but we'll learn about other geoms soon!)
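A common slip is to put a constant inside `aes()`. ggplot2 then treats `"blue"` as a data value to be mapped: it builds a one-level colour scale (with a legend) and draws the points in that scale's default colour, not blue. A minimal comparison, assuming ggplot2 and palmerpenguins are installed, using `layer_data()` to inspect the colours ggplot2 actually computes:

```r
library(ggplot2)
library(palmerpenguins)

# Mapping a constant: "blue" is treated as a category, not as a colour
p_mapped <- ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm,
                                 colour = "blue")) +
  geom_point()

# Setting: the colour is passed to the geom, outside aes()
p_set <- ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(colour = "blue")

unique(layer_data(p_mapped)$colour)  # one default palette colour
unique(layer_data(p_set)$colour)     # "blue"
```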

Faceting


Faceting

  • Smaller plots that display different subsets of the data
  • Useful for exploring conditional relationships and large data

ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point() +
  facet_grid(species ~ island)
## Warning: Removed 2 rows containing missing values (geom_point).
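Some panels in this grid are empty because not every species was sampled on every island; `facet_grid()` still draws every species-island combination. A base R cross-tabulation, assuming palmerpenguins is installed, shows which combinations have data:

```r
library(palmerpenguins)

# Count penguins for each species (rows) and island (columns);
# cells with 0 correspond to empty panels in facet_grid(species ~ island)
tab <- table(penguins$species, penguins$island)
tab
```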

Various ways to facet

In the next few slides, describe what each plot displays. Think about how the code relates to the output.

Note: The plots in the next few slides do not have proper titles, axis labels, etc. because we want you to figure out what's happening in the plots. But you should always label your plots!

ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point() +
  facet_grid(species ~ sex)

ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point() +
  facet_grid(sex ~ species)

ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point() +
  facet_wrap(~ species)

ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point() +
  facet_grid(. ~ species)

ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point() +
  facet_wrap(~ species, ncol = 2)


Faceting summary

  • facet_grid():
    • 2d grid
    • rows ~ cols
    • use . for no split
  • facet_wrap(): 1d ribbon of panels, wrapped according to the number of rows and columns specified or to the available plotting area
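Besides the formula interface, current versions of `facet_grid()` and `facet_wrap()` also accept variables wrapped in `vars()`. A sketch comparing the two forms, assuming ggplot2 and palmerpenguins are installed, by inspecting the panel layout that `ggplot_build()` computes; the `layout$layout` table with its `ROW`/`COL` columns is part of the built object rather than a documented user API, so treat it as an assumption that may change between versions.

```r
library(ggplot2)
library(palmerpenguins)

p_base <- ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point()

# Formula interface: rows ~ cols
p_formula <- p_base + facet_grid(species ~ sex)
# The same grid written with vars()
p_vars <- p_base + facet_grid(rows = vars(species), cols = vars(sex))
# 1d ribbon of species panels, wrapped into two columns
p_wrap <- p_base + facet_wrap(vars(species), ncol = 2)

# One row per panel, giving its ROW and COL position in the plot
layout_formula <- ggplot_build(p_formula)$layout$layout
layout_vars <- ggplot_build(p_vars)$layout$layout
layout_wrap <- ggplot_build(p_wrap)$layout$layout

layout_wrap[, c("PANEL", "ROW", "COL")]
```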

Facet and color

ggplot(
  penguins,
  aes(x = bill_depth_mm,
      y = bill_length_mm,
      color = species)) +
  geom_point() +
  facet_grid(species ~ sex) +
  scale_color_viridis_d()


Facet and color, no legend

ggplot(
  penguins,
  aes(x = bill_depth_mm,
      y = bill_length_mm,
      color = species)) +
  geom_point() +
  facet_grid(species ~ sex) +
  scale_color_viridis_d() +
  guides(color = "none")
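`guides(color = "none")` turns off only the colour legend. As an alternative sketch, assuming ggplot2 and palmerpenguins are installed, the theme setting `legend.position = "none"` hides every legend at once, which is handy when several aesthetics are mapped:

```r
library(ggplot2)
library(palmerpenguins)

# Same faceted plot; the theme setting hides all legends, not just colour
p_no_legend <- ggplot(
  penguins,
  aes(x = bill_depth_mm,
      y = bill_length_mm,
      color = species)) +
  geom_point() +
  facet_grid(species ~ sex) +
  scale_color_viridis_d() +
  theme(legend.position = "none")
p_no_legend
```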
