read_csv()
- comma delimited filesread_csv2()
- semicolon separated files (common in countries where , is used as the decimal place)read_tsv()
- tab delimited filesread_delim()
- reads in files with any delimiterread_fwf()
- fixed width filesread_csv()
- comma delimited filesread_csv2()
- semicolon separated files (common in countries where , is used as the decimal place)read_tsv()
- tab delimited filesread_delim()
- reads in files with any delimiterread_fwf()
- fixed width filesread_excel()
- read xls or xlsx filesnobel <- read_csv(file = "data/nobel.csv")nobel
## # A tibble: 935 × 26## id firstname surname year category affiliation city ## <dbl> <chr> <chr> <dbl> <chr> <chr> <chr>## 1 1 Wilhelm Conrad Röntgen 1901 Physics Munich Uni… Muni…## 2 2 Hendrik A. Lorentz 1902 Physics Leiden Uni… Leid…## 3 3 Pieter Zeeman 1902 Physics Amsterdam … Amst…## 4 4 Henri Becquerel 1903 Physics École Poly… Paris## 5 5 Pierre Curie 1903 Physics École muni… Paris## 6 6 Marie Curie 1903 Physics <NA> <NA> ## # … with 929 more rows, and 19 more variables: country <chr>,## # born_date <date>, died_date <date>, gender <chr>,## # born_city <chr>, born_country <chr>,## # born_country_code <chr>, died_city <chr>,## # died_country <chr>, died_country_code <chr>,## # overall_motivation <chr>, share <dbl>, motivation <chr>,## # born_country_original <chr>, born_city_original <chr>, …
df <- tribble( ~x, ~y, 1, "a", 2, "b", 3, "c")write_csv(df, file = "data/df.csv")
df <- tribble( ~x, ~y, 1, "a", 2, "b", 3, "c")write_csv(df, file = "data/df.csv")
read_csv("data/df.csv")
## # A tibble: 3 × 2## x y ## <dbl> <chr>## 1 1 a ## 2 2 b ## 3 3 c
AE 06 - Nobels and sales + Data import
> open nobels-csv.Rmd
and knit.nobels.csv
file from the data-raw/
folder.nobel_stem
, that filters for the STEM fields
(Physics, Medicine, Chemistry, and Economics).nobel_nonstem
, that filters for the remaining
fields. nobel-stem.csv
and nobel-nonstem.csv
,
respectively, to data/
.Hint: Use the %in%
operator when filter()
ing.
edibnb_badnames <- read_csv("data/edibnb-badnames.csv")names(edibnb_badnames)
## [1] "ID" "Price" ## [3] "neighbourhood" "accommodates" ## [5] "Number of bathrooms" "Number of Bedrooms" ## [7] "n beds" "Review Scores Rating"## [9] "Number of reviews" "listing_url"
edibnb_badnames <- read_csv("data/edibnb-badnames.csv")names(edibnb_badnames)
## [1] "ID" "Price" ## [3] "neighbourhood" "accommodates" ## [5] "Number of bathrooms" "Number of Bedrooms" ## [7] "n beds" "Review Scores Rating"## [9] "Number of reviews" "listing_url"
... but R doesn't allow spaces in variable names
ggplot(edibnb_badnames, aes(x = Number of bathrooms, y = Price)) + geom_point()
## Error: <text>:1:40: unexpected symbol## 1: ggplot(edibnb_badnames, aes(x = Number of## ^
edibnb_col_names <- read_csv("data/edibnb-badnames.csv", col_names = c("id", "price", "neighbourhood", "accommodates", "bathroom", "bedroom", "bed", "review_scores_rating", "n_reviews", "url"))names(edibnb_col_names)
## [1] "id" "price" ## [3] "neighbourhood" "accommodates" ## [5] "bathroom" "bedroom" ## [7] "bed" "review_scores_rating"## [9] "n_reviews" "url"
edibnb_clean_names <- read_csv("data/edibnb-badnames.csv") %>% janitor::clean_names()names(edibnb_clean_names)
## [1] "id" "price" ## [3] "neighbourhood" "accommodates" ## [5] "number_of_bathrooms" "number_of_bedrooms" ## [7] "n_beds" "review_scores_rating"## [9] "number_of_reviews" "listing_url"
Which type is x
? Why?
read_csv("data/df-na.csv")
## # A tibble: 9 × 3## x y z ## <chr> <chr> <chr> ## 1 1 a hi ## 2 <NA> b hello ## 3 3 Not applicable 9999 ## 4 4 d ola ## 5 5 e hola ## 6 . f whatup## 7 7 g wassup## 8 8 h sup ## 9 9 i <NA>
read_csv("data/df-na.csv", na = c("", "NA", ".", "9999", "Not applicable"))
## # A tibble: 9 × 3## x y z ## <dbl> <chr> <chr> ## 1 1 a hi ## 2 NA b hello ## 3 3 <NA> <NA> ## 4 4 d ola ## 5 5 e hola ## 6 NA f whatup## 7 7 g wassup## 8 8 h sup ## 9 9 i <NA>
read_csv("data/df-na.csv", col_types = list(col_double(), col_character(), col_character()))
## Warning: One or more parsing issues, see `problems()` for details
## # A tibble: 9 × 3## x y z ## <dbl> <chr> <chr> ## 1 1 a hi ## 2 NA b hello ## 3 3 Not applicable 9999 ## 4 4 d ola ## 5 5 e hola ## 6 NA f whatup## 7 7 g wassup## 8 8 h sup ## 9 9 i <NA>
type function | data type |
---|---|
col_character() |
character |
col_date() |
date |
col_datetime() |
POSIXct (date-time) |
col_double() |
double (numeric) |
col_factor() |
factor |
col_guess() |
let readr guess (default) |
col_integer() |
integer |
col_logical() |
logical |
col_number() |
numbers mixed with non-number characters |
col_numeric() |
double or integer |
col_skip() |
do not read |
col_time() |
time |
Wondering where you remember these from?
read_csv("data/df-na.csv")
## Rows: 9 Columns: 3## ── Column specification ─────────────────────────────────────────## Delimiter: ","## chr (3): x, y, z## ## ℹ Use `spec()` to retrieve the full column specification for this data.## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 9 × 3## x y z ## <chr> <chr> <chr> ## 1 1 a hi ## 2 <NA> b hello ## 3 3 Not applicable 9999 ## 4 4 d ola ...
fav_food <- read_excel("data/favourite-food.xlsx")fav_food
## # A tibble: 5 × 6## `Student ID` `Full Name` favourite.food mealPlan AGE SES ## <dbl> <chr> <chr> <chr> <chr> <chr>## 1 1 Sunil Huffmann Strawberry yo… Lunch o… 4 High ## 2 2 Barclay Lynn French fries Lunch o… 5 Midd…## 3 3 Jayendra Lyne N/A Breakfa… 7 Low ## 4 4 Leon Rossini Anchovies Lunch o… 99999 Midd…## 5 5 Chidiegwu Dun… Pizza Breakfa… five High
fav_food <- read_excel("data/favourite-food.xlsx") %>% janitor::clean_names()fav_food
## # A tibble: 5 × 6## student_id full_name favourite_food meal_plan age ses ## <dbl> <chr> <chr> <chr> <chr> <chr>## 1 1 Sunil Huffmann Strawberry yo… Lunch on… 4 High ## 2 2 Barclay Lynn French fries Lunch on… 5 Midd…## 3 3 Jayendra Lyne N/A Breakfas… 7 Low ## 4 4 Leon Rossini Anchovies Lunch on… 99999 Midd…## 5 5 Chidiegwu Dunk… Pizza Breakfas… five High
fav_food <- read_excel("data/favourite-food.xlsx", na = c("N/A", "99999")) %>% janitor::clean_names()fav_food
## # A tibble: 5 × 6## student_id full_name favourite_food meal_plan age ses ## <dbl> <chr> <chr> <chr> <chr> <chr>## 1 1 Sunil Huffmann Strawberry yo… Lunch on… 4 High ## 2 2 Barclay Lynn French fries Lunch on… 5 Midd…## 3 3 Jayendra Lyne <NA> Breakfas… 7 Low ## 4 4 Leon Rossini Anchovies Lunch on… <NA> Midd…## 5 5 Chidiegwu Dunk… Pizza Breakfas… five High
age
numericfav_food <- fav_food %>% mutate( age = if_else(age == "five", "5", age), age = as.numeric(age) )glimpse(fav_food)
## Rows: 5## Columns: 6## $ student_id <dbl> 1, 2, 3, 4, 5## $ full_name <chr> "Sunil Huffmann", "Barclay Lynn", "Jayen…## $ favourite_food <chr> "Strawberry yoghurt", "French fries", NA…## $ meal_plan <chr> "Lunch only", "Lunch only", "Breakfast a…## $ age <dbl> 4, 5, 7, NA, 5## $ ses <chr> "High", "Middle", "Low", "Middle", "High"
What order are the levels of ses
listed in?
fav_food %>% count(ses)
## # A tibble: 3 × 2## ses n## <chr> <int>## 1 High 2## 2 Low 1## 3 Middle 2
ses
factorfav_food <- fav_food %>% mutate(ses = fct_relevel(ses, "Low", "Middle", "High"))fav_food %>% count(ses)
## # A tibble: 3 × 2## ses n## <fct> <int>## 1 Low 1## 2 Middle 2## 3 High 2
fav_food <- read_excel("data/favourite-food.xlsx", na = c("N/A", "99999")) %>% janitor::clean_names() %>% mutate( age = if_else(age == "five", "5", age), age = as.numeric(age), ses = fct_relevel(ses, "Low", "Middle", "High") )fav_food
## # A tibble: 5 × 6## student_id full_name favourite_food meal_plan age ses ## <dbl> <chr> <chr> <chr> <dbl> <fct>## 1 1 Sunil Huffmann Strawberry yo… Lunch on… 4 High ## 2 2 Barclay Lynn French fries Lunch on… 5 Midd…## 3 3 Jayendra Lyne <NA> Breakfas… 7 Low ## 4 4 Leon Rossini Anchovies Lunch on… NA Midd…## 5 5 Chidiegwu Dunk… Pizza Breakfas… 5 High
write_csv(fav_food, file = "data/fav-food-clean.csv")fav_food_clean <- read_csv("data/fav-food-clean.csv")
What happened to ses
again?
fav_food_clean %>% count(ses)
## # A tibble: 3 × 2## ses n## <chr> <int>## 1 High 2## 2 Low 1## 3 Middle 2
read_rds()
and write_rds()
read_rds()
and write_rds()
, respectively.read_rds(path)write_rds(x, path)
write_rds(fav_food, file = "data/fav-food-clean.rds")fav_food_clean <- read_rds("data/fav-food-clean.rds")fav_food_clean %>% count(ses)
## # A tibble: 3 × 2## ses n## <fct> <int>## 1 Low 1## 2 Middle 2## 3 High 2
AE 06 - Nobels and sales + Data import
> sales-excel.Rmd
. sales.xlsx
file from the data-raw/
folder, using appropriate arguments for the read_excel()
function such that it looks like the output on the left.## # A tibble: 9 × 2## id n ## <chr> <chr>## 1 Brand 1 n ## 2 1234 8 ## 3 8721 2 ## 4 1822 3 ## 5 Brand 2 n ## 6 3333 1 ## # … with 3 more rows
## # A tibble: 7 × 3## brand id n## <chr> <dbl> <dbl>## 1 Brand 1 1234 8## 2 Brand 1 8721 2## 3 Brand 1 1822 3## 4 Brand 2 3333 1## 5 Brand 2 2156 3## 6 Brand 2 3987 6## # … with 1 more row
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |