The data come from the openintro package: elmhurst.
linear_reg() %>% set_engine("lm") %>% fit(gift_aid ~ family_income, data = elmhurst) %>% tidy()
## # A tibble: 2 × 5
##   term          estimate std.error statistic  p.value
##   <chr>            <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    24.3       1.29       18.8  8.28e-24
## 2 family_income  -0.0431    0.0108     -3.98 2.29e- 4
For each additional $1,000 of family income, we would expect students to receive a net difference of 1,000 * (-0.0431) = -$43.10 in aid on average, i.e. $43.10 less in gift aid, on average.
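The arithmetic behind this interpretation can be checked directly. This is a minimal sketch, assuming (as the interpretation above implies) that both family_income and gift_aid are recorded in thousands of dollars; the variable names are illustrative and not part of the original slides.

```r
# Slope estimate copied from the regression output above
slope <- -0.0431

# Assumption: gift_aid is in $1000s, so multiply by 1000 to express
# the expected change per additional $1,000 of family income in dollars
change_dollars <- 1000 * slope
change_dollars
```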
... is the process of using sample data to make conclusions about the underlying population the sample came from
So far we have done lots of estimation (mean, median, slope, etc.), i.e.
If you want to estimate a population parameter, do you prefer to report a range of values the parameter might be in, or a single value?
A plausible range of values for the population parameter is a confidence interval.
Suppose we split the class in half down the middle of the classroom and ask each student their heights. Then, we calculate the mean height of students on each side of the classroom. Would you expect these two means to be exactly equal, close but not equal, or wildly different?
Suppose you randomly sample 50 students and 5 of them are left handed. If you were to take another random sample of 50 students, how many would you expect to be left handed? Would you be surprised if only 3 of them were left handed? Would you be surprised if 40 of them were left handed?
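One way to build intuition for these questions is to treat the observed proportion (5 out of 50, or 0.1) as if it were the true population proportion. That is an assumption made only for illustration, since the real proportion is unknown. Under it, the count of left-handed students in a new sample of 50 follows a binomial distribution, and base R can show how surprising 3 or 40 would be.

```r
# Assumption for illustration only: the true proportion of left-handed
# students equals the observed sample proportion, 5 / 50
n <- 50
p_assumed <- 5 / 50

# Simulate the number of left-handed students in many new samples of 50
set.seed(2021)
sim_counts <- rbinom(10000, size = n, prob = p_assumed)
range(sim_counts)

# Exact binomial probabilities under the same assumption
p_3_or_fewer <- pbinom(3, size = n, prob = p_assumed)
p_40_or_more <- pbinom(39, size = n, prob = p_assumed, lower.tail = FALSE)
p_3_or_fewer
p_40_or_more
```

Under this assumption, seeing only 3 left-handed students is fairly ordinary, while seeing 40 is essentially impossible.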
We can quantify the variability of sample statistics using
or
🥾
Generated assuming there are more students like the ones in the observed sample...
elmhurst_boot_1 <- elmhurst %>% slice_sample(n = 50, replace = TRUE)
elmhurst_boot_2 <- elmhurst %>% slice_sample(n = 50, replace = TRUE)
elmhurst_boot_3 <- elmhurst %>% slice_sample(n = 50, replace = TRUE)
elmhurst_boot_4 <- elmhurst %>% slice_sample(n = 50, replace = TRUE)
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1  -0.0695  -0.0232
We are 95% confident that for each additional $1,000 of family income, we would expect students to receive $23.20 to $69.50 less in gift aid, on average.
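The resampling idea can be sketched end to end in base R: draw many samples with replacement, refit the model on each, and take the middle 95% of the resulting slopes. This is a sketch of a percentile bootstrap interval, not necessarily the code used to produce the interval above. The helper `boot_slope_ci` and the simulated `toy` data frame are hypothetical stand-ins, so the numbers will not match the elmhurst results.

```r
# Hypothetical helper: percentile bootstrap CI for the slope of a
# simple linear regression
boot_slope_ci <- function(data, formula, reps = 1000, level = 0.95) {
  n <- nrow(data)
  slopes <- replicate(reps, {
    # Resample rows with replacement, same size as the original sample
    idx <- sample.int(n, size = n, replace = TRUE)
    # Refit the model and keep the slope (second coefficient)
    coef(lm(formula, data = data[idx, , drop = FALSE]))[[2]]
  })
  alpha <- 1 - level
  quantile(slopes, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Simulated stand-in data (NOT the elmhurst data): 50 rows, both
# variables imagined to be in $1000s
set.seed(1234)
toy <- data.frame(family_income = runif(50, min = 0, max = 250))
toy$gift_aid <- 24 - 0.04 * toy$family_income + rnorm(50, sd = 3)

ci <- boot_slope_ci(toy, gift_aid ~ family_income)
ci
```

With the openintro package loaded, the same helper could be called as `boot_slope_ci(elmhurst, gift_aid ~ family_income)`. Results vary slightly from run to run unless a seed is set.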