
Fitting and interpreting models



Data Science in a Box


Models with numerical explanatory variables


Data: Paris Paintings

library(tidyverse)  # provides read_csv()
pp <- read_csv("data/paris-paintings.csv", na = c("n/a", "", "NA"))
  • Number of observations: 3393
  • Number of variables: 61

Goal: Predict height from width

$$\widehat{height}_i = \beta_0 + \beta_1 \times width_i$$


Step 1: Specify model

library(tidymodels)  # attaches parsnip (linear_reg(), set_engine(), fit()) and broom (tidy())
linear_reg()
## Linear Regression Model Specification (regression)
##
## Computational engine: lm

Step 2: Set model fitting engine

linear_reg() %>%
  set_engine("lm") # lm: linear model
## Linear Regression Model Specification (regression)
##
## Computational engine: lm

Step 3: Fit model & estimate parameters

... using formula syntax

linear_reg() %>%
  set_engine("lm") %>%
  fit(Height_in ~ Width_in, data = pp)
## parsnip model object
##
##
## Call:
## stats::lm(formula = Height_in ~ Width_in, data = data)
##
## Coefficients:
## (Intercept)     Width_in
##      3.6214       0.7808

A closer look at model output

$$\widehat{height}_i = 3.6214 + 0.7808 \times width_i$$


A tidy look at model output

linear_reg() %>%
  set_engine("lm") %>%
  fit(Height_in ~ Width_in, data = pp) %>%
  tidy()
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    3.62    0.254        14.3 8.82e-45
## 2 Width_in       0.781   0.00950      82.1 0

$$\widehat{height}_i = 3.62 + 0.781 \times width_i$$

Slope and intercept

$$\widehat{height}_i = 3.62 + 0.781 \times width_i$$

  • Slope: For each additional inch the painting is wider, the height is expected to be higher, on average, by 0.781 inches.
  • Intercept: Paintings that are 0 inches wide are expected to be 3.62 inches high, on average. (Does this make sense?)
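The two interpretations can be checked by plugging widths into the fitted equation. The sketch below hard-codes the rounded coefficients from the tidy output; `predict_height` is a hypothetical helper name introduced only for this illustration.

```r
# Fitted line with the rounded coefficients from the tidy output.
# predict_height() is a hypothetical helper, not part of any package.
predict_height <- function(width_in) {
  3.62 + 0.781 * width_in
}

predict_height(0)    # intercept: predicted height at width 0
predict_height(10)   # predicted height of a 10-inch-wide painting

# Slope: difference in predictions for widths one inch apart
predict_height(11) - predict_height(10)
```

The last line returns the slope whichever pair of adjacent widths is used, which is what "for each additional inch" means.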

Correlation does not imply causation

Remember this when interpreting model coefficients

Source: XKCD, Cell phones


Parameter estimation

Linear model with a single predictor

  • We're interested in $\beta_0$ (population parameter for the intercept) and $\beta_1$ (population parameter for the slope) in the following model:

$$\hat{y}_i = \beta_0 + \beta_1~x_i$$

  • Tough luck, you can't have them...
  • So we use sample statistics to estimate them:

$$\hat{y}_i = b_0 + b_1~x_i$$
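A small simulation can make the distinction between parameters and estimates concrete. In the sketch below the "population" values `beta_0` and `beta_1`, the sample size, and the noise level are all made-up assumptions; in practice the population parameters are unknown and only the estimates are available. It uses base R's `lm()`, the engine shown in the model output earlier.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical "population" parameters (unknown in real problems)
beta_0 <- 2
beta_1 <- 0.5

# Draw a sample of 1000 observations from y = beta_0 + beta_1 * x + noise
n <- 1000
x <- runif(n, min = 0, max = 10)
y <- beta_0 + beta_1 * x + rnorm(n, sd = 1)

# Estimate b0 and b1 from the sample
fit_sim <- lm(y ~ x)
b <- coef(fit_sim)
b
```

The estimates should land close to, but not exactly on, the values used to generate the data.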

Least squares regression

  • The regression line minimizes the sum of squared residuals.
  • If $e_i = y_i - \hat{y}_i$, then the regression line minimizes $\sum_{i = 1}^n e_i^2$.
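To see the minimization at work, one can compare the sum of squared residuals of the fitted line with that of nearby lines. This is a minimal sketch on hypothetical toy data (not the Paris Paintings data); `ssr` is a helper defined here for illustration.

```r
# Hypothetical toy data (not from the Paris Paintings dataset)
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.1, 2.9, 4.2, 4.8, 6.1, 6.8)
)

# Sum of squared residuals for a line with intercept b0 and slope b1
ssr <- function(b0, b1) {
  e <- toy$y - (b0 + b1 * toy$x)
  sum(e^2)
}

fit_toy <- lm(y ~ x, data = toy)
b0 <- coef(fit_toy)[["(Intercept)"]]
b1 <- coef(fit_toy)[["x"]]

ssr_ls    <- ssr(b0, b1)          # least squares line
ssr_shift <- ssr(b0 + 0.5, b1)    # same slope, shifted intercept
ssr_tilt  <- ssr(b0, b1 + 0.1)    # same intercept, steeper slope

c(least_squares = ssr_ls, shifted = ssr_shift, tilted = ssr_tilt)
```

Any change to the intercept or slope increases the sum of squared residuals relative to the least squares line.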

Visualizing residuals
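
One common way to visualize residuals is to draw each one as a vertical segment between an observed point and the fitted line. The sketch below uses base R graphics and hypothetical toy data; it is an illustration under those assumptions, not a reproduction of the original figures.

```r
# Hypothetical toy data (not from the Paris Paintings dataset)
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.1, 2.9, 4.2, 4.8, 6.1, 6.8)
)

fit_toy <- lm(y ~ x, data = toy)
toy$fitted   <- as.numeric(fitted(fit_toy))  # predicted y for each observed x
toy$residual <- as.numeric(resid(fit_toy))   # observed y minus predicted y

# Scatterplot, fitted line, and one dashed vertical segment per residual
plot(toy$x, toy$y, xlab = "x", ylab = "y")
abline(fit_toy)
segments(x0 = toy$x, y0 = toy$y, x1 = toy$x, y1 = toy$fitted, lty = 2)
```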

Properties of least squares regression

  • The regression line goes through the center of mass point, the coordinates corresponding to average x and average y, $(\bar{x}, \bar{y})$:

$$\bar{y} = b_0 + b_1 \bar{x} ~ \rightarrow ~ b_0 = \bar{y} - b_1 \bar{x}$$

  • The slope has the same sign as the correlation coefficient: $b_1 = r \frac{s_y}{s_x}$
  • The sum of the residuals is zero: $\sum_{i = 1}^n e_i = 0$
  • The residuals and x values are uncorrelated
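
Each of these properties can be checked numerically. The following is a minimal sketch on hypothetical toy data, using base R's `lm()`; the comparisons allow for floating point error.

```r
# Hypothetical toy data (not from the Paris Paintings dataset)
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.1, 2.9, 4.2, 4.8, 6.1, 6.8)
)

fit_toy <- lm(y ~ x, data = toy)
b0 <- coef(fit_toy)[["(Intercept)"]]
b1 <- coef(fit_toy)[["x"]]
e  <- as.numeric(resid(fit_toy))

# 1. The line passes through (mean of x, mean of y)
through_means <- isTRUE(all.equal(mean(toy$y), b0 + b1 * mean(toy$x)))

# 2. The slope equals r * s_y / s_x
slope_formula <- isTRUE(all.equal(b1, cor(toy$x, toy$y) * sd(toy$y) / sd(toy$x)))

# 3. The residuals sum to zero
residuals_sum_zero <- abs(sum(e)) < 1e-10

# 4. The residuals are uncorrelated with x
residuals_uncorrelated <- abs(cor(e, toy$x)) < 1e-10

c(through_means, slope_formula, residuals_sum_zero, residuals_uncorrelated)
```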

Models with categorical explanatory variables


Categorical predictor with 2 levels

## # A tibble: 3,393 × 3
##    name      Height_in landsALL
##    <chr>         <dbl>    <dbl>
##  1 L1764-2          37        0
##  2 L1764-3          18        0
##  3 L1764-4          13        1
##  4 L1764-5a         14        1
##  5 L1764-5b         14        1
##  6 L1764-6           7        0
##  7 L1764-7a          6        0
##  8 L1764-7b          6        0
##  9 L1764-8          15        0
## 10 L1764-9a          9        0
## 11 L1764-9b          9        0
## 12 L1764-10a        16        1
## 13 L1764-10b        16        1
## 14 L1764-10c        16        1
## 15 L1764-11         20        0
## 16 L1764-12a        14        1
## 17 L1764-12b        14        1
## 18 L1764-13a        15        1
## 19 L1764-13b        15        1
## 20 L1764-14         37        0
## # … with 3,373 more rows
  • landsALL = 0: No landscape features
  • landsALL = 1: Some landscape features

Height & landscape features

linear_reg() %>%
  set_engine("lm") %>%
  fit(Height_in ~ factor(landsALL), data = pp) %>%
  tidy()
## # A tibble: 2 × 5
##   term              estimate std.error statistic  p.value
##   <chr>                <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)          22.7      0.328      69.1 0
## 2 factor(landsALL)1    -5.65     0.532     -10.6 7.97e-26

Height & landscape features

$$\widehat{Height_{in}} = 22.7 - 5.645~landsALL$$

  • Slope: Paintings with landscape features are expected, on average, to be 5.645 inches shorter than paintings without landscape features.

    • Compares baseline level (landsALL = 0) to the other level (landsALL = 1)
  • Intercept: Paintings that don't have landscape features are expected, on average, to be 22.7 inches tall.
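
With a single two-level predictor, the intercept is the mean of the baseline group and the slope is the difference between the two group means. The sketch below checks this on hypothetical toy heights (the numbers are made up, not taken from the Paris Paintings data), using base R's `lm()`.

```r
# Hypothetical toy data: heights (inches) and a 0/1 landscape indicator
toy <- data.frame(
  height = c(20, 24, 26, 22, 15, 17, 19),
  lands  = c(0, 0, 0, 0, 1, 1, 1)
)

fit_toy <- lm(height ~ factor(lands), data = toy)
coef(fit_toy)

# Group means computed directly
mean_0 <- mean(toy$height[toy$lands == 0])   # baseline level
mean_1 <- mean(toy$height[toy$lands == 1])   # other level

c(intercept = mean_0, slope = mean_1 - mean_0)
```

The two printed vectors should agree: the model coefficients are just the baseline mean and the difference in means.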


Relationship between height and school

linear_reg() %>%
  set_engine("lm") %>%
  fit(Height_in ~ school_pntg, data = pp) %>%
  tidy()
## # A tibble: 7 × 5
##   term            estimate std.error statistic p.value
##   <chr>              <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)        14.0       10.0     1.40  0.162
## 2 school_pntgD/FL     2.33      10.0     0.232 0.816
## 3 school_pntgF       10.2       10.0     1.02  0.309
## 4 school_pntgG        1.65      11.9     0.139 0.889
## 5 school_pntgI       10.3       10.0     1.02  0.306
## 6 school_pntgS       30.4       11.4     2.68  0.00744
## 7 school_pntgX        2.87      10.3     0.279 0.780

Dummy variables

  • When the categorical explanatory variable has many levels, the levels are encoded as dummy variables
  • Each coefficient describes the expected difference between heights in that particular school compared to the baseline level

Categorical predictor with 3+ levels

school_pntg  D_FL  F  G  I  S  X
A               0  0  0  0  0  0
D/FL            1  0  0  0  0  0
F               0  1  0  0  0  0
G               0  0  1  0  0  0
I               0  0  0  1  0  0
S               0  0  0  0  1  0
X               0  0  0  0  0  1

## # A tibble: 3,393 × 3
##    name      Height_in school_pntg
##    <chr>         <dbl> <chr>
##  1 L1764-2          37 F
##  2 L1764-3          18 I
##  3 L1764-4          13 D/FL
##  4 L1764-5a         14 F
##  5 L1764-5b         14 F
##  6 L1764-6           7 I
##  7 L1764-7a          6 F
##  8 L1764-7b          6 F
##  9 L1764-8          15 I
## 10 L1764-9a          9 D/FL
## 11 L1764-9b          9 D/FL
## 12 L1764-10a        16 X
## 13 L1764-10b        16 X
## 14 L1764-10c        16 X
## 15 L1764-11         20 D/FL
## 16 L1764-12a        14 D/FL
## 17 L1764-12b        14 D/FL
## 18 L1764-13a        15 D/FL
## 19 L1764-13b        15 D/FL
## 20 L1764-14         37 F
## # … with 3,373 more rows
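
The dummy coding in the table can be reproduced with base R's `model.matrix()`, which builds the design matrix that `lm()` works from. This is a sketch on a hypothetical seven-row data frame with one painting per school, not on the full dataset.

```r
# One hypothetical painting per school, in the order of the table above
schools <- data.frame(
  school_pntg = c("A", "D/FL", "F", "G", "I", "S", "X")
)

# Character columns are converted to factors with sorted levels,
# so "A" becomes the baseline and gets no dummy column of its own.
dummies <- model.matrix(~ school_pntg, data = schools)
dummies
```

Besides the intercept column of ones, the matrix has six 0/1 columns; the row for school A is all zeros in those columns, and every other row has exactly one 1.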

Relationship between height and school

  • Austrian school (A) paintings are expected, on average, to be 14 inches tall.
  • Dutch/Flemish school (D/FL) paintings are expected, on average, to be 2.33 inches taller than Austrian school paintings.
  • French school (F) paintings are expected, on average, to be 10.2 inches taller than Austrian school paintings.
  • German school (G) paintings are expected, on average, to be 1.65 inches taller than Austrian school paintings.
  • Italian school (I) paintings are expected, on average, to be 10.3 inches taller than Austrian school paintings.
  • Spanish school (S) paintings are expected, on average, to be 30.4 inches taller than Austrian school paintings.
  • Paintings whose school is unknown (X) are expected, on average, to be 2.87 inches taller than Austrian school paintings.
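
Because each coefficient is a difference from the baseline, the expected height for a school is the intercept plus that school's coefficient. The sketch below does this arithmetic with the rounded estimates as printed in the tidy output; the variable names are introduced here for illustration only.

```r
# Estimates copied from the tidy output (rounded as printed)
intercept <- 14.0
offsets <- c("D/FL" = 2.33, F = 10.2, G = 1.65, I = 10.3, S = 30.4, X = 2.87)

# Expected height = baseline (Austrian) + that school's coefficient
expected_height <- c(A = intercept, intercept + offsets)
expected_height
```

For example, the expected height for Spanish school paintings is 14.0 + 30.4 = 44.4 inches.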