Chapter 9 Workflows

A workflow that leaks data:

  1. Transform data: center/scale Sale_Price
  2. Split data
  3. Train on the training set
  4. Evaluate/predict on the test set

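Data leakage: information from the test set “leaks” into the training data. Centering/scaling before the split computes the mean and standard deviation from all rows, including rows that later land in the test set.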

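A workflow that avoids leakage:

  1. Split data
  2. Estimate and apply the transformation on the training set
  3. Train on the training set
  4. Apply the training-set transformation to the test set, then evaluate/predict on the test set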
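
The four steps above can be sketched in base R. This is a minimal illustration, not the chapter's own code: the data frame `toy` and its columns `price` and `baths` are hypothetical stand-ins, and the seed is arbitrary. The point is that the mean and standard deviation come from the training rows only and are reused unchanged on the test rows.

```r
# Hypothetical toy data standing in for a real data set (not the Ames data).
set.seed(123)
toy <- data.frame(
  price = rnorm(200, mean = 180000, sd = 80000),
  baths = sample(1:3, 200, replace = TRUE)
)

# 1. Split first: 75% training, 25% test.
train_idx <- sample(seq_len(nrow(toy)), size = floor(0.75 * nrow(toy)))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# 2. Estimate the centering/scaling statistics from the training set only.
price_mean <- mean(train$price)
price_sd   <- sd(train$price)
train$price_scaled <- (train$price - price_mean) / price_sd

# 3. Train on the training set.
fit <- lm(price_scaled ~ baths, data = train)

# 4. Reuse the training statistics on the test set, then predict and evaluate.
test$price_scaled <- (test$price - price_mean) / price_sd
test_pred <- predict(fit, newdata = test)
test_rmse <- sqrt(mean((test$price_scaled - test_pred)^2))
```

Computing `price_mean` and `price_sd` from `toy` before the split would be the leaky version: the test rows would then influence how the training data are scaled.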
library(AmesHousing)
library(tidymodels)
## ── Attaching packages ─────────────────────────────────── tidymodels 0.0.3 ──
## ✓ broom     0.5.4     ✓ purrr     0.3.3
## ✓ dials     0.0.4     ✓ recipes   0.1.9
## ✓ dplyr     0.8.4     ✓ rsample   0.0.5
## ✓ ggplot2   3.2.1     ✓ tibble    2.1.3
## ✓ infer     0.5.1     ✓ yardstick 0.0.5
## ✓ parsnip   0.0.5
## ── Conflicts ────────────────────────────────────── tidymodels_conflicts() ──
## x purrr::discard()    masks scales::discard()
## x dplyr::filter()     masks stats::filter()
## x dplyr::lag()        masks stats::lag()
## x ggplot2::margin()   masks dials::margin()
## x recipes::step()     masks stats::step()
## x recipes::yj_trans() masks scales::yj_trans()
lm_spec <- parsnip::linear_reg() %>%
  parsnip::set_engine("lm")
ames_split <- rsample::initial_split(AmesHousing::make_ames(), prop = 0.75)
bb_wf <- workflows::workflow() %>%
  workflows::add_formula(Sale_Price ~ Bedroom_AbvGr + Full_Bath + Half_Bath) %>%
  workflows::add_model(lm_spec)
bb_wf
## ══ Workflow ═════════════════════════════════════════════════════════════════
## Preprocessor: Formula
## Model: linear_reg()
## 
## ── Preprocessor ─────────────────────────────────────────────────────────────
## Sale_Price ~ Bedroom_AbvGr + Full_Bath + Half_Bath
## 
## ── Model ────────────────────────────────────────────────────────────────────
## Linear Regression Model Specification (regression)
## 
## Computational engine: lm
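
The workflow printed above uses a formula, so it does not actually center or scale anything. A hedged sketch of a recipe-based variant follows, assuming the `recipes` functions `recipe()`, `step_normalize()` and `all_predictors()` and the `workflows` function `add_recipe()` are available in the installed versions. The names `norm_rec`, `norm_wf` and `norm_fit` and the seed are introduced here for illustration. The sketch normalizes the predictors rather than `Sale_Price`, and the workflow estimates the normalization from the training portion of the split when it is fit.

```r
library(tidymodels)

# Assumption: a seed is set here only so this sketch gives a repeatable split.
set.seed(123)
ames_split <- rsample::initial_split(AmesHousing::make_ames(), prop = 0.75)

# Recipe: declare the model terms and a center/scale step for the predictors.
# The step is only declared here; its means and standard deviations are
# estimated later, from the training data, when the workflow is fit.
norm_rec <- recipes::recipe(
  Sale_Price ~ Bedroom_AbvGr + Full_Bath + Half_Bath,
  data = rsample::training(ames_split)
) %>%
  recipes::step_normalize(recipes::all_predictors())

# Bundle the recipe and the model specification in one workflow.
norm_wf <- workflows::workflow() %>%
  workflows::add_recipe(norm_rec) %>%
  workflows::add_model(parsnip::linear_reg() %>% parsnip::set_engine("lm"))

# Fit on the training set, then apply the same preprocessing to the test set.
norm_fit <- tune::last_fit(norm_wf, ames_split)
tune::collect_metrics(norm_fit)
```

For a linear model, centering and scaling the predictors should not change the predictions, so this variant mainly shows where the transformation lives in the workflow.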
# fit the final model on the training set and evaluate it on the test set
fit_split <- tune::last_fit(bb_wf, ames_split)
fit_split
## # # Monte Carlo cross-validation (0.75/0.25) with 1 resamples  
## # A tibble: 1 x 6
##   splits        id           .metrics      .notes      .predictions    .workflow
## * <list>        <chr>        <list>        <list>      <list>          <list>   
## 1 <split [2.2K… train/test … <tibble [2 ×… <tibble [0… <tibble [732 ×… <workflo…
fit_split %>% tune::collect_metrics()
## # A tibble: 2 x 3
##   .metric .estimator .estimate
##   <chr>   <chr>          <dbl>
## 1 rmse    standard   58504.   
## 2 rsq     standard       0.354
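
The two metrics reported above can be reproduced by hand from observed and predicted values. The sketch below uses hypothetical vectors `truth` and `estimate`, not the Ames predictions, and assumes that `rsq` is the squared correlation between the two, which is how yardstick describes it.

```r
# Hypothetical observed and predicted values, for illustration only.
truth    <- c(200, 150, 310, 275, 180)
estimate <- c(210, 140, 300, 290, 170)

# Root mean squared error: typical size of a prediction error,
# in the same units as the outcome.
rmse_by_hand <- sqrt(mean((truth - estimate)^2))

# R squared as the squared correlation between truth and prediction.
rsq_by_hand <- cor(truth, estimate)^2
```

Read this way, an `rmse` of about 58,500 means test-set predictions are typically off by tens of thousands of dollars, and an `rsq` of 0.354 means the three predictors account for roughly a third of the variation in `Sale_Price`.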