Chapter 9 Workflows
- Transform data: center/scale Sale_Price
- Split data
- Train on the training set
- Evaluate/predict on the test set
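Data Leakage: when information from the test set “leaks” into the training data. Centering and scaling Sale_Price before splitting is an example. The mean and standard deviation are computed from every row, including the rows that later become the test set, so the test data influences how the training data is transformed.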
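To make the leak concrete, here is a minimal sketch that looks only at the centering step. It assumes current CRAN releases of AmesHousing and rsample. The 3/4 proportion mirrors the 0.75/0.25 split reported for ames_split below, and no seed is set, so the exact numbers vary from run to run. The mean computed before splitting is a row-weighted average of the training and test means, so test-set prices feed into the value used to center the training data.

```r
library(AmesHousing)
library(rsample)

ames <- make_ames()
ames_split <- initial_split(ames, prop = 3/4)
ames_train <- training(ames_split)
ames_test <- testing(ames_split)

# Leaky: statistic computed before splitting, so it uses test-set rows
full_mean <- mean(ames$Sale_Price)

# Leak-free: statistic computed from the training rows only
train_mean <- mean(ames_train$Sale_Price)
test_mean <- mean(ames_test$Sale_Price)

# The full-data mean is exactly a weighted mix of training and test means
n_train <- nrow(ames_train)
n_test <- nrow(ames_test)
mixed <- (n_train * train_mean + n_test * test_mean) / (n_train + n_test)

c(full_data = full_mean, training_only = train_mean)
```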
A leak-free order:
- Split data
- Transform training set
- Train on training set
- Apply the transformation learned from the training set to the test set, then evaluate/predict on the test set
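The leak-free order can be sketched with recipes. prep() estimates the centering and scaling statistics from the training set, and bake() applies those same statistics to both sets. This is a sketch assuming current tidymodels releases. step_normalize() stands in for "center/scale", and the predictors are borrowed from the bb_wf formula used later in this chapter.

```r
library(AmesHousing)
library(tidymodels)

ames <- make_ames()

# 1. Split first (3/4 matches the 0.75/0.25 split used for ames_split)
ames_split <- initial_split(ames, prop = 3/4)
ames_train <- training(ames_split)
ames_test <- testing(ames_split)

# 2. Estimate center/scale statistics for Sale_Price from the training set only
norm_rec <- recipe(Sale_Price ~ Bedroom_AbvGr + Full_Bath + Half_Bath,
                   data = ames_train) %>%
  step_normalize(Sale_Price)
norm_prep <- prep(norm_rec, training = ames_train)
train_baked <- bake(norm_prep, new_data = ames_train)

# 3. Train on the transformed training set
lm_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(Sale_Price ~ ., data = train_baked)

# 4. Apply the *training* statistics to the test set, then predict and evaluate
test_baked <- bake(norm_prep, new_data = ames_test)
test_preds <- predict(lm_fit, new_data = test_baked)
test_rmse <- rmse_vec(truth = test_baked$Sale_Price, estimate = test_preds$.pred)
```

Because the test set is baked with the training mean and standard deviation, its transformed Sale_Price will generally not have mean 0 and standard deviation 1. That is expected, and it is what keeps test-set information out of training.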
library(AmesHousing)
library(tidymodels)
## ── Attaching packages ─────────────────────────────────── tidymodels 0.0.3 ──
## ✓ broom 0.5.4 ✓ purrr 0.3.3
## ✓ dials 0.0.4 ✓ recipes 0.1.9
## ✓ dplyr 0.8.4 ✓ rsample 0.0.5
## ✓ ggplot2 3.2.1 ✓ tibble 2.1.3
## ✓ infer 0.5.1 ✓ yardstick 0.0.5
## ✓ parsnip 0.0.5
## ── Conflicts ────────────────────────────────────── tidymodels_conflicts() ──
## x purrr::discard() masks scales::discard()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## x ggplot2::margin() masks dials::margin()
## x recipes::step() masks stats::step()
## x recipes::yj_trans() masks scales::yj_trans()
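# lm_spec and ames_split are assumed from earlier chapters; recreated here
ames_split <- initial_split(make_ames(), prop = 3/4)  # 0.75/0.25 train/test split
lm_spec <- linear_reg() %>% set_engine("lm")
bb_wf <- workflows::workflow() %>%
  workflows::add_formula(Sale_Price ~ Bedroom_AbvGr + Full_Bath + Half_Bath) %>%
  workflows::add_model(lm_spec)
bb_wf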
## ══ Workflow ═════════════════════════════════════════════════════════════════
## Preprocessor: Formula
## Model: linear_reg()
##
## ── Preprocessor ─────────────────────────────────────────────────────────────
## Sale_Price ~ Bedroom_AbvGr + Full_Bath + Half_Bath
##
## ── Model ────────────────────────────────────────────────────────────────────
## Linear Regression Model Specification (regression)
##
## Computational engine: lm
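A workflow's preprocessor can also be a recipe rather than a formula. In that case the recipe is estimated inside fit(), using only the data passed to fit(), so the transformation stays tied to the training set. The following is a minimal sketch assuming current releases of workflows and recipes. Normalizing the three predictors (rather than Sale_Price) is an illustrative choice and not part of the original bb_wf.

```r
library(AmesHousing)
library(tidymodels)
library(workflows)

ames_split <- initial_split(make_ames(), prop = 3/4)
ames_train <- training(ames_split)
lm_spec <- linear_reg() %>% set_engine("lm")

# Preprocessing as a recipe: its statistics are estimated only when the workflow is fit
norm_rec <- recipe(Sale_Price ~ Bedroom_AbvGr + Full_Bath + Half_Bath,
                   data = ames_train) %>%
  step_normalize(all_predictors())

norm_wf <- workflow() %>%
  add_recipe(norm_rec) %>%
  add_model(lm_spec)

# fit() preps the recipe on ames_train; predict() bakes new data with those statistics
norm_fit <- fit(norm_wf, data = ames_train)
test_preds <- predict(norm_fit, new_data = testing(ames_split))
```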
# fit the final model to the training set and evaluate it on the test set
fit_split <- tune::last_fit(bb_wf, ames_split)
fit_split
## # # Monte Carlo cross-validation (0.75/0.25) with 1 resamples
## # A tibble: 1 x 6
## splits id .metrics .notes .predictions .workflow
## * <list> <chr> <list> <list> <list> <list>
## 1 <split [2.2K… train/test … <tibble [2 ×… <tibble [0… <tibble [732 ×… <workflo…
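The .metrics and .predictions list-columns can be unpacked with tune's collect_metrics() and collect_predictions(). The sketch below assumes current tidymodels releases. It also refits the same workflow by hand on training(ames_split) and predicts testing(ames_split), which should reproduce the RMSE that last_fit() reports.

```r
library(AmesHousing)
library(tidymodels)
library(workflows)
library(tune)

ames_split <- initial_split(make_ames(), prop = 3/4)
lm_spec <- linear_reg() %>% set_engine("lm")
bb_wf <- workflow() %>%
  add_formula(Sale_Price ~ Bedroom_AbvGr + Full_Bath + Half_Bath) %>%
  add_model(lm_spec)

fit_split <- last_fit(bb_wf, ames_split)

# Test-set metrics (rmse and rsq by default for regression)
test_metrics <- collect_metrics(fit_split)
# One row of predictions per test-set house
test_preds <- collect_predictions(fit_split)

# Roughly what last_fit() does: fit on the training set, predict the test set
manual_fit <- fit(bb_wf, data = training(ames_split))
manual_preds <- predict(manual_fit, new_data = testing(ames_split))
manual_rmse <- rmse_vec(truth = testing(ames_split)$Sale_Price,
                        estimate = manual_preds$.pred)
```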