27 End-to-End ML Pipeline

From features to deployed model in one flow

Keywords

snowflake, R, RStudio, Posit, VS Code, workspace notebooks, snowflakeR, RSnowflake, mlops

27.1 Overview

This chapter ties the guide into a single reference pipeline — the path most R teams want on Snowflake: governed features, training in R, registry deployment, monitoring, and optional scale-out. Use it as a map while reviewing individual chapters.

Workspace R readiness has two supported paths: self-serve bootstrap (setup_notebook() on the default runtime) and organisation CRE (Custom Runtime Environment with R pre-baked). The pipeline stages below are the same after R is available — only the first interactive step differs.

27.2 Learning Objectives

- Map chapters to pipeline stages
- Choose between Workspace (bootstrap vs CRE) and local IDE entry points
- Find companion notebooks for copy-paste starting points
- Complete the production checklist before go-live

27.3 Reference pipeline

flowchart TB
  subgraph ingest [Data]
    src[Source tables / streams]
    fv[Feature Views]
  end
  subgraph dev [Development]
    boot["R-ready Workspace<br/>bootstrap or CRE"]
    train[Train in R]
    exp[Experiments optional]
  end
  subgraph govern [Governance]
    ds[Dataset snapshot]
    reg[Model Registry]
  end
  subgraph prod [Production]
    dep[Deploy SPCS / SQL]
    mon[Monitoring]
    task[Tasks / ML Jobs]
  end
  src --> fv --> ds --> train
  boot --> train
  train --> exp
  train --> reg --> dep --> mon
  dep --> task
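
The ordering implied by the flowchart can be checked mechanically. The following is a minimal sketch, not part of snowflakeR: it copies the diagram's edges into a plain dictionary and uses Python's standard-library `graphlib` to derive one valid execution order. The names `PIPELINE_EDGES` and `stage_order` are illustrative only.

```python
from graphlib import TopologicalSorter

# Edges copied from the flowchart above: node -> nodes it feeds into.
PIPELINE_EDGES = {
    "src": ["fv"],
    "fv": ["ds"],
    "ds": ["train"],
    "boot": ["train"],
    "train": ["exp", "reg"],
    "reg": ["dep"],
    "dep": ["mon", "task"],
}


def stage_order(edges):
    """Return one valid execution order for the pipeline nodes."""
    sorter = TopologicalSorter()
    for upstream, downstreams in edges.items():
        for downstream in downstreams:
            # add(node, *predecessors): downstream depends on upstream.
            sorter.add(downstream, upstream)
    return list(sorter.static_order())


order = stage_order(PIPELINE_EDGES)
print(" -> ".join(order))
```

Several orders are valid (for example, `boot` may come before or after `src`), so the sketch only guarantees that every node appears after everything that feeds it.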

27.4 Stage-by-stage map

| Stage | What you do | Guide chapter | Starter artifact |
|-------|-------------|---------------|------------------|
| 0. Learn platform | Warehouses, Workspace, ML services | 01, 05 | - |
| 1. Connect | Auth, TOML, IDE or Workspace | 03, 04, 16 | `connections.toml` |
| 2. R-ready Workspace | Bootstrap or attach org CRE | 06, 09, 07 | `snowflaker_config.yaml` / `cre@<org>` |
| 3. Features | Entities, views, point-in-time data | 17 | `workspace_feature_store.ipynb` |
| 4. Train | tidymodels / fable / custom | 08, 19 | - |
| 5. Register | `sfr_log_model()`, deploy | 18 | `workspace_model_registry.ipynb` |
| 6. Scale | Parallel doSnowflake → many-model | 22, 21 | parallel SPCS notebooks |
| 7. Monitor | Inference log + drift on deployed versions | 20 | monitoring vignette |

The MLOps framing chapter (15) situates these stages in your organization’s lifecycle.

27.5 Entry paths

There are three common ways teams enter this pipeline. Stages 3–7 in the table above follow the same order as the snowflakeR chapters (features through monitoring).

27.5.1 Path A — Workspace with self-serve bootstrap

For pilots, sandboxes, or while an org CRE is not yet available:

1. Snowsight → Workspace → Git-connected project
2. Run the Python bootstrap cell: setup_notebook() + snowflaker_*.yaml (06)
3. Enable External Access for package downloads on first run (07)
4. Use %%R cells for all R work; re-run bootstrap after idle recycle (~60s typical for snowflakeR + RSnowflake)

27.5.2 Path B — Workspace with organisation CRE (recommended at scale)

When platform/IT has registered a Custom Runtime Environment:

1. Snowsight → Workspace → attach cre@<org_name> in notebook advanced settings
2. Skip setup_notebook() for standard packages already in the image (R, %%R, snowflakeR, RSnowflake, optional ADBC)
3. Optional Python cell: session checks, sfr_load_notebook_config(), or EAI-only extras (see RSnowflake in Workspace, 14)
4. Use %%R cells for modeling; promote to ML Job + same CRE for scheduled batch (09)

CRE and bootstrap can coexist: bootstrap for experimentation, CRE for production notebooks and jobs.

27.5.3 Path C — Local IDE-first

1. Install RSnowflake + snowflakeR locally (03)
2. Develop in RStudio / Posit / VS Code with connections.toml (04)
3. Push notebook + config to Workspace Git; run in Workspace via Path A or B for scheduled execution

Paths A, B, and C use the same snowflakeR / RSnowflake APIs after connection — only environment setup and auth differ.

27.6 Marketing / causal demo

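The demo overview (https://github.com/Snowflake-Labs/snowflakeR/blob/main/inst/notebooks/demo_overview.md) walks through CausalImpact + Robyn:

- Feature Store for marketing features
- Model Registry for response curves
- SQL-served inference for stakeholder dashboards

It is a good template for measurement and marketing mix modeling (MMM) teams evaluating snowflakeR.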

27.7 Minimal smoke test

After your Workspace is R-ready (bootstrap finished or CRE attached and kernel started), run this sequence before a full pipeline.

If you used bootstrap (Path A)

Run the Python bootstrap cell first (06). Wait until it reports success (~60s typical). Then run the %%R cells below.

If you use organisation CRE (Path B)

Attach cre@<org_name> in notebook settings, start the kernel, and confirm %%R is available (often automatic on kernel start). Do not re-run setup_notebook() unless you need packages not in the image. Then run the %%R cells below.

snowflakeR connection and config:

%%R
library(snowflakeR)
conn <- sfr_connect()
conn <- sfr_load_notebook_config(conn)
sfr_query(conn, "SELECT CURRENT_USER(), CURRENT_DATABASE(), CURRENT_SCHEMA()")

RSnowflake / DBI (same session — validates SQL API path used by workers and bulk I/O):

%%R
library(DBI)
library(RSnowflake)
con <- dbConnect(Snowflake())
dbGetQuery(con, "SELECT 1 AS ok")

If the first block fails on a CRE notebook, check that snowflakeR is in the image and that sfr_load_notebook_config() points at your project’s YAML (for Git projects, an absolute path under /filesystem/; see 06). If the second block fails only on a bootstrapped notebook, re-run the Python bootstrap cell and confirm that EAI is enabled.

27.8 Production handoff

Before production cutover:

- Appendix D: Production checklist (includes org CRE onboarding for platform teams and tarball pins for both CRE builds and bootstrap YAML)
- Appendix C: Troubleshooting
- Pre-built tarballs in bootstrap configs and CRE build profiles (Appendix B)
- Decide bootstrap vs CRE per audience: analysts on CRE; sandbox/PoC may keep bootstrap until the image is promoted

Important

Community packages — validate SLAs, support, and compliance with your organization.

27.9 Feedback

Open issues on snowflakeR (https://github.com/Snowflake-Labs/snowflakeR/issues; the guide source is under guide/) or on the package repos linked from the home page.
