27  End-to-End ML Pipeline

From features to deployed model in one flow

Keywords

snowflake, R, RStudio, Posit, VS Code, workspace notebooks, snowflakeR, RSnowflake, mlops

27.1 Overview

This chapter ties the guide together into a single reference pipeline, the path most R teams want on Snowflake: governed features, training in R, registry deployment, monitoring, and optional scale-out. Use it as a map while reviewing the individual chapters.

A Workspace notebook can be made R-ready in two supported ways: self-serve bootstrap (setup_notebook() on the default runtime) or an organisation CRE (Custom Runtime Environment with R pre-baked). Once R is available, the pipeline stages below are identical; only the first interactive step differs.

27.2 Learning Objectives

  • Map chapters to pipeline stages
  • Choose an entry point: Workspace (bootstrap or CRE) or a local IDE
  • Find companion notebooks for copy-paste starting points
  • Complete the production checklist before go-live

27.3 Reference pipeline

flowchart TB
  subgraph ingest [Data]
    src[Source tables / streams]
    fv[Feature Views]
  end
  subgraph dev [Development]
    boot["R-ready Workspace<br/>bootstrap or CRE"]
    train[Train in R]
    exp[Experiments optional]
  end
  subgraph govern [Governance]
    ds[Dataset snapshot]
    reg[Model Registry]
  end
  subgraph prod [Production]
    dep[Deploy SPCS / SQL]
    mon[Monitoring]
    task[Tasks / ML Jobs]
  end
  src --> fv --> ds --> train
  boot --> train
  train --> exp
  train --> reg --> dep --> mon
  dep --> task
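
The flowchart is also a dependency graph, where each arrow means "must exist before". The base-R sketch below illustrates this. It copies the node ids and edges from the diagram into a data frame, then derives one valid run order using Kahn's topological sort. The names pipeline_edges and topo_order are hypothetical. Neither snowflakeR nor RSnowflake consumes this structure; it only makes the stage ordering explicit.

```r
# Edges copied from the reference pipeline flowchart (hypothetical illustration data).
pipeline_edges <- data.frame(
  from = c("src", "fv", "ds",    "boot",  "train", "train", "reg", "dep", "dep"),
  to   = c("fv",  "ds", "train", "train", "exp",   "reg",   "dep", "mon", "task"),
  stringsAsFactors = FALSE
)

# Kahn's algorithm: repeatedly emit a node whose upstream dependencies are all satisfied.
topo_order <- function(edges) {
  nodes <- unique(c(edges$from, edges$to))
  indegree <- setNames(integer(length(nodes)), nodes)
  for (v in edges$to) indegree[[v]] <- indegree[[v]] + 1L

  ready <- nodes[indegree == 0L]   # stages with no upstream dependency
  visited <- character(0)
  while (length(ready) > 0) {
    node <- ready[1]
    ready <- ready[-1]
    visited <- c(visited, node)
    # Release downstream stages whose last remaining dependency was just satisfied.
    for (nxt in edges$to[edges$from == node]) {
      indegree[[nxt]] <- indegree[[nxt]] - 1L
      if (indegree[[nxt]] == 0L) ready <- c(ready, nxt)
    }
  }
  if (length(visited) != length(nodes)) stop("pipeline graph contains a cycle")
  visited
}

print(topo_order(pipeline_edges))
```

For this graph the order is src, boot, fv, ds, train, exp, reg, dep, mon, task. Both the R-ready Workspace (boot) and the dataset snapshot (ds) precede training, and deployment precedes both monitoring and scheduled tasks.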


27.4 Stage-by-stage map

| Stage | What you do | Guide chapter | Starter artifact |
|---|---|---|---|
| 0. Learn platform | Warehouses, Workspace, ML services | 01, 05 | |
| 1. Connect | Auth, TOML, IDE or Workspace | 03, 04, 16 | connections.toml |
| 2. R-ready Workspace | Bootstrap or attach org CRE | 06, 09, 07 | snowflaker_config.yaml / cre@<org> |
| 3. Features | Entities, views, point-in-time data | 17 | workspace_feature_store.ipynb |
| 4. Train | tidymodels / fable / custom | 08, 19 | |
| 5. Register | sfr_log_model(), deploy | 18 | workspace_model_registry.ipynb |
| 6. Scale | Parallel doSnowflake → many-model | 22, 21 | parallel SPCS notebooks |
| 7. Monitor | Inference log + drift on deployed versions | 20 | monitoring vignette |

MLOps framing situates these stages in your organization’s lifecycle.


27.5 Entry paths

Teams commonly enter this pipeline in one of three ways. Stages 3–7 in the table above follow the same order as the snowflakeR chapters (features through monitoring).

27.5.1 Path A — Workspace with self-serve bootstrap

For pilots, sandboxes, or while an org CRE is not yet available:

  1. Snowsight → Workspace → Git-connected project
  2. Run the Python bootstrap cell: setup_notebook() with your snowflaker_*.yaml config (06)
  3. Enable External Access (EAI) so packages can be downloaded on first run (07)
  4. Use %%R cells for all R work; re-run the bootstrap whenever the runtime is recycled after going idle (the bootstrap typically takes ~60s for snowflakeR + RSnowflake)

27.5.2 Path B: Workspace with organisation CRE

For teams whose platform group publishes an R-ready image (see the org CRE onboarding in Appendix D):

  1. Snowsight → Workspace → Git-connected project
  2. Attach cre@<org_name> in notebook settings and start the kernel
  3. Confirm %%R is available (often automatic on kernel start)
  4. Do not re-run setup_notebook() unless you need packages that are not in the image

27.5.3 Path C — Local IDE-first

  1. Install RSnowflake + snowflakeR locally (03)
  2. Develop in RStudio / Posit / VS Code with connections.toml (04)
  3. Push notebook + config to Workspace Git; run in Workspace via Path A or B for scheduled execution
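
Before step 2, a quick local pre-flight can confirm that connections.toml exists and defines the connection name you expect. The sketch below is a hypothetical helper and is not part of RSnowflake or snowflakeR. It assumes the file lives in $SNOWFLAKE_HOME or, if that is unset, in ~/.snowflake. This is the convention used by Snowflake's own clients such as the Snowflake CLI; some clients also check platform config directories, and chapter 04 is authoritative for how these packages resolve the file. The helper returns only the [name] table headers, so credentials stored in the file are never printed.

```r
# Hypothetical pre-flight helpers for Path C; not part of RSnowflake or snowflakeR.
# Assumption: the file lives in $SNOWFLAKE_HOME or, if unset, ~/.snowflake.
find_connections_toml <- function(home = Sys.getenv("SNOWFLAKE_HOME", unset = "~/.snowflake")) {
  path <- file.path(path.expand(home), "connections.toml")
  if (file.exists(path)) path else NA_character_
}

# Return only the [name] table headers; values (which may include credentials) are not returned.
connection_names <- function(path) {
  lines <- readLines(path, warn = FALSE)
  headers <- grep("^\\s*\\[[^\\]]+\\]\\s*$", lines, value = TRUE, perl = TRUE)
  trimws(gsub("^\\s*\\[|\\]\\s*$", "", headers, perl = TRUE))
}

toml <- find_connections_toml()
if (is.na(toml)) {
  message("connections.toml not found; see chapter 04")
} else {
  print(connection_names(toml))
}
```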

Paths A, B, and C use the same snowflakeR / RSnowflake APIs once connected; only environment setup and authentication differ.


27.6 Marketing / causal demo

The demo overview walks through CausalImpact and Robyn, using:

  • Feature Store for marketing features
  • Model Registry for response curves
  • SQL-served inference for stakeholder dashboards

It is a good template for measurement and marketing mix modeling (MMM) teams evaluating snowflakeR.


27.7 Minimal smoke test

After your Workspace is R-ready (bootstrap finished, or CRE attached and kernel started), run this sequence before running a full pipeline.

If you use bootstrap (Path A)

Run the Python bootstrap cell first (06). Wait until it reports success (~60s typical). Then run the %%R cells below.

If you use organisation CRE (Path B)

Attach cre@<org_name> in notebook settings, start the kernel, and confirm %%R is available (often automatic on kernel start). Do not re-run setup_notebook() unless you need packages not in the image. Then run the %%R cells below.

snowflakeR connection and config:

%%R
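library(snowflakeR)
conn <- sfr_connect()                    # open the snowflakeR session connection
conn <- sfr_load_notebook_config(conn)   # load the project's notebook config YAML (06)
# Expect your user plus the database and schema you configured
sfr_query(conn, "SELECT CURRENT_USER(), CURRENT_DATABASE(), CURRENT_SCHEMA()")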

RSnowflake / DBI, in the same session (this validates the SQL API path used by workers and bulk I/O):

%%R
library(DBI)
library(RSnowflake)
con <- dbConnect(Snowflake())            # DBI connection through the RSnowflake driver
dbGetQuery(con, "SELECT 1 AS ok")        # expect a single row containing 1

If the first block fails on a CRE notebook, check that snowflakeR is in the image. Also check that sfr_load_notebook_config() points at your project's YAML; for Git projects, use an absolute path under /filesystem/ (see 06). If only the second block fails in a bootstrap-based notebook (Path A), re-run the Python bootstrap cell and confirm that EAI is enabled.
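
To see both results in one pass, you can wrap the two blocks in a small base-R helper. It records pass or fail per check instead of stopping at the first error, which shows which of the remediations applies. run_smoke_checks() is a hypothetical name. The snowflakeR and RSnowflake calls inside it are exactly the ones shown above. The usage part runs only when those packages are installed, so paste it into a %%R cell in an R-ready Workspace.

```r
# Hypothetical helper: run each check, capturing errors instead of stopping.
run_smoke_checks <- function(checks) {
  rows <- lapply(names(checks), function(name) {
    err <- tryCatch({
      checks[[name]]()
      NA_character_                     # no error: record a pass
    }, error = function(e) conditionMessage(e))
    data.frame(check = name, ok = is.na(err), error = err, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Usage: the same calls as the two smoke-test blocks, skipped if packages are missing.
if (requireNamespace("snowflakeR", quietly = TRUE) &&
    requireNamespace("RSnowflake", quietly = TRUE) &&
    requireNamespace("DBI", quietly = TRUE)) {
  results <- run_smoke_checks(list(
    snowflakeR = function() {
      conn <- snowflakeR::sfr_connect()
      conn <- snowflakeR::sfr_load_notebook_config(conn)
      snowflakeR::sfr_query(conn, "SELECT CURRENT_USER(), CURRENT_DATABASE(), CURRENT_SCHEMA()")
    },
    rsnowflake_dbi = function() {
      con <- DBI::dbConnect(RSnowflake::Snowflake())
      DBI::dbGetQuery(con, "SELECT 1 AS ok")
    }
  ))
  print(results)
}
```

A FALSE in the snowflakeR row on a CRE notebook, or in the rsnowflake_dbi row after bootstrap, points to the matching remediation described for the smoke test.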


27.8 Production handoff

Before production cutover:

  • Work through Appendix D: Production checklist, which includes org CRE onboarding for platform teams and tarball pins for both CRE builds and bootstrap YAML
  • Keep Appendix C: Troubleshooting at hand
  • Use pre-built tarballs in bootstrap configs and CRE build profiles (Appendix B)
  • Decide bootstrap vs CRE per audience: analysts on CRE; sandbox/PoC work may stay on bootstrap until the image is promoted

Important

Community packages: validate their SLAs, support, and compliance with your organization before relying on them in production.


27.9 Feedback

Open issues on the snowflakeR repository (the guide source lives under guide/) or on the package repositories linked from the home page.