```mermaid
flowchart TB
  subgraph ingest [Ingest and transform]
    src[Source tables / streams]
    dbt[dbt / Dynamic Tables]
  end
  subgraph features [Feature platform]
    ent[Entities]
    fv[Feature Views]
    ds[Datasets]
  end
  subgraph model [Model lifecycle]
    train[Train in R]
    exp[Experiments]
    reg[Model Registry]
  end
  subgraph serve [Serve and observe]
    spcs[SPCS inference]
    sql[Warehouse SQL score]
    mon[Monitoring]
  end
  src --> dbt --> fv
  ent --> fv --> ds --> train
  train --> exp --> reg
  reg --> spcs
  reg --> sql
  reg --> mon
```
17 MLOps on Snowflake
Data flow, Snowflake ML, and why snowflakeR
17.1 Overview
This chapter frames MLOps on Snowflake for R users: how data moves from sources to features to models to production — and where snowflakeR fits alongside Posit, vetiver, and RSnowflake.
The R community has strong tools for modeling (tidymodels, forecast, lme4, …) and for local MLOps (renv, targets, vetiver, Posit Connect). When your data and models live in Snowflake, snowflakeR adds the in-platform infrastructure layer: model registry, Feature Store, and governed serving.
Disclaimer: snowflakeR, RSnowflake, and snowflake-notebook-multilang are Snowflake-Labs community projects, not officially supported product offerings, and their APIs may change as they evolve. See Introduction, Disclaimers.
17.2 Learning Objectives
- Sketch an end-to-end ML lifecycle on Snowflake
- Map existing R tools to snowflakeR capabilities
- Explain why to run ML in-account rather than “R only on a laptop”
- Choose between snowflakeR and RSnowflake-only paths
17.3 The deployment gap (R context)
R excels at exploration and modeling. Production often stalls on:
- Containerizing R (Docker, K8s)
- Operating REST APIs (Plumber) per model
- Moving data out of the warehouse for scoring and back
- Central versioning and lineage across teams
Posit Connect and similar tools solve publishing well. snowflakeR targets teams whose system of record is Snowflake — features, models, and predictions stay in-account with the same governance as tables.
17.4 MLOps data flow
| Stage | Snowflake object | R tooling |
|---|---|---|
| Ingest / transform | Tables, Dynamic Tables, Streams | RSnowflake / dbplyr; dbt (often owned by another team) |
| Feature definitions | Entities, Feature Views | sfr_feature_store(), sfr_create_feature_view() |
| Training snapshot | Datasets | sfr_generate_training_data() |
| Experimentation | Experiment runs | sfr_start_run(), metrics helpers |
| Model store | Registry versions | sfr_log_model() |
| Online inference | SPCS service | sfr_deploy_model(), sfr_predict() |
| Batch inference | Warehouse SQL, REST | sfr_predict(), SQL functions |
| Monitoring | Monitoring jobs | Model monitoring APIs |
| Scale-out training/scoring | Tasks, SPCS workers | registerDoSnowflake() |
17.5 R MLOps stack mapping
How snowflakeR complements (not replaces) the Posit/community stack:
| Stage | Existing R tools | snowflakeR adds |
|---|---|---|
| Data access | DBI, dbplyr, arrow | Same + sfr_query(); bridge via RSnowflake |
| Feature engineering | recipes, dplyr | sfr_create_feature_view() — governed, shared |
| Modeling | tidymodels, caret, base R, forecast | Train as usual; sfr_log_model() |
| Dependencies | renv, Posit Package Manager | conda_deps / env specs for serving |
| Local versioning | vetiver + pins | Registry as system of record on Snowflake |
| Orchestration | targets | Tasks + doSnowflake; targets remains useful for local pipelines |
| Deployment | Connect, Plumber, Docker | sfr_deploy_model() → SPCS |
| Monitoring | vetiver metrics | Registry-integrated monitoring |
Principle: Keep your modeling idioms; add Snowflake for storage, lineage, and scaled serving.
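The Dependencies row is the one teams tend to discover at serve time. The table's conda_deps entry suggests that serving environments are described as conda package specs; the exact sfr_log_model() argument name is an assumption to check against the snowflakeR reference. As a minimal sketch, the hypothetical helper to_conda_specs() below turns R package names into conda-forge specs. It relies on the conda-forge convention of publishing CRAN packages as r-<lowercase name> and skips base packages, which ship with r-base.

```r
# Hypothetical helper, not part of snowflakeR: build conda-forge specs for the
# R packages a model needs at serve time. conda-forge publishes CRAN packages
# as "r-<lowercase name>", e.g. ranger -> r-ranger.
to_conda_specs <- function(pkgs, versions = NULL) {
  # Base packages (stats, utils, ...) ship with r-base, so they need no spec.
  base_pkgs <- rownames(installed.packages(priority = "base"))
  keep <- !(pkgs %in% base_pkgs)
  pkgs <- pkgs[keep]
  if (length(pkgs) == 0) return(character(0))

  if (is.null(versions)) {
    # Pin to the versions installed in the current (training) session.
    versions <- vapply(pkgs, function(p) as.character(packageVersion(p)),
                       character(1), USE.NAMES = FALSE)
  } else {
    versions <- versions[keep]
  }

  specs <- paste0("r-", tolower(pkgs))
  pinned <- !is.na(versions)
  if (any(pinned)) {
    specs[pinned] <- paste0(specs[pinned], "==", versions[pinned])
  }
  specs
}

# Explicit versions for illustration; NA means "unpinned".
to_conda_specs(c("ranger", "stats", "data.table"),
               versions = c("0.16.0", NA, NA))
# returns c("r-ranger==0.16.0", "r-data.table")
```

Calling it with versions = NULL in the training session pins exactly what you trained with. Not every CRAN package or version is published on conda-forge, so check availability before logging the model.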
17.6 Why Snowflake ML (not R-only)?
| R-only pattern | Pain on Snowflake-centric teams |
|---|---|
| Train locally, export CSV | Egress cost, governance, stale data |
| PMML / ONNX conversion | Many R models don’t convert cleanly. For supported workflows, orbital (tidymodels → SQL) or Snowflake warehouse-native SQL model forms can score in SQL without shipping R to inference |
| One Plumber container per model | Ops sprawl, no unified registry |
| Cron on a VM | No lineage to warehouse tables |
| Score in R, write results manually | Race conditions, audit gaps |
Snowflake ML provides:
- Single registry for Python and R models
- Feature Store with point-in-time correctness
- Lineage from table → feature → dataset → model
- Elastic serving on SPCS or SQL
- Same RBAC as data objects
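To make "point-in-time correctness" concrete: when training rows are generated, each label must be joined to the feature values known at the label's timestamp, never later ones, or the model learns from the future. The base-R sketch below performs that as-of join on toy data. The asof_join() helper and all column names are illustrative; this is not how the Feature Store or sfr_generate_training_data() is implemented.

```r
# Illustrative only: the point-in-time ("as-of") join a Feature Store does
# when it builds training data, written in base R on toy data. For each spine
# row, take the latest feature value whose timestamp is <= the row's timestamp.
asof_join <- function(spine, features, key, ts, feature) {
  out <- spine
  out[[feature]] <- NA_real_
  for (i in seq_len(nrow(spine))) {
    # Feature rows for the same entity, observed no later than this event.
    cand <- features[features[[key]] == spine[[key]][i] &
                       features[[ts]] <= spine[[ts]][i], , drop = FALSE]
    if (nrow(cand) > 0) {
      out[[feature]][i] <- cand[[feature]][which.max(cand[[ts]])]
    }
  }
  out
}

# Toy data: labels (the "spine") and a time-stamped feature history.
spine <- data.frame(
  customer_id = c(1, 1, 2),
  ts = as.Date(c("2024-01-10", "2024-02-10", "2024-01-15")),
  churned = c(0, 1, 0)
)
features <- data.frame(
  customer_id = c(1, 1, 1, 2),
  ts = as.Date(c("2024-01-01", "2024-02-01", "2024-03-01", "2024-01-20")),
  spend_30d = c(100, 250, 400, 80)
)

training <- asof_join(spine, features, key = "customer_id", ts = "ts",
                      feature = "spend_30d")
training$spend_30d  # 100 250 NA
```

The second row gets 250 rather than the March value of 400, and customer 2 gets NA because its only feature row postdates the label. A naive "latest value" join would leak the future into both rows.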
17.7 How snowflakeR works (brief)
snowflakeR uses reticulate to call snowflake-ml-python. R users call sfr_* functions; the package handles Python.
For serving, sfr_log_model():
- Serializes the R model (.rds)
- Auto-generates a Python CustomModel that loads R via rpy2 at inference
- Registers the model in the Model Registry
sfr_deploy_model() then deploys the registered version to SPCS, where the container runs R and your model.
You do not hand-write Python wrappers — see Model Registry.
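The serialization step (.rds) in the list above is ordinary base R. As a rough sketch of what that step relies on (snowflakeR's generated CustomModel and the rpy2 layer are not shown), a fitted model saved with saveRDS() can be reloaded with readRDS() in another R process and scored, provided the packages it needs are installed there.

```r
# Sketch of the serialization step only (not snowflakeR's code): fit a
# model, save it as .rds, then reload and score it as a separate inference
# process would.
fit <- lm(mpg ~ wt + hp, data = mtcars)

model_path <- tempfile(fileext = ".rds")
saveRDS(fit, model_path)

# Later, e.g. in a serving container with the same packages installed:
loaded <- readRDS(model_path)
new_rows <- data.frame(wt = c(2.5, 3.2), hp = c(110, 150))
preds <- predict(loaded, newdata = new_rows)
preds
```

Objects that wrap external pointers (keras models, for example) do not survive saveRDS() on their own and need their package's own save functions, so it is worth testing the reload step before logging a model.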
In Workspace, rpy2 also powers %%R cells (Python → R direction). See Architecture.
17.8 Complementary platforms
| Platform | Role | With snowflakeR |
|---|---|---|
| RStudio / Positron (local) | Author, debug | sfr_connect(profile=...) |
| Posit Workbench Native App on Snowflake | In-account RStudio | Install RSnowflake + snowflakeR in session; same APIs |
| Posit Connect | Shiny, Quarto, Plumber publishing | Apps call registry model endpoints via sfr_predict_rest() |
| Posit Package Manager | CRAN mirror | Faster corporate installs; still use Snowflake for ML |
| vetiver | Local model pins + metrics | Optional; registry for production truth |
| RSnowflake | SQL/dplyr only | sfr_dbi_connection() when mixing ML + dbplyr |
Posit and Snowflake partnered on Workbench as a Native App — RStudio inside your account with data locality. snowflakeR is independent open source but aligns with that workflow.
17.9 Platform services for operationalization
ML in production uses more than registry APIs:
| Service | Operational use |
|---|---|
| Tasks | Nightly retrain triggers, batch score SQL, pipeline steps |
| SPCS | Low-latency inference, parallel R workers |
| Stages | Model artifacts, training exports, worker file I/O |
| EAI (External Access Integrations) | Package installs in notebooks/containers |
| Git / Workspace | Promote notebooks to scheduled jobs |
| Tags / masking | Compliance on feature and prediction tables |
For using Tasks and SPCS from R, see the snowflakeR chapters Connect and Parallel doSnowflake.
17.10 Decision guide
Use snowflakeR when you need:
- Feature Store or Model Registry from R
- In-account inference (SPCS / SQL / REST)
- Experiment and monitoring integration
- Interop with Python ML on the same objects
Use RSnowflake alone when:
- You need SQL analytics and dplyr, with no registry or Feature Store
- You deliberately export scores to an external system
Use Posit Connect (without snowflakeR) when:
- The primary deliverable is a Shiny app or Quarto document; predictions may still call Snowflake via REST
Use both when:
- You develop in RStudio, register and serve on Snowflake, and publish a Connect app that calls the deployed model
17.11 Anti-patterns
| Avoid | Prefer |
|---|---|
| SELECT * of millions of rows into R | Aggregate in SQL; sample; use Feature Store datasets |
| Retrain only on laptop, deploy manually | Log version in registry; deploy alias @champion |
| Duplicate feature logic in R and Python | Feature Views as single definition |
| Ignore conda-forge constraints at serve time | Plan dependencies at sfr_log_model() time |
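The @champion row works because callers bind to an alias rather than to a version number. The Snowflake registry manages aliases for you; as a purely hypothetical, in-memory illustration (log_version(), set_alias() and resolve() are made-up names, not snowflakeR or Snowflake APIs), the sketch below shows that promoting a new version changes what score() uses without editing score() itself.

```r
# Hypothetical, in-memory stand-in for a model registry (not the snowflakeR
# or Snowflake API): versions are immutable once logged; aliases are movable
# pointers to a version.
registry <- new.env()
registry$versions <- list()
registry$aliases <- character()

log_version <- function(reg, version, model) {
  stopifnot(!version %in% names(reg$versions))  # never overwrite a version
  reg$versions[[version]] <- model
  invisible(version)
}

set_alias <- function(reg, alias, version) {
  stopifnot(version %in% names(reg$versions))
  reg$aliases[alias] <- version
  invisible(version)
}

resolve <- function(reg, alias) reg$versions[[reg$aliases[[alias]]]]

# Scoring code binds to the alias, never to a specific version.
score <- function(newdata) predict(resolve(registry, "champion"), newdata = newdata)

log_version(registry, "v1", lm(mpg ~ wt, data = mtcars))
set_alias(registry, "champion", "v1")
before <- score(mtcars[1:3, ])

# Promote a retrained model: score() itself is unchanged.
log_version(registry, "v2", lm(mpg ~ wt + hp, data = mtcars))
set_alias(registry, "champion", "v2")
after <- score(mtcars[1:3, ])
```

Keeping versions immutable and moving only the alias also gives a one-line rollback: point champion back at v1.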
17.12 Next steps
Feature Store Implementation Guide (concept chapters 01–06)