17  MLOps on Snowflake

Data flow, Snowflake ML, and why snowflakeR

Keywords

snowflake, R, RStudio, Posit, VS Code, workspace notebooks, snowflakeR, RSnowflake, mlops

17.1 Overview

This chapter frames MLOps on Snowflake for R users: how data moves from sources to features to models to production, and where snowflakeR fits alongside Posit, vetiver, and RSnowflake.

The R community has strong tools for modeling (tidymodels, forecast, lme4, …) and local MLOps (renv, targets, vetiver, Posit Connect). snowflakeR adds the in-platform infrastructure layer (registry, feature store, governed serving) for when your data and models live in Snowflake.

Important

See Introduction (Disclaimers).

snowflakeR, RSnowflake, and snowflake-notebook-multilang are Snowflake-Labs community projects, not officially supported product offerings. APIs may change as they evolve.

17.2 Learning Objectives

  • Sketch an end-to-end ML lifecycle on Snowflake
  • Map existing R tools to snowflakeR capabilities
  • Explain why in-account ML beats an “R only on a laptop” workflow for Snowflake-centric teams
  • Choose between snowflakeR and an RSnowflake-only path

17.3 The deployment gap (R context)

R excels at exploration and modeling. Production often stalls on:

  • Containerizing R (Docker, K8s)
  • Operating REST APIs (Plumber) per model
  • Moving data out of the warehouse for scoring and back
  • Central versioning and lineage across teams

Posit Connect and similar tools solve publishing well. snowflakeR targets teams whose system of record is Snowflake — features, models, and predictions stay in-account with the same governance as tables.

17.4 MLOps data flow

flowchart TB
  subgraph ingest [Ingest and transform]
    src[Source tables / streams]
    dbt[dbt / Dynamic Tables]
  end
  subgraph features [Feature platform]
    ent[Entities]
    fv[Feature Views]
    ds[Datasets]
  end
  subgraph model [Model lifecycle]
    train[Train in R]
    exp[Experiments]
    reg[Model Registry]
  end
  subgraph serve [Serve and observe]
    spcs[SPCS inference]
    sql[Warehouse SQL score]
    mon[Monitoring]
  end
  src --> dbt --> fv
  ent --> fv --> ds --> train
  train --> exp --> reg
  reg --> spcs
  reg --> sql
  reg --> mon

| Stage | Snowflake object | R tooling |
|---|---|---|
| Ingest / transform | Tables, Dynamic Tables, Streams | RSnowflake / dbplyr; dbt (other team) |
| Feature definitions | Entities, Feature Views | sfr_feature_store(), sfr_create_feature_view() |
| Training snapshot | Datasets | sfr_generate_training_data() |
| Experimentation | Experiment runs | sfr_start_run(), metrics helpers |
| Model store | Registry versions | sfr_log_model() |
| Online inference | SPCS service | sfr_deploy_model(), sfr_predict() |
| Batch inference | Warehouse SQL, REST | sfr_predict(), SQL functions |
| Monitoring | Monitoring jobs | Model monitoring APIs |
| Scale-out training/scoring | Tasks, SPCS workers | registerDoSnowflake() |

17.5 R MLOps stack mapping

How snowflakeR complements (not replaces) the Posit/community stack:

| Stage | Existing R tools | snowflakeR adds |
|---|---|---|
| Data access | DBI, dbplyr, arrow | Same + sfr_query(); bridge via RSnowflake |
| Feature engineering | recipes, dplyr | sfr_create_feature_view(): governed, shared |
| Modeling | tidymodels, caret, base R, forecast | Train as usual; sfr_log_model() |
| Dependencies | renv, Posit Package Manager | conda_deps / env specs for serving |
| Local versioning | vetiver + pins | Registry as system of record on Snowflake |
| Orchestration | targets | Tasks + doSnowflake; targets still optional for local work |
| Deployment | Connect, Plumber, Docker | sfr_deploy_model() → SPCS |
| Monitoring | vetiver metrics | Registry-integrated monitoring |

Principle: Keep your modeling idioms; add Snowflake for storage, lineage, and scaled serving.

17.6 Why Snowflake ML (not R-only)?

| R-only pattern | Pain for Snowflake-centric teams |
|---|---|
| Train locally, export CSV | Egress cost, governance, stale data |
| PMML / ONNX conversion | Many R models don’t convert cleanly; for supported workflows, orbital (tidymodels → SQL) or Snowflake warehouse-native / SQL model forms can score in SQL without shipping R to inference |
| One Plumber container per model | Ops sprawl, no unified registry |
| Cron on a VM | No lineage to warehouse tables |
| Score in R, write results manually | Race conditions, audit gaps |

Snowflake ML provides:

  • Single registry for Python and R models
  • Feature Store with point-in-time correctness
  • Lineage from table → feature → dataset → model
  • Elastic serving on Snowpark Container Services (SPCS) or in warehouse SQL
  • Same RBAC as data objects
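
Point-in-time correctness means each training row sees only the feature values that were known at or before its own timestamp, which keeps future information from leaking into training data. The following is a minimal, self-contained Python sketch of that lookup using only the standard library. The function name `point_in_time_value` and the sample data are hypothetical illustrations, not part of snowflakeR or Snowflake ML, where the Feature Store performs the equivalent join for you.

```python
from bisect import bisect_right


def point_in_time_value(history, as_of):
    """Return the latest feature value recorded at or before `as_of`.

    `history` is a list of (timestamp, value) pairs sorted by timestamp.
    Returns None when no value had been recorded yet at `as_of`.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)  # number of entries with ts <= as_of
    return history[idx - 1][1] if idx else None


# Hypothetical feature history for one customer: (day, 30-day spend)
spend_history = [(1, 100.0), (5, 140.0), (9, 90.0)]

# Label events observed on days 0, 5, and 7
training_rows = [(day, point_in_time_value(spend_history, day)) for day in (0, 5, 7)]
print(training_rows)
```

The row for day 7 picks up the value recorded on day 5 rather than the later day 9 value, and the row for day 0 gets no value at all because nothing had been recorded yet.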

17.7 How snowflakeR works (brief)

snowflakeR uses reticulate to call snowflake-ml-python. R users call sfr_* functions; the package handles the Python side.

For serving, sfr_log_model() and sfr_deploy_model() together:

  1. Serialize the R model (.rds)
  2. Auto-generate a Python CustomModel that loads R via rpy2 at inference
  3. Register it in the Model Registry
  4. Deploy to SPCS (sfr_deploy_model()), where the container runs R plus your model

You do not hand-write Python wrappers — see Model Registry.

In Workspace, rpy2 also powers %%R cells (Python → R direction). See Architecture.

17.8 Complementary platforms

| Platform | Role | With snowflakeR |
|---|---|---|
| RStudio / Positron (local) | Author, debug | sfr_connect(profile=...) |
| Posit Workbench Native App on Snowflake | In-account RStudio | Install RSnowflake + snowflakeR in session; same APIs |
| Posit Connect | Shiny, Quarto, Plumber publishing | Apps call sfr_predict_rest() to registry endpoints |
| Posit Package Manager | CRAN mirror | Faster corporate installs; still use Snowflake for ML |
| vetiver | Local model pins + metrics | Optional; registry for production truth |
| RSnowflake | SQL/dplyr only | sfr_dbi_connection() when mixing ML + dbplyr |

Note

Posit and Snowflake partnered on Workbench as a Native App — RStudio inside your account with data locality. snowflakeR is independent open source but aligns with that workflow.

17.9 Platform services for operationalization

ML in production uses more than registry APIs:

| Service | Operational use |
|---|---|
| Tasks | Nightly retrain triggers, batch score SQL, pipeline steps |
| SPCS | Low-latency inference, parallel R workers |
| Stages | Model artifacts, training exports, worker file I/O |
| EAI (External Access Integration) | Package installs in notebooks/containers |
| Git / Workspace | Promote notebooks to scheduled jobs |
| Tags / masking | Compliance on feature and prediction tables |

The chapters snowflakeR: Connect and Parallel doSnowflake cover using Tasks and SPCS from R.

17.10 Decision guide

Use snowflakeR when you need:

  • Feature Store or Model Registry from R
  • In-account inference (SPCS / SQL / REST)
  • Experiment and monitoring integration
  • Interop with Python ML on the same objects

Use RSnowflake alone when:

  • You need only SQL analytics and dplyr, with no registry or Feature Store
  • You deliberately export scores to an external system

Use Posit Connect (without snowflakeR) when:

  • Your primary deliverable is a Shiny app or Quarto document; predictions may still call Snowflake via REST

Use both when:

  • You develop in RStudio, register and serve on Snowflake, and publish a Connect app that calls the deployed model
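
The guide above can be read as a small rule set. The sketch below encodes it in plain Python so the precedence is explicit: any registry, Feature Store, or in-account serving need selects snowflakeR, RSnowflake alone applies only when none of those needs is present, and Posit Connect is added whenever a Shiny or Quarto deliverable is involved. The function name `recommend_stack` and the requirement labels are hypothetical, chosen only for this illustration.

```python
def recommend_stack(needs):
    """Map a set of requirement labels to the tools this chapter suggests."""
    # Needs that call for snowflakeR (registry, features, in-account serving, ...)
    snowflaker_needs = {
        "feature_store", "model_registry", "in_account_inference",
        "experiments", "monitoring", "python_interop",
    }
    # Needs that RSnowflake covers on its own
    rsnowflake_needs = {"sql_analytics", "dplyr", "external_scoring"}

    tools = []
    if needs & snowflaker_needs:
        tools.append("snowflakeR")
    elif needs & rsnowflake_needs:
        tools.append("RSnowflake")
    # A Shiny or Quarto deliverable adds Posit Connect on top of either path
    if needs & {"shiny", "quarto"}:
        tools.append("Posit Connect")
    return tools


print(recommend_stack({"dplyr"}))
print(recommend_stack({"model_registry", "shiny"}))
```

The second call corresponds to the "use both" case: a model registered and served on Snowflake, with a Connect app in front of it.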

17.11 Anti-patterns

| Avoid | Prefer |
|---|---|
| SELECT * millions of rows into R | Aggregate in SQL; sample; use Feature Store datasets |
| Retrain only on laptop, deploy manually | Log version in registry; deploy alias @champion |
| Duplicate feature logic in R and Python | Feature Views as single definition |
| Ignore conda-forge constraints at serve time | Plan dependencies at sfr_log_model() time |
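
The first anti-pattern can be illustrated without a Snowflake account. The sketch below uses Python's built-in sqlite3 module as a stand-in for the warehouse, an assumption made purely so the example runs anywhere; the `orders` table and its columns are hypothetical. It contrasts pulling every row to the client with pushing the aggregation into SQL, which is the same trade-off dbplyr makes when it translates a grouped summary into SQL before collecting results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a warehouse connection
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 10.0), ("east", 30.0), ("west", 5.0), ("west", 15.0), ("west", 40.0)],
)

# Anti-pattern: transfer every row, then aggregate on the client.
all_rows = conn.execute("SELECT * FROM orders").fetchall()
client_totals = {}
for region, amount in all_rows:
    client_totals[region] = client_totals.get(region, 0.0) + amount

# Preferred: aggregate in SQL so only one row per group leaves the database.
sql_rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
sql_totals = dict(sql_rows)

print(len(all_rows), "rows transferred vs", len(sql_rows))
print(sql_totals == client_totals)
```

Both paths produce the same totals, but the second moves one row per region instead of the whole table. With millions of rows, that difference is what drives transfer time and client memory use.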

17.12 Next steps

snowflakeR: Connect

Feature Store

Feature Store Implementation Guide — concepts chapters 01–06

Model Registry