flowchart LR R["R: sfr_connect()"] RET[reticulate] PY["snowflake-ml-python / Snowpark"] SF[Snowflake account] R --> RET --> PY --> SF
18 snowflakeR: Connect
Sessions, queries, tables, Tasks, and containers
snowflake, R, RStudio, Posit, VS Code, workspace notebooks, snowflakeR, RSnowflake, mlops
18.1 Overview
snowflakeR is your R entry point to the Snowflake ML platform — Feature Store, Model Registry, experiments, monitoring, and parallel execution. Under the hood it uses reticulate to call snowflake-ml-python; you write R, not Python.
This chapter covers connection, session context, and data access — the foundation for every later chapter. It also introduces operational platform pieces (Tasks, SPCS, stages) that turn interactive notebooks into production pipelines.
See MLOps on Snowflake for lifecycle framing.
See Introduction for the community-project disclaimer.
18.2 Learning Objectives
- Connect locally and in Workspace with the right auth mode
- Set warehouse/database/schema context reliably
- Move data with SQL and table helpers
- Understand how snowflakeR relates to RSnowflake and SPCS operational APIs
18.3 What sfr_connect() gives you
A successful connection returns an sfr_connection object wrapping a Snowpark Python session. That session:
- Authenticates to Snowflake (OAuth in Workspace, TOML/key-pair locally)
- Holds default warehouse, database, schema, role
- Powers
sfr_query(), Feature Store, Model Registry, and doSnowflake bridges
RSnowflake can share the same credentials via sfr_dbi_connection() — one login, two API styles (RSnowflake in Workspace).
18.4 Connect
18.4.1 Local — connections.toml (recommended)
library(snowflakeR)
conn <- sfr_connect() # [default] profile
conn <- sfr_connect(name = "production") # named profileProfiles live in ~/.snowflake/connections.toml — shared with RSnowflake, Python connector, and CLI. See Local R Setup.
18.4.2 Explicit parameters
Use when you do not use TOML or need to override a profile:
18.4.3 Key-pair (CI / service users)
conn <- sfr_connect(
account = "xy12345.us-east-1",
user = "SVC_USER",
private_key_file = "~/.snowflake/rsa_key.p8"
)Best for scheduled jobs and automation — no browser, no expiring PAT.
18.4.4 Workspace (zero config)
After bootstrap, the notebook runtime injects session OAuth:
Bootstrap exports SNOWFLAKE_* environment variables consumed by snowflakeR and RSnowflake.
18.5 Session context
Snowflake objects are addressed as DATABASE.SCHEMA.TABLE. Your connection carries default context — but Workspace does not always set database/schema the way Snowsight worksheets do.
Symptom: object does not exist for tables you know are there. Cause: wrong or empty session context. Fix: one of the options below.
Option A — YAML config (notebooks):
Option B — sfr_use() (must reassign):
R passes connections by value — sfr_use() returns a new connection object:
Option C — fully qualified names:
Use FQN helpers in shared notebooks so cells work regardless of session defaults.
18.6 Query and table I/O
18.6.1 SQL
Returns a data frame with lowercase column names (snowflakeR bridge convention).
18.6.2 Read / write tables
df <- sfr_read_table(conn, "MY_DB.MY_SCHEMA.MY_TABLE")
sfr_write_table(conn, df, "MY_DB.MY_SCHEMA.OUT_TABLE", mode = "overwrite")These use Snowpark table APIs — efficient for moderate-sized frames. For large analytics, prefer SQL aggregation or dbplyr via RSnowflake rather than pulling everything into R.
18.6.3 Column naming
| API | Default column case |
|---|---|
sfr_query() / sfr_read_table() |
lowercase |
| RSnowflake / dbplyr | UPPERCASE (Snowflake default) |
Pick one convention per pipeline. Mixing causes join bugs.
18.7 DBI bridge to RSnowflake
When you need dplyr or the Connections Pane alongside ML APIs:
dbi_con <- sfr_dbi_connection(conn)
library(dplyr)
tbl(dbi_con, in_schema("ANALYTICS", "ORDERS")) |>
filter(ORDER_DATE >= "2024-01-01") |>
head(10) |>
collect()sfr_dbi_connection() lazily wraps the Snowpark session — no second authentication.
18.8 Operational platform (beyond interactive ML)
Training in a notebook is step one. Production patterns use additional Snowflake services:
flowchart TB
subgraph interactive [Interactive]
NB[Workspace notebook]
Rtrain[Train in R]
end
subgraph schedule [Schedule]
TASK[Snowflake Tasks]
MJ[ML Jobs]
end
subgraph scale [Scale]
SPCS[SPCS containers]
POOL[Compute pool]
end
subgraph artifacts [Artifacts]
STG[Internal stage]
REG[Model Registry]
end
NB --> Rtrain --> REG
Rtrain --> STG
TASK --> MJ
TASK --> SPCS
SPCS --> POOL
SPCS --> STG
| Capability | Role in ML ops | snowflakeR entry |
|---|---|---|
| Tasks | Schedule SQL, trigger pipelines, orchestrate doSnowflake chunks | registerDoSnowflake(..., mode = "tasks") |
| SPCS | Inference services, parallel R workers | sfr_deploy_model(), doSnowflake, executor |
| Compute pools | Capacity for containers | sfr_dosnowflake_setup() |
| Image repositories | Store Docker images | sfr_dosnowflake_build_image() |
| Stages | Models, worker payloads, results | Volume mounts in SPCS; sfr_upload_* |
| EAI | Outbound HTTPS (CRAN, pip) in notebooks | Bootstrap — Network & EAI |
18.8.1 Prepare SPCS infrastructure
One-time (or per-environment) setup before parallel or executor workloads:
Creates or validates stage, queue table (for queue mode), and related objects depending on privileges.
18.8.2 One-off R script on SPCS
When %dopar% is not the right shape — single script, fixed pipeline:
stage_path <- sfr_upload_r_script(conn, "monthly_retrain.R", stage = "ML_STAGE")
job <- sfr_execute_r_script(
conn,
r_script_stage_path = stage_path,
compute_pool = "ML_R_POOL",
image_uri = "ML_DB.ML_SCHEMA.ML_IMAGES/executor:latest",
config = list(run_id = "2026-05-29"),
replicas = 1L
)
sfr_executor_status(conn, job$job_name)
sfr_collect_executor_results(conn, job, stage = "ML_STAGE")Define main(config) in the script. Workers use stage volume mounts — not SQL GET/PUT (SQL API limits).
18.8.3 Tasks in a typical ML lifecycle
- Feature tables — Dynamic Tables refresh on their own TARGET_LAG / schedule (no Task needed to “refresh the DT”). Materialized views in Snowflake stay consistent with base tables automatically. Use Tasks to chain downstream work (e.g. start retrain after a DT run completes), not to drive DT/MV refresh itself.
- Scheduled retrain — weekly Task, ML Job, or executor script when models need periodic refit
- Post-deploy Task — batch inference SQL against registry model (or
sfr_predict_sql()for warehouse-native models) - doSnowflake Tasks mode — parallel
%dopar%chunked into SPCS (Parallel doSnowflake)
18.9 Environment check
Before Feature Store or Registry work:
Verifies R packages, Python interpreter, and snowflake-ml-python version. Python setup: Local R Setup. Workspace: bootstrap or CRE.
18.10 Companion material
| Resource | Purpose |
|---|---|
Vignettes getting-started, setup |
API detail |
workspace_quickstart.ipynb |
First Workspace run |
local_quickstart.ipynb |
Desktop setup |
workspace_parallel_spcs_setup.ipynb |
SPCS infrastructure |
18.11 Next steps
Feature Store — governed features and training datasets.