18  snowflakeR: Connect

Sessions, queries, tables, Tasks, and containers

Keywords

snowflake, R, RStudio, Posit, VS Code, workspace notebooks, snowflakeR, RSnowflake, mlops

18.1 Overview

snowflakeR is your R entry point to the Snowflake ML platform — Feature Store, Model Registry, experiments, monitoring, and parallel execution. Under the hood it uses reticulate to call snowflake-ml-python; you write R, not Python.

This chapter covers connection, session context, and data access — the foundation for every later chapter. It also introduces operational platform pieces (Tasks, SPCS, stages) that turn interactive notebooks into production pipelines.

See MLOps on Snowflake for lifecycle framing.

Important

See Introduction for the community-project disclaimer.

18.2 Learning Objectives

  • Connect locally and in Workspace with the right auth mode
  • Set warehouse/database/schema context reliably
  • Move data with SQL and table helpers
  • Understand how snowflakeR relates to RSnowflake and SPCS operational APIs

18.3 What sfr_connect() gives you

A successful connection returns an sfr_connection object wrapping a Snowpark Python session. That session:

  • Authenticates to Snowflake (OAuth in Workspace, TOML/key-pair locally)
  • Holds default warehouse, database, schema, role
  • Powers sfr_query(), Feature Store, Model Registry, and doSnowflake bridges

flowchart LR
  R["R: sfr_connect()"]
  RET[reticulate]
  PY["snowflake-ml-python / Snowpark"]
  SF[Snowflake account]
  R --> RET --> PY --> SF

RSnowflake can share the same credentials via sfr_dbi_connection() — one login, two API styles (RSnowflake in Workspace).


18.4 Connect

18.4.2 Explicit parameters

Use when you do not use TOML or need to override a profile:

conn <- sfr_connect(
  account       = "xy12345.us-east-1",
  user          = "MYUSER",
  warehouse     = "COMPUTE_WH",
  database      = "MY_DB",
  schema        = "PUBLIC",
  authenticator = "externalbrowser"
)

18.4.3 Key-pair (CI / service users)

conn <- sfr_connect(
  account          = "xy12345.us-east-1",
  user             = "SVC_USER",
  private_key_file = "~/.snowflake/rsa_key.p8"
)

Best for scheduled jobs and automation — no browser, no expiring PAT.

18.4.4 Workspace (zero config)

After bootstrap, the notebook runtime injects session OAuth:

conn <- sfr_connect()   # no PAT or TOML in cells

Bootstrap exports SNOWFLAKE_* environment variables consumed by snowflakeR and RSnowflake.


18.5 Session context

Snowflake objects are addressed as DATABASE.SCHEMA.TABLE. Your connection carries default context — but Workspace does not always set database/schema the way Snowsight worksheets do.

Warning

Symptom: object does not exist for tables you know are there. Cause: wrong or empty session context. Fix: one of the options below.

Option A — YAML config (notebooks):

conn <- sfr_load_notebook_config(conn)  # reads snowflaker_*.yaml context section

Option B — sfr_use() (must reassign):

R passes connections by value — sfr_use() returns a new connection object:

conn <- sfr_use(conn,
  warehouse = "ML_WH",
  database  = "ML_DB",
  schema    = "FEATURES"
)

Option C — fully qualified names:

sfr_fqn(conn, "MY_TABLE")   # "ML_DB.FEATURES.MY_TABLE"

Use FQN helpers in shared notebooks so cells work regardless of session defaults.


18.6 Query and table I/O

18.6.1 SQL

sfr_query(conn, "SELECT CURRENT_WAREHOUSE(), CURRENT_USER(), CURRENT_ROLE()")

Returns a data frame with lowercase column names (snowflakeR bridge convention).

18.6.2 Read / write tables

df <- sfr_read_table(conn, "MY_DB.MY_SCHEMA.MY_TABLE")

sfr_write_table(conn, df, "MY_DB.MY_SCHEMA.OUT_TABLE", mode = "overwrite")

These use Snowpark table APIs — efficient for moderate-sized frames. For large analytics, prefer SQL aggregation or dbplyr via RSnowflake rather than pulling everything into R.

18.6.3 Column naming

API Default column case
sfr_query() / sfr_read_table() lowercase
RSnowflake / dbplyr UPPERCASE (Snowflake default)

Pick one convention per pipeline. Mixing causes join bugs.


18.7 DBI bridge to RSnowflake

When you need dplyr or the Connections Pane alongside ML APIs:

dbi_con <- sfr_dbi_connection(conn)

library(dplyr)
tbl(dbi_con, in_schema("ANALYTICS", "ORDERS")) |>
  filter(ORDER_DATE >= "2024-01-01") |>
  head(10) |>
  collect()

sfr_dbi_connection() lazily wraps the Snowpark session — no second authentication.


18.8 Operational platform (beyond interactive ML)

Training in a notebook is step one. Production patterns use additional Snowflake services:

flowchart TB
  subgraph interactive [Interactive]
    NB[Workspace notebook]
    Rtrain[Train in R]
  end

  subgraph schedule [Schedule]
    TASK[Snowflake Tasks]
    MJ[ML Jobs]
  end

  subgraph scale [Scale]
    SPCS[SPCS containers]
    POOL[Compute pool]
  end

  subgraph artifacts [Artifacts]
    STG[Internal stage]
    REG[Model Registry]
  end

  NB --> Rtrain --> REG
  Rtrain --> STG
  TASK --> MJ
  TASK --> SPCS
  SPCS --> POOL
  SPCS --> STG

Capability Role in ML ops snowflakeR entry
Tasks Schedule SQL, trigger pipelines, orchestrate doSnowflake chunks registerDoSnowflake(..., mode = "tasks")
SPCS Inference services, parallel R workers sfr_deploy_model(), doSnowflake, executor
Compute pools Capacity for containers sfr_dosnowflake_setup()
Image repositories Store Docker images sfr_dosnowflake_build_image()
Stages Models, worker payloads, results Volume mounts in SPCS; sfr_upload_*
EAI Outbound HTTPS (CRAN, pip) in notebooks Bootstrap — Network & EAI

18.8.1 Prepare SPCS infrastructure

One-time (or per-environment) setup before parallel or executor workloads:

sfr_dosnowflake_setup(conn,
  compute_pool = "ML_R_POOL",
  image_repo   = "ML_DB.ML_SCHEMA.ML_IMAGES"
)

Creates or validates stage, queue table (for queue mode), and related objects depending on privileges.

18.8.2 One-off R script on SPCS

When %dopar% is not the right shape — single script, fixed pipeline:

stage_path <- sfr_upload_r_script(conn, "monthly_retrain.R", stage = "ML_STAGE")

job <- sfr_execute_r_script(
  conn,
  r_script_stage_path = stage_path,
  compute_pool        = "ML_R_POOL",
  image_uri           = "ML_DB.ML_SCHEMA.ML_IMAGES/executor:latest",
  config              = list(run_id = "2026-05-29"),
  replicas            = 1L
)

sfr_executor_status(conn, job$job_name)
sfr_collect_executor_results(conn, job, stage = "ML_STAGE")

Define main(config) in the script. Workers use stage volume mounts — not SQL GET/PUT (SQL API limits).

18.8.3 Tasks in a typical ML lifecycle

  1. Feature tablesDynamic Tables refresh on their own TARGET_LAG / schedule (no Task needed to “refresh the DT”). Materialized views in Snowflake stay consistent with base tables automatically. Use Tasks to chain downstream work (e.g. start retrain after a DT run completes), not to drive DT/MV refresh itself.
  2. Scheduled retrain — weekly Task, ML Job, or executor script when models need periodic refit
  3. Post-deploy Task — batch inference SQL against registry model (or sfr_predict_sql() for warehouse-native models)
  4. doSnowflake Tasks mode — parallel %dopar% chunked into SPCS (Parallel doSnowflake)

18.9 Environment check

Before Feature Store or Registry work:

sfr_check_environment()

Verifies R packages, Python interpreter, and snowflake-ml-python version. Python setup: Local R Setup. Workspace: bootstrap or CRE.


18.10 Companion material

Resource Purpose
Vignettes getting-started, setup API detail
workspace_quickstart.ipynb First Workspace run
local_quickstart.ipynb Desktop setup
workspace_parallel_spcs_setup.ipynb SPCS infrastructure

18.11 Next steps

Feature Store — governed features and training datasets.