18 snowflakeR: Connect

Sessions, queries, tables, Tasks, and containers

Keywords

snowflake, R, RStudio, Posit, VS Code, workspace notebooks, snowflakeR, RSnowflake, mlops

18.1 Overview

snowflakeR is your R entry point to the Snowflake ML platform — Feature Store, Model Registry, experiments, monitoring, and parallel execution. Under the hood it uses reticulate to call snowflake-ml-python; you write R, not Python.

This chapter covers connection, session context, and data access — the foundation for every later chapter. It also introduces operational platform pieces (Tasks, SPCS, stages) that turn interactive notebooks into production pipelines.

See MLOps on Snowflake for lifecycle framing.

Important

See Introduction for the community-project disclaimer.

18.2 Learning Objectives

Connect locally and in Workspace with the right auth mode
Set warehouse/database/schema context reliably
Move data with SQL and table helpers
Understand how snowflakeR relates to RSnowflake and SPCS operational APIs

18.3 What `sfr_connect()` gives you

A successful connection returns an sfr_connection object wrapping a Snowpark Python session. That session:

Authenticates to Snowflake (OAuth in Workspace, TOML/key-pair locally)
Holds default warehouse, database, schema, role
Powers sfr_query(), Feature Store, Model Registry, and doSnowflake bridges

flowchart LR
  R["R: sfr_connect()"]
  RET[reticulate]
  PY["snowflake-ml-python / Snowpark"]
  SF[Snowflake account]
  R --> RET --> PY --> SF

RSnowflake can share the same credentials via sfr_dbi_connection() — one login, two API styles (RSnowflake in Workspace).

18.4 Connect

18.4.1 Local — connections.toml (recommended)

library(snowflakeR)
conn <- sfr_connect()                       # [default] profile
conn <- sfr_connect(name = "production")  # named profile

Profiles live in ~/.snowflake/connections.toml — shared with RSnowflake, Python connector, and CLI. See Local R Setup.

18.4.2 Explicit parameters

Use when you do not use TOML or need to override a profile:

conn <- sfr_connect(
  account       = "xy12345.us-east-1",
  user          = "MYUSER",
  warehouse     = "COMPUTE_WH",
  database      = "MY_DB",
  schema        = "PUBLIC",
  authenticator = "externalbrowser"
)

18.4.3 Key-pair (CI / service users)

conn <- sfr_connect(
  account          = "xy12345.us-east-1",
  user             = "SVC_USER",
  private_key_file = "~/.snowflake/rsa_key.p8"
)

Best for scheduled jobs and automation — no browser, no expiring PAT.

18.4.4 Workspace (zero config)

After bootstrap, the notebook runtime injects session OAuth:

conn <- sfr_connect()   # no PAT or TOML in cells

Bootstrap exports SNOWFLAKE_* environment variables consumed by snowflakeR and RSnowflake.

18.5 Session context

Snowflake objects are addressed as DATABASE.SCHEMA.TABLE. Your connection carries default context — but Workspace does not always set database/schema the way Snowsight worksheets do.

Warning

Symptom: object does not exist for tables you know are there. Cause: wrong or empty session context. Fix: one of the options below.

Option A — YAML config (notebooks):

conn <- sfr_load_notebook_config(conn)  # reads snowflaker_*.yaml context section

Option B — sfr_use() (must reassign):

R passes connections by value — sfr_use() returns a new connection object:

conn <- sfr_use(conn,
  warehouse = "ML_WH",
  database  = "ML_DB",
  schema    = "FEATURES"
)

Option C — fully qualified names:

sfr_fqn(conn, "MY_TABLE")   # "ML_DB.FEATURES.MY_TABLE"

Use FQN helpers in shared notebooks so cells work regardless of session defaults.

18.6 Query and table I/O

18.6.1 SQL

sfr_query(conn, "SELECT CURRENT_WAREHOUSE(), CURRENT_USER(), CURRENT_ROLE()")

Returns a data frame with lowercase column names (snowflakeR bridge convention).

18.6.2 Read / write tables

df <- sfr_read_table(conn, "MY_DB.MY_SCHEMA.MY_TABLE")

sfr_write_table(conn, df, "MY_DB.MY_SCHEMA.OUT_TABLE", mode = "overwrite")

These use Snowpark table APIs — efficient for moderate-sized frames. For large analytics, prefer SQL aggregation or dbplyr via RSnowflake rather than pulling everything into R.

18.6.3 Column naming

API	Default column case
`sfr_query()` / `sfr_read_table()`	lowercase
RSnowflake / dbplyr	UPPERCASE (Snowflake default)

Pick one convention per pipeline. Mixing causes join bugs.

18.7 DBI bridge to RSnowflake

When you need dplyr or the Connections Pane alongside ML APIs:

dbi_con <- sfr_dbi_connection(conn)

library(dplyr)
tbl(dbi_con, in_schema("ANALYTICS", "ORDERS")) |>
  filter(ORDER_DATE >= "2024-01-01") |>
  head(10) |>
  collect()

sfr_dbi_connection() lazily wraps the Snowpark session — no second authentication.

18.8 Operational platform (beyond interactive ML)

Training in a notebook is step one. Production patterns use additional Snowflake services:

flowchart TB
  subgraph interactive [Interactive]
    NB[Workspace notebook]
    Rtrain[Train in R]
  end

  subgraph schedule [Schedule]
    TASK[Snowflake Tasks]
    MJ[ML Jobs]
  end

  subgraph scale [Scale]
    SPCS[SPCS containers]
    POOL[Compute pool]
  end

  subgraph artifacts [Artifacts]
    STG[Internal stage]
    REG[Model Registry]
  end

  NB --> Rtrain --> REG
  Rtrain --> STG
  TASK --> MJ
  TASK --> SPCS
  SPCS --> POOL
  SPCS --> STG

Capability	Role in ML ops	snowflakeR entry
Tasks	Schedule SQL, trigger pipelines, orchestrate doSnowflake chunks	`registerDoSnowflake(..., mode = "tasks")`
SPCS	Inference services, parallel R workers	`sfr_deploy_model()`, doSnowflake, executor
Compute pools	Capacity for containers	`sfr_dosnowflake_setup()`
Image repositories	Store Docker images	`sfr_dosnowflake_build_image()`
Stages	Models, worker payloads, results	Volume mounts in SPCS; `sfr_upload_*`
EAI	Outbound HTTPS (CRAN, pip) in notebooks	Bootstrap — Network & EAI

18.8.1 Prepare SPCS infrastructure

One-time (or per-environment) setup before parallel or executor workloads:

sfr_dosnowflake_setup(conn,
  compute_pool = "ML_R_POOL",
  image_repo   = "ML_DB.ML_SCHEMA.ML_IMAGES"
)

Creates or validates stage, queue table (for queue mode), and related objects depending on privileges.

18.8.2 One-off R script on SPCS

When %dopar% is not the right shape — single script, fixed pipeline:

stage_path <- sfr_upload_r_script(conn, "monthly_retrain.R", stage = "ML_STAGE")

job <- sfr_execute_r_script(
  conn,
  r_script_stage_path = stage_path,
  compute_pool        = "ML_R_POOL",
  image_uri           = "ML_DB.ML_SCHEMA.ML_IMAGES/executor:latest",
  config              = list(run_id = "2026-05-29"),
  replicas            = 1L
)

sfr_executor_status(conn, job$job_name)
sfr_collect_executor_results(conn, job, stage = "ML_STAGE")

Define main(config) in the script. Workers use stage volume mounts — not SQL GET/PUT (SQL API limits).

18.8.3 Tasks in a typical ML lifecycle

Feature tables — Dynamic Tables refresh on their own TARGET_LAG / schedule (no Task needed to “refresh the DT”). Materialized views in Snowflake stay consistent with base tables automatically. Use Tasks to chain downstream work (e.g. start retrain after a DT run completes), not to drive DT/MV refresh itself.
Scheduled retrain — weekly Task, ML Job, or executor script when models need periodic refit
Post-deploy Task — batch inference SQL against registry model (or sfr_predict_sql() for warehouse-native models)
doSnowflake Tasks mode — parallel %dopar% chunked into SPCS (Parallel doSnowflake)

18.9 Environment check

Before Feature Store or Registry work:

sfr_check_environment()

Verifies R packages, Python interpreter, and snowflake-ml-python version. Python setup: Local R Setup. Workspace: bootstrap or CRE.

18.10 Companion material

Resource	Purpose
Vignettes `getting-started`, `setup`	API detail
`workspace_quickstart.ipynb`	First Workspace run
`local_quickstart.ipynb`	Desktop setup
`workspace_parallel_spcs_setup.ipynb`	SPCS infrastructure

18.11 Next steps

Feature Store — governed features and training datasets.

--- title: "snowflakeR: Connect" subtitle: "Sessions, queries, tables, Tasks, and containers" --- ## Overview **snowflakeR** is your R entry point to the Snowflake **ML platform** — Feature Store, Model Registry, experiments, monitoring, and parallel execution. Under the hood it uses **reticulate** to call `snowflake-ml-python`; you write R, not Python. This chapter covers **connection**, **session context**, and **data access** — the foundation for every later chapter. It also introduces **operational** platform pieces (Tasks, SPCS, stages) that turn interactive notebooks into production pipelines. See [MLOps on Snowflake](../15_mlops_on_snowflake/index.qmd) for lifecycle framing. ::: {.callout-important} See [Introduction](../00_introduction/index.qmd#sec-disclaimers) for the community-project disclaimer. ::: ## Learning Objectives - Connect locally and in Workspace with the right auth mode - Set warehouse/database/schema context reliably - Move data with SQL and table helpers - Understand how snowflakeR relates to RSnowflake and SPCS operational APIs --- ## What `sfr_connect()` gives you {#sec-what} A successful connection returns an **`sfr_connection`** object wrapping a Snowpark Python session. That session: - Authenticates to Snowflake (OAuth in Workspace, TOML/key-pair locally) - Holds default **warehouse**, **database**, **schema**, **role** - Powers `sfr_query()`, Feature Store, Model Registry, and doSnowflake bridges ```{mermaid} flowchart LR R["R: sfr_connect()"] RET[reticulate] PY["snowflake-ml-python / Snowpark"] SF[Snowflake account] R --> RET --> PY --> SF ``` RSnowflake can share the same credentials via `sfr_dbi_connection()` — one login, two API styles ([RSnowflake in Workspace](../14_rsnowflake_workspace/index.qmd)). --- ## Connect {#sec-connect} ### Local — connections.toml (recommended) ```r library(snowflakeR) conn <- sfr_connect() # [default] profile conn <- sfr_connect(name = "production") # named profile ``` Profiles live in `~/.snowflake/connections.toml` — shared with RSnowflake, Python connector, and CLI. See [Local R Setup](../03_local_r_setup/index.qmd#sec-toml). ### Explicit parameters Use when you do not use TOML or need to override a profile: ```r conn <- sfr_connect( account = "xy12345.us-east-1", user = "MYUSER", warehouse = "COMPUTE_WH", database = "MY_DB", schema = "PUBLIC", authenticator = "externalbrowser" ) ``` ### Key-pair (CI / service users) ```r conn <- sfr_connect( account = "xy12345.us-east-1", user = "SVC_USER", private_key_file = "~/.snowflake/rsa_key.p8" ) ``` Best for scheduled jobs and automation — no browser, no expiring PAT. ### Workspace (zero config) After bootstrap, the notebook runtime injects session OAuth: ```r conn <- sfr_connect() # no PAT or TOML in cells ``` Bootstrap exports `SNOWFLAKE_*` environment variables consumed by snowflakeR and RSnowflake. --- ## Session context {#sec-context} Snowflake objects are addressed as **`DATABASE.SCHEMA.TABLE`**. Your connection carries default context — but **Workspace does not always set database/schema** the way Snowsight worksheets do. ::: {.callout-warning} Symptom: `object does not exist` for tables you know are there. Cause: wrong or empty session context. Fix: one of the options below. ::: **Option A — YAML config (notebooks):** ```r conn <- sfr_load_notebook_config(conn) # reads snowflaker_*.yaml context section ``` **Option B — `sfr_use()` (must reassign):** R passes connections by value — `sfr_use()` returns a **new** connection object: ```r conn <- sfr_use(conn, warehouse = "ML_WH", database = "ML_DB", schema = "FEATURES" ) ``` **Option C — fully qualified names:** ```r sfr_fqn(conn, "MY_TABLE") # "ML_DB.FEATURES.MY_TABLE" ``` Use FQN helpers in shared notebooks so cells work regardless of session defaults. --- ## Query and table I/O {#sec-tables} ### SQL ```r sfr_query(conn, "SELECT CURRENT_WAREHOUSE(), CURRENT_USER(), CURRENT_ROLE()") ``` Returns a data frame with **lowercase** column names (snowflakeR bridge convention). ### Read / write tables ```r df <- sfr_read_table(conn, "MY_DB.MY_SCHEMA.MY_TABLE") sfr_write_table(conn, df, "MY_DB.MY_SCHEMA.OUT_TABLE", mode = "overwrite") ``` These use Snowpark table APIs — efficient for moderate-sized frames. For large analytics, prefer **SQL aggregation** or **dbplyr** via RSnowflake rather than pulling everything into R. ### Column naming | API | Default column case | |-----|---------------------| | `sfr_query()` / `sfr_read_table()` | lowercase | | RSnowflake / dbplyr | UPPERCASE (Snowflake default) | Pick one convention per pipeline. Mixing causes join bugs. --- ## DBI bridge to RSnowflake {#sec-dbi} When you need **dplyr** or the Connections Pane alongside ML APIs: ```r dbi_con <- sfr_dbi_connection(conn) library(dplyr) tbl(dbi_con, in_schema("ANALYTICS", "ORDERS")) |> filter(ORDER_DATE >= "2024-01-01") |> head(10) |> collect() ``` `sfr_dbi_connection()` lazily wraps the Snowpark session — no second authentication. --- ## Operational platform (beyond interactive ML) {#sec-ops} Training in a notebook is step one. Production patterns use additional Snowflake services: ```{mermaid} flowchart TB subgraph interactive [Interactive] NB[Workspace notebook] Rtrain[Train in R] end subgraph schedule [Schedule] TASK[Snowflake Tasks] MJ[ML Jobs] end subgraph scale [Scale] SPCS[SPCS containers] POOL[Compute pool] end subgraph artifacts [Artifacts] STG[Internal stage] REG[Model Registry] end NB --> Rtrain --> REG Rtrain --> STG TASK --> MJ TASK --> SPCS SPCS --> POOL SPCS --> STG ``` | Capability | Role in ML ops | snowflakeR entry | |------------|----------------|------------------| | **Tasks** | Schedule SQL, trigger pipelines, orchestrate doSnowflake chunks | `registerDoSnowflake(..., mode = "tasks")` | | **SPCS** | Inference services, parallel R workers | `sfr_deploy_model()`, doSnowflake, executor | | **Compute pools** | Capacity for containers | `sfr_dosnowflake_setup()` | | **Image repositories** | Store Docker images | `sfr_dosnowflake_build_image()` | | **Stages** | Models, worker payloads, results | Volume mounts in SPCS; `sfr_upload_*` | | **EAI** | Outbound HTTPS (CRAN, pip) in notebooks | Bootstrap — [Network & EAI](../07_network_and_eai/index.qmd) | ### Prepare SPCS infrastructure One-time (or per-environment) setup before parallel or executor workloads: ```r sfr_dosnowflake_setup(conn, compute_pool = "ML_R_POOL", image_repo = "ML_DB.ML_SCHEMA.ML_IMAGES" ) ``` Creates or validates stage, queue table (for queue mode), and related objects depending on privileges. ### One-off R script on SPCS When `%dopar%` is not the right shape — single script, fixed pipeline: ```r stage_path <- sfr_upload_r_script(conn, "monthly_retrain.R", stage = "ML_STAGE") job <- sfr_execute_r_script( conn, r_script_stage_path = stage_path, compute_pool = "ML_R_POOL", image_uri = "ML_DB.ML_SCHEMA.ML_IMAGES/executor:latest", config = list(run_id = "2026-05-29"), replicas = 1L ) sfr_executor_status(conn, job$job_name) sfr_collect_executor_results(conn, job, stage = "ML_STAGE") ``` Define `main(config)` in the script. Workers use **stage volume mounts** — not SQL `GET`/`PUT` ([SQL API limits](../11_rsnowflake_sql_api/index.qmd#sec-limits)). ### Tasks in a typical ML lifecycle 1. **Feature tables** — **Dynamic Tables** refresh on their own **TARGET_LAG** / schedule (no Task needed to “refresh the DT”). **Materialized views** in Snowflake stay consistent with base tables automatically. Use **Tasks** to chain *downstream* work (e.g. start retrain after a DT run completes), not to drive DT/MV refresh itself. 2. **Scheduled retrain** — weekly Task, ML Job, or executor script when models need periodic refit 3. **Post-deploy Task** — batch inference SQL against registry model (or `sfr_predict_sql()` for warehouse-native models) 4. **doSnowflake Tasks mode** — parallel `%dopar%` chunked into SPCS ([Parallel doSnowflake](../22_parallel_dosnowflake/index.qmd)) --- ## Environment check {#sec-check} Before Feature Store or Registry work: ```r sfr_check_environment() ``` Verifies R packages, Python interpreter, and `snowflake-ml-python` version. Python setup: [Local R Setup](../03_local_r_setup/index.qmd). Workspace: bootstrap or CRE. --- ## Companion material {#sec-companion} | Resource | Purpose | |----------|---------| | Vignettes `getting-started`, `setup` | API detail | | `workspace_quickstart.ipynb` | First Workspace run | | `local_quickstart.ipynb` | Desktop setup | | `workspace_parallel_spcs_setup.ipynb` | SPCS infrastructure | --- ## Next steps [Feature Store](../17_feature_store/index.qmd) — governed features and training datasets.

18.1 Overview

18.2 Learning Objectives

18.3 What sfr_connect() gives you

18.4 Connect

18.4.1 Local — connections.toml (recommended)

18.4.2 Explicit parameters

18.4.3 Key-pair (CI / service users)

18.4.4 Workspace (zero config)

18.5 Session context

18.6 Query and table I/O

18.6.1 SQL

18.6.2 Read / write tables

18.6.3 Column naming

18.7 DBI bridge to RSnowflake

18.8 Operational platform (beyond interactive ML)

18.8.1 Prepare SPCS infrastructure

18.8.2 One-off R script on SPCS

18.8.3 Tasks in a typical ML lifecycle

18.9 Environment check

18.10 Companion material

18.11 Next steps

18.3 What `sfr_connect()` gives you