2  Snowflake Platform Primer

Key concepts for R users new to Snowflake

Keywords

snowflake, R, RStudio, Posit, VS Code, workspace notebooks, snowflakeR, RSnowflake, mlops

2.1 Overview

Many R users come from duckdb, PostgreSQL, Spark, or local files — not from a cloud data platform. This chapter explains Snowflake capabilities referenced throughout the guide: storage vs compute, virtual-warehouses vs containers, Workspace, and ML services.

For authoritative product documentation, start at Snowflake Docs. Terms in bold are defined in Appendix F: Glossary.

2.2 Learning Objectives

  • Describe Snowflake’s separation of storage and compute
  • Distinguish warehouse compute from container compute
  • Explain how SQL, Python, and R workloads map to platform services
  • Know where to read more without duplicating Snowflake’s full catalog

2.3 What Snowflake is

Snowflake is a cloud data platform that combines:

Layer What it provides Typical users
Storage Durable tables, stages, governed data Everyone
SQL compute Virtual-Warehouses — elastic clusters for analytics Analysts, engineers, R via RSnowflake (SQL, dplyr)
Container compute SPCS — Docker workloads running in-account Inference, custom R, parallel workers
ML platform Feature Store, Model Registry, experiments Data scientists (Python + R via snowflakeR)
Apps & BI Streamlit, dashboards, partner BI tools Analysts, business users
AI Cortex (LLM functions, agents) App builders, analysts

For R users, the central idea is: keep data and ML artifacts in Snowflake and connect from R (laptop, Posit, or Workspace) rather than repeatedly exporting CSVs for local analysis. Strong Data Governance is a key benefit of maintaining and processing data in Snowflake.

flowchart TB
  subgraph clients [R clients]
    IDE[RStudio / Posit / VS Code]
    WS[Workspace Notebooks]
  end
  subgraph sf [Snowflake account]
    WH[Warehouse SQL compute]
    SPCS[Snowpark Container Services]
    ML[ML Feature Store / Registry]
    TBL[Tables]
    STG[Internal stages]
  end
  IDE --> WH
  IDE --> ML
  WS --> WH
  WS --> ML
  WS --> SPCS
  WH --> TBL
  WH --> STG
  ML --> TBL
  ML --> STG
  SPCS --> STG

2.4 Storage vs compute

Unlike a traditional database server where storage and compute are tied together, Snowflake separates them:

  • Storage — All tables and micro-partitions live in cloud storage managed by Snowflake. You pay for compressed stored bytes (with time-travel and fail-safe options).
  • ComputeVirtual warehouses spin up independently, near instantaneously to run SQL based workloads, including Python, Java, Scala & JavaScript via Stored Procedures and User-Defined Functions. Multiple warehouses can read the same data without copying it.

Implications for R:

  • collect() in dbplyr executes the pipeline on the warehouse, then pulls the full result into your R session. Use it only when you need data locally (modeling, plots, small samples) — not as the default for every dplyr step.
  • Keep transforms in Snowflake — dbplyr builds SQL; verbs like filter() and summarise() do not move rows until you ask. For multi-step work, materialize on the warehouse instead of chaining everything into one giant query or pulling early:
library(dplyr)
library(dbplyr)

orders <- tbl(con, in_schema("ANALYTICS", "ORDERS"))

# Step 1: aggregate on warehouse → temp table (CREATE TEMPORARY TABLE AS …)
by_region <- orders |>
  filter(order_date >= "2024-01-01") |>
  group_by(region) |>
  summarise(revenue = sum(amount, na.rm = TRUE), .groups = "drop") |>
  compute(name = "region_revenue_tmp")   # stays in Snowflake

# Step 2: next dplyr step reads the materialized table, still lazy until collect()
by_region |>
  filter(revenue > 1e6) |>
  collect()   # only the final, small result crosses to R

compute() runs CREATE TEMPORARY TABLE … AS SELECT (RSnowflake implements this via dbplyr). Use a pipeline of temp tables for heavy joins or to break up nested SQL; use temporary = FALSE and a qualified name when you want a permanent table other teams can reuse. copy_to() is the opposite path — upload a local data.frame into Snowflake to start a pipeline. Full patterns: RSnowflake: Query — dbplyr.

  • Feature Store and Model Registry objects persist in Snowflake — no separate model server database to maintain.

Further reading: Key concepts & architecture

2.5 Core data concepts

Concept Plain language R relevance
Account Tenant identifier (orgname-accountname, region in URL) Auth target for sfr_connect() / RSnowflake
Database / Schema Namespace hierarchy (DB.SCHEMA.TABLE) Session context; fully qualify when context unset
Table / View Persistent or logical datasets sfr_read_table(), dbplyr::tbl()
Dynamic Table Incrementally maintained query result Often backs Feature Views
Stage Internal file store (@stage/path), or externally referenced cloud-storage (e.g. AWS S3) Model artifacts, worker file I/O
Role RBAC identity with grants Controls warehouses, ML objects, integrations
Warehouse SQL compute cluster Queries, data transformations, data materialization, warehouse inference

Snowflake stores unquoted identifiers as uppercase. R and dbplyr code should account for this — see RSnowflake chapters.

Further reading: Databases, Tables, & Views

2.6 Warehouse compute

A virtual warehouse is a cluster of compute nodes that executes SQL (and Snowflake-managed operations invoked via SQL).

Property Detail
Sizing T-shirt sizes (X-Small → 6X-Large+) : Standard or Snowpark-optimized (> memory to CPU) types
Lifecycle Auto-suspend / auto-resume; you pay while running
Scaling Single cluster or multi-cluster for concurrency
Use cases Analytics, ETL SQL, Feature View refresh, batch scoring via SQL

R uses warehouses when:

  • RSnowflake/dbplyr executes SQL (collect(), dbWriteTable())
  • snowflakeR runs sfr_query() or registers Feature Store objects that materialize on warehouse
  • Tasks run scheduled SQL steps

Warehouses are charged per minute of compute time, with a minimum of 1 minute, corellated to the T-Shirt size. Most workloads scale linearly with the size increase in the warehouse, so 2x the T-Shirt size will result in half the compute time, and cost

Warehouses do not execute arbitrary R code — only SQL and platform operations they orchestrate.

Note

Analogy: Think of a warehouse as a remote, auto-scaling SQL engine. Your R session sends SQL; the warehouse does the heavy lifting close to the data. The SQL can be generated directly from R dplyr data transformation logic to push processing to the data.

Further reading: Virtual warehouses

2.7 Container compute (SPCS)

Snowpark Container Services (SPCS) runs OCI-compatible container images inside your Snowflake account on compute pools.

Property Detail
Unit Compute pool (not a warehouse)
Images Custom Docker or Snowflake-provided ML images
Use cases Model inference services, GPU training, custom R servers, parallel R workers
I/O Stage volume mounts for file access (important for RSnowflake — no SQL GET/PUT)

R uses SPCS when:

  • Model Registry deploys an inference endpoint (R model inside container via rpy2 wrapper)
  • doSnowflake runs R in worker containers to distribute and parallelize processing tasks across multiple containers.
  • You host RStudio Server / Posit Workbench in a custom image (advanced)

Further reading: Snowpark Container Services overview

2.8 Workspace & notebooks

Workspace is Snowflake’s project-oriented development area in Snowsight — Jupyter-compatible notebooks with Git integration, files, and in-account compute. This guide targets Workspace Notebooks only. Multi-language R support (%%R, bootstrap, snowflakeR) requires Workspace; it is not available in older standalone Snowsight notebook experiences.

Capability What it provides
Project + files Notebooks, scripts, configs (snowflaker_config.yaml)
Git integration Clone, commit, push — team collaboration
Notebooks Python, SQL, Markdown cells — R added via bootstrap
ML Container Runtime Python kernel on a compute pool; SQL on a warehouse

At platform level: Workspace provides an in-account Python kernel on container runtime; this guide adds R through snowflake-notebook-multilang.

Note

Terminology: “Workspace Notebook” means a notebook running inside your Snowflake account — not Jupyter on your laptop. Kernel and data stay in-account.

Further reading: Snowflake Notebooks in Workspaces

2.9 Notebook runtime model

Workspace notebooks use ML Container Runtime — a managed container environment on a compute pool (not a warehouse):

Component Runs on Notes
Python kernel Compute pool / container runtime Snowpark session, setup_notebook(), package installs
R (%%R cells) Same Python process via rpy2 Not a separate R kernel — see Architecture
SQL cells Virtual warehouse Same SQL engine as worksheets
Package install Container (pip, conda, micromamba) Requires EAI for outbound HTTPS

Flexible package install (pip/conda/micromamba) is why R bootstrap uses micromamba and External Access Integrations — unlike warehouse-only runtimes with fixed Anaconda channels.

Further reading: ML Container Runtime

2.10 ML & AI capabilities (used from R)

Service Purpose R access
Feature Store Governed features, point-in-time training data, data pipelines to inference datasets snowflakeR
Model Registry Versioned R or Python models, deployment, lineage snowflakeR
Datasets Immutable training data snapshots, point-in-time inference datasets snowflakeR
Experiments Run model training and tracking metrics snowflakeR
Model monitoring Model and Data drift and performance snowflakeR
Snowpark ML Python SDK underlying snowflakeR via reticulate in snowflakeR
Cortex LLMs, embeddings, agents in SQL SQL from R; not wrapped by snowflakeR today
Feature Store deep dive

Platform design, temporal correctness, and operations: Snowflake Feature Store Implementation Guide (Python examples; concepts apply to R).

Further reading: Snowflake ML overview

2.11 Data engineering & analytics (context)

R users often sit alongside teams using:

Capability Role
dbt Transformations in warehouse — can feed Feature View source tables
Dynamic Tables & Streams Incremental pipelines
Tasks Scheduled SQL / pipeline steps
Snowpipe / connectors Ingest streaming datafrom cloud storage or apps
Data sharing / Marketplace Consume shared datasets

You do not need to master all of these to use snowflakeR — but Feature Views often sit downstream of dbt or Dynamic Table pipelines.

2.12 Orchestration & ops

Feature Role in ML ops
Tasks Schedule SQL, notebook runs, pipeline steps; doSnowflake Task backend
Streams & Dynamic Tables Incremental features; CDC patterns
Git integration Notebook and project source control in Workspace
External Access Integrations (EAI) Outbound HTTPS (CRAN, GitHub, pip) from notebooks/containers
Tags & masking Governance on ML and data objects

Further reading: Tasks introduction

2.13 Security & governance (brief)

  • RBAC — Users assume roles with grants on databases, warehouses, integrations
  • Network policies — Restrict account access by IP (external clients)
  • EAI + network rules — Control outbound destinations from notebooks/containers
  • Row access / masking policies — Column-level protection (apply before exporting to R if needed)

R clients use the same role model as other tools — your connections.toml profile specifies role.

2.14 How this guide maps to the platform

Guide part Platform surface
R from your IDE External client → SQL REST API / ML Python APIs
Workspace Notebooks In-account kernel + R bootstrap
RSnowflake SQL over SQL REST API (+ optional ADBC for bulk I/O)
snowflakeR ML APIs + SPCS deployment + Tasks/queue backends
What is ADBC?

ADBC (Arrow Database Connectivity) is a columnar database access standard — the Arrow ecosystem’s analogue to ODBC/JDBC. RSnowflake can use the Snowflake ADBC driver (adbcsnowflake) for large bulk reads and writes as Apache Arrow streams instead of JSON over the SQL API. You do not need ADBC for everyday queries; enable it when moving wide or million-row result sets. Defined in Glossary; details in SQL API and RSnowflake: Query.

2.15 Quickstart pointers

Topic Snowflake doc
SQL worksheets Getting started with Snowsight
Snowflake ML Snowflake ML quickstart
SPCS Tutorial: Snowpark Container Services
Feature Store Feature Store overview

2.16 Next steps

Architecture — how the three R packages map to this platform.

Workspaces & Notebooks — Snowsight UX before bootstrap.

Appendix F: Glossary — acronyms and terms.