Getting Started with RSnowflake

RSnowflake is a DBI-compliant R interface to Snowflake that connects directly to the Snowflake SQL API via HTTPS. It has no dependency on ODBC, JDBC, or Python.

Installation

# From GitHub (development version):
# install.packages("pak")
pak::pak("Snowflake-Labs/RSnowflake")

Authentication

RSnowflake supports several authentication methods. The recommended approach is to configure a profile in ~/.snowflake/connections.toml, the same file used by the Snowflake Python connector and Snowflake CLI.

Programmatic Access Token (PAT)

Set the SNOWFLAKE_PAT environment variable, or pass the token directly:

con <- dbConnect(
  Snowflake(),
  account = "myaccount",
  token   = Sys.getenv("SNOWFLAKE_PAT")
)

Snowflake Workspace Notebooks

Inside a Workspace Notebook, authentication is automatic – the session token is read from the environment:

con <- dbConnect(Snowflake())

Running Queries

# Simple query
df <- dbGetQuery(con, "SELECT * FROM my_table LIMIT 10")

# Parameterized query
df <- dbGetQuery(
  con,
  "SELECT * FROM users WHERE age > ?",
  params = list(21L)
)

# Execute DDL / DML (returns rows affected)
dbExecute(con, "CREATE TABLE test_tbl (id INT, name VARCHAR)")
dbExecute(con, "INSERT INTO test_tbl VALUES (1, 'Alice')")

Writing and Reading Tables

# Write a data.frame to Snowflake
dbWriteTable(con, "iris_copy", iris)

# Read it back
iris_back <- dbReadTable(con, "iris_copy")

# Append more rows
dbAppendTable(con, "iris_copy", iris[1:10, ])

# Clean up
dbRemoveTable(con, "iris_copy")

Bulk Write Methods

RSnowflake automatically picks the fastest write path for your environment. The upload_method option controls this:

Method How it works Best for
"auto" (default) Snowpark in Workspace, ADBC outside, literal for small data Most users
"snowpark" Python write_pandas via reticulate (internal SPCS path) Workspace bulk writes
"adbc" Arrow-native PUT + COPY INTO via Go driver Local/external bulk writes
"literal" SQL INSERT statements Small data, maximum compatibility

In Workspace Notebooks, "auto" routes bulk writes (above RSnowflake.bulk_write_threshold cells) through the Python Snowpark session via reticulate. This uses the internal SPCS network path and is ~10x faster than ADBC for writes in Workspace (~2.5s vs ~21s for 50K rows).

Outside Workspace, "auto" routes to ADBC when available, which outperforms literal INSERT for datasets above ~20K rows.

# Force a specific method
options(RSnowflake.upload_method = "snowpark")
dbWriteTable(con, "big_table", large_df)

# Adjust the threshold (cells = rows * cols)
options(RSnowflake.bulk_write_threshold = 100000L)

Identifier Case Handling

By default, RSnowflake uppercases table and column names in DDL/DML operations (dbWriteTable, dbCreateTable, dbReadTable, etc.). This matches the behavior of Snowflake’s ODBC driver and of Snowflake itself when identifiers are unquoted:

dbWriteTable(con, "my_table", data.frame(id = 1, name = "Alice"))
dbListFields(con, "my_table")
#> [1] "ID" "NAME"

To preserve the original case of identifiers (useful for DBItest compliance or when you need lowercase column names), set:

options(RSnowflake.identifier_case = "preserve")

dbWriteTable(con, "my_table", data.frame(id = 1, name = "Alice"), overwrite = TRUE)
dbListFields(con, "my_table")
#> [1] "id" "name"

Note: dbQuoteIdentifier() is not affected by this option – it always faithfully quotes the string you give it. The option only controls the implicit casing applied in table-level operations.

ADBC Acceleration (optional)

When the adbcsnowflake and adbcdrivermanager packages are installed, RSnowflake can use the ADBC (Arrow Database Connectivity) backend for Arrow-native reads and PUT + COPY INTO bulk writes. This is particularly beneficial for large datasets outside of Workspace Notebooks.

# Check if ADBC is available
has_adbc <- requireNamespace("adbcsnowflake", quietly = TRUE)
cat("ADBC available:", has_adbc, "\n")

# ADBC is used automatically in 'auto' mode for large writes
# Force it for reads:
options(RSnowflake.backend = "adbc")
df <- dbGetQuery(con, "SELECT * FROM big_table")
options(RSnowflake.backend = "auto")

See WORKSPACE_ADBC.md in the package for Workspace-specific ADBC constraints (EAI network rules, PAT auth, stage host whitelisting).

Arrow Interface

RSnowflake supports the DBI Arrow methods for compatibility with packages and workflows that expect them. This requires the nanoarrow package.

stream <- dbGetQueryArrow(con, "SELECT * FROM my_table")
df <- as.data.frame(stream)

Note: The Snowflake SQL API v2 only returns JSON, so these Arrow methods fetch data through the normal JSON path and convert to nanoarrow on the client side. This provides interface compatibility but no performance advantage over dbGetQuery(). For large result sets, dbGetQuery() is the most direct path. Native server-side Arrow transport is planned for a future release.

dbplyr Integration

RSnowflake integrates with dbplyr so you can use dplyr verbs that are translated to Snowflake SQL and executed lazily:

library(dplyr)

tbl(con, "my_table") |>
  filter(score > 90) |>
  select(name, score) |>
  arrange(desc(score)) |>
  collect()

Snowflake-Specific SQL Translations

In addition to the ~150 scalar and ~30 aggregate translations that dbplyr provides for Snowflake (e.g. floor_date() to DATE_TRUNC, paste0() to ARRAY_TO_STRING, ifelse() to IFF), RSnowflake adds translations for 17 Snowflake functions that dbplyr does not cover. These let you use Snowflake-specific SQL functions directly in dplyr pipelines without falling back to sql().

Semi-structured data – work with VARIANT, OBJECT, and ARRAY columns:

R function Snowflake SQL Description
parse_json(x) PARSE_JSON(x) Parse a JSON string into VARIANT
try_parse_json(x) TRY_PARSE_JSON(x) Safe parse (returns NULL on error)
typeof(x) TYPEOF(x) Inspect the type of a VARIANT value
is_object(x) IS_OBJECT(x) Check if VARIANT is an OBJECT
is_array(x) IS_ARRAY(x) Check if VARIANT is an ARRAY
is_integer(x) IS_INTEGER(x) Check if VARIANT is an INTEGER

Array operations:

R function Snowflake SQL Description
array_size(x) ARRAY_SIZE(x) Length of an ARRAY
array_contains(arr, val) ARRAY_CONTAINS(val, arr) Test membership (note: arg order swapped for you)
array_slice(arr, from, to) ARRAY_SLICE(arr, from, to) Slice an ARRAY

String and utility:

R function Snowflake SQL Description
regexp_substr(x, pattern) REGEXP_SUBSTR(x, pattern, ...) Extract substring by regex (supports all 5 Snowflake args)
hash(...) HASH(...) Deterministic hash of one or more columns

Aggregate functions – use inside summarise():

R function Snowflake SQL Description
object_agg(key, value) OBJECT_AGG(key, value) Build a JSON object from grouped key-value pairs
array_agg(x) ARRAY_AGG(x) Collect values into an ARRAY
array_unique_agg(x) ARRAY_UNIQUE_AGG(x) Collect distinct values into an ARRAY
approx_count_distinct(x) APPROX_COUNT_DISTINCT(x) HyperLogLog approximate distinct count
approx_percentile(x, p) APPROX_PERCENTILE(x, p) Approximate percentile
mode_val(x) MODE(x) Statistical mode (mode_val avoids shadowing base::mode())
# Example: parse JSON and aggregate into arrays
tbl(con, "events") |>
  mutate(payload = parse_json(raw_json)) |>
  group_by(event_type) |>
  summarise(
    user_ids = array_unique_agg(user_id),
    n_approx = approx_count_distinct(user_id)
  ) |>
  collect()

For the full list of dbplyr’s built-in translations, see vignette("translation-function", package = "dbplyr").

Transactions

Note: The Snowflake SQL API v2 is stateless per-request and does not currently support session-based transactions. Calling dbBegin() will raise an informative error. Session-based transaction support is planned for a future release that uses the internal Snowflake protocol.

# Not yet supported via SQL API v2 -- included for reference:
dbBegin(con)
dbExecute(con, "INSERT INTO accounts VALUES (1, 100.00)")
dbCommit(con)

Disconnecting

dbDisconnect(con)