flowchart TB
subgraph master [Master R session]
FE["foreach ... %dopar%"]
REG[registerDoSnowflake]
FE --> REG
end
subgraph coord [Coordination layer]
direction TB
LOCAL["Local: parallel socket cluster"]
TASKS["Tasks mode: Snowflake Task DAG"]
QUEUE["Queue mode: Hybrid Table"]
end
subgraph compute [Compute — SPCS compute pool]
W1[Worker container]
W2[Worker container]
WN[Worker container ...]
end
subgraph storage [Shared storage]
STG["Internal stage — job payload and results"]
WH[(Warehouse — orchestration SQL)]
end
REG --> LOCAL
REG --> TASKS
REG --> QUEUE
LOCAL -->|"in-process only"| master
TASKS -->|"1 Task → 1 SPCS job per chunk"| W1
TASKS --> W2
TASKS --> WN
TASKS --> WH
QUEUE -->|"workers claim rows"| W1
QUEUE --> W2
QUEUE --> WN
W1 --> STG
W2 --> STG
WN --> STG
master --> STG
24 Parallel doSnowflake
foreach, Tasks, queue mode, and SPCS R workers
snowflake, R, RStudio, Posit, VS Code, workspace notebooks, snowflakeR, RSnowflake, mlops
24.1 Overview
If you have used R’s foreach package with %dopar%, you already know the pattern: express parallel work as a loop, let a backend decide how iterations run. doSnowflake is snowflakeR’s backend for that interface — it takes the same foreach loop you would run on your laptop and dispatches iterations to Snowflake compute instead.
That matters because many R workloads — many-model forecasting, hyperparameter grids, per-partition feature engineering — are embarrassingly parallel: each iteration is independent, but there may be thousands of them. Running them sequentially in a Workspace notebook or on a desktop does not scale. Copying data out of Snowflake to a separate HPC cluster breaks governance. doSnowflake keeps data and execution in-account by running R inside Snowpark Container Services (SPCS) containers, coordinated either by Snowflake Tasks or a Hybrid Table queue.
This chapter explains how the modes differ, what Snowflake objects each mode needs, how work flows from your R session to workers and back, and how each partition gets its input data. Read this before Many-Model Patterns, which applies these mechanics to partitioned forecasting and registry batch scoring.
Prerequisites: Active sfr_connect() session; SPCS compute pool and image repository for remote modes; Hybrid Table entitlement for queue mode; role grants for CREATE TASK / CREATE SERVICE as required by your account.
24.2 Learning Objectives
- Explain what doSnowflake does and why it wraps
foreach - Compare local, tasks, and queue modes — compute, coordination, and trade-offs
- Describe the master → stage → workers → results lifecycle
- Explain who creates
tasks/task_XXX.rdsand how loop variables vs input data reach workers - Compare patterns for partition input data (bulk SQL, stage files,
.export) - Register a backend, build worker images, and handle stage I/O correctly
24.3 The foreach contract
R’s parallel ecosystem standardised on foreach + %dopar%. You write:
A backend registered with registerDoSnowflake() intercepts %dopar% and decides where each iteration runs. The loop body stays the same across modes — only the backend changes. That lets you develop locally, then scale to SPCS without rewriting modeling code.
doSnowflake also handles cross-cutting concerns the raw loop does not:
- Serialization — the R expression, iteration arguments, and
.packages/.exportvariables are packaged and sent to workers - Stage I/O — payloads and results travel through an internal stage (with volume mounts on workers)
- Result collection — worker outputs are gathered and combined according to your
.combinefunction - Cleanup — task graphs, ephemeral services, and stage paths are torn down after the run
Install Suggested packages before first use:
24.4 Architecture: three coordination models
All remote modes share the same high-level shape. Your R session (the master) lives in Workspace, Posit, or local R. When you call %dopar%, the master serializes work, workers on SPCS evaluate the R expression per chunk, and results return through the stage.
| Layer | Role |
|---|---|
| Master | Your interactive R session; runs registerDoSnowflake() and %dopar% |
| Coordination | How work is scheduled — socket cluster, Task DAG, or Hybrid Table queue |
| Compute pool | SPCS nodes running Docker worker images with R + packages |
| Stage | Serialized job definition, per-chunk tasks, and result files |
| Warehouse | SQL for Task orchestration, queue polling, and metadata (not for arbitrary R) |
Workers do not use a warehouse to execute R code. The warehouse supports Snowflake’s orchestration layer (Tasks, status queries). R runs inside containers on the compute pool.
24.5 Mode comparison
| Local | Tasks | Queue | |
|---|---|---|---|
| Where R runs | Same process / container as master (socket cluster) | SPCS job per chunk | SPCS worker pulls chunks from queue |
| Coordination | parallel::makeCluster() in-session |
Snowflake Task graph (DAG) | Hybrid Table with lease-based claiming |
| Dispatch | Immediate, in-memory | Fixed batch — all chunks planned upfront | Dynamic — rows inserted; workers pull when ready |
| SPCS required? | No | Yes | Yes |
| Hybrid Table? | No | No | Yes |
| Cold start | None | Per chunk (~30–120 s SPCS job startup) | Workers can stay warm (persistent pool) |
| Early stop | N/A | No — all Tasks must finish | Yes — feeder can stop; workers drain queue |
| Observability | R console | Task history in Snowsight | Queue depth + service logs |
| Best for | Dev, smoke tests, small loops | Known workload, batch jobs, scheduled runs | Time-boxed runs, priority queues, elastic worker count |
A reserved mode = "spcs" (stage-only job dispatch without Tasks) is planned but not implemented — selecting it raises an error today.
24.6 Local mode
Local mode is the default. It does not touch SPCS. Instead, doSnowflake creates a socket cluster (parallel::makeCluster()) inside your current R session — the Workspace container or your laptop — and runs iterations across local R worker processes.
Use local mode to:
- Validate that your
%dopar%body runs without syntax or package errors - Develop modeling logic before paying for SPCS
- Run modest parallelism when data already fits in the notebook container
The cluster is cached across successive %dopar% calls until you call stopDoSnowflake(). Worker count defaults to "auto" (detected CPU cores minus one for the master thread).
library(snowflakeR)
library(foreach)
conn <- sfr_connect()
registerDoSnowflake(conn, mode = "local", workers = 4L)
out <- foreach(i = 1:20, .combine = c, .packages = "stats") %dopar% {
median(rnorm(1000))
}
stopDoSnowflake()Local mode does not require sfr_dosnowflake_setup(), compute pools, or worker images. It is the right first step in every doSnowflake project.
24.7 Remote modes: shared infrastructure
Before tasks or queue mode, provision Snowflake objects once per environment:
This validates (and may create) objects your account needs: internal stage for job payloads, optional Hybrid Table for queue mode, schemas, and grants. Exact behaviour depends on role privileges — see ?sfr_dosnowflake_setup.
24.7.1 Worker Docker images
Remote modes run R inside a custom Docker image pushed to a Snowflake image repository. The image must include R, worker scripts, and packages your %dopar% body needs (e.g. forecast, dplyr).
Build from your machine or a CI host:
sfr_dosnowflake_build_image(
conn,
dockerfile = "path/to/Dockerfile.worker",
image_name = "my_r_worker"
)SPCS runs linux/amd64 only. On Apple Silicon, build with --platform linux/amd64. Wrong architecture produces exec format error at container start — often reported as job 395030 with empty logs. After rebuilding, cycle the compute pool (suspend/resume) so nodes pull the new image.
Reference Dockerfiles ship with snowflakeR under inst/workers/ (see ?sfr_dosnowflake_build_image).
24.8 Tasks mode
24.8.1 How it works
Tasks mode is Snowflake-native batch orchestration. When you call %dopar%, the master:
- Serializes all iterations and the R expression to the stage (chunked into
chunks_per_jobgroups) - Creates a Task DAG — one Snowflake Task per chunk, each Task launching an SPCS job service that runs the worker container
- Polls Task status until every chunk succeeds or fails
- Collects
result_*.rdsfiles from the stage and applies your.combinefunction - Cleans up the Task graph and temporary stage paths
Snowflake Tasks provide scheduling, retry semantics, and visibility in Snowsight — useful when the workload is a known batch that must fully complete.
sequenceDiagram
participant M as Master R session
participant ST as Internal stage
participant WH as Warehouse
participant T as Task DAG
participant S as SPCS job chunk
M->>ST: Serialize expr + iterations
M->>T: Create Task graph via Python bridge
T->>WH: Root task schedules children
loop Each chunk
T->>S: EXECUTE JOB SERVICE worker image
S->>ST: Read chunk payload
S->>S: Evaluate R expression
S->>ST: Write result_*.rds
end
M->>T: Poll until all chunks SUCCEEDED
M->>ST: Collect and combine results
M->>T: Drop Task graph
24.8.2 When to choose Tasks mode
- Fixed iteration count (e.g. 500 SKUs, all must be modeled)
- Scheduled or repeatable batch jobs where Task history is audit evidence
- Simpler mental model: N chunks → N Tasks → N containers
- Workloads where SPCS cold start per chunk is acceptable relative to chunk runtime (minutes of R work amortize startup)
24.8.3 When Tasks mode is a poor fit
- Time-boxed runs (“spend 30 minutes on the hardest SKUs”) — use queue mode with a feeder
- Very short iterations where cold start dominates total time — consider local mode or fewer, larger chunks
- Need dynamic re-prioritization while workers run — queue mode supports ranked re-feeding (Many-Model)
24.8.4 Key parameters
| Parameter | Purpose |
|---|---|
compute_pool |
SPCS pool for worker containers |
image_uri |
Worker image in image repository |
warehouse |
Warehouse for user-managed Task graph (avoids serverless Task privilege requirements) |
chunks_per_job |
How many SPCS jobs to create ("auto" sizes from iteration count) |
stage |
Stage name for payloads and results |
timeout_min / poll_sec |
How long the master waits and how often it checks status |
result_sync_wait_sec |
Extra wait for stage writes to appear before collection |
24.8.5 Example
registerDoSnowflake(conn, mode = "tasks",
compute_pool = "MY_POOL",
image_uri = "mydb.myschema.myrepo/worker:latest",
warehouse = "MY_WH",
stage = "MY_STAGE",
timeout_min = 60L,
poll_sec = 10L,
chunks_per_job = 5L
)
result <- foreach(i = 1:100, .combine = rbind, .packages = c("dplyr")) %dopar% {
data.frame(i = i, val = mean(rnorm(100)))
}
stopDoSnowflake()Inside each SPCS container, the worker typically uses parallel::mclapply to exploit all CPUs on the node — one container per compute node by default (containers_per_node = 1), with cores detected automatically.
24.9 Queue mode
24.9.1 How it works
Queue mode decouples work creation from work execution using a Hybrid Table as a durable queue.
When you call %dopar%, the master:
- Ensures the queue table exists (
queue_fqn, e.g.CONFIG.DOSNOWFLAKE_QUEUE) - Serializes iterations to the stage (same as Tasks mode)
- Inserts rows into the Hybrid Table — one row per chunk, status
PENDING - Launches workers — ephemeral SPCS job services, or relies on an already-running persistent worker pool
- Each worker claims a row (lease-based update to avoid double-processing), reads its chunk from the stage, runs R, writes results, marks the row complete
- Master polls until all rows for this
job_idare done, then collects results from the stage
Workers pull work when ready rather than receiving a fixed Task assignment upfront. That enables dynamic feeding, priority ordering, and time-boxed “screen then refine” pipelines.
flowchart LR
subgraph master [Master]
M["foreach %dopar%"]
end
subgraph queue [Hybrid Table queue]
P[PENDING rows]
R[RUNNING rows]
D[DONE rows]
P --> R --> D
end
subgraph workers [SPCS workers]
W1[Worker 1]
W2[Worker 2]
W3[Worker N]
end
STG[(Stage)]
M -->|"enqueue chunks"| P
W1 -->|"claim row"| P
W2 -->|"claim row"| P
W3 -->|"claim row"| P
W1 --> STG
W2 --> STG
W3 --> STG
M -->|"poll DONE"| D
M -->|"collect results"| STG
24.9.2 What each queue row represents
A common point of confusion: queue rows are not one row per foreach iteration. Each row is one chunk — a batch of iterations that a single worker will process together.
When the master serializes your %dopar% call, it:
- Splits all iterations into N chunks (controlled by
chunks_per_job, or"auto") - Writes the R expression, packages, and shared export variables to the stage
- Writes one
task_XXX.rdsfile per chunk on the stage, containing the iteration arguments for every iteration in that chunk - Inserts one Hybrid Table row per chunk — the row is a pointer to that chunk’s work
Hybrid Table row (coordination metadata):
| Column | Meaning |
|---|---|
QUEUE_ID |
Unique row identifier (primary key) |
JOB_ID |
UUID for this %dopar% run — ties all chunks of one job together |
CHUNK_ID |
Chunk index (001, 002, …) — maps to task_001.rds on the stage |
STAGE_PATH |
Stage directory for this job (e.g. @STAGE/job_<uuid>/) |
STATUS |
PENDING → RUNNING → DONE or FAILED |
WORKER_ID |
Which SPCS worker claimed the row |
LEASE_UNTIL |
Expiry — stale claims can be re-queued if a worker crashes |
ATTEMPTS |
Retry count |
ERROR_MSG |
Failure detail if FAILED |
Stage payload (iteration parameters and code):
@STAGE/job_<uuid>/
export.rds ← shared variables from .export / .packages
manifest.json ← expr text, package list, chunk count
tasks/
task_001.rds ← list: chunk_id, args, indices
task_002.rds ← args = list of per-iteration arg lists in this chunk
results/
result_001.rds ← worker output (written after chunk completes)
Inside task_001.rds, the args field holds the actual foreach iteration parameters — e.g. for foreach(unit_id = sku_ids), each element of args is the list of loop variables for one SKU. If chunks_per_job groups 50 SKUs per chunk, one queue row points to one file containing 50 iterations’ worth of arguments.
flowchart TB
subgraph foreach [Your foreach loop]
I1["iteration: unit_id = SKU_001"]
I2["iteration: unit_id = SKU_002"]
IN["... 2000 SKUs"]
end
subgraph master [Master serializes]
CH1["chunk 001 — 50 iterations"]
CH2["chunk 002 — 50 iterations"]
end
subgraph ht [Hybrid Table — one row per chunk]
R1["ROW: JOB_ID, CHUNK_ID=001, STAGE_PATH, PENDING"]
R2["ROW: JOB_ID, CHUNK_ID=002, STAGE_PATH, PENDING"]
end
subgraph stage [Stage files — iteration args]
T1["tasks/task_001.rds — args for 50 SKUs"]
T2["tasks/task_002.rds — args for 50 SKUs"]
end
I1 --> CH1
I2 --> CH1
IN --> CH2
CH1 --> R1 --> T1
CH2 --> R2 --> T2
When a worker claims a row, it reads CHUNK_ID and STAGE_PATH, loads tasks/task_{CHUNK_ID}.rds, evaluates your %dopar% expression once per element in args, and writes results/result_{CHUNK_ID}.rds. The master never puts full iteration data in the Hybrid Table — only enough metadata for workers to find the payload on the stage.
24.9.3 Who creates task_XXX.rds?
You do not create these files. When your master session runs %dopar%, doSnowflake’s serializer (.serialize_job_to_stage() in snowflakeR) automatically:
- Expands your
foreachiterator into a list of per-iteration argument lists (e.g.list(unit_id = "SKU_001"), …). - Groups them into chunks according to
chunks_per_job. - Writes
task_001.rds,task_002.rds, … — each containingargs(iteration parameters for that chunk) andindices. - PUTs
export.rds,expr.rds,manifest.json, and the task files to@stage/job_<uuid>/. - For queue mode, inserts one Hybrid Table row per chunk pointing at that stage path.
Your job is only to write the %dopar% loop and registerDoSnowflake() options. Chunk files are ephemeral job payloads unless you disable cleanup.
In many-model feeder scenarios (screen-then-refine), additional queue rows can be inserted dynamically while workers run — see Many-Model Patterns.
24.10 Getting data into each partition
The foreach loop variable (e.g. unit_id) is always serialized into task_XXX.rds. Training or inference rows are not — unless you choose one of the patterns below.
| What | Automatic? | Mechanism |
|---|---|---|
Loop variables (unit_id, i, …) |
Yes | Stored in each element of chunk$args; worker assign()s them before eval(expr) |
| Shared functions / constants | If listed in .export |
export.rds from master session |
| Partition dataset (e.g. time series for one SKU) | No — you configure it | data_query, stage files, or (discouraged) SQL inside the body |
24.10.1 Pattern A — Bulk SQL per chunk (recommended)
For many-model workloads, pass a data_query spec to registerDoSnowflake(). It is written to manifest.json. Before mclapply runs inside the worker, the worker issues one SELECT per chunk:
SELECT UNIT_ID, OBS_DATE, Y
FROM MY_DB.MY_SCHEMA.SERIES_EVENTS
WHERE UNIT_ID IN ('SKU_001', 'SKU_002', ...)
ORDER BY UNIT_ID, OBS_DATEThe result is split by partition key; for each iteration the worker injects unit_data as a data frame for that key. Your %dopar% body can then use unit_data$Y without writing SQL:
registerDoSnowflake(conn, mode = "tasks",
compute_pool = "MY_POOL",
image_uri = "mydb.myschema.myrepo/dosnowflake-worker:latest",
stage = "DOSNOWFLAKE_STAGE",
chunks_per_job = 50L,
data_query = list(
table = "MY_DB.MY_SCHEMA.SERIES_EVENTS",
key_column = "UNIT_ID",
key_arg = "unit_id", # must match foreach loop variable name
columns = c("UNIT_ID", "OBS_DATE", "Y"),
order_by = "UNIT_ID, OBS_DATE",
warehouse = "MY_WH" # warehouse used for the bulk read
)
)
foreach(unit_id = sku_ids, .combine = rbind, .packages = "forecast") %dopar% {
ts_full <- ts(unit_data$Y, frequency = 7) # unit_data injected by worker
# ...
}Warehouse sizing: each active chunk can run one bulk read while its SPCS container processes iterations. With Tasks mode, many chunks may start together; with queue mode, concurrency is roughly n_workers × chunks in flight. If hundreds of workers hit Snowflake at once, use a multi-cluster warehouse (or cap n_workers / pool size) so the read layer does not queue excessively. The warehouse in data_query is for data reads; a separate warehouse often handles Task orchestration and queue polling SQL (see registerDoSnowflake(..., warehouse = ...)).
Workers use RSnowflake inside SPCS for this read — keep queries bounded with the IN (...) list for keys in the chunk only, not full table scans per iteration.
24.10.2 Pattern B — Input files on stage
Export partitions ahead of time (e.g. COPY INTO @stage/partitions/ as Parquet or .rds per key). In the %dopar% body, read from Sys.getenv("STAGE_MOUNT") using the loop variable as part of the path:
foreach(unit_id = sku_ids, .combine = rbind) %dopar% {
path <- file.path(Sys.getenv("STAGE_MOUNT"), "inputs", paste0(unit_id, ".rds"))
unit_data <- readRDS(path)
# ...
}When to use: very wide tables where SQL projection is awkward, simulation inputs generated offline, or repeated runs over frozen snapshots. Efficiency: upload once; workers read via volume mount (no SQL GET). ETL can populate the stage with warehouse COPY while workers only do local I/O.
24.10.3 Pattern C — Small data in .export
Objects in .export are snapshotted into export.rds at job submit time. Suitable for functions, scalars, or small lookup tables — not for thousands of full series. Large exports inflate stage payloads and slow every chunk download.
24.10.4 Pattern D — SQL inside the foreach body (usually avoid)
You can open a connection and query per iteration inside %dopar%, but with mclapply that implies many queries per chunk and fork-unfriendly connection handling. The worker template intentionally runs one bulk read before mclapply when data_query is set. Prefer Pattern A unless iterations need unrelated tables per key.
24.10.5 Training vs inference
| Phase | Typical input pattern |
|---|---|
| Training (fit many models) | Pattern A: data_query from feature/series table; optional save_models + model_obj in return list → .rds on stage |
| Batch inference (score all partitions) | Often sfr_run_batch() after sfr_log_many_model() — Registry loads models server-side; input table supplies rows. Parallel %dopar% is the training path, not the only scoring path |
| Simulation / what-if | Pattern B: staged scenario files per run id; or Pattern A from a simulation results table |
See Many-Model Patterns for registry aggregation and sfr_run_batch().
24.10.6 Ephemeral vs persistent workers
worker_type |
Behaviour |
|---|---|
ephemeral (default) |
Master launches N SPCS job services for this run; workers exit when the queue is empty |
persistent |
Long-running SPCS service already polling the queue; multiple %dopar% jobs can share the same pool |
Persistent pools suit teams with frequent parallel jobs — workers stay warm and avoid repeated cold start. Ephemeral workers suit occasional batch runs without maintaining a always-on service.
24.10.7 When to choose Queue mode
- Time-boxed processing — feed the hardest partitions first, stop when budget expires
- Elastic worker count — scale
n_workerswithout pre-declaring every chunk as a Task - Many-model “screen then refine” — Tasks pass for fast models, queue pass for expensive model contests on worst performers
- Multi-tenant pools where several users share warm workers
24.10.8 Key parameters
| Parameter | Purpose |
|---|---|
queue_fqn |
Fully qualified Hybrid Table name |
worker_type |
"ephemeral" or "persistent" |
n_workers |
Container replicas (ephemeral launch) |
pre_warm |
Wait until all workers are READY before enqueueing (benchmarking) |
instance_family |
SPCS sizing — CPU_X64_S through CPU_X64_XL |
stale_timeout_sec |
Re-queue chunks whose lease expired (worker crash) |
24.10.9 Example
registerDoSnowflake(conn, mode = "queue",
compute_pool = "MY_POOL",
image_uri = "mydb.myschema.myrepo/worker:latest",
warehouse = "MY_WH",
n_workers = 4L,
queue_fqn = "MY_DB.MY_SCHEMA.MY_QUEUE",
worker_type = "persistent",
pre_warm = TRUE,
instance_family = "CPU_X64_S"
)
result <- foreach(unit_id = sku_ids, .combine = rbind) %dopar% {
# runs on whichever worker claims this chunk
fit_and_forecast(unit_id)
}
stopDoSnowflake()See Many-Model Patterns for the full forecasting pipeline built on queue mode.
24.11 Stage I/O and why GET/PUT do not apply
All remote modes move serialized R objects and result files through an internal stage. Workers access the stage via volume mounts mapped into the container filesystem — not via SQL GET/PUT.
RSnowflake connects through the SQL REST API, which does not support client-side GET/PUT. SPCS workers therefore must use mounted paths:
# Job spec (conceptual)
volumeMounts:
- name: stage-vol
mountPath: /data/stage
volumes:
- name: stage-vol
source: "@DB.SCHEMA.STAGE"Worker R code reads STAGE_MOUNT:
stage_mount <- Sys.getenv("STAGE_MOUNT", "")
payload <- readRDS(file.path(stage_mount, "job_abc", "tasks", "chunk_001.rds"))Package worker templates (inst/workers/worker_queue.R) implement a dual-path pattern: prefer volume mount, fall back to GET/PUT only when a JDBC-capable driver is present (not RSnowflake in SPCS).
The master session collects results by listing and reading the same stage paths after workers finish — with optional result_sync_wait_sec because volume writes can lag Task status flips.
24.12 Generic R executor (without foreach)
Not every batch job fits the %dopar% loop shape. For single scripts or fixed pipelines, snowflakeR provides a lighter executor path:
- Upload an R script to a stage (
sfr_upload_r_script()) - Launch an SPCS job service that runs the script (
sfr_execute_r_script()) - Poll status and collect outputs (
sfr_executor_status(),sfr_collect_executor_results())
The script defines an entry function (default main(config)). Configuration passes as a named list — useful for parameterized reports or one-off ETL.
stage_path <- sfr_upload_r_script(conn, "monthly_retrain.R", stage = "MY_STAGE")
job <- sfr_execute_r_script(
conn,
r_script_stage_path = stage_path,
compute_pool = "MY_POOL",
image_uri = "mydb.myschema.myrepo/executor:latest",
config = list(run_id = "2026-05-29", horizon = 30L),
replicas = 2L
)
sfr_executor_status(conn, job$job_name)
sfr_collect_executor_results(conn, job, stage = "MY_STAGE")Use foreach/doSnowflake when iterations share structure and you want .combine semantics. Use the executor when you have one monolithic script or need replicas of the same script with different configs.
24.13 crew + mirai (advanced)
For teams already using the crew orchestration ecosystem, snowflakeR exposes crew_controller_spcs() — a long-lived controller that launches SPCS workers speaking mirai over the network.
This path suits DAG-shaped workflows, custom task queues, and controllers that outlive a single %dopar% call. It requires suggested packages crew and mirai, plus EAI/network rules so workers can reach the controller.
library(crew)
library(snowflakeR)
conn <- sfr_connect()
ctl <- crew_controller_spcs(
conn,
compute_pool = "MY_POOL",
image = "mydb.myschema.myrepo/r_crew:latest",
workers = 4L
)
ctl$start()
# push / pop tasks ...
ctl$terminate()doSnowflake’s Tasks and Queue modes are the recommended default for standard foreach loops. crew/mirai is for advanced orchestration — see ?sfr_crew_controller_service and ?sfr_crew_launch_workers.
24.14 Choosing a pattern
| Your situation | Recommended approach |
|---|---|
| Developing or debugging loop body | mode = "local" |
| Fixed batch, full completion, Task audit trail | mode = "tasks" |
| Time-boxed, priority queue, warm worker pool | mode = "queue" |
| Thousands of partition models + registry | Many-Model on queue or tasks |
| Single ad hoc R script on SPCS | sfr_execute_r_script() |
| Custom DAG / long-lived controller | crew_controller_spcs() |
Decision flow:
flowchart TD
START[Parallel R workload] --> Q1{Iterations share<br/>foreach shape?}
Q1 -->|No| EXEC[sfr_execute_r_script]
Q1 -->|Yes| Q2{SPCS available?}
Q2 -->|No| LOCAL[mode = local]
Q2 -->|Yes| Q3{Fixed batch<br/>must complete?}
Q3 -->|Yes| TASKS[mode = tasks]
Q3 -->|No| Q4{Need warm pool<br/>or time-box?}
Q4 -->|Yes| QUEUE[mode = queue]
Q4 -->|No| TASKS
24.15 Monitoring and troubleshooting
| Signal | Where to look |
|---|---|
| Task chunk status | Snowsight → Tasks → graph for DOSNOWFLAKE_* DAG |
| SPCS job logs | SYSTEM$GET_SERVICE_LOGS |
| Queue depth | SQL on Hybrid Table — PENDING / RUNNING / DONE counts |
| Master progress | R console messages from doSnowflake (job ID, chunk count) |
| Empty SPCS logs + job 395030 | Wrong Docker CPU arch — rebuild linux/amd64 |
Runnable monitors: workspace_parallel_spcs_demo.ipynb and streamlit_parallel_demo_monitor.py in snowflakeR/inst/notebooks. See Appendix C: Troubleshooting.
24.16 Companion material
| Resource | Description |
|---|---|
Vignette parallel-dosnowflake |
API reference and concise examples |
Vignette many-model-howto |
End-to-end forecasting with Tasks vs Queue |
workspace_parallel_spcs_setup.ipynb |
One-time infrastructure setup |
workspace_parallel_spcs_demo.ipynb |
Runnable demo loop |
snowflaker_parallel_spcs_config.yaml |
Environment config template |
many-model-howto vignette |
End-to-end many-model walkthrough |
24.17 Next steps
End-to-End Pipeline — where parallel training fits in the full ML lifecycle.