19 Feature Store

Entities, feature views, datasets, and training data from R

Keywords

snowflake, R, RStudio, Posit, VS Code, workspace notebooks, snowflakeR, RSnowflake, mlops

19.1 Overview

Machine learning features — customer aggregates, rolling transaction sums, product embeddings — are usually built in one-off SQL scripts or notebook cells, then copied into training CSVs. That breaks down quickly:

Training/serving skew — batch SQL for training differs from the app query at inference time
Leakage — historical labels joined to features computed with future data
No reuse — each team re-implements the same AVG(order_total) logic
Weak lineage — auditors cannot trace which feature definition trained which model

The Snowflake Feature Store is a governed catalog of feature definitions: entities (join keys), versioned feature views (SQL transformations), and APIs for point-in-time-correct training joins and consistent inference retrieval. snowflakeR exposes the full Python Feature Store API from R — objects you create in R appear identically in Snowsight and in Python notebooks.

Platform deep dive (recommended)

For temporal joins, Dynamic Tables vs external views, online serving, and operations, read the Snowflake Feature Store Implementation Guide — especially Concepts, Feature Views, and Temporal Features. Examples there are Python; the objects and semantics are identical when created from R.

See also MLOps on Snowflake for where Feature Store sits in the lifecycle, Authoring Feature Views for warehouse SQL/dbplyr feature definitions, Training Data in R for reading exports (OBJECT/ARRAY/VECTOR → R), and Model Registry for logging models trained on Feature Store datasets.

19.2 Learning Objectives

Explain why point-in-time joins matter and how leakage happens without them
Describe entities, feature views, spines, and datasets
Register managed vs external feature views from R (SQL or dbplyr)
Generate training data and retrieve inference-time features
Connect Feature Store outputs to Model Registry lineage

19.3 The problem Feature Store solves

Consider churn modeling:

Anti-pattern	What goes wrong
Join labels to today’s customer feature table	Model sees future behaviour at training time — inflated metrics
Different SQL in training notebook vs production app	Training/serving skew — production accuracy collapses
Ad hoc feature tables per project	No shared definition of `avg_order_total` across teams
Export 10M rows to R with `collect()`	Slow, expensive, breaks governance

Feature Store centralizes definitions and enforces temporal correctness when building training sets.

19.3.1 Training vs inference (different operations)

Phase	Question	Feature Store API (R)
Training	“What did we know about entity X at time T?”	`sfr_generate_training_data()` / `sfr_generate_dataset()`
Inference (latest)	“What do we know now about entity X?”	`sfr_retrieve_features()` without `spine_timestamp_col`, or OFT / online read
Inference (as-of batch)	“What did we know at each row’s timestamp?” (incremental backfill, replay scoring)	`sfr_retrieve_features(..., spine_timestamp_col = ...)` — same ASOF logic as training, without labels

Training uses a spine (entity keys + event timestamps + labels). Inference often uses latest feature values, but batch pipelines can pass a timestamp spine for point-in-time-correct feature lookup on the offline store.

19.4 Architecture

A Feature View connects tabular sources in Snowflake to registered metadata and a materialized offline store (table or Dynamic Table). The feature_df you pass at registration time can be as simple or as complex as your pipeline requires.

19.4.1 Sources and definitions

Layer	Options
Source data	Any tabular object — table, view, Dynamic Table, dbt model output, stage-backed table, etc.
`feature_df` definition	Pass-through — `SELECT ` or column subset from an externally maintained table (external Feature View). Transform* — aggregations, joins, windows, bundles entirely in SQL or dbplyr (managed Feature View with `refresh_freq`). Hybrid — read from a maintained table and enrich with SQL (e.g. keys from ETL + `OBJECT_CONSTRUCT` in the same query).
Registration mode	External — no `refresh_freq`; you own refresh. Managed — `refresh_freq` → platform maintains backing Dynamic Table (or equivalent).

See Authoring Feature Views for SQL/dbplyr patterns from simple aggregates through rollup tiles and OBJECT/ARRAY bundles.

flowchart TB
  subgraph sources [Tabular sources in Snowflake]
    direction LR
    TBL[(Table)]
    VW[View]
    DTs[(Dynamic Table)]
    TBL ~~~ VW ~~~ DTs
  end

  subgraph define [feature_df at registration]
    direction TB
    PASS[Pass-through SELECT from source]
    XFORM[Transform SQL or dbplyr aggregates joins bundles]
    HYBRID[Hybrid source table plus SQL enrich]
  end

  subgraph store [Feature Store offline]
    FV[Feature View versioned metadata]
    OFF[(Offline materialization table or DT)]
    FV --> OFF
  end

  subgraph consume [Consumption paths]
    direction TB
    TRAIN[sfr_generate_training_data spine plus ASOF]
    INFER_L[sfr_retrieve_features latest keys]
    INFER_T[sfr_retrieve_features timestamp spine ASOF]
    ONLN[Online Feature Table key lookup]
  end

  sources --> define
  PASS --> FV
  XFORM --> FV
  HYBRID --> FV
  OFF --> TRAIN
  OFF --> INFER_L
  OFF --> INFER_T
  FV --> ONLN

Compute note: Feature view registration and refresh use the warehouse you pass to sfr_feature_store(). Feature data materializes in Snowflake — not in your R session until you explicitly read or generate training data.

19.5 Core concepts

Concept	Definition	Think of it as
Feature Store	Dedicated schema/database area for entities, views, metadata	Catalog + governance layer
Entity	Business object + join keys (`CUSTOMER_ID`)	“How features attach to real-world things”
Feature View	Versioned SQL (or dbplyr) transformation producing features for entity(ies)	Reusable feature definition
Managed view	Has `refresh_freq` → Snowflake maintains backing Dynamic Table	Auto-refreshed pipeline
External view	No `refresh_freq` → you maintain source table (dbt, ETL)	Registered SQL over existing table
Spine	Rows: entity keys + event timestamp + optional labels	“One training example per row”
Point-in-time join	For each spine row, features as-of that timestamp	Prevents future leakage
Dataset	Immutable, versioned snapshot of joined training data	Reproducible input to `sfr_log_model()`

19.5.1 Point-in-time join — concrete example

Wrong (leakage):

-- Labels from 2024, features computed TODAY
SELECT l.customer_id, l.churned, f.avg_order_total
FROM churn_labels l
JOIN customer_features_today f ON l.customer_id = f.customer_id

If avg_order_total includes orders placed after the churn event, the model cheats.

Right (Feature Store):

For each row in the spine (customer_id, event_ts, churned), the store joins feature values known at event_ts — orders after that timestamp are excluded automatically.

sequenceDiagram
  participant SP as Spine row C001 event_ts=T
  participant FS as Feature Store
  participant FV as CUSTOMER_ACTIVITY view

  SP->>FS: Need features for C001 at time T
  FS->>FV: Point-in-time filter event_ts <= T
  FV-->>FS: page_views clicks as of T
  FS-->>SP: Feature row attached to label

19.6 Setup

Connect to (or create) a Feature Store in a dedicated schema:

library(snowflakeR)
conn <- sfr_connect()
conn <- sfr_load_notebook_config(conn)

fs <- sfr_feature_store(
  conn,
  database  = "ML_DB",
  schema    = "FEATURES",
  warehouse = "ML_WH",
  create    = TRUE   # creates schema + tags if missing
)

First-time admin (roles, grants, tags):

sfr_setup_feature_store(conn, "ML_DB", "FEATURES", "ML_WH")

Requires privileges per Feature Store RBAC. Data engineers often run setup once; data scientists use sfr_feature_store(..., create = FALSE) against an existing schema.

19.7 Workflow phases

Step	Action	Typical owner
1	Define entities (join keys)	ML engineer + domain expert
2	Author feature view SQL or dbplyr	ML engineer
3	Register view with version	`sfr_create_feature_view()`
4	Build spine from labels table	Same or analytics team
5	Generate training data or Dataset	`sfr_generate_*()`
6	Train in R (tidymodels, glm, …)	Data scientist
7	Log model with `training_dataset`	Model Registry

19.8 Entities

Entities define how features join to business objects. Join keys are immutable after creation — plan compound keys carefully.

customer <- sfr_create_entity(
  fs, "CUSTOMER", "CUSTOMER_ID",
  desc = "Customer entity"
)

product <- sfr_create_entity(
  fs, "PRODUCT", "PRODUCT_ID",
  desc = "Product entity"
)

sfr_list_entities(fs)
sfr_get_entity(fs, "CUSTOMER")

Multi-key entities (e.g. store + SKU) pass multiple columns — see vignette feature-store.

19.9 Feature views

A feature view is a versioned transformation tied to one or more entities. The feature_df argument accepts:

SQL string — any SELECT over tabular sources (table, view, Dynamic Table, …)
dbplyr lazy table — dplyr pipeline translated to SQL (RSnowflake)

The backing query may pass through an externally maintained table, compute features entirely in SQL/dbplyr, or combine both (hybrid). See Architecture — Sources and definitions and Authoring Feature Views.

19.9.1 Managed views (automatic refresh)

When you set refresh_freq, Snowflake registers a Dynamic Table (or equivalent managed object) that refreshes on schedule:

fv <- sfr_create_feature_view(
  fs,
  name         = "CUSTOMER_FEATURES",
  version      = "v1",
  entities     = customer,
  feature_df   = "
    SELECT customer_id,
           AVG(order_total) AS avg_order_total,
           COUNT(*) AS order_count,
           MAX(order_date) AS last_order_date
    FROM orders
    GROUP BY customer_id
  ",
  refresh_freq = "1 hour",
  desc         = "Customer aggregates from orders"
)

Use managed views when Feature Store should own refresh — aggregates rebuilt hourly from raw orders.

19.9.2 External views (you maintain the table)

Omit refresh_freq when dbt, Dynamic Tables elsewhere, or ETL already maintains the physical table:

fv_external <- sfr_create_feature_view(
  fs,
  name       = "CUSTOMER_DEMOGRAPHICS",
  version    = "v1",
  entities   = customer,
  feature_df = "SELECT customer_id, age_band, region FROM analytics.customer_demo"
)

Feature Store registers metadata and lineage; refresh is your pipeline’s job. An external process maintains the contents of analytics.customer_demo, and Feature Store will use the latest values from that table, or if defined and maintained with a timestamp column, the values as-of the timestamp. When a timestamp is defined it is critical that the external process maintains the timestamp column and that it is updated with the current timestamp when the data is updated to avoid future leakage.

19.9.3 Draft then register (two-step)

Inspect locally before materializing — mirrors Python API:

fv_draft <- sfr_feature_view(
  name         = "CUSTOMER_FEATURES",
  entities     = customer,
  feature_df   = "SELECT customer_id, AVG(order_total) AS avg_order_total FROM orders GROUP BY 1",
  refresh_freq = "1 hour"
)

fv <- sfr_register_feature_view(fs, fv_draft, version = "v1")

19.9.4 dbplyr as `feature_df`

Keep feature logic in R dplyr while SQL runs on the warehouse:

library(dplyr)
dbi_con <- sfr_dbi_connection(conn)

customer_features_query <- tbl(dbi_con, "ORDERS") |>
  group_by(customer_id) |>
  summarise(
    avg_order_total = mean(order_total, na.rm = TRUE),
    order_count     = n(),
    .groups         = "drop"
  )

fv <- sfr_create_feature_view(
  fs,
  name       = "CUSTOMER_FEATURES_DBPLYR",
  version    = "v1",
  entities   = customer,
  feature_df = customer_features_query
)

snowflakeR translates the lazy table to SQL — no collect() before registration.

19.9.5 Wide and sparse feature sets

When a feature view would need hundreds of scalar columns, prefer OBJECT or ARRAY bundles (and, for training export at scale, ARRAY → ::VECTOR columns) instead of a wide table.

Why bundles help:

Schema evolution — add feature keys or array slots without a new column per feature (especially sparse or slowly growing catalogs).
Scale — fewer columns for the planner and for cross-view joins; optional flatten only at export time.
Hybrid layouts — keep join keys and timestamps as scalars; pack high-cardinality or evolving payloads into OBJECT (Implementation Guide §12.10.4).

Snowflake documents platform patterns in Implementation Guide §12.10 — Wide & sparse. Authoring detail: Authoring Feature Views — Wide and sparse bundles.

From R, build bundles on the warehouse with RSnowflake-only dplyr helpers such as object_agg() and array_agg() — see RSnowflake Query — dbplyr and Authoring Feature Views.

For end-to-end training export (scalar-wide ASOF → three ARRAY/VECTOR packs, ARRAY-native views, OBJECT flattening, Python decode), use the pipeline table in Training Data in R — End-to-end pipelines (Pipelines A–E).

19.9.6 Time-series (event-level) features

When features change over time (not one row per entity), set timestamp_col:

fv_ts <- sfr_create_feature_view(
  fs,
  name          = "CUSTOMER_ACTIVITY",
  version       = "v1",
  entities      = customer,
  feature_df    = "SELECT customer_id, event_ts, page_views, clicks FROM activity",
  timestamp_col = "event_ts",
  refresh_freq  = "30 minutes"
)

Required for correct point-in-time joins — the store uses timestamps to pick feature values as-of each spine row.

19.9.7 Rollup and tiled aggregates

For rollup Feature Views over event streams (platform-managed tiles and lookback windows — e.g. sum of amount over 1 day), declare aggregates with sfr_feature() and set feature_granularity instead of writing every window in SQL. See Authoring Feature Views — Rollup and tiled Feature Views and the feature-store vignette.

19.10 Managing views

sfr_list_feature_views(fs)
sfr_get_feature_view(fs, "CUSTOMER_FEATURES", "v1")
sfr_show_fv_versions(fs, "CUSTOMER_FEATURES")

feature_data <- sfr_read_feature_view(fs, "CUSTOMER_FEATURES", "v1")

sfr_refresh_feature_view(fs, "CUSTOMER_FEATURES", "v1")
sfr_get_refresh_history(fs, "CUSTOMER_FEATURES", "v1")
sfr_suspend_feature_view(fs, "CUSTOMER_FEATURES", "v1")
sfr_resume_feature_view(fs, "CUSTOMER_FEATURES", "v1")

Refresh history shows Dynamic Table run status — use it when training data looks stale. Suspend/resume pauses automatic refresh during pipeline migrations.

19.11 Generate training data

The spine drives training. Each row needs entity key(s), an event timestamp, and label column(s):

training_data <- sfr_generate_training_data(
  fs,
  spine = "
    SELECT customer_id, churn_date AS event_ts, churned AS label
    FROM churn_labels
    WHERE churn_date >= '2024-01-01'
  ",
  features = list(
    list(name = "CUSTOMER_FEATURES", version = "v1"),
    list(name = "CUSTOMER_ACTIVITY", version = "v1")
  ),
  spine_timestamp_col = "event_ts",
  spine_label_cols    = "label"
)

Or pass registered view objects:

fv1 <- sfr_get_feature_view(fs, "CUSTOMER_FEATURES", "v1")
fv2 <- sfr_get_feature_view(fs, "CUSTOMER_ACTIVITY", "v1")

training_data <- sfr_generate_training_data(
  fs,
  spine    = "SELECT customer_id, churn_date AS event_ts, churned FROM churn_labels",
  features = list(fv1, fv2),
  spine_timestamp_col = "event_ts",
  spine_label_cols    = "churned"
)

Materialize to a table for reuse:

training_data <- sfr_generate_training_data(
  fs,
  spine    = "...",
  features = list(fv1, fv2),
  save_as  = "CHURN_TRAINING_SET_V1"
)

19.11.1 Datasets (immutable snapshots + lineage)

For Model Registry lineage (visible in Snowsight as Source → Feature View → Dataset → Model):

training <- sfr_generate_dataset(
  fs,
  name     = "CHURN_TRAINING",
  spine    = "SELECT customer_id, label FROM labels",
  features = list(list(name = "CUSTOMER_FEATURES", version = "v1")),
  version  = "v1",
  spine_label_cols = "label"
)

model <- glm(label ~ ., data = training, family = binomial)

mv <- sfr_log_model(
  reg,
  model            = model,
  model_name       = "CHURN_MODEL",
  training_dataset = training,
  input_cols       = sfr_input_cols(training, exclude = "label"),
  output_cols      = list(prediction = "double")
)

See Model Registry — lineage.

19.12 Inference-time features (offline store)

For batch scoring against offline Feature Views, use sfr_retrieve_features() to join a spine to feature values.

19.12.1 Latest values (no timestamp spine)

When each scoring row needs current offline features, pass entity keys only — omit spine_timestamp_col:

features_for_scoring <- sfr_retrieve_features(
  fs,
  spine    = "SELECT customer_id FROM customers_to_score",
  features = list(fv1, fv2)
)

19.12.2 Point-in-time batch inference (timestamp spine)

For incremental batch inference, backfills, or replay scoring where each row has its own as-of time, pass spine_timestamp_col — the same ASOF join logic as training, without labels:

features_for_scoring <- sfr_retrieve_features(
  fs,
  spine = "
    SELECT customer_id, score_as_of_ts
    FROM incremental_score_queue
  ",
  features            = list(fv1, fv2),
  spine_timestamp_col = "score_as_of_ts"
)

Use this when the scoring queue carries an event or decision timestamp and features must match what was knowable at that time — not necessarily “latest” across the whole table.

Pass results to predict() locally or to a batch scoring pipeline. Align column names with input_cols declared at model registration.

For sub-minute, key-based serving of latest values, use the online Hybrid Table — Online serving (sfr_read_feature_view(..., store_type = "ONLINE") or SQL/dbplyr on the OFT).

19.13 Online serving

Online Feature Tables (OFTs) expose the latest feature values per entity key on a Hybrid Table, refreshed from the offline Feature View on a schedule driven by target_lag (Stream + Task + MERGE). They are for low-latency inference, not point-in-time training — use offline views and Inference-time features / sfr_generate_training_data() for history.

Requires snowflake-ml-python ≥ 1.18.0 and a platform edition that supports online serving. Platform detail: Implementation Guide — Online Features.

19.13.1 Enable online on a Feature View

fv_online <- sfr_create_feature_view(
  fs,
  name          = "CUSTOMER_ONLINE",
  version       = "v1",
  entities      = customer,
  feature_df    = "SELECT customer_id, credit_score FROM credit_data",
  refresh_freq  = "5 minutes",
  online_config = sfr_online_config(enable = TRUE, target_lag = "2 minutes")
)

# Or add online to an existing view
sfr_update_feature_view(
  fs, "CUSTOMER_FEATURES", "v1",
  online_config = sfr_online_config(enable = TRUE, target_lag = "5 minutes")
)

Snowflake creates the OFT (Hybrid Table + sync Task). target_lag on the online config is independent of offline refresh_freq — end-to-end staleness is roughly both paths combined.

19.13.2 Retrieve values — snowflakeR → snowflake-ml-python

Use sfr_read_feature_view() with store_type = "ONLINE" (wraps fs.read_feature_view(..., StoreType.ONLINE)). Pass entity keys and optional feature column names; batch keys in one call when scoring many entities.

# Single customer lookup
online_row <- sfr_read_feature_view(
  fs,
  name          = "CUSTOMER_ONLINE",
  version       = "v1",
  store_type    = "ONLINE",
  keys          = list(CUSTOMER_ID = "C001"),
  feature_names = c("CREDIT_SCORE")
)

# Batch: many keys in one round trip (preferred over a loop)
ids <- c("C001", "C002", "C003")
online_batch <- sfr_read_feature_view(
  fs,
  name          = "CUSTOMER_ONLINE",
  version       = "v1",
  store_type    = "ONLINE",
  keys          = list(CUSTOMER_ID = ids),
  feature_names = c("CREDIT_SCORE")
)

API	Store	Use when
`sfr_read_feature_view(..., store_type = "ONLINE")`	Hybrid Table (OFT)	Real-time lookup by entity key(s)
`sfr_read_feature_view(..., store_type = "OFFLINE")`	Offline DT / table	Inspect materialized offline features
`sfr_retrieve_features()`	Offline Feature Views	Join spine to features — latest (no timestamp) or ASOF (`spine_timestamp_col`) — not the OFT API

For multiple Feature Views at inference, the platform typically does one OFT read per view (or a pre-joined Feature View registered for online) — see the Implementation Guide — multi-FV online reads.

Operational helpers:

sfr_refresh_feature_view(fs, "CUSTOMER_ONLINE", "v1", store_type = "ONLINE")
sfr_get_refresh_history(fs, "CUSTOMER_ONLINE", "v1", store_type = "ONLINE")

19.13.3 Retrieve values — SQL or dbplyr on the Online Feature Table

The OFT is a queryable Hybrid Table in your Feature Store schema (naming is managed by the platform — often a *_OFT suffix). Discover the exact name in Snowsight (Feature Store UI) or metadata commands.

Prefer the Feature Store API for serving

For low-latency key lookup, sfr_read_feature_view(..., store_type = "ONLINE") is the supported path (batch keys in one call). Direct SQL or dbplyr against the OFT is useful for warehouse-side joins, debugging, or services that already speak RSnowflake — not as a substitute for the online API in high-QPS serving tiers.

-- Illustrative: replace with your FS database.schema and OFT name
SELECT customer_id, credit_score
FROM MY_ML_DB.FEATURE_STORE.CUSTOMER_ONLINE_OFT
WHERE customer_id IN ('C001', 'C002', 'C003');

library(dplyr)

oft <- tbl(con, in_schema("FEATURE_STORE", "CUSTOMER_ONLINE_OFT")) |>
  filter(CUSTOMER_ID %in% c("C001", "C002", "C003")) |>
  select(CUSTOMER_ID, CREDIT_SCORE)

features <- oft |> collect()

library(DBI)
library(RSnowflake)

con <- dbConnect(Snowflake())
features <- dbGetQuery(con, "
  SELECT customer_id, credit_score
  FROM MY_ML_DB.FEATURE_STORE.CUSTOMER_ONLINE_OFT
  WHERE customer_id = ?
", params = list("C001"))

Use bind parameters (?) for repeated lookups so Snowflake can reuse plans (Implementation Guide — parameter binding).

Offline vs online retrieval

Training / point-in-time: spine + sfr_generate_training_data() or sfr_retrieve_features() with spine_timestamp_col → offline store only.
Batch inference (latest): sfr_retrieve_features() without spine_timestamp_col → current offline values per key.
Batch inference (ASOF): sfr_retrieve_features(..., spine_timestamp_col = ...) → features as-of each spine row (Inference-time features).
Real-time scoring: sfr_read_feature_view(..., ONLINE) or SQL/dbplyr on the OFT → latest values per key, bounded by target_lag.

19.14 Common pitfalls

Issue	Mitigation
Leakage	Always use spine + `spine_timestamp_col`; never join “current” features to historical labels
Column case mismatch	Snowflake UPPERCASE vs R lowercase — align in SQL and `input_cols`
Missing warehouse	Pass `warehouse=` to `sfr_feature_store()`
Huge `collect()` before FS	Register views in SQL/dbplyr; let platform materialize
External view stale	Monitor your dbt/ETL — Feature Store won’t refresh for you
Wrong entity keys	Keys in spine must match entity definition exactly

19.15 Companion artifacts

Resource	Content
Vignette `feature-store`	Full API, tile aggregation, advanced options
`workspace_feature_store.ipynb`	Runnable Workspace walkthrough
FS Implementation Guide	Platform authority

19.16 Next steps

Authoring Feature Views — SQL and dbplyr patterns for feature_df (simple aggregates through wide/temporal).

Training Data in R — materialize and read training exports into R.

Model Registry — log models with lineage from Feature Store datasets.

End-to-end pipeline

--- title: "Feature Store" subtitle: "Entities, feature views, datasets, and training data from R" --- ## Overview Machine learning features — customer aggregates, rolling transaction sums, product embeddings — are usually built in **one-off SQL scripts** or **notebook cells**, then copied into training CSVs. That breaks down quickly: - **Training/serving skew** — batch SQL for training differs from the app query at inference time - **Leakage** — historical labels joined to features computed with *future* data - **No reuse** — each team re-implements the same `AVG(order_total)` logic - **Weak lineage** — auditors cannot trace which feature definition trained which model The Snowflake **Feature Store** is a **governed catalog** of feature definitions: entities (join keys), versioned **feature views** (SQL transformations), and APIs for **point-in-time-correct** training joins and consistent inference retrieval. **`snowflakeR`** exposes the full Python Feature Store API from R — objects you create in R appear identically in Snowsight and in Python notebooks. ::: {.callout-tip} ## Platform deep dive (recommended) For temporal joins, Dynamic Tables vs external views, online serving, and operations, read the **[Snowflake Feature Store Implementation Guide](https://snowflake-labs.github.io/snowflake-featurestore-imp-guide/)** — especially [Concepts](https://snowflake-labs.github.io/snowflake-featurestore-imp-guide/01_concepts/index.html), [Feature Views](https://snowflake-labs.github.io/snowflake-featurestore-imp-guide/04_feature_views/index.html), and [Temporal Features](https://snowflake-labs.github.io/snowflake-featurestore-imp-guide/06_temporal_features/index.html). Examples there are Python; the **objects and semantics are identical** when created from R. ::: See also [MLOps on Snowflake](../15_mlops_on_snowflake/index.qmd) for where Feature Store sits in the lifecycle, [Authoring Feature Views](../17b_authoring_feature_views/index.qmd) for warehouse SQL/dbplyr feature definitions, [Training Data in R](../17a_training_data_in_r/index.qmd) for reading exports (OBJECT/ARRAY/VECTOR → R), and [Model Registry](../18_model_registry/index.qmd) for logging models trained on Feature Store datasets. ## Learning Objectives - Explain why point-in-time joins matter and how leakage happens without them - Describe entities, feature views, spines, and datasets - Register managed vs external feature views from R (SQL or dbplyr) - Generate training data and retrieve inference-time features - Connect Feature Store outputs to Model Registry lineage --- ## The problem Feature Store solves {#sec-problem} Consider churn modeling: | Anti-pattern | What goes wrong | |--------------|-----------------| | Join labels to **today's** customer feature table | Model sees future behaviour at training time — inflated metrics | | Different SQL in training notebook vs production app | Training/serving skew — production accuracy collapses | | Ad hoc feature tables per project | No shared definition of `avg_order_total` across teams | | Export 10M rows to R with `collect()` | Slow, expensive, breaks governance | Feature Store centralizes **definitions** and enforces **temporal correctness** when building training sets. ### Training vs inference (different operations) | Phase | Question | Feature Store API (R) | |-------|----------|------------------------| | **Training** | "What did we **know** about entity X at time T?" | `sfr_generate_training_data()` / `sfr_generate_dataset()` | | **Inference (latest)** | "What do we **know now** about entity X?" | `sfr_retrieve_features()` without `spine_timestamp_col`, or OFT / online read | | **Inference (as-of batch)** | "What did we know at each row's timestamp?" (incremental backfill, replay scoring) | `sfr_retrieve_features(..., spine_timestamp_col = ...)` — same ASOF logic as training, without labels | Training uses a **spine** (entity keys + event timestamps + labels). Inference often uses **latest** feature values, but batch pipelines can pass a **timestamp spine** for point-in-time-correct feature lookup on the offline store. --- ## Architecture {#sec-architecture} A Feature View connects **tabular sources** in Snowflake to **registered metadata** and a **materialized offline store** (table or Dynamic Table). The `feature_df` you pass at registration time can be as simple or as complex as your pipeline requires. ### Sources and definitions | Layer | Options | |-------|---------| | **Source data** | Any tabular object — **table**, **view**, **Dynamic Table**, dbt model output, stage-backed table, etc. | | **`feature_df` definition** | **Pass-through** — `SELECT *` or column subset from an externally maintained table (external Feature View). **Transform** — aggregations, joins, windows, bundles entirely in SQL or dbplyr (managed Feature View with `refresh_freq`). **Hybrid** — read from a maintained table and enrich with SQL (e.g. keys from ETL + `OBJECT_CONSTRUCT` in the same query). | | **Registration mode** | **External** — no `refresh_freq`; you own refresh. **Managed** — `refresh_freq` → platform maintains backing Dynamic Table (or equivalent). | See [Authoring Feature Views](../17b_authoring_feature_views/index.qmd) for SQL/dbplyr patterns from simple aggregates through rollup tiles and OBJECT/ARRAY bundles. ```{mermaid} flowchart TB subgraph sources [Tabular sources in Snowflake] direction LR TBL[(Table)] VW[View] DTs[(Dynamic Table)] TBL ~~~ VW ~~~ DTs end subgraph define [feature_df at registration] direction TB PASS[Pass-through SELECT from source] XFORM[Transform SQL or dbplyr aggregates joins bundles] HYBRID[Hybrid source table plus SQL enrich] end subgraph store [Feature Store offline] FV[Feature View versioned metadata] OFF[(Offline materialization table or DT)] FV --> OFF end subgraph consume [Consumption paths] direction TB TRAIN[sfr_generate_training_data spine plus ASOF] INFER_L[sfr_retrieve_features latest keys] INFER_T[sfr_retrieve_features timestamp spine ASOF] ONLN[Online Feature Table key lookup] end sources --> define PASS --> FV XFORM --> FV HYBRID --> FV OFF --> TRAIN OFF --> INFER_L OFF --> INFER_T FV --> ONLN ``` **Compute note:** Feature view **registration and refresh** use the **warehouse** you pass to `sfr_feature_store()`. Feature data **materializes in Snowflake** — not in your R session until you explicitly read or generate training data. --- ## Core concepts {#sec-concepts} | Concept | Definition | Think of it as | |---------|------------|----------------| | **Feature Store** | Dedicated schema/database area for entities, views, metadata | Catalog + governance layer | | **Entity** | Business object + **join keys** (`CUSTOMER_ID`) | "How features attach to real-world things" | | **Feature View** | Versioned SQL (or dbplyr) transformation producing features for entity(ies) | Reusable feature **definition** | | **Managed view** | Has `refresh_freq` → Snowflake maintains backing **Dynamic Table** | Auto-refreshed pipeline | | **External view** | No `refresh_freq` → you maintain source table (dbt, ETL) | Registered SQL over existing table | | **Spine** | Rows: entity keys + **event timestamp** + optional **labels** | "One training example per row" | | **Point-in-time join** | For each spine row, features as-of that timestamp | Prevents future leakage | | **Dataset** | Immutable, versioned snapshot of joined training data | Reproducible input to `sfr_log_model()` | ### Point-in-time join — concrete example {#sec-pit} **Wrong (leakage):** ```sql -- Labels from 2024, features computed TODAY SELECT l.customer_id, l.churned, f.avg_order_total FROM churn_labels l JOIN customer_features_today f ON l.customer_id = f.customer_id ``` If `avg_order_total` includes orders placed *after* the churn event, the model cheats. **Right (Feature Store):** For each row in the spine `(customer_id, event_ts, churned)`, the store joins feature values **known at `event_ts`** — orders after that timestamp are excluded automatically. ```{mermaid} sequenceDiagram participant SP as Spine row C001 event_ts=T participant FS as Feature Store participant FV as CUSTOMER_ACTIVITY view SP->>FS: Need features for C001 at time T FS->>FV: Point-in-time filter event_ts <= T FV-->>FS: page_views clicks as of T FS-->>SP: Feature row attached to label ``` --- ## Setup {#sec-setup} Connect to (or create) a Feature Store in a dedicated schema: ```r library(snowflakeR) conn <- sfr_connect() conn <- sfr_load_notebook_config(conn) fs <- sfr_feature_store( conn, database = "ML_DB", schema = "FEATURES", warehouse = "ML_WH", create = TRUE # creates schema + tags if missing ) ``` **First-time admin** (roles, grants, tags): ```r sfr_setup_feature_store(conn, "ML_DB", "FEATURES", "ML_WH") ``` Requires privileges per [Feature Store RBAC](https://docs.snowflake.com/en/developer-guide/snowflake-ml/feature-store/rbac). Data engineers often run setup once; data scientists use `sfr_feature_store(..., create = FALSE)` against an existing schema. --- ## Workflow phases {#sec-workflow} | Step | Action | Typical owner | |------|--------|---------------| | 1 | Define **entities** (join keys) | ML engineer + domain expert | | 2 | Author **feature view** SQL or dbplyr | ML engineer | | 3 | **Register** view with version | `sfr_create_feature_view()` | | 4 | Build **spine** from labels table | Same or analytics team | | 5 | **Generate training data** or **Dataset** | `sfr_generate_*()` | | 6 | Train in R (tidymodels, glm, …) | Data scientist | | 7 | **Log model** with `training_dataset` | [Model Registry](../18_model_registry/index.qmd) | --- ## Entities {#sec-entities} Entities define **how features join** to business objects. Join keys are **immutable** after creation — plan compound keys carefully. ```r customer <- sfr_create_entity( fs, "CUSTOMER", "CUSTOMER_ID", desc = "Customer entity" ) product <- sfr_create_entity( fs, "PRODUCT", "PRODUCT_ID", desc = "Product entity" ) sfr_list_entities(fs) sfr_get_entity(fs, "CUSTOMER") ``` Multi-key entities (e.g. store + SKU) pass multiple columns — see vignette `feature-store`. --- ## Feature views {#sec-views} A feature view is a **versioned transformation** tied to one or more entities. The `feature_df` argument accepts: - **SQL string** — any `SELECT` over tabular sources (table, view, Dynamic Table, …) - **dbplyr lazy table** — dplyr pipeline translated to SQL ([RSnowflake](../13_rsnowflake_query/index.qmd)) The backing query may **pass through** an externally maintained table, **compute** features entirely in SQL/dbplyr, or combine both (hybrid). See [Architecture — Sources and definitions](#sec-architecture) and [Authoring Feature Views](../17b_authoring_feature_views/index.qmd). ### Managed views (automatic refresh) When you set **`refresh_freq`**, Snowflake registers a **Dynamic Table** (or equivalent managed object) that refreshes on schedule: ```r fv <- sfr_create_feature_view( fs, name = "CUSTOMER_FEATURES", version = "v1", entities = customer, feature_df = " SELECT customer_id, AVG(order_total) AS avg_order_total, COUNT(*) AS order_count, MAX(order_date) AS last_order_date FROM orders GROUP BY customer_id ", refresh_freq = "1 hour", desc = "Customer aggregates from orders" ) ``` Use managed views when Feature Store should **own refresh** — aggregates rebuilt hourly from raw `orders`. ### External views (you maintain the table) Omit `refresh_freq` when **dbt**, Dynamic Tables elsewhere, or ETL already maintains the physical table: ```r fv_external <- sfr_create_feature_view( fs, name = "CUSTOMER_DEMOGRAPHICS", version = "v1", entities = customer, feature_df = "SELECT customer_id, age_band, region FROM analytics.customer_demo" ) ``` Feature Store registers **metadata and lineage**; refresh is your pipeline's job. An external process maintains the contents of `analytics.customer_demo`, and Feature Store will use the latest values from that table, or if defined and maintained with a timestamp column, the values as-of the timestamp. When a timestamp is defined it is critical that the external process maintains the timestamp column and that it is updated with the current timestamp when the data is updated to avoid future leakage. ### Draft then register (two-step) Inspect locally before materializing — mirrors Python API: ```r fv_draft <- sfr_feature_view( name = "CUSTOMER_FEATURES", entities = customer, feature_df = "SELECT customer_id, AVG(order_total) AS avg_order_total FROM orders GROUP BY 1", refresh_freq = "1 hour" ) fv <- sfr_register_feature_view(fs, fv_draft, version = "v1") ``` ### dbplyr as `feature_df` Keep feature logic in R **dplyr** while SQL runs on the warehouse: ```r library(dplyr) dbi_con <- sfr_dbi_connection(conn) customer_features_query <- tbl(dbi_con, "ORDERS") |> group_by(customer_id) |> summarise( avg_order_total = mean(order_total, na.rm = TRUE), order_count = n(), .groups = "drop" ) fv <- sfr_create_feature_view( fs, name = "CUSTOMER_FEATURES_DBPLYR", version = "v1", entities = customer, feature_df = customer_features_query ) ``` snowflakeR translates the lazy table to SQL — no `collect()` before registration. ### Wide and sparse feature sets {#sec-wide-sparse} When a feature view would need **hundreds of scalar columns**, prefer **OBJECT** or **ARRAY** bundles (and, for training export at scale, **ARRAY → `::VECTOR`** columns) instead of a wide table. **Why bundles help:** - **Schema evolution** — add feature keys or array slots without a new column per feature (especially sparse or slowly growing catalogs). - **Scale** — fewer columns for the planner and for cross-view joins; optional flatten only at export time. - **Hybrid layouts** — keep join keys and timestamps as scalars; pack high-cardinality or evolving payloads into OBJECT ([Implementation Guide §12.10.4](https://snowflake-labs.github.io/snowflake-featurestore-imp-guide/12_advanced_patterns/index.html#sec-12-10-4-hybrid-storage-key-columns--object-payload)). Snowflake documents platform patterns in [Implementation Guide §12.10 — Wide & sparse](https://snowflake-labs.github.io/snowflake-featurestore-imp-guide/12_advanced_patterns/#sec-wide-sparse). Authoring detail: [Authoring Feature Views — Wide and sparse bundles](../17b_authoring_feature_views/index.qmd#sec-wide-bundles). From R, build bundles on the warehouse with RSnowflake-only dplyr helpers such as `object_agg()` and `array_agg()` — see [RSnowflake Query — dbplyr](../13_rsnowflake_query/index.qmd#sec-snowflake-dbplyr) and [Authoring Feature Views](../17b_authoring_feature_views/index.qmd#sec-wide-bundles). For **end-to-end training export** (scalar-wide ASOF → three ARRAY/VECTOR packs, ARRAY-native views, OBJECT flattening, Python decode), use the pipeline table in [Training Data in R — End-to-end pipelines](../17a_training_data_in_r/index.qmd#sec-e2e-pipelines) (Pipelines A–E). ### Time-series (event-level) features When features change over time (not one row per entity), set **`timestamp_col`**: ```r fv_ts <- sfr_create_feature_view( fs, name = "CUSTOMER_ACTIVITY", version = "v1", entities = customer, feature_df = "SELECT customer_id, event_ts, page_views, clicks FROM activity", timestamp_col = "event_ts", refresh_freq = "30 minutes" ) ``` Required for correct point-in-time joins — the store uses timestamps to pick feature values as-of each spine row. ### Rollup and tiled aggregates For **rollup Feature Views** over event streams (platform-managed tiles and lookback windows — e.g. sum of amount over 1 day), declare aggregates with `sfr_feature()` and set `feature_granularity` instead of writing every window in SQL. See [Authoring Feature Views — Rollup and tiled Feature Views](../17b_authoring_feature_views/index.qmd#sec-rollup-tiles) and the [`feature-store` vignette](https://snowflake-labs.github.io/snowflakeR/articles/feature-store.html). --- ## Managing views {#sec-manage} ```r sfr_list_feature_views(fs) sfr_get_feature_view(fs, "CUSTOMER_FEATURES", "v1") sfr_show_fv_versions(fs, "CUSTOMER_FEATURES") feature_data <- sfr_read_feature_view(fs, "CUSTOMER_FEATURES", "v1") sfr_refresh_feature_view(fs, "CUSTOMER_FEATURES", "v1") sfr_get_refresh_history(fs, "CUSTOMER_FEATURES", "v1") sfr_suspend_feature_view(fs, "CUSTOMER_FEATURES", "v1") sfr_resume_feature_view(fs, "CUSTOMER_FEATURES", "v1") ``` **Refresh history** shows Dynamic Table run status — use it when training data looks stale. **Suspend/resume** pauses automatic refresh during pipeline migrations. --- ## Generate training data {#sec-training} The **spine** drives training. Each row needs entity key(s), an **event timestamp**, and label column(s): ```r training_data <- sfr_generate_training_data( fs, spine = " SELECT customer_id, churn_date AS event_ts, churned AS label FROM churn_labels WHERE churn_date >= '2024-01-01' ", features = list( list(name = "CUSTOMER_FEATURES", version = "v1"), list(name = "CUSTOMER_ACTIVITY", version = "v1") ), spine_timestamp_col = "event_ts", spine_label_cols = "label" ) ``` Or pass registered view objects: ```r fv1 <- sfr_get_feature_view(fs, "CUSTOMER_FEATURES", "v1") fv2 <- sfr_get_feature_view(fs, "CUSTOMER_ACTIVITY", "v1") training_data <- sfr_generate_training_data( fs, spine = "SELECT customer_id, churn_date AS event_ts, churned FROM churn_labels", features = list(fv1, fv2), spine_timestamp_col = "event_ts", spine_label_cols = "churned" ) ``` **Materialize** to a table for reuse: ```r training_data <- sfr_generate_training_data( fs, spine = "...", features = list(fv1, fv2), save_as = "CHURN_TRAINING_SET_V1" ) ``` ### Datasets (immutable snapshots + lineage) For Model Registry **lineage** (visible in Snowsight as Source → Feature View → Dataset → Model): ```r training <- sfr_generate_dataset( fs, name = "CHURN_TRAINING", spine = "SELECT customer_id, label FROM labels", features = list(list(name = "CUSTOMER_FEATURES", version = "v1")), version = "v1", spine_label_cols = "label" ) model <- glm(label ~ ., data = training, family = binomial) mv <- sfr_log_model( reg, model = model, model_name = "CHURN_MODEL", training_dataset = training, input_cols = sfr_input_cols(training, exclude = "label"), output_cols = list(prediction = "double") ) ``` See [Model Registry — lineage](../18_model_registry/index.qmd#sec-lineage). --- ## Inference-time features (offline store) {#sec-inference} For **batch scoring** against **offline** Feature Views, use `sfr_retrieve_features()` to join a spine to feature values. ### Latest values (no timestamp spine) When each scoring row needs **current** offline features, pass entity keys only — omit `spine_timestamp_col`: ```r features_for_scoring <- sfr_retrieve_features( fs, spine = "SELECT customer_id FROM customers_to_score", features = list(fv1, fv2) ) ``` ### Point-in-time batch inference (timestamp spine) For **incremental batch inference**, backfills, or replay scoring where each row has its own as-of time, pass **`spine_timestamp_col`** — the same ASOF join logic as training, without labels: ```r features_for_scoring <- sfr_retrieve_features( fs, spine = " SELECT customer_id, score_as_of_ts FROM incremental_score_queue ", features = list(fv1, fv2), spine_timestamp_col = "score_as_of_ts" ) ``` Use this when the scoring queue carries an event or decision timestamp and features must match what was knowable at that time — not necessarily "latest" across the whole table. Pass results to `predict()` locally or to a batch scoring pipeline. Align column names with `input_cols` declared at model registration. For **sub-minute, key-based** serving of **latest** values, use the **online** Hybrid Table — [Online serving](#sec-online) (`sfr_read_feature_view(..., store_type = "ONLINE")` or SQL/dbplyr on the OFT). --- ## Online serving {#sec-online} **Online Feature Tables (OFTs)** expose the **latest** feature values per entity key on a **Hybrid Table**, refreshed from the offline Feature View on a schedule driven by `target_lag` (Stream + Task + MERGE). They are for **low-latency inference**, not point-in-time training — use offline views and [Inference-time features](#sec-inference) / `sfr_generate_training_data()` for history. Requires **`snowflake-ml-python` ≥ 1.18.0** and a platform edition that supports online serving. Platform detail: [Implementation Guide — Online Features](https://snowflake-labs.github.io/snowflake-featurestore-imp-guide/08_online_features/index.html). ### Enable online on a Feature View ```r fv_online <- sfr_create_feature_view( fs, name = "CUSTOMER_ONLINE", version = "v1", entities = customer, feature_df = "SELECT customer_id, credit_score FROM credit_data", refresh_freq = "5 minutes", online_config = sfr_online_config(enable = TRUE, target_lag = "2 minutes") ) # Or add online to an existing view sfr_update_feature_view( fs, "CUSTOMER_FEATURES", "v1", online_config = sfr_online_config(enable = TRUE, target_lag = "5 minutes") ) ``` Snowflake creates the OFT (Hybrid Table + sync Task). `target_lag` on the online config is **independent** of offline `refresh_freq` — end-to-end staleness is roughly both paths combined. ### Retrieve values — snowflakeR → snowflake-ml-python Use **`sfr_read_feature_view()`** with `store_type = "ONLINE"` (wraps `fs.read_feature_view(..., StoreType.ONLINE)`). Pass entity **keys** and optional **feature column** names; batch keys in one call when scoring many entities. ```r # Single customer lookup online_row <- sfr_read_feature_view( fs, name = "CUSTOMER_ONLINE", version = "v1", store_type = "ONLINE", keys = list(CUSTOMER_ID = "C001"), feature_names = c("CREDIT_SCORE") ) # Batch: many keys in one round trip (preferred over a loop) ids <- c("C001", "C002", "C003") online_batch <- sfr_read_feature_view( fs, name = "CUSTOMER_ONLINE", version = "v1", store_type = "ONLINE", keys = list(CUSTOMER_ID = ids), feature_names = c("CREDIT_SCORE") ) ``` | API | Store | Use when | |-----|-------|----------| | `sfr_read_feature_view(..., store_type = "ONLINE")` | Hybrid Table (OFT) | Real-time lookup by entity key(s) | | `sfr_read_feature_view(..., store_type = "OFFLINE")` | Offline DT / table | Inspect materialized offline features | | `sfr_retrieve_features()` | Offline Feature Views | Join spine to features — **latest** (no timestamp) or **ASOF** (`spine_timestamp_col`) — not the OFT API | For **multiple Feature Views** at inference, the platform typically does **one OFT read per view** (or a **pre-joined** Feature View registered for online) — see the Implementation Guide — multi-FV online reads. Operational helpers: ```r sfr_refresh_feature_view(fs, "CUSTOMER_ONLINE", "v1", store_type = "ONLINE") sfr_get_refresh_history(fs, "CUSTOMER_ONLINE", "v1", store_type = "ONLINE") ``` ### Retrieve values — SQL or dbplyr on the Online Feature Table The OFT is a queryable **Hybrid Table** in your Feature Store schema (naming is managed by the platform — often a `*_OFT` suffix). Discover the exact name in **Snowsight** (Feature Store UI) or metadata commands. ::: {.callout-warning} ## Prefer the Feature Store API for serving For low-latency key lookup, **`sfr_read_feature_view(..., store_type = "ONLINE")`** is the supported path (batch keys in one call). Direct SQL or dbplyr against the OFT is useful for **warehouse-side joins**, debugging, or services that already speak RSnowflake — not as a substitute for the online API in high-QPS serving tiers. ::: ::: {.panel-tabset group="fs-oft-read"} ##### SQL ```sql -- Illustrative: replace with your FS database.schema and OFT name SELECT customer_id, credit_score FROM MY_ML_DB.FEATURE_STORE.CUSTOMER_ONLINE_OFT WHERE customer_id IN ('C001', 'C002', 'C003'); ``` ##### dbplyr ```r library(dplyr) oft <- tbl(con, in_schema("FEATURE_STORE", "CUSTOMER_ONLINE_OFT")) |> filter(CUSTOMER_ID %in% c("C001", "C002", "C003")) |> select(CUSTOMER_ID, CREDIT_SCORE) features <- oft |> collect() ``` ##### R (DBI + bind) ```r library(DBI) library(RSnowflake) con <- dbConnect(Snowflake()) features <- dbGetQuery(con, " SELECT customer_id, credit_score FROM MY_ML_DB.FEATURE_STORE.CUSTOMER_ONLINE_OFT WHERE customer_id = ? ", params = list("C001")) ``` ::: Use **bind parameters** (`?`) for repeated lookups so Snowflake can reuse plans ([Implementation Guide — parameter binding](https://snowflake-labs.github.io/snowflake-featurestore-imp-guide/08_online_features/index.html#sec-8-9-4-parameter-binding)). ::: {.callout-note} ## Offline vs online retrieval - **Training / point-in-time:** spine + `sfr_generate_training_data()` or `sfr_retrieve_features()` with `spine_timestamp_col` → **offline** store only. - **Batch inference (latest):** `sfr_retrieve_features()` without `spine_timestamp_col` → current offline values per key. - **Batch inference (ASOF):** `sfr_retrieve_features(..., spine_timestamp_col = ...)` → features as-of each spine row ([Inference-time features](#sec-inference)). - **Real-time scoring:** `sfr_read_feature_view(..., ONLINE)` or SQL/dbplyr on the OFT → **latest** values per key, bounded by `target_lag`. ::: --- ## Common pitfalls {#sec-pitfalls} | Issue | Mitigation | |-------|------------| | **Leakage** | Always use spine + `spine_timestamp_col`; never join "current" features to historical labels | | Column case mismatch | Snowflake UPPERCASE vs R lowercase — align in SQL and `input_cols` | | Missing warehouse | Pass `warehouse=` to `sfr_feature_store()` | | Huge `collect()` before FS | Register views in SQL/dbplyr; let platform materialize | | External view stale | Monitor your dbt/ETL — Feature Store won't refresh for you | | Wrong entity keys | Keys in spine must match entity definition exactly | --- ## Companion artifacts {#sec-companion} | Resource | Content | |----------|---------| | Vignette `feature-store` | Full API, tile aggregation, advanced options | | [`workspace_feature_store.ipynb`](https://github.com/Snowflake-Labs/snowflakeR/blob/main/inst/notebooks/workspace_feature_store.ipynb) | Runnable Workspace walkthrough | | [FS Implementation Guide](https://snowflake-labs.github.io/snowflake-featurestore-imp-guide/) | Platform authority | --- ## Next steps {#sec-next} [Authoring Feature Views](../17b_authoring_feature_views/index.qmd) — SQL and dbplyr patterns for `feature_df` (simple aggregates through wide/temporal). [Training Data in R](../17a_training_data_in_r/index.qmd) — materialize and read training exports into R. [Model Registry](../18_model_registry/index.qmd) — log models with lineage from Feature Store datasets. [End-to-end pipeline](../23_end_to_end/index.qmd)