4 Feature Views

Dynamic Tables vs Views, refresh_freq modes, versioning, and lifecycle

Keywords

snowflake, feature store, ml, machine learning, mlops

4.1 Overview

A Feature View is a collection of related features computed from source data. It defines how features are transformed, stored, and served. This chapter uses the clickstream sample model (FEATURE_STORE_DEMO.CLICKSTREAM_DATA: VISITORS, USERS, HOUSEHOLDS, PRODUCTS, CATEGORIES, SUPPLIERS, PRODUCT_SUPPLIER, EVENTS, SESSIONS, ORDERS, ORDER_ITEMS) and Feature Store objects in FEATURE_STORE_DEMO.FEATURE_STORE, warehouse FS_DEV_WH.

The refresh_freq parameter determines whether a Feature View is backed by a View (query-time) or a Dynamic Table (materialized), and whether refresh follows TARGET_LAG or a CRON-driven Task.

4.2 Learning Objectives

After completing this chapter, you will be able to:

Interpret the three refresh_freq behaviors: omitted / None, time period, and CRON
Choose between Dynamic Table and View-based Feature Views for a workload, including using Views as a development precursor to Dynamic Tables
Implement versioning strategies (V01, V02, …) for safe updates
Apply schema-evolution rules: add versions for new features, deprecate without breaking consumers, never change a registered Feature View in place
Evaluate the base + presentation layer model as an alternative to version-per-bundle for complex feature sets
Manage Feature View lifecycle: create, register new versions, validate lineage, deprecate, and remove
Configure refresh strategies for freshness vs. cost, understanding that the DT scheduler incurs no compute when source data is unchanged

📂 Chapter code: Browse companion scripts on GitHub

4.3 SQL vs Snowpark DataFrame API

Feature View definitions require a Snowpark DataFrame (feature_df). You can construct this using either session.sql("SELECT ...") (SQL) or the Snowpark DataFrame API (session.table(...).group_by(...).agg(...)). Both produce identical lazy Snowpark DataFrames and generate the same Dynamic Tables.

Throughout this guide, code examples are presented in both forms using tabs. SQL is shown first because it is more widely known across data science and engineering teams, but the Snowpark DataFrame API is equally valid.

Factor	SQL (`session.sql`)	Snowpark DataFrame API
Readability	Universally understood across data roles	Requires Python + Snowpark API knowledge
Debuggability	Maps 1:1 to what the Dynamic Table executes; easy to copy-paste into Snowsight	Emits machine-generated SQL that can be dense/nested (see Appendix C)
Composability	Harder to parameterize dynamically	Excels at programmatic feature generation (loops, conditional columns, config-driven specs)
Testability	Requires `session.sql` assertions or a SQL runner	Integrates with Python test frameworks (pytest)
Verbosity	Typically more concise for simple aggregations	More concise for complex multi-step pipelines with reusable intermediates
Team skills	SQL is widely known (analysts, data scientists, data engineers)	Snowpark API is narrower (Python + Snowpark)
Maintenance	Easier to review in PRs for SQL-fluent teams	Easier for Python-fluent teams with IDE support

Tip

There is no wrong choice. Pick the style that your team can read, review, and maintain. Many teams standardize on SQL for simple aggregations and reserve Snowpark DataFrames for programmatic generation (e.g., data-driven feature factories in Chapter 7).

4.4 How `refresh_freq` Controls Materialization

`refresh_freq`	Backing object	Behavior
`None` (default) or omitted	View	Features computed at query time; always reflects current source data; no Dynamic Table storage
Time period (e.g. `"1 hour"`, `"15 minutes"`)	Dynamic Table	Materialized result; refresh driven by TARGET_LAG for that period
CRON expression (e.g. `"0 8 * * *"`)	Dynamic Table + Task	Dynamic Table is created and receives an initial full refresh, then is suspended for ongoing lag-based refresh; a Task runs the refresh on the CRON schedule

CRON vs period strings

Snowflake distinguishes a schedule string (CRON) from a time period (lag). If the value parses as CRON, the platform uses the Dynamic Table + Task pattern described above—not continuous TARGET_LAG refresh.

Iceberg-backed Feature Views (SDK 1.26+)

Any DT-backed Feature View (time period or CRON) can optionally materialize as a Dynamic Iceberg Table by passing a StorageConfig:

from snowflake.ml.feature_store import StorageConfig, StorageFormat

storage = StorageConfig(
    format=StorageFormat.ICEBERG,
    external_volume='MY_EXTERNAL_VOLUME',
    base_location='feature_store/my_fv'
)
fv = FeatureView(
    name="MY_FV", entities=[e], feature_df=df,
    refresh_freq="1 hour",
    storage_config=storage,
    ...
)

This writes feature data as Parquet files in Iceberg format on external cloud storage, making it directly readable by any Iceberg-compatible ML framework (Ray, Spark, PyTorch) without requiring a Snowflake connector. All standard Feature Store APIs work unchanged. See Chapter 11: Iceberg-Backed Feature Views for full details.

4.4.1 View-based Feature View (`refresh_freq = None`)

📁 Full code: _code/dynamic_table_fv.py — create_view_featureview

View-based Feature Views are highly flexible. The underlying View can range from a simple pass-through to a complex declarative transformation, making them suitable for several distinct use cases:

1. Simple pass-through for externally managed tables. When feature engineering is handled by an external process (dbt, Airflow, a Stored Procedure, etc.), the Feature View simply wraps the already-prepared table. This registers it in the Feature Store catalog for discovery, lineage, and training-set generation without duplicating any transformation logic:

from snowflake.ml.feature_store import FeatureView, Entity

# Pass-through: dbt/Airflow manages the table; Feature View just wraps it
product_catalog_fv = FeatureView(
    name="PRODUCT_CATALOG_FV",
    entities=[product_entity],
    feature_df=session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.PRODUCTS"),
    timestamp_col="UPDATED_TS",
    refresh_freq=None,
    desc="PRODUCT attributes from CLICKSTREAM_DATA.PRODUCTS — managed by dbt",
)

2. Declarative transformations. The View definition can encode complex feature-engineering logic – joins, aggregations, window functions, CASE expressions – evaluated at query time. This keeps the transformation logic self-contained within the Feature Store:

SQL
Snowpark DataFrame

user_session_df = session.sql("""
    SELECT
        s.USER_ID,
        COUNT(DISTINCT s.SESSION_ID) AS SESSION_CNT,
        AVG(s.DURATION_SEC) AS AVG_SESSION_DURATION_SEC,
        SUM(CASE WHEN e.EVENT_TYPE = 'PURCHASE' THEN 1 ELSE 0 END) AS PURCHASE_EVENT_CNT,
        MAX(s.START_TS) AS LAST_SESSION_TS
    FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.SESSIONS s
    JOIN FEATURE_STORE_DEMO.CLICKSTREAM_DATA.EVENTS e ON s.SESSION_ID = e.SESSION_ID
    GROUP BY s.USER_ID
""")

import snowflake.snowpark.functions as F

sessions = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.SESSIONS")
events = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.EVENTS")

user_session_df = (
    sessions.join(events, sessions["SESSION_ID"] == events["SESSION_ID"])
    .group_by(sessions["USER_ID"])
    .agg(
        F.count_distinct(sessions["SESSION_ID"]).alias("SESSION_CNT"),
        F.avg(sessions["DURATION_SEC"]).alias("AVG_SESSION_DURATION_SEC"),
        F.sum(F.when(events["EVENT_TYPE"] == "PURCHASE", 1).otherwise(0)).alias("PURCHASE_EVENT_CNT"),
        F.max(sessions["START_TS"]).alias("LAST_SESSION_TS"),
    )
)

user_session_fv = FeatureView(
    name="USER_SESSION_FV",
    entities=[user_entity],
    feature_df=user_session_df,
    timestamp_col="LAST_SESSION_TS",
    refresh_freq=None,
    desc="USER session engagement features -- computed at query time",
)

3. Development precursor to a Dynamic Table. During development, use a View-based Feature View while the transformation logic is still evolving. Because the View has no materialized state, you can freely iterate on the query definition. Once the logic stabilizes, convert it to a Dynamic Table by setting refresh_freq to a period string – the same feature_df works unchanged:

Converting a View-based Feature View to a Dynamic Table

When you switch from refresh_freq=None (View) to a period string (Dynamic Table), the underlying Snowflake object type changes from VIEW to DYNAMIC TABLE. As of snowflake-ml-python 1.27.0, register_feature_view() with overwrite=True handles this conversion automatically – it drops the existing object and creates the new type. On earlier versions, you must delete the existing Feature View first with fs.delete_feature_view() before re-registering with the new refresh_freq.

# Start with a View in development
fv = FeatureView(
    name="USER_ORDER_FV",
    entities=[user_entity],
    feature_df=user_order_df,
    timestamp_col="LAST_ORDER_TS",
    refresh_freq=None,           # View — iterate freely
    desc="USER lifetime order features — DEV iteration",
)

# Later, promote to Dynamic Table for production
fv = FeatureView(
    name="USER_ORDER_FV",
    entities=[user_entity],
    feature_df=user_order_df,    # same query
    timestamp_col="LAST_ORDER_TS",
    refresh_freq="1 hour",       # now materialized as a DT
    desc="USER lifetime order features — production DT",
)

Characteristics:

Computed on each query – always reflects current source data
No Dynamic Table storage or refresh cost
Query-time compute cost scales with read volume
Ideal for prototyping, rarely accessed features, or wrapping externally managed tables

4.4.2 Dynamic Table Feature View (time period)

📁 Full code: _code/dynamic_table_fv.py — create_dt_featureview

4.4.2.1 Pattern A: Cumulative lifetime features (one row per order event)

The most PIT-friendly pattern for DT-backed Feature Views produces multiple rows per entity over time, so the ASOF join can select the correct snapshot for any point in the past:

SQL
Snowpark DataFrame

from snowflake.ml.feature_store import FeatureView, Entity

user_order_cumulative_df = session.sql("""
    SELECT
        USER_ID,
        ORDER_TS,
        SUM(TOTAL_AMT) OVER w AS ORDER_TOTAL_AMT_SUM,
        COUNT(ORDER_ID) OVER w AS ORDER_CNT,
        AVG(TOTAL_AMT) OVER w AS ORDER_TOTAL_AMT_AVG
    FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
    WINDOW w AS (PARTITION BY USER_ID ORDER BY ORDER_TS
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
""")

from snowflake.snowpark import Window
import snowflake.snowpark.functions as F
from snowflake.ml.feature_store import FeatureView, Entity

orders = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS")
w = Window.partition_by("USER_ID").order_by("ORDER_TS").rows_between(
    Window.UNBOUNDED_PRECEDING, Window.CURRENT_ROW
)

user_order_cumulative_df = orders.select(
    F.col("USER_ID"),
    F.col("ORDER_TS"),
    F.sum("TOTAL_AMT").over(w).alias("ORDER_TOTAL_AMT_SUM"),
    F.count("ORDER_ID").over(w).alias("ORDER_CNT"),
    F.avg("TOTAL_AMT").over(w).alias("ORDER_TOTAL_AMT_AVG"),
)

user_order_fv = FeatureView(
    name="USER_ORDER_FV",
    entities=[user_entity],
    feature_df=user_order_cumulative_df,
    timestamp_col="ORDER_TS",
    refresh_freq="1 hour",
    desc="USER cumulative order features: running total spend, order count, "
         "and average order value -- one row per order for PIT retrieval",
)

This Feature View produces one row per user per order. When the spine requests features as of a specific timestamp, the ASOF join selects the row with the most recent ORDER_TS <= the spine timestamp, returning the cumulative totals as of that point in time.

4.4.2.2 Pattern B: Latest-state aggregate (one row per entity)

If you only need the current totals (e.g., for online serving or batch inference at “now”), a single-row aggregate is simpler and cheaper to maintain:

SQL
Snowpark DataFrame

user_order_latest_df = session.sql("""
    SELECT
        USER_ID,
        SUM(TOTAL_AMT) AS ORDER_TOTAL_AMT_SUM,
        COUNT(ORDER_ID) AS ORDER_CNT,
        AVG(TOTAL_AMT) AS ORDER_TOTAL_AMT_AVG,
        MAX(ORDER_TS) AS LAST_ORDER_TS
    FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
    GROUP BY USER_ID
""")

import snowflake.snowpark.functions as F

orders = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS")
user_order_latest_df = orders.group_by("USER_ID").agg(
    F.sum("TOTAL_AMT").alias("ORDER_TOTAL_AMT_SUM"),
    F.count("ORDER_ID").alias("ORDER_CNT"),
    F.avg("TOTAL_AMT").alias("ORDER_TOTAL_AMT_AVG"),
    F.max("ORDER_TS").alias("LAST_ORDER_TS"),
)

user_order_latest_fv = FeatureView(
    name="USER_ORDER_LATEST_FV",
    entities=[user_entity],
    feature_df=user_order_latest_df,
    timestamp_col="LAST_ORDER_TS",
    refresh_freq="1 hour",
    desc="USER latest-state order aggregates: lifetime total spend, order count, "
         "and average order value from CLICKSTREAM_DATA.ORDERS",
)

user_order_latest_fv = fs.register_feature_view(
    feature_view=user_order_latest_fv, version="V01", block=True, overwrite=True,
)
print(f"Registered: {user_order_latest_fv.name}/V01  ({user_order_latest_fv.status})")
user_order_latest_df.limit(5).show()

Note

Pattern B produces one row per user. The ASOF join always returns this single row for any spine timestamp >= LAST_ORDER_TS. This means every training example sees the same (current) totals regardless of when the training event occurred – it does not provide true point-in-time correctness for historical training. Use Pattern A when temporal accuracy matters. See Chapter 6: Temporal Features for details.

Characteristics:

Pre-computed and stored in a Dynamic Table
Refreshes on a lag target derived from the period string
Strong fit for high read volume and predictable freshness needs
Ongoing compute cost for refreshes

Float aggregations with JOINs force full refresh

If your feature_df query combines aggregations on FLOAT-typed columns (SUM, AVG, MIN, MAX, VAR, STD) with a JOIN in the same query block, the Dynamic Table engine will fall back to FULL refresh instead of incremental. This can dramatically increase compute cost and refresh latency.

Workaround: Cast float columns to fixed-point before aggregating:

SUM(CAST(float_col AS NUMBER(38, 6))) AS SUM_VAL

If you are building your feature_df with the Snowpark DataFrame API instead of session.sql(), the equivalent cast is:

from snowflake.snowpark.types import DecimalType

feature_df = (
    source_df
    .group_by("JOIN_KEY")
    .agg(F.sum(F.col("FLOAT_COL").cast(DecimalType(38, 6))).alias("SUM_VAL"))
)

If the join is unavoidable, consider splitting the query into a base DT (aggregation only, no join) and a presentation View that joins the results – see the Base + Presentation Layer Model later in this chapter.

As of April 2026, the presentation layer does not have to be a View – it can be a downstream DT that refreshes incrementally. When the base DT’s query uses GROUP BY, Snowflake derives a unique key from the grouping columns. A downstream DT can then opt into REFRESH_MODE = INCREMENTAL via SQL (ALTER DYNAMIC TABLE ... SET REFRESH_MODE = INCREMENTAL) and process only the rows that actually changed in the upstream full refresh. This is a SQL-level capability not yet exposed in the Feature Store Python API. See Chapter 5: Feature Pipelines and Understanding primary keys in dynamic tables.

4.4.3 Dynamic Table + Task (CRON)

📁 Full code: _code/dynamic_table_fv.py — create_cron_featureview

Use CRON when features should refresh on a calendar schedule aligned to upstream batch loads rather than continuous lag. This is common when source data arrives in known batches (e.g., a nightly ETL) and you want refresh timing to match the upstream cadence. Note that even with a shorter refresh_freq, cost is minimal when there is no new source data – the Dynamic Table scheduler checks for upstream changes and skips the refresh cycle without starting a warehouse:

SQL
Snowpark DataFrame

user_daily_session_df = session.sql("""
    SELECT
        USER_ID,
        COUNT(DISTINCT SESSION_ID) AS SESSION_CNT_LIFETIME,
        AVG(DURATION_SEC) AS AVG_SESSION_DURATION_SEC
    FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.SESSIONS
    GROUP BY USER_ID
""")

import snowflake.snowpark.functions as F

sessions = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.SESSIONS")
user_daily_session_df = sessions.group_by("USER_ID").agg(
    F.count_distinct("SESSION_ID").alias("SESSION_CNT_LIFETIME"),
    F.avg("DURATION_SEC").alias("AVG_SESSION_DURATION_SEC"),
)

user_daily_session_fv = FeatureView(
    name="USER_SESSION_DAILY_FV",
    entities=[user_entity],
    feature_df=user_daily_session_df,
    refresh_freq="0 8 * * *",  # CRON: refresh daily at 08:00 UTC after nightly load
    desc="USER session engagement features -- refreshed daily after batch ingestion",
)

timestamp_col is optional

This example omits timestamp_col because it produces a single latest-state row per entity. Without a timestamp column, Feature Store performs an equality join on the entity key when retrieving features – every spine row for that entity receives the same feature values. Use timestamp_col when you need point-in-time (ASOF) joins for temporally correct training data; omit it when you only need the current snapshot. See Chapter 6: Temporal Features for details.

Characteristics:

Operational model: initial DT build, then Task-owned refreshes per CRON schedule
Aligns refresh to upstream batch cadence – avoids paying for refreshes before new data lands
Document the schedule and timezone expectations for consumers

4.4.4 External Scheduling (Suspend + On-Demand Refresh)

When refreshes should be triggered by external pipeline completion (e.g., an Airflow DAG, Dagster asset, or CI/CD step) rather than a fixed schedule, you can suspend the Feature View’s DT and call refresh_feature_view() on demand:

Python API
SQL

registered_fv = fs.register_feature_view(fv, "V01")

# Suspend automatic refreshes -- DT stays idle until explicitly triggered
fs.suspend_feature_view(registered_fv)

# In your external orchestrator (Airflow task, Dagster op, etc.):
fs.refresh_feature_view(registered_fv)

-- Suspend automatic refreshes
ALTER DYNAMIC TABLE FEATURE_STORE_DEMO.FEATURE_STORE."USER_SESSION_DAILY_FV$V01"
  SUSPEND;

-- Trigger a manual refresh from your external scheduler
ALTER DYNAMIC TABLE FEATURE_STORE_DEMO.FEATURE_STORE."USER_SESSION_DAILY_FV$V01"
  REFRESH;

Tip

The SQL approach is often simpler for external schedulers (Airflow, Dagster, dbt) that already have a Snowflake connection – no Python environment or snowflake-ml-python install required. The Feature View’s underlying Dynamic Table name follows the <NAME>$<VERSION> convention and lives in the Feature Store schema.

Characteristics:

Full control over refresh timing – refresh only when upstream data is confirmed ready
No wasted refreshes on unchanged data
Requires external orchestration infrastructure
The DT remains suspended between refreshes; no background compute

4.4.5 Choosing Between Modes

Factor	View (`None`)	Dynamic Table (period)	Dynamic Table + Task (CRON)	External (suspend + refresh)
Query frequency	Lower or sporadic	High	High (after batch landing)	High (after upstream completes)
Freshness	As-of query time	Lag-bound (e.g. ≤ 1 hour)	Next scheduled run	On-demand after trigger
Cost shape	Pay per query	Pay for refresh + storage	Pay for scheduled runs	Pay for triggered runs only
Orchestration	Optional external prep of sources	Native lag refresh	Native Task schedule	External (Airflow, Dagster, etc.)

Wide Feature Views (100+ columns)

Feature tables with hundreds or thousands of columns (sensor data, one-hot encodings, time-windowed aggregations) introduce SQL compilation overhead, micro-partition bloat, and can hit Snowflake’s ~2,000-column practical limit. If your Feature View exceeds ~100 columns, consider hybrid storage – entity keys and timestamp as native columns, remaining features packed into a single OBJECT column using OBJECT_CONSTRUCT. This reduces storage overhead, eliminates column-limit concerns, and improves compile time. An expansion View on top can reconstitute the wide schema for consumers that need it. See Chapter 13: Advanced Patterns – Wide & Sparse Feature Data for detailed architecture, column count thresholds, and the two-tier Feature View pattern.

Data-Driven Feature Definitions

When feature columns are generated programmatically — categorical pivots, multi-window sweeps, config-driven specifications — the feature_df and Feature list are built by code rather than written by hand. The Feature View schema is still fixed at registration time, but the code path becomes data-driven. See Chapter 7: Data-Driven Feature Definitions for patterns, factory functions, and static vs dynamic trade-offs.

4.5 Versioning Strategies

Feature View versions enable safe updates and rollbacks.

Key insight

Versioning happens at the Feature View level, not per individual feature. A Feature View version is an immutable bundle of features and definitions.

4.5.1 Recommended Version Naming

📁 Full code: _code/versioning.py

# Zero-padded sequential versions (recommended for production)
fs.register_feature_view(fv, version="V01")
fs.register_feature_view(fv, version="V02")
fs.register_feature_view(fv, version="V10")  # Sorts correctly lexicographically

# Environment-prefixed variants (optional)
fs.register_feature_view(fv, version="DEV_V01")
fs.register_feature_view(fv, version="PROD_V03")

Prefer V01, V02, … for operational clarity and sorting. Other schemes (semantic versions, date stamps) are possible but should be agreed team-wide. Consider creating shared Python helper functions that automate version incrementing and enforce your preferred scheme – this reduces human error as the Feature View count grows and keeps versioning consistent across teams.

4.5.2 How Versions Map to Snowflake Objects

When you register a Feature View, the Feature Store constructs the underlying Snowflake object name by joining the Feature View name and version with a $ separator:

<FEATURE_VIEW_NAME>$<VERSION>

For example, registering a Feature View named USER_ORDER_FV at version V01 creates a Dynamic Table (or View) called:

FEATURE_STORE_DEMO.FEATURE_STORE.USER_ORDER_FV$V01

This naming convention means:

Each version is a distinct Snowflake object – V01 and V02 coexist as separate Dynamic Tables (or Views) in the same schema.
You can query any version directly via SQL: SELECT * FROM FEATURE_STORE.USER_ORDER_FV$V02.
Downstream DTs that chain off a specific version reference the fully-qualified $-delimited name, so upgrading consumers to a new version is an explicit, controlled change.
Cost implication: Each DT-backed version incurs independent refresh compute and storage. If V01 and V02 share overlapping features, that compute is duplicated. Mitigations: suspend older versions once consumers have migrated (fs.suspend_feature_view()), use the Base + Presentation pattern to keep shared features in a single base DT, or consolidate features into a single wide Feature View.
The $ character is legal in unquoted Snowflake identifiers for most SQL statements (SELECT, ALTER, DESCRIBE, etc.). However, you need double-quoting when using the name in ALTER ... SET CONTACT, ALTER ... SET TAG, or similar DDL that could misparse the $, and in Snowpark session.table('"USER_ORDER_FV$V01"') where the string is passed to the SQL layer.

4.5.3 Schema Evolution and Deprecation

Breaking changes require a new version

Do not alter a registered Feature View in place (same name + version) when semantics change. Register a new version (or a new Feature View name) instead.

Change type	What to do
Adding new features	Create and register a new version (e.g. V02) that includes the new columns; keep V01 until consumers opt in. Exception: if the DT uses `SELECT ` from a base table and the source team adds a column there, the DT picks it up automatically — see Incremental Column Pickup with SELECT .
Deprecating features	Leave the old version registered and documented as deprecated; remove columns only in a new version so existing training and serving jobs keep working.
Breaking changes (rename column, change type or meaning, change entity keys, change `timestamp_col` semantics)	Always ship a new version (or new Feature View name). Never overwrite V01 with incompatible definitions.
Consumers	Point new pipelines to V02; migrate models and batch jobs on your timeline; retire V01 only after usage drops to zero. Use lineage to identify dependencies and usage.

Workflow summary:

V01 — current production bundle.
V02 — add columns or fix logic; train new models against V02.
Deprecate V01 in docs and catalog; delete V01 only when no remaining dependencies.

4.5.4 Reducing the Cost of Schema Evolution with BACKFILL

Adding features is the most common schema evolution scenario. The default path – registering a new Feature View version with the additional columns – creates a new Dynamic Table that must perform a full initial refresh over the entire source dataset. For large DTs with months or years of history, this can be prohibitively expensive.

Dynamic Tables now support BACKFILL FROM combined with IMMUTABLE WHERE, which lets you carry forward historical data from the prior version’s DT and only compute the new columns for the mutable (recent) window. The general workflow:

Clone the existing DT (V01) to a regular table.
Add the new column(s) to the clone (ALTER TABLE ... ADD COLUMN). Historical rows will have NULL for the new features.
Create the new DT (V02) with the updated SELECT that includes the new features, using BACKFILL FROM the clone and IMMUTABLE WHERE to protect the historical region.
Only the mutable window (e.g., last 1 day) is computed from the source data; the immutable region is copied from the backfill table at near-zero cost.

-- 1. Clone V01's Dynamic Table to a regular table
CREATE TABLE FEATURE_STORE.USER_ORDER_FV_V01_CLONE
  CLONE FEATURE_STORE.USER_ORDER_FV$V01;

-- 2. Add the new column (NULL for historical rows)
ALTER TABLE FEATURE_STORE.USER_ORDER_FV_V01_CLONE
  ADD COLUMN ORDER_TOTAL_AMT_AVG FLOAT;

-- 3. Create V02 with BACKFILL + IMMUTABLE WHERE
CREATE DYNAMIC TABLE FEATURE_STORE.USER_ORDER_FV$V02
  BACKFILL FROM FEATURE_STORE.USER_ORDER_FV_V01_CLONE
  IMMUTABLE WHERE (LAST_ORDER_TS < CURRENT_TIMESTAMP() - INTERVAL '1 day')
  TARGET_LAG = '1 hour'
  WAREHOUSE = FS_DEV_WH
  AS
    SELECT USER_ID,
           SUM(TOTAL_AMT)   AS ORDER_TOTAL_AMT_SUM,
           COUNT(ORDER_ID)   AS ORDER_CNT,
           AVG(TOTAL_AMT)    AS ORDER_TOTAL_AMT_AVG,   -- new feature
           MAX(ORDER_TS)     AS LAST_ORDER_TS
    FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
    GROUP BY USER_ID;

After creation, the DT’s immutable region contains the backfilled data (with NULL for ORDER_TOTAL_AMT_AVG in historical rows), while the mutable region is computed fresh and includes all columns. You can verify with SELECT *, METADATA$IS_IMMUTABLE FROM FEATURE_STORE.USER_ORDER_FV$V02.

TAG registration required for API visibility

Objects created via SQL in the Feature Store schema are not visible to fs.list_feature_views(), fs.get_feature_view(), or the Snowsight Feature Store UI unless the correct internal TAGs are applied. After creating the DT via SQL, add the Feature Store tags so the API and UI can discover it. See Appendix D: Tag Convention for the required TAG schema and a worked example. If the object is only consumed via direct SQL (not through the Feature Store API), tags are optional.

Not yet available via Feature Store API

BACKFILL FROM and IMMUTABLE WHERE are Dynamic Table SQL features – they are not currently exposed through the Feature Store Python API (FeatureView, register_feature_view). To use this pattern you would create the DT via SQL and then register it as a view-based Feature View pointing at the resulting table, or manage the DT directly outside the API. Check the Feature Store release notes for future API integration.

Inspecting generated SQL with fv.query

The FeatureView.query property returns the compiled SQL from your feature_df – the exact statement that becomes the AS clause in the CREATE DYNAMIC TABLE or CREATE VIEW. You can use this to inspect, copy, or adapt the SQL before creating the DT manually with additional clauses (e.g., BACKFILL FROM, IMMUTABLE WHERE):

draft_fv = FeatureView(name="MY_FV", entities=[entity], feature_df=my_df, ...)
print(draft_fv.query)  # prints the SQL that would be used in the AS clause

This gives you the inner query only. For the full DDL of an already-registered DT (including TARGET_LAG, WAREHOUSE, TAGS etc.), use SELECT GET_DDL('DYNAMIC_TABLE', 'FEATURE_STORE."MY_FV$V01"').

Downstream impact

Downstream Dynamic Tables that read from the evolved DT will still need to refresh to pick up the schema change. If the downstream DTs also use IMMUTABLE WHERE, only their mutable window is reprocessed.

4.5.5 Retroactive IMMUTABLE WHERE (no clone needed)

The clone-and-backfill workflow above creates a new DT version. A simpler alternative – when you can accept NULL for the new feature in historical rows – is to add an IMMUTABLE WHERE constraint retroactively to the existing DT, replace the upstream source (or Feature View query), and refresh. Only the mutable window is recomputed; the immutable region is preserved as-is.

-- 1. Protect history on the existing DT
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" SET
    IMMUTABLE WHERE (LAST_ORDER_TS < CURRENT_TIMESTAMP() - INTERVAL '1 day');

-- 2. Update the Feature View definition (e.g., add AVG feature)
--    If the FV reads from a VIEW, replace the VIEW:
CREATE OR REPLACE VIEW FEATURE_STORE.USER_ORDER_SOURCE_V AS
SELECT USER_ID,
       SUM(TOTAL_AMT)   AS ORDER_TOTAL_AMT_SUM,
       COUNT(ORDER_ID)   AS ORDER_CNT,
       AVG(TOTAL_AMT)    AS ORDER_TOTAL_AMT_AVG,   -- new feature
       MAX(ORDER_TS)      AS LAST_ORDER_TS
FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
GROUP BY USER_ID;

-- 3. Refresh — only mutable rows recomputed
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" REFRESH;

-- 4. (Optional) Remove constraint if not needed permanently
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" UNSET IMMUTABLE WHERE;

After the refresh, immutable rows have NULL for ORDER_TOTAL_AMT_AVG while mutable rows have the computed value. Verify with SELECT *, METADATA$IS_IMMUTABLE FROM FEATURE_STORE."USER_ORDER_FV$V01".

OBJECT columns make this seamless

With OBJECT-based Feature Views (see Chapter 13: Wide & Sparse Data), heterogeneous key sets across the immutable/mutable boundary are handled naturally. Immutable rows simply lack the new key in the OBJECT; COALESCE(features:"NEW_KEY"::NUMBER, 0) fills the default at read time. In scalar DTs, the same pattern produces a visible NULL column in historical rows.

Lazy backfill

If you later need the new feature backfilled across all history, you have two options:

Full refresh – remove the constraint and recompute everything in a single off-peak window:

ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" UNSET IMMUTABLE WHERE;
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" REFRESH;

Phased backfill – progressively move the immutability boundary backward (e.g., one month at a time), so each refresh recomputes only a slice of history. This spreads the cost across multiple windows and avoids a single large recompute:

-- Phase 1: backfill the most recent immutable month
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" SET
    IMMUTABLE WHERE (LAST_ORDER_TS < CURRENT_TIMESTAMP() - INTERVAL '60 days');
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" REFRESH;

-- Phase 2: backfill the next month
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" SET
    IMMUTABLE WHERE (LAST_ORDER_TS < CURRENT_TIMESTAMP() - INTERVAL '90 days');
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" REFRESH;

-- Continue until all history is backfilled, then lock down again
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" SET
    IMMUTABLE WHERE (LAST_ORDER_TS < CURRENT_TIMESTAMP() - INTERVAL '1 day');

Phased backfill is useful when the full history is too large for a single off-hours window, or when warehouse budget constraints require distributing the compute across days.

4.5.6 Incremental Column Pickup with SELECT *

Snowflake has updated the behaviour of Dynamic Tables defined with SELECT * FROM base_table: when a new column is added to the base table, the DT now picks it up on its next refresh rather than failing with an error.

Previous behaviour

The Snowflake documentation previously stated: > “Changes to underlying base table for dynamic tables created with SELECT * > from base table: The dynamic table fails to refresh and must be > recreated to respond to the change.”

This constraint has been relaxed for additive changes (new columns). Non-additive changes — column drops, renames, and type modifications — still require recreation.

When this applies. If a Feature View’s DT is a straightforward pass-through of a base table using SELECT *, new columns added to the source automatically appear in the DT:

-- Feature View DT as a simple pass-through
CREATE DYNAMIC TABLE FEATURE_STORE."USER_FEATURES$V01"
    TARGET_LAG   = '1 hour'
    WAREHOUSE    = FS_WH
    REFRESH_MODE = INCREMENTAL
AS
SELECT * FROM SOURCE_DB.DATA.USER_FEATURES_BASE;

When USER_FEATURES_BASE gains a new column (e.g., RISK_SCORE FLOAT), ALTER TABLE ... ADD COLUMN RISK_SCORE FLOAT, the column will appear in the DT on its next scheduled refresh or a manual ALTER DYNAMIC TABLE ... REFRESH.

Backfill semantics for pre-existing rows. All rows — including those that existed before the column was added — are recomputed from source and receive the correct value for the new column. The REFRESH_ACTION shows INCREMENTAL with ins=N / del=N equal to the total row count: the DT deletes all existing rows and re-inserts them with the new column evaluated. There is no NULL fill for pre-existing rows.

Full row-level recompute despite INCREMENTAL label

The INCREMENTAL refresh action after a column add is a full scan of all rows in the source — every row is deleted and re-inserted. The cost is equivalent to a full refresh. The INCREMENTAL label refers to the DT engine’s change-tracking mechanism, not to processing only the rows that changed in the source.

For large DTs with many rows, a column add to the base table will therefore be an expensive operation. If the DT also has downstream dependents, each downstream DT also does a full row-level recompute in turn. Use IMMUTABLE WHERE (see below) if you need to protect the historical region from this recompute cost.

IMMUTABLE WHERE interaction. When IMMUTABLE WHERE is set on the DT, the column-add refresh respects the boundary: immutable rows are not recomputed (ins=0 / del=0 for protected rows) and their value for the new column is NULL. Mutable rows are recomputed from source and receive the correct value. This is consistent with the retroactive IMMUTABLE WHERE behaviour described above.

Uncontrolled upstream ADD COLUMN triggers automatic recompute

The SELECT * pickup is always on. Any ADD COLUMN to the base table — even one added by another team for an unrelated purpose — will trigger a full row-level recompute (ins=N/del=N) on this DT and on every downstream SELECT * DT in the cascade.

In a shared or multi-team environment where you do not own the base table, prefer the OBJECT-bundled approach with a VIEW intermediary. The VIEW acts as a gate: upstream schema changes are inert until you deliberately replace the VIEW, giving you control over timing, scope, and cost. See Chapter 13: Choosing between approaches for the full trade-off analysis.

DT cascade behaviour. Downstream DTs that also use SELECT * from the DT that gained a column will pick it up on their own next refresh. Downstream DTs with explicit column lists are unaffected until their query is updated.

When this is the right choice.

The DT is a thin pass-through over a single managed base table where the source team adds columns intentionally.
Consumers can tolerate NULL for the new column in historical rows, or the DT history is short enough that a full refresh is acceptable.
Feature count stays below ~100 columns (scalar column-per-feature stays within compilation and join-performance thresholds).
No explicit version gate is needed — new columns become visible immediately to all consumers.

For more complex cases — computed feature expressions, wide feature sets (100+ columns), sparse data, or when historical rows must not be NULLed — the BACKFILL + IMMUTABLE WHERE workflow and the OBJECT-bundled pattern remain the recommended approaches.

OBJECT_CONSTRUCT(*) inline in DT body — confirmed for pass-through bundles

Writing OBJECT_CONSTRUCT(* EXCLUDE (entity_id, ts)) AS features directly in the DT’s AS clause (no VIEW intermediary) has been prototyped and confirmed to work for pass-through bundles — where features are already derived and materialised upstream of this Feature View:

New upstream physical columns auto-appear as OBJECT keys on the next refresh (REFRESH_ACTION = INCREMENTAL)
The DT’s physical schema stays stable (3 columns: entity key, timestamp, OBJECT)
All rows in the mutable region are recomputed and receive the new key value
No VIEW to create, replace, or maintain when the source already holds the finished features

When the VIEW intermediary is the right choice. If feature derivation happens at the Feature View layer — ratios, flags, windowed functions that are not physical columns in the source — use an intermediary VIEW to hold the derivation logic and read from that VIEW with OBJECT_CONSTRUCT(*) in the DT. The key benefit: ALTER-ing the VIEW to add a new derived column is also picked up by the OBJECT(*) DT incrementally (REFRESH_ACTION = INCREMENTAL, not REINITIALIZE), giving you the same additive column pickup behaviour with the additional benefit of a controlled gate — the DT is completely unaffected by upstream schema changes until you deliberately replace the VIEW. See Chapter 13: Choosing between approaches for the full comparison.

Non-additive changes still require recreation

Column drops, renames, and type changes on the base table still cause the DT to fail on refresh and require recreation. For these changes, create a new Feature View version using the standard workflow described above.

Not available via Feature Store Python API

SELECT * DTs and their schema evolution are a SQL-layer capability. The Feature Store Python API (FeatureView, register_feature_view) does not expose this directly. Create the DT via SQL, then apply the Feature Store TAGs so the API and Snowsight UI can discover it (see Appendix D: Tag Convention).

4.5.7 Alternative: Base + Presentation Layer Model

The standard approach creates a new version (a new Dynamic Table) for each schema change. An alternative for organizations with complex, frequently evolving feature sets is a two-layer model that separates physical storage from the consumer-facing view-based Feature Views. This could be implemented within a single Feature Store Schema through naming convention, or two separate Feature Store Schemas to provide separation of concerns:

-- Single Feature Store Schema
CREATE SCHEMA FEATURE_STORE;

-- Two Feature Store Schemas
CREATE SCHEMA FEATURE_STORE_BASE;
CREATE SCHEMA FEATURE_STORE_PRESENTATION;

flowchart TB
  B1[USER_ORDER_BASE DT]
  B2[USER_ENGAGE_BASE DT]
  B3[USER_PROMO_BASE DT]
  B1 & B2 & B3 --> COMB[USER_COMBINED_FV View joins bases]

Base layer Dynamic Table feature views feeding a presentation View

How it works:

Base layer – each feature domain has its own versioned Dynamic Table (or View) Feature View. These are the immutable physical units that hold the actual feature-engineering logic and materialized data.
Presentation layer – a View-based Feature View that JOINs multiple base Feature Views together on their shared entity key (and timestamp column if required). Consumers query the presentation layer to get a unified feature set.

A key principle of this model is feature deduplication: each feature computation exists in exactly one base object. Presentation layers compose existing features without re-implementing them. When a feature definition changes, only the base layer needs a new version – the presentation layer’s SELECT * automatically picks up the new columns. If individual features within a single Feature View evolve at different rates, consider feature-level version suffixes in naming (e.g., ORDER_TOTAL_AMT_SUM_V01, ORDER_TOTAL_AMT_SUM_V02) to maintain backward compatibility while the base layer transitions.

# Base Feature Views (DT-backed, versioned independently)
user_order_base = FeatureView(
    name="USER_ORDER_BASE_FV", entities=[user_entity],
    feature_df=user_order_df, timestamp_col="LAST_ORDER_TS",
    refresh_freq="1 hour", desc="Base: lifetime order aggregates",
)
user_engage_base = FeatureView(
    name="USER_ENGAGE_BASE_FV", entities=[user_entity],
    feature_df=user_engage_df, timestamp_col="LAST_SESSION_TS",
    refresh_freq="1 hour", desc="Base: session engagement features",
)

# Presentation layer (View-based, combines bases)

SQL
Snowpark DataFrame

combined_df = session.sql("""
    SELECT o.*, e.SESSION_CNT, e.AVG_SESSION_DURATION_SEC
    FROM FEATURE_STORE.USER_ORDER_BASE_FV$V01 o
    JOIN FEATURE_STORE.USER_ENGAGE_BASE_FV$V01 e USING (USER_ID)
""")

order_base = session.table('FEATURE_STORE."USER_ORDER_BASE_FV$V01"')
engage_base = session.table('FEATURE_STORE."USER_ENGAGE_BASE_FV$V01"')
combined_df = order_base.join(engage_base, "USER_ID").select(
    order_base["*"],
    engage_base["SESSION_CNT"],
    engage_base["AVG_SESSION_DURATION_SEC"],
)

user_combined_fv = FeatureView(
    name="USER_COMBINED_FV", entities=[user_entity],
    feature_df=combined_df, timestamp_col="LAST_ORDER_TS",
    refresh_freq=None,  # View — always reflects latest base DT state
    desc="Presentation: combines order + engagement base Feature Views",
)

When to add a new feature, you create a new version of just the affected base DT (e.g., USER_ORDER_BASE_FV$V02 with the extra column), then update the presentation View’s SQL to reference $V02. The other base DTs are untouched – no duplication of existing features across multiple DTs.

Trade-offs:

	Standard versioning	Base + Presentation
Simplicity	One DT per version, straightforward	Multiple layers, JOIN management
Feature duplication	New version duplicates unchanged features	Only changed base DT is re-created
Consumer migration	Consumers point to new version	Presentation view updated centrally
Best for	Small-to-medium feature sets, simple schemas	Large feature sets with frequent, incremental additions across domains

Note

This pattern adds operational complexity and is best suited to organizations with mature Feature Store practices and large numbers of features that evolve at different rates. For most teams starting out, standard version-per-bundle is simpler and sufficient. The Feature Store API does not natively manage the base-to-presentation relationship today – you manage the presentation View’s SQL definition yourself (or via CI/CD).

4.5.8 Version Lifecycle

flowchart LR
  V01[V01 Initial] --> V02[V02 Update] --> V03[V03 Current]

Feature View version lifecycle V01 through V03

4.5.9 Managing Versions

List registered Feature Views:

Python API
SQL

fs.list_feature_views().select("NAME", "VERSION", "DESC").show()

-- Dynamic Table-backed Feature Views
SHOW DYNAMIC TABLES IN SCHEMA FEATURE_STORE_DEMO.FEATURE_STORE;

-- View-based Feature Views
SHOW VIEWS IN SCHEMA FEATURE_STORE_DEMO.FEATURE_STORE;

-- Combined column-level inventory
SELECT TABLE_NAME AS FEATURE_VIEW, COLUMN_NAME, DATA_TYPE
FROM FEATURE_STORE_DEMO.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'FEATURE_STORE'
ORDER BY TABLE_NAME, ORDINAL_POSITION;

Delete an old version (only after all consumers have migrated):

Python API
SQL

fs.delete_feature_view(name="USER_ORDER_FV", version="V01")

-- For DT-backed Feature Views
DROP DYNAMIC TABLE IF EXISTS FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01";

-- For View-based Feature Views
DROP VIEW IF EXISTS FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01";

Before deleting a version, verify that no models, datasets, or pipelines still depend on it:

Python API
SQL

v01_fv = fs.get_feature_view(name="USER_ORDER_FV", version="V01")

# ML-specific dependencies (datasets, models, other FVs)
downstream = v01_fv.lineage(direction="downstream")
for node in downstream:
    print(f"{type(node).__name__}: {node.name}")

# Narrower: only check if any models reference V01
model_deps = v01_fv.lineage(direction="downstream", domain_filter=["model"])

Requires snowflake-ml-python >= 1.6.0 and the VIEW LINEAGE privilege.

-- Snowflake-level object dependencies (DT chains, views, tasks)
SELECT *
FROM SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES
WHERE REFERENCED_OBJECT_NAME = 'USER_ORDER_FV$V01'
  AND REFERENCED_OBJECT_DOMAIN IN ('TABLE', 'DYNAMIC TABLE', 'VIEW');

-- Recent query access to V01
SELECT QUERY_START_TIME, USER_NAME, QUERY_TEXT
FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY
WHERE BASE_OBJECTS_ACCESSED LIKE '%USER_ORDER_FV$V01%'
ORDER BY QUERY_START_TIME DESC
LIMIT 20;

Only delete a version when both the API lineage check and the SQL dependency/access-history checks confirm zero remaining usage. See Chapter 12: Operations for more on lineage tracking.

4.5.10 Getting the Latest Version and Creating the Next

For maximum flexibility, Feature Store is intentionally not opinionated on version semantics—there’s no built-in “latest version” alias or auto-increment. Avoid hard-coding version strings throughout your codebase; instead, use helper utilities that derive the next version programmatically:

📁 Full code: _code/version_utils.py

import re
from typing import Optional


def get_latest_feature_view(
    fs: FeatureStore,
    feature_view_name: str,
) -> Optional[FeatureView]:
    """Get the most recent version of a Feature View."""
    versions = fs.list_feature_views(feature_view_name=feature_view_name)
    
    # Sort by version string (works with zero-padded versions)
    versions_sorted = sorted(versions, key=lambda fv: fv.version, reverse=True)
    
    return versions_sorted[0] if versions_sorted else None


def next_version(
    fs: FeatureStore,
    feature_view_name: str,
    prefix: str = "V",
    pad: int = 2,
) -> str:
    """
    Derive the next sequential version string for a Feature View.
    
    If USER_ORDER_FV has versions V01 and V02, returns "V03".
    If no versions exist, returns "V01".
    """
    versions = fs.list_feature_views(feature_view_name=feature_view_name)
    
    max_num = 0
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    for fv in versions:
        m = pattern.match(fv.version)
        if m:
            max_num = max(max_num, int(m.group(1)))
    
    return f"{prefix}{max_num + 1:0{pad}d}"

Usage:

version = next_version(fs, "USER_ORDER_FV")   # "V03" if V01, V02 exist
fs.register_feature_view(fv, version=version)

This keeps version assignment consistent across manual runs, notebooks, and CI/CD pipelines without anyone needing to remember what the current version is.

4.6 Refresh Configuration

4.6.1 Refresh Frequency Examples (time-period mode)

When using a period string (not CRON), you are configuring TARGET_LAG-style refresh for the backing Dynamic Table.

Frequency	Example use case	Cost impact
`1 minute`	Tight operational or fraud signals	Very high
`15 minutes`	Operational dashboards	High
`1 hour`	Standard batch ML features	Medium
`1 day`	Daily analytics features	Low

4.6.2 Best Practices for Refresh

# Align period with how often upstream CLICKSTREAM_DATA tables land meaningful deltas
fv = FeatureView(
    refresh_freq="4 hours",
    # ...
)

# For Online Feature Tables, OFT sync frequency should be <= Feature View refresh cadence

Tip

No changes = no compute. The Dynamic Table scheduler checks for upstream changes before triggering a refresh. If no source data has changed since the last refresh, the scheduler skips the cycle without consuming any warehouse compute. This means setting a shorter refresh_freq than the actual source update rate incurs near-zero cost from the change-detection check itself – the DT simply checks more often and finds nothing to do. You do not need to perfectly match refresh_freq to source cadence to avoid wasted spend; err on the side of slightly shorter lag for better freshness guarantees.

4.7 Feature View Ownership & SLAs

The Feature Store API provides desc on Feature Views and Entities, plus Snowflake object tags on the underlying Dynamic Tables and Views. For many teams, this is sufficient. However, the API does not natively track ownership, consumer lists, SLA targets, or other operational metadata that mature ML platform users might need for governance and compliance.

You have several options for extending the metadata layer to meet these needs – none are part of the Feature Store product itself, but all integrate naturally using Snowflake’s built-in features:

Snowflake Tags. Attach key-value metadata directly to Feature View objects. Tags are queryable via INFORMATION_SCHEMA and TAG_REFERENCES, and can be used for governance, cost attribution, and discovery:

CREATE TAG IF NOT EXISTS FS_OWNER ALLOWED_VALUES 'marketing-ds', 'risk-eng', 'platform';
CREATE TAG IF NOT EXISTS FS_SLA_MINUTES;

ALTER DYNAMIC TABLE FEATURE_STORE.USER_ORDER_FV$V01
  SET TAG FS_OWNER = 'marketing-ds', FS_SLA_MINUTES = '90';

-- Query ownership across all Feature Views
SELECT OBJECT_NAME, TAG_NAME, TAG_VALUE
FROM TABLE(INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS(
  'FEATURE_STORE_DEMO.FEATURE_STORE', 'SCHEMA'))
WHERE TAG_NAME IN ('FS_OWNER', 'FS_SLA_MINUTES');

Metadata tables. Maintain a registry table within your Feature Store schema, keyed by the object name (<FV_NAME>$<VERSION>). This can hold owner, team, consumer list, SLA targets, deprecation status, and anything else you need:

CREATE TABLE IF NOT EXISTS FEATURE_STORE.FEATURE_VIEW_METADATA (
    FV_OBJECT_NAME  STRING PRIMARY KEY,   -- e.g. USER_ORDER_FV$V01
    OWNER           STRING,
    TEAM            STRING,
    CONSUMERS       ARRAY,
    SLA_REFRESH     STRING,
    SLA_MAX_LATENCY_MINUTES  NUMBER,
    DEPRECATED      BOOLEAN DEFAULT FALSE,
    NOTES           STRING
);

You can build helper functions that populate this table automatically when registering a Feature View, and query it for discovery, alerting, or CI/CD gating.

External manifest files. For teams that manage Feature Store definitions in Git alongside CI/CD, a YAML manifest can serve as the source of truth for ownership and SLA metadata. This is not consumed by Snowflake, but drives your deployment scripts and documentation:

# feature_manifest.yaml
feature_views:
  - name: USER_ORDER_AGGREGATES
    owner: marketing-ds@company.com
    team: Marketing Data Science
    description: Core user behavior from ORDERS and ORDER_ITEMS
    consumers:
      - churn-model
      - ltv-model
    sla:
      refresh_frequency: "1 hour"
      max_latency_minutes: 90

Snowflake Contacts. Contacts are schema-level objects that you assign to databases, schemas, tables, views, and Dynamic Tables to indicate who is responsible for different purposes. They integrate with Snowsight’s Catalog and are queryable via ACCOUNT_USAGE views:

-- Create contacts for the Feature Store team
CREATE CONTACT IF NOT EXISTS FS_STEWARD
  METHOD EMAIL 'feature-store-team@company.com';
CREATE CONTACT IF NOT EXISTS FS_SUPPORT
  METHOD EMAIL 'fs-support@company.com';

-- Assign at schema level -- all Feature Views inherit automatically
ALTER SCHEMA FEATURE_STORE_DEMO.FEATURE_STORE
  SET CONTACT STEWARD = FS_STEWARD,
              SUPPORT = FS_SUPPORT;

-- Override for a specific Feature View if a different team owns it
ALTER DYNAMIC TABLE FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01"
  SET CONTACT STEWARD = MARKETING_DS_STEWARD;

Each assignment specifies a purpose – STEWARD (data accuracy), SUPPORT (technical questions), or ACCESS_APPROVAL (who approves access). Contacts inherit down the hierarchy (database → schema → table), so assigning at the schema level covers all Feature Views with a single statement. Direct assignment on a child overrides the inherited contact for that purpose.

To audit contact assignments across your Feature Store:

SELECT object_name, purpose,
       CASE WHEN is_inherited THEN 'Inherited' ELSE 'Direct' END AS assignment
FROM SNOWFLAKE.ACCOUNT_USAGE.CONTACT_REFERENCES
WHERE object_database = 'FEATURE_STORE_DEMO'
  AND object_schema = 'FEATURE_STORE'
  AND deleted IS NULL
ORDER BY object_name, purpose;

Cortex-powered object descriptions. Snowflake can automatically generate human-readable descriptions for tables, views, and their columns using Snowflake-hosted LLMs. This is useful for Feature Store objects where feature names alone (e.g., ORDER_TOTAL_AMT_SUM) may not be self-explanatory to consumers unfamiliar with the domain. Descriptions are stored as the standard COMMENT on the object, so they appear in DESCRIBE, SHOW, Snowsight’s Catalog, and any data catalog tooling.

You can generate descriptions from Snowsight (navigate to the object → “Generate description”) or via SQL using the AI_GENERATE_TABLE_DESC stored procedure:

-- Generate descriptions for a Feature View's DT and all its columns
CALL AI_GENERATE_TABLE_DESC(
  'FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01"',
  { 'describe_columns': true, 'use_table_data': true }
);

The procedure returns a JSON object with suggested descriptions. To persist them, set them as comments:

ALTER DYNAMIC TABLE FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01"
  SET COMMENT = 'User-level purchase aggregates from clickstream ORDERS: total spend, order count, average order value, and most recent order timestamp.';

ALTER DYNAMIC TABLE FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01"
  MODIFY COLUMN ORDER_TOTAL_AMT_SUM COMMENT = 'Sum of all order amounts (TOTAL_AMT) for the user';

For Feature Stores with many Feature Views, you can script this across an entire schema using the stored procedure examples in the documentation. Generated descriptions significantly improve feature discoverability – consumers searching the catalog can find features by what they represent, not just by column name.

These approaches are complementary. A common pattern is to use Contacts for ownership and stewardship visible in Snowsight, Tags for lightweight governance queries and cost attribution, Cortex descriptions for human-readable feature documentation, a metadata table for operational dashboards and alerting, and a YAML manifest in Git for CI/CD-driven deployments. See Chapter 12: Operations for monitoring SLA compliance and auditing metadata at scale.

4.8 Hierarchical Feature Views

When source data exists at a child grain (e.g., line items, events) but the model operates at a parent grain (e.g., orders, users), create aggregated rollup Feature Views that project child-level data up to the parent entity.

Do not stack tiled Feature Views

If you need to roll up from a finer-grained tiled (Time Aggregation) Feature View to a coarser entity, use RollupConfig (see Chapter 7: Rollup Feature Views). Do not create a new tiled Feature View that reads from another tiled Feature View – this causes duplicate internal _PARTIAL_* column names and SQL compilation errors. The correct hierarchy pattern for tiled FVs is: raw events → tiled FV (fine entity) → Rollup FV (coarse entity) via RollupConfig.

# Raw lineitem Feature View at the finest grain (optional: useful for line-level models)
lineitem_fv = FeatureView(
    name="LINEITEM_FV", entities=[lineitem_entity],
    feature_df=lineitem_df, timestamp_col="SHIP_DATE",
    refresh_freq="1 hour",
    desc="Raw lineitem features at (ORDER_ID, LINE_NUMBER) grain",
)

SQL
Snowpark DataFrame

order_lineitem_agg_df = session.sql("""
    SELECT
        ORDER_ID,
        SUM(EXTENDED_PRICE) AS ORDER_TOTAL_AMT,
        COUNT(*) AS LINE_COUNT,
        AVG(EXTENDED_PRICE) AS AVG_LINE_PRICE,
        MAX(SHIP_DATE) AS LAST_SHIP_DATE
    FROM ORDER_ITEMS
    GROUP BY ORDER_ID
""")

import snowflake.snowpark.functions as F

order_items = session.table("ORDER_ITEMS")
order_lineitem_agg_df = order_items.group_by("ORDER_ID").agg(
    F.sum("EXTENDED_PRICE").alias("ORDER_TOTAL_AMT"),
    F.count("*").alias("LINE_COUNT"),
    F.avg("EXTENDED_PRICE").alias("AVG_LINE_PRICE"),
    F.max("SHIP_DATE").alias("LAST_SHIP_DATE"),
)

order_agg_fv = FeatureView(
    name="ORDER_LINEITEM_AGG_FV", entities=[order_entity],
    feature_df=order_lineitem_agg_df, timestamp_col="LAST_SHIP_DATE",
    refresh_freq="1 hour",
    desc="Lineitem features aggregated to the order level",
)

When generate_training_set or generate_dataset joins Feature Views at different entity grains, higher-level features are denormalized – repeated for every child row in the training data. For example, a spine at the lineitem grain joined with both lineitem_fv and order_agg_fv produces one row per lineitem, with order-level aggregates duplicated across all lines of the same order.

If denormalization is undesirable (e.g., the model expects one row per order), either:

Use a spine at the parent grain and only include parent-level Feature Views, or
Create a pivot Feature View that widens child data into columns (e.g., LINE_1_PRICE, LINE_2_PRICE) – though this is only practical with a bounded number of children. For unbounded child sets, use OBJECT_AGG, ARRAY_AGG, or LISTAGG to pack child data into a single column, then process it at retrieval time in Python (e.g., extract the last-N values or compute further aggregations over the array). See Chapter 13: Wide & Sparse Feature Data for the OBJECT packing pattern.

See Chapter 3: Modeling at a Different Grain for entity design considerations and Chapter 11: Training & Inference for multi-entity spine design.

4.9 Best Practices

1. Use Consistent Naming

# Pattern: <ENTITY>_<DOMAIN>_FV (SCREAMING_SNAKE_CASE)
"USER_ORDER_FV"
"PRODUCT_CATALOG_FV"
"SESSION_ENGAGEMENT_FV"

Entity keys and table names should stay singular SCREAMING_SNAKE_CASE where applicable (USER, PRODUCT, …). Prefer aggregate suffixes _CNT, _SUM, _AVG, _TS, boolean prefixes IS_, and monetary columns like TOTAL_AMT (not AMOUNT).

Plan feature prefixing early at scale

When many Feature Views share the same entity (common in large deployments – 50+ Feature Views for a single entity like USER), column name collisions become inevitable. Rather than resolving collisions reactively during training data generation, adopt a prefixing convention as part of your Feature View naming standard from day one. Options include:

Use auto_prefix=True on all generate_dataset / generate_training_set calls (columns become FV_NAME__FEATURE_NAME).
Use short .with_name() prefixes per domain (e.g., "ord", "sess", "eng") for more concise column names.
Embed the domain prefix in the .alias() call on each Feature definition (e.g., Feature.count("ORDER_ID", "7d").alias("ORD_ORDER_ID_CNT_7D")).

The choice affects downstream model code, feature importance reporting, and monitoring dashboards – so it is worth standardizing before the Feature View count grows large. See Chapter 7: Prefixing and Chapter 11: Feature Column Prefixing for detailed patterns.

2. Consolidate Feature Views from the Same Source

When multiple tiled Feature Views share the same source table, entity, and tile granularity, consolidate them into a single Feature View. Separate Feature Views over identical sources create independent tile DTs, duplicating compute. See Chapter 7: Consolidate Feature Views for detailed guidance and exceptions.

3. Document Feature Definitions

fv = FeatureView(
    name="USER_ORDER_FV",
    desc="""
    USER-level order features from CLICKSTREAM_DATA.ORDERS for propensity models.

    Features:
    - ORDER_TOTAL_AMT_SUM: Sum of TOTAL_AMT across orders
    - ORDER_CNT: Count of orders
    - ORDER_TOTAL_AMT_AVG: Average order TOTAL_AMT
    - LAST_ORDER_TS: Latest ORDER_TS (point-in-time column)

    SLA: Refreshed hourly (TARGET_LAG), max 90 min lag
    Owner: ml-platform@company.com
    """,
    # ...
)

The Feature Store API also supports per-feature descriptions via attach_feature_desc(). These descriptions are stored as metadata and surfaced in list_columns(), making individual features discoverable:

fv = fv.attach_feature_desc({
    "ORDER_TOTAL_AMT_SUM": "Sum of TOTAL_AMT across all orders for this user",
    "ORDER_CNT": "Count of orders placed by this user",
    "ORDER_TOTAL_AMT_AVG": "Average TOTAL_AMT per order",
    "LAST_ORDER_TS": "Timestamp of the most recent order (point-in-time column)",
})

registered_fv = fs.register_feature_view(fv, "V01")
registered_fv.list_columns().show()

4. Version on Any Consumer-Visible Change

# V01: Initial column set
# V02: Added ORDERS_7D_CNT
# V03: Changed aggregation logic (breaking) — new version, never in-place edit

# Increment version when you:
# - Add or remove features
# - Change calculations or filters
# - Change entity keys or timestamp semantics

4.10 Common Pitfalls

4.10.1 ❌ Pitfall 1: No Versioning

Problem: Registering changes under the same version breaks dependent models.

Solution: Always register new versions (V02, V03, …) for consumer-visible changes.

4.10.2 ❌ Pitfall 2: Modifying a Registered Feature View In Place

Problem: Overwriting definitions for an existing name/version breaks replay and training reproducibility.

Solution: Treat each registered version as immutable; publish a new version instead.

4.10.3 ❌ Pitfall 3: Misunderstanding Refresh Cost

Problem: Assuming a short refresh_freq always incurs high compute cost, or conversely, setting an unnecessarily long lag “to save money.”

Solution: The DT scheduler’s change-detection check is near-zero cost – no warehouse compute is used when nothing has changed. A shorter lag than the source update frequency is safe and provides better freshness guarantees. However, if the source does update frequently and the DT query is expensive, a shorter lag means more frequent full refreshes. Match refresh_freq to the freshness your consumers actually need, not to the source rate.

4.10.4 ❌ Pitfall 4: No Documentation

Problem: Teams cannot trust or reuse features.

Solution: Use the desc parameter and keep a feature catalog in sync with V01, V02, …

4.11 Summary

Concept	Description
View Feature View	`refresh_freq=None` (default): query-time; pass-through, declarative transforms, or dev precursor to DT
Dynamic Table (period)	`refresh_freq="1 hour"` (etc.): materialized; TARGET_LAG refresh; no compute when source unchanged
Dynamic Table + Task (CRON)	`refresh_freq="0 8 * * *"`: initial DT refresh, then Task on schedule
Version (V01, V02, …)	Immutable snapshot of feature definitions; `$`-separated object names
Schema evolution	New features and breaking changes → new version; BACKFILL for cost-efficient evolution; base + presentation layer as alternative
Version retirement	Validate zero usage via lineage and access history before deleting old versions
Ownership & SLAs	Tags, Contacts, metadata tables, YAML manifests, Cortex descriptions

4.12 Next Steps

Continue to Chapter 5: Feature Pipelines to learn about pipeline architectures and orchestration patterns.

--- title: "Feature Views" subtitle: "Dynamic Tables vs Views, refresh_freq modes, versioning, and lifecycle" --- ## Overview A Feature View is a collection of related features computed from source data. It defines how features are transformed, stored, and served. This chapter uses the clickstream sample model (`FEATURE_STORE_DEMO.CLICKSTREAM_DATA`: `VISITORS`, `USERS`, `HOUSEHOLDS`, `PRODUCTS`, `CATEGORIES`, `SUPPLIERS`, `PRODUCT_SUPPLIER`, `EVENTS`, `SESSIONS`, `ORDERS`, `ORDER_ITEMS`) and Feature Store objects in `FEATURE_STORE_DEMO.FEATURE_STORE`, warehouse `FS_DEV_WH`. The **`refresh_freq`** parameter determines whether a Feature View is backed by a **View** (query-time) or a **Dynamic Table** (materialized), and whether refresh follows **TARGET_LAG** or a **CRON-driven Task**. ## Learning Objectives After completing this chapter, you will be able to: - Interpret the three `refresh_freq` behaviors: omitted / `None`, time period, and CRON - Choose between Dynamic Table and View-based Feature Views for a workload, including using Views as a development precursor to Dynamic Tables - Implement versioning strategies (V01, V02, …) for safe updates - Apply schema-evolution rules: add versions for new features, deprecate without breaking consumers, never change a registered Feature View in place - Evaluate the base + presentation layer model as an alternative to version-per-bundle for complex feature sets - Manage Feature View lifecycle: create, register new versions, validate lineage, deprecate, and remove - Configure refresh strategies for freshness vs. cost, understanding that the DT scheduler incurs no compute when source data is unchanged > 📂 **Chapter code:** [Browse companion scripts on GitHub](https://github.com/Snowflake-Labs/snowflake-featurestore-imp-guide/tree/main/Snowflake_FeatureStore_Implementation_Guide/04_feature_views/_code) ```{python} #| output: false #| echo: false from snowflake.snowpark import Session from snowflake.snowpark import functions as F from snowflake.snowpark.context import get_active_session from snowflake.ml.feature_store import FeatureStore, FeatureView, Entity, CreationMode try: session = get_active_session() except Exception: session = Session.builder.config("connection_name", "default").create() fs = FeatureStore( session=session, database="FEATURE_STORE_DEMO", name="FEATURE_STORE", default_warehouse="FS_DEV_WH", creation_mode=CreationMode.CREATE_IF_NOT_EXIST, ) # Retrieve entity references registered in earlier chapters user_entity = Entity(name="USER", join_keys=["USER_ID"]) product_entity = Entity(name="PRODUCT", join_keys=["PRODUCT_ID"]) session_entity = Entity(name="SESSION", join_keys=["SESSION_ID"]) ``` --- ## SQL vs Snowpark DataFrame API {#sec-sql-vs-snowpark} Feature View definitions require a Snowpark DataFrame (`feature_df`). You can construct this using either `session.sql("SELECT ...")` (SQL) or the Snowpark DataFrame API (`session.table(...).group_by(...).agg(...)`). Both produce **identical** lazy Snowpark DataFrames and generate the same Dynamic Tables. Throughout this guide, code examples are presented in both forms using tabs. **SQL is shown first** because it is more widely known across data science and engineering teams, but the Snowpark DataFrame API is equally valid. | Factor | SQL (`session.sql`) | Snowpark DataFrame API | |---|---|---| | **Readability** | Universally understood across data roles | Requires Python + Snowpark API knowledge | | **Debuggability** | Maps 1:1 to what the Dynamic Table executes; easy to copy-paste into Snowsight | Emits machine-generated SQL that can be dense/nested (see [Appendix C](../appendices/C_snowpark_to_dynamic_table/index.qmd#sec-sql-formatting)) | | **Composability** | Harder to parameterize dynamically | Excels at programmatic feature generation (loops, conditional columns, config-driven specs) | | **Testability** | Requires `session.sql` assertions or a SQL runner | Integrates with Python test frameworks (pytest) | | **Verbosity** | Typically more concise for simple aggregations | More concise for complex multi-step pipelines with reusable intermediates | | **Team skills** | SQL is widely known (analysts, data scientists, data engineers) | Snowpark API is narrower (Python + Snowpark) | | **Maintenance** | Easier to review in PRs for SQL-fluent teams | Easier for Python-fluent teams with IDE support | ::: {.callout-tip} There is no wrong choice. Pick the style that your team can read, review, and maintain. Many teams standardize on SQL for simple aggregations and reserve Snowpark DataFrames for programmatic generation (e.g., [data-driven feature factories](../07_aggregations_api/index.qmd#sec-data-driven) in Chapter 7). ::: --- ## How `refresh_freq` Controls Materialization {#sec-refresh-freq-modes} | `refresh_freq` | Backing object | Behavior | |----------------|----------------|----------| | **`None`** (default) or omitted | **View** | Features computed at query time; always reflects current source data; no Dynamic Table storage | | **Time period** (e.g. `"1 hour"`, `"15 minutes"`) | **Dynamic Table** | Materialized result; refresh driven by **TARGET_LAG** for that period | | **CRON expression** (e.g. `"0 8 * * *"`) | **Dynamic Table + Task** | Dynamic Table is created and receives an **initial full refresh**, then is **suspended** for ongoing lag-based refresh; a **Task** runs the refresh on the CRON schedule | ::: {.callout-note} ## CRON vs period strings Snowflake distinguishes a **schedule string** (CRON) from a **time period** (lag). If the value parses as CRON, the platform uses the Dynamic Table + Task pattern described above—not continuous TARGET_LAG refresh. ::: ::: {.callout-tip title="Iceberg-backed Feature Views (SDK 1.26+)"} Any DT-backed Feature View (time period or CRON) can optionally materialize as a **Dynamic Iceberg Table** by passing a `StorageConfig`: ```python from snowflake.ml.feature_store import StorageConfig, StorageFormat storage = StorageConfig( format=StorageFormat.ICEBERG, external_volume='MY_EXTERNAL_VOLUME', base_location='feature_store/my_fv' ) fv = FeatureView( name="MY_FV", entities=[e], feature_df=df, refresh_freq="1 hour", storage_config=storage, ... ) ``` This writes feature data as Parquet files in Iceberg format on external cloud storage, making it directly readable by any Iceberg-compatible ML framework (Ray, Spark, PyTorch) without requiring a Snowflake connector. All standard Feature Store APIs work unchanged. See [Chapter 11: Iceberg-Backed Feature Views](../11_training_inference/index.qmd#sec-iceberg-fv) for full details. ::: ### View-based Feature View (`refresh_freq = None`) > 📁 **Full code:** [`_code/dynamic_table_fv.py`](_code/dynamic_table_fv.py) — `create_view_featureview` View-based Feature Views are highly flexible. The underlying View can range from a simple pass-through to a complex declarative transformation, making them suitable for several distinct use cases: **1. Simple pass-through for externally managed tables.** When feature engineering is handled by an external process (dbt, Airflow, a Stored Procedure, etc.), the Feature View simply wraps the already-prepared table. This registers it in the Feature Store catalog for discovery, lineage, and training-set generation without duplicating any transformation logic: ```python from snowflake.ml.feature_store import FeatureView, Entity # Pass-through: dbt/Airflow manages the table; Feature View just wraps it product_catalog_fv = FeatureView( name="PRODUCT_CATALOG_FV", entities=[product_entity], feature_df=session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.PRODUCTS"), timestamp_col="UPDATED_TS", refresh_freq=None, desc="PRODUCT attributes from CLICKSTREAM_DATA.PRODUCTS — managed by dbt", ) ``` **2. Declarative transformations.** The View definition can encode complex feature-engineering logic -- joins, aggregations, window functions, CASE expressions -- evaluated at query time. This keeps the transformation logic self-contained within the Feature Store: ::: {.panel-tabset group="lang"} ## SQL ```python user_session_df = session.sql(""" SELECT s.USER_ID, COUNT(DISTINCT s.SESSION_ID) AS SESSION_CNT, AVG(s.DURATION_SEC) AS AVG_SESSION_DURATION_SEC, SUM(CASE WHEN e.EVENT_TYPE = 'PURCHASE' THEN 1 ELSE 0 END) AS PURCHASE_EVENT_CNT, MAX(s.START_TS) AS LAST_SESSION_TS FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.SESSIONS s JOIN FEATURE_STORE_DEMO.CLICKSTREAM_DATA.EVENTS e ON s.SESSION_ID = e.SESSION_ID GROUP BY s.USER_ID """) ``` ## Snowpark DataFrame ```python import snowflake.snowpark.functions as F sessions = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.SESSIONS") events = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.EVENTS") user_session_df = ( sessions.join(events, sessions["SESSION_ID"] == events["SESSION_ID"]) .group_by(sessions["USER_ID"]) .agg( F.count_distinct(sessions["SESSION_ID"]).alias("SESSION_CNT"), F.avg(sessions["DURATION_SEC"]).alias("AVG_SESSION_DURATION_SEC"), F.sum(F.when(events["EVENT_TYPE"] == "PURCHASE", 1).otherwise(0)).alias("PURCHASE_EVENT_CNT"), F.max(sessions["START_TS"]).alias("LAST_SESSION_TS"), ) ) ``` ::: ```python user_session_fv = FeatureView( name="USER_SESSION_FV", entities=[user_entity], feature_df=user_session_df, timestamp_col="LAST_SESSION_TS", refresh_freq=None, desc="USER session engagement features -- computed at query time", ) ``` **3. Development precursor to a Dynamic Table.** During development, use a View-based Feature View while the transformation logic is still evolving. Because the View has no materialized state, you can freely iterate on the query definition. Once the logic stabilizes, convert it to a Dynamic Table by setting `refresh_freq` to a period string -- the same `feature_df` works unchanged: ::: {.callout-note} ## Converting a View-based Feature View to a Dynamic Table When you switch from `refresh_freq=None` (View) to a period string (Dynamic Table), the underlying Snowflake object type changes from `VIEW` to `DYNAMIC TABLE`. As of `snowflake-ml-python` **1.27.0**, `register_feature_view()` with `overwrite=True` handles this conversion automatically -- it drops the existing object and creates the new type. On earlier versions, you must delete the existing Feature View first with `fs.delete_feature_view()` before re-registering with the new `refresh_freq`. ::: ```python # Start with a View in development fv = FeatureView( name="USER_ORDER_FV", entities=[user_entity], feature_df=user_order_df, timestamp_col="LAST_ORDER_TS", refresh_freq=None, # View — iterate freely desc="USER lifetime order features — DEV iteration", ) # Later, promote to Dynamic Table for production fv = FeatureView( name="USER_ORDER_FV", entities=[user_entity], feature_df=user_order_df, # same query timestamp_col="LAST_ORDER_TS", refresh_freq="1 hour", # now materialized as a DT desc="USER lifetime order features — production DT", ) ``` **Characteristics**: - Computed on each query -- always reflects current source data - No Dynamic Table storage or refresh cost - Query-time compute cost scales with read volume - Ideal for prototyping, rarely accessed features, or wrapping externally managed tables ### Dynamic Table Feature View (time period) > 📁 **Full code:** [`_code/dynamic_table_fv.py`](_code/dynamic_table_fv.py) — `create_dt_featureview` #### Pattern A: Cumulative lifetime features (one row per order event) The most PIT-friendly pattern for DT-backed Feature Views produces **multiple rows per entity over time**, so the ASOF join can select the correct snapshot for any point in the past: ::: {.panel-tabset group="lang"} ## SQL ```python from snowflake.ml.feature_store import FeatureView, Entity user_order_cumulative_df = session.sql(""" SELECT USER_ID, ORDER_TS, SUM(TOTAL_AMT) OVER w AS ORDER_TOTAL_AMT_SUM, COUNT(ORDER_ID) OVER w AS ORDER_CNT, AVG(TOTAL_AMT) OVER w AS ORDER_TOTAL_AMT_AVG FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS WINDOW w AS (PARTITION BY USER_ID ORDER BY ORDER_TS ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) """) ``` ## Snowpark DataFrame ```python from snowflake.snowpark import Window import snowflake.snowpark.functions as F from snowflake.ml.feature_store import FeatureView, Entity orders = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS") w = Window.partition_by("USER_ID").order_by("ORDER_TS").rows_between( Window.UNBOUNDED_PRECEDING, Window.CURRENT_ROW ) user_order_cumulative_df = orders.select( F.col("USER_ID"), F.col("ORDER_TS"), F.sum("TOTAL_AMT").over(w).alias("ORDER_TOTAL_AMT_SUM"), F.count("ORDER_ID").over(w).alias("ORDER_CNT"), F.avg("TOTAL_AMT").over(w).alias("ORDER_TOTAL_AMT_AVG"), ) ``` ::: ```python user_order_fv = FeatureView( name="USER_ORDER_FV", entities=[user_entity], feature_df=user_order_cumulative_df, timestamp_col="ORDER_TS", refresh_freq="1 hour", desc="USER cumulative order features: running total spend, order count, " "and average order value -- one row per order for PIT retrieval", ) ``` This Feature View produces **one row per user per order**. When the spine requests features as of a specific timestamp, the ASOF join selects the row with the most recent `ORDER_TS` <= the spine timestamp, returning the cumulative totals as of that point in time. #### Pattern B: Latest-state aggregate (one row per entity) If you only need the **current** totals (e.g., for online serving or batch inference at "now"), a single-row aggregate is simpler and cheaper to maintain: ::: {.panel-tabset group="lang"} ## SQL ```python user_order_latest_df = session.sql(""" SELECT USER_ID, SUM(TOTAL_AMT) AS ORDER_TOTAL_AMT_SUM, COUNT(ORDER_ID) AS ORDER_CNT, AVG(TOTAL_AMT) AS ORDER_TOTAL_AMT_AVG, MAX(ORDER_TS) AS LAST_ORDER_TS FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS GROUP BY USER_ID """) ``` ## Snowpark DataFrame ```python import snowflake.snowpark.functions as F orders = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS") user_order_latest_df = orders.group_by("USER_ID").agg( F.sum("TOTAL_AMT").alias("ORDER_TOTAL_AMT_SUM"), F.count("ORDER_ID").alias("ORDER_CNT"), F.avg("TOTAL_AMT").alias("ORDER_TOTAL_AMT_AVG"), F.max("ORDER_TS").alias("LAST_ORDER_TS"), ) ``` ::: ```{python} #| echo: false #| output: false user_order_latest_df = session.sql(""" SELECT USER_ID, SUM(TOTAL_AMT) AS ORDER_TOTAL_AMT_SUM, COUNT(ORDER_ID) AS ORDER_CNT, AVG(TOTAL_AMT) AS ORDER_TOTAL_AMT_AVG, MAX(ORDER_TS) AS LAST_ORDER_TS FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS GROUP BY USER_ID """) ``` ```{python} user_order_latest_fv = FeatureView( name="USER_ORDER_LATEST_FV", entities=[user_entity], feature_df=user_order_latest_df, timestamp_col="LAST_ORDER_TS", refresh_freq="1 hour", desc="USER latest-state order aggregates: lifetime total spend, order count, " "and average order value from CLICKSTREAM_DATA.ORDERS", ) user_order_latest_fv = fs.register_feature_view( feature_view=user_order_latest_fv, version="V01", block=True, overwrite=True, ) print(f"Registered: {user_order_latest_fv.name}/V01 ({user_order_latest_fv.status})") user_order_latest_df.limit(5).show() ``` ::: {.callout-note} Pattern B produces one row per user. The ASOF join always returns this single row for any spine timestamp >= `LAST_ORDER_TS`. This means every training example sees the **same** (current) totals regardless of when the training event occurred -- it does not provide true point-in-time correctness for historical training. Use Pattern A when temporal accuracy matters. See [Chapter 6: Temporal Features](../06_temporal_features/index.qmd) for details. ::: **Characteristics**: - Pre-computed and stored in a Dynamic Table - Refreshes on a lag target derived from the period string - Strong fit for high read volume and predictable freshness needs - Ongoing compute cost for refreshes ::: {.callout-warning title="Float aggregations with JOINs force full refresh"} If your `feature_df` query combines **aggregations on `FLOAT`-typed columns** (SUM, AVG, MIN, MAX, VAR, STD) with a **JOIN in the same query block**, the Dynamic Table engine will fall back to **FULL refresh** instead of incremental. This can dramatically increase compute cost and refresh latency. **Workaround:** Cast float columns to fixed-point before aggregating: ```sql SUM(CAST(float_col AS NUMBER(38, 6))) AS SUM_VAL ``` If you are building your `feature_df` with the Snowpark DataFrame API instead of `session.sql()`, the equivalent cast is: ```python from snowflake.snowpark.types import DecimalType feature_df = ( source_df .group_by("JOIN_KEY") .agg(F.sum(F.col("FLOAT_COL").cast(DecimalType(38, 6))).alias("SUM_VAL")) ) ``` If the join is unavoidable, consider splitting the query into a base DT (aggregation only, no join) and a presentation View that joins the results -- see the [Base + Presentation Layer Model](#sec-base-presentation) later in this chapter. As of April 2026, the presentation layer does not have to be a View -- it can be a **downstream DT** that refreshes incrementally. When the base DT's query uses `GROUP BY`, Snowflake derives a unique key from the grouping columns. A downstream DT can then opt into `REFRESH_MODE = INCREMENTAL` via SQL (`ALTER DYNAMIC TABLE ... SET REFRESH_MODE = INCREMENTAL`) and process only the rows that actually changed in the upstream full refresh. This is a SQL-level capability not yet exposed in the Feature Store Python API. See [Chapter 5: Feature Pipelines](../05_feature_pipelines/index.qmd) and [Understanding primary keys in dynamic tables](https://docs.snowflake.com/en/user-guide/dynamic-tables/input-data-optimization). ::: ### Dynamic Table + Task (CRON) > 📁 **Full code:** [`_code/dynamic_table_fv.py`](_code/dynamic_table_fv.py) — `create_cron_featureview` Use CRON when features should refresh on a **calendar schedule** aligned to upstream batch loads rather than continuous lag. This is common when source data arrives in known batches (e.g., a nightly ETL) and you want refresh timing to match the upstream cadence. Note that even with a shorter `refresh_freq`, cost is minimal when there is no new source data -- the Dynamic Table scheduler checks for upstream changes and skips the refresh cycle without starting a warehouse: ::: {.panel-tabset group="lang"} ## SQL ```python user_daily_session_df = session.sql(""" SELECT USER_ID, COUNT(DISTINCT SESSION_ID) AS SESSION_CNT_LIFETIME, AVG(DURATION_SEC) AS AVG_SESSION_DURATION_SEC FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.SESSIONS GROUP BY USER_ID """) ``` ## Snowpark DataFrame ```python import snowflake.snowpark.functions as F sessions = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.SESSIONS") user_daily_session_df = sessions.group_by("USER_ID").agg( F.count_distinct("SESSION_ID").alias("SESSION_CNT_LIFETIME"), F.avg("DURATION_SEC").alias("AVG_SESSION_DURATION_SEC"), ) ``` ::: ```python user_daily_session_fv = FeatureView( name="USER_SESSION_DAILY_FV", entities=[user_entity], feature_df=user_daily_session_df, refresh_freq="0 8 * * *", # CRON: refresh daily at 08:00 UTC after nightly load desc="USER session engagement features -- refreshed daily after batch ingestion", ) ``` ::: {.callout-note} ## `timestamp_col` is optional This example omits `timestamp_col` because it produces a single latest-state row per entity. Without a timestamp column, Feature Store performs an **equality join** on the entity key when retrieving features -- every spine row for that entity receives the same feature values. Use `timestamp_col` when you need **point-in-time (ASOF) joins** for temporally correct training data; omit it when you only need the current snapshot. See [Chapter 6: Temporal Features](../06_temporal_features/index.qmd) for details. ::: **Characteristics**: - Operational model: initial DT build, then **Task-owned** refreshes per CRON schedule - Aligns refresh to upstream batch cadence -- avoids paying for refreshes before new data lands - Document the schedule and timezone expectations for consumers ### External Scheduling (Suspend + On-Demand Refresh) When refreshes should be triggered by **external pipeline completion** (e.g., an Airflow DAG, Dagster asset, or CI/CD step) rather than a fixed schedule, you can **suspend** the Feature View's DT and call `refresh_feature_view()` on demand: ::: {.panel-tabset group="lang"} ## Python API ```python registered_fv = fs.register_feature_view(fv, "V01") # Suspend automatic refreshes -- DT stays idle until explicitly triggered fs.suspend_feature_view(registered_fv) # In your external orchestrator (Airflow task, Dagster op, etc.): fs.refresh_feature_view(registered_fv) ``` ## SQL ```sql -- Suspend automatic refreshes ALTER DYNAMIC TABLE FEATURE_STORE_DEMO.FEATURE_STORE."USER_SESSION_DAILY_FV$V01" SUSPEND; -- Trigger a manual refresh from your external scheduler ALTER DYNAMIC TABLE FEATURE_STORE_DEMO.FEATURE_STORE."USER_SESSION_DAILY_FV$V01" REFRESH; ``` ::: ::: {.callout-tip} The SQL approach is often simpler for external schedulers (Airflow, Dagster, dbt) that already have a Snowflake connection -- no Python environment or `snowflake-ml-python` install required. The Feature View's underlying Dynamic Table name follows the `<NAME>$<VERSION>` convention and lives in the Feature Store schema. ::: **Characteristics**: - Full control over refresh timing -- refresh only when upstream data is confirmed ready - No wasted refreshes on unchanged data - Requires external orchestration infrastructure - The DT remains suspended between refreshes; no background compute ### Choosing Between Modes | Factor | View (`None`) | Dynamic Table (period) | Dynamic Table + Task (CRON) | External (suspend + refresh) | |--------|----------------|-------------------------|-----------------------------|-------------------------------| | **Query frequency** | Lower or sporadic | High | High (after batch landing) | High (after upstream completes) | | **Freshness** | As-of query time | Lag-bound (e.g. ≤ 1 hour) | Next scheduled run | On-demand after trigger | | **Cost shape** | Pay per query | Pay for refresh + storage | Pay for scheduled runs | Pay for triggered runs only | | **Orchestration** | Optional external prep of sources | Native lag refresh | Native Task schedule | External (Airflow, Dagster, etc.) | ::: {.callout-tip title="Wide Feature Views (100+ columns)"} Feature tables with hundreds or thousands of columns (sensor data, one-hot encodings, time-windowed aggregations) introduce SQL compilation overhead, micro-partition bloat, and can hit Snowflake's ~2,000-column practical limit. If your Feature View exceeds ~100 columns, consider **hybrid storage** -- entity keys and timestamp as native columns, remaining features packed into a single `OBJECT` column using `OBJECT_CONSTRUCT`. This reduces storage overhead, eliminates column-limit concerns, and improves compile time. An expansion View on top can reconstitute the wide schema for consumers that need it. See [Chapter 13: Advanced Patterns -- Wide & Sparse Feature Data](../13_advanced_patterns/index.qmd#sec-wide-sparse) for detailed architecture, column count thresholds, and the two-tier Feature View pattern. ::: ::: {.callout-tip title="Data-Driven Feature Definitions"} When feature columns are generated programmatically — categorical pivots, multi-window sweeps, config-driven specifications — the `feature_df` and `Feature` list are built by code rather than written by hand. The Feature View schema is still fixed at registration time, but the code path becomes data-driven. See [Chapter 7: Data-Driven Feature Definitions](../07_aggregations_api/index.qmd#sec-data-driven) for patterns, factory functions, and static vs dynamic trade-offs. ::: --- ## Versioning Strategies {#sec-versioning} Feature View versions enable safe updates and rollbacks. ::: {.callout-important} ## Key insight Versioning happens at the **Feature View level**, not per individual feature. A Feature View version is an immutable bundle of features and definitions. ::: ### Recommended Version Naming > 📁 **Full code:** [`_code/versioning.py`](_code/versioning.py) ```python # Zero-padded sequential versions (recommended for production) fs.register_feature_view(fv, version="V01") fs.register_feature_view(fv, version="V02") fs.register_feature_view(fv, version="V10") # Sorts correctly lexicographically # Environment-prefixed variants (optional) fs.register_feature_view(fv, version="DEV_V01") fs.register_feature_view(fv, version="PROD_V03") ``` Prefer **V01**, **V02**, … for operational clarity and sorting. Other schemes (semantic versions, date stamps) are possible but should be agreed team-wide. Consider creating shared Python helper functions that automate version incrementing and enforce your preferred scheme -- this reduces human error as the Feature View count grows and keeps versioning consistent across teams. ### How Versions Map to Snowflake Objects When you register a Feature View, the Feature Store constructs the underlying Snowflake object name by joining the Feature View name and version with a **`$`** separator: ``` <FEATURE_VIEW_NAME>$<VERSION> ``` For example, registering a Feature View named `USER_ORDER_FV` at version `V01` creates a Dynamic Table (or View) called: ``` FEATURE_STORE_DEMO.FEATURE_STORE.USER_ORDER_FV$V01 ``` This naming convention means: - Each version is a **distinct Snowflake object** -- V01 and V02 coexist as separate Dynamic Tables (or Views) in the same schema. - You can query any version directly via SQL: `SELECT * FROM FEATURE_STORE.USER_ORDER_FV$V02`. - Downstream DTs that chain off a specific version reference the fully-qualified `$`-delimited name, so upgrading consumers to a new version is an explicit, controlled change. - **Cost implication:** Each DT-backed version incurs independent refresh compute and storage. If V01 and V02 share overlapping features, that compute is duplicated. Mitigations: **suspend** older versions once consumers have migrated (`fs.suspend_feature_view()`), use the [Base + Presentation pattern](#sec-base-presentation) to keep shared features in a single base DT, or consolidate features into a single wide Feature View. - The `$` character is legal in unquoted Snowflake identifiers for most SQL statements (`SELECT`, `ALTER`, `DESCRIBE`, etc.). However, you need **double-quoting** when using the name in `ALTER ... SET CONTACT`, `ALTER ... SET TAG`, or similar DDL that could misparse the `$`, and in Snowpark `session.table('"USER_ORDER_FV$V01"')` where the string is passed to the SQL layer. ### Schema Evolution and Deprecation {#sec-schema-evolution} ::: {.callout-warning} ## Breaking changes require a new version **Do not** alter a registered Feature View in place (same name + version) when semantics change. Register a **new version** (or a new Feature View name) instead. ::: | Change type | What to do | |-------------|------------| | **Adding new features** | Create and register a **new version** (e.g. **V02**) that includes the new columns; keep **V01** until consumers opt in. **Exception:** if the DT uses `SELECT *` from a base table and the source team adds a column there, the DT picks it up automatically — see [Incremental Column Pickup with SELECT *](#sec-select-star-evolution). | | **Deprecating features** | Leave the old version **registered** and documented as deprecated; remove columns only in a **new** version so existing training and serving jobs keep working. | | **Breaking changes** (rename column, change type or meaning, change entity keys, change `timestamp_col` semantics) | Always ship a **new version** (or new Feature View name). Never overwrite **V01** with incompatible definitions. | | **Consumers** | Point new pipelines to **V02**; migrate models and batch jobs on your timeline; retire **V01** only after usage drops to zero. Use [lineage](../12_operations/index.qmd#sec-lineage) to identify dependencies and usage. | Workflow summary: 1. **V01** — current production bundle. 2. **V02** — add columns or fix logic; train new models against **V02**. 3. Deprecate **V01** in docs and catalog; delete **V01** only when no remaining dependencies. ### Reducing the Cost of Schema Evolution with BACKFILL Adding features is the most common schema evolution scenario. The default path -- registering a new Feature View version with the additional columns -- creates a new Dynamic Table that must perform a **full initial refresh** over the entire source dataset. For large DTs with months or years of history, this can be prohibitively expensive. Dynamic Tables now support [`BACKFILL FROM`](https://docs.snowflake.com/en/user-guide/dynamic-tables-performance-optimize-immutability) combined with [`IMMUTABLE WHERE`](https://docs.snowflake.com/en/user-guide/dynamic-tables/frozen-regions), which lets you **carry forward historical data** from the prior version's DT and only compute the new columns for the mutable (recent) window. The general workflow: 1. **Clone** the existing DT (V01) to a regular table. 2. **Add the new column(s)** to the clone (`ALTER TABLE ... ADD COLUMN`). Historical rows will have `NULL` for the new features. 3. **Create the new DT** (V02) with the updated SELECT that includes the new features, using `BACKFILL FROM` the clone and `IMMUTABLE WHERE` to protect the historical region. 4. Only the **mutable window** (e.g., last 1 day) is computed from the source data; the immutable region is copied from the backfill table at near-zero cost. ```sql -- 1. Clone V01's Dynamic Table to a regular table CREATE TABLE FEATURE_STORE.USER_ORDER_FV_V01_CLONE CLONE FEATURE_STORE.USER_ORDER_FV$V01; -- 2. Add the new column (NULL for historical rows) ALTER TABLE FEATURE_STORE.USER_ORDER_FV_V01_CLONE ADD COLUMN ORDER_TOTAL_AMT_AVG FLOAT; -- 3. Create V02 with BACKFILL + IMMUTABLE WHERE CREATE DYNAMIC TABLE FEATURE_STORE.USER_ORDER_FV$V02 BACKFILL FROM FEATURE_STORE.USER_ORDER_FV_V01_CLONE IMMUTABLE WHERE (LAST_ORDER_TS < CURRENT_TIMESTAMP() - INTERVAL '1 day') TARGET_LAG = '1 hour' WAREHOUSE = FS_DEV_WH AS SELECT USER_ID, SUM(TOTAL_AMT) AS ORDER_TOTAL_AMT_SUM, COUNT(ORDER_ID) AS ORDER_CNT, AVG(TOTAL_AMT) AS ORDER_TOTAL_AMT_AVG, -- new feature MAX(ORDER_TS) AS LAST_ORDER_TS FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS GROUP BY USER_ID; ``` After creation, the DT's immutable region contains the backfilled data (with `NULL` for `ORDER_TOTAL_AMT_AVG` in historical rows), while the mutable region is computed fresh and includes all columns. You can verify with `SELECT *, METADATA$IS_IMMUTABLE FROM FEATURE_STORE.USER_ORDER_FV$V02`. ::: {.callout-important title="TAG registration required for API visibility"} Objects created via SQL in the Feature Store schema are **not visible** to `fs.list_feature_views()`, `fs.get_feature_view()`, or the Snowsight Feature Store UI unless the correct internal TAGs are applied. After creating the DT via SQL, add the Feature Store tags so the API and UI can discover it. See [Appendix D: Tag Convention](../appendices/D_tag_convention/index.qmd) for the required TAG schema and a worked example. If the object is only consumed via direct SQL (not through the Feature Store API), tags are optional. ::: ::: {.callout-warning} ## Not yet available via Feature Store API `BACKFILL FROM` and `IMMUTABLE WHERE` are **Dynamic Table SQL features** -- they are not currently exposed through the Feature Store Python API (`FeatureView`, `register_feature_view`). To use this pattern you would create the DT via SQL and then register it as a view-based Feature View pointing at the resulting table, or manage the DT directly outside the API. Check the [Feature Store release notes](https://docs.snowflake.com/en/developer-guide/snowflake-ml/feature-store) for future API integration. ::: {.callout-tip} ## Inspecting generated SQL with `fv.query` The `FeatureView.query` property returns the compiled SQL from your `feature_df` -- the exact statement that becomes the `AS` clause in the `CREATE DYNAMIC TABLE` or `CREATE VIEW`. You can use this to inspect, copy, or adapt the SQL before creating the DT manually with additional clauses (e.g., `BACKFILL FROM`, `IMMUTABLE WHERE`): ```python draft_fv = FeatureView(name="MY_FV", entities=[entity], feature_df=my_df, ...) print(draft_fv.query) # prints the SQL that would be used in the AS clause ``` This gives you the inner query only. For the full DDL of an already-registered DT (including `TARGET_LAG`, `WAREHOUSE`, `TAGS` etc.), use `SELECT GET_DDL('DYNAMIC_TABLE', 'FEATURE_STORE."MY_FV$V01"')`. ::: ::: ::: {.callout-note} ## Downstream impact Downstream Dynamic Tables that read from the evolved DT will still need to refresh to pick up the schema change. If the downstream DTs also use `IMMUTABLE WHERE`, only their mutable window is reprocessed. ::: ### Retroactive IMMUTABLE WHERE (no clone needed) {#sec-retroactive-immutable} The clone-and-backfill workflow above creates a new DT version. A simpler alternative -- when you can accept `NULL` for the new feature in historical rows -- is to add an `IMMUTABLE WHERE` constraint **retroactively** to the existing DT, replace the upstream source (or Feature View query), and refresh. Only the mutable window is recomputed; the immutable region is preserved as-is. ```sql -- 1. Protect history on the existing DT ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" SET IMMUTABLE WHERE (LAST_ORDER_TS < CURRENT_TIMESTAMP() - INTERVAL '1 day'); -- 2. Update the Feature View definition (e.g., add AVG feature) -- If the FV reads from a VIEW, replace the VIEW: CREATE OR REPLACE VIEW FEATURE_STORE.USER_ORDER_SOURCE_V AS SELECT USER_ID, SUM(TOTAL_AMT) AS ORDER_TOTAL_AMT_SUM, COUNT(ORDER_ID) AS ORDER_CNT, AVG(TOTAL_AMT) AS ORDER_TOTAL_AMT_AVG, -- new feature MAX(ORDER_TS) AS LAST_ORDER_TS FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS GROUP BY USER_ID; -- 3. Refresh — only mutable rows recomputed ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" REFRESH; -- 4. (Optional) Remove constraint if not needed permanently ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" UNSET IMMUTABLE WHERE; ``` After the refresh, immutable rows have `NULL` for `ORDER_TOTAL_AMT_AVG` while mutable rows have the computed value. Verify with `SELECT *, METADATA$IS_IMMUTABLE FROM FEATURE_STORE."USER_ORDER_FV$V01"`. ::: {.callout-tip} ## OBJECT columns make this seamless With OBJECT-based Feature Views (see [Chapter 13: Wide & Sparse Data](../13_advanced_patterns/index.qmd#sec-wide-sparse)), heterogeneous key sets across the immutable/mutable boundary are handled naturally. Immutable rows simply lack the new key in the OBJECT; `COALESCE(features:"NEW_KEY"::NUMBER, 0)` fills the default at read time. In scalar DTs, the same pattern produces a visible `NULL` column in historical rows. ::: ::: {.callout-note} ## Lazy backfill If you later need the new feature backfilled across all history, you have two options: **Full refresh** -- remove the constraint and recompute everything in a single off-peak window: ```sql ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" UNSET IMMUTABLE WHERE; ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" REFRESH; ``` **Phased backfill** -- progressively move the immutability boundary backward (e.g., one month at a time), so each refresh recomputes only a slice of history. This spreads the cost across multiple windows and avoids a single large recompute: ```sql -- Phase 1: backfill the most recent immutable month ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" SET IMMUTABLE WHERE (LAST_ORDER_TS < CURRENT_TIMESTAMP() - INTERVAL '60 days'); ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" REFRESH; -- Phase 2: backfill the next month ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" SET IMMUTABLE WHERE (LAST_ORDER_TS < CURRENT_TIMESTAMP() - INTERVAL '90 days'); ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" REFRESH; -- Continue until all history is backfilled, then lock down again ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" SET IMMUTABLE WHERE (LAST_ORDER_TS < CURRENT_TIMESTAMP() - INTERVAL '1 day'); ``` Phased backfill is useful when the full history is too large for a single off-hours window, or when warehouse budget constraints require distributing the compute across days. ::: ### Incremental Column Pickup with SELECT * {#sec-select-star-evolution} Snowflake has updated the behaviour of Dynamic Tables defined with `SELECT * FROM base_table`: when a **new column is added** to the base table, the DT now picks it up on its next refresh rather than failing with an error. ::: {.callout-note title="Previous behaviour"} The Snowflake documentation previously stated: > "Changes to underlying base table for dynamic tables created with `SELECT *` > from base table: The dynamic table **fails to refresh** and must be > recreated to respond to the change." This constraint has been relaxed for **additive** changes (new columns). Non-additive changes — column drops, renames, and type modifications — still require recreation. ::: **When this applies.** If a Feature View's DT is a straightforward pass-through of a base table using `SELECT *`, new columns added to the source automatically appear in the DT: ```sql -- Feature View DT as a simple pass-through CREATE DYNAMIC TABLE FEATURE_STORE."USER_FEATURES$V01" TARGET_LAG = '1 hour' WAREHOUSE = FS_WH REFRESH_MODE = INCREMENTAL AS SELECT * FROM SOURCE_DB.DATA.USER_FEATURES_BASE; ``` When `USER_FEATURES_BASE` gains a new column (e.g., `RISK_SCORE FLOAT`), `ALTER TABLE ... ADD COLUMN RISK_SCORE FLOAT`, the column will appear in the DT on its next scheduled refresh or a manual `ALTER DYNAMIC TABLE ... REFRESH`. **Backfill semantics for pre-existing rows.** All rows — including those that existed before the column was added — are **recomputed from source** and receive the correct value for the new column. The `REFRESH_ACTION` shows `INCREMENTAL` with `ins=N` / `del=N` equal to the total row count: the DT deletes all existing rows and re-inserts them with the new column evaluated. There is **no NULL fill** for pre-existing rows. ::: {.callout-important title="Full row-level recompute despite INCREMENTAL label"} The `INCREMENTAL` refresh action after a column add is a full scan of all rows in the source — every row is deleted and re-inserted. The cost is equivalent to a full refresh. The `INCREMENTAL` label refers to the DT engine's change-tracking mechanism, not to processing only the rows that changed in the source. For large DTs with many rows, a column add to the base table will therefore be an expensive operation. If the DT also has downstream dependents, each downstream DT also does a full row-level recompute in turn. Use `IMMUTABLE WHERE` (see below) if you need to protect the historical region from this recompute cost. ::: **IMMUTABLE WHERE interaction.** When `IMMUTABLE WHERE` is set on the DT, the column-add refresh respects the boundary: immutable rows are **not recomputed** (ins=0 / del=0 for protected rows) and their value for the new column is `NULL`. Mutable rows are recomputed from source and receive the correct value. This is consistent with the retroactive `IMMUTABLE WHERE` behaviour [described above](#sec-retroactive-immutable). ::: {.callout-warning title="Uncontrolled upstream ADD COLUMN triggers automatic recompute"} The `SELECT *` pickup is **always on**. Any `ADD COLUMN` to the base table — even one added by another team for an unrelated purpose — will trigger a full row-level recompute (ins=N/del=N) on this DT and on every downstream `SELECT *` DT in the cascade. In a **shared or multi-team environment** where you do not own the base table, prefer the OBJECT-bundled approach with a VIEW intermediary. The VIEW acts as a gate: upstream schema changes are inert until you deliberately replace the VIEW, giving you control over timing, scope, and cost. See [Chapter 13: Choosing between approaches](../13_advanced_patterns/index.qmd#sec-select-star-vs-object) for the full trade-off analysis. ::: **DT cascade behaviour.** Downstream DTs that also use `SELECT *` from the DT that gained a column will pick it up on their own next refresh. Downstream DTs with explicit column lists are unaffected until their query is updated. **When this is the right choice.** - The DT is a thin pass-through over a single managed base table where the source team adds columns intentionally. - Consumers can tolerate `NULL` for the new column in historical rows, or the DT history is short enough that a full refresh is acceptable. - Feature count stays below ~100 columns (scalar column-per-feature stays within compilation and join-performance thresholds). - No explicit version gate is needed — new columns become visible immediately to all consumers. For more complex cases — computed feature expressions, wide feature sets (100+ columns), sparse data, or when historical rows must not be NULLed — the BACKFILL + IMMUTABLE WHERE workflow and the [OBJECT-bundled pattern](../13_advanced_patterns/index.qmd#sec-object-immutable) remain the recommended approaches. ::: {.callout-tip title="OBJECT_CONSTRUCT(*) inline in DT body — confirmed for pass-through bundles"} Writing `OBJECT_CONSTRUCT(* EXCLUDE (entity_id, ts)) AS features` directly in the DT's `AS` clause (no VIEW intermediary) has been prototyped and confirmed to work for **pass-through bundles** — where features are already derived and materialised upstream of this Feature View: - New upstream physical columns auto-appear as OBJECT keys on the next refresh (`REFRESH_ACTION = INCREMENTAL`) - The DT's physical schema stays stable (3 columns: entity key, timestamp, OBJECT) - All rows in the mutable region are recomputed and receive the new key value - No VIEW to create, replace, or maintain when the source already holds the finished features **When the VIEW intermediary is the right choice.** If feature derivation happens at the Feature View layer — ratios, flags, windowed functions that are not physical columns in the source — use an intermediary VIEW to hold the derivation logic and read from that VIEW with `OBJECT_CONSTRUCT(*)` in the DT. The key benefit: ALTER-ing the VIEW to add a new derived column is also picked up by the `OBJECT(*)` DT **incrementally** (`REFRESH_ACTION = INCREMENTAL`, not `REINITIALIZE`), giving you the same additive column pickup behaviour with the additional benefit of a controlled gate — the DT is completely unaffected by upstream schema changes until you deliberately replace the VIEW. See [Chapter 13: Choosing between approaches](../13_advanced_patterns/index.qmd#sec-select-star-vs-object) for the full comparison. ::: ::: {.callout-warning title="Non-additive changes still require recreation"} Column drops, renames, and type changes on the base table still cause the DT to fail on refresh and require recreation. For these changes, create a new Feature View version using the standard workflow [described above](#sec-schema-evolution). ::: ::: {.callout-warning} ## Not available via Feature Store Python API `SELECT *` DTs and their schema evolution are a SQL-layer capability. The Feature Store Python API (`FeatureView`, `register_feature_view`) does not expose this directly. Create the DT via SQL, then apply the Feature Store TAGs so the API and Snowsight UI can discover it (see [Appendix D: Tag Convention](../appendices/D_tag_convention/index.qmd)). ::: ### Alternative: Base + Presentation Layer Model {#sec-base-presentation} The standard approach creates a new version (a new Dynamic Table) for each schema change. An alternative for organizations with complex, frequently evolving feature sets is a **two-layer model** that separates physical storage from the consumer-facing view-based Feature Views. This could be implemented within a single Feature Store Schema through naming convention, or two separate Feature Store Schemas to provide separation of concerns: ```sql -- Single Feature Store Schema CREATE SCHEMA FEATURE_STORE; -- Two Feature Store Schemas CREATE SCHEMA FEATURE_STORE_BASE; CREATE SCHEMA FEATURE_STORE_PRESENTATION; ``` ```{mermaid} %%| fig-cap: "Base layer Dynamic Table feature views feeding a presentation View" %%| fig-alt: "Three base feature views combine into one combined view" flowchart TB B1[USER_ORDER_BASE DT] B2[USER_ENGAGE_BASE DT] B3[USER_PROMO_BASE DT] B1 & B2 & B3 --> COMB[USER_COMBINED_FV View joins bases] ``` **How it works:** 1. **Base layer** -- each feature domain has its own versioned Dynamic Table (or View) Feature View. These are the immutable physical units that hold the actual feature-engineering logic and materialized data. 2. **Presentation layer** -- a View-based Feature View that JOINs multiple base Feature Views together on their shared entity key (and timestamp column if required). Consumers query the presentation layer to get a unified feature set. A key principle of this model is **feature deduplication**: each feature computation exists in exactly one base object. Presentation layers compose existing features without re-implementing them. When a feature definition changes, only the base layer needs a new version -- the presentation layer's `SELECT *` automatically picks up the new columns. If individual features within a single Feature View evolve at different rates, consider feature-level version suffixes in naming (e.g., `ORDER_TOTAL_AMT_SUM_V01`, `ORDER_TOTAL_AMT_SUM_V02`) to maintain backward compatibility while the base layer transitions. ```python # Base Feature Views (DT-backed, versioned independently) user_order_base = FeatureView( name="USER_ORDER_BASE_FV", entities=[user_entity], feature_df=user_order_df, timestamp_col="LAST_ORDER_TS", refresh_freq="1 hour", desc="Base: lifetime order aggregates", ) user_engage_base = FeatureView( name="USER_ENGAGE_BASE_FV", entities=[user_entity], feature_df=user_engage_df, timestamp_col="LAST_SESSION_TS", refresh_freq="1 hour", desc="Base: session engagement features", ) # Presentation layer (View-based, combines bases) ``` ::: {.panel-tabset group="lang"} ## SQL ```python combined_df = session.sql(""" SELECT o.*, e.SESSION_CNT, e.AVG_SESSION_DURATION_SEC FROM FEATURE_STORE.USER_ORDER_BASE_FV$V01 o JOIN FEATURE_STORE.USER_ENGAGE_BASE_FV$V01 e USING (USER_ID) """) ``` ## Snowpark DataFrame ```python order_base = session.table('FEATURE_STORE."USER_ORDER_BASE_FV$V01"') engage_base = session.table('FEATURE_STORE."USER_ENGAGE_BASE_FV$V01"') combined_df = order_base.join(engage_base, "USER_ID").select( order_base["*"], engage_base["SESSION_CNT"], engage_base["AVG_SESSION_DURATION_SEC"], ) ``` ::: ```python user_combined_fv = FeatureView( name="USER_COMBINED_FV", entities=[user_entity], feature_df=combined_df, timestamp_col="LAST_ORDER_TS", refresh_freq=None, # View — always reflects latest base DT state desc="Presentation: combines order + engagement base Feature Views", ) ``` **When to add a new feature**, you create a new version of just the affected base DT (e.g., `USER_ORDER_BASE_FV$V02` with the extra column), then update the presentation View's SQL to reference `$V02`. The other base DTs are untouched -- no duplication of existing features across multiple DTs. **Trade-offs:** | | Standard versioning | Base + Presentation | |---|---|---| | **Simplicity** | One DT per version, straightforward | Multiple layers, JOIN management | | **Feature duplication** | New version duplicates unchanged features | Only changed base DT is re-created | | **Consumer migration** | Consumers point to new version | Presentation view updated centrally | | **Best for** | Small-to-medium feature sets, simple schemas | Large feature sets with frequent, incremental additions across domains | ::: {.callout-note} This pattern adds operational complexity and is best suited to organizations with mature Feature Store practices and large numbers of features that evolve at different rates. For most teams starting out, standard version-per-bundle is simpler and sufficient. The Feature Store API does not natively manage the base-to-presentation relationship today -- you manage the presentation View's SQL definition yourself (or via CI/CD). ::: ### Version Lifecycle ```{mermaid} %%| fig-cap: "Feature View version lifecycle V01 through V03" %%| fig-alt: "Three versions of a feature view from initial to current" flowchart LR V01[V01 Initial] --> V02[V02 Update] --> V03[V03 Current] ``` ### Managing Versions **List registered Feature Views:** ::: {.panel-tabset} ## Python API ```python fs.list_feature_views().select("NAME", "VERSION", "DESC").show() ``` ## SQL ```sql -- Dynamic Table-backed Feature Views SHOW DYNAMIC TABLES IN SCHEMA FEATURE_STORE_DEMO.FEATURE_STORE; -- View-based Feature Views SHOW VIEWS IN SCHEMA FEATURE_STORE_DEMO.FEATURE_STORE; -- Combined column-level inventory SELECT TABLE_NAME AS FEATURE_VIEW, COLUMN_NAME, DATA_TYPE FROM FEATURE_STORE_DEMO.INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'FEATURE_STORE' ORDER BY TABLE_NAME, ORDINAL_POSITION; ``` ::: ```{python} #| echo: false all_fvs = fs.list_feature_views().select("NAME", "VERSION", "DESC").show() ``` **Delete an old version** (only after all consumers have migrated): ::: {.panel-tabset} ## Python API ```python fs.delete_feature_view(name="USER_ORDER_FV", version="V01") ``` ## SQL ```sql -- For DT-backed Feature Views DROP DYNAMIC TABLE IF EXISTS FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01"; -- For View-based Feature Views DROP VIEW IF EXISTS FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01"; ``` ::: **Before deleting a version**, verify that no models, datasets, or pipelines still depend on it: ::: {.panel-tabset} ## Python API ```python v01_fv = fs.get_feature_view(name="USER_ORDER_FV", version="V01") # ML-specific dependencies (datasets, models, other FVs) downstream = v01_fv.lineage(direction="downstream") for node in downstream: print(f"{type(node).__name__}: {node.name}") # Narrower: only check if any models reference V01 model_deps = v01_fv.lineage(direction="downstream", domain_filter=["model"]) ``` Requires `snowflake-ml-python >= 1.6.0` and the `VIEW LINEAGE` privilege. ## SQL ```sql -- Snowflake-level object dependencies (DT chains, views, tasks) SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES WHERE REFERENCED_OBJECT_NAME = 'USER_ORDER_FV$V01' AND REFERENCED_OBJECT_DOMAIN IN ('TABLE', 'DYNAMIC TABLE', 'VIEW'); -- Recent query access to V01 SELECT QUERY_START_TIME, USER_NAME, QUERY_TEXT FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY WHERE BASE_OBJECTS_ACCESSED LIKE '%USER_ORDER_FV$V01%' ORDER BY QUERY_START_TIME DESC LIMIT 20; ``` ::: Only delete a version when both the API lineage check and the SQL dependency/access-history checks confirm zero remaining usage. See [Chapter 12: Operations](../12_operations/index.qmd) for more on lineage tracking. ### Getting the Latest Version and Creating the Next For maximum flexibility, Feature Store is intentionally not opinionated on version semantics—there's **no built-in "latest version" alias** or auto-increment. Avoid hard-coding version strings throughout your codebase; instead, use helper utilities that derive the next version programmatically: > 📁 **Full code:** [`_code/version_utils.py`](_code/version_utils.py) ```python import re from typing import Optional def get_latest_feature_view( fs: FeatureStore, feature_view_name: str, ) -> Optional[FeatureView]: """Get the most recent version of a Feature View.""" versions = fs.list_feature_views(feature_view_name=feature_view_name) # Sort by version string (works with zero-padded versions) versions_sorted = sorted(versions, key=lambda fv: fv.version, reverse=True) return versions_sorted[0] if versions_sorted else None def next_version( fs: FeatureStore, feature_view_name: str, prefix: str = "V", pad: int = 2, ) -> str: """ Derive the next sequential version string for a Feature View. If USER_ORDER_FV has versions V01 and V02, returns "V03". If no versions exist, returns "V01". """ versions = fs.list_feature_views(feature_view_name=feature_view_name) max_num = 0 pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$") for fv in versions: m = pattern.match(fv.version) if m: max_num = max(max_num, int(m.group(1))) return f"{prefix}{max_num + 1:0{pad}d}" ``` Usage: ```python version = next_version(fs, "USER_ORDER_FV") # "V03" if V01, V02 exist fs.register_feature_view(fv, version=version) ``` This keeps version assignment consistent across manual runs, notebooks, and CI/CD pipelines without anyone needing to remember what the current version is. --- ## Refresh Configuration {#sec-refresh} ### Refresh Frequency Examples (time-period mode) When using a **period** string (not CRON), you are configuring **TARGET_LAG**-style refresh for the backing Dynamic Table. | Frequency | Example use case | Cost impact | |-----------|------------------|-------------| | `1 minute` | Tight operational or fraud signals | Very high | | `15 minutes` | Operational dashboards | High | | `1 hour` | Standard batch ML features | Medium | | `1 day` | Daily analytics features | Low | ### Best Practices for Refresh ```python # Align period with how often upstream CLICKSTREAM_DATA tables land meaningful deltas fv = FeatureView( refresh_freq="4 hours", # ... ) # For Online Feature Tables, OFT sync frequency should be <= Feature View refresh cadence ``` ::: {.callout-tip} **No changes = no compute.** The Dynamic Table scheduler checks for upstream changes before triggering a refresh. If no source data has changed since the last refresh, the scheduler skips the cycle without consuming any warehouse compute. This means setting a shorter `refresh_freq` than the actual source update rate incurs near-zero cost from the change-detection check itself -- the DT simply checks more often and finds nothing to do. You do not need to perfectly match `refresh_freq` to source cadence to avoid wasted spend; err on the side of slightly shorter lag for better freshness guarantees. ::: --- ## Feature View Ownership & SLAs {#sec-ownership} The Feature Store API provides `desc` on Feature Views and Entities, plus Snowflake [object tags](https://docs.snowflake.com/en/sql-reference/sql/create-tag) on the underlying Dynamic Tables and Views. For many teams, this is sufficient. However, the API does not natively track ownership, consumer lists, SLA targets, or other operational metadata that mature ML platform users might need for governance and compliance. You have several options for extending the metadata layer to meet these needs -- none are part of the Feature Store product itself, but all integrate naturally using Snowflake's built-in features: **Snowflake Tags.** Attach key-value metadata directly to Feature View objects. Tags are queryable via `INFORMATION_SCHEMA` and `TAG_REFERENCES`, and can be used for governance, cost attribution, and discovery: ```sql CREATE TAG IF NOT EXISTS FS_OWNER ALLOWED_VALUES 'marketing-ds', 'risk-eng', 'platform'; CREATE TAG IF NOT EXISTS FS_SLA_MINUTES; ALTER DYNAMIC TABLE FEATURE_STORE.USER_ORDER_FV$V01 SET TAG FS_OWNER = 'marketing-ds', FS_SLA_MINUTES = '90'; -- Query ownership across all Feature Views SELECT OBJECT_NAME, TAG_NAME, TAG_VALUE FROM TABLE(INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS( 'FEATURE_STORE_DEMO.FEATURE_STORE', 'SCHEMA')) WHERE TAG_NAME IN ('FS_OWNER', 'FS_SLA_MINUTES'); ``` **Metadata tables.** Maintain a registry table within your Feature Store schema, keyed by the object name (`<FV_NAME>$<VERSION>`). This can hold owner, team, consumer list, SLA targets, deprecation status, and anything else you need: ```sql CREATE TABLE IF NOT EXISTS FEATURE_STORE.FEATURE_VIEW_METADATA ( FV_OBJECT_NAME STRING PRIMARY KEY, -- e.g. USER_ORDER_FV$V01 OWNER STRING, TEAM STRING, CONSUMERS ARRAY, SLA_REFRESH STRING, SLA_MAX_LATENCY_MINUTES NUMBER, DEPRECATED BOOLEAN DEFAULT FALSE, NOTES STRING ); ``` You can build helper functions that populate this table automatically when registering a Feature View, and query it for discovery, alerting, or CI/CD gating. **External manifest files.** For teams that manage Feature Store definitions in Git alongside CI/CD, a YAML manifest can serve as the source of truth for ownership and SLA metadata. This is not consumed by Snowflake, but drives your deployment scripts and documentation: ```yaml # feature_manifest.yaml feature_views: - name: USER_ORDER_AGGREGATES owner: marketing-ds@company.com team: Marketing Data Science description: Core user behavior from ORDERS and ORDER_ITEMS consumers: - churn-model - ltv-model sla: refresh_frequency: "1 hour" max_latency_minutes: 90 ``` **Snowflake Contacts.** [Contacts](https://docs.snowflake.com/en/user-guide/governance-contacts) are schema-level objects that you assign to databases, schemas, tables, views, and Dynamic Tables to indicate who is responsible for different purposes. They integrate with Snowsight's Catalog and are queryable via `ACCOUNT_USAGE` views: ```sql -- Create contacts for the Feature Store team CREATE CONTACT IF NOT EXISTS FS_STEWARD METHOD EMAIL 'feature-store-team@company.com'; CREATE CONTACT IF NOT EXISTS FS_SUPPORT METHOD EMAIL 'fs-support@company.com'; -- Assign at schema level -- all Feature Views inherit automatically ALTER SCHEMA FEATURE_STORE_DEMO.FEATURE_STORE SET CONTACT STEWARD = FS_STEWARD, SUPPORT = FS_SUPPORT; -- Override for a specific Feature View if a different team owns it ALTER DYNAMIC TABLE FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01" SET CONTACT STEWARD = MARKETING_DS_STEWARD; ``` Each assignment specifies a **purpose** -- `STEWARD` (data accuracy), `SUPPORT` (technical questions), or `ACCESS_APPROVAL` (who approves access). Contacts inherit down the hierarchy (database → schema → table), so assigning at the schema level covers all Feature Views with a single statement. Direct assignment on a child overrides the inherited contact for that purpose. To audit contact assignments across your Feature Store: ```sql SELECT object_name, purpose, CASE WHEN is_inherited THEN 'Inherited' ELSE 'Direct' END AS assignment FROM SNOWFLAKE.ACCOUNT_USAGE.CONTACT_REFERENCES WHERE object_database = 'FEATURE_STORE_DEMO' AND object_schema = 'FEATURE_STORE' AND deleted IS NULL ORDER BY object_name, purpose; ``` **Cortex-powered object descriptions.** Snowflake can [automatically generate human-readable descriptions](https://docs.snowflake.com/en/user-guide/sql-cortex-descriptions) for tables, views, and their columns using Snowflake-hosted LLMs. This is useful for Feature Store objects where feature names alone (e.g., `ORDER_TOTAL_AMT_SUM`) may not be self-explanatory to consumers unfamiliar with the domain. Descriptions are stored as the standard `COMMENT` on the object, so they appear in `DESCRIBE`, `SHOW`, Snowsight's Catalog, and any data catalog tooling. You can generate descriptions from Snowsight (navigate to the object → "Generate description") or via SQL using the `AI_GENERATE_TABLE_DESC` stored procedure: ```sql -- Generate descriptions for a Feature View's DT and all its columns CALL AI_GENERATE_TABLE_DESC( 'FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01"', { 'describe_columns': true, 'use_table_data': true } ); ``` The procedure returns a JSON object with suggested descriptions. To persist them, set them as comments: ```sql ALTER DYNAMIC TABLE FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01" SET COMMENT = 'User-level purchase aggregates from clickstream ORDERS: total spend, order count, average order value, and most recent order timestamp.'; ALTER DYNAMIC TABLE FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01" MODIFY COLUMN ORDER_TOTAL_AMT_SUM COMMENT = 'Sum of all order amounts (TOTAL_AMT) for the user'; ``` For Feature Stores with many Feature Views, you can script this across an entire schema using the stored procedure examples in the [documentation](https://docs.snowflake.com/en/user-guide/sql-cortex-descriptions#examples). Generated descriptions significantly improve feature discoverability -- consumers searching the catalog can find features by what they represent, not just by column name. These approaches are complementary. A common pattern is to use **Contacts** for ownership and stewardship visible in Snowsight, **Tags** for lightweight governance queries and cost attribution, **Cortex descriptions** for human-readable feature documentation, a **metadata table** for operational dashboards and alerting, and a **YAML manifest** in Git for CI/CD-driven deployments. See [Chapter 12: Operations](../12_operations/index.qmd) for monitoring SLA compliance and auditing metadata at scale. --- ## Hierarchical Feature Views {#sec-hierarchy-fvs} When source data exists at a **child grain** (e.g., line items, events) but the model operates at a **parent grain** (e.g., orders, users), create **aggregated rollup Feature Views** that project child-level data up to the parent entity. ::: {.callout-warning title="Do not stack tiled Feature Views"} If you need to roll up from a finer-grained tiled (Time Aggregation) Feature View to a coarser entity, use `RollupConfig` (see [Chapter 7: Rollup Feature Views](../07_aggregations_api/index.qmd#sec-rollup)). **Do not** create a new tiled Feature View that reads from another tiled Feature View -- this causes duplicate internal `_PARTIAL_*` column names and SQL compilation errors. The correct hierarchy pattern for tiled FVs is: raw events → tiled FV (fine entity) → Rollup FV (coarse entity) via `RollupConfig`. ::: ```python # Raw lineitem Feature View at the finest grain (optional: useful for line-level models) lineitem_fv = FeatureView( name="LINEITEM_FV", entities=[lineitem_entity], feature_df=lineitem_df, timestamp_col="SHIP_DATE", refresh_freq="1 hour", desc="Raw lineitem features at (ORDER_ID, LINE_NUMBER) grain", ) ``` ::: {.panel-tabset group="lang"} ## SQL ```python order_lineitem_agg_df = session.sql(""" SELECT ORDER_ID, SUM(EXTENDED_PRICE) AS ORDER_TOTAL_AMT, COUNT(*) AS LINE_COUNT, AVG(EXTENDED_PRICE) AS AVG_LINE_PRICE, MAX(SHIP_DATE) AS LAST_SHIP_DATE FROM ORDER_ITEMS GROUP BY ORDER_ID """) ``` ## Snowpark DataFrame ```python import snowflake.snowpark.functions as F order_items = session.table("ORDER_ITEMS") order_lineitem_agg_df = order_items.group_by("ORDER_ID").agg( F.sum("EXTENDED_PRICE").alias("ORDER_TOTAL_AMT"), F.count("*").alias("LINE_COUNT"), F.avg("EXTENDED_PRICE").alias("AVG_LINE_PRICE"), F.max("SHIP_DATE").alias("LAST_SHIP_DATE"), ) ``` ::: ```python order_agg_fv = FeatureView( name="ORDER_LINEITEM_AGG_FV", entities=[order_entity], feature_df=order_lineitem_agg_df, timestamp_col="LAST_SHIP_DATE", refresh_freq="1 hour", desc="Lineitem features aggregated to the order level", ) ``` When `generate_training_set` or `generate_dataset` joins Feature Views at **different entity grains**, higher-level features are **denormalized** -- repeated for every child row in the training data. For example, a spine at the lineitem grain joined with both `lineitem_fv` and `order_agg_fv` produces one row per lineitem, with order-level aggregates duplicated across all lines of the same order. If denormalization is undesirable (e.g., the model expects one row per order), either: - Use a **spine at the parent grain** and only include parent-level Feature Views, or - Create a **pivot Feature View** that widens child data into columns (e.g., `LINE_1_PRICE`, `LINE_2_PRICE`) -- though this is only practical with a bounded number of children. For unbounded child sets, use `OBJECT_AGG`, `ARRAY_AGG`, or `LISTAGG` to pack child data into a single column, then process it at retrieval time in Python (e.g., extract the last-N values or compute further aggregations over the array). See [Chapter 13: Wide & Sparse Feature Data](../13_advanced_patterns/index.qmd#sec-wide-sparse) for the OBJECT packing pattern. See [Chapter 3: Modeling at a Different Grain](../03_entities_hierarchies/index.qmd#sec-cross-grain) for entity design considerations and [Chapter 11: Training & Inference](../11_training_inference/index.qmd) for multi-entity spine design. --- ## Best Practices ### 1. Use Consistent Naming {.unnumbered} ```python # Pattern: <ENTITY>_<DOMAIN>_FV (SCREAMING_SNAKE_CASE) "USER_ORDER_FV" "PRODUCT_CATALOG_FV" "SESSION_ENGAGEMENT_FV" ``` Entity keys and table names should stay **singular** SCREAMING_SNAKE_CASE where applicable (`USER`, `PRODUCT`, …). Prefer aggregate suffixes **`_CNT`**, **`_SUM`**, **`_AVG`**, **`_TS`**, boolean prefixes **`IS_`**, and monetary columns like **`TOTAL_AMT`** (not `AMOUNT`). ::: {.callout-tip title="Plan feature prefixing early at scale"} When many Feature Views share the same entity (common in large deployments -- 50+ Feature Views for a single entity like `USER`), column name collisions become inevitable. Rather than resolving collisions reactively during training data generation, **adopt a prefixing convention as part of your Feature View naming standard from day one**. Options include: - **Use `auto_prefix=True`** on all `generate_dataset` / `generate_training_set` calls (columns become `FV_NAME__FEATURE_NAME`). - **Use short `.with_name()` prefixes** per domain (e.g., `"ord"`, `"sess"`, `"eng"`) for more concise column names. - **Embed the domain prefix in the `.alias()` call** on each `Feature` definition (e.g., `Feature.count("ORDER_ID", "7d").alias("ORD_ORDER_ID_CNT_7D")`). The choice affects downstream model code, feature importance reporting, and monitoring dashboards -- so it is worth standardizing before the Feature View count grows large. See [Chapter 7: Prefixing](../07_aggregations_api/index.qmd#sec-prefixing) and [Chapter 11: Feature Column Prefixing](../11_training_inference/index.qmd#sec-training-prefixing) for detailed patterns. ::: ### 2. Consolidate Feature Views from the Same Source {.unnumbered} When multiple tiled Feature Views share the same source table, entity, and tile granularity, consolidate them into a single Feature View. Separate Feature Views over identical sources create independent tile DTs, duplicating compute. See [Chapter 7: Consolidate Feature Views](../07_aggregations_api/index.qmd#sec-consolidate) for detailed guidance and exceptions. ### 3. Document Feature Definitions {.unnumbered} ```python fv = FeatureView( name="USER_ORDER_FV", desc=""" USER-level order features from CLICKSTREAM_DATA.ORDERS for propensity models. Features: - ORDER_TOTAL_AMT_SUM: Sum of TOTAL_AMT across orders - ORDER_CNT: Count of orders - ORDER_TOTAL_AMT_AVG: Average order TOTAL_AMT - LAST_ORDER_TS: Latest ORDER_TS (point-in-time column) SLA: Refreshed hourly (TARGET_LAG), max 90 min lag Owner: ml-platform@company.com """, # ... ) ``` The Feature Store API also supports **per-feature descriptions** via `attach_feature_desc()`. These descriptions are stored as metadata and surfaced in `list_columns()`, making individual features discoverable: ```python fv = fv.attach_feature_desc({ "ORDER_TOTAL_AMT_SUM": "Sum of TOTAL_AMT across all orders for this user", "ORDER_CNT": "Count of orders placed by this user", "ORDER_TOTAL_AMT_AVG": "Average TOTAL_AMT per order", "LAST_ORDER_TS": "Timestamp of the most recent order (point-in-time column)", }) registered_fv = fs.register_feature_view(fv, "V01") registered_fv.list_columns().show() ``` ### 4. Version on Any Consumer-Visible Change {.unnumbered} ```python # V01: Initial column set # V02: Added ORDERS_7D_CNT # V03: Changed aggregation logic (breaking) — new version, never in-place edit # Increment version when you: # - Add or remove features # - Change calculations or filters # - Change entity keys or timestamp semantics ``` --- ## Common Pitfalls ### ❌ Pitfall 1: No Versioning **Problem**: Registering changes under the same version breaks dependent models. **Solution**: Always register **new versions** (V02, V03, …) for consumer-visible changes. ### ❌ Pitfall 2: Modifying a Registered Feature View In Place **Problem**: Overwriting definitions for an existing name/version breaks replay and training reproducibility. **Solution**: Treat each registered version as **immutable**; publish a new version instead. ### ❌ Pitfall 3: Misunderstanding Refresh Cost **Problem**: Assuming a short `refresh_freq` always incurs high compute cost, or conversely, setting an unnecessarily long lag "to save money." **Solution**: The DT scheduler's change-detection check is near-zero cost -- no warehouse compute is used when nothing has changed. A shorter lag than the source update frequency is safe and provides better freshness guarantees. However, if the source *does* update frequently and the DT query is expensive, a shorter lag means more frequent full refreshes. Match `refresh_freq` to the freshness your consumers actually need, not to the source rate. ### ❌ Pitfall 4: No Documentation **Problem**: Teams cannot trust or reuse features. **Solution**: Use the `desc` parameter and keep a feature catalog in sync with **V01**, **V02**, … --- ## Summary | Concept | Description | |---------|-------------| | **View Feature View** | `refresh_freq=None` (default): query-time; pass-through, declarative transforms, or dev precursor to DT | | **Dynamic Table (period)** | `refresh_freq="1 hour"` (etc.): materialized; TARGET_LAG refresh; no compute when source unchanged | | **Dynamic Table + Task (CRON)** | `refresh_freq="0 8 * * *"`: initial DT refresh, then Task on schedule | | **Version (V01, V02, …)** | Immutable snapshot of feature definitions; `$`-separated object names | | **Schema evolution** | New features and breaking changes → new version; BACKFILL for cost-efficient evolution; base + presentation layer as alternative | | **Version retirement** | Validate zero usage via lineage and access history before deleting old versions | | **Ownership & SLAs** | Tags, Contacts, metadata tables, YAML manifests, Cortex descriptions | --- ## Next Steps Continue to [Chapter 5: Feature Pipelines](../05_feature_pipelines/index.qmd) to learn about pipeline architectures and orchestration patterns.

4.1 Overview

4.2 Learning Objectives

4.3 SQL vs Snowpark DataFrame API

4.4 How refresh_freq Controls Materialization

4.4.1 View-based Feature View (refresh_freq = None)

4.4.2 Dynamic Table Feature View (time period)

4.4.2.1 Pattern A: Cumulative lifetime features (one row per order event)

4.4.2.2 Pattern B: Latest-state aggregate (one row per entity)

4.4.3 Dynamic Table + Task (CRON)

4.4.4 External Scheduling (Suspend + On-Demand Refresh)

4.4.5 Choosing Between Modes

4.5 Versioning Strategies

4.5.1 Recommended Version Naming

4.5.2 How Versions Map to Snowflake Objects

4.5.3 Schema Evolution and Deprecation

4.5.4 Reducing the Cost of Schema Evolution with BACKFILL

4.5.5 Retroactive IMMUTABLE WHERE (no clone needed)

4.5.6 Incremental Column Pickup with SELECT *

4.5.7 Alternative: Base + Presentation Layer Model

4.5.8 Version Lifecycle

4.5.9 Managing Versions

4.5.10 Getting the Latest Version and Creating the Next

4.6 Refresh Configuration

4.6.1 Refresh Frequency Examples (time-period mode)

4.6.2 Best Practices for Refresh

4.7 Feature View Ownership & SLAs

4.8 Hierarchical Feature Views

4.9 Best Practices

1. Use Consistent Naming

2. Consolidate Feature Views from the Same Source

3. Document Feature Definitions

4. Version on Any Consumer-Visible Change

4.10 Common Pitfalls

4.10.1 ❌ Pitfall 1: No Versioning

4.10.2 ❌ Pitfall 2: Modifying a Registered Feature View In Place

4.10.3 ❌ Pitfall 3: Misunderstanding Refresh Cost

4.10.4 ❌ Pitfall 4: No Documentation

4.11 Summary

4.12 Next Steps

4.4 How `refresh_freq` Controls Materialization

4.4.1 View-based Feature View (`refresh_freq = None`)