user_order_latest_fv = FeatureView(
name="USER_ORDER_LATEST_FV",
entities=[user_entity],
feature_df=user_order_latest_df,
timestamp_col="LAST_ORDER_TS",
refresh_freq="1 hour",
desc="USER latest-state order aggregates: lifetime total spend, order count, "
"and average order value from CLICKSTREAM_DATA.ORDERS",
)
user_order_latest_fv = fs.register_feature_view(
feature_view=user_order_latest_fv, version="V01", block=True, overwrite=True,
)
print(f"Registered: {user_order_latest_fv.name}/V01 ({user_order_latest_fv.status})")
user_order_latest_df.limit(5).show()4 Feature Views
Dynamic Tables vs Views, refresh_freq modes, versioning, and lifecycle
snowflake, feature store, ml, machine learning, mlops
4.1 Overview
A Feature View is a collection of related features computed from source data. It defines how features are transformed, stored, and served. This chapter uses the clickstream sample model (FEATURE_STORE_DEMO.CLICKSTREAM_DATA: VISITORS, USERS, HOUSEHOLDS, PRODUCTS, CATEGORIES, SUPPLIERS, PRODUCT_SUPPLIER, EVENTS, SESSIONS, ORDERS, ORDER_ITEMS) and Feature Store objects in FEATURE_STORE_DEMO.FEATURE_STORE, warehouse FS_DEV_WH.
The refresh_freq parameter determines whether a Feature View is backed by a View (query-time) or a Dynamic Table (materialized), and whether refresh follows TARGET_LAG or a CRON-driven Task.
4.2 Learning Objectives
After completing this chapter, you will be able to:
- Interpret the three
refresh_freqbehaviors: omitted /None, time period, and CRON - Choose between Dynamic Table and View-based Feature Views for a workload, including using Views as a development precursor to Dynamic Tables
- Implement versioning strategies (V01, V02, …) for safe updates
- Apply schema-evolution rules: add versions for new features, deprecate without breaking consumers, never change a registered Feature View in place
- Evaluate the base + presentation layer model as an alternative to version-per-bundle for complex feature sets
- Manage Feature View lifecycle: create, register new versions, validate lineage, deprecate, and remove
- Configure refresh strategies for freshness vs. cost, understanding that the DT scheduler incurs no compute when source data is unchanged
📂 Chapter code: Browse companion scripts on GitHub
4.3 SQL vs Snowpark DataFrame API
Feature View definitions require a Snowpark DataFrame (feature_df). You can construct this using either session.sql("SELECT ...") (SQL) or the Snowpark DataFrame API (session.table(...).group_by(...).agg(...)). Both produce identical lazy Snowpark DataFrames and generate the same Dynamic Tables.
Throughout this guide, code examples are presented in both forms using tabs. SQL is shown first because it is more widely known across data science and engineering teams, but the Snowpark DataFrame API is equally valid.
| Factor | SQL (session.sql) |
Snowpark DataFrame API |
|---|---|---|
| Readability | Universally understood across data roles | Requires Python + Snowpark API knowledge |
| Debuggability | Maps 1:1 to what the Dynamic Table executes; easy to copy-paste into Snowsight | Emits machine-generated SQL that can be dense/nested (see Appendix C) |
| Composability | Harder to parameterize dynamically | Excels at programmatic feature generation (loops, conditional columns, config-driven specs) |
| Testability | Requires session.sql assertions or a SQL runner |
Integrates with Python test frameworks (pytest) |
| Verbosity | Typically more concise for simple aggregations | More concise for complex multi-step pipelines with reusable intermediates |
| Team skills | SQL is widely known (analysts, data scientists, data engineers) | Snowpark API is narrower (Python + Snowpark) |
| Maintenance | Easier to review in PRs for SQL-fluent teams | Easier for Python-fluent teams with IDE support |
There is no wrong choice. Pick the style that your team can read, review, and maintain. Many teams standardize on SQL for simple aggregations and reserve Snowpark DataFrames for programmatic generation (e.g., data-driven feature factories in Chapter 7).
4.4 How refresh_freq Controls Materialization
refresh_freq |
Backing object | Behavior |
|---|---|---|
None (default) or omitted |
View | Features computed at query time; always reflects current source data; no Dynamic Table storage |
Time period (e.g. "1 hour", "15 minutes") |
Dynamic Table | Materialized result; refresh driven by TARGET_LAG for that period |
CRON expression (e.g. "0 8 * * *") |
Dynamic Table + Task | Dynamic Table is created and receives an initial full refresh, then is suspended for ongoing lag-based refresh; a Task runs the refresh on the CRON schedule |
Snowflake distinguishes a schedule string (CRON) from a time period (lag). If the value parses as CRON, the platform uses the Dynamic Table + Task pattern described above—not continuous TARGET_LAG refresh.
Any DT-backed Feature View (time period or CRON) can optionally materialize as a Dynamic Iceberg Table by passing a StorageConfig:
from snowflake.ml.feature_store import StorageConfig, StorageFormat
storage = StorageConfig(
format=StorageFormat.ICEBERG,
external_volume='MY_EXTERNAL_VOLUME',
base_location='feature_store/my_fv'
)
fv = FeatureView(
name="MY_FV", entities=[e], feature_df=df,
refresh_freq="1 hour",
storage_config=storage,
...
)This writes feature data as Parquet files in Iceberg format on external cloud storage, making it directly readable by any Iceberg-compatible ML framework (Ray, Spark, PyTorch) without requiring a Snowflake connector. All standard Feature Store APIs work unchanged. See Chapter 10: Iceberg-Backed Feature Views for full details.
4.4.1 View-based Feature View (refresh_freq = None)
📁 Full code:
_code/dynamic_table_fv.py—create_view_featureview
View-based Feature Views are highly flexible. The underlying View can range from a simple pass-through to a complex declarative transformation, making them suitable for several distinct use cases:
1. Simple pass-through for externally managed tables. When feature engineering is handled by an external process (dbt, Airflow, a Stored Procedure, etc.), the Feature View simply wraps the already-prepared table. This registers it in the Feature Store catalog for discovery, lineage, and training-set generation without duplicating any transformation logic:
from snowflake.ml.feature_store import FeatureView, Entity
# Pass-through: dbt/Airflow manages the table; Feature View just wraps it
product_catalog_fv = FeatureView(
name="PRODUCT_CATALOG_FV",
entities=[product_entity],
feature_df=session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.PRODUCTS"),
timestamp_col="UPDATED_TS",
refresh_freq=None,
desc="PRODUCT attributes from CLICKSTREAM_DATA.PRODUCTS — managed by dbt",
)2. Declarative transformations. The View definition can encode complex feature-engineering logic – joins, aggregations, window functions, CASE expressions – evaluated at query time. This keeps the transformation logic self-contained within the Feature Store:
user_session_df = session.sql("""
SELECT
s.USER_ID,
COUNT(DISTINCT s.SESSION_ID) AS SESSION_CNT,
AVG(s.DURATION_SEC) AS AVG_SESSION_DURATION_SEC,
SUM(CASE WHEN e.EVENT_TYPE = 'PURCHASE' THEN 1 ELSE 0 END) AS PURCHASE_EVENT_CNT,
MAX(s.START_TS) AS LAST_SESSION_TS
FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.SESSIONS s
JOIN FEATURE_STORE_DEMO.CLICKSTREAM_DATA.EVENTS e ON s.SESSION_ID = e.SESSION_ID
GROUP BY s.USER_ID
""")import snowflake.snowpark.functions as F
sessions = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.SESSIONS")
events = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.EVENTS")
user_session_df = (
sessions.join(events, sessions["SESSION_ID"] == events["SESSION_ID"])
.group_by(sessions["USER_ID"])
.agg(
F.count_distinct(sessions["SESSION_ID"]).alias("SESSION_CNT"),
F.avg(sessions["DURATION_SEC"]).alias("AVG_SESSION_DURATION_SEC"),
F.sum(F.when(events["EVENT_TYPE"] == "PURCHASE", 1).otherwise(0)).alias("PURCHASE_EVENT_CNT"),
F.max(sessions["START_TS"]).alias("LAST_SESSION_TS"),
)
)user_session_fv = FeatureView(
name="USER_SESSION_FV",
entities=[user_entity],
feature_df=user_session_df,
timestamp_col="LAST_SESSION_TS",
refresh_freq=None,
desc="USER session engagement features -- computed at query time",
)3. Development precursor to a Dynamic Table. During development, use a View-based Feature View while the transformation logic is still evolving. Because the View has no materialized state, you can freely iterate on the query definition. Once the logic stabilizes, convert it to a Dynamic Table by setting refresh_freq to a period string – the same feature_df works unchanged:
When you switch from refresh_freq=None (View) to a period string (Dynamic Table), the underlying Snowflake object type changes from VIEW to DYNAMIC TABLE. As of snowflake-ml-python 1.27.0, register_feature_view() with overwrite=True handles this conversion automatically – it drops the existing object and creates the new type. On earlier versions, you must delete the existing Feature View first with fs.delete_feature_view() before re-registering with the new refresh_freq.
# Start with a View in development
fv = FeatureView(
name="USER_ORDER_FV",
entities=[user_entity],
feature_df=user_order_df,
timestamp_col="LAST_ORDER_TS",
refresh_freq=None, # View — iterate freely
desc="USER lifetime order features — DEV iteration",
)
# Later, promote to Dynamic Table for production
fv = FeatureView(
name="USER_ORDER_FV",
entities=[user_entity],
feature_df=user_order_df, # same query
timestamp_col="LAST_ORDER_TS",
refresh_freq="1 hour", # now materialized as a DT
desc="USER lifetime order features — production DT",
)Characteristics:
- Computed on each query – always reflects current source data
- No Dynamic Table storage or refresh cost
- Query-time compute cost scales with read volume
- Ideal for prototyping, rarely accessed features, or wrapping externally managed tables
4.4.2 Dynamic Table Feature View (time period)
📁 Full code:
_code/dynamic_table_fv.py—create_dt_featureview
4.4.2.1 Pattern A: Cumulative lifetime features (one row per order event)
The most PIT-friendly pattern for DT-backed Feature Views produces multiple rows per entity over time, so the ASOF join can select the correct snapshot for any point in the past:
from snowflake.ml.feature_store import FeatureView, Entity
user_order_cumulative_df = session.sql("""
SELECT
USER_ID,
ORDER_TS,
SUM(TOTAL_AMT) OVER w AS ORDER_TOTAL_AMT_SUM,
COUNT(ORDER_ID) OVER w AS ORDER_CNT,
AVG(TOTAL_AMT) OVER w AS ORDER_TOTAL_AMT_AVG
FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
WINDOW w AS (PARTITION BY USER_ID ORDER BY ORDER_TS
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
""")from snowflake.snowpark import Window
import snowflake.snowpark.functions as F
from snowflake.ml.feature_store import FeatureView, Entity
orders = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS")
w = Window.partition_by("USER_ID").order_by("ORDER_TS").rows_between(
Window.UNBOUNDED_PRECEDING, Window.CURRENT_ROW
)
user_order_cumulative_df = orders.select(
F.col("USER_ID"),
F.col("ORDER_TS"),
F.sum("TOTAL_AMT").over(w).alias("ORDER_TOTAL_AMT_SUM"),
F.count("ORDER_ID").over(w).alias("ORDER_CNT"),
F.avg("TOTAL_AMT").over(w).alias("ORDER_TOTAL_AMT_AVG"),
)user_order_fv = FeatureView(
name="USER_ORDER_FV",
entities=[user_entity],
feature_df=user_order_cumulative_df,
timestamp_col="ORDER_TS",
refresh_freq="1 hour",
desc="USER cumulative order features: running total spend, order count, "
"and average order value -- one row per order for PIT retrieval",
)This Feature View produces one row per user per order. When the spine requests features as of a specific timestamp, the ASOF join selects the row with the most recent ORDER_TS <= the spine timestamp, returning the cumulative totals as of that point in time.
4.4.2.2 Pattern B: Latest-state aggregate (one row per entity)
If you only need the current totals (e.g., for online serving or batch inference at “now”), a single-row aggregate is simpler and cheaper to maintain:
import snowflake.snowpark.functions as F
orders = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS")
user_order_latest_df = orders.group_by("USER_ID").agg(
F.sum("TOTAL_AMT").alias("ORDER_TOTAL_AMT_SUM"),
F.count("ORDER_ID").alias("ORDER_CNT"),
F.avg("TOTAL_AMT").alias("ORDER_TOTAL_AMT_AVG"),
F.max("ORDER_TS").alias("LAST_ORDER_TS"),
)Pattern B produces one row per user. The ASOF join always returns this single row for any spine timestamp >= LAST_ORDER_TS. This means every training example sees the same (current) totals regardless of when the training event occurred – it does not provide true point-in-time correctness for historical training. Use Pattern A when temporal accuracy matters. See Chapter 6: Temporal Features for details.
Characteristics:
- Pre-computed and stored in a Dynamic Table
- Refreshes on a lag target derived from the period string
- Strong fit for high read volume and predictable freshness needs
- Ongoing compute cost for refreshes
If your feature_df query combines aggregations on FLOAT-typed columns (SUM, AVG, MIN, MAX, VAR, STD) with a JOIN in the same query block, the Dynamic Table engine will fall back to FULL refresh instead of incremental. This can dramatically increase compute cost and refresh latency.
Workaround: Cast float columns to fixed-point before aggregating:
If you are building your feature_df with the Snowpark DataFrame API instead of session.sql(), the equivalent cast is:
from snowflake.snowpark.types import DecimalType
feature_df = (
source_df
.group_by("JOIN_KEY")
.agg(F.sum(F.col("FLOAT_COL").cast(DecimalType(38, 6))).alias("SUM_VAL"))
)If the join is unavoidable, consider splitting the query into a base DT (aggregation only, no join) and a presentation View that joins the results – see the Base + Presentation Layer Model later in this chapter.
As of April 2026, the presentation layer does not have to be a View – it can be a downstream DT that refreshes incrementally. When the base DT’s query uses GROUP BY, Snowflake derives a unique key from the grouping columns. A downstream DT can then opt into REFRESH_MODE = INCREMENTAL via SQL (ALTER DYNAMIC TABLE ... SET REFRESH_MODE = INCREMENTAL) and process only the rows that actually changed in the upstream full refresh. This is a SQL-level capability not yet exposed in the Feature Store Python API. See Chapter 5: Feature Pipelines and Understanding primary keys in dynamic tables.
4.4.3 Dynamic Table + Task (CRON)
📁 Full code:
_code/dynamic_table_fv.py—create_cron_featureview
Use CRON when features should refresh on a calendar schedule aligned to upstream batch loads rather than continuous lag. This is common when source data arrives in known batches (e.g., a nightly ETL) and you want refresh timing to match the upstream cadence. Note that even with a shorter refresh_freq, cost is minimal when there is no new source data – the Dynamic Table scheduler checks for upstream changes and skips the refresh cycle without starting a warehouse:
user_daily_session_fv = FeatureView(
name="USER_SESSION_DAILY_FV",
entities=[user_entity],
feature_df=user_daily_session_df,
refresh_freq="0 8 * * *", # CRON: refresh daily at 08:00 UTC after nightly load
desc="USER session engagement features -- refreshed daily after batch ingestion",
)timestamp_col is optional
This example omits timestamp_col because it produces a single latest-state row per entity. Without a timestamp column, Feature Store performs an equality join on the entity key when retrieving features – every spine row for that entity receives the same feature values. Use timestamp_col when you need point-in-time (ASOF) joins for temporally correct training data; omit it when you only need the current snapshot. See Chapter 6: Temporal Features for details.
Characteristics:
- Operational model: initial DT build, then Task-owned refreshes per CRON schedule
- Aligns refresh to upstream batch cadence – avoids paying for refreshes before new data lands
- Document the schedule and timezone expectations for consumers
4.4.4 External Scheduling (Suspend + On-Demand Refresh)
When refreshes should be triggered by external pipeline completion (e.g., an Airflow DAG, Dagster asset, or CI/CD step) rather than a fixed schedule, you can suspend the Feature View’s DT and call refresh_feature_view() on demand:
The SQL approach is often simpler for external schedulers (Airflow, Dagster, dbt) that already have a Snowflake connection – no Python environment or snowflake-ml-python install required. The Feature View’s underlying Dynamic Table name follows the <NAME>$<VERSION> convention and lives in the Feature Store schema.
Characteristics:
- Full control over refresh timing – refresh only when upstream data is confirmed ready
- No wasted refreshes on unchanged data
- Requires external orchestration infrastructure
- The DT remains suspended between refreshes; no background compute
4.4.5 Choosing Between Modes
| Factor | View (None) |
Dynamic Table (period) | Dynamic Table + Task (CRON) | External (suspend + refresh) |
|---|---|---|---|---|
| Query frequency | Lower or sporadic | High | High (after batch landing) | High (after upstream completes) |
| Freshness | As-of query time | Lag-bound (e.g. ≤ 1 hour) | Next scheduled run | On-demand after trigger |
| Cost shape | Pay per query | Pay for refresh + storage | Pay for scheduled runs | Pay for triggered runs only |
| Orchestration | Optional external prep of sources | Native lag refresh | Native Task schedule | External (Airflow, Dagster, etc.) |
Feature tables with hundreds or thousands of columns (sensor data, one-hot encodings, time-windowed aggregations) introduce SQL compilation overhead, micro-partition bloat, and can hit Snowflake’s ~2,000-column practical limit. If your Feature View exceeds ~100 columns, consider hybrid storage – entity keys and timestamp as native columns, remaining features packed into a single OBJECT column using OBJECT_CONSTRUCT. This reduces storage overhead, eliminates column-limit concerns, and improves compile time. An expansion View on top can reconstitute the wide schema for consumers that need it. See Chapter 12: Advanced Patterns – Wide & Sparse Feature Data for detailed architecture, column count thresholds, and the two-tier Feature View pattern.
When feature columns are generated programmatically — categorical pivots, multi-window sweeps, config-driven specifications — the feature_df and Feature list are built by code rather than written by hand. The Feature View schema is still fixed at registration time, but the code path becomes data-driven. See Chapter 7: Data-Driven Feature Definitions for patterns, factory functions, and static vs dynamic trade-offs.
4.5 Versioning Strategies
Feature View versions enable safe updates and rollbacks.
Versioning happens at the Feature View level, not per individual feature. A Feature View version is an immutable bundle of features and definitions.
4.5.1 Recommended Version Naming
📁 Full code:
_code/versioning.py
# Zero-padded sequential versions (recommended for production)
fs.register_feature_view(fv, version="V01")
fs.register_feature_view(fv, version="V02")
fs.register_feature_view(fv, version="V10") # Sorts correctly lexicographically
# Environment-prefixed variants (optional)
fs.register_feature_view(fv, version="DEV_V01")
fs.register_feature_view(fv, version="PROD_V03")Prefer V01, V02, … for operational clarity and sorting. Other schemes (semantic versions, date stamps) are possible but should be agreed team-wide. Consider creating shared Python helper functions that automate version incrementing and enforce your preferred scheme – this reduces human error as the Feature View count grows and keeps versioning consistent across teams.
4.5.2 How Versions Map to Snowflake Objects
When you register a Feature View, the Feature Store constructs the underlying Snowflake object name by joining the Feature View name and version with a $ separator:
<FEATURE_VIEW_NAME>$<VERSION>
For example, registering a Feature View named USER_ORDER_FV at version V01 creates a Dynamic Table (or View) called:
FEATURE_STORE_DEMO.FEATURE_STORE.USER_ORDER_FV$V01
This naming convention means:
- Each version is a distinct Snowflake object – V01 and V02 coexist as separate Dynamic Tables (or Views) in the same schema.
- You can query any version directly via SQL:
SELECT * FROM FEATURE_STORE.USER_ORDER_FV$V02. - Downstream DTs that chain off a specific version reference the fully-qualified
$-delimited name, so upgrading consumers to a new version is an explicit, controlled change. - Cost implication: Each DT-backed version incurs independent refresh compute and storage. If V01 and V02 share overlapping features, that compute is duplicated. Mitigations: suspend older versions once consumers have migrated (
fs.suspend_feature_view()), use the Base + Presentation pattern to keep shared features in a single base DT, or consolidate features into a single wide Feature View. - The
$character is legal in unquoted Snowflake identifiers for most SQL statements (SELECT,ALTER,DESCRIBE, etc.). However, you need double-quoting when using the name inALTER ... SET CONTACT,ALTER ... SET TAG, or similar DDL that could misparse the$, and in Snowparksession.table('"USER_ORDER_FV$V01"')where the string is passed to the SQL layer.
4.5.3 Schema Evolution and Deprecation
Do not alter a registered Feature View in place (same name + version) when semantics change. Register a new version (or a new Feature View name) instead.
| Change type | What to do |
|---|---|
| Adding new features | Create and register a new version (e.g. V02) that includes the new columns; keep V01 until consumers opt in. Exception: if the DT uses SELECT * from a base table and the source team adds a column there, the DT picks it up automatically — see Incremental Column Pickup with SELECT *. |
| Deprecating features | Leave the old version registered and documented as deprecated; remove columns only in a new version so existing training and serving jobs keep working. |
Breaking changes (rename column, change type or meaning, change entity keys, change timestamp_col semantics) |
Always ship a new version (or new Feature View name). Never overwrite V01 with incompatible definitions. |
| Consumers | Point new pipelines to V02; migrate models and batch jobs on your timeline; retire V01 only after usage drops to zero. Use lineage to identify dependencies and usage. |
Workflow summary:
- V01 — current production bundle.
- V02 — add columns or fix logic; train new models against V02.
- Deprecate V01 in docs and catalog; delete V01 only when no remaining dependencies.
4.5.4 Reducing the Cost of Schema Evolution with BACKFILL
Adding features is the most common schema evolution scenario. The default path – registering a new Feature View version with the additional columns – creates a new Dynamic Table that must perform a full initial refresh over the entire source dataset. For large DTs with months or years of history, this can be prohibitively expensive.
Dynamic Tables now support BACKFILL FROM combined with IMMUTABLE WHERE, which lets you carry forward historical data from the prior version’s DT and only compute the new columns for the mutable (recent) window. The general workflow:
- Clone the existing DT (V01) to a regular table.
- Add the new column(s) to the clone (
ALTER TABLE ... ADD COLUMN). Historical rows will haveNULLfor the new features. - Create the new DT (V02) with the updated SELECT that includes the new features, using
BACKFILL FROMthe clone andIMMUTABLE WHEREto protect the historical region. - Only the mutable window (e.g., last 1 day) is computed from the source data; the immutable region is copied from the backfill table at near-zero cost.
-- 1. Clone V01's Dynamic Table to a regular table
CREATE TABLE FEATURE_STORE.USER_ORDER_FV_V01_CLONE
CLONE FEATURE_STORE.USER_ORDER_FV$V01;
-- 2. Add the new column (NULL for historical rows)
ALTER TABLE FEATURE_STORE.USER_ORDER_FV_V01_CLONE
ADD COLUMN ORDER_TOTAL_AMT_AVG FLOAT;
-- 3. Create V02 with BACKFILL + IMMUTABLE WHERE
CREATE DYNAMIC TABLE FEATURE_STORE.USER_ORDER_FV$V02
BACKFILL FROM FEATURE_STORE.USER_ORDER_FV_V01_CLONE
IMMUTABLE WHERE (LAST_ORDER_TS < CURRENT_TIMESTAMP() - INTERVAL '1 day')
TARGET_LAG = '1 hour'
WAREHOUSE = FS_DEV_WH
AS
SELECT USER_ID,
SUM(TOTAL_AMT) AS ORDER_TOTAL_AMT_SUM,
COUNT(ORDER_ID) AS ORDER_CNT,
AVG(TOTAL_AMT) AS ORDER_TOTAL_AMT_AVG, -- new feature
MAX(ORDER_TS) AS LAST_ORDER_TS
FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
GROUP BY USER_ID;After creation, the DT’s immutable region contains the backfilled data (with NULL for ORDER_TOTAL_AMT_AVG in historical rows), while the mutable region is computed fresh and includes all columns. You can verify with SELECT *, METADATA$IS_IMMUTABLE FROM FEATURE_STORE.USER_ORDER_FV$V02.
Objects created via SQL in the Feature Store schema are not visible to fs.list_feature_views(), fs.get_feature_view(), or the Snowsight Feature Store UI unless the correct internal TAGs are applied. After creating the DT via SQL, add the Feature Store tags so the API and UI can discover it. See Appendix D: Tag Convention for the required TAG schema and a worked example. If the object is only consumed via direct SQL (not through the Feature Store API), tags are optional.
BACKFILL FROM and IMMUTABLE WHERE are Dynamic Table SQL features – they are not currently exposed through the Feature Store Python API (FeatureView, register_feature_view). To use this pattern you would create the DT via SQL and then register it as a view-based Feature View pointing at the resulting table, or manage the DT directly outside the API. Check the Feature Store release notes for future API integration.
fv.query
The FeatureView.query property returns the compiled SQL from your feature_df – the exact statement that becomes the AS clause in the CREATE DYNAMIC TABLE or CREATE VIEW. You can use this to inspect, copy, or adapt the SQL before creating the DT manually with additional clauses (e.g., BACKFILL FROM, IMMUTABLE WHERE):
draft_fv = FeatureView(name="MY_FV", entities=[entity], feature_df=my_df, ...)
print(draft_fv.query) # prints the SQL that would be used in the AS clauseThis gives you the inner query only. For the full DDL of an already-registered DT (including TARGET_LAG, WAREHOUSE, TAGS etc.), use SELECT GET_DDL('DYNAMIC_TABLE', 'FEATURE_STORE."MY_FV$V01"').
Downstream Dynamic Tables that read from the evolved DT will still need to refresh to pick up the schema change. If the downstream DTs also use IMMUTABLE WHERE, only their mutable window is reprocessed.
4.5.5 Retroactive IMMUTABLE WHERE (no clone needed)
The clone-and-backfill workflow above creates a new DT version. A simpler alternative – when you can accept NULL for the new feature in historical rows – is to add an IMMUTABLE WHERE constraint retroactively to the existing DT, replace the upstream source (or Feature View query), and refresh. Only the mutable window is recomputed; the immutable region is preserved as-is.
-- 1. Protect history on the existing DT
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" SET
IMMUTABLE WHERE (LAST_ORDER_TS < CURRENT_TIMESTAMP() - INTERVAL '1 day');
-- 2. Update the Feature View definition (e.g., add AVG feature)
-- If the FV reads from a VIEW, replace the VIEW:
CREATE OR REPLACE VIEW FEATURE_STORE.USER_ORDER_SOURCE_V AS
SELECT USER_ID,
SUM(TOTAL_AMT) AS ORDER_TOTAL_AMT_SUM,
COUNT(ORDER_ID) AS ORDER_CNT,
AVG(TOTAL_AMT) AS ORDER_TOTAL_AMT_AVG, -- new feature
MAX(ORDER_TS) AS LAST_ORDER_TS
FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
GROUP BY USER_ID;
-- 3. Refresh — only mutable rows recomputed
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" REFRESH;
-- 4. (Optional) Remove constraint if not needed permanently
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" UNSET IMMUTABLE WHERE;After the refresh, immutable rows have NULL for ORDER_TOTAL_AMT_AVG while mutable rows have the computed value. Verify with SELECT *, METADATA$IS_IMMUTABLE FROM FEATURE_STORE."USER_ORDER_FV$V01".
With OBJECT-based Feature Views (see Chapter 12: Wide & Sparse Data), heterogeneous key sets across the immutable/mutable boundary are handled naturally. Immutable rows simply lack the new key in the OBJECT; COALESCE(features:"NEW_KEY"::NUMBER, 0) fills the default at read time. In scalar DTs, the same pattern produces a visible NULL column in historical rows.
If you later need the new feature backfilled across all history, you have two options:
Full refresh – remove the constraint and recompute everything in a single off-peak window:
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" UNSET IMMUTABLE WHERE;
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" REFRESH;Phased backfill – progressively move the immutability boundary backward (e.g., one month at a time), so each refresh recomputes only a slice of history. This spreads the cost across multiple windows and avoids a single large recompute:
-- Phase 1: backfill the most recent immutable month
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" SET
IMMUTABLE WHERE (LAST_ORDER_TS < CURRENT_TIMESTAMP() - INTERVAL '60 days');
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" REFRESH;
-- Phase 2: backfill the next month
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" SET
IMMUTABLE WHERE (LAST_ORDER_TS < CURRENT_TIMESTAMP() - INTERVAL '90 days');
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" REFRESH;
-- Continue until all history is backfilled, then lock down again
ALTER DYNAMIC TABLE FEATURE_STORE."USER_ORDER_FV$V01" SET
IMMUTABLE WHERE (LAST_ORDER_TS < CURRENT_TIMESTAMP() - INTERVAL '1 day');Phased backfill is useful when the full history is too large for a single off-hours window, or when warehouse budget constraints require distributing the compute across days.
4.5.6 Incremental Column Pickup with SELECT *
Snowflake has updated the behaviour of Dynamic Tables defined with SELECT * FROM base_table: when a new column is added to the base table, the DT now picks it up on its next refresh rather than failing with an error.
The Snowflake documentation previously stated: > “Changes to underlying base table for dynamic tables created with SELECT * > from base table: The dynamic table fails to refresh and must be > recreated to respond to the change.”
This constraint has been relaxed for additive changes (new columns). Non-additive changes — column drops, renames, and type modifications — still require recreation.
When this applies. If a Feature View’s DT is a straightforward pass-through of a base table using SELECT *, new columns added to the source automatically appear in the DT:
-- Feature View DT as a simple pass-through
CREATE DYNAMIC TABLE FEATURE_STORE."USER_FEATURES$V01"
TARGET_LAG = '1 hour'
WAREHOUSE = FS_WH
REFRESH_MODE = INCREMENTAL
AS
SELECT * FROM SOURCE_DB.DATA.USER_FEATURES_BASE;When USER_FEATURES_BASE gains a new column (e.g., RISK_SCORE FLOAT), ALTER TABLE ... ADD COLUMN RISK_SCORE FLOAT, the column will appear in the DT on its next scheduled refresh or a manual ALTER DYNAMIC TABLE ... REFRESH.
Backfill semantics for pre-existing rows. All rows — including those that existed before the column was added — are recomputed from source and receive the correct value for the new column. The REFRESH_ACTION shows INCREMENTAL with ins=N / del=N equal to the total row count: the DT deletes all existing rows and re-inserts them with the new column evaluated. There is no NULL fill for pre-existing rows.
The INCREMENTAL refresh action after a column add is a full scan of all rows in the source — every row is deleted and re-inserted. The cost is equivalent to a full refresh. The INCREMENTAL label refers to the DT engine’s change-tracking mechanism, not to processing only the rows that changed in the source.
For large DTs with many rows, a column add to the base table will therefore be an expensive operation. If the DT also has downstream dependents, each downstream DT also does a full row-level recompute in turn. Use IMMUTABLE WHERE (see below) if you need to protect the historical region from this recompute cost.
IMMUTABLE WHERE interaction. When IMMUTABLE WHERE is set on the DT, the column-add refresh respects the boundary: immutable rows are not recomputed (ins=0 / del=0 for protected rows) and their value for the new column is NULL. Mutable rows are recomputed from source and receive the correct value. This is consistent with the retroactive IMMUTABLE WHERE behaviour described above.
The SELECT * pickup is always on. Any ADD COLUMN to the base table — even one added by another team for an unrelated purpose — will trigger a full row-level recompute (ins=N/del=N) on this DT and on every downstream SELECT * DT in the cascade.
In a shared or multi-team environment where you do not own the base table, prefer the OBJECT-bundled approach with a VIEW intermediary. The VIEW acts as a gate: upstream schema changes are inert until you deliberately replace the VIEW, giving you control over timing, scope, and cost. See Chapter 12: Choosing between approaches for the full trade-off analysis.
DT cascade behaviour. Downstream DTs that also use SELECT * from the DT that gained a column will pick it up on their own next refresh. Downstream DTs with explicit column lists are unaffected until their query is updated.
When this is the right choice.
- The DT is a thin pass-through over a single managed base table where the source team adds columns intentionally.
- Consumers can tolerate
NULLfor the new column in historical rows, or the DT history is short enough that a full refresh is acceptable. - Feature count stays below ~100 columns (scalar column-per-feature stays within compilation and join-performance thresholds).
- No explicit version gate is needed — new columns become visible immediately to all consumers.
For more complex cases — computed feature expressions, wide feature sets (100+ columns), sparse data, or when historical rows must not be NULLed — the BACKFILL + IMMUTABLE WHERE workflow and the OBJECT-bundled pattern remain the recommended approaches.
Writing OBJECT_CONSTRUCT(* EXCLUDE (entity_id, ts)) AS features directly in the DT’s AS clause (no VIEW intermediary) has been prototyped and confirmed to work for pass-through bundles — where features are already derived and materialised upstream of this Feature View:
- New upstream physical columns auto-appear as OBJECT keys on the next refresh (
REFRESH_ACTION = INCREMENTAL) - The DT’s physical schema stays stable (3 columns: entity key, timestamp, OBJECT)
- All rows in the mutable region are recomputed and receive the new key value
- No VIEW to create, replace, or maintain when the source already holds the finished features
When the VIEW intermediary is the right choice. If feature derivation happens at the Feature View layer — ratios, flags, windowed functions that are not physical columns in the source — use an intermediary VIEW to hold the derivation logic and read from that VIEW with OBJECT_CONSTRUCT(*) in the DT. The key benefit: ALTER-ing the VIEW to add a new derived column is also picked up by the OBJECT(*) DT incrementally (REFRESH_ACTION = INCREMENTAL, not REINITIALIZE), giving you the same additive column pickup behaviour with the additional benefit of a controlled gate — the DT is completely unaffected by upstream schema changes until you deliberately replace the VIEW. See Chapter 12: Choosing between approaches for the full comparison.
Column drops, renames, and type changes on the base table still cause the DT to fail on refresh and require recreation. For these changes, create a new Feature View version using the standard workflow described above.
SELECT * DTs and their schema evolution are a SQL-layer capability. The Feature Store Python API (FeatureView, register_feature_view) does not expose this directly. Create the DT via SQL, then apply the Feature Store TAGs so the API and Snowsight UI can discover it (see Appendix D: Tag Convention).
4.5.7 Alternative: Base + Presentation Layer Model
The standard approach creates a new version (a new Dynamic Table) for each schema change. An alternative for organizations with complex, frequently evolving feature sets is a two-layer model that separates physical storage from the consumer-facing view-based Feature Views. This could be implemented within a single Feature Store Schema through naming convention, or two separate Feature Store Schemas to provide separation of concerns:
-- Single Feature Store Schema
CREATE SCHEMA FEATURE_STORE;
-- Two Feature Store Schemas
CREATE SCHEMA FEATURE_STORE_BASE;
CREATE SCHEMA FEATURE_STORE_PRESENTATION;flowchart TB B1[USER_ORDER_BASE DT] B2[USER_ENGAGE_BASE DT] B3[USER_PROMO_BASE DT] B1 & B2 & B3 --> COMB[USER_COMBINED_FV View joins bases]
How it works:
- Base layer – each feature domain has its own versioned Dynamic Table (or View) Feature View. These are the immutable physical units that hold the actual feature-engineering logic and materialized data.
- Presentation layer – a View-based Feature View that JOINs multiple base Feature Views together on their shared entity key (and timestamp column if required). Consumers query the presentation layer to get a unified feature set.
A key principle of this model is feature deduplication: each feature computation exists in exactly one base object. Presentation layers compose existing features without re-implementing them. When a feature definition changes, only the base layer needs a new version – the presentation layer’s SELECT * automatically picks up the new columns. If individual features within a single Feature View evolve at different rates, consider feature-level version suffixes in naming (e.g., ORDER_TOTAL_AMT_SUM_V01, ORDER_TOTAL_AMT_SUM_V02) to maintain backward compatibility while the base layer transitions.
# Base Feature Views (DT-backed, versioned independently)
user_order_base = FeatureView(
name="USER_ORDER_BASE_FV", entities=[user_entity],
feature_df=user_order_df, timestamp_col="LAST_ORDER_TS",
refresh_freq="1 hour", desc="Base: lifetime order aggregates",
)
user_engage_base = FeatureView(
name="USER_ENGAGE_BASE_FV", entities=[user_entity],
feature_df=user_engage_df, timestamp_col="LAST_SESSION_TS",
refresh_freq="1 hour", desc="Base: session engagement features",
)
# Presentation layer (View-based, combines bases)user_combined_fv = FeatureView(
name="USER_COMBINED_FV", entities=[user_entity],
feature_df=combined_df, timestamp_col="LAST_ORDER_TS",
refresh_freq=None, # View — always reflects latest base DT state
desc="Presentation: combines order + engagement base Feature Views",
)When to add a new feature, you create a new version of just the affected base DT (e.g., USER_ORDER_BASE_FV$V02 with the extra column), then update the presentation View’s SQL to reference $V02. The other base DTs are untouched – no duplication of existing features across multiple DTs.
Trade-offs:
| Standard versioning | Base + Presentation | |
|---|---|---|
| Simplicity | One DT per version, straightforward | Multiple layers, JOIN management |
| Feature duplication | New version duplicates unchanged features | Only changed base DT is re-created |
| Consumer migration | Consumers point to new version | Presentation view updated centrally |
| Best for | Small-to-medium feature sets, simple schemas | Large feature sets with frequent, incremental additions across domains |
This pattern adds operational complexity and is best suited to organizations with mature Feature Store practices and large numbers of features that evolve at different rates. For most teams starting out, standard version-per-bundle is simpler and sufficient. The Feature Store API does not natively manage the base-to-presentation relationship today – you manage the presentation View’s SQL definition yourself (or via CI/CD).
4.5.8 Version Lifecycle
flowchart LR V01[V01 Initial] --> V02[V02 Update] --> V03[V03 Current]
4.5.9 Managing Versions
List registered Feature Views:
-- Dynamic Table-backed Feature Views
SHOW DYNAMIC TABLES IN SCHEMA FEATURE_STORE_DEMO.FEATURE_STORE;
-- View-based Feature Views
SHOW VIEWS IN SCHEMA FEATURE_STORE_DEMO.FEATURE_STORE;
-- Combined column-level inventory
SELECT TABLE_NAME AS FEATURE_VIEW, COLUMN_NAME, DATA_TYPE
FROM FEATURE_STORE_DEMO.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'FEATURE_STORE'
ORDER BY TABLE_NAME, ORDINAL_POSITION;Delete an old version (only after all consumers have migrated):
Before deleting a version, verify that no models, datasets, or pipelines still depend on it:
v01_fv = fs.get_feature_view(name="USER_ORDER_FV", version="V01")
# ML-specific dependencies (datasets, models, other FVs)
downstream = v01_fv.lineage(direction="downstream")
for node in downstream:
print(f"{type(node).__name__}: {node.name}")
# Narrower: only check if any models reference V01
model_deps = v01_fv.lineage(direction="downstream", domain_filter=["model"])Requires snowflake-ml-python >= 1.6.0 and the VIEW LINEAGE privilege.
-- Snowflake-level object dependencies (DT chains, views, tasks)
SELECT *
FROM SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES
WHERE REFERENCED_OBJECT_NAME = 'USER_ORDER_FV$V01'
AND REFERENCED_OBJECT_DOMAIN IN ('TABLE', 'DYNAMIC TABLE', 'VIEW');
-- Recent query access to V01
SELECT QUERY_START_TIME, USER_NAME, QUERY_TEXT
FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY
WHERE BASE_OBJECTS_ACCESSED LIKE '%USER_ORDER_FV$V01%'
ORDER BY QUERY_START_TIME DESC
LIMIT 20;Only delete a version when both the API lineage check and the SQL dependency/access-history checks confirm zero remaining usage. See Chapter 11: Operations for more on lineage tracking.
4.5.10 Getting the Latest Version and Creating the Next
For maximum flexibility, Feature Store is intentionally not opinionated on version semantics—there’s no built-in “latest version” alias or auto-increment. Avoid hard-coding version strings throughout your codebase; instead, use helper utilities that derive the next version programmatically:
📁 Full code:
_code/version_utils.py
import re
from typing import Optional
def get_latest_feature_view(
fs: FeatureStore,
feature_view_name: str,
) -> Optional[FeatureView]:
"""Get the most recent version of a Feature View."""
versions = fs.list_feature_views(feature_view_name=feature_view_name)
# Sort by version string (works with zero-padded versions)
versions_sorted = sorted(versions, key=lambda fv: fv.version, reverse=True)
return versions_sorted[0] if versions_sorted else None
def next_version(
fs: FeatureStore,
feature_view_name: str,
prefix: str = "V",
pad: int = 2,
) -> str:
"""
Derive the next sequential version string for a Feature View.
If USER_ORDER_FV has versions V01 and V02, returns "V03".
If no versions exist, returns "V01".
"""
versions = fs.list_feature_views(feature_view_name=feature_view_name)
max_num = 0
pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
for fv in versions:
m = pattern.match(fv.version)
if m:
max_num = max(max_num, int(m.group(1)))
return f"{prefix}{max_num + 1:0{pad}d}"Usage:
version = next_version(fs, "USER_ORDER_FV") # "V03" if V01, V02 exist
fs.register_feature_view(fv, version=version)This keeps version assignment consistent across manual runs, notebooks, and CI/CD pipelines without anyone needing to remember what the current version is.
4.6 Refresh Configuration
4.6.1 Refresh Frequency Examples (time-period mode)
When using a period string (not CRON), you are configuring TARGET_LAG-style refresh for the backing Dynamic Table.
| Frequency | Example use case | Cost impact |
|---|---|---|
1 minute |
Tight operational or fraud signals | Very high |
15 minutes |
Operational dashboards | High |
1 hour |
Standard batch ML features | Medium |
1 day |
Daily analytics features | Low |
4.6.2 Best Practices for Refresh
# Align period with how often upstream CLICKSTREAM_DATA tables land meaningful deltas
fv = FeatureView(
refresh_freq="4 hours",
# ...
)
# For Online Feature Tables, OFT sync frequency should be <= Feature View refresh cadenceNo changes = no compute. The Dynamic Table scheduler checks for upstream changes before triggering a refresh. If no source data has changed since the last refresh, the scheduler skips the cycle without consuming any warehouse compute. This means setting a shorter refresh_freq than the actual source update rate incurs near-zero cost from the change-detection check itself – the DT simply checks more often and finds nothing to do. You do not need to perfectly match refresh_freq to source cadence to avoid wasted spend; err on the side of slightly shorter lag for better freshness guarantees.
4.7 Feature View Ownership & SLAs
The Feature Store API provides desc on Feature Views and Entities, plus Snowflake object tags on the underlying Dynamic Tables and Views. For many teams, this is sufficient. However, the API does not natively track ownership, consumer lists, SLA targets, or other operational metadata that mature ML platform users might need for governance and compliance.
You have several options for extending the metadata layer to meet these needs – none are part of the Feature Store product itself, but all integrate naturally using Snowflake’s built-in features:
Snowflake Tags. Attach key-value metadata directly to Feature View objects. Tags are queryable via INFORMATION_SCHEMA and TAG_REFERENCES, and can be used for governance, cost attribution, and discovery:
CREATE TAG IF NOT EXISTS FS_OWNER ALLOWED_VALUES 'marketing-ds', 'risk-eng', 'platform';
CREATE TAG IF NOT EXISTS FS_SLA_MINUTES;
ALTER DYNAMIC TABLE FEATURE_STORE.USER_ORDER_FV$V01
SET TAG FS_OWNER = 'marketing-ds', FS_SLA_MINUTES = '90';
-- Query ownership across all Feature Views
SELECT OBJECT_NAME, TAG_NAME, TAG_VALUE
FROM TABLE(INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS(
'FEATURE_STORE_DEMO.FEATURE_STORE', 'SCHEMA'))
WHERE TAG_NAME IN ('FS_OWNER', 'FS_SLA_MINUTES');Metadata tables. Maintain a registry table within your Feature Store schema, keyed by the object name (<FV_NAME>$<VERSION>). This can hold owner, team, consumer list, SLA targets, deprecation status, and anything else you need:
CREATE TABLE IF NOT EXISTS FEATURE_STORE.FEATURE_VIEW_METADATA (
FV_OBJECT_NAME STRING PRIMARY KEY, -- e.g. USER_ORDER_FV$V01
OWNER STRING,
TEAM STRING,
CONSUMERS ARRAY,
SLA_REFRESH STRING,
SLA_MAX_LATENCY_MINUTES NUMBER,
DEPRECATED BOOLEAN DEFAULT FALSE,
NOTES STRING
);You can build helper functions that populate this table automatically when registering a Feature View, and query it for discovery, alerting, or CI/CD gating.
External manifest files. For teams that manage Feature Store definitions in Git alongside CI/CD, a YAML manifest can serve as the source of truth for ownership and SLA metadata. This is not consumed by Snowflake, but drives your deployment scripts and documentation:
# feature_manifest.yaml
feature_views:
- name: USER_ORDER_AGGREGATES
owner: marketing-ds@company.com
team: Marketing Data Science
description: Core user behavior from ORDERS and ORDER_ITEMS
consumers:
- churn-model
- ltv-model
sla:
refresh_frequency: "1 hour"
max_latency_minutes: 90Snowflake Contacts. Contacts are schema-level objects that you assign to databases, schemas, tables, views, and Dynamic Tables to indicate who is responsible for different purposes. They integrate with Snowsight’s Catalog and are queryable via ACCOUNT_USAGE views:
-- Create contacts for the Feature Store team
CREATE CONTACT IF NOT EXISTS FS_STEWARD
METHOD EMAIL 'feature-store-team@company.com';
CREATE CONTACT IF NOT EXISTS FS_SUPPORT
METHOD EMAIL 'fs-support@company.com';
-- Assign at schema level -- all Feature Views inherit automatically
ALTER SCHEMA FEATURE_STORE_DEMO.FEATURE_STORE
SET CONTACT STEWARD = FS_STEWARD,
SUPPORT = FS_SUPPORT;
-- Override for a specific Feature View if a different team owns it
ALTER DYNAMIC TABLE FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01"
SET CONTACT STEWARD = MARKETING_DS_STEWARD;Each assignment specifies a purpose – STEWARD (data accuracy), SUPPORT (technical questions), or ACCESS_APPROVAL (who approves access). Contacts inherit down the hierarchy (database → schema → table), so assigning at the schema level covers all Feature Views with a single statement. Direct assignment on a child overrides the inherited contact for that purpose.
To audit contact assignments across your Feature Store:
SELECT object_name, purpose,
CASE WHEN is_inherited THEN 'Inherited' ELSE 'Direct' END AS assignment
FROM SNOWFLAKE.ACCOUNT_USAGE.CONTACT_REFERENCES
WHERE object_database = 'FEATURE_STORE_DEMO'
AND object_schema = 'FEATURE_STORE'
AND deleted IS NULL
ORDER BY object_name, purpose;Cortex-powered object descriptions. Snowflake can automatically generate human-readable descriptions for tables, views, and their columns using Snowflake-hosted LLMs. This is useful for Feature Store objects where feature names alone (e.g., ORDER_TOTAL_AMT_SUM) may not be self-explanatory to consumers unfamiliar with the domain. Descriptions are stored as the standard COMMENT on the object, so they appear in DESCRIBE, SHOW, Snowsight’s Catalog, and any data catalog tooling.
You can generate descriptions from Snowsight (navigate to the object → “Generate description”) or via SQL using the AI_GENERATE_TABLE_DESC stored procedure:
-- Generate descriptions for a Feature View's DT and all its columns
CALL AI_GENERATE_TABLE_DESC(
'FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01"',
{ 'describe_columns': true, 'use_table_data': true }
);The procedure returns a JSON object with suggested descriptions. To persist them, set them as comments:
ALTER DYNAMIC TABLE FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01"
SET COMMENT = 'User-level purchase aggregates from clickstream ORDERS: total spend, order count, average order value, and most recent order timestamp.';
ALTER DYNAMIC TABLE FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01"
MODIFY COLUMN ORDER_TOTAL_AMT_SUM COMMENT = 'Sum of all order amounts (TOTAL_AMT) for the user';For Feature Stores with many Feature Views, you can script this across an entire schema using the stored procedure examples in the documentation. Generated descriptions significantly improve feature discoverability – consumers searching the catalog can find features by what they represent, not just by column name.
These approaches are complementary. A common pattern is to use Contacts for ownership and stewardship visible in Snowsight, Tags for lightweight governance queries and cost attribution, Cortex descriptions for human-readable feature documentation, a metadata table for operational dashboards and alerting, and a YAML manifest in Git for CI/CD-driven deployments. See Chapter 11: Operations for monitoring SLA compliance and auditing metadata at scale.
4.8 Hierarchical Feature Views
When source data exists at a child grain (e.g., line items, events) but the model operates at a parent grain (e.g., orders, users), create aggregated rollup Feature Views that project child-level data up to the parent entity.
If you need to roll up from a finer-grained tiled (Time Aggregation) Feature View to a coarser entity, use RollupConfig (see Chapter 7: Rollup Feature Views). Do not create a new tiled Feature View that reads from another tiled Feature View – this causes duplicate internal _PARTIAL_* column names and SQL compilation errors. The correct hierarchy pattern for tiled FVs is: raw events → tiled FV (fine entity) → Rollup FV (coarse entity) via RollupConfig.
# Raw lineitem Feature View at the finest grain (optional: useful for line-level models)
lineitem_fv = FeatureView(
name="LINEITEM_FV", entities=[lineitem_entity],
feature_df=lineitem_df, timestamp_col="SHIP_DATE",
refresh_freq="1 hour",
desc="Raw lineitem features at (ORDER_ID, LINE_NUMBER) grain",
)import snowflake.snowpark.functions as F
order_items = session.table("ORDER_ITEMS")
order_lineitem_agg_df = order_items.group_by("ORDER_ID").agg(
F.sum("EXTENDED_PRICE").alias("ORDER_TOTAL_AMT"),
F.count("*").alias("LINE_COUNT"),
F.avg("EXTENDED_PRICE").alias("AVG_LINE_PRICE"),
F.max("SHIP_DATE").alias("LAST_SHIP_DATE"),
)order_agg_fv = FeatureView(
name="ORDER_LINEITEM_AGG_FV", entities=[order_entity],
feature_df=order_lineitem_agg_df, timestamp_col="LAST_SHIP_DATE",
refresh_freq="1 hour",
desc="Lineitem features aggregated to the order level",
)When generate_training_set or generate_dataset joins Feature Views at different entity grains, higher-level features are denormalized – repeated for every child row in the training data. For example, a spine at the lineitem grain joined with both lineitem_fv and order_agg_fv produces one row per lineitem, with order-level aggregates duplicated across all lines of the same order.
If denormalization is undesirable (e.g., the model expects one row per order), either:
- Use a spine at the parent grain and only include parent-level Feature Views, or
- Create a pivot Feature View that widens child data into columns (e.g.,
LINE_1_PRICE,LINE_2_PRICE) – though this is only practical with a bounded number of children. For unbounded child sets, useOBJECT_AGG,ARRAY_AGG, orLISTAGGto pack child data into a single column, then process it at retrieval time in Python (e.g., extract the last-N values or compute further aggregations over the array). See Chapter 12: Wide & Sparse Feature Data for the OBJECT packing pattern.
See Chapter 3: Modeling at a Different Grain for entity design considerations and Chapter 10: Training & Inference for multi-entity spine design.
4.9 Best Practices
1. Use Consistent Naming
# Pattern: <ENTITY>_<DOMAIN>_FV (SCREAMING_SNAKE_CASE)
"USER_ORDER_FV"
"PRODUCT_CATALOG_FV"
"SESSION_ENGAGEMENT_FV"Entity keys and table names should stay singular SCREAMING_SNAKE_CASE where applicable (USER, PRODUCT, …). Prefer aggregate suffixes _CNT, _SUM, _AVG, _TS, boolean prefixes IS_, and monetary columns like TOTAL_AMT (not AMOUNT).
When many Feature Views share the same entity (common in large deployments – 50+ Feature Views for a single entity like USER), column name collisions become inevitable. Rather than resolving collisions reactively during training data generation, adopt a prefixing convention as part of your Feature View naming standard from day one. Options include:
- Use
auto_prefix=Trueon allgenerate_dataset/generate_training_setcalls (columns becomeFV_NAME__FEATURE_NAME). - Use short
.with_name()prefixes per domain (e.g.,"ord","sess","eng") for more concise column names. - Embed the domain prefix in the
.alias()call on eachFeaturedefinition (e.g.,Feature.count("ORDER_ID", "7d").alias("ORD_ORDER_ID_CNT_7D")).
The choice affects downstream model code, feature importance reporting, and monitoring dashboards – so it is worth standardizing before the Feature View count grows large. See Chapter 7: Prefixing and Chapter 10: Feature Column Prefixing for detailed patterns.
2. Consolidate Feature Views from the Same Source
When multiple tiled Feature Views share the same source table, entity, and tile granularity, consolidate them into a single Feature View. Separate Feature Views over identical sources create independent tile DTs, duplicating compute. See Chapter 7: Consolidate Feature Views for detailed guidance and exceptions.
3. Document Feature Definitions
fv = FeatureView(
name="USER_ORDER_FV",
desc="""
USER-level order features from CLICKSTREAM_DATA.ORDERS for propensity models.
Features:
- ORDER_TOTAL_AMT_SUM: Sum of TOTAL_AMT across orders
- ORDER_CNT: Count of orders
- ORDER_TOTAL_AMT_AVG: Average order TOTAL_AMT
- LAST_ORDER_TS: Latest ORDER_TS (point-in-time column)
SLA: Refreshed hourly (TARGET_LAG), max 90 min lag
Owner: ml-platform@company.com
""",
# ...
)The Feature Store API also supports per-feature descriptions via attach_feature_desc(). These descriptions are stored as metadata and surfaced in list_columns(), making individual features discoverable:
fv = fv.attach_feature_desc({
"ORDER_TOTAL_AMT_SUM": "Sum of TOTAL_AMT across all orders for this user",
"ORDER_CNT": "Count of orders placed by this user",
"ORDER_TOTAL_AMT_AVG": "Average TOTAL_AMT per order",
"LAST_ORDER_TS": "Timestamp of the most recent order (point-in-time column)",
})
registered_fv = fs.register_feature_view(fv, "V01")
registered_fv.list_columns().show()4. Version on Any Consumer-Visible Change
# V01: Initial column set
# V02: Added ORDERS_7D_CNT
# V03: Changed aggregation logic (breaking) — new version, never in-place edit
# Increment version when you:
# - Add or remove features
# - Change calculations or filters
# - Change entity keys or timestamp semantics4.10 Common Pitfalls
4.10.1 ❌ Pitfall 1: No Versioning
Problem: Registering changes under the same version breaks dependent models.
Solution: Always register new versions (V02, V03, …) for consumer-visible changes.
4.10.2 ❌ Pitfall 2: Modifying a Registered Feature View In Place
Problem: Overwriting definitions for an existing name/version breaks replay and training reproducibility.
Solution: Treat each registered version as immutable; publish a new version instead.
4.10.3 ❌ Pitfall 3: Misunderstanding Refresh Cost
Problem: Assuming a short refresh_freq always incurs high compute cost, or conversely, setting an unnecessarily long lag “to save money.”
Solution: The DT scheduler’s change-detection check is near-zero cost – no warehouse compute is used when nothing has changed. A shorter lag than the source update frequency is safe and provides better freshness guarantees. However, if the source does update frequently and the DT query is expensive, a shorter lag means more frequent full refreshes. Match refresh_freq to the freshness your consumers actually need, not to the source rate.
4.10.4 ❌ Pitfall 4: No Documentation
Problem: Teams cannot trust or reuse features.
Solution: Use the desc parameter and keep a feature catalog in sync with V01, V02, …
4.11 Summary
| Concept | Description |
|---|---|
| View Feature View | refresh_freq=None (default): query-time; pass-through, declarative transforms, or dev precursor to DT |
| Dynamic Table (period) | refresh_freq="1 hour" (etc.): materialized; TARGET_LAG refresh; no compute when source unchanged |
| Dynamic Table + Task (CRON) | refresh_freq="0 8 * * *": initial DT refresh, then Task on schedule |
| Version (V01, V02, …) | Immutable snapshot of feature definitions; $-separated object names |
| Schema evolution | New features and breaking changes → new version; BACKFILL for cost-efficient evolution; base + presentation layer as alternative |
| Version retirement | Validate zero usage via lineage and access history before deleting old versions |
| Ownership & SLAs | Tags, Contacts, metadata tables, YAML manifests, Cortex descriptions |
4.12 Next Steps
Continue to Chapter 5: Feature Pipelines to learn about pipeline architectures and orchestration patterns.