# Define time-windowed aggregations declaratively (aliases: _SUM, _CNT, _AVG, …)
features = [
Feature.sum("TOTAL_AMT", "7d").alias("TOTAL_AMT_SUM_7D"),
Feature.sum("TOTAL_AMT", "30d").alias("TOTAL_AMT_SUM_30D"),
Feature.count("ORDER_ID", "7d").alias("ORDER_ID_CNT_7D"),
Feature.avg("TOTAL_AMT", "7d").alias("TOTAL_AMT_AVG_7D"),
Feature.max("TOTAL_AMT", "30d").alias("TOTAL_AMT_MAX_30D"),
]
# Create tiled Feature View (source: canonical clickstream ORDERS)
orders_fqn = "FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS"
user_agg_fv = FeatureView(
name="USER_PURCHASE_AGGREGATES",
entities=[user_entity],
feature_df=session.table(orders_fqn),
timestamp_col="ORDER_TS",
refresh_freq="1 hour",
feature_granularity="1 hour",
features=features,
)
registered_agg_fv = fs.register_feature_view(
feature_view=user_agg_fv, version="V01", block=True, overwrite=True,
)
print(f"Registered: {registered_agg_fv.name}/V01 (tiled={registered_agg_fv.is_tiled})")
print(f"Granularity: {registered_agg_fv.feature_granularity}")7 Aggregations API
The Feature class for declarative time-windowed aggregations
snowflake, feature store, ml, machine learning, mlops
7.1 Overview
The Aggregations API provides a declarative way to define time-windowed features using the Feature class. Instead of writing complex SQL with window functions, you specify what aggregations you want and the Feature Store handles the efficient computation.
Examples in this chapter use the canonical demo layout: database FEATURE_STORE_DEMO, Feature Store schema FEATURE_STORE, clickstream source schema CLICKSTREAM_DATA, warehouse FS_DEV_WH, and tables such as ORDERS (ORDER_TS, ORDER_ID, USER_ID, TOTAL_AMT).
The Feature class and tiled aggregation API require snowflake-ml-python >= 1.24.0. This is a client-side SDK feature – it generates standard SQL (Dynamic Tables, window functions, aggregations) under the hood, so no account-level enablement or feature flag is needed. Install the required SDK version and you are ready to use it.
Official API documentation on the Snowflake docs portal may be limited at time of reading. For the authoritative parameter reference, use help(Feature) in a Python session or inspect the SDK source. API names and signatures may evolve in future releases.
See also the platform capabilities and preview status table in Chapter 12.
7.2 Learning Objectives
After completing this chapter, you will be able to:
- Use the Feature class to define time-windowed aggregations
- Understand tiling for efficient incremental computation
- Configure feature granularity and refresh frequencies
- Apply feature column prefixing for disambiguation
- Name aggregated features following best practices
- Generate features programmatically using data-driven patterns (categorical pivots, cross-product factories, config-driven specs)
📂 Chapter code: Browse companion scripts on GitHub
7.3 The Feature Class
7.3.1 Basic Usage
📁 Full code:
_code/feature_class.py
7.3.2 Available Aggregation Functions
| Function | Description | Example |
|---|---|---|
Feature.sum(col, window) |
Sum over window | Feature.sum("TOTAL_AMT", "7d") |
Feature.min(col, window) |
Minimum over window | Feature.min("TOTAL_AMT", "30d") |
Feature.max(col, window) |
Maximum over window | Feature.max("TOTAL_AMT", "30d") |
Feature.count(col, window) |
Count over window | Feature.count("ORDER_ID", "30d") |
Feature.avg(col, window) |
Average over window | Feature.avg("TOTAL_AMT", "7d") |
Feature.std(col, window) |
Standard deviation | Feature.std("TOTAL_AMT", "30d") |
Feature.var(col, window) |
Variance | Feature.var("TOTAL_AMT", "30d") |
Feature.last_n(col, window, n=N) |
N most recent values | Feature.last_n("PRODUCT_ID", "7d", n=5) |
Feature.first_n(col, window, n=N) |
N oldest values | Feature.first_n("PRODUCT_ID", "7d", n=5) |
Feature.last_distinct_n(col, window, n=N) |
N most recent unique values | Feature.last_distinct_n("PRODUCT_ID", "7d", n=5) |
Feature.first_distinct_n(col, window, n=N) |
N oldest unique values | Feature.first_distinct_n("PRODUCT_ID", "7d", n=5) |
Feature.approx_count_distinct(col, window) |
HLL distinct count | Feature.approx_count_distinct("PRODUCT_ID", "7d") |
Feature.approx_percentile(col, window, percentile=0.5) |
Approximate percentile | Feature.approx_percentile("TOTAL_AMT", "30d", percentile=0.5) |
Aggregation Feature Views backed by Dynamic Tables may fall back to full refresh if the underlying SQL combines aggregations on FLOAT-typed expressions (sum, avg, min, max, std, var) with a JOIN in the same query block. The DT engine cannot guarantee incremental correctness for floating-point aggregate diffs across joins.
If your source columns are FLOAT/DOUBLE, cast them to a fixed-point type before the aggregation reaches the DT definition:
The Temporal Aggregation API generates the underlying DT SQL automatically, so if your source table’s numeric columns are FLOAT and the API introduces a join (e.g., for tiling or rollup), the resulting DT may silently use full refresh. Check REFRESH_MODE in INFORMATION_SCHEMA.DYNAMIC_TABLES() after registration and alter source column types to NUMBER if needed.
ML model impact: For most ML use cases the precision difference between FLOAT and NUMBER(38, 8) is negligible – the variance across training runs or hyperparameter choices far exceeds the rounding delta. If you need higher fidelity, NUMBER(38, 12) provides ~12 decimal digits of precision, comparable to FLOAT for the value ranges typical in feature engineering. The key trade-off is incremental refresh (fast, cheap) vs full refresh (slow, expensive) – the precision loss from fixed-point casting is almost always the better deal.
Recommended approach for source table owners: Alter the source column type to NUMBER/DECIMAL at the table level rather than casting in every Feature View query. This avoids repeating the cast in multiple Feature Views and ensures all downstream consumers benefit:
ALTER TABLE FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
ALTER COLUMN TOTAL_AMT SET DATA TYPE NUMBER(18, 8);For rollup Feature Views, the same constraint applies transitively – if the source tiled Feature View aggregates FLOAT columns, the rollup DT also falls back to full refresh. Fix the types at the source level.
7.3.3 Aggregation Function Performance Characteristics
Not all aggregation functions are equal in terms of tile storage, computation cost, and incremental refresh compatibility. Understanding these trade-offs is important when designing features at scale.
| Function | Tile storage | Incremental | Notes |
|---|---|---|---|
count |
Partial counts per tile | Yes | Lightweight; most efficient aggregation |
sum |
Partial sums per tile | Yes (if not FLOAT+JOIN) |
Cast to NUMBER if source is FLOAT |
min, max |
Per-tile min/max | Yes | Used for windowed extremes |
avg |
Stored as sum + count | Yes (derived) | Reconstructed from partial sum/count at query time |
std, var |
Partial sum, sum-of-squares, count | Yes (derived) | Higher storage per tile; 3 partial aggregates |
approx_count_distinct |
HLL sketch per tile | Yes | Fixed-size sketch (~12 KB); merges across tiles |
last_n, first_n |
Array per tile | Yes | Storage scales with n; larger n increases tile size |
last_distinct_n, first_distinct_n |
Deduplicated array per tile | Yes | More expensive than last_n due to dedup logic |
approx_percentile |
T-Digest sketch per tile | Yes | Fixed-size sketch; approximate result |
Guidance for high-cardinality, high-volume data:
- Prefer
countandsumas primary features – they are the most efficient for both tiling and query-time reassembly. - Use
approx_count_distinctinstead of exact distinct counts when precision within ~2% is acceptable. The HLL sketch is constant-size regardless of cardinality, whereas exact distinct counts require per-tile array storage. - Limit
last_n/first_nvalues ofnto what the model actually needs. Each tile stores up tonvalues; atn=100with hourly tiles and 90-day windows, tile storage grows substantially. stdandvarstore three partial aggregates per tile (sum, sum-of-squares, count). If you only need variance for a subset of features, consider computing it post-retrieval fromsumandcountfeatures to reduce tile storage.
7.3.4 Filter Expressions Within Features
When the same source data feeds features that differ only by a filter condition (e.g., count of automated messages vs. count of all messages), use separate Feature definitions with filter logic in the source feature_df rather than creating multiple Feature Views:
import snowflake.snowpark.functions as F
orders = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS")
filtered_source_df = orders.select(
F.col("USER_ID"),
F.col("EVENT_TS"),
F.col("ORDER_ID"),
F.col("TOTAL_AMT"),
F.when(F.col("CHANNEL") == "AUTOMATED", F.col("ORDER_ID")).alias("AUTO_ORDER_ID"),
F.when(F.col("CHANNEL") == "AUTOMATED", F.col("TOTAL_AMT")).alias("AUTO_TOTAL_AMT"),
)features = [
Feature.count("ORDER_ID", "7d").alias("ORDER_CNT_7D"),
Feature.count("AUTO_ORDER_ID", "7d").alias("AUTO_ORDER_CNT_7D"),
Feature.sum("TOTAL_AMT", "30d").alias("SPEND_SUM_30D"),
Feature.sum("AUTO_TOTAL_AMT", "30d").alias("AUTO_SPEND_SUM_30D"),
]This approach keeps all related features in a single Feature View, which means they share tiles and refresh together – significantly more efficient than creating separate Feature Views per filter condition. The CASE WHEN produces NULL for non-matching rows, and aggregation functions skip NULLs, giving the correct filtered result.
The static approach above works well when the filter values are stable and few. When dimension values change over time (new channels, product categories, regions), see Data-Driven Feature Definitions for a programmatic alternative that discovers values from data and generates the CASE columns and Feature objects automatically.
7.3.5 Window Specifications
Windows can be specified using shorthand notation:
| Notation | Meaning |
|---|---|
"1h" or "1 hour" |
1 hour |
"24h" |
24 hours |
"7d" or "7 day" |
7 days |
"30d" |
30 days |
"1w" or "1 week" |
1 week |
7.3.6 Window offset
Pass offset to shift the aggregation window backward in time. This is the standard way to build comparative features such as week-over-week or month-over-month trends:
features = [
Feature.sum("TOTAL_AMT", "7d").alias("TOTAL_AMT_SUM_CURRENT_WEEK"),
Feature.sum("TOTAL_AMT", "7d", offset="7d").alias("TOTAL_AMT_SUM_PREV_WEEK"),
]Pair the current-window and offset-window versions to capture momentum or change over time. For example, a model can detect spending acceleration when TOTAL_AMT_SUM_CURRENT_WEEK exceeds TOTAL_AMT_SUM_PREV_WEEK.
The offset must be a multiple of feature_granularity so the shifted window aligns cleanly to tile boundaries. For example, with feature_granularity="1h", valid offsets are "1h", "2h", "24h", "7d", etc.
Use .alias("NAME") on each Feature to set the output column name.
7.4 Naming Aggregated Features
7.4.1 Best Practice: Descriptive Aliases
Always use .alias() to give features meaningful names:
# ✅ GOOD: Suffix metric role (_SUM, _CNT, _AVG) and window
features = [
Feature.sum("TOTAL_AMT", "7d").alias("TOTAL_AMT_SUM_7D"),
Feature.sum("TOTAL_AMT", "30d").alias("TOTAL_AMT_SUM_30D"),
Feature.count("ORDER_ID", "7d").alias("ORDER_ID_CNT_7D"),
]
# ❌ BAD: Auto-generated names are hard to interpret
features = [
Feature.sum("TOTAL_AMT", "7d"), # opaque default name
Feature.sum("TOTAL_AMT", "30d"),
]7.5 Feature Column Prefixing for Disambiguation
When joining features from multiple Feature Views, column name collisions can occur. Feature Store provides two mechanisms for disambiguation.
7.5.1 The Problem
# Name collision: same aggregation function on same column, different entity groupings,
# using the default naming derivation -- both resolve to ORDER_ID_CNT_7D
user_orders_fv = FeatureView(
name="USER_ORDERS",
entities=[user_entity],
features=[Feature.count("ORDER_ID", "7d")], # default name: ORDER_ID_CNT_7D
# ...
)
merchant_orders_fv = FeatureView(
name="MERCHANT_ORDERS",
entities=[merchant_entity],
features=[Feature.count("ORDER_ID", "7d")], # same default name: ORDER_ID_CNT_7D
# ...
)
# Without prefixing, collision occurs
dataset = fs.generate_dataset(
spine_df=spine,
features=[user_orders_fv, user_sessions_fv], # Which ORDER_ID_CNT_7D?
)Prefer explicit .alias() calls with distinct names (e.g., MERCHANT_ORDER_ID_CNT_7D) or use auto_prefix=True / .with_name() so semantics stay clear even when multiple Feature Views aggregate the same source column.
7.5.2 Solution 1: Auto-Prefix
Use auto_prefix=True to automatically prefix columns with Feature View name:
📁 Full code:
_code/prefixing.py
7.5.3 Solution 2: Custom Prefixes with .with_name()
For more control, use .with_name() on individual Feature Views or slices:
7.5.4 Solution 3: Combining Both Approaches
dataset = fs.generate_dataset(
spine_df=spine,
features=[
user_orders_fv.with_name("orders"), # Custom prefix
user_sessions_fv, # Will use auto_prefix
],
auto_prefix=True,
).with_name() takes precedence over auto_prefix. If a Feature View has .with_name() set, that prefix is used regardless of auto_prefix setting.
7.5.5 Choosing a Prefixing Strategy
| Scenario | Recommendation |
|---|---|
| Single Feature View | No prefix needed |
| Multiple FVs, unique column names | No prefix needed |
| Multiple FVs, potential collisions | Use auto_prefix=True |
| Want short, custom prefixes | Use .with_name() |
| Consistent naming across projects | Define naming convention + .with_name() |
7.6 Feature Granularity (Tiling)
7.6.1 How Tiling Works
The feature_granularity parameter sets the tile size for incremental computation:
7.6.2 Tile Size Recommendations
| Data Pattern | Recommended Tile | Reason |
|---|---|---|
| Clickstream (high frequency) | 1 hour |
Many events per hour |
| Transactions (medium) | 1 day |
Daily aggregation patterns |
| Weekly metrics | 1 week |
Coarse granularity sufficient |
7.6.3 Multi-Granularity Tile Strategy
When Feature Views need to serve both short lookback windows (1d, 7d) and long lookback windows (90d, 360d), a single tile size creates a trade-off: small tiles (1 hour) provide precision for short windows but require combining thousands of tiles for 360-day windows; large tiles (1 day) are efficient for long windows but lose sub-day precision.
The recommended approach is to use multiple Feature Views at different tile granularities, each optimized for the windows it serves:
short_window_features = [
Feature.sum("TOTAL_AMT", "1d").alias("SPEND_SUM_1D"),
Feature.sum("TOTAL_AMT", "7d").alias("SPEND_SUM_7D"),
Feature.count("ORDER_ID", "1d").alias("ORDER_CNT_1D"),
Feature.count("ORDER_ID", "7d").alias("ORDER_CNT_7D"),
]
short_window_fv = FeatureView(
name="USER_PURCHASE_SHORT_WINDOW",
entities=[user_entity],
feature_df=session.table(orders_fqn),
timestamp_col="ORDER_TS",
refresh_freq="1 hour",
feature_granularity="1 hour",
features=short_window_features,
desc="Short-window purchase features (1d, 7d) with hourly tiles",
)
long_window_features = [
Feature.sum("TOTAL_AMT", "30d").alias("SPEND_SUM_30D"),
Feature.sum("TOTAL_AMT", "90d").alias("SPEND_SUM_90D"),
Feature.sum("TOTAL_AMT", "360d").alias("SPEND_SUM_360D"),
Feature.count("ORDER_ID", "30d").alias("ORDER_CNT_30D"),
Feature.count("ORDER_ID", "360d").alias("ORDER_CNT_360D"),
]
long_window_fv = FeatureView(
name="USER_PURCHASE_LONG_WINDOW",
entities=[user_entity],
feature_df=session.table(orders_fqn),
timestamp_col="ORDER_TS",
refresh_freq="1 day",
feature_granularity="1 day",
features=long_window_features,
desc="Long-window purchase features (30d-360d) with daily tiles",
)At training time, join both Feature Views on the same spine:
training_df = fs.generate_training_set(
spine_df=spine,
features=[short_window_fv, long_window_fv],
spine_timestamp_col="EVENT_TS",
)Trade-offs:
| Approach | Tiles for 360d window | Refresh cost | Short-window precision |
|---|---|---|---|
Single FV, 1 hour tiles |
8,640 tiles | High (hourly refresh) | Sub-hour |
Single FV, 1 day tiles |
360 tiles | Low (daily refresh) | Daily only |
| Two FVs (recommended) | 360 tiles (daily FV) + 168 tiles for 7d (hourly FV) | Balanced | Sub-hour for short, daily for long |
The two-FV approach also lets you set different refresh_freq values: hourly for short windows where freshness matters, daily for long windows where the marginal value of hourly refresh is negligible.
7.7 Multi-Window Features
Define features with multiple time windows efficiently:
features = [
Feature.sum("TOTAL_AMT", "1d").alias("TOTAL_AMT_SUM_1D"),
Feature.sum("TOTAL_AMT", "7d").alias("TOTAL_AMT_SUM_7D"),
Feature.sum("TOTAL_AMT", "30d").alias("TOTAL_AMT_SUM_30D"),
Feature.sum("TOTAL_AMT", "90d").alias("TOTAL_AMT_SUM_90D"),
Feature.count("ORDER_ID", "7d").alias("ORDER_ID_CNT_7D"),
Feature.count("ORDER_ID", "30d").alias("ORDER_ID_CNT_30D"),
]
# Single Feature View efficiently computes all windows (required FV args shown above)
multi_window_fv = FeatureView(
name="USER_MULTI_WINDOW",
entities=[user_entity],
feature_df=session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS"),
timestamp_col="ORDER_TS",
refresh_freq="1 day",
features=features,
feature_granularity="1 day",
)When you have many columns and windows, a helper function avoids repetitive boilerplate and enforces consistent naming:
from snowflake.ml.feature_store import Feature
def build_features(columns: list[str], windows: list[str], agg: str = "sum") -> list[Feature]:
agg_fn = getattr(Feature, agg)
return [
agg_fn(col, w).alias(f"{col}_{agg.upper()}_{w.upper()}")
for col in columns
for w in windows
]
features = (
build_features(["TOTAL_AMT", "DISCOUNT_AMT"], ["7d", "30d", "90d"], "sum")
+ build_features(["ORDER_ID"], ["7d", "30d"], "count")
)7.8 Best Practices
1. Use Consistent Naming Conventions
# Pattern: <COL_OR_ENTITY>_<ROLE>_<WINDOW> (roles: _SUM, _CNT, _AVG, …)
features = [
Feature.sum("TOTAL_AMT", "7d").alias("TOTAL_AMT_SUM_7D"),
Feature.sum("TOTAL_AMT", "30d").alias("TOTAL_AMT_SUM_30D"),
Feature.count("ORDER_ID", "7d").alias("ORDER_ID_CNT_7D"),
Feature.avg("TOTAL_AMT", "7d").alias("TOTAL_AMT_AVG_7D"),
]2. Match Tile Size to Data Frequency
3. Group Related Features and Consolidate Feature Views
When multiple Feature Views share the same source table, entity, and tile granularity, they automatically share the same underlying tile computation. Consolidating them into a single Feature View reduces operational overhead (fewer DTs to monitor) and ensures tile computation happens once:
# ✅ GOOD: Related features in same Feature View -- single tile computation
purchase_features = [
Feature.sum("TOTAL_AMT", "7d").alias("TOTAL_AMT_SUM_7D"),
Feature.sum("TOTAL_AMT", "30d").alias("TOTAL_AMT_SUM_30D"),
Feature.count("ORDER_ID", "7d").alias("ORDER_ID_CNT_7D"),
Feature.count("ORDER_ID", "30d").alias("ORDER_ID_CNT_30D"),
Feature.avg("TOTAL_AMT", "7d").alias("TOTAL_AMT_AVG_7D"),
]
# ❌ BAD: Same source split across multiple FVs -- duplicated tile computation
spend_fv = FeatureView(
name="USER_SPEND_FV",
feature_df=session.table(orders_fqn),
features=[Feature.sum("TOTAL_AMT", "7d").alias("TOTAL_AMT_SUM_7D")],
# ...
)
count_fv = FeatureView(
name="USER_ORDER_COUNT_FV",
feature_df=session.table(orders_fqn),
features=[Feature.count("ORDER_ID", "7d").alias("ORDER_ID_CNT_7D")],
# ...
)Splitting features from the same source into separate Feature Views forces independent tile computations over the same data. Each Feature View creates its own DT with its own tiles, effectively doubling (or more) the compute cost. Consolidate into one Feature View when the source, entity, and granularity are shared.
When separate Feature Views are appropriate:
- Different source tables (e.g., ORDERS vs EVENTS)
- Different entities (e.g., USER vs PRODUCT)
- Different tile granularities (e.g., hourly tiles for short windows, daily for long – see Multi-Granularity)
- Different refresh cadences needed for operational reasons
- Unrelated feature domains that evolve independently
4. Tile Sharing Across Feature Views
When two Feature Views use the same source, entity, and feature_granularity, the underlying tile DTs compute identical partial aggregates. Today, the Feature Store does not automatically share tiles between separately registered Feature Views – each creates its own tile DT. This means:
- Consolidating features into fewer Feature Views (per best practice #3 above) is the primary mechanism for avoiding duplicate tile computation.
- If you need separate Feature Views for organisational reasons (different owners, different version lifecycles), be aware that tile compute is duplicated. The cost impact depends on source data volume and tile granularity.
- A future platform optimisation may deduplicate tiles across Feature Views with identical source/entity/granularity, but until then, consolidation is the recommended approach.
You cannot chain (stack) one Time Aggregation Feature View on top of another. Attempting it causes duplicate internal _PARTIAL_* column names and SQL compilation errors that block both retrieve_feature_values() and generate_training_set(). To aggregate from a finer-grained entity to a coarser one (e.g., visitor → subscriber), use Rollup Feature Views with RollupConfig as described below – not raw stacking of tiled Feature Views.
7.9 Rollup Aggregated Feature Views
Rollup Feature Views enable hierarchical aggregation – creating coarser-grained features from finer-grained tiled Feature Views without recomputing from raw data. For example, clickstream-style hierarchies often roll up VISITOR → USER → HOUSEHOLD: a visitor-level “7-day page view count” can be rolled up to user-level (sum across all visitors linked to that user) and then to household-level (sum across all users in the household).
The RollupConfig defines the relationship between a finer-grained (source) Feature View and the coarser-grained (target) Feature View. The Feature Store leverages Dynamic Tables for automatic, incremental updates.
7.9.1 RollupConfig Arguments
| Argument | Description |
|---|---|
source |
The parent tiled Feature View to roll up from. Must be registered (not in DRAFT status) |
mapping_df |
A Snowpark DataFrame containing the entity mapping – all join keys from both the source (finer) and target (coarser) entities |
7.9.2 Example: Visitor → Subscriber Rollup
from snowflake.ml.feature_store import FeatureStore, FeatureView, Entity, RollupConfig, Feature
# Entities at both levels of the hierarchy
visitor_entity = Entity("VISITOR", ["VISITOR_ID", "COMPANY_ID"])
subscriber_entity = Entity("SUBSCRIBER", ["SUBSCRIBER_ID", "COMPANY_ID"])
fs.register_entity(visitor_entity)
fs.register_entity(subscriber_entity)
# Fine-grained: tiled visitor-level Feature View (already registered)
visitor_features = [
Feature.sum("PAGE_VIEWS", "7d").alias("PAGE_VIEWS_7D"),
Feature.count("SESSION_ID", "30d").alias("SESSION_CNT_30D"),
]
visitor_fv = FeatureView(
name="VISITOR_ENGAGEMENT_FV",
entities=[visitor_entity],
feature_df=session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.EVENTS"),
timestamp_col="EVENT_TS",
refresh_freq="1 hour",
feature_granularity="1 hour",
features=visitor_features,
desc="Visitor-level engagement features — tiled",
)
registered_visitor_fv = fs.register_feature_view(visitor_fv, version="V01")
# Entity mapping: links visitor keys to subscriber keys
mapping_df = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.VISITOR_SUBSCRIBER_MAPPING")
# Expected columns: VISITOR_ID, COMPANY_ID, SUBSCRIBER_ID
# Rollup configuration
rollup_config = RollupConfig(
source=registered_visitor_fv,
mapping_df=mapping_df,
)
# Coarse-grained: subscriber-level rollup Feature View
subscriber_fv = FeatureView(
name="SUBSCRIBER_ENGAGEMENT_FV",
entities=[subscriber_entity],
rollup_config=rollup_config,
desc="Subscriber-level engagement — rolled up from visitor-level tiles",
)
fs.register_feature_view(subscriber_fv, version="V01")The resulting SUBSCRIBER_ENGAGEMENT_FV automatically aggregates the visitor-level tiles up to the subscriber level using the entity mapping. Because it is backed by a Dynamic Table, it refreshes incrementally as the source visitor Feature View updates.
The source Feature View must be a registered, tiled Feature View (one with features and feature_granularity). You cannot roll up a non-tiled DT-based or View-based Feature View.
7.9.3 Rollup Design Guidelines
- Define entities that match the grain of each Feature View (for example, visitor-level tiles on
EVENTS, user-level aggregates onORDERSkeyed byUSER_ID). - Reuse the same aggregation patterns (
Feature.sum,Feature.count, …) at each level; change the spine entity and join keys when moving from visitor to user to household. - Keep aliases explicit (
VISITOR_PAGE_VIEWS_7DvsUSER_PAGE_VIEWS_7D) so rolled-up datasets remain interpretable when multiple hierarchy levels appear in one training set. - Design mapping tables carefully. The
mapping_dfdetermines which fine-grained entities roll up to which coarse-grained entities. For static relationships a simpleDISTINCTmapping suffices; for time-varying relationships (visitors identified as subscribers over time), consider SCD-Type2 patterns orMAPPED_ATtimestamps. See Chapter 3: Entity Mapping Tables for patterns and PIT correctness implications. - Watch for late-arriving mappings. If entity mappings are added retroactively, the rollup may aggregate historical tile data under a mapping that did not exist at the original event time, creating subtle data leakage. See the mapping table PIT warning in Chapter 3 and Chapter 6.
7.10 Common Pitfalls
7.10.1 ❌ Pitfall 1: No Alias
Problem: Auto-generated names are unclear.
Solution: Always use .alias() with descriptive names.
7.10.2 ❌ Pitfall 2: Wrong Tile Size
Problem: Tile size doesn’t match data frequency.
Solution: Use smaller tiles for high-frequency data.
7.10.3 ❌ Pitfall 3: Column Collisions
Problem: Same column names from different Feature Views.
Solution: Use auto_prefix=True or .with_name().
7.10.4 ❌ Pitfall 4: Internal _PARTIAL Columns in Output
Problem: Internal columns used for tile aggregation (prefixed _PARTIAL_) sometimes appear in output DataFrames from generate_training_set() or retrieve_feature_values(). These are implementation artifacts – partial aggregates stored per tile – and are not user-facing features.
Solution: Filter them before consuming the output:
7.10.5 ❌ Pitfall 5: Partial Leading-Edge Tiles from Incomplete Source Intervals
Problem: When the source data is a continuous stream aggregated into fixed time buckets (e.g., 15-minute intervals), the current in-progress bucket is incomplete at DT refresh time. The tiled Feature View generates a tile from this partial interval, producing aggregation values with a different distribution than complete historical tiles. This creates training/serving skew: the model trains on features derived from complete intervals but at inference time receives features that include a partial tile.
This affects all tiled aggregations — Feature.avg, Feature.sum, Feature.count, Feature.min, Feature.max — because the partial tile has fewer source events than a complete one.
Solution: Two complementary strategies:
Strategy 1 — Use offset to skip the current interval (DT-safe):
The offset parameter shifts the aggregation window back, inherently excluding the potentially partial current tile. This is deterministic SQL that preserves incremental DT refresh:
# Window covers intervals ending one interval ago — skips partial current tile
Feature.avg("QTY", "1h", offset="15m").alias("QTY_ROLL_4_MEAN")
Feature.sum("QTY", "1h", offset="15m").alias("QTY_ROLL_4_SUM")This is the simplest and most robust approach for tiled Feature Views. The trade-off is that features always reflect the state one interval behind — typically acceptable for forecasting use cases where the current interval is incomplete anyway.
Strategy 2 — Gate the source table externally (outside the DT):
Use a scheduled Task or Stream+Task pipeline that writes only complete intervals to the source table. The DT reads a table that never contains partial data:
-- Task runs every 15 minutes: write only complete intervals to source
INSERT INTO SOURCE_ORDERS_15MIN
SELECT TIME_SLICE(ORDER_TS, 15, 'MINUTE') AS INTERVAL_ID,
ITEM_CODE, BRANCH_CODE, COUNT(*) AS QTY
FROM RAW_ORDERS
WHERE TIME_SLICE(ORDER_TS, 15, 'MINUTE')
< TIME_SLICE(CURRENT_TIMESTAMP(), 15, 'MINUTE', 'START')
GROUP BY 1, 2, 3;CURRENT_TIMESTAMP() is non-deterministic and forces FULL refresh. The TIME_SLICE gating filter must be applied in a Task or ingestion pipeline outside the DT — never in the DT definition SQL or the feature_df passed to a Feature View. Even when passed via session.sql(...), the non-deterministic expression flows through to the DT and prevents incremental refresh.
For non-tiled DT Feature Views with explicit window functions, use ROWS BETWEEN N PRECEDING AND 1 PRECEDING (excluding CURRENT ROW) as the DT-safe equivalent. See Chapter 5: Partial Leading-Edge Interval and Chapter 6: Data Leakage from Partial Intervals.
7.11 Data-Driven Feature Definitions
The examples earlier in this chapter define features with hard-coded column names, filter values, and window periods. That works well for a handful of stable features, but becomes brittle and verbose as feature sets grow. This section introduces data-driven patterns that generate Feature objects and the corresponding feature_df columns programmatically.
The Feature View schema is fixed at registration time. Data-driven generation makes the code path adaptive, not the registered Feature View itself. When underlying data changes (e.g., a new channel appears), re-run the generation code and register a new version of the Feature View.
Pattern 1: Categorical Pivot — Static vs Dynamic
Filter Expressions Within Features shows how CASE WHEN decomposes a metric by a single hard-coded dimension value. When the set of values is large or evolving, generate the columns programmatically:
Static (2 known channels):
import snowflake.snowpark.functions as F
orders = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS")
source_df = orders.select(
"USER_ID", "ORDER_TS", "ORDER_ID", "TOTAL_AMT",
F.when(F.col("CHANNEL") == "AUTOMATED", F.col("ORDER_ID")).alias("AUTOMATED_ORDER_ID"),
F.when(F.col("CHANNEL") == "WEB", F.col("ORDER_ID")).alias("WEB_ORDER_ID"),
)features = [
Feature.count("ORDER_ID", "7d").alias("ORDER_CNT_7D"),
Feature.count("AUTOMATED_ORDER_ID", "7d").alias("AUTOMATED_ORDER_CNT_7D"),
Feature.count("WEB_ORDER_ID", "7d").alias("WEB_ORDER_CNT_7D"),
]Dynamic (discovers all channels from data):
from snowflake.snowpark import functions as F
channels = [row["CHANNEL"] for row in
session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS")
.select("CHANNEL").distinct().collect()]
source_df = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS") \
.select("USER_ID", "ORDER_TS", "ORDER_ID", "TOTAL_AMT", "CHANNEL")
for ch in channels:
tag = ch.upper().replace(" ", "_")
source_df = source_df.with_column(
f"{tag}_ORDER_ID",
F.when(F.col("CHANNEL") == ch, F.col("ORDER_ID"))
)
feature_df = source_df.drop("CHANNEL")
features = [Feature.count("ORDER_ID", "7d").alias("ORDER_CNT_7D")]
for ch in channels:
tag = ch.upper().replace(" ", "_")
features.append(
Feature.count(f"{tag}_ORDER_ID", "7d").alias(f"{tag}_ORDER_CNT_7D")
)When to prefer dynamic: Dimension values change over time (new channels, product categories, regions). Adding a value requires only a data change and a new Feature View version — no code edits.
Pattern 2: Multi-Window Sweep
The Multi-Window Features section above shows a build_features helper that loops over columns and windows. This is already a data-driven pattern: the columns and windows lists are the configuration, and the helper generates Feature objects with consistent naming. Refer to that helper as the starting point for more complex patterns.
Pattern 3: Metric × Dimension × Window Factory
Patterns 1 and 2 combine naturally into a full cross-product: for every (metric, dimension value, window) triple, generate a filtered column and a Feature. With 3 metrics, 4 channels, and 3 windows, this produces 36 features from a single function call instead of 36 hand-written definitions.
from itertools import product as cartesian
from snowflake.snowpark import functions as F
from snowflake.ml.feature_store import Feature
AGG_DISPATCH = {
"count": Feature.count,
"sum": Feature.sum,
"avg": Feature.avg,
"max": Feature.max,
"min": Feature.min,
}
def build_pivot_features(
session,
source_table: str,
entity_col: str,
ts_col: str,
pivot_col: str,
metrics: dict[str, str], # {"ORDER_ID": "count", "TOTAL_AMT": "sum"}
windows: list[str], # ["7d", "30d", "90d"]
) -> tuple:
"""Return (feature_df, features_list) with full cross-product."""
pivots = [r[0] for r in session.sql(
f"SELECT DISTINCT {pivot_col} FROM {source_table} ORDER BY 1"
).collect()]
df = session.table(source_table) \
.select(entity_col, ts_col, pivot_col, *metrics.keys())
features = []
# Base (unfiltered) features across all windows
for col, fn_name in metrics.items():
for w in windows:
features.append(
AGG_DISPATCH[fn_name](col, w).alias(f"{col}_{fn_name.upper()}_{w.upper()}")
)
# Per-pivot filtered features
for pv in pivots:
tag = pv.upper().replace(" ", "_")
for col in metrics:
derived = f"{tag}_{col}"
df = df.with_column(
derived, F.when(F.col(pivot_col) == pv, F.col(col))
)
for w in windows:
fn_name = metrics[col]
features.append(
AGG_DISPATCH[fn_name](derived, w)
.alias(f"{tag}_{col}_{fn_name.upper()}_{w.upper()}")
)
return df.drop(pivot_col), featuresUsage:
feature_df, features = build_pivot_features(
session,
source_table="FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS",
entity_col="USER_ID",
ts_col="ORDER_TS",
pivot_col="CHANNEL",
metrics={"ORDER_ID": "count", "TOTAL_AMT": "sum"},
windows=["7d", "30d"],
)
fv = FeatureView(
name="USER_CHANNEL_METRICS",
entities=[user_entity],
feature_df=feature_df,
timestamp_col="ORDER_TS",
refresh_freq="1 day",
features=features,
feature_granularity="1 day",
)Pattern 4: Config-Driven Feature Specifications
Instead of embedding feature logic in code, define feature specifications as data — either a Python dict for version-controlled config or a Snowflake table for runtime flexibility.
Python dict (version-controlled):
FEATURE_SPECS = [
{"col": "TOTAL_AMT", "agg": "sum", "windows": ["7d", "30d", "90d"]},
{"col": "ORDER_ID", "agg": "count", "windows": ["7d", "30d"]},
{"col": "TOTAL_AMT", "agg": "avg", "windows": ["30d"]},
]
features = []
for spec in FEATURE_SPECS:
fn = AGG_DISPATCH[spec["agg"]]
for w in spec["windows"]:
features.append(
fn(spec["col"], w).alias(f"{spec['col']}_{spec['agg'].upper()}_{w.upper()}")
)Snowflake table (runtime-configurable):
CREATE TABLE IF NOT EXISTS FEATURE_STORE_DEMO.FEATURE_STORE.FEATURE_SPECS (
DOMAIN VARCHAR, -- e.g. 'ORDERS', 'SESSIONS'
SOURCE_COL VARCHAR, -- column to aggregate
AGG_FN VARCHAR, -- 'sum', 'count', 'avg', etc.
WINDOW_SIZE VARCHAR, -- '7d', '30d', etc.
ENABLED BOOLEAN DEFAULT TRUE
);specs = session.table("FEATURE_STORE_DEMO.FEATURE_STORE.FEATURE_SPECS") \
.filter(F.col("DOMAIN") == "ORDERS") \
.filter(F.col("ENABLED") == True) \
.collect()
features = []
for row in specs:
fn = AGG_DISPATCH[row["AGG_FN"]]
features.append(
fn(row["SOURCE_COL"], row["WINDOW_SIZE"])
.alias(f"{row['SOURCE_COL']}_{row['AGG_FN'].upper()}_{row['WINDOW_SIZE'].upper()}")
)| Approach | Pros | Cons |
|---|---|---|
| Python dict | Version-controlled, reviewable in PRs, easy to test | Requires code deployment to change |
| Snowflake table | Non-developers can manage; takes effect on next FV version | State split between git (code) and table (config); harder to reproduce; audit trail needed |
A hybrid approach keeps the dict in code as the source of truth and syncs it to a Snowflake table for discoverability. This gives you version control and runtime queryability.
Pattern 5: Ratio and Derived Features
Many useful features are derived from base aggregations — conversion rates, averages-per-order, or period-over-period growth. The Feature class supports base aggregation functions only; ratios and formulas must be computed separately.
The underlying Dynamic Table of a tiled Feature View contains internal _PARTIAL columns — tile-level fragments, not final values. You cannot compute ratios by reading the DT directly (e.g., session.table("FEATURE_STORE.USER_ORDER_FV$V01")). You must use the Feature Store API to reassemble tiles into final aggregated values first.
Approach A: Post-retrieval (recommended for training pipelines)
generate_dataset and retrieve_feature_values handle tile reassembly automatically. Compute ratios on the returned DataFrame:
RATIOS = [
("AUTOMATED_ORDER_CNT_7D", "ORDER_CNT_7D", "AUTOMATION_RATE_7D"),
("DISCOUNT_AMT_SUM_30D", "TOTAL_AMT_SUM_30D", "DISCOUNT_RATE_30D"),
]
dataset_df = fs.generate_dataset(
name="TRAINING_SET",
spine_df=spine,
features=[base_fv],
output_type="dataset",
).read.to_snowpark_dataframe()
for num, denom, alias in RATIOS:
dataset_df = dataset_df.with_column(
alias,
F.iff(F.col(denom) != 0, F.col(num) / F.col(denom), F.lit(None))
)Approach B: Downstream View-based Feature View
Use read_feature_view to get the reassembled values, then build a non-tiled Feature View that computes derived columns:
base_fv = fs.get_feature_view("USER_ORDER_METRICS", "V01")
reassembled_df = fs.read_feature_view(base_fv)
for num, denom, alias in RATIOS:
reassembled_df = reassembled_df.with_column(
alias,
F.iff(F.col(denom) != 0, F.col(num) / F.col(denom), F.lit(None))
)
ratio_fv = FeatureView(
name="USER_ORDER_RATIOS",
entities=[user_entity],
feature_df=reassembled_df,
# No refresh_freq — View-based, recomputes on read
)
fs.register_feature_view(ratio_fv, version="V01")Safe division (F.iff(denom != 0, ...)) prevents NaN / divide-by-zero errors that can silently corrupt model training. Always guard ratio features.
Pattern 6: Interaction Features
Interaction features capture non-linear relationships by combining pairs of columns — typically products (a × b) or ratios (a / b). These apply to any Feature View source DataFrame, not just aggregation Feature Views.
Static (hand-picked pairs):
source_df = source_df.with_column(
"AMT_X_QTY", F.col("TOTAL_AMT") * F.col("QUANTITY")
).with_column(
"PRICE_PER_UNIT", F.iff(F.col("QUANTITY") != 0,
F.col("TOTAL_AMT") / F.col("QUANTITY"),
F.lit(None))
)Dynamic (all pairwise products from a list):
from itertools import combinations
NUMERIC_COLS = ["TOTAL_AMT", "QUANTITY", "DISCOUNT_AMT"]
for a, b in combinations(NUMERIC_COLS, 2):
source_df = source_df.with_column(
f"{a}_X_{b}", F.col(a) * F.col(b)
)With 5 numeric columns this generates 10 interaction features; with 10 columns it generates 45. Use this for exploratory model experiments, then prune non-contributing interactions after feature importance evaluation to avoid unnecessary compute.
Static vs Dynamic — Trade-Off Summary
| Dimension | Static | Dynamic |
|---|---|---|
| Readability | Immediately clear; each feature visible in code | Requires understanding the generation logic |
| Verbosity | High — repetitive definitions | Low — DRY, loop-based |
| Testability | Each feature testable individually | Test the generator + spot-check outputs |
| Adaptability | Manual edit per new value/metric/window | Adapts to data changes; new FV version only |
| Debugging | Straightforward — read the code | Inspect generated DataFrame schema |
| Git diffs | Explicit per-feature changes | Changes to generator logic only |
| Registration | Same | Same — schema fixed at register_feature_view |
Guidance: Start static for small, stable feature sets (under ~10 features). Switch to dynamic when you find yourself copying and renaming feature definitions, when dimension values evolve, or when you need to experiment with many metric/window combinations rapidly.
Consuming Data-Driven Features with OBJECT_CONSTRUCT
The factory patterns above enforce a naming convention — {DIMENSION}_{METRIC}_{AGG}_{WINDOW} — that becomes a queryable contract. Snowflake’s OBJECT_CONSTRUCT with ILIKE and EXCLUDE filtering can slice dynamically generated columns by any naming axis without hard-coding column lists:
-- Slice features by dimension, window, or collect everything — no column list needed
SELECT
USER_ID,
OBJECT_CONSTRUCT(* ILIKE 'AUTOMATED_%') AS automated_features,
OBJECT_CONSTRUCT(* ILIKE '%_7D') AS window_7d_features,
OBJECT_CONSTRUCT(* ILIKE '%_30D') AS window_30d_features,
OBJECT_CONSTRUCT(* EXCLUDE (USER_ID, ORDER_TS)) AS all_features
FROM FEATURE_STORE.USER_CHANNEL_METRICS$V01This is particularly valuable when the factory generates dozens of columns — consumers can dynamically group features by dimension, window, or metric type without knowing every column name. The resulting OBJECTs are well-suited for:
- Feature serving APIs that return JSON payloads
- Feature exploration —
OBJECT_KEYS(all_features)lists what was generated - Selective retrieval — extract specific features with
all_features:"AUTOMATED_ORDER_ID_COUNT_7D"path syntax
Two-tier Feature View pattern: Combine this with a lightweight View-based Feature View that repackages the flat columns from a tiled DT-backed Feature View into structured OBJECTs. The tiled FV handles efficient incremental computation; the View FV provides the structured consumption interface. See Chapter 12: Wide & Sparse Feature Data for the full two-tier architecture.
OBJECT_CONSTRUCT operates at the consumption layer — downstream queries and serving. The Feature class still requires flat scalar columns as inputs during definition. The naming convention from the factory is what bridges the two: structured names at definition time enable wildcard grouping at consumption time.
7.12 Summary
| Concept | Description |
|---|---|
| Feature class | Declarative API for time-windowed aggregations (Feature.sum, Feature.count, …) |
| Window | Time period for aggregation (e.g., "7d", "30d"); optional offset shifts the window back |
| Tile (feature_granularity) | Pre-computation unit for efficiency |
| Required FV args | timestamp_col, refresh_freq, feature_granularity, features (plus name, entities, feature_df) |
| Version | Register with zero-padded strings such as "V01", "V02" |
| auto_prefix | Automatic column prefixing with FV name |
.with_name() |
Custom column prefixing |
| RollupConfig | Hierarchical aggregation from fine-grained to coarse-grained entities via RollupConfig(source, mapping_df) |
| Data-driven definitions | Programmatic generation of Feature objects and source columns from data or config; patterns include categorical pivot, cross-product factory, config-driven specs, ratio features, and interaction features |
7.13 Tiled Feature Views and Online Serving
A tiled Feature View’s underlying Dynamic Table stores partial aggregates per tile interval — columns like _PARTIAL_SUM_TOTAL_AMT, _PARTIAL_COUNT_ORDER_ID, and a TILE_START timestamp. The original timestamp_col (e.g., ORDER_TS) does not exist in the DT schema. When you attempt to create a Hybrid Table Online Feature Table from a tiled FV, the CREATE ONLINE FEATURE TABLE ... TIMESTAMP_COLUMN = ORDER_TS statement fails:
SQL compilation error: column 'ORDER_TS' does not exist
The SDK does not validate this upfront — the error only surfaces at OFT creation time during register_feature_view.
For offline retrieval (fs.read_feature_view, fs.retrieve_feature_values), the Feature Store performs tile reassembly automatically — window aggregations are computed correctly at read time. The issue only arises when you want to push these aggregated values into a Hybrid Table for low-latency online serving.
7.13.1 Workaround: online_fv_from_tiled Helper
Manually re-expressing the same aggregation logic as a SQL GROUP BY is error-prone and violates DRY. The online_fv_from_tiled() helper introspects a tiled Feature View’s Feature definitions and automatically generates a non-tiled, OFT-compatible companion:
from online_from_tiled import online_fv_from_tiled
# Step 1: Define the tiled FV (offline only, no online_config)
tiled_fv = FeatureView(
name="USER_PURCHASE_TILED",
entities=[user_entity],
feature_df=session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS"),
timestamp_col="ORDER_TS",
refresh_freq="1 hour",
feature_granularity="1 hour",
features=purchase_features,
desc="Tiled aggregations — offline only",
)
tiled_reg = fs.register_feature_view(tiled_fv, version="V01", block=True)
# Step 2: Auto-generate a non-tiled, OFT-ready companion
online_fv = online_fv_from_tiled(
tiled_reg, session,
source_query="SELECT * FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS",
refresh_freq="1 minute",
target_lag="1 minute",
window_buffer="1d",
)
online_reg = fs.register_feature_view(online_fv, version="V01", block=True)The helper produces a GROUP BY query scoped to only max(window + offset) + buffer of source data. For the purchase features above (7-day windows, no offset, 1-day buffer), the generated SQL is:
SELECT
USER_ID,
SUM(CASE WHEN ORDER_TS >= DATEADD('DAY', -7, CURRENT_TIMESTAMP())
THEN TOTAL_AMT END) AS TOTAL_AMT_SUM_7D,
COUNT(CASE WHEN ORDER_TS >= DATEADD('DAY', -7, CURRENT_TIMESTAMP())
THEN ORDER_ID END) AS ORDER_ID_CNT_7D,
AVG(CASE WHEN ORDER_TS >= DATEADD('DAY', -7, CURRENT_TIMESTAMP())
THEN TOTAL_AMT END) AS TOTAL_AMT_AVG_7D,
MAX(ORDER_TS) AS LAST_ORDER_TS
FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
WHERE ORDER_TS >= DATEADD('DAY', -8, CURRENT_TIMESTAMP())
GROUP BY USER_IDEach feature gets its own CASE WHEN filter for its specific window, so features with different windows (e.g., 7-day and 30-day) coexist correctly in a single query. The outer WHERE clause scopes the entire query to the longest window plus the buffer.
online_fv_from_tiled() reads each Feature object’s internal attributes (_function, _column, _window, _offset, _alias) and maps them to standard SQL aggregation functions. It:
- Converts each
Featureaggregation type to its SQL equivalent (e.g.,Feature.sum→SUM,Feature.stddev→STDDEV) - Applies per-feature
CASE WHENwindow filters based on the feature’s_windowand_offset - Computes the maximum lookback across all features and adds
window_bufferto form the outerWHEREclause - Adds
MAX(timestamp_col) AS LAST_{timestamp_col}as the timestamp for OFT tracking - Returns a new
FeatureViewwithonline_configenabled and no tiling
For a registered tiled FV (whose internal query has been rewritten to the tiled form), you must pass source_query= pointing at the original source table.
Unsupported aggregations: FIRST_N, LAST_N, FIRST_DISTINCT_N, and LAST_DISTINCT_N cannot be reduced to a simple GROUP BY — the helper raises a clear error for these.
You can also use online_sql_from_features() to inspect or customise the generated SQL before creating a FeatureView:
7.13.2 Manual Approach: Direct SQL
If you prefer full control or need to customise the query beyond what the helper generates, you can write the non-tiled FV manually:
from snowflake.ml.feature_store import feature_view as fv_mod
online_df = session.sql("""
SELECT
USER_ID,
SUM(CASE WHEN ORDER_TS >= DATEADD('DAY', -7, CURRENT_TIMESTAMP())
THEN TOTAL_AMT END) AS TOTAL_AMT_SUM_7D,
COUNT(CASE WHEN ORDER_TS >= DATEADD('DAY', -7, CURRENT_TIMESTAMP())
THEN ORDER_ID END) AS ORDER_ID_CNT_7D,
AVG(CASE WHEN ORDER_TS >= DATEADD('DAY', -7, CURRENT_TIMESTAMP())
THEN TOTAL_AMT END) AS TOTAL_AMT_AVG_7D,
MAX(ORDER_TS) AS LAST_ORDER_TS
FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
WHERE ORDER_TS >= DATEADD('DAY', -8, CURRENT_TIMESTAMP())
GROUP BY USER_ID
""")
online_fv = FeatureView(
name="USER_PURCHASE_ONLINE",
entities=[user_entity],
feature_df=online_df,
timestamp_col="LAST_ORDER_TS",
refresh_freq="1 minute",
online_config=fv_mod.OnlineConfig(enable=True, target_lag="1 minute"),
desc="Non-tiled online companion for USER_PURCHASE_TILED",
)
online_reg = fs.register_feature_view(online_fv, version="V01", block=True)from snowflake.snowpark import functions as F
from snowflake.ml.feature_store import feature_view as fv_mod
orders = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS")
cutoff_7d = F.dateadd("DAY", F.lit(-7), F.current_timestamp())
cutoff_8d = F.dateadd("DAY", F.lit(-8), F.current_timestamp())
online_df = (
orders
.filter(F.col("ORDER_TS") >= cutoff_8d)
.group_by("USER_ID")
.agg(
F.sum(F.when(F.col("ORDER_TS") >= cutoff_7d, F.col("TOTAL_AMT")))
.alias("TOTAL_AMT_SUM_7D"),
F.count(F.when(F.col("ORDER_TS") >= cutoff_7d, F.col("ORDER_ID")))
.alias("ORDER_ID_CNT_7D"),
F.avg(F.when(F.col("ORDER_TS") >= cutoff_7d, F.col("TOTAL_AMT")))
.alias("TOTAL_AMT_AVG_7D"),
F.max("ORDER_TS").alias("LAST_ORDER_TS"),
)
)
online_fv = FeatureView(
name="USER_PURCHASE_ONLINE",
entities=[user_entity],
feature_df=online_df,
timestamp_col="LAST_ORDER_TS",
refresh_freq="1 minute",
online_config=fv_mod.OnlineConfig(enable=True, target_lag="1 minute"),
desc="Non-tiled online companion for USER_PURCHASE_TILED",
)
online_reg = fs.register_feature_view(online_fv, version="V01", block=True)Both the helper and the manual approach use CURRENT_TIMESTAMP() in the WHERE clause to scope data to only the maximum window needed. This forces the resulting Dynamic Table to use FULL refresh on every cycle (not incremental), because Snowflake cannot track changes on queries with non-deterministic functions.
For short-to-medium windows (e.g., 7–30 days), the scoped data volume is small and FULL refresh typically completes in 1–2 seconds. For wide windows (90d, 180d, 365d), the FULL refresh must re-scan a proportionally larger slice of the source table on every cycle. At 365 days the scoped query effectively touches most of the table, negating much of the scoping benefit.
| Window Width | Approx. Source Scope | FULL Refresh Impact |
|---|---|---|
| 7 days | ~1 week + buffer | Fast (seconds) |
| 30 days | ~1 month + buffer | Low (seconds to tens of seconds) |
| 90–180 days | Quarters to half-year | Moderate — monitor DT refresh duration and warehouse cost |
| 365 days | ~full table | High — consider whether online serving of year-wide aggregates is truly needed, or pre-compute in a scheduled task instead |
| Component | Refresh Mode | Why |
|---|---|---|
| Tiled FV (offline) | INCREMENTAL | Deterministic tiling query, efficient for large historical datasets |
| Online FV (helper-generated) | FULL | CURRENT_TIMESTAMP() in WHERE prevents change tracking, but scoped to bounded window |
| OFT (Hybrid Table) | Mirrors DT | Syncs changes from the online DT each cycle |
The recommended pattern is to use both: the Aggregations API for offline computation (efficient tiled storage) and the online_fv_from_tiled() helper for online serving (scoped, bounded FULL refresh). If your widest feature window exceeds 90 days, benchmark the DT refresh time and cost before committing to a 1-minute refresh_freq — a longer refresh cycle (e.g., 5–15 minutes) may be more appropriate for wide windows.
The upcoming Postgres-backed online store (OnlineStoreType.POSTGRES) is designed to handle tiled Feature Views natively. It performs query-time tile reassembly to return correct window-aggregated values directly. Once this backend reaches General Availability, it will eliminate the need for the materialized workaround. See Chapter 8: Online Features for online store configuration details.
7.14 Next Steps
Continue to Chapter 8: Online Features to learn about Online Feature Tables for real-time serving, including the limitations and workarounds for tiled Feature Views discussed above.