7  Aggregations API

The Feature class for declarative time-windowed aggregations

Keywords

snowflake, feature store, ml, machine learning, mlops

7.1 Overview

The Aggregations API provides a declarative way to define time-windowed features using the Feature class. Instead of writing complex SQL with window functions, you specify what aggregations you want and the Feature Store handles the efficient computation.

Examples in this chapter use the canonical demo layout: database FEATURE_STORE_DEMO, Feature Store schema FEATURE_STORE, clickstream source schema CLICKSTREAM_DATA, warehouse FS_DEV_WH, and tables such as ORDERS (ORDER_TS, ORDER_ID, USER_ID, TOTAL_AMT).

SDK Version and Documentation Status

The Feature class and tiled aggregation API require snowflake-ml-python >= 1.24.0. This is a client-side SDK feature – it generates standard SQL (Dynamic Tables, window functions, aggregations) under the hood, so no account-level enablement or feature flag is needed. Install the required SDK version and you are ready to use it.

Official API documentation on the Snowflake docs portal may be limited at time of reading. For the authoritative parameter reference, use help(Feature) in a Python session or inspect the SDK source. API names and signatures may evolve in future releases.

See also the platform capabilities and preview status table in Chapter 12.

7.2 Learning Objectives

After completing this chapter, you will be able to:

  • Use the Feature class to define time-windowed aggregations
  • Understand tiling for efficient incremental computation
  • Configure feature granularity and refresh frequencies
  • Apply feature column prefixing for disambiguation
  • Name aggregated features following best practices
  • Generate features programmatically using data-driven patterns (categorical pivots, cross-product factories, config-driven specs)

📂 Chapter code: Browse companion scripts on GitHub


7.3 The Feature Class

7.3.1 Basic Usage

📁 Full code: _code/feature_class.py

# Define time-windowed aggregations declaratively (aliases: _SUM, _CNT, _AVG, …)
features = [
    Feature.sum("TOTAL_AMT", "7d").alias("TOTAL_AMT_SUM_7D"),
    Feature.sum("TOTAL_AMT", "30d").alias("TOTAL_AMT_SUM_30D"),
    Feature.count("ORDER_ID", "7d").alias("ORDER_ID_CNT_7D"),
    Feature.avg("TOTAL_AMT", "7d").alias("TOTAL_AMT_AVG_7D"),
    Feature.max("TOTAL_AMT", "30d").alias("TOTAL_AMT_MAX_30D"),
]

# Create tiled Feature View (source: canonical clickstream ORDERS)
orders_fqn = "FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS"
user_agg_fv = FeatureView(
    name="USER_PURCHASE_AGGREGATES",
    entities=[user_entity],
    feature_df=session.table(orders_fqn),
    timestamp_col="ORDER_TS",
    refresh_freq="1 hour",
    feature_granularity="1 hour",
    features=features,
)

registered_agg_fv = fs.register_feature_view(
    feature_view=user_agg_fv, version="V01", block=True, overwrite=True,
)
print(f"Registered: {registered_agg_fv.name}/V01  (tiled={registered_agg_fv.is_tiled})")
print(f"Granularity: {registered_agg_fv.feature_granularity}")

7.3.2 Available Aggregation Functions

Function Description Example
Feature.sum(col, window) Sum over window Feature.sum("TOTAL_AMT", "7d")
Feature.min(col, window) Minimum over window Feature.min("TOTAL_AMT", "30d")
Feature.max(col, window) Maximum over window Feature.max("TOTAL_AMT", "30d")
Feature.count(col, window) Count over window Feature.count("ORDER_ID", "30d")
Feature.avg(col, window) Average over window Feature.avg("TOTAL_AMT", "7d")
Feature.std(col, window) Standard deviation Feature.std("TOTAL_AMT", "30d")
Feature.var(col, window) Variance Feature.var("TOTAL_AMT", "30d")
Feature.last_n(col, window, n=N) N most recent values Feature.last_n("PRODUCT_ID", "7d", n=5)
Feature.first_n(col, window, n=N) N oldest values Feature.first_n("PRODUCT_ID", "7d", n=5)
Feature.last_distinct_n(col, window, n=N) N most recent unique values Feature.last_distinct_n("PRODUCT_ID", "7d", n=5)
Feature.first_distinct_n(col, window, n=N) N oldest unique values Feature.first_distinct_n("PRODUCT_ID", "7d", n=5)
Feature.approx_count_distinct(col, window) HLL distinct count Feature.approx_count_distinct("PRODUCT_ID", "7d")
Feature.approx_percentile(col, window, percentile=0.5) Approximate percentile Feature.approx_percentile("TOTAL_AMT", "30d", percentile=0.5)
Float columns and incremental refresh

Aggregation Feature Views backed by Dynamic Tables may fall back to full refresh if the underlying SQL combines aggregations on FLOAT-typed expressions (sum, avg, min, max, std, var) with a JOIN in the same query block. The DT engine cannot guarantee incremental correctness for floating-point aggregate diffs across joins.

If your source columns are FLOAT/DOUBLE, cast them to a fixed-point type before the aggregation reaches the DT definition:

CAST(TOTAL_AMT AS NUMBER(38, 6))

The Temporal Aggregation API generates the underlying DT SQL automatically, so if your source table’s numeric columns are FLOAT and the API introduces a join (e.g., for tiling or rollup), the resulting DT may silently use full refresh. Check REFRESH_MODE in INFORMATION_SCHEMA.DYNAMIC_TABLES() after registration and alter source column types to NUMBER if needed.

ML model impact: For most ML use cases the precision difference between FLOAT and NUMBER(38, 8) is negligible – the variance across training runs or hyperparameter choices far exceeds the rounding delta. If you need higher fidelity, NUMBER(38, 12) provides ~12 decimal digits of precision, comparable to FLOAT for the value ranges typical in feature engineering. The key trade-off is incremental refresh (fast, cheap) vs full refresh (slow, expensive) – the precision loss from fixed-point casting is almost always the better deal.

Recommended approach for source table owners: Alter the source column type to NUMBER/DECIMAL at the table level rather than casting in every Feature View query. This avoids repeating the cast in multiple Feature Views and ensures all downstream consumers benefit:

ALTER TABLE FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
  ALTER COLUMN TOTAL_AMT SET DATA TYPE NUMBER(18, 8);

For rollup Feature Views, the same constraint applies transitively – if the source tiled Feature View aggregates FLOAT columns, the rollup DT also falls back to full refresh. Fix the types at the source level.

7.3.3 Aggregation Function Performance Characteristics

Not all aggregation functions are equal in terms of tile storage, computation cost, and incremental refresh compatibility. Understanding these trade-offs is important when designing features at scale.

Function Tile storage Incremental Notes
count Partial counts per tile Yes Lightweight; most efficient aggregation
sum Partial sums per tile Yes (if not FLOAT+JOIN) Cast to NUMBER if source is FLOAT
min, max Per-tile min/max Yes Used for windowed extremes
avg Stored as sum + count Yes (derived) Reconstructed from partial sum/count at query time
std, var Partial sum, sum-of-squares, count Yes (derived) Higher storage per tile; 3 partial aggregates
approx_count_distinct HLL sketch per tile Yes Fixed-size sketch (~12 KB); merges across tiles
last_n, first_n Array per tile Yes Storage scales with n; larger n increases tile size
last_distinct_n, first_distinct_n Deduplicated array per tile Yes More expensive than last_n due to dedup logic
approx_percentile T-Digest sketch per tile Yes Fixed-size sketch; approximate result

Guidance for high-cardinality, high-volume data:

  • Prefer count and sum as primary features – they are the most efficient for both tiling and query-time reassembly.
  • Use approx_count_distinct instead of exact distinct counts when precision within ~2% is acceptable. The HLL sketch is constant-size regardless of cardinality, whereas exact distinct counts require per-tile array storage.
  • Limit last_n / first_n values of n to what the model actually needs. Each tile stores up to n values; at n=100 with hourly tiles and 90-day windows, tile storage grows substantially.
  • std and var store three partial aggregates per tile (sum, sum-of-squares, count). If you only need variance for a subset of features, consider computing it post-retrieval from sum and count features to reduce tile storage.

7.3.4 Filter Expressions Within Features

When the same source data feeds features that differ only by a filter condition (e.g., count of automated messages vs. count of all messages), use separate Feature definitions with filter logic in the source feature_df rather than creating multiple Feature Views:

filtered_source_df = session.sql("""
    SELECT
        USER_ID,
        EVENT_TS,
        ORDER_ID,
        TOTAL_AMT,
        CASE WHEN CHANNEL = 'AUTOMATED' THEN ORDER_ID END AS AUTO_ORDER_ID,
        CASE WHEN CHANNEL = 'AUTOMATED' THEN TOTAL_AMT END AS AUTO_TOTAL_AMT
    FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
""")
import snowflake.snowpark.functions as F

orders = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS")
filtered_source_df = orders.select(
    F.col("USER_ID"),
    F.col("EVENT_TS"),
    F.col("ORDER_ID"),
    F.col("TOTAL_AMT"),
    F.when(F.col("CHANNEL") == "AUTOMATED", F.col("ORDER_ID")).alias("AUTO_ORDER_ID"),
    F.when(F.col("CHANNEL") == "AUTOMATED", F.col("TOTAL_AMT")).alias("AUTO_TOTAL_AMT"),
)
features = [
    Feature.count("ORDER_ID", "7d").alias("ORDER_CNT_7D"),
    Feature.count("AUTO_ORDER_ID", "7d").alias("AUTO_ORDER_CNT_7D"),
    Feature.sum("TOTAL_AMT", "30d").alias("SPEND_SUM_30D"),
    Feature.sum("AUTO_TOTAL_AMT", "30d").alias("AUTO_SPEND_SUM_30D"),
]

This approach keeps all related features in a single Feature View, which means they share tiles and refresh together – significantly more efficient than creating separate Feature Views per filter condition. The CASE WHEN produces NULL for non-matching rows, and aggregation functions skip NULLs, giving the correct filtered result.

The static approach above works well when the filter values are stable and few. When dimension values change over time (new channels, product categories, regions), see Data-Driven Feature Definitions for a programmatic alternative that discovers values from data and generates the CASE columns and Feature objects automatically.

7.3.5 Window Specifications

Windows can be specified using shorthand notation:

Notation Meaning
"1h" or "1 hour" 1 hour
"24h" 24 hours
"7d" or "7 day" 7 days
"30d" 30 days
"1w" or "1 week" 1 week

7.3.6 Window offset

Pass offset to shift the aggregation window backward in time. This is the standard way to build comparative features such as week-over-week or month-over-month trends:

features = [
    Feature.sum("TOTAL_AMT", "7d").alias("TOTAL_AMT_SUM_CURRENT_WEEK"),
    Feature.sum("TOTAL_AMT", "7d", offset="7d").alias("TOTAL_AMT_SUM_PREV_WEEK"),
]

Pair the current-window and offset-window versions to capture momentum or change over time. For example, a model can detect spending acceleration when TOTAL_AMT_SUM_CURRENT_WEEK exceeds TOTAL_AMT_SUM_PREV_WEEK.

The offset must be a multiple of feature_granularity so the shifted window aligns cleanly to tile boundaries. For example, with feature_granularity="1h", valid offsets are "1h", "2h", "24h", "7d", etc.

Use .alias("NAME") on each Feature to set the output column name.


7.4 Naming Aggregated Features

7.4.1 Best Practice: Descriptive Aliases

Always use .alias() to give features meaningful names:

# ✅ GOOD: Suffix metric role (_SUM, _CNT, _AVG) and window
features = [
    Feature.sum("TOTAL_AMT", "7d").alias("TOTAL_AMT_SUM_7D"),
    Feature.sum("TOTAL_AMT", "30d").alias("TOTAL_AMT_SUM_30D"),
    Feature.count("ORDER_ID", "7d").alias("ORDER_ID_CNT_7D"),
]

# ❌ BAD: Auto-generated names are hard to interpret
features = [
    Feature.sum("TOTAL_AMT", "7d"),  # opaque default name
    Feature.sum("TOTAL_AMT", "30d"),
]

7.5 Feature Column Prefixing for Disambiguation

When joining features from multiple Feature Views, column name collisions can occur. Feature Store provides two mechanisms for disambiguation.

7.5.1 The Problem

# Name collision: same aggregation function on same column, different entity groupings,
# using the default naming derivation -- both resolve to ORDER_ID_CNT_7D
user_orders_fv = FeatureView(
    name="USER_ORDERS",
    entities=[user_entity],
    features=[Feature.count("ORDER_ID", "7d")],  # default name: ORDER_ID_CNT_7D
    # ...
)

merchant_orders_fv = FeatureView(
    name="MERCHANT_ORDERS",
    entities=[merchant_entity],
    features=[Feature.count("ORDER_ID", "7d")],  # same default name: ORDER_ID_CNT_7D
    # ...
)

# Without prefixing, collision occurs
dataset = fs.generate_dataset(
    spine_df=spine,
    features=[user_orders_fv, user_sessions_fv],  # Which ORDER_ID_CNT_7D?
)

Prefer explicit .alias() calls with distinct names (e.g., MERCHANT_ORDER_ID_CNT_7D) or use auto_prefix=True / .with_name() so semantics stay clear even when multiple Feature Views aggregate the same source column.

7.5.2 Solution 1: Auto-Prefix

Use auto_prefix=True to automatically prefix columns with Feature View name:

📁 Full code: _code/prefixing.py

dataset = fs.generate_dataset(
    spine_df=spine,
    features=[user_orders_fv, user_sessions_fv],
    auto_prefix=True,  # Columns become USER_ORDERS__ORDER_ID_CNT_7D, etc.
)

# Result columns:
# - USER_ORDERS__ORDER_ID_CNT_7D
# - USER_SESSIONS__SESSION_ID_CNT_7D

7.5.3 Solution 2: Custom Prefixes with .with_name()

For more control, use .with_name() on individual Feature Views or slices:

dataset = fs.generate_dataset(
    spine_df=spine,
    features=[
        user_orders_fv.with_name("orders"),      # Prefix: orders__
        user_sessions_fv.with_name("sessions"),  # Prefix: sessions__
    ],
)

# Result columns:
# - orders__ORDER_ID_CNT_7D
# - sessions__SESSION_ID_CNT_7D

7.5.4 Solution 3: Combining Both Approaches

dataset = fs.generate_dataset(
    spine_df=spine,
    features=[
        user_orders_fv.with_name("orders"),  # Custom prefix
        user_sessions_fv,                     # Will use auto_prefix
    ],
    auto_prefix=True,
)
Priority

.with_name() takes precedence over auto_prefix. If a Feature View has .with_name() set, that prefix is used regardless of auto_prefix setting.

7.5.5 Choosing a Prefixing Strategy

Scenario Recommendation
Single Feature View No prefix needed
Multiple FVs, unique column names No prefix needed
Multiple FVs, potential collisions Use auto_prefix=True
Want short, custom prefixes Use .with_name()
Consistent naming across projects Define naming convention + .with_name()

7.6 Feature Granularity (Tiling)

7.6.1 How Tiling Works

The feature_granularity parameter sets the tile size for incremental computation:

# Hourly tiles for high-frequency data
fv = FeatureView(
    feature_granularity="1 hour",  # Each tile covers 1 hour
    refresh_freq="15 minutes",      # Refresh every 15 minutes
    # ...
)

7.6.2 Tile Size Recommendations

Data Pattern Recommended Tile Reason
Clickstream (high frequency) 1 hour Many events per hour
Transactions (medium) 1 day Daily aggregation patterns
Weekly metrics 1 week Coarse granularity sufficient

7.6.3 Multi-Granularity Tile Strategy

When Feature Views need to serve both short lookback windows (1d, 7d) and long lookback windows (90d, 360d), a single tile size creates a trade-off: small tiles (1 hour) provide precision for short windows but require combining thousands of tiles for 360-day windows; large tiles (1 day) are efficient for long windows but lose sub-day precision.

The recommended approach is to use multiple Feature Views at different tile granularities, each optimized for the windows it serves:

short_window_features = [
    Feature.sum("TOTAL_AMT", "1d").alias("SPEND_SUM_1D"),
    Feature.sum("TOTAL_AMT", "7d").alias("SPEND_SUM_7D"),
    Feature.count("ORDER_ID", "1d").alias("ORDER_CNT_1D"),
    Feature.count("ORDER_ID", "7d").alias("ORDER_CNT_7D"),
]

short_window_fv = FeatureView(
    name="USER_PURCHASE_SHORT_WINDOW",
    entities=[user_entity],
    feature_df=session.table(orders_fqn),
    timestamp_col="ORDER_TS",
    refresh_freq="1 hour",
    feature_granularity="1 hour",
    features=short_window_features,
    desc="Short-window purchase features (1d, 7d) with hourly tiles",
)

long_window_features = [
    Feature.sum("TOTAL_AMT", "30d").alias("SPEND_SUM_30D"),
    Feature.sum("TOTAL_AMT", "90d").alias("SPEND_SUM_90D"),
    Feature.sum("TOTAL_AMT", "360d").alias("SPEND_SUM_360D"),
    Feature.count("ORDER_ID", "30d").alias("ORDER_CNT_30D"),
    Feature.count("ORDER_ID", "360d").alias("ORDER_CNT_360D"),
]

long_window_fv = FeatureView(
    name="USER_PURCHASE_LONG_WINDOW",
    entities=[user_entity],
    feature_df=session.table(orders_fqn),
    timestamp_col="ORDER_TS",
    refresh_freq="1 day",
    feature_granularity="1 day",
    features=long_window_features,
    desc="Long-window purchase features (30d-360d) with daily tiles",
)

At training time, join both Feature Views on the same spine:

training_df = fs.generate_training_set(
    spine_df=spine,
    features=[short_window_fv, long_window_fv],
    spine_timestamp_col="EVENT_TS",
)

Trade-offs:

Approach Tiles for 360d window Refresh cost Short-window precision
Single FV, 1 hour tiles 8,640 tiles High (hourly refresh) Sub-hour
Single FV, 1 day tiles 360 tiles Low (daily refresh) Daily only
Two FVs (recommended) 360 tiles (daily FV) + 168 tiles for 7d (hourly FV) Balanced Sub-hour for short, daily for long

The two-FV approach also lets you set different refresh_freq values: hourly for short windows where freshness matters, daily for long windows where the marginal value of hourly refresh is negligible.


7.7 Multi-Window Features

Define features with multiple time windows efficiently:

features = [
    Feature.sum("TOTAL_AMT", "1d").alias("TOTAL_AMT_SUM_1D"),
    Feature.sum("TOTAL_AMT", "7d").alias("TOTAL_AMT_SUM_7D"),
    Feature.sum("TOTAL_AMT", "30d").alias("TOTAL_AMT_SUM_30D"),
    Feature.sum("TOTAL_AMT", "90d").alias("TOTAL_AMT_SUM_90D"),
    Feature.count("ORDER_ID", "7d").alias("ORDER_ID_CNT_7D"),
    Feature.count("ORDER_ID", "30d").alias("ORDER_ID_CNT_30D"),
]

# Single Feature View efficiently computes all windows (required FV args shown above)
multi_window_fv = FeatureView(
    name="USER_MULTI_WINDOW",
    entities=[user_entity],
    feature_df=session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS"),
    timestamp_col="ORDER_TS",
    refresh_freq="1 day",
    features=features,
    feature_granularity="1 day",
)

When you have many columns and windows, a helper function avoids repetitive boilerplate and enforces consistent naming:

from snowflake.ml.feature_store import Feature

def build_features(columns: list[str], windows: list[str], agg: str = "sum") -> list[Feature]:
    agg_fn = getattr(Feature, agg)
    return [
        agg_fn(col, w).alias(f"{col}_{agg.upper()}_{w.upper()}")
        for col in columns
        for w in windows
    ]

features = (
    build_features(["TOTAL_AMT", "DISCOUNT_AMT"], ["7d", "30d", "90d"], "sum")
    + build_features(["ORDER_ID"], ["7d", "30d"], "count")
)

7.8 Best Practices

1. Use Consistent Naming Conventions

# Pattern: <COL_OR_ENTITY>_<ROLE>_<WINDOW> (roles: _SUM, _CNT, _AVG, …)
features = [
    Feature.sum("TOTAL_AMT", "7d").alias("TOTAL_AMT_SUM_7D"),
    Feature.sum("TOTAL_AMT", "30d").alias("TOTAL_AMT_SUM_30D"),
    Feature.count("ORDER_ID", "7d").alias("ORDER_ID_CNT_7D"),
    Feature.avg("TOTAL_AMT", "7d").alias("TOTAL_AMT_AVG_7D"),
]

2. Match Tile Size to Data Frequency

# High-frequency: smaller tiles
clickstream_fv = FeatureView(
    feature_granularity="1 hour",
    # ...
)

# Low-frequency: larger tiles
weekly_metrics_fv = FeatureView(
    feature_granularity="1 day",
    # ...
)

3. Group Related Features and Consolidate Feature Views

When multiple Feature Views share the same source table, entity, and tile granularity, they automatically share the same underlying tile computation. Consolidating them into a single Feature View reduces operational overhead (fewer DTs to monitor) and ensures tile computation happens once:

# ✅ GOOD: Related features in same Feature View -- single tile computation
purchase_features = [
    Feature.sum("TOTAL_AMT", "7d").alias("TOTAL_AMT_SUM_7D"),
    Feature.sum("TOTAL_AMT", "30d").alias("TOTAL_AMT_SUM_30D"),
    Feature.count("ORDER_ID", "7d").alias("ORDER_ID_CNT_7D"),
    Feature.count("ORDER_ID", "30d").alias("ORDER_ID_CNT_30D"),
    Feature.avg("TOTAL_AMT", "7d").alias("TOTAL_AMT_AVG_7D"),
]

# ❌ BAD: Same source split across multiple FVs -- duplicated tile computation
spend_fv = FeatureView(
    name="USER_SPEND_FV",
    feature_df=session.table(orders_fqn),
    features=[Feature.sum("TOTAL_AMT", "7d").alias("TOTAL_AMT_SUM_7D")],
    # ...
)
count_fv = FeatureView(
    name="USER_ORDER_COUNT_FV",
    feature_df=session.table(orders_fqn),
    features=[Feature.count("ORDER_ID", "7d").alias("ORDER_ID_CNT_7D")],
    # ...
)

Splitting features from the same source into separate Feature Views forces independent tile computations over the same data. Each Feature View creates its own DT with its own tiles, effectively doubling (or more) the compute cost. Consolidate into one Feature View when the source, entity, and granularity are shared.

When separate Feature Views are appropriate:

  • Different source tables (e.g., ORDERS vs EVENTS)
  • Different entities (e.g., USER vs PRODUCT)
  • Different tile granularities (e.g., hourly tiles for short windows, daily for long – see Multi-Granularity)
  • Different refresh cadences needed for operational reasons
  • Unrelated feature domains that evolve independently
# ❌ BAD: Unrelated features mixed together
mixed_features = [
    Feature.sum("TOTAL_AMT", "7d"),  # Purchase feature
    Feature.count("EVENT_ID", "7d"),  # Clickstream feature (different grain / entity!)
]

4. Tile Sharing Across Feature Views

When two Feature Views use the same source, entity, and feature_granularity, the underlying tile DTs compute identical partial aggregates. Today, the Feature Store does not automatically share tiles between separately registered Feature Views – each creates its own tile DT. This means:

  • Consolidating features into fewer Feature Views (per best practice #3 above) is the primary mechanism for avoiding duplicate tile computation.
  • If you need separate Feature Views for organisational reasons (different owners, different version lifecycles), be aware that tile compute is duplicated. The cost impact depends on source data volume and tile granularity.
  • A future platform optimisation may deduplicate tiles across Feature Views with identical source/entity/granularity, but until then, consolidation is the recommended approach.
Stacking tiled Feature Views is not supported

You cannot chain (stack) one Time Aggregation Feature View on top of another. Attempting it causes duplicate internal _PARTIAL_* column names and SQL compilation errors that block both retrieve_feature_values() and generate_training_set(). To aggregate from a finer-grained entity to a coarser one (e.g., visitor → subscriber), use Rollup Feature Views with RollupConfig as described below – not raw stacking of tiled Feature Views.


7.9 Rollup Aggregated Feature Views

Rollup Feature Views enable hierarchical aggregation – creating coarser-grained features from finer-grained tiled Feature Views without recomputing from raw data. For example, clickstream-style hierarchies often roll up VISITORUSERHOUSEHOLD: a visitor-level “7-day page view count” can be rolled up to user-level (sum across all visitors linked to that user) and then to household-level (sum across all users in the household).

The RollupConfig defines the relationship between a finer-grained (source) Feature View and the coarser-grained (target) Feature View. The Feature Store leverages Dynamic Tables for automatic, incremental updates.

7.9.1 RollupConfig Arguments

Argument Description
source The parent tiled Feature View to roll up from. Must be registered (not in DRAFT status)
mapping_df A Snowpark DataFrame containing the entity mapping – all join keys from both the source (finer) and target (coarser) entities

7.9.2 Example: Visitor → Subscriber Rollup

from snowflake.ml.feature_store import FeatureStore, FeatureView, Entity, RollupConfig, Feature

# Entities at both levels of the hierarchy
visitor_entity = Entity("VISITOR", ["VISITOR_ID", "COMPANY_ID"])
subscriber_entity = Entity("SUBSCRIBER", ["SUBSCRIBER_ID", "COMPANY_ID"])
fs.register_entity(visitor_entity)
fs.register_entity(subscriber_entity)

# Fine-grained: tiled visitor-level Feature View (already registered)
visitor_features = [
    Feature.sum("PAGE_VIEWS", "7d").alias("PAGE_VIEWS_7D"),
    Feature.count("SESSION_ID", "30d").alias("SESSION_CNT_30D"),
]
visitor_fv = FeatureView(
    name="VISITOR_ENGAGEMENT_FV",
    entities=[visitor_entity],
    feature_df=session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.EVENTS"),
    timestamp_col="EVENT_TS",
    refresh_freq="1 hour",
    feature_granularity="1 hour",
    features=visitor_features,
    desc="Visitor-level engagement features — tiled",
)
registered_visitor_fv = fs.register_feature_view(visitor_fv, version="V01")

# Entity mapping: links visitor keys to subscriber keys
mapping_df = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.VISITOR_SUBSCRIBER_MAPPING")
# Expected columns: VISITOR_ID, COMPANY_ID, SUBSCRIBER_ID

# Rollup configuration
rollup_config = RollupConfig(
    source=registered_visitor_fv,
    mapping_df=mapping_df,
)

# Coarse-grained: subscriber-level rollup Feature View
subscriber_fv = FeatureView(
    name="SUBSCRIBER_ENGAGEMENT_FV",
    entities=[subscriber_entity],
    rollup_config=rollup_config,
    desc="Subscriber-level engagement — rolled up from visitor-level tiles",
)
fs.register_feature_view(subscriber_fv, version="V01")

The resulting SUBSCRIBER_ENGAGEMENT_FV automatically aggregates the visitor-level tiles up to the subscriber level using the entity mapping. Because it is backed by a Dynamic Table, it refreshes incrementally as the source visitor Feature View updates.

Note

The source Feature View must be a registered, tiled Feature View (one with features and feature_granularity). You cannot roll up a non-tiled DT-based or View-based Feature View.

7.9.3 Rollup Design Guidelines

  • Define entities that match the grain of each Feature View (for example, visitor-level tiles on EVENTS, user-level aggregates on ORDERS keyed by USER_ID).
  • Reuse the same aggregation patterns (Feature.sum, Feature.count, …) at each level; change the spine entity and join keys when moving from visitor to user to household.
  • Keep aliases explicit (VISITOR_PAGE_VIEWS_7D vs USER_PAGE_VIEWS_7D) so rolled-up datasets remain interpretable when multiple hierarchy levels appear in one training set.
  • Design mapping tables carefully. The mapping_df determines which fine-grained entities roll up to which coarse-grained entities. For static relationships a simple DISTINCT mapping suffices; for time-varying relationships (visitors identified as subscribers over time), consider SCD-Type2 patterns or MAPPED_AT timestamps. See Chapter 3: Entity Mapping Tables for patterns and PIT correctness implications.
  • Watch for late-arriving mappings. If entity mappings are added retroactively, the rollup may aggregate historical tile data under a mapping that did not exist at the original event time, creating subtle data leakage. See the mapping table PIT warning in Chapter 3 and Chapter 6.

7.10 Common Pitfalls

7.10.1 ❌ Pitfall 1: No Alias

Problem: Auto-generated names are unclear.

Solution: Always use .alias() with descriptive names.

7.10.2 ❌ Pitfall 2: Wrong Tile Size

Problem: Tile size doesn’t match data frequency.

Solution: Use smaller tiles for high-frequency data.

7.10.3 ❌ Pitfall 3: Column Collisions

Problem: Same column names from different Feature Views.

Solution: Use auto_prefix=True or .with_name().

7.10.4 ❌ Pitfall 4: Internal _PARTIAL Columns in Output

Problem: Internal columns used for tile aggregation (prefixed _PARTIAL_) sometimes appear in output DataFrames from generate_training_set() or retrieve_feature_values(). These are implementation artifacts – partial aggregates stored per tile – and are not user-facing features.

Solution: Filter them before consuming the output:

result_df = fs.generate_training_set(spine_df=spine, features=fv_list, ...)

# Drop internal tile columns
output_cols = [c for c in result_df.columns if not c.startswith("_PARTIAL_")]
result_df = result_df.select(output_cols)

7.10.5 ❌ Pitfall 5: Partial Leading-Edge Tiles from Incomplete Source Intervals

Problem: When the source data is a continuous stream aggregated into fixed time buckets (e.g., 15-minute intervals), the current in-progress bucket is incomplete at DT refresh time. The tiled Feature View generates a tile from this partial interval, producing aggregation values with a different distribution than complete historical tiles. This creates training/serving skew: the model trains on features derived from complete intervals but at inference time receives features that include a partial tile.

This affects all tiled aggregations — Feature.avg, Feature.sum, Feature.count, Feature.min, Feature.max — because the partial tile has fewer source events than a complete one.

Solution: Two complementary strategies:

Strategy 1 — Use offset to skip the current interval (DT-safe):

The offset parameter shifts the aggregation window back, inherently excluding the potentially partial current tile. This is deterministic SQL that preserves incremental DT refresh:

# Window covers intervals ending one interval ago — skips partial current tile
Feature.avg("QTY", "1h", offset="15m").alias("QTY_ROLL_4_MEAN")
Feature.sum("QTY", "1h", offset="15m").alias("QTY_ROLL_4_SUM")

This is the simplest and most robust approach for tiled Feature Views. The trade-off is that features always reflect the state one interval behind — typically acceptable for forecasting use cases where the current interval is incomplete anyway.

Strategy 2 — Gate the source table externally (outside the DT):

Use a scheduled Task or Stream+Task pipeline that writes only complete intervals to the source table. The DT reads a table that never contains partial data:

-- Task runs every 15 minutes: write only complete intervals to source
INSERT INTO SOURCE_ORDERS_15MIN
SELECT TIME_SLICE(ORDER_TS, 15, 'MINUTE') AS INTERVAL_ID,
       ITEM_CODE, BRANCH_CODE, COUNT(*) AS QTY
FROM RAW_ORDERS
WHERE TIME_SLICE(ORDER_TS, 15, 'MINUTE')
      < TIME_SLICE(CURRENT_TIMESTAMP(), 15, 'MINUTE', 'START')
GROUP BY 1, 2, 3;
Do not use CURRENT_TIMESTAMP() inside a DT or feature_df

CURRENT_TIMESTAMP() is non-deterministic and forces FULL refresh. The TIME_SLICE gating filter must be applied in a Task or ingestion pipeline outside the DT — never in the DT definition SQL or the feature_df passed to a Feature View. Even when passed via session.sql(...), the non-deterministic expression flows through to the DT and prevents incremental refresh.

For non-tiled DT Feature Views with explicit window functions, use ROWS BETWEEN N PRECEDING AND 1 PRECEDING (excluding CURRENT ROW) as the DT-safe equivalent. See Chapter 5: Partial Leading-Edge Interval and Chapter 6: Data Leakage from Partial Intervals.


7.11 Data-Driven Feature Definitions

The examples earlier in this chapter define features with hard-coded column names, filter values, and window periods. That works well for a handful of stable features, but becomes brittle and verbose as feature sets grow. This section introduces data-driven patterns that generate Feature objects and the corresponding feature_df columns programmatically.

Schema Is Fixed at Registration

The Feature View schema is fixed at registration time. Data-driven generation makes the code path adaptive, not the registered Feature View itself. When underlying data changes (e.g., a new channel appears), re-run the generation code and register a new version of the Feature View.

Pattern 1: Categorical Pivot — Static vs Dynamic

Filter Expressions Within Features shows how CASE WHEN decomposes a metric by a single hard-coded dimension value. When the set of values is large or evolving, generate the columns programmatically:

Static (2 known channels):

source_df = session.sql("""
    SELECT USER_ID, ORDER_TS, ORDER_ID, TOTAL_AMT,
           CASE WHEN CHANNEL = 'AUTOMATED' THEN ORDER_ID END AS AUTOMATED_ORDER_ID,
           CASE WHEN CHANNEL = 'WEB'       THEN ORDER_ID END AS WEB_ORDER_ID
    FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
""")
import snowflake.snowpark.functions as F

orders = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS")
source_df = orders.select(
    "USER_ID", "ORDER_TS", "ORDER_ID", "TOTAL_AMT",
    F.when(F.col("CHANNEL") == "AUTOMATED", F.col("ORDER_ID")).alias("AUTOMATED_ORDER_ID"),
    F.when(F.col("CHANNEL") == "WEB", F.col("ORDER_ID")).alias("WEB_ORDER_ID"),
)
features = [
    Feature.count("ORDER_ID", "7d").alias("ORDER_CNT_7D"),
    Feature.count("AUTOMATED_ORDER_ID", "7d").alias("AUTOMATED_ORDER_CNT_7D"),
    Feature.count("WEB_ORDER_ID", "7d").alias("WEB_ORDER_CNT_7D"),
]

Dynamic (discovers all channels from data):

from snowflake.snowpark import functions as F

channels = [row["CHANNEL"] for row in
    session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS")
    .select("CHANNEL").distinct().collect()]

source_df = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS") \
    .select("USER_ID", "ORDER_TS", "ORDER_ID", "TOTAL_AMT", "CHANNEL")

for ch in channels:
    tag = ch.upper().replace(" ", "_")
    source_df = source_df.with_column(
        f"{tag}_ORDER_ID",
        F.when(F.col("CHANNEL") == ch, F.col("ORDER_ID"))
    )

feature_df = source_df.drop("CHANNEL")

features = [Feature.count("ORDER_ID", "7d").alias("ORDER_CNT_7D")]
for ch in channels:
    tag = ch.upper().replace(" ", "_")
    features.append(
        Feature.count(f"{tag}_ORDER_ID", "7d").alias(f"{tag}_ORDER_CNT_7D")
    )

When to prefer dynamic: Dimension values change over time (new channels, product categories, regions). Adding a value requires only a data change and a new Feature View version — no code edits.

Pattern 2: Multi-Window Sweep

The Multi-Window Features section above shows a build_features helper that loops over columns and windows. This is already a data-driven pattern: the columns and windows lists are the configuration, and the helper generates Feature objects with consistent naming. Refer to that helper as the starting point for more complex patterns.

Pattern 3: Metric × Dimension × Window Factory

Patterns 1 and 2 combine naturally into a full cross-product: for every (metric, dimension value, window) triple, generate a filtered column and a Feature. With 3 metrics, 4 channels, and 3 windows, this produces 36 features from a single function call instead of 36 hand-written definitions.

from itertools import product as cartesian
from snowflake.snowpark import functions as F
from snowflake.ml.feature_store import Feature

AGG_DISPATCH = {
    "count": Feature.count,
    "sum":   Feature.sum,
    "avg":   Feature.avg,
    "max":   Feature.max,
    "min":   Feature.min,
}

def build_pivot_features(
    session,
    source_table: str,
    entity_col: str,
    ts_col: str,
    pivot_col: str,
    metrics: dict[str, str],      # {"ORDER_ID": "count", "TOTAL_AMT": "sum"}
    windows: list[str],           # ["7d", "30d", "90d"]
) -> tuple:
    """Return (feature_df, features_list) with full cross-product."""
    pivots = [r[0] for r in session.sql(
        f"SELECT DISTINCT {pivot_col} FROM {source_table} ORDER BY 1"
    ).collect()]

    df = session.table(source_table) \
        .select(entity_col, ts_col, pivot_col, *metrics.keys())

    features = []

    # Base (unfiltered) features across all windows
    for col, fn_name in metrics.items():
        for w in windows:
            features.append(
                AGG_DISPATCH[fn_name](col, w).alias(f"{col}_{fn_name.upper()}_{w.upper()}")
            )

    # Per-pivot filtered features
    for pv in pivots:
        tag = pv.upper().replace(" ", "_")
        for col in metrics:
            derived = f"{tag}_{col}"
            df = df.with_column(
                derived, F.when(F.col(pivot_col) == pv, F.col(col))
            )
            for w in windows:
                fn_name = metrics[col]
                features.append(
                    AGG_DISPATCH[fn_name](derived, w)
                    .alias(f"{tag}_{col}_{fn_name.upper()}_{w.upper()}")
                )

    return df.drop(pivot_col), features

Usage:

feature_df, features = build_pivot_features(
    session,
    source_table="FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS",
    entity_col="USER_ID",
    ts_col="ORDER_TS",
    pivot_col="CHANNEL",
    metrics={"ORDER_ID": "count", "TOTAL_AMT": "sum"},
    windows=["7d", "30d"],
)

fv = FeatureView(
    name="USER_CHANNEL_METRICS",
    entities=[user_entity],
    feature_df=feature_df,
    timestamp_col="ORDER_TS",
    refresh_freq="1 day",
    features=features,
    feature_granularity="1 day",
)

Pattern 4: Config-Driven Feature Specifications

Instead of embedding feature logic in code, define feature specifications as data — either a Python dict for version-controlled config or a Snowflake table for runtime flexibility.

Python dict (version-controlled):

FEATURE_SPECS = [
    {"col": "TOTAL_AMT", "agg": "sum",   "windows": ["7d", "30d", "90d"]},
    {"col": "ORDER_ID",  "agg": "count", "windows": ["7d", "30d"]},
    {"col": "TOTAL_AMT", "agg": "avg",   "windows": ["30d"]},
]

features = []
for spec in FEATURE_SPECS:
    fn = AGG_DISPATCH[spec["agg"]]
    for w in spec["windows"]:
        features.append(
            fn(spec["col"], w).alias(f"{spec['col']}_{spec['agg'].upper()}_{w.upper()}")
        )

Snowflake table (runtime-configurable):

CREATE TABLE IF NOT EXISTS FEATURE_STORE_DEMO.FEATURE_STORE.FEATURE_SPECS (
    DOMAIN      VARCHAR,   -- e.g. 'ORDERS', 'SESSIONS'
    SOURCE_COL  VARCHAR,   -- column to aggregate
    AGG_FN      VARCHAR,   -- 'sum', 'count', 'avg', etc.
    WINDOW_SIZE VARCHAR,   -- '7d', '30d', etc.
    ENABLED     BOOLEAN DEFAULT TRUE
);
specs = session.table("FEATURE_STORE_DEMO.FEATURE_STORE.FEATURE_SPECS") \
    .filter(F.col("DOMAIN") == "ORDERS") \
    .filter(F.col("ENABLED") == True) \
    .collect()

features = []
for row in specs:
    fn = AGG_DISPATCH[row["AGG_FN"]]
    features.append(
        fn(row["SOURCE_COL"], row["WINDOW_SIZE"])
        .alias(f"{row['SOURCE_COL']}_{row['AGG_FN'].upper()}_{row['WINDOW_SIZE'].upper()}")
    )
Approach Pros Cons
Python dict Version-controlled, reviewable in PRs, easy to test Requires code deployment to change
Snowflake table Non-developers can manage; takes effect on next FV version State split between git (code) and table (config); harder to reproduce; audit trail needed
Tip

A hybrid approach keeps the dict in code as the source of truth and syncs it to a Snowflake table for discoverability. This gives you version control and runtime queryability.

Pattern 5: Ratio and Derived Features

Many useful features are derived from base aggregations — conversion rates, averages-per-order, or period-over-period growth. The Feature class supports base aggregation functions only; ratios and formulas must be computed separately.

Tiled Feature Views store partial aggregates

The underlying Dynamic Table of a tiled Feature View contains internal _PARTIAL columns — tile-level fragments, not final values. You cannot compute ratios by reading the DT directly (e.g., session.table("FEATURE_STORE.USER_ORDER_FV$V01")). You must use the Feature Store API to reassemble tiles into final aggregated values first.

Approach A: Post-retrieval (recommended for training pipelines)

generate_dataset and retrieve_feature_values handle tile reassembly automatically. Compute ratios on the returned DataFrame:

RATIOS = [
    ("AUTOMATED_ORDER_CNT_7D", "ORDER_CNT_7D", "AUTOMATION_RATE_7D"),
    ("DISCOUNT_AMT_SUM_30D",   "TOTAL_AMT_SUM_30D", "DISCOUNT_RATE_30D"),
]

dataset_df = fs.generate_dataset(
    name="TRAINING_SET",
    spine_df=spine,
    features=[base_fv],
    output_type="dataset",
).read.to_snowpark_dataframe()

for num, denom, alias in RATIOS:
    dataset_df = dataset_df.with_column(
        alias,
        F.iff(F.col(denom) != 0, F.col(num) / F.col(denom), F.lit(None))
    )

Approach B: Downstream View-based Feature View

Use read_feature_view to get the reassembled values, then build a non-tiled Feature View that computes derived columns:

base_fv = fs.get_feature_view("USER_ORDER_METRICS", "V01")
reassembled_df = fs.read_feature_view(base_fv)

for num, denom, alias in RATIOS:
    reassembled_df = reassembled_df.with_column(
        alias,
        F.iff(F.col(denom) != 0, F.col(num) / F.col(denom), F.lit(None))
    )

ratio_fv = FeatureView(
    name="USER_ORDER_RATIOS",
    entities=[user_entity],
    feature_df=reassembled_df,
    # No refresh_freq — View-based, recomputes on read
)
fs.register_feature_view(ratio_fv, version="V01")
Note

Safe division (F.iff(denom != 0, ...)) prevents NaN / divide-by-zero errors that can silently corrupt model training. Always guard ratio features.

Pattern 6: Interaction Features

Interaction features capture non-linear relationships by combining pairs of columns — typically products (a × b) or ratios (a / b). These apply to any Feature View source DataFrame, not just aggregation Feature Views.

Static (hand-picked pairs):

source_df = source_df.with_column(
    "AMT_X_QTY", F.col("TOTAL_AMT") * F.col("QUANTITY")
).with_column(
    "PRICE_PER_UNIT", F.iff(F.col("QUANTITY") != 0,
                            F.col("TOTAL_AMT") / F.col("QUANTITY"),
                            F.lit(None))
)

Dynamic (all pairwise products from a list):

from itertools import combinations

NUMERIC_COLS = ["TOTAL_AMT", "QUANTITY", "DISCOUNT_AMT"]

for a, b in combinations(NUMERIC_COLS, 2):
    source_df = source_df.with_column(
        f"{a}_X_{b}", F.col(a) * F.col(b)
    )

With 5 numeric columns this generates 10 interaction features; with 10 columns it generates 45. Use this for exploratory model experiments, then prune non-contributing interactions after feature importance evaluation to avoid unnecessary compute.

Static vs Dynamic — Trade-Off Summary

Dimension Static Dynamic
Readability Immediately clear; each feature visible in code Requires understanding the generation logic
Verbosity High — repetitive definitions Low — DRY, loop-based
Testability Each feature testable individually Test the generator + spot-check outputs
Adaptability Manual edit per new value/metric/window Adapts to data changes; new FV version only
Debugging Straightforward — read the code Inspect generated DataFrame schema
Git diffs Explicit per-feature changes Changes to generator logic only
Registration Same Same — schema fixed at register_feature_view

Guidance: Start static for small, stable feature sets (under ~10 features). Switch to dynamic when you find yourself copying and renaming feature definitions, when dimension values evolve, or when you need to experiment with many metric/window combinations rapidly.

Consuming Data-Driven Features with OBJECT_CONSTRUCT

The factory patterns above enforce a naming convention — {DIMENSION}_{METRIC}_{AGG}_{WINDOW} — that becomes a queryable contract. Snowflake’s OBJECT_CONSTRUCT with ILIKE and EXCLUDE filtering can slice dynamically generated columns by any naming axis without hard-coding column lists:

-- Slice features by dimension, window, or collect everything — no column list needed
SELECT
    USER_ID,
    OBJECT_CONSTRUCT(* ILIKE 'AUTOMATED_%')             AS automated_features,
    OBJECT_CONSTRUCT(* ILIKE '%_7D')                     AS window_7d_features,
    OBJECT_CONSTRUCT(* ILIKE '%_30D')                    AS window_30d_features,
    OBJECT_CONSTRUCT(* EXCLUDE (USER_ID, ORDER_TS))      AS all_features
FROM FEATURE_STORE.USER_CHANNEL_METRICS$V01

This is particularly valuable when the factory generates dozens of columns — consumers can dynamically group features by dimension, window, or metric type without knowing every column name. The resulting OBJECTs are well-suited for:

  • Feature serving APIs that return JSON payloads
  • Feature explorationOBJECT_KEYS(all_features) lists what was generated
  • Selective retrieval — extract specific features with all_features:"AUTOMATED_ORDER_ID_COUNT_7D" path syntax

Two-tier Feature View pattern: Combine this with a lightweight View-based Feature View that repackages the flat columns from a tiled DT-backed Feature View into structured OBJECTs. The tiled FV handles efficient incremental computation; the View FV provides the structured consumption interface. See Chapter 12: Wide & Sparse Feature Data for the full two-tier architecture.

Note

OBJECT_CONSTRUCT operates at the consumption layer — downstream queries and serving. The Feature class still requires flat scalar columns as inputs during definition. The naming convention from the factory is what bridges the two: structured names at definition time enable wildcard grouping at consumption time.


7.12 Summary

Concept Description
Feature class Declarative API for time-windowed aggregations (Feature.sum, Feature.count, …)
Window Time period for aggregation (e.g., "7d", "30d"); optional offset shifts the window back
Tile (feature_granularity) Pre-computation unit for efficiency
Required FV args timestamp_col, refresh_freq, feature_granularity, features (plus name, entities, feature_df)
Version Register with zero-padded strings such as "V01", "V02"
auto_prefix Automatic column prefixing with FV name
.with_name() Custom column prefixing
RollupConfig Hierarchical aggregation from fine-grained to coarse-grained entities via RollupConfig(source, mapping_df)
Data-driven definitions Programmatic generation of Feature objects and source columns from data or config; patterns include categorical pivot, cross-product factory, config-driven specs, ratio features, and interaction features

7.13 Tiled Feature Views and Online Serving

Tiled Feature Views Cannot Be Served Directly via Hybrid Table OFTs

A tiled Feature View’s underlying Dynamic Table stores partial aggregates per tile interval — columns like _PARTIAL_SUM_TOTAL_AMT, _PARTIAL_COUNT_ORDER_ID, and a TILE_START timestamp. The original timestamp_col (e.g., ORDER_TS) does not exist in the DT schema. When you attempt to create a Hybrid Table Online Feature Table from a tiled FV, the CREATE ONLINE FEATURE TABLE ... TIMESTAMP_COLUMN = ORDER_TS statement fails:

SQL compilation error: column 'ORDER_TS' does not exist

The SDK does not validate this upfront — the error only surfaces at OFT creation time during register_feature_view.

For offline retrieval (fs.read_feature_view, fs.retrieve_feature_values), the Feature Store performs tile reassembly automatically — window aggregations are computed correctly at read time. The issue only arises when you want to push these aggregated values into a Hybrid Table for low-latency online serving.

7.13.1 Workaround: online_fv_from_tiled Helper

Manually re-expressing the same aggregation logic as a SQL GROUP BY is error-prone and violates DRY. The online_fv_from_tiled() helper introspects a tiled Feature View’s Feature definitions and automatically generates a non-tiled, OFT-compatible companion:

from online_from_tiled import online_fv_from_tiled

# Step 1: Define the tiled FV (offline only, no online_config)
tiled_fv = FeatureView(
    name="USER_PURCHASE_TILED",
    entities=[user_entity],
    feature_df=session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS"),
    timestamp_col="ORDER_TS",
    refresh_freq="1 hour",
    feature_granularity="1 hour",
    features=purchase_features,
    desc="Tiled aggregations — offline only",
)
tiled_reg = fs.register_feature_view(tiled_fv, version="V01", block=True)

# Step 2: Auto-generate a non-tiled, OFT-ready companion
online_fv = online_fv_from_tiled(
    tiled_reg, session,
    source_query="SELECT * FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS",
    refresh_freq="1 minute",
    target_lag="1 minute",
    window_buffer="1d",
)
online_reg = fs.register_feature_view(online_fv, version="V01", block=True)

The helper produces a GROUP BY query scoped to only max(window + offset) + buffer of source data. For the purchase features above (7-day windows, no offset, 1-day buffer), the generated SQL is:

SELECT
    USER_ID,
    SUM(CASE WHEN ORDER_TS >= DATEADD('DAY', -7, CURRENT_TIMESTAMP())
        THEN TOTAL_AMT END)  AS TOTAL_AMT_SUM_7D,
    COUNT(CASE WHEN ORDER_TS >= DATEADD('DAY', -7, CURRENT_TIMESTAMP())
        THEN ORDER_ID END)   AS ORDER_ID_CNT_7D,
    AVG(CASE WHEN ORDER_TS >= DATEADD('DAY', -7, CURRENT_TIMESTAMP())
        THEN TOTAL_AMT END)  AS TOTAL_AMT_AVG_7D,
    MAX(ORDER_TS) AS LAST_ORDER_TS
FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
WHERE ORDER_TS >= DATEADD('DAY', -8, CURRENT_TIMESTAMP())
GROUP BY USER_ID

Each feature gets its own CASE WHEN filter for its specific window, so features with different windows (e.g., 7-day and 30-day) coexist correctly in a single query. The outer WHERE clause scopes the entire query to the longest window plus the buffer.

How the helper works

online_fv_from_tiled() reads each Feature object’s internal attributes (_function, _column, _window, _offset, _alias) and maps them to standard SQL aggregation functions. It:

  1. Converts each Feature aggregation type to its SQL equivalent (e.g., Feature.sumSUM, Feature.stddevSTDDEV)
  2. Applies per-feature CASE WHEN window filters based on the feature’s _window and _offset
  3. Computes the maximum lookback across all features and adds window_buffer to form the outer WHERE clause
  4. Adds MAX(timestamp_col) AS LAST_{timestamp_col} as the timestamp for OFT tracking
  5. Returns a new FeatureView with online_config enabled and no tiling

For a registered tiled FV (whose internal query has been rewritten to the tiled form), you must pass source_query= pointing at the original source table.

Unsupported aggregations: FIRST_N, LAST_N, FIRST_DISTINCT_N, and LAST_DISTINCT_N cannot be reduced to a simple GROUP BY — the helper raises a clear error for these.

You can also use online_sql_from_features() to inspect or customise the generated SQL before creating a FeatureView:

from online_from_tiled import online_sql_from_features

sql = online_sql_from_features(
    features=purchase_features,
    join_keys=["USER_ID"],
    source_table="FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS",
    timestamp_col="ORDER_TS",
)
print(sql)

7.13.2 Manual Approach: Direct SQL

If you prefer full control or need to customise the query beyond what the helper generates, you can write the non-tiled FV manually:

from snowflake.ml.feature_store import feature_view as fv_mod

online_df = session.sql("""
    SELECT
        USER_ID,
        SUM(CASE WHEN ORDER_TS >= DATEADD('DAY', -7, CURRENT_TIMESTAMP())
            THEN TOTAL_AMT END)  AS TOTAL_AMT_SUM_7D,
        COUNT(CASE WHEN ORDER_TS >= DATEADD('DAY', -7, CURRENT_TIMESTAMP())
            THEN ORDER_ID END)   AS ORDER_ID_CNT_7D,
        AVG(CASE WHEN ORDER_TS >= DATEADD('DAY', -7, CURRENT_TIMESTAMP())
            THEN TOTAL_AMT END)  AS TOTAL_AMT_AVG_7D,
        MAX(ORDER_TS) AS LAST_ORDER_TS
    FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
    WHERE ORDER_TS >= DATEADD('DAY', -8, CURRENT_TIMESTAMP())
    GROUP BY USER_ID
""")

online_fv = FeatureView(
    name="USER_PURCHASE_ONLINE",
    entities=[user_entity],
    feature_df=online_df,
    timestamp_col="LAST_ORDER_TS",
    refresh_freq="1 minute",
    online_config=fv_mod.OnlineConfig(enable=True, target_lag="1 minute"),
    desc="Non-tiled online companion for USER_PURCHASE_TILED",
)
online_reg = fs.register_feature_view(online_fv, version="V01", block=True)
from snowflake.snowpark import functions as F
from snowflake.ml.feature_store import feature_view as fv_mod

orders = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS")
cutoff_7d = F.dateadd("DAY", F.lit(-7), F.current_timestamp())
cutoff_8d = F.dateadd("DAY", F.lit(-8), F.current_timestamp())

online_df = (
    orders
    .filter(F.col("ORDER_TS") >= cutoff_8d)
    .group_by("USER_ID")
    .agg(
        F.sum(F.when(F.col("ORDER_TS") >= cutoff_7d, F.col("TOTAL_AMT")))
            .alias("TOTAL_AMT_SUM_7D"),
        F.count(F.when(F.col("ORDER_TS") >= cutoff_7d, F.col("ORDER_ID")))
            .alias("ORDER_ID_CNT_7D"),
        F.avg(F.when(F.col("ORDER_TS") >= cutoff_7d, F.col("TOTAL_AMT")))
            .alias("TOTAL_AMT_AVG_7D"),
        F.max("ORDER_TS").alias("LAST_ORDER_TS"),
    )
)

online_fv = FeatureView(
    name="USER_PURCHASE_ONLINE",
    entities=[user_entity],
    feature_df=online_df,
    timestamp_col="LAST_ORDER_TS",
    refresh_freq="1 minute",
    online_config=fv_mod.OnlineConfig(enable=True, target_lag="1 minute"),
    desc="Non-tiled online companion for USER_PURCHASE_TILED",
)
online_reg = fs.register_feature_view(online_fv, version="V01", block=True)
Performance: FULL Refresh on Scoped Window

Both the helper and the manual approach use CURRENT_TIMESTAMP() in the WHERE clause to scope data to only the maximum window needed. This forces the resulting Dynamic Table to use FULL refresh on every cycle (not incremental), because Snowflake cannot track changes on queries with non-deterministic functions.

For short-to-medium windows (e.g., 7–30 days), the scoped data volume is small and FULL refresh typically completes in 1–2 seconds. For wide windows (90d, 180d, 365d), the FULL refresh must re-scan a proportionally larger slice of the source table on every cycle. At 365 days the scoped query effectively touches most of the table, negating much of the scoping benefit.

Window Width Approx. Source Scope FULL Refresh Impact
7 days ~1 week + buffer Fast (seconds)
30 days ~1 month + buffer Low (seconds to tens of seconds)
90–180 days Quarters to half-year Moderate — monitor DT refresh duration and warehouse cost
365 days ~full table High — consider whether online serving of year-wide aggregates is truly needed, or pre-compute in a scheduled task instead
Component Refresh Mode Why
Tiled FV (offline) INCREMENTAL Deterministic tiling query, efficient for large historical datasets
Online FV (helper-generated) FULL CURRENT_TIMESTAMP() in WHERE prevents change tracking, but scoped to bounded window
OFT (Hybrid Table) Mirrors DT Syncs changes from the online DT each cycle

The recommended pattern is to use both: the Aggregations API for offline computation (efficient tiled storage) and the online_fv_from_tiled() helper for online serving (scoped, bounded FULL refresh). If your widest feature window exceeds 90 days, benchmark the DT refresh time and cost before committing to a 1-minute refresh_freq — a longer refresh cycle (e.g., 5–15 minutes) may be more appropriate for wide windows.

Postgres Online Store (Private Preview)

The upcoming Postgres-backed online store (OnlineStoreType.POSTGRES) is designed to handle tiled Feature Views natively. It performs query-time tile reassembly to return correct window-aggregated values directly. Once this backend reaches General Availability, it will eliminate the need for the materialized workaround. See Chapter 8: Online Features for online store configuration details.


7.14 Next Steps

Continue to Chapter 8: Online Features to learn about Online Feature Tables for real-time serving, including the limitations and workarounds for tiled Feature Views discussed above.