8 Online Features

Hybrid Table online serving and the online feature model

Keywords

snowflake, feature store, ml, machine learning, mlops

8.1 Overview

Online Feature Tables (OFTs) are first-class Feature Store objects that expose the latest feature values per entity for low-latency inference. Every OFT has a logical contract (online_config, entity keys, feature columns) and a physical online backend where values are stored and read.

Snowflake supports two online backends:

Backend	`OnlineStoreType`	Status	This chapter
Hybrid Table	`HYBRID_TABLE` (default)	GA	Primary focus — architecture, tuning, pitfalls
Managed Postgres	`POSTGRES`	PubPr	Chapter 9 — provisioning, Stream/RT FVs, Feature Groups

This chapter covers the GA Hybrid Table backend. When you register a Feature View with online_config (default store_type=HYBRID_TABLE), the Feature Store API creates a single Online Feature Table Snowflake object. Snowflake encapsulates and manages the underlying infrastructure — the Hybrid Table for point lookups, the Stream on the offline Feature View, and the Task that MERGEs on target_lag. You do not create or wire those objects yourself.

At a logical level the data flow is: offline Dynamic Table or View → Snowflake Stream (change capture) → Task (MERGE on target_lag) → Hybrid Table. Physically, that pipeline is owned by the OFT object — lifecycle, grants, and drop semantics are tied to the OFT, not to separately managed Stream/Task DDL.

The Postgres backend (Chapter 9) uses the same CREATE ONLINE FEATURE TABLE catalog object for batch sync, but provisioning, physical storage, and reads work differently — see Postgres vs Hybrid managed object model.

Terminology: two different “streams”

A Snowflake Stream on the offline Feature View drives Hybrid/Postgres batch sync (this chapter). A Stream Feature View is a separate object that ingests live events via stream_ingest() into the Postgres store (Chapter 9). The names are similar; the write paths are not.

This chapter uses the clickstream sample model: database FEATURE_STORE_DEMO, source schema CLICKSTREAM_DATA (for example ORDERS, USERS, SESSIONS), Feature Store schema FEATURE_STORE, warehouse FS_DEV_WH. Feature View registrations use version strings in V01 format (for example version="V01").

Official references: CREATE ONLINE FEATURE TABLE, Create and serve online features (Python).

8.2 Learning Objectives

After completing this chapter, you will be able to:

Distinguish Hybrid vs Postgres online backends and which write paths each supports
Explain that a Hybrid OFT encapsulates Stream, Task, and Hybrid Table — the API creates and manages them as one object
Explain how Stream, Task, Hybrid Table, and TARGET_LAG fit together in the batch sync architecture
Enable online serving with online_config on a Feature View
Tune target_lag and reason about storage and compute costs
Handle multi-Feature-View online reads via pre-joined Feature Views or parallel retrieval
Plan latency budgets and troubleshoot latency volatility
Benchmark OFT performance correctly: warm-up, steady-state measurement, concurrent connections, and per-minute aggregates
Retrieve online features via the Feature Store Python API and integrate with serving layers
Evaluate when an external serving store is appropriate and understand the trade-offs

📂 Chapter code: Browse companion scripts on GitHub

8.3 Online backends and write paths

Use this matrix as the canonical map. Batch → online sync works on both backends. Stream Feature Views, Real-Time Feature Views, and Feature Groups require the Postgres backend and create_online_service() (Chapter 9).

Write path	Hybrid Table	Postgres	Notes
Batch offline → online sync	✅ GA	✅ PubPr	Same Snowflake-side sync concept; Hybrid stores in Hybrid Table, Postgres stores in managed Postgres via online service
Stream ingest (`stream_ingest` / REST)	❌	✅ PubPr	Stream Feature Views only; online-first, async offline historization
Request-time compute (`compute_fn`)	❌	✅ PubPr	Real-Time Feature Views only; no offline materialization
Feature Groups (multi-FV online bundle)	❌	✅ PubPr	Postgres online table with compound primary key

When to stay on Hybrid (this chapter): GA production, no preview enrollment, minute-level freshness is acceptable, existing OFT operations and warehouse-based reads.

When to use Postgres (Chapter 9): sub-~20 ms reads, tiled FVs without online_fv_from_tiled() workaround, Stream or Real-Time Feature Views, Feature Groups, or REST ingest/query from outside Snowflake warehouses.

8.4 What is a Hybrid Table Online Feature Table?

A Hybrid Table OFT is a first-class Snowflake database object (CREATE ONLINE FEATURE TABLE). From your perspective you register a Feature View with online_config; the Feature Store issues the OFT DDL, applies Feature Store tags for lineage, and Snowflake provisions the backing components as a managed unit:

Component	Role	You manage directly?
Online Feature Table (catalog object)	Logical online serving object; Feature Store metadata anchor	No — created via `register_feature_view`
Hybrid Table	Row store for latest feature values per entity key	No — created and altered by the OFT
Stream on offline FV	Captures inserts/updates/deletes from the offline Dynamic Table or View	No — created and associated by the OFT
Task	Runs MERGE on schedule derived from `target_lag`	No — created and scheduled by the OFT

You tune behavior through OnlineConfig (target_lag, enable/disable online serving) and Feature View lifecycle (register_feature_view, version bumps, delete). You do not hand-author Stream, Task, or Hybrid Table DDL for standard Hybrid OFTs, and you should not drop those underlying objects independently — drop or replace the OFT through the Feature Store API so dependencies stay consistent.

The OFT exposes current (latest) values per entity for low-latency retrieval. Freshness from the offline Feature View is maintained by:

A Stream on that offline object, capturing new and changed rows.
A Task that runs on an interval derived from TARGET_LAG (minimum 10 seconds, maximum 8 days).
MERGE (or equivalent incremental logic) into the Hybrid Table so only deltas are applied when the Stream has data; when the Stream is empty, the Task can finish quickly with minimal compute.

Key Concept

OFTs store only the latest feature values — they do not retain historical values. For point-in-time historical retrieval, use the offline Feature View and training APIs.

TARGET_LAG vs offline refresh_freq

refresh_freq on the Feature View controls how often the offline Dynamic Table (or related materialization) refreshes. OnlineConfig.target_lag controls how often the online sync Task runs. They are independent; end-to-end staleness is roughly the sum of both paths. See the Python online features documentation.

flowchart LR
  OFF[Offline FV DT or View] --> MERGE[Stream + Task MERGE]
  MERGE --> HT[Hybrid Table latest per key]

Online Feature Table: offline FV to Hybrid Table via stream and task

8.5 Creating Online Feature Tables

8.5.1 Enable online serving with `online_config`

Enable online features on the Feature View using online_config. Earlier API versions used helpers such as with_online_enabled(), which are now deprecated — use OnlineConfig instead.

📁 Full code: _code/online_feature_tables.py

from snowflake.snowpark import Session
from snowflake.ml.feature_store import FeatureStore, FeatureView, Entity
from snowflake.ml.feature_store import feature_view

session = Session.builder.configs(...).create()
fs = FeatureStore(
    session=session,
    database="FEATURE_STORE_DEMO",
    schema="FEATURE_STORE",
    default_warehouse="FS_DEV_WH",
)

user_entity = Entity(name="USER", join_keys=["USER_ID"])

SQL
Snowpark DataFrame

feature_df = session.sql("""
    SELECT
        USER_ID,
        SUM(TOTAL_AMT) AS ORDER_TOTAL_AMT_SUM,
        COUNT(ORDER_ID) AS ORDER_CNT,
        AVG(TOTAL_AMT) AS ORDER_TOTAL_AMT_AVG,
        MAX(ORDER_TS) AS LAST_ORDER_TS
    FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
    GROUP BY USER_ID
""")

import snowflake.snowpark.functions as F

orders = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS")
feature_df = orders.group_by("USER_ID").agg(
    F.sum("TOTAL_AMT").alias("ORDER_TOTAL_AMT_SUM"),
    F.count("ORDER_ID").alias("ORDER_CNT"),
    F.avg("TOTAL_AMT").alias("ORDER_TOTAL_AMT_AVG"),
    F.max("ORDER_TS").alias("LAST_ORDER_TS"),
)

online_config = feature_view.OnlineConfig(
    enable=True,
    target_lag="1 minute",
)

fv = FeatureView(
    name="USER_ORDER_ONLINE_FV",
    entities=[user_entity],
    feature_df=feature_df,
    timestamp_col="LAST_ORDER_TS",
    refresh_freq="1 minute",
    refresh_mode="AUTO",
    desc="USER order aggregates from CLICKSTREAM_DATA.ORDERS; online + offline",
    online_config=online_config,
)

fv = fs.register_feature_view(feature_view=fv, version="V01")

Snowflake creates and wires the OFT infrastructure (Hybrid Table, Stream, Task) as part of serving enablement — one API call, one managed object; TARGET_LAG in SQL corresponds to OnlineConfig.target_lag in Python.

Note

Initial OFT population: When online serving is first enabled the OFT must be fully populated – one row per entity key present in the source table. This initial load may take noticeably longer than subsequent refreshes, especially for large entity cardinalities. Once the initial population completes, incremental refreshes process only changed rows and should be fast, though throughput still depends on the update frequency and rate of feature-value changes in the source.

8.5.2 SQL equivalent (illustrative)

For visibility into the object model, the SQL shape mirrors CREATE ONLINE FEATURE TABLE:

CREATE OR REPLACE ONLINE FEATURE TABLE FEATURE_STORE_DEMO.FEATURE_STORE.USER_ORDER_ONLINE_OFT
  PRIMARY KEY (USER_ID)
  TIMESTAMP_COLUMN = LAST_ORDER_TS
  TARGET_LAG = '1 minute'
  WAREHOUSE = FS_DEV_WH
  REFRESH_MODE = AUTO
FROM FEATURE_STORE_DEMO.FEATURE_STORE.<your_offline_dynamic_table_or_view>;

(The Python API typically manages object names and wiring when you register the Feature View with online_config.)

8.5.3 Adding online serving to an existing Feature View

If a Feature View was originally registered without online serving, you can enable it after the fact. The Feature Store will create the OFT infrastructure (Hybrid Table, Stream, Task) for the existing offline Feature View:

fs.update_feature_view(
    name="USER_ORDER_ONLINE_FV",
    version="V01",
    online_config=feature_view.OnlineConfig(enable=True, target_lag="30 seconds"),
)

8.6 Retrieving Online Features

Use read_feature_view with store_type=StoreType.ONLINE and explicit feature names. Keys are provided as a list of key tuples (one per lookup row).

from snowflake.ml.feature_store import StoreType

features = fs.read_feature_view(
    feature_view=fv,
    keys=[["user_001"], ["user_002"]],
    feature_names=[
        "ORDER_TOTAL_AMT_SUM",
        "ORDER_CNT",
        "ORDER_TOTAL_AMT_AVG",
        "LAST_ORDER_TS",
    ],
    store_type=StoreType.ONLINE,
)

For a single entity, pass one inner list, for example keys=[["user_001"]].

8.7 Cost estimation (qualitative)

Online Feature Tables draw cost from storage, refresh compute, and serving compute. Treat the following as planning heuristics; use account usage views and your warehouse size for real numbers.

Area	Driver	Qualitative notes
Hybrid Table storage	Entity count × feature columns × row width	Scales with distinct keys you serve and column footprint; Hybrid Table storage is priced per GB (higher than standard table storage).
Refresh compute	`target_lag` cadence × MERGE work	The Task runs on the TARGET_LAG schedule; incremental refresh (Stream with new rows) is usually cheap. If the Stream has no pending data, runs can be very light.
Offline refresh	Feature View `refresh_freq`	Separate from online `target_lag`; both contribute to how stale features can be.
Online reads	Lookup volume × warehouse	Each `read_feature_view(..., ONLINE)` uses warehouse resources; larger warehouses cost more per time. (Check current Snowflake pricing notes for Hybrid Table request metering.)

Tips: Prefer AUTO or INCREMENTAL refresh where supported, right-size TARGET_LAG to the business SLA, and avoid setting online sync tighter than your offline Feature View can actually refresh.

8.8 Multi-Feature-View Online Reads

8.8.1 The limitation

Offline APIs (generate_dataset, generate_training_set, retrieve_feature_values) accept a list of Feature Views and join them automatically. The online API – read_feature_view(..., StoreType.ONLINE) – currently accepts one Feature View per call. Real-time use cases such as recommenders often need features from multiple entities (e.g., user, item, user-item interaction) in a single request.

8.8.2 Solution A: Pre-joined Feature View (preferred)

If the entity keys allow it, create a combined Dynamic Table that pre-joins the required Feature Views into a single record, then derive the OFT from that combined Feature View. This pushes the join into the materialization layer so the online read is a single point lookup:

SQL
Snowpark DataFrame

combined_df = session.sql("""
    SELECT
        ui.USER_ID,
        ui.LISTING_ID,
        u.ORDER_TOTAL_AMT_SUM,
        u.ORDER_CNT,
        l.LISTING_PRICE,
        l.LISTING_AGE_DAYS,
        ui.VIEW_CNT,
        ui.LAST_INTERACTION_TS AS FV_TS
    FROM FEATURE_STORE_DEMO.FEATURE_STORE."USER_LISTING_INTERACTION_FV$V01" ui
    JOIN FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01"              u  ON u.USER_ID = ui.USER_ID
    JOIN FEATURE_STORE_DEMO.FEATURE_STORE."LISTING_FV$V01"                 l  ON l.LISTING_ID = ui.LISTING_ID
""")

ui = session.table('FEATURE_STORE_DEMO.FEATURE_STORE."USER_LISTING_INTERACTION_FV$V01"')
u  = session.table('FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01"')
l  = session.table('FEATURE_STORE_DEMO.FEATURE_STORE."LISTING_FV$V01"')

combined_df = (
    ui.join(u, ui["USER_ID"] == u["USER_ID"])
      .join(l, ui["LISTING_ID"] == l["LISTING_ID"])
      .select(
          ui["USER_ID"], ui["LISTING_ID"],
          u["ORDER_TOTAL_AMT_SUM"], u["ORDER_CNT"],
          l["LISTING_PRICE"], l["LISTING_AGE_DAYS"],
          ui["VIEW_CNT"],
          ui["LAST_INTERACTION_TS"].alias("FV_TS"),
      )
)

user_listing_entity = Entity(
    name="USER_LISTING", join_keys=["USER_ID", "LISTING_ID"],
)

combined_fv = FeatureView(
    name="RECO_COMBINED_ONLINE_FV",
    entities=[user_listing_entity],
    feature_df=combined_df,
    timestamp_col="FV_TS",
    refresh_freq="1 minute",
    online_config=feature_view.OnlineConfig(enable=True, target_lag="1 minute"),
    desc="Pre-joined user + listing + interaction features for online recommender scoring",
)
combined_fv = fs.register_feature_view(feature_view=combined_fv, version="V01")

A single read_feature_view call now returns all features:

features = fs.read_feature_view(
    feature_view=combined_fv,
    keys=[["user_001", "listing_042"]],
    feature_names=[...],
    store_type=StoreType.ONLINE,
)

Warning

Cardinality trade-off: Pre-joining inflates the OFT to the cardinality of the most granular key combination (e.g., user × listing). For high-cardinality interaction pairs (tens of millions of rows) this increases Hybrid Table storage, initial population time, and MERGE refresh cost. Evaluate whether the cardinality is manageable before choosing this approach – see High-cardinality OFTs below.

8.8.3 Solution B: Parallel reads with merging (fallback)

When pre-joining is not feasible (e.g., cardinality is too high, or the Feature Views have incompatible refresh cadences), issue reads in parallel and merge in the application layer:

from concurrent.futures import ThreadPoolExecutor
import pandas as pd

def read_online(fv, keys, feature_names):
    return fs.read_feature_view(
        feature_view=fv, keys=keys,
        feature_names=feature_names, store_type=StoreType.ONLINE,
    ).to_pandas()

with ThreadPoolExecutor(max_workers=3) as pool:
    fut_user    = pool.submit(read_online, user_fv,        [["user_001"]], USER_FEATURES)
    fut_listing = pool.submit(read_online, listing_fv,     [["listing_042"]], LISTING_FEATURES)
    fut_inter   = pool.submit(read_online, interaction_fv, [["user_001", "listing_042"]], INTER_FEATURES)

user_df    = fut_user.result()
listing_df = fut_listing.result()
inter_df   = fut_inter.result()

merged = pd.concat([user_df, listing_df, inter_df], axis=1)

Wall-clock latency is dominated by the slowest of the parallel calls rather than the sum. This approach avoids the cardinality explosion of Solution A but adds application-side complexity and requires manual result alignment.

8.9 Latency Budget Planning

Before deploying online features, establish a latency budget that allocates time across every component of the inference endpoint. Work backwards from the user-facing SLA:

Component	Typical range	Notes
Network (client → Snowflake)	5-30 ms	Dominated by region distance; co-locate where possible
Feature serving (OFT read)	10-50 ms (warm, low QPS)	Varies with row width, key count, and warehouse state
Model inference	5-50 ms	Depends on model complexity and serving framework
Application overhead	5-20 ms	Serialisation, result merging, logging
Total endpoint	25-150 ms	Sum of the above

Worked example: A recommender has a 100 ms end-to-end budget. Model inference takes 30 ms and application overhead is 10 ms, leaving 60 ms for feature serving + network. If network round-trip is 15 ms, the OFT read must complete in under 45 ms – achievable for a single batched lookup on a warm warehouse, but tight if multiple sequential FV reads are required (prefer the pre-joined pattern above).

Tip

If the latency budget leaves less than ~30 ms for feature serving, consider the pre-joined Feature View pattern to reduce the number of round trips, or the external serving store pattern for sub-10 ms requirements.

8.10 Performance Tuning and Benchmarking

8.10.1 Warm-Up Behavior

OFTs (backed by Hybrid Tables) require a number of queries to warm up before reaching steady-state optimal response times. The initial queries populate internal caches (data pages, metadata, connection pools) and will show higher latency than the system achieves once warmed.

Always warm up before measuring

Do not draw conclusions from the first few hundred queries. Run at least several minutes of sustained traffic at your expected QPS before recording latency measurements. Early cold-start results are not representative of production performance.

8.10.2 Benchmarking Guidelines

Guideline	Detail
Run duration	Sustain the test for long enough to reach steady state (minimum 5-10 minutes at target QPS). Short bursts capture warm-up noise, not true throughput.
Measure per-minute aggregates	Report P50, P95, P99 latency aggregated per minute. Avoid reporting single-query or first-query latencies as representative.
Use concurrent connections	A single serial connection under-utilises the warehouse. Launch multiple concurrent client connections (threads / async) to saturate the target QPS.
Match expected workload	Set QPS, key batch sizes, and feature column counts to match your production profile. Results from a 1 QPS test do not predict 1000 QPS behavior.
Account for network latency	Measure round-trip time from the client to your Snowflake account region. If the client is in a different region or on-premises, network latency dominates at low query times. Co-locate clients with the Snowflake account where possible.

8.10.3 Batch Key Lookups

Pre-batching entity keys into a single request reduces round trips and amortises connection overhead:

# ❌ BAD: one round-trip per user — N queries for N users
for uid in user_ids:
    fs.read_feature_view(
        feature_view=fv,
        keys=[[uid]],
        feature_names=FEATURE_NAMES,
        store_type=StoreType.ONLINE,
    )

# ✅ GOOD: one batched call — 1 query for N users
fs.read_feature_view(
    feature_view=fv,
    keys=[[uid] for uid in user_ids],
    feature_names=FEATURE_NAMES,
    store_type=StoreType.ONLINE,
)

Tip

Large key sets: OFTs are optimized for serving a modest number of key-based lookups (tens to low hundreds per call). If you need to look up thousands of entity keys at once (e.g., for batch scoring), it is typically faster to query the offline Feature View directly – the columnar scan of a Dynamic Table handles large result sets more efficiently than many point lookups against a Hybrid Table.

8.10.4 Parameter Binding

When issuing online lookups via raw SQL (outside the Feature Store API), use bind variables rather than string interpolation. This enables Snowflake to cache and reuse query plans, improving throughput under high QPS:

# ✅ GOOD: parameterised query — plan reuse
session.sql(
    "SELECT * FROM FEATURE_STORE.USER_ORDER_ONLINE_OFT WHERE USER_ID = ?",
    params=["user_001"],
).collect()

# ❌ BAD: string interpolation — new plan per unique value
session.sql(
    f"SELECT * FROM FEATURE_STORE.USER_ORDER_ONLINE_OFT WHERE USER_ID = 'user_001'"
).collect()

8.10.5 Warehouse Sizing and Multi-Cluster

For single-user or low-QPS testing, an XS warehouse is sufficient. For production workloads with high concurrency:

Use Standard warehouse type. Snowpark-optimized warehouses provide extra memory for UDF/UDTF workloads but add no benefit for Hybrid Table point lookups – they cost more per credit and can exhibit higher latency variance for OFT reads.
Multi-cluster warehouses (auto-scaling) allow Snowflake to spin up additional clusters to handle concurrent query load. Set MIN_CLUSTER_COUNT and MAX_CLUSTER_COUNT based on your typical and peak QPS.
Scale using multiples of XS warehouses – an XS cluster handles a bounded number of concurrent queries efficiently. Adding more clusters is generally more cost-effective than scaling up to a larger warehouse size; if P99 latency rises under load, add clusters first.
The refresh Task (MERGE into the Hybrid Table) and online reads share the warehouse. If refresh compute contends with read latency, consider separate warehouses for refresh vs. serving.

8.10.6 `target_lag` Trade-Offs (Online Sync)

`target_lag` (examples)	Freshness	Relative refresh cost	Typical use
`10 seconds` … `1 minute`	Tightest	Higher Task frequency	Fraud, real-time guards
`1 minute` … `5 minutes`	Low	Moderate	Recommendations, ranking
`15 minutes` … `1 hour`	Relaxed	Lower	Marketing, less critical paths

Match offline refresh_freq to upstream CLICKSTREAM_DATA landing latency so online sync is not chasing data that does not exist yet.

8.10.7 Feature Freshness and Training-Serving Skew

The features a model receives at inference time are subject to cumulative lag from multiple stages. The exact formula depends on the inference pattern:

Inference pattern	Lag components	Typical total
Online (OFT)	Source pipeline + offline `refresh_freq` + online `target_lag`	Seconds to hours
Batch (DT-based inference FV)	Source pipeline + upstream FV `refresh_freq` + inference DT `target_lag`	Minutes to hours
Batch (Task/Sproc)	Source pipeline + upstream FV `refresh_freq` + Task schedule interval	Minutes to hours

If the source data pipeline itself has variable execution times (e.g., an ETL that sometimes finishes at 2 AM and sometimes at 6 AM), the effective staleness is even less predictable. A model trained on point-in-time-perfect features – where the ASOF join retrieves the exact feature values available at each event – may never have seen feature values with this degree of staleness during training. At inference time, the model receives features that are minutes to hours behind reality, creating a training-serving skew (a distribution shift) that can degrade prediction accuracy.

This applies equally to batch inference. A DT-based scoring pipeline (see Chapter 11: Continuous Batch Inference) reads from upstream Feature Views that have their own refresh lag, compounded by the scoring DT’s own target_lag. A Stream/Task/Stored Procedure pattern has the same issue: the Task schedule determines how often scoring runs, but the features it reads may already be stale by the time the Task fires.

Account for realistic lag during training

To minimise training-serving skew, the training dataset should reflect the same feature freshness the model will experience in production. Techniques include:

Conservative spine (Chapter 6: Late-Arriving Data): Shift the spine timestamp back by the expected pipeline lag (e.g., DATEADD('hour', -2, EVENT_TS) AS FEATURE_CUTOFF_TS). The ASOF join then returns features as they would have looked given realistic staleness.
Measure actual lag: Query DT refresh history and OFT sync history to quantify real-world end-to-end staleness. Use this measurement to size the conservative spine buffer rather than guessing.
Feature freshness as a model input: For models that are sensitive to feature age, include a recency feature (e.g., DATEDIFF('minute', LAST_ORDER_TS, PREDICTION_TS)) as an explicit model input. This lets the model learn to discount stale features rather than treating all feature values as equally current. See Chapter 6: Returning the FV Timestamp for how to surface the Feature View’s timestamp column for this purpose.

Without these adjustments, a model can appear well-calibrated in offline evaluation (where features are perfectly fresh) but perform poorly once deployed – whether against an OFT, a DT-based scoring pipeline, or a scheduled batch job – with real-world pipeline lag.

8.10.8 High-Cardinality OFTs

OFT row count directly affects initial population time, MERGE refresh cost, and Hybrid Table storage. Guidelines:

Entity cardinality	Expected behavior
< 1 M rows	Fast initial load, lightweight incremental refreshes. Standard operating range.
1-10 M rows	Initial population may take minutes; incremental refresh is still efficient if change rate is low.
10-100 M+ rows	Initial population can be significant; MERGE cost per refresh cycle grows. Hybrid Table storage costs increase. Monitor refresh history and consider whether the offline Feature View + batch scoring is more appropriate.

Compound entity keys (e.g., USER_ID × LISTING_ID) can produce cross-product cardinalities that are orders of magnitude larger than individual entities. Before enabling online serving on high-cardinality interaction Feature Views, estimate the OFT row count and evaluate:

Is the full cross-product actually queried online? If only a subset of entity pairs is active, consider filtering to recently active combinations.
Would the pre-joined Feature View pattern work instead? Pre-joining avoids the interaction OFT entirely if user and listing features can be fetched separately and combined in the application.
Is an external serving store more appropriate? For tens of millions of rows with sub-20 ms latency requirements, a dedicated key-value store may be a better fit.

8.10.9 Latency Volatility Troubleshooting

If OFT read latency is volatile (e.g., 100-400 ms swings) after warm-up, investigate the following:

Check	Action
MERGE contention	The refresh Task running MERGE into the Hybrid Table can contend with concurrent reads. Check `ONLINE_FEATURE_STORE_HISTORY()` for refresh timing overlap with latency spikes. Separate refresh and read warehouses if contention is confirmed.
Warehouse type	Snowpark-optimized warehouses can show higher variance for point lookups. Switch to a Standard warehouse.
Warehouse suspension	If the warehouse auto-suspends between requests, the next query incurs resume latency. Set `AUTO_SUSPEND` high enough (or disable suspension) for latency-sensitive serving.
Connection pooling	Creating a new Snowflake session per request adds significant overhead. Reuse sessions / connection pools in your serving application.
Row width	Very wide rows (hundreds of columns) increase per-read data transfer. Use `feature_names` to request only the columns needed, or consider the bundled OBJECT pattern (Ch13).
Network jitter	Cross-region or on-premises clients see variable network latency. Co-locate clients with the Snowflake account region.

8.11 Integration with external services

8.11.1 FastAPI example

📁 Full code: _code/fastapi_serving.py

from fastapi import FastAPI
from snowflake.ml.feature_store import FeatureStore, StoreType

app = FastAPI()

FEATURE_NAMES = [
    "ORDER_TOTAL_AMT_SUM",
    "ORDER_CNT",
    "ORDER_TOTAL_AMT_AVG",
    "LAST_ORDER_TS",
]

@app.post("/features")
async def get_features(user_id: str):
    """Latest USER order features from the OFT."""
    features = fs.read_feature_view(
        feature_view=user_order_fv,
        keys=[[user_id]],
        feature_names=FEATURE_NAMES,
        store_type=StoreType.ONLINE,
    )
    return features.to_pandas().to_dict(orient="records")[0]

8.11.2 Read Latency vs Feature Freshness

When evaluating online serving options, it is important to distinguish two separate concerns:

Concern	Definition	Native OFT (Hybrid Table)
Read latency	Time from query to response	10–100 ms (warm, standard WH)
Feature freshness	Time from source event to feature value reflecting it	`DT refresh_freq` + `OFT target_lag` = tens of seconds to minutes

Many discussions conflate these. A system can have low read latency but stale features (fast reads of old data), or high read latency but fresh features (slow reads of up-to-date data). When a customer says “we need real-time features,” clarify which dimension they mean.

End-to-end freshness in the batch model:

Source event lands in a Snowflake table
Dynamic Table refreshes on refresh_freq schedule (minutes to hours)
OFT sync Task runs on target_lag schedule (minimum 10 seconds)
Feature value is queryable

For features like “count of transactions in the last N seconds,” this pipeline is fundamentally too slow regardless of read latency.

8.11.3 External serving store pattern

Some use cases have sub-20 ms read latency or sub-second feature freshness requirements that exceed what Hybrid Table OFTs deliver today.

Try native Postgres before an external store

Default escalation path: Hybrid (this chapter) → Postgres online (Chapter 9) → external store only if a native path still cannot meet the SLA.

Postgres online is in active development. Read latency and ingest freshness targets quoted in PubPr (for example ~10 ms p50 reads, sub-2s stream ingest) reflect current service performance — not a fixed ceiling. Snowflake is continuing to enhance the managed Postgres online stack (serving path, scale, and Stream/RT Feature View capabilities). Benchmark in your account, but avoid committing to a multi-year external-store architecture based solely on today’s preview numbers.

What Postgres gives you without self-managed infra:

Batch + Postgres — same offline→online sync model as Hybrid; lower read latency on existing batch pipelines
Stream Feature Views — sub-second ingest freshness with built-in offline historization
Real-Time Feature Views — request-time compute_fn with training parity via generate_training_set()

When external stores still make sense: a mandated KV engine (Redis/Dynamo at the edge), multi-cloud serving outside Snowflake’s network, or requirements Postgres online does not yet support in your region/account.

An external key-value store (Redis, SingleStore, DynamoDB, Aerospike, etc.) can beat native read latency in some deployments — but you own the full operational burden: CDC or sync pipelines, schema evolution, monitoring, incident response, capacity planning, security patching, and historization back to Snowflake for training. That trade is often underestimated when Hybrid latency is the only gap; Postgres online may close the gap without that overhead.

When native Snowflake serving (Hybrid or Postgres) still does not meet the SLA after evaluation, teams replicate features to an external store for serving.

Pattern: Keep the Feature Store as the system of record for feature definitions, lineage, and offline retrieval. Use a CDC (change data capture) or scheduled sync tool to replicate the OFT or offline Feature View to the external store:

flowchart TD
  FS[Feature Store offline FV]
  FS --> OFT[OFT Hybrid Table]
  FS --> CDC[CDC or sync tool]
  CDC --> EXT[External store Redis PG Dynamo]

External serving: Feature Store to OFT or CDC to external store

Trade-offs:

Factor	Hybrid OFT (GA)	Postgres online (PubPr)	External store
Read latency (today)	10–100 ms warm	~10 ms p50 (preview targets; improving)	Often 1–20 ms (you tune and operate the stack)
Feature freshness	DT refresh + `target_lag` (seconds–minutes)	Batch: same; Stream FV: sub-second ingest	As fast as your write pipeline (sub-second possible)
Operational overhead	Fully managed by Snowflake	Fully managed by Snowflake (+ PAT, network policy)	You manage CDC, sync jobs, schema mapping, HA, monitoring, upgrades
Feature Store integration	Native OFT, lineage, training APIs	SFV/RTFV/Feature Groups + training APIs	Definitions in FS; serving values duplicated — historization is on you
Performance trajectory	Mature GA path	Actively enhanced — preview latency is not the end state	Fixed by your infra choices; no Snowflake roadmap benefit
Schema changes	FV versioning	FV versioning	Manual pipeline updates; breaking changes risk sync failures

8.11.4 Hybrid architecture for sub-second features

When a subset of features genuinely needs sub-second freshness (fraud signals, real-time engagement scores) while others are slow-changing (demographics, 7d/30d aggregates), a split serving architecture avoids over-engineering the entire pipeline:

Snowflake Feature Store (Hybrid or Postgres batch) for slow-changing features — the majority of the feature vector
Native Postgres stream path first — Stream Feature Views for sub-second features with offline historization and Feature Store lineage
External real-time layer only if needed — SingleStore, Redis, Kafka/Flink state, etc., when native Postgres stream serving still misses the SLA and you accept the self-managed cost

The inference service assembles features from both sources at prediction time. Prefer two native Snowflake backends (Hybrid batch + Postgres stream) over Snowflake + external unless operations capacity and long-term maintenance are explicitly budgeted.

flowchart LR
    subgraph snowflake [Snowflake Feature Store]
        DT["Dynamic Table"] --> OFT["Hybrid Table OFT"]
    end
    subgraph realtime [Real-Time Layer]
        Stream["Event Stream"] --> ExtStore["External Store"]
    end
    OFT --> Inference["Inference Service"]
    ExtStore --> Inference
    Inference --> Prediction["Model Prediction"]

Hybrid serving: batch features from Snowflake, real-time features from external store

8.11.5 Historization of external online features

If a customer uses an external online store for sub-second features, they must consider how to keep a history of those feature values in the offline store for:

Training data generation — PIT-correct training requires knowing what the feature value was at each historical label timestamp. Without historization, you cannot reconstruct what the model “saw” at prediction time.
Batch inference reproducibility — reproducing a past inference requires the same feature vector.
Drift detection and model monitoring — comparing training-time and serving-time feature distributions requires logged serving values.

Options for historization:

Approach	Pros	Cons
CDC / change log from external store written back to a Snowflake table, registered as a View-based Feature View	Full history; Feature Store catalog/lineage; supports PIT joins	Requires CDC pipeline; adds latency to offline availability
Shadow logging at inference time — log the assembled feature vector alongside the prediction	Captures exactly what the model saw; no CDC needed	Storage overhead; requires inference service changes; not available for ad-hoc historical queries
Accept approximate PIT for sub-second features	No additional infrastructure	Training data may not perfectly match serving behavior; acceptable when sub-second features have limited historical sensitivity

Tip

Snowflake Stream Feature Views (PubPr) provide a Postgres-backed online store with sub-2 second end-to-end freshness and built-in offline historization. Before committing to a complex external-store architecture, evaluate whether native streaming or batch + Postgres serving in Chapter 9 meets your freshness requirements.

8.12 Best practices

1. Right-size `target_lag`

# Tight online sync for high-stakes scoring
fraud_online = feature_view.OnlineConfig(enable=True, target_lag="30 seconds")

# Relaxed sync where minutes of lag are acceptable
reco_online = feature_view.OnlineConfig(enable=True, target_lag="5 minutes")

2. Monitor OFT health

-- Refresh / sync history (names may vary by release; use INFORMATION_SCHEMA as documented for your account)
SELECT *
FROM TABLE(INFORMATION_SCHEMA.ONLINE_FEATURE_STORE_HISTORY())
ORDER BY SYNC_START_TIME DESC
LIMIT 10;

3. Use a serving warehouse appropriately

Run Feature Store sessions used for online reads on a warehouse sized for latency and concurrency (often FS_DEV_WH in development, a dedicated production warehouse in real deployments). The TARGET_LAG Task uses the warehouse you configure on the OFT / Feature Store setup.

8.13 Common pitfalls

8.13.1 ❌ Pitfall 1: Expecting historical data

Problem: OFTs only store current values, not history.

Solution: Use offline Feature Views and temporal retrieval for training and PIT correctness.

8.13.2 ❌ Pitfall 2: Ignoring `refresh_freq` while tuning `target_lag`

Problem: Setting target_lag="10s" while the offline Dynamic Table refreshes hourly.

Solution: Align offline refresh, source data latency, and target_lag so expectations match reality.

8.13.3 ❌ Pitfall 3: Per-entity API chatter

Problem: Calling read_feature_view once per key in a tight loop.

Solution: Batch keys in one keys=[...] list.

8.13.4 ❌ Pitfall 4: Using deprecated online enablement APIs

Problem: Code or docs that call with_online_enabled() or assume a separate create_online_feature_table Python flow as the only path.

Solution: Use online_config=feature_view.OnlineConfig(...) on the Feature View (or update_feature_view).

8.13.5 ❌ Pitfall 5: Enabling online serving on a tiled Feature View

Problem: A Feature View created with the Aggregations API (features=[Feature.sum(...), ...] and feature_granularity) stores partial aggregates in the underlying Dynamic Table — columns like _PARTIAL_SUM_TOTAL_AMT and TILE_START instead of the original timestamp_col. When you add online_config to such a Feature View, register_feature_view fails:

SQL compilation error: column 'ORDER_TS' does not exist

The SDK does not validate this in advance.

Solution: Use one of three approaches (ranked by preference):

online_fv_from_tiled() helper (recommended): Auto-generates a non-tiled, OFT-ready Feature View from the tiled FV’s Feature definitions. The helper produces a scoped GROUP BY query bounded to only max(window + offset) + buffer of source data, keeping the DT small and fast despite FULL refresh. Use the tiled FV for offline/training workloads.
Manual non-tiled FV: Write your own GROUP BY query with CASE WHEN per-feature window filters if you need to customise beyond what the helper generates.
Postgres online store (PubPr): OnlineStoreType.POSTGRES performs query-time tile reassembly natively. See Chapter 9.

# Example: using the helper to create an online companion
from online_from_tiled import online_fv_from_tiled

online_fv = online_fv_from_tiled(
    tiled_reg, session,
    source_query="SELECT * FROM DB.SCHEMA.ORDERS",
)
fs.register_feature_view(online_fv, version="V01", block=True)

See Chapter 7: Tiled Feature Views and Online Serving for the helper’s full API, generated SQL examples, and performance trade-offs.

8.14 Summary

Concept	Description
Online Feature Table (Hybrid)	First-class Snowflake object encapsulating Hybrid Table + Stream + Task + TARGET_LAG; API creates and manages underlying objects
Postgres backend	Shared online service + per-FV `FROM SPECIFICATION` OFTs; serving in managed Postgres — see Chapter 9
`OnlineConfig.target_lag`	Python knob for maximum lag of online table behind source (10s–8d); Task frequency
`refresh_freq` (Feature View)	Offline Dynamic Table refresh; independent of `target_lag`
`read_feature_view(..., StoreType.ONLINE)`	Low-latency retrieval of latest values; currently single-FV per call
Multi-FV online reads	Pre-join into a combined FV (preferred) or use parallel reads with `ThreadPoolExecutor`
Latency budget	Plan end-to-end: network + feature serving + model inference + application overhead
Warehouse type	Use Standard warehouses for OFT serving; Snowpark-optimized adds no benefit for point lookups
High-cardinality OFTs	Compound-key entities can explode row count; evaluate cardinality before enabling online serving
Warm-up	OFTs require sustained queries to reach steady-state latency; always warm up before benchmarking
Benchmarking	Concurrent connections, key batching, parameter binding, per-minute P50/P95/P99, multi-cluster WH for high QPS
External serving store	Fallback for sub-20 ms requirements; CDC to Redis/Postgres/DynamoDB with Feature Store as system of record
Tiled FV limitation	Tiled (Aggregations API) FVs cannot have Hybrid Table OFTs directly — use the `online_fv_from_tiled()` helper to auto-generate a non-tiled companion, or the Postgres backend
Current values only	No historical timeline in the OFT

8.15 Next Steps

Continue to Chapter 9: Real-Time and Online Serving for the Postgres online backend (batch + Postgres, Stream FVs, RTFVs, Feature Groups). Shared concepts in this chapter — target_lag, freshness vs latency, multi-FV reads — apply to both backends where the write path is batch sync.

--- title: "Online Features" subtitle: "Hybrid Table online serving and the online feature model" --- ## Overview **Online Feature Tables (OFTs)** are first-class Feature Store objects that expose the **latest feature values** per entity for low-latency inference. Every OFT has a logical contract (`online_config`, entity keys, feature columns) and a **physical online backend** where values are stored and read. Snowflake supports **two online backends**: | Backend | `OnlineStoreType` | Status | This chapter | |---------|-------------------|--------|--------------| | **Hybrid Table** | `HYBRID_TABLE` (default) | GA | **Primary focus** — architecture, tuning, pitfalls | | **Managed Postgres** | `POSTGRES` | PubPr | [Chapter 9](../09_real_time_online_serving/index.qmd) — provisioning, Stream/RT FVs, Feature Groups | **This chapter** covers the **GA Hybrid Table** backend. When you register a Feature View with `online_config` (default `store_type=HYBRID_TABLE`), the Feature Store API creates a single **Online Feature Table** Snowflake object. Snowflake **encapsulates and manages** the underlying infrastructure — the Hybrid Table for point lookups, the Stream on the offline Feature View, and the Task that MERGEs on `target_lag`. You do not create or wire those objects yourself. At a logical level the data flow is: offline Dynamic Table or View → Snowflake Stream (change capture) → Task (MERGE on `target_lag`) → Hybrid Table. Physically, that pipeline is **owned by the OFT object** — lifecycle, grants, and drop semantics are tied to the OFT, not to separately managed Stream/Task DDL. The **Postgres backend** ([Chapter 9](../09_real_time_online_serving/index.qmd)) uses the same `CREATE ONLINE FEATURE TABLE` catalog object for batch sync, but provisioning, physical storage, and reads work differently — see [Postgres vs Hybrid managed object model](../09_real_time_online_serving/index.qmd#sec-postgres-object-model). ::: {.callout-warning title="Terminology: two different “streams”"} A **Snowflake Stream** on the offline Feature View drives Hybrid/Postgres **batch sync** (this chapter). A **Stream Feature View** is a separate object that ingests live events via `stream_ingest()` into the Postgres store ([Chapter 9](../09_real_time_online_serving/index.qmd)). The names are similar; the write paths are not. ::: This chapter uses the **clickstream** sample model: database `FEATURE_STORE_DEMO`, source schema `CLICKSTREAM_DATA` (for example `ORDERS`, `USERS`, `SESSIONS`), Feature Store schema `FEATURE_STORE`, warehouse `FS_DEV_WH`. Feature View registrations use version strings in **`V01` format** (for example `version="V01"`). Official references: [CREATE ONLINE FEATURE TABLE](https://docs.snowflake.com/en/sql-reference/sql/create-online-feature-table), [Create and serve online features (Python)](https://docs.snowflake.com/en/developer-guide/snowflake-ml/feature-store/create-and-serve-online-features-python). ## Learning Objectives After completing this chapter, you will be able to: - Distinguish Hybrid vs Postgres online backends and which write paths each supports - Explain that a Hybrid OFT encapsulates Stream, Task, and Hybrid Table — the API creates and manages them as one object - Explain how Stream, Task, Hybrid Table, and TARGET_LAG fit together in the batch sync architecture - Enable online serving with `online_config` on a Feature View - Tune `target_lag` and reason about storage and compute costs - Handle multi-Feature-View online reads via pre-joined Feature Views or parallel retrieval - Plan latency budgets and troubleshoot latency volatility - Benchmark OFT performance correctly: warm-up, steady-state measurement, concurrent connections, and per-minute aggregates - Retrieve online features via the Feature Store Python API and integrate with serving layers - Evaluate when an external serving store is appropriate and understand the trade-offs > 📂 **Chapter code:** [Browse companion scripts on GitHub](https://github.com/Snowflake-Labs/snowflake-featurestore-imp-guide/tree/main/Snowflake_FeatureStore_Implementation_Guide/08_online_features/_code) --- ## Online backends and write paths {#sec-serving-paths} Use this matrix as the canonical map. **Batch → online sync** works on **both** backends. **Stream Feature Views**, **Real-Time Feature Views**, and **Feature Groups** require the **Postgres** backend and `create_online_service()` ([Chapter 9](../09_real_time_online_serving/index.qmd)). | Write path | Hybrid Table | Postgres | Notes | |------------|:------------:|:--------:|-------| | **Batch offline → online sync** | ✅ GA | ✅ PubPr | Same Snowflake-side sync *concept*; Hybrid stores in Hybrid Table, Postgres stores in managed Postgres via online service | | **Stream ingest** (`stream_ingest` / REST) | ❌ | ✅ PubPr | Stream Feature Views only; online-first, async offline historization | | **Request-time compute** (`compute_fn`) | ❌ | ✅ PubPr | Real-Time Feature Views only; no offline materialization | | **Feature Groups** (multi-FV online bundle) | ❌ | ✅ PubPr | Postgres online table with compound primary key | **When to stay on Hybrid (this chapter):** GA production, no preview enrollment, minute-level freshness is acceptable, existing OFT operations and warehouse-based reads. **When to use Postgres ([Chapter 9](../09_real_time_online_serving/index.qmd)):** sub-~20 ms reads, tiled FVs without `online_fv_from_tiled()` workaround, Stream or Real-Time Feature Views, Feature Groups, or REST ingest/query from outside Snowflake warehouses. --- ## What is a Hybrid Table Online Feature Table? {#sec-hybrid-oft-model} A Hybrid Table OFT is a **first-class Snowflake database object** (`CREATE ONLINE FEATURE TABLE`). From your perspective you register a Feature View with `online_config`; the Feature Store issues the OFT DDL, applies Feature Store tags for lineage, and Snowflake provisions the backing components **as a managed unit**: | Component | Role | You manage directly? | |-----------|------|----------------------| | **Online Feature Table** (catalog object) | Logical online serving object; Feature Store metadata anchor | No — created via `register_feature_view` | | **Hybrid Table** | Row store for latest feature values per entity key | No — created and altered by the OFT | | **Stream** on offline FV | Captures inserts/updates/deletes from the offline Dynamic Table or View | No — created and associated by the OFT | | **Task** | Runs MERGE on schedule derived from `target_lag` | No — created and scheduled by the OFT | You tune behavior through **`OnlineConfig`** (`target_lag`, enable/disable online serving) and Feature View lifecycle (`register_feature_view`, version bumps, delete). You do **not** hand-author Stream, Task, or Hybrid Table DDL for standard Hybrid OFTs, and you should not drop those underlying objects independently — drop or replace the OFT through the Feature Store API so dependencies stay consistent. The OFT exposes **current (latest) values** per entity for low-latency retrieval. Freshness from the **offline** Feature View is maintained by: 1. A **Stream** on that offline object, capturing new and changed rows. 2. A **Task** that runs on an interval derived from **TARGET_LAG** (minimum **10 seconds**, maximum **8 days**). 3. **MERGE** (or equivalent incremental logic) into the Hybrid Table so only deltas are applied when the Stream has data; when the Stream is empty, the Task can finish quickly with **minimal compute**. ::: {.callout-important} ## Key Concept OFTs store only the **latest feature values** — they do not retain historical values. For point-in-time historical retrieval, use the offline Feature View and training APIs. ::: ::: {.callout-note} ## TARGET_LAG vs offline `refresh_freq` `refresh_freq` on the Feature View controls how often the **offline** Dynamic Table (or related materialization) refreshes. `OnlineConfig.target_lag` controls how often the **online** sync Task runs. They are **independent**; end-to-end staleness is roughly the sum of both paths. See the [Python online features](https://docs.snowflake.com/en/developer-guide/snowflake-ml/feature-store/create-and-serve-online-features-python) documentation. ::: ```{mermaid} %%| fig-cap: "Online Feature Table: offline FV to Hybrid Table via stream and task" %%| fig-alt: "Offline feature view merges into hybrid table for low latency reads" flowchart LR OFF[Offline FV DT or View] --> MERGE[Stream + Task MERGE] MERGE --> HT[Hybrid Table latest per key] ``` --- ## Creating Online Feature Tables ### Enable online serving with `online_config` Enable online features on the **Feature View** using `online_config`. Earlier API versions used helpers such as `with_online_enabled()`, which are now **deprecated** — use `OnlineConfig` instead. > 📁 **Full code:** [`_code/online_feature_tables.py`](_code/online_feature_tables.py) ```python from snowflake.snowpark import Session from snowflake.ml.feature_store import FeatureStore, FeatureView, Entity from snowflake.ml.feature_store import feature_view session = Session.builder.configs(...).create() fs = FeatureStore( session=session, database="FEATURE_STORE_DEMO", schema="FEATURE_STORE", default_warehouse="FS_DEV_WH", ) user_entity = Entity(name="USER", join_keys=["USER_ID"]) ``` ::: {.panel-tabset group="lang"} ## SQL ```python feature_df = session.sql(""" SELECT USER_ID, SUM(TOTAL_AMT) AS ORDER_TOTAL_AMT_SUM, COUNT(ORDER_ID) AS ORDER_CNT, AVG(TOTAL_AMT) AS ORDER_TOTAL_AMT_AVG, MAX(ORDER_TS) AS LAST_ORDER_TS FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS GROUP BY USER_ID """) ``` ## Snowpark DataFrame ```python import snowflake.snowpark.functions as F orders = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS") feature_df = orders.group_by("USER_ID").agg( F.sum("TOTAL_AMT").alias("ORDER_TOTAL_AMT_SUM"), F.count("ORDER_ID").alias("ORDER_CNT"), F.avg("TOTAL_AMT").alias("ORDER_TOTAL_AMT_AVG"), F.max("ORDER_TS").alias("LAST_ORDER_TS"), ) ``` ::: ```python online_config = feature_view.OnlineConfig( enable=True, target_lag="1 minute", ) fv = FeatureView( name="USER_ORDER_ONLINE_FV", entities=[user_entity], feature_df=feature_df, timestamp_col="LAST_ORDER_TS", refresh_freq="1 minute", refresh_mode="AUTO", desc="USER order aggregates from CLICKSTREAM_DATA.ORDERS; online + offline", online_config=online_config, ) fv = fs.register_feature_view(feature_view=fv, version="V01") ``` Snowflake creates and wires the OFT infrastructure (Hybrid Table, Stream, Task) as part of serving enablement — **one API call, one managed object**; **TARGET_LAG** in SQL corresponds to **`OnlineConfig.target_lag`** in Python. ::: {.callout-note} **Initial OFT population**: When online serving is first enabled the OFT must be **fully populated** -- one row per entity key present in the source table. This initial load may take noticeably longer than subsequent refreshes, especially for large entity cardinalities. Once the initial population completes, incremental refreshes process only changed rows and should be fast, though throughput still depends on the update frequency and rate of feature-value changes in the source. ::: ### SQL equivalent (illustrative) For visibility into the object model, the SQL shape mirrors [CREATE ONLINE FEATURE TABLE](https://docs.snowflake.com/en/sql-reference/sql/create-online-feature-table): ```sql CREATE OR REPLACE ONLINE FEATURE TABLE FEATURE_STORE_DEMO.FEATURE_STORE.USER_ORDER_ONLINE_OFT PRIMARY KEY (USER_ID) TIMESTAMP_COLUMN = LAST_ORDER_TS TARGET_LAG = '1 minute' WAREHOUSE = FS_DEV_WH REFRESH_MODE = AUTO FROM FEATURE_STORE_DEMO.FEATURE_STORE.<your_offline_dynamic_table_or_view>; ``` (The Python API typically manages object names and wiring when you register the Feature View with `online_config`.) ### Adding online serving to an existing Feature View If a Feature View was originally registered **without** online serving, you can enable it after the fact. The Feature Store will create the OFT infrastructure (Hybrid Table, Stream, Task) for the existing offline Feature View: ```python fs.update_feature_view( name="USER_ORDER_ONLINE_FV", version="V01", online_config=feature_view.OnlineConfig(enable=True, target_lag="30 seconds"), ) ``` --- ## Retrieving Online Features Use `read_feature_view` with `store_type=StoreType.ONLINE` and explicit feature names. Keys are provided as a list of key tuples (one per lookup row). ```python from snowflake.ml.feature_store import StoreType features = fs.read_feature_view( feature_view=fv, keys=[["user_001"], ["user_002"]], feature_names=[ "ORDER_TOTAL_AMT_SUM", "ORDER_CNT", "ORDER_TOTAL_AMT_AVG", "LAST_ORDER_TS", ], store_type=StoreType.ONLINE, ) ``` For a single entity, pass one inner list, for example `keys=[["user_001"]]`. --- ## Cost estimation (qualitative) Online Feature Tables draw cost from **storage**, **refresh compute**, and **serving compute**. Treat the following as planning heuristics; use [account usage views](https://docs.snowflake.com/en/developer-guide/snowflake-ml/feature-store/create-and-serve-online-features-python) and your warehouse size for real numbers. | Area | Driver | Qualitative notes | |------|--------|-------------------| | **Hybrid Table storage** | Entity count × feature columns × row width | Scales with distinct keys you serve and column footprint; Hybrid Table storage is priced per GB (higher than standard table storage). | | **Refresh compute** | `target_lag` cadence × MERGE work | The Task runs on the TARGET_LAG schedule; **incremental** refresh (Stream with new rows) is usually cheap. If the Stream has **no** pending data, runs can be very light. | | **Offline refresh** | Feature View `refresh_freq` | Separate from online `target_lag`; both contribute to how stale features can be. | | **Online reads** | Lookup volume × warehouse | Each `read_feature_view(..., ONLINE)` uses warehouse resources; larger warehouses cost more per time. (Check current Snowflake pricing notes for Hybrid Table request metering.) | **Tips:** Prefer **AUTO** or **INCREMENTAL** refresh where supported, right-size **TARGET_LAG** to the business SLA, and avoid setting online sync tighter than your offline Feature View can actually refresh. --- ## Multi-Feature-View Online Reads {#sec-multi-fv-online} ### The limitation Offline APIs (`generate_dataset`, `generate_training_set`, `retrieve_feature_values`) accept a **list** of Feature Views and join them automatically. The online API -- `read_feature_view(..., StoreType.ONLINE)` -- currently accepts **one Feature View per call**. Real-time use cases such as recommenders often need features from multiple entities (e.g., user, item, user-item interaction) in a single request. ### Solution A: Pre-joined Feature View (preferred) If the entity keys allow it, create a **combined Dynamic Table** that pre-joins the required Feature Views into a single record, then derive the OFT from that combined Feature View. This pushes the join into the materialization layer so the online read is a single point lookup: ::: {.panel-tabset group="lang"} ## SQL ```python combined_df = session.sql(""" SELECT ui.USER_ID, ui.LISTING_ID, u.ORDER_TOTAL_AMT_SUM, u.ORDER_CNT, l.LISTING_PRICE, l.LISTING_AGE_DAYS, ui.VIEW_CNT, ui.LAST_INTERACTION_TS AS FV_TS FROM FEATURE_STORE_DEMO.FEATURE_STORE."USER_LISTING_INTERACTION_FV$V01" ui JOIN FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01" u ON u.USER_ID = ui.USER_ID JOIN FEATURE_STORE_DEMO.FEATURE_STORE."LISTING_FV$V01" l ON l.LISTING_ID = ui.LISTING_ID """) ``` ## Snowpark DataFrame ```python ui = session.table('FEATURE_STORE_DEMO.FEATURE_STORE."USER_LISTING_INTERACTION_FV$V01"') u = session.table('FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01"') l = session.table('FEATURE_STORE_DEMO.FEATURE_STORE."LISTING_FV$V01"') combined_df = ( ui.join(u, ui["USER_ID"] == u["USER_ID"]) .join(l, ui["LISTING_ID"] == l["LISTING_ID"]) .select( ui["USER_ID"], ui["LISTING_ID"], u["ORDER_TOTAL_AMT_SUM"], u["ORDER_CNT"], l["LISTING_PRICE"], l["LISTING_AGE_DAYS"], ui["VIEW_CNT"], ui["LAST_INTERACTION_TS"].alias("FV_TS"), ) ) ``` ::: ```python user_listing_entity = Entity( name="USER_LISTING", join_keys=["USER_ID", "LISTING_ID"], ) combined_fv = FeatureView( name="RECO_COMBINED_ONLINE_FV", entities=[user_listing_entity], feature_df=combined_df, timestamp_col="FV_TS", refresh_freq="1 minute", online_config=feature_view.OnlineConfig(enable=True, target_lag="1 minute"), desc="Pre-joined user + listing + interaction features for online recommender scoring", ) combined_fv = fs.register_feature_view(feature_view=combined_fv, version="V01") ``` A single `read_feature_view` call now returns all features: ```python features = fs.read_feature_view( feature_view=combined_fv, keys=[["user_001", "listing_042"]], feature_names=[...], store_type=StoreType.ONLINE, ) ``` ::: {.callout-warning} **Cardinality trade-off**: Pre-joining inflates the OFT to the cardinality of the most granular key combination (e.g., user × listing). For high-cardinality interaction pairs (tens of millions of rows) this increases Hybrid Table storage, initial population time, and MERGE refresh cost. Evaluate whether the cardinality is manageable before choosing this approach -- see [High-cardinality OFTs](#sec-high-cardinality) below. ::: ### Solution B: Parallel reads with merging (fallback) When pre-joining is not feasible (e.g., cardinality is too high, or the Feature Views have incompatible refresh cadences), issue reads in parallel and merge in the application layer: ```python from concurrent.futures import ThreadPoolExecutor import pandas as pd def read_online(fv, keys, feature_names): return fs.read_feature_view( feature_view=fv, keys=keys, feature_names=feature_names, store_type=StoreType.ONLINE, ).to_pandas() with ThreadPoolExecutor(max_workers=3) as pool: fut_user = pool.submit(read_online, user_fv, [["user_001"]], USER_FEATURES) fut_listing = pool.submit(read_online, listing_fv, [["listing_042"]], LISTING_FEATURES) fut_inter = pool.submit(read_online, interaction_fv, [["user_001", "listing_042"]], INTER_FEATURES) user_df = fut_user.result() listing_df = fut_listing.result() inter_df = fut_inter.result() merged = pd.concat([user_df, listing_df, inter_df], axis=1) ``` Wall-clock latency is dominated by the **slowest** of the parallel calls rather than the sum. This approach avoids the cardinality explosion of Solution A but adds application-side complexity and requires manual result alignment. --- ## Latency Budget Planning {#sec-latency-budget} Before deploying online features, establish a **latency budget** that allocates time across every component of the inference endpoint. Work backwards from the user-facing SLA: | Component | Typical range | Notes | |-----------|--------------|-------| | **Network (client → Snowflake)** | 5-30 ms | Dominated by region distance; co-locate where possible | | **Feature serving (OFT read)** | 10-50 ms (warm, low QPS) | Varies with row width, key count, and warehouse state | | **Model inference** | 5-50 ms | Depends on model complexity and serving framework | | **Application overhead** | 5-20 ms | Serialisation, result merging, logging | | **Total endpoint** | 25-150 ms | Sum of the above | **Worked example**: A recommender has a 100 ms end-to-end budget. Model inference takes 30 ms and application overhead is 10 ms, leaving **60 ms for feature serving + network**. If network round-trip is 15 ms, the OFT read must complete in under **45 ms** -- achievable for a single batched lookup on a warm warehouse, but tight if multiple sequential FV reads are required (prefer the pre-joined pattern above). ::: {.callout-tip} If the latency budget leaves less than ~30 ms for feature serving, consider the [pre-joined Feature View](#sec-multi-fv-online) pattern to reduce the number of round trips, or the [external serving store](#sec-external-serving) pattern for sub-10 ms requirements. ::: --- ## Performance Tuning and Benchmarking ### Warm-Up Behavior OFTs (backed by Hybrid Tables) require a number of queries to **warm up** before reaching steady-state optimal response times. The initial queries populate internal caches (data pages, metadata, connection pools) and will show higher latency than the system achieves once warmed. ::: {.callout-important} ## Always warm up before measuring Do not draw conclusions from the first few hundred queries. Run at least several minutes of sustained traffic at your expected QPS before recording latency measurements. Early cold-start results are not representative of production performance. ::: ### Benchmarking Guidelines | Guideline | Detail | |-----------|--------| | **Run duration** | Sustain the test for long enough to reach steady state (minimum 5-10 minutes at target QPS). Short bursts capture warm-up noise, not true throughput. | | **Measure per-minute aggregates** | Report P50, P95, P99 latency aggregated per minute. Avoid reporting single-query or first-query latencies as representative. | | **Use concurrent connections** | A single serial connection under-utilises the warehouse. Launch multiple concurrent client connections (threads / async) to saturate the target QPS. | | **Match expected workload** | Set QPS, key batch sizes, and feature column counts to match your production profile. Results from a 1 QPS test do not predict 1000 QPS behavior. | | **Account for network latency** | Measure round-trip time from the client to your Snowflake account region. If the client is in a different region or on-premises, network latency dominates at low query times. Co-locate clients with the Snowflake account where possible. | ### Batch Key Lookups Pre-batching entity keys into a single request reduces round trips and amortises connection overhead: ```python # ❌ BAD: one round-trip per user — N queries for N users for uid in user_ids: fs.read_feature_view( feature_view=fv, keys=[[uid]], feature_names=FEATURE_NAMES, store_type=StoreType.ONLINE, ) # ✅ GOOD: one batched call — 1 query for N users fs.read_feature_view( feature_view=fv, keys=[[uid] for uid in user_ids], feature_names=FEATURE_NAMES, store_type=StoreType.ONLINE, ) ``` ::: {.callout-tip} **Large key sets**: OFTs are optimized for serving a modest number of key-based lookups (tens to low hundreds per call). If you need to look up thousands of entity keys at once (e.g., for batch scoring), it is typically faster to query the **offline** Feature View directly -- the columnar scan of a Dynamic Table handles large result sets more efficiently than many point lookups against a Hybrid Table. ::: ### Parameter Binding When issuing online lookups via raw SQL (outside the Feature Store API), use **bind variables** rather than string interpolation. This enables Snowflake to cache and reuse query plans, improving throughput under high QPS: ```python # ✅ GOOD: parameterised query — plan reuse session.sql( "SELECT * FROM FEATURE_STORE.USER_ORDER_ONLINE_OFT WHERE USER_ID = ?", params=["user_001"], ).collect() # ❌ BAD: string interpolation — new plan per unique value session.sql( f"SELECT * FROM FEATURE_STORE.USER_ORDER_ONLINE_OFT WHERE USER_ID = 'user_001'" ).collect() ``` ### Warehouse Sizing and Multi-Cluster For single-user or low-QPS testing, an **XS** warehouse is sufficient. For production workloads with high concurrency: - Use **Standard** warehouse type. **Snowpark-optimized** warehouses provide extra memory for UDF/UDTF workloads but add no benefit for Hybrid Table point lookups -- they cost more per credit and can exhibit higher latency variance for OFT reads. - **Multi-cluster warehouses** (auto-scaling) allow Snowflake to spin up additional clusters to handle concurrent query load. Set `MIN_CLUSTER_COUNT` and `MAX_CLUSTER_COUNT` based on your **typical and peak QPS**. - Scale using multiples of **XS warehouses** -- an XS cluster handles a bounded number of concurrent queries efficiently. Adding more clusters is generally more cost-effective than scaling up to a larger warehouse size; if P99 latency rises under load, add clusters first. - The **refresh Task** (MERGE into the Hybrid Table) and **online reads** share the warehouse. If refresh compute contends with read latency, consider separate warehouses for refresh vs. serving. ### `target_lag` Trade-Offs (Online Sync) | `target_lag` (examples) | Freshness | Relative refresh cost | Typical use | |-------------------------|-----------|------------------------|-------------| | `10 seconds` … `1 minute` | Tightest | Higher Task frequency | Fraud, real-time guards | | `1 minute` … `5 minutes` | Low | Moderate | Recommendations, ranking | | `15 minutes` … `1 hour` | Relaxed | Lower | Marketing, less critical paths | Match **offline** `refresh_freq` to upstream **CLICKSTREAM_DATA** landing latency so online sync is not chasing data that does not exist yet. ### Feature Freshness and Training-Serving Skew {#sec-training-serving-skew} The features a model receives at inference time are subject to **cumulative lag** from multiple stages. The exact formula depends on the inference pattern: | Inference pattern | Lag components | Typical total | |---|---|---| | **Online (OFT)** | Source pipeline + offline `refresh_freq` + online `target_lag` | Seconds to hours | | **Batch (DT-based inference FV)** | Source pipeline + upstream FV `refresh_freq` + inference DT `target_lag` | Minutes to hours | | **Batch (Task/Sproc)** | Source pipeline + upstream FV `refresh_freq` + Task schedule interval | Minutes to hours | If the source data pipeline itself has variable execution times (e.g., an ETL that sometimes finishes at 2 AM and sometimes at 6 AM), the effective staleness is even less predictable. A model trained on **point-in-time-perfect** features -- where the ASOF join retrieves the exact feature values available at each event -- may never have seen feature values with this degree of staleness during training. At inference time, the model receives features that are minutes to hours behind reality, creating a **training-serving skew** (a distribution shift) that can degrade prediction accuracy. This applies equally to **batch inference**. A DT-based scoring pipeline (see [Chapter 11: Continuous Batch Inference](../11_training_inference/index.qmd#sec-continuous-batch)) reads from upstream Feature Views that have their own refresh lag, compounded by the scoring DT's own `target_lag`. A Stream/Task/Stored Procedure pattern has the same issue: the Task schedule determines how often scoring runs, but the features it reads may already be stale by the time the Task fires. ::: {.callout-important} ## Account for realistic lag during training To minimise training-serving skew, the training dataset should reflect the **same feature freshness** the model will experience in production. Techniques include: - **Conservative spine** ([Chapter 6: Late-Arriving Data](../06_temporal_features/index.qmd#solutions)): Shift the spine timestamp back by the expected pipeline lag (e.g., `DATEADD('hour', -2, EVENT_TS) AS FEATURE_CUTOFF_TS`). The ASOF join then returns features as they would have looked given realistic staleness. - **Measure actual lag**: Query DT refresh history and OFT sync history to quantify real-world end-to-end staleness. Use this measurement to size the conservative spine buffer rather than guessing. - **Feature freshness as a model input**: For models that are sensitive to feature age, include a **recency feature** (e.g., `DATEDIFF('minute', LAST_ORDER_TS, PREDICTION_TS)`) as an explicit model input. This lets the model learn to discount stale features rather than treating all feature values as equally current. See [Chapter 6: Returning the FV Timestamp](../06_temporal_features/index.qmd#sec-include-fv-timestamp) for how to surface the Feature View's timestamp column for this purpose. Without these adjustments, a model can appear well-calibrated in offline evaluation (where features are perfectly fresh) but perform poorly once deployed -- whether against an OFT, a DT-based scoring pipeline, or a scheduled batch job -- with real-world pipeline lag. ::: ### High-Cardinality OFTs {#sec-high-cardinality} OFT row count directly affects **initial population time**, **MERGE refresh cost**, and **Hybrid Table storage**. Guidelines: | Entity cardinality | Expected behavior | |---|---| | **< 1 M rows** | Fast initial load, lightweight incremental refreshes. Standard operating range. | | **1-10 M rows** | Initial population may take minutes; incremental refresh is still efficient if change rate is low. | | **10-100 M+ rows** | Initial population can be significant; MERGE cost per refresh cycle grows. Hybrid Table storage costs increase. Monitor refresh history and consider whether the offline Feature View + batch scoring is more appropriate. | Compound entity keys (e.g., `USER_ID × LISTING_ID`) can produce **cross-product cardinalities** that are orders of magnitude larger than individual entities. Before enabling online serving on high-cardinality interaction Feature Views, estimate the OFT row count and evaluate: - **Is the full cross-product actually queried online?** If only a subset of entity pairs is active, consider filtering to recently active combinations. - **Would the [pre-joined Feature View](#sec-multi-fv-online) pattern work instead?** Pre-joining avoids the interaction OFT entirely if user and listing features can be fetched separately and combined in the application. - **Is an [external serving store](#sec-external-serving) more appropriate?** For tens of millions of rows with sub-20 ms latency requirements, a dedicated key-value store may be a better fit. ### Latency Volatility Troubleshooting If OFT read latency is **volatile** (e.g., 100-400 ms swings) after warm-up, investigate the following: | Check | Action | |-------|--------| | **MERGE contention** | The refresh Task running MERGE into the Hybrid Table can contend with concurrent reads. Check `ONLINE_FEATURE_STORE_HISTORY()` for refresh timing overlap with latency spikes. Separate refresh and read warehouses if contention is confirmed. | | **Warehouse type** | Snowpark-optimized warehouses can show higher variance for point lookups. Switch to a **Standard** warehouse. | | **Warehouse suspension** | If the warehouse auto-suspends between requests, the next query incurs resume latency. Set `AUTO_SUSPEND` high enough (or disable suspension) for latency-sensitive serving. | | **Connection pooling** | Creating a new Snowflake session per request adds significant overhead. Reuse sessions / connection pools in your serving application. | | **Row width** | Very wide rows (hundreds of columns) increase per-read data transfer. Use `feature_names` to request only the columns needed, or consider the bundled OBJECT pattern ([Ch13](../13_advanced_patterns/index.qmd)). | | **Network jitter** | Cross-region or on-premises clients see variable network latency. Co-locate clients with the Snowflake account region. | ### Performance Checklist - [ ] Warm up the OFT with sustained traffic before recording metrics - [ ] Use a **Standard** (not Snowpark-optimized) warehouse for serving - [ ] Use multiple concurrent client connections to reach target QPS - [ ] Batch entity keys into single `read_feature_view` calls where possible - [ ] Use parameter binding for raw SQL lookups - [ ] Measure P50/P95/P99 per minute, not individual query latencies - [ ] Consider multi-cluster warehouse for high-concurrency production serving - [ ] Separate refresh and read warehouses if MERGE contention affects read latency - [ ] Co-locate client and Snowflake account in the same cloud region - [ ] Reuse Snowflake sessions / connection pools -- do not create a new session per request --- ## Integration with external services ### FastAPI example > 📁 **Full code:** [`_code/fastapi_serving.py`](_code/fastapi_serving.py) ```python from fastapi import FastAPI from snowflake.ml.feature_store import FeatureStore, StoreType app = FastAPI() FEATURE_NAMES = [ "ORDER_TOTAL_AMT_SUM", "ORDER_CNT", "ORDER_TOTAL_AMT_AVG", "LAST_ORDER_TS", ] @app.post("/features") async def get_features(user_id: str): """Latest USER order features from the OFT.""" features = fs.read_feature_view( feature_view=user_order_fv, keys=[[user_id]], feature_names=FEATURE_NAMES, store_type=StoreType.ONLINE, ) return features.to_pandas().to_dict(orient="records")[0] ``` ### Read Latency vs Feature Freshness {#sec-latency-vs-freshness} When evaluating online serving options, it is important to distinguish two separate concerns: | Concern | Definition | Native OFT (Hybrid Table) | |---|---|---| | **Read latency** | Time from query to response | 10–100 ms (warm, standard WH) | | **Feature freshness** | Time from source event to feature value reflecting it | `DT refresh_freq` + `OFT target_lag` = tens of seconds to minutes | Many discussions conflate these. A system can have low read latency but stale features (fast reads of old data), or high read latency but fresh features (slow reads of up-to-date data). When a customer says "we need real-time features," clarify which dimension they mean. **End-to-end freshness in the batch model:** 1. Source event lands in a Snowflake table 2. Dynamic Table refreshes on `refresh_freq` schedule (minutes to hours) 3. OFT sync Task runs on `target_lag` schedule (minimum 10 seconds) 4. Feature value is queryable For features like "count of transactions in the last N seconds," this pipeline is fundamentally too slow regardless of read latency. ### External serving store pattern {#sec-external-serving} Some use cases have **sub-20 ms read latency** or **sub-second feature freshness** requirements that exceed what **Hybrid Table** OFTs deliver today. ::: {.callout-important title="Try native Postgres before an external store"} **Default escalation path:** Hybrid (this chapter) → **Postgres online** ([Chapter 9](../09_real_time_online_serving/index.qmd)) → external store only if a native path still cannot meet the SLA. Postgres online is in **active development**. Read latency and ingest freshness targets quoted in PubPr (for example ~10 ms p50 reads, sub-2s stream ingest) reflect **current** service performance — not a fixed ceiling. Snowflake is continuing to enhance the managed Postgres online stack (serving path, scale, and Stream/RT Feature View capabilities). Benchmark in your account, but avoid committing to a multi-year external-store architecture based solely on today's preview numbers. **What Postgres gives you without self-managed infra:** - **Batch + Postgres** — same offline→online sync model as Hybrid; lower read latency on existing batch pipelines - **Stream Feature Views** — sub-second ingest freshness with built-in offline historization - **Real-Time Feature Views** — request-time `compute_fn` with training parity via `generate_training_set()` **When external stores still make sense:** a mandated KV engine (Redis/Dynamo at the edge), multi-cloud serving outside Snowflake's network, or requirements Postgres online does not yet support in your region/account. ::: An **external key-value store** (Redis, SingleStore, DynamoDB, Aerospike, etc.) can beat native read latency in some deployments — but **you own the full operational burden**: CDC or sync pipelines, schema evolution, monitoring, incident response, capacity planning, security patching, and historization back to Snowflake for training. That trade is often underestimated when Hybrid latency is the only gap; Postgres online may close the gap without that overhead. When native Snowflake serving (Hybrid **or** Postgres) still does not meet the SLA after evaluation, teams replicate features to an external store for serving. **Pattern**: Keep the Feature Store as the **system of record** for feature definitions, lineage, and offline retrieval. Use a CDC (change data capture) or scheduled sync tool to replicate the OFT or offline Feature View to the external store: ```{mermaid} %%| fig-cap: "External serving: Feature Store to OFT or CDC to external store" %%| fig-alt: "Offline feature store path to hybrid OFT or to external key value store" flowchart TD FS[Feature Store offline FV] FS --> OFT[OFT Hybrid Table] FS --> CDC[CDC or sync tool] CDC --> EXT[External store Redis PG Dynamo] ``` **Trade-offs:** | Factor | Hybrid OFT (GA) | Postgres online (PubPr) | External store | |--------|-----------------|----------------------|----------------| | **Read latency (today)** | 10–100 ms warm | ~10 ms p50 (preview targets; improving) | Often 1–20 ms (you tune and operate the stack) | | **Feature freshness** | DT refresh + `target_lag` (seconds–minutes) | Batch: same; Stream FV: sub-second ingest | As fast as your write pipeline (sub-second possible) | | **Operational overhead** | Fully managed by Snowflake | Fully managed by Snowflake (+ PAT, network policy) | **You manage** CDC, sync jobs, schema mapping, HA, monitoring, upgrades | | **Feature Store integration** | Native OFT, lineage, training APIs | SFV/RTFV/Feature Groups + training APIs | Definitions in FS; serving values duplicated — historization is on you | | **Performance trajectory** | Mature GA path | **Actively enhanced** — preview latency is not the end state | Fixed by your infra choices; no Snowflake roadmap benefit | | **Schema changes** | FV versioning | FV versioning | Manual pipeline updates; breaking changes risk sync failures | ### Hybrid architecture for sub-second features {#sec-hybrid-architecture} When a subset of features genuinely needs sub-second freshness (fraud signals, real-time engagement scores) while others are slow-changing (demographics, 7d/30d aggregates), a **split serving architecture** avoids over-engineering the entire pipeline: - **Snowflake Feature Store (Hybrid or Postgres batch)** for slow-changing features — the majority of the feature vector - **Native Postgres stream path first** — [Stream Feature Views](../09_real_time_online_serving/index.qmd) for sub-second features with offline historization and Feature Store lineage - **External real-time layer only if needed** — SingleStore, Redis, Kafka/Flink state, etc., when native Postgres stream serving still misses the SLA *and* you accept the self-managed cost The inference service assembles features from both sources at prediction time. Prefer two native Snowflake backends (Hybrid batch + Postgres stream) over Snowflake + external unless operations capacity and long-term maintenance are explicitly budgeted. ```{mermaid} %%| fig-cap: "Hybrid serving: batch features from Snowflake, real-time features from external store" flowchart LR subgraph snowflake [Snowflake Feature Store] DT["Dynamic Table"] --> OFT["Hybrid Table OFT"] end subgraph realtime [Real-Time Layer] Stream["Event Stream"] --> ExtStore["External Store"] end OFT --> Inference["Inference Service"] ExtStore --> Inference Inference --> Prediction["Model Prediction"] ``` ### Historization of external online features {#sec-historization} If a customer uses an external online store for sub-second features, they must consider how to **keep a history of those feature values in the offline store** for: - **Training data generation** — PIT-correct training requires knowing what the feature value was at each historical label timestamp. Without historization, you cannot reconstruct what the model "saw" at prediction time. - **Batch inference reproducibility** — reproducing a past inference requires the same feature vector. - **Drift detection and model monitoring** — comparing training-time and serving-time feature distributions requires logged serving values. **Options for historization:** | Approach | Pros | Cons | |---|---|---| | **CDC / change log from external store** written back to a Snowflake table, registered as a View-based Feature View | Full history; Feature Store catalog/lineage; supports PIT joins | Requires CDC pipeline; adds latency to offline availability | | **Shadow logging at inference time** — log the assembled feature vector alongside the prediction | Captures exactly what the model saw; no CDC needed | Storage overhead; requires inference service changes; not available for ad-hoc historical queries | | **Accept approximate PIT** for sub-second features | No additional infrastructure | Training data may not perfectly match serving behavior; acceptable when sub-second features have limited historical sensitivity | ::: {.callout-tip} Snowflake **Stream Feature Views** (PubPr) provide a Postgres-backed online store with sub-2 second end-to-end freshness and built-in offline historization. Before committing to a complex external-store architecture, evaluate whether native streaming or batch + Postgres serving in [Chapter 9](../09_real_time_online_serving/index.qmd) meets your freshness requirements. ::: --- ## Best practices ### 1. Right-size `target_lag` {.unnumbered} ```python # Tight online sync for high-stakes scoring fraud_online = feature_view.OnlineConfig(enable=True, target_lag="30 seconds") # Relaxed sync where minutes of lag are acceptable reco_online = feature_view.OnlineConfig(enable=True, target_lag="5 minutes") ``` ### 2. Monitor OFT health {.unnumbered} ```sql -- Refresh / sync history (names may vary by release; use INFORMATION_SCHEMA as documented for your account) SELECT * FROM TABLE(INFORMATION_SCHEMA.ONLINE_FEATURE_STORE_HISTORY()) ORDER BY SYNC_START_TIME DESC LIMIT 10; ``` ### 3. Use a serving warehouse appropriately {.unnumbered} Run Feature Store sessions used for **online** reads on a warehouse sized for **latency and concurrency** (often `FS_DEV_WH` in development, a dedicated production warehouse in real deployments). The **TARGET_LAG** Task uses the warehouse you configure on the OFT / Feature Store setup. --- ## Common pitfalls ### ❌ Pitfall 1: Expecting historical data **Problem:** OFTs only store current values, not history. **Solution:** Use offline Feature Views and temporal retrieval for training and PIT correctness. ### ❌ Pitfall 2: Ignoring `refresh_freq` while tuning `target_lag` **Problem:** Setting `target_lag="10s"` while the offline Dynamic Table refreshes hourly. **Solution:** Align offline refresh, source data latency, and `target_lag` so expectations match reality. ### ❌ Pitfall 3: Per-entity API chatter **Problem:** Calling `read_feature_view` once per key in a tight loop. **Solution:** Batch keys in one `keys=[...]` list. ### ❌ Pitfall 4: Using deprecated online enablement APIs **Problem:** Code or docs that call `with_online_enabled()` or assume a separate `create_online_feature_table` Python flow as the only path. **Solution:** Use `online_config=feature_view.OnlineConfig(...)` on the Feature View (or `update_feature_view`). ### ❌ Pitfall 5: Enabling online serving on a tiled Feature View {#sec-pitfall-tiled-oft} **Problem:** A Feature View created with the Aggregations API (`features=[Feature.sum(...), ...]` and `feature_granularity`) stores partial aggregates in the underlying Dynamic Table — columns like `_PARTIAL_SUM_TOTAL_AMT` and `TILE_START` instead of the original `timestamp_col`. When you add `online_config` to such a Feature View, `register_feature_view` fails: ``` SQL compilation error: column 'ORDER_TS' does not exist ``` The SDK does not validate this in advance. **Solution:** Use **one of three approaches** (ranked by preference): 1. **`online_fv_from_tiled()` helper** (recommended): Auto-generates a non-tiled, OFT-ready Feature View from the tiled FV's `Feature` definitions. The helper produces a scoped `GROUP BY` query bounded to only `max(window + offset) + buffer` of source data, keeping the DT small and fast despite FULL refresh. Use the tiled FV for offline/training workloads. 2. **Manual non-tiled FV**: Write your own `GROUP BY` query with `CASE WHEN` per-feature window filters if you need to customise beyond what the helper generates. 3. **Postgres online store** (PubPr): `OnlineStoreType.POSTGRES` performs query-time tile reassembly natively. See [Chapter 9](../09_real_time_online_serving/index.qmd). ```{python} #| eval: false # Example: using the helper to create an online companion from online_from_tiled import online_fv_from_tiled online_fv = online_fv_from_tiled( tiled_reg, session, source_query="SELECT * FROM DB.SCHEMA.ORDERS", ) fs.register_feature_view(online_fv, version="V01", block=True) ``` See [Chapter 7: Tiled Feature Views and Online Serving](../07_aggregations_api/index.qmd#sec-tiled-online) for the helper's full API, generated SQL examples, and performance trade-offs. --- ## Summary | Concept | Description | |---------|-------------| | **Online Feature Table (Hybrid)** | First-class Snowflake object encapsulating Hybrid Table + Stream + Task + **TARGET_LAG**; API creates and manages underlying objects | | **Postgres backend** | Shared online service + per-FV `FROM SPECIFICATION` OFTs; serving in managed Postgres — see [Chapter 9](../09_real_time_online_serving/index.qmd#sec-postgres-object-model) | | **`OnlineConfig.target_lag`** | Python knob for maximum lag of online table behind source (10s–8d); Task frequency | | **`refresh_freq`** (Feature View) | Offline Dynamic Table refresh; independent of `target_lag` | | **`read_feature_view(..., StoreType.ONLINE)`** | Low-latency retrieval of latest values; currently single-FV per call | | **Multi-FV online reads** | Pre-join into a combined FV (preferred) or use parallel reads with `ThreadPoolExecutor` | | **Latency budget** | Plan end-to-end: network + feature serving + model inference + application overhead | | **Warehouse type** | Use **Standard** warehouses for OFT serving; Snowpark-optimized adds no benefit for point lookups | | **High-cardinality OFTs** | Compound-key entities can explode row count; evaluate cardinality before enabling online serving | | **Warm-up** | OFTs require sustained queries to reach steady-state latency; always warm up before benchmarking | | **Benchmarking** | Concurrent connections, key batching, parameter binding, per-minute P50/P95/P99, multi-cluster WH for high QPS | | **External serving store** | Fallback for sub-20 ms requirements; CDC to Redis/Postgres/DynamoDB with Feature Store as system of record | | **Tiled FV limitation** | Tiled (Aggregations API) FVs cannot have Hybrid Table OFTs directly — use the `online_fv_from_tiled()` helper to auto-generate a non-tiled companion, or the Postgres backend | | **Current values only** | No historical timeline in the OFT | --- ## Next Steps Continue to [Chapter 9: Real-Time and Online Serving](../09_real_time_online_serving/index.qmd) for the **Postgres online backend** (batch + Postgres, Stream FVs, RTFVs, Feature Groups). Shared concepts in this chapter — `target_lag`, freshness vs latency, multi-FV reads — apply to both backends where the write path is batch sync.