flowchart LR OFF[Offline FV DT or View] --> MERGE[Stream + Task MERGE] MERGE --> HT[Hybrid Table latest per key]
8 Online Features
Online Feature Tables for real-time serving
snowflake, feature store, ml, machine learning, mlops
8.1 Overview
Online Feature Tables (OFTs) enable low-latency feature serving for real-time ML inference. Snowflake treats an OFT as a first-class object: internally, it combines a Hybrid Table for key-value lookups with incremental refresh driven by a Stream on the offline Feature View source and a Task that runs MERGE SQL into the Hybrid Table on a schedule controlled by TARGET_LAG.
This chapter uses the clickstream sample model: database FEATURE_STORE_DEMO, source schema CLICKSTREAM_DATA (for example ORDERS, USERS, SESSIONS), Feature Store schema FEATURE_STORE, warehouse FS_DEV_WH. Feature View registrations use version strings in V01 format (for example version="V01").
Official references: CREATE ONLINE FEATURE TABLE, Create and serve online features (Python).
8.2 Learning Objectives
After completing this chapter, you will be able to:
- Explain how Stream, Task, Hybrid Table, and TARGET_LAG fit together in the OFT architecture
- Enable online serving with
online_configon a Feature View - Tune
target_lagand reason about storage and compute costs - Handle multi-Feature-View online reads via pre-joined Feature Views or parallel retrieval
- Plan latency budgets and troubleshoot latency volatility
- Benchmark OFT performance correctly: warm-up, steady-state measurement, concurrent connections, and per-minute aggregates
- Retrieve online features via the Feature Store Python API and integrate with serving layers
- Evaluate when an external serving store is appropriate and understand the trade-offs
📂 Chapter code: Browse companion scripts on GitHub
8.3 What is an Online Feature Table?
An Online Feature Table is a managed object that exposes the current (latest) values per entity for low-latency retrieval. Physically, online serving uses a Hybrid Table keyed by your entity columns. Freshness from the offline Feature View (Dynamic Table or View) is maintained by:
- A Stream on that offline object, capturing new and changed rows.
- A Task that runs on an interval derived from TARGET_LAG (minimum 10 seconds, maximum 8 days).
- MERGE (or equivalent incremental logic) into the Hybrid Table so only deltas are applied when the Stream has data; when the Stream is empty, the Task can finish quickly with minimal compute.
OFTs store only the latest feature values — they do not retain historical values. For point-in-time historical retrieval, use the offline Feature View and training APIs.
refresh_freq
refresh_freq on the Feature View controls how often the offline Dynamic Table (or related materialization) refreshes. OnlineConfig.target_lag controls how often the online sync Task runs. They are independent; end-to-end staleness is roughly the sum of both paths. See the Python online features documentation.
8.4 Creating Online Feature Tables
8.4.1 Enable online serving with online_config
Enable online features on the Feature View using online_config. Earlier API versions used helpers such as with_online_enabled(), which are now deprecated — use OnlineConfig instead.
📁 Full code:
_code/online_feature_tables.py
from snowflake.snowpark import Session
from snowflake.ml.feature_store import FeatureStore, FeatureView, Entity
from snowflake.ml.feature_store import feature_view
session = Session.builder.configs(...).create()
fs = FeatureStore(
session=session,
database="FEATURE_STORE_DEMO",
schema="FEATURE_STORE",
default_warehouse="FS_DEV_WH",
)
user_entity = Entity(name="USER", join_keys=["USER_ID"])import snowflake.snowpark.functions as F
orders = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS")
feature_df = orders.group_by("USER_ID").agg(
F.sum("TOTAL_AMT").alias("ORDER_TOTAL_AMT_SUM"),
F.count("ORDER_ID").alias("ORDER_CNT"),
F.avg("TOTAL_AMT").alias("ORDER_TOTAL_AMT_AVG"),
F.max("ORDER_TS").alias("LAST_ORDER_TS"),
)online_config = feature_view.OnlineConfig(
enable=True,
target_lag="1 minute",
)
fv = FeatureView(
name="USER_ORDER_ONLINE_FV",
entities=[user_entity],
feature_df=feature_df,
timestamp_col="LAST_ORDER_TS",
refresh_freq="1 minute",
refresh_mode="AUTO",
desc="USER order aggregates from CLICKSTREAM_DATA.ORDERS; online + offline",
online_config=online_config,
)
fv = fs.register_feature_view(feature_view=fv, version="V01")Snowflake creates and wires the OFT infrastructure (Hybrid Table, Stream, Task) as part of serving enablement; TARGET_LAG in SQL corresponds to OnlineConfig.target_lag in Python.
Initial OFT population: When online serving is first enabled the OFT must be fully populated – one row per entity key present in the source table. This initial load may take noticeably longer than subsequent refreshes, especially for large entity cardinalities. Once the initial population completes, incremental refreshes process only changed rows and should be fast, though throughput still depends on the update frequency and rate of feature-value changes in the source.
8.4.2 SQL equivalent (illustrative)
For visibility into the object model, the SQL shape mirrors CREATE ONLINE FEATURE TABLE:
CREATE OR REPLACE ONLINE FEATURE TABLE FEATURE_STORE_DEMO.FEATURE_STORE.USER_ORDER_ONLINE_OFT
PRIMARY KEY (USER_ID)
TIMESTAMP_COLUMN = LAST_ORDER_TS
TARGET_LAG = '1 minute'
WAREHOUSE = FS_DEV_WH
REFRESH_MODE = AUTO
FROM FEATURE_STORE_DEMO.FEATURE_STORE.<your_offline_dynamic_table_or_view>;(The Python API typically manages object names and wiring when you register the Feature View with online_config.)
8.4.3 Adding online serving to an existing Feature View
If a Feature View was originally registered without online serving, you can enable it after the fact. The Feature Store will create the OFT infrastructure (Hybrid Table, Stream, Task) for the existing offline Feature View:
fs.update_feature_view(
name="USER_ORDER_ONLINE_FV",
version="V01",
online_config=feature_view.OnlineConfig(enable=True, target_lag="30 seconds"),
)8.5 Retrieving Online Features
Use read_feature_view with store_type=StoreType.ONLINE and explicit feature names. Keys are provided as a list of key tuples (one per lookup row).
from snowflake.ml.feature_store import StoreType
features = fs.read_feature_view(
feature_view=fv,
keys=[["user_001"], ["user_002"]],
feature_names=[
"ORDER_TOTAL_AMT_SUM",
"ORDER_CNT",
"ORDER_TOTAL_AMT_AVG",
"LAST_ORDER_TS",
],
store_type=StoreType.ONLINE,
)For a single entity, pass one inner list, for example keys=[["user_001"]].
8.6 Cost estimation (qualitative)
Online Feature Tables draw cost from storage, refresh compute, and serving compute. Treat the following as planning heuristics; use account usage views and your warehouse size for real numbers.
| Area | Driver | Qualitative notes |
|---|---|---|
| Hybrid Table storage | Entity count × feature columns × row width | Scales with distinct keys you serve and column footprint; Hybrid Table storage is priced per GB (higher than standard table storage). |
| Refresh compute | target_lag cadence × MERGE work |
The Task runs on the TARGET_LAG schedule; incremental refresh (Stream with new rows) is usually cheap. If the Stream has no pending data, runs can be very light. |
| Offline refresh | Feature View refresh_freq |
Separate from online target_lag; both contribute to how stale features can be. |
| Online reads | Lookup volume × warehouse | Each read_feature_view(..., ONLINE) uses warehouse resources; larger warehouses cost more per time. (Check current Snowflake pricing notes for Hybrid Table request metering.) |
Tips: Prefer AUTO or INCREMENTAL refresh where supported, right-size TARGET_LAG to the business SLA, and avoid setting online sync tighter than your offline Feature View can actually refresh.
8.7 Multi-Feature-View Online Reads
8.7.1 The limitation
Offline APIs (generate_dataset, generate_training_set, retrieve_feature_values) accept a list of Feature Views and join them automatically. The online API – read_feature_view(..., StoreType.ONLINE) – currently accepts one Feature View per call. Real-time use cases such as recommenders often need features from multiple entities (e.g., user, item, user-item interaction) in a single request.
8.7.2 Solution A: Pre-joined Feature View (preferred)
If the entity keys allow it, create a combined Dynamic Table that pre-joins the required Feature Views into a single record, then derive the OFT from that combined Feature View. This pushes the join into the materialization layer so the online read is a single point lookup:
combined_df = session.sql("""
SELECT
ui.USER_ID,
ui.LISTING_ID,
u.ORDER_TOTAL_AMT_SUM,
u.ORDER_CNT,
l.LISTING_PRICE,
l.LISTING_AGE_DAYS,
ui.VIEW_CNT,
ui.LAST_INTERACTION_TS AS FV_TS
FROM FEATURE_STORE_DEMO.FEATURE_STORE."USER_LISTING_INTERACTION_FV$V01" ui
JOIN FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01" u ON u.USER_ID = ui.USER_ID
JOIN FEATURE_STORE_DEMO.FEATURE_STORE."LISTING_FV$V01" l ON l.LISTING_ID = ui.LISTING_ID
""")ui = session.table('FEATURE_STORE_DEMO.FEATURE_STORE."USER_LISTING_INTERACTION_FV$V01"')
u = session.table('FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01"')
l = session.table('FEATURE_STORE_DEMO.FEATURE_STORE."LISTING_FV$V01"')
combined_df = (
ui.join(u, ui["USER_ID"] == u["USER_ID"])
.join(l, ui["LISTING_ID"] == l["LISTING_ID"])
.select(
ui["USER_ID"], ui["LISTING_ID"],
u["ORDER_TOTAL_AMT_SUM"], u["ORDER_CNT"],
l["LISTING_PRICE"], l["LISTING_AGE_DAYS"],
ui["VIEW_CNT"],
ui["LAST_INTERACTION_TS"].alias("FV_TS"),
)
)user_listing_entity = Entity(
name="USER_LISTING", join_keys=["USER_ID", "LISTING_ID"],
)
combined_fv = FeatureView(
name="RECO_COMBINED_ONLINE_FV",
entities=[user_listing_entity],
feature_df=combined_df,
timestamp_col="FV_TS",
refresh_freq="1 minute",
online_config=feature_view.OnlineConfig(enable=True, target_lag="1 minute"),
desc="Pre-joined user + listing + interaction features for online recommender scoring",
)
combined_fv = fs.register_feature_view(feature_view=combined_fv, version="V01")A single read_feature_view call now returns all features:
features = fs.read_feature_view(
feature_view=combined_fv,
keys=[["user_001", "listing_042"]],
feature_names=[...],
store_type=StoreType.ONLINE,
)Cardinality trade-off: Pre-joining inflates the OFT to the cardinality of the most granular key combination (e.g., user × listing). For high-cardinality interaction pairs (tens of millions of rows) this increases Hybrid Table storage, initial population time, and MERGE refresh cost. Evaluate whether the cardinality is manageable before choosing this approach – see High-cardinality OFTs below.
8.7.3 Solution B: Parallel reads with merging (fallback)
When pre-joining is not feasible (e.g., cardinality is too high, or the Feature Views have incompatible refresh cadences), issue reads in parallel and merge in the application layer:
from concurrent.futures import ThreadPoolExecutor
import pandas as pd
def read_online(fv, keys, feature_names):
return fs.read_feature_view(
feature_view=fv, keys=keys,
feature_names=feature_names, store_type=StoreType.ONLINE,
).to_pandas()
with ThreadPoolExecutor(max_workers=3) as pool:
fut_user = pool.submit(read_online, user_fv, [["user_001"]], USER_FEATURES)
fut_listing = pool.submit(read_online, listing_fv, [["listing_042"]], LISTING_FEATURES)
fut_inter = pool.submit(read_online, interaction_fv, [["user_001", "listing_042"]], INTER_FEATURES)
user_df = fut_user.result()
listing_df = fut_listing.result()
inter_df = fut_inter.result()
merged = pd.concat([user_df, listing_df, inter_df], axis=1)Wall-clock latency is dominated by the slowest of the parallel calls rather than the sum. This approach avoids the cardinality explosion of Solution A but adds application-side complexity and requires manual result alignment.
8.8 Latency Budget Planning
Before deploying online features, establish a latency budget that allocates time across every component of the inference endpoint. Work backwards from the user-facing SLA:
| Component | Typical range | Notes |
|---|---|---|
| Network (client → Snowflake) | 5-30 ms | Dominated by region distance; co-locate where possible |
| Feature serving (OFT read) | 10-50 ms (warm, low QPS) | Varies with row width, key count, and warehouse state |
| Model inference | 5-50 ms | Depends on model complexity and serving framework |
| Application overhead | 5-20 ms | Serialisation, result merging, logging |
| Total endpoint | 25-150 ms | Sum of the above |
Worked example: A recommender has a 100 ms end-to-end budget. Model inference takes 30 ms and application overhead is 10 ms, leaving 60 ms for feature serving + network. If network round-trip is 15 ms, the OFT read must complete in under 45 ms – achievable for a single batched lookup on a warm warehouse, but tight if multiple sequential FV reads are required (prefer the pre-joined pattern above).
If the latency budget leaves less than ~30 ms for feature serving, consider the pre-joined Feature View pattern to reduce the number of round trips, or the external serving store pattern for sub-10 ms requirements.
8.9 Performance Tuning and Benchmarking
8.9.1 Warm-Up Behavior
OFTs (backed by Hybrid Tables) require a number of queries to warm up before reaching steady-state optimal response times. The initial queries populate internal caches (data pages, metadata, connection pools) and will show higher latency than the system achieves once warmed.
Do not draw conclusions from the first few hundred queries. Run at least several minutes of sustained traffic at your expected QPS before recording latency measurements. Early cold-start results are not representative of production performance.
8.9.2 Benchmarking Guidelines
| Guideline | Detail |
|---|---|
| Run duration | Sustain the test for long enough to reach steady state (minimum 5-10 minutes at target QPS). Short bursts capture warm-up noise, not true throughput. |
| Measure per-minute aggregates | Report P50, P95, P99 latency aggregated per minute. Avoid reporting single-query or first-query latencies as representative. |
| Use concurrent connections | A single serial connection under-utilises the warehouse. Launch multiple concurrent client connections (threads / async) to saturate the target QPS. |
| Match expected workload | Set QPS, key batch sizes, and feature column counts to match your production profile. Results from a 1 QPS test do not predict 1000 QPS behavior. |
| Account for network latency | Measure round-trip time from the client to your Snowflake account region. If the client is in a different region or on-premises, network latency dominates at low query times. Co-locate clients with the Snowflake account where possible. |
8.9.3 Batch Key Lookups
Pre-batching entity keys into a single request reduces round trips and amortises connection overhead:
# ❌ BAD: one round-trip per user — N queries for N users
for uid in user_ids:
fs.read_feature_view(
feature_view=fv,
keys=[[uid]],
feature_names=FEATURE_NAMES,
store_type=StoreType.ONLINE,
)
# ✅ GOOD: one batched call — 1 query for N users
fs.read_feature_view(
feature_view=fv,
keys=[[uid] for uid in user_ids],
feature_names=FEATURE_NAMES,
store_type=StoreType.ONLINE,
)Large key sets: OFTs are optimized for serving a modest number of key-based lookups (tens to low hundreds per call). If you need to look up thousands of entity keys at once (e.g., for batch scoring), it is typically faster to query the offline Feature View directly – the columnar scan of a Dynamic Table handles large result sets more efficiently than many point lookups against a Hybrid Table.
8.9.4 Parameter Binding
When issuing online lookups via raw SQL (outside the Feature Store API), use bind variables rather than string interpolation. This enables Snowflake to cache and reuse query plans, improving throughput under high QPS:
# ✅ GOOD: parameterised query — plan reuse
session.sql(
"SELECT * FROM FEATURE_STORE.USER_ORDER_ONLINE_OFT WHERE USER_ID = ?",
params=["user_001"],
).collect()
# ❌ BAD: string interpolation — new plan per unique value
session.sql(
f"SELECT * FROM FEATURE_STORE.USER_ORDER_ONLINE_OFT WHERE USER_ID = 'user_001'"
).collect()8.9.5 Warehouse Sizing and Multi-Cluster
For single-user or low-QPS testing, an XS warehouse is sufficient. For production workloads with high concurrency:
- Use Standard warehouse type. Snowpark-optimized warehouses provide extra memory for UDF/UDTF workloads but add no benefit for Hybrid Table point lookups – they cost more per credit and can exhibit higher latency variance for OFT reads.
- Multi-cluster warehouses (auto-scaling) allow Snowflake to spin up additional clusters to handle concurrent query load. Set
MIN_CLUSTER_COUNTandMAX_CLUSTER_COUNTbased on your typical and peak QPS. - Scale using multiples of XS warehouses – an XS cluster handles a bounded number of concurrent queries efficiently. Adding more clusters is generally more cost-effective than scaling up to a larger warehouse size; if P99 latency rises under load, add clusters first.
- The refresh Task (MERGE into the Hybrid Table) and online reads share the warehouse. If refresh compute contends with read latency, consider separate warehouses for refresh vs. serving.
8.9.6 target_lag Trade-Offs (Online Sync)
target_lag (examples) |
Freshness | Relative refresh cost | Typical use |
|---|---|---|---|
10 seconds … 1 minute |
Tightest | Higher Task frequency | Fraud, real-time guards |
1 minute … 5 minutes |
Low | Moderate | Recommendations, ranking |
15 minutes … 1 hour |
Relaxed | Lower | Marketing, less critical paths |
Match offline refresh_freq to upstream CLICKSTREAM_DATA landing latency so online sync is not chasing data that does not exist yet.
8.9.7 Feature Freshness and Training-Serving Skew
The features a model receives at inference time are subject to cumulative lag from multiple stages. The exact formula depends on the inference pattern:
| Inference pattern | Lag components | Typical total |
|---|---|---|
| Online (OFT) | Source pipeline + offline refresh_freq + online target_lag |
Seconds to hours |
| Batch (DT-based inference FV) | Source pipeline + upstream FV refresh_freq + inference DT target_lag |
Minutes to hours |
| Batch (Task/Sproc) | Source pipeline + upstream FV refresh_freq + Task schedule interval |
Minutes to hours |
If the source data pipeline itself has variable execution times (e.g., an ETL that sometimes finishes at 2 AM and sometimes at 6 AM), the effective staleness is even less predictable. A model trained on point-in-time-perfect features – where the ASOF join retrieves the exact feature values available at each event – may never have seen feature values with this degree of staleness during training. At inference time, the model receives features that are minutes to hours behind reality, creating a training-serving skew (a distribution shift) that can degrade prediction accuracy.
This applies equally to batch inference. A DT-based scoring pipeline (see Chapter 10: Continuous Batch Inference) reads from upstream Feature Views that have their own refresh lag, compounded by the scoring DT’s own target_lag. A Stream/Task/Stored Procedure pattern has the same issue: the Task schedule determines how often scoring runs, but the features it reads may already be stale by the time the Task fires.
To minimise training-serving skew, the training dataset should reflect the same feature freshness the model will experience in production. Techniques include:
- Conservative spine (Chapter 6: Late-Arriving Data): Shift the spine timestamp back by the expected pipeline lag (e.g.,
DATEADD('hour', -2, EVENT_TS) AS FEATURE_CUTOFF_TS). The ASOF join then returns features as they would have looked given realistic staleness. - Measure actual lag: Query DT refresh history and OFT sync history to quantify real-world end-to-end staleness. Use this measurement to size the conservative spine buffer rather than guessing.
- Feature freshness as a model input: For models that are sensitive to feature age, include a recency feature (e.g.,
DATEDIFF('minute', LAST_ORDER_TS, PREDICTION_TS)) as an explicit model input. This lets the model learn to discount stale features rather than treating all feature values as equally current. See Chapter 6: Returning the FV Timestamp for how to surface the Feature View’s timestamp column for this purpose.
Without these adjustments, a model can appear well-calibrated in offline evaluation (where features are perfectly fresh) but perform poorly once deployed – whether against an OFT, a DT-based scoring pipeline, or a scheduled batch job – with real-world pipeline lag.
8.9.8 High-Cardinality OFTs
OFT row count directly affects initial population time, MERGE refresh cost, and Hybrid Table storage. Guidelines:
| Entity cardinality | Expected behavior |
|---|---|
| < 1 M rows | Fast initial load, lightweight incremental refreshes. Standard operating range. |
| 1-10 M rows | Initial population may take minutes; incremental refresh is still efficient if change rate is low. |
| 10-100 M+ rows | Initial population can be significant; MERGE cost per refresh cycle grows. Hybrid Table storage costs increase. Monitor refresh history and consider whether the offline Feature View + batch scoring is more appropriate. |
Compound entity keys (e.g., USER_ID × LISTING_ID) can produce cross-product cardinalities that are orders of magnitude larger than individual entities. Before enabling online serving on high-cardinality interaction Feature Views, estimate the OFT row count and evaluate:
- Is the full cross-product actually queried online? If only a subset of entity pairs is active, consider filtering to recently active combinations.
- Would the pre-joined Feature View pattern work instead? Pre-joining avoids the interaction OFT entirely if user and listing features can be fetched separately and combined in the application.
- Is an external serving store more appropriate? For tens of millions of rows with sub-20 ms latency requirements, a dedicated key-value store may be a better fit.
8.9.9 Latency Volatility Troubleshooting
If OFT read latency is volatile (e.g., 100-400 ms swings) after warm-up, investigate the following:
| Check | Action |
|---|---|
| MERGE contention | The refresh Task running MERGE into the Hybrid Table can contend with concurrent reads. Check ONLINE_FEATURE_STORE_HISTORY() for refresh timing overlap with latency spikes. Separate refresh and read warehouses if contention is confirmed. |
| Warehouse type | Snowpark-optimized warehouses can show higher variance for point lookups. Switch to a Standard warehouse. |
| Warehouse suspension | If the warehouse auto-suspends between requests, the next query incurs resume latency. Set AUTO_SUSPEND high enough (or disable suspension) for latency-sensitive serving. |
| Connection pooling | Creating a new Snowflake session per request adds significant overhead. Reuse sessions / connection pools in your serving application. |
| Row width | Very wide rows (hundreds of columns) increase per-read data transfer. Use feature_names to request only the columns needed, or consider the bundled OBJECT pattern (Ch12). |
| Network jitter | Cross-region or on-premises clients see variable network latency. Co-locate clients with the Snowflake account region. |
8.9.10 Performance Checklist
8.10 Integration with external services
8.10.1 FastAPI example
📁 Full code:
_code/fastapi_serving.py
from fastapi import FastAPI
from snowflake.ml.feature_store import FeatureStore, StoreType
app = FastAPI()
FEATURE_NAMES = [
"ORDER_TOTAL_AMT_SUM",
"ORDER_CNT",
"ORDER_TOTAL_AMT_AVG",
"LAST_ORDER_TS",
]
@app.post("/features")
async def get_features(user_id: str):
"""Latest USER order features from the OFT."""
features = fs.read_feature_view(
feature_view=user_order_fv,
keys=[[user_id]],
feature_names=FEATURE_NAMES,
store_type=StoreType.ONLINE,
)
return features.to_pandas().to_dict(orient="records")[0]8.10.2 Read Latency vs Feature Freshness
When evaluating online serving options, it is important to distinguish two separate concerns:
| Concern | Definition | Native OFT (Hybrid Table) |
|---|---|---|
| Read latency | Time from query to response | 10–100 ms (warm, standard WH) |
| Feature freshness | Time from source event to feature value reflecting it | DT refresh_freq + OFT target_lag = tens of seconds to minutes |
Many discussions conflate these. A system can have low read latency but stale features (fast reads of old data), or high read latency but fresh features (slow reads of up-to-date data). When a customer says “we need real-time features,” clarify which dimension they mean.
End-to-end freshness in the batch model:
- Source event lands in a Snowflake table
- Dynamic Table refreshes on
refresh_freqschedule (minutes to hours) - OFT sync Task runs on
target_lagschedule (minimum 10 seconds) - Feature value is queryable
For features like “count of transactions in the last N seconds,” this pipeline is fundamentally too slow regardless of read latency.
8.10.3 External serving store pattern
Some use cases have sub-20 ms read latency or sub-second feature freshness requirements that exceed what Hybrid Table-backed OFTs deliver. When native OFT performance does not meet the SLA, teams sometimes replicate features to an external key-value store (e.g., Redis, SingleStore, DynamoDB, Aerospike) for serving.
Pattern: Keep the Feature Store as the system of record for feature definitions, lineage, and offline retrieval. Use a CDC (change data capture) or scheduled sync tool to replicate the OFT or offline Feature View to the external store:
flowchart TD FS[Feature Store offline FV] FS --> OFT[OFT Hybrid Table] FS --> CDC[CDC or sync tool] CDC --> EXT[External store Redis PG Dynamo]
Trade-offs:
| Factor | OFT (native) | External store |
|---|---|---|
| Read latency | 10–100 ms (warm, standard WH) | 1–20 ms (depending on store) |
| Feature freshness | DT refresh + OFT sync (seconds to minutes) | As fast as the write pipeline (sub-second possible) |
| Operational overhead | Fully managed by Snowflake | CDC pipeline to maintain: column/type mapping, schema evolution, monitoring |
| Data consistency | Single source of truth | Eventual consistency with sync delay |
| Cost | Warehouse credits (serving + refresh) | External store infrastructure + sync tool licensing |
| Schema changes | Handled via FV versioning | Require manual pipeline updates; breaking changes risk sync failures |
8.10.4 Hybrid architecture for sub-second features
When a subset of features genuinely needs sub-second freshness (fraud signals, real-time engagement scores) while others are slow-changing (demographics, 7d/30d aggregates), a hybrid architecture avoids over-engineering the entire pipeline:
- Snowflake Feature Store for batch / slow-changing features (the majority of the feature vector)
- External real-time layer (SingleStore, Redis, Kafka Streams, Flink state store, or the Streaming Feature Views described in the next chapter) for the sub-second features
- Inference service assembles features from both sources at prediction time
flowchart LR
subgraph snowflake [Snowflake Feature Store]
DT["Dynamic Table"] --> OFT["Hybrid Table OFT"]
end
subgraph realtime [Real-Time Layer]
Stream["Event Stream"] --> ExtStore["External Store"]
end
OFT --> Inference["Inference Service"]
ExtStore --> Inference
Inference --> Prediction["Model Prediction"]
8.10.5 Historization of external online features
If a customer uses an external online store for sub-second features, they must consider how to keep a history of those feature values in the offline store for:
- Training data generation — PIT-correct training requires knowing what the feature value was at each historical label timestamp. Without historization, you cannot reconstruct what the model “saw” at prediction time.
- Batch inference reproducibility — reproducing a past inference requires the same feature vector.
- Drift detection and model monitoring — comparing training-time and serving-time feature distributions requires logged serving values.
Options for historization:
| Approach | Pros | Cons |
|---|---|---|
| CDC / change log from external store written back to a Snowflake table, registered as a View-based Feature View | Full history; Feature Store catalog/lineage; supports PIT joins | Requires CDC pipeline; adds latency to offline availability |
| Shadow logging at inference time — log the assembled feature vector alongside the prediction | Captures exactly what the model saw; no CDC needed | Storage overhead; requires inference service changes; not available for ad-hoc historical queries |
| Accept approximate PIT for sub-second features | No additional infrastructure | Training data may not perfectly match serving behavior; acceptable when sub-second features have limited historical sensitivity |
Snowflake is actively developing Streaming Feature Views with a Postgres-backed online store that provides 2–3 second end-to-end freshness with built-in offline historization. This native capability addresses much of the motivation for external stores. Before committing to a complex hybrid architecture, evaluate whether Streaming Feature Views (Private Preview) meet your freshness requirements. See the Streaming Feature Views chapter for details (internal / NDA-only).
8.11 Best practices
1. Right-size target_lag
2. Monitor OFT health
3. Use a serving warehouse appropriately
Run Feature Store sessions used for online reads on a warehouse sized for latency and concurrency (often FS_DEV_WH in development, a dedicated production warehouse in real deployments). The TARGET_LAG Task uses the warehouse you configure on the OFT / Feature Store setup.
8.12 Common pitfalls
8.12.1 ❌ Pitfall 1: Expecting historical data
Problem: OFTs only store current values, not history.
Solution: Use offline Feature Views and temporal retrieval for training and PIT correctness.
8.12.2 ❌ Pitfall 2: Ignoring refresh_freq while tuning target_lag
Problem: Setting target_lag="10s" while the offline Dynamic Table refreshes hourly.
Solution: Align offline refresh, source data latency, and target_lag so expectations match reality.
8.12.3 ❌ Pitfall 3: Per-entity API chatter
Problem: Calling read_feature_view once per key in a tight loop.
Solution: Batch keys in one keys=[...] list.
8.12.4 ❌ Pitfall 4: Using deprecated online enablement APIs
Problem: Code or docs that call with_online_enabled() or assume a separate create_online_feature_table Python flow as the only path.
Solution: Use online_config=feature_view.OnlineConfig(...) on the Feature View (or update_feature_view).
8.12.5 ❌ Pitfall 5: Enabling online serving on a tiled Feature View
Problem: A Feature View created with the Aggregations API (features=[Feature.sum(...), ...] and feature_granularity) stores partial aggregates in the underlying Dynamic Table — columns like _PARTIAL_SUM_TOTAL_AMT and TILE_START instead of the original timestamp_col. When you add online_config to such a Feature View, register_feature_view fails:
SQL compilation error: column 'ORDER_TS' does not exist
The SDK does not validate this in advance.
Solution: Use one of three approaches (ranked by preference):
online_fv_from_tiled()helper (recommended): Auto-generates a non-tiled, OFT-ready Feature View from the tiled FV’sFeaturedefinitions. The helper produces a scopedGROUP BYquery bounded to onlymax(window + offset) + bufferof source data, keeping the DT small and fast despite FULL refresh. Use the tiled FV for offline/training workloads.Manual non-tiled FV: Write your own
GROUP BYquery withCASE WHENper-feature window filters if you need to customise beyond what the helper generates.Postgres online store (Private Preview): The upcoming
OnlineStoreType.POSTGRESbackend performs query-time tile reassembly natively, eliminating the need for workarounds.
See Chapter 7: Tiled Feature Views and Online Serving for the helper’s full API, generated SQL examples, and performance trade-offs.
8.13 Summary
| Concept | Description |
|---|---|
| Online Feature Table | First-class object: Hybrid Table + Stream on offline FV + Task (MERGE), driven by TARGET_LAG |
OnlineConfig.target_lag |
Python knob for maximum lag of online table behind source (10s–8d); Task frequency |
refresh_freq (Feature View) |
Offline Dynamic Table refresh; independent of target_lag |
read_feature_view(..., StoreType.ONLINE) |
Low-latency retrieval of latest values; currently single-FV per call |
| Multi-FV online reads | Pre-join into a combined FV (preferred) or use parallel reads with ThreadPoolExecutor |
| Latency budget | Plan end-to-end: network + feature serving + model inference + application overhead |
| Warehouse type | Use Standard warehouses for OFT serving; Snowpark-optimized adds no benefit for point lookups |
| High-cardinality OFTs | Compound-key entities can explode row count; evaluate cardinality before enabling online serving |
| Warm-up | OFTs require sustained queries to reach steady-state latency; always warm up before benchmarking |
| Benchmarking | Concurrent connections, key batching, parameter binding, per-minute P50/P95/P99, multi-cluster WH for high QPS |
| External serving store | Fallback for sub-20 ms requirements; CDC to Redis/Postgres/DynamoDB with Feature Store as system of record |
| Tiled FV limitation | Tiled (Aggregations API) FVs cannot have Hybrid Table OFTs directly — use the online_fv_from_tiled() helper to auto-generate a non-tiled companion, or the Postgres backend |
| Current values only | No historical timeline in the OFT |
8.14 Next Steps
Continue to Chapter 9: Preprocessing to learn about model-dependent transformations and preprocessing patterns.