8  Online Features

Online Feature Tables for real-time serving

Keywords

snowflake, feature store, ml, machine learning, mlops

8.1 Overview

Online Feature Tables (OFTs) enable low-latency feature serving for real-time ML inference. Snowflake treats an OFT as a first-class object: internally, it combines a Hybrid Table for key-value lookups with incremental refresh driven by a Stream on the offline Feature View source and a Task that runs MERGE SQL into the Hybrid Table on a schedule controlled by TARGET_LAG.

This chapter uses the clickstream sample model: database FEATURE_STORE_DEMO, source schema CLICKSTREAM_DATA (for example ORDERS, USERS, SESSIONS), Feature Store schema FEATURE_STORE, warehouse FS_DEV_WH. Feature View registrations use version strings in V01 format (for example version="V01").

Official references: CREATE ONLINE FEATURE TABLE, Create and serve online features (Python).

8.2 Learning Objectives

After completing this chapter, you will be able to:

  • Explain how Stream, Task, Hybrid Table, and TARGET_LAG fit together in the OFT architecture
  • Enable online serving with online_config on a Feature View
  • Tune target_lag and reason about storage and compute costs
  • Handle multi-Feature-View online reads via pre-joined Feature Views or parallel retrieval
  • Plan latency budgets and troubleshoot latency volatility
  • Benchmark OFT performance correctly: warm-up, steady-state measurement, concurrent connections, and per-minute aggregates
  • Retrieve online features via the Feature Store Python API and integrate with serving layers
  • Evaluate when an external serving store is appropriate and understand the trade-offs

📂 Chapter code: Browse companion scripts on GitHub


8.3 What is an Online Feature Table?

An Online Feature Table is a managed object that exposes the current (latest) values per entity for low-latency retrieval. Physically, online serving uses a Hybrid Table keyed by your entity columns. Freshness from the offline Feature View (Dynamic Table or View) is maintained by:

  1. A Stream on that offline object, capturing new and changed rows.
  2. A Task that runs on an interval derived from TARGET_LAG (minimum 10 seconds, maximum 8 days).
  3. MERGE (or equivalent incremental logic) into the Hybrid Table so only deltas are applied when the Stream has data; when the Stream is empty, the Task can finish quickly with minimal compute.
Key Concept

OFTs store only the latest feature values — they do not retain historical values. For point-in-time historical retrieval, use the offline Feature View and training APIs.

TARGET_LAG vs offline refresh_freq

refresh_freq on the Feature View controls how often the offline Dynamic Table (or related materialization) refreshes. OnlineConfig.target_lag controls how often the online sync Task runs. They are independent; end-to-end staleness is roughly the sum of both paths. See the Python online features documentation.

flowchart LR
  OFF[Offline FV DT or View] --> MERGE[Stream + Task MERGE]
  MERGE --> HT[Hybrid Table latest per key]

Online Feature Table: offline FV to Hybrid Table via stream and task


8.4 Creating Online Feature Tables

8.4.1 Enable online serving with online_config

Enable online features on the Feature View using online_config. Earlier API versions used helpers such as with_online_enabled(), which are now deprecated — use OnlineConfig instead.

📁 Full code: _code/online_feature_tables.py

from snowflake.snowpark import Session
from snowflake.ml.feature_store import FeatureStore, FeatureView, Entity
from snowflake.ml.feature_store import feature_view

session = Session.builder.configs(...).create()
fs = FeatureStore(
    session=session,
    database="FEATURE_STORE_DEMO",
    schema="FEATURE_STORE",
    default_warehouse="FS_DEV_WH",
)

user_entity = Entity(name="USER", join_keys=["USER_ID"])
feature_df = session.sql("""
    SELECT
        USER_ID,
        SUM(TOTAL_AMT) AS ORDER_TOTAL_AMT_SUM,
        COUNT(ORDER_ID) AS ORDER_CNT,
        AVG(TOTAL_AMT) AS ORDER_TOTAL_AMT_AVG,
        MAX(ORDER_TS) AS LAST_ORDER_TS
    FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
    GROUP BY USER_ID
""")
import snowflake.snowpark.functions as F

orders = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS")
feature_df = orders.group_by("USER_ID").agg(
    F.sum("TOTAL_AMT").alias("ORDER_TOTAL_AMT_SUM"),
    F.count("ORDER_ID").alias("ORDER_CNT"),
    F.avg("TOTAL_AMT").alias("ORDER_TOTAL_AMT_AVG"),
    F.max("ORDER_TS").alias("LAST_ORDER_TS"),
)
online_config = feature_view.OnlineConfig(
    enable=True,
    target_lag="1 minute",
)

fv = FeatureView(
    name="USER_ORDER_ONLINE_FV",
    entities=[user_entity],
    feature_df=feature_df,
    timestamp_col="LAST_ORDER_TS",
    refresh_freq="1 minute",
    refresh_mode="AUTO",
    desc="USER order aggregates from CLICKSTREAM_DATA.ORDERS; online + offline",
    online_config=online_config,
)

fv = fs.register_feature_view(feature_view=fv, version="V01")

Snowflake creates and wires the OFT infrastructure (Hybrid Table, Stream, Task) as part of serving enablement; TARGET_LAG in SQL corresponds to OnlineConfig.target_lag in Python.

Note

Initial OFT population: When online serving is first enabled the OFT must be fully populated – one row per entity key present in the source table. This initial load may take noticeably longer than subsequent refreshes, especially for large entity cardinalities. Once the initial population completes, incremental refreshes process only changed rows and should be fast, though throughput still depends on the update frequency and rate of feature-value changes in the source.

8.4.2 SQL equivalent (illustrative)

For visibility into the object model, the SQL shape mirrors CREATE ONLINE FEATURE TABLE:

CREATE OR REPLACE ONLINE FEATURE TABLE FEATURE_STORE_DEMO.FEATURE_STORE.USER_ORDER_ONLINE_OFT
  PRIMARY KEY (USER_ID)
  TIMESTAMP_COLUMN = LAST_ORDER_TS
  TARGET_LAG = '1 minute'
  WAREHOUSE = FS_DEV_WH
  REFRESH_MODE = AUTO
FROM FEATURE_STORE_DEMO.FEATURE_STORE.<your_offline_dynamic_table_or_view>;

(The Python API typically manages object names and wiring when you register the Feature View with online_config.)

8.4.3 Adding online serving to an existing Feature View

If a Feature View was originally registered without online serving, you can enable it after the fact. The Feature Store will create the OFT infrastructure (Hybrid Table, Stream, Task) for the existing offline Feature View:

fs.update_feature_view(
    name="USER_ORDER_ONLINE_FV",
    version="V01",
    online_config=feature_view.OnlineConfig(enable=True, target_lag="30 seconds"),
)

8.5 Retrieving Online Features

Use read_feature_view with store_type=StoreType.ONLINE and explicit feature names. Keys are provided as a list of key tuples (one per lookup row).

from snowflake.ml.feature_store import StoreType

features = fs.read_feature_view(
    feature_view=fv,
    keys=[["user_001"], ["user_002"]],
    feature_names=[
        "ORDER_TOTAL_AMT_SUM",
        "ORDER_CNT",
        "ORDER_TOTAL_AMT_AVG",
        "LAST_ORDER_TS",
    ],
    store_type=StoreType.ONLINE,
)

For a single entity, pass one inner list, for example keys=[["user_001"]].


8.6 Cost estimation (qualitative)

Online Feature Tables draw cost from storage, refresh compute, and serving compute. Treat the following as planning heuristics; use account usage views and your warehouse size for real numbers.

Area Driver Qualitative notes
Hybrid Table storage Entity count × feature columns × row width Scales with distinct keys you serve and column footprint; Hybrid Table storage is priced per GB (higher than standard table storage).
Refresh compute target_lag cadence × MERGE work The Task runs on the TARGET_LAG schedule; incremental refresh (Stream with new rows) is usually cheap. If the Stream has no pending data, runs can be very light.
Offline refresh Feature View refresh_freq Separate from online target_lag; both contribute to how stale features can be.
Online reads Lookup volume × warehouse Each read_feature_view(..., ONLINE) uses warehouse resources; larger warehouses cost more per time. (Check current Snowflake pricing notes for Hybrid Table request metering.)

Tips: Prefer AUTO or INCREMENTAL refresh where supported, right-size TARGET_LAG to the business SLA, and avoid setting online sync tighter than your offline Feature View can actually refresh.


8.7 Multi-Feature-View Online Reads

8.7.1 The limitation

Offline APIs (generate_dataset, generate_training_set, retrieve_feature_values) accept a list of Feature Views and join them automatically. The online API – read_feature_view(..., StoreType.ONLINE) – currently accepts one Feature View per call. Real-time use cases such as recommenders often need features from multiple entities (e.g., user, item, user-item interaction) in a single request.

8.7.2 Solution A: Pre-joined Feature View (preferred)

If the entity keys allow it, create a combined Dynamic Table that pre-joins the required Feature Views into a single record, then derive the OFT from that combined Feature View. This pushes the join into the materialization layer so the online read is a single point lookup:

combined_df = session.sql("""
    SELECT
        ui.USER_ID,
        ui.LISTING_ID,
        u.ORDER_TOTAL_AMT_SUM,
        u.ORDER_CNT,
        l.LISTING_PRICE,
        l.LISTING_AGE_DAYS,
        ui.VIEW_CNT,
        ui.LAST_INTERACTION_TS AS FV_TS
    FROM FEATURE_STORE_DEMO.FEATURE_STORE."USER_LISTING_INTERACTION_FV$V01" ui
    JOIN FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01"              u  ON u.USER_ID = ui.USER_ID
    JOIN FEATURE_STORE_DEMO.FEATURE_STORE."LISTING_FV$V01"                 l  ON l.LISTING_ID = ui.LISTING_ID
""")
ui = session.table('FEATURE_STORE_DEMO.FEATURE_STORE."USER_LISTING_INTERACTION_FV$V01"')
u  = session.table('FEATURE_STORE_DEMO.FEATURE_STORE."USER_ORDER_FV$V01"')
l  = session.table('FEATURE_STORE_DEMO.FEATURE_STORE."LISTING_FV$V01"')

combined_df = (
    ui.join(u, ui["USER_ID"] == u["USER_ID"])
      .join(l, ui["LISTING_ID"] == l["LISTING_ID"])
      .select(
          ui["USER_ID"], ui["LISTING_ID"],
          u["ORDER_TOTAL_AMT_SUM"], u["ORDER_CNT"],
          l["LISTING_PRICE"], l["LISTING_AGE_DAYS"],
          ui["VIEW_CNT"],
          ui["LAST_INTERACTION_TS"].alias("FV_TS"),
      )
)
user_listing_entity = Entity(
    name="USER_LISTING", join_keys=["USER_ID", "LISTING_ID"],
)

combined_fv = FeatureView(
    name="RECO_COMBINED_ONLINE_FV",
    entities=[user_listing_entity],
    feature_df=combined_df,
    timestamp_col="FV_TS",
    refresh_freq="1 minute",
    online_config=feature_view.OnlineConfig(enable=True, target_lag="1 minute"),
    desc="Pre-joined user + listing + interaction features for online recommender scoring",
)
combined_fv = fs.register_feature_view(feature_view=combined_fv, version="V01")

A single read_feature_view call now returns all features:

features = fs.read_feature_view(
    feature_view=combined_fv,
    keys=[["user_001", "listing_042"]],
    feature_names=[...],
    store_type=StoreType.ONLINE,
)
Warning

Cardinality trade-off: Pre-joining inflates the OFT to the cardinality of the most granular key combination (e.g., user × listing). For high-cardinality interaction pairs (tens of millions of rows) this increases Hybrid Table storage, initial population time, and MERGE refresh cost. Evaluate whether the cardinality is manageable before choosing this approach – see High-cardinality OFTs below.

8.7.3 Solution B: Parallel reads with merging (fallback)

When pre-joining is not feasible (e.g., cardinality is too high, or the Feature Views have incompatible refresh cadences), issue reads in parallel and merge in the application layer:

from concurrent.futures import ThreadPoolExecutor
import pandas as pd

def read_online(fv, keys, feature_names):
    return fs.read_feature_view(
        feature_view=fv, keys=keys,
        feature_names=feature_names, store_type=StoreType.ONLINE,
    ).to_pandas()

with ThreadPoolExecutor(max_workers=3) as pool:
    fut_user    = pool.submit(read_online, user_fv,        [["user_001"]], USER_FEATURES)
    fut_listing = pool.submit(read_online, listing_fv,     [["listing_042"]], LISTING_FEATURES)
    fut_inter   = pool.submit(read_online, interaction_fv, [["user_001", "listing_042"]], INTER_FEATURES)

user_df    = fut_user.result()
listing_df = fut_listing.result()
inter_df   = fut_inter.result()

merged = pd.concat([user_df, listing_df, inter_df], axis=1)

Wall-clock latency is dominated by the slowest of the parallel calls rather than the sum. This approach avoids the cardinality explosion of Solution A but adds application-side complexity and requires manual result alignment.


8.8 Latency Budget Planning

Before deploying online features, establish a latency budget that allocates time across every component of the inference endpoint. Work backwards from the user-facing SLA:

Component Typical range Notes
Network (client → Snowflake) 5-30 ms Dominated by region distance; co-locate where possible
Feature serving (OFT read) 10-50 ms (warm, low QPS) Varies with row width, key count, and warehouse state
Model inference 5-50 ms Depends on model complexity and serving framework
Application overhead 5-20 ms Serialisation, result merging, logging
Total endpoint 25-150 ms Sum of the above

Worked example: A recommender has a 100 ms end-to-end budget. Model inference takes 30 ms and application overhead is 10 ms, leaving 60 ms for feature serving + network. If network round-trip is 15 ms, the OFT read must complete in under 45 ms – achievable for a single batched lookup on a warm warehouse, but tight if multiple sequential FV reads are required (prefer the pre-joined pattern above).

Tip

If the latency budget leaves less than ~30 ms for feature serving, consider the pre-joined Feature View pattern to reduce the number of round trips, or the external serving store pattern for sub-10 ms requirements.


8.9 Performance Tuning and Benchmarking

8.9.1 Warm-Up Behavior

OFTs (backed by Hybrid Tables) require a number of queries to warm up before reaching steady-state optimal response times. The initial queries populate internal caches (data pages, metadata, connection pools) and will show higher latency than the system achieves once warmed.

Always warm up before measuring

Do not draw conclusions from the first few hundred queries. Run at least several minutes of sustained traffic at your expected QPS before recording latency measurements. Early cold-start results are not representative of production performance.

8.9.2 Benchmarking Guidelines

Guideline Detail
Run duration Sustain the test for long enough to reach steady state (minimum 5-10 minutes at target QPS). Short bursts capture warm-up noise, not true throughput.
Measure per-minute aggregates Report P50, P95, P99 latency aggregated per minute. Avoid reporting single-query or first-query latencies as representative.
Use concurrent connections A single serial connection under-utilises the warehouse. Launch multiple concurrent client connections (threads / async) to saturate the target QPS.
Match expected workload Set QPS, key batch sizes, and feature column counts to match your production profile. Results from a 1 QPS test do not predict 1000 QPS behavior.
Account for network latency Measure round-trip time from the client to your Snowflake account region. If the client is in a different region or on-premises, network latency dominates at low query times. Co-locate clients with the Snowflake account where possible.

8.9.3 Batch Key Lookups

Pre-batching entity keys into a single request reduces round trips and amortises connection overhead:

# ❌ BAD: one round-trip per user — N queries for N users
for uid in user_ids:
    fs.read_feature_view(
        feature_view=fv,
        keys=[[uid]],
        feature_names=FEATURE_NAMES,
        store_type=StoreType.ONLINE,
    )

# ✅ GOOD: one batched call — 1 query for N users
fs.read_feature_view(
    feature_view=fv,
    keys=[[uid] for uid in user_ids],
    feature_names=FEATURE_NAMES,
    store_type=StoreType.ONLINE,
)
Tip

Large key sets: OFTs are optimized for serving a modest number of key-based lookups (tens to low hundreds per call). If you need to look up thousands of entity keys at once (e.g., for batch scoring), it is typically faster to query the offline Feature View directly – the columnar scan of a Dynamic Table handles large result sets more efficiently than many point lookups against a Hybrid Table.

8.9.4 Parameter Binding

When issuing online lookups via raw SQL (outside the Feature Store API), use bind variables rather than string interpolation. This enables Snowflake to cache and reuse query plans, improving throughput under high QPS:

# ✅ GOOD: parameterised query — plan reuse
session.sql(
    "SELECT * FROM FEATURE_STORE.USER_ORDER_ONLINE_OFT WHERE USER_ID = ?",
    params=["user_001"],
).collect()

# ❌ BAD: string interpolation — new plan per unique value
session.sql(
    f"SELECT * FROM FEATURE_STORE.USER_ORDER_ONLINE_OFT WHERE USER_ID = 'user_001'"
).collect()

8.9.5 Warehouse Sizing and Multi-Cluster

For single-user or low-QPS testing, an XS warehouse is sufficient. For production workloads with high concurrency:

  • Use Standard warehouse type. Snowpark-optimized warehouses provide extra memory for UDF/UDTF workloads but add no benefit for Hybrid Table point lookups – they cost more per credit and can exhibit higher latency variance for OFT reads.
  • Multi-cluster warehouses (auto-scaling) allow Snowflake to spin up additional clusters to handle concurrent query load. Set MIN_CLUSTER_COUNT and MAX_CLUSTER_COUNT based on your typical and peak QPS.
  • Scale using multiples of XS warehouses – an XS cluster handles a bounded number of concurrent queries efficiently. Adding more clusters is generally more cost-effective than scaling up to a larger warehouse size; if P99 latency rises under load, add clusters first.
  • The refresh Task (MERGE into the Hybrid Table) and online reads share the warehouse. If refresh compute contends with read latency, consider separate warehouses for refresh vs. serving.

8.9.6 target_lag Trade-Offs (Online Sync)

target_lag (examples) Freshness Relative refresh cost Typical use
10 seconds1 minute Tightest Higher Task frequency Fraud, real-time guards
1 minute5 minutes Low Moderate Recommendations, ranking
15 minutes1 hour Relaxed Lower Marketing, less critical paths

Match offline refresh_freq to upstream CLICKSTREAM_DATA landing latency so online sync is not chasing data that does not exist yet.

8.9.7 Feature Freshness and Training-Serving Skew

The features a model receives at inference time are subject to cumulative lag from multiple stages. The exact formula depends on the inference pattern:

Inference pattern Lag components Typical total
Online (OFT) Source pipeline + offline refresh_freq + online target_lag Seconds to hours
Batch (DT-based inference FV) Source pipeline + upstream FV refresh_freq + inference DT target_lag Minutes to hours
Batch (Task/Sproc) Source pipeline + upstream FV refresh_freq + Task schedule interval Minutes to hours

If the source data pipeline itself has variable execution times (e.g., an ETL that sometimes finishes at 2 AM and sometimes at 6 AM), the effective staleness is even less predictable. A model trained on point-in-time-perfect features – where the ASOF join retrieves the exact feature values available at each event – may never have seen feature values with this degree of staleness during training. At inference time, the model receives features that are minutes to hours behind reality, creating a training-serving skew (a distribution shift) that can degrade prediction accuracy.

This applies equally to batch inference. A DT-based scoring pipeline (see Chapter 10: Continuous Batch Inference) reads from upstream Feature Views that have their own refresh lag, compounded by the scoring DT’s own target_lag. A Stream/Task/Stored Procedure pattern has the same issue: the Task schedule determines how often scoring runs, but the features it reads may already be stale by the time the Task fires.

Account for realistic lag during training

To minimise training-serving skew, the training dataset should reflect the same feature freshness the model will experience in production. Techniques include:

  • Conservative spine (Chapter 6: Late-Arriving Data): Shift the spine timestamp back by the expected pipeline lag (e.g., DATEADD('hour', -2, EVENT_TS) AS FEATURE_CUTOFF_TS). The ASOF join then returns features as they would have looked given realistic staleness.
  • Measure actual lag: Query DT refresh history and OFT sync history to quantify real-world end-to-end staleness. Use this measurement to size the conservative spine buffer rather than guessing.
  • Feature freshness as a model input: For models that are sensitive to feature age, include a recency feature (e.g., DATEDIFF('minute', LAST_ORDER_TS, PREDICTION_TS)) as an explicit model input. This lets the model learn to discount stale features rather than treating all feature values as equally current. See Chapter 6: Returning the FV Timestamp for how to surface the Feature View’s timestamp column for this purpose.

Without these adjustments, a model can appear well-calibrated in offline evaluation (where features are perfectly fresh) but perform poorly once deployed – whether against an OFT, a DT-based scoring pipeline, or a scheduled batch job – with real-world pipeline lag.

8.9.8 High-Cardinality OFTs

OFT row count directly affects initial population time, MERGE refresh cost, and Hybrid Table storage. Guidelines:

Entity cardinality Expected behavior
< 1 M rows Fast initial load, lightweight incremental refreshes. Standard operating range.
1-10 M rows Initial population may take minutes; incremental refresh is still efficient if change rate is low.
10-100 M+ rows Initial population can be significant; MERGE cost per refresh cycle grows. Hybrid Table storage costs increase. Monitor refresh history and consider whether the offline Feature View + batch scoring is more appropriate.

Compound entity keys (e.g., USER_ID × LISTING_ID) can produce cross-product cardinalities that are orders of magnitude larger than individual entities. Before enabling online serving on high-cardinality interaction Feature Views, estimate the OFT row count and evaluate:

  • Is the full cross-product actually queried online? If only a subset of entity pairs is active, consider filtering to recently active combinations.
  • Would the pre-joined Feature View pattern work instead? Pre-joining avoids the interaction OFT entirely if user and listing features can be fetched separately and combined in the application.
  • Is an external serving store more appropriate? For tens of millions of rows with sub-20 ms latency requirements, a dedicated key-value store may be a better fit.

8.9.9 Latency Volatility Troubleshooting

If OFT read latency is volatile (e.g., 100-400 ms swings) after warm-up, investigate the following:

Check Action
MERGE contention The refresh Task running MERGE into the Hybrid Table can contend with concurrent reads. Check ONLINE_FEATURE_STORE_HISTORY() for refresh timing overlap with latency spikes. Separate refresh and read warehouses if contention is confirmed.
Warehouse type Snowpark-optimized warehouses can show higher variance for point lookups. Switch to a Standard warehouse.
Warehouse suspension If the warehouse auto-suspends between requests, the next query incurs resume latency. Set AUTO_SUSPEND high enough (or disable suspension) for latency-sensitive serving.
Connection pooling Creating a new Snowflake session per request adds significant overhead. Reuse sessions / connection pools in your serving application.
Row width Very wide rows (hundreds of columns) increase per-read data transfer. Use feature_names to request only the columns needed, or consider the bundled OBJECT pattern (Ch12).
Network jitter Cross-region or on-premises clients see variable network latency. Co-locate clients with the Snowflake account region.

8.9.10 Performance Checklist


8.10 Integration with external services

8.10.1 FastAPI example

📁 Full code: _code/fastapi_serving.py

from fastapi import FastAPI
from snowflake.ml.feature_store import FeatureStore, StoreType

app = FastAPI()

FEATURE_NAMES = [
    "ORDER_TOTAL_AMT_SUM",
    "ORDER_CNT",
    "ORDER_TOTAL_AMT_AVG",
    "LAST_ORDER_TS",
]

@app.post("/features")
async def get_features(user_id: str):
    """Latest USER order features from the OFT."""
    features = fs.read_feature_view(
        feature_view=user_order_fv,
        keys=[[user_id]],
        feature_names=FEATURE_NAMES,
        store_type=StoreType.ONLINE,
    )
    return features.to_pandas().to_dict(orient="records")[0]

8.10.2 Read Latency vs Feature Freshness

When evaluating online serving options, it is important to distinguish two separate concerns:

Concern Definition Native OFT (Hybrid Table)
Read latency Time from query to response 10–100 ms (warm, standard WH)
Feature freshness Time from source event to feature value reflecting it DT refresh_freq + OFT target_lag = tens of seconds to minutes

Many discussions conflate these. A system can have low read latency but stale features (fast reads of old data), or high read latency but fresh features (slow reads of up-to-date data). When a customer says “we need real-time features,” clarify which dimension they mean.

End-to-end freshness in the batch model:

  1. Source event lands in a Snowflake table
  2. Dynamic Table refreshes on refresh_freq schedule (minutes to hours)
  3. OFT sync Task runs on target_lag schedule (minimum 10 seconds)
  4. Feature value is queryable

For features like “count of transactions in the last N seconds,” this pipeline is fundamentally too slow regardless of read latency.

8.10.3 External serving store pattern

Some use cases have sub-20 ms read latency or sub-second feature freshness requirements that exceed what Hybrid Table-backed OFTs deliver. When native OFT performance does not meet the SLA, teams sometimes replicate features to an external key-value store (e.g., Redis, SingleStore, DynamoDB, Aerospike) for serving.

Pattern: Keep the Feature Store as the system of record for feature definitions, lineage, and offline retrieval. Use a CDC (change data capture) or scheduled sync tool to replicate the OFT or offline Feature View to the external store:

flowchart TD
  FS[Feature Store offline FV]
  FS --> OFT[OFT Hybrid Table]
  FS --> CDC[CDC or sync tool]
  CDC --> EXT[External store Redis PG Dynamo]

External serving: Feature Store to OFT or CDC to external store

Trade-offs:

Factor OFT (native) External store
Read latency 10–100 ms (warm, standard WH) 1–20 ms (depending on store)
Feature freshness DT refresh + OFT sync (seconds to minutes) As fast as the write pipeline (sub-second possible)
Operational overhead Fully managed by Snowflake CDC pipeline to maintain: column/type mapping, schema evolution, monitoring
Data consistency Single source of truth Eventual consistency with sync delay
Cost Warehouse credits (serving + refresh) External store infrastructure + sync tool licensing
Schema changes Handled via FV versioning Require manual pipeline updates; breaking changes risk sync failures

8.10.4 Hybrid architecture for sub-second features

When a subset of features genuinely needs sub-second freshness (fraud signals, real-time engagement scores) while others are slow-changing (demographics, 7d/30d aggregates), a hybrid architecture avoids over-engineering the entire pipeline:

  • Snowflake Feature Store for batch / slow-changing features (the majority of the feature vector)
  • External real-time layer (SingleStore, Redis, Kafka Streams, Flink state store, or the Streaming Feature Views described in the next chapter) for the sub-second features
  • Inference service assembles features from both sources at prediction time

flowchart LR
    subgraph snowflake [Snowflake Feature Store]
        DT["Dynamic Table"] --> OFT["Hybrid Table OFT"]
    end
    subgraph realtime [Real-Time Layer]
        Stream["Event Stream"] --> ExtStore["External Store"]
    end
    OFT --> Inference["Inference Service"]
    ExtStore --> Inference
    Inference --> Prediction["Model Prediction"]

Hybrid serving: batch features from Snowflake, real-time features from external store

8.10.5 Historization of external online features

If a customer uses an external online store for sub-second features, they must consider how to keep a history of those feature values in the offline store for:

  • Training data generation — PIT-correct training requires knowing what the feature value was at each historical label timestamp. Without historization, you cannot reconstruct what the model “saw” at prediction time.
  • Batch inference reproducibility — reproducing a past inference requires the same feature vector.
  • Drift detection and model monitoring — comparing training-time and serving-time feature distributions requires logged serving values.

Options for historization:

Approach Pros Cons
CDC / change log from external store written back to a Snowflake table, registered as a View-based Feature View Full history; Feature Store catalog/lineage; supports PIT joins Requires CDC pipeline; adds latency to offline availability
Shadow logging at inference time — log the assembled feature vector alongside the prediction Captures exactly what the model saw; no CDC needed Storage overhead; requires inference service changes; not available for ad-hoc historical queries
Accept approximate PIT for sub-second features No additional infrastructure Training data may not perfectly match serving behavior; acceptable when sub-second features have limited historical sensitivity
Tip

Snowflake is actively developing Streaming Feature Views with a Postgres-backed online store that provides 2–3 second end-to-end freshness with built-in offline historization. This native capability addresses much of the motivation for external stores. Before committing to a complex hybrid architecture, evaluate whether Streaming Feature Views (Private Preview) meet your freshness requirements. See the Streaming Feature Views chapter for details (internal / NDA-only).


8.11 Best practices

1. Right-size target_lag

# Tight online sync for high-stakes scoring
fraud_online = feature_view.OnlineConfig(enable=True, target_lag="30 seconds")

# Relaxed sync where minutes of lag are acceptable
reco_online = feature_view.OnlineConfig(enable=True, target_lag="5 minutes")

2. Monitor OFT health

-- Refresh / sync history (names may vary by release; use INFORMATION_SCHEMA as documented for your account)
SELECT *
FROM TABLE(INFORMATION_SCHEMA.ONLINE_FEATURE_STORE_HISTORY())
ORDER BY SYNC_START_TIME DESC
LIMIT 10;

3. Use a serving warehouse appropriately

Run Feature Store sessions used for online reads on a warehouse sized for latency and concurrency (often FS_DEV_WH in development, a dedicated production warehouse in real deployments). The TARGET_LAG Task uses the warehouse you configure on the OFT / Feature Store setup.


8.12 Common pitfalls

8.12.1 ❌ Pitfall 1: Expecting historical data

Problem: OFTs only store current values, not history.

Solution: Use offline Feature Views and temporal retrieval for training and PIT correctness.

8.12.2 ❌ Pitfall 2: Ignoring refresh_freq while tuning target_lag

Problem: Setting target_lag="10s" while the offline Dynamic Table refreshes hourly.

Solution: Align offline refresh, source data latency, and target_lag so expectations match reality.

8.12.3 ❌ Pitfall 3: Per-entity API chatter

Problem: Calling read_feature_view once per key in a tight loop.

Solution: Batch keys in one keys=[...] list.

8.12.4 ❌ Pitfall 4: Using deprecated online enablement APIs

Problem: Code or docs that call with_online_enabled() or assume a separate create_online_feature_table Python flow as the only path.

Solution: Use online_config=feature_view.OnlineConfig(...) on the Feature View (or update_feature_view).

8.12.5 ❌ Pitfall 5: Enabling online serving on a tiled Feature View

Problem: A Feature View created with the Aggregations API (features=[Feature.sum(...), ...] and feature_granularity) stores partial aggregates in the underlying Dynamic Table — columns like _PARTIAL_SUM_TOTAL_AMT and TILE_START instead of the original timestamp_col. When you add online_config to such a Feature View, register_feature_view fails:

SQL compilation error: column 'ORDER_TS' does not exist

The SDK does not validate this in advance.

Solution: Use one of three approaches (ranked by preference):

  1. online_fv_from_tiled() helper (recommended): Auto-generates a non-tiled, OFT-ready Feature View from the tiled FV’s Feature definitions. The helper produces a scoped GROUP BY query bounded to only max(window + offset) + buffer of source data, keeping the DT small and fast despite FULL refresh. Use the tiled FV for offline/training workloads.

  2. Manual non-tiled FV: Write your own GROUP BY query with CASE WHEN per-feature window filters if you need to customise beyond what the helper generates.

  3. Postgres online store (Private Preview): The upcoming OnlineStoreType.POSTGRES backend performs query-time tile reassembly natively, eliminating the need for workarounds.

# Example: using the helper to create an online companion
from online_from_tiled import online_fv_from_tiled

online_fv = online_fv_from_tiled(
    tiled_reg, session,
    source_query="SELECT * FROM DB.SCHEMA.ORDERS",
)
fs.register_feature_view(online_fv, version="V01", block=True)

See Chapter 7: Tiled Feature Views and Online Serving for the helper’s full API, generated SQL examples, and performance trade-offs.


8.13 Summary

Concept Description
Online Feature Table First-class object: Hybrid Table + Stream on offline FV + Task (MERGE), driven by TARGET_LAG
OnlineConfig.target_lag Python knob for maximum lag of online table behind source (10s–8d); Task frequency
refresh_freq (Feature View) Offline Dynamic Table refresh; independent of target_lag
read_feature_view(..., StoreType.ONLINE) Low-latency retrieval of latest values; currently single-FV per call
Multi-FV online reads Pre-join into a combined FV (preferred) or use parallel reads with ThreadPoolExecutor
Latency budget Plan end-to-end: network + feature serving + model inference + application overhead
Warehouse type Use Standard warehouses for OFT serving; Snowpark-optimized adds no benefit for point lookups
High-cardinality OFTs Compound-key entities can explode row count; evaluate cardinality before enabling online serving
Warm-up OFTs require sustained queries to reach steady-state latency; always warm up before benchmarking
Benchmarking Concurrent connections, key batching, parameter binding, per-minute P50/P95/P99, multi-cluster WH for high QPS
External serving store Fallback for sub-20 ms requirements; CDC to Redis/Postgres/DynamoDB with Feature Store as system of record
Tiled FV limitation Tiled (Aggregations API) FVs cannot have Hybrid Table OFTs directly — use the online_fv_from_tiled() helper to auto-generate a non-tiled companion, or the Postgres backend
Current values only No historical timeline in the OFT

8.14 Next Steps

Continue to Chapter 9: Preprocessing to learn about model-dependent transformations and preprocessing patterns.