flowchart LR P1[Phase 1 Assessment] --> P2[Phase 2 Migration] --> P3[Phase 3 Validation]
13 Migration Guide
Migrating to Snowflake Feature Store from other platforms
snowflake, feature store, ml, machine learning, mlops
13.1 Overview
This chapter provides guidance for teams migrating to Snowflake Feature Store from other platforms, primarily Tecton. It covers terminology mapping, migration strategies, and common patterns.
Examples follow the clickstream sample layout: database FEATURE_STORE_DEMO, Feature Store schema FEATURE_STORE, source schema CLICKSTREAM_DATA (tables such as EVENTS, SESSIONS, ORDERS, USERS, PRODUCTS), warehouse FS_DEV_WH. Register Feature Views with version strings in V01 format (for example fs.register_feature_view(..., version="V01")).
13.2 Learning Objectives
After completing this chapter, you will be able to:
- Map terminology from other Feature Stores to Snowflake
- Plan a phased migration approach
- Migrate feature definitions and pipelines
- Validate migrated features
📂 Chapter code: Browse companion scripts on GitHub
13.3 Why Migrate to Snowflake Feature Store?
13.3.1 Key Benefits
| Benefit | Description |
|---|---|
| Unified Platform | Features, training, inference all in Snowflake |
| Native Integration | Uses Snowflake objects (DTs, Views, Hybrid Tables) |
| Zero Infrastructure | No separate serving infrastructure needed |
| Enterprise Security | Inherits Snowflake RBAC and governance |
| Cost Efficiency | No data movement, unified compute |
13.4 Terminology Mapping
13.4.1 Tecton → Snowflake
📁 Full code:
_code/terminology_mapping.py
| Tecton | Snowflake | Notes |
|---|---|---|
| Feature View | Feature View | Registered with register_feature_view; use version="V01" (or V02, …) |
| Entity | Entity | Same concept; join keys align with spine columns |
| Batch Feature View | Feature View (Dynamic Table) | Set refresh_freq to a period (e.g. "1 hour") or CRON string |
| On-Demand Feature View | Feature View (View) | Omit refresh_freq or set refresh_freq=None; query-time computation |
| Feature Service | generate_training_set() / generate_dataset() |
No separate service object: pass spine + list of Feature Views (see Chapter 10) |
| Online Store | Online Feature Table (Hybrid Table) | Enable with OnlineConfig; freshness via target_lag |
| Offline Store | Feature Store schema | e.g. FEATURE_STORE_DEMO.FEATURE_STORE; materialized batch features live in Dynamic Tables there |
| Data Source | Source table or view | e.g. FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS |
| Transformation | Feature View query (feature_df) |
Snowpark DataFrame / SQL vs Tecton Python decorators |
| Backfill | fs.refresh_feature_view(fv) |
Triggers refresh outside the scheduled refresh_freq |
| TTL | refresh_freq / target_lag |
Offline lag ↔︎ Dynamic Table refresh_freq; online freshness ↔︎ OnlineConfig.target_lag (not a single global TTL) |
get_historical_features |
generate_training_set() / generate_dataset() |
Snowpark DataFrame vs versioned Dataset (Parquet); same PIT semantics |
get_online_features |
get_online_features |
Retrieve from Online Feature Table |
| Spine | Spine DataFrame | Entity keys + PIT timestamp column passed as spine_timestamp_col |
13.4.2 Feast → Snowflake (abbreviated)
| Feast | Snowflake |
|---|---|
| Feature Store (registry) | Feature Store schema + platform metadata |
| Feature View | Feature View |
| Online Store | Online Feature Table (Hybrid Table) |
| Offline Store | Feature Store schema |
| Feature Service | generate_training_set() / generate_dataset() |
get_historical_features |
generate_training_set() / generate_dataset() |
13.5 Migration Approach
13.5.1 Three-Phase Migration
13.5.2 Phase 1: Assessment
- Inventory existing features
- List all Feature Views/Groups
- Document entities and join keys
- Identify refresh frequencies (map to
refresh_freq/target_lag)
- Map to Snowflake concepts
- Use terminology mapping table
- Identify gaps or differences
- Plan migration order
- Start with simple features
- Group by model dependency
13.5.3 Phase 2: Migration
📁 Full code:
_code/migration_utils.py
# Example: Migrating a Tecton batch Feature View to Snowflake (clickstream ORDERS)
from snowflake.ml.feature_store import FeatureStore, FeatureView, Entity
# Tecton (original)
"""
@batch_feature_view(
entities=[user_entity],
feature_table=user_features,
schedule=timedelta(hours=1),
)
def user_purchase_fv(user_txns):
return user_txns.groupby('USER_ID').agg(
total_spend=('AMOUNT', 'sum'),
order_count=('ORDER_ID', 'count'),
)
"""
# Snowflake (migrated) — FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS uses TOTAL_AMT, ORDER_ID, ORDER_TS
user_purchase_df = session.sql("""
SELECT
USER_ID,
SUM(TOTAL_AMT) AS ORDER_TOTAL_AMT_SUM,
COUNT(ORDER_ID) AS ORDER_CNT,
MAX(ORDER_TS) AS LAST_ORDER_TS
FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
GROUP BY USER_ID
""")
user_purchase_fv = FeatureView(
name="USER_PURCHASE_FV",
entities=[user_entity],
feature_df=user_purchase_df,
timestamp_col="LAST_ORDER_TS",
refresh_freq="1 hour",
desc="USER purchase aggregates from CLICKSTREAM_DATA.ORDERS",
)
fs.register_feature_view(feature_view=user_purchase_fv, version="V01")13.5.4 Phase 3: Validation
from snowflake.snowpark import functions as F
def validate_migration(
source_features,
snowflake_features,
entity_cols: list,
) -> dict:
"""Compare features between source platform and Snowflake (Snowpark DataFrames)."""
comparison = source_features.join(
snowflake_features,
on=entity_cols,
how="full_outer",
)
mismatches = comparison.filter(
F.col("source_ORDER_TOTAL_AMT_SUM") != F.col("sf_ORDER_TOTAL_AMT_SUM")
).count()
total = comparison.count()
return {
"total_rows": total,
"mismatches": mismatches,
"match_rate": (1 - mismatches / total) * 100 if total else 0.0,
}13.6 Platform-Specific Guides
13.6.1 Tecton Migration
Tecton migration is prioritized given Databricks Online Tables EOL.
# Key differences to handle:
# 1. Feature definitions use SQL/Snowpark (feature_df) vs Tecton Python transformations
# 2. Online serving uses Hybrid Tables + OnlineConfig.target_lag vs Tecton's online store
# 3. Offline scheduling uses Feature View refresh_freq vs Tecton's schedule parameter
# 4. Backfill / ad-hoc refresh: fs.refresh_feature_view(fv)
# 5. Training data: fs.generate_training_set() (Snowpark DF) or fs.generate_dataset() (ML Dataset)13.6.2 Feast Migration
13.6.3 Migrating from Pandas-based Feature Engineering
Many ML teams start with local pandas for feature engineering and need a path to productionize those pipelines in the Feature Store. The difficulty varies widely depending on what pandas operations are used.
13.6.3.1 Complexity ladder
Tier 1 – Trivial (direct SQL rewrite). Simple groupby/agg, filters, joins. These map 1:1 to SQL or Snowpark DataFrame operations.
Tier 2 – Moderate (Snowpark DataFrame rewrite). Multi-step transformations, conditional columns, window functions. Requires translating pandas idioms to Snowpark equivalents.
from snowflake.snowpark import Window
import snowflake.snowpark.functions as F
orders = session.table("FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS")
recent_orders_df = (
orders
.with_column("IS_HIGH_VALUE", F.when(F.col("TOTAL_AMT") > 100, 1).otherwise(0))
.with_column(
"ORDER_RANK",
F.row_number().over(Window.partition_by("USER_ID").order_by(F.col("ORDER_TS").desc()))
)
.filter(F.col("ORDER_RANK") <= 5)
)Tier 3 – Complex (LLM-assisted conversion + manual tuning). Pandas operations relying on row order (.shift(), .rolling() with custom functions, .apply() with arbitrary Python). These require expressing the logic with SQL window functions (LAG, LEAD, cumulative sums) or vectorized UDFs.
# pandas (before) — row-ordered operations
df = df.sort_values(["USER_ID", "ORDER_TS"])
df["PREV_ORDER_AMT"] = df.groupby("USER_ID")["TOTAL_AMT"].shift(1)
df["AMT_CHANGE"] = df["TOTAL_AMT"] - df["PREV_ORDER_AMT"]# SQL (after) — window functions replace row-ordered pandas
order_change_df = session.sql("""
SELECT
USER_ID,
ORDER_TS,
TOTAL_AMT,
LAG(TOTAL_AMT) OVER (
PARTITION BY USER_ID ORDER BY ORDER_TS
) AS PREV_ORDER_AMT,
TOTAL_AMT - LAG(TOTAL_AMT) OVER (
PARTITION BY USER_ID ORDER BY ORDER_TS
) AS AMT_CHANGE
FROM FEATURE_STORE_DEMO.CLICKSTREAM_DATA.ORDERS
""")Use coding LLM tools (Cursor, Copilot, ChatGPT, etc.) to generate a first-pass SQL or Snowpark DataFrame translation of complex pandas code. The LLM output typically handles 80-90% of the logic correctly. Review and manually tune the remaining edge cases – especially around window frame boundaries, NULL handling, and DT-compatibility (avoid non-deterministic functions like CURRENT_TIMESTAMP()).
13.6.3.2 Snowpark pandas: not recommended for production Feature Views
Snowpark pandas (import snowflake.snowpark.modin.pandas as pd) runs pandas-syntax code on Snowflake warehouse compute. However, it is not a supported transformation language for Feature Store feature_df in production:
- With default
enforce_ordering=True, it generates multi-statement SQL with temporary tables that cannot back a Dynamic Table - With
enforce_ordering=False, some pipelines collapse to a single SQL statement, but known correctness issues make this experimental and not recommended for production - Feature Store assumes relational, unordered sources keyed by entity + timestamp – pandas row-ordered semantics are fundamentally at odds with this model
Recommendation: Use Snowpark pandas for exploration and prototyping during migration, then rewrite the final pipeline in SQL or Snowpark DataFrames for the registered feature_df.
13.6.3.3 Anti-pattern: passing a local pandas DataFrame as feature_df
Passing a local pd.DataFrame as feature_df will fail – the Feature Store API expects a Snowpark DataFrame, not a pandas object. If you have data in a local pandas DataFrame, upload it to a Snowflake table first and then reference it:
# Upload local pandas DataFrame to Snowflake
session.write_pandas(local_df, "MY_STAGING_TABLE", auto_create_table=True)
# Now use it as feature_df
feature_df = session.table("MY_STAGING_TABLE")13.7 Best Practices
1. Migrate Incrementally
Week 1: Entities + 2-3 simple Feature Views
Week 2: Remaining batch features
Week 3: Online serving setup
Week 4: Consumer cutover
2. Run in Parallel
Operate both systems during migration:
# Historical / training: compare Snowpark path (generate_training_set) and Dataset path (generate_dataset) as needed
source_features = tecton_client.get_historical_features(...)
sf_training_df = snowflake_fs.generate_training_set(
spine_df=training_spine,
features=[user_purchase_fv],
spine_timestamp_col="SESSION_START_TS",
)
# Or: sf_dataset = snowflake_fs.generate_dataset(..., version="V01") for Parquet-backed ML Dataset workflows
validate_migration(source_features, sf_training_df, entity_cols=["USER_ID"])3. Validate PIT Correctness
# Ensure point-in-time retrieval produces same results
def test_pit_correctness(spine, source_fs, snowflake_fs):
source_result = source_fs.get_historical_features(spine)
sf_df = snowflake_fs.generate_training_set(
spine_df=spine,
features=[user_purchase_fv],
spine_timestamp_col="SESSION_START_TS",
)
assert_dataframes_equal(source_result, sf_df)13.8 Common Pitfalls
13.8.1 ❌ Pitfall 1: Big Bang Migration
Problem: Migrating everything at once causes extended outages.
Solution: Phased migration with parallel operation.
13.8.2 ❌ Pitfall 2: Skipping Validation
Problem: Subtle differences in feature values undetected.
Solution: Comprehensive output comparison.
13.8.3 ❌ Pitfall 3: Ignoring Timestamp Semantics
Problem: Different interpretation of event time vs. processing time.
Solution: Document and align timestamp semantics.
13.8.4 ❌ Pitfall 4: Confusing TTL with a Single Knob
Problem: Tecton TTL maps to different Snowflake controls for offline vs online paths.
Solution: Set refresh_freq for Dynamic Table materialization and OnlineConfig.target_lag for online sync independently; align both with product freshness SLAs.
13.9 Summary
| Phase | Activities |
|---|---|
| Assessment | Inventory, terminology mapping, planning |
| Migration | Recreate entities, port definitions (feature_df), register with V01, setup pipelines |
| Validation | Compare outputs (generate_training_set / generate_dataset), validate PIT, cutover |
13.10 Additional Resources
- Snowflake Feature Store Documentation
- Tecton Documentation (for reference)
- Feast Documentation (for reference)