Feast Feature Store Walkthrough

This walkthrough covers the full feature lifecycle on DRLS — from defining features backed by Iceberg tables to serving them in real-time via the Feast HTTP API. Two example scripts demonstrate complementary approaches:

  1. feast_features.py — Python SDK: define, apply, materialize, and retrieve features
  2. feast_serving.py — HTTP API: access materialized features from any language

Prerequisites

  • DRLS platform running with Feast stack deployed (kubectl apply -k ui/k8s/feast/)
  • Demo catalog seeded (python scripts/seed_demo_catalog.py)
  • dbt marts built (analytics.dim_customer_metrics exists — see dbt walkthrough)

Architecture Overview

graph LR
    ICE["Iceberg<br/>dim_customer_metrics"] -->|PyIceberg + DuckDB| OFF["Feast Offline Store<br/>(DrlsOfflineStore)"]
    OFF -->|materialize| DF["Dragonfly<br/>(Online Store)"]
    DF -->|get-online-features| FS["feast-serve<br/>:6566"]
    FS -->|REST API| APP["Your App<br/>(any language)"]
    OFF -->|get_historical_features| TRAIN["Training Data<br/>(point-in-time join)"]

Key components:

  • DrlsOfflineStore — Custom offline store using PyIceberg + DuckDB (no Spark/JVM)
  • Dragonfly — Redis-compatible online store for sub-millisecond feature serving
  • feast-serve — FastAPI feature server exposing REST endpoints
  • Polaris — REST catalog that all components use to resolve Iceberg tables

Part 1: Feature Definition and Materialization

Script: examples/feast_features.py

Step 1: Define the Data Source

The IcebergDataSource points to a dbt-produced mart table in the Polaris catalog:

customer_metrics_source = IcebergDataSource(
    name="customer_metrics",
    table_fqn="analytics.dim_customer_metrics",
    catalog_uri=POLARIS_URL,
    warehouse=POLARIS_WAREHOUSE,
    timestamp_field="updated_at",
    description="RFM customer metrics from dbt mart",
)

Note the 2-part FQN (analytics.dim_customer_metrics) — Polaris resolves this within the drls warehouse. The timestamp_field tells Feast which column tracks when rows were updated.

Step 2: Define Entity and Feature View

The entity defines the join key, and the feature view declares the schema:

customer = Entity(
    name="customer",
    join_keys=["customer_id"],
    description="Customer entity keyed by customer_id",
)

customer_features = FeatureView(
    name="customer_rfm_features",
    entities=[customer],
    schema=[
        Field(name="customer_id", dtype=String),
        Field(name="order_count", dtype=Int64),
        Field(name="total_spend", dtype=Float64),
        Field(name="avg_order_value", dtype=Float64),
        Field(name="days_since_last_order", dtype=Int64),
        Field(name="customer_tier", dtype=String),
    ],
    source=customer_metrics_source,
    online=True,
    ttl=timedelta(hours=24),
    description="Customer RFM segmentation features for real-time inference",
)

Key settings:

  • online=True — enables materialization to Dragonfly
  • ttl=24h — features older than 24 hours are considered stale
  • schema — typed fields that Feast validates during materialization
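To make the TTL rule concrete, here is a minimal stdlib-only sketch (not part of the example scripts) of what "stale after 24 hours" means for a stored feature value; the function name and exact check are illustrative, not Feast's internal API:

```python
from datetime import datetime, timedelta, timezone

TTL = timedelta(hours=24)

def is_fresh(updated_at: datetime, now: datetime) -> bool:
    # A feature value is considered valid only if it was written
    # within the TTL window preceding the lookup time.
    return now - updated_at <= TTL

now = datetime(2026, 3, 10, 12, 0, tzinfo=timezone.utc)
print(is_fresh(datetime(2026, 3, 10, 0, 0, tzinfo=timezone.utc), now))  # 12h old
print(is_fresh(datetime(2026, 3, 8, 12, 0, tzinfo=timezone.utc), now))  # 48h old
```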

Step 3: Apply to Registry

store = FeatureStore(repo_path=FEAST_REPO_PATH)
store.apply([customer, customer_features])

This registers the entity and feature view in Feast's SQL registry, making them visible in the Feast UI at /feast/ui/.

Step 4: Materialize to Online Store

print("\n=== Materializing features to online store ===")
end_date = datetime.now(UTC)
start_date = end_date - timedelta(days=30)

store.materialize(
    start_date=start_date,
    end_date=end_date,
    feature_views=["customer_rfm_features"],
)
print("Materialization complete.")

Materialization reads from Iceberg (via PyIceberg + DuckDB) and writes to Dragonfly. The four-tier autoscaler selects the execution strategy based on table size from Iceberg snapshot metadata:

Tier              Data Size   Strategy
in_memory         < 512 MB    Single-process DuckDB
spill             < 2 GB      DuckDB with disk spill
local_parallel    < 10 GB     Multi-threaded DuckDB
ray_distributed   >= 10 GB    Distributed Ray tasks
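The tier decision reduces to a threshold ladder over the table size reported by the Iceberg snapshot. A minimal sketch of that logic (the function name and cutoffs mirror the table above; this is not the autoscaler's actual API):

```python
MB = 1024 ** 2
GB = 1024 ** 3

def select_tier(table_size_bytes: int) -> str:
    # Walk the thresholds from smallest to largest; the first
    # matching tier wins. Cutoffs come from the tier table.
    if table_size_bytes < 512 * MB:
        return "in_memory"
    if table_size_bytes < 2 * GB:
        return "spill"
    if table_size_bytes < 10 * GB:
        return "local_parallel"
    return "ray_distributed"

print(select_tier(100 * MB))  # in_memory
print(select_tier(5 * GB))    # local_parallel
```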

Step 5: Retrieve Online Features

print("\n=== Retrieving online features ===")
entity_rows = [
    {"customer_id": "C001"},
    {"customer_id": "C012"},
    {"customer_id": "C025"},
]

online_response = store.get_online_features(
    features=[
        "customer_rfm_features:order_count",
        "customer_rfm_features:total_spend",
        "customer_rfm_features:avg_order_value",
        "customer_rfm_features:customer_tier",
    ],
    entity_rows=entity_rows,
)

print("\nOnline features (from Dragonfly):")
print("-" * 70)
df = online_response.to_df()
print(df.to_string(index=False))

Online retrieval hits Dragonfly directly — sub-millisecond latency for real-time inference.

Step 6: Retrieve Historical Features

print("\n=== Retrieving historical features ===")
import pandas as pd

entity_df = pd.DataFrame(
    {
        "customer_id": ["C001", "C012", "C025", "C001"],
        "event_timestamp": [
            datetime(2026, 3, 1, tzinfo=UTC),
            datetime(2026, 3, 5, tzinfo=UTC),
            datetime(2026, 3, 10, tzinfo=UTC),
            datetime(2026, 2, 15, tzinfo=UTC),  # C001 at earlier point
        ],
    }
)

historical_features = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "customer_rfm_features:order_count",
        "customer_rfm_features:total_spend",
        "customer_rfm_features:customer_tier",
    ],
)

print("\nHistorical features (point-in-time join via DuckDB):")
print("-" * 70)
hist_df = historical_features.to_df()
print(hist_df.to_string(index=False))

Historical retrieval performs a point-in-time join via DuckDB — feature values are returned as they existed at each event_timestamp. This is essential for creating training datasets without data leakage.
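The same semantics can be illustrated with a toy pandas merge_asof: each entity row picks up the latest feature value at or before its event_timestamp. This is illustrative only (the real join runs inside DuckDB), but the column names match the walkthrough:

```python
import pandas as pd

# Toy feature history: C001's order_count changed between two updates.
features = pd.DataFrame({
    "customer_id": ["C001", "C001"],
    "updated_at": pd.to_datetime(["2026-02-10", "2026-03-01"]),
    "order_count": [3, 5],
})
entities = pd.DataFrame({
    "customer_id": ["C001", "C001"],
    "event_timestamp": pd.to_datetime(["2026-02-15", "2026-03-05"]),
})

# direction="backward" (the default) takes the most recent feature row
# at or before each event_timestamp, so no future value leaks into a
# training row.
joined = pd.merge_asof(
    entities.sort_values("event_timestamp"),
    features.sort_values("updated_at"),
    left_on="event_timestamp",
    right_on="updated_at",
    by="customer_id",
)
print(joined[["customer_id", "event_timestamp", "order_count"]])
```

The 2026-02-15 row sees order_count=3 and the 2026-03-05 row sees order_count=5, even though both describe the same customer.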

Run It

# Port-forward to Feast (if not using DRLS proxy)
kubectl port-forward svc/feast-serve 6566:6566 -n drls-feast &

# Run the full lifecycle
python examples/feast_features.py

Part 2: HTTP API Feature Serving

Script: examples/feast_serving.py

This example accesses materialized features via the feast-serve REST API using only Python's stdlib — no Feast SDK required. This is how production services (Go, Java, Node, etc.) consume features.

Single Customer Lookup

def get_online_features(base_url: str, feature_refs: list[str], entity_rows: list[dict]):
    """Call the Feast feature server REST API."""
    url = f"{base_url}/get-online-features"
    payload = {
        "features": feature_refs,
        "entities": {
            key: [row[key] for row in entity_rows] for key in entity_rows[0]
        },
    }

    req = Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

    try:
        with urlopen(req, timeout=10) as resp:
            return json.loads(resp.read())
    except URLError as e:
        print(f"Error connecting to feast-serve at {url}: {e}")
        sys.exit(1)

The API accepts a POST to /get-online-features with feature references and entity values.

Batch Lookup

response = get_online_features(
    base_url,
    feature_refs=[
        "customer_rfm_features:order_count",
        "customer_rfm_features:total_spend",
        "customer_rfm_features:customer_tier",
    ],
    entity_rows=[
        {"customer_id": "C001"},
        {"customer_id": "C012"},
        {"customer_id": "C025"},
    ],
)

print_feature_table(response)

print_feature_table(response)

Multiple entities in a single request — feast-serve returns a columnar response.
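A small helper can pivot that columnar shape back into one dict per entity. The sample payload below mimics the feature server's response layout (feature names under metadata, one values column per feature); treat the exact shape as an assumption to verify against your deployment:

```python
def rows_from_response(response: dict) -> list[dict]:
    # Zip each per-feature "values" column back into per-entity rows.
    names = response["metadata"]["feature_names"]
    columns = [result["values"] for result in response["results"]]
    return [dict(zip(names, row)) for row in zip(*columns)]

# Fabricated sample in the columnar shape described above.
sample = {
    "metadata": {"feature_names": ["customer_id", "order_count"]},
    "results": [
        {"values": ["C001", "C012"]},
        {"values": [5, 2]},
    ],
}
print(rows_from_response(sample))
```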

Equivalent curl Command

curl -X POST http://localhost:6566/get-online-features \
  -H "Content-Type: application/json" \
  -d '{
    "features": [
      "customer_rfm_features:order_count",
      "customer_rfm_features:total_spend",
      "customer_rfm_features:customer_tier"
    ],
    "entities": {
      "customer_id": ["C001", "C012"]
    }
  }'

Via DRLS Proxy

When accessing through the DRLS UI server, use the /feast/api/ prefix:

# Direct to feast-serve
curl http://localhost:6566/get-online-features ...

# Via DRLS proxy (strips /feast/api prefix)
curl https://demo.drls.io/feast/api/get-online-features ...

Run It

# Option A: Direct to feast-serve
kubectl port-forward svc/feast-serve 6566:6566 -n drls-feast &
python examples/feast_serving.py

# Option B: Via DRLS proxy
python examples/feast_serving.py --url https://demo.drls.io/feast/api

Feast UI

Browse registered feature views, entities, and data sources in the Feast UI:

  • Direct: http://localhost:8888/feast/ui/
  • Via DRLS proxy: https://demo.drls.io/feast/ui/

The UI is embedded as an iframe in the DRLS management console under the Feast tab.

Pipeline Integration

Features can be materialized as part of a pipeline using the Feast Online Sink node:

graph LR
    S["Spark Streaming<br/>Validate & Enrich"] --> V["Validation<br/>Data Quality Gate"]
    V --> FS["Feast Online Sink<br/>Push to Dragonfly"]
    V --> ICE["Iceberg Sink<br/>analytics.dim_customer_metrics"]

The FeastOnlineSink uses PushMode.ONLINE only — Iceberg remains the source of truth. Validation runs Arrow-native checks before writing to either store.

Next Steps

  • dbt Pipeline — build the mart table that feeds these features
  • Streaming — real-time pipelines with Spark Structured Streaming
  • Management Console — Feast UI, health panel, data quality tab
  • Agentic — query features via natural language with TRex