Feast Feature Store Walkthrough¶
This walkthrough covers the full feature lifecycle on DRLS — from defining features backed by Iceberg tables to serving them in real-time via the Feast HTTP API. Two example scripts demonstrate complementary approaches:
- `feast_features.py` — Python SDK: define, apply, materialize, and retrieve features
- `feast_serving.py` — HTTP API: access materialized features from any language
Prerequisites¶
- DRLS platform running with Feast stack deployed (`kubectl apply -k ui/k8s/feast/`)
- Demo catalog seeded (`python scripts/seed_demo_catalog.py`)
- dbt marts built (`analytics.dim_customer_metrics` exists — see dbt walkthrough)
Architecture Overview¶
graph LR
ICE["Iceberg<br/>dim_customer_metrics"] -->|PyIceberg + DuckDB| OFF["Feast Offline Store<br/>(DrlsOfflineStore)"]
OFF -->|materialize| DF["Dragonfly<br/>(Online Store)"]
DF -->|get-online-features| FS["feast-serve<br/>:6566"]
FS -->|REST API| APP["Your App<br/>(any language)"]
OFF -->|get_historical_features| TRAIN["Training Data<br/>(point-in-time join)"]
Key components:
- DrlsOfflineStore — Custom offline store using PyIceberg + DuckDB (no Spark/JVM)
- Dragonfly — Redis-compatible online store for sub-millisecond feature serving
- feast-serve — FastAPI feature server exposing REST endpoints
- Polaris — REST catalog that all components use to resolve Iceberg tables
Part 1: Feature Definition and Materialization¶
Script: examples/feast_features.py
Step 1: Define the Data Source¶
The IcebergDataSource points to a dbt-produced mart table in the Polaris catalog:
customer_metrics_source = IcebergDataSource(
    name="customer_metrics",
    table_fqn="analytics.dim_customer_metrics",
    catalog_uri=POLARIS_URL,
    warehouse=POLARIS_WAREHOUSE,
    timestamp_field="updated_at",
    description="RFM customer metrics from dbt mart",
)
Note the two-part FQN (`analytics.dim_customer_metrics`) — Polaris resolves this within the `drls` warehouse. The `timestamp_field` tells Feast which column tracks when rows were updated.
Step 2: Define Entity and Feature View¶
The entity defines the join key, and the feature view declares the schema:
customer = Entity(
    name="customer",
    join_keys=["customer_id"],
    description="Customer entity keyed by customer_id",
)

customer_features = FeatureView(
    name="customer_rfm_features",
    entities=[customer],
    schema=[
        Field(name="customer_id", dtype=String),
        Field(name="order_count", dtype=Int64),
        Field(name="total_spend", dtype=Float64),
        Field(name="avg_order_value", dtype=Float64),
        Field(name="days_since_last_order", dtype=Int64),
        Field(name="customer_tier", dtype=String),
    ],
    source=customer_metrics_source,
    online=True,
    ttl=timedelta(hours=24),
    description="Customer RFM segmentation features for real-time inference",
)
Key settings:
- `online=True` — enables materialization to Dragonfly
- `ttl=24h` — features older than 24 hours are considered stale
- `schema` — typed fields that Feast validates during materialization
Step 3: Apply to Registry¶
This registers the entity and feature view in Feast's SQL registry, making them visible in the Feast UI at /feast/ui/.
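The apply step is a single SDK call; a minimal sketch, reusing the `customer` and `customer_features` objects defined above (loading the `FeatureStore` from a local `feature_store.yaml` is an assumption about the script's setup):

```python
from feast import FeatureStore

# Load repo configuration (feature_store.yaml) from the current directory.
store = FeatureStore(repo_path=".")

# Write the entity and feature view into Feast's registry.
store.apply([customer, customer_features])
```

Re-running `apply` is idempotent for unchanged definitions, so it is safe to keep in a setup script.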
Step 4: Materialize to Online Store¶
print("\n=== Materializing features to online store ===")
end_date = datetime.now(UTC)
start_date = end_date - timedelta(days=30)
store.materialize(
    start_date=start_date,
    end_date=end_date,
    feature_views=["customer_rfm_features"],
)
print("Materialization complete.")
Materialization reads from Iceberg (via PyIceberg + DuckDB) and writes to Dragonfly. The four-tier autoscaler selects the execution strategy based on table size from Iceberg snapshot metadata:
| Tier | Data Size | Strategy |
|---|---|---|
| `in_memory` | < 512 MB | Single-process DuckDB |
| `spill` | < 2 GB | DuckDB with disk spill |
| `local_parallel` | < 10 GB | Multi-threaded DuckDB |
| `ray_distributed` | >= 10 GB | Distributed Ray tasks |
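The tier cutoffs amount to a size-based lookup; a sketch of the selection logic (the real autoscaler reads the size from Iceberg snapshot metadata — this function is illustrative, not the DRLS source):

```python
MB = 1024 * 1024
GB = 1024 * MB

def select_tier(table_size_bytes: int) -> str:
    """Map an Iceberg table's data size to an execution strategy tier."""
    if table_size_bytes < 512 * MB:
        return "in_memory"       # single-process DuckDB
    if table_size_bytes < 2 * GB:
        return "spill"           # DuckDB with disk spill
    if table_size_bytes < 10 * GB:
        return "local_parallel"  # multi-threaded DuckDB
    return "ray_distributed"     # distributed Ray tasks

print(select_tier(100 * MB))  # in_memory
print(select_tier(5 * GB))    # local_parallel
```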
Step 5: Retrieve Online Features¶
print("\n=== Retrieving online features ===")
entity_rows = [
    {"customer_id": "C001"},
    {"customer_id": "C012"},
    {"customer_id": "C025"},
]
online_response = store.get_online_features(
    features=[
        "customer_rfm_features:order_count",
        "customer_rfm_features:total_spend",
        "customer_rfm_features:avg_order_value",
        "customer_rfm_features:customer_tier",
    ],
    entity_rows=entity_rows,
)
print("\nOnline features (from Dragonfly):")
print("-" * 70)
df = online_response.to_df()
print(df.to_string(index=False))
Online retrieval hits Dragonfly directly — sub-millisecond latency for real-time inference.
Step 6: Retrieve Historical Features¶
print("\n=== Retrieving historical features ===")
import pandas as pd
entity_df = pd.DataFrame(
    {
        "customer_id": ["C001", "C012", "C025", "C001"],
        "event_timestamp": [
            datetime(2026, 3, 1, tzinfo=UTC),
            datetime(2026, 3, 5, tzinfo=UTC),
            datetime(2026, 3, 10, tzinfo=UTC),
            datetime(2026, 2, 15, tzinfo=UTC),  # C001 at earlier point
        ],
    }
)
historical_features = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "customer_rfm_features:order_count",
        "customer_rfm_features:total_spend",
        "customer_rfm_features:customer_tier",
    ],
)
print("\nHistorical features (point-in-time join via DuckDB):")
print("-" * 70)
hist_df = historical_features.to_df()
print(hist_df.to_string(index=False))
Historical retrieval performs a point-in-time join via DuckDB — feature values are returned as they existed at each event_timestamp. This is essential for creating training datasets without data leakage.
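The point-in-time rule can be shown without DuckDB: for each entity row, the join picks the latest feature row whose `updated_at` is at or before that row's `event_timestamp`. A pure-Python sketch with toy data (not the actual mart):

```python
from datetime import datetime, timezone

# Toy feature history for one customer: (updated_at, order_count).
history = [
    (datetime(2026, 2, 1, tzinfo=timezone.utc), 3),
    (datetime(2026, 2, 20, tzinfo=timezone.utc), 5),
    (datetime(2026, 3, 4, tzinfo=timezone.utc), 8),
]

def point_in_time_value(history, event_timestamp):
    """Return the latest feature value as of event_timestamp (None if none yet)."""
    eligible = [(ts, v) for ts, v in history if ts <= event_timestamp]
    return max(eligible)[1] if eligible else None

# As of Feb 15 only the Feb 1 snapshot existed; by Mar 10 all three did.
print(point_in_time_value(history, datetime(2026, 2, 15, tzinfo=timezone.utc)))  # 3
print(point_in_time_value(history, datetime(2026, 3, 10, tzinfo=timezone.utc)))  # 8
```

Using the Mar 4 value for a Feb 15 training label would be exactly the leakage the point-in-time join prevents.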
Run It¶
# Port-forward to Feast (if not using DRLS proxy)
kubectl port-forward svc/feast-serve 6566:6566 -n drls-feast &
# Run the full lifecycle
python examples/feast_features.py
Part 2: HTTP API Feature Serving¶
Script: examples/feast_serving.py
This example accesses materialized features via the feast-serve REST API using only Python's stdlib — no Feast SDK required. This is how production services (Go, Java, Node, etc.) consume features.
Single Customer Lookup¶
def get_online_features(base_url: str, feature_refs: list[str], entity_rows: list[dict]):
    """Call the Feast feature server REST API."""
    url = f"{base_url}/get-online-features"
    payload = {
        "features": feature_refs,
        "entities": {
            key: [row[key] for row in entity_rows] for key in entity_rows[0]
        },
    }
    req = Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urlopen(req, timeout=10) as resp:
            return json.loads(resp.read())
    except URLError as e:
        print(f"Error connecting to feast-serve at {url}: {e}")
        sys.exit(1)
The API accepts a POST to /get-online-features with feature references and entity values.
Batch Lookup¶
response = get_online_features(
    base_url,
    feature_refs=[
        "customer_rfm_features:order_count",
        "customer_rfm_features:total_spend",
        "customer_rfm_features:customer_tier",
    ],
    entity_rows=[
        {"customer_id": "C001"},
        {"customer_id": "C012"},
        {"customer_id": "C025"},
    ],
)
print_feature_table(response)
print_feature_table(response)
Multiple entities in a single request — feast-serve returns a columnar response.
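Columnar means feature values come back as parallel lists keyed by `metadata.feature_names`, with one `results` entry per feature; a sketch of pivoting that into per-entity rows (the field names follow the Feast feature-server response shape, and the payload here is a fabricated example, not real output):

```python
def to_rows(response: dict) -> list[dict]:
    """Pivot a feast-serve columnar response into one dict per entity row."""
    names = response["metadata"]["feature_names"]
    columns = [r["values"] for r in response["results"]]
    return [dict(zip(names, row)) for row in zip(*columns)]

# Fabricated example of the columnar shape:
example = {
    "metadata": {"feature_names": ["customer_id", "order_count"]},
    "results": [
        {"values": ["C001", "C012"]},
        {"values": [5, 2]},
    ],
}
print(to_rows(example))
# [{'customer_id': 'C001', 'order_count': 5}, {'customer_id': 'C012', 'order_count': 2}]
```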
Equivalent curl Command¶
curl -X POST http://localhost:6566/get-online-features \
  -H "Content-Type: application/json" \
  -d '{
    "features": [
      "customer_rfm_features:order_count",
      "customer_rfm_features:total_spend",
      "customer_rfm_features:customer_tier"
    ],
    "entities": {
      "customer_id": ["C001", "C012"]
    }
  }'
Via DRLS Proxy¶
When accessing through the DRLS UI server, use the /feast/api/ prefix:
# Direct to feast-serve
curl http://localhost:6566/get-online-features ...
# Via DRLS proxy (strips /feast/api prefix)
curl https://demo.drls.io/feast/api/get-online-features ...
Run It¶
# Option A: Direct to feast-serve
kubectl port-forward svc/feast-serve 6566:6566 -n drls-feast &
python examples/feast_serving.py
# Option B: Via DRLS proxy
python examples/feast_serving.py --url https://demo.drls.io/feast/api
Feast UI¶
Browse registered feature views, entities, and data sources in the Feast UI:
- Direct: `http://localhost:8888/feast/ui/`
- Via DRLS proxy: `https://demo.drls.io/feast/ui/`
The UI is embedded as an iframe in the DRLS management console under the Feast tab.
Pipeline Integration¶
Features can be materialized as part of a pipeline using the Feast Online Sink node:
graph LR
S["Spark Streaming<br/>Validate & Enrich"] --> V["Validation<br/>Data Quality Gate"]
V --> FS["Feast Online Sink<br/>Push to Dragonfly"]
V --> ICE["Iceberg Sink<br/>analytics.dim_customer_metrics"]
The FeastOnlineSink uses PushMode.ONLINE only — Iceberg remains the source of truth. Validation runs Arrow-native checks before writing to either store.
Next Steps¶
- dbt Pipeline — build the mart table that feeds these features
- Streaming — real-time pipelines with Spark Structured Streaming
- Management Console — Feast UI, health panel, data quality tab
- Agentic — query features via natural language with TRex