# dbt E-Commerce Pipeline Walkthrough
This walkthrough builds a medallion architecture pipeline using dbt to transform demo Iceberg tables into analytics-ready marts. You'll create staging views, mart tables with RFM segmentation, and run data quality tests — all from the Pipeline Builder UI.
## Prerequisites

- DRLS platform running (see Quickstart)
- Demo catalog seeded (`python scripts/seed_demo_catalog.py`)
- dbt profiles Secret created:

```bash
kubectl create secret generic dbt-profiles \
  --from-file=profiles.yml=dbt/profiles.yml \
  -n default-drls-clusters
```
## Step 1: Explore the Demo Data

Open the Catalog page and browse the `demo` namespace. You'll see the two tables we'll transform:
| Table | Rows | Description |
|---|---|---|
| `demo.orders` | ~150 | Order transactions with amounts, status, shipping |
| `demo.customers` | 30 | Customer profiles with lifetime value |
Click into `demo.orders` to see the schema — notice the raw data has mixed-case status values, nullable amounts, and cancelled orders we'll want to filter out.
## Step 2: Understand the dbt Models

The built-in dbt project at `dbt/models/ecommerce/` contains four models:

### Staging Layer (Views)

`stg_orders` — cleans and normalizes orders:

```sql
{{ config(materialized='view', tags=['ecommerce']) }}

select
    order_id,
    customer_id,
    cast(order_date as date) as order_date,
    lower(status) as status,
    total_amount,
    items_count,
    lower(shipping_country) as shipping_country,
    lower(payment_method) as payment_method,
    created_at
from {{ source('demo', 'orders') }}
where lower(status) != 'cancelled'
  and total_amount > 0
```
`stg_customers` — normalizes customer profiles:

```sql
{{ config(materialized='view', tags=['ecommerce']) }}

select
    customer_id,
    name,
    lower(trim(email)) as email,
    lower(country) as country,
    cast(signup_date as date) as signup_date,
    is_active,
    lifetime_value,
    last_login
from {{ source('demo', 'customers') }}
```
### Mart Layer (Tables)

`fct_daily_revenue` — daily revenue aggregation (incremental):

- Total orders, unique customers, gross revenue, average order value (AOV)
- Payment method breakdown
- Countries served per day
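The core of that aggregation can be sketched in plain Python — a hedged equivalent of what the model computes per `order_date` (field names follow `stg_orders` above; the real model is incremental SQL, not this):

```python
from collections import defaultdict
from datetime import date

def daily_revenue(orders):
    """Roll staged orders up into one row per order_date.

    Illustrative only: mirrors the per-day metrics fct_daily_revenue
    produces (counts, gross revenue, AOV, countries served).
    """
    days = defaultdict(lambda: {"orders": 0, "customers": set(),
                                "revenue": 0.0, "countries": set()})
    for o in orders:
        d = days[o["order_date"]]
        d["orders"] += 1
        d["customers"].add(o["customer_id"])
        d["revenue"] += o["total_amount"]
        d["countries"].add(o["shipping_country"])
    return {
        day: {
            "total_orders": d["orders"],
            "unique_customers": len(d["customers"]),
            "gross_revenue": round(d["revenue"], 2),
            "avg_order_value": round(d["revenue"] / d["orders"], 2),
            "countries_served": len(d["countries"]),
        }
        for day, d in days.items()
    }

rows = daily_revenue([
    {"order_date": date(2024, 5, 1), "customer_id": "C001",
     "total_amount": 120.0, "shipping_country": "us"},
    {"order_date": date(2024, 5, 1), "customer_id": "C012",
     "total_amount": 80.0, "shipping_country": "de"},
])
# rows[date(2024, 5, 1)]["avg_order_value"] == 100.0
```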
`dim_customer_metrics` — customer dimension with RFM segmentation:

- Joins orders onto customers
- Calculates recency (days since last order), frequency (order count), and monetary (total spend)
- Assigns a tier: `vip`, `regular`, `new`, or `inactive`
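As a rough sketch, tier assignment from the three RFM inputs might look like the following. The threshold values here are illustrative assumptions — the real cutoffs live in the `dim_customer_metrics` SQL:

```python
def customer_tier(recency_days, frequency, monetary):
    """Assign an RFM tier from recency/frequency/monetary.

    Thresholds are assumed for illustration; consult the
    dim_customer_metrics model for the actual logic.
    """
    if frequency == 0:
        return "inactive"   # never ordered
    if frequency >= 5 and monetary >= 1000:
        return "vip"        # frequent, high-spend
    if frequency == 1 and recency_days <= 30:
        return "new"        # single recent order
    return "regular"

# Consistent with the Step 5 preview: 8 orders / 1240 spend -> vip
print(customer_tier(recency_days=10, frequency=8, monetary=1240))
```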
## Step 3: Build the Pipeline
Open the Pipeline Builder and create a new pipeline called "E-Commerce Analytics".
### Add Source Nodes

- Drag Iceberg Source from the palette → set table to `demo.orders`
- Drag another Iceberg Source → set table to `demo.customers`
### Add dbt Staging Step

- Drag a dbt node onto the canvas
- Configure:
    - Label: `Staging Models`
    - Command: `run`
    - Selector: `stg_orders stg_customers`
    - Target: `prod`
    - Profiles Secret: `dbt-profiles`
- Connect both source nodes → dbt staging
### Add dbt Mart Step

- Drag another dbt node
- Configure:
    - Label: `Mart Models`
    - Command: `build`
    - Selector: `fct_daily_revenue dim_customer_metrics`
    - Target: `prod`
    - Profiles Secret: `dbt-profiles`
- Connect dbt staging → dbt marts
### Add Sink Nodes

- Drag Iceberg Sink → set table to `analytics.fct_daily_revenue`
- Drag another Iceberg Sink → set table to `analytics.dim_customer_metrics`
- Connect dbt marts → both sinks

Your pipeline should look like:

```mermaid
graph LR
    O["demo.orders"] --> STG["dbt run<br/>Staging Models"]
    C["demo.customers"] --> STG
    STG --> MART["dbt build<br/>Mart Models"]
    MART --> R["fct_daily_revenue"]
    MART --> M["dim_customer_metrics"]
```
### Save and Promote
- Click Save — creates version 1 in development
- Click Promote to QA to validate
## Step 4: Deploy and Run
- From the pipeline list, click Deploy
- Select the QA version and target workspace
- Monitor execution in the Monitor tab
The pipeline creates two Kubernetes Jobs in sequence:

- Staging Job — `dbt run --select stg_orders stg_customers --target prod`
- Mart Job — `dbt build --select fct_daily_revenue dim_customer_metrics --target prod`

Each Job mounts a ConfigMap containing the dbt project files and a Secret with `profiles.yml`.
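To see how the node fields from Step 3 map onto those Job commands, here is a hedged sketch (the `dbt_command` helper is ours, not a platform API; the actual Job spec is generated by the platform):

```python
def dbt_command(command, selector, target, fail_fast=False):
    """Assemble the dbt CLI invocation a pipeline Job would run.

    Illustrative: Command/Selector/Target come straight from the
    dbt node configuration; --fail-fast matches the Fail Fast toggle.
    """
    args = ["dbt", command, "--select", *selector.split(), "--target", target]
    if fail_fast:
        args.append("--fail-fast")
    return args

staging = dbt_command("run", "stg_orders stg_customers", "prod")
mart = dbt_command("build", "fct_daily_revenue dim_customer_metrics", "prod")
print(" ".join(staging))
# dbt run --select stg_orders stg_customers --target prod
```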
## Step 5: Verify Results

### Check the Catalog

Navigate to the Catalog page and browse the `analytics` namespace. You should see two new tables:

- `analytics.fct_daily_revenue` — one row per order date
- `analytics.dim_customer_metrics` — one row per customer with tier assignment
### Query the Marts

Click into `analytics.dim_customer_metrics` and preview the data. You'll see customers segmented into tiers:
| customer_id | name | order_count | total_spend | customer_tier |
|---|---|---|---|---|
| C001 | Alice | 8 | 1,240 | vip |
| C012 | Bob | 3 | 350 | regular |
| C025 | Carol | 1 | 45 | new |
| C030 | Dave | 0 | 0 | inactive |
## Step 6: Add Data Quality Tests

Create a second pipeline for scheduled quality checks:

- Create a new pipeline: "E-Commerce Quality Tests"
- Add a single dbt node:
    - Command: `test`
    - Selector: `tag:ecommerce`
    - Fail Fast: checked
    - Profiles Secret: `dbt-profiles`
- Save and schedule daily
The tests validate:

- `order_id` is not null and unique
- `total_amount` is positive
- `customer_tier` is one of `[vip, regular, new, inactive]`
- `gross_revenue` is not null
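In practice these run inside dbt as `not_null`, `unique`, and `accepted_values` tests; the checks they enforce amount to the following (a plain-Python restatement, not the dbt implementation):

```python
VALID_TIERS = {"vip", "regular", "new", "inactive"}

def check_orders(rows):
    """order_id not null + unique; total_amount strictly positive."""
    ids = [r["order_id"] for r in rows]
    assert all(i is not None for i in ids), "order_id contains nulls"
    assert len(ids) == len(set(ids)), "order_id is not unique"
    assert all(r["total_amount"] > 0 for r in rows), "non-positive total_amount"

def check_marts(revenue_rows, customer_rows):
    """gross_revenue not null; customer_tier in the accepted set."""
    assert all(r["gross_revenue"] is not None for r in revenue_rows)
    assert all(r["customer_tier"] in VALID_TIERS for r in customer_rows)

check_orders([{"order_id": "O1", "total_amount": 10.0},
              {"order_id": "O2", "total_amount": 5.5}])
check_marts([{"gross_revenue": 200.0}], [{"customer_tier": "vip"}])
```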
## Step 7: Hybrid Pipeline (Advanced)
Combine Spark ingest with dbt transforms in a single DAG:
```mermaid
graph LR
    K["Kafka<br/>order-events"] --> S["Spark Streaming<br/>Validate & Enrich"]
    S --> RAW["Iceberg Sink<br/>raw.validated_orders"]
    S --> DBT["dbt build<br/>tag:ecommerce"]
    DBT --> A["Iceberg Sink<br/>analytics.fct_daily_revenue"]
```
This pattern is useful when:
- Spark handles real-time validation and enrichment
- dbt handles batch aggregation and business logic
- Both share the same Iceberg catalog through Polaris
The DAG edges ensure dbt runs only after Spark has written validated data.
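That ordering guarantee is just a property of topological order over the DAG edges. A minimal sketch with Python's standard `graphlib` (node names are ours, shorthand for the diagram above):

```python
from graphlib import TopologicalSorter

# Edges from the hybrid DAG above, as parent -> children (names assumed).
edges = {
    "kafka": ["spark"],
    "spark": ["raw_sink", "dbt"],
    "dbt": ["analytics_sink"],
}

# TopologicalSorter expects node -> set of predecessors, so invert.
preds = {}
for parent, children in edges.items():
    preds.setdefault(parent, set())
    for child in children:
        preds.setdefault(child, set()).add(parent)

order = list(TopologicalSorter(preds).static_order())
# In every valid schedule, spark precedes dbt, so dbt only ever
# sees data spark has already validated and written.
```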
## Seed All Demo Pipelines
To load all three demo pipelines at once:
This creates:
- E-Commerce Analytics (dbt) — pure dbt medallion pipeline
- Hybrid Spark + dbt Pipeline — streaming ingest + batch transforms
- dbt Data Quality Suite — scheduled test runner
## Next Steps
- dbt Transforms — full reference for dbt node configuration
- Streaming — Spark Structured Streaming with Iceberg
- UDF Registry — custom Python/Java UDFs in Spark steps
- Workspaces — multi-tenant pipeline isolation