# dbt E-Commerce Pipeline Walkthrough
This walkthrough builds a medallion architecture pipeline using dbt to transform demo Iceberg tables into analytics-ready marts. You'll create staging views, mart tables with RFM segmentation, and run data quality tests — all from the Pipeline Builder UI.
## Prerequisites

- DRLS platform running (see Quickstart)
- Demo catalog seeded (`python scripts/seed_demo_catalog.py`)
- dbt profiles Secret created:

```bash
kubectl create secret generic dbt-profiles \
  --from-file=profiles.yml=dbt/profiles.yml \
  -n default-drls-clusters
```
## Step 1: Explore the Demo Data

Open the Catalog page and browse the `demo` namespace. You'll see the two tables we'll transform:
| Table | Rows | Description |
|---|---|---|
| `demo.orders` | ~150 | Order transactions with amounts, status, shipping |
| `demo.customers` | 30 | Customer profiles with lifetime value |
Click into `demo.orders` to see the schema — notice the raw data has mixed-case status values, nullable amounts, and cancelled orders we'll want to filter out.
## Step 2: Understand the dbt Models

The built-in dbt project at `dbt/models/ecommerce/` contains four models:

### Staging Layer (Views)

`stg_orders` — cleans and normalizes orders:

```sql
{{ config(materialized='view', tags=['ecommerce']) }}

select
    order_id,
    customer_id,
    cast(order_date as date) as order_date,
    lower(status) as status,
    total_amount,
    items_count,
    lower(shipping_country) as shipping_country,
    lower(payment_method) as payment_method,
    created_at
from {{ source('demo', 'orders') }}
where lower(status) != 'cancelled'
  and total_amount > 0
```
`stg_customers` — normalizes customer profiles:

```sql
{{ config(materialized='view', tags=['ecommerce']) }}

select
    customer_id,
    name,
    lower(trim(email)) as email,
    lower(country) as country,
    cast(signup_date as date) as signup_date,
    is_active,
    lifetime_value,
    last_login
from {{ source('demo', 'customers') }}
```
### Mart Layer (Tables)

`fct_daily_revenue` — daily revenue aggregation (incremental):

- Total orders, unique customers, gross revenue, average order value (AOV)
- Payment method breakdown
- Countries served per day
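The core of that aggregation can be sketched in plain Python — a hedged equivalent of what the model computes per `order_date` (field names follow `stg_orders` above; the real model is incremental SQL, not this):

```python
from collections import defaultdict
from datetime import date

def daily_revenue(orders):
    """Roll staged orders up into one row per order_date.

    Illustrative only: mirrors the per-day metrics fct_daily_revenue
    produces (counts, gross revenue, AOV, countries served).
    """
    days = defaultdict(lambda: {"orders": 0, "customers": set(),
                                "revenue": 0.0, "countries": set()})
    for o in orders:
        d = days[o["order_date"]]
        d["orders"] += 1
        d["customers"].add(o["customer_id"])
        d["revenue"] += o["total_amount"]
        d["countries"].add(o["shipping_country"])
    return {
        day: {
            "total_orders": d["orders"],
            "unique_customers": len(d["customers"]),
            "gross_revenue": round(d["revenue"], 2),
            "avg_order_value": round(d["revenue"] / d["orders"], 2),
            "countries_served": len(d["countries"]),
        }
        for day, d in days.items()
    }

rows = daily_revenue([
    {"order_date": date(2024, 5, 1), "customer_id": "C001",
     "total_amount": 120.0, "shipping_country": "us"},
    {"order_date": date(2024, 5, 1), "customer_id": "C012",
     "total_amount": 80.0, "shipping_country": "de"},
])
# rows[date(2024, 5, 1)]["avg_order_value"] == 100.0
```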
`dim_customer_metrics` — customer dimension with RFM segmentation:

- Joins orders onto customers
- Calculates recency (days since last order), frequency (order count), and monetary (total spend)
- Assigns a tier: `vip`, `regular`, `new`, or `inactive`
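As a rough sketch, tier assignment from the three RFM inputs might look like the following. The threshold values here are illustrative assumptions — the real cutoffs live in the `dim_customer_metrics` SQL:

```python
def customer_tier(recency_days, frequency, monetary):
    """Assign an RFM tier from recency/frequency/monetary.

    Thresholds are assumed for illustration; consult the
    dim_customer_metrics model for the actual logic.
    """
    if frequency == 0:
        return "inactive"   # never ordered
    if frequency >= 5 and monetary >= 1000:
        return "vip"        # frequent, high-spend
    if frequency == 1 and recency_days <= 30:
        return "new"        # single recent order
    return "regular"

# Consistent with the Step 5 preview: 8 orders / 1240 spend -> vip
print(customer_tier(recency_days=10, frequency=8, monetary=1240))
```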
## Step 3: Build the Pipeline
Open the Pipeline Builder and create a new pipeline called "E-Commerce Analytics".
### Add Source Nodes

- Drag Iceberg Source from the palette → set table to `demo.orders`
- Drag another Iceberg Source → set table to `demo.customers`
### Add dbt Staging Step

- Drag a dbt node onto the canvas
- Configure:
    - Label: `Staging Models`
    - Command: `run`
    - Selector: `stg_orders stg_customers`
    - Target: `prod`
    - Profiles Secret: `dbt-profiles`
- Connect both source nodes → dbt staging
### Add dbt Mart Step

- Drag another dbt node
- Configure:
    - Label: `Mart Models`
    - Command: `build`
    - Selector: `fct_daily_revenue dim_customer_metrics`
    - Target: `prod`
    - Profiles Secret: `dbt-profiles`
- Connect dbt staging → dbt marts
### Add Sink Nodes

- Drag Iceberg Sink → set table to `analytics.fct_daily_revenue`
- Drag another Iceberg Sink → set table to `analytics.dim_customer_metrics`
- Connect dbt marts → both sinks

Your pipeline should look like:

```mermaid
graph LR
    O["demo.orders"] --> STG["dbt run<br/>Staging Models"]
    C["demo.customers"] --> STG
    STG --> MART["dbt build<br/>Mart Models"]
    MART --> R["fct_daily_revenue"]
    MART --> M["dim_customer_metrics"]
```
### Save and Promote
- Click Save — creates version 1 in development
- Click Promote to QA to validate
## Step 4: Deploy and Run
- From the pipeline list, click Deploy
- Select the QA version and target workspace
- Monitor execution in the Monitor tab
The pipeline creates two Kubernetes Jobs in sequence:

- Staging Job — `dbt run --select stg_orders stg_customers --target prod`
- Mart Job — `dbt build --select fct_daily_revenue dim_customer_metrics --target prod`

Each Job mounts a ConfigMap containing the dbt project files and a Secret with `profiles.yml`.
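To see how the node fields from Step 3 map onto those Job commands, here is a hedged sketch (the `dbt_command` helper is ours, not a platform API; the actual Job spec is generated by the platform):

```python
def dbt_command(command, selector, target, fail_fast=False):
    """Assemble the dbt CLI invocation a pipeline Job would run.

    Illustrative: Command/Selector/Target come straight from the
    dbt node configuration; --fail-fast matches the Fail Fast toggle.
    """
    args = ["dbt", command, "--select", *selector.split(), "--target", target]
    if fail_fast:
        args.append("--fail-fast")
    return args

staging = dbt_command("run", "stg_orders stg_customers", "prod")
mart = dbt_command("build", "fct_daily_revenue dim_customer_metrics", "prod")
print(" ".join(staging))
# dbt run --select stg_orders stg_customers --target prod
```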
## Step 5: Verify Results

### Check the Catalog

Navigate to the Catalog page and browse the `analytics` namespace. You should see two new tables:

- `analytics.fct_daily_revenue` — one row per order date
- `analytics.dim_customer_metrics` — one row per customer with tier assignment
### Query the Marts

Click into `analytics.dim_customer_metrics` and preview the data. You'll see customers segmented into tiers:
| customer_id | name | order_count | total_spend | customer_tier |
|---|---|---|---|---|
| C001 | Alice | 8 | 1,240 | vip |
| C012 | Bob | 3 | 350 | regular |
| C025 | Carol | 1 | 45 | new |
| C030 | Dave | 0 | 0 | inactive |
## Step 6: Add Data Quality Tests

Create a second pipeline for scheduled quality checks:

- Create a new pipeline: "E-Commerce Quality Tests"
- Add a single dbt node:
    - Command: `test`
    - Selector: `tag:ecommerce`
    - Fail Fast: checked
    - Profiles Secret: `dbt-profiles`
- Save and schedule daily
The tests validate:

- `order_id` is not null and unique
- `total_amount` is positive
- `customer_tier` is one of `[vip, regular, new, inactive]`
- `gross_revenue` is not null
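In practice these run inside dbt as `not_null`, `unique`, and `accepted_values` tests; the checks they enforce amount to the following (a plain-Python restatement, not the dbt implementation):

```python
VALID_TIERS = {"vip", "regular", "new", "inactive"}

def check_orders(rows):
    """order_id not null + unique; total_amount strictly positive."""
    ids = [r["order_id"] for r in rows]
    assert all(i is not None for i in ids), "order_id contains nulls"
    assert len(ids) == len(set(ids)), "order_id is not unique"
    assert all(r["total_amount"] > 0 for r in rows), "non-positive total_amount"

def check_marts(revenue_rows, customer_rows):
    """gross_revenue not null; customer_tier in the accepted set."""
    assert all(r["gross_revenue"] is not None for r in revenue_rows)
    assert all(r["customer_tier"] in VALID_TIERS for r in customer_rows)

check_orders([{"order_id": "O1", "total_amount": 10.0},
              {"order_id": "O2", "total_amount": 5.5}])
check_marts([{"gross_revenue": 200.0}], [{"customer_tier": "vip"}])
```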
## Step 7: Hybrid Pipeline (Advanced)
Combine Spark ingest with dbt transforms in a single DAG:
```mermaid
graph LR
    K["Kafka<br/>order-events"] --> S["Spark Streaming<br/>Validate & Enrich"]
    S --> RAW["Iceberg Sink<br/>raw.validated_orders"]
    S --> DBT["dbt build<br/>tag:ecommerce"]
    DBT --> A["Iceberg Sink<br/>analytics.fct_daily_revenue"]
```
This pattern is useful when:
- Spark handles real-time validation and enrichment
- dbt handles batch aggregation and business logic
- Both share the same Iceberg catalog through Polaris
The DAG edges ensure dbt runs only after Spark has written validated data.
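That ordering guarantee is just a property of topological order over the DAG edges. A minimal sketch with Python's standard `graphlib` (node names are ours, shorthand for the diagram above):

```python
from graphlib import TopologicalSorter

# Edges from the hybrid DAG above, as parent -> children (names assumed).
edges = {
    "kafka": ["spark"],
    "spark": ["raw_sink", "dbt"],
    "dbt": ["analytics_sink"],
}

# TopologicalSorter expects node -> set of predecessors, so invert.
preds = {}
for parent, children in edges.items():
    preds.setdefault(parent, set())
    for child in children:
        preds.setdefault(child, set()).add(parent)

order = list(TopologicalSorter(preds).static_order())
# In every valid schedule, spark precedes dbt, so dbt only ever
# sees data spark has already validated and written.
```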
## Seed All Demo Pipelines
To load all three demo pipelines at once:
This creates:
- E-Commerce Analytics (dbt) — pure dbt medallion pipeline
- Hybrid Spark + dbt Pipeline — streaming ingest + batch transforms
- dbt Data Quality Suite — scheduled test runner
## Next Steps
- dbt Transforms — full reference for dbt node configuration
- Streaming — Spark Structured Streaming with Iceberg
- UDF Registry — custom Python/Java UDFs in Spark steps
- Workspaces — multi-tenant pipeline isolation