dbt E-Commerce Pipeline Walkthrough

This walkthrough builds a medallion architecture pipeline using dbt to transform demo Iceberg tables into analytics-ready marts. You'll create staging views, mart tables with RFM segmentation, and run data quality tests — all from the Pipeline Builder UI.

Prerequisites

  • DRLS platform running (see Quickstart)
  • Demo catalog seeded (python scripts/seed_demo_catalog.py)
  • dbt profiles Secret created:
kubectl create secret generic dbt-profiles \
  --from-file=profiles.yml=dbt/profiles.yml \
  -n default-drls-clusters

Step 1: Explore the Demo Data

Open the Catalog page and browse the demo namespace. You'll see two tables we'll transform:

| Table | Rows | Description |
|---|---|---|
| demo.orders | ~150 | Order transactions with amounts, status, shipping |
| demo.customers | 30 | Customer profiles with lifetime value |

Click into demo.orders to see the schema — notice the raw data has mixed-case status values, nullable amounts, and cancelled orders we'll want to filter.

Step 2: Understand the dbt Models

The built-in dbt project at dbt/models/ecommerce/ contains four models:

Staging Layer (Views)

stg_orders — Cleans and normalizes orders:

{{ config(materialized='view', tags=['ecommerce']) }}

select
    order_id,
    customer_id,
    cast(order_date as date) as order_date,
    lower(status) as status,
    total_amount,
    items_count,
    lower(shipping_country) as shipping_country,
    lower(payment_method) as payment_method,
    created_at
from {{ source('demo', 'orders') }}
where lower(status) != 'cancelled'
  and total_amount > 0

stg_customers — Normalizes customer profiles:

{{ config(materialized='view', tags=['ecommerce']) }}

select
    customer_id,
    name,
    lower(trim(email)) as email,
    lower(country) as country,
    cast(signup_date as date) as signup_date,
    is_active,
    lifetime_value,
    last_login
from {{ source('demo', 'customers') }}

Mart Layer (Tables)

fct_daily_revenue — Daily revenue aggregation (incremental):

  • Total orders, unique customers, gross revenue, and average order value (AOV)
  • Payment method breakdown
  • Countries served per day
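As a sketch, an incremental model along these lines would produce those metrics. The column names and the incremental predicate here are illustrative assumptions; the shipped model under dbt/models/ecommerce/ is the reference:

```sql
{{ config(materialized='incremental', unique_key='order_date', tags=['ecommerce']) }}

select
    order_date,
    count(*) as total_orders,
    count(distinct customer_id) as unique_customers,
    sum(total_amount) as gross_revenue,
    sum(total_amount) / count(*) as avg_order_value,
    count(distinct shipping_country) as countries_served
from {{ ref('stg_orders') }}
{% if is_incremental() %}
-- On incremental runs, only aggregate dates newer than what's already in the table
where order_date > (select max(order_date) from {{ this }})
{% endif %}
group by order_date
```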

dim_customer_metrics — Customer dimension with RFM segmentation:

  • Joins orders onto customers
  • Calculates recency (days since last order), frequency (order count), monetary (total spend)
  • Assigns tier: vip, regular, new, or inactive
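The tier assignment can be sketched as a dbt model like the following. The thresholds, the `date_diff` spelling, and the column names are illustrative assumptions (date functions vary by engine), not the shipped logic:

```sql
with orders as (
    select
        customer_id,
        max(order_date)   as last_order_date,
        count(*)          as order_count,
        sum(total_amount) as total_spend
    from {{ ref('stg_orders') }}
    group by customer_id
)

select
    c.customer_id,
    -- Recency: days since the customer's most recent order (engine-specific function)
    date_diff('day', o.last_order_date, current_date) as recency_days,
    coalesce(o.order_count, 0) as order_count,
    coalesce(o.total_spend, 0) as total_spend,
    case
        when o.customer_id is null              then 'inactive'  -- never ordered
        when coalesce(o.total_spend, 0) >= 1000 then 'vip'       -- assumed threshold
        when o.order_count = 1                  then 'new'
        else                                         'regular'
    end as customer_tier
from {{ ref('stg_customers') }} c
left join orders o using (customer_id)
```

The left join keeps customers with no orders, which is what lets the model tag them as inactive rather than dropping them.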

Step 3: Build the Pipeline

Open the Pipeline Builder and create a new pipeline called "E-Commerce Analytics".

Add Source Nodes

  1. Drag Iceberg Source from the palette → set table to demo.orders
  2. Drag another Iceberg Source → set table to demo.customers

Add dbt Staging Step

  1. Drag a dbt node onto the canvas
  2. Configure:
    • Label: Staging Models
    • Command: run
    • Selector: stg_orders stg_customers
    • Target: prod
    • Profiles Secret: dbt-profiles
  3. Connect both source nodes → dbt staging

Add dbt Mart Step

  1. Drag another dbt node
  2. Configure:
    • Label: Mart Models
    • Command: build
    • Selector: fct_daily_revenue dim_customer_metrics
    • Target: prod
    • Profiles Secret: dbt-profiles
  3. Connect dbt staging → dbt marts

Add Sink Nodes

  1. Drag Iceberg Sink → set table to analytics.fct_daily_revenue
  2. Drag another Iceberg Sink → set table to analytics.dim_customer_metrics
  3. Connect dbt marts → both sinks

Your pipeline should look like:

graph LR
    O["demo.orders"] --> STG["dbt run<br/>Staging Models"]
    C["demo.customers"] --> STG
    STG --> MART["dbt build<br/>Mart Models"]
    MART --> R["fct_daily_revenue"]
    MART --> M["dim_customer_metrics"]

Save and Promote

  1. Click Save — creates version 1 in development
  2. Click Promote to QA to validate

Step 4: Deploy and Run

  1. From the pipeline list, click Deploy
  2. Select the QA version and target workspace
  3. Monitor execution in the Monitor tab

The pipeline creates two Kubernetes Jobs in sequence:

  1. Staging Job: dbt run --select stg_orders stg_customers --target prod
  2. Mart Job: dbt build --select fct_daily_revenue dim_customer_metrics --target prod

Each Job mounts a ConfigMap containing the dbt project files and a Secret with profiles.yml.
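Conceptually, each Job's pod spec looks something like the fragment below. The image, mount paths, and ConfigMap name are assumptions for illustration; only the dbt-profiles Secret name comes from the prerequisites above:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: dbt-staging
spec:
  template:
    spec:
      containers:
        - name: dbt
          image: dbt-runner:latest   # illustrative image name
          args: ["run", "--select", "stg_orders stg_customers", "--target", "prod"]
          volumeMounts:
            - name: project          # dbt project files from the ConfigMap
              mountPath: /dbt
            - name: profiles         # profiles.yml from the Secret
              mountPath: /root/.dbt
      volumes:
        - name: project
          configMap:
            name: dbt-project        # illustrative ConfigMap name
        - name: profiles
          secret:
            secretName: dbt-profiles
      restartPolicy: Never
```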

Step 5: Verify Results

Check the Catalog

Navigate to the Catalog page and browse the analytics namespace. You should see two new tables:

  • analytics.fct_daily_revenue — one row per order date
  • analytics.dim_customer_metrics — one row per customer with tier assignment

Query the Marts

Click into analytics.dim_customer_metrics and preview the data. You'll see customers segmented into tiers:

| customer_id | name | order_count | total_spend | customer_tier |
|---|---|---|---|---|
| C001 | Alice | 8 | 1,240 | vip |
| C012 | Bob | 3 | 350 | regular |
| C025 | Carol | 1 | 45 | new |
| C030 | Dave | 0 | 0 | inactive |
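To summarize the segmentation rather than browse row by row, a simple aggregation query works (column names as shown in the preview above):

```sql
select
    customer_tier,
    count(*)         as customers,
    sum(total_spend) as tier_revenue
from analytics.dim_customer_metrics
group by customer_tier
order by tier_revenue desc
```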

Step 6: Add Data Quality Tests

Create a second pipeline for scheduled quality checks:

  1. Create a new pipeline: "E-Commerce Quality Tests"
  2. Add a single dbt node:
    • Command: test
    • Selector: tag:ecommerce
    • Fail Fast: checked
    • Profiles Secret: dbt-profiles
  3. Save and schedule daily

The tests validate:

  • order_id is not null and unique
  • total_amount is positive
  • customer_tier is one of [vip, regular, new, inactive]
  • gross_revenue is not null
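In dbt, checks like these are typically declared as generic tests in a schema.yml. A sketch consistent with the list above, using dbt's built-in not_null, unique, and accepted_values tests (model and column names assumed from this walkthrough):

```yaml
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
  - name: fct_daily_revenue
    columns:
      - name: gross_revenue
        tests:
          - not_null
  - name: dim_customer_metrics
    columns:
      - name: customer_tier
        tests:
          - accepted_values:
              values: ['vip', 'regular', 'new', 'inactive']
```

The positivity check on total_amount has no built-in generic test; it would usually be a custom singular test, or dbt_utils.accepted_range if the dbt-utils package is installed.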

Step 7: Hybrid Pipeline (Advanced)

Combine Spark ingest with dbt transforms in a single DAG:

graph LR
    K["Kafka<br/>order-events"] --> S["Spark Streaming<br/>Validate & Enrich"]
    S --> RAW["Iceberg Sink<br/>raw.validated_orders"]
    S --> DBT["dbt build<br/>tag:ecommerce"]
    DBT --> A["Iceberg Sink<br/>analytics.fct_daily_revenue"]

This pattern is useful when:

  • Spark handles real-time validation and enrichment
  • dbt handles batch aggregation and business logic
  • Both share the same Iceberg catalog through Polaris

The DAG edges ensure dbt runs only after Spark has written validated data.

Seed All Demo Pipelines

To load all three demo pipelines at once:

python scripts/seed_demo_pipelines.py

This creates:

  1. E-Commerce Analytics (dbt) — pure dbt medallion pipeline
  2. Hybrid Spark + dbt Pipeline — streaming ingest + batch transforms
  3. dbt Data Quality Suite — scheduled test runner

Next Steps