
Architecture Overview

DRLS is a multi-runtime system combining Spark-on-Ray execution, Iceberg lakehouse operations, and an AI agentic layer, with a React management console backed by a Rust server.

System Architecture

```mermaid
graph TB
    subgraph "User Interfaces"
        CLI["CLI (Click)"]
        React["React UI :5173"]
        MCP["MCP Server :8100"]
    end

    subgraph "Backend Services"
        Axum["Axum REST :3000"]
        PyO3["PyO3 (embedded Python)"]
        SQLite["SQLite (pipelines, config,\nversions, approvals)"]
    end

    subgraph "Python Runtime"
        Agent["TRex"]
        Core["Core Runtime"]
    end

    subgraph "Execution"
        Spark["Spark-on-Ray"]
        Ray["Ray Cluster"]
        Iceberg["Iceberg Tables"]
    end

    CLI --> Core
    React --> Axum
    Axum --> PyO3
    PyO3 --> Core
    PyO3 --> Agent
    Axum --> SQLite
    MCP --> Agent
    Agent --> Core
    Core --> Spark
    Spark --> Ray
    Spark --> Iceberg
```

Tech Stack

| Component | Technology | Version |
|---|---|---|
| JVM Runtime | Java | 21 |
| Distributed SQL | Apache Spark | 4.1.1 |
| Scala | Scala | 2.13.17 |
| Distributed Compute | Ray | 2.53.0 (Python) / 2.47.1 (Java JARs) |
| Python | Python | 3.12+ |
| Table Format | Apache Iceberg | via Spark 4.1.1 |
| REST Backend | Rust (Axum) | 0.8 |
| Python Embedding | PyO3 | 0.23 |
| Database | SQLite (SQLx) | |
| Frontend | React 18 + Vite 6 + TypeScript 5.6 | |
| AI Integration | litellm + MCP | |

Data Flow

The primary data flow for the management console:

React UI → Axum REST (:3000) → PyO3 → Python/Spark/Iceberg

For CLI and Python API usage:

CLI / Python code → drls Python package → Spark-on-Ray → Iceberg

Module Overview

| Module | Location | Purpose |
|---|---|---|
| JVM Core | core/ | Spark shim, app master, executor, object store |
| Python Package | python/drls/ | Core runtime, streaming, Iceberg ops, agentic, CLI |
| Rust UI Server | rust/crates/ui-server/ | Axum REST server, PyO3 bridge, SQLite, middleware, pipeline versioning, approval workflow |
| Shared CRDs | rust/crates/shared/ | Pipeline and Notebook CRD types shared across operators |
| Pipeline Operator | rust/crates/pipeline-operator/ | K8s operator: orchestrates DrlsPipeline and DrlsPipelineRun CRDs |
| Marimo Operator | rust/crates/marimo-operator/ | K8s operator: manages MarimoNotebook pod lifecycle |
| React Frontend | ui/frontend/ | Management console UI, pipeline builder, approval queue |
| Examples | examples/ | Demo scripts |

Note

ui/backend/ is a symlink to rust/crates/ui-server/ for dev workflow compatibility.

Pipeline Lifecycle

Pipeline definitions are versioned, immutable artifacts that progress through a managed environment lifecycle:

```mermaid
graph LR
    dev["Development"] --> qa["QA"]
    qa -->|"approval"| uat["UAT"]
    uat -->|"approval"| prod["Production"]
    prod --> retired["Retired"]
    uat -.->|"demote"| qa
```

  • Versioning — Each save creates an immutable version snapshot with a format version for compatibility
  • Environments — Versions move through development → QA → UAT → production → retired
  • Approval gates — Promotions to UAT and production require Lead Data Engineer approval
  • Read-only enforcement — UAT, production, and retired versions cannot be modified
  • Import/Export — Pipeline versions can be exported as JSON and imported into other environments
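
The lifecycle rules above can be sketched as a small state machine. This is a minimal illustration only: `PipelineVersion`, `promote`, `demote`, `can_edit`, and the `approved_by` parameter are hypothetical names, not the actual DRLS API, and the real server stores versions and approvals in SQLite rather than in memory.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Environment order, taken from the lifecycle described above.
ENVIRONMENTS = ["development", "qa", "uat", "production", "retired"]
# Promotions INTO these environments are gated on approval.
APPROVAL_REQUIRED = {"uat", "production"}
# Versions in these environments cannot be modified.
READ_ONLY = {"uat", "production", "retired"}


@dataclass(frozen=True)
class PipelineVersion:
    """Hypothetical immutable version snapshot (field names are illustrative)."""
    pipeline: str
    version: int
    environment: str = "development"


def promote(v: PipelineVersion, approved_by: Optional[str] = None) -> PipelineVersion:
    """Return a copy of `v` moved to the next environment in the lifecycle."""
    idx = ENVIRONMENTS.index(v.environment)
    if idx == len(ENVIRONMENTS) - 1:
        raise ValueError("retired versions cannot be promoted")
    target = ENVIRONMENTS[idx + 1]
    if target in APPROVAL_REQUIRED and approved_by is None:
        raise PermissionError(f"promotion to {target} requires approval")
    return replace(v, environment=target)


def demote(v: PipelineVersion) -> PipelineVersion:
    """The only demotion shown in the diagram is UAT back to QA."""
    if v.environment != "uat":
        raise ValueError("only UAT versions can be demoted")
    return replace(v, environment="qa")


def can_edit(v: PipelineVersion) -> bool:
    """Read-only enforcement for UAT, production, and retired versions."""
    return v.environment not in READ_ONLY
```

Because the dataclass is frozen, each transition returns a new object instead of mutating the snapshot, which mirrors the "immutable version" rule. For example, `promote(PipelineVersion("orders", 1))` yields a QA version, and promoting that result without `approved_by` raises `PermissionError`.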

Pipeline Execution

When a user deploys a pipeline from the builder, the full execution path is:

```mermaid
graph LR
    Builder["Pipeline Builder"] --> Axum["Axum (graph compiler)"]
    Axum --> CRD["DrlsPipeline CR\n+ DrlsPipelineRun CR"]
    CRD --> Op["Pipeline Operator\n(DAG evaluation)"]
    Op --> Child["SparkApplication /\nRayJob / Job CRs"]
    Child --> Runners["Spark Operator /\nKubeRay / K8s"]
    Runners --> Pods["Pods"]
    Pods -.-> Op
    Op -.->|"status update"| CRD
    CRD -.->|"CRD read"| Axum
    Axum -.->|"step-level status"| Builder
```

  1. Graph compilation — Axum translates React Flow nodes/edges into StepSpec[] entries in a DrlsPipelineSpec
  2. CRD creation — Axum creates DrlsPipeline and DrlsPipelineRun custom resources in the pipeline namespace
  3. DAG evaluation — The pipeline operator watches runs, evaluates step dependencies (Kahn's algorithm), and creates child CRs for ready steps
  4. Child CR dispatch — Each step type maps to a child resource: sparkStream → SparkApplication, rayProcess → RayJob, dbt/custom → Job
  5. Status reconciliation — The operator monitors child CR status (via DynamicObject for SparkApplication/RayJob), updates step phases, handles retries
  6. Frontend display — The pipeline status endpoint reads CRD status and returns step-level progress to the UI
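
Steps 1, 3, and 4 above can be illustrated with a short Python sketch. The real compiler and operator are written in Rust, so this is not the project's code: the step dictionary shape (`name`, `type`, `dependsOn`), the function names, and the use of the node `type` field as the step type are all assumptions made for illustration. Only the `id`/`source`/`target` fields (standard React Flow node and edge fields), the step-type mapping, and the use of Kahn's algorithm come from the text.

```python
from collections import deque

# Step type -> child resource kind, as listed in step 4 above.
CHILD_KIND = {
    "sparkStream": "SparkApplication",
    "rayProcess": "RayJob",
    "dbt": "Job",
    "custom": "Job",
}


def compile_graph(nodes, edges):
    """Turn React Flow-style nodes/edges into step dicts with dependencies.

    The output shape is hypothetical, not the real StepSpec schema.
    """
    steps = {n["id"]: {"name": n["id"], "type": n["type"], "dependsOn": []}
             for n in nodes}
    for e in edges:
        # An edge source -> target means target depends on source.
        steps[e["target"]]["dependsOn"].append(e["source"])
    return list(steps.values())


def topological_order(steps):
    """Kahn's algorithm: emit steps once all of their dependencies are emitted."""
    indegree = {s["name"]: len(s["dependsOn"]) for s in steps}
    dependents = {s["name"]: [] for s in steps}
    for s in steps:
        for dep in s["dependsOn"]:
            dependents[dep].append(s["name"])
    ready = deque(name for name, d in indegree.items() if d == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for child in dependents[name]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(steps):
        raise ValueError("pipeline graph contains a cycle")
    return order


def ready_steps(steps, succeeded, started):
    """Steps not yet started whose dependencies have all succeeded."""
    return [s for s in steps
            if s["name"] not in started
            and all(d in succeeded for d in s["dependsOn"])]


nodes = [
    {"id": "ingest", "type": "sparkStream"},
    {"id": "enrich", "type": "rayProcess"},
    {"id": "score", "type": "custom"},
    {"id": "report", "type": "dbt"},
]
edges = [
    {"source": "ingest", "target": "enrich"},
    {"source": "ingest", "target": "score"},
    {"source": "enrich", "target": "report"},
    {"source": "score", "target": "report"},
]
steps = compile_graph(nodes, edges)
for step in ready_steps(steps, succeeded={"ingest"}, started={"ingest"}):
    print(step["name"], "->", CHILD_KIND[step["type"]])
```

In this example, once `ingest` has succeeded, `enrich` and `score` become ready at the same time and would be dispatched as a RayJob and a Job respectively, while `report` waits for both. The cycle check in `topological_order` is one reason to use Kahn's algorithm here: a graph that cannot be fully ordered can be rejected before any child resource is created.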

See the detailed architecture pages for each component: