Skip to content

Home

DRLS Banner

Agentic Iceberg Lakehouse Toolkit — Spark-on-Ray runtime with first-class Iceberg support, bidirectional streaming bridge, and AI-driven lakehouse management.


Features

Spark-on-Ray Runtime

Run Apache Spark executors as Ray actors for elastic scaling, resource sharing, and unified Python-first orchestration.

Iceberg First-Class

Full lakehouse operations out of the box: compaction (binpack/sort/zorder), snapshot management, schema evolution, partition evolution, time travel, and CDC streaming.

Bidirectional Streaming

Spark Structured Streaming to Ray Data bridge with configurable backpressure, plus a reverse bridge for Ray-to-Spark pipelines.

Agentic Layer

MCP server and OpenAI-compatible tool definitions (13 tools) for AI-driven lakehouse management via natural language.

CLI

Manage your lakehouse from the terminal: drls compact, health, tables, expire, agent, and more.

Management Console

React-based UI with pipeline builder (React Flow), streaming monitor (WebSocket), table explorer, and agent chat — backed by a Rust Axum server with embedded Python via PyO3.

Pipeline Versioning & Environments

Versioned pipeline definitions with a managed environment lifecycle: development → QA → UAT → production → retired. Approval gates require Lead Data Engineer sign-off for promotions to UAT and production. JSON import/export with format version compatibility.

Notebook Service

Per-user marimo notebook environments running in Kubernetes. The backend manages pod lifecycle and reverse-proxies notebook traffic, while the frontend provides a seamless embedded experience.


Architecture

graph TB
    subgraph "User Interfaces"
        CLI["CLI (Click)"]
        React["React UI :5173"]
        MCP["MCP Server :8100"]
    end

    subgraph "Backend Services"
        Axum["Axum REST :3000"]
        PyO3["PyO3 (embedded Python)"]
        SQLite["SQLite (pipelines, config,\nversions, approvals)"]
    end

    subgraph "Python Runtime"
        Agent["TRex"]
        Core["Core Runtime"]
    end

    subgraph "Execution"
        Spark["Spark-on-Ray"]
        Ray["Ray Cluster"]
        Iceberg["Iceberg Tables"]
    end

    CLI --> Core
    React --> Axum
    Axum --> PyO3
    PyO3 --> Core
    PyO3 --> Agent
    Axum --> SQLite
    MCP --> Agent
    Agent --> Core
    Core --> Spark
    Spark --> Ray
    Spark --> Iceberg

Quick Start

pip install drls
import ray
import drls

ray.init()
spark = drls.init_spark("demo", num_executors=1, executor_cores=1, executor_memory="500M")

# Run a query
assert spark.range(10).count() == 10

# Create an Iceberg table
spark.sql("CREATE TABLE drls.db.events (id BIGINT, ts TIMESTAMP) USING iceberg")

drls.stop_spark()
ray.shutdown()

See the Installation Guide and Quickstart for more.


License

Component License
Core library, CLI, agentic tools, gRPC server Apache License 2.0
UI (Rust backend + React frontend) BSL / Proprietary

This project includes code derived from RayDP (Apache 2.0). See NOTICE for details.