# Home
Agentic Iceberg Lakehouse Toolkit — Spark-on-Ray runtime with first-class Iceberg support, bidirectional streaming bridge, and AI-driven lakehouse management.
## Features
### Spark-on-Ray Runtime
Run Apache Spark executors as Ray actors for elastic scaling, resource sharing, and unified Python-first orchestration.
### Iceberg First-Class
Full lakehouse operations out of the box: compaction (binpack/sort/zorder), snapshot management, schema evolution, partition evolution, time travel, and CDC streaming.
### Bidirectional Streaming
Spark Structured Streaming to Ray Data bridge with configurable backpressure, plus a reverse bridge for Ray-to-Spark pipelines.
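"Configurable backpressure" means the producer blocks once the consumer falls behind by a set number of in-flight batches, instead of buffering without bound. The idea in isolation, sketched with a bounded queue (the real bridge moves Spark micro-batches into Ray Datasets; `max_pending` is an illustrative name, not a drls parameter):

```python
import queue
import threading


def run_bridge(batches, process, max_pending=4):
    """Move batches from a producer to a consumer through a bounded queue.

    Once `max_pending` batches sit unconsumed, `put` blocks: the producer
    is throttled to the consumer's pace rather than buffering unboundedly.
    """
    q = queue.Queue(maxsize=max_pending)   # the backpressure knob
    results = []

    def consumer():
        while True:
            batch = q.get()
            if batch is None:              # sentinel: producer is done
                return
            results.append(process(batch))

    t = threading.Thread(target=consumer)
    t.start()
    for b in batches:
        q.put(b)                           # blocks while the queue is full
    q.put(None)
    t.join()
    return results
```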
### Agentic Layer
MCP server and OpenAI-compatible tool definitions (13 tools) for AI-driven lakehouse management via natural language.
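OpenAI-compatible tool definitions are plain JSON-schema function specs that an LLM can select and fill in. The shape of one such entry, using a hypothetical `compact_table` tool as an example (the actual 13 tool names live in the drls source):

```python
# One tool entry in the OpenAI function-calling format. The model sees the
# name, description, and parameter schema, and emits matching arguments.
compact_table_tool = {
    "type": "function",
    "function": {
        "name": "compact_table",
        "description": "Rewrite small files in an Iceberg table.",
        "parameters": {
            "type": "object",
            "properties": {
                "table": {
                    "type": "string",
                    "description": "Fully qualified table name.",
                },
                "strategy": {
                    "type": "string",
                    "enum": ["binpack", "sort", "zorder"],
                },
            },
            "required": ["table"],
        },
    },
}
```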
### CLI

Manage your lakehouse from the terminal: `drls compact`, `drls health`, `drls tables`, `drls expire`, `drls agent`, and more.
### Management Console
React-based UI with pipeline builder (React Flow), streaming monitor (WebSocket), table explorer, and agent chat — backed by a Rust Axum server with embedded Python via PyO3.
### Pipeline Versioning & Environments
Versioned pipeline definitions with a managed environment lifecycle: development → QA → UAT → production → retired. Approval gates require Lead Data Engineer sign-off for promotions to UAT and production. JSON import/export with format version compatibility.
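The lifecycle is a linear state machine with gated transitions. A minimal model of the promotion rules described above (function and constant names are illustrative, not the drls API):

```python
STAGES = ["development", "qa", "uat", "production", "retired"]
GATED = {"uat", "production"}  # need Lead Data Engineer sign-off


def promote(current: str, approved_by_lead: bool = False) -> str:
    """Advance a pipeline one stage, enforcing the approval gates."""
    i = STAGES.index(current)
    if i == len(STAGES) - 1:
        raise ValueError("pipeline is already retired")
    target = STAGES[i + 1]
    if target in GATED and not approved_by_lead:
        raise PermissionError(
            f"promotion to {target} requires Lead Data Engineer sign-off")
    return target
```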
### Notebook Service
Per-user marimo notebook environments running in Kubernetes. The backend manages pod lifecycle and reverse-proxies notebook traffic, while the frontend provides a seamless embedded experience.
## Architecture

```mermaid
graph TB
    subgraph "User Interfaces"
        CLI["CLI (Click)"]
        React["React UI :5173"]
        MCP["MCP Server :8100"]
    end
    subgraph "Backend Services"
        Axum["Axum REST :3000"]
        PyO3["PyO3 (embedded Python)"]
        SQLite["SQLite (pipelines, config,\nversions, approvals)"]
    end
    subgraph "Python Runtime"
        Agent["TRex"]
        Core["Core Runtime"]
    end
    subgraph "Execution"
        Spark["Spark-on-Ray"]
        Ray["Ray Cluster"]
        Iceberg["Iceberg Tables"]
    end
    CLI --> Core
    React --> Axum
    Axum --> PyO3
    PyO3 --> Core
    PyO3 --> Agent
    Axum --> SQLite
    MCP --> Agent
    Agent --> Core
    Core --> Spark
    Spark --> Ray
    Spark --> Iceberg
```
## Quick Start

```python
import ray
import drls

ray.init()
spark = drls.init_spark("demo", num_executors=1, executor_cores=1, executor_memory="500M")

# Run a query
assert spark.range(10).count() == 10

# Create an Iceberg table
spark.sql("CREATE TABLE drls.db.events (id BIGINT, ts TIMESTAMP) USING iceberg")

drls.stop_spark()
ray.shutdown()
```
See the Installation Guide and Quickstart for more.
## License
| Component | License |
|---|---|
| Core library, CLI, agentic tools, gRPC server | Apache License 2.0 |
| UI (Rust backend + React frontend) | BSL / Proprietary |
This project includes code derived from RayDP (Apache 2.0). See NOTICE for details.