Configuration Reference

Iceberg Catalog Properties

Configure via IcebergCatalogConfig or init_spark() parameters:

spark = drls.init_spark(
    "my-app", 2, 1, "1G",
    iceberg_catalog="hadoop",
    iceberg_warehouse="/tmp/warehouse",
    iceberg_catalog_name="drls",
    iceberg_catalog_uri=None,
    iceberg_catalog_props={"key": "value"},
)

Catalog Type Properties

IcebergCatalogConfig(
    catalog_type="hadoop",
    warehouse="/tmp/warehouse",
)

Spark configs generated (the drls prefix comes from the catalog name):

spark.sql.catalog.drls = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.drls.type = hadoop
spark.sql.catalog.drls.warehouse = /tmp/warehouse

# Hive metastore catalog
IcebergCatalogConfig(
    catalog_type="hive",
    warehouse="s3://bucket/warehouse",
    uri="thrift://metastore:9083",
)

# REST catalog
IcebergCatalogConfig(
    catalog_type="rest",
    warehouse="s3://bucket/warehouse",
    uri="https://catalog.example.com",
)

# Apache Polaris catalog
IcebergCatalogConfig(
    catalog_type="polaris",
    warehouse="s3://bucket/warehouse",
    uri="https://polaris.example.com",
)

# AWS Glue catalog (no URI required)
IcebergCatalogConfig(
    catalog_type="glue",
    warehouse="s3://bucket/warehouse",
)

# Nessie catalog
IcebergCatalogConfig(
    catalog_type="nessie",
    warehouse="s3://bucket/warehouse",
    uri="https://nessie.example.com/api/v1",
)

Spark Session Configuration

Pass additional Spark configuration via the configs parameter:

spark = drls.init_spark(
    "my-app", 2, 1, "1G",
    configs={
        "spark.sql.shuffle.partitions": "10",
        "spark.sql.adaptive.enabled": "true",
    },
)

Rust Backend Environment Variables

  • FRONTEND_DIR (default: ../frontend/dist): Path to built React assets
  • DATABASE_URL (default: sqlite://drls.db): SQLx database connection string (sqlite:// or postgres://)
  • AUTH_ENABLED (default: false): Enable JWT auth middleware (requires AUTH_SECRET)
  • AUTH_SECRET (no default): HMAC-SHA256 secret for JWT validation (required when AUTH_ENABLED=true)
  • RUST_LOG (default: info): Tracing log level filter
  • PYO3_PYTHON (default: system default): Python interpreter for PyO3

JWT Authentication

When AUTH_ENABLED=true, the Rust backend validates JWT tokens on all /api/* routes.

  • Algorithm: HS256 (HMAC-SHA256)
  • Secret: Set via AUTH_SECRET environment variable
  • Required claims: sub (subject/user ID), exp (expiry as UTC timestamp)
  • Optional claims: iat (issued at), role (user role)

Tokens must be passed in the Authorization header as Bearer <token>. Expired or incorrectly signed tokens are rejected with HTTP 401. If AUTH_ENABLED=true but AUTH_SECRET is not set, all requests are rejected (fail closed).

RBAC Roles

The system uses a 4-tier role hierarchy where each role inherits all permissions of the roles below it:

  • Admin (level 4): User management, configuration, agent chat, plus all lower
  • LeadDataEngineer (level 3): Approvals, demote/retire, schema evolution, compaction, plus all lower
  • DataEngineer (level 2): Pipeline CRUD, versioning, promote, start/stop pipelines, notebooks, plus all lower
  • Auditor (level 1): Read-only access to tables, pipelines, definitions, versions

The first user to register is automatically granted Admin. Subsequent users default to Auditor.
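Because each role strictly contains the ones below it, the whole hierarchy reduces to a numeric comparison on the level column. A minimal Python sketch of that check (illustrative only; the backend implements this in Rust):

```python
from enum import IntEnum

class Role(IntEnum):
    # Levels match the table above.
    AUDITOR = 1
    DATA_ENGINEER = 2
    LEAD_DATA_ENGINEER = 3
    ADMIN = 4

def has_permission(user_role: Role, required: Role) -> bool:
    # Each role inherits everything below it, so a single
    # level comparison implements the entire hierarchy.
    return user_role >= required

# An Admin can do anything a DataEngineer can ...
assert has_permission(Role.ADMIN, Role.DATA_ENGINEER)
# ... but an Auditor cannot manage pipelines.
assert not has_permission(Role.AUDITOR, Role.DATA_ENGINEER)
```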

Database

The Rust backend supports both SQLite and PostgreSQL via the DATABASE_URL environment variable.

  • SQLite (default): sqlite://drls.db (the database file is created automatically if it doesn't exist)
  • PostgreSQL: postgres://user:password@host:5432/dbname (requires an existing PostgreSQL database)

# SQLite (default)
DATABASE_URL=sqlite://drls.db cargo run

# PostgreSQL
DATABASE_URL=postgres://drls:secret@localhost:5432/drls cargo run

The backend detects the database type from the URL scheme at startup and runs dialect-appropriate migrations automatically.
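The dispatch itself is a simple check on the URL scheme; a Python sketch of the same logic (illustrative, the backend does this in Rust):

```python
def database_kind(url: str) -> str:
    # Dispatch on the DATABASE_URL scheme, mirroring the backend's startup check.
    if url.startswith("sqlite://"):
        return "sqlite"
    if url.startswith("postgres://"):
        return "postgres"
    raise ValueError(f"unsupported DATABASE_URL scheme: {url}")
```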

Python Environment Variables

  • DRLS_CATALOG_TYPE: Default catalog type
  • DRLS_WAREHOUSE: Default warehouse path
  • DRLS_CATALOG_URI: Default catalog URI
  • JAVA_HOME: Path to JDK 17 (required for Spark)
  • DRLS_E2E: Set to 1 to enable end-to-end tests

Streaming Configuration

StreamCoordinator

  • max_buffered_batches (default: 64): Maximum number of Arrow batches buffered
  • max_buffered_bytes (default: 2 * 1024**3, i.e. 2 GB): Maximum buffer size in bytes
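The two limits bound the in-flight buffer in both batch count and total bytes; whichever is hit first applies back-pressure. A minimal sketch of that dual-limit check (illustrative only, not the actual coordinator):

```python
from collections import deque

class BoundedBatchBuffer:
    """Accept batches until either the count or the byte limit is hit."""

    def __init__(self, max_batches: int = 64, max_bytes: int = 2 * 1024**3):
        self.max_batches = max_batches
        self.max_bytes = max_bytes
        self.batches: deque = deque()
        self.total_bytes = 0

    def try_put(self, batch: bytes) -> bool:
        # Reject (back-pressure) when either limit would be exceeded.
        if len(self.batches) >= self.max_batches:
            return False
        if self.total_bytes + len(batch) > self.max_bytes:
            return False
        self.batches.append(batch)
        self.total_bytes += len(batch)
        return True

buf = BoundedBatchBuffer(max_batches=2, max_bytes=10)
assert buf.try_put(b"12345")   # fits
assert buf.try_put(b"123")     # fits
assert not buf.try_put(b"1")   # batch-count limit reached
```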

Forward Bridge (from_spark_streaming)

  • stream_id (default: auto-generated): Name for the Ray coordinator actor
  • max_buffered_batches (default: 64): Buffer capacity
  • max_buffered_bytes (default: 2 GB): Buffer size limit
  • trigger (default: None): Spark trigger configuration
  • checkpoint_location (default: None): Streaming checkpoint path
  • use_jvm_sink (default: False): Use distributed JVM sink (bypasses the driver)

Reverse Bridge (to_spark_streaming)

  • stream_id (default: auto-generated): Name for the Ray coordinator actor
  • max_buffered_batches (default: 64): Buffer capacity
  • max_buffered_bytes (default: 2 GB): Buffer size limit