Configuration Reference

Iceberg Catalog Properties

Configure via IcebergCatalogConfig or init_spark() parameters:

spark = drls.init_spark(
    "my-app", 2, 1, "1G",
    iceberg_catalog="hadoop",
    iceberg_warehouse="/tmp/warehouse",
    iceberg_catalog_name="drls",
    iceberg_catalog_uri=None,
    iceberg_catalog_props={"key": "value"},
)

Catalog Type Properties

IcebergCatalogConfig(
    catalog_type="hadoop",
    warehouse="/tmp/warehouse",
)

Spark configs generated (the drls prefix comes from the catalog name):

spark.sql.catalog.drls = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.drls.type = hadoop
spark.sql.catalog.drls.warehouse = /tmp/warehouse

# Hive metastore catalog
IcebergCatalogConfig(
    catalog_type="hive",
    warehouse="s3://bucket/warehouse",
    uri="thrift://metastore:9083",
)

# REST catalog
IcebergCatalogConfig(
    catalog_type="rest",
    warehouse="s3://bucket/warehouse",
    uri="https://catalog.example.com",
)

# Apache Polaris catalog
IcebergCatalogConfig(
    catalog_type="polaris",
    warehouse="s3://bucket/warehouse",
    uri="https://polaris.example.com",
)

# AWS Glue catalog (no URI required)
IcebergCatalogConfig(
    catalog_type="glue",
    warehouse="s3://bucket/warehouse",
)

# Nessie catalog
IcebergCatalogConfig(
    catalog_type="nessie",
    warehouse="s3://bucket/warehouse",
    uri="https://nessie.example.com/api/v1",
)

Spark Session Configuration

Pass additional Spark configuration via the configs parameter:

spark = drls.init_spark(
    "my-app", 2, 1, "1G",
    configs={
        "spark.sql.shuffle.partitions": "10",
        "spark.sql.adaptive.enabled": "true",
    },
)

Rust Backend Environment Variables

  • FRONTEND_DIR (default: ../frontend/dist): Path to built React assets
  • DATABASE_URL (default: sqlite://drls.db): SQLx database connection string (sqlite:// or postgres://)
  • AUTH_ENABLED (default: false): Enable JWT auth middleware (requires AUTH_SECRET)
  • AUTH_SECRET (no default): HMAC-SHA256 secret for JWT validation (required when AUTH_ENABLED=true)
  • RUST_LOG (default: info): Tracing log level filter
  • PYO3_PYTHON (default: system default): Python interpreter for PyO3

JWT Authentication

When AUTH_ENABLED=true, the Rust backend validates JWT tokens on all /api/* routes.

  • Algorithm: HS256 (HMAC-SHA256)
  • Secret: Set via AUTH_SECRET environment variable
  • Required claims: sub (subject/user ID), exp (expiry as UTC timestamp)
  • Optional claims: iat (issued at), role (user role)

Tokens must be passed in the Authorization header as Bearer <token>. Expired or incorrectly signed tokens are rejected with HTTP 401. If AUTH_ENABLED=true but AUTH_SECRET is not set, all requests are rejected (fail closed).

RBAC Roles

The system uses a 4-tier role hierarchy where each role inherits all permissions of the roles below it:

  • Admin (level 4): User management, configuration, agent chat, plus all lower
  • LeadDataEngineer (level 3): Approvals, demote/retire, schema evolution, compaction, plus all lower
  • DataEngineer (level 2): Pipeline CRUD, versioning, promote, start/stop pipelines, notebooks, plus all lower
  • Auditor (level 1): Read-only access to tables, pipelines, definitions, versions

The first user to register is automatically granted Admin. Subsequent users default to Auditor.
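Because each role strictly contains the ones below it, the whole hierarchy reduces to a numeric comparison on the level column. A minimal Python sketch of that check (illustrative only; the backend implements this in Rust):

```python
from enum import IntEnum

class Role(IntEnum):
    # Levels match the table above.
    AUDITOR = 1
    DATA_ENGINEER = 2
    LEAD_DATA_ENGINEER = 3
    ADMIN = 4

def has_permission(user_role: Role, required: Role) -> bool:
    # Each role inherits everything below it, so a single
    # level comparison implements the entire hierarchy.
    return user_role >= required

# An Admin can do anything a DataEngineer can ...
assert has_permission(Role.ADMIN, Role.DATA_ENGINEER)
# ... but an Auditor cannot manage pipelines.
assert not has_permission(Role.AUDITOR, Role.DATA_ENGINEER)
```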

Database

The Rust backend supports both SQLite and PostgreSQL via the DATABASE_URL environment variable.

  • SQLite (default): sqlite://drls.db (the database file is created automatically if it doesn't exist)
  • PostgreSQL: postgres://user:password@host:5432/dbname (requires an existing PostgreSQL database)

# SQLite (default)
DATABASE_URL=sqlite://drls.db cargo run

# PostgreSQL
DATABASE_URL=postgres://drls:secret@localhost:5432/drls cargo run

The backend detects the database type from the URL scheme at startup and runs dialect-appropriate migrations automatically.
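The dispatch itself is a simple check on the URL scheme; a Python sketch of the same logic (illustrative, the backend does this in Rust):

```python
def database_kind(url: str) -> str:
    # Dispatch on the DATABASE_URL scheme, mirroring the backend's startup check.
    if url.startswith("sqlite://"):
        return "sqlite"
    if url.startswith("postgres://"):
        return "postgres"
    raise ValueError(f"unsupported DATABASE_URL scheme: {url}")
```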

Python Environment Variables

  • DRLS_CATALOG_TYPE: Default catalog type
  • DRLS_WAREHOUSE: Default warehouse path
  • DRLS_CATALOG_URI: Default catalog URI
  • JAVA_HOME: Path to JDK 17 (required for Spark)
  • DRLS_E2E: Set to 1 to enable end-to-end tests

Streaming Configuration

StreamCoordinator

  • max_buffered_batches (default: 64): Maximum number of Arrow batches buffered
  • max_buffered_bytes (default: 2 * 1024**3, i.e. 2 GB): Maximum buffer size in bytes
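The two limits bound the in-flight buffer in both batch count and total bytes; whichever is hit first applies back-pressure. A minimal sketch of that dual-limit check (illustrative only, not the actual coordinator):

```python
from collections import deque

class BoundedBatchBuffer:
    """Accept batches until either the count or the byte limit is hit."""

    def __init__(self, max_batches: int = 64, max_bytes: int = 2 * 1024**3):
        self.max_batches = max_batches
        self.max_bytes = max_bytes
        self.batches: deque = deque()
        self.total_bytes = 0

    def try_put(self, batch: bytes) -> bool:
        # Reject (back-pressure) when either limit would be exceeded.
        if len(self.batches) >= self.max_batches:
            return False
        if self.total_bytes + len(batch) > self.max_bytes:
            return False
        self.batches.append(batch)
        self.total_bytes += len(batch)
        return True

buf = BoundedBatchBuffer(max_batches=2, max_bytes=10)
assert buf.try_put(b"12345")   # fits
assert buf.try_put(b"123")     # fits
assert not buf.try_put(b"1")   # batch-count limit reached
```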

Forward Bridge (from_spark_streaming)

  • stream_id (default: auto-generated): Name for the Ray coordinator actor
  • max_buffered_batches (default: 64): Buffer capacity
  • max_buffered_bytes (default: 2 GB): Buffer size limit
  • trigger (default: None): Spark trigger configuration
  • checkpoint_location (default: None): Streaming checkpoint path
  • use_jvm_sink (default: False): Use distributed JVM sink (bypasses the driver)

Reverse Bridge (to_spark_streaming)

  • stream_id (default: auto-generated): Name for the Ray coordinator actor
  • max_buffered_batches (default: 64): Buffer capacity
  • max_buffered_bytes (default: 2 GB): Buffer size limit