# Configuration Reference

## Iceberg Catalog Properties

Configure via `IcebergCatalogConfig` or `init_spark()` parameters:
```python
spark = drls.init_spark(
    "my-app", 2, 1, "1G",
    iceberg_catalog="hadoop",
    iceberg_warehouse="/tmp/warehouse",
    iceberg_catalog_name="drls",
    iceberg_catalog_uri=None,
    iceberg_catalog_props={"key": "value"},
)
```
### Catalog Type Properties

Spark configs generated:
## Spark Session Configuration

Pass additional Spark configuration via the `configs` parameter:
```python
spark = drls.init_spark(
    "my-app", 2, 1, "1G",
    configs={
        "spark.sql.shuffle.partitions": "10",
        "spark.sql.adaptive.enabled": "true",
    },
)
```
## Rust Backend Environment Variables

| Variable | Default | Description |
|---|---|---|
| `FRONTEND_DIR` | `../frontend/dist` | Path to built React assets |
| `DATABASE_URL` | `sqlite://drls.db` | SQLx database connection string (`sqlite://` or `postgres://`) |
| `AUTH_ENABLED` | `false` | Enable JWT auth middleware (requires `AUTH_SECRET`) |
| `AUTH_SECRET` | — | HMAC-SHA256 secret for JWT validation (required when `AUTH_ENABLED=true`) |
| `RUST_LOG` | `info` | Tracing log level filter |
| `PYO3_PYTHON` | system default | Python interpreter for PyO3 |
### JWT Authentication

When `AUTH_ENABLED=true`, the Rust backend validates JWT tokens on all `/api/*` routes.

- **Algorithm**: HS256 (HMAC-SHA256)
- **Secret**: set via the `AUTH_SECRET` environment variable
- **Required claims**: `sub` (subject/user ID), `exp` (expiry as a UTC timestamp)
- **Optional claims**: `iat` (issued at), `role` (user role)

Tokens must be passed in the `Authorization` header as `Bearer <token>`. Expired or incorrectly signed tokens are rejected with HTTP 401. If `AUTH_ENABLED=true` but `AUTH_SECRET` is not set, all requests are rejected (fail closed).
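For local testing, a token matching these requirements can be minted with the Python standard library alone. This is a sketch: the `make_jwt` helper is illustrative (not part of drls), and a maintained JWT library is preferable in real code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_jwt(secret: str, sub: str, ttl_seconds: int = 3600,
             role: Optional[str] = None) -> str:
    """Build an HS256 JWT carrying the claims the backend checks:
    required sub/exp, optional iat/role."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    claims = {"sub": sub, "exp": now + ttl_seconds, "iat": now}
    if role is not None:
        claims["role"] = role
    signing_input = (
        f"{b64url(json.dumps(header).encode())}"
        f".{b64url(json.dumps(claims).encode())}"
    )
    sig = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"


token = make_jwt("my-secret", sub="user-123", role="DataEngineer")
headers = {"Authorization": f"Bearer {token}"}
```

The signature covers `header.payload`, so changing any claim invalidates the token, which is what triggers the 401 described above.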
## RBAC Roles
The system uses a 4-tier role hierarchy where each role inherits all permissions of the roles below it:
| Role | Level | Permissions |
|---|---|---|
| Admin | 4 | User management, configuration, agent chat, all lower |
| LeadDataEngineer | 3 | Approvals, demote/retire, schema evolution, compaction, all lower |
| DataEngineer | 2 | Pipeline CRUD, versioning, promote, start/stop pipelines, notebooks, all lower |
| Auditor | 1 | Read-only access to tables, pipelines, definitions, versions |
The first user to register is automatically granted Admin. Subsequent users default to Auditor.
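Because each role inherits everything below it, a permission check reduces to a numeric comparison on the level column. A minimal sketch (the `Role` enum and `has_permission` helper are illustrative, not the backend's actual types):

```python
from enum import IntEnum


class Role(IntEnum):
    # Levels mirror the table above; higher roles inherit lower permissions.
    AUDITOR = 1
    DATA_ENGINEER = 2
    LEAD_DATA_ENGINEER = 3
    ADMIN = 4


def has_permission(user_role: Role, required: Role) -> bool:
    """A role satisfies any requirement at or below its own level."""
    return user_role >= required


# An Admin can do everything a DataEngineer can; the reverse is false.
assert has_permission(Role.ADMIN, Role.DATA_ENGINEER)
assert not has_permission(Role.AUDITOR, Role.DATA_ENGINEER)
```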
## Database

The Rust backend supports both SQLite and PostgreSQL via the `DATABASE_URL` environment variable.

- **SQLite** (default): `sqlite://drls.db`. The database file is created automatically if it doesn't exist.
- **PostgreSQL**: `postgres://user:password@host:5432/dbname`. Requires an existing PostgreSQL database.

```bash
# SQLite (default)
DATABASE_URL=sqlite://drls.db cargo run

# PostgreSQL
DATABASE_URL=postgres://drls:secret@localhost:5432/drls cargo run
```
The backend detects the database type from the URL scheme at startup and runs dialect-appropriate migrations automatically.
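The scheme-based detection can be pictured as follows. This is a Python sketch of the idea only; the real check lives in the Rust backend, and `database_kind` is a hypothetical name.

```python
from urllib.parse import urlsplit


def database_kind(url: str) -> str:
    """Classify a DATABASE_URL by its scheme, mirroring the startup check."""
    scheme = urlsplit(url).scheme
    if scheme == "sqlite":
        return "sqlite"
    if scheme == "postgres":
        return "postgres"
    raise ValueError(f"unsupported DATABASE_URL scheme: {scheme!r}")
```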
## Python Environment Variables

| Variable | Description |
|---|---|
| `DRLS_CATALOG_TYPE` | Default catalog type |
| `DRLS_WAREHOUSE` | Default warehouse path |
| `DRLS_CATALOG_URI` | Default catalog URI |
| `JAVA_HOME` | Path to JDK 17 (required for Spark) |
| `DRLS_E2E` | Set to `1` to enable end-to-end tests |
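Reading these in a script follows the usual `os.environ` pattern. The fallback values below are illustrative, not drls defaults; only the `DRLS_*` names come from the table above.

```python
import os

# Resolve catalog settings from the environment, falling back to
# illustrative defaults when a variable is unset.
catalog_type = os.environ.get("DRLS_CATALOG_TYPE", "hadoop")
warehouse = os.environ.get("DRLS_WAREHOUSE", "/tmp/warehouse")
catalog_uri = os.environ.get("DRLS_CATALOG_URI")  # None when unset

# End-to-end tests run only when DRLS_E2E is exactly "1".
run_e2e = os.environ.get("DRLS_E2E") == "1"
```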
## Streaming Configuration

### StreamCoordinator

| Parameter | Default | Description |
|---|---|---|
| `max_buffered_batches` | `64` | Maximum number of Arrow batches buffered |
| `max_buffered_bytes` | `2 * 1024**3` (2 GB) | Maximum buffer size in bytes |
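The two limits act together: a batch is buffered only if it keeps both the batch count and the total byte size under their caps. A minimal sketch of that backpressure rule (the `BoundedBatchBuffer` class is illustrative, not the actual coordinator):

```python
from collections import deque


class BoundedBatchBuffer:
    """Enforces a batch-count cap and a byte cap; whichever is hit
    first causes new batches to be refused."""

    def __init__(self, max_batches: int = 64, max_bytes: int = 2 * 1024**3):
        self.max_batches = max_batches
        self.max_bytes = max_bytes
        self._batches = deque()
        self._bytes = 0

    def try_put(self, batch: bytes) -> bool:
        """Accept a batch only if both limits still hold afterwards."""
        if len(self._batches) >= self.max_batches:
            return False
        if self._bytes + len(batch) > self.max_bytes:
            return False
        self._batches.append(batch)
        self._bytes += len(batch)
        return True


buf = BoundedBatchBuffer(max_batches=2, max_bytes=10)
assert buf.try_put(b"12345")        # 5 bytes, 1 batch: accepted
assert not buf.try_put(b"1234567")  # would exceed the 10-byte cap: refused
```

In the real coordinator a refused producer would block or retry rather than drop data; the sketch only shows the admission rule.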
### Forward Bridge (`from_spark_streaming`)

| Parameter | Default | Description |
|---|---|---|
| `stream_id` | auto-generated | Name for the Ray coordinator actor |
| `max_buffered_batches` | `64` | Buffer capacity |
| `max_buffered_bytes` | 2 GB | Buffer size limit |
| `trigger` | `None` | Spark trigger configuration |
| `checkpoint_location` | `None` | Streaming checkpoint path |
| `use_jvm_sink` | `False` | Use distributed JVM sink (bypasses driver) |
### Reverse Bridge (`to_spark_streaming`)

| Parameter | Default | Description |
|---|---|---|
| `stream_id` | auto-generated | Name for the Ray coordinator actor |
| `max_buffered_batches` | `64` | Buffer capacity |
| `max_buffered_bytes` | 2 GB | Buffer size limit |