Docker Images¶
DRLS ships four Docker images for Kubernetes deployment.
Images Overview¶
| Image | Dockerfile | Base | Purpose |
|---|---|---|---|
| `drls/spark` | `docker/spark.Dockerfile` | `ubuntu:24.04` | Spark executors and drivers on K8s |
| `drls/ray-cpu` | `docker/ray-cpu.Dockerfile` | `rayproject/ray:2.53.0-py312-cpu` | Ray CPU workers |
| `drls/ray-gpu` | `docker/ray-gpu.Dockerfile` | `rayproject/ray:2.53.0-py312-gpu` | Ray GPU workers (CUDA-enabled) |
| `drls/pipeline-operator` | `docker/pipeline-operator.Dockerfile` | `rust:1.83-slim` (build) / `debian:bookworm-slim` (runtime) | Pipeline operator controller |
Prerequisites¶
All images are built from the project root as the Docker build context.
Spark image¶
The Spark image requires artifacts to be built before running `docker build`:

- **JVM JARs**: run the Maven build to produce the three DRLS JARs.
- **Spark binary distribution**: download the Spark distribution and place it in the `docker/` directory.
- **Python package**: the `python/` directory is copied into the image during the build; no pre-build step is needed.
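Assembled as shell commands, the pre-build steps might look like this. This is a sketch: the Maven invocation, Spark version pin, and download URL are assumptions, not taken from the project.

```shell
# Hypothetical version pin; match whatever the Spark image's Dockerfile expects.
SPARK_VERSION=4.1.1
SPARK_DIST="spark-${SPARK_VERSION}-bin-hadoop3"

# 1. Build the DRLS JARs (module layout assumed; adjust flags as needed).
mvn -DskipTests package

# 2. Download the Spark binary distribution into docker/.
curl -fLo "docker/${SPARK_DIST}.tgz" \
  "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_DIST}.tgz"
```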
Java Version
The Maven build requires Java 21. If multiple JDK versions are installed, set `JAVA_HOME` explicitly.
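For example (the JDK path below is an assumption; it varies by platform and packaging):

```shell
# Point the Maven build at a Java 21 JDK.
export JAVA_HOME=/usr/lib/jvm/java-21-openjdk
export PATH="$JAVA_HOME/bin:$PATH"
# `mvn -version` should now report Java 21.
```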
Ray images¶
The Ray images need only the `python/` directory; no JVM build is required.
Pipeline operator image¶
The pipeline operator image builds the Rust binary inside the Docker build. No pre-build step is needed, but the `rust/` workspace must be present.
Build Commands¶
All commands run from the project root:
```bash
# Spark executor/driver image
docker build -f docker/spark.Dockerfile -t drls/spark:latest .

# Ray CPU worker
docker build -f docker/ray-cpu.Dockerfile -t drls/ray-cpu:latest .

# Ray GPU worker (CUDA-enabled)
docker build -f docker/ray-gpu.Dockerfile -t drls/ray-gpu:latest .

# Pipeline operator
docker build -f docker/pipeline-operator.Dockerfile -t drls/pipeline-operator:latest .
```
Build Context
The `.dockerignore` at the project root excludes `.git/`, `rust/target/`, `node_modules/`, and other non-essential files. It preserves `core/*/target/` since the Spark image needs the JVM JARs.
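The described rules correspond to a `.dockerignore` along these lines (illustrative only; the real file likely lists more entries, and relies on `!` negation re-including paths matched by an earlier pattern):

```
.git/
node_modules/
**/target/
!core/*/target/
```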
Image Contents¶
Spark¶
- OpenJDK 21 JRE
- Spark 4.1.1 binary distribution at `/opt/spark`
- Iceberg Spark runtime JAR (`iceberg-spark-runtime-4.0_2.13-1.10.1.jar`)
- DRLS JARs (shim, main, agent) in `$SPARK_HOME/jars/`
- DRLS Python package
- Pipeline entrypoints in `$SPARK_HOME/entrypoints/`
- K8s entrypoint and decommission scripts at `/opt/entrypoint.sh` and `/opt/decom.sh`
Spark Operator Compatibility
The Spark image runs as uid 185 (the `spark` user) and uses `/opt/entrypoint.sh`, the standard convention expected by the Spark Operator. The work directory at `$SPARK_HOME/work-dir` is group-writable for OpenShift compatibility.
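In Dockerfile terms, the conventions above might look like the following fragment (an illustrative sketch, not the actual contents of `docker/spark.Dockerfile`):

```dockerfile
# Group-writable work dir so arbitrary-UID runtimes (e.g. OpenShift) can write to it.
RUN mkdir -p "$SPARK_HOME/work-dir" && chmod g+w "$SPARK_HOME/work-dir"

# Run as the spark user (uid 185), as the Spark Operator expects.
USER 185

# Standard entrypoint location the Spark Operator invokes.
ENTRYPOINT ["/opt/entrypoint.sh"]
```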
Ray CPU / GPU¶
- Official Ray 2.53.0 base image (Python 3.12)
- DRLS Python package
- Pipeline entrypoints in `/opt/drls/entrypoints/`
- Runs as the `ray` user
The GPU image is identical to the CPU image except that its base image includes CUDA libraries.
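As a usage sketch, a KubeRay `RayCluster` worker group could reference the CPU image like this (field names follow the KubeRay CRD; the group name and replica count are arbitrary examples):

```yaml
spec:
  workerGroupSpecs:
    - groupName: cpu-workers
      replicas: 2
      template:
        spec:
          containers:
            - name: ray-worker
              image: drls/ray-cpu:latest
```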
Pipeline Operator¶
- Multi-stage build: Rust 1.83 builder, Debian Bookworm slim runtime
- Single static binary at `/usr/local/bin/drls-pipeline-operator`
- Minimal runtime dependencies (`ca-certificates`, `libssl3`)
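The multi-stage layout described above can be sketched as follows. This is illustrative: beyond the base images and the binary's install path stated above, the workspace layout and build flags are assumptions.

```dockerfile
FROM rust:1.83-slim AS builder
WORKDIR /build
COPY rust/ .
RUN cargo build --release

FROM debian:bookworm-slim
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates libssl3 \
 && rm -rf /var/lib/apt/lists/*
COPY --from=builder /build/target/release/drls-pipeline-operator /usr/local/bin/drls-pipeline-operator
ENTRYPOINT ["/usr/local/bin/drls-pipeline-operator"]
```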
Next Steps¶
- Installation — Install the Python package locally
- Dev Setup — Full development environment setup