
Docker Images

DRLS ships four Docker images for Kubernetes deployment.

Images Overview

| Image | Dockerfile | Base | Purpose |
|---|---|---|---|
| drls/spark | docker/spark.Dockerfile | ubuntu:24.04 | Spark executors and drivers on K8s |
| drls/ray-cpu | docker/ray-cpu.Dockerfile | rayproject/ray:2.53.0-py312-cpu | Ray CPU workers |
| drls/ray-gpu | docker/ray-gpu.Dockerfile | rayproject/ray:2.53.0-py312-gpu | Ray GPU workers (CUDA-enabled) |
| drls/pipeline-operator | docker/pipeline-operator.Dockerfile | rust:1.83-slim (build) / debian:bookworm-slim (runtime) | Pipeline operator controller |

Prerequisites

All images are built from the project root as the Docker build context.

Spark image

The Spark image requires artifacts that must be built before docker build runs:

  1. JVM JARs — Run the Maven build to produce the three DRLS JARs:

    cd core && mvn clean package -DskipTests
    
  2. Spark binary distribution — Download and place in the docker/ directory:

    curl -o docker/spark-4.1.1-bin-hadoop3.tgz \
      https://archive.apache.org/dist/spark/spark-4.1.1/spark-4.1.1-bin-hadoop3.tgz
    
  3. Python package — The python/ directory is copied into the image during build. No pre-build step needed.
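The steps above can be sketched as a single helper script. This is illustrative only: the script name, the RUN dry-run guard, and the mvn -f invocation (equivalent to cd core && mvn) are assumptions, not part of the repository.

```shell
#!/usr/bin/env bash
# prep-spark-image.sh (hypothetical name): run the Spark image prerequisites.
# Dry run by default: commands are printed, not executed, unless RUN=1.
set -euo pipefail

SPARK_VERSION="4.1.1"   # must match the version the Dockerfile expects
TARBALL="spark-${SPARK_VERSION}-bin-hadoop3.tgz"
URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${TARBALL}"

run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

# 1. JVM JARs (equivalent to: cd core && mvn clean package -DskipTests)
run mvn -f core/pom.xml clean package -DskipTests

# 2. Spark binary distribution into docker/
run curl -o "docker/${TARBALL}" "${URL}"

# 3. Python package: python/ is copied during docker build; nothing to do.
```

Run it once in dry-run mode to review the commands, then re-run with RUN=1 to execute them.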

Java Version

The Maven build requires Java 21. If you have multiple JDK versions, set JAVA_HOME explicitly:

JAVA_HOME=/path/to/jdk-21 mvn clean package -DskipTests

Ray images

The Ray images only need the python/ directory — no JVM build required.

Pipeline operator image

The pipeline operator image builds the Rust binary inside the Docker build. No pre-build step needed, but the rust/ workspace must be present.

Build Commands

All commands run from the project root:

# Spark executor/driver image
docker build -f docker/spark.Dockerfile -t drls/spark:latest .

# Ray CPU worker
docker build -f docker/ray-cpu.Dockerfile -t drls/ray-cpu:latest .

# Ray GPU worker (CUDA-enabled)
docker build -f docker/ray-gpu.Dockerfile -t drls/ray-gpu:latest .

# Pipeline operator
docker build -f docker/pipeline-operator.Dockerfile -t drls/pipeline-operator:latest .
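The four builds can also be driven by one loop. A sketch, assuming the DOCKER override shown here (not a repository convention); by default it only prints the commands.

```shell
# Build every DRLS image from the project root.
# DOCKER defaults to "echo docker", so the commands only print;
# set DOCKER=docker to actually build.
set -euo pipefail
DOCKER="${DOCKER:-echo docker}"
for img in spark ray-cpu ray-gpu pipeline-operator; do
  $DOCKER build -f "docker/${img}.Dockerfile" -t "drls/${img}:latest" .
done
```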

Build Context

The .dockerignore at the project root excludes .git/, rust/target/, node_modules/, and other non-essential files. It preserves core/*/target/ since the Spark image needs the JVM JARs.
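As an illustration, the exclude/preserve split might look like this in a .dockerignore. This is a sketch of the pattern, not the repository's actual file:

```
# Illustrative .dockerignore (actual contents may differ)
.git/
node_modules/
rust/target/
# Note: core/*/target/ is deliberately NOT ignored -- the Spark image
# copies the DRLS JARs from there. If a blanket rule like **/target/
# were used instead, it would need a negation:
#   **/target/
#   !core/*/target/
```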

Image Contents

Spark

  • OpenJDK 21 JRE
  • Spark 4.1.1 binary distribution at /opt/spark
  • Iceberg Spark runtime JAR (iceberg-spark-runtime-4.0_2.13-1.10.1.jar)
  • DRLS JARs (shim, main, agent) in $SPARK_HOME/jars/
  • DRLS Python package
  • Pipeline entrypoints in $SPARK_HOME/entrypoints/
  • K8s entrypoint and decommission scripts at /opt/entrypoint.sh and /opt/decom.sh

Spark Operator Compatibility

The Spark image runs as uid 185 (the spark user) and uses /opt/entrypoint.sh — the standard convention expected by the Spark Operator. The work directory at $SPARK_HOME/work-dir is group-writable for OpenShift compatibility.
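For reference, a Spark Operator SparkApplication consuming this image might look like the following sketch. The metadata name, service account, application file path, and resource sizes are placeholders; only the image tag, entrypoint location, and Spark version come from this page.

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: drls-example                 # placeholder
spec:
  type: Python
  mode: cluster
  image: drls/spark:latest
  imagePullPolicy: IfNotPresent
  mainApplicationFile: local:///opt/spark/entrypoints/example.py  # placeholder path
  sparkVersion: "4.1.1"
  driver:
    cores: 1
    memory: "2g"
    serviceAccount: spark            # placeholder
  executor:
    instances: 2
    cores: 2
    memory: "4g"
```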

Ray CPU / GPU

  • Official Ray 2.53.0 base image (Python 3.12)
  • DRLS Python package
  • Pipeline entrypoints in /opt/drls/entrypoints/
  • Runs as the ray user

The GPU image is identical to the CPU image except that its base image includes CUDA libraries.

Pipeline Operator

  • Multi-stage build: Rust 1.83 builder, Debian Bookworm slim runtime
  • Single static binary at /usr/local/bin/drls-pipeline-operator
  • Minimal runtime dependencies (ca-certificates, libssl3)
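The multi-stage layout described above can be sketched as follows. This is illustrative: the workdir, the assumed cargo package name, and the apt invocation may differ from the actual Dockerfile; the binary path and runtime packages follow this page.

```dockerfile
# --- Build stage ---
FROM rust:1.83-slim AS builder
WORKDIR /build
COPY rust/ .
RUN cargo build --release -p drls-pipeline-operator   # package name assumed

# --- Runtime stage ---
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates libssl3 \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /build/target/release/drls-pipeline-operator /usr/local/bin/
ENTRYPOINT ["/usr/local/bin/drls-pipeline-operator"]
```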

Next Steps