
Docker Images

DRLS ships four Docker images for Kubernetes deployment.

Images Overview

| Image | Dockerfile | Base | Purpose |
|---|---|---|---|
| drls/spark | docker/spark.Dockerfile | ubuntu:24.04 | Spark executors and drivers on K8s |
| drls/ray-cpu | docker/ray-cpu.Dockerfile | rayproject/ray:2.53.0-py312-cpu | Ray CPU workers |
| drls/ray-gpu | docker/ray-gpu.Dockerfile | rayproject/ray:2.53.0-py312-gpu | Ray GPU workers (CUDA-enabled) |
| drls/pipeline-operator | docker/pipeline-operator.Dockerfile | rust:1.83-slim (build) / debian:bookworm-slim (runtime) | Pipeline operator controller |

Prerequisites

All images are built from the project root as the Docker build context.

Spark image

The Spark image requires artifacts that must be built before docker build runs:

  1. JVM JARs — Run the Maven build to produce the three DRLS JARs:

    cd core && mvn clean package -DskipTests
    
  2. Spark binary distribution — Download and place in the docker/ directory:

    curl -o docker/spark-4.1.1-bin-hadoop3.tgz \
      https://archive.apache.org/dist/spark/spark-4.1.1/spark-4.1.1-bin-hadoop3.tgz
    
  3. Python package — The python/ directory is copied into the image during build. No pre-build step needed.
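The steps above can be sketched as a single helper script. This is illustrative only: the script name, the RUN dry-run guard, and the mvn -f invocation (equivalent to cd core && mvn) are assumptions, not part of the repository.

```shell
#!/usr/bin/env bash
# prep-spark-image.sh (hypothetical name): run the Spark image prerequisites.
# Dry run by default: commands are printed, not executed, unless RUN=1.
set -euo pipefail

SPARK_VERSION="4.1.1"   # must match the version the Dockerfile expects
TARBALL="spark-${SPARK_VERSION}-bin-hadoop3.tgz"
URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${TARBALL}"

run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

# 1. JVM JARs (equivalent to: cd core && mvn clean package -DskipTests)
run mvn -f core/pom.xml clean package -DskipTests

# 2. Spark binary distribution into docker/
run curl -o "docker/${TARBALL}" "${URL}"

# 3. Python package: python/ is copied during docker build; nothing to do.
```

Run it once in dry-run mode to review the commands, then re-run with RUN=1 to execute them.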

Java Version

The Maven build requires Java 21. If you have multiple JDK versions, set JAVA_HOME explicitly:

JAVA_HOME=/path/to/jdk-21 mvn clean package -DskipTests

Ray images

The Ray images only need the python/ directory — no JVM build required.

Pipeline operator image

The pipeline operator image builds the Rust binary inside the Docker build. No pre-build step needed, but the rust/ workspace must be present.

Build Commands

All commands run from the project root:

# Spark executor/driver image
docker build -f docker/spark.Dockerfile -t drls/spark:latest .

# Ray CPU worker
docker build -f docker/ray-cpu.Dockerfile -t drls/ray-cpu:latest .

# Ray GPU worker (CUDA-enabled)
docker build -f docker/ray-gpu.Dockerfile -t drls/ray-gpu:latest .

# Pipeline operator
docker build -f docker/pipeline-operator.Dockerfile -t drls/pipeline-operator:latest .
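The four builds can also be driven by one loop. A sketch, assuming the DOCKER override shown here (not a repository convention); by default it only prints the commands.

```shell
# Build every DRLS image from the project root.
# DOCKER defaults to "echo docker", so the commands only print;
# set DOCKER=docker to actually build.
set -euo pipefail
DOCKER="${DOCKER:-echo docker}"
for img in spark ray-cpu ray-gpu pipeline-operator; do
  $DOCKER build -f "docker/${img}.Dockerfile" -t "drls/${img}:latest" .
done
```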

Build Context

The .dockerignore at the project root excludes .git/, rust/target/, node_modules/, and other non-essential files. It preserves core/*/target/ since the Spark image needs the JVM JARs.
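As an illustration, the exclude/preserve split might look like this in a .dockerignore. This is a sketch of the pattern, not the repository's actual file:

```
# Illustrative .dockerignore (actual contents may differ)
.git/
node_modules/
rust/target/
# Note: core/*/target/ is deliberately NOT ignored -- the Spark image
# copies the DRLS JARs from there. If a blanket rule like **/target/
# were used instead, it would need a negation:
#   **/target/
#   !core/*/target/
```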

Image Contents

Spark

  • OpenJDK 21 JRE
  • Spark 4.1.1 binary distribution at /opt/spark
  • Iceberg Spark runtime JAR (iceberg-spark-runtime-4.0_2.13-1.10.1.jar)
  • DRLS JARs (shim, main, agent) in $SPARK_HOME/jars/
  • DRLS Python package
  • Pipeline entrypoints in $SPARK_HOME/entrypoints/
  • K8s entrypoint and decommission scripts at /opt/entrypoint.sh and /opt/decom.sh

Spark Operator Compatibility

The Spark image runs as uid 185 (the spark user) and uses /opt/entrypoint.sh — the standard convention expected by the Spark Operator. The work directory at $SPARK_HOME/work-dir is group-writable for OpenShift compatibility.
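For reference, a Spark Operator SparkApplication consuming this image might look like the following sketch. The metadata name, service account, application file path, and resource sizes are placeholders; only the image tag, entrypoint location, and Spark version come from this page.

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: drls-example                 # placeholder
spec:
  type: Python
  mode: cluster
  image: drls/spark:latest
  imagePullPolicy: IfNotPresent
  mainApplicationFile: local:///opt/spark/entrypoints/example.py  # placeholder path
  sparkVersion: "4.1.1"
  driver:
    cores: 1
    memory: "2g"
    serviceAccount: spark            # placeholder
  executor:
    instances: 2
    cores: 2
    memory: "4g"
```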

Ray CPU / GPU

  • Official Ray 2.53.0 base image (Python 3.12)
  • DRLS Python package
  • Pipeline entrypoints in /opt/drls/entrypoints/
  • Runs as the ray user

The GPU image is identical to the CPU image except that its base image includes CUDA libraries.

Pipeline Operator

  • Multi-stage build: Rust 1.83 builder, Debian Bookworm slim runtime
  • Single static binary at /usr/local/bin/drls-pipeline-operator
  • Minimal runtime dependencies (ca-certificates, libssl3)
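The multi-stage layout described above can be sketched as follows. This is illustrative: the workdir, the assumed cargo package name, and the apt invocation may differ from the actual Dockerfile; the binary path and runtime packages follow this page.

```dockerfile
# --- Build stage ---
FROM rust:1.83-slim AS builder
WORKDIR /build
COPY rust/ .
RUN cargo build --release -p drls-pipeline-operator   # package name assumed

# --- Runtime stage ---
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates libssl3 \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /build/target/release/drls-pipeline-operator /usr/local/bin/
ENTRYPOINT ["/usr/local/bin/drls-pipeline-operator"]
```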

Next Steps