Docker Security for ML Containers

Docker is the foundation of most ML container deployments. Hardening your Dockerfiles and runtime configuration is the first line of defense against container-based attacks.

Dockerfile Hardening for ML

ML Dockerfiles require special attention because of their large base images and complex dependency chains:

  1. Use Official, Pinned Base Images

    Always pin your CUDA and ML framework base images to specific digests rather than mutable tags. Use nvidia/cuda:12.2.0-runtime-ubuntu22.04@sha256:... instead of nvidia/cuda:latest. This prevents supply chain attacks via tag mutation.

  2. Multi-Stage Builds

    Separate your build stage (with compilers, build tools) from your runtime stage. This dramatically reduces image size and attack surface. Copy only the compiled artifacts and model files to the final stage.

  3. Run as Non-Root

    Create a dedicated user for your ML workload. Use USER mluser in your Dockerfile. GPU access does not require root — the NVIDIA Container Toolkit handles device permissions at the runtime level.

  4. Read-Only Root Filesystem

    Run containers with --read-only and mount writable volumes only where needed (model output, logs, checkpoints). This prevents attackers from modifying system binaries or installing tools.
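The four steps above can be combined in a single Dockerfile. This is a minimal sketch, not a complete build: the digest placeholders, `mluser` name, and file names (`requirements.txt`, `serve.py`, `model/`) are illustrative, and it assumes Python and pip are available in the base images.

```dockerfile
# 1. Pinned base image for the build stage.
#    <build-digest> is a placeholder — substitute the real sha256 digest.
FROM nvidia/cuda:12.2.0-devel-ubuntu22.04@sha256:<build-digest> AS build
WORKDIR /app
COPY requirements.txt .
# Compile wheels in the stage that has compilers and headers
RUN pip wheel --wheel-dir /wheels -r requirements.txt

# 2. Multi-stage: the runtime stage carries no compilers or build tools
FROM nvidia/cuda:12.2.0-runtime-ubuntu22.04@sha256:<runtime-digest>
WORKDIR /app
COPY --from=build /wheels /wheels
RUN pip install --no-index --find-links=/wheels /wheels/*.whl && rm -rf /wheels

# 3. Dedicated non-root user; GPU device permissions are handled
#    by the NVIDIA Container Toolkit, not by root
RUN useradd --create-home --uid 1000 mluser
USER mluser
COPY --chown=mluser:mluser serve.py model/ ./
ENTRYPOINT ["python3", "serve.py"]
```

Step 4 is applied at run time rather than in the Dockerfile, with flags along the lines of `docker run --read-only --tmpfs /tmp -v checkpoints:/app/output …`, mounting writable volumes only where the workload actually writes.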

Secrets Management for ML Pipelines

ML workloads commonly need credentials for data stores, model registries, and cloud APIs. Never embed these in your Docker images:

Critical Rule: Never use ENV, ARG, or COPY to embed API keys, database passwords, or cloud credentials in Docker images. Every layer is inspectable with docker history and can be extracted by anyone with access to the image.
| Method             | Use Case                                                        | Security Level |
|--------------------|-----------------------------------------------------------------|----------------|
| Docker Secrets     | Swarm mode deployments, simple key-value secrets                | Good           |
| Kubernetes Secrets | K8s deployments with encrypted etcd, external secret operators  | Good           |
| HashiCorp Vault    | Dynamic secrets, rotation, fine-grained access control          | Excellent      |
| Cloud KMS          | AWS KMS, GCP KMS, Azure Key Vault for cloud-native ML pipelines | Excellent      |
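Whichever backend stores the secret, build-time credentials should reach the build through BuildKit secret mounts rather than ENV or ARG, since the mount exists only for the duration of a single RUN step and is never written to a layer. A sketch — the `pip_token` id, package name, and internal index URL are illustrative assumptions:

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.11-slim
# The secret is mounted at /run/secrets/pip_token for this RUN step only
# and does not appear in `docker history` or any image layer
RUN --mount=type=secret,id=pip_token \
    PIP_INDEX_URL="https://user:$(cat /run/secrets/pip_token)@pypi.internal.example/simple" \
    pip install my-ml-package
```

The secret is supplied at build time with `docker build --secret id=pip_token,src=./token.txt .`, keeping the token out of both the Dockerfile and the image.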

GPU Passthrough Security

Configuring GPU access securely requires balancing performance with isolation:

  • Limit GPU visibility: Use NVIDIA_VISIBLE_DEVICES to expose only the specific GPUs a container needs, rather than all available devices
  • Enable MIG partitioning: On A100 and H100 GPUs, use Multi-Instance GPU to create hardware-isolated GPU partitions for different workloads
  • Restrict capabilities: Drop all Linux capabilities and add back only what is needed. ML inference typically needs no special capabilities
  • Disable inter-process communication: Use --ipc=none unless shared memory is specifically required for multi-GPU training
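The bullets above translate into run-time flags roughly as follows. This is an illustrative command for a single-GPU inference container; the image name and GPU index are placeholders, and `NVIDIA_VISIBLE_DEVICES` assumes the NVIDIA Container Toolkit runtime is configured.

```shell
# Hardened single-GPU run: one visible device, no capabilities,
# no privilege escalation, no shared IPC namespace
docker run \
  --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=0 \
  --cap-drop=ALL \
  --security-opt=no-new-privileges \
  --ipc=none \
  my-inference-image:1.0
```

Capabilities that a specific workload genuinely needs can be added back individually with `--cap-add`; for typical inference servers, none are required.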

Docker Compose Security for ML

When using Docker Compose for multi-container ML applications (API server, model server, data preprocessor), apply these practices:

Network Isolation

Create separate networks for frontend and backend services. The model serving container should not be directly accessible from the internet.
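A minimal sketch of this split in a `docker-compose.yml`, with illustrative service and image names: only the API service joins the frontend network and publishes a port, while the model server lives on an internal-only backend network.

```yaml
services:
  api:
    image: ml-api:1.0
    networks: [frontend, backend]
    ports:
      - "443:8443"        # only the API is published externally
  model-server:
    image: ml-model-server:1.0
    networks: [backend]   # reachable only via the backend network

networks:
  frontend:
  backend:
    internal: true        # containers on this network get no external access
```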

Resource Limits

Set memory and CPU limits for each service. For GPU services, use the deploy.resources.reservations.devices section to control GPU allocation.
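A sketch of per-service limits plus a GPU reservation under the Compose specification; the CPU, memory, and GPU values are illustrative and should be sized to the workload.

```yaml
services:
  model-server:
    image: ml-model-server:1.0
    deploy:
      resources:
        limits:
          cpus: "4.0"     # cap CPU to contain a runaway or compromised process
          memory: 8G
        reservations:
          devices:
            - driver: nvidia
              count: 1    # allocate exactly one GPU to this service
              capabilities: [gpu]
```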

Health Checks

Implement health checks for all services. A compromised container that stops responding to health checks can be automatically restarted.
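A health-check sketch for the model server; the `/healthz` endpoint, port, and timing values are assumptions, and the image is presumed to ship `curl`.

```yaml
services:
  model-server:
    image: ml-model-server:1.0
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/healthz"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 60s   # grace period while the model loads
    restart: unless-stopped
```

Note that plain Docker and Compose restart a container when it exits; acting on an *unhealthy* status (rather than an exit) generally requires an orchestrator such as Swarm or Kubernetes, or a companion watcher process.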

Logging Configuration

Configure centralized logging with size limits. ML training logs can grow very large and should be rotated to prevent disk exhaustion attacks.
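Log rotation can be enforced per service with the `json-file` driver; the size and file-count limits here are illustrative.

```yaml
services:
  trainer:
    image: ml-trainer:1.0
    logging:
      driver: json-file
      options:
        max-size: "50m"   # rotate each log file at 50 MB
        max-file: "5"     # keep at most five rotated files per container
```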

💡 Next Up: In the next lesson, we explore Kubernetes security for ML workloads — pod security standards, RBAC, network policies, and GPU scheduling security.