K8s Architecture for ML Beginners

Understanding Kubernetes architecture is essential for running ML workloads effectively. This lesson covers how the K8s control plane schedules GPU workloads, how device plugins expose accelerators, and the node topology for ML clusters.

K8s Components for ML

  • kube-scheduler: Assigns GPU pods to nodes with available GPU resources
  • kubelet: Manages pod lifecycle and GPU device allocation on each node
  • NVIDIA Device Plugin: Discovers GPUs and advertises nvidia.com/gpu as a schedulable resource
  • NVIDIA GPU Operator: Automates driver installation, the device plugin, and monitoring across GPU nodes
  • Container runtime: The NVIDIA Container Toolkit enables GPU access inside containers

GPU Scheduling Flow

  1. Pod requests GPU

    Pod spec includes nvidia.com/gpu: 1 in resource limits.

  2. Scheduler finds a node

    kube-scheduler finds a node with available GPU resources advertised by the device plugin.

  3. Kubelet allocates GPU

    The kubelet on the selected node allocates a specific GPU device to the pod via the device plugin.

  4. Container accesses GPU

    NVIDIA Container Toolkit mounts the GPU device and drivers into the container.
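The four steps above start with the pod spec in step 1. A minimal sketch of such a spec, written out via a heredoc so it can be applied with kubectl (the pod name and CUDA image tag are illustrative):

```shell
# Write a minimal pod spec that requests one GPU in its resource limits.
cat <<'EOF' > gpu-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda-smoke-test
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1   # scheduler only places this pod on a node advertising GPUs
EOF

# Submit it (requires a kubeconfig pointing at a GPU-enabled cluster):
# kubectl apply -f gpu-pod.yaml
```

If scheduling succeeds, the pod runs nvidia-smi inside the container, exercising steps 2 through 4 end to end.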

Node Topology for ML Clusters

Organize your cluster with separate node pools for different workload types:

  • System node pool: Small CPU instances for K8s system pods (CoreDNS, metrics-server)
  • CPU node pool: General-purpose nodes for data preprocessing and pipeline orchestration
  • GPU training pool: GPU nodes (A100, H100) with taints for training jobs only
  • GPU inference pool: GPU nodes (T4, L4) for model serving with autoscaling
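To keep the training pool dedicated, taint its nodes and give training pods a matching toleration plus a node selector. A sketch, assuming the pool's nodes carry the label node-pool=gpu-training and the taint workload=training:NoSchedule (node name, label, taint key, and image are all illustrative):

```shell
# Taint a training node so untolerating workloads cannot schedule there
# (requires cluster access; run once per node or via your node-pool config):
# kubectl taint nodes <gpu-train-node> workload=training:NoSchedule
# kubectl label nodes <gpu-train-node> node-pool=gpu-training

# Pod spec for a training job that tolerates the taint and selects the pool.
cat <<'EOF' > train-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: train-job
spec:
  nodeSelector:
    node-pool: gpu-training      # only land on training-pool nodes
  tolerations:
  - key: workload                # allow scheduling onto the tainted nodes
    operator: Equal
    value: training
    effect: NoSchedule
  containers:
  - name: trainer
    image: nvcr.io/nvidia/pytorch:24.05-py3
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
```

The taint keeps system and CPU pods off expensive A100/H100 nodes; the toleration alone would let the pod land anywhere, so the nodeSelector pins it to the intended pool.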

NVIDIA GPU Operator

Bash
# Install NVIDIA GPU Operator via Helm
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --create-namespace \
  --set driver.enabled=true \
  --set toolkit.enabled=true \
  --set devicePlugin.enabled=true \
  --set dcgmExporter.enabled=true

Best Practice: Use the NVIDIA GPU Operator instead of manually installing drivers on each node. It handles driver installation, device plugin deployment, container toolkit configuration, and the DCGM metrics exporter as a unified solution.
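The --set flags in the install command above can equivalently live in a values file, which is easier to version-control. A sketch (these keys match the flags used above; the chart's defaults already enable most of these components, so the file mainly makes the choices explicit):

```shell
# Equivalent Helm values file for the gpu-operator install shown above.
cat <<'EOF' > gpu-operator-values.yaml
driver:
  enabled: true        # operator installs the NVIDIA driver on GPU nodes
toolkit:
  enabled: true        # NVIDIA Container Toolkit for GPU access in containers
devicePlugin:
  enabled: true        # advertises nvidia.com/gpu to the scheduler
dcgmExporter:
  enabled: true        # DCGM metrics exporter for GPU monitoring
EOF

# Install with the values file (requires cluster access):
# helm install gpu-operator nvidia/gpu-operator \
#   --namespace gpu-operator --create-namespace \
#   -f gpu-operator-values.yaml

# After install, confirm a node advertises GPU capacity:
# kubectl describe node <gpu-node> | grep 'nvidia.com/gpu'
```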