Resource Quotas for ML
Resource quotas prevent any single team from consuming all GPU resources in a shared cluster. Combined with limit ranges and priority classes, quotas ensure fair resource distribution while allowing burst capacity for important training jobs.
GPU Resource Quota
YAML
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: ml-team-nlp
spec:
  hard:
    # Extended resources such as GPUs accept only the "requests." prefix in a
    # quota; a "limits.nvidia.com/gpu" entry is not allowed.
    requests.nvidia.com/gpu: "4"
    requests.cpu: "64"
    requests.memory: "256Gi"
    limits.cpu: "128"
    limits.memory: "512Gi"
    pods: "20"
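To illustrate how this quota is consumed, here is a sketch of a Pod that would count against it. The pod name, image, and resource sizes are hypothetical. Because the quota constrains CPU and memory requests and limits, a pod in this namespace has to declare those values (or inherit them from a LimitRange), otherwise it is rejected at admission.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nlp-finetune                # hypothetical name
  namespace: ml-team-nlp
spec:
  restartPolicy: Never
  containers:
    - name: trainer
      image: registry.example.com/nlp/trainer:latest   # hypothetical image
      resources:
        requests:
          cpu: "8"                  # counts toward requests.cpu
          memory: "32Gi"            # counts toward requests.memory
          nvidia.com/gpu: "1"       # counts toward requests.nvidia.com/gpu
        limits:
          cpu: "16"                 # counts toward limits.cpu
          memory: "64Gi"            # counts toward limits.memory
          nvidia.com/gpu: "1"       # GPU limit must equal the GPU request
```

With a GPU quota of 4, four such single-GPU pods fit in the namespace. A request to create a fifth is refused by the API server with a quota-exceeded error instead of being left Pending.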
Limit Ranges
Set default and maximum resource values for individual containers:
YAML
apiVersion: v1
kind: LimitRange
metadata:
  name: ml-limits
  namespace: ml-team-nlp
spec:
  limits:
    - type: Container
      default:
        cpu: "2"
        memory: "8Gi"
      defaultRequest:
        cpu: "1"
        memory: "4Gi"
      max:
        cpu: "32"
        memory: "128Gi"
        nvidia.com/gpu: "2"
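To make the effect concrete, the sketch below shows a container that declares no CPU or memory, with comments listing the values the LimitRange would be expected to fill in at admission. The pod name and image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tokenizer-prep              # hypothetical name
  namespace: ml-team-nlp
spec:
  restartPolicy: Never
  containers:
    - name: prep
      image: registry.example.com/nlp/prep:latest   # hypothetical image
      # No resources block is declared. With ml-limits in place, admission
      # is expected to fill in:
      #   requests: cpu "1", memory "4Gi"   (from defaultRequest)
      #   limits:   cpu "2", memory "8Gi"   (from default)
      # No GPU default is set, so this container gets no GPU unless it asks.
```

A container that exceeds one of the `max` values, for example by requesting `nvidia.com/gpu: "4"`, is rejected when the pod is created.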
Priority Classes
Define priority classes to ensure critical workloads get GPU resources first:
YAML
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: production-inference
value: 1000000
globalDefault: false
description: "Production inference - highest priority"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: training-batch
value: 100000
description: "Batch training jobs - can be preempted"
# PreemptLowerPriority is the default policy. These pods can be preempted
# because their value is lower than production-inference, not because of this field.
preemptionPolicy: PreemptLowerPriority
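A workload opts into a priority class through `priorityClassName` in its pod spec. The following is a minimal sketch of a batch Job using the `training-batch` class; the Job name, image, and resource sizes are hypothetical.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: bert-pretrain               # hypothetical name
  namespace: ml-team-nlp
spec:
  backoffLimit: 2                   # retry a failed or evicted pod up to twice
  template:
    spec:
      priorityClassName: training-batch
      restartPolicy: Never
      containers:
        - name: trainer
          image: registry.example.com/nlp/trainer:latest   # hypothetical image
          resources:
            requests:
              cpu: "8"
              memory: "32Gi"
              nvidia.com/gpu: "2"
            limits:
              cpu: "16"
              memory: "64Gi"
              nvidia.com/gpu: "2"
```

If a `production-inference` pod cannot be scheduled for lack of GPUs, the scheduler may evict this Job's pod to make room, so training code run this way should checkpoint regularly.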
Kueue for Advanced Queuing
For more sophisticated GPU job scheduling, use Kueue, a Kubernetes-native job queueing controller. It provides:
- Fair sharing: Distribute GPU resources fairly across teams based on configured weights
- Borrowing: Allow teams to use unused GPU quota from other teams
- Preemption: Higher-priority jobs can preempt lower-priority ones
- Queue visibility: Teams can see their position in the GPU queue
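The borrowing and preemption features above could be configured roughly as follows. This is a sketch that assumes Kueue's `v1beta1` API; field names may differ in the version you have installed. The flavor, cohort, and queue names and the borrowing limit are hypothetical, and the quotas mirror the ResourceQuota shown earlier.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-gpu                 # hypothetical flavor name
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: nlp-cluster-queue           # hypothetical name
spec:
  cohort: ml-teams                  # hypothetical; queues in one cohort can lend idle quota
  namespaceSelector: {}             # accept workloads from any namespace
  resourceGroups:
    - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
      flavors:
        - name: default-gpu
          resources:
            - name: cpu
              nominalQuota: 64
            - name: memory
              nominalQuota: 256Gi
            - name: nvidia.com/gpu
              nominalQuota: 4       # guaranteed share for this team
              borrowingLimit: 2     # may borrow up to 2 idle GPUs from the cohort
  preemption:
    reclaimWithinCohort: Any        # take lent quota back from borrowers when needed
    withinClusterQueue: LowerPriority
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: nlp-queue                   # hypothetical name
  namespace: ml-team-nlp
spec:
  clusterQueue: nlp-cluster-queue
```

A Job is submitted to this queue by adding the label `kueue.x-k8s.io/queue-name: nlp-queue` to its metadata. Kueue then keeps the Job suspended until quota is available.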
Best Practice: Set GPU quotas slightly below your total cluster capacity to leave room for system overhead. Allow quota borrowing via Kueue so idle GPUs don't go to waste.