Resource Quotas for ML (Intermediate)

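Resource quotas cap the total resources a namespace can claim, so no single team can consume every GPU in a shared cluster. Limit ranges add per-container defaults and ceilings, and priority classes decide which workloads are scheduled first when GPUs are scarce. A quota by itself is a hard cap; burst capacity for important training jobs comes from a queueing layer such as Kueue, which lets teams borrow quota that others are not using.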

GPU Resource Quota

YAML
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: ml-team-nlp
spec:
  hard:
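    # Extended resources such as GPUs accept only the "requests." prefix in a
    # quota; they cannot be overcommitted, so request and limit are always equal.
    requests.nvidia.com/gpu: "4"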
    requests.cpu: "64"
    requests.memory: "256Gi"
    limits.cpu: "128"
    limits.memory: "512Gi"
    pods: "20"
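
To illustrate how a workload draws against this quota, here is a minimal sketch of a Pod in the same namespace. The pod name, container name, and image are hypothetical placeholders. Because the quota tracks CPU and memory requests and limits, the pod has to declare them (or inherit them from a LimitRange); otherwise the API server rejects it.

YAML
apiVersion: v1
kind: Pod
metadata:
  name: nlp-finetune-example        # hypothetical name
  namespace: ml-team-nlp
spec:
  restartPolicy: Never
  containers:
  - name: trainer                   # hypothetical container name
    image: registry.example.com/nlp/trainer:latest   # placeholder image
    resources:
      requests:
        cpu: "8"
        memory: "32Gi"
      limits:
        cpu: "16"
        memory: "64Gi"
        # For GPUs, setting only the limit is enough: the request defaults
        # to the same value. This pod counts as 1 of the 4 quota'd GPUs.
        nvidia.com/gpu: "1"

While this pod exists, three GPUs remain available to the namespace. Creating a pod that would push the total past four is rejected by the API server instead of being left pending.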

Limit Ranges

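Set default requests, default limits, and maximum values for individual containers. Containers that omit a resource specification receive the defaults at admission time: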

YAML
apiVersion: v1
kind: LimitRange
metadata:
  name: ml-limits
  namespace: ml-team-nlp
spec:
  limits:
  - type: Container
    default:
      cpu: "2"
      memory: "8Gi"
    defaultRequest:
      cpu: "1"
      memory: "4Gi"
    max:
      cpu: "32"
      memory: "128Gi"
      nvidia.com/gpu: "2"
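
As a rough illustration of the defaulting behavior, consider a container that declares no CPU or memory at all. The pod and image names below are hypothetical; the comments show the values the LimitRange above would inject at admission.

YAML
apiVersion: v1
kind: Pod
metadata:
  name: tokenizer-example           # hypothetical name
  namespace: ml-team-nlp
spec:
  containers:
  - name: tokenizer                 # hypothetical container name
    image: registry.example.com/nlp/tokenizer:latest   # placeholder image
    # No resources block. With the ml-limits LimitRange in place, the
    # container is admitted as if it had declared:
    #   requests: cpu "1", memory "4Gi"    (from defaultRequest)
    #   limits:   cpu "2", memory "8Gi"    (from default)
    # A container asking for more than 32 CPUs, 128Gi of memory, or
    # 2 GPUs would be rejected because it exceeds max.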

Priority Classes

Define priority classes so the scheduler places critical workloads first and, when GPUs are scarce, can evict lower-priority pods to make room for them:

YAML
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: production-inference
value: 1000000
globalDefault: false
description: "Production inference - highest priority"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: training-batch
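value: 100000
description: "Batch training jobs - can be preempted"
# PreemptLowerPriority is the default policy. Pods in this class may evict
# pods of lower priority and can themselves be evicted by pods in the
# production-inference class.
preemptionPolicy: PreemptLowerPriority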
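
A priority class has no effect until a workload references it. The following sketch shows a Job that opts into the training-batch class through priorityClassName; the job name and image are hypothetical placeholders.

YAML
apiVersion: batch/v1
kind: Job
metadata:
  name: bert-pretrain-example       # hypothetical name
  namespace: ml-team-nlp
spec:
  backoffLimit: 2
  template:
    spec:
      priorityClassName: training-batch   # must match a PriorityClass name
      restartPolicy: Never
      containers:
      - name: trainer               # hypothetical container name
        image: registry.example.com/nlp/trainer:latest   # placeholder image
        resources:
          limits:
            # CPU and memory are left to the LimitRange defaults.
            nvidia.com/gpu: "2"

If a production-inference pod later needs these GPUs and none are free, the scheduler may evict this Job's pod, and the Job controller recreates it once capacity returns.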

Kueue for Advanced Queuing

For more sophisticated GPU job scheduling, use Kueue, a Kubernetes-native job queueing system that decides when a job is admitted to the cluster:

  • Fair sharing: Distribute GPU resources fairly across teams based on configured weights
  • Borrowing: Allow teams to use unused GPU quota from other teams
  • Preemption: Higher-priority jobs can preempt lower-priority ones
  • Queue visibility: Teams can see their position in the GPU queue

Best Practice: Keep the combined GPU quotas of all teams slightly below your total cluster capacity to leave room for system overhead. Allow quota borrowing via Kueue so idle GPUs don't go to waste.
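
A minimal sketch of a borrowing setup, assuming Kueue is installed and serves the kueue.x-k8s.io/v1beta1 API (the version may differ in your installation). The flavor, cohort, and queue names are hypothetical, and the quota numbers simply mirror the ResourceQuota earlier in this section.

YAML
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor              # hypothetical flavor name
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: nlp-cluster-queue           # hypothetical queue name
spec:
  cohort: ml-teams                  # queues in the same cohort can lend and borrow
  namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: ml-team-nlp
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: cpu
        nominalQuota: 64
      - name: memory
        nominalQuota: 256Gi
      - name: nvidia.com/gpu
        nominalQuota: 4             # guaranteed share for this team
        borrowingLimit: 4           # may borrow up to 4 idle GPUs from the cohort
  preemption:
    reclaimWithinCohort: Any        # take lent quota back by preempting borrowers
    withinClusterQueue: LowerPriority
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: nlp-queue                   # hypothetical queue name
  namespace: ml-team-nlp
spec:
  clusterQueue: nlp-cluster-queue

Jobs opt in by carrying the label kueue.x-k8s.io/queue-name: nlp-queue in their metadata. Other teams would get their own ClusterQueue in the same cohort, which is what makes their unused GPUs available for borrowing.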