Image Scanning for ML Containers

ML container images are among the largest and most complex in any organization. Systematic vulnerability scanning is essential to catch security issues before they reach production.

Why ML Images Need Special Scanning

ML container images present unique scanning challenges:

Massive dependency trees: A typical PyTorch + CUDA image contains 500+ packages with nested dependencies
Mixed ecosystems: ML images combine OS packages (apt/yum), Python packages (pip/conda), and CUDA libraries from NVIDIA
Frequent false positives: Scanners may flag CVEs in scientific computing libraries that are not exploitable in ML contexts
Large image sizes: Multi-gigabyte images take longer to scan and may time out in CI/CD pipelines

Scanning Tools Comparison

Tool	Strengths	ML-Specific Support	Cost
Trivy	Fast, comprehensive, scans OS + language packages, IaC, secrets	Good Python/pip scanning, conda support	Free / Open Source
Snyk Container	Deep dependency analysis, fix recommendations, IDE integration	Python ecosystem focus, pip and poetry support	Free tier / Paid
Grype	Fast CLI scanner, SBOM-based, works with Syft	Good Python package scanning	Free / Open Source
Docker Scout	Integrated into Docker Desktop, policy-based remediation	Growing ML framework support	Free tier / Paid

Scanning CUDA and ML Framework Images

Generate an SBOM First

Use Syft or Trivy to generate a Software Bill of Materials (SBOM) for your ML image. This captures all OS packages, Python packages, and shared libraries. Store the SBOM alongside your image for audit trails.
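As a rough illustration of the SBOM step, the sketch below wraps the Syft CLI from Python and writes a CycloneDX JSON file next to other build artifacts. It assumes syft is installed and on PATH; the function names, the output directory, and the file naming scheme are hypothetical choices made for this example.

```python
import subprocess
from pathlib import Path


def syft_sbom_command(image: str, output_path: Path) -> list[str]:
    """Build a Syft command that writes a CycloneDX JSON SBOM to output_path."""
    # `-o <format>=<file>` selects the SBOM format and the file to write it to.
    return ["syft", image, "-o", f"cyclonedx-json={output_path}"]


def generate_sbom(image: str, output_dir: Path) -> Path:
    """Run Syft for one image and return the SBOM path (needs syft on PATH)."""
    output_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: make the image reference filename-safe.
    safe_name = image.replace("/", "_").replace(":", "_")
    sbom_path = output_dir / f"{safe_name}.cdx.json"
    # check=True raises CalledProcessError if Syft exits with an error.
    subprocess.run(syft_sbom_command(image, sbom_path), check=True)
    return sbom_path
```

Calling generate_sbom("registry.example.com/ml/train:1.0", Path("sboms")) would then leave one SBOM file per image that can be archived with the build.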

Scan with Multiple Tools

No single scanner catches every vulnerability. Run at least two scanners (e.g., Trivy + Snyk) to maximize coverage. Each tool uses different vulnerability databases and detection methods.
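One way to combine results from two scanners is to normalize each report into a common record and merge on the (CVE, package) pair. The sketch below assumes the reports have already been parsed into a hypothetical Finding record; the field names and the placeholder CVE IDs in the usage note are illustrative and do not reflect any scanner's native output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical normalized record; real scanner JSON needs its own parser."""
    cve_id: str
    package: str
    severity: str
    scanner: str


def merge_findings(*reports: list[Finding]) -> dict[tuple[str, str], set[str]]:
    """Map each (CVE, package) pair to the set of scanners that reported it."""
    merged: dict[tuple[str, str], set[str]] = {}
    for report in reports:
        for finding in report:
            key = (finding.cve_id, finding.package)
            merged.setdefault(key, set()).add(finding.scanner)
    return merged
```

Entries reported by only one scanner are the extra coverage gained from running a second tool, and they are also good candidates for a manual false-positive check.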

Set Severity Thresholds

Configure your scanning policy to block images with Critical or High CVEs. Allow Medium and Low findings to be tracked as technical debt. Adjust thresholds based on whether the image runs in production or development.
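A severity policy like this can be expressed as a small function. The sketch below is a minimal example: production blocks on Critical and High as described above, while the looser development threshold (Critical only) is an assumption chosen for illustration. The names BLOCKING_SEVERITIES and evaluate_policy are hypothetical.

```python
from collections import Counter

# Severities that block an image, per environment.
# The development entry is an illustrative assumption, not a recommendation.
BLOCKING_SEVERITIES = {
    "production": {"CRITICAL", "HIGH"},
    "development": {"CRITICAL"},
}


def evaluate_policy(severities: list[str], environment: str) -> tuple[bool, dict[str, int]]:
    """Return (allowed, tracked): whether the image passes, plus counts of
    non-blocking findings to record as technical debt."""
    counts = Counter(s.upper() for s in severities)
    blocking = BLOCKING_SEVERITIES[environment]
    # The image is allowed only if no blocking severity was found.
    allowed = not any(counts[s] for s in blocking)
    tracked = {s: n for s, n in counts.items() if s not in blocking}
    return allowed, tracked
```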

Handle CUDA-Specific CVEs

CUDA libraries may have known CVEs that NVIDIA addresses through driver updates rather than library patches. Maintain a curated ignore list for CVEs that are mitigated by your host driver version.
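A curated ignore list is easier to audit when each entry carries a justification, the driver version that mitigates it, and a review date. The sketch below shows one possible shape for that; the IgnoreEntry fields, the version comparison, and the expiry rule are all assumptions for illustration, and the CVE IDs and driver versions in any usage would need to come from your own review.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IgnoreEntry:
    """Hypothetical ignore-list entry for a driver-mitigated CVE."""
    cve_id: str
    reason: str
    min_driver: tuple[int, ...]  # lowest host driver version that mitigates it
    review_by: date              # entry stops applying after this date


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as '535.104.05' into (535, 104, 5)."""
    return tuple(int(part) for part in version.split("."))


def filter_ignored(cve_ids: list[str], ignore_list: list[IgnoreEntry],
                   host_driver: str, today: date) -> list[str]:
    """Drop CVEs whose ignore entry is still valid for this host driver."""
    driver = parse_version(host_driver)
    active = {
        entry.cve_id
        for entry in ignore_list
        if driver >= entry.min_driver and today <= entry.review_by
    }
    return [cve for cve in cve_ids if cve not in active]
```

Tying each entry to a minimum driver version means the suppression stops applying on hosts that run an older driver, and the review date forces the entry to be revisited rather than ignored forever.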

CI/CD Integration Patterns

Build-Time Scanning

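Scan images immediately after build in your CI pipeline and fail the build if critical vulnerabilities are found. Running Trivy with --exit-code 1 and --severity CRITICAL makes the scan return a non-zero exit status when critical findings are present, which most CI systems treat as a failed step.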
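A build-time gate can be sketched as a thin Python wrapper around the Trivy CLI. This assumes trivy is installed on the CI runner; the function names and the 30-minute timeout are hypothetical values picked for large ML images, so tune them to your pipeline.

```python
import subprocess


def trivy_gate_command(image: str, severities: tuple[str, ...] = ("CRITICAL",),
                       timeout: str = "30m") -> list[str]:
    """Build a `trivy image` command that exits 1 on matching findings."""
    return [
        "trivy", "image",
        "--exit-code", "1",                 # non-zero exit when findings match
        "--severity", ",".join(severities), # only these severities count
        "--timeout", timeout,               # raised for multi-gigabyte images
        image,
    ]


def scan_passes(image: str) -> bool:
    """Run the gate; True means no blocking findings (needs trivy on PATH)."""
    result = subprocess.run(trivy_gate_command(image))
    return result.returncode == 0
```

In a CI job, a script would typically call scan_passes on the freshly built tag and exit with a non-zero status when it returns False.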

Registry Admission

Configure your container registry (Harbor, ECR, GCR) to automatically scan images on push. Block deployment of unscanned or vulnerable images.
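For ECR specifically, scan-on-push can be switched on per repository through the AWS SDK. The sketch below assumes a client created elsewhere with boto3.client("ecr") and uses its put_image_scanning_configuration call; the helper name and repository name are hypothetical, and other registries such as Harbor expose this setting through their own interfaces. AWS also offers registry-level scanning configuration, which may suit your setup better.

```python
def enable_scan_on_push(ecr_client, repository_name: str) -> bool:
    """Enable scan-on-push for one ECR repository and return the new setting.

    `ecr_client` is assumed to be a boto3 ECR client, e.g. boto3.client("ecr").
    """
    response = ecr_client.put_image_scanning_configuration(
        repositoryName=repository_name,
        imageScanningConfiguration={"scanOnPush": True},
    )
    # The response echoes the configuration that is now in effect.
    return response["imageScanningConfiguration"]["scanOnPush"]
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub in tests, without needing AWS credentials.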

Continuous Monitoring

Rescan deployed images daily against updated vulnerability databases. New CVEs are published constantly, so an image that is clean today may be vulnerable tomorrow.
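The useful output of a daily rescan is usually the difference from the previous run rather than the full report. A minimal sketch, assuming each scan has been reduced to a mapping from image reference to a set of CVE IDs (a hypothetical intermediate format, not any scanner's native output):

```python
def newly_vulnerable(previous: dict[str, set[str]],
                     current: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per image, the CVEs present today that were absent yesterday."""
    report: dict[str, set[str]] = {}
    for image, cves in current.items():
        # Images not seen before are treated as having no prior findings.
        added = cves - previous.get(image, set())
        if added:
            report[image] = added
    return report
```

Alerting only on this difference keeps the daily signal small enough for someone to act on.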

Admission Controllers

Use Kubernetes admission controllers (OPA Gatekeeper, Kyverno) to enforce that only scanned and approved images can be deployed to production clusters.
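The decision such a policy encodes can be illustrated in a few lines of Python. This is only a sketch of the logic, not Gatekeeper or Kyverno syntax: it assumes images are referenced by digest and that the scanning pipeline maintains a set of approved digests, both of which are assumptions made for this example.

```python
def admit(image_ref: str, approved_digests: set[str]) -> tuple[bool, str]:
    """Decide whether an image reference may be deployed.

    Tags are mutable, so the check requires a digest-pinned reference
    (name@sha256:...) and looks that digest up in the approved set.
    """
    _name, separator, digest = image_ref.partition("@")
    if not separator or not digest.startswith("sha256:"):
        return False, "image must be pinned by digest"
    if digest not in approved_digests:
        return False, "digest has not passed scanning"
    return True, "ok"
```

Checking digests instead of tags ensures that the image admitted to the cluster is byte-for-byte the one that was scanned.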

Next Up: In the next lesson, we explore runtime security — monitoring running ML containers with Falco, enforcing seccomp profiles, and detecting anomalous GPU access.
