NVIDIA GPUs for AI

NVIDIA dominates the AI accelerator market with over 80% market share. Understanding their GPU lineup is essential for making informed infrastructure decisions. This lesson covers the key NVIDIA GPUs available in the cloud, their architectures, and when to use each one.

NVIDIA GPU Comparison

GPU        Architecture    Memory        FP16 TFLOPS   Best For
H100 SXM   Hopper          80GB HBM3     990           LLM training, large-scale AI
A100 SXM   Ampere          80GB HBM2e    312           General training/inference
A10G       Ampere          24GB GDDR6    125           Inference, fine-tuning
L4         Ada Lovelace    24GB GDDR6    121           Inference, video AI
T4         Turing          16GB GDDR6    65            Budget inference
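To turn the table into a concrete sizing decision, the sketch below estimates how much memory a model needs for inference at a given precision and checks which of these GPUs can hold it. The 2-bytes-per-parameter rule and the 20% overhead factor are rough assumptions for illustration (real deployments also need room for KV cache and activations), not vendor guidance.

```python
# Rough inference-memory sizing against the GPUs in the comparison table.
# Assumption: FP16 weights dominate at 2 bytes per parameter, plus ~20%
# overhead for activations and KV cache.

GPU_MEMORY_GB = {          # memory capacities from the comparison table
    "H100 SXM": 80,
    "A100 SXM": 80,
    "A10G": 24,
    "L4": 24,
    "T4": 16,
}

def required_memory_gb(params_billions: float, bytes_per_param: int = 2,
                       overhead: float = 1.2) -> float:
    """Estimate inference memory: weights + fixed overhead factor."""
    return params_billions * bytes_per_param * overhead

def gpus_that_fit(params_billions: float, bytes_per_param: int = 2) -> list[str]:
    """Return GPUs from the table whose memory covers the estimate."""
    need = required_memory_gb(params_billions, bytes_per_param)
    return [gpu for gpu, mem in GPU_MEMORY_GB.items() if mem >= need]

# A 7B-parameter model in FP16 needs ~16.8 GB, so the 16 GB T4 drops out.
print(gpus_that_fit(7))    # ['H100 SXM', 'A100 SXM', 'A10G', 'L4']
print(gpus_that_fit(70))   # ~168 GB: no single GPU in the table fits it
```

The second call is why 70B-class models are trained and often served across multiple GPUs, which is where the NVLink section below becomes relevant.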

Key Technologies

Tensor Cores

Specialized hardware units that perform matrix multiply-and-accumulate operations in a single clock cycle. Each generation improves precision support and throughput:

  • Turing (T4) — FP16, INT8, INT4 Tensor Cores
  • Ampere (A100) — Added TF32 and BF16 support, plus 2:4 structured sparsity
  • Hopper (H100) — Transformer Engine with automatic FP8 precision management
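The operation itself is easy to state in code: a fused matrix multiply-and-accumulate on small tiles, D = A × B + C. The pure-Python sketch below illustrates the math a Tensor Core performs (not the hardware mechanics); the tiny tile sizes mirror the small fixed tiles the hardware operates on.

```python
# What a Tensor Core computes: a fused multiply-accumulate D = A @ B + C
# on small matrix tiles. Pure-Python illustration of the math only --
# the hardware performs the whole tile operation as one fused step.

def mma(A, B, C):
    """Return D = A @ B + C for square tiles given as lists of lists."""
    n = len(A)
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) + C[i][j] for j in range(n)]
        for i in range(n)
    ]

A = [[1, 0], [0, 1]]      # 2x2 identity tile for a quick check
B = [[2, 3], [4, 5]]
C = [[1, 1], [1, 1]]
print(mma(A, B, C))       # [[3, 4], [5, 6]]
```

In FP16 mode the A and B inputs are half precision while the C/D accumulator is typically kept at full precision, which is why mixed-precision training preserves accuracy despite the narrower multiplicands.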

NVLink

High-speed interconnect between GPUs within a single node:

  • NVLink 3.0 (A100) — 600 GB/s bidirectional per GPU, 12 links
  • NVLink 4.0 (H100) — 900 GB/s bidirectional per GPU, 18 links
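NVLink bandwidth matters most for collectives such as the all-reduce that synchronizes gradients in data-parallel training. As a back-of-the-envelope sketch using the standard ring all-reduce cost model and the peak bandwidth figures above (latency and protocol overhead ignored, so real times will be higher):

```python
# Back-of-the-envelope: ring all-reduce time for gradient sync over NVLink.
# Cost model: each GPU transfers ~2*(N-1)/N * data_size bytes per all-reduce.
# Peak-bandwidth estimate only; real collectives add latency and overhead.

def allreduce_seconds(data_gb: float, n_gpus: int, link_gb_per_s: float) -> float:
    """Estimate one ring all-reduce over links of the given bandwidth."""
    traffic = 2 * (n_gpus - 1) / n_gpus * data_gb
    return traffic / link_gb_per_s

# Syncing 14 GB of FP16 gradients (a 7B-parameter model) across 8 GPUs:
a100 = allreduce_seconds(14, 8, 600)   # NVLink 3.0: 600 GB/s per GPU
h100 = allreduce_seconds(14, 8, 900)   # NVLink 4.0: 900 GB/s per GPU
print(f"A100: {a100 * 1e3:.1f} ms, H100: {h100 * 1e3:.1f} ms")
```

Under this model the H100's extra bandwidth cuts the sync step proportionally, which compounds over thousands of training steps.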

Multi-Instance GPU (MIG)

A100 and H100 support MIG, which partitions a single GPU into up to 7 isolated instances. Each instance has dedicated compute, memory, and cache. This enables GPU sharing for inference workloads without performance interference.
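A quick way to reason about MIG for serving is to multiply instance count by per-instance capacity. The sketch below assumes an A100 80GB split into seven 1g.10gb instances and a hypothetical per-instance request rate; both the profile choice and the 30 req/s figure are illustrative assumptions, not measured numbers.

```python
# Capacity planning for MIG-based inference serving (illustrative numbers).
# Assumption: an A100 80GB split into seven 1g.10gb instances, each serving
# a hypothetical 30 requests/sec for some small model.

def mig_capacity(n_instances: int, mem_per_instance_gb: int,
                 reqs_per_instance: float) -> dict:
    """Aggregate memory and throughput across isolated MIG instances."""
    return {
        "instances": n_instances,
        "total_memory_gb": n_instances * mem_per_instance_gb,
        "total_reqs_per_sec": n_instances * reqs_per_instance,
    }

plan = mig_capacity(n_instances=7, mem_per_instance_gb=10, reqs_per_instance=30)
print(plan)   # {'instances': 7, 'total_memory_gb': 70, 'total_reqs_per_sec': 210}
```

Because MIG instances have dedicated compute, memory, and cache, this aggregate holds even when one tenant is saturated, which is the advantage over plain time-slicing.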

Performance Tip: On H100 GPUs, use the Transformer Engine whenever your framework supports it. It automatically manages FP8 precision per layer and can provide up to 2x training speedup over A100. Note that it is not fully transparent: your model must use Transformer Engine modules directly or run on a framework that integrates them.

Ready to Explore AMD GPUs?

The next lesson covers AMD's growing presence in the AI accelerator market.
