NVIDIA GPUs for AI
NVIDIA dominates the AI accelerator market with over 80% market share. Understanding their GPU lineup is essential for making informed infrastructure decisions. This lesson covers the key NVIDIA GPUs available in the cloud, their architectures, and when to use each one.
NVIDIA GPU Comparison
| GPU | Architecture | Memory | FP16 TFLOPS | Best For |
|---|---|---|---|---|
| H100 SXM | Hopper | 80GB HBM3 | 990 | LLM training, large-scale AI |
| A100 SXM | Ampere | 80GB HBM2e | 312 | General training/inference |
| A10G | Ampere | 24GB GDDR6 | 125 | Inference, fine-tuning |
| L4 | Ada Lovelace | 24GB GDDR6 | 121 | Inference, video AI |
| T4 | Turing | 16GB GDDR6 | 65 | Budget inference |
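Memory is usually the first filter when picking from this lineup: a model's weights must fit before throughput matters. The sketch below applies the common rule of thumb of 2 bytes per parameter for FP16 weights; it deliberately ignores activations and KV-cache overhead, which add to the real requirement.

```python
# Rough sizing sketch: which GPUs from the table above can hold a
# model's FP16 weights? Assumption: 2 bytes/parameter, weights only --
# real inference also needs memory for activations and the KV cache.

GPU_MEMORY_GB = {"H100": 80, "A100": 80, "A10G": 24, "L4": 24, "T4": 16}

def weights_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for model weights alone, in GB."""
    return num_params * bytes_per_param / 1e9

def gpus_that_fit(num_params: float) -> list[str]:
    need = weights_gb(num_params)
    return [gpu for gpu, mem in GPU_MEMORY_GB.items() if mem >= need]

# A 7B-parameter model needs ~14 GB in FP16 -> fits on every GPU above.
print(gpus_that_fit(7e9))   # ['H100', 'A100', 'A10G', 'L4', 'T4']
# A 70B model needs ~140 GB -> no single GPU in the table can hold it.
print(gpus_that_fit(70e9))  # []
```

This is why 70B-class models require multi-GPU serving (tensor parallelism) or aggressive quantization even on an 80GB H100.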
Key Technologies
Tensor Cores
Specialized hardware units that perform matrix multiply-and-accumulate operations in a single clock cycle. Each generation improves precision support and throughput:
- Turing (T4) — FP16, INT8, INT4 Tensor Cores
- Ampere (A100) — Added TF32 and BF16 support, plus 2:4 structured sparsity
- Hopper (H100) — Transformer Engine with automatic FP8 precision management
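The peak TFLOPS figures from the comparison table translate directly into a lower bound on matmul time. A minimal back-of-the-envelope sketch (real kernels reach only a fraction of peak because of memory bandwidth and launch overhead, so treat these as best-case numbers):

```python
# Ideal Tensor Core matmul time at peak throughput. Assumption: the
# FP16 TFLOPS figures from the table above, 100% utilization (never
# achieved in practice).

def matmul_flops(m: int, n: int, k: int) -> float:
    """An (m x k) @ (k x n) matmul does m*n*k multiply-adds = 2*m*n*k FLOPs."""
    return 2.0 * m * n * k

def ideal_time_ms(flops: float, peak_tflops: float) -> float:
    """Best-case kernel time in milliseconds at the given peak rate."""
    return flops / (peak_tflops * 1e12) * 1e3

# One 8192^3 matmul at FP16 peak: H100 (990 TFLOPS) vs T4 (65 TFLOPS).
flops = matmul_flops(8192, 8192, 8192)
print(f"H100: {ideal_time_ms(flops, 990):.2f} ms")
print(f"T4:   {ideal_time_ms(flops, 65):.2f} ms")
```

The ~15x gap in peak throughput is the headline reason H100s command premium pricing for training workloads.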
NVLink
High-speed interconnect between GPUs within a single node:
- NVLink 3.0 (A100) — 600 GB/s bidirectional per GPU, 12 links
- NVLink 4.0 (H100) — 900 GB/s bidirectional per GPU, 18 links
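NVLink bandwidth matters most during gradient synchronization in data-parallel training. The sketch below uses the standard ring all-reduce traffic formula, where each GPU transfers 2*(N-1)/N times the buffer size; it assumes zero latency and takes half the quoted bidirectional figure as usable per-direction bandwidth, so the numbers are illustrative.

```python
# Sketch: ring all-reduce time for gradient sync over NVLink.
# Assumptions: latency ignored; usable per-direction bandwidth taken
# as half the quoted bidirectional figure from the bullets above.

def allreduce_time_ms(buffer_gb: float, n_gpus: int, bidir_gbps: float) -> float:
    per_direction = bidir_gbps / 2                    # e.g. 900 -> 450 GB/s
    traffic = 2 * (n_gpus - 1) / n_gpus * buffer_gb   # ring all-reduce volume
    return traffic / per_direction * 1e3

# Syncing 10 GB of gradients across 8 GPUs in one node:
print(f"A100 / NVLink 3.0: {allreduce_time_ms(10, 8, 600):.1f} ms")
print(f"H100 / NVLink 4.0: {allreduce_time_ms(10, 8, 900):.1f} ms")
```

Since this sync happens every training step, the 1.5x NVLink bandwidth jump from A100 to H100 compounds into meaningful wall-clock savings on large models.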
Multi-Instance GPU (MIG)
A100 and H100 support MIG, which partitions a single GPU into up to 7 isolated instances. Each instance has dedicated compute, memory, and cache. This enables GPU sharing for inference workloads without performance interference.
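Capacity planning with MIG comes down to how many compute slices (out of 7) each profile consumes. The sketch below uses profile names and sizes matching NVIDIA's published A100 80GB MIG profiles; treat the exact figures as illustrative.

```python
# Sketch: MIG capacity planning on an A100 80GB. Profile names/sizes
# follow NVIDIA's published A100 MIG profiles; figures are illustrative.

# profile name -> (compute slices out of 7, memory in GB)
A100_80GB_PROFILES = {
    "1g.10gb": (1, 10),
    "2g.20gb": (2, 20),
    "3g.40gb": (3, 40),
    "7g.80gb": (7, 80),
}

def max_instances(profile: str) -> int:
    """How many instances of one profile fit on a single A100 80GB."""
    slices, _ = A100_80GB_PROFILES[profile]
    return 7 // slices

print(max_instances("1g.10gb"))  # 7 small inference replicas
print(max_instances("3g.40gb"))  # 2 medium instances
```

Running seven isolated 1g.10gb replicas on one A100 is a common way to serve several small models at a fraction of the per-model cost of dedicated GPUs.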
Ready to Explore AMD GPUs?
The next lesson covers AMD's growing presence in the AI accelerator market.
Next: AMD GPUs →