NVIDIA GPUs for AI

NVIDIA dominates the AI accelerator market with over 80% market share. Understanding their GPU lineup is essential for making informed infrastructure decisions. This lesson covers the key NVIDIA GPUs available in the cloud, their architectures, and when to use each one.

NVIDIA GPU Comparison

GPU        Architecture    Memory        FP16 TFLOPS   Best For
H100 SXM   Hopper          80GB HBM3     990           LLM training, large-scale AI
A100 SXM   Ampere          80GB HBM2e    312           General training/inference
A10G       Ampere          24GB GDDR6    125           Inference, fine-tuning
L4         Ada Lovelace    24GB GDDR6    121           Inference, video AI
T4         Turing          16GB GDDR6    65            Budget inference
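To turn the table into a concrete sizing decision, the sketch below estimates how much memory a model needs for inference at a given precision and checks which of these GPUs can hold it. The 2-bytes-per-parameter rule and the 20% overhead factor are rough assumptions for illustration (real deployments also need room for KV cache and activations), not vendor guidance.

```python
# Rough inference-memory sizing against the GPUs in the comparison table.
# Assumption: FP16 weights dominate at 2 bytes per parameter, plus ~20%
# overhead for activations and KV cache.

GPU_MEMORY_GB = {          # memory capacities from the comparison table
    "H100 SXM": 80,
    "A100 SXM": 80,
    "A10G": 24,
    "L4": 24,
    "T4": 16,
}

def required_memory_gb(params_billions: float, bytes_per_param: int = 2,
                       overhead: float = 1.2) -> float:
    """Estimate inference memory: weights + fixed overhead factor."""
    return params_billions * bytes_per_param * overhead

def gpus_that_fit(params_billions: float, bytes_per_param: int = 2) -> list[str]:
    """Return GPUs from the table whose memory covers the estimate."""
    need = required_memory_gb(params_billions, bytes_per_param)
    return [gpu for gpu, mem in GPU_MEMORY_GB.items() if mem >= need]

# A 7B-parameter model in FP16 needs ~16.8 GB, so the 16 GB T4 drops out.
print(gpus_that_fit(7))    # ['H100 SXM', 'A100 SXM', 'A10G', 'L4']
print(gpus_that_fit(70))   # ~168 GB: no single GPU in the table fits it
```

The second call is why 70B-class models are trained and often served across multiple GPUs, which is where the NVLink section below becomes relevant.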

Key Technologies

Tensor Cores

Specialized hardware units that perform matrix multiply-and-accumulate operations in a single clock cycle. Each generation improves precision support and throughput:

  • Turing (T4) — FP16, INT8, INT4 Tensor Cores
  • Ampere (A100) — Added TF32 and BF16 support, plus 2:4 structured sparsity
  • Hopper (H100) — Transformer Engine with automatic FP8 precision management
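The operation itself is easy to state in code: a fused matrix multiply-and-accumulate on small tiles, D = A × B + C. The pure-Python sketch below illustrates the math a Tensor Core performs (not the hardware mechanics); the tiny tile sizes mirror the small fixed tiles the hardware operates on.

```python
# What a Tensor Core computes: a fused multiply-accumulate D = A @ B + C
# on small matrix tiles. Pure-Python illustration of the math only --
# the hardware performs the whole tile operation as one fused step.

def mma(A, B, C):
    """Return D = A @ B + C for square tiles given as lists of lists."""
    n = len(A)
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) + C[i][j] for j in range(n)]
        for i in range(n)
    ]

A = [[1, 0], [0, 1]]      # 2x2 identity tile for a quick check
B = [[2, 3], [4, 5]]
C = [[1, 1], [1, 1]]
print(mma(A, B, C))       # [[3, 4], [5, 6]]
```

In FP16 mode the A and B inputs are half precision while the C/D accumulator is typically kept at full precision, which is why mixed-precision training preserves accuracy despite the narrower multiplicands.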

NVLink

High-speed interconnect between GPUs within a single node:

  • NVLink 3.0 (A100) — 600 GB/s bidirectional per GPU, 12 links
  • NVLink 4.0 (H100) — 900 GB/s bidirectional per GPU, 18 links
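NVLink bandwidth matters most for collectives such as the all-reduce that synchronizes gradients in data-parallel training. As a back-of-the-envelope sketch using the standard ring all-reduce cost model and the peak bandwidth figures above (latency and protocol overhead ignored, so real times will be higher):

```python
# Back-of-the-envelope: ring all-reduce time for gradient sync over NVLink.
# Cost model: each GPU transfers ~2*(N-1)/N * data_size bytes per all-reduce.
# Peak-bandwidth estimate only; real collectives add latency and overhead.

def allreduce_seconds(data_gb: float, n_gpus: int, link_gb_per_s: float) -> float:
    """Estimate one ring all-reduce over links of the given bandwidth."""
    traffic = 2 * (n_gpus - 1) / n_gpus * data_gb
    return traffic / link_gb_per_s

# Syncing 14 GB of FP16 gradients (a 7B-parameter model) across 8 GPUs:
a100 = allreduce_seconds(14, 8, 600)   # NVLink 3.0: 600 GB/s per GPU
h100 = allreduce_seconds(14, 8, 900)   # NVLink 4.0: 900 GB/s per GPU
print(f"A100: {a100 * 1e3:.1f} ms, H100: {h100 * 1e3:.1f} ms")
```

Under this model the H100's extra bandwidth cuts the sync step proportionally, which compounds over thousands of training steps.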

Multi-Instance GPU (MIG)

A100 and H100 support MIG, which partitions a single GPU into up to 7 isolated instances. Each instance has dedicated compute, memory, and cache. This enables GPU sharing for inference workloads without performance interference.
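A quick way to reason about MIG for serving is to multiply instance count by per-instance capacity. The sketch below assumes an A100 80GB split into seven 1g.10gb instances and a hypothetical per-instance request rate; both the profile choice and the 30 req/s figure are illustrative assumptions, not measured numbers.

```python
# Capacity planning for MIG-based inference serving (illustrative numbers).
# Assumption: an A100 80GB split into seven 1g.10gb instances, each serving
# a hypothetical 30 requests/sec for some small model.

def mig_capacity(n_instances: int, mem_per_instance_gb: int,
                 reqs_per_instance: float) -> dict:
    """Aggregate memory and throughput across isolated MIG instances."""
    return {
        "instances": n_instances,
        "total_memory_gb": n_instances * mem_per_instance_gb,
        "total_reqs_per_sec": n_instances * reqs_per_instance,
    }

plan = mig_capacity(n_instances=7, mem_per_instance_gb=10, reqs_per_instance=30)
print(plan)   # {'instances': 7, 'total_memory_gb': 70, 'total_reqs_per_sec': 210}
```

Because MIG instances have dedicated compute, memory, and cache, this aggregate holds even when one tenant is saturated, which is the advantage over plain time-slicing.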

Performance Tip: On H100 GPUs, use the Transformer Engine whenever your framework supports it. It automatically manages FP8 precision per layer and can provide up to 2x training speedup over A100. Note that it is not fully transparent: your model must use Transformer Engine modules directly or run on a framework that integrates them.

Ready to Explore AMD GPUs?

The next lesson covers AMD's growing presence in the AI accelerator market.
