Beginner

How Deepfakes Work

Understanding deepfake generation techniques is essential for building effective detectors. Each generation method leaves different artifacts that detectors can exploit. This lesson covers the major approaches behind modern deepfakes.

Autoencoder-Based Face Swap

The original deepfake method uses a shared encoder with a separate decoder for each person. The encoder learns a shared facial representation, while each decoder reconstructs one person's face:

Training
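Train a shared encoder and two separate decoders on face images of person A and person B: decoder A learns to reconstruct only person A's faces, and decoder B only person B's. Because both decoders read from the same encoder, it is forced to map both faces into a common latent space that captures attributes such as pose and expression.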
Swapping
To make person A look like person B: encode person A's face with the shared encoder, then decode with person B's decoder. The result has person B's appearance with person A's expression and pose.
Blending
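The generated face is warped back to the original face position using the alignment landmarks, color-corrected to match the surrounding skin, and blended into the frame, usually with a soft-edged mask.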
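The training and swapping steps can be sketched with a toy linear autoencoder in NumPy. Everything specific here is an assumption for illustration: the 16-dimensional "faces" built from identity templates plus shared expression directions, the layer sizes, and the learning rate. Real face-swap tools use convolutional encoders and decoders on aligned face crops. What the sketch shows is the wiring: one encoder updated by both people's reconstruction losses, one decoder per person, and a swap that decodes A's latent code with B's decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (an assumption): each "face" is a 16-dim vector built from a
# person-specific appearance template plus shared expression/pose variation.
DIM, LATENT, N = 16, 4, 256
expr_basis = rng.normal(size=(LATENT, DIM))  # expression/pose directions common to both people
template_a = rng.normal(size=DIM)            # person A's appearance
template_b = rng.normal(size=DIM)            # person B's appearance
faces_a = template_a + rng.normal(size=(N, LATENT)) @ expr_basis
faces_b = template_b + rng.normal(size=(N, LATENT)) @ expr_basis


def init_linear(n_in, n_out):
    """[weights, bias] for one linear layer, with small random weights."""
    return [rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)]


encoder = init_linear(DIM, LATENT)    # shared by both people
decoder_a = init_linear(LATENT, DIM)  # trained only on person A
decoder_b = init_linear(LATENT, DIM)  # trained only on person B


def encode(x):
    return x @ encoder[0] + encoder[1]


def decode(h, decoder):
    return h @ decoder[0] + decoder[1]


def recon_loss(x, decoder):
    """Mean squared reconstruction error of x through the shared encoder and one decoder."""
    return float(np.mean((decode(encode(x), decoder) - x) ** 2))


def train_step(x, decoder, lr=0.01):
    """One gradient-descent step on recon_loss, updating the shared encoder and this decoder."""
    h = encode(x)
    grad_out = 2 * (decode(h, decoder) - x) / x.size  # d(loss)/d(output)
    grad_h = grad_out @ decoder[0].T                  # backpropagate into the latent code
    decoder[0] -= lr * h.T @ grad_out
    decoder[1] -= lr * grad_out.sum(axis=0)
    encoder[0] -= lr * x.T @ grad_h                   # shared encoder: gradients from both people
    encoder[1] -= lr * grad_h.sum(axis=0)


loss_before = recon_loss(faces_a, decoder_a) + recon_loss(faces_b, decoder_b)
for _ in range(2000):
    train_step(faces_a, decoder_a)  # alternate people so both shape the shared latent space
    train_step(faces_b, decoder_b)
loss_after = recon_loss(faces_a, decoder_a) + recon_loss(faces_b, decoder_b)

# Swap: encode person A's face with the shared encoder, decode with person B's decoder.
swapped = decode(encode(faces_a[:1]), decoder_b)
print(f"reconstruction loss: {loss_before:.3f} -> {loss_after:.3f}")
print("swapped face shape:", swapped.shape)
```

This toy encoder is linear and tiny, so it does not cleanly separate identity from expression. It only illustrates how the shared encoder and the per-person decoders are trained and then recombined for the swap.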

💡

Detection clue: Autoencoder-based deepfakes often show visible blending boundaries at the face edge, color mismatch between the face and neck/ears, and reduced detail resolution compared to the background.
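The blending step, and why it can leave seams and color mismatch, can be illustrated with NumPy. The sketch makes several assumptions: a synthetic frame, a flat "generated" face, an elliptical mask, a box blur for feathering, and a per-channel mean/std color transfer (a simplified RGB variant of Reinhard-style color transfer). Real pipelines typically use landmark-based warping and Gaussian-feathered or Poisson blending. `max_step` is a hypothetical, crude seam measure that reports the largest jump between neighboring pixels.

```python
import numpy as np

H, W = 64, 64
yy, xx = np.mgrid[0:H, 0:W]

# Hypothetical target frame: warm skin tone with a gentle vertical shading gradient.
frame = np.full((H, W, 3), [0.80, 0.60, 0.50]) + (yy / H * 0.1)[..., None]
# Hypothetical generated face: a cooler tone, i.e. a color mismatch with the frame.
generated = np.full((H, W, 3), [0.60, 0.55, 0.60])
# Elliptical face mask: 1 inside the face region, 0 outside.
hard_mask = (((yy - 32) / 20.0) ** 2 + ((xx - 32) / 16.0) ** 2 <= 1.0).astype(float)


def box_blur(img, radius):
    """Separable box blur with edge padding (a cheap stand-in for a Gaussian feather)."""
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def match_color(src, ref, mask):
    """Shift and scale each channel of src so its masked mean/std match ref's."""
    inside = mask > 0.5
    out = src.copy()
    for c in range(3):
        s, r = src[..., c][inside], ref[..., c][inside]
        out[..., c] = (src[..., c] - s.mean()) / (s.std() + 1e-6) * r.std() + r.mean()
    return out


def blend(face, background, mask):
    """Alpha-composite the face over the background using mask weights in [0, 1]."""
    m = mask[..., None]
    return m * face + (1 - m) * background


def max_step(img):
    """Largest jump between neighboring pixels: a crude proxy for a visible seam."""
    return max(np.abs(np.diff(img, axis=0)).max(), np.abs(np.diff(img, axis=1)).max())


naive = blend(generated, frame, hard_mask)
careful = blend(match_color(generated, frame, hard_mask), frame, box_blur(hard_mask, 3))
print(f"hard mask, no color match:       {max_step(naive):.3f}")
print(f"feathered mask, color-matched:   {max_step(careful):.3f}")
```

Feathering spreads the transition over several pixels, and matching color statistics removes most of the tone jump. Whatever mismatch or softness remains along that border is the kind of trace the detection clue describes.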

GAN-Based Generation

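Generative Adversarial Networks (GANs) typically produce higher-quality deepfakes by training two networks in competition: a generator that synthesizes images and a discriminator that learns to tell them apart from real ones. Widely used GAN-based methods include: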

StyleGAN: Generates photorealistic faces from scratch. Used for "this person does not exist" type fakes.
CycleGAN: Translates between face domains without paired training data. Useful for face reenactment.
StarGAN: Multi-domain face attribute transfer — changing age, gender, expression, or hairstyle.
FSGAN: Face swapping and reenactment that works on any face without subject-specific training.
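The generator-discriminator competition is usually written as a pair of binary cross-entropy losses. The plain-Python sketch below computes them from discriminator output probabilities. `gan_losses` is a hypothetical helper. For the generator it uses the widely used non-saturating loss, -log D(G(z)), rather than the literal minimax form.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def gan_losses(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy GAN losses from discriminator probabilities.

    d_real: D(x) for a batch of real images; d_fake: D(G(z)) for generated ones.
    The discriminator minimizes -[log D(x) + log(1 - D(G(z)))];
    the generator minimizes the non-saturating loss -log D(G(z)).
    """
    d_loss = -(_mean([math.log(p + eps) for p in d_real])
               + _mean([math.log(1 - p + eps) for p in d_fake]))
    g_loss = -_mean([math.log(p + eps) for p in d_fake])
    return d_loss, g_loss


# Early in training: the discriminator easily separates real from fake.
print(gan_losses(d_real=[0.9, 0.95], d_fake=[0.05, 0.1]))

# Ideal equilibrium: D outputs 0.5 everywhere, so it can no longer tell them apart.
d, g = gan_losses(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
print(round(d, 3), round(g, 3))  # → 1.386 0.693
```

As the generator improves, D(G(z)) rises toward 0.5. That lowers the generator loss and raises the discriminator loss. At the theoretical equilibrium, the discriminator outputs 0.5 for every input.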

Diffusion Model-Based Generation

Modern diffusion models now rival or surpass GANs in image quality and are increasingly used for deepfake creation:

Stable Diffusion inpainting: Replace a face by masking the face region and regenerating it with a specific identity.
ControlNet: Condition generation on face landmarks, preserving pose while changing identity.
IP-Adapter: Transfer identity from a reference image to a generated image.
InstantID / PhotoMaker: Single-image identity transfer with high fidelity.
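The masking idea behind diffusion-based face replacement can be sketched in NumPy. The sketch assumes the standard DDPM linear noise schedule, a deterministic DDIM-style update, and the "replace the known region at every step" inpainting strategy used in RePaint-style methods. It is a pixel-space toy. Stable Diffusion itself works in a VAE latent space, and its dedicated inpainting checkpoints instead receive the mask and masked image as extra input channels. `toy_predict_x0`, the random "frame" and the "target identity" arrays are all hypothetical stand-ins for a trained, identity-conditioned network and real images.

```python
import numpy as np

rng = np.random.default_rng(0)

# DDPM-style linear noise schedule; alpha_bar[t] is the fraction of signal kept at step t.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.concatenate([[1.0], np.cumprod(1.0 - betas)])  # alpha_bar[0] = 1 means "clean"


def add_noise(x0, t, noise):
    """Forward process q(x_t | x_0) = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


H = W = 32
original = rng.uniform(size=(H, W))         # hypothetical source frame (grayscale)
target_identity = rng.uniform(size=(H, W))  # hypothetical content for the new identity
mask = np.zeros((H, W), dtype=bool)
mask[8:24, 10:22] = True                    # face region to regenerate


def toy_predict_x0(x_t, t):
    """Hypothetical stand-in for a trained, identity-conditioned denoiser.

    A real model would estimate the clean image from x_t and t; this toy ignores both.
    """
    return target_identity


x = rng.normal(size=(H, W))  # start from (almost) pure noise at step T
for t in range(T, 0, -1):
    x0_hat = toy_predict_x0(x, t)
    eps_hat = (x - np.sqrt(alpha_bar[t]) * x0_hat) / np.sqrt(1.0 - alpha_bar[t])
    # Deterministic DDIM-style step from t to t - 1.
    x = np.sqrt(alpha_bar[t - 1]) * x0_hat + np.sqrt(1.0 - alpha_bar[t - 1]) * eps_hat
    # Outside the mask, keep the original image, noised to the same level t - 1.
    known = add_noise(original, t - 1, rng.normal(size=(H, W)))
    x = np.where(mask, x, known)

print("pixels outside mask unchanged:", np.allclose(x[~mask], original[~mask]))  # → True
```

Only the masked face region is ever generated. Everything else is re-injected from the original frame at each step, so any texture or identity inconsistency is concentrated inside and along the mask.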

Lip Sync Deepfakes

A specialized category where the mouth region is modified to match different audio:

Wav2Lip: Given audio and a face video, generates realistic lip movements that match the audio.
Video Rewrite: An early approach that warps mouth regions to match target phonemes.
Detection challenge: Only a small region (the mouth) is modified, making detection harder than for full face swaps.
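The reason lip-sync fakes touch so little of the frame can be sketched in NumPy. The setup is roughly the one Wav2Lip uses: the lower half of the face crop is masked, and a generator conditioned on audio fills it in. The real model also sees an unmasked reference frame, which is omitted here. The frame size, crop location, 80-dimensional audio feature vector, and `toy_lip_generator` are hypothetical placeholders, not Wav2Lip's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

FRAME_H, FRAME_W = 720, 1280               # assumed 720p frame
FACE_TOP, FACE_LEFT, FACE = 200, 560, 192  # assumed square face crop location and size
frame = rng.uniform(size=(FRAME_H, FRAME_W, 3))


def toy_lip_generator(masked_crop, audio_features):
    """Hypothetical stand-in for an audio-conditioned generator such as Wav2Lip.

    It only fills in the masked lower half; a real model would synthesize
    mouth shapes that match the audio.
    """
    out = masked_crop.copy()
    out[FACE // 2:] = float(audio_features.mean())  # placeholder "mouth" pixels
    return out


crop = frame[FACE_TOP:FACE_TOP + FACE, FACE_LEFT:FACE_LEFT + FACE].copy()
crop[FACE // 2:] = 0.0  # hide the lower face (mouth, jaw) from the generator
generated = toy_lip_generator(crop, audio_features=rng.normal(size=80))

# Paste back only the regenerated lower half; the rest of the frame is untouched.
output = frame.copy()
mouth_rows = slice(FACE_TOP + FACE // 2, FACE_TOP + FACE)
output[mouth_rows, FACE_LEFT:FACE_LEFT + FACE] = generated[FACE // 2:]

changed = np.any(output != frame, axis=-1)
print(f"modified pixels: {changed.mean():.1%} of the frame")  # → modified pixels: 2.0% of the frame
```

Under these assumed sizes, a detector looking at whole frames has only about 2% of the pixels to work with. This is why lip-sync detection usually focuses on the mouth region and its agreement with the audio.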

Common Artifacts by Generation Method

Method	Common Artifacts	Detection Approach
Autoencoder	Blending boundaries, color mismatch, blur	Edge analysis, color consistency
GAN	Spectral artifacts, checkerboard patterns, asymmetry	Frequency analysis, symmetry checks
Diffusion	Subtle texture inconsistencies, identity bleed	Texture analysis, ML classifiers
Lip sync	Mouth boundary artifacts, teeth rendering issues	Lip-audio sync analysis
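The "frequency analysis" approach listed for GANs can be illustrated with NumPy's FFT. The sketch assumes a smooth synthetic image and compares two ways of upsampling it. Nearest-neighbor upsampling is the first. The second is a stride-2 transposed convolution with a 3x3 all-ones kernel, whose uneven overlap produces the classic checkerboard pattern. `high_freq_ratio` and its 0.25 cycles/pixel cutoff are illustrative choices, not a standard detector.

```python
import numpy as np

n = 32
y, x = np.mgrid[0:n, 0:n] / n
low = np.sin(2 * np.pi * x) + 0.5 * np.cos(2 * np.pi * 2 * y)  # smooth synthetic low-res image


def nearest_upsample(img):
    """2x nearest-neighbor upsampling: every pixel becomes a 2x2 block."""
    return img.repeat(2, axis=0).repeat(2, axis=1)


def transposed_conv_stride2(img):
    """Stride-2 transposed convolution with a 3x3 all-ones kernel (circular borders).

    Output pixels receive 1, 2 or 4 overlapping contributions depending on their
    position, which creates a periodic checkerboard pattern.
    """
    up = np.zeros((2 * img.shape[0], 2 * img.shape[1]))
    up[::2, ::2] = img  # insert zeros between input pixels
    out = np.zeros_like(up)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += np.roll(up, (dy, dx), axis=(0, 1))
    return out


def high_freq_ratio(img, cutoff=0.25):
    """Fraction of spectral power above `cutoff` cycles/pixel (0.5 = Nyquist)."""
    power = np.abs(np.fft.fft2(img - img.mean())) ** 2
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    return power[np.hypot(fx, fy) > cutoff].sum() / power.sum()


smooth = nearest_upsample(low)
checker = transposed_conv_stride2(low)
print(f"nearest-neighbor upsampling: {high_freq_ratio(smooth):.3f}")
print(f"stride-2 transposed conv:    {high_freq_ratio(checker):.3f}")
```

The checkerboard puts a large share of its energy near the Nyquist frequency, where natural-looking, smoothly upsampled content has almost none. Frequency-based detectors look for this kind of periodic spectral peak.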

✅

Key takeaway: Knowing how each deepfake method works reveals its weaknesses. Effective detection requires understanding all major generation approaches, as each leaves different forensic traces that detectors can target.
