Intermediate

AI Face Tracking

Face tracking is the foundation of real-time avatar animation. Modern models can detect faces, track 468 or more facial landmarks, estimate head pose, and follow eye gaze using just a standard webcam.

Face Detection vs Face Tracking

  • Detection: Finding where faces are in a frame (bounding box). Fast but coarse.
  • Tracking: Following specific facial features (landmarks) frame-to-frame with high precision.
  • Real-time avatars need tracking: The animation depends on consistent, precise landmark positions delivered at 30 fps or more.

Landmark Models

Framework            | Landmarks        | Platform            | Speed
MediaPipe Face Mesh  | 468              | Cross-platform, web | Very fast
Apple ARKit          | 52 blend shapes  | iOS only            | Fast, high quality
dlib                 | 68               | Desktop             | Moderate
OpenCV DNN           | Varies           | Cross-platform      | Fast
NVIDIA Maxine        | 126+             | NVIDIA GPU          | Very fast

Head Pose Estimation

Determining head rotation (yaw, pitch, roll) and position from 2D landmarks:

  • Perspective-n-Point (PnP) algorithms solve for the 3D pose by matching the detected 2D landmarks to the corresponding points on a reference 3D face model
  • Deep learning approaches directly regress head pose angles from face crops
  • Critical for avatar head movement that mirrors the user naturally
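
A PnP solver (for example OpenCV's cv2.solvePnP, whose rotation vector can be converted with cv2.Rodrigues) yields a 3x3 rotation matrix rather than angles. The sketch below shows one way to turn such a matrix into yaw, pitch, and roll using only the standard library. The axis convention (R = Rz(roll) · Ry(yaw) · Rx(pitch)) and both function names are assumptions made for illustration; a real pipeline has to match the convention of its camera and avatar rig.

```python
import math


def rotation_from_euler(yaw, pitch, roll):
    """Build R = Rz(roll) @ Ry(yaw) @ Rx(pitch). Angles are in radians.

    Assumed convention for this sketch only.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cr * cy, cr * sy * sp - sr * cp, cr * sy * cp + sr * sp],
        [sr * cy, sr * sy * sp + cr * cp, sr * sy * cp - cr * sp],
        [-sy, cy * sp, cy * cp],
    ]


def euler_from_rotation(R):
    """Recover (yaw, pitch, roll) in radians from a 3x3 rotation matrix.

    Inverse of rotation_from_euler under the same assumed convention.
    """
    cos_yaw = math.hypot(R[0][0], R[1][0])  # equals |cos(yaw)|
    yaw = math.atan2(-R[2][0], cos_yaw)
    if cos_yaw > 1e-6:
        pitch = math.atan2(R[2][1], R[2][2])
        roll = math.atan2(R[1][0], R[0][0])
    else:
        # Gimbal lock (yaw near +/-90 degrees): pitch and roll cannot be
        # separated, so roll is fixed to zero and pitch absorbs the rest.
        pitch = math.atan2(-R[1][2], R[1][1])
        roll = 0.0
    return yaw, pitch, roll


if __name__ == "__main__":
    R = rotation_from_euler(math.radians(20), math.radians(-10), math.radians(5))
    print([round(math.degrees(a), 2) for a in euler_from_rotation(R)])
```

The angles can then drive the avatar's head bone directly; the gimbal-lock branch rarely matters for a user facing a webcam, since landmark tracking usually fails well before the head turns 90 degrees.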

Eye and Gaze Tracking

  • Iris tracking: MediaPipe Iris provides real-time iris landmark detection
  • Gaze direction: Estimate where the user is looking based on iris position relative to eye corners
  • Blink detection: Track eyelid openness for natural blinking on the avatar
  • Pupil dilation: An advanced signal that can add emotional expression to the avatar
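
Two of the signals above reduce to simple geometry once landmarks are available. The sketch below is a minimal illustration, not any library's API: it computes a horizontal gaze ratio from the iris center and the two eye corners, and uses the eye aspect ratio (EAR), a commonly used blink measure swapped in here as one possible way to track eyelid openness. The function names, the six-point eye ordering, and the 0.2 blink threshold are all assumptions; the threshold in particular would need tuning per camera and user.

```python
import math


def horizontal_gaze_ratio(corner_a, corner_b, iris_center):
    """Position of the iris along the line from corner_a to corner_b.

    0.0 means the iris sits at corner_a, 1.0 at corner_b, and about 0.5
    means the eye is looking roughly straight ahead.
    """
    ax, ay = corner_a
    bx, by = corner_b
    ix, iy = iris_center
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        raise ValueError("eye corners must be distinct points")
    # Scalar projection of (iris - corner_a) onto the corner-to-corner axis.
    return ((ix - ax) * dx + (iy - ay) * dy) / length_sq


def eye_aspect_ratio(eye):
    """Eye aspect ratio from six (x, y) points.

    Assumed ordering: p1 and p4 are the eye corners, p2 and p3 lie on the
    upper lid, p5 and p6 on the lower lid (p2 above p6, p3 above p5).
    """
    p1, p2, p3, p4, p5, p6 = eye
    horizontal = math.dist(p1, p4)
    if horizontal == 0:
        raise ValueError("eye corners must be distinct points")
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    return vertical / (2.0 * horizontal)


def is_blinking(eye, threshold=0.2):
    """True when the eye aspect ratio drops below an assumed threshold."""
    return eye_aspect_ratio(eye) < threshold


if __name__ == "__main__":
    open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
    print(eye_aspect_ratio(open_eye), is_blinking(open_eye))
    print(horizontal_gaze_ratio((0, 0), (10, 0), (5, 1)))
```

Both ratios are normalized by the eye width, so they stay roughly stable as the user moves closer to or farther from the camera.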

Quality vs speed tradeoff: On iPhone, Apple ARKit provides the highest-quality face tracking because the TrueDepth camera adds a depth sensor to the color image. On desktop, MediaPipe offers the best balance of quality and speed using just a webcam.