Intermediate
AI Face Tracking
Face tracking is the foundation of real-time avatar animation. Modern AI can detect faces, track 468+ landmarks, estimate head pose, and track eye gaze using just a standard webcam.
Face Detection vs Face Tracking
- Detection: Finding where faces are in a frame (bounding box). Fast but coarse.
- Tracking: Following specific facial features (landmarks) frame-to-frame with high precision.
- Real-time avatars need tracking: They depend on consistent, precise landmark positions at 30 fps or more.
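Even tracked landmarks often jitter slightly from one frame to the next, and that jitter shows up as shimmer on an avatar. A minimal sketch of one common way to steady them is an exponential moving average (EMA), assuming the tracker delivers a list of (x, y) points per frame. `LandmarkSmoother` and its `alpha` parameter are hypothetical names for illustration, not part of any library mentioned on this page.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class LandmarkSmoother:
    """Exponential moving average (EMA) over per-frame landmark positions.

    Hypothetical helper, not part of any tracking library.
    alpha close to 1.0 follows raw detections (less lag, more jitter);
    alpha close to 0.0 smooths heavily (less jitter, more lag).
    """

    def __init__(self, alpha: float = 0.5) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._state: Optional[List[Point]] = None

    def update(self, landmarks: List[Point]) -> List[Point]:
        """Blend this frame's landmarks into the running estimate and return it."""
        # First frame, or the landmark count changed: reset to the raw detections.
        if self._state is None or len(self._state) != len(landmarks):
            self._state = list(landmarks)
            return list(self._state)
        a = self.alpha
        self._state = [
            (a * x + (1.0 - a) * sx, a * y + (1.0 - a) * sy)
            for (x, y), (sx, sy) in zip(landmarks, self._state)
        ]
        return list(self._state)


if __name__ == "__main__":
    smoother = LandmarkSmoother(alpha=0.5)
    print(smoother.update([(100.0, 200.0)]))  # [(100.0, 200.0)]
    print(smoother.update([(104.0, 196.0)]))  # [(102.0, 198.0)]
```

A fixed `alpha` trades lag for stability everywhere; adaptive filters such as the One Euro filter lower the smoothing when the head moves quickly, which can suit avatars better.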
Landmark Models
| Framework | Tracking output | Platform | Speed |
|---|---|---|---|
| MediaPipe Face Mesh | 468 3D landmarks | Cross-platform, web | Very fast |
| Apple ARKit | 52 blend shapes + face mesh | iOS only | Fast, high quality |
| dlib | 68 2D landmarks | Desktop | Moderate |
| OpenCV DNN | Varies by model | Cross-platform | Fast |
| NVIDIA Maxine | 126+ landmarks | NVIDIA GPU | Very fast |
Head Pose Estimation
Determining head rotation (yaw, pitch, roll) and position from 2D landmarks:
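- Perspective-n-Point (PnP) algorithms recover the 3D rotation and translation of a generic 3D face model by matching its points to the detected 2D landmarks, given the camera intrinsics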
- Deep learning approaches directly regress head pose angles from face crops
- Critical for avatar head movement that mirrors the user naturally
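PnP solvers typically return the head rotation as a rotation vector or a 3×3 matrix rather than as yaw, pitch and roll (in OpenCV, for instance, `cv2.solvePnP` returns a rotation vector that `cv2.Rodrigues` converts to a matrix). A minimal sketch of the final step, converting that matrix into the three angles an avatar rig consumes, assuming the decomposition R = Rz(roll)·Ry(yaw)·Rx(pitch) about camera axes with x right, y down and z forward, as in OpenCV. Both function names are hypothetical; `compose_rotation` is included only to make the assumed convention explicit and to allow a round-trip check. Sign and axis conventions differ between libraries and face models, so verify this order against your own setup.

```python
import math
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def compose_rotation(yaw: float, pitch: float, roll: float) -> List[List[float]]:
    """Build R = Rz(roll) @ Ry(yaw) @ Rx(pitch) from angles in degrees."""
    a, b, g = math.radians(pitch), math.radians(yaw), math.radians(roll)
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cg, sg = math.cos(g), math.sin(g)
    return [
        [cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
        [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
        [-sb, cb * sa, cb * ca],
    ]


def rotation_to_euler(R: Matrix3) -> Tuple[float, float, float]:
    """Invert compose_rotation: return (yaw, pitch, roll) in degrees."""
    # |cos(yaw)|, recovered from the first column; near zero means gimbal lock.
    cos_yaw = math.hypot(R[0][0], R[1][0])
    yaw = math.atan2(-R[2][0], cos_yaw)
    if cos_yaw > 1e-6:
        pitch = math.atan2(R[2][1], R[2][2])
        roll = math.atan2(R[1][0], R[0][0])
    else:
        # Gimbal lock (yaw near +/-90 degrees): pitch and roll become coupled,
        # so report the combined rotation as pitch and set roll to zero.
        pitch = math.atan2(-R[1][2], R[1][1])
        roll = 0.0
    return math.degrees(yaw), math.degrees(pitch), math.degrees(roll)


if __name__ == "__main__":
    R = compose_rotation(yaw=15.0, pitch=-10.0, roll=5.0)
    print([round(v, 6) for v in rotation_to_euler(R)])  # [15.0, -10.0, 5.0]
```

Real faces rarely reach ±90° of yaw while still being tracked, but handling the gimbal-lock branch keeps the function from returning NaN-like jumps at the extremes.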
Eye and Gaze Tracking
- Iris tracking: MediaPipe Iris provides real-time iris landmark detection
- Gaze direction: Estimate where the user is looking based on iris position relative to eye corners
- Blink detection: Track eyelid openness for natural blinking on the avatar
- Pupil dilation: Advanced tracking of pupil size, used to add emotional nuance to the avatar's eyes
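The gaze and blink signals above are usually reduced to a few numbers an avatar rig can consume. A minimal sketch, assuming you already have 2D eye landmarks from a tracker (which landmark indices correspond to the eye corners and eyelids depends on the model, so that mapping is left out). `horizontal_gaze_ratio` projects the iris center onto the line between the eye corners, and `eye_aspect_ratio` implements the eye aspect ratio (EAR) of Soukupová and Čech (2016), a common blink measure swapped in here for illustration. All names, plus the 0.2 threshold and two-frame minimum, are hypothetical defaults that would need tuning per camera and user; the gaze ratio also ignores head rotation, so in practice it is combined with head pose.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def _dist(p: Point, q: Point) -> float:
    return math.hypot(p[0] - q[0], p[1] - q[1])


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """EAR from six eye points ordered p1..p6.

    p1/p4 are the eye corners, p2/p3 lie on the upper lid, p6/p5 on the lower lid.
    The value stays roughly constant while the eye is open and drops toward 0 as it closes.
    """
    p1, p2, p3, p4, p5, p6 = eye
    return (_dist(p2, p6) + _dist(p3, p5)) / (2.0 * _dist(p1, p4))


def horizontal_gaze_ratio(iris: Point, inner: Point, outer: Point) -> float:
    """Project the iris center onto the inner->outer eye-corner axis.

    Returns about 0.0 at the inner corner, 0.5 when centered, 1.0 at the outer corner.
    """
    ax, ay = outer[0] - inner[0], outer[1] - inner[1]
    vx, vy = iris[0] - inner[0], iris[1] - inner[1]
    return (vx * ax + vy * ay) / (ax * ax + ay * ay)


class BlinkDetector:
    """Counts a blink when EAR stays below threshold for min_frames, then rises again."""

    def __init__(self, threshold: float = 0.2, min_frames: int = 2) -> None:
        self.threshold = threshold
        self.min_frames = min_frames
        self._closed_frames = 0
        self.blinks = 0

    def update(self, ear: float) -> bool:
        """Feed one frame's EAR; returns True on the frame a blink completes."""
        if ear < self.threshold:
            self._closed_frames += 1
            return False
        completed = self._closed_frames >= self.min_frames
        self._closed_frames = 0
        if completed:
            self.blinks += 1
        return completed


if __name__ == "__main__":
    open_eye = [(0.0, 0.0), (3.0, -2.0), (7.0, -2.0), (10.0, 0.0), (7.0, 2.0), (3.0, 2.0)]
    print(eye_aspect_ratio(open_eye))  # 0.4
    print(horizontal_gaze_ratio((5.0, 0.0), (0.0, 0.0), (10.0, 0.0)))  # 0.5
```

Requiring the eye to stay closed for a couple of frames filters out single-frame landmark glitches, at the cost of possibly missing very fast blinks at low frame rates.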
Quality vs speed trade-off: Apple ARKit on iPhones with a TrueDepth camera provides the highest-quality face tracking, because the depth sensor supplies 3D data that a color-only webcam lacks. For desktop, MediaPipe offers the best balance of quality and speed using just a webcam.
Lilly Tech Systems