Intermediate
Expression Transfer
Expression transfer maps your real facial movements onto a digital avatar in real time, making the character mirror your emotions, speech, and micro-expressions faithfully.
Blend Shapes (Morph Targets)
Avatars express emotions through blend shapes — predefined facial deformations that can be combined:
- ARKit standard: 52 blend shapes covering all major facial movements
- Visemes: Mouth shapes for speech (typically 15 phoneme groups)
- Custom shapes: Additional expressions specific to your avatar style
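Conceptually, each blend shape stores per-vertex offsets from the neutral mesh, and the final face is the neutral mesh plus a weighted sum of the active shapes' offsets. A minimal sketch (the `apply_blend_shapes` helper and the toy two-vertex mesh are illustrative, not any engine's API; the "jawOpen" name follows ARKit conventions):

```python
def apply_blend_shapes(neutral, shapes, weights):
    """Combine blend shapes into a final mesh.

    neutral: list of (x, y, z) vertex positions
    shapes:  dict of shape name -> per-vertex (dx, dy, dz) offsets
    weights: dict of shape name -> coefficient in 0.0-1.0
    """
    result = [list(v) for v in neutral]
    for name, w in weights.items():
        if w == 0.0 or name not in shapes:
            continue  # inactive or unknown shape contributes nothing
        for i, delta in enumerate(shapes[name]):
            for axis in range(3):
                result[i][axis] += w * delta[axis]
    return [tuple(v) for v in result]

# Toy two-vertex "face": "jawOpen" pulls the lower vertex down by up
# to 0.5 units; a weight of 0.8 applies 80% of that deformation.
neutral = [(0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
shapes = {"jawOpen": [(0.0, 0.0, 0.0), (0.0, -0.5, 0.0)]}
print(apply_blend_shapes(neutral, shapes, {"jawOpen": 0.8}))
```

Because the combination is additive, several shapes (a smile plus an open jaw, say) can be active at once with independent weights.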
The Transfer Pipeline
Capture
A camera captures your face at 30-60 fps, with AI landmark detection running on every frame.
Extract Parameters
A model converts the detected landmarks into blend shape coefficients (a weight between 0.0 and 1.0 for each shape).
Calibrate
Map your neutral face to the avatar's neutral state; account for individual facial differences.
Apply
Set blend shape weights on the avatar mesh each frame for smooth, real-time animation.
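The middle two stages can be sketched as a single per-frame function. Everything here is a stand-in: `extract` represents whatever landmark-to-coefficient model you use, and the returned dict is what you would hand to the engine's blend shape weights each frame; neither is a specific SDK's API.

```python
def process_frame(landmarks, extract, neutral_offsets):
    """Run one frame through extract -> calibrate, returning weights
    ready to apply to the avatar mesh.

    extract:         callable mapping landmarks to raw coefficients
    neutral_offsets: per-shape bias measured from the user's neutral face
    """
    raw = extract(landmarks)                       # Extract Parameters
    return {                                       # Calibrate: remove the
        name: min(1.0, max(0.0, w - neutral_offsets.get(name, 0.0)))
        for name, w in raw.items()                 # neutral bias, then clamp
    }                                              # Apply: set these on the mesh

# Fake extractor: the tracker reports a slightly open jaw even at rest
# (neutral offset 0.1) plus a genuine 0.6 smile.
fake_extract = lambda lm: {"jawOpen": 0.15, "mouthSmileLeft": 0.6}
weights = process_frame(None, fake_extract, {"jawOpen": 0.1})
print(weights)  # jawOpen reduced to ~0.05, smile passed through
```

Subtracting the measured neutral bias is the simplest form of the calibration step; the next section covers richer approaches.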
Calibration Techniques
- Auto-calibration: System detects your neutral face and expression range automatically
- Manual calibration: User performs specific expressions (smile, frown, surprise) to set ranges
- Per-user profiles: Save calibration data for consistent results across sessions
- Range scaling: Amplify subtle expressions for more expressive avatars (common in VTubing)
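Manual calibration plus range scaling might look like the sketch below: record the coefficient the user reaches at a maximum smile, then remap live values so their personal range drives the avatar's full 0-1 range, with an optional amplification factor for the exaggerated VTubing style. The `calibrate_weight` helper is a hypothetical illustration, not a library function.

```python
def calibrate_weight(raw, neutral, peak, amplify=1.0):
    """Remap a raw coefficient through the user's measured range.

    neutral: coefficient at the user's resting face
    peak:    coefficient at the user's maximum expression
    amplify: >1.0 exaggerates expressions (common in VTubing)
    """
    span = max(peak - neutral, 1e-6)              # guard against zero range
    scaled = (raw - neutral) / span * amplify     # remap into 0..1, amplify
    return min(1.0, max(0.0, scaled))             # clamp to a valid weight

# This user's smile only ever reaches a raw 0.5, so a raw 0.4 should
# already drive the avatar to 80% smile (or saturate with 1.3x gain).
print(calibrate_weight(0.4, neutral=0.0, peak=0.5))               # 0.8
print(calibrate_weight(0.4, neutral=0.0, peak=0.5, amplify=1.3))  # 1.0
```

Storing `neutral` and `peak` per shape is exactly the data a per-user profile would persist between sessions.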
Common Challenges
| Challenge | Cause | Solution |
|---|---|---|
| Jitter | Noisy tracking data | Temporal smoothing filters (exponential, Kalman) |
| Latency | Processing pipeline delays | Optimize model, reduce pipeline stages |
| Drift | Gradual offset from neutral | Periodic re-calibration, drift correction |
| Cross-talk | One expression triggering another | Independent blend shape channels, better training data |
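The exponential smoothing named in the table is a one-pole filter over each coefficient; `alpha` trades latency for stability (lower is smoother but laggier). A minimal sketch, with a Kalman filter being the heavier-weight alternative:

```python
class ExpSmoother:
    """Per-shape exponential smoothing to suppress tracking jitter."""

    def __init__(self, alpha=0.4):
        self.alpha = alpha   # blend factor: new sample vs. history
        self.state = {}      # last smoothed value per shape name

    def smooth(self, weights):
        for name, w in weights.items():
            prev = self.state.get(name, w)  # seed with the first sample
            self.state[name] = self.alpha * w + (1 - self.alpha) * prev
        return dict(self.state)

# Noisy jaw readings settle toward the signal instead of flickering.
s = ExpSmoother(alpha=0.5)
for raw in [0.0, 1.0, 0.2, 0.9]:
    print(round(s.smooth({"jawOpen": raw})["jawOpen"], 3))
```

Smoothing adds a frame or two of effective latency, so it interacts with the latency row above: a very low `alpha` can fix jitter while making lip sync feel sluggish.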
Pro tip: For VTubing and entertainment, slightly exaggerate expression transfer (1.2-1.5x amplification). For business and communication, keep it at 1.0x for natural appearance.
Lilly Tech Systems