Intermediate
Unity ML-Agents Toolkit
ML-Agents is Unity's open-source toolkit for training intelligent agents using deep reinforcement learning, imitation learning, and other ML methods.
What is ML-Agents?
The Unity ML-Agents Toolkit provides a bridge between Unity environments and Python-based machine learning frameworks. You define observations, actions, and rewards in C#, then train agents using PyTorch on the Python side. Trained models are exported as ONNX files and run inside Unity through its inference library (Unity Sentis in recent ML-Agents releases, Barracuda in older ones).
Architecture Overview
| Component | Language / Technology | Role |
|---|---|---|
| Agent (C#) | C# | Collects observations, receives actions, assigns rewards |
| Communicator | gRPC | Bridges Unity and the Python training process |
| Trainer (Python) | Python | Runs the PPO/SAC algorithms and updates the neural network |
| ONNX Model | Cross-platform format | Exported trained model for runtime inference |
Creating Your First Agent
C# - ML-Agents Agent Script
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    public class BallAgent : Agent
    {
        public Transform target;
        private Rigidbody rb;

        public override void Initialize()
        {
            rb = GetComponent<Rigidbody>();
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            // Agent position and velocity
            sensor.AddObservation(transform.localPosition);
            sensor.AddObservation(rb.velocity);
            // Target position
            sensor.AddObservation(target.localPosition);
        }

        public override void OnActionReceived(ActionBuffers actions)
        {
            float moveX = actions.ContinuousActions[0];
            float moveZ = actions.ContinuousActions[1];
            rb.AddForce(new Vector3(moveX, 0, moveZ) * 10f);

            // Reward for reaching target
            float dist = Vector3.Distance(
                transform.localPosition, target.localPosition);
            if (dist < 1.5f)
            {
                SetReward(1.0f);
                EndEpisode();
            }
        }

        public override void OnEpisodeBegin()
        {
            // Reset agent and target positions
            transform.localPosition = Vector3.zero;
            rb.velocity = Vector3.zero;
            target.localPosition = new Vector3(
                Random.Range(-4f, 4f), 0.5f, Random.Range(-4f, 4f));
        }
    }
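To make the episode flow of the script above easier to follow, here is a rough pure-Python stand-in for `BallAgent`. It is only a sketch under stated assumptions, not part of ML-Agents: the class `ToyBallAgent`, the helper `steer_towards_target`, the unit mass, the 0.02 s time step, and the lack of friction and gravity are all hypothetical simplifications. It shows that the agent emits 9 observation floats (3 + 3 + 3), consumes 2 continuous actions, and earns a reward of 1.0 once it comes within 1.5 units of the target.

```python
import math
import random

REACH_DISTANCE = 1.5   # same threshold as the C# script
FORCE_SCALE = 10.0     # same multiplier as rb.AddForce(... * 10f)
DT = 0.02              # assumed physics step, not taken from the original


class ToyBallAgent:
    """Hypothetical stand-in for BallAgent: unit mass, no friction or gravity."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.on_episode_begin()

    def on_episode_begin(self):
        # Mirrors OnEpisodeBegin: agent at the origin, target placed at random.
        self.pos = [0.0, 0.0, 0.0]
        self.vel = [0.0, 0.0, 0.0]
        self.target = [self.rng.uniform(-4, 4), 0.5, self.rng.uniform(-4, 4)]

    def collect_observations(self):
        # Mirrors CollectObservations: position + velocity + target = 9 floats.
        return self.pos + self.vel + self.target

    def on_action_received(self, move_x, move_z):
        # Mirrors OnActionReceived: apply the force, then check the distance.
        self.vel[0] += move_x * FORCE_SCALE * DT
        self.vel[2] += move_z * FORCE_SCALE * DT
        self.pos = [p + v * DT for p, v in zip(self.pos, self.vel)]
        done = math.dist(self.pos, self.target) < REACH_DISTANCE
        return (1.0 if done else 0.0), done


def steer_towards_target(obs):
    # Hand-written stand-in for a trained policy: push along the x/z offset.
    dx, dz = obs[6] - obs[0], obs[8] - obs[2]
    norm = math.hypot(dx, dz) or 1.0
    return dx / norm, dz / norm


agent = ToyBallAgent(seed=0)
total_reward, steps, done = 0.0, 0, False
while not done and steps < 2000:
    action = steer_towards_target(agent.collect_observations())
    reward, done = agent.on_action_received(*action)
    total_reward += reward
    steps += 1

print(len(agent.collect_observations()), total_reward, done)
```

In the real toolkit the policy is a neural network updated by the Python trainer, and `EndEpisode()` triggers `OnEpisodeBegin()` automatically; in this sketch the caller would have to call `on_episode_begin()` itself.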
Training Algorithms
- PPO (Proximal Policy Optimization): The default RL algorithm. Stable, general-purpose, works well for most scenarios.
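- SAC (Soft Actor-Critic): An off-policy algorithm that reuses past experience from a replay buffer, so it is usually more sample-efficient than PPO. Its entropy term encourages exploration. Good for continuous action spaces.
- Imitation Learning (GAIL/BC): Train agents by mimicking recorded human demonstrations, using Generative Adversarial Imitation Learning or Behavioral Cloning. Useful when reward design is difficult.
- Self-Play: Train agents by competing against copies of themselves. Well suited to competitive games.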
- Curriculum Learning: Start with easy tasks and gradually increase difficulty as the agent improves.
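The stability attributed to PPO in the list above comes largely from its clipped surrogate objective, which limits how far a single update can move the policy. The following is a minimal per-sample sketch of that objective in plain Python; the function name `ppo_clipped_objective` is illustrative, the value 0.2 for the clip range is a commonly used setting rather than something stated in this lesson, and the actual trainer computes this over batches of tensors in PyTorch.

```python
def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """Per-sample PPO surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


# ratio = new_policy_prob / old_policy_prob for the action that was taken
capped_gain = ppo_clipped_objective(1.5, advantage=1.0)    # gain capped at 1 + epsilon
kept_penalty = ppo_clipped_objective(0.5, advantage=-1.0)  # the more pessimistic term wins
unchanged = ppo_clipped_objective(1.1, advantage=1.0)      # inside the clip range

print(capped_gain, kept_penalty, unchanged)
```

Taking the minimum of the clipped and unclipped terms means the objective never rewards the policy for moving further than the clip range allows, which is what keeps updates conservative.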
Running Training
Bash - Training Commands
    # Install ML-Agents Python package
    pip install mlagents

    # Start training
    mlagents-learn config/trainer_config.yaml --run-id=my_run

    # Monitor with TensorBoard
    tensorboard --logdir results
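The training command reads `config/trainer_config.yaml`, which this lesson does not show. As a hedged starting point, the Python snippet below writes a small PPO configuration to that path. The behavior name `BallAgent` is an assumption and has to match the Behavior Name set on the agent's Behavior Parameters component in Unity. The hyperparameter values are illustrative placeholders, not tuned recommendations.

```python
from pathlib import Path

# "BallAgent" is an assumed behavior name; it must match the Behavior Name
# configured on the agent's Behavior Parameters component in Unity.
# All numeric values are illustrative placeholders, not tuned settings.
TRAINER_CONFIG = """\
behaviors:
  BallAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 64
      buffer_size: 2048
      learning_rate: 3.0e-4
      epsilon: 0.2
      num_epoch: 3
    network_settings:
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 64
    summary_freq: 10000
"""

# Write the file to the path used by the mlagents-learn command above.
config_path = Path("config") / "trainer_config.yaml"
config_path.parent.mkdir(parents=True, exist_ok=True)
config_path.write_text(TRAINER_CONFIG)
print(f"Wrote {config_path}")
```

Switching `trainer_type` to `sac` selects the other RL trainer, though SAC expects a somewhat different set of hyperparameters, so the block would need adjusting rather than a one-word change.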
Key takeaway: ML-Agents enables you to train game AI using modern reinforcement learning techniques. Define observations and actions in C#, train with Python, and deploy ONNX models back into Unity for real-time inference.
Lilly Tech Systems