Intermediate

Unity ML-Agents Toolkit

ML-Agents is Unity's open-source toolkit for training intelligent agents using deep reinforcement learning, imitation learning, and other ML methods.

What is ML-Agents?

The Unity ML-Agents Toolkit bridges Unity environments and Python-based machine learning frameworks. You define observations, actions, and rewards in C#, then train agents with PyTorch on the Python side. Trained models are exported as ONNX files and run inside Unity via Unity Sentis (earlier releases used Barracuda).

Architecture Overview

Component	Technology	Role
Agent	C#	Collects observations, receives actions, assigns rewards
Communicator	gRPC	Bridges the Unity environment and the Python training process
Trainer	Python (PyTorch)	Runs PPO or SAC and updates the neural network
ONNX Model	ONNX (cross-platform)	Exported trained model for runtime inference

Creating Your First Agent

C# - ML-Agents Agent Script

using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class BallAgent : Agent
{
    public Transform target;
    private Rigidbody rb;

    public override void Initialize()
    {
        rb = GetComponent<Rigidbody>();
    }

    public override void CollectObservations(
        VectorSensor sensor)
    {
        // 9 floats in total; Space Size in Behavior Parameters must match
        // Agent position (3) and velocity (3)
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(rb.velocity);
        // Target position (3)
        sensor.AddObservation(target.localPosition);
    }

    public override void OnActionReceived(
        ActionBuffers actions)
    {
        // Two continuous actions (set Continuous Actions to 2 in Behavior Parameters)
        float moveX = actions.ContinuousActions[0];
        float moveZ = actions.ContinuousActions[1];
        rb.AddForce(new Vector3(moveX, 0, moveZ) * 10f);

        // Reward for reaching target
        float dist = Vector3.Distance(
            transform.localPosition, target.localPosition);
        if (dist < 1.5f)
        {
            SetReward(1.0f);
            EndEpisode();
        }
    }

    public override void OnEpisodeBegin()
    {
        // Reset agent and target positions; clear leftover motion
        transform.localPosition = Vector3.zero;
        rb.velocity = Vector3.zero;
        rb.angularVelocity = Vector3.zero;
        target.localPosition = new Vector3(
            Random.Range(-4f, 4f), 0.5f, Random.Range(-4f, 4f));
    }
}

Training Algorithms

PPO (Proximal Policy Optimization): The default RL algorithm. On-policy, stable, and a good general-purpose first choice.
SAC (Soft Actor-Critic): Off-policy, so it reuses past experience and is typically more sample-efficient than PPO; its entropy term encourages exploration. A good fit for continuous action spaces.
Imitation Learning (GAIL/BC): Trains agents from recorded human demonstrations, using Generative Adversarial Imitation Learning or Behavioral Cloning. Useful when a reward function is hard to design.
Self-Play: Trains agents against past copies of themselves. Suited to competitive games.
Curriculum Learning: Starts with easy tasks and raises the difficulty as the agent improves.
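
Curriculum learning is set up in the trainer configuration file rather than in the agent script. As a rough sketch, the snippet below writes a fragment with two lessons for a hypothetical environment parameter named `target_range`: the target spawns within 2 units until the smoothed mean reward passes 0.8, then within 4 units. The parameter name, the behavior name `BallAgent`, the lesson names, the file name, and the thresholds are all assumptions for illustration. The agent script would also need to read the value, for example with `Academy.Instance.EnvironmentParameters.GetWithDefault("target_range", 4f)`, instead of using a hard-coded spawn range.

```shell
# Write a curriculum fragment to merge into the trainer configuration file.
# All names and numbers here are illustrative assumptions.
mkdir -p config
cat > config/curriculum_fragment.yaml <<'EOF'
environment_parameters:
  target_range:                 # hypothetical parameter read by the agent script
    curriculum:
      - name: NearTarget        # first lesson: target spawns close to the agent
        completion_criteria:
          measure: reward       # advance based on mean cumulative reward
          behavior: BallAgent   # assumed Behavior Name
          signal_smoothing: true
          min_lesson_length: 100
          threshold: 0.8
        value: 2.0
      - name: FarTarget         # final lesson: no completion criteria
        value: 4.0
EOF
```

The `environment_parameters` section sits at the top level of the configuration file, alongside `behaviors`.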

Running Training

Bash - Training Commands

# Install ML-Agents Python package
pip install mlagents

# Start training, then press Play in the Unity Editor when prompted
mlagents-learn config/trainer_config.yaml --run-id=my_run

# Monitor with TensorBoard
tensorboard --logdir results
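
The training command above expects a configuration file that this page does not show. As a minimal sketch, the snippet below writes one for PPO. It assumes the Behavior Name set in the agent's Behavior Parameters component is `BallAgent` (an assumption; the name must match whatever is set in the Editor). The hyperparameter values are illustrative starting points, not tuned settings.

```shell
# Write a minimal PPO trainer configuration (illustrative values, not tuned).
mkdir -p config
cat > config/trainer_config.yaml <<'EOF'
behaviors:
  BallAgent:                # assumed Behavior Name; must match the Unity Editor setting
    trainer_type: ppo
    hyperparameters:
      batch_size: 64
      buffer_size: 2048
      learning_rate: 3.0e-4
      beta: 5.0e-3          # entropy regularization strength
      epsilon: 0.2          # PPO clipping range
      lambd: 0.95           # GAE lambda
      num_epoch: 3
    network_settings:
      normalize: false
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 64
    summary_freq: 10000
EOF
```

Changing `trainer_type` selects a different algorithm, but SAC takes its own set of hyperparameters, so the `hyperparameters` block would need to change with it.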


Key takeaway: ML-Agents enables you to train game AI using modern reinforcement learning techniques. Define observations and actions in C#, train with Python, and deploy ONNX models back into Unity for real-time inference.
