Simple Maze Experiment – Part 2 – Brain Food

In part 1 I set about preparing the Unity environment, so this next part will be about setting up the Academy and Brain components ready for use.

If you’re reading this and wondering what I’m on about with Academies and Brains, I do recommend you watch Unity’s own tutorial on Machine Learning Agents on YouTube; it provides a much more in-depth overview of what the Academy, Brain and Agent actually are and how they all fit together.

I’m starting with the Academy as it is the highest-level controller in ML-Agents (at least that I’ve found so far), with everything else reporting to it.  I’ve created an empty game object in the scene called Academy, and then created a new C# script called Academy_SimpleMaze. This script will serve two main functions:

  1. It will inherit from the Academy script, giving me access to all core Academy functionality. (The Academy class itself is abstract, so you must create a new script that inherits from it to use it.)
  2. It will allow the setting of some variables that will be global across all agents, namely the movement speed of the agent, and the colour changes for the floor upon a success or fail state.

The code itself looks like this:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class Academy_SimpleMaze : Academy
{
    // We need some global variables for our testing, so we can set them here.
    public float agentRunSpeed = 2;             // The speed the agent moves at.
    public Material successMaterial;            // The colour to change the floor to on success.
    public Material failMaterial;               // The colour to change the floor to on failure.

    // The reason we change the floor colour is that during training you may have one (or more)
    // instances of the training area visible on screen, and the colour change will allow you to
    // see at a glance the general performance of your agents. You could skip this, but if you
    // looked in on your game during training you wouldn't really be able to tell much of what
    // is happening.

    // Lastly, we want to borrow the gravity settings from PushBlockAcademy; we will use a
    // multiplier of around 3 to make things less floaty.
    public float gravityMultiplier;

    void Start()
    {
        Physics.gravity *= gravityMultiplier;
    }
}


I’ve attached this script onto the Academy game object I created earlier, and the inspector now shows me both the Academy script settings, such as training steps, as well as the additional parameters I defined in the script above (agentRunSpeed, successMaterial, failMaterial).
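As a preview of how those two materials might get used later, here’s a rough sketch modelled loosely on the ground-swap coroutine in Unity’s PushBlock example. The field names (floorRenderer, defaultFloorMaterial) are my own placeholder assumptions, not anything from the original project:

```csharp
// Sketch only: briefly swap the floor material on a success or fail, then restore it.
// 'floorRenderer' and 'defaultFloorMaterial' are assumed references you would set up yourself.
IEnumerator SwapFloorMaterial(Material mat, float time)
{
    floorRenderer.material = mat;                  // e.g. the academy's successMaterial
    yield return new WaitForSeconds(time);         // leave the colour visible briefly
    floorRenderer.material = defaultFloorMaterial; // then put the floor back to normal
}
```

You’d kick this off with StartCoroutine from wherever the agent detects a success or fail state.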

Academy Training Parameters

Next I set my Academy up with the following settings:

  • Max Steps = 0
  • Training Configuration – These are the settings the system will use when training the agent.
    • Width = 1280
    • Height = 720
    • Quality Level = 0
    • Timescale = 100
    • Target Frame Rate = 60
  • Inference Configuration – These are the settings used when not training (i.e. when playing the game normally).
    • Width = 1280
    • Height = 720
    • Quality Level = 5
    • Timescale = 1
    • Target Frame Rate = 60
  • No Reset Parameters added
  • Agent Run Speed = 2
  • Success Material – I just made a basic material and changed its colour to green.
  • Fail Material – Same as above, I used a red one for this rather than green.
  • Gravity Multiplier = 3

As far as I can tell so far, that’s all I need for the Academy. The next bit we want to set up is the Brain.


Brain Configuration

This is more straightforward than the Academy as it doesn’t need to be inherited by a new script; I just used the Brain script from the Unity ML-Agents scripts folder I imported from the original Unity ML project. (To avoid confusion, I’ve also renamed this folder to “Core ML Scripts” and created a new Scripts folder for my own scripts.)

Anyway, I created an empty Game Object under the Academy Game Object I created earlier, called it MazeBrain, and attached the Brain script to it.

I then realised that in order to set this up, I’d need to dig up some more information on using a camera, as I am no longer using Vector Observations and am instead switching to Visual Observations. This required a bit of reading of both the Brain and the Agent documentation.

The short version is that visual observations can be slow to train and may not succeed at all; however, I’m nothing if not curious, so I’m going to try it anyway. To make it work I can ignore the Vector Observation section of the Brain script and jump straight to Visual Observation. In the spirit of keeping this simple, I’m only using 1 element (read: 1 camera) on my agent, so I set the size to 1. I then set the camera image width and height to 640 by 480.

I toyed with ticking the black and white box, but since I’ve colour-coordinated the maze (floor = black, walls = green, goal = blue) I left this unticked. I’m genuinely curious how this may affect the training, but I can come back and play around with that later.

The next part is defining the Vector Actions; basically, these are the actions the AI can take. There are two types of vector action. Discrete actions execute items from a numbered list, while Continuous actions return (and I quote from the manual here) “an array of control signals with length equal to the Vector Action Space Size property”. Continuous essentially boils down to the equivalent of an analogue stick on a controller, i.e. each number it generates falls somewhere between a minimum and a maximum. Discrete, in my understanding, is the equivalent of giving the AI a joypad and letting it press buttons and get a response, with each button being its own whole (int) number. I will most likely come back to these two at a later date to learn more about Continuous, but for now, I’ll stick with Discrete.
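To make the distinction concrete, here’s a hedged sketch of what the agent actually receives in each case. This assumes the ML-Agents 0.x Agent API, where actions arrive in AgentAction as a float array; the exact signature may differ in your version:

```csharp
// Sketch only, assuming the 0.x-era AgentAction signature.
public override void AgentAction(float[] vectorAction, string textAction)
{
    // Discrete: the array holds a single number that is really an int index
    // into your action list (0, 1, 2 or 3 in my four-action setup).
    int action = Mathf.FloorToInt(vectorAction[0]);

    // Continuous (for contrast): the array would instead hold one float per
    // action dimension, each somewhere in a range, like analogue stick axes:
    // float forwardAmount = Mathf.Clamp(vectorAction[0], -1f, 1f);
    // float turnAmount    = Mathf.Clamp(vectorAction[1], -1f, 1f);
}
```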

Next up is my Space Size, which as I mentioned previously is going to consist of 4 actions: forward, backwards, rotate left and rotate right (W, S, A and D respectively).

After this is Brain Type, which I’m setting to Player, as I want the AI to imitate a player’s actions of pushing buttons and getting results.

Under this is the Broadcast tickbox. I’m leaving this ticked as it’s part of the training API and allows some back-and-forth feedback with Python when training the agent.

Lastly we come to our actions, I’m going to set these up as follows:

  • Default: -1 (Do nothing)
  • Element 0
    • Key: W
    • Value: 0
  • Element 1
    • Key: S
    • Value: 1
  • Element 2:
    • Key: A
    • Value: 2
  • Element 3:
    • Key: D
    • Value: 3
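Once the agent script exists, those four values will need translating into movement. A hedged sketch of what that might look like, using the 0.x-era AgentAction signature and my own placeholder movement code (the speed and rotation figures here are illustrative, not from the original project):

```csharp
// Sketch only: mapping the four discrete action values to movement.
// agentRunSpeed would come from the Academy; rotation speed is a made-up placeholder.
public override void AgentAction(float[] vectorAction, string textAction)
{
    int action = Mathf.FloorToInt(vectorAction[0]);
    switch (action)
    {
        case 0: // W - move forward
            transform.Translate(Vector3.forward * agentRunSpeed * Time.deltaTime);
            break;
        case 1: // S - move backwards
            transform.Translate(Vector3.back * agentRunSpeed * Time.deltaTime);
            break;
        case 2: // A - rotate left
            transform.Rotate(Vector3.up, -90f * Time.deltaTime);
            break;
        case 3: // D - rotate right
            transform.Rotate(Vector3.up, 90f * Time.deltaTime);
            break;
        // -1 (the Default value) means do nothing, so no case is needed for it.
    }
}
```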

That’s the Brain configured and ready to go. The final step is to get the agent sorted, and then hopefully it’s on to the training!