Simple Maze Experiment – Part 4 – How to Train Your Agent

With all the other components now more or less done, it’s time for me to actually train this agent. Before I get into that, though, now seems a good time to list again the initial training parameters I started with:


Academy settings:
  • Max Steps: 0
  • Training Configuration:
    • Width: 640
    • Height: 480
    • Quality Level: 0
    • Time Scale: 10
    • Target Frame Rate: 60
  • Agent Run Speed: 2
  • Success Material: Green
  • Fail Material: Red
  • Gravity Multiplier: 3


Brain settings:
  • Vector Observation
    • Space Type: Continuous
    • Space Size: 0
    • Stacked Vectors: 1
  • Visual Observation
    • Size: 1
    • Element 0:
      • Width: 640
      • Height: 480
      • Black and White: False
  • Vector Action
    • Space Type: Discrete
    • Space Size: 4
  • Brain Type: External
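
As a quick sanity check on the Brain settings above, the number of values in each visual observation follows directly from the camera resolution. This is a minimal sketch, assuming one value per colour channel (three channels when Black and White is False, one when it is True); `visual_observation_size` is a hypothetical helper, not part of ML-Agents.

```python
def visual_observation_size(width: int, height: int, black_and_white: bool) -> int:
    """Values in one camera frame sent to the brain (hypothetical helper)."""
    channels = 1 if black_and_white else 3  # greyscale vs RGB
    return width * height * channels


print(visual_observation_size(640, 480, black_and_white=False))  # 921600
print(visual_observation_size(640, 480, black_and_white=True))   # 307200
```

At 640x480 in colour, that is nearly a million values for every frame the brain sees.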


Agent settings:
  • Brain: MazeBrain
  • Agent Cameras
    • Camera 1: a forward-facing camera attached to the capsule GameObject.
  • Max Steps: 5000
  • Reset on Done: True
  • On Demand Decisions: False
  • Decision Frequency: 5
  • Goal: Goal Game Object
  • Ground: Plane prefab
  • Spawn Point: -3.3, 1, 6.84 (this puts the capsule at the top left of the maze, near the small notch that sticks out).
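
With On Demand Decisions turned off, the agent requests a new decision from the brain every Decision Frequency steps, so the Agent settings above fix how many decisions an episode can contain. A minimal sketch of that arithmetic, assuming the first decision happens at step 0 and the episode runs the full Max Steps without reaching the goal; `decisions_per_episode` is a hypothetical helper:

```python
def decisions_per_episode(max_step: int, decision_frequency: int) -> int:
    """Decisions requested over steps 0..max_step-1 (hypothetical helper)."""
    if decision_frequency < 1:
        raise ValueError("decision_frequency must be at least 1")
    # A decision at step 0, then one every `decision_frequency` steps:
    # this is a ceiling division of max_step by decision_frequency.
    return (max_step + decision_frequency - 1) // decision_frequency


print(decisions_per_episode(5000, 5))  # 1000
```

So an episode that runs out of steps contains 1000 decisions.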

Now I need to build the game. I made sure the Simple Maze scene was the only one loaded, and built it to <repo root>/python/envs/SimpleMaze.

Once the build is complete, the next step is to load it into the training script the Unity team provided and start the training. Since I’m on a Windows PC, this is done using Command Prompt.

  1. Launch Command Prompt.
  2. Navigate to the repo location; in my case this is E:\Repos\machine-learning-playground\
  3. Navigate into the python folder itself (cd python).
  4. Next I need to call the training script and provide it with the parameters for training. This is done in the format `python learn.py envs/SimpleMaze/SimpleMaze --train --run-id=SimpleMaze1`
    • This breaks down into several key parts. The first part of the command, `python learn.py`, tells Python to run the learn.py training script.
    • The next argument, the path, tells the script where to find the binary that actually runs the game. You do not need to include the .exe extension, and note that there is no leading slash on the path either.
    • `--train` tells the script this is a training run.
    • `--run-id` lets you assign an ID to each run. I use the name of the game followed by a number so I can always look back at previous brain training runs if I need to.
  5. Press Enter to start the training.
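
The same command can also be assembled from Python, which is handy for kicking off several numbered runs in a row. This is a minimal sketch, assuming the training script is ML-Agents’ `learn.py` in the python folder and that it accepts the `--train` and `--run-id` flags described above; `build_train_command` and `run_training` are hypothetical helpers, not part of ML-Agents.

```python
import subprocess
import sys


def build_train_command(env_path: str, run_id: str, train: bool = True) -> list[str]:
    """Argument list for one learn.py training run (hypothetical helper)."""
    # The environment path is given relative to the python folder, with no
    # leading slash and no .exe extension, so strip either if present.
    env_path = env_path.lstrip("/\\")
    if env_path.lower().endswith(".exe"):
        env_path = env_path[: -len(".exe")]
    # sys.executable is the interpreter running this script, so training
    # uses the same Python environment.
    args = [sys.executable, "learn.py", env_path]
    if train:
        args.append("--train")
    args.append(f"--run-id={run_id}")
    return args


def run_training(env_path: str, run_id: str, python_dir: str = ".") -> None:
    """Launch training from the python folder and wait for it to finish."""
    subprocess.run(build_train_command(env_path, run_id), cwd=python_dir, check=True)


if __name__ == "__main__":
    # Print the command rather than launching it.
    print(" ".join(build_train_command("envs/SimpleMaze/SimpleMaze", "SimpleMaze1")))
```

Bumping the number in the run ID (SimpleMaze2, SimpleMaze3, ...) keeps each run’s results separate, in line with the naming scheme above.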

Once training started, it took about 6 minutes to complete. Unfortunately, the agent managed little more than running around a small area and occasionally glitching its way through the maze. I’ve uploaded a video of this training session below to give an idea of what was happening.

As you can see, other than eating up all of my RAM, the agent didn’t really accomplish much, which means I need to tune it further to see if I can improve its performance.
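
One plausible contributor to the RAM usage is the size of the visual observations. The estimate below is a back-of-the-envelope sketch, not a measurement: it assumes every 640x480 RGB frame is stored as 4-byte floats (64-bit floats would double it) and that the trainer holds a full buffer of 10240 experiences, the `buffer_size` default in trainer_config.yaml, in memory at once. `buffer_memory_gib` is a hypothetical helper.

```python
def buffer_memory_gib(width: int, height: int, channels: int,
                      buffer_size: int, bytes_per_value: int = 4) -> float:
    """Rough memory, in GiB, for buffer_size stored camera frames."""
    values_per_frame = width * height * channels
    total_bytes = values_per_frame * bytes_per_value * buffer_size
    return total_bytes / 1024 ** 3


print(f"{buffer_memory_gib(640, 480, 3, 10240):.1f} GiB")  # 35.2 GiB
print(f"{buffer_memory_gib(640, 480, 1, 10240):.1f} GiB")  # 11.7 GiB
```

Even if the real number differs, it shows how quickly memory grows with camera resolution and colour.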

My first tuning step is to adjust the default hyperparameters to see if they help. The hyperparameters themselves are in the trainer_config.yaml file in the python folder. I’ve laid out the default ones below:

trainer: ppo
batch_size: 1024
beta: 5.0e-3
buffer_size: 10240
epsilon: 0.2
gamma: 0.99
hidden_units: 128
lambd: 0.95
learning_rate: 3.0e-4
max_steps: 5.0e4
memory_size: 256
normalize: false
num_epoch: 3
num_layers: 2
time_horizon: 64
sequence_length: 64
summary_freq: 1000
use_recurrent: false
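
Before changing anything, it helps to see how a few of these defaults interact. As I understand the PPO trainer, once `buffer_size` experiences have been collected it makes `num_epoch` passes over them in minibatches of `batch_size`, and `gamma` sets how strongly future rewards are discounted (1 / (1 - gamma) is a common rule of thumb for the effective horizon in steps). A minimal sketch of that arithmetic using the defaults above; the derived variable names are my own, not trainer_config.yaml keys.

```python
# Subset of the default values from trainer_config.yaml above.
defaults = {
    "batch_size": 1024,
    "buffer_size": 10240,
    "gamma": 0.99,
    "max_steps": 5.0e4,
    "num_epoch": 3,
}

# Minibatches per pass over a full buffer.
minibatches_per_epoch = defaults["buffer_size"] // defaults["batch_size"]
# Gradient-descent steps per model update.
gradient_steps_per_update = minibatches_per_epoch * defaults["num_epoch"]
# Roughly how many full buffers (and so model updates) fit in max_steps.
approx_updates = int(defaults["max_steps"]) // defaults["buffer_size"]
# Rule-of-thumb number of steps over which rewards still count noticeably.
effective_horizon = 1 / (1 - defaults["gamma"])

print(minibatches_per_epoch)      # 10
print(gradient_steps_per_update)  # 30
print(approx_updates)             # 4
print(round(effective_horizon))   # 100
```

With only around four model updates in a 50,000-step run, it is not too surprising that the first attempt learned so little.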

To tweak these, I need to know what options are available and what they do. Thankfully, Unity have documented their hyperparameters and settings in the ML-Agents documentation.

I’m going to do some deep diving across the various manual sections to see what options we’ve got, and have a play around to see if I can find some that work or, at the very least, improve on what happened in that video above!