Simple Maze Experiment – Part 6

Analysing the Results

So after the last batch of experiments I settled on ramping up the resolution of the image supplied to the brain during training to see if this improved training results, as well as actual results when used live.

The overall answer to that question is “not really”. Feeding in higher resolution images significantly increases training time, but the difference in the end results is negligible. Training also needed a significantly larger chunk of memory to reach what appears to be essentially the same outcome, as you’ll see in the TensorBoard overview below.

In the above graph, red indicates a training run using an image resolution of 256×256 pixels, while the blue one is 512×512 pixels. I was surprised to see that, despite the higher resolution image (and therefore cleaner data for the neural network to work on), the runs with the smaller image resolution actually had a greater success rate, though not by a massive margin. The real difference is the time each training run took, with the 512×512 version taking more than twice as long to train as the 256×256 version.

Onward and Upward

So far I’m not seeing a massive need for visual observation in Unity. It seems clunky at best, and the training time is rather large. Having done some more research, I’ve found several articles that recommend using over 1 million steps, with some advocating 5 million upwards. From previous tests of 3 million steps with a small image selection, I am not confident that this would produce an end result I would deem satisfactory. However, it is something I am interested in trying once I can fix a cooling issue with my GPU, so that it can run for longer than 3-4 hours of calculation without encountering stability issues (a 3 million step training session would take ~10-15 hours).

Since I don’t think I’ll be able to sate my curiosity regarding the visual learning component until I can at least test the 3 million and 5 million steps theory, I’m going to keep this project in its current state, and simply adjust the training parameters to see what I get out of it. In addition to this, I’m going to create a new sub-project in the repo where I will duplicate the current setup, but replace the visual observation method with a vector based method, and see how that compares to the visual one in terms of both training time and actual result accuracy.

Once I’ve got these two set up, I’ll upload them to my GitHub Repo.



Simple Maze Experiment – Part 5

After running many variations of training with many variations of hyperparameters, and a few changes to the agent and the training code itself, I’m still not entirely sold that Visual Observation is the best method of Machine Learning for a Unity Agent, at least not in the way I wanted to use it.

To start with, I limited myself to changing hyperparameters only, and ended up settling on the following:

use_recurrent: true
sequence_length: 64
memory_size: 256
num_layers: 2
gamma: 0.99
batch_size: 128
buffer_size: 2048
num_epoch: 5
learning_rate: 3.0e-4
time_horizon: 64
max_steps: 5e4
beta: 5e-3
epsilon: 0.2
normalize: false
hidden_units: 512

However, I found that tweaking the parameters alone was not enough, so in addition to these parameters, I also updated the agent to have 4 cameras (pointing in the 4 cardinal directions), and to interpret the view from these cameras as 80×80 pixels. This was a required change, as the original version that tried to process a 640×480 resolution window would run out of memory and crash. I’m not sure if the additional cameras have helped or not, but dropping the resolution of the images down definitely did stop the crash issues. I also increased the training speed to 50, changed the agent interaction from a rigidbody to a character controller and reduced the decision making interval to 3 (from 5).
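As a rough sanity check on why the smaller images stopped the crashes, the sketch below counts the raw values the brain receives per observation before and after the change. It assumes three colour channels per pixel; the function name and the channel count are illustrative assumptions rather than anything taken from ML-Agents. It only counts inputs, so the memory the network actually uses during training will be considerably larger.

```python
def observation_values(width, height, cameras=1, channels=3):
    """Raw input values per observation (channels=3 assumes colour images)."""
    return width * height * channels * cameras


original = observation_values(640, 480)          # one 640x480 camera
reduced = observation_values(80, 80, cameras=4)  # four 80x80 cameras

print(original, reduced, original / reduced)
```

Under those assumptions, even with four cameras the agent is handling a twelfth of the raw input of the original single 640×480 view.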

With regard to the hyperparameters, the key points are:

  • I have told the agent that it is recurrent (use_recurrent), meaning it carries a memory between steps and can draw on what it has recently seen and done.
  • I’ve given the neural network 2 hidden layers (num_layers), each with 512 units (hidden_units).
  • I’ve also set the full training run to 50,000 steps (max_steps). I did have longer runs set up, but they appeared to make minimal difference to the result. A set of hyperparameters that had 50,000 steps actually gave me worse results than 3,000,000 steps, possibly due to overfitting the network.
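To put the batch, buffer and epoch settings in context, here is a minimal sketch of how they combine under the usual PPO reading: collect buffer_size experiences, then make num_epoch passes over them in minibatches of batch_size. The values are copied from the list above; the function and key names are hypothetical, and the sketch ignores how the recurrent sequence_length changes batching, so treat the numbers as ballpark figures rather than what the trainer does internally.

```python
# Values copied from the hyperparameter list above.
hyperparameters = {
    "batch_size": 128,
    "buffer_size": 2048,
    "num_epoch": 5,
    "max_steps": 5e4,
}


def summarise(hp):
    """Estimate update counts, assuming a plain (non-recurrent) PPO loop."""
    minibatches_per_epoch = hp["buffer_size"] // hp["batch_size"]
    updates_per_buffer = minibatches_per_epoch * hp["num_epoch"]
    # Only buffers that are completely filled trigger an update.
    buffers_filled = int(hp["max_steps"]) // hp["buffer_size"]
    return {
        "minibatches_per_epoch": minibatches_per_epoch,
        "updates_per_buffer": updates_per_buffer,
        "buffers_filled": buffers_filled,
        "total_updates": buffers_filled * updates_per_buffer,
    }


summary = summarise(hyperparameters)
for name, value in summary.items():
    print(f"{name}: {value}")
```

On that reading, a 50,000 step run only fills the buffer a couple of dozen times, which gives some feel for how little the network is updated in a short run.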

I’ve added a TensorBoard breakdown of the two runs below:


Simple Maze Experiment – Part 4 – How to Train Your Agent

With all the other components now more or less done, it’s time for me to actually train this agent. Before I get into that though, I think now would be a good time to relist the initial training parameters I went in with:


  • Max Steps: 0
  • Training Configuration:
    • Width: 640
    • Height: 480
    • Quality Level: 0
    • Time Scale: 10
    • Target Frame Rate: 60
  • Agent Run Speed: 2
  • Success Material: Green
  • Fail Material: Red
  • Gravity Multiplier: 3


  • Vector Observation
    • Space Type: Continuous
    • Space Size: 0
    • Stacked Vectors: 1
  • Visual Observation
    • Size: 1
    • Element 0:
      • Width: 640
      • Height: 480
      • Black and White: False
  • Vector Action
    • Space Type: Discrete
    • Space Size: 4
  • Brain Type: External


  • Brain: MazeBrain
  • Agent Cameras
    • Camera 1: a forward facing camera attached to the capsule gameobject.
  • Max Steps: 5000
  • Reset on Done: True
  • On Demand Decisions: False
  • Decision Frequency: 5
  • Goal: Goal Game Object
  • Ground: Plane prefab
  • Spawn Point: -3.3, 1, 6.84 (this will put the capsule at the top left of the maze, near the small sticking out notch).
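One thing worth working out from the settings above is how many decisions the agent can make before an episode is cut off. The sketch below assumes that Decision Frequency means the number of steps between decisions and that the agent’s Max Steps caps the episode length; the function name is hypothetical.

```python
def max_decisions_per_episode(max_steps, decision_frequency):
    """Upper bound on decisions in one episode, one decision every N steps."""
    return max_steps // decision_frequency


decisions = max_decisions_per_episode(5000, 5)
print(decisions)
```

With these starting values, that is at most 1000 decisions to find the goal before the agent is reset.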


Where to Start?

The biggest question I had once I found out about using ML with Unity was where I could actually start. While there is a GitHub repo that is constantly updated, the documentation included was not (for me at least) the most understandable in how to actually get the thing up and running.

Thankfully, though I missed the live demo at GDC, Unity have provided it in a playlist format for everyone on YouTube.

I’d recommend downloading the project from GitHub, and then going along with the video to better understand how to use the ML agents, and get an understanding of the two types of ML that are available. (It’s worth noting that while the guide mentions both Reinforcement Learning and Imitation Learning in order to explain the differences between the two, the tutorial itself focuses on Reinforcement Learning.)

I’ll also note a few issues I encountered while following along:

  1. My Python 3 install was not present in my Windows path, so I had to manually add it. If you are installing Python 3 from the website, this option is provided during the install process. Unless you plan to run multiple versions of Python on your system, I’d recommend ticking that option.
  2. Several packages were not installed or listed as dependencies, so when I started up the section, Python wouldn’t run until these had been installed. Using pip (on Windows this is called with python -m pip install <package name> from the command prompt), make sure you have the following:
    • docopt
    • numpy
    • image
    • tensorflow==1.4.*
    • PyYAML
  3. It is important to note the use of Tensorflow 1.4. By default pip installed 1.8, which currently has issues with TF#. If you build your Tensorflow data with 1.8 and then fix it by installing 1.4, you have to re-run your training simulation to build a new .bytes file that is compatible.
  4. You’ll need to have installed the TensorFlowSharp (TF#) plugin for Unity; see the guide here for full details. The short version is basically download and import this Unity package, then go to Edit -> Player Settings, set the Scripting Runtime Version under Configuration to Experimental (.NET 4.6 Equivalent) and add ENABLE_TENSORFLOW to the Scripting Define Symbols for each type of device you want to use (PC, Mac and Linux Standalone, iOS or Android). Once all that is done, save, close and then re-open Unity.
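If you want to check the package list from point 2 in one go, something like the sketch below should do it. It is my own illustration rather than part of the tutorial: the function names are hypothetical, and it looks packages up by the names used with pip. Only the Tensorflow entry is pinned to a version, matching the list above.

```python
# Packages from the list above; None means any version will do.
REQUIRED = {
    "docopt": None,
    "numpy": None,
    "image": None,
    "tensorflow": "1.4.*",
    "PyYAML": None,
}


def version_matches(installed, pattern):
    """True if `installed` satisfies an exact version or an 'X.Y.*' pattern."""
    if pattern is None:
        return True
    if pattern.endswith(".*"):
        prefix = pattern[:-2].split(".")
        return installed.split(".")[:len(prefix)] == prefix
    return installed == pattern


def find_problems(installed, required=REQUIRED):
    """Compare a {package name: version} mapping against the requirements."""
    problems = []
    for name, pattern in required.items():
        version = installed.get(name)
        if version is None:
            problems.append(f"{name}: not installed")
        elif not version_matches(version, pattern):
            problems.append(f"{name}: found {version}, need {pattern}")
    return problems


def installed_versions(names):
    """Look up installed versions by the names used with pip."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:
        # Older interpreters: fall back to pkg_resources from setuptools.
        from pkg_resources import DistributionNotFound as PackageNotFoundError
        from pkg_resources import get_distribution

        def version(name):
            return get_distribution(name).version

    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            pass  # left out of the result, so it is reported as not installed
    return found


if __name__ == "__main__":
    for line in find_problems(installed_versions(REQUIRED)) or ["All packages look fine"]:
        print(line)
```

Run it with the same Python you use for training; anything it prints other than the final message is a package to install or downgrade.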

Other than those minor hiccups, I was able to get everything working.

A Brief Introduction

So for a while now I’ve been learning a lot about Tensorflow and Machine Learning. Originally it was for a work related project, however that got canned when management realised that ML wasn’t just a “Magic Bullet” they could put in to solve their problems overnight; it would take time and understanding to get right.

However, by the time they had canned the project, I’d already been doing some serious research into the topic, and once I got my head around the basics I realised that there was a hell of a lot of potential in such a technology, way outside of what work wanted to use it for.

So in my spare time I kept digging, I kept learning, and I kept trying to get better at it.

Then just after GDC 2018, I learned something that had somehow passed me by originally. Unity 3D supported Machine Learning Agents (ML Agents), and it was a technology that was being actively worked on by the developers.

Suddenly my use for ML had moved away from the rather mundane (to me anyway) large scale data analysis that I was thinking of using it for, to something a lot closer to me. Something I’ve had a vested interest in for the longest of times: Games Development, and Artificial Intelligence.

This in turn made me decide to publicly document my thoughts and discoveries as I head down this path, both to provide a record and to perhaps help others who are unsure where they are going when it comes to using ML and Unity 3D.