Simple Maze Experiment – Part 6

Analysing the Results

So after the last batch of experiments I settled on ramping up the resolution of the image supplied to the brain during training to see if this improved training results, as well as actual results when used live.

The overall answer to that question is “not really”. Larger images significantly increase training time, but the difference in the end results is negligible. Training also required a significantly larger chunk of memory to reach what essentially appears to be the same outcome, as you’ll see in the TensorBoard overview below.

In the above graph, red indicates a training run using an image resolution of 256×256 pixels, while blue is 512×512 pixels. I was surprised to see that, despite the higher resolution image (and therefore cleaner data for the neural network to work on), the runs actually had a greater success rate with the smaller resolution, though not by a massive margin. The real difference is the time each training run took, with the 512×512 version taking over twice as long to train as the 256×256 version.
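As a rough sanity check on that timing difference, the sketch below compares the number of pixels per observation at the two resolutions. The function name is only for illustration and is not part of the project.

```python
def pixel_count(side):
    """Number of pixels in a square image with the given side length."""
    return side * side


small = pixel_count(256)
large = pixel_count(512)
pixel_ratio = large / small

print(f"256x256: {small} pixels")
print(f"512x512: {large} pixels")
print(f"The larger image has {pixel_ratio:.0f}x the pixels of the smaller one")
```

So each 512×512 observation carries four times as many pixels as a 256×256 one, while the training time reported above grew by a little more than a factor of two.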

Onward and Upward

So far I’m not seeing a massive need for visual observation in Unity. It seems clunky at best, and the training time is rather large. Having done some more research, I’ve found several articles that recommend using over 1 million steps, with some advocating 5 million or more. From previous tests of 3 million steps with a small image selection, I am not confident that this would produce an end result I would deem satisfactory. However, it is something I am interested in trying once I can fix a cooling issue with my GPU, which currently can’t run for longer than 3-4 hours of calculation without encountering stability issues (a 3 million step training session would take ~10-15 hours).
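To get a feel for what a 5 million step run might cost, the sketch below scales the ~10-15 hour figure for 3 million steps. It assumes training time grows linearly with the step count, which is a simplifying assumption rather than something measured, and scale_hours is a made-up helper name.

```python
def scale_hours(hours_range, from_steps, to_steps):
    """Linearly scale a (low, high) duration estimate to a new step count."""
    factor = to_steps / from_steps
    return tuple(hours * factor for hours in hours_range)


# ~10-15 hours for 3 million steps is the estimate quoted in the post.
low, high = scale_hours((10, 15), 3_000_000, 5_000_000)
print(f"Estimated time for 5 million steps: {low:.1f} to {high:.1f} hours")
```

Under that assumption a 5 million step session would take roughly 17 to 25 hours, well beyond the 3-4 hours the GPU can currently manage in one go.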

Since I don’t think I’ll be able to sate my curiosity regarding the visual learning component until I can at least test the 3 million and 5 million steps theory, I’m going to keep this project in its current state, and simply adjust the training parameters to see what I get out of it. In addition to this, I’m going to create a new sub-project in the repo where I will duplicate the current setup, but replace the visual observation method with a vector-based method, and see how that compares to the visual one in terms of both training time and actual result accuracy.

Once I’ve got these two set up, I’ll upload them to my GitHub Repo.



Simple Maze Experiment – Part 5

After running many variations of training with many variations of hyperparameters, and a few changes to the agent and the training code itself, I’m still not entirely sold that Visual Observation is the best method of Machine Learning for a Unity Agent, at least not in the way I wanted to use it.

To start with, I limited myself to changing hyperparameters only, and ended up settling on the following:

use_recurrent: true
sequence_length: 64
memory_size: 256
num_layers: 2
gamma: 0.99
batch_size: 128
buffer_size: 2048
num_epoch: 5
learning_rate: 3.0e-4
time_horizon: 64
max_steps: 5e4
beta: 5e-3
epsilon: 0.2
normalize: false
hidden_units: 512
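To make the relationship between a few of these values more concrete, here is a small sketch that derives how many minibatches and gradient updates each filled buffer would produce. It assumes the usual PPO-style loop (split the buffer into batch_size chunks and repeat for num_epoch passes). It is an illustration only, not the ML-Agents trainer code, and summarise is a made-up helper name.

```python
# Values copied from the hyperparameter list above.
hyperparameters = {
    "batch_size": 128,
    "buffer_size": 2048,
    "num_epoch": 5,
}


def summarise(hp):
    """Derive a few rough quantities from the buffer and batch settings."""
    minibatches_per_epoch = hp["buffer_size"] // hp["batch_size"]
    return {
        # The buffer should divide evenly into batches.
        "buffer_is_multiple_of_batch": hp["buffer_size"] % hp["batch_size"] == 0,
        "minibatches_per_epoch": minibatches_per_epoch,
        # Assumption: every epoch passes over every minibatch once.
        "gradient_updates_per_buffer": minibatches_per_epoch * hp["num_epoch"],
    }


for name, value in summarise(hyperparameters).items():
    print(f"{name}: {value}")
```

With these settings the buffer splits cleanly into 16 minibatches, giving 80 gradient updates each time the buffer fills (under the assumption above).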

However, I found that tweaking the parameters alone was not enough, so in addition to these parameters, I also updated the agent to have 4 cameras (pointing in the 4 cardinal directions), and to interpret the view from each of these cameras as 80×80 pixels. This was a required change, as the original version that tried to process a 640×480 resolution window would run out of memory and crash. I’m not sure if the additional cameras have helped or not, but dropping the resolution of the images down definitely did stop the crash issues. I also increased the training speed to 50, changed the agent interaction from a rigidbody to a character controller, and reduced the decision making interval to 3 (from 5).
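The effect of that resolution change on the amount of data per decision can be sketched as below. It assumes 3 colour channels per pixel (the cameras were not set to black and white in the earlier configuration) and ignores whatever the trainer stores internally, so treat it as a rough comparison rather than a memory measurement. observation_values is a hypothetical helper.

```python
def observation_values(width, height, cameras=1, channels=3):
    """Raw values fed to the network per decision across all cameras."""
    return width * height * channels * cameras


original = observation_values(640, 480)          # one 640x480 camera
reduced = observation_values(80, 80, cameras=4)  # four 80x80 cameras

print(f"Original setup: {original} values per observation")
print(f"Reduced setup:  {reduced} values per observation")
print(f"Reduction factor: {original / reduced:.0f}x")
```

Even with four cameras, the 80×80 setup feeds the network a twelfth of the values that the single 640×480 camera did (76,800 against 921,600).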

With regard to the hyperparameters, the key points are:

  • I have told the agent that it is recurrent (use_recurrent), meaning the network keeps a memory of its recent steps rather than treating each one in isolation,
  • I’ve given the neural network 2 hidden layers (num_layers), each with 512 units (hidden_units).
  • I’ve also set the full training run to 50,000 steps (max_steps). I did have longer runs set up, but they appeared to make minimal difference to the result. A set of hyperparameters that ran for 3,000,000 steps actually gave me worse results than 50,000 steps, possibly due to overfitting the network.

I’ve added a TensorBoard breakdown of the two runs below:


Improving Firepower – Using NVIDIA’s CUDA to Improve Agent Training Times

In between my last post and now, I found I was encountering several issues with CPU processing power when training my agents. If you watched the training video in my last post, you can see my CPU getting hit quite heavily, and when I started playing around with new hyperparameters (especially adding recurrent neural network support) the load went through the roof. More than a few times I’ve had TensorFlow or Unity crash out while running the training.

The solution is something I’d stumbled across previously in my Machine Learning studies: utilising my NVIDIA GPU’s processing power to handle the training. For those unfamiliar, CUDA is a parallel computing platform and programming model that lets the GPU(s) on your graphics card be used for general purpose processing. The short version is that when it comes to math heavy operations like machine learning, I can dump it all on my graphics card (which is not doing much otherwise) and get it to do the heavy lifting.

Installing the CUDA Toolkit

It’s actually fairly straightforward to install and configure the relevant components to support this, and I’ve outlined the process I followed below:

  1. Download version 8.0 of NVIDIA’s CUDA toolkit from here. It is important to get 8.0, as the Unity ML toolkit only works with TensorFlowSharp up to TensorFlow version 1.4, which requires CUDA 8.0 or lower.
  2. Uninstall the current version of TensorFlow via pip from a command prompt (python -m pip uninstall tensorflow).
  3. Install the GPU-friendly version of TensorFlow 1.4 (python -m pip install tensorflow-gpu==1.4.*).
  4. Add the CUDA DLL folders to your system’s PATH variable. They are:
    C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\extras\CUPTI\libx64
    C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64
    Note: If you are unfamiliar with adjusting your system’s PATH variable, I highly advise you do some reading up on Google first, as you can cause some serious issues with your PC if you just change things at random.
  5. Reboot the PC (this step is not required, but I’m a fan of rebooting after a GPU driver update for my own sanity).
  6. The next part I needed was NVIDIA’s cuDNN software. This can be downloaded from here, but requires that you sign up to the NVIDIA developer program to download it. It took a bit of digging, but I eventually managed to find cuDNN v6.0 for CUDA 8.0.
  7. To save the hassle of adding more paths, I extracted the contents of the cuDNN folder straight into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\.
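If you want to double-check the PATH change from step 4 without opening the system settings again, something like the sketch below would do it. The helper name missing_from_path is made up for this example, and it assumes the Windows convention of separating PATH entries with a semicolon.

```python
import os

# The two folders listed in step 4.
REQUIRED_DIRS = [
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\extras\CUPTI\libx64",
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64",
]


def missing_from_path(path_value, required=REQUIRED_DIRS, separator=";"):
    """Return the required folders that do not appear in a PATH string."""

    def normalise(entry):
        # Windows paths are case-insensitive; ignore trailing slashes too.
        return entry.strip().rstrip("\\/").lower()

    present = {normalise(e) for e in path_value.split(separator) if e.strip()}
    return [d for d in required if normalise(d) not in present]


if __name__ == "__main__":
    for folder in missing_from_path(os.environ.get("PATH", "")):
        print("Not on PATH:", folder)
```

Run from a fresh command prompt (so it picks up the updated PATH), it prints nothing if both folders are present.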

Once that was done, the next time I ran the training script it utilised my system RAM as well as the GPU’s memory, keeping my CPU in a much more stable state, and completing training runs much faster.
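One way to confirm that TensorFlow can actually see the GPU after these steps is to ask it for its device list. The sketch below uses device_lib.list_local_devices() from TensorFlow’s Python client module; visible_gpus is just an illustrative wrapper, and it returns None if TensorFlow cannot be imported at all.

```python
def visible_gpus():
    """Return the names of GPU devices TensorFlow reports, or None if
    TensorFlow is not importable in this Python environment."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [
        device.name
        for device in device_lib.list_local_devices()
        if device.device_type == "GPU"
    ]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif gpus:
        print("GPU devices found:", ", ".join(gpus))
    else:
        print("TensorFlow is installed but reports no GPU devices")
```

An empty list here would suggest the CPU-only package is still installed, or that the CUDA and cuDNN files are not being found.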


Simple Maze Experiment – Part 4 – How to Train Your Agent

With all the other components now more or less done, it’s time for me to actually train this agent. Before I get into that though, I think now would be a good time to relist the initial training parameters I went in with:


  • Max Steps: 0
  • Training Configuration:
    • Width: 640
    • Height: 480
    • Quality Level: 0
    • Time Scale: 10
    • Target Frame Rate: 60
  • Agent Run Speed: 2
  • Success Material: Green
  • Fail Material: Red
  • Gravity Multiplier: 3


  • Vector Observation
    • Space Type: Continuous
    • Space Size: 0
    • Stacked Vectors: 1
  • Visual Observation
    • Size: 1
    • Element 0:
      • Width: 640
      • Height: 480
      • Black and White: False
  • Vector Action
    • Space Type: Discrete
    • Space Size: 4
  • Brain Type: External


  • Brain: MazeBrain
  • Agent Cameras
    • Camera 1: a forward facing camera attached to the capsule gameobject.
  • Max Steps: 5000
  • Reset on Done: True
  • On Demand Decisions: False
  • Decision Frequency: 5
  • Goal: Goal Game Object
  • Ground: Plane prefab
  • Spawn Point: -3.3, 1, 6.84 (this will put the capsule at the top left of the maze, near the small sticking out notch).


Simple Maze Experiment – Part 3 – Mayhem and Agents

Well, I knew the agent was going to be more complex than the academy and the brain to set up, but what I didn’t realise is how much back and forth I’d be doing to fine-tune the agent. I’ll be honest, even as I write this I’ve still not got an end result I’m happy with, but I’m learning, so that’s what counts!

Anyway, rather than drop chunks of code into the blog, which can get a bit confusing, I’ve uploaded the project to my GitHub account. I’ve tried to keep the comments up to date to walk through what’s happening, but there’s a lot in there, so I figured I’d do a breakdown of the agent script in the blog, and cross link to the actual script in the repo.


Simple Maze Experiment – Part 2 – Brain Food

In part 1 I set about preparing the Unity environment, so this next part will be about setting up the Academy and Brain components ready for use.

If you’re reading this and wondering what I’m on about with Academies and Brains, I do recommend you watch Unity’s own tutorial on Machine Learning Agents on YouTube. It provides a much more in-depth overview of what the Academy, Brain and Agent actually are and how they all fit together.


Simple Maze Experiment – Part 1


Now that I’ve got a basic understanding of how to use Unity’s reinforcement learning, the next step is to put it into practice. To do so, I’m going to go with a very simple concept. An agent will be placed in a maze and will need to find the quickest route to the exit.

Normally I’d use some form of pathfinding for this, most likely the built-in Unity pathfinding tool. The main reason for using ML instead is that later on I plan to add hazards and puzzles to the maze for the agent to solve, and I want it to handle them like a player would, rather than needing to write explicit code to handle each puzzle. I’m also curious as to how well ML will handle visual inputs from a camera to navigate, rather than a raycast system.

So here’s my current battle plan:

  • Create a simple maze with an entrance and an exit.
  • The agent starts at one end, and must reach the other.
  • For randomisation, the start and exit points will be swapped every so often.
  • The agent will navigate using a camera and four buttons: W, A, S and D. W and S will move the agent forward and back, while A and D will rotate it left and right.
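The four-button control scheme in the plan above can be sketched as a discrete action space with four entries. The real agent lives in a Unity C# script, so the Python below is only a toy model: the index order, the step size and the 90 degree turn are all assumptions made for illustration.

```python
import math

# Hypothetical index order; the plan only names the four keys.
ACTIONS = {0: "W", 1: "S", 2: "A", 3: "D"}


def apply_action(x, z, heading_deg, action, move_step=1.0, turn_step=90.0):
    """Apply one discrete action to a simple top-down pose.

    Heading 0 faces along +z and increases clockwise when seen from above.
    """
    key = ACTIONS[action]
    if key == "A":    # rotate left
        heading_deg = (heading_deg - turn_step) % 360
    elif key == "D":  # rotate right
        heading_deg = (heading_deg + turn_step) % 360
    else:             # W moves forward, S moves back
        direction = 1.0 if key == "W" else -1.0
        radians = math.radians(heading_deg)
        x += direction * move_step * math.sin(radians)
        z += direction * move_step * math.cos(radians)
    return x, z, heading_deg


pose = (0.0, 0.0, 0.0)
for action in (0, 3, 0):  # forward, turn right, forward
    pose = apply_action(*pose, action)
print(pose)
```

Keeping movement and rotation as separate actions like this is what would make a later strafe option a matter of adding more entries to the table.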

I’m toying with the idea of giving the capsule the ability to strafe and rotate separately, but I’ll probably revisit that later once the basics work as expected.


Where to Start?

The biggest question I had once I found out about using ML with Unity was where I could actually start. While there is a GitHub repo that is constantly updated, the documentation included was not (for me at least) the most understandable when it came to actually getting the thing up and running.

Thankfully, though I missed the live demo at GDC, Unity have provided it in a playlist format for everyone on YouTube.

I’d recommend downloading the project from GitHub, and then going along with the video to better understand how to use the ML agents, and to get an understanding of the two types of ML that are available. (It’s worth noting that while the guide mentions both Reinforcement Learning and Imitation Learning in order to explain the differences between the two, the tutorial itself focuses on Reinforcement Learning.)

I’ll also note a few issues I encountered while following along:

  1. My Python 3 install was not present in my Windows path, so I had to manually add it. If you are installing Python 3 from the website, this option is provided during the install process. Unless you plan to run multiple versions of Python on your system, I’d recommend ticking that option.
  2. Several packages were not installed or listed as dependencies, so when I started up the section, Python wouldn’t run until these had been installed. Using pip (on Windows this is called with python -m pip install <package name> from a command prompt), make sure you have the following:
    • docopt
    • numpy
    • image
    • tensorflow==1.4.*
    • PyYAML
  3. It is important to note the use of TensorFlow 1.4. By default pip installed 1.8, which currently has issues with TF#. If you build your TensorFlow data with 1.8, then once you fix it and install 1.4 you have to re-run your training simulation to build a new .bytes file that is compatible.
  4. You’ll need to have installed the TensorFlowSharp (TF#) plugin for Unity; see the guide here for full details. The short version is: download and import this Unity package, then go to Edit -> Player Settings, set the Scripting Runtime Version under Configuration to Experimental (.NET 4.6 or equivalent), and add ENABLE_TENSORFLOW to the Scripting Define Symbols for each type of device you want to use (PC, Mac and Linux Standalone, iOS or Android). Once all that is done, save, close and then re-open Unity.
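Since the wrong TensorFlow version was the easiest of these issues to miss, here is a small sketch for checking an installed version against a pin such as 1.4.*. Both helper names are made up for this example, and installed_version relies on importlib.metadata, which only exists in Python 3.8 and later (on older versions it simply returns None).

```python
def satisfies_pin(installed, pin):
    """Check a version string against an exact pin or a trailing-wildcard
    pin such as '1.4.*'."""
    if pin.endswith(".*"):
        prefix = pin[:-2].split(".")
        return installed.split(".")[: len(prefix)] == prefix
    return installed == pin


def installed_version(package):
    """Return the installed version of a package, or None if unknown."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:  # Python older than 3.8
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("tensorflow")
    if found is None:
        print("Could not determine the installed tensorflow version")
    elif satisfies_pin(found, "1.4.*"):
        print("tensorflow", found, "matches 1.4.*")
    else:
        print("tensorflow", found, "does NOT match 1.4.*")
```

Running this before training would flag a 1.8 install before any incompatible model file gets built.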

Other than those minor hiccups, I was able to get everything working.

A Brief Introduction

So for a while now I’ve been learning a lot about TensorFlow and Machine Learning. Originally it was for a work related project, but that got canned when management realised that ML wasn’t just a “Magic Bullet” they could put in to solve their problems overnight; it would take time and understanding to get right.

However, by the time they had canned the project, I’d already been doing some serious research into the topic, and once I got my head around the basics I realised that there was a hell of a lot of potential in such a technology, way outside of what work wanted to use it for.

So in my spare time I kept digging, I kept learning, and I kept trying to get better at it.

Then just after GDC 2018, I learned something that had somehow passed me by originally: Unity 3D supported Machine Learning Agents (ML Agents), and the technology was being actively worked on by the developers.

Suddenly my use for ML had moved away from the rather mundane (to me anyway) large scale data analysis that I was thinking of using it for, to something a lot closer to me. Something I’ve had a vested interest in for the longest of times: Games Development, and Artificial Intelligence.

This in turn made me decide to publicly document my thoughts and discoveries as I head down this path, both to provide a record and to perhaps help others who are unsure where they are going when it comes to using ML and Unity 3D.