Simple Maze Experiment – Part 6

Analysing the Results

So after the last batch of experiments I settled on ramping up the resolution of the image supplied to the brain during training to see if this improved training results, as well as actual results when used live.

The overall answer to that question is “not really”. Supplying larger images significantly increases training time, but the difference in end results is negligible. I also found that training required a significantly larger chunk of memory to reach what appears to be essentially the same outcome, as you’ll see in the TensorBoard overview below.

In the above graph, red indicates a training run using an image resolution of 256×256 pixels, while blue indicates 512×512 pixels. I was surprised to see that, despite the higher resolution image (and therefore cleaner data for the neural network to work on), the runs actually had a greater success rate with the smaller image resolution, though not by a massive margin. The real difference is the time each training run took, with the 512×512 version taking more than twice as long to train as the 256×256 version.
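As a rough back-of-the-envelope check on those timings, the sketch below counts the raw input values in a single observation at each resolution. It assumes three colour channels per pixel (the post does not say whether the images were colour or greyscale), and `input_values` is just an illustrative helper name.

```python
def input_values(width: int, height: int, channels: int = 3) -> int:
    """Raw values in one image observation (channels=3 assumes RGB)."""
    return width * height * channels


small = input_values(256, 256)
large = input_values(512, 512)

print(f"256x256: {small:,} values per observation")
print(f"512x512: {large:,} values per observation")
print(f"ratio:   {large / small:.0f}x")
```

Doubling each side quadruples the amount of data per observation, so a longer training time and a larger memory footprint are what you would expect, even if the slowdown is not strictly proportional.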

Onward and Upward

So far I’m not seeing a massive need for visual observation in Unity. It seems clunky at best, and the training time is rather large. Having done some more research, I’ve found several articles that recommend using over 1 million steps, with some advocating 5 million upwards. Based on previous tests of 3 million steps with a small image selection, I am not confident that this would produce an end result I would deem satisfactory. However, it is something I am interested in trying once I can fix a cooling issue with my GPU, so that it can run for longer than 3-4 hours of calculation without encountering stability issues (a 3 million step training session would take ~10-15 hours).
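To put those step counts in context, here is a minimal sketch that extrapolates from the ~10-15 hours quoted for 3 million steps. It assumes throughput stays constant for the whole run, which is a simplification, and the helper names are purely illustrative.

```python
def steps_per_hour(total_steps: int, hours: float) -> float:
    """Average throughput implied by a run of known length."""
    return total_steps / hours


def hours_needed(total_steps: int, rate: float) -> float:
    """Time to reach total_steps at a constant rate (steps per hour)."""
    return total_steps / rate


# Figures from the post: 3 million steps takes roughly 10-15 hours.
slow = steps_per_hour(3_000_000, 15)
fast = steps_per_hour(3_000_000, 10)

for target in (1_000_000, 3_000_000, 5_000_000):
    best = hours_needed(target, fast)
    worst = hours_needed(target, slow)
    print(f"{target:,} steps: {best:.1f}-{worst:.1f} hours")
```

Under that assumption, even a 1 million step run sits at or beyond the edge of a 3-4 hour stability window, which is why the cooling fix has to come first.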

Since I don’t think I’ll be able to sate my curiosity regarding the visual learning component until I can at least test the 3 million and 5 million steps theory, I’m going to keep this project in its current state, and simply adjust the training parameters to see what I get out of it. In addition to this, I’m going to create a new sub-project in the repo where I will duplicate the current setup, but replace the visual observation method with a vector-based method, and see how that compares to the visual one in terms of both training time and actual result accuracy.

Once I’ve got these two set up, I’ll upload them to my GitHub Repo.



Simple Maze Experiment – Part 5

After running many variations of training with many variations of hyperparameters, and a few changes to the agent and the training code itself, I’m still not entirely sold that Visual Observation is the best method of Machine Learning for a Unity Agent, at least not in the way I wanted to use it.

To start with, I limited myself to changing hyperparameters only, and ended up settling on the following:

use_recurrent: true
sequence_length: 64
memory_size: 256
num_layers: 2
gamma: 0.99
batch_size: 128
buffer_size: 2048
num_epoch: 5
learning_rate: 3.0e-4
time_horizon: 64
max_steps: 5e4
beta: 5e-3
epsilon: 0.2
normalize: false
hidden_units: 512
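To get a feel for how some of these values interact, the sketch below works out roughly how many gradient updates they imply. It assumes the usual PPO pattern: `buffer_size` experiences are collected before each policy update, the buffer is split into minibatches of `batch_size`, and the whole buffer is passed over `num_epoch` times. The counts are approximations and the trainer’s exact behaviour may differ; `update_schedule` is a hypothetical helper.

```python
# Subset of the hyperparameters listed above.
hyperparameters = {
    "batch_size": 128,
    "buffer_size": 2048,
    "num_epoch": 5,
    "max_steps": 5e4,
}


def update_schedule(hp: dict) -> dict:
    """Approximate update counts implied by the settings (assumed PPO pattern)."""
    minibatches = hp["buffer_size"] // hp["batch_size"]
    updates_per_buffer = minibatches * hp["num_epoch"]
    buffer_fills = int(hp["max_steps"]) // hp["buffer_size"]
    return {
        "minibatches_per_epoch": minibatches,
        "updates_per_buffer": updates_per_buffer,
        "buffer_fills": buffer_fills,
        "total_updates": buffer_fills * updates_per_buffer,
    }


for name, value in update_schedule(hyperparameters).items():
    print(f"{name}: {value}")
```

Seen this way, 50,000 steps only fills the buffer a couple of dozen times, so each change to `buffer_size` or `max_steps` has a large effect on how often the policy is actually updated.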

However, I found that tweaking the parameters alone was not enough, so in addition to these parameters, I also updated the agent to have 4 cameras (pointing in the 4 cardinal directions), and to interpret the view from these cameras as 80×80 pixels. This was a required change, as the original version that tried to process a 640×480 resolution window would run out of memory and crash. I’m not sure if the additional cameras have helped or not, but dropping the resolution of the images down definitely did stop the crash issues. I also increased the training speed to 50, changed the agent interaction from a rigidbody to a character controller, and reduced the decision making interval down to 3 (from 5).
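A rough estimate shows why the resolution drop matters so much for memory. The sketch below assumes every pixel is held as a 32-bit float with three colour channels, and that a full buffer of 2048 observations (the `buffer_size` listed above) sits in memory at once. Both are assumptions about how the trainer stores data, not confirmed behaviour, and the buffer size used by the original crashing run is not stated.

```python
BYTES_PER_FLOAT32 = 4


def observation_bytes(width: int, height: int, cameras: int = 1,
                      channels: int = 3) -> int:
    """Bytes for one decision's worth of camera input (assumed float32 RGB)."""
    return width * height * channels * cameras * BYTES_PER_FLOAT32


def buffer_megabytes(obs_bytes: int, buffer_size: int = 2048) -> float:
    """Memory for a full experience buffer of observations, in MiB."""
    return obs_bytes * buffer_size / 1024 ** 2


original = observation_bytes(640, 480)           # one 640x480 camera
reduced = observation_bytes(80, 80, cameras=4)   # four 80x80 cameras

print(f"640x480 x1: {buffer_megabytes(original):,.0f} MiB per buffer")
print(f"80x80 x4:   {buffer_megabytes(reduced):,.0f} MiB per buffer")
print(f"reduction:  {original / reduced:.0f}x")
```

Even with four cameras instead of one, the smaller images need roughly a twelfth of the memory under these assumptions.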

With regard to the hyperparameters, the key points are:

  • I have told the agent that it is recurrent (use_recurrent), meaning it should remember the last few actions it has taken.
  • I’ve given it 2 hidden layers in the neural network (num_layers), each with 512 hidden units (hidden_units).
  • I’ve also set the full training run to last 50,000 steps (max_steps). I did have longer runs set up, but they appeared to make minimal difference to the result. A set of hyperparameters that had 3,000,000 steps actually gave me worse results than 50,000 steps, possibly due to overfitting the network.

I’ve added a TensorBoard breakdown of the two runs below:


Simple Maze Experiment – Part 4 – How to Train Your Agent

With all the other components now more or less done, it’s time for me to actually train this agent. Before I get into that though, I think now would be a good time to relist the initial training parameters I went in with:


  • Max Steps: 0
  • Training Configuration:
    • Width: 640
    • Height: 480
    • Quality Level: 0
    • Time Scale: 10
    • Target Frame Rate: 60
  • Agent Run Speed: 2
  • Success Material: Green
  • Fail Material: Red
  • Gravity Multiplier: 3


  • Vector Observation
    • Space Type: Continuous
    • Space Size: 0
    • Stacked Vectors: 1
  • Visual Observation
    • Size: 1
    • Element 0:
      • Width: 640
      • Height: 480
      • Black and White: False
  • Vector Action
    • Space Type: Discrete
    • Space Size: 4
  • Brain Type: External


  • Brain: MazeBrain
  • Agent Cameras
    • Camera 1: a forward facing camera attached to the capsule gameobject.
  • Max Steps: 5000
  • Reset on Done: True
  • On Demand Decisions: False
  • Decision Frequency: 5
  • Goal: Goal Game Object
  • Ground: Plane prefab
  • Spawn Point: -3.3, 1, 6.84 (this will put the capsule at the top left of the maze, near the small sticking out notch).
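One thing worth working out from these settings is how many decisions the agent actually gets per episode. The sketch below assumes the agent requests one decision every `Decision Frequency` steps and that an episode ends at `Max Steps` if the goal is not reached first. `decisions_per_episode` is an illustrative helper, and the interval of 3 is included only as a comparison point.

```python
def decisions_per_episode(max_steps: int, decision_frequency: int) -> int:
    """Upper bound on decisions in one episode (one decision every N steps)."""
    return max_steps // decision_frequency


AGENT_MAX_STEPS = 5000  # "Max Steps" from the agent settings above

for interval in (5, 3):
    count = decisions_per_episode(AGENT_MAX_STEPS, interval)
    print(f"decision every {interval} steps: up to {count} decisions per episode")
```

A smaller interval gives the agent more chances to correct its course within the same episode, at the cost of more brain evaluations per episode.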


Simple Maze Experiment – Part 3 – Mayhem and Agents

Well, I knew the agent was going to be more complex than the academy and the brain to set up, but what I didn’t realise is how much back and forth I’d be doing to fine tune the agent. I’ll be honest, even as I write this I’ve still not got an end result I’m happy with, but I’m learning, so that’s what counts!

Anyway, rather than drop chunks of code into the blog, which can get a bit confusing, I’ve uploaded the project to my GitHub account. I’ve tried to keep the comments up to date to walk through what’s happening, but there’s a lot in there, so I figured I’d do a breakdown of the agent script in the blog, and cross link to the actual script in the repo.


Simple Maze Experiment – Part 2 – Brain Food

In part 1 I set about preparing the Unity environment, so this next part will be about setting up the Academy and Brain components ready for use.

If you’re reading this and wondering what I’m on about with Academies and Brains, I do recommend you watch Unity’s own tutorial on Machine Learning Agents on YouTube. It provides a much more in-depth overview of what the Academy, Brain and Agent actually are and how they all fit together.


Simple Maze Experiment – Part 1


Now that I’ve got a basic understanding of how to use Unity’s reinforcement learning, the next step is to put it into practice. To do so, I’m going to go with a very simple concept. An agent will be placed in a maze and will need to find the quickest route to the exit.

Normally I’d use some form of pathfinding for this, most likely the built-in Unity pathfinding tool. The main reason I’m using machine learning instead is that later on I plan to add hazards and puzzles to the maze for the agent to solve, and I want it to handle them like a player would, rather than needing to write explicit code to handle each puzzle. I’m also curious as to how well ML will handle visual inputs from a camera to navigate, rather than a raycast system.

So here’s my current battle plan:

  • Create a simple maze with an entrance and an exit.
  • The agent starts at one end, and must reach the other.
  • For randomisation, the start and exit points will be swapped every so often.
  • The agent will navigate using a camera, and four buttons, W, A, S and D. W and S will move the agent forward and back, while A and D will rotate left and right.

I’m toying with the idea of giving the capsule the ability to strafe, and then rotate separately, but I’ll probably revisit that later once the basics work as expected.
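The four-button control scheme in the plan can be sketched as a small discrete action space. This is a plain Python illustration of the idea rather than the Unity agent script: the action numbering, the `Pose` class, and the step sizes are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    x: float = 0.0
    z: float = 0.0
    heading_deg: float = 0.0  # 0 means facing along +z


# Hypothetical numbering of the four discrete actions.
ACTIONS = {0: "W", 1: "S", 2: "A", 3: "D"}


def apply_action(pose: Pose, action: int,
                 move_step: float = 1.0, turn_step: float = 90.0) -> Pose:
    """Return the new pose after one action: W/S move, A/D rotate."""
    key = ACTIONS[action]
    if key in ("W", "S"):
        direction = 1.0 if key == "W" else -1.0
        rad = math.radians(pose.heading_deg)
        return Pose(pose.x + direction * move_step * math.sin(rad),
                    pose.z + direction * move_step * math.cos(rad),
                    pose.heading_deg)
    turn = -turn_step if key == "A" else turn_step
    return Pose(pose.x, pose.z, (pose.heading_deg + turn) % 360.0)


pose = Pose()
for action in (0, 3, 0):  # forward, rotate right, forward
    pose = apply_action(pose, action)
print(f"x={pose.x:.1f}, z={pose.z:.1f}, heading={pose.heading_deg:.0f}")
```

Adding strafing later would mean extending `ACTIONS` with two more entries that move the agent sideways without changing its heading.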
