Analysing the Results
After the last batch of experiments, I decided to ramp up the resolution of the image supplied to the brain during training, to see whether this improved the training results as well as the results when the trained brain is used live.
The overall answer to that question is “not really”. Supplying larger images significantly increases training time, but the improvement in the end results is negligible. Training also required a significantly larger chunk of memory, only to reach what appears to be essentially the same outcome, as you’ll see in the TensorBoard overview below.
In the above graph, red indicates a training run using an image resolution of 256×256 pixels, while blue indicates a run at 512×512 pixels. I was surprised to see that, despite the higher resolution image (and therefore cleaner data for the neural network to work on), the runs actually had a greater success rate with the smaller image resolution, though not by a massive margin. The real difference between the runs is the training time, with the 512×512 version taking more than twice as long to train as the 256×256 version.
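To put the jump in memory and training time into perspective, here is a rough back-of-the-envelope sketch of how much raw input each resolution feeds to the network. It assumes 3 colour channels (RGB) and 4-byte float values, neither of which I have confirmed against the actual setup, and it only counts the input image itself, not the network's internal activations.

```python
# Assumptions (not measured from the project): RGB images, float32 values.
CHANNELS = 3
BYTES_PER_VALUE = 4


def observation_size(width, height, channels=CHANNELS):
    """Number of input values in a single visual observation."""
    return width * height * channels


small = observation_size(256, 256)
large = observation_size(512, 512)

print(small)          # 196608 values per 256x256 observation
print(large)          # 786432 values per 512x512 observation
print(large / small)  # 4.0, i.e. four times the raw input per step

# Raw size of one 512x512 observation in MiB under the assumptions above
print(large * BYTES_PER_VALUE / 1024**2)  # 3.0
```

Doubling the width and height quadruples the number of pixels, so a heavier memory footprint and a training time of more than double are roughly what you would expect.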
Onward and Upward
So far I’m not seeing a massive need for visual observation in Unity. It seems clunky at best, and the training time is rather large. Having done some more research, I’ve found several articles that recommend using over 1 million steps, with some advocating 5 million or more. Based on previous tests of 3 million steps with a small image selection, I am not confident that this would produce an end result I would deem satisfactory. However, it is something I am interested in trying once I can fix a cooling issue with my GPU, so that it can run for longer than 3-4 hours of calculation without encountering stability issues (a 3 million step training session would take ~10-15 hours).
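As a rough guide to what the longer runs would involve, the sketch below scales the ~10-15 hour estimate for 3 million steps up to 5 million steps. It assumes training time grows linearly with step count, which is only an approximation, and the session count further assumes that a run can be stopped and resumed from a checkpoint, using the lower 3 hour bound of the current stability limit.

```python
import math


def scale_hours(known_steps, known_hours, target_steps):
    """Linearly extrapolate training time from a known run (an assumption)."""
    return known_hours * target_steps / known_steps


def sessions_needed(total_hours, hours_per_session):
    """How many capped sessions a run would need if it can be resumed."""
    return math.ceil(total_hours / hours_per_session)


# Estimate from the text: 3 million steps takes roughly 10 to 15 hours
low = scale_hours(3_000_000, 10, 5_000_000)
high = scale_hours(3_000_000, 15, 5_000_000)

print(round(low, 1), high)       # 16.7 25.0 hours for 5 million steps
print(sessions_needed(high, 3))  # 9 sessions at 3 hours each, worst case
```

Even the optimistic figure is several times longer than the GPU can currently run in one go, which is why the cooling fix has to come first.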
Since I don’t think I’ll be able to sate my curiosity regarding the visual learning component until I can at least test the 3 million and 5 million step theory, I’m going to keep this project in its current state and simply adjust the training parameters to see what I get out of it. In addition to this, I’m going to create a new sub-project in the repo where I will duplicate the current setup but replace the visual observation method with a vector-based one, and see how that compares to the visual one in terms of both training time and actual result accuracy.
Once I’ve got these two set up, I’ll upload them to my GitHub repo.