Improving Firepower – Using NVIDIA’s CUDA to Improve Agent Training Times

In between my last post and now, I found I was running into several issues with CPU processing power when training my agents. If you watched the training video in the last post, you can see my CPU getting hit quite heavily, and when I started playing around with new hyperparameters (especially adding recurrent neural network support) the load went through the roof. More than a few times I’ve had TensorFlow or Unity crash out mid-training.

The solution is something I’d stumbled across previously in my machine learning studies: using my NVIDIA GPU’s processing power to handle the training. For those unfamiliar, CUDA is a parallel computing platform and programming model that lets the GPU(s) on your graphics card be used for general purpose processing. The short version is that when it comes to maths-heavy operations like machine learning, I can dump it all on my graphics card (which isn’t doing much otherwise) and let it do the heavy lifting.
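
To make that a little more concrete, here’s a rough sketch (my own illustration, not something from the ML toolkit) of what offloading a maths-heavy operation onto the GPU looks like in the TensorFlow 1.x API that this setup targets. The ‘/gpu:0’ device string and the log_device_placement flag are standard TensorFlow; the matrix sizes are just example values.

    import tensorflow as tf

    # Two largish random matrices - the kind of maths-heavy work CUDA chews through.
    a = tf.random_normal([2000, 2000])
    b = tf.random_normal([2000, 2000])

    # Explicitly ask for the multiplication to be placed on the first GPU.
    with tf.device('/gpu:0'):
        product = tf.matmul(a, b)

    # log_device_placement prints which device each op actually ran on,
    # so you can see the GPU (not the CPU) doing the heavy lifting.
    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
        sess.run(product)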

Installing the CUDA Toolkit

It’s actually fairly straightforward to install and configure the relevant components to support this, and I’ve outlined the process I followed below:

  1. Download version 8.0 of NVIDIA’s CUDA Toolkit from here. It’s important to get 8.0, as the Unity ML toolkit’s TensorFlowSharp support only goes up to TensorFlow 1.4, which requires CUDA 8.0.
  2. Uninstall my current version of TensorFlow via pip through a command prompt (python -m pip uninstall tensorflow).
  3. Install the GPU-enabled version of TensorFlow 1.4 (python -m pip install tensorflow-gpu==1.4.*).
  4. Add the following CUDA directories (which contain the required DLLs) to your system’s PATH environment variable:
    C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\extras\CUPTI\libx64
    C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64
    Note: If you are unfamiliar with adjusting your system’s PATH variable, I highly advise doing some reading up on Google first; you can cause some serious issues with your PC if you just change things at random.
  5. Reboot the PC (this step is not required, but I’m a fan of rebooting after a GPU driver update for my own sanity).
  6. The next part I needed was NVIDIA’s cuDNN library. This can be downloaded from here, but requires signing up to the NVIDIA Developer Program first. It took a bit of digging, but I eventually managed to find cuDNN v6.0 for CUDA 8.0.
  7. To save the hassle of adding more paths, I extracted the contents of the cuDNN archive straight into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\. Once everything was in place, I gave the install a quick sanity check, sketched just after this list.
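
With the toolkit, tensorflow-gpu and cuDNN all in place, a quick sanity check I like (my own addition rather than an official setup step) is asking TensorFlow to list the devices it can see. If a GPU entry shows up alongside the CPU, the CUDA and cuDNN install has been picked up:

    # Quick check that the tensorflow-gpu build can actually see the card.
    from tensorflow.python.client import device_lib

    # list_local_devices() returns CPU and GPU entries; a "/device:GPU:0"
    # entry (or "/gpu:0" on older builds) means CUDA and cuDNN were found.
    print(device_lib.list_local_devices())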

Once that was done, the next time I ran the training script it utilised my system RAM as well as the GPU’s memory, keeping my CPU in a much more stable state and completing training runs much faster.
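
One side note on that GPU memory usage (my own addition, and a sketch of plain TensorFlow rather than anything exposed by the ML toolkit’s training script): TensorFlow 1.x reserves most of the GPU’s memory up front by default. If you ever need it to claim memory only as it goes, the standard allow_growth option is the knob to reach for:

    import tensorflow as tf

    # By default TF 1.x grabs nearly all GPU memory when a session starts.
    # allow_growth tells it to allocate incrementally instead.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True

    sess = tf.Session(config=config)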