How I trained a self-supervised neural network to beat GnuGo on small (7×7) boards


The bug

The main and, as far as I can tell, only major problem with what I was doing before was that I was not setting the training flag in the TensorFlow batch-norm functions (it defaulted to always being in training mode, where the mean and variance statistics were updated with each and every network evaluation). As a consequence, I believe the networks were always ending up in poorly conditioned states when evaluated at test time. I noticed this perceptually too when I played against the networks: they always seemed to have decent opening moves, and then it all kind of devolved into a lot of careless mistakes as the game progressed.
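For concreteness, here is a minimal TF 1.x-style sketch of the kind of fix involved. My actual batch-norm calls aren't shown here, so tf.contrib.layers.batch_norm (whose is_training argument defaults to True, matching the always-updating behaviour described above) and the placeholder-based switch are just illustrative assumptions:

```python
import tensorflow as tf  # TensorFlow 1.15

# One explicit switch for batch norm: fed as True only while training,
# left at its default of False for self-play and test-time evaluation.
is_training = tf.placeholder_with_default(False, shape=[], name='is_training')

def conv_block(x, filters):
    x = tf.layers.conv2d(x, filters, 3, padding='same', use_bias=False)
    # With is_training=False the stored moving mean/variance are used instead of
    # the current batch statistics, and the stored statistics are left untouched.
    x = tf.contrib.layers.batch_norm(x, is_training=is_training, scale=True)
    return tf.nn.relu(x)

# Note: when is_training is fed as True, the ops in tf.GraphKeys.UPDATE_OPS still
# have to be run alongside the train op so the moving averages actually get updated.
```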

A remaining mystery

While the networks I’m training perform well, as described in the sections above, one anomaly remains from finding this bug that I still haven’t figured out. Let me first summarize a few details of how I’m performing training. The code keeps three models in memory (a schematic sketch of how they fit together follows the list):

  • “main”: this model generates the new self-play training batches. “main” is stored in float16.
  • “eval32”: this model is trained on the self-play training batches created by “main”. “eval32” is stored in float32.
  • “eval”: the purpose of this model is to be evaluated in Go matches against the “main” model (it is a copy of the “eval32” model converted to float16). Once it can win against “main” with a high enough probability, it is promoted: it overwrites the “main” model.

(The reason for using mixed-precision floating-point numbers is that it speeds up training.)
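Putting those pieces together, one iteration of the loop looks roughly like the sketch below. This is schematic only: the threshold value and every helper name here (generate_self_play, train_step, win_rate, copy_weights_from) are hypothetical stand-ins, not names from the actual code.

```python
PROMOTION_THRESHOLD = 0.55  # assumed value; the real criterion lives in the repo

def training_iteration(main, eval32, eval16,
                       generate_self_play, train_step, win_rate):
    """One schematic iteration; all arguments are hypothetical stand-ins."""
    batches = generate_self_play(main)    # 1. float16 "main" creates self-play data
    train_step(eval32, batches)           # 2. only float32 "eval32" is backpropagated
    eval16.copy_weights_from(eval32)      # 3. "eval" is a float16 copy of "eval32"
    if win_rate(eval16, main) > PROMOTION_THRESHOLD:
        main.copy_weights_from(eval16)    # 4. promotion: "eval" overwrites "main"
```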

Backpropagation never occurs directly on the “main” and “eval” models; they are downstream from the “eval32” model. For this reason, it would seem that you would never want to run these models in training mode: the statistics should be set by the “eval32” model as it trains and then held fixed. Therefore, I would think the following configuration of training flags would make the most sense:

  • “main”: training=False
  • “eval32”: training=True
  • “eval”: training=False

However, I find that training with the above configuration results in poorly performing models. It is only when I set all flags to True during training that I get decently performing models (provided they are then played with the training flag set to False; if I play them with the training flag set to True, the performance remains poor). If anyone has any ideas about why this is happening, please do let me know! I would have thought training in this configuration would result in models that are much worse, not much better.

The code

The code with the bug fixed is available on GitHub; I again release it into the public domain. Aside from the bugfix, I’ve added the ability to train on two GPUs simultaneously, which I used when training the current model I talk about in this article.
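For anyone curious what that looks like in TF 1.x, the sketch below shows the generic tower-style data-parallel pattern; it is not the code from the repository, and the toy network, input shapes, and optimizer are placeholders.

```python
# Generic TF 1.x "tower" pattern: each GPU computes gradients on its own half of
# the batch, and the averaged gradients are applied once. Illustrative only.
import tensorflow as tf  # TensorFlow 1.15

def tower_loss(x, y, reuse):
    with tf.variable_scope('net', reuse=reuse):
        logits = tf.layers.dense(tf.layers.flatten(x), 7 * 7 + 1)  # toy policy head
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

x = tf.placeholder(tf.float32, [None, 7, 7, 2])  # assumed board-feature shape
y = tf.placeholder(tf.int32, [None])             # move targets
opt = tf.train.MomentumOptimizer(1e-2, 0.9)

# Build one tower per GPU, sharing variables between them.
tower_grads = []
xs, ys = tf.split(x, 2), tf.split(y, 2)
for i in range(2):
    with tf.device('/gpu:%d' % i):
        loss = tower_loss(xs[i], ys[i], reuse=(i > 0))
        tower_grads.append(opt.compute_gradients(loss))

# Average each variable's gradient across the towers and apply the update once.
avg_grads = []
for pairs in zip(*tower_grads):                  # pairs = ((g0, v), (g1, v))
    g = tf.reduce_mean(tf.stack([p[0] for p in pairs]), axis=0)
    avg_grads.append((g, pairs[0][1]))
train_op = opt.apply_gradients(avg_grads)
```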

My setup

All code has been written and tested on CentOS 8 using Python 2.7.16 and TensorFlow v1.15.0, and compiled with NVCC v10.2.89 (the Nvidia CUDA compiler). I’ve run and tested all code on a dual-GPU setup (an Nvidia 2080 Ti and an Nvidia Titan X card), a quad-core Intel i5-6600K CPU @ 3.50GHz, and 48 GB of RAM (the code itself generally peaks at around 35 GB). I have not tested it on alternative configurations (although for some time I was running Ubuntu 18.04 instead of CentOS 8, and everything worked there too). If you were to run this setup with less RAM, I’d recommend running only one GPU (which roughly halves the RAM needs) rather than cutting the tree-search depth (which would be an alternative way to reduce RAM usage).
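For a single-GPU run, one generic way to hide the second card from TensorFlow (this is standard CUDA behaviour, not anything specific to my code) is to set CUDA_VISIBLE_DEVICES before TensorFlow is imported:

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # expose only the first GPU to this process

import tensorflow as tf  # must be imported after the environment variable is set
```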

Going further

An obvious next step would simply be increasing the board size and seeing how far I can take this on my setup. Unfortunately, however, I recently updated my system (a general distro package update with “yum update”; no updates were made to TensorFlow), and now the multi-GPU part of my code crashes when it launches on the second GPU, despite this same code having worked perfectly fine for about a year. The issue seems to be TensorFlow 1.15 crashing when I try to run the model.

I’m undecided whether I will try to patch up what seems to be a sinking ship or change frameworks entirely. TensorFlow has been great to use in many ways, and I do appreciate the work all the developers have done. However, it has felt increasingly like a chore over the years to keep any code consistently running on it, when the names and semantics of functions seem to needlessly change and get deprecated, or things just stop working for no apparent reason, like I’m experiencing now.

Before and during the initial release of TensorFlow in 2015 (and before I was aware of TensorFlow), I was working on my own neural network framework similar in purpose to TensorFlow (although my use case was narrower in scope than TF’s), in which I was calling cuDNN directly, in addition to some other custom CUDA kernels I had written. Anyway, the point of mentioning this is that this code still compiles and runs today, despite me not having touched it in the several years since I switched over to TF.

So the next step for me will probably be to move my code over to my old framework. On a longer-term basis, maybe I’ll eventually clean it up and release the framework. I know I’m not the only one out there who is tired of software frameworks changing out from under them for no reason. Fortunately for deep learning, this churn doesn’t seem to permeate all the way down the stack: cuDNN has remained fairly stable (as evidenced by my code from 2015 compiling and running as-is), so there’s no reason anything built on top of it couldn’t approach the same level of stability.

On an even longer-term basis, I’d like to get a network to play a game that I’m working on, or subsets of it. The approach there will probably need to be a more mixed one of human-supervised and self-supervised learning (similar to AlphaStar, perhaps). It may well be that I’ll never run network trainings at the scale labs like DeepMind do, but the progress I’ve seen with Go using minimal hardware is definitely encouraging to me. The networks might not be at Lee Sedol’s level, but they can still be good enough to be good adversaries for many of the rest of us 🙂