Original article was published by Isaac Boates on Deep Learning on Medium
The first task to start training my GAN was to acquire the data. One might think I could have just downloaded the tiles directly from OpenStreetMap, but it wasn’t so straightforward. Being a free, volunteer-driven service, it would be very rude to start hammering on their servers with requests for my vanity project. Additionally, I needed the raw data used to draw the map in order to narrow down my search for suitable images to use as training data, which we will get to a bit later.
Thankfully, it is possible to download regular dumps of raw OpenStreetMap data from Geofabrik at various administrative levels. Going further, these dumps can be automatically downloaded and spun up as a local TileServer instance via a very handy Docker image, from which one can then “download” the rendered OpenStreetMap tiles.
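Tile servers conventionally address tiles by zoom level and x/y index (the "slippy map" scheme), so "downloading" a tile comes down to converting a coordinate into those indices and requesting the matching URL. The sketch below shows that conversion. The base URL and path layout are assumptions about a local setup, not something taken from the Docker image's documentation.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return the (x, y) index of the slippy-map tile containing a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tile_url(base_url, zoom, x, y):
    """Build a tile URL; the {zoom}/{x}/{y}.png layout is an assumption."""
    return f"{base_url}/{zoom}/{x}/{y}.png"


# Hypothetical local endpoint; adjust to wherever the container is listening.
BASE_URL = "http://localhost:8080/tile"

x, y = lonlat_to_tile(13.405, 52.52, 15)  # a point in Berlin
print(tile_url(BASE_URL, 15, x, y))
```

Note that the formula is only valid within the Web Mercator latitude range (roughly ±85°) and does not handle a longitude of exactly 180°.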
I decided to use Germany as the region from which I would get my training data, because it is one of the richest regions in terms of data availability on OpenStreetMap. Even small towns in Germany tend to have nearly every building, path, church, tree and local business mapped. But upon trying to build a TileServer image with all of Germany as its database, disaster struck: “OUT OF MEMORY”. It turns out that my consumer-grade laptop was not up to the task of processing one of the most data-rich regions in the world. Who knew?
Thankfully, my poor, abused little laptop was capable of building an image with a single German province, so I started from there. Once I had the TileServer container up and running, I had to decide exactly which tiles to use in the training set. I decided rather arbitrarily that I wanted to generate maps at about the town or village scale. But there is an inconvenient truth about having absolute cartographic coverage of a region at this scale, even in one as data-rich as a German province: most of it is boring. Here are a few remarkably dull examples:
This means that indiscriminately downloading tiles is going to result in a lot of “boring” samples, and I want the GAN to make maps that are at least a little bit more interesting than the pastel purgatory of a topographic countryside map. Thankfully, simply by having the TileServer container, I already had access to the raw data which it was rendering. This data is stored in a PostGIS database, and can therefore be accessed with regular SQL queries. So I made up some conditions for what I thought would make for an “interesting” map, and wrote a query which would return square boxes covering locations which met my criteria.
I played around a lot with different queries, but for brevity’s sake I will simply say that I ultimately decided that an “interesting” map would simply be a named location (e.g. a town or a village with a non-null name attribute) which had at least 50 polygons nearby whose “building” attribute was not null. As an SQL query on an OpenStreetMap database, that looks like this:
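with building as (
    select ST_Centroid(way) as geom
    from planet_osm_polygon  -- assumed: the default osm2pgsql polygon table
    where building is not null
), pois as (
    select
        p.way as geom,
        count(b.geom) as numbuilding
    from
        planet_osm_point as p,
        building as b
    where p.place in ('village', 'town', 'suburb')
        and ST_DWithin(b.geom, p.way, 250)
    group by p.way
)
select ST_Envelope(ST_Buffer(geom, 500)) as geom
from pois
where numbuilding >= 50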
The query returns a set of 1km x 1km squares around named locations with at least 50 buildings within 250 m. There was quite a bit more nitty-gritty work that had to be done to further refine the training images, but I don’t intend to bore every reader with excruciating technical details. To see exactly how I did it, have a look in the repository at “process_pbfs.py” and “download_tiles.py”, and feel free to ask me directly about anything that isn’t clear. Suffice it to say that for each location I now had a 256×256 pixel image that (more or less) prominently featured its respective town or village.
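For readers who think more easily in code than in SQL, the same selection rule can be sketched in plain Python. This is an illustration of the criterion only, not how the repository does it: it assumes places and building centroids are given as (x, y) points in a projected, metre-based coordinate system, and the function name is made up for this example.

```python
import math


def interesting_squares(places, buildings, radius=250.0,
                        half_side=500.0, min_buildings=50):
    """Return (minx, miny, maxx, maxy) squares around well-built-up places.

    places and buildings are lists of (x, y) points in metres
    (an assumption of this sketch).
    """
    squares = []
    for px, py in places:
        # Count building centroids within `radius` of the place.
        nearby = sum(
            1 for bx, by in buildings
            if math.hypot(bx - px, by - py) <= radius
        )
        if nearby >= min_buildings:
            # A square with sides of 2 * half_side centred on the place.
            squares.append((px - half_side, py - half_side,
                            px + half_side, py + half_side))
    return squares


# Toy data: one place surrounded by 50 buildings, one in empty countryside.
buildings = [(float(i), 0.0) for i in range(50)]
places = [(0.0, 0.0), (10_000.0, 10_000.0)]
print(interesting_squares(places, buildings))
```

The brute-force loop is fine for a toy example. On a real province the database does this far more efficiently, since PostGIS can use a spatial index for the distance test.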
If you want to try out the repo for yourself with different locations of interest, just replace the contents of the query in the “sql” folder with one that returns areas you are interested in. Just make sure that it still returns squares of a fixed size.
With the process now locked down for a single German province, the time had come to apply it to the remaining provinces. It took several hours for my laptop to build the OpenStreetMap database, launch the TileServer container and scrape the images into a usable training set. The thought of repeating this process for all provinces manually was intolerable. So, as can be seen in the repo in “process_pbfs.py”, I automated the entire process. For those who want to try it at home for different areas in the world, all one has to do is replace the values in the “regions” dictionary with URLs to the appropriate “pbf” and “poly” files as found in Geofabrik. You’re probably in for a long wait after that, so go have a cup of coffee or seventy.
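As a rough idea of what such a dictionary might look like, here is a sketch that builds the two URLs per region from Geofabrik's download paths. The dictionary layout, the helper name and the choice of regions are assumptions for illustration; check “process_pbfs.py” for the structure the repository actually expects.

```python
GEOFABRIK_BASE = "https://download.geofabrik.de"


def region_urls(path):
    """Return the .pbf and .poly URLs for a Geofabrik region path."""
    return {
        "pbf": f"{GEOFABRIK_BASE}/{path}-latest.osm.pbf",
        "poly": f"{GEOFABRIK_BASE}/{path}.poly",
    }


# Hypothetical layout: region name -> URLs of its data dump and boundary file.
regions = {
    "bremen": region_urls("europe/germany/bremen"),
    "saarland": region_urls("europe/germany/saarland"),
}

for name, urls in regions.items():
    print(name, urls["pbf"], urls["poly"])
```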
After a couple days, all of Germany was finished being processed. I had a total of 26,296 training images, above even the 25,000 which NVlabs suggests in their examples on the StyleGAN2 repository itself. All that had to be done after this was convert the data into “.tfrecord” format, which can easily be done as described in the StyleGAN2 repository.
My first attempt at training was on Google Colab, because it’s free. To make a long story short, it didn’t work. Maybe it used to work, because there is evidence all over the web of people using it with StyleGAN2, but when I tried to train the model, memory consumption would go critical, and the run would then always terminate abruptly with nothing more than a simple “^C” as its final output (which indicates that the running process received SIGINT, the interrupt signal). I assume what happened is that Colab decided that I was using too much RAM and it killed the process. It would be nice if they were a bit clearer about what is going on, but I digress.
The StyleGAN2 repository README makes it rather clear that training this model consumes a LOT of memory, so it’s almost certainly the case that Colab (at least the free tier) is not sufficient to train StyleGAN2. So I began looking at provisioning a paid instance.
I eventually settled on using Lambda Labs. I’m not getting paid by them to post this, I swear. I just honestly had a really good experience with them and I found it affordable enough for what I wanted to do. StyleGAN2 requires a legacy version of TensorFlow (1.15), and they made it easy to downgrade to this version.
Before finally training, however, there was one last step, which I highly recommend any time you are planning on investing time and money in training a GAN: I shrank the images down to 64×64 pixels and trained on those first. This allowed me to first provision a much smaller (and therefore cheaper) instance and train the model quickly to see if it would suffer from the dreaded mode collapse. And lo and behold, it did not. But I still think it is always wise to spend a few bucks early to be absolutely sure before spending a lot more on the real thing, only to get a dazzling variety of grey smears for your trouble.
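To give a feel for how much smaller the trial run is: going from 256×256 to 64×64 means each image has 16 times fewer pixels. The sketch below shows the idea of shrinking by averaging 4×4 blocks on a toy grayscale image stored as nested lists. It is a pure-Python illustration, not the method I used; in practice an imaging library's resize function would do this job.

```python
def downsample(image, factor):
    """Shrink a square grayscale image (list of rows) by averaging
    factor x factor blocks of pixels."""
    size = len(image) // factor
    out = []
    for r in range(size):
        row = []
        for c in range(size):
            block = [
                image[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Toy 256x256 gradient standing in for a rendered map tile.
full = [[(r + c) % 256 for c in range(256)] for r in range(256)]
small = downsample(full, 4)
print(len(small), len(small[0]))
```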
Training on a Lambda Labs instance took a couple of adjustments to the repo. Not that anything was their fault — in fact they figured out what needed to be changed. The StyleGAN2 architecture is actually getting a bit old by this point, and the Lambda Labs instances seemed to be a bit too cutting-edge for it. Nonetheless, all the required changes can be made using this snippet, taken from my own automatic deployment script:
sudo apt install -y python3-tensorflow-legacy-cuda
pip install tensorboard
sudo apt-get install -y libprotobuf-dev
sudo cp /usr/lib/python3/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.cpython-38-x86_64-linux-gnu.so /usr/lib/python3/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
git clone https://github.com/NVlabs/stylegan2
sed -i 's/-D_GLIBCXX_USE_CXX11_ABI=0/-D_GLIBCXX_USE_CXX11_ABI=1/g' stylegan2/dnnlib/tflib/custom_ops.py
Note that some or all of these commands may not be necessary in the future. NVlabs may update their repository, and Lambda Labs may have updated their own environment so as to render them obsolete.
Once all that business had been taken care of, the .tfrecord files containing the real, 256×256 pixel training images just needed to be uploaded to the instance and the training could be kicked off. I decided to go for the largest instance they had: 8x Nvidia Tesla V100 GPUs. It cost the most per hour, but after doing a bit of math and comparison, the training rate was fast enough to make it the most economical option.
The training can be started by running “run_training.py” as described in the StyleGAN2 repo, indicating where the training data can be found and how many kiloimages to process. (It’s not a typo, by the way: it really is intended to be trained on 50,000 kiloimages, i.e. 50,000,000 images.) You can also specify whether or not to use mirror augmentation. This flips your images horizontally, effectively doubling your pool of training samples.
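To put the kiloimage figure in perspective, a little arithmetic helps. One kiloimage is 1,000 images shown to the network, so with my 26,296 training images the full schedule works out to roughly 1,900 passes over the dataset. The helper names below are made up for this sketch.

```python
def kimg_to_images(kimg):
    """Convert kiloimages to individual images shown during training."""
    return kimg * 1000


def passes_over_dataset(total_kimg, dataset_size):
    """How many times each training image is seen, on average."""
    return kimg_to_images(total_kimg) / dataset_size


total_images = kimg_to_images(50_000)
passes = passes_over_dataset(50_000, 26_296)
print(f"{total_images:,} images shown, about {passes:.0f} passes over the data")
```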
I trained my model with mirror augmentation, which may not have been ideal, because it then had virtually no chance of creating readable letters on the map. But I was concerned that without it, it would not train as quickly or to the same quality as my 64×64 pixel model from earlier, so I kept it on. I eagerly watched it train until I forced myself to go to bed.
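A toy example makes the problem with labels easy to see. Mirroring reverses every row of pixels, so any text in the image comes out backwards, and a model shown both versions has no consistent reading direction to learn from. The sketch below uses characters in place of pixels purely for illustration.

```python
def mirror(image):
    """Flip an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


# One "row of pixels", with characters standing in for a map label.
label = [list("CAFE")]
flipped = mirror(label)
print("".join(label[0]), "->", "".join(flipped[0]))
```

On a real tile the individual glyphs are mirrored as well, not just their order, which is even worse for legibility.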