Mapdreamer — AI cartography
Sooner or later artificial intelligence will replace all of us. Or that’s what they say. I went on a quest to explore how far we cartographers are from being replaced by AI. So can a computer create maps if you teach it?
But more seriously, I have been following and admiring the work of people like Mario Klingemann already for a while. Artwork created with neural networks looks very fascinating, beautiful and even frightening. Some people had been training neural networks with fashion images, anime faces or humans to create imaginary content and explore the visuals.
As a geographer and a cartographer, a logical pathway to explore this space (sic) was through maps.
AI, ML, GAN… what?
A generative adversarial network (GAN) is a class of machine learning frameworks in which two neural networks contest with each other in a game: one network generates images while the other evaluates how “real” they look.
Given a training set of maps, this technique learns to generate new maps with the same statistics as the training set. I say statistics because images are nothing more than arrays of pixel values, no matter how beautiful they look. So basically, after training I can create an almost endless number of images that are maps. But not quite.
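The claim that images are just arrays of pixel values is easy to check yourself. A tiny NumPy sketch (the colors here are made up for illustration):

```python
import numpy as np

# A 4x4 RGB image is nothing more than a height x width x 3 array of numbers.
img = np.full((4, 4, 3), (200, 180, 140), dtype=np.uint8)  # parchment-ish tone

print(img.shape)    # (4, 4, 3)
print(img[0, 0])    # [200 180 140] - one pixel is just three numbers

# "Statistics" of the image, e.g. its mean color:
mean_color = img.reshape(-1, 3).mean(axis=0)
```

A GAN learns to produce arrays whose statistics match those of the training arrays.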
For the process I used StyleGAN2 by NVIDIA, an awesome framework that is behind a large share of the AI art you see online. Of course it’s used for other purposes too.
Setting up the StyleGAN environment can be a bit of a pain for a beginner as you need Tensorflow (1.x), NVIDIA drivers, CUDA 10.0 toolkit and cuDNN 7.5. If you’re not using Docker (which I’m not), I strongly recommend using Anaconda to manage your Python package installations.
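As a rough sketch of what the Anaconda route looks like (exact versions may differ for your machine; check the StyleGAN2 README for the ones it expects):

```shell
# Create an isolated environment for StyleGAN2 (Python 3.6 era).
conda create -n stylegan2 python=3.6
conda activate stylegan2

# StyleGAN2 expects TensorFlow 1.x with GPU support.
pip install tensorflow-gpu==1.14

# CUDA 10.0 and cuDNN 7.5 must match the NVIDIA driver installed on the machine.
```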
One thing that might come as a surprise to someone not familiar with processes relying on GPUs (graphics processing units), like this type of machine learning, is that running a virtual machine with an efficient GPU is very expensive. For producing high-res images with StyleGAN you need an NVIDIA GPU with at least 16 GB of memory. Even better if you have multiple GPUs. Running these can easily cost you several dollars per hour, and if your training takes several days, weeks or even a month, that is a lot of 💸
So optimizing the way you use your machine is very important. You could also check out services like RunwayML, dedicated to this type of purpose.
I needed a lot of maps for the training. From others’ experiences I had read that 10 000 would be the absolute minimum. As all map lovers know, old handcrafted maps are very beautiful. I wanted to give the outputs a hand-drawn feel, so that was one of the few limitations I had on my training materials. They could be from anywhere in the world, any map type, any scale and in any language. This way I thought I could better see what really makes a map. Or at least how a machine would see it.
There are a lot of sites online that share old maps, but understandably very few of these offer an easy way to do a bulk download. So I had to do some scraping and manual collection. Of course manual collection doesn’t mean that I would be clicking and downloading, but I did quite a bit of manual research on the types of maps I would like to include in the training data and what sites are easy to use for data collection.
When you are collecting training data, you can do that with Python (e.g. scrapy is pretty handy), but for many sites you can go pretty far with just wget.
wget --random-wait -nd -r -P /save/location -A jpg,jpeg http://www.somedomain.com
That command recursively collects all JPEG images from a specific site!
Note: if you are scraping data, please be nice. It is very easy to flood websites with requests but that is not really kind.
After some tedious data collection I had to unify the data. StyleGAN2 wants all of the training data to be the same size and preferably square. With Python I reshaped all of the images and made sure that they are RGB and not greyscale.
I wanted to maximize what I had, so I did a trick where for each map I took a crop from the center and from a few random locations.
from PIL import Image
import os
import random

path = "/path_to_your_images/"
out_path = "/path_for_output/"
dirs = os.listdir(path)

def resize():
    for i, item in enumerate(dirs):
        # Force RGB so greyscale images don't sneak into the training set.
        im = Image.open(os.path.join(path, item)).convert("RGB")
        width, height = im.size
        # Take a few 1024 x 1024 crops, randomly offset around the center.
        # Note: offsets can fall outside the image, which pads with black -
        # this is where stray black edges in the results come from.
        for crops in range(5):
            rand1 = random.randint(-2000, 2000)
            left = (width - 1024 + rand1) / 2
            top = (height - 1024 + rand1) / 2
            right = (width + 1024 + rand1) / 2
            bottom = (height + 1024 + rand1) / 2
            cropped = im.crop((left, top, right, bottom))
            cropped.save(os.path.join(out_path, str(i) + '_' + str(crops) + '_map_cntr.jpeg'),
                         format='JPEG', quality=90)
        # Also keep a plain resized copy of the whole map.
        resized = im.resize((1024, 1024), Image.ANTIALIAS)
        resized.save(os.path.join(out_path, str(i) + '_map_rsizd.jpg'), 'JPEG', quality=90)
So now you might be wondering where I collected the maps from. Just so that people won’t start scraping those sites, I won’t share the direct links or instructions here. But if someone is really interested, I can share more information. I used at least ten different sites that had map collections with permissive licenses.
I ended up with more than 100 000 images of maps. This map collection clearly had a few distinct types that could later on be seen from the results.
StyleGAN needs the images in a specific TFRecords format for TensorFlow to be able to read them. Beware that these multi-resolution TFRecords end up being around 30 times bigger than your original images. So, for example, I had a bit more than 20 gigabytes of map images and more than 500 gigabytes of TFRecords! So check that you have enough disk space too.
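A quick back-of-the-envelope check before converting: 20 GB of images times roughly 30 is around 600 GB, so it pays to verify free disk space first. A small stdlib-only sketch (the helper and paths are my own, not from StyleGAN2):

```python
import shutil

def enough_space(path, needed_gb):
    """Check whether path's filesystem has at least needed_gb gigabytes free."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= needed_gb

# ~20 GB of source images * ~30x TFRecords blow-up = ~600 GB needed.
print(enough_space(".", 20 * 30))
```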
You can transform images to TFRecords with the dataset tool that comes with StyleGAN2, as follows:
python dataset_tool.py create_from_images ~/datasets/my-custom-dataset ~/my-custom-images
After these steps I was ready to start training the network.
The loooooong training process
After you have all installations done and your training materials as TFRecords, you can start actually training the network. I ran the training first on Azure and later on GCP. On both of these I had some credits I could use to run the expensive machines, so I haven’t used any of my own money for this process. Just quite a bit of time. 1024 x 1024 is the largest size that StyleGAN2 can easily handle. In hindsight it might have made more sense to go for 512 x 512 resolution to speed up the process.
When writing this blog post I have trained the network for almost 3000 ticks (think of these as iterations), which has taken approximately one week of pure machine time. With my setup one tick took 2–3 hours. Every 50 ticks StyleGAN creates a snapshot of the network and some fake images.
During the training the number I was staring at was the Fréchet Inception Distance (FID) score. This is a metric that gives an idea of how the training is progressing. The FID score for the maps shown here was about 23.
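FID compares the mean and covariance of Inception-network features between real and generated images: for two Gaussians it is ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2(Sigma_r Sigma_g)^(1/2)). A NumPy-only sketch of that formula (just the math, not the full pipeline, which also needs the Inception feature extractor):

```python
import numpy as np

def _sqrtm_psd(a):
    # Matrix square root of a symmetric positive semi-definite matrix.
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0, None)
    return v @ np.diag(np.sqrt(w)) @ v.T

def fid(mu_r, sigma_r, mu_g, sigma_g):
    # Tr((Sr Sg)^(1/2)) is computed via the equivalent symmetric form
    # Tr((Sg^(1/2) Sr Sg^(1/2))^(1/2)) so we stay with PSD matrices.
    s = _sqrtm_psd(sigma_g)
    covmean = _sqrtm_psd(s @ sigma_r @ s)
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2 * covmean))

# Identical feature distributions give FID ~ 0; lower is better.
mu, sigma = np.zeros(3), np.eye(3)
print(round(fid(mu, sigma, mu, sigma), 6))  # 0.0
```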
I used training data with various languages and didn’t care whether the fonts would turn out sensible, so I used mirror augmentation in my config. This means that StyleGAN can also use mirrored versions of the training images. Other than that I mostly used default configs.
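For reference, a StyleGAN2 training run with mirror augmentation is launched roughly like this (paths and dataset name are placeholders; check run_training.py --help in your copy of the repo):

```shell
python run_training.py --num-gpus=1 --data-dir=~/datasets \
    --dataset=my-custom-dataset --config=config-f --mirror-augment=true
```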
Some dreams about maps
So how did it all turn out? Interesting, to say the least.
Interpolating through the latent space is both interesting and psychedelic at the same time.
Besides nice animations, it is very interesting to just explore the latent space. Below are a few examples of the results. Somewhere between colored and black and white. Somewhere between Latin letters and Cyrillic. So looking good.
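The interpolations shown here come from walking between points in the latent space. Rendering the frames requires the trained network, but the walk itself is just vector math. A NumPy sketch (the 512-dimensional latents match StyleGAN2; feeding each frame to the generator is left out):

```python
import numpy as np

def lerp(z0, z1, steps):
    """Linearly interpolate between two latent vectors."""
    ts = np.linspace(0.0, 1.0, steps)
    return [(1 - t) * z0 + t * z1 for t in ts]

rng = np.random.RandomState(42)
z0 = rng.randn(512)   # one point in latent space = one imaginary map
z1 = rng.randn(512)   # another point = another map
frames = lerp(z0, z1, steps=10)
# Each frame would then be fed to the generator to render one map image.
print(len(frames))    # 10
```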
Below you can distinguish some map types and see map elements clearly: roads, waterways, graticules, contours. One thing I personally really like is how the scale gets mixed up in these.
Below even more random samples. Some of them better than others, but all of them look quite nice in their own way. Some black edges also show that some of my cropping hasn’t been 100% successful.
It is also possible to project 1024 x 1024 images to the network. So even though a particular map (or any other image) wasn’t part of the original training data, the network can try to mimic it as best it can. Clearly, for this to be more successful, a lot more training would be needed.
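Projection is done with the projector script that ships with StyleGAN2; roughly like this (flags from memory and placeholders for the network pickle and dataset, so verify against the repo):

```shell
python run_projector.py project-real-images --network=<your-network-pkl> \
    --dataset=my-custom-dataset --data-dir=~/datasets --num-images=5
```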
When looking at interpolations between images, the transitions don’t always seem that logical. Some changes are very sudden, but I think these are mainly due to the training data used. Well, I guess with machine learning the training data is always to blame 😇
One more animation where you can clearly distinguish two map types from the training data: one is an old Finnish basemap and the other an old Soviet map. The brighter contrasts make them look quite nice.
In the future I would really like to experiment with style transfer: mappify your face on Instagram! Or just do something properly useful with this.
Many people have been asking on Twitter what is the point in all of this. I have a few motivations:
- I wanted to learn something new. I also messed up a few virtual machines and did some stupid package installations on the way, but that’s how you learn things 😁
- I wanted to create something new that hadn’t been done before. Still exploring the latent space to find a nice map to print and put up on my wall.
But what are these after all? Maps, artwork or just noise? You recognize what it is but can’t quite figure it out. What’s the scale of this map? Where is this from? What language is that label trying to be?
So a lot of interesting questions and I just couldn’t leave out this excellent explanation (Cheers Hannes!):
Has it been worth all of the time and resources? Absolutely.
Is it ready yet? I don’t think so. But for now it is. Training the model for a month would cost a lot of $$$. If someone from Google (or similar) is reading this, I would be happy to continue training and experiment a few things if someone wants to sponsor this 😎
Links and further reading
There are tons of very technical overviews about how to fine-tune StyleGAN and how things work under the hood. This post aimed to be a relatively non-technical overview, so here I’m hoping to point you to a few resources that were helpful to me.