Visualizing Inequality with Deep Learning
We used deep neural networks to analyze 100,000 images of Santiago de Chile and built a visualization with the results.
Co-written with Tomás Ramirez
The original version of this post (in Spanish) is available here.
Since the week of October 19th, Chile has found itself immersed in a so-called “social outburst”: millions of people throughout the country have taken to the streets to demand structural reforms to the way Chilean society works.
The objective of this post isn’t to delve into the discussion of what is happening (there are many other sources for that), but rather to offer a small contribution to the understanding of the causes behind this conflict, using state-of-the-art technology.
In the research we are carrying out at PUC’s Artificial Intelligence Lab (IALAB PUC) for the Millennium Institute Foundational Research on Data’s Explainable Artificial Intelligence project, together with UC Engineering’s Transportation and Logistics Department, we have been studying the modelling of urban perception (we will explain what this is a little further below) with deep learning.
This lets us analyze information from thousands of images far more efficiently than a human could, yielding insights that would be impossible to obtain manually. We made use of this tool to analyze the perception of ~120,000 images of Santiago.
The urban perception problem consists of automatically rating the sensation a landscape conveys visually with respect to a certain attribute (e.g. safety). Place Pulse (Dubey et al., 2016) is a crowdsourced dataset with approximately 1.2 million user responses, each consisting of a pair of Google Street View images, an attribute, and a user-generated label indicating in which of the two pictures the user perceives the attribute more strongly.
Thanks to Place Pulse, the perception problem can be treated as a pair-wise ranking problem, and we can use techniques from that domain to train a deep network that learns to rank images according to attribute perception.
Architecture & training.
Since this is the first stage of our research, we used an architecture similar to the one presented by Dubey et al., but we ditched the classifier section of the network: we are only interested in ranking, and according to our experiments the training was faster and more stable that way.
The resulting architecture is very simple and consists of an ImageNet-pretrained conv net (we experimented with AlexNet, VGG, DenseNet and ResNet) fed into 2 fully connected layers with a final scalar output. For training we feed the two images through the network and use the two scalar outputs to compute the ranking loss.
The key part of the training is the loss function, which is taken from the pair-wise ranking problem approach. We start with a traditional margin ranking loss given by:

L_rank(x₁, x₂, y) = max(0, −y·(f(x₁) − f(x₂)) + m)
Where x₁ and x₂ are the image inputs, y ∈ {−1, 1} is the label, f represents the model output and m is the margin constant. It is important to note that this function does not provide the intended result when the label represents a tie (y = 0), and this happens fairly often in the dataset, so to be able to take advantage of this extra data we add a second loss term that forces tied images to be ranked similarly:

L_tie(x₁, x₂) = max(0, |f(x₁) − f(x₂)| − m)
We add these two components to obtain the final loss.
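A minimal sketch of this combined loss, assuming the tie term keeps tied pairs within the same margin m (the exact form of the tie term and the margin value are assumptions here):

```python
import torch
import torch.nn.functional as F

def ranking_loss(s1, s2, y, margin=1.0):
    # Margin ranking term for decisive labels (y = +1 or -1):
    # max(0, -y * (f(x1) - f(x2)) + m)
    return F.relu(-y * (s1 - s2) + margin)

def tie_loss(s1, s2, margin=1.0):
    # Pushes the scores of a tied pair (y = 0) to within the margin.
    return F.relu((s1 - s2).abs() - margin)

def total_loss(s1, s2, y, margin=1.0):
    # Route each pair to the appropriate term, then average over the batch.
    # s1, s2: model scores for each image of the pair; y: labels in {-1, 0, 1}.
    decisive = (y != 0).float()
    per_pair = (decisive * ranking_loss(s1, s2, y, margin)
                + (1.0 - decisive) * tie_loss(s1, s2, margin))
    return per_pair.mean()
```

A correctly ordered pair separated by more than the margin contributes zero loss, as does a tied pair whose scores already lie within the margin.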
We train one model for each of the six attributes in Place Pulse, using SGD and augmenting the data with random flips and crops. After training we use the models to analyze the images of Santiago; it is important to note that these images were not used for training.
Visualizing and analyzing the results.
In order to evaluate the results, we built a visualization of them on a map of Santiago (check it out here!). If you know the city, looking at the map for just a little while with any of the 6 attributes is enough to register qualitatively that the results make a lot of sense.
If you can’t see the results on the interactive map, you can get a general overview from the image below.