There has been a great deal of talk in the tech world about AI, deep learning, image recognition, VR, AR, machine learning and so on. One only needs to look at Google Trends (see chart below) to get a sense of how much this space has grown. It has now well and truly filtered through into the cultural sector: recent examples include Tate’s Modigliani VR experience, Smartify’s image-recognition app, Tate’s Recognition project, the Google Arts & Culture selfie craze and this week’s acquisition of Thread Genius by Sotheby’s.
With the exception of Tate’s VR experience, all of the above rely on a technology known as Deep Learning, a subset of the field of computer science called Machine Learning. The idea is that we give a computer program many examples of the kind of data we want to know something about, and the program learns to make inferences about new data of the same kind. Deep Learning in particular has proven extremely good at handling unstructured data such as text, pixel data from images and audio.
The technology now allows the creators of models, platforms and apps to ask “What are the input-output pairs that I care about?” and to find correlations between them that would previously have been much harder to uncover. Consider the recent craze over Google’s Arts & Culture selfie feature. Google started with input-output pairs of paintings (inputs) and details of each painting (outputs), such as its name, artist and date. They trained a model on many thousands of images of paintings to find correlations between the image data (colours, shadows, painting style and so on) and the details of each particular painting. Google then asked people to supply new inputs to the same model by sending in selfies. The model would “predict” the details of each selfie and display the painting that the selfie most resembled. Interesting…
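The final matching step can be sketched in miniature. In the toy example below, each painting is reduced to a short feature vector and a new image is matched to the painting whose vector is nearest. The paintings, numbers and Euclidean distance here are invented purely for illustration; a real system would compare rich features learned by a deep network.

```python
import math

# Toy "gallery": each painting reduced to an invented 3-number feature vector.
# In a real system these would be high-dimensional features from a deep model.
paintings = {
    "Girl with a Pearl Earring": [0.9, 0.1, 0.3],
    "The Scream":                [0.2, 0.8, 0.5],
    "American Gothic":           [0.6, 0.4, 0.9],
}

def closest_painting(selfie_features):
    """Return the painting whose feature vector is nearest (Euclidean distance)."""
    def dist(vec):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(selfie_features, vec)))
    return min(paintings, key=lambda name: dist(paintings[name]))

print(closest_painting([0.25, 0.75, 0.5]))  # → The Scream
```

The clever part in the real feature is not this lookup but the learned features that make "similar-looking" images end up with nearby vectors.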
As there are some great deep learning frameworks such as TensorFlow and Keras that let you get something up and running fairly quickly, we thought we would give it a go! We started a project to detect whether an input image is a bird or a plane, inspired by David Newman and Robert Benton’s book for the musical based on the Superman comic. The technical details are outside the scope of this post, but for those interested: we used Keras (running on TensorFlow) to train a fully connected model on top of a VGG16 model pre-trained on the ImageNet database. The model was trained on 2,000 images of each class (birds and planes) collected via the Flickr API. We then put together a very (very!) rough Flask app on Heroku so we could share it! The code can be found on GitHub here.
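For those curious what such a setup looks like in code, here is a minimal Keras sketch of a small classifier head on top of a frozen VGG16 base. The layer sizes, image size and optimizer are illustrative assumptions rather than our exact configuration, and `weights=None` stands in for the pre-trained ImageNet weights (`weights="imagenet"`) purely to keep the sketch lightweight:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

# Convolutional base. In practice this is loaded with weights="imagenet";
# weights=None is used here only to avoid the large download.
base = VGG16(weights=None, include_top=False, input_shape=(150, 150, 3))
base.trainable = False  # freeze the pre-trained features

# Small fully connected "head" trained on top of the frozen base.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # bird vs. plane
])

model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```

Freezing the base and training only the small head is what makes a dataset of a few thousand images workable: the hard work of learning general visual features has already been done on ImageNet.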
The results are pretty remarkable: the model achieved 95% accuracy on the validation dataset. Given the small number of training images, we were impressed by how quickly and easily a simple image classifier can be put together. Obviously this particular classifier is not very useful, but it demonstrates the power of the technology.
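For clarity, that accuracy figure is simply the fraction of held-out validation images the model labels correctly. The five labels below are invented purely to show the calculation:

```python
# Validation accuracy: the share of held-out images labelled correctly.
# These labels are made up to illustrate the arithmetic, not real results.
true_labels      = ["bird", "plane", "bird", "bird", "plane"]
predicted_labels = ["bird", "plane", "bird", "plane", "plane"]

correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
accuracy = correct / len(true_labels)
print(f"{accuracy:.0%}")  # → 80%
```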
Returning to the cultural sector: as we have seen, there have already been some great applications of Deep Learning. Visitor experiences have been improved and art is more accessible than ever, but this is only the beginning. Image and text data are not in short supply in the sector, making it a prime target for innovation. What tools can we build to help cultural institutions give their visitors the best possible experiences? These models let us draw explicit connections between things that would previously have been impossible, or would have required years of practical experience: a selection of artworks and personal taste, a painting and the name of its artist, even selfies and masterpieces!
The emphasis has shifted from finding the correct answer to a specific question, e.g. “What is the name of the artist of this painting?”, to asking the right question in the first place. We are now limited only by our ability to find enough real-world examples (e.g. painting and artist pairs).
We would love to hear your thoughts — please get in touch!
Source: Deep Learning on Medium