Original article was published by Kirsten Menger-Anderson on Artificial Intelligence on Medium
Matching Love Poems to Images from The Metropolitan Museum of Art
Chances are good that when you look at this page, you see Cupid. In fact, the above image is titled Study of Cupid and is from the sketchbook of American artist John Quincy Adams Ward. I downloaded it along with 1699 drawings by American artists from the Metropolitan Museum of Art’s open access collection and then asked my computer to generate a caption for each.
About Study of Cupid, my computer said:
a bear of a male in an orange and snow
I was charmed.
Visitors can browse more than 406,000 images in the Met’s collection and use those that are in the public domain. The collection’s API provides a way to collect either low- or high-resolution versions programmatically. I used the Met’s API to gather the low-resolution images for my project.
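A minimal sketch of that download step, using only Python’s standard library. It assumes the API’s `objects/{objectID}` endpoint and the `isPublicDomain`, `primaryImage`, and `primaryImageSmall` fields as described in the Met’s public API documentation. The helper names are hypothetical, and this is not the code used for the project. Collecting the object IDs in the first place (for example through the API’s search endpoint) is left out.

```python
import json
import urllib.request
from typing import Optional

# Base URL of the Met's public Collection API (assumed current endpoint).
MET_API = "https://collectionapi.metmuseum.org/public/collection/v1"


def fetch_object(object_id: int) -> dict:
    """Download the JSON record for one object."""
    url = f"{MET_API}/objects/{object_id}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def image_url(record: dict, high_res: bool = False) -> Optional[str]:
    """Pick the image URL to download, or None if the object can't be used.

    `primaryImage` is the full-size file and `primaryImageSmall` the
    low-resolution one; objects not in the public domain are skipped.
    """
    if not record.get("isPublicDomain"):
        return None
    key = "primaryImage" if high_res else "primaryImageSmall"
    return record.get(key) or None  # treat an empty string as "no image"


def download_image(object_id: int, path: str, high_res: bool = False) -> bool:
    """Save one object's image to `path`; return False if there is none."""
    url = image_url(fetch_object(object_id), high_res=high_res)
    if url is None:
        return False
    urllib.request.urlretrieve(url, path)
    return True
```

Keeping the URL choice in its own function makes it easy to switch between the low- and high-resolution files without touching the networking code.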
My goal was to match each of the 1,000 love poems in my collection with a piece of art from the Met, using the caption generator to help me. I then used the matched poems and images in the love poem carousel I built, because I wasn’t satisfied with the significant gender imbalance in the carousels Google returned for my poetry search queries.
Here’s What Happened:
After downloading the images from the Met, I built my caption generator using Google’s Image Captioning notebook, which I ran on Google Colab. I tried other implementations before I settled on this one, but ultimately, the Colab notebook was the one that I got running most easily (I used a professional account, and got bumped off from time to time, but that was my biggest trouble).
The initial caption generator, which created the “bear of a male in an orange and snow” caption, was trained on just 30,000 captions. That is a relatively small amount of text, but it let me test the caption generator without spending too much time on it. I tried captioning a couple of my own photographs as a quick way to set my expectations for the drawings.
I wasn’t sure how well the caption generator would do with the drawings, as it had never seen a drawing before, only photographs. Also, to save memory, the generator’s vocabulary was constrained to 5,000 words, about what the average four-year-old knows.
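To make the 5,000-word cap concrete, here is a plain-Python sketch of a capped vocabulary: only the `top_k` most frequent caption words get their own id, and every other word collapses into a single `<unk>` token. It illustrates the idea only. It is not the tokenizer inside Google’s notebook, and the names `build_vocab` and `encode` are hypothetical.

```python
from collections import Counter


def build_vocab(captions: list[str], top_k: int = 5000) -> dict[str, int]:
    """Map the `top_k` most frequent words to integer ids.

    Id 0 is reserved for padding and id 1 for <unk>, which stands in for
    every word that didn't make the cut.
    """
    counts = Counter(word for caption in captions for word in caption.lower().split())
    vocab = {"<pad>": 0, "<unk>": 1}
    for word, _ in counts.most_common(top_k):
        vocab[word] = len(vocab)
    return vocab


def encode(caption: str, vocab: dict[str, int]) -> list[int]:
    """Turn a caption into ids, replacing out-of-vocabulary words with <unk>."""
    return [vocab.get(word, vocab["<unk>"]) for word in caption.lower().split()]
```

Raising `top_k` (the second model described below used 10,000) lets rarer words keep their own ids instead of becoming `<unk>`, at the cost of a larger output layer and more memory.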
Still, I had a substantial number of poems and images and would take even a four-year-old’s help. Originally, I included a few hundred works from the Biodiversity Heritage Library’s open access collection in addition to the work from the Met. I loved these images, but their license required attribution, and I thought the credit lines would add too much text to the carousels. I include a few of the images here, however, and you can see the whole collection on Flickr.
My first caption generator produced these gems:
After my initial experiment, I decided to train the caption generator on a larger number of captions. For my final caption generator, I trained the system on 414,113 captions and increased the vocabulary size to 10,000 because I wanted my system to be more expressive.
While the new caption generator was training — it took all night, and still didn’t go as long as I’d have liked — I looked at the gender data the Metropolitan Museum provides. Although only a subset of the female artists are identified as such in the dataset, I was able to browse the work of Mary Russell Smith, Henrietta Johnston, Marcia Oakes Woodbury, Fidelia Bridges, Ellen Robbins, Jane Anthony Davis, Maria Edgar, Sarah Fairchild, Emily Maria Spaford Scott, and Ruth Whittier Shute this way (as well as four paintings attributed to James Sharples, which note that Ellen Wallace Sharples is ‘possibly’ the artist).
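The gender information lives in each object record. A sketch of the filtering, assuming records already fetched from the Met’s API and its `artistGender` field, which the API documentation describes as designating female artists only (everyone else is left blank). The function name is hypothetical.

```python
from collections import defaultdict


def works_by_female_artists(records: list[dict]) -> dict[str, list[str]]:
    """Group object titles by artist for records the Met marks as by women."""
    by_artist: dict[str, list[str]] = defaultdict(list)
    for record in records:
        gender = record.get("artistGender") or ""
        # Substring test, in case a multi-artist record combines several
        # values (an assumption about the field's format).
        if "Female" in gender:
            name = record.get("artistDisplayName") or "Unknown artist"
            by_artist[name].append(record.get("title") or "Untitled")
    return dict(by_artist)
```

An empty `artistGender` means “not designated”, not “male”, so this finds only the subset of women the dataset labels.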
At last, the caption generator was ready, and I tried it out. Here’s what the new and improved caption generator (i.e., trained on more data and given a larger vocabulary, but not trained for as long as I’d have liked) had to say about Cupid:
a man holding a skateboard in the dark column
Now that I had my state-of-the-art captions, all I had to do was compare them to the poems to determine which image best suited each one.
There are many ways to approach text comparison (this piece on Medium offers a nice overview). For my first attempt, I decided to use BERT and cosine similarity to compare the image captions with summaries of the poems, meant to capture each poem’s ‘heart’. Using computer-generated captions and summaries allowed me to compare two texts of similar length that both reflected my computer’s understanding of the work.
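Cosine similarity scores two vectors by the angle between them: cos(u, v) = u·v / (‖u‖ ‖v‖), so 1.0 means they point the same way and 0.0 means they share nothing. A minimal stdlib sketch of the scoring step, assuming each caption and summary has already been turned into a fixed-length vector by a BERT-based encoder (for example the `encode` method of a `sentence_transformers` model, in the spirit of Sentence-BERT). That choice of encoder is an assumption; the article does not specify it.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """cos(u, v) = u.v / (|u| |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(summary_vecs: list[list[float]],
                      caption_vecs: list[list[float]]) -> list[list[float]]:
    """Rows are poem summaries, columns are image captions."""
    return [[cosine_similarity(s, c) for c in caption_vecs] for s in summary_vecs]
```

Each row of the matrix belongs to one poem, so the column with the largest value in a row is that poem’s best-matching image.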
I decided to use Derek Miller’s summarizer. Miller’s paper (Miller, 2019) explains that the service was designed to help students summarize lecture content and never mentions poetry. However, the summarizer is extractive (it summarizes text by pulling sentences from the original document), and I thought that using the original language would be ideal for my purposes.
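Miller’s summarizer embeds every sentence with BERT, clusters the embeddings with k-means, and keeps the sentences closest to the cluster centroids. For a one-sentence summary there is a single cluster whose centroid is the average of all the sentence vectors, so the idea reduces to “pick the sentence most like the average”. The sketch below shows only that reduction: it swaps BERT for simple word counts and measures closeness with cosine similarity, so it is an illustration with hypothetical names, not the library’s code.

```python
import math
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Stand-in for a BERT sentence embedding: lowercase word counts."""
    return Counter(sentence.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def one_sentence_summary(sentences: list[str]) -> str:
    """Extractive summary with k = 1: the sentence most similar to the centroid."""
    if not sentences:
        return ""
    vectors = [bag_of_words(s) for s in sentences]
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)  # a sum rather than a mean; cosine ignores the 1/n scale
    best = max(range(len(sentences)), key=lambda i: cosine(vectors[i], centroid))
    return sentences[best]
```

Because the output is always one of the input sentences, the summary keeps the poet’s own wording, which is the property the extractive approach was chosen for.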
The basic unit for prose is the sentence, but for poetry, the line. To encourage the summarizer to consider each line as a unit, I pretended that each was a sentence by replacing the line break with a period, and then I instructed the summarizer to make a one-sentence summary for each poem.
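The line-to-sentence trick might look like this in Python. Dropping blank lines and swapping trailing commas, semicolons, or colons for a period are my assumptions about sensible cleanup, not details from the article, and `lines_to_sentences` is a hypothetical name.

```python
def lines_to_sentences(poem: str) -> str:
    """Treat each line of a poem as a sentence by ending it with a period."""
    sentences = []
    for line in poem.splitlines():
        line = line.strip()
        if not line:
            continue  # skip stanza breaks
        if line[-1] not in ".!?":
            # Avoid endings like ",." by trimming soft punctuation first.
            line = line.rstrip(",;:") + "."
        sentences.append(line)
    return " ".join(sentences)
```

Lines that already end in a period, question mark, or exclamation point are left alone, so the poet’s own sentence endings survive.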
Here, for example, is the summary of “The Modern Woman to Her Lover” by Margaret Widdemer:
Hand in locked hand we shall pass along
I managed to create a brief summary for most of the poems using the summarizer (some summaries were longer than others). For the poems that resisted summarization, I simply used the first 20 words.
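The fallback rule fits in a few lines. A sketch with a hypothetical helper name, treating an empty or missing summary as a failure to summarize:

```python
from typing import Optional


def summary_or_opening(summary: Optional[str], poem: str, n_words: int = 20) -> str:
    """Use the summarizer's output if there is any, else the poem's first n words."""
    if summary and summary.strip():
        return summary.strip()
    return " ".join(poem.split()[:n_words])
```

Splitting on whitespace also flattens line breaks, so the fallback text is a single run of words of roughly the same length as a caption.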
Then I looked for the caption and summary pairs that were most similar to one another so that I could match the corresponding image and poem. When I started this project, I hadn’t considered that an image might match more than one love poem, but several images matched multiple poems. At the top of the love match rankings (i.e., the image my computer determined to be the best match for the most love poems) is Two Studies of a Man:
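Finding each poem’s best image, and then counting which images were the best match for the most poems, can be sketched over a poem-by-image similarity matrix (rows are poems, columns are images). This is an illustration with hypothetical names, not the project’s code.

```python
from collections import Counter


def top_matches(scores: list[list[float]]) -> list[int]:
    """For each poem (row), the index of its most similar image (column)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def most_matched_images(scores: list[list[float]], n: int = 2) -> list[tuple[int, int]]:
    """The n images that are the top match for the most poems, with counts."""
    return Counter(top_matches(scores)).most_common(n)
```

Any image with a count above one is contested, which is the situation the next step has to resolve.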
Next came an evocative sketch by Thomas Sully, which my computer selected as the best match for the second-largest number of love poems:
I wanted each poem to have a unique image. When an image matched multiple poems, I assigned it to the poem with the highest similarity score. E. E. Cummings ‘won’ the Two Studies of a Man image this way, while Anne Marie Macari received Sully’s sketch for From the Plane. Overall, a little under half the poems were assigned their top match, and about two-thirds were assigned an image from their top five. And Cupid, that bear of a male, was (alas!) not assigned at all.
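The article doesn’t spell out how the losing poems were reassigned. One way to get the behavior described is a greedy assignment: visit every (poem, image) pair from most to least similar and accept a pair only if neither side is taken yet. A contested image then goes to the highest-scoring poem that doesn’t already have one, and the other poems fall back to their next-best free image. A sketch under that assumption, with a hypothetical function name:

```python
def assign_unique(scores: list[list[float]]) -> dict[int, int]:
    """Map each poem (row) to a distinct image (column), greedily by similarity."""
    pairs = sorted(
        ((score, poem, image)
         for poem, row in enumerate(scores)
         for image, score in enumerate(row)),
        reverse=True,  # most similar pairs first
    )
    assignment: dict[int, int] = {}
    used_images: set[int] = set()
    for _score, poem, image in pairs:
        if poem not in assignment and image not in used_images:
            assignment[poem] = image
            used_images.add(image)
    return assignment
```

Images that no poem claims, like Cupid here, simply drop out. A greedy pass is easy to reason about but does not maximize total similarity across all poems; if that were the goal, `scipy.optimize.linear_sum_assignment(scores, maximize=True)` solves the assignment problem exactly.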
See more poems and images in the Love Carousels
Miller, Derek. “Leveraging BERT for Extractive Text Summarization on Lectures.” arXiv preprint arXiv:1906.04165 (2019).
Reimers, Nils, and Iryna Gurevych. “Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks.” arXiv preprint arXiv:1908.10084 (2019).