Source: Deep Learning on Medium
Can a Robot Make You Laugh? — Teaching an AI to Tell Jokes
Yes, and he’s almost as funny as me!
Tito Joker is a humorous AI that uses state-of-the-art deep learning to tell jokes. His goal is to understand humor well enough to tell jokes that are actually funny.
Why is he named Tito Joker? Because in Filipino, “tito” means “uncle” when translated to English, and in the Philippines, we all have that uncle who says the corniest jokes!
Feel free to interact with Tito Joker on this website.
As shown in the GIF above, he is able to tell jokes that are not just grammatically correct, but also consistent with the intended format of the joke.
For example, he understands that a joke in the format of “Why did the chicken cross the road?” is answered with a place or reason that makes sense in the context of a chicken crossing a road. Of course, he doesn’t get this right every time, but he often does (as shown below).
To make him even funnier, I decided to teach him how to use one of the internet’s most popular modes of communication: GIFs! The idea is that once he tells a joke, he also attempts to show a GIF that is related to the joke.
How is this possible?
In 2019, I have been working a lot on natural language processing (NLP) applications of deep learning, with a particular focus on the Transformer-based pre-trained models published in the past year: BERT, OpenAI GPT-2, and XLNet.
Don’t worry, I won’t discuss the Transformer architecture in detail here, but below is an explanatory diagram from Jay Alammar’s amazing blog post.
As a jokester myself, I am quite passionate about humor, so the idea of modelling its complexity really excites me. Can we use deep learning to build an AI that is actually funny on its own? Perhaps we could even bring it to a point where it can do standup comedy! (see below)
1. Transfer learning with OpenAI GPT-2
Given its top performance on language generation tasks, I decided to use OpenAI GPT-2 (simply called GPT2) as the backbone model for Tito Joker. The objective of GPT2 is simple: predict the next word of a statement, given all previous words (shown below).
Take note that GPT2 was trained to do this using 40GB of text scraped from 8 million web pages. That is a lot of text!
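To make the objective concrete, here is a minimal sketch of next-word prediction, assuming the Hugging Face transformers library and the public “gpt2” checkpoint (my assumptions, not code from this post); greedy_next is just an argmax over the next-token scores.

```python
def greedy_next(logit_row):
    """Return the index of the highest-scoring token (greedy decoding)."""
    return max(range(len(logit_row)), key=lambda i: logit_row[i])

if __name__ == "__main__":
    # Heavy model download lives behind this guard.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    ids = tok.encode("Why did the chicken cross the", return_tensors="pt")
    next_scores = model(ids).logits[0, -1]        # scores for the next token
    print(tok.decode([greedy_next(next_scores.tolist())]))
```

In practice, sampling (rather than pure greedy decoding) gives more varied jokes, but the training objective is exactly this next-token prediction.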
Now, this is all interesting, but how do we use this model to create an AI that tells jokes? Transfer learning: the process of taking a pre-trained model (e.g. GPT2) and “fine-tuning” it on another dataset that contains the information you want the model to learn.
From here, it becomes clear that the approach would be to gather a text dataset that contains humor. By fine-tuning GPT2 on a humorous dataset, we can create a new AI model that understands humor and can therefore tell jokes: Tito Joker.
2. Humorous dataset creation with riddle-type jokes
A jokes dataset from Kaggle was used for fine-tuning. It contains 231,657 jokes with varying formats, including “Yo Mama” and “how many ____ does it take?” kinds of jokes.
Warning: the dataset contains NSFW jokes, so Tito Joker’s humor will also reflect jokes of this nature. I plan to add functionality to filter through these in a future version of Tito Joker.
In order to make it easier for Tito Joker to understand the “concept” of a joke, I decided to limit the scope to riddle-type jokes. In other words, I kept only the jokes that start with “what”, “how”, “when”, or “why”. This brought the number of jokes down to 65,394.
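The riddle filter described above can be sketched as a couple of small functions (a sketch of the idea, not the actual preprocessing script):

```python
# The four question words used to keep only riddle-type jokes.
RIDDLE_STARTERS = ("what", "how", "when", "why")

def is_riddle(joke: str) -> bool:
    """True if the joke starts with one of the riddle question words."""
    return joke.strip().lower().startswith(RIDDLE_STARTERS)

def filter_riddles(jokes):
    """Keep only riddle-type jokes from a list of raw joke strings."""
    return [joke for joke in jokes if is_riddle(joke)]
```

Applied to the full Kaggle dataset of 231,657 jokes, a filter like this is what brings the count down to 65,394.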
Aside from the above, I also added special tokens to allow the model to distinguish between the “question” and “answer” components of a riddle-type joke. These are summarized in the table below:
Example of a joke with special tokens:
<soq> Why did the chicken cross the road? <eoq> To go to the other side. <eoa> <eoj>
For more details, please refer to Tito Joker’s preprocessing script.
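As a rough illustration of that preprocessing step (a sketch, not the actual script), a helper like the following would wrap each question/answer pair in the special tokens shown above:

```python
def add_special_tokens(question: str, answer: str) -> str:
    """Wrap a riddle's question and answer in Tito Joker's special tokens,
    matching the format of the chicken-crossing-the-road example above."""
    return f"<soq> {question} <eoq> {answer} <eoa> <eoj>"
```

At generation time, the same tokens tell the model where the question ends and the answer begins, and when the joke is finished.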
3. Tito Joker training with GPT2 + humorous dataset
Now that we have both the pre-trained model and the humorous dataset, we can train Tito Joker! Tito Joker was created by fine-tuning GPT2 on the humorous dataset from the previous section. Through this process, Tito Joker effectively “learned” the concept of humor from the jokes.
The end-to-end training workflow of Tito Joker is summarized by the flowchart below:
Fine-tuning took roughly 30 minutes on a Google Colab notebook with one T4 GPU and a batch size of 2. Also, note that the training process was executed using the same language modelling objective and hyperparameters as the original GPT2 paper.
For more information please refer to the Tito Joker training script.
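For a sense of what such a fine-tuning run could look like, here is a sketch using the Hugging Face Trainer API; the file name jokes_with_tokens.txt is hypothetical, and the actual training script may be organized differently. steps_per_epoch just shows how many optimizer steps the 65,394 jokes imply at batch size 2.

```python
import math

def steps_per_epoch(n_examples: int, batch_size: int) -> int:
    """Optimizer steps per epoch at a given batch size."""
    return math.ceil(n_examples / batch_size)

if __name__ == "__main__":
    # Model download and training live behind this guard.
    from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                              GPT2Tokenizer, TextDataset, Trainer,
                              TrainingArguments)

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # One joke per line, already wrapped in the special tokens.
    train_dataset = TextDataset(tokenizer=tokenizer,
                                file_path="jokes_with_tokens.txt",
                                block_size=128)
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    args = TrainingArguments(output_dir="tito-joker",
                             per_device_train_batch_size=2,  # as in the post
                             num_train_epochs=1)
    Trainer(model=model, args=args, train_dataset=train_dataset,
            data_collator=collator).train()
```

At batch size 2, one pass over the 65,394 riddles works out to 32,697 optimizer steps, which is consistent with a roughly 30-minute run on a single T4.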
4. GIF generation with POS tagging and GIPHY
Part-of-speech (POS) tagging was used to detect nouns in the jokes told by Tito Joker. Once common nouns are identified, these are used to search through the GIPHY API, which then returns a relevant GIF.
For example, if the joke entered is “why did the chicken cross the road?”, the common noun, chicken, will be detected and used to return a GIF from GIPHY.
Do note that I was originally planning to use named entity recognition (NER), but decided to start with nouns since “names” are harder to match on GIPHY. For example, “Lorenzo” (referring to myself) is not likely to return a relevant GIF of myself, compared to the common noun, “boy”, which will easily have a match on the GIPHY API.
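A minimal sketch of the noun-to-GIF step, assuming NLTK for POS tagging (NN/NNS are the Penn Treebank tags for common nouns) and the public GIPHY search endpoint; YOUR_GIPHY_KEY is a placeholder, and the actual implementation may differ.

```python
def first_common_noun(tagged_tokens):
    """Pick the first common noun (NN/NNS tag) from POS-tagged tokens."""
    for word, tag in tagged_tokens:
        if tag in ("NN", "NNS"):
            return word
    return None

def giphy_search_url(query: str, api_key: str) -> str:
    """Build a GIPHY search request URL (a real call needs a valid key)."""
    return ("https://api.giphy.com/v1/gifs/search"
            f"?api_key={api_key}&q={query}&limit=1")

if __name__ == "__main__":
    # Network and NLTK-resource downloads live behind this guard.
    import nltk
    import requests

    nltk.download("punkt")
    nltk.download("averaged_perceptron_tagger")

    joke = "Why did the chicken cross the road?"
    noun = first_common_noun(nltk.pos_tag(nltk.word_tokenize(joke)))
    gifs = requests.get(giphy_search_url(noun, "YOUR_GIPHY_KEY")).json()
```

For the chicken joke, the tagger marks “chicken” and “road” as common nouns, and the first one becomes the GIPHY search query.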
How can Tito Joker learn to be even funnier?
1. “Rate” the funniness of jokes
A feedback system will allow users to “rate” the jokes being told by Tito Joker. This feedback will be stored and can be used to continuously improve Tito Joker’s “funniness” over time.
From a modelling standpoint, this may mean training a separate “funniness” model that is used to filter the jokes that are generated. An example brute-force approach would be to generate, say, 100 jokes and only return the funniest one of the 100.
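That brute-force reranking fits in a few lines; score_fn stands in for the hypothetical funniness model trained on user ratings, which does not exist yet.

```python
def funniest(candidate_jokes, score_fn):
    """Brute-force reranking: score every generated candidate joke and
    return only the top-scoring one."""
    return max(candidate_jokes, key=score_fn)
```

The same pattern works for any number of candidates: generate 100 jokes from Tito Joker, score each, and show the user only the winner.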
2. Control the type of joke to tell
Semantic controls will allow users to configure the kind of humor that they want to experience from Tito Joker. For example, we might want to explicitly tell Tito Joker to generate Yo Mama-type jokes, or we may even want to explicitly set the sentiment and minimize the toxicity of the jokes being told.
This will require us to account for these joke dimensions (e.g. joke type, sentiment, and toxicity) at both training time and deployment. The dream would be to have something similar to Shaobo Guan’s TL-GAN model, which allows for easy configuration of the faces being generated by the model based on age, hairline, and gender (among other factors).
3. Give Tito Joker context about a topic
Contextual input will allow users to feed Tito Joker contextual information that they want him to consider when telling a joke. This is based on the idea that context is where the intelligence of a joke comes from. For example, standup comedians are considered funny because they are able to embed relatable concepts into the jokes that they tell: politics, culture, current events.
One way to implement this is with a specification similar to BERT for Q&A, where both a question and a context paragraph are used as inputs to the model. Imagine being able to input an article about Donald Trump to give Tito Joker context on what kinds of information to consider in telling his joke (e.g., the wall on the Mexican border, immigration, the trade war with China, etc.).
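One hypothetical way to encode that input format, using made-up <soc>/<eoc> context tokens alongside the existing question tokens (this is speculation about a future version, not anything implemented today):

```python
def build_contextual_prompt(context: str, question: str) -> str:
    """Hypothetical prompt format: prepend a context paragraph, wrapped in
    invented <soc>/<eoc> tokens, to the riddle question so that fine-tuning
    could condition jokes on the supplied topic."""
    return f"<soc> {context} <eoc> <soq> {question} <eoq>"
```

Training on pairs formatted this way would let the model learn to draw its punchlines from the supplied paragraph, much like BERT for Q&A draws its answers from the context passage.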
Modelling humor with AI is still a work in progress, and there are many experiments yet to be conducted. If we continue to work towards this goal, I am confident that very soon we will be able to train Tito Joker to be funny on his own.