Original article was published by Aditya Bhandari on Artificial Intelligence on Medium
Text Classification with IMDb Movie Reviews
This blog post classifies movie reviews as positive or negative using the text of the review. This is an example of binary (two-class) classification, an important and widely applicable kind of machine learning problem.
We’ll use the IMDB dataset that contains the text of 50,000 movie reviews from the Internet Movie Database. These are split into 25,000 reviews for training and 25,000 reviews for testing. The training and testing sets are balanced, meaning they contain an equal number of positive and negative reviews.
Download the IMDB dataset
The IMDB dataset is available here.
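One way to fetch the data programmatically is sketched below. It assumes the `tensorflow_datasets` package and its `imdb_reviews` dataset, neither of which is named in this post, and `load_imdb` is a hypothetical helper name.

```python
DATASET_NAME = "imdb_reviews"  # assumed TensorFlow Datasets name for the IMDB reviews


def load_imdb():
    """Return ((train_texts, train_labels), (test_texts, test_labels)) as NumPy arrays."""
    # Imported inside the function so that defining the helper does not
    # require tensorflow_datasets to be installed.
    import tensorflow_datasets as tfds

    # batch_size=-1 loads each split as a single batch;
    # as_supervised=True yields (text, label) pairs instead of dictionaries.
    train_data, test_data = tfds.load(
        name=DATASET_NAME,
        split=["train", "test"],
        batch_size=-1,
        as_supervised=True,
    )
    train_examples, train_labels = tfds.as_numpy(train_data)
    test_examples, test_labels = tfds.as_numpy(test_data)
    return (train_examples, train_labels), (test_examples, test_labels)
```

Calling `load_imdb()` downloads the data on first use, so it needs network access and some disk space.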
Explore the data
Let’s take a moment to understand the format of the data. Each example is a sentence representing the movie review and a corresponding label.
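To make the format concrete, here is a tiny sketch with made-up reviews standing in for the real data. The four reviews are hypothetical, and the convention that 0 means negative and 1 means positive is an assumption about how the labels are encoded.

```python
from collections import Counter

# Assumed label convention: 0 = negative review, 1 = positive review.
LABEL_NAMES = {0: "negative", 1: "positive"}

# Hypothetical (text, label) pairs that mimic the shape of the dataset.
sample_reviews = [
    ("A moving story with superb acting.", 1),
    ("Two hours of my life I will never get back.", 0),
    ("Funny, warm, and beautifully shot.", 1),
    ("The plot made no sense and the ending was worse.", 0),
]


def label_counts(examples):
    """Count how many examples carry each label."""
    return Counter(label for _text, label in examples)


for text, label in sample_reviews:
    print(f"{LABEL_NAMES[label]:>8}: {text}")

# A balanced set has the same count for every label.
print(label_counts(sample_reviews))
```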
Build the model
Let’s first create a Keras layer that uses a TensorFlow Hub model to embed the sentences.
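A minimal sketch of such a layer follows. The post does not say which TensorFlow Hub model it uses, so the handle below is an assumed example, and `make_embedding_layer` is a hypothetical helper name.

```python
# Assumed example embedding; swap in whichever TensorFlow Hub text embedding you prefer.
EMBEDDING_HANDLE = "https://tfhub.dev/google/nnlm-en-dim50/2"


def make_embedding_layer(handle=EMBEDDING_HANDLE, trainable=True):
    """Wrap a TensorFlow Hub text embedding as a Keras layer."""
    # Imported inside the function so that defining the helper does not
    # require TensorFlow to be installed.
    import tensorflow as tf
    import tensorflow_hub as hub

    # input_shape=[] and dtype=tf.string: each input example is one raw string.
    return hub.KerasLayer(handle, input_shape=[], dtype=tf.string, trainable=trainable)
```

With the assumed model, calling the layer on a batch of review strings returns one 50-number vector per review, whatever the length of the text.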
Let’s now build the full model:
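Here is a sketch of one possible architecture. The single sigmoid output unit matches the description in this post; the embedding handle and the 16-unit hidden layer are assumptions. It also assumes a TensorFlow 2 install where `hub.KerasLayer` can be placed directly in `tf.keras.Sequential`; newer Keras 3 based installs may need the legacy `tf_keras` package instead.

```python
def build_model(embedding_handle="https://tfhub.dev/google/nnlm-en-dim50/2"):
    """Stack a text embedding, a small hidden layer, and a sigmoid output."""
    # Imported inside the function so that defining the helper does not
    # require TensorFlow to be installed.
    import tensorflow as tf
    import tensorflow_hub as hub

    # Assumed embedding: maps each raw review string to a fixed-length vector.
    hub_layer = hub.KerasLayer(
        embedding_handle, input_shape=[], dtype=tf.string, trainable=True
    )
    return tf.keras.Sequential([
        hub_layer,
        tf.keras.layers.Dense(16, activation="relu"),    # assumed hidden size
        tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of "positive"
    ])
```

Calling `build_model().summary()` prints the layers and their parameter counts.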
Loss function and optimizer
A model needs a loss function and an optimizer for training. Since this is a binary classification problem and the model outputs a probability (a single-unit layer with a sigmoid activation), we’ll use the binary_crossentropy loss function.
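To see what this loss measures, here is a plain-Python sketch of binary cross-entropy, followed by a hypothetical `compile_model` helper. The `adam` optimizer and the `accuracy` metric are assumptions, since the post does not name them.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def compile_model(model):
    """Attach the loss, an assumed optimizer, and an assumed metric to a Keras model."""
    model.compile(
        optimizer="adam",            # assumption: any Keras optimizer would do
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Confident, correct predictions are cheap; confident, wrong ones are expensive.
print(binary_crossentropy([1, 0], [0.9, 0.1]))
print(binary_crossentropy([1, 0], [0.1, 0.9]))
```

A prediction of 0.5 for a positive review costs ln 2, about 0.693, which is the loss of a model that always hedges.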
Create a validation set
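A validation set lets us check accuracy on reviews the model has not trained on, without touching the test set. The sketch below holds out the first `n_val` training examples; `split_validation` is a hypothetical helper, and the size of the held-out set is up to you (10,000 of the 25,000 training reviews would be one plausible choice, though the post does not state a number).

```python
def split_validation(examples, labels, n_val):
    """Hold out the first n_val examples for validation; train on the rest."""
    if len(examples) != len(labels):
        raise ValueError("examples and labels must have the same length")
    if not 0 < n_val < len(examples):
        raise ValueError("n_val must be between 1 and len(examples) - 1")
    x_val, y_val = examples[:n_val], labels[:n_val]
    partial_x_train, partial_y_train = examples[n_val:], labels[n_val:]
    return (partial_x_train, partial_y_train), (x_val, y_val)


# Toy data: ten "reviews" with alternating labels.
texts = [f"review {i}" for i in range(10)]
labels = [i % 2 for i in range(10)]
(train_x, train_y), (val_x, val_y) = split_validation(texts, labels, n_val=4)
print(len(train_x), len(val_x))
```

Slicing works the same way on Python lists and NumPy arrays, so the helper can be used with either.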
Train the model
Train the model for 40 epochs in mini-batches of 512 samples. This is 40 iterations over all samples in the training set.
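The sketch below shows what those numbers mean in practice. `train` is a hypothetical wrapper around the Keras `fit` method; it assumes a compiled Keras model and a separate validation set passed in by the caller.

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of mini-batches (weight updates) in one pass over the data."""
    return math.ceil(n_samples / batch_size)


def train(model, x_train, y_train, x_val, y_val, epochs=40, batch_size=512):
    """Fit a compiled Keras model and return its History object."""
    return model.fit(
        x_train,
        y_train,
        epochs=epochs,
        batch_size=batch_size,
        validation_data=(x_val, y_val),  # evaluated at the end of every epoch
        verbose=1,
    )


# With all 25,000 training reviews and batches of 512:
print(steps_per_epoch(25000, 512))       # 49 batches per epoch
print(steps_per_epoch(25000, 512) * 40)  # 1960 weight updates in total
```

If part of the training data is held out for validation, the number of batches per epoch shrinks accordingly.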
Evaluate the model
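Evaluation runs the trained model on the 25,000 test reviews it has never seen. The hypothetical helper below assumes the model was compiled with exactly one metric (accuracy), so that the Keras `evaluate` method returns two numbers.

```python
def evaluate(model, test_examples, test_labels):
    """Return the test loss and accuracy of a compiled Keras model."""
    # Assumes one metric was passed to compile(), so evaluate() returns [loss, accuracy].
    loss, accuracy = model.evaluate(test_examples, test_labels, verbose=0)
    return {"loss": float(loss), "accuracy": float(accuracy)}
```

Printing the returned dictionary gives the two values side by side.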
This fairly naive approach achieves an accuracy of about 85%. With more advanced approaches, the model should get closer to 95%.
Create a graph of accuracy and loss over time
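The Keras `fit` method returns a History object whose `history` attribute is a dictionary of per-epoch values. A sketch of how it could be plotted follows; it assumes `matplotlib` is installed and that the keys are `loss`, `val_loss`, `accuracy`, and `val_accuracy`, which depends on how the model was compiled. `plot_history` and `epoch_numbers` are hypothetical helper names.

```python
def epoch_numbers(history_dict):
    """Return 1-based epoch numbers for the x-axis."""
    return list(range(1, len(history_dict["loss"]) + 1))


def plot_history(history_dict):
    """Plot training (dots) and validation (solid line) loss and accuracy."""
    # Imported inside the function so that defining the helper does not
    # require matplotlib to be installed.
    import matplotlib.pyplot as plt

    epochs = epoch_numbers(history_dict)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

    ax_loss.plot(epochs, history_dict["loss"], "bo", label="Training loss")
    ax_loss.plot(epochs, history_dict["val_loss"], "b", label="Validation loss")
    ax_loss.set_xlabel("Epochs")
    ax_loss.set_ylabel("Loss")
    ax_loss.legend()

    ax_acc.plot(epochs, history_dict["accuracy"], "bo", label="Training accuracy")
    ax_acc.plot(epochs, history_dict["val_accuracy"], "b", label="Validation accuracy")
    ax_acc.set_xlabel("Epochs")
    ax_acc.set_ylabel("Accuracy")
    ax_acc.legend()

    return fig
```

After training, `plot_history(history.history)` followed by `plt.show()` would display both panels.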
In this plot, the dots represent the training loss and accuracy, and the solid lines are the validation loss and accuracy.
Notice that the training loss decreases with each epoch and the training accuracy increases with each epoch. This is expected when using gradient descent optimization: it should minimize the desired quantity on every iteration.