Getting started in Natural Language Processing

Original article was published on Artificial Intelligence on Medium

So you’re ready to dive into the world of Artificial Intelligence, a world full of robots and the things they can do. But you’re interested in a specific species of robot: the kind that can talk and understand language the way humans do. So your research begins. There are hundreds of websites offering thousands of courses on AI, and you’re baffled, thinking, “How and where do I start?” I was in the same spot, and this is how I started (leaving out my failures to keep this post short).

The best way to start your NLP journey is with some Python basics and a project. “Wait, are you serious? We start with a project without any background knowledge?” Yes, precisely. There is a big difference between knowing terms like “Stemming” and “Tokenization” and actually using them in a project. Start with a basic project that uses readily available data. Kaggle has plenty of datasets to get you started. I started with Sentiment Analysis on IMDb movie reviews. Following is my Jupyter Notebook uploaded on GitHub:
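To make the two terms mentioned above concrete, here is a minimal sketch of tokenization and stemming in plain Python. It is not taken from the notebook. The sample review, the suffix list, and the function names `tokenize` and `naive_stem` are all made up for illustration, and the stemmer is a deliberately crude suffix stripper rather than a real algorithm such as Porter's.

```python
import re

# Suffixes stripped by this toy stemmer, checked in this order.
# This is NOT the Porter algorithm, only an illustration of the idea.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


# Hypothetical review text, used only to show the two steps.
review = "The actors played their parts amazingly"
tokens = tokenize(review)
stems = [naive_stem(t) for t in tokens]

print(tokens)
print(stems)
```

Writing even a toy version like this shows quickly where simple rules break down, which is usually the point at which a library stemmer starts to make sense.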

These are the main components of most NLP projects:

  1. Load Dataset
  2. Pre-Processing (Cleaning data)
  3. Vectorize Input
  4. Train and compare with different models
  5. Analyze
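The five steps above can be sketched end to end in plain Python. This is only a rough illustration under several assumptions: the handful of reviews is hand-written rather than real IMDb data, a single self-written multinomial Naive Bayes classifier stands in for "different models", and every function name here is hypothetical.

```python
import math
import re
from collections import Counter

# Step 1: load the dataset. A real project would read the IMDb reviews from
# disk; these hand-written reviews are hypothetical stand-ins.
TRAIN = [
    ("A wonderful film with a great cast", "pos"),
    ("I loved the story and the acting", "pos"),
    ("Great fun, wonderful music", "pos"),
    ("A boring film with terrible acting", "neg"),
    ("I hated the dull story", "neg"),
    ("Terrible pacing and a boring plot", "neg"),
]
TEST = [
    ("A great story with wonderful acting", "pos"),
    ("Dull, boring and terrible", "neg"),
]


def preprocess(text):
    """Step 2: lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(text):
    """Step 3: turn a review into a bag-of-words count vector."""
    return Counter(preprocess(text))


def train_naive_bayes(examples):
    """Step 4: fit a multinomial Naive Bayes model with add-one smoothing."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training reviews
    vocab = set()
    for text, label in examples:
        bag = vectorize(text)
        word_counts.setdefault(label, Counter()).update(bag)
        doc_counts[label] += 1
        vocab.update(bag)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the review."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word, count in vectorize(text).items():
            if word not in vocab:
                continue  # ignore words never seen during training
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += count * math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(TRAIN)
predictions = [predict(model, text) for text, _ in TEST]

# Step 5: analyze. Here that is just accuracy on the held-out reviews.
correct = sum(p == label for p, (_, label) in zip(predictions, TEST))
accuracy = correct / len(TEST)
print(predictions, accuracy)
```

With a dataset this small the accuracy figure means very little; the sketch is only meant to show how the five steps connect.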

The final step, Analysis, consists of looking at which reviews were classified incorrectly. Try more pre-processing, different models, and hyper-parameter tuning to increase the accuracy. Once accuracy is above 95%, move on to the next project and try to solve it using fewer libraries and more self-written functions. The videos by Dan Jurafsky were very helpful for me, and they contain some interesting projects that you could undertake. Here’s the link to his YouTube channel:
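Coming back to the analysis step described above, one way to look at the incorrectly classified reviews is a small helper like the following. It is a sketch only: the reviews, labels, and predictions are invented, and the names `accuracy` and `misclassified` are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of reviews whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def misclassified(texts, y_true, y_pred):
    """Collect (text, true_label, predicted_label) for every wrong prediction."""
    return [(x, t, p) for x, t, p in zip(texts, y_true, y_pred) if t != p]


# Hypothetical reviews, labels, and model output, for illustration only.
texts = [
    "Not bad at all",
    "A wonderful film",
    "I expected to love it, but did not",
    "Terrible pacing",
]
y_true = ["pos", "pos", "neg", "neg"]
y_pred = ["neg", "pos", "pos", "neg"]

errors = misclassified(texts, y_true, y_pred)
print(f"accuracy: {accuracy(y_true, y_pred):.2f}")
for text, true_label, predicted in errors:
    print(f"true={true_label} predicted={predicted}: {text}")
```

Reading the errors one by one often suggests the next pre-processing change. In this made-up example both mistakes involve negation, which a plain bag-of-words representation cannot capture.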

Hope this article helps. Happy Learning!