Sentence Prediction Using a Word-Level LSTM Text Generator: Language Modeling Using RNNs

This project was originally built for one of my clients on Upwork. You can find the code on my GitHub repo. Unfortunately, for privacy reasons, it does not include the dataset (corpus) I trained the model on, but you can train it on any text corpus you like.


Let's begin with the problem statement. There is a company, let's call it XYZ, that handles all sorts of household repair work: electrical, plumbing, and everything else that comes up in a home. They wanted a smart solution for their complaint section, where customers register their repair complaints. When a user types two or three words, the system should offer several suggestions of whole sentences, not single words. It is similar to how the keyboard on a phone suggests two or three words as we type, except that here, instead of two or three words, we have to generate two or three sentences. So let's get started:


The data I received was very noisy: there were too many repetitive sentences and thousands of typos and misspellings, along with slang and incorrect punctuation. As in any other machine learning project, it was necessary to analyze and clean the data and to perform some preprocessing on it.
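Before cleaning anything, it helps to measure how noisy the corpus actually is. The following is a minimal Python sketch, not the code from the original project, assuming the corpus is already loaded as a list of sentence strings; the function name `profile_corpus` and the statistics it reports are illustrative choices. It counts how many sentences repeat once case and spacing are ignored, and how many words occur only once, which is a rough signal for typos and slang.

```python
from collections import Counter


def profile_corpus(sentences):
    """Return rough noise statistics for a list of raw sentences (hypothetical helper).

    Sentences are compared after lowercasing and collapsing whitespace,
    so "Sink leaking" and "sink  leaking" count as the same sentence.
    """
    normalized = [" ".join(s.lower().split()) for s in sentences]
    sentence_counts = Counter(normalized)
    # Words are split on whitespace only; punctuation stays attached.
    word_counts = Counter(w for s in normalized for w in s.split())
    return {
        "sentences": len(normalized),
        "unique_sentences": len(sentence_counts),
        "most_repeated": sentence_counts.most_common(3),
        "vocabulary": len(word_counts),
        "words_seen_once": sum(1 for c in word_counts.values() if c == 1),
    }
```

Punctuation is deliberately left attached here, so "leaking" and "leaking!" count as different words. That inflates the vocabulary, which is itself a symptom of the punctuation problem described above.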

So preprocessing included everything: removing redundant data, correcting misspellings, removing incorrect punctuation, and removing words that do not appear very often (that is, words that appear fewer times than a minimum threshold we set).
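The steps above can be sketched in Python as follows. This is a hypothetical illustration rather than the pipeline used for the client: the helper names `normalize` and `preprocess`, the parameters `min_count` and `spell_cutoff`, and the choice to keep only letters, digits and apostrophes are all assumptions. For spelling, it swaps in a crude heuristic: a rare word is mapped to the most similar frequent word using `difflib.get_close_matches` from the Python standard library, and dropped if nothing is close enough.

```python
import re
from collections import Counter
from difflib import get_close_matches

# Assumed punctuation policy: keep lowercase letters, digits, apostrophes
# and spaces; every other character becomes a space.
_NON_WORD = re.compile(r"[^a-z0-9' ]+")
_SPACES = re.compile(r"\s+")


def normalize(sentence):
    """Lowercase a sentence, strip punctuation and collapse whitespace."""
    text = _NON_WORD.sub(" ", sentence.lower())
    return _SPACES.sub(" ", text).strip()


def preprocess(sentences, min_count=2, spell_cutoff=0.8):
    """Hypothetical cleaning pipeline for a list of raw sentences."""
    # 1. Normalize and drop repeated sentences, keeping the first occurrence.
    unique = list(dict.fromkeys(s for s in map(normalize, sentences) if s))

    # 2. Count word frequencies over the deduplicated sentences so that
    #    a complaint repeated many times does not inflate its words' counts.
    counts = Counter(w for s in unique for w in s.split())
    vocab = sorted(w for w, c in counts.items() if c >= min_count)

    def fix(word):
        # Frequent words are kept as-is.
        if counts[word] >= min_count:
            return word
        # Rare words: replace with the closest frequent word (crude
        # spelling correction), or return None so the word is dropped.
        match = get_close_matches(word, vocab, n=1, cutoff=spell_cutoff)
        return match[0] if match else None

    # 3. Rebuild each sentence, then deduplicate again because spelling
    #    fixes can turn two different sentences into the same one.
    cleaned = []
    for s in unique:
        kept = " ".join(w for w in map(fix, s.split()) if w)
        if kept:
            cleaned.append(kept)
    return list(dict.fromkeys(cleaned))
```

On a made-up input with `min_count=2`, "The sink is leeking." becomes "the sink is leaking" and is then removed as a duplicate, while a one-off word such as "plz" is dropped. A dedicated spell checker would handle misspellings more reliably; the similarity trick only works when the correct spelling is itself frequent in the corpus.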