Spam Detection — NLP Text Classification

Original article was published on Artificial Intelligence on Medium

We write a huge amount of text every day, and that creates a problem: one person may produce hundreds or thousands of words in a single declaration, and each sentence carries its own complexity. If you want to scale up and analyze hundreds, thousands or millions of people or declarations in a given geography, doing it by hand becomes unmanageable.

Data generated from conversations, declarations or even tweets are examples of unstructured data. Unstructured data doesn’t fit neatly into the traditional row-and-column structure of relational databases, yet it represents the vast majority of data available in the real world. It is messy and hard to manipulate.

One of the main use cases in NLP is text classification: the process of assigning tags or categories to text according to its content. It’s one of the fundamental tasks in Natural Language Processing (NLP), with broad applications such as sentiment analysis, topic labeling, spam detection, and intent detection.

So, let's try a spam detection project using Python.

Importing the libraries and reading the dataset.
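The original code for this step is not reproduced here, so the following is only a minimal sketch of what loading the data might look like. It assumes a headerless, tab-separated file with a label column ("ham" or "spam") and a message column; the article does not name the actual dataset file, so the inline sample and the load_messages helper are hypothetical stand-ins.

```python
import io

import pandas as pd

# Hypothetical stand-in for the dataset file: two tab-separated columns,
# the label ("ham"/"spam") and the message text.
SAMPLE_TSV = (
    "ham\tAre we still meeting for lunch today?\n"
    "spam\tWINNER! Claim your free prize now, call 0800 000 000\n"
    "ham\tI will call you back in five minutes\n"
    "spam\tUrgent: your account has won a cash reward, reply YES\n"
)


def load_messages(source):
    """Read a headerless TSV of (label, message) rows into a DataFrame."""
    return pd.read_csv(source, sep="\t", header=None, names=["label", "message"])


df = load_messages(io.StringIO(SAMPLE_TSV))
print(df.head())
print(df["label"].value_counts())
```

With a real file, the same helper could be called with a path instead of the in-memory sample.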

Next, we use the dataset to plot a graph with the matplotlib library.

Bar plot comparing the number of ham and spam messages.
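A class-count bar plot of this kind could be sketched as follows. The label list is a made-up sample, not the article's data, and the colors, axis titles and output file name are illustrative choices.

```python
from collections import Counter

import matplotlib

matplotlib.use("Agg")  # headless backend so the script also runs without a display
import matplotlib.pyplot as plt

# Hypothetical labels; in practice this would be the dataset's label column.
labels = ["ham", "ham", "spam", "ham", "spam", "ham", "ham"]

counts = Counter(labels)
classes = sorted(counts)                  # alphabetical order of class names
values = [counts[c] for c in classes]     # number of messages per class

fig, ax = plt.subplots()
ax.bar(classes, values, color=["tab:blue", "tab:red"])
ax.set_xlabel("Class")
ax.set_ylabel("Number of messages")
ax.set_title("Ham vs. spam message counts")
fig.savefig("ham_vs_spam.png")
```

A plot like this makes class imbalance visible early, which matters when reading accuracy figures later on.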

Loading & Pre-processing Data

Next we split the data into training and test sets: the features and the labels are divided into f_train, f_test and l_train, l_test. After the split, the test features are first converted to array form.
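The split described above might look like the sketch below, using scikit-learn's train_test_split. The variable names f_train, f_test, l_train and l_test come from the article; the sample messages, the 25% test size, the random seed and the use of stratification are assumptions made for illustration.

```python
from sklearn.model_selection import train_test_split

# Hypothetical sample data: features are message texts, labels are classes.
messages = [
    "win a free prize now",
    "free cash claim your prize",
    "urgent call now to claim reward",
    "you have won a free holiday",
    "are we meeting for lunch",
    "see you at home tonight",
    "call me when you are free",
    "thanks for the help yesterday",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# stratify=labels keeps the ham/spam ratio the same in both subsets.
f_train, f_test, l_train, l_test = train_test_split(
    messages, labels, test_size=0.25, random_state=42, stratify=labels
)

print(len(f_train), "training messages,", len(f_test), "test messages")
```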

In order to train a model, we need to convert the text into numerical values using vectorization. The vectorizer is fitted on the training features and transforms them in the same step (fit_transform).
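As a rough illustration of this step, the sketch below uses scikit-learn's CountVectorizer, which turns each message into a vector of word counts. The article does not say which vectorizer was used, so CountVectorizer and the sample sentences are assumptions. The vocabulary is learned from the training features only, and the test features are transformed with that same vocabulary.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical training and test features.
f_train = ["free prize call now", "see you at lunch", "call me later"]
f_test = ["free lunch prize"]

vectorizer = CountVectorizer()

# fit_transform learns the vocabulary from the training text and encodes it.
X_train = vectorizer.fit_transform(f_train)

# transform reuses the learned vocabulary; words unseen in training are ignored.
X_test = vectorizer.transform(f_test)

print(sorted(vectorizer.vocabulary_))  # the learned vocabulary
print(X_train.toarray())               # one row per message, one column per word
```

Fitting on the training set alone avoids leaking information from the test messages into the model.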

Next we add the algorithms: each one is trained on the training features and labels, and its accuracy is measured by comparing its predictions on the test features with the test labels.
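The train-and-score loop could be sketched like this for the four algorithms covered in this article. The tiny dataset, the hyperparameters and the choice of MultinomialNB as the Naive Bayes variant are assumptions; the accuracies it prints come from toy data and are not comparable to the scores reported here.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical, already-split sample data.
f_train = [
    "win a free prize now", "free cash claim your prize",
    "urgent call now to claim reward", "you have won a free holiday",
    "are we meeting for lunch", "see you at home tonight",
    "call me when you are free", "thanks for the help yesterday",
]
l_train = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]
f_test = ["claim your free prize", "see you at lunch",
          "urgent free cash reward", "call me tonight"]
l_test = ["spam", "ham", "spam", "ham"]

# Convert text to word-count vectors, fitting on the training set only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(f_train)
X_test = vectorizer.transform(f_test)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K Neighbors": KNeighborsClassifier(n_neighbors=3),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Naive Bayes": MultinomialNB(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, l_train)                 # train on the training split
    scores[name] = model.score(X_test, l_test)  # accuracy on the test split
    print(f"{name}: {scores[name]:.1%}")
```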

Decision Tree Algorithm

The score for Decision Tree Classifier is 95.8%

KNeighbors Algorithm

The score for K Neighbors Classifier is 92.4%

Random Forest Algorithm

The score for Random Forest Classifier is 96.2%

Naive Bayes Algorithm

The score for Naive Bayes Classifier is 98.9%

Accuracies plotted to compare all algorithms.

The graph plots the accuracy of each algorithm side by side for comparison.
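A comparison chart of this kind could be reproduced from the scores reported in this article, as in the sketch below. The 98.9% figure is assumed to belong to Naive Bayes, since it appears under that heading; the y-axis range and the output file name are illustrative choices.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the script also runs without a display
import matplotlib.pyplot as plt

# Accuracy scores (in percent) as reported in the article.
accuracies = {
    "Decision Tree": 95.8,
    "K Neighbors": 92.4,
    "Random Forest": 96.2,
    "Naive Bayes": 98.9,  # assumed to be the Naive Bayes score
}

fig, ax = plt.subplots()
ax.bar(list(accuracies), list(accuracies.values()))
ax.set_ylim(90, 100)  # zoom in so the differences between bars are visible
ax.set_ylabel("Accuracy (%)")
ax.set_title("Accuracy by algorithm")
fig.savefig("accuracy_comparison.png")

best = max(accuracies, key=accuracies.get)
print("Best accuracy:", best)
```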

Predicting SMS: predicting ham or spam with the trained model.

Based on the SMS text typed in as user input, the model makes a prediction and outputs whether the message is HAM or SPAM.
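The prediction step might be wrapped up as in the sketch below. It chains a CountVectorizer and a MultinomialNB classifier in a scikit-learn pipeline so that raw text can be passed straight to predict. The training sentences and the predict_sms helper are hypothetical, and the choice of Naive Bayes simply follows the highest score reported in this article.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data; a real run would use the full dataset.
train_messages = [
    "win a free prize now",
    "free cash claim your prize",
    "urgent call now to claim reward",
    "are we meeting for lunch",
    "see you at home tonight",
    "call me when you are free",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# The pipeline vectorizes the text and then classifies it.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_messages, train_labels)


def predict_sms(text):
    """Return "HAM" or "SPAM" for a single message."""
    return model.predict([text])[0].upper()


print(predict_sms("claim your free prize"))  # SPAM
print(predict_sms("see you at lunch"))       # HAM
```

To take the message from the user, as described above, the hard-coded strings could be replaced with a call to input().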

Here is the GitHub repo link for this article. Also check out my other repo, NLP-concepts, on GitHub.

Here is my Kaggle profile, which might help you with some extra information.

Please do connect with me on LinkedIn, Facebook, Instagram and Tumblr.

I hope you enjoyed this article. I welcome your comments and feedback. If you liked this article, please give it a clap. Thank you.

Happy Learning 🙂