Original article was published on Artificial Intelligence on Medium
Spam Detection — NLP Text Classification
We write huge amounts of information every day, but there is a problem: one person may generate hundreds or thousands of words in a declaration, each sentence with its own complexity. If you want to scale up and analyze hundreds, thousands, or millions of people or declarations in a given geography, the situation becomes unmanageable.
Data generated from conversations, declarations, or even tweets are examples of unstructured data. Unstructured data doesn't fit neatly into the traditional row and column structure of relational databases, and it represents the vast majority of data available in the real world. It is messy and hard to manipulate.
One of the use cases in NLP is text classification: the process of assigning tags or categories to text according to its content. It's one of the fundamental tasks in Natural Language Processing (NLP), with broad applications such as sentiment analysis, topic labeling, spam detection, and intent detection.
So, let's try a spam detection project using Python.
Importing Libraries and reading dataset.
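As a rough sketch of this step, the snippet below loads labeled messages into a pandas DataFrame. The column names `label` and `message` and the tab-separated layout are assumptions for illustration, and the in-memory rows are hypothetical stand-ins for the real dataset file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the dataset file: a few tab-separated rows in a
# "label<TAB>message" layout. With a real file, pass its path to pd.read_csv
# instead of this in-memory buffer.
raw = io.StringIO(
    "ham\tAre we still meeting for lunch today?\n"
    "spam\tWINNER! Claim your free prize now, reply YES\n"
    "ham\tI'll call you when I get home\n"
    "spam\tUrgent: you have won a cash reward, claim it today\n"
)

# Column names are assumptions; adjust them to match the actual file.
data = pd.read_csv(raw, sep="\t", header=None, names=["label", "message"])

print(data.shape)
print(data.head())
```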
Next, we use the dataset to plot a graph with the matplotlib library.
Bar plot comparing the number of ham and spam messages.
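A bar plot like this could be produced roughly as follows. The labels here are a hypothetical sample, not the real dataset, and the output file name is just an example.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical labels standing in for the dataset's label column.
labels = pd.Series(["ham", "ham", "spam", "ham", "spam", "ham"], name="label")

# Count how many messages fall into each class.
counts = labels.value_counts()

fig, ax = plt.subplots()
ax.bar(list(counts.index), list(counts.values), color=["tab:blue", "tab:red"])
ax.set_xlabel("Class")
ax.set_ylabel("Number of messages")
ax.set_title("Ham vs. spam message counts")
fig.savefig("ham_vs_spam.png")  # in a notebook the figure is shown inline instead
```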
Loading & Pre-processing Data
Next, the data is split into training and test sets, so the features and labels become f_train, f_test and l_train, l_test. After splitting, the features are converted into array form.
In order to train the model, we need to convert the text into appropriate numerical values using vectorization. The vectorizer is fitted and applied (fit_transform) on the training features.
Then we add algorithms that are trained on the training features and labels, and measure each model's accuracy by comparing its predictions on the test features with the test labels.
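The split could look something like the sketch below, which uses scikit-learn's `train_test_split`. The messages are hypothetical, and the `test_size`, `random_state`, and `stratify` settings are assumptions rather than the values used in the original project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical messages and labels standing in for the dataset columns.
features = [
    "win a free prize now",
    "claim your cash reward today",
    "free entry to win cash",
    "urgent reply to claim your prize",
    "are we meeting for lunch",
    "call me when you get home",
    "see you at the office tomorrow",
    "thanks for the help yesterday",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# Hold out 25% of the messages for testing; stratify keeps the ham/spam
# ratio the same in both parts.
f_train, f_test, l_train, l_test = train_test_split(
    features, labels, test_size=0.25, random_state=42, stratify=labels
)

# Convert the feature lists into NumPy arrays.
f_train = np.asarray(f_train)
f_test = np.asarray(f_test)

print(len(f_train), "training messages,", len(f_test), "test messages")
```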
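Here is a minimal sketch of the vectorization step. It assumes a bag-of-words `CountVectorizer`; the article does not say which vectorizer was used, and `TfidfVectorizer` could be swapped in the same way. Note that the test features are only transformed, so they reuse the vocabulary learned from the training features.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical training and test messages.
f_train = ["free prize call now", "see you at lunch", "win cash prize now"]
f_test = ["free lunch prize"]

vectorizer = CountVectorizer()

# Learn the vocabulary from the training text and turn it into word counts.
X_train = vectorizer.fit_transform(f_train)

# Reuse the same vocabulary for the test text (no fitting here).
X_test = vectorizer.transform(f_test)

print(sorted(vectorizer.vocabulary_))  # the learned words
print(X_train.toarray())               # one row of counts per training message
print(X_test.toarray())
```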
Decision Tree Algorithm
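A decision tree could be trained and scored along these lines. The toy messages are hypothetical and far too few for a meaningful accuracy; in the project, the vectorized train and test features would be used instead.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for the real train/test split.
f_train = [
    "win a free prize now", "claim your cash reward today",
    "free entry to win cash", "are we meeting for lunch",
    "call me when you get home", "see you at the office tomorrow",
]
l_train = ["spam", "spam", "spam", "ham", "ham", "ham"]
f_test = ["free cash prize", "lunch at the office"]
l_test = ["spam", "ham"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(f_train)
X_test = vectorizer.transform(f_test)

# Fit the tree on the training features and labels.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, l_train)

# Accuracy = share of test messages whose predicted label matches the true one.
tree_accuracy = accuracy_score(l_test, tree.predict(X_test))
print(f"Decision tree accuracy: {tree_accuracy:.2f}")
```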
Random Forest Algorithm
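The random forest version follows the same pattern, swapping in `RandomForestClassifier`. The number of trees (`n_estimators=100`) and the toy data are assumptions for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score

# Hypothetical toy data standing in for the real train/test split.
f_train = [
    "win a free prize now", "claim your cash reward today",
    "free entry to win cash", "are we meeting for lunch",
    "call me when you get home", "see you at the office tomorrow",
]
l_train = ["spam", "spam", "spam", "ham", "ham", "ham"]
f_test = ["free cash prize", "lunch at the office"]
l_test = ["spam", "ham"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(f_train)
X_test = vectorizer.transform(f_test)

# An ensemble of decision trees, each trained on a random resample of the data.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, l_train)

forest_accuracy = accuracy_score(l_test, forest.predict(X_test))
print(f"Random forest accuracy: {forest_accuracy:.2f}")
```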
Naive Bayes Algorithm
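For Naive Bayes, a reasonable guess is the multinomial variant (`MultinomialNB`), which is commonly paired with word counts; the article does not say which variant was used. The toy data below is hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the real train/test split.
f_train = [
    "win a free prize now", "claim your cash reward today",
    "free entry to win cash", "are we meeting for lunch",
    "call me when you get home", "see you at the office tomorrow",
]
l_train = ["spam", "spam", "spam", "ham", "ham", "ham"]
f_test = ["free cash prize", "lunch at the office"]
l_test = ["spam", "ham"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(f_train)
X_test = vectorizer.transform(f_test)

# Multinomial Naive Bayes models how often each word appears in each class.
nb = MultinomialNB()
nb.fit(X_train, l_train)

nb_accuracy = accuracy_score(l_test, nb.predict(X_test))
print(f"Naive Bayes accuracy: {nb_accuracy:.2f}")
```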
The accuracies of all the algorithms are then plotted for comparison.
Bar plot comparing the accuracy of each algorithm.
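One way to draw such a comparison is sketched below: each model is trained on the same data, its test accuracy is stored in a dictionary, and the dictionary is plotted as a bar chart. The toy data is hypothetical, so the resulting numbers say nothing about the real dataset.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for the real train/test split.
f_train = [
    "win a free prize now", "claim your cash reward today",
    "free entry to win cash", "are we meeting for lunch",
    "call me when you get home", "see you at the office tomorrow",
]
l_train = ["spam", "spam", "spam", "ham", "ham", "ham"]
f_test = ["free cash prize", "lunch at the office"]
l_test = ["spam", "ham"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(f_train)
X_test = vectorizer.transform(f_test)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": MultinomialNB(),
}

# Train every model on the same data and record its test accuracy.
accuracies = {
    name: model.fit(X_train, l_train).score(X_test, l_test)
    for name, model in models.items()
}

fig, ax = plt.subplots()
ax.bar(list(accuracies.keys()), list(accuracies.values()))
ax.set_ylim(0, 1)
ax.set_ylabel("Accuracy")
ax.set_title("Accuracy by algorithm")
fig.savefig("accuracy_comparison.png")
```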
Predicting SMS: classifying a message as ham or spam using the trained model.
Thus, based on the SMS typed in as user input, the model makes a prediction and outputs whether the message is HAM or SPAM.
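The prediction step might look like this sketch. The helper `predict_sms` is a hypothetical name, the training messages are toy examples, and bundling the vectorizer and classifier into a scikit-learn pipeline is a design choice made here so that new text is vectorized exactly as the training text was.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data standing in for the real dataset.
messages = [
    "win a free prize now", "claim your cash reward today",
    "free entry to win cash", "are we meeting for lunch",
    "call me when you get home", "see you at the office tomorrow",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# The pipeline vectorizes the text and then classifies it in one call.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)


def predict_sms(text: str) -> str:
    """Return 'HAM' or 'SPAM' for a single message (hypothetical helper)."""
    return str(model.predict([text])[0]).upper()


# In an interactive session the text could come from input("Enter an SMS: ").
print(predict_sms("Claim your free cash prize now"))
print(predict_sms("Are we meeting for lunch tomorrow"))
```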
Here is my Kaggle profile, which might help you with extra info, guys.
Hope you enjoyed this article. I welcome your comments and feedback. If you liked this article, please give it a clap. Thank you.
Happy Learning 🙂