TweetTokenizer and WordTokenizer

Original article can be found here (source): Artificial Intelligence on Medium

Find the difference

I came across two ways to tokenize input text with NLTK, and they give different results on text that contains hashtags and mentions.

— Word Tokenizer

# word_tokenize needs NLTK's Punkt tokenizer data to be downloaded first
from nltk.tokenize import word_tokenize

text = '#Deep Learning is a branch of @AI.'
wordlist = word_tokenize(text)
# wordlist = ['#', 'Deep', 'Learning', 'is', 'a', 'branch', 'of', '@', 'AI', '.']

— Tweet Tokenizer

from nltk.tokenize import TweetTokenizer

text = '#Deep Learning is a branch of @AI.'
tokenizer = TweetTokenizer()
wordlist = tokenizer.tokenize(text)
# wordlist = ['#Deep', 'Learning', 'is', 'a', 'branch', 'of', '@AI', '.']
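
To spell out the difference, the two outputs can be compared directly. This is only a minimal sketch: it copies the token lists from the examples as literals so that it runs without NLTK installed, and the variable names are my own.

```python
# Token lists copied from the two examples (literals, so NLTK is not required here)
word_tokens = ['#', 'Deep', 'Learning', 'is', 'a', 'branch', 'of', '@', 'AI', '.']
tweet_tokens = ['#Deep', 'Learning', 'is', 'a', 'branch', 'of', '@AI', '.']

# Tokens that appear in one output but not in the other, in original order
only_in_word = [t for t in word_tokens if t not in tweet_tokens]
only_in_tweet = [t for t in tweet_tokens if t not in word_tokens]

print(only_in_word)   # ['#', 'Deep', '@', 'AI']
print(only_in_tweet)  # ['#Deep', '@AI']
```

So word_tokenize splits '#' and '@' off as separate tokens, while TweetTokenizer keeps the hashtag and the mention intact.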