Source: Deep Learning on Medium
10 Hard Truths About Social and News Media Analytics for Crypto-Assets
Crypto is a new asset class and a very nascent and irrational financial market. As a result, crypto-assets are very reactive to news and social media outlets which, more than once, have acted as catalysts of market rallies or downturns. Not surprisingly, the idea of extracting intelligence from news or social media outlets has been an elusive goal of the crypto-asset market since the early days. While that idea is certainly worth pursuing, the implementations are full of challenges. Unfortunately, the crypto-asset market is full of vendors that are trying to trivialize the complexity of extracting meaningful intelligence from news and social media outlets without showing any relevant results over time. Today, I would like to discuss some key challenges that will face anybody venturing into social and news media analytics for crypto-assets.
Part of the temptation to trivialize the analysis of social and news media in the crypto space is related to the rapid evolution of deep learning methods. In the last few years, we have seen an explosion of platforms that make it extremely simple to perform basic sentiment or topic analysis over text data. Platforms such as the Watson APIs or Microsoft Cognitive Services make the analysis of text as simple as an API call. As a result, many platforms in the crypto space are seduced by that simplicity and try to leverage those services to analyze channels such as Twitter, Telegram or news for crypto-assets. Unfortunately, the solutions are not that easy.
Using generic text analytic APIs for extracting insights in a financial market is like trying to learn a new language by just using a dictionary. While you get some basic words and phrases right, you are unlikely to be able to engage in a discussion of an in-depth topic such as politics or, well, finance. Similarly, the analysis of crypto markets requires deep learning models that truly understand the context and key characteristics of the crypto ecosystem. Just like in the equities market, the path towards creating effective text analytic models for crypto-assets is full of challenges. But before we get to those, let’s try to better understand what types of text analyses are relevant to crypto as an asset class.
A Gentle Introduction to Text Analytics for Crypto Assets
Text analytics is a very broad discipline that combines areas from traditional machine learning as well as from the fast-growing area of deep learning. In general, text analytics encompasses all sorts of disciplines related to the analysis of textual data, ranging from relationship extraction to question-answer models. In the context of crypto and financial markets in general, there are three key types of models that could become relevant:
· Sentiment Analysis: Methods that quantify the affective states of textual data. In general, most sentiment analysis techniques classify a specific text as positive, negative or neutral.
· Topic Extraction: Techniques that extract key topics from textual data. The collection of topics should act as a semantic representation of the underlying text.
· Tone Analysis: Methods that focus on understanding emotions and communication styles in textual data. While sentiment analysis methods focus on a vector of positive, neutral and negative scores, tone analysis tries to extract emotional qualifiers such as anger, confidence, fear and several others.
Sentiment, topic and tone analysis are the most relevant techniques to extract meaningful intelligence about crypto-assets, and each one of them brings its own set of challenges.
10 Challenges of Text Mining for Crypto-Assets
There are plenty of difficulties building relevant text intelligence models for crypto-assets. Here is a summarized list of some of the non-trivial ones:
1) Most Sentiment Analysis APIs Don’t Work for Crypto
As explained previously, basic sentiment APIs are unlikely to produce relevant intelligence about the crypto market. Sentiment analysis models for crypto assets need to be trained on the key nomenclature of the crypto space as well as on some of the key dynamics of the market.
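To make the nomenclature problem concrete, here is a minimal sketch, assuming a toy lexicon-based scorer rather than any real sentiment API. The word lists, weights and function names (`GENERIC_LEXICON`, `CRYPTO_LEXICON`, `score`, `label`) are hypothetical and exist only for illustration. The point is that a message written in crypto slang looks neutral to a generic vocabulary and clearly negative once the domain terms are known.

```python
import re

# Hypothetical generic vocabulary; not taken from any real API.
GENERIC_LEXICON = {
    "good": 1.0, "great": 1.0, "gain": 1.0,
    "bad": -1.0, "loss": -1.0, "crash": -1.0,
}

# Hypothetical crypto-specific terms with assumed weights.
CRYPTO_LEXICON = {
    "hodl": 1.0, "moon": 1.0, "bullish": 1.0,
    "rekt": -1.0, "rug": -1.0, "fud": -1.0, "bearish": -1.0,
}


def score(text, lexicon):
    """Average the weights of known tokens; return 0.0 when none are known."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def label(value, threshold=0.25):
    """Map a score in [-1, 1] to a coarse sentiment label."""
    if value > threshold:
        return "positive"
    if value < -threshold:
        return "negative"
    return "neutral"


message = "Got rekt on that rug pull, pure FUD everywhere"
generic_score = score(message, GENERIC_LEXICON)
crypto_score = score(message, {**GENERIC_LEXICON, **CRYPTO_LEXICON})

print(label(generic_score))  # the generic lexicon finds no known words
print(label(crypto_score))   # the crypto-aware lexicon sees three negative terms
```

A real model would learn these associations from labeled crypto data instead of a hand-written dictionary, but the failure mode of the generic vocabulary is the same.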
2) News Shouldn’t Contain Sentiment Signals
One of the misconceptions of the crypto space is that news should reflect relevant sentiment about the market. At the moment, there are plenty of crypto analytics companies that provide sentiment analysis for news only to show a close-to-neutral score. The reality is that well-written news SHOULD NOT express any sentiment about a specific topic and, correctly, its sentiment score should be close to 0.5 or neutral. From that perspective, analyzing sentiment in crypto news is a fool’s errand most of the time.
3) Twitter and Telegram are Relevant Sources for Sentiment but They are Also Too Noisy
While news media is not a great source of sentiment data, channels such as Twitter, Telegram or Reddit definitely are. However, those channels are plagued with incoherent, poorly written and completely biased messages that challenge the most sophisticated sentiment analysis methods.
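As a rough illustration of what dealing with that noise can look like, the sketch below applies a few simple heuristics before any sentiment model sees the data: strip links, mentions, hashtags and cashtags, drop messages with too few remaining words, and drop exact repeats. The function names and the `min_words` cutoff are assumptions made for this example, not a recommended production filter.

```python
import re

# Matches URLs, @mentions, #hashtags and $cashtags.
URL_OR_HANDLE = re.compile(r"https?://\S+|[@#$]\w+")


def clean(text):
    """Strip URLs, mentions, hashtags and cashtags, then collapse whitespace."""
    return " ".join(URL_OR_HANDLE.sub(" ", text).split())


def filter_noise(messages, min_words=4):
    """Keep messages with enough plain words left, dropping case-insensitive duplicates."""
    seen = set()
    kept = []
    for raw in messages:
        text = clean(raw)
        key = text.lower()
        if len(text.split()) < min_words or key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept


sample = [
    "$BTC #moon @whale https://example.com",
    "Exchange outflows for bitcoin rose sharply this week",
    "exchange outflows for bitcoin rose sharply this week",
    "gm",
]
filtered = filter_noise(sample)
print(filtered)
```

Removing cashtags is a simplification: in practice they are often the only hint about which asset a message refers to, so a real pipeline would likely extract them before cleaning.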
4) Like Any Other Asset Class, Crypto Requires Sentiment Models Optimized for the Space
If vanilla APIs don’t work for crypto market intelligence, then what does? Well, most likely, sentiment analysis for crypto-assets requires custom models that are highly optimized for the terminology and dynamics of the market.
5) Sentiment is Often a Lagging, not a Leading, Indicator
The conventional wisdom of sentiment analysis tells us that positive sentiment should be an indicator of a price increase while negative sentiment might be a sign of a price drop. However, many times the correct interpretation is the opposite: if the sentiment is positive and the price doesn’t go up then that could be a sign of a future price drop. Similarly, if the sentiment is negative and it doesn’t yield a price drop it might be a sign of a price rally.
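The contrarian reading described above can be written down as a simple rule. This is a minimal sketch under stated assumptions: sentiment is a score in [-1, 1], the price change is a fractional return measured over the same window, and both thresholds are arbitrary values chosen for illustration. The function name `divergence_signal` is hypothetical.

```python
def divergence_signal(sentiment, price_change,
                      sentiment_threshold=0.3, price_threshold=0.01):
    """Flag cases where price fails to follow strong sentiment.

    sentiment: score in [-1, 1] over some window (assumed scale).
    price_change: fractional return over the same window, e.g. 0.02 for +2%.
    """
    # Strongly positive sentiment, but the price did not go up.
    if sentiment > sentiment_threshold and price_change <= price_threshold:
        return "possible drop"
    # Strongly negative sentiment, but the price did not go down.
    if sentiment < -sentiment_threshold and price_change >= -price_threshold:
        return "possible rally"
    return "no divergence"


print(divergence_signal(0.6, 0.0))     # positive mood, flat price
print(divergence_signal(-0.6, 0.002))  # negative mood, price holding
print(divergence_signal(0.6, 0.05))    # price already followed sentiment
```

A rule like this is only a starting hypothesis. Whether such divergences predict anything for a given asset would have to be validated against historical data.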
6) Quantifying Topic Influence is Nearly Impossible
Not all topics are created equal. News about the SEC or Libra is not the same as news about an ICO. While identifying topics in textual data is relatively trivial, quantifying their influence relative to external market conditions can be nothing short of a nightmare.
7) News is a Solid Source for Topic Analysis but It’s Constrained to the Top Cryptocurrencies
Previously, we established that news is not a great source of sentiment data, but it is a great source of topic information. Extracting key topics from top media outlets such as CoinDesk, CoinTelegraph or The Block is certainly an amazing source of intelligence, but that analysis is only available for the top cryptocurrencies, as most other tokens are rarely covered in the mainstream news.
8) Topics for Twitter and Telegram are Too Noisy and Full of Biases
Topic extraction models certainly work against Twitter and Telegram data, but the results are unlikely to be relevant. Twitter and Telegram feeds tend to be very biased and the grammar is not exactly polished, which introduces all sorts of issues for traditional topic extraction models.
9) Tone Analysis Emotions are not Related to Dynamics in Crypto Markets
Tone analysis suffers from challenges similar to those of sentiment analysis models. Emotions such as fear or confidence are based on general linguistic terms and not on specific dynamics of the crypto markets. As a result, many tone analysis techniques can yield misleading perspectives about the markets.
10) Tone Analysis is Vulnerable to Twitter and Telegram Biases
Mainstream tone analysis techniques struggle to filter the biases expressed in Twitter and Telegram channels. It is common for followers to reinforce the point of a thought leader but that doesn’t necessarily reflect a position on the market.
News and social media are an incredible source of insights for crypto markets. However, extracting meaningful intelligence from those channels requires a level of depth and rigor that goes beyond the mainstream technologies. The good news is that, for the first time in history, deep learning frameworks and platforms are available to data scientists and researchers to tackle these challenges and create models that can be a regular source of intelligence for crypto-assets.
Next week we will be debating these topics in depth in a webinar. You can register for it at https://zoom.us/webinar/register/WN_w6aPs-ACRDugmjouZcI0-g