Original article was published on Artificial Intelligence on Medium
How a simple textual explanation can add value to your data science results
Enhance the power of your data exploration using textual explanations
The popular saying "A picture is worth a thousand words" may be wrong when it comes to data science. Take the example of Uber's Estimated Time of Arrival (ETA) algorithm, which informs the user when the ride is expected to arrive.
Behind the ETA, there is a lot of complex predictive modelling and cutting-edge visualisation, with the map updating in real time. But all of this is of no use without the single line of text that says "The closest driver is approximately 1 min away".
A data scientist or data analyst produces a lot of data visualisations during the data exploration phase. All these cool visualisations look great, but you can greatly enhance their value with short textual explanations. Moreover, in many cases, visualisations alone are not sufficient.
Visualisations without explanations are a source of misinterpretation
Take the simple example of a histogram. Shown below is a histogram of a stock's closing price.
Just by looking at this visualisation, one can make many interpretations, such as:
Interpretation 1 — The most frequently occurring value is between 13 and (something…).
Interpretation 2 — The lowest value seems to be between 5-something and 10-something.
Stock trading is an area where one has to be very precise about values. So if the interpretation is not precise, the visualisation alone does not help.
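The imprecise interpretations above can be replaced by exact statements computed from the same data that drew the histogram. A minimal sketch, using hypothetical closing prices (the article's actual dataset is not reproduced here):

```python
import numpy as np

# Hypothetical closing prices; in practice these would come from your dataset.
close = np.array([11.2, 12.1, 13.4, 13.6, 13.9, 14.2, 14.8, 13.5, 5.8, 9.7])

# Bin the values the same way the histogram does, then find the tallest bin.
counts, edges = np.histogram(close, bins=5)
peak = np.argmax(counts)

print(f"Most values fall between {edges[peak]:.2f} and {edges[peak + 1]:.2f}.")
print(f"The lowest closing price is {close.min():.2f}.")
```

The two print statements turn "between 13 and something" into exact bin edges, which is precisely the kind of one-line text that the visualisation cannot provide on its own.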
Data storytelling is a necessity because visualisations alone do not do the job
For many years now, data storytelling has been a must-have skill for data scientists. But strictly speaking, it is a necessity, because visualisations alone cannot convey the story.
A very simple visualisation can have a great story behind it. But unless the story is told, it never surfaces. Take the histogram shown above.
The real story behind the histogram is that the stock price swings between 11 and 15 and stays at 12 for only a very short amount of time, so the buying opportunity at 12 is brief. This kind of story is impossible to capture in a visualisation and needs to be told explicitly. Even if advanced visualisations such as animations are used, someone still has to tell the story.
This is where the power of an explanation comes into play. Adding a short textual explanation enhances the value of the visualisation: you go from merely showing a visualisation to conveying something meaningful.
Let us now look at some examples where explanations enhance the interpretation of visualisations.
Explaining a correlation matrix to avoid the stress of a "color maze"
A correlation matrix looks visually stunning. However, because of the many different shades of color, one has to look hard to interpret it. Just a few lines of textual explanation vastly improve the interpretability of a correlation matrix. The text can state which columns are the most correlated, as well as what the different shades of color mean.
Shown below is a correlation matrix based on car data. As you can see, adding a small explanation clearly enhances the value of the nice-looking correlation matrix. It saves your users from "eye-balling" the matrix to find the most correlated columns.
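The "most correlated columns" sentence can be generated automatically from the same matrix. A minimal sketch, using hypothetical car data (the column names and values below are illustrative, not taken from the article):

```python
import numpy as np
import pandas as pd

# Hypothetical car data; column names and values are illustrative only.
df = pd.DataFrame({
    "horsepower": [130, 165, 150, 140, 198, 220],
    "weight":     [3504, 3693, 3436, 3449, 4341, 4354],
    "mpg":        [18.0, 15.0, 18.0, 17.0, 15.0, 14.0],
})

corr = df.corr()

# Keep only the upper triangle so each pair appears once, then rank pairs
# by the absolute strength of their correlation.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().sort_values(key=abs, ascending=False)

(a, b), r = pairs.index[0], pairs.iloc[0]
print(f"The most correlated columns are '{a}' and '{b}' (r = {r:.2f}).")
```

The same ranked `pairs` series can drive a few more sentences, for example one per pair above a chosen threshold, sparing the reader the color decoding entirely.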
Explaining a cumulative distribution to avoid "eye-balling" the x and y axes
Cumulative distributions are very important for showing how a numeric value is distributed. They are also a creative way of focusing on important thresholds of a numeric column.
However, showing a cumulative distribution without any explanation is a painful eye-balling exercise. A short explanatory text about the different threshold levels immediately takes the power of a cumulative distribution to the next level and makes it start to make sense.
Shown below is the cumulative distribution of a stock's closing price. Text explanations of thresholds (for example, 80% of closing prices are less than 79.31) clearly enhance the value of the cumulative distribution visualisation.
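Threshold sentences of this kind are one-liners to generate: each one is just a percentile of the column. A minimal sketch, using randomly generated prices (so the 79.31 figure from the article is not reproduced here):

```python
import numpy as np

# Hypothetical closing prices drawn at random; in practice use your real column.
close = np.random.default_rng(0).normal(75, 5, 500)

# One explanatory sentence per threshold level of the cumulative distribution.
for q in (50, 80, 95):
    print(f"{q}% of closing prices are less than {np.percentile(close, q):.2f}")
```

Printing these next to the plot means the reader never has to trace a horizontal line from the y axis to the curve and down to the x axis.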
Explaining the result of clustering to avoid any guesswork
Clustering is a very powerful tool for any data exploration activity. However, it can produce one of the most misinterpreted results if not clearly explained. The result of clustering is generally shown as a scatter plot with the clusters in different colors. The catch is that a 2D scatter plot visually shows only two columns of your data, while the clustering itself was computed on many more columns.
So in order to explain clustering results correctly, you need a textual explanation that includes the feature importance of the clustering results.
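One simple way to get such a feature-importance text is to measure how far each column's per-cluster mean drifts from its overall mean. The sketch below uses hypothetical data with four columns, only one of which actually drives the split, and a tiny hand-rolled 2-means loop so that it needs nothing beyond numpy and pandas (in practice you would use a library clusterer such as scikit-learn's KMeans):

```python
import numpy as np
import pandas as pd

# Hypothetical data: four columns, though a scatter plot would show only two.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=["a", "b", "c", "d"])
df["c"] += np.repeat([0.0, 5.0], 100)  # column 'c' secretly drives the split

# A tiny 2-means (Lloyd's algorithm) so the sketch stays self-contained.
X = df.to_numpy()
centers = X[[0, 100]]  # one seed point from each half of the data
for _ in range(20):
    labels = np.linalg.norm(X[:, None] - centers, axis=2).argmin(axis=1)
    centers = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])

# Rank features by how far the cluster means drift from the overall mean,
# measured in units of each column's standard deviation.
drift = (df.groupby(labels).mean() - df.mean()).abs().mean() / df.std()
print("Columns driving the cluster separation, most important first:")
for col, score in drift.sort_values(ascending=False).items():
    print(f"  {col}: separation score {score:.2f}")
```

The resulting ranking singles out column 'c', information that a two-column scatter plot of, say, 'a' against 'b' could never reveal.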
Including text-generation functions in your code
As data scientists, we focus on coding for all activities: data preparation, feature engineering, hyperparameter tuning, modelling, and visualisation. But most of us do not focus on automatically generating textual explanations of the results. It is therefore a good idea to make a habit of including functions that generate textual explanations inside your code.
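Such a function can be very small. A minimal sketch of the habit, with a hypothetical helper name and sample data:

```python
import pandas as pd

def explain_column(series: pd.Series) -> str:
    """Return a one-line textual explanation of a numeric column.

    A minimal example of the habit described above: every analysis
    step gets a companion function that turns its result into a sentence.
    """
    return (
        f"'{series.name}' ranges from {series.min():.2f} to {series.max():.2f}, "
        f"with a median of {series.median():.2f}."
    )

prices = pd.Series([11.2, 12.1, 13.4, 14.8, 13.5], name="close")
print(explain_column(prices))
# → 'close' ranges from 11.20 to 14.80, with a median of 13.40.
```

Because the function returns a string, the same sentence can be printed next to a plot, dropped into a report, or shown in a product UI.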
As more and more algorithms are packaged into products meant for end users, the need for textual explanations of results is becoming evident. They will also make your data science work appealing to a wider audience.