What no Data Science course teaches you …

Original article can be found here (source): Artificial Intelligence on Medium

Complexity Vs Accuracy

Practically all the accessible Data Science courses and educational programs centre on algorithmic knowledge: how a particular algorithm works internally, which parameters to tune for the most accurate results, even how to deploy a model in a live environment. While that is not necessarily bad, organizations assess people on their business aptitude: how well an applicant can tackle a business problem and find answers to it. With the advent of AutoML and the rise of complex deep learning models, it is easy to get dragged into a debate of interpretability versus accuracy. In my experience, most organizations lean toward interpretable solutions over accurate but complex models. Building a complicated model that serves no purpose for the business may leave you with a sense of personal achievement, but it can be counterproductive for your professional growth in the company. This holds in most cases, unless you work in a research lab where reaching 99% accuracy is the mandate and interpretability has little to no value.

Pro Tip: Use the rubber duck method. Try explaining your results as you would to someone with little to no idea of your analysis, and evaluate whether you are making sense. If you cannot, or your thinking lacks clarity, investigate what would shed more light on the topic. Later, this will also help you communicate your results concisely to business stakeholders.

Solution Mindset

It’s common to see novice Data Scientists rushing off to deliver a solution that involves building a complex Machine Learning model. In a real-world scenario, that is like using a sledgehammer to crack a nut. Often the solution need not be a model at all; it could simply be a conditional rule, an engineering task, or a bug fix with a monitoring dashboard. The mindset a data scientist should develop is a solution-based mindset, in which you start from the ideal state of the system you want. What stops you from reaching this ideal state? What changes should you make to reach it? Organizations benefit from finding answers to their problems, not from implementing state-of-the-art AI models.
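To make the "conditional rule instead of a model" point concrete, here is a minimal sketch. Everything in it is hypothetical: the order-review scenario, the function names, the thresholds and the handful of labelled records are made up for illustration. It only shows that a plain rule can be written in a few lines and then measured against historical outcomes before anyone reaches for a model.

```python
# Hypothetical scenario: decide which orders need a manual review.
# All names, thresholds and records below are illustrative assumptions.

def needs_review(order_amount, account_age_days,
                 amount_limit=500.0, min_age_days=30):
    """Flag an order using a plain conditional rule, no model involved."""
    return order_amount > amount_limit and account_age_days < min_age_days


def rule_precision_recall(records, rule):
    """Score a rule on labelled history and return (precision, recall).

    Each record is (order_amount, account_age_days, was_problem).
    """
    tp = fp = fn = 0
    for amount, age, was_problem in records:
        flagged = rule(amount, age)
        if flagged and was_problem:
            tp += 1
        elif flagged and not was_problem:
            fp += 1
        elif not flagged and was_problem:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labelled history, only to exercise the functions above.
history = [
    (900.0, 5, True),
    (750.0, 10, True),
    (600.0, 400, False),
    (40.0, 3, False),
    (820.0, 12, False),
    (300.0, 20, True),
]

precision, recall = rule_precision_recall(history, needs_review)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

If a rule like this already meets the business need, a model may not be worth its extra complexity; if it does not, its score is a useful baseline that any model has to beat.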

Pro Tip: Invest time in understanding the business KPIs and how the company makes a profit. That way you understand how and where you can contribute the most. Businesses like to target the low-hanging fruit rather than shoot for the juicy apples at the top.

Cyclic process

Data Science problems are not as straightforward to finish as a software engineering project, where after a task breakdown it is usually a matter of days to build what you had planned. Of course there can be iterations with better versions, bug fixes and so on, but going back to square one seldom happens. Data Science problems always start with a hypothesis based on a given or known belief. The analysed data can then support or contradict this hypothesis. If it supports the hypothesis, you treat the result as your working ground truth and look for another way to validate it, to be sure it is not a biased result. This looks like doing the same task again with a different technique. If it contradicts the hypothesis, it’s back to the drawing board. This back and forth can happen several times, until the results start to make sense and the numbers seem reliable enough to be communicated to the decision-makers. Predetermining a finite number of steps for a Data Science task is therefore extremely difficult, as each step depends on the results of the previous one. At the same time, it is very easy to go down a rabbit hole and forget the actual problem you are trying to solve, which makes it hard to know when to stop.

Pro Tip: Draw a decision tree for every hypothesis and keep it as short as possible, with a maximum of 3 levels. If the tree still splits at the third level, break that subtree out into another tree. Your manager can then easily decide which tree/hypothesis to pick up for testing first, and it will help him/her estimate when a task will finish.
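One way to picture this tip is as a small data structure. The sketch below is only an illustration under my own assumptions: the `Node` class, the function names and the example questions are hypothetical. It keeps each tree to at most 3 levels and turns any node on the third level that still has children into the root of a separate tree.

```python
from dataclasses import dataclass, field

MAX_LEVELS = 3  # the limit suggested in the tip


@dataclass
class Node:
    """One question or check in a hypothesis tree (hypothetical structure)."""
    question: str
    children: list = field(default_factory=list)


def depth(node):
    """Number of levels in the tree rooted at node (a lone node has 1)."""
    return 1 + max((depth(child) for child in node.children), default=0)


def _copy_limited(node, levels_left, pending):
    """Copy node down to levels_left levels; queue cut-off nodes in pending."""
    copy = Node(node.question)
    if levels_left == 1:
        if node.children:
            pending.append(node)  # this subtree continues in its own tree
        return copy
    copy.children = [_copy_limited(child, levels_left - 1, pending)
                     for child in node.children]
    return copy


def split_deep_subtrees(root, max_levels=MAX_LEVELS):
    """Return a list of trees, each at most max_levels deep."""
    if max_levels < 2:
        raise ValueError("max_levels must be at least 2")
    trees, pending = [], [root]
    while pending:
        trees.append(_copy_limited(pending.pop(0), max_levels, pending))
    return trees


# Made-up hypothesis chain that is 5 levels deep.
root = Node("Did sign-ups drop?", [
    Node("Is tracking intact?", [
        Node("Did traffic change?", [
            Node("Which channel changed?", [
                Node("Was a campaign paused?"),
            ]),
        ]),
    ]),
])

trees = split_deep_subtrees(root)
for tree in trees:
    print(tree.question, "->", depth(tree), "levels")
```

Each resulting tree is small enough to be estimated and prioritised on its own, which is the purpose of the 3-level limit.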

Team sport

As you may have gathered from many posts, Data Science is a team sport, not just externally but internally as well. Data Scientists do need to collaborate with business stakeholders, engineers and managers outside their team, but they need just as much collaboration within their own team to validate their analysis. One common mistake many novice Data Scientists make is to arrive at a number, assume it is the ground truth, and not explore whether some kind of bias has crept into the data. You should always be suspicious of the numbers you get and never be complacent about your analysis. This helps improve the quality of your work. When the first model you tune gives you an accuracy of 99%, it is more likely that something has gone wrong than that you have achieved a state-of-the-art model on the first go.
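Two quick checks can help when a first model reports a suspiciously high accuracy. The sketch below is a minimal illustration, not a complete audit: the function names and the toy data are hypothetical, and these are only two of many possible causes (class imbalance, and rows shared between the training and test sets).

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def overlapping_rows(train_rows, test_rows):
    """Rows that appear in both the training and the test set."""
    return set(map(tuple, train_rows)) & set(map(tuple, test_rows))


# Made-up test labels: 99 negatives and 1 positive.
test_labels = [0] * 99 + [1]
baseline = majority_baseline_accuracy(test_labels)
print(f"majority-class baseline accuracy: {baseline:.2f}")

# Made-up feature rows, with one row present in both sets.
train_rows = [(1, 2.0), (3, 4.0), (5, 6.0)]
test_rows = [(3, 4.0), (7, 8.0)]
leaked = overlapping_rows(train_rows, test_rows)
print("rows shared by train and test:", leaked)
```

If always predicting the majority class already scores about as high as the model, the accuracy figure says little about the model itself. Likewise, any row shared between the two sets means the test score is partly measuring memorisation.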

Pro Tip: Have another Data Scientist review your analysis and validate your results. He/she may well have new ideas for improvements or point out an alternative methodology, both of which can help broaden your insight and approach.