The Data Science Dilemma

Original article was published by Davide Camera on Artificial Intelligence on Medium

Myths About Data Science

Buzzwords that have little to do with the topic are often used to draw people toward (mis)information, and the usual result is confusion among readers. This confusion, and the misrepresentation of information under the cover of buzzwords, clouds a reader’s decision-making and takes them down a route altogether different from the one they intended to follow.

This trend is currently highly visible in the field of Data Science as well.


Myth #1: Data Science is Just a Buzzword

Business leaders, journalists, and industry analysts are quick to use the latest jargon. The resulting noise can make it difficult to discern between industry hype and technologies or processes that can stand the test of time. Given the extreme hype about Data Science these days, it’s not surprising that some consider it just another buzzword or fad.

Data Science isn’t a buzzword or fad, however. It’s a confluence of time-tested disciplines, including statistics, math, computer science…that have existed in some form for many years. A few things that distinguish Data Science from its predecessors, including actuarial science and statistics, are access to massive amounts of data that can be stored cheaply, robust computing power, and quick access to predefined models.

Myth #2: Data science is exclusively for experts in Statistics and Mathematics

Data Science is not proprietary to a few disciplines. It can be viewed as a huge square in the middle of a crowded city, crossed by paths from many disciplines: Mathematics, Statistics, Computer Science and Programming, Data Modeling, Visualization, Technology, Domain Knowledge, and so on.

While an expert in statistics or mathematics may get a good head start, cross-disciplinary experts bring the advantage of moving through different topics in parallel, drawing on their past experience.

Myth #3: Complex Models are Better Than Simple Models

Decision trees, statistical regression, and linear regression are not new, so the media pays less attention to them than to deep learning and neural networks. Deep learning and neural networks use models that are considerably more sophisticated than those used to solve simpler problems, because they attempt to emulate arbitrarily complex functions.

Complex models are not necessarily better than simpler models for a few reasons. First, a complex model can be less efficient than a simpler model if the problem is relatively simple. Second, complex models can be costly in terms of processing power.

Finally, complex models can lead to black-box approaches that are difficult or impossible to explain. While the results of a black-box solution may be “good,” black-box solutions don’t allow users to explore how a result was derived.

If users can’t explore how a result was derived, they can’t understand what went into it. If they can’t understand what led to the result, they can’t explain the details, which is a problem, particularly in an audit scenario. Simpler models are easier to understand and explain.
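
To make the argument above concrete, here is a minimal Python sketch, not a benchmark. It assumes a deliberately simple problem (synthetic data from the line y = 2x + 1 plus noise) and compares a two-parameter linear fit with a 1-nearest-neighbor "memorizer". The memorizer is only a stand-in for an over-flexible model, not a neural network, and every name in the code (fit_line, nearest_neighbor_predict, and so on) is hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nearest_neighbor_predict(train_xs, train_ys, x):
    """Copy the target of the closest training point (1-nearest-neighbor)."""
    idx = min(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
    return train_ys[idx]


def mse(preds, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def make_data(n, rng):
    """Synthetic data (an assumption of this sketch): y = 2x + 1 + noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable
train_xs, train_ys = make_data(500, rng)
test_xs, test_ys = make_data(500, rng)

# Simple model: two parameters, fitted in closed form.
slope, intercept = fit_line(train_xs, train_ys)
linear_mse = mse([slope * x + intercept for x in test_xs], test_ys)

# Flexible model: memorizes all 500 training points, noise included.
memorizer_preds = [nearest_neighbor_predict(train_xs, train_ys, x) for x in test_xs]
memorizer_mse = mse(memorizer_preds, test_ys)

print(f"linear test MSE:    {linear_mse:.3f}")
print(f"memorizer test MSE: {memorizer_mse:.3f}")

# Explaining one prediction: the linear model decomposes into two terms.
x_query = 4.0
linear_prediction = slope * x_query + intercept
print(f"prediction at x={x_query}: {intercept:.2f} (intercept) "
      f"+ {slope:.2f} * {x_query} (slope * x) = {linear_prediction:.2f}")
```

On this kind of data the linear fit should show the lower test error, because the memorizer also reproduces the noise in the training set. The last lines show the explainability point: the linear prediction breaks down into an intercept and a slope term that an auditor can read directly, while the memorizer can only point to the single training row it copied.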