Data Science — What is it anyways?

For nearly half a decade now, I’ve been working with data and have held several roles: Decision Scientist, Data Scientist, Co-Founder and Team Lead. I’ve worked with leading brands, some of them Fortune 100, across the domains of marketing, supply chain & fraud. But there are always lingering questions about data science. And I am confident that these questions exist for many other data scientists & business professionals as well, because I get asked about them all the time.

What really is data science? Why do I brand myself as a Data professional? Why do so many young professionals like me want to be a Data Scientist? Why are universities across the world starting Data Science courses at such exorbitant rates? How does a data scientist grow? How is blockchain going to change the way data scientists work, given that they largely rely on centralized repositories of data?

There are many more questions, and whilst they look innocuous to begin with, the answers to them essentially underpin the reason for my existence, and that of many other young people like me, in a professional setting.

In this article I am going to do a deep dive on the first question, which is perhaps the most important one.

What really is data science?

As data professionals always do, I decided to first collect data. I spoke with numerous professionals based in India, the US & Canada who held titles such as Data Analyst, Decision Scientist, Data Scientist, Financial Analyst, Insights Manager and Product Manager, and with a lot of students currently enrolled in data science programs and engineering courses. More importantly, I presented the results back to a lot of professionals across organizations to understand whether the wider perception of data science was in harmony with the “accepted” definition(s) of data science amongst data professionals.

Here are the highlights of my findings & my interpretation from those interviews & conversations:

1. Data Science is all about problem-solving: Well, what part of the business isn’t about problem-solving? Every person who has ever got paid got paid because they solved a problem, or at least attempted to solve one. But wait, let’s not dismiss this definition right off the bat.

Whilst it is most certainly true that all parts of the business solve problems, the way data professionals formulate & approach a problem is fundamentally different. Experienced data scientists focus on defining the problem. In fact, a lot of them dedicate about 80% of their time to defining it, which ensures that they completely understand the current state of the business and have consensus about the final state. But then, these goalposts aren’t set in stone. Rather, they adapt to the ever-changing business environment & end-user needs.

2. Data science involves building products leveraging the latest tools & technologies available at my company: This is what you hear from a lot of young aspiring data scientists, and most people who gave this answer were engineers. Interestingly, an overwhelming majority of people in various parts of the organization agree with this definition. In fact, if you go to people who are trying to transition into data science and ask them how they intend to do so, the number one answer is, “I am learning Spark/Hadoop/H2O”. But a lot of experienced data scientists loathe this definition of data science. For them it is never really about the technology that you leverage. In fact, most business leaders don’t care about the exact kind of technology or the algorithms that a data scientist uses. So clearly, experienced data scientists and business leaders don’t worry much about the underlying technology as long as the product/tool/team solves the problem at an acceptable price point.

But then, why did this become such a popular answer? And why was this definition accepted across the board when the one above was not?

Well, perhaps it is because data science arguably has more students relative to professionals than any other discipline. The only other science/engineering discipline that has such a skew is perhaps blockchain technology.

And given the benefits that data science & blockchain can have for businesses, leaders relying on expertise to solve business problems aren’t going to fail enough to succeed big. Looking at the world this way urges even a conservative person like me to float a daring prophecy: “The Ultimate Wild West of technology is just dawning upon us”. But that discussion is for another day.

3. Data Science is about generating tactical insights from data that enable leaders to design strategy & make decisions: Well, this is the classic manager’s response to “How do you define Data Science?/What is the role of a data scientist?” Most data scientists, whether experienced or novice, agree with this definition of data science. Interestingly, men/women in nice suits (consultants) have been claiming to do this for the better part of the last century.

The thing that surprised me was that this definition of data science didn’t have too many takers from other parts of the organization. The users felt that they could already understand their customers intimately. Due to the explosion of self-serve tools such as Tableau and Alteryx, most teams within an organization today have access to data. Efforts from leadership teams across these organizations, along with the media hyperbole, have ensured that (almost) everyone in most large organizations is data-driven or is attempting to make data-driven decisions.

4. Data Science is engineering solutions to problems in which engineers enable a system to form its own rules based on some historical data: Looks like a crazy definition, right? Well, this is what I thought data science was all about. When you think of problems such as customer segmentation, fraud detection, product bundling and a few others I worked on, it looks like this definition holds water. My conversations with several experts have made me realize that this is exactly the problem. We make these definitions based on our own experience. People from various backgrounds today are approaching their problems with a data-first mindset. So an attempt to force-fit any single definition onto a rapidly evolving piece of technology will result in failure.

So, as the wise man said: “Seek comfort in ambiguity & discomfort, for it is that which propels growth.”

But hey, wait: no piece of work by a data scientist is ever considered complete without a quick synthesis, a bold prediction (a massive leap of faith) and next steps/a call to action.

Synthesis: Businesses of today are finding it hard to define data science and to understand how the technology can be leveraged to impact the bottom line. The IT plug & play model doesn’t yet exist in the case of data science, except for a few standard problems such as recommendation engines and anomaly detection. Also, as the problems keep changing, it is just not possible to keep buying products every time the problem changes. For vendors in the space, it is just not economical to build a massive gamut of products for every small problem. And even if they did, I really do pity the sales force of that company. That is why large vendors in the space (AWS, Azure, Google Compute Engine) have a PaaS (Platform as a Service) model and not a SaaS (Software as a Service) model.

Prediction: In the next few years, specialized roles such as Data Scientist/Analyst will disappear. Instead, a major portion of the workforce of the future will be expected to understand data science and work alongside machines.

Some leaders in the space have already started calling such an employee a Citizen Data Scientist!

They’ll exhibit the following characteristics:

  • Structured approach to problem solving
  • Lack of horizontal focus
  • Ability to leverage the best advancements in technology & math to solve business problems
  • A data-first mindset

Now, this leads to so many more questions. How do we hire this talent? Can we retrain our entire workforce? What problems are these kinds of people going to solve? How do we lead them? How do we retain them? Where do we find them?

Well, those questions are for another discussion!!

Please note:

  1. Feel free to share this article on professional networking sites or within your company.
  2. I really look forward to your comments!
  3. While I have taken care to maintain complete gender neutrality in this article, I might have unknowingly made some reference to a gender. Please let me know if that is the case. I’ll happily make the correction.

Source: Deep Learning on Medium