Data Scientists Are Not Magicians

Original article was published by Natassha Selvaraj on Artificial Intelligence on Medium


Why does this happen to data scientists so often?

Photo by Tim Gouw on Unsplash

Even after hearing horror stories from data scientists and analysts, I never believed things were really that bad.

As a newcomer in the industry, I thought that a lot of what I was hearing was simply a gross exaggeration of events.

That is…until it happened to me.

My Experience

As final-year computer science students, all of us are required to complete a capstone project.

Since I major in data science, I was keen on taking up a project that involved data analysis and machine learning.

After reading the scope of the project, I had multiple ideas in mind and was pretty excited about moving forward with it. My teammates and I already had a list of questions mapped out for our project supervisor and client.

However, the first meeting with the project supervisor was a complete disaster. Without disclosing too much, here is a brief overview of how the meeting went:

Supervisor: “Client A has a lot of data they are collecting through a variety of channels. They don’t know what to do with the data. Can you do some data analysis on it?”

Student: “Alright, that seems fine. Are there any specific questions they want to answer with this data?”

Supervisor: “Not exactly, maybe just build dashboards and apply machine learning techniques to it.”

After realizing that we were stuck in the “data analyst and the rock” situation, my teammates and I decided to just go with the flow and work with what we had.

Student: “Sure, that seems doable. When can we have access to this data?”

The supervisor explained that it would take some time for us to get access to this data. In fact, we might not get our hands on it until the last few weeks before the project submission.

After trying to explain multiple times that it wasn’t possible to start without knowing at least the variables present or the type of data being collected, we gave up.

Based on the details given to us in the project description and on sample datasets we found online, we created a vague proposal and presented it to the supervisor.

Just like the “data analyst and the rock” story, this didn’t go too well either.

Supervisor: “This is great and all, but can you create a generic machine learning model that the client can use?”

What this means: He wanted us to use supervised machine learning techniques to create a single model that would work on any data the client had, including data whose structure we had never seen.

We didn’t have labels, nor did we have any idea of what the data looked like or what prediction was to be made.

In short, he expected magic.

If one machine learning model could be created and used on any kind of data, and predict anything in the world, machine learning engineers wouldn’t have jobs.