Your Data is Probably Worse Than You Think (Chapter Excerpt)


Draft material from upcoming book: Machine Learning in Practice

Over the last decade, we’ve worked with dozens of companies across numerous industries to help them implement machine learning and artificial intelligence solutions. Clients approach us with widely varying levels of understanding of the problem they want to solve and of the data they have to solve it with.

We’ve seen clients who know they’d like to automate some element of their business, but don’t know anything about their data. We’ve seen clients identify one problem, only for us to discover, once we’ve looked at their data, a bigger problem where machine learning could add significantly more value.

Consistently, when we ask clients whether their data is in good shape, we get a knowing nod. In our experience, the data is rarely ready to use and almost always requires significant preparation work before we can start developing machine learning models.

Why the disconnect? Companies rarely examine their data the way someone doing advanced data science does. A good analogy is visiting your doctor for an annual physical.


You know your body better than anyone, right? Yet at your annual physical, your doctor may tell you things about your body that you never considered. Your doctor may run a few tests, compare numbers across tests, consult the literature, and come back with a diagnosis of an issue you had no idea existed when you walked in the door.

That’s very similar to what happens when data scientists look at your data. You would think a company’s employees would know its data better than anyone else. But when an outside expert examines the data, she looks at it in ways employees rarely do and asks questions employees don’t typically ask. An expert can find holes the employees never saw themselves.

Next, we’ll describe the kinds of things an expert looks at when assessing data.

Source: Deep Learning on Medium