Original article was published on Artificial Intelligence on Medium
Is Big Data Dragging Us Towards Another AI Winter?
Why data hoarding is a problem and how we can fix it
It can be hard to remember amid the breathless press coverage of the past few years, but the history of artificial intelligence has been fraught with snags and setbacks. People with long memories remember the first pair of so-called “AI Winters” in the early 70s and most of the 80s. The first was a result of disillusionment with AI generally, while the second, arguably more important winter was born because the technology and physical hardware lagged far behind the theories of the day. For example, scientists had developed backpropagation, a backbone of Deep Learning, but the compute power it needed wasn’t really available until modern GPUs became abundant.
Now, of course, things have changed. Compute is readily available. We’re swimming in data. Governments are investing in research. Undergrads are studying machine learning. The press is covering AI as the next sea change in tech. Everything seems to be trending towards a future where AI is fairly commonplace and the general public understands and accepts its promise and utility.
Unfortunately, “seems” is the operative word in the last sentence. Businesses are investing in AI, but only about a third of them are seeing any return on investment. And if that ROI continues to be elusive, it’s easy to forecast a world where investment starts shrinking, especially in a global economic climate made wobbly and uncertain by an unprecedented pandemic. Instead of enjoying the continued thaw from the last AI Winter, we could very easily be watching the temperature dip again.
But see, the problem isn’t that AI doesn’t or can’t make money. It does. Plenty of AI projects in process automation, for example, are successful. Think about AIs that “read” legal documents and extract information, or ones that triage and handle customer communications or reconcile billing issues. These aren’t the sexiest or most complex use cases, sure, but they save companies money, so you know they aren’t going anywhere any time soon.
So if AI can make money, why are only 35% of companies seeing a return on their investment? One big reason is that building and training models is still prohibitively expensive. Okay, but why is that? It’s because of another recent trend that swept the business world, and especially the tech world: Big Data.
Ask yourself: how many times have you heard that “more data” is what makes models better? In fact, that’s wrong. High-quality data makes models better. Useful, well-labeled data makes models work. Having tons and tons of data? That doesn’t actually matter, especially if there’s no way to prove the data is useful. And proving that is a lot harder when you’re dealing with the quantities we’re talking about here.
The reality is that investors and, for lack of a better term, the Big Data industry have kept the narrative of Big Data’s primacy alive. And the cost of Big Data is a gigantic driver of the cost of AI. The worry for AI and ML practitioners is that companies that believe wholeheartedly that they need to hoard all their data (even if they see no obvious utility in doing so) may start pulling back investment in AI in favor of housing Big Data. But the biggest reason to store all that data is so you can make predictions and build AIs from it. In other words, we’re in danger of Big Data killing investment in AI, which, paradoxically, is a big reason Big Data exists in the first place!
There’s also the issue that smaller companies are hit harder by the burdens of Big Data: the barrier to entry for storing data and training models is simply higher for smaller organizations. Couple all this with the fact that Moore’s Law is over, and you can start to see a future with real economic competition for server space and compute. That feels like a forecast for a potential AI winter.
So what can be done here? Practitioners are going to have to lead the way. We need to be voices for what we need, not what Big Data needs. We need to invest in companies and solutions that help make AI profitable, not solutions whose only goal is to organize and structure Big Data. We have to work to make the industry sustainable, both monetarily and environmentally. We have to reject the old idea that having more data is always preferable. Because, frankly, it isn’t. Too many companies are hoarding data without much of a real use for it. We need to be able to purge the old, useless data we have no real reason to hold onto. We have to invest in data quality measures, not places to hold our quantity.
In other words, to avoid an AI Winter, we need to reject Big Data and embrace Smart Data.