Dispelling the Myths Around Automated Machine Learning

Original article was published on Artificial Intelligence on Medium


Machine learning continues to make great strides in optimizing data collection and analysis tools for use across a wide range of industries, but now a new strand is emerging: automated machine learning, or AutoML for short.

AutoML promises to reduce reliance on data scientists, who are in short supply because of their highly specialized skill set and can be expensive to hire. However, there are a number of misconceptions about AutoML, chief among them that it can do away with data scientists altogether.

Machine learning is an extraordinarily powerful general-purpose technology with a staggering number of applications. It is therefore no wonder that people are excited about an evolution of the technology known as automated machine learning.

To understand what AutoML can do, it is important to understand how standard machine learning works.

Machine learning involves several steps. First you collect the relevant data, then clean it so that it is fit to learn from. You then define the feature representation of your data and feed it into a model, which is trained to maximize accuracy against your predefined goal. It is a complicated task that requires a lot of human involvement. And not just any human: to get the most from machine learning, you need a team of highly trained data scientists to create, apply and optimize the model, and to be involved at every stage.
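To make those steps concrete, here is a minimal sketch of the same workflow in plain Python. The click-and-purchase records, the single-threshold "model" and all function names are invented for illustration; a real project would use far more data and an established library.

```python
from statistics import mean

# Step 1: collect raw data. Hypothetical (ad_clicks, purchased) records;
# None marks a missing measurement.
raw = [(1, 0), (2, 0), (None, 1), (8, 1), (9, 1), (3, 0), (7, 1), (2, 0)]

# Step 2: clean the data by dropping incomplete rows.
clean = [(x, y) for x, y in raw if x is not None]

# Step 3: define the feature representation (here, the click count itself).
features = [float(x) for x, _ in clean]
labels = [y for _, y in clean]

# Step 4: train a model. This toy "model" is a single threshold placed
# midway between the average feature value of each class.
def train(features, labels):
    pos = mean([f for f, y in zip(features, labels) if y == 1])
    neg = mean([f for f, y in zip(features, labels) if y == 0])
    return (pos + neg) / 2

def predict(threshold, feature):
    return 1 if feature > threshold else 0

threshold = train(features, labels)

# Step 5: measure accuracy against the predefined goal.
correct = sum(predict(threshold, f) == y for f, y in zip(features, labels))
accuracy = correct / len(labels)
print(f"threshold={threshold:.2f} accuracy={accuracy:.2f}")
```

Every numbered step here involved a human decision: which rows to drop, which feature to use, which model family to try and which score to report.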

The ultimate aim of AutoML is to automate every step in this process, increasing efficiency while reducing cost. If it worked perfectly, it could be used in many different applications in all sorts of industries, revolutionizing multiple sectors of society. This is why it is currently gaining so much attention.

Changing the Role of Data Scientists

However, as with many emerging technologies, the reality is rather more complicated.

“How useful AutoML is really depends on the industry, the data type and the model classes involved,” says Min Sun, Chief AI Scientist at Appier. In terms of data collection and cleaning, digital marketing is one area that can benefit from AutoML, because data labels are naturally generated from customers interacting with companies’ marketing campaigns. Mature tools exist to clean those labels, to make sure they are not noisy or biased.

Other industries are less well placed to benefit from automated data collection and cleaning, but are well suited to automatic feature engineering. For example, self-driving cars need humans to help identify what is a pedestrian and what is a stop sign. Similarly, medical imaging tools require experienced doctors to help spot tumors. However, using neural networks to construct features automatically from raw images has already reduced the effort required from many data scientists.

However, Sun urges scientists not to apply AutoML naively. Regardless of what goal you are trying to achieve, AutoML will not replace human knowledge altogether. Rather, it will change where that knowledge is applied.

“In marketing, you only automate the steps of the process that will be much more effective than if they were carried out by a human manually,” Sun says. “Typically these processes are repetitive or complicated with sufficient data support. In this way, humans can be freed from repetitive tasks and apply their knowledge in areas with less accumulated data.”

So, you cannot get rid of data scientists altogether. This approach, in which humans and machines work hand in hand, has been termed Semi-AutoML by some. It gives a more realistic appraisal of what the process actually entails.

Weighing Up the Benefits and Costs

As long as companies are informed about what AutoML can do and how it works, they have a lot to gain by using it.

It can be more efficient, because it removes the need to have human experts in the loop, and machines can do this work much more quickly than humans. Used in the right way, it can outperform humans too, as it minimizes the risk of human error.

As the processes are automated, it can also scale much more effectively than if the processes were done manually.

However, there are other factors to consider that are often overlooked, chief among them cost.

The Holy Grail of AutoML is neural architecture search, in which an algorithm searches for the best neural network architecture to solve your given problem. Researchers have demonstrated that it is possible to fully automate a neural architecture search (and outperform humans attempting the same task), but it takes an enormous amount of computational power. It could tie up a dozen processors for several days of training, for example, which is very expensive. So, any firm looking to leverage AutoML needs to use it smartly by weighing up the potential gain against the financial cost and time spent.
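The compute figure quoted above can be turned into a rough budget with simple arithmetic. The sketch below reads "several days" as three days and uses an invented hourly price; neither number comes from the article or from any provider's price list, and `search_cost` is a hypothetical helper.

```python
def search_cost(processors, days, price_per_processor_hour):
    """Return (processor_hours, cost) for one architecture search run."""
    processor_hours = processors * days * 24
    return processor_hours, processor_hours * price_per_processor_hour

# A dozen processors for three days. The hourly price of 2.50 is a
# made-up figure, not a quoted cloud rate.
hours, cost = search_cost(processors=12, days=3, price_per_processor_hour=2.50)
print(f"{hours} processor-hours, about {cost:,.2f} in compute")

# Weigh the expected gain (also a made-up figure) against that cost.
expected_gain = 1500.00
print("worth running" if expected_gain > cost else "not worth running")
```

Under these assumptions a single search consumes 864 processor-hours, and that is before counting the days of waiting, so the expected gain has to be substantial to justify it.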

While AutoML minimizes the risk of human error, it does not eliminate it altogether. AutoML will only optimize the metric you define, so if you define a metric in the wrong way, the model created will not solve your problem. This issue is not unique to AutoML, as humans can make the same mistake with standard machine learning. But having humans in the loop can help to correct it, as they will spot that the model behavior is not correct. While removing humans from the process can bring big gains in efficiency, if done naively it can also cause more errors to creep in.
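A small example shows how a badly chosen metric can mislead an automated selection step. The data and the two candidate "models" below are invented: 100 customers, of whom only 5 convert. The `auto_select` function is a hypothetical stand-in for the selection logic of an AutoML tool, not any real product's API.

```python
# 1 = customer converts (rare), 0 = does not. Hypothetical data.
labels = [1] * 5 + [0] * 95

# Predictions from two made-up candidate models.
candidates = {
    # Predicts that nobody ever converts.
    "always_no": [0] * 100,
    # Flags 12 customers: 4 real converters and 8 false alarms.
    "flag_likely_buyers": [1] * 4 + [0] * 1 + [1] * 8 + [0] * 87,
}

def accuracy(y_true, y_pred):
    # Share of all predictions that match the label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    # Share of actual converters that the model found.
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / sum(y_true)

def auto_select(candidates, y_true, metric):
    # Pick whichever candidate scores highest on the metric it was given.
    return max(candidates, key=lambda name: metric(y_true, candidates[name]))

print("best by accuracy:", auto_select(candidates, labels, accuracy))
print("best by recall:  ", auto_select(candidates, labels, recall))
```

Told to maximize accuracy, the selection step prefers the model that never finds a single converter, because 95 percent of customers do not convert anyway. A person reviewing the result would likely notice that such a model is useless for a campaign; the automated step has no way to know.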

Putting It Into Practice

It is a tricky balance to strike. Companies need to weigh up the downsides of intense human involvement against its potential benefits, and decide what is best for their business model. Having a human in the loop at every stage of the process will mean your model will not scale, for example. But at the same time, fully automating the process of building each model will be too time-consuming. “We know that marketers cannot wait days before sending out a campaign, they need to strike in a timely fashion,” Sun says.
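One common way to handle this trade-off is to cap how much searching the automated step may do. The sketch below is a random search with a fixed trial budget; the `validation_score` function is an invented stand-in for training and validating a real model, and the parameter names and ranges are assumptions chosen for illustration.

```python
import random

def validation_score(learning_rate, depth):
    # Stand-in for training and validating a model. By construction it
    # peaks at learning_rate=0.1 and depth=6 with a score of 1.0.
    return 1.0 - 10 * (learning_rate - 0.1) ** 2 - 0.01 * (depth - 6) ** 2

def budgeted_search(max_trials, seed=0):
    """Try max_trials random configurations and return the best one found."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    best_score, best_config = float("-inf"), None
    for _ in range(max_trials):
        config = {
            "learning_rate": rng.uniform(0.01, 0.5),
            "depth": rng.randint(2, 12),
        }
        score = validation_score(**config)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config

quick_score, quick_config = budgeted_search(max_trials=5)
long_score, long_config = budgeted_search(max_trials=50)
print(f"5 trials: {quick_score:.3f}   50 trials: {long_score:.3f}")
```

A larger budget can only match or improve the result here, since both runs share a seed and the longer run repeats the shorter one's trials before continuing. The business question is whether the improvement is worth the extra waiting time.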

Often the best solution now is to leverage a data science platform that uses AutoML in certain areas (often called Semi-AutoML, as described earlier). By automating certain steps, you can concentrate your computational power where gains in efficiency and reach are not compromised by a reduction in accuracy. This helps marketers tap into the power of AutoML but only where it is really beneficial to the business’s goals. Otherwise you are just using technology for technology’s sake.

AutoML has many benefits for enterprises, and especially for marketers, but only if used correctly. Only by taking a realistic look at how it works, what it involves and what it can do for your organization can you really leverage it to its full potential.