How Does autoML Work?

Original article was published by smartboost on Artificial Intelligence on Medium


Data-driven marketing is all the rage today, and rightfully so. With advancements in AI, like automated machine learning, businesses can now access techniques, insights and models that previously required specialized expertise. From improving efficiency and increasing revenue to giving your company the competitive edge, read on to find out about the future of machine learning and how it can transform your business.

Before jumping into how autoML works, let’s look at a definition.

What is Automated Machine Learning (autoML)?

Automated machine learning (AutoML) is one of the newest tools in the field of artificial intelligence. AutoML automates the end-to-end process of applying machine learning to real-world problems. It covers the whole process, from the raw dataset to the deployable machine learning model.

Automating the end-to-end process of machine learning offers many advantages, including quick and simple solutions that can outperform traditional hand-designed models. It also reduces the manual, repetitive, and time-consuming tasks that data scientists perform. Automation in machine learning is a gift to non-experts because they can use machine learning models and techniques without being professionals in the field.

How Does autoML Work?

Implementing a machine learning model consists of many steps. With automated ML, we can easily reduce those steps. Traditional machine learning involves the following steps:

Data Acquisition

This is the first step of the ML process. Data is collected from various sources, such as files and databases, and merged into a single dataset.
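As a rough illustration of this step, the sketch below merges rows from two sources into one list using only the Python standard library. The in-memory CSV text, the `db_rows` list, and the `acquire` helper are hypothetical stand-ins for real files and database queries.

```python
import csv
import io

# Hypothetical sources: a CSV export and rows already fetched from a database.
csv_source = io.StringIO("customer_id,age\n1,34\n2,45\n")
db_rows = [{"customer_id": "3", "age": "29"}]

def acquire(csv_file, extra_rows):
    """Collect rows from every source into one list of dicts."""
    rows = list(csv.DictReader(csv_file))
    rows.extend(extra_rows)
    return rows

merged = acquire(csv_source, db_rows)
print(len(merged), "rows collected")
```

In a real project the sources would be actual files, queries, or APIs, but the outcome is the same: one merged dataset that the later steps can work on.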

Data Preprocessing

Before data can be used for training, it needs some processing. This includes removing duplicates, handling missing values, checking for data leakage, and removing noisy data.
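Two of these cleaning tasks, removing duplicates and filling in missing values, can be sketched in plain Python. The sample rows and the choice of mean imputation are assumptions made for the example; other strategies may suit a given dataset better.

```python
from statistics import mean

raw = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 1, "age": 34, "income": 52000},    # exact duplicate
    {"id": 2, "age": None, "income": 61000},  # missing age
    {"id": 3, "age": 29, "income": 48000},
]

def drop_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def fill_missing(rows, column):
    """Replace missing values in one column with the mean of the known values."""
    fill = mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

clean = fill_missing(drop_duplicates(raw), "age")
print(clean)
```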

Feature Engineering

The goal of feature engineering is to turn raw data into features a model can use, for example by converting categorical and ordinal data into numerical features. We also do feature selection at this step.
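The conversion of categorical and ordinal data into numbers can be sketched as follows. The `size` and `color` columns are invented for the example: `size` is ordinal, so it maps to ordered integers, while `color` has no natural order, so it is one-hot encoded.

```python
# Ordinal feature: the order of the categories carries meaning.
SIZE_ORDER = {"small": 0, "medium": 1, "large": 2}

def one_hot(value, categories):
    """Categorical feature: one 0/1 column per category."""
    return [1 if value == c else 0 for c in categories]

def encode(row, colors):
    return [SIZE_ORDER[row["size"]]] + one_hot(row["color"], colors)

rows = [{"size": "small", "color": "red"}, {"size": "large", "color": "blue"}]
colors = sorted({r["color"] for r in rows})  # ['blue', 'red']
features = [encode(r, colors) for r in rows]
print(features)  # [[0, 0, 1], [2, 1, 0]]
```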

Data Modeling

Selecting the correct model is crucial. Research needs to be done to finalize which model will work best for the dataset. At this step, the model is trained, interpreted, and evaluated for best performance.
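One simple way to compare candidate models is to train each on the same data and score it on held-out examples. The sketch below does this for two toy classifiers on made-up one-dimensional data; the models, data, and accuracy metric are illustrative assumptions, not a recommendation.

```python
def majority_model(train):
    """Baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    top = max(set(labels), key=labels.count)
    return lambda x: top

def nearest_neighbor_model(train):
    """Predict the label of the closest training point."""
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

train = [(1.0, "low"), (2.0, "low"), (3.0, "low"), (8.0, "high"), (9.0, "high")]
valid = [(1.5, "low"), (8.5, "high"), (7.5, "high"), (2.5, "low")]

candidates = {"majority": majority_model, "1-nn": nearest_neighbor_model}
scores = {name: accuracy(build(train), valid) for name, build in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```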

Hyperparameter Tuning

We use hyperparameter tuning to improve model performance by searching for the best hyperparameter values, the settings chosen before training rather than learned from the data.
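A grid search is one common way to tune. The sketch below tries several values of `k` for a tiny k-nearest-neighbors classifier and keeps the value that scores best on validation data. The data, including one deliberately noisy training point at 3.6, and the grid are assumptions chosen to keep the example small.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, valid, k):
    return sum(knn_predict(train, x, k) == y for x, y in valid) / len(valid)

train = [(1.0, "low"), (2.0, "low"), (3.0, "low"), (3.6, "high"),
         (8.0, "high"), (9.0, "high"), (10.0, "high")]
valid = [(3.5, "low"), (1.5, "low"), (9.5, "high")]

grid = [1, 3, 7]  # candidate values for the hyperparameter k
results = {k: accuracy(train, valid, k) for k in grid}
best_k = max(results, key=results.get)
print(results, "-> best k:", best_k)
```

Here `k=1` is misled by the noisy point and `k=7` always predicts the majority class, so the middle value wins.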

Prediction

At the final step, we make predictions on unseen data. The trained model answers the questions it was trained for.

AutoML leaves two main steps to the user: Data Acquisition and Prediction. All the intermediate steps are automated, as the name signifies.

AutoML takes the merged data as input and produces predictions as output. It provides models that have been optimized and are ready for prediction. The main purpose of AutoML is to reduce the burden of time-consuming and repetitive tasks for data scientists.
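To make the "data in, predictor out" idea concrete, here is a toy sketch in which a single hypothetical `auto_ml` function handles imputation, validation, and tuning internally. It is not how any particular AutoML product works. Real tools search over many model families and preprocessing options, while this one only tunes `k` for a nearest-neighbor classifier.

```python
from collections import Counter
from statistics import mean

def auto_ml(rows, k_grid=(1, 3, 5)):
    """Toy AutoML: impute, split, search over k for k-NN, return a predictor."""
    # Automated preprocessing: fill missing feature values with the mean.
    fill = mean(x for x, _ in rows if x is not None)
    data = [(fill if x is None else x, y) for x, y in rows]

    # Automated evaluation: hold out every third row for validation.
    valid = data[::3]
    train = [pair for i, pair in enumerate(data) if i % 3 != 0]

    def knn(k):
        def predict(x):
            nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
            return Counter(y for _, y in nearest).most_common(1)[0][0]
        return predict

    def score(model):
        return sum(model(x) == y for x, y in valid) / len(valid)

    # Automated tuning: keep the best-scoring candidate.
    best_k = max(k_grid, key=lambda k: score(knn(k)))
    return knn(best_k), best_k

rows = [(1.0, "low"), (2.0, "low"), (None, "low"), (3.0, "low"),
        (8.0, "high"), (9.0, "high"), (10.0, "high"), (11.0, "high"), (2.5, "low")]
predict, chosen_k = auto_ml(rows)
print(predict(1.2), predict(9.7))
```

The caller supplies only the merged data and receives a ready-to-use `predict` function, which mirrors the division of labor described above.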

Pros and Cons of autoML

Every advancement in technology brings its own benefits. Along with the pros, there come some cons, as well. Let’s discuss the pros and cons of autoML.

Pros

  1. Reduce the time it takes to implement traditional ML models
  2. Reduce human effort by automatically running repetitive tasks
  3. Reduce human errors
  4. Save a lot of GPU and CPU processing, resulting in cost and power efficiency
  5. Let anyone without ML knowledge enjoy the benefits of ML features
  6. Open the door to new platforms that provide autoML apps for easier access to machine learning

Cons

  1. Human intelligence, which can be more effective than autoML on complex problems, is neglected
  2. More emphasis on research and automating everything can lead to fewer jobs for data scientists
  3. Data scientists make some decisions, like feature engineering, on the basis of domain knowledge, which the automation process lacks
  4. AutoML mostly focuses on supervised tasks that require labeled data as input and overlooks the more challenging tasks of unsupervised and reinforcement learning

Choosing an autoML Tool

There are five key considerations when picking an autoML tool:

Ease of Use

AutoML tools should be user-friendly so that non-experts can handle them easily.

Pricing

Price is a key factor when choosing automated ML tools. High-end tools can be expensive, so not every business can take advantage of their capabilities.

Customizable

Automated machine learning tools need to be easy to customize to the project's requirements. Every dataset has its own unique features and parameters, and you must be able to customize your tools accordingly.

Key Features and Performance Capabilities

The tool's key features and performance capabilities must match the requirements of the project so that it can produce accurate predictions.

Speed and Efficiency

A fast and efficient tool is the best choice for any organization.

Different Automated Machine Learning Concepts

There are two widely used autoML concepts: Transfer Learning and Neural Architecture Search.

Transfer Learning

The idea behind transfer learning is that instead of training a model from scratch every time, we can reuse the knowledge of a model that was pre-trained on a different dataset.

In traditional machine learning, different models are trained on different datasets. But in transfer learning, the knowledge of an already-trained model is transferred to other, similar datasets. See the image below.

Transfer learning speeds up the training process, uses less computation power, and can improve the performance of deep learning models.
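The effect can be seen even in a toy setting. The sketch below is a deliberately tiny stand-in for a deep network: it fits a straight line by gradient descent on a large "source" dataset, then reuses those learned weights as the starting point for a small, related "target" dataset. The datasets, learning rate, and step counts are all assumptions made for illustration.

```python
def train_linear(data, w, b, steps, lr=0.01):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def mse(data, w, b):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

# Source task: plenty of data for y = 2x + 1.
source = [(x / 10, 2 * (x / 10) + 1) for x in range(100)]
# Target task: a related relationship (y = 2x + 1.5) with only a few examples.
target = [(1.0, 3.5), (4.0, 9.5), (7.0, 15.5)]

pretrained_w, pretrained_b = train_linear(source, 0.0, 0.0, steps=2000)

# Same small training budget on the target task, different starting points.
scratch = train_linear(target, 0.0, 0.0, steps=20)
transferred = train_linear(target, pretrained_w, pretrained_b, steps=20)

print("from scratch:", mse(target, *scratch))
print("transferred: ", mse(target, *transferred))
```

With the same 20 training steps, the run that starts from the pre-trained weights ends with a lower error on the target data than the run that starts from zero.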

Neural Architecture Search (NAS)

NAS is a subfield of automated ML. It is a process that automates the design of neural networks for a given deep learning problem. Modern deep neural networks often have hundreds of layers of different types, and the connections between layers can vary. Traditionally, we develop deep neural network architectures by trial and error, guided by experience. The task of NAS is to automate this process of identifying effective deep learning architectures.

NAS consists of three components: a search space, a search strategy, and a performance estimation strategy. In the NAS approach, the search space contains all the possible architectures (often an infinite number).

A search strategy, such as grid search or random search, selects an architecture from the search space, which is then tested by the performance estimation strategy. The performance estimation strategy returns a number that reflects how well each candidate architecture performs. Based on this performance metric, the best architecture can be selected.
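The three components can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the search space is tiny and finite, the search strategy is plain random search, and `estimate_performance` is a made-up scoring rule standing in for the expensive step of training each network and measuring its validation accuracy.

```python
import random

# Search space: every combination of these choices is a candidate architecture.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "units": [16, 32, 64, 128],
    "activation": ["relu", "tanh"],
}

def sample_architecture(rng):
    """Search strategy: draw one architecture uniformly at random."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}

def estimate_performance(arch):
    """Stand-in for training the network and measuring validation accuracy.

    Hypothetical scoring rule: it favors 3 layers of 64 units with ReLU
    so that the search has something to find.
    """
    score = 1.0 - 0.1 * abs(arch["num_layers"] - 3) - 0.001 * abs(arch["units"] - 64)
    return score + (0.02 if arch["activation"] == "relu" else 0.0)

def random_search(trials, seed=0):
    """Sample candidates and keep the one with the best estimated performance."""
    rng = random.Random(seed)
    candidates = [sample_architecture(rng) for _ in range(trials)]
    return max(candidates, key=estimate_performance)

best = random_search(trials=50)
print(best, round(estimate_performance(best), 3))
```

In practice the performance estimate dominates the cost, which is why much NAS research focuses on cheaper ways to score a candidate.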

autoML Tools

There are two types of autoML tools available: open source and commercial. Let’s discuss some widely used tools available in the market.

Open-Source autoML Tools

AutoKeras

AutoKeras is an open-source software library for automated machine learning. It is built on Keras, the widely used deep learning library, and is developed by DATA Lab at Texas A&M University and community contributors. AutoKeras provides functions to automatically search for deep learning architectures and hyperparameters.

H2O AutoML

H2O AutoML is part of H2O, an open-source machine learning platform. It automates the training and tuning of many models within a user-specified time limit.

Auto-Sklearn

Auto-Sklearn is built on scikit-learn, the widely used machine learning library. Auto-Sklearn automatically searches for the right supervised learning algorithm for a new machine learning dataset and optimizes its hyperparameters.

Amazon Lex

Amazon Lex is a managed AWS service rather than an open-source library. It provides deep learning functionality for automatic speech recognition (ASR) and natural language understanding (NLU), which can be used to build conversational bots. It uses the same technology that powers Amazon Alexa.

TPOT

TPOT is built on top of scikit-learn and is used for supervised machine learning. TPOT explores thousands of possible tree-based pipelines and finds the one that best fits the data.

Auto-PyTorch, SMAC, and Auto-WEKA are some other open-source tools. Azure Machine Learning also offers automated ML, but it is a commercial cloud service rather than an open-source tool.

Commercial Tools

Google AutoML

Google AutoML is a suite of machine learning products on the cloud that are used to train high-quality models according to the business needs. The tool can easily be used by non-experts. It relies on automated machine learning concepts like transfer learning and neural architecture search.

DataRobot and Darwin are also commercial tools.

ROI of autoML

Businesses invest in new technology to get a return on their investment, and AutoML can deliver that return in several ways.

Increase in Revenue

Every business aims to increase its revenue. Fast and accurate predictions from autoML can contribute to that increase.

Decreased Costs

A one-time investment in automated ML tools leads to a reduction in other costs like training, processing power, etc.

Save Time

One-click model deployment speeds up the ML modeling process and saves data scientists precious time.

Improves Team Efficiency

Data scientists, analysts, and engineers can work with autoML, which will eventually lead to an improvement in team efficiency.

Competitive Edge

Automation gives your business an edge over competitors because you can be more results-oriented.

Fill a Skills Gap

Automation in ML can fill a skills gap in data science, which gives data scientists time to be more productive.

The Future of autoML

AutoML is now a trend in data science, and it looks like a promising technology. With the automation of repetitive work, data scientists can spend more of their time on the research and business problems at hand.

AutoML helps fill the current skills gap in data science talent and leads to an increase in productivity. Automated ML and data scientists can work together to accelerate the machine learning process and make full use of it to reach optimum performance.

AutoML is a big part of the future of machine learning.