Original article was published on Artificial Intelligence on Medium
Five layers of a machine learning solution
On the technology layer, the system designer makes decisions about how and where to store datasets, what kind of computing device is needed to train and serve models, and the software stack the system relies on, e.g., programming languages, frameworks, and other dependencies. The design decisions for each of these topics depend on different factors:
- Storage: depends on the dataset size, data type, required latency, etc.
- Computing: depends on required processing units such as CPUs or GPUs.
- Development: depends on team expertise, toolset maturity, etc.
Like any other computational system, the decisions on each of these topics will also have to take other issues into account, such as budget limits, scalability, and maintainability. There are many options in this layer, ranging from a small system where data sits on your hard drive and is processed in a local Jupyter Notebook, to massive pipelines on multi-cloud environments.
It may seem obvious, but it’s worth stating: all machine learning solutions depend on data. Knowing what kind of data the system will handle allows the system designer to make informed decisions about adequate modeling techniques to achieve the desired results. When talking about data, the concerns are usually about:
- Data type: the way information is presented to the system. It may be on tables, texts, images, sounds, etc. Each data type presents unique challenges for which specific tools and methods are available.
- Dataset size: the amount of available data will affect decisions both on the technology and the model layer. Some models thrive on lots of data, while others are adequate for small datasets.
- Dependence: knowing if each data point in the dataset is independent or if some sort of dependence structure exists, e.g., time series and graphs, makes a big difference when choosing an adequate modeling technique.
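The three concerns above can be checked with a quick audit before any modeling starts. A minimal sketch, using an invented toy dataset (column names and values are illustrative, not from any real source):

```python
from datetime import date

# Hypothetical tabular dataset: each row is (timestamp, feature, label).
rows = [
    (date(2023, 1, 1), 0.4, "ok"),
    (date(2023, 1, 2), 0.9, "ok"),
    (date(2023, 1, 3), 5.1, "anomaly"),
]

# Data type: tabular here, with one temporal column.
col_types = {name: type(val).__name__ for name, val in zip(("ts", "x", "y"), rows[0])}

# Dataset size: the row count drives choices at the model layer (tiny here).
n_rows = len(rows)

# Dependence: rows ordered by timestamp suggest a time-series structure,
# which rules out methods that assume independent samples.
is_time_ordered = all(a[0] <= b[0] for a, b in zip(rows, rows[1:]))

print(col_types, n_rows, is_time_ordered)
```

An audit like this is often the first concrete step of the data layer: its answers feed directly into the model layer below.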
Choosing a model is selecting the space of functions, or the hypothesis set, in which we will search for the best-fitting model. The characterization of the previous layers will already inform this decision in some ways; deep learning models, for example, may require bigger datasets and specialized hardware to train. Additionally, there is a particular characteristic that is central to this layer:
- Interpretability: the need to explain a model’s predictions, or the relations between variables within it, is a critical aspect of the model choice.
The direct assessment of the coefficients of a linear model or the thresholds of a decision tree can give us information about the relationship between the variables in the dataset. Other models, such as random forests or deep neural networks, are hard to assess directly and may need additional tools to get some insight into their inner workings.
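To make the interpretability point concrete, here is a minimal sketch of fitting a line y = a·x + b by ordinary least squares on made-up numbers. The fitted slope can be read off and interpreted directly, which is exactly what a random forest or a deep network does not offer:

```python
# Toy data (invented), roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.0, 6.2, 7.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form least-squares estimates for slope and intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

# Direct interpretation: each unit increase in x adds about `slope` to y.
print(f"y = {slope:.2f}*x + {intercept:.2f}")
```

That one number, the slope, is the whole story of the variable relationship in this model; no extra explanation tooling is needed.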
The method, or learning approach, is where we define how data is used to search for a good model. It will heavily depend on the type of problems we are solving: clustering, classification, regression, control, etc. Most expositions about this aspect of machine learning tend to highlight three major approaches:
- Unsupervised learning: where we do not have a particular target variable. The usual method to tackle segmentation and association tasks.
- Supervised learning: where we have a particular target variable. The usual method to tackle classification and regression problems.
- Reinforcement learning: where an agent learns how to achieve a goal by direct interaction with an environment.
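The contrast between the first two approaches can be sketched in a few lines of plain Python on invented 1-D data: the unsupervised step segments points with no labels at all, while the supervised step uses labels to classify a new point (a simple 1-nearest-neighbour rule here, chosen for brevity):

```python
# Toy 1-D data (values are invented).
points = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]

# Unsupervised: no target variable. Segment points around two seed
# centers (the min and max) by nearest-center assignment.
c0, c1 = min(points), max(points)
clusters = [0 if abs(p - c0) < abs(p - c1) else 1 for p in points]

# Supervised: a target label exists for each point; a 1-nearest-neighbour
# rule classifies a new point from the labelled examples.
labels = ["low", "low", "low", "high", "high", "high"]

def predict(x):
    return labels[min(range(len(points)), key=lambda i: abs(points[i] - x))]

print(clusters)      # segmentation found without any labels
print(predict(2.0))  # classification driven by the labels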
Although the learning approach characterization may seem a mere formality, it is a crucial definition of a machine learning system design. The specification of an adequate learning approach for a given problem will guide all the model training setup, including its assessment methods, learning metrics, and expected results.
It is worth keeping in mind that those three are not the only approaches available. Still, once you get a clear understanding of their characteristics, it will be easier to understand other learning variations, such as semi-supervised learning, online learning, adversarial learning, etc.
Applications are designed to solve problems, regardless of the underlying technique. The application is the final product of the machine learning system design. Some well-known applications relying on machine learning nowadays are recommender systems, loan classifiers, anomaly detectors, and autonomous vehicles.
It is important to remember that virtually any of those applications could be built using hard-coded rules. Keeping this in mind works as a reality check on whether machine learning techniques are really necessary for a given solution. Software engineering has a lot of complexity by itself; relying on machine learning to build an application adds yet another layer of complexity that can lead to a significant increase in the technical debt backlog.
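As a sketch of that reality check, consider a hypothetical anomaly detector for transaction amounts (all names and numbers are invented). A hard-coded rule may be entirely sufficient, while the "learned" alternative already brings training data, monitoring, and retraining into the picture:

```python
import statistics

# Hard-coded rule: a threshold set from domain knowledge.
THRESHOLD = 100.0  # transactions above this are flagged as suspicious

def is_anomaly_rule(amount: float) -> bool:
    return amount > THRESHOLD

# Learned alternative: estimate the threshold from past data as
# mean + 3 standard deviations. Transparent rule traded for a value
# that must be fitted and kept up to date as the data drifts.
def fit_threshold(history: list[float]) -> float:
    return statistics.mean(history) + 3 * statistics.stdev(history)

history = [20.0, 35.0, 25.0, 30.0, 22.0]  # invented past transactions
learned = fit_threshold(history)
print(is_anomaly_rule(150.0), 150.0 > learned)
```

If both branches flag the same transactions in practice, the rule-based version is the cheaper system to own.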
Artificial Intelligence is a complex research field, and having a clear overall picture is very useful, if not necessary, for people dealing with this technology. The next time you face an application built using machine learning, try disentangling each of its abstraction layers to understand the designer’s choices in each one; it may improve your understanding of the system. Finally, I hope this conceptual introduction works as a simple map to help you navigate this vast field. For those wondering how to dig deeper into the subject, when in doubt, I always go back to Russell and Norvig’s excellent ‘Artificial Intelligence: A Modern Approach’.