Machine learning and its scope — for core engineers, with a cup of coffee

In its literal sense, machine learning (ML) is information/data-based learning adopted by machines to allow decision making in specific tasks. Artificial intelligence (AI), on the other hand, is information/data-based intelligence that machines “develop” to allow decision making in complex tasks. AI is thus the broader term, encompassing the concepts of machine learning as well.

Given these definitions, let’s explore the relevance of ML/AI in core engineering domains that deal with real physical, chemical or biological systems. How do machine learning and artificial intelligence apply here? In today’s world, the terms ML/AI are almost synonymous with applications such as robotics, image recognition and self-driving cars. As a result, the relevance of these essentially data-based computational methods is not perceived clearly in core industries. My objective, therefore, is to create an introductory reading material that gives professionals in traditional technical domains a clear picture of this fast-emerging field.

Coffee brewing

To make this discussion easy to interpret, let’s consider an actual, small physical system, say coffee brewing! Coffee brewing is a popular system to experiment with. Given a certain kind of coffee grain, different ways of brewing produce different kinds of coffee. For instance, coffee brewed in an espresso machine has a full-bodied, balanced bitter/acid flavour and comes with a “crema” layer. Drip coffee, on the other hand, has a wider taste profile and is usually higher in caffeine content. The two brewing methods differ in several ways: from the grind size of the coffee used, to water temperature, extraction time and the type of force applied to leach out the coffee. Thus, in order to determine the quality of coffee produced from a dynamic brewing system, it is important to understand the correlation and effects of each of these input variables.

Now if we were to systematically compute the effects of these variables on the quality of a brew, we would first need to assess, quantify and aggregate the output profiles of the coffee (e.g., its texture, caffeine, acid, oil and astringent content, sweetness, etc.). The relation between the input variables and the defined output variables can then be derived by applying fundamental heat, mass and momentum balances. Solving or transforming the resulting differential equations then yields analytical or stable numerical solutions for the output variables.
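
To make this concrete, here is a minimal sketch of what one small piece of such a white-box model could look like, assuming caffeine extraction behaves as simple first-order mass transfer. The rate coefficient and saturation concentration below are hypothetical illustrative values, not measured ones.

```python
import numpy as np
from scipy.integrate import solve_ivp

k = 0.05      # hypothetical mass-transfer rate coefficient (1/s)
c_sat = 1.2   # hypothetical saturation concentration of caffeine (g/L)

def extraction_rate(t, c):
    # dc/dt = k * (c_sat - c): extraction driven by the concentration gradient
    return k * (c_sat - c)

# Integrate over a two-minute brew, starting from pure water (c = 0)
sol = solve_ivp(extraction_rate, (0, 120), [0.0], t_eval=np.linspace(0, 120, 7))
for t, c in zip(sol.t, sol.y[0]):
    print(f"t = {t:5.1f} s, caffeine = {c:.3f} g/L")
```

Even this toy version hints at the difficulty: a real brew would couple many such equations, one per dissolved species, with temperature-dependent coefficients.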

Although this seems straightforward, it is quite challenging to comprehensively define the model from first principles. Even if this is achieved, it is difficult to determine the values of the various model parameters, such as the mass/heat transfer rate coefficients. In short, generating a purely “white-box” model is quite challenging, and the result often does not justify the computational effort.

[Figure: Mass transfer processes involved in coffee brewing]

Alternatively, it is possible to map the relation between the output variables (coffee quality) and the respective input variables in a data-based format. This brings us to the application of machine learning to the same problem. Say a set of well-designed experiments were run to generate data on coffee quality over a wide range of input-variable combinations (i.e., different brewing conditions). Ordinary linear regression may not be effective enough at capturing the underlying relations, leading to models with large biases; such models would produce inaccurate predictions under unknown or extrapolated brewing conditions. Machine learning, on the other hand, offers a suite of models that can effectively capture these nonlinear relationships: decision trees, random forests, AdaBoost or gradient-boosting regressors, support vector regressors and so on, as sketched below.
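
As a rough illustration of this gap, the sketch below fits both an ordinary linear regression and a random forest to synthetic brewing data. The three input variables and the nonlinear “quality” function are hypothetical stand-ins for real experimental measurements.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
n = 500
X = np.column_stack([
    rng.uniform(88, 96, n),    # water temperature (deg C)
    rng.uniform(0.2, 1.0, n),  # grind size (mm)
    rng.uniform(20, 240, n),   # extraction time (s)
])
# Hypothetical nonlinear quality score: peaks at a moderate temperature,
# penalizes coarse grinds, rewards longer (but saturating) extraction
y = (-(X[:, 0] - 93) ** 2 / 10 - 5 * X[:, 1]
     + 2 * np.log(X[:, 2]) + rng.normal(0, 0.5, n))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for model in (LinearRegression(), RandomForestRegressor(random_state=0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, r2_score(y_te, model.predict(X_te)))
```

On data like this, the random forest typically recovers a noticeably higher test R² than the linear fit, precisely because the underlying relation is not linear.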

[Figure: A map of machine learning techniques for different types of datasets]
[Figure: Overfitted regression model]

Unlike traditional regression techniques, machine learning algorithms such as decision trees and their boosted versions do not limit the training to a predetermined model structure (linear, polynomial, exponential, etc.). There is therefore much more freedom in determining nonlinear relationships, which is very useful (although it also makes these models prone to overfitting). The following figure depicts how a decision tree regressor would work in our case: coffee quality is regressed by generating a tree structure out of the input variables, each playing a role at a different level of the hierarchy.

[Figure: Decision tree regression]
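
A minimal sketch of that idea, reusing the same kind of hypothetical synthetic data: fit a shallow decision tree and print the learned hierarchy of splits, which is exactly the structure the figure depicts.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(42)
temp = rng.uniform(88, 96, 500)    # water temperature (deg C)
grind = rng.uniform(0.2, 1.0, 500) # grind size (mm)
brew = rng.uniform(20, 240, 500)   # extraction time (s)
X = np.column_stack([temp, grind, brew])
# Same hypothetical quality function as in the earlier sketch
y = -(temp - 93) ** 2 / 10 - 5 * grind + 2 * np.log(brew)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# Inspect the learned hierarchy of splits over the input variables
print(export_text(tree, feature_names=["temperature", "grind_size", "brew_time"]))
```

The printed tree makes the model's reasoning legible: each split is a threshold on one brewing variable, which is what makes tree-based models comparatively less black-box.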

A support vector regression model is defined as shown below. Its biggest advantage lies in its ability to apply kernel tricks, which introduce nonlinear transformations of the input feature space at little extra computational cost (see https://towardsdatascience.com/understanding-the-kernel-trick-e0bc6112ef78).

[Figure: Working principle of a support vector regression model]
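
A corresponding sketch with scikit-learn's SVR, again on hypothetical synthetic brewing data: the RBF kernel implicitly maps the inputs into a higher-dimensional feature space without ever computing that space explicitly.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Columns: temperature (deg C), grind size (mm), extraction time (s)
X = rng.uniform([88, 0.2, 20], [96, 1.0, 240], size=(500, 3))
y = -(X[:, 0] - 93) ** 2 / 10 - 5 * X[:, 1] + 2 * np.log(X[:, 2])

# Scaling matters for SVR; epsilon sets the width of the no-penalty tube
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X, y)
print("Training R^2:", model.score(X, y))
```

The standardization step is included deliberately: kernel methods are sensitive to feature magnitudes, and brewing variables live on very different scales.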

The machine learning algorithms described above are generally referred to as “classical”, and they are in some sense less black-box in nature than the suite of deep learning models built with neural networks. Why is this so? Neural networks can be viewed as powerful mathematical tools that achieve classification or prediction by stacking linear combinations of input features transformed by nonlinear functions. The model is then tuned to a specific task by backpropagating a loss function (the observed difference between true values and model-generated predictions). Unlike classical machine learning algorithms, the stacking of several layers leaves a deep learning model with very little insight into how individual features influence the final output. For our coffee brewing example, this means that despite achieving excellent predictive ability, a deep-learned brewing model cannot pinpoint the magnitude of the contribution of, say, steam pressure to the sweetness of the resulting coffee. A bare-bones sketch of this forward/backward mechanics follows the figure below.

[Figure: Deep neural networks]
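
To ground the description above, here is a bare-bones numpy sketch of a one-hidden-layer network: a linear combination of the inputs is passed through a tanh nonlinearity, and the squared-error loss is backpropagated layer by layer. The data, layer sizes and learning rate are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))        # stand-in brewing inputs
y = np.sin(X @ np.array([1.0, -2.0, 0.5]))   # hypothetical nonlinear target

W1 = rng.normal(0, 0.5, (3, 16)); b1 = np.zeros(16)   # hidden layer
W2 = rng.normal(0, 0.5, (16, 1)); b2 = np.zeros(1)    # output layer
lr = 0.05

for step in range(2000):
    # Forward pass: linear combination -> tanh nonlinearity -> linear output
    h = np.tanh(X @ W1 + b1)
    pred = (h @ W2 + b2).ravel()
    loss = np.mean((pred - y) ** 2)

    # Backpropagation: chain rule applied layer by layer
    d_pred = 2 * (pred - y)[:, None] / len(y)
    dW2 = h.T @ d_pred; db2 = d_pred.sum(0)
    d_h = d_pred @ W2.T * (1 - h ** 2)        # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_h;    db1 = d_h.sum(0)

    # Gradient descent step on all weights and biases
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("final MSE:", loss)
```

Note how the per-feature influence on the output is already smeared across sixteen hidden units after a single layer; stacking many such layers is what erodes interpretability.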

Nevertheless, the scope of deep learning models is humongous, and their application has time and again proved successful in a variety of domains, well beyond the robotics applications people immediately tend to relate to. For instance, deep learning models are applicable in the oil & gas and petrochemical industries, where operational plants continue to collect huge reserves of data on a regular basis. Manufacturing, agriculture, mining, alternative energy, health care: the list of industries that can benefit from machine learning is endless. Yet another sector that can benefit tremendously from deep learning is finance, where there is a constant motivation to tap more and more information from various sources of data. A few examples from the literature:

  • Liu, Yi, et al. “Ensemble deep kernel learning with application to quality prediction in industrial polymerization processes.” Chemometrics and Intelligent Laboratory Systems 174 (2018): 15–21.
  • Zhao, Yang, Jianping Li, and Lean Yu. “A deep learning ensemble approach for crude oil price forecasting.” Energy Economics 66 (2017): 9–16.
  • Chong, Eunsuk, Chulwoo Han, and Frank C. Park. “Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies.” Expert Systems with Applications 83 (2017): 187–205.
  • Acharya, U. Rajendra, et al. “Application of deep convolutional neural network for automated detection of myocardial infarction using ECG signals.” Information Sciences 415 (2017): 190–198.

In summary, the motivation of this post is to urge core engineers to realize the potential of machine learning and to recognize it as a host of sophisticated techniques that can be applied to tasks regularly performed in their work sphere. Adopting these data-based techniques also means learning from years’ worth of data with the powerful computational resources now available. As they say, data is the new oil! Moreover, technical expertise gained through experience is in many ways analogous to the learning a deep learning model gains from observing facts and data. Thus, if suitable models can be trained, the potential for advancement with machine learning is limitless!