Source: Deep Learning on Medium

“A breakthrough in machine learning would be worth ten Microsofts.” — Bill Gates

Machine learning can be understood in different ways.

One view is that it is part of Big Data analysis. This concept was published in 1996 as the KDD (Knowledge Discovery in Databases) process. It mostly uses statistical and data mining methods and deals with structured data represented as tables.

From another point of view, machine learning is part of Artificial Intelligence (AI): it is a set of methods that AI uses. These methods typically rely on deep learning algorithms and deal with unstructured data (images, videos, …).

One would expect the performance of a learning algorithm to improve as the amount of data increases. Traditional learning algorithms, like SVM and logistic regression, often fall short of this expectation, but with neural networks the performance of the model increases as you feed it more data.

### Brief introduction to deep learning

Like every machine learning method, deep learning takes input data X and uses it to produce (or ‘predict’) an output Y. At its core is a neural network, which finds associations between a set of inputs and outputs.

A neural network consists of nodes organized in layers: an input layer, hidden layers, and an output layer. Nodes connect to nodes in other layers to send signals, but not to nodes in the same layer.

The input layer takes data (a numerical representation of images, videos, …), the output layer gives predictions, and the hidden layers carry out the computation. Computation is an iterative process. After the input nodes pass all data through the hidden nodes to the output, the network compares the obtained values with the expected ones to evaluate the quality of the prediction. For this it uses a so-called loss function (e.g. “Mean Squared Error”). The values of this function act as an error signal: they are used to adjust the computation, and then the next iteration begins.
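The iteration described above can be sketched in a few lines of plain Python. This is a minimal sketch, not a reference implementation: the toy data, the `tanh` activation, the layer sizes, the learning rate and the names `forward`, `mean_squared_error` and `train` are all assumptions made for illustration. The adjustment step uses gradient descent with backpropagation, a common choice that the text above does not name.

```python
import math
import random

def forward(x, W1, b1, W2, b2):
    """Pass one input vector through the hidden layer to a single output node."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    output = sum(w * h for w, h in zip(W2, hidden)) + b2
    return hidden, output

def mean_squared_error(data, params):
    """Average squared difference between predicted and expected outputs."""
    return sum((forward(x, *params)[1] - y) ** 2 for x, y in data) / len(data)

def train(data, n_hidden=3, lr=0.05, iterations=1000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Small random starting weights, zero biases (an assumed initialization).
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    losses = []
    for _ in range(iterations):
        losses.append(mean_squared_error(data, (W1, b1, W2, b2)))
        # Accumulate the gradient of the loss over all examples.
        grad_W1 = [[0.0] * n_in for _ in range(n_hidden)]
        grad_b1 = [0.0] * n_hidden
        grad_W2 = [0.0] * n_hidden
        grad_b2 = 0.0
        for x, y in data:
            hidden, output = forward(x, W1, b1, W2, b2)
            d_output = 2.0 * (output - y) / len(data)  # d(loss)/d(output)
            grad_b2 += d_output
            for j, h in enumerate(hidden):
                grad_W2[j] += d_output * h
                d_hidden = d_output * W2[j] * (1.0 - h * h)  # tanh derivative
                grad_b1[j] += d_hidden
                for i, xi in enumerate(x):
                    grad_W1[j][i] += d_hidden * xi
        # Adjust every weight a small step against its gradient.
        for j in range(n_hidden):
            W2[j] -= lr * grad_W2[j]
            b1[j] -= lr * grad_b1[j]
            for i in range(n_in):
                W1[j][i] -= lr * grad_W1[j][i]
        b2 -= lr * grad_b2
    losses.append(mean_squared_error(data, (W1, b1, W2, b2)))
    return (W1, b1, W2, b2), losses

# Hypothetical toy data: the expected output is the sum of the two inputs.
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 2.0)]
params, losses = train(DATA)
print("loss before training:", losses[0])
print("loss after training: ", losses[-1])
```

Each pass through the outer loop is one iteration in the sense used above: predict, compare with the expected values, adjust.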

A neural network is called deep if it has more than one hidden layer (in practice, many more than one). The additional layers improve its ability to find complex associations in data.

### Machine learning basic concepts

Each machine learning algorithm has three basic concepts:

**Representation**: the type of knowledge the model holds (e.g. decision trees, sets of rules, instances, graphical models, neural networks, support vector machines, model ensembles and others).

**Evaluation**: the way to evaluate hypotheses (e.g. accuracy, precision and recall, squared error, likelihood, posterior probability, cost, margin, entropy, K-L divergence and others).

**Optimization**: the search process, or the way to generate candidate programs (e.g. combinatorial optimization, convex optimization, constrained optimization).
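To make the evaluation step more concrete, here is a small sketch that computes three of the measures mentioned above (accuracy, precision and recall) for a binary classifier. The label lists are made-up example data and the function names are chosen for this illustration only.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def precision(y_true, y_pred):
    """Share of predicted positives that are really positive."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(y_true, y_pred):
    """Share of real positives that the model found."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

print("accuracy: ", accuracy(y_true, y_pred))
print("precision:", precision(y_true, y_pred))
print("recall:   ", recall(y_true, y_pred))
```

The three numbers differ on the same predictions, which is why the choice of evaluation measure matters when comparing models.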

Feature engineering is often the most important phase for the success of machine learning models: the more closely we work with the business to extract relevant knowledge from the structured data, the more powerful the model will be.

### How we get machines to learn

Say we have a set of images and need the model to decide whether each image belongs to a predefined class. To do this, the model needs to learn the differences between the two categories of images (in the class and not in it) by observing examples of both, as humans do. The goal is to automatically create patterns that best describe each category. This phase is called training. In our imaginary model, a pattern is a weighted combination of inputs, so training is the estimation of the weights (also called parameters) of the model.
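As a rough illustration of "a pattern is a weighted combination of inputs", the sketch below estimates the weights with the classic perceptron learning rule. That rule is one simple option picked for this example, not something the text prescribes. Each image is assumed to be already reduced to two hypothetical numeric features, and all values and names are invented for the illustration.

```python
def score(weights, bias, features):
    """The 'pattern': a weighted combination of the inputs."""
    return sum(w * f for w, f in zip(weights, features)) + bias

def predict(weights, bias, features):
    """1 means 'belongs to the class', 0 means 'does not'."""
    return 1 if score(weights, bias, features) > 0 else 0

def train_perceptron(samples, epochs=20, lr=0.1):
    """Estimate weights by nudging them whenever a prediction is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            error = label - predict(weights, bias, features)
            if error != 0:
                weights = [w + lr * error * f for w, f in zip(weights, features)]
                bias += lr * error
    return weights, bias

# Hypothetical training set: (two image features, label).
SAMPLES = [
    ((0.9, 0.8), 1), ((0.2, 0.3), 0),
    ((0.8, 0.6), 1), ((0.1, 0.4), 0),
    ((0.7, 0.9), 1), ((0.3, 0.1), 0),
]

weights, bias = train_perceptron(SAMPLES)
print("learned weights:", weights, "bias:", bias)
print("new image (0.85, 0.7) ->", predict(weights, bias, (0.85, 0.7)))
print("new image (0.15, 0.2) ->", predict(weights, bias, (0.15, 0.2)))
```

After training, the weights and the bias are the only things the model keeps: they are the learned pattern.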

Estimation cannot be defined without an objective function (loss function) to optimize. This function is a so-called global parameter of the model, chosen according to the model's assumptions, the convergence behavior of the iterative process, or other considerations. The closer the model's prediction is to the true output, the lower the loss function, and vice versa.
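As one common example of such a function, the mean squared error can be written as follows. The symbols are notation introduced here: $n$ is the number of examples, $y_i$ the true output and $\hat{y}_i$ the model's prediction.

$$
L(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$

For true outputs $y = (1, 0, 1)$, a close prediction $\hat{y} = (0.9, 0.1, 0.8)$ gives $L = (0.01 + 0.01 + 0.04)/3 = 0.02$, while a poor prediction $\hat{y} = (0.4, 0.6, 0.5)$ gives $L = (0.36 + 0.36 + 0.25)/3 \approx 0.323$.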

**There are four types of learning:**

**Supervised learning** (also called inductive learning): The training data is labeled, i.e. it includes the desired outputs. An example is classification: predicting the predefined class of the input data.

**Unsupervised learning**: The training data does not include desired outputs. An example is clustering: dividing the input into different groups.

**Semi-supervised learning**: The training data includes a few desired outputs.

**Reinforcement learning**: The model learns from rewards that follow a sequence of actions.
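The difference between the first two types can be shown on the same toy numbers. In this sketch the supervised part uses the labels to build a nearest-centroid classifier, and the unsupervised part ignores the labels and groups the points with k-means (k = 2). Both algorithms, the data and the names are assumptions chosen to keep the example short.

```python
def fit_centroids(points, labels):
    """Supervised: average the points of each labeled class."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: sum(ps) / len(ps) for lab, ps in groups.items()}

def classify(centroids, p):
    """Assign a new point to the class with the nearest centroid."""
    return min(centroids, key=lambda lab: abs(p - centroids[lab]))

def two_means(points, iterations=10):
    """Unsupervised: split unlabeled points into two clusters."""
    centers = [min(points), max(points)]  # simple assumed starting centers
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters, centers

# Hypothetical one-dimensional data.
POINTS = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
LABELS = ["small", "small", "small", "large", "large", "large"]

# Supervised: the desired outputs (labels) are part of the training data.
centroids = fit_centroids(POINTS, LABELS)
print("class of 4.0:", classify(centroids, 4.0))

# Unsupervised: the same points, but without any labels.
clusters, centers = two_means(POINTS)
print("clusters found:", clusters)
```

The clustering recovers two groups but cannot name them. Only the labeled data tells the model what "small" and "large" mean.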

### An overview of the challenges and limitations of ML

There are two limitations of machine learning: computational power and the amount of data. They are most characteristic of deep learning domains. Tech giants like Google, Facebook, and Amazon are a good example: they have huge amounts of data (voices, videos, images, …) and large computational resources, and they have made significant progress in building and training deep learning models. The problem is not really about distributing data; cloud services can help with that. But this is an extensive approach, and its limits are reached quickly. The big challenge is to make deep learning systems more efficient and able to work with less data.

The other key problem is that current deep learning systems are not smart enough. After training they can be incredibly efficient at specific tasks in a particular domain. But no model exists yet that can work across different domains (e.g. identify objects and images, play Space Invaders, and listen to music). Another big challenge is to create a system that can learn multiple tasks.