Deep Learning Basics: A Crash Course

Source: Deep Learning on Medium

Introduction to Deep Learning

Given the universal approximation theorem, you may wonder what the point of using more than one hidden layer is. This is in no way a naive question, and for a long time neural networks were used in this way.

One reason for multiple hidden layers is that approximating a complex function might require a huge number of neurons in the hidden layer, making it impractical to use. A more important reason for using deep networks, which is not directly related to the number of hidden layers but to the level of learning, is that a deep network does not simply learn to predict output Y given input X: it also understands basic features of the input.

Let’s take a look at an example.

In a paper published in the Proceedings of the International Conference on Machine Learning (ICML) in 2009, H. Lee, R. Grosse, R. Ranganath, and A. Ng train a neural network on pictures from different categories of objects and animals. In the following image we can see how the different layers of the network learn different characteristics of the input data. In the first layer the network learns to detect some basic features, such as lines and edges, which are common to all images in all categories:

The first layer weights (top) and the second layer weights (bottom) after training

In the next layers, shown in the image below, it combines those lines and edges to compose more complex features that are specific to each category:

Columns 1–4 represent the second layer (top) and third layer (bottom) weights learned for a specific object category (class). Column 5 represents the weights learned for a mixture of four object categories (faces, cars, airplanes, and motorbikes).

In the top row, we can see how the network detects different features of each category: eyes, noses, and mouths for human faces, doors and wheels for cars, and so on. These features are abstract. That is, the network has learned the generic shape of a feature, such as a mouth or a nose, and can detect this feature in the input data despite variations it might have.

In the second row of the preceding image, we can see how the deeper layers of the network combine these features into even more complex ones, such as faces and whole cars. A strength of deep neural networks is that they can learn these high-level abstract representations themselves by deducing them from the training data.

Deep Learning Algorithms

We could define deep learning as a class of machine learning techniques where information is processed in hierarchical layers to understand representations and features from data in increasing levels of complexity. In practice, all deep learning algorithms are neural networks, which share some common basic properties. They all consist of interconnected neurons that are organized in layers. Where they differ is network architecture (the way neurons are organized in the network), and sometimes the way they are trained.

With that in mind, let’s look at the main classes of neural networks. The following list is not exhaustive, but it represents the vast majority of algorithms in use today.

Multi-Layer Perceptrons (MLPs)

A neural network with feedforward propagation, fully-connected layers, and at least one hidden layer.

The diagram demonstrates a 3-layer fully connected neural network with two hidden layers. The input layer has k input neurons, the first hidden layer has n hidden neurons, and the second hidden layer has m hidden neurons. The output, in this example, is the two classes y₁ and y₂. On top is the always-on bias neuron. A unit from one layer is connected to all units from the previous and following layers (hence fully connected).
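To make the layer structure above concrete, here is a minimal sketch of a forward pass through such a network in plain Python. The layer sizes, the random weights, and the choice of ReLU and softmax activations are assumptions made for illustration; the article does not specify them. The always-on bias neuron is represented here as a per-layer list of bias values.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out  # plays the role of the always-on bias neuron
    return weights, biases

def dense(x, weights, biases):
    """Every output unit is connected to every input unit (fully connected)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def mlp_forward(x, layers):
    """Feedforward pass: ReLU on hidden layers, softmax on the output layer."""
    *hidden, output = layers
    for weights, biases in hidden:
        x = relu(dense(x, weights, biases))
    return softmax(dense(x, *output))

rng = random.Random(0)
k, n, m = 4, 5, 3  # hypothetical sizes for the input and the two hidden layers
layers = [init_layer(k, n, rng), init_layer(n, m, rng), init_layer(m, 2, rng)]
probs = mlp_forward([0.2, -1.0, 0.5, 0.3], layers)
print(probs)  # two scores, one per class y1 and y2, that sum to 1
```

The network is untrained, so the two scores are arbitrary; the point is only to show how data flows from the k inputs through the n and m hidden units to the two outputs.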

Convolutional Neural Networks (CNNs)

A CNN is a feedforward neural network with several types of special layers. For example, convolutional layers apply a filter to the input image (or sound) by sliding that filter across the incoming signal to produce an n-dimensional activation map. There is some evidence that neurons in CNNs are organized similarly to how biological cells are organized in the visual cortex of the brain. Today, they outperform most other ML algorithms on a large number of computer vision and natural language processing tasks.
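The sliding-filter idea can be sketched in a few lines of plain Python. This is an illustrative sketch, not how a production CNN is implemented: it assumes a single-channel image, no padding, and a stride of 1, and the filter values are written by hand rather than learned. Like most deep learning libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the activation map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A hand-written vertical edge filter; in a real CNN these weights are learned.
kernel = [
    [-1, 1],
    [-1, 1],
]
activation_map = conv2d(image, kernel)
print(activation_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The activation map is non-zero only where the filter sits on top of the edge, which is the sense in which a convolutional layer "detects" a feature at a given location.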

Recurrent Neural Networks (RNNs)

This type of network has an internal state (or memory), which is based on all or part of the input data already fed to the network. The output of a recurrent network is a combination of its internal state (memory of previous inputs) and the latest input sample. At the same time, the internal state changes, to incorporate newly input data. Because of these properties, recurrent networks are good candidates for tasks that work on sequential data, such as text or time-series data.
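As a rough illustration of the internal state, the sketch below implements one simple recurrent update in plain Python. The tanh activation, the weight values, and the choice to treat the new state itself as the output are all assumptions made for this example. The same final input produces different results depending on what the network saw earlier.

```python
import math

def rnn_step(state, x, w_state, w_input, bias):
    """Combine the previous state with the latest input to produce the new state."""
    return [
        math.tanh(
            sum(ws * s for ws, s in zip(w_state[i], state))
            + sum(wi * xi for wi, xi in zip(w_input[i], x))
            + bias[i]
        )
        for i in range(len(state))
    ]

def run_rnn(sequence, w_state, w_input, bias):
    """Feed a sequence one sample at a time, carrying the state forward."""
    state = [0.0] * len(bias)
    states = []
    for x in sequence:
        state = rnn_step(state, x, w_state, w_input, bias)
        states.append(state)
    return states

# Hypothetical weights: 2 state units, 1 input feature.
w_state = [[0.5, -0.3], [0.2, 0.4]]
w_input = [[1.0], [-1.0]]
bias = [0.0, 0.0]

# Both sequences end with the same input, but their histories differ.
final_a = run_rnn([[1.0], [0.0]], w_state, w_input, bias)[-1]
final_b = run_rnn([[0.0], [0.0]], w_state, w_input, bias)[-1]
print(final_a, final_b)
```

Here `final_b` stays at zero because the network never saw a non-zero input, while `final_a` still carries a trace of the earlier `1.0`.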

Autoencoders

Autoencoders are a class of unsupervised learning algorithms in which the output has the same shape as the input, which allows the network to better learn basic representations of the data. An autoencoder consists of input, hidden (or bottleneck), and output layers. Although it’s a single network, we can think of it as a virtual composition of two components:

  • Encoder: Maps the input data to the network’s internal representation.
  • Decoder: Tries to reconstruct the input from the network’s internal data representation.
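The encoder and decoder split can be sketched with a deliberately tiny example: a linear autoencoder that squeezes 2-D points through a 1-D bottleneck. Everything specific here (the toy data, the linear layers without biases, the learning rate, and the use of plain stochastic gradient descent) is an assumption chosen to keep the sketch short. Note that the training target is the input itself, so no labels are needed.

```python
def encode(x, w_enc):
    """Encoder: map a 2-D input to a 1-D internal representation (the bottleneck)."""
    return sum(w * xi for w, xi in zip(w_enc, x))

def decode(z, w_dec):
    """Decoder: try to reconstruct the 2-D input from the 1-D code."""
    return [w * z for w in w_dec]

def reconstruction_loss(data, w_enc, w_dec):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        x_hat = decode(encode(x, w_enc), w_dec)
        total += sum((h - xi) ** 2 for h, xi in zip(x_hat, x))
    return total / len(data)

def train(data, w_enc, w_dec, lr=0.02, epochs=200):
    """Stochastic gradient descent on the squared reconstruction error."""
    w_enc, w_dec = list(w_enc), list(w_dec)
    for _ in range(epochs):
        for x in data:
            z = encode(x, w_enc)
            err = [h - xi for h, xi in zip(decode(z, w_dec), x)]
            # Gradient of the loss with respect to the code z (uses the old decoder).
            grad_z = sum(2 * e * w for e, w in zip(err, w_dec))
            w_dec = [w - lr * 2 * e * z for w, e in zip(w_dec, err)]
            w_enc = [w - lr * grad_z * xi for w, xi in zip(w_enc, x)]
    return w_enc, w_dec

# Toy data: points on the line y = 2x, so one number is enough to describe each point.
data = [[t, 2 * t] for t in (-1.0, -0.5, 0.5, 1.0)]
w_enc0, w_dec0 = [0.1, 0.1], [0.1, 0.1]
initial_loss = reconstruction_loss(data, w_enc0, w_dec0)
w_enc, w_dec = train(data, w_enc0, w_dec0)
final_loss = reconstruction_loss(data, w_enc, w_dec)
print(initial_loss, final_loss)
```

Because the toy data really is one-dimensional, the bottleneck loses nothing and the reconstruction error should shrink toward zero; on real data the bottleneck forces the network to keep only the most useful features.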

Reinforcement Learning (RL)

Reinforcement learning algorithms learn how to achieve a complex objective over many steps by receiving penalties when they make a wrong decision and rewards when they make a correct one. It is a method often used to teach a machine how to interact with an environment, similar to the way human behaviour is shaped by negative and positive feedback. RL is often used in building computer games and autonomous vehicles.
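The reward-and-penalty loop can be illustrated with tabular Q-learning, one common RL algorithm that the article does not name; it is used here only as a convenient example. The environment is a hypothetical five-cell corridor: the agent starts in the middle, earns a reward for reaching the right end, and is penalized for reaching the left end. All hyperparameters are assumptions.

```python
import random

N_STATES = 5                 # corridor positions 0..4
START, PIT, GOAL = 2, 0, 4
ACTIONS = (-1, +1)           # step left or step right

def step(state, action):
    """Environment: return (next_state, reward, done)."""
    nxt = state + action
    if nxt == GOAL:
        return nxt, 1.0, True     # reward for a good outcome
    if nxt == PIT:
        return nxt, -1.0, True    # penalty for a bad outcome
    return nxt, 0.0, False

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore: try a random action
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                # exploit: pick the best-known action, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            future = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Q-learning update: move the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = nxt
    return q

q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in (1, 2, 3)}
print(policy)
```

After training, the learned policy should prefer stepping right from every interior cell, since that path leads to the reward and away from the penalty.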

Applications of Deep Learning

Machine learning, particularly deep learning, is producing more and more astonishing results in terms of the quality of predictions, feature detection, and classification. Many of these recent results have even made the news! Here are just a few of the ways that these techniques can be applied today or in the near future:

Autonomous Vehicles

Nowadays, new cars have a suite of safety and convenience features that aim to make the driving experience safer and less stressful. One such feature is automated emergency braking if the car sees an obstacle. Another one is lane-keeping assist, which allows the vehicle to stay in its current lane without the driver needing to make corrections with the steering wheel. To recognize lane markings, other vehicles, pedestrians, and cyclists, these systems use a forward-facing camera. We can speculate that future autonomous vehicles will also use deep networks for computer vision.

Image and Text Recognition

Both Google’s Vision API and Amazon’s Rekognition services use deep learning models to provide various computer vision capabilities. These include recognizing and detecting objects and scenes in images, text recognition, face recognition, and so on.

Medical Imaging

Medical imaging is an umbrella term for various non-invasive methods of creating visual representations of the inside of the body. Examples include magnetic resonance imaging (MRI), ultrasound, computed axial tomography (CAT) scans, X-rays, and histology images. Typically, such an image is analyzed by a medical professional to determine the patient’s condition. Machine learning, and computer vision in particular, is enabling computer-aided diagnosis, which can help specialists by detecting and highlighting important features of images.

For example, to determine the degree of malignancy of colon cancer, a pathologist would have to analyze the morphology of the glands using histology imaging. This is a challenging task because morphology can vary greatly. A deep neural network could segment the glands from the image automatically, leaving the pathologist to verify the results. This would reduce the time needed for analysis, making it cheaper and more accessible.

Medical History Analysis

Another medical area that could benefit from deep learning is the analysis of medical history records. Before a doctor diagnoses a condition and prescribes treatment, they consult the patient’s medical history for additional insight. A deep learning algorithm could extract the most relevant and important information from those extensive records, even if they are handwritten. In this way, the doctor’s job can be made easier while also reducing the risk of errors.

Language Translation

Google’s Neural Machine Translation API uses — you guessed it — deep neural networks for machine translation.

Speech Recognition and Generation

Google Duplex is another impressive real-world demonstration of deep learning. It’s a new system that can carry out natural conversations over the phone. For example, it can make restaurant reservations on a user’s behalf. It uses deep neural networks both to understand the conversation and to generate realistic, human-like replies.

Siri, Google Assistant, and Amazon Alexa also rely on deep networks for speech recognition.

Gaming

Finally, AlphaGo is an artificial intelligence (AI) machine based on deep learning that made the news in March 2016 for beating the world Go champion, Lee Sedol. AlphaGo had already made the news in January 2016, when it beat the European champion, Fan Hui, although at the time it seemed unlikely that it could go on to beat the world champion. Fast-forward a couple of months and AlphaGo achieved this remarkable feat by winning the five-game series 4–1.

This was an important milestone because Go has many more possible game variations than other games, such as chess, and it’s impossible to consider every possible move in advance. Also, unlike chess, in Go it’s very difficult to even judge the current position or value of a single stone on the board. In 2017, DeepMind released an updated version of AlphaGo called AlphaZero.