Original article was published on Deep Learning on Medium

In this series of posts, we will study the concept of similarity learning applied to different types of data in different contexts:

- **Text data/Sequence data** (e.g. for sentence similarities)
- **Image data** (e.g. for object similarities, face recognition…)
- **Multimodal data** (e.g. for retail product comparison, comprising a title and an image)

In this article, I will go through my take on the general concept of Similarity Learning, the processes it involves, and how it can be summarized. I will then apply these concepts to the context of **question similarity**.

# Table of Contents

- Overview of Similarity Learning
- Text Similarity Learning
- Source code (PyTorch implementation)

# 1. Overview of Deep Similarity Learning

Whenever one does similarity learning, the same overall process is performed. It revolves around three main concepts:

- **Transformation** of the data into a vector of features
- **Comparison** of the vectors using a distance metric
- **Classification** of the distance as *similar* or *dissimilar*

## 1.a Transformation through an Encoder

In most Deep Learning tasks, the first layers of a model form what is sometimes referred to as "*an encoding phase*": they have the role of extracting relevant features from the input data.

For the rest of the article, we will denote the encoding function *f*, mapping an input *x* to its latent vector *f(x)*.

This encoder can take, depending on the input, different forms, amongst which we find:

- **RNN layers** for encoding and comparison of **sequences**;
- **CNN layers** for **temporal**/**spatial** data (1D convolutions can be used for **sequences** as well).

Usually, **after** the input data has been reduced to a vector by these **encoders**, we stack **fully-connected layers** to **classify** the extracted **features**. In our case, we instead use this **vector** as a dimensionally reduced version of our data to **compute distances** with other pieces of data: it is much easier to quantify how different two vectors are than how different, say, two sentences are.

To sum up, an encoder uses a combination of layers suited to its input data to generate the data's **latent representation**: a compressed, non-human-interpretable vector of information.
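As a minimal sketch of such an encoder in PyTorch (class name, vocabulary size, and dimensions are all illustrative assumptions, not the article's actual model):

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    """Illustrative encoder: maps a padded sequence of token ids
    to a single fixed-size latent vector."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.rnn(embedded)    # hidden: (1, batch, hidden_dim)
        return hidden.squeeze(0)               # (batch, hidden_dim) latent vector

encoder = SentenceEncoder()
latent = encoder(torch.randint(0, 10000, (2, 12)))  # two sequences of 12 tokens
print(latent.shape)  # torch.Size([2, 64])
```

Any of the layer types listed above could replace the LSTM here; what matters is that the output is a fixed-size vector.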

Throughout Deep Learning history, multiple types of architectures have been created to generate latent vectors. Some of them were:

- Siamese Neural Networks *(Koch, Zemel and Salakhutdinov, 2015)*
- Multimodal Autoencoders *(Silberer and Lapata, 2015)*

We will explore Siamese Neural Networks in more depth later in this article.
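The core idea of a Siamese network is simple enough to sketch now: the *same* encoder, with shared weights, is applied to both inputs, producing two latent vectors to compare. This toy wrapper is an illustrative sketch, not the architecture of the cited paper:

```python
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    """Illustrative Siamese wrapper: one shared encoder, two inputs."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder  # shared weights across both branches

    def forward(self, x1, x2):
        # Both inputs pass through the exact same parameters
        return self.encoder(x1), self.encoder(x2)

# Toy encoder standing in for any encoder architecture
encoder = nn.Sequential(nn.Linear(10, 4), nn.ReLU())
siamese = SiameseNet(encoder)
v1, v2 = siamese(torch.randn(3, 10), torch.randn(3, 10))
print(v1.shape, v2.shape)  # both (3, 4): two latent vectors ready to compare
```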

## 1.b Distance calculation

Once we have our vectorized input data, we can compare the two vectors using a distance function. The most popular distances are:

- The **Manhattan** distance
- The **Euclidean** distance

Once the distance is calculated, we could set a **threshold** above which we consider two pieces of data *dissimilar*, and below which we consider them *similar*.
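For two latent vectors, both distances are a one-liner; a small worked example (vector values and the threshold are made up for illustration):

```python
import torch

v1 = torch.tensor([1.0, 2.0, 3.0])
v2 = torch.tensor([2.0, 4.0, 1.0])

manhattan = torch.sum(torch.abs(v1 - v2))          # |1-2| + |2-4| + |3-1| = 5
euclidean = torch.sqrt(torch.sum((v1 - v2) ** 2))  # sqrt(1 + 4 + 4) = 3
print(manhattan.item(), euclidean.item())  # 5.0 3.0

# A fixed threshold would then label the pair:
threshold = 4.0
print("similar" if euclidean.item() < threshold else "dissimilar")  # similar
```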

## 1.c Distance classification

However, depending on the input data, setting this threshold by hand might be complex or time-consuming. Instead, we can use another classifier that, **given an input distance, decides whether it is the distance between similar or dissimilar objects**. My choice was a logistic regression classifier: finding a linear separation in our data corresponds to learning the threshold that classifies our distances.
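A minimal sketch of this idea, fitting a single-feature logistic regression in PyTorch (the distances and labels below are a made-up toy dataset, not the article's data):

```python
import torch
import torch.nn as nn

# Toy dataset: pairwise distances, labeled 1 = similar, 0 = dissimilar
distances = torch.tensor([[0.5], [0.8], [1.1], [2.9], [3.4], [4.0]])
labels = torch.tensor([[1.0], [1.0], [1.0], [0.0], [0.0], [0.0]])

# Logistic regression on one feature: sigmoid(w * d + b)
model = nn.Sequential(nn.Linear(1, 1), nn.Sigmoid())
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
loss_fn = nn.BCELoss()

for _ in range(1000):
    optimizer.zero_grad()
    loss = loss_fn(model(distances), labels)
    loss.backward()
    optimizer.step()

# The decision boundary (p = 0.5) is the threshold we avoided hand-tuning
w, b = model[0].weight.item(), model[0].bias.item()
print(f"learned threshold: {-b / w:.2f}")  # should fall between the two classes
```

Learning the boundary rather than tuning it by hand is exactly the point: the classifier absorbs the threshold-picking step into training.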