Source: Deep Learning on Medium

### What are Siamese networks?

A Siamese network is a special type of neural network and one of the simplest and most widely used one-shot learning algorithms.

One-shot learning is a technique in which we learn from only one training example per class.

So, Siamese networks are predominantly used in applications where we don't have many data points in each class.
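To make one-shot learning concrete, here is a minimal NumPy sketch of how a similarity function allows classification from a single example per class: each class is represented by one reference example, and a query is assigned to the class whose reference embedding is closest. The `embed` function is a hypothetical stand-in (a fixed random projection) for a trained network, and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))  # hypothetical embedding weights, a stand-in for a trained network


def embed(x):
    """Map a raw input vector of length 8 to a 4-dimensional embedding."""
    return W @ x


def one_shot_classify(query, support):
    """Return the label whose single support example is closest to the query.

    support: dict mapping each class label to exactly one example of that class.
    """
    q = embed(query)
    distances = {label: np.linalg.norm(q - embed(x)) for label, x in support.items()}
    return min(distances, key=distances.get)


# One synthetic example per class ("one-shot")
support = {"alice": rng.normal(size=8), "bob": rng.normal(size=8)}
# A query that is a slightly perturbed copy of bob's example
query = support["bob"] + 0.01 * rng.normal(size=8)
print(one_shot_classify(query, support))
```

Adding a new class only requires adding one entry to `support`; nothing is retrained, which is why a good similarity function suits low-data settings.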

### Why use Siamese networks?

For instance, let's say we want to build a face recognition model for our organization, where about 500 people work. If we wanted to build this model from scratch using a **Convolutional Neural Network** (**CNN**), we would need many images of each of these 500 people to train the network and attain good accuracy. Realistically, we will not have many images of each person, so it is not feasible to build the model with a CNN or any other deep learning algorithm that needs plenty of data points per class. In these kinds of scenarios, we can resort to a one-shot learning algorithm such as a Siamese network, which can learn from fewer data points.

### How do Siamese networks work?

Siamese networks consist of two symmetrical neural networks that share the same weights and architecture and are joined together at the end by an energy function, **E**. The objective of a Siamese network is to learn whether two inputs are similar or dissimilar. For example, given two images, **X1** and **X2**, we want to learn whether the two images are similar or dissimilar.

Siamese networks are not only used for face recognition; they are also used extensively in applications where we don't have many data points and in tasks where we need to learn the similarity between two inputs. Applications of Siamese networks include signature verification, similar-question retrieval, object tracking, and more. We will study Siamese networks in detail in the upcoming section.

### Architecture of Siamese networks

As you can see in the preceding diagram, a Siamese network consists of two identical networks that share the same weights and architecture. Let's say we have two inputs, **X1** and **X2**. We feed input **X1** to Network A, that is, **f_W(X1)**, and input **X2** to Network B, that is, **f_W(X2)**. Both networks have the same weights, **W**, and they generate embeddings for our inputs **X1** and **X2**. We then feed these embeddings to the energy function, **E**, which measures how similar the two inputs are (with a distance-based energy function, a smaller value means the inputs are more similar).

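Using the Euclidean distance between the two embeddings as the energy function, this can be expressed as follows:

$$E_W(X_1, X_2) = \lVert f_W(X_1) - f_W(X_2) \rVert$$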

The input to a Siamese network consists of pairs, **(X1, X2)**, along with a binary label, **Y ∈ {0, 1}**, stating whether the input pair is a genuine pair (same) or an impostor pair (different). As you can see in the following table, the inputs are sentence pairs, and the label indicates whether each pair is genuine (1) or impostor (0):

In short, a Siamese network learns by measuring the similarity between two inputs with two identical, weight-sharing networks. It is one of the most commonly used few-shot learning algorithms for tasks that involve computing the similarity between two entities, and it offers a practical solution to low-data problems.
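To make the architecture concrete, here is a minimal NumPy sketch under simplifying assumptions: the shared network f_W is a hypothetical single linear layer with ReLU (not a CNN), the sizes are arbitrary, and the energy function E is the Euclidean distance. Both branches call the same function with the same matrix `W`, which is exactly what weight sharing means.

```python
import numpy as np

rng = np.random.default_rng(42)
W = rng.normal(size=(3, 5))  # ONE weight matrix, shared by both branches (hypothetical sizes)


def f_w(x):
    """Shared embedding network: one linear layer followed by ReLU (illustrative only)."""
    return np.maximum(W @ x, 0.0)


def energy(x1, x2):
    """Energy function E: Euclidean distance between the two embeddings."""
    return np.linalg.norm(f_w(x1) - f_w(x2))


x1 = rng.normal(size=5)
x2 = rng.normal(size=5)
print(energy(x1, x2))  # a non-negative distance; smaller means more similar
print(energy(x1, x1))  # 0.0: identical inputs map to identical embeddings
```

Because the two branches share `W`, the energy is symmetric (`energy(x1, x2) == energy(x2, x1)`) and identical inputs always get zero distance, properties that two independently weighted networks would not guarantee.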

### Face recognition using Siamese networks

We will now build a Siamese network for face recognition. The objective of our network is to determine whether two faces are similar or dissimilar. We use the AT&T Database of Faces, which can be downloaded from here: https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.

Once you have downloaded and extracted the archive, you can see the folders s1, s2, up to s40, as shown here:

Each of these folders has 10 different images of a single person taken from various angles. For instance, let’s open folder s1. As you can see, there are 10 different images of a single person:

We open and check folder s13:

So, we will take two images at random from the same folder and mark them as a genuine pair, and we will take one image each from two different folders and mark them as an impostor pair.
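The sampling rule above can be sketched in plain Python before any images are loaded. This minimal sketch only draws zero-based (person, image) indices and labels, assuming the 40-folder, 10-images-per-folder layout described above; the helper names `sample_genuine_pair`, `sample_impostor_pair`, and `to_path` are hypothetical.

```python
import random

NUM_PEOPLE = 40         # folders s1 ... s40
IMAGES_PER_PERSON = 10  # images 1.pgm ... 10.pgm in each folder


def sample_genuine_pair(rng):
    """Two different images of the same person, labeled 1."""
    person = rng.randrange(NUM_PEOPLE)
    img1, img2 = rng.sample(range(IMAGES_PER_PERSON), 2)  # two distinct image indices
    return (person, img1), (person, img2), 1


def sample_impostor_pair(rng):
    """One image each from two different people, labeled 0."""
    person1, person2 = rng.sample(range(NUM_PEOPLE), 2)  # two distinct people
    return ((person1, rng.randrange(IMAGES_PER_PERSON)),
            (person2, rng.randrange(IMAGES_PER_PERSON)), 0)


def to_path(person, img):
    """Convert zero-based indices to the dataset's one-based file layout."""
    return f"data/orl_faces/s{person + 1}/{img + 1}.pgm"


rng = random.Random(0)
a, b, label = sample_genuine_pair(rng)
print(to_path(*a), to_path(*b), label)
```

Using `rng.sample` guarantees distinct indices in one call, instead of redrawing in a loop until they differ.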

First, we will import the required libraries:

```python
import re

import numpy as np
from PIL import Image
from sklearn.model_selection import train_test_split

import keras
from keras import ops  # Keras 3: backend-agnostic tensor operations
from keras.layers import Activation
from keras.layers import Input, Lambda, Dense, Dropout, Conv2D, MaxPooling2D, Flatten
from keras.models import Sequential, Model
from keras.optimizers import RMSprop
```

Now, we define a function for reading our input images. The read_image function takes the path of a PGM (portable graymap) image file and returns its pixels as a NumPy array:

```python
def read_image(filename, byteorder='>'):
    # first, we read the image as a raw file into a buffer
    with open(filename, 'rb') as f:
        buffer = f.read()

    # using a regex, we extract the header, width, height and maxval of the image
    header, width, height, maxval = re.search(
        rb"(^P5\s(?:\s*#.*[\r\n])*"
        rb"(\d+)\s(?:\s*#.*[\r\n])*"
        rb"(\d+)\s(?:\s*#.*[\r\n])*"
        rb"(\d+)\s(?:\s*#.*[\r\n]\s)*)", buffer).groups()

    # then we convert the pixel data to a NumPy array; np.frombuffer interprets
    # the buffer as a one-dimensional array, which we reshape to (height, width)
    return np.frombuffer(buffer,
                         dtype='u1' if int(maxval) < 256 else byteorder + 'u2',
                         count=int(width) * int(height),
                         offset=len(header)
                         ).reshape((int(height), int(width)))
```
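To see what read_image is parsing, here is a minimal sketch that builds a tiny binary PGM (P5) image in memory and decodes it with the same np.frombuffer call. The 3x2 image and its pixel values are hypothetical, and the header is parsed with a simple split, which assumes (unlike the regex above) that the header contains no comment lines.

```python
import numpy as np

# A hypothetical 3x2 (width x height) 8-bit binary PGM:
# magic number "P5", width, height, maxval, then one byte per pixel, row by row.
header = b"P5\n3 2\n255\n"
pixels = bytes([0, 50, 100, 150, 200, 255])
buffer = header + pixels

width, height, maxval = (int(tok) for tok in header.split()[1:4])
image = np.frombuffer(buffer,
                      dtype='u1' if maxval < 256 else '>u2',  # 1 byte per pixel when maxval < 256
                      count=width * height,
                      offset=len(header)                       # skip the ASCII header
                      ).reshape((height, width))
print(image)
# [[  0  50 100]
#  [150 200 255]]
```

The `offset` argument is what lets a single buffer hold both the text header and the raw pixel bytes.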

For example, let's open one image and check its shape:

```python
Image.open("data/orl_faces/s1/1.pgm")
```

```python
img = read_image('data/orl_faces/s1/1.pgm')
img.shape
```

```
(112, 92)
```

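Next, we define the get_data function, which generates the training pairs. For genuine pairs, it picks two different images from the same folder and stores them in x_genuine_pair with label 1 in y_genuine; for impostor pairs, it picks images with the same image number from two different folders and stores them in x_imposite_pair with label 0 in y_imposite. Each image is downsampled by keeping every size-th pixel along each axis. Finally, we concatenate x_genuine_pair and x_imposite_pair into X (scaled to [0, 1]) and y_genuine and y_imposite into Y: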

```python
size = 2
total_sample_size = 10000


def get_data(size, total_sample_size):
    # read one image to find the image dimensions
    image = read_image('data/orl_faces/s' + str(1) + '/' + str(1) + '.pgm')
    # reduce the size by keeping every `size`-th pixel along each axis
    image = image[::size, ::size]
    # get the new size
    dim1 = image.shape[0]
    dim2 = image.shape[1]

    count = 0
    # arrays of shape [total_sample_size, no_of_pairs, channels, dim1, dim2];
    # float32 halves the memory compared with the float64 default
    x_genuine_pair = np.zeros([total_sample_size, 2, 1, dim1, dim2], dtype=np.float32)  # 2 is for pairs
    y_genuine = np.zeros([total_sample_size, 1])

    for i in range(40):
        for j in range(int(total_sample_size / 40)):
            ind1 = 0
            ind2 = 0
            # pick two different images from the same directory (genuine pair)
            while ind1 == ind2:
                ind1 = np.random.randint(10)
                ind2 = np.random.randint(10)

            # read the two images and reduce their size
            img1 = read_image('data/orl_faces/s' + str(i + 1) + '/' + str(ind1 + 1) + '.pgm')
            img2 = read_image('data/orl_faces/s' + str(i + 1) + '/' + str(ind2 + 1) + '.pgm')
            img1 = img1[::size, ::size]
            img2 = img2[::size, ::size]

            # store the images in the initialized NumPy array
            x_genuine_pair[count, 0, 0, :, :] = img1
            x_genuine_pair[count, 1, 0, :, :] = img2

            # both images come from the same directory, so the label is 1 (genuine pair)
            y_genuine[count] = 1
            count += 1

    count = 0
    x_imposite_pair = np.zeros([total_sample_size, 2, 1, dim1, dim2], dtype=np.float32)
    y_imposite = np.zeros([total_sample_size, 1])

    for i in range(int(total_sample_size / 10)):
        for j in range(10):
            # pick two different directories (impostor pair)
            while True:
                ind1 = np.random.randint(40)
                ind2 = np.random.randint(40)
                if ind1 != ind2:
                    break

            img1 = read_image('data/orl_faces/s' + str(ind1 + 1) + '/' + str(j + 1) + '.pgm')
            img2 = read_image('data/orl_faces/s' + str(ind2 + 1) + '/' + str(j + 1) + '.pgm')
            img1 = img1[::size, ::size]
            img2 = img2[::size, ::size]

            x_imposite_pair[count, 0, 0, :, :] = img1
            x_imposite_pair[count, 1, 0, :, :] = img2

            # the images come from different directories, so the label is 0 (impostor pair)
            y_imposite[count] = 0
            count += 1

    # concatenate genuine and impostor pairs and scale pixel values to [0, 1]
    X = np.concatenate([x_genuine_pair, x_imposite_pair], axis=0) / 255
    Y = np.concatenate([y_genuine, y_imposite], axis=0)

    return X, Y
```

Now, we generate our data and check its size. As you can see, we have 20,000 data points: 10,000 genuine pairs and 10,000 impostor pairs:

```python
X, Y = get_data(size, total_sample_size)
X.shape
```

```
(20000, 2, 1, 56, 46)
```

```python
Y.shape
```

```
(20000, 1)
```
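The shape (20000, 2, 1, 56, 46) follows from two steps in get_data, checked here on all-zero stand-in arrays (hypothetical data, not the real faces): slicing with a step of 2 halves each side of a 112 x 92 image, and selecting index 0 or 1 on the second axis yields the input batch for each branch of the network.

```python
import numpy as np

face = np.zeros((112, 92))    # stand-in with the original image size
size = 2
small = face[::size, ::size]  # keep every 2nd row and every 2nd column
print(small.shape)  # (56, 46)

# A hypothetical batch laid out like X: (pairs, 2 images per pair, 1 channel, height, width)
X_demo = np.zeros((4, 2, 1, 56, 46))
first_images = X_demo[:, 0]   # input for one branch
second_images = X_demo[:, 1]  # input for the other branch
print(first_images.shape, second_images.shape)  # (4, 1, 56, 46) (4, 1, 56, 46)
```

This is the same slicing used later when the training pairs are fed to the model as `x_train[:, 0]` and `x_train[:, 1]`.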

Next, we split our data for training and testing with 75% training and 25% testing proportions:

```python
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=.25)
```

Now that we have generated our data, we build our Siamese network. First, we define the base network, a convolutional network used for feature extraction. It has two convolutional layers, each followed by a ReLU activation, max pooling, and dropout, and then a flatten layer and two fully connected layers that output a 50-dimensional embedding:

```python
def build_base_network(input_shape):
    seq = Sequential()
    nb_filter = [6, 12]
    kernel_size = 3

    # inputs are channels-first: (channels, height, width) = (1, 56, 46)
    seq.add(Input(shape=input_shape))

    # convolutional layer 1
    seq.add(Conv2D(nb_filter[0], (kernel_size, kernel_size),
                   padding='valid', data_format='channels_first'))
    seq.add(Activation('relu'))
    seq.add(MaxPooling2D(pool_size=(2, 2), data_format='channels_first'))
    seq.add(Dropout(.25))

    # convolutional layer 2
    seq.add(Conv2D(nb_filter[1], (kernel_size, kernel_size),
                   padding='valid', data_format='channels_first'))
    seq.add(Activation('relu'))
    seq.add(MaxPooling2D(pool_size=(2, 2), data_format='channels_first'))
    seq.add(Dropout(.25))

    # flatten and map to a 50-dimensional embedding
    seq.add(Flatten())
    seq.add(Dense(128, activation='relu'))
    seq.add(Dropout(0.1))
    seq.add(Dense(50, activation='relu'))
    return seq
```
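As a worked check of the base network's shapes, here is the arithmetic for a 56 x 46 input, assuming stride-1 'valid' convolutions with a 3 x 3 kernel and 2 x 2 max pooling whose stride equals the pool size (the Keras default), which floors odd sizes. The helper names are hypothetical.

```python
def conv_valid(n, k=3):
    """Output length of a 'valid' convolution with kernel size k and stride 1."""
    return n - k + 1


def pool(n, p=2):
    """Output length of non-overlapping max pooling (floor division)."""
    return n // p


h, w = 56, 46
h, w = pool(conv_valid(h)), pool(conv_valid(w))  # conv1 + pool: 54x44 -> 27x22
h, w = pool(conv_valid(h)), pool(conv_valid(w))  # conv2 + pool: 25x20 -> 12x10
flat = 12 * h * w                                 # 12 filters in conv2
print(h, w, flat)  # 12 10 1440
```

So the Flatten layer emits 1,440 features, which the two Dense layers compress to 128 and then to the 50-dimensional embedding.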

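Next, we feed the image pair to the base network, which returns the embeddings, that is, the feature vectors. Because both inputs pass through the same base_network instance, the two branches share the same weights: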

```python
input_dim = x_train.shape[2:]
img_a = Input(shape=input_dim)
img_b = Input(shape=input_dim)

base_network = build_base_network(input_dim)
feat_vecs_a = base_network(img_a)
feat_vecs_b = base_network(img_b)
```

feat_vecs_a and feat_vecs_b are the feature vectors of our image pair. Next, we feed these feature vectors to the energy function to compute the distance between them, and we use Euclidean distance as our energy function:

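```python
def euclidean_distance(vects):
    x, y = vects
    sum_square = ops.sum(ops.square(x - y), axis=1, keepdims=True)
    # clamp at a small epsilon so the gradient of sqrt stays finite when x == y
    return ops.sqrt(ops.maximum(sum_square, keras.backend.epsilon()))


def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)


distance = Lambda(euclidean_distance, output_shape=eucl_dist_output_shape)([feat_vecs_a, feat_vecs_b])
```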
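The distance computation is easier to inspect outside Keras. This NumPy mirror (a hypothetical `euclidean_distance_np`, without the epsilon clamp) shows why `keepdims=True` matters: summing over axis 1 keeps one column per pair, so a batch of embeddings yields shape (batch, 1), matching what eucl_dist_output_shape declares.

```python
import numpy as np


def euclidean_distance_np(x, y):
    """Row-wise Euclidean distance between two batches of embeddings.

    x, y: arrays of shape (batch, embedding_dim); returns shape (batch, 1).
    """
    return np.sqrt(np.sum(np.square(x - y), axis=1, keepdims=True))


a = np.array([[0.0, 0.0], [1.0, 1.0]])
b = np.array([[3.0, 4.0], [1.0, 1.0]])
print(euclidean_distance_np(a, b))
# [[5.]
#  [0.]]
```

The second pair is identical and gets distance 0; that is the case where the Keras version's epsilon clamp keeps the gradient of `sqrt` from becoming infinite.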

Now, we set the number of epochs to 13, use the RMSprop optimizer, and define our model, which takes the two images as inputs and outputs the distance between their embeddings:

```python
epochs = 13
rms = RMSprop()
model = Model(inputs=[img_a, img_b], outputs=distance)
```

Next, we define our loss function as the contrastive_loss function and compile the model:

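```python
def contrastive_loss(y_true, y_pred):
    margin = 1
    y_true = ops.cast(y_true, "float32")
    # genuine pairs (y_true = 1) are penalized by their squared distance;
    # impostor pairs (y_true = 0) are penalized only when closer than the margin
    return ops.mean(y_true * ops.square(y_pred)
                    + (1 - y_true) * ops.square(ops.maximum(margin - y_pred, 0)))


model.compile(loss=contrastive_loss, optimizer=rms)
```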
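To see how the contrastive loss treats each kind of pair, here is a per-pair NumPy version evaluated on three hypothetical distances with margin 1: a genuine pair pays its squared distance, an impostor pair inside the margin pays the squared shortfall, and an impostor pair beyond the margin costs nothing. (The Keras function above additionally averages these values over the batch.)

```python
import numpy as np


def contrastive_loss_np(y_true, d, margin=1.0):
    """Per-pair contrastive loss; y_true is 1 for genuine pairs and 0 for impostor pairs."""
    return y_true * d ** 2 + (1 - y_true) * np.maximum(margin - d, 0.0) ** 2


y = np.array([1.0, 0.0, 0.0])
d = np.array([0.2, 0.3, 1.5])  # hypothetical predicted distances
print(contrastive_loss_np(y, d))
# expected values (up to floating-point rounding):
# genuine at 0.2 -> 0.2**2 = 0.04
# impostor at 0.3 -> (1 - 0.3)**2 = 0.49
# impostor at 1.5 -> 0, since it is already beyond the margin
```

The margin is what stops the network from trivially pushing every pair infinitely far apart: impostor pairs stop contributing once they are at least `margin` apart.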

Now, we train our model:

```python
img_1 = x_train[:, 0]
img_2 = x_train[:, 1]

model.fit([img_1, img_2], y_train, validation_split=.25, batch_size=128, verbose=2, epochs=epochs)
```

Now, we make predictions with test data:

```python
pred = model.predict([x_test[:, 0], x_test[:, 1]])
```

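Next, we define a function for evaluating the predictions. A pair is treated as predicted similar when its distance is below 0.5; note that the function returns the fraction of those predicted-similar pairs that are actually genuine, which is closer to precision than to overall accuracy: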

```python
def compute_accuracy(predictions, labels):
    return labels[predictions.ravel() < 0.5].mean()
```

Now, we compute the score of our model on the test set:

```python
compute_accuracy(pred, y_test)
```

```
0.9779092702169625
```
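For comparison, here is a sketch of a hypothetical `compute_pair_accuracy` that scores every test pair: genuine pairs count as correct when their distance is below the threshold, impostor pairs when it is at or above it. On four made-up pairs it shows how this differs from compute_accuracy, which only looks at pairs predicted as similar (and returns NaN if there are none).

```python
import numpy as np


def compute_pair_accuracy(predictions, labels, threshold=0.5):
    """Fraction of ALL pairs classified correctly.

    A pair is predicted genuine (1) when its distance is below `threshold`.
    """
    predicted_genuine = predictions.ravel() < threshold
    return np.mean(predicted_genuine == labels.ravel().astype(bool))


def compute_accuracy(predictions, labels):
    # the function defined above: mean label among pairs predicted as similar
    return labels[predictions.ravel() < 0.5].mean()


# Hypothetical distances and labels for four pairs
pred = np.array([[0.1], [0.4], [0.9], [0.2]])
y = np.array([[1.0], [0.0], [0.0], [1.0]])
print(compute_accuracy(pred, y))       # 2 of the 3 pairs below 0.5 are genuine -> 0.666...
print(compute_pair_accuracy(pred, y))  # 3 of 4 pairs correct -> 0.75
```

Applied to the real test set, `compute_pair_accuracy(pred, y_test)` would also account for impostor pairs that the model correctly keeps far apart.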