Convolutional Neural Network on AHCD dataset


Convolutional Neural Networks (CNNs) are among the most powerful and widely used supervised learning techniques in deep learning, especially for image classification.

Handwritten character recognition systems for Arabic are still relatively rare, and they face many difficulties, including the unlimited variation in human handwriting and the need for very large datasets.

In this article we will train a simple Convolutional Neural Network on the Arabic Handwritten Characters Dataset (AHCD) to classify Arabic letters. Before we dive into building the convnet, here is a brief introduction to CNNs in case you don't know what they are and why we use them.

What are Convolutional Neural Networks :

Convolutional neural networks are very similar to regular neural networks (multi-layer perceptrons, MLPs): both are made up of neurons with learnable parameters and use a loss function to adjust them. The main difference is that ConvNets explicitly assume that the inputs are images, which vastly reduces the number of parameters and lets them learn much faster [1]. ConvNets are also very good at feature extraction: they can capture and learn features from an image at different levels of abstraction.

ConvNets vs. MLPs :

In multi-layer perceptrons, every element of one layer is connected to every element of the next layer. For example, a grayscale image of 28×28 pixels gives 28×28×1 = 784 input neurons, which is still manageable. But with large RGB or high-resolution images you quickly end up with many thousands of input neurons and far more weights, which is clearly impractical; MLPs therefore do not scale well for image classification. ConvNets solve this problem through weight sharing: a one-layer CNN with ten 3×3 filters has only 3×3×10 + 10 = 100 parameters (the +10 being the biases). By contrast, taking the previous 784 input neurons and a single hidden layer of 200 neurons, an MLP would have 784×200 + 200 = 157,000 parameters. Clearly CNNs are far less complex [2].
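If you want to check these counts yourself, here is a quick sketch using Keras (the layer sizes simply mirror the numbers above, not the AHCD model we build later):

# Quick sketch to verify the parameter counts discussed above
import tensorflow as tf

# One convolutional layer: ten 3x3 filters on a 28x28x1 grayscale input
conv = tf.keras.Sequential([
    tf.keras.layers.Conv2D(10, kernel_size=(3, 3), input_shape=(28, 28, 1))
])
print(conv.count_params())   # 3*3*10 + 10 = 100

# One fully connected hidden layer of 200 neurons on 784 inputs
mlp = tf.keras.Sequential([
    tf.keras.layers.Dense(200, input_shape=(784,))
])
print(mlp.count_params())    # 784*200 + 200 = 157,000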

Building the CNN for AHCD :

Preparing the data :

The AHCD dataset is composed of 16,800 characters written by 60 participants. The database is partitioned into two sets: a training set (13,440 characters, i.e. 480 images per class) and a test set (3,360 characters, i.e. 120 images per class). The characters are labeled from "ا" to "ي" (alif to ya). Each image is 32×32 pixels and grayscale. You can download the dataset from here.

After downloading the dataset you will have 4 CSV files, 2 for training and 2 for testing. To load them we use pandas.read_csv. The files contain raw pixel values with no header row, so we pass header=None so that the first sample is not mistaken for a header.

# Imports used throughout the article
import numpy as np
import pandas as pd
import tensorflow as tf

# Loading the dataset (the CSV files have no header row)
x_train = pd.read_csv('csvTrainImages 13440x1024.csv', header=None)
y_train = pd.read_csv('csvTrainLabel 13440x1.csv', header=None)
x_test = pd.read_csv('csvTestImages 3360x1024.csv', header=None)
y_test = pd.read_csv('csvTestLabel 3360x1.csv', header=None)

Being stored in CSV files, the images are loaded as flat rows of 1024 pixel values, i.e. with shape (13440, 1024) for the training set. We reshape them to 32×32 and add the color-channel dimension, which is 1 in our case since the images are grayscale.

# Reshape the flat rows into (num_samples, 32, 32, 1) arrays
x_train = np.asarray(x_train).reshape(x_train.shape[0], 32, 32, 1)
x_test = np.asarray(x_test).reshape(x_test.shape[0], 32, 32, 1)

Now we preprocess the data. We normalize the pixel values using min-max normalization, and we transform the labels into one-hot vectors, i.e. binary vectors with a 1 at the position of the true class and 0 everywhere else (using the keras.utils.to_categorical() function).

#scaling x in range [0,1] 
x_train_scaled = (x_train - np.min(x_train)) / (np.max(x_train) - np.min(x_train))
x_test_scaled = (x_test - np.min(x_test)) / (np.max(x_test) - np.min(x_test))
#onehot encoding for labels
y_train_labels = tf.keras.utils.to_categorical(y_train)
y_test_labels = tf.keras.utils.to_categorical(y_test)
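A side note on the label encoding: the AHCD label files number the classes from 1 to 28, so to_categorical produces vectors of length 29 (index 0 is simply never used). This is why the output layer of the model below has 29 units, and why we subtract 1 when mapping predictions back to letters later on. A quick check, assuming the labels indeed run from 1 to 28:

# Labels 1..28 -> one-hot vectors of length 29 (column 0 stays unused)
print(y_train_labels.shape)   # expected: (13440, 29)
print(y_train_labels[0])      # a single 1 at the index of the true class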

We create a list containing the Arabic characters so we can map the model's output back to letters later.

alph = list('ابتةثجحخدذرزسشصضطظعغفقكلمنهوي')

Building the first model :

After preparing and processing the data, we start building our model. We will use Keras running on a TensorFlow backend. But first we have to choose a good architecture.

The architecture I opted for is quite simple: 2 convolutional layers, each followed by a max-pooling layer, then a fully connected layer and an output layer with a softmax activation. Each convolutional layer uses 5×5 filters and ReLU as the activation function.

# Building the model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(32, kernel_size=(5,5), input_shape=(32,32,1), padding='same', activation='relu'))
model.add(tf.keras.layers.MaxPooling2D())
model.add(tf.keras.layers.Conv2D(64, kernel_size=(5,5), padding='same', activation='relu'))
model.add(tf.keras.layers.MaxPooling2D())
# Flatten the feature maps before the fully connected layers
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1024, activation='relu'))
# 29 output units: labels 1..28 plus the unused index 0 (see the note above)
model.add(tf.keras.layers.Dense(29, activation='softmax'))

Then we compile the model. We use the categorical cross-entropy loss function because this is a multi-class classification problem, and we use Adam as the optimizer since it is popular and converges quickly.

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])

Then we simply fit the model.

model.fit(x_train_scaled,y_train_labels,epochs=10,validation_data=(x_test_scaled,y_test_labels))

On a GPU, training took about 50 seconds; on a CPU, about 15 minutes.

We obtain about 93% validation accuracy and 99% training accuracy.
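These numbers can also be read programmatically: model.fit returns a History object if you assign its return value. A small sketch (this is the same fit call as above, just capturing the result):

# Capture the training history to inspect accuracy per epoch
history = model.fit(x_train_scaled, y_train_labels, epochs=10,
                    validation_data=(x_test_scaled, y_test_labels))
print(history.history['accuracy'][-1])       # final training accuracy
print(history.history['val_accuracy'][-1])   # final validation accuracy
# (on older Keras versions the keys are 'acc' and 'val_acc')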

score = model.evaluate(x_test_scaled, y_test_labels, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

On the test set:

Test loss: 0.25114767971890073
Test accuracy: 0.9446264

We get good performance, largely thanks to the simplicity of the images. We can still try to improve the model, for example by adding Dropout layers and using LeakyReLU instead of ReLU, as in the modified fully connected block below.

# Modified fully connected block: LeakyReLU activation plus Dropout
model.add(tf.keras.layers.Dense(1024))
model.add(tf.keras.layers.LeakyReLU(alpha=0.2))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(29, activation='softmax'))
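For completeness, here is one way the full modified model could look, keeping the same convolutional blocks as before. This is only a sketch: the dropout rate and the LeakyReLU slope are illustrative choices you may want to tune.

# Possible full version of the modified model (hyperparameters are illustrative)
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(32, kernel_size=(5,5), input_shape=(32,32,1), padding='same'))
model.add(tf.keras.layers.LeakyReLU(alpha=0.2))
model.add(tf.keras.layers.MaxPooling2D())
model.add(tf.keras.layers.Conv2D(64, kernel_size=(5,5), padding='same'))
model.add(tf.keras.layers.LeakyReLU(alpha=0.2))
model.add(tf.keras.layers.MaxPooling2D())
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1024))
model.add(tf.keras.layers.LeakyReLU(alpha=0.2))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(29, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])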

Let's test some images from the test dataset and see how the model performs.

import matplotlib.pyplot as plt

prediction = model.predict(x_test_scaled)

# Pick 10 random test images and display them with their predicted letters
ax = [0]*10
f, ((ax[0], ax[1], ax[2], ax[3], ax[4]), (ax[5], ax[6], ax[7], ax[8], ax[9])) = plt.subplots(2, 5)
r = np.random.randint(100, size=10)
print(r)
for i in range(10):
    ax[i].imshow(x_test[r[i]].reshape(32,32).T, cmap='binary')
    ax[i].set_title('{}'.format(i))
    print("predicted {} : ".format(i) + alph[np.argmax(prediction[r[i]]) - 1])
plt.show()

We can then save the model for future use.

# Save the architecture as JSON and the weights as HDF5
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
model.save_weights("model.h5")
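To reload the saved model later, you can do the reverse. A small sketch, using the file names from above:

# Reload the architecture and weights saved above
with open("model.json", "r") as json_file:
    loaded = tf.keras.models.model_from_json(json_file.read())
loaded.load_weights("model.h5")
loaded.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])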

And this concludes our small tutorial. This is the first article I have written; I hope it gave you a brief idea of what convnets are and why we use them.

Link to the repo: https://github.com/nassim-fox/arabic_alphabet_ahcd1_cnn

References :

[1] http://cs231n.github.io/convolutional-networks/

[2] https://www.quora.com/What-are-the-advantages-of-a-convolutional-neural-network-CNN-compared-to-a-simple-neural-network-from-the-theoretical-and-practical-perspective