Source: Deep Learning on Medium
Several months ago I watched one of my relatives teach colors to his daughter. He showed her sheets of paper in different colors, and she had to tell them apart. For humans without a color vision deficiency, the difference between colors is easy to see. With all the recent progress in machine learning, it should be just as easy for machines, right?
As we know, to a computer an image is just an array of numbers, in most cases split into three channels: red, green, and blue. I won’t explain that here; if you don’t know how computers see digital images, go here .
Even though computers see an image as raw numbers, identifying colors should be easy for them, right? For example, we know that when the red channel is between 200 and 230, green between 0 and 30, and blue between 170 and 200, we get purple. When R is 0–30, G is 230–255, and B is 0–30, we get light green, and so on. So a computer only has to find the boundaries that separate pixel values into clusters.
Note: These color ranges were all set by me. (This is how I perceive colors.)
Even simple code that calculates the average of each channel and then looks up the matching color name in a dictionary or list should do the trick. But I am more interested in whether a simple neural network can learn a pattern that lets it distinguish between colors with 100% accuracy. So, let's get started.
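Before moving on to the network, here is a rough sketch of what that averaging approach could look like. It is only an illustration, not code from the repository: the names `COLOR_RANGES`, `mean_rgb`, and `name_color` are made up for this example, and the table holds only the two ranges quoted above.

```python
from statistics import mean

# Per-channel (low, high) ranges quoted in the post; the other colors are omitted.
COLOR_RANGES = {
    "purple": ((200, 230), (0, 30), (170, 200)),
    "light green": ((0, 30), (230, 255), (0, 30)),
}


def mean_rgb(pixels):
    """Average each channel over a sequence of (r, g, b) tuples."""
    reds, greens, blues = zip(*pixels)
    return mean(reds), mean(greens), mean(blues)


def name_color(pixels, ranges=COLOR_RANGES):
    """Return the first color whose ranges contain the mean pixel, else None."""
    avg = mean_rgb(pixels)
    for name, bounds in ranges.items():
        if all(low <= value <= high for value, (low, high) in zip(avg, bounds)):
            return name
    return None


# A flat 64x64 image is just the same pixel repeated 4096 times.
purple_image = [(203, 3, 173)] * (64 * 64)
print(name_color(purple_image))
```

A lookup like this never needs training, but every range has to be written by hand, and any pixel that falls between two ranges simply gets no name.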
You can see all the code here .
I have created 64×64 images that look like this.
There are 13 different colors in the dataset, which I quickly came up with, and 10 images for each class, so there are 130 images overall.
The first step is usually data preprocessing, but in my case there are no variations to account for, because I only want to know whether a neural network can learn the pattern in my data.
I still have to take into account that all of my labels are words, which I need to convert into a form my neural net can use. To do that I use one-hot encoding. If you don’t know what a one-hot encoder does, read this.
In my case I have 13 labels. So, for example, “red” becomes a vector that looks like this: [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0].
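import os
import pickle
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# The original `def` line and a few statements were lost; the function name,
# the `y` parameter, the early return, and the pickle.dump calls are assumed.
def encode_labels(y,
                  enc1_file = 'label_encoder.pkl',
                  enc2_file = 'hot_encoder.pkl'):
    if os.path.exists(enc1_file) and os.path.exists(enc2_file):
        print("The pickle files already exist")
        return None
    label_encoder = LabelEncoder()
    encoded = label_encoder.fit_transform(y)
    encoded = encoded.reshape(len(encoded), 1)
    # Newer scikit-learn releases rename this argument to sparse_output.
    enc = OneHotEncoder(sparse=False)
    one_hot = enc.fit_transform(encoded)
    with open(enc1_file, 'wb') as write:
        pickle.dump(label_encoder, write)
    with open(enc2_file, 'wb') as write:
        pickle.dump(enc, write)
    print("Encoders saved successfully")
    return one_hot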
I create pickle files so that I can reuse these encodings, which were generated from the training data, when I test the model after training.
Now I will quickly go over the code that I’ve developed. First, we need to define the input and output for our images. As this is a very simple task, I use placeholders for both.
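For readers who have not used these encoders, the sketch below mimics in plain Python what the two scikit-learn steps do: the labels are sorted, mapped to integer indices, and turned into one-hot vectors that can be decoded again by taking the position of the largest value. The class names are placeholders, since the post does not list all 13 colors, and `fit_classes`, `one_hot`, and `decode` are hypothetical helper names.

```python
def fit_classes(labels):
    """Sorted unique labels; LabelEncoder orders its classes the same way."""
    return sorted(set(labels))


def one_hot(label, classes):
    """Vector with a single 1 at the index of `label`."""
    vector = [0] * len(classes)
    vector[classes.index(label)] = 1
    return vector


def decode(scores, classes):
    """Map a score vector (one-hot or network output) back to its label."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return classes[best]


# Placeholder labels, not the actual 13 classes from the dataset.
train_labels = ["red", "purple", "dark green", "red", "dark blue"]
classes = fit_classes(train_labels)

print(classes)
print(one_hot("red", classes))
print(decode([0.1, 0.2, 0.6, 0.1], classes))
```

The decoding step only works if the class order at test time matches the order used during training, which is the reason for saving the fitted encoders.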
X = tf.placeholder(tf.float32, shape = [None, 64, 64, 3])
y = tf.placeholder(tf.float32, shape = [None, 13])
hold_prob = tf.placeholder(tf.float32)
Here X represents the images that we feed in as input. As you can see, it has shape (None, 64, 64, 3). None stands for the batch size. The number of images I have is very low, so I train on full batches and set the batch size to the number of images in the training set (in my case 117, since 13 of the 130 images are held out for testing). You can change that parameter if you want. “hold_prob” is the probability of keeping each unit active when dropout is applied.
Next, I define all the necessary steps. I use cross-entropy as the loss function and minimize it with the Adam optimizer.
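Because the first dimension of the placeholders is None, the same graph accepts any batch size. As a small illustration of what changing that parameter means, the sketch below splits the 117 training images into index ranges; `batch_ranges` is a hypothetical helper and the batch size of 32 is just an example value.

```python
def batch_ranges(num_images, batch_size):
    """Yield (start, stop) index pairs that cover the dataset in order."""
    for start in range(0, num_images, batch_size):
        yield start, min(start + batch_size, num_images)


# Full-batch training as in the post: one step per epoch.
print(list(batch_ranges(117, 117)))

# A smaller, hypothetical batch size gives several steps, the last one shorter.
print(list(batch_ranges(117, 32)))
```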
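cnn = CNN(batch_size=117, learning_rate=0.001, shape=[64, 64, 3])  # any further arguments are cut off in the source
cnn.neural_net(X, hold_prob)  # assumed call (described in the text); stores the logits in cnn.y_prob
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y,
                                                                       logits=cnn.y_prob))
optimizer = tf.train.AdamOptimizer(learning_rate=cnn.learning_rate)
# Despite its name, `loss` is the training op that applies one Adam update.
loss = optimizer.minimize(cross_entropy)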
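Here CNN is a class I wrote, which lives in the “model.py” file; you can check it on GitHub. Next I call the neural_net function, which builds the network presented below and stores its final outputs in cnn.y_prob, which is then used in the cross-entropy loss. Despite the name, cnn.y_prob holds raw logits rather than probabilities, because softmax_cross_entropy_with_logits applies the softmax itself.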
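To make the loss concrete, here is a plain-Python sketch of the per-image quantity that softmax cross-entropy computes before it is averaged over the batch. The function name `softmax_cross_entropy` is chosen for this example only, and the logits are invented numbers, not outputs of the trained network.

```python
import math


def softmax_cross_entropy(logits, one_hot_label):
    """Cross-entropy between softmax(logits) and a one-hot label."""
    # Subtract the max logit so that exp() cannot overflow.
    shift = max(logits)
    log_sum = shift + math.log(sum(math.exp(z - shift) for z in logits))
    # log softmax_i = logit_i - log(sum_j exp(logit_j))
    return -sum(t * (z - log_sum) for t, z in zip(one_hot_label, logits))


num_classes = 13
label = [0] * num_classes
label[6] = 1  # the "red" vector from the post

# A network that scores every class equally has loss ln(13).
uniform = [0.0] * num_classes
print(softmax_cross_entropy(uniform, label))

# A confident, correct prediction drives the loss towards zero.
confident = [0.0] * num_classes
confident[6] = 10.0
print(softmax_cross_entropy(confident, label))
```

With 13 classes, a loss near ln(13) ≈ 2.56 means the network is still guessing, which gives a useful reference point when watching the training loss.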
To train on my dataset I built a simple neural net with two convolutional layers, one pooling layer, and two fully connected layers. I use ReLU as my activation function.
I used dropout layers to avoid overfitting. Here is the code:
def neural_net(self, X, hold_prob = 0.5):
    with tf.variable_scope('convolution') as scope:
        conv1 = self.conv_layer(X, [3, 3, 3, 32])
        conv2 = self.conv_layer(conv1, [5, 5, 32, 64])
    with tf.variable_scope('pooling') as scope:
        pool1 = tf.nn.max_pool(conv2,
                               ksize = [1, self.kernel_size, self.kernel_size, 1],
                               strides = [1, self.strides_size, self.strides_size, 1],
                               padding = 'VALID')
    with tf.variable_scope('fc_layer') as scope:
        flat = tf.reshape(pool1, [-1, 8 * 8 * 64])
        dropout_flat = tf.nn.dropout(flat, keep_prob=hold_prob)
        fc1 = self.fc_layer(dropout_flat, 1024)
        fc1 = tf.nn.relu(fc1)
        dropout_fc1 = tf.nn.dropout(fc1, keep_prob=hold_prob)
        fc2 = self.fc_layer(dropout_fc1, self.num_classes)
        #fc2 = tf.nn.relu(fc2)
        self.y_prob = fc2
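The reshape to 8 * 8 * 64 only works if the pooled feature map is 8×8. The sketch below checks one combination of settings that would produce that size, using TensorFlow's output-size rules for 'SAME' and 'VALID' padding. The settings themselves are assumptions, since conv_layer, kernel_size, and strides_size are defined in model.py and not shown here, and `same_out` and `valid_out` are hypothetical helper names.

```python
import math


def same_out(size, stride=1):
    """Output length of a TensorFlow 'SAME' convolution or pooling."""
    return math.ceil(size / stride)


def valid_out(size, kernel, stride):
    """Output length of a TensorFlow 'VALID' convolution or pooling."""
    return math.ceil((size - kernel + 1) / stride)


# Assumptions (not stated in the post): both convolutions use 'SAME' padding
# with stride 1, and kernel_size = strides_size = 8 for the pooling layer.
side = same_out(same_out(64))
side = valid_out(side, kernel=8, stride=8)
flat_features = side * side * 64  # 64 channels come out of conv2

print(side, flat_features)
```

Under these assumptions the first fully connected layer receives 4096 features per image, so most of the model's weights sit in that 4096-by-1024 layer.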
This is the very simple model that I decided to use. I trained it on 117 images for 1000 epochs on a GTX 1060, which took up to 4 minutes. At first I thought it would definitely give me 100% accuracy, but I was wrong. The training accuracy was 94%. I also tested it on the 13 images in the test set, and it identified 12 of them correctly, which is 92% accuracy. The one mistake was that the image I see as “dark green” was identified as “dark blue”. It might seem easy to find the set of additions and multiplications that identifies 13 colors, but training did not find it. There might be several reasons behind this:
- The training time is short.
- There are too few images.
- Each image is a single flat color, so every pixel in it has the same value. In a purple image, for example, the red, green, and blue values are identical across the whole image. To be more specific, there are 10 purple images: in the first one every pixel is (200, 0, 170), in the second one (203, 3, 173), and so on.
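The last point can be illustrated with a short sketch that builds flat images the way the purple examples are described. It is not the script used to create the dataset: `flat_image` and `class_images` are hypothetical names, and the step of 3 between consecutive images is extrapolated from the first two purple values.

```python
def flat_image(rgb, size=64):
    """size x size image (nested lists) in which every pixel equals `rgb`."""
    return [[tuple(rgb) for _ in range(size)] for _ in range(size)]


def class_images(base_rgb, count=10, step=3):
    """`count` flat images whose channels grow by `step` from one to the next."""
    return [
        flat_image([min(channel + step * i, 255) for channel in base_rgb])
        for i in range(count)
    ]


purple = class_images((200, 0, 170))

# Each image collapses to a single distinct pixel value.
distinct = [{pixel for row in image for pixel in row} for image in purple]
print(distinct[0], distinct[1], len(distinct))
```

Seen this way, each class contributes only 10 distinct points in RGB space, and the 4096 pixels per image add no extra information for the network to learn from.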
In conclusion, I wanted to test a simple CNN on a very easy task: learning to identify colors in images. It turned out to be just another classification problem that needs more training and well-structured data, as I could not reach 100% accuracy even on this simple problem with such a small training set and training time. Maybe I could have come up with a better neural network model, even one without convolutions or pooling, but the fun of this project was in testing how this type of neural net would do on this problem.
As I said in the introduction, my inspiration came from a little child who was learning to identify colors. She also struggled at first and made a lot of mistakes, and it took her more than 4 minutes to learn the color patterns. After that, how can we blame AI for not learning these simple patterns? Maybe it’s time to start thinking about improving the neural nets in our own brains :).
Thank you, everyone, for reading this post. This is my first article ever, so I wanted to publish something that is very easy to read and understand. When I have time, I plan to write about more complex problems that I’ve solved. I hope to see you again when that happens.