Learn Tensorflow 2: Introduction to Computer Vision — Codelabs

Original article can be found here (source): Artificial Intelligence on Medium

In this article I will discuss computer vision and how a computer figures out patterns from images.

Computer vision is the field of having a computer understand and label what is present in an image. Consider a picture containing items of clothing.

When you look at it, you can tell what is a shirt and what is a shoe, but how would you program that? If a person had never seen shoes, how would you explain shoes to them? It’s really difficult, if not impossible, right?

It’s the same problem with computer vision. One way to solve it is to use lots of pictures of clothing, tell the computer what each one is a picture of, and then have the computer figure out the patterns that distinguish a shoe from a shirt, a handbag, or a coat.

Fashion MNIST

Fortunately, there’s a dataset called Fashion MNIST which provides 70,000 images spread across 10 different items of clothing. These images have been scaled down to 28 by 28 pixels. Usually, the smaller the better, because the computer has less processing to do.

Even at this size you can still tell the difference between shirts, shoes, and handbags, so the images are small enough to train a neural network quickly while keeping the features that matter.

The images are also in grayscale, so the amount of information is reduced further. Each pixel is a value from 0 to 255, so it takes only one byte per pixel. With 28 by 28 pixels in an image, only 784 bytes are needed to store the entire image.

Despite that, we can still see what’s in the image, and in this case it’s an ankle boot.
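The storage arithmetic above can be checked with a short NumPy sketch. The array here is a stand-in filled with zeros, not a real Fashion MNIST image; only its shape and data type matter.

```python
import numpy as np

# A stand-in 28x28 grayscale image: one unsigned 8-bit value (0-255) per pixel.
image = np.zeros((28, 28), dtype=np.uint8)

print(image.shape)     # (28, 28)
print(image.itemsize)  # 1 byte per pixel
print(image.nbytes)    # 28 * 28 * 1 = 784 bytes
```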

Fashion MNIST Dataset


Machine learning depends on having good data to train a system with. The scenario here is training a system to recognize fashion images, using data from the Fashion MNIST dataset.

First, download the data from Kaggle. The download is a zip file, so extract its contents.


Set up the environment in Colab. Fortunately, loading the data is simple because Fashion MNIST is available as a dataset with an API call in TensorFlow. We simply declare an object for the Fashion MNIST dataset from the Keras datasets module. If we call the load_data method on this object, it returns four arrays: the training images, the training labels, the test images, and the test labels.

In the Fashion MNIST dataset, 60,000 of the 70,000 images are used to train the network, and the remaining 10,000 images, which it hasn’t previously seen, are used to test how well or how badly it performs. The load_data call gives you those two sets. Each set has data (the images themselves) and labels (what each image actually shows).

For example, the training data contains an image of an ankle boot together with a label that describes it. While the image is an ankle boot, the label describing it is the number nine.
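A minimal sketch of that loading step, assuming TensorFlow 2 is installed. The wrapper function name `load_fashion_mnist` is hypothetical; `tf.keras.datasets.fashion_mnist.load_data()` is the Keras call that returns the training and test pairs.

```python
import tensorflow as tf


def load_fashion_mnist():
    """Return ((train_images, train_labels), (test_images, test_labels)) as NumPy arrays.

    The first call downloads the dataset, so it needs network access.
    """
    fashion_mnist = tf.keras.datasets.fashion_mnist
    return fashion_mnist.load_data()


# Typical usage (left commented out so that running this sketch does not trigger a download):
# (train_images, train_labels), (test_images, test_labels) = load_fashion_mnist()
# train_images.shape is (60000, 28, 28) and test_images.shape is (10000, 28, 28)
```

The images come back as unsigned 8-bit arrays and the labels as integers from 0 to 9.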


You’ll notice that all of the values in the image are between 0 and 255. When training a neural network, it’s easier if we treat all values as between 0 and 1, mainly because training tends to behave better on small, similarly scaled inputs. This process is called ‘normalizing’. Fortunately, in Python it’s easy to normalize an array like this without looping: divide the whole array by 255.0.
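A minimal sketch of the normalization step, using a tiny stand-in array instead of the real images so it runs on its own. With the real data, the same division would be applied to the training and test image arrays.

```python
import numpy as np

# Stand-in for a few pixel values; real images hold values in the same 0-255 range.
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# Dividing the whole array by 255.0 rescales every value to the range 0-1 without a loop.
normalized = pixels / 255.0

print(normalized.min(), normalized.max())  # 0.0 1.0
```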


The model has three layers. The important ones to look at are the first and the last. The last layer has 10 neurons because we have ten classes of clothing in the dataset. These two numbers should always match. The first layer is a Flatten layer with an input shape of 28 by 28. Our images are 28 by 28, so we’re specifying the shape we expect the data to be in. Flatten takes this 28 by 28 square and turns it into a simple linear array.
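The three-layer model described above might be defined as in the sketch below. The hidden layer size of 512 is taken from Exercise 2 later in this article; the `relu` and `softmax` activations and the helper name `build_model` are assumptions, since the original code is not shown here.

```python
import tensorflow as tf


def build_model(hidden_units=512, num_classes=10):
    """Flatten 28x28 images, pass them through one hidden layer, output one value per class."""
    return tf.keras.Sequential([
        # Turns each 28x28 image into a flat vector of 784 values.
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        # Hidden layer; ReLU is an assumed activation.
        tf.keras.layers.Dense(hidden_units, activation="relu"),
        # One neuron per clothing class; softmax (assumed) turns the outputs into probabilities.
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])


model = build_model()
model.summary()
```

Newer Keras versions may warn that an explicit `Input` layer is preferred over `input_shape`; the model is built the same way in either case.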


The next thing to do, now that the model is defined, is to actually build it. You do this by compiling it with an optimizer and a loss function, and then you train it by calling model.fit, asking it to fit your training data to your training labels. In other words, it figures out the relationship between the training data and its actual labels, so that in future, if you have data that looks like the training data, it can predict what label that data should have.

Once it’s done training, you should see an accuracy value at the end of the final epoch. It might look something like 0.9527. This tells you that your neural network is about 95% accurate in classifying the training data, i.e. it figured out a pattern match between the images and the labels that worked 95% of the time. Not great, but not bad considering it was only trained for 15 epochs and finished quite quickly.
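A sketch of the compile and fit steps, under stated assumptions: the `adam` optimizer and the `sparse_categorical_crossentropy` loss are common choices for integer labels but are not confirmed by this article, and `compile_and_fit` is a hypothetical helper name. The default of 15 epochs follows the number mentioned in the text.

```python
import tensorflow as tf


def compile_and_fit(train_images, train_labels, epochs=15):
    """Build the three-layer model, compile it, and fit it to the training data."""
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",                        # assumed optimizer
        loss="sparse_categorical_crossentropy",  # suits integer labels 0-9
        metrics=["accuracy"],                    # report accuracy after each epoch
    )
    history = model.fit(train_images, train_labels, epochs=epochs)
    return model, history
```

Called with the normalized training images and their labels, this returns the trained model together with a history object whose `history["accuracy"]` list holds the per-epoch training accuracy.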


But how would it work with unseen data? That’s why we have the test images. We can call model.evaluate, pass in the test images and test labels, and it will report back the loss and accuracy on that set. Let’s give it a try.

For me, that returned an accuracy of about 0.8881, which means it was about 88% accurate. As expected, it did not do as well with unseen data as it did with the data it was trained on. As you go through this course, you’ll look at ways to improve this.
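A sketch of the evaluation call. It assumes the model was compiled with `metrics=["accuracy"]`, in which case `model.evaluate` returns the loss followed by the accuracy; the helper name `evaluate_on_test_set` is hypothetical.

```python
import numpy as np
import tensorflow as tf


def evaluate_on_test_set(model: tf.keras.Model,
                         test_images: np.ndarray,
                         test_labels: np.ndarray):
    """Return (loss, accuracy) of a compiled model on data it was not trained on."""
    loss, accuracy = model.evaluate(test_images, test_labels)
    return loss, accuracy
```

The test images should be normalized in the same way as the training images before they are passed in.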

To explore further, try the exercises below:

Exploration Exercises:

Exercise 1:

For this first exercise, run code that creates a set of classifications for each of the test images and then prints the first entry in the classifications. The output is a list of numbers. Why do you think this is, and what do those numbers represent?

Hint: try running print(test_labels[1]), which prints 2, the label of the second test image. Does that help you understand why the list for that image looks the way it does?
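The code for this exercise is not reproduced in this article, so the following is a sketch of what it might look like. `model.predict` is the Keras call that produces one output row per input image; the helper name `classify` and the variable name `classifications` are assumptions.

```python
import numpy as np
import tensorflow as tf


def classify(model: tf.keras.Model, test_images: np.ndarray) -> np.ndarray:
    """Return one row of 10 class probabilities per test image."""
    return model.predict(test_images)


# Typical usage with a trained model and the normalized test images:
# classifications = classify(model, test_images)
# print(classifications[0])  # the list of 10 numbers for the first test image
```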

What does this list represent?

  1. It’s 10 random meaningless values.
  2. It’s the first 10 classifications that the computer made.
  3. It’s the probability that this item is each of the 10 classes.


The correct answer is (3).

The output of the model is a list of 10 numbers. Each number is the probability that the item being classified belongs to the corresponding class, i.e. the first value in the list is the probability that the item is class 0 (T-shirt/top), the next is the probability for class 1 (Trouser), and so on. Notice that almost all of them are very low probabilities.

For the item labelled 2, the probability at index 2 was .999+, i.e. the neural network is telling us that the item is almost certainly class 2 (Pullover).

How do you know that this list tells you that the item is an ankle boot?

  1. There’s not enough information to answer that question
  2. The 10th element on the list is the biggest, and the ankle boot is labelled 9
  3. The ankle boot is label 9, and the list has elements 0 through 9


The correct answer is (2). Both the list and the labels are 0-based, so the ankle boot having label 9 means that it is the 10th of the 10 classes. The 10th element of the list having the highest value means that the neural network has predicted that the item it is classifying is most likely an ankle boot.
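Reading the prediction off such a list can be sketched with NumPy. The probabilities below are made up for illustration, not real model output; the class names are the standard Fashion MNIST labels in index order.

```python
import numpy as np

# Fashion MNIST class names, indexed by label (0-9).
CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Hypothetical model output for one image: ten probabilities that sum to 1.
classification = np.array([0.001, 0.001, 0.001, 0.001, 0.001,
                           0.010, 0.001, 0.020, 0.004, 0.960])

# The index of the largest probability is the predicted label.
predicted_label = int(np.argmax(classification))
print(predicted_label, CLASS_NAMES[predicted_label])  # 9 Ankle boot
```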

Exercise 2:

Let’s now look at the layers in your model. Experiment with different values for the dense layer with 512 neurons. What different results do you get for loss, training time, etc.? Why do you think that’s the case?

Question 1. Increase to 1024 neurons. What’s the impact?

  1. Training takes longer, but is more accurate
  2. Training takes longer, but no impact on accuracy
  3. Training takes the same time, but is more accurate


The correct answer is (1). By adding more neurons we have to do more calculations, which slows down training, but in this case they have a good impact: we do get more accurate results. That doesn’t mean it’s always a case of ‘more is better’; you can hit the law of diminishing returns very quickly!
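To see why the wider layer costs more computation, it helps to count trainable parameters. The sketch below does that arithmetic for a model with one hidden dense layer and a 10-neuron output layer; the helper name `dense_parameter_count` is hypothetical.

```python
def dense_parameter_count(inputs, hidden_units, classes=10):
    """Weights plus biases for a hidden dense layer followed by an output dense layer."""
    hidden = inputs * hidden_units + hidden_units
    output = hidden_units * classes + classes
    return hidden + output


flattened_pixels = 28 * 28  # 784 inputs after Flatten

print(dense_parameter_count(flattened_pixels, 512))   # 407050
print(dense_parameter_count(flattened_pixels, 1024))  # 814090
```

Doubling the hidden layer roughly doubles the number of values that have to be updated on every training step.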

Exercise 3:

What would happen if you removed the Flatten() layer? Why do you think that’s the case?

You get an error about the shape of the data. It may seem vague right now, but it reinforces the rule of thumb that the first layer in your network should be the same shape as your data. Right now our data is 28×28 images, and 28 layers of 28 neurons would be infeasible, so it makes more sense to ‘flatten’ that 28×28 into a 784×1. Instead of writing all the code to handle that ourselves, we add the Flatten() layer at the beginning, and when the arrays are loaded into the model later, they’ll automatically be flattened for us.
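What flattening does to the data can be sketched in NumPy. This illustrates the reshaping only; it is not the Keras layer itself, and the batch is a stand-in filled with zeros.

```python
import numpy as np

# A stand-in batch of three 28x28 images (zeros instead of real pixels).
batch = np.zeros((3, 28, 28), dtype=np.float32)

# Keep the batch dimension and merge the two image dimensions into one.
flattened = batch.reshape(len(batch), -1)

print(batch.shape)      # (3, 28, 28)
print(flattened.shape)  # (3, 784)
```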

Exercise 4:

Consider the final (output) layer. Why does it have 10 neurons? What would happen if you had a different number than 10? For example, try training the network with 5.

You get an error as soon as it finds an unexpected value. Another rule of thumb: the number of neurons in the last layer should match the number of classes you are classifying for. In this case the labels are the 10 clothing classes numbered 0 to 9, hence you should have 10 neurons in your final layer.
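The reason for the failure can be illustrated with a NumPy analogy. This is not TensorFlow’s actual error or implementation; it only shows that a loss which looks up the probability at the true label’s index has nothing to look up when the label is 9 and there are only 5 outputs. The function name is hypothetical.

```python
import numpy as np


def sparse_cross_entropy(probabilities, label):
    """Loss for one example: negative log of the probability at the true label's index."""
    return -np.log(probabilities[label])


ten_outputs = np.full(10, 0.1)   # one probability per class, labels 0-9 all valid
five_outputs = np.full(5, 0.2)   # only indices 0-4 exist
ankle_boot_label = 9

loss_ok = sparse_cross_entropy(ten_outputs, ankle_boot_label)

try:
    sparse_cross_entropy(five_outputs, ankle_boot_label)
    failed = False
except IndexError:
    # Label 9 has no matching output when the layer only has 5 neurons.
    failed = True

print(loss_ok, failed)
```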
