Building an Image Classifier Using Deep Learning

Source: Deep Learning on Medium

Image Credit: Titima Ongkantong/Shutterstock

My first fast.ai-inspired project is in computer vision, a field of artificial intelligence concerned with training computers to understand the visual world through digital videos and images. This project lines up with the first and second sessions of fast.ai’s introductory class, Practical Deep Learning for Coders. fast.ai embraces a top-down learning method that gets you building a complete deep learning model right out of the gate. As mentioned in my last post here, there is a lot more theory to come as the course progresses. Finally, if you’d like to skip the post and check out my project in production, you can find it here. The project’s code is on my GitHub page.

Picking a Project

Readers of my publication, Complexity Everywhere, know that I am passionate about financial markets and the lessons economic history can provide in preparing for the future. For more on this topic, I suggest reading This Time Is Different by Carmen M. Reinhart and Kenneth Rogoff.

Gold (and other precious metals) has played an important role in economic history, serving as a store of value for thousands of years. Its fate is intertwined with the rise and fall of civilizations. In the last two decades, the price of gold in US dollars has skyrocketed from around $300 per ounce to more than $1,900 per ounce, and it currently trades around $1,300 per ounce. The emergence of cryptocurrencies has also brought gold into renewed focus. Famed investor Michael Novogratz has argued that bitcoin is the new gold, a digital store of value.

Given all of this, I thought that an image classifier comparing gold, silver, and copper coins would be an interesting project.

Getting the Images

Francisco Ingham and Jeremy Howard from fast.ai wrote a great guide on creating your own image dataset from Google Images. Using the guide, I identified roughly 200 images each of gold, silver, and copper coins. I downloaded the URL of each image using a couple of lines of JavaScript code.

urls = Array.from(document.querySelectorAll('.rg_di .rg_meta')).map(el=>JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));

Most deep learning libraries expect the dataset to follow a particular file structure, so it is important to arrange your files accordingly. I created separate folders for gold, silver, and copper coin images. Leveraging the guide’s code, I uploaded a file of URLs for each set of images and then downloaded the images into their respective folders.
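To make the folder convention concrete, here is a minimal sketch that uses only the Python standard library. It builds a throwaway directory with one subfolder per class and reads each image’s label from the name of its parent folder. The file names and the helpers build_demo_dataset and label_from_folder are hypothetical illustrations, not part of fast.ai.

```python
import tempfile
from pathlib import Path
from typing import List

CLASSES = ["gold", "silver", "copper"]


def label_from_folder(image_path: Path) -> str:
    """Return the class label, taken from the name of the parent folder."""
    return image_path.parent.name


def build_demo_dataset(root: Path) -> List[Path]:
    """Create one folder per class, each holding two empty placeholder files."""
    paths = []
    for cls in CLASSES:
        folder = root / cls
        folder.mkdir(parents=True, exist_ok=True)
        for i in range(2):
            placeholder = folder / f"{cls}_{i}.jpg"  # hypothetical file name
            placeholder.touch()
            paths.append(placeholder)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    files = build_demo_dataset(Path(tmp))
    labels = [label_from_folder(p) for p in files]
```

Because the label lives in the folder name, adding a new class is just a matter of adding a new folder of images.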

Downloading the images was made quite easy by the download_images function built into the fast.ai library. If you specify the URLs filename as well as the destination folder, the function will download the images and save only those that can be opened.

Preparing the Data

The first step in preparing the data is to define the classes. This tells the model what you are looking to classify. Unsurprisingly, I defined my classes as gold, silver, and copper.

The key step here is to create a Python object that contains your data, an object that can be passed to the Learner (see below), which is what trains the model. fast.ai has built a class called DataBunch for this purpose. In this case we use ImageDataBunch, which is initialized the same way. ImageDataBunch allows you to access image data from folders, from CSV files, from a data frame, as well as from other sources. I had standard images stored in folders, so I called ImageDataBunch.from_folder.

It’s worth touching on the important arguments here.

  • path: the path where the data lives.
  • train: the folder that holds the training images. My dataset has no separate training and validation folders (see next argument).
  • valid_pct: this argument lets you hold back a certain percentage of your dataset as a validation set (one of the most important concepts in DL/ML) instead of creating a separate validation folder beforehand. You want to do this to avoid overfitting your model to all of your data. There should be some data that is not used for training. I selected 20 percent.
  • ds_tfms=get_transforms(): get_transforms is another fast.ai function, one that transforms images for more effective use in deep learning models. I called it with its default values, which include flips, rotations, brightness changes, zooming, etc.
  • size=224: the images are resized to 224 x 224 pixels.
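The idea behind valid_pct can be sketched in a few lines of plain Python. This is only an illustration of holding back a random 20 percent, not fast.ai’s actual implementation; the function split_train_valid, the seed, and the file names are all hypothetical.

```python
import random


def split_train_valid(items, valid_pct=0.2, seed=42):
    """Shuffle a copy of items and hold back valid_pct of them for validation."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    # The first n_valid shuffled items become the validation set.
    return shuffled[n_valid:], shuffled[:n_valid]


images = [f"coin_{i}.jpg" for i in range(600)]  # hypothetical file names
train, valid = split_train_valid(images, valid_pct=0.2)
```

The important property is that no image appears in both sets, so the validation error reflects pictures the model has never trained on.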

Training the Model

Now for the fun stuff. I use a factory method learner built by fast.ai. Factory methods are set up in advance for ease of use. As I move along in the course, I will build custom learners, but for now, a factory method is the fastest way to complete a project end-to-end. The learner is a convolutional neural network (CNN), a commonly used architecture for computer vision. CNNs take image data in numerical form (a matrix of pixel values for a picture) and apply mathematical operations to these matrices over and over again. These operations include multiplying the pixel values by convolutional kernels, or weights, to help identify certain features of images. It’s critical to note that these matrices are multi-dimensional. For example, a color picture has three dimensions: height, width, and color channel. In deep learning, multidimensional matrices are called tensors, so a matrix with 3 dimensions is called a rank 3 tensor.
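To show what applying a convolutional kernel means, here is a minimal pure-Python sketch on a tiny single-channel image. The pixel values and the kernel are made up for illustration; in a real CNN the kernel weights are learned rather than hand-picked, and the library does this far more efficiently.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    At each position, multiply overlapping values and sum them. Deep learning
    libraries call this convolution; strictly it is cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


# A 4x5 grayscale "image": dark on the left, bright on the right.
image = [
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
]
# A hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = conv2d(image, kernel)
# feature_map == [[27, 27, 0], [27, 27, 0]]
```

The output is large where the window straddles the edge and zero where the window covers a uniformly bright region, which is how a kernel picks out one particular feature.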

Also, the learner employs transfer learning. That simply means you start from a model already trained on a very large dataset and then adapt it to your own data. A model like this can already recognize the generic features found in most types of images. This saves training time and is especially helpful if your dataset is relatively small. This article offers proof of the value of using transfer learning.

To train a basic image model, fast.ai recommended using a pre-trained model called resnet34, an architecture that raised the bar on image classification when it was released. I created a learner object from my data object (prepared above). By default, the method cuts the pre-trained model at its last convolutional layer and then adds the following layers:

  • AdaptiveConcatPool2d layer: This is a combination of MaxPooling (changing resolution by taking maximum numbers from a set of pixel values) and AveragePooling (taking average values)
  • Flatten layer: Flatten the values to a single dimension

Followed by blocks of:

  • BatchNorm1d: normalizes the values to roughly zero mean and unit variance by subtracting the mean and dividing by the standard deviation
  • Dropout: randomly zeroes out some of the values
  • Linear: applies a linear transformation to the incoming values
  • Rectified Linear Unit (ReLU): replaces any negative values with zero
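Most of these layers are simple enough to sketch in plain Python. The functions below are simplified stand-ins written for illustration, not the fast.ai or PyTorch implementations: batch_norm leaves out the learned scale and shift, Dropout is omitted because it is random, and all numbers and weights are made up.

```python
import statistics


def concat_pool(feature_map):
    """Return [max, mean] of a 2D feature map, the idea behind concat pooling."""
    values = [v for row in feature_map for v in row]  # also flattens the map
    return [max(values), sum(values) / len(values)]


def batch_norm(feature_over_batch, eps=1e-5):
    """Normalize one feature across a batch: subtract mean, divide by std."""
    mean = statistics.fmean(feature_over_batch)
    var = statistics.pvariance(feature_over_batch, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in feature_over_batch]


def linear(values, weights, bias):
    """Weighted sum of the inputs plus a bias, for each output unit."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    """Replace negative values with zero."""
    return [max(0.0, v) for v in values]


pooled = concat_pool([[1.0, 2.0], [3.0, 6.0]])            # [6.0, 3.0]
hidden = linear(pooled, [[0.5, 1.0], [-1.0, 0.5]], [0.0, 0.0])  # [6.0, -4.5]
activated = relu(hidden)                                  # [6.0, 0.0]
normed = batch_norm([2.0, 4.0, 6.0])
```

Note that the normalized values are centered on zero but are not confined to the range -1 to 1; the largest value here comes out at roughly 1.22.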

I’ve touched on a lot of concepts. Let’s restate from top to bottom:

  1. A dataset is compiled, which in this case is pictures that on a computer are a set of pixel values. These are your input values, call them a matrix A.
  2. These pictures are grouped, or classified, by type (in my case gold, silver, or copper coins). This classification is the output of our function, Y.
  3. The goal is to find some function such that Y = A*X, with X being some unknown values.
  4. To begin, an initially randomized set of X values is created.
  5. A series of predefined (and customizable if you want) operations is applied to A for the 80% of pictures that are in our training set. Many of these operations involve multiplying A by other sets of numbers X, called convolutional kernels. As mentioned, X is initially randomly assigned. The output of each operation is called a layer, and you can keep applying mathematical operations to layer after layer. That is why these techniques are called deep learning or deep neural networks (as opposed to shallow).
  6. These calculations ultimately produce, in this case, a single output Ŷ, which is the predicted value of Y using the function and randomly created X values.
  7. Ŷ is compared to Y to determine how close the function came to reality, i.e. the error.
  8. Once completed, the process is repeated over and over again using a concept called Stochastic Gradient Descent, or SGD. I plan on doing an entire post on SGD, but the basic premise is that SGD is a procedure that helps you minimize the loss of your calculated function (i.e. minimize the difference Ŷ - Y). You do this by adjusting X by (the slope of the loss with respect to X * a learning rate), moving in the direction that reduces the loss. A learning rate is usually a tiny number and ensures you take small steps in adjusting X, so that Ŷ moves toward Y and the difference Ŷ - Y shrinks.
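The loop described above can be sketched for the simplest possible case, where A, X, and Y are single numbers rather than matrices. This is a toy illustration under made-up data (the true X is set to 3), not the training code a deep learning library runs; the function name sgd_fit and its default values are hypothetical.

```python
import random


def sgd_fit(inputs, targets, learning_rate=0.01, epochs=200, seed=0):
    """Learn X in Y = A * X with stochastic gradient descent."""
    rng = random.Random(seed)
    x = rng.uniform(-1.0, 1.0)  # start from a random X
    data = list(zip(inputs, targets))
    for _ in range(epochs):
        rng.shuffle(data)  # "stochastic": visit the examples in random order
        for a, y in data:
            y_hat = a * x                  # prediction with the current X
            error = y_hat - y              # how far off the prediction is
            gradient = 2 * error * a       # slope of (y_hat - y)**2 w.r.t. X
            x -= learning_rate * gradient  # small step that reduces the loss
    return x


inputs = [1.0, 2.0, 3.0, 4.0]
targets = [3.0 * a for a in inputs]  # made-up data generated with X = 3
learned_x = sgd_fit(inputs, targets)
```

With these settings the learned value lands very close to 3. A much larger learning rate would make each step overshoot, which is why the learning rate is kept small.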

Evaluating Results

In my initial pass, I achieved a 14% error rate. On the second go-around, I used fast.ai’s function for finding a good learning rate and was able to reduce the error rate to 6.4%. Finally, I ran some data cleaning techniques that primarily removed duplicate images, as well as images that were not coins, from the dataset. This step required scrolling through the dataset, which may not be practical if you have thousands of images or more, but it was manageable with ~200 images of each type of coin. Once done, my final run of the learner produced a 3.5% error rate. I was satisfied with this result.
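For reference, the error rate is simply the share of validation images the model labels incorrectly. Here is a minimal sketch; the labels below are invented for illustration and are not my actual results.

```python
def error_rate(predictions, targets):
    """Fraction of predictions that do not match the true labels."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    wrong = sum(p != t for p, t in zip(predictions, targets))
    return wrong / len(targets)


# Made-up validation labels and model predictions.
targets = ["gold", "gold", "silver", "silver", "copper", "copper", "gold", "silver"]
predictions = ["gold", "copper", "silver", "silver", "copper", "copper", "gold", "silver"]
rate = error_rate(predictions, targets)  # 1 wrong out of 8
```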

Putting into Production

fast.ai has produced a series of guides for putting your models into production as a web app. I opted to deploy on Render, the simplest and fastest approach. The guide to doing this can be found here. Finally, if you’d like to check out my Coin Classifier, you can find it here. Thanks for reading and I appreciate the feedback!


Any opinions or forecasts contained herein reflect the personal and subjective judgments and assumptions of the author only. There can be no assurance that developments will transpire as forecasted and actual results will be different. The accuracy of data is not guaranteed but represents the author’s best judgment and can be derived from a variety of sources. The information is subject to change at any time without notice.
