Original article can be found here (source): Artificial Intelligence on Medium
A picture is worth a thousand words — and a whole lot of numbers
We, as humans, can analyze an image qualitatively pretty easily. A computer, on the other hand, needs numbers. Images are stored as grids of pixels, and each pixel has 3 values: (R, G, B). Each of these values is an integer between 0 and 255. The image as a whole, then, is represented as a 3-D array with shape height × width × 3 color channels.
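To make this concrete, here's a minimal sketch of a tiny 2×2 RGB image as a height × width × channels array. It assumes NumPy, which the article doesn't mention; the pixel values are made up for illustration.

```python
import numpy as np

# A hypothetical 2x2 RGB image with shape (height, width, channels).
# Every value is an 8-bit integer between 0 and 255.
image = np.array(
    [
        [[255, 0, 0], [0, 255, 0]],      # top row: red, green
        [[0, 0, 255], [255, 255, 255]],  # bottom row: blue, white
    ],
    dtype=np.uint8,
)

print(image.shape)               # (2, 2, 3)
print(image[0, 1].tolist())      # [0, 255, 0] -> R, G, B of the top-right (green) pixel
print(image.min(), image.max())  # 0 255

# A common preprocessing step (not necessarily the one used in this project):
# scale the integers to floats in [0, 1] before feeding them to a network.
scaled = image.astype(np.float32) / 255.0
```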
Convolutional Neural Nets: how computers see
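The neural network I used is a Convolutional Neural Network (ConvNet), the type of network most commonly used for image-related tasks. The idea behind a ConvNet is essentially a process of abstraction: early layers pick up simple patterns such as edges, and later layers combine them into higher-level features. Let’s start with an image of an apple: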
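We want the network to be able to detect features. We do this by using convolutions. Essentially, the network slides a small filter, or “window”, of learned weights over the image; at each position it multiplies the filter element-wise with the patch of pixels underneath and sums the result. This filter is our feature detector.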
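Here's a minimal NumPy sketch of that sliding-window operation on a single-channel 5×5 input with stride 1 and no padding. The input and the vertical-edge filter are hypothetical; in a real ConvNet the filter weights are learned during training, and for an RGB image the filter would also span the 3 color channels.

```python
import numpy as np

def convolve2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over a 2-D `image` with stride 1 and no padding.

    Like CNN libraries, this computes cross-correlation (the kernel is not flipped).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the window under the filter element-wise, then sum.
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out

# A 5x5 input with a vertical edge between columns 1 and 2.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=np.float64)
# A hypothetical 3x3 vertical-edge detector.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float64)

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
print(feature_map)        # every row is [3. 3. 0.]: strong response where the window straddles the edge
```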
By passing the filter over the image, the network produces a feature map: each value records how strongly the filter’s pattern appears at that position, which is how the computer extracts meaningful features. However, a 3×3 filter that stays entirely inside a 5×5 input only fits in 3×3 positions, so the output shrinks from 5×5 to just 3×3! To combat this, padding is used. Padding surrounds the input with a border of zeros (one pixel wide for a 3×3 filter), so the output keeps the same spatial size as the input and the pixels along the edges are covered by more filter positions.
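The shrinking and the fix can be checked with the standard output-size formula for a convolution, floor((n + 2p − k) / s) + 1, where n is the input size, k the filter size, p the padding, and s the stride. A small sketch, assuming NumPy for the padding step:

```python
import numpy as np

def output_size(n: int, k: int, p: int, s: int = 1) -> int:
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# Without padding, a 3x3 filter shrinks a 5x5 input to 3x3...
print(output_size(n=5, k=3, p=0))  # 3
# ...while one ring of zeros (p = 1) keeps it at 5x5.
print(output_size(n=5, k=3, p=1))  # 5

# Zero padding itself: surround the 5x5 input with a 1-pixel border of zeros.
image = np.arange(25, dtype=np.float64).reshape(5, 5)
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (7, 7)
```

For an odd filter size k with stride 1, a padding of p = (k − 1) / 2 keeps the output the same size as the input.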
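As with any other neural network, an activation function is applied; here it’s ReLU, which sets negative values to zero. In Keras, the convolution, padding, and activation fit into a single layer: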
model.add(Conv2D(128, 3, padding='same', activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)))
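As a rough check of what that line produces, here's a framework-free sketch of the output shape and trainable-parameter count of a stride-1 `Conv2D(128, 3, padding='same')` layer on an RGB input. `IMG_HEIGHT` and `IMG_WIDTH` aren't given in the article, so the 150×150 values below are hypothetical placeholders.

```python
IMG_HEIGHT, IMG_WIDTH = 150, 150  # hypothetical; the article doesn't state the input size

def conv2d_same_summary(height, width, in_channels, filters, kernel_size):
    """Output shape and parameter count of a stride-1, 'same'-padded 2-D convolution."""
    # 'same' padding with stride 1 keeps the spatial size unchanged;
    # each filter produces its own feature map, i.e. one output channel.
    output_shape = (height, width, filters)
    # Each filter has kernel_size x kernel_size x in_channels weights plus one bias.
    params = (kernel_size * kernel_size * in_channels + 1) * filters
    return output_shape, params

shape, params = conv2d_same_summary(IMG_HEIGHT, IMG_WIDTH, in_channels=3, filters=128, kernel_size=3)
print(shape)   # (150, 150, 128)
print(params)  # 3584
```

ReLU is applied element-wise, so it changes neither the shape nor the parameter count.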
A pooling layer is typically added after each convolution layer. It shrinks the spatial dimensions of the data, which reduces computation and helps curb overfitting.
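A minimal NumPy sketch of 2×2 max-pooling with stride 2, the same operation Keras’ `MaxPooling2D()` performs with its default pool size (applied to each channel separately). The feature-map values are made up for illustration.

```python
import numpy as np

def max_pool2d(feature_map: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max-pooling (pool size = stride = `size`) of a 2-D feature map."""
    h, w = feature_map.shape
    h_out, w_out = h // size, w // size
    # Drop any leftover rows/columns, then group the values into size x size blocks.
    trimmed = feature_map[:h_out * size, :w_out * size]
    blocks = trimmed.reshape(h_out, size, w_out, size)
    # Keep only the largest value in each block.
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]])
pooled = max_pool2d(fm)
print(pooled.tolist())  # [[4, 2], [2, 8]]: a 4x4 map becomes 2x2
```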
What was the point of padding then?
Max-pooling shrinks the data in a different way than skipping padding does. As the filter slides over our image, it detects the features we want to isolate. From a high-level perspective, however, the precise position of a feature isn’t important. Max-pooling keeps only the largest value in each small window (typically 2×2), which works like a “zoom-out”: a small region after pooling corresponds to a much larger swath of the image before pooling, so later convolutions can work on larger areas of the image. This also makes the network more tolerant of small shifts in a feature’s position, which helps reduce overfitting. And pooling only discards detail that the preceding convolution has already analyzed.
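The tolerance to small shifts can be seen in a tiny NumPy sketch: a single hypothetical activation nudged by one pixel within the same 2×2 window gives an identical pooled output, while a larger move that crosses into another window does not.

```python
import numpy as np

def max_pool2d(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping size x size max-pooling of a 2-D feature map."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# A 4x4 feature map with one strong activation (a detected feature) at the top-left.
original = np.zeros((4, 4))
original[0, 0] = 9.0

# The same feature nudged one pixel down and right: still inside the same 2x2 window.
small_shift = np.zeros((4, 4))
small_shift[1, 1] = 9.0

# Moved two pixels right: now it lands in a different 2x2 window.
big_shift = np.zeros((4, 4))
big_shift[0, 2] = 9.0

print(np.array_equal(max_pool2d(original), max_pool2d(small_shift)))  # True
print(np.array_equal(max_pool2d(original), max_pool2d(big_shift)))    # False
```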
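Padding, on the other hand, makes sure the network can analyze every piece of data it has, including the edges and corners. Without padding, edge and corner pixels fall under far fewer filter positions than pixels in the middle, and the feature map shrinks with every layer, so quite a bit of information is lost before it can be analyzed, something usually undesirable.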
A network can stack as many groups of convolution and pooling layers as the task calls for. Before the network can start working on classification, the multi-dimensional data needs to be flattened into a 1-D vector so the rest of the network can work with it.
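Flattening is just a reshape. A minimal NumPy sketch, assuming a hypothetical 18×18×64 output from the last pooling layer:

```python
import numpy as np

# Hypothetical output of the last pooling layer: an 18x18 grid with 64 channels.
rng = np.random.default_rng(seed=0)
feature_maps = rng.random((18, 18, 64))

# Flattening unrolls every value into one long 1-D vector in row-major order,
# which is what Keras' Flatten() layer does for each image in a batch.
flat = feature_maps.reshape(-1)
print(flat.shape)  # (20736,) = 18 * 18 * 64
```

No values are changed or lost; only the shape changes, so the dense layers that follow see one input per value.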
Of course, all these parameters can be tweaked and played with. I played with many of them, training 27 different models (that takes quite a while on a laptop), but the plots were…
Yeah. Not fun.
After the data is flattened, we’re back in familiar territory with standard, feed-forward neural networks. I’m not going to explain them in this article, but if you want a good explanation, check out Joshua Payne’s article on the subject.
Through eyeballing and guesswork, I narrowed it down to 4 networks, covering every combination of 2 or 3 convolution layers with 1 or 2 Dense layers.
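The four candidates can be enumerated as a small grid. This framework-free sketch also shows one side effect of the choice: how much an extra convolution block shrinks the vector the Dense layers receive. It assumes each convolution (with `'same'` padding) is followed by a 2×2 max-pool, and the 150×150 input size and 64 final filters are hypothetical values, not the project’s actual settings.

```python
from itertools import product

IMG_SIZE = 150     # hypothetical input height/width
LAST_FILTERS = 64  # hypothetical number of filters in the final conv layer

def flattened_size(n_conv_blocks: int) -> int:
    """Length of the vector handed to the Dense layers after n conv + 2x2 max-pool blocks."""
    size = IMG_SIZE
    for _ in range(n_conv_blocks):
        size //= 2  # 'same' padding keeps the size; each 2x2 pool halves it (rounding down)
    return size * size * LAST_FILTERS

# Every combination of 2 or 3 conv blocks with 1 or 2 Dense layers: 4 candidates.
candidates = list(product([2, 3], [1, 2]))
for n_conv, n_dense in candidates:
    print(f"{n_conv} conv blocks + {n_dense} Dense layer(s): flatten size {flattened_size(n_conv)}")
```

Under these assumptions, the third block cuts the flattened vector from 87,616 to 20,736 values, shrinking the first Dense layer’s weight matrix by roughly 4×.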
For more detail regarding the code and building process, check out my GitHub repo here.
What are the next steps?
There are a couple of things that can be done to improve this model:
- The dataset is imbalanced
- The sample size is relatively small
- No frontend!
Python is also kind of ugly:)).
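One common remedy for an imbalanced dataset (not something this project is confirmed to use) is class weighting: errors on rare classes count for more in the loss. This sketch uses the usual “balanced” formula, n_samples / (n_classes × class_count); the label counts are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so rarer classes count more."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical labels: 300 images of class 0, only 100 of class 1.
labels = [0] * 300 + [1] * 100
weights = balanced_class_weights(labels)
print(weights)  # class 1 ends up weighted 3x as heavily as class 0
```

Keras’ `model.fit` accepts a dictionary like this through its `class_weight` argument. Collecting more images of the rarer classes would address the small sample size at the same time.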
Key takeaways
- Data that replicates real-life inconsistencies = very good
- By engineering variability into the dataset, the model can be more robust
- Overfitting: when the neural net starts memorizing the training data instead of learning patterns that generalize
- Convolutions allow computers to extract high-level features from images
- Convolutions extract features -> pooling zooms out
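Engineering variability into a dataset is usually done with data augmentation: random, label-preserving changes to each training image. Here's a minimal NumPy sketch, assuming random horizontal flips and small shifts are appropriate for the images in question; the helper `augment` and the random test image are hypothetical. (Keras also ships augmentation layers, such as `RandomFlip`.)

```python
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator, max_shift: int = 2) -> np.ndarray:
    """Return a randomly flipped and shifted copy of an (H, W, C) image."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # np.roll wraps pixels around the border: a simple stand-in for a shift with fill.
    return np.roll(out, shift=(dy, dx), axis=(0, 1))

rng = np.random.default_rng(seed=0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # hypothetical image
batch = [augment(image, rng) for _ in range(4)]
print([b.shape for b in batch])  # four (32, 32, 3) variants of the same image
```

Because every variant shows the same object, the network is pushed to learn the object itself rather than its exact pixel positions, which also helps against overfitting.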
Thanks for reading! If you have any questions, comments, or want to talk, you can reach me at email@example.com. All code for this project is available on my GitHub!