Original article was published on Deep Learning on Medium
How to use it?
The setup is simpler than it may look. All we have to do is define a list of the transformations we want to apply to our sample, and that's it; we do not touch anything else afterward. Note that the order of the transformations matters, and it is up to you.
We can now dive into the purpose of the article and see the image augmentation techniques.
The first, and one of the simplest, consists of randomly performing flips on the horizontal and vertical axes of the images. In other words, there is a 50/50 chance of performing a vertical flip and a 50/50 chance of performing a horizontal flip.
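The original post's code isn't reproduced here, but a minimal NumPy sketch of such a random flip could look like this (the function name and the `rng` parameter are my own choices):

```python
import numpy as np

def random_flip(image, p=0.5, rng=None):
    """Flip the image horizontally and/or vertically, each with probability p."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p:
        image = image[:, ::-1]   # horizontal flip (mirror left-right)
    if rng.random() < p:
        image = image[::-1, :]   # vertical flip (upside down)
    return image
```

With the default `p=0.5`, each axis independently has the 50/50 chance described above.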
Another common augmentation is to crop the image randomly. In other words, we extract a part of the image of random size over a random area. The size of the cropped region can be chosen as a ratio of the image dimensions (height, width). If the maximum crop ratio is not specified, we consider by default that it is the full size of the image.
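A sketch of such a ratio-based random crop, with my own (assumed) parameter names `min_ratio` and `max_ratio`:

```python
import numpy as np

def random_crop(image, min_ratio=0.5, max_ratio=1.0, rng=None):
    """Crop a randomly sized, randomly placed region.
    Height and width are drawn as ratios of the image dimensions."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    ch = int(h * rng.uniform(min_ratio, max_ratio))
    cw = int(w * rng.uniform(min_ratio, max_ratio))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return image[top:top + ch, left:left + cw]
```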
We are going to get into something a little more enjoyable. Filters are great classics, but I think it is important to be able to easily create our own convolution filters. If you do not know how a filter works, I refer you to my article about Conv2d.
So I wanted to make a general function to be able to use our own filters.
As far as filters are concerned, we can go even further by choosing a filter upstream and applying it with a random weighting. As an example, consider the classic filter for sharpening our image.
To finish with the filters, the most popular ones are used to randomly blur our image. There are many ways to blur an image; the best known are the average, median, Gaussian, and bilateral filters.
Let's start with the average filter. As its name indicates, it averages the values around a given center, using a kernel whose size can be specified for more or less blur. To augment our images with an average filter, we just need to filter the input image with a kernel of random size.
Finally, the Gaussian blur works in the same way as the average blur, except that the kernel values follow a Gaussian curve centered on the kernel rather than being uniform. Note that the kernel dimensions must be odd numbers.
By far the most widely used image augmentation techniques are the perspective transformations: rotation, translation, shearing, and scaling. These transformations can be performed in 3D, yet they are usually applied only in 2D, which is a pity. Let's take advantage of everything we have at our disposal, right?
I will not spend more time on the 3D transformations of a 2D image because I wrote a whole article about it, so I simply reused the function we obtain at the end of that article. I invite you to have a look at it if you want to know more about homogeneous coordinates and 3D transformation matrices.
What should be noted is that this function lets us randomly perform transformations according to the 4 proposed matrices. The order matters: here we have the shearing, then the rotation, then the scaling, and finally the translation. Note that the translation is expressed as a ratio of the image dimensions.
The cutout is pretty intuitive: it involves removing regions of the input image at random. It works like the cropping we talked about earlier, but instead of returning the regions concerned, we delete them. We can therefore, once again, let the user provide a minimum and maximum region size as a ratio of the image dimensions, a maximum number of regions, whether to cut the same regions from the target at the same time, whether to perform the cutout per channel, and the default replacement value for the deleted regions.
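A reduced sketch of the cutout (covering the size ratios, region count, and fill value, but omitting the per-channel and target options for brevity; the signature is my own assumption):

```python
import numpy as np

def random_cutout(image, min_ratio=0.1, max_ratio=0.3, max_regions=3,
                  fill=0, rng=None):
    """Erase up to max_regions random rectangles, sized as a ratio of the
    image dimensions, replacing them with a constant fill value."""
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    h, w = image.shape[:2]
    for _ in range(int(rng.integers(1, max_regions + 1))):
        ch = int(h * rng.uniform(min_ratio, max_ratio))
        cw = int(w * rng.uniform(min_ratio, max_ratio))
        top = int(rng.integers(0, h - ch + 1))
        left = int(rng.integers(0, w - cw + 1))
        out[top:top + ch, left:left + cw] = fill
    return out
```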
Now we get to the part I find the most fun, and one that is very rarely taken into account. If we know our color spaces, we can take advantage of their properties to augment our images. To give you a simple example, with the HSV color space we can extract the leaf thanks to its color and change that color randomly as we wish. That is a very cool thing to do, and it shows the value of having our own image augmentation functions. Of course, this requires a little more creativity, so it is important to know our color spaces to make the most of them, particularly since they can be crucial in preprocessing for our (Deep) Machine Learning models.
Let's stay on our colors a little longer. A great classic in image augmentation is playing with brightness. There are several ways to do so; the simplest is to add a random bias.
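The random-bias approach can be sketched in a few lines of NumPy (the `max_bias` parameter is my own naming):

```python
import numpy as np

def random_brightness(image, max_bias=50, rng=None):
    """Add one random bias to every pixel, clipping to the valid 8-bit range."""
    rng = np.random.default_rng() if rng is None else rng
    bias = rng.integers(-max_bias, max_bias + 1)
    return np.clip(image.astype(int) + bias, 0, 255).astype(np.uint8)
```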
In the same way, it is very simple to play with contrasts. This can also be done randomly.
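One common way to randomize contrast (my assumption of the exact scheme) is to scale pixel values around the image mean: a factor above 1 stretches the contrast, below 1 flattens it:

```python
import numpy as np

def random_contrast(image, min_factor=0.5, max_factor=1.5, rng=None):
    """Scale pixel values around the image mean by a random factor."""
    rng = np.random.default_rng() if rng is None else rng
    factor = rng.uniform(min_factor, max_factor)
    mean = image.mean()
    out = np.rint((image - mean) * factor + mean)  # round before casting
    return np.clip(out, 0, 255).astype(np.uint8)
```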
The last fairly common image augmentation technique is noise injection. In reality, we simply add a matrix of the same size as our input, whose elements follow a random distribution. Noise injection can be done with any random distribution; in practice, we mostly see two of them. But feel free to go further 😃
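Gaussian and uniform noise are the two distributions usually seen in practice; here is a sketch of the Gaussian case (the `std` parameter is my own naming):

```python
import numpy as np

def random_noise(image, std=10.0, rng=None):
    """Inject additive Gaussian noise: a random matrix the same shape
    as the input, added and clipped back to the 8-bit range."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0, std, image.shape)
    return np.clip(image + noise, 0, 255).astype(np.uint8)
```

Swapping `rng.normal` for `rng.uniform` gives the uniform variant.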
Finally, a technique much less used but not useless: some cameras have a vignetting effect, and it is interesting to think about how we can augment our images by randomly imitating this phenomenon. We will also give the user some flexibility: we can decide the minimum distance from the center at which the effect may randomly start, decide its intensity, and even decide whether the effect goes toward black or toward white.
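A radial mask over the distance from the center gives one way to fake the vignette; the falloff formula below is my own simplification, with the distance normalized so the corners sit at 1:

```python
import numpy as np

def random_vignette(image, min_dist=0.3, max_strength=0.8, white=False,
                    rng=None):
    """Darken (or brighten, if white=True) pixels by their distance from
    the center; the effect starts at min_dist of the normalized radius."""
    rng = np.random.default_rng() if rng is None else rng
    strength = rng.uniform(0, max_strength)
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.sqrt((ys - h / 2) ** 2 + (xs - w / 2) ** 2)
    d /= d.max()                                   # normalize to [0, 1]
    mask = np.clip((d - min_dist) / (1 - min_dist), 0, 1) * strength
    if image.ndim == 3:
        mask = mask[..., None]                     # broadcast over channels
    target = 255.0 if white else 0.0
    out = image * (1 - mask) + target * mask       # blend toward black/white
    return np.clip(out, 0, 255).astype(np.uint8)
```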
And finally, the best for last. I am surprised it is not used more often: mimicking the distortion of a camera lens. It is like looking through a round glass; what we see is distorted because the lens (the glass) is rounded. So if our images are taken by a camera with a lens, why not simulate that distortion? I think it should be used by default for images.
I thus propose in this last function to randomly simulate our lens distortion by playing on the radial coefficients k1, k2, k3 and on the tangential coefficients p1, p2. In this method, the order of the coefficients is as follows: k1, k2, p1, p2, k3. I invite you to have a look at the OpenCV documentation on this subject.