Original article can be found here (source): Deep Learning on Medium
Computer Vision 101: Working with Color Images in Python
Learn the basics of working with RGB and Lab images to boost your computer vision projects!
Every computer vision project — be it a cat/dog classifier or bringing colors to old images/movies — involves working with images. And in the end, the model can only be as good as the underlying data — garbage in, garbage out. That is why in this post I focus on explaining the basics of working with color images in Python, how they are represented and how to convert the images from one color representation to another.
In this section, we set up the Python environment. First, we import all the required libraries:
import numpy as np
from skimage.color import rgb2lab, rgb2gray, lab2rgb
from skimage.io import imread, imshow
import matplotlib.pyplot as plt
We use scikit-image, a library from scikit-learn’s family that focuses on working with images. There are many alternatives; other libraries that can be used include numpy, OpenCV and Pillow.
In the second step, we define a helper function for printing out a summary of information about the image — its shape and the range of values in each of the layers.
The logic of the function is pretty straightforward, and the slicing of dimensions will make sense as soon as we describe how the images are stored.
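The helper itself is not reproduced in this text, so here is a minimal sketch of what it could look like. It is a reconstruction, not the original code: it only assumes the call signature and the output format that appear later in the article, and the small demo array is made up for illustration.

```python
import numpy as np

def print_image_summary(image, labels):
    """Print the image shape and the min/max of each layer.

    Assumed reconstruction: `labels` holds one name per layer,
    e.g. ['G'] for grayscale or ['R', 'G', 'B'] for RGB.
    """
    print(f'Image dimensions: {image.shape}')
    if image.ndim == 2:
        # A grayscale image has no channel axis; add one so the loop is uniform.
        image = image[:, :, np.newaxis]
    for i, label in enumerate(labels):
        layer = image[:, :, i]  # all rows, all columns, one layer
        print(f'{label} : min={float(layer.min()):.4f}, max={float(layer.max()):.4f}')

# Made-up 1x2 RGB image, just to exercise the function.
demo = np.array([[[0, 128, 255], [10, 20, 30]]], dtype=np.uint8)
print_image_summary(demo, ['R', 'G', 'B'])
```

The `image[:, :, i]` slice is the part the text refers to: it selects every pixel of a single layer.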
We start with the most basic case possible, a grayscale image. Such images are made exclusively of shades of gray. The extremes are black (weakest intensity) and white (strongest intensity).
Under the hood, the images are stored as a matrix of integers, in which a pixel’s value corresponds to the given shade of gray. The scale of values for grayscale images ranges from 0 (black) to 255 (white). The illustration below provides an intuitive overview of the concept.
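To make the matrix idea concrete, here is a tiny hand-made example. The values are invented for illustration; each entry is one pixel, and each row runs from black to white in four steps.

```python
import numpy as np

# A 4x4 grayscale "image": 0 is black, 255 is white, values in between are grays.
tiny_gray = np.array([
    [0, 85, 170, 255],
    [0, 85, 170, 255],
    [0, 85, 170, 255],
    [0, 85, 170, 255],
], dtype=np.uint8)

print(tiny_gray.shape)                   # (rows, columns)
print(tiny_gray.min(), tiny_gray.max())  # darkest and brightest pixel
```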
In this article, we will be working with the image you already saw as the thumbnail, the circle of colorful crayons. It was not accidental that such a colorful picture was selected 🙂
We start by loading the grayscale image into Python and printing it.
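image_gs = imread('crayons.jpg', as_gray=True)
fig, ax = plt.subplots(figsize=(9, 16))
# The original snippet is cut off here; the display call below is an assumed completion.
ax.imshow(image_gs, cmap='gray')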
As the original image is in color, we used as_gray=True to load it as a grayscale image. Alternatively, we could have loaded the image using the default settings of imread (which loads an RGB image, covered in the next section) and converted it to grayscale using the rgb2gray function.
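With scikit-image, the alternative route would be a call along the lines of rgb2gray(imread('crayons.jpg')). To show roughly what happens inside such a conversion without needing the image file, the sketch below computes a weighted sum of the three channels in plain numpy. The weights are the ones given in the scikit-image documentation for rgb2gray; the helper name `to_gray` and the sample pixels are hypothetical.

```python
import numpy as np

# Channel weights as documented for skimage.color.rgb2gray (R, G, B).
LUMA_WEIGHTS = np.array([0.2125, 0.7154, 0.0721])

def to_gray(image_rgb_uint8):
    """Hypothetical helper: rescale to [0, 1], then take a weighted sum over channels."""
    rgb_float = image_rgb_uint8.astype(np.float64) / 255.0
    return rgb_float @ LUMA_WEIGHTS  # collapses the last (channel) axis

# One row of three pixels: white, black, pure red.
pixels = np.array([[[255, 255, 255], [0, 0, 0], [255, 0, 0]]], dtype=np.uint8)
gray = to_gray(pixels)
print(gray.shape)  # the channel axis is gone
print(gray)
```

Because the weights sum to 1, white maps to 1 and black to 0, which is why the grayscale result lands in the [0, 1] range.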
Next, we run the helper function to print the summary of the image:
print_image_summary(image_gs, ['G'])
Running the code produces the following output:
Image dimensions: (1280, 1920)
G : min=0.0123, max=1.0000
The image is stored as a 2D matrix, 1280 rows by 1920 columns (high-definition resolution). By looking at the min and max values, we can see that they are in the [0,1] range. That is because loading the image as grayscale converts the pixel values to floats and rescales them from the original 0 to 255 range (in effect dividing by 255), which is a common preprocessing step for working with images.
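The rescaling step is easy to reproduce by hand. The sketch below uses a made-up 2x2 array of 8-bit values rather than the crayons image, and shows that dividing by 255 maps the range onto [0, 1] and can be reversed.

```python
import numpy as np

# Made-up 8-bit pixel values covering the full 0..255 range.
raw = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Convert to float first, then divide, so the result is a float array in [0, 1].
scaled = raw.astype(np.float64) / 255.0
print(scaled.min(), scaled.max())

# Multiplying back and rounding recovers the original integers.
restored = np.round(scaled * 255).astype(np.uint8)
print(np.array_equal(raw, restored))
```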
Now it is time to work with colors. We start with the RGB model. In short, it is an additive model, in which shades of red, green and blue (hence the name) are added together in various proportions to reproduce a broad spectrum of colors.
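A quick way to see the additive idea is to add pure channel colors as arrays. This is a toy illustration with hand-picked values, not part of the crayons workflow.

```python
import numpy as np

# Pure primaries as (R, G, B) triples on the 0..255 scale.
red = np.array([255, 0, 0], dtype=np.uint8)
green = np.array([0, 255, 0], dtype=np.uint8)
blue = np.array([0, 0, 255], dtype=np.uint8)

# The primaries do not overlap in any channel, so these sums cannot overflow uint8.
yellow = red + green        # full red plus full green
white = red + green + blue  # all three channels at full intensity

print(yellow.tolist())
print(white.tolist())
```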
In scikit-image, this is the default model for loading images using imread:
image_rgb = imread('crayons.jpg')
Before printing the images, let’s inspect the summary to understand the way the image is stored in Python.
print_image_summary(image_rgb, ['R', 'G', 'B'])
Running the code generates the following summary:
Image dimensions: (1280, 1920, 3)
R : min=0.0000, max=255.0000
G : min=0.0000, max=255.0000
B : min=0.0000, max=255.0000
In comparison to the grayscale image, this time the image is stored as a 3D np.ndarray. The additional dimension represents each of the 3 color channels. As before, the intensity of the color is presented on a 0–255 scale. It is frequently rescaled to the [0,1] range. Then, a pixel’s value of 0 in any of the layers indicates that there is no color in that particular channel for that pixel.
A helpful note: when using OpenCV’s imread function, the image is loaded as BGR instead of RGB. To make it compatible with other libraries, we need to change the order of the channels.
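Changing the channel order amounts to reversing the last axis of the array. The sketch below uses a made-up one-pixel array as a stand-in for what OpenCV would return, so it runs without OpenCV installed; the variable names are only illustrative.

```python
import numpy as np

# Stand-in for an image loaded by OpenCV: one pure-blue pixel stored as (B, G, R).
image_bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reverse the channel axis to get (R, G, B).
# With OpenCV available, cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB) does the same job.
converted_rgb = image_bgr[:, :, ::-1]

print(converted_rgb[0, 0].tolist())  # blue now sits in the last (B) position
```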
It is time to print the image and the different color channels:
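fig, ax = plt.subplots(1, 4, figsize=(18, 30))
ax[0].imshow(image_rgb / 255.0)
ax[0].set_title('original RGB')
for i, lab in enumerate(['R', 'G', 'B'], 1):
    # keep only channel i - 1; the other two channels stay at zero
    temp = np.zeros(image_rgb.shape)
    temp[:, :, i - 1] = image_rgb[:, :, i - 1]
    # the source snippet ends here; plotting each panel is an assumed completion
    ax[i].imshow(temp / 255.0)
    ax[i].set_title(lab)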
In the image below, we can see the original image and the 3 color channels separately. What I like about this image is that by focusing on individual crayons, we can see which colors from the RGB channels and in which proportions constitute the final color in the original image.
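The same decomposition can be checked numerically on a toy example. The two pixels below are invented (an orange-like and a blue-like color); the sketch isolates each channel the same way as the plotting loop and confirms that the three single-channel images add back up to the original.

```python
import numpy as np

# Made-up 1x2 RGB image: an orange-like pixel and a blue-like pixel.
image = np.array([[[200, 150, 30], [10, 20, 240]]], dtype=np.uint8)

channels = []
for i in range(3):
    temp = np.zeros_like(image)      # start from an all-black image
    temp[:, :, i] = image[:, :, i]   # copy over a single channel
    channels.append(temp)

# Each position is non-zero in only one of the three images, so the sum cannot overflow.
recombined = channels[0] + channels[1] + channels[2]
print(np.array_equal(recombined, image))
print(channels[0][0, 0].tolist())  # red contribution of the first pixel
```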