Day 1 of 30 days of finishing CVND

My day 1 progress of 30 days of finishing Computer Vision Nanodegree (CVND)

Today is the first day of my 30 Days of Udacity challenge. I challenged myself to finish my CVND in 30 days, so I decided to spend on it at least the 2 hours every day that I usually spend on my phone and social media. I made a schedule, which I will be following strictly, to guide me.

I got this course (CVND) as a scholarship from Facebook AI in late September, and now I only have one month to finish it. 🏃

First, what is Computer Vision?

Computer Vision is a specialized branch of Artificial Intelligence focused on enabling machines, like smartphones or robotic systems, to visually perceive the world and respond to it. It is modeled after human vision. A robot can gather data through cameras and other sensors, and then use that input to identify different objects and safely move through its environment.

Why did I start the course from scratch yesterday?

I started the course from scratch because some of the concepts of CVND are a little bit hard. Wait, did I say a little bit hard? No, they are really hard 😅, and since I am totally new to this field, the whole course felt overwhelming. For the past 3 months, I was stuck at part one of the course because I wanted everything to make sense before moving on to the next lessons or trying the first project. I wanted to understand everything, which was not possible haha.

What did I Study Day 1?

I enjoyed day 1 because I kept having those aha moments 😀. All of a sudden, everything started to make sense when I rewatched the lessons. It is not like I never tried rewatching them in the past 3 months; I can’t even remember how many times I did.

So today, what I did was take advantage of the Udacity community instead of using Google Search, which most of the time overwhelmed me with too many results. I asked some of my classmates and friends all the silly questions I had. I know I might have looked stupid for not knowing some concepts that are general knowledge, but trust me, it was worth it 👌.

Day 1 Progress Update

I finished Lesson 4: Image Representation and Classification, one of the lessons in Part 1: Introduction to Computer Vision. I learned how images are represented numerically, along with some image processing techniques, such as color masking and binary classification.

What is image processing?

Image processing is a subset of computer vision: a set of methods for performing operations on images in order to enhance them or extract useful information, analyze it, and make decisions. One purpose of image processing is image recognition, which is distinguishing objects in an image. A computer vision system can be trained to recognize and tag (or label) faces or other features in any given photo library.

YOLO multi-object detection on an image of a road. This example shows detected people, cars, traffic lights, and even handbags.

Why do we need to represent images numerically?

Machines store images in the form of a matrix of numbers. The size of this matrix depends on the number of pixels in any given image.
Reading and displaying a color image using Python. Can you see the x and y-axis?
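
The code for this step was a screenshot in the original post; here’s a minimal sketch of what it looks like. The filename car.jpg is a placeholder for whatever image you use.

```python
import cv2
import matplotlib.pyplot as plt

# "car.jpg" is a placeholder filename; use any color image you have.
image = cv2.imread("car.jpg")

# OpenCV loads images in BGR channel order, so convert to RGB before plotting.
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

plt.imshow(image_rgb)  # matplotlib draws the x and y axes automatically
plt.show()
```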

This car image and all digital images are made of a grid of pixels, which are very small units of a single color or intensity.

Then what is a Pixel?

“The word pixel means a picture element. Every photograph, in digital form, is made up of pixels. They are the smallest unit of information that makes up a picture. Usually round or square, they are typically arranged in a 2D or two-dimensional grid”.

What is a Pixel?
© Julie Waterhouse Photography

“In the image on the left, one portion has been magnified many times over so that you can see its individual composition in pixels. As you can see, the pixels approximate the actual image. The more pixels you have, the more closely the image resembles the original”.

What is a pixel used for?

Each pixel stores color information for your image. Each pixel in the grid has a corresponding numerical value and, in addition to that color value, an (x, y) location in the image grid. These axes are a lot like the axes of a graph.

Every pixel in an image is just a numerical value

Grayscale (black and white) images

“A grayscale picture just needs brightness or intensity information — how bright is a particular pixel. The higher the value, the greater the intensity. Current displays support 256 distinct shades of gray. Each one just a little bit lighter than the previous one!”

These numbers, or the pixel values, denote the intensity or brightness of the pixel. Smaller numbers (closer to zero) represent black, and larger numbers (closer to 255) denote white.

Displaying images in grayscale using the OpenCV computer vision library (cv2)
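
A minimal sketch of that conversion, again with a placeholder filename:

```python
import cv2
import matplotlib.pyplot as plt

image = cv2.imread("car.jpg")  # placeholder filename

# Collapse the three color channels into a single intensity channel.
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

plt.imshow(gray_image, cmap="gray")  # use a gray colormap, not the default one
plt.show()
```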

Finding the dimensions of a grayscale image

“So in the memory, a grayscale image is represented by a two-dimensional array of bytes. The size of the array being equal to the height and width of the image. Technically, this array is a “channel”. So, a grayscale image has only one channel. And this channel represents the intensity of whites”.

Displaying the dimension of a grayscale image

The dimensions of the image above are 3088 x 4700. These dimensions are basically the number of pixels in the image (height x width).
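
A quick sketch of that check (placeholder filename again):

```python
import cv2

# Load directly as a single-channel grayscale image.
gray_image = cv2.imread("car.jpg", cv2.IMREAD_GRAYSCALE)

# A grayscale image has one channel, so .shape is just (height, width).
print(gray_image.shape)  # the example image in the post prints (3088, 4700)
```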

Color Images

Color images are interpreted as 3D cubes of values with width, height, and depth. Most color images can be represented by combinations of only three colors, Red, Green, and Blue (known as RGB Colors).

The width and height are the x and y locations, while the depth is the number of color channels (or matrices). For color images, or RGB images, the depth is 3: Red, Green, and Blue.

It’s helpful to think of the depth as three stacked, two-dimensional (2D) color layers. One layer is Red, one Green, and one Blue. Together they create a complete color image.

“Each color channel (or matrix) has values between 0–255 representing the intensity or brightness of the color for that pixel. Consider the below image to understand this concept”:

Source: Analytics Vidhya

Finding the dimensions of a colored image

Can you see the 3, which shows it is a color image?

The color image above has dimensions of (3088, 4700, 3), where 3 is the number of channels.
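
A quick sketch of the same check for the color version:

```python
import cv2

image = cv2.imread("car.jpg")  # placeholder filename

# A color image adds a third dimension: (height, width, channels).
print(image.shape)  # the example image in the post prints (3088, 4700, 3)
```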

Displaying Separated Color Channels of an RGB Image

The example below displays each color channel alongside the original image.
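
Here’s a minimal sketch of how such a figure can be produced (the filename is a placeholder):

```python
import cv2
import matplotlib.pyplot as plt

image = cv2.imread("car.jpg")  # placeholder filename
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Slice out each channel of the RGB image as a 2D intensity array.
r = image_rgb[:, :, 0]
g = image_rgb[:, :, 1]
b = image_rgb[:, :, 2]

# Show the original image next to its three channels.
fig, axes = plt.subplots(1, 4, figsize=(20, 5))
titles = ["Original", "Red Channel", "Green Channel", "Blue Channel"]
for ax, img, title in zip(axes, [image_rgb, r, g, b], titles):
    ax.imshow(img, cmap="gray")  # the cmap is ignored for the 3-channel original
    ax.set_title(title)
plt.show()
```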

Color adds another layer of complexity. More information needs to be stored. It is no longer just about what shade; it’s about what shade of which color.

Notice that each separated color channel in the figure contains an area of white. The white corresponds to the highest values (purest shades) of each separate color. For example, in the Red Channel image, the white represents the highest concentration of pure red values. As red becomes mixed with green or blue, gray pixels appear. The black region in the image shows pixel values that contain no red values.

I am sorry if I confused you, but let’s continue lol 😅 👇.

Color Thresholds

I also learned how to use information about the colors in an image to isolate a particular area.

Thresholding is one of the most basic techniques for what is called image segmentation. With thresholding, you can segment an image based on color. For example, you can segment all the green in an image.

Color thresholds are used in a number of applications, including extensively in computer graphics and video. A common use is with a green screen.

A green screen, similar to a blue screen, is used to layer two images or video streams by identifying and replacing a large green area.

Coding Green Screen

The first step was to isolate the green background, and then replace the green area with an image of our choosing.
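
The full notebook is in the repo linked at the end; here is a minimal sketch of the idea. The filenames and the green bounds are placeholders that would need tuning for a real image.

```python
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Placeholder filenames: a photo shot on a green screen and a new background.
image = cv2.cvtColor(cv2.imread("green_screen.jpg"), cv2.COLOR_BGR2RGB)
background = cv2.cvtColor(cv2.imread("space.jpg"), cv2.COLOR_BGR2RGB)

# Rough lower and upper bounds for "green" in RGB; tune these for your image.
lower_green = np.array([0, 180, 0])
upper_green = np.array([120, 255, 120])

# White (255) where a pixel falls inside the green range, black (0) elsewhere.
mask = cv2.inRange(image, lower_green, upper_green)

# Black out the green area in the foreground...
masked_image = image.copy()
masked_image[mask != 0] = [0, 0, 0]

# ...then paste the background into exactly that area.
background = cv2.resize(background, (image.shape[1], image.shape[0]))
background_crop = background.copy()
background_crop[mask == 0] = [0, 0, 0]

# The two pieces never overlap, so adding them composites the final image.
complete_image = masked_image + background_crop
plt.imshow(complete_image)
plt.show()
```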

The output

The HSV color space

The lessons also covered the HSV color space, which is useful for consistently detecting objects under varying light conditions.

The HSV color space also has 3 channels: Hue, Saturation, and Value (or intensity).

Some color spaces are good for some purposes; others are good for other purposes. The RGB color space is intuitive for humans, but it is often tough to work with for image processing. So the HSV color space was invented!

An example that uses RGB color space and HSV color space to select all the pink balloons.

We can see that HSV space is actually more reliable for selecting an area under varying light conditions.
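
Here is a minimal sketch of selecting a pink range in HSV. The hue bounds are rough guesses that would need tuning for a real image (note that OpenCV scales hue to 0–179):

```python
import cv2
import numpy as np
import matplotlib.pyplot as plt

image = cv2.imread("balloons.jpg")  # placeholder filename
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Convert to HSV: hue is mostly unaffected by lighting changes.
image_hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# Rough bounds for pink hues in OpenCV's HSV space; tune for your image.
lower_pink = np.array([140, 50, 50])
upper_pink = np.array([179, 255, 255])

mask = cv2.inRange(image_hsv, lower_pink, upper_pink)

# Keep only the pixels that fall inside the pink range.
selection = image_rgb.copy()
selection[mask == 0] = [0, 0, 0]

plt.imshow(selection)
plt.show()
```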

Day and Night Classification

We then did a classification challenge: classifying two types of images, taken either during the day or at night (when the sun has set).

We built a classifier that can accurately label these images as day or night, and it relies on finding distinguishing features between the two types of images!

Two images of the same scene. One taken during the day (left) and one at night.
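
One very simple distinguishing feature is average brightness, since day images tend to be much brighter overall. Here is a minimal sketch of that idea, assuming images come in as RGB arrays; the threshold of 100 is a placeholder you would pick by inspecting the training images.

```python
import cv2
import numpy as np

def avg_brightness(rgb_image):
    """Average of the Value channel in HSV: a simple day/night feature."""
    hsv = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2HSV)
    return np.mean(hsv[:, :, 2])

def estimate_label(rgb_image, threshold=100):
    """Predict 'day' if the image is brighter than the threshold.

    The threshold is a placeholder; it should be chosen by comparing
    average brightness across labeled day and night training images.
    """
    return "day" if avg_brightness(rgb_image) > threshold else "night"
```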

You made it all the way here?! Thanks for reading. 😆

You can find all the code practicals and my schedule in this repo on my GitHub.

If you have any questions or comments feel free to leave your feedback below or you can always reach me on Twitter. Till then, see you in the next post! ✋.

Reference:

Computer Vision Nanodegree Program

What is a Pixel?

RGB and Color Channels in Photoshop Explained

Color spaces

Thresholding

3 Beginner-Friendly Techniques to Extract Features from Image Data using Python