Original article was published by Bharath K on Artificial Intelligence on Medium
Working with OpenCV and Computer Vision
Now that we have a brief understanding of how images work, we can move on to the OpenCV library and how to use it for computer vision tasks. OpenCV is one of the most widely used libraries for computer vision, and it is often combined with machine learning and deep learning workflows. It is simple to use and performs well. It is open source, and its Python bindings represent images as NumPy arrays, so it integrates easily with NumPy and other Python modules when building real-time applications. It supports several programming languages and runs on most platforms, including Windows, Linux, and macOS.
The installation process for the OpenCV module is quite simple. I have mentioned two ways to install it: with pip for a regular Python installation, or with conda inside an Anaconda virtual environment. Feel free to choose whichever method suits you best.
The most straightforward way to install OpenCV is to run the following command in the command prompt.
pip install opencv-python
If you are using an Anaconda environment, you can instead install the opencv package into your virtual environment. Type the command below in the Anaconda command terminal.
conda install -c conda-forge/label/cf202003 opencv
Once we are done with the installation, we can focus on some coding. Today, we will mainly look at three basic aspects of computer vision:
- Reading, writing, and viewing an image.
- Drawing with OpenCV.
- Accessing the webcam.
So, without further ado, let us get started with these three basic concepts.
1. Reading, Writing, and Viewing an Image:
We will be performing these three tasks one after the other. Reading, displaying, and writing images is an essential part of computer vision, as you constantly have to deal with images. Apart from the advantages mentioned previously, another good thing about opencv is that it supports a variety of image formats, so we can work with them without facing any major issues. I will be making use of the lena.png image for the rest of this section.
You can feel free to download the same image and follow along. Place it in the same folder or directory as your Python file. This lets us access the image by file name alone, without having to spell out the full path every time. Once the image is in place, we can start working with the functionality provided by the opencv module. Let us start with importing the module and reading the image.
- Reading and Displaying the image:
The most important thing in any computer vision task is knowing how to read an image and display it correctly.
The first step is to import the cv2 module, which also confirms that your installation succeeded. Once cv2 is imported, call cv2.imread() with the file name to read the image. cv2.imread() plays a role similar to Image.open() in the Pillow module, which reads the image of your choice. Once you have read the image, you need some way to display it. There are mainly two ways of achieving this.
The first method is to use cv2.imshow(). It must be followed by a cv2.waitKey() call so that the window opened by cv2.imshow() stays on screen. The number passed to waitKey() is how long to wait for a key press, in milliseconds; a value of 0 waits indefinitely. The second method is to use the matplotlib library. With its pyplot module, plt.imshow() displays the image directly inside a Jupyter notebook, without the separate graphical window that cv2.imshow() opens. You do, however, have to convert the image to RGB first, because OpenCV stores color images in BGR channel order. I will come back to this conversion later in this section.
Let us analyze the main characteristics of the image we are working with by using the image.shape attribute. It gives the height and width of the image and the number of channels. The following are the values for the lena.png image stored in your directory:
Height of the Image = 512
Width of the Image = 512
Number of channels = 3
The height and width of the image are both 512 pixels. The number of channels is three because a color image has three color channels: red, green, and blue. With our image and its dimensions now analyzed, we can proceed to the next topic in this section, which deals with writing images.
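As a small sketch of how these values come out of image.shape, the snippet below uses a blank NumPy array as a stand-in with the same dimensions as lena.png, so it runs without the file. The describe_image helper is a name I made up for illustration.

```python
import numpy as np


def describe_image(image):
    """Return (height, width, channels) for an image array."""
    height, width = image.shape[:2]
    # Grayscale images have no third axis, so report a single channel for them.
    channels = image.shape[2] if image.ndim == 3 else 1
    return height, width, channels


# Stand-in for cv2.imread("lena.png"): a 512 x 512 image with 3 channels.
image = np.zeros((512, 512, 3), dtype=np.uint8)

height, width, channels = describe_image(image)
print("Height of the Image =", height)
print("Width of the Image =", width)
print("Number of channels =", channels)
```

Note that the shape is ordered as (height, width, channels), that is rows first, which is easy to mix up with the (width, height) order some OpenCV functions expect.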
In this section, we will look at how to write an image and save it to disk. The example will be simple: I will write out the same image that we just read. You can, however, feel free to draw your own images and save them in the file format of your choice. Writing is done with the cv2.imwrite() function, which takes the output file name and the image to save.
After running this, my folder contains two images: the original Lena image and the image we wrote, lena1. You can also specify the path where you want to write and save the image. Now that we have an idea of how to perform the basic operations of reading, displaying, writing, and saving an image, we can move forward to the next topic, where we will learn how to manipulate these images.
Now that we have a brief understanding of the basic operations related to computer vision, let us look at the ways we can manipulate an image. This is extremely useful for many specific tasks. The resize function rescales the image to different dimensions. We can make it bigger or smaller, depending on the task being performed. Here, I have halved the dimensions of the original image, giving a new image that is half as wide and half as tall as the original.
Rescaling an image to a bigger or smaller size is useful in a variety of applications where you need an image of a particular dimension to perform a task more effectively. An example is transfer learning, where a pretrained deep neural network typically expects inputs of a fixed size.
- Converting into grayscale image:
The next way to manipulate images is to convert them to grayscale. This is sometimes an extremely important step because it reduces the load on the model being trained. Grayscale images have a single channel, so working with them is less complex than working with three-channel color images, and it lowers the computational cost for systems that lack the resources to process color images. The conversion is a single call to the color conversion function of the cv2 module, which has built-in support for turning color images into grayscale.
OpenCV uses BGR instead of the standard RGB convention, so don’t be confused by the BGR naming used when converting a color image to grayscale. This was also mentioned in the earlier section. When you display an image read by opencv with the matplotlib library, the colors will look wrong, because the array is in BGR order while matplotlib expects RGB. This issue can be solved with the same color conversion operation provided by the cv2 module.
The final task we will look at in this section is the blurring of images. The main reason for blurring an image is to remove noise and smooth the image. It is a kernel-based filtering technique that evens out irregularities and the overall noise in the image. Low-intensity edges are removed as well, which helps later processing steps. Blurring can also be used to hide information for security or privacy reasons. The opencv module provides the blur operation directly. Like the grayscale and rescaling operations, blurring is used extensively across computer vision tasks.
The Gaussian blur is a widely used effect in graphics software, typically to reduce image noise and reduce detail. It is also used as a preprocessing stage before feeding images to machine learning or deep learning models. I have used a kernel size of (19,19) to get a strongly blurred version of the existing picture. Make sure the kernel dimensions are odd numbers, not even ones. You can change the kernel size of (19,19) to one that suits your purpose better.
This completes the first section on dealing with images. The next step is to look at how we can draw using the opencv module, which is the next significant topic on the way to mastering computer vision. Let us proceed to the next section and learn how to draw some basic shapes.
2. Drawing with OpenCV:
This next section covers some fairly straightforward drawing methods in the opencv module, so we will go through each of them quickly and intuitively. Let us start off with drawing a simple line.
A simple line is drawn in the cv2 graphical window with cv2.line(). The first step is to create an image that is entirely black, which gives better contrast for the visualization. You can choose the traditional white background if it suits you better. The line is then defined by the following arguments:
- The image where the line should be drawn.
- The starting point with both the x and y coordinates.
- The ending point with both the x and y coordinates.
- The color of the line, in BGR format. A value such as (255, 0, 0) gives a blue line.
- The last argument defines the thickness of the line.
The result is a line cutting diagonally across the entire graphical window. You can use your preferred starting and ending coordinates to draw your own lines.
The next operation the opencv module allows us to perform is drawing a rectangle. I won’t explain too much here because it is very similar to drawing a line. Here too you define a start point and an end point, and these two points are treated as opposite corners of the rectangle.
The result is a rectangle drawn in the center of the graphical window. You can draw rectangles at any position and of any size, and change the color as well.
The opencv library also allows us to draw a circle in a similar fashion to the line and the rectangle. However, there is one key difference. Instead of a start and end point, you give the center point as x and y coordinates. After this, you specify the radius of the circle, which defines how big the circle will be. You can adjust the color and thickness according to your preferences as well.
The result is a yellow circle centered in the graphical window. As usual, you can choose your own center coordinates, radius, color, and thickness.
We are done with most of the drawings, but at some point it also becomes essential to add some text to the displayed images. Thankfully, opencv gives us the putText() function, which adds text to the image shown in the graphical window. I have used the Hershey Simplex font here, but I would highly recommend checking out the various font options available in opencv and choosing one according to your preference. The arguments are the image, the text to be displayed, its position, the font, the font scale, the color, the thickness, and an optional line type. I would also suggest looking deeper into the various line type options available.
The result shows the text "computer vision" displayed at the center. I have chosen a red color for my text with a thickness of 3. Please feel free to try out the various options according to your preferences and explore more. Only with exploration and practice can you truly master computer vision and artificial intelligence.
This is the final drawing operation we will be performing in this section before moving on to the next topic. The polylines function in the opencv module can be used to draw almost any shape made of straight line segments. Here, I am constructing a hexagon. I have defined an array of 6 points holding the positions of the hexagon's corners. Once the array has all the necessary points, we reshape it into the layout that polylines expects for polygonal curves. We then state that the polygon is closed, and finally define the color used to draw it.
The result is a hexagon. Needless to say, you can use the polylines method to draw any other polygon, including triangles, squares, and rectangles. Just make sure you define the right number of coordinates and list them in the right order.
We have successfully completed all the basics of drawing with the opencv module. Before we move on to the final topic, I would like to reiterate that practicing and exploring is key to success in computer vision. With this basic knowledge, you can try out the drawing functions in opencv to design something really cool. If you are interested in working on this, please do send me screenshots of whatever you design. I would totally love to see what you have come up with.
Let us move to the final topic of this section, where we will cover how to access the webcam, which is extremely useful for real-time and real-life scenarios. So, let us move on to the next section to understand this topic in further detail.
3. Accessing the Webcam:
We have reached the final part of this section, which is accessing your webcam for real-time image or video analysis. This access is useful for real-time object detection, face recognition, and video surveillance, among many other applications. Opencv allows you to access external cameras and lets you choose which camera to use. Each captured frame can then be displayed in the same way we displayed images earlier. The procedure to access your webcam works as follows.
Let us go through the procedure step by step. The cap variable holds the capture object used to access the webcam. While the capture is running, we read the video frame by frame and show each frame in the external cv2 graphical window. We use a waitKey call, as in the earlier sections. If the 'q' key is pressed while the video is running, the loop ends and the program leaves the graphical window. The cap.release() command is used to release the webcam, and cv2.destroyAllWindows() closes the cv2 graphical window.
If you want to play a saved video from your folder, that can also be easily done. Instead of specifying which webcam to use, pass the location of a video file, in the same way we passed a file name when working with images, and you will get a similar result: the video you selected will be played. Just to make sure that we are all on the same page, the code line below shows how to open the video you select.
Video = cv2.VideoCapture("your path")  # Read the video
With this, we have come to the end of the basic concepts of computer vision and how to use the opencv module effectively to achieve the desired task. However, this is not the end, because we still need to know how this knowledge helps us solve more complex computer vision tasks, and what exactly those tasks are. To get a better understanding of this, let us look at the applications of computer vision in detail in the next section of this article.