Creating Your Own Image Data-set for Machine Learning

Source: Deep Learning on Medium

One of the basic tasks in creating an effective machine learning model is having the appropriate data-set available to us.
It is not always easy to find the required data-set, especially for image-related data.

Photo by Ales Nesetril on Unsplash

In this article, we will tackle this problem with a simple but effective solution: learning how to create our own image data-set using our laptop’s webcam, or any camera connected to our computer. The resulting data-set can be used to train a neural network with deep learning algorithms, or to run other machine learning algorithms on.
Creating our own data-set may seem like a tiresome process, but the data will be specific to our requirements and can be fine-tuned to them, which eventually leads to much better results.

The steps we will follow are as follows:
1) Use the OpenCV library for Python to capture frames from our webcam or any other video recording device connected to our PC.
2) Apply a Haar Cascade classifier to detect the object in the video frame.
3) Crop the detected object out of the frame, convert it into an image format, and store it in a directory.

Now that we have gone through the basic procedure, let us begin.

For the purpose of demonstration, I will be creating a data-set of pictures of the eyes of people who appear in the camera frame.
Any other object can be detected just by replacing a few lines of code, as we will see later.

The Requirements

Let us first import the required packages to run our code. We will be using just two import statements:

import cv2
import os

cv2 is the Python wrapper library for OpenCV, which can be installed by running the following command from either the command prompt or the Anaconda prompt, depending on your Python installation:

pip install opencv-python

If that doesn’t work, I suggest you follow an OpenCV installation tutorial for Windows or macOS.

Now we need to download the corresponding Haar Cascade .xml files to enable object detection. We will download two files: the first detects the face, and once the face is detected, we detect the eyes inside it. This two-stage approach improves accuracy, since wrongly detected frames can be avoided to a certain degree.

Let’s head over to GitHub to download the files. Download haarcascade_eye.xml and haarcascade_frontalface_alt.xml and place them in the same directory as your Python code (or any directory; just remember the path to the files).

Now let us consider the first part of our code:

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_alt.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')
cap = cv2.VideoCapture(0)
image_eyes = []
path = 'C:\\Users\\Your_System_Name\\.spyder-py3\\Deep_Learning\\Project\\Generated'

Here we assign two variables that load the Haar Cascade values for both objects; we can then use these objects to detect the face and subsequently the eyes. You can simply change the .xml file for whatever object you wish to capture images of.
We then begin capturing frames using OpenCV’s built-in VideoCapture() function, specifying the camera number as the parameter. Leave it as 0 if you have only one camera attached to your PC; if you have multiple cameras, pass the appropriate index.
Then we create a list to which we will append all the images (each stored as an array of pixel values) of the detected frames.
Finally, we create a variable called path to store the output location where the generated images will be saved. (Make sure to use two backslashes, as Python treats a single backslash as an escape character.)
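The hard-coded Windows path above can also be built portably with os.path.join, which picks the right separator for the operating system. A minimal sketch (the "Generated" folder name and the temp-directory base are just placeholders for demonstration):

```python
import os
import tempfile

# Build the output path without worrying about backslashes vs. forward slashes.
base = tempfile.gettempdir()              # stand-in for your project directory
path = os.path.join(base, "Generated")    # hypothetical output folder name
os.makedirs(path, exist_ok=True)          # create the folder if it is missing
print(os.path.isdir(path))
```

os.makedirs with exist_ok=True also saves you from a crash when the folder already exists.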

Let us continue to the key part of our program.

while True:
    ret, img = cap.read()
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    face = face_cascade.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in face:
        roi_gray = gray[y:y+h, x:x+w]
        roi_color = img[y:y+h, x:x+w]

        eye = eye_cascade.detectMultiScale(roi_gray)
        for (ex, ey, ew, eh) in eye:
            #cv2.rectangle(roi_color, (ex, ey), (ex+ew, ey+eh), (0, 255, 0), 2)
            image_eyes.append(roi_color[ey:ey+eh, ex:ex+ew])

    cv2.imshow("Face", img)
    for i, x in enumerate(image_eyes):
        cv2.imwrite(os.path.join(path, "eye-" + str(i) + ".jpg"), x)

    k = cv2.waitKey(30) & 0xff
    if k == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

We begin with a while loop that iterates until the termination condition is met, i.e. it keeps recording from the webcam until stopped.
We call the built-in read() function and assign two variables, ret and img. The frame is stored in the img variable.
Next, we convert the frame to grayscale, as I wanted my images to be grey (you can leave this out if you want colour images).
Now we invoke the Haar Cascade classifier to detect the face by calling the detectMultiScale() function. The parameters are pretty standard; you can explore the documentation and experiment yourself, but these default values should work just fine.
Next, we run a for loop to get the coordinates of the detected face frame (this part is common to any rectangular object detection). Once that is done, we call detectMultiScale() again to detect the eye frames within the face frame, and run a similar nested for loop to get the coordinates of the two eyes. Inside this inner loop we append the cropped eye regions to the image_eyes list.
The commented-out line is the code to draw boxes around the detected eyes. It will help you see whether the object detection works as per your requirements.
Then we invoke the imshow() function to display what the webcam is recording to the user.
Then we write a for loop to enumerate over the image arrays in the image_eyes list and write each file to our computer, specifying the file name, path, and image type.
Finally, we assign the ‘q’ key to stop the program from recording further and to terminate cleanly.
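The cropping in the inner loop, roi_color[ey:ey+eh, ex:ex+ew], is ordinary NumPy slicing: rows (the y axis) come first, then columns (the x axis). A toy sketch with a synthetic array standing in for a real frame:

```python
import numpy as np

# An 8x8 stand-in "frame" with distinct pixel values.
frame = np.arange(64, dtype=np.uint8).reshape(8, 8)

x, y, w, h = 2, 1, 4, 3            # hypothetical detection box
roi = frame[y:y+h, x:x+w]          # rows first (y), then columns (x)
print(roi.shape)                   # height x width, i.e. (h, w)
```

Mixing up the axis order is a common source of sideways or mis-cropped output images.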

That’s pretty much it: the program will detect your eyes and store the detected regions as .jpg files. The number of images generated is pretty high, and large data-sets can be generated with different types of input, which can in turn be used in deep learning or other fields to train our models.
I have given an example of detecting eyes and making an eye data-set; you can create your own image data-sets using the same technique.
All you have to change is the .xml file loaded into the CascadeClassifier for the appropriate object, plus minor adjustments inside the for loop, and you are good to go.