Generate YOLO Object Detection training data from its own results

Original article was published by Python Lessons on Deep Learning on Medium

This tutorial is one of the last in my YOLO object detection series. Having covered almost everything about YOLO, I thought it would be useful to implement something interesting and fun. Then I remembered my old Counter-Strike: Global Offensive TensorFlow aimbot, where I used TensorFlow to detect enemies. That project was unsuccessful because the detection step was the bottleneck: whenever TensorFlow received an image and ran detection on it, FPS dropped to 4–5 frames per second, and the game became impossible for our bot to play. Now I have YOLO, which is much better, so I can revive the project.

Also, I am now on Linux, whereas before I was on Windows 10. Because I plan to use TensorRT (in my previous tutorial I did a speed comparison), I decided to stay on Linux: it's simpler to use TensorRT there than on Windows 10. Of course, there are ways to install TensorRT on Windows 10, but for me it's simpler to install Steam on Linux.

To start with, I downloaded the same CSGO_training data I used last time. I followed my own YOLOv4 tutorial to train a custom object detector, which I then used to generate new training data and achieve better accuracy than last time.

If you don’t know how to train a custom YOLOv4 object detector, you can find plenty of tutorials with instructions at the end of the README on GitHub.

In the tools folder, I have already uploaded a script that you can use to create XML files from detections. To test it with image detection, go to tools/, scroll to the detect_image function, and uncomment the CreateXMLfile(“XML_Dete… line. You can do the same with video or webcam detections. To check that it works, simply run it from the main folder and verify that the XML_Detections folder was created with content in it.
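To give an idea of what such a script produces, here is a minimal sketch of a Pascal VOC-style XML writer using only the standard library. This is not the actual CreateXMLfile from my repository; the function name, signature, and bbox layout here are hypothetical stand-ins for illustration:

```python
import os
import xml.etree.ElementTree as ET

def create_xml_file(folder, image_name, width, height, bboxes, class_names):
    """Hypothetical sketch of a detections-to-annotation writer: one Pascal
    VOC <object> entry per detected bounding box.

    bboxes: list of (x_min, y_min, x_max, y_max, class_id) tuples."""
    os.makedirs(folder, exist_ok=True)
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = image_name
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for x_min, y_min, x_max, y_max, class_id in bboxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = class_names[class_id]
        box = ET.SubElement(obj, "bndbox")
        ET.SubElement(box, "xmin").text = str(int(x_min))
        ET.SubElement(box, "ymin").text = str(int(y_min))
        ET.SubElement(box, "xmax").text = str(int(x_max))
        ET.SubElement(box, "ymax").text = str(int(y_max))
    # Save next to other annotations, named after the image file.
    path = os.path.join(folder, os.path.splitext(image_name)[0] + ".xml")
    ET.ElementTree(root).write(path)
    return path
```

Writing the file with the same base name as the image is what lets annotation tools such as LabelImg pair each XML with its screenshot automatically.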

To collect training data for CSGO I use a different method: I grab my own computer screen and run detection on it. You can find the full script code here.

If you analyze my code, you will see that the draw_enemy function is simply a copy of draw_bbox. Next, I use the detect_enemy function, which is also a copy of the detect_image function, except that I removed the first line, which read the image from a given path. I also removed the last lines, which were used to display the image, and instead of returning only the resulting image I return the bboxes as well.
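Structurally, the change amounts to this sketch, where model, postprocess, and draw are hypothetical stand-ins for the real YOLO pieces (the actual detect_image and detect_enemy in the repository work on YOLO tensors directly):

```python
def detect_enemy(model, postprocess, draw, frame):
    """Sketch of detect_image reshaped for screen capture: it takes an
    in-memory frame instead of reading an image path, skips the display
    step at the end, and returns the bboxes alongside the drawn image."""
    bboxes = postprocess(model(frame))  # raw predictions -> filtered boxes
    image = draw(frame, bboxes)         # draw_enemy in the article, a copy of draw_bbox
    return image, bboxes
```

Returning bboxes as well is the key difference: the caller needs the box coordinates, not just a picture, to write the XML annotation.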

And this is my while loop:

If you watched my YouTube tutorial above, you will understand that I use the mss Python library to grab only part of my screen, because I need to detect objects in the game region, not on the full screen; I use an offset to crop away the unnecessary parts. Then I convert the image from RGBA to RGB and send the captured screen image to my detect_enemy function. As a result, I receive an image with drawn bounding boxes and a bboxes array. Next, we check whether any bboxes were detected; if there are, we can create an XML file from our results by simply calling the CreateXMLfile function, and the XML file will be created in the XML_Detections folder. Finally, I add a delay of 2 seconds: detection usually runs quite fast, so without the delay we would save many near-identical copies, which would mean a lot of manual checking.
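The steps above can be sketched as the loop below. The capture region numbers are hypothetical, and the callables are injected stand-ins so the shape of the loop is clear: in the real script, grab_screen is an mss screenshot, rgba_to_rgb is a cv2.cvtColor call, detect_enemy is the YOLO function described above, and create_xml is CreateXMLfile:

```python
import time

# Hypothetical capture region; the real offsets depend on your resolution and HUD.
MONITOR = {"top": 80, "left": 0, "width": 800, "height": 600}

def capture_loop(grab_screen, rgba_to_rgb, detect_enemy, create_xml,
                 delay=2.0, max_frames=None):
    """Sketch of the main grab -> detect -> annotate loop.

    Returns the number of frames saved as XML annotations."""
    saved = 0
    frames = 0
    while max_frames is None or frames < max_frames:
        frame = rgba_to_rgb(grab_screen(MONITOR))  # grab and convert the game region
        image, bboxes = detect_enemy(frame)        # run detection on the frame
        if len(bboxes) > 0:                        # only keep frames with detections
            create_xml(image, bboxes)              # write annotation to XML_Detections
            saved += 1
            time.sleep(delay)                      # pause so we don't save near-duplicates
        frames += 1
    return saved
```

The 2-second sleep only fires after a frame is actually saved, so quiet stretches of gameplay are scanned at full speed while bursts of detections don't flood the folder with duplicates.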

Right now I use the LabelImg tool to check my created XML files: