Person of interest in real life with deep learning

Original article was published by Enismentalistler on Deep Learning on Medium

If you stand on a high hill with no taller buildings around, you can take in a large part of Istanbul at a glance. But if your goal is not just to watch Istanbul, but to take notes on what you see, to follow hundreds of people at once and draw inferences, it becomes difficult (even impossible).

Illustration: Matt Vogt

The Istanbul Metropolitan Municipality has placed tourist cameras at important spots so that Istanbul can be watched without climbing a hill, and it broadcasts them publicly. And that is not all: Mobese publicly broadcasts traffic cameras from almost everywhere in Istanbul on its website. On July 30, 2013 we even built software that lets people in Istanbul watch all of the touristic cameras from a single application, and shared it.

Now we can watch Istanbul more easily, from anywhere we want; but everything flows by so fast that the eye cannot single anything out. And isn't making sense of something the whole point? So we will try to build a machine like the one in Person of Interest.

Part 1: Detect objects

There is no need to reinvent the wheel: there are dozens of ready-made libraries and datasets on the Internet. As libraries, we will use OpenCV and NumPy, and we will take advantage of the pre-trained YOLOv3 model to detect objects. The pre-trained files are available for download. If you want, you can also use a framework called Darknet, which supports CPU and GPU computation and installs quickly; but we will not use it in this project.

We download the yolov3.weights, yolov3.cfg and coco.names files. Some readers may be curious about deep learning without being familiar with it, so let us explain what these files are for. yolov3.weights contains the weights of a previously trained network. yolov3.cfg is the configuration file: it describes the CNN architecture, including the anchor boxes, number of classes, learning rate, optimization technique, input size, probability threshold and batch size. The coco.names file holds the names of the objects to be detected. We tried to use Turkish terms as much as possible; but if you don't understand a term, you can learn more by researching it online.

We can process the YOLO model with about 80 lines of Python code. (We will not share all of the code, since this article is meant to inspire. You can find some of the code we share here: but if you want to do it yourself, we will try to explain how it is done.)
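Since the full code is not shared, here is a minimal, hypothetical sketch of the post-processing step: turning the raw output rows that OpenCV's `dnn` module produces for YOLOv3 into named detections. The short class list, the threshold, and the output array below are all stand-ins for illustration, not the article's actual code.

```python
import numpy as np

# Class names would normally be read from coco.names; a short stand-in list here.
CLASS_NAMES = ["person", "bicycle", "car"]

def parse_yolo_output(output, conf_threshold=0.5):
    """Turn raw YOLO output rows into (class_name, confidence, box) tuples.

    Each row is [cx, cy, w, h, objectness, score_0, score_1, ...] with
    coordinates normalized to [0, 1], as produced by YOLOv3's output layers.
    """
    detections = []
    for row in output:
        scores = row[5:]
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id] * row[4])  # class score x objectness
        if confidence >= conf_threshold:
            detections.append((CLASS_NAMES[class_id], confidence, row[:4].tolist()))
    return detections

# Synthetic output: one confident "car" row and one low-confidence row.
fake_output = np.array([
    [0.5, 0.5, 0.2, 0.1, 0.9, 0.05, 0.05, 0.9],
    [0.1, 0.1, 0.1, 0.1, 0.2, 0.6, 0.2, 0.2],
])
print(parse_yolo_output(fake_output))
```

In a real pipeline, the boxes that survive this filter would additionally go through non-maximum suppression (e.g. `cv2.dnn.NMSBoxes`) before being drawn.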

Then we record the addresses of all the Mobese and touristic cameras in m3u8 format and store them in a Python dictionary, so that only the cameras we select appear on screen and have their data processed.
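A minimal sketch of such a camera dictionary, with hypothetical names and placeholder URLs (the real m3u8 addresses come from the Mobese and tourist-camera pages):

```python
# Hypothetical mapping of camera names to their m3u8 stream addresses.
CAMERAS = {
    "galata": "https://example.com/streams/galata.m3u8",
    "bosphorus": "https://example.com/streams/bosphorus.m3u8",
    "taksim": "https://example.com/streams/taksim.m3u8",
}

def stream_url(name):
    """Return the stream address for a chosen camera, or raise a clear error."""
    try:
        return CAMERAS[name]
    except KeyError:
        raise KeyError(f"unknown camera {name!r}; choose from {sorted(CAMERAS)}")

# OpenCV can open an m3u8 address directly, e.g.:
#   cap = cv2.VideoCapture(stream_url("galata"))
print(stream_url("galata"))
```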

Below is a GIF of the results we got when we ran the software. All of the results were taken from real-time cameras and processed in real time.

Please check this link:

As you can see, we successfully processed the camera images in real time. But we don't just want to detect the passing cars and the people; we also want to detect people's moods.

Part 2: Sensing people’s moods

There are many methods for sensing people's moods, each with its own advantages and disadvantages. For example, we can use the Haar cascade method for face detection: its advantage is that it does not tax the processor and produces results quickly; its disadvantage is that it only works when it sees the face from a certain angle, and its error rate is very high compared to ordinary deep learning algorithms.

After some research on GitHub, we noticed projects doing emotion detection with libraries like TensorFlow and Keras. Among them we found ready-made software that taxes the processor less than the others. Its general logic is, again, to detect the human face with a Haar cascade, then pass the face through a pre-trained TensorFlow model that analyzes the emotion.
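The repository itself is not reproduced here, but the last step of such a pipeline, turning the model's softmax output into an emotion label, can be sketched like this. The seven-label set is a common choice in FER-style projects and an assumption on our part, not necessarily the set the ready-made software uses:

```python
import numpy as np

# Typical label set for FER-style emotion models (an assumption).
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def label_emotion(probabilities, min_confidence=0.4):
    """Map a model's softmax output over EMOTIONS to a label.

    Returns "unknown" when the model is not confident enough, which helps on
    street cameras where faces are small and seen at odd angles.
    """
    probs = np.asarray(probabilities, dtype=float)
    best = int(np.argmax(probs))
    if probs[best] < min_confidence:
        return "unknown"
    return EMOTIONS[best]

print(label_emotion([0.05, 0.0, 0.05, 0.7, 0.1, 0.05, 0.05]))
```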

Please check this link:

When we combined emotion analysis with YOLO, we got an image like the one above. The biggest problem here is that the software cannot always detect faces, because people appear at many different angles, which makes things a little difficult. Even if the algorithm detects only 20 faces in an hour, that is good enough for us, so we won't push it too hard.

Right now the software can see and detect people, and the beauty of it is that its eyes never get tired. But for us, detection alone is not enough; we can use this data to detect problems.

Part 3: Software detecting problems

Our software can see everything with its artificial eyes, but no one has ever taught it anything. What is an accident? What is a problem? It doesn't know. Let's be its mentor and teach it to recognize certain problems and report them when it sees them. From here on, we will rely on the power of plain algorithms rather than artificial intelligence.

The first step is detecting car accidents. What we need to do is find a camera that sees the cars at the right angle: if the camera is not inclined enough, or sees the cars head-on in the horizontal plane, our algorithm will not work. But since almost all Mobese cameras are mounted at roughly a 45-degree angle, that is not a problem for us.

The approach we will follow to determine an accident: our YOLO-based software detects objects labelled "car", and when the bounding boxes of two such objects intersect, it concludes that an accident has occurred.

Please check this link:

We wrote an additional eight lines of code for accident detection. Now our algorithm will report an accident whenever the bounding boxes of two objects labelled "car" are intertwined.
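The eight lines themselves are not shown, but the overlap test they describe can be sketched as follows, with boxes given as (x, y, width, height). This is a rough heuristic: two cars passing close together will also trigger it.

```python
def boxes_intersect(a, b):
    """Check whether two (x, y, w, h) bounding boxes overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def detect_accidents(car_boxes):
    """Return index pairs of 'car' boxes whose frames are intertwined."""
    hits = []
    for i in range(len(car_boxes)):
        for j in range(i + 1, len(car_boxes)):
            if boxes_intersect(car_boxes[i], car_boxes[j]):
                hits.append((i, j))
    return hits

# Two overlapping cars and one far away.
cars = [(10, 10, 50, 30), (40, 20, 50, 30), (200, 200, 40, 40)]
print(detect_accidents(cars))
```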

In step two, we are going to detect unauthorized parking. To do this, we will evolve our alarm application a little further, so that it triggers when a car is parked in the areas we have marked.

For marking, we will take advantage of the selectROI command, one of OpenCV's functions. With this command you can mark a region on the image, and we get the starting and ending x-y coordinates of the marked area. We then compare the detected car's x and y coordinates with the region's coordinates to determine whether it has entered the area. And to decide whether it is parking, we also have to compare against time. The software uses a simple tic-toc scheme: when the car enters the region we record a tic, when it leaves we record a toc, and we compare the elapsed time against a threshold. When the car stays inside the region longer than the threshold, it is flagged as parked. We set the threshold to 10 minutes.
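A minimal sketch of this tic-toc logic, assuming the region is the (x, y, w, h) tuple that `cv2.selectROI` returns; the clock is passed in explicitly so the 10-minute rule can be exercised without actually waiting:

```python
class ParkingWatcher:
    """Flag a car as parked when it stays inside a marked region too long.

    region is (x, y, w, h) as returned by cv2.selectROI. The 600-second
    default matches the article's 10-minute threshold.
    """

    def __init__(self, region, limit_seconds=600):
        self.region = region
        self.limit = limit_seconds
        self.tic = None  # time the car entered the region

    def _inside(self, point):
        x, y, w, h = self.region
        px, py = point
        return x <= px <= x + w and y <= py <= y + h

    def update(self, car_center, now):
        """Feed the car's center each frame; return True once it counts as parked."""
        if self._inside(car_center):
            if self.tic is None:
                self.tic = now          # the "tic": entry time
            return now - self.tic >= self.limit
        self.tic = None                 # the "toc": car left, reset the timer
        return False

watcher = ParkingWatcher(region=(100, 100, 200, 150), limit_seconds=600)
print(watcher.update((150, 150), now=0))     # just entered
print(watcher.update((160, 155), now=700))   # still inside after 700 s
```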

We also wrote the restricted-area entry detection with the same selectROI method, so we won't repeat the explanation. When someone enters an area we have marked, the software detects it.

To detect knives, we again rely on the fact that YOLO's pre-trained classes include a knife object. In our code, a for loop draws a frame around each detected object; since we print the name of every detected object inside that loop, we can check on the next line, with an if/else statement, whether the detected object is a knife or something else.
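The loop described above might be sketched like this; the detection list is synthetic and the frame-drawing step is elided:

```python
def scan_detections(detections, dangerous={"knife"}):
    """Walk the per-frame detections (name, box) and collect alerts.

    Mirrors the loop in the article: each object is framed, its name is
    printed, and an if/else decides whether it is a knife or something else.
    """
    alerts = []
    for name, box in detections:
        print(name)  # each detected object's label, as in the original loop
        if name in dangerous:
            alerts.append((name, box))
        else:
            pass  # ordinary object: just draw its frame and move on
    return alerts

frame_detections = [("person", (10, 10, 40, 80)), ("knife", (30, 50, 10, 5))]
print(scan_detections(frame_detections))
```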

Now the software will alert us when it detects a knife.

We also wanted to add detection of a ship or boat running aground; but for that we need a fixed camera, and we couldn't find a fixed camera facing the sea. However, using the Dlib library we have previously written software that tracks a moving object once it has been marked. In the same way, we can keep the region we have chosen fixed and track motion relative to it. We will do that integration in the future. That way, we could have detected the ship that ran aground in Sarıyer in recent days.

For fall detection, if the camera is placed at a flat angle (and the image is not skewed), we can flag a fall when a detected person's bounding box becomes wider than it is tall, i.e., when its y-length drops below its x-length.
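That width-versus-height check can be sketched as a one-line heuristic, with boxes again as (x, y, w, h); the ratio threshold is our own assumption:

```python
def looks_fallen(box, ratio=1.0):
    """Heuristic fall check for a detected person's (x, y, w, h) box.

    With a roughly level camera, a standing person's box is taller than wide;
    when the height (y-length) drops below the width (x-length), flag a fall.
    """
    _, _, w, h = box
    return h < w * ratio

print(looks_fallen((50, 120, 40, 90)))  # standing: taller than wide
print(looks_fallen((50, 180, 90, 35)))  # lying down: wider than tall
```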

Part 4: Software records what it sees

To do this, we can save the data as a JSON file or as Excel. We will save it as Excel, since we don't plan to use the data anywhere else later. For this we will use the openpyxl library. The data we need are the names of the detected objects, their coordinates on the x and y axes, the number of detected objects in the scene, the accident status, the danger status, the time, and the emotional state. With this data we can later build a heat map (where people pass the most, at which times and on which days it is more crowded, etc.).
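A minimal sketch of such a logger with openpyxl, using the columns listed above; the exact field names and the file name are our own choices, not the article's:

```python
from datetime import datetime
from openpyxl import Workbook

# Columns matching the data listed in the article.
COLUMNS = ["object", "x", "y", "count", "accident", "danger", "time", "emotion"]

def save_log(rows, path="detections.xlsx"):
    """Write detection records to an Excel sheet, one row per event."""
    wb = Workbook()
    ws = wb.active
    ws.append(COLUMNS)                  # header row
    for row in rows:
        ws.append([row.get(col, "") for col in COLUMNS])
    wb.save(path)

save_log([
    {"object": "car", "x": 120, "y": 340, "count": 3, "accident": "no",
     "danger": "no", "time": datetime(2020, 1, 5, 14, 30).isoformat(), "emotion": ""},
    {"object": "person", "x": 80, "y": 200, "count": 5, "accident": "no",
     "danger": "no", "time": datetime(2020, 1, 5, 14, 31).isoformat(), "emotion": "happy"},
])
```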

When we collect this data, the results look like this:


Now we have built software that follows everything in real time and notifies us. Along the way, we tried to explain as much as we could, to inspire you.

GitHub link: