Original article was published on Deep Learning on Medium
This started with waking up early in the middle of the night, during the long, extremely difficult lockdown imposed by the COVID-19 pandemic.
I woke up on a Saturday at 5 AM to get groceries and saw a bunch of people playing football, amidst the lockdown, in a nearby office campus. Then I thought: why can't we just detect, from a distance, who is walking around without a mask?
Then it struck me, wait… you can! I can try something. (Sometimes you forget what skills you possess. Happens to most of us! 😂) I went home with excitement and set up my Deep Learning environment.
I used my version of DarkNet to quickly spin up the standard Scene Detection models. Then I started curating a dataset of Masked and Unmasked images from different public datasets and repositories.
The beauty of DarkNet is that once you build all its C binaries and bindings, it's fairly easy to retrain its outermost layer. So I got the model's pretrained weights and the dataset, which I then had to annotate, and started training on my GTX 1060 with CUDA for around 10 hours.
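The post doesn't show the annotation step itself; as a sketch, DarkNet's YOLO-style labels use one text file per image, with one line per object and coordinates normalized to the image size. The helper name and the mask class ID below are illustrative, not from the article:

```python
# Sketch of a DarkNet/YOLO-style annotation line: one .txt file per image,
# one line per object: "<class_id> <x_center> <y_center> <width> <height>",
# all four coordinates normalized to [0, 1] relative to the image dimensions.

def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# E.g. a masked face (hypothetical class 0) at pixels (120, 80)-(220, 200)
# in a 640x480 frame:
print(to_yolo_line(0, (120, 80, 220, 200), 640, 480))
# -> 0 0.265625 0.291667 0.156250 0.250000
```

Once every image has a label file like this, DarkNet's training command can consume the dataset directly.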
This was the first result.
I ran it under several lighting conditions and saw that the results fluctuated in low-light environments, so I added some histogram equalization techniques.
Histogram equalization redistributes pixel intensities across the full range, boosting contrast in dim images. The results came out quite well.
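The article doesn't include the equalization code; here is a minimal NumPy sketch of plain histogram equalization (equivalent in spirit to what OpenCV's `cv2.equalizeHist` does for 8-bit grayscale images; the function name is my own):

```python
import numpy as np

def equalize_hist(img):
    """Histogram-equalize an 8-bit grayscale image.

    Maps each intensity through the normalized cumulative histogram (CDF),
    spreading the occupied intensity range across the full 0-255 scale.
    """
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # CDF value of the darkest intensity present
    # Classic equalization formula: rescale the CDF to [0, 255]
    lut = np.clip(np.round((cdf - cdf_min) / (img.size - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[img]

# A dim image confined to intensities 50..99 gets stretched to 0..255:
dim = np.random.randint(50, 100, size=(64, 64)).astype(np.uint8)
bright = equalize_hist(dim)
print(dim.min(), dim.max(), "->", bright.min(), bright.max())
```

For frames with strong local lighting differences, a tile-based variant such as CLAHE (`cv2.createCLAHE` in OpenCV) tends to hold up better than global equalization.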
The next step was to run it on a video. Under the current lockdown circumstances it was not feasible for me to get access to an IP camera, so I asked my flatmate to point and shoot me (forgive the shorts 😅, it's extremely hot here!) from the window overlooking the street.
Once the accuracy was acceptable to me, I realized I needed a further performance boost. So I tried something different: I needed to optimize the amount of memory allocated to the thread running the job, as well as the cache of frames processed on every tick.
So instead of running the job through a Python script, I built the remaining DarkNet modules as a whole, used CMake to compile the C bindings for my OS, and added DarkNet to my system PATH variables. This gave me access to run inference from just the CLI 😉
This removed the need for a script to do the inference. I then searched online for videos with a lot of people walking and found one where a guy walks the streets with his camera pointed at people. I ran the model on that video, and these were the results.
I reached around 40 FPS, which was good enough for video processing. I really wanted to try the same with a webcam, but sadly being quarantined doesn't give me a lot of space in my apartment to test something at scale.
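The timing code isn't shown in the article; a minimal sketch of how per-frame throughput could be measured, with a stand-in `infer` function in place of the real DarkNet forward pass:

```python
import time

def measure_fps(frames, infer, warmup=5):
    """Time per-frame inference and return average frames per second.

    `infer` is a stand-in for the actual model forward pass; the first
    `warmup` frames are excluded so one-time setup cost doesn't skew the number.
    """
    for frame in frames[:warmup]:
        infer(frame)
    start = time.perf_counter()
    for frame in frames[warmup:]:
        infer(frame)
    elapsed = time.perf_counter() - start
    return (len(frames) - warmup) / elapsed

# Fake "model" that takes at least 5 ms per frame, run on 55 dummy frames:
fake_infer = lambda frame: time.sleep(0.005)
fps = measure_fps(list(range(55)), fake_infer)
print(f"{fps:.0f} FPS")
```

Swapping `fake_infer` for a real detection call on decoded video frames gives the sustained FPS figure for the whole pipeline, not just the network.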
This means we can now run this on IP Cameras, Drones, CCTVs in public places, and system webcams.
The ideal scenario would be to deploy this on a Data Center GPU like the new Nvidia V100, to get the astounding performance and results at scale.
We can build a lot of things if we just look for problems, and I feel new problems turn up every day. We just need to be creative about them.