Original article was published on Artificial Intelligence on Medium
Little Engineering Challenge
10% engineering; 90% creativity
I like this engineering problem because the solution requires no special training, and because it demonstrates the creative problem-solving side of engineering — anyone can understand the problem-set and possibly think of the solution.
Design a device (hardware+software) that detects gunshots with high confidence.
The environment: indoors, with bad acoustics (glass, steel, concrete), many people, much activity, and possibly machinery.
Current solution landscape
There are no high-confidence retail solutions that accomplish this objective. Many products are in the $1000/device range, but they fail, usually because they produce so many false positives that the devices are removed and/or the contracts are not renewed. (Note: there is a solution, but it’s not on the market.)
The Decibel Problem(s)
Gunshots typically register at 130–170 decibels. Most of your daily life registers between 40 and 80 decibels, with loud screaming and other loud noises perhaps hitting 90 decibels. So gunshots are clearly significantly louder than “normal” sounds. (Note 1: Even a jet engine at 20 feet only hits 130–140 decibels, so gunshots are usually significantly louder than any “daily life” sound.) (Note 2: Keep in mind that the decibel scale is logarithmic, so a gap of a few tens of decibels is a very large difference in actual sound pressure.)
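To make Note 2 concrete, here is a minimal Python sketch of what "logarithmic" means in practice. It uses the standard conversions (a pressure ratio is 10^(dB difference / 20), an intensity ratio is 10^(dB difference / 10)). The function names are mine, and the 150 and 80 decibel inputs are just example values picked from the ranges above.

```python
def pressure_ratio(db_a: float, db_b: float) -> float:
    """How many times greater the sound pressure is at db_a than at db_b."""
    return 10 ** ((db_a - db_b) / 20)


def intensity_ratio(db_a: float, db_b: float) -> float:
    """How many times greater the sound intensity (power) is at db_a than at db_b."""
    return 10 ** ((db_a - db_b) / 10)


if __name__ == "__main__":
    gunshot_db = 150.0  # example value inside the 130-170 range
    daily_db = 80.0     # top of the "daily life" range

    # Roughly 3,162 times the sound pressure...
    print(f"pressure ratio:  {pressure_ratio(gunshot_db, daily_db):,.0f}x")
    # ...and 10 million times the intensity.
    print(f"intensity ratio: {intensity_ratio(gunshot_db, daily_db):,.0f}x")
```

So a 70 decibel gap is not "almost twice as loud" in physical terms; it is several orders of magnitude.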
The problem is that commercial microphones measure with very low confidence outside of that standard daily range of 40–80 decibels. There are phone apps that claim to measure up to 130 decibels, but that is the standard upper limit of commercial microphone hardware, and measurements near it are not reliable. Even expensive commercial microphones are higher-confidence in the 40–80 decibel range than at the limits (0–30 and 100–130 decibels).
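One way to picture that upper limit is as a hard ceiling. The sketch below assumes the microphone simply clips at 130 decibels (the ceiling mentioned above). Real microphones distort more gradually and are already unreliable before they clip, so treat this as an illustration and not as a model of any particular device. The function names are hypothetical.

```python
import math

REF_PRESSURE_PA = 20e-6  # standard reference pressure for dB SPL


def spl_to_pressure_pa(db_spl: float) -> float:
    """Convert a level in dB SPL to a sound pressure in pascals."""
    return REF_PRESSURE_PA * 10 ** (db_spl / 20)


def measured_db(true_db: float, overload_db: float = 130.0) -> float:
    """Level reported by an idealized mic that hard-clips at overload_db.

    Assumption: anything above the overload point is flattened to it.
    """
    ceiling_pa = spl_to_pressure_pa(overload_db)
    clipped_pa = min(spl_to_pressure_pa(true_db), ceiling_pa)
    return 20 * math.log10(clipped_pa / REF_PRESSURE_PA)


if __name__ == "__main__":
    for true_level in (80.0, 120.0, 140.0, 160.0):
        print(f"true {true_level:5.1f} dB -> reported {measured_db(true_level):5.1f} dB")
```

Under this assumption, a 140 decibel event and a 160 decibel event both come out as 130, so the raw level alone cannot tell them apart.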
Further complicating the matter is the bad acoustic environment. A single human is highly unlikely to make (and largely incapable of making) sounds above 110 decibels, but the bad acoustics and many people/much activity, plus the possibility of machinery, mean that decibel spikes are possible. While it’s not realistic that any combination of sounds would reach gunshot level (>130 decibels), a commercial microphone wouldn’t be able to distinguish between the decibel level of many people/machinery and that of a gunshot in a bad acoustic environment.
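The claim that a crowd won't add up to gunshot level can be checked with the usual rule for combining independent sound sources: convert each level to intensity, add, and convert back. This is a simplified sketch that assumes the sources are uncorrelated and all arrive at the microphone at the stated level, and it ignores reverberation, which is exactly what a bad acoustic environment adds. The function names and the "100 people at 90 decibels" scenario are made up for illustration.

```python
import math


def combined_level_db(levels_db):
    """Combined level of several uncorrelated sources, each given in dB."""
    total_intensity = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total_intensity)


def sources_needed(source_db: float, target_db: float) -> float:
    """How many equal, uncorrelated sources at source_db add up to target_db."""
    return 10 ** ((target_db - source_db) / 10)


if __name__ == "__main__":
    crowd = [90.0] * 100  # hypothetical: 100 people, each at 90 dB at the mic
    print(f"100 sources at 90 dB -> {combined_level_db(crowd):.1f} dB")
    print(f"sources at 90 dB needed for 130 dB: {sources_needed(90.0, 130.0):,.0f}")
```

A hundred screaming people get you to about 110 decibels, and under these assumptions it would take on the order of ten thousand of them to reach 130. The spikes are real, but they stay well below a gunshot; the trouble is that the microphone can't measure that gap reliably.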
The Model Problem(s)
There is no such thing as a “gunshot” sound or acoustic model. Each type of gun produces a different acoustic signature (with some significant variation), and if you wanted high-confidence models, you’d probably need something like 80 different models to cover all the possible gunshot signatures. It’s safe to assume that the use case will entail continuous audio processing, so processing continuous audio through 80 models sounds like a bad idea (for the GPU(s), the network, and anything else involved). If a facility were to have multiple such devices, you could end up with bandwidth problems and a GPU you could roast marshmallows on. So, 80 models probably isn’t the greatest solution (which is why almost no one uses it).
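A back-of-the-envelope sketch shows how quickly this scales. Only the figure of 80 models comes from the text above; everything else is an assumption I picked for illustration: 20 devices in one facility, raw 16 kHz / 16-bit mono audio streamed to a central server, and one inference per model per one-second window.

```python
def raw_audio_bits_per_second(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bandwidth of uncompressed PCM audio from one device."""
    return sample_rate_hz * bit_depth * channels


def inferences_per_second(models: int, devices: int, windows_per_second: int) -> int:
    """Model runs per second if every model sees every window from every device."""
    return models * devices * windows_per_second


if __name__ == "__main__":
    devices = 20  # hypothetical facility size

    per_device = raw_audio_bits_per_second(16_000, 16, 1)  # assumed audio format
    print(f"per device:   {per_device / 1000:.0f} kbit/s")
    print(f"all devices:  {per_device * devices / 1_000_000:.2f} Mbit/s")

    print(f"80 models:    {inferences_per_second(80, devices, 1)} inferences/s")
    print(f"1 model:      {inferences_per_second(1, devices, 1)} inferences/s")
```

With these made-up numbers the network load is modest, but the compute load is 80 times that of a single model, all day, every day.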
The other solution is a generic “gunshot” model. A single model won’t melt your GPU, and your hardware solution can be fairly inexpensive. The problem is that generic models produce an undesirable number of false positives in actual use cases (false positives from a generic model are the most common reason why current solutions fail).
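It helps to see why even a small false-positive rate hurts when the audio never stops. The sketch below is plain arithmetic with hypothetical numbers (one-second windows, a 0.01% chance of a false alarm per window); neither figure is measured from any real product.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def false_alarms_per_day(window_seconds: float, false_positive_rate: float) -> float:
    """Expected false alarms per device per day under continuous monitoring."""
    windows_per_day = SECONDS_PER_DAY / window_seconds
    return windows_per_day * false_positive_rate


if __name__ == "__main__":
    # Hypothetical: the model is wrong on 1 window in 10,000.
    per_device = false_alarms_per_day(window_seconds=1.0, false_positive_rate=0.0001)
    print(f"per device per day: {per_device:.2f}")
    print(f"20 devices per day: {per_device * 20:.1f}")
```

A model that sounds 99.99% accurate per window still raises several false alarms per device per day under these assumptions, which is the kind of behavior that gets devices removed.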
The final model problem is model training and calibration. “Bad acoustics” is another term for “your model probably won’t work.” Your model — particularly generic models — should be trained in the use environment if that environment is substantially unique. Even worse, your audio configuration (mic/software) should be calibrated for that environment. This environment-specific training and calibration almost never occurs, which is another reason why these solutions fail.
The cost of your solution doesn’t need to be very low: $100 or $1000 is fine. But it can’t be $10,000, which means you can’t use a $5,000 CPU/GPU config with $3,000 lab audio equipment and $2,000 on-site calibration. So if you invent a $1,000 solution, it’ll probably be put into production. But if you invent a $1 solution, it’ll probably be put into production and you’ll get a promotion.
You don’t need a PhD in signal processing to figure this out… really, anyone could. That said, some major companies have failed. So, what’s the solution?