I just heard from those clever chaps and chapesses at Algolux, who tell me they are using an evolutionary algorithm approach in their Atlas Camera Optimization Suite, which — they say — is the industry’s first set of machine-learning tools and workflows that can automatically optimize camera architectures intended for computer vision applications.
As we will see, this is exciting on many levels, not least because it prompted me to start cogitating, ruminating, and musing on the possibilities that might ensue from combining evolutionary algorithms (EAs) and genetic algorithms (GAs) with artificial intelligence (AI).
But before we plunge headfirst into the fray with gusto and abandon (and aplomb, of course), let’s remind ourselves that not everyone may be as familiar with things like genetic algorithms as you and yours truly, so let’s take a slight diversion to bring everyone up to speed.
Personally, I find the entire concept of genetic algorithms to be tremendously exciting. John Henry Holland (1929 – 2015) was an American scientist and a professor of psychology, electrical engineering, and computer science at the University of Michigan, Ann Arbor. In the 1960s, Holland came up with the idea of genetic algorithms, which are based on Darwin’s theory of evolution, and which employ biologically inspired operations such as mutation, crossover, and selection.
Some people might say that genetic algorithms are a metaheuristic inspired by the process of natural selection, and that they belong to the larger class of evolutionary algorithms. I wouldn’t say this myself, but that’s mainly because I have no idea what “metaheuristic” means.
Although it’s taken countless generations, it’s important to wrap one’s brain around the fact that, starting with simple single-cell organisms, natural selection has resulted in the current peak of evolutionary development in the form of your humble narrator (I pride myself on my humility).
So, how do genetic algorithms come into the picture? Well, let’s start by assuming that we have a complex system involving a bunch of inputs and outputs connected together via a humongous collection of convoluted algorithms, where the algorithms themselves employ an eye-watering number of variables.
Let’s further assume that many of these variables are interdependent in weird and wonderful ways. For example, increasing the value on input A may cause the value on output Y to rise or fall depending on the states of other inputs and the values of various variables. Even worse, the values of the variables may themselves be inexplicably intertwined — changing the value of one may aggravate or mitigate the effects of others, which may themselves intensify or moderate myriad facets of the system’s operation.
All of which brings us to genetic algorithms. First of all, we manipulate the problem under consideration in such a way that the combination of all of its variables can be represented as a single string of 0s and 1s. Next, we perform an initialization step in which we seed an initial “population” of strings with random (or pseudo-random) values as illustrated by (a) in the figure below.
High-level visualization of the genetic algorithm process (Image source: Max Maxfield)
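To make the encoding and seeding step a little more tangible, here is a minimal Python sketch. The number of variables, the 8 bits per variable, and the names `seed_population` and `decode` are all assumptions made purely for illustration; a real system would choose its own encoding.

```python
import random

BITS_PER_PARAM = 8   # assumed resolution of each variable
NUM_PARAMS = 4       # assumed number of variables in the system
STRING_LENGTH = BITS_PER_PARAM * NUM_PARAMS


def random_string(rng, length=STRING_LENGTH):
    """Return one candidate solution as a list of 0s and 1s."""
    return [rng.randint(0, 1) for _ in range(length)]


def seed_population(rng, size):
    """Create the initial population of random strings (step (a))."""
    return [random_string(rng) for _ in range(size)]


def decode(bits):
    """Split a string into 8-bit chunks and read each one as an integer 0..255."""
    values = []
    for i in range(0, len(bits), BITS_PER_PARAM):
        chunk = bits[i:i + BITS_PER_PARAM]
        values.append(int("".join(map(str, chunk)), 2))
    return values


rng = random.Random(42)  # a fixed seed keeps the pseudo-random run repeatable
population = seed_population(rng, size=6)
```

Calling `decode(population[0])` turns the first string back into four variable values, which is what the system under test would actually consume.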
The next step is to evaluate each of our strings by looking at any measurable outputs from the system to see how well the system performs with regard to one or more defined goals. This allows us to assign a “fitness” value to each string. In turn, this allows us to perform a ranking operation as illustrated by (b) in the figure above.
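The evaluate-and-rank step can be sketched as follows. The fitness function here simply counts the 1 bits in a string, which is a toy stand-in (my assumption) for measuring the real outputs of the system against its goals.

```python
def fitness(bits):
    """Toy fitness: the more 1 bits, the fitter the string (a stand-in goal)."""
    return sum(bits)


def rank(population):
    """Order the population from fittest to least fit (step (b))."""
    return sorted(population, key=fitness, reverse=True)


population = [
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
]
ranked = rank(population)
```

In practice the expensive part is the fitness function itself, since each evaluation may mean running the whole system with one candidate set of variables.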
This is where the “survival of the fittest” comes in, because low-ranking strings are discarded, while high-ranking strings are allowed to “breed.” This is the clever part, because the strings that will act as the parents for the next generation undergo a process known as “crossover.” This is where we emulate the natural process of exchanging genetic material in order to create “offspring” strings as illustrated in (c) above. Note that the (b) and (c) steps shown here comprise a single evolutionary cycle, or generation. The algorithm may perform thousands, tens of thousands, or millions of such cycles to achieve the desired result.
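Here is one way the discard-and-breed step might look in Python, assuming single-point crossover and a simple "keep the top few" selection rule. Both choices, and the function names, are illustrative assumptions; there are many other selection and crossover schemes.

```python
import random


def select_parents(ranked, keep):
    """Discard the low-ranking strings and keep only the top `keep`."""
    return ranked[:keep]


def crossover(parent_a, parent_b, rng):
    """Swap the tails of two parents at a random breakpoint (step (c))."""
    point = rng.randint(1, len(parent_a) - 1)  # never cut at the very ends
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2


rng = random.Random(0)
a = [1] * 8
b = [0] * 8
c1, c2 = crossover(a, b, rng)
```

Using an all-1s parent and an all-0s parent makes the breakpoint easy to spot: each child is a run of one value followed by a run of the other.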
In addition to the fact that the crossover breakpoints are randomly selected, the system also introduces a low rate of mutation whereby random bits are flipped from 0 to 1, and vice versa. Also observe that the original high-ranking “parent” strings form part of the new population because we don’t wish to lose our fittest members before their time. Furthermore, some genetic algorithms bestow mating privileges based on a parent string’s fitness, in which case higher-ranking strings get to mate with more strings than their lower-ranking rivals.
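The three refinements described above (mutation, keeping the parents, and fitness-based mating privileges) can each be sketched in a few lines of Python. The function names and the use of weighted random choice for mate selection are my own assumptions for illustration.

```python
import random


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def pick_mates(parents, scores, count, rng):
    """Fitness-proportional choice: fitter parents are picked more often.

    Assumes at least one score is positive.
    """
    return rng.choices(parents, weights=scores, k=count)


def next_generation(parents, children):
    """Carry the parents forward unchanged alongside their offspring."""
    return parents + children


rng = random.Random(3)
parents = [[1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 0, 0]]
scores = [sum(p) for p in parents]          # toy fitness: count of 1 bits
mates = pick_mates(parents, scores, count=4, rng=rng)
mutant = mutate(parents[0], rate=0.05, rng=rng)
```

The mutation rate is kept low on purpose: too much mutation turns the search into random guessing, while too little lets the population stagnate.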
And so it goes, generation after countless generation, until we end up with a result we like and/or our solutions cease to improve on what we have already achieved.
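Putting the pieces together, a whole run with both stopping conditions (a result we like, or no further improvement) might be sketched like this. The toy fitness function, the population sizes, and the `patience` parameter are assumptions chosen only to keep the example small.

```python
import random


def fitness(bits):
    """Toy goal: maximize the number of 1 bits."""
    return sum(bits)


def evolve(length=32, pop_size=20, keep=6, mutation_rate=0.02,
           max_generations=500, patience=50, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best, stale = None, 0
    for generation in range(max_generations):
        ranked = sorted(population, key=fitness, reverse=True)
        if best is None or fitness(ranked[0]) > fitness(best):
            best, stale = ranked[0], 0      # progress: reset the stall counter
        else:
            stale += 1
        # Stop when the result is good enough or has ceased to improve.
        if fitness(best) == length or stale >= patience:
            break
        parents = ranked[:keep]             # survivors are carried forward
        children = []
        while len(children) < pop_size - keep:
            a, b = rng.sample(parents, 2)
            point = rng.randint(1, length - 1)
            child = a[:point] + b[point:]
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children
    return best, generation


best, generations = evolve()
```

Because the parents survive into each new population, the best fitness seen so far can never get worse from one generation to the next.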
The thing is that, unlike biological processes that may take billions of years to perform their magic, modern computers can cycle through a mind-boggling number of generations in a very short amount of time. Of particular interest to me is that the results may be a long way from what one might expect and something a human would never come up with in a million years. For example, consider the 2006 NASA ST5 spacecraft antenna, whose complicated and convoluted shape was discovered by an evolutionary algorithm that was tasked with creating the best radiation pattern.
The 2006 NASA ST5 spacecraft antenna that was designed by an evolutionary algorithm (Image source: NASA)
I actually saw an FPGA-based genetic algorithm in action a couple of years ago. This involved an 8-legged robot spider at the University of Oslo in Norway, where I happened to be giving a guest lecture (I would say that they’d heard I was going to be in town and I couldn’t find it in my heart to say no, but — as many people know to their cost — much like my dear old mother, the real trick is to get me to stop talking).
The goal of the spider was to transport itself as efficiently as possible from its current position to a defined location at the far side of the room. The problem was that the little rascal didn’t know how to walk per se. All it knew was the minimum and maximum allowable extents of its various servos.
When the spider first awoke, it started moving its servo motors individually and in combination. Initially, all you could see was somewhat disconcerting uncontrolled twitching, after which it started to “grab” its way over short distances. Eventually, it staggered to its feet and began to perform a “drunken” amble, which quickly evolved into a full-fledged scurry. This all happened over the course of a couple of minutes and it was really rather unnerving to watch (I observed the latter behavior from the safety of a table top — not that I was afraid, you understand — it was just that I didn’t want to damage the little fellow).
Tuning a Camera System for Human Vision Consumption
I have to confess that, until recently, I hadn’t really given much thought to everything that’s involved in setting up a new camera system. In fact, each of the components — lens assembly, sensor, and image signal processor (ISP) — has numerous parameters (variables). This means that a massive and convoluted parameter space controls the image quality for each camera configuration.
Today’s traditional human-based camera system tuning can involve weeks of lab tuning combined with months of field and subjective tuning.
Today’s traditional human-based camera system tuning (Image source: Algolux)
The sad part of all of this is that there’s no guarantee of results when it comes to computer vision applications. What? Why? Well, allow me to expound, explicate, and elucidate (don’t worry; I’m a professional; it won’t hurt (me) at all).
Tuning a Camera System for Computer Vision Consumption
Here’s one of the problems. Until recently, the folks involved in tuning camera systems were targeting human observers. So, even though the tuning process is highly subjective, the human performing the tuning at least had a clue as to how other humans like to see their images.
One aspect of all this that really made me sit up and pay attention is the fact that tuning a camera system for a computer vision application is a completely different “kettle of poisson,” as it were, when compared to tuning an image or video stream for human consumption.
For example, assuming that the goal of the computer vision application is some form of object detection and recognition, then the way in which the images/videos need to be manipulated and presented may be very different from what suits images intended for viewing by people. All of which leads us to the Atlas workflow for computer vision.
Atlas-based tuning workflow for computer vision camera system tuning (Image source: Algolux)
Much like a traditional object detection and recognition system, we start with a training set of raw images, where these images will be “tagged” (augmented with metadata) by humans. In the case of a traditional artificial intelligence / machine learning system, the metadata might be something along the lines of “This image contains an elephant” and “There are no elephants in this image.” By comparison, in the case of an automotive system intended to detect other vehicles, pedestrians, and so forth, the metadata may involve humans drawing different colored rectangles around the objects of interest.
In this case, however, we aren’t interested in teaching an AI/ML system to recognize vehicles and people. That system has already been trained and locked down. What we are doing here is manipulating the variables in the camera system to process the images and video streams in such a way as to facilitate the operation of the downstream AI/ML system.
As we’ve already discussed, humans are almost certainly not the best judges of the way in which the AI/ML system likes to see its images. The solution here is to let the AI/ML system judge for itself. At least, let Atlas determine how close the AI/ML system is coming to what is required, using the human-supplied metadata as the “ground truth” state for comparison.
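To give a feel for how human-drawn rectangles can serve as “ground truth,” here is a simplified Python sketch that scores a set of detections against tagged boxes using intersection over union (IoU). This is not Atlas’s actual metric, which I haven’t seen; the function names, the box format, and the 0.5 threshold are all assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_score(detections, ground_truth, threshold=0.5):
    """Fraction of ground-truth boxes matched by at least one detection."""
    if not ground_truth:
        return 0.0
    matched = sum(
        1 for gt in ground_truth
        if any(iou(det, gt) >= threshold for det in detections)
    )
    return matched / len(ground_truth)


tagged = [(0, 0, 2, 2), (10, 10, 12, 12)]   # rectangles drawn by humans
found = [(0, 0, 2, 2)]                       # what the vision system reported
score = detection_score(found, tagged)
```

A score like this could play the role of the “fitness” value for one candidate set of camera parameters: process the raw images with those parameters, run the locked-down detector, and compare what it finds with the tagged rectangles.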
The real issue in this case is that we may be talking about a camera system involving hundreds of inter-related variables (parameters), which brings us back to evolutionary techniques. As the folks from Algolux told me: “Our recent breakthrough was a new Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which allows us to deal with highly [non-]convex/rugged state spaces. Evolutionary strategies and genetic algorithms are ‘in the same family,’ although evolutionary strategies are deterministic, which is very necessary for repeatability when performing optimization of the camera systems.”
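For the curious, here is a deliberately simplified evolution strategy in Python. It is not CMA-ES (it uses a fixed, shrinking step size and no covariance adaptation), and it is certainly not Algolux’s implementation. The rugged test function, the parameter values, and the idea of fixing the random seed to obtain repeatable runs are all my assumptions.

```python
import math
import random


def rugged_cost(params):
    """Rastrigin function: many local minima, global minimum of 0 at the origin."""
    return 10 * len(params) + sum(
        p * p - 10 * math.cos(2 * math.pi * p) for p in params
    )


def evolution_strategy(cost, dims=3, parents=5, offspring=20,
                       sigma=0.5, generations=100, seed=7):
    rng = random.Random(seed)               # fixed seed: every run is identical
    mean = [rng.uniform(-5, 5) for _ in range(dims)]
    best = (cost(mean), mean)
    for _ in range(generations):
        # Sample real-valued candidates around the current mean.
        samples = [[m + sigma * rng.gauss(0, 1) for m in mean]
                   for _ in range(offspring)]
        samples.sort(key=cost)
        elite = samples[:parents]
        # Move the mean to the average of the best candidates.
        mean = [sum(s[i] for s in elite) / parents for i in range(dims)]
        sigma *= 0.97                       # simple decay instead of adaptation
        if cost(samples[0]) < best[0]:
            best = (cost(samples[0]), samples[0])
    return best


run_1 = evolution_strategy(rugged_cost)
run_2 = evolution_strategy(rugged_cost)     # same seed, so same answer
```

Note that an evolution strategy works directly on real-valued parameters rather than strings of 0s and 1s, which is a natural fit for things like camera settings. CMA-ES goes further by learning the shape and scale of the sampling distribution as the search proceeds.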
In fact, there’s a rather interesting video on YouTube that shows the Atlas Camera Optimization Suite fine-tuning the camera system variables so as to automatically maximize the results of the computer vision application that’s using the system.
The bottom line here is that Atlas significantly improves computer vision results in days versus traditional approaches that deliver suboptimal results even after many months of human tuning.
Algolux shared the research underlying their results at the 2020 Conference on Computer Vision and Pattern Recognition (CVPR). As an aside, this work was presented as a rare oral presentation, which has an acceptance rate of less than 5% for CVPR.
The conclusion is that the use of evolutionary algorithms facilitates deeper automatic optimization of camera system architectures for computer vision tasks. Object detection or instance segmentation results are improved by up to 30 mAP points, while effort and time can be reduced by more than 10x versus today’s traditional human expert-based tuning approaches.
For myself, I’m delighted to see another example of evolutionary techniques being used to implement innovative solutions that cannot be achieved using traditional approaches. In this case, they are being used to improve the results that can be achieved by a certified and “locked-down” AI/ML-based application, such as one that has already received an automotive safety certification. Being able to enhance the results from that AI/ML application by fine-tuning the camera system feeding the images/videos to the application is a win-win for all concerned.
What say you? Have you had any experience with evolutionary or genetic algorithms that you’d care to share with the rest of us in the comments below?