Original article was published on Artificial Intelligence on Medium
Ants can tell you how to wire your neurons
Using Swarm Intelligence in Neural Architecture Search
An ant-inspired algorithm can be used to determine the best architecture of your brain-inspired neural network model. Nature is surely technology’s muse.
Here we will discuss the following topics:
- How ants find the best routes to food sources,
- An example of an ant-inspired optimization algorithm,
- How to apply this algorithm to find the best architecture for an artificial neural network model.
The foraging behavior of ants
How can an ant find the shortest route to a food source? It certainly doesn’t see it from the nest as its target its usually several dozens of meters away and an ant itself is less than one centimetre long. It has to choose its long-distance itinerary solely based on limited and local navigational information¹.
The mystery was unveiled in 1989 and presented in a concise yet exciting paper Self-organized Shortcuts in the Argentine Ant. The authors performed an elegant experiment which proved that the ants rely on the collective experience of the colony expressed in the pheromone level left by members of the colony on the available routes.
An ant nest was linked to a arena with food by a special bridge consisting of two identical modules:
As you can see a module has two branches which significantly differ in length, but at the branching point nothing gives off the difference: the angle to the main bridge axis is the same. Therefore an ant cannot know in advance which option is better.
To remove a potential bias towards left or right side (i.e. just in case for some unknown reasons ants like the one side more) the second module is connected to the first one in a way that it’s longer branch is on the other side:
Confronted with such a setup the ants initially choose their route randomly. After twenty seconds the first ants to traverse the bridge via the shorter path start their return. Those who choose the short route again will be the first ones to return to the nest and the first ones to lay pheromone trail along entire nest — food — nest path. This will give an incentive to the other ants to choose the shorter path (marked with higher levels of pheromones).
The results of the neat experiment are best captured through the following photo:
You have to look closely to notice the tiny dark dots — the ants during their hectic efforts to provide the food to the colony. The photo on the left was taken shortly after opening of the bridge — you can see that the ants are distributed randomly between the both paths. In the second photo — taken later — the ants are already concentrated on the shorter route.
Ant Colony System
This behavior of ants inspired the whole family of algorithms collectively referred to as Ant Colony Optimization Algorithms and belonging to even wider group of methods called Swarm Intelligence. The common feature of all these techniques is that they rely on decentralized systems with no individual control unit. The local interactions between the individual agents in the system lead to a globally intelligent behavior.
Here I will present just one of these algorithms called Ant Colony System (ACS) proposed in 1997 by Marco Dorigo as a solution to Traveling Salesman Problem: finding the shortest route connecting all the cities in a network of cities connected by roads of various length.
ACS algorithm overview
In a nutshell the algorithm does the following:
- A number of ants is randomly placed in some nodes of a graph.
- The ants start walking through the graph choosing the nodes based on stochastic state transition rule (they prefer shorter edges with high pheromone levels).
- During an ant’s walk the pheromone along its path is evaporating according to a local updating rule (it is to encourage other ants to try out other paths).
- After all the ants complete their walk the amount of the pheromone is updated again according to a global updating rule (the shorter the entire path, the bigger pheromone deposition on each edge belonging to the path).
State Transition Rule in detail
First of all let’s introduce the following notation:
An ant placed in node r chooses the next node s according to the following state transition rule:
- q is a random variable uniformly distributed in [0, 1],
- q₀ is a parameter (0 ≤ q₀ ≤1),
- S is a random variable drawn from a probability distribution defined below:
Let’s take a closer look at these equations. As mentioned before, while choosing the next node ants prefer edges (r, s) with higher pheromone level τ(r, s) and lower length δ(r, s) (and therefore higher η(r, s)= 1 / δ(r, s)). The probability of choosing a given edge p(r, s) should therefore be proportional to τ(r, s) and η(r, s):
To control the relative importance of the two factors (pheromone level τ and edge length δ) parameter β > 0 was introduced:
The bigger the β, the more important the distance than the pheromone level, i.e. long distance will have higher repelling power than the attractive power of pheromones.
To get a variable which can be interpreted as a probability, we have to make sure that its values range between 0 and 1. It can be done by dividing
τ(r, s)·η(r, s)ᵝ by the sum of similar expressions for all the nodes u accessible from node r and not yet visited by an ant k. If we denote the set of such nodes by Jₖ(r) the full formula for probability pₖ(r, s) with which ant k chooses the node s as its next destination from node r is given by the previously quoted formula:
The transition between the nodes happens according to this probability only in the so-called biased exploration stage. In the exploitation stage, the behavior of an ant is deterministic: it chooses the optimal edge i.e. the edge with highest τ(r, s)·η(r, s)ᵝ.
Pheromone updating rules in detail
During their walk ants change the pheromone levels on each visited edge according to the local updating rule:
where ρ denotes pheromone decay factor and τ₀ is the initial pheromone value. The evaporation of the pheromones entices other ants to try other paths.
After all the ants complete their routes, the ant which found the best solution deposits pheromones along its path according to the global updating rule:
- pheromone evaporation regulated by parameter 0 < α < 1
- pheromone deposition which is in inverse proportion to the length of the globally best tour from the beginning of the trial L_bg.
Neural Architecture Search
Whenever you come up with a neural network architecture you have to perform a hyperparameter search: find the learning rate, batch size, learning rate scheduling etc. for which your network performs best. Why not to extend the scope of that search to help you find the best architecture as well? This approach is referred to as Neural Architecture Search (NAS) and can be realized through various techniques such as reinforcement learning and evolutionary algorithms. Of course you have to introduce some constraints on the search space e.g. define what kind of layers should the algorithm consider and how big the network should be.
How to apply Ant Colony System to Neural Architecture Search?
The general idea is this: you decide what kind of layers interest you (e.g. you want only convolutional, pooling and fully-connected layers in your model) and how deep should your model be. Using a number of “ants” (independent agents) you gradually build a graph where each node corresponds to a neural network layer and each path corresponds to an architecture. “Ants” leave “pheromone” so that “better” paths attract more “ants” in the next steps. [Which model wins]
Let’s see how it’s realized in the DeepSwarm paper which applies ACS to NAS for CNN. A good overview is given by the following figure from the paper:
Let’s dive into the details. The DeepSwarm algorithm starts with a graph which contains only input node (input layer) and places a number of “ants” in that node. Each “ant” chooses what kind of node it will go next based on the previously discussed ACS State Transition Rule. The selected node is then added to the graph and the “ant” chooses its attributes (e.g. filter size and stride in case of a node corresponding to a CNN layer). The choice of the attributes is also based on the ACS State Transition Rule.
Once an ant’s path reaches current maximum length, the neural network architecture corresponding to that path is evaluated and the ant performs ACS local pheromone update which decays the pheromone so that other ants are encouraged to explore other paths.
Once all the ants finish their walk, the accuracies of all the architectures are compared and the ant whose architecture yields highest accuracy performs the ACS global pheromone update therefore increasing pheromone levels on its path.
The current maximum path length is increased and a new population of ants is released from the input node. This continues until the current maximum path length reaches the user-defined maximum allowed depth of the model.
The life of eusocial insects like ants or bees is not only fascinating in itself but also provides an inspiration for a number of algorithms belonging to Swarm Intelligence techniques. Here we discussed the details of one such a method: Ant Colony System and we described how it can be applied to the problem of finding the best architecture for a neural network.
 Goss, S. & Aron, Serge & Deneubourg, Jean-Louis & Pasteels, Jacques. (1989). Self-organized shortcuts in the Argentine Ant. Naturwissenschaften 76: 579–581. Naturwissenschaften. 76. 579–581. 10.1007/BF00462870.; link
 Dorigo, Marco & Gambardella, Luca Maria. (1997). Gambardella, L.M.: Ant Colony System: A cooperative learning approach to the Traveling Salesman Problem. IEEE Tr. Evol. Comp. 1, 53–66. Evolutionary Computation, IEEE Transactions on. 1. 53–66. 10.1109/4235.585892.; link
 Byla, Edvinas & Pang, Wei. (2019). DeepSwarm: Optimising Convolutional Neural Networks using Swarm Intelligence.; link