What Happens When Neural Networks Hallucinate

Original article was published by Andre Ye on Artificial Intelligence on Medium

Beyond a fascinating correlation with the brain, deep dreams can be good visualizations of the network’s decision-making process.

To understand how deep dreaming works, first we need to understand how conventional convolutional neural networks work.

  1. An image is fed into a convolutional neural network.
  2. Initial layers detect ‘low-level’ features like edges, bends, and points through convolutional layers. These learnings are repeatedly summarized by pooling layers.
  3. Later layers build upon previous learnings to detect ‘higher-level’ features like faces, trees, or cars.
  4. Final layers (usually dense, fully-connected) assemble features into complete understandings and classify the image.
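
To make the convolution and pooling steps concrete, here is a minimal sketch in plain Python. The toy 5x5 image, the hand-written vertical-edge kernel, and the helper names `conv2d` and `max_pool` are all illustrative assumptions; a trained network learns its kernels from data rather than having them written by hand.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Summarize each non-overlapping size x size block by its maximum."""
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Toy image: dark on the left (0), bright on the right (1).
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Hand-written kernel that responds where brightness rises left to right.
vertical_edge = [[-1, 1],
                 [-1, 1]]

edges = conv2d(image, vertical_edge)   # 4x4 map, large only at the edge
pooled = max_pool(edges)               # 2x2 summary: [[0, 2], [0, 2]]
print(pooled)
```

The feature map is nonzero only where the dark-to-bright boundary sits, and pooling keeps that information while shrinking the map.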

DeepDream selects one of the feature-detecting layers to ‘stimulate’. Since a layer’s learnings are passed on to the next layer through an activation function, the algorithm alters the image to maximize that layer’s activation. In essence, we are making our image into everything that layer is looking for.

Although it’s not completely accurate, in theory each node looks for one aspect of an image. For example, consider several nodes that identify different image textures based on previous learning.

Normally, previous inputs, when weighted and summed, can activate a node, whose learnings are distributed to each of the following nodes. In our example, the node has detected spiky textures based on four preceding activated neurons (which may correspond to things like joints, edges, etc.).
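
As a rough illustration of "weighted and summed", the sketch below computes one node’s output from four preceding activations. The numbers, the ReLU activation, and the name `node_output` are assumptions chosen for the example, not values from a real network.

```python
def relu(value):
    """A common activation function: pass positives, zero out negatives."""
    return max(0.0, value)


def node_output(inputs, weights, bias):
    """Weight and sum the preceding activations, then apply the activation."""
    return relu(sum(w * x for w, x in zip(weights, inputs)) + bias)


# Hypothetical weights of a 'spiky texture' node over four preceding neurons.
weights = [1.0, 0.5, -2.0, 1.0]
bias = -1.0

spiky_input = [0.9, 0.8, 0.0, 0.7]    # the features this node likes are present
smooth_input = [0.1, 0.0, 0.9, 0.0]   # mostly the feature it penalizes

fires = node_output(spiky_input, weights, bias)
silent = node_output(smooth_input, weights, bias)
print(fires, silent)
```

The first input pushes the weighted sum above zero, so the node activates; the second leaves it at zero.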

The goal of DeepDream is to drive every neuron in the chosen layer to as high an output value as possible, by creating an image that has all the features the layer is looking for, as well as all the features detected by the layers before it.

For example, if a specific layer’s purpose was to detect dogs and the input image was of a jellyfish, the DeepDream algorithm would continually alter the input image with dog-like figures to maximize the activation output. Notice the introduction of legs and even of dog faces overlaid on images of the jellyfish.

Source: Wikimedia. Image free to share.

A DeepDream-produced image is therefore representative of what the chosen layer, and the layers before it, have learned to recognize.

In training, gradient descent finds the changes to the parameters that most decrease the loss function. Similarly, DeepDream uses gradient ascent to find the changes to each pixel value that most increase the chosen layer’s activation.
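
Written side by side, the two update rules differ only in what is updated and in the sign. The notation here is introduced for illustration and is not from the original article: θ stands for the network parameters, L for the loss, x for the image pixels, A(x) for a scalar summary (for instance the sum or mean) of the chosen layer’s activations, and η for a step size.

```latex
\begin{align*}
\text{training (descent):} \quad & \theta \leftarrow \theta - \eta \, \nabla_{\theta} L(\theta) \\
\text{DeepDream (ascent):} \quad & x \leftarrow x + \eta \, \nabla_{x} A(x)
\end{align*}
```

In the second line the parameters stay frozen; only the image changes.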

However, this can be problematic for image generation: gradient ascent will pursue aggressive strategies to maximize the layer activation, and adjacent pixels may end up with little meaningful relationship to each other. Hence, a ‘smoothing’ regularizer is often used to penalize pixel changes that vary too much with respect to neighboring pixels. This encourages more ‘natural-looking’ images.
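
The article does not name a specific regularizer. One common choice, used here purely as an assumption, is a total-variation penalty: the sum of absolute differences between neighboring pixels. The sketch below (with the hypothetical helper `total_variation`) shows that a noisy image scores much higher than a smooth one, so subtracting a multiple of this score from the objective discourages noise.

```python
def total_variation(image):
    """Sum of absolute differences between adjacent pixels (rows and columns)."""
    height, width = len(image), len(image[0])
    horizontal = sum(abs(image[i][j + 1] - image[i][j])
                     for i in range(height) for j in range(width - 1))
    vertical = sum(abs(image[i + 1][j] - image[i][j])
                   for i in range(height - 1) for j in range(width))
    return horizontal + vertical


# A gentle gradient versus a checkerboard of extreme values.
smooth = [[0.0, 0.1, 0.2],
          [0.1, 0.2, 0.3],
          [0.2, 0.3, 0.4]]
noisy = [[0.0, 1.0, 0.0],
         [1.0, 0.0, 1.0],
         [0.0, 1.0, 0.0]]

tv_smooth = total_variation(smooth)
tv_noisy = total_variation(noisy)
print(tv_smooth, tv_noisy)
```

A regularized objective would then look like `activation - weight * total_variation(image)`, where `weight` is a tunable assumption.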

So, formally, we can write the DeepDream algorithm process as:

  1. Forward-propagate an image through a trained architecture (not necessarily a convolutional neural network).
  2. Select the layer of choice to be maximized.
  3. Calculate the output of the selected layer.
  4. Find the gradient of the chosen layer’s activations with respect to the image (find how changing each pixel affects the output).
  5. Modify the image to increase these activations with gradient ascent and a regularizer to ensure smoothness.
  6. Repeat steps one through five multiple times. The more iterations of the algorithm, the more enhanced patterns will appear.
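
The six steps above can be sketched on a deliberately tiny problem. Everything here is an assumption made for illustration: the "image" is four pixels in a row, the "layer" is two hand-written ReLU units with made-up weights, the gradient is worked out by hand instead of by an autodiff framework, and the smoothness penalty is the squared difference between neighboring pixels. A real DeepDream run would use a trained network and a framework’s automatic differentiation.

```python
def relu(value):
    return max(0.0, value)


def layer_activations(image, weights):
    """Steps 1-3: run the image through the toy layer (one ReLU unit per row)."""
    return [relu(sum(w * p for w, p in zip(row, image))) for row in weights]


def activation_gradient(image, weights):
    """Step 4: gradient of the summed activations with respect to each pixel.

    For ReLU units, only the currently active units contribute their weights.
    """
    activations = layer_activations(image, weights)
    return [
        sum(row[k] for row, a in zip(weights, activations) if a > 0.0)
        for k in range(len(image))
    ]


def roughness_gradient(image):
    """Gradient of sum((x[k+1] - x[k])**2), the assumed smoothness penalty."""
    grad = [0.0] * len(image)
    for k in range(len(image) - 1):
        diff = image[k + 1] - image[k]
        grad[k] -= 2.0 * diff
        grad[k + 1] += 2.0 * diff
    return grad


def deep_dream(image, weights, steps=50, step_size=0.05, smooth_weight=0.1):
    image = list(image)
    for _ in range(steps):  # step 6: repeat
        g_act = activation_gradient(image, weights)
        g_rough = roughness_gradient(image)
        # Step 5: climb the activation while descending the roughness penalty.
        image = [
            p + step_size * (ga - smooth_weight * gr)
            for p, ga, gr in zip(image, g_act, g_rough)
        ]
        # Keep pixels in the valid [0, 1] range.
        image = [min(1.0, max(0.0, p)) for p in image]
    return image


# Hypothetical 4-pixel "image" and a made-up 2-unit layer.
weights = [[1.0, -1.0, 1.0, -1.0],
           [0.5, 0.5, -0.5, -0.5]]
start = [0.5, 0.4, 0.6, 0.5]
dreamed = deep_dream(start, weights)

before = sum(layer_activations(start, weights))
after = sum(layer_activations(dreamed, weights))
print(f"summed activation: {before:.2f} -> {after:.2f}")
```

With these toy values the image drifts toward the alternating bright/dark pattern that the first unit responds to, which is the small-scale analogue of patterns becoming more enhanced with more iterations.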

DeepDream is often used with later layers that detect higher-level entities, like the cars, animals, trains, and buildings that appear in this disorienting photo:

Source: PyCharm Blog. Image free to share.

Note that the entities appearing in these photos are ones the network was explicitly trained to recognize, because they exist in the dataset. Cars appear in the image because they appear in the data, and the network has allocated certain parts of the layer specifically to identifying them.

However, DeepDream can also be applied to earlier layers to paint an image with lower-level features such as edges and points. These are the root representations of an image in the earlier layers.

Left: DeepDreamGenerator. Right: TensorFlow blog. Images free to share.

Alternatively, if the objective is to emphasize one particular object in an image rather than all learned entities (which can be overwhelming), one can feed forward an image, identify which nodes have been activated, and use DeepDream to activate only those nodes. For example, if nodes A, B, and C are activated when an image of a cat is fed into the network, we use DeepDream to maximize only the outputs of those nodes, for any image.
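
A small sketch of that selection idea, under the same toy assumptions as before: a hand-written layer of three ReLU units, a made-up "cat" reference input, and hypothetical helper names `active_units` and `selected_objective`. The objective that DeepDream would then maximize counts only the units the reference image switched on.

```python
def relu(value):
    return max(0.0, value)


def unit_activation(image, row):
    """Activation of a single unit: weighted sum of pixels, then ReLU."""
    return relu(sum(w * p for w, p in zip(row, image)))


def active_units(reference_image, weights):
    """Indices of the units that fire for the reference image."""
    return [i for i, row in enumerate(weights)
            if unit_activation(reference_image, row) > 0.0]


def selected_objective(image, weights, selected):
    """Summed activation over the selected units only."""
    return sum(unit_activation(image, weights[i]) for i in selected)


# Made-up layer with three units and a made-up 'cat' reference input.
weights = [[1.0, -1.0, 0.0],
           [-1.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
cat_image = [0.9, 0.1, 0.5]
other_image = [0.2, 0.7, 0.3]

cat_units = active_units(cat_image, weights)   # units the cat switches on
masked = selected_objective(other_image, weights, cat_units)
everything = selected_objective(other_image, weights, range(len(weights)))
print(cat_units, masked, everything)
```

Maximizing `masked` instead of `everything` steers any input image toward the features associated with the reference image alone.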

More practically, one can view the results when the network is fed completely random noise and told to maximize the neurons for a specific class (without regularization restrictions on making images more ‘natural’). One can, for instance, see which orientations and rotations of an object have been learned, or whether they have been learned at all (handling rotated objects is a known weakness of CNNs).

Source: Google AI blog. Image free to share.

Or, for example, researchers discovered that networks trained to recognize weights could only do so with confidence if there was also a weightlifter in the image (perhaps as a result of strong correlation in the dataset). This is a real problem that would go undetected by metrics that measure only in-dataset performance and not real-world performance.

Source: Google AI blog. Image free to share.

Deep dreaming is still, however, mostly just for fun. It can give interesting visualizations of what the network is capable of recognizing, which may help explain why a network is not working as it should, or confirm that a network is making a decision for the right reasons.

It may also be of interest in explaining ‘biological neural network’ phenomena, such as why humans sometimes see faces or other common objects from their ‘training data’ in inanimate objects, especially in the absence of light (a lack of information).

Thanks for reading!