Source: Deep Learning on Medium
The goal of this work is to plan a sequence of abstract states that leads toward a goal, and then decode those abstract states into their corresponding predicted observations. This contrasts with many popular methods that plan over a sequence of actions. The approach to planning was inspired in part by InfoGAN.
The goal of InfoGAN is to maximize the mutual information between s and a generated observation o from G(z, s), where mutual information is defined by:

$$I(s; G(z, s)) = H(s) - H(s \mid G(z, s))$$
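As a concrete illustration of this quantity, mutual information can be computed as the entropy of s minus the entropy that remains once the observation is known, I(s; o) = H(s) - H(s | o). The sketch below does this for a small discrete joint table; the function names and the toy tables are assumptions made for illustration and are not part of InfoGAN, where the quantity is only ever bounded and never computed exactly.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(s; o) = H(s) - H(s | o), where joint[i][j] = P(s = i, o = j)."""
    p_s = [sum(row) for row in joint]          # marginal over s
    p_o = [sum(col) for col in zip(*joint)]    # marginal over o
    h_s_given_o = 0.0
    for j, p_oj in enumerate(p_o):
        if p_oj > 0:
            conditional = [row[j] / p_oj for row in joint]  # P(s | o = j)
            h_s_given_o += p_oj * entropy(conditional)
    return entropy(p_s) - h_s_given_o


# Toy tables (hypothetical): o tells us nothing about s vs. o determines s.
independent = [[0.25, 0.25], [0.25, 0.25]]
determined = [[0.5, 0.0], [0.0, 0.5]]

mi_independent = mutual_information(independent)
mi_determined = mutual_information(determined)
```

When the observation carries no information about s the result is zero, and when the observation fully determines s it equals H(s). The second case is the regime the InfoGAN term pushes the generator toward.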
The intuition behind I is that the generator's input can be thought of as containing some representation of o, and we would like that representation to share as much mutual information as possible with the abstract state s. This requires adding I, weighted by a hyperparameter λ, as an additional term in the objective:

$$\min_{G} \max_{D} \; V(G, D) - \lambda \, I(s; G(z, s))$$
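Here V(G, D) is the standard formulation of a GAN objective, with the generator taking both z and s as input:

$$V(G, D) = \mathbb{E}_{o \sim P_{data}}[\log D(o)] + \mathbb{E}_{z \sim P_z,\ s \sim P_s}[\log(1 - D(G(z, s)))]$$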
Training is done in a way inspired by InfoGAN. We start by sampling an abstract state s and passing it through a transition model to obtain s’. Next, a noise vector z is passed into a generator together with s and s’ to generate a pair of observations o and o’, giving us G(s, s’, z). The discriminator is asked to predict whether o and o’ are real observations or fake observations from the generator, giving us D(G(s, s’, z)). Lastly, the pairs of abstract states and decoded observations ([s, o], [s’, o’]) are passed to a function Q(s|o). The purpose of this function is to approximate the mutual information term from InfoGAN with a variational lower bound.
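The data flow of one sampling pass can be sketched in a few lines. This is only an illustration of the order of operations described above: every `toy_*` function is a hypothetical stand-in for a learned network, the abstract states are assumed to be three discrete values, z is a scalar instead of a vector, and no gradient update is performed.

```python
import math
import random


def forward_pass(sample_state, transition, generator, discriminator, q_prob, rng):
    """Run one sampling pass: s -> s' -> (o, o') -> D and Q outputs."""
    s = sample_state(rng)                 # abstract state s
    s_next = transition(s, rng)           # successor s' from the transition model
    z = rng.gauss(0.0, 1.0)               # noise z (a scalar in this toy)
    o, o_next = generator(s, s_next, z)   # observation pair G(s, s', z)
    d_fake = discriminator(o, o_next)     # D's belief that the pair is real
    # Term used for the variational bound: log Q(s | o) + log Q(s' | o')
    info_term = math.log(q_prob(s, o)) + math.log(q_prob(s_next, o_next))
    return {"s": s, "s_next": s_next, "o": o, "o_next": o_next,
            "d_fake": d_fake, "info_term": info_term}


# Hypothetical stand-ins: three abstract states laid out on a line.
def toy_sample_state(rng):
    return rng.randrange(3)


def toy_transition(s, rng):
    return min(s + 1, 2)                  # move one state to the right


def toy_generator(s, s_next, z):
    jitter = 0.1 * math.tanh(z)           # bounded noise around each state
    return (s + jitter, s_next + jitter)


def toy_discriminator(o, o_next):
    return 0.5                            # an undecided discriminator


def toy_q_prob(s, o):
    return 0.9 if abs(o - s) < 0.5 else 0.05


result = forward_pass(toy_sample_state, toy_transition, toy_generator,
                      toy_discriminator, toy_q_prob, random.Random(0))
```

In a real implementation each stand-in would be a neural network, and `d_fake` and `info_term` would feed the losses that are backpropagated.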
Training then maximizes the objective with respect to the discriminator and minimizes it with respect to the generator, Q, and the transition model. More specifically:
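Written out under the assumption that the objective mirrors InfoGAN's, with pairs of observations in place of single observations, the optimization might look like the following. The symbols T (the transition model), P_data, P_z, P_s, and the weight λ are notation assumed here, and entropy terms of the abstract-state distribution are left out of this sketch.

$$\min_{G,\,Q,\,T}\ \max_{D}\ V(G, D) - \lambda\, I_{VLB}(G, Q)$$

$$V(G, D) = \mathbb{E}_{(o, o') \sim P_{data}}[\log D(o, o')] + \mathbb{E}_{z \sim P_z,\ s \sim P_s,\ s' \sim T(s)}[\log(1 - D(G(s, s', z)))]$$

$$I_{VLB}(G, Q) = \mathbb{E}_{z \sim P_z,\ s \sim P_s,\ s' \sim T(s),\ (o, o') \sim G(s, s', z)}[\log Q(s \mid o) + \log Q(s' \mid o')]$$

The discriminator only affects V(G, D), so maximizing over D is the usual GAN step, while the generator, Q, and the transition model are all pushed to make generated pairs both realistic and informative about (s, s’).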
Planning is done by first gathering an initial observation and a goal observation. Both observations are embedded into the latent space of abstract states s. A sequence of h-reachable abstract states is then generated between them; h-reachable means that it takes fewer than h steps to get from one abstract state to the next. Finally, the sequence of abstract states is decoded back into raw observations.
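The planning loop can be sketched as follows, assuming discrete abstract states and using breadth-first search as a simple stand-in for whatever search procedure the method actually uses. The `transitions` table (which abstract states the transition model considers reachable from each state) and the `decoder` function are hypothetical placeholders for the learned transition model and generator.

```python
from collections import deque


def plan_abstract_states(transitions, start, goal):
    """Breadth-first search for the shortest chain of abstract states.

    Returns [start, ..., goal], or None if the goal cannot be reached.
    """
    parents = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:      # walk back to the start
                path.append(state)
                state = parents[state]
            return path[::-1]
        for nxt in transitions.get(state, ()):
            if nxt not in parents:        # visit each state once
                parents[nxt] = state
                queue.append(nxt)
    return None


def decode_plan(path, decoder):
    """Map each abstract state in the plan back to an observation."""
    return [decoder(state) for state in path]


# Toy example (hypothetical): five abstract states, goal is state 4.
toy_transitions = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
toy_path = plan_abstract_states(toy_transitions, start=0, goal=4)
toy_observations = decode_plan(
    toy_path, decoder=lambda s: f"observation-for-state-{s}")
```

Note that the plan contains no actions at all: each consecutive pair of states is only assumed to be reachable, and finding the actions that realize each hop is left to a separate controller.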
This is an interesting way of performing forward planning, since it differs from most approaches by not planning with respect to actions. One interesting direction for future work would be to combine Causal InfoGAN planning with trajectory optimization between consecutive h-reachable abstract states/observations, using a Universal Planning Network.