Original article can be found here (source): Deep Learning on Medium

## A Stroll Through the Manifold

Above are some examples of latent space exploration for my dresses and footwear models. The idea is simply to generate N sample vectors (drawn from a Gaussian distribution) and transition between them sequentially using any preferred transition function. In my case this function is just a linear interpolation over a fixed number of frames (equivalent to morphing).
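The walk described above can be sketched in a few lines of numpy. This is a minimal, model-agnostic sketch: the anchor count, frame count, and 512-dimensional latent size are illustrative assumptions, and the resulting frames would each be fed to the generator.

```python
import numpy as np

def lerp(a, b, t):
    """Linear interpolation between two latent vectors."""
    return a + t * (b - a)

def latent_walk(n_samples=4, n_frames=30, dim=512, seed=0):
    """Sample N Gaussian latents and interpolate between them sequentially."""
    rng = np.random.default_rng(seed)
    anchors = rng.standard_normal((n_samples, dim))
    frames = []
    for a, b in zip(anchors[:-1], anchors[1:]):
        for t in np.linspace(0.0, 1.0, n_frames, endpoint=False):
            frames.append(lerp(a, b, t))
    return np.stack(frames)

walk = latent_walk()
print(walk.shape)  # (90, 512): 3 transitions of 30 frames each
```

Any other transition function (e.g. spherical interpolation) can be swapped in for `lerp` without changing the surrounding loop.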

Notice that we are relying on the initial latent vector *z*. This means that we are using the StyleGAN mapping network to first generate the intermediate latent vector *w*, and then using *w* to synthesize a new image. For this reason, we can rely on the truncation trick and discard areas of the latent space that are poorly represented. We want to specify how closely the generated intermediate vector *w* should stay to the average (computed from random inputs to the mapping network). The ψ (psi) value scales the deviation of *w* from the average, and as such can be tweaked for quality/variety trade-offs. ψ=1 is equivalent to no truncation (the original *w*), while values toward 0 move us closer to the average, improving quality at the expense of variety.
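The truncation trick itself is a one-line formula. A minimal sketch, assuming `w_avg` is the running average of mapping-network outputs (here replaced by a random stand-in just to show the scaling behaviour):

```python
import numpy as np

def truncate(w, w_avg, psi=0.7):
    """Truncation trick: pull w toward the average latent.
    psi=1.0 leaves w unchanged; psi=0.0 collapses w onto w_avg."""
    return w_avg + psi * (w - w_avg)

# Random stand-ins for w and the mapping network's average output.
rng = np.random.default_rng(0)
w, w_avg = rng.standard_normal((2, 512))

assert np.allclose(truncate(w, w_avg, psi=1.0), w)      # no truncation
assert np.allclose(truncate(w, w_avg, psi=0.0), w_avg)  # full truncation
```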

## Encode Real Images

We often want to obtain the code/latent-vector/embedding of a real image with respect to a target model; in other words: what input value should I feed to my model to generate the best approximation of my image?

In general, there are two methods for this:

- pass image through the encoder component of the network
- optimize latent (using gradient descent)

The former provides a fast solution but has problems generalizing outside of the training dataset, and unfortunately for us, it doesn’t come out of the box with a vanilla StyleGAN. The architecture simply doesn’t learn an explicit encoding function.

We are left with the **latent optimization option**, using perceptual loss. We extract high-level features (e.g. from a pre-trained model like VGG) for both the reference and generated images, compute the distance between them, and optimize the latent representation (our target code). The **initialization of this target code** is a very important aspect for both efficiency and effectiveness. The easiest way is simple random initialization, but a lot can be done to improve on this, for example by learning an explicit encoding function from images to latents. The idea is to randomly generate a set of N examples and store both the resulting image and the code that generated it. We can then train a model (e.g. a ResNet) on this data, and use it to initialize the latent before the actual StyleGAN encoding process. See this rich discussion regarding improved initialization.
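The optimization loop can be illustrated without the real networks. In this sketch a fixed linear map stands in for the generator and raw pixels stand in for VGG features, purely to show the structure of the descent on the latent; in practice the gradient would come from backpropagating the perceptual loss through the feature extractor and StyleGAN.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: linear "generator" mapping a 16-dim latent to a 64-dim "image".
G = rng.standard_normal((64, 16)) / 8.0
target = G @ rng.standard_normal(16)   # reference image we want to invert

z = rng.standard_normal(16)            # random latent initialization
lr = 0.2
for step in range(600):
    residual = G @ z - target          # feature-space difference
    loss = float(residual @ residual)  # "perceptual" (here plain L2) loss
    z -= lr * 2.0 * (G.T @ residual)   # gradient descent on the latent only

print(loss)  # should be near zero: the latent now reproduces the target
```

The key point is that only `z` is updated; the generator and feature extractor stay frozen throughout.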

Encoder for v1 and Encoder for v2 provide code and step-by-step guides for this operation. I also suggest the following two papers: Image2StyleGAN and Image2StyleGAN++, which give a good overview of encoding images for StyleGAN, with considerations about initialization options and latent space quality, plus an analysis of image editing operations like morphing and style mixing.

## w(1) vs w(N)

StyleGAN uses a mapping network (eight fully connected layers) to convert the input noise (`z`) to an intermediate latent vector (`w`). Both are of size 512, but the intermediate vector is replicated for each style layer. For a network trained on 1024×1024 images this intermediate vector will then be of shape (18, 512); for 512×512 images it will be (16, 512).

The encoding process is generally done on this intermediate vector, and as such one can decide whether to optimize for `w(1)` (a single 512-dimensional vector, which is then tiled as necessary to each style layer) or the whole `w(N)`. The official projector operates on the former, while adaptations often rely on optimizing all `w` entries individually, for visual fidelity. Regarding this topic see also this Twitter thread.
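The difference between the two parameterizations is easy to see in code. A small sketch, assuming the 18 style layers of a 1024×1024 model:

```python
import numpy as np

n_styles = 18  # style layers for a 1024x1024 StyleGAN

# w(1): a single 512-vector, tiled so every style layer receives the same w.
w1 = np.random.default_rng(0).standard_normal(512)
w_tiled = np.tile(w1, (n_styles, 1))
print(w_tiled.shape)  # (18, 512)

# w(N): one independent 512-vector per style layer, all optimized separately.
wN = np.random.default_rng(1).standard_normal((n_styles, 512))
```

With `w(1)` the optimizer searches a 512-dimensional space; with `w(N)` it searches an 18×512-dimensional one, which fits the reference more closely but can drift further from the well-behaved regions of the latent space.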

The mismatch is even more noticeable (and amusing) when projecting a reference image that does not belong to the model's training distribution, as in the following example of projecting a dress image with the FFHQ model.