Original article was published on Artificial Intelligence on Medium
A Generative Model of People in Clothing
Now we are going to generate the image directly, and this is actually a hard task, since humans take so many poses and there are a lot of different angles to consider.
One way to do this is via computer graphics, but here everything is generated directly from one model.
I didn’t even think of this as a hard problem to solve, very interesting. ClothNet is the name of the game, and there is some latent variable we want to model.
Learning a direct model is good since it can cover a large part of the data distribution. The bad side of this approach is that it needs a good amount of data, which can be hard to get.
A lot of people actually took the generated images for real ones, which is a very impressive result. As for related work, a lot of it exists.
Examples include pedestrian detection, which might involve drawing bounding boxes as well, and of course human pose estimation.
The general GAN method is explained, and image-to-image translation is also considered, 3D shapes and everything. The datasets for pose estimation are not labeled with clothing, so they had to create another dataset.
They also augmented the dataset, which is quite impressive work. Again, it seems like the most critical contribution of this paper is the dataset they created.
A U-Net architecture with skip connections is used. A VAE also plays a part, and the two are put together to create the network.
All of these tricks are explained. They are oldies but goodies, and a good review of the knowledge.
It is quite a large network overall, which might make training harder. But it is good in the sense that the model is covering and using DIRECT image data.
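To make the two building blocks a bit more concrete, here is a minimal pure-Python sketch of the usual VAE sampling step, the KL term that goes with it, and a U-Net style skip connection. This is my own illustration, not the paper's code: the function names and the flat-list "feature maps" are assumptions, and a real implementation would use tensors in a deep learning framework.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """VAE sampling step: z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, sigma^2) to N(0, 1), summed over latent dims."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def skip_concat(encoder_features, decoder_features):
    """U-Net style skip connection: append the encoder channels to the
    decoder channels so later layers see both (hypothetical flat lists)."""
    return decoder_features + encoder_features


rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, -2.0], rng)   # one latent sample
merged = skip_concat(["enc_c0", "enc_c1"], ["dec_c0"])
```

The skip connection is just concatenation here, which is the usual U-Net choice. It lets the decoder reuse fine spatial detail from the encoder instead of rebuilding it from the bottleneck alone.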
A walk through the latent space shows how well the model has learned the distribution. A generative model here and there.
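For anyone wondering what such a walk looks like in practice, here is a small sketch, assuming the simplest version: pick two latent codes and move between them in a straight line, decoding each intermediate point into an image. The function name is hypothetical, the decoder is left out, and the paper may well use a different interpolation scheme.

```python
def interpolate_latents(z_start, z_end, steps):
    """Return `steps` latent codes evenly spaced on the line from z_start to z_end."""
    if steps < 2:
        raise ValueError("need at least two steps to include both endpoints")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # goes from 0.0 to 1.0 inclusive
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


# Each code in the path would be fed to the trained decoder to get one image.
walk = interpolate_latents([0.0, 0.0], [2.0, 4.0], steps=3)
```

If the decoded images change smoothly along the path, that is a good sign the model has learned a sensible distribution instead of memorizing training examples.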
Wow, the model is able to encode the color information as well, and the generated image actually looks really good. The color scheme is mixed very well. The background, as shown below, is a bit weird, but overall this is an amazing result. I like this a lot.
And without the two sketch models, we can see that the model is not able to generate a high-dimensional image. This indicates that the authors were able to create an effective deep learning architecture.