FactorVAE Explained

Original article was published on Deep Learning on Medium

Photo by Thomas Chan on Unsplash

Reading through the paper, understanding the key aspects while catching the details

First Page

Unsupervised Disentangled Representation Learning is typically performed with a reconstruction loss (no supervision) with a term that pushes statistically the latent representation towards a desired prior.

The prior consists of independent random variables, so this is what it is meant by disentanglement practically.

Beta-VAE is the benchmark.

The trade-off between the reconstruction loss (its measure is clearly defined) and disentanglement (they propose a new metric) is the metric to compare.

Bengio definition of Disentanglement: change one number in the latent representation and one, and only one, factor of variation changes.

Generative Models seem to be an interesting tool to learn disentangled representations.

What is the cost of supervision?

  1. Humans learn quite a lot in an unsupervised way, and so far humans have been way better at learning complex tasks than any algo
  2. Labels have an explicit cost: (good) annotations do not come for free
  3. Labels have an implicit cost: even if you are able to pay well for the best annotations ever, humans have bias and make mistakes