Source: Deep Learning on Medium
Disentangling disentanglement: Footnotes from NEURIPS-2019
TL;DR: Disentangling disentanglement. In this blog post, I summarize the dozen papers on disentanglement in deep learning presented at this year’s NEURIPS-2019 in Vancouver.
Companion github repo replete with paper summaries and cheat sheets: https://github.com/vinayprabhu/Disentanglement_NEURIPS_2019/
Background: Disentanglement in Representation learning
On Thursday evening of the conference week, as I sauntered around the poster session in the massive east exhibition halls of the Vancouver convention center, I realized that I had chanced upon probably the fifth poster in as many days analyzing a disentanglement framework the authors had worked on.
A quick check of the proceedings led me to this stunning statistic: a full dozen papers (I kid you not) were accepted this year with the term ‘disentanglement’ in the title. I chanced upon at least a few more in the multitude of workshops. (For comparison, there were 20+ papers and talks at the 2017 NEURIPS workshop on Learning Disentangled Representations: from Perception to Control, https://sites.google.com/view/disentanglenips2017)
I had first encountered this usage of the term in statistical learning during the last stages of my doctoral journey at CMU (circa 2013), when I read Yoshua Bengio’s ‘Deep Learning of Representations: Looking Forward’, in which he emphasized the need for ‘.. learning to disentangle the factors of variation underlying the observed data’. (How I wish he still authored such single-author papers.)
As it turns out, much to the chagrin of the physicists perhaps, you are disentangling whether you are teasing apart visual style from digit type on MNIST, separating shape from pose in images of human bodies and facial features from facial shape on CelebA, or unwrapping the effects of the mixture ratio of two constituent compounds from environmental factors such as thermal fluctuation in images generated for microstructure growth.
There seems to be no consensus on what the term precisely means or on which metric(s) capture the extent of it, an observation confirmed by a rather funny/snarky slide in Stefano Soatto’s talk at IPAM (refer to the playlist below).
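To make the metrics question concrete, here is what one popular quantitative notion looks like: the Mutual Information Gap (MIG) of Chen et al. (2018) scores a representation by how exclusively each ground-truth factor is captured by a single latent. A minimal discrete-sample sketch; the choice of metric and the toy data below are my own illustration, not drawn from any of the twelve papers:

```python
# Rough sketch of the Mutual Information Gap (MIG), one of several
# competing disentanglement scores. Toy data is made up for illustration.
from collections import Counter
from math import log

def entropy(xs):
    """Shannon entropy (in nats) of a discrete sample."""
    n = len(xs)
    return -sum(c / n * log(c / n) for c in Counter(xs).values())

def mutual_info(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def mig(factors, codes):
    """For each ground-truth factor, take the gap between the two most
    informative (discretised) latents, normalised by the factor's entropy;
    the MIG score is the mean of these gaps."""
    gaps = []
    for f in factors:
        mis = sorted((mutual_info(f, c) for c in codes), reverse=True)
        gaps.append((mis[0] - mis[1]) / entropy(f))
    return sum(gaps) / len(gaps)

# Toy check: latent 0 copies the factor exactly, latent 1 is constant.
factor = [0, 0, 1, 1, 0, 1, 0, 1]
codes = [factor[:], [0] * 8]
print(round(mig([factor], codes), 3))  # perfectly disentangled -> 1.0
```

A score near 1 means each factor is captured by exactly one latent; a score near 0 means the information is smeared across several, which is one way of formalising "entangled".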
That said, this is not a case of a mere smattering of empirical experiments each using its own customized notion of disentanglement. Reasonably rigorous frameworks have been proposed, harnessing powerful tools from areas such as variational inference, Shannon information theory, group theory and matrix factorization. DeepMind’s group-theoretic treatment seems to have perched itself as one of the go-to frameworks. If you are looking for a succinct 3-minute recap, refer to this video that I saw during one of the Simons Institute workshops (around the 7th minute). (A much more detailed talk from one of the main authors of the DeepMind group can be found here.)
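As a taste of the variational-inference flavor: the β-VAE objective (Higgins et al., 2017, the framework behind the DeepMind video linked below) upweights the KL term of the ELBO to pressure the posterior towards the factorized isotropic prior. A bare-bones sketch of the per-sample loss, assuming a diagonal-Gaussian posterior; the function names here are mine:

```python
# Hedged sketch of the beta-VAE loss:
#   L = -log p(x|z) + beta * KL( q(z|x) || N(0, I) )
# For a diagonal-Gaussian posterior the KL term is closed-form:
#   KL = 0.5 * sum( mu^2 + sigma^2 - log sigma^2 - 1 )
from math import exp

def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def beta_vae_loss(recon_log_lik, mu, log_var, beta=4.0):
    """Negative beta-ELBO for one sample. beta=1 recovers the vanilla
    VAE; beta>1 trades reconstruction quality for a more factorised
    (hopefully disentangled) latent code."""
    return -recon_log_lik + beta * gaussian_kl(mu, log_var)

# At the prior (mu=0, log_var=0) the KL vanishes and only the
# reconstruction term remains:
print(beta_vae_loss(recon_log_lik=-10.0, mu=[0.0, 0.0], log_var=[0.0, 0.0]))  # 10.0
```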
A bird’s view of the papers presented
Fig 4 below gives a bird’s-eye view of the 12 papers presented. I roughly bucketed them into two subsections depending on whether the main perceived goal of the paper (from my humble viewpoint) was to analyze and/or critique the properties of a pre-existing framework, or to harness one and apply it to an interesting problem domain. Bear in mind that this is admittedly a rather simplistic categorization: it does not tell you whether the application-oriented papers also critiqued and analyzed the frameworks they used, or whether the analysis/critique papers included real-world applications.
(You can find the pdf version with the paper links here: https://github.com/vinayprabhu/Disentanglement_NEURIPS_2019/blob/master/Disentanglement_papers_tree-diagram.pdf )
What do they mean by disentanglement?
To summarize the contexts in which disentanglement was used in these papers, I created a look-up table (see Table-1). In cases where the authors did not have a subsection explicitly dedicated to defining the term, I improvised and extracted the gist (hence the caveat [improv]).
Reproducibility and open-sourced code:
Given the strong and growing trend towards open-sourcing the code used to produce results, 10 of the 12 author groups also shared their GitHub repos. This is captured in Table-2 below:
What now? Some ideas..
[Here are some scribbles to try and guilt myself into working on this more seriously. Please take these with a grain of salt or 12 🙂 ]
1: Survey paper detailing the definitions, frameworks and metrics to be used.
2: Disentangling author / writing style / nation of origin using Kannada-MNIST dataset. (65 native volunteers from India and 10 non-native volunteers from USA)
3: It’s somewhat surprising that no one has tried modeling entanglement as a K-user interference channel to see whether an Interference Alignment-like trick [https://arxiv.org/pdf/0707.0323.pdf] works on dSprites-like datasets.
4: Disentangling Shoe type, pocket and device location from Gait representations
5: Bridging the body of work pertaining to (Hyperspectral) Unmixing / Blind source separation and disentangled representation learning.
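On that last idea, the analogy is easy to make concrete: linear blind source separation models the observation as x = As, and "disentangling" amounts to recovering the sources s (up to permutation and scaling when A is unknown, the same ambiguity disentanglement metrics wrestle with). A toy sketch with a known 2×2 mixing matrix; all numbers are made up for illustration:

```python
# Toy illustration of the linear mixing model x = A s from blind source
# separation. With A known, unmixing is plain inversion; in true BSS,
# A is unknown and ICA-style methods recover s only up to permutation
# and scale -- mirroring the identifiability issues in disentanglement.
def unmix_2x2(A, x):
    """Solve A @ s = x for a 2x2 mixing matrix A via Cramer's rule."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [(d * x[0] - b * x[1]) / det, (a * x[1] - c * x[0]) / det]

A = [[1.0, 0.5], [0.2, 1.0]]   # mixing matrix (the "entangler")
s = [2.0, -1.0]                # latent sources (the factors of variation)
x = [A[0][0] * s[0] + A[0][1] * s[1],
     A[1][0] * s[0] + A[1][1] * s[1]]   # observed mixture

print([round(v, 6) for v in unmix_2x2(A, x)])  # [2.0, -1.0]
```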
Companion github repo replete with paper summaries and cheat sheets.
A. Datasets to get started with:
B. Video playlist:
- Y. Bengio: From Deep Learning of Disentangled Representations to Higher-level Cognition
- β-VAE (DeepMind): https://www.youtube.com/watch?v=XNGo9xqpgMo
- Flexibly Fair Representation Learning by Disentanglement: https://www.youtube.com/watch?v=nlilKO1AvVs&t=27s
- Disentangled Representation Learning GAN for Pose-Invariant Face Recognition: https://www.youtube.com/watch?v=IjsBTZqCu-I
- Invariance and disentanglement in deep representations (fun talk): https://www.youtube.com/watch?v=zbg49SMP5kY
(From NEURIPS 2019 authors)
- The Audit Model Predictions paper: https://www.youtube.com/watch?v=PeZIo0Q_GwE
- TWiML interview of Olivier Bachem (3 papers on this topic at NEURIPS-19): https://www.youtube.com/watch?v=Gd1nL3WKucY