Frontier: Must-Read Papers in ICLR 2021 [Causal Inference]

Original article was published by Tech First on Artificial Intelligence on Medium


ICLR 2021, the International Conference on Learning Representations, is one of the most influential conferences in AI. This year it received 3013 submissions, and many of them are related to causal inference. Since causal inference is now widely used in engineering, we selected seven papers that may give you a sense of where the field is heading. Enjoy!

1. Accounting for unobserved confounding in domain generalization

> https://openreview.net/forum?id=ZqB2GD-Ixn

  • Keywords: Causality, Robust Optimization, Domain Generalization
  • Abstract: The ability to extrapolate, or generalize, from observed to new related environments is central to any form of reliable machine learning, yet most methods fail when moving beyond i.i.d data. In some cases, the reason lies in a misappreciation of the causal structure that governs the data, and in particular as a consequence of the influence of unobserved confounders that drive changes in observed distributions and distort correlations. In this paper, we argue for defining generalization with respect to a broader class of distribution shifts (defined as arising from interventions in the underlying causal model), including changes in observed, unobserved and target variable distributions. We propose a new robust learning principle that may be paired with any gradient-based learning algorithm. This learning principle has explicit generalization guarantees, and relates robustness with certain invariances in the causal model, clarifying why, in some cases, test performance lags training performance. We demonstrate the empirical performance of our approach on healthcare data from different modalities, including image and speech data.
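The paper's robust learning principle pairs with any gradient-based algorithm. As a rough illustration of minimax training over environments (a generic sketch, not the authors' exact objective; all data and numbers below are invented), we can take a gradient step on whichever training environment currently has the highest loss. This drives a linear model away from a spurious feature whose correlation with the label flips across environments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two training environments: y depends causally on x1 (stable relation),
# while x2's correlation with y flips sign across environments (spurious).
def make_env(spurious_sign, n=2000):
    x1 = rng.normal(size=n)
    x2 = spurious_sign * x1 + rng.normal(size=n)
    y = x1 + 0.1 * rng.normal(size=n)
    return np.stack([x1, x2], axis=1), y

envs = [make_env(+1.0), make_env(-1.0)]

def env_loss(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

def env_grad(w, X, y):
    return 2.0 * X.T @ (X @ w - y) / len(y)

# Robust principle (schematically): descend on the worst environment each step.
w = np.zeros(2)
for _ in range(500):
    losses = [env_loss(w, X, y) for X, y in envs]
    X, y = envs[int(np.argmax(losses))]
    w -= 0.05 * env_grad(w, X, y)

# The minimax solution ignores the unstable feature: w ends up near [1, 0].
worst_case_loss = max(env_loss(w, X, y) for X, y in envs)
```

Ordinary pooled training would keep a nonzero weight on x2 and fail in whichever environment flips its sign; descending on the worst case is what buys the robustness guarantee the abstract refers to.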

2. Amortized causal discovery: learning to infer causal graphs from time series data

> https://openreview.net/forum?id=gW8n0uD6rl

  • Abstract: Standard causal discovery methods must fit a new model whenever they encounter samples from a new underlying causal graph. However, these samples often share relevant information — for instance, the dynamics describing the effects of causal relations — which is lost when following this approach. We propose Amortized Causal Discovery, a novel framework that leverages such shared dynamics to learn to infer causal relations from time-series data. This enables us to train a single, amortized model that infers causal relations across samples with different underlying causal graphs, and thus makes use of the information that is shared. We demonstrate experimentally that this approach, implemented as a variational model, leads to significant improvements in causal discovery performance, and show how it can be extended to perform well under hidden confounding.
  • One-sentence Summary: We propose Amortized Causal Discovery, a framework for inferring causal relations from time series data across samples with different underlying causal graphs.

3. Continual lifelong causal effect inference with real world evidence

> https://openreview.net/forum?id=IOqr2ZyXHz1

  • Keywords: continual learning, incremental learning, causal effect inference, representation learning, treatment effect estimation
  • Abstract: The era of real world evidence has witnessed an increasing availability of observational data, which greatly facilitates the development of causal effect inference. Although significant advances have been made to overcome the challenges in causal effect estimation, such as missing counterfactual outcomes and selection bias, they only focus on source-specific and stationary observational data. In this paper, we investigate a new research problem of causal effect inference from incrementally available observational data, and present three new evaluation criteria accordingly, including extensibility, adaptability, and accessibility. We propose a Continual Causal Effect Representation Learning method for estimating causal effect with observational data, which are incrementally available from non-stationary data distributions. Instead of having access to all seen observational data, our method only stores a limited subset of feature representations learned from previous data. Combining the selective and balanced representation learning, feature representation distillation, and feature transformation, our method achieves continual causal effect estimation for new data without compromising the estimation capability for original data. Extensive experiments demonstrate the significance of continual causal effect inference and the effectiveness of our method.
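The "store a limited subset of representations and distill" idea can be sketched in miniature. The snippet below is not the authors' method (the encoder here is just a linear map, and all data are synthetic): it fits a model on new, shifted data while penalizing drift from the outputs stored for a small buffer of old inputs, so behaviour on the original data is partly preserved:

```python
import numpy as np

rng = np.random.default_rng(2)

# Phase 1: fit a linear model on the originally available observational data.
X_old = rng.normal(size=(500, 2))
y_old = X_old @ np.array([1.0, -1.0]) + 0.05 * rng.normal(size=500)
w_old, *_ = np.linalg.lstsq(X_old, y_old, rcond=None)

# Keep only a small buffer of old inputs plus their learned outputs
# (a stand-in for the stored feature representations in the paper).
buf = X_old[:20]
buf_out = buf @ w_old

# Phase 2: new data from a shifted distribution with a conflicting relation.
X_new = rng.normal(size=(500, 2)) + 1.0
y_new = X_new @ np.array([1.0, 1.0]) + 0.05 * rng.normal(size=500)

lam = 1.0  # distillation strength
w = w_old.copy()
for _ in range(300):
    g_task = 2 * X_new.T @ (X_new @ w - y_new) / len(y_new)
    g_dist = 2 * lam * buf.T @ (buf @ w - buf_out) / len(buf)
    w -= 0.05 * (g_task + g_dist)

# Baseline that forgets: refit on new data only.
w_forget, *_ = np.linalg.lstsq(X_new, y_new, rcond=None)

old_err = float(np.mean((X_old @ w - y_old) ** 2))
old_err_forget = float(np.mean((X_old @ w_forget - y_old) ** 2))
new_err = float(np.mean((X_new @ w - y_new) ** 2))
```

Distillation trades a little accuracy on the new distribution for far less forgetting on the old one, which is the extensibility/adaptability balance the paper's criteria formalize.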

4. Counterfactual generative networks

> https://openreview.net/forum?id=BXewfAYMmJw

  • Keywords: Causality, Counterfactuals, Generative Models, Robustness, Image Classification, Data Augmentation
  • Abstract: Neural networks are prone to learning shortcuts — they often model simple correlations, ignoring more complex ones that potentially generalize better. Prior works on image classification show that instead of learning a connection to object shape, deep classifiers tend to exploit spurious correlations with low-level texture or the background for solving the classification task. In this work, we take a step towards more robust and interpretable classifiers that explicitly expose the task’s causal structure. Building on current advances in deep generative modeling, we propose to decompose the image generation process into independent causal mechanisms that we train without direct supervision. By exploiting appropriate inductive biases, these mechanisms disentangle object shape, object texture, and background; hence, they allow for generating counterfactual images. We demonstrate the ability of our model to generate such images on MNIST and ImageNet. Further, we show that the counterfactual images can improve out-of-distribution robustness with a marginal drop in performance on the original classification task, despite being synthetic. Lastly, our generative model can be trained efficiently on a single GPU, exploiting common pre-trained models as inductive biases.
  • One-sentence Summary: Structural causal models can generate counterfactual images that can be used to train invariant classifiers.
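The decomposition into independent mechanisms can be pictured with a toy compositing function: a shape mask, a texture, and a background are produced separately and combined, so intervening on one mechanism (here, the texture) yields a counterfactual image with everything else held fixed. This is only a schematic of the idea, not the paper's GAN-based model:

```python
import numpy as np

# Three independent "mechanisms" for a toy 8x8 image.
H = W = 8
shape_mask = np.zeros((H, W))
shape_mask[2:6, 2:6] = 1.0                               # a square object shape
texture_bright = np.full((H, W), 0.9)                    # one object texture
texture_dark = np.full((H, W), 0.2)                      # an alternative texture
background = np.linspace(0.0, 1.0, H * W).reshape(H, W)  # a gradient background

def compose(mask, texture, bg):
    # Paste the textured object onto the background using the shape mask.
    return mask * texture + (1.0 - mask) * bg

original = compose(shape_mask, texture_bright, background)
# Counterfactual: intervene on texture only; shape and background stay fixed.
counterfactual = compose(shape_mask, texture_dark, background)
```

A classifier trained on such pairs sees the same shape under different textures and backgrounds, which is why the augmented data discourages texture shortcuts.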

5. Disentangled generative causal representation learning

> https://openreview.net/forum?id=agyFqcmgl6y

  • Keywords: disentanglement, causality, representation learning, generative model
  • Abstract: This paper proposes a Disentangled gEnerative cAusal Representation (DEAR) learning method. Unlike existing disentanglement methods that enforce independence of the latent variables, we consider the general case where the underlying factors of interests can be causally correlated. We show that previous methods with independent priors fail to disentangle causally related factors. Motivated by this finding, we propose a new disentangled learning method called DEAR that enables causal controllable generation and causal representation learning. The key ingredient of this new formulation is to use a structural causal model (SCM) as the prior for a bidirectional generative model. A generator is then trained jointly with an encoder using a suitable GAN loss. Theoretical justification on the proposed formulation is provided, which guarantees disentangled causal representation learning under appropriate conditions. We conduct extensive experiments on both synthesized and real datasets to demonstrate the effectiveness of DEAR in causal controllable generation, and the benefits of the learned representations for downstream tasks in terms of sample efficiency and distributional robustness.
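The key ingredient, an SCM prior over the latent factors, can be sketched in a few lines. In this hand-picked linear example (weights and values invented for illustration), latents are generated by a structural model, and a do-intervention on one factor propagates to its causal descendants while reusing the same exogenous noise; a real DEAR model would then decode these latents with a trained generator:

```python
import numpy as np

rng = np.random.default_rng(3)

# A hand-picked linear SCM over two latent factors: z1 causes z2.
n = 20000
e1 = rng.normal(size=n)   # exogenous noise for z1
e2 = rng.normal(size=n)   # exogenous noise for z2

z1 = e1
z2 = 0.8 * z1 + e2        # causally correlated latents, unlike an independent prior

# Intervention do(z1 = 2): replace z1's mechanism, keep the downstream
# mechanism and the same exogenous noise.
z1_do = np.full(n, 2.0)
z2_do = 0.8 * z1_do + e2
# z2 shifts to mean 0.8 * 2 = 1.6 under the intervention, so controllable
# generation through z1 also moves its causal descendants coherently.
```

This is exactly what an independent (factorized) prior cannot express: there, setting z1 would leave z2's distribution unchanged, so causally related factors could never be disentangled correctly.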

6. Explaining the efficacy of counterfactually augmented data

> https://openreview.net/forum?id=HHiiQKWsOcV

  • Keywords: humans in the loop, annotation artifacts, text classification, sentiment analysis, natural language inference
  • Abstract: In attempts to produce machine learning models less reliant on spurious patterns in training data, researchers have recently proposed generating counterfactually augmented data through a human-in-the-loop process. As applied in NLP, given some documents and their (initial) labels, humans are tasked with revising the text to make a (given) counterfactual label applicable. Importantly, the instructions prohibit edits that are not necessary to flip the applicable label. Models trained on the augmented (original and revised) data have been shown to rely less on semantically irrelevant words and to generalize better out of domain. While this work draws on causal thinking, casting edits as interventions and relying on human understanding to assess outcomes, the underlying causal model is not clear nor are the principles underlying the observed improvements in out-of-domain evaluation. In this paper, we explore a toy analog, using linear Gaussian models. Our analysis reveals interesting relationships between causal models, measurement noise, out-of-domain generalization, and reliance on spurious signals. Interestingly, our analysis suggests that data corrupted by adding noise to causal features will degrade out-of-domain performance, while noise added to non-causal features may make models more robust out-of-domain. This analysis yields interesting insights that help to explain the efficacy of counterfactually augmented data. Finally, we present a large-scale empirical study that supports this hypothesis.
  • One-sentence Summary: We present a framework for thinking about counterfactually augmented data and make strides towards understanding its benefits in out-of-domain generalization.
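The linear Gaussian story is easy to reproduce. In the toy below (all numbers invented for illustration), a label depends on a latent cause measured by one causal and one spuriously correlated feature; adding noise to the non-causal feature before fitting improves out-of-domain error, while adding it to the causal feature hurts, matching the abstract's claim:

```python
import numpy as np

rng = np.random.default_rng(4)

def fit(X, y):
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

n = 50000
z = rng.normal(size=n)                     # latent cause of the label
y = z
x_causal = z + 0.5 * rng.normal(size=n)    # noisy measurement of the cause
x_spur = z + 0.5 * rng.normal(size=n)      # correlated with y in-domain only
X = np.stack([x_causal, x_spur], axis=1)

# Out of domain, the spurious feature decouples from the label.
z_t = rng.normal(size=n)
X_ood = np.stack([z_t + 0.5 * rng.normal(size=n), rng.normal(size=n)], axis=1)
y_ood = z_t

def ood_mse(w):
    return float(np.mean((X_ood @ w - y_ood) ** 2))

w_plain = fit(X, y)

noise = 2.0 * rng.normal(size=n)
# Noise on the non-causal feature shrinks its weight: OOD error improves.
w_helped = fit(np.stack([x_causal, x_spur + noise], axis=1), y)
# Noise on the causal feature shifts weight to the spurious one: OOD error worsens.
w_hurt = fit(np.stack([x_causal + noise, x_spur], axis=1), y)
```

Human counterfactual edits act roughly like the helpful case: they perturb the non-causal parts of the text, which pushes the model's weight onto the causal signal.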

7. Selecting treatment effects models for domain adaptation using causal knowledge

> https://openreview.net/forum?id=AJY3fGPF1DC

  • Keywords: causal inference, treatment effects, healthcare
  • Abstract: Selecting causal inference models for estimating individualized treatment effects (ITE) from observational data presents a unique challenge since the counterfactual outcomes are never observed. The problem is challenged further in the unsupervised domain adaptation (UDA) setting where we only have access to labeled samples in the source domain, but desire selecting a model that achieves good performance on a target domain for which only unlabeled samples are available. Existing techniques for UDA model selection are designed for the predictive setting. These methods examine discriminative density ratios between the input covariates in the source and target domain and do not factor in the model’s predictions in the target domain. Because of this, two models with identical performance on the source domain would receive the same risk score by existing methods, but in reality, have significantly different performance in the test domain. We leverage the invariance of causal structures across domains to propose a novel model selection metric specifically designed for ITE methods under the UDA setting. In particular, we propose selecting models whose predictions of interventions’ effects satisfy known causal structures in the target domain. Experimentally, our method selects ITE models that are more robust to covariate shifts on several healthcare datasets, including estimating the effect of ventilation in COVID-19 patients from different geographic locations.
  • One-sentence Summary: We take advantage of the invariance of causal graphs across domains and propose a novel model selection metric for individualized treatment effect models in the unsupervised domain adaptation setting.
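The selection idea can be sketched with invented ingredients: suppose the known causal structure says a certain covariate is not an effect modifier. Then, on unlabeled target-domain data, candidate ITE models can be scored by how strongly their predicted effects depend on that covariate, and the least-violating one selected. This is a schematic of the principle, not the paper's actual metric:

```python
import numpy as np

rng = np.random.default_rng(5)

# Unlabeled covariates from the target domain.
n = 5000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n) + 1.0   # shifted relative to the source domain

# Assumed causal knowledge: the treatment effect varies with x1 only;
# x2 is known not to modify the effect.
def good_model(x1, x2):
    return 1.0 + 0.5 * x1       # respects the known structure

def bad_model(x1, x2):
    return 1.0 + 0.5 * x2       # leaks the non-modifier into its effects

def violation(model):
    ite = model(x1, x2)         # predicted individualized treatment effects
    # Score: dependence of predicted effects on the known non-modifier.
    return abs(float(np.corrcoef(ite, x2)[0, 1]))

selected = min([good_model, bad_model], key=violation)
```

Note that a density-ratio criterion over covariates alone would score both candidates identically, since it never looks at their predictions in the target domain; checking predictions against the invariant causal structure is what separates them.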