Paper Review 6— Rethinking Lossy Compression: Rate-Distortion-Perception Tradeoff

Source: Deep Learning on Medium


This paper, published at ICML 2019, proposes an alternative to the popular Shannon rate-distortion theory of lossy image compression. It has been shown in other works that “low distortion” (as measured by e.g. PSNR or MS-SSIM) is not a synonym for “high perceptual quality”, and in fact optimizing one often comes at the expense of the other. This work takes a more theoretical approach to proving the trade-offs involved. While their experiments only validate the theory on a Bernoulli source and MNIST, they do a good job of putting a framework around this three-way tradeoff between rate, distortion, and perception.

They show that constraining the perceptual quality to be high generally elevates the rate-distortion curve, so that either rate or distortion has to be sacrificed.
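For reference, the claim above can be written compactly. The block below is a sketch in the paper's notation as I understand it: X is the source, X_hat the reconstruction, Δ a distortion measure, and d a divergence between distributions. The last line follows from the function being non-increasing in P.

```latex
% Rate-distortion-perception function: the smallest rate that meets
% both a distortion constraint D and a perception constraint P.
R(D, P) \;=\; \min_{p_{\hat{X}\mid X}} \; I(X; \hat{X})
\quad \text{s.t.} \quad
\mathbb{E}\big[\Delta(X, \hat{X})\big] \le D,
\qquad
d\big(p_X, p_{\hat{X}}\big) \le P .

% The classical rate-distortion function is the case with no perception
% constraint, and tightening P can only raise the required rate:
R(D) \;=\; R(D, \infty) \;\le\; R(D, P) \;\le\; R(D, 0).
```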

Notes:

1. Distortion is defined as the expectation E[Δ(X, X_hat)] of a distortion measure Δ between the original image X and the reconstructed image X_hat.

2. Rate is defined through the rate-distortion function R(D): the minimum mutual information I(X; X_hat) over all mappings from X to X_hat whose expected distortion is at most D. Several general properties of this function are known, including that it is always monotonically non-increasing and convex.

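3. Perceptual quality is defined as a divergence between the distribution of the original images and the distribution of the reconstructions (e.g. K-L divergence or Wasserstein distance). Lower divergence means better perceptual quality.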

4. Note that the R-D-P function generally does not have a closed-form solution.

5. They show that minimizing distortion does not necessarily lead to good perceptual quality. This behavior is shown to hold for any distortion measure.

6. At high bit-rates, it is possible to achieve good perceptual quality (low P) without significantly sacrificing distortion D. However, as the bit-rate becomes lower, the equi-rate level sets curve substantially as P approaches low values, showing that the tradeoff between distortion and perception is exacerbated in this regime.

7. Theorem 1: the R-D-P function R(D, P) 1) is monotonically non-increasing in D and P, 2) is convex if condition A1 holds, and 3) satisfies R(., 0) ≠ R(., ∞) if condition A2 holds (both conditions are stated in the paper). Properties 1 and 3 together indicate that there exists some D0 for which R(D0, 0) > R(D0, ∞), showing that the rate-distortion curve necessarily elevates when constraining for perfect perceptual quality.

8. Theorem 2: when distortion is measured by the mean squared error (MSE), it is possible to attain perfect perceptual quality without increasing the rate, at the cost of no more than a 2-fold increase in the MSE. More specifically, attaining perfect perceptual quality at distortion D does not require a higher bit rate than that necessary for compression at distortion D/2 with no perceptual quality constraint, i.e. R(D, 0) ≤ R(D/2, ∞).
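Notes 5 and 8 can be illustrated with a small simulation. This is not the paper's experiment. It is a toy Gaussian setup I am assuming for illustration, where a noisy observation Y stands in for the compressed representation and the function name `simulate` is hypothetical. The minimum-MSE reconstruction has the lowest distortion but the wrong output distribution (its variance is too small), while drawing the reconstruction from the posterior p(x | y) matches the source distribution and exactly doubles the MSE, which is the same factor of two that appears in Theorem 2.

```python
import numpy as np


def simulate(noise_var=1.0, n=200_000, seed=0):
    """Compare MMSE and posterior-sampling reconstructions of X ~ N(0, 1).

    Toy model (an assumption, not taken from the paper): Y = X + noise.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                            # source samples
    y = x + np.sqrt(noise_var) * rng.standard_normal(n)   # degraded observation

    # For this model the posterior p(x | y) is Gaussian with these moments.
    post_mean = y / (1.0 + noise_var)
    post_var = noise_var / (1.0 + noise_var)

    # Reconstruction 1: posterior mean, which minimizes the MSE.
    x_mmse = post_mean
    # Reconstruction 2: one sample from the posterior, so that the
    # reconstruction has the same marginal distribution as X.
    x_post = post_mean + np.sqrt(post_var) * rng.standard_normal(n)

    return {
        "mse_mmse": float(np.mean((x - x_mmse) ** 2)),
        "mse_post": float(np.mean((x - x_post) ** 2)),
        "var_mmse": float(np.var(x_mmse)),  # below 1: distribution mismatch
        "var_post": float(np.var(x_post)),  # close to 1: matches the source
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

With `noise_var=1.0` the theoretical values are an MSE of 0.5 for the posterior mean and 1.0 for the posterior sample, with output variances of 0.5 and 1.0 respectively. The simulated numbers should land close to these.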

Figure: rate-distortion-perception tradeoff curves for the MNIST dataset, with MSE as the distortion measure.

Cons / Not quite clear:

  1. The experimental section is a little weak due to the absence of human evaluations and the lack of a real-world dataset (MNIST does not cut it).
  2. The MNIST methodology is evaluated at higher bit-rates (see fig. 6). While the MNIST images look convincing, do they get similar trade-offs at lower bit-rates?