Source: Deep Learning on Medium

This paper, published at ICML 2019, proposes rethinking the popular Shannon rate-distortion theory of lossy compression by adding a third axis: perceptual quality. Other works have shown that “low distortion” (e.g. as measured by PSNR or MS-SSIM) is not a synonym for “high perceptual quality”, and that optimizing one often comes at the expense of the other. This work takes a more theoretical approach, proving the trade-offs involved. Although the experiments only validate on a Bernoulli source and MNIST, the authors do a good job of putting a framework around this three-way trade-off.

They show that constraining the perceptual quality to be high generally elevates the rate-distortion curve, which necessitates a sacrifice in either rate or distortion.
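A toy calculation makes the tension between distortion and perception concrete. The sketch below is not from the paper: it assumes a scalar Gaussian denoising setting (X ~ N(0, σx²) observed as Y = X + N with independent Gaussian noise), measures distortion by MSE, and measures perception by the 2-Wasserstein distance between the distributions of X and X_hat. The distortion-optimal (MMSE) estimate shrinks toward zero, so its distribution is narrower than that of X; rescaling Y so its variance matches that of X gives perfect perceptual quality (W2 = 0) at a higher MSE. The helper names are hypothetical, and since this is estimation rather than compression, it only illustrates the distortion-perception side of the trade-off, not the rate.

```python
import math


def w2_zero_mean_gaussians(std_a: float, std_b: float) -> float:
    """2-Wasserstein distance between N(0, std_a^2) and N(0, std_b^2) in 1-D."""
    return abs(std_a - std_b)


def denoising_tradeoff(signal_var: float = 1.0, noise_var: float = 1.0) -> dict:
    """Compare two linear estimators of X from Y = X + N (hypothetical toy helper).

    X ~ N(0, signal_var) and N ~ N(0, noise_var) are independent.
    Distortion: MSE. Perception: W2 between the laws of X and X_hat.
    """
    y_var = signal_var + noise_var
    signal_std = math.sqrt(signal_var)

    # 1) MMSE (Wiener) estimator X_hat = a * Y: the lowest achievable MSE.
    a = signal_var / y_var
    mmse = signal_var * noise_var / y_var
    mmse_std = a * math.sqrt(y_var)  # narrower than signal_std whenever noise_var > 0

    # 2) Variance-matched estimator X_hat = b * Y with Var(X_hat) = signal_var,
    #    so X_hat has exactly the law of X (perfect perceptual quality, W2 = 0).
    b = math.sqrt(signal_var / y_var)
    # E[(X - bY)^2] = signal_var - 2*b*signal_var + b^2*y_var = 2*signal_var*(1 - b)
    matched_mse = 2 * signal_var * (1 - b)
    matched_std = b * math.sqrt(y_var)

    return {
        "mmse": mmse,
        "w2_mmse": w2_zero_mean_gaussians(mmse_std, signal_std),
        "matched_mse": matched_mse,
        "w2_matched": w2_zero_mean_gaussians(matched_std, signal_std),
    }


if __name__ == "__main__":
    for key, value in denoising_tradeoff(1.0, 1.0).items():
        print(f"{key}: {value:.3f}")
```

With σx² = σn² = 1, the MMSE estimate has MSE 0.5 but W2 ≈ 0.293, while the variance-matched estimate reaches W2 = 0 at MSE ≈ 0.586: better perception is bought with higher distortion.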

**Notes:**

1. Distortion is defined as the expected value of a distortion measure $\Delta$ between the original image $X$ and its reconstruction $\hat{X}$: $D = \mathbb{E}[\Delta(X, \hat{X})]$.

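2. Rate is characterized by the rate-distortion function $R(D) = \min_{p_{\hat{X}|X}} I(X, \hat{X})$ subject to $\mathbb{E}[\Delta(X, \hat{X})] \le D$, where $I$ denotes mutual information, i.e. the minimal rate achievable under the distortion constraint $D$. Several general properties of this function are known: it is always monotonically non-increasing and convex.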

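3. Perceptual quality is measured by a divergence $d(p_X, p_{\hat{X}})$ between the distribution of original images and the distribution of reconstructions (e.g. the KL divergence or the Wasserstein distance); a lower value means better perceptual quality.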

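4. The rate-distortion-perception function, $R(D, P) = \min_{p_{\hat{X}|X}} I(X, \hat{X})$ subject to $\mathbb{E}[\Delta(X, \hat{X})] \le D$ and $d(p_X, p_{\hat{X}}) \le P$, generally has no closed-form solution.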

5. They show that minimizing distortion does not necessarily lead to good perceptual quality. This behavior is shown to hold for *any* distortion measure.

6. At high bit rates, it’s possible to achieve good perceptual quality (low P) without significantly sacrificing distortion D. However, as the bit rate decreases, the equi-rate level sets curve substantially toward low P values, showing that the trade-off between distortion and perception becomes more severe in this regime.

7. Theorem 1: 1) the R-D-P function is monotonically non-increasing in D and P; 2) it is convex if assumption A1 holds; and 3) it satisfies R(·, 0) ≠ R(·, ∞) if assumption A2 holds (both assumptions are stated in the paper). Properties 1 and 3 together imply that there exists some D0 for which R(D0, 0) > R(D0, ∞), i.e. the rate-distortion curve necessarily elevates when constraining for perfect perceptual quality.

8. Theorem 2: when distortion is measured by the mean squared error (MSE), it is possible to attain perfect perceptual quality without increasing the rate, at the cost of no more than a 2-fold increase in MSE. More precisely, R(D, 0) ≤ R(D/2, ∞): attaining perfect perceptual quality at distortion D does not require a higher bit rate than compressing at distortion D/2 with no perceptual quality constraint.
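These general results are easier to appreciate on a source whose classical R(D) is known in closed form. The sketch below assumes a zero-mean Gaussian source with variance σ² and MSE distortion, for which R(D) = max(0, ½ log2(σ²/D)) bits per sample (a textbook result; the paper's own experiments use a Bernoulli source and MNIST instead). The hypothetical helpers check on a grid that R(D) is monotonically non-increasing and convex, and evaluate the Theorem 2 upper bound R(D, 0) ≤ R(D/2, ∞). For D ≤ σ², halving D adds exactly ½ bit, so under this assumption perfect perceptual quality costs at most 0.5 bit per sample over the unconstrained rate. The true R(D, 0) of the Gaussian source is not computed here, only the bound.

```python
import math


def gaussian_rate_distortion(d: float, var: float = 1.0) -> float:
    """Textbook R(D), in bits per sample, for X ~ N(0, var) under MSE distortion."""
    if d <= 0:
        raise ValueError("distortion must be positive")
    return max(0.0, 0.5 * math.log2(var / d))


def perfect_perception_rate_bound(d: float, var: float = 1.0) -> float:
    """Upper bound on R(D, 0) from Theorem 2 (MSE distortion): R(D, 0) <= R(D/2, inf)."""
    return gaussian_rate_distortion(d / 2, var)


def check_properties(var: float = 1.0, step: float = 0.05, n: int = 60) -> dict:
    """Hypothetical helper: numerically check R(D) properties on a uniform grid of D."""
    ds = [step * k for k in range(1, n + 1)]
    rates = [gaussian_rate_distortion(d, var) for d in ds]
    # Monotonically non-increasing: each rate is <= the previous one.
    non_increasing = all(r2 <= r1 + 1e-12 for r1, r2 in zip(rates, rates[1:]))
    # Convex on a uniform grid: second differences are non-negative.
    convex = all(
        rates[i - 1] - 2 * rates[i] + rates[i + 1] >= -1e-9
        for i in range(1, len(rates) - 1)
    )
    # Extra rate that the Theorem 2 bound allows for perfect perceptual quality.
    gaps = [perfect_perception_rate_bound(d, var) - r for d, r in zip(ds, rates)]
    return {"non_increasing": non_increasing, "convex": convex, "max_gap": max(gaps)}


if __name__ == "__main__":
    print(check_properties())
    print(gaussian_rate_distortion(0.25), perfect_perception_rate_bound(0.25))
```

For example, at D = 0.25 with σ² = 1, the unconstrained rate is 1 bit per sample and the bound allows at most 1.5 bits per sample for perfect perceptual quality.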

**Cons / Not quite clear:**

- The experimental section is somewhat weak due to the absence of human evaluations and of a real-world dataset (MNIST does not cut it).
- The MNIST experiments are evaluated at relatively high bit rates (see Fig. 6). While the MNIST reconstructions look convincing, would the trade-offs look similar at lower bit rates?