13 probability distributions you need to master in deep learning

Original article was published by Earth System Science and Remote Sensing on Deep Learning on Medium

1. Uniform distribution (continuous) code: https://github.com/graykode/distribution-is-all-you-need/blob/master/uniform.py

The uniform distribution assigns the same probability density, 1/(b - a), to every point in [a, b] and zero elsewhere. It is one of the simplest probability distributions.
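As a minimal sketch of the constant density on [a, b], the helper below (the name `uniform_pdf` is made up for illustration and is not taken from the linked repository) returns 1/(b - a) inside the interval and 0 outside it.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of the continuous uniform distribution on [a, b]."""
    if b <= a:
        raise ValueError("require a < b")
    # Constant density inside the interval, zero outside it.
    return 1.0 / (b - a) if a <= x <= b else 0.0


print(uniform_pdf(0.5, 0.0, 2.0))  # inside [0, 2]  -> 0.5
print(uniform_pdf(3.0, 0.0, 2.0))  # outside [0, 2] -> 0.0
```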

2. Bernoulli distribution (discrete) code: https://github.com/graykode/distribution-is-all-you-need/blob/master/bernoulli.py

The Bernoulli distribution does not take a prior probability p(x) into account. Therefore, if we fit it by maximum likelihood alone, we can easily overfit. Binary classification is typically trained with binary cross entropy, which has the same form as the negative logarithm of the Bernoulli distribution.
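The link between binary cross entropy and the Bernoulli distribution can be checked numerically. The sketch below uses illustrative function names (`bernoulli_pmf`, `binary_cross_entropy`) and assumes a single label y in {0, 1} with a predicted probability p strictly between 0 and 1.

```python
import math


def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) for x in {0, 1}: p^x * (1 - p)^(1 - x)."""
    return p ** x * (1.0 - p) ** (1 - x)


def binary_cross_entropy(y: int, p: float) -> float:
    """Binary cross entropy for one label y and predicted probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


y, p = 1, 0.8
nll = -math.log(bernoulli_pmf(y, p))  # negative log-likelihood
bce = binary_cross_entropy(y, p)
print(nll, bce)  # the two values agree
```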

3. Binomial distribution (discrete) code: https://github.com/graykode/distribution-is-all-you-need/blob/master/binomial.py

The binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each succeeding with probability p. The number of experiments n is specified in advance; with n = 1 it reduces to the Bernoulli distribution.
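A short sketch of the binomial probability mass function, assuming each of the n experiments succeeds independently with probability p. The function name `binomial_pmf` is illustrative only.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(sum(pmf))               # probabilities over k = 0..n sum to 1 (up to rounding)
print(binomial_pmf(1, 1, p))  # with n = 1 this is the Bernoulli probability p
```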

4. Multinoulli distribution / categorical distribution (discrete) code: https://github.com/graykode/distribution-is-all-you-need/blob/master/categorical.py

The multinoulli distribution is also called the categorical distribution. Cross entropy has the same form as the negative logarithm of the multinoulli distribution.
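The same check works for the categorical case. This sketch assumes a one-hot encoded label and a vector of strictly positive class probabilities; the helper names are illustrative.

```python
import math


def categorical_pmf(one_hot, probs):
    """Product of p_i ** x_i over classes, for a one-hot vector x."""
    result = 1.0
    for x_i, p_i in zip(one_hot, probs):
        result *= p_i ** x_i
    return result


def cross_entropy(one_hot, probs):
    """Cross entropy between a one-hot label and predicted probabilities."""
    return -sum(x_i * math.log(p_i) for x_i, p_i in zip(one_hot, probs))


label = [0, 1, 0]        # the true class is index 1
probs = [0.2, 0.5, 0.3]  # predicted class probabilities
nll = -math.log(categorical_pmf(label, probs))
ce = cross_entropy(label, probs)
print(nll, ce)  # the two values agree
```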

5. Multinomial distribution (discrete) code: https://github.com/graykode/distribution-is-all-you-need/blob/master/multinomial.py

The relationship between the multinomial distribution and the categorical distribution is the same as the relationship between the binomial distribution and the Bernoulli distribution: the multinomial distribution counts the outcomes of n independent categorical trials.
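To make that relationship concrete, the sketch below evaluates a multinomial probability mass function (an illustrative helper, not the repository's code) in two special cases: a single trial, where it gives a categorical probability, and two categories, where it gives a binomial probability.

```python
import math


def multinomial_pmf(counts, probs):
    """Probability of observing the given per-category counts."""
    n = sum(counts)
    coeff = math.factorial(n)
    for c in counts:
        coeff //= math.factorial(c)  # multinomial coefficient n! / (c_1! ... c_k!)
    prob = 1.0
    for c, p in zip(counts, probs):
        prob *= p ** c
    return coeff * prob


# One trial: the multinomial reduces to the categorical distribution.
single_trial = multinomial_pmf([0, 1, 0], [0.2, 0.5, 0.3])

# Two categories: the multinomial reduces to the binomial distribution.
two_classes = multinomial_pmf([3, 7], [0.3, 0.7])
binomial = math.comb(10, 3) * 0.3 ** 3 * 0.7 ** 7
print(single_trial, two_classes, binomial)
```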

6. Beta distribution (continuous) code: https://github.com/graykode/distribution-is-all-you-need/blob/master/beta.py

The beta distribution is the conjugate prior of the binomial and Bernoulli distributions. With a conjugate prior, the posterior stays in the same family as the prior, so it can be obtained in closed form. In the special case α = 1, β = 1, the beta distribution is the same as the uniform distribution on [0, 1].
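Conjugacy means the posterior update is just arithmetic on the two parameters. The sketch below assumes a Beta(1, 1) prior, which is the uniform case, and a hypothetical list of coin flips; the helper names are illustrative.

```python
def beta_posterior(alpha: float, beta: float, successes: int, failures: int):
    """Posterior Beta parameters after observing Bernoulli outcomes."""
    return alpha + successes, beta + failures


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


flips = [1, 0, 1, 1, 0, 1, 1, 1]  # hypothetical data: 6 successes, 2 failures
successes = sum(flips)
failures = len(flips) - successes

post_alpha, post_beta = beta_posterior(1.0, 1.0, successes, failures)
print(post_alpha, post_beta)             # 7.0 3.0
print(beta_mean(post_alpha, post_beta))  # 0.7
print(successes / len(flips))            # maximum likelihood estimate: 0.75
```

The posterior mean (0.7) is pulled from the maximum likelihood estimate (0.75) toward the prior mean (0.5), which is one way a prior counteracts overfitting on small samples.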

7. Dirichlet distribution (continuous) code: https://github.com/graykode/distribution-is-all-you-need/blob/master/dirichlet.py

The Dirichlet distribution is the conjugate prior of the multinomial distribution. If k = 2, it is the beta distribution.
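The k = 2 case can be verified by comparing densities. This is a minimal sketch with illustrative helper names; it assumes the usual parameterization in which the Dirichlet is defined over probability vectors that sum to 1.

```python
import math


def dirichlet_pdf(xs, alphas):
    """Dirichlet density at a probability vector xs (entries sum to 1)."""
    density = math.gamma(sum(alphas))
    for a in alphas:
        density /= math.gamma(a)  # normalizing constant
    for x, a in zip(xs, alphas):
        density *= x ** (a - 1.0)
    return density


def beta_pdf(x: float, a: float, b: float) -> float:
    """Beta(a, b) density at x in (0, 1)."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1.0) * (1.0 - x) ** (b - 1.0)


x, a, b = 0.3, 2.0, 5.0
d = dirichlet_pdf([x, 1.0 - x], [a, b])
print(d, beta_pdf(x, a, b))  # the two densities agree
```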

8. Gamma distribution (continuous) code: https://github.com/graykode/distribution-is-all-you-need/blob/master/gamma.py

If X ~ Gamma(a, 1) and Y ~ Gamma(b, 1) are independent, then X / (X + Y) follows Beta(a, b), which links the gamma distribution to the beta distribution. The exponential distribution and the chi-square distribution are special cases of the gamma distribution.
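The gamma-to-beta relationship can be illustrated by simulation. The sketch below assumes two independent gamma draws with unit scale and compares the sample mean of X / (X + Y) with the Beta(a, b) mean a / (a + b). The seed and sample size are arbitrary choices.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
a, b = 2.0, 5.0
n_samples = 50_000

ratios = []
for _ in range(n_samples):
    x = rng.gammavariate(a, 1.0)  # shape a, scale 1
    y = rng.gammavariate(b, 1.0)  # shape b, scale 1
    ratios.append(x / (x + y))

sample_mean = sum(ratios) / n_samples
beta_mean = a / (a + b)
print(sample_mean, beta_mean)  # should be close to each other
```

Matching means is only a sanity check, not a proof that the two distributions coincide.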

9. Exponential distribution (continuous) code: https://github.com/graykode/distribution-is-all-you-need/blob/master/exponential.py

The exponential distribution is the special case of the gamma distribution in which the shape parameter α is 1.
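A quick density comparison shows this special case. The sketch assumes the shape/rate parameterization of the gamma distribution (other texts use shape/scale), and the function names are illustrative.

```python
import math


def gamma_pdf(x: float, shape: float, rate: float) -> float:
    """Gamma density in the shape/rate parameterization, for x > 0."""
    return rate ** shape * x ** (shape - 1.0) * math.exp(-rate * x) / math.gamma(shape)


def exponential_pdf(x: float, rate: float) -> float:
    """Exponential density with the given rate, for x >= 0."""
    return rate * math.exp(-rate * x)


rate = 1.5
for x in (0.1, 1.0, 2.5):
    # With shape = 1 the gamma density equals the exponential density.
    print(x, gamma_pdf(x, 1.0, rate), exponential_pdf(x, rate))
```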

10. Gaussian distribution (continuous) code: https://github.com/graykode/distribution-is-all-you-need/blob/master/gaussian.py

The Gaussian distribution is a very common continuous probability distribution, specified by its mean μ and standard deviation σ.

11. Normal distribution (continuous) code: https://github.com/graykode/distribution-is-all-you-need/blob/master/normal.py

The normal distribution here refers to the standard normal distribution: a Gaussian distribution with a mean of 0 and a standard deviation of 1.
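The two sections above are related by standardization: subtracting the mean and dividing by the standard deviation maps any Gaussian onto the mean-0, standard-deviation-1 case. A minimal sketch, with illustrative names and the symbols mu and sigma assumed for the mean and standard deviation:

```python
import math


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma  # standardized value
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def standard_normal_pdf(z: float) -> float:
    """Gaussian density with mean 0 and standard deviation 1."""
    return gaussian_pdf(z, 0.0, 1.0)


x, mu, sigma = 3.0, 1.0, 2.0
direct = gaussian_pdf(x, mu, sigma)
via_standard = standard_normal_pdf((x - mu) / sigma) / sigma
print(direct, via_standard)  # the two values agree
```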

12. Chi-square distribution (continuous) code: https://github.com/graykode/distribution-is-all-you-need/blob/master/chi-squared.py

The chi-square distribution with k degrees of freedom is the distribution of the sum of squares of k independent standard normal random variables. The chi-square distribution is a special case of the gamma distribution (shape k/2, scale 2).
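The sum-of-squares definition translates directly into a sampler. The sketch below draws k standard normal values, squares and sums them, and checks that the sample mean is close to k, the known mean of a chi-square distribution with k degrees of freedom. The seed and sample size are arbitrary.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
k = 4                   # degrees of freedom
n_samples = 50_000

samples = [
    sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))  # sum of k squared N(0, 1) draws
    for _ in range(n_samples)
]

sample_mean = sum(samples) / n_samples
print(sample_mean)  # should be close to k
```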

13. t-distribution (continuous) code: https://github.com/graykode/distribution-is-all-you-need/blob/master/student-t.py

The t-distribution is a symmetric, bell-shaped distribution, similar to the normal distribution but with heavier tails, which means it is more likely to produce values far from the mean, in either direction.
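The heavier tails can be seen by comparing densities far from the center. This sketch evaluates the t density with 3 degrees of freedom (an arbitrary choice for illustration) against the standard normal density at x = 4 and x = -4; the function names are illustrative.

```python
import math


def student_t_pdf(x: float, nu: float) -> float:
    """Student's t density with nu degrees of freedom."""
    norm = math.gamma((nu + 1.0) / 2.0) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2.0))
    return norm * (1.0 + x * x / nu) ** (-(nu + 1.0) / 2.0)


def standard_normal_pdf(x: float) -> float:
    """Density of the normal distribution with mean 0 and standard deviation 1."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


for x in (-4.0, 4.0):
    # Far from the mean, the t density is much larger than the normal density.
    print(x, student_t_pdf(x, 3.0), standard_normal_pdf(x))
```

Because both densities are symmetric, the excess tail mass appears equally on the low and high sides.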