A Gentle Introduction to Monte Carlo Sampling for Probability

Source: Machine Learning Mastery

Monte Carlo methods are a class of techniques for randomly sampling a probability distribution.

There are many problem domains where describing or estimating the probability distribution is relatively straightforward, but calculating a desired quantity is intractable. This may be due to many reasons, such as the stochastic nature of the domain or an exponential number of random variables.

Instead, a desired quantity can be approximated by using random sampling, referred to as Monte Carlo methods. These methods were initially used around the time that the first computers were created and remain pervasive through all fields of science and engineering, including artificial intelligence and machine learning.

In this post, you will discover Monte Carlo methods for sampling probability distributions.

After reading this post, you will know:

  • Often, we cannot calculate a desired quantity in probability, but we can define the probability distributions for the random variables directly or indirectly.
  • Monte Carlo sampling a class of methods for randomly sampling from a probability distribution.
  • Monte Carlo sampling provides the foundation for many machine learning methods such as resampling, hyperparameter tuning, and ensemble learning.

Discover bayes opimization, naive bayes, maximum likelihood, distributions, cross entropy, and much more in my new book, with 28 step-by-step tutorials and full Python source code.

Let’s get started.

A Gentle Introduction to the Monte Carlo Sampling for Probability

A Gentle Introduction to the Monte Carlo Sampling for Probability
Photo by Med Cruise Guide, some rights reserved.

Overview

This tutorial is divided into three parts; they are:

  1. Need for Sampling
  2. What Are Monte Carlo Methods?
  3. Examples of Monte Carlo Methods

Need for Sampling

There are many problems in probability, and more broadly in machine learning, where we cannot calculate an analytical solution directly.

In fact, there may be an argument that exact inference may be intractable for most practical probabilistic models.

For most probabilistic models of practical interest, exact inference is intractable, and so we have to resort to some form of approximation.

— Page 523, Pattern Recognition and Machine Learning, 2006.

The desired calculation is typically a sum of a discrete distribution or integral of a continuous distribution and is intractable to calculate. The calculation may be intractable for many reasons, such as the large number of random variables, the stochastic nature of the domain, noise in the observations, the lack of observations, and more.

In problems of this kind, it is often possible to define or estimate the probability distributions for the random variables involved, either directly or indirectly via a computational simulation.

Instead of calculating the quantity directly, sampling can be used.

Sampling provides a flexible way to approximate many sums and integrals at reduced cost.

— Page 590, Deep Learning, 2016.

Samples can be drawn randomly from the probability distribution and used to approximate the desired quantity.

This general class of techniques for random sampling from a probability distribution is referred to as Monte Carlo methods.

Want to Learn Probability for Machine Learning

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

Download Your FREE Mini-Course

What Are Monte Carlo Methods?

Monte Carlo methods, or MC for short, are a class of techniques for randomly sampling a probability distribution.

There are three main reasons to use Monte Carlo methods to randomly sample a probability distribution; they are:

  • Estimate density, gather samples to approximate the distribution of a target function.
  • Approximate a quantity, such as the mean or variance of a distribution.
  • Optimize a function, locate a sample that maximizes or minimizes the target function.

Monte Carlo methods are named for the casino in Monaco and were first developed to solve problems in particle physics at around the time of the development of the first computers and the Manhattan project for developing the first atomic bomb.

This is called a Monte Carlo approximation, named after a city in Europe known for its plush gambling casinos. Monte Carlo techniques were first developed in the area of statistical physics – in particular, during development of the atomic bomb – but are now widely used in statistics and machine learning as well.

— Page 52, Machine Learning: A Probabilistic Perspective, 2012.

Drawing a sample may be as simple as calculating the probability for a randomly selected event, or may be as complex as running a computational simulation, with the latter often referred to as a Monte Carlo simulation.

Multiple samples are collected and used to approximate the desired quantity.

Given the law of large numbers from statistics, the more random trials that are performed, the more accurate the approximated quantity will become.

… the law of large numbers states that if the samples x(i) are i.i.d., then the average converges almost surely to the expected value

— Page 591, Deep Learning, 2016.

As such, the number of samples provides control over the precision of the quantity that is being approximated, often limited by the computational complexity of drawing a sample.

By generating enough samples, we can achieve any desired level of accuracy we like. The main issue is: how do we efficiently generate samples from a probability distribution, particularly in high dimensions?

— Page 815, Machine Learning: A Probabilistic Perspective, 2012.

Additionally, given the central limit theorem, the distribution of the samples will form a Normal distribution, the mean of which can be taken as the approximated quantity and the variance used to provide a confidence interval for the quantity.

The central limit theorem tells us that the distribution of the average […], converges to a normal distribution […] This allows us to estimate confidence intervals around the estimate […], using the cumulative distribution of the normal density.

— Page 592, Deep Learning, 2016.

Monte Carlo methods are defined in terms of the way that samples are drawn or the constraints imposed on the sampling process.

Some examples of Monte Carlo sampling methods include: direct sampling, importance sampling, and rejection sampling.

  • Direct Sampling. Sampling the distribution directly without prior information.
  • Importance Sampling. Sampling from a simpler approximation of the target distribution.
  • Rejection Sampling. Sampling from a broader distribution and only considering samples within a region of the sampled distribution.

It’s a huge topic with many books dedicated to it. Next, let’s make the idea of Monte Carlo sampling concrete with some familiar examples.

Examples of Monte Carlo Sampling

We use Monte Carlo methods all the time without thinking about it.

For example, when we define a Bernoulli distribution for a coin flip and simulate flipping a coin by sampling from this distribution, we are performing a Monte Carlo simulation. Additionally, when we sample from a uniform distribution for the integers {1,2,3,4,5,6} to simulate the roll of a dice, we are performing a Monte Carlo simulation.

We are also using the Monte Carlo method when we gather a random sample of data from the domain and estimate the probability distribution of the data using a histogram or density estimation method.

There are many examples of the use of Monte Carlo methods across a range of scientific disciplines.

For example, Monte Carlo methods can be used for:

  • Calculating the probability of a move by an opponent in a complex game.
  • Calculating the probability of a weather event in the future.
  • Calculating the probability of a vehicle crash under specific conditions.

The methods are used to address difficult inference in problems in applied probability, such as sampling from probabilistic graphical models.

Related is the idea of sequential Monte Carlo methods used in Bayesian models that are often referred to as particle filters.

Particle filtering (PF) is a Monte Carlo, or simulation based, algorithm for recursive Bayesian inference.

— Page 823, Machine Learning: A Probabilistic Perspective, 2012.

Monte Carlo methods are also pervasive in artificial intelligence and machine learning.

Many important technologies used to accomplish machine learning goals are based on drawing samples from some probability distribution and using these samples to form a Monte Carlo estimate of some desired quantity.

— Page 590, Deep Learning, 2016.

They provide the basis for estimating the likelihood of outcomes in artificial intelligence problems via simulation, such as robotics. More simply, Monte Carlo methods are used to solve intractable integration problems, such as firing random rays in path tracing for computer graphics when rendering a computer-generated scene.

In machine learning, Monte Carlo methods provide the basis for resampling techniques like the bootstrap method for estimating a quantity, such as the accuracy of a model on a limited dataset.

The bootstrap is a simple Monte Carlo technique to approximate the sampling distribution. This is particularly useful in cases where the estimator is a complex function of the true parameters.

— Page 192, Machine Learning: A Probabilistic Perspective, 2012.

Random sampling of model hyperparameters when tuning a model is a Monte Carlo method, as are ensemble models used to overcome challenges such as the limited size and noise in a small data sample and the stochastic variance in a learning algorithm.

  • Resampling algorithms.
  • Random hyperparameter tuning.
  • Ensemble learning algorithms.

Monte Carlo methods also provide the basis for randomized or stochastic optimization algorithms, such as the popular Simulated Annealing optimization technique.

Monte Carlo algorithms, of which simulated annealing is an example, are used in many branches of science to estimate quantities that are difficult to calculate exactly.

— Page 530, Artificial Intelligence: A Modern Approach, 3rd edition, 2009.

  • Stochastic optimization algorithms.

Worked Example of Monte Carlo Sampling

We can make Monte Carlo sampling concrete with a worked example.

In this case, we will have a function that defines the probability distribution of a random variable. We will use a Gaussian distribution with a mean of 50 and a standard deviation of 5 and draw random samples from this distribution.

Let’s pretend we don’t know the form of the probability distribution for this random variable and we want to sample the function to get an idea of the probability density. We can draw a sample of a given size and plot a histogram to estimate the density.

The normal() NumPy function can be used to randomly draw samples from a Gaussian distribution with the specified mean (mu), standard deviation (sigma), and sample size.

To make the example more interesting, we will repeat this experiment four times with different sized samples. We would expect that as the size of the sample is increased, the probability density will better approximate the true density of the target function, given the law of large numbers.

The complete example is listed below.

# example of effect of size on monte carlo sample
from numpy.random import normal
from matplotlib import pyplot
# define the distribution
mu = 50
sigma = 5
# generate monte carlo samples of differing size
sizes = [10, 50, 100, 1000]
for i in range(len(sizes)):
	# generate sample
	sample = normal(mu, sigma, sizes[i])
	# plot histogram of sample
	pyplot.subplot(2, 2, i+1)
	pyplot.hist(sample, bins=20)
	pyplot.title('%d samples' % sizes[i])
	pyplot.xticks([])
# show the plot
pyplot.show()

Running the example creates four differently sized samples and plots a histogram for each.

We can see that the small sample sizes of 10 and 50 do not effectively capture the density of the target function. We can see that 100 samples is better, but it is not until 1,000 samples that we clearly see the familiar bell-shape of the Gaussian probability distribution.

This highlights the need to draw many samples, even for a simple random variable, and the benefit of increased accuracy of the approximation with the number of samples drawn.

Histogram Plots of Differently Sized Monte Carlo Samples From the Target Function

Histogram Plots of Differently Sized Monte Carlo Samples From the Target Function

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Books

Articles

Summary

In this post, you discovered Monte Carlo methods for sampling probability distributions.

Specifically, you learned:

  • Often, we cannot calculate a desired quantity in probability, but we can define the probability distributions for the random variables directly or indirectly.
  • Monte Carlo sampling a class of methods for randomly sampling from a probability distribution.
  • Monte Carlo sampling provides the foundation for many machine learning methods such as resampling, hyperparameter tuning, and ensemble learning.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

The post A Gentle Introduction to Monte Carlo Sampling for Probability appeared first on Machine Learning Mastery.