Original article was published on Artificial Intelligence on Medium

What could you do with 800 million dollars? What could the government do with 800 million dollars? **800 million dollars** is the financial loss caused by credit card fraud in Canada every year.

When a fraud occurs, information about the victim can be used against them. Passwords, personal identification numbers, and sometimes even the physical credit card can be stolen. **However, the one thing that cannot be stolen is behavior**.

A thief will typically use a stolen credit card for their own purposes, producing purchasing patterns that differ from the cardholder's. To demonstrate this, imagine that we have collected data on a credit card for the past few years. Suddenly, a new example, denoted with a red circle, appears.

# Gaussian Anomaly Detection

How do computers detect when a data point is different from the rest? The answer is anomaly detection. Gaussian anomaly detection, which falls in the category of semi-supervised learning, fits a **Gaussian distribution** to a previous data set of non-fraudulent (non-anomalous) data, and then evaluates how probable a new, possibly fraudulent (anomalous), data point is under that distribution.

To perform Gaussian anomaly detection with a single variable:

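- Find the mean of all the data points. This is denoted with the Greek letter μ. With m data points x⁽¹⁾, …, x⁽ᵐ⁾:

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}$$

- Calculate the variance of the data set, denoted with σ²:

$$\sigma^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)} - \mu\right)^2$$

- To calculate the probability density of a new data point x under a Gaussian distribution, given the parameters μ and σ²:

$$P(x;\, \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

- If P is less than some constant ε that we choose, we classify x as an anomaly.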
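The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a production fraud detector: the function names, the sample purchase amounts, and the threshold `epsilon = 0.01` are all assumptions made up for the example.

```python
import math

def fit_gaussian(values):
    """Estimate mu and sigma^2 from non-anomalous data (1/m variance estimate)."""
    m = len(values)
    mu = sum(values) / m
    sigma2 = sum((x - mu) ** 2 for x in values) / m
    return mu, sigma2

def gaussian_density(x, mu, sigma2):
    """Gaussian density P(x; mu, sigma^2). Requires sigma2 > 0."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def is_anomaly(x, mu, sigma2, epsilon):
    """Flag x when its density falls below the chosen threshold epsilon."""
    return gaussian_density(x, mu, sigma2) < epsilon

# Hypothetical past purchase amounts (illustrative only).
amounts = [20.0, 22.0, 19.0, 21.0, 18.0]
mu, sigma2 = fit_gaussian(amounts)

epsilon = 0.01  # assumed threshold; in practice it has to be tuned
print(is_anomaly(21.0, mu, sigma2, epsilon))  # close to the mean
print(is_anomaly(30.0, mu, sigma2, epsilon))  # far from the mean
```

With this sample data, a purchase of 21 sits close to the mean and is not flagged, while a purchase of 30 lies far out in the tail and is flagged.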

This gives a Gaussian curve centered at μ whose width is set by σ. We do not want to be limited to only one variable, and there are a few ways to extend the method.

One approach assumes that the variables are independent of each other. To extend P(x) to more variables, fit μ and σ² for every variable, and take the product of the probabilities of each variable.

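- Compute μⱼ and σⱼ² with the formulas above for each individual variable j = 1, …, n.
- To calculate the probability density of a new data point x = (x₁, …, xₙ), given a vector μ and a vector σ²:

$$P(x) = \prod_{j=1}^{n} P(x_j;\, \mu_j, \sigma_j^2) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi\sigma_j^2}}\exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)$$

- If P is less than some constant ε that we choose, we classify x as an anomaly.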
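The same idea carries over to several variables by multiplying the per-variable densities. The sketch below assumes two hypothetical features (purchase amount and hour of day) and an assumed threshold `epsilon = 1e-3`. The names and data are illustrative, and the code relies on the independence assumption stated above.

```python
import math

def fit_features(rows):
    """Fit a separate mean and variance to each column (variable) of rows."""
    m = len(rows)
    n = len(rows[0])
    mus = [sum(row[j] for row in rows) / m for j in range(n)]
    sigma2s = [sum((row[j] - mus[j]) ** 2 for row in rows) / m for j in range(n)]
    return mus, sigma2s

def joint_density(x, mus, sigma2s):
    """Product of per-variable Gaussian densities (assumes independent variables)."""
    p = 1.0
    for xj, mu, s2 in zip(x, mus, sigma2s):
        p *= math.exp(-((xj - mu) ** 2) / (2 * s2)) / math.sqrt(2 * math.pi * s2)
    return p

# Hypothetical history: [purchase amount, hour of day] (illustrative only).
history = [
    [20.0, 12.0],
    [22.0, 14.0],
    [19.0, 13.0],
    [21.0, 11.0],
    [18.0, 15.0],
]
mus, sigma2s = fit_features(history)

epsilon = 1e-3  # assumed threshold
typical = [21.0, 12.0]
unusual = [30.0, 3.0]
print(joint_density(typical, mus, sigma2s) < epsilon)  # is the typical point flagged?
print(joint_density(unusual, mus, sigma2s) < epsilon)  # is the unusual point flagged?
```

Because each variable gets its own μ and σ², a point only needs to be unlikely in one dimension for the product to drop below ε.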

When we fit μ and σ² to our previous example, we get a contour graph that looks like this: