Fighting Fraud with Anomaly Detection

Original article was published on Artificial Intelligence on Medium

What could you do with 800 million dollars? What could the government do with 800 million dollars? 800 million dollars is the financial loss caused by credit card fraud in Canada every year.

When fraud occurs, information about the victim can be used against them. Passwords, personal identification numbers, and sometimes even the physical credit card can be stolen. However, the one thing that cannot be stolen is behavior.

A thief will often use the stolen credit card for their own purposes, producing purchasing patterns that differ from the owner's. To demonstrate this, imagine that we have collected data on a credit card for the past few years. Suddenly, a new example, denoted with a red circle, appears.

Gaussian Anomaly Detection

How do computers detect when a data point is different from the rest? The answer is anomaly detection. Gaussian anomaly detection, which falls in the category of semi-supervised learning, fits a Gaussian distribution to a previous data set of non-fraudulent (non-anomalous) data. It then computes the probability (strictly, the probability density) of a new, possibly fraudulent or anomalous, data point under that distribution.

To perform Gaussian anomaly detection with a single variable:

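• Find the mean of all m data points. This is denoted with the Greek letter μ:
  $$\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$$
• Calculate the variance of the data set, denoted with σ²:
  $$\sigma^2 = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)} - \mu\right)^2$$
• To calculate the probability of a new data point x under a Gaussian distribution, given the parameters μ and σ²:
  $$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
• If P(x) is less than some constant ε that we choose, we classify x as an anomaly.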
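The steps above can be sketched in a few lines of Python. This is only an illustration: the purchase amounts, the threshold value, and the function names (fit_gaussian, gaussian_density, is_anomaly) are assumptions made up for the example, and the variance uses the 1/m form.

```python
import math

def fit_gaussian(values):
    """Return (mu, sigma2) for a list of numbers, using the 1/m variance."""
    m = len(values)
    mu = sum(values) / m
    sigma2 = sum((v - mu) ** 2 for v in values) / m
    return mu, sigma2

def gaussian_density(x, mu, sigma2):
    """Gaussian density P(x) for the parameters mu and sigma2."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def is_anomaly(x, mu, sigma2, epsilon):
    """Flag x when its density falls below the chosen threshold epsilon."""
    return gaussian_density(x, mu, sigma2) < epsilon

# Hypothetical purchase amounts (in dollars) from past, non-fraudulent activity.
past_purchases = [20.0, 25.0, 30.0, 22.0, 28.0]
mu, sigma2 = fit_gaussian(past_purchases)

epsilon = 1e-3  # illustrative threshold, not a recommended value

print(is_anomaly(24.0, mu, sigma2, epsilon))   # a purchase close to the mean
print(is_anomaly(500.0, mu, sigma2, epsilon))  # a purchase far from the mean
```

In practice ε has to be tuned: too large and ordinary purchases get flagged, too small and real fraud slips through.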

Fitting μ and σ² this way gives a Gaussian curve centered at μ and stretched by σ. We do not want to be limited to only one variable, and there are a few ways to get around this.

One approach assumes that the variables are independent of each other. To extend P(x) to more variables, fit μ and σ² for every variable, and take the product of the probabilities of each variable.

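• Compute μ_j and σ_j² for each individual variable j, exactly as in the single-variable case.
• To calculate the probability of a new data point x with n variables, given a vector μ and a vector σ²:
  $$P(x) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)$$
• If P(x) is less than some constant ε that we choose, we classify x as an anomaly.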
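As a rough sketch of the independent-variable version, the snippet below fits one μ and σ² per column and multiplies the per-variable densities. The two features (purchase amount and hour of day), the data, the threshold, and the names fit_features and joint_density are all hypothetical, chosen only to make the example concrete.

```python
import math

def fit_features(rows):
    """Fit mu_j and sigma2_j (1/m variance) for each column of equal-length rows."""
    m = len(rows)
    columns = list(zip(*rows))
    mus = [sum(col) / m for col in columns]
    sigma2s = [sum((v - mu) ** 2 for v in col) / m
               for col, mu in zip(columns, mus)]
    return mus, sigma2s

def joint_density(x, mus, sigma2s):
    """Product of per-variable Gaussian densities (assumes independent variables)."""
    p = 1.0
    for xj, mu, s2 in zip(x, mus, sigma2s):
        p *= math.exp(-((xj - mu) ** 2) / (2 * s2)) / math.sqrt(2 * math.pi * s2)
    return p

# Hypothetical past transactions: [purchase amount in dollars, hour of day].
history = [[20.0, 12.0], [25.0, 14.0], [30.0, 13.0], [22.0, 15.0], [28.0, 11.0]]
mus, sigma2s = fit_features(history)

epsilon = 1e-4  # illustrative threshold, not a recommended value

typical = [24.0, 13.0]   # usual amount at a usual hour
unusual = [26.0, 3.0]    # usual amount, but at 3 a.m.

print(joint_density(typical, mus, sigma2s) < epsilon)
print(joint_density(unusual, mus, sigma2s) < epsilon)
```

Because the densities are multiplied, a single variable that is far from its mean is enough to pull P(x) below ε, even when the other variables look normal.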

When we fit μ and σ² to our previous example, we get a contour graph that looks like this: