Applying Anomaly Detection with Autoencoders to Fraud Detection

Original article was published on Deep Learning on Medium

I recently read an article called Anomaly Detection with Autoencoders. It was based on generated data, so it seemed like a good idea to apply the approach to a real-world fraud detection task and validate it.

I decided to use the Credit Card Fraud Dataset from Kaggle*:

The dataset contains transactions made by credit cards in September 2013 by European cardholders.
This dataset presents transactions that occurred in two days, where we have 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

Such a heavily unbalanced dataset is a good candidate for treating frauds as anomalies.
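To make the imbalance concrete, here is a small worked calculation using only the two counts quoted above. It also shows why plain accuracy is a poor yardstick here: a trivial model that labels every transaction as normal already looks nearly perfect.

```python
# Counts quoted in the dataset description above.
n_total = 284_807
n_fraud = 492

# Share of transactions that are fraudulent.
fraud_rate = n_fraud / n_total

# Accuracy of a trivial model that labels every transaction as normal.
baseline_accuracy = (n_total - n_fraud) / n_total

print(f"fraud rate: {fraud_rate:.4%}")
print(f"always-normal accuracy: {baseline_accuracy:.4%}")
```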

Let’s start with data discovery:

We are going to plot the data after reducing its dimensions from 30 to 3 with Principal Component Analysis (PCA). The data has 31 columns: the first column is the time index, followed by 28 anonymized features, 1 transaction amount, and 1 class label. I will ignore the time index since it is not stationary.
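The dimensionality reduction step can be sketched as follows. This is a minimal NumPy version of PCA via the singular value decomposition, not necessarily the code used for the original plot; the function name `pca_project` and the random stand-in data are assumptions for illustration.

```python
import numpy as np

def pca_project(features, n_components=3):
    """Project the rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Synthetic stand-in for the 30 feature columns (hypothetical data).
rng = np.random.default_rng(0)
demo_features = rng.normal(size=(500, 30))
projected = pca_project(demo_features)
print(projected.shape)
```

The three resulting columns can then be passed to any 3D scatter plot, colored by class.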

Your first reaction could be that there are two clusters and that this would be an easy task, but the fraud data are the yellow points! Only three yellow points are visible in the large cluster. So let's subsample the normal data while keeping all of the fraud data.
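A minimal sketch of that subsampling, assuming the features and labels are NumPy arrays and that fraud is labeled 1. The helper name `subsample_normal` and the demo sizes are hypothetical.

```python
import numpy as np

def subsample_normal(features, labels, n_normal, seed=0):
    """Keep every fraud row and a random subset of the normal rows."""
    rng = np.random.default_rng(seed)
    fraud_idx = np.flatnonzero(labels == 1)
    normal_idx = np.flatnonzero(labels == 0)
    n_normal = min(n_normal, normal_idx.size)
    kept_normal = rng.choice(normal_idx, size=n_normal, replace=False)
    # Sort so the kept rows stay in their original order.
    keep = np.sort(np.concatenate([fraud_idx, kept_normal]))
    return features[keep], labels[keep]

# Hypothetical demo: 10,000 rows, 10 of them fraud.
rng = np.random.default_rng(1)
demo_features = rng.normal(size=(10_000, 3))
demo_labels = np.zeros(10_000, dtype=int)
demo_labels[rng.choice(10_000, size=10, replace=False)] = 1

sub_features, sub_labels = subsample_normal(demo_features, demo_labels, n_normal=500)
print(sub_features.shape, int(sub_labels.sum()))
```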

Now it is visible that normal transactions are clustered in a disk, while fraudulent transactions are more widely distributed.

We are going to build an autoencoder with a 3-layer encoder and a 2-layer decoder:

The autoencoder encodes our data into a lower-dimensional subspace and then decodes the features back, with the data normalized in the process. Our expectation is that the autoencoder will learn the features of normal transactions, so that for a normal transaction the output will be similar to the input. For anomalies, the input and the output will differ significantly, since the model has not seen that kind of data.
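To illustrate the idea, here is a rough sketch of such a network in plain NumPy rather than in a deep learning framework. The layer sizes (29, 20, 10, 5, 10, 29), the tanh and sigmoid activations, the learning rate, and the class name `TinyAutoencoder` are all assumptions; the original model's settings are not given in this post. The sigmoid output assumes inputs scaled to the range 0 to 1.

```python
import numpy as np

class TinyAutoencoder:
    """Fully connected autoencoder: 3 encoder layers, then 2 decoder layers."""

    def __init__(self, sizes=(29, 20, 10, 5, 10, 29), seed=0):
        rng = np.random.default_rng(seed)
        self.weights = [rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n))
                        for m, n in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.zeros(n) for n in sizes[1:]]

    def forward(self, x):
        """Return the activations of every layer, starting with the input."""
        activations = [x]
        last = len(self.weights) - 1
        for i, (w, b) in enumerate(zip(self.weights, self.biases)):
            z = activations[-1] @ w + b
            # tanh in hidden layers, sigmoid on the output layer.
            out = 1.0 / (1.0 + np.exp(-z)) if i == last else np.tanh(z)
            activations.append(out)
        return activations

    def reconstruct(self, x):
        return self.forward(x)[-1]

    def train_step(self, x, lr=0.5):
        """One full-batch gradient step on the MSE; returns the loss before it."""
        acts = self.forward(x)
        out = acts[-1]
        loss = np.mean((out - x) ** 2)
        # Gradient of the mean squared error through the output sigmoid.
        delta = 2.0 * (out - x) / x.size * out * (1.0 - out)
        for i in reversed(range(len(self.weights))):
            grad_w = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:
                # Backpropagate through the tanh that produced acts[i].
                delta = (delta @ self.weights[i].T) * (1.0 - acts[i] ** 2)
            self.weights[i] -= lr * grad_w
            self.biases[i] -= lr * grad_b
        return loss

# Synthetic data in (0, 1) with low-dimensional structure (hypothetical).
rng = np.random.default_rng(1)
latent = rng.normal(size=(256, 3))
x = 1.0 / (1.0 + np.exp(-(latent @ rng.normal(size=(3, 29)))))

model = TinyAutoencoder()
losses = [model.train_step(x) for _ in range(300)]
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In practice a framework with mini-batches and an adaptive optimizer would train far faster; the point here is only the shape of the network and the reconstruction objective.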

The good part of this approach is that it allows us to use unsupervised learning, and we usually have plenty of normal transaction data. Data labeling is usually expensive and hard, and in some cases labels are unavailable. Manual labeling also involves human judgment, which can introduce human bias into the model. Note that during training we use only the features of normal transactions, not the labels.
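One way to prepare such a training set is sketched below. It assumes NumPy arrays, normal transactions labeled 0, a 20% validation split, and min-max scaling; none of these choices is confirmed by the post, and `make_training_split` is a hypothetical helper. Here the labels are used only to filter out known frauds and are never passed to the model.

```python
import numpy as np

def make_training_split(features, labels, val_fraction=0.2, seed=0):
    """Return min-max scaled (train, validation) arrays of normal rows only."""
    normal = features[labels == 0]  # labels are used only as a filter
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(normal))
    n_val = int(len(normal) * val_fraction)
    val, train = normal[order[:n_val]], normal[order[n_val:]]
    # Fit the scaling on the training rows only, then reuse it for validation.
    lo, hi = train.min(axis=0), train.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (train - lo) / scale, (val - lo) / scale

# Hypothetical demo: 1,000 rows with 5 features, 20 of them fraud.
rng = np.random.default_rng(2)
demo_features = rng.normal(size=(1000, 5))
demo_labels = np.zeros(1000, dtype=int)
demo_labels[:20] = 1

train_x, val_x = make_training_split(demo_features, demo_labels)
print(train_x.shape, val_x.shape)
```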

Let’s load data and train our autoencoder:

My model settles at a validation loss of around 8.5641e-04. (It can go as low as 5.4856e-04.)

Using this model, we will calculate the mean squared error (MSE) between input and output for each normal transaction, and set the threshold to the 95th percentile of those MSE values.
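The threshold computation can be sketched like this. The function names are hypothetical, and the "reconstructions" in the demo are just noisy copies of random inputs standing in for real autoencoder output.

```python
import numpy as np

def reconstruction_mse(inputs, reconstructions):
    """Mean squared error per row between inputs and their reconstructions."""
    return np.mean((inputs - reconstructions) ** 2, axis=1)

def percentile_cutoff(mse_normal, percentile=95):
    """Threshold below which `percentile` percent of the normal errors fall."""
    return np.percentile(mse_normal, percentile)

# Hypothetical demo data: noisy copies play the role of model output.
rng = np.random.default_rng(3)
inputs = rng.uniform(size=(1000, 29))
reconstructions = inputs + rng.normal(scale=0.03, size=inputs.shape)

mse_normal = reconstruction_mse(inputs, reconstructions)
cut_off = percentile_cutoff(mse_normal)
share_above = np.mean(mse_normal > cut_off)
print(f"cut_off={cut_off:.5f}, share of normal rows above it={share_above:.3f}")
```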

Our threshold (cut_off) comes out at 0.002. We will consider a transaction an anomaly if its mean squared error is higher than 0.002. Let's select 100 fraudulent samples and 100 normal samples and plot them against the threshold:

It is visible that most of the fraud transactions have higher mean squared errors than normal transactions. It looks very promising.
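To put numbers on that visual impression, one could compute how many frauds land above the threshold and how many normal transactions are flagged by mistake. The sketch below uses a handful of made-up MSE values purely for illustration (they are not results from the model), and the function names are hypothetical.

```python
import numpy as np

def flag_anomalies(mse, cut_off):
    """A transaction is flagged when its reconstruction error exceeds cut_off."""
    return mse > cut_off

def detection_summary(mse, labels, cut_off):
    """Recall on frauds, false positive rate on normals, and precision."""
    flagged = flag_anomalies(mse, cut_off)
    fraud = labels == 1
    return {
        "recall": float(flagged[fraud].mean()),
        "false_positive_rate": float(flagged[~fraud].mean()),
        "precision": float(fraud[flagged].mean()) if flagged.any() else 0.0,
    }

# Made-up example values: four normal rows followed by three fraud rows.
demo_mse = np.array([0.0005, 0.001, 0.003, 0.0015, 0.01, 0.0018, 0.004])
demo_labels = np.array([0, 0, 0, 0, 1, 1, 1])

summary = detection_summary(demo_mse, demo_labels, cut_off=0.002)
print(summary)
```

In this toy example two of the three frauds exceed the threshold, and one of the four normal rows is flagged by mistake.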

By construction, we gave up on 5% of the normal transactions, which fall above the threshold. There are still fraud transactions that fall below the threshold. This can potentially be improved with better feature extraction, since some fraud data seems to have features very similar to those of normal transactions. Some valuable features for credit card fraud detection are the number of transactions in the previous hour/day/week, and whether the transaction was initiated in a country other than the card's issuing country.
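As an illustration of the first kind of feature, here is a small sketch that counts each card's transactions in the preceding hour. It assumes a time-ordered stream of (card_id, timestamp in seconds) pairs; the Kaggle dataset does not expose a card identifier, so the inputs and the function name are hypothetical.

```python
from collections import defaultdict, deque

def count_recent_transactions(transactions, window_seconds=3600):
    """For each (card_id, timestamp) pair, in time order, count that card's
    earlier transactions within the preceding window."""
    recent = defaultdict(deque)
    counts = []
    for card_id, timestamp in transactions:
        history = recent[card_id]
        # Drop transactions that have fallen out of the window.
        while history and timestamp - history[0] > window_seconds:
            history.popleft()
        counts.append(len(history))
        history.append(timestamp)
    return counts

# Hypothetical stream of transactions, sorted by time.
demo_stream = [("a", 0), ("a", 600), ("b", 700), ("a", 3000), ("a", 4000), ("b", 9000)]
recent_counts = count_recent_transactions(demo_stream)
print(recent_counts)  # [0, 1, 0, 2, 2, 0]
```

The same function covers the day and week variants by changing `window_seconds`.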

Future tasks:

  • Use a better model by using hyperparameter optimization.
  • Analyze the data to understand the features.
  • Compare these results to a common approach like SVM or K-means clustering.

Full code for this post can be found on Github:

*Acknowledgements For Credit Fraud Dataset

The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group of ULB (Université Libre de Bruxelles) on big data mining and fraud detection.
More details on current and past projects on related topics are available on the page of the DefeatFraud project.

Cited works:

Dal Pozzolo, Andrea; Caelen, Olivier; Johnson, Reid A.; Bontempi, Gianluca. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015.

Dal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Aël; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective. Expert Systems with Applications, 41(10), 4915–4928, 2014, Pergamon.

Dal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy. IEEE Transactions on Neural Networks and Learning Systems, 29(8), 3784–3797, 2018, IEEE.

Dal Pozzolo, Andrea. Adaptive Machine Learning for Credit Card Fraud Detection. PhD thesis, ULB MLG (supervised by G. Bontempi).

Carcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-Aël; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark. Information Fusion, 41, 182–194, 2018, Elsevier.

Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization. International Journal of Data Science and Analytics, 5(4), 285–300, 2018, Springer International Publishing.

Lebichot, Bertrand; Le Borgne, Yann-Aël; He, Liyun; Oblé, Frederic; Bontempi, Gianluca. Deep-Learning Domain Adaptation Techniques for Credit Cards Fraud Detection. INNSBDDL 2019: Recent Advances in Big Data and Deep Learning, pp. 78–88, 2019.

Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Oblé, Frederic; Bontempi, Gianluca. Combining Unsupervised and Supervised Learning in Credit Card Fraud Detection. Information Sciences, 2019.