Fraud Detection at Banks Using Self-Organizing Maps

Original article was published on Artificial Intelligence on Medium

Machine Learning (ML) is an application of artificial intelligence (AI) that gives systems the ability to learn and improve from experience automatically, without being explicitly programmed.

Deep Learning (DL), in turn, is a subset of machine learning that uses algorithms modelled after the human brain, called Artificial Neural Networks (ANNs).

Using Deep Learning, I was able to create a Self-Organizing Map (SOM), a type of ANN that uses unsupervised learning. The SOM produces a ranked list of how likely each customer is to have committed fraud in their credit card application at a bank.

How does it work?

First, you would import the dataset of all credit card applications.

Credit Card Statements

As we can see, there are 14 attributes labelled A1-A14. The first column holds the customer ID, which identifies a particular customer, and the last column holds the class.
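
The column layout described above can be sketched as follows. This is only an illustration: the rows are made up, the column names `CustomerID` and `Class` are assumptions, and only three attributes are shown instead of fourteen to keep the example short.

```python
import csv
import io

# Made-up rows in the layout described above (ID first, class last),
# shortened to 3 attributes for brevity. Column names are hypothetical.
toy_csv = """CustomerID,A1,A2,A3,Class
1001,1,22.08,11.46,0
1002,0,22.67,7.0,1
"""

def split_row(row):
    """Split one CSV row into (customer_id, attributes, class_label)."""
    customer_id = int(row[0])                   # first column: customer ID
    attributes = [float(v) for v in row[1:-1]]  # middle columns: attributes
    class_label = int(row[-1])                  # last column: class
    return customer_id, attributes, class_label

reader = csv.reader(io.StringIO(toy_csv))
header = next(reader)  # skip the header line
records = [split_row(row) for row in reader]
print(records[0])
```

Only the attribute columns would be fed to the map; the customer ID and the class are kept aside so that customers can be looked up later.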

If the class is equal to 0, the customer did not get a credit card from the bank; if the class is equal to 1, the customer's credit card application was approved by the bank.

The entire dataset is mapped onto a 15 x 15 2D map, where each neuron (node) holds many customers whose attributes are correlated. For the SOM to learn, we have to find, for each row in the dataset, the node that is closest to it in terms of distance.
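
Finding the closest node for a row can be sketched as below. This is a minimal illustration, assuming Euclidean distance and a tiny 2 x 2 map with two attributes; the function names and the toy values are hypothetical, not taken from the original implementation.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_matching_unit(weights, row):
    """Return the (i, j) grid coordinates of the node closest to row.

    weights[i][j] is the weight vector of the node at grid position (i, j).
    """
    best, best_dist = None, float("inf")
    for i, grid_row in enumerate(weights):
        for j, w in enumerate(grid_row):
            d = euclidean(w, row)
            if d < best_dist:
                best, best_dist = (i, j), d
    return best

# Toy 2 x 2 map with 2 attributes per customer (illustrative values only).
toy_weights = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
toy_customer = [0.9, 0.2]
print(best_matching_unit(toy_weights, toy_customer))  # (1, 0)
```

During training, the weights of the winning node and of its neighbours would then be nudged toward the row; that update step is left out here.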

Assigning every customer to its closest node is what groups the customers. As we can see at (0,0), two customers have been linked to the same node because they share similar attributes. Similarly, the coordinate (0,13) shows that 11 out of 225 people have similar qualities, which is why they were grouped in that node.
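
The grouping step can be sketched as follows, assuming each customer is assigned to the node with the smallest Euclidean distance. The map, the customer IDs, and the helper names are made up for illustration.

```python
import math
from collections import defaultdict

def winner(weights, row):
    """Grid coordinates of the node whose weight vector is closest to row."""
    coords = ((i, j) for i in range(len(weights)) for j in range(len(weights[0])))
    return min(coords, key=lambda ij: math.dist(weights[ij[0]][ij[1]], row))

def group_by_node(weights, customers):
    """Map each node coordinate to the list of customer IDs assigned to it."""
    groups = defaultdict(list)
    for customer_id, attributes in customers:
        groups[winner(weights, attributes)].append(customer_id)
    return dict(groups)

# Toy 2 x 2 map and three made-up customers (ID, attributes).
toy_weights = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
toy_customers = [
    (101, [0.1, 0.1]),
    (102, [0.2, 0.0]),
    (103, [0.9, 0.8]),
]
groups = group_by_node(toy_weights, toy_customers)
for node, ids in sorted(groups.items()):
    print(node, len(ids), ids)
```

Here customers 101 and 102 end up in node (0, 0) and customer 103 in node (1, 1), so the size of a node is simply the number of customers mapped to it.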

Once we have figured out our groupings, we can proceed to outlier detection. We are trying to detect customers who stand out from the crowd, meaning that their result on the map should be different from the rest. To do this, we need the mean interneuron distance (MID), which measures the average distance between a neuron and the neurons in its neighbourhood.
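
One way to compute a mean interneuron distance is sketched below. It assumes a rectangular grid with at least two nodes, Euclidean distance, and a neighbourhood made of the four directly adjacent nodes; other neighbourhood definitions or a final normalisation step are equally possible, and the original implementation may differ.

```python
import math

def mean_interneuron_distance(weights):
    """Average distance from each node's weights to those of its grid neighbours.

    Assumes a 4-connected neighbourhood (up, down, left, right).
    """
    rows, cols = len(weights), len(weights[0])
    mid = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            dists = [
                math.dist(weights[i][j], weights[ni][nj])
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                if 0 <= ni < rows and 0 <= nj < cols
            ]
            mid[i][j] = sum(dists) / len(dists)
    return mid

# Toy 3 x 3 map with one attribute; the centre node is far from its neighbours.
toy_weights = [
    [[0.0], [0.1], [0.0]],
    [[0.1], [1.0], [0.1]],
    [[0.0], [0.1], [0.0]],
]
mid = mean_interneuron_distance(toy_weights)
coords = [(i, j) for i in range(3) for j in range(3)]
outlier = max(coords, key=lambda ij: mid[ij[0]][ij[1]])
print(outlier)  # (1, 1)
```

Nodes with a high MID sit far from their neighbours, so the customers mapped to them are the candidates for outliers.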

Nodes detecting outliers

The coordinates (1,6) and (6,6) mean that the customers connected to these neurons are more likely to have committed fraud than the rest. We would then check the dataset to see whether these customers were issued a credit card or not.
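
This final check can be sketched as a simple lookup. The data below is made up: `toy_groups` stands in for the node-to-customer mapping produced by the SOM and `toy_classes` for the class column of the dataset, where 1 means the application was approved.

```python
def flagged_customers(groups, classes, outlier_nodes):
    """Return (customer_id, class) pairs for customers mapped to the outlier nodes."""
    result = []
    for node in outlier_nodes:
        for customer_id in groups.get(node, []):
            result.append((customer_id, classes[customer_id]))
    return result

# Hypothetical toy data: node coordinates -> customer IDs, customer ID -> class.
toy_groups = {(1, 6): [101, 102], (6, 6): [103], (0, 0): [104]}
toy_classes = {101: 1, 102: 0, 103: 1, 104: 1}

suspects = flagged_customers(toy_groups, toy_classes, [(1, 6), (6, 6)])
# Suspected customers whose application was approved (class 1).
approved_suspects = [cid for cid, cls in suspects if cls == 1]
print(approved_suspects)  # [101, 103]
```

The suspected customers with class 1 are the ones who were actually issued a card, which makes them the natural first candidates for a manual review.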