Credit-Cards customer approval and categorization

Original article was published on Becoming Human: Artificial Intelligence Magazine


Credit cards play a major role for most people who deal with money transactions. A credit card lets the holder make transactions even when there is no balance left in the account. By offering this service, banks earn income by charging the account holder. Usually, a bank charges 3% of the amount of each transaction made by an account holder. In addition, interest is charged when an account holder delays paying the amount due.

Usually, a bank focuses on the profit and the average cash flow of its business. Account holders with a good cash flow give the bank a higher income. That income increases further when they delay repaying the amount due, since the bank charges interest for the delay. In contrast, account holders who do not make many transactions and account holders who delay paying their dues for months have a negative effect on the bank’s cash flow. Account holders who make few transactions but clear their dues regularly do not harm the business, but they are also not very important, since they bring the bank relatively little income.

Identifying the nature of a customer is a great advantage to the bank when making business decisions. A bank cannot tell at first glance whether a customer is profitable. It has to study the customer’s financial background and payment history over the lifetime of the account. With a large number of parameters to consider, this study becomes impractical for a human data analyst. Machine learning can help here. By learning a customer’s payment pattern, it is possible to predict whether the customer’s next monthly payment will be a default. This gives the bank the opportunity to take business action regarding its customers.

The flowchart shows how the data was processed, from raw data to prediction model selection.

As the flowchart shows, the data was first visualized and then pre-processed, with new features created along the way. The attributes were then selected based on statistical and domain understanding. Next, different classification models were trained, and finally the best model was selected based on the prioritised features.

Feature Engineering

The problem consists of 24 parameters that describe the account holder’s background and the due and payment records over 6 months (July to December). The correlation matrix gives an idea of how the data are correlated.

A careful look at the correlation matrix shows a strong correlation among the due amounts from July to December. With respect to the required prediction, however, mostly only the payment amounts were correlated. All other parameters fall under 0.07 per cent correlation with the required prediction. Since correlation values are only a statistical summary, this shows the need for a machine learning approach to resolve the underlying complex pattern.
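A correlation check of this kind can be sketched with pandas. This is only an illustration on synthetic data: apart from NEXT_MONTH_DEFAULT, the column names (DUE_AMT_JULY, DUE_AMT_AUG, PAID_AMT_JULY, AGE) are assumptions, and the numbers are generated so that the due amounts track each other and the target depends loosely on the payment amount.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data. Every column name except NEXT_MONTH_DEFAULT
# is hypothetical; the real dataset's names may differ.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "DUE_AMT_JULY": rng.normal(50_000, 10_000, n),
    "PAID_AMT_JULY": rng.normal(5_000, 2_000, n),
    "AGE": rng.integers(21, 70, n),
})
# Consecutive due amounts are built to be strongly correlated.
df["DUE_AMT_AUG"] = df["DUE_AMT_JULY"] * 0.9 + rng.normal(0, 2_000, n)
# The target is built to depend loosely on the payment amount.
noisy_payment = df["PAID_AMT_JULY"] + rng.normal(0, 2_000, n)
df["NEXT_MONTH_DEFAULT"] = (noisy_payment < 4_000).astype(int)

# Pearson correlation matrix of all columns.
corr = df.corr()

# Rank the features by absolute correlation with the target.
target_corr = corr["NEXT_MONTH_DEFAULT"].drop("NEXT_MONTH_DEFAULT")
ranked = target_corr.abs().sort_values(ascending=False)
print(ranked)
```

Ranking by absolute value keeps strongly negative correlations near the top, which matters here because higher payments would be expected to go with fewer defaults.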

Feeding non-engineered features to machine learning algorithms may mislead them and reduce accuracy. Instead, we can model a structure that fits the problem to be solved and generate more meaningful parameters.

All the new features stand out with a higher correlation to NEXT_MONTH_DEFAULT than the features used to define them. Since the dataset contains temporal features, new features that reflect this temporality were considered. However, new features with exclusive correlations could not be found. Also, because the test data had the same dimensions as the training data, it can be assumed that temporal features may not improve the models.
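The article does not list the engineered features, so the following is only a sketch of what aggregate and temporal features over the six months could look like. The column names (DUE_AMT_*, PAID_AMT_*) and the three derived features (AVG_DUE, PAID_TO_DUE_RATIO, DUE_TREND) are all hypothetical.

```python
import pandas as pd

months = ["JULY", "AUG", "SEP", "OCT", "NOV", "DEC"]

# Two made-up customers: the first pays every due in full,
# the second pays a small fixed amount against a large due.
df = pd.DataFrame({
    **{f"DUE_AMT_{m}": [1000 + 100 * i, 5000 - 200 * i] for i, m in enumerate(months)},
    **{f"PAID_AMT_{m}": [1000 + 100 * i, 500] for i, m in enumerate(months)},
})
due_cols = [f"DUE_AMT_{m}" for m in months]
paid_cols = [f"PAID_AMT_{m}" for m in months]

# Aggregate feature: average due amount over the six months.
df["AVG_DUE"] = df[due_cols].mean(axis=1)

# Ratio feature: share of the total due that was actually paid.
# (A real dataset would need a guard against a total due of zero.)
df["PAID_TO_DUE_RATIO"] = df[paid_cols].sum(axis=1) / df[due_cols].sum(axis=1)

# Temporal feature: change in due amount from the first to the last month.
df["DUE_TREND"] = df["DUE_AMT_DEC"] - df["DUE_AMT_JULY"]

print(df[["AVG_DUE", "PAID_TO_DUE_RATIO", "DUE_TREND"]])
```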

Moreover, principal component analysis was used to create a new set of features. These did not have a strong correlation to the class, but they represented the whole data set.
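As a rough sketch of the idea, principal components can be computed with NumPy from the singular value decomposition of the standardised data. The article does not say which implementation was used (scikit-learn's PCA class would be a common choice), and the data and the number of components kept here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the numeric feature matrix (rows are customers).
# The second column is almost a multiple of the first, so the two are
# strongly correlated, much like the monthly due amounts.
base = rng.normal(size=(200, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 0] * 2 + rng.normal(scale=0.1, size=200),
    base[:, 1],
    rng.normal(size=200),
])

# Standardise each column so that no feature dominates because of its scale.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD of the standardised data: the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)

# Share of the total variance captured by each component.
explained_ratio = S**2 / np.sum(S**2)

# Project onto the first k components to get the new features.
k = 2  # assumed value, for illustration only
components = X_std @ Vt[:k].T

print(explained_ratio)
print(components.shape)
```

The resulting components are uncorrelated with each other, which is why they can summarise the whole data set without necessarily being strongly correlated with the class.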


All the preprocessing steps described below were applied to both the training and testing datasets. First, the datasets were read into Python pandas data frames. The data were then checked for NaNs, and none were found. Columns with multiple units were converted so that all values use the same unit. For example, Balance_Limit_V1 used multiple units (M and K) and its values were stored as strings. The values were converted to int64 so that they express the numeric meaning of Balance_Limit_V1. A check for outliers was then carried out. Since discarding potential outliers did not seem to improve the models’ performance, this step was undone.
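The unit conversion for Balance_Limit_V1 could look roughly like this. The exact string format is an assumption (a number followed by K or M, such as "500K" or "2.5M"), and the helper to_amount is a hypothetical name, not taken from the original code.

```python
import pandas as pd

# Multipliers for the two unit suffixes mentioned in the article.
UNIT_MULTIPLIERS = {"K": 1_000, "M": 1_000_000}

def to_amount(value: str) -> int:
    """Convert a string such as '500K' or '2.5M' to a plain integer."""
    text = value.strip().upper()
    unit = text[-1]
    if unit in UNIT_MULTIPLIERS:
        return int(round(float(text[:-1]) * UNIT_MULTIPLIERS[unit]))
    # No unit suffix: assume the value is already a plain number.
    return int(round(float(text)))

# Made-up sample values in the assumed format.
df = pd.DataFrame({"Balance_Limit_V1": ["1M", "500K", "2.5M", "100K"]})

# NaN check, as described in the text.
print("NaN count:", int(df["Balance_Limit_V1"].isna().sum()))

# Convert every value to the same unit and store it as int64.
df["Balance_Limit_V1"] = df["Balance_Limit_V1"].map(to_amount).astype("int64")
print(df)
```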

Final Model and how it was reached

Initially, 11 different classification models were trained: Logistic Regression, KNN Regression, SVC with a linear kernel, SVC with an RBF kernel, Gaussian NB, a Decision Tree Classifier, a Random Forest Classifier, an XGB Classifier, an Extra Trees Classifier, an Ada Boost Classifier and a classical neural network. The dataset was randomly divided into 3 parts: 70% training data, 15% cross-validation data and 15% test data.
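A random 70/15/15 split can be sketched with pandas alone. The function name split_70_15_15, the seed and the toy data frame are assumptions for illustration; the article does not say how the split was implemented.

```python
import numpy as np
import pandas as pd

def split_70_15_15(df: pd.DataFrame, seed: int = 42):
    """Shuffle the rows and cut them into train, validation and test parts."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n = len(shuffled)
    # Integer arithmetic avoids floating-point surprises in the sizes.
    n_train = n * 70 // 100
    n_val = n * 15 // 100
    train = shuffled.iloc[:n_train]
    val = shuffled.iloc[n_train:n_train + n_val]
    test = shuffled.iloc[n_train + n_val:]  # the remainder, about 15%
    return train, val, test

# Toy data: 1000 rows with one feature and an alternating target.
df = pd.DataFrame({
    "feature": np.arange(1000),
    "NEXT_MONTH_DEFAULT": np.arange(1000) % 2,
})
train, val, test = split_70_15_15(df)
print(len(train), len(val), len(test))
```

With an imbalanced target, a stratified split that preserves the share of defaults in each part may be preferable to the plain random split shown here.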
Thereafter, by checking the correlation matrix and applying domain understanding, the top 7 and 6 features were selected from the two feature sets, one from classical feature creation and one from PCA. The models were then optimized on the training set, and initial accuracy and error tests were run on the cross-validation set and the test set. Afterwards, the same process was repeated with different cross-validation and training sets, so that the models were trained 5 times while the hyperparameters were fine-tuned. Of the 11 algorithms, 4 outperformed the rest in training accuracy, testing accuracy and cross-validation accuracy: Logistic Regression, SVC with RBF kernel, XGBoost Classifier and Ada Boost Classifier.
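Repeating training with a different validation portion each time is close to 5-fold cross-validation, which scikit-learn provides through cross_val_score. The sketch below compares three of the four shortlisted models on synthetic data. It is not the article's code: the data comes from make_classification, the hyperparameters are library defaults, and XGBoost is left out only because it is a separate package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 7 selected features and the default label.
X, y = make_classification(
    n_samples=400, n_features=7, n_informative=4, random_state=0
)

# Scaling is part of each pipeline so it is re-fitted inside every fold.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVC (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Ada Boost": AdaBoostClassifier(random_state=0),
}

# 5-fold cross-validation: each model is trained 5 times, every time
# with a different fifth of the data held out for validation.
mean_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

Passing scoring="recall" or scoring="f1" instead of "accuracy" gives the same comparison on the other metrics.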

Finally, the algorithm was selected based on the feature distribution, recall, precision and F1 score.
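For reference, precision, recall and F1 score can be computed directly from the counts of true positives, false positives and false negatives. The labels below are made up, with 1 standing for a default.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaults caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaults missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Made-up labels for eight customers.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)  # 0.75 0.75 0.75
```

Recall matters to a bank because a missed default (a false negative) is usually more costly than a false alarm, whereas accuracy alone can look good on imbalanced data even when few defaults are caught.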

Specifications and recommendations (Business Insights)

Account holders can be categorized by their importance to the bank. The most important are those who bring the most income and cash flow to the bank. The types of account holder can be identified as follows:

1) Account holders with a higher average monthly expenditure who also pay their dues regularly and as soon as possible. Since these people make higher-value transactions, the bank gains a considerable profit.
2) Account holders with a relatively lower average expenditure, who do not bring the bank much income. These people are not very important to the bank, but they do no harm to its cash flow or profit.
3) Account holders who spend a lot and take a long time to pay the amounts due, but still manage to pay. They provide a considerable amount of interest because they take so long to pay. The issue is that this kind of income comes with a risk.
4) Account holders who take a very long time to pay their dues, which has a negative effect on the bank’s cash flow.
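The four types above could be expressed as a simple rule-based function. This is a sketch only: the article gives no numeric cut-offs, so the thresholds, the two input measures and the function name categorize are all hypothetical.

```python
# Hypothetical thresholds, chosen only for illustration;
# the article does not give any numbers.
HIGH_SPEND = 50_000          # average monthly expenditure regarded as high
LONG_DELAY_MONTHS = 2        # payment delay regarded as long
VERY_LONG_DELAY_MONTHS = 4   # payment delay regarded as very long

def categorize(avg_monthly_spend: float, avg_delay_months: float) -> int:
    """Return the account holder type (1 to 4) described in the list above."""
    if avg_delay_months >= VERY_LONG_DELAY_MONTHS:
        return 4  # very long delays hurt the bank's cash flow
    if avg_monthly_spend >= HIGH_SPEND:
        # High spenders: type 3 if they pay late, type 1 if they pay promptly.
        return 3 if avg_delay_months >= LONG_DELAY_MONTHS else 1
    # Everyone else is treated as a low spender (type 2). The article
    # does not say how low spenders with moderate delays should be handled.
    return 2

# Made-up customers: (average monthly spend, average delay in months).
customers = {"A": (80_000, 0), "B": (10_000, 0), "C": (90_000, 3), "D": (20_000, 6)}
categories = {name: categorize(spend, delay) for name, (spend, delay) in customers.items()}
print(categories)  # {'A': 1, 'B': 2, 'C': 3, 'D': 4}
```

In practice the predicted probability of a default next month could replace or complement the delay measure, so that the categories draw on the model rather than on fixed rules.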

To improve the bank’s income, the above four categories should be handled effectively. The first type of account holder should be treated well to keep them spending. Since they provide the bank with a higher income, special offers that motivate them to spend more could boost their expenditure, increase the bank’s income and also satisfy the customer.
The second type of account holder tends to spend smaller amounts through their credit card account, but there is a chance of leading them to spend more. One way to achieve this is to let them make instalment payments through their credit card account. These account holders tend to use this kind of payment method because it costs relatively little per month. It also ensures that the account holder stays with the bank for a known duration.
However, the total value of the instalment payments depends on the financial stability of the bank.
The third type of account holder should be handled carefully. The bank should try to retain them by offering deals suited to their budget limit. They should be monitored frequently to notice whether they are heading towards bankruptcy. If an account holder is found to be bankrupt, the risk management team of the company should be informed.
The fourth type of account holder mostly tends to go bankrupt. They must therefore be monitored, and their balance limit should be controlled to prevent them from making risky payments.

Finally, by following the above steps, a bank can maximise its profit, customer base and customer retention. Moreover, this will increase the company’s cash flow and increase wealth in the long run.
