Original article can be found here (source): Artificial Intelligence on Medium
Primitive Fraud detectors
Before the introduction of online applications, customers would visit the bank or credit card company to complete the paperwork for a credit card application. Usually, this paperwork was done in front of an agent, who could identify fraud by reading the applicant's body language as they filled in the application. For instance, if an applicant changed their answers multiple times or appeared to be lying when answering a question, the agent would likely subject that candidate to further scrutiny. These physical agents were the earliest form of fraud detectors. They were no longer needed once applications moved online, which left the customer's behavior unnoticed: if you can't see applicants, you obviously can't read their body language.
So, What to do now?
To overcome this, we can set up a virtual agent that analyzes behavior based on a customer's actions.
Virtual Agent — Behavioral Intelligence
A customer interacts with an online application through a computer or mobile device connected to the internet. Every action the customer takes can be recorded: the inputs given through the keyboard or mouse, along with the time taken for each.
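For illustration, the recorded actions could be logged as (key, timestamp) pairs and converted into a "micro time series" of inter-key time gaps; the event log and function name below are hypothetical, a minimal sketch of the idea rather than the article's actual pipeline.

```python
# Hypothetical sketch: turn raw key events (key, timestamp in ms) captured in
# one form field into a micro time series of inter-key time gaps.

def micro_time_series(events):
    """Return the time gaps (ms) between consecutive key events."""
    times = [t for _, t in events]
    return [later - earlier for earlier, later in zip(times, times[1:])]

# Invented event log for an email field: type "joe", delete a letter, retype it
events = [("j", 0), ("o", 120), ("e", 310), ("Backspace", 900), ("e", 1050)]
print(micro_time_series(events))  # [120, 190, 590, 150]
```

Each form field yields one such series per customer, and the lengths naturally differ from customer to customer.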
Converting a Speech Recognition Technique to Identify Behavior
To identify a customer's behavior pattern, each customer's micro time series has to be compared with every other customer's micro time series. Comparing two data points is usually based on the distance between them (Euclidean/Manhattan): the closer the distance, the more similar they are. The major problem is that each customer's micro time series differs in length; a customer can use any combination of keys (input, Backspace, Cut, Copy, Paste), with varying time between them, to complete a form field, and that combination may not match another customer's.
A similar problem arises in speech recognition. Suppose a person speaks the same sentence twice, first faster and the second time slower. Traditional Euclidean matching compares the points of the two recordings at the same instants in time; because the two utterances are out of sync, the Euclidean distance becomes high (indicating high dissimilarity). To solve this, Dynamic Time Warping looks back in time and matches points across the two recordings. This is how it works:
1. Each point of speech one is compared with every point of speech two by calculating a vector difference metric similar to Euclidean distance. Similarly, each key in one customer's micro time series is compared with another customer's micro time series.
2. For each point of speech one, the least distance to the points of speech two is taken. In the same way, the least distance is calculated between the micro time series of two different customers.
3. This eventually warps a path based on the least distances: the more linear this path is, the more similar the speeches are. DTW always warps a path, irrespective of the lengths of the two time series.
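The three steps above can be sketched with the classic dynamic-programming formulation of DTW; this is a minimal 1-D version using an absolute-difference cost, not the article's production code.

```python
# Minimal sketch of DTW distance between two variable-length 1-D series,
# e.g. per-key timing series from two customers.

def dtw_distance(a, b):
    """Dynamic-programming DTW with absolute-difference point cost."""
    n, m = len(a), len(b)
    INF = float("inf")
    # dp[i][j] = cost of the best warping path aligning a[:i] with b[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # step in series a
                                  dp[i][j - 1],      # step in series b
                                  dp[i - 1][j - 1])  # step in both
    return dp[n][m]

# Same shape at different speeds still matches perfectly:
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

Note how the two series have different lengths, yet a warping path (and hence a distance) is always produced, which is exactly why DTW suits micro time series of unequal length.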
Dynamic Time Warping
Dynamic Time Warping (DTW) is an A.I. technique that has been very useful for normalizing and comparing series of unequal length. Here, likewise, we have key-input series of unequal length and varying speed. The micro time series are grouped by similarity for each form field (email, phone number, last name, etc.). Excluding outliers, the remaining groups for each form field become the regular filling pattern for that field. For example, DTW distances are calculated among the various micro time series within the phone number field; a micro time series with a high distance to the others becomes an outlier, while the rest form the regular filling pattern for the phone number field. Likewise, DTW is applied to the time series between form fields on each page, between form fields across all pages, and between pages, to generate a pattern of genuine filling. Any application whose time series pattern doesn't fall under this common pattern has a higher chance of belonging to a risky or fraudulent user.
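The outlier step could be sketched as follows, flagging a series whose median DTW distance to all the others exceeds a threshold; the median criterion, the threshold, and the phone-number series below are illustrative assumptions, not the article's exact rule.

```python
# Sketch: flag micro time series that sit far (by DTW) from the rest of the
# series recorded for the same form field.

from statistics import median

def dtw(a, b):
    """Dynamic-programming DTW with absolute-difference cost."""
    INF = float("inf")
    dp = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    dp[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[len(a)][len(b)]

def flag_outliers(series_list, threshold):
    """True for each series whose median distance to the others > threshold."""
    flags = []
    for i, s in enumerate(series_list):
        dists = [dtw(s, t) for j, t in enumerate(series_list) if j != i]
        flags.append(median(dists) > threshold)
    return flags

# Three regular phone-number filling patterns and one anomalous one (invented)
phone_series = [[100, 120, 110], [105, 118, 112, 108], [98, 125, 115],
                [900, 40, 850, 30]]
print(flag_outliers(phone_series, threshold=200))  # [False, False, False, True]
```

The non-flagged series then define the "regular filling pattern" for that field.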
Segmenting users with similar typing patterns
Users can be clustered on Euclidean distance using k-means to identify groups of users with similar typing patterns. The outliers have a higher chance of being risky or fraudulent applicants.
The plots below are an example of users filling in the email form field. Using the micro time series, two new features were generated: the percentage of total time used for each key input in the email field, cumulatively, and the aggression rate of users, which is the total number of input keys at a given time for each user.
Type 1 Users
Total users in this cluster: 33/210 (15%)
Active time: 60% to 80%
Idle time: 40% to 20%
Total Time in email field: 7 sec to 9 sec
First letter typed within 2 seconds after the email field is clicked
Typing speed: 3 keys/sec to 5 keys/sec
These users are slow and less aggressive in typing
Type 2 Users
Total users in this cluster: 156/210 (74%)
Active time: 90% to 100%
Idle time: 10% to 0%
Total Time in email field: less than 6 sec
First letter typed within 4 seconds after the email field is clicked
Typing speed: more than 5 keys/sec
These users are fast and more aggressive in typing
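The two segments above could be reproduced with a small k-means sketch over the two derived features (here, active-time fraction and typing speed in keys/sec); the per-user feature values below are invented for illustration, not the article's data.

```python
# Sketch: k-means (Euclidean distance) over two typing-behavior features.

import numpy as np
from sklearn.cluster import KMeans

# rows: [active_time_fraction, keys_per_sec] for six hypothetical users
X = np.array([
    [0.70, 4.0], [0.65, 3.5], [0.75, 4.5],   # slower, less aggressive typists
    [0.95, 6.0], [0.98, 7.0], [0.92, 5.5],   # faster, more aggressive typists
])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # two clusters: first three users together, last three together
```

In practice the features should be scaled to comparable ranges before clustering, since k-means is sensitive to feature magnitude.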
More types of users were segmented to identify their typing patterns. Type 2 users might have used autofill or more backspaces while filling in the email field. They can be further classified by error ratio (the ratio of backspaces to total keys): autofill would have hardly any error ratio and few total keys, while a high error ratio combined with a high total key count indicates that the user tried multiple email IDs, which could be one indicator of a risky customer. This process is repeated for the other form fields to shape the final dataset for the prediction model.
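The error-ratio heuristic just described could look like the sketch below; the thresholds and labels are illustrative assumptions, not values from the article.

```python
# Sketch: classify a form-field fill from its key sequence using the error
# ratio (backspaces / total keys) and the total key count.

def classify_fill(keys):
    total = len(keys)
    backspaces = sum(1 for k in keys if k == "Backspace")
    error_ratio = backspaces / total if total else 0.0
    if total <= 2 and error_ratio == 0.0:
        return "likely autofill"           # very few keys, no corrections
    if error_ratio > 0.3 and total > 30:
        return "possible multiple emails"  # risk indicator
    return "regular typing"

print(classify_fill(["Paste"]))                        # likely autofill
print(classify_fill(["a"] * 25 + ["Backspace"] * 15))  # possible multiple emails
```

The same classification can be run per form field, feeding into the final dataset.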
Adding features to the Prediction Model
The aggression rate, percentage utilization of the overall time taken for each input, error ratio, total keys, and the DTW distances between users can be added as new features, alongside other behavioral features such as the last hovered field, the time taken to submit the application, the mean time taken for each field, the number of tries for each field, and the total number of sessions taken to submit the application. Together with the labels, these form the final dataset for building the prediction model.
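Assembling those features into one row per applicant might look like the sketch below; the column names and values are invented for illustration, with one column per feature listed above plus the label.

```python
# Sketch: one feature row per applicant for the fraud-prediction dataset.

import pandas as pd

rows = [
    {"aggression_rate": 6.2, "time_pct_per_input": 0.18, "error_ratio": 0.05,
     "total_keys": 42, "mean_dtw_dist": 35.0, "last_hover_field": "submit",
     "time_to_submit_sec": 95, "mean_time_per_field_sec": 8.1,
     "tries_per_field": 1.2, "sessions": 1, "label": 0},   # genuine applicant
    {"aggression_rate": 9.8, "time_pct_per_input": 0.02, "error_ratio": 0.45,
     "total_keys": 120, "mean_dtw_dist": 310.0, "last_hover_field": "email",
     "time_to_submit_sec": 30, "mean_time_per_field_sec": 2.0,
     "tries_per_field": 3.5, "sessions": 4, "label": 1},   # risky applicant
]
df = pd.DataFrame(rows)
X, y = df.drop(columns=["label"]), df["label"]
print(X.shape, y.tolist())  # (2, 10) [0, 1]
```

Any standard classifier can then be trained on `X` and `y`; categorical features such as the last hovered field would need encoding first.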
How does this help the company?
Identifying fraudulent users as soon as they submit an application brings huge cost savings, considering present fraud trends and the effort otherwise required to track them down.