ET-USB: Transformer-Based Users’ Sequential Behavior Modeling

Source: Deep Learning on Medium

Background on the Inbound Call Center

In the financial industry, the large number of services and customers makes it necessary to run an inbound call center that handles phone calls from current and potential customers. At Cathay United Bank (a subsidiary of Taiwan’s largest financial holding company, Cathay Financial Holdings), we receive nearly 1 million customer calls every month. However, because of various kinds of miscommunication between call center agents and customers, the average call duration can exceed 3 minutes, which leads to high communication costs. If we could predict the questions that customers are likely to ask, agents would only need to confirm the question briefly, so call duration would fall and customer satisfaction would rise.

We found that customer calls were typically triggered by difficulties encountered with various channels, products, or services. In other words, customer interactions with each channel, product, or service are crucial to analyzing inbound call problems. Therefore, we use the HIPPO framework, an event-driven information integration framework for aiding workflow, to collect user data from various subsidiary databases. We integrate those behaviors into behavior sequences, which serve as the main features for our subsequent model, ET-USB.
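As a minimal sketch of what integrating events into behavior sequences could look like, the snippet below groups event records by user and orders them by time. The record layout, the `channel:behavior` token format, and the `max_len` cutoff are all hypothetical; this is not the HIPPO framework's API, only an illustration of the idea.

```python
from collections import defaultdict

# Hypothetical event records: (user_id, timestamp, channel, behavior).
events = [
    ("u1", 3, "store", "pay_bill_attempt"),
    ("u1", 1, "web", "buy_phone"),
    ("u2", 2, "app", "check_balance"),
    ("u1", 2, "app", "buy_plane_ticket"),
]


def build_behavior_sequences(events, max_len=50):
    """Group events by user, sort them by time, keep the most recent max_len."""
    by_user = defaultdict(list)
    for user_id, timestamp, channel, behavior in events:
        # Assumed token format: one token per (channel, behavior) pair.
        by_user[user_id].append((timestamp, f"{channel}:{behavior}"))

    sequences = {}
    for user_id, items in by_user.items():
        items.sort(key=lambda pair: pair[0])  # chronological order
        sequences[user_id] = [token for _, token in items][-max_len:]
    return sequences


sequences = build_behavior_sequences(events)
```

Each user ends up with one chronologically ordered list of behavior tokens, which is the shape of input a sequence model expects.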

Modeling Framework


We propose a novel neural network, ET-USB, which uses the encoder of the Transformer to learn highly effective representations of users’ sequential behaviors. The model relies on the self-attention mechanism to capture the most important dependencies among behavior elements in a sequence.

Fig1. Architecture of ET-USB.

We divide our behavior features into two categories: nonsequential and sequential features. In the following section, we describe information extraction from sequential behavior features; nonsequential features are used only for concatenation with information in the final layer.

We embed all behaviors in the behavior sequences into fixed-size low-dimensional vectors. Because self-attention by itself is insensitive to order, the position of each behavior must be represented explicitly; otherwise, identical behaviors at different positions would produce the same output. The position of a behavior can carry different meanings and latent information in different behavior sequences. We therefore add the position as an input feature of each item in the input layer, before the item is projected into a low-dimensional vector.
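The effect of adding position can be sketched as follows. Projecting a one-hot behavior concatenated with a one-hot position through a single linear layer is equivalent to summing one row from a behavior table and one row from a position table, so the sketch uses that form. The table sizes and random initial values are assumptions for illustration; the actual input layer of ET-USB may differ.

```python
import random

random.seed(0)

NUM_BEHAVIORS = 5  # hypothetical vocabulary size
MAX_POSITIONS = 4  # hypothetical maximum sequence length
EMBED_DIM = 3      # hypothetical embedding size


def random_table(rows, cols):
    """Stand-in for a learned embedding table."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


behavior_table = random_table(NUM_BEHAVIORS, EMBED_DIM)
position_table = random_table(MAX_POSITIONS, EMBED_DIM)


def embed_sequence(behavior_ids):
    """Return one vector per behavior: behavior row plus position row."""
    return [
        [b + p for b, p in zip(behavior_table[behavior_id], position_table[position])]
        for position, behavior_id in enumerate(behavior_ids)
    ]


# Behavior 2 appears at positions 0 and 2 and receives two different vectors.
vectors = embed_sequence([2, 0, 2])
```

Without the position term, the first and third vectors would be identical, and the encoder could not tell the two occurrences apart.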

Then, we use the encoder architecture of a multilayer Transformer. The encoder consists of stacked encoder layers, each containing two sublayers: a multihead self-attention layer and a feed-forward network. We stack several such layers because deeper stacks may help the model learn complex behavior transition patterns. Finally, we use two fully connected layers to learn the interactions among nonsequential features and concatenate their output with the output of the Transformer layers. Together, these form dense representation vectors.
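To make the self-attention sublayer concrete, here is a minimal single-head sketch of scaled dot-product self-attention in plain Python. It omits multiple heads, residual connections, layer normalization, and the feed-forward network, and the identity projection matrices are placeholders for learned weights, so it illustrates the mechanism rather than the ET-USB implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply matrix a (n x d) by matrix b (d x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def self_attention(x, w_q, w_k, w_v):
    """Return (output, weights) for one attention head."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))  # square root of the key dimension
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights


# Placeholder weights: identity projections instead of learned matrices.
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three behaviors, two dimensions each
output, weights = self_attention(x, identity, identity, identity)
```

Row i of `weights` says how strongly behavior i attends to every behavior in the sequence; a multihead layer runs several such heads with different projections in parallel.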

Objective Function
To predict the category of question a customer will ask, we model the task as a multiclass classification problem; thus, we use the softmax function as the output unit. The objective function of the model is the categorical cross-entropy, as follows:

Fig2. Objective function of multiclass classification
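For reference, the standard form of softmax with categorical cross-entropy can be written as below. The symbols are assumptions, since the figure's notation is not reproduced here: N is the number of training calls, C the number of question categories, z_{i,c} the model's score for category c on call i, and y_{i,c} equals 1 if call i belongs to category c and 0 otherwise.

```latex
\hat{y}_{i,c} = \frac{\exp(z_{i,c})}{\sum_{k=1}^{C} \exp(z_{i,k})},
\qquad
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log \hat{y}_{i,c}
```

Because each call has exactly one true category, the inner sum reduces to the negative log-probability that the model assigns to that category.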


To form the training and prediction data sets, we cooperated with business units to decide the business logic of relevant data sets; we excluded meaningless information to denoise the data. Furthermore, because certain call questions tend to occur in specific months, the training data set consisted of inbound calls from the last four months and months with specific call questions. Most importantly, we predicted the categories of only those call questions related to credit cards, which accounted for a total of 24 targets.

Fig3. Comparison of training and prediction data sets. We adopted a validation set, which constituted 10% of the training data set, for model evaluation.
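A 10% hold-out like the one described in the caption could be produced as in the sketch below. Whether the split was random, stratified by category, or time-based is not stated, so the shuffled split, the `split_train_validation` name, and the seed are assumptions.

```python
import random


def split_train_validation(examples, validation_fraction=0.1, seed=42):
    """Shuffle a copy of the examples and hold out a fraction for validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical data: 1000 example identifiers.
train, validation = split_train_validation(range(1000))
```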

We evaluate the proposed approach, ET-USB, on a multiclass classification task. Our experiments show that ET-USB outperforms well-established methods from NLP tasks. We implemented the models using TensorFlow and trained all of them in the same computation environment, which featured an NVIDIA Tesla V100 graphics processing unit. All of the results in this paper were obtained in 2 hours or less of processing on the training data sets.

The results are presented in Figure 4, which illustrates the advantages of ET-USB over other approaches. Comparing the base models, which lack sequential behavior features, with the subsequent models shows the benefit of those features.

Fig4. Performance levels of various models on training data sets, with map@3 and accuracy as the metrics.
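The two metrics can be sketched as follows. With a single true category per call, average precision at 3 reduces to the reciprocal rank of the true category when it appears among the top three predictions and to 0 otherwise; whether the reported map@3 was computed exactly this way is an assumption, and the scores below are made-up illustrative values.

```python
def accuracy(true_labels, score_rows):
    """Fraction of examples whose highest-scoring class is the true class."""
    hits = 0
    for label, scores in zip(true_labels, score_rows):
        predicted = max(range(len(scores)), key=lambda c: scores[c])
        hits += int(predicted == label)
    return hits / len(true_labels)


def map_at_k(true_labels, score_rows, k=3):
    """Mean average precision at k with exactly one true class per example."""
    total = 0.0
    for label, scores in zip(true_labels, score_rows):
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:k]
        if label in ranked:
            total += 1.0 / (ranked.index(label) + 1)  # reciprocal rank
    return total / len(true_labels)


# Made-up scores for three calls over four categories.
true_labels = [0, 2, 1]
score_rows = [
    [0.7, 0.2, 0.1, 0.0],    # true class ranked 1st
    [0.5, 0.1, 0.3, 0.1],    # true class ranked 2nd
    [0.4, 0.05, 0.3, 0.25],  # true class ranked 4th, outside the top 3
]

acc = accuracy(true_labels, score_rows)
map3 = map_at_k(true_labels, score_rows, k=3)
```

map@3 rewards a model for placing the right category near the top even when it is not the first guess, which suits an agent who can glance at a short list of likely questions.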

We use a heatmap matrix to represent the self-attention scores between any two behaviors in several different semantic spaces. Notably, this matrix is normalized through row-based softmax. Consider the user in Figure 5, who called regarding a credit card bill. This user bought a cell phone with a COSTCO card through a different channel, bought a plane ticket using a KOKO Combo card, and then executed a series of behaviors; eventually he tried to pay the credit card bill at a 7-Eleven. He was confused when he attempted to pay the bill, and thus he made an inbound call, which was classified as “credit-card bill.” Here we can see how the model works. Different semantic spaces may focus on different behaviors, and the heatmaps vary greatly for some latent spaces. For example, Heads 6, 7, and 8 differ from the other five heads. In the other five heads, the relative trends of the attention score vectors are similar, regardless of strength. In Heads 6 and 8, we found that high scores tended to cluster densely. From these heads, we conclude that the self-attention mechanism greatly influenced which user behaviors were distributed in which semantic spaces.

Fig5. Heatmap of self-attention scores using the Transformer encoder.

Other Applications

In this paper, we propose an innovative sequential behavior modeling method, called ET-USB, for customer inbound call prediction. By incorporating sequential features, we not only increase model accuracy but also help phone agents accurately predict customer questions, thereby reducing communication costs and improving the efficiency of human resource allocation for our call center. Many other business scenarios are also suitable for sequential behavior modeling, for example item recommender systems, fraud detection, and browsing-intention prediction. In the future, we will keep investigating behavior embeddings across the different channels of customer journey data, which may contain latent information regarding users’ sequential behaviors, and apply them in business scenarios.

The full paper ET-USB: Transformer-Based Sequential Behavior Modeling for Inbound Customer Service can be read here.