Machine Learning to Develop Credit Card Customer Churn Prediction

The credit card customer churn rate is the percentage of a bank’s customers that stop using that bank’s services. Hence, developing a prediction model to predict the expected status for the customers will generate an early alert for banks to change the service for that customer or to offer them new services. This paper aims to develop credit card customer churn prediction by using a feature-selection method and five machine learning models. To select the independent variables, three models were used, including selection of all independent variables, two-step clustering and k-nearest neighbor, and feature selection. In addition, five machine learning prediction models were selected, including the Bayesian network, the C5 tree, the chi-square automatic interaction detection (CHAID) tree, the classification and regression (CR) tree, and a neural network. The analysis showed that all the machine learning models could predict credit card customer churn. In addition, the results showed that the C5 tree machine learning model performed the best across the three developed models. The results indicated that the top three variables needed in the development of the C5 tree customer churn prediction model were the total transaction count, the total revolving balance on the credit card, and the change in the transaction count. Finally, the results revealed that merging the multi-categorical variables into one variable improved the performance of the prediction models.


Introduction
At present, the market is dynamic and highly competitive due to the availability of large numbers of service providers, especially banks, worldwide. One of the main challenges for this sector is the change in customer behavior. Customers are the core of all industries, especially customer-dependent organizations, such as the banking sector, which is responsible for accepting deposits, making investments, and granting loans. Long-term customers are directly connected to the production of profits; hence, banks should avoid losing customers [1][2][3][4]. According to the Harvard Business Review, a 5% reduction in customer defection can lead to an increase in profits for firms of between 25% and 85% [4,5]. Thus, given that customers are the most important assets with strong effects on a bank's profit, there are five essential pillars for the modern banking business: capital, liquidity, risk, assets, and customer management [6,7]. Focusing effectively on the five pillars can ensure that management maximizes the profits of a bank [7,8].
Therefore, customer churn is a fundamental challenge for banks. Customer churn can be defined as the loss of a customer to a competitor, which leads to losses in profits. To manage churning, it is essential to identify those customers who are likely to move to a competing bank [9][10][11]. In addition, Risselada et al. (2010) [12] showed that churning management is important in establishing appropriate long-term relationships between features to develop a prediction model were the absence of mobile banking, zero-interest personal loans, zero balance, and other services online. Saias et al. (2022) [27] developed a churn risk prediction model for customers of cloud service providers. The aim of the study was to create an alert system to avoid losing cloud customers based on neural network model AdaBoost and random forest. The results found that random forest outperformed other models. Moreover, many researchers developed a prediction system for different churned customers in various fields such as the telecommunication industries [28], and the E-commerce industry [29].
Bank churn prediction aims to understand the possibility of customers moving from one bank to another. The reasons for movement include the availability of the latest technology, low interest rates, services offered, and credit card benefits [30]. This study aims to predict churned customers based on credit card and customer information (i.e., age and gender). Meanwhile, other researchers have addressed credit card fraud detection using machine learning models [31,32] or by developing optimization algorithms with machine learning models [33].
Most of the prediction models in the literature address various problems in the banking system, with little attention paid to credit card customers. Therefore, the authors of this paper tried to fill this gap in the previous studies. The contributions of this paper are as follows:
1. Prediction models were developed by forwarding different numbers of independent variables.
2. The capability of the prediction models was validated based on two-step clustering and k-nearest neighbors.
3. The capabilities of the machine learning models to predict credit card customer churn in banks were evaluated.
4. The top features for developing a credit card churn prediction method were determined.
The primary aims of this article are as follows:
1. To use different independent variables in building a prediction model based on two-step clustering and k-nearest neighbors.
2. To select the appropriate machine learning models with top features for predicting churn customers.
The rest of the paper is organized as follows: Section 1 presents the introduction and explores previous research in this field. Section 2 presents the research methodology used in this paper. Section 3 presents the analysis and empirical results. Finally, our conclusions are drawn in Section 4.

Research Methodology
Various studies have developed models for predicting customer churn without utilizing significant variables. To overcome this issue, it has been suggested that categorical variables are merged into one variable. Therefore, this research gap prompted the authors to find an appropriate model for predicting customer churn.
The primary step for developing a customer churn prediction model is to collect, analyze, and clean the dataset. Poorly cleaned data are unable to establish a relationship between input and output variables; in turn, this affects the performance of the prediction model. Therefore, the cleaned dataset can be applied in three models to build customer churn prediction models. The methods aim to select input variables depending on different independent variables that are selected by feeding all the independent variables in the dataset (continuous and categorical), selecting variables based on two-step clustering and logistic regression (continuous and cluster number variable), and selecting variables based on a feature-selection method. The outputs of the three models are applied in various machine learning models, including random forest, neural network, CR-Tree, C5 tree, Bayesian network, CHAID tree, support vector machine, quest tree, multinomial logistic regression, and a linear regression model. For brevity, only the top five machine learning models were considered in this study. This section is divided into subsections: data collection, developed prediction models, machine learning models, and performance metrics.

Data Collection
This paper depends on the dataset of credit card customer churn for banks; the dataset was collected from https://leaps.analyttica.com (accessed on 10 September 2021). Customers have the option to choose one of four credit card types: blue, silver, gold, or platinum. When customers decide to change their bank, they are recorded as churn customers. Consequently, churn customers cause the profits of a banking system to decrease. Therefore, there has been increased interest from banking professionals to design an early-warning system to classify a bank's customers into churn or non-churn customers. The system would be able to notify the bank's managers so that they can communicate with customers who are expected to churn to improve their services, which is an appropriate way to keep the customers satisfied with their bank. The dataset contains 20 variables: 1 dependent variable and 19 independent variables. The total number of customers is 10,127, with 1627 churn customers.
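The class balance implied by these counts can be worked out directly. The short sketch below uses only the two figures reported above; the "majority baseline" is an illustrative reference point (a classifier that labels every customer as non-churn) and is not something the study itself reports.

```python
# Counts reported in the text.
total_customers = 10_127
churn_customers = 1_627

# Share of customers recorded as churn.
churn_rate = churn_customers / total_customers

# Accuracy of a trivial baseline that labels every customer as non-churn
# (illustrative reference point only, not a result from the study).
majority_baseline_accuracy = 1 - churn_rate

print(f"churn rate: {churn_rate:.4f}")
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.4f}")
```

Because churn customers form a minority class, accuracy alone can look high even for a weak model, which is one reason precision, recall, and F1 score are reported alongside it.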
The dataset contains the following data: a churn value (dependent), age, gender, number of dependents, education level, marital status, income category, product variable (type of credit card-blue, silver, gold, platinum), period of relationship with bank, total number of products held by the customer, number of months inactive of the last 12 months, number of contacts in the last 12 months, credit limit on the credit card, total revolving balance on the credit card, open to buy credit line (average of last 12 months), change in transaction amount (Q4 over Q1), total transaction amount (last 12 months), total transaction count (last 12 months), change in transaction count (Q4 over Q1), and average card utilization ratio. The full analysis of the dataset can be found in the DAS link.
Firstly, the dataset was divided into categorical and continuous variables as independent variables and one dependent variable (churn customers). Next, we analyzed the variables using different statistical metrics including min, max, variance, standard deviation, chi square (for categorical variable), and correlation analysis (for continuous variable). The initial analysis showed that the linear relationship between variables does not exist; therefore, a nonlinear model was applied to develop a prediction model for customer churn.
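As a minimal sketch of the two association checks named above, the following computes a Pearson chi-square statistic for a categorical variable against the churn flag and a Pearson correlation coefficient for a continuous variable. The toy values and the names `gender`, `churn`, and `trans_count` are hypothetical and are not taken from the dataset; the tool the authors used for this analysis is not stated here.

```python
from collections import Counter
from math import sqrt


def chi_square_statistic(categories, labels):
    """Return the Pearson chi-square statistic and degrees of freedom
    for the contingency table of two categorical columns."""
    n = len(categories)
    row_totals = Counter(categories)
    col_totals = Counter(labels)
    observed = Counter(zip(categories, labels))
    stat = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n
            stat += (observed[(row, col)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two numeric columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical toy data: 1 = churn, 0 = non-churn.
gender = ["F", "F", "M", "M", "F", "M", "F", "M"]
churn = [1, 0, 0, 0, 1, 0, 0, 1]
trans_count = [30, 70, 65, 80, 25, 75, 60, 35]

stat, dof = chi_square_statistic(gender, churn)
r = pearson_r(trans_count, churn)
print(f"chi-square = {stat:.3f} (dof = {dof}), Pearson r = {r:.3f}")
```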

Developed Customers Churn Prediction Models
To develop a prediction model, three methods were utilized to control the number of independent variables used in the prediction model. First, all independent variables were directed to one of the applied machine learning models, which is referred to as Model 1. Next, we improved the independent variables by applying a two-step clustering method to the categorical variables only. Afterwards, the continuous variables with the cluster values were forwarded to each one of the machine learning models. To make the model more realistic, a logistic regression approach was used to predict the cluster number based on the categorical variables. This model is denoted as Model 2.
Model 2 was divided into two phases: the clustering phase and the prediction phase. In the clustering phase, the dataset was divided into groups using two-step clustering, and the continuous variables with a group variable were used to build a prediction model using the neural network. In the clustering model, the categorical variables were used to divide the customers into a specific number of groups. This step aimed to minimize the number of input variables forwarded to the machine learning model, in addition to simplifying the metadata of the customers. Moreover, the k-nearest neighbor model was used to ensure that the developed model was suitable for the online scenario and to avoid repeating the clustering analysis for future data. The k-nearest neighbors model uses customers' categorical metadata as input and the cluster number from the two-step clustering step as output. Furthermore, in the prediction phase, one of the machine learning models was used to build a prediction model by receiving the inputs from the two-step clustering step and the continuous metadata of the customers. To build a prediction model which depends on machine learning, the dataset should be divided into training, validating, and testing data. The training data were used to train the network on the previous information about the churn and non-churn customers. The test data were used to test the capability of the machine learning model in predicting the future churn customers. Based on previous research, the best percentages for building training, validating, and testing data for a dataset were found to be 70%, 15%, and 15%, respectively. Figure 1 shows how all the proposed phases of building the prediction model work together.
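The role of the k-nearest neighbors step, assigning a cluster number to a new customer from categorical metadata without re-running the clustering, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the Hamming distance, the value k = 3, the category values, and the cluster labels are all hypothetical choices made for the toy example.

```python
from collections import Counter


def hamming(a, b):
    """Count the positions at which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict_cluster(train_rows, train_clusters, query, k=3):
    """Predict a cluster label for `query` by majority vote among the
    k training records closest in Hamming distance."""
    order = sorted(range(len(train_rows)),
                   key=lambda i: hamming(train_rows[i], query))
    votes = Counter(train_clusters[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical records: (gender, education, marital status, income, card type).
train_rows = [
    ("F", "Graduate", "Married", "Less than $40K", "Blue"),
    ("F", "Graduate", "Single", "Less than $40K", "Blue"),
    ("F", "High School", "Married", "Less than $40K", "Blue"),
    ("M", "Graduate", "Married", "$80K - $120K", "Silver"),
    ("M", "Doctorate", "Married", "$120K +", "Gold"),
    ("M", "Graduate", "Single", "$80K - $120K", "Silver"),
]
# Hypothetical cluster numbers, standing in for the two-step clustering output.
train_clusters = [1, 1, 1, 2, 2, 2]

new_customer = ("F", "Graduate", "Married", "$40K - $60K", "Blue")
predicted = knn_predict_cluster(train_rows, train_clusters, new_customer)
print("predicted cluster:", predicted)
```

The predicted cluster number would then be forwarded, together with the continuous variables, to the chosen machine learning model.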

Two-step clustering is a tool designed to handle the nature of the data and to find main insights. The differences between the two-step method and other clustering models include the following: it can use both categorical and continuous variables, and it can automatically choose the appropriate number of clusters. Grouping data using the two-step clustering method involves an initial use of the distance measure to divide the data into groups; then, a probabilistic approach is applied to select the optimal group.
For Model 3, all the independent variables, including the categorical and continuous variables, were forwarded to the feature-selection method to select the features that were related to the churn customers. The feature-selection method ranked the features as important, marginal, and unimportant. Only the important variables were forwarded to the machine learning models to be used in building the prediction models. A summary of the applied models is shown in Table 1.

Table 1. Summary of the developed models.

Model | Variables
Model 1 | All variables
Model 2 | All continuous variables and cluster value
Model 3 | The selected variables after the feature-selection method

Machine Learning Models
This study adopted three methods for selecting independent variables in order to identify the most suitable approach for improving the performance of the machine learning prediction models. The study applied ten machine learning models: random forest, neural network, CR-Tree, C5 tree, Bayesian network, CHAID tree, support vector machine, quest tree, multinomial logistic regression, and a linear regression model. The initial results indicated that the top five machine learning algorithms for the developed models in Table 1 were Bayesian network, C5 tree, CHAID tree, CR-Tree, and the neural network [34][35][36][37][38].

Performance Metrics
To design a predictor, the dataset was divided into training, validating, and testing datasets, with cutoff percentages of 70%, 15%, and 15% for training, testing, and validating the data, respectively.
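A 70%/15%/15% partition of this kind can be sketched as below. The text does not say how customers were assigned to each partition, so the random shuffle, the fixed seed, and the function name `split_indices` are assumptions made for illustration only.

```python
import random


def split_indices(n_rows, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle row indices and cut them into training, validation,
    and testing partitions; the remainder goes to the testing set."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = int(n_rows * train_frac)
    n_val = int(n_rows * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# 10,127 is the number of customers reported for the dataset.
train_idx, val_idx, test_idx = split_indices(10_127)
print(len(train_idx), len(val_idx), len(test_idx))
```

With a churn share of roughly 16%, a stratified split (preserving the churn proportion in each partition) would be a natural variation, although the text does not say whether one was used.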
The performance of churn customers prediction models can be evaluated by using classification parameter variables: recall, precision, accuracy, false omission rate, and F1 score. To find the performance metrics, a confusion matrix is generated first using the output of the classification results. To calculate performance metrics, the following equations are used:
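The equations themselves are not reproduced in this text, so the sketch below uses the standard textbook definitions computed from the four cells of a binary confusion matrix, with churn as the positive class. Note that the standard false omission rate, FN / (FN + TN), is small for a good classifier, whereas the FOR values reported in the results exceed 0.9 and are closer to its complement, TN / (FN + TN) (the negative predictive value). Which definition the authors used is not stated here, so both are computed; the confusion-matrix counts are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion-matrix
    counts (tp/fp/fn/tn = true/false positives and negatives)."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        # Standard definition: share of predicted negatives that are wrong.
        "false_omission_rate": fn / (fn + tn),
        # Complement of the above: share of predicted negatives that are right.
        "negative_predictive_value": tn / (fn + tn),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=200, fp=20, fn=30, tn=1250)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```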

Results Discussion and Analysis
To validate the developed models, the top five machine learning models were considered: Bayesian network, C5 tree, CHAID tree, CR-Tree, and the neural network. This section is divided based on the developed models explained previously in the methodology section.

Model 1: Develop Customers Churn Based on All Variables
Model 1 used all the categorical and continuous variables in the dataset. The independent variables were age, gender, number of dependents, education level, marital status, income category, product variable, period of relationship with bank, total number of products, number of months inactive in the last 12 months, no. of contacts in the last 12 months, credit limit on the credit card, total revolving balance on the credit card, open to buy credit line (average of last 12 months), change in transaction amount (Q4 over Q1), total transaction amount (last 12 months), total transaction count (last 12 months), change in transaction count (Q4 over Q1), and average card utilization ratio. The dependent variable was the status of the customer (churn or non-churn).
The training, testing, and validation results for Model 1 are shown in Table 2. For the training dataset, the accuracy and FOR values exceeded 0.9, whereas the precision, recall, and F1 score varied between the machine learning models, with the best results achieved by the C5 tree model. To improve the performance of the trained model, a validation dataset was used. The initial results showed that the Bayesian network, CHAID, and the neural network failed to enhance the performance of the prediction model. In contrast, the C5 tree and CR-Tree models showed improvements for the validation dataset compared with the training dataset, with the best results derived using the C5 tree model. Finally, the testing dataset was used to check the capability of the developed model to predict future data. All the models showed acceptable capability in predicting the churn customers, with the highest performance derived with the C5 tree model. The accuracy, precision, recall, FOR, and F1 score for the C5 tree model with the testing dataset were 0.964, 0.914, 0.880, 0.974, and 0.897, respectively. The results indicated that the C5 tree model showed higher capability in predicting the training, validating, and testing datasets compared with the other four machine learning models applied. To relate the collected results to the studied variables, the importance of each independent variable is shown in Figure 2. The variable importance analysis revealed that the top three variables were change in transaction count, total revolving balance on the credit card, and total transaction count. Here, the open-to-buy credit line and the total transaction amount showed very weak relations in building the C5 tree prediction model.


Model 2: Developed Churn Customers Based on Two-Step Clustering and K-Nearest Neighbors
As discussed in the methodology section, the independent variables were divided into continuous and categorical variables. The categorical variables included gender, education level, marital status, income category, and product variable. The rest of the independent variables were continuous. The categorical variables were forwarded to the two-step clustering model to merge the variables into one categorical variable that described all the categorical variables in the dataset. The results showed that the two-step clustering approach can create four categories for the input variables, as shown in Figure 3. The generated clusters are used with the k-nearest neighbor model to ensure that the developed model is similar to the real environment; here, k is the number of clusters generated using two-step clustering, as shown in the study methodology. The predicted values were used in the three datasets to validate the capability of the k-nearest neighbor model in predicting cluster values using the categorical variables. The accuracy results of the training, testing, and validating datasets were 0.93, 0.94, and 0.95, respectively. The results from the k-nearest neighbors approach indicated that the model is capable of predicting the generated cluster value using the categorical variables.
The predicted values were forwarded with all the continuous values to one of the five machine learning models to develop a prediction model, as shown in Table 3. The results from Model 2 show that the accuracy and FOR variables for all models were more than 0.9. The precision, recall, and F1 score variables were between 0.70 and 0.97 for the training, testing, and validation datasets. The training dataset showed that the C5 tree achieved the highest performance, where the accuracy, precision, recall, FOR, and F1 score were 0.963, 0.911, 0.851, 0.972, and 0.880, respectively. To improve the performance of the prediction model, a validation dataset was used. The validation results showed that the C5 tree and CR-Tree models improved the performance of the prediction models, while the rest of the prediction models did not. Moreover, to validate the performance on future data, a testing dataset was used. The results showed that the C5 tree model had the highest performance compared with the other models. The test results of the C5 model were 0.967, 0.919, 0.891, 0.977, and 0.905 for accuracy, precision, recall, FOR, and F1 score, respectively. Finally, Model 2 showed that the C5 tree approach was more robust in creating a prediction model for churn customers.
To relate the collected results to the studied variables, the importance of each variable applied in the C5 tree model is shown in Figure 4. The results revealed that the top three variables in building the C5 tree prediction model were total transaction count, total revolving balance on the credit card, and change in transaction count; here, the age and months-inactive variables were very weak variables in building a C5 tree prediction model. In addition, the new cluster value generated in Model 2 was found to be among the top six most important variables in developing the customer churn prediction model.


Model 3: Developed Churn Customers Based on Feature-Selection Model
Model 3 used a feature-selection model to select the most important variable(s) for the customer churn prediction (i.e., the dependent variable). The output of the feature-selection model was divided into important, marginal, and unimportant, with cutoff values greater than 0.95, 0.90, and 0, respectively, as shown in Table 4. The feature-selection model used a Pearson model for the categorical variables.
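The three-way labelling described above can be sketched as a simple threshold rule. The cutoffs 0.95 and 0.90 come from the text; the function name `importance_label` and the two example fields with scores 0.93 and 0.40 are hypothetical, and only the value 1 for Total_Trans_Ct appears in Table 4.

```python
def importance_label(value, important_cutoff=0.95, marginal_cutoff=0.90):
    """Map a feature-selection importance value to the label used in the text."""
    if value > important_cutoff:
        return "important"
    if value > marginal_cutoff:
        return "marginal"
    return "unimportant"


# Total_Trans_Ct has value 1 in Table 4; the other two entries are
# hypothetical fields added only to exercise each branch.
scores = {
    "Total_Trans_Ct": 1.0,
    "Example_Marginal_Field": 0.93,
    "Example_Weak_Field": 0.40,
}
labels = {field: importance_label(value) for field, value in scores.items()}

# Only fields labelled "important" would be forwarded to the learners.
selected = [field for field, label in labels.items() if label == "important"]
print(labels)
print("selected:", selected)
```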
The results showed that Total_Trans_Ct, Total_Ct_Chng_Q4_Q1, Total_Revolving_Bal, Contacts_Count_12_mon, Avg_Utilization_Ratio, Total_Trans_Amt, Total_Relationship_Count, Months_Inactive_12_mon, and Total_Amt_Chng_Q4_Q1 are the important variables to be used in building Model 3. The rest of the variables were omitted, as they were either marginal or unimportant. The important variables were forwarded to five machine learning models to develop the customer churn prediction model.

Table 4. The output of the feature-selection method.

Status | Rank | Field | Importance | Value | Importance
TRUE | 1 | Total_Trans_Ct | 0 | 1 | important
TRUE | 2 | Total_Ct_Chng_Q4_Q1 | 0 | 1 | important
TRUE | 3 | Total_Revolving_Bal | 0 | 1 | important
TRUE | 4 | Contacts_Count_12_mon | 0 | 1 | important
TRUE | 5 | Avg_Utilization_Ratio | 0 | 1 | important
TRUE | 6 | Total_Trans_Amt | 0 | 1 | important
TRUE | 7 | Total_Relationship_Count | 0 | 1 |

The results of Model 3 showed that all five models predicted the churned customers with different accuracy values, ranging between 0.9 and 0.99 for all the collected datasets, as shown in Table 5. The training dataset shows that the C5 tree model was the most accurate model in building a prediction model for customer churn. The training results were 0.976, 0.921, 0.928, 0.986, and 0.924 for accuracy, precision, recall, FOR, and F1 score, respectively. To improve the performance of the prediction models, a validation dataset was applied. The validation results showed that the trained models were not optimized, indicating that the validation dataset was not able to improve the performance of the trained models. To check the predictability of the machine learning models, the testing dataset was used. The test results showed that all five models could predict the churn customers, with different performance metrics. The best results were obtained with the C5 tree prediction model, where the accuracy, precision, recall, FOR, and F1 score results were 0.940, 0.813, 0.861, 0.970, and 0.836, respectively. To investigate the optimal variables used in the C5 tree customer churn prediction model, Figure 5 shows the importance of each variable applied in the C5 tree model. The results showed that the total transaction count, the total revolving balance on the credit card, and the change in the transaction count are the most important variables for predicting churn customers; the total transaction amount is not considered in the customer churn prediction model.
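As a quick consistency check on the C5 tree figures quoted in the text for Models 1 to 3, the F1 score can be recomputed from the reported precision and recall using the standard harmonic-mean formula, F1 = 2PR / (P + R). The sketch below uses only values stated in the text; the dictionary layout and names are illustrative.

```python
# (precision, recall, reported F1) for the C5 tree, as quoted in the text.
reported = {
    "Model 1 (test)": (0.914, 0.880, 0.897),
    "Model 2 (train)": (0.911, 0.851, 0.880),
    "Model 2 (test)": (0.919, 0.891, 0.905),
    "Model 3 (train)": (0.921, 0.928, 0.924),
    "Model 3 (test)": (0.813, 0.861, 0.836),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


recomputed = {
    name: round(f1_from(p, r), 3) for name, (p, r, _) in reported.items()
}

for name, (_, _, f1_reported) in reported.items():
    print(f"{name}: reported {f1_reported:.3f}, recomputed {recomputed[name]:.3f}")
```

The comparison also makes the gap between Model 3's training and testing F1 scores easy to see next to the smaller gaps for Models 1 and 2.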


Discussion and Analysis
Three models were adopted to develop a machine learning model that can be used to predict churn customers. The models were built after considering the independent variables selection method. To select the variables, three models were considered: all variables selection (Model 1), two-step clustering and k-nearest neighbor selection (Model 2), and a feature-selection algorithm (Model 3). The results revealed that all the applied machine learning models were capable of predicting the churn customers with different efficiency values. In all the developed models (Models 1-3), the C5 tree prediction model showed the highest performance in customer churn prediction, with the best performance derived by Model 2. The accuracy, precision, recall, FOR, and F1 score using the testing dataset were 0.967, 0.919, 0.891, 0.977, and 0.905, respectively. These results indicated that the C5 tree model is highly capable of predicting the churn customers, with fewer variables than Model 1 and extra analysis compared with Model 3. The results showed that the top three variables for all the developed models in building the C5 tree model were the total transaction count, the total revolving balance on the credit card, and the change in the transaction count. In addition, the results showed that the total transaction amount was not important in developing a prediction model. However, achieving the highest performance is not always the goal; thus, researchers can obtain good performance with a less complex model by using a feature-selection model with the given dataset. Accordingly, they can reduce the number of independent variables that are forwarded to the machine learning models. In addition, using all the variables with the C5 tree model can provide good performance at the cost of more processing time to predict the churn customers.
Overall, the results indicated that the machine learning models adapt well to different numbers of input variables when predicting churn customers. In addition, the C5 tree model showed a higher capability of predicting churn customers than the other models. Finally, determining the appropriate number of independent variables is a very important step in developing and building a machine learning model.
All models can be used to predict credit card customer churn in banking systems. Model 1 uses all independent variables to predict the churn cases. Model 2 uses all continuous variables and one combined categorical variable. Model 3 uses only the variables with high importance in predicting the churn cases. The models clearly indicate the direction an expert should take to retain the bank's clients. Experts and scholars in the field can use any of the models to predict churn customers. Model 1 is suitable for nonexperts, given that all variables are included without further selection, whereas Models 2 and 3 are better suited to experts in the field because of the additional analyses required in developing the prediction models.
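The idea behind the Model 2 input set (all continuous variables plus one combined categorical variable) can be illustrated with a short Python sketch. It is a simplified stand-in, not the procedure used in the study: the study derives the combined variable through two-step clustering and k-nearest neighbors, whereas this sketch simply assigns one integer code to each distinct combination of categorical values. The column names, the sample rows, and the function name `combine_categoricals` are all hypothetical.

```python
# Hypothetical column names; the study's actual field names are not given here.
CATEGORICAL = ["gender", "education_level", "marital_status"]
CONTINUOUS = ["total_trans_ct", "total_revolving_bal"]


def combine_categoricals(rows, categorical_columns):
    """Map each distinct combination of categorical values to one integer code."""
    codes = {}
    combined = []
    for row in rows:
        key = tuple(row[col] for col in categorical_columns)
        # Reuse the code for a known combination, otherwise assign the next one.
        combined.append(codes.setdefault(key, len(codes)))
    return combined, codes


# Made-up customers, for illustration only.
rows = [
    {"gender": "F", "education_level": "Graduate", "marital_status": "Married",
     "total_trans_ct": 42, "total_revolving_bal": 777},
    {"gender": "M", "education_level": "High School", "marital_status": "Single",
     "total_trans_ct": 20, "total_revolving_bal": 0},
    {"gender": "F", "education_level": "Graduate", "marital_status": "Married",
     "total_trans_ct": 61, "total_revolving_bal": 1200},
]

combined, codes = combine_categoricals(rows, CATEGORICAL)

# Model 2 style input: continuous variables plus the single combined variable.
model2_rows = [
    {**{col: row[col] for col in CONTINUOUS}, "combined_category": code}
    for row, code in zip(rows, combined)
]
```

In this sketch the first and third customers share the same categorical profile and therefore receive the same code, so the three categorical columns collapse into one input variable.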

Conclusions
This paper aimed to investigate the capability of machine learning to predict credit card customer churn in the banking sector. The collected dataset contains two types of data: categorical and continuous variables. These data describe 10,127 customers of the bank, of whom 1,627 were churn customers. To develop different credit card customer churn prediction models, the form and number of the independent variables were varied through three proposed models: Model 1 (all variables, both categorical and continuous), Model 2 (two-step clustering and k-nearest neighbors), and Model 3 (feature-selection model). In addition, five machine learning models were used: Bayesian network, C5 tree, CHAID, CR-Tree, and a neural network. The original dataset was divided into three datasets (training, testing, and validation) comprising 70%, 15%, and 15% of the original dataset, respectively. The three models were used with the five machine learning models, and the results showed that the machine learning models were capable of predicting credit card customer churn. For Models 1-3, the C5 tree model outperformed all other machine learning models. The results revealed that the total transaction count, the total revolving balance on the credit card, and the change in the transaction count are the three most important variables for developing a churn customer prediction model. In addition, it is not necessary to use all the categorical variables (such as gender, education level, marital status, income category, and product variable) in developing a churn customer prediction model. On the other hand, adding a single categorical variable that summarizes all the categorical variables in the dataset can improve the performance of the churn customer prediction model.
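The 70%/15%/15% partition described above can be sketched in Python as follows. This is an illustration only: this section does not state how records were assigned to each partition, so the stratification by churn label, the random seed, and the function name `stratified_split` are assumptions. The labels are synthetic and reproduce only the reported class counts (1,627 churn customers out of 10,127).

```python
import random


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, test, validation) index lists that keep class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test, valid = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_test = round(fractions[1] * len(indices))
        train += indices[:n_train]
        test += indices[n_train:n_train + n_test]
        valid += indices[n_train + n_test:]  # remainder goes to validation
    return train, test, valid


# Synthetic labels: 1 = churn customer, 0 = existing customer.
labels = [1] * 1627 + [0] * (10127 - 1627)
train, test, valid = stratified_split(labels)

# Overall churn rate implied by the reported counts.
churn_rate = sum(labels) / len(labels)
```

With these counts the sketch yields partitions of 7,089, 1,519, and 1,519 records, and an overall churn rate of about 16.1%, which shows how imbalanced the two classes are.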
The results also revealed that the selection of independent variables is an important step in improving the churn customer prediction model. This step can be implemented using one of the feature-selection models or by combining several variables into one. The credit card customer churn rate is the percentage of a bank's customers who leave the service. Building an early prediction model of the status of a bank's customers may help banks to avoid losing those customers.
In summary, the results indicated that clustering with KNN combined with the C5 tree model outperformed previous models in various performance metrics, including R², precision, and recall. Clustering the independent variables can improve the prediction performance in various sciences [39][40][41][42], and the number of transactions is dominant in identifying churn customers. The results support earlier findings that C5 tree models can outperform other functional models [16,41,42].
In future work, further analyses of the optimal independent variables are required to develop a more robust, more accurate, faster, less complicated, and more efficient churn prediction model. Such a study will cover more datasets with extra variables to extract the most important variables; in addition, new machine learning models will be tested to determine which is optimal. A conventional machine learning model does not always guarantee the best results for a given dataset. Therefore, the accuracy and efficiency of prediction models should be improved in the future.