Churn Management in Telecommunications: Hybrid Approach Using Cluster Analysis and Decision Trees

Abstract: The goal of the paper is to present the framework for combining clustering and classification for churn management in telecommunications. Considering the value of market segmentation, we propose a three-stage approach to explain and predict the churn in telecommunications separately for different market segments using cluster analysis and decision trees. In the first stage, a case study churn dataset is prepared for the analysis, consisting of demographics, usage of telecom services, contracts and billing, monetary value, and churn. In the second stage, k-means cluster analysis is used to identify market segments for which chi-square analysis is applied to detect the clusters with the highest churn ratio. In the third stage, the chi-squared automatic interaction detector (CHAID) decision tree algorithm is used to develop classification models to identify churn determinants in the clusters with the highest churn level. The contribution of this paper resides in the development of the structured approach to churn management using clustering and classification, which was tested on the churn dataset with a rich variable structure. The proposed approach is continuous since the results of market segmentation and rules for churn prediction can be fed back to the customer database to improve the efficacy of churn management.


Introduction
Churn, also known as customer attrition, represents a situation when the customer stops buying products or using services from a company. In telecommunications, churn refers to a loss of customers when the customer leaves a telecommunication service provider and switches to another operator (Chouiekh and Haj 2020).
The telecommunication companies operate in a saturated market since telecommunication services have become widespread globally (Calzada-Infante et al. 2020). Companies compete by delivering marketing campaigns to acquire new customers and retain existing ones, considering that the costs of retaining the existing customer are usually much lower than attracting a new customer (Kim et al. 2020). Hence, customer churn prevention is crucial in the telecommunication industry (Droftina et al. 2015).
Churn management aims to minimize churn using various retention strategies, such as offering new devices or services, to prevent customers from cancelling subscriptions. For retention strategies to be successful, companies need insights into customers' characteristics and behavior to predict those likely to churn. Predicting which customers are about to leave allows the company to lower customer churn using specific marketing campaigns that could persuade customers to change their minds about leaving (Bell and Mgbemena 2017). Furthermore, the results of churn prediction are used to identify the leading causes of attrition (Cheng et al. 2019).
Predictive churn modelling is used to discover and analyze patterns in data so that past customer behavior can be used to forecast churn behavior. These analyses mostly use a data mining approach and focus on developing the forecasting model for predicting churn. Most of these studies are developed using the database of all company customers, thus neglecting that not all customer segments have the same level of churn (Bayer 2010). A model that can develop a different churn prediction for different market segments is needed. It would allow companies to develop distinct marketing strategies for various market segments, such as promotions designed considering the churn causes in a specific market segment.
This paper contributes to the field of churn management in telecommunications with the development of the hybrid approach to churn management, which could support companies in the detection of market segments that are more likely to churn and target them with specific marketing campaigns. The three-stage hybrid approach that was developed combined cluster analysis and decision tree analysis. In the first stage, the customer database was prepared. In the second stage, a k-means cluster analysis was used to identify clusters with the highest level of churn. In the third stage, chi-squared automatic interaction detector (CHAID) decision tree analysis was used to determine the most significant characteristics of customers in the clusters with the highest churn levels. We applied the hybrid approach to the Telco Customer Churn open dataset.
The paper is organized as follows. After the introduction in the first section, the second section presents the overview of the literature focusing on market segmentation and churn prediction in telecommunications. In the third section, the data and procedures used for the data analysis are described. The fourth section is dedicated to interpreting the empirical results obtained using cluster analysis, variable selection and CHAID decision trees. The last section discusses the results, implications, limitations and future research directions.

Market Segmentation Using Cluster Analysis
Market segmentation is a process for dividing the market for a given service or product into homogeneous sub-groups of customers with similar characteristics. Customers who belong to the same market segment are similar to one another concerning chosen characteristics such as demographic characteristics and purchasing behavior. Market segmentation helps companies detect niche markets and develop targeted marketing strategies. Market segmentation is often used in customer-centric industries such as telecommunications (Hwang et al. 2004; Mu and Lee 2005).
Cluster analysis is commonly applied for market segmentation (Tuma et al. 2011). Cluster analysis groups a set of observations, such as customers, into homogenous groups, thus identifying the customers who are similar to one another within the same cluster and showing how they are different compared to customers in other clusters. When performing cluster analysis, a researcher can choose from various methods, such as hierarchical and non-hierarchical clustering (Marini and Amigo 2020). In hierarchical clustering, clusters are derived using the top-down (divisive) or bottom-up (agglomerative) approach, depending on the fashion in which the hierarchical decomposition is formed. Non-hierarchical clustering groups observations to minimize the evaluation criteria, such as the error function (Jin and Han 2011). In non-hierarchical clustering, a user needs to determine the number of clusters in advance.
Partitional clustering methods are a type of non-hierarchical clustering method. In partitional clustering, the dataset is decomposed into a set of disjoint clusters. Given a dataset of N observations, a partitioning method constructs K partitions of the data. A partition represents a cluster. In other words, by using partitional clustering, data are classified into K groups. Each group contains at least one observation, and each observation belongs to exactly one group. An example of non-hierarchical partitional clustering is k-means clustering.
To perform market segmentation in telecommunications, a variety of cluster approaches have been used in previous research. Zhou et al. (2020) integrated the recency, frequency and monetary approach with the sparse k-means clustering algorithm to conduct segmentation of the Chinese mobile telecommunications market based on extensive consumer data. Khalili-Damghani et al. (2018) used a hybrid soft computing approach based on clustering, rule mining, and decision tree analysis. Bose and Chen (2015) used fuzzy c-means clustering to build customer profiles and study mobile services customers' migratory behavior. They discovered usage and revenue patterns for two migratory groups of customers. Wang (2018) analyzed different groups of omnichannel buyers in telecommunications. The author used k-means particle swarm optimization for cluster analysis and the C5.0 classification method to formulate classification rules. Swarm optimization was also used by Li and Marikannan (2019). Qiu et al. (2020) used the k-means-HF method for silent customer segmentation, to identify a segment of customers that a company is likely to lose. The authors concluded it is necessary to analyze such customers' features and make appropriate market decisions to improve the telecommunication industry's revenue. Zheng and Liu (2020) used a hierarchical locality sensitive hashing-local outlier factor scheme for anomalous customer behavior detection and k-means clustering analysis on the real telecommunication operating data provided by one of China's major Internet service providers. Lin et al. (2019) used a parallel large sum submatrix bi-clustering algorithm based on Spark MapReduce to identify and segment highly profitable telecommunication customers who share similar upscale purchasing behavior on a small fraction of attributes. Golubev et al. (2017) used clustering techniques to determine the clusters with typical user behavior based on call detail records data. 
Al-Refaie (2017) examined the factors affecting customer churn in the Jordanian telecommunication industry using cluster analysis.
Most of the researchers used cluster analysis as the sole method for determining market segments in telecommunications. Only a few authors investigated the combination of various methods, such as cluster analysis and decision trees, for market segmentation in telecommunications (Khalili-Damghani et al. 2018; Wang 2018). K-means is the most often used clustering algorithm for market segmentation.

Predicting Churn in Telecommunications
Various approaches have been used for churn prediction in telecommunications. The research can be divided into two groups. Appendix A provides a summary of the below-mentioned churn modelling approaches in the telecommunication industry.
The first group includes research comparing the performance of algorithms applied on the various datasets to develop the forecasting model with the highest accuracy of predicting churn using one of the competing algorithms. This group of research claims that its modelling approach is successful compared with other state-of-the-art classifiers. A comparison study performed by Pamina et al. (2019) shows that the XGBoost classifier performs better than K-NN and random forest in improving the accuracy of customer churn prediction. The comparison study was performed using a publicly available telecommunication dataset. Furthermore, the research shows that fiber optic customers with higher monthly charges have a stronger influence on churn. Ahmad et al. (2019) experimented with the decision tree, random forest, gradient boosted machine tree (GBM), and extreme gradient boosting (XGBoost) to predict churn based on big raw data provided by the SyriaTel telecommunication company. The best results were obtained by applying the XGBoost algorithm, which was then used for classification in the churn predictive model. The same methodology was used by Swetha and Dayananda (2020). Mand'ák and Hančlová (2019) performed logistic regression analysis on demographic and service usage variables to predict customer churn in European telecommunications providers. Ahmed and Maheswari (2017) presented metaheuristic-based churn prediction techniques applied to a massive Orange telecommunication dataset. A hybridized form of the Firefly algorithm was used as the classifier. Ahmed et al. (2020) developed the churn prediction model using hybrid firefly-based classification. AlOmari and Hassan (2016) tested the capability of the RULES Family algorithm-6 prediction data-mining technique to predict telecommunications customers' churn. Höppner et al. (2020) applied the ProfTree decision tree to construct churn prediction models based on real-life datasets from various telecommunications providers. Faris (2018) proposed a hybrid model based on particle swarm optimization and feedforward neural networks for churn prediction. Sjarif et al. (2019) used the Pearson correlation and k-nearest neighbor (KNN) algorithm for churn prediction, using the public Telco Customer Churn dataset available on the Kaggle platform. Azeem et al. (2017) compared neural networks, linear regression, C4.5, SVM, AdaBoost, and gradient boosting, random, and fuzzy classifiers for churn prediction. Almufadi et al. (2019) used convolutional neural networks (CNN) for the churn prediction of mobile telecom subscribers. Li and Marikannan (2019) used particle swarm optimization and an extreme learning machine to predict churn in telecommunication.
The second group includes research that used a hybrid approach combining several methods, claiming that the sequential usage of various methods yields the best results in churn prediction. Ullah et al. (2019) used a combined approach for churn prediction in the telecommunication sector. They used the random forest algorithm to classify the churn and non-churn customers and identify factors used in the k-means clustering. The features that were used were related to calls duration, free calls, charges and others. The authors did not provide a discussion of their results in terms of marketing strategy. Choudhari and Potey (2018) performed predictive analysis for customer churn in the telecommunication industry using a hybrid decision tree and logistic regression classifier. The authors also proposed a hybrid Fuzzy unordered rule induction algorithm with fuzzy c-means clustering for the prediction of customer churn. Olle and Cai (2014) performed customer churn analysis with a combined model using logistic regression for classification and the Voted Perceptron for churn probability estimation. Preetha and Rayapeddi (2018) used logistic regression, random forests and k-means clustering to predict customer churn in the telecommunication industry.
Based on the analysis of the data mining approaches for predicting churn in telecommunications, it can be concluded that the researchers focused primarily on the improvement of churn prediction, while they rarely focused on the development of different retention strategies for various market segments.

Methodology
We proposed a three-stage hybrid approach for churn prediction that combines cluster analysis and decision tree analysis (Figure 1). In the first stage, a database for churn prediction was developed. In the second stage, we performed k-means cluster analysis to detect market segments. Additionally, we used chi-square analysis to identify the clusters with the highest level of churn. In the third stage, we used the CHAID decision trees to generate models for predicting churn behavior for each cluster separately, focusing specifically on clusters with the highest churn rate. Rules were extracted that could be used for churn management. The extracted rules and cluster descriptions could be added to the customer database to increase its value and effectiveness.

Stage 1. Data Preparation
Various data sources are used for churn prediction and analysis (Verbeke et al. 2012). The first group of data sources contains the data on the telecommunication transactions, e.g., number and duration of calls (e.g., Wei and Chiu 2002; Kisioglu and Topcu 2011). The second group of data sources contains the customers' data, e.g., demographic characteristics, usage of additional services, contracts and billing, and monetary value and failure. We recommend using the second type of data for customer relationship management since it contains the relevant information for market segmentation (e.g., demographic characteristics).
We support our recommendation with the findings of Verbeke et al. (2012), who analyzed various datasets used in churn prediction, noting that it is essential to consider whether the data are a predictor or a symptom of the occurrence of churn. For example, at first sight, it seems that an attribute indicating a significant decline in total minutes called may be substantially connected with churn. However, this decline is more likely to occur after the customer has already decided to leave the company; in other words, when a churn event has already occurred but has not yet been recorded in the data. More reliable data for churn prediction include socio-demographic data, financial information and marketing-related variables.
Furthermore, a variable measuring churn should be included in the analysis. Customers terminate their relationship with the telecommunication company by ending the contract or through the cessation of the one-time payment for prepaid users.

Stage 2. Cluster Analysis
In the second stage, we applied a clustering procedure to identify the homogeneous groups of customers from the telecommunication database. A non-hierarchical partitioning k-means clustering procedure was applied since most marketing research uses k-means for market segmentation in telecommunication. We used all the observed variables in the cluster analysis except churn, since the clusters are later compared according to the churn ratio.
Clustering requires a method for computing the distance or the (dis)similarity between each pair of observations. In the clustering procedure, a distance measure is a function that quantifies the similarity between two observations. It determines how the similarity of two observations will be calculated, and it will influence the size of the clusters. Euclidean distance is a standard distance measure used in clustering, which measures the straight-line distance between observations $x_a$ and $x_b$, summed over all characteristics $j$ (Boehmke and Greenwell 2020):

$$d(x_a, x_b) = \sqrt{\sum_{j} (x_{aj} - x_{bj})^2}$$

Many partitional clustering algorithms attempt to minimize an objective function when making clusters. In k-means, the objective is to minimize the squared error function, which represents intra-cluster variance. It is often called the distortion function:

$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\lVert x_i^{(j)} - c_j \right\rVert^2$$

in which $k$ represents the number of clusters, $n$ is the number of observations, $i$ stands for an observation, $x_i^{(j)}$ is observation $i$ assigned to cluster $j$, and $c_j$ is the centroid of cluster $j$.

J. Risk Financial Manag. 2021, 14, 544
Step 2.1. Determining the number of clusters. Due to the vagueness of the criteria used for selecting the correct number of clusters and initial variables, cluster analysis is often criticized for the possibility of deriving random solutions (Ernst and Dolnicar 2018; Vazirgiannis 2009). However, in a data mining approach, the v-fold cross-validation methodology allows the automatic selection of the number of clusters (Kodinariya and Makwana 2013; Kassambara 2017). We determined the number of clusters using the v-fold cross-validation approach and by observing the cost sequence graph, which shows the clustering solution's error function for different numbers of clusters.
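The v-fold procedure in Step 2.1 can be illustrated with a minimal sketch. It assumes numeric feature vectors, a plain Lloyd-style k-means, and a stopping rule based on a hypothetical `min_decrease` threshold; none of these details are taken from the software used in the paper, and the function names are illustrative only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group (keep it if empty).
        new_centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def cv_cost(points, k, v=10, seed=0):
    """Mean squared distance of held-out points to their nearest centroid."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::v] for i in range(v)]
    total = 0.0
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        centroids = kmeans(train, k)
        total += sum(min(sq_dist(p, c) for c in centroids) for p in held_out)
    return total / len(shuffled)


def choose_k(costs, min_decrease=0.05):
    """Smallest k after which the relative cost decrease drops below the threshold.

    `costs` maps k to its cross-validated cost; `min_decrease` is an assumed
    threshold, not a value reported in the paper.
    """
    ks = sorted(costs)
    for prev, nxt in zip(ks, ks[1:]):
        if costs[prev] == 0 or (costs[prev] - costs[nxt]) / costs[prev] < min_decrease:
            return prev
    return ks[-1]
```

Evaluating `cv_cost` for a range of k values yields the cost sequence, and `choose_k` reads off the point where adding a further cluster no longer reduces the error noticeably.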
Step 2.2. Cluster solution validity. After finding the optimal cluster solution, the derived clusters were observed and interpreted. The analysis of variance (ANOVA) was applied to check whether the means of the numeric variables differed significantly across clusters. Furthermore, chi-square analysis for nominal variables was performed to detect statistically significant differences between clusters.
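As an illustration of the ANOVA check in Step 2.2, the following sketch computes the one-way F statistic for a numeric variable split by cluster. The function name and any input values are hypothetical; a statistical package would additionally report the p-value.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of numeric samples, one per cluster."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of cluster means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own cluster mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

A large F value relative to the F distribution with (number of clusters - 1, number of observations - number of clusters) degrees of freedom indicates that the cluster means differ.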
Step 2.3. Cluster characteristics. Ultimately, the clusters were described according to the characteristics of the customers classified in a particular cluster. The distribution of variables across clusters was observed.
Step 2.4. Relationship between clusters and churn. Chi-square analysis was performed to estimate whether there was a significant difference in churn occurrence across clusters. This procedure identified the clusters in which the highest proportion of customers were likely to churn.
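The comparison in Step 2.4 amounts to a Pearson chi-square test on a cluster-by-churn contingency table. The sketch below assumes a table with one row per cluster and the columns (stayed, churned); the function names are illustrative and any counts passed in are the caller's own.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of cluster and churn.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


def churn_rates(table):
    """Share of churners per cluster, given rows of (stayed, churned) counts."""
    return [churned / (stayed + churned) for stayed, churned in table]
```

Sorting the clusters by `churn_rates` identifies the segments to carry forward into the decision tree stage.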

Stage 3. Churn Prediction
In the third stage, churn prediction was performed for the clusters that demonstrated the highest churn rate. First, feature selection was conducted. Second, CHAID decision tree modelling was applied to the clusters with the highest churn rate using the variables selected by the feature selection and variable screening analysis.
Step 3.1. Variable selection. In this step, the variables that were most likely to be good predictors of churn were selected using chi-square statistics as the variable importance measure. This type of analysis does not assume any particular type or shape of the relationship between the predictors and the dependent variables of interest. Instead, it applies a generalized "notion of relationship" while screening the predictors, one by one, for regression or classification problems (TIBCO 2020). For continuous predictors, each predictor's range of values is divided into ten intervals by default to "fine-tune" the algorithm's sensitivity to different types of monotone and nonmonotone relationships. Since our research's dependent variable of interest was categorical, a chi-square statistic for each predictor variable was calculated. As a rule of thumb, the variables with chi-square values larger than ten were selected for the decision tree analysis.
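The screening rule in Step 3.1 can be sketched as follows. The example assumes equal-width intervals for continuous predictors (the text only states that ten intervals are used by default) and uses hypothetical function names; it is not the implementation of the software cited above.

```python
def bin_index(value, low, high, n_bins=10):
    """Map a value to one of n_bins equal-width intervals over [low, high]."""
    if high == low:
        return 0
    idx = int((value - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)  # the maximum falls into the last interval


def chi_square_score(values, labels, n_bins=10):
    """Chi-square of a binned continuous predictor against a categorical target."""
    low, high = min(values), max(values)
    classes = sorted(set(labels))
    counts = {}
    for v, y in zip(values, labels):
        b = bin_index(v, low, high, n_bins)
        counts.setdefault(b, {c: 0 for c in classes})[y] += 1
    n = len(values)
    class_totals = {c: labels.count(c) for c in classes}
    stat = 0.0
    for row in counts.values():  # empty intervals are simply skipped
        row_total = sum(row.values())
        for c in classes:
            expected = row_total * class_totals[c] / n
            stat += (row[c] - expected) ** 2 / expected
    return stat


def screen(predictors, labels, threshold=10.0):
    """Names of predictors whose chi-square score exceeds the rule-of-thumb threshold."""
    return [
        name
        for name, values in predictors.items()
        if chi_square_score(values, labels) > threshold
    ]
```

Here `predictors` is a dictionary mapping a variable name to its list of values, and `labels` holds the churn outcome for the same customers in the same order.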
Step 3.2. Decision trees' development. The CHAID algorithm was used to build decision trees to determine how predictor variables best explain churn behavior.
The CHAID decision tree is based on the chi-square test, which is used to find the relationship between variables. It finds the most significant variable and selects the best split at each step. Each pair of predictor categories is assessed to determine which pair is least significantly different concerning the dependent variable, and such categories are merged. Due to these merging steps, a Bonferroni adjusted p-value is calculated for the merged cross-tabulation. In the CHAID decision tree, the root node contains the dependent variable, which is split into two or more categories, called initial or parent nodes. These categories have a significant influence on the dependent variable. The root node is split into child nodes. The child nodes that are not further split are called the terminal nodes.
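The Bonferroni adjustment compensates for the number of ways in which predictor categories could have been merged. The sketch below uses the multipliers as they are commonly given for CHAID: a binomial coefficient for ordinal predictors and a Stirling number of the second kind for nominal predictors. The function names are illustrative, and a particular software package may implement the adjustment differently.

```python
from math import comb, factorial


def bonferroni_ordinal(c, r):
    """Ways to merge c ordered categories into r groups of adjacent categories."""
    return comb(c - 1, r - 1)


def bonferroni_nominal(c, r):
    """Ways to partition c unordered categories into r non-empty groups."""
    # Stirling number of the second kind via inclusion-exclusion.
    total = sum((-1) ** i * comb(r, i) * (r - i) ** c for i in range(r + 1))
    return total // factorial(r)


def adjusted_p(p_value, multiplier):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * multiplier)
```

For instance, merging a four-category nominal predictor into two groups can be done in `bonferroni_nominal(4, 2)` = 7 ways, so the unadjusted p-value of that split would be multiplied by 7 before predictors are compared.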
In the proposed approach, churn was the dependent variable, and the variables selected in the previous step were predictors. First, decision trees were developed for the whole database, and then separately for each cluster identified in the second stage. Second, decision trees were compared according to their accuracy. Only those decision trees with the highest ratio of churn and the best churn predictions were retained in further steps.
Step 3.3. Decision tree analysis. In the third step, decision trees developed for the clusters with the highest churn ratio were analyzed to determine the customers' characteristics according to their churn behavior.
Step 3.4. Rule extraction for churn management. In the final step, rules were extracted for churn management that could be useful for developing a relevant marketing strategy to retain the customers likely to churn.
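One way to feed extracted rules back into a customer database is to store each rule as a set of attribute conditions and tag every matching customer. The rule below is purely hypothetical and is not a result of this study; the field names follow the variable names of the dataset, while the structure and function names are assumptions made for illustration.

```python
# Hypothetical rule for illustration only; not a rule extracted in the paper.
HYPOTHETICAL_RULES = [
    {
        "name": "R1",
        "conditions": {"Contract": "Month-to-month", "InternetService": "Fiber optic"},
    },
]


def matches(customer, rule):
    """True if the customer satisfies every condition of the rule."""
    return all(
        customer.get(field) == value for field, value in rule["conditions"].items()
    )


def flag_customers(customers, rules):
    """Return copies of the customer records with the names of matching rules attached."""
    flagged = []
    for customer in customers:
        hits = [rule["name"] for rule in rules if matches(customer, rule)]
        flagged.append({**customer, "churn_rules": hits})
    return flagged
```

Customers with a non-empty `churn_rules` list could then be routed to the retention campaign designed for their segment.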

Stage 1. Data Preparation
The dataset used to demonstrate the proposed hybrid approach combining cluster analysis and decision trees was the Telco Customer Churn open dataset, which is available on the Kaggle platform at https://www.kaggle.com/blastchar/telco-customer-churn (accessed on 9 November 2021). This dataset was selected since it contains heterogeneous variables that reflect customer demographic characteristics, usage of services and billing behavior. Table 1 shows the variables used in the analysis. We divided the data into five groups: demographic variables, contracts and billing, additional services used, customer monetary value and tenure, and churn behavior. The prepared dataset includes 7032 customers with 20 variables.
The first group of variables contains demographic variables, and they are all binomial. The Gender variable is represented by two modalities, Female and Male, while the SeniorCitizen, Dependents and Partners variables are represented by two modalities, No and Yes.
The variables that describe the features of contracts and billing are Contract, PaperlessBilling and PaymentMethod. The Contract variable is related to the contract term of the customer. It is of a nominal type, and it has three modalities: month-to-month, one year and two years. The PaperlessBilling variable describes whether the customer has paperless billing. The PaymentMethod variable is a nominal-type variable with four modalities that show a customer's payment method: Bank transfer, Credit card, Electronic check or Mailed check.
The third group of variables describes additional services that the customers use. The InternetService variable is a nominal variable with three modalities, DSL, Fiber Optic and No. The DeviceProtection, OnlineBackup, OnlineSecurity, StreamingMovies, StreamingTV and TechSupport variables are all nominal-type variables with three possible modalities: No stands for not having contracted the service, No Internet service represents cases in which a customer does not use Internet service, and Yes refers to cases when customers use some of these services. The MultipleLines variable has three modalities: No describes cases when a customer does not use multiple phone lines, No phone service refers to instances in which a customer does not use phone services at all, and Yes is used when the customer has multiple phone lines.
The last group of variables describes customer monetary value and tenure. The Tenure variable is a numeric variable representing customer lifespan in months, that is, the number of months the customer stayed with the company. The variables MonthlyCharges and TotalCharges are numeric variables containing data on the amount charged to the customer, either monthly or in total (in USD).
The Churn variable is the dependent variable that is binomial and takes on two values, No and Yes, referring to customers who did not churn and customers who did churn, respectively.

Stage 2. Cluster Analysis
The k-means clustering algorithm was applied to group the customers according to the values of the 19 observed variables. All of the variables were included except for churn. Customers were assigned to a particular cluster using Euclidean distance as the distance measure. The maximum initial distance approach was used to estimate initial centroids (Lee and Han 2012). A 10-fold cross-validation approach was applied to find the optimal clustering solution, that is, the number of clusters with the lowest estimated error rate (intra-cluster variance), as successfully applied in previous research (Thomassey and Fiordaliso 2006). The Statistica (Version 13.05) software was used for the cluster analysis.
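The idea of choosing initial centroids that are far apart can be sketched with a greedy farthest-first traversal, followed by an evaluation of the squared-error (distortion) objective. This is an assumption about what a maximum initial distance approach looks like in general; the exact procedure implemented in Statistica may differ, and the function names are illustrative.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance between two observations."""
    return math.sqrt(sq_dist(a, b))


def farthest_first_centroids(points, k):
    """Pick k observations that are far apart (greedy farthest-first traversal)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Take the point whose distance to its nearest chosen centroid is largest.
        candidate = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(candidate)
    return centroids


def distortion(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)
```

Nominal variables would have to be converted to numeric form (for example, indicator variables) before such distances can be computed; how the software handled this is not stated in the text.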
Step 2.1. Determining the number of clusters. The optimal number of clusters was determined iteratively by utilizing 10-fold cross-validation. The cost sequence graph shows the error function for different numbers of clusters. As seen in the cost sequence graph (Figure 2), the best number of clusters was six since the error function decreased up to the cluster solution with six clusters.
Step 2.2. Cluster solution validity. In Table 2, the results of the ANOVA of the numeric variables used in the cluster analysis are shown for the solution with six clusters. All results of the ANOVA suggest that the null hypothesis, which states that the cluster means of the analyzed variables are equal, can be rejected. Table 3 shows the values of chi-square statistics for the testing of differences between clusters according to nominal variables for the solution with six clusters. The values of the chi-squared test statistics for all variables and the associated p-values indicate that the null hypothesis for each of the variables can be rejected. In other words, a conclusion can be made that differences between the clusters exist for all nominal variables. Both the ANOVA and the chi-square analysis support the decision to use six clusters in the cluster analysis.
Step 2.3. Cluster characteristics. Table 4 presents the cluster characteristics. The largest cluster contained 1520 observations, whereas the smallest cluster consisted of 516 observations. Demographic variables. The most common gender of customers was male for Cluster 1, Cluster 3, Cluster 4 and Cluster 5, and female for Cluster 2 and Cluster 6. Non-senior citizens were the most common in all of the clusters. For Clusters 1, 2, 3 and 5, the customers without partners were in the majority, while the customers in Clusters 4 and 6 mostly had partners. In Cluster 6, most customers had children or partners they financially supported, while the customers in all other clusters mostly did not have dependents.
Contracts and billing. The month-to-month type of contract was the most common for Cluster 2, Cluster 3 and Cluster 6. One-year contracts prevailed in Cluster 1 while the Two-year contract constituted the majority for Cluster 4 and 5. The most common payment method was electronic check for Cluster 2, Cluster 3 and Cluster 6. Credit card as a payment method was found to be related explicitly to Cluster 1, bank transfer was identified as the most common payment method for Cluster 4, and mailed check was related explicitly to Cluster 5.
Additional services. Most customers from Cluster 4 used device protection. In other clusters, that was not the case; in other words, customers from Cluster 1, 2, 3 and 6 did not use device protection. The type of internet service that prevailed in Cluster 1 and 2 was DSL internet, and fiber optic was most common in Clusters 3, 4 and 6. Cluster 5 is unique insofar as most of the customers did not use internet service at all. When it comes to using multiple phone lines, it is noticeable that in Cluster 1, customers usually did not use phone service at all, Clusters 2, 5 and 6 contained customers who usually did not use multiple lines, and Clusters 3 and 4 contained those who used multiple lines. Usage of online backups was common for Cluster 4 and Cluster 6.
Members of Cluster 5 mostly did not use internet service. Therefore, they usually did not use additional online backups, online security, movie streaming, TV streaming, or technical support. However, they did use phone service.
Usage of online backups, movie streaming, and TV streaming was common in Cluster 4 and Cluster 6. However, those clusters differed in terms of technical support. In Cluster 4, most customers used technical support, while in Cluster 6, the opposite was the case. When it comes to the other clusters, it is noticeable that online backup and streaming services were not used by the majority of customers in Clusters 1, 2 and 3. However, in Cluster 1, the majority of customers did use online security and technical support. Additionally, in Cluster 1, customers usually did not use phone service.
Customer monetary value and tenure. The length of a customer's stay with the telecommunication service provider, or tenure, was measured in months. The tenure was the longest on average for customers in Cluster 4 (59.47 months). Clusters 1 and 7 were similar in terms of the tenure of their customers. Since Cluster 2 contained the customers with the shortest average tenure (14.28 months), telecommunication companies should pay more attention to them and possibly offer them additional services and one- or two-year contracts. New customers were found to be particularly vulnerable; therefore, if their experiences are not satisfactory, the relationship is likely to be short. Customers who were satisfied with the service provider and had high cumulative satisfaction tended to stay for longer durations.
The highest average monthly charges were in Cluster 4, which is related to the number of additional services: the majority of customers in that cluster used many of them. Interestingly, relatively high average monthly charges were also found in Cluster 6. Those customers usually had month-to-month contracts and did not use some of the additional services. Telecommunication companies should consider additional offers or discuss signing long-term contracts to ensure that those customers stay. Customers from Cluster 3 had relatively high average monthly charges, even though the majority of them did not use additional services. Telecommunications providers should take some precautions regarding those customers because they may become dissatisfied with high charges.
Total charges were highest on average in Cluster 4, which was expected due to the contract's duration and monthly charges. On average, Cluster 5 had the lowest total charges due to the low monthly charges and the usage of only essential services such as phone service and one phone line.
Step 2.4. Relationship between clusters and churn. Table 5 presents the results of the chi-square analysis. They indicate that churn occurrence differed across the clusters at the 1% significance level (Pearson chi-square 1080.666; p-value < 0.001). In total, just over a quarter of the customers (26.58%) were found to be churners. The highest absolute and relative number of churn cases was identified in Cluster 3 (708 cases, 51.38%). Cluster 2 was second for churn occurrence, with 552 cases, or 39.15% of a total of 1410 customers in that cluster. Cluster 5, which was the largest (1520 customers), had the lowest occurrence of churn: 113 customers, or 7.43% of customers in that cluster. Cluster 6 was third according to churn occurrence (281 cases, or 33.77% of a total of 832 customers in that cluster). In all of these clusters, the majority of customers had a month-to-month contract.
It can be concluded that the Cluster 2 and Cluster 3 customers were those with the highest churn occurrence (Figure 3). These clusters were selected to be used in further analyses, with decision tree analysis being utilized for churn prediction.
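The test of independence between cluster membership and churn can be illustrated with a short sketch. It is not the computation behind Table 5: it uses only the four clusters whose counts are quoted in the text, so it will not reproduce the reported statistic of 1080.666. The size of Cluster 3 (1378) is an assumption, back-calculated from 708 churners at 51.38%, and the function name is hypothetical.

```python
from typing import Dict, Tuple


def chi_square_independence(table: Dict[str, Tuple[int, int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a
    cluster x (churners, non-churners) contingency table."""
    row_totals = {name: churn + stay for name, (churn, stay) in table.items()}
    col_totals = (
        sum(churn for churn, _ in table.values()),
        sum(stay for _, stay in table.values()),
    )
    n = sum(row_totals.values())
    statistic = 0.0
    for name, observed in table.items():
        for j, obs in enumerate(observed):
            # Expected count under independence of cluster and churn
            expected = row_totals[name] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (2 - 1)
    return statistic, dof


# (churners, non-churners) per cluster, taken from the counts quoted above.
# Assumption: Cluster 3 has 1378 customers (708 / 0.5138, rounded).
counts = {
    "Cluster 2": (552, 1410 - 552),
    "Cluster 3": (708, 1378 - 708),
    "Cluster 5": (113, 1520 - 113),
    "Cluster 6": (281, 832 - 281),
}

statistic, dof = chi_square_independence(counts)
CRITICAL_1PCT_DF3 = 11.345  # chi-square critical value for alpha = 0.01, df = 3
print(f"chi-square = {statistic:.3f}, df = {dof}")
print("significant at 1%:", statistic > CRITICAL_1PCT_DF3)
```

With all six clusters in the table, the degrees of freedom would be five and the critical value would change accordingly; an exact p-value would require a chi-square distribution function from a statistics library.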

Stage 3. Churn Prediction
In the third stage, the variables were selected using the Feature Selection and Variable Screening features delivered by the Statistica software (Version 13.05). Decision trees were developed for the clusters with the highest churn rate using the SPSS software (Version 22).
Step 3.1. Variable selection. In this step, the variables were screened to select those used for the churn prediction step. Table 6 contains the Feature Selection and Variable Screening results. The chi-square and p-values indicated that the most important predictor of churn was the Contract variable (chi-square 1179.543; p-value < 0.001), followed by Tenure (chi-square 873.717; p-value < 0.001). The least important predictors of churn were the Gender and PhoneService variables. Therefore, they were not selected to be used in further analyses. All of the other 17 variables were selected to construct the CHAID decision trees.
Step 3.2. Decision tree development. CHAID decision trees were generated at the level of the entire dataset, as well as for the separate clusters. Independent variables were selected in the previous step. The dependent variable was the binary churn variable. Table 7 presents the percentages of correct classifications for the decision trees. The decision tree of Cluster 3 was identified as the most successful in predicting churners as compared to the decision trees of other clusters. It correctly classified 81.4% of all churners. The decision tree for Cluster 2 was also successful in its prediction, with 70.7% of churners correctly classified. Since Cluster 2 and Cluster 3 had the highest prediction accuracy for churn and were at the same time the clusters with the highest ratio of churn customers, it is recommended that the marketing department conduct the churn analysis solely for these groups of customers.
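The percentage of correctly classified churners is a per-class measure (the share of actual churners that the tree labels as churners), which is not the same as overall accuracy. The sketch below shows how such a figure can be computed; the function name and the toy data are hypothetical and are not taken from Table 7.

```python
from typing import Sequence


def percent_correct_by_class(
    actual: Sequence[int], predicted: Sequence[int], label: int
) -> float:
    """Percentage of cases whose true class is `label` that were
    also predicted as `label`."""
    total = sum(1 for a in actual if a == label)
    if total == 0:
        raise ValueError("no cases with the requested label")
    hits = sum(1 for a, p in zip(actual, predicted) if a == label and p == label)
    return 100.0 * hits / total


# Toy data (hypothetical): 1 = churner, 0 = non-churner
actual = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]

churner_rate = percent_correct_by_class(actual, predicted, label=1)
non_churner_rate = percent_correct_by_class(actual, predicted, label=0)
print(f"churners correctly classified: {churner_rate:.1f}%")
print(f"non-churners correctly classified: {non_churner_rate:.1f}%")
```

Reporting both figures side by side makes it visible when a tree identifies churners well at the cost of misclassifying many loyal customers.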
Step 3.3. Decision tree analysis. Figure 4 presents the decision tree developed for the customers in Cluster 2, which had three levels and 23 nodes, of which 13 were terminal (leaf) nodes. Figure 4 reveals that the Tenure, Internet service, Contract, MultipleLines, MonthlyCharges, PaperlessBilling and TechSupport variables were statistically significant and were used in the classification tree produced by the CHAID algorithm.
The variable used for branching on the first level was the Tenure variable. This branching resulted in four new nodes (Node 1, Node 2, Node 3 and Node 4). Node 1 covered the category ≤1.000 (a tenure of at most one month) and consisted of customers who were mostly suspected of churning. Node 2 included customers who stayed with the company for between one and five months. It had slightly more customers suspected of churning than not suspected of churning. Node 3 included customers who had been with the company for between five and seventeen months. Node 4 included those who had been with the company for more than seventeen months. In both of them, the share of suspected churners was found to be greater than the share of non-suspected churners.
The Internet service variable was used for branching Node 1, Node 2 and Node 3 on the second level, while the Contract variable was used for branching Node 4. Node 5 and Node 6 were derived from Node 1. Node 5 was related to the category of customers who used fiber optic. It is noticeable that the majority of them were connected with suspected churn. Regarding Node 6, which was related to the category of customers who used DSL, the share of customers suspected of churning was larger than the share not suspected of churning. However, the ratio between suspected churners and non-churners was much higher for fiber optic users. Similar results were shown for Node 7 and Node 8, which were derived from Node 2. Node 7 had a high share of suspected churners. Node 8 contained customers who used DSL, and it contained a higher share of suspected non-churners than suspected churners. Node 3 branched into Node 9 and Node 10. For Node 9, which contained fiber optic customers, there was a slightly larger share of suspected churners than non-churners. Node 10 showed a significantly larger share of suspected non-churners compared to churners. Node 4 had two branches, Node 11 and Node 12. Node 11 contained customers with a month-to-month contract, and Node 12 contained customers with one-year and two-year contracts. In both of the nodes, there was a larger share of suspected non-churners compared to suspected churners. However, the share of non-churners for Node 11 was larger compared to Node 12.
The third level branching variables were MultipleLines, MonthlyCharges, PaperlessBilling and TechSupport. The MultipleLines variable was used to further branch Node 6 into two nodes (Node 13 and Node 14). Node 13 consisted of customers who did not have multiple lines. Node 14 consisted of customers who had multiple lines or who did not have phone service. For Node 14, there was a significantly larger share of suspected churners compared to non-churners.
The MultipleLines variable was also used to branch Node 10 into two nodes (Node 17 and Node 18) in which, for both cases, there was a larger share of suspected non-churners than churners. Node 9 was branched using the MonthlyCharges variable (Node 15 and Node 16). Node 15 contained customers who were charged 55.25 dollars or less per month. In that node, more than half of the customers were suspected of churning. In Node 16, which contained customers who paid more than 55.25 dollars per month for telecommunication services, there was a significantly larger share of non-churners than churners. The PaperlessBilling variable was used to branch Node 11 into two nodes (Node 19 and Node 20). In both of those nodes, there was a larger share of non-churners compared to churners. There was a slightly higher share of churners in Node 19 compared to Node 20. Finally, Node 12 was branched into two nodes (Node 21 and Node 22) using the TechSupport variable. Node 21 contained customers who used technical support as an additional service. There was also a higher share of churners in Node 21 compared to Node 22. In both of the nodes, there was a higher percentage of non-churners compared to churners.
Figure 5 presents the decision tree developed for the customers in Cluster 3. The CHAID decision tree developed using the data on customers from Cluster 3 had three levels and 22 nodes, of which 13 were considered terminal. Figure 5 reveals that Tenure, MultipleLines, MonthlyCharges, PaymentMethod, StreamingMovies, SeniorCitizen, and TechSupport were statistically significant, and these were used to build the classification tree using the CHAID algorithm.
J. Risk Financial Manag. 2021, 14, x FOR PEER REVIEW 17 of 25
The variable used for branching on the first level was the Tenure variable. This branching resulted in five new nodes (Node 1, Node 2, Node 3, Node 4 and Node 5). Node 1 covered the category ≤1.000 (a tenure of at most one month) and consisted of customers who were mostly suspected of churning. Node 2 included customers who stayed with the company for between one and seven months. It had significantly more customers suspected of churning than not suspected of churning. Node 3 included customers who had been with the company for between seven and fifteen months. It had more customers suspected of churning than not suspected of churning. Node 4 included those who had been with the company for between fifteen and 51 months, and Node 5 included customers who had been with the company for more than 51 months. In both of them, the share of suspected churners was greater than the share of non-suspected churners. However, in Node 5, the share of non-churners was significantly larger.
The second level branching variables included the MultipleLines variable for Node 2, the MonthlyCharges variable for Node 3 and the PaymentMethod variable for Node 4 and Node 5. According to Figure 5, branching resulted in eight nodes. Node 6 and Node 7 were derived from Node 2. Node 6 was related to the category of customers who used multiple phone lines. It is noticeable that a majority of them were associated with suspected churn.
Regarding Node 7, which was related to the category of customers who did not use multiple lines, the share of customers suspected of churn was slightly larger than the share who were not suspected of churn. However, the ratio between suspected churners and non-churners was much higher for customers who used multiple lines. Furthermore, Node 8 and Node 9 were derived from Node 3, which was branched using the MonthlyCharges variable. Node 9, in which the monthly charges were higher than 80 dollars per month, had a higher share of suspected churners.
Node 4 was branched into Node 10 and Node 11 using the PaymentMethod variable. Node 10 contained customers who paid for their service by mailed check, bank transfer or credit card, whereas Node 11 contained customers who paid by electronic check. Node 10 contained a significantly larger share of non-churners than churners. Node 11 contained a slightly larger share of churners than non-churners. Additionally, Node 5 was branched into Node 12 and Node 13 using the PaymentMethod variable. Node 13, which contained customers who paid for their service by electronic check, had a larger share of churners than Node 12.
The third level branching variables included the PaymentMethod variable for Node 8, the StreamingMovies variable for Node 9, the SeniorCitizen variable for Node 10 and the TechSupport variable for Node 11. Node 8 was branched further into two nodes (Node 14 and Node 15). Node 14 consisted of customers who paid for their service by mailed check, bank transfer or credit card. For Node 14, there was a significantly larger share of non-churners compared to churners. Node 15 consisted of customers who paid for their service by electronic check. More than 50% of customers from Node 15 were churners. The StreamingMovies variable was used to branch Node 9 into two nodes (Node 16 and Node 17). In both, there was a larger share of suspected non-churners compared to churners, but the share was larger for Node 16, which contained customers who used movie streaming services. Node 10 was branched using the SeniorCitizen variable (Node 18 and Node 19). For the SeniorCitizen variable, the value 0.000 represented non-senior customers, and 1.000 represented senior citizens. Therefore, Node 18 contained non-senior customers. A significantly larger share of these customers were not suspected of churning. Node 19, which contained senior citizens, contained a larger percentage of churners than Node 18. The TechSupport variable was used to branch Node 11 into two nodes (Node 20 and Node 21). Node 20 was related to customers who did not use technical support. More than 50% of them were suspected of churning. Finally, in Node 21, which contained customers who used technical support as an additional service, a significantly higher percentage of customers were suspected of being non-churners than churners.
Step 3.4. Rule extraction for customer relationship management. Decision trees result in rules that can identify particular groups of customers who require special attention. In this research, the rules were used to identify groups of customers who are likely or unlikely to churn. These rules could be fed back to the customer database and be used for targeted marketing campaigns, e.g., providing incentives to the customers who were likely to churn.
The rules that predicted that customers in Cluster 2 and Cluster 3 were likely to churn are provided in Appendix B. The rules described the specific characteristics of customers who belonged to one of the terminal nodes. For example, the rule indicated that the customers in Node 6 of Cluster 2 had a tenure shorter than one year, used fiber optic, and had an 88.17% probability of churning. In such cases, the company may decide to offer incentives to this specific group of customers. Furthermore, since these customers were using fiber optic to connect to the Internet, the quality of the fiber optic connection could be further investigated for these specific customers to determine whether there were systematic problems with their internet connections.
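As an illustration of how such rules could be fed back to a customer database, the sketch below encodes the example rule quoted above and flags matching customers. Everything about the encoding is an assumption: the `ChurnRule` class, the `flag_customers` function, the field names (`customerID`, `cluster`, `tenure`, `InternetService`), the reading of "shorter than one year" as fewer than 12 months, and the toy customer records. The actual rules are those listed in Appendix B.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ChurnRule:
    """One terminal-node rule (hypothetical encoding)."""
    name: str
    cluster: int
    applies: Callable[[Dict[str, Any]], bool]
    churn_probability: float


# Example rule from the text: Cluster 2, tenure shorter than one year,
# fiber optic internet, 88.17% churn probability.
# Assumption: tenure is stored in months, so "one year" is 12.
rules = [
    ChurnRule(
        name="cluster2_short_tenure_fiber",
        cluster=2,
        applies=lambda c: c["tenure"] < 12 and c["InternetService"] == "Fiber optic",
        churn_probability=0.8817,
    ),
]


def flag_customers(
    customers: List[Dict[str, Any]],
    rules: List[ChurnRule],
    min_probability: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Return (customer id, rule name, churn probability) for every customer
    matched by a rule of their own cluster with a high enough probability."""
    flagged = []
    for customer in customers:
        for rule in rules:
            if (
                customer["cluster"] == rule.cluster
                and rule.churn_probability >= min_probability
                and rule.applies(customer)
            ):
                flagged.append((customer["customerID"], rule.name, rule.churn_probability))
                break  # one matching rule per customer is enough
    return flagged


# Toy customer records (hypothetical)
customers = [
    {"customerID": "A-001", "cluster": 2, "tenure": 3, "InternetService": "Fiber optic"},
    {"customerID": "A-002", "cluster": 2, "tenure": 30, "InternetService": "Fiber optic"},
    {"customerID": "A-003", "cluster": 5, "tenure": 3, "InternetService": "No"},
]

flagged = flag_customers(customers, rules)
print(flagged)
```

Keeping the cluster label on each rule reflects the segment-specific nature of the trees: a rule derived for Cluster 2 is not applied to customers from other segments.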

Theoretical Implications
This research explored the possibilities of using a hybrid data mining approach for churn prediction in telecommunications. A three-stage approach was used.
First, the database was prepared with the following variable groups: demographic characteristics, usage of additional services, contracts and billing, monetary value and failure, and churn. This research used the Kaggle churn dataset, which contained all the variable groups that were considered relevant to market segmentation.
Second, k-means cluster analysis was conducted using 20 independent variables, resulting in a six-cluster solution. The identified clusters were compared according to the percentage of churning customers. Two clusters with the highest churn ratios were identified. Cluster 2 was the second-largest cluster, which was composed of mostly female, non-senior citizens, without dependents and partners, with month-to-month contracts with paperless billing enabled, who used DSL internet services and phone services, and had an average tenure of 14 months, which was the shortest relative to the other clusters. Cluster 3 was the third-largest cluster, with members who were primarily male, non-senior citizens without partners and dependents, who used month-to-month contracts with paperless billing, paid relatively high monthly charges for telecommunication services by electronic check, most commonly used fiber optic for their internet service, had multiple phone lines, and were with the company for less than two years. These clusters indicate that churn in telecommunications is mainly related to tenure, that is, to the number of months a customer has stayed with a company. Based on this, the reasons for churn can be explained in terms of a bad initial experience, low satisfaction with the services, the use of trial periods or prepaid accounts that expire automatically, or perhaps loyalties to multiple companies.
Third, decision trees, with Churn as a dependent variable, were developed using the whole database, and were developed separately for each cluster. It was discovered that the churn forecasts obtained for different clusters, i.e., marketing segments, could be drastically different. This result indicates that the use of cluster analysis as the starting point of decision tree analysis can make it possible to better understand the elements that influence client behavior. In our case study, the decision trees for the clusters with the highest churn rates (Cluster 2 and Cluster 3) also performed the best in terms of forecasting the churn behavior. Decision tree rules were extracted to identify specific groups of customers who were likely to churn in these clusters. Tailor-made marketing campaigns can be designed to incite these specific customers to stay loyal to the telecommunication company.
A theoretical implication of our findings is that a hybrid approach using k-means clustering and CHAID decision trees is appropriate for churn modelling in the telecommunications industry. Additionally, the study brings added value to the literature on customer churn prediction. Understanding the predictors of customer churn behavior would help telecommunication companies in terms of their churn management by making it possible to prevent customers from engaging in churn behavior. The proposed approach to churn management is novel, and it differs from other studies on churn modelling in the telecommunications industry. First, most researchers have used accuracy-oriented approaches to churn modelling in telecommunications. They focus on the performances of the algorithms used for churn modelling while lacking an explanation of the variables identified as relevant for churn behavior or practical recommendations regarding churn management systems. For example, Pamina et al. (2019) compared the k-nearest neighbor, random forest and XGBoost algorithms, and Ahmad et al. (2019) compared the performances of the random forest, gradient boosted machine tree and XGBoost algorithms. Second, hybrid approaches to churn modelling in telecommunications are scarce. In terms of clustering methods and decision trees, Ullah et al. (2019) used classification methods and k-means clustering, and Preetha and Rayapeddi (2018) used logistic regression, random forest and k-means clustering for customer churn prediction. Choudhari and Potey (2018) used the decision tree, logistic regression and fuzzy unordered rule induction algorithm (FURIA) with fuzzy c-means algorithms. Our hybrid approach includes k-means clustering for customer segmentation and CHAID decision trees, which explain customer churn.
Although other researchers have previously proposed the combined usage of clustering and classification methods to improve churn prediction, this paper focuses on the marketing aspects of the churn analysis and prediction process.

Practical Implications
Based on the presented hybrid approach, we developed the following recommendations for improving churn management.
First, the development of a continuous churn management process (Figure 6) is recommended. Churn management should not be a one-time action. Instead, the inclusion of the market segmentation in the customer database is recommended for use in future customer relationship programs. In addition, the effectiveness of churn rules can be measured, and rules can be further used or discarded based on their effectiveness.
Second, a precise approach to designing incentives is recommended for the customers who are likely to churn, thus expanding the base of loyal customers who are less likely to churn. The proposed hybrid approach, which segments customers into similar groups and predicts churn separately for each segment, is well suited to designing tailored incentives. Companies can also identify the customers classified as false positives and focus particular attention in their churn marketing campaigns on the customers who have similar characteristics.
Third, although our research is based on a customer base from one telecommunications provider, we strongly suggest other telecommunications companies develop their churn prediction models using their own databases, thus tailoring the proposed approach to their specific situation.
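The first recommendation mentions measuring the effectiveness of churn rules and discarding ineffective ones. The paper does not define an effectiveness measure, so the sketch below assumes one possible choice: the share of customers flagged by a rule in a past period who actually churned. The function name, the 0.5 threshold, and the toy outcomes are hypothetical.

```python
from typing import Dict, List, Tuple


def review_rules(
    outcomes: Dict[str, List[bool]], min_hit_rate: float = 0.5
) -> Tuple[List[str], List[str]]:
    """Split rules into (kept, discarded) by the observed churn rate among
    the customers each rule flagged. `outcomes` maps a rule name to one
    boolean per flagged customer (True = the customer churned)."""
    kept: List[str] = []
    discarded: List[str] = []
    for name, churned in outcomes.items():
        # A rule that flagged nobody has no evidence in its favor
        hit_rate = sum(churned) / len(churned) if churned else 0.0
        (kept if hit_rate >= min_hit_rate else discarded).append(name)
    return kept, discarded


# Toy outcomes for two hypothetical rules
outcomes = {
    "rule_a": [True, True, False],          # 2 of 3 flagged customers churned
    "rule_b": [False, False, True, False],  # 1 of 4 flagged customers churned
}

kept, discarded = review_rules(outcomes)
print("kept:", kept)
print("discarded:", discarded)
```

If flagged customers received retention incentives, their observed churn no longer reflects the rule alone, so in practice a comparison group that was flagged but not contacted would be needed.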

Conclusions
This study aimed to explain customer churn in the telecommunications industry by following a hybrid methodology for churn analysis. Four groups of variables were used to explain customer churn behavior: demographic variables, additional services, contracts and billing, and monetary value and failure. Cluster analysis based on the k-means algorithm was used to detect clusters with the highest churn occurrence. Cluster analysis as the starting point of decision tree analysis also helped to better understand the variables that influenced client behavior. The CHAID algorithm was used to generate decision trees for two clusters with the highest churn occurrence.
The research contributes to the literature on churn management using machine learning. Contrary to the majority of research, in which one method is applied for churn prediction, we propose a hybrid approach that combines clustering and classification. The combined approach resulted in the increased accuracy of classification for the clusters with the highest churn ratio, and at the same time provided in-depth knowledge about the market segment containing the customers who were the most likely to churn. Our results reveal that churn in telecommunications is mainly related to the Tenure variable, with the customers most likely to churn being those who have been with the company for the shortest period. Characteristics of the clusters with the highest level of churn indicate that churn can be explained in terms of a bad initial experience and low satisfaction with the services, and, in this study, churn was mostly associated with customers using trial periods or prepaid accounts. However, these results were tested only on the churn case study dataset, and tailor-made analyses should be planned for each company, considering that each telecommunication company has specific circumstances.
The research limitations are as follows. First, the research was based on public data from only one telecommunication company from one country. The suggested approach showed its usefulness in the case of Telco Customer Churn data from Kaggle. However, the specific rules and clusters identified by our analysis may not be relevant for other telecommunication companies when using their customer databases. The analysis of such databases could be relevant for the generalization of the proposed hybrid churn management approach. Therefore, its performance and applicability should be tested based on other data. Second, this research used only one method for clustering (k-means) and one method for classifying customers according to churn (CHAID decision trees). Although these methods are well-known and often used for churn management, future research should conduct analyses using other methods from the same families (e.g., EM clustering instead of k-means).
The proposed framework should be applied to customer downward migration, which refers to the decreasing of customer value over time due to the decreased usage of services, considering that significantly more value may be lost over time through downward migration than is lost through churn (Bayer 2010). Since customer migration and churn have similar causes, the proposed approach is likely to be useful for the prediction of downward migration. Furthermore, the proposed hybrid approach could be applied to other customer-centric industries, such as financial services, retail and health care, in future research.