An Empirical Study on Customer Churn Behaviours Prediction Using Arabic Twitter Mining Approach

Abstract: With the rising growth of the telecommunication industry, the customer churn problem has grown in significance as well. One of the most critical challenges in the data and voice telecommunication service industry is retaining customers, and thus reducing customer churn by increasing customer satisfaction. Telecom companies have depended on historical customer data to measure customer churn. However, historical data do not reveal current customer satisfaction or the future likelihood of switching between telecom companies. The related research reveals that many studies have focused on developing churner prediction models based on historical data. These models suffer from delays and lack the timeliness needed to target customers in real time. In addition, these models cannot tap into Arabic-language social media for real-time analysis. As a result, a customer churn model based on real-time analytics is needed. Therefore, this study offers a new approach that uses social media mining to predict customer churn in the telecommunication field. This is the first work to use Arabic Twitter mining to predict churn in Saudi telecom companies. The newly proposed method proved its efficiency on various standard metrics and in a comparison with the ground-truth outcomes provided by a telecom company.


Introduction
Global competition for telecommunication services drives companies to enhance their customers' satisfaction. Extensive research correlates customer satisfaction with customer loyalty and customer churn [1][2][3]. In the telecommunication field, customer churn is defined as customers transferring from one telecom company to another [4]. Recent research shows that acquiring a new customer costs more than keeping an existing one [5]. Thus, companies are more concerned with keeping customers than ever before. Hence, as the literature review section shows, many customer relationship management (CRM) studies in various industries have addressed customer retention and developed efficient models to predict churners. This paper addresses the following problems related to customer churn prediction models:

• Current churn prediction models have a relatively short life, as they rely on customers' historical data. These data become less valuable for making predictions over time [6], so they may not give telecom companies the best churn prediction experience.
• There is a lack of research that integrates a structured data framework with real-time analytics to target customers in real time [7].
• Current churn prediction models exclude location and language factors, which causes geographical and cultural sampling errors [8].
The main contributions of this paper are as follows:
1. It is the first work using Twitter mining to predict potential customer loss (churn) in Saudi telecom companies.
2. It identifies and evaluates the main gaps in current churn prediction models.
3. It proposes and evaluates a novel design of a churn prediction model that addresses these gaps by providing a real-time method suited to telecom data and an Arabic dataset.
4. It contributes to the Arabic sentiment analysis (ASA) research community by applying state-of-the-art techniques in new experiments on a relatively new, unexplored and extensive Arabic dialect dataset.
This study thus answers the following research question (RQ): RQ: Is it possible to predict the customer churn of telecommunication companies in Saudi Arabia by analysing customers' tweets?
This paper first reviews the related research, then explains the methodology used, and finally discusses the results.

Related Research
To answer the RQ above, we explored the areas related to this research: customer satisfaction, customer churn and social media mining.

Predicting Customer Churn and Data Mining Techniques
Customer satisfaction and customer churn have been identified as two factors that contribute to industry success and are therefore hot research topics in various industries, such as telecommunication [22], medicine [23] and tourism [24]. A more recent study states that customer churn prediction requires customer behaviour analysis [25]. Churn management, identified as the process of keeping existing customers [26], has always played a vital role in CRM in the telecommunication industry [6,27]. Because few studies have used social media mining as an input variable to a churn prediction model, we did not find research close in content to ours. Therefore, we discuss here the studies that developed churn prediction models based on historical data or other parameters using different techniques, as shown in Table 1. The existing churn prediction studies were critically evaluated, through a full-text assessment of the articles (Table 1), to find the gaps that would allow us to answer the RQ. The table briefly compares the aim, data set, algorithm(s), results and future work of the reviewed studies.
In the following, the studies in Table 1 are analysed in more detail. The only other study to use tweet sentiment analysis for business [28] found a relationship between the sentiment of tweets about Telkom's broadband internet service and the customer churn rate. They applied a long short-term memory (LSTM) model for sentiment analysis, and their results showed that monitoring negative sentiment can improve churn prediction, with a mean absolute percentage error (MAPE) of about 1.47%. However, their study did not use social media mining to develop a churn prediction model. Related research mainly uses company-provided data for churn prediction; whilst this is a useful source, it is not always available.
Therefore, to our knowledge, there is a lack of studies that use social media mining as an input variable for customer churn prediction models. This indicates a knowledge gap in demonstrating how social media mining can predict customer churn in various industries. A significant finding of our analysis is that the reviewed literature on churn prediction models shows social media mining to be a powerful tool for assessing customer satisfaction and predicting customer churn.

New Customer Churn Model Variables
Next, we show how our customer churn variables were chosen. These data and parameters were gathered from three sources, in sequence: the literature review; a questionnaire and interviews with telecom company experts (Table 2); and the customer satisfaction rate obtained from mining customer tweets. The variables can be divided into two types: independent variables (predictors), which are all the variables collected as inputs for the prediction model, and the dependent variable, the churn status, which is the model outcome. This section explains in detail, based on the literature, where and why we use these specific variables.
Using customer demographics (age and gender) as predictors in churn prediction models is common in the literature [6,7,25,29–36]. It has been found that young people below forty-five years of age are more likely to churn. Related results were reported by [33,36]: customers between forty-five and forty-eight years old are more likely to churn.
Many researchers have studied how a family member or friend leaving the same telecom company affects a customer's churn decision [8,33]. This is because calls between two customers with different voice providers cost more.
Consistent with this result, reference [37] showed that a customer is more likely to churn if they have a social relationship with another customer who intends to churn, or has already churned, from the telecom company. This finding implies that a company risks losing a customer when someone in that customer's social circle leaves the company. Moreover, references [35,38,39] used calling behaviour and network interaction (call length and number of calls) as churn predictors.
Some studies have recognised the impact of social network information on churn prediction. For instance, reference [8] predicted customer churn by using customer information together with social network information. Their dataset came from the Pokec social network (http://snap.stanford.edu/data/soc-pokec.html, accessed on 23 June 2021) and the call details of customers issued from the network over an interval of six months. They found that combining social network information with call log details improved churn prediction. Similar results were obtained by [40], who studied the impact of the social network on customer churn prediction; they combined call details from a social network with information about the customers. Moreover, reference [38] used a relational learner to increase the performance of the churn prediction model, analysing calling behaviour and network interaction.
Table 1. Comparison of the reviewed churn prediction studies (data set, technique(s), results and future work).
Ref. | Data set | Technique(s) | Results / future work
[41] | Customer data from the SyriaTel telecom company | Decision tree, random forest, gradient boosted machine tree and extreme gradient boosting (XGBOOST) | The best results were obtained with the XGBOOST algorithm, with an area under the curve (AUC) of 93.3%
[7] | Unstructured data: (1) details of customer complaints and feedback; (2) captured data records, such as purchases and app downloads | RFM technique | Recommended integrating a structured data framework with real-time analytics to target customers in real time on the basis of location, time, etc.
[42] | Available historical records extracted from the telecom industry | Logistic regression and decision trees in R | Data mining techniques could be a promising solution for customer churn management
[43] | Two telecom industry datasets: Type-1 (3333 records) and Type-2 (20,468 records) | Axiomatic fuzzy set theory and parallel density-based spatial clustering on the Hadoop MapReduce framework | The proposed model is more efficient than the existing system in terms of time and performance
[2] | Customer dataset available online at Kaggle (https://www.kaggle.com/, accessed on 23 June 2021) | Different classifiers implemented in WEKA | Bagging and the SMO algorithm performed best, with an accuracy of 99.8% using 14 attributes
[3] | 153,651 distinct tweets for the Twitter handles of five popular telecom brands in India | Semantic analysis | Showed that sentiment analysis can help manage the higher growth rate of new subscribers added to a brand during the study period
[28] | Tweets related to Telkom's broadband internet service, and historical churn rate data from the company's data warehouse | Sentiment analysis using an LSTM recurrent neural network | The accuracy of churn rate predictions (based on the previous three months) correlated with negative moods
[8] | Pokec social network data and synthetic call log details for 25,000 users | Influence maximisation | Future work: analyses should factor in both location and language to avoid geographical and cultural sampling errors
Different studies used contract length as a churn predictor [6,33,36,39,45], concluding that customers with contract lengths between twenty-five and thirty months are more likely to churn. Many studies combine contract length and overdue bills as churn predictors. Reference [36] found that customers with contract lengths between twenty-five and thirty months and four overdue bills are more likely to churn. In agreement with this result, reference [33] concluded that churn is more common among customers with contract lengths between twenty-five and thirty months who have more than four overdue payments within six months. Reference [32] chose five attributes to predict churn, one of which was unpaid balances.
Most of the analysed studies use customer call details as the primary churn predictor [37]. Reference [30] assessed the effect of transforming categorical and continuous data on the performance of the churn prediction model. Their dataset came from a European telecommunication company, and the variables they selected included the number of minutes of outgoing calls and the number of contacts with the call centre. In addition, reference [6] compared several techniques used in churn modelling on a dataset from a UK mobile telecommunication company; one of their variables was call usage detail.
In 2016, reference [42] proposed a churn prediction model for telecommunication companies using the company's historical records, with attributes including phone and call details. Reference [31] applied rule-based classification to predict whether a customer is likely to churn; their dataset contained customer information such as call details (billing information and length of calls). Furthermore, reference [2] applied different data mining techniques to predict customer churn on an online dataset from Kaggle, using fourteen attributes, including call details, customer service calls and phone number. Reference [25] built a churn prediction model for a mobile telecommunication company using two datasets: customer information and statistical data containing call length and complaint information. Reference [32] assessed many techniques to predict customer churn using a dataset from an Indian telecommunication company. They chose five attributes to predict churn, such as customer dissatisfaction and satisfaction, switching costs, quality of services, service usage in terms of minutes used in calls, call details and unpaid balances. They also used customer-related variables, such as gender, customer status and whether a customer is an active user. Reference [46] concluded that customers with no active plans and no incoming or outgoing calls within six months are likely to churn.
In addition, reference [47] predicted customer churn in the telecommunication industry based on rough set theory. Using historical data from a publicly available dataset, they found some essential attributes for customer churn prediction, such as evening minutes, customer service calls and day minutes. Reference [45] proposed a customer churn prediction model using different data mining techniques and customer information such as contract length, customer complaints and call details. Reference [48] used three hybrid models over two stages, data clustering and churn prediction, on three months of call data collected from customers of a Jordanian telecommunication company. Reference [49] predicted customer churn from call details and contract information gathered through interviews with telecom experts. Reference [7] proposed a model for predicting high-value customers and churners using customer information such as age, sex and call details. Numerous studies have recognised the importance of including customer complaints as an attribute in their churn prediction model [6,7,25,31,32,34,35,39,41,43,45,46,50].
After reviewing the literature, we listed the most common techniques in Table 3. As the literature shows, decision trees and logistic regression are the most common techniques used in churn prediction models. A decision tree offers a graphical representation of the relations between churn variables [51]. CART and CHAID are examples of algorithms used to build a decision tree [52]. Both logistic regression and decision trees are effective and simple techniques for predicting churn and analysing the characteristics that cause it [36,42,53,54]. However, decision trees have some disadvantages, such as being affected by complex relations between variables [71]. The next most common technique in the literature is the neural network, whose limitations include its need for an extensive dataset and long training times [42]. Support vector machines and naïve Bayes were also used.

Methodology
The two known types of customer churn are voluntary and involuntary [72]. Churn is voluntary when a customer decides on their own to move to another telecom company, and involuntary when a customer stops using the company's services for reasons outside their influence, such as death or a change of job [73]. The literature usually focuses on voluntary churn because it describes the relationship between a customer and a company. There are two customer payment schemes: post-paid and pre-paid [74]. Post-paid customers receive a monthly bill for company services, while pre-paid customers are charged in advance.
In this study, a churner is defined as a post-paid customer who voluntarily leaves the company and stops telecom services within our time window. By contrast, a non-churner in our study is a post-paid customer who remains with the company within our time window.
Data mining refers to knowledge discovery from large databases [75]. The three most common methodologies used to develop data mining models are knowledge discovery in databases (KDD) [76], the cross-industry standard process for data mining (CRISP-DM) [77,78], and sample, explore, modify, model, assess (SEMMA), which was created by the SAS Institute (Inc. SI. SAS version 9.1., 2005, Wake County, NC, USA). The literature review indicated that KDD and CRISP-DM are more widely used than SEMMA [79,80]. Although KDD has nine phases and CRISP-DM has six, their phases are equivalent [79]. Because CRISP-DM is appropriate for a business domain [81], we adopted the CRISP-DM steps [78] that suit our task to develop our churn prediction model (SentiChurn model, Figure 1). The six phases of CRISP-DM are shown in Figure 2.
As Figure 1 shows, the first phase, data set construction, covers defining and collecting the variables. Defining the variables entails collecting them from their sources (Figure 3). We collected the variables that can differentiate between churners and non-churners, and that serve as inputs to our model (Table 2), from three sources. First, we collected variables from the literature review. Some variables found in the review were disregarded because privacy concerns make them difficult to obtain from telecom companies, such as name, phone number and code, call details and billing information. This is the case with many prediction model systems in other countries [46]. Next, we surveyed telecom customers with a questionnaire, which aimed to test the relationship between the collected variables and churning behaviour from the customer's point of view. Afterwards, we conducted an informal interview with a Saudi telecom expert (a telecom business consultant) to present the collected variables and to ask about other variables from the company's point of view.
The telecom company divides its customers into segments based on its own selected set of variables and calculates the churn rate for each segment quarterly, half-yearly and annually. The company proposed using the variables of a segment with a higher half-yearly churn rate, because a sufficiently high churn rate is needed to train the prediction model. Based on the results of the literature review, questionnaire and interview, we collected variables that could help us predict customer churn and differentiate between churners and non-churners. To protect the privacy of its current customers, the company provided us with historical data from two years earlier. The company's name has been withheld at its request, and it is referred to as 'the company' in the rest of this paper. The second phase of the SentiChurn model (Figure 1) is data preparation, which includes data description, data transformation and initialisation of the dataset for modelling. The third phase, modelling (Figure 1), covers training: an appropriate data mining algorithm is chosen (G. Modelling), and the model is trained on the training set to address the problem.
In the model evaluation phase, the model is evaluated on the test set using performance measures. In the model deployment phase, the prediction results are presented to the company for evaluation from both a real-world and a company perspective.
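The segment-level churn rate the company computes can be illustrated with a short sketch. It assumes a hypothetical list of `(segment, churned)` pairs covering one half-year period; the segment labels are invented, since the company's segmentation variables are not disclosed here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def churn_rate_by_segment(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Churn rate (churners / customers) per segment for one reporting period."""
    customers = defaultdict(int)
    churners = defaultdict(int)
    for segment, churned in records:
        customers[segment] += 1
        churners[segment] += int(churned)
    return {segment: churners[segment] / customers[segment] for segment in customers}


def segment_with_highest_churn(records: Iterable[Tuple[str, bool]]) -> str:
    """Segment whose churn rate is highest (ties resolved by first occurrence)."""
    rates = churn_rate_by_segment(records)
    return max(rates, key=rates.get)
```

Picking the segment with the highest rate mirrors the company's advice that training needs enough churners; the same function could be run per quarter or per year by filtering the records first.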

Data Set Construction
The dataset was constructed from historical data provided by the company and from the customer satisfaction rate measured through Twitter mining [44]. We collected a sample of 100,000 customers' data from the Saudi telecom company: 27,000 churners and 73,000 non-churners. These historical customer data were collected randomly within a six-month time window, from January 2017 to June 2017.
Earlier studies differed in the time window they set for churn analysis and prediction. For instance, reference [28] showed that customer mood on Twitter could predict churn three months later. In addition, reference [48] collected three months of call data from customers of a Jordanian telecommunication company. Their results agreed with those of [82]: two to three months is a sufficient time window to prepare a strategy for retaining customers and preventing churn.
By contrast, reference [34] stated that four months are needed to predict a customer's churn based on his/her dissatisfaction, and reference [3] extended this to five months of collected tweets as the dataset for their customer growth model. Other studies set six months as the time window for churn prediction [6,8,33,36]. Tsai and Lu [67] found that a customer should be with a company for six months or longer for a prediction model to be accurate. Thus, our selected time window conforms to even the strictest previous studies. We take the suggestion of [67] into account, as we agree that a customer could become resentful but take a longer period to carry out the churning action. Thus, because our dataset covers January 2017 to June 2017, churn can only be estimated between July and December 2017 (Figure 4).
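The churner definition and the time window above can be written as a labelling rule. This is a minimal sketch, assuming hypothetical per-customer fields (termination date, a voluntary-leaver flag and a post-paid flag); excluding customers who left during the observation window itself, and excluding involuntary leavers rather than counting them as non-churners, are our assumptions, not rules stated by the study.

```python
from datetime import date
from typing import Optional

# Window in which churn is estimated, following the January-June 2017 observation window
PREDICTION_START = date(2017, 7, 1)
PREDICTION_END = date(2017, 12, 31)


def churn_label(termination_date: Optional[date], voluntary: bool,
                postpaid: bool) -> Optional[int]:
    """Return 1 for a churner, 0 for a non-churner, None if out of scope."""
    if not postpaid:
        return None  # the study only considers post-paid customers
    if termination_date is None or termination_date > PREDICTION_END:
        return 0  # still with the company throughout the prediction window
    if termination_date < PREDICTION_START:
        return None  # assumption: left during the observation window, so excluded
    # Left during July-December 2017: only voluntary leavers count as churners
    return 1 if voluntary else None
```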

Customer Satisfaction Rate
To build the datasets used to measure the customer satisfaction rate, we used Python to interact with Twitter's search application programming interface (API) [83] and fetch Arabic tweets based on certain search keys, as follows. First, the hashtags used in the search were those indicating the different Saudi telecom companies, such as STC, Mobily and Zain. Then, we collected the top hashtags mentioning these telecom companies, which were #STC, #Mobily, #Zain, #موبايلي (Mobily) and #زين_السعودية (Saudi Zain). The aim was to monitor the telecom customers' sentiments continuously. This process ran from January 2017 until June 2017 to generate the largest possible dataset, because the dataset would subsequently shrink after spam and retweets were eliminated. The raw dataset comprised 3.5 million Arabic tweets; after filtering and cleaning, it shrank to 795,500 Saudi tweets. We then randomly sampled Saudi tweets from this dataset to construct our corpus, AraCust [84] (Table 4).

Dataset Cleaning, Pre-Processing and Annotation
To reduce noise in the corpus, we cleaned and pre-processed the datasets with a Python script. To reduce spam, retweets were excluded. In addition, non-Arabic tweets were removed by filtering on language (lang:ar), as translating them would damage classifier efficiency.
Additionally, unnecessary features that might lower accuracy were removed from the tweet corpus before applying classifiers, such as user mentions (@user), emoticons, numbers, operators (+ =~$) and stop words (",", ".", ";"). Emoticons were deleted because we noticed that the classifier confused the parentheses in quotations with those in emoticons, as also found by Al-Twairesh (2016). In addition, reference [85] showed that keeping emoticons decreased classifier performance, which they attributed to Arabic sentences being written from right to left, causing emoticons to be interchanged (reversed).
Moreover, tweets containing a uniform resource locator (URL) were excluded, as most of them were news or spam. The tweet corpus was then processed with the Natural Language Toolkit (NLTK) library in Python for tokenization and normalization. The tweets were first tokenized, that is, segmented into words for easier analysis, and then normalized; normalization unifies certain Arabic letters that are written in different shapes. Stemming was not applied, because stemming algorithms do not perform well on dialectal Arabic words [86]. Examples of AraCust [84] tweets before and after pre-processing are shown in Tables 5 and 6.
Table 6. Subset of the AraCust corpus after pre-processing.

Tweet in Arabic | Label | Company | Tweet in English
Future Internet 2021, 13, x FOR PEER REVIEW 12 of 20
To annotate the AraCust corpus, we hired three annotators, following [16]. Our annotators, A1, A2 and A3, were all computer science graduates and native speakers of the Saudi dialect, with prior annotation experience. In this research, we classified the corpora using binary classification (negative vs. positive) to predict customer satisfaction toward the telecom company, following many studies that used binary sentiment classification with Arabic text [16,87–89].
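The cleaning and normalisation steps described earlier in this section can be approximated with Python's standard `re` module. This is an illustrative sketch, not the authors' script: it uses plain whitespace splitting instead of NLTK tokenisation, the tweet dictionary keys (`text`, `lang`, `is_retweet`) are hypothetical, and the letter mappings in `NORMALIZE` (alef variants, alef maqsura, teh marbuta) are a common Arabic normalisation scheme assumed here, since the paper does not list which letters it unified.

```python
import re
from typing import Dict, List

MENTION = re.compile(r"@\w+")                        # user mentions such as @user
URL = re.compile(r"https?://\S+|www\.\S+")            # links (tweets with links are dropped)
NUMBER = re.compile(r"[0-9\u0660-\u0669]+")           # Western and Arabic-Indic digits
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")  # common emoji blocks
# Operators, punctuation and the brackets that make up text emoticons such as ":("
SYMBOLS = re.compile(r"[+=~$,.;:!?()\[\]\"'\u060C\u061B\u061F]")
# Assumed normalisation: alef variants -> bare alef, alef maqsura -> yeh, teh marbuta -> heh
NORMALIZE = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا", "ى": "ي", "ة": "ه"})


def keep_tweet(tweet: Dict) -> bool:
    """Keep only Arabic, non-retweeted tweets without URLs."""
    return (tweet.get("lang") == "ar"
            and not tweet.get("is_retweet", False)
            and not URL.search(tweet["text"]))


def preprocess(text: str) -> List[str]:
    """Strip unwanted features, normalise letters and split into tokens."""
    for pattern in (MENTION, EMOJI, NUMBER, SYMBOLS):
        text = pattern.sub(" ", text)
    return text.translate(NORMALIZE).split()
```

Filtering (`keep_tweet`) runs before cleaning (`preprocess`) because a tweet with a URL is discarded entirely rather than having its link stripped.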

Using the Model to Measure the Customer Satisfaction Rate
This study aimed to develop a model for the sentiment analysis of tweets to measure customer satisfaction and predict customer churn in real time. The application targeted Saudi telecom companies. We developed our model to predict customer satisfaction on the AraCust corpus [84] for the predefined companies STC, Mobily and Zain.
In this study, our proposed model builds on two models. The first is a bidirectional gated recurrent unit (Bi-GRU) with a Word2Vec model, which achieved the best result, with an accuracy of 95.16% [44]. Second, three transfer learning networks designed for the Arabic language, AraBERT [90], hULMonA [91] and RoBERTa [92], were applied to AraCust [84] to determine which performs best given the corpus and the characteristics of dialectal Arabic. Finally, the proposed model combining AraBERT and Bi-GRU predicted customer satisfaction for the three companies [84] (Table 7). The customer satisfaction percentages for STC, Mobily and Zain were 31.06%, 34.25% and 32.06%, respectively (all below 50%). This may be because customers tend to post negative rather than positive tweets on Twitter, as previously observed.
This study used sentiment analysis, applying several approaches, to design an accurate model for measuring customer satisfaction. It then developed a questionnaire for the customers whose tweets were mined, to evaluate the model by comparing the predicted customer satisfaction (from the model) with the actual customer satisfaction (from the survey). Table 7 shows that our model achieved the goal of predicting the customer satisfaction of telecom companies from Twitter analysis. These results can give decision-makers in these companies insight into the percentage of satisfied customers and help improve the services these companies provide. They should also encourage decision-makers to consider Twitter analysis for measuring customer satisfaction and to adopt it as a new method for evaluating their marketing strategies.
Next, we used these results to predict customer churn for the telecommunication company that provided the historical data, and compared the prediction with the churn percentage obtained from the company.
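The satisfaction percentages above are the share of a company's tweets that the classifier labels positive. A minimal sketch, assuming the classifier output is available as hypothetical `(company, label)` pairs with 1 = positive and 0 = negative:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def satisfaction_by_company(predictions: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Percentage of tweets predicted positive (label 1) for each company."""
    positive = defaultdict(int)
    total = defaultdict(int)
    for company, label in predictions:
        total[company] += 1
        positive[company] += label
    return {company: 100.0 * positive[company] / total[company] for company in total}
```

Because the rate is a simple proportion of classified tweets, any tendency of customers to tweet more often when dissatisfied pushes it down, which is consistent with all three companies scoring below 50%.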

Historical Data Set Preparation
In the dataset preparation step, variable data types are transformed and binary data are normalised. The goal of data preparation is to make the data easy for the SentiChurn model to handle [30]. Binary variables are encoded as '1' for 'yes' and '0' for 'no', and gender as '0' for 'male' and '1' for 'female'. Continuous variables, such as age and length of time as a customer, are transformed into ordinal categories numbered sequentially from 1. The final variables and their types, used as inputs to our prediction model, are listed in Table 2. The dataset captures the features of the population under study. The outcome of this step is the final dataset used to train the model (Figure 5).
The model's predictions are described with the following confusion-matrix terms:
• TP: our model correctly predicts that the customer is a churner.
• TN: our model correctly predicts that the customer is a non-churner.
• FP: our model predicts that the customer is a churner, but the customer is a non-churner.
• FN: our model predicts that the customer is a non-churner, but the customer is a churner.
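Returning to the preparation step, its encoding rules can be sketched as follows. The field names (`gender`, `has_complaint`, `age`, `tenure_months`) and the bin edges for age and tenure are hypothetical placeholders; the study does not report its category boundaries.

```python
from bisect import bisect_right
from typing import Dict, List

YES_NO = {"yes": 1, "no": 0}
GENDER = {"male": 0, "female": 1}
# Hypothetical category boundaries; the study does not report its own
AGE_EDGES: List[float] = [25, 35, 45, 55]
TENURE_EDGES_MONTHS: List[float] = [12, 24, 36]


def to_ordinal(value: float, edges: List[float]) -> int:
    """Map a continuous value to an ordinal category numbered from 1."""
    return bisect_right(edges, value) + 1


def encode_customer(raw: Dict) -> Dict[str, int]:
    """Encode one customer record using the rules of the preparation step."""
    return {
        "gender": GENDER[raw["gender"].lower()],
        "has_complaint": YES_NO[raw["has_complaint"].lower()],
        "age_group": to_ordinal(raw["age"], AGE_EDGES),
        "tenure_group": to_ordinal(raw["tenure_months"], TENURE_EDGES_MONTHS),
    }
```

`bisect_right` places a value equal to an edge in the upper category, so with these edges an age of exactly 25 falls into category 2.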
Other metrics are derived from TP, TN, FN and FP, such as sensitivity, specificity and accuracy. Because accuracy alone is a weak measure, sensitivity and specificity are widely used alongside it (P. Li et al., 2014). Sensitivity equals recall, the proportion of churners correctly predicted, whereas specificity is the proportion of non-churners correctly predicted:
$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$
Telecom providers prefer high sensitivity over high specificity because misclassifying a non-churner costs less than misclassifying a churner [1]. Some churn prediction studies prefer to evaluate model performance with the ROC curve and AUC, because these measures remain the same with imbalanced data, even if the proportions of positive and negative instances change [1].
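A short sketch of these metrics shows why accuracy alone can mislead on imbalanced churn data. The counts in the example are invented: a model that labels every one of 100 customers (10 of them churners) as a non-churner still reaches 90% accuracy, while its sensitivity is zero.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Recall of the churner class: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of non-churners correctly predicted: TN / (TN + FP)."""
    return tn / (tn + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all customers whose class is predicted correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Invented example: 100 customers, 10 churners, every customer predicted "non-churner"
tp, tn, fp, fn = 0, 90, 0, 10
print(accuracy(tp, tn, fp, fn), sensitivity(tp, fn), specificity(tn, fp))  # 0.9 0.0 1.0
```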
The ROC curve is a two-dimensional plot of the true-positive rate (the proportion of churners correctly predicted) against the false-positive rate (the proportion of non-churners incorrectly predicted as churners) [93]. The best model performance occurs when the curve approaches the top-left corner (0,1), and a better model also has a higher AUC.
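To make the ROC and AUC definitions concrete, the sketch below builds the curve by sweeping the decision threshold over every distinct predicted churn probability and integrates it with the trapezoidal rule. It is a generic textbook construction, assuming labels coded 1 for churner and 0 for non-churner, and is not the authors' evaluation code.

```python
def roc_points(y_true, scores):
    """Return (false-positive rate, true-positive rate) pairs for every threshold.

    A customer is predicted a churner when its score is >= the threshold.
    Thresholds are swept from the highest distinct score down to the lowest,
    so the points are ordered by non-decreasing FPR.
    """
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


points = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(points)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(points))  # 0.75
```

A perfect classifier would pass through (0,1) and reach an AUC of 1.0, while random guessing follows the diagonal with an AUC of 0.5.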
Moreover, we used cross-entropy, also called logarithmic loss (log loss), as the loss function; for binary classification the two are calculated identically. The loss function is an error metric that measures uncertainty, and it is one of the measures used to evaluate a binary classifier that outputs probability estimates between 0 and 1. Log loss penalises both types of error, and especially predictions made with high confidence that turn out to be wrong. A log loss closer to zero indicates better model performance.
Using log loss gives an accurate view of model performance because it is based on the predicted probabilities, not only on the output class:

$$\text{LogLoss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p(y_i) + (1 - y_i)\log\left(1 - p(y_i)\right)\right]$$

where N is the number of items in the training set; the factor 1/N averages the loss over all items; log is the natural logarithm; y_i is the binary label of item i, either 0 or 1; and p(y_i) is the predicted probability that item i belongs to class 1 (churner).
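The log loss formula above translates directly into code. The sketch below is a plain-Python version; the clipping constant `eps` is an assumption added so that a predicted probability of exactly 0 or 1 never evaluates log(0), and is not mentioned in the paper.

```python
import math


def log_loss(y_true, p_churn, eps=1e-15):
    """Binary cross-entropy averaged over N items.

    y_true[i]  -- actual label of item i (1 = churner, 0 = non-churner)
    p_churn[i] -- predicted probability that item i is a churner
    Probabilities are clipped to [eps, 1 - eps] so log(0) is never evaluated.
    """
    n = len(y_true)
    total = 0.0
    for y, p in zip(y_true, p_churn):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / n


print(round(log_loss([1, 0], [0.9, 0.1]), 4))  # 0.1054: confident and correct
print(round(log_loss([1], [0.1]), 4))          # 2.3026: confident but wrong
```

The two calls show the asymmetry described in the text: the same 0.9 confidence costs about 0.11 when correct but about 2.30 when wrong.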

Training the SentiChurn Model
We used the proposed model [94]. Given that the predicted probabilities of churners and non-churners overlap, a threshold (cut-off) must be defined; it is usually set at fifty per cent. Probabilities to the right of the threshold correspond to the highest specificity, while those to the left correspond to the highest sensitivity, as shown in Figure 6. The dashed line in Figure 7 is the threshold. Points of the curve above it indicate higher sensitivity, with more churners correctly predicted and better model performance, whereas points below it indicate more non-churners incorrectly predicted and worse model performance. The closer the curve is to the top-left corner (0,1), the better the predictive power of the model. The area under the ROC curve for both the 'churner' and 'non-churner' classes is 0.97, which indicates the strong predictive performance of our model. The classification report in Table 8 summarises the model's performance: the average precision across both classes is 0.93, the average recall is 0.97, the average F1-score is 0.95, and the model accuracy is 95.8 per cent. In the confusion matrix (Figure 8), 13,611 non-churner customers were correctly predicted as non-churners and 5549 churner customers were correctly predicted as churners, whereas 840 non-churner customers were incorrectly predicted as churners and no churners were incorrectly predicted as non-churners.
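As a consistency check, the four confusion-matrix counts reported for Figure 8 (TP = 5549, TN = 13,611, FP = 840, FN = 0) can be turned into per-class precision, recall and F1-score. The sketch assumes that Table 8's "average" is the unweighted (macro) mean over the churner and non-churner classes; under that assumption it reproduces the reported 0.93, 0.97, 0.95 and 95.8%. It is a recomputation for illustration, not the authors' code.

```python
def per_class_metrics(tp, fp, fn):
    """Precision, recall and F1 for one class, given its own TP, FP and FN."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def classification_report(tp, tn, fp, fn):
    """Per-class and macro-averaged metrics for a binary churn classifier."""
    churner = per_class_metrics(tp, fp, fn)
    # For the non-churner class the roles swap: TN acts as its TP,
    # FN as its FP and FP as its FN.
    non_churner = per_class_metrics(tn, fn, fp)
    macro = tuple((a + b) / 2 for a, b in zip(churner, non_churner))
    acc = (tp + tn) / (tp + tn + fp + fn)
    return {"churner": churner, "non_churner": non_churner,
            "macro_avg": macro, "accuracy": acc}


report = classification_report(tp=5549, tn=13611, fp=840, fn=0)
print([round(x, 2) for x in report["macro_avg"]])  # [0.93, 0.97, 0.95]
print(round(report["accuracy"], 3))                # 0.958
```

A support-weighted average would give a noticeably higher precision (about 0.96), so the macro reading appears to be the one that matches the reported figures.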
In the normalised confusion matrix (Figure 9), 94% of non-churner customers and 100% of churner customers were predicted correctly, while 6% of non-churner customers were predicted incorrectly.
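A normalised confusion matrix divides each row (actual class) by its total, so each entry becomes the share of that class assigned to each predicted label. Applied to the counts reported for Figure 8, the minimal sketch below gives 840 / 14,451 ≈ 0.058 for the misclassified non-churners, which rounds to the 6% stated above. The row and column ordering is an assumption made for the example.

```python
def normalise_rows(matrix):
    """Divide each row of a confusion matrix by its row total (actual-class size)."""
    normalised = []
    for row in matrix:
        total = sum(row)
        normalised.append([v / total if total else 0.0 for v in row])
    return normalised


# Assumed layout: rows = actual (non-churner, churner),
# columns = predicted (non-churner, churner); counts as reported for Figure 8.
cm = [[13611, 840],
      [0, 5549]]

print([[round(v, 2) for v in row] for row in normalise_rows(cm)])
# [[0.94, 0.06], [0.0, 1.0]]
```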
The log loss is 0.1, close to zero, which indicates good model performance. Figure 10 shows the log loss (y) against the predicted probability (x), together with the distribution of the actual and predicted values.

Evaluating the Model
We evaluate the model using the performance evaluation metrics and by validating the customer churn percentage predicted by our model against the percentage provided by the company.
The company reported a customer churn rate of 27% for January 2017 to June 2017. For the same period, our model predicted a churn rate of 31.6%, within 4.6 percentage points of the actual rate.
The model computes the customer churn percentage as

$$\text{cust\_churn} = \frac{\text{total\_churner}}{\text{num\_customers}} \times 100$$

where total_churner is the total number of churners in the dataset and num_customers is the total number of customers in the dataset. After validating the customer churn percentage using the customers' historical data and the customer satisfaction percentage predicted by Twitter mining, we were able to answer RQ1: 'Is it possible to predict the customer churn of telecommunication companies in Saudi Arabia by analysing customers' tweets?'
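The churn-percentage formula above amounts to counting predicted churners and dividing by the number of customers. The sketch below assumes the model's output is a list of 0/1 labels with 1 marking a predicted churner; the function name and the small input list are hypothetical and are not data from the study.

```python
def churn_percentage(predicted_labels):
    """cust_churn = total_churner / num_customers * 100 (1 marks a predicted churner)."""
    num_customers = len(predicted_labels)
    if num_customers == 0:
        raise ValueError("the dataset contains no customers")
    total_churner = sum(1 for label in predicted_labels if label == 1)
    return total_churner / num_customers * 100


print(churn_percentage([1, 0, 0, 1, 0, 0, 0, 0]))  # 25.0 (illustrative input only)
```

The same function applied to the company's actual churn labels would give the ground-truth rate that the prediction is compared against.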

Conclusions
With the rising growth of the telecommunication industry, the customer churn problem has grown in significance as well. One of the most critical challenges in the data and voice telecommunication service industry is retaining customers, thus reducing customer churn by increasing customer satisfaction. The use of social media mining to predict customer churn in the telecommunication sector is unexplored. Therefore, new methods that extract real-time customer satisfaction feedback and use it to predict customer churn must be proposed. The customer churn models currently used by telecom companies depend on historical customer data, which become less valuable for prediction over time, while the lack of Arabic resources for natural language processing (NLP) and the difficulty of the Arabic language have limited the use of Arabic social media for this purpose. Our proposed SentiChurn model proved its efficiency, firstly, on various standard metrics: the average precision was 0.93, the average recall was 0.97, the average F1-score was 0.95 and the model accuracy was 95.8%; and secondly, on a comparison with the ground-truth churn rate of 27% provided by a telecom company, against the 31.6% predicted by our model.
In future work, we will try to obtain more historical data variables from the telecom company. In addition, we will apply more data mining techniques.

Conflicts of Interest:
The authors declare no conflict of interest.