Boosting Ant Colony Optimization with Reptile Search Algorithm for Churn Prediction

Abstract: The telecommunications industry is greatly concerned about customer churn caused by dissatisfaction with service. The industry has started investing in the development of machine learning (ML) models for churn prediction to extract, examine, and visualize customers' historical information from vast amounts of big data, which helps to better understand customer needs and to take appropriate actions to control churn. However, the high dimensionality of the data strongly influences the performance of ML models, so feature selection (FS) is applied as a primary preprocessing step. It improves an ML model's performance by selecting salient features while reducing computational time (CT), which can help this sector build effective prediction models. This paper proposes a new FS approach, ACO-RSA, that combines two metaheuristic algorithms (MAs), namely ant colony optimization (ACO) and the reptile search algorithm (RSA). In the developed ACO-RSA approach, ACO and RSA are integrated to choose an important subset of features for churn prediction. The ACO-RSA approach is evaluated on seven open-source customer churn prediction datasets and ten CEC 2019 test functions, and its performance is compared to particle swarm optimization (PSO), the multi-verse optimizer (MVO), the grey wolf optimizer (GWO), standard ACO, and standard RSA. According to the results, along with statistical analysis, ACO-RSA is an effective and superior approach compared to the competitor algorithms on most datasets.

Customers' historical data can be used to predict those likely to join other operators. FS is a typical preprocessing problem in ML, concerning the discrimination of salient and redundant features within each dataset's complete set of features. This paper presented a new FS approach by combining the standard ACO and standard RSA for customer churn prediction. The combined method, ACO-RSA, utilized a serial mechanism to balance exploration and exploitation while avoiding traps in local optima. The efficiency of the proposed ACO-RSA is evaluated using seven public benchmark datasets from the churn prediction application and ten CEC 2019 test functions. The reliability and performance of the ACO-RSA are compared with the standard ACO, the standard RSA, and three other MAs: PSO, MVO, and GWO. The results showed that the ACO-RSA approach achieves higher accuracy with the minimum number of features compared to the other methods. Statistical analysis also confirmed the superiority of the ACO-RSA in terms of various measures. Therefore, the proposed ACO-RSA provides a highly reliable FS approach for the churn prediction application. The main limitation of the proposed approach is the slightly high CT required during the training phase to determine the best combination of the tested elements. We will apply the standard ACO and standard RSA in a parallel manner to reduce the CT in the training phase. In future work, we would like to apply the ACO-RSA approach to various classification, regression, or clustering applications in renewable energy, the Internet of Things, and signal processing.


Introduction
The rapid evolution of the telecommunications industry has increased competition among service providers in the market, which has resulted in severe revenue losses because of churning [1]. Churners are customers who leave a service provider and develop a new relationship with another provider in the market. It has been confirmed that attracting a new customer costs about five to six times as much as retaining an existing one [2]. For this reason, telecommunications companies employ customer relationship management (CRM) as an integrated approach in their strategic plan to understand their customers' needs and ultimately reduce customer churn [3]. The customers' historical data stored in such CRM systems can be transformed into valuable information with the help of ML. The results from these techniques can assist these companies in formulating new policies, detecting customers who have a high tendency to end their relationship with the company, and developing retention strategies for existing customers [4].
In ML techniques, data preprocessing is vitally essential, and feature selection (FS) is generally considered as a foremost preprocessing step. FS techniques aim to determine the optimum feature subsets (OFS) by removing redundant and irrelevant features from high dimensional data without changing the original data representation. It has been proven that the use of FS in the ML learning process has several benefits [5,6], such as that it reduces the amount of required data to achieve a good learning process and that it improves prediction performance and minimizes CT.
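As a concrete illustration of the idea behind FS (a simple filter-style method, not the approach proposed in this paper), the sketch below ranks features by their absolute correlation with the target and keeps the top k; the function name and toy data are ours.

```python
import numpy as np

def select_top_k_features(X, y, k):
    """Rank features by absolute Pearson correlation with the target
    and return the indices of the k most informative ones."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

# Toy data: feature 0 tracks the label, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
X = np.column_stack([y + 0.1 * rng.normal(size=200),
                     rng.normal(size=200)])
top = select_top_k_features(X, y, 1)   # feature 0 should be kept
```

Dropping the noise feature reduces dimensionality without changing the representation of the kept features, which is exactly the FS benefit described above.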
FS techniques have been successfully applied in different applications and delivered promising results. Among these techniques, metaheuristic algorithms (MAs) have shown significant success in several applications, such as vehicle routing [7], energy consumption [8], fuzzy control design [9], e-commerce [10], medical diagnosis [11], and others, mainly because of their capability to provide a high-quality OFS [12]. MAs utilize two search principles: exploration, where the algorithm investigates different candidate regions in the search space, and exploitation, where the algorithm searches around the obtained promising solutions to improve the existing ones.
According to [13], MAs can be grouped into (i) single solution-based algorithms and (ii) population-based algorithms based on their behaviors. The first group exploits prior search knowledge to expand the search space in some promising environments. Tabu search [14], greedy randomized adaptive search [15], and vector neighborhood search [16] are examples belonging to this group of algorithms. Population-based algorithms generate optimal solutions by exploring a new region in the search space via an iterative process for generating a new population through nature-inspired selection. GWO [17], cuckoo search algorithm [18], PSO [19], firefly algorithm (FFA) [20], crow search algorithm [21], dragonfly optimization algorithm [22], ACO [23], MVO [24], and RSA [25] are examples of the well-known MAs in this group.
In recent years, various researchers have explored MAs for customer churn prediction. In [26], the customer churn prediction business intelligence using text analytics with metaheuristic optimization (CCPBI-TAMO) model is reported. The authors used pigeon inspired optimization (PIO) to select the OFS from a customer churn dataset collected from a business sector and used that OFS as input to a long short-term memory (LSTM) with stacked autoencoder (LSTM-SAE) model. Their proposed model outperformed the other existing models used in their work. In [27], FFA was applied to both classification and FS using a huge publicly available churn prediction dataset, and the authors reported that FFA performed well for this application. The potential of ACO to predict customer churn was investigated and discussed in [28]. The results reported that ACO attained an effective performance compared to other MAs. In [29], the authors combined multiobjective cost-sensitive ACO (MOCS-ACO) with a genetic algorithm (GA) to improve classification results. The GA is employed to select the OFS, while the MOCS-ACO is used as a classification model. Experimental results reported that the combined model performed well when validated on a customer churn prediction dataset obtained from a company in Turkey. In [30], the authors employed ACO to identify the OFS, and the identified features are then fed to a gradient boosting tree (GBT) model. The results showed that the proposed ACO-GBT model produced good results in predicting customer churn.
In [31], the authors employed PSO to choose the OFS, and the selected features were then used by an extreme learning machine (ELM) classifier for churn prediction on a public dataset. In [32], the authors employed information gain and fuzzy PSO to determine the OFS from two publicly available churn prediction datasets, and the selected features were then used by a divergence kernel-based support vector machine (DKSVM) to predict churners and non-churners. In [33], a hybrid model based on PSO and feed-forward neural networks for churn prediction was reported to select the OFS from one public dataset and another private dataset. In [34], three variants of PSO were reported for churn prediction using a public dataset. These variants comprise PSO incorporating FS as a preprocessing step, PSO embedded with simulated annealing (SA), and PSO combined with FS and SA.
All these studies have reported promising results for using MAs to select the most informative features in churn prediction. Although most of these efforts have used MAs to select the OFS, a quantitative analysis of the methods' capabilities in terms of accuracy, number of features in the OFS, fitness values, and CT in this application has not been reported. Thus, there is an imminent need for further work proposing new MAs for FS in this application. Most of these works are limited to using an individual MA. Therefore, combining MAs to produce a hybrid FS method for this application is worth investigating, since selecting the OFS is very important for reliable and safe prediction of the customers who are going to end their relationship and develop a new one with another competitor. Motivated by these limitations, we propose a new metaheuristic-based approach called ACO-RSA that combines the standard ACO and RSA in a serial collaborative manner to find the most appropriate features for churn prediction. The comparison with five popular MAs, including PSO, MVO, GWO, standard ACO, and standard RSA, validates the effectiveness of the ACO-RSA approach. The contributions of this paper can be summarized as follows:
• A new metaheuristic-based approach, namely ACO-RSA, is proposed for churn prediction; the standard ACO and RSA are combined in a serial collaborative mechanism to achieve an exploration-exploitation balance and avoid getting stuck in local optima.
• Seven publicly available benchmark customer churn datasets with different numbers of records and features, together with ten CEC 2019 test functions, are utilized to check the stability of the ACO-RSA's performance.
• We also investigate the convergence behavior, statistical significance, and exploration-exploitation balance of the proposed ACO-RSA against the competitor MAs.
A brief overview of ACO and RSA is provided in the next section, followed by detailed explanations of the suggested ACO-RSA approach. The experimental results are discussed in Section 4. Finally, conclusions are noted in Section 5.

Ant Colony Optimization (ACO)
Ant colony optimization (ACO) is a nature-inspired MA that mimics ants' process of searching for food sources [23]. The characteristics of ACO make it more practical than many other MAs: it supports parallel processing while avoiding process dependency, and it gives feedback on the ants' behaviors in the search space [35]. Ants are not blind when searching for food, as they can find the shortest route between their nest and a food source. While moving, ants deposit a chemical material, known as a pheromone, along their trails. The pheromone is a medium of communication among ants and marks the shortest path for collecting food. Ants move towards food by sensing the pheromone deposited by other ants that have previously traveled the path, which subsequently increases the probability of other ants traversing the same path.
ACO uses two factors, the pheromone trail and heuristic information, to make probabilistic decisions. The ants update the pheromone level at any feature as they traverse a path. The more ants traverse a feature, the more pheromone is deposited at that feature, resulting in a higher probability of the feature being part of the shortest path. The path with the highest pheromone level will be followed by the maximum number of ants and will be the shortest path. The pheromone value τ_0 = 1 is initialized at all M features, and ants are positioned randomly on a set of features with a predefined maximum number of generations T. At every generation g, the transition probability TP_i^k(g) of the kth ant at the ith feature is given by [36,37]:

TP_i^k(g) = [τ_i(g)]^α [η_i]^β / Σ_{j ∈ J_i^k} [τ_j(g)]^α [η_j]^β, if i ∈ J_i^k, and 0 otherwise, (1)

where J_i^k is the set of possible neighbors of the ith feature that have not been visited by the kth ant. The relative importance of the pheromone level (τ_i) and the heuristic information (η_j) for the ants' movements is specified by the non-negative parameters α and β, respectively.
After choosing the next feature in the ant's path, a fitness function (FF) is employed to quantify the new set of selected features. The movement of the kth ant stops if no improvement in the fitness value is attained after adding any new feature. If the stopping criterion is not reached, the pheromone level at the ith feature at the next generation (g + 1) is updated as [38]:

τ_i(g + 1) = (1 − p) · τ_i(g) + Σ_{k=1}^{N} Δτ_i^k(g), (2)

where p is the pheromone decay rate (0 ≤ p ≤ 1), N is the number of ants, S_k(g) denotes the number of selected features, and Δτ_i^k(g) represents the pheromone deposited by the kth ant (a function of S_k(g)) if the ith feature is in the shortest path of the ant; otherwise, it is 0.
The stopping criterion is met when g reaches the predefined maximum T. The set of features with the highest pheromone level and the smallest fitness value is selected as the OFS. Figure 1 shows the overall process of the ACO.
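The transition and pheromone-update rules above can be sketched in a few lines. The exact deposit rule Δτ is an assumption here (we reward ants whose subsets score a smaller fitness), since implementations differ in how the deposit is computed:

```python
import numpy as np

def transition_probs(tau, eta, visited, alpha=1.0, beta=1.0):
    """Transition probabilities over not-yet-visited features (Eq. (1));
    visited features get probability 0."""
    weights = np.where(~visited, (tau ** alpha) * (eta ** beta), 0.0)
    return weights / weights.sum()

def update_pheromone(tau, ant_paths, fitness, p=0.2):
    """Evaporate by rate p, then deposit pheromone on each ant's selected
    features, rewarding smaller (better) fitness (Eqs. (2)-(3), assumed
    deposit rule)."""
    tau = (1.0 - p) * tau
    for path, fit in zip(ant_paths, fitness):
        tau[list(path)] += 1.0 / (1.0 + fit)
    return tau

tau = np.ones(5)                                 # tau_0 = 1 at all M = 5 features
eta = np.array([0.9, 0.1, 0.5, 0.8, 0.2])        # heuristic information
visited = np.array([False, True, False, False, False])
probs = transition_probs(tau, eta, visited)      # feature 1 is excluded
tau = update_pheromone(tau, [(0, 3)], [0.1])     # one ant selected features 0 and 3
```

After the update, features on good paths (here 0 and 3) carry more pheromone than the rest, so they become more likely choices in the next generation.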

Reptile Search Algorithm (RSA)
Reptile search algorithm (RSA) is another nature-inspired MA, proposed by [25] in 2021 to simulate crocodiles' encircling and hunting behavior. It is a gradient-free algorithm that starts by generating random solutions as follows:

x_{i,j} = rand · (UB_j − LB_j) + LB_j, (4)

where x_{i,j} is the jth feature of the ith solution among N total solutions comprising M features each, rand is a random number distributed uniformly in the range [0, 1], and UB_j and LB_j are the upper and lower boundaries of the jth feature. Like the other nature-inspired MAs, RSA operates on two principles: exploration and exploitation. These principles are facilitated by crocodile movement while encircling the target prey. The total iterations of RSA are divided into four stages to take advantage of the natural behavior of crocodiles. In the first two stages, RSA performs exploration based on the encircling behavior, which comprises the high walking and belly walking movements. Crocodiles begin encircling to search the region, facilitating a more exhaustive search of the solution space. This behavior can be mathematically modeled as follows:

x_{i,j}(g + 1) = Best_j(g) · (−n_{i,j}(g)) · γ − R_{i,j}(g) · rand, for g ≤ T/4 (high walking),
x_{i,j}(g + 1) = Best_j(g) · x_{(r),j} · ES(g) · rand, for T/4 < g ≤ T/2 (belly walking), (5)

where Best_j(g) is the jth feature of the best solution so far, n_{i,j} refers to the hunting operator for the jth feature in the ith solution (calculated as in Equation (6)), the parameter γ controls the exploration accuracy throughout the iterations and is set to 0.1, the reduce function R_{i,j} shrinks the search region and is computed as in Equation (9), r is a random integer between 1 and N used to select one of the candidate solutions, and the evolutionary sense ES(g) stands for a probability ratio decreasing from 2 to −2 over the iterations, calculated as in Equation (10).
The hunting operator is defined as n_{i,j} = Best_j(g) · P_{i,j}, (6)

where P_{i,j} indicates the percentage difference between the jth value of the best solution and its corresponding value in the current solution, calculated as:

P_{i,j} = θ + (x_{i,j} − M(x_i)) / (Best_j(g) · (UB_j − LB_j) + ε), (7)

where θ denotes a sensitive parameter that controls the exploration performance, ε is a small value that prevents division by zero, and M(x_i) refers to the average position of the ith solution, defined as:

M(x_i) = (1/M) Σ_{j=1}^{M} x_{i,j}, (8)

In ES(g), the value 2 acts as a multiplier to provide values in the range [−2, 2], and the random term is an integer between −1 and 1. In the last two stages, RSA implements exploitation (hunting) to search the feature space for the optimal solution in two ways: hunting coordination and cooperation. A solution can update its value during exploitation using the following equations:

x_{i,j}(g + 1) = Best_j(g) · P_{i,j}(g) · rand, for T/2 < g ≤ 3T/4 (coordination),
x_{i,j}(g + 1) = Best_j(g) − n_{i,j}(g) · ε − R_{i,j}(g) · rand, for g > 3T/4 (cooperation). (11)

The quality of the candidate solutions at each iteration is measured using the predefined FF; the algorithm stops after T iterations, and the candidate solution with the least fitness value is selected as the OFS. The process of the RSA is shown in Figure 2.
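A compact, hedged sketch of one RSA iteration is given below. It follows the four-stage structure described above (high walking, belly walking, hunting coordination, hunting cooperation), but it is a simplified reading of Equations (4)-(11), not a line-by-line reproduction of [25]; the helper names and the clipping to the bounds are our choices.

```python
import numpy as np

def rsa_step(X, best, g, T, lb, ub, gamma=0.1, theta=0.5, eps=1e-10):
    """One simplified RSA position update. Stage is chosen from the
    iteration counter g out of T total iterations."""
    N, M = X.shape
    rng = np.random.default_rng(g)
    ES = 2.0 * rng.uniform(-1, 1) * (1.0 - g / T)         # probability ratio in [-2, 2]
    Xn = np.empty_like(X)
    for i in range(N):
        Mx = X[i].mean()                                   # average of the i-th solution
        for j in range(M):
            P = theta + (X[i, j] - Mx) / (best[j] * (ub[j] - lb[j]) + eps)
            eta = best[j] * P                              # hunting operator
            r = rng.integers(N)                            # random candidate index
            R = (best[j] - X[r, j]) / (best[j] + eps)      # reduce function
            if g <= T / 4:                                 # high walking (exploration)
                Xn[i, j] = best[j] * (-eta) * gamma - R * rng.random()
            elif g <= T / 2:                               # belly walking (exploration)
                Xn[i, j] = best[j] * X[r, j] * ES * rng.random()
            elif g <= 3 * T / 4:                           # hunting coordination
                Xn[i, j] = best[j] * P * rng.random()
            else:                                          # hunting cooperation
                Xn[i, j] = best[j] - eta * eps - R * rng.random()
    return np.clip(Xn, lb, ub)

rng = np.random.default_rng(7)
pop = rng.random((4, 3))
best = pop[0].copy()
new_pop = rsa_step(pop, best, g=1, T=8, lb=np.zeros(3), ub=np.ones(3))
```

The clip keeps candidates inside [LB, UB]; in the FS setting these continuous positions are later thresholded at 0.5 to obtain binary feature masks.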

Proposed ACO-RSA Based FS Method
In ACO, a path with the highest pheromone level is the shortest path to transport food from the source to the nest. Most ants will follow this path unless there is some obstruction, which might limit ACO to exploiting the quality of existing solutions by searching only within the current search space [38]. Excessive exploration in an MA reduces the quality of the optimum solutions, while premature exploitation prevents the algorithm from finding globally optimal solutions [39]. RSA is one of the most recent MAs; it has shown superior performance in solving several engineering problems and has an excellent exploration capability. Its built-in exploration-exploitation balance significantly enhances its performance [25]. Different MAs can be combined effectively to use each algorithm's merits while maintaining the exploration-exploitation balance and avoiding premature convergence to local optima.
According to [40], there are several ways to hybridize MAs. The high-level relay hybrid (HRH) strategy is one of these methods. In this strategy, two MAs can be executed in homogeneous (i.e., same algorithms) or heterogeneous (i.e., different algorithms) sequences. The proposed ACO-RSA method uses the heterogeneous HRH strategy to achieve the exploration-exploitation balance of RSA together with the high exploitation of ACO. Figure 3 illustrates the overall process of the ACO-RSA approach. At first, the ACO, RSA, and shared parameters are initialized. Random numbers uniformly distributed in the range (−1, 1) initialize the N candidate solutions {x_{i,j} ∈ X(0) | 1 ≤ i ≤ N and 1 ≤ j ≤ M}, each an M-dimensional feature vector. Then the FF evaluates the candidate solutions to judge the enhancement by comparing the current solutions with those obtained in the previous iteration. If the current solution is better than the previous solution, it is accepted; otherwise, it is rejected. The threshold used to convert candidate solutions into binary vectors during the search for informative features is set to 0.5, as recommended by [39,41], to produce a small number of features. K-nearest neighbor (KNN) is a widely used classifier due to its simplicity, speed, and flexibility in dealing with noisy data [42]. KNN with a Euclidean distance measure (k = 5) is employed as the classifier. Hence, the FF is designed to achieve dimensionality reduction (by minimizing the number of selected OFS features) and maximum accuracy (by reducing the classification error). Therefore, it is defined using the following equation:

FF = γ · E_r + β · (d_i / M), (12)

where E_r = 1 − N_c/N is the classification error computed by the KNN classifier from the number of correctly classified instances N_c out of N total instances, and γ and β are weighting factors that vary in the range (0, 1) (subject to γ + β = 1) to balance the classification error against the number of features d_i in the OFS out of the M features in the original dataset. The parameters γ and β are set to 0.99 and 0.01, respectively [41].
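The fitness function and the 0.5 thresholding can be written directly; the function names here are illustrative:

```python
import numpy as np

def fitness(error_rate, n_selected, n_total, gamma=0.99, beta=0.01):
    """FF = gamma * classification error + beta * (d_i / M); smaller is
    better, so accurate models that use few features win."""
    return gamma * error_rate + beta * (n_selected / n_total)

def to_binary(x, threshold=0.5):
    """Convert a continuous candidate solution into a feature mask."""
    return x > threshold

x = np.array([0.7, 0.2, 0.9, 0.4])
mask = to_binary(x)                       # features 0 and 2 are selected
ff = fitness(error_rate=0.1, n_selected=int(mask.sum()), n_total=x.size)
```

With γ = 0.99 and β = 0.01, the error term dominates: a feature is only worth dropping if it costs (almost) no accuracy.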
Each feature is included in the OFS only if its corresponding value in the candidate solution exceeds the 0.5 threshold. Then the best solution is determined, and the current solutions X(0) are assigned as the candidate solutions of the ACO. In addition, the ACO starts by assigning each candidate solution x_{i,j} as an initial path for an ant in the colony. The ith ant initially traverses the subset of features whose pheromone values x_{i,j} are greater than 0.5 and updates the candidate solutions X_new according to Equations (1)-(3). The FF evaluates the enhancement in each candidate solution, which is updated only if its fitness value decreases after the update:

x_i(g + 1) = x_i_new if FF(x_i_new) < FF(x_i(g)), and x_i(g) otherwise. (13)

In the next iteration, the set of candidate solutions X(g + 1) is given as the initial candidate solutions (after thresholding) to either ACO or RSA to extend the search into other promising regions of the feature space. If the least FF value in the current iteration is smaller than the smallest FF value in the previous iteration (min(FF(x_i) | x_i ∈ X(g)) < min(FF(x_i) | x_i ∈ X(g − 1))), the same algorithm continues in the next iteration; otherwise, an algorithm-switching flag is set to switch between the two algorithms. The main idea behind the switching is that if the ACO cannot improve the candidate solutions, it might be stuck in a local optimum. At this point, the RSA is employed in the next iteration and moves the candidate solutions into another search region using Equations (4)-(11) to find better solutions. This process is repeated until the maximum number of iterations T is reached. The candidate solution with the smallest FF value, min(FF(x_i) | x_i ∈ X(T)), is used to extract the OFS. During the testing phase, a reduced feature set is obtained by keeping only the selected features (i.e., the OFS). This OFS is used to evaluate the classifier performance metrics, as discussed later in Section 4.3. The steps of the ACO-RSA are shown in Algorithm 1.
Algorithm 1: The proposed ACO-RSA approach
1: Form mutually exclusive and exhaustive training and testing subsets
Training Phase
2: Load training dataset
3: Initialize ACO parameters τ_0, η, p, α, β
4: Initialize RSA parameters γ, θ, UB, LB, n
5: Initialize shared parameters N, M, T
6: for g = 1 to T do
7:   if first iteration
8:     Perform one iteration of ACO using Equations (1)-(3)
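The serial HRH switching loop can be sketched generically as below; `step_aco` and `step_rsa` stand in for one iteration of each algorithm, and the switch-on-no-improvement rule mirrors the description in the text. This is a structural sketch of the mechanism, not the full implementation:

```python
def aco_rsa(step_aco, step_rsa, X0, ff, T):
    """Serial HRH hybrid: run one MA per iteration and flip to the other
    whenever the best fitness fails to improve (stuck in a local optimum)."""
    X, use_aco = list(X0), True
    best_prev = min(ff(x) for x in X)
    for _ in range(T):
        X = step_aco(X) if use_aco else step_rsa(X)
        best_now = min(ff(x) for x in X)
        if best_now >= best_prev:          # no improvement: switch algorithm
            use_aco = not use_aco
        best_prev = min(best_prev, best_now)
    return min(X, key=ff)                  # best candidate after T iterations

# Toy demo: both "algorithms" shrink solutions toward zero; ff = |x|.
result = aco_rsa(lambda X: [x * 0.5 for x in X],
                 lambda X: [x * 0.9 for x in X],
                 X0=[4.0, -2.0], ff=abs, T=5)
```

In the real method, `step_aco` applies Equations (1)-(3) and `step_rsa` applies Equations (4)-(11), both followed by the 0.5 thresholding and KNN-based fitness evaluation.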

Experiments and Results
This section provides the experiments that are performed to assess the ACO-RSA and compare its performance with PSO, MVO, GWO, standard ACO, and standard RSA for FS on seven datasets.

Experimental Setup
All the experiments are implemented using Python and they are executed on a 3.13 GHz PC with 16 GB RAM and Windows 10 operating system.
The performance of the proposed ACO-RSA is validated by conducting experiments on publicly available benchmark datasets for customer churn. The characteristics of these datasets are presented in Table 1. It shows the number of classes, the number of features, the number of instances, and the dataset source. Each dataset is divided randomly into the ratio of 50% as a training set and the remaining as a test set.

Parameter Settings
The ACO-RSA approach is compared with several well-known MAs, namely PSO [19], MVO [24], GWO [17], ACO [23], and RSA [25]. Parameter settings play a critical role in enhancing the performance of MAs. For all MAs, a population size of 20 and a maximum of 50 iterations are selected empirically. Each algorithm is executed 20 times independently to obtain a reliable analysis. In addition, the default parameter settings for each comparative algorithm are defined according to its implementation, and they are presented in Table 2.

Evaluation Measures
To assess the reliability and performance of the ACO-RSA approach against the other comparative MAs, a set of evaluation measures is used, including accuracy, fitness value, number of selected OFS features, and computational time.

• Average accuracy (AvgACC) is the average accuracy over all runs. In this paper, the proposed ACO-RSA and the other MAs are executed 20 times (N_r = 20):
AvgACC = (1/N_r) Σ_{i=1}^{N_r} ACC_i.
• Average fitness function (AvgFitF) quantifies the performance of the proposed ACO-RSA and the other MAs, capturing the trade-off between maximizing classification accuracy and minimizing the number of selected OFS features:
AvgFitF = (1/N_r) Σ_{i=1}^{N_r} FitF_i.
• Average OFS (AvgOFS) is the average ratio of the number of selected OFS features d_i at run i to the total number of features D in each dataset:
AvgOFS = (1/N_r) Σ_{i=1}^{N_r} (d_i / D).
• Average computational time (AvgCT) is the average CPU time in seconds for the proposed ACO-RSA and the other MAs over the runs:
AvgCT = (1/N_r) Σ_{i=1}^{N_r} CT_i.
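The four averages amount to simple means over the N_r runs; a sketch with illustrative numbers:

```python
import numpy as np

def summarize_runs(acc, fit, n_features, ct, n_total):
    """Average the four evaluation measures over N_r independent runs.
    n_total is the number of features D in the original dataset."""
    return {
        "AvgACC": float(np.mean(acc)),
        "AvgFitF": float(np.mean(fit)),
        "AvgOFS": float(np.mean(np.asarray(n_features) / n_total)),
        "AvgCT": float(np.mean(ct)),
    }

# Two illustrative runs on a dataset with D = 20 features.
stats = summarize_runs(acc=[0.90, 0.92], fit=[0.10, 0.09],
                       n_features=[5, 7], ct=[1.2, 1.4], n_total=20)
```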

Results and Analysis
In this subsection, the performance of the ACO-RSA and the comparative MAs is demonstrated not only using the evaluation measures described in Section 4.3, but also based on convergence behavior, boxplot graphs, statistical analysis, and exploration and exploitation effects.

Performance Results
The performance of the ACO-RSA and the other MAs on the seven open-source customer churn datasets is given in Tables 3-6. Each MA is executed 20 times independently to obtain reliable analysis and conclusions. Table 3 compares all the algorithms in terms of the average (Avg) testing accuracy and the number of OFS features. Table 4 reports the best and worst fitness values obtained by the ACO-RSA and the other MAs, while the Avg and standard deviation (Std) of the fitness values are summarized in Table 5. The average CT in seconds for the ACO-RSA and the other MAs on all seven datasets is provided in Table 6. The testing accuracy varies in the range 0-1, where 0 means total misdetection and 1 means perfect detection. The number of features in the OFS varies from 1 to the total number of features in the respective dataset. A good MA should maximize classification accuracy and minimize the number of selected OFS features. In Table 3, the ACO-RSA gained better accuracy than the other MAs on five out of seven datasets. Comparing the OFS for each dataset, the ACO-RSA required fewer informative features than the other MAs. This proves the capability of the ACO-RSA to reduce the selected OFS while obtaining higher accuracy.
The fitness value is a singular measure that varies from 0 to 1, with a preference for values closer to 0 (an ideal value that cannot be achieved), indicating better detection with fewer features. In Table 4, the best fitness value of the ACO-RSA reached the minimum in five out of seven datasets, while its worst fitness value is the smallest in six datasets. Although the RSA scored the smallest fitness value on datasets 2 and 6, the ACO-RSA has better testing accuracy than the standard RSA. Similarly, the PSO achieved the smallest worst-case fitness for dataset 7, but Table 3 confirms the slightly superior performance of the ACO-RSA over the PSO. The ACO-RSA and the standard RSA obtained the first and second rank in the best and worst fitness value range, respectively. Table 5 provides the Avg and Std of the fitness values for all the MAs and datasets over 20 independent runs. A good MA should have a small Avg and Std of fitness values, signifying stability and consistency. As shown in Table 5, the ACO-RSA has the smallest Avg fitness value in six out of seven datasets and the smallest Std in five out of seven datasets. The PSO and MVO have the least Avg and Std for datasets 5 and 7, respectively. It can also be observed in Table 5 that the ACO-RSA approach, with the smallest Std in five out of seven datasets, is more stable than the other comparative algorithms. Although the Std values of the PSO and the standard RSA are smaller than those of the ACO-RSA for datasets 3 and 4, the Avg fitness values of the ACO-RSA are slightly smaller in both cases.
The number of features in each dataset and its size (i.e., number of samples) affect the CT. For instance, with a greater number of features and samples, as in datasets 3, 4, and 7, the algorithms take more CT to find the OFS. Table 6 provides the Avg results in terms of CT. As per Table 6, the standard RSA requires the smallest CT, followed by the ACO-RSA, compared to the other MAs. It can be observed that for small datasets the difference in CT between the standard RSA and the proposed ACO-RSA is not significant, while the difference grows for large datasets. It should be noted that, for practical applications, CT matters only during the training phase and is independent of the FS algorithm in the testing phase. Hence, the ACO-RSA would still be suitable for most real-time implementations of churn prediction. Figure 4 presents the switching behavior of the proposed ACO-RSA during the 50 exploration-exploitation iterations for all seven datasets. The total numbers of iterations spent in ACO and RSA are displayed for each switching behavior in the last column of Figure 4. In datasets 3 and 4, ACO uses slightly more iterations than RSA to exploit the many features. The iterative design of ACO requires more than one iteration to build confidence in the estimated shortest path; hence, Table 6 shows a significantly higher CT for these datasets. On the other hand, datasets 1, 2, 5, and 6 have comparatively fewer features, so more iterations are used by the RSA, resulting in a CT for the ACO-RSA very close to that of the standard RSA. For dataset 7, the much higher number of training examples causes a larger delay in each iteration of ACO, which has a significant impact on the CT of the proposed ACO-RSA relative to the fastest algorithm, the RSA.
Figure 5 demonstrates the convergence behavior of the ACO-RSA and the other comparative MAs for all datasets, with the number of iterations on the x-axis and the fitness values on the y-axis. It presents the average convergence behavior obtained by executing each algorithm 20 times. In these convergence curves, the method with the most rapid convergence is the best one.

Convergence Behavior
Although the standard RSA converges faster than the ACO-RSA on dataset 4, the final fitness value of the ACO-RSA is slightly smaller than that of the RSA. It is clearly observed in Figure 5 that the ACO-RSA shows a faster convergence rate and finds the OFS in the fewest iterations for datasets 1, 2, 3, 5, and 6. This proves that the proposed ACO-RSA is more suitable for churn prediction than the other comparative methods. Figure 6 demonstrates the boxplots used to visualize the distribution of classification accuracy for the ACO-RSA and the comparative MAs. In this figure, the x-axis represents the MAs and the y-axis represents the average accuracy.

Boxplots Graphs
In the boxplots, a small degree of dispersion (the gap between the best, the median, and the worst) indicates the robustness of an algorithm, which achieves consistent results across the experiment. It can be seen from Figure 6 that the ACO-RSA is more robust than the other comparative MAs on most of the datasets. This indicates the efficacy and robustness of the ACO-RSA approach compared to the PSO, MVO, GWO, standard ACO, and standard RSA methods.

Statistical Analysis
To show the significance of the ACO-RSA results, the Friedman test, a widely used nonparametric two-way analysis of variance by ranks [44], is performed on the seven datasets over 20 independent runs. In this test, the null hypothesis H_0 affirms equal behavior of the comparative methods, while the alternative hypothesis H_1 indicates a difference in their behaviors. In the Friedman test, the higher (lower) rank refers to the best algorithm for a measure when a larger (smaller) value is preferred. Table 7 provides the Avg ranking of each algorithm in terms of accuracy, the number of features in the OFS, and the fitness value. The significance level α = 0.05 is employed to reveal statistically reliable results. The highest p-value calculated using the Friedman test over all seven datasets is 0.0026, which is less than α. The lower the p-value, the greater the statistically significant difference; therefore, the results are statistically significant. For the classification accuracy metric, a higher value is better, so the method with the highest rank has the best performance, while for the OFS and fitness value metrics, the method with the lower rank is preferred. In Table 7, the proposed ACO-RSA gained better results for the accuracy, OFS, and fitness value metrics than the PSO, MVO, GWO, standard ACO, and standard RSA in five out of seven datasets. However, in the case of OFS, the RSA achieved slightly better results than the proposed ACO-RSA for datasets 5 and 7. Holm's procedure is used as a post hoc method to statistically confirm the differences in behavior between the controlled algorithm and the other methods. In Holm's test, p-values are adjusted to control the probability of false positives.
The hypotheses are evaluated through pairwise comparisons of the p-values. Each null hypothesis is rejected if its adjusted p-value falls below the significance level; a rejection indicates a significant difference between the controlled method and the corresponding comparative method, otherwise no significant difference is detected.
In the current work, ACO-RSA is employed as the controlled algorithm. The results of Holm's procedure in terms of fitness values for the controlled method and the other comparative algorithms are given in Table 8. Based on Table 8, there is a significant difference between the controlled method and the other MAs in most cases. However, the controlled method shows no significant difference from the standard ACO and standard RSA on dataset 3, nor from the MVO on dataset 5. Overall, the performance of the ACO-RSA approach differs significantly from the rest of the MAs. These results support the superiority of the ACO-RSA approach as an FS method for customer churn prediction.
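Holm's step-down procedure itself is compact enough to sketch directly. The p-values below are hypothetical placeholders (one per pairwise comparison against the controlled algorithm), not the values in Table 8; the step-down thresholds α/(k − i) are the standard Holm adjustment.

```python
# Minimal sketch of Holm's step-down procedure with illustrative p-values,
# each comparing the controlled algorithm (ACO-RSA) to one competitor.
alpha = 0.05
pvals = {"PSO": 0.001, "MVO": 0.030, "GWO": 0.004,
         "ACO": 0.012, "RSA": 0.045}

# Sort hypotheses by ascending p-value; test the i-th smallest against
# alpha / (k - i), stopping at the first non-rejection.
ordered = sorted(pvals.items(), key=lambda kv: kv[1])
k = len(ordered)
rejected = {}
stop = False
for i, (name, p) in enumerate(ordered):
    if not stop and p <= alpha / (k - i):
        rejected[name] = True    # significant difference vs. controlled method
    else:
        stop = True
        rejected[name] = False   # no significant difference detected
print(rejected)
```

Stopping at the first non-rejection is what makes Holm's method step-down: once one hypothesis survives, all larger p-values survive as well.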

Exploration and Exploitation Effects
As mentioned earlier, exploration and exploitation are the two main principles of any search algorithm. These phases are quantified using the dimension-wise diversity measurement presented in [45]. In this approach, exploration during the search process is indicated by an increased mean distance between population members within each dimension, while exploitation is indicated by a reduced mean distance, where the search agents concentrate in a small region. Figure 7 provides the exploration-exploitation ratios during the search process for all MAs on each dataset over 50 iterations. From the bar charts in Figure 7, ACO-RSA maintains a better balance between exploration and exploitation on all seven datasets. Although PSO balanced exploration and exploitation on the first four datasets, it failed to maintain this balance (showing high exploitation) on the remaining three. Most of the other algorithms exhibited high exploitation, which can be confirmed either through the literature or by analyzing the algorithm design. In standard ACO, ants traverse paths iteratively to refine the best solution, which favors exploitation over exploration. Standard RSA attempts to achieve this balance by splitting the total iterations into four stages, but it failed on four out of seven datasets.
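The dimension-wise diversity measure can be sketched as below. This is a hedged reconstruction in the style of [45] (mean absolute distance from the per-dimension median, normalized by the maximum diversity over the run); the contracting population is synthetic and only illustrates how the ratios behave as a search converges.

```python
import numpy as np

def diversity(pop):
    """Dimension-wise diversity: mean absolute distance of each agent
    from the per-dimension median, averaged over all dimensions."""
    med = np.median(pop, axis=0)
    return float(np.mean(np.abs(pop - med)))

def xpl_xpt(div_history):
    """Exploration/exploitation percentages per iteration, normalized by
    the maximum diversity observed during the run."""
    div = np.asarray(div_history)
    dmax = div.max()
    xpl = 100.0 * div / dmax                  # high diversity -> exploration
    xpt = 100.0 * np.abs(div - dmax) / dmax   # low diversity -> exploitation
    return xpl, xpt

# Illustrative: a 30-agent, 10-dimension population that contracts over
# 50 iterations, mimicking convergence toward a concentrated region.
rng = np.random.default_rng(1)
history = [diversity(rng.normal(0.0, 1.0 / (t + 1), size=(30, 10)))
           for t in range(50)]
xpl, xpt = xpl_xpt(history)
print(f"final exploration {xpl[-1]:.1f}%, exploitation {xpt[-1]:.1f}%")
```

By construction the two percentages sum to 100 at every iteration, which is why they are naturally reported as a ratio in bar charts like Figure 7.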

CEC 2019 Test Functions
To demonstrate the capability of the ACO-RSA relative to standard ACO and standard RSA, ten well-known test functions from the CEC 2019 suite, with dimension 50 and the search ranges used in [25], are chosen; these functions have been widely used in recent years. Table 9 provides a summary of these functions. To compute the evaluation criteria, i.e., the Avg and Std values, each algorithm was run 20 times independently on each CEC 2019 function, and the results are given in Table 10. It can be observed that ACO-RSA achieved better performance than standard ACO and standard RSA on three out of ten functions. On functions F1, F2, F4, and F9, both ACO-RSA and RSA achieved the best Avg and Std results. On functions F5 and F8, ACO and RSA reported the best average performance, respectively.
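The evaluation protocol itself (20 independent runs per function, reporting Avg and Std of the best fitness) can be sketched as below. Both `run_algorithm` (a random-search stand-in) and the sphere function are hypothetical placeholders for the real optimizers and the CEC 2019 functions.

```python
import numpy as np

def run_algorithm(func, dim=50, seed=0, budget=1000):
    """Placeholder optimizer (random search) standing in for
    ACO-RSA / standard ACO / standard RSA; returns the best fitness."""
    rng = np.random.default_rng(seed)
    return min(func(rng.uniform(-100, 100, dim)) for _ in range(budget))

def sphere(x):
    """Simple separable test function standing in for a CEC 2019 function."""
    return float(np.sum(x ** 2))

# 20 independent runs, each with a distinct seed, as in the protocol above.
results = [run_algorithm(sphere, seed=s) for s in range(20)]
print(f"Avg = {np.mean(results):.2f}, Std = {np.std(results):.2f}")
```

Seeding each run independently is what makes the Avg/Std pair a fair summary: every entry in a results table like Table 10 aggregates 20 statistically independent trials.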

Conclusions and Future Work
In the telecommunication sector, churn prediction models are broadly employed to analyze and discover patterns in massive data using ML, so that past customers' behavior can be used to predict the ones likely to join other operators. FS is a typical preprocessing problem in ML, concerning the discrimination of salient and redundant features within each dataset's complete set of features. This paper presented a new FS approach that combines the standard ACO and standard RSA for customer churn prediction. The combined method, ACO-RSA, utilized a serial mechanism to balance exploration and exploitation while avoiding traps in local optima. The efficiency of the proposed ACO-RSA was evaluated using seven public benchmark datasets from the churn prediction application and ten CEC 2019 test functions. The reliability and performance of the ACO-RSA were compared with the standard ACO, the standard RSA, and three other MAs: PSO, MVO, and GWO. The results showed that the ACO-RSA approach achieves higher accuracy with the minimum number of features compared with the other methods. Statistical analysis also confirmed the superiority of the ACO-RSA in terms of various measures. Therefore, the proposed ACO-RSA provides a highly reliable FS approach for the churn prediction application. The main limitation of the proposed approach is the slightly high computational time (CT) requirement during the training phase to identify the best combination of the tested elements. We will apply the standard ACO and standard RSA in a parallel manner to reduce the CT in the training phase. In the future, we would like to apply the ACO-RSA approach to various classification, regression, or clustering applications in renewable energy, the Internet of Things, and signal processing.