Chaos Embed Marine Predator (CMPA) Algorithm for Feature Selection

Abstract: Data mining applications are growing with the availability of large data; at times, handling large data is itself a difficult task. Segregation of the data for extracting useful information is inevitable for designing modern technologies. Considering this fact, the work proposes a chaos embed marine predator algorithm (CMPA) for feature selection. The optimization routine is designed with the aim of maximizing the classification accuracy with the optimal number of features selected. Well-known benchmark data sets have been chosen for validating the performance of the proposed algorithm. A comparative analysis of the performance with some well-known algorithms advocates the applicability of the proposed algorithm. Further, the analysis has been extended to some of the well-known chaotic algorithms; first, the binary versions of these algorithms are developed and then the comparative analysis of the performance has been conducted on the basis of mean features selected, classification accuracy obtained and fitness function values. Statistical significance tests have also been conducted to establish the significance of the proposed algorithm.


Introduction
In recent years, the application of optimization in the field of data mining has been reported in many published approaches. Feature selection (FS) from a large data set is one such optimization problem. The FS problem has many industrial and healthcare-related applications. An effective FS technique can enhance the classification accuracy of the classifier and reduce the complexity of the system, which increases substantially with the dimension of the data. In other words, it speeds up the learning process and improves the ability of a machine to anticipate the information pertaining to the data. A recent application of the FS technique in the field of healthcare is reported in [1], where an ensemble-based hybrid feature selection has been employed for the diagnosis of brain tumors. The authors claimed that the proposed method is able to handle imbalanced data. A network intrusion detection scheme based on the Least Square Support Vector Machine has been proposed in [2]. The authors validated the approach on intrusion data sets. The problem of the high dimensionality of the feature space pertaining to text characterization has been addressed in reference [3]. In this work, the authors proposed a novel Gini index for the classification and reduction of the features. Feature selection for the Brain Computer Interface (BCI) has been conducted with the help of information gain ranking, correlation-based feature selection, ReliefF, consistency-based feature selection and 1R ranking methods in the approach [4]. A brief classification of the feature selection algorithms is given in Figure 1. A very interesting approach to path planning for a mobile robot is proposed in reference [5]. For defining the obstacles, the situation of workers in the Artificial Bee Colony (ABC) has been utilized and, in the second phase, the shortest path is selected by Dijkstra's algorithm.
A very important application of the ABC algorithm has been reported for the identification of mechanical parameters of the Servo-drive system [6]. A novel approach of the Adaptive Procedure for Optimization Algorithms is proposed in reference [7]. Apart from these approaches, recent approaches based on metaheuristic optimization motivated the authors to employ an optimization algorithm in a feature selection task [8][9][10]. These references provide strong evidence of the capability of optimization algorithms in dealing with complex engineering problems.
Apart from metaheuristic optimization algorithms and evolution-based algorithms, many deterministic algorithms are also employed for feature selection tasks. Due to their deterministic nature or gradient-based mechanism, these algorithms often become stuck in local minima and exhibit slow and premature convergence. To avoid such problems and to provide a smooth and fast optimization environment, metaheuristic techniques are employed for feature selection problems. The recent trend is to apply metaheuristic optimization algorithms to this task; some of the fine approaches are reported in the following references. The Hybrid Whale Optimization Algorithm (HWOA) [11] combines the Whale Optimization Algorithm with the Simulated Annealing (SA) algorithm. A chaotic dragonfly algorithm has been proposed and applied to the feature selection task in reference [12]. A similar approach based on the chaotic selfish herd optimizer has been proposed in reference [13]. A rich review of the literature pertaining to feature selection methods has been presented in reference [14]. S-shaped and V-shaped functions are employed to create a binary search space in the gaining-sharing knowledge-based algorithm for the feature selection task in reference [15].

Some Recent Chaos-Based Approaches for Feature Selection
A chaotic optimization algorithm based on gaining and sharing knowledge-based optimization has been proposed in reference [16]. Similar applications are based on chaotic fruit fly optimization [17], the chaotic crow search algorithm [18], the chaotic multi-verse optimizer [19] and the chaotic salp swarm optimizer [20].
From these approaches, it is evident that embedding chaos to make naive algorithms suitable for feature selection is a promising area of research. These approaches are strong evidence that, by embedding chaos in the mechanism of an algorithm, a substantial improvement can be achieved in classification accuracy and in the reduction of dimensionality. Based on this discussion, the following subsection presents the research proposal and objectives of this work.

Research Objectives and Proposal
Recently, a new metaheuristic based on predatory behavior has been proposed [21]. The algorithm is known as the marine predator algorithm (MPA). The application of this algorithm in a multi-objective domain has been explored in reference [22]. An improved model of MPA has been established in reference [23]. That work introduced an opposition-based learning method, a chaos map, self-adaption of the population, and switching between exploration and exploitation phases. Application of this algorithm has been explored in the field of controller tuning. Further, a hybrid computational intelligence-based approach has been proposed for structural damage detection in reference [24].
Keeping these facts in mind, the work proposed in this paper addresses the following objectives.

1.
To propose a chaotic marine predator algorithm and develop a balance between the exploration and exploitation phase considering the binary search space.

2.
To benchmark the proposed algorithm on a standard data set used in state-of-the-art classification tasks.

3.
To evaluate the performance of the proposed algorithm with some recently proposed approaches in the feature selection domain.

4.
To evaluate the performance of the proposed algorithm on statistical criteria such as the mean number of features selected by the algorithms, the mean classification accuracy obtained in the optimization runs and the mean fitness values. Apart from these statistical attributes, a statistical test has also been conducted to showcase the statistical significance of the algorithm.
The remaining part of this paper is organized as follows: in Section 2, brief details of the MPA are discussed. Section 3 presents the basic framework of the chaos embed marine predator algorithm (CMPA). Section 4 presents the problem formulation and details of the objective considered in this study. Section 5 presents the results and analysis of different tests. Section 6 concludes all major findings.

Marine Predator Algorithm: An Overview
The marine predator algorithm (MPA) [21] is a recently developed optimization technique based on the philosophy that, while the predator is searching for the prey, the prey also updates its position according to the location of food. The MPA presents a beautiful mimicry of social life in terms of mathematical representations. This section briefly discusses the steps incorporated in the development of MPA. The different steps of MPA are as follows:

1.
Conceptualization of MPA: Like other nature-inspired algorithms, the initial population in MPA is uniformly scattered over the search region, i.e., each variable is initialized as $L_b + r\,(U_b - L_b)$. Here, $L_b$ and $U_b$ are the minimum and maximum values of the variables and $r$ is an arbitrary number satisfying $0 < r < 1$.
Following the well-known Darwinian survival-of-the-fittest theory, the best predators in MPA are selected as the final solution. The top predators can be expressed as the following matrix of order $n \times d$, where $n$ represents the number of search agents and $d$ is the dimension of the problem:

$$TPR_{EM} = \begin{bmatrix} Y^{tp}_{1,1} & Y^{tp}_{1,2} & \cdots & Y^{tp}_{1,d} \\ Y^{tp}_{2,1} & Y^{tp}_{2,2} & \cdots & Y^{tp}_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ Y^{tp}_{n,1} & Y^{tp}_{n,2} & \cdots & Y^{tp}_{n,d} \end{bmatrix}_{n \times d}$$

where $Y^{tp}$ represents the top predator vector, which is replicated $n$ times to construct the Elite matrix $TPR_{EM}$ with $n$ rows and $d$ columns. In MPA, the prey is searching for food and the predator is searching for prey, hence both can be considered as search agents. The prey matrix (TPM) holds the initial solutions, and the positions of the prey improve after every iteration; the Elite matrix $TPR_{EM}$ is built from these updated positions. The prey matrix (TPM) is given by the following expression:

$$TPM = \begin{bmatrix} Y_{1,1} & Y_{1,2} & \cdots & Y_{1,d} \\ Y_{2,1} & Y_{2,2} & \cdots & Y_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ Y_{n,1} & Y_{n,2} & \cdots & Y_{n,d} \end{bmatrix}_{n \times d}$$

where $Y_{i,j}$ denotes the location of the $i$-th prey in the $j$-th dimension. Note that during the search process both prey and predators are search agents, and both search for food.
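The initialization and the construction of the Elite matrix described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `init_prey` and `build_elite`, the sphere test objective and the assumption that fitness is minimized are all hypothetical choices made for the example.

```python
import random

def init_prey(n, d, lb, ub, rng):
    """Scatter n search agents uniformly in [lb, ub) over d dimensions."""
    return [[lb + rng.random() * (ub - lb) for _ in range(d)] for _ in range(n)]

def build_elite(prey, fitness):
    """Replicate the fittest agent n times (minimization is assumed here)."""
    top = min(prey, key=fitness)
    return [list(top) for _ in prey]

def sphere(y):
    """Hypothetical test objective, used only for illustration."""
    return sum(v * v for v in y)

rng = random.Random(0)
prey = init_prey(n=5, d=3, lb=-1.0, ub=1.0, rng=rng)   # plays the role of TPM
elite = build_elite(prey, sphere)                      # plays the role of TPR_EM
```

Every row of `elite` is a copy of the same top predator, which mirrors the statement that the top predator vector is replicated n times.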

1.
Optimization steps: As predators and prey are the two search agents of MPA, the whole optimization process depends on their proportional velocity. The optimization process can be split into three stages. Each stage has a predefined order and duration and is inspired by the natural behavior of the prey and the predator. These stages are as follows:
• Stage 1: The proportional velocity is high, i.e., the prey moves faster than the predator. This case occurs in the initial iterations, when exploration dominates. When the proportional velocity is very high (≥10), the predator is almost still. This stage applies when $t < T_{max}/3$, where $t$ is the current iteration and $T_{max}$ is the maximum number of iterations. In the position update of this stage, $step_i$ is the step size of the $i$-th search agent, $R_B$ is a vector of arbitrary numbers related to the Brownian motion, $K$ is a constant taken as equal to 0.5 and $R$ is a vector of arbitrary numbers in $[0, 1]$. This stage covers approximately the first third of the total iterations, when exploration is high.
• Stage 2: The proportional velocity of predator and prey is almost the same, which indicates that the prey is looking for its food and the predator is looking for its prey. This case happens in the middle iterations, when exploration is gradually converting into exploitation. At this time, half of the population, i.e., the predators, is accountable for exploration and the other half, i.e., the prey, is responsible for exploitation. If the prey follows the Levy motion and the predator follows the Brownian motion, the proportional velocity is approximately 1. This stage applies when $\frac{1}{3} T_{max} < t < \frac{2}{3} T_{max}$. For the first half of the population, the update uses $R_L$, a vector of arbitrary numbers related to the Levy motion. As the step size in the Levy distribution is mostly very small, this movement represents exploitation. For the second half of the population, MPA uses an update in which a control parameter commands the step size of the movements of the predator. The predator moves according to the Brownian motion and the prey follows the predator for its position updates.
• Stage 3: The proportional velocity is low, i.e., the predator is moving faster than the prey. This situation occurs in the last iterations of the optimization, $t > \frac{2}{3} T_{max}$, and is related to exploitation. The predator adopts the Levy motion in the case of low proportional velocity (=0.1).
These three stages present different steps of predators in finding their prey. According to their behaviour, we consider that the predator follows both the Brownian and Levy motion equally. In stage 1, the predator is still, in stage 2 it follows the Brownian motion and in the last stage it moves in the Levy motion. The same holds for the prey, as the prey is also a predator for some other marine creatures. For example, bony fish and marine invertebrates are prey for tuna fish, which are themselves prey for silky sharks.
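A minimal sketch of the three-stage update is given below. It assumes the standard MPA update rules of [21]; the exact expressions, the control parameter `cf`, the Levy exponent 1.5, the 0.05 Levy scale and the use of Mantegna's method are assumptions made for illustration, and the names `brownian`, `levy` and `mpa_step` are hypothetical.

```python
import math
import random

def brownian(d, rng):
    """Vector R_B: standard normal steps."""
    return [rng.gauss(0.0, 1.0) for _ in range(d)]

def levy(d, rng, beta=1.5, scale=0.05):
    """Vector R_L: Levy-distributed steps via Mantegna's method (assumed)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return [scale * rng.gauss(0.0, sigma) / abs(rng.gauss(0.0, 1.0)) ** (1 / beta)
            for _ in range(d)]

def mpa_step(prey, elite, t, t_max, rng, K=0.5):
    """One position update of every search agent at iteration t."""
    n, d = len(prey), len(prey[0])
    cf = (1 - t / t_max) ** (2 * t / t_max)  # assumed adaptive control parameter
    updated = []
    for i in range(n):
        y, e = prey[i], elite[i]
        R = [rng.random() for _ in range(d)]
        if t < t_max / 3:                      # stage 1: Brownian, prey-centred
            rb = brownian(d, rng)
            step = [rb[j] * (e[j] - rb[j] * y[j]) for j in range(d)]
            row = [y[j] + K * R[j] * step[j] for j in range(d)]
        elif t < 2 * t_max / 3 and i < n // 2:  # stage 2, first half: Levy
            rl = levy(d, rng)
            step = [rl[j] * (e[j] - rl[j] * y[j]) for j in range(d)]
            row = [y[j] + K * R[j] * step[j] for j in range(d)]
        else:                                  # stage 2 second half, or stage 3
            r = brownian(d, rng) if t < 2 * t_max / 3 else levy(d, rng)
            step = [r[j] * (r[j] * e[j] - y[j]) for j in range(d)]
            row = [e[j] + K * cf * step[j] for j in range(d)]
        updated.append(row)
    return updated
```

Because `cf` shrinks to zero as `t` approaches `t_max`, the elite-centred updates in the last branch take ever smaller steps around the elite position.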

2.
Fish Aggregating Device Effect (FAD): A FAD is a floating device made by humans to attract some specific marine creatures in tropical regions. It also affects marine animals in many other ways. According to [25], sharks spend about 80% of their lifespan around FADs and the rest in jumping in various dimensions to find prey. FADs can therefore be regarded as local optima in which marine predators become trapped. In the mathematical model of the FAD effect, $f$ is the probability of the FAD effect on any optimizer and is taken as $f = 0.2$, $q$ is a random number between 0 and 1, and $r_1$ and $r_2$ represent two arbitrary indexes of the prey matrix.
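The FAD effect can be sketched as follows, assuming the two-branch rule of the standard MPA [21]: with probability f an agent jumps towards a random point of the search region on a random subset of dimensions, otherwise it moves along the difference of two randomly chosen prey. The function name `fad_effect`, the binary mask `u` and the scaling by `cf` are assumptions, not taken from this paper.

```python
import random

def fad_effect(prey, lb, ub, cf, rng, f=0.2):
    """Apply the FAD perturbation to every row of the prey matrix."""
    n, d = len(prey), len(prey[0])
    out = []
    for y in prey:
        q = rng.random()
        if q <= f:
            # Long jump: random point in [lb, ub) on a random subset of dimensions.
            u = [1 if rng.random() > f else 0 for _ in range(d)]
            row = [y[j] + cf * (lb + rng.random() * (ub - lb)) * u[j]
                   for j in range(d)]
        else:
            # Move along the difference of two arbitrary prey r1 and r2.
            r1, r2 = rng.randrange(n), rng.randrange(n)
            w = f * (1 - q) + q
            row = [y[j] + w * (prey[r1][j] - prey[r2][j]) for j in range(d)]
        out.append(row)
    return out
```

The second branch leaves an agent unchanged when the two sampled prey coincide, so the perturbation naturally fades as the population converges.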

3. Memory of marine predators: Almost all marine predators are good at memorizing the locations of successful foraging, which is referred to as the memory saving term in MPA. After the prey update their locations and the FAD effect is applied, the fitness of the prey matrix is evaluated to decide whether the elite matrix should be updated, and the fitter solution is retained. This step also helps to improve the solution, according to [26].
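A minimal sketch of the memory saving step, assuming that fitness is minimized and that the comparison is made agent by agent; the function name `memory_saving` and its return values are hypothetical.

```python
def memory_saving(old_prey, old_fit, new_prey, fitness):
    """Keep, for each agent, whichever of its old and new positions is fitter."""
    kept, kept_fit = [], []
    for y_old, f_old, y_new in zip(old_prey, old_fit, new_prey):
        f_new = fitness(y_new)
        if f_new < f_old:          # the new position improves on memory
            kept.append(y_new)
            kept_fit.append(f_new)
        else:                      # otherwise fall back to the remembered one
            kept.append(y_old)
            kept_fit.append(f_old)
    best = min(range(len(kept)), key=kept_fit.__getitem__)
    return kept, kept_fit, kept[best]   # kept[best] is the new top predator
```

The returned top predator is what would be replicated to refresh the elite matrix for the next iteration.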

Development of Chaos Embed Marine Predator Algorithm
This section presents the development of the chaos embed marine predator algorithm (CMPA). The following are the procedural steps for the development.

1.
The MPA has been divided into three phases. During the first phase, the search agents take big leaps and try to cover as much of the search space as they can; hence, this phase is primarily governed by exploratory action. Likewise, during the final phase, the exploration virtue of the algorithm is weakened and the exploitation virtue is enhanced. In a way, the starting phase, which governs the first 1/3 of the iterations, and the last phase, which governs the last 1/3 of the iterations, are solely dedicated to the exploration and exploitation virtues, respectively. Hence, any modification in these phases enhances either the exploration or the exploitation virtue of MPA. Considering this fact, the authors are motivated to develop a new position update mechanism that can affect both virtues simultaneously.

2.
During the intermediate phase, where both processes progress simultaneously, a position update mechanism that can search alternative solutions is acutely required. Considering this argument, we propose a chaotic function-inspired position update mechanism that helps the algorithm to transit swiftly between the exploration and exploitation phases.
(a) A β-chaotic sequence is generated after initializing the parameters $(\nu, \mu, J_1, J_2)$ of the generalized β distribution, where $(\nu, \mu, J_1, J_2) \in \mathbb{R}$ and $J_1 < J_2$. The sequence provides one chaotic value at every iteration $t$.
(b) For the first part of the population, a chaotic update mechanism is introduced during the second phase. Here, $R_L$ is a vector of arbitrary numbers related to the Levy motion. As the step size in the Levy distribution is mostly very small, this movement represents exploitation.
(c) More precisely, the update of the prey position is governed by a decision-making loop.
In this modification, R has been replaced by Equation (15). This implies that a new chaotic number is assigned at every iteration for the decision process. Hence, the decision for the position update is handled with the help of the chaotic function instead of a random number drawn from [0, 1]. The pseudo code of the proposed algorithm is depicted in Algorithm 1.

Algorithm 1. Chaos embed marine predator algorithm (CMPA).
1: while $t < T_{max}$
2: if $t < T_{max}/3$
3: Update prey based on phase 1, Equations (4) and (5).
4: else if $T_{max}/3 < t < 2\,T_{max}/3$
5: Update prey based on phase 2, Equations (8), (9) and (15)-(18).
6: else update prey based on phase 3, Equations (10) and (11).
7: end if
8: Accomplish memory saving and update $TPR_{EM}$.
9: Apply the FAD effect and update based on the last phase as per Equations (12) and (13).
10: end while
11: Print the values of Fitness, Accuracy and Attributes.
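The replacement of the random vector R by a chaotic value can be sketched as follows. Two assumptions are made loudly here: the logistic map is used purely as a stand-in chaotic generator, because the β-chaotic map expression is not reproduced in this sketch, and the update form is the standard MPA stage-2 rule for the first half of the population. The names `logistic_sequence` and `chaotic_prey_update` are hypothetical.

```python
def logistic_sequence(length, c0=0.7, a=4.0):
    """Stand-in chaotic sequence in [0, 1] (logistic map, NOT the beta map)."""
    seq, c = [], c0
    for _ in range(length):
        c = a * c * (1.0 - c)
        seq.append(c)
    return seq

def chaotic_prey_update(y, elite, levy_step, c_t, K=0.5):
    """Stage-2 update of one prey with the chaotic value c_t in place of R."""
    step = [rl * (e - rl * v) for rl, e, v in zip(levy_step, elite, y)]
    return [v + K * c_t * s for v, s in zip(y, step)]

chaos = logistic_sequence(100)          # one chaotic value per iteration t
new_y = chaotic_prey_update([0.0], [2.0], [1.0], c_t=chaos[0])
```

Swapping `logistic_sequence` for a β-chaotic generator would only change how `chaos` is produced; the update itself consumes one value per iteration either way.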

Discussion
During stage 2, both prey and predator move at the same pace; hence, there is a chance of stagnation in local minima, as the exploration and exploitation rates are almost the same. Hence, to keep both exploration and exploitation alive, the random number in the position update equation has been replaced with chaotic numbers, which are obtained from the sequence generation as per the definition in Equations (14) and (15).
Embedding chaos at this stage, when the velocities of prey and predator are almost the same, is more meaningful because the search agents can otherwise be drawn to a local minimum without exploring other directions. Hence, it is necessary to keep the velocity of the search agents varying. This fact also motivates the experimental investigation of embedding chaos in other phases. In this work, our focus is to embed chaos and to observe the impact of this addition solely on the optimization performance of the algorithm in the binary domain. The following section presents the problem formulation for the evaluation of the proposed CMPA.

Problem Formulation
From the evaluation perspective, feature selection approaches can be classified into two broad categories. In the first type of approach, which is based on filter methods, an effective subset of the features is selected and its performance is evaluated; finally, the algorithm suggests the optimal subset. In this type of approach, the subset is not evaluated over the training samples. On the other hand, wrapper-based feature selection approaches evaluate the feature subset, and performance validation is conducted with the testing and validation data sets. Feature selection is always considered as a multi-objective optimization problem, where the objectives can be the maximization of the classification accuracy and the minimization of the number of selected features. These two objectives are conflicting in nature. Hence, the objective function employed in this study is a weighted combination of these objectives.
In this objective function, Er(D) is the classification error rate of a given classifier; in this work, we have employed the K-nearest neighbor (KNN) classifier, and $w_1$ and $w_2$ are the weights, where $w_1 = 1 - w_2$. The weighted combination philosophy has been adapted from reference [11].
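A minimal sketch of such a weighted wrapper fitness is given below. It assumes the common form $w_1 \cdot Er(D) + w_2 \cdot (\text{selected features} / \text{total features})$; the weight value 0.99, the penalty for an empty subset, the plain Euclidean k-NN and the names `knn_error` and `fitness` are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_error(train, test, mask, k=5):
    """Error rate Er(D) of k-NN restricted to the features with mask[j] == 1."""
    idx = [j for j, bit in enumerate(mask) if bit]

    def dist(a, b):
        return math.sqrt(sum((a[j] - b[j]) ** 2 for j in idx))

    wrong = 0
    for x, label in test:
        nearest = sorted(train, key=lambda s: dist(s[0], x))[:k]
        vote = Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
        wrong += vote != label
    return wrong / len(test)

def fitness(mask, train, test, w1=0.99, k=5):
    """Weighted combination of error rate and feature ratio (to be minimized)."""
    if not any(mask):
        return 1.0                 # assumed penalty: no feature selected
    w2 = 1.0 - w1                  # follows w1 = 1 - w2
    return w1 * knn_error(train, test, mask, k) + w2 * sum(mask) / len(mask)
```

Here `train` and `test` are lists of `(feature_vector, label)` pairs and `mask` is the binary solution vector of one search agent; a lower value means fewer errors with fewer features.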

Results and Discussions
To compare the proposed variant, we draw a comparison on the basis of the classification accuracy, the fitness values obtained by the algorithm and the average number of attributes obtained from the optimization runs. In order to assess the performance of the proposed algorithm, 17 classical data sets have been chosen. The details of the data sets are shown in Table 1.
We have reported our results in two sets. In set-1, a comparison is made with contemporary algorithms, and in set-2 the chaotic algorithms are simulated and their comparative analysis is presented.

Experimental Details
Designing a mechanism that chooses the optimal features from the given sets is a very important procedure, as randomness can alter the results significantly. Hence, a rigorous experimental analysis has been carried out for choosing the number of iterations and the number of search agents, and both the chaotic marine predator algorithm and the marine predator algorithm have been analyzed over many independent runs. We choose the Vote, Tic-Tac-Toe, Sonar, Penguin, Lymphography, Exactly, CongressEw and Breast Cancer data sets for this analysis. In this analysis, we vary the number of search agents (5, 10 and 20) and the maximum number of iterations (20, 30, 50 and 70). From the analysis conducted in this experiment, we have adopted 10 search agents and a maximum iteration number of 100. This analysis is conducted in such a manner that the parametric impact can be observed on the classification accuracy and the fitness values. We observe that, with these parameter values, the classification accuracy is not compromised and the fitness values are also optimal. Further, the experimental details of this study are shown in Figure 2.

Comparison with Previously Published Approaches
For investigation, the comparison is made with some of the previously reported approaches in the classification domain, where the objective function depicted in the previous section has been considered with the KNN classifier. The comparison of the fitness values is shown in Table 2. It is worth mentioning here that the simulation process is time consuming, hence the mean values of 10 runs are reported in the table. We observe that the fitness values of the proposed CMPA are competitive for all the test data and, in some cases, optimal. This fact establishes the applicability of CMPA in the binary domain. For example, in the case of the CongressEw data, the fitness values are optimal for both CMPA and MPA. Further, a comparative analysis of the classification accuracy has also been conducted with previously published algorithms; we observed that the classification accuracy of the proposed algorithm is better than that of MPA, GA, PSO and ALO. These results are shown in Table 3. For example, in the case of the Zoo data set, we observed that the classification accuracy of CMPA is about 98%; on the other hand, the classification accuracy is substantially lower for ALO (91%), GA (88%) and PSO (83%).
It is also important to showcase the fact that this classification accuracy has been achieved without compromising the feature size. Hence, the attributes (features) selected by every algorithm in each run have been averaged and showcased in Table 4. These values are very important indicators, as it can be easily observed from the table that the number of features selected by the algorithm is optimal in many cases, and this happens without compromising the classification accuracy.

Comparative Analysis of MPA and CMPA
For conducting this analysis, we have compared the optimization run results of MPA and CMPA on the basis of the attributes selected, the fitness function values and the classification accuracy achieved for different data sets. Table 5 showcases the results of the Wilcoxon rank-sum test [30] between MPA and CMPA, and the p-values are depicted in the table. This test is conducted at the 5% significance level (95% confidence level). An entry of 1 in the p-values column marks the native algorithm, against which the statistical comparison is executed. Here, MPA is considered as the native algorithm and the rank-sum test has been executed between MPA and the proposed CMPA. Hence, results with p-values below 0.05 are considered to come from a different distribution. From the entries depicted in the table, it has been observed that CMPA provides competitive results when compared with MPA, and provides optimal values of attributes, fitness function values and classification accuracies for almost all data sets. This fact advocates the applicability of the proposed algorithm to the feature selection problem.
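For illustration, a two-sided rank-sum test can be sketched with the normal approximation as follows. This is a simplified version under stated assumptions: ties receive average ranks, but neither a tie correction of the variance nor a continuity correction is applied, so p-values for very small samples are only approximate. The function name `rank_sum_test` is hypothetical.

```python
from statistics import NormalDist

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation)."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1      # average rank over a run of ties
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    mean = n1 * (n1 + n2 + 1) / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (w - mean) / sd
    return 2 * NormalDist().cdf(-abs(z))
```

Passing the per-run results of two algorithms as `x` and `y` gives a p-value that can be compared against the 0.05 threshold used in the table.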

Comparative Analysis of Performance of the Proposed CMPA with Other Chaotic Algorithms
Further, it is an established fact that embedding chaos in metaheuristic algorithms improves the optimization efficiency in the binary domain. In order to investigate this fact, some recently published algorithms are considered for the evaluation of the performance of the proposed CMPA. These algorithms are the enhanced chaotic grasshopper optimization algorithm (ECGOA) (with sine map) [31], the sinusoidal bridging mechanism-based grasshopper algorithm (with sine map) [32] and the enhanced chaotic artificial bee colony algorithm (ECABC) (with sine map) [33]. The binary versions of these chaotic algorithms are obtained as per reference [11].
For showcasing the impact of chaos on the performance of these algorithms, the classification accuracy along with the mean fitness attribute selected by the algorithms is depicted in Table 6. From the table it has been observed that, for the majority of the data sets, the classification accuracy is very competitive and is achieved with a smaller number of selected features. Further, a statistical significance test has been conducted for the comparison of the proposed algorithm with the other chaotic algorithms. The mean number of features obtained from the optimization runs, along with the p-values of the rank-sum test, is showcased in Table 7. The following points are observed:

•
The mean values of features obtained by CMPA are found optimal for 15 data sets. Only for the Zoo data set is SFECGOA optimal, and for the HeartEW data set ECABC is optimal. This fact suggests that the selection of features without compromising accuracy is possible with the proposed CMPA.
• Inspecting the p-values obtained from the Wilcoxon rank-sum test [30], it has been observed that all the algorithms have p-values less than 0.05. Hence, it can be said that a statistical significance exists in the results obtained for the mean attributes. This indicates that the observed differences are unlikely to be due to chance.
• The graphical analysis of the results obtained from the optimization process has been depicted with the help of bar charts in Figures 3 and 4. From these figures it is evident that the optimization capability of the proposed CMPA is superior to that of the other algorithms.
• From the analysis conducted in this experiment, it has been observed that the chaotic position update mechanism in MPA yields better results compared with the contemporary chaotic algorithms that use chaos as a bridging mechanism. In short, the modification suggested in the MPA is meaningful and demonstrates a positive impact on the optimization performance of the proposed algorithm.

Conclusions
This paper reports an application of the chaotic marine predator algorithm to a feature selection task; a binary version of the chaotic MPA is proposed in this work by altering the decision making of the position update phase of stage 2 with a chaotic sequence. We have changed the decision process by incorporating chaotic numbers generated from a chaotic sequence. Further, the proposed binary algorithm has been tested over 17 data sets and the algorithm analysis has been performed against the native algorithm. We observed that the native algorithm is strong and robust, but some modifications in the position update process make it more suitable for the feature selection task. The results are reported with the help of different analyses. The following are the major conclusions drawn from this work.

1.
The algorithm analysis has been conducted on the basis of the number of search agents selected and the number of iterations selected for feature selection. After this analysis, the optimal values of design parameters have been selected for executing the feature selection task.

2.
A comparison with a recently published algorithm and state-of-the-art algorithms has been conducted to showcase the efficacy of the algorithm; the fitness value of the objective function along with classification accuracy have been reported in order to validate the efficacy of the proposed modification.

3.
A comparison of some chaotic algorithms with the proposed CMPA has also been reported to showcase the feasibility of CMPA. It is observed that the classification accuracy of the algorithm has not been compromised and the number of features obtained from the optimization runs is found optimal for the majority of cases.

4.
Graphical analysis along with the statistical comparison of the proposed algorithm with others revealed that a modification in stage 2 of the MPA has some positive implications for the optimization performance of MPA.
Application of chaos in multiple phases with normalization and scaled functions will be evaluated in the future.