Article

Chaos Embed Marine Predator (CMPA) Algorithm for Feature Selection

by
Adel Fahad Alrasheedi
1,
Khalid Abdulaziz Alnowibet
1,
Akash Saxena
2,*,
Karam M. Sallam
3 and
Ali Wagdy Mohamed
4,5,*
1
Statistics and Operations Research Department, College of Science, King Saud University, P.O. Box 2455, Riyadh 11451, Saudi Arabia
2
Swami Keshvanand Institute of Technology, Management & Gramothan, Jaipur 302017, India
3
School of IT and Systems, University of Canberra, Bruce, ACT 2601, Australia
4
Operations Research Department, Faculty of Graduate Studies for Statistical Research, Cairo University, Giza 12613, Egypt
5
Department of Mathematics and Actuarial Science School of Sciences Engineering, The American University in Cairo, Cairo 11835, Egypt
*
Authors to whom correspondence should be addressed.
Mathematics 2022, 10(9), 1411; https://doi.org/10.3390/math10091411
Submission received: 12 March 2022 / Revised: 15 April 2022 / Accepted: 17 April 2022 / Published: 22 April 2022
(This article belongs to the Special Issue Variational Problems and Applications)

Abstract

Data mining applications are growing with the availability of large data sets; handling such large volumes of data is itself a challenging task. Segregation of the data for extracting useful information is inevitable for designing modern technologies. Considering this fact, this work proposes a chaos embed marine predator algorithm (CMPA) for feature selection. The optimization routine is designed with the aim of maximizing the classification accuracy with an optimal number of selected features. Well-known benchmark data sets have been chosen for validating the performance of the proposed algorithm. A comparative analysis of the performance against some well-known algorithms advocates the applicability of the proposed algorithm. Further, the analysis has been extended to some well-known chaotic algorithms: first, the binary versions of these algorithms are developed, and then a comparative analysis of the performance is conducted on the basis of the mean number of features selected, the classification accuracy obtained and the fitness function values. Statistical significance tests have also been conducted to establish the significance of the proposed algorithm.
MSC:
68T01; 68T05; 68T07; 68T09; 68T20; 68T30

1. Introduction

In recent years, the application of optimization in the field of data mining has been reported in many published approaches. Feature selection (FS) from a large data set is one such optimization problem. The FS problem has many industrial and healthcare-related applications. An effective FS technique can enhance the classification accuracy of the classifier and reduce the complexity of the system, which increases substantially with the dimension of the data. It also speeds up the learning rate and improves the ability of a machine to anticipate the information pertaining to the data. A recent application of the FS technique in the field of healthcare is reported in [1], where an ensemble-based hybrid feature selection has been employed for the diagnosis of brain tumors. The authors claimed that the proposed method is able to handle imbalanced data. A network intrusion detection scheme based on the Least Squares Support Vector Machine has been proposed in [2]. The authors validated the approach on intrusion data sets. The problem of the high dimensionality of the feature space in text categorization has been addressed in reference [3]; in that work, the authors proposed a novel Gini index for the classification and reduction of the features. Feature selection for the Brain Computer Interface (BCI) has been conducted with the help of information gain ranking, correlation-based feature selection, ReliefF, consistency-based feature selection and 1R ranking methods in the approach of [4]. A brief classification of feature selection algorithms is given in Figure 1.
A very interesting approach to path planning for mobile robots is proposed in reference [5]: obstacles are defined using the positions of the workers in the Artificial Bee Colony, and in the second phase the shortest path is selected by Dijkstra's algorithm. A very important application of the ABC algorithm has been reported for the identification of mechanical parameters of a servo-drive system [6]. A novel adaptive procedure for optimization algorithms is proposed in reference [7]. Apart from these approaches, recent metaheuristic-optimization-based approaches motivated the authors to employ an optimization algorithm in the feature selection task [8,9,10]. These references provide strong evidence of the capability of optimization algorithms in dealing with complex engineering problems.
Apart from metaheuristic and evolution-based algorithms, many deterministic algorithms have also been employed for conducting feature selection tasks. Due to their deterministic nature or gradient-based mechanism, these algorithms often become stuck in a local minima trap and exhibit slow, premature convergence. To avoid such problems and to provide a smooth and fast optimization environment, metaheuristic techniques are employed for solving feature selection problems. The recent trend is to apply metaheuristic optimization algorithms to this task; some fine approaches are depicted in the following references. The Hybrid Whale Optimization Algorithm (HWOA) [11] combines the Whale Optimization Algorithm with the Simulated Annealing (SA) algorithm. A chaotic dragonfly algorithm has been proposed and applied to the feature selection task in reference [12]. A similar approach based on the chaotic selfish herd optimizer has been proposed in reference [13]. A rich review of the literature pertaining to feature selection methods has been presented in reference [14]. S-shaped and V-shaped functions are employed to create a binary search space in the gaining and sharing knowledge-based algorithm for the feature selection task in reference [15].
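Transfer functions of this kind map a continuous position update onto a probability of setting a feature bit to 1. As a minimal illustration (not the exact formulation of [15]), an S-shaped (sigmoid) transfer function can be sketched in Python as:

```python
import numpy as np

def s_shaped_transfer(x):
    """S-shaped (sigmoid) transfer function: maps a continuous position
    component to a probability of the corresponding feature bit being 1."""
    return 1.0 / (1.0 + np.exp(-x))

def binarize(position, rng):
    """Convert a continuous position vector into a binary feature mask."""
    probs = s_shaped_transfer(position)
    return (rng.random(position.shape) < probs).astype(int)

rng = np.random.default_rng(42)
mask = binarize(np.array([-2.0, 0.0, 3.0]), rng)  # 0/1 feature mask
```

V-shaped functions differ only in the mapping (e.g., |tanh(x)|), but serve the same purpose of turning a continuous search into a binary one.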

1.1. Some Recent Chaos-Based Approaches for Feature Selection

A chaotic optimization algorithm based on gaining and sharing knowledge-based optimization has been proposed in reference [16], along with similar applications based on chaotic fruit fly optimization [17], chaotic crow search algorithms [18], the chaotic multi-verse optimizer [19] and chaotic salp swarm optimizers [20].
From these approaches, it is evident that embedding chaos to make naive algorithms compatible with feature selection is a potential area of research. These approaches are strong evidence that, by embedding chaos in the mechanism of an algorithm, a substantial improvement can be achieved as far as classification accuracy and reduction in dimensionality are concerned. Based on this discussion, the following subsection presents the research proposal and objectives of this work.

1.2. Research Objectives and Proposal

Recently, a new metaheuristic based on predatory behavior has been proposed [21]. The algorithm is known as the marine predator algorithm (MPA). The application of this algorithm in a multi-objective domain has been explored in reference [22]. A new improved model of MPA has been established in reference [23]; that paper introduced an opposition-based learning method, chaotic maps, self-adaptation of the population, and switching between exploration and exploitation phases, and explored the application of the algorithm in the field of controller tuning. Further, a hybrid computational intelligence-based approach has been proposed for structural damage detection in reference [24].
Keeping these facts in mind, the work proposed in this paper addresses the following objectives.
  • To propose a chaotic marine predator algorithm and develop a balance between the exploration and exploitation phase considering the binary search space.
  • To benchmark the proposed algorithm on a standard data set used in state-of-the-art classification tasks.
  • To evaluate the performance of the proposed algorithm with some recently proposed approaches in the feature selection domain.
  • To evaluate the performance of the proposed algorithm on certain evaluation criteria, such as the mean number of features selected by the algorithms, the mean classification accuracy obtained over the optimization runs and the mean fitness values. Apart from these statistical attributes, a statistical significance test has also been conducted to showcase the statistical significance of the algorithm.
The remaining part of this paper is organized as follows: in Section 2, brief details of the MPA are discussed. Section 3 presents the basic framework of the chaos embed marine predator algorithm (CMPA). Section 4 presents the problem formulation and details of the objective considered in this study. Section 5 presents the results and analysis of different tests. Section 6 concludes all major findings.

2. Marine Predator Algorithm: An Overview

The marine predator algorithm (MPA) [21] is a recently developed optimization technique based on the observation that, while the predator is searching for the prey, the prey also updates its position according to the location of food. The MPA presents an elegant mimicry of this behavior in terms of mathematical representations. This section briefly discusses the steps incorporated in the development of MPA. The different steps of MPA are as follows:
  • Conceptualization of MPA: Like other nature-inspired algorithms, the initial population in MPA is equally scattered in the search region, which can be given as:
    $$Y_0 = L_b + m\,(U_b - L_b)$$
Here, $L_b$ and $U_b$ are the minimum and maximum values of the variables, and $m$ is an arbitrary number satisfying $0 < m < 1$.
Following the well-known Darwinian survival-of-the-fittest theory, in MPA a group of the best predators is selected as the final solution. In MPA, the initial locations of the prey can be expressed as the following matrix of order $n \times d$, where $n$ represents the number of search agents and $d$ is the dimension of the problem.
$$TPR_{EM} = \begin{bmatrix} Y^{tp}_{1,1} & Y^{tp}_{1,2} & \cdots & Y^{tp}_{1,d} \\ Y^{tp}_{2,1} & Y^{tp}_{2,2} & \cdots & Y^{tp}_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ Y^{tp}_{n,1} & Y^{tp}_{n,2} & \cdots & Y^{tp}_{n,d} \end{bmatrix}$$
where $Y^{tp}$ represents the top predator vector, which is replicated $n$ times to construct the elite matrix $TPR_{EM}$ of order $n \times d$. In MPA, the prey is searching for food and the predator is searching for prey, hence both can be considered as search agents. The prey matrix $TPM$ holds the initial solutions; after every iteration the positions of the prey improve, and the updated best solutions form the elite matrix $TPR_{EM}$. The prey matrix ($TPM$) is given by the following expression.
$$TPM = \begin{bmatrix} Y_{1,1} & Y_{1,2} & \cdots & Y_{1,d} \\ Y_{2,1} & Y_{2,2} & \cdots & Y_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ Y_{n,1} & Y_{n,2} & \cdots & Y_{n,d} \end{bmatrix}$$
$Y_{i,j}$ denotes the location of the i-th prey in the j-th dimension. It is to be noted that during the search process, both prey and predators are search agents and they search for food.
  • Optimization steps: As predators and prey are the two search agents of MPA, the whole optimization process depends on their proportional velocity. To illustrate the optimization process scientifically, it can be split into three stages. Each stage has a predefined natural order and time, inspired by the natural behavior of the prey and predator. These stages are as follows:
    • Stage 1: The velocity of the predator is greater than that of the prey. This case occurs in the initial iterations, when exploration dominates. When the proportional velocity is very high, i.e., (≥10), the predator is almost still. Mathematically, when $t < T_{max}/3$,
      $$step_i = R_B \otimes (TPR^{EM}_i - R_B \otimes TPM_i)$$
      where $t$ is the current iteration and $T_{max}$ is the maximum number of iterations.
      $$TPM_i = TPM_i + K \cdot R \otimes step_i$$
      where $step_i$ is the step size of the i-th search agent, $R_B$ is a vector of arbitrary numbers drawn from the Brownian motion, $K$ is a constant taken equal to 0.5, and $R$ is a vector of arbitrary numbers in $[0, 1]$. This stage occupies roughly the first third of the total iterations, when exploration is high.
    • Stage 2: The proportional velocities of predator and prey are almost the same, which indicates that the prey is looking for its food and the predator is looking for its prey. This case happens in the middle iterations, when exploration is gradually converting into exploitation. At this time, one half of the population (the prey) is accountable for exploitation and the other half (the predator) is responsible for exploration. If the prey follows the Levy motion and the predator follows the Brownian motion, the proportional velocity is (≈1). Mathematically, when $\frac{1}{3}T_{max} < t < \frac{2}{3}T_{max}$, for the first half of the population:
      $$step_i = R_L \otimes (TPR^{EM}_i - R_L \otimes TPM_i)$$
      $$TPM_i = TPM_i + K \cdot R \otimes step_i$$
      Here, $R_L$ is a vector of arbitrary numbers drawn from the Levy motion. As the Levy distribution mostly produces very small steps, this movement chiefly represents exploitation.
      For the second half of the population, MPA considers
      $$step_i = R_B \otimes (R_B \otimes TPR^{EM}_i - TPM_i)$$
      $$TPM_i = TPM_i + K \cdot C \otimes step_i$$
      where $C = \left(1 - \frac{t}{T_{max}}\right)^{2\frac{t}{T_{max}}}$ is a control parameter that commands the step size of the movements of the predator. The predator moves according to the Brownian motion and the prey follows the predator for its position updates.
    • Stage 3: The proportional velocity ratio is low, i.e., the predator is moving faster in comparison to the prey. This situation occurs in the last iterations of optimization and is associated with exploitation. The predator adopts the Levy motion in the case of low proportional velocity (=0.1). Mathematically, if $t > \frac{2}{3}T_{max}$,
      $$step_i = R_L \otimes (R_L \otimes TPR^{EM}_i - TPM_i), \quad i = 1, \ldots, n$$
      $$TPM_i = TPR^{EM}_i + K \cdot C \otimes step_i$$
      These three stages present the different steps of predators in finding their prey. According to their behaviour, we consider that the predator follows both the Brownian and Levy motion: in stage 1 the predator is still, in stage 2 it follows the Brownian motion, and in the last stage it moves in the Levy motion. The same holds for the prey, as the prey is also a predator for some other marine creatures. For example, bony fish and marine invertebrates are prey for tuna, which are themselves prey for silky sharks.
  • Fish Aggregating Device (FAD) effect: A FAD is a floating device made by humans to attract specific marine creatures in tropical regions, and it affects marine animals in many other ways. According to [25], sharks spend 80% of their lifespan around FADs and the rest jumping in various dimensions to find prey. These FADs can be considered as local-optima trapping agents of marine predators. The effect of FADs can be given mathematically as:
    $$TPM_i = \begin{cases} TPM_i + C\,[L_b + R \otimes (U_b - L_b)] \otimes A & \text{if } r \le f \\ TPM_i + [f(1-q) + q]\,(TPM_{r1} - TPM_{r2}) & \text{if } r > f \end{cases}$$
    Here, $f$ is the probability of the FAD effect on the optimizer, taken as $f = 0.2$; $q$ is a random number between 0 and 1; and $r_1$ and $r_2$ represent two arbitrary indexes of the prey matrix. $A$ is a binary vector given by
    $$A = \begin{cases} 0 & \text{if } r < 0.2 \\ 1 & \text{if } r \ge 0.2 \end{cases}$$
  • Memory of marine predators: Almost all marine predators are good at memorizing the locations of successful foraging, which is referred to as the memory saving term in MPA. After the prey updates its location and the FAD effect is applied, the fitness of the prey matrix is evaluated to decide whether or not to update the elite matrix, and the fitter solutions are retained. This step is also helpful in the improvement of the solution, according to [26].
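The three stages above can be sketched as a single position-update routine. The following Python sketch is a simplified, continuous-domain illustration of the update rules; the population size, the Lévy exponent and the random generator are illustrative choices, not values prescribed by the paper.

```python
import numpy as np
from math import gamma, sin, pi

def levy(shape, beta=1.5, rng=None):
    """Levy-distributed step lengths via Mantegna's algorithm."""
    rng = rng or np.random.default_rng()
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)

def mpa_step(prey, elite, t, t_max, K=0.5, rng=None):
    """One simplified MPA position update: Brownian steps early (stage 1),
    mixed Levy/Brownian in the middle (stage 2), Levy steps around the
    elite late in the run (stage 3)."""
    rng = rng or np.random.default_rng()
    n, d = prey.shape
    R = rng.random((n, d))
    CF = (1 - t / t_max) ** (2 * t / t_max)        # control parameter C
    new = prey.copy()
    if t < t_max / 3:                              # stage 1: exploration
        RB = rng.normal(size=(n, d))               # Brownian motion
        new = prey + K * R * (RB * (elite - RB * prey))
    elif t < 2 * t_max / 3:                        # stage 2: transition
        h = n // 2
        RL = levy((h, d), rng=rng)                 # prey half: Levy motion
        new[:h] = prey[:h] + K * R[:h] * (RL * (elite[:h] - RL * prey[:h]))
        RB = rng.normal(size=(n - h, d))           # predator half: Brownian
        new[h:] = prey[h:] + K * CF * (RB * (RB * elite[h:] - prey[h:]))
    else:                                          # stage 3: exploitation
        RL = levy((n, d), rng=rng)
        new = elite + K * CF * (RL * (RL * elite - prey))
    return new
```

In a full implementation, this step would be followed by the FAD perturbation and the memory-saving update of the elite matrix described above.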

3. Development of Chaos Embed Marine Predator Algorithm

This section presents the development of the chaos embed marine predator algorithm (CMPA). The following are the procedural steps for the development.
  • The MPA has been divided into three phases. During the first phase, the search agents take big leaps and try to cover as much of the space as they can; hence, this phase is primarily governed by exploratory action. Likewise, during the final phase, the exploration virtue of the algorithm weakens and the exploitation virtue is enhanced. In other words, the starting phase, which governs the first 1/3 of the iterations, and the last phase, which governs the last 1/3 of the iterations, are solely dedicated to the exploration and exploitation virtues, respectively. Hence, any modification in these phases enhances either the exploration or the exploitation virtue of MPA. Considering this fact, the authors are motivated to develop a new position update mechanism that can affect both virtues simultaneously.
  • During the intermediate phase, where both processes progress simultaneously, a position update mechanism that can search alternative solutions is acutely required. Considering this argument, we propose a chaotic-function-inspired position update mechanism that helps the algorithm transit swiftly between the exploration and exploitation phases.
(a)
The generation of the β-chaotic sequence through the initialization of the parameters ($\nu, \mu, J_1, J_2$) is carried out. A generalized equation for the β function is as follows:
$$\beta(J; \nu, \mu, J_1, J_2) = \begin{cases} \left(\dfrac{J - J_1}{J_c - J_1}\right)^{\nu} \left(\dfrac{J_2 - J}{J_2 - J_c}\right)^{\mu} & \text{if } J \in [J_1, J_2] \\ 0 & \text{otherwise} \end{cases}$$
where $(\nu, \mu, J_1, J_2) \in \mathbb{R}$ and $J_1 < J_2$. The β-chaotic sequence at any iteration $t$ is given as:
$$J_{t+1} = k\,\beta(J_t; \nu, \mu, J_1, J_2)$$
(b)
For the first half of the population, during the second phase an update mechanism is introduced and represented as:
$$step_i = R_L \otimes (TPR^{EM}_i - R_L \otimes TPM_i)$$
$$TPM_i = TPM_i + K \cdot R \otimes step_i$$
Here, $R_L$ is a vector of arbitrary numbers drawn from the Levy motion. As the step size in the Levy distribution is mostly very small, this movement chiefly represents exploitation.
(c)
More precisely, the update in the prey position can be governed by the following decision-making loop.
$$TPM_i = TPM_i + K \cdot J \otimes step_i$$
In this modification, $R$ has been replaced by the chaotic number of Equation (15). This implies that for every iteration a new chaotic number is assigned to the decision process. Hence, the decision for the position update is handled with the help of the chaotic function instead of a random function. Pseudo code of the proposed algorithm is depicted in Algorithm 1.
Algorithm 1 Pseudo code of proposed CMPA.
1: Initialize the search agent number, maximum iteration $T_{max}$ and FAD probability $f$
2: while the termination criterion is not met do
3:     if $t < T_{max}/3$ then update prey based on phase 1, Equations (4) and (5)
4:     else if $T_{max}/3 \le t < 2T_{max}/3$ then update prey based on phase 2, Equations (8), (9) and (15)–(18)
5:     else update prey based on phase 3, Equations (10) and (11)
6:     end if
7:     Accomplish memory saving and update $TPR_{EM}$
8:     Apply the FAD effect and update based on the last phase as per Equations (12) and (13)
9: end while
10: Print the values of fitness, accuracy and attributes.
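As a concrete illustration of Equations (14) and (15), the following Python sketch generates a β-chaotic sequence. The paper leaves $J_c$ implicit; here it is assumed to be the mode of the β function, $J_c = (\nu J_2 + \mu J_1)/(\nu + \mu)$, and the parameter values are illustrative only.

```python
import numpy as np

def beta_map(J, nu, mu, J1, J2):
    """Beta function of Equation (14); Jc is assumed to be the mode of the
    function, (nu*J2 + mu*J1)/(nu + mu) -- the paper leaves it implicit."""
    Jc = (nu * J2 + mu * J1) / (nu + mu)
    if J1 <= J <= J2:
        return ((J - J1) / (Jc - J1)) ** nu * ((J2 - J) / (J2 - Jc)) ** mu
    return 0.0

def beta_chaotic_sequence(J0, k, nu, mu, J1, J2, length):
    """Iterate J_{t+1} = k * beta(J_t) as in Equation (15)."""
    seq = [J0]
    for _ in range(length - 1):
        seq.append(k * beta_map(seq[-1], nu, mu, J1, J2))
    return np.array(seq)

# illustrative parameter values (not prescribed by the paper)
seq = beta_chaotic_sequence(J0=0.5, k=1.2, nu=3.0, mu=4.0,
                            J1=-1.0, J2=2.0, length=50)
```

With this choice of $J_c$, the β function peaks at 1, so the sequence remains bounded by $k$ and can replace the uniform random number $R$ in the stage-2 update.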

Discussion

During stage 2, both prey and predator move at the same pace; hence, there is a chance of local minima stagnation, as the exploration and exploitation rates are almost the same. Therefore, to keep both the exploration and exploitation phases alive, the position update equation based on a random number has been replaced with chaotic numbers, which are obtained from the sequence generated as per the definitions in Equations (14) and (15).
Embedding chaos at this stage, when the velocities of prey and predator are almost the same, is more meaningful because these search agents can otherwise be directed to a local minima spot without changing course or exploring in a different direction. Hence, it is quite necessary to keep the gradient of the velocity agile. This fact also motivates future experimental investigation of embedding chaos in other phases. In this work, our focus is to embed chaos in stage 2 only and observe the impact of this addition on the optimization performance of the algorithm in the binary domain. The following section presents the problem formulation for the evaluation of the proposed CMPA.

4. Problem Formulation

From the evaluation perspective, feature selection approaches can be classified into two broad categories. In the first type, based on filter methods, an effective subset of the features is selected and its performance is evaluated; finally, the algorithm suggests the optimal subset. In this type of approach, the subset is not evaluated over the training samples. On the other hand, wrapper-based feature selection approaches evaluate the feature subset, and performance validation is conducted with testing and validation data sets. Feature selection is always considered a multi-objective optimization problem whose objectives are the maximization of the classification accuracy with the minimum number of selected features. These two objectives are conflicting in nature; hence, the objective function employed in this study is a weighted combination of them.
$$J = w_1 \times E_r(D) + w_2 \times \frac{R_c}{N}$$
where $E_r(D)$ is the classification error rate of a given classifier (in this work, the K-nearest Neighbor classifier (KNN) is employed), $R_c$ is the number of selected features, $N$ is the total number of features, and $w_1$ and $w_2$ are weights with $w_1 = 1 - w_2$. The weighted combination philosophy has been adapted from reference [11].
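A minimal sketch of this objective is given below, assuming a plain Euclidean k-NN classifier and the weight $w_1 = 0.99$ (a common choice in the wrapper feature selection literature; the paper does not state its weight values).

```python
import numpy as np

def knn_error(X_tr, y_tr, X_te, y_te, k=1):
    """Classification error of a plain k-NN classifier (Euclidean distance)."""
    errors = 0
    for x, y in zip(X_te, y_te):
        d = np.linalg.norm(X_tr - x, axis=1)
        nearest = y_tr[np.argsort(d)[:k]]
        pred = np.bincount(nearest).argmax()   # majority vote
        errors += int(pred != y)
    return errors / len(y_te)

def fitness(mask, X_tr, y_tr, X_te, y_te, w1=0.99):
    """Weighted objective J = w1*Er(D) + w2*(Rc/N), with Rc the number of
    selected features and N the total; w1 = 0.99 is an assumed value."""
    w2 = 1.0 - w1
    if mask.sum() == 0:                        # an empty subset is invalid
        return 1.0
    cols = mask.astype(bool)
    er = knn_error(X_tr[:, cols], y_tr, X_te[:, cols], y_te)
    return w1 * er + w2 * mask.sum() / mask.size
```

The optimizer minimizes this fitness over binary masks, trading a small accuracy loss against a smaller feature subset.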

5. Results and Discussions

For comparing the proposed variant, we draw a comparison on the basis of the classification accuracy, the fitness values obtained by the algorithm and the average attributes obtained from the optimization runs. In order to assess the performance of the proposed algorithm, 17 classical data sets have been chosen. The details of the data sets are shown in Table 1.
We have reported our results in two sets. In set-1, a comparison is made with contemporary algorithms, and in set-2 the chaotic algorithms are simulated and their comparative analysis is presented.

5.1. Experimental Details

Designing a mechanism that chooses the optimal features from the given sets is a very important procedure, as randomness can alter the results substantially. Hence, a rigorous experimental analysis has been carried out for choosing the number of iterations and the number of search agents, and both the chaotic marine algorithm and the native marine algorithm have been analyzed over many independent runs. We chose the Vote, Tic-Tac-Toe, Sonar, Penguin, Lymphography, Exactly, CongressEw and Breast Cancer data sets for this analysis, varying the number of search agents over (5, 10 and 20) and the maximum number of iterations over (20, 30, 50 and 70). Based on this analysis, we adopted 10 search agents and a maximum of 100 iterations. The analysis is conducted in such a manner that the parametric impact on the classification accuracy and fitness values can be observed. We observe that with these parameter values, the classification accuracy is not compromised and the fitness values are also optimal. Further experimental details of this study are shown in Figure 2.

Comparison with Previously Published Approaches

For this investigation, the comparison is made with some previously reported approaches in the classification domain, where the objective function depicted in the previous section has been considered with the KNN classifier. The comparison results of the fitness values are shown in Table 2. It is worth mentioning here that the simulation process is time consuming; hence, the mean values of 10 runs are reported in the table. We observe that the fitness values obtained by the proposed CMPA are optimal for most of the test data sets, and in some cases the values are tied with those of MPA. This fact establishes the applicability of CMPA in the binary domain. For example, in the case of the CongressEw data, the fitness values are optimal for both CMPA and MPA.
Further, a comparative analysis of the classification accuracy has also been conducted with previously published algorithms; we observed that the classification accuracy of the proposed algorithm is better than that of MPA, GA, PSO and ALO. These results are shown in Table 3. For example, in the case of the Zoo data set, the classification accuracy of the CMPA is about 98%, whereas the classification accuracy is substantially compromised in ALO (91%), GA (88%) and PSO (83%).
It is also important to showcase the fact that this classification accuracy has been achieved without compromising on feature-set size. Hence, the attributes (features) selected by every algorithm in each run have been averaged and are showcased in Table 4. These values are very important indicators; it can be easily observed from the table that the number of features selected by the algorithm is optimal in many cases, and this happens without compromising the classification accuracy.

5.2. Comparative Analysis of MPA and CMPA

For this analysis, we have compared the optimization run results of MPA and CMPA on the basis of the attributes selected, the fitness function values and the classification accuracy achieved for different data sets. Table 5 showcases the results of the Wilcoxon rank-sum test [30] between MPA and CMPA, and the p-values are depicted in the table. This test is conducted at a 95% confidence level (5% significance level).
A value of 1 in the p-values column indicates the native algorithm, against which the statistical comparison is executed. Here, MPA is considered the native algorithm, and the rank-sum test is executed between MPA and the proposed CMPA. Results with p-values below 0.05 are considered to come from a different distribution. From the entries depicted in the table, it has been observed that the CMPA provides competitive results when compared with MPA and provides optimal values of attributes, fitness function values and classification accuracies for almost all data sets. This fact advocates the applicability of the proposed algorithm to the feature selection problem.
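A comparison of this kind can be reproduced with SciPy's rank-sum test; the fitness samples below are made-up numbers for illustration only, not values from the paper's tables.

```python
import numpy as np
from scipy.stats import ranksums

# fitness values over 10 independent runs -- made-up numbers for illustration
mpa_fitness  = np.array([0.062, 0.058, 0.061, 0.064, 0.060,
                         0.059, 0.063, 0.061, 0.062, 0.060])
cmpa_fitness = np.array([0.051, 0.049, 0.052, 0.050, 0.048,
                         0.053, 0.050, 0.051, 0.049, 0.052])

stat, p_value = ranksums(mpa_fitness, cmpa_fitness)
significant = p_value < 0.05   # 5% significance level
```

A p-value below 0.05 indicates that the two sets of runs are unlikely to come from the same distribution.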

5.3. Comparative Analysis of Performance of the Proposed CMPA with Other Chaotic Algorithms

Further, it is an established fact that embedding chaos in metaheuristic algorithms improves their optimization efficiency in the binary domain. In order to investigate this, some recently published algorithms are considered for the evaluation of the performance of the proposed CMPA. These algorithms are the enhanced chaotic grasshopper optimization algorithm (ECGOA) (with sine map) [31], the sinusoidal bridging mechanism-based grasshopper algorithm (with sine map) [32] and the enhanced chaotic artificial bee colony algorithm (ECABC) (with sine map) [33]. The binary versions of these chaotic algorithms are obtained as per reference [11].
For showcasing the impact of chaos on the performance of these algorithms, the classification accuracy, along with the mean fitness and the attributes selected by the algorithms, is depicted in Table 6. From the table, it is observed that for the majority of the data sets, the classification accuracy of CMPA is very competitive, and that this is achieved with a smaller number of selected features.
Further, as proof, the statistical significance test has been conducted for comparison of the proposed algorithm with other chaotic algorithms. The results of the mean feature obtained from the optimization runs along with the p-values of the rank-sum test have been showcased in Table 7. The following points are observed:
  • The mean values of the features for 15 data sets are found to be optimal with CMPA. Only the Zoo data set has optimal results for SFECGOA, and the HeartEW data set for ECABC. This fact suggests that the selection of features without compromising accuracy is possible with the proposed CMPA.
  • Inspecting the p-values obtained from the Wilcoxon rank-sum test [30], it has been observed that all the algorithms have p-values less than 0.05. Hence, it can be said that the differences in the mean attributes obtained are statistically significant, i.e., they are unlikely to be due to random chance.
  • The graphical analysis of the results obtained from the optimization process has been depicted with the help of bar charts in Figure 3 and Figure 4. From these figures it is evident that the optimization capability of the proposed CMPA is superior to other algorithms.
  • From the analysis conducted in this experiment, it has been observed that the chaotic position update mechanism in MPA yields better results as compared with the contemporary chaotic algorithms that use chaos as a bridging mechanism. In short, the modification suggested in the MPA is meaningful and demonstrates a positive impact on the optimization performance of the proposed algorithm.

6. Conclusions

This paper reports an application of the chaotic marine predator algorithm to a feature selection task. A binary version of the chaotic MPA is proposed by altering the decision making of the position update phase of stage 2 with a chaotic sequence; the decision process is changed by inculcating chaotic numbers generated from that sequence. Further, the proposed binary algorithm has been tested over 17 data sets and analyzed against the native algorithm. We observed that the native algorithm is strong and robust, but some modifications in the position update process make it more suitable for the feature selection task. The results are reported with the help of different analyses. The following are the major conclusions drawn from this work.
  • The algorithm analysis has been conducted on the basis of the number of search agents selected and the number of iterations selected for feature selection. After this analysis, the optimal values of design parameters have been selected for executing the feature selection task.
  • A comparison with a recently published algorithm and state-of-the-art algorithms has been conducted to showcase the efficacy of the algorithm; the fitness value of the objective function along with classification accuracy have been reported in order to validate the efficacy of the proposed modification.
  • A comparison of some chaotic algorithms along with the proposed CMPA has also been reported to showcase the feasibility of CMPA. It is observed that the classification accuracy of the algorithm has not been compromised and the number of features obtained from the optimization runs are found optimal for the majority of cases.
  • Graphical analysis, along with a statistical comparison of the proposed algorithm with the others, revealed that the modification in stage 2 of the MPA algorithm has positive implications on the optimization performance of MPA.
Application of chaos in multiple phases with normalization and scaled functions will be evaluated in the future.

Author Contributions

Conceptualization, K.M.S.; Data curation, K.A.A.; Formal analysis, A.S. and A.W.M.; Funding acquisition, A.F.A. and K.A.A.; Investigation, K.M.S.; Methodology, A.S.; Project administration, A.F.A.; Resources, K.A.A.; Supervision, A.W.M.; Writing—original draft, A.S.; Writing—review & editing, A.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by the Researchers Supporting Program at King Saud University, Project number (RSP-2021/323).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors present their appreciation to King Saud University for funding this research through the Researchers Supporting Program (Project number RSP-2021/323), King Saud University, Riyadh, Saudi Arabia.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Huda, S.; Yearwood, J.; Jelinek, H.F.; Hassan, M.M.; Fortino, G.; Buckl, M. A hybrid feature selection with ensemble classification for imbalanced healthcare data: A case study for brain tumor diagnosis. IEEE Access 2016, 4, 9145–9154. [Google Scholar] [CrossRef]
  2. Ambusaidi, M.A.; He, X.; Nanda, P.; Tan, Z. Building an intrusion detection system using a filter-based feature selection algorithm. IEEE Trans. Comput. 2016, 65, 2986–2998. [Google Scholar] [CrossRef] [Green Version]
  3. Shang, W.; Huang, H.; Zhu, H.; Lin, Y.; Qu, Y.; Wang, Z. A novel feature selection algorithm for text categorization. Expert Syst. Appl. 2007, 33, 1–5. [Google Scholar] [CrossRef]
  4. Koprinska, I. Feature selection for brain-computer interfaces. In Pacific-Asia Conference on Knowledge Discovery and Data Mining; Springer: Berlin/Heidelberg, Germany, 2009; pp. 106–117. [Google Scholar]
  5. Szczepanski, R.; Tarczewski, T. Global path planning for mobile robot based on Artificial Bee Colony and Dijkstra’s algorithms. In Proceedings of the 2021 IEEE 19th International Power Electronics and Motion Control Conference (PEMC), Gliwice, Poland, 25–29 April 2021; pp. 724–730. [Google Scholar]
  6. Szczepanski, R.; Tarczewski, T.; Niewiara, L.J.; Stojic, D. Identification of mechanical parameters in servo-drive system. In Proceedings of the 2021 IEEE 19th International Power Electronics and Motion Control Conference (PEMC), Gliwice, Poland, 25–29 April 2021; pp. 566–573. [Google Scholar]
  7. Szczepanski, R.; Tarczewski, T.; Grzesiak, L.M. Application of optimization algorithms to adaptive motion control for repetitive process. ISA Trans. 2021, 115, 192–205. [Google Scholar] [CrossRef] [PubMed]
  8. Bangyal, W.H.; Ahmad, J.; Rauf, H.T. Optimization of neural network using improved bat algorithm for data classification. J. Med. Imaging Health Inform. 2019, 9, 670–681. [Google Scholar] [CrossRef]
  9. Rukhsar, L.; Bangyal, W.H.; Nisar, K.; Nisar, S. Prediction of insurance fraud detection using machine learning algorithms. Mehran Univ. Res. J. Eng. Technol. 2022, 41, 33–40. [Google Scholar] [CrossRef]
  10. Bangyal, W.H.; Ahmad, J.; Shafi, I.; Abbas, Q. A forward only counter propagation network-based approach for contraceptive method choice classification task. J. Exp. Theor. Artif. Intell. 2012, 24, 211–218. [Google Scholar] [CrossRef]
  11. Mafarja, M.M.; Mirjalili, S. Hybrid whale optimization algorithm with simulated annealing for feature selection. Neurocomputing 2017, 260, 302–312. [Google Scholar] [CrossRef]
  12. Sayed, G.I.; Tharwat, A.; Hassanien, A.E. Chaotic dragonfly algorithm: An improved metaheuristic algorithm for feature selection. Appl. Intell. 2019, 49, 188–205. [Google Scholar] [CrossRef]
  13. Anand, P.; Arora, S. A novel chaotic selfish herd optimizer for global optimization and feature selection. Artif. Intell. Rev. 2020, 53, 1441–1486. [Google Scholar] [CrossRef]
  14. Agrawal, P.; Abutarboush, H.F.; Ganesh, T.; Mohamed, A.W. Metaheuristic algorithms on feature selection: A survey of one decade of research (2009–2019). IEEE Access 2021, 9, 26766–26791. [Google Scholar] [CrossRef]
  15. Agrawal, P.; Ganesh, T.; Oliva, D.; Mohamed, A.W. S-shaped and v-shaped gaining-sharing knowledge-based algorithm for feature selection. Appl. Intell. 2022, 52, 81–112. [Google Scholar] [CrossRef]
  16. Agrawal, P.; Ganesh, T.; Mohamed, A.W. Chaotic gaining sharing knowledge-based optimization algorithm: An improved metaheuristic algorithm for feature selection. Soft Comput. 2021, 25, 9505–9528. [Google Scholar] [CrossRef]
  17. Zhang, X.; Xu, Y.; Yu, C.; Heidari, A.A.; Li, S.; Chen, H.; Li, C. Gaussian mutational chaotic fruit fly-built optimization and feature selection. Expert Syst. Appl. 2020, 141, 112976. [Google Scholar] [CrossRef]
  18. Sayed, G.I.; Hassanien, A.E.; Azar, A.T. Feature selection via a novel chaotic crow search algorithm. Neural Comput. Appl. 2019, 31, 171–188. [Google Scholar] [CrossRef]
  19. Ewees, A.A.; El Aziz, M.A.; Hassanien, A.E. Chaotic multi-verse optimizer-based feature selection. Neural Comput. Appl. 2019, 31, 991–1006. [Google Scholar] [CrossRef]
  20. Sayed, G.I.; Khoriba, G.; Haggag, M.H. A novel chaotic salp swarm algorithm for global optimization and feature selection. Appl. Intell. 2018, 48, 3462–3481. [Google Scholar] [CrossRef]
  21. Faramarzi, A.; Heidarinejad, M.; Mirjalili, S.; Gandomi, A.H. Marine Predators Algorithm: A nature-inspired metaheuristic. Expert Syst. Appl. 2020, 152, 113377. [Google Scholar] [CrossRef]
  22. Zhong, K.; Zhou, G.; Deng, W.; Zhou, Y.; Luo, Q. MOMPA: Multi-objective marine predator algorithm. Comput. Methods Appl. Mech. Eng. 2021, 385, 114029. [Google Scholar] [CrossRef]
  23. Ramezani, M.; Bahmanyar, D.; Razmjooy, N. A new improved model of marine predator algorithm for optimization problems. Arab. J. Sci. Eng. 2021, 46, 8803–8826. [Google Scholar] [CrossRef]
  24. Ho, L.V.; Nguyen, D.H.; Mousavi, M.; De Roeck, G.; Bui-Tien, T.; Gandomi, A.H.; Wahab, M.A. A hybrid computational intelligence approach for structural damage detection using marine predator algorithm and feedforward neural networks. Comput. Struct. 2021, 252, 106568. [Google Scholar] [CrossRef]
  25. Filmalter, J.D.; Dagorn, L.; Cowley, P.D.; Taquet, M. First descriptions of the behavior of silky sharks, Carcharhinus falciformis, around drifting fish aggregating devices in the Indian Ocean. Bull. Mar. Sci. 2011, 87, 325–337. [Google Scholar] [CrossRef]
  26. Parouha, R.P.; Das, K.N. A memory based differential evolution algorithm for unconstrained optimization. Appl. Soft Comput. 2016, 38, 501–517. [Google Scholar] [CrossRef]
  27. Mirjalili, S. The ant lion optimizer. Adv. Eng. Softw. 2015, 83, 80–98. [Google Scholar] [CrossRef]
  28. Harik, G.R.; Lobo, F.G.; Goldberg, D.E. The compact genetic algorithm. IEEE Trans. Evol. Comput. 1999, 3, 287–297. [Google Scholar] [CrossRef] [Green Version]
  29. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95-International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948. [Google Scholar]
  30. Wilcoxon, F. Individual comparisons by ranking methods. In Breakthroughs in Statistics; Springer: New York, NY, USA, 1992; pp. 196–202. [Google Scholar]
  31. Saxena, A.; Shekhawat, S.; Kumar, R. Application and development of enhanced chaotic grasshopper optimization algorithms. Model. Simul. Eng. 2018, 2018, 4945157. [Google Scholar] [CrossRef] [Green Version]
  32. Saxena, A. A comprehensive study of chaos embedded bridging mechanisms and crossover operators for grasshopper optimisation algorithm. Expert Syst. Appl. 2019, 132, 166–188. [Google Scholar] [CrossRef]
  33. Saxena, A.; Shekhawat, S.; Sharma, A.; Sharma, H.; Kumar, R. Chaotic step length artificial bee colony algorithms for protein structure prediction. J. Interdiscip. Math. 2020, 23, 617–629. [Google Scholar] [CrossRef]
Figure 1. Classification of feature selection algorithms.
Figure 2. Classification of feature selection algorithms.
Figure 3. Graphical representation of the optimization results (set-1).
Figure 4. Graphical representation of the optimization results (set-2).
Table 1. Data sets used for experimental verification.

| S. No. | Data Set | No. of Attributes | No. of Objects |
|---|---|---|---|
| 1 | Breastcancer | 9 | 699 |
| 2 | Breast EW | 30 | 569 |
| 3 | CongressEw | 16 | 435 |
| 4 | Exactly | 13 | 1000 |
| 5 | Exactly2 | 13 | 1000 |
| 6 | HeartEW | 13 | 270 |
| 7 | IonosphereEW | 34 | 351 |
| 8 | KrvskpEw | 36 | 3196 |
| 9 | Lymphography | 18 | 148 |
| 10 | Penguin | 325 | 73 |
| 11 | SonarEw | 60 | 208 |
| 12 | SpectEw | 22 | 267 |
| 13 | Tic-tac-toe | 9 | 958 |
| 14 | Vote | 16 | 300 |
| 15 | WaveformEw | 40 | 5000 |
| 16 | Wine | 13 | 178 |
| 17 | Zoo | 16 | 101 |
Table 2. Fitness value.

| Data Set | MPA [21] | CMPA | ALO [27] | GA [28] | PSO [29] |
|---|---|---|---|---|---|
| Breastcancer | 0.05 | 0.04 | 0.02 | 0.03 | 0.03 |
| Breast EW | 0.06 | 0.06 | 0.03 | 0.04 | 0.03 |
| CongressEw | 0.02 | 0.02 | 0.05 | 0.04 | 0.04 |
| Exactly | 0.16 | 0.12 | 0.29 | 0.28 | 0.28 |
| Exactly2 | 0.21 | 0.21 | 0.24 | 0.25 | 0.25 |
| HeartEW | 0.19 | 0.19 | 0.12 | 0.14 | 0.15 |
| IonosphereEW | 0.07 | 0.07 | 0.11 | 0.13 | 0.14 |
| KrvskpEw | 0.03 | 0.03 | 0.05 | 0.07 | 0.05 |
| Lymphography | 0.13 | 0.13 | 0.14 | 0.17 | 0.19 |
| Penguin | 0.03 | 0.03 | 0.14 | 0.22 | 0.22 |
| SonarEw | 0.10 | 0.10 | 0.18 | 0.13 | 0.13 |
| SpectEw | 0.17 | 0.17 | 0.12 | 0.14 | 0.13 |
| Tic-tac-toe | 0.22 | 0.23 | 0.22 | 0.24 | 0.24 |
| Vote | 0.03 | 0.03 | 0.04 | 0.05 | 0.05 |
| WaveformEw | 0.21 | 0.21 | 0.021 | 0.2 | 0.22 |
| Wine | 0.03 | 0.03 | 0.02 | 0.01 | 0.02 |
| Zoo | 0.02 | 0.02 | 0.07 | 0.08 | 0.1 |
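The fitness values in Table 2 balance classification error against the size of the selected feature subset. A common wrapper-style formulation from the feature-selection literature is sketched below; the weight α and the function name are illustrative assumptions, since the paper's exact weighting is not reproduced here.

```python
def fs_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted wrapper fitness: smaller is better.

    alpha weights the classification error; (1 - alpha) weights the
    fraction of features kept, so fewer features are rewarded.
    """
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)

# Breastcancer-like case: 96% accuracy using 3 of 9 attributes
f = fs_fitness(error_rate=0.04, n_selected=3, n_total=9)
print(round(f, 4))  # -> 0.0429
```

With this formulation, a solution that keeps accuracy constant while dropping features always receives a strictly better (lower) fitness, which matches the behaviour reported in Tables 2-4.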
Table 3. Comparative analysis of classification accuracy.

| Data Set | MPA | CMPA | ALO | GA | PSO |
|---|---|---|---|---|---|
| Breastcancer | 0.96 | 0.96 | 0.96 | 0.96 | 0.95 |
| Breast EW | 0.94 | 0.94 | 0.93 | 0.94 | 0.94 |
| CongressEw | 0.98 | 0.98 | 0.93 | 0.94 | 0.94 |
| Exactly | 0.84 | 0.89 | 0.66 | 0.67 | 0.68 |
| Exactly2 | 0.78 | 0.78 | 0.75 | 0.76 | 0.75 |
| HeartEW | 0.81 | 0.82 | 0.83 | 0.82 | 0.78 |
| IonosphereEW | 0.93 | 0.93 | 0.87 | 0.83 | 0.84 |
| KrvskpEw | 0.97 | 0.97 | 0.96 | 0.92 | 0.94 |
| Lymphography | 0.87 | 0.87 | 0.79 | 0.71 | 0.69 |
| Penguin | 0.97 | 0.97 | 0.63 | 0.7 | 0.72 |
| SonarEw | 0.90 | 0.90 | 0.74 | 0.73 | 0.74 |
| SpectEw | 0.83 | 0.83 | 0.8 | 0.78 | 0.77 |
| Tic-tac-toe | 0.78 | 0.78 | 0.73 | 0.71 | 0.73 |
| Vote | 0.97 | 0.97 | 0.92 | 0.89 | 0.89 |
| WaveformEw | 0.79 | 0.79 | 0.77 | 0.77 | 0.76 |
| Wine | 0.97 | 0.97 | 0.91 | 0.93 | 0.95 |
| Zoo | 0.98 | 0.98 | 0.91 | 0.88 | 0.83 |
Table 4. Optimized mean of attributes.

| Data Set | MPA | CMPA | ALO | GA | PSO |
|---|---|---|---|---|---|
| Breastcancer | 3.44 | 3.38 | 6.28 | 5.09 | 5.72 |
| Breast EW | 7.02 | 6.22 | 16.08 | 16.35 | 16.56 |
| CongressEw | 4.63 | 4.37 | 6.98 | 6.62 | 6.83 |
| Exactly | 4.75 | 5.61 | 6.62 | 10.82 | 9.75 |
| Exactly2 | 2.10 | 2.05 | 10.7 | 6.18 | 6.18 |
| HeartEW | 5.07 | 5.73 | 10.31 | 9.49 | 7.94 |
| IonosphereEW | 7.39 | 6.88 | 9.42 | 17.31 | 19.18 |
| KrvskpEw | 18.82 | 15.94 | 24.7 | 22.43 | 20.81 |
| Lymphography | 5.30 | 5.89 | 11.05 | 11.05 | 8.98 |
| Penguin | 63.83 | 60.43 | 164.13 | 177.13 | 178.75 |
| SonarEw | 20.78 | 16.23 | 37.92 | 33.3 | 31.2 |
| SpectEw | 5.29 | 5.00 | 16.15 | 11.75 | 12.5 |
| Tic-tac-toe | 5.60 | 5.53 | 6.99 | 6.85 | 6.61 |
| Vote | 3.81 | 3.61 | 9.52 | 6.62 | 8.8 |
| WaveformEw | 22.41 | 19.79 | 35.72 | 25.28 | 22.72 |
| Wine | 4.55 | 4.30 | 10.7 | 8.63 | 8.36 |
| Zoo | 4.92 | 4.78 | 13.97 | 10.11 | 9.74 |
Table 5. Statistical significance test with MPA.

| Data Set | Statistic | Fitness (MPA) | Fitness (CMPA) | Classification (MPA) | Classification (CMPA) | Attributes (MPA) | Attributes (CMPA) |
|---|---|---|---|---|---|---|---|
| Breastcancer | Mean values | 4.48 × 10⁻² | 4.52 × 10⁻² | 9.60 × 10⁻¹ | 9.60 × 10⁻¹ | 3.44 | 3.38 |
| | p-values | 1.00 | 5.82 × 10⁻¹ | 1.00 | 5.69 × 10⁻¹ | 1.00 | 9.03 × 10⁻¹ |
| Breast EW | Mean values | 5.88 × 10⁻² | 6.13 × 10⁻² | 9.40 × 10⁻¹ | 9.40 × 10⁻¹ | 7.02 | 6.22 |
| | p-values | 1.00 | 4.32 × 10⁻¹ | 1.00 | 5.03 × 10⁻¹ | 1.00 | 1.81 × 10⁻¹ |
| CongressEw | Mean values | 2.02 × 10⁻² | 2.12 × 10⁻² | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ | 4.63 | 4.37 |
| | p-values | 1.00 | 3.48 × 10⁻¹ | 1.00 | 5.80 × 10⁻¹ | 1.00 | 5.08 × 10⁻¹ |
| Exactly | Mean values | 1.16 × 10⁻¹ | 1.57 × 10⁻¹ | 8.40 × 10⁻¹ | 8.90 × 10⁻¹ | 4.75 | 5.61 |
| | p-values | 1.00 | 2.35 × 10⁻¹ | 1.00 | 2.35 × 10⁻¹ | 1.00 | 3.10 × 10⁻¹ |
| Exactly2 | Mean values | 2.15 × 10⁻¹ | 2.15 × 10⁻¹ | 7.80 × 10⁻¹ | 7.80 × 10⁻¹ | 2.10 | 2.05 |
| | p-values | 1.00 | 2.35 × 10⁻¹ | 1.00 | 2.35 × 10⁻¹ | 1.00 | 7.35 × 10⁻¹ |
| HeartEW | Mean values | 1.86 × 10⁻¹ | 1.92 × 10⁻¹ | 8.10 × 10⁻¹ | 8.20 × 10⁻¹ | 5.07 | 5.73 |
| | p-values | 1.00 | 1.26 × 10⁻¹ | 1.00 | 1.10 × 10⁻¹ | 1.00 | 2.07 × 10⁻² |
| IonosphereEW | Mean values | 6.60 × 10⁻² | 7.05 × 10⁻² | 9.30 × 10⁻¹ | 9.30 × 10⁻¹ | 7.39 | 6.88 |
| | p-values | 1.00 | 5.41 × 10⁻² | 1.00 | 6.39 × 10⁻² | 1.00 | 4.41 × 10⁻¹ |
| KrvskpEw | Mean values | 3.41 × 10⁻² | 3.01 × 10⁻² | 9.70 × 10⁻¹ | 9.70 × 10⁻¹ | 1.88 × 10¹ | 1.59 × 10¹ |
| | p-values | 1.00 | 2.67 × 10⁻¹ | 1.00 | 1.55 × 10⁻¹ | 1.00 | 5.65 × 10⁻² |
| Lymphography | Mean values | 1.28 × 10⁻¹ | 1.29 × 10⁻¹ | 8.70 × 10⁻¹ | 8.70 × 10⁻¹ | 5.30 | 5.89 |
| | p-values | 1.00 | 6.62 × 10⁻¹ | 1.00 | 6.64 × 10⁻¹ | 1.00 | 2.18 × 10⁻¹ |
| Penguin | Mean values | 2.66 × 10⁻² | 2.68 × 10⁻² | 9.70 × 10⁻¹ | 9.70 × 10⁻¹ | 6.38 × 10¹ | 6.04 × 10¹ |
| | p-values | 1.00 | 5.88 × 10⁻¹ | 1.00 | 1.00 | 1.00 | 4.57 × 10⁻¹ |
| SonarEw | Mean values | 1.03 × 10⁻¹ | 1.03 × 10⁻¹ | 9.00 × 10⁻¹ | 9.00 × 10⁻¹ | 2.08 × 10¹ | 1.62 × 10¹ |
| | p-values | 1.00 | 5.43 × 10⁻¹ | 1.00 | 9.67 × 10⁻¹ | 1.00 | 3.97 × 10⁻³ |
| SpectEw | Mean values | 1.69 × 10⁻¹ | 1.66 × 10⁻¹ | 8.30 × 10⁻¹ | 8.30 × 10⁻¹ | 5.29 | 5.00 |
| | p-values | 1.00 | 4.80 × 10⁻¹ | 1.00 | 4.58 × 10⁻¹ | 1.00 | 6.36 × 10⁻¹ |
| Tic-tac-toe | Mean values | 2.26 × 10⁻¹ | 2.20 × 10⁻¹ | 7.80 × 10⁻¹ | 7.80 × 10⁻¹ | 5.60 | 5.53 |
| | p-values | 1.00 | 3.19 × 10⁻¹ | 1.00 | 3.19 × 10⁻¹ | 1.00 | 5.79 × 10⁻¹ |
| Vote | Mean values | 3.50 × 10⁻² | 3.19 × 10⁻² | 9.70 × 10⁻¹ | 9.70 × 10⁻¹ | 3.81 | 3.61 |
| | p-values | 1.00 | 1.73 × 10⁻¹ | 1.00 | 1.08 × 10⁻¹ | 1.00 | 8.39 × 10⁻¹ |
| WaveformEw | Mean values | 2.11 × 10⁻¹ | 2.11 × 10⁻¹ | 7.90 × 10⁻¹ | 7.90 × 10⁻¹ | 2.24 × 10¹ | 1.98 × 10¹ |
| | p-values | 1.00 | 7.76 × 10⁻¹ | 1.00 | 9.89 × 10⁻¹ | 1.00 | 1.33 × 10⁻¹ |
| Wine | Mean values | 3.33 × 10⁻² | 3.19 × 10⁻² | 9.70 × 10⁻¹ | 9.70 × 10⁻¹ | 4.55 | 4.30 |
| | p-values | 1.00 | 7.62 × 10⁻¹ | 1.00 | 7.41 × 10⁻¹ | 1.00 | 5.43 × 10⁻¹ |
| Zoo | Mean values | 2.32 × 10⁻² | 2.21 × 10⁻² | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ | 4.92 | 4.78 |
| | p-values | 1.00 | 4.22 × 10⁻¹ | 1.00 | 3.42 × 10⁻¹ | 1.00 | 5.43 × 10⁻¹ |
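The p-values in Tables 5 and 7 come from the Wilcoxon signed-rank test [30] applied to paired optimization runs. A minimal pure-Python sketch of the test statistic is shown below; it omits the normal approximation that converts the statistic to a p-value, and the function name and example pairing of mean-attribute columns are illustrative.

```python
def wilcoxon_signed_rank(x, y):
    """Wilcoxon signed-rank statistic W = min(W+, W-) for paired samples.

    Zero differences are discarded; tied absolute differences receive
    the average of the ranks they span.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied absolute differences
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus)

# First five mean-attribute pairs (MPA vs. CMPA) from Table 4
mpa = [3.44, 7.02, 4.63, 4.75, 2.10]
cmpa = [3.38, 6.22, 4.37, 5.61, 2.05]
print(wilcoxon_signed_rank(mpa, cmpa))  # -> 5.0
```

A small W on one side indicates that one algorithm consistently produces smaller values than the other; in practice a library routine would then map W to the p-values reported in the tables.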
Table 6. Comparative analysis of performance with chaotic algorithms.

| Data Set | Parameter | MPA | CMPA | ECGOA [31] | SFECGOA [32] | ECABC [33] |
|---|---|---|---|---|---|---|
| Breastcancer | Mean (feature) | 3.44 | 3.38 | 3.54 | 3.65 | 6.74 |
| | Classification | 0.96 | 0.96 | 0.95 | 0.94 | 0.95 |
| Breast EW | Mean (feature) | 7.02 | 6.22 | 7.29 | 7.56 | 7.25 |
| | Classification | 0.94 | 0.94 | 0.94 | 0.93 | 0.93 |
| CongressEw | Mean (feature) | 4.63 | 4.37 | 4.44 | 4.56 | 4.92 |
| | Classification | 0.98 | 0.98 | 0.97 | 0.98 | 0.97 |
| Exactly | Mean (feature) | 4.75 | 5.61 | 5.68 | 5.92 | 6.01 |
| | Classification | 0.84 | 0.89 | 0.85 | 0.84 | 0.83 |
| Exactly2 | Mean (feature) | 2.10 | 2.05 | 2.21 | 2.35 | 2.47 |
| | Classification | 0.78 | 0.78 | 0.77 | 0.78 | 0.79 |
| HeartEW | Mean (feature) | 5.07 | 5.73 | 5.65 | 4.98 | 5.24 |
| | Classification | 0.81 | 0.82 | 0.8 | 0.8 | 0.8 |
| IonosphereEW | Mean (feature) | 7.39 | 6.88 | 7.21 | 7.46 | 7.15 |
| | Classification | 0.93 | 0.93 | 0.92 | 0.91 | 0.93 |
| KrvskpEw | Mean (feature) | 18.82 | 15.94 | 18.26 | 19.24 | 18.25 |
| | Classification | 0.97 | 0.97 | 0.95 | 0.96 | 0.96 |
| Lymphography | Mean (feature) | 5.30 | 5.89 | 5.48 | 5.98 | 5.77 |
| | Classification | 0.87 | 0.87 | 0.86 | 0.86 | 0.86 |
| Penguin | Mean (feature) | 63.83 | 60.43 | 64.25 | 64.98 | 69.32 |
| | Classification | 0.97 | 0.97 | 0.97 | 0.95 | 0.96 |
| SonarEw | Mean (feature) | 20.78 | 16.23 | 21.56 | 23.87 | 25.36 |
| | Classification | 0.90 | 0.90 | 0.89 | 0.9 | 0.9 |
| SpectEw | Mean (feature) | 5.29 | 5.00 | 5.24 | 5.63 | 5.41 |
| | Classification | 0.83 | 0.83 | 0.82 | 0.85 | 0.83 |
| Tic-tac-toe | Mean (feature) | 5.60 | 5.53 | 5.98 | 5.72 | 5.69 |
| | Classification | 0.78 | 0.78 | 0.76 | 0.76 | 0.75 |
| Vote | Mean (feature) | 3.81 | 3.61 | 3.89 | 3.95 | 3.63 |
| | Classification | 0.97 | 0.97 | 0.96 | 0.95 | 0.96 |
| WaveformEw | Mean (feature) | 22.41 | 19.79 | 23.54 | 25.36 | 23.01 |
| | Classification | 0.79 | 0.79 | 0.77 | 0.78 | 0.78 |
| Wine | Mean (feature) | 4.55 | 4.30 | 5.65 | 4.35 | 4.69 |
| | Classification | 0.97 | 0.97 | 0.96 | 0.96 | 0.96 |
| Zoo | Mean (feature) | 4.92 | 4.78 | 4.98 | 4.65 | 4.79 |
| | Classification | 0.98 | 0.98 | 0.97 | 0.97 | 0.97 |
Table 7. Statistical significance analysis of CMPA with chaotic algorithms.

| Data Set | Parameter | CMPA | ECGOA | SFECGOA | ECABC |
|---|---|---|---|---|---|
| Breastcancer | Mean (feature) | 3.38 | 3.54 | 3.65 | 6.74 |
| | p-values | 1.00 | 6.80 × 10⁻⁸ | 6.80 × 10⁻⁸ | 6.80 × 10⁻⁸ |
| Breast EW | Mean (feature) | 6.22 | 7.29 | 7.56 | 7.25 |
| | p-values | 1.00 | 6.80 × 10⁻⁸ | 6.80 × 10⁻⁸ | 6.80 × 10⁻⁸ |
| CongressEw | Mean (feature) | 4.37 | 4.44 | 4.56 | 4.92 |
| | p-values | 1.00 | 6.80 × 10⁻⁸ | 6.80 × 10⁻⁸ | 6.80 × 10⁻⁸ |
| Exactly | Mean (feature) | 5.61 | 5.68 | 5.92 | 6.01 |
| | p-values | 1.00 | 6.80 × 10⁻⁸ | 6.80 × 10⁻⁸ | 6.80 × 10⁻⁸ |
| Exactly2 | Mean (feature) | 2.05 | 2.21 | 2.35 | 2.47 |
| | p-values | 1.00 | 6.80 × 10⁻⁸ | 6.80 × 10⁻⁸ | 6.80 × 10⁻⁸ |
| HeartEW | Mean (feature) | 5.73 | 5.65 | 4.98 | 5.24 |
| | p-values | 1.00 | 6.80 × 10⁻⁸ | 6.80 × 10⁻⁸ | 6.80 × 10⁻⁸ |
| IonosphereEW | Mean (feature) | 6.88 | 7.21 | 7.46 | 7.15 |
| | p-values | 1.00 | 6.80 × 10⁻⁸ | 6.80 × 10⁻⁸ | 6.80 × 10⁻⁸ |
| KrvskpEw | Mean (feature) | 15.94 | 18.26 | 19.24 | 18.25 |
| | p-values | 1.00 | 6.80 × 10⁻⁸ | 6.80 × 10⁻⁸ | 6.80 × 10⁻⁸ |
| Lymphography | Mean (feature) | 5.89 | 5.48 | 5.98 | 5.77 |
| | p-values | 1.00 | 4.40 × 10⁻⁸ | 6.99 × 10⁻⁸ | 1.80 × 10⁻⁸ |
| Penguin | Mean (feature) | 60.43 | 64.25 | 64.98 | 69.32 |
| | p-values | 1.00 | 2.18 × 10⁻⁸ | 1.96 × 10⁻⁸ | 2.80 × 10⁻⁸ |
| SonarEw | Mean (feature) | 16.23 | 21.56 | 23.87 | 25.36 |
| | p-values | 1.00 | 2.18 × 10⁻⁸ | 1.96 × 10⁻⁸ | 2.80 × 10⁻⁸ |
| SpectEw | Mean (feature) | 5.00 | 5.24 | 5.63 | 5.41 |
| | p-values | 1.00 | 6.80 × 10⁻⁸ | 2.48 × 10⁻⁸ | 6.80 × 10⁻⁸ |
| Tic-tac-toe | Mean (feature) | 5.53 | 5.98 | 5.72 | 5.69 |
| | p-values | 1.00 | 4.40 × 10⁻⁸ | 2.48 × 10⁻⁸ | 6.80 × 10⁻⁸ |
| Vote | Mean (feature) | 3.61 | 3.89 | 3.95 | 3.63 |
| | p-values | 1.00 | 6.80 × 10⁻⁸ | 4.40 × 10⁻⁸ | 2.48 × 10⁻⁸ |
| WaveformEw | Mean (feature) | 19.79 | 23.54 | 25.36 | 23.01 |
| | p-values | 1.00 | 2.48 × 10⁻⁸ | 4.40 × 10⁻⁸ | 6.80 × 10⁻⁸ |
| Wine | Mean (feature) | 4.30 | 5.65 | 4.35 | 4.69 |
| | p-values | 1.00 | 2.48 × 10⁻⁸ | 2.80 × 10⁻⁸ | 3.20 × 10⁻⁸ |
| Zoo | Mean (feature) | 4.78 | 4.98 | 4.65 | 4.79 |
| | p-values | 1.00 | 6.80 × 10⁻⁸ | 6.80 × 10⁻⁸ | 6.80 × 10⁻⁸ |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
