A Competitive Memory Paradigm for Multimodal Optimization Driven by Clustering and Chaos

Abstract: Evolutionary Computation Methods (ECMs) are stochastic search methods proposed to solve complex optimization problems for which classical optimization methods are not suitable. Most ECMs aim to find the global optimum of a given function. However, from a practical point of view, finding the global optimum in engineering may not always be useful, since it may represent solutions that are not physically, mechanically, or even structurally realizable. Commonly, the evolutionary operators of ECMs are not designed to efficiently register multiple optima in a single run. Under such circumstances, mechanisms must be incorporated to allow ECMs to maintain and register multiple optima at each generation of a single run. On the other hand, the concept of dominance found in animal behavior indicates the level of social interaction between two animals in terms of aggressiveness. Such aggressiveness keeps two or more individuals as distant as possible from one another, where the most dominant individual prevails as the other withdraws. In this paper, the concept of dominance is computationally abstracted in terms of a data structure called "competitive memory" to incorporate multimodal capabilities into the evolutionary operators of the recently proposed Cluster-Chaotic-Optimization (CCO). Under the resulting Multimodal Cluster-Chaotic-Optimization (MCCO), the competitive memory is implemented as a memory mechanism to efficiently register and maintain all possible optimal values within a single execution of the algorithm. The performance of the proposed method is numerically compared against several multimodal schemes over a set of benchmark functions. The experimental study suggests that the proposed approach outperforms its competitors in terms of robustness, quality, and precision.


Introduction
Engineering optimization aims to obtain the optimal solution from a possible set of candidate solutions for a given minimization/maximization problem [1,2]. Many areas, such as economics, science, bio-engineering, and others, model an optimization problem in terms of objective functions. Traditionally, engineering optimization has proposed the use of classical deterministic paradigms which theoretically guarantee the location of global optima. However, classical approaches present issues in the presence of multiple optima [3,4]: deterministic methods are susceptible to becoming trapped in local optima, and under such circumstances they obtain suboptimal values. On the other hand, Evolutionary Computation Methods (ECMs) have been proposed to solve complex optimization problems and to alleviate the stagnation problem derived from the presence of multiple optima in a given objective function. ECMs are catalogued as stochastic search methods.

Cluster-Chaotic-Optimization (CCO)
The main process of the CCO is based on data analysis of the population through a clustering method; the method considered here is the Ward method. With this approach, the associations among data points guide the search strategy of the optimization process. Traditionally, most of the proposed ECMs consider each individual separately, regardless of the spatial information among them. In contrast, CCO considers the spatial associations in each generation to group similar individuals into clusters. These clusters operate locally and globally to improve the search strategy. On the other hand, CCO also incorporates the stochastic behavior of chaotic sequences to randomly perturb solutions; the use of chaotic sequences has been demonstrated to improve the performance of ECMs based on random numbers [34][35][36].
The CCO method was conceived to find the global optimum of complex optimization problems of the following form:

minimize/maximize J(x), x = (x_1, . . . , x_n) ∈ R^n, subject to x ∈ X, (1)

where J : R^n → R corresponds to the objective function, and X = {x ∈ R^n | l_j ≤ x_j ≤ u_j, j = 1, . . . , n} is a bounded search space, constrained by the upper (u_j) and lower (l_j) bounds. To find the global optimum of the aforementioned definition, CCO considers a population D^k = {d_1^k, d_2^k, . . . , d_{N_D}^k} composed of N_D data points, which evolves from an initial iteration (k = 0) to a maximum number of gen iterations (k = gen). Each solution d_i^k (i ∈ [1, . . . , N_D]) corresponds to an n-dimensional vector (d_{i,1}^k, d_{i,2}^k, . . . , d_{i,n}^k), where each dimension represents a decision variable. Based on this population description, three procedures are required to implement the evolution of each data point: the first corresponds to the initialization of the population of data points; the second considers an intracluster operation to search locally inside each cluster; and finally, an extracluster operation is executed to search globally outside each cluster but inside the search space.

Initialization
CCO begins by randomly initializing the population D^k, composed of N_D solutions (data points). Each dimension of every data point corresponds to a uniform random number within the range of the upper (u_j) and lower (l_j) bounds. This mechanism is mathematically defined as follows:

d_{i,j}^k = l_j + rand(0, 1)·(u_j − l_j), j = 1, 2, . . . , n, i = 1, 2, . . . , N_D, (2)

where d_{i,j}^k represents the j-th decision variable of the i-th solution (data point) at iteration k.
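The initialization of Equation (2) can be sketched as follows; the function name and the plain-Python implementation are illustrative, not part of the original CCO code.

```python
import random

def initialize_population(n_points, lower, upper, seed=None):
    """Create N_D data points, each dimension drawn uniformly within
    [l_j, u_j], as in Equation (2). `lower`/`upper` are per-dimension bounds."""
    rng = random.Random(seed)
    return [
        [l + rng.random() * (u - l) for l, u in zip(lower, upper)]
        for _ in range(n_points)
    ]

# A small 2-dimensional population of 5 data points
population = initialize_population(5, lower=[-5.0, 0.0], upper=[5.0, 1.0], seed=42)
```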

Intracluster Operation
In CCO, the main process for identifying promissory search zones within the search space is conducted by clustering. During the optimization process of CCO, clustering starts grouping individuals into a hierarchical structure. This mechanism generalizes natural data associations without considering the total number of clusters.
In each iteration of the CCO algorithm, the Ward method is used to obtain clusters from spatial data associations. Then, an evolutionary operator called the "intracluster operation" locally explores each formed cluster. Under this operation, two procedures are computed: the local attraction mechanism and the local perturbation mechanism. In the local attraction mechanism, each data point inside a cluster is attracted to the best element found in that cluster; this operation can be considered an exploitation operator inside the cluster. The local perturbation modifies each data point to increase the exploitation process inside the cluster. For this operation, it is assumed that c_q^k represents a cluster composed of a set of |c_q^k| data points d_i^k (i ∈ c_q^k), and that each element d_i^k is attracted to the best data point d_b^k = (d_{b,1}^k, d_{b,2}^k, . . . , d_{b,n}^k) of the cluster, based on the fitness value it represents. In this paper, the best element is considered the argument which minimizes the objective function over the cluster:

d_b^k = arg min_{i ∈ c_q^k} J(d_i^k).

Then, the local attraction operator (Equation (4)) modifies each dimension of the data point d_i^k through a chaotic, density-weighted movement towards d_b^k, where z corresponds to a chaotic sequence value obtained with the Iterative Chaotic Map with Infinite Collapses (ICMIC) [37], used to generate a near-uniform distribution that maintains diversity in the population [38]. The density term ρ_{c_q}^k is calculated as follows:

ρ_{c_q}^k = |c_q^k| / N_D.

The cluster density term ρ_{c_q}^k is the quotient between the number of data points belonging to a given cluster c_q^k and the number of solutions in the population N_D. This quotient produces lower values (low density) when the cluster c_q^k contains few data points; in contrast, it produces higher values (high density) when the cluster c_q^k contains a large number of data points.
The density term ρ_{c_q}^k then acts as an attraction factor between the data points of a given cluster and their corresponding best cluster element d_b^k. The induced effect of the density term ρ_{c_q}^k implies two different scenarios: (i) clusters containing a high number of data points, whereby the quotient |c_q^k|/N_D produces high-density values which yield larger movements through Equation (4). The direct effect is large movements towards the best cluster element, improving the exploration of the inner space of the cluster, but providing smaller search capabilities in its exploitation. To illustrate this effect, Figure 1a represents the case when the cluster contains a high number of elements; hence, the quotient |c_q^k|/N_D produces high-density values, and Equation (4) produces large movements inside the cluster, represented by the arrows in the figure. (ii) Clusters containing a low number of data points, whereby the quotient |c_q^k|/N_D produces low-density values which yield smaller movements through Equation (4). The direct effect is small movements towards the best cluster element, not improving the exploration of the inner space of the cluster, but producing larger search capabilities in its exploitation. To illustrate this effect, Figure 1b represents the case when the cluster contains few elements; hence, the quotient |c_q^k|/N_D produces low-density values, and Equation (4) produces small movements inside the cluster but larger search capabilities to exploit its inner space, represented by the arrows in the figure.
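The density mechanism above can be sketched as follows. The density ρ_{c_q}^k = |c_q^k| / N_D follows the text directly; since Equation (4) is not reproduced in this excerpt, the update rule below (a chaotic, density-weighted step towards the cluster best d_b^k) is an assumption about its shape, not the published formula.

```python
def cluster_density(cluster_size, population_size):
    # rho = |c_q^k| / N_D: few members -> low density, many members -> high density
    return cluster_size / population_size

def local_attraction(point, best, z, density):
    """Assumed shape of Equation (4): a chaotic (z), density-weighted
    movement of d_i^k towards the cluster best d_b^k."""
    return [x + z * density * (b - x) for x, b in zip(point, best)]

rho_small = cluster_density(5, 100)   # low-density cluster -> small steps
rho_large = cluster_density(60, 100)  # high-density cluster -> large steps
moved = local_attraction([0.0, 0.0], [1.0, 1.0], z=0.9, density=rho_large)
```

Note how a high-density cluster produces a larger step towards d_b^k, matching scenario (i) above.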


Local Perturbation
In this operation, the solutions repositioned by the local attraction mechanism are perturbed inside the clusters to improve their exploitation search. Each solution produced by the attraction method is then modified to conduct the search strategy inside each cluster. According to this procedure, the produced element d_i^{k+1} generates two different subelements, h_i^A and h_i^B, obtained by radially perturbing d_i^{k+1} with the chaotic values z_A and z_B generated by an ICMIC chaotic map. The terms v_A and v_B correspond to a radial neighborhood described as follows:

v_l = cos(α·r), l = A, B, (7)

where r is a random number in the range [0, 2π], and α corresponds to the self-adaptive value described in [29]. The last step in this operation is an elitist selection among d_i^{k+1}, h_i^A, and h_i^B to keep only the best element in each generation. To graphically illustrate the local perturbation operation, Figure 2 summarizes the procedure; the figure shows that element h_i^B presents a better fitness value than its predecessor d_i^{k+1}.
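The local perturbation step can be sketched as below. The radial neighborhood v_l = cos(α·r) and the elitist selection follow the text; the exact way z_A and z_B enter the perturbation is an assumption, since the corresponding equations are not reproduced in this excerpt, and a standard pseudorandom generator stands in for the ICMIC chaotic map.

```python
import math, random

def radial_neighborhood(alpha, rng):
    # v_l = cos(alpha * r), with r drawn uniformly from [0, 2*pi] (Equation (7))
    r = rng.uniform(0.0, 2.0 * math.pi)
    return math.cos(alpha * r)

def local_perturbation(point, z_a, z_b, alpha, objective, rng):
    """Generate the two subelements h_i^A and h_i^B by radially perturbing
    d_i^{k+1} (assumed form), then keep the best of the three (elitist selection)."""
    v_a = radial_neighborhood(alpha, rng)
    v_b = radial_neighborhood(alpha, rng)
    h_a = [x + z_a * v_a for x in point]
    h_b = [x + z_b * v_b for x in point]
    return min((point, h_a, h_b), key=objective)  # minimization assumed

sphere = lambda x: sum(xi * xi for xi in x)
rng = random.Random(0)
best = local_perturbation([1.0, 1.0], z_a=0.5, z_b=0.8, alpha=1.0,
                          objective=sphere, rng=rng)
```

Because the elitist selection always includes the incoming point, the returned element can never be worse than d_i^{k+1}.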


Extracluster Operation
This CCO operation improves the global search. The operation considers two different parts: Global attraction and global perturbation. In global attraction, the best elements of each cluster are attracted to the best global solution which has occurred so far. In global perturbation, the repositioned elements produced by the global attraction movement are perturbed to increase the search capabilities of the method. This extracluster mechanism establishes a balance between the exploration and exploitation stages.

Global Attraction
This operation moves the best element of each cluster d_b^k towards the best element found so far in the entire optimization process, d_B^k, through a random movement scaled by a radial neighborhood, where rand(·) represents a random number between [0, 1] and v_G corresponds to a radial neighborhood from Equation (7).

Global Perturbation
After the application of global attraction, the repositioned data points produce two solutions in terms of radial movement. The aim of this procedure is to increase the exploitation rate of the search mechanism outside the clusters but inside the search space. In this operation, two different elements, h_b^R and h_b^S, are obtained by radially perturbing the repositioned element, where r_R and r_S correspond to random numbers between (0, 1), and v_R and v_S are radial neighborhoods generated by Equation (7). The last step in this operation is an elitist selection among d_b^{k+1}, h_b^R, and h_b^S to keep only the best elements in each generation. To illustrate the extracluster operation, Figure 3 summarizes both the global attraction mechanism and the global perturbation procedure.
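The two extracluster stages can be sketched together as follows. The global attraction and perturbation equations are not reproduced in this excerpt, so both update rules below are assumptions about their shape (a random, v_G-scaled step towards d_B^k, followed by two radial perturbations and an elitist selection).

```python
import math, random

def radial_neighborhood(alpha, rng):
    # v = cos(alpha * r), r uniform in [0, 2*pi] (Equation (7))
    return math.cos(alpha * rng.uniform(0.0, 2.0 * math.pi))

def extracluster(cluster_best, global_best, alpha, objective, rng):
    """Assumed sketch: global attraction moves a cluster best d_b^k towards the
    global best d_B^k; global perturbation then builds h_b^R and h_b^S radially,
    and an elitist selection keeps the best of the three candidates."""
    v_g = radial_neighborhood(alpha, rng)
    moved = [b + rng.random() * v_g * (g - b)
             for b, g in zip(cluster_best, global_best)]
    r_r, r_s = rng.random(), rng.random()
    v_r = radial_neighborhood(alpha, rng)
    v_s = radial_neighborhood(alpha, rng)
    h_r = [x + r_r * v_r for x in moved]
    h_s = [x + r_s * v_s for x in moved]
    return min((moved, h_r, h_s), key=objective)  # minimization assumed

sphere = lambda x: sum(xi * xi for xi in x)
result = extracluster([2.0, 2.0], [0.0, 0.0], alpha=1.0,
                      objective=sphere, rng=random.Random(1))
```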

The pseudo-code in Algorithm 1 summarizes the entire iterative process of the CCO.

Algorithm 1. Pseudo-code for the Cluster-Chaotic-Optimization (CCO) algorithm.
for (each element of c_q^k)
    Extracluster procedure
end for
16. k = k + 1;
17. end while
18. Output: d_B^k

Multimodal Cluster-Chaotic-Optimization (MCCO)
In CCO, the optimization process is driven by the application of a data analysis technique with chaotic perturbations. CCO divides the population considering the spatial information among individuals. This clustering process is based on the computation of a hierarchical tree, where the natural associations among each data point (individual) are determined. The clustering method called "Ward" is cataloged as a hierarchical clustering methodology, where each element starts by forming a single cluster, and then an association tree is generated over the remaining elements. This tree structure is used to define the level at which the clustering algorithm will produce clusters. The formed clusters share similarities among data points. CCO uses the idea of clustering from the Ward method to partition the population into similar groups at each iteration of the optimization process. In the beginning, each element is treated as a single cluster; then, during the application of its evolutionary operators, it starts grouping clusters containing more elements. Then, CCO operates each cluster differently by exploring and exploiting inside and outside each cluster.
The CCO uses the intracluster operation to locally explore and exploit inside each formed cluster. This process is achieved by two operations: local attraction and local perturbation. Under local attraction, each element of a given cluster is attracted to the best element in that cluster. The speed at which each element moves towards the best individual in the cluster is based on the density measure of the cluster. In CCO, the density of a cluster refers to the number of elements the cluster has. Clusters containing a high number of elements attract their members quickly, whereas clusters containing few elements attract them slowly. This process can be defined as an exploration operator inside the cluster. To maintain a balance between exploration and exploitation inside the clusters, CCO defines a local perturbation operator as an exploitation mechanism inside the clusters. In this way, two different solutions are radially generated to improve the search mechanism.
On the other hand, CCO uses the extracluster operation to globally explore and exploit outside the clusters, that is, in the overall search space. To accomplish this, CCO considers both the global attraction and global perturbation operations. Under global attraction, the best elements of each cluster are attracted to the global best solution found so far. This procedure improves the exploitation stage outside each cluster but inside the feasible search space of a given optimization problem. Then, a redefinition phase is computed: the CCO uses global perturbation as an exploitation operator outside the clusters but inside the search space. This whole process maintains the population diversity and promotes a balance between the evolutionary exploitation and exploration stages.
The spatial associations exploited by the CCO operators suggest an inherent multimodal behavior. Each time an iteration begins, a clustering procedure is executed. This mechanism groups similar individuals in the population into clusters, suggesting the agglomeration of individuals around potential search zones. Under such circumstances, CCO presents a certain degree of multimodality in its operators. However, the original CCO structure is not suitable for detecting multiple optima in a single run. Given this limitation, CCO can be adapted to incorporate multimodal capabilities.
In this paper, a multimodal extension of the CCO is presented to incorporate multimodal capabilities into its original structure. The concept of dominance, commonly found in social animal associations, motivates the nature-inspired structure called "competitive memory". Under the competitive memory approach, each individual confronts its neighbors, and the winning individual is catalogued as a potential optimum. Then, an updating scheme manages the diversity of the population. The effect of this computational structure is a multimodal scheme in which optimal and suboptimal solutions are maintained at each iteration.
The multimodal extension used to incorporate multimodal capabilities in the CCO was conceived based on the concept of dominance in animal interactions. Biologists have demonstrated that social interactions among animals remain in an animal's memory. Such a structure has been called "competitive memory" [31][32][33]. In this structure, it is established that previous group interactions can affect social interactions in the future in terms of aggressiveness. Such aggressiveness keeps two or more individuals as distant as possible from one another, where the most dominant individual prevails as the other withdraws. From a computational point of view, the idea of dominance among individuals in a population is implemented based on a data structure called competitive memory.
To implement the competitive memory approach, two types of memory must be generated: the historic memory M_H and the population memory M_P. The historic memory stores promissory solutions throughout the optimization process; in contrast, the population memory only stores the solutions of the current generation. Once these memory structures have been initialized, competition and update mechanisms are required.

Initialization Phase
The first step considered in the implementation of the competitive memory approach in the CCO is the initialization of the memory mechanism. Once the initialization procedure from Section 2.1 has been executed, a sorted copy of the population creates the historic memory M_H = {m_H^1, m_H^2, . . . , m_H^n}, where each vector m_H^i corresponds to an element belonging to the historic memory. A sorted copy of the population also creates the population memory M_P = {m_P^1, m_P^2, . . . , m_P^n}, where each element m_P^i corresponds to an individual stored in the population memory. After the initialization process, the population memory is affected by the CCO evolutionary operators, while the historic memory maintains potential optima during the optimization process.
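The initialization phase above can be sketched as follows; sorting by ascending fitness (minimization) is an assumption, since the sort criterion is not spelled out in this excerpt.

```python
def initialize_memories(population, objective):
    """Create the historic memory M_H and the population memory M_P as
    fitness-sorted copies of the initial population (minimization assumed)."""
    ranked = sorted(population, key=objective)
    m_h = [list(p) for p in ranked]  # independent copies: M_H evolves separately
    m_p = [list(p) for p in ranked]
    return m_h, m_p

sphere = lambda x: sum(xi * xi for xi in x)
pop = [[2.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
M_H, M_P = initialize_memories(pop, sphere)
```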

Competition Phase
This procedure is based on the biological concept of dominance. Animal dominance is a social interaction behavior between two animals. Animals maintain a distance from each other to avoid confrontation; the distance is based on how aggressively the animals behave. When two animals confront each other inside a radius distance, the most dominant animal prevails, while the other withdraws. In order to implement this idea, a set of competition rules must be applied for each solution to become part of M_H. The competition rules are based on the distance (δ) and the fitness values among the elements of M_P^k and M_H. The following rules are considered in the implementation of the competition phase in MCCO.

2. Compute the Euclidean distance δ among the elements of M_H and the elements of M_P^k.
3. If the distance δ between two individuals is less than the dominance radius ρ, then the prevailing individuals belonging to M_H will be stored in a temporary historic memory T_H, while the prevailing individuals in M_P^k will be stored in a temporary population memory T_P.
4. The temporary memory structure T will be the union of T_H and T_P.

To illustrate the previously described competition rules, Figure 4 graphically illustrates the competition phase between two animals, L_1 and L_2, using L_1 to represent an individual m_H^i stored in M_H, and L_2 to represent an individual m_P^j stored in M_P^k.

Algorithm 2.
Pseudo-code for the competition phase.
In Figure 4a, L_2 enters the radius of L_1; then, they confront each other considering their fitness values. If L_2 possesses a lower fitness value than L_1, then L_1 remains unbeaten and L_2 is removed from M_P^k (Figure 4b). On the other hand, if L_1 possesses a lower fitness value than L_2, then L_2 remains unbeaten and L_1 is removed from M_H (Figure 4c).
The dominance radius ρ in Figure 4 is computed from the lower (l_j) and upper (u_j) limits of each decision variable, scaled by κ, a proportional factor that adjusts the radius to a minimum value with respect to the number of dimensions of the objective function. The κ parameter was experimentally configured to 20; this value was chosen considering the sensitivity analysis reported in Section 4.3.
The pseudo-code for the competition phase is summarized in Algorithm 2.
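Competition rules 2-4 can be sketched as below. The dominance-radius formula used here (mean per-dimension range divided by κ) is an assumption: the text only names its ingredients (l_j, u_j, κ), not the exact expression.

```python
import math

def dominance_radius(lower, upper, kappa=20):
    # Assumed form: mean per-dimension search range scaled by the factor kappa
    n = len(lower)
    return sum(u - l for l, u in zip(lower, upper)) / (kappa * n)

def compete(m_h, m_p, objective, radius):
    """Rules 2-4: pairwise Euclidean distances; within the radius the better
    (lower-fitness) individual dominates; T = T_H union T_P."""
    beaten_h, beaten_p = set(), set()
    for i, a in enumerate(m_h):
        for j, b in enumerate(m_p):
            delta = math.dist(a, b)  # rule 2: Euclidean distance
            if delta < radius:       # rule 3: confrontation inside the radius
                if objective(a) <= objective(b):
                    beaten_p.add(j)
                else:
                    beaten_h.add(i)
    t_h = [a for i, a in enumerate(m_h) if i not in beaten_h]
    t_p = [b for j, b in enumerate(m_p) if j not in beaten_p]
    return t_h, t_p, t_h + t_p       # rule 4: T = T_H u T_P

sphere = lambda x: sum(xi * xi for xi in x)
T_H, T_P, T = compete([[0.0, 0.0]], [[0.1, 0.0], [3.0, 3.0]], sphere, radius=0.5)
```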

Update Phase
Finally, a mechanism to maintain population diversity in the optimization process is considered in the last step. The updating scheme of the competitive memory approach aims to build the historic memory M_H for future iterations. The historic memory contains and maintains the best solutions throughout the optimization process. In the competition phase, a temporary historic memory T_H is created; however, the number of its elements could be smaller than that of the historic memory M_H. Hence, if this condition is satisfied (|T_H| < |M_H|), the update phase is executed considering two scenarios:
1. If |T_P| > 0, then the best individuals belonging to T_P will be stored in T_H.
2. If |T_H| < N_D, then the best solutions in P^k will be allocated to T_H.
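The two update scenarios can be sketched as follows; treating P^k as the current population and filling T_H up to N_D with its best solutions is a reading of the text above, not the exact published listing.

```python
def update_historic(t_h, t_p, population, n_d, objective):
    """Update phase: (1) move the best solutions of T_P into T_H while room
    remains, then (2) top T_H up to N_D with the best of the population P^k."""
    t_h = list(t_h)
    for sol in sorted(t_p, key=objective):      # scenario 1
        if len(t_h) >= n_d:
            break
        t_h.append(sol)
    for sol in sorted(population, key=objective):  # scenario 2
        if len(t_h) >= n_d:
            break
        t_h.append(sol)
    return t_h

sphere = lambda x: sum(xi * xi for xi in x)
M_H = update_historic([[0.0, 0.0]], [[1.0, 0.0]],
                      [[0.5, 0.5], [2.0, 2.0]], n_d=3, objective=sphere)
```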


The Complete Multimodal Cluster-Chaotic-Optimization (MCCO)
To incorporate multimodal capabilities into the original structure of CCO, the MCCO requires three operators to allocate and manage the potential optima: initialization, competition, and update. In the initialization phase, a memory mechanism is initialized based on the current population; in this process, two types of memories are generated in order to computationally abstract the concept of dominance. Then, in the competition phase, the solutions in both memories confront each other in order to determine the most dominant solutions. Finally, the updating scheme manages the historic memory to produce a new population to be used in future iterations. Under the complete memory mechanism of the MCCO, potential optima will be stored and maintained during the whole optimization process by holding only the solutions which present better fitness values. The complete pseudo-code for the MCCO algorithm can be summarized in Algorithm 4.
In Algorithm 4, the original operators of the CCO are extended with the memory mechanism described in Section 3; the memory initialization process is achieved in line 3. Then, the competitive phase is applied in line 17, and the update phase is accomplished in line 18. The original structure of the intracluster and extracluster operations is found in lines 8-12 and 13-16, respectively.

Algorithm 4. Pseudo-code for the Multimodal Cluster-Chaotic-Optimization (MCCO) algorithm.

Experimental Results
This section presents a numerical comparison between MCCO and 11 state-of-the-art multimodal techniques. The performance results were obtained by evaluating 23 multimodal benchmark functions containing different types of complexity. The experimental analysis is based on performance metrics commonly used in the multimodal literature; such metrics measure the ability of each multimodal methodology to quantify the number of approximated solutions with respect to the true optima. Section 4.1 describes each of the performance metrics used in the experimental study. Section 4.2 presents the analytical methodology considered in this study to obtain the true optimal values of each multimodal benchmark function. Section 4.3 presents the numerical results of MCCO and the rest of the multimodal approaches, compared considering the performance metrics.

Performance Metrics
In this section, six multimodal optimization performance indexes are presented. The set of metrics is composed of the Effective Peak Number (EPN), the Maximum Peak Ratio (MPR), the Peak Accuracy (PA), the Distance Accuracy (DA), the Peak Ratio (PR), and the Success Rate (SR). These metrics have been used extensively to quantify the performance of many multimodal optimization techniques [17,[39][40][41]], and they express the performance of a multimodal approach based on the difference between the true optima and the approximated optimal values. The EPN reflects the capability of a multimodal technique to obtain most of the optima. The MPR computes the consistency of the approximated optima over the true optima. On the other hand, PA measures the error between the approximated optima and the true optima. Similarly, DA indicates the total error considering each independent variable of the objective function. PR calculates the percentage of the total number of approximated optima over multiple executions of a given algorithm. Lastly, SR measures the percentage of successful runs over the total number of executions. The following paragraphs mathematically describe each metric.
Effective Peak Number (EPN). This metric quantifies the number of approximated solutions identified as valid optima. Each approximated solution ô is considered a valid optimum if the Euclidean distance between ô and a true optimum o is less than µ. The EPN is calculated by counting such valid optima, where the subindexes i and j correspond to the i-th true optimum and the j-th approximated optimum, respectively. Additionally, µ is a threshold value which refers to the accuracy; the value of µ was set to 0.5, corresponding to the accuracy level in [41].
Maximum Peak Ratio (MPR). This metric computes the consistency of the approximated optima over true optima. MPR is defined as: where EPN and O correspond to the number of valid optima and the number of true optima, respectively.
Peak Accuracy (PA). PA calculates the error between the approximated optima and the true optima in terms of fitness values.

Distance Accuracy (DA). Since the error calculated by PA is based on the fitness value, it does not consider the closeness of the peaks. Therefore, DA computes the error between the approximated optima and the true optima in the decision-variable space.

Peak Ratio (PR). PR calculates the percentage of the total number of approximated optima over multiple executions of a given algorithm.

Success Rate (SR). SR measures the percentage of successful runs over the total number of executions:

SR = NSR / NR,

where NSR corresponds to the number of successful runs and NR denotes the total number of executions.
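The metrics whose definitions are unambiguous in the text can be sketched as below; EPN uses the distance threshold µ = 0.5 stated above, while the PR normalization (detected optima over true optima across runs) is a common convention and should be treated as an assumption.

```python
import math

def effective_peak_number(true_optima, approximations, mu=0.5):
    """EPN: count true optima having an approximated solution within
    Euclidean distance mu (the accuracy threshold from the text)."""
    return sum(
        1 for o in true_optima
        if any(math.dist(o, a) < mu for a in approximations)
    )

def peak_ratio(epn_per_run, n_true_optima):
    # PR: fraction of detected optima over all runs (assumed normalization)
    return sum(epn_per_run) / (n_true_optima * len(epn_per_run))

def success_rate(n_successful_runs, n_runs):
    # SR = NSR / NR
    return n_successful_runs / n_runs

true_opt = [[0.0, 0.0], [2.0, 2.0], [4.0, 0.0]]
found = [[0.1, 0.0], [2.2, 2.1]]
epn = effective_peak_number(true_opt, found)
```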

True Optima Determination
Each of the previously described multimodal metrics operates considering the true optima of each benchmark function; hence, the true optimal values are required. Most of the reported literature on multimodal optimization lacks information related to the numerical values of the true optima. In this paper, the process to obtain the numerical value of each optimum is based on the application of derivatives. To obtain all optima, the middle point between the highest and lowest values of each benchmark function is defined. Then, all the optima found below the middle point (in the case of minimization) are target optima. For the target optima, the application of the second partial derivative is required to analytically compute the optimal values. The following model describes the true optimal set T.
where J corresponds to the objective function, o is the optimum, and m represents the middle point used as a threshold value to compute optimal values. Then, Equation (20) defines the second partial derivative discriminant: To indicate if certain point (x 0 , y 0 ) could represent a local minimum, the discriminant D is used as follows: From Equation (21), it can be shown that the process to obtain local minima is based on a minimization process.
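The discriminant test can be sketched numerically with central finite differences. This is a minimal illustration of the standard second-partial-derivative test, not the authors' analytical procedure; the function name and step size h are assumptions:

```python
def second_derivative_test(J, x0, y0, h=1e-4):
    """Classify a candidate point of a 2-D objective J(x, y) using the
    discriminant D = J_xx * J_yy - J_xy^2, approximated with central
    finite differences of step h."""
    # Second partial derivatives at (x0, y0).
    Jxx = (J(x0 + h, y0) - 2 * J(x0, y0) + J(x0 - h, y0)) / h**2
    Jyy = (J(x0, y0 + h) - 2 * J(x0, y0) + J(x0, y0 - h)) / h**2
    Jxy = (J(x0 + h, y0 + h) - J(x0 + h, y0 - h)
           - J(x0 - h, y0 + h) + J(x0 - h, y0 - h)) / (4 * h**2)
    D = Jxx * Jyy - Jxy**2
    if D > 0 and Jxx > 0:
        return 'local minimum'       # the case targeted by minimization
    if D > 0 and Jxx < 0:
        return 'local maximum'
    return 'saddle point or inconclusive'
```

For example, J(x, y) = x² + y² is classified as a local minimum at the origin, while J(x, y) = x² − y² yields a non-positive discriminant there.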

Performance Comparison
In this section, the numerical results of the proposed MCCO are presented by comparing its performance against 11 state-of-the-art multimodal approaches over a set of 14 multimodal benchmark functions. These benchmark functions have been widely used to test the multimodal capabilities of optimization methods [41][42][43]. Tables A1 and A2 in Appendix A mathematically describe the test functions considered in the experimental study. For clarity, the benchmark functions have been split into two tables: Table A1 describes functions J1-J7, and Table A2 describes functions J8-J14. In the tables, the features of each benchmark function are defined: the search domain column indicates the box constraints of each objective function, n corresponds to the dimensionality tested, and the optima number corresponds to the number of true optima determined by the second partial derivative method from Section 4.2.
The comparison scheme involves the evaluation of the six multimodal metrics described in Section 4.1. Additionally, a statistical validation framework based on the Wilcoxon rank sum test [52] is considered to account for random effects. The population size was configured to 100 individuals and the maximum number of iterations to 500, considering 30 independent runs. Each optimization process was executed using MATLAB® R2018b on a Windows 7, x64-based PC with an Intel(R) Core-i7(R) CPU at 2.20 GHz and 16 GB of RAM. The initial configuration parameters of each multimodal approach were set according to the guidelines in Table 1. These settings were chosen because they represent the best parameters for each multimodal approach according to its reported guidelines.
Additionally, the κ parameter was experimentally configured to 20; this value was chosen based on the sensitivity analysis shown in Table 2. In the table, an evaluation of the MCCO method is reported for each benchmark function considering the EPN metric. The sensitivity analysis was conducted over 30 independent runs. The best entries in the table are in bold, and the numbers in parentheses are the standard deviations.
Tables 3 and 4 present the numerical results of the experimental study for all multimodal approaches. For a clear presentation, Table 3 reports the experimental results for functions J1-J7, and Table 4 reports the numerical results for functions J8-J14. In the tables, the numerical values of each metric are presented. Additionally, to measure the computational effort of each multimodal approach, the Number of Function Calls (NFC) and the execution time (T) in seconds were assessed. Finally, the entries in parentheses are the standard deviations of each particular metric.

Table 1. Parameter configuration for each multimodal method used in the experimental study.
Table 3 reports the numerical results for functions J1-J7. In the table, it can be seen that MCCO and MOMMOP achieved the best results in the majority of the numerical simulations. In functions J1 and J5, both algorithms outperformed the others. According to the EPN metric, only MCCO and MOMMOP were capable of finding the total number of peaks in these functions; however, MCCO produced more consistent results. In the case of function J2, FSDE and MCCO obtained all the optimal values. For function J6, CSA, HTS, FSDE, MOMMOP, and MCCO obtained the maximum number of peaks with similar levels of robustness. The most distinguishing characteristic of MCCO with respect to these multimodal methods is its ability to find all the peaks with relatively low computational effort compared to its competitors. For functions J3, J4, and J7, it is clear that MCCO outperformed the other algorithms in terms of diversity, scalability, and precision. According to the reported results, the MPR, PA, and DA metrics suggest that MCCO is capable of operating on complex multimodal functions while yielding the greatest EPN value.
Additionally, Table 3 reports the computational effort of each multimodal approach in terms of the NFC and execution time. From the results, it is evident that LIPSM presented the best execution time for functions J1-J7; however, the performance metrics indicate that LIPSM was not able to produce competitive results. In contrast, MCCO yielded significantly better results than most of the tested algorithms while evaluating a similar number of function calls. In general terms, the numerical results from Table 3 show that the competitive memory approach adapted into the original operators of the CCO method provided better results than its competitors. Since the competitive memory is based on the concept of dominance, potential solutions compete among themselves to be allocated into the historic memory. This process detects most of the possible optima in a single run of the entire MCCO method. The DA metric shows that MCCO obtained the optimal values with the shortest spatial distance compared to the rest of the multimodal methods, indicating that MCCO produces solutions with a higher level of consistency. The MPR and PA performance metrics corroborate that the proposed approach obtained a higher accuracy level than the other methodologies by measuring the error between approximated optima and true optimal values, evaluating each benchmark function in fewer runs than the other methods. This indicates that the proposed mechanism is capable of finding most of the true optima with low computational effort.
From the numerical results in Table 4, it is clear that for functions J10 and J11, PNPCDE and FSDE outperformed the other algorithms, including MCCO. The experimental results suggest that these methods are capable of finding a higher number of optimal values than MCCO. However, PNPCDE presented a lower level of consistency than MCCO. Considering the standard deviation of EPN, it can be seen that PNPCDE and FSDE tended to produce dispersed, non-robust solutions. In contrast, even though MCCO did not detect the highest number of peaks in functions J10 and J11, the standard deviation of EPN indicates that it produced more consistent and robust solutions. For the remaining functions, MCCO outperformed the tested multimodal approaches. In functions J8 and J9, the competition phase of the memory mechanism in CCO efficiently registered most of the candidate optima. In the results, it can be noted that MCCO produced results with a higher level of accuracy and the lowest standard deviations. Additionally, for functions J12-J14, MCCO presented remarkable performance considering the EPN and its corresponding standard deviation. MCCO also produced solutions closer to the true optima of each benchmark function (MPR). MCCO is capable of detecting both the suboptimal and the optimal solutions of these functions since it makes use of a powerful updating scheme: the competitive memory mechanism stores and manages all the potential solutions thanks to the balance between the original evolutionary operators of the CCO method.
As a result, the proposed multimodal extension of CCO produces a balanced and powerful data structure which can efficiently register, maintain, and manage all potential solutions during the entire optimization process. The experimental study also showed that the proposed MCCO detects most of the optima of the majority of the test functions while evaluating a similar NFC to many of the tested methods, with the lowest execution time in most cases, indicating that MCCO is less computationally complex.
In order to statistically corroborate the numerical results from Tables 3 and 4, a Wilcoxon rank sum test was conducted. This nonparametric test indicates whether there is a significant difference between two multimodal approaches. In this study, the test was conducted at the 5% significance level. Table 5 reports the p-values of the pair-wise comparisons among the multimodal techniques. For the test, the null hypothesis H0 represents the absence of a significant difference between two multimodal methods, while the alternative hypothesis H1 indicates a significant difference between the two tested methods. To visually summarize the outcome of the test, Table 5 uses three symbols: the first indicates that one algorithm outperformed its competitor; the second indicates that a given method performed worse than its competitor; and the third indicates that the statistical test could not determine which algorithm was significantly better. As shown in Table 5, MCCO performed better than its competitors, producing significantly different solutions in most of the experimental cases.
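The rank sum test above can be sketched as follows. This is a minimal self-contained illustration using the normal approximation with mid-ranks for ties; it omits the continuity and tie-variance corrections that a full statistics library would apply, and the function name is an assumption of this sketch:

```python
import math

def rank_sum_p(a, b):
    """Wilcoxon rank-sum (Mann-Whitney) test between two samples,
    e.g. the per-run EPN values of two algorithms. Returns the z
    statistic and the two-sided p-value (normal approximation)."""
    n1, n2 = len(a), len(b)
    pooled = sorted(list(a) + list(b))

    def avg_rank(v):
        # Mid-rank: average of the positions occupied by tied values.
        below = sum(1 for u in pooled if u < v)
        equal = sum(1 for u in pooled if u == v)
        return below + (equal + 1) / 2.0

    r1 = sum(avg_rank(v) for v in a)              # rank sum of sample a
    mean = n1 * (n1 + n2 + 1) / 2.0               # expected rank sum under H0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (r1 - mean) / std
    # Two-sided p-value from the standard normal distribution.
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p
```

A p-value below 0.05 rejects H0 at the 5% significance level used in the study, indicating that one algorithm is significantly better than the other on that function.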

Conclusions
Evolutionary Computation Methods (ECMs) are stochastic search mechanisms which provide an alternative strategy for solving real-world optimization problems where classical optimization techniques are unsuitable. Most of the literature on ECMs indicates that these methods are conceived to detect the global optimum. However, in real-world applications in the engineering, medical, or economic fields, the global optimum may not be realizable due to physical, mechanical, or even structural constraints. Under such circumstances, multimodal optimization methodologies have been designed to detect optimal and suboptimal solutions in a single run of the optimization process.
This paper presents a multimodal extension to incorporate multimodal capabilities in a recently developed optimization algorithm called Cluster-Chaotic-Optimization (CCO). The proposed Multimodal Cluster-Chaotic-Optimization (MCCO) incorporates the concept of dominance found in animal behavior, which indicates the level of social interaction between two animals in terms of aggressiveness. Such aggressiveness leads the animals to remain as distant as possible from each other, i.e., the most dominant individual prevails while the other withdraws. In MCCO, this concept is computationally abstracted in terms of a data structure called "competitive memory" to incorporate multimodal capabilities into the evolutionary operators of the CCO.
The single-optimum CCO divides the population into small clusters at each generation, while the search strategy is conducted through intra- and extra-cluster evolutionary operations. Intra-cluster procedures drive the search inside each cluster, whereas extra-cluster procedures search outside each cluster but inside the feasible search space. The combination of these two evolutionary operators tends to form groups in potential search zones. Such promising zones can be efficiently registered within a memory data structure that maintains potential locations catalogued as optimal and suboptimal values. Under such circumstances, CCO can be extended with the abstraction of animal dominance to incorporate multimodal capabilities into the original CCO and detect all possible optimal solutions in a single run of the optimization process.
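The dominance-based registration described above can be illustrated with a minimal sketch. The update rule, the niche radius delta, and all names here are assumptions of this sketch, not the authors' exact competitive memory operators (minimization is assumed):

```python
import math

def update_memory(memory, candidate, J, delta):
    """Dominance-based competitive memory update: a candidate competes
    with any stored solution closer than delta; the fitter (dominant)
    one prevails and the other withdraws. Candidates landing in an
    unoccupied niche are registered as new entries."""
    for i, stored in enumerate(memory):
        if math.dist(candidate['x'], stored['x']) < delta:
            # Same niche: only the dominant (better-fitness) solution survives.
            if J(candidate['x']) < J(stored['x']):
                memory[i] = candidate
            return memory
    # No nearby entry: the candidate occupies a new niche.
    memory.append(candidate)
    return memory
```

Run over every generation, a rule of this kind keeps at most one representative per niche, which is how multiple optima can be maintained and reported from a single execution.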
The performance of the proposed MCCO was tested and compared against eleven multimodal techniques, i.e., DCGA, CSA, FSDE, LIPSM, LoINDE, MGSA, PNPCDE, HTS, MOMMOP, EARSDE, and RM. In the experimental section, a comparison of the results based on six commonly used multimodal performance metrics, i.e., Effective Peak Number (EPN), the Maximum Peak Ratio (MPR), the Peak Accuracy (PA), the Distance Accuracy (DA), the Peak Ratio (PR), and the Success Rate (SR) was reported. The EPN reflects the ability of a multimodal technique to obtain most of the optima. The MPR computes the consistency of the approximated optima over true optima. On the other hand, PA measures the error among approximated optima and true optima. Similarly, DA indicates the total error, considering each independent variable of the objective function. PR calculates the percentage of the total number of approximated optima over multiple executions of a given algorithm. Lastly, SR measures the successful percentage of runs, considering the total number of executions. Also, the computational effort, in terms of the Number of Function Calls (NFC) and execution time (T) in seconds, was reported.
Based on the numerical results, it was demonstrated that the proposed approach is capable of obtaining most of the true optimal values in most of the benchmark functions, with a competitive computational effort level based on NFC and execution time. Since the MPR, PA, DA, PR, and SR metrics were calculated based on the EPN metric, a nonparametric test was conducted on the EPN metric to statistically validate the performance results based on true optima approximation. In the statistical test, it was shown that the proposed method is capable of locating most of the true optima based on the Euclidean distance between the true optima and approximated solutions.

Conflicts of Interest:
On behalf of all authors, the corresponding author states that there is no conflict of interest.