Article

A New Parallel Cuckoo Flower Search Algorithm for Training Multi-Layer Perceptron

1 Faculty of Physics and Applied Computer Science, AGH University of Science & Technology, 30-059 Krakow, Poland
2 MEU Research Unit, Middle East University, Amman 11813, Jordan
3 University Centre for Research and Development, Chandigarh University, Mohali 140413, India
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(14), 3080; https://doi.org/10.3390/math11143080
Submission received: 21 June 2023 / Revised: 7 July 2023 / Accepted: 10 July 2023 / Published: 12 July 2023
(This article belongs to the Special Issue Biologically Inspired Computing, 2nd Edition)

Abstract

This paper introduces a parallel meta-heuristic algorithm called Cuckoo Flower Search (CFS). This algorithm combines the Flower Pollination Algorithm (FPA) and Cuckoo Search (CS) to train Multi-Layer Perceptron (MLP) models. The algorithm is evaluated on standard benchmark problems and its competitiveness is demonstrated against other state-of-the-art algorithms. Multiple datasets are utilized to assess the performance of CFS for MLP training. The experimental results are compared with various algorithms such as Genetic Algorithm (GA), Grey Wolf Optimization (GWO), Particle Swarm Optimization (PSO), Evolutionary Search (ES), Ant Colony Optimization (ACO), and Population-based Incremental Learning (PBIL). Statistical tests are conducted to validate the superiority of the CFS algorithm in finding global optimum solutions. The results indicate that CFS achieves significantly better outcomes with a higher convergence rate when compared to the other algorithms tested. This highlights the effectiveness of CFS in solving MLP optimization problems and its potential as a competitive algorithm in the field.

1. Introduction

Over the past decades, artificial intelligence (AI), and particularly machine learning (ML), has paved the way for researchers to study nature and build problem-solving models. In particular, studying the phenomena of natural selection, social behavior, and other patterns has led to the rise of evolutionary computing, swarm intelligence, and neural networks (NN). NNs are among the most significant inventions in the field of soft computing, inspired by the neurons present in the human brain. The basic NN model was conceptualized by McCulloch and Pitts [1]. There are various types of NNs, including Kohonen self-organizing networks [2], recurrent NNs [3], spiking NNs [4], feed-forward networks (FNN) [5], and others. Among these, FNNs are the simplest, with low computational cost and high performance. An FNN receives input on one side and provides output at the other. The FNN is generally unidirectional, with multiple layers in between. If there is only a single layer, the network is called a single-layer perceptron (SLP) [6]; SLPs are used for solving linear problems. If there are multiple layers, the network is called a multi-layer perceptron (MLP) [1,7]; such networks are used to solve non-linear problems.
All NNs share the common feature of learning from experience. Such NNs are called artificial NNs (ANN), and they adapt themselves according to a given set of inputs. ANNs can be supervised, using an external source to provide feedback [8,9], or unsupervised [10,11], in which case the NN adapts to its own inputs without any external feedback. Training an NN to achieve the highest possible performance is performed by a trainer. The trainer provides the NN with a set of input samples and modifies the structural parameters of the NN; when the training process is complete, the trainer is removed and the NN is set as active and available for use. There are two types of trainers: deterministic and stochastic. Deterministic trainers, brought about with the advent of the back-propagation (BP) algorithm [12] and gradient search algorithms, aim at training through mathematical optimization to achieve maximum performance. These trainers are simple and have a higher convergence speed, proceeding toward an optimum from a single solution. However, these optimization methods suffer from becoming trapped in local optima that are sometimes mistaken for the global optimum. On the other hand, stochastic training methods use stochastic optimization to achieve the desired performance. These methods initiate training with a random solution and enhance it to approach the global optimum. The randomness in stochastic methods provides local optima avoidance, but these methods are slower than deterministic methods [13,14]. Stochastic trainers are generally used in the literature because of their strong ability to avoid local optima.
Stochastic trainers can be single-solution or multi-solution. For a single-solution trainer, the NN is constructed by training it with a single random solution and evolving it iteratively until the stopping criterion is satisfied. Simulated annealing (SA) [15,16], hill climbing [17], and others [18,19] are examples of single-solution methods. Multi-solution trainers, on the other hand, are initiated with multiple random solutions and evolve each solution until the stopping criterion is met. Examples of such methods include the Genetic Algorithm (GA) [20], Ant Colony Optimization (ACO) [21], Artificial Bee Colony (ABC) [22,23], Particle Swarm Optimization (PSO) [24,25], Differential Evolution (DE) [26], Teaching-Learning-Based Optimization (TLBO) [27], the Invasive Weed Algorithm (IWO) [28], ensemble techniques [29], Grey Wolf Optimization (GWO) [30], and others. These algorithms perform well in terms of finding approximate global optimum solutions. This inspires us to develop a new meta-heuristic and apply it efficiently to training NNs.
In this work, a new parallel algorithm based on Cuckoo Search (CS) [31] and the Flower Pollination Algorithm (FPA) [32], which we have named Cuckoo Flower Search (CFS), is introduced. The main motivation for this work is the local optima stagnation and premature convergence exhibited by existing algorithms. CFS has been tested on standard benchmark functions and compared with state-of-the-art algorithms to establish its competitiveness. In addition, it has been further tested on FNN-MLP training as an application to real-world problems. Nineteen benchmark functions, consisting of unimodal, multi-modal, and fixed-dimension functions, have been used to analyze the performance of the proposed algorithm. These problems are highly challenging, and any algorithm performing well on them is considered a good algorithm. A comparison with GWO, CS, FPA, BFP, and others was also conducted. Statistical tests have been performed to prove the superiority of CFS over the compared algorithms. The major contributions of the paper are highlighted as follows:
  • To avoid premature convergence and local optima stagnation, the best-known properties of FPA and CS are combined in the proposed algorithm.
  • The global and local search equations of FPA and CS are adapted for inclusion in the proposed algorithm.
  • The solutions generated by FPA and CS are compared and the better of the two is selected as the current best solution. New solutions are generated in this way over the course of iterations to find the global best solution.
  • A greedy selection operation is followed to retain the best solution over subsequent iterations.
  • The proposed algorithm is tested on 19 classical benchmark functions, and the Wilcoxon rank-sum test is performed to prove the statistical significance of the algorithm.
  • Finally, MLPs for five real-world datasets, including Heart, Breast Cancer, Iris, Balloon, and XOR, are optimized using the proposed algorithm.
  • The source code of CFS algorithm is available at: https://github.com/rohitsalgotra/CFS (accessed on 20 June 2023).
The rest of the paper is organized as follows: Section 2 describes the preliminary definitions of FNNs and MLPs. The basics of CS and FPA are detailed in Section 3. Section 4 describes the proposed CFS algorithm. Section 5 presents the results and discussion. Finally, Section 6 concludes the paper.

2. Feed-Forward Neural Networks and Multi-Layer Perceptron

FNNs are unidirectional networks with one-way connections between neurons. They contain several parallel layers in which neurons are arranged [33]. The first layer is the input layer and the last is the output layer; in between are several hidden layers. A three-layer MLP with n input nodes, h hidden nodes, and m outputs is shown in Figure 1, showing a simple unidirectional connection between the nodes. The outputs are calculated as in [34]:
  • Weighted sum of inputs is given by:
s_j = \sum_{i=1}^{n} W_{ij} \cdot X_i - \theta_j, \quad j = 1, 2, \ldots, h    (1)
  • Outputs of hidden layers are calculated as:
S_j = \mathrm{sigmoid}(s_j) = \frac{1}{1 + \exp(-s_j)}, \quad j = 1, 2, \ldots, h    (2)
  • Final output based on the hidden node outputs is given as:
o_k = \sum_{j=1}^{h} W_{jk} \cdot S_j - \theta_k, \quad k = 1, 2, \ldots, m    (3)
O_k = \mathrm{sigmoid}(o_k) = \frac{1}{1 + \exp(-o_k)}, \quad k = 1, 2, \ldots, m    (4)
where W_{ij} and W_{jk} are the connection weights from the ith input node to the jth hidden node and from the jth hidden node to the kth output node, respectively, θ_j and θ_k are the thresholds (biases) of the jth hidden node and the kth output node, respectively, and X_i is the ith input.
From the above equations, it can be seen that the weights and thresholds define the final output values of the MLP. The major concern is finding optimum weights and thresholds (biases) that achieve a balanced relation between inputs and outputs.
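As an illustration of Equations (1)-(4), the following is a minimal NumPy sketch of the forward pass of such a three-layer MLP. The array names, shapes, and the 3-7-1 example structure are our own illustrative choices, not part of the original formulation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(X, W_ih, theta_h, W_ho, theta_o):
    """Forward pass of a three-layer MLP following Equations (1)-(4).

    X       : (n,)   input vector
    W_ih    : (n, h) input-to-hidden weights W_ij
    theta_h : (h,)   hidden-node thresholds (biases)
    W_ho    : (h, m) hidden-to-output weights W_jk
    theta_o : (m,)   output-node thresholds (biases)
    """
    s = X @ W_ih - theta_h          # Equation (1): weighted sum of inputs
    S = sigmoid(s)                  # Equation (2): hidden-layer outputs
    o = S @ W_ho - theta_o          # Equation (3): weighted sum of hidden outputs
    return sigmoid(o)               # Equation (4): final MLP outputs

# Example: a hypothetical 3-7-1 MLP (the structure later used for XOR) with random weights
rng = np.random.default_rng(0)
out = mlp_forward(rng.random(3), rng.random((3, 7)), rng.random(7),
                  rng.random((7, 1)), rng.random(1))
```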

3. Basic Cuckoo Search and Flower Pollination Algorithm

3.1. Cuckoo Search Algorithm

The CS algorithm is inspired by the obligate brood parasitism of cuckoos [35]: cuckoos of some species lay their eggs in the nests of host birds. CS is competitive among existing algorithms. Its components include the selection of the best solution and ensuring that this best solution is passed on to the next generation. It employs a local random walk to perform exploitation locally and randomization via Lévy flights to perform exploration globally. Three rules describe Cuckoo Search in a simple way. These are as follows:
  • Each cuckoo lays one egg and dumps it in a random nest;
  • The nest with the highest fitness will carry over to the next generation;
  • The host bird discovers the cuckoo’s egg with a probability pa ∈ [0, 1]. A fixed number of host nests are available. Depending on pa, the host bird either throws the egg out of the nest or abandons the nest and builds a new one at a new location.
In CS, a solution is an egg that is already present in a nest, and a new solution is an egg laid by a cuckoo. The not-so-good solutions in the nests are replaced by new and better solutions [35]. More complicated cases arise when multiple eggs are present in each nest; in such cases, an extended form of the algorithm can be used. Based on the above three rules, a Lévy flight is performed to produce a new solution x_i^{t+1} for the ith cuckoo, as given in Equation (5):
x_i^{t+1} = x_i^t + \alpha \oplus \text{Lévy}(\lambda)    (5)
where the previous solution is denoted by x_i^t, ⊕ is entry-wise multiplication, and α > 0 is the step size; in most cases, α = 1 is used. The above equation is the stochastic equation for a random walk, in which the next location depends on the current location and the transition probability to the next position. PSO also uses this type of entry-wise product.
Cuckoos usually search for food using a basic random walk, a Markov chain whose updated position is determined by the present location and the transition probability of the following position. The performance of CS is enhanced using Lévy flights [36]. A Lévy flight is a random walk whose step lengths follow a heavy-tailed probability distribution; strictly speaking, the term refers to steps on a discrete grid rather than in continuous space [37,38,39]. The Lévy flight is employed in this study because of its greater efficiency in exploring the search space. The step lengths in our algorithm are drawn from a Lévy distribution with infinite mean and variance.
As the random walk occurs via Lévy flight, the Lévy distribution draws the random step length as:
\text{Lévy} \sim u = t^{-\lambda}, \quad 1 < \lambda \le 3    (6)
This random walk process has a heavy-tailed step-length distribution. The Lévy walk generates new solutions near the current best solutions to speed up the local search [36]; far-field randomization should be used to create some of the solutions in order to prevent the system from getting stuck in a local optimum. Several points show that CS is analogous to and competitive with other optimization algorithms. First, like GA and PSO, CS is a population-based algorithm. Second, because of the heavy-tailed step-length distribution, large steps are possible in CS and the randomization is more efficient. Third, CS has been adapted to a wide class of optimization problems because it has fewer parameters to tune than PSO and GA.
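The following short sketch shows how a Lévy-flight update of the form of Equation (5) can be implemented. Mantegna's step generator, the β = 1.5 exponent, the small α value, and the scaling of the step by (x − best) are common choices from the CS literature, used here as assumptions rather than prescriptions taken from this paper:

```python
import numpy as np
from math import gamma, pi, sin

def levy_step(dim, beta=1.5, rng=None):
    """Draw a heavy-tailed, Levy-distributed step in each dimension (Mantegna's algorithm)."""
    rng = rng or np.random.default_rng()
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)

def cuckoo_move(x, best, alpha=0.01, rng=None):
    """New solution via Equation (5): x^(t+1) = x^t + alpha (+) Levy(lambda).
    Scaling the Levy step by (x - best) biases the walk toward the current best,
    a common refinement in CS implementations (an assumption, not the paper's wording)."""
    rng = rng or np.random.default_rng()
    return x + alpha * levy_step(x.size, rng=rng) * (x - best)
```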

3.2. Flower Pollination Algorithm

Flowers are fascinating organisms. Dating from the Cretaceous period, flowering plants are estimated to comprise about 80 percent of all plant species [40], with about 250,000 species of flowers found on earth. The ultimate aim of flowers is to reproduce, and this reproduction occurs mainly by pollination. In pollination, pollen is transferred from one flower to another by pollinators. Cross-pollination means that pollination occurs due to pollen from different plants; self-pollination means fertilization by pollen from the same flower or from different flowers of the same plant. Pollinators can be insects, birds, or other animals. Some flowers attract only specific kinds of insects for pollination, showing a sort of flower-insect partnership; this is referred to as flower constancy. Pollinators such as honeybees have been found to develop flower constancy. This property leads pollinators to visit only particular plant species, increasing the chances of reproduction for the flower and, in turn, maximizing the nectar supply for the pollinator [41]. When the pollen is transferred by pollinators such as insects and animals, the process is called biotic pollination (about 90 percent of pollination is biotic); when it occurs via diffusion or wind, the process is called abiotic [42] (about 10 percent of pollination). In total, there are about 200,000 varieties of pollinators on earth. Biotic cross-pollination occurs over long distances and is facilitated by birds, bats, bees, and fireflies, among other animals; this is often referred to as global pollination. Meanwhile, self-pollination is termed local pollination.
The above characteristics are idealized into four set of rules [43]:
  • Global pollination arises via biotic and cross-pollination.
  • Local pollination occurs via abiotic and self-pollination.
  • Flower constancy, termed as reproduction probability, is proportional to the similarity of two flowers.
  • Switch probability p ∈ [0, 1] balances global and local pollination.
When designing the algorithm, it is assumed that each plant has only one flower, which produces only a single pollen gamete. Under this assumption, a solution y_i is equivalent to a flower or a pollen gamete, defining a single-objective problem.
The above characteristics have been combined to design the FPA, which mainly consists of local and global pollination. Global pollination and reproduction of the fittest flower are represented mathematically as:
y_i^{t+1} = y_i^t + \alpha L(\lambda)(R^* - y_i^t)    (7)
where R^* is the current best solution, y_i^t is the potential solution at iteration t, and α is a scaling factor controlling the Lévy flight-based step size L(λ). The Lévy flight is expressed as:
L \sim \frac{\lambda \Gamma(\lambda) \sin(\pi \lambda / 2)}{\pi} \cdot \frac{1}{s^{1+\lambda}}, \quad (s \gg s_0 > 0)    (8)
where Γ(λ) is the standard gamma function.
The local pollination rule can be mathematically represented as:
y_i^{t+1} = y_i^t + \epsilon (y_j^t - y_k^t)    (9)
where y_j^t and y_k^t are pollens from different flowers of the same plant species. In this confined neighborhood, flower constancy corresponds to a local random walk, with ϵ drawn from a uniform distribution on [0, 1].
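A compact sketch of one FPA move, switching between the global rule of Equation (7) and the local rule of Equation (9) with switch probability p, is shown below. This is a simplified illustration under our own naming; it reuses the levy_step helper from the CS sketch above, and the default parameter values are assumptions:

```python
import numpy as np

def fpa_move(y, population, best, p=0.8, alpha=0.1, rng=None):
    """One flower-pollination move for solution y.

    With probability p: global (biotic) pollination, Equation (7).
    Otherwise:          local (abiotic) pollination,  Equation (9).
    Depends on levy_step() defined in the CS sketch above.
    """
    rng = rng or np.random.default_rng()
    if rng.random() < p:
        # Global pollination: Levy-flight step toward the current best R*
        return y + alpha * levy_step(y.size, rng=rng) * (best - y)
    # Local pollination: random walk between two flowers of the same species
    j, k = rng.choice(len(population), size=2, replace=False)
    return y + rng.random() * (population[j] - population[k])
```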

4. Cuckoo Flower Search Algorithm

4.1. Algorithm Definition

The CFS algorithm is proposed as a hybrid of the CS and FPA algorithms. Both algorithms work in coordination to attain a global optimum solution. The main idea is to generate current best solutions from both the cuckoos and the flower pollinators; the two are compared, the better one is retained, and the process is repeated. The solution after each evaluation is fed back to the cuckoos and flower pollinators. This procedure continues until the termination criteria are met. The final solution is the most appropriate solution to the problem under discussion. There are three phases to the proposed CFS algorithm:

Initialization

This is the first phase of the CFS algorithm, in which the population is randomly initialized. Starting with an initial population of N cuckoos and flower pollinators (termed CF), each solution is initialized according to Equation (10) and serves as a potential solution to the problem under examination.
CF_{i,j} = CF_{min,j} + ab \cdot (CF_{max,j} - CF_{min,j})    (10)
where i ∈ {1, …, N}, j ∈ {1, …, D}, CF_{i,j} is the ith solution in the jth dimension, D is the dimension (number of variables) of the problem being studied, CF_{min,j} and CF_{max,j} are the lower and upper bounds, respectively, and ab is a randomly generated number in [0, 1]. The population initialized in Equation (10) is the same for both cuckoos and flower pollinators. After initialization, the fitness of each solution is estimated on the objective function, and the best solution attained is treated as the initial best for all cuckoos (CF) and flower pollinators (FC).
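A short sketch of the random initialization of Equation (10) is given below; the variable and function names are our own, and the example bounds are placeholders:

```python
import numpy as np

def initialize_population(N, D, lower, upper, rng=None):
    """Equation (10): CF[i, j] = CF_min_j + ab * (CF_max_j - CF_min_j), ab ~ U[0, 1].
    The same matrix serves as the starting point for both the cuckoo and the
    flower-pollinator views of the population."""
    rng = rng or np.random.default_rng()
    ab = rng.random((N, D))
    return lower + ab * (upper - lower)

# e.g. 20 agents in a 30-dimensional search space bounded by [-100, 100]
pop = initialize_population(N=20, D=30, lower=-100.0, upper=100.0)
```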
  • Solution generation
After the initial step, two new solutions are generated: one inspired by cuckoo brood parasitism and the other by flower pollination. The main concern here is to balance exploration and exploitation in a well-defined fashion. For the cuckoos, exploration is achieved by randomization via Lévy flights, and a local random walk is used to achieve exploitation. The new solution x_i^{t+1} is generated as per Equation (5) and its fitness is evaluated on the optimization problem being tested. The solution x_i^{t+1} obtained in this manner is compared with CF, and the best (CF_best) among them is retained.
In the case of the flower pollinators, exploration and exploitation are balanced by global pollination and local pollination, respectively. Equations (7) and (9) are applied according to a random switch probability in [0, 1] to find the second new solution (y_i^{t+1}). This solution y_i^{t+1} and the initial best (FC) solution are also compared, resulting in another best solution, FC_best.
  • Final evaluation
After comparing the best-fit solutions obtained by the cuckoos (CF_best) and the flower pollinators (FC_best), the better solution attained is the current optimum solution. For both cuckoos and flower pollinators, the solution generated at this stage is set as the initial best (CF and FC, respectively). The same procedure is followed until the termination requirements are met, and the final solution obtained in this manner is the most appropriate solution. It is also worth noting that the cuckoos and flower pollinators both seek the most appropriate solutions in parallel. If a cuckoo-produced solution becomes trapped in a local optimum and is unable to deliver the global optimum, the flower pollinators assist it in exiting the local trap and reaching the global optimum, and vice versa. This characteristic increases the likelihood of the CFS algorithm reaching the global optimum solution.
Both CS and FPA are good algorithms in terms of finding the global optimum solution, but the real problem is their inconsistency in finding the best-fit individual every time the algorithm is run. This inconsistency is due to getting stuck in local optima while moving toward the global optimum. As a result, a mechanism is required to move the algorithm closer to the global optimum. If the CS component becomes stuck, the FPA component moves it towards the global optimum, and vice versa; thus, the cuckoos and flower pollinators collaborate to obtain a global optimum. Figure 2 shows the flowchart of the CFS algorithm, and the pseudocode of the proposed algorithm is given in Algorithm 1.
Algorithm 1: Pseudocode of CFS algorithm
begin:
      1.   Initialize:  α , β 0 ,   γ , maximum iterations
      2.  Define Population, objective function f(x)
      3.  While (t < maximum iterations)
                         For i = 1 to n
                                       For j = 1 to n
                                                     Evaluate new solution using CS inspired equation;
                                                     Evaluate new solution using FPA inspired equation;
                                       End for j
                                       Find the best among the two using greedy selection;
                         End for i
      4.  Update current best.
      5.  End while
      6.  Find final best
end.
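Algorithm 1 can be read as the following Python sketch of the parallel CS/FPA structure described above. It reuses the cuckoo_move and fpa_move helpers from the earlier sketches; the function names, parameter values, and the nest-abandonment step (a standard CS ingredient governed by pa) are our own illustrative assumptions, and the authors' reference implementation is the one available at the GitHub link above:

```python
import numpy as np

def cfs(objective, N, D, lower, upper, max_iter,
        p_switch=0.8, pa=0.25, alpha=0.01, rng=None):
    """Schematic CFS: for each agent, CS and FPA candidates are generated in parallel
    and greedy selection keeps the better of the two."""
    rng = rng or np.random.default_rng()
    pop = lower + rng.random((N, D)) * (upper - lower)      # Equation (10)
    fit = np.array([objective(x) for x in pop])
    best = pop[fit.argmin()].copy()

    for _ in range(max_iter):
        for i in range(N):
            # Candidate from the cuckoo (Levy-flight) move, Equation (5)
            x_cs = np.clip(cuckoo_move(pop[i], best, alpha, rng), lower, upper)
            # Candidate from the flower-pollination move, Equations (7) and (9)
            x_fp = np.clip(fpa_move(pop[i], pop, best, p_switch, alpha, rng), lower, upper)
            # Greedy selection between the two candidates, then against the old solution
            cand = x_cs if objective(x_cs) <= objective(x_fp) else x_fp
            if objective(cand) < fit[i]:
                pop[i], fit[i] = cand, objective(cand)
        # Abandon a fraction pa of the worst nests, as in standard CS (assumed detail)
        n_bad = max(1, int(pa * N))
        worst = fit.argsort()[-n_bad:]
        pop[worst] = lower + rng.random((n_bad, D)) * (upper - lower)
        fit[worst] = [objective(x) for x in pop[worst]]
        best = pop[fit.argmin()].copy()
    return best, fit.min()

# Example: minimize the sphere function f2 in 30 dimensions
# best_x, best_f = cfs(lambda x: np.sum(x ** 2), N=20, D=30,
#                      lower=-100.0, upper=100.0, max_iter=200)
```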

4.2. CFS-MLP Trainer

When training an MLP, the first step is to formulate the problem [44] and find the values of weights and biases that yield the highest accuracy/classification/statistical results. These values should be found by a trainer; this important step is achieved here by training the MLP using meta-heuristic algorithms. There are three methods for training MLPs using meta-heuristic algorithms [45]:
  • To find a combination of weights and biases of the MLP that achieves the minimum error using meta-heuristic algorithms. In this approach, proper values of the weights are found without changing the architecture of the network. It has a simple encoding phase and a difficult decoding phase, and so is often used for simple NNs.
  • To find proper architecture for an MLP using heuristic algorithms. In this method, the architecture varies and it can be achieved by varying the connections between hidden nodes, layers, and neurons, as proposed in [46]. This method has a simple decoding phase but, due to complexity in the encoding phase, it is used for complex structures.
  • To tune the parameters of a gradient-based learning algorithm using a heuristic approach. This method has been used to train FNNs using evolutionary algorithms (EAs) [47] and others, such as GA [48], in combination with gradient-based tuning of the FNN. In this method, the decoding and encoding processes are very complicated and hence the structure becomes very complex.
In the present work, the CFS algorithm is proposed and applied to train an MLP using the first method. The weights and biases for the CFS algorithm are given in the form of a vector as follows:
\vec{C} = \{\vec{W}, \vec{\theta}\} = \{W_{1,1}, W_{1,2}, \ldots, W_{n,n}, \theta_1, \theta_2, \ldots, \theta_h\}    (11)
where n is the number of nodes, W_{ij} is the connection weight between the ith and jth nodes, and θ_j is the bias of the jth hidden node. After setting the initial variables, the fitness function for the CFS algorithm must be defined. This is achieved using a common metric for the evaluation of MLPs, the Mean Square Error (MSE). The MSE measures the difference between the desired output and the value obtained from the MLP. The performance of the MLP is based on the average MSE over all training samples, given by:
MSE = \sum_{k=1}^{s} \frac{\sum_{i=1}^{m} (o_i^k - d_i^k)^2}{s}    (12)
where s is the number of training samples, m is the number of outputs, and o_i^k and d_i^k are the actual and desired outputs of the ith output node for the kth training sample, respectively. Based on the MSE, the final objective function can be formulated as:
\text{Minimize: } F(\vec{C}) = MSE    (13)
In the overall process, the CFS algorithm provides the MLP with weights and biases and, in turn, receives the average MSE over all training samples. The CFS algorithm updates the weights and biases iteratively in order to minimize the average MSE; the best MSE is obtained from the last iteration of the algorithm. Since the weights and biases are driven toward the best MSE, there is a greater chance of improvement in the MLP at each iteration. Thus, the CFS algorithm converges toward a better solution than the initial random solution.
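The sketch below illustrates how a flat candidate vector of the form of Equation (11) can be decoded into MLP weights and biases and scored with the average MSE of Equation (12). The decoding order (weights first, then biases) and the helper names are our own assumptions; X_train and T_train in the commented usage are hypothetical placeholders, and cfs refers to the schematic loop sketched earlier:

```python
import numpy as np

def mse_fitness(vec, X, D_target, n_in, n_hid, n_out):
    """Decode a flat CFS solution vector (Equation (11)) into MLP weights and biases
    and return the average MSE over all training samples (Equation (12))."""
    i = 0
    W_ih = vec[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    W_ho = vec[i:i + n_hid * n_out].reshape(n_hid, n_out); i += n_hid * n_out
    theta_h = vec[i:i + n_hid]; i += n_hid
    theta_o = vec[i:i + n_out]

    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    S = sig(X @ W_ih - theta_h)                           # hidden-layer outputs
    O = sig(S @ W_ho - theta_o)                           # MLP outputs
    return np.mean(np.sum((O - D_target) ** 2, axis=1))   # average MSE over samples

# Hypothetical usage for the 3-7-1 XOR network (36 variables in [-10, 10]):
# best_vec, best_mse = cfs(lambda v: mse_fitness(v, X_train, T_train, 3, 7, 1),
#                          N=50, D=36, lower=-10.0, upper=10.0, max_iter=250)
```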

5. Result and Discussion

This section presents details on the applicability of the proposed CFS algorithm to classical benchmark problems and to the real-world optimization of FNN-MLPs. We have used 19 benchmark functions, consisting of unimodal, multi-modal, and fixed-dimension problems. For the optimization of real-world FNN-MLPs, five highly challenging datasets have been used. More details are presented in the following subsections. For performance analysis, the simulations were performed in MATLAB on a Windows 10 x64 system with an Intel Core i3 processor and 8 GB of RAM.

5.1. Benchmark Problems

To check the effectiveness of the CFS algorithm, it was tested on nineteen well-known benchmark problems. Wilcoxon rank-sum tests were performed to validate the results statistically. This non-parametric test is used to detect the statistical significance of an algorithm: differences between pairs of result populations are analyzed and compared, and the test returns a p-value indicating the significance level of the difference between two algorithms. This value should be less than 0.05 for the difference to be considered statistically significant [49]. The proposed algorithm is compared with the ABC [50], Firefly Algorithm (FA) [51], FPA, CS, and Bat Flower Pollinator (BFP) [52] algorithms. The parameter settings used to test each algorithm on the benchmark problems are shown in Table 1.
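For reference, such a pairwise rank-sum comparison can be reproduced with SciPy's standard implementation, as in the brief sketch below. The sample arrays are placeholders of our own, not the values reported in the tables:

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(1)
cfs_runs   = rng.normal(0.10, 0.02, 20)   # placeholder: best scores of CFS over 20 runs
other_runs = rng.normal(0.15, 0.03, 20)   # placeholder: best scores of a compared algorithm

stat, p_value = ranksums(cfs_runs, other_runs)
# A p-value below 0.05 indicates a statistically significant difference between the samples
print(f"rank-sum statistic = {stat:.3f}, p-value = {p_value:.3g}")
```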

5.1.1. Unimodal Functions

Unimodal functions have no local solutions; they have a single global solution. These benchmark functions are useful for evaluating the convergence characteristics of heuristic optimization techniques. The CFS algorithm was applied to four unimodal benchmark problems with three dimension sets (30, 50, and 100), as given in Table 2, and was compared to the ABC, FA, FPA, CS, and BFP algorithms (see Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8). For the 30-dimension (Table 3) and 50-dimension (Table 5) problems, on function f1 the FPA algorithm has better mean and best values, but CS and CFS give the best standard deviations. For function f2, FA is found to be better, with highly competitive results from the CFS algorithm. For f3, the CFS algorithm provides better results, and for f4, the ABC and CFS algorithms are both competitive when compared to the rest of the algorithms. For 100 dimensions (Table 7), the CFS algorithm performs better for f2 and f3; for f1, FPA is better, and for f4, ABC is better. The BFP algorithm is not better for any of the functions. The rank-sum tests in Table 4, Table 6 and Table 8 confirm the superior performance of the CFS algorithm. The convergence characteristics are shown in Figure 3.

5.1.2. Multimodal Functions

Multimodal benchmark functions feature several local minima whose number grows exponentially with dimension. As such, they are good for testing an algorithm’s ability to avoid local minima. The CFS algorithm has been applied to six multimodal benchmark problems, with three dimension sets (30, 50, and 100), as shown in Table 9. The algorithm has been compared to the ABC, FA, FPA, BFP, and CS algorithms. For the 30 D and 50 D problems, the CFS algorithm performs better on the f5, f6, and f7 functions, while only the FA algorithm performs better than CFS on the f8, f9, and f10 functions, as shown in Table 10, Table 11, Table 12 and Table 13, respectively. For 100 D (Table 14 and Table 15), the CFS algorithm performs better for the f5, f6, f7, and f9 functions; for f8 and f10, the FA algorithm performs better. The rank-sum tests in Table 11, Table 13 and Table 15 show that the performance of the CFS algorithm is statistically better. The convergence characteristics are shown in Figure 4.

5.1.3. Fixed Dimension Functions

Fixed-dimension benchmark functions have a fixed, finite search-space dimension. The CFS algorithm has been applied to nine such benchmark functions, as shown in Table 16, and the results have been compared to the ABC, CS, FA, BFP, and FPA algorithms. It can be seen from Table 17 and Table 18 that the CFS algorithm performs better than the other algorithms for all the test problems. For functions f14, f15, f16, f17, and f18, the ABC, FA, FPA, and CS algorithms are not able to achieve the global optimum. Even when another algorithm reaches the global optimum, the CFS algorithm shows superior consistency because of its better standard deviation. The CFS algorithm’s results are also statistically better, as shown in Table 18. The convergence curves are shown in Figure 5.

5.2. FNN–MLP Datasets

The proposed CFS algorithm was used to train MLPs on standard benchmark datasets obtained from the Machine Learning Repository of the University of California, Irvine [53]. The datasets used are: Breast Cancer, XOR, Balloon, Iris, and Heart. The results are compared with the PSO, GA, ES, ACO, GWO, and PBIL [30,54,55,56,57] algorithms, as well as the Whale Optimization Algorithm (WOA) [58] and Moth Flame Optimization (MFO) [59], for verification. The optimization parameter settings for the CFS algorithm are presented in Table 19, and Table 20 details the specifications of the datasets used for comparison. The simplest dataset is the 3-bit XOR, with three attributes and eight training/test samples. The Balloon dataset has four attributes and 16 training/test samples. The Iris dataset has 150 training/test samples with four attributes. The Breast Cancer dataset is the largest, with 599 training samples, 100 test samples, and nine attributes. The Heart dataset has 80 training samples, 187 test samples, and 22 attributes. The number of classes for each dataset is two, except for Iris, which has three. These datasets are highly complex problems and are employed to test the performance of the CFS algorithm. The total number of runs for the CFS algorithm is set to 10, the same as used in [30]. The number of function evaluations for the XOR and Balloon datasets is 50 × 250 = 12,500 for all the algorithms. For the Iris, Breast Cancer, and Heart datasets, the number of function evaluations is 20 × 250 = 5000 for the CFS algorithm and 200 × 250 = 50,000 for the rest. The results are presented as the average and standard deviation, over 10 runs, of the best MSEs in the last iteration of each algorithm. The best results are those with the lowest average and standard deviation, ultimately indicating the better performance of the proposed approach [60,61].
All dataset features are mapped from their original interval [a, b] to a common interval [c, d] using min-max normalization:
X_t = \frac{(x - a)(d - c)}{(b - a)} + c
For a dataset with N inputs, the number of hidden nodes is kept constant at 2 × N + 1. The structure of each MLP is given in Table 21.
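A small sketch of the min-max normalization above and of the 2 × N + 1 hidden-node rule follows; the function names are our own, and the feature values are placeholders in the style of Iris rather than actual dataset rows:

```python
import numpy as np

def min_max_scale(x, a, b, c=0.0, d=1.0):
    """Map x from [a, b] to [c, d]:  X_t = (x - a) * (d - c) / (b - a) + c."""
    return (x - a) * (d - c) / (b - a) + c

def hidden_nodes(n_inputs):
    """Hidden-node rule used for every dataset: 2 * N + 1."""
    return 2 * n_inputs + 1

features = np.array([[5.1, 3.5, 1.4, 0.2],      # placeholder rows, Iris-style attributes
                     [7.0, 3.2, 4.7, 1.4]])
scaled = min_max_scale(features, features.min(axis=0), features.max(axis=0))
print(hidden_nodes(4))   # four attributes -> 9 hidden nodes, i.e. the 4-9-3 Iris structure
```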

5.2.1. XOR Dataset

This dataset returns the XOR of its inputs as output. It has three inputs, eight training/test samples, and one output. This dataset has a dimension of 36 with a range of [−10, 10] and an MLP structure of 3−7−1. The results, in terms of average and standard deviation, are given in Table 22. It can be seen from Table 22 that the performance of the CFS-MLP algorithm is far better than that of all other algorithms tested.

5.2.2. Balloon Dataset

The Balloon dataset has a dimension of 55, with a range of [−10, 10]. This dataset has 18 training/test samples, with four attributes and two classes, and an MLP structure of 4−9−1. The results are given in Table 23. The results show that the CFS algorithm gives far better average and standard deviation values when compared to the GWO, PSO, GA, ACO, ES, PBIL, WOA, and MFO algorithms.

5.2.3. Iris Dataset

The Iris dataset has 75 variables to be optimized in the range of [−10, 10]. It has 150 training/test samples, with four attributes and three classes. An MLP structure of 4−9−3 is utilized to solve this dataset. The results are shown in Table 24. For the Iris dataset, the results of the CFS algorithm are competitive with GWO in terms of average; in terms of standard deviation, the results of the CFS algorithm are superior.

5.2.4. Breast Cancer Dataset

This is a challenging dataset, with 599 training samples, 100 test samples, nine attributes, and two classes. It has 209 dimensions, with an MLP structure of 9−19−1. The outcomes for this dataset are given in Table 25. The results show that the CFS algorithm is far superior to the PSO, GA, ACO, PBIL, WOA, and MFO algorithms. When compared to GWO, the results are highly competitive.

5.2.5. Heart Dataset

This is the last dataset used in this paper and was solved with an MLP structure of 22−45−1. It has 80 training samples, 187 test samples, 22 attributes, and two classes. The Heart dataset is a very challenging dataset, with 1081 dimensions. The results are shown in Table 26; the CFS algorithm performs better for this dataset when compared to the others.

5.3. Discussion of Results

The comparison of the CFS algorithm with the ABC, CS, FA, FPA, and BFP algorithms shows that, for the test functions, the CFS algorithm delivers very competitive results on the unimodal and multimodal benchmark problems. For the fixed-dimension problems, none of the ABC, CS, FA, FPA, and BFP algorithms is comparable. This is a result of the inability of these algorithms to emerge from local minima. The ABC algorithm has the problem of becoming stuck in local minima, while the CS and FPA algorithms are inconsistent due to their inability to emerge from local minima. In its initial stage, the FA algorithm has slower convergence because of its random initial distribution and insufficient exploration ability. In the final stage, the fireflies gather around the optimal solution, but due to random motion and the attractiveness mechanism, flight mistakes can occur and hence the solution converges very slowly.
The results of the CFS-MLP clearly show that, for the XOR and Balloon datasets, the same number of function evaluations is used for all algorithms and the results are both better and significant. For the Iris, Breast Cancer, and Heart datasets, the number of function evaluations for the CFS algorithm is 5000, while for the others it is 50,000. Hence, it can be said that the CFS algorithm is able to achieve more significant results with fewer function evaluations. This proves the superiority of the CFS algorithm over the GWO, PSO, GA, ACO, ES, PBIL, WOA, and MFO algorithms.
In the CFS algorithm, there are two search agents: cuckoos and flower pollinators. When cuckoos are not able to find the optimal solution, flower pollinators help them; in turn, the flower pollinators are helped by cuckoos when stuck in a local optimum. Therefore, two solutions (one from cuckoos and the other from flower pollinators) are generated. The final solution is the best among the two. This helps the CFS algorithm to achieve faster convergence and consistency in finding the optimal solution.

6. Conclusions

In this work, a new CFS algorithm was proposed for MLP training. The algorithm was first tested over 19 standard benchmark functions and the results were statistically compared with the ABC, CS, FA, FPA, and BFP algorithms. The results demonstrate that the CFS algorithm performs significantly better, with higher consistency in avoiding local minima, when compared to the ABC, CS, FA, FPA, and BFP algorithms. The CFS algorithm was then used to train MLPs on five datasets. The results of the CFS-MLP were compared, in terms of average and standard deviation, with the GWO, PSO, GA, ACO, ES, PBIL, WOA, and MFO algorithms. The experimental results again proved the superiority of the CFS algorithm for MLP training.
Despite this, the proposed algorithm has certain drawbacks. Because of its stochastic nature, the algorithm can be inefficient for some kinds of problems, and since the CFS algorithm uses two update equations for finding new solutions, its computational complexity increases. Thus, it is imperative to find new strategies to deal with this complexity. As a future direction of study, the parameters of the algorithm can be explored to find the best parameter settings. In addition, the CFS algorithm can be applied to various other domains, including clustering, antenna array synthesis, feature selection, medical imaging, segmentation, and others. Apart from that, the proposed algorithm can be extended to multi-objective optimization, hyper-parameter tuning, and other settings.

Author Contributions

Conceptualization, R.S.; methodology, R.S.; software, N.M.; validation, R.S., N.M. and V.M.; formal analysis, V.M.; investigation, R.S.; resources, N.M.; data curation, V.M.; writing—original draft preparation, R.S.; writing—review and editing, R.S. and N.M.; visualization, R.S.; supervision, R.S.; project administration, R.S. and N.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133.
  2. Kohonen, T. The self-organizing map. Proc. IEEE 1990, 78, 1464–1480.
  3. Dorffner, G. Neural networks for time series processing. Neural Netw. World 1996.
  4. Ghosh-Dastidar, S.; Adeli, H. Spiking neural networks. Int. J. Neural Syst. 2009, 19, 295–308.
  5. Bebis, G.; Georgiopoulos, M. Feed-forward neural networks. IEEE Potentials 1994, 13, 27–31.
  6. Rosenblatt, F. The Perceptron, A Perceiving and Recognizing Automaton Project Para; Cornell Aeronautical Laboratory: Buffalo, NY, USA, 1957.
  7. Werbos, P. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. Thesis, Harvard University, Cambridge, MA, USA, 1974.
  8. Reed, R.D.; Marks, R.J. Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks; MIT Press: Cambridge, MA, USA, 1998.
  9. Caruana, R.; Niculescu-Mizil, A. An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; pp. 161–168.
  10. Hinton, G.E.; Sejnowski, T.J. Unsupervised Learning: Foundations of Neural Computation; MIT Press: Cambridge, MA, USA, 1999.
  11. Wang, D. Unsupervised Learning: Foundations of Neural Computation; MIT Press: Cambridge, MA, USA, 2001; p. 101.
  12. Hertz, J. Introduction to the Theory of Neural Computation. Basic Books 1; Taylor Francis: Abingdon, UK, 1991.
  13. Wang, G.-G.; Guo, L.; Gandomi, A.H.; Hao, G.-S.; Wang, H. Chaotic krill herd algorithm. Inf. Sci. 2014, 274, 17–34.
  14. Wang, G.-G.; Gandomi, A.H.; Alavi, A.H.; Hao, G.-S. Hybrid krill herd algorithm with differential evolution for global numerical optimization. Neural Comput. Appl. 2013, 25, 297–308.
  15. Van Laarhoven, P.J.; Aarts, E.H. Simulated Annealing; Springer: Berlin/Heidelberg, Germany, 1987.
  16. Szu, H.; Hartley, R. Fast simulated annealing. Phys. Lett. A 1987, 122, 157–162.
  17. Mitchell, M.; Holland, J.H.; Forrest, S. When will a genetic algorithm outperform hill climbing? NIPS 1993, 51–58.
  18. Sanju, P. Enhancing Intrusion Detection in IoT Systems: A Hybrid Metaheuristics-Deep Learning Approach with Ensemble of Recurrent Neural Networks. J. Eng. Res. 2023; in press.
  19. Mirjalili, S.; Mohd Hashim, S.Z.; Moradian Sardroudi, H. Training feedforward neural networks using hybrid particle swarm optimization and gravitational search algorithm. Appl. Math. Comput. 2012, 218, 11125–11137.
  20. Whitley, D.; Starkweather, T.; Bogart, C. Genetic algorithms and neural networks: Optimizing connections and connectivity. Parallel Comput. 1990, 14, 347–361.
  21. Shokouhifar, A.; Shokouhifar, M.; Sabbaghian, M.; Soltanian-Zadeh, H. Swarm intelligence empowered three-stage ensemble deep learning for arm volume measurement in patients with lymphedema. Biomed. Signal Process. Control. 2023, 85, 105027.
  22. Socha, K.; Blum, C. An ant colony optimization algorithm for continuous optimization: Application to feed-forward neural network training. Neural Comput. Appl. 2007, 16, 235–247.
  23. Ozturk, C.; Karaboga, D. Hybrid Artificial Bee Colony algorithm for neural network training. In Proceedings of the 2011 IEEE Congress on Evolutionary Computation (CEC), New Orleans, LA, USA, 5–8 June 2011; pp. 84–88.
  24. Mendes, R.; Cortez, P.; Rocha, M.; Neves, J. Particle swarms for feed forward neural network training. In Proceedings of the 2002 International Joint Conference on Neural Networks, IJCNN’02 (Cat. No.02CH37290), Honolulu, HI, USA, 12–17 May 2002.
  25. Gudise, V.G.; Venayagamoorthy, G.K. Comparison of particle swarm optimization and backpropagation as training algorithms for neural networks. In Proceedings of the Swarm Intelligence Symposium, SIS’03, Indianapolis, IN, USA, 26 April 2003; pp. 110–117.
  26. Ilonen, J.; Kamarainen, J.-K.; Lampinen, J. Differential evolution training algorithm for feed-forward neural networks. Neural Process. Lett. 2003, 17, 93–105.
  27. Uzlu, E.; Kankal, M.; Akpınar, A.; Dede, T. Estimates of energy consumption in Turkey using neural networks with the teaching–learning-based optimization algorithm. Energy 2014, 75, 295–303.
  28. Moallem, P.; Razmjooy, N. A multi-layer perceptron neural network trained by invasive weed optimization for potato color image segmentation. Trends Appl. Sci. Res. 2012, 7, 445–455.
  29. Darekar, R.V.; Chavan, M.; Sharanyaa, S.; Ranjan, N.M. A hybrid meta-heuristic ensemble based classification technique speech emotion recognition. Adv. Eng. Softw. 2023, 180, 103412.
  30. Mirjalili, S. How effective is the Grey Wolf Optimizer in training multi-layer perceptrons. Appl. Intell. 2015, 43, 150–161.
  31. Yang, X.-S.; Deb, S. Engineering optimization by cuckoo search. Int. J. Math. Model. Numer. Optim. 2010, 1, 330–343.
  32. Yang, X.-S. Flower Pollination Algorithm for Global Optimization. In Proceedings of the 11th International Conference, UCNC 2012, Orléans, France, 3–7 September 2012; Volume 7445, pp. 240–249.
  33. Fine, T.L. Feedforward Neural Network Methodology; Springer: Berlin/Heidelberg, Germany, 1999.
  34. Mirjalili, S.; Sadiq, A.S. Magnetic optimization algorithm for training multi-layer perceptron. In Proceedings of the Communication Software and Networks (ICCSN), 2011 IEEE 3rd International Conference, Xi’an, China, 27–29 May 2011; pp. 42–46.
  35. Payne, R.B.; Sorenson, M.D.; Klitz, K. The Cuckoos; Oxford University Press: Oxford, UK, 2005.
  36. Barthelemy, P.; Bertolotti, J.; Wiersma, D.S. A Lévy flight for light. Nature 2008, 453, 495–498.
  37. Yang, X.-S.; Deb, S. Cuckoo Search via Lévy Flights. In Proceedings of the 2009 World Congress on Nature & Biologically Inspired Computing (NaBIC), Coimbatore, India, 9–11 December 2009; IEEE Publications: Piscataway, NJ, USA, 2009.
  38. Brown, C.; Liebovitch, L.S.; Glendon, R. Lévy Flights in Dobe Ju/’hoansi Foraging Patterns. Human Ecol. 2007, 35, 129–138.
  39. Pavlyukevich, I. Cooling down Lévy flights. J. Phys. A Math. Theory 2007, 40, 12299–12313.
  40. Walker, M. How Flowers Conquered the World, BBC Earth News, 10 July 2009. Available online: http://news.bbc.co.uk/earth/hi/earth_news/newsid_8143000/8143095.stm (accessed on 1 January 2019).
  41. Waser, N.M. Flower constancy: Definition, cause and measurement. Am. Nat. 1986, 127, 596–603.
  42. Glover, B.J. Understanding Flowers and Flowering: An Integrated Approach; Oxford University Press: Oxford, UK, 2007.
  43. Xin-She, Y.; Karamanoglu, M.; He, X. Flower pollination algorithm: A novel approach for multiobjective optimization. Eng. Optim. 2014, 46, 1222–1237.
  44. Belew, R.K.; McInerney, J.; Schraudolph, N.N. Evolving Networks: Using the Genetic Algorithm with Connectionist Learning; Cognitive Computer Science Research Group: La Jolla, CA, USA, 1990.
  45. Smizuta, T.; Sato, D.; Lao, M.; Ikeda, T. Shimizu, Structure design of neural networks using genetic algorithms. Complex Syst. 2001, 13, 161–176.
  46. Yu, J.; Wang, S.; Xi, L. Evolving artificial neural networks using an improved PSO and DPSO. Neurocomputing 2008, 71, 1054–1060.
  47. Leung, F.H.; Lam, H.; Ling, S.; Tam, P.K.S. Tuning of the structure and parameters of a neural network using an improved genetic algorithm. IEEE Trans. Neural Netw. 2003, 14, 79–88.
  48. Montana, D.J.; Davis, L. Training Feedforward Neural Networks Using Genetic Algorithms. IJCAI 1989, 89, 762–767.
  49. Derrac, J.; García, S.; Molina, D.; Herrera, F. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evolut. Comput. 2011, 1, 3–18.
  50. Karaboga, D. An Idea Based on Honey Bee Swarm for Numerical Optimization; Technical Report TR-06; Erciyes University, Engineering Faculty, Computer Engineering Department: Kayseri, Turkey, 2005.
  51. Yang, X.S. Firefly algorithms for multimodal optimization. In Stochastic Algorithms: Foundations and Applications; Lecture Notes in Computer Sciences; SAGA: Chicago, IL, USA, 2009; Volume 5792, pp. 169–178.
  52. Urvinder, S.; Salgotra, R. Synthesis of linear antenna array using flower pollination algorithm. Neural Comput. Appl. 2016, 29, 435–445.
  53. Blake, C.; Merz, C.J. {UCI} Repository of Machine Learning Databases; UCI: Aigle, Switzerland, 1998.
  54. Beyer, H.-G.; Schwefel, H.-P. Evolution strategies—A comprehensive introduction. Nat. Comput. 2002, 1, 3–52.
  55. Yao, X.; Liu, Y.; Lin, G. Evolutionary programming made faster. Evol. Comput. IEEE Trans. 1999, 3, 82–102.
  56. Yao, X.; Liu, Y. Fast evolution strategies. In Proceedings of the Evolutionary Programming VI, Indianapolis, IN, USA, 13–16 April 1997; pp. 149–161.
  57. Baluja, S. Population-Based Incremental Learning: A Method for Integrating Genetic Search-Based Function Optimization and Competitive Learning; DTIC Document; Carnegie Mellon University: Pittsburgh, PA, USA, 1994.
  58. Seyedali, M.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67.
  59. Seyedali, M. Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowl. Based Syst. 2015, 89, 228–249.
  60. Zhou, Y.; Niu, Y.; Luo, Q.; Jiang, M. Teaching learning-based whale optimization algorithm for multi-layer perceptron neural network training. Math. Biosci. Eng. 2020, 17, 5987–6025.
  61. Chong, H.Y.; Yap, H.J.; Tan, S.C.; Yap, K.S.; Wong, S.Y. Advances of metaheuristic algorithms in training neural networks for industrial applications. Soft Comput. 2021, 25, 11209–11233.
Figure 1. An MLP with one hidden node.
Figure 2. Flow-code for CFS algorithm.
Figure 3. Convergence curves for unimodal functions.
Figure 4. Convergence curves for multimodal functions.
Figure 5. Convergence curves for fixed dimension problems.
Table 1. Parameter settings for various algorithms.

Algorithm | Parameter | Value
FA | Number of fireflies | 20
FA | Alpha (α) | 0.5
FA | Beta (β) | 0.2
FA | Gamma (γ) | 1
FA | Stopping criteria | 200 iterations
ABC | Swarm size | 20
ABC | Limit | 100
ABC | Stopping criteria | 200 iterations
FPA | Population size | 20
FPA | Probability switch | 0.8
FPA | Stopping criteria | 200 iterations
CS | Population size | 20
CS | Discovery rate of alien egg | 0.25
CS | Maximum number of iterations | 200
CS | Stopping criteria | Max iterations
BFP | Population size | 20
BFP | Probability switch | 0.8
BFP | Alpha (α) | 0.5
BFP | Stopping criteria | 200 iterations
CFS | Population size | 20
CFS | Probability switch | 0.8
CFS | Discovery rate of alien egg (pa) | 0.25
CFS | Stopping criteria | 200 iterations
Table 2. Description of unimodal test functions.

Unimodal Test Problem | Objective Function | Search Range | Optimum Value | D
Schwefel function | f_1(x) = \sum_{i=1}^{D} [-x_i \sin(\sqrt{|x_i|})] | [−500, 500] | −418.9829 × D | 30, 50, 100
Sphere function | f_2(x) = \sum_{i=1}^{D} x_i^2 | [−100, 100] | 0 | 30, 50, 100
Elliptic function | f_3(x) = \sum_{i=1}^{D} (10^6)^{\frac{i-1}{D-1}} x_i^2 | [−100, 100] | 0 | 30, 50, 100
Schaffer function | f_4(x) = \left[\frac{1}{n-1} \sum_{i=1}^{n-1} \sqrt{s_i}\,(\sin(50.0\, s_i^{1/5}) + 1)\right]^2, \; s_i = \sqrt{x_i^2 + x_{i+1}^2} | [−100, 100] | 0 | 30, 50, 100
Table 3. Results comparison for unimodal functions (30 Dimension).

Objective Function | Algorithm | Best | Worst | Mean | Standard Deviation
f1(x) | CFS | −1.16 × 10^4 | −1.03 × 10^4 | −1.08 × 10^4 | 3.47 × 10^2
f1(x) | FA | −4.85 × 10^3 | −2.53 × 10^3 | −3.78 × 10^3 | 6.61 × 10^2
f1(x) | ABC | −9.65 × 10^3 | −7.67 × 10^3 | −8.68 × 10^3 | 4.93 × 10^2
f1(x) | FPA | −6.36 × 10^19 | −4.73 × 10^15 | −3.72 × 10^18 | 1.41 × 10^19
f1(x) | CS | −7.34 × 10^3 | −6.49 × 10^3 | −6.94 × 10^3 | 2.30 × 10^2
f1(x) | BFP | −5.19 × 10^10 | −2.08 × 10^3 | −2.76 × 10^9 | 1.15 × 10^10
f2(x) | CFS | 1.0666 | 2.9397 | 2.0917 | 0.4731
f2(x) | FA | 0.0282 | 0.0818 | 0.0567 | 0.0137
f2(x) | ABC | 1.09 × 10^4 | 2.31 × 10^4 | 1.56 × 10^4 | 3.27 × 10^3
f2(x) | FPA | 9.52 × 10^3 | 2.28 × 10^4 | 1.53 × 10^4 | 3.20 × 10^3
f2(x) | CS | 2.93 × 10^2 | 1.23 × 10^3 | 8.07 × 10^2 | 2.45 × 10^2
f2(x) | BFP | 3.49 × 10^4 | 7.41 × 10^4 | 6.00 × 10^4 | 1.27 × 10^4
f3(x) | CFS | 9.73 × 10^3 | 3.56 × 10^4 | 2.08 × 10^4 | 6.88 × 10^3
f3(x) | FA | 1.95 × 10^6 | 1.66 × 10^7 | 6.96 × 10^6 | 4.06 × 10^6
f3(x) | ABC | 6.75 × 10^6 | 5.16 × 10^8 | 1.04 × 10^8 | 1.17 × 10^8
f3(x) | FPA | 1.60 × 10^8 | 5.09 × 10^8 | 2.81 × 10^8 | 8.27 × 10^7
f3(x) | CS | 9.32 × 10^5 | 5.87 × 10^6 | 2.31 × 10^6 | 1.13 × 10^6
f3(x) | BFP | 9.86 × 10^8 | 4.43 × 10^9 | 2.73 × 10^9 | 7.60 × 10^8
f4(x) | CFS | 0 | 6.43 × 10^−14 | 1.40 × 10^−14 | 1.74 × 10^−14
f4(x) | FA | 3.61 × 10^−10 | 0.0298 | 0.0066 | 0.0091
f4(x) | ABC | 0 | 0 | 0 | 0
f4(x) | FPA | 1.28 × 10^−5 | 0.0029 | 5.12 × 10^−4 | 7.13 × 10^−4
f4(x) | CS | 1.82 × 10^−8 | 5.84 × 10^−5 | 1.05 × 10^−5 | 1.73 × 10^−5
f4(x) | BFP | 4.67 × 10^−2 | 4.75 × 10^−1 | 3.25 × 10^−1 | 1.51 × 10^−1
Bold values in the table correspond to the best algorithmic values.
Table 4. P-test values of simulated algorithms for unimodal functions (30 Dimension).

Objective Function | FA | FPA | CS | ABC | CFS
f1(x) | 6.79 × 10^−8 | NA | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8
f2(x) | NA | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8
f3(x) | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f4(x) | 8.00 × 10^−9 | 8.00 × 10^−9 | 8.00 × 10^−9 | NA | NA
Table 5. Results comparison for unimodal functions (50 Dimension).

Objective Function | Algorithm | Best | Worst | Mean | Standard Deviation
f1(x) | CFS | −1.66 × 10^4 | −1.53 × 10^4 | −1.60 × 10^4 | 4.10 × 10^2
f1(x) | FA | −9.18 × 10^3 | −3.88 × 10^3 | −6.19 × 10^3 | 1.59 × 10^3
f1(x) | ABC | −1.36 × 10^4 | −1.11 × 10^4 | −1.23 × 10^4 | 7.09 × 10^2
f1(x) | FPA | −1.18 × 10^20 | −9.35 × 10^15 | −7.75 × 10^18 | 2.69 × 10^19
f1(x) | CS | −1.08 × 10^4 | −9.62 × 10^3 | −1.00 × 10^4 | 3.52 × 10^2
f1(x) | BFP | −6.32 × 10^11 | −1.22 × 10^3 | −3.31 × 10^10 | 1.41 × 10^11
f2(x) | CFS | 4.5385 | 11.9049 | 9.2753 | 4.5385
f2(x) | FA | 0.1062 | 0.2069 | 0.1578 | 0.0303
f2(x) | ABC | 5.83 × 10^3 | 1.81 × 10^4 | 1.37 × 10^4 | 3.21 × 10^3
f2(x) | FPA | 1.46 × 10^4 | 4.84 × 10^4 | 3.03 × 10^4 | 8.91 × 10^3
f2(x) | CS | 2.09 × 10^3 | 5.50 × 10^3 | 3.83 × 10^3 | 8.82 × 10^2
f2(x) | BFP | 9.06 × 10^4 | 1.43 × 10^5 | 1.18 × 10^5 | 1.63 × 10^4
f3(x) | CFS | 1.14 × 10^4 | 3.03 × 10^4 | 1.95 × 10^4 | 5.71 × 10^3
f3(x) | FA | 2.80 × 10^6 | 1.34 × 10^7 | 6.66 × 10^6 | 2.99 × 10^6
f3(x) | ABC | 2.91 × 10^7 | 1.14 × 10^9 | 5.12 × 10^8 | 3.16 × 10^9
f3(x) | FPA | 1.16 × 10^8 | 4.53 × 10^8 | 2.76 × 10^8 | 1.04 × 10^8
f3(x) | CS | 1.17 × 10^6 | 4.58 × 10^6 | 2.42 × 10^6 | 8.49 × 10^5
f3(x) | BFP | 1.71 × 10^9 | 4.60 × 10^9 | 2.70 × 10^9 | 8.34 × 10^8
f4(x) | CFS | 0 | 7.88 × 10^−14 | 1.63 × 10^−14 | 2.32 × 10^−14
f4(x) | FA | 9.36 × 10^−10 | 0.0336 | 0.0082 | 0.0106
f4(x) | ABC | 0 | 0 | 0 | 0
f4(x) | FPA | 2.38 × 10^−5 | 0.0069 | 0.0012 | 0.0019
f4(x) | CS | 2.37 × 10^−8 | 2.39 × 10^−4 | 3.22 × 10^−5 | 7.18 × 10^−5
f4(x) | BFP | 2.19 × 10^−2 | 4.86 × 10^−1 | 3.33 × 10^−1 | 1.36 × 10^−1
Bold values in the table correspond to the best algorithmic values.
Table 6. P-test values of simulated algorithms for unimodal functions (50 Dimension).

Objective Function | FA | FPA | CS | ABC | CFS
f1(x) | 6.79 × 10^−8 | NA | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8
f2(x) | NA | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8
f3(x) | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f4(x) | 8.00 × 10^−9 | 8.00 × 10^−9 | 8.00 × 10^−9 | NA | NA
Table 7. Results comparison for unimodal functions (100 Dimension).

Objective Function | Algorithm | Best | Worst | Mean | Standard Deviation
f1(x) | CFS | −2.78 × 10^4 | −2.36 × 10^4 | −2.60 × 10^4 | 1.07 × 10^3
f1(x) | FA | −1.54 × 10^4 | −5.88 × 10^3 | −9.40 × 10^3 | 3.05 × 10^3
f1(x) | ABC | −2.28 × 10^4 | −1.73 × 10^4 | −1.98 × 10^4 | 1.45 × 10^3
f1(x) | FPA | −1.63 × 10^19 | −1.24 × 10^16 | −1.50 × 10^18 | 3.85 × 10^18
f1(x) | CS | −1.05 × 10^4 | −9.41 × 10^3 | −1.00 × 10^4 | 2.79 × 10^2
f1(x) | BFP | −5.75 × 10^8 | −4.56 × 10^3 | −5.87 × 10^7 | 1.73 × 10^8
f2(x) | CFS | 32.0745 | 1.01 × 10^2 | 69.1336 | 19.4049
f2(x) | FA | 14.1504 | 1.69 × 10^2 | 55.1173 | 40.4882
f2(x) | ABC | 5.70 × 10^3 | 1.88 × 10^4 | 1.23 × 10^4 | 3.88 × 10^3
f2(x) | FPA | 3.03 × 10^4 | 9.66 × 10^4 | 5.99 × 10^4 | 1.93 × 10^4
f2(x) | CS | 1.38 × 10^4 | 2.45 × 10^4 | 1.69 × 10^4 | 2.59 × 10^3
f2(x) | BFP | 1.61 × 10^5 | 3.16 × 10^5 | 2.53 × 10^5 | 4.57 × 10^4
f3(x) | CFS | 6.22 × 10^3 | 3.48 × 10^4 | 2.11 × 10^4 | 6.55 × 10^3
f3(x) | FA | 1.89 × 10^6 | 1.10 × 10^7 | 5.29 × 10^6 | 2.83 × 10^6
f3(x) | ABC | 2.57 × 10^8 | 1.69 × 10^9 | 1.03 × 10^9 | 3.57 × 10^8
f3(x) | FPA | 1.71 × 10^8 | 4.66 × 10^8 | 3.22 × 10^8 | 8.91 × 10^7
f3(x) | CS | 1.26 × 10^6 | 6.12 × 10^6 | 2.69 × 10^6 | 1.06 × 10^6
f3(x) | BFP | 1.92 × 10^9 | 4.22 × 10^9 | 2.87 × 10^8 | 2.87 × 10^9
f4(x) | CFS | 2.22 × 10^−16 | 2.83 × 10^−13 | 2.49 × 10^−14 | 6.35 × 10^−14
f4(x) | FA | 1.36 × 10^−11 | 0.0667 | 0.0121 | 0.0164
f4(x) | ABC | 0 | 0 | 0 | 0
f4(x) | FPA | 1.34 × 10^−6 | 0.0028 | 4.55 × 10^−4 | 7.03 × 10^−4
f4(x) | CS | 1.80 × 10^−8 | 6.88 × 10^−5 | 1.43 × 10^−5 | 2.11 × 10^−5
f4(x) | BFP | 3.10 × 10^−2 | 4.92 × 10^−1 | 3.30 × 10^−1 | 1.42 × 10^−1
Bold values in the table correspond to the best algorithmic values.
Table 8. P-test values of simulated algorithms for unimodal functions (100 Dimension).

Objective Function | FA | FPA | CS | ABC | CFS
f1(x) | 6.79 × 10^−8 | NA | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8
f2(x) | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f3(x) | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f4(x) | 8.00 × 10^−9 | 8.00 × 10^−9 | 8.00 × 10^−9 | NA | 7.97 × 10^−9
Table 9. Description of multimodal test problems.
Multimodal Test Problems | Objective Function | Search Range | Optimum Value | D
Rastrigin function | $f_5(x) = 10D + \sum_{i=1}^{D}\left[x_i^2 - 10\cos(2\pi x_i)\right]$ | [−5.12, 5.12] | 0 | 30, 50, 100
Weierstrass function | $f_6(x) = \sum_{i=1}^{D}\sum_{k=0}^{k_{\max}}\left[a^k \cos\left(2\pi b^k (x_i + 0.5)\right)\right] - D\sum_{k=0}^{k_{\max}}\left[a^k \cos\left(2\pi b^k \cdot 0.5\right)\right]$, where a = 0.5, b = 3, k_max = 20 | [−0.5, 0.5] | 0 | 30, 50, 100
Griewank function | $f_7(x) = \frac{1}{4000}\sum_{i=1}^{D} x_i^2 - \prod_{i=1}^{D}\cos\left(\frac{x_i}{\sqrt{i}}\right) + 1$ | [−600, 600] | 0 | 30, 50, 100
Penalized 1 function | $f_8(x) = \frac{\pi}{D}\left\{10\sin^2(\pi y_1) + \sum_{i=1}^{D-1}(y_i - 1)^2\left[1 + 10\sin^2(\pi y_{i+1})\right] + (y_D - 1)^2\right\} + \sum_{i=1}^{D} u(x_i, 10, 100, 4)$, where $y_i = 1 + \frac{x_i + 1}{4}$ and $u(x_i, a, k, m) = \begin{cases} k(x_i - a)^m, & x_i > a \\ 0, & -a \le x_i \le a \\ k(-x_i - a)^m, & x_i < -a \end{cases}$ | [−50, 50] | 0 | 30, 50, 100
Penalized 2 function | $f_9(x) = 0.1\left\{\sin^2(3\pi x_1) + \sum_{i=1}^{D-1}(x_i - 1)^2\left[1 + \sin^2(3\pi x_{i+1})\right] + (x_D - 1)^2\left[1 + \sin^2(2\pi x_D)\right]\right\} + \sum_{i=1}^{D} u(x_i, 5, 100, 4)$, with $u$ as defined for $f_8$ | [−50, 50] | 0 | 30, 50, 100
Ackley function | $f_{10}(x) = -20\exp\left(-0.2\sqrt{\frac{1}{D}\sum_{i=1}^{D} x_i^2}\right) - \exp\left(\frac{1}{D}\sum_{i=1}^{D}\cos(2\pi x_i)\right) + 20 + e$ | [−100, 100] | 0 | 30, 50, 100
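For reference, the multimodal benchmarks in Table 9 translate directly into code. Below is a minimal sketch (not the authors' implementation) of three of them in Python/NumPy, assuming the decision variables are supplied as a D-dimensional vector; evaluating at the all-zeros optimum returns values of approximately zero.

```python
import numpy as np

def rastrigin(x):
    # f5: 10*D + sum(x_i^2 - 10*cos(2*pi*x_i)); global minimum 0 at x = 0
    d = x.size
    return 10 * d + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

def griewank(x):
    # f7: sum(x_i^2)/4000 - prod(cos(x_i/sqrt(i))) + 1; global minimum 0 at x = 0
    i = np.arange(1, x.size + 1)
    return np.sum(x**2) / 4000 - np.prod(np.cos(x / np.sqrt(i))) + 1

def ackley(x):
    # f10: -20*exp(-0.2*sqrt(mean(x^2))) - exp(mean(cos(2*pi*x))) + 20 + e
    d = x.size
    return (-20 * np.exp(-0.2 * np.sqrt(np.sum(x**2) / d))
            - np.exp(np.sum(np.cos(2 * np.pi * x)) / d) + 20 + np.e)

# Example: evaluate at the known optimum (all zeros) for D = 30
x0 = np.zeros(30)
print(rastrigin(x0), griewank(x0), ackley(x0))  # all approximately 0
```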
Table 10. Results comparison for multimodal functions (30 Dimension).
Objective Function | Algorithm | Best | Worst | Mean | Standard Deviation
f5(x) | CFS | 7.53 × 10^−13 | 3.59 × 10^−9 | 7.46 × 10^−10 | 9.59 × 10^−10
f5(x) | FA | 2.07 × 10^−9 | 0.339 | 0.0226 | 0.0765
f5(x) | ABC | 7.24 × 10^2 | 4.17 × 10^3 | 2.06 × 10^3 | 9.69 × 10^2
f5(x) | FPA | 0.0017 | 0.2469 | 0.0595 | 0.063
f5(x) | CS | 8.00 × 10^2 | 1.86 × 10^3 | 1.16 × 10^3 | 3.00 × 10^2
f5(x) | BFP | 4.00 × 10^−3 | 1.70 × 10^1 | 8.13 × 10^0 | 3.16 × 10^0
f6(x) | CFS | 1.9004 | 2.6108 | 2.3049 | 0.2186
f6(x) | FA | 13.2792 | 21.3048 | 16.8613 | 1.8416
f6(x) | ABC | 11.1346 | 19.6264 | 15.5206 | 2.4429
f6(x) | FPA | 35.7517 | 39.4474 | 37.5697 | 1.2557
f6(x) | CS | 16.6273 | 23.8924 | 19.9146 | 1.9027
f6(x) | BFP | 42.4496 | 50.6708 | 46.8925 | 2.1102
f7(x) | CFS | 1.31 × 10^−13 | 8.61 × 10^−11 | 2.18 × 10^−11 | 2.44 × 10^−11
f7(x) | FA | 2.25 × 10^−7 | 1.48 × 10^−5 | 4.01 × 10^−6 | 3.54 × 10^−6
f7(x) | ABC | 3.5295 | 72.9612 | 8.0289 | 16.7278
f7(x) | FPA | 1.50 × 10^−4 | 0.0874 | 0.0161 | 0.0231
f7(x) | CS | 4.5803 | 18.3641 | 8.8762 | 3.1545
f7(x) | BFP | 7.5279 | 1.65 × 10^2 | 7.69 × 10^1 | 4.51 × 10^1
f8(x) | CFS | 0.187 | 4.4737 | 0.5345 | 0.9324
f8(x) | FA | 0.0013 | 0.133 | 0.0167 | 0.0287
f8(x) | ABC | 8.04 × 10^6 | 1.68 × 10^8 | 6.93 × 10^7 | 4.39 × 10^7
f8(x) | FPA | 5.42 × 10^5 | 3.37 × 10^7 | 8.71 × 10^6 | 8.63 × 10^6
f8(x) | CS | 9.846 | 81.0669 | 24.5174 | 15.5959
f8(x) | BFP | 3.63 × 10^8 | 9.02 × 10^8 | 5.82 × 10^8 | 1.64 × 10^8
f9(x) | CFS | 0.0086 | 0.0873 | 0.0286 | 0.0015
f9(x) | FA | 0.0059 | 0.0619 | 0.0119 | 0.0031
f9(x) | ABC | 2.41 × 10^7 | 3.19 × 10^8 | 1.49 × 10^8 | 8.95 × 10^7
f9(x) | FPA | 1.38 × 10^7 | 1.09 × 10^8 | 4.99 × 10^7 | 2.45 × 10^7
f9(x) | CS | 51.2308 | 1.83 × 10^5 | 3.13 × 10^4 | 5.18 × 10^4
f9(x) | BFP | 4.59 × 10^8 | 1.65 × 10^9 | 1.13 × 10^9 | 3.41 × 10^8
f10(x) | CFS | 0.5159 | 0.879 | 0.6915 | 0.0111
f10(x) | FA | 0.1462 | 0.469 | 0.261 | 0.0745
f10(x) | ABC | 6.0052 | 14.9839 | 11.0513 | 2.6699
f10(x) | FPA | 14.0678 | 19.0195 | 17.4088 | 1.0373
f10(x) | CS | 10.0393 | 17.1319 | 13.0073 | 1.6329
f10(x) | BFP | 19.8877 | 20.849 | 20.5093 | 0.2645
Bold values in the table correspond to the best algorithmic values.
Table 11. P-test values of various algorithms for multimodal functions (30 Dimension).
Objective Function | FA | FPA | CS | ABC | BFP | CFS
f5(x) | 1.23 × 10^−7 | 1.23 × 10^−7 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f6(x) | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f7(x) | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f8(x) | NA | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8
f9(x) | NA | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 7.57 × 10^−4
f10(x) | NA | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
Table 12. Results comparison for multimodal functions (50 Dimension).
Objective Function | Algorithm | Best | Worst | Mean | Standard Deviation
f5(x) | CFS | 7.88 × 10^−10 | 3.30 × 10^−9 | 7.39 × 10^−10 | 9.49 × 10^−10
f5(x) | FA | 1.85 × 10^−9 | 0.1989 | 0.0109 | 0.0444
f5(x) | ABC | 7.08 × 10^3 | 2.62 × 10^4 | 1.74 × 10^4 | 6.25 × 10^3
f5(x) | FPA | 0.0083 | 0.323 | 0.1085 | 0.1053
f5(x) | CS | 3.08 × 10^3 | 5.85 × 10^3 | 4.34 × 10^3 | 8.13 × 10^2
f5(x) | BFP | 1.51 × 10^0 | 1.98 × 10^1 | 1.09 × 10^1 | 5.08 × 10^0
f6(x) | CFS | 4.4288 | 6.3445 | 5.6194 | 0.5645
f6(x) | FA | 28.8712 | 39.2514 | 33.3655 | 3.1566
f6(x) | ABC | 32.506 | 46.6375 | 38.085 | 3.2999
f6(x) | FPA | 66.9832 | 74.2188 | 70.8866 | 2.0856
f6(x) | CS | 33.8792 | 46.2211 | 39.4868 | 3.1594
f6(x) | BFP | 68.1743 | 89.0548 | 80.9581 | 5.9281
f7(x) | CFS | 9.55 × 10^−14 | 3.91 × 10^−10 | 4.20 × 10^−11 | 9.14 × 10^−11
f7(x) | FA | 1.31 × 10^−9 | 0.0038 | 1.91 × 10^−4 | 8.44 × 10^−4
f7(x) | ABC | 45.2242 | 3.06 × 10^2 | 1.84 × 10^2 | 67.741
f7(x) | FPA | 0.0017 | 0.0634 | 0.0144 | 0.0174
f7(x) | CS | 21.4091 | 67.0806 | 38.1323 | 9.8239
f7(x) | BFP | 1.5237 | 1.77 × 10^2 | 8.07 × 10^1 | 6.33 × 10^1
f8(x) | CFS | 1.4169 | 11.1903 | 4.2718 | 2.5936
f8(x) | FA | 0.0061 | 2.0945 | 0.4204 | 0.5428
f8(x) | ABC | 8.05 × 10^6 | 1.07 × 10^8 | 5.58 × 10^7 | 2.92 × 10^7
f8(x) | FPA | 4.61 × 10^5 | 9.02 × 10^7 | 2.58 × 10^7 | 2.07 × 10^7
f8(x) | CS | 65.7835 | 8.56 × 10^5 | 6.28 × 10^4 | 1.90 × 10^5
f8(x) | BFP | 2.73 × 10^8 | 1.33 × 10^9 | 9.55 × 10^8 | 3.23 × 10^8
f9(x) | CFS | 0.0809 | 1.446 | 0.4249 | 0.0559
f9(x) | FA | 0.0075 | 0.3539 | 0.0444 | 0.0817
f9(x) | ABC | 8.40 × 10^7 | 2.33 × 10^8 | 1.52 × 10^8 | 4.62 × 10^7
f9(x) | FPA | 3.80 × 10^7 | 3.59 × 10^8 | 1.24 × 10^8 | 7.85 × 10^7
f9(x) | CS | 1.47 × 10^5 | 1.09 × 10^6 | 5.76 × 10^5 | 2.86 × 10^5
f9(x) | BFP | 4.59 × 10^8 | 1.65 × 10^9 | 1.13 × 10^9 | 3.41 × 10^9
f10(x) | CFS | 2.3662 | 18.0276 | 9.7138 | 4.4605
f10(x) | FA | 0.3074 | 0.8458 | 0.3074 | 0.1439
f10(x) | ABC | 10.5978 | 18.3258 | 15.8443 | 1.9489
f10(x) | FPA | 16.287 | 18.9147 | 17.6333 | 0.7004
f10(x) | CS | 10.3972 | 17.8387 | 13.9978 | 1.9681
f10(x) | BFP | 19.5794 | 20.8868 | 20.6361 | 0.3188
Bold values in the table correspond to the best algorithmic values.
Table 13. P-test values of various algorithms for multimodal functions (50 Dimension).
Objective Function | FA | FPA | CS | ABC | BFP | CFS
f5(x) | 1.06 × 10^−7 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f6(x) | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f7(x) | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f8(x) | NA | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8
f9(x) | NA | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 9.12 × 10^−7
f10(x) | NA | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8
Table 14. Results comparison for multimodal functions (100 Dimension).
Objective Function | Algorithm | Best | Worst | Mean | Standard Deviation
f5(x) | CFS | 3.21 × 10^−10 | 2.21 × 10^−9 | 7.12 × 10^−10 | 7.73 × 10^−10
f5(x) | FA | 1.49 × 10^−8 | 1.91 × 10^−6 | 3.98 × 10^−7 | 5.25 × 10^−7
f5(x) | ABC | 8.09 × 10^4 | 1.34 × 10^5 | 1.05 × 10^5 | 1.44 × 10^4
f5(x) | FPA | 0.0033 | 0.2715 | 0.0813 | 0.0755
f5(x) | CS | 1.22 × 10^4 | 2.03 × 10^4 | 1.65 × 10^4 | 2.37 × 10^3
f5(x) | BFP | 2.10 × 10^−3 | 1.49 × 10^1 | 8.27 × 10^0 | 4.11 × 10^0
f6(x) | CFS | 16.4301 | 22.9819 | 18.3616 | 1.4303
f6(x) | FA | 67.1686 | 81.6839 | 74.2381 | 4.1565
f6(x) | ABC | 1.03 × 10^2 | 1.25 × 10^2 | 1.16 × 10^2 | 6.363
f6(x) | FPA | 1.23 × 10^2 | 1.62 × 10^2 | 1.53 × 10^2 | 9.3169
f6(x) | CS | 81.7266 | 96.5011 | 89.2076 | 4.8649
f6(x) | BFP | 1.47 × 10^2 | 1.86 × 10^2 | 1.72 × 10^2 | 1.03 × 10^1
f7(x) | CFS | 9.20 × 10^−14 | 3.20 × 10^−10 | 4.46 × 10^−11 | 7.82 × 10^−11
f7(x) | FA | 1.11 × 10^−6 | 6.13 × 10^−6 | 1.83 × 10^−6 | 1.65 × 10^−6
f7(x) | ABC | 6.68 × 10^2 | 1.08 × 10^3 | 8.83 × 10^2 | 1.15 × 10^2
f7(x) | FPA | 0.003 | 0.071 | 0.022 | 0.0212
f7(x) | CS | 1.06 × 10^2 | 1.91 × 10^2 | 1.41 × 10^2 | 22.8388
f7(x) | BFP | 1.0051 | 1.60 × 10^2 | 5.98 × 10^1 | 4.05 × 10^1
f8(x) | CFS | 18.9008 | 1.86 × 10^2 | 50.7091 | 35.0491
f8(x) | FA | 9.0521 | 48.0274 | 28.2233 | 10.0771
f8(x) | ABC | 3.42 × 10^6 | 7.72 × 10^7 | 3.44 × 10^7 | 1.94 × 10^7
f8(x) | FPA | 3.35 × 10^7 | 1.78 × 10^7 | 8.73 × 10^7 | 4.64 × 10^7
f8(x) | CS | 2.67 × 10^4 | 4.32 × 10^6 | 7.91 × 10^5 | 9.75 × 10^5
f8(x) | BFP | 7.01 × 10^8 | 3.61 × 10^9 | 2.49 × 10^9 | 8.69 × 10^8
f9(x) | CFS | 1.7343 | 5.7693 | 3.5451 | 1.1998
f9(x) | FA | 2.3704 | 9.5091 | 4.5693 | 1.5113
f9(x) | ABC | 3.89 × 10^7 | 2.01 × 10^8 | 1.07 × 10^8 | 4.86 × 10^8
f9(x) | FPA | 5.12 × 10^7 | 7.27 × 10^8 | 3.17 × 10^8 | 2.03 × 10^8
f9(x) | CS | 2.86 × 10^6 | 2.04 × 10^7 | 7.63 × 10^6 | 5.15 × 10^6
f9(x) | BFP | 1.64 × 10^9 | 6.02 × 10^9 | 4.75 × 10^9 | 1.40 × 10^9
f10(x) | CFS | 4.5948 | 19.4525 | 12.4022 | 3.9196
f10(x) | FA | 1.3412 | 3.5494 | 2.7283 | 0.5434
f10(x) | ABC | 18.2683 | 19.7601 | 19.1378 | 0.3621
f10(x) | FPA | 16.6528 | 19.686 | 18.2081 | 0.8273
f10(x) | CS | 13.5684 | 17.8497 | 15.6218 | 1.4927
f10(x) | BFP | 20.0389 | 21.0637 | 20.7694 | 0.2923
Bold values in the table correspond to the best algorithmic values.
Table 15. P-test values of various algorithms for multimodal functions (100 Dimension).
Objective Function | FA | FPA | CS | ABC | BFP | CFS
f5(x) | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f6(x) | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f7(x) | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f8(x) | NA | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8
f9(x) | 0.0439 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f10(x) | NA | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8
Table 16. Description of fixed dimension test functions.
Fixed Dimension Test Problems | Objective Function | Search Range | Optimum Value | D
Branin RCOS function | $f_{11}(x) = \left(x_2 - \frac{5.1}{4\pi^2}x_1^2 + \frac{5}{\pi}x_1 - 6\right)^2 + 10\left(1 - \frac{1}{8\pi}\right)\cos(x_1) + 10$ | x1 ∈ [−5, 10], x2 ∈ [0, 15] | 0.397887 | 2
Six Hump Camel function | $f_{12}(x) = \left(4 - 2.1x_1^2 + \frac{x_1^4}{3}\right)x_1^2 + x_1 x_2 + \left(-4 + 4x_2^2\right)x_2^2$ | [−5, 5] | −1.0316 | 2
Goldstein & Price function | $f_{13}(x) = \left[1 + (x_1 + x_2 + 1)^2\left(19 - 14x_1 + 3x_1^2 - 14x_2 + 6x_1x_2 + 3x_2^2\right)\right]\left[30 + (2x_1 - 3x_2)^2\left(18 - 32x_1 + 12x_1^2 + 48x_2 - 36x_1x_2 + 27x_2^2\right)\right]$ | [−2, 2] | 3 | 2
Hartmann function 3 | $f_{14}(x) = -\sum_{i=1}^{4}\alpha_i \exp\left[-\sum_{j=1}^{3} A_{ij}(x_j - P_{ij})^2\right]$ | [0, 1] | −3.86278 | 3
Hartmann function 6 | $f_{15}(x) = -\sum_{i=1}^{4}\alpha_i \exp\left[-\sum_{j=1}^{6} A_{ij}(x_j - P_{ij})^2\right]$ | [0, 1] | −3.32237 | 6
Shekel 5 | $f_{16}(x) = -\sum_{j=1}^{5}\left[\sum_{i=1}^{4}(x_i - C_{ij})^2 + \beta_j\right]^{-1}$ | [0, 10] | −10.1532 | 4
Shekel 7 | $f_{17}(x) = -\sum_{j=1}^{7}\left[\sum_{i=1}^{4}(x_i - C_{ij})^2 + \beta_j\right]^{-1}$ | [0, 10] | −10.4029 | 4
Shekel 10 | $f_{18}(x) = -\sum_{j=1}^{10}\left[\sum_{i=1}^{4}(x_i - C_{ij})^2 + \beta_j\right]^{-1}$ | [0, 10] | −10.5364 | 4
Easom function | $f_{19}(x) = -\cos(x_1)\cos(x_2)\exp\left(-(x_1 - \pi)^2 - (x_2 - \pi)^2\right)$ | [−10, 10] | −1 | 2
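The two-dimensional problems in Table 16 can be coded in the same way. The snippet below is an illustrative sketch (not the authors' code) of the Branin RCOS and Six Hump Camel functions, checked at their well-known optima.

```python
import numpy as np

def branin(x):
    # f11: global minimum approx. 0.397887, attained e.g. at (pi, 2.275)
    x1, x2 = x
    return ((x2 - 5.1 / (4 * np.pi**2) * x1**2 + 5 / np.pi * x1 - 6)**2
            + 10 * (1 - 1 / (8 * np.pi)) * np.cos(x1) + 10)

def six_hump_camel(x):
    # f12: global minimum approx. -1.0316 at (0.0898, -0.7126) and (-0.0898, 0.7126)
    x1, x2 = x
    return (4 - 2.1 * x1**2 + x1**4 / 3) * x1**2 + x1 * x2 + (-4 + 4 * x2**2) * x2**2

print(branin(np.array([np.pi, 2.275])))             # ~0.3979
print(six_hump_camel(np.array([0.0898, -0.7126])))  # ~-1.0316
```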
Table 17. Results comparison for fixed dimension functions.
Objective Function | Algorithm | Best | Worst | Mean | Standard Deviation
f11(x) | CFS | 0.3979 | 0.3979 | 0.3979 | 2.19 × 10^−11
f11(x) | FA | 0.3979 | 0.3979 | 0.3979 | 1.30 × 10^−8
f11(x) | ABC | 0 | 0 | 0 | 0
f11(x) | FPA | 0.3979 | 0.3983 | 0.398 | 9.64 × 10^−5
f11(x) | CS | 0.3979 | 0.3979 | 0.3979 | 5.32 × 10^−8
f11(x) | BFP | 0.4416 | 5.3576 | 3.0721 | 1.63 × 10^0
f12(x) | CFS | −1.0316 | −1.0316 | −1.0316 | 1.66 × 10^−10
f12(x) | FA | −1.0316 | −1.0315 | −1.0316 | 3.47 × 10^−5
f12(x) | ABC | −1.0316 | −1.0250 | −1.0310 | 0.0015
f12(x) | FPA | −1.0316 | −1.0316 | −1.0316 | 1.24 × 10^−5
f12(x) | CS | −1.0316 | −1.0316 | −1.0316 | 8.22 × 10^−11
f12(x) | BFP | −0.9884 | 4.4587 | 0.1862 | 1.50 × 10^0
f13(x) | CFS | 3 | 3 | 3 | 1.54 × 10^−12
f13(x) | FA | 3 | 3 | 3 | 1.51 × 10^−7
f13(x) | ABC | 3.0004 | 3.0531 | 3.0107 | 0.0148
f13(x) | FPA | 3 | 3.0015 | 3.0004 | 4.73 × 10^−4
f13(x) | CS | 3 | 3 | 3 | 9.86 × 10^−9
f13(x) | BFP | 3.3525 | 98.258 | 47.458 | 3.46 × 10^1
f14(x) | CFS | −3.8628 | −3.8628 | −3.8628 | 6.56 × 10^−12
f14(x) | FA | −3.8628 | −2.1968 | −3.3064 | 0.6077
f14(x) | ABC | −3.8628 | −3.8621 | −3.8626 | 2.17 × 10^−4
f14(x) | FPA | −3.8325 | −1.5171 | −3.3253 | 0.6709
f14(x) | CS | −3.8628 | −3.8628 | −3.8628 | 1.13 × 10^−8
f14(x) | BFP | −0.5359 | −3.25 × 10^−6 | −0.0814 | 1.65 × 10^−1
f15(x) | CFS | −3.3224 | −3.3224 | −3.3224 | 3.74 × 10^−7
f15(x) | FA | −3.3224 | −3.0639 | −3.2469 | 9.32 × 10^−2
f15(x) | ABC | −3.3223 | −3.1954 | −3.2461 | 0.059
f15(x) | FPA | −3.2275 | −2.9663 | −3.1345 | 0.0702
f15(x) | CS | −3.3223 | −3.3140 | −3.3201 | 0.0028
f15(x) | BFP | −2.6298 | −0.7595 | −1.6330 | 0.5693
f16(x) | CFS | −10.1532 | −10.1532 | −10.1532 | 5.74 × 10^−5
f16(x) | FA | −5.0552 | −5.0552 | −5.0552 | 1.03 × 10^−8
f16(x) | ABC | −10.1486 | −2.6075 | −5.5322 | 3.4454
f16(x) | FPA | −5.0546 | −5.0419 | −5.0513 | 0.0033
f16(x) | CS | −10.0826 | −9.3309 | −10.0826 | 0.1799
f16(x) | BFP | −3.9584 | −1.2893 | −2.3915 | 0.8119
f17(x) | CFS | −10.4029 | −10.4029 | −10.4029 | 1.33 × 10^−4
f17(x) | FA | −5.0877 | −5.0877 | −5.0877 | 9.13 × 10^−9
f17(x) | ABC | −10.5359 | −2.4206 | −5.3332 | 3.1817
f17(x) | FPA | −5.0864 | −5.0771 | −5.0837 | 0.0025
f17(x) | CS | −10.5358 | −7.3868 | −10.3006 | 0.6974
f17(x) | BFP | −4.4980 | −1.6336 | −2.6498 | 0.8739
f18(x) | CFS | −10.5364 | −10.5364 | −10.5364 | 1.87 × 10^−6
f18(x) | FA | −5.1285 | −5.1285 | −5.1285 | 9.16 × 10^−9
f18(x) | ABC | −10.4895 | −1.8556 | −4.6289 | 3.0032
f18(x) | FPA | −5.1279 | −5.1185 | −5.1244 | 0.0028
f18(x) | CS | −10.5357 | −9.8686 | −10.4320 | 0.1724
f18(x) | BFP | −4.3369 | −1.6523 | −2.5939 | 0.7823
f19(x) | CFS | −1 | −1.0000 | −1.0000 | 6.07 × 10^−14
f19(x) | FA | −1.0000 | −1.0000 | −1.0000 | 1.26 × 10^−8
f19(x) | ABC | −1.0000 | −0.9886 | −0.9977 | 0.0029
f19(x) | FPA | −1.0000 | −0.9998 | −0.9999 | 6.79 × 10^−5
f19(x) | CS | −1.0000 | −1.0000 | −1.0000 | 5.30 × 10^−10
f19(x) | BFP | −0.5894 | −2.18 × 10^−13 | −0.0574 | 1.51 × 10^−1
Bold values in the table correspond to the best algorithmic values.
Table 18. P-test values of various algorithms for fixed dimension functions.
Objective Function | FA | FPA | CS | ABC | BFP | CFS
f11(x) | 7.89 × 10^−8 | 7.89 × 10^−8 | 9.17 × 10^−8 | 8.00 × 10^−9 | 6.79 × 10^−8 | NA
f12(x) | 6.79 × 10^−8 | 6.79 × 10^−8 | 0.0679 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f13(x) | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f14(x) | 6.79 × 10^−8 | 6.79 × 10^−8 | 7.89 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f15(x) | 0.1895 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f16(x) | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f17(x) | 6.79 × 10^−8 | 6.79 × 10^−8 | 0.0012 | 1.60 × 10^−4 | 6.79 × 10^−8 | NA
f18(x) | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
f19(x) | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | 6.79 × 10^−8 | NA
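The recurring value 6.79 × 10^−8 in Tables 6–18 is consistent with a two-sided rank-sum (Mann–Whitney) comparison, using the normal approximation with continuity correction, between two completely separated samples of 20 independent runs each; NA appears to mark the algorithm used as the reference for each comparison. The snippet below is a minimal SciPy sketch of such a test with placeholder run data (the actual per-run results are not reproduced here), not a reproduction of the authors' statistical procedure.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical per-run best fitness values for two algorithms (20 runs each);
# these are placeholders, not the values behind the tables.
rng = np.random.default_rng(0)
cfs_runs = rng.uniform(1e-13, 1e-9, size=20)  # placeholder: consistently small errors
fpa_runs = rng.uniform(1e-3, 1e-1, size=20)   # placeholder: consistently larger errors

# Two-sided rank-based comparison using the asymptotic (normal) approximation;
# with 20 vs. 20 fully separated samples this prints approximately 6.8e-8.
stat, p_value = mannwhitneyu(cfs_runs, fpa_runs,
                             alternative="two-sided", method="asymptotic")
print(p_value)
```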
Table 19. Parameters for algorithms.
Algorithm | Parameter | Value
CFS | Population size | 50 for XOR and Balloon; 20 for the rest
CFS | Probability switch | 0.8
CFS | Discovery rate of alien egg (pa) | 0.25
CFS | Maximum number of iterations | 250
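As a compact illustration, the settings in Table 19 could be collected in a configuration object such as the hypothetical sketch below; the field names are illustrative only and do not reflect the authors' implementation.

```python
# Hypothetical CFS configuration mirroring Table 19.
cfs_params = {
    "population_size": {"xor": 50, "balloon": 50, "default": 20},
    "probability_switch": 0.8,   # FPA-style switch between global and local pollination
    "discovery_rate_pa": 0.25,   # CS-style probability of abandoning an alien egg
    "max_iterations": 250,
}
```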
Table 20. Classification datasets.
Classification Datasets | Attributes Count | Training Samples Count | Test Samples Count | Number of Classes
3-bit XOR | 3 | 8 | 8 as training samples | 2
Balloon | 4 | 16 | 16 as training samples | 2
Iris | 4 | 150 | 150 as training samples | 3
Breast Cancer | 9 | 599 | 100 | 2
Heart | 22 | 80 | 187 | 2
Table 21. MLP structure for each dataset.
Classification Datasets | Attributes Count | MLP Structure
3-bit XOR | 3 | 3−7−1
Balloon | 4 | 4−9−1
Iris | 4 | 4−9−3
Breast Cancer | 9 | 9−19−1
Heart | 22 | 22−45−1
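The structures in Table 21 determine the dimensionality of the search space each trainer must explore. Assuming a single hidden layer with a bias term for every neuron (a plausible reading of the a−b−c notation; the authors' exact weight encoding may differ), the number of trainable parameters can be computed as in the sketch below.

```python
def mlp_dimension(n_in, n_hidden, n_out):
    # Weights plus biases for a single-hidden-layer MLP:
    # (n_in * n_hidden + n_hidden) hidden-layer parameters
    # + (n_hidden * n_out + n_out) output-layer parameters.
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

structures = {"3-bit XOR": (3, 7, 1), "Balloon": (4, 9, 1), "Iris": (4, 9, 3),
              "Breast Cancer": (9, 19, 1), "Heart": (22, 45, 1)}
for name, (i, h, o) in structures.items():
    print(name, mlp_dimension(i, h, o))
# e.g. 3-7-1 -> 36 parameters, 22-45-1 -> 1081 parameters
```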
Table 22. Comparison results of CFS-MLP for the XOR dataset.
Algorithm | Average | Standard Deviation
CFS-MLP | 9.687 × 10^−12 | 2.520 × 10^−11
GWO-MLP | 9.410 × 10^−3 | 2.950 × 10^−1
PSO-MLP | 8.405 × 10^−2 | 3.594 × 10^−2
GA-MLP | 1.810 × 10^−4 | 4.130 × 10^−4
ACO-MLP | 1.803 × 10^−1 | 2.526 × 10^−2
ES-MLP | 1.187 × 10^−1 | 1.157 × 10^−2
PBIL-MLP | 3.022 × 10^−2 | 3.966 × 10^−2
WOA-MLP | 8.420 × 10^−2 | 5.140 × 10^−2
MFO-MLP | 5.298 × 10^−6 | 1.038 × 10^−5
Bold values in the table correspond to the best algorithmic values.
Table 23. Comparison results of CFS-MLP for the Balloon dataset.
Algorithm | Average | Standard Deviation
CFS-MLP | 1.19 × 10^−41 | 1.90 × 10^−41
GWO-MLP | 9.38 × 10^−15 | 2.81 × 10^−14
PSO-MLP | 0.000585 | 0.000749
GA-MLP | 5.08 × 10^−24 | 1.06 × 10^−23
ACO-MLP | 0.004854 | 0.00776
ES-MLP | 0.019055 | 0.17026
PBIL-MLP | 2.49 × 10^−5 | 5.27 × 10^−5
WOA-MLP | 4.88 × 10^−6 | 1.41 × 10^−5
MFO-MLP | 1.85 × 10^−15 | 6.18 × 10^−15
Bold values in the table correspond to the best algorithmic values.
Table 24. Comparison results of CFS-MLP for the Iris dataset.
Algorithm | Average | Standard Deviation
CFS-MLP | 0.06673 | 5.31 × 10^−4
GWO-MLP | 0.0229 | 0.0032
PSO-MLP | 0.22868 | 0.057235
GA-MLP | 0.089912 | 0.123638
ACO-MLP | 0.405979 | 0.053775
ES-MLP | 0.31434 | 0.052142
PBIL-MLP | 0.116067 | 0.036355
WOA-MLP | 0.734134 | 0.051808
MFO-MLP | 0.667957 | 0.003467
Bold values in the table correspond to the best algorithmic values.
Table 25. Comparison results of CFS-MLP for the Breast Cancer dataset.
Algorithm | Average | Standard Deviation
CFS-MLP | 0.0018 | 2.83 × 10^−4
GWO-MLP | 0.0012 | 7.44 × 10^−5
PSO-MLP | 0.034881 | 0.002472
GA-MLP | 0.003026 | 0.0015
ACO-MLP | 0.01351 | 0.002137
ES-MLP | 0.04032 | 0.00247
PBIL-MLP | 0.032009 | 0.003065
WOA-MLP | 0.006243 | 0.003128
MFO-MLP | 0.004038 | 0.003041
Bold values in the table correspond to the best algorithmic values.
Table 26. Comparison results of CFS-MLP for the Heart dataset.
Algorithm | Average | Standard Deviation
CFS-MLP | 0.0686 | 0.0067
GWO-MLP | 0.1226 | 0.0077
PSO-MLP | 0.188568 | 0.008939
GA-MLP | 0.093047 | 0.02246
ACO-MLP | 0.22843 | 0.004979
ES-MLP | 0.192473 | 0.015174
PBIL-MLP | 0.154096 | 0.018204
WOA-MLP | 0.179664 | 0.052152
MFO-MLP | 0.08321 | 0.02062
Bold values in the table correspond to the best algorithmic values.
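The averages in Tables 22–26 appear to be training errors averaged over runs, so lower values indicate better-trained networks. The sketch below shows one plausible way a candidate weight vector produced by CFS could be scored as a mean squared error (MSE) on a dataset, assuming a sigmoid single-hidden-layer MLP; it is illustrative only, and the fitness definition used in the paper may differ in detail.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse_fitness(weights, X, y, n_hidden):
    # Decode a flat weight vector into hidden/output layer parameters
    # (single hidden layer with biases, sigmoid activations), then return
    # the mean squared error over the dataset -- the quantity the trainer minimises.
    n_in = X.shape[1]
    w1 = weights[: n_in * n_hidden].reshape(n_in, n_hidden)
    b1 = weights[n_in * n_hidden : n_in * n_hidden + n_hidden]
    rest = weights[n_in * n_hidden + n_hidden :]
    w2 = rest[:n_hidden].reshape(n_hidden, 1)
    b2 = rest[n_hidden:]
    hidden = sigmoid(X @ w1 + b1)
    output = sigmoid(hidden @ w2 + b2).ravel()
    return np.mean((output - y) ** 2)

# Example: score a random candidate on the 3-bit XOR problem (3-7-1 structure, 36 weights).
X = np.array([[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)], dtype=float)
y = X.sum(axis=1) % 2
candidate = np.random.default_rng(1).uniform(-1, 1, size=36)
print(mse_fitness(candidate, X, y, n_hidden=7))
```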
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
