An Adaptive Cuckoo Search-Based Optimization Model for Addressing Cyber-Physical Security Problems

Abstract: One of the key challenges in cyber-physical systems (CPS) is the dynamic fitting of data sources under multivariate or mixture distribution models to determine abnormalities. The equations of these models have been statistically characterized as nonlinear and non-Gaussian, where data have high variations between normal and suspicious data distributions. To address the nonlinear equations of these distributions, a cuckoo search algorithm is employed. In this paper, the cuckoo search algorithm is effectively improved with a novel strategy, known as the convergence improvement strategy, to accelerate the convergence speed toward the optimal solution, achieving better outcomes in a small number of iterations when solving systems of nonlinear equations. The proposed algorithm, named the improved cuckoo search algorithm (ICSA), accelerates the convergence speed by improving the fitness values of function evaluations compared to the existing algorithms. To assess the efficacy of ICSA, 34 common nonlinear equations that fit the nature of cybersecurity models are adopted to show whether ICSA can reach better outcomes with high convergence speed. ICSA has been compared with several well-known, well-established optimization algorithms, such as the slime mould optimizer, salp swarm, cuckoo search, marine predators, bat, and flower pollination algorithms. Experimental outcomes have revealed that ICSA is superior to the others in terms of convergence speed and final accuracy, which makes it a promising alternative to the existing algorithms.


Introduction
With the prevalence of cyber-physical systems (CPS), cyber defense systems such as intrusion detection and threat intelligence, which deal with data sources under the constraints of nonnormality and nonlinearity, should be designed to handle these constraints and produce accurate outcomes [1,2]. These models have been developed using nonlinear equation systems (NESs) [3], which need to be solved accurately in a reasonable time [4]. Therefore, to solve NESs, several numerical methods, including the Newton-type methods [5] and the iterative and recursive methods [6], have been proposed. However, most of those methods cannot estimate the roots of NESs with a complex nature due to their sensitivity to the initial guess of the solutions, which significantly affects the obtained outcomes and the stability of those methods [4]. Therefore, a promising way to overcome those drawbacks and to estimate the optimal roots is to use evolutionary and meta-heuristic algorithms, which have gained significant attention over the last decades due to their superiority in terms of local minima avoidance, convergence speed, and reaching the optimal solution in a reasonable time.
Evolutionary algorithms (EAs) and swarm algorithms (SAs) have achieved significant success in real-world optimization problems [7][8][9][10][11][12][13][14][15][16][17][18], particularly convex and discontinuous nonlinear optimization problems [11,19,20]. Therefore, they have been widely used in the literature for solving NESs. Unfortunately, the existing algorithms still suffer from local minima and slow convergence to the optimal root. This causes two problems when solving NESs: (1) consuming a large number of function evaluations before reaching the optimal root in some cases, and (2) failing to find the optimal root even with an increasing number of function evaluations, due to the weak ability of the algorithms to explore as much of the search space as possible while avoiding getting stuck in local minima. In cybersecurity, data distributions of intrusion detection and threat models often demand nonlinear and non-Gaussian models that can discriminate small variations between normal and suspicious behaviors [21]. In this paper, the cuckoo search algorithm (CSA) is improved in an effective way to help it avoid those two problems while solving NESs; the resulting algorithm is named the improved CSA (ICSA). ICSA was extensively validated using 34 well-known NES cases and compared with some recently published, well-established optimization algorithms, namely the slime mould algorithm (SMA, 2020) [22], marine predators algorithm (MPA, 2020) [23], bat algorithm (BA, 2012) [24], salp swarm algorithm (SSA, 2017) [25], standard cuckoo search algorithm (CSA, 2009) [26], and flower pollination algorithm (FPA, 2012) [27], under various statistical analyses that can flexibly fit nonlinear distributions of CPS-driven data sources, efficiently enhancing the discovery of anomalous events.
The experiments show that our improved algorithm performs significantly better for most test cases concerning the convergence speed and final accuracy in comparison to the abovementioned algorithms. The main contributions of this research are as follows: (a) Improving the classical CSA using an effective strategy called the convergence improvement strategy (CIS) to produce a new variant, named ICSA, that is able to accurately tackle NESs. (b) Experiments conducted on 34 well-known NES cases to assess the performance of this variant, in addition to comparisons with 6 well-established optimization algorithms, show the efficacy of this variant in terms of the convergence speed and final accuracy for most test cases.
The remainder of this paper is organized as follows: Section 2 presents the literature review, Section 3 overviews the standard CSA, Section 4 extensively describes our proposed work, and Section 5 shows our experimental outcomes and some discussion. Finally, Section 6 presents the conclusions drawn from our proposed work and discusses our future work.

Literature Review
This section is divided into two parts. The first part will define the problem formulation of the NES, and the second reviews the EAs and the SAs proposed in the literature to tackle the NESs.

Problem Description
Generally, nonlinear equation systems are mathematically formulated as follows:

e_i(x) = 0, i = 1, 2, ..., n, subject to Lb_j <= x_j <= Ub_j, j = 1, ..., d (1)

where d denotes the number of decision variables of the equation; n refers to the number of equations; and x is a vector of d dimensions that encodes a solution to the NES, where each dimension within this solution must be subject to its search boundary: lower bound (Lb) and upper bound (Ub).
As formulated in Equation (1), x denotes the decision variables, i.e., the attributes/features in a cyber-physical problem, specifically machine learning-based intrusion detection. When these attributes were statistically evaluated using the Kolmogorov-Smirnov (K-S) test, the outcomes revealed that the attributes follow nonlinear and non-Gaussian distributions. This indicates that the models must employ nonlinear equations to perfectly fit small variations between normal and anomalous behaviors [1].
To solve the nonlinear attributes/decision variables of NESs in a machine learning-based intrusion detection problem, note that Equation (1) comprises n equations, while optimization algorithms usually minimize only one objective. Therefore, Equation (1), which defines the NES, was transformed into Equation (2) to become a minimization problem that can be solved using an optimization algorithm:

f(x) = sum_{i=1}^{n} e_i^2(x) (2)

This equation is considered the objective function that needs to be minimized using optimization techniques to find the optimal roots and clear boundaries between the nonlinear attributes of normal and suspicious events.
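As a concrete illustration (this is not the paper's code; the function names and the two-equation system are our own, and the common sum-of-squared-residuals transformation is assumed), the folding of a system of residual functions into a single scalar objective can be sketched in Python:

```python
import numpy as np

def nes_objective(equations):
    """Fold the n residual functions e_i(x) = 0 of an NES into one
    scalar objective by summing squared residuals, so a minimizer
    that reaches 0 is a root of the whole system."""
    def objective(x):
        return sum(e(x) ** 2 for e in equations)
    return objective

# Hypothetical 2-equation system: x0^2 + x1 - 3 = 0 and x0 + x1^2 - 5 = 0,
# which has a root at x = (1, 2).
eqs = [lambda x: x[0] ** 2 + x[1] - 3,
       lambda x: x[0] + x[1] ** 2 - 5]
f = nes_objective(eqs)
```

Any minimizer driving `f` to 0 has, by construction, satisfied every equation of the system simultaneously.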

Swarm and Evolutionary Algorithms
In [28], the social emotion optimization algorithm (SEOA) was integrated with the Metropolis rule in an attempt to escape local minima when solving NESs. This hybrid algorithm, abbreviated MSEOA, was compared with particle swarm optimization (PSO) and the standard SEOA to determine the best one for solving four nonlinear equations. In the experiments, MSEOA was found to be more effective for solving the NESs. Further, Wu Z. and L. Kang [29] proposed a parallel elite-subspace evolutionary algorithm (PESEA) to solve NESs in a reasonable time. PESEA was validated using five nonlinear equations to assess its speed and accuracy in estimating their optimal roots; based on the conducted experiments, PESEA proved faster and more accurate.
To solve NESs, the authors of [30] suggested a hybrid approach that combined the capability of chaos maps to dramatically explore the search space with a quasi-Newton method distinguished by its high convergence rate. The authors of [31] integrated an evolutionary algorithm with additional strategies: they combined the k-means clustering method with niching to guide the optimization process toward the multiple roots within the search space while avoiding getting stuck in local minima. Finally, they proposed using various crowding factors to decrease the replacement error when finding the multiple roots of the NESs; this algorithm was called the one-step k-means clustering-based differential evolution (KSDE). Following the development of KSDE, 30 problems were used to validate its performance, in addition to comparing the algorithm with some state-of-the-art methods to show its superiority.
Rizk-Allah [32] proposed a new approach, namely Q-SCA, to solve NESs based on modifying the sine-cosine algorithm (SCA). In Q-SCA, Rizk-Allah dynamically adjusted the SCA's search ability to search around the current location or the best-so-far solution to improve its exploitation capability in an attempt to accelerate the local convergence rate. Q-SCA also used quantum local search (QLS) to improve the obtained solutions in an attempt to balance the algorithm's exploration and exploitation capabilities. This approach was investigated on 12 NESs and 2 electrical applications and compared with several algorithms to show its stability and accuracy in achieving better outcomes. The experimental outcomes showed the superiority of this algorithm over the standard one. The authors of [33][34][35] adapted various genetic algorithms (GAs) to solve the NESs.
The grasshopper optimization algorithm (GOA) [36] has been hybridized with the GA to produce a new hybrid algorithm for solving NESs. This hybrid algorithm combined the merits of both GA and GOA to escape from local minima and accelerate the convergence speed. Moreover, the grey wolf optimizer (GWO) has been integrated with differential evolution to tackle NESs; this algorithm was named GWO-DE. In [37], an improved differential evolution integrated with a restart strategy, namely DE-R, was proposed for NESs. DE-R used a new mutation operator and a restart technique to promote the exploration ability and avoid getting stuck in local minima. DE-R was compared with some recently developed algorithms over a set of nonlinear equation systems and real-world problems to show its effectiveness.
Ultimately, several continuous evolutionary and swarm intelligence algorithms have been proposed that might be applied to tackle this problem in the future in the hope of finding better outcomes; among them are natural evolution strategies [38], particle swarm optimization in the estimation of distribution algorithms (EDAs) framework [39], the EDAs [40], and the covariance matrix adaptation evolution strategy [41].

Standard Algorithm: Cuckoo Search Algorithm
Xin-She Yang [26] proposed a new metaheuristic algorithm, namely the cuckoo search algorithm (CSA), for solving optimization problems. Recently, CSA was employed for selecting the most relevant nonlinear attributes and discovering suspicious observations [42]. This research is motivated to develop a new variant of CSA that can efficiently deal with nonlinear functions and effectively find the clear bounds between legitimate and suspicious behaviors when implementing classification methods. CSA is inspired by the obligate brood parasitism of some cuckoo birds, which lay their eggs in the nests of other host birds. Sometimes, when the host birds discover that eggs in their nests do not belong to them, the foreign eggs are either flung out or the nest is abandoned altogether. In general, the CSA is based on three rules: (1) each cuckoo lays one egg at a time and puts its egg in a randomly chosen nest; (2) the best nests with high-quality eggs will be carried over to the next generation; (3) the number of available host nests is fixed, and a host bird can discover a foreign egg with a probability p_a that varies between 0 and 1.
CSA balances the global random walk and the local random walk to promote its search ability for reaching better outcomes. Mathematically, the global random walk is formulated as

x_i^{t+1} = x_i^t + α ⊗ L(s, λ) (3)

where t expresses the current iteration, x_i^t is the current position of the i-th cuckoo, x_i^{t+1} indicates the next position, L(s, λ) is the Lévy distribution used to determine the step size of the random walk, s is the step size, and α is a positive scaling factor. The local random walk is defined as follows:

x_i^{t+1} = x_i^t + α s ⊗ H(p_a − ε) ⊗ (x_j^t − x_k^t) (4)

where ⊗ indicates the entry-wise multiplication operator, H is a Heaviside function, ε is a random number generated based on the normal distribution, and x_j^t and x_k^t are two positions chosen randomly from the current population. t_max indicates the maximum number of iterations. The steps of CSA are shown in Algorithm 1.

Algorithm 1 The steps of CSA
1. Create an initial population of N solutions.
2. Evaluate the fitness for each solution and determine the best-so-far solution x*.
3. while t < t_max do
4. Create a new population using Equation (3) and insert better ones into the current population.
5. Create a new population using Equation (4) and insert the best ones into the current population.
6. end while
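The two random walks above can be sketched in Python. This is an illustrative sketch, not the paper's MATLAB implementation: the Lévy step uses Mantegna's algorithm (a common implementation choice), the global step is scaled by the distance to the best-so-far solution as in Yang's reference code, and all parameter defaults are our own assumptions.

```python
import math
import numpy as np

def cuckoo_search(obj, lb, ub, n=25, t_max=200, pa=0.25, alpha=0.01, lam=1.5, seed=0):
    """Minimal cuckoo search sketch following Algorithm 1."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    d = lb.size
    pop = lb + rng.random((n, d)) * (ub - lb)          # random initial nests
    fit = np.apply_along_axis(obj, 1, pop)

    # Mantegna's algorithm for Levy-stable step sizes (a common choice).
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)

    def levy(shape):
        u = rng.normal(0.0, sigma, shape)
        v = rng.normal(0.0, 1.0, shape)
        return u / np.abs(v) ** (1 / lam)

    best = pop[fit.argmin()].copy()
    for _ in range(t_max):
        # Global random walk (Equation (3)): Levy flight, commonly scaled
        # by the distance to the best-so-far solution.
        cand = np.clip(pop + alpha * levy((n, d)) * (pop - best), lb, ub)
        cand_fit = np.apply_along_axis(obj, 1, cand)
        imp = cand_fit < fit
        pop[imp], fit[imp] = cand[imp], cand_fit[imp]

        # Local random walk (Equation (4)): a fraction pa of eggs is
        # discovered and stepped using two randomly permuted nests.
        step = rng.random((n, d)) * (pop[rng.permutation(n)] - pop[rng.permutation(n)])
        cand = np.clip(pop + step * (rng.random((n, d)) < pa), lb, ub)
        cand_fit = np.apply_along_axis(obj, 1, cand)
        imp = cand_fit < fit
        pop[imp], fit[imp] = cand[imp], cand_fit[imp]

        best = pop[fit.argmin()].copy()
    return best, float(fit.min())

# Usage on a 2-d sphere function (minimum 0 at the origin).
best, fval = cuckoo_search(lambda x: float((x ** 2).sum()), [-5, -5], [5, 5])
```

Greedy replacement (keeping a candidate only when it improves fitness) makes the best-so-far fitness monotonically non-increasing, which is what the convergence curves in Section 5 track.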

Proposed Algorithm
In this section, the steps of the proposed algorithm, known as the improved cuckoo search algorithm (ICSA), will be clearly described; those steps are initialization, evaluation, and the ICSA procedure.

Initialization
At the outset of the optimization algorithm, a group of N solutions, each with d dimensions, is created and randomly initialized within the search space of the problem according to the following equation:

x = L + r ⊗ (U − L) (5)

where U and L are two vectors containing the upper and lower bounds of the various problem dimensions, and r is a vector of d elements assigned randomly between 0 and 1. After completing the initialization step, those initial solutions will be evaluated using Equation (2) to determine the quality of each one, and the one with the highest quality will be extracted to help later in improving the quality of the new populations.
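The initialization step, x = L + r ⊗ (U − L), can be sketched as follows (an illustrative helper, not the paper's code; the function name and defaults are ours):

```python
import numpy as np

def init_population(n, lb, ub, rng=None):
    """Draw n solutions uniformly inside [lb, ub]: x = L + r * (U - L),
    with r ~ U(0, 1) drawn independently per dimension, so every initial
    solution starts inside the search bounds."""
    rng = rng or np.random.default_rng()
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    return lb + rng.random((n, lb.size)) * (ub - lb)

# 30 solutions in a 3-dimensional search space bounded by [-5, 5].
pop = init_population(30, [-5, -5, -5], [5, 5, 5])
```

Each row of `pop` is one candidate solution, ready to be scored with the objective of Equation (2).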

Convergence Improvement Strategy (CIS)
A new strategy, called the convergence improvement strategy (CIS), is proposed to improve the performance of the meta-heuristic algorithm: it achieves better convergence, improves final accuracy, and enhances the ability to select the most significant attributes for CPS problems. This strategy is two-fold. The first aspect searches around the best-so-far solution for a better solution using Equation (6), saving optimization time when the near-optimal solution lies near this best-so-far case; however, this best-so-far solution may also be a trap that drifts the algorithm into local minima, reducing the possibility of reaching better outcomes. Therefore, the second aspect, formulated mathematically in Equation (7), is used to avoid falling into local minima by multiplying the current position entry-wise by a vector vc generated randomly from the uniform distribution with lower endpoint −1 × r_1 and upper endpoint r_1, where r_1 is a value created randomly between 0 and 1:

x_i^{t+1} = x_i^t ⊗ vc, vc ~ U(−r_1, r_1) (7)
The swap between Equations (6) and (7) is determined by a probability, namely γ, selected by the researcher based on experimental outcomes; in our experiments, this probability was set to 0.1 after extensive tuning.
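The two aspects of CIS and the γ-controlled swap between them can be sketched as follows. This is a hedged illustration: Equation (7) follows the description in the text (entry-wise multiplication by a vector drawn uniformly from [−r1, r1]), while the exact form of Equation (6) is not reproduced here, so a simple move toward the best-so-far solution stands in for it.

```python
import numpy as np

def cis_step(x, best, gamma=0.1, rng=None):
    """Convergence improvement strategy (CIS) sketch for one solution."""
    rng = rng or np.random.default_rng()
    r1 = rng.random()                              # r1 ~ U(0, 1)
    if rng.random() < gamma:                       # gamma swaps the two aspects
        # First aspect: exploit the region around the best-so-far solution
        # (hypothetical stand-in for Equation (6)).
        return x + r1 * (best - x)
    # Second aspect (Equation (7)): entry-wise product with vc ~ U(-r1, r1),
    # which scatters the solution to escape local minima.
    vc = rng.uniform(-r1, r1, size=np.shape(x))
    return np.asarray(x) * vc
```

Because |vc| < r1 < 1 in every entry, the second aspect shrinks and sign-flips coordinates at random, giving the algorithm a cheap way to jump out of a local basin.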

Improved Cuckoo Search Algorithm (ICSA)
To improve the global random walk of CSA, CIS is called after executing the global random walk, with probability pr, to accelerate the convergence speed toward the best-so-far solution using the first aspect, and to avoid getting stuck in local minima using the second aspect. Generally, Algorithm 2 elaborates the steps of ICSA after integrating CIS. Before starting the optimization process of ICSA, N solutions will be randomly distributed within the search space to cover it as much as possible, in addition to initializing the main parameters of the ICSA. Then, those solutions will be updated by the global random walk integrated with the CIS with a probability pr set to 0.5, as explained in the experiments section, to promote its search ability for reaching better outcomes, as described in Lines 6-16 in Algorithm 2. In Line 19, the current solution will be updated using the local random walk in an attempt to avoid getting stuck in local minima. This optimization process runs continuously until the termination condition is satisfied (reaching the maximum iteration t_max).
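The integration point described above can be sketched as a refinement pass applied after the global random walk. This is an illustrative sketch under our own assumptions (the function name, the greedy keep-if-better rule, and the injected `cis_step` callable are ours); the paper's exact acceptance rule may differ.

```python
import numpy as np

def cis_refine(pop, fit, best, obj, cis_step, pr=0.5, rng=None):
    """One ICSA refinement pass: after the global random walk, each
    solution is additionally refined by CIS with probability pr, and the
    refinement is kept only when it improves the fitness."""
    rng = rng or np.random.default_rng()
    for i in range(len(pop)):
        if rng.random() < pr:
            cand = cis_step(pop[i], best, rng=rng)
            f = obj(cand)
            if f < fit[i]:          # greedy selection: keep improvements only
                pop[i], fit[i] = cand, f
    return pop, fit
```

Gating the call with pr = 0.5 means roughly half of the population pays the extra function evaluations of CIS per iteration, which is how the strategy buys convergence speed without doubling the evaluation budget.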

Algorithm 2 The steps of ICSA

1. Create an initial population of N solutions.
2. Evaluate the fitness for each solution and determine the best-so-far solution x*.
3. while t < t_max do
4. Create a new population using the global random walk (Equation (3)).
5. r: create a random number between 0 and 1.
6. if r < pr, apply the CIS (Equations (6) and (7)) to the current solution.
7. Update the current solution using the local random walk (Equation (4)).
8. Update the best-so-far solution x*.
9. end while

Outcomes and Discussion
This section validates the performance of the proposed algorithm, ICSA, to examine its efficacy and to demonstrate its superiority over some well-established optimization algorithms under various statistical analyses. The best, average (Avg), worst, and standard deviation (SD) of the fitness values were recorded over 30 independent trials, and the Wilcoxon rank-sum test was used to determine significance. The compared algorithms used in our experiments included the slime mould algorithm (SMA, 2020) [22], marine predators algorithm (MPA, 2020) [23], bat algorithm (BA, 2012) [24], salp swarm algorithm (SSA, 2017) [25], standard cuckoo search algorithm (CSA, 2009) [26], and flower pollination algorithm (FPA, 2012) [27]. All algorithms were implemented in MATLAB R2019a with the parameter settings cited in their original papers, under the same operating conditions as the proposed algorithm; these conditions (the maximum number of iterations, the population size, and the number of independent runs) were set to 500, 30, and 30, respectively. A computer with 32 GB of RAM, an Intel(R) Core(TM) i7-4700MQ CPU @ 2.40 GHz, and a 64-bit operating system (Windows 10) was used to conduct all the experiments.
To validate the performance of our proposed algorithm, 34 test cases of nonlinear equation systems widely used in the literature were adopted. Most of these equations have been widely used in the design of cybersecurity models, such as intrusion detection and threat models, to differentiate between small variations of normal and abnormal activities in CPSs. The characteristics of these functions (the number of dimensions (D), the search space (R) for each dimension, their formulas, and their references) are presented in Table 1.
To adjust the main effective parameters of the proposed algorithm (α, γ, and pr), extensive experiments were performed with various values for each parameter on F12, and the outcomes of 30 independent trials are depicted in Figure 1. Inspecting this figure shows that the near-optimal values for α, γ, and pr were 0.5, 0.1, and 0.5, respectively. The value of pr was set to 0.5 instead of 0.6 because the algorithm minimized the objective value better at this value.
Table 2 presents the best, worst, and Avg objective values, in addition to the SD, obtained after running each algorithm 30 independent times on test functions F1-F28. From this table, ICSA had the best metric for 26 of the 28 test cases, where the lowest possible value of 0 was reached for 19 of those 26 test cases. This shows the superiority of our proposed algorithm in minimizing objective values in comparison with the other algorithms. For the Avg, worst, and SD measures, ICSA was best for only 21 test cases, which indicates the proposed algorithm is not fully stable, since its outcomes were relatively diversified across the independent runs. This is our main limitation and needs to be addressed in future work.
Furthermore, the proposed algorithm was compared with the others regarding convergence speed, to see which algorithm converged to the optimal solution most quickly. This capability can be used to select the most relevant features or to fit normal and abnormal observations under multivariate distributions. The convergence curves obtained by the various algorithms for 21 randomly selected test cases are depicted in Figures 2-22. From those figures, we point out that the proposed algorithm reached a lower objective value faster than the others.

Table 3 reports the p-value and h outcomes of the Wilcoxon rank-sum test for each compared algorithm on the test functions.
Additionally, the outcomes of the algorithms on test cases F29-F34 are shown in Table 4, which shows the superiority of ICSA for F29, F30, F31, and F32 in terms of the best, Avg, worst, and SD values; for the other two test cases, only the best objective value could be better. The convergence curves obtained by the various algorithms for the same test cases are presented in Figures 23 and 24, respectively, showing that our proposed algorithm moved toward the optimal solution faster; hence, the number of function evaluations required for reaching the optimal solution is significantly decreased compared to the other algorithms used in our comparison.

Last but not least, the various algorithms in our experiments were compared in terms of the CPU time consumed by each one to complete the optimization process for each test case. For that, each algorithm was executed for 30 independent runs, and the time consumed within those runs on all test cases was recorded. Afterward, the consumption rate on each test case was calculated by averaging the total consumed time, and is presented in Figure 25. This figure shows the superiority of SSA, which occupied the first rank in terms of CPU time, while BA, FPA, MPA, CSA, and ICSA came second, third, fourth, fifth, and sixth, respectively. Although ICSA occupied the sixth rank in terms of consumed time, its final accuracy and convergence speed make it a strong alternative for tackling NESs, as it can reach better outcomes with fewer function evaluations; hence, the consumed time will be minimized.

Ultimately, ICSA and the standard algorithm were compared with each other using boxplots to analyze the efficacy of our improvement strategy. The proposed algorithm and the standard one were independently executed 30 times, and the objective values obtained for 15 test cases are graphically pictured in Figures 26-30. These figures show that ICSA was better for all used test cases except F4 and F12, depicted in Figures 27a and 29c, where CSA fulfilled better outcomes. As a result, our improvement strategy had a significant, positive effect on the performance of the standard algorithm, achieving better outcomes in fewer iterations and enhancing the capability of finding small variances between legitimate and suspicious observations in the CPS domain, thereby improving the performance of machine learning-based intrusion detection techniques.

From the above, it is concluded that our modification to the standard CSA significantly improved its performance in solving nonlinear equation systems. This improvement is due to the search ability of the integrated method to avoid falling into local optima and to accelerate the convergence speed in the direction of the optimal solution.
However, ICSA could not outperform some optimization algorithms in terms of computational cost, and stability remains a further limitation of our proposed algorithm; both will be addressed in future work by integrating the CIS with one of several continuous evolutionary and swarm intelligence algorithms, such as natural evolution strategies [38], particle swarm optimization in the estimation of distribution algorithms (EDAs) framework [39], the EDAs [40], and the covariance matrix adaptation evolution strategy [41], which have not yet been applied to tackle the NESs.


Conclusions and Future Work
This paper has presented a new algorithm with strong merits to promote the search ability for solving systems of nonlinear equations with a low number of function evaluations and fast convergence to the near-optimal solution. This is one of the challenges in the cyber-physical domain, especially finding small variations between the normal and abnormal behaviors of nonlinear attributes. This algorithm is based on integrating the cuckoo search algorithm with a novel strategy to produce a new variant, named the improved cuckoo search algorithm (ICSA), with high convergence speed and final accuracy in a small number of function evaluations. To assess the performance of ICSA, it was evaluated on 34 well-known nonlinear equation systems to examine its effectiveness in reaching the optimal solution within a budget of 15,000 function evaluations (the population size multiplied by the maximum number of iterations). ICSA was also extensively compared with the standard cuckoo search algorithm and five well-established algorithms (slime mould optimizer, marine predators algorithm, salp swarm algorithm, bat algorithm, and flower pollination algorithm) to affirm its superiority. Experimental findings affirmed that ICSA performed better for 32 of the 34 test cases in terms of the best objective value, while for the Avg, worst, and SD values it performed better for 25 test cases. This is considered one of our main limitations, to be addressed in future work so as to preserve the stability of the algorithm across all runs. Additionally, the convergence curves and the Wilcoxon rank-sum test were used to confirm the convergence speed and significance of our proposed algorithm, affirming that ICSA was better than the compared algorithms.
In the future, we will integrate the proposed algorithm into a dynamic, wrapper-based feature selection algorithm that will assist in finding clear boundaries between legitimate and anomalous nonlinear attributes, improving the performance of identifying anomalous events when applying classification algorithms.