Toward an Ideal Particle Swarm Optimizer for Multidimensional Functions

Abstract: The Particle Swarm Optimization (PSO) method is a global optimization technique based on the gradual evolution of a population of solutions called particles. The method evolves the particles based both on the best position each of them has visited in the past and on the best position of the whole swarm. Due to its simplicity, the method has found application in many scientific areas, and for this reason, many modifications have been presented during the last few years. This paper introduces three modifications to the method that aim to reduce the required number of function calls while maintaining the accuracy of the method in locating the global minimum. These modifications affect important components of the method, such as how fast the particles change or even how the method is terminated. The above modifications were tested on a number of known global optimization problems from the relevant literature, and the results were compared with similar techniques.


Introduction
The global optimization problem is usually defined as:

x* = arg min_{x ∈ S} f(x)

The function f is considered a continuous and differentiable function, formulated as f : S → R, S ⊂ R^n. This problem finds application in a variety of real-world problems, such as problems of physics [1][2][3], chemistry [4][5][6], economics [7,8], medicine [9,10], etc. Global optimization methods are grouped into two broad categories: deterministic and stochastic methods. The most common methods of the first category are the so-called Interval methods [11,12], where the set S is divided iteratively into subregions, and subregions that do not contain the global solution are discarded using some pre-defined criteria. In stochastic methods, the search for the global minimum is guided by randomness. In these methods, there is no guarantee of finding the global minimum, but they constitute the vast majority of global optimization methods, mainly due to the simplicity of their implementation. Many stochastic methods have been proposed by various researchers, such as Controlled Random Search methods [13][14][15], Simulated Annealing methods [16][17][18], Differential Evolution methods [19,20], Particle Swarm Optimization methods [21][22][23], Ant Colony Optimization [24,25], Genetic Algorithms [26][27][28], etc. In addition, many works have appeared utilizing modern GPU processing units [29][30][31]. The reader can find complete surveys of metaheuristic algorithms in some recent works [32][33][34].
The method of Particle Swarm Optimization is based on a population of candidate solutions that are also called particles. These particles have two basic characteristics: their position x and the velocity u with which they move. The new position x_i of every particle is defined as the sum of the current position x_i and the associated velocity u_i:

x_i = x_i + u_i

where the velocity u_i is defined as a combination of the old velocity and the best located values p_i and p_best:

u_i = ω u_i + r_1 c_1 (p_i − x_i) + r_2 c_2 (p_best − x_i)

where r_1 and r_2 are random numbers and c_1, c_2 are user-defined constants. The value ω is called the inertia of the algorithm and is usually calculated using some efficient algorithm. The purpose of the method is to move the particles repetitively; their next position is calculated as a function not only of their current position but also of the best position they had in the past, as well as the best position of the population. Various researchers have provided analytic reviews of the PSO process, such as the review of Jain et al. [35], where 52 papers are reviewed, and the work of Khare et al. [36], where a systematic review of the PSO algorithm is provided along with the application of PSO to solar photovoltaic systems. The method has been successfully used in a variety of scientific and practical problems in areas such as physics [37,38], chemistry [39,40], medicine [41,42], economics [43], etc. Many researchers have proposed a variety of modifications of the PSO method during the last few years; such methods aimed at a more efficient calculation of the inertia parameter ω in the velocity update [44][45][46]. The majority of these methods calculate the inertia as a time-varying parameter, and hence there is a lack of diversity that could be used to better explore the search space of the objective function.
This paper introduces a new scheme for the calculation of inertia which takes into account the rate at which the algorithm locates new local minima in the search space of the objective function.
Other modifications of PSO alter the calculation of the factors c_1, c_2, which are usually called acceleration factors: for example, the work of Ratnaweera et al. [47], where a time-varying mechanism is proposed for the calculation of these factors. In addition, the PSO algorithm has been combined with a mutation mechanism [48][49][50] in order to improve the diversity of the algorithm and to better explore the search space of the objective function. Engelbrecht [51] proposed a novel method for the initialization of the velocity vector, where the velocity is initialized randomly rather than with zero values. In addition, the relevant literature includes hybrid techniques [52][53][54], parallel techniques [55][56][57], methods that aim to improve the velocity calculation [58][59][60], etc.
The method of PSO has been integrated into other optimization techniques. For example, Bogdanova et al. [61] combined Grammatical Evolution [62] with swarm techniques such as PSO. Their work is divided into two phases: during the first phase, the PSO method is decomposed into a list of basic grammatical rules; during the second phase, instances of the PSO method are created using these rules. Another interesting combination with an optimization technique is the work of Pan et al. [63], who combined the PSO method with simulated annealing to improve the local search ability of PSO. In addition, Mughal et al. [64] used a hybrid technique of PSO and simulated annealing for photovoltaic cell parameter estimation. Similarly, the work of Lin et al. [65] utilized a hybrid method of PSO and differential evolution for numerical optimization problems. In addition, Epitropakis et al. [66] suggested a hybrid PSO method with differential evolution, where at each step of PSO, a differential evolution step is applied to the best particle of the population. Among others, Wang et al. proposed a new PSO method based on chaotic neighbourhood search [67].
The PSO method is an iterative process, during which a series of particles evolve through a process that involves updating the positions of the particles and computing their fitness, i.e., evaluating the objective function. Variations of PSO that aim to reach the global minimum in a shorter time may include the use of a local optimization method in each iteration of the algorithm. Of course, this process can be extremely time consuming and, depending on the termination method used and the number of local searches performed, may require a long execution time.
This text introduces three distinct modifications to the original method, which drastically improve the time required to find the global minimum by reducing the required number of function evaluations. These modifications cover a large part of the method: the velocity calculation, a new method of avoiding unnecessary local searches, and a new adaptive termination rule. The first modification enables the method to explore the search space more efficiently by calculating the inertia parameter with respect to the ability of PSO to discover the local minima of the objective function. The second modification is used to avoid finding the same local minimum over and over again. This way, on the one hand, the algorithm will not be trapped in local minima, and on the other hand, calls to the objective function will not be wasted on searches that would, with some certainty, lead to minima that have already been found. The third modification introduces a novel termination rule based on asymptotic calculations. Termination of methods such as PSO is an important process, as efficient termination rules stop the method in time without wasting many calls to the objective function while still guaranteeing the ability of the method to find the global minimum.
The proposed modifications were applied to a number of problems from the relevant literature and compared with similar techniques; the results are presented in a separate section. The experiments performed showed that the proposed modifications significantly reduce the required execution time of the method by drastically reducing the number of function calls required to find the global minimum. In addition, these modifications can be applied either alone or in combination with the same positive results. This means that they are quite general and can be included in other techniques based on PSO.
The rest of this article is divided as follows: in Section 2, the initial method and the proposed modifications are discussed, in Section 3, the experiments are listed, and finally, in Section 4, some conclusions and guidelines for future improvements are presented.

Method Description
The base algorithm of PSO and the proposed modifications are outlined in detail in the following subsections. The discussion starts with a new mechanism that calculates the velocity of the particles, continues with a discarding procedure used to minimize the number of local searches performed, and ends with a discussion of the new stopping rule proposed here.

The Base Algorithm
The base Algorithm 1 is listed below, with the periodic application of a local search method in order to enhance the estimation of the global minimum: at every iteration, a decision with probability p_l is made whether to apply a local search procedure to some of the particles. Usually, this probability is small, for example 0.05 (5%).
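As a concrete illustration, the base loop with the probabilistic local-search step can be sketched in Python as follows. This is a minimal sketch, not the paper's implementation: the default particle count, the coefficients ω, c1, c2, and the `local_search(f, x) -> (point, value)` interface are illustrative assumptions.

```python
import random

def pso(f, local_search, bounds, n_particles=100, iters=200, p_local=0.05,
        omega=0.7, c1=1.5, c2=1.5):
    """Minimal PSO sketch with a probabilistic local-search step.

    bounds: list of (low, high) per dimension.
    local_search: callable (f, point) -> (refined_point, refined_value).
    All default parameter values are illustrative assumptions.
    """
    dim = len(bounds)
    x = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    u = [[0.0] * dim for _ in range(n_particles)]
    p = [xi[:] for xi in x]                      # personal best positions
    fp = [f(xi) for xi in x]                     # personal best values
    gbest = min(range(n_particles), key=lambda i: fp[i])
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = random.random(), random.random()
                u[i][j] = (omega * u[i][j]
                           + r1 * c1 * (p[i][j] - x[i][j])
                           + r2 * c2 * (p[gbest][j] - x[i][j]))
                x[i][j] += u[i][j]
            fx = f(x[i])
            if fx < fp[i]:
                p[i], fp[i] = x[i][:], fx
            # with small probability p_local, refine this particle locally
            if random.random() < p_local:
                p[i], fp[i] = local_search(f, p[i])
        gbest = min(range(n_particles), key=lambda i: fp[i])
    return p[gbest], fp[gbest]
```

A simple usage on the sphere function converges to a value near zero within the default iteration budget.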
The current work modifies the above algorithm in three key points:

1. In Step 2, a new termination rule based on asymptotic considerations is introduced.

2. In Step 3b, where the algorithm calculates the new positions of the particles, the proposed methodology modifies the positions based on the average rate at which the algorithm discovers new minima.

3. In Step 3c, a method based on gradient calculations is used to prevent the PSO method from executing unnecessary local searches.

Velocity Calculation
The algorithm of Section 2.1 calculates at every iteration the new position x_i, which is computed from the old position x_i and the associated velocity u_i according to the scheme:

x_i = x_i + u_i

Typically, the velocity is calculated as a combination of the old velocity and the best located values p_i and p_best, and may be defined as:

u_i = ω u_i + r_1 c_1 (p_i − x_i) + r_2 c_2 (p_best − x_i)

where:

1. The parameters r_1, r_2 are random numbers with r_1 ∈ [0, 1] and r_2 ∈ [0, 1].

2. The parameters c_1, c_2 are user-defined constants, also called acceleration factors.

3. The variable ω is called inertia, with ω ∈ [0, 1]. The inertia was proposed by Shi and Eberhart [21]. They argued that high values of the inertia coefficient cause better exploration of the search space, while small values of this variable are used when we want to achieve a better local search around promising areas for the global minimum. The value of the inertia factor generally starts with large values and decreases over the iterations. This article proposes a new adaptive technique for the inertia parameter, and this mechanism is compared against three others from the relevant literature.
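A single velocity update following the scheme above can be written as follows (a sketch; the default acceleration factors c1 = c2 = 1.5 are illustrative assumptions):

```python
import random

def update_velocity(u_i, x_i, p_i, p_best, omega, c1=1.5, c2=1.5,
                    rng=random.random):
    """One velocity update: u_i <- w*u_i + r1*c1*(p_i - x_i) + r2*c2*(p_best - x_i).

    rng is injectable so the update can be made deterministic for testing.
    """
    r1, r2 = rng(), rng()
    return [omega * uj + r1 * c1 * (pj - xj) + r2 * c2 * (bj - xj)
            for uj, xj, pj, bj in zip(u_i, x_i, p_i, p_best)]
```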

Random Inertia
This inertia calculation was proposed in [68], and it is defined as:

ω = 0.5 + r/2

where r is a random number with r ∈ [0, 1]. This inertia calculation will be called I1 in the tables with the experimental results.

Linear Time-Varying Inertia (Min Version)
This inertia scheme has been proposed and used in various studies [68][69][70], and it is defined as:

ω_iter = ω_max − (ω_max − ω_min) × iter / iter_max

where ω_min is the minimum value of inertia and ω_max is the maximum value of inertia. This inertia calculation will be called I2 in the tables with the experimental results.

Linear Time-Varying Inertia (Max Version)
This method is proposed in [71,72], and it is defined as:

ω_iter = ω_min + (ω_max − ω_min) × iter / iter_max

This inertia calculation will be called I3 in the tables with the experimental results.
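The three baseline schedules can be sketched as follows, assuming the common literature forms for these schemes (the 0.5 + r/2 form for I1 and the default bounds ω_min = 0.4, ω_max = 0.9 are assumptions, not values taken from this paper's tables):

```python
import random

def inertia_I1():
    """Random inertia (I1): w = 0.5 + r/2 with r ~ U(0, 1), so w in [0.5, 1.0]."""
    return 0.5 + random.random() / 2.0

def inertia_I2(it, it_max, w_min=0.4, w_max=0.9):
    """Linear time-varying inertia, 'min' version (I2): decreases w_max -> w_min."""
    return w_max - (w_max - w_min) * it / it_max

def inertia_I3(it, it_max, w_min=0.4, w_max=0.9):
    """Linear time-varying inertia, 'max' version (I3): increases w_min -> w_max."""
    return w_min + (w_max - w_min) * it / it_max
```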

Proposed Technique
This calculation of inertia involves the number of iterations in which the method manages to find a new minimum. In the first iterations, when the method has to do more exploration of the search space, the inertia will be large. When the method should focus on a minimum, the inertia will decrease. For this reason, at every iteration iter, a quantity δ^(iter) is measured that tracks the progress of the method in locating new minima. In the first steps of the algorithm, this quantity will change from iteration to iteration at a fast pace, and at some point, it will no longer change at the same rate or will be zero.
Hence, a metric is defined to model the changes in δ^(iter). Using this observation, two additional metrics are created, S and C_δ, and from these the proposed inertia value of Equation (12) is computed. This mechanism will be called IP in the tables with the experimental results.
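Since the exact formulas for δ^(iter), S, and C_δ are not reproduced here, the following is only a hypothetical sketch of the underlying idea: inertia stays large while the swarm keeps discovering new best values, and shrinks toward its minimum as discoveries stall. All names and parameter values below are assumptions.

```python
def inertia_IP(improved_history, w_min=0.4, w_max=0.9, window=10):
    """Hypothetical discovery-rate-driven inertia (NOT the paper's exact formula).

    improved_history: list of booleans, one per iteration, True if that
    iteration located a new lowest function value. The more often new minima
    were found recently, the larger the returned inertia (more exploration).
    """
    recent = improved_history[-window:]
    rate = sum(recent) / max(len(recent), 1)  # fraction of recent iterations with a new minimum
    return w_min + (w_max - w_min) * rate
```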

The Discarding Procedure
In each iteration, the method conditionally performs a series of local searches. However, these searches will often either lead to local minima already found or locate values far from the global minimum. This means that much of the computing time will be wasted on actions that could have been avoided. In order to group points that would lead, via local search, to the same local minimum, we introduce the concept of a cluster, which refers to a set of points that are believed, under some asymptotic considerations, to belong to the same region of attraction of the function. The region of attraction of a local minimum x* is defined as:

A(x*) = { x ∈ S : LS(x) = x* }

where LS(x) is a local search procedure that starts from a given point x and terminates when a local minimum is discovered. The discarding procedure suggested here prevents the method from starting a local search from a point x if that point belongs to the same region of attraction as other points. This procedure is composed of two major parts:

1. The first part is the so-called typical distance, a measure recalculated after every local search, given by:

R_C = (1/M) Σ_{i=1}^{M} || x_i − x_{iL} ||

where the local search procedure initiates from x_i, x_{iL} is the outcome of LS(x_i), and M is the number of local searches performed so far. This measure has also been used in [73]. If a point x is close enough to an already discovered local minimum, then it is highly possible that the point belongs to the so-called region of attraction of that minimum.

2. The second part is a check using the gradient values between a candidate starting point and an already discovered local minimum. The function value f(x) near some local minimum z can be approximated by:

f(x) ≈ f(z) + (1/2) (x − z)^T B (x − z)   (15)

where B is the Hessian matrix at the minimum z. By taking gradients on both sides of Equation (15), we obtain:

∇f(x) ≈ B (x − z)   (16)

Of course, Equation (16) holds for any other point y near z:

∇f(y) ≈ B (y − z)   (17)

By subtracting Equation (17) from Equation (16) and multiplying by (x − y)^T, we have the following equation:

(x − y)^T (∇f(x) − ∇f(y)) ≈ (x − y)^T B (x − y) > 0   (18)

since the Hessian B is positive definite near a minimum. Hence, a candidate start point x can be rejected if

(x − z)^T (∇f(x) − ∇f(z)) > 0   (19)

for any already discovered local minimum z.
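The two parts of the discarding procedure can be sketched as follows. The exact way the typical distance and the gradient test of Equation (19) are combined is an assumption here; note that ∇f(z) ≈ 0 at a discovered minimum z, so the sketch reduces the test to checking the sign of (x − z)^T ∇f(x).

```python
import math

def typical_distance(starts, minima):
    """Mean distance between local-search start points and the minima they reached."""
    dists = [math.dist(x, xl) for x, xl in zip(starts, minima)]
    return sum(dists) / len(dists)

def should_discard(x, grad, minima, r_typical):
    """Skip a local search from x if it likely lies in the region of attraction
    of an already-found minimum z: x within the typical distance of z, and
    (x - z)^T grad f(x) > 0 (using grad f(z) ~ 0 at a minimum)."""
    gx = grad(x)
    for z in minima:
        if math.dist(x, z) < r_typical:
            if sum((xi - zi) * gi for xi, zi, gi in zip(x, z, gx)) > 0:
                return True
    return False
```

For the sphere function, a point near the known minimum at the origin is discarded, while a distant point is kept as a candidate start.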

Stopping Rule
A commonly used way to terminate a global optimization method is to enforce a maximum number of allowed iterations iter_max, i.e., stop when iter ≥ iter_max. Although it is a simple criterion, it is not an efficient one: if iter_max is too small, the algorithm will terminate without locating the global optimum, while if iter_max is too high, the optimization algorithm will spend computation time on unnecessary function calls. In this paper, a new termination rule is proposed to terminate the PSO process, and it is compared against two other methods used in various optimization methods.

Ali's Stopping Method
A method is proposed in the work of Ali and Kaelo [74] where, at every generation, the measure

α = f_max − f_min

is calculated, where f_max stands for the maximum function value of the population and f_min represents the lowest function value of the population. The method terminates when

α ≤ ε   (21)

where ε is a predefined small positive value, for example ε = 10^−3.
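A sketch of this rule over the population's current function values (the default ε is the example value above):

```python
def ali_stop(f_values, eps=1e-3):
    """Ali & Kaelo-style rule: stop when the spread max(f) - min(f)
    over the swarm's function values has shrunk to at most eps."""
    return max(f_values) - min(f_values) <= eps
```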

Double Box Method
The second method utilized is a method initially proposed in [75]. In this method, we denote by σ^(iter) the variance of f_min calculated at iteration iter. If the algorithm cannot locate a new lower value for f_min for a number of iterations, then the global minimum has probably already been located, and the algorithm should terminate when

σ^(iter) ≤ σ^(iter_last) / 2   (22)

where iter_last stands for the last iteration at which a new lower value of f_min was discovered.
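A sketch of the check of Equation (22), assuming σ^(iter) and σ^(iter_last) are tracked by the caller:

```python
def doublebox_stop(sigma_iter, sigma_last):
    """Double Box-style rule: stop when the running variance of f_min has
    fallen to half (or less) of its value at the last iteration at which
    a new lowest f_min was discovered."""
    return sigma_iter <= sigma_last / 2.0
```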

Proposed Technique
In the proposed termination technique, at each iteration k, the difference between the current best value and the previous best value is measured, i.e., δ^(k) = f_min^(k) − f_min^(k−1). If this difference is zero for a predefined number of consecutive iterations k_max, then the method terminates.
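A sketch of the proposed rule over the history of best values (the tolerance parameter is an added assumption; with the default tol = 0.0 it checks for an exactly zero difference, as in the text):

```python
def proposed_stop(fmin_history, k_max=10, tol=0.0):
    """Proposed rule: terminate when f_min has not changed by more than tol
    for k_max consecutive iterations (k_max's default is illustrative)."""
    if len(fmin_history) <= k_max:
        return False
    recent = fmin_history[-(k_max + 1):]
    return all(abs(recent[i + 1] - recent[i]) <= tol for i in range(k_max))
```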

Experiments
To measure the effect of the proposed modifications on the original method, a series of experiments was performed on test functions from the relevant literature [76,77] that have been used widely by various researchers [74,[78][79][80]. The experiments evaluated the effect of the new method of calculating inertia, the criterion for avoiding unnecessary local searches, and the new termination rule. The experiments are recorded in separate tables, so that the effect of each modification can be understood separately.
• Sinusoidal function: The cases n = 4, 8, 16, 32 with z = π/6 were used in the experimental results. • Test2N function: The function has 2^n local minima in the specified range; in our experiments, we used n = 4, 5, 6, 7. • Test30N function: The function is defined in [−10, 10]^n, with 30^n local minima in the search space. For our experiments, we used n = 3, 4.

Experimental Setup
All the experiments were performed 30 times, using a different seed for the random number generator each time. The code was implemented in ANSI C++, and the well-known function drand48() was used to produce random numbers. The local search method used was the BFGS method [83]. All the parameters used in the conducted experiments are listed in Table 1.

Experimental Results
For every stopping rule, two tables are listed: the first contains experiments with the relevant stopping rule without the gradient discarding procedure, and the second contains experiments with the gradient discarding procedure enabled. The numbers in the table cells stand for average function calls. The fraction in parentheses denotes the fraction of runs in which the global optimum was found; the absence of such a fraction means that the global minimum was discovered in every run (100% success). The experimental results using the stopping rule of Equation (21) are listed in Tables 2 and 3. The experimental results for the Double Box stopping rule of Equation (22) are listed in Tables 4 and 5. Finally, for the proposed stopping rule, the results are listed in Tables 6 and 7. In addition, the boxplots for the proposed stopping rule without and with the gradient check of Equation (19) are illustrated in Figures 1 and 2, respectively. The above results lead to a number of observations:

1. The PSO method is robust, as evidenced by the high success rate in finding the global minimum, although the number of particles used was relatively low (100).

2. The proposed inertia calculation method as defined in Equation (12) achieves a significant reduction in the number of calls, between 11% and 25%, depending on the termination criterion used. However, the presence of the gradient check mechanism of Equation (19) nullifies any gain of the method, as the rejection criterion significantly reduces the number of calls regardless of the inertia calculation mechanism used.

3. The local optimization avoidance mechanism of the gradient check drastically reduces the required number of calls for each termination criterion while maintaining the success rate of the method at extremely high levels.

4. The proposed termination criterion is significantly superior to the other two with which it was compared. In addition, if the termination criterion is combined with the mechanism for avoiding unnecessary local searches, the gain in the number of calls grows even further.
To show the effect of the proposed termination method, an additional experiment was performed in which the dimension of the Sinusoidal problem was increased from 2 to 32, and in each case, all three termination techniques were tested. The result of this experiment is graphically represented in Figure 3. This graph shows that the Double Box method is significantly superior to the Ali method but, of course, the proposed method further reduces the required number of function calls. In addition, the effect of the application of the gradient-based rejection mechanism of Equation (19) is illustrated graphically in Figure 4, where the proposed termination rule is applied to a series of test functions with and without the gradient check.

Conclusions
In the current work, three new modifications of the PSO method for locating the global minimum of continuous and differentiable functions were presented. The first modification alters the particle velocity calculation in an attempt to cause large changes in velocity when the method is in its early stages and constantly finds new local minima, and small velocity changes when the method should be centered around a promising area for the global minimum. The second modification limits the number of local searches performed by the method through an asymptotic criterion based on derivative computations. The third modification introduces a new termination criterion based on the observation that, from some iteration onwards, the method will no longer be able to detect a new minimum, and therefore its termination should be considered. All of these modifications have low computational requirements.
The proposed modifications were applied to the PSO method either one by one or all together in combination. The purpose of the method is to find the global minimum of continuous functions using the smallest possible number of function calls. The experimental results showed that the modifications significantly reduce the number of function calls even when not used in combination. This means that they can be used individually and in other variations of PSO. The reduction in the number of function calls reaches up to 80%. In addition, the modifications did not reduce the ability of PSO to find the global minimum of the objective function. Finally, the first modification reduces the number of required calls, but only when the criterion for avoiding unnecessary local searches is not present.