Double-Group Particle Swarm Optimization and Its Application in Remote Sensing Image Segmentation

Particle Swarm Optimization (PSO) is a well-known meta-heuristic that has been widely used in both research and engineering fields. However, the original PSO generally suffers from premature convergence, especially on multimodal problems. In this paper, we propose a double-group PSO (DG-PSO) algorithm to improve the performance. DG-PSO uses a double-group based evolution framework: the individuals are divided into two groups, an advantaged group and a disadvantaged group. The advantaged group works according to the original PSO, while two new strategies are developed for the disadvantaged group. The proposed algorithm is firstly evaluated by comparing it with five other popular PSO variants and two state-of-the-art meta-heuristics on various benchmark functions. The results demonstrate that DG-PSO shows remarkable performance in terms of accuracy and stability. Then, we apply DG-PSO to multilevel thresholding for remote sensing image segmentation. The results show that the proposed algorithm outperforms five other popular algorithms in meta-heuristic-based multilevel thresholding, which verifies the effectiveness of the proposed algorithm.


Introduction
Particle Swarm Optimization (PSO) is an evolutionary optimization algorithm based on swarm intelligence. It was originally proposed by Kennedy and Eberhart in 1995 [1] and is known for its effectiveness and simplicity. It has proven outstanding in solving many complex optimization problems, such as power systems [2], neural network training [3], global path planning [4], and feature selection [5].
However, PSO also suffers from two limitations. One is that the original PSO tends to converge to local optima when applied to complex problems. The other is that the convergence speed of the original PSO and most of its variants is slow, especially on high-dimensional problems [6]. Therefore, accelerating the convergence and avoiding premature convergence to local optima have become the two most important and appealing goals in particle swarm optimization studies [7,8]. Specifically, the studies can be classified into three strategies: the parameter selection strategy, the topology strategy and the learning strategy.
The parameter selection refers to the optimization of the inertia weight factor, the convergence factor, and the acceleration constants. The inertia weight factor was introduced by Shi and Eberhart to improve the velocity update [9]. Further studies also show that applying linearly decreasing [10], nonlinear [11], exponential [12] and Gaussian [13] strategies to optimize the inertia weight can enhance the overall performance. The convergence factor was proposed by Clerc and Kennedy to enhance the final convergence [14]. In addition, detailed studies [15-17] show that the acceleration constants play an important role in convergence performance.
The topology strategy is generally employed to improve exploration and avoid premature convergence. In a topology strategy, individuals learn from a neighborhood rather than the whole swarm. Therefore, more information is shared during the search process, which is useful for improving optimization performance. A number of topologies, including the ring or circle topology, wheel topology, star topology, pyramid topology, Von Neumann topology and random topology, are suggested by Kennedy in [18]. Generally, a large neighborhood is good for simple problems, whereas a small neighborhood is helpful for avoiding premature convergence on complex problems [19]. Reference [20] studied topologies extensively, providing a useful guide for topology selection. It points out that an optimal topology is both problem-specific and computational-budget-dependent, and two formulas are introduced to estimate optimal topology parameters based on numerical experiments.
In the original PSO, all individuals keep learning from the global best solution and their individual best experience in the whole search process. This may lead to premature convergence [21]. To overcome the problem, some novel learning strategies have been developed in recent years. A comprehensive learning strategy is developed to improve the performance on complex multimodal functions in [22]. Reference [23] introduces a cooperative approach to solve high-dimensional optimization problems with multiple swarms. A cooperatively coevolving strategy is proposed in [24] to further improve the performance. Sun et al. introduce a global guaranteed convergence optimizer called quantum behaved particle swarm optimization, which improves the performance by increasing the population diversity [25]. A variant with double learning patterns is developed in [26], which employs the master swarm and the slave swarm with different learning patterns to achieve a trade-off between the convergence speed and the swarm diversity.
However, the three strategies above still face the following shortcomings. In parameter selection, some strategies do improve the overall performance in many cases, but the effect is limited [19], and it is hard to obtain an optimal parameter for all cases. In topology strategy and learning strategy, although the exploration is improved to avoid premature convergence, the convergence speed is reduced at the same time.
In this paper, we design a double-group particle swarm optimization (DG-PSO) algorithm to improve the performance. The whole population is divided into two groups: an advantaged group and a disadvantaged group. The modification is focused on the disadvantaged group. A novel learning strategy is developed based on the comprehensive learning strategy and the self-pollination strategy of another popular meta-heuristic called the Flower Pollination Algorithm (FPA). In addition, a diversity enhancing strategy is also designed to avoid premature convergence. Compared with the published works, the main contribution of this paper is a novel variant called DG-PSO, which shows remarkable performance compared with five other popular variants and two meta-heuristics. Two new ideas are developed in DG-PSO: a learning strategy, which combines the comprehensive learning strategy [22] and the self-pollination strategy [27], and a diversity enhancing strategy, which adds disturbance to the individuals in the disadvantaged group to avoid premature convergence on multimodal problems. In addition, we also apply the algorithm to multilevel thresholding for image segmentation, which verifies the effectiveness of DG-PSO and provides a good choice of meta-heuristic algorithm for implementing multilevel thresholding. The rest of the paper is organized as follows: Section 2 reviews the original PSO and some related works. The strategies and framework of the proposed algorithm are presented in detail in Section 3, followed by the experiments in Section 4. Then, the further application to multilevel thresholding for image segmentation is shown in Section 5.

Background
In this section, firstly, we outline the original PSO. Then, two basic works for our algorithm including the comprehensive learning strategy and the self-pollination strategy in FPA are introduced, respectively.

Particle Swarm Optimization
Similar to other meta-heuristics, PSO is based on swarm intelligence. The swarm is composed of a set of particles i ∈ [1, 2, . . . , n]. A particle moves in the search space with a velocity. The position and velocity of the particle are dynamically adjusted according to its own and its companions' historical experience. Each particle's position is associated with a candidate solution to the problem, and better solutions are obtained via evolution. The performance of a solution is judged by a given fitness function (e.g., smaller fitness function values indicate better solutions for a minimization problem). For a D-dimensional problem, there are four main vectors: the position x_i = [x_i^1, x_i^2, . . . , x_i^D], the velocity v_i = [v_i^1, v_i^2, . . . , v_i^D], the personal best position pbest_i, and the global best position gbest. In each generation, the velocity and position are updated using Equations (1) and (2):

v_i^d = w · v_i^d + c_1 · rand_1 · (pbest_i^d − x_i^d) + c_2 · rand_2 · (gbest^d − x_i^d), (1)

x_i^d = x_i^d + v_i^d, (2)

where w is the inertia weight, c_1 and c_2 are the acceleration constants, and rand_1 and rand_2 are random numbers uniformly distributed within [0, 1].

The Comprehensive Learning Strategy
The comprehensive learning strategy is proposed in CLPSO [22]. Instead of learning from gbest and its own pbest, each particle learns each dimension from the pbest of a particle identified by a tournament-based exemplar selection, i.e., the velocity is updated as Equation (3):

v_i^d = w · v_i^d + c · rand · (pbest_{f(i,d)}^d − x_i^d), (3)

where f(i, d) defines which particle's pbest the dth dimension of particle i should learn from. This strategy differs from the original PSO, and it has been well tested that CLPSO is effective in optimizing benchmark functions and real-world problems [22,28-31].
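As an illustration, the classic update of Equations (1) and (2) can be sketched in a few lines of Python; this is a minimal sketch, and the parameter values shown are common defaults from the PSO literature, not values taken from this paper:

```python
import random

def pso_update(x, v, pbest, gbest, w=0.7298, c1=1.49618, c2=1.49618):
    """One iteration of Equations (1) and (2) for a single particle."""
    new_v = [w * v[d]
             + c1 * random.random() * (pbest[d] - x[d])   # cognitive term
             + c2 * random.random() * (gbest[d] - x[d])   # social term
             for d in range(len(x))]
    new_x = [x[d] + v_d for d, v_d in enumerate(new_v)]
    return new_x, new_v
```

Note that when a particle sits at rest on its own pbest and on gbest, both attraction terms vanish, so the particle stays put; this is the fixed point toward which the swarm contracts.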

Self-Pollination Strategy in the Flower Pollination Algorithm
The flower pollination algorithm (FPA) is a popular nature-inspired meta-heuristic proposed in [27]. It has been widely used in many fields since being published in 2012, such as sizing optimization of truss structures [27], the economic load dispatch problem in power systems [32], Sudoku puzzles [33] and feature selection [34].
As a swarm-based meta-heuristic, each individual i in the swarm is called a pollen individual. Each pollen individual is associated with a candidate solution (sol_i = [sol_i^1, sol_i^2, . . . , sol_i^D]) in the search space. FPA searches using global and local search techniques, where the local search simulates the self-pollination process. The self-pollination strategy is one of the two basic ideas in FPA (the other is cross-pollination). Self-pollination occurs when there are no pollen vectors (pollen vectors, also called pollinators, can be very diverse; it is estimated that there are at least 200,000 varieties of pollen vectors, such as insects, bats and birds [27]) such as wind or insects, or when the pollen individuals are pollinated within the same plant. Such self-pollination behaviors are summarized in the following two rules:
1. Self-pollination corresponds to the local pollination.
2. Pollinators can develop flower constancy, which is regarded as a reproduction probability that is proportional to the similarity of the two flowers involved.
Based on the two rules above, the self-pollination strategy is drawn as Equation (4) shows. Different from PSO, sol_i is the only vector associated with each pollen individual. sol_i not only represents the position of the pollen individual i, but also plays the role of the best solution this individual has ever found (to understand what "sol" is, we can refer to the solutions in PSO such as the position x and the previous best solution pbest). It generates a new solution by using the previous one and two other solutions chosen randomly from the population:

sol_i = sol_i + ε · (sol_{r1} − sol_{r2}), (4)

where sol_{r1} and sol_{r2} are two random solutions in the current generation, which mimics the flower constancy in a limited neighborhood, and ε is a uniformly distributed random number within [0, 1] used to implement a local random walk. As rule 1 indicates, self-pollination is considered as local pollination, which often occurs in a limited neighborhood of the particle itself. It can be regarded as a local search around the current position of the pollen individual.
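The local random walk of Equation (4) can be sketched as follows; the function name is illustrative:

```python
import random

def self_pollination(sol_i, sol_r1, sol_r2):
    """Equation (4): a local random walk driven by two randomly chosen
    solutions, mimicking flower constancy in a limited neighborhood."""
    eps = random.random()  # uniformly distributed within [0, 1]
    return [s + eps * (a - b) for s, a, b in zip(sol_i, sol_r1, sol_r2)]
```

When the two randomly chosen solutions coincide, the step size is zero and the individual stays where it is, which reflects the purely local character of self-pollination.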

The Proposed Algorithm
In this section, we describe the proposed algorithm. Figure 1 shows the overall flowchart, where the processes colored yellow are the core idea of our algorithm. Different from the original PSO, we separate all particles into two groups in DG-PSO: an advantaged group (with the population x_1, x_2, . . . , x_m) and a disadvantaged group (with the population x_{m+1}, x_{m+2}, . . . , x_n, where n > m). The advantaged group evolves according to the same theory as the original PSO (Equations (1) and (2)), while the disadvantaged group is updated with two novel strategies: a learning strategy and a diversity enhancing strategy. We focus on the explanation of how the disadvantaged group works. As shown in Figure 2, the two new strategies work as two sequential processing stages in the update of the disadvantaged group, which will be discussed carefully in the following two subsections. In addition, the detailed steps and the whole framework of the proposed method are given in Section 3.3. Finally, we discuss and compare the proposed algorithm with other related works in Section 3.4.

The Learning Strategy
The learning strategy is based on the self-pollination strategy introduced in Section 2. We firstly employ the previous best solution pbest to be the solution "sol" in Equation (4) (rather than the position x, because pbest represents the best historical experience of each particle, which is more worthy of learning from than the position x). Then, it becomes Equation (5) for the particle i:

x_i^d = pbest_i^d + ε · (pbest_{r1}^d − pbest_{r2}^d), (5)

where i = m + 1, . . . , n denotes the particles in the disadvantaged group; pbest_{r1} and pbest_{r2} are two solutions chosen randomly from the pbest of the whole population (specifically, r1 and r2 are two random integers chosen from the sequence 1, 2, . . . , n (r1 ≠ r2); these two parameters keep the same for all dimensions when updating a particle i, and they are regenerated for different particles); ε represents the scaling factor used to perform a random walk, satisfying a uniform distribution within [0, 1]. Similar to the original self-pollination strategy, Equation (5) can be considered as a local search around the solution (position) pbest_i. On the other hand, as the comprehensive learning strategy generally defines a more suitable solution for the particles to learn from, we additionally replace the pbest_i^d in Equation (5) with pbest_{f(i,d)}^d, where f(i, d) ∈ [1, 2, . . . , m] is the strategy to identify a particle's pbest for the dth dimension of particle i to learn from:

x_i^d = pbest_{f(i,d)}^d + ε · (pbest_{r1}^d − pbest_{r2}^d), (6)

f(i, d) works according to the comprehensive learning strategy. For the dth dimension of particle i, the specific procedure to identify pbest_{f(i,d)}^d is as follows:
1. Randomly choose two particles out of the advantaged group;
2. Compare the fitness of the two particles' pbest and choose the better one;
3. Use the dth dimension of the winner's pbest as pbest_{f(i,d)}^d for the corresponding dimension of the ith particle to learn from.
Then, a new position is generated using Equation (6) for the particle i in the disadvantaged group to update. Using Equation (6), the particles in the disadvantaged group can learn from information derived from different particles' historical best positions. The strategy is different from the original self-pollination because we perform the local search around the newly generated position pbest_{f(i,d)}^d rather than around the particle itself. The reason is that always searching the area around the position itself may reduce the search efficiency, because some particles may be located in low-promising areas. In contrast, making more use of the good information from the advantaged group (via the comprehensive learning strategy) is conducive to the search efficiency.
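The three-step exemplar selection and the local walk of Equation (6) can be sketched in Python. This is a minimal sketch under our reading of the strategy; indices are 0-based (particles 0..m-1 form the advantaged group), and all names are illustrative:

```python
import random

def new_position(i, pbest, fpbest, m, n):
    """Sketch of the learning strategy (Equation (6)) for a disadvantaged
    particle i. pbest[j] is particle j's best position and fpbest[j] its
    fitness (minimization)."""
    r1, r2 = random.sample(range(n), 2)  # shared by all dimensions of particle i
    eps = random.random()                # scaling factor in [0, 1]
    pos = []
    for d in range(len(pbest[i])):
        a, b = random.sample(range(m), 2)          # step 1: two advantaged particles
        win = a if fpbest[a] < fpbest[b] else b    # step 2: the better pbest wins
        # step 3: learn the d-th dimension from the winner, then walk locally
        pos.append(pbest[win][d] + eps * (pbest[r1][d] - pbest[r2][d]))
    return pos
```

Because the exemplar is re-drawn per dimension while r1, r2 and ε are fixed per particle, the generated position mixes dimensions of several good solutions, which is the comprehensive-learning idea.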

The Diversity Enhancing Strategy
PSO often suffers from premature convergence, especially when optimizing multimodal problems. This is because the original PSO algorithm only employs an attraction phase (Equation (1)), in which all particles in the swarm move quickly to the same area and the diversity decreases quickly [35]. This generally leads to convergence to local optima due to the loss of diversity [22]. In such cases, improving diversity becomes an important issue in PSO research [22,36]. As diversity is lost due to particles getting clustered together [37], adding disturbance to the particles is helpful for them to escape from the local optima and enhance diversity. Therefore, we developed a strategy to push the particles away from their current positions by adding the disturbance given in Equation (7):

x_i^d = x_i^d + s · (rand_2 − 0.5), (7)

where s is the scaling factor that controls the intensity of the disturbance. As shown in Equation (8), it is identified using either the whole search range of the corresponding dimension (which denotes a strong disturbance) or the Euclidean distance between the two pbest chosen in the learning strategy (which denotes a relatively weak disturbance):

s = Ub − Lb, if rand_1 < 0.5; s = ||pbest_{r1} − pbest_{r2}||, otherwise, (8)

where rand_1 and rand_2 are two random numbers uniformly generated within [0, 1], and Ub and Lb represent the upper and lower bounds of the search space. The strong disturbance is designed for the case where the particle falls into a large-area local optimum, so a big jump is needed to escape. The weak disturbance is designed for the case where the particle is close to the global optimum; in such a case, a small random walk is more helpful for approaching the optimum. Specifically, the strategy works as follows. For each dimension of particle i, we generate a random number within [0, 1]. If the number is smaller than the given threshold P, the diversity of the corresponding dimension is enhanced by adding a random disturbance using Equations (7) and (8). With the disturbance, the particles are more capable of escaping from local optima and avoiding premature convergence.
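The stage can be sketched as below. This is a sketch under stated assumptions: the exact form of the disturbance step is our reading of Equations (7) and (8) (a symmetric random step scaled by either the search range or the distance between the two chosen pbest), and the bound clamping is an added safeguard, not part of the original description:

```python
import math
import random

def enhance_diversity(x_i, pbest_r1, pbest_r2, lb, ub, p):
    """Sketch of the diversity enhancing stage (Equations (7) and (8))."""
    weak = math.dist(pbest_r1, pbest_r2)  # Euclidean distance of the two pbest
    out = list(x_i)
    for d in range(len(out)):
        if random.random() < p:                              # enhance this dimension
            s = (ub - lb) if random.random() < 0.5 else weak # Equation (8)
            out[d] += s * (random.random() - 0.5)            # Equation (7)
            out[d] = min(ub, max(lb, out[d]))                # keep within bounds
    return out
```

With p = 1/D, on average one dimension per particle is disturbed each generation, so the learning result is perturbed without being destroyed.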

The Framework
Algorithm 1 shows the detailed steps of updating the disadvantaged group, which is the core of our modification. Apart from Algorithm 1, another minor modification in the proposed algorithm is that all particles in the two groups should be redistributed according to their fitness at the end of each generation. m particles with better fitness (for a minimization problem, "better" means "smaller") are distributed to the advantaged group, whereas the others are distributed to the disadvantaged group. The overall framework and the detailed steps are shown in Figure 1 and Algorithm 2, respectively, where MaxFEs is the maximum number of function evaluations, which represents the maximum computation cost.

Algorithm 1: Update of the disadvantaged group
1  For i = m + 1 : n
2    Randomly choose two pbest, pbest_r1 and pbest_r2, out of the whole population;
3    /* Learning stage */
4    For d = 1 : D
5      Generate two different integers a and b within [1, 2, . . . , m];
6      Compare the fitness of pbest_a and pbest_b, and take the better one as pbest_f(i,d);
7      Update the dth dimension of x_i using Equation (6);
8    End
9    /* Diversity enhancing stage */
10   For j = 1 : D
11     If rand < p
12       Draw a scaling factor using Equation (8);
13       Add disturbance to the current dimension using Equation (7);
14     End
15   End
16 End

Algorithm 2: The framework of DG-PSO
1  Initialize the positions and velocities of all particles, evaluate their fitness, and set fes = n;
2  While fes < MaxFEs
3    For i = 1 : m
4      Update the particle i in the advantaged group using Equations (1) and (2);
5    End
6    Evaluate the fitness of the advantaged group;
7    Update pbest and record the corresponding fitness as fpbest;
8    Update the disadvantaged group using Algorithm 1;
9    Evaluate the fitness of the disadvantaged group;
10   Update pbest and record the corresponding fitness as fpbest;
11   fes = fes + n;
12   Redistribute the whole population;
13 End
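A compact, runnable sketch of the whole framework is given below. It is a sketch under stated assumptions, not the authors' implementation: the disturbance form, bound handling and parameter defaults (w, c, the per-dimension weak scale) are illustrative choices consistent with the description above:

```python
import random

def dg_pso(f, dim, lb, ub, m=30, k=25, max_fes=20000, w=0.7298, c=1.49618, p=None):
    """Minimal DG-PSO main loop (Algorithm 2 sketch): m advantaged particles
    use the classic PSO update; k disadvantaged particles use the learning
    plus diversity-enhancing update; groups are redistributed by fitness."""
    n = m + k
    p = p if p is not None else 1.0 / dim
    x = [[random.uniform(lb, ub) for _ in range(dim)] for _ in range(n)]
    v = [[0.0] * dim for _ in range(n)]
    pb = [xi[:] for xi in x]
    fpb = [f(xi) for xi in x]
    fes = n
    while fes < max_fes:
        g = min(range(n), key=lambda i: fpb[i])  # global best index
        for i in range(n):
            if i < m:  # advantaged group: Equations (1) and (2)
                for d in range(dim):
                    v[i][d] = (w * v[i][d]
                               + c * random.random() * (pb[i][d] - x[i][d])
                               + c * random.random() * (pb[g][d] - x[i][d]))
                    x[i][d] = min(ub, max(lb, x[i][d] + v[i][d]))
            else:      # disadvantaged group: learning + diversity enhancing
                r1, r2 = random.sample(range(n), 2)
                eps = random.random()
                for d in range(dim):
                    a, b = random.sample(range(m), 2)
                    win = a if fpb[a] < fpb[b] else b
                    x[i][d] = pb[win][d] + eps * (pb[r1][d] - pb[r2][d])
                    if random.random() < p:  # Equations (7) and (8), per-dim scale
                        s = (ub - lb) if random.random() < 0.5 else abs(pb[r1][d] - pb[r2][d])
                        x[i][d] += s * (random.random() - 0.5)
                    x[i][d] = min(ub, max(lb, x[i][d]))
            fi = f(x[i])
            fes += 1
            if fi < fpb[i]:
                pb[i], fpb[i] = x[i][:], fi
        # redistribute: better pbest fitness moves to the advantaged group
        order = sorted(range(n), key=lambda i: fpb[i])
        x = [x[i] for i in order]; v = [v[i] for i in order]
        pb = [pb[i] for i in order]; fpb = [fpb[i] for i in order]
    return pb[0], fpb[0]
```

Note that redistribution is a simple sort on fpbest, so the advantaged group is always the m currently best particles, exactly as the text prescribes.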

Discussion and Comparison of the Proposed Algorithm with Other Related Works
As mentioned above, we combined the existing comprehensive learning strategy with the self-pollination strategy of FPA. Specifically, we firstly applied the self-pollination strategy to PSO. Then, the comprehensive learning strategy is used to identify an exemplar for the particles in the disadvantaged group to learn from. Note that we choose the exemplar in the advantaged group rather than in the whole swarm. This strategy aims to improve the learning efficiency of the disadvantaged group. Obviously, such a strategy is different from CLPSO (CLPSO uses comprehensive learning to modify the learning strategy of the original PSO, as introduced in Section 2, whereas we propose a new learning strategy).
Based on the analysis above, CLPSO and FPA are used in the comparison with the proposed algorithm. In addition, since we also developed a diversity enhancing strategy to further improve the performance, it is also necessary to evaluate its effectiveness. We firstly define:
1. dg-PSO: the proposed algorithm that only employs the learning strategy;
2. DG-PSO: the proposed algorithm that employs both the learning strategy and the diversity enhancing strategy.
Then, the effectiveness of the diversity enhancing strategy can be evaluated by comparing the performance of dg-PSO with DG-PSO.

Experiments on Benchmark Functions
In this section, we first describe the 20 benchmark functions used for performance evaluation. Then, the algorithms and the necessary parameters for comparison are introduced. Finally, the results are shown and discussed in detail.

The Benchmark Functions
The 20 benchmark functions employed in the experiments are presented in Table 1. All the functions are minimization problems, defined according to [38,39] in the search space [−100, 100]. The functions can be categorized into four classes, namely (1) basic problems; (2) rotated problems; (3) shifted problems; and (4) complex problems. The basic problems include not only the basic unimodal and multimodal problems, but also a noisy problem (F4), an expanded problem (F8) and an expanded hybrid problem (F9). The rotated problems are designed to overcome the drawback of the basic functions that the variables are separable and the local optima are regularly distributed. In these rotated problems, the original variable x is rotated by left-multiplying the orthogonal matrix M, i.e., y = M × x. The shifted problems are designed to overcome two other drawbacks of the basic functions: each dimension value of the global optimum is always the same, and the global optimum is usually located at the center of the search space. In addition, the complex problems include both rotation and shift. Table 2 shows the five PSO variants and two other popular meta-heuristics used in the comparison. These algorithms include not only the algorithms we mentioned before (CLPSO and FPA), but also some other state-of-the-art algorithms, which are chosen according to the three strategies introduced in Section 1. We give a brief description of them here. First, Modified PSO (MPSO) [36] uses a parameter-selection-based strategy, in which the population size and inertia weight are adaptively adjusted within the search process. Second, Unified PSO (UPSO) [40] and Fully Informed PSO (FIPS) [41] are two variants based on neighborhood topology strategies: UPSO is a combination of the original PSO and a topology-strategy-based PSO, while FIPS employs the fully informed neighborhood topology.
Finally, Fitness-Distance-Ratio PSO (FDR-PSO) [42] and CLPSO [22] are chosen from the learning-strategy-based variants, where FDR-PSO employs a fitness-distance-ratio strategy to identify a "fittest-and-closest" particle to modify the learning strategy. In addition, another novel meta-heuristic called Social Spider Optimization (SSO) [43] is also chosen to make the comparison as comprehensive as possible. Lastly, DG-PSO and dg-PSO are the proposed algorithms, where only DG-PSO has the diversity enhancing strategy.

Algorithms and Parameters
The parameters of the involved algorithms are set as follows. For dg-PSO and DG-PSO, the population sizes of the advantaged group and the disadvantaged group are set to 30 and 25, respectively; the probability p of diversity enhancing is set to 1/D. The population size for the other PSO variants is set to 40 [44], except for MPSO, which employs the adaptive population strategy (the initial value, minimum and maximum are 5, 5 and 40, respectively) [36]. Other parameters are listed in Table 2. We performed the evaluation in both 30 dimensions with MaxFEs = 4 × 10^5 [45] and 50 dimensions with MaxFEs = 7 × 10^5. Thirty runs are conducted for each function, and the mean fitness error and the corresponding standard deviation are calculated (the error is defined as the difference between the fitness function value and the minimum, i.e., Error = Fitness − F_min). All the experiments are carried out using MATLAB 2016 on the same machine with an Intel i5-4590 CPU @ 3.3 GHz (Intel, Santa Clara, CA, USA), 4.00 GB memory, and the Windows 7 Professional operating system (Microsoft, Redmond, WA, USA).

Results and Discussion
The mean fitness error values and the corresponding standard deviations are shown in Tables 3 and 4, respectively, where "Mean" represents the mean fitness error, of which the best one in each case is shown in bold, and "Std" means the standard deviation. We perform the Wilcoxon signed-rank test to give a rigorous comparison, in which the significance level is set to 0.05. The results are represented by "C" in the tables, where the three kinds of symbols indicate the performance of DG-PSO: "+" means DG-PSO is significantly better, "=" means the difference is insignificant, and "−" means DG-PSO is significantly worse. We sum the comparison results and show the final results in the form of "W/T/L" at the bottom of each table, where "W/T/L" means the number of problems on which DG-PSO wins, ties and loses, respectively, compared with the corresponding algorithm.
We firstly compare DG-PSO with the other published algorithms. According to the statistical results in Tables 3 and 4, DG-PSO achieves the best results on most of the benchmark problems. However, by comparing DG-PSO with dg-PSO, we can find that the diversity enhancing strategy also brings significant inefficiency to DG-PSO on the unimodal problems (F1, F2, F4, F10, F17 and F18). This is mainly because the diversity enhancing strategy weakens the exploitation.

Table 2. Parameters and references of the involved algorithms.

To rank the algorithms clearly, the Friedman test is used to compare the involved algorithms using all the mean fitness error values on the 20 problems. The Friedman test is the best-known procedure for testing the differences between more than two related samples [48], and it can detect significant differences between the behavior of two or more algorithms. We conduct two tests that rank the algorithms in 30D and 50D, respectively. The significance level is set to 0.05. Table 5 presents the numerical rankings obtained by the test. In addition, the corresponding graphical ranking results are shown in Figures 2 and 3, where the center square indicates the average rank of the corresponding algorithm and the line denotes the confidence interval. Smaller ranks mean better performance, and when there is no overlap between the intervals of two algorithms, they are significantly different. The results in these two figures clearly demonstrate that the proposed algorithm outperforms all the other algorithms, including dg-PSO, CLPSO and FPA, in both 30D and 50D.

For further evaluation, the convergence performance and average time consumption are also compared in Figures 4 and 5, respectively. The results of F8, F10 and F14 in 50D are given to exemplify the performance. From Figure 4, we observe that DG-PSO has outstanding performance on the multimodal problems (F8 and F14), while dg-PSO obtained the best result on the unimodal function F10.
From Figure 5, it can be found that DG-PSO consumes slightly more time than the two related algorithms, CLPSO and FPA. However, the time consumption of DG-PSO is still acceptable compared with the other algorithms such as FDR-PSO, UPSO, MPSO and SSO.

DG-PSO Based Remote Sensing Image Segmentation
Image segmentation is a fundamental task in remote sensing applications [49], such as change detection and object-based classification. It is used with the expectation that it will divide the image into semantically significant regions, or objects, to be recognized by further processing steps [50]. This problem has attracted many researchers over the past decade but is still intractable [51]. Among all the existing segmentation methods, one of the most popular techniques is thresholding, due to its simplicity, robustness and accuracy [52].
The thresholding methods can be divided into two categories: bi-level thresholding and multilevel thresholding. If the object in an image is separated from the background using a single threshold value, it is called bi-level thresholding. In contrast, multilevel thresholding means that the given image is classified into several different regions according to multiple thresholds. In remote sensing image segmentation, bi-level thresholding does not give appropriate performance, and there is a strong requirement for multilevel thresholding [53]. Therefore, numerous studies on multilevel thresholding have been reported [47,53-58].
The most popular way [53-61] to search for the optimal thresholds is to maximize some discriminating criterion (fitness function). The traditional method searches for the optimal thresholds using exhaustive search strategies, which leads to high computation costs. In recent years, meta-heuristic-based methods have gained the attention of researchers because of their high computational efficiency. Numerous algorithms have been introduced in this area, such as PSO [36], Differential Evolution (DE) [62], Artificial Bee Colony (ABC) [59,63,64], Wind Driven Optimization (WDO) [56], Cuckoo Search (CS) [65] and SSO [47]. However, remote sensing images are very difficult to segment accurately due to the multimodality of their histograms [53]. Therefore, improving the performance of the meta-heuristic algorithms is necessary for remote sensing image segmentation.
In this section, we apply the proposed algorithm to multilevel thresholding for optical remote sensing image segmentation. We first describe the problem. Then, the experimental setup is introduced in Section 5.2. Finally, the results and analysis are given in detail.

Problem Definition
This subsection gives the definition of the multilevel thresholding problem. As mentioned above, multilevel thresholding methods generally search for the optimal thresholds by maximizing some criterion. In the literature, Otsu's criterion [66] has been widely employed [36,67,68]. It generally provides satisfactory segmentation results [69], is known for its simplicity and effectiveness with respect to uniformity and shape measures, and can usually obtain the globally optimal threshold values [58].
Let l ∈ {0, 1, . . . , L − 1} be the gray level of a given image I, where L is the total number of gray levels. The problem is then defined as follows. Firstly, the image histogram is calculated and normalized, denoted by P_l, l = 0, 1, . . . , L − 1. For the (D + 1)-class thresholding problem, there are D thresholds k_d (d = 1, 2, . . . , D) that segment the image into D + 1 classes. Let k_0 = 0 and k_{D+1} = L denote the lower and upper bounds, respectively. The thresholds are then sorted as k_0 < k_1 < . . . < k_d < . . . < k_{D+1}, and the problem is defined using (9):

f(k_1, . . . , k_D) = ∑_{d=0}^{D} ω_d (µ_d − µ_T)^2,    (9)

where ω_d = ∑_{l=k_d}^{k_{d+1}−1} P_l is the probability of occurrence of the d-th class, µ_d = (1/ω_d) ∑_{l=k_d}^{k_{d+1}−1} l · P_l is its mean intensity, and µ_T = ∑_{l=0}^{L−1} l · P_l is the total mean intensity of the original image.
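Otsu's between-class variance criterion can be evaluated directly from a normalized histogram. The sketch below is an illustrative implementation of this fitness function (the function and variable names are our own, not from the paper):

```python
import numpy as np

def otsu_fitness(hist, thresholds, L=256):
    """Between-class variance for Otsu's multilevel criterion.

    hist: normalized histogram P_l, l = 0..L-1 (sums to 1).
    thresholds: D sorted integer thresholds k_1 < ... < k_D.
    Classes: [k_0, k_1), ..., [k_D, k_{D+1}) with k_0 = 0, k_{D+1} = L.
    """
    levels = np.arange(L)
    mu_T = np.sum(levels * hist)               # total mean intensity
    bounds = [0] + list(thresholds) + [L]
    f = 0.0
    for d in range(len(bounds) - 1):
        lo, hi = bounds[d], bounds[d + 1]
        omega = hist[lo:hi].sum()              # class probability
        if omega > 0:
            mu = (levels[lo:hi] * hist[lo:hi]).sum() / omega
            f += omega * (mu - mu_T) ** 2      # weighted class contribution
    return f

hist = np.zeros(256)
hist[0] = hist[255] = 0.5                      # two well-separated peaks
print(otsu_fitness(hist, [128]))
```

A meta-heuristic such as DG-PSO would call this function as the fitness to be maximized over the D threshold values.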

Experimental Setup
To demonstrate the superiority of the proposed method, five popular meta-heuristic algorithms in multilevel thresholding, including DE, ABC, CS, MPSO and SSO, are chosen for comparison with the proposed algorithm. All of these algorithms have been demonstrated to perform well in multilevel thresholding in the corresponding references in Table 6. Specifically, ABC performs better than PSO when the number of thresholds is higher than two [59]. Reference [53] demonstrates that CS shows remarkable performance in multilevel thresholding problems and can outperform other well-known algorithms such as DE, PSO, WDO and ABC. MPSO shows better performance than the Genetic Algorithm (GA) and the original PSO [36]. SSO is applied to multilevel thresholding in [47], where it clearly outperforms PSO, the BAT algorithm and FPA. The parameters of these algorithms are set according to the corresponding works shown in Table 6. The parameters of our proposed algorithm are the same as those in Section 4.
All populations are uniformly randomly initialized. Thirty independent runs are carried out for each algorithm on each image with 2, 3, 4, 5, 7, 9, 15 and 20 thresholds [68,69], respectively. All algorithms are run with the same maximum number of function evaluations, MaxFEs = 3000 · D, in the identical search space [0, 256). All methods are adapted to the integer optimization problem using the rounding method. Specifically, the search space is defined as [0, 256) for 8-bit gray-scale images, and the integer is obtained by rounding down (e.g., 255.6 is rounded to 255). Figure 6 shows the test images (these images are taken from a very-high-resolution remote sensing image dataset constructed by Gong Cheng et al. from Northwestern Polytechnical University [70]).
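The rounding adaptation described above can be sketched as follows (a minimal illustration with names of our own choosing):

```python
import numpy as np

def to_integer_thresholds(position, low=0, high=256):
    """Map a continuous candidate solution into integer thresholds.

    Values are clipped into the search space [low, high) and rounded
    down, e.g., 255.6 -> 255, as described in the experimental setup.
    """
    clipped = np.clip(position, low, high - 1)  # keep floor() inside [low, high)
    return np.floor(clipped).astype(int)

print(to_integer_thresholds(np.array([255.6, -3.2, 97.01])))
```

The continuous optimizer is left untouched; only the candidate solution is discretized before the fitness evaluation.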

Results and Discussion
In detail, the mean fitness and the corresponding standard deviation are given in Table 7, where the best mean fitness in each case is shown in bold. It can be seen that our algorithm obtains the best results in terms of the mean fitness in all cases except the 7-level thresholding (D = 7) of image C. To evaluate the effectiveness of our algorithm's improvement over the others, the involved algorithms are also ranked with the Friedman test. We conduct two tests that rank the algorithms on the normal levels (D = 2, 3, 4 and 5) and the high levels (D = 7, 9, 15 and 20), respectively; high-level thresholding is popularly employed in multilevel thresholding [68,69]. Therefore, 40 variables (i × t × m = 40) are used in each comparison in each test, where i = 5 is the number of images, t = 4 is the number of levels, and m = 2 is the number of measures used, including the mean fitness and the corresponding standard deviation. The significance level is set to 0.05. Table 8 and Figures 7 and 8 present the numerical rankings and graphical results obtained by the test, where better performance is denoted by smaller ranks. From the results of normal-level thresholding shown in Figure 7, the proposed algorithm significantly outperforms DE, ABC and MPSO, and also shows an advantage over the other two algorithms. It can be observed from Figure 8 that the proposed algorithm ranks even better in high-level thresholding, showing a significant difference from all algorithms except CS (our algorithm also ranks better than CS). Figures 9 and 10 show the segmentation results. The pseudo-color images show the whole thresholding results, where each level of the image is represented by regions of the same color. The binary images show some of the objects separated from the original images, which proves the effectiveness of the segmentation.
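The ranking procedure described above can be sketched with SciPy's Friedman test. The data here are synthetic placeholders, not the paper's measurements; only the 40 × 6 shape mirrors the 40 variables and six compared algorithms:

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# Illustrative measurements: rows are the 40 cases (images x levels x measures),
# columns are the six compared algorithms. Values here are synthetic.
rng = np.random.default_rng(0)
scores = rng.random((40, 6))

# Friedman test across the six algorithms at significance level 0.05.
stat, p = friedmanchisquare(*scores.T)

# Mean rank per algorithm; scores are negated so that larger (better)
# fitness values receive smaller ranks, matching "smaller rank is better".
mean_ranks = rankdata(-scores, axis=1).mean(axis=0)
print(p < 0.05, mean_ranks)
```

With the paper's actual measurements in place of `scores`, the resulting mean ranks correspond to the numerical rankings reported in Table 8.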
In conclusion, the results demonstrate that the proposed algorithm shows remarkable performance in multilevel thresholding when compared with other popular meta-heuristics in this research area.

Conclusions
This paper proposes a variant of particle swarm optimization called DG-PSO. DG-PSO uses a double-group-based evolution framework, in which the individuals are divided into two groups according to their fitness values. Two main ideas are introduced in the evolution of the disadvantaged group: a hybrid strategy for learning and a diversity-enhancing strategy for avoiding premature convergence. The experimental results on various benchmark functions demonstrate that, although DG-PSO consumes slightly more time than the two related algorithms CLPSO and FPA, it achieves a significant improvement in terms of mean fitness error, the corresponding standard deviation and convergence performance over all compared algorithms. In addition, we further apply the proposed algorithm to multilevel thresholding for remote sensing image segmentation. The results also show the effectiveness of DG-PSO.