Parallel Compact Differential Evolution for Optimization Applied to Image Segmentation

Abstract: A parallel compact Differential Evolution (pcDE) algorithm is proposed in this paper. The population is divided into multiple groups, and each group evolves a single individual using the compact Differential Evolution method. Communication is performed after a predefined number of iterations. Two communication strategies are proposed in this paper. The first one replaces the local optimal solution in all groups with the global optimal solution, which is called the optimal elite strategy (oe); the second one replaces the local optimal solution with the mean value of the local optimal solutions in all groups, which is called the mean elite strategy (me). Since the pcDE algorithm does not need to store a large number of solutions, it can adapt to environments with weak computing power. Numerical experiments demonstrate the effectiveness of the two proposed communication strategies for the pcDE. Finally, the proposed pcDE is applied to image segmentation, and experimental results also demonstrate the superior quality of the pcDE compared with some existing methods.


Introduction
Differential Evolution (DE) has proven to be a useful optimization algorithm in many areas [1][2][3]. In many practical application scenarios, high-performance computing equipment may be unavailable for various reasons, yet a variety of optimization problems still need to be solved. Traditional optimization algorithms often need to store many solutions and to evaluate each solution's objective value, and this computational burden conflicts with the limited capability of the computing equipment. In contrast, the compact evolutionary algorithm (cEA), which belongs to the class of estimation of distribution algorithms, was proposed to overcome this dilemma. A compact evolutionary algorithm uses only one individual solution together with the statistical characteristics of the population to optimize the problem at hand. Therefore, the method demands less computing power and memory than traditional methods.
Several versions of the compact evolutionary algorithm have been proposed. The compact genetic algorithm (cGA) [4] is the first compact variant of the genetic algorithm (GA): it processes only one solution, yet obtains results similar to the traditional genetic algorithm. The convergence of the compact genetic algorithm is proved in [5], and an extended compact genetic algorithm (ecGA) can be found in [6]. A study of the scalability of the extended compact genetic algorithm can be found in [7], and the compact genetic algorithm has also been applied to the training of neural networks [8].
A compact differential evolution (cDE) algorithm is proposed in [9]. The principle of the compact differential evolution algorithm is similar to that of the compact genetic algorithm, but there are two obvious differences. The first is the survivor selection scheme: cDE adopts one-to-one generation logic and selects survivors by comparing the fitness of parents and offspring, which differs from the typical selection mechanism of the genetic algorithm. The second difference is that the number of DE search moves is limited, which is likely to cause failure when trying to find the highest-quality solution (see [10][11][12]). To address this problem, randomness is added to the search logic of the basic DE algorithm; cDE introduces a certain amount of randomness naturally because it uses the statistical characteristics of the population. This feature makes it easier for the algorithm to find better solutions, but it also reduces stability.
In this paper, we propose an improved version of cDE, called the parallel compact differential evolution algorithm (pcDE). In cDE, only one individual is generated per iteration, which easily leads to large experimental error; the stability of the cDE algorithm is weak, and its experimental results fluctuate widely. In the pcDE algorithm, multiple groups run in parallel, and the results of the different groups are fused through communication strategies after a fixed number of iterations so as to obtain a reliable solution. Hence, the stability and efficiency of the algorithm are enhanced. It is found that the performance of pcDE is better than that of cDE.
Image segmentation is still a hot topic in image processing and computer vision research. The goal of image segmentation is to separate the target from its background, which is the basis of image classification and recognition. Frequently used image segmentation methods include threshold segmentation, region tracking, and edge detection, with threshold segmentation being the most common. At present, there are many threshold selection methods, such as the maximum inter-class variance method (Otsu method) [13], the best histogram entropy method (KSW entropy method) [14], and the minimum error thresholding method [15]. Most existing methods use the one-dimensional gray histogram to select the segmentation threshold. However, these segmentation methods produce many errors when the signal-to-noise ratio (SNR) of the image decreases. Therefore, image segmentation methods based on a two-dimensional gray histogram were developed; one such method is two-dimensional maximum entropy threshold segmentation [16,17]. It uses both the gray information and the related information of the neighborhood space, so its effect is significantly improved compared with the traditional methods. In this paper, different algorithms are used to select the segmentation threshold, and the results are analyzed and compared. The analysis of the experimental results shows that the threshold found by the pcDE algorithm is better for image segmentation compared with some existing methods.

Background
For an optimization problem, we employ an optimization algorithm to obtain an approximate optimal solution, usually the minimum value of the objective function f(X), that is, min f(X), where X = (x_1, x_2, x_3, ..., x_D) is the vector we want to solve for in the solution space. Optimization algorithms can be applied in a wide range of areas, such as wireless sensor networks [18][19][20][21][22][23][24], transportation optimization [25], and high-dimensional expensive problems [26]. The search range of the problem is normalized to [−1, 1]. To obtain the true target solution, we perform the following transformation on each component: x_j^true = lb_j + (x_j + 1)(ub_j − lb_j)/2, where lb_j and ub_j are the lower and upper bounds of the j-th decision variable in the original search space.
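As a minimal illustration of this decoding step (a sketch; the function name and bound symbols are ours, not the paper's):

```python
def decode(x_norm, lb, ub):
    """Map one normalized component from the internal search range
    [-1, 1] back to its true range [lb, ub]."""
    # Linear mapping: -1 -> lb, +1 -> ub.
    return lb + (x_norm + 1.0) * (ub - lb) / 2.0
```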

DE Algorithm
The original definition of DE can be found in [10,27]. A Gaussian bare-bones differential evolution algorithm can be found in [28]. The steps of the DE algorithm are as follows:
1. Initialization: X_i(0) represents the i-th individual of the 0-th generation, and x_i,j represents the j-th dimension of the i-th individual. The population is initialized through a uniform distribution, x_i,j(0) = rand(lb_j, ub_j), where rand() returns a uniformly distributed random number.
2. Mutation: The DE algorithm mutates individuals through a difference mechanism. A frequently used mechanism randomly selects two individuals from the population and adds their scaled vector difference to the individual to be mutated: X_off(g+1) = X_t(g) + F * (X_r(g) − X_s(g)), where X_t(g) is the individual to be mutated in the g-th generation, X_r(g) and X_s(g) are two individuals randomly selected from the g-th generation, F ∈ [0, 2] is a constant scaling factor, r, s, t ∈ [1, Np] with r ≠ s ≠ t, and X_off(g+1) is a temporary offspring produced by mutation. This difference mechanism is called rand/1/bin. Other difference mechanisms are proposed in [27].
3. Crossover: Binomial crossover is applied to each individual x_i(g) of the g-th generation and its temporary offspring x_off,i(g+1) to obtain the final offspring: u_i,j(g+1) = x_off,i,j(g+1) if rand() ≤ Cr or j = j_rand, and u_i,j(g+1) = x_i,j(g) otherwise, where Cr ∈ [0, 1] is a predefined constant called the crossover rate and j_rand is a randomly chosen dimension ensuring that at least one component is inherited from the mutant.
4. Selection: The DE algorithm uses a greedy scheme to choose individuals for the next generation: X_i(g+1) = U_i(g+1) if fit(U_i(g+1)) is better than fit(X_i(g)), and X_i(g+1) = X_i(g) otherwise, where fit() is the fitness function. During evolution, the individual with the higher fitness is preserved.
The pseudocode of the DE algorithm is shown in Algorithm 1.
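The four steps above can be sketched as a minimal DE/rand/1/bin loop for minimization (a simplified illustration; all names and parameter defaults are ours, not the paper's):

```python
import random

def de_minimize(fit, dim, pop_size, F=0.5, Cr=0.9, generations=200, seed=0):
    """Minimal DE/rand/1/bin sketch over the normalized range [-1, 1]."""
    rng = random.Random(seed)
    # 1. Initialization: uniform sampling in [-1, 1].
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    fits = [fit(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # 2. Mutation: x_off = x_t + F * (x_r - x_s), with r, s, t distinct.
            r, s, t = rng.sample([k for k in range(pop_size) if k != i], 3)
            mutant = [pop[t][j] + F * (pop[r][j] - pop[s][j]) for j in range(dim)]
            # 3. Binomial crossover with a guaranteed index j_rand.
            j_rand = rng.randrange(dim)
            trial = [mutant[j] if (rng.random() < Cr or j == j_rand) else pop[i][j]
                     for j in range(dim)]
            # 4. Greedy selection: keep the better of parent and trial.
            f_trial = fit(trial)
            if f_trial < fits[i]:
                pop[i], fits[i] = trial, f_trial
    best = min(range(pop_size), key=lambda k: fits[k])
    return pop[best], fits[best]
```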

Compact Differential Evolution Algorithm
The cDE algorithm inherits the probability vector (PV) and elite mechanism from rcGA [9,29]. In cDE, PV is an n × 2 matrix rather than a vector [30]: PV = [µ, σ], where n is the dimension of the problem and µ and σ are n × 1 vectors. µ_i and σ_i respectively represent the mean value and standard deviation of a normal distribution for the i-th design variable. The normal probability distribution function (PDF) is truncated on the interval [−1, 1]; to make the area under the PDF equal to 1, its height is normalized.
The sketch map of the sampling mechanism is shown in Figure 1. At the beginning of the cDE algorithm, µ is initialized to 0 and σ is initialized to 10. A solution is sampled from the solution space as the elite. During each iteration, a new temporary offspring is generated through the PV by sampling three individuals x_r, x_s, and x_t and applying mutation and crossover. The temporary offspring is compared with the elite: the one with higher fitness is recorded as the winner and the other as the loser. The PV update rules are:
µ(k+1) = µ(k) + (winner − loser) / Np,
σ(k+1)^2 = σ(k)^2 + µ(k)^2 − µ(k+1)^2 + (winner^2 − loser^2) / Np,
where the operations are performed element-wise and Np is the virtual population size. There are two different versions of cDE, called persistent elitism compact differential evolution (pe-cDE) and non-persistent elitism compact differential evolution (ne-cDE) (see [9]). In the pe-cDE algorithm, the elite is replaced by the temporary offspring only when the elite is the loser; in the ne-cDE algorithm, if no replacement has occurred after η (a predefined constant threshold) comparisons, the elite is replaced by the temporary offspring regardless of its fitness. The pseudocode of pe-cDE and ne-cDE is shown in Algorithms 2 and 3.
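A minimal Python sketch of pe-cDE, updating µ and σ from each winner/loser pair (clipping sampled values to [−1, 1] is a simplification of the truncated-PDF sampling described above; all names and parameter defaults are ours):

```python
import random

def pe_cde(fit, dim, np_virtual, iterations, F=0.5, Cr=0.9, seed=0):
    """Sketch of persistent-elitism cDE for minimization.
    PV = [mu, sigma]; truncation is approximated by clipping to [-1, 1]."""
    rng = random.Random(seed)
    mu = [0.0] * dim
    sigma = [10.0] * dim  # large initial sigma ~ near-uniform sampling

    def sample():
        return [max(-1.0, min(1.0, rng.gauss(mu[j], sigma[j]))) for j in range(dim)]

    elite = sample()
    f_elite = fit(elite)
    for _ in range(iterations):
        # rand/1 mutation using three virtual individuals sampled from the PV.
        xr, xs, xt = sample(), sample(), sample()
        mutant = [max(-1.0, min(1.0, xt[j] + F * (xr[j] - xs[j]))) for j in range(dim)]
        # Binomial crossover against the elite.
        j_rand = rng.randrange(dim)
        off = [mutant[j] if (rng.random() < Cr or j == j_rand) else elite[j]
               for j in range(dim)]
        f_off = fit(off)
        # Winner/loser comparison (lower objective wins).
        if f_off < f_elite:
            winner, loser = off, elite
            elite, f_elite = off, f_off  # persistent elitism
        else:
            winner, loser = elite, off
        # Element-wise PV update rules.
        for j in range(dim):
            mu_old = mu[j]
            mu[j] = mu_old + (winner[j] - loser[j]) / np_virtual
            var = (sigma[j] ** 2 + mu_old ** 2 - mu[j] ** 2
                   + (winner[j] ** 2 - loser[j] ** 2) / np_virtual)
            sigma[j] = max(var, 1e-10) ** 0.5
    return elite, f_elite
```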

Parallel Compact Differential Evolution
In this section, we propose a parallel cDE algorithm (pcDE). The purpose of parallel processing is to perform calculations on multiple processors at the same time. Parallelism can improve the speed of search and convergence and find better solutions faster. In [33], parallel particle swarm optimization (PPSO) and three different communication strategies were proposed. A new communication strategy for parallel gray wolf optimization is proposed in [34]. Parameter adaptive DE (PaDE) is proposed in [35], and a parallel heterogeneous meta-heuristic is proposed in [36]. It can be seen that parallelization performs much better than non-parallelization. This paper also proposes two parallel strategies. One is to replace the elites in all groups with the fittest elite and to replace the corresponding PV at the same time, called the optimal elite strategy (oe); the other is to replace the elites in all groups with their average and to replace the PV with the average PV over all groups, called the mean elite strategy (me). The difference mechanism rand/1/bin is adopted in this paper, and the two communication strategies are tested under the persistent elitism version. The algorithm flow is as follows:
1. Divide the problem into g independent groups and initialize the PV in each group, where PV_m = [µ_m, σ_m], m = 1, 2, ..., g.
2. According to the method described in Section 2, each group generates an elite.
3. Each group performs mutation and crossover.
4. Following the persistent elitism version, each group compares the generated offspring with its elite, obtains a new elite, and updates the PV according to the rules in Section 2.
5. After every θ (a predefined constant threshold) iterations, the groups communicate with each other. Taking the optimal elite strategy as an example, the fitness of the elite in each group is calculated first, and then the elite with the highest fitness replaces the other elites; the PV is also replaced with the optimal elite's PV.
6. Repeat steps 2-5 until the program ends.
The pseudocode of pcDE with the optimal elite strategy is shown in Algorithm 4, and the pseudocode of pcDE with the mean elite strategy is shown in Algorithm 5, where m ∈ [1, g].
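The two communication strategies can be sketched as follows (a minimal sketch assuming each group keeps its elite, the elite's fitness, and its PV components µ and σ in parallel lists; minimization is assumed, and all names are illustrative):

```python
def communicate_oe(elites, fits, mus, sigmas):
    """Optimal elite strategy (oe) sketch: copy the best group's elite
    and its PV (mu, sigma) into every group."""
    best = min(range(len(elites)), key=lambda m: fits[m])
    for m in range(len(elites)):
        elites[m] = list(elites[best])
        fits[m] = fits[best]
        mus[m] = list(mus[best])
        sigmas[m] = list(sigmas[best])

def communicate_me(elites, mus, sigmas):
    """Mean elite strategy (me) sketch: replace each group's elite and PV
    with the component-wise mean over all groups."""
    g, dim = len(elites), len(elites[0])
    mean_e = [sum(elites[m][j] for m in range(g)) / g for j in range(dim)]
    mean_mu = [sum(mus[m][j] for m in range(g)) / g for j in range(dim)]
    mean_sig = [sum(sigmas[m][j] for m in range(g)) / g for j in range(dim)]
    for m in range(g):
        elites[m] = list(mean_e)
        mus[m] = list(mean_mu)
        sigmas[m] = list(mean_sig)
```

In a full pcDE loop, one of these functions would be called every θ iterations, after which each group resumes its own compact evolution.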

Numerical Results
For each test function, we test the previously mentioned algorithms. Functions f1-f17 are tested with n = 10 and n = 30; f18-f25 are tested in their specified dimensions. Each algorithm is run independently 10 times. The number of parallel groups is set to 3. Each test is performed for 5000 * n iterations. The virtual population size Np is set to 2 * n. The mean of the 10 independent runs is used as the experimental result. Table 1 shows the experimental results of the two parallel strategies (persistent elitism version, F = 0.1, Cr = 0.1, rand/1/bin). Table 2 shows the experimental results of the two parallel strategies (non-persistent elitism version, F = 0.1, Cr = 0.1, rand/1/bin). The best results are highlighted in boldface. From Table 1, we can see that in the 17 tests with n = 10, pcDE achieves the best results in 15 tests, while cDE achieves only 2; when n = 30, the result is the same; in the 8 tests with various n, pcDE achieves the best results in 6 tests, while cDE achieves only 2. From Table 2, we can see that in the 17 tests with n = 10, pcDE achieves the best results in 16 tests, while cDE achieves only 1; when n = 30, pcDE achieves the best results in 14 tests, while cDE achieves only 3; in the 8 tests with various n, pcDE achieves the best results in all tests. From the above data, we can see that the performance of pcDE is significantly better than that of cDE. Figure 3 shows the performance trends of the pcDE and cDE algorithms. We also test the effects of the two communication strategies under three different schemes: 1. rand/1/bin, 2. best/1/bin, 3. rand-to-best/1/bin. Table 3 shows the test results, where '+' indicates that the parallel algorithm performs better than the original algorithm and '−' indicates the opposite. Figure 4 shows the performance trends of the pcDE and cDE algorithms under the three schemes.

Case of Study: Parallel Compact Differential Evolution for Image Segmentation
In this section, we employ the pcDE algorithm to implement image threshold segmentation and compare it with traditional methods. For an image Img(x, y) of size width × height with gray level L, the neighborhood average gray value g(x, y) over the (2n+1) × (2n+1) neighborhood of point (x, y) is:
g(x, y) = (1 / (2n+1)^2) Σ_{i=−n..n} Σ_{j=−n..n} Img(x + i, y + j).
We use the point gray value and the (2n+1) × (2n+1) neighborhood average gray value to establish a two-dimensional gray histogram, and then segment the image with a two-dimensional threshold (s, t). Let N_i,j be the number of pixels whose gray value is i and whose neighborhood average gray value is j, and let P_i,j = N_i,j / (width × height) be the corresponding probability; then {P_i,j, i, j = 1, 2, 3, ..., L} is the two-dimensional gray histogram of the image, which is shown in Figure 5. Figure 5 shows that the peaks of the probability of the point-neighborhood average gray pairs are mainly distributed near the diagonal of the plane, and the overall appearance is bimodal. This is because the target and the background account for the largest proportion of all pixels, and the gray values of the pixels inside the target area and the background area are relatively uniform. The gray values of such points are similar to their neighborhood average gray values, so they are concentrated near the diagonal; the two peaks represent the target and the background, respectively. Away from the diagonal, the height of the histogram drops sharply; this part reflects image noise, edges, and other information. A two-dimensional histogram XOY plan view is shown in Figure 6. Regions A and B correspond to the target and background, and regions C and D to the edges and noise. The optimal threshold is determined by the pcDE algorithm to maximize the amount of information that truly represents the target and background.
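The construction of the two-dimensional gray histogram can be sketched as follows (a simplified sketch that skips border pixels and rounds the neighborhood mean down; function and variable names are illustrative):

```python
def two_d_histogram(img, n=1, levels=256):
    """Build the 2-D (gray value, neighborhood-mean gray value) histogram
    P[i][j] of an image given as a list of rows of integer gray values.
    Border pixels without a full neighborhood are skipped for simplicity."""
    h, w = len(img), len(img[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for x in range(n, h - n):
        for y in range(n, w - n):
            # (2n+1) x (2n+1) neighborhood average gray value, rounded down.
            s = sum(img[x + i][y + j]
                    for i in range(-n, n + 1) for j in range(-n, n + 1))
            g = s // (2 * n + 1) ** 2
            counts[img[x][y]][g] += 1
            total += 1
    # Normalize counts into probabilities P[i][j].
    return [[c / total for c in row] for row in counts]
```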
Assuming that the target region and the background region have different probability distributions, we normalize each region with the posterior probability of the target and the background, so that the entropies of the regions are additive. Suppose the threshold is (s, t); then the discrete two-dimensional entropy is:
H = − Σ_i Σ_j P_i,j ln P_i,j.
The two-dimensional entropy functions of the target and the background are defined as:
H(A) = ln P_A + H_A / P_A, H(B) = ln(1 − P_A) + (H_L − H_A) / (1 − P_A),
where P_A = Σ_{i=1..s} Σ_{j=1..t} P_i,j, H_A = − Σ_{i=1..s} Σ_{j=1..t} P_i,j ln P_i,j, and H_L is the entropy when (s, t) takes the maximum gray value, computed in the same way as H_A. The evaluation function is:
φ(s, t) = H(A) + H(B). (13)
Using formula (13) as the evaluation function, the pcDE algorithm calculates the fitness value at different thresholds, and finally we obtain the optimal threshold. Table 4 shows the final fitness and standard deviation for the Lena image (Figure 7) tested with four algorithms; s and t represent the final two-dimensional segmentation threshold (s, t ∈ [0, 255]). The performance comparison among these four algorithms is shown in Figure 8. In the following, we segment a medical image of a lung and compare the experimental results of the Otsu method, the minimum error thresholding method, and pcDE. The main steps are: 1. Image preprocessing: enhance the contrast of the image to achieve better segmentation. 2. Calculate the optimal threshold for image segmentation with the different methods to obtain the binarized image. 3. Use morphological methods to process the images and extract the target. 4. Mark the target contour. Figure 9 shows the experimental results, in which (a)-(c) are the images after threshold segmentation, (d) and (e) are the images after morphological processing, (g) is the original image, and (h) is the contrast-enhanced image. Figure 10 shows the contour-marked image. From the above experimental results, we can see that the image segmented via the pcDE algorithm is much closer to reality than those of the other algorithms.
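The two-dimensional entropy evaluation of a candidate threshold (s, t) can be sketched in Python as follows (a sketch of the criterion described above; a small ε guards the logarithms, and zero-indexed thresholds are assumed):

```python
import math

def entropy_fitness(P, s, t):
    """Two-dimensional maximum-entropy criterion H(A) + H(B) for a
    threshold (s, t), where P is a levels x levels probability matrix
    (names are illustrative)."""
    L = len(P)
    eps = 1e-12  # guards log(0) for empty histogram cells
    # Probability mass and entropy of the target region A (i <= s, j <= t).
    p_a = sum(P[i][j] for i in range(s + 1) for j in range(t + 1))
    h_a = -sum(P[i][j] * math.log(P[i][j] + eps)
               for i in range(s + 1) for j in range(t + 1))
    # Total entropy H_L over the whole histogram.
    h_l = -sum(P[i][j] * math.log(P[i][j] + eps)
               for i in range(L) for j in range(L))
    if p_a < eps or 1.0 - p_a < eps:
        return float('-inf')  # degenerate threshold: one region is empty
    return (math.log(p_a) + h_a / p_a
            + math.log(1.0 - p_a) + (h_l - h_a) / (1.0 - p_a))
```

A search algorithm such as pcDE would maximize this function over the integer grid of (s, t) values.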
On the basis of the above experiments, we add Gaussian noise (µ = 1.00 × 10^−5, σ = 1.00 × 10^−1) to the original picture and re-run the experiment. Figure 11 shows the experimental results, in which (a)-(c) are the images after threshold segmentation, (d) and (e) are the images after morphological processing, (g) is the original image, and (h) is the noisy image. Figure 12 shows the contour-marked image.
Several frequently used metrics in medical image segmentation are the Dice coefficient (DICE), volumetric overlap error (VOE), relative volume difference (RVD), average symmetric surface distance (ASD), and maximum symmetric surface distance (MSD) (see [41,42]). This paper uses DICE, the most frequently used of these metrics, to evaluate the quality of the image segmentation results. When the segmentation is optimal, the value of DICE is 1. The mathematical definition is:
DICE = 2 |R_seg ∩ R_gt| / (|R_seg| + |R_gt|),
where R_seg represents the segmentation result produced by the algorithm and R_gt represents the ground-truth segmentation (Figure 13). Table 5 shows the experimental numerical results. Analyzing the experimental results, we can see that, in the absence of noise, the minimum error thresholding and pcDE algorithms achieve a better segmentation effect, while Otsu's segmentation effect is weak. After Gaussian noise is added, the pcDE algorithm can still achieve a good segmentation effect, while the minimum error thresholding result becomes unsatisfactory. Considering the two situations, pcDE performs better than the other two methods with or without noise and has broad prospects for application.
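The DICE computation for binary masks can be sketched as follows (an illustrative sketch; masks are assumed to be equally sized lists of 0/1 rows):

```python
def dice_coefficient(seg, gt):
    """DICE = 2 * |seg AND gt| / (|seg| + |gt|) for binary masks."""
    # Count overlapping foreground pixels.
    inter = sum(a & b for row_s, row_g in zip(seg, gt)
                for a, b in zip(row_s, row_g))
    # Total foreground pixels in both masks.
    size = sum(sum(r) for r in seg) + sum(sum(r) for r in gt)
    return 2.0 * inter / size if size else 1.0
```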

Conclusions
This paper proposes a parallel compact DE algorithm. The parallel communication strategy achieves faster convergence and can find a better solution in a short time. Two different parallel communication strategies, the optimal elite strategy and the mean elite strategy, are proposed and compared; in general, the former is better than the latter. Compared with the previous algorithms, the parallel algorithm has obvious advantages in performance and stability. It is applied to image threshold segmentation, where the threshold found by the pcDE algorithm yields a better segmentation effect, which benefits subsequent image processing operations. The algorithm can adapt to environments with weak computing power, such as medical equipment and micro robots. In these environments, the pcDE algorithm can still maintain good performance and stability and achieve a balance between accuracy and speed. It has good application prospects for optimization problems that need to be solved in environments with weak computing power.