A Grouping Differential Evolution Algorithm Boosted by Attraction and Repulsion Strategies for Masi Entropy-Based Multi-Level Image Segmentation

Masi entropy is a popular criterion for identifying appropriate threshold values in image thresholding. However, with an increasing number of thresholds, the efficiency of Masi entropy-based multi-level thresholding algorithms becomes problematic. To overcome this, we propose a novel differential evolution (DE) algorithm, an effective population-based metaheuristic, for Masi entropy-based multi-level image thresholding. Our ME-GDEAR algorithm benefits from a grouping strategy in which a clustering algorithm partitions the current population, and an updating strategy then incorporates the obtained cluster centres into the population. We further improve the algorithm using attraction (towards the best individual) and repulsion (from random individuals) strategies. Extensive experiments on a set of benchmark images convincingly show ME-GDEAR to give excellent image thresholding performance, outperforming other metaheuristics in 37 of 48 cases based on cost function evaluation, 26 of 48 cases based on the feature similarity index, and 20 of 32 cases based on Dice similarity. The obtained results demonstrate that population-based metaheuristics can be successfully applied to entropy-based image thresholding and that strengthening both exploitation and exploration, as performed in ME-GDEAR, is crucial when designing such an algorithm.


Introduction
Image segmentation is a challenging task in machine vision. It is the process of dividing an image into several non-overlapping areas based on features such as colour or texture. Image segmentation is used in a broad spectrum of applications including medicine [1,2], the modelling of microstructures [3], and food quality [4]. While a variety of image segmentation approaches have been proposed [5], and although deep learning methods have shown impressive performance for image segmentation tasks [6], techniques based on image thresholding remain popular due to their simplicity and robustness [7,8], as well as the fact that they do not require a training process. Image thresholding aims to find the threshold value(s) for an image using information from the histogram of the image. While bi-level image thresholding (BLIT) methods try to find a single threshold to discriminate between background and foreground, multi-level image thresholding (MLIT) approaches determine multiple threshold values to partition an image into several regions. MLIT is a challenging task and has thus attracted significant research attention [9][10][11][12]. Population-based metaheuristics (PBMHs), such as differential evolution (DE), have been successfully applied to optimisation problems from a broad spectrum of domains [37,40,41]. The canonical DE algorithm includes four main steps: initialisation, mutation, crossover, and selection. The pseudo-code of DE is given in Algorithm 1, whereas the main operators are described below.
Algorithm 1: Pseudo-code of the DE algorithm.

    Input : D: number of dimensions, NP: population size, F: scaling factor,
            C_R: crossover probability
    Output: x*: the best solution
    // Population initialisation
    Generate initial population randomly;
    Evaluate the fitness of each candidate solution in the population;
    Initialise generation counter iter = 1;
    while stopping criterion not met do
        for i ← 1 to NP do
            // Mutation
            Select parents x_{i1}, x_{i2}, and x_{i3} randomly from the current population;
            Generate mutant vector v_i using the mutation operator;
            // Crossover
            Generate trial vector u_i from x_i and v_i using binomial crossover;
            // Selection
            Evaluate u_i and keep the better of u_i and x_i in the next population;
        iter = iter + 1;
    x* ← best solution in the current population;

Initialisation
Similarly to other PBMHs, DE begins with a population of NP randomly generated individuals, where, for a D-dimensional problem, an individual is defined as x_i = (x_{i,1}, x_{i,2}, ..., x_{i,D}) ∈ R^D.

Mutation
Mutation creates a mutant vector based on differences among individuals. While there is a wide range of mutation operators, DE/rand/1 is popular and is defined as

    v_i = x_{r_1} + F · (x_{r_2} − x_{r_3}),    (1)

where x_{r_1} (called the base vector), x_{r_2}, and x_{r_3} are three mutually distinct individuals randomly selected from the current population, and F is a scaling factor.

Crossover
Crossover combines the mutant and parent vectors with the aim of enhancing the exploration of the population. Among the different crossover operators, binomial crossover is often chosen and is formulated as

    u_{i,j} = v_{i,j}  if rand_j ≤ C_R or j = j_rand,  and  u_{i,j} = x_{i,j}  otherwise,    (2)

where i = 1, ..., NP, j = 1, ..., D, u_i is called the trial vector, rand_j is a uniform random number in [0, 1], C_R is the crossover rate, and j_rand is a random integer in [1, D] that ensures at least one component is inherited from the mutant vector.

Selection
The selection operator chooses the better of the trial vector and the parent vector for inclusion in the next population, i.e., x_i is replaced by u_i only if u_i has a better objective function value.
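For concreteness, the four operators can be combined in a few lines of Python. The following is a minimal sketch of canonical DE/rand/1/bin under a minimisation convention (for maximisation, negate the objective); all function and variable names are illustrative rather than taken from our implementation:

    import numpy as np

    def de(obj, bounds, NP=50, F=0.5, CR=0.9, max_nfe=10_000, rng=None):
        """Minimal DE/rand/1/bin sketch; obj is minimised, bounds = (low, high)."""
        rng = np.random.default_rng(rng)
        low, high = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
        D = low.size
        # Initialisation: NP uniformly random individuals in [low, high]
        pop = low + rng.random((NP, D)) * (high - low)
        fit = np.array([obj(x) for x in pop])
        nfe = NP
        while nfe < max_nfe:
            for i in range(NP):
                # Mutation (DE/rand/1): three mutually distinct parents, all != i
                r1, r2, r3 = rng.choice(np.delete(np.arange(NP), i), 3, replace=False)
                v = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), low, high)
                # Binomial crossover: at least one component comes from the mutant
                j_rand = rng.integers(D)
                mask = rng.random(D) <= CR
                mask[j_rand] = True
                u = np.where(mask, v, pop[i])
                # Greedy selection between trial vector and parent
                fu = obj(u)
                nfe += 1
                if fu <= fit[i]:
                    pop[i], fit[i] = u, fu
        best = np.argmin(fit)
        return pop[best], fit[best]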

Clustering
Clustering is an unsupervised pattern recognition technique that divides a set of samples O into a number of groups such that samples in the same cluster are more similar to each other than to samples in different clusters. The main characteristics of a clustering algorithm are:
• each cluster should contain at least one sample: c_i ≠ ∅, i = 1, ..., K;
• the clusters should jointly cover all samples: ∪_{i=1}^{K} c_i = O; and
• distinct clusters should not share a sample: c_i ∩ c_j = ∅ for all i ≠ j.
Among the different clustering algorithms, k-means [42] is a simple yet effective approach that is widely employed. k-means proceeds in the following steps:
1. Select K initial cluster centres, e.g., as randomly chosen samples;
2. Assign each sample to the cluster with the nearest centre;
3. Recompute each cluster centre as the mean of the samples assigned to it;
4. Repeat Steps 2 and 3 until the cluster centres no longer change.
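The following is a minimal numpy sketch of these four steps; the function name and interface are illustrative:

    import numpy as np

    def kmeans(X, K, iters=100, rng=None):
        """Plain k-means: returns cluster centres and labels for samples X (n x d)."""
        rng = np.random.default_rng(rng)
        # Step 1: choose K initial centres randomly from the samples
        centres = X[rng.choice(len(X), K, replace=False)].copy()
        for _ in range(iters):
            # Step 2: assign each sample to its nearest centre (Euclidean distance)
            d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            # Step 3: recompute each centre as the mean of its assigned samples
            # (an empty cluster keeps its previous centre)
            new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                            else centres[k] for k in range(K)])
            # Step 4: stop when the centres no longer move
            if np.allclose(new, centres):
                break
            centres = new
        return centres, labels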

Grouping Strategy
We propose a grouping strategy, inspired by [43], for dividing the current population into groups. Our grouping strategy has two main operators: region creation and population update.

Region Creation
Our grouping strategy first creates a number of regions based on the k-means algorithm. Here, each cluster indicates a region, and the number of clusters is set to a random integer between 2 and √NP. Cluster centres are the means of the individuals in the same cluster, meaning that each cluster centre holds information about all individuals in its cluster. The cluster centres thus act as a form of multi-parent crossover. Figure 2 illustrates the process of region creation on a toy example.

Population Update
The cluster centres created above should be incorporated into the current population. To this end, we employ the generic population-based algorithm (GPBA) proposed in [43,44] to boost the performance of the algorithm. GPBA uses four operators to tackle optimisation problems, namely:
• Selection: randomly choose some individuals from the current population. This corresponds to choosing the initial samples in the k-means algorithm;
• Generation: create m individuals as set A. For this, ME-GDEAR selects the cluster centres as the new individuals, that is, the new individuals are generated by k-means clustering;
• Substitution: choose m individuals (set B) from the population for substitution. While there are various ways to select individuals from the population, ME-GDEAR uses random selection as a simple strategy;
• Update: from the union A ∪ B, the m best individuals are selected as B̂. The new population is then obtained as (P − B) ∪ B̂ (see the sketch after this list).
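A compact sketch of this grouping step, reusing the kmeans sketch above and assuming a minimisation convention, could look as follows; the function name and interface are our own illustration:

    import numpy as np

    def grouping_step(pop, fit, obj, rng=None):
        """One GPBA-style population update using k-means cluster centres,
        following the selection / generation / substitution / update operators."""
        rng = np.random.default_rng(rng)
        NP = len(pop)
        # Generation: k cluster centres (k random in [2, sqrt(NP)]) form set A
        k = rng.integers(2, max(3, int(np.sqrt(NP)) + 1))
        centres, _ = kmeans(pop, k, rng=rng)      # kmeans sketch from above
        fit_A = np.array([obj(c) for c in centres])
        # Substitution: k randomly chosen individuals form set B
        idx_B = rng.choice(NP, k, replace=False)
        # Update: keep the k best of A ∪ B and substitute them for B
        cand = np.vstack([centres, pop[idx_B]])
        cand_fit = np.concatenate([fit_A, fit[idx_B]])
        best_k = np.argsort(cand_fit)[:k]
        pop[idx_B], fit[idx_B] = cand[best_k], cand_fit[best_k]
        return pop, fit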

Clustering Period
In ME-GDEAR, clustering is not performed in every iteration. Instead, clustering is performed periodically [43,45], where the parameter C_P defines the clustering period. Selecting an effective clustering period is essential so that DE can form stable clusters.

Attraction and Repulsion Strategies
We introduce attraction and repulsion strategies into ME-GDEAR, inspired by the whale optimisation algorithm (WOA) [46], in order to explore the search space more effectively. These strategies are applied with probability P_r. Three operators are employed, which we explain below, with switching between them performed probabilistically.

Repulsion from Random Individuals
This operator moves an individual away from a randomly selected individual as

    D = |C · x_r − x_i|,    (3)
    x_i^{new} = x_r − A · D,    (4)

where x_r is a random individual selected from the current population, A is a coefficient whose magnitude is greater than 1, and C is a random number between 0 and 2.

Attraction towards the Best Individual
Here, each individual moves towards the best individual as

    D = |C · x_{best} − x_i|,    (5)
    x_i^{new} = x_{best} − A · D,    (6)

where x_{best} is the best individual in the current population, A is a coefficient whose magnitude is less than 1, and C is a random number between 0 and 2.

Attraction towards the Best Individual (Spirally)
This operator updates an individual along a spiral path as

    D′ = |x_{best} − x_i|,    (7)
    x_i^{new} = D′ · e^{b·l} · cos(2πl) + x_{best},    (8)

where x_{best} is the position of the best individual, b is a constant defining the shape of the logarithmic spiral, and l is a random number in [−1, 1].
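The three operators and the switching logic (see also Step 8 of the proposed algorithm below) can be sketched as follows; the coefficient computations follow the usual WOA conventions, and all names are illustrative (clipping the result to the search bounds is omitted for brevity):

    import numpy as np

    def attraction_repulsion(x, x_best, x_rand, nfe, nfe_max, b=1.0, rng=None):
        """One WOA-style update of individual x; b is the spiral constant."""
        rng = np.random.default_rng(rng)
        a = 2.0 - 2.0 * nfe / nfe_max          # decreases linearly from 2 to 0
        A = 2.0 * a * rng.random() - a         # |A| >= 1: repulsion, |A| < 1: attraction
        C = 2.0 * rng.random()                 # random number in [0, 2]
        if rng.random() < 0.5:
            if abs(A) >= 1:
                # repulsion from a randomly selected individual (exploration), Eq. (4)
                return x_rand - A * np.abs(C * x_rand - x)
            # attraction towards the best individual (exploitation), Eq. (6)
            return x_best - A * np.abs(C * x_best - x)
        # spiral attraction towards the best individual, Eq. (8)
        l = rng.uniform(-1.0, 1.0)
        return np.abs(x_best - x) * np.exp(b * l) * np.cos(2 * np.pi * l) + x_best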

Encoding Strategy
The encoding strategy determines the structure of each individual in the population. In ME-GDEAR, as illustrated in Figure 3, we employ a one-dimensional vector to encode the threshold values as

    x = (th_1, th_2, ..., th_D),

where D is the number of threshold values and th_i is the i-th threshold value.

Objective Function
The probability of occurrence of pixel intensity i is

    p_i = n_i / (M · N),  i = 0, 1, ..., L − 1,

where M and N are the dimensions of the image, L is the number of image intensities, and n_i is the number of pixels of intensity i. For our MLIT algorithm, given thresholds th_1 < th_2 < ... < th_D, the class likelihoods are computed as

    ω_0 = Σ_{i=0}^{th_1−1} p_i,  ω_k = Σ_{i=th_k}^{th_{k+1}−1} p_i (k = 1, ..., D−1),  ω_D = Σ_{i=th_D}^{L−1} p_i,

and the multi-level Masi entropy (MME) of each class is calculated as

    H_k = 1/(1−r) · log[1 − (1−r) Σ_{i ∈ class k} (p_i/ω_k) log(p_i/ω_k)],

where r is the value of the entropic parameter. Finally, we define the objective function, to be maximised, as

    f(th_1, ..., th_D) = Σ_{k=0}^{D} H_k.    (14)
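As an illustration, the objective can be computed directly from the image histogram. The sketch below follows the equations above, skips empty classes, and clamps the logarithm argument for numerical safety; the function name and the eps guard are our own additions:

    import numpy as np

    def masi_objective(hist, thresholds, r=1.2, eps=1e-12):
        """Multi-level Masi entropy of a histogram for given thresholds
        (a sketch of the objective in Equation (14); to be maximised)."""
        p = hist / hist.sum()                      # p_i = n_i / (M*N)
        edges = [0] + sorted(int(t) for t in thresholds) + [len(p)]
        total = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            pc = p[lo:hi]
            w = pc.sum()                           # class likelihood omega_k
            if w <= eps:                           # skip empty classes
                continue
            q = pc[pc > eps] / w                   # normalised class distribution
            s = np.sum(q * np.log(q))              # inner entropic sum
            arg = 1.0 - (1.0 - r) * s
            total += np.log(max(arg, eps)) / (1.0 - r)
        return total

For an 8-bit image, the histogram can be obtained, e.g., as np.bincount(image.ravel(), minlength=256).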

Proposed Algorithm
Our ME-GDEAR algorithm, which performs clustering-based DE boosted by attraction and repulsion strategies for Masi-entropy multi-level image segmentation, proceeds in the following steps:

1. Initialise the parameters, including population size NP, maximum number of function evaluations NFE_max, clustering period C_P, probability of the attraction and repulsion strategies P_r, and entropic parameter r. Set the current number of function evaluations NFE = 0 and the current iteration iter = 1.
2. Generate the initial population of size NP using uniformly distributed random numbers.
3. Calculate the objective function value of each individual in the population using Equation (14).
4. Update the number of function evaluations, NFE = NFE + NP.
5. For each individual, create a mutant vector using mutation and a trial vector using crossover.
6. Evaluate the trial vectors, apply selection between each trial vector and its parent, and update NFE accordingly.
7. If (iter % C_P == 0), go to Step 7a; otherwise, go to Step 8:
(a) Randomly generate k as a random integer between 2 and √NP;
(b) Perform k-means clustering and select the k cluster centres as set A;
(c) Select k individuals randomly from the current population as set B;
(d) From A ∪ B, select the best k individuals as B̂;
(e) Set the new population to (P − B) ∪ B̂.

8. If rand < P_r, go to Step 8a; otherwise, go to Step 9:
(a) Generate two random numbers, r1 and r2, between 0 and 1, and one random number, C, between 0 and 2;
(b) Set a = 2 − 2(NFE/NFE_max) and A = 2a·r1 − a;
(c) If rand < 0.5, go to Step 8d; otherwise, go to Step 8g;
(d) If |A| ≥ 1, go to Step 8e; otherwise, go to Step 8f;
(e) Apply the repulsion operator using Equation (4) and go to Step 9;
(f) Apply the attraction operator using Equation (6) and go to Step 9;
(g) Apply the spiral attraction operator using Equation (8).
9. Set iter = iter + 1.
10. If NFE > NFE_max, go to Step 11; otherwise, go to Step 5.
11. Select the best individual as the set of optimal threshold values.
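Putting the pieces together, a skeleton of the main loop, built from the illustrative helpers sketched earlier (masi_objective, grouping_step, and attraction_repulsion), might look as follows; this is a simplified sketch under the stated assumptions, not the exact reference implementation:

    import numpy as np

    def me_gdear(hist, D=5, NP=50, F=0.5, CR=0.9, CP=5, Pr=0.2,
                 nfe_max=10_000, r=1.2, rng=None):
        """Skeleton of the ME-GDEAR loop (illustrative)."""
        rng = np.random.default_rng(rng)
        L = len(hist)
        obj = lambda x: -masi_objective(hist, np.clip(x, 1, L - 1), r)  # minimise -MME
        pop = rng.uniform(1, L - 1, (NP, D))
        fit = np.array([obj(x) for x in pop])
        nfe, it = NP, 1
        while nfe < nfe_max:
            for i in range(NP):
                # standard DE/rand/1/bin step (mutation, crossover, selection)
                r1, r2, r3 = rng.choice(np.delete(np.arange(NP), i), 3, replace=False)
                v = pop[r1] + F * (pop[r2] - pop[r3])
                j = rng.integers(D)
                mask = rng.random(D) <= CR
                mask[j] = True
                u = np.where(mask, v, pop[i])
                fu = obj(u); nfe += 1
                if fu <= fit[i]:
                    pop[i], fit[i] = u, fu
            if it % CP == 0:                  # periodic grouping step (Step 7)
                # (evaluations inside the grouping step omitted from NFE for brevity)
                pop, fit = grouping_step(pop, fit, obj, rng)
            if rng.random() < Pr:             # attraction/repulsion step (Step 8)
                best = pop[np.argmin(fit)]
                for i in range(NP):
                    xr = pop[rng.integers(NP)]
                    pop[i] = attraction_repulsion(pop[i], best, xr, nfe, nfe_max, rng=rng)
                fit = np.array([obj(x) for x in pop]); nfe += NP
            it += 1
        best = np.argmin(fit)
        return np.sort(pop[best].astype(int)), -fit[best]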

Monte-Carlo Simulations
In our approach, clustering acts similarly to a multi-parent crossover. To analyse the effect of clustering on the algorithm's performance, we designed a set of Monte Carlo simulations. For this, we selected three representative images from the Berkeley image segmentation database [47], namely 147091, 101087, and 253027.
The golden region was defined as a hyper-sphere whose diameter spans the middle 60% of the shrunken search space and whose centre is the centre of the shrunken search space [48]. The lower and upper bounds of the shrunken search space are the minimum and maximum of the current population, respectively. An individual is located in the golden region if its distance to the centre point is less than the radius of the hyper-sphere. Points in the golden region are more likely to be close to an unknown optimum solution [48].
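Under one plausible reading of this definition (radius equal to 30% of the Euclidean extent of the shrunken space), membership of the golden region can be tested as follows; the function name and this geometric interpretation are our own assumptions:

    import numpy as np

    def in_golden_region(points, pop):
        """Boolean mask: which points lie inside the golden region of pop."""
        low, high = pop.min(axis=0), pop.max(axis=0)   # shrunken search space
        centre = (low + high) / 2.0                    # centre of the shrunken space
        radius = 0.3 * np.linalg.norm(high - low)      # 60% of the extent, halved
        return np.linalg.norm(points - centre, axis=1) < radius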
In the first simulation, we computed the percentages of cluster centres and of random individuals that are located in the golden region. In each iteration, a number of random individuals (based on the population size) were generated and tested for membership of the golden region, and the same was done for the cluster centres. Figure 4 gives the results (all simulations were repeated 10,000,000 times) and shows that the probability of a cluster centre falling in the golden region is much higher than that of a random individual, indicating that cluster centres are biased towards the centre of the golden region. In the next experiment, we calculated the distance between the centre of the golden region and the cluster centres, and between the centre of the golden region and random individuals. From Figure 5, which shows the results, we can observe that the cluster centres lie closer to the centre of the golden region than random individuals do. Finally, we evaluated the mean objective function value with and without our proposed grouping strategy to assess its effectiveness. Figure 6 shows that the mean objective function values are improved for all images, confirming that the grouping stage leads to improved thresholding performance.

Results and Discussion
In order to evaluate the performance of our proposed ME-GDEAR algorithm, we performed several experiments on a set of benchmark images which are widely used to test thresholding algorithms, namely Boats, Peppers, Goldhill, Lenna, and House, as well as seven images from the Berkeley image segmentation database [47]: 12003, 181079, 175043, 101085, 147091, 101087, and 253027. Figure 7 shows the images and their histograms. As we can see, the image histograms exhibit different characteristics; some images such as Lenna and Peppers have several peaks and valleys, others such as 175043 have only one peak, and images such as Goldhill have abrupt changes in the histogram. We compared ME-GDEAR with a number of population-based image thresholding algorithms, including Masi entropy-based differential evolution (ME-DE), the Masi entropy-based firefly algorithm (ME-FA), the Masi entropy-based bat algorithm (ME-BA), Masi entropy-based moth flame optimisation (ME-MFO), the Masi entropy-based dragonfly algorithm (ME-DA), and the Masi entropy-based whale optimisation algorithm (ME-WOA).
The population size and the number of function evaluations for all algorithms were set to 50 and 10,000, respectively. For ME-GDEAR, C_P and P_r were set to 5 and 0.2, respectively. For the other algorithms, we used the default parameter values, which are listed in Table 1. For all algorithms, the entropic parameter was set to 1.2. Each algorithm was run 25 times, and we report the average and standard deviation over these 25 runs.

Objective Function Results
We first compared the algorithms in terms of objective function values. Table 2 gives the results of all algorithms and all images for D = 3. For each image and algorithm, we give the average, standard deviation, and resulting rank (based on the average) of each algorithm. In addition, the average ranks and overall ranks are reported.
As we can see, ME-GDEAR is ranked first or second for 8 of the 12 images, leading to the first overall rank. ME-DE is ranked top for three images, while ME-FA gives the best results for two images, and these two algorithms give the second-best results overall. Table 3 reports the results for D = 4. ME-GDEAR is again clearly ranked first overall. Comparing Tables 2 and 3, we can observe that ME-DE drops from an average rank of 3.25 to 4.50, leading to an overall rank of 5 for D = 4. In contrast, ME-MFO is ranked second overall for D = 4, improving from its fourth rank for D = 3. For D = 5, similar results can be seen in Table 4. ME-GDEAR yields the first overall rank, while ME-MFO is ranked second. There is a clear difference between the average rank of ME-GDEAR (1.83) and that of ME-DE (4.17), which shows that our approach clearly outperforms canonical differential evolution. The curse of dimensionality is a challenging problem in optimisation, since increasing the number of dimensions expands the search space exponentially. To assess our proposed algorithm in higher dimensions, we compared ME-GDEAR for D = 10 against the other algorithms in Table 5. Our algorithm again yields the best results, being ranked first or second for 9 of the 12 images, while ME-BA is ranked second overall.
Overall, ME-GDEAR thus outperforms all other algorithms for all tested dimensionalities, indicating impressive multi-level image thresholding performance.

Feature Similarity Index Results
The feature similarity index measure (FSIM) [53] is a popular measure for evaluating image quality, based on two low-level features: phase congruency, which measures the significance of local structures, and gradient magnitude, which incorporates contrast information. Table 6 lists the FSIM results for D = 3, where we can see that our proposed algorithm is again ranked top overall. The same holds for D = 4, whose results are given in Table 7, and for D = 5, with results in Table 8. The results for the higher-dimensional problem with D = 10 are given in Table 9, from which we can see that ME-GDEAR maintains its efficacy and outperforms all other algorithms. Overall, ME-GDEAR thus also outperforms all other algorithms in terms of FSIM, and does so for all dimensionalities, confirming the efficacy of our proposed algorithm.

Dice Measure
We further performed an evaluation based on Dice similarity [54], which measures the overlap between two segmented images. Since the Dice measure requires a ground truth, we can only apply it to the images of the Berkeley segmentation dataset. As there are multiple manual segmentations for each image, we take the maximum obtained Dice score as our measure for comparison. Table 10 gives the results for D = 3 and shows ME-GDEAR to give the best Dice score for 5 of the 7 images and, consequently, the best average rank. Similar results are obtained for D = 4, D = 5, and D = 10, as can be observed from Tables 11-13, respectively.
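As an illustration, a per-label Dice score between a thresholded image and a manual segmentation can be computed as below; this sketch assumes that region labels have already been matched between the two segmentations:

    import numpy as np

    def dice(seg, gt):
        """Mean Dice similarity between a segmentation and a ground truth,
        averaged over the labels present in the ground truth."""
        scores = []
        for lab in np.unique(gt):
            a, b = (seg == lab), (gt == lab)
            # Dice = 2|A ∩ B| / (|A| + |B|) for each label
            scores.append(2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum()))
        return float(np.mean(scores))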

Statistical Tests
Owing to the random characteristics of PBMHs, we also performed statistical tests, based on objective function performance, to further assess the algorithms. In particular, we conducted two non-parametric statistical tests, the Wilcoxon signed rank test and the Friedman test [55]. The Wilcoxon signed rank test is a pair-wise test comparing two algorithms, while the Friedman test allows more than two algorithms to be evaluated simultaneously. The null hypothesis (H_0) states that there is no significant difference between the algorithms, while the alternative hypothesis (H_1) states that there is a difference. The significance level α determines the threshold for rejecting H_0: if the calculated p-value is lower than α, H_0 is rejected.
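Both tests are readily available in scipy.stats; the snippet below illustrates their use on placeholder data (the score values are random and purely illustrative):

    import numpy as np
    from scipy.stats import wilcoxon, friedmanchisquare

    # Illustrative only: each entry holds one algorithm's objective values over
    # the test cases (image/D combinations), here filled with random placeholders.
    rng = np.random.default_rng(0)
    scores = {name: rng.random(48) for name in
              ["ME-GDEAR", "ME-DE", "ME-BA", "ME-WOA"]}

    # Pair-wise Wilcoxon signed rank test against each competitor
    for name in ["ME-DE", "ME-BA", "ME-WOA"]:
        stat, p = wilcoxon(scores["ME-GDEAR"], scores[name])
        print(f"ME-GDEAR vs. {name}: p = {p:.3e}")

    # Friedman test across all algorithms simultaneously
    stat, p = friedmanchisquare(*scores.values())
    print(f"Friedman chi-squared = {stat:.2f}, p = {p:.3e}")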
The results of the Wilcoxon signed rank test between ME-GDEAR and the other algorithms are given in Table 14. From there, we can see that in all cases the obtained p-value is much smaller than α = 0.05, confirming that ME-GDEAR statistically outperforms the other algorithms.

Table 14. Results of the Wilcoxon signed rank test.

Comparison               p-Value
ME-GDEAR vs. ME-DE       5.8052 × 10^−5
ME-GDEAR vs. ME-BA       2.9061 × 10^−6
ME-GDEAR vs. ME-GWO      4.4433 × 10^−9
ME-GDEAR vs. ME-DA       9.4286 × 10^−5
ME-GDEAR vs. ME-MVO      1.1412 × 10^−7
ME-GDEAR vs. ME-WOA      3.6885 × 10^−9

The results of the Friedman test are given in Table 15. It is apparent that ME-GDEAR yields the lowest rank (1.96), with a wide margin over the second-ranked algorithm (ME-BA). The obtained p-value is negligible, confirming that there is a significant difference between the algorithms. The critical value for (8 − 1) = 7 degrees of freedom at a 0.05 significance level is 14.067 (from the chi-squared distribution table). The obtained chi-squared value of 87.6 is much higher than the critical value; in other words, H_0 is rejected.

Visual Evaluation
In this section, we visually compare the results of the algorithms. For this, we select (due to length restrictions) image 147091 for D = 5 and image 101087 for D = 10 as representative examples. Since the images are from the Berkeley segmentation dataset, several ground truth segmentations are available for each, although these are often quite different. Figure 8 shows the manual segmentations together with the images thresholded by all algorithms for image 147091 and D = 5. We can notice that our proposed algorithm segments the image with less noise, particularly in the cloudless parts of the sky. In contrast, some algorithms such as ME-BA and ME-WOA are unable to distinguish between the left vertical margin and its adjacent parts. Figure 9 shows the results for image 101087 and D = 10. Here, we can observe that some algorithms such as ME-WOA and ME-BA do not perform well, most noticeably in the sky region, while ME-GDEAR works significantly better and with less noise. Some algorithms such as ME-FA, ME-BA, and ME-WOA cannot properly segment the shadowed part of the lake; these algorithms split it into three different regions of almost equal proportions, while our proposed algorithm segments this part more reasonably into two partitions. It is worth noting that with our proposed method the distribution of the classes in the shadowed part is not uniform and most of it belongs to a single class, which is more in line with reality.

Effect of Parameters
In ME-GDEAR, we introduce two new parameters, namely C_P and P_r. To study their effect, we selected three representative images, 147091, 101087, and 253027, with D = 10. As shown in Figure 10, the performance depends strongly on C_P, so finding a good value for C_P is beneficial for achieving better thresholding; the best value was obtained for C_P = 5. Figure 11 shows the results for different values of P_r. As we can see, 0.2 is an appropriate value for this parameter.

Conclusions
Multi-level image thresholding remains a popular image segmentation approach. Its aim is to find optimal thresholds based on information available in the image histogram. In this paper, we proposed an improved differential evolution algorithm for MLIT based on Masi entropy. Our ME-GDEAR algorithm introduces (1) a grouping strategy into DE to cluster the population and use cluster information to update the population; and (2) attraction and repulsion strategies to more effectively update individuals. Experiments on a benchmark image set with different characteristics clearly demonstrate that ME-GDEAR outperforms other MLIT approaches.
One challenge of image thresholding algorithms is that they may not be widely used on their own, particularly for higher dimensions. However, they can be effectively employed as a pre-processing technique. For example, ref. [56] uses image thresholding as a pre-processing step for a subsequent graph cut segmentation algorithm. Therefore, in future work, we intend to integrate our approach with other image segmentation algorithms. Another limitation is that only the image histogram is used, thus ignoring spatial, two-dimensional image information such as texture.
Furthermore, some of the drawbacks of ME-GDEAR can be addressed in future work. For instance, it uses k-means to cluster the population, which can be time-consuming; methods with lower computational demand can thus be considered. In addition, as is common with other population-based metaheuristic algorithms, parameter tuning is a demanding task, and investigating mechanisms for automatic parameter tuning will be beneficial. Other planned future work includes the application of alternative objective functions to improve segmentation and a multi-objective variant of the algorithm.