Kapur’s Entropy for Color Image Segmentation Based on a Hybrid Whale Optimization Algorithm

In this paper, a new hybrid whale optimization algorithm (WOA) called WOA-DE is proposed to better balance the exploitation and exploration phases of optimization. Differential evolution (DE) is adopted as a local search strategy to enhance the exploitation capability. The WOA-DE algorithm is then applied to multilevel color image segmentation, which can be regarded as a challenging optimization task. Kapur’s entropy is used as the objective function to obtain an efficient image segmentation method. To evaluate the performance of the proposed algorithm, different images are selected for experiments, including natural images, satellite images, and magnetic resonance (MR) images. The experimental results are compared with state-of-the-art meta-heuristic algorithms as well as conventional approaches. Several performance measures are used: average fitness value, standard deviation (STD), peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), feature similarity index (FSIM), Wilcoxon’s rank sum test, and the Friedman test. The experimental results indicate that the WOA-DE algorithm is superior to the other meta-heuristic algorithms. In addition, to show the effectiveness of the proposed technique, the Otsu method is used for comparison.


Introduction
Image segmentation is a fundamental technique in image processing, computer vision, and pattern recognition. Its purpose is to partition a given image into specific regions with unique characteristics and then extract the objects of interest [1][2][3][4]. Hence, the segmentation technique adopted determines the performance of the higher-level systems mentioned above [5]. At present, the main image segmentation techniques are edge-based, region-based, neural network-based, wavelet transform-based, and threshold-based [6][7][8][9][10]. Among these, the threshold-based technique (thresholding) is the most popular, and many scholars have worked in this domain.
More specifically, the thresholding technique determines the segmentation thresholds by optimizing some criterion, such as the maximum between-class variance or one of various entropy criteria [11]. In 1985, Kapur et al. maximized the histogram entropy of the segmented classes to obtain the optimal threshold values, which is known as Kapur's entropy technique [12]. This thresholding technique has been adopted extensively and shows remarkable performance in many image segmentation problems. However, for complex image segmentation problems, a large number of thresholds significantly increases the computational complexity of the algorithm. Thus, scholars have introduced various meta-heuristic algorithms into this domain with a view to reducing computational complexity and improving segmentation accuracy. Shen et al. [13] proposed a modified flower pollination ...

$$H(th_1, th_2, \ldots, th_n) = H_0 + H_1 + \ldots + H_n \tag{1}$$

where $H_0, H_1, \ldots, H_n$ denote the entropies of the distinct classes and $\omega_0, \omega_1, \ldots, \omega_n$ are the probabilities of the classes. In order to obtain the optimal threshold values, the fitness function in Equation (5) is maximized:

$$f_{Kapur}(th_1, th_2, \ldots, th_n) = \arg\max \{H(th_1, th_2, \ldots, th_n)\} \tag{5}$$

It is worth noting that the computational complexity of the thresholding technique above grows exponentially as the number of thresholds increases. Under such circumstances, Kapur's entropy method is not very effective for multilevel thresholding. Therefore, the WOA-DE-based method using Kapur's entropy is proposed to improve the accuracy and computation speed of thresholding techniques. The ultimate goal of the proposed method is to determine the optimal threshold values by maximizing the objective function given in Equation (1).
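As an illustration of the objective in Equation (1), the following minimal sketch evaluates the sum of class entropies for one channel histogram. The per-class entropy used here, $H_k = -\sum_i (p_i/\omega_k)\ln(p_i/\omega_k)$, is the usual Kapur definition and is an assumption, since the class-entropy equations are not reproduced in this text; the function name `kapur_entropy` and the half-open class intervals are likewise illustrative choices.

```python
import math

def kapur_entropy(hist, thresholds):
    """Sum of class entropies H_0 + ... + H_n for one channel histogram."""
    total = sum(hist)
    probs = [h / total for h in hist]
    # Assumed class intervals: [0, th_1), [th_1, th_2), ..., [th_n, L)
    bounds = [0] + sorted(thresholds) + [len(hist)]
    entropy = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        omega = sum(probs[lo:hi])  # class probability omega_k
        if omega <= 0.0:
            continue  # an empty class contributes nothing
        for p in probs[lo:hi]:
            if p > 0.0:
                entropy -= (p / omega) * math.log(p / omega)
    return entropy

# A flat 4-bin histogram: the balanced split has the larger entropy.
balanced = kapur_entropy([1, 1, 1, 1], [2])
unbalanced = kapur_entropy([1, 1, 1, 1], [1])
```

For a color image, one plausible use is to evaluate this function separately on each of the three channel histograms; an optimizer then searches for the threshold vector that maximizes it.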

Whale Optimization Algorithm
The whale optimization algorithm, which was proposed by Mirjalili and Lewis in 2016, is inspired by the foraging behavior of humpback whales in nature [21]. Humpback whales tend to create spiral bubbles and then swim toward the prey along the trajectory of the bubbles (see Figure 1) [25]. The encircling prey and bubble-net attacking behaviors represent the exploitation phase of optimization. The other phase of optimization, namely exploration, is represented by the search for prey behavior. It is worth noting that the position vector of a search agent is defined in a d-dimensional space, where d denotes the number of decision variables of the optimization problem. Thus, the population X of n search agents can be represented by an (n × d)-dimensional matrix, which is shown in Equation (6):

$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,d} \end{bmatrix} \tag{6}$$

Exploitation Phase (Encircling Prey and Bubble-Net Attacking Method)
In the process of hunting, the humpback whales first encircle the prey, which can be represented as follows:

$$D = |C \cdot X^*(t) - X(t)| \tag{7}$$

$$X(t + 1) = X^*(t) - A \cdot D \tag{8}$$

where $X^*$ represents the best solution obtained so far, $X$ denotes the position vector, $t$ is the current iteration, $|\,|$ is the absolute value, $\cdot$ is an element-by-element multiplication, and $A$ and $C$ are two essential parameters that can be evaluated by:

$$A = 2a \cdot r - a \tag{9}$$

$$C = 2 \cdot r \tag{10}$$

where $r$ is a random number in the range [0,1] and $a$ is a parameter that decreases linearly from 2 to 0 over the whole iterative process (both exploration and exploitation). It can be observed from Equation (8) that search agents update their position $X(t)$ according to the best solution $X^*$. The parameters $A$ and $C$ determine the distance between the updated position $X(t + 1)$ and the optimal position $X^*$.

The bubble-net attacking behavior can be mathematically represented by the following equation:

$$X(t + 1) = D \cdot e^{br} \cdot \cos(2\pi r) + X^*(t) \tag{12}$$

where $D$ is the distance between the current search agent position and the optimal position, $b$ is a constant that determines the shape of the logarithmic spiral, and $r$ is a random number in the range [−1,1]. To switch between these two mechanisms of the exploitation phase (encircling prey and bubble-net attacking), each mechanism is assumed to be executed with 50% probability. Thus, the mathematical model of the entire exploitation phase can be expressed as:

$$X(t + 1) = \begin{cases} X^*(t) - A \cdot D & \text{if } p < 0.5 \\ D \cdot e^{br} \cdot \cos(2\pi r) + X^*(t) & \text{if } p \geq 0.5 \end{cases} \tag{13}$$

where $p$ is a random number in the range [0,1].
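The exploitation update in Equation (13) can be sketched for a single search agent as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard WOA definitions A = 2a·r − a and C = 2·r, draws A and C once per agent rather than per dimension, takes the spiral distance as |X* − X|, and sets b = 1. The function name `exploit_step` is hypothetical.

```python
import math
import random

def exploit_step(x, best, a, b=1.0, rng=random):
    """One exploitation update of a single agent, following Eq. (13)."""
    if rng.random() < 0.5:
        # Encircling prey: D = |C * X* - X|, then X(t+1) = X* - A * D
        A = 2.0 * a * rng.random() - a  # assumed standard WOA form
        C = 2.0 * rng.random()          # assumed standard WOA form
        return [bj - A * abs(C * bj - xj) for xj, bj in zip(x, best)]
    # Bubble-net attack: logarithmic spiral around the best agent
    r = rng.uniform(-1.0, 1.0)
    spiral = math.exp(b * r) * math.cos(2.0 * math.pi * r)
    return [abs(bj - xj) * spiral + bj for xj, bj in zip(x, best)]

random.seed(0)
new_position = exploit_step([40.0, 120.0], [64.0, 128.0], a=1.0)
```

Both branches pull the agent toward the best position: when the agent already coincides with the best agent and a = 0, either branch returns the best position unchanged.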

Exploration Phase (Search for Prey)
In order to enhance the exploration capability of the algorithm, a global search strategy is utilized. The search agents update their position according to a random agent in the population rather than the best solution obtained so far. It is worth mentioning that the absolute value of $A$ determines which phase of optimization is selected, namely exploration or exploitation. Thus, the search for prey behavior can be mathematically represented as follows:

$$D = |C \cdot X_{rand} - X(t)| \tag{14}$$

$$X(t + 1) = X_{rand} - A \cdot D \tag{15}$$

where $X_{rand}$ denotes a random individual in the current population. The pseudo code of the traditional whale optimization algorithm for multilevel thresholding is given in Algorithm 1.
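The exploration step and the role of |A| in selecting the phase can be sketched as below. The sketch assumes the standard WOA forms A = 2a·r − a and C = 2·r with one draw per agent, and a linear schedule for a; the helper names `a_schedule`, `explore_step`, and `exploration_possible` are hypothetical.

```python
import random

def a_schedule(t, t_max):
    """Linear decrease of a from 2 to 0 over the run."""
    return 2.0 - 2.0 * t / t_max

def explore_step(x, population, a, rng=random):
    """Move agent x relative to a randomly chosen agent, Eqs. (14)-(15)."""
    x_rand = rng.choice(population)
    A = 2.0 * a * rng.random() - a  # assumed standard WOA form
    C = 2.0 * rng.random()          # assumed standard WOA form
    return [rj - A * abs(C * rj - xj) for xj, rj in zip(x, x_rand)]

def exploration_possible(t, t_max):
    """|A| >= 1 requires a >= 1, because |A| never exceeds a."""
    return a_schedule(t, t_max) >= 1.0

random.seed(0)
population = [[30.0, 90.0], [60.0, 150.0], [80.0, 200.0]]
moved = explore_step(population[0], population, a_schedule(10, 500))
```

Because |A| is bounded by a, exploration can only be triggered while a is at least 1, which under a linear schedule is the first half of the run.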

Algorithm 1 Pseudo code of whale optimization algorithm based multilevel thresholding
Initialize the position of whales X_i.
Initialize the best search agent X*.
WHILE t < Maximum number of iterations
    FOR i = 1:n
        Calculate the objective value of each search agent by using Equation (1) for Kapur's entropy.
        Update the best search agent X*.
        Update a, A, C, r, and p.
        IF1 p < 0.5
            IF2 |A| < 1
                Update the position of search agent using Equations (7) and (8).
            ELSE
                Update the position of search agent using Equations (14) and (15).
            END IF2
        ELSE
            Update the position of search agent using Equations (11) and (12).
        END IF1
        Correct the position of the current search agent if it is beyond the border.
    END FOR
END WHILE
Return X*, which represents the optimal threshold values of segmentation.

... $x_i^g$ otherwise (18)

where f denotes the fitness function value of a given problem.
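Algorithm 1 can be turned into a compact runnable sketch such as the one below. It is an illustration under several assumptions rather than the authors' code: the objective is passed in as a generic function to be maximized (for thresholding it would be Kapur's entropy evaluated on rounded thresholds), A and C use the standard WOA forms and are drawn once per agent, the spiral distance is |X* − X| with b = 1, out-of-range positions are clamped to the border, and all agents are evaluated before the position updates of each iteration. The name `woa_maximize` is hypothetical.

```python
import math
import random

def woa_maximize(objective, dim, lower, upper,
                 n_agents=30, t_max=500, b=1.0, seed=None):
    """Sketch of the standard WOA loop for a maximization problem."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(n_agents)]
    best, best_fit = None, -math.inf

    def update_best():
        nonlocal best, best_fit
        for x in pop:
            fit = objective(x)
            if fit > best_fit:
                best, best_fit = list(x), fit

    for t in range(t_max):
        update_best()
        a = 2.0 - 2.0 * t / t_max  # a decreases linearly from 2 to 0
        for i, x in enumerate(pop):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best agent; else follow a random one
                ref = best if abs(A) < 1.0 else rng.choice(pop)
                new_x = [rj - A * abs(C * rj - xj) for xj, rj in zip(x, ref)]
            else:
                r = rng.uniform(-1.0, 1.0)
                spiral = math.exp(b * r) * math.cos(2.0 * math.pi * r)
                new_x = [abs(bj - xj) * spiral + bj for xj, bj in zip(x, best)]
            # Correct positions that are beyond the border
            pop[i] = [min(max(v, lower), upper) for v in new_x]
    update_best()
    return best, best_fit

# Toy objective with its maximum at (3, 3), used only to exercise the loop.
toy = lambda v: -sum((c - 3.0) ** 2 for c in v)
solution, value = woa_maximize(toy, dim=2, lower=0.0, upper=10.0,
                               n_agents=10, t_max=50, seed=0)
```

Passing the objective as a parameter keeps the search loop independent of the thresholding criterion, so the same loop could be reused with a different criterion.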

The Proposed Method
In this section, a detailed introduction of the WOA-DE-based method is given, and the algorithm will be used to obtain the optimal threshold values for image segmentation. A hybrid of the WOA and DE algorithms is introduced to balance the two essential phases of optimization, namely exploration and exploitation. The flowchart of WOA-DE for finding the optimal threshold values is shown in Figure 2.
It is worth mentioning that a better balance between exploration and exploitation plays an important role in improving the optimization ability of the algorithm. Therefore, an efficient hybrid strategy is introduced to balance and improve these two phases. On the one hand, the WOA algorithm has a strong ability to explore the solution space and is used as the global search technique. On the other hand, the DE algorithm is adopted as the local search technique, which can increase the precision of solutions.
In addition, the purpose of introducing the DE operator is not only to enhance the local search ability of the algorithm, but also to overcome the tendency of WOA to fall into local optima in the late iterations. As described above, the random variable $A$ varies within [−2,2], and its range narrows as $a$ decreases progressively. If its value is larger than 1 or less than −1, Equation (15) is adopted to enhance the exploration capability of the algorithm. Otherwise, when the value lies in [−1,1], Equation (8) is adopted as the local search strategy. To show the change of $A$ over the whole iterative process more intuitively, a schematic diagram is presented in Figure 3. It can be observed from the figure that the value of $A$ is confined to the interval [−1,1] after 250 iterations, because $|A|$ cannot exceed $a$ and $a$ has dropped to 1 by that point. This means that the global search strategy has no chance of being adopted after half of the iterative process, even if the current best solution is not the global optimum. Therefore, the traditional WOA tends to fall into a local optimum, resulting in unsatisfactory solution accuracy. For complex multi-dimensional optimization problems in particular, such as multilevel color image segmentation, the traditional WOA cannot handle them effectively. By contrast, DE operators scale the difference between any two search agents in the population, which lets the particles jump out of the current search area. In Equation (16), $x_{r2}^g - x_{r3}^g$ can be considered as the difference between two individuals, and $SF$ is the scaling factor. The latter term in Equation (16), $SF \times (x_{r2}^g - x_{r3}^g)$, is crucial to the mutation operator. In the exploration stage, particles tend to be far apart, so the difference between individuals is large, and scaling this large difference enhances the diversity of the population. In the exploitation stage, particles tend to be close together, and scaling a small difference makes the algorithm optimize effectively within a small range, improving the accuracy of the solution and avoiding local optima.

In this paper, the average fitness value of the population is computed during the iterative process to evaluate the quality of each particle. The proposed hybrid model lets particles of better quality exploit the current promising area to ensure convergence speed, while particles of poorer quality explore unknown areas to avoid local optima. Although the global search strategy of the traditional WOA is not adopted in the later iterations, the introduced DE operator can effectively overcome this shortcoming, as discussed above. Specifically, if $f_i > \bar{f}$, the DE algorithm is used to update the solution $x_i^g$ using Equations (16)-(18). However, if $f_i \leq \bar{f}$, the current solution is updated using Equations (8), (12), or (15). In addition, a series of experiments is conducted in the following section to verify the advantages of the WOA-DE algorithm from various aspects.
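The hybrid rule described above can be sketched as follows. Equations (16)-(18) are not reproduced in this text, so the sketch assumes a common DE variant (DE/rand/1 mutation, binomial crossover, greedy selection for maximization) with illustrative values SF = 0.5 and CR = 0.9; these choices and the names `de_update` and `choose_operator` are assumptions, not the authors' settings.

```python
import random

def de_update(i, pop, fitness, objective, sf=0.5, cr=0.9, rng=random):
    """Assumed DE/rand/1/bin trial for agent i with greedy selection."""
    candidates = [k for k in range(len(pop)) if k != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    dim = len(pop[i])
    j_rand = rng.randrange(dim)  # guarantees at least one mutated component
    trial = []
    for j in range(dim):
        if rng.random() < cr or j == j_rand:
            # Base vector plus the scaled difference SF * (x_r2 - x_r3)
            trial.append(pop[r1][j] + sf * (pop[r2][j] - pop[r3][j]))
        else:
            trial.append(pop[i][j])
    trial_fit = objective(trial)
    if trial_fit > fitness[i]:  # keep the better solution (maximization)
        return trial, trial_fit
    return pop[i], fitness[i]

def choose_operator(fitness, i):
    """Rule from the text: DE if f_i exceeds the population mean, else WOA."""
    mean_fit = sum(fitness) / len(fitness)
    return "DE" if fitness[i] > mean_fit else "WOA"

random.seed(1)
toy = lambda v: -sum(c * c for c in v)
agents = [[1.0, 2.0], [0.5, -0.5], [3.0, 1.0], [-2.0, 2.0], [0.1, 0.2]]
scores = [toy(x) for x in agents]
operators = [choose_operator(scores, i) for i in range(len(agents))]
```

With greedy selection the DE branch can never lower an agent's fitness, which is one way to read the claim that better particles refine the current promising area.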

Experimental Setup
In this paper, Kapur's entropy thresholding technique is utilized to determine the optimal threshold values for image segmentation. The performance of the WOA-DE-based method is evaluated on fourteen images. Among them, five are natural images from the Berkeley segmentation database [39], five are satellite images from [40], and four are brain magnetic resonance images (MRI) from [41]. All the images and their corresponding histograms are shown in Figure 4. Both state-of-the-art and conventional methods are used to validate the superiority of the proposed algorithm: the traditional WOA [21], salp swarm algorithm (SSA) [42], sine cosine algorithm (SCA) [43], ant lion optimizer (ALO) [44], harmony search optimization (HSO) [45], bat algorithm (BA) [46], particle swarm optimization (PSO) [47,48], betaDE (BDE) [49], and improved differential search algorithm (IDSA) [50]. Their parameter settings are presented in Table 1. For a fair comparison, the population size $N$ is set to 30 and the maximum number of iterations $t_{max}$ is set to 500 for all algorithms. The experiments are carried out by simulation in "Matlab2017" (The MathWorks Inc., Natick, MA, USA) on a computer running the Microsoft Windows 10 operating system with 8 GB of memory.

Objective Function Measure
As discussed above, Kapur's entropy is used to determine the segmentation thresholds. The segmented images of "Image2" and "Image10" obtained by WOA-DE using Kapur's entropy method at different threshold levels are given in Figures 5 and 6, respectively. Due to the stochastic nature of meta-heuristic algorithms, the experiments are conducted over 30 runs. The average objective values of "Image1" and "Image6" are presented in Table 2. It can be seen from the table that the WOA-DE-based method gives the best values in general. The entropy of an image reflects its average information content [51]. Therefore, a higher value of Kapur's entropy indicates more information in the image. It can be observed from Table 2 that the objective function value of each algorithm increases with the number of threshold values. This result suggests that a higher-quality segmented image carrying more information is obtained when the threshold level is high (such as K = 10 and 12).

Stability Analysis
Standard deviation (STD) is a value that indicates the dispersion of sample data. It is computed from the sample size $n$, the fitness value $f_i$ of the i-th individual, and the average value $\bar{f}$ of the sample.
In order to verify the stability of the proposed algorithm, the STD indicator is also used. A lower value of STD indicates better stability. The STD values of "Image1" and "Image6" obtained by all algorithms are presented in Table 2. The table shows that the WOA-DE-based method gives lower values than the other algorithms, which indicates the better consistency and stability of the proposed algorithm.
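The stability indicator can be computed from the fitness values of repeated runs as in the sketch below. Whether the paper divides by n or by n − 1 cannot be determined from this text, so the divisor is exposed as a parameter `ddof` (0 for the population form, 1 for the sample form); this parameter and the function name `fitness_std` are assumptions.

```python
import math

def fitness_std(values, ddof=0):
    """Standard deviation of fitness values; ddof selects n or n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_dev = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_dev / (n - ddof))

# Hypothetical fitness values from repeated runs, for illustration only.
runs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
spread = fitness_std(runs)
```

With 30 runs the two conventions differ by under 2%, so a ranking of algorithms by STD is unlikely to depend on the choice unless two values are nearly tied.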

Peak Signal to Noise Ratio (PSNR)
Peak signal-to-noise ratio (PSNR) is an index used to evaluate the similarity of the processed image to the original image [13]:

$$PSNR = 10 \log_{10}\left(\frac{255^2}{MSE}\right)$$

MSE represents the mean squared error and is calculated as:

$$MSE = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} [I(i, j) - K(i, j)]^2$$

where $I(i, j)$ and $K(i, j)$ denote the gray level of the original image and the segmented image in the i-th row and j-th column, respectively, and $M$ and $N$ denote the number of rows and columns in the image matrix, respectively. A higher value of PSNR indicates a better-quality segmented image. Table 3 shows the PSNR values of "Image2" and "Image7" obtained by all algorithms with Kapur's entropy method. According to the table, the WOA-DE-based method gives the highest values in 9 out of 10 cases using Kapur's entropy. When the threshold level is small, all algorithms give similar results, while the obtained values diverge as the number of thresholds increases, and the proposed method gives the best result in most cases. This indicates that the WOA-DE-based method can determine appropriate thresholds and thus produce high-quality segmented images that are more similar to the original image. Figure 7 shows the visual comparison of all available methods at different threshold levels. The results of the proposed method are represented as "black" lines and "square" data points.
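The PSNR and MSE definitions can be sketched for a single channel as follows. The peak value of 255 assumes 8-bit gray levels, and images are represented as plain lists of rows for self-containment; the function names `mse` and `psnr` are illustrative.

```python
import math

def mse(original, segmented):
    """Mean squared error between two equally sized single-channel images."""
    rows, cols = len(original), len(original[0])
    total = sum((original[i][j] - segmented[i][j]) ** 2
                for i in range(rows) for j in range(cols))
    return total / (rows * cols)

def psnr(original, segmented, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(original, segmented)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)

# Every pixel differs by exactly one gray level, so MSE = 1.
reference = [[10, 20], [30, 40]]
shifted = [[11, 21], [31, 41]]
score = psnr(reference, shifted)
```

For a color image, one plausible convention is to average the per-channel values; the text does not state which convention is used.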

Structural Similarity Index (SSIM)
Structural similarity index (SSIM) [52,53] is a measure of the similarity between the original image and the segmented image, which takes brightness, contrast, and structural similarity into account:

$$SSIM(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$$

where $\mu_x$ and $\mu_y$ denote the mean intensities of the original image and the segmented image, respectively, $\sigma_x^2$ and $\sigma_y^2$ are the variances of the original image and the segmented image, respectively, $\sigma_{xy}$ denotes the covariance between the original image and the segmented image, and $c_1$ and $c_2$ are constants. The value of SSIM is in the range [0,1], and a higher value indicates better performance.
The SSIM values obtained by all algorithms are given in Table 3 and Figure 8. It can be seen from the table that the WOA-DE-based method again gives competitive results compared with the other methods in terms of the SSIM indicator. The values obtained by all algorithms increase with the number of thresholds, which indicates that the segmented image becomes more similar to the original image in terms of brightness, contrast, and structural similarity. The experimental results in this section verify the remarkable performance of the proposed algorithm from another perspective.
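The SSIM expression can be illustrated with a single-window version computed over all pixels at once. This is a simplification: practical SSIM implementations usually average the index over local windows, and the constants used here, $c_1 = (0.01 \cdot 255)^2$ and $c_2 = (0.03 \cdot 255)^2$, are commonly used defaults that the text does not confirm. The function name `global_ssim` is hypothetical.

```python
def global_ssim(x, y, peak=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM over two flattened images of equal length."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * peak) ** 2  # assumed default constant
    c2 = (k2 * peak) ** 2  # assumed default constant
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

pixels = [0.0, 50.0, 100.0, 150.0]
flat = [10.0, 10.0, 10.0, 10.0]
same = global_ssim(pixels, pixels)
different = global_ssim(pixels, flat)
```

An image compared with itself scores 1, while a structureless image scores much lower, which matches the reading that higher values mean closer agreement with the original.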

Feature Similarity Index (FSIM)
Feature similarity index (FSIM) [54,55] is another measure of image quality, which evaluates the feature similarity between the original image and the segmented image:

$$FSIM = \frac{\sum_{x \in \Omega} S_L(x) \, PC_m(x)}{\sum_{x \in \Omega} PC_m(x)}$$

where $\Omega$ represents the whole image pixel domain and $S_L(x)$ is a similarity score. $PC_m(x)$ denotes the phase consistency measure, which is defined as:

$$PC_m(x) = \max(PC_1(x), PC_2(x))$$

where $PC_1(x)$ and $PC_2(x)$ represent the phase consistency of two blocks, respectively:

$$S_L(x) = [S_{PC}(x)]^{\alpha} [S_G(x)]^{\beta}$$

$$S_{PC}(x) = \frac{2 PC_1(x) PC_2(x) + T_1}{PC_1^2(x) + PC_2^2(x) + T_1}$$

$$S_G(x) = \frac{2 G_1(x) G_2(x) + T_2}{G_1^2(x) + G_2^2(x) + T_2}$$

$S_{PC}(x)$ denotes the similarity measure of phase consistency, $S_G(x)$ denotes the gradient magnitude similarity of the two regions $G_1(x)$ and $G_2(x)$, and $\alpha$, $\beta$, $T_1$, and $T_2$ are all constants. The value of FSIM is also in the range [0,1], and a higher value indicates better segmented image quality. Comparing the FSIM values given in Table 3 and Figure 9, it can be observed that the WOA-DE-based method again outperforms the other methods. The feature similarity between the original image and the segmented image is considered in this experiment to verify the quality of the segmented image comprehensively. The results indicate that the proposed method has a stronger feature-preserving ability than the other methods.

Convergence Performance
In this section, the convergence performance of all algorithms is evaluated and discussed in detail. To show the performance of WOA-DE more intuitively, the convergence curves of Kapur's entropy function (for K = 12) are shown in Figure 10. Four different images are selected for testing, namely "Image1", "Image4", "Image7", and "Image10". It can be seen that the proposed algorithm outperforms the other algorithms in general. In other words, the convergence curves of the WOA-DE-based method lie above those of the other algorithms when using Kapur's entropy technique.
As discussed above, the main drawbacks of the standard WOA are premature convergence and an unbalanced exploration-exploitation trade-off, which are clearly reflected in the curves. For example, in the segmentation of "Image1", the objective function value of WOA is almost never updated after 100 iterations, while the value it has reached is not the best. This phenomenon illustrates the premature convergence shortcoming of WOA. However, the proposed WOA-DE algorithm gives the highest objective function value while maintaining the convergence speed. In fact, the remarkable performance of the proposed algorithm is reflected not only in the segmentation of "Image1", but also in the other images. The experimental results in this section indicate that the WOA-DE algorithm can better balance exploration and exploitation, and that it is also competent for complex image segmentation tasks.

Computation Time
The average CPU time of the different algorithms over all cases is given in Table 4. It can be found from the table that HSO is the fastest among the available methods, but its segmentation accuracy, as discussed above, is not ideal. The reason for this phenomenon is the premature convergence of the HSO algorithm, which cannot balance exploration and exploitation well. The standard WOA gives competitive results in some cases, and the proposed WOA-DE is slightly slower than the standard WOA. In contrast to HSO, the WOA-DE algorithm combines the advantages of both WOA and DE and determines the most appropriate threshold values, despite not being the fastest. To sum up, WOA-DE is a high-performance hybrid algorithm that improves segmentation precision at the cost of only a slight increase in runtime over the standard WOA.

Statistical Analysis
In this section, a non-parametric statistical test known as Wilcoxon's rank sum test is used to evaluate whether the differences between algorithms are significant [56]. Each experiment consists of 30 runs, and the test is carried out at the 5% significance level. All experimental data obtained with Kapur's entropy are used for testing. The null hypothesis (H0) states that there is no significant difference between the two algorithms being compared, and the alternative hypothesis (H1) states that there is a significant difference. The results of the statistical experiments are given in Table 5. The p-values obtained are far less than 0.05. This promising result indicates that H0 can be rejected in all cases and that there is a significant difference between the proposed algorithm and the other methods.
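The test can be illustrated with a short sketch. The code below implements the two-sided rank sum test with the normal approximation and without tie correction, which is an assumption about the variant used, since the text does not specify it. The fitness values in the demonstration are made up and are not results from Table 5.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks of `values`; tied entries share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, no tie correction).

    Returns (z, p_value); z > 0 means the values in x tend to be larger.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2.0        # expected rank sum under H0
    std_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / std_w
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical fitness values of two algorithms over 30 runs each.
    algo_a = [rng.gauss(18.60, 0.02) for _ in range(30)]
    algo_b = [rng.gauss(18.45, 0.05) for _ in range(30)]
    z, p = rank_sum_test(algo_a, algo_b)
    print(f"z = {z:.3f}, p = {p:.3g}, reject H0 at 5%: {p < 0.05}")
```

In practice a library routine such as `scipy.stats.ranksums` would normally be used instead of a hand-written version.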

Comparison of Otsu and Kapur's Entropy Methods
In order to obtain a simple and powerful technique for color image segmentation, this section compares the Otsu and Kapur's entropy thresholding techniques, both optimized by WOA-DE. More details of the Otsu thresholding technique can be found in [11].
The PSNR, SSIM, and FSIM values obtained by the WOA-DE-based method are given in Table 6. It can be seen that the WOA-DE-based method using Kapur's entropy gives higher values than the one using Otsu [57]. Thus, the WOA-DE algorithm combined with different thresholding techniques has potential in the field of color image segmentation and may exhibit superior performance on some engineering problems that have not yet been solved.
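The two criteria differ only in the objective function that the optimizer maximizes. The following sketch evaluates both objectives for a single-channel histogram and a given set of thresholds. It uses the standard textbook forms (sum of class entropies for Kapur, between-class variance for Otsu) and assumes that a threshold t separates gray levels below t from levels at or above t. The function names are illustrative, and for a color image each channel would presumably be evaluated separately.

```python
import math


def class_bounds(thresholds, levels=256):
    """Turn thresholds into half-open gray-level ranges [lo, hi)."""
    edges = [0] + sorted(thresholds) + [levels]
    return list(zip(edges[:-1], edges[1:]))


def kapur_entropy(hist, thresholds):
    """Sum of the Shannon entropies of the classes induced by the thresholds."""
    total = float(sum(hist))
    prob = [h / total for h in hist]
    objective = 0.0
    for lo, hi in class_bounds(thresholds, len(hist)):
        w = sum(prob[lo:hi])
        if w <= 0.0:
            continue  # an empty class contributes no entropy
        objective -= sum((p / w) * math.log(p / w) for p in prob[lo:hi] if p > 0.0)
    return objective


def otsu_variance(hist, thresholds):
    """Between-class variance of the classes induced by the thresholds."""
    total = float(sum(hist))
    prob = [h / total for h in hist]
    mu_total = sum(i * p for i, p in enumerate(prob))
    objective = 0.0
    for lo, hi in class_bounds(thresholds, len(hist)):
        w = sum(prob[lo:hi])
        if w <= 0.0:
            continue  # an empty class contributes no variance
        mu = sum(i * prob[i] for i in range(lo, hi)) / w
        objective += w * (mu - mu_total) ** 2
    return objective


if __name__ == "__main__":
    # Synthetic bimodal histogram (made-up data): two triangular peaks.
    hist = [max(0, 40 - abs(i - 60)) + max(0, 40 - abs(i - 190)) for i in range(256)]
    for t in (64, 125, 200):
        print(t, round(kapur_entropy(hist, [t]), 4), round(otsu_variance(hist, [t]), 2))
```

Either function can serve as the fitness function of a meta-heuristic, with a candidate solution being the list of thresholds.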

Robustness Testing on Noisy Images
In order to further investigate the performance of the proposed algorithm, an experiment is conducted on two well-known benchmark test images with various noise levels. The "Lena" and "Peppers" images are used in this section (see Figure 11); they can be obtained from [58]. The mean of the Gaussian noise is fixed in this experiment, and the noise level is adjusted by setting the variance to 0.00625, 0.0125, 0.025, 0.05, and 0.1, respectively. The experiment is carried out at threshold level 12, where the difference between algorithms is the most obvious. The relevant results are presented in Figures 12-15. The results show that the values of the performance measures and the quality of the segmented images decrease as the noise level increases, and that WOA-DE-Kapur outperforms the other methods using Kapur's entropy. These promising results indicate that the proposed technique is robust and competent for complex image segmentation tasks with noise.
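The noise protocol can be sketched as follows. The code assumes zero-mean noise and a variance expressed on a [0, 1] intensity scale, neither of which is stated explicitly in the text, and it uses a synthetic pixel list instead of the "Lena" or "Peppers" images. PSNR is computed here between the clean and noisy pixels only to show how strongly each variance level degrades the input; in the experiments the measures are reported for segmented images.

```python
import math
import random


def add_gaussian_noise(pixels, variance, mean=0.0, seed=0):
    """Add Gaussian noise (on a [0, 1] scale) to a flat list of 8-bit pixel values."""
    rng = random.Random(seed)
    sigma = math.sqrt(variance)
    noisy = []
    for p in pixels:
        value = p / 255.0 + rng.gauss(mean, sigma)
        value = min(1.0, max(0.0, value))  # clip to the valid intensity range
        noisy.append(int(round(value * 255.0)))
    return noisy


def psnr(reference, test):
    """Peak signal-to-noise ratio in dB for 8-bit images given as flat lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(255.0 ** 2 / mse)


if __name__ == "__main__":
    pixels = [i % 256 for i in range(4096)]  # synthetic stand-in for a test image
    for variance in (0.00625, 0.0125, 0.025, 0.05, 0.1):
        noisy = add_gaussian_noise(pixels, variance)
        print(f"variance = {variance}: PSNR = {psnr(pixels, noisy):.2f} dB")
```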

Application in MR Image
In this section, the WOA-DE-Kapur-based multilevel thresholding technique is applied to MR image segmentation. The purpose of this experiment is to investigate whether the proposed algorithm is capable of producing high-quality segmented MR images. Two other threshold-based MR image segmentation techniques are used for comparison, namely the crow search algorithm-based method using minimum cross entropy thresholding (CSA-MCET) [59] and the adaptive bacterial foraging algorithm-based method using Otsu (ABF-Otsu) [60]. The threshold levels selected (K = 2, 3, 4, and 5) are the same as those used for these two algorithms in their corresponding articles. In addition, the parameter values are set according to the original literature, except that the population size N is set to 30 and the maximum number of iterations t_max is set to 500 for a fair comparison. All experiments are performed 30 times to reduce the effect of random error.
The experimental results are shown in three tables. Table 7 presents the optimal thresholds and PSNR values, Table 8 gives the SSIM and FSIM values, and Table 9 shows the segmented images obtained by all methods. These results show that the WOA-DE-Kapur method determines more accurate thresholds than the other methods. For quantitative analysis, the values of the performance measures obtained by the proposed method are higher, which indicates better quality of the segmented images. For visual analysis, the WOA-DE-Kapur method gives more informative segmented MR images, and the details of the images become more prominent as the number of thresholds increases.
Since the three methods are evaluated in the same experiments, it is necessary to carry out relevant statistical tests. In this section, the Friedman test [61] and Wilcoxon's rank sum test [56] are used as non-parametric statistical tests to evaluate the performance of these methods at the 5% significance level. The null hypothesis (H0) of the Friedman test states that the medians of the algorithms are equal, and the alternative hypothesis (H1) states that they differ. A more detailed description of the Friedman test can be found in [62]. The results of the relevant statistical tests are given in Tables 10 and 11. Table 10 presents the average rank and p-value of all algorithms at different threshold levels. ABF-Otsu obtains the first rank for K = 3, and WOA-DE-Kapur obtains the first rank in the other cases. In other words, the proposed technique gives the best result in general. The p-value is very small at all threshold levels, indicating a significant difference among the available methods. Table 11 gives the results of Wilcoxon's rank sum test. The p-value is less than 0.05 in most cases, which statistically confirms the remarkable performance of the WOA-DE-Kapur technique.
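The average ranks and p-value of a Friedman test can be computed as in the sketch below. It uses the standard chi-square form of the statistic without tie correction and treats higher scores as better, both of which are assumptions, since the exact variant is not given in the text. The scores in the demonstration are invented and do not reproduce Table 10.

```python
import math


def chi2_sf(x, df):
    """Survival function of the chi-square distribution for integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    term = math.sqrt(2.0 * x / math.pi)
    total = term if df >= 3 else 0.0
    for i in range(2, (df - 1) // 2 + 1):
        term *= x / (2 * i - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def friedman_test(scores):
    """Friedman test on scores[i][j], the result of method j on block i.

    Rank 1 goes to the largest score in each block; ties share their average rank.
    Returns (average rank per method, chi-square statistic, p-value).
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            for m in range(i, j + 1):
                rank_sums[order[m]] += (i + j) / 2.0 + 1.0
            i = j + 1
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return [r / n for r in rank_sums], stat, chi2_sf(stat, k - 1)


if __name__ == "__main__":
    # Hypothetical PSNR values: rows are test cases, columns are three methods.
    scores = [
        [21.4, 20.9, 20.1],
        [23.0, 22.1, 22.4],
        [19.8, 19.9, 19.2],
        [24.6, 23.7, 23.5],
        [22.3, 21.8, 21.0],
    ]
    ranks, stat, p = friedman_test(scores)
    print("average ranks:", ranks)
    print(f"statistic = {stat:.3f}, p = {p:.4f}")
```

A library routine such as `scipy.stats.friedmanchisquare` could be used instead; the hand-written version is shown only to make the ranking step explicit.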

Conclusions
In order to obtain an efficient technique for color image segmentation, an improved WOA-based method known as WOA-DE is introduced in this paper. In the proposed algorithm, DE is adopted as a local search strategy with the purpose of enhancing exploitation capability. Compared to the traditional WOA, the WOA-DE algorithm can effectively avoid falling into a local optimum and prevent the loss of population diversity in the later iterations. A series of experiments has been conducted on various color images, including natural images and satellite images. Seven meta-heuristic algorithms are utilized for comparison. The experimental results indicate that the proposed technique outperforms the other methods in terms of average fitness values, standard deviation (STD), peak signal to noise ratio (PSNR), structural similarity index (SSIM), and feature similarity index (FSIM), as well as in Wilcoxon's rank sum test. In addition, to give more convincing and reliable results, another thresholding technique, namely Otsu, is adopted for testing. The experimental results indicate that the WOA-DE-based technique using Kapur's entropy gives better results than the one using the Otsu technique in most cases. However, no single technique can handle all image segmentation tasks. Thus, it is necessary to introduce more and better techniques to meet the requirements of different image segmentation problems, and this is also the motivation for our future research. The performance of some novel meta-heuristic algorithms, such as the salp swarm algorithm, spotted hyena optimizer, and emperor penguin optimizer, will be evaluated in this domain.
Author Contributions: C.L. and H.J. contributed to the idea of this paper; C.L. performed the experiments; C.L. wrote the paper; C.L. and H.J. contributed to the revision of this paper.
Funding: This research received no external funding.