Automatic Threshold Selection Guided by Maximizing Homologous Isomeric Similarity Under Unified Transformation Toward Unimodal Distribution

Zou, Yaobin; Yu, Wenli; Huang, Qingqing

doi:10.3390/electronics15020451

by Yaobin Zou ¹,²,*, Wenli Yu ¹,² and Qingqing Huang ¹,²

¹ Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang 443002, China

² College of Computer and Information Technology, China Three Gorges University, Yichang 443002, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(2), 451; https://doi.org/10.3390/electronics15020451

Submission received: 25 December 2025 / Revised: 15 January 2026 / Accepted: 18 January 2026 / Published: 20 January 2026

(This article belongs to the Special Issue Image Processing and Pattern Recognition)

Abstract

Traditional thresholding methods are often tailored to specific histogram patterns, making it difficult to achieve robust segmentation across diverse images exhibiting non-modal, unimodal, bimodal, or multimodal distributions. To address this limitation, this paper proposes an automatic thresholding method guided by maximizing homologous isomeric similarity under a unified transformation toward unimodal distribution. The primary objective is to establish a generalized selection criterion that functions independently of the input histogram’s pattern. The methodology employs bilateral filtering, non-maximum suppression, and Sobel operators to transform diverse histogram patterns into a unified, right-skewed unimodal distribution. Subsequently, the optimal threshold is determined by maximizing the normalized Renyi mutual information between the transformed edge image and binary contour images extracted at varying levels. Experimental validation on both synthetic and real-world images demonstrates that the proposed method offers greater adaptability and higher accuracy compared to representative thresholding and non-thresholding techniques. The results show a significant reduction in misclassification errors and improved correlation metrics, confirming the method’s effectiveness as a unified thresholding solution for images with non-modal, unimodal, bimodal, or multimodal histogram patterns.

Keywords: image thresholding; Renyi mutual information; homologous isomeric similarity; bilateral filtering

1. Introduction

Image thresholding is a widely applied low-level image processing method owing to its simplicity and effectiveness [1,2]. Its applications span various fields, including but not limited to industrial non-destructive testing [3], remote sensing target detection [4], infrared pedestrian recognition [5], and biomedical image analysis [6,7]. Image thresholding involves comparing each pixel’s gray level with a selected threshold, thereby dividing the image into target and background regions [8]. Threshold selection is crucial for segmentation accuracy. Therefore, one core objective of image thresholding is to robustly determine a reasonable threshold.

The gray-level histograms of images typically exhibit non-modal, unimodal, bimodal, or multimodal patterns. Automatically selecting reasonable thresholds for these four histogram modalities within a unified framework is challenging, because a single threshold selection criterion is difficult to derive directly from such varying modalities.

Among various thresholding methods, some representative ones primarily include histogram-based methods, entropy-based methods, and methods based on the numerical characteristics of random variables. Histogram-based methods mainly select thresholds by utilizing shape features or statistical information of the gray-level histogram, but they often fail in images with imbalanced or fluctuating histograms [9,10,11,12,13]. Entropy-based methods, such as those using Shannon, Tsallis, Masi, or Kaniadakis entropy, maximize information entropy to select thresholds [14,15,16,17,18,19,20,21]. However, these methods often involve nonextensive parameters that require manual tuning, limiting their adaptability to complex images. Methods based on numerical characteristics, such as the representative Otsu method [22], utilize variance or derivatives to distinguish between classes. While effective for bimodal histograms, their performance degrades significantly for non-modal, unimodal, or multimodal patterns, especially when the target size is imbalanced [23,24,25]. In summary, most existing methods tend to focus on handling images with specific histogram patterns and lack a unified framework for diverse modalities. A detailed review of these related works is presented in Section 2.

To automatically threshold images with non-modal, unimodal, bimodal, or multimodal gray-level histograms within a unified framework, a homologous isomeric similarity thresholding (HIST) method under a unified transformation toward unimodal distribution is proposed. The HIST method first performs a unified transformation on an input image to obtain an edge image. The gray-level histogram of the edge image exhibits a special right-skewed unimodal pattern, with its mode located at the leftmost end of the gray-level histogram. Subsequently, the HIST method extracts contours from binary images obtained by different thresholds to generate corresponding binary contour images. Finally, the appropriate threshold is determined by searching for the binary contour image with the maximum homologous isomeric similarity to the edge image.

The remainder of this manuscript is structured to systematically develop and validate this HIST framework. Section 2 reviews the related work on image thresholding. Section 3 introduces the unified transformation method to obtain the gray-level edge images with the right-skewed unimodal patterns. This section is theoretically pivotal as it establishes the prerequisite for a unified criterion by converting diverse histogram patterns into a standardized, right-skewed unimodal distribution. Section 4 describes a method for extracting binary contour images. Section 5 analyzes the homologous isomeric similarity and its quantification using normalized Renyi mutual information. This mathematical formulation serves as the robust objective function for the threshold selection problem. Section 6 outlines the overall framework of the proposed HIST method and provides the corresponding algorithm steps. Section 7 provides a thorough experimental evaluation and comparative analysis on both synthetic and real-world datasets, validating the adaptability and accuracy of the proposed method. Finally, Section 8 concludes the study by summarizing the contributions and outlining future research directions.

2. Related Work

Song et al. [9] derived the gray-level gradient histogram by calculating the average gradient of an input image, and then determined the optimal threshold by analyzing the shape of this histogram. The method is effective for unimodal histograms but unsuitable for bimodal or multimodal histograms. Additionally, histogram fluctuations can impact threshold selection stability. Christy and Umamakeswari [10] proposed the “percentage split distribution” for image thresholding, efficiently dividing images into groups by pixel distribution percentages. Although suitable for real-time applications, the method is prone to errors in images with imbalanced gray-level histograms, potentially leading to empty segmentation regions. Farshi and Demirci [11] proposed a thresholding method that treats the histogram curve as the objective function. This method utilizes multimodal particle swarm optimization to automatically locate the peaks and valleys of the histogram, and selects the valley between two peaks as the threshold. While offering automation and user independence, the method uses a Gaussian mask with a fixed variance of 3 for histogram smoothing, potentially limiting its adaptability. Manda and Kim [12] introduced a rapid thresholding technique for infrared images using histogram approximation and circuit theory. The approach involves modeling the gray-level histogram as the transient response of a first-order linear circuit, with threshold determination based on circuit theory operators. The method is versatile for various infrared imaging scenarios but is histogram shape-dependent and best suited for images with long-tailed or heavy-tailed distributions. Elen and Dönmez [13] proposed a method that determines alpha and beta regions using the mean and standard deviation of the gray-level histogram, and the final threshold is obtained by calculating the average gray level of these regions. 
The method is computationally straightforward, making it appropriate for real-time systems. However, it is mainly suited for images with long-tail or heavy-tail distributions and may not perform well on images with imbalanced histograms.

The entropy-based methods primarily utilize different entropy models to design the objective functions for threshold selection. Since Kapur et al. [14] first proposed the maximum Shannon entropy method, the idea of selecting thresholds by maximizing the sum of target and background entropies has garnered widespread attention. Portes de Albuquerque et al. [15] introduced Tsallis entropy to image thresholding to address nonextensive information in images. The parameter in Tsallis entropy allows for flexible adjustment within image categories, but selecting the appropriate parameter requires experience and experimentation. Lin and Ou [16] further employed the nonextensive parameter in Tsallis entropy to capture local long-range correlations. However, without manual ground-truth labeling, automatically selecting the appropriate nonextensive parameter is challenging due to the lack of a quantitative relationship between the parameter value and long-range correlations. Additionally, the method is primarily targeted at images with specific long-range correlations, potentially limiting its generalization abilities. Nie et al. [17] introduced a thresholding method using Masi entropy, combining the nonextensivity of Tsallis entropy with the additivity of Renyi entropy. This allows the method to adapt to a broader range of image types. Nonetheless, its thresholding results are sensitive to the Masi entropy parameter, and an effective automatic method for determining the parameter has yet to be developed.

Within the framework of the maximum entropy, Sparavigna [18] employed Kaniadakis entropy to create a threshold selection objective function; however, choosing an appropriate Kaniadakis entropy index requires additional knowledge and experience. Building on Sparavigna’s thresholding technique [18], Lei and Fan [19] adaptively select the parameter in Kaniadakis entropy via particle swarm optimization. However, particle swarm optimization will increase computational complexity. Furthermore, the method is more suitable for images with long-tailed gray-level histograms. Ferreira Junior et al. [20] proposed an image thresholding method that integrates Tsallis and Masi nonextensive entropies to leverage their ability to represent long-range and short-range correlations. By using two nonextensive entropy parameters, the method effectively captures gray-level long-range correlations, improving segmentation for images with local long-range correlations. The added flexibility of two entropy parameters, however, complicates the parameterization process. Deng et al. [21] combined nonextensive Tsallis entropy with within-class variance in gray-level distribution, enhancing small target extraction by introducing the nonextensive parameter to model long-range pixel correlations. The automatic estimation of the parameter adds adaptability to the algorithm. However, the estimated parameter may not accurately represent long-range correlations, especially in complex and varied images. Moreover, the method’s performance relies on image long-range correlations, potentially limiting its consistent performance across different types of images.

Thresholding methods based on the numerical characteristics of random variables primarily utilize first-order, second-order, or higher-order statistics to design the objective functions for threshold selection. The Otsu method, which is effective for bimodal histograms, is a representative example. However, target size significantly impacts the Otsu method, leading to over-segmentation for target sizes of less than 80% and under-segmentation for target sizes of over 80% [22]. Many enhanced methods address the limitations of Otsu method, with the weighted Otsu improvement being representative. These methods incorporate weight information into Otsu’s objective function to approximate the optimal threshold more closely. For instance, Xing et al. [23] proposed a valley-emphasis weighting method. This method incorporates a second-order derivative as a valley metric into Otsu’s objective function, thereby making the threshold closer to the histogram’s valley. However, the second-order derivative’s sensitivity to histogram noise necessitates a smooth and well-shaped gray-level histogram. Kang et al. [24] introduced a parameter to modify the inter-class variance calculation in the Otsu method, thereby emphasizing target weight and refining the threshold selection via an adaptive iterative algorithm. The parameter computation relies on the histogram probability gradient, which, in cases with significant noise, may yield larger values, resulting in a smaller parameter value. This can diminish the effect of the parameter, potentially reducing the objective function proposed by Kang et al. to the standard Otsu form. Singh et al. [25] utilized a weight function based on Kapur’s entropy to modify Otsu’s function. The fusion of Otsu’s and Kapur’s methods offers a more comprehensive evaluation of image characteristics, enhancing threshold selection. 
Although the method reduces user intervention in determining thresholds, it still requires parameter tuning for metaheuristic algorithms, which can affect the performance and may require expertise.

3. Unified Transformation Toward Unimodal Distribution

The gray-level histograms of images may exhibit different distribution patterns: non-modal, unimodal, bimodal, or multimodal. A unified transformation method is proposed to convert the gray-level histograms of different patterns into a unified, unimodal, right-skewed gray-level histogram. The unified transformation is applied to an input image to generate a gray-level edge image with this desired distribution (see Figure 1).

For an input image $f$, the unified transformation first applies bilateral filtering [26] to the image $f$:

Z = \frac{1}{W_{p}} \sum_{q \in S} G_{σ_{s}} (‖p - q‖) G_{σ_{r}} (| f_{p} - f_{q} |) f_{q}

(1)

W_{p} = \sum_{q \in S} G_{σ_{s}} (‖p - q‖) G_{σ_{r}} (|f_{p} - f_{q}|)

(2)

G_{σ_{s}} (‖p - q‖) = e^{(- \frac{{(x - k)}^{2} + {(y - l)}^{2}}{2 σ_{s}^{2}})}

(3)

G_{σ_{r}} (| f_{p} - f_{q} |) = e^{(- \frac{{(f_{p} - f_{q})}^{2}}{2 σ_{r}^{2}})}

(4)

$Z$ denotes the image obtained by bilateral filtering, $W_{p}$ is a normalization factor, and $S$ represents the filtering kernel. $p$ and $q$ denote the current pixel and a pixel within the $S$-neighborhood, respectively, with their gray levels denoted by $f_{p}$ and $f_{q}$. $‖p - q‖$ represents the spatial distance between pixels $p$ and $q$, while $|f_{p} - f_{q}|$ denotes the difference in gray levels between pixels $p$ and $q$. $(x, y)$ indicates the coordinates of the current pixel $p$, and $(k, l)$ denotes the coordinates of pixel $q$ within the $S$-neighborhood. $G_{σ_{s}} (‖p - q‖)$ and $G_{σ_{r}} (| f_{p} - f_{q} |)$ denote the spatial Gaussian weighting and gray-level Gaussian weighting, respectively. $σ_{s}$ and $σ_{r}$ represent the spatial domain parameter and the gray-level domain parameter, respectively, which adjust the influence of pixel distance and pixel gray-level difference on the weights.

The size of the filtering kernel is typically odd, so the paper sets the filtering kernel radius to $r = 5$, resulting in a kernel size of $11 \times 11$. Given that about 95% of the mass of the Gaussian function is concentrated within $[- 2 σ_{s}, 2 σ_{s}]$, the spatial domain parameter $σ_{s}$ can be determined by $σ_{s} = ⌈r / 2⌉ = 3$, where $⌈\cdot⌉$ denotes the ceiling function. The gray-level domain parameter $σ_{r}$ should be neither too small nor too large. When $σ_{r}$ is too small, the smoothing effect of bilateral filtering on the image is insufficient, failing to effectively suppress noise and other interferences. Conversely, when $σ_{r}$ is too large, the bilateral filtering gradually degrades into Gaussian filtering, thereby failing to preserve edge details well. In this paper, $σ_{r}$ is set to 0.3, which enables the bilateral filtering to suppress noise while preserving edge details effectively.
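Equations (1)–(4) can be sketched in a few lines of NumPy. The following is only an illustrative brute-force version, not the authors' implementation: the function name `bilateral_filter`, the reflective border handling, and the assumption that gray levels are scaled to [0, 1] (so that $σ_{r} = 0.3$ is a meaningful range width) are choices made here and are not stated in the paper.

```python
import numpy as np

def bilateral_filter(f, radius=5, sigma_s=3.0, sigma_r=0.3):
    """Brute-force bilateral filter following Equations (1)-(4).

    Assumes f is a 2-D array with gray levels scaled to [0, 1];
    that scaling is an assumption of this sketch.
    """
    f = np.asarray(f, dtype=np.float64)
    h, w = f.shape
    padded = np.pad(f, radius, mode="reflect")  # border handling is an assumption
    num = np.zeros_like(f)
    den = np.zeros_like(f)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbor q = p + (dy, dx), gathered for every pixel p at once.
            f_q = padded[radius + dy: radius + dy + h,
                         radius + dx: radius + dx + w]
            g_s = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))  # Eq. (3)
            g_r = np.exp(-((f - f_q) ** 2) / (2.0 * sigma_r ** 2))     # Eq. (4)
            weight = g_s * g_r
            num += weight * f_q
            den += weight          # accumulates W_p, Eq. (2)
    return num / den               # Z, Eq. (1)
```

With these settings, a unit step edge contributes a range weight of about exp(−1/0.18) ≈ 0.004 across the edge, which is why the edge survives the smoothing almost unchanged.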

The unified transformation method utilizes a $3 \times 3$ Sobel operator to compute the numerical approximations of the first-order partial derivatives of the image $Z$ in both horizontal and vertical directions, thereby calculating the gradient magnitude image $M$ and the gradient orientation $θ$ of the image $Z$. To refine edge details and remove redundant points, non-maximum suppression [27] is applied to the gradient magnitude image $M$ along the gradient direction $θ$ to obtain image $D$. After normalizing the gray levels of the image $D$ to $[0, 255]$, the gray-level histograms of the images $D$ corresponding to four synthetic images in Figure 1 are shown in Figure 2.
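The Sobel and non-maximum suppression steps could look roughly as follows. This is a hedged sketch: the helper names, the replicated borders, the quantization of $θ$ into four directions, and the tie-breaking rule (strict comparison on one side so that a two-pixel-wide response collapses to one pixel) are assumptions of this example rather than details given in the paper.

```python
import numpy as np

def sobel_gradients(z):
    """Gradient magnitude M and orientation theta from 3x3 Sobel kernels."""
    z = np.asarray(z, dtype=np.float64)
    p = np.pad(z, 1, mode="edge")  # replicate borders (assumption)
    # Horizontal derivative: right column minus left column.
    gx = ((p[:-2, 2:] + 2.0 * p[1:-1, 2:] + p[2:, 2:])
          - (p[:-2, :-2] + 2.0 * p[1:-1, :-2] + p[2:, :-2]))
    # Vertical derivative: bottom row minus top row.
    gy = ((p[2:, :-2] + 2.0 * p[2:, 1:-1] + p[2:, 2:])
          - (p[:-2, :-2] + 2.0 * p[:-2, 1:-1] + p[:-2, 2:]))
    return np.hypot(gx, gy), np.arctan2(gy, gx)

def non_maximum_suppression(m, theta):
    """Keep a pixel only if it is a maximum along its gradient direction."""
    h, w = m.shape
    angle = np.rad2deg(theta) % 180.0
    # Quantize the direction to 0, 45, 90, or 135 degrees.
    sector = np.floor((angle + 22.5) / 45.0).astype(int) % 4
    offsets = [(0, 1), (1, 1), (1, 0), (1, -1)]  # (dy, dx) per sector
    p = np.pad(m, 1, mode="constant")
    d = np.zeros_like(m)
    for s, (dy, dx) in enumerate(offsets):
        ahead = p[1 + dy: 1 + dy + h, 1 + dx: 1 + dx + w]
        behind = p[1 - dy: 1 - dy + h, 1 - dx: 1 - dx + w]
        # Strict on one side only, so ties keep a single pixel.
        keep = (sector == s) & (m > ahead) & (m >= behind)
        d[keep] = m[keep]
    return d

def normalize_to_uint8(d):
    """Linearly rescale D to the gray-level range [0, 255]."""
    peak = d.max()
    if peak <= 0:
        return np.zeros(d.shape, dtype=np.uint8)
    return np.round(255.0 * d / peak).astype(np.uint8)
```

On a vertical step edge, the Sobel magnitude is nonzero in the two columns adjacent to the step, and the suppression step above keeps only one of them.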

From Figure 2a–c, it can be observed that there is a minor peak to the right of the leftmost main peak in the gray-level histogram of the image $D$, which corresponds to some spurious edges in the image $D$. To obtain a gray-level edge image with a unified right-skewed unimodal distribution, the unified transformation method utilizes the maximum entropy thresholding method [14] to calculate a threshold $v$ for the image $D$, and then sets the gray levels of pixels in the image $D$ with values below the threshold $v$ to 0. After the above processing, the final gray-level edge image preserves robust edge features while containing only a little noise, and exhibits a unified right-skewed unimodal distribution (see Figure 1). Hereinafter, the symbol $E$ is employed to denote the gray-level edge image obtained by the unified transformation method.
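As an illustration of this last step, the sketch below computes a maximum entropy (Kapur) threshold on an 8-bit image and zeroes the pixels below it. The function names are hypothetical, and the class convention (gray levels ≤ t versus > t when evaluating the entropies) is the usual textbook one, assumed here because the paper does not spell it out.

```python
import numpy as np

def kapur_threshold(img_u8):
    """Maximum entropy threshold: maximize the sum of the two class entropies."""
    hist = np.bincount(img_u8.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    best_t, best_h = 0, -np.inf
    for t in range(255):
        low, high = p[: t + 1], p[t + 1:]
        w0, w1 = low.sum(), high.sum()
        if w0 <= 0.0 or w1 <= 0.0:
            continue  # one class is empty, entropy sum undefined
        q0 = low[low > 0] / w0
        q1 = high[high > 0] / w1
        h = -(q0 * np.log(q0)).sum() - (q1 * np.log(q1)).sum()
        if h > best_h:
            best_t, best_h = t, h
    return best_t

def suppress_weak_edges(d_u8):
    """Set gray levels of D below the threshold v to 0, giving E."""
    v = kapur_threshold(d_u8)
    e = d_u8.copy()
    e[e < v] = 0
    return e, v
```

A call such as `e, v = suppress_weak_edges(d_u8)` returns both the edge image and the threshold, which makes it easy to check how much of the histogram was removed.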

4. Extraction of Binary Contour Images

Let $f$ denote an 8-bit gray-level image with gray levels in the interval $[t_{\min}, t_{\max}]$. Given a gray level $t$ in the interval $[t_{\min}, t_{\max}]$, we can obtain a corresponding binary image $B_{t}$ by applying the following formula to threshold image $f$:

B_{t} (x, y) = \begin{cases} 1, & f (x, y) > t \\ 0, & f (x, y) \leq t \end{cases}

(5)

For a binary image $B_{t}$, let $B_{t} (x, y)$ denote a pixel in the image $B_{t}$. The 4-neighborhood pixels of the pixel $B_{t} (x, y)$ can be denoted as $N_{4} (x, y) = \{B_{t} (x - 1, y), B_{t} (x + 1, y), B_{t} (x, y - 1), B_{t} (x, y + 1)\}$. The interior pixel region of the binary image $B_{t}$ is defined as $I R = \{(x, y) | \forall b \in N_{4} (x, y), B_{t} (x, y) \land b = 1\}$.

Performing a bitwise inversion on the binary image $B_{t}$, we can obtain a corresponding binary image $\tilde{B_{t}}$. The 4-neighborhood pixels of the pixel $\tilde{B_{t}} (x, y)$ can be denoted as $\tilde{N_{4}} (x, y) = \{\tilde{B_{t}} (x - 1, y), \tilde{B_{t}} (x + 1, y), \tilde{B_{t}} (x, y - 1), \tilde{B_{t}} (x, y + 1)\}$. The interior pixel region of the binary image $\tilde{B_{t}}$ is defined as $\tilde{I R} = \{(x, y) | \forall b \in \tilde{N_{4}} (x, y), \tilde{B_{t}} (x, y) \land b = 1\}$.

Algorithm 1 is employed to extract the contour image $C_{t}$ from the image $B_{t}$. For the binary image $B_{t}$, the corresponding inner contour image $I C_{t}$ can be obtained by removing the pixels in its interior pixel region $I R$ (i.e., setting the pixel values in the $I R$ of the binary image $B_{t}$ to 0). Similarly, the outer contour image $O C_{t}$ of the binary image $B_{t}$ can be obtained by removing the pixels in the interior pixel region $\tilde{I R}$ of the binary image $\tilde{B_{t}}$. Further, we can obtain the final binary contour image $C_{t}$ of the binary image $B_{t}$ by computing the union of the inner and outer contour images, as follows: $C_{t} = I C_{t} \cup O C_{t}$.

Algorithm 1: Extract $C_{t}$
Input:	A binary image $B_{t}$ .
Output:	A binary contour image $C_{t}$ .
Step 1:	Obtain an inner contour image $I C_{t}$ by setting the pixel values in the $I R$ of the binary image $B_{t}$ to 0.
Step 2:	Obtain an outer contour image $O C_{t}$ by setting the pixel values in the $\tilde{I R}$ of the inverted binary image $\tilde{B_{t}}$ to 0.
Step 3:	Obtain a binary contour image $C_{t}$ by computing the union of the inner and outer contour images, as follows: $C_{t} = I C_{t} \cup O C_{t}$ .
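A compact NumPy rendering of Algorithm 1 might look like the sketch below. The names `interior_region` and `extract_contour` are illustrative, and the treatment of image borders (out-of-image neighbors replicate the border pixel) is an assumption, since the paper does not say how border pixels are handled.

```python
import numpy as np

def interior_region(b):
    """Mask of pixels that are 1 and whose four neighbors are all 1."""
    p = np.pad(b, 1, mode="edge")  # border handling is an assumption
    return (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
            & p[1:-1, :-2] & p[1:-1, 2:])

def extract_contour(b_t):
    """Binary contour image C_t = IC_t union OC_t of a binary image B_t."""
    b = np.asarray(b_t).astype(bool)
    inner = b & ~interior_region(b)     # Step 1: B_t with IR set to 0
    outer = ~b & ~interior_region(~b)   # Step 2: inverted B_t with its IR set to 0
    return (inner | outer).astype(np.uint8)  # Step 3: union
```

For a 3 × 3 square of ones in a 7 × 7 image, this yields the 8 ring pixels of the square (inner contour) plus the 12 background pixels that touch it in the 4-neighborhood sense (outer contour).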

By applying the above binary contour extraction method to each gray level $t \in [t_{\min}, t_{\max}]$ in the gray-level image $f$, we can obtain a set of binary contour images $\{C_{t} | t \in [t_{\min}, t_{\max}]\}$. These binary contour images reflect the contour features of the target in the image to varying degrees.

5. Calculation of Homologous Isomeric Similarity

The gray-level edge image $E$ and the binary contour image $C_{t}$ are homologous, as they derive from the same original gray-level image $f$. They are also isomeric: the former describes the target edge features of image $f$ in the discrete gray-level space, while the latter describes the target contour features of image $f$ in the discrete binary space (see Figure 3). Both target edges and target contours objectively describe the spatial location and shape of the target in an image, meaning that there is a specific similarity in terms of planar geometric structure between target edges and target contours. Thus, there is a certain similarity between the homologous and isomeric gray-level edge image $E$ and the binary contour image $C_{t}$. We adopt terminology from biology and chemistry and call this similarity Homologous Isomeric Similarity (HIS).

The HIS between the gray-level edge image $E$ and the binary contour image $C_{t}$ varies with the threshold $t$. For a threshold $t$ that is close to the ideal threshold, the corresponding binary contour image $C_{t}$ represents the target contour features more accurately, resulting in a higher HIS. In contrast, for a threshold $t$ that is far from the ideal threshold, the corresponding binary contour image $C_{t}$ represents the target contour features less accurately, leading to a lower HIS. In other words, if there exists a binary contour image $C_{t^{*}}$ that exhibits the maximum HIS with the gray-level edge image $E$, then the corresponding gray level $t^{*}$ is likely to be a reasonable threshold.

Under the guiding principle of maximizing HIS, the HIST method converts the problem of selecting a reasonable threshold $t^{*}$ into finding the binary contour image $C_{t^{*}}$ that is most similar to the gray-level edge image $E$. The critical issue after this conversion is how to robustly measure the HIS.

Renyi mutual information can capture higher-order dependencies than Shannon mutual information, rendering it more robust for analyzing asymmetric or non-uniformly distributed data [28]. Since the discrete probability distributions of the gray-level edge image $E$ and the binary contour image $C_{t}$ are typically asymmetric and non-uniform, Renyi mutual information is a suitable similarity measure. Furthermore, as the normalized mutual information is more robust than the standard mutual information [29,30], this paper adopts the normalized form of Renyi mutual information [31] as the HIS measure, and proposes the following objective function for selecting the final threshold $t^{*}$:

t^{*} = \underset{t \in [t_{\min}, t_{\max}]}{\arg \max} ξ (E, C_{t})

(6)

ξ (E, C_{t}) = \frac{H_{α} (E) + H_{α} (C_{t})}{H_{α} (E, C_{t})}

(7)

Equation (6) defines the objective of the threshold selection problem. It states that the optimal threshold $t^{*}$ is the value within the feasible range $[t_{\min}, t_{\max}]$ that maximizes the similarity metric between the edge image $E$ and the binary contour image $C_{t}$. This transforms the visual task of matching contours into a numerical optimization problem. To quantify the similarity in Equation (6), we employ the normalized Renyi mutual information, as shown in Equation (7), where $H_{α} (E)$ and $H_{α} (C_{t})$ are the Renyi entropies of image $E$ and image $C_{t}$, respectively, and $H_{α} (E, C_{t})$ represents the joint Renyi entropy of images $E$ and $C_{t}$. These entropies are defined as follows:

H_{α} (E) = \frac{1}{1 - α} \log \sum_{i = 0}^{255} {(p_{E} (i))}^{α}, α \neq 1

(8)

H_{α} (C_{t}) = \frac{1}{1 - α} \log \sum_{j = 0}^{1} {(p_{C_{t}} (j))}^{α}, α \neq 1

(9)

H_{α} (E, C_{t}) = \frac{1}{1 - α} \log \sum_{i = 0}^{255} \sum_{j = 0}^{1} {(P_{E C_{t}} (i, j))}^{α}, α \neq 1

(10)

where $p_{E} (i)$ denotes the probability of gray level $i$ in the gray-level edge image $E$, $i \in [0, 255]$, and $p_{C_{t}} (j)$ denotes the probability of pixel value $j$ in the binary contour image $C_{t}$, $j \in \{0, 1\}$. $P_{E C_{t}} (i, j)$ represents the joint probability that a pixel has gray level $i$ in the gray-level edge image $E$ and pixel value $j$ in the binary contour image $C_{t}$.
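Equations (7)–(10) translate fairly directly into code once the joint histogram of $E$ and $C_{t}$ is available. The sketch below is one possible rendering; the function names are illustrative, and returning 0 when the joint entropy vanishes (both images constant) is a convention chosen here, not something the paper specifies.

```python
import numpy as np

def renyi_entropy(p, alpha=2.0):
    """Renyi entropy of a discrete distribution, Equations (8)-(10), alpha != 1."""
    p = p[p > 0]
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def normalized_renyi_mi(e_u8, c_bin, alpha=2.0):
    """xi(E, C_t) of Equation (7) for an 8-bit image E and a 0/1 image C_t."""
    e = np.asarray(e_u8).ravel().astype(np.intp)
    c = np.asarray(c_bin).ravel().astype(np.intp)
    # Joint histogram: cell (i, j) is stored at flat index 2 * i + j.
    joint = np.bincount(2 * e + c, minlength=512).astype(np.float64)
    joint = joint.reshape(256, 2) / joint.sum()
    h_e = renyi_entropy(joint.sum(axis=1), alpha)   # marginal of E
    h_c = renyi_entropy(joint.sum(axis=0), alpha)   # marginal of C_t
    h_ec = renyi_entropy(joint.ravel(), alpha)      # joint entropy
    if h_ec <= 0.0:
        return 0.0  # degenerate case, convention assumed in this sketch
    return (h_e + h_c) / h_ec
```

Two sanity checks follow from the definition: if $C_{t}$ is a one-to-one relabeling of a two-level $E$, all three entropies coincide and the measure equals 2; if the two images are statistically independent and $α = 2$, the joint sum of squares factorizes and the measure equals 1.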

The Renyi mutual information has a parameter $α$. As $α \to 1$, the Renyi entropy degenerates into Shannon entropy, and the normalized Renyi mutual information also degenerates into the normalized Shannon mutual information. Shannon mutual information weights each gray level in proportion to $p \log p$, so it is equally sensitive to all structural components of the histogram, including small asymmetries and long tails. When $α > 1$, Renyi entropy assigns higher weights to high-probability events and lower weights to low-probability events in the probability distribution. Because the gray-level distribution of the edge image $E$ contains sparse data points, utilizing Renyi mutual information with $α > 1$ can reduce the contribution of these outliers to the HIS, thereby stabilizing the computation of the HIS. Further, when $α = 2$, Equations (8)–(10) involve only the squaring operation of the probability distribution, which simplifies the HIS computation. Accordingly, this paper sets $α = 2$.
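For reference, substituting $α = 2$ into Equations (8)–(10) and then into Equation (7) gives the following simplified forms. This is a direct algebraic substitution written out here for convenience; it is not reproduced from the paper.

```latex
H_{2}(E) = -\log \sum_{i=0}^{255} p_{E}(i)^{2}, \qquad
H_{2}(C_{t}) = -\log \sum_{j=0}^{1} p_{C_{t}}(j)^{2}, \qquad
H_{2}(E, C_{t}) = -\log \sum_{i=0}^{255} \sum_{j=0}^{1} P_{E C_{t}}(i, j)^{2}

\xi(E, C_{t}) =
\frac{\log \sum_{i} p_{E}(i)^{2} + \log \sum_{j} p_{C_{t}}(j)^{2}}
     {\log \sum_{i} \sum_{j} P_{E C_{t}}(i, j)^{2}}
```

The minus signs cancel between numerator and denominator, so the measure reduces to a ratio of logarithms of sums of squared probabilities.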

6. Algorithm Description of HIST Method

Algorithm 2 describes key steps for selecting the final threshold in the HIST method, while Figure 3 visually illustrates these key steps.

Algorithm 2: HIST
Input:	A gray-level image $f$ .
Output:	A threshold $t^{*}$ and a binary image $B_{t^{*}}$ .
Step 1:	Set the initial values of variables $ξ$ , $ξ_{\max}$ , and $t^{*}$ to 0.
Step 2:	Extract the gray-level edge image $E$ from the image $f$ using the unified transformation method described in Section 3.
Step 3:	for $t = t_{\min}$ to $t_{\max}$ do
Step 4:	Threshold the gray-level image $f$ with the gray level $t$ to obtain a binary image $B_{t}$ . Then, extract the binary contour image $C_{t}$ from the image $B_{t}$ using Algorithm 1 outlined in Section 4.
Step 5:	Calculate the normalized Renyi mutual information between the gray-level edge image $E$ and the binary contour image $C_{t}$ using Equation (7) and record the result in $ξ$ .
Step 6:	if $(ξ > ξ_{\max})$ then $ξ_{\max} = ξ$ ; $t^{*} = t$
Step 7:	end if
Step 8:	end for
Step 9:	Threshold the gray-level image $f$ using the final threshold $t^{*}$ , and output the thresholding result image $B_{t^{*}}$ and the threshold $t^{*}$ .

7. Experimental Results and Discussions

This section presents the experimental results and a discussion to validate the effectiveness and adaptability of the proposed HIST method. Comparative experiments were conducted on both synthetic and real-world images against five state-of-the-art methods. To assess segmentation accuracy, Misclassification Error (ME) and Matthews Correlation Coefficient (MCC) were adopted as the primary evaluation metrics, while results for Jaccard, Dice, and Root-Mean-Square Error (RMSE) are provided in the Supplementary Materials. The experimental section is organized as follows: Section 7.1 details the experimental environment, comparison methods, and evaluation metrics to establish the testing framework; Section 7.2 analyzes the results using synthetic images under controlled conditions; Section 7.3 extends the validation to real-world images; and Section 7.4 compares the computational efficiency of the methods.

7.1. Experimental Environment, Comparison Methods, and Quantitative Evaluation Indicators

The main hardware and software used in the experiments were as follows: AMD R7-6800H 3.20 GHz CPU, 16 GB DDR5 memory, Windows 11 64-bit operating system, and MATLAB 2018b 64-bit development platform. The test image set comprised 8 synthetic images and 100 real-world images. Their gray-level histograms exhibit non-modal, unimodal, bimodal, or multimodal patterns. The test images and their ground truth images can be accessed via this link: https://wwqj.lanzoum.com/i8OMo2y1jkxa (accessed on 22 December 2025).

The proposed HIST method is compared with five recently developed methods, including three thresholding methods and two non-thresholding methods. The five compared methods are as follows: histogram-based global thresholding (HBGT) method [13], thresholding method based on nonextensive entropy and variance of grayscale distribution (NEVGD) [21], second-order derivative valley emphasis (SDVE) thresholding method [23], fuzzy subspace clustering (FSC) method [32], and region-edge-based active contours (RAC) method [33].

ME [5,20,21] and MCC [34,35] were utilized to quantitatively evaluate the segmentation accuracy of each method. Jaccard, Dice, and RMSE [36,37] were also employed. The conclusions regarding the adaptability and accuracy of the six methods were the same whether they were drawn from ME and MCC or from Jaccard, Dice, and RMSE. Therefore, the experimental results and analysis for the Jaccard, Dice, and RMSE metrics are made available as online resources in the Supplementary Materials section.

ME reflects the proportion of target pixels misclassified as background pixels and vice versa in the segmentation result image. Its calculation is as follows:

M E = 1 - \frac{∣ O_{g} \cap O_{r} ∣ + ∣ B_{g} \cap B_{r} ∣}{∣ O_{g} ∣ + ∣ B_{g} ∣}

(11)

where $O_{g}$ and $B_{g}$ represent the sets of target and background pixels in the ground truth image, respectively. $O_{r}$ and $B_{r}$ represent the sets of target and background pixels in the segmentation result image, respectively. The symbol $\cap$ denotes the intersection operation and the symbol $| \cdot |$ is used to count the number of elements in a set. The ME value varies in the range [0, 1], and a lower ME indicates fewer misclassifications in the segmentation result image. Specifically, ME equals 0 when the segmentation result image is identical to the ground truth image, and ME equals 1 when the segmentation result image is completely opposite to the ground truth image.
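Equation (11) amounts to one minus the fraction of pixels on which the two binary images agree. A minimal sketch, assuming both images are arrays of the same shape in which nonzero marks the target (the function name is illustrative):

```python
import numpy as np

def misclassification_error(ground_truth, result):
    """ME of Equation (11) for two binary images of the same shape."""
    g = np.asarray(ground_truth).astype(bool)
    r = np.asarray(result).astype(bool)
    # |O_g ∩ O_r| + |B_g ∩ B_r|: pixels labeled identically in both images.
    agree = np.count_nonzero(g & r) + np.count_nonzero(~g & ~r)
    # |O_g| + |B_g| is simply the total number of pixels.
    return 1.0 - agree / g.size
```

For a four-pixel image in which exactly one pixel is labeled differently, the function returns 0.25.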

MCC takes into account all four components of the confusion matrix: true positives, true negatives, false positives, and false negatives, making it a robust quantitative evaluation metric. The formula for calculating MCC is as follows:

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP) \cdot (TP + FN) \cdot (TN + FP) \cdot (TN + FN)}} \tag{12}$$

where $TP$ and $TN$ represent the number of true-positive and true-negative elements, respectively, and $FP$ and $FN$ represent the number of false-positive and false-negative elements, respectively. The MCC value varies in the range [−1, 1]. Specifically, MCC equals 1 when the segmentation result image is identical to the ground truth image, and MCC equals −1 when the segmentation result image is completely opposite to the ground truth image.
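Equation (12) can be sketched in the same way. The helper names `confusion_counts` and `mcc` are hypothetical, and returning 0 when the denominator vanishes is a common convention assumed here, since Equation (12) itself is undefined in that case.

```python
import math


def confusion_counts(ground_truth, result):
    """Count TP, TN, FP, FN for binary masks given as nested lists (1 = target)."""
    tp = tn = fp = fn = 0
    for g_row, r_row in zip(ground_truth, result):
        for g, r in zip(g_row, r_row):
            if g and r:
                tp += 1
            elif not g and not r:
                tn += 1
            elif r:
                fp += 1
            else:
                fn += 1
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient as in Equation (12)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention; the formula is undefined here
    return (tp * tn - fp * fn) / denom


truth = [[1, 1, 0, 0],
         [1, 1, 0, 0]]
inverted = [[1 - p for p in row] for row in truth]

print(mcc(*confusion_counts(truth, truth)))     # 1.0
print(mcc(*confusion_counts(truth, inverted)))  # -1.0
```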

7.2. Experimental Results and Analysis Using Synthetic Images

To test the six methods’ adaptability and accuracy for images with non-modal, unimodal, bimodal, or multimodal gray-level histograms, comparative experiments were first performed on the eight synthetic images in Figure 4. In Figure 4a,b, the size ratio of the target to the background is relatively balanced, and the gray-level histograms exhibit a non-modal pattern (see Figure 5a,b). In Figure 4c,d, the size ratio of the target to the background is severely imbalanced, and the gray-level histograms exhibit a unimodal pattern (see Figure 5c,d). Figure 4e,f show relatively balanced target-to-background size ratios and exhibit bimodal gray-level histograms (see Figure 5e,f). Figure 4g,h also show relatively balanced target-to-background size ratios but exhibit multimodal gray-level histograms (see Figure 5g,h).

Threshold selection is crucial in image thresholding. Therefore, we first compare and analyze the differences in threshold selection when using the proposed HIST method and three comparative thresholding methods. Note that $t_{ME}^{*}$ denotes the optimal threshold in the sense of minimizing ME, and $t_{MCC}^{*}$ denotes the optimal threshold in the sense of maximizing MCC. Table 1 presents the $t_{ME}^{*}$ and $t_{MCC}^{*}$ of the eight synthetic images, along with the thresholds selected by the four thresholding methods.
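The optimal threshold in the ME sense can be found by an exhaustive sweep over all gray levels, which also explains why several entries in Table 1 are ranges: every threshold lying in a gap between target and background gray levels yields the same ME. The following is a minimal sketch under two assumptions that are not stated in the text: pixels with gray level greater than t are labeled as target, and the image is passed as a flat list of gray levels with a matching flat ground truth. The function names are hypothetical.

```python
def me_at_threshold(pixels, truth, t):
    """ME of the binary image obtained by labeling pixels > t as target.

    Equivalent to Equation (11): the share of pixels whose label disagrees
    with the ground truth.
    """
    wrong = sum(1 for p, g in zip(pixels, truth) if (p > t) != bool(g))
    return wrong / len(pixels)


def optimal_thresholds_me(pixels, truth, levels=256):
    """Return every threshold in [0, levels) that attains the minimum ME."""
    errors = [me_at_threshold(pixels, truth, t) for t in range(levels)]
    best = min(errors)
    return [t for t, e in enumerate(errors) if e == best]


# Toy data: dark background (label 0) and bright target (label 1).
pixels = [10, 20, 30, 200, 210, 220]
truth = [0, 0, 0, 1, 1, 1]

best = optimal_thresholds_me(pixels, truth)
print(best[0], best[-1])  # 30 199
```

An analogous sweep that maximizes MCC instead of minimizing ME would give $t_{MCC}^{*}$.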

Figure 5 illustrates the objective function curves for the proposed HIST method on the eight synthetic images. Figure 5 also presents the differences in threshold selection among the four thresholding methods for these images. In Figure 5, each green area represents the gray-level histogram of a test image, and each black curve denotes the objective function curve of the HIST method on this test image. According to Table 1 and Figure 5, for the eight synthetic images with different histogram patterns, the thresholds obtained by the proposed HIST method are consistently equal to or fall within the range of the optimal thresholds. In contrast, the thresholds selected by the other three thresholding methods deviate from the optimal thresholds to varying degrees. Figure 6 further shows the segmentation results of the eight synthetic images obtained using the six methods.

The HBGT method utilizes the mean and standard deviation of the gray-level histogram to determine alpha and beta regions. Then, it selects the threshold by calculating the average gray levels of the alpha and beta regions. Because the HBGT method relies solely on statistical information, including the mean and standard deviation of the gray-level histogram for threshold selection, it has difficulty in selecting satisfactory thresholds for the eight synthetic images with different gray-level histogram patterns. Specifically, the synthetic images in Figure 4a–d,g,h exhibit non-modal, unimodal, or multimodal patterns, respectively. The thresholds obtained by the HBGT method are 97, 128, 100, 82, 113, and 120, all of which significantly deviate from the optimal thresholds. For Figure 4e,f, their gray-level histograms exhibit obvious bimodal patterns. The thresholds obtained by the HBGT are 120 and 84, respectively, which are close to the valley between the two peaks in the histograms (see Figure 5e,f). However, they still have a certain degree of deviation from the optimal thresholds at the valley positions. Overall, it can be observed from Figure 6a–h that the HBGT method fails to effectively extract the targets from the background in synthetic images with non-modal, unimodal, or multimodal histograms.

The SDVE method improves the Otsu method by introducing a modified valley metric based on the second-order derivative, aiming to make the threshold more likely to be located at the valley between two peaks or at the bottom of the unimodal histogram. The histograms of Figure 4a,b exhibit an approximately uniformly distributed non-modal pattern. Due to the absence of obvious peak–valley features in their histograms, the SDVE method tends to select the threshold by maximizing the inter-class variance. The thresholds obtained by the SDVE method for these two images are 121 and 116, which deviate from the optimal thresholds by 6 and 61 gray levels, respectively, resulting in varying degrees of misclassification (see Figure 6a,b). The Otsu method is suitable for thresholding images with a balanced target-to-background size ratio. Although the SDVE method introduces a second-order derivative weight into the objective function of the Otsu method to enhance its valley emphasis, for the two unimodal synthetic images with small targets in Figure 4c,d, it still yields thresholds far from the optimal thresholds at the bottom right of the unimodal histograms (see Figure 5c,d). Consequently, the SDVE method fails to effectively extract the small targets from noisy backgrounds (see Figure 6c,d). Figure 4e,f are two bimodal synthetic images with a relatively balanced size ratio of the target to the background. The thresholds obtained by the SDVE method are close to the valley between two peaks but still deviate to some extent from the optimal thresholds at the valleys, resulting in the misclassification of some background pixels as target pixels in the segmentation results (see Figure 6e,f). For the two multimodal synthetic images in Figure 4g,h, the SDVE method still has difficulty in selecting satisfactory thresholds, leading to obvious misclassification (see Figure 6g,h).
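For orientation, the between-class variance criterion of the plain Otsu method, which the SDVE method modifies, can be sketched as below. This illustrates only the Otsu baseline; the second-order derivative weight of the SDVE method [23] is not reproduced. The function name and the convention that gray levels less than or equal to t form the first class are assumptions.

```python
def otsu_threshold(hist):
    """Return the first gray level t that maximizes the between-class variance.

    hist[i] is the number of pixels with gray level i. Class 0 holds levels
    <= t and class 1 holds levels > t (assumed convention).
    """
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of class 0
    sum0 = 0  # sum of gray levels in class 0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the variance is undefined
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        # Proportional to the between-class variance w0 * w1 * (mu0 - mu1)^2;
        # the constant factor 1 / total^2 does not change the argmax.
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


# Toy bimodal histogram with two equal spikes at gray levels 50 and 200.
hist = [0] * 256
hist[50] = 100
hist[200] = 100
print(otsu_threshold(hist))
```

In this toy case every threshold between the two spikes gives the same variance, so the criterion is flat there; histograms without clear peaks and valleys leave the criterion similarly uninformative about the true class boundary.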

The NEVGD method is an entropy-based method that combines nonextensive entropy with the gray-level variance, which enhances the ability of the maximum Tsallis entropy method to extract targets from the background to some extent. When the gray-level distributions of the targets and the background are uniform and non-overlapping, the maximum entropy method can theoretically obtain the optimal threshold. Thus, for Figure 4a, the threshold obtained by the NEVGD method is 115, which is identical to the optimal threshold. For the non-modal image in Figure 4b, as its distribution is approximately uniform across the entire range [0, 255], the NEVGD method results in a threshold of 126. This outcome is foreseeable considering the maximum Tsallis entropy. However, the threshold of 126 is up to 51 gray levels away from the optimal threshold. Combined with Figure 6b, it can be seen that the NEVGD method fails to separate the target from this image. Although the NEVGD method introduces the gray-level variance in images into the objective function of maximum Tsallis entropy, it still produces serious misclassification for both unimodal and multimodal synthetic images (see Figure 6c,d,g,h). For the two bimodal synthetic images in Figure 4e,f, while the NEVGD method can roughly extract the targets, it still misclassifies some obvious background pixels as targets (see Figure 6e,f), and its segmentation accuracy is also much lower than the proposed HIST method.

Figure 6 shows that the proposed HIST method overall achieves more accurate results. Table 2 and Table 3 present the ME and MCC values, respectively, for the six methods on the eight synthetic images. In these two tables, the minimum ME values and maximum MCC values corresponding to each image are indicated in bold. Statistical analysis indicates that, on the eight synthetic images, the average ME values for the HIST, HBGT, SDVE, NEVGD, FSC, and RAC methods are 0.0013, 0.2265, 0.1797, 0.1220, 0.1168, and 0.0153, respectively; the average MCC values for the six methods are 0.9969, 0.5715, 0.6385, 0.6603, 0.6873, and 0.7164, respectively. Compared to the RAC method, which holds the second-highest segmentation accuracy, the proposed HIST method achieves a 91.50% decrease in the ME mean and a 39.15% increase in the MCC mean. Since lower ME values and higher MCC values indicate superior accuracy, the results in Table 2 and Table 3 further confirm that the proposed HIST method possesses higher accuracy and greater adaptability for test images with non-modal, unimodal, bimodal, or multimodal histogram patterns.
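The two percentages above follow from the reported averages of the HIST and RAC methods. A short sketch of the arithmetic, using only the mean values quoted in this paragraph (the helper name `relative_change` is hypothetical):

```python
def relative_change(value, reference):
    """Relative change of value with respect to reference, in percent."""
    return (value - reference) / reference * 100.0


# Average ME and MCC on the eight synthetic images, as reported in the text.
me_hist, me_rac = 0.0013, 0.0153
mcc_hist, mcc_rac = 0.9969, 0.7164

me_decrease = -relative_change(me_hist, me_rac)
mcc_increase = relative_change(mcc_hist, mcc_rac)
print(f"{me_decrease:.2f}% {mcc_increase:.2f}%")  # 91.50% 39.15%
```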

7.3. Experimental Results and Analysis on Real-World Images

To test the six methods’ adaptability and accuracy for real-world images with non-modal, unimodal, bimodal, or multimodal gray-level histograms, comparative experiments were further performed on 100 real-world images. Among these test images, the gray-level histograms of images 1 through 10 are non-modal; those of images 11 through 29 are unimodal; those of images 30 through 69 are bimodal; and those of images 70 through 100 are multimodal, with three or more peaks. These real-world images were collected from various application domains and were captured using diverse imaging methods, primarily ultrasonic imaging, infrared thermal imaging, and optical CCD imaging.

Figure 7 shows eight representative test images from the 100 real-world images, and the gray-level histograms of the eight images exhibit non-modal, unimodal, bimodal, or multimodal patterns (see Figure 8). Figure 8 also shows the objective function curves of the proposed HIST method on the eight images and illustrates threshold selection differences among the four thresholding methods on the eight images. For the eight images shown in Figure 7, the $t_{ME}^{*}$ values are 195, 27, 198, 70, 188, 214, 173, and 52, respectively, and the $t_{MCC}^{*}$ values are identical.

Figure 8 demonstrates that the thresholds obtained via the proposed HIST method consistently approximate the optimal thresholds most closely. In contrast, thresholds derived from the other three methods exhibit varying degrees of deviation from the optimal thresholds. Specifically, for the two real-world images with non-modal gray-level histograms (see Figure 8a,b), the thresholds obtained by the HBGT, SDVE, and NEVGD methods deviate from the optimal thresholds by 67 and 85 gray levels, 78 and 90 gray levels, and 68 and 84 gray levels, respectively. In contrast, the thresholds selected by the proposed HIST method are completely consistent with the optimal thresholds. For the two real-world images with unimodal gray-level histograms (see Figure 8c,d), the NEVGD method yields thresholds closer to the optimal thresholds than the HBGT and SDVE methods, with deviations of eight and six gray levels, respectively. However, the thresholds obtained by the proposed HIST method deviate from the optimal thresholds by only one gray level. For the four real-world images with bimodal or multimodal gray-level histograms (see Figure 8e–h), only the HIST method still obtains thresholds that are the closest to the optimal thresholds.

Figure 9 further shows the segmentation results of the six methods on the eight representative real-world images. From Figure 9, it can be observed that only the proposed HIST method successfully separates the targets from the backgrounds, whereas the other five methods often exhibit varying degrees of misclassification. Taking the infrared pedestrian image with the multimodal pattern in Figure 7g as an example, the comparison results of the different methods are shown in Figure 9g. The HBGT, SDVE, FSC, and RAC methods fail to separate the pedestrian from the complex background, and all suffer obvious misclassification. Although the NEVGD method can extract the pedestrian, it mistakenly classifies some bright regions in the background as targets. Only the proposed HIST method successfully separates the pedestrian target from the background, yielding a result image that most closely resembles the ground truth image.

Figure 10 and Figure 11 further show the ME and MCC values of six methods on all 100 real-world images. It can be calculated that the average ME values of the HIST, HBGT, SDVE, NEVGD, FSC, and RAC methods are 0.0052, 0.1749, 0.2381, 0.0675, 0.1508, and 0.1864, respectively, and the average MCC values of these methods are 0.9699, 0.6071, 0.5361, 0.8215, 0.6484, and 0.6027, respectively. Given that a lower ME and a higher MCC indicate better accuracy, the above statistical results show that the proposed HIST method overall has higher accuracy and greater adaptability on these 100 real-world test images. From Figure 10 and Figure 11, the following can be observed: (i) The ME values of the proposed HIST method are all less than 0.1, and the MCC values are all greater than 0.9, with minimal fluctuations in both metrics. This indicates that the HIST method possesses good accuracy and robust adaptability. (ii) The ME values of the HBGT method are scattered within the range of 0 to 0.7, and the MCC values are scattered between 0 and 1. The ME and MCC values of the SDVE method are both scattered between 0 and 1. These indicate that the HBGT and SDVE methods have relatively lower accuracy and weaker adaptability in general. (iii) Except for a few real-world images with non-modal distributions, the ME values of the NEVGD method are generally below 0.1, with a few values scattered between 0 and 1. The MCC values of the NEVGD method are primarily above 0.6, with a few below 0.6. These indicate that the NEVGD method performs well on unimodal, bimodal, and multimodal test images but is unsuitable for non-modal test images. (iv) FSC and RAC are non-thresholding methods that are more suitable for test images with bimodal distributions. In bimodal test images, their ME values are primarily scattered between 0 and 0.1, and their MCC values are mainly scattered between 0.7 and 1. However, they are not suitable for test images with non-modal, unimodal, or multimodal patterns.

We conducted paired t-tests and Wilcoxon signed-rank tests on the ME and MCC scores obtained from the 100 real-world images. As shown in Table 4, both tests indicate that the ME and MCC improvements are statistically significant (p < 0.001).
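To illustrate what a paired test operates on, the sketch below computes the paired t statistic from per-image score differences using only the Python standard library. The score lists are hypothetical toy values, not results from this study, and the function name is an assumption. Converting the statistic to a p-value requires the t distribution with n − 1 degrees of freedom, for which a statistics library (for example, `scipy.stats.ttest_rel`, with `scipy.stats.wilcoxon` for the signed-rank test) would normally be used.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired samples of equal length."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation of differences
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical per-image ME values for two methods on the same four images.
me_method_a = [0.01, 0.02, 0.00, 0.01]
me_method_b = [0.11, 0.22, 0.05, 0.31]

t_stat, dof = paired_t_statistic(me_method_a, me_method_b)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

A negative statistic here means the first method has the lower ME on average across the paired images.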

7.4. Comparison of Computational Efficiency

To compare the computational efficiency of different methods, we calculated the CPU runtime for each method on 8 synthetic images and 100 real-world images, respectively. Because the CPU runtime may fluctuate between runs, we took the average runtime of 10 consecutive runs as the CPU runtime of a method on each test image. Furthermore, the standard deviation of the CPU runtimes was used as an indicator of the stability in computational efficiency. Table 5 presents the mean and standard deviation of CPU runtimes for each method on 8 synthetic images and 100 real-world images. According to Table 5, the efficiency of the proposed HIST method is surpassed only by the simple thresholding methods HBGT and SDVE, and it exceeds that of the NEVGD, FSC, and RAC methods.
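The timing protocol described above can be sketched as follows. This is a minimal illustration rather than the measurement code used in the experiments: the function names are hypothetical, `dummy_method` is a stand-in for a segmentation method, and the standard deviation is assumed to be taken over the per-image runtimes.

```python
import statistics
import time


def cpu_runtime(method, image, runs=10):
    """Average CPU time in seconds over `runs` consecutive calls of method(image)."""
    samples = []
    for _ in range(runs):
        start = time.process_time()
        method(image)
        samples.append(time.process_time() - start)
    return statistics.fmean(samples)


def runtime_summary(method, images, runs=10):
    """Mean and standard deviation of the per-image CPU runtimes."""
    per_image = [cpu_runtime(method, img, runs) for img in images]
    spread = statistics.stdev(per_image) if len(per_image) > 1 else 0.0
    return statistics.fmean(per_image), spread


def dummy_method(image):
    """Hypothetical stand-in for a segmentation method: counts bright pixels."""
    return sum(1 for row in image for p in row if p > 127)


# Three small synthetic 64 x 64 images given as nested lists of gray levels.
images = [[[(x * y + k) % 256 for x in range(64)] for y in range(64)]
          for k in range(3)]

mean_rt, std_rt = runtime_summary(dummy_method, images)
print(f"mean {mean_rt:.6f} s, std {std_rt:.6f} s")
```

`time.process_time` reports CPU time of the current process rather than wall-clock time, which matches the notion of CPU runtime used here.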

A synthetic image and a real-world image, shown in Figure 4a and Figure 7d, respectively, were utilized to further test the changing trend of computational efficiency for different methods as the image size varies. The original dimensions of the two images were 128 × 128 pixels and 256 × 256 pixels, respectively. In the experiments, the two images were enlarged to 1024 × 1024 pixels and 2048 × 2048 pixels, respectively.

Table 6 and Table 7 indicate that as the image size gradually increases, the CPU runtime for all six methods also progressively increases. Overall, the HBGT, SDVE, and NEVGD methods exhibit a slower increase in CPU runtime. This is primarily because these three methods first extract one-dimensional gray-level distribution from an input gray-level image, and subsequent threshold calculations are performed in the one-dimensional information space. The slightly higher CPU runtime for the NEVGD is due to the additional determination of the nonextensive parameter value.

The HIST, FSC, and RAC methods suffer a rapid increase in CPU runtime, primarily because their computations are performed in the two-dimensional image space. For the HIST method, the main computational costs arise from the unified transformation, binary contour image extraction, and the calculation of homologous isomeric similarity. For the FSC method, the primary computational costs are associated with local variance and non-local spatial information, mean membership linking, and subspace weight allocation. For the RAC method, the main computational costs occur during region energy calculation, edge energy calculation, and energy function updating.

8. Conclusions and Future Work

Traditional representative thresholding methods tend to focus on handling images with specific gray-level histogram patterns. To address the challenge of automatically thresholding images with non-modal, unimodal, bimodal, or multimodal histogram patterns within a unified framework, the proposed HIST method applies a unified transformation toward unimodal distribution on the input image. This unified transformation converts gray-level histograms of different patterns into a unified, unimodal, right-skewed gray-level histogram, thereby reducing the prior dependence of the HIST method on the gray-level histogram of the original image. Furthermore, the HIST method converts the problem of selecting a reasonable threshold into a computational problem of HIS, and the normalized Renyi mutual information can robustly measure the HIS between the gray-level edge image and the binary contour image. However, implementing this unified transformation and similarity maximization within a computationally efficient framework presents a challenge.

Despite the challenge of maintaining high computational efficiency, experimental results show that the proposed HIST method outperforms five other methods—HBGT, SDVE, NEVGD, FSC, and RAC—in terms of adaptability and accuracy for test images with non-modal, unimodal, bimodal, or multimodal histogram patterns. On synthetic images and real-world images, compared to the second-best method, the proposed method reduces the average ME by 91.50% and 92.30%, respectively, and increases the average MCC by 39.15% and 18.06%, respectively. While the method demonstrates superior segmentation accuracy across diverse modalities, the trade-off between computational cost and precision remains a consideration for real-time applications.

Computational efficiency is one aspect of the proposed HIST method that could be further improved. Future work will focus on refining the algorithmic process to enhance efficiency, minimizing unnecessary computations, and investigating more sophisticated data structures. Moreover, approximation algorithms are a promising line of inquiry that could reduce computational time without substantially affecting the accuracy and adaptability that the HIST method offers. The main computational costs of the HIST method arise from image processing operations (e.g., bilateral filtering, edge detection) and the iterative calculation of similarity measures for multiple thresholds. These tasks are inherently suitable for parallel execution and align well with the SIMT (Single Instruction, Multiple Threads) architecture of modern GPUs, so future work will also involve developing a GPU-accelerated version of the HIST method to significantly reduce runtime.

The HIST method currently focuses on selecting a single threshold. Future work will also explore extending the HIS-based methodology to multi-level thresholding while maintaining high efficiency and accuracy. This may require developing new algorithmic frameworks to handle the concurrent optimization of multiple thresholds.

Supplementary Materials

Three additional quantitative evaluation metrics, namely, Jaccard, Dice, and RMSE, have also been employed to further assess the accuracy of the six methods. The quantitative evaluation results can be accessed via the link: https://wwqj.lanzoum.com/icGRM2y1k2rc (accessed on 22 December 2025).

Author Contributions

Conceptualization, Y.Z.; methodology, Y.Z.; validation, W.Y. and Q.H.; investigation, W.Y. and Q.H.; resources, Y.Z.; data curation, W.Y. and Q.H.; writing—original draft preparation, Q.H.; writing—review and editing, Y.Z. and W.Y.; visualization, W.Y.; supervision, Y.Z.; project administration, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Hubei Provincial Central Guidance Local Science and Technology Development Project (Grant No. 2024BSB002), and National Natural Science Foundation of China (Grant No. 61871258).

Data Availability Statement

The test images and their ground truth images can be accessed via this link: https://wwqj.lanzoum.com/i8OMo2y1jkxa (accessed on 22 December 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Patra, D.K.; Si, T.; Mondal, S.; Mukherjee, P. Breast DCE-MRI segmentation for lesion detection by multi-level thresholding using student psychological based optimization. Biomed. Signal Process. Control 2021, 69, 102925.
2. Wang, S.; Fan, J. Simplified expression and recursive algorithm of multi-threshold Tsallis entropy. Expert Syst. Appl. 2024, 237, 121690.
3. Truong, M.T.N.; Kim, S. Automatic image thresholding using Otsu’s method and entropy weighting scheme for surface defect detection. Soft Comput. 2018, 22, 4197–4203.
4. Liu, C.; Xie, F.; Dong, X.; Gao, H.; Zhang, H. Small target detection from infrared remote sensing images using local adaptive thresholding. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 1941–1952.
5. Lei, B.; Fan, J. Infrared pedestrian segmentation algorithm based on the two-dimensional Kaniadakis entropy thresholding. Knowl.-Based Syst. 2021, 225, 107089.
6. Si, T.; Patra, D.K.; Mondal, S.; Mukherjee, P. Segmentation of breast lesion in DCE-MRI by multi-level thresholding using sine cosine algorithm with quasi opposition-based learning. Pattern Anal. Appl. 2023, 26, 201–216.
7. Patra, D.K.; Si, T.; Mondal, S.; Mukherjee, P. Magnetic resonance image of breast segmentation by multi-level thresholding using moth-flame optimization and whale optimization algorithms. Pattern Recognit. Image Anal. 2022, 32, 174–186.
8. Goh, T.Y.; Basah, S.N.; Yazid, H.; Safar, M.J.A.; Saad, F.S.A. Performance analysis of image thresholding: Otsu technique. Measurement 2018, 114, 298–307.
9. Song, S.; Liu, J.; Ni, H.; Cao, X.; Pu, H.; Huang, B. A new automatic thresholding algorithm for unimodal gray-level distribution images by using the gray gradient information. J. Pet. Sci. Eng. 2020, 190, 107074.
10. Christy, A.J.; Umamakeswari, A. A novel percentage split distribution method for image thresholding. Optik 2020, 218, 164953.
11. Farshi, T.R.; Demirci, R. Multilevel image thresholding with multimodal optimization. Multimed. Tools Appl. 2021, 80, 15273–15289.
12. Manda, M.P.; Kim, H.S. A fast image thresholding algorithm for infrared images based on histogram approximation and circuit theory. Algorithms 2020, 13, 207.
13. Elen, A.; Dönmez, E. Histogram-based global thresholding method for image binarization. Optik 2024, 306, 171814.
14. Kapur, J.N.; Sahoo, P.K.; Wong, A.K.C. A new method for gray-level picture thresholding using the entropy of the histogram. Comput. Vis. Graph. Image Process. 1985, 29, 273–285.
15. Albuquerque, M.P.; Esquef, I.A.; Mello, A.R.G.; Albuquerque, M.P. Image thresholding using Tsallis entropy. Pattern Recognit. Lett. 2004, 25, 1059–1065.
16. Lin, Q.; Ou, C. Tsallis entropy and the long-range correlation in image thresholding. Signal Process. 2012, 92, 2931–2939.
17. Nie, F.; Zhang, P.; Li, J.; Ding, D. A novel generalized entropy and its application in image thresholding. Signal Process. 2017, 134, 23–34.
18. Sparavigna, A.C. Bi-level image thresholding obtained by means of Kaniadakis entropy. arXiv 2015, arXiv:1502.04500.
19. Lei, B.; Fan, J. Adaptive Kaniadakis entropy thresholding segmentation algorithm based on particle swarm optimization. Soft Comput. 2020, 24, 7305–7318.
20. Ferreira Junior, P.E.; Mello, V.M.; Giraldi, G.A. Image thresholding through nonextensive entropies and long-range correlation. Multimed. Tools Appl. 2023, 82, 43029–43073.
21. Deng, Q.; Shi, Z.; Ou, C. Self-adaptive image thresholding within nonextensive entropy and the variance of the gray-level distribution. Entropy 2022, 24, 319.
22. Tan, Z.; Basah, S.N.; Yazid, H.; Safar, M.J.A. Performance analysis of Otsu thresholding for sign language segmentation. Multimed. Tools Appl. 2021, 80, 21499–21520.
23. Xing, J.; Yang, P.; Qingge, L. Automatic thresholding using a modified valley emphasis. IET Image Process. 2020, 14, 536–544.
24. Kang, S.; He, Y.; Li, W.; Liu, S. Research on defect detection of wind turbine blades based on morphology and improved Otsu algorithm using infrared images. Comput. Mater. Contin. 2024, 81, 933–949.
25. Singh, S.; Mittal, N.; Singh, H.; Oliva, D. Improving the segmentation of digital images by using a modified Otsu’s between-class variance. Multimed. Tools Appl. 2023, 82, 40701–40743.
26. Paris, S.; Kornprobst, P.; Tumblin, J.; Durand, F. Bilateral filtering: Theory and applications. Found. Trends Comput. Graph. Vis. 2009, 4, 1–73.
27. Ling, F.; Kang, M.; Lin, X. Improved Canny edge detection algorithm. Comput. Sci. 2016, 43, 309–312.
28. Pan, M.; Zhang, F. Analysis of the α-Rényi entropy and its application for medical image registration. Biomed. Eng. Appl. Basis Commun. 2017, 29, 1750020.
29. Studholme, C.; Hill, D.L.G.; Hawkes, D.J. An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognit. 1999, 32, 71–86.
30. Gao, Z.; Gu, B.; Lin, J. Monomodal image registration using mutual information based methods. Image Vis. Comput. 2008, 26, 164–173.
31. Pan, M.; Zhang, F. Medical image registration based on Rényi’s quadratic mutual information. IETE J. Res. 2022, 68, 4100–4108.
32. Wei, T.; Wang, X.; Li, X.; Zhu, S. Fuzzy subspace clustering noisy image segmentation algorithm with adaptive local variance and non-local information and mean membership linking. Eng. Appl. Artif. Intell. 2022, 110, 104672.
33. Fang, J.; Liu, H.; Zhang, L.; Liu, J.; Liu, H. Region-edge-based active contours driven by hybrid and local fuzzy region-based energy for image segmentation. Inf. Sci. 2021, 546, 397–419.
34. Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6.
35. Chicco, D.; Warrens, M.J.; Jurman, G. The Matthews correlation coefficient (MCC) is more informative than Cohen’s kappa and Brier score in binary classification assessment. IEEE Access 2021, 9, 78368–78381.
36. Niazi, M.; Rahbar, K.; Sheikhan, M.; Khademi, M. Entropy-based kernel graph cut for textural image region segmentation. Multimed. Tools Appl. 2022, 81, 13003–13023.
37. Kaba, D.; Wang, Y.; Wang, C.; Liu, X.; Zhu, H.; Salazar-Gonzalez, A.G.; Li, Y. Retina layer segmentation using kernel graph cuts and continuous max-flow. Opt. Express 2015, 23, 7366–7384.

Figure 1. Unified transformation on four synthetic images toward a unimodal distribution. (Top) Four synthetic images (128 × 128 pixels, generated using Matlab R2018b) and their corresponding gray-level histograms are displayed from left to right: non-modal, unimodal, bimodal, and multimodal. (Bottom) The corresponding edge images and their gray-level histograms after the unified transformation, showing a unified right-skewed unimodal distribution.

Figure 2. (a–d) show the gray-level histograms of the processed images $D$ derived from four synthetic images with non-modal, unimodal, bimodal, and multimodal histograms in Figure 1. Image $D$ was obtained by applying bilateral filtering, Sobel operators, and non-maximum suppression to the original images.

Figure 3. Graphical illustration of Homologous Isomeric Similarity (HIS) and key steps of the proposed HIST method. The process includes (1) unified transformation to obtain an edge image $E$, (2) extraction of binary contour images $C_{t}$ at different thresholds $t$, and (3) maximization of the HIS to select the optimal threshold $t^{*}$.

Figure 4. The eight synthetic images used in the experiment. The histograms of (a,b) are non-modal; those of (c,d) are unimodal with small targets; those of (e,f) are bimodal; and those of (g,h) are multimodal.

Figure 5. (a–h) show thresholds obtained by four thresholding methods on eight synthetic images in Figure 4a–h, respectively. The green shaded areas represent the gray-level histograms of the test images, and the black solid curves indicate the normalized Renyi mutual information (objective function) of the proposed HIST method. Vertical lines in different colors indicate the selected thresholds for the HIST, HBGT, SDVE, and NEVGD methods.

Figure 6. (a–h) show segmentation results of six methods on eight synthetic images in Figure 4a–h, respectively. In each row, from left to right, are the test image from Figure 4, the ground truth, the results of the proposed HIST method, and the results of the comparison methods (HBGT, SDVE, NEVGD, FSC, and RAC).

Figure 7. (a–h) show eight representative real-world images collected from various domains, including ultrasonic imaging, infrared thermal imaging, and optical CCD imaging. Their histogram patterns vary from non-modal to multimodal.

Figure 8. (a–h) show thresholds obtained by four thresholding methods on eight representative real-world images in Figure 7a–h, respectively. The green shaded areas represent the gray-level histograms, and the black curves denote the objective function of the HIST method. Vertical lines in different colors indicate the selected thresholds for the HIST, HBGT, SDVE, and NEVGD methods.

Figure 9. (a–h) show segmentation results of six methods on eight representative real-world images in Figure 7a–h, respectively. In each row, from left to right, are the test image, the ground truth, the results of the proposed HIST method, and the results of the comparison methods (HBGT, SDVE, NEVGD, FSC, and RAC).

Figure 10. ME values of six methods using 100 real-world images. The y-axis represents the ME index, and lower values indicate better segmentation performance.

Figure 11. MCC values of six methods using 100 real-world images. The y-axis represents the MCC index, and higher values indicate better segmentation performance.

Table 1. Comparison of thresholds by four thresholding methods for eight synthetic images. $t_{ME}^{*}$ and $t_{MCC}^{*}$ denote the optimal thresholds in the sense of minimizing misclassification error (ME) and maximizing MCC, respectively. Values in brackets (e.g., [196–206]) indicate the range of optimal thresholds.

Methods	Figure 4a (non-modal)	Figure 4b (non-modal)	Figure 4c (unimodal)	Figure 4d (unimodal)	Figure 4e (bimodal)	Figure 4f (bimodal)	Figure 4g (multimodal)	Figure 4h (multimodal)
$t_{ME}^{*}$	115	177	[196–206]	[172–175]	[130–131]	[110–111]	[133–134]	174
$t_{MCC}^{*}$	115	177	[196–206]	[172–175]	130	110	[133–134]	174
HIST	115	177	196	172	130	110	133	174
HBGT [13]	97	128	100	82	120	84	113	120
SDVE [23]	121	116	100	82	101	82	139	144
NEVGD [21]	115	126	110	118	117	95	138	119
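
The bracketed entries in Table 1 occur when several adjacent thresholds tie for the best score. The following is a minimal sketch of how such an oracle range could be found. It assumes that ME is the fraction of pixels whose binary label differs from the ground truth and that a pixel is foreground when its gray level exceeds the threshold; both conventions are assumptions, and the function names and toy data are hypothetical.

```python
def misclassification_error(pred, truth):
    """Fraction of pixels whose binary label differs from the ground truth."""
    assert len(pred) == len(truth)
    wrong = sum(p != g for p, g in zip(pred, truth))
    return wrong / len(truth)


def oracle_threshold_range(pixels, truth, levels=256):
    """Return (best ME, all thresholds t that achieve it).

    Assumed convention: a pixel is foreground (1) when its gray level > t.
    """
    scores = []
    for t in range(levels):
        pred = [1 if v > t else 0 for v in pixels]
        scores.append(misclassification_error(pred, truth))
    best = min(scores)
    return best, [t for t, s in enumerate(scores) if s == best]


# Toy flattened image (made-up values): no pixel has a gray level in 101..199,
# so every threshold in 100..199 separates the two classes equally well.
pixels = [40, 60, 100, 100, 200, 220, 250]
truth = [0, 0, 0, 0, 1, 1, 1]
best_me, ts = oracle_threshold_range(pixels, truth)
print(best_me, ts[0], ts[-1])
```

A threshold selected by a method can then be judged by whether it falls inside the returned range, which is how the HIST row can be read against the two oracle rows.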

Table 2. ME values of the six methods on eight synthetic images. Lower values indicate better accuracy.

Test Image	HIST	HBGT [13]	SDVE [23]	NEVGD [21]	FSC [32]	RAC [33]
Figure 4a	0.0000	0.1182	0.0151	0.0000	0.0413	0.0128
Figure 4b	0.0024	0.1994	0.2475	0.2090	0.0473	0.0203
Figure 4c	0.0000	0.4930	0.4930	0.3397	0.3811	0.0015
Figure 4d	0.0000	0.4229	0.4229	0.0247	0.2861	0.0015
Figure 4e	0.0013	0.0057	0.0580	0.0088	0.0425	0.0071
Figure 4f	0.0046	0.0353	0.0411	0.0177	0.0345	0.0089
Figure 4g	0.0000	0.1990	0.0402	0.0352	0.0436	0.0154
Figure 4h	0.0024	0.3383	0.1201	0.3406	0.0578	0.0549

Table 3. MCC values of the six methods on eight synthetic images. Higher values indicate better accuracy.

Test Image	HIST	HBGT [13]	SDVE [23]	NEVGD [21]	FSC [32]	RAC [33]
Figure 4a	1.0000	0.7728	0.9644	1.0000	0.9028	0.9701
Figure 4b	0.9944	0.6560	0.5957	0.6435	0.8951	0.9521
Figure 4c	1.0000	0.0395	0.0395	0.0543	0.0497	0.0000
Figure 4d	1.0000	0.0455	0.0455	0.2381	0.0615	0.0000
Figure 4e	0.9970	0.9867	0.8771	0.9795	0.9004	0.9831
Figure 4f	0.9891	0.9220	0.9103	0.9595	0.9177	0.9791
Figure 4g	1.0000	0.6565	0.9054	0.9171	0.8980	0.9642
Figure 4h	0.9944	0.4930	0.7697	0.4906	0.8730	0.8828
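
Tables 2 and 3 can rank a result very differently on small-target images. In the RAC column for Figure 4c and Figure 4d, ME is 0.0015 while MCC is 0.0000. One way such a pair can arise is when the target covers a tiny fraction of the image and the result labels essentially everything as background. The sketch below illustrates this effect with made-up confusion counts; it does not reproduce any table entry. It assumes the usual definitions ME = (FP + FN)/N and MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and the common convention that MCC is 0 when the denominator vanishes.

```python
import math


def me_and_mcc(tp, tn, fp, fn):
    """Misclassification error and Matthews correlation coefficient."""
    n = tp + tn + fp + fn
    me = (fp + fn) / n
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention (an assumption here): MCC is 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return me, mcc


# Made-up counts for a 10,000-pixel image with a 15-pixel target.
# Case A: the whole target is missed (everything labeled background).
me_a, mcc_a = me_and_mcc(tp=0, tn=9985, fp=0, fn=15)
# Case B: the target is found exactly.
me_b, mcc_b = me_and_mcc(tp=15, tn=9985, fp=0, fn=0)
print(f"missed target: ME={me_a:.4f}, MCC={mcc_a:.4f}")
print(f"exact match:   ME={me_b:.4f}, MCC={mcc_b:.4f}")
```

Because ME is dominated by the majority class, reporting MCC alongside it guards against a low error that comes only from a large background.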

Table 4. Paired t-tests and Wilcoxon signed-rank tests on the ME and MCC scores obtained from the 100 real-world images.

Baseline	ME: t	ME: p (paired t-test)	ME: p (Wilcoxon)	MCC: t	MCC: p (paired t-test)	MCC: p (Wilcoxon)
HBGT [13]	9.593461	3.86 × 10⁻¹⁶	1.40 × 10⁻¹⁷	11.30168	7.10 × 10⁻²⁰	7.53 × 10⁻¹⁸
SDVE [23]	9.95277	6.28 × 10⁻¹⁷	1.55 × 10⁻¹⁸	12.98025	1.79 × 10⁻²³	1.72 × 10⁻¹⁸
NEVGD [21]	4.060804	4.86 × 10⁻⁵	4.56 × 10⁻¹⁷	7.345089	2.82 × 10⁻¹¹	4.85 × 10⁻¹⁸
FSC [32]	7.908764	1.78 × 10⁻¹²	1.35 × 10⁻¹⁸	9.649167	2.91 × 10⁻¹⁶	1.35 × 10⁻¹⁸
RAC [33]	9.134146	3.91 × 10⁻¹⁵	2.15 × 10⁻¹⁸	10.58498	2.59 × 10⁻¹⁸	2.25 × 10⁻¹⁸
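
The t column in Table 4 is the statistic of a paired t-test, which is computed from the per-image score differences between two methods. The following is a minimal sketch of that statistic using only the standard library. The sign convention (baseline minus proposed) is an assumption, since this section does not state it, and the five scores are made up for illustration.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic of a paired t-test on per-image differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return mean_d / (sd_d / math.sqrt(n)), n - 1  # statistic, degrees of freedom


# Made-up ME scores for five images (not taken from the paper).
baseline_me = [0.20, 0.15, 0.30, 0.10, 0.25]
proposed_me = [0.05, 0.10, 0.10, 0.05, 0.05]
t_stat, dof = paired_t_statistic(baseline_me, proposed_me)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

With 100 images the test has 99 degrees of freedom. The p-values, and the Wilcoxon signed-rank test that does not assume normally distributed differences, would normally come from a statistics library, for example `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon`.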

Table 5. Comparison of CPU runtime (mean and standard deviation) for six methods. The results are averaged over 10 runs on 8 synthetic images (left) and 100 real-world images (right). Lower values indicate higher computational efficiency.

Methods	Synthetic Images: Mean (s)	Synthetic Images: Standard Deviation (s)	Real-World Images: Mean (s)	Real-World Images: Standard Deviation (s)
HIST	0.2351	0.0562	0.3565	0.1262
HBGT [13]	0.0041	0.0073	0.0006	0.0027
SDVE [23]	0.0045	0.0077	0.0008	0.0082
NEVGD [21]	0.5228	0.1301	0.4138	0.1635
FSC [32]	0.6483	0.0691	1.5109	0.7153
RAC [33]	0.6247	0.1347	0.9358	0.3250

Table 6. CPU runtime of six methods on a synthetic image (Figure 4a), scaled to different resolutions, demonstrating the computational scalability of each method.

Methods	128 × 128 Pixels (s)	1024 × 1024 Pixels (s)	2048 × 2048 Pixels (s)
HIST	0.2452	7.4786	42.2295
HBGT [13]	0.0006	0.0046	0.0095
SDVE [23]	0.0006	0.0036	0.0092
NEVGD [21]	0.4967	0.5738	0.6331
FSC [32]	0.7829	35.0077	144.4572
RAC [33]	0.7219	21.9914	84.9226
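
One rough way to read the scalability in Table 6 is to fit runtime ≈ c·N^k between two resolutions, where N is the pixel count. The sketch below estimates k from the 1024 × 1024 and 2048 × 2048 columns. The power-law model and the two-point estimate are assumptions made for illustration, not an analysis from the paper.

```python
import math

# Runtimes in seconds copied from Table 6 (1024 x 1024 and 2048 x 2048 columns).
runtimes = {
    "HIST": (7.4786, 42.2295),
    "HBGT": (0.0046, 0.0095),
    "SDVE": (0.0036, 0.0092),
    "NEVGD": (0.5738, 0.6331),
    "FSC": (35.0077, 144.4572),
    "RAC": (21.9914, 84.9226),
}


def scaling_exponent(t_small, t_large, n_small, n_large):
    """Exponent k in runtime ~ N**k, estimated from two (N, runtime) points."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


n_small, n_large = 1024 * 1024, 2048 * 2048  # pixel counts
exponents = {
    name: scaling_exponent(t1, t2, n_small, n_large)
    for name, (t1, t2) in runtimes.items()
}
for name, k in exponents.items():
    print(f"{name}: k = {k:.2f}")
```

An exponent near 1 indicates runtime roughly proportional to the number of pixels, a value above 1 indicates superlinear growth, and a value near 0 indicates runtime that barely depends on image size. Estimates for the millisecond-level methods are sensitive to timing noise and should be read loosely.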

Table 7. CPU runtime of six methods on a real-world image (Figure 7d) scaled to different resolutions, demonstrating the computational scalability of each method.

Methods	256 × 256 Pixels (s)	1024 × 1024 Pixels (s)	2048 × 2048 Pixels (s)
HIST	0.4792	7.2050	42.7171
HBGT [13]	0.0009	0.0042	0.0129
SDVE [23]	0.0007	0.0035	0.0103
NEVGD [21]	0.1973	0.2645	0.2783
FSC [32]	5.4801	38.5442	159.5130
RAC [33]	1.8168	21.8165	91.3975
