Article

Color Normalization in Breast Cancer Immunohistochemistry Images Based on Sparse Stain Separation and Self-Sparse Fuzzy Clustering

by Attasuntorn Traisuwan 1, Somchai Limsiroratana 1, Pornchai Phukpattaranont 2, Phiraphat Sutthimat 3,* and Pichaya Tandayya 1,*
1 Department of Computer Engineering, Faculty of Engineering, Prince of Songkla University, Karnjanavanich Rd., Songkhla 90110, Thailand
2 Department of Electrical Engineering, Faculty of Engineering, Prince of Songkla University, Karnjanavanich Rd., Songkhla 90110, Thailand
3 Department of Mathematics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
* Authors to whom correspondence should be addressed.
Diagnostics 2025, 15(18), 2316; https://doi.org/10.3390/diagnostics15182316
Submission received: 16 July 2025 / Revised: 3 September 2025 / Accepted: 9 September 2025 / Published: 12 September 2025
(This article belongs to the Special Issue Medical Images Segmentation and Diagnosis)

Abstract

Background and Objective: The color normalization of breast cancer immunohistochemistry (IHC)-stained images helps change the color distribution of undesirable IHC-stained images to be more interpretable for the pathologists. This will affect the Allred score that the pathologists use to estimate the drug quantity for treating breast cancer patients. Methods: A new color normalization technique based on sparse stain separation and self-sparse fuzzy clustering is proposed. Results: The quaternion structural similarity was used to measure the quality of the normalization algorithm. Our technique has a structural similarity score lower than other techniques, and the color distribution similarity is closer to the target. We applied automated and unsupervised nuclei classification with Automatic Color Deconvolution (ACD) to test the color features extracted from normalized images. Conclusions: The classification result from our unsupervised nuclei classification with ACD is similar to other normalization methods, but it offers an easier perception to the pathologists.

1. Introduction

Breast cancer is prevalently diagnosed among women, particularly those under 40, and is a significant cause of mortality, accounting for approximately 44,800 deaths annually in this age group [1]. Pathologists play a crucial role in clinical care by diagnosing breast cancer, determining tumor malignancy, assessing its growth within the breast, and identifying any spread to lymph nodes or other organs. This evaluation typically involves examining stained cancer tissues under a light microscope.
A major issue in evaluating histopathological images, particularly for scoring, arises from color variations caused by differences in stain operator protocols, exposure times, and slide scanner specifications. These inconsistencies significantly impact the quality of feature extraction from the images. To address this, color normalization techniques are employed to standardize image colors, making them more general and consistent.
Historically, image normalization has been achieved by adjusting the colors of a source image to match a target image, a process that can be performed using image editing software such as GNU Image Manipulation Program (GIMP) or Adobe Photoshop. More advanced methods include histogram-matching algorithms [2], which were initially developed for grayscale images but are adaptable to color images by matching individual color channels. Another approach, the color-matching algorithm, specifically adjusts the mean and standard deviation of the $l\alpha\beta$ channels.
Macenko et al. put forth a normalization algorithm for histological slides in 2009, which employed color deconvolution for the identification of stain components, subsequently utilizing singular value decomposition (SVD) projection for normalization [3]. Although this approach demonstrated efficacy, it failed to maintain the structural integrity of the tissue, thereby impacting diagnostic outcomes. The guidelines by the College of American Pathologists (CAP) emphasize the necessity of accurate histological imaging to ensure reliable diagnoses. The Laboratory General Checklist [4] established by the CAP necessitates the maintenance of the structural integrity of tissue to avert misdiagnosis, thereby reflecting an ethical obligation to patients. The World Health Organization (WHO) asserts that the diagnosis of tumors emphasizes the significance of the preservation of the tissue structure [5]. In response to this issue, Vahadane et al. refined the color deconvolution technique to safeguard the tissue structure, designating their approach as structure-preserving color normalization [6]. More recently, advanced deep learning models such as StainGAN were employed for unpaired image-to-image translation to facilitate the transfer of stylistic elements in digital histological images [7]. Additionally, fuzzy clustering methodologies were proposed to mitigate uncertainty in the analysis of histological images and enhance color normalization.
The ambiguity inherent in histological image analysis constitutes one of the principal challenges. Maji and Mahapatra proposed the application of circular clustering within the fuzzy approximation domain for the purpose of color normalization of histological images [8]. They employed the round-fuzzy circular cluster model to generate values in the weighted hue histograms of both the source and template images prior to the implementation of non-negative matrix factorization, which was aimed at achieving effective stain separation. Furthermore, comprehensive probability modeling or the Bayesian methodology was utilized to ascertain stain separation in histological images [9,10].
Most existing normalization methods primarily focus on hematoxylin and eosin (H&E) stained slides due to the greater availability of open datasets for H&E compared to immunohistochemistry (IHC) staining. H&E staining provides basic morphological information but lacks molecular details like antigen expression, which IHC staining offers. IHC staining is crucial for pathologists to predict cancer cell growth, and its evaluation involves counting different types of nuclear staining and cell populations. Given the large volume of slides pathologists process, automated image analysis systems were developed for biomarker scoring [11,12,13,14,15]. Unnormalized IHC images can lead to incorrect cell labeling, potentially resulting in inappropriate treatments and affecting cancer cell growth. Furthermore, empirical evidence demonstrated a statistically significant enhancement in diagnostic confidence upon applying medical image normalization [16].
This paper introduces a color normalization method specifically for IHC-stained images. It adapts techniques previously used for H&E stained images, despite the difference in the number of perceived colors (two in H&E vs. three in IHC), by finding a better structure-preserved normalization method to prepare IHC images. The remainder of this manuscript is organized as follows. The materials and methodologies are presented in Section 2. Section 3 presents a comparative analysis of the qualitative and quantitative outcomes derived from various benchmark algorithms against our proposed methodology. The results are discussed in Section 4. Finally, Section 5 concludes the paper.

2. Materials and Methods

2.1. Database Description

In the present study, our immunohistochemical (IHC) images were obtained from the Department of Pathology at the Faculty of Medicine, Prince of Songkla University. These images were procured from four general and regional hospitals situated in the southernmost provinces of Thailand between January and June 2022. The total number of cases encompassed 151, which were sourced from Naradhiwas Rajanagarindra Hospital, Pattani Hospital, Yala Regional Hospital, and Sungaikolok Hospital. They were identified with ductal carcinoma in situ (DCIS) or epithelial breast cancer at the ages of 23–79 years. Of the 151 participants included, 91 (60.3%) were below 50 years of age, and 60 (39.7%) were above 50 years of age. Furthermore, 131 (86.8%) had invasive ductal carcinoma, 6 (4%) had DCIS, and 14 (9.3%) had pathological types of cancer. The research protocol received approval from the Human Research Ethics Committee of Naradhiwas Rajanagarindra Hospital (REC 001/2564). Furthermore, this investigation conformed to the principles delineated in the Declaration of Helsinki.
The breast tissue images were acquired utilizing a light microscope (Eclipse 80i advanced research microscope, Nikon Instech Co., Ltd., Tokyo, Japan) at a magnification of 40×. These images were stored in a 24-bit color JPEG format. The resolution of the images is 720 × 900 pixels. A singular image may encompass two distinct types of nuclei: cancerous nuclei and non-cancerous nuclei. Each nucleus is stained utilizing two distinct stain colors: blue (immunonegative stain) and brown (immunopositive stain). Pathologists concentrate on quantifying the number of immunopositive nuclei in conjunction with the count of immunonegative nuclei. Five images exhibiting various variations were randomly selected for inclusion in this study.

2.2. Color Deconvolution (CD)

Color deconvolution constitutes a sophisticated image analysis technique formulated to delineate and quantify immunohistochemical staining. Its principal aim is to establish a versatile and robust methodology for the objective immunohistochemical assessment of samples subjected to up to three distinct stains, including horseradish peroxidase staining developed with 3,3′-diaminobenzidine (DAB), hematoxylin, and eosin. This methodological approach aspires to address the challenges posed by conventional color transformation techniques by precisely isolating the contributions of individual stains, even in scenarios where they co-localize or exhibit overlapping absorption spectra. The intensity of light transmitted through a specimen containing an amount of stain $A$ can be characterized using the Beer–Lambert law [17]. The relationship between the intensity of light leaving the specimen ($I_C$), the intensity of light entering the specimen ($I_{0,C}$), and the absorption factor ($c$) is as follows [11]:
$$I_C = I_{0,C} \exp(-A c_C). \tag{1}$$
The subscript $C$ denotes the specific detection channel. The concentration of the stain exhibits a non-linear relationship with the RGB color values [3]. Consequently, the RGB color values are not suitable for the purposes of separation and quantification of the concentration of the stain. The optical density ($OD$) can be articulated as shown below:
$$OD_C = -\log\left(\frac{I_C}{I_{0,C}}\right) = A c_C. \tag{2}$$
Color deconvolution [11] constitutes a methodological approach aimed at the identification of the stain vector ($V \in \mathbb{R}^{m \times r}$) and the absorption factor ($S \in \mathbb{R}^{r \times n}$) through the process of decomposing $OD_{R,G,B} \in \mathbb{R}^{m \times n}$, where $m = 3$ for the RGB channels, $n$ is the number of pixels, and $r$ is the number of stains. The optical density of the light transmitted through a specimen ($OD_{R,G,B}$) may be expressed as the product of the stain vector matrix $V$ and $S$, the matrix denoting the saturation levels of each individual stain, as shown below:
$$OD_{R,G,B} = VS. \tag{3}$$
Furthermore, the procedural framework for converting the RGB color space ($I_{R,G,B}$) into the optical density (OD) domain [18], or the Beer–Lambert transformation (BLT), yields the following result:
$$OD_{R,G,B} = -\log\left(\frac{I_{R,G,B}}{I_{(R,G,B)\mathrm{blank}}}\right), \tag{4}$$
where $I_{(R,G,B)\mathrm{blank}}$ is the illuminating light intensity on the sample (usually 255 for 8-bit images).
Conversely, the procedure for converting the OD color space into the RGB color space, or the inverted Beer–Lambert transformation (IBLT), is articulated as follows:
$$I_{R,G,B} = I_{(R,G,B)\mathrm{blank}} \exp(-OD_{R,G,B}). \tag{5}$$
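As an illustration of the Beer–Lambert transformation and its inverse, the following minimal NumPy sketch converts a few RGB pixels to optical density and back. The function names, the natural logarithm, and the lower clipping value `eps` (used only to avoid taking the logarithm of zero) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def rgb_to_od(rgb, blank=255.0, eps=1.0):
    """Beer-Lambert transformation: RGB intensities -> optical density."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Clip to [eps, blank] so that log(0) never occurs (assumption for this sketch).
    rgb = np.clip(rgb, eps, blank)
    return -np.log(rgb / blank)

def od_to_rgb(od, blank=255.0):
    """Inverted Beer-Lambert transformation: optical density -> 8-bit RGB."""
    rgb = blank * np.exp(-np.asarray(od, dtype=np.float64))
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)

# Three example pixels: blank background, a brownish pixel, a bluish pixel.
pixels = np.array([[255, 255, 255], [120, 80, 40], [30, 60, 200]], dtype=np.uint8)
od = rgb_to_od(pixels)      # background maps to OD = 0 in every channel
restored = od_to_rgb(od)    # round trip back to RGB
```

A blank (white) pixel has zero optical density, and darker pixels have larger values, which is why stain amounts combine additively in the OD domain but not in RGB.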

2.3. Contrast Stretching (CS)

The Spatio-Temporal Retinex-like Envelope with Stochastic Sampling (STRESS) algorithm [19], conceptualized by Kolås et al., is engineered to emulate the adaptive functions of the Human Visual System. Its fundamental purpose is to compute local reference black and white points for each chromatic channel contained within an image. This computation involves the estimation of two envelope functions, maximum ($E_{\max}$) and minimum ($E_{\min}$), which encapsulate the image signal and exhibit gradual variation. These envelopes are distinguished by their adherence to the signal, exhibiting smoothness, edge preservation, and convergence to the global maximum for $E_{\max}$ and global minimum for $E_{\min}$. The algorithm derives these envelopes for each pixel through the application of a random spray model. Upon the determination of the envelopes, the value of each pixel is modified to enhance contrast, highlight details, and equilibrate the three color channels, thereby effectively executing color correction. It facilitates local contrast enhancement, automatic color adjustment, and high dynamic range image rendering. Each pixel in the image ($p_0$) undergoes an update as shown below:
$$p_{\mathrm{stress}} = \frac{p_0 - E_{\min}}{E_{\max} - E_{\min}}, \tag{6}$$
$$E_{\min} = p_0 - \bar{v}\,\bar{r}, \tag{7}$$
and
$$E_{\max} = p_0 + (1 - \bar{v})\,\bar{r} = E_{\min} + \bar{r}. \tag{8}$$
The variables $\bar{r}$ and $\bar{v}$ present within the equations denote the average of the sample range and the average of the relative pixel values, which are computed as shown below:
$$\bar{r} = \frac{1}{N} \sum_{i=1}^{N} r_i, \tag{9}$$
$$\bar{v} = \frac{1}{N} \sum_{i=1}^{N} v_i, \tag{10}$$
where $N$ represents the number of iterations, $r_i$ is the range of the samples, and $v_i$ is the relative value of the center pixel given as shown below:
$$r_i = s_i^{\max} - s_i^{\min}, \tag{11}$$
$$v_i = \begin{cases} \dfrac{1}{2}, & \text{if } r_i = 0, \\ \dfrac{p_0 - s_i^{\min}}{r_i}, & \text{otherwise}. \end{cases} \tag{12}$$
In this context, $s_i^{\max}$ and $s_i^{\min}$ denote the uppermost and lowermost sample values, respectively, derived from a spray that is ascertained through a selection of stochastic samples originating from a disk of radius $R$ centered at the point $p_0$, which is articulated as follows:
$$s_i^{\max} = \max_{j \in \{0, \ldots, M\}} p_j, \tag{13}$$
$$s_i^{\min} = \min_{j \in \{0, \ldots, M\}} p_j, \tag{14}$$
where the number of samples is denoted as $M$, and the random sample values are denoted as $p_j$. In the context of color imagery, the computation is executed independently for each individual color channel.
Finally, R should equilibrate detail retention and illumination rectification. For instance, in dimly lit images, an augmented R may more accurately gauge global illumination, whereas in high-contrast scenes, a diminished R circumvents excessive smoothing. The algorithm may employ multi-scale methodologies, administering disparate R values. The selection of M and N should consider the image’s noise level and intricacy. Noisy images may necessitate larger M and N to average out noise influences. An enhanced R typically mandates larger M and N to guarantee adequate sampling density within the disk, thus avoiding sparse or erroneous envelope estimates.
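The per-pixel update described above can be sketched for a single channel as follows. This is a deliberately naive, unoptimized illustration rather than the reference STRESS implementation: the function name, the way random points are drawn (uniformly over the disk area), the clamping of out-of-image samples to the border, and the inclusion of the center pixel in every spray are assumptions made for this example.

```python
import numpy as np

def stress_channel(channel, radius=10, n_iter=20, n_samples=5, seed=0):
    """Naive single-channel STRESS sketch (N = n_iter, M = n_samples, R = radius)."""
    rng = np.random.default_rng(seed)
    img = np.asarray(channel, dtype=np.float64)
    h, w = img.shape
    out = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            p0 = img[y, x]
            r_sum, v_sum = 0.0, 0.0
            for _ in range(n_iter):
                # Draw M random points in a disk of radius R around (y, x);
                # points falling outside the image are clamped to the border.
                rho = radius * np.sqrt(rng.random(n_samples))
                theta = 2.0 * np.pi * rng.random(n_samples)
                ys = np.clip(np.rint(y + rho * np.sin(theta)).astype(int), 0, h - 1)
                xs = np.clip(np.rint(x + rho * np.cos(theta)).astype(int), 0, w - 1)
                samples = np.append(img[ys, xs], p0)  # center pixel joins the spray
                s_max, s_min = samples.max(), samples.min()
                r_i = s_max - s_min                            # sample range
                v_i = 0.5 if r_i == 0 else (p0 - s_min) / r_i  # relative value
                r_sum += r_i
                v_sum += v_i
            r_bar, v_bar = r_sum / n_iter, v_sum / n_iter
            e_min = p0 - v_bar * r_bar    # minimum envelope
            e_max = e_min + r_bar         # maximum envelope
            # A flat neighborhood gives e_max == e_min; fall back to 0.5 there.
            out[y, x] = 0.5 if r_bar == 0 else (p0 - e_min) / (e_max - e_min)
    return out

# Low-contrast horizontal ramp as a toy input.
demo = np.tile(np.linspace(50.0, 200.0, 8), (8, 1))
stretched = stress_channel(demo, radius=4, n_iter=10, n_samples=4)
```

The output lies in the unit interval, so it would need rescaling (for example to 0–255) before being stored as an 8-bit channel; for a color image the function would be applied to each channel independently.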

2.4. Stain Separation (SS)

Macenko et al. introduced a computational framework that systematically detects the appropriate stain vectors corresponding to an image [3], thus facilitating color deconvolution and normalization processes. The objective of this methodology is to reconcile histological slides subjected to disparate processing conditions into a unified, standardized framework, which consequently enhances both quantitative analytical capabilities and visual uniformity. To determine the optimal stain vectors ($V$), the only parameter required is the OD threshold $\beta$. The procedure is given in Algorithm 1.
Algorithm 1 Automatic Color Deconvolution Algorithm [3]
     Input: RGB Slide, $\beta$
1: procedure FindOptimalStainVectors
2:     Convert RGB to OD using Equation (4)
3:     Remove data with OD intensity less than $\beta$
4:     Calculate SVD on the OD tuples
5:     Create a plane from the SVD directions corresponding to the two largest singular values
6:     Project data onto the plane and normalize them to unit length
7:     Calculate the angle of each point with respect to the first SVD direction
8:     Find the robust extreme angles and convert them back to the OD space
     Output: Optimal Stain Vectors
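The steps of Algorithm 1 can be sketched in NumPy as shown below. This is an illustrative reading of the listing, not the authors' code. The percentile parameter `alpha` used for the "robust extremes" comes from Macenko et al.'s original description and is an assumption here, since the listing only names $\beta$; the default values, the use of an uncentered SVD, and the function name are likewise assumptions.

```python
import numpy as np

def find_optimal_stain_vectors(rgb_pixels, beta=0.15, alpha=1.0, blank=255.0):
    """Sketch of Algorithm 1: returns a 3 x 2 matrix with unit-norm stain vectors."""
    rgb = np.clip(np.asarray(rgb_pixels, dtype=np.float64).reshape(-1, 3), 1.0, blank)
    od = -np.log(rgb / blank)                            # step 2: RGB -> OD
    od = od[np.all(od >= beta, axis=1)]                  # step 3: drop faint pixels
    _, _, vt = np.linalg.svd(od, full_matrices=False)    # step 4: SVD of OD tuples
    plane = vt[:2].T                                     # step 5: 3 x 2 plane basis
    if plane[:, 0].sum() < 0:                            # fix the arbitrary SVD sign
        plane[:, 0] *= -1
    proj = od @ plane                                    # step 6: project onto plane
    # Step 7: the angle does not depend on the unit-length normalization.
    phi = np.arctan2(proj[:, 1], proj[:, 0])
    lo, hi = np.percentile(phi, [alpha, 100.0 - alpha])  # step 8: robust extremes
    v1 = plane @ np.array([np.cos(lo), np.sin(lo)])      # back to OD space
    v2 = plane @ np.array([np.cos(hi), np.sin(hi)])
    return np.stack([v1, v2], axis=1)

# Synthetic check: mix two known (hypothetical) stain vectors and recover them.
va = np.array([0.65, 0.70, 0.29]); va /= np.linalg.norm(va)
vb = np.array([0.27, 0.57, 0.78]); vb /= np.linalg.norm(vb)
rng = np.random.default_rng(0)
conc = rng.uniform(0.3, 1.5, size=(200, 2))
conc[0], conc[1] = [1.0, 0.0], [0.0, 1.0]                # two pure-stain pixels
rgb = 255.0 * np.exp(-(conc @ np.stack([va, vb])))
V = find_optimal_stain_vectors(rgb, beta=0.05, alpha=0.0)
```

With `alpha=0.0` the extremes are the exact minimum and maximum angles, so the pure-stain pixels in the synthetic data determine the result; on real slides a small positive `alpha` makes the estimate less sensitive to outlier pixels.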
Furthermore, non-negative matrix factorization (NMF) may be employed for the purpose of stain separation [20]. Because a stain can absorb light but cannot emit it, the stain vector ($V$) and the absorption factor ($S$) in Equation (3) must be non-negative. As a result, it is feasible to determine $V$ and $S$ by solving the following problem:
$$\min_{V,S} \frac{1}{2} \left\| OD_{R,G,B} - VS \right\|_F^2, \quad \text{such that } V, S \geq 0. \tag{15}$$
Vahadane et al. [6] introduced a new cost function that extends Equation (15) by adding $\ell_1$ sparsity regularization on the rows of the absorption factor matrix ($S$), where each stain is indexed by $j$, as shown below:
$$\min_{V,S} \frac{1}{2} \left\| OD_{R,G,B} - VS \right\|_F^2 + \lambda \sum_{j=1}^{r} \left\| S(j,:) \right\|_1, \tag{16}$$
such that $V, S \geq 0$ and $\left\| V(:,j) \right\|_2^2 = 1$. Here, $\lambda$ is the sparsity regularization parameter. Setting $\lambda = 0$ reduces sparse NMF (SNMF) to NMF. The optimal value can be ascertained through a grid search by identifying the minimum error of the projected stain color matrix or the correlation of the projected stain density maps [6].
Additionally, sparse coding methodologies [21] may be employed to infer S, while dictionary learning techniques [22] are applicable for the estimation of V. Consequently, Vahadane et al. utilized the SPArse Modeling Software version 2.6 (SPAMS) [23] for the purpose of estimating these parameters.
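To make the sparse NMF objective concrete, the sketch below minimizes it with simple multiplicative updates followed by column normalization of $V$. This is a swapped-in, textbook-style solver chosen because it needs only NumPy; it is not the SPAMS dictionary-learning and sparse-coding routine used by Vahadane et al., and the function name, defaults, and synthetic data are assumptions for illustration.

```python
import numpy as np

def sparse_nmf(od, r=2, lam=0.1, n_iter=500, seed=0, eps=1e-9):
    """Approximate od (m x n) by V (m x r) @ S (r x n) with V, S >= 0
    and an L1 penalty (weight lam) on S, using multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = od.shape
    V = rng.random((m, r)) + eps
    S = rng.random((r, n)) + eps
    for _ in range(n_iter):
        # The L1 penalty enters the denominator of the S update.
        S *= (V.T @ od) / (V.T @ V @ S + lam + eps)
        V *= (od @ S.T) / (V @ (S @ S.T) + eps)
        # Keep each stain vector at unit L2 norm; rescale S so V @ S is unchanged.
        norms = np.linalg.norm(V, axis=0) + eps
        V /= norms
        S *= norms[:, None]
    return V, S

# Synthetic optical densities built from two hypothetical stain vectors.
rng = np.random.default_rng(1)
V_true = np.array([[0.65, 0.27], [0.70, 0.57], [0.29, 0.78]])
S_true = rng.random((2, 300)) * (rng.random((2, 300)) > 0.5)  # sparse densities
od = V_true @ S_true
V_est, S_est = sparse_nmf(od, r=2, lam=0.01)
rel_err = np.linalg.norm(od - V_est @ S_est) / np.linalg.norm(od)
```

Setting `lam=0` in this sketch gives plain NMF updates, mirroring the statement that SNMF reduces to NMF when $\lambda = 0$.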

2.5. Fuzzy Clustering (FC)

The Robust Self-Sparse Fuzzy Clustering Algorithm (RSSFCA) represents an innovative methodology formulated for the purpose of image segmentation [24], explicitly targeting two prevalent challenges encountered in conventional fuzzy clustering algorithms: heightened sensitivity to outliers attributable to non-sparse fuzzy memberships and excessive image segmentation resulting from an insufficiency of local spatial information. RSSFCA offers two significant contributions. First, it incorporates a regularization framework under the Gaussian metric into the objective function of fuzzy clustering algorithms to attain fuzzy memberships characterized by sparsity, thereby diminishing noise and enhancing clustering efficacy. Second, it presents a connected-component filtering mechanism predicated on an area density balance strategy to address the issue of image over-segmentation, which is comparatively simpler and more rapid than the integration of local spatial information for the elimination of minor areas. Empirical findings demonstrate that RSSFCA proficiently alleviates the sensitivity to outliers and the problem of over-segmentation, thereby producing superior image segmentation results compared with previous leading algorithms in the field. In order to effectively address sparse fuzzy memberships, a regularization term $\gamma \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^2$ was introduced within the objective function as follows:
$$\tilde{J} = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}\, \Phi(x_j \mid v_i, \Sigma_i) + \gamma \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^2, \tag{17}$$
where $x_j$ represents an instance of unlabeled data, $v_i$ denotes the corresponding centroid of the cluster, and $u_{ij}$ signifies the membership degree of $x_j$ relative to the clustering center $v_i$, which was constrained such that $0 \leq u_{ij} \leq 1$ and $\sum_{i=1}^{c} u_{ij} = 1$ across $c$ distinct clusters. Furthermore, $\Phi(x_j \mid v_i, \Sigma_i)$ is defined as shown below:
$$\Phi(x_j \mid v_i, \Sigma_i) = -\ln\left(\rho(x_j \mid v_i, \Sigma_i)\right), \tag{18}$$
where $\rho(x_j \mid v_i, \Sigma_i)$ denotes the Gaussian probability density function represented as shown below:
$$\rho(x_j \mid v_i, \Sigma_i) = \frac{\exp\left(-\frac{1}{2}(x_j - v_i)^T \Sigma_i^{-1} (x_j - v_i)\right)}{(2\pi)^{D/2}\, |\Sigma_i|^{1/2}}. \tag{19}$$
In this context, $D$ represents the dimensionality of the input data, while $\Sigma_i$ denotes the covariance matrix that encapsulates the intra-class variability of the $i$th class. A reduction in $\Sigma_i$ can result in a negative value of $\Phi$, which subsequently induces significant inaccuracies in distance calculations and resultant misclassification. Consequently, the issue was addressed by employing $\Phi'(x_j \mid v_i, \Sigma_i)$ in place of $\Phi(x_j \mid v_i, \Sigma_i)$ as shown below:
$$\Phi'(x_j \mid v_i, \Sigma_i) = \begin{cases} \Phi - \min(\Phi), & \text{if } \min(\Phi) < 0, \\ \Phi, & \text{otherwise}. \end{cases} \tag{20}$$
Consequently, their final objective function is defined as shown below:
$$\tilde{J} = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}\, \Phi'(x_j \mid v_i, \Sigma_i) + \gamma \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^2. \tag{21}$$
Moreover, $\tilde{J}_j$ can be separated into $c$ sub-problems as shown below:
$$\tilde{J}_j = \min \sum_{i=1}^{c} \left( u_{ij}\, \Phi'(x_j \mid v_i, \Sigma_i) + \gamma u_{ij}^2 \right). \tag{22}$$
The update of $v_i$ can be calculated by solving $\partial \tilde{J}_j / \partial v_i = 0$ as shown below:
$$v_i = \frac{\sum_{j=1}^{n} u_{ij}\, x_j}{\sum_{j=1}^{n} u_{ij}}. \tag{23}$$
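Furthermore, the update of $\Sigma_i$ obtained from $\partial \tilde{J}_j / \partial \Sigma_i = 0$ can be calculated as shown below, with $x_j$ and $v_i$ treated as column vectors as in the Gaussian density above:
$$\Sigma_i = \frac{\sum_{j=1}^{n} u_{ij}\, (x_j - v_i)(x_j - v_i)^T}{\sum_{j=1}^{n} u_{ij}}. \tag{24}$$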
The comprehensive outline of their methodology is delineated in Algorithm 2, wherein the inputs consist of the number of clusters ($c$), the regularization parameter ($\gamma$), the convergence threshold ($\eta$), and the maximum number of iterations ($T$). Finally, the algorithm’s outputs are denoted as $\tilde{U}$, $\tilde{V}$, $\tilde{\Sigma}$, which are instrumental for the segmentation of pixels within the image.
Algorithm 2 RSSFCA Algorithm [24]
     Input: $c$, $\gamma$, $\eta$, $T$
1: procedure RSSFCA
2:     Initialize the membership $U^{(0)}$ and the clustering centers $V^{(0)}$ using the fuzzy c-means (FCM) algorithm
3:     Initialize the covariance matrix $\Sigma^{(0)}$ utilizing the membership and clustering centroids derived from step 2 and Equation (24), and augment it with an identity matrix ($I$)
4:     $t \leftarrow 1$
5:     Update $U^{(t)}$, $V^{(t)}$, $\Sigma^{(t)}$ using Equations (22)–(24)
6:     Update the objective function $\tilde{J}^{(t)}$ using Equation (17)
7:     if $\max\left| \tilde{J}^{(t)} - \tilde{J}^{(t-1)} \right| \leq \eta$ or $t \geq T$ then
8:         Stop
9:     else
10:         $t \leftarrow t + 1$
11:         Go to Step 5
     Output: $\tilde{U}$, $\tilde{V}$, $\tilde{\Sigma}$
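One iteration of the update loop can be sketched as follows. This is a hedged illustration rather than the RSSFCA reference code: the per-point membership problem is solved here by Euclidean projection onto the probability simplex, which is one standard way to minimize a linear-plus-quadratic cost under the sum-to-one constraint and is an assumption of this sketch; the hard nearest-seed initialization (instead of FCM), the small `ridge` added to each covariance, the update order, and all function names are also assumptions.

```python
import numpy as np

def neg_log_gauss(X, v, cov):
    """Negative log Gaussian density of every row of X (n x D) for one cluster."""
    D = X.shape[1]
    diff = X - v
    maha = np.einsum("nd,de,ne->n", diff, np.linalg.inv(cov), diff)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (maha + D * np.log(2.0 * np.pi) + logdet)

def project_to_simplex(y):
    """Euclidean projection of each row of y onto {u >= 0, sum(u) = 1}."""
    n, c = y.shape
    srt = np.sort(y, axis=1)[:, ::-1]
    css = np.cumsum(srt, axis=1) - 1.0
    rho = (srt - css / np.arange(1, c + 1) > 0).sum(axis=1)  # active entries
    tau = css[np.arange(n), rho - 1] / rho
    return np.maximum(y - tau[:, None], 0.0)

def rssfca_step(X, U, gamma, ridge=1e-6):
    """One update of centroids, covariances, and sparse memberships (U is c x n)."""
    c, n = U.shape
    w = U.sum(axis=1) + 1e-12                        # guard against empty clusters
    V = (U @ X) / w[:, None]                         # membership-weighted centroids
    Phi, covs = np.empty((c, n)), []
    for i in range(c):
        diff = X - V[i]
        cov = (U[i][:, None] * diff).T @ diff / w[i] + ridge * np.eye(X.shape[1])
        covs.append(cov)
        Phi[i] = neg_log_gauss(X, V[i], cov)
    if Phi.min() < 0:                                # shift so all costs are >= 0
        Phi = Phi - Phi.min()
    # min_u sum_i (u_i * phi_i + gamma * u_i^2) on the simplex is the
    # projection of -phi / (2 * gamma) onto the simplex.
    U_new = project_to_simplex((-Phi / (2.0 * gamma)).T).T
    J = float((U_new * Phi).sum() + gamma * (U_new ** 2).sum())
    return U_new, V, np.array(covs), J

# Two well-separated 2-D blobs as toy "pixels".
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
seeds = X[[0, 50]]                                   # one seed per blob
d = ((X[None, :, :] - seeds[:, None, :]) ** 2).sum(axis=2)
U = (d == d.min(axis=0)).astype(float)               # hard initial memberships
for _ in range(10):
    U, V, covs, J = rssfca_step(X, U, gamma=1.0)
labels = U.argmax(axis=0)
```

Because the projection sets many memberships exactly to zero, the resulting `U` is sparse, which is the property the regularization term is meant to encourage.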

2.6. Structure-Preserving Color Normalization (SPCN)

Structure-Preserving Color Normalization (SPCN) is a technique designed to standardize the color appearance of histological images while preserving their underlying biological structure [6]. This method is built upon sparse non-negative matrix factorization (SNMF) for stain separation, which decomposes images into sparse and non-negative stain density maps. SPCN works by replacing the color basis of a source image with that of a pathologist-preferred target image while maintaining the source’s original stain concentrations. This ensures that the structural information, captured in the stain density maps, remains intact, and only the color appearance is altered. This approach addresses the issue of color variations in histological images caused by differences in staining protocols, raw materials, and scanner responses, making images more comparable for analysis by pathologists and software.
In order to achieve a normalized color representation from a source image ($s$) to a target image ($t$), it is imperative to decompose the source optical density ($OD_s$) into the product of the matrices $V_s$ and $S_s$, while concurrently decomposing the target optical density ($OD_t$) into the matrices $V_t$ and $S_t$, as delineated in Equation (16). Subsequently, the matrix $S_s$ necessitates normalization according to the following formulation:
$$S_s^{\mathrm{norm}}(j,:) = \frac{S_s(j,:)}{S_s^{RM}(j,:)}\, S_t^{RM}(j,:), \quad j = 1, \ldots, r, \tag{25}$$
where $S_i^{RM} = RM(S_i) \in \mathbb{R}^{r \times 1}$, $i \in \{s, t\}$, and $RM(\cdot)$ denotes the pseudo-maximum value of each row vector at the 99% threshold. Finally, the normalized source will be computed as shown below:
$$OD_s^{\mathrm{norm}} = V_t\, S_s^{\mathrm{norm}}. \tag{26}$$
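The two normalization steps above amount to a per-stain rescaling followed by a change of color basis. A minimal NumPy sketch is given below; the function name, the guard against a zero pseudo-maximum, and the random stand-in data are assumptions for illustration, and the stain matrices would in practice come from the stain separation step.

```python
import numpy as np

def spcn_normalize(S_s, S_t, V_t, pct=99.0, eps=1e-12):
    """Rescale source stain densities to the target's 99th-percentile level
    and recombine them with the target stain vectors."""
    rm_s = np.percentile(S_s, pct, axis=1, keepdims=True)  # r x 1 pseudo-maxima
    rm_t = np.percentile(S_t, pct, axis=1, keepdims=True)
    S_norm = S_s / np.maximum(rm_s, eps) * rm_t            # per-stain rescaling
    od_norm = V_t @ S_norm                                  # target color basis
    return od_norm, S_norm

# Stand-in stain densities (r = 2 stains, 500 pixels) and target stain vectors.
rng = np.random.default_rng(0)
S_s = rng.random((2, 500))
S_t = 3.0 * rng.random((2, 500))
V_t = np.array([[0.65, 0.27], [0.70, 0.57], [0.29, 0.78]])
od_norm, S_norm = spcn_normalize(S_s, S_t, V_t)
```

Only the scale of each density map and the color basis change; the spatial pattern of `S_s` is untouched, which is the sense in which the method preserves structure.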

2.7. Quaternion Structural Similarity (QSSIM)

Quaternion Structural SIMilarity (QSSIM) is a visual quality metric (VQM) designed for color images, representing a vectorial expansion of the traditional Structural SIMilarity (SSIM) index using Quaternion Image Processing (QIP) [25]. Unlike scalar methods that often fail to adequately measure combined degradations like blur and desaturation, QSSIM employs a true vectorial approach, treating each color pixel as a single quaternion number. This allows QSSIM to measure changes in both luminance and chrominance vectors simultaneously, making it particularly effective in predicting the visual quality of color images subjected to complex degradations, such as those caused by the color crosstalk effect. Whereas existing VQMs primarily focus on either luminance or chrominance changes, QSSIM offers superior correlation with human subjective tests for combined degradations.
QSSIM possesses the capability to evaluate the simultaneous degradation attributed to both blurriness and desaturation. The formulation presented in Equation (27) encompasses components of luminance ($\mu_q$), chromatic components ($\sigma_q$), and the cross-correlation of color ($\sigma_{q_{\mathrm{ref,deg}}}$), each of which is delineated as follows:
$$QSSIM_{\mathrm{ref,deg}} = \left| \left( \frac{2\mu_{q_{\mathrm{ref}}} \cdot \mu_{q_{\mathrm{deg}}}}{\mu_{q_{\mathrm{ref}}}^2 + \mu_{q_{\mathrm{deg}}}^2} \right) \left( \frac{2\sigma_{q_{\mathrm{ref,deg}}}}{\sigma_{q_{\mathrm{ref}}}^2 + \sigma_{q_{\mathrm{deg}}}^2} \right) \right|, \tag{27}$$
where the standard deviations of the source (ref) and processed images (deg) are represented by $\sigma_{q_{\mathrm{ref}}}$ and $\sigma_{q_{\mathrm{deg}}}$, respectively.
Moreover, the first term in QSSIM is a luminance comparison term that measures the similarity in average brightness (mean intensity) between two images or image patches. Luminance reflects the overall illumination level, which is critical for visual perception. The second term is a structural term. It measures the correlation of pixel intensity patterns, capturing structural similarity (e.g., edges, textures). A high measurement indicates that the processed image preserves the structural details of the source image.
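The structure of the two terms can be illustrated with a simplified, whole-image computation in which each RGB pixel is a pure quaternion. This sketch is not the published QSSIM script: it uses global statistics instead of local windows, it assumes quaternion conjugation in the mean and cross-correlation products (so that identical images score 1), and it adds a tiny constant `eps` to avoid division by zero. All function names are hypothetical.

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternion arrays whose last axis is (w, x, y, z)."""
    aw, ax, ay, az = np.moveaxis(a, -1, 0)
    bw, bx, by, bz = np.moveaxis(b, -1, 0)
    return np.stack([
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ], axis=-1)

def qconj(a):
    """Quaternion conjugate: negate the three imaginary parts."""
    return a * np.array([1.0, -1.0, -1.0, -1.0])

def to_quaternion(rgb):
    """Map RGB pixels in [0, 255] to pure quaternions (0, r, g, b) in [0, 1]."""
    rgb = np.asarray(rgb, dtype=np.float64).reshape(-1, 3) / 255.0
    return np.concatenate([np.zeros((rgb.shape[0], 1)), rgb], axis=1)

def qssim_global(ref_rgb, deg_rgb, eps=1e-12):
    """Whole-image variant of the luminance-times-structure score."""
    q_ref, q_deg = to_quaternion(ref_rgb), to_quaternion(deg_rgb)
    mu_ref, mu_deg = q_ref.mean(axis=0), q_deg.mean(axis=0)
    d_ref, d_deg = q_ref - mu_ref, q_deg - mu_deg
    var_ref = (d_ref ** 2).sum(axis=1).mean()            # sigma_ref squared
    var_deg = (d_deg ** 2).sum(axis=1).mean()            # sigma_deg squared
    cross = qmul(d_ref, qconj(d_deg)).mean(axis=0)       # quaternion cross-correlation
    lum = 2.0 * qmul(mu_ref, qconj(mu_deg)) / ((mu_ref ** 2).sum() + (mu_deg ** 2).sum() + eps)
    struct = 2.0 * cross / (var_ref + var_deg + eps)
    return float(np.linalg.norm(qmul(lum, struct)))

# Compare a random color image with itself and with a desaturated copy.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(16, 16, 3))
gray = np.repeat(img.mean(axis=2, keepdims=True), 3, axis=2)
score_same = qssim_global(img, img)
score_gray = qssim_global(img, gray)
```

In this sketch an image compared with itself scores approximately 1, while the desaturated copy scores lower because its chromatic variation no longer matches the reference.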

2.8. Classification of Nuclei in Breast Cancer IHC Based on Automatic Color Deconvolution (CNACD)

The classification of breast cancer nuclei in immunohistochemical (IHC) images can be conducted approximately utilizing a solitary pixel representing the nucleus. The process of stain separation must be implemented on the pixel through the application of Algorithm 1. Subsequently, each RGB stain value is consolidated into a singular grayscale value to facilitate enhanced comparative analysis. The maximum stain value will serve as the criterion for determining the classification of the nuclei. This methodology is encapsulated in Algorithm 3.
Algorithm 3 CNACD Algorithm
     Input: RGB Pixels
1: procedure PredictPixels
2:     $V \leftarrow$ FindOptimalStainVectors($Input$, $\beta$)
3:     $S \leftarrow$ LARS-LASSO($X = (OD_{Input})^T$, $D = V$, $\lambda_1 = 0.1$)
4:     $S_{R,G,B} \leftarrow$ ConvertODtoRGB($S$)
5:     $S_{pos}, S_{neg} \leftarrow$ SeparateStain($S_{R,G,B}$)
6:     $S_{pos} \leftarrow$ ConvertRGBtoGRAY($S_{pos}$)
7:     $S_{neg} \leftarrow$ ConvertRGBtoGRAY($S_{neg}$)
8:     Result $\leftarrow$ new Array
9:     for $i \leftarrow 0$ to SizeOf($Input$) do
10:         if $S_{pos}[i] > S_{neg}[i]$ then
11:             Result.push($-$)
12:         else
13:             Result.push($+$)
14:     Output Result
     Output: Predicted Results
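The final comparison loop of Algorithm 3 can be written in a few lines. The sketch below covers only that decision rule and takes the two per-pixel grayscale stain values as given; it assumes that the first branch emits the immunonegative label, which is consistent with a brighter grayscale value indicating less of that stain. The function name and the example values are hypothetical.

```python
def classify_nuclei(gray_pos, gray_neg):
    """Label each nucleus pixel '-' when the grayscale value of the positive
    (brown) stain image exceeds that of the negative (blue) stain image,
    and '+' otherwise."""
    if len(gray_pos) != len(gray_neg):
        raise ValueError("both inputs must contain one value per nucleus pixel")
    return ["-" if p > n else "+" for p, n in zip(gray_pos, gray_neg)]

# Hypothetical grayscale values for three nucleus center pixels.
labels = classify_nuclei([200, 90, 130], [120, 210, 130])
print(labels)  # ['-', '+', '+']
```

Ties fall into the second branch, matching the strict inequality in the listing.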

2.9. The Proposed Normalization Method

To normalize breast cancer immunohistochemistry (IHC) images, a multi-step method is proposed. Initially, both source and target stained images undergo contrast stretching or STRESS processing, which can be performed using software like GNU Image Manipulation Program. Following this, the images are converted from the RGB color space to the optical density (OD) space using Beer–Lambert’s law. The stained images are then separated into stain vector and stain absorption matrices. The color appearance of the source image is normalized to match the target image using Structure-Preserving Color Normalization (SPCN). The resulting image from the SPCN block is converted back from OD space to RGB space. Finally, the pixels of the contrast-stretched source image are classified using the Robust Self-Sparse Fuzzy Clustering Algorithm (RSSFCA) to identify background and nuclei, thereby determining their locations. This comprehensive approach aims to standardize the color distribution of IHC images, making them more interpretable for pathologists. The schematic representation of the proposed normalization technique is illustrated in Figure 1.

2.10. Experiment Settings

In the conducted experiment, we established a series of tests to evaluate five distinct image normalization algorithms, each applied to five different color variations of immunohistochemically stained images, for comparative analysis against our proposed methodology. The algorithms under scrutiny include the color transfer between images method, the histogram specification technique, the Macenko approach, the Structure-Preserving Color Normalization method, and the STRESS technique. Subsequently, we employed QSSIM to assess the degradation in clarity and color saturation of the normalized outputs relative to the original images. The efficacy of histological information preservation is quantified utilizing QSSIM. Furthermore, a three-dimensional histogram visualization of color distribution is employed to facilitate quantitative comparisons. Moreover, we analyze the impact of normalization on classifier performance by utilizing the CNACD to approximately classify breast cancer nuclei based on their central pixels annotated by pathologists. The classification outcomes are derived from six normalization techniques to ascertain whether these methodologies influence the Allred score.

3. Results

In this study, we executed our model on a computational system equipped with an Intel Core i5 2.3 GHz CPU and 8 GB of RAM. Our proposed normalization methodology was subjected to a comparative analysis against five distinct image normalization algorithms. We conducted tests utilizing five various color modifications of IHC-stained images. The findings are illustrated in Figure 2. The qualitative and quantitative assessments were categorized into three segments. Initially, we assessed the structural similarity of our proposed methodology in relation to the five image normalization algorithms. Subsequently, to demonstrate the efficacy of our normalization technique for Allred scoring, we evaluated the classification accuracy by employing the CNACD classifier. Lastly, for the purpose of quantitative assessment, the 3D histogram visualization of color distribution was utilized and thoroughly analyzed. The outcomes of all methodologies are elaborated upon in the subsequent subsections.

3.1. Evaluation of Structure Similarity

For the first evaluation, the QSSIM scores were calculated using a MATLAB version 2022b script written by Kolaman [25]. The script follows Equation (27). The structure similarity scores for five image normalization algorithms applied to five different color variations of IHC-stained images are shown in Table 1.

3.2. Classification Performances

To ascertain the classification efficacy, the ground truth was established by the pathologists. The algorithm employed as the classifier was the CNACD methodology. The accuracy (AC) was computed as shown below:
$$AC = \frac{TP + TN}{TP + FN + TN + FP}, \tag{28}$$
where TP denotes the aggregate count of true positive cancer nuclei identified within the immunohistochemistry (IHC) image, TN represents the cumulative total of true negative cancer nuclei, FN indicates the overall number of false negative cancer nuclei, and FP signifies the complete tally of false positive cancer nuclei. The accuracy metrics derived from the various normalization algorithms are presented in Table 2.
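As a small worked example of the accuracy computation, the sketch below counts the four outcomes from pathologist labels and predicted labels and then applies the formula. The helper names and the five example labels are hypothetical and are not data from the study.

```python
def confusion_counts(truth, predicted, positive="+"):
    """Count TP, TN, FP, FN for two equally long label sequences."""
    tp = tn = fp = fn = 0
    for t, p in zip(truth, predicted):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p != positive:
            tn += 1
        elif t != positive and p == positive:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + FN + TN + FP)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one labeled nucleus is required")
    return (tp + tn) / total

# Hypothetical ground truth and predictions for five nuclei.
truth = ["+", "+", "-", "-", "+"]
predicted = ["+", "-", "-", "+", "+"]
counts = confusion_counts(truth, predicted)  # TP=2, TN=1, FP=1, FN=1
ac = accuracy(*counts)                       # (2 + 1) / 5 = 0.6
```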

3.3. Quantitative Comparison with 3D Histogram Visualization of Color Distribution

The three-dimensional histogram representation was generated utilizing the “Color Inspector 3D” version 2.3 [27] plugin within the ImageJ version 1.53 [28] framework. Each color pixel within the image is depicted as the centroid of a circle. Furthermore, the circumference of each circle signifies the prevalence of the corresponding color pixels. This representation is illustrated in Figure 3.
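As a rough illustration of what such a 3D color histogram contains, the sketch below bins RGB pixels into a coarse cube with NumPy and lists the most populated bins. It is not the Color Inspector 3D implementation; the bin count, the function names, and the synthetic image are assumptions made for this example.

```python
import numpy as np


def rgb_histogram_3d(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Count pixels in a bins x bins x bins grid over the RGB cube."""
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB image")
    pixels = image.reshape(-1, 3).astype(np.float64)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins), range=((0, 256),) * 3)
    return hist


def dominant_colors(hist: np.ndarray, top_k: int = 3):
    """Return (bin-centre RGB, pixel count) for the top_k most populated bins."""
    bin_width = 256 / hist.shape[0]
    order = np.argsort(hist, axis=None)[::-1][:top_k]
    result = []
    for flat_index in order:
        r, g, b = np.unravel_index(flat_index, hist.shape)
        centre = tuple(float((i + 0.5) * bin_width) for i in (r, g, b))
        result.append((centre, int(hist[r, g, b])))
    return result


# Synthetic 4 x 4 image (an assumption): white background with one brown row.
demo = np.full((4, 4, 3), 255, dtype=np.uint8)
demo[0, :] = (139, 69, 19)
hist = rgb_histogram_3d(demo)
print(dominant_colors(hist, top_k=2))
```

Each populated bin corresponds to one circle in the visualization: the bin centre gives its position in color space and the pixel count gives its size.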

4. Discussion

We have conducted a comparative analysis of our proposed methodology against five distinct color normalization techniques: the color transfer methodology as articulated in [26], the histogram specification technique delineated in [2], the Macenko methodology as described in [3], the SPCN technique referenced in [6], and the STRESS methodology elucidated in [19].
Figure 2 illustrates the outcomes of all experimental conditions. It is noteworthy that the backgrounds associated with the results derived from the Macenko and SPCN methodologies do not exhibit a pristine white coloration.
Figure 4 shows the results from each step of our proposed method. In Table 1, the similarity between the normalized source and the ground truth was computed using QSSIM [29]. STRESS achieves the highest QSSIM in three of the five tests (Tests 1, 2, and 4). As Figure 4e shows, RSSFCA segmentation errors, in which certain background regions are wrongly segmented as nuclei, can alter the tissue structure and lower our method’s QSSIM score. The RSSFCA parameters can be optimized through grid search.
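A grid search over clustering parameters could look like the following minimal sketch. The parameter names (`n_clusters`, `fuzzifier`) and the toy objective are hypothetical placeholders; in practice the objective would run RSSFCA and return a quality score such as QSSIM against the target image.

```python
from itertools import product


def grid_search(score_fn, grid):
    """Evaluate score_fn on every parameter combination and keep the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(n_clusters, fuzzifier):
    # Placeholder objective (an assumption), peaking at n_clusters=3, fuzzifier=2.0.
    return -((n_clusters - 3) ** 2) - (fuzzifier - 2.0) ** 2


best, score = grid_search(
    toy_score,
    {"n_clusters": [2, 3, 4], "fuzzifier": [1.5, 2.0, 2.5]},
)
print(best, score)
```

Exhaustive search is only practical for small grids, since the number of evaluations is the product of the grid sizes.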
In the three-dimensional visualization shown in Figure 3, the color transfer between images and histogram specification methods exhibit purple clusters, and the color clusters of the STRESS method lack the brown cluster. In our proposed approach, the color clusters more closely resemble those of the target image. There are, however, more brownish pixels, and these lie outside the nuclei. This reflects the structural changes that, in turn, affect the structural similarity score, QSSIM.
Table 3 lists the computational complexity of each step of our proposed method. The overall complexity is $O(n^2)$, determined by the highest-order term, STRESS. STRESS is also the most critical step, as it significantly enhances the image contrast. The complexity can be reduced by applying Quantile-Based Retinex (QBRIX) [30] or the Retinex-Based Fast Algorithm (RBFA) [31].
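The overall bound follows from summing the per-step terms in Table 3 and keeping the dominant one. The worked sum below assumes that $n$ denotes the input size (presumably the number of pixels) and that the steps run sequentially:

```latex
\[
\underbrace{O(n^2)}_{\text{STRESS}}
+ \underbrace{O(n \log n)}_{\text{SS}}
+ \underbrace{O(n)}_{\text{SPCN}}
+ \underbrace{O(1)}_{\text{BLT/IBLT}}
+ \underbrace{O(n)}_{\text{RSSFCA}}
+ \underbrace{O(n)}_{\text{Merge}}
= O(n^2)
\]
```

Replacing STRESS with a step of lower order would leave stain separation, at $O(n \log n)$, as the dominant term.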
The accumulation of brown-hued regions in the stained images is significant because it is the basis for computing the Allred score, which is instrumental in assessing the therapeutic regimen for breast cancer patients [32].
The results of the classification experiment are presented in Table 2, which shows that our method performs similarly to the other methods. In Test 1, our method outperforms the other techniques. Eight true negative nuclei cannot be classified adequately from a single pixel, because certain brown nuclei that include blue portions were identified as immunonegative by the pathologists. Classifying these eight nuclei therefore requires neighboring pixels to weight the classification outcome. The eight nuclei are shown in Figure 5, and the predictions for them, derived from the normalized output of our method, are shown in Figure 6. To evaluate the color features extracted from the normalized images, we performed automated, unsupervised nuclei classification using Automatic Color Deconvolution (ACD) and found that the normalization does not substantially influence the accuracy. Although our approach relies on the unsupervised classifier CNACD, its performance is not markedly different from that of the other methods.
The Allred score is a semi-quantitative method for assessing the estrogen receptor (ER) and progesterone receptor (PR) status in breast cancer IHC slides, yielding a score from 0 to 8 based on the proportion of positive cells and the staining intensity. The assessment is traditionally visual and may introduce subjectivity. Automated methods can be classified as point-based [33] or patch-based [34]; the former focus on grid points and the latter on image regions, often using deep learning for estimation. Point-based methods are more precise in low-expression cases, whereas patch-based methods are efficient for large whole-slide images (WSIs) but may miss subtle variations.
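To make the 0 to 8 range concrete, the sketch below combines a proportion score (0 to 5) with an intensity score (0 to 3). The proportion bands follow the commonly cited Allred cut-offs (under 1%, up to 10%, up to one third, up to two thirds, above two thirds); they are not stated in this article, and the handling of values exactly on a boundary is an assumption of this example.

```python
def proportion_score(positive_fraction: float) -> int:
    """Map the fraction of positive nuclei (0.0 to 1.0) to a proportion score 0-5."""
    if not 0.0 <= positive_fraction <= 1.0:
        raise ValueError("positive_fraction must lie in [0, 1]")
    if positive_fraction == 0.0:
        return 0
    if positive_fraction < 0.01:
        return 1
    if positive_fraction <= 0.10:
        return 2
    if positive_fraction <= 1 / 3:
        return 3
    if positive_fraction <= 2 / 3:
        return 4
    return 5


def allred_score(positive_fraction: float, intensity_score: int) -> int:
    """Total score = proportion score (0-5) + intensity score (0-3)."""
    if intensity_score not in (0, 1, 2, 3):
        raise ValueError("intensity_score must be 0, 1, 2, or 3")
    return proportion_score(positive_fraction) + intensity_score


# Half of the nuclei positive with intermediate staining: 4 + 2.
print(allred_score(0.5, 2))  # prints 6
```

The intensity score is taken as an input here because estimating it from pixel colors is exactly the step that depends on consistent color normalization.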
The point-based method samples fixed grid points on the image for score estimation. This approach may overlook brown-hued areas. Consequently, the overall score may be compromised. Furthermore, the ACD can improve performance by choosing optimal points rather than relying on grid points. In contrast, the patch-based method necessitates machine learning. Therefore, score accuracy relies on training.
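The risk that fixed grid points overlook a small stained area can be illustrated with a toy mask. This is only a sketch: the mask, the grid spacing, and the function name are assumptions, and real point-based scoring operates on classified nuclei rather than on a raw pixel mask.

```python
import numpy as np


def grid_positive_fraction(mask: np.ndarray, step: int) -> float:
    """Fraction of positive pixels seen at grid points spaced `step` apart."""
    samples = mask[step // 2::step, step // 2::step]
    return float(samples.mean())


# Synthetic 100 x 100 mask (an assumption) with one small positive region
# that falls between the grid points at rows and columns 10, 30, 50, 70, 90.
mask = np.zeros((100, 100), dtype=bool)
mask[12:18, 12:18] = True

print(grid_positive_fraction(mask, step=20))  # grid estimate
print(float(mask.mean()))                     # exhaustive estimate
```

Here the grid estimate is zero although 36 of the 10,000 pixels are positive, which is the kind of omission that choosing sampling points adaptively is meant to avoid.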

5. Conclusions

This manuscript presents a novel method for color normalization of breast cancer immunohistochemistry (IHC) images. The approach employs sparse stain separation and self-sparse fuzzy clustering to standardize the color distribution, making the images more interpretable for pathologists and thus facilitating more precise Allred scoring for the evaluation of cancer cell proliferation and for informing treatment decisions. Although it preserves structure less well than the alternative methods, the classification outcomes obtained with the point-based approach are comparable to those of the other techniques. Furthermore, it improves pathologists’ perception of nuclear morphology and color saturation.

Author Contributions

Conceptualization, A.T., S.L. and P.T.; Software, A.T.; Validation, A.T., P.S. and P.T.; Investigation, P.P. and P.S.; Writing—original draft, A.T. and P.T.; Writing—review & editing, A.T., P.P., P.S. and P.T.; Supervision, P.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Royal Golden Jubilee Ph.D. program grant number PHD/0199/2557.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Human Research Ethics Committee of the Naradhiwas Rajanagarindra Hospital (REC 001/2564). Date: 1 April 2021.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
IHC: Immunohistochemistry
ACD: Automatic Color Deconvolution
GIMP: GNU Image Manipulation Program
SVD: Singular Value Decomposition
CAP: College of American Pathologists
WHO: World Health Organization
H&E: Hematoxylin and Eosin
DCIS: Ductal Carcinoma In Situ
CD: Color Deconvolution
OD: Optical Density
CS: Contrast Stretching
STRESS: Spatio-Temporal Retinex-like Envelope with Stochastic Sampling
SS: Stain Separation
SPAMS: SPArse Modeling Software
FC: Fuzzy Clustering
RSSFCA: Robust Self-Sparse Fuzzy Clustering Algorithm
SPCN: Structure-Preserving Color Normalization
SNMF: Sparse Non-negative Matrix Factorization
QSSIM: Quaternion Structural SIMilarity
VQM: Visual Quality Matrix
QIP: Quaternion Image Processing
CNACD: Classification of Nuclei in Breast Cancer IHC based on Automatic Color Deconvolution
BLT: Beer–Lambert transformation
IBLT: Inverted Beer–Lambert transformation
TP: True Positive
TN: True Negative
FN: False Negative
FP: False Positive
QBRIX: Quantile-Based Retinex
RBFA: Retinex-Based Fast Algorithm

References

  1. Daly, A.A.; Rolph, R.; Cutress, R.I.; Copson, E.R. A review of modifiable risk factors in young women for the prevention of breast cancer. Breast Cancer Targets Ther. 2021, 13, 241–257.
  2. Wechsler, H. Digital image processing, 2nd ed. Proc. IEEE 2008, 69, 1174–1175.
  3. Macenko, M.; Niethammer, M.; Marron, J.S.; Borland, D.; Woosley, J.T.; Guan, X.; Schmitt, C.; Thomas, N.E. A method for normalizing histology slides for quantitative analysis. In Proceedings of the 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Boston, MA, USA, 28 June 2009–1 July 2009; pp. 1107–1110.
  4. College of American Pathologists, The Lab General Checklist. 2025. Available online: https://www.aab.org/images/CRBSymposium/Behnke%204.pdf (accessed on 15 August 2025).
  5. WHO. WHO Classification of Tumours: Breast Tumours; WHO Classification of Tumours; International Agency for Research on Cancer (IARC): Lyon, France, 2019; Volume 2.
  6. Vahadane, A.; Peng, T.; Sethi, A.; Albarqouni, S.; Wang, L.; Baust, M.; Steiger, K.; Schlitter, A.M.; Esposito, I.; Navab, N. Structure-Preserving Color Normalization and Sparse Stain Separation for Histological Images. IEEE Trans. Med Imaging 2016, 35, 1962–1971.
  7. Shaban, M.T.; Baur, C.; Navab, N.; Albarqouni, S. Staingan: Stain style transfer for digital histological images. In Proceedings of the International Symposium on Biomedical Imaging, Venice, Italy, 8–11 April 2019; pp. 953–956.
  8. Maji, P.; Mahapatra, S. Circular Clustering in Fuzzy Approximation Spaces for Color Normalization of Histological Images. IEEE Trans. Med Imaging 2020, 39, 1735–1745.
  9. Hidalgo-Gavira, N.; Mateos, J.; Vega, M.; Molina, R.; Katsaggelos, A.K. Variational Bayesian Blind Color Deconvolution of Histopathological Images. IEEE Trans. Image Process. 2020, 29, 2026–2036.
  10. Pérez-Bueno, F.; Vega, M.; Naranjo, V.; Molina, R.; Katsaggelos, A.K. Fully automatic blind color deconvolution of histological images using super gaussians. In Proceedings of the European Signal Processing Conference, Amsterdam, The Netherlands, 18–21 January 2021; pp. 1254–1258.
  11. Ruifrok, A.C.; Johnston, D.A. Quantification of histochemical staining by color deconvolution. Anal. Quant. Cytol. Histol. 2001, 23, 291–299.
  12. Tuominen, V.J.; Ruotoistenmäki, S.; Viitanen, A.; Jumppanen, M.; Isola, J. ImmunoRatio: A publicly available web application for quantitative image analysis of estrogen receptor (ER), progesterone receptor (PR), and Ki-67. Breast Cancer Res. 2010, 12, R56.
  13. Ko, C.C.; Tsai, C.Y.; Lin, C.H.; Liao, K.S. A computer-aided diagnosis system of breast intraductal lesion using histopathological images. In Proceedings of the 29th International Conference on Image and Vision Computing New Zealand, Hamilton, New Zealand, 19–21 November 2014; pp. 212–217.
  14. Liu, Y.; Li, X.; Zheng, A.; Zhu, X.; Liu, S.; Hu, M.; Luo, Q.; Liao, H.; Liu, M.; He, Y.; et al. Predict Ki-67 Positive Cells in H&E-Stained Images Using Deep Learning Independently From IHC-Stained Images. Front. Mol. Biosci. 2020, 7, 183.
  15. Saha, M.; Chakraborty, C.; Arun, I.; Ahmed, R.; Chatterjee, S. An Advanced Deep Learning Approach for Ki-67 Stained Hotspot Detection and Proliferation Rate Scoring for Prognostic Evaluation of Breast Cancer. Sci. Rep. 2017, 7, 3213.
  16. Salvi, M.; Caputo, A.; Balmativola, D.; Scotto, M.; Pennisi, O.; Michielli, N.; Mogetta, A.; Molinari, F.; Fraggetta, F. Impact of Stain Normalization on Pathologist Assessment of Prostate Cancer: A Comparative Study. Cancers 2023, 15, 1503.
  17. Jahne, B. Practical Handbook on Image Processing for Scientific and Technical Applications; CRC Press: Boca Raton, FL, USA, 2004.
  18. Mouelhi, A.; Sayadi, M.; Fnaiech, F. A novel morphological segmentation method for evaluating estrogen receptors’ status in breast tissue images. In Proceedings of the 2014 1st International Conference on Advanced Technologies for Signal and Image Processing, ATSIP 2014, Sousse, Tunisia, 17–19 March 2014; pp. 177–182.
  19. Kolås, Ø.; Farup, I.; Rizzi, A. Spatio-Temporal Retinex-inspired Envelope with Stochastic Sampling: A framework for spatial color algorithms. J. Imaging Sci. Technol. 2011, 55, 40503-1–40503-10.
  20. Rabinovich, A.; Agarwal, S.; Laris, C.A.; Price, J.H.; Belongie, S. Unsupervised color decomposition of histologically stained tissue samples. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2004.
  21. Wu, T.T.; Lange, K. Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2008, 2, 224–244.
  22. Aharon, M.; Elad, M.; Bruckstein, A. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 2006, 54, 4311–4322.
  23. Mairal, J.; Bach, F.; Ponce, J.; Sapiro, G. Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res. 2010, 11, 19–60.
  24. Jia, X.; Lei, T.; Du, X.; Liu, S.; Meng, H.; Nandi, A.K. Robust Self-Sparse Fuzzy Clustering for Image Segmentation. IEEE Access 2020, 8, 146182–146195.
  25. Kolaman, A.; Yadid-Pecht, O. Quaternion structural similarity: A new quality index for color images. IEEE Trans. Image Process. 2012, 21, 1526–1536.
  26. Reinhard, E.; Ashikhmin, M.; Gooch, B.; Shirley, P. Color transfer between images. IEEE Comput. Graph. Appl. 2001, 21, 34–41.
  27. Barthel, K.U. 3D-Data Representation with ImageJ. In Proceedings of the First ImageJ User and Developer Conference, Luxemburg, 18–19 May 2006.
  28. Schneider, C.A.; Rasband, W.S.; Eliceiri, K.W. NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 2012, 9, 671–675.
  29. Sangwine, S.J. Fourier transforms of colour images using quaternion or hypercomplex, numbers. Electron. Lett. 1996, 32, 1979–1980.
  30. Gianini, G.; Manenti, A.; Rizzi, A. Qbrix: A quantile-based approach to retinex. J. Opt. Soc. Am. A 2014, 31, 2663–2673.
  31. Liu, S.; Long, W.; He, L.; Li, Y.; Ding, W. Retinex-based fast algorithm for low-light image enhancement. Entropy 2021, 23, 746.
  32. Collins, L.C.; Botero, M.L.; Schnitt, S.J. Bimodal frequency distribution of estrogen receptor immunohistochemical staining results in breast cancer: An analysis of 825 cases. Am. J. Clin. Pathol. 2005, 123, 16–20.
  33. Ilić, I.R.; Stojanović, N.M.; Radulović, N.S.; Živković, V.V.; Randjelović, P.J.; Petrović, A.S.; Božić, M.; Ilić, R.S. The Quantitative ER Immunohistochemical Analysis in Breast Cancer: Detecting the 3 + 0, 4 + 0, and 5 + 0 Allred Score Cases. Medicina 2019, 55, 461.
  34. Ahmad Fauzi, M.F.; Wan Ahmad, W.S.H.M.; Jamaluddin, M.F.; Lee, J.T.H.; Khor, S.Y.; Looi, L.M.; Abas, F.S.; Aldahoul, N. Allred Scoring of ER-IHC Stained Whole-Slide Images for Hormone Receptor Status in Breast Carcinoma. Diagnostics 2022, 12, 3093.
Figure 1. Schematic representation of our proposed methodology for color normalization in immunohistochemistry (IHC) images.
Figure 2. Visual comparison of color normalization methods.
Figure 3. A 3D histogram visualization of color distribution of each color normalization technique.
Figure 4. The resultant visual representations derived from each segment of our proposed methodology. (a) Original image. (b) STRESS original. (c) SPCN original. (d) Nuclei SPCN original. (e) Background STRESS original. (f) Normalized original.
Figure 5. Ground truth Source Image 2 annotated by the pathologists. Green dots signify immunopositive nuclei. Red dots denote immunonegative nuclei. Black dashed rectangles illustrate the erroneous predictions generated by our methodology.
Figure 6. The categorization of nuclei within the normalized Source Image 2 utilizing the methodology we have proposed. Green dots denote immunopositive nuclei, whereas red dots signify immunonegative nuclei. The black dashed rectangles illustrate the instances of erroneous predictions made by our methodology.
Table 1. Quality metrics of various color normalization methods.
QSSIM
| Color Normalization Method | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 |
|---|---|---|---|---|---|
| Color transfer between image [26] | 0.8057 | 0.8563 | 0.9521 | 0.9913 | 0.9815 |
| Histogram specification [2] | 0.8132 | 0.8761 | 0.9508 | 0.9827 | 0.9918 |
| Macenko [3] | 0.8059 | 0.8837 | 0.9227 | 0.9232 | 0.9841 |
| SPCN [6] | 0.8096 | 0.8761 | 0.9157 | 0.9121 | 0.9808 |
| STRESS [19] | 0.8870 | 0.9405 | 0.9374 | 0.9916 | 0.9909 |
| Our method | 0.7568 | 0.8166 | 0.9400 | 0.9214 | 0.9823 |
Table 2. The accuracies of nuclei classification by using CNACD.
Classification Accuracy
| Color Normalization Method | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 |
|---|---|---|---|---|---|
| Original image | 82.56 | 82.98 | 96.10 | 79.01 | 73.43 |
| Color transfer between image [26] | 88.37 | 82.98 | 96.10 | 80.25 | 73.43 |
| Histogram specification [2] | 80.23 | 75.53 | 96.10 | 79.01 | 73.43 |
| Macenko [3] | 76.74 | 79.79 | 97.40 | 80.25 | 73.43 |
| SPCN approach [6] | 74.42 | 79.79 | 96.10 | 80.25 | 73.43 |
| STRESS [19] | 88.37 | 79.79 | 96.10 | 80.25 | 75 |
| Our method | 90.70 | 76.70 | 96.10 | 80.25 | 73.43 |
Table 3. Computational Complexity Table (Big-O Analysis) for each step in our proposed pipeline.
| Method | Computational Complexity |
|---|---|
| STRESS [19] | $O(n^2)$ |
| SS [3] | $O(n \log n)$ |
| SPCN [6] | $O(n)$ |
| BLT/IBLT [18] | $O(1)$ |
| RSSFCA [24] | $O(n)$ |
| Merge | $O(n)$ |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Traisuwan, A.; Limsiroratana, S.; Phukpattaranont, P.; Sutthimat, P.; Tandayya, P. Color Normalization in Breast Cancer Immunohistochemistry Images Based on Sparse Stain Separation and Self-Sparse Fuzzy Clustering. Diagnostics 2025, 15, 2316. https://doi.org/10.3390/diagnostics15182316

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.