Abstract
Background and Objective: Color normalization of breast cancer immunohistochemistry (IHC)-stained images shifts the color distribution of poorly stained images so that they are more interpretable for pathologists. This affects the Allred score that pathologists use to estimate the drug quantity for treating breast cancer patients. Methods: A new color normalization technique based on sparse stain separation and self-sparse fuzzy clustering is proposed. Results: The quaternion structural similarity was used to measure the quality of the normalization algorithm. Our technique yields a lower structural similarity score than the other techniques, but its color distribution is closer to that of the target. We applied automated and unsupervised nuclei classification with Automatic Color Deconvolution (ACD) to test the color features extracted from normalized images. Conclusions: The classification result from our unsupervised nuclei classification with ACD is similar to that obtained with other normalization methods, but the normalized images are easier for pathologists to perceive.
1. Introduction
Breast cancer is prevalently diagnosed among women, particularly those under 40, and is a significant cause of mortality, accounting for approximately 44,800 deaths annually in this age group [1]. Pathologists play a crucial role in clinical care by diagnosing breast cancer, determining tumor malignancy, assessing its growth within the breast, and identifying any spread to lymph nodes or other organs. This evaluation typically involves examining stained cancer tissues under a light microscope.
A major issue in evaluating histopathological images, particularly for scoring, arises from color variations caused by differences in stain operator protocols, exposure times, and slide scanner specifications. These inconsistencies significantly impact the quality of feature extraction from the images. To address this, color normalization techniques are employed to standardize image colors, making them more general and consistent.
Historically, image normalization has been achieved by adjusting the colors of a source image to match a target image, a process that can be performed using image editing software such as the GNU Image Manipulation Program (GIMP) or Adobe Photoshop. More advanced methods include histogram-matching algorithms [2], which were initially developed for grayscale images but are adaptable to color images by matching individual color channels. Another approach, the color-matching algorithm, specifically adjusts the mean and standard deviation of each channel.
Macenko et al. put forth a normalization algorithm for histological slides in 2009, which employed color deconvolution for the identification of stain components, subsequently utilizing singular value decomposition (SVD) projection for normalization [3]. Although this approach demonstrated efficacy, it failed to maintain the structural integrity of the tissue, thereby impacting diagnostic outcomes. The guidelines of the College of American Pathologists (CAP) emphasize the necessity of accurate histological imaging to ensure reliable diagnoses. The Laboratory General Checklist [4] established by the CAP requires that the structural integrity of tissue be maintained to avert misdiagnosis, reflecting an ethical obligation to patients. The World Health Organization (WHO) likewise emphasizes the significance of preserving the tissue structure in the diagnosis of tumors [5]. In response to this issue, Vahadane et al. refined the color deconvolution technique to safeguard the tissue structure, designating their approach as structure-preserving color normalization [6]. More recently, advanced deep learning models such as StainGAN have been employed for unpaired image-to-image translation to facilitate the transfer of stylistic elements in digital histological images [7]. Additionally, fuzzy clustering methodologies have been proposed to mitigate uncertainty in the analysis of histological images and enhance color normalization.
The ambiguity inherent in histological image analysis constitutes one of the principal challenges. Maji and Mahapatra proposed the application of circular clustering within the fuzzy approximation domain for the purpose of color normalization of histological images [8]. They employed the round-fuzzy circular cluster model to generate values in the weighted hue histograms of both the source and template images prior to the implementation of non-negative matrix factorization, which was aimed at achieving effective stain separation. Furthermore, comprehensive probability modeling or the Bayesian methodology was utilized to ascertain stain separation in histological images [9,10].
Most existing normalization methods primarily focus on hematoxylin and eosin (H&E) stained slides due to the greater availability of open datasets for H&E compared to immunohistochemistry (IHC) staining. H&E staining provides basic morphological information but lacks molecular details like antigen expression, which IHC staining offers. IHC staining is crucial for pathologists to predict cancer cell growth, and its evaluation involves counting different types of nuclear staining and cell populations. Given the large volume of slides pathologists process, automated image analysis systems were developed for biomarker scoring [11,12,13,14,15]. Unnormalized IHC images can lead to incorrect cell labeling, potentially resulting in inappropriate treatments and affecting cancer cell growth. Furthermore, empirical evidence demonstrated a statistically significant enhancement in diagnostic confidence upon applying medical image normalization [16].
This paper introduces a color normalization method specifically for IHC-stained images. It adapts techniques previously used for H&E-stained images, despite the difference in the number of perceived colors (two in H&E vs. three in IHC), in order to obtain a normalization method for IHC images that better preserves structure. The remainder of this manuscript is organized as follows. The materials and methodologies are presented in Section 2. Section 3 presents a comparative analysis of the qualitative and quantitative outcomes derived from various benchmark algorithms against our proposed methodology. Section 4 discusses the results. Finally, Section 5 concludes the paper.
2. Materials and Methods
2.1. Database Description
In the present study, our immunohistochemical (IHC) images were obtained from the Department of Pathology at the Faculty of Medicine, Prince of Songkla University. These images were procured from four general and regional hospitals situated in the southernmost provinces of Thailand between January and June 2022. A total of 151 cases were sourced from Naradhiwas Rajanagarindra Hospital, Pattani Hospital, Yala Regional Hospital, and Sungaikolok Hospital. The patients were diagnosed with ductal carcinoma in situ (DCIS) or epithelial breast cancer and were aged 23–79 years. Of the 151 participants included, 91 (60.3%) were below 50 years of age, and 60 (39.7%) were above 50 years of age. Furthermore, 131 (86.8%) had invasive ductal carcinoma, 6 (4%) had DCIS, and 14 (9.3%) had other pathological types of cancer. The research protocol received approval from the Human Research Ethics Committee of Naradhiwas Rajanagarindra Hospital (REC 001/2564). Furthermore, this investigation conformed to the principles delineated in the Declaration of Helsinki.
The breast tissue images were acquired utilizing a light microscope (Eclipse 80i advanced research microscope, Nikon Instech Co., Ltd., Tokyo, Japan) at a magnification of 40×. These images were stored in a 24-bit color JPEG format. The resolution of the images is 720 × 900 pixels. A single image may contain two distinct types of nuclei: cancerous nuclei and non-cancerous nuclei. Each nucleus is stained with one of two distinct stain colors: blue (immunonegative stain) or brown (immunopositive stain). Pathologists concentrate on quantifying the number of immunopositive nuclei in conjunction with the count of immunonegative nuclei. Five images exhibiting different color variations were randomly selected for inclusion in this study.
2.2. Color Deconvolution (CD)
Color deconvolution constitutes a sophisticated image analysis technique formulated to delineate and quantify immunohistochemical staining. Its principal aim is to establish a versatile and robust methodology for the objective immunohistochemical assessment of samples subjected to up to three distinct stains, including horseradish peroxidase staining developed with 3,3′-diaminobenzidine (DAB), hematoxylin, and eosin. This methodological approach aspires to address the challenges posed by conventional color transformation techniques by precisely isolating the contributions of individual stains, even in scenarios where they co-localize or exhibit overlapping absorption spectra. The intensities of light transmitted through a specimen can be characterized using the Beer–Lambert law [17]. The relationship between the intensity of light leaving the specimen, $I_C$, and the intensity of light entering the specimen, $I_{0,C}$, in conjunction with the amount of stain $A$ and its absorption factor $c_C$, is as follows [11]:

$$I_C = I_{0,C}\,\exp(-A\,c_C)$$

The subscript C denotes the specific detection channel. The concentration of the stain exhibits a non-linear relationship with the RGB color values [3]. Consequently, the RGB color values are not suitable for the separation and quantification of the concentration of the stain. The optical density, which is linear in the amount of stain, can be articulated as shown below:

$$OD_C = -\ln\left(\frac{I_C}{I_{0,C}}\right) = A\,c_C$$
Color deconvolution [11] aims to identify the stain vectors and the absorption factors by decomposing $OD \in \mathbb{R}^{m \times n}$, where $m = 3$ for the RGB channels, n is the number of pixels, and r is the number of stains. The optical densities of the light transmitted through the specimen may then be expressed as the product of the stain vector matrix $V \in \mathbb{R}^{m \times r}$ and $S \in \mathbb{R}^{r \times n}$, the matrix denoting the saturation levels of each individual stain, as shown below:

$$OD = VS \tag{3}$$
Furthermore, converting the RGB color space ($I_{RGB}$) into the optical density (OD) domain [18], i.e., the Beer–Lambert transformation (BLT), yields the following result:

$$OD = -\ln\left(\frac{I_{RGB}}{I_0}\right)$$

where $I_0$ is the illuminating light intensity on the sample (usually 255 for 8-bit images).
Conversely, the procedure for converting the OD color space into the RGB color space, or the inverted Beer–Lambert transformation (IBLT), is articulated as follows:

$$I_{RGB} = I_0\,\exp(-OD)$$
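The two transformations described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `rgb_to_od` and `od_to_rgb` are hypothetical, the natural logarithm is assumed, and intensities are clipped to at least 1 so that the logarithm stays finite for black pixels.

```python
import numpy as np

def rgb_to_od(rgb, i0=255.0):
    """Beer-Lambert transformation (BLT): RGB intensities -> optical density."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Assumption: clip to [1, i0] so that log(0) never occurs for black pixels.
    return -np.log(np.clip(rgb, 1.0, i0) / i0)

def od_to_rgb(od, i0=255.0):
    """Inverted Beer-Lambert transformation (IBLT): optical density -> 8-bit RGB."""
    rgb = i0 * np.exp(-np.asarray(od, dtype=np.float64))
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)

if __name__ == "__main__":
    pixel = np.array([[200, 120, 60]], dtype=np.uint8)
    od = rgb_to_od(pixel)
    print("optical density:", od)
    print("round trip:", od_to_rgb(od))
```

A white pixel maps to an optical density of zero in every channel, which is why unstained background contributes nothing to the stain separation that follows.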
2.3. Contrast Stretching (CS)
The Spatio-Temporal Retinex-like Envelope with Stochastic Sampling (STRESS) algorithm [19], conceptualized by Kolås et al., is engineered to emulate the adaptive functions of the Human Visual System. Its fundamental purpose is to compute local reference black and white points for each chromatic channel contained within an image. This computation involves the estimation of two envelope functions, maximum ($E_{max}$) and minimum ($E_{min}$), which encapsulate the image signal and exhibit gradual variation. These envelopes are distinguished by their adherence to the signal, exhibiting smoothness, edge preservation, and convergence to the global maximum for $E_{max}$ and global minimum for $E_{min}$. The algorithm derives these envelopes for each pixel through the application of a random spray model. Upon the determination of the envelopes, the value of each pixel is modified to enhance contrast, highlight details, and equilibrate the three color channels, thereby effectively executing color correction. It facilitates local contrast enhancement, automatic color adjustment, and high dynamic range image rendering. For each pixel $p_0$ in the image, the envelopes are updated as shown below:

$$E_{min} = p_0 - \bar{v}\,\bar{r}$$

and

$$E_{max} = p_0 + (1 - \bar{v})\,\bar{r}$$

The variables $\bar{r}$ and $\bar{v}$ present within the equations denote the average of the sample range and the average of the relative pixel values, which are computed as shown below:

$$\bar{r} = \frac{1}{N}\sum_{i=1}^{N} r_i, \qquad \bar{v} = \frac{1}{N}\sum_{i=1}^{N} v_i$$

where N represents the number of iterations, $r_i$ is the range of the samples, and $v_i$ is the relative value of the center pixel given as shown below:

$$r_i = s_i^{max} - s_i^{min}, \qquad v_i = \begin{cases} 1/2 & \text{if } r_i = 0 \\ (p_0 - s_i^{min})/r_i & \text{otherwise} \end{cases}$$

In this context, $s_i^{max}$ and $s_i^{min}$ denote the uppermost and lowermost sample values, respectively, derived from a spray that is obtained through a selection of stochastic samples originating from a disk of radius R centered at the pixel $p_0$, which is articulated as follows:

$$s_i^{max} = \max_{j \in \{0,\dots,M\}} s_j, \qquad s_i^{min} = \min_{j \in \{0,\dots,M\}} s_j$$

where the number of samples is denoted as M, and the random sample values are denoted as $s_j$, with the center pixel itself always included in the spray. In the context of color imagery, the computation is executed independently for each individual color channel.
Finally, R should balance detail retention against illumination correction. For instance, in dimly lit images, a larger R may more accurately gauge global illumination, whereas in high-contrast scenes, a smaller R avoids excessive smoothing. The algorithm may also employ multi-scale methodologies that apply several different R values. The selection of M and N should consider the image’s noise level and intricacy. Noisy images may necessitate larger M and N to average out noise influences. A larger R typically requires larger M and N to guarantee adequate sampling density within the disk, thus avoiding sparse or erroneous envelope estimates.
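The roles of R, M, and N can be made concrete with a small single-channel sketch of a STRESS-style stretch. This is a simplified illustration under stated assumptions, not the reference implementation of [19]: the function name `stress_channel` and its defaults are hypothetical, sample positions that fall outside the image are clamped to the border rather than rejected, and the channel is assumed to be scaled to [0, 1].

```python
import numpy as np

def stress_channel(channel, radius=20, n_iter=10, n_samples=5, seed=0):
    """STRESS-style stretch of one channel in [0, 1]; returns (output, E_min, E_max)."""
    rng = np.random.default_rng(seed)
    h, w = channel.shape
    ys, xs = np.mgrid[0:h, 0:w]
    range_sum = np.zeros((h, w))
    rel_sum = np.zeros((h, w))
    for _ in range(n_iter):                      # N iterations
        s_max = channel.copy()                   # the centre pixel is always in the spray
        s_min = channel.copy()
        for _ in range(n_samples):               # M random samples per iteration
            # Uniform angle and radius: samples are denser near the centre pixel.
            theta = rng.uniform(0.0, 2.0 * np.pi, size=(h, w))
            dist = rng.uniform(0.0, radius, size=(h, w))
            # Assumption: clamp out-of-image positions to the border.
            sy = np.clip(np.rint(ys + dist * np.sin(theta)).astype(int), 0, h - 1)
            sx = np.clip(np.rint(xs + dist * np.cos(theta)).astype(int), 0, w - 1)
            sample = channel[sy, sx]
            s_max = np.maximum(s_max, sample)
            s_min = np.minimum(s_min, sample)
        r = s_max - s_min
        # Relative value of the centre pixel, defined as 1/2 where the range is zero.
        v = np.where(r > 0, (channel - s_min) / np.where(r > 0, r, 1.0), 0.5)
        range_sum += r
        rel_sum += v
    r_bar = range_sum / n_iter
    v_bar = rel_sum / n_iter
    e_min = channel - v_bar * r_bar
    e_max = e_min + r_bar
    # Stretch each pixel between its local envelopes.
    safe = np.where(r_bar > 0, r_bar, 1.0)
    out = np.where(r_bar > 0, (channel - e_min) / safe, 0.5)
    return out, e_min, e_max

if __name__ == "__main__":
    image = np.linspace(0.2, 0.6, 64).reshape(8, 8)
    stretched, _, _ = stress_channel(image, radius=4)
    print(stretched.round(2))
```

For a color image the function would be called once per channel, in line with the per-channel computation described above. Increasing `n_iter` and `n_samples` reduces the sampling noise of the envelopes at a proportional cost in run time.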
2.4. Stain Separation (SS)
Macenko et al. introduced a computational framework that systematically detects the appropriate stain vectors corresponding to an image [3], thus facilitating color deconvolution and normalization processes. The objective of this methodology is to reconcile histological slides subjected to disparate processing conditions into a unified, standardized framework, which consequently enhances both quantitative analytical capabilities and visual uniformity. In order to ascertain the optimal stain vector (v), it is necessary to utilize solely . The algorithm is delineated as in Algorithm 1.
| Algorithm 1 Automatic Color Deconvolution Algorithm [3] |
Input: RGB Slide,
|
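Because the body of Algorithm 1 is not reproduced above, the following sketch follows the commonly described steps of the method in [3]: convert to optical density, discard nearly transparent pixels, project onto the plane of the two leading principal directions, and take robust angular extremes as the stain vectors. The function name and the parameters `beta` and `alpha` (with their default values) are assumptions introduced for illustration.

```python
import numpy as np

def macenko_stain_vectors(rgb, i0=255.0, beta=0.15, alpha=1.0):
    """Estimate two unit-norm stain vectors (3 x 2) from an RGB image."""
    pixels = np.asarray(rgb, dtype=np.float64).reshape(-1, 3)
    od = -np.log(np.clip(pixels, 1.0, i0) / i0)
    # Keep only pixels with enough stain in every channel (assumed threshold beta).
    od = od[np.all(od > beta, axis=1)]
    # eigh returns eigenvalues in ascending order, so the last two columns
    # span the plane of the two largest principal directions.
    _, eigvecs = np.linalg.eigh(np.cov(od, rowvar=False))
    plane = eigvecs[:, 1:3]
    proj = od @ plane
    phi = np.arctan2(proj[:, 1], proj[:, 0])
    # Robust extremes of the angle distribution (alpha-th percentiles).
    phi_lo, phi_hi = np.percentile(phi, [alpha, 100.0 - alpha])
    v_lo = plane @ np.array([np.cos(phi_lo), np.sin(phi_lo)])
    v_hi = plane @ np.array([np.cos(phi_hi), np.sin(phi_hi)])
    stains = np.stack([v_lo, v_hi], axis=1)
    return stains / np.linalg.norm(stains, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic two-stain image built from assumed reference vectors.
    v_ref = np.array([[0.65, 0.27], [0.70, 0.57], [0.29, 0.78]])
    conc = rng.uniform(0.6, 1.5, size=(2, 4000))
    conc[1, :400] = 0.0      # pixels stained only by the first stain
    conc[0, 400:800] = 0.0   # pixels stained only by the second stain
    image = 255.0 * np.exp(-(v_ref @ conc).T).reshape(40, 100, 3)
    print(macenko_stain_vectors(image).round(3))
```

The order of the two returned columns is not fixed, so a downstream step would still need to decide which column corresponds to which stain, for example by comparing against reference vectors.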
Furthermore, non-negative matrix factorization (NMF) may be employed for the purpose of stain separation [20]. Given that a stain is capable of absorbing light but is unable to emit it, the stain vector (V) and the absorption factor (S) as articulated in Equation (3) must be non-negative. As a result, it is feasible to determine V and S by resolving the following equation:

$$\min_{V,S}\ \frac{1}{2}\,\lVert OD - VS \rVert_F^2 \quad \text{such that } V, S \geq 0 \tag{15}$$
Vahadane et al. [6] introduced a new cost function that enhances Equation (15) through the incorporation of $\ell_1$ sparsity regularization applied to the stain densities (the rows of S), as formulated in [6], where each stain has been indexed by j as shown below:

$$\min_{V,S}\ \frac{1}{2}\,\lVert OD - VS \rVert_F^2 + \lambda \sum_{j=1}^{r} \lVert S(j,:) \rVert_1 \tag{16}$$

such that $V, S \geq 0$ and $\lVert V(:,j) \rVert_2^2 = 1$. $\lambda$ is the parameter for sparsity and regularization. Setting $\lambda = 0$ reduces SNMF to NMF. The optimal value can be ascertained through a grid search by identifying the minimum error of the projected stain color matrix or the correlation of the projected stain density maps [6].
Additionally, sparse coding methodologies [21] may be employed to infer S, while dictionary learning techniques [22] are applicable for the estimation of V. Consequently, Vahadane et al. utilized the SPArse Modeling Software version 2.6 (SPAMS) [23] for the purpose of estimating these parameters.
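To illustrate the idea of a sparsity-penalized non-negative factorization without depending on SPAMS, the sketch below uses plain multiplicative updates instead of the dictionary learning solver cited above. It is a swapped-in, simplified solver: the function name `sparse_nmf`, the parameter `lam` (the sparsity weight), and the renormalization heuristic are all assumptions, and the results will generally differ from those of SPAMS.

```python
import numpy as np

def sparse_nmf(od, n_stains=2, lam=0.1, n_iter=500, seed=0, eps=1e-9):
    """Factorize od (3 x n) as v @ s with v, s >= 0 and an l1 penalty on s."""
    rng = np.random.default_rng(seed)
    v = rng.uniform(0.1, 1.0, size=(od.shape[0], n_stains))
    v /= np.linalg.norm(v, axis=0)
    s = rng.uniform(0.1, 1.0, size=(n_stains, od.shape[1]))
    for _ in range(n_iter):
        # Multiplicative update for s; lam in the denominator shrinks s towards zero.
        s *= (v.T @ od) / (v.T @ v @ s + lam + eps)
        # Multiplicative update for v (no penalty on v).
        v *= (od @ s.T) / (v @ (s @ s.T) + eps)
        # Keep stain vectors at unit norm and rescale s so that v @ s is unchanged.
        norms = np.linalg.norm(v, axis=0) + eps
        v /= norms
        s *= norms[:, None]
    return v, s

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    v_ref = np.array([[0.65, 0.27], [0.70, 0.57], [0.29, 0.78]])  # assumed vectors
    od = v_ref @ rng.uniform(0.0, 1.5, size=(2, 200))
    v_est, s_est = sparse_nmf(od, lam=0.05)
    rel_err = np.linalg.norm(od - v_est @ s_est) / np.linalg.norm(od)
    print("relative reconstruction error:", round(float(rel_err), 4))
```

Setting `lam=0.0` turns the routine into ordinary NMF, mirroring the statement above that the sparse formulation reduces to NMF when the regularization parameter is zero.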
2.5. Fuzzy Clustering (FC)
The Robust Self-Sparse Fuzzy Clustering Algorithm (RSSFCA) is a methodology formulated for image segmentation [24] that explicitly targets two prevalent challenges encountered in conventional fuzzy clustering algorithms: heightened sensitivity to outliers attributable to non-sparse fuzzy memberships, and image over-segmentation resulting from insufficient local spatial information. RSSFCA offers two significant contributions. First, it incorporates a regularization framework under the Gaussian metric into the objective function of fuzzy clustering algorithms to attain sparse fuzzy memberships, thereby diminishing noise and enhancing clustering efficacy. Second, it presents a connected-component filtering mechanism based on an area density balance strategy to address image over-segmentation, which is simpler and faster than integrating local spatial information to eliminate minor areas. Empirical findings demonstrate that RSSFCA alleviates both the sensitivity to outliers and the problem of over-segmentation, thereby producing superior image segmentation results compared with previous leading algorithms in the field. In order to effectively address sparse fuzzy memberships, a regularization term was delineated within the objective function as follows:
where represents an instance of unlabeled data, denotes the corresponding centroid of the cluster, and signifies the membership degree of relative to the clustering center , which was constrained such that and across c distinct clusters. Furthermore, is delineated as shown below:
where denotes the Gaussian probability density function represented as shown below:
In this context, D represents the dimensionality of the input data, while denotes the covariance matrix that encapsulates the intra-class variability of the ith class. A reduction in results in a negative value of , which subsequently induces significant inaccuracies in distance calculations and resultant misclassification. Consequently, the issue was addressed by employing in place of as shown below:
Consequentially, their final objective function is defined as shown below:
Moreover, can be separated into c sub-problems as shown below:
The update can be calculated by solving as shown below:
Furthermore, the update from can be calculated as shown below:
The comprehensive outline of their methodology is delineated in Algorithm 2, wherein the inputs consist of the number of clusters (c), the regularization parameter (), the convergence threshold (), and the maximum number of iterations (T). Finally, the algorithm’s outputs are denoted as , which are instrumental for the segmentation of pixels within the image.
| Algorithm 2 RSSFCA Algorithm [24] |
Input: c, , , T
|
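The body of Algorithm 2 is not reproduced above, and the sketch below does not implement RSSFCA. As a point of reference only, it shows the conventional fuzzy c-means (FCM) baseline that RSSFCA is described as improving upon, without the sparsity regularization or the connected-component filtering. The function name and the fuzzifier `m` are assumptions; `c`, `tol`, and `max_iter` play the roles of the number of clusters, the convergence threshold, and the maximum number of iterations.

```python
import numpy as np

def fuzzy_c_means(x, c=2, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Plain fuzzy c-means on x (n_points x n_features); returns (memberships, centers)."""
    rng = np.random.default_rng(seed)
    u = rng.random((c, x.shape[0]))
    u /= u.sum(axis=0)                      # memberships of each point sum to one
    for _ in range(max_iter):
        um = u ** m
        centers = (um @ x) / um.sum(axis=1, keepdims=True)
        # Squared Euclidean distance of every point to every center.
        d2 = ((x[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2) + 1e-12
        inv = d2 ** (-1.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=0)
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    return u, centers

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: dark "nuclei" pixels and bright "background" pixels.
    pixels = np.vstack([rng.normal(0.1, 0.02, (50, 3)),
                        rng.normal(0.9, 0.02, (50, 3))])
    memberships, centers = fuzzy_c_means(pixels, c=2)
    print(centers.round(2))
    print(memberships.argmax(axis=0))
```

In this baseline every pixel keeps a non-zero membership in every cluster, which is the non-sparse behavior that the regularization term described above is intended to suppress.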
2.6. Structure-Preserving Color Normalization (SPCN)
Structure-Preserving Color Normalization (SPCN) is a technique designed to standardize the color appearance of histological images while preserving their underlying biological structure [6]. This method is built upon sparse non-negative matrix factorization (SNMF) for stain separation, which decomposes images into sparse and non-negative stain density maps. SPCN works by replacing the color basis of a source image with that of a pathologist-preferred target image while maintaining the source’s original stain concentrations. This ensures that the structural information, captured in the stain density maps, remains intact, and only the color appearance is altered. This approach addresses the issue of color variations in histological images caused by differences in staining protocols, raw materials, and scanner responses, making images more comparable for analysis by pathologists and software.
In order to achieve a normalized color representation from a source image (s) to a target image (t), it is imperative to decompose the source optical density ($OD_s$) into the product of the matrices $V_s$ and $S_s$, while concurrently decomposing the target optical density ($OD_t$) into the matrices $V_t$ and $S_t$, as delineated in Equation (16). Subsequently, the matrix $S_s$ necessitates normalization according to the following formulation:

$$S_s^{norm}(j,:) = \frac{S_s(j,:)}{S_s^{RM}(j,:)}\; S_t^{RM}(j,:)$$

where $S^{RM}(j,:) = RM(S(j,:))$ and $RM(\cdot)$ denotes the pseudo-maximum value of each row vector at the 99% threshold. Finally, the normalized source will be computed as shown below:

$$OD_s^{norm} = V_t\, S_s^{norm}$$
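The normalization step described above can be sketched as follows, assuming the stain density matrices and the target stain vectors have already been estimated. The function name `spcn_normalize`, its argument names, and the use of the 99th percentile as the pseudo-maximum are illustrative assumptions.

```python
import numpy as np

def spcn_normalize(s_source, s_target, v_target, i0=255.0, q=99.0):
    """Rescale source stain densities to the target's range and recolor them.

    s_source, s_target: (r x n) non-negative stain density matrices.
    v_target: (3 x r) stain vectors of the target image.
    Returns (normalized densities, normalized RGB intensities of shape 3 x n).
    """
    # Pseudo-maximum of each stain (row) at the q-th percentile.
    rm_source = np.percentile(s_source, q, axis=1, keepdims=True)
    rm_target = np.percentile(s_target, q, axis=1, keepdims=True)
    s_norm = s_source / np.maximum(rm_source, 1e-9) * rm_target
    # Recombine with the target color basis, then invert the Beer-Lambert transform.
    od_norm = v_target @ s_norm
    return s_norm, i0 * np.exp(-od_norm)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v_t = np.array([[0.65, 0.27], [0.70, 0.57], [0.29, 0.78]])  # assumed vectors
    s_t = rng.uniform(0.0, 1.0, size=(2, 500))
    s_s = 2.0 * rng.uniform(0.0, 1.0, size=(2, 500))            # over-stained source
    s_norm, rgb = spcn_normalize(s_s, s_t, v_t)
    print(np.percentile(s_norm, 99, axis=1).round(3),
          np.percentile(s_t, 99, axis=1).round(3))
```

Only the scale of each density row and the color basis change; the spatial pattern of the source densities is left untouched, which is the sense in which the structure is preserved.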
2.7. Quaternion Structural Similarity (QSSIM)
Quaternion Structural SIMilarity (QSSIM) is a visual quality metric (VQM) designed for color images, representing a vectorial expansion of the traditional Structural SIMilarity (SSIM) index using Quaternion Image Processing (QIP) [25]. Unlike scalar methods that often fail to adequately measure combined degradations such as blur and desaturation, QSSIM employs a true vectorial approach, treating each color pixel as a single quaternion number. This allows QSSIM to measure changes in both luminance and chrominance vectors simultaneously, making it particularly effective in predicting the visual quality of color images subjected to complex degradations, such as those caused by the color crosstalk effect. Whereas existing VQMs primarily focus on either luminance or chrominance changes, QSSIM offers superior correlation with human subjective tests for combined degradations.
QSSIM possesses the capability to evaluate the simultaneous degradation attributed to both blurriness and desaturation. The formulation presented in Equation (27) encompasses components of luminance (), chromatic components (), and the cross-correlation of color (), each of which is delineated as follows:
where the standard deviations of the source (ref) and processed images (deg) are represented by and , respectively.
Moreover, the first term in QSSIM is a luminance comparison term that measures the similarity in average brightness (mean intensity) between two images or image patches. Luminance reflects the overall illumination level, which is critical for visual perception. The second term is a structural term. It measures the correlation of pixel intensity patterns, capturing structural similarity (e.g., edges, textures). A high measurement indicates that the processed image preserves the structural details of the source image.
2.8. Classification of Nuclei in Breast Cancer IHC Based on Automatic Color Deconvolution (CNACD)
The classification of breast cancer nuclei in immunohistochemical (IHC) images can be conducted approximately utilizing a solitary pixel representing the nucleus. The process of stain separation must be implemented on the pixel through the application of Algorithm 1. Subsequently, each RGB stain value is consolidated into a singular grayscale value to facilitate enhanced comparative analysis. The maximum stain value will serve as the criterion for determining the classification of the nuclei. This methodology is encapsulated in Algorithm 3.
| Algorithm 3 CNACD Algorithm |
Input: RGB Pixels
|
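Since the body of Algorithm 3 is not reproduced above, the following is only a rough sketch of the steps described in the text: separate the stains at a single nucleus pixel, reduce each stain's RGB contribution to a gray level, and let the stronger stain decide the class. The reference stain vectors in `STAINS`, the luminance weights, and the function name are assumptions; in the described pipeline the stain vectors would come from the stain separation step instead.

```python
import numpy as np

# Assumed reference stain vectors in OD space (columns: hematoxylin, DAB).
STAINS = np.array([[0.65, 0.27],
                   [0.70, 0.57],
                   [0.29, 0.78]])
GRAY_WEIGHTS = np.array([0.299, 0.587, 0.114])  # assumed luminance weights

def classify_nucleus_pixel(rgb, stains=STAINS, i0=255.0):
    """Label one nucleus pixel as immunopositive (brown) or immunonegative (blue)."""
    od = -np.log(np.clip(np.asarray(rgb, dtype=np.float64), 1.0, i0) / i0)
    # Least-squares stain concentrations; negative values are clipped to zero.
    conc, *_ = np.linalg.lstsq(stains, od, rcond=None)
    conc = np.clip(conc, 0.0, None)
    strength = []
    for j in range(stains.shape[1]):
        rgb_j = i0 * np.exp(-stains[:, j] * conc[j])   # color of stain j alone
        gray_j = float(rgb_j @ GRAY_WEIGHTS)           # single gray value per stain
        strength.append(i0 - gray_j)                   # darker means stronger stain
    return "immunopositive" if strength[1] > strength[0] else "immunonegative"

if __name__ == "__main__":
    print(classify_nucleus_pixel([120, 70, 30]))   # brownish pixel
    print(classify_nucleus_pixel([60, 70, 150]))   # bluish pixel
```

Because the decision rests on one pixel, a brown nucleus sampled at a blue spot would be labeled immunonegative, which is the limitation of single-pixel classification that the Discussion returns to.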
2.9. The Proposed Normalization Method
To normalize breast cancer immunohistochemistry (IHC) images, a multi-step method is proposed. Initially, both source and target stained images undergo contrast stretching or STRESS processing, which can be performed using software like GNU Image Manipulation Program. Following this, the images are converted from the RGB color space to the optical density (OD) space using Beer–Lambert’s law. The stained images are then separated into stain vector and stain absorption matrices. The color appearance of the source image is normalized to match the target image using Structure-Preserving Color Normalization (SPCN). The resulting image from the SPCN block is converted back from OD space to RGB space. Finally, the pixels of the contrast-stretched source image are classified using the Robust Self-Sparse Fuzzy Clustering Algorithm (RSSFCA) to identify background and nuclei, thereby determining their locations. This comprehensive approach aims to standardize the color distribution of IHC images, making them more interpretable for pathologists. The schematic representation of the proposed normalization technique is illustrated in Figure 1.
Figure 1.
Schematic representation of our proposed methodology for color normalization in immunohistochemistry (IHC) images.
2.10. Experiment Settings
In the conducted experiment, we established a series of tests to evaluate five distinct image normalization algorithms, each applied to five different color variations of immunohistochemically stained images, for comparative analysis against our proposed methodology. The algorithms under scrutiny include the color transfer between images method, the histogram specification technique, the Macenko approach, the Structure-Preserving Color Normalization method, and the STRESS technique. Subsequently, we employed QSSIM to assess the degradation in clarity and color saturation of the normalized outputs relative to the original images. The efficacy of histological information preservation is quantified utilizing QSSIM. Furthermore, a three-dimensional histogram visualization of color distribution is employed to facilitate quantitative comparisons. Moreover, we analyze the impact of normalization on classifier performance by utilizing the CNACD to approximately classify breast cancer nuclei based on their central pixels annotated by pathologists. The classification outcomes are derived from six normalization techniques to ascertain whether these methodologies influence the Allred score.
3. Results
In this study, we executed our model on a computational system equipped with an Intel Core i5 2.3 GHz CPU and 8 GB of RAM. Our proposed normalization methodology was subjected to a comparative analysis against five distinct image normalization algorithms. We conducted tests utilizing five various color modifications of IHC-stained images. The findings are illustrated in Figure 2. The qualitative and quantitative assessments were categorized into three segments. Initially, we assessed the structural similarity of our proposed methodology in relation to the five image normalization algorithms. Subsequently, to demonstrate the efficacy of our normalization technique for Allred scoring, we evaluated the classification accuracy by employing the CNACD classifier. Lastly, for the purpose of quantitative assessment, the 3D histogram visualization of color distribution was utilized and thoroughly analyzed. The outcomes of all methodologies are elaborated upon in the subsequent subsections.
Figure 2.
Visual comparison of color normalization methods.
3.1. Evaluation of Structure Similarity
For the first evaluation, the QSSIM scores were calculated using a MATLAB version 2022b script written by Kolaman [25]. The script follows Equation (27). The structural similarity scores for the image normalization algorithms applied to five different color variations of IHC-stained images are shown in Table 1.
Table 1.
Quality metrics of various color normalization methods.
3.2. Classification Performances
To ascertain the classification efficacy, the ground truth was established by the pathologists. The algorithm employed as the classifier was the CNACD methodology. The accuracy (AC) was computed as shown below:

$$AC = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP denotes the aggregate count of true positive cancer nuclei identified within the immunohistochemistry (IHC) image, TN represents the cumulative total of true negative cancer nuclei, FN indicates the overall number of false negative cancer nuclei, and FP signifies the complete tally of false positive cancer nuclei. The accuracy metrics derived from the various normalization algorithms are presented in Table 2.
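As a worked illustration of the accuracy computation, the helper below evaluates the ratio of correctly classified nuclei to all annotated nuclei. The counts in the example are invented for illustration and are not taken from Table 2.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of nuclei whose predicted class matches the pathologists' label."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one annotated nucleus is required")
    return (tp + tn) / total

if __name__ == "__main__":
    # Hypothetical counts: 40 true positives, 45 true negatives,
    # 10 false positives, and 5 false negatives.
    print(accuracy(tp=40, tn=45, fp=10, fn=5))
```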
Table 2.
The accuracies of nuclei classification by using CNACD.
3.3. Quantitative Comparison with 3D Histogram Visualization of Color Distribution
The three-dimensional histogram representation was generated utilizing the “Color Inspector 3D” version 2.3 [27] plugin within the ImageJ version 1.53 [28] framework. Each color pixel within the image is depicted as the centroid of a circle. Furthermore, the circumference of each circle signifies the prevalence of the corresponding color pixels. This representation is illustrated in Figure 3.
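The histograms in this study were produced with the ImageJ plugin named above. As a rough programmatic counterpart, the sketch below bins the RGB pixels of an image into a three-dimensional histogram with NumPy. The bin count is an assumption, and the histogram intersection helper is an illustrative addition for comparing two color distributions rather than a measure used in this paper.

```python
import numpy as np

def color_histogram_3d(rgb, bins=8):
    """Normalized 3D RGB histogram (bins x bins x bins) of an 8-bit image."""
    pixels = np.asarray(rgb).reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    return hist / pixels.shape[0]

def histogram_intersection(h1, h2):
    """Overlap of two normalized histograms: 1.0 if identical, 0.0 if disjoint."""
    return float(np.minimum(h1, h2).sum())

if __name__ == "__main__":
    image = np.zeros((4, 4, 3), dtype=np.uint8)
    image[:2] = [200, 100, 50]          # top half brownish, bottom half black
    hist = color_histogram_3d(image)
    occupied = np.argwhere(hist > 0)
    print(occupied, hist[hist > 0])
```

Each occupied bin corresponds to one circle in the plot described above, with the bin value giving the share of pixels that fall into that color range.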
Figure 3.
A 3D histogram visualization of color distribution of each color normalization technique.
4. Discussion
We have conducted a comparative analysis of our proposed methodology against five distinct color normalization techniques: the color transfer methodology as articulated in [26], the histogram specification technique delineated in [2], the Macenko methodology as described in [3], the SPCN technique referenced in [6], and the STRESS methodology elucidated in [19].
Figure 2 illustrates the outcomes of all experimental conditions. It is noteworthy that the backgrounds associated with the results derived from the Macenko and SPCN methodologies do not exhibit a pristine white coloration.
Figure 4 shows the results from each step of our proposed method. In Table 1, the similarity metrics between the normalized source and the ground truth have been computed employing QSSIM [29]. The findings demonstrate that STRESS consistently achieves superior performance relative to other competing methodologies. As shown in Figure 4e, owing to RSSFCA segmentation errors, in which certain background regions in the image may be erroneously segmented as nuclei, the tissue structure can be altered, which lowers our method’s QSSIM score. Furthermore, the variables for RSSFCA can be optimized through grid search.
Figure 4.
The resultant visual representations derived from each segment of our proposed methodology. (a) Original image. (b) STRESS original. (c) SPCN original. (d) Nuclei SPCN original. (e) Background STRESS original. (f) Normalized original.
In reference to the three-dimensional visualization depicted in Figure 3, the color transfer between images and histogram specification methodologies exhibit purple clusters. Furthermore, it is noteworthy that the color clusters associated with the STRESS methodology are devoid of the brown color cluster. In our proposed approach, the color clusters demonstrate a closer resemblance to those of the target image. However, there are more brownish pixels, although these are not located in the nuclei. This reflects the structural changes and consequently affects the structural similarity score, QSSIM.
Table 3 delineates the computational complexity associated with each phase of our proposed methodology. The cumulative complexity is denoted as . This complexity is dictated by the term possessing the highest order, specifically STRESS. The STRESS component emerges as the most critical step, as it significantly enhances the image contrast. Furthermore, it is noteworthy that the complexity can be mitigated through the application of the Quantile-Based Retinex (QBRIX) [30] or the Retinex-Based Fast Algorithm (RBFA) [31].
Table 3.
Computational Complexity Table (Big-O Analysis) for each step in our proposed pipeline.
The aggregation of brown-hued regions within the histopathologically stained imagery is of significant relevance as it serves as a basis for the computation of the Allred score. This particular score is instrumental in assessing the therapeutic regimen for breast cancer patients [32].
Moreover, the results of the classification experiment are presented in Table 2. This table illustrates that the outcomes of our method exhibit a degree of similarity to those of other methods. In Test 1, our method outperforms the other techniques. The classification of the eight true negative nuclei cannot be adequately achieved through the use of a single pixel, as certain brown nuclei, which include blue portions, were identified as immunonegative by the pathologists. Consequently, the classification of these eight nuclei requires the involvement of neighboring pixels to enhance the weighting of the classification outcome. The eight nuclei are depicted in Figure 5. The predictive performance concerning the eight nuclei, derived from the normalized output of our method, is illustrated in Figure 6. To evaluate the efficacy of the color features extracted from the normalized images, we conducted an automated and unsupervised classification of nuclei utilizing Automatic Color Deconvolution (ACD) and determined that normalization does not substantially influence the accuracy. Although our methodology employs the unsupervised classification technique CNACD, the results yield a performance that is not markedly different from those of other methodologies.
Figure 5.
Ground truth Source Image 2 annotated by the pathologists. Green dots signify immunopositive nuclei. Red dots denote immunonegative nuclei. Black dashed rectangles illustrate the erroneous predictions generated by our methodology.
Figure 6.
The categorization of nuclei within the normalized Source Image 2 utilizing the methodology we have proposed. Green dots denote immunopositive nuclei, whereas red dots signify immunonegative nuclei. The black dashed rectangles illustrate the instances of erroneous predictions made by our methodology.
The Allred score is a semi-quantitative method for assessing estrogen receptor (ER) and progesterone receptor (PR) status in breast cancer IHC slides. It yields a score from 0 to 8 based on the proportion of positive cells and the staining intensity. The assessment is traditionally visual and may therefore be subjective. Automated methods can be classified as point-based [33] or patch-based [34]: the former evaluate grid points, whereas the latter evaluate image regions and often use deep learning for estimation. Point-based methods are more precise in low-expression cases, whereas patch-based methods are efficient for large whole slide images (WSIs) but may miss subtle variations.
The point-based method samples fixed grid points on the image to estimate the score. Because this approach may overlook brown-hued areas, the overall score may be compromised. ACD can improve performance by choosing optimal points rather than relying on grid points. In contrast, the patch-based method requires machine learning, so its score accuracy depends on training.
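As a worked illustration of how the two components combine into the 0 to 8 scale, the sketch below adds a proportion score (0 to 5) and an intensity score (0 to 3). The cut-offs are the commonly cited Allred bins; they are not stated in this section, so they should be read as an assumption, and the function names are hypothetical.

```python
def proportion_score(positive_fraction: float) -> int:
    """Map the fraction of positive cells (0.0 to 1.0) to a score of 0 to 5.

    Cut-offs follow the commonly cited Allred bins (assumed here):
    0, <1%, 1-10%, 11-33%, 34-66%, >66%.
    """
    if not 0.0 <= positive_fraction <= 1.0:
        raise ValueError("positive_fraction must lie in [0, 1]")
    if positive_fraction == 0.0:
        return 0
    if positive_fraction < 0.01:
        return 1
    if positive_fraction <= 0.10:
        return 2
    if positive_fraction <= 1 / 3:
        return 3
    if positive_fraction <= 2 / 3:
        return 4
    return 5


def allred_score(positive_fraction: float, intensity: int) -> int:
    """Sum the proportion score and the intensity score (0=none .. 3=strong)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    ps = proportion_score(positive_fraction)
    # With no positive cells or no staining, the total is reported as 0.
    if ps == 0 or intensity == 0:
        return 0
    return ps + intensity


example_total = allred_score(0.5, 2)  # half the cells positive, intermediate staining
```

Here half of the cells being positive falls in the fourth proportion bin, and adding an intermediate intensity of 2 gives a total of 6.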
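The risk that fixed grid points overlook brown-hued areas can be seen in a small synthetic example. This is only a sketch under stated assumptions: the boolean mask, the grid spacing, and the helper names `grid_points`, `brown_fraction_at`, and `brown_fraction_dense` are hypothetical and do not come from the cited point-based method.

```python
from typing import List, Tuple


def grid_points(height: int, width: int, step: int) -> List[Tuple[int, int]]:
    """Return fixed grid sample locations, offset by half a step from the border."""
    return [
        (r, c)
        for r in range(step // 2, height, step)
        for c in range(step // 2, width, step)
    ]


def brown_fraction_at(mask: List[List[bool]], points: List[Tuple[int, int]]) -> float:
    """Fraction of the sampled points that land on brown pixels."""
    hits = sum(1 for r, c in points if mask[r][c])
    return hits / len(points)


def brown_fraction_dense(mask: List[List[bool]]) -> float:
    """Fraction of all pixels that are brown (reference value)."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


# Synthetic 12 x 12 image with one small 2 x 2 brown cluster in a corner.
mask = [[False] * 12 for _ in range(12)]
for r in range(2):
    for c in range(2):
        mask[r][c] = True

sampled = brown_fraction_at(mask, grid_points(12, 12, step=4))
dense = brown_fraction_dense(mask)
```

The grid with a step of 4 samples rows and columns 2, 6, and 10, so none of its nine points falls on the cluster and `sampled` is zero, while `dense` is small but nonzero. In a low-expression image, that gap is what shifts the estimated positive proportion.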
5. Conclusions
This manuscript presents a new color normalization method for breast cancer immunohistochemistry (IHC) images. The method combines sparse stain separation with self-sparse fuzzy clustering and standardizes the color distribution so that images are easier for pathologists to interpret, thereby supporting more precise Allred scoring for the evaluation of cancer cell proliferation and informing treatment decisions. Although the method preserves structure less well than alternative methods, the classification outcomes from the point-based approach are comparable to those obtained with other techniques. The method also improved pathologists' perception of nuclear morphology and color saturation.
Author Contributions
Conceptualization, A.T., S.L. and P.T.; Software, A.T.; Validation, A.T., P.S. and P.T.; Investigation, P.P. and P.S.; Writing—original draft, A.T. and P.T.; Writing—review & editing, A.T., P.P., P.S. and P.T.; Supervision, P.T. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Royal Golden Jubilee Ph.D. program grant number PHD/0199/2557.
Institutional Review Board Statement
The study was conducted in accordance with the Declaration of Helsinki and approved by the Human Research Ethics Committee of the Naradhiwas Rajanagarindra Hospital (REC 001/2564). Date: 1 April 2021.
Informed Consent Statement
Informed consent was obtained from all subjects involved in the study.
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| IHC | Immunohistochemistry |
| ACD | Automatic Color Deconvolution |
| GIMP | GNU Image Manipulation Program |
| SVD | Singular Value Decomposition |
| CAP | College of American Pathologists |
| WHO | World Health Organization |
| H&E | Hematoxylin and Eosin |
| DCIS | Ductal Carcinoma In Situ |
| CD | Color Deconvolution |
| OD | Optical Density |
| CS | Contrast Stretching |
| STRESS | Spatio-Temporal Retinex-like Envelope with Stochastic Sampling |
| SS | Stain Separation |
| SPAMS | SPArse Modeling Software |
| FC | Fuzzy Clustering |
| RSSFCA | Robust Self-Sparse Fuzzy Clustering Algorithm |
| SPCN | Structure-Preserving Color Normalization |
| SNMF | Sparse Non-negative Matrix Factorization |
| QSSIM | Quaternion Structural SIMilarity |
| VQM | Visual Quality Matrix |
| QIP | Quaternion Image Processing |
| CNACD | Classification of Nuclei in Breast Cancer IHC based on Automatic Color Deconvolution |
| BLT | Beer–Lambert transformation |
| IBLT | Inverted Beer–Lambert transformation |
| TP | True Positive |
| TN | True Negative |
| FN | False Negative |
| FP | False Positive |
| QBRIX | Quantile-Based Retinex |
| RBFA | Retinex-Based Fast Algorithm |
References
- Daly, A.A.; Rolph, R.; Cutress, R.I.; Copson, E.R. A review of modifiable risk factors in young women for the prevention of breast cancer. Breast Cancer Targets Ther. 2021, 13, 241–257.
- Wechsler, H. Digital image processing, 2nd ed. Proc. IEEE 2008, 69, 1174–1175.
- Macenko, M.; Niethammer, M.; Marron, J.S.; Borland, D.; Woosley, J.T.; Guan, X.; Schmitt, C.; Thomas, N.E. A method for normalizing histology slides for quantitative analysis. In Proceedings of the 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Boston, MA, USA, 28 June 2009–1 July 2009; pp. 1107–1110.
- College of American Pathologists, The Lab General Checklist. 2025. Available online: https://www.aab.org/images/CRBSymposium/Behnke%204.pdf (accessed on 15 August 2025).
- WHO. WHO Classification of Tumours: Breast Tumours; WHO Classification of Tumours; International Agency for Research on Cancer (IARC): Lyon, France, 2019; Volume 2.
- Vahadane, A.; Peng, T.; Sethi, A.; Albarqouni, S.; Wang, L.; Baust, M.; Steiger, K.; Schlitter, A.M.; Esposito, I.; Navab, N. Structure-Preserving Color Normalization and Sparse Stain Separation for Histological Images. IEEE Trans. Med. Imaging 2016, 35, 1962–1971.
- Shaban, M.T.; Baur, C.; Navab, N.; Albarqouni, S. Staingan: Stain style transfer for digital histological images. In Proceedings of the International Symposium on Biomedical Imaging, Venice, Italy, 8–11 April 2019; pp. 953–956.
- Maji, P.; Mahapatra, S. Circular Clustering in Fuzzy Approximation Spaces for Color Normalization of Histological Images. IEEE Trans. Med. Imaging 2020, 39, 1735–1745.
- Hidalgo-Gavira, N.; Mateos, J.; Vega, M.; Molina, R.; Katsaggelos, A.K. Variational Bayesian Blind Color Deconvolution of Histopathological Images. IEEE Trans. Image Process. 2020, 29, 2026–2036.
- Pérez-Bueno, F.; Vega, M.; Naranjo, V.; Molina, R.; Katsaggelos, A.K. Fully automatic blind color deconvolution of histological images using super gaussians. In Proceedings of the European Signal Processing Conference, Amsterdam, The Netherlands, 18–21 January 2021; pp. 1254–1258.
- Ruifrok, A.C.; Johnston, D.A. Quantification of histochemical staining by color deconvolution. Anal. Quant. Cytol. Histol. 2001, 23, 291–299.
- Tuominen, V.J.; Ruotoistenmäki, S.; Viitanen, A.; Jumppanen, M.; Isola, J. ImmunoRatio: A publicly available web application for quantitative image analysis of estrogen receptor (ER), progesterone receptor (PR), and Ki-67. Breast Cancer Res. 2010, 12, R56.
- Ko, C.C.; Tsai, C.Y.; Lin, C.H.; Liao, K.S. A computer-aided diagnosis system of breast intraductal lesion using histopathological images. In Proceedings of the 29th International Conference on Image and Vision Computing New Zealand, Hamilton, New Zealand, 19–21 November 2014; pp. 212–217.
- Liu, Y.; Li, X.; Zheng, A.; Zhu, X.; Liu, S.; Hu, M.; Luo, Q.; Liao, H.; Liu, M.; He, Y.; et al. Predict Ki-67 Positive Cells in H&E-Stained Images Using Deep Learning Independently From IHC-Stained Images. Front. Mol. Biosci. 2020, 7, 183.
- Saha, M.; Chakraborty, C.; Arun, I.; Ahmed, R.; Chatterjee, S. An Advanced Deep Learning Approach for Ki-67 Stained Hotspot Detection and Proliferation Rate Scoring for Prognostic Evaluation of Breast Cancer. Sci. Rep. 2017, 7, 3213.
- Salvi, M.; Caputo, A.; Balmativola, D.; Scotto, M.; Pennisi, O.; Michielli, N.; Mogetta, A.; Molinari, F.; Fraggetta, F. Impact of Stain Normalization on Pathologist Assessment of Prostate Cancer: A Comparative Study. Cancers 2023, 15, 1503.
- Jahne, B. Practical Handbook on Image Processing for Scientific and Technical Applications; CRC Press: Boca Raton, FL, USA, 2004.
- Mouelhi, A.; Sayadi, M.; Fnaiech, F. A novel morphological segmentation method for evaluating estrogen receptors’ status in breast tissue images. In Proceedings of the 2014 1st International Conference on Advanced Technologies for Signal and Image Processing, ATSIP 2014, Sousse, Tunisia, 17–19 March 2014; pp. 177–182.
- Kolås, Ø.; Farup, I.; Rizzi, A. Spatio-Temporal Retinex-inspired Envelope with Stochastic Sampling: A framework for spatial color algorithms. J. Imaging Sci. Technol. 2011, 55, 40503-1–40503-10.
- Rabinovich, A.; Agarwal, S.; Laris, C.A.; Price, J.H.; Belongie, S. Unsupervised color decomposition of histologically stained tissue samples. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2004.
- Wu, T.T.; Lange, K. Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2008, 2, 224–244.
- Aharon, M.; Elad, M.; Bruckstein, A. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 2006, 54, 4311–4322.
- Mairal, J.; Bach, F.; Ponce, J.; Sapiro, G. Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res. 2010, 11, 19–60.
- Jia, X.; Lei, T.; Du, X.; Liu, S.; Meng, H.; Nandi, A.K. Robust Self-Sparse Fuzzy Clustering for Image Segmentation. IEEE Access 2020, 8, 146182–146195.
- Kolaman, A.; Yadid-Pecht, O. Quaternion structural similarity: A new quality index for color images. IEEE Trans. Image Process. 2012, 21, 1526–1536.
- Reinhard, E.; Ashikhmin, M.; Gooch, B.; Shirley, P. Color transfer between images. IEEE Comput. Graph. Appl. 2001, 21, 34–41.
- Barthel, K.U. 3D-Data Representation with ImageJ. In Proceedings of the First ImageJ User and Developer Conference, Luxemburg, 18–19 May 2006.
- Schneider, C.A.; Rasband, W.S.; Eliceiri, K.W. NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 2012, 9, 671–675.
- Sangwine, S.J. Fourier transforms of colour images using quaternion or hypercomplex, numbers. Electron. Lett. 1996, 32, 1979–1980.
- Gianini, G.; Manenti, A.; Rizzi, A. Qbrix: A quantile-based approach to retinex. J. Opt. Soc. Am. A 2014, 31, 2663–2673.
- Liu, S.; Long, W.; He, L.; Li, Y.; Ding, W. Retinex-based fast algorithm for low-light image enhancement. Entropy 2021, 23, 746.
- Collins, L.C.; Botero, M.L.; Schnitt, S.J. Bimodal frequency distribution of estrogen receptor immunohistochemical staining results in breast cancer: An analysis of 825 cases. Am. J. Clin. Pathol. 2005, 123, 16–20.
- Ilić, I.R.; Stojanović, N.M.; Radulović, N.S.; Živković, V.V.; Randjelović, P.J.; Petrović, A.S.; Božić, M.; Ilić, R.S. The Quantitative ER Immunohistochemical Analysis in Breast Cancer: Detecting the 3 + 0, 4 + 0, and 5 + 0 Allred Score Cases. Medicina 2019, 55, 461.
- Ahmad Fauzi, M.F.; Wan Ahmad, W.S.H.M.; Jamaluddin, M.F.; Lee, J.T.H.; Khor, S.Y.; Looi, L.M.; Abas, F.S.; Aldahoul, N. Allred Scoring of ER-IHC Stained Whole-Slide Images for Hormone Receptor Status in Breast Carcinoma. Diagnostics 2022, 12, 3093.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).