1. Introduction
The assessment of breast cancer (BC) histological grade and biomarker expression has become routine practice in clinical pathology. The histological grading of BC is one of the strongest prognostic factors and has been included in the American Joint Committee on Cancer (AJCC) staging system as a stage modifier [1]. Beyond its role as a prognostic factor, histological grading is also essential for recognizing when the grade of a BC is unusual or discordant with hormone receptor or human epidermal growth factor receptor 2 (HER2) status; in such cases, further work-up is warranted to ensure accurate histological typing, grade, and biomarker status [2]. BC biomarkers, including the estrogen receptor (ER), progesterone receptor (PR), HER2, and Ki67, are well-established prognostic factors that play crucial roles in determining biological subtypes and guiding therapeutic strategies for patients [3,4]. Hence, there is a substantial demand for accurate, precise, and standardized evaluation of these biomarkers. ER, PR, HER2, and Ki67 analyzed through immunohistochemistry (IHC) can act as surrogate markers for gene expression-based subtypes and thereby reflect prognosis. Such assays are generally more accessible than gene expression profiling assays, which are costly and time-consuming [5]. The interpretation of BC biomarkers through IHC is a critical component of pathological reporting, especially since the St. Gallen consensus guidelines endorsed it as a diagnostic standard [6].
Digital pathology, originally known as “telepathology”, has seen significant progress since its advent in the 1980s. Innovations in digital imaging hardware and software have led to whole slide imaging (WSI), in which glass slides of pathological specimens are digitally scanned at high resolution for viewing on a computer screen [7,8]. Recently, WSI has been used globally for digital image preservation, education, teleconsultation, and, increasingly, primary pathological diagnosis because it has several advantages over conventional light microscopy (CLM), such as portability, ease of sharing and retrieval of archival images, and the ability to utilize computer-aided diagnostic tools [9,10,11]. When WSI is used for diagnostic purposes, each WSI system must be validated before clinical use to ensure that its diagnostic accuracy is at least equivalent to that of CLM [12,13]. Studies validating WSI systems for primary diagnostic purposes have been conducted by pathology laboratories across various subspecialties, including breast pathology [14,15,16]. However, most of these studies used hematoxylin–eosin (H&E) slides for primary diagnosis rather than focusing on the assessment of prognostic pathologic factors or the expression of IHC-stained biomarkers. As WSI becomes the established norm in surgical pathology, a pertinent question is how the adoption of digital pathology affects the assessment of patient prognostic indicators in a real-world clinical environment. Accordingly, concerns have been raised regarding the use of WSI as a primary diagnostic tool in breast pathology, including the assessment of prognostic and predictive variables such as histological grade determination and interpretation of biomarker staining results [7,17,18]. Furthermore, given the inherent limitations of visual assessment of biomarker expression by IHC, automated digital image analysis (DIA) has been proposed as a potential method to improve accuracy and inter-observer reproducibility, and its utility has been analyzed [17,19].
Core needle biopsy (CNB) is one of the most common methods for the pathological diagnosis of breast lesions [20]. When diagnosing invasive BC through CNB, it is imperative to assess not only the initial histological grade but also ER, PR, and HER2 through IHC testing [21]. This process informs critical treatment decisions regarding potential neoadjuvant therapy before surgical intervention. In instances of complete pathological response, the biopsy sample is the only tumor tissue that remains available [22]. Additionally, CNB samples are preferred over excision specimens for biomarker testing because this approach helps to prevent many fixation problems [3,23]. Consequently, ensuring reliable evaluation of the histological grade and BC biomarker expression in CNB samples is critical, whether using CLM or WSI.
This study aimed to evaluate the effectiveness and reliability of WSI in BC CNB as a primary diagnostic method, focusing on determining the histological grade and characterizing biomarker expression, compared with CLM systems. The feasibility of using DIA to assess biomarker expression in clinical practice was also evaluated.
4. Discussion
The grading of BC using the Nottingham combined histological grade (NG) is one of the strongest prognostic factors, independent of tumor size or the number of positive lymph nodes, and it is also incorporated into the AJCC Cancer Staging Manual [29,30]. Despite increasing interest in utilizing WSI for primary diagnostic purposes, the digital validation of BC prognostic factors has not yet been established in the literature. In this study, intra-observer agreement between CLM and WSI for NG and its components was substantial among the three pathologists. Furthermore, the inter-observer agreement for NG and its components with WSI was similar to that with CLM and comparable to the concordance rates reported by other pathologists who assessed BC grading using CLM (κ = 0.48–0.70) [31,32,33].
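For readers less familiar with the κ statistic referred to above, the following is a minimal Python sketch of unweighted Cohen's kappa for one observer's paired CLM and WSI grades. The grade lists are invented for illustration and are not data from this study. The function names `cohen_kappa` and `landis_koch` are hypothetical, the Landis and Koch descriptors are assumed to be the convention behind terms such as "substantial", and the use of the unweighted rather than a weighted form is also an assumption, since this section does not state which was applied.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two paired lists of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of cases given the same category twice.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the marginal frequencies, summed over categories.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both lists contain one identical category only
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch descriptor (assumed convention)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical Nottingham grades (1-3) from one observer for ten cases.
clm_grades = [1, 2, 2, 3, 3, 2, 1, 3, 2, 2]
wsi_grades = [1, 2, 3, 3, 3, 2, 1, 2, 2, 2]

kappa = cohen_kappa(clm_grades, wsi_grades)
print(f"kappa = {kappa:.3f} ({landis_koch(kappa)})")
```

In this toy example, 8 of 10 cases agree and the chance agreement is 0.38, so κ is approximately 0.68. For ordinal scores such as grade, a weighted kappa that penalizes two-step disagreements more heavily may be preferable.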
As for the individual components of NG, intra-observer agreement for NP scores was the most variable for all three observers. Moreover, NP showed the lowest inter-observer agreement with both CLM and WSI. Consistent with this, in previous studies NP had the lowest intra-observer agreement between CLM and WSI of all NG components [7,34] and the poorest inter-observer agreement using WSI [35,36]. Because NP lacks a quantitative definition, in contrast to TF and MCs, it is the least reproducible of the three grading components. Therefore, when interpreting NP, it is crucial to examine it meticulously and compare it with the surrounding normal breast epithelium, not only with CLM but also with WSI. In contrast to previous studies, which found no clear bias by format [7,34], in the present study two observers gave consistently higher NP scores with CLM than with WSI, indicating bias. Conversely, for TF, two observers gave lower scores with CLM than with WSI. Additionally, previous studies have reported that WSI shows reduced ability to identify MCs [34,37]. Rakha et al. demonstrated that, among the three NG components, MCs were the most challenging to evaluate by WSI because of the difficulty in discerning mitotic figures from apoptotic cells [7]. They also recommended using a higher magnification (×40) to ensure adequate resolution for accurate grading. In the present study, we performed WSI scanning and graded MCs at ×40 magnification. The improvement in the MC agreement rate through high-magnification scanning is worth noting. However, the substantial size of the files may limit the utility of this technique for routine diagnostic purposes, especially considering the high storage capacity and costs involved [18]. In the present study, even though all H&E slides were derived from CNB specimens, the scanned file size was substantial (range: 0.59–4.97 GB; mean: 1.95 GB). Efforts to reduce storage requirements are necessary to make this approach more practical for diagnostic purposes.
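To put the reported file sizes in perspective, the short Python sketch below scales them to an archive. The per-slide sizes are the values reported above; the annual workload of 10,000 scanned slides and the function name `archive_size_tb` are assumptions made purely for illustration and do not describe this institution.

```python
# File sizes reported in this study for scanned CNB H&E slides (GB).
MIN_GB, MEAN_GB, MAX_GB = 0.59, 1.95, 4.97


def archive_size_tb(n_slides, gb_per_slide):
    """Estimated archive size in terabytes, taking 1 TB = 1000 GB."""
    if n_slides < 0 or gb_per_slide < 0:
        raise ValueError("inputs must be non-negative")
    return n_slides * gb_per_slide / 1000.0


# Hypothetical workload (an assumption, not study data).
SLIDES_PER_YEAR = 10_000

for label, size in (("min", MIN_GB), ("mean", MEAN_GB), ("max", MAX_GB)):
    print(f"{label}: {archive_size_tb(SLIDES_PER_YEAR, size):.1f} TB per year")
```

Under these assumptions, the mean file size alone corresponds to roughly 19.5 TB of new storage per year, before any backup or redundancy is considered.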
For patients with BC, determining prognosis and treatment strategies based on ER, PR, HER2, and Ki67 status depends on accurate IHC evaluation [3,6]. The conventional approach for IHC assessment involves visually determining and scoring positivity by manually counting stained cells. Although WSI has gained broader acceptance in surgical pathology for primary diagnosis, the digital validation of BC biomarker expression has not been established [4]. Previous studies have attempted to validate WSI for primary diagnosis when reporting breast biomarkers; most focused on HER2 stains, reporting a substantial kappa value (κ = 0.791) and substantial agreement percentages (range, 61.3–92.5%) [38,39]. In the present study, consistent results were observed, with perfect concordance rates of 65.3–92.0% and kappa coefficients of 0.563–0.888 for the CLM/WSI pairs in evaluating ER, PR, HER2, and Ki67 expression. These findings indicate that WSI is non-inferior to CLM for interpreting breast biomarkers, although each pathologist achieved slightly different concordance rates. Furthermore, there are concerns regarding differences in the color tone and contrast of immunostained material when it is scanned by a WSI device. In a previous study, HER2 scores on WSI were higher than those on glass slides, possibly because of the increased color contrast in WSI [38]. The current study revealed no apparent format-related bias in intra-observer variability for HER2 scores. This pattern was also observed for the other biomarkers. Including IHC-positive controls on the slides likely contributed to this consistency, as described in a previous study [4]. Additionally, PR concordance was slightly lower than that of ER. Previous studies have consistently indicated that PR expression shows lower inter-observer agreement than ER expression [40,41]. PR is a target gene regulated by estrogen and naturally displays greater heterogeneity in normal breast tissues and tumors [22,42]. Intermediate biomarker expression categories are less reproducible than categories at the extremes [43,44]. Therefore, the heterogeneous expression of PR may be linked to reduced intra-observer agreement, and observers should take a more cautious approach when interpreting biomarkers in tumors exhibiting heterogeneous expression, not only with CLM but also with WSI.
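The two quantities discussed above, the perfect concordance rate and the direction of any format-related bias, can be illustrated with a minimal Python sketch. The HER2 IHC scores (0 to 3) below are invented and are not data from this study, and the function name `concordance_summary` is hypothetical.

```python
def concordance_summary(clm_scores, wsi_scores):
    """Summarize paired CLM/WSI scores from one observer.

    Returns the percentage of cases with identical scores and the number of
    discordant cases scored higher on each format.
    """
    if len(clm_scores) != len(wsi_scores) or not clm_scores:
        raise ValueError("scores must be non-empty and of equal length")
    pairs = list(zip(clm_scores, wsi_scores))
    identical = sum(c == w for c, w in pairs)
    return {
        "perfect_concordance_pct": 100.0 * identical / len(pairs),
        "wsi_higher": sum(w > c for c, w in pairs),
        "clm_higher": sum(c > w for c, w in pairs),
    }


# Hypothetical HER2 IHC scores for eight cases read on both formats.
clm = [0, 1, 2, 3, 2, 1, 0, 3]
wsi = [0, 1, 3, 3, 2, 2, 0, 3]

summary = concordance_summary(clm, wsi)
print(summary)
```

In this toy example, 6 of 8 cases agree (75%) and both discordant cases are scored higher on WSI, which is the kind of one-sided pattern that would suggest a format-related bias. A roughly even split between the two directions would not.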
In clinical practice, IHC is considered a standard diagnostic tool for tumor classification, therapeutic decision-making, and prognostication in BC and other malignancies [5,45]. Nevertheless, manual interpretation of BC biomarker expression has inherent limitations, such as subjectivity and variability between observers [46]. In the present study, we assessed inter-observer concordance of BC biomarker expression through visual assessment, revealing lower concordance rates, especially for PR and Ki67 using CLM and HER2 using WSI. Importantly, our findings suggest that inter-observer variability is not specific to particular biomarkers or expression patterns. Automated DIA, in contrast, is a promising alternative that could produce precise results with enhanced accuracy and reliability [17,19,47]. However, a consensus statement from the College of American Pathologists expert panel underscores the necessity of validating DIA against other methods, acknowledging that the published data are insufficient to establish best practices [48]. In the present study, DIA-based assessment of biomarker expression yielded higher kappa coefficients than the inter-observer agreement, particularly for HER2 and Ki67. Notably, for most biomarkers, DIA showed a higher kappa coefficient against the consensus of the three observers than against each observer’s individual assessment, with both CLM and WSI. These results align with previous observations suggesting that automated HER2 IHC measurements are more comparable to consensus visual scores determined by multiple pathologists, as well as to HER2 gene amplification data [49]. Given the impracticality of achieving consensus scoring by experts in routine practice, DIA may enhance the quality of biomarker expression assessment. These findings highlight the capability of DIA to improve agreement and concordance in biomarker expression assessment compared to manual assessment with CLM, as well as the consistency of results with WSI. The observed agreements emphasize that integrating DIA into the diagnostic workflow in clinical practice can significantly enhance scoring reproducibility among observers and improve objective assessment.
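As an illustration of the comparison between DIA and a three-observer consensus, the following Python sketch derives a consensus score per case and measures how often a DIA score matches it. The majority-vote rule is an assumption, because this section does not specify how consensus was defined in the study. The scores and the function names `majority_consensus` and `agreement_with_consensus` are likewise hypothetical.

```python
from collections import Counter


def majority_consensus(scores):
    """Return the score given by at least two of three observers, else None."""
    category, votes = Counter(scores).most_common(1)[0]
    return category if votes >= 2 else None


def agreement_with_consensus(dia_scores, observer_scores):
    """Fraction of cases where DIA equals the observer consensus.

    observer_scores holds one (obs1, obs2, obs3) tuple per case. Cases without
    a majority are excluded. Returns (agreement, number_of_cases_used).
    """
    pairs = []
    for dia, trio in zip(dia_scores, observer_scores):
        consensus = majority_consensus(trio)
        if consensus is not None:
            pairs.append((dia, consensus))
    if not pairs:
        raise ValueError("no case has a majority consensus")
    matches = sum(d == c for d, c in pairs)
    return matches / len(pairs), len(pairs)


# Hypothetical HER2 scores: three observers per case, plus one DIA score.
observers = [(2, 2, 3), (0, 1, 0), (3, 3, 3), (1, 2, 3), (2, 1, 2)]
dia = [2, 0, 3, 2, 1]

agreement, n_used = agreement_with_consensus(dia, observers)
print(f"DIA matches consensus in {agreement:.0%} of {n_used} cases")
```

Here the fourth case has no majority and is excluded, leaving 3 matches in 4 cases. In practice such cases would more likely be resolved by joint review, and a chance-corrected statistic such as κ would be computed on the same DIA/consensus pairs rather than raw agreement.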
Given the recent efforts to validate WSI, it is crucial to underscore its numerous potential benefits [50]. WSI facilitates the exchange of pathological opinions between remotely located medical institutions, improves pathology education and learning experiences, can enhance the accuracy and efficiency of pathological interpretation through automated DIA and computer-aided tools, and reduces problems associated with the retrieval of glass slides from physical storage sites. However, intra-observer discrepancies remain problematic, particularly in borderline or otherwise challenging cases, which are often sources of disagreement [16]. Difficulties in identifying mitotic figures, nuclear details, and chromatin patterns are also commonly reported [51]. Integrating DIA is useful for quantifying pathological images and identifying objects and can enhance the consistency and accuracy of pathological interpretation [17]. However, it requires technical skills for the implementation and maintenance of complex DIA software, and algorithmic limitations can make it difficult to identify some pathological features accurately.
One of the major limitations of this study was the relatively small number of cases. In addition, this study was conducted at a single institution, which could affect the external validity of the results, as variations may arise in different clinical settings, with different devices, or according to the pathologist’s training level. Furthermore, because all samples included in this study were CNB specimens, the results may differ from those obtained with excision samples. However, as this study’s aim was to assess WSI’s effectiveness and reliability as a primary diagnostic tool, focusing on histological grade and the assessment of biomarker expression in BC CNB, we believe that the use of CNB samples could be viewed as a strength. Finally, this study’s data can contribute to the development of guidelines and protocols for integrating WSI into routine pathology practice, ultimately enhancing diagnostic accuracy.