Digital Validation in Breast Cancer Needle Biopsies: Comparison of Histological Grade and Biomarker Expression Assessment Using Conventional Light Microscopy, Whole Slide Imaging, and Digital Image Analysis

Given the widespread use of whole slide imaging (WSI) for primary pathological diagnosis, we evaluated its utility in assessing histological grade and biomarker expression (ER, PR, HER2, and Ki67) compared to conventional light microscopy (CLM). In addition, we explored the utility of digital image analysis (DIA) for assessing biomarker expression. Three breast pathologists assessed the Nottingham combined histological grade, its components, and biomarker expression through the immunohistochemistry of core needle biopsy samples obtained from 101 patients with breast cancer using CLM, WSI, and DIA. There was no significant difference in variance between the WSI and CLM agreement rates for the Nottingham grade and its components and biomarker expression. Nuclear pleomorphism emerged as the most variable histologic component in intra- and inter-observer agreement (kappa ≤ 0.577 and kappa ≤ 0.394, respectively). The assessment of biomarker expression using DIA achieved an enhanced kappa compared to the inter-observer agreement. Compared to each observer’s assessment, DIA exhibited an improved kappa coefficient for the expression of most biomarkers with CLM and WSI. Using WSI to assess prognostic and predictive factors, including histological grade and biomarker expression in breast cancer, is acceptable. Furthermore, incorporating DIA to assess biomarker expression shows promise for substantially enhancing scoring reproducibility.


Introduction
The assessment of breast cancer (BC) histological grade and biomarker expression has become routine practice in clinical pathology. The histological grading of BC is one of the strongest prognostic factors and has been included in the American Joint Committee on Cancer (AJCC) staging system as a stage modifier [1]. Beyond its role as a prognostic factor, histological grading is also essential for recognizing when the histological grade of BC is unusual or discordant with hormone receptor or human epidermal growth factor receptor 2 (HER2) status; in such cases, further work-up is warranted to ensure accurate histological typing, grade, and biomarker status [2]. BC biomarkers, including the estrogen receptor (ER), progesterone receptor (PR), HER2, and Ki67, are well-established prognostic factors that play crucial roles in determining biological subtypes and guiding therapeutic strategies for patients [3,4]. Hence, there is a substantial demand for accurate, precise, and standardized evaluation of these biomarkers. ER, PR, HER2, and Ki67 analyzed through immunohistochemistry (IHC) can act as surrogate markers for gene expression-based subtypes to reflect prognosis. Such assays are generally more accessible than gene expression molecular profiling assays, which are costly and time-consuming [5]. The interpretation of BC biomarkers through IHC is a critical component of pathological reporting, especially since the St. Gallen consensus guidelines endorsed it as a diagnostic standard [6].
Digital pathology, originally known as "telepathology", has seen significant progress since its advent in the 1980s. Digital imaging hardware and software innovations have led to whole slide imaging (WSI), in which glass slides of pathological specimens are digitally scanned at a high resolution for viewing on a computer screen [7,8]. Recently, WSI has been used globally for digital imaging preservation, education, teleconsultation, and, increasingly, primary pathological diagnosis because it has several advantages over conventional light microscopy (CLM), such as portability, ease of sharing and retrieval of archival images, and the ability to utilize computer-aided diagnostic tools [9][10][11]. When using WSI for practical diagnostic purposes, validating specific WSI systems before clinical use is necessary to ensure that diagnoses are at least as accurate as those made with CLM [12,13]. Studies validating WSI systems for primary diagnostic purposes have been conducted by pathology laboratories across various subspecialties, including breast pathology [14][15][16]. However, most of these studies used hematoxylin-eosin (H&E) slides for primary diagnosis rather than focusing on the assessment of prognostic pathologic factors or the expression of IHC-stained biomarkers. As WSI becomes the established norm in surgical pathology, a pertinent question is how integrating digital pathology affects the assessment of patient prognostic indicators in a real-world clinical environment. Concerns have repeatedly been raised regarding the use of WSI as a primary diagnostic tool in breast pathology, including the assessment of prognostic and predictive variables such as histological grade determination and interpretation of biomarker staining results [7,17,18]. Furthermore, given the inherent limitations of visual assessment for evaluating biomarker expression using IHC, automated digital image analysis (DIA) has been proposed as a potential method to improve accuracy and inter-observer reproducibility when assessing IHC expression, and its utility has been analyzed [17,19].
Core needle biopsy (CNB) is one of the most common methods for the pathological diagnosis of breast lesions [20]. When diagnosing invasive BC through CNB, it is imperative to assess not only the initial histological grade but also ER, PR, and HER2 through IHC testing [21]. This process informs critical treatment decisions regarding potential neoadjuvant therapy before surgical interventions. In instances of complete pathological response, the biopsy sample is the only remaining tumor tissue available [22]. Additionally, CNB samples are preferred over excision samples for biomarker testing because this approach helps to prevent many fixation problems [3,23]. Consequently, ensuring reliable evaluation of the histological grade and BC biomarker expression in CNB samples is critical, whether using CLM or WSI.
This study aimed to evaluate the effectiveness and reliability of WSI in BC CNB as a primary diagnostic method, focusing on determining the histological grade and characterizing biomarker expression, compared with CLM systems. The feasibility of using DIA to assess biomarker expression in clinical practice was also evaluated.

Case Selection and Immunohistochemistry
A total of 115 specimens of primary BC cases previously diagnosed through US-guided CNB at Chungnam National University Sejong Hospital from July 2020 to December 2022 were retrospectively analyzed. All routine H&E-stained slides were collected and reviewed. Specimens with scant tumor cells or poor fixation for IHC staining were excluded (n = 14). Four 4 µm sections from each formalin-fixed paraffin-embedded block were subjected to IHC using the Dako Omnis autostaining device (Agilent Technologies, Santa Clara, CA, USA). Four primary antibodies were used: ER (1:100, 6F11; Novocastra Laboratories, Newcastle, UK), PR (1:100, 16; Novocastra Laboratories), HER2 (C-erbB2 oncoprotein, 1:600, polyclonal; Dako, Glostrup, Denmark), and Ki-67 (1:50, MIB-1; Dako). This study was approved by the Institutional Review Board of Chungnam National University Sejong Hospital (IRB No. 2022-10-005) and contained a waiver for written informed consent based on the retrospective and anonymous character of this study.

Conventional Light Microscope and Pathologist Visual Grading and Scoring
Histological grading and IHC staining assessment were performed using glass slides via CLM with eyepieces with a field number of 22 mm (Nikon Ci-L, Tokyo, Japan). Cases were initially reviewed by three board-certified pathologists, each with varying levels of experience and training (Figure 1). The Nottingham combined histological grade (NG; Nottingham modification of the Scarff-Bloom-Richardson grading system) is recommended for the histological grading of conventional H&E slides by the College of American Pathologists and WHO guidelines. For this grade, the scores for three categories are totaled: tubule formation (TF) as an expression of glandular differentiation (score 1-3), nuclear pleomorphism (NP) (score 1-3), and mitotic counts (MCs) (score 1-3). Combined scores of 3-5, 6-7, and 8-9 points were classified as grades 1, 2, and 3, respectively [24]. The interpretation of ER and PR was based on the Allred score, and staining was defined as positive when ≥1% of the tumor cell nuclei showed immunostaining, according to the 2010 ASCO/CAP guidelines [23,25]. HER2 IHC was regarded as negative (0 or 1+), equivocal (2+), or positive (3+) based on the 2018 ASCO/CAP guidelines [26]. Nuclear staining of any intensity was defined as Ki67 positive. The assessment of Ki67 staining was conducted globally by determining an average score across all tumor cells in invasive tumor areas, scored as 0, 1 (≤5%), 2 (5-30%), and 3 (≥30%) based on the report of the International Working Group on Ki67 in Breast Cancer [3]. For comparison with the DIA results, consensus IHC expression scores for each biomarker were determined at a meeting of the three observers.
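The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration of the stated cut-offs only, not software used in the study; the function names (`nottingham_grade`, `her2_category`, `hormone_receptor_positive`) are hypothetical.

```python
def nottingham_grade(tf: int, np_score: int, mc: int) -> int:
    """Combine the three component scores (each 1-3) into a grade of 1-3."""
    for name, score in (("TF", tf), ("NP", np_score), ("MC", mc)):
        if score not in (1, 2, 3):
            raise ValueError(f"{name} score must be 1, 2, or 3, got {score!r}")
    total = tf + np_score + mc  # combined score ranges from 3 to 9
    if total <= 5:
        return 1  # combined score 3-5
    if total <= 7:
        return 2  # combined score 6-7
    return 3      # combined score 8-9


def her2_category(ihc_score: str) -> str:
    """Map a HER2 IHC score to the category described in the text."""
    categories = {
        "0": "negative",
        "1+": "negative",
        "2+": "equivocal",
        "3+": "positive",
    }
    if ihc_score not in categories:
        raise ValueError(f"unknown HER2 IHC score: {ihc_score!r}")
    return categories[ihc_score]


def hormone_receptor_positive(percent_stained_nuclei: float) -> bool:
    """ER/PR are called positive when at least 1% of tumor nuclei stain."""
    return percent_stained_nuclei >= 1.0
```

For example, component scores of TF 3, NP 2, and MC 1 give a combined score of 6 and therefore grade 2.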

Slide Digitization, Re-Grading, and Scoring with WSI
For WSI, H&E and corresponding IHC-stained slides were imaged at a high resolution (0.121 µm/pixel) and 40× magnification (40×/0.95 Plan-Apochromat, Carl Zeiss Microscopy, NY, USA) with a single z-plane using a whole slide scanner (PANNORAMIC 250 Flash III, 3DHISTECH, Budapest, Hungary). Digital images were generated and saved in the MRXS format, managed with server software (Panoramic Scanner, 3DHISTECH), and retrieved using a file management web interface (CaseViewer, 3DHISTECH). The mean file size of the scanned images was as follows: 1.95 GB for H&E, 1.24 GB for ER, 1.24 GB for PR, 1.50 GB for HER2, and 1.39 GB for Ki67. Scanned digital images were evaluated for quality to ensure that they were in focus and analyzed using 27-inch 3840 × 2160 resolution monitors (4K UHD, LG, Seoul, Korea).
To assess intra-observer agreement of BC histological grading using WSI, all three pathologists graded all included cases using WSI, blinded to the CLM grade and other clinicopathological parameters, according to the same criteria used for CLM after a washout period of at least 3 months with no special training during that time [13,27]. For counting mitoses, the pathologists were instructed to annotate areas totaling 2.38 mm², which corresponds to the area of the high-power fields evaluated for MCs using an eyepiece with a field diameter of 0.55 mm. ER, PR, HER2, and Ki67 IHC were also re-scored using WSI and the same CLM criteria.
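The relationship between the 0.55 mm field diameter and the 2.38 mm² annotation area can be checked with a short calculation. The sketch below rests on two assumptions that the text does not state explicitly: that the 0.55 mm diameter arises from the 22 mm field number viewed through a 40× objective, and that the total area represents the conventional 10 high-power fields. The function names are illustrative.

```python
import math

FIELD_NUMBER_MM = 22.0         # eyepiece field number given for the CLM setup
OBJECTIVE_MAGNIFICATION = 40   # assumption: 40x objective used for mitotic counts
N_HIGH_POWER_FIELDS = 10       # assumption: conventional 10 high-power fields


def field_diameter_mm(field_number_mm: float, objective: float) -> float:
    """Diameter of the visible field at the specimen plane."""
    return field_number_mm / objective


def field_area_mm2(diameter_mm: float) -> float:
    """Area of one circular microscope field."""
    return math.pi * (diameter_mm / 2) ** 2


diameter = field_diameter_mm(FIELD_NUMBER_MM, OBJECTIVE_MAGNIFICATION)
single_field = field_area_mm2(diameter)
total_area = N_HIGH_POWER_FIELDS * single_field

print(f"field diameter: {diameter:.2f} mm")
print(f"area of one field: {single_field:.4f} mm^2")
print(f"area of {N_HIGH_POWER_FIELDS} fields: {total_area:.2f} mm^2")
```

Under these assumptions, one field covers roughly 0.238 mm² and ten fields cover roughly 2.38 mm², which matches the annotation area given to the pathologists.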

Digital Image Analysis
Images of IHC-stained slides from WSI were analyzed using DIA software (QuantCenter Digital Image Analysis Software Version 2.2; 3DHISTECH). First, the images were reviewed by a breast pathologist at low magnification to identify and select the invasive tumor area to be scored. At least five areas to be scored were selected to represent the spectrum of staining observed in the initial WSI overview. The expression of each biomarker in the selected fields was analyzed using DIA software, applying the same scoring methods as those used by the pathologist for visual scoring, and the mean value of each case was obtained (Figure 2).
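The per-case averaging step can be illustrated as follows. This is a hedged sketch of the aggregation only; it does not reproduce the QuantCenter software, and the function name `case_level_score` and the example field values are hypothetical.

```python
from statistics import mean

MIN_FIELDS = 5  # the text specifies at least five selected areas per case


def case_level_score(field_scores):
    """Average the per-field DIA measurements into one value for the case."""
    if len(field_scores) < MIN_FIELDS:
        raise ValueError(
            f"expected at least {MIN_FIELDS} fields, got {len(field_scores)}"
        )
    return mean(field_scores)


# Hypothetical percentages of positive tumor nuclei in five selected fields
example_fields = [82.0, 75.5, 90.0, 68.5, 84.0]
print(case_level_score(example_fields))
```

Selecting fields that span the range of staining, as described above, keeps the mean from being dominated by a single hot spot or a weakly stained region.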

Definition of Perfect Concordance, Minor Discordance, and Major Discordance
Perfect concordance was defined as absolute agreement in histological grade or biomarker expression score between the compared assessments. In histological grading and its components, minor discordance was defined as a disparity between grades 1 and 2 or grades 2 and 3. Major discordance was defined as a grading disparity of more than one level. In IHC staining, perfect concordance for ER and PR was defined as the same Allred score being assigned. For HER2 and Ki67, perfect concordance was defined as scores of 0, 1, 2, or 3 being matched. Minor discordance was defined as different staining scores with no clinical implications. Major discordance was defined as a notable shift in staining results that could have clinical implications, including positive versus negative outcomes for ER, PR, and HER2, as well as instances of equivocal versus negative HER2 staining [4]. For Ki67, a grading discrepancy of more than one level was defined as major discordance.
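These definitions translate into simple comparison rules, sketched below for the ordinal scores and for ER/PR. The function names are hypothetical, and the sketch assumes that a one-level Ki67 difference counts as minor discordance, which the text implies but does not state. HER2 is omitted because the text does not say how a 2+ versus 3+ difference should be classified.

```python
def ordinal_discordance(score_a: int, score_b: int) -> str:
    """Classify a pair of ordinal scores (NG, TF, NP, MC, or Ki67).

    Same score: perfect concordance. One level apart: minor discordance.
    More than one level apart: major discordance.
    """
    difference = abs(score_a - score_b)
    if difference == 0:
        return "perfect concordance"
    if difference == 1:
        return "minor discordance"
    return "major discordance"


def hormone_receptor_discordance(
    allred_a: int, positive_a: bool, allred_b: int, positive_b: bool
) -> str:
    """Classify a pair of ER or PR assessments.

    A positive-versus-negative shift has clinical implications and is major.
    Otherwise, identical Allred scores are perfect concordance and differing
    Allred scores are minor discordance.
    """
    if positive_a != positive_b:
        return "major discordance"
    if allred_a == allred_b:
        return "perfect concordance"
    return "minor discordance"
```

For instance, a tumor graded 2 on CLM and 3 on WSI is a minor discordance, whereas an ER result that is positive on one format and negative on the other is a major discordance regardless of the Allred values.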

Statistical Analysis
Cohen's kappa was utilized to assess intra-observer agreement when comparing CLM and WSI, with higher kappa values indicating a greater level of agreement: 0.01-0.20 indicated slight, 0.21-0.40 indicated fair, 0.41-0.60 indicated moderate, 0.61-0.80 indicated substantial, and 0.81-0.99 indicated strong agreement [28]. Cohen's kappa was also used to compare intra-observer or intra-class correlations among CLM, WSI, and DIA. The differences between CLM and WSI for histological grade and biomarker expression scores were not normally distributed (p for Kolmogorov-Smirnov tests < 0.01), and the Wilcoxon signed-rank test was used to compare the paired difference between CLM and WSI for all values. Fleiss' kappa was utilized to estimate the concordance rates among the three pathologists (representing inter-observer variability) for each evaluation method. Statistical significance was set at p < 0.05. Statistical analyses were performed using SPSS software for Windows (version 26.0; SPSS, Chicago, IL, USA).
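For readers unfamiliar with the two kappa statistics, the following sketch implements their standard textbook definitions in plain Python and maps a value to the agreement bands listed above. It is not the SPSS procedure used in the study; the function names are illustrative, and the labels for values at or below zero and for a value of exactly 1 are assumptions, since the text defines bands only from 0.01 to 0.99.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two sets of ratings of the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: product of marginal proportions, summed over categories
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # every rating falls in one shared category
    return (observed - expected) / (1.0 - expected)


def fleiss_kappa(ratings_per_case):
    """Fleiss' kappa; each inner list holds all raters' scores for one case."""
    if not ratings_per_case:
        raise ValueError("need at least one case")
    n_raters = len(ratings_per_case[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings_per_case):
        raise ValueError("every case needs the same number (>= 2) of ratings")
    category_totals = Counter()
    agreement_sum = 0.0
    for ratings in ratings_per_case:
        counts = Counter(ratings)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this case
        pairs = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += pairs / (n_raters * (n_raters - 1))
    mean_agreement = agreement_sum / len(ratings_per_case)
    total = len(ratings_per_case) * n_raters
    expected = sum((t / total) ** 2 for t in category_totals.values())
    if expected == 1.0:
        return 1.0
    return (mean_agreement - expected) / (1.0 - expected)


def interpret_kappa(kappa: float) -> str:
    """Map kappa to the agreement bands used in the text."""
    if kappa <= 0.0:
        return "no agreement beyond chance"  # assumption: band not in the text
    if kappa >= 1.0:
        return "perfect"  # assumption: the text's top band ends at 0.99
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "strong"
```

With these bands, a kappa of 0.577 is read as moderate agreement and a kappa of 0.394 as fair agreement.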

Patients and Clinicopathologic Characteristics
This study included 101 cases of BC, with 46 detected in the right breast and 55 in the left breast. The diagnosed cases were histologically categorized as follows: 88 cases of invasive carcinoma of no special type, 8 cases of invasive lobular carcinoma, 2 cases of invasive mucinous carcinoma, 1 case of papillary carcinoma, 1 case of tubular carcinoma, and 1 case characterized by a mixed presentation of invasive ductal and lobular carcinoma. All patients included in this study were female, with a median age of 55 years (range, 36-88 years).
In the paired comparison of NG between CLM and WSI, one observer assigned higher grades with CLM than with WSI. For TF, one observer showed a higher score with CLM than WSI, whereas the other two observers showed lower scores with CLM than WSI. For NP, two observers assigned higher scores with CLM than WSI (Table 2). No significant paired differences were observed in MCs using CLM or WSI among the three observers. Intra-observer agreement for NG between CLM and WSI was substantial for all observers. For the individual grade components, TF and MCs showed moderate to substantial agreement. For NP, the degree of agreement ranged from fair to moderate for all observers (Figure 4 and Table S1). There was no significant difference in the variance between the WSI and CLM agreement rates for NG and its components (all kappa coefficients showed p values < 0.001).

Inter-Observer Agreement for Nottingham Grade and Its Components in CLM and WSI
Inter-observer agreement for NG was substantial in both CLM and WSI. For the individual categories, agreement was moderate for TF, substantial for MCs, and fair for NP (Table 3).

Evaluation of Biomarker Expression with DIA
BC biomarker expression was compared between CLM and DIA, as well as between WSI and DIA, for each observer. The results revealed moderate to substantial agreement across observers, with kappa values ranging from 0.676 to 0.753 for ER, 0.581 to 0.645 for PR, 0.614 to 0.769 for HER2, and 0.664 to 0.709 for Ki67 in the CLM/DIA comparison (Figure 6, Table S3). Similar kappa agreements were observed in the WSI/DIA comparison, ranging from 0.681 to 0.773 for ER, 0.616 to 0.663 for PR, 0.575 to 0.759 for HER2, and 0.656 to 0.726 for Ki67 (Figure 7, Table S3). To assess the utility of DIA, following a consensus meeting between the three observers, the intra-class correlations between CLM and DIA and between WSI and DIA were evaluated, and the results are presented in Figures 6 and 7 (see also Table S4). For ER, the degree of intra-class agreement was substantial between CLM and DIA (κ = 0.720) and between WSI and DIA (κ = 0.791). PR agreement was substantial in both intra-class analyses (κ = 0.664 for CLM/DIA and κ = 0.675 for WSI/DIA). For HER2, the agreement was substantial for both the compared methods (κ = 0.768 for CLM/DIA and κ = 0.796 for WSI/DIA). Ki67 interpretation achieved substantial to strong intra-class concordance (κ = 0.805 for CLM/DIA and κ = 0.721 for WSI/DIA).

Discussion
The grading of BC using the Nottingham combined histological grade is one of the strongest prognostic factors, independent of tumor size or the number of positive lymph nodes, and it is also incorporated into the AJCC Cancer Staging Manual [29,30]. Despite increasing interest in utilizing WSI for primary diagnostic purposes, the digital validation of BC prognostic factors has not yet been established in the literature. This study achieved a substantial level of intra-observer agreement for NG and its components among three pathologists between CLM and WSI. Furthermore, the inter-observer agreement regarding NG and its associated elements in WSI displayed agreement levels similar to that in CLM, comparable to the concordance rates reported by diverse pathologists who assessed BC grading using CLM (κ = 0.48-0.70) [31][32][33].
As for the individual components of NG, intra-observer agreement for NP scores was the most variable for all three observers. Moreover, NP showed the lowest agreement rate for inter-observer comparisons with CLM and WSI. Consistently, in previous studies, NP had the lowest intra-observer agreement of all components of NG between CLM and WSI [7,34] and was the component with the worst agreement in inter-observer variation using WSI [35,36]. As NP lacks a quantitative definition, in contrast to TF and MCs, it emerges as the least reproducible among the three grading components. Therefore, when interpreting NP, it is crucial to meticulously examine and compare it with the surrounding normal breast epithelium through not only CLM but also WSI. In contrast to previous studies that found no clear bias by format [7,34], two observers in the present study showed consistently higher NP scores with CLM than with WSI, indicating a format-related bias. Conversely, for TF, two observers showed lower scores with CLM than with WSI. Additionally, previous studies have reported that WSI shows reduced ability to identify MCs [34,37]. Rakha et al. demonstrated that, among the three NG components, the most challenging to evaluate by WSI was MCs because of the difficulty in discerning mitotic figures from apoptotic cells [7]. They also recommended using a higher magnification (×40) to ensure adequate resolution for accurate grading. In the present study, we conducted WSI and graded MCs at ×40 magnification. The improvement in the MC agreement rate through high-magnification scanning is worth noting. However, the substantial size of the files may limit the utility of this technique for routine diagnostic purposes, especially considering the high storage capacity and costs involved [18]. In the present study, even though all H&E slides were derived from CNB specimens, the scanned file size was substantial (range: 0.59-4.97 GB; mean: 1.95 GB). Efforts to reduce storage requirements are necessary to make this approach more practical for diagnostic purposes.
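The storage burden can be made concrete with a rough estimate based on the mean file sizes reported in the Methods. The sketch below assumes one scanned slide per stain per case, which is a simplification, and the function names are illustrative.

```python
# Mean scanned file size per stain, in GB, as reported in the Methods
MEAN_FILE_SIZE_GB = {
    "H&E": 1.95,
    "ER": 1.24,
    "PR": 1.24,
    "HER2": 1.50,
    "Ki67": 1.39,
}
N_CASES = 101  # cases included in the study


def storage_per_case_gb(sizes=MEAN_FILE_SIZE_GB):
    """Approximate storage for one case, assuming one slide per stain."""
    return sum(sizes.values())


def cohort_storage_gb(n_cases=N_CASES, sizes=MEAN_FILE_SIZE_GB):
    """Approximate storage for the whole cohort."""
    return n_cases * storage_per_case_gb(sizes)


print(f"per case: {storage_per_case_gb():.2f} GB")
print(f"cohort of {N_CASES} cases: {cohort_storage_gb():.0f} GB")
```

Under this assumption, each CNB case occupies roughly 7.3 GB, and the 101-case cohort roughly 740 GB, before any backup or redundancy is considered.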
For patients with BC, determining prognosis and treatment strategies based on ER, PR, HER2, and Ki67 status depends on accurate IHC evaluation [3,6]. The conventional approach for IHC assessment involves visually determining and scoring positivity by manually counting stained cells. Although WSI has gained broader acceptance in surgical pathology for primary diagnosis, the digital validation of BC biomarker expression has not been established [4]. Previous studies have attempted to employ WSI to validate primary diagnoses when reporting breast biomarkers; most were focused on HER2 stains, reporting a substantial kappa value (κ = 0.791) and substantial agreement percentages (range, 61.3-92.5%) [38,39]. In the present study, consistent results were observed, with a substantial level of perfect concordance (65.3-92.0%) and kappa coefficients (0.563-0.888) for the CLM/WSI pairs in evaluating ER, PR, HER2, and Ki67 expression. Based on these findings, a consensus was reached that WSI is non-inferior to CLM when interpreting breast biomarkers, although each pathologist achieved slightly different concordance rates. Furthermore, there are concerns regarding the differences in color tone and contrast of immunostained materials when scanned into the WSI device. The HER2 scores on WSI were shown to be higher than those on glass slides, possibly because of the increased color contrast in WSI [38]. The current study revealed no apparent biases regarding intra-observer variability concerning HER2 scores based on the format used. This pattern was consistently observed for other biomarkers. Including IHC-positive controls in the slides likely contributed to this consistency, as described in a previous study [4]. Additionally, PR concordance was slightly lower than that of ER. Previous studies have consistently indicated that PR expression shows lower agreement than ER expression in assessing inter-observer variability [40,41]. PR is a target gene regulated by estrogen and naturally displays greater heterogeneity in normal breast tissues and tumors [22,42]. Intermediate biomarker expression categories are less reproducible than categories at the extremes [43,44]. Therefore, the heterogeneous expression of PR may be linked to reduced levels of intra-observer agreement, and a more cautious approach is advised for observers when interpreting biomarkers within tumors exhibiting heterogeneous expression, not just through CLM but also through WSI.
In clinical practice, IHC is considered a standard diagnostic tool for tumor classification, therapeutic decision-making, and prognostic factors in BC and other malignancies [5,45]. Nevertheless, manual interpretation of BC biomarker expression has inherent limitations, such as subjectivity and variability between different observers [46]. In the present study, we assessed inter-observer concordance of BC biomarker expression through visual assessment, revealing lower concordance rates, especially for PR and Ki67 using CLM and HER2 using WSI. Importantly, our findings suggest that inter-observer variability is not specific to particular biomarkers or expression patterns. Automated DIA, conversely, is a promising alternative that could produce precise results with enhanced accuracy and reliability [17,19,47]. However, a consensus statement from the College of American Pathologists expert panel underscores the necessity of validating the use of DIA against other methods, acknowledging the insufficient published data available to establish best practices [48]. In the present study, the application of DIA to assess biomarker expression exhibited an enhanced kappa coefficient compared with the inter-observer agreement, particularly for HER2 and Ki67. Notably, when compared to each observer's individual assessment, DIA exhibited an improved kappa coefficient when considering the consensus of three observers for the expression of most biomarkers, both with CLM and WSI. This study's results align with previous observations, suggesting that automated HER2 IHC measurements are more comparable to consensus visual scores determined by multiple pathologists, as well as HER2 gene amplification data [49]. Given the impracticality of achieving consensus scoring by experts in routine practice, DIA may enhance the quality of biomarker expression assessment. These findings highlight the capability of DIA to improve agreement and concordance in biomarker expression assessment compared to manual assessment with CLM, as well as the consistency of results with WSI. The observed agreements emphasize that integrating DIA into the diagnostic workflow in clinical practice can significantly enhance scoring reproducibility among observers and improve objective assessment.
Given the recent efforts to validate WSI, it is crucial to underscore its numerous potential benefits [50]. WSI facilitates the easy exchange of pathological opinions between remotely located medical institutions, improves pathology education and learning experiences by enhancing educational environments, can enhance the accuracy and efficiency of pathological interpretation through automated DIA and computer-aided tools, and decreases problems associated with the retrieval of glass slides from physical storage sites. However, intra-observer discrepancies remain problematic, particularly in borderline, difficult, or challenging cases, which are often sources of disagreement [16]. Difficulties in identifying mitotic figures, nuclear details, and chromatin patterns are also commonly reported [51]. Integrating DIA is useful for quantifying pathological images and identifying objects and can enhance the consistency and accuracy of pathological interpretation [17]. However, it requires technical skills to implement and maintain complex DIA software, and algorithmic limitations can make it difficult to identify some pathological features accurately.
One of the major limitations of this study was the relatively small number of cases. This study was conducted at a single institution, which could affect the external validity of the results as variations may arise in different clinical settings, with diverse devices, or based on the pathologist's training level. Additionally, because all samples included in this study were CNB, the results may differ from those of the excision samples. However, as this study's aim was to assess WSI's effectiveness and reliability as a primary diagnostic tool, focusing on histological grade and the assessment of biomarker expression in BC CNB, we believe that the use of WSI could be viewed as a strength. Furthermore, this study's data are vital for developing guidelines and protocols for integrating WSI into routine pathology practice, ultimately enhancing diagnostic accuracy.

Conclusions
Overall, the results of inter- and intra-observer agreements regarding NG and its components, along with the assessment of biomarker expression in BCs, indicated no significant difference between the interpretations from CLM and WSI. However, a more cautious approach is advisable when interpreting histological grading and biomarker expression within tumors exhibiting heterogeneous histological or biomarker expression patterns. This study offers substantial evidence supporting the use of WSI for assessing prognostic and predictive variables in BC, including NG and biomarker expression, for routine diagnostic purposes. Furthermore, the incorporation of DIA for assessing biomarker expression has the potential to significantly improve scoring reproducibility.
Institutional Review Board Statement: This study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Chungnam National University Sejong Hospital (IRB No. 2022-10-005; date of approval: 26 October 2022).

Informed Consent Statement: The IRB approval (CNUSH 2022-10-005) contained a waiver for written informed consent based on the retrospective and anonymous character of this study.

Figure 1. Three board-certified pathologists participated in this study, and their years of experience were documented. WSI, whole slide imaging.

Figure 2. Examples of images analyzed using digital image analysis (DIA) software for the assessment of breast cancer biomarker expression (QuantCenter Digital Image Analysis Software Version 2.2, 3DHISTECH, Budapest, Hungary). (a) Tumor cells strongly stained for ER, detected via the software, and highlighted as red circles. (b) PR-stained slide from the same case as ER, exhibiting a more heterogeneous pattern compared to that of ER. Different staining intensities are indicated by color (0: blue; 1+: yellow; 2+: orange; 3+: red). (c) HER2-stained image classified as 0 (blue), 1+ (yellow), 2+

Figure 3. An example of minor discordance in the Nottingham combined histologic grade and its component scores between CLM and WSI, demonstrating both intra-observer and inter-observer discordance. The specimen comprised two biopsy cores exhibiting heterogeneous histologic patterns. (a) One core showed poor glandular differentiation but had lower mitotic counts. (b) The other core displayed better glandular differentiation but had higher mitotic counts. (a,b) WSI showed possible mitosis-like figures, marked with red circles. The images were captured at an original magnification of ×40.

Figure 4. Intra-observer agreement of Nottingham combined histologic grade and its component scores between CLM and WSI using kappa. All kappa coefficients demonstrated significance (p < 0.001).

Figure 5. Intra-observer agreement of breast cancer biomarker expression between CLM and WSI using kappa. All kappa coefficients demonstrated significance (p < 0.001).

Figure 6. Agreement for breast cancer biomarker expression between CLM and DIA among three observers and their consensus. All kappa coefficients demonstrated significance (p < 0.001).

Figure 7. Agreement for breast cancer biomarker expression between WSI and DIA among three observers and their consensus. All kappa coefficients demonstrated significance (p < 0.001).

Table 1. Intra-observer concordance of the Nottingham combined histologic grade and its component scores between CLM and WSI.

Table 2. Results of Wilcoxon signed-rank test comparing Nottingham combined histologic grade and its component scores between CLM and WSI for each observer. CLM, conventional light microscopy; WSI, whole slide imaging. p values in bold indicate significance (p < 0.05).

Table 3. Inter-observer agreement of Nottingham combined histologic grade and its component scores in CLM and WSI.

Table 4. Intra-observer concordance of breast cancer biomarker expression between CLM and WSI.

Table 5. Results of Wilcoxon signed-rank test comparing breast cancer biomarker expression between CLM and WSI for each observer. CLM, conventional light microscopy; WSI, whole slide imaging.

Table 6. Inter-observer agreement of breast cancer biomarker expression in CLM and WSI.
