Department-Wide Validation in Digital Pathology—Experience from an Academic Teaching Hospital Using the UK Royal College of Pathologists’ Guidance

Aim: we describe our experience of validating departmental pathologists for digital pathology reporting, based on the UK Royal College of Pathologists (RCPath) “Best Practice Recommendations for Implementing Digital Pathology (DP),” at a large academic teaching hospital that scans 100% of its surgical workload. We focus on Stage 2 of validation (prospective experience) prior to full validation sign-off. Methods and results: twenty histopathologists completed Stage 1 of the validation process and subsequently completed Stage 2 validation, prospectively reporting a total of 3777 cases covering eight specialities. All cases were initially viewed on digital whole slide images (WSI) with relevant parameters checked on glass slides, and discordances were reconciled before the case was signed out. Pathologists kept an electronic log of the cases, the preferred reporting modality used, and their experiences. At the end of each validation, a summary was compiled and reviewed with a mentor. This was submitted to the DP Steering Group who assessed the scope of cases and experience before sign-off for full validation. A total of 1.3% (49/3777) of the cases had a discordance between WSI and glass slides. A total of 61% (30/49) of the discordances were categorised as a minor error in a supplementary parameter without clinical impact. The most common reasons for diagnostic discordances across specialities included identification and grading of dysplasia, assessment of tumour invasion, identification of small prognostic or diagnostic objects, interpretation of immunohistochemistry/special stains, and mitotic count assessment. Pathologists showed similar mean diagnostic confidences (on a Likert scale from 0 to 7) with a mean of 6.8 on digital and 6.9 on glass slide reporting. Conclusion: we describe one of the first real-world experiences of a department-wide effort to implement, validate, and roll out digital pathology reporting by applying the RCPath Recommendations for Implementing DP. We have shown a very low rate of discordance between WSI and glass slides.


Introduction
Digital pathology (DP), reporting with whole slide images (WSI) created from high-resolution scans of glass slides, has the potential to improve the quality, safety, and efficiency of diagnostic pathology [1][2][3]. Few UK departments have fully transitioned to primary DP, but as part of the National Health Service (NHS) long term plan for digitally enabled care, more are embarking on this journey [4].

All submitted Stage 2 forms were verified by the speciality lead, followed by submission of the summary to the DP Steering Group. If range and suitability requirements were met, either the DP lead or a DP mentor would discuss the caveats, ensure understanding of pitfalls, and advise on potential mitigating measures. The verified summaries submitted by the 20 participating histopathologists were the source material for this paper.

Results
Pathologists took a mean of 362 days (11.9 months) and a median of 381 days (12.5 months, range 10-793 days) to complete Stage 2 validation (Table 1, Figure 1). A total of 3777 cases were viewed, with 100% of cases viewed digitally and then with a glass check, and a mean of 135 cases viewed per pathologist. The specialities varied in the number of cases reported during validation. The lowest and highest numbers of cases needed for Stage 2 validation were 50 and 242 cases, for histopathologists reporting UGC and Skin, respectively (Table 2).
During validation, an average technical failure rate of 2.6% was encountered, with Liver being the speciality that most often encountered technical issues (15/99 cases) (Figure 2). These difficulties were mostly due to out-of-focus areas and difficulties in interpreting the orcein stain on WSIs due to a perceived lower contrast compared to glass slides (Table S2). These difficulties tended to become less reported or were resolved over time. Other technical issues were related to scanning or to interaction with the image viewing interface. In only 0.1% of cases (3/3777) did a poor-quality digital image lead to deferral to glass (such a technical issue was categorised as A). Of note, the focus in this validation process was on diagnostic discordance, and thus this should not necessarily be considered an absolute rate of technical issues. Most technical issues did not stop the pathologist from reporting the case digitally but sometimes caused delay due to rescanning or interface malfunction. Anecdotally, most technical issues encountered resolved over time with improvements in workflow, bandwidth arrangements, development of more robust pipelines, and with departmental learning.

Cases with Discordances
A total of 1.3% of cases (49/3777) dual reported on glass and digital had a discordance (range: none in Gynaecology, H&N, and Respiratory to 5.3% of cases in BKP) (Figure S1). This does not include discordances caused by technical issues (category A), which are counted separately, as outlined above (n = 3). Most discordances (30/49 cases) were of no clinical impact. Where potential harm could have ensued (19/49 cases), this was due to a minor error in a supplementary parameter in 16 cases (B3 error, Table 3). In three cases, there was a discordance that implied a significant error in a supplementary parameter (B2 error); in all three instances, in recognition that these were likely challenging cases, the pathologists stated that once fully validated they would have checked glass slides before authorising the case. Two of the cases were bladder biopsies: one in which a significant amount of inflammation made it difficult to assess dysplasia on WSIs, and another a case of urothelial carcinoma in which grading was not readily clear on digital slides. The final B2 error was a missed micrometastasis in a pericolonic lymph node; this did not change the staging as other nodal metastases had already been identified.
Causes for discordances varied within specialities (Figure 3 and Table 4), but common themes included identification and grading of dysplasia (12 cases; Prostate, BKP, and Breast), assessment of small areas of tumour invasion (three cases; BKP, GI, and Liver), mitotic count assessment (four cases; Breast and Skin), and diagnosis of unusual or complex cases (three cases; BKP and Renal). Identification of small prognostic or diagnostic objects was another common cause for discordance (eight cases; Prostate, BKP, Breast, Renal, GI, Liver, and Skin); specific examples include identification of micro-organisms (Renal, GI, and Liver), weddellite calcification (Breast), metastasis or micrometastasis in lymph nodes (GI and Prostate), and small deposits of amyloid (Renal and Skin) and mucin (Breast and GI). Interpretation of immunohistochemistry and special stains led to discrepancies in four cases (Breast and GI), including assessment of toluidine blue stain (GI) and immunohistochemistry for Her2 (Breast). To the best of our knowledge, these examples have not been previously reported in the literature.
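By way of illustration (and not part of the RCPath process itself), the rates quoted above can be tallied directly from an electronic case log of the kind kept during validation. The following Python sketch assumes a hypothetical log structure and field names; only the category codes A, B2, and B3 are taken from the scheme described above.

from collections import Counter

# Minimal illustrative tally of a Stage 2 validation log (structure assumed).
# Each entry records a prospectively dual-reported case and any discordance
# category, e.g. "A" (technical failure leading to glass deferral),
# "B2"/"B3" (significant/minor error in a supplementary parameter),
# or None when digital and glass assessments agreed.
log = [
    {"case_id": "BKP-001", "speciality": "BKP", "category": "B3"},
    {"case_id": "GI-014", "speciality": "GI", "category": None},
    {"case_id": "Breast-007", "speciality": "Breast", "category": "B2"},
    # ... one entry per case in the validation period
]

total = len(log)
counts = Counter(e["category"] for e in log if e["category"] is not None)

# Category A issues are counted separately from diagnostic discordances,
# mirroring the reporting convention used above.
technical = counts.pop("A", 0)
discordant = sum(counts.values())

print(f"Technical deferrals to glass: {technical}/{total} ({technical / total:.1%})")
print(f"Diagnostic discordances: {discordant}/{total} ({discordant / total:.1%})")
for category, n in sorted(counts.items()):
    print(f"  {category}: {n}")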

Potential Pitfalls of Digital Reporting That Did Not Cause Discordance
Several features on the digital platform were found to be challenging by pathologists, although they did not result in a discordance (Table S3). The most common potential pitfalls included interpretation of special stains in suspected infection (20 comments; Renal, GI, and Liver), differentiation of reactive atypia from dysplasia in an inflammatory background (17 comments; BKP and Breast), identification of mitoses (15 comments; Breast, Skin, GI, Liver, Gynaecology, and Respiratory), and identification of necrosis (six comments; BKP, Renal, Liver, and Gynaecology). A tendency to over-diagnose low-grade dysplasia was noted (nine comments; BKP, Breast, and GI). Less widely reported pitfalls included identification of spermatogonia (10 comments; UGC), identification of spikes and lucencies on silver stain (three comments; Renal), giant cells (one comment; Respiratory), foreign body material (one comment; BKP), and Paneth cell/intestinal metaplasia (two comments; GI). Interface changes were felt to be harder to identify digitally in skin WSI (two comments).
Histopathologists offered varying perspectives on some parameters: whereas lymphovascular invasion in germ cell tumour of the testis was found to be more easily detected on the digital platform by some UGC pathologists (three comments), other UGC pathologists found it more challenging digitally (two comments). Similarly, detection of perineural invasion was considered more difficult in prostate and skin WSI (three comments), and easier in H&N cases (one comment), when compared to their glass counterparts.
[Figure legend excerpts: a Her2 stain assessed as just reaching 2+ on digital (with low confidence) but less intense on glass check and considered 1+, with the pathologist noting awareness of the difficulty of assessing these cases digitally and planning to seek more practice and experience; (C1) a bladder biopsy showing florid reactive changes, where it was easier on glass to be confident the changes were reactive rather than neoplastic, the pathologist reflecting that the atypia stands out more on digital; (C2) CK20 staining that was not full thickness, staining umbrella cells only and supporting reactive changes; and a case in which an incidental benign naevus seen on glass was not seen on digital, the main pathology being a basal cell carcinoma excision whose parameters were unaffected, so the clinical outcome did not change.]

Learning Curves over Time
Some of the challenges encountered during Stage 2 validation were considered part of a learning curve and improved over time (Table 5).

Diagnostic Areas That Potentially May Be Easier on the Digital Platform
Forty-four comments were recorded reporting a preference for the low-power and wide-field overview available on the digital platform, as it made it easier to pan over the slide (Prostate, UGC, Skin), to assess tumour distribution (Prostate), and to assess vascular invasion (Liver and UGC) (Table S4). Assessment of small biopsies was found to be easier and quicker (three comments; BKP and Skin). Certain small diagnostic features were also easier to visualise on digital (18 comments), such as inflammatory cells (GI), ova (BKP), viral inclusions (Renal), oxalate crystals (Renal), Candida sp. and pinworms (GI), and megamitochondria (Liver). Interpretation of some stains was easier on DP (15 comments), including interpretation of some grades of Her2 status (Breast) and silver stain for identification of glomerular basement membrane ruptures (Renal). A perceived starker contrast of immunohistochemistry stains on WSI was helpful in the interpretation of mismatch repair (MMR) protein and C4d staining (three comments), particularly when staining appeared weak or doubtful on glass.

Pathologists' Experiences of Digital Reporting
The digital interface played an important role in the pathologists' experience of digital reporting (Table S5). The features most quoted as helpful on digital slides were the ability to take measurements more quickly and precisely (90 comments) and the low-magnification view (38 comments). Pathologists commented that caution is needed, however, with the low-power overview, as it can lead to false assurance, and slides must still be reviewed carefully to avoid missing high-power features (one comment). The importance of carefully scanning the slide on WSI, as is done on glass slides, was emphasised in dermatopathology, for review of melanocytic lesions in particular.
Other significant advantages were pointed out, such as drafting the report on the screen whilst simultaneously looking at the digital slides (13 comments) and being able to simultaneously visualise H&E slides alongside immunohistochemistry for easy comparison (12 comments; Urology, GI, and Gynaecology). Another common theme was how DP facilitated double reporting (21 comments), including working with and teaching trainees (two comments) and how remote working was made possible (two comments).
The digital interface allowed cases to be shown to clinicians (seven comments), and a renal pathologist stated it to be "an extraordinary asset for multidisciplinary team and individual case discussion and allowed us to keep going in COVID-19 times".
A total of 38 comments were made regarding the time-consuming effort of screening large areas using a mouse (Table S6). These complaints tended to become less frequent over time. One pathologist reported wrist pain from scanning extra-large blocks; a 3D mouse was supplied, and it helped to relieve these symptoms.

Diagnostic Confidence and Diagnostic Modality Preference
Across specialities, pathologists showed similar mean diagnostic confidences (on a Likert scale from 0 to 7): a mean of 6.8 on digital and 6.9 on glass slide reporting (Table S7). A total of 41% of pathologists preferred to report using the digital platform, and 51% had no preference between reporting modalities (Figure 4 and Table S8). Glass reporting was preferred in a minority (8%), including cases with large tissue areas for scanning, mitotic counting, grading of dysplasia when borderline, and identification of subtle microorganisms. Glass reporting was also preferred when diagnosis required polarisation (Skin and Renal in particular), and for unusual or challenging cases.

Discussion
To the best of our knowledge, this is the first description of the real-time experience of a pan-department implementation of DP validation using the UK RCPath guidance [12]. Overall, the guidance was found to be appropriate and easily applied.

Engagement and Time to Validation
Not every consultant who was eligible for Stage 2 validation completed the process within the lead time, which is a limitation of this study. This reflects the reality of Histopathology departments, where some pathologists may struggle more than others to complete the validation process due to lack of time, aversion to novelty, logistical constraints, or personal circumstances. Departments should consider how to overcome these challenges, by, for example, including Stage 2 validation as part of the yearly personal development plan, and actively chasing the completion of the process.

There was great variation in the time taken to complete Stage 2 and to obtain a fully validated status for DP reporting (Table S9). The shortest time (10 days) was recorded for a pathologist who submitted a log of 19 cases when applying for validation to report prostate core biopsies. Although this appears to conflict with the 60-case tally suggested by the College of American Pathologists, the 19-case figure is representative of approximately one month's worth of workload for such specimens and is in keeping with the RCPath suggestion of 1-3 months of full-time equivalent practice [11,12]. In contrast, the longest time (793 days) was taken by a Gynaecology pathologist due to a year's gap. It is worth noting that the bulk of our observations coincided with the COVID-19 pandemic, which led to a discontinuous inflow of some specimens and variable engagement with the validation process.
Although the figures above may suggest leniency, these data reflect a real-world environment. The validation set does not necessarily represent consecutive cases or continuous periods of practice, and some pathologists underwent validation in multiple specialities simultaneously. This further underlines the importance of flexibility to achieve validation.

Number of Cases Needed for Stage 2 Validation
UGC was the speciality with the lowest average number of cases needed for sign-off (Table S9). This is in keeping with UGC being a small component of Urology with relatively rare tumour types. This contrasts with Skin, where the average of 242 cases reflects the higher case volume of routine specimens with a wide range of variability [18]. During validation, the Skin team felt that the range of specimens was more important than the numbers in achieving fully validated status. The need to include a spectrum of skin lesions, for example melanocytic and nonmelanocytic tumours and inflammatory dermatoses, for a thorough DP validation process has been highlighted by dermatopathologists undergoing validation elsewhere [18,19].

Causes of Discordance and Mitigating Strategies
The discordances that arose were relatively few and none were serious. Most consisted of previously described pitfalls [2,20].
Grading of dysplasia is one of the most common causes of clinically significant discordances in DP, across different specialities [2]. Discordance can be caused by missing a focus of dysplasia on the initial low-power review and blurring of nuclear detail when at high power [21]. Assessment of dysplasia on the digital platform has been shown to improve with experience as the features are correlated on digital with glass [22]. This is a key area for the pathologist undergoing validation in DP, and self-validation with glass checks is strongly advised when dysplasia is diagnosed on WSI [2].
Identification of mitoses was a common potential pitfall, raised by Breast, Skin, GI, Liver, Gynaecology, and Respiratory pathologists. Most used the 0.2 mm² grid function available digitally for counting mitoses, which correlates to 0.50/0.51 mm field diameter scores. A glass check is advised for mitotic counts, particularly when there is any uncertainty or if the mitotic count is at a borderline value that can lead to clinical impact [21]. This was particularly highlighted as necessary by departmental GI and Respiratory pathologists when facing cases of neuroendocrine and gastrointestinal stromal tumours, as described. Of note, there is a tendency for guidelines to move from assessment of features per power field(s) to per square millimetre to accommodate the transition from glass to digital. The assessment of somatic-type malignancy in teratoma, post-pubertal type, exemplifies this [23].
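For context, the correspondence between the 0.2 mm² digital grid and the quoted field diameters is a simple geometric check (our illustration, not part of the guidance): the area of a circular field of diameter d is

A = \pi \left(\tfrac{d}{2}\right)^{2}, \qquad d = 0.50\,\text{mm} \Rightarrow A \approx 0.196\,\text{mm}^{2}, \qquad d = 0.51\,\text{mm} \Rightarrow A \approx 0.204\,\text{mm}^{2},

both of which round to approximately 0.2 mm².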
An important cause of diagnostic discordance is the identification of small diagnostic and prognostic objects, which are often eosinophilic and refractile and can be difficult to appreciate given the technical nature of the scans. DP also lacks (to a degree) the depth of field available on glass and cannot be polarised [2,20,21]. Future mitigating strategies may include scanners capable of Z-plane stacking, focus blending techniques, and opting to scan or re-scan sections at higher magnification or higher definition settings where needed.
Some of the difficulties in interpreting immunohistochemistry and special stains that we report are in keeping with prior literature [21]. Our group has, however, highlighted potentially challenging areas that appear not to have been acknowledged in the previous literature. Our consensus is that decisions depending on these should be deferred to glass slides.
Finally, the three B2 errors recorded over the whole validation process led us to elaborate on mitigating measures. Of note, two cases referred to problematic identification and grading of urothelial carcinoma. This is a known area of potential challenge on DP, and histopathologists are strongly advised to consider deferring this decision to glass slides when challenging [15]. In the case of the missed micrometastasis, the histopathologist involved in the case provided deep reflective learning, which led to changes to personal and team-wide practice. It has become the standard procedure to review all lymph node glass slides from specimens of colorectal resection staged as pT2 without lymph node metastases (pN0) or lymphovascular invasion detected on WSI. This measure prevents "understaging" and allows consideration of adjuvant therapy, should metastases or lymphovascular invasion be demonstrated. Future studies from other centres undergoing DP validation sharing their experiences will further help to elucidate potential challenges in DP and strategies to overcome them.
Of note, it may be that the discordance rate (1.3%) described by our group is an underestimate of the true discordance between digital and glass assessments. This limitation is inherent in the RCPath recommended method of validation as the pathologist self-verifies on glass and digital at the time of diagnosis and without blinding and/or dissociation between the two modalities it is not possible to capture the underlying true discordance rate.
However, the RCPath validation process is intended to be pragmatic and applicable in real-world prospective validation settings. Previous studies have already published discordance rates using more rigid retrospective methodologies, including wash-out periods between reviews, and a meta-analysis and systematic review of 25 studies found a 1.7% discordance rate [2]. A process that requires double reporting and/or re-review after a wash-out period would likely make validation unmanageable for departments, with the time requirements for validation already highlighted as an issue (one comment).

Present and Future Perspectives
Validated consultants are now reporting on DP and refer to glass slides only when required, on a case-by-case basis. We have an ongoing audit of 10% of digitally reported cases and continue to share our experiences of DP within our speciality teams. The laboratory still sends out glass slides in parallel with the digital images, and this will persist until every pathologist is fully validated. The benefits of a fully digitised workflow are yet to be seen, although positive impacts on quality and flexibility are already noted. This study does not aim to address matters of economy, workflow, or efficiency of DP compared to traditional glass-slide-based diagnosis, which have been published elsewhere [3,24,25]. NHS Screening programme cases are currently not reported fully digitally, but national studies are underway to address the evidence gap in these programmes [26,27].
Pathologists' confidence in reporting digitally compared with glass varied, but there were no trends by speciality. There are likely to be many factors influencing this, such as individual pathology experience and previous digital experience. Confidence is likely to also be influenced by personal approach to risk, and many pathologists reverted to glass for unusual or challenging cases. This is likely related to confidence in DP when faced with diagnostic uncertainty [28].
In this paper we focus on DP validation and its potential to leverage quality, efficiency, and flexibility benefits; however, DP also provides the platform and infrastructure for the deployment of AI for diagnostic assistance of pathologists. Validation for DP does not necessarily confer validation for the use of AI, and the process for this needs careful thought. Laboratories undertaking DP should aim for accreditation for this activity; for example, in the UK this would be UKAS accreditation (ISO 15189) with an extension of scope for DP.

Conclusions
We have described our pan-departmental experience of undergoing validation in DP covering a wide range of specialities and have demonstrated the real-world experiences and challenges of primary digital reporting. Whilst we have come across many of the established findings in DP and mitigated them, we provide evidence of new insights across a greater breadth of specialities.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics13132144/s1, Figure S1: Cases with discordances (% of total cases); Table S1: Specialities of the histopathologists in this study; Table S2: Technical issues encountered; Table S3: Areas of potential pitfalls with the digital platform by speciality; Table S4: Diagnostic areas noted to be easier on the digital platform; Table S5: Key examples of situations where digital pathology assisted histopathologists; Table S6: Issues reported to have arisen due to use of the digital interface; Table S7: Pathologists' diagnostic confidence; Table S8: Results of Stage 2 pathologist views (diagnostic preferences of pathologists by speciality; pathologist confidence scores on digital reporting vs. glass reporting); Table S9: Summary of all the data.
Institutional Review Board Statement: Not applicable, as this was a service improvement exercise.
Informed Consent Statement: Informed consent was obtained from pathologists and the OUHFT Cellular Pathology Governance Group.

Data Availability Statement: All available data are presented in this paper. The validation summaries are confidential and cannot be made publicly available.