Digital Pathology and PD-L1 Testing in Non Small Cell Lung Cancer: A Workshop Record

A meeting among expert pathologists was held in 2019 in Rome to verify the results of the previous harmonization efforts on the PD-L1 immunohistochemical testing by scoring a representative series of non-small cell lung cancer (NSCLC) digital slides. The current paper shows the results of this digital experimental meeting and the expertise achieved by the community of Italian pathologists. PD-L1 protein expression was determined using tumor proportion score (TPS), i.e., the percentage of viable tumor cells showing partial or complete membrane staining at any intensity. The gold standard was defined as the final PD-L1 score formulated by a panel of seven lung committed pathologists. PD-L1 status was clustered in three categories, namely negative (TPS < 1), low (TPS 1–49%), and high (TPS ≥ 50%). In 23 cases (71.9%) PD-L1 staining was performed using the companion diagnostic 22C3 pharmDx kit on Dako Autostainer, while in nine (28.1%) cases it was performed using the SP263 Ventana kit on BenchMark platform. A complete PD-L1 scoring agreement between the panel of experts and the participants was reached in 57.1% of cases, whereas a minor disagreement in 16.1% of cases was recorded. Italian pathologists performed best in strong positive cases (i.e., tumor proportion score TPS > 50%), whereas only 10.8% of disagreement with the gold standard was observed, and 55.6% regarded a single challenging case. The worst performance was achieved in the negative cases, with 32.0% disagreement. A significant difference resulted from the analysis of the data separated by the different clones used: 22.3% and 38.1% disagreement (p = 0.01) was found in the group of cases analyzed by 22C3 and SP263 antibody clones, respectively. In conclusion, this workshop record proposed the application of a digital pathology platform to share controversial cases in educational meetings as an alternative possibility for improving the interpretation and reporting of specific histological tools. Due to the crucial role of PD-L1 TPS for the selection of patients for immunotherapy, the identification of unconventional approaches as virtual slides to focus experiences and give more detailed practical verifications of the standard quality reached may be a considerable option.


Introduction
During the last two decades, a significant revolution in pathology laboratories has been represented by the implementation of whole slide imaging and digital scanners [1]. Digital pathology may play a significant role as a didactic, diagnostic and research tool. In fact, digital slides are more efficient to adequately replicate the microscope experience than static images [2]. Another advantage in this setting is represented by the possibility to highlight and analyze interesting areas during didactic sessions [2]. Despite a more realistic experience can be obtained by adopting multi-headed microscopes, a limitation in this approach is related by the low number of viewing head extensions (5 to 14) [2] while digital slides may allow, by using dedicated monitor, an adequate experience to a higher number of attendants [3].
The evaluation of the diagnostic performances of a single tissue biomarker like programmed death-ligand 1 (PD-L1) seems to be a good opportunity to test in parallel the application of digital pathology as a routine tool in educational programs.
PD-L1 immunohistochemical (IHC) testing plays an essential role to select advanced stage non-small cell lung cancer (NSCLC) patients for treatment with immune system checkpoint inhibitors [4][5][6]. In this setting, standardization of both pre-analytical and analytical procedures is a crucial, albeit not trivial, task. Several antibodies have been adopted for PD-L1 staining and usually developed on different immunostainers. The most reproducible results were obtained with the 22C3 (Dako, Carpinteria, CA, USA), , and SP263 clones (Ventana, Tucson, AZ, USA) [7,8]. So far, a constellation of many different research papers has been published to harmonize and standardize PD-L1 IHC assays [9][10][11]. However, interpretation remains challenging, and specific expertise is required to correctly assess PD-L1 status [12][13][14][15][16]. Various clinical interests converge towards the continuous education of pathologists in improving the reproducibility of PD-L1 analysis.
The evaluation of the diagnostic performance of a single tissue biomarker such as PD-L1 in NSCLC is a paradigmatic opportunity to implement digital pathology in highly specialized educational programs. For this purpose, a working group of Italian pathologists met in 2019 in Rome to assess the reproducibility of PD-L1 scoring using e-learning solutions. In this study, we share the results of this initiative, focusing on the reliability and affordability of this "2.0 approach" in pathology clinical practice.

Cases Selection
A panel of seven pathologists with broad experience in lung cancer pathology (MB, CD, GF, FF, PG, AM, and GT) selected from the archives of their respective institutions representative NSCLC histologic blocks, as an e-learning material. All cases were anonymized, and the study was fully compliant with the local ethical guidelines and the declaration of Helsinki. The study group was composed of 32 NSCLC cases, including 27 (84.4%) surgical resections and 5 (15.6%) biopsies. The latter were obtained from either primary (n = 3; 60%) or metastatic (n = 2; 40%) sites. The molecular status of epidermal growth factor receptor (EGFR), ROS proto-oncogene 1 receptor tyrosine kinase (ROS1) and anaplastic lymphoma kinase (ALK) was retrieved from the clinical databases. The inclusion criteria were the following: adult patients (>18 years old); no previous neoadjuvant chemoradiotherapy administered; PD-L1 testing available.

PD-L1 Status Assessment and Tissue Slides Digitalization
For each case, representative hematoxylin and eosin (H&E) and PD-L1 stained slides were digitized using a ScanScope CS digital scanner (Aperio, Park Center Dr, Vista, CA, USA) at a 40x magnification. The original PD-L1 protein expression was determined using tumor proportion score (TPS), i.e., the percentage of viable tumor cells showing partial or complete membrane staining at any intensity. Then, PD-L1 status was clustered into three categories, namely, negative (TPS < 1%), low (TPS 1-49%), and high (TPS ≥ 50%). In 23 (71.9%) cases PD-L1 staining was performed using companion diagnostic 22C3 pharmDx kit on Dako Autostainer, while in 9 (28.1%) cases using the SP263 Ventana kit and OptiView DAB IHC detection kit on BenchMark platform, as recommended by the manufacturer. For all cases, the PD-L1 TPS was re-assessed by the panel of pathologists by using the same classification to obtain the final gold standard. To define tumor cells as positive for PD-L1 staining two different approaches were adopted. Briefly, for the 22C3 clone, a complete circumferential or partial linear membranous staining of tumor cells at any intensity was necessary to define as positive the tumor cells, whereas any membranous and/or cytoplasmic immunoreactivity in tumor cells was considered positive for the SP263 clone. Discordant cases were discussed by the panel of pathologists in order to obtain a consensus.

E-Learning-Based PD-L1 Analysis Harmonization among General Pathologists
All cases were shared with 37 general pathologists who participated at the meeting using e-learning platforms. The study participants were divided into seven groups (i.e., orange, red, violet, green, yellow, pink, blue); each group was supervised by one pathologist from the expert panel as a tutor. Within the working groups, each digital case was blindly scored by each pathologist individually (TESI SPA, Milan, Italy). The automatic informatics system allowed each pathologist to express his TPS for each case on a digital table within 10 min, subsequently closing the session and saving the relative data. Finally, the informatics system stopped the timing and saved the data. A final consensus on TPS was reached by collegial discussion and the second round of scores for the digital cases.

Statistical Analysis
All study data were digitally recorded and collected. Subsequently, inferential statistical analysis was performed using Microsoft Excel embedded tools (EGR, FB, MJ). The inter-and intra-observer reproducibility was assessed by using positive and negative percent agreement, sensitivity (or true positive rate), specificity (or true negative rate), positive and negative predictive values. Ninety-five percent of confidence intervals were computed for all measurements.

Results
In Table 1  In 24 out of 32 selected cases (75.0%), the PD-L1 score was concordant between the original report and the expert panel of pathologists (Supplementary Table S1). In the remaining 8 cases, an under (n = 4) or over-estimation (n = 4) of positive tumor cells was reported. Seven cases were negative for PD-L1 expression, 16 were considered to be intermediate expressors, and 9 were classified as strong positive by the expert panel (Supplementary Table S1). The 37 learner pathologists, divided into 7 groups, evaluated all the digital slides on a laptop (Supplementary Table S2). The complete Cancers 2020, 12, 1800 5 of 16 agreement between the expert panel (gold standard) and the participants was reached in a range of 37.5% (group red) to 78.1% of cases (group yellow); the average value was 57.1% (Table 2). A minor disagreement (1 participant per group) was recorded in 16.1% of cases (average value). In the red, orange, and green groups, two cases had no agreement with the referee scores. In Table 3, TPS disagreements were stratified according to the clinical threshold of PD-L1 scoring. The higher agreement rate among pathologists was obtained in PD-L1 high cases, with only 10.8% (36/333 instances) of disagreements with the gold standard, mainly regarding a single challenging case (55.6%, 20/36 instances; n = 37). Conversely, the worst performance was achieved in PD-L1 negative cases, where 32.0% (83/259 instances) of disagreements was demonstrated. Figure 1 highlights some paradigmatic challenging pitfalls due to the staining of macrophages in PD-L1 negative tumors (cases n. 10

Discussion
Digital pathology technologies for educational, diagnostic, and research purposes are rapidly replacing the use of static images in this era of next-generation pathology. Digitalization of histologic or cytologic slides ensures the possibility to cover the entire slide surface, enabling pathologists to analyze the whole section as with the "traditional" microscope [2]. The application of e-learning during an educational pathology course may reduce the subjectivity and variability of semiquantitative analysis (by using dedicated software) and makes it possible to share interesting or challenging cases for a second opinion [17]. However, despite these possibilities several issues remain unsolved. In particular, efforts have to be devoted to validating the use of digital slides, as reported by The College of American Pathologists (CAP) and the Digital Pathology Association (DPA) guidelines [17,18]. In addition, a not negligible percentage of pathologists feel a lower compliance with digital diagnosis respect to classical microscope evaluation [17]. The gradual introduction of similar tools during informal meetings and courses may help in decreasing the reluctance of pathologists in accepting the digital slides as an option and not an enemy. The development of user-friendly platforms that contribute to the transformation of the traditional vertical didactic exposition of paradigmatic cases into more dynamic and quiz-like experiences is currently available especially to approach single-item issues like the evaluation of a tissue biomarker.
In the era of personalized medicine, immunotherapy with anti-PD-1/PD-L1 monoclonal antibodies represents a relevant clinical option for patients with advanced stage NSCLC [19]. In particular, taking into account the results that have been reported in KEYNOTE 024 and 042 clinical trials, the adoption of pembrolizumab should be considered as the first line or as a valid second line treatment in advanced stage NSCLC patients harboring a PD-L1 expression level ≥50% or between 1-49%, respectively [20,21]. Despite the improvements in molecular techniques, IHC still plays a key role in the evaluation of PD-L1 expression, in order to administer immunotherapy in this setting of patients. However, careful attention should be paid due to extreme intra-and inter-observer variability in the staining interpretation, in particular due to disparities in the training of pathologists [22]. In this setting, digital scanning of selected challenging cases and collegial discussion with a panel of highly trained pathologists may have an educational aim, helping pathologists to better define the expression level of PD-L1 and to ensure the best treatment choice for advanced stage NSCLC patients.
After 3 years of routine testing in the laboratories, this paper focused on the challenges in PD-L1 staining score evaluation among different pathologists by adopting digital slides. Despite a good technical confidence generally being achieved, as demonstrated in previous reports [5], several interpretative issues are remaining unsolved. To overcome these issues, a panel of seven expert pathologists and a large group of pathologists joined a meeting that raised the main diagnostic issues in specific challenging situations. All cases were re-evaluated by the panel of experts, and, after a collegial discussion, in a high percentage of cases (75.0%) an agreement with the score referred in the original report was reached.
Several pre-analytical factors may influence PD-L1 staining score evaluation. Different IHC protocols (i.e., type and duration of antigen retrieval, primary antibody dilution or incubation time, number of steps of amplification) are adopted by diverse laboratories, even when the same primary antibody is used on the same platform with the same detection kit [23,24]. Since recent literature data reported an acceptable overlapping between 22C3 PharmDx clone on Dako Autostainer or SP263 clone on BenchMark platform, the panel enrolled NSCLC cases indifferently stained by using these clones [5]. The SP263 clone demonstrates higher sensitivity than all other assays, but its specificity was lower [19]. A comprehensive meta-analysis showed the sensitivity of PD-L1 IHC 22C3 pharmDx superior than PD-L1 IHC 28-8 pharmDx and then Ventana PD-L1 (SP263) [24]. Other studies have evaluated the interchangeability of PD-L1 assays [10]. However, no data are available in prospective clinical trials. From the patient's perspective, lower specificity may be acceptable if the progression free survival, overall survival, and adverse effects in patients with PD-L1-negative tumors treated by immunotherapy are comparable to that of conventional chemotherapy [25]. The employment of Ventana platform usually produces more intense immunoreactivity, although the morphological detail of the background tissue is not always well preserved, representing a possible issue in the distinction among tumor and the immune cells (usually alveolar macrophages). On the other hand, the Dako platform provides fairly soft immunoreactions with a better definition of the morphological features of the different cell types [26]. Although the selection of the cases enrolled in the study reflected the normal routine cases, the choice aimed to highlight the intra-and inter-tumoral heterogeneity of PD-L1 expression, which may affect the reproducibility of this analysis (Figure 2). For this reason, challenging borderline cases, characterized by a PD-L1 expression close to the proposed threshold, were included.
To better represent the routine diagnostic cases, the panel selected a mixture of small specimens (bronchoscopy and core needle biopsies), and surgical specimens of NSCLC with different histotypes and respective molecular findings (Table 1). Results were compared with the TPS assigned by the panel of experts (gold standards). The project highlighted interpretative inaccuracies potentially leading to false negative results like the suboptimal application of the magnification rule, which recommends the use of a high-power field for the scoring of mild/focal immunoreactivity [27]. On the contrary, a false positive background was a frequent source of pitfalls due to macrophages staining ( Figure 1). Unspecific peri-membranous reactivity in mucous-rich adenocarcinomas may be an additional source of errors for pathologists.
An important issue was related to the evaluation of viable tumor cells. Analytical interpretation of PD-L1 staining score resulted in a false negative result if the pathologists did not establish the specimen adequacy of the tumor sample defined by at least 100 viable neoplastic cells. Another relevant aspect is the staining under-estimation when incomplete tumor membrane staining was evaluated. In previous reports, Cohen's κ coefficient was 0.68 (95% CI, 0.65-0.71) for the 1% cut point sample set, indicating improvable inter-observer agreement in this subgroup of patients [28]. Conversely, in our series in strong positive patients a large agreement was recorded with only 10.8% of disagreements (Table 3). When considering a positive tumor cell staining around 50%, only a careful and extensive quantitative evaluation may avoid an under-estimation. Moreover, the intensity of PD-L1 staining in heterogeneous cases produced questionable interpretations ( Figure 2). Literature data of a similar project showed a slightly lower 0.58 Cohen's κ coefficient (95% CI, 0.55-0.62) for the 50% cut point sample set, indicating moderate inter-observer agreement [28]. The lower reproducibility at the 50% cut point suggested that training and external quality assessment programs should improve reproducibility of the samples with ≥50% of stained cells, particularly as this cut point is used to select NSCLC patients for anti-PD1 therapy [28]. The misclassification of patients may lead to two orders of problems: patients who do not receive the first line PD-1 blockade therapy, and patients who receive a treatment that is not beneficial [29]. All interpretation issues were a potential concern for pathologists, regardless the assay type [30]. In our study, a complete agreement between the expert panel evaluation (gold standard) and the participants was reached in 57.1% of cases (average value, Table 2). A minor additional disagreement was recorded in 16.1% of cases. The best performer featured 93.8% of correct interpretation (Supplementary Table S2).
The present paper highlights once again the potential of digital pathology. The possibility of sharing slides for educational purpose, to use WSI for agreement and quality control studies and the implementation of a digital pathology workflow for primary histological diagnosis have been widely reported [1][2][3]29]. We are approaching the third revolution in pathology with the implementation of the artificial intelligence (AI) tool even in the everyday practice with the possibility to combine the use of digital pathology and AI tool to support pathologists in the diagnostic setting [30,31]. The possibility of predicting IHC findings directly from H&E slides has also been reported [32], including the PD-L1 status by the developing of a deep learning model [33]. The obtained results were robust to interpathologist variability, opening new possibilities for PD-L1 assessment, especially when there is insufficient tissue for all the needed IHC and molecular tests.

Conclusions
In the educational setting, despite an initial reluctance, Italian pathologists' welcome digital slides as an opportunity to share opinions and challenging cases and evaluate specific biomarkers' performances like PD-L1. The educational digital pathology meeting planned in Rome in 2019 stressed the diagnostic challenges of the PD-L1 IHC testing, collecting NSCLC samples from the Italian national laboratories network. Despite the small number of cases (n = 32) and the limited inter-observer reproducibility due to the impossibility to compare the results before and after training, the projected addresses some interpretative components underlying possible sources of disagreement related to the PD-L1 testing. Due to the crucial role of PD-L1 TPS for the selection of patients for immunotherapy, the identification of educational unconventional approaches as virtual slides to focus experiences and give more detailed practical verifications of the standard quality reached may be a considerable option.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6694/12/7/1800/s1, Table S1: Original PD-L1 evaluation, expert board consensus and prevalent scoring for every group; discrepancies between the original evaluation and the panel are highlighted in orange; participants errors in yellow. Table S2: Participant performances. In the columns, the cases of the study were grouped from negative (green, TPS< 1%) to strong positive (red, TPS ≥50%). In the first row, the gold standard results recorded by the board of experts were listed. In the following rows, for every pathologist who attended the meeting a progressive number and a colour of identification corresponding to the table were assigned. (ie.e R1: red table, participant number 1). Table S3: Performances by histotype. Table S4: Performances by clone.