Multiparametric Classification of Non-Muscle Invasive Papillary Urothelial Neoplasms: Combining Morphological, Phenotypical, and Molecular Features for Improved Risk Stratification

Diagnosis and grading of non-invasive papillary urothelial tumors according to the current WHO classification poses some challenges for pathologists. The diagnostic reproducibility of separating low-grade and high-grade lesions is low, which impacts their clinical management. Whereas papillary urothelial neoplasms with low malignant potential (PUN-LMP) and low-grade papillary non-invasive carcinoma (LG-PUC) are comparable and show frequent local recurrence but rarely metastasize, high-grade papillary non-invasive carcinoma (HG-PUC) has a poor prognosis. The main objective of this work is to develop a multiparametric classification to unambiguously distinguish low-grade and high-grade tumors, considering immunohistochemical stains for p53, FGFR3, CK20, MIB-1, p16, p21 and p-HH3, and pathogenic mutations in TP53, FGFR3, TP53, ERCC2, PIK3CA, PTEN and STAG2. We reviewed and analyzed the clinical and histological data of 45 patients with a consensus diagnosis of PUN-LMP (n = 8), non-invasive LG-PUC (n = 23), and HG-PUC (n = 14). The proliferation index and mitotic count assessed with MIB-1 and P-HH3 staining, respectively correlated with grading and clinical behavior. Targeted sequencing confirmed frequent FGFR3 mutations in non-invasive papillary tumors and identified mutations in TP53 as high-risk. Cluster analysis of the different immunohistochemical and molecular parameters allowed a clear separation in two different clusters: cluster 1 corresponding to PUN-LMP and LG-PUC (low MIB-1 and mitotic count/FGFR3 and STAG2 mutations) and cluster 2, HG-PUC (high MIB-1 and mitosis count/CK20 +++ expression, FGFR3 WT and TP53 mutation). Further analysis is required to validate and analyze the reproducibility of these clusters and their biological and clinical implication.


Introduction
Bladder cancer represents one of the most common cancers in the world with approximately 550,000 cases annually [1]. About 75% of bladder cancers by the time of diagnosis are low-grade, non-muscle-invasive papillary tumors, which are referred to as non-muscle-invasive bladder cancer (NMIBC) [2].
NMIBC includes papillary tumors invading the lamina propria and the connective tissue and are classified as T1, Ta, and Tis, following the TNM classification. According to

Description of the Cohort
A total of 45 patients with non-invasive urothelial papillary neoplasias classified originally as pTa with good tissue quality, and suitable for further analysis were included in this study. The clinical information is summarized in Table 1. The male-to-female ratio was 2.75:1 and the median age was 75 years (range, 50-95 years). According to the TNM classification, 39/45 (86.6%) patients were classified as pure pTa whereas 6/45 (13.3%) patients had a pT1 lesion at another site. Nineteen cases (42.2%) showed multiple tumor localizations in two or more regions of the bladder, whereas 26 cases showed a solitary lesion. Less than half of the patients recurred (48.4%) and only 52.6% developed no relapse by the two-year follow-up. According to the available clinical data of 44 patients, only 25/44 (56.8%) received immediate therapy, and 13/44 (29.5%) patients received immediate MMC instillations. All patients (12/44, 27.3%) with histological HC-PUC diagnosis were treated with re-TUR-BT and nineteen (19/44, 43.2%) patients were only under medical observance but no treatment was given. Further treatment was given in case of recurrence (Table 1).

Histological Classification
Data of diagnostic concordance among three pathologists blinded to the original diagnosis and the original diagnostic report (ODR) are reported in Table 2. The interobserver variability in the diagnosis of PUN-LMP was modest (57%), good (67%) in cases of LG-PUC and excellent (87%) in cases of HG-PUC. The most significant interobserver variability was found when comparing PUN-LMP and LG-PUC diagnosis, with a rather low Fleiss Kappa index of 0.43 (p = <0.001) with a 95% confidence interval between 0.398-0.635. A consensus diagnosis was established when an agreement was met by at least two independent observers. According to the consensus, only 18% of the cases were classified as PUN-LMP, but half (51.1%) of the patients fell into the category of LG-PUC and 33.3% were classified as HG-PUC. In the follow-up, a higher recurrence rate was observed in cases of HG-PUC, as expected. Cases of PUN-LMP demonstrated a higher median free relapse (34 months) in comparison to 23 months and 5 months of LG and HG-PUC, respectively. Two-year recurrence analysis revealed a higher recurrence rate (79.5%) for HG-PUC, in comparison to LG-PUC (42.7%) and PUN-LMP (33.3%) (p = 0.14, Figure 1). The baseline clinical characteristics and follow-up were described according to the consensus diagnosis in Supplemental Table S2. pendent observers. According to the consensus, only 18% of the cases were classified as PUN-LMP, but half (51.1%) of the patients fell into the category of LG-PUC and 33.3% were classified as HG-PUC. In the follow-up, a higher recurrence rate was observed in cases of HG-PUC, as expected. Cases of PUN-LMP demonstrated a higher median free relapse (34 months) in comparison to 23 months and 5 months of LG and HG-PUC, respectively. Two-year recurrence analysis revealed a higher recurrence rate (79.5%) for HG-PUC, in comparison to LG-PUC (42.7%) and PUN-LMP (33.3%) (p = 0.14, Figure 1). The baseline clinical characteristics and follow-up were described according to the consensus diagnosis in Supplemental Table 2.  LG-PUC, low-grade urothelial papillary carcinoma; HG-PUC, high-grade urothelial papillary carcinoma.

Immunohistochemical Characteristics and Correlation with Tumor Grade
Spearman's rank correlation matrix of the histoscores rendered no positive correlation between the diagnostic categories except for the proliferation index, which was positively correlated with the mitotic count assessed by the p-HH3 staining ( Figure 2). CK20, LG-PUC, low-grade urothelial papillary carcinoma; HG-PUC, high-grade urothelial papillary carcinoma.

Immunohistochemical Characteristics and Correlation with Tumor Grade
Spearman's rank correlation matrix of the histoscores rendered no positive correlation between the diagnostic categories except for the proliferation index, which was positively correlated with the mitotic count assessed by the p-HH3 staining ( Figure 2). CK20, p53 and the proliferation rate (MIB-1) are suitable tools to identify cases with higher histological grade and adverse outcomes. These three markers showed higher histoscores in cases of LG-PUC and HG-PUC versus PUN-LMP cases (p < 0.05) ( Table 3 and Figure 2). The tumor suppressor p53 displayed heterogeneous staining among the three subgroups but displayed higher expression in cases classified as LG-PUC and HG-PUC in comparison to PUN-LMP (p = 0.06). Additionally, cases classified as PUN-LMP showed weaker staining for CK20, p53 and a lower proliferation and mitosis rate evaluated by p-HH3 in comparison to the LG-or HG-PUC cases (p < 0.001). LG-PUC cases demonstrated similar histoscores for CK20 and p53 compared to the HG-PUC group. In contrast, the proliferation rate (MIB-1) and mitosis count showed higher scores in HG-PUC cases. Some cases of PUN-LMP and LG-PUC showed higher mitotic counts, but in these cases, mitosis was restricted to the basal layer, in contrast to HG-PUC. The cell cycle proteins p21 and p16 were heterogeneously stained among the cases and did not show a correlation with the histological grade. In total, 28/37 (75.7%) cases show a lack of FGFR3 staining from all cases but six cases (6/20, 30%) classified as LG-PUC displayed overexpression of FGFR3. In general, only 16/25 (64%) cases show a lack of FGFR3 expression correlated with its mutational status since some mutations may result in a non-functional protein.
but displayed higher expression in cases classified as LG-PUC and HG-PUC in comparison to PUN-LMP (p = 0.06). Additionally, cases classified as PUN-LMP showed weaker staining for CK20, p53 and a lower proliferation and mitosis rate evaluated by p-HH3 in comparison to the LG-or HG-PUC cases (p < 0.001). LG-PUC cases demonstrated similar histoscores for CK20 and p53 compared to the HG-PUC group. In contrast, the proliferation rate (MIB-1) and mitosis count showed higher scores in HG-PUC cases. Some cases of PUN-LMP and LG-PUC showed higher mitotic counts, but in these cases, mitosis was restricted to the basal layer, in contrast to HG-PUC. The cell cycle proteins p21 and p16 were heterogeneously stained among the cases and did not show a correlation with the histological grade. In total, 28/37 (75.7%) cases show a lack of FGFR3 staining from all cases but six cases (6/20, 30%) classified as LG-PUC displayed overexpression of FGFR3. In general, only 16/25 (64%) cases show a lack of FGFR3 expression correlated with its mutational status since some mutations may result in a non-functional protein. . Violin plots comparing the number of mitotic counts in 10 high power fields (HPF) stained with p-HH3 among non-invasive PUC categories. (C). Violin plots of the different histoscores of the immunohistochemistry markers, denoting the non-invasive PUC entities. For statistical analysis unpaired t-test was used, ns= not significant, * p < 0.05, ** p < 0.01, **** p < 0.0001. Table 3. Histoscores of cases according to the histological grading.

Mutational Landscape
In total, 37 cases were analyzed using targeted NGS. The mean average read depth of sequencing was 6820 reads (1141-23,628). The distribution of mutations is depicted in Figure 3 and detailed information is summarized in Supplemental Table S1. In total, 80 mutations were identified, 66/80 missense point mutations (82.5%), nine nonsense mutations (11.25%), and five frameshift mutations (6.25%). Analysis by SIFT, PolyPhen-2 and CADD predicted a damaging effect for all the reported missense mutations. Variant allele frequencies (VAF) of the mutations in all genes ranged from 3% to 93% (mean 34%).

Cluster Analysis Integrating Morphology, Immunophenotype and Mutational Status
Cluster analysis clearly divided non-invasive papillary urothelial tumors into two different subgroups. Cluster 1 represents a mixture of PUN-LMP and LG-PUC cases, displaying a low proliferation ratio and mitotic count, together with frequent FGFR3 and STAG2 mutations, whereas cluster 2 contains HG-PUC with high proliferation index and mitotic count, strong CK20 expression, TP53 mutation but FGFR3 WT (Table 4). Cluster analysis correlated strongly with the clinical behavior and showed lower relapse-free survival in group 2, (cases classified as HG-PUC) in comparison to group 1, (cases classified as PUN-LMP and LG-PUC), (p = 0.029, Figure 3B,C). Additional cluster analysis was performed using three different subgroups to determine whether PUN-LMP and LG-PUC cases could be segregated. (Figure 3B, left side). However, a mixture of cases of PUN-LMP and LG-PUC was found in groups 1 and 2 and the cluster did not correlate with histological grading. Almost all cases classified as HG-PUC were gathered into group 3 ( Figure 3B,C, left side). The Kaplan-Meier analysis showed a strong correlation with the clusters, as recurrencefree survival rates were consistently higher in groups 1 and 2, with overlap between LG-PUC and PUN-LMP (p = 0.055). Group 3 consistently showed a higher recurrence rate ( Figure 3C, left panel). A proportional Cox regression model was performed to estimate the association between RFS and the clusters analysis. Univariate and multivariate analyses confirmed that clinical characteristics and treatment did not influence RFS. Therapy by RE-TURB denoted a significant correlation with recurrence, but it was not considered since RE-TURB was exclusively performed in cases of HG-PUC (Supplementary Table S4).
Multivariable analysis confirmed that clinical characteristics and treatment did not influence relapse-free survival and that relapses were independently associated with the clusters groups resulting from histological characteristics, protein expression and mutational status (Supplementary Tables S5 and S6).

Discussion
In this study, we employed a multiparametric classification of non-invasive papillary urothelial tumors, encompassing mutational status, immunophenotype and histopathology to try to improve risk assessment over histological grading alone. In previous as well as our current analysis, poor to moderate interobserver reproducibility was noted for grading of papillary urothelial tumors. However, a precise grading of the tumor is essential for therapy decisions and to decrease the risk of recurrence and invasion [2,17]. The differences in grading may be due to different weights given to the morphological features (architecture, cytology, mitoses, etc.) for grading, despite numerous attempts to clearly define the grading criteria [18]. However, it is known that grading systems always involve a degree of subjectivity that affects interobserver reproducibility. In papillary urothelial tumors, the most difficult challenge is the distinction of PUN-LMP from LG-PUC, since both entities have the typical papillary architecture, i.e., papillae covered by urothelium lacking apical mitotic activity, with the only defining features being the presence and amount of atypia and disorder of the epithelium, which may be assessed subjectively. In cases of PUN-LMP, the urothelium appears thicker but atypia should be virtually excluded, whereas in the cases of LG-PUC the papillae display an orderly appearance at low magnification and atypia is present but mild [19,20]. In our study, in agreement with the literature, PUN-LMPs showed the lowest degree of interobserver agreement and reproducibility and an increasing trend to diagnose LG-PUC instead of PUN-LMP. In addition, differences in recurrence rates were relatively similar when comparing PUN-LMP and LG-PUC. The recurrence rate for PUN-LMP cases varies from study to study in the literature, probably an outcome likely influenced by the criteria used for classification. Recurrences range from 17 to 35% for PUN-LMP, 48-71% for LG-PUC and up to 78% for HG-PUC cases. In contrast, the risk for invasion is 1-3% for PUN-LMP, 5% for LG-PUC and up to 45% for HG-PUC cases [21].
Since histological features demonstrate obvious limitations, other tools such as mitosis count, proliferation rate and expression of specific immunohistochemical markers were studied as ancillary methods for grading urothelial neoplasms. Studies using mitotic activity evaluated by semi-quantitative methods showed low reproducibility for grading and poor correlation with prognosis [18]. However, studies in breast cancer suggest that mitotic counting can be improved by quantitative methods [22]. In our study, mitotic counting was performed using p-HH3 immunostaining to reduce subjectivity, although it has to be mentioned that p-HH3 staining cannot be compared directly to mitosis counting on H&E sections. A high mitotic index (>20 mitoses in 10 HPF) was observed in all cases of HG-PUC demonstrating that mitosis count with p-HH3 is a useful parameter for grading. The proliferation rate assessed by MIB-1 staining is also a helpful feature, but the differences in proliferation rate between PUN-LMP and LG-PUC are subtle and overlap, precluding its use as a sole parameter for grading [23]. Some authors reported CK 20 and p53 staining as objective markers for urothelial dysplasia [24]. CK20 is usually only expressed in umbrella cells in normal urothelium, whereas diffuse full thickness CK20 staining is characteristic of HG-PUC [25]. However, discordant CK20 expression was reported in cases of PUN-LMP and LG-PUC; therefore, this parameter should not be used in isolation [26].
In recent years, the advent of robust high throughput sequencing technologies has resulted in significant progress in our understanding of the mutational landscape of urothelial carcinoma and the prognostic impact of specific alterations, leading to several molecular classifications of bladder cancer. NMIBC are reported as genomically stable tumors characterized by FGFR3 mutations or chromosomal translocations (in around 70% of low-grade tumors) [27,28], which may contribute to early urothelial hyperplasia [29]. In concordance with these data, FGFR3 mutations were the most common genetic alteration in our series (64.1% of the cases). As expected, FGFR3 mutations were associated with a non-high-grade phenotype. Activating mutations in PIK3CA and inactivating mutations in the cohesin complex tumor suppressor gene STAG2 seem to play a secondary role in the pathogenesis since most of these mutations co-occur with FGFR3 alterations [30][31][32], as observed in the majority of the cases in this study. This suggests that PIK3CA and STAG2 alterations themselves are related to a low risk of progression [33,34]. Conversely, high-grade papillary carcinoma and invasive tumors frequently display alterations in the TP53, ERCC2 and PTEN genes and pathways [35,36]. Pathogenic TP53 mutations were rarely found, but when present, they were exclusively identified in LG-and HG-PUC representing, as in other cancers, a genetic signature of malignancy [37,38]. The DNA helicase ERCC2, previously described as a biomarker for sensitivity to cisplatin therapy in MIBC, similarly appears to play a role in bladder cancer oncogenesis [39,40]. However, as observed in this series, it also can be identified in all categories of NMIBC. The tumor suppressor PTEN is mostly genetically inactivated in MIBC (90%) in comparison to NMIBC (40%) [41]. In this series, however, inactivated PTEN was also found in PUN-LMP cases (2/4 cases mutated). It remains to be seen whether rare cases of PUN-LMP and LG-PUC with mutations in PTEN or TP53, generally associated with high-grade tumors, show a higher progression risk.
The joint evaluation of immunohistochemical and molecular markers in our principal component analysis allowed us to identify two groups with significantly different progression risks, despite the small number of cases. In addition, this cluster analysis using multiple parameters suggests that PUN-LMP and LG-PUC are closely related, likely representing a spectrum of biological behavior. Given that the same clinical and therapeutic approach is used for PUN-LMP and LG-PUC, the rationale for their distinction, which currently is based on histological features with considerable subjectivity in their interpretation, is unclear and requires further analysis with larger numbers of cases [19,21,42,43]. Multiparametric analysis, which employs additional protein and genetic markers, can be useful in NMIBC grading and might improve the prediction of clinical behavior and risk of recurrence and progression.
The results of this study should be interpreted with caution, as this study represents only a limited number of patients in whom only a limited number of protein markers and mutations were analyzed. This work primarily denotes a proof of principle study in which the main objective is to show that a multiparametric approach holds some promising results, which should be tested in a larger cohort. In the era of personalized medicine, molecular panels detecting mutations or their immunohistochemical surrogates are critical for selecting the best therapy and predicting the prognosis of patients with bladder cancers. Since PUN-LMP and UPC represent early diagnoses in the clinical course of bladder cancer patients, their multiparametric analysis including protein biomarkers and mutational profiling holds promise and should be performed in larger cohorts.

Tumor Samples
In this study, we included all samples from the Institute of Pathology and Neuropathology, University Hospital Tuebingen with a diagnosis of non-invasive papillary urothelial tumor between 2013 and 2020 meeting our selection criteria: biopsies from patients with papillary non-muscle invasive bladder cancer graded as PUN-LMP, LG-PUC or HG-PUC. Representative blocks were selected for analysis. In addition, nineteen samples of bladder biopsies without tumor or dysplasia served as negative controls. This study was approved by the Ethics Committee of the Medical Faculty of the University of Tuebingen (547/2021BO2).

Histological Classification According to the 2016 WHO Classification
For reclassification according to the WHO 2016 classification, 3 µm thick sections stained with hematoxylin and eosin (HE) were blindly evaluated by three pathologists. Cases were categorized as PUN-LMP, non-invasive LG-PUC or HG-PUC. The evaluation of cytological and architectural features was performed at low, medium and high magnification (100×, 200× and 400×) [8,44]. PUN-LMP was diagnosed in cases with discrete papillae, slender and not fused, covered by mostly normal urothelial cells with minimal or absent cytologic atypia, mitosis, and necrosis. The non-invasive LG-PUC biopsies showed orderly arranged papillae with mild cytological atypia and rare mitosis in the lower half of the urothelium. The non-invasive HG-PUC cases demonstrated a completely disorderly appearance at low magnification due to architectural disorganization, moderate to high-grade atypia and mitotic activity throughout the height of the epithelium [44].

Clinical Data
The medical records were reviewed including age, sex, disease presentation, location, extent, TNM staging, treatment and follow-up. Recurrence-free survival (RFS) was calculated from the date of diagnosis to 5-year or the last follow-up.

DNA Extraction
Genomic DNA was extracted from remaining biopsy material using the Maxwell ® RSC DNA FFPE Kit and the Maxwell ® RSC Instrument (Promega, Madison, WI, USA), according to the manufacturer's instructions. DNA was quantified with the Qubit Fluorometer employing the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA), according to the manufacturer's protocol. Quality control polymerase chain reaction (PCR) was performed to identify the amplifiable DNA length as previously reported [45]. Cases with at least 200 bp amplifiable DNA were selected for NGS targeted analysis.

Targeted Next Generation Sequencing
Targeted mutation analysis was performed by Next Generation Sequencing (NGS) (Ion GeneStudio S5 prime, Thermo Fisher Scientific, Waltham, MA, USA) using an AmpliSeq Custom Panel covering the most common recurrently mutated genes in papillary nonmuscle invasive urothelial bladder cancer (complete coding sequence: ERCC2, FGFR3, PIK3CA, PTEN, STAG2 and TP53 (Supplementary Table S1). Amplicon library preparation and semiconductor sequencing was carried out according to the manufacturers' manuals using the Ion AmpliSeq Library Kit v2.0, the Ion Library TaqMan Quantitation Kit, the Ion 510, Ion 520 and Ion 530 Kit-Chef and the Ion 520 Chip Kit (Thermo Fisher Scientific, Waltham, MA, USA). Five paraffin-embedded cases of papillary non-muscle invasive urothelial bladder served to validate the NGS panel. Variant calling of non-synonymous somatic variants compared to the human reference sequence was performed using Ion Reporter Software (Thermo Fisher Scientific, Waltham, MA, USA).

Cluster Analysis
Clustering analysis was conducted to identify groups of patients with similarities. Analysis was performed in only 37 cases because data were missing in eight patients. Standardization of variables (Versi) was performed to obtain a mean of zero and standard deviation of one. The clustering analysis was based on the following variants: histology classification (PUN-LMP, LG-PUC and HG-PUC), immunohistochemistry (CK20, p53, MIB-1, p16, p21, FGFR3 and p-HH-3) and mutational status of ERCC2, FGFR3, PIK3CA, PTEN, STAG2, TP53 (mutated vs. nonmutated). The Partitioning around Medoids (PAM) algorithm of the K-Medoids cluster was used to group patients based on their similarity since this algorithm is less sensitive to noise and outliers. In this method, each cluster is represented by one of the data points in the cluster. These points are named cluster medoids (object within a cluster for which average dissimilarity between it and all the other members of the cluster are minimal). [46,47]. The calculation of the distance between two real-valued vectors was performed using the Euclidean distance, which is defined as follows: The average silhouette method was applied to estimate the optimal number of clusters.

Statistical Analysis
Statistical analysis was performed using R Software Version 4.0 (R Foundation forStatistical Computing, Vienna, Austria). Categorical variables were described using frequencies and proportions; numerical variables were reported as either means and standard deviation (±SD) or medians and interquartile range (IQR), depending on the distribution of the data. Normality of the distribution was assessed by investigating kurtosis, skewness as well as QQ plots. Categorical variables were assessed by using Chi-square test. Fleiss' Kappa tests were performed to measure interobserver variability. Relapse-free survival (RFS) was analyzed by Kaplan-Meier method and log-rank test was used to test differences between groups. The follow-up was defined as the period from 24 months until the occurrence of the outcome, or otherwise censored. Univariable and multivariable Cox proportional hazard regression were used to assess the association between cluster and relapse-free survival. Candidate risk factors for the multivariate model were selected based on the clinical considerations and the statistically significant results from the bivariate analyses. Backward selection was performed to sequentially remove variables from the model. Multicollinearity was checked by matrix correlation. Results were reported as hazard ratio and 95% confidence interval. The proportion hazard assumption was evaluated using the Schoenfeld residuals. All statistical tests were two-sided, and p < 0.05 was considered to indicate statistical significance.

Conclusions
Over the last decades, classifications combining different levels of complexity in cancer diagnosis have emerged as an important tool for tumor detection and stratification. They provide a strong correlation with clinical behavior and offer biological insight and understanding of neoplasms which show difficulties in reproducible classification such as PUN-LMP. In this work, we confirm that several histologic, immunophenotypic and genetic features show overlap between PUN-LMP and LG-PUC, also reflected in the difficulties in reliably separating these two entities. This ultimately leads to the question of whether PUN-LMP and LG-PUC represent a biological continuum, which is captured insufficiently by histological grading and requires the addition of molecular and phenotypical features as ancillary tools for better risk stratification. However, due to the limited number of cases and genes investigated, further studies are required to validate and extend these results.