Transcriptional Analysis of Immunohistochemically Defined Subgroups of Non-Muscle-Invasive Papillary High-Grade Upper Tract Urothelial Carcinoma

Immunohistochemical (IHC) staining for CK5/6 and CK20 was reported to be correlated with the prognosis of early urothelial carcinoma in a way contrary to that of advanced tumors for unknown reasons. We aimed to characterize the gene expression profiles of subgroups of non-muscle-invasive papillary high-grade upper tract urothelial carcinoma (UTUC) classified by CK5/6 and CK20 expression levels: group 1 (CK5/6-high/CK20-low), group 2 (CK5/6-high/CK20-high), and group 3 (CK5/6-low/CK20-high). Expression of group 3 was predictive of worse prognosis of non-muscle-invasive papillary high-grade UTUC. Transcriptional analysis revealed 308 differentially expressed genes across the subgroups. Functional analyses of the genes identified cell adhesion as a common process differentially enriched in group 3 compared to the other groups, which could explain its high-risk phenotype. Late cell cycle/proliferation signatures were also enriched in group 3 and in some of the other groups, which may be used as a prognostic biomarker complementary to CK5/6 and CK20. Group 2, characterized by low levels of genes associated with mitogen-activated protein kinase and tumor necrosis factor signaling pathways, was hypothesized to represent the least cancerous subtype considering its normal urothelium-like IHC pattern. This study would facilitate the application of easily accessible prognostic biomarkers in practice.


Introduction
Recent advances in genetic and bioinformatic technologies have increased our understanding of the molecular characteristics of urothelial carcinoma. Based on gene expression profiles, it was demonstrated that both muscle-invasive (MIBC) and non-muscle-invasive urinary bladder carcinoma (NMIBC) can be divided into intrinsic molecular subtypes [1][2][3][4][5][6]. Although the classifications of MIBC were named differently by respective research groups, they had significant overlap: common luminal and basal molecular subtypes were reported [7]. Luminal type tumors expressed terminal
were frequently stained in the whole layer of tumors in groups 1 and 3, respectively, without noticeable compartmentalization ( Figure 1B). Although group 2 tumors were positive to both CK5/6 and CK20 in >50% of tumor cells, the expression of CK5/6 and CK20 was accentuated in basal and luminal cells, leaving out at least one cell layer of the basal and luminal portion, respectively, in all cases ( Figure 1B). Clinicopathological data of the patients are summarized in Table 1. The median age of the patients was 69 years (range,  and the male/female sex ratio was 2:1. The tumors measured 3.6 ± 3.24 cm (mean ± standard deviation) in maximal diameter. Six (40%) patients were in stage pTa, and the other 9 (60%) patients were in pT1. CIS was observed in 5 cases (33.3%). There were no significant differences in clinicopathological parameters or in IHC profiles among the subgroups, except for CK5/6 and CK20 expression (Table 1). One sample that belonged to group 1 showed 20%-30% positivity for CK14. With the exception of this case, IHC staining for GATA3 and FOXA1 showed diffuse staining in all samples.   Further analysis was limited to high-grade tumors to remove grade-related genetic diversity [16]. We tried to classify non-muscle-invasive papillary high-grade UTUC using IHC staining for CK5/6 and CK20 as surrogate markers, which were known to be related to molecular subtypes and to the prognosis of UTUC and of urinary bladder carcinoma [19][20][21][22]. Fresh tissue with high (IHC score ≥ 6) or low (IHC score = 1) CK5/6 and CK20 expression was selected for RNA sequencing (RNA-seq). The same IHC criteria were marginally correlated with the progression-free survival (PFS) of the patients with non-muscle-invasive papillary high-grade UTUC (p = 0.071) retrieved from the published cohort [20]: tumors with CK5/6-low/CK20-high expression tend to show the worst PFS ( Figure S2). Finally, three subgroups were established as follows: group 1, CK5/6-high/CK20-low; group 2, CK5/6-high/CK20-high; and group 3, CK5/6-low/CK20-high ( Figure 1B). CK5/6 and CK20 were frequently stained in the whole layer of tumors in groups 1 and 3, respectively, without noticeable compartmentalization ( Figure 1B). Although group 2 tumors were positive to both CK5/6 and CK20 in >50% of tumor cells, the expression of CK5/6 and CK20 was accentuated in basal and luminal cells, leaving out at least one cell layer of the basal and luminal portion, respectively, in all cases ( Figure 1B). Clinicopathological data of the patients are summarized in Table 1. The median age of the patients was 69 years (range, 56-84) and the male/female sex ratio was 2:1. The tumors measured 3.6 ± 3.24 cm (mean ± standard deviation) in maximal diameter. Six (40%) patients were in stage pTa, and the other 9 (60%) patients were in pT1. CIS was observed in 5 cases (33.3%). There were no significant differences in clinicopathological parameters or in IHC profiles among the subgroups, except for CK5/6 and CK20 expression (Table 1). One sample that belonged to group 1 showed 20%-30% positivity for CK14. With the exception of this case, IHC staining for GATA3 and FOXA1 showed diffuse staining in all samples.

Identification of Differentially Expressed Genes (DEGs) between the Subgroups
Using p-value <0.01 and |fold change| >2 as the cut-offs, 132, 115, and 143 DEGs were identified between groups 1 and 2, groups 1 and 3, and groups 2 and 3, respectively (Figure 2A,B and Table S2). Group 1 had 58 upregulated and 74 downregulated genes compared to group 2. Group 3, relative to group 1 and 2, harbored increased levels of 57 and 58 genes and decreased levels of 52 and 91 genes, respectively. Consequently, a total of 308 DEGs across all comparisons were found. They were put into unsupervised hierarchical clustering that demonstrated the congregation of the subgroups defined by IHC staining ( Figure 2C). Principal component analysis of the DEGs revealed that tumors with similar CK5/6 and CK20 expression profiles made clusters separated from one another in 2D space ( Figure S3). Group 1 had 58 upregulated and 74 downregulated genes compared to group 2. Group 3, relative to group 1 and 2, harbored increased levels of 57 and 58 genes and decreased levels of 52 and 91 genes, respectively. Consequently, a total of 308 DEGs across all comparisons were found. They were put into unsupervised hierarchical clustering that demonstrated the congregation of the subgroups defined by IHC staining ( Figure 2C). Principal component analysis of the DEGs revealed that tumors with similar CK5/6 and CK20 expression profiles made clusters separated from one another in 2D space ( Figure S3).          In addition, KEGG pathway analysis revealed that DEGs between groups 1 and 2 were enriched in TNF signaling pathway (FDR = 0.014) ( Figure 3A). Most DEGs associated in these pathways were downregulated in group 2 tumors ( Figure 6). GSEA also indicated group 2 tumors were less reactive to MAPK and TNF signaling cascades than group 1 tumors ( Figure 6). In addition, KEGG pathway analysis revealed that DEGs between groups 1 and 2 were enriched in TNF signaling pathway (FDR = 0.014) ( Figure 3A). Most DEGs associated in these pathways were downregulated in group 2 tumors ( Figure 6). GSEA also indicated group 2 tumors were less reactive to MAPK and TNF signaling cascades than group 1 tumors ( Figure 6).

Expression of Biologic Signature Genes of the Subgroups
To further delineate the properties of the IHC-based subgroups, we evaluated the expression of known biologic markers (Figure 7) [1,3,6,[23][24][25]. Each subgroup was enriched with different types of keratin: KRT5, KRT6, KRT14, and KRT15 in group 1, KRT18 and KRT20 in group 3, and all in group 2. The expression patterns of basal and luminal type markers clustered moderately with group higher in group 3 and 1 than in group 2 (Mann-Whitney U, p = 0.028), showing a huge parallel to that of late cell cycle/proliferation genes, in that it was high in group 3 and in two samples of group 1. The two group 1 tumors with elevated levels of cell cycle/proliferation genes tend to express KRT14 more than the others, and the tumor with the highest KRT14 level was the only one positive for CK14 by IHC staining. Expression levels of early cell cycle, epithelial-to-mesenchymal transition, and cancer stem cell markers did not significantly overlap with the subgroups based on IHC staining. Finally, independent molecular classifiers of MIBC and NMIBC were applied to the present tumors [3,6]. Although they yielded gene expression clusters similar to their original designs, they did not overlap with the present IHC-defined subgroups ( Figure S4).

Discussion
In the present study, we performed transcriptional investigation on non-muscle-invasive papillary high-grade UTUC by comparing its subgroups defined by CK5/6 and CK20 expression. First, we showed that immunoreactions of low CK5/6, high CK20, low CD44, and high p53 were predictive of high WHO grade in non-muscle-invasive papillary UTUC [19,20,27]. The 4 IHC markers defining the BASQ subtype of MIBC, CK14-positive, CK5/6-positive, GATA3-negative, FOXA1-negative expression [18] were not readily applicable to non-muscle-invasive papillary UTUC because positivity for CK14 and negativity for GATA3 or for FOXA1 was rarely observed in this tumor. To minimize molecular divergence between low-and high-grade urothelial carcinoma [16], we excluded low-grade tumors and focused on high-grade tumors. Along with previous studies [19,21,28], CK5/6-low/CK20-high expression was marginally correlated with the poor PFS of non-muscle-invasive papillary high-grade UTUC, which was demonstrated in the patients from the published cohort [20]. Therefore, IHC staining of CK5/6 and CK20 was adapted as prognostic markers of non-muscle-invasive papillary high-grade UTUC, which could be relevant to its molecular classifications [2,5,19,20,22]. To ensure the protein expression of the RNA-seq samples, expression levels of CK5/6 and CK20 were determined based on IHC staining on adjacent tissue with strict cut-offs [29]. Consequently, mRNA levels of basal/intermediate and luminal keratins concurred with the IHC staining profiles of CK5/6 and CK20, overall [30].
CK5/6 and CK20 expression has been suggested to represent the differentiation status of urothelium; therefore, IHC staining for these proteins can be useful as a surrogate marker pertinent to molecular subtypes of urothelial carcinoma [2,5]. CK5/6 is expressed in basal and progenitor cells of the normal urothelium which is conserved in basal type urothelial carcinoma that supposedly obtains a basal cell-like undifferentiated phenotype [31,32]. On the other hand, high levels of CK20 expression in urothelial carcinoma, as normally observed in terminally differentiated urothelium, are suggested to indicate neoplasm in well-differentiated states [31]. These were well-correlated with prognostic differences of basal and luminal type MIBC [1][2][3]5,24,25,31,33,34]. Consistent with the correlation of keratin expression and basal/luminal types of MIBC, early urothelial carcinomas showing CK5/6-high/CK20-low and CK5/6-low/CK20-high immunoreactions were enriched for basal type and luminal type/urothelial differentiation signature genes, respectively, in this study and in the previous one [1]. However, the contrasting clinical outcomes associated with CK5/6 and CK20 expression, assessed at the protein or at the mRNA levels, were frequently reported in early urothelial carcinoma of the upper or lower tract [6,[19][20][21] and, occasionally, in MIBC [22]. Furthermore, previous classifiers of urothelial carcinoma based on gene expression were assigned to the present cases in a way that was independent of their IHC staining [3,6]. Collectively, these results implied that distinct molecular frameworks, that would determine their respective behaviors, were inherent in the present subgroups rather than the basal vs. luminal paradigm.
Cell adhesion and motility, in agreement with other reports [1,33], was the distinct function that was differentially enriched between groups 3 and 1 and between groups 3 and 2. Of note, group 3 showed downregulation of several genes involved in major cell adhesion complexes, including adherens junction (CDH8, CLCA2) [35][36][37], desmosome (SPINK5) [38], focal adhesion (MYO10, SHC1) [39,40], and basement membrane and the interaction with it (LGALS8, LAMA2, COL7A1, ITGB4) [41,42], with tight junction (CLDN4) being the exception [43], which would promote migration and invasion of cancer cells. As suggested by Sjodahl et al. [1], the major high-risk subtype of NMIBC, 'genomically unstable', had low expression levels of genes engaged in major adhesion structures, such as adherens junction, desmosome, and gap junction, with the exception being tight junction (CLDNs), compared to the low-risk subtype, 'urothelial-like A'. The 'genomically unstable' subtype showed IHC staining for CK5 and CK20, similar to that of group 3 in this study [1,33]. In line with this, functional enrichment analyses and GSEA supported the premise that the loss of non-apical cell-to-cell and cell-to-matrix adhesions is a major determinant of the high-risk subtype, group 3 [37]. In addition, the non-canonical Wnt/PCP pathway (FAT2, CELSR1, WNT5A), known to suppress early-staged cancers by regulating cell adhesion or migration, was downregulated in group 3 [44]. This was also substantiated by the results showing that the alteration of the genes observed in group 3 have been reported to promote invasion of and to be associated with the poor prognoses of urothelial carcinoma and other various malignancies, including downregulation of CLCA2 [36], SPINK5 [45], SHC1 [40], LGALS8 [42], COL7A1 [41], FAT2 [44], WNT5A [44], and NRXN3 [46], and upregulation of CLDN4 [43], PKN1 [47], FREM2 [48], XBP1 [49,50], PKP2 [51], ANG [52], and PFN1 [53]. We tried to validate the prognostic values of these genes in the independent TCGA urinary bladder carcinoma cohort. However, their prognostic consequences were inconsistent, and a few of those genes altered in group 3 even predicted longer survival in the TCGA database. This result, in conjunction with those of previous studies [35,41,44,53], suggested that the adhesion molecules may act differently in advanced disease, which accounted for 99.2% of the TCGA cohort. In treatment of non-muscle-invasive UTUC, adhesion molecules differentially expressed among the present subgroups provide candidates for targeted therapy [44,54], which warrants a deeper investigation.
Late cell cycle/proliferation signature genes were predominantly upregulated in most group 3 tumors, as well as in a few samples of the other subgroups, consistent with the molecular subtypes of NMIBC with poor prognosis suggested in previous reports, 'genomically unstable', 'urothelial-like B', and 'squamous cell carcinoma-like' subtypes, which shared elevated levels of these signature genes compared to the 'urothelial-like A' subtype [1,6,24,33]. High levels of late cell cycle/proliferation markers were also observed in aggressive subtypes of MIBC [6,24]. In addition, IHC staining for cyclin B1, as a late cell cycle marker, combined with CK5 expression and tumor histology in risk-stratified patients with T1 NMIBC [33,34]. Moreover, we showed that the enrichment pattern of late cell cycle/proliferation correlated closely with that of the NMIBC progression signature [26]. Therefore, it is feasible to postulate that the enhanced cell proliferation of group 3 would support its aggressive phenotype, which was possibly obtained by bypassing cell cycle checkpoints [1]. In addition, different molecular subtypes of NMIBC that showed diverse clinical behaviors, 'urothelial-like A', 'urothelial-like B', and 'squamous cell carcinoma-like', as suggested by Sjodahl et al., included tumors with CK5/6-high/CK20-low IHC staining [1,33,34], which was representative of group 1. Therefore, group 1 is suggested to contain prognostically heterogeneous tumors. Considering the remarkable overexpression of these signature genes observed in certain tumors of group 1, genes related to late cell cycle/proliferation are expected to be used as risk-stratification markers within group 1 tumors. Likewise, they would also be useful for group 2 tumors that showed modest discrepancies in the level of late cell cycle/progression signature within the subgroup. This seemed analogous to the 'urothelial-like A' and 'urothelial-like B' subtypes, as both subtypes could be positive for CK5/6 and CK20, similar to group 2 tumors [1,33]. Alternatively, IHC staining for CK14, observed only in one of the group 1 samples, was one of the characteristics of the 'squamous cell carcinoma-like' subtype [33]; therefore, CK14 immunostaining could be used for subclassifying group 1 tumors, even though it was rarely positive in non-muscle-invasive papillary high-grade UTUC.
The accentuated patterns of IHC staining for CK5/6 and CK20 observed in group 2 were reminiscent of the immunoprofiles that were associated with low-risk phenotypes of superficial papillary urothelial neoplasms which maintained stratified CK5/6 and CK20 expression of the normal urothelium [20,55]. Similar IHC staining for CK5/6 and CK20 was also reported in the low-risk 'urothelial-like A' subtype which was well-differentiated at the gene expression level [1,33,34]. In transcriptional investigation, group 2 had moderate enrichment for genes involved in urothelial differentiation as well as in basal and luminal type signatures. Moreover, most DEGs enriched in MAPK or TNF pathways were expressed at lower levels in group 2 than the other groups, except for SPRED1, DUSP7, and WNT5a, that are known to downregulate MAPK and TNF signaling cascades, respectively [56,57]. MAPK activation was shown to be an important mediator of cellular transformation induced by constitutively activated FGFR3, which have crucial roles in tumorigenesis of papillary urothelial carcinoma [58]. In addition, TNF, a major inflammatory cytokine, is known to participate in many steps of carcinogenesis and tumor progression [59]. As a result, we hypothesize that group 2 represents the least cancerous subtype which maintains molecular structures of the normal urothelium, as exemplified by expression of KRTs; the expression of CK5/6 and CK20 in this group is not entirely representative of the basal and luminal types, respectively [60]. It is still not clear if this subgroup is a precursor of progressed lesions [6], for example, group 1 or group 3, or a distinct subtype with different pathogenesis.
This study had some limitations. The number of the patients was constrained by the strict cut-off criteria of IHC staining. Due to the short follow-up duration of the patients subjected to RNA-seq, their exact outcome was unevaluable, even though it was validated in the independent patient cohort. Finally, cellular adhesion, late cell cycle/proliferation signature, MAPK, or TNF signaling pathway genes, suggested as potential biological keys in the present and in previous studies, need external validation for confirmation. However, this study had the merits of investigating the transcriptional characteristics of subgroups defined by routinely used IHC staining, CK5/6 and CK20, which would facilitate the application of the accumulating genetic/phenotypic information of urothelial carcinoma, in practice.
In conclusion, IHC staining for CK5/6 and CK20 classifies subgroups of non-muscle-invasive papillary high-grade UTUC which are prognostically relevant but molecularly different from previous gene expression subtypes established in MIBC or even in NMIBC. Group 2 tumors were postulated to represent a less cancerous subtype.

Validation of Prognosis Associated with Immunohistochemical Profiles of RNA-Seq Subgroups
Prognostic validation of the IHC criteria of RNA-seq subgroups was performed by Kaplan-Meier and log-rank tests of the PFS of the patients with non-muscle-invasive papillary high-grade UTUC who were retrieved from the previously published cohort [20]. Protein expression of CK5/6 and CK20 was re-evaluated semiquantitatively in accordance with the IHC criteria of IHC-defined subgroups, high (≥ 50%) or low (<10%), from the triplicate tissue microarray IHC slides made for the previous study [20]. From 93 patients with non-muscle-invasive papillary high-grade UTUC, 39 tumors met this IHC staining criteria, including 13 CK5/6-high/CK20-low, 4 CK5/6-high/Ck20-high, and 22 CK5/6-low/CK20-high tumors. The median follow-up period was 60 months (range, 1-207). Disease progression was defined by local recurrence except for urinary bladder or distant metastasis.

Transcription Analysis and Identification of Differentially Expressed Genes
mRNA was extracted using a punch with a plunger (3 mm in diameter) from the pure papillary tumor area (tumor cellularity >90%) of frozen tissue that was marked on a hematoxylin and eosin-stained frozen slide. RNA-seq was performed as previously described [61] using the Illumina HiSeq 2500 platform (Illumina, San Diego, CA, USA). Paired demultiplexed fastq files were generated, and initial quality control was performed using FastQC (Phred quality score >30) [62]. Adapter trimming was conducted using the Trimmomatic tool [63] followed by mapping to the human genome (UCSC hg19) using the HISAT2 and Bowtie2 [64]. Known transcripts were assembled with the StringTie tool [65]. Consequently, 21,780 genes were found across the samples. Quantile normalization was done on log 2 (FPKM+1). To fit a linear model for each gene based on the three-group study design, the function of lmFit was used (limma) [66]. Then, the eBayes function was used to calculate the empirical Bayes moderated t-statistic.

Functional Enrichment Analysis
Functional enrichment of DEGs were found from GO [67] and KEGG pathway [68] database with FDR < 0.05 as a cut-off. The GSEA of all genes was used to interpret their functions from the public database [69]. Biological signatures were acquired from the respective publications [1,3,6,[23][24][25]. The integrated score for the progression of NMIBC was calculated by subtracting the total value of COL4A3BP, MBNL2, NEK1, FABP4, and SKAP2 from that of KPNA2, BIRC5, UB2C, CDC25B, COL4A1, MSN, and COL18A1 [26].

TCGA Data Preprocessing and Survival Analysis
From the TCGA database, 399 samples of urinary bladder urothelial carcinoma were collected that contained mRNA expression and survival data. Most tumors were WHO high-grade (n = 375) and MIBC (n = 396). The median follow-up period was 1735 days, during which 109 patients died. Preprocessing of the mRNA expression data was performed in the R 3.4.3 statistical environment: mRNA expression level quantified by RSEM raw counts were normalized using the limma-voom, which applied linear modeling to voom-transformed read counts [70]. We determined the median value of gene expression as a cut-off point to divide patients into high-and low-expression groups. Kaplan-Meier and log-rank tests were applied for survival analysis.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.