Molecular Subtyping in Muscle-Invasive Bladder Cancer on Predicting Survival and Response of Treatment

Molecular classifications for urothelial bladder cancer appear to be promising in disease prognostication and prediction. This study investigated the novel molecular subtypes of muscle invasive bladder cancer (MIBC). Tumor samples and normal tissues of MIBC patients were submitted for transcriptome sequencing. Expression profiles were clustered using K-means clustering and principal component analysis. The molecular subtypes were also applied to The Cancer Genome Atlas (TCGA) dataset and analyzed for clinical outcome correlation. Three molecular subtypes of MIBC were discovered, clusters A, B, and C. The most differentially upregulated genes in cluster A were BDKRB1, EDNRA, AVPR1A, PDGFRB, and TNC, while the most upregulated genes in cluster C were collagen-related genes, PDGFRB, and PRKG1. For cluster B, COL6A3, COL1A2, COL6A2, tenascin C, and fibroblast growth factor 2 were statistically suppressed. When the centroids of clustering on PCA were applied to TCGA data, the clustering significantly predicted survival outcomes. Cluster B had the best overall survival (OS), and cluster C was associated with poor OS but exhibited the best response to perioperative chemotherapy. Among all groups, cluster B had a better pathologic response to neoadjuvant chemotherapy (40%). Based on the results of the present study, the novel clusters of subtype MIBC appear potentially suitable for integration into clinical practice.


Introduction
Bladder cancer (BC) is ranked as the ninth most frequently diagnosed cancer worldwide. It is mainly represented by bladder urothelial carcinoma, which accounts for 90% of BC [1]. Current standard treatments for BC mainly include surgical resection, followed by chemotherapy; however, due to the high incidence of distant metastasis and recurrence after treatment, the five-year overall survival rate remains at 15-20% [2]. High morbidity, mortality, and healthcare costs associated with muscle-invasive bladder cancer (MIBC) draw a need for personalized patient care [3]. Recent advances in molecular profiling have been proposed in individualizing treatment for MIBC patients [4]. While the divergent but interrelated two-molecular-pathway model for the development of low-grade and high-grade BC is well-established, molecular profiling has revealed heterogeneity in the genetic landscape within MIBC, which also reflects its diversity in clinical outcomes [5][6][7].
Transcriptome profiling studies from different institutions, including The Cancer Genome Atlas (TCGA) research network, have identified distinct molecular subtypes within MIBC [8][9][10][11]. Different studies have proposed varying MIBC subtyping schemes such as the TCGA clusters, Lund Taxonomy, mRNA expression-based molecular subtypes  (TCGA-2017), and the consensus subtypes produced by the Bladder Cancer Molecular Taxonomy Group (Consensus) [9][10][11][12][13][14][15][16]. At the highest level of hierarchy, MIBC can be divided into two major subtypes-"basal" (BL) and "luminal" (LU). The BL subtype shares an expression profile with the basal cells of the urothelium and is usually associated with a poor prognosis. The LU subtype shares molecular profiles with differentiated urothelial cells and generally predicts better prognosis [7,9,10,13,14]. In the original Lund Taxonomy classification from 2012, the best cancer-specific survival (CSS) was associated with the "UroA" subtype (now "Urothelial-like"), which represented low-grade non-MIBC with a signature similar to the LU subtype [17]. Intra-tumor and intra-patient co-existence of basal and luminal tumor regions has been reported in MIBC patients, and there were also non-MIBC tumors that exhibited expression profiles similar to MIBC subtypes [13,18]. In addition, intratumoral heterogeneity which may be enhanced following neoadjuvant and adjuvant treatment can be responsible for the fluidity of molecular profiling patterns [18,19].
The primary focus of this study was to develop and validate a subtyping that could be readily applied to our patient cohort for predicting clinical outcome. We initially subtyped our institutional MIBC cohort using an unsupervised clustering of transcriptomic expression profiling. The clusters were validated with TCGA dataset of MIBC [4,11,12,14,[20][21][22].

Sample Selection and RNA Isolation
Frozen tissue samples from 30 consecutive MIBC patients who underwent radical cystectomy at Songklanagarind Hospital, Thailand, from 2015 to 2020, and 7 normal bladder mucosa tissue samples from patients with hematuria but without bladder cancer were studied. Representative sections from all specimens were re-evaluated for their histopathology by a pathologist (K.K.). Regarding specimen collection and storage, tissues were collected at the time of surgical resection, snapped into the size of around 0.5 cm, stored in a cryotube with stabilization solution (RNAlater, Thermo Fisher Scientific, Inc., Waltham, MA, USA), and kept frozen in liquid nitrogen until RNA extraction. Seven samples of non-tumorous urinary bladder wall obtained during a cystectomy were used as controls. Informed consent was obtained from all patients, and the study was approved by the Ethical Committee of Songklanagarind Hospital, Prince of Songkla University, in accordance with the Declaration of Helsinki (REC 61-222-10-1).
RNA was isolated from all samples with the DNA/RNA AllPrep kit (QIAGEN). Extracted RNA was assessed for quantity using Nanodrop 1000 (Nanodrop) or Qubit (Thermo Fisher Scientific, Waltham, MA, USA). Digital quality control (QC) analysis for the integrity of total RNA was performed using the 2100 Bioanalyzer (Agilent Technologies, Quebec City, QC, Canada), following the manufacturer's instructions.

Gene Expression Study
From each RNA sample, 3 ug of total RNA was used for strand-specific library preparation. Illumina Stranded mRNA preparation kit (Illumina) was used to generate the sequencing libraries according to the manufacturer's protocol. cDNA was prepared with random hexamer primer. The Illumina NovaSeq 6000 platform was used for transcriptome sequencing following the manufacturer's instructions.
Paired-end raw data in FASTQ format from the sequencing machine were checked for read quality, size, and GC content using the FASTQC program. Reads were aligned to the reference genome using STAR version 2.7.8. The total mapping rate and mapped read number were analyzed using HTSeq version 0.13.5. The total number of mapped reads and fragments per kilobase of exon model per million mapped reads (FPKM) were calculated for each annotated gene. DESeq2 package (R program) was used to capture the DEG. The differentially expressed genes (DEGs) for 30 BC samples and 7 non-tumorous bladder tissue samples were analyzed with the DESeq2 package, and |log2FC| > 2 and p < 0.05 were set as the cutoff for DEGs.

Clustering and Differential Expression Analysis
The transcriptomic data from MIBC samples were then clustered by K-means clustering and the principal component analysis method. Differentially expressed genes among clusters were performed and shown as a Venn diagram. Differentially expressed genes among clusters were analyzed for signaling pathway enrichment in each cluster using KEGG (Kyoto Encyclopedia of Genes and Genomes) term enrichment analysis, using a cutoff p-value of 0.05. Genes that showed different expression levels in all clusters were evaluated for their diagnostic performance by using Receiver Operating Curve (ROC) analysis to evaluate the performance of expression levels in predicting the cluster of MIBC.

Validation with TCGA Dataset
The selection of TCGA data used the keyword "muscle invasive bladder cancer". Clinicopathological data on the cohort and mRNA data were downloaded from the open access portal, cBioPortal (https://www.cbioportal.org, accessed on 24 March 2022). Patients with an unknown tumor stage (Tx) or T < 2 were excluded from analysis, yielding a final study population of 231 TCGA patients. Statistical analyses were performed using R program version 4.1.10. Association between subtype and clinical outcome was analyzed by univariate (single parameter logistic regression) analysis. ROC curves were used to compute sensitivity and specificity. Mean, median, and 95% confidence interval (CI) of sensitivity and specificity were calculated.
Association between subtypes and time to metastasis, recurrence-free survival, cancerspecific survival, or OS was determined while adjusting for demographic and clinical covariates using the Cox proportional hazard model with a stepwise selection procedure. Kaplan-Meier plots with log-rank statistics determined if subtypes classified MIBC patients into risk categories based on survival outcome. Chi-square tests were used to compare clinicopathologic data of patients for different amplicon definitions. Comparisons included age, gender, tumor stage, lymph node status, and molecular subtype. Kaplan-Meier plots with log-rank statistics categorized MIBC patients into outcome risk categories. Molecular subtypes and age were compared. The Bonferroni adjustment was employed to correct for multiple testing. The significance of univariable Kaplan-Meier regressions was assessed using the log-rank and Wilcoxon tests. Multivariable analyses were conducted using Cox proportional hazard regression. For results from the univariable analysis, a p-value cutoff of <0.2 was chosen to include relevant clinical or pathological parameters that would have been missed with a more restrictive p-value of <0.05. Contingency analyses of nominal variables were performed with Pearson's chi-squared test. Variables for the multivariable analysis included significant (p < 0.2) clinicopathological characteristics on univariable analysis (pT-Stage, pN-Stage, age, gender) and genes. Statistical analyses of numeric continuous variables were performed with non-parametric tests (Wilcoxon rank-sum test, Kruskal-Wallis test).

Patient Information and Clinical Characteristics
All 30 tissue samples were from patients recruited at Songklanagarind Hospital, Songkhla, Thailand. These included tumor tissue from 26 males and 4 females with ages between 52 and 92 years. Clinical data are provided in Table 1. The data of 231 MIBC cohorts retrieved from The Cancer Genome Atlas (TCGA) are also shown in Table 1 and include the clinical information from 169 males and 42 females between the ages of 46 and 90 years old. It should be noted that there is a quite difference in the proportion of T stages and N stages between data from tissue samples and the TCGA cohort. Moreover, no metastasis was found in our MIBC patients.

Transcriptome Profiling and Classification of Thai MIBC
The transcriptome sequencing of all tissue samples was performed based on the strand-specific library preparation to identify the expression levels of all genes. Differential expression analysis and a volcano plot demonstrated that 544 genes were found to be upregulated and downregulated in MIBC ( Figure 1A and Table S1). According to their ontology, the 30 most significantly changed genes included phosphatidylinositol 3-kinase (PI3K)/protein kinase B (AKT) signaling molecules (fibronectin 1 (FN1), collagen type VI alpha 2 chain (COL6A2), and collagen type I alpha 2 chain (COL1A2)), mitogen-activated protein kinase (MAPK) pathway-related molecules (transforming growth factor-beta 1 (TGFB1) and MDS1 and EVI1 complex locus (MECOM)), mitochondrial biogenesis regulators (metallothionein 1A (MT1A) and MT2A), exosomal proteins (tubulin beta 6 class V (TUBB6), TUBB3, galectin 1 (LGALS1), and interferon-induced transmembrane protein 3 (IFITM3)), biomolecule metabolism (cytidine deaminase (CDA), sphingosine kinase 1 (SPHK1), monoamine oxidase A (MAOA), and microsomal glutathione S-transferase 1 (MGST1)), and others. To identify the optimal number of clusters based on transcriptomic classification, Elbow plot analysis was applied for Thai MIBC transcriptome data. The results show that the three clusters were found to be optimal, as demonstrated in Figure 1B. The transcriptomic data of all MIBC tissue samples were then subjected to the classification of the MIBC groups by using principal component analysis by K-means clustering.

ROC Analysis of 37 Differentially Expressed Genes Found in MIBC Tissues
To evaluate the specificity and sensitivity of the genes expressed differently for each MIBC cluster, ROC curve analysis was performed for all 37 genes and all clusters.
Interestingly, the corresponding areas under the ROC curve (AUCs) with values more than 0.8, 0.9, and 0.95 were found in 33, 25, and 14 genes from 37 genes for cluster B (Table 2 and Figure S1). The AUC values above 0.9 were observed only in five genes for cluster C; none could be found for cluster A. The highest AUC value of cluster B was 0.988 for PDGFRB and COL6A2 genes; meanwhile, the lowest AUC was 0.72 for the ITGA8 gene (Table 2 and Figure S1). KCNMB1 is the gene that presented the highest AUC with the value of 0.936 in cluster C while ITGA11 showed the lowest AUC with 0.67. Cluster A displayed the lowest value of mean AUC; the range of AUC values of differentially expressed genes in this cluster was between 0.52 and 0.869, with the ryanodine receptor 3 (RYR3) gene showing the lowest and COL1A1 the highest AUC.

ROC Analysis of 37 Differentially Expressed Genes Found in MIBC Tissues
To evaluate the specificity and sensitivity of the genes expressed differently for each MIBC cluster, ROC curve analysis was performed for all 37 genes and all clusters.
Interestingly, the corresponding areas under the ROC curve (AUCs) with values more than 0.8, 0.9, and 0.95 were found in 33, 25, and 14 genes from 37 genes for cluster B (Table 2 and Figure S1). The AUC values above 0.9 were observed only in five genes for cluster C; none could be found for cluster A. The highest AUC value of cluster B was 0.988 for PDGFRB and COL6A2 genes; meanwhile, the lowest AUC was 0.72 for the ITGA8 gene (Table 2 and Figure S1). KCNMB1 is the gene that presented the highest AUC with the value of 0.936 in cluster C while ITGA11 showed the lowest AUC with 0.67. Cluster A displayed the lowest value of mean AUC; the range of AUC values of differentially expressed genes in this cluster was between 0.52 and 0.869, with the ryanodine receptor 3 (RYR3) gene showing the lowest and COL1A1 the highest AUC.

Clinical Characteristics and Molecular Subtypes of MIBC Associated with Treatment Outcome and the Response of Perioperative Chemotherapy
To identify factors associated with MIBC, logistic regression analysis was performed, including prognostic scores, patient characteristics, and tumor characteristics (Table 1). Univariate analysis identified tumor stage and nodal status as significant predictors of overall survival. The multivariate logistic regression model identified the tumor stage (hazard ratio (HR), 25.64; 95% confidence interval (CI), 2.31-284.04; p = 0.006) as an independent predictor of overall survival. Cluster C exhibited a higher hazard ratio without being statistically significant (HR, 2.63; 95% CI, 0.44-15.79; p = 0.291) ( Table 3). Univariate analysis showed no significant differences in survival in our patients between the molecular subtypes (p = 0.108). Pairwise comparisons using log-rank tests also showed that the overall survival did not differ significantly between each molecular subtype, with patients with the cluster B subtype experiencing the longest survival, followed by those with the cluster A subtype. The poorest survival was observed among patients with the cluster C subtype ( Figure 3A).
We applied molecular subtype classification to tumors from 30 patients treated with preoperative chemotherapy and analyzed the response to neoadjuvant chemotherapy. Among these patients, cluster C had a better pathologic response to neoadjuvant chemotherapy (40%) ( Figure 3B). Moreover, MIBC in cluster C also exhibited better outcomes after adjuvant chemotherapy for progression or metastatic-free survival ( Figure 3C).

The Transcriptomic Classification Using PCA Analysis of Tissue Samples with the TCGA Data Provided a Significant Prognostic Value of MIBC Overall Survival
To increase the number of samples used in this study, we included 231 transcriptomic data from TCGA database for classification by using PCA analysis and unsupervised Kmeans clustering. We decided to apply the centroid derived from MIBC tissue samples to separate PCA coordinates of the TCGA cohort into three clusters according to MIBC tissues ( Figure 4A). We also determined the relationship between each cluster and survival data. The Kaplan-Meier analysis demonstrated that cluster B displayed the highest probability of survival within 1800 days of follow-up while cluster C showed the lowest value (p = 0.028; Figure 4B). Clinical characteristics of MIBC from TCGA in each cluster are shown in Table S2.

The Transcriptomic Classification Using PCA Analysis of Tissue Samples with the TCGA Data Provided a Significant Prognostic Value of MIBC Overall Survival
To increase the number of samples used in this study, we included 231 transcriptomic data from TCGA database for classification by using PCA analysis and unsupervised Kmeans clustering. We decided to apply the centroid derived from MIBC tissue samples to separate PCA coordinates of the TCGA cohort into three clusters according to MIBC tissues ( Figure 4A). We also determined the relationship between each cluster and survival data. The Kaplan-Meier analysis demonstrated that cluster B displayed the highest probability of survival within 1800 days of follow-up while cluster C showed the lowest value (p = 0.028; Figure 4B). Clinical characteristics of MIBC from TCGA in each cluster are shown in Table S2.

Certain Signaling Pathways Were Associated with Each Type of MIBC Cluster
To identify the signaling pathways enriched in each cluster, the transcriptomic data were used with KEGG (Kyoto Encyclopedia of Genes and Genomes) term enrichment analysis. The metabolic pathways or signal transduction pathways associated with differentially expressed genes, comparing the whole genome background with the KEGG terms and p-adj < 0.05, are justified as significant enrichment. The top 10 significantly enriched terms in the KEGG enrichment analysis are displayed for each cluster ( Table 4). The calcium signaling pathway was found to be a generally significant process in all clusters. However, the chemokine signaling pathway was observed significantly only in clusters A and B. Interestingly, the immune signal transduction pathways, including Janus kinase-signal transducer and activator of transcription (JAK-STAT), B cell receptor signaling, and T cell receptor signaling pathways, were marked as the key mechanisms in cluster A of MIBC specifically, while advanced glycation end product-receptor for AGE (AGE-RAGE) and Ras-associated protein-1 (Rap1) signaling pathways were found as significantly enriched molecular processes in cluster B. For cluster C, cGMP-PKG, oxytocin, MAPK, and Relaxin signaling pathways were observed to be unique in this cluster.
In addition, most available clustering schemes are based on cDNA microarray data, and the current high-throughput molecular techniques have migrated to the next-generation sequencing platform.
To our knowledge, our study is the first report of a single institutional MIBC subtyping cohort using unsupervised clustering based on transcriptomic data. The primary finding of this study is that the locations of MIBC clusters on the principal components identified from transcriptome data can be predicted from an understanding of the average coalescent differential genes for tissue samples. This analysis transformed the high-dimensional data into an orthogonal basis which represents the variants of mRNA expression profile in each sample. Unsupervised clustering revealed the three clusters of MIBC with the 37 genes expressed differently in all clusters. Interestingly, all these genes are related to the signaling pathway associated with cancers. For example, the calcium signaling pathway was found to be increased in colon cancer [23], breast cancer [24], and liver cancer [25]. PI3K-Akt activation was also found in breast cancer [26], gastric cancer [27], and thyroid carcinoma [28]. The ubiquitous signal transduction MAPK pathway is also associated with cancer cell proliferation and survival and an inflammatory environment [29]. Although all these signal transductions are in cancers, the dominant pathway in the cancer cell depends on the genetic background, the mutation status, or type of cancer [30], determining the aggressive behavior, the progression rate, and drug response of cancer.
Surprisingly, most of the genes are not related to the markers used for sub-typing in the previous reports [4,[9][10][11][12][13][14][15][16]. When compared to the original report that used data from the Dana-Faber Cancer Institute as a training set, only two markers in our study (MT2A, uniquely expressed in cluster A, and HMGCS2, uniquely expressed in the cluster B) overlapped with the BASE47 panels of that work [9]. The difference in molecular patterns that defined sub-types might be explained by the difference in the expression study technique, as our study used transcriptomic profiling by RNA sequencing rather than cDNA microarray. Among the 37 genes that were significantly less differentially expressed in our cluster B, 13 had differential expression levels in that study, and all 13 were overexpressed in their Basal subtype, which means that our cluster B corresponded to the Luminal subtype. Consistent with other studies in which the Basal subtype had poorer prognosis, our cluster B had higher overall survival probability, which was also validated by TCGA data. Apart from transcriptomic profiling, other studies evaluated the association of other biological factors such as lncRNA, miRNA, protein expression, or DNA methylation [9][10][11][12][13][14][15][16], while we focused only on transcriptomic profiles and determined the sensitivity and specificity of the genes instead. ROC curve analysis revealed that most of the genes showed a high correlation of sensitivity and specificity only for cluster B, which may be used as expression markers for good prognosis in Thai MIBC patients. Our transcriptomic clustering provided three clusters of MIBC tissue which expressed the specific pattern of mRNA profiling. In addition to our MIBC transcriptomic study from patients, we used the information from TCGA dataset for validation. However, PCA analysis with the comparison between two cohorts demonstrated the obviously different PC coordinates between data from our MIBC tissue samples and TCGA dataset. The variation in the genetic background of the different populations studied may be the factor that caused the difference in the PCA data plot [31]. By using the initial cluster centroids of the MIBC tissue data, we applied the distance of tissue PCA coordinated with mRNA expression profile to TCGA data for transcriptomic clustering of the TCGA cohort. This confirmed the influence of distinct expression scenarios on other populations. The three clusters obtained from this method were related to the significant difference in the overall survival of MIBC patients, meaning that the classification based on transcriptomic data of MIBC tissue may be an alternative way to predict the survival outcome.
Historically, molecular classification of bladder tumors has identified two divergent pathways with distinct genetic hallmarks characterizing low-grade and high-grade tumors. Molecular subtyping classifications have provided insight into the biology of bladder tumors. It was reported that MIBC subtypes may predict patient response to chemotherapy and could be used to develop targeted therapies. Since gene expression patterns change over the course of treatment, subtype drifting or switching due to tumor evolution and reaction to therapy should be concerned. Therefore, there is a need for systematic biological studies to better stratify BC based on genetic drivers of treatment response rather than subtyping based on expression profiling alone [10,32].
The present study demonstrates that subtyping by an unsupervised differential gene expression based on RNA-Seq is possibly useful in predicting survival outcomes. Furthermore, regardless of subtyping classification, only clinical parameters such as N-stage and LVI were independent predictors of prognosis in multivariate analyses. In any cohort, subtypes by any classification may provide suboptimal efficacy to predict the clinical outcome if not used together with clinical parameters. The limitation of our study was the relatively low number of BC samples used in the transcriptomic analysis. In addition, most clustering to date uses multi-omics platforms which combine genome variants with expression data [33]. In the present study, although the clustering could be validated with the TCGA dataset, the method should be further evaluated in an adequate number of Thai MIBC to narrow down the representative markers for clinical application. Recent translational research focused on using non-invasive biomolecular markers such as urinary biomarkers or blood profiles in the diagnosis and classification of BC [34][35][36]. In combination with transcriptomic profiling data, more precise stratification of patient care in BC can be expected in the near future.

Conclusions
Molecular subtyping classifications have provided insight into the biology of bladder tumors, especially regarding the relationship between tumor heterogeneity and prognosis. Results from multiple cohorts and our classification systems revealed that subtypes are strongly associated with histopathologic grade and are consistent with clinical parameters to predict BC patient outcomes. Cluster B in our study had a significantly higher survival probability with the current standard treatment. However, a disparity in subtyping consensus between studies exists and needs verification by modern high-throughput genotyping techniques. Further investigation is needed to find the clinical applicability of molecular subtypes before their incorporation into the personalized care of MIBC patients.