Identification of Long Non-Coding RNA MIR4435-2HG as a Prognostic Biomarker in Bladder Cancer

The abnormal expression of long non-coding RNAs(lncRNAs) is closely related to the prognosis of patients. This finding may indicate a new target for the treatment of malignant tumors. Non-muscle invasive bladder cancer (NMIBC) is the most common subtype of bladder cancer, and BCG intravesical therapy is the first-line treatment for NMIBC, but about half of NMIBC patients relapse within two years after BCG treatment. Therefore, it is necessary to screen out lncRNAs related to the prognosis and treatment of BGC-resistant patients. Here, we performed differential expression analysis of lncRNAs in the Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) datasets, and screened MIR4435-2HG as the only BCG-response-related lncRNA. Then, the prognosis value of MIR4435-2HG was validated in several publicly available cohorts, and confirmed its prognostic value in bladder cancer of different stages. In addition, we also analyzed its genetic alterations, clinical relevance, function enrichment, and association with other biomarkers. Further validation of the expression of MIR4435-2HG might be helpful to instruct NMIBC prognosis and treatment.


Introduction
Bladder cancer is a common malignancy and is the fourth most common cancer in men [1]. With the improvement in diagnostics and technology, the morbidity of bladder cancer has gradually increased in recent years [2]. Although the management of bladder cancer has made great progress in the past two decades, there is a significant rate of recurrence [3].
Bladder cancer is divided into two categories: non-muscle invasive bladder cancer (NMIBC) and muscle invasive bladder cancer (MIBC), according to the depth and level of the invasion of bladder tumor [4]. Although the surgical treatment of NMIBC has a good effect, the one year recurrence rate can reach 15 to 70% [5]. In order to reduce the recurrence of intermediate-and high-risk NMIBC patients, the EAU guidelines recommended that postoperative intravesical instillation therapy should be maintained for more than one year [6]. BCG is the first-line treatment for NMIBC, in patients with intermediate-and high-risk of NMIBC, the effect is more pronounced than that of chemotherapeutic drug infusion, but about half of NMIBC patients still relapse within two years after BCG treatment [7,8]. Additionally, the BCG treatment changed the pattern of patients' prognosis. The existing NMIBC scale, EORTC and EAU prognostic factor risk groups overestimated the risk of progression in patients treated with BCG, and the EORTC tables overestimated the risk of recurrence in patients treated with BCG [9,10]. The poor prognosis of NMIBC is related to trained immunity, while lncRNA can regulate trained immunity [11,12]. Therefore, lncRNAs could potentially assist the prognosis prediction in BCG-treated patients.
Most RNA in human cells are non-coding RNA (ncRNA), which is classified into micro RNAs(miRNAs), circular RNAs (circRNAs), long non-coding RNAs (lncRNAs), nucleolar RNA (snRNA), and other types of ncRNAs [13,14]. lncRNAs are RNA transcripts with a length of more than 200 nucleotides and play an important role in the biological procession of disease [15]. The abnormal expression of lncRNA is closely associated with prognosis and therapy targets of malignancies [16,17].
The identification of lncRNAs as new biomarkers and prognosis for bladder cancer is promising. In the present study, we used TCGA transcriptome data and GSE176178 data, aiming to identify differentially expressed lncRNA with predictive value for BCG durable NMIBC patients. We screened an only overlap lncRNA which is named MIR4435-2HG from the above-mentioned databases. Therefore, MIR4435-2HG is likely to be associated with the prognosis of patients with NMIBC treated with BCG. Then, we validated the prognosis value of MIR4435-2HG in multiple publicly available datasets. In addition, we also analyze its genetic alterations, clinical relevance, function enrichment, and association with other biomarkers. Accordingly, we proposed to assess MIR4435-2HG expression and clinical prognosis of BCG durable NMIBC patients in these databases.

Data Source and Preprocessing
The FPKM-normalized transcriptome data of the Cancer Genome Atlas (TCGA) and the relevant clinical information were downloaded from the UCSC Xena (https: //xenabrowser.net/datapages (accessed on 28 November 2021)) [18]. The lncRNAs' expressions are selected. The count RNA-seq expression data of GSE176178 and FPKMnormalized RNA-seq of GSE154261 were retrieved from the Gene Expression Omnibus (GEO). The transcriptional expression profiling by array in GSE13507 was downloaded from GEO and normalized with the "limma" package [19]. The normalized expression profile and clinical information of the UROMOL cohort were downloaded from the supplementary data of the UROMOL project publication [20]. Information about the above datasets is in Table S1.

Differently Expressed Gene (DEG) Analysis
The DEG of normal and tumor in the TCGA-BLCA cohort were computed using the "limma" package [19]. The count data of GSE176178 were used as input for the "DESeq2" package to identify DEGs between BCG durable and non-durable patients [21]. In GSE176178, BCG non-durable was defined as patients with recurrence of BCa (any stage or grade) within 2 years, and BCG durable had no recurrences detected during follow-up with a disease-free interval of at least 2 years [22]. The threshold for the adjusted p-value was set as <0.05, and >0.5 for log2 fold change (logFC).

Survival Analysis
The expression of MIR4435-2HG and oncologic outcomes were integrated to assess the prognostic value of MIR4435-2HG expression. The oncologic outcomes included overall survival (OS), progression-free survival (PFS), and recurrence-free survival (RFS) if available. The optimal cut-off values of MIR4435-2HG expression were calculated by the "surv_cutpoint" function of "survminer" package and the minimal proportion of observations per group was set as 0.2. The expression level was divided into high and low levels according to cut-off values. The Kaplan-Meier survival analysis with log-rank test was applied to estimate the effect of MIR4435-2HG level (high vs. low) on patient outcomes. Univariate and multivariate analyses with the Cox proportional hazards model were used to identify independent predictors of BCa outcomes.

Function Enrichment
We utilized cBioPortal to find the MIR4435-2HG co-expressed genes in the TCGA-BLCA cohort [23]. Co-expressed genes with a Spearman's r > 0.6 were selected to perform function enrichment. The clusterProfiler was used to perform gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) enrichment analysis [24].

Association with Other Biomarkers
The immune infiltrates of the TCGA-BLCA cohort by quanTIseq algorithms were downloaded from the TIMER 2.0 database [25,26]. The homologous recombination deficiency (HRD) score, DNA methylation-based stemness scores (DNAss), and RNA-based stemness scores (RNAss) of the TCGA Pan-Cancer cohort were downloaded from UCSC Xena. The MAF file of TCGA-BLCA was downloaded and the tumor mutational burden (TMB) was calculated with Maftools [27]. The microsatellite instability (MSI) assessed by MANTIS was downloaded from Bonneville's publication [28]. The associations between MIR4435-2HG expression and these biomarkers were evaluated using Spearman's r.

Statistics Analysis and Codes
All analyses were conducted by R software (version 4.1.1), and the packages were mentioned above. All codes are available on request. Continuous data were compared using the Wilcoxon rank-sum test. Statistical significance was set at p < 0.05.

Identification of Differentially Expressed lncRNA MIR4435-2HG
The flow diagram of this analysis is presented in Figure 1. After screening, 1441 lncRNAs in TCGA dataset were selected for DEG analysis, and 376 differently expressed lncRNAs between normal and BCa tissue were identified. DEG analysis of GSE176178 dataset presented 187 DEGs between BCG durable and non-durable patients (Supplementary Figure S1). Eventually, MIR4435-2HG was identified as the only common gene in the two DEG analyses. It is worth noting that the different cohorts consist of different populations: UROMOL is NMIBC, GSE154261 is BCG-treated NMIBC, TCGA is MIBC, and GSE13507 is a mixed cohort of NMIBC and MIBC. Univariate and multivariate analyses with the Cox proportional hazards model were used to identify independent predictors of BCa outcomes.

Function Enrichment
We utilized cBioPortal to find the MIR4435-2HG co-expressed genes in the TCGA-BLCA cohort [23]. Co-expressed genes with a Spearman's r > 0.6 were selected to perform function enrichment. The clusterProfiler was used to perform gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) enrichment analysis [24].

Association with Other Biomarkers
The immune infiltrates of the TCGA-BLCA cohort by quanTIseq algorithms were downloaded from the TIMER 2.0 database [25,26]. The homologous recombination deficiency (HRD) score, DNA methylation-based stemness scores (DNAss), and RNA-based stemness scores (RNAss) of the TCGA Pan-Cancer cohort were downloaded from UCSC Xena. The MAF file of TCGA-BLCA was downloaded and the tumor mutational burden (TMB) was calculated with Maftools [27]. The microsatellite instability (MSI) assessed by MANTIS was downloaded from Bonneville's publication [28]. The associations between MIR4435-2HG expression and these biomarkers were evaluated using Spearman's r.

Statistics Analysis and Codes
All analyses were conducted by R software (version 4.1.1), and the packages were mentioned above. All codes are available on request. Continuous data were compared using the Wilcoxon rank-sum test. Statistical significance was set at p < 0.05.

Identification of Differentially Expressed lncRNA MIR4435-2HG
The flow diagram of this analysis is presented in Figure 1. After screening, 1441 lncRNAs in TCGA dataset were selected for DEG analysis, and 376 differently expressed lncRNAs between normal and BCa tissue were identified. DEG analysis of GSE176178 dataset presented 187 DEGs between BCG durable and non-durable patients (Supplementary Figure S1). Eventually, MIR4435-2HG was identified as the only common gene in the two DEG analyses. It is worth noting that the different cohorts consist of different populations: UROMOL is NMIBC, GSE154261 is BCG-treated NMIBC, TCGA is MIBC, and GSE13507 is a mixed cohort of NMIBC and MIBC.

Expressed Level of MIR4435-2HG in Diverse Clinical Groups
Next, we evaluated the expression of MIR4435-2HG from TCGA. Its expression is significantly different between tumor vs. normal group (Figure 2a) and paired sample group (Figure 2b). However, there was no significant difference in the expression of MIR4435-2HG between age groups ( Figure 2c) and gender groups (Figure 2d). In the grade groups, the level of MIR4435-2HG in high grade was markedly higher than in low grade (Figure 2e). Not only that, the TNM stage of bladder cancer was analyzed by Wilcoxon rank-sum test, and the results showed that T stage (Figure 2f

The Prognostic Value of MIR4435-2HG was Confirmed by Survival Analysis
According to the "surv cutpoint" calculated previously, the expression of MIR4435-2HG was divided into high-level and low-level in various datasets that include UROMOL, GSE154261 and TCGA. We divide the UROMOL into low-level groups and high-level groups, and Kaplan-Meier analysis showed that the low-level group had more prominent PFS in the UROMOL dataset than the high-level group (p < 0.0001) (Figure 3a). In the GSE154261 cohort, we analyzed the PFS of patients in the low-level group and in the high-

The Prognostic Value of MIR4435-2HG was Confirmed by Survival Analysis
According to the "surv cutpoint" calculated previously, the expression of MIR4435-2HG was divided into high-level and low-level in various datasets that include UROMOL, GSE154261 and TCGA. We divide the UROMOL into low-level groups and high-level groups, and Kaplan-Meier analysis showed that the low-level group had more prominent PFS in the UROMOL dataset than the high-level group (p < 0.0001) (Figure 3a). In the GSE154261 cohort, we analyzed the PFS of patients in the low-level group and in the high-level group and the results showed PFS that had discrepancy (p = 0.03) (Figure 3b). Similarly, in the TCGA cohort, OS was significant longer in the low-level group than in the high-level group (p = 0.009) (Figure 3c). Additionally, the RFS of the patient was analyzed in GSE154261, and there was a marked difference between the low-level group and the high-level group (p = 0.0083) (Supplementary Figure S2a). In the GES13507 cohort, the low-level group had a better prognosis than the high-level group (p = 0.0049) (Supplementary Figure S2b).
Genes 2022, 13, x FOR PEER REVIEW 6 of 11 level group and the results showed PFS that had discrepancy (p = 0.03) (Figure 3b). Similarly, in the TCGA cohort, OS was significant longer in the low-level group than in the high-level group (p = 0.009) (Figure 3c). Additionally, the RFS of the patient was analyzed in GSE154261, and there was a marked difference between the low-level group and the high-level group (p = 0.0083) (Supplementary Figure S2a). In the GES13507 cohort, the low-level group had a better prognosis than the high-level group (p = 0.0049) (Supplementary Figure S2b).

The Genetic Variations, Functional Enrichment and Association with Other Biomarkers of MIR4435-2HG
The cBioPortal was utilized to assess the genetic variations of MIR4435-2HG ( Figure  4a). In the 411 TCGA-BLCA patients, four high-level copy number amplifications and one deep deletion were detected. Besides, 66 low-level copy number amplifications and 76 shallow deletions were also observed. The MIR4435-2HG mRNA expression was reduced in the copy number deletion group, while the copy number amplification did not elevate the MIR4435-2HG mRNA level (Figure 4b). To investigate the potential biological processes and functions of MIR4435-2HG, the co-expressed genes were used to conduct GO analysis and KEGG pathway analysis. The dot plot of Figure 4c showed that MIR4435-2HG was significantly enriched in biological processes (BP) associated with immunity, involving neutrophil degranulation, neutrophil activation involved in immune response and neutrophil mediated immunity. In Figure 4d, the MIR4435-2HG was enriched in the extracellular matrix structural constituent related to molecular functions, such as collagen binding and integrin binding. KEGG pathway analysis manifested that MIR4435-2HG was enriched in immune-related pathways, which was consistent with the result of anterior GO analysis. The terms included focal adhesion, phagosome and salmonella infection. In addition, the term proteoglycans was related to cancer (Figure 4e). Not only that, the immune infiltrate of the TCGA-BLCA cohort indicated that MIR4435-2HG was related to immunity, including Macrophage M1/M2, Monocyte, Tregs and T cell CD8+ (Figure 4f). Other tumor-related biomarkers were also detected (Figure 4g). The aforesaid outcomes demonstrate that MIR4435-2HG was highly correlated with cancer and immunity.

The Genetic Variations, Functional Enrichment and Association with Other Biomarkers of MIR4435-2HG
The cBioPortal was utilized to assess the genetic variations of MIR4435-2HG (Figure 4a). In the 411 TCGA-BLCA patients, four high-level copy number amplifications and one deep deletion were detected. Besides, 66 low-level copy number amplifications and 76 shallow deletions were also observed. The MIR4435-2HG mRNA expression was reduced in the copy number deletion group, while the copy number amplification did not elevate the MIR4435-2HG mRNA level (Figure 4b). To investigate the potential biological processes and functions of MIR4435-2HG, the co-expressed genes were used to conduct GO analysis and KEGG pathway analysis. The dot plot of Figure 4c showed that MIR4435-2HG was significantly enriched in biological processes (BP) associated with immunity, involving neutrophil degranulation, neutrophil activation involved in immune response and neutrophil mediated immunity. In Figure 4d, the MIR4435-2HG was enriched in the extracellular matrix structural constituent related to molecular functions, such as collagen binding and integrin binding. KEGG pathway analysis manifested that MIR4435-2HG was enriched in immune-related pathways, which was consistent with the result of anterior GO analysis. The terms included focal adhesion, phagosome and salmonella infection. In addition, the term proteoglycans was related to cancer (Figure 4e). Not only that, the immune infiltrate of the TCGA-BLCA cohort indicated that MIR4435-2HG was related to immunity, including Macrophage M1/M2, Monocyte, Tregs and T cell CD8+ (Figure 4f). Other tumor-related biomarkers were also detected (Figure 4g). The aforesaid outcomes demonstrate that MIR4435-2HG was highly correlated with cancer and immunity.

The Independent Predictors of BCa Were Identified by Univariate and Multivariate Cox Analyses
We verified independent prognostic values of BCa by Cox proportional hazards model. As shown in Table 1, the MIR4435-2HG level and some clinical features were identified as highly correlated with PFS in patients with UROMOL by univariate Cox analysis. We conducted a multivariate Cox analysis of highly correlated predictors, such as MIR4435-2HG level (p = 0.0002), tumor stage T1 (p < 0.0001), tumor grade (p < 0.0001), EORTC high risk (p < 0.0001), EAU high risk (p = 0.0082). The results of multivariate Cox analysis displayed three significantly independent risk factors including high MIR4435-2HG level (HR: 9.14, 95% CI: 2.21-37.76, p = 0.0022), tumor stage T1 (HR: 2.9, 95% CI: 1.46-5.77, p = 0.0024), and high risk according to the EORTC risk  (c-e) Gene ontology (GO) and Kyoto encyclopedia of genes genomes (KEGG) enrichment analysis for MIR4435-2HG, genes with a Spearman's r > 0.6 were used as input. (f,g) Immune infiltrates of TCGA-BLCA and association between MIR4435-2HG expression and these biomarkers were evaluated using Spearman's r.

The Independent Predictors of BCa Were Identified by Univariate and Multivariate Cox Analyses
We verified independent prognostic values of BCa by Cox proportional hazards model. As shown in Table 1, the MIR4435-2HG level and some clinical features were identified as highly correlated with PFS in patients with UROMOL by univariate Cox analysis. We conducted a multivariate Cox analysis of highly correlated predictors, such as MIR4435-2HG level (p = 0.0002), tumor stage T1 (p < 0.0001), tumor grade (p < 0.0001), EORTC high risk (p < 0.0001), EAU high risk (p = 0.0082). The results of multivariate Cox analysis displayed three significantly independent risk factors including high MIR4435-2HG level (HR: 9.14, 95% CI: 2.21-37.76, p = 0.0022), tumor stage T1 (HR: 2.9, 95% CI: 1.46-5.77, p = 0.0024), and high risk according to the EORTC risk

Discussion
Bladder cancer is a highly prevalent disease with high morbidity and mortality in developed countries [2]. Although great progress has been made in the treatment of bladder cancer, it is still difficult to identify reliable molecular markers for the prognosis of bladder cancer [29]. In recent years, an increasing number of studies have reported that lncRNAs are associated with the pathophysiology and progression of bladder cancer, and they are expected to become new prognostic biomarkers for BCa [30].
BCG immunotherapy is used in the treatment of NMIBC, which is the gold standard for non-recurrent or progressive NMIBC [29]. BCG vaccine induces specific tumor immunity, which may be due to the internalization of the BCG vaccine in the urinary tract epithelium, resulting in the production of cytokines and chemokines, and then recruits a series of immune cells to participate in the immune response [31]. Clinically, the treatment of BCG on NMIBC results in no effect or resistance which are inexplicable [32]. Thus, there is an urgent need to find markers related to BCG response.
The functional annotation of non-coding RNA (ncRNA) is not well-explored, so most of the studies are limited to the prognosis values of protein-coding genes, and there are relatively few studies on ncRNA such as long non-coding RNA [33]. In particular, there are few lncRNA-related studies in NMIBC, and the exploration of the association with BCG treatment response has not been involved [34][35][36]. In the current study, we identified that MIR4435-2HG expression may serve as a valuable predictor for BCG durable in patients with NMIBC.
The lncRNA MIR4435-2HG, also known as LINC00978, is located on human chromosome 2q13. It plays an oncogenic role that is implicated in various tumors [37]. For example, the MIR4435-2HG was highly expressed in liver cancer tissue and hepatoma cell lines, and was positively associated with progression and poor prognosis of liver cancer patients [38]. Previous studies have shown that MIR4435-2HG promotes the proliferation and invasion of non-small cell lung cancer by inhibiting miR-6754-5p expression [39]. Additionally, MIR4435-2HG knockdown can inhibit the proliferation, invasion and migration of prostate cancer PC-3 cells by inhibiting FAK/AKT/β-catenin signaling pathway [40]. Moreover, Chen et al. confirmed through experiments that the knockdown of MIR4435-2HG can lead to the inactivation of Wnt/β-catenin signaling pathway, further increased cell apoptosis and decreased cell proliferation, migration and invasion [41]. Wang et al. showed that knocked down MIR4435-2HG inhibited the cell proliferation, colony formation, and migration of bladder cancer cells. They also demonstrated that MIR4435-2HG can sponge miR-4428 to promote cancer progression; it is expected to become a potential therapeutic target for bladder cancer [42].
In this study, MIR4435-2HG, an lncRNA marker, was screened out through the exploration of public data mining. MIR4435-2HG showed a certain prognostic value in the cohort of patients with NMIBC and MIBC, and it was also related to the response to BCG perfusion therapy in patients with NMIBC. It can be used to evaluate the clinical prognostic significance of patients with BCa and can be used as a molecular marker target for the treatment of drug resistant patients with BCG. The main drawback of this study is that all the analyses were based on publicly available datasets. However, through the joint verification of MIR4435-2HG by multiple cohorts, the prognosis with a significant stratification effect is obtained, and the result is robust enough. In addition, Wang et al. have experimentally verified the function of MIR4435-2HG in bladder cancer cells, which supported the important role of MIR4435-2HG. However, the downstream targets and related pathways of miR-4428 still need to be further studied. At the same time, whether MIR4435-2HG has an effect on the tumor immune microenvironment and its specific mechanism also needs to be explored.

Conclusions
In conclusion, we analyzed the public data of multiple bladder cancer cohorts and found that MIR4435-2HG has a significant prognostic value. Our findings provide a new insight for the prediction of bladder cancer, not only to provide auxiliary indicators for clinical application to judge the prognosis and to formulate management strategies, but also to provide new directions for experimental research.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/genes13081462/s1, Figure S1: DEG analysis is presented by volcanic maps. (a) lncRNAs in TCGA dataset were screened between normal and BCa tissue (Red indicates an upregulation, Blue is the opposite). (b) DEG analysis of GSE176178 according to BCG durable and non-durable patients (Red indicates an upregulation, Blue is the opposite); Figure S2: The Kaplan-Meier survival analysis with log-rank test was applied to estimate the effect of MIR4435-2HG level (high vs. low) on patient outcomes. (a) The recurrence-free survival (RFS) in GSE154261 cohort. (b) The overall survival (OS) in GSE13507 cohort; Table S1: Information about the dataset.