LncRNA/miRNA/mRNA Network Introduces Novel Biomarkers in Prostate Cancer

The construction of a competing endogenous RNA (ceRNA) network is an important step in the identification of the role of differentially expressed genes in cancers. In the current research, we used a number of bioinformatics tools to construct the ceRNA network in prostate cancer and identify the importance of these modules in predicting the survival of patients with this type of cancer. An assessment of microarray data of prostate cancer and normal samples using the Limma package led to the identification of differential expressed (DE) RNAs that we stratified into mRNA, lncRNA, and miRNAs, resulting in 684 DEmRNAs, including 437 downregulated DEmRNAs (such as TGM4 and SCGB1A1) and 241 upregulated DEmRNAs (such as TDRD1 and CRISP3); 6 DElncRNAs, including 1 downregulated DElncRNA (H19) and 5 upregulated DElncRNAs (such as PCA3 and PCGEM1); and 59 DEmiRNAs, including 30 downregulated DEmiRNAs (such as hsa-miR-1274a and hsa-miR-1274b) and 29 upregulated DEmiRNAs (such as hsa-miR-1268 and hsa-miR-1207-5p). The ceRNA network contained a total of 5 miRNAs, 5 lncRNAs, and 17 mRNAs. We identified hsa-miR-17, hsa-miR-93, hsa-miR-150, hsa-miR-25, PART1, hsa-miR-125b, PCA3, H19, RND3, and ITGB8 as the 10 hub genes in the ceRNA network. According to the ROC analysis, the expression levels of 19 hub genes showed a high diagnostic value. Taken together, we introduce a number of novel promising diagnostic biomarkers for prostate cancer.


Introduction
Prostate cancer is the most frequent malignancy in males worldwide, accounting for 27% of all cancer diagnoses [1]. During the period from 2014 to 2018, the incidence of prostate cancer has stayed stable, yet the annual incidence for advanced prostate cancer has increased by 4-6% since 2011 [1]. This means that the percentage of prostate cancer cases being diagnosed at advanced stages has increased by approximately two times over the past decade. Consequently, it is necessary to find appropriate markers for an early diagnosis of this common cancer. The prostate-specific antigen (PSA), as the mostly used marker, has some limitations, including the poor interchangeability of PSA results which are acquired from diverse tests. The current gaps should be filled by arranging commutable reference materials for calibration immunoassay tests, identifying analytical features that can clarify the different performance of assays, and giving more focus on laboratory tests when clinical practice guidelines are organized [2].
It is also important to mention that high-grade prostate cancer is likely on a genetic basis, whereas low-grade disease is more likely to be associated with environmental factors. In fact, the vivid alteration in stage from the time when PSA was introduced as a screening test has been accompanied by a more modest shift in Gleason grade, suggesting that grade may be established in the early phases of tumor pathogenesis [3]. Meanwhile, metastatic prostate cancers exhibit histological and immunophenotypical heterogeneity, necessitating the conduction of individualized therapeutic regimens [4]. For reasons of cost-effectiveness, this panel should be applied to individuals at risk of high-grade prostate cancer, likely based on PSA levels. According to current guideline recommendations, more recently, a PSA-based algorithm for predicting high-grade tumors has been developed [5].
The recent decade has witnessed a revolution in the application of high throughput data analysis tools, leading to the construction of several interaction networks between different types of biomolecules including RNA, DNA, and protein in different disease contexts, particularly cancer. This type of study has led to the identification of promising biomarkers for the early detection of cancer. In the field of prostate cancer, several efforts have been made. For instance, the re-analysis of high throughput expression data has led to the identification of several differentially expressed genes (DEGs) between cancerous and normal tissues. Further functional enrichment analyses have shown the relationship between these genes and clinical outcomes of patients [6]. Similar approaches have identified differentially expressed circular RNAs in this type of cancer and the main signaling pathways being controlled by these transcripts [7].
The competing endogenous RNA (ceRNA) network being constructed between long non-coding RNAs (lncRNAs), microRNAs (miRNAs), and mRNAs has been shown to contribute to the pathoetiology of different cancers and regulate fundamental processes, both within cancer cells and inside the tumor microenvironment [8,9]. Identification of this type of interaction between different RNA molecules can introduce novel biomarkers for cancer diagnosis. In the current research, we used a number of bioinformatics tools to construct the ceRNA network in prostate cancer and identify the importance of these modules in the prediction of survival of patients with this type of cancer.

Microarray Data Collection
The Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) (accessed on 12 September 2022) was used to obtain the human expression data for GSE88808 (Illumina HumanHT-12 WG-DASL V4.0 expression beadchip (gene summary, GenomeStudio report), Duarte, CA, USA), GSE200879 ((HTA-2_0) Affymetrix Human Transcriptome Array 2.0 (Genosplice), Créteil, France), and GSE60117 (Agilent-021827 Human miRNA Microarray (V3), Palo Alto, CA, USA). There were 98, 137, and 77 samples in each of these datasets, respectively. For further analysis, we chose 113 prostate cancers and 9 normal tissues from GSE200879, 49 prostate tumors and 49 adjacent normal samples from GSE88808, and 56 prostate tumors and 21 normal tissues from GSE60117 (Table 1). Moreover, we selected prostate tumor samples with Gleason scores of 7 or above. The expression data included the expression signatures of lncRNAs, miRNAs, and mRNAs. language R was used for the processing and integration of microarray data. Data were initially normalized independently for pre-processing using the preprocessCore package's normalizeQuantiles function (https://bioconductor.org/packages/release/bioc/html/ preprocessCore.html (accessed on 14 September 2022). Next, data from both platforms were combined using the R software. The ComBat function from the R Package Surrogate Variable Analysis (SVA) was utilized to exclude batch effects (non-biological differences) [10]. The batch-effect removal was assessed using PCA and a boxplot. A unit expression matrix was the outcome of the meta-analysis (the combination of three datasets of this study).

Analysis of Differentially Expressed lncRNAs, miRNAs and mRNAs
We used the Limma package in R language [11] to identify differentially expressed mR-NAs (DEmRNAs), lncRNAs (DElncRNAs), and miRNAs (DEmiRNAs) between prostate and normal samples. GSE88808 and GSE200879 were used to obtain DEmRNAs and DEl-ncRNAs. GSE60117 was utilized to acquire DEmiRNAs. The cut-off criteria for evaluating DEGs were |log2 fold Change (FC)| > 0.5 and false discovery rate (FDR; adjusted p value) < 0.05. Following that, we used the HUGO gene nomenclature committee to identify DElncRNAs.

Two-Way Clustering of DEGs
We determined the gene expression levels of significant DEGs. We used this data in the pheatmap package in R language [12] to complete the two-way clustering based on the Euclidean distance.

TCGA Data Collection and Processing
We included a total of 500 PRAD samples and 52 control samples for further analysis. We used the TCGAbiolinks package to download the transcriptome profiling data (TCGA-PRAD), and limma and edgeR packages to analyze the data. As a result, DEGs were evaluated with the cut-off criteria of the false discovery rate (FDR; adjusted p value) < 0.05 and |log2 fold Change (FC)| > 0.5. Finally, we identified the genes that were present in both the TCGA and GEO datasets.

Gene Ontology (GO) Enrichment Analysis
We used the clusterProfiler R package [13] to perform gene ontology (GO) enrichment analysis to investigate the functions of the remarkably upregulated and downregulated DEGs that we discovered. The functional category criteria were established at an adjusted p-value of 0.05 or below.

Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Analysis
To determine the potential roles of DEGs that took part in the pathways based on the KEGG database, KEGG pathway analyses of these genes were conducted [14].

PPI Network Construction and Hub Genes Identification
The PPI network for DEGs was built using the STRING database [15]. The highest level of confidence (confidence score > 0.9) and text mining, experiments, and database sources were used to establish the interactions parameter. The interactions between the proteins were examined using the Cytoscape software v3.9 [16]. Finally, the Cytohubba plugin [17] of Cytoscape was used to calculate the degree of connectivity of nodes to find the top 20 DEGs as hub genes.

Regulatory Network of miRNA-Hub Genes and TF-Hub Genes
The Networkanalyst database [18] was used to create the relationships between the PPI hub genes and the transcription factors (TFs) and miRNAs. As a result, we identified TF and miRNA with the highest degree in the networks.

Validation of Hub Genes via Expression Values and Receiver Operating Characteristic (ROC) Curve Analysis
The expression value of hub genes was assessed using the Gepia database [23]. The hub genes in the TCGA-PRAD RNA-seq data were examined, and those present in the PPI and ceRNA networks as well as in the TCGA-PRAD were chosen for gene expression validation. Additionally, the area under the curve (AUC) values derived from ROC curve analysis were used to assess the diagnostic efficacy of hub genes. Figure 1 demonstrates the workflow of the study.

Regulatory Network of miRNA-Hub Genes and TF-Hub Genes
The Networkanalyst database [18] was used to create the relationships between the PPI hub genes and the transcription factors (TFs) and miRNAs. As a result, we identified TF and miRNA with the highest degree in the networks.

Validation of Hub Genes via Expression Values and Receiver Operating Characteristic (ROC) Curve Analysis
The expression value of hub genes was assessed using the Gepia database [23]. The hub genes in the TCGA-PRAD RNA-seq data were examined, and those present in the PPI and ceRNA networks as well as in the TCGA-PRAD were chosen for gene expression validation. Additionally, the area under the curve (AUC) values derived from ROC curve analysis were used to assess the diagnostic efficacy of hub genes. Figure 1 demonstrates the workflow of the study.   The boxplot of unprocessed raw data and normalized data after batch effect removal is shown in Figure 2. These boxplots demonstrate the accuracy of the normalization and quality of the expression data. A total of 220 samples are displayed in the PCA plot ( Figure 3) on the 2D plane covered by their first two main components (PC1 and PC2). This figure shows the integration of two samples and the elimination of the batch effect. The boxplot of unprocessed raw data and normalized data after batch effect removal is shown in Figure 2. These boxplots demonstrate the accuracy of the normalization and quality of the expression data. A total of 220 samples are displayed in the PCA plot ( Figure  3) on the 2D plane covered by their first two main components (PC1 and PC2). This figure shows the integration of two samples and the elimination of the batch effect.

DEGs Analysis
The Limma package (version 3.52.3) was used to analyze microarray data from prostate cancer and normal samples. This analysis resulted in the identification of 684 DEmR-

DEGs Analysis
The Limma package (version 3.52.3) was used to analyze microarray data from prostate cancer and normal samples. This analysis resulted in the identification of 684 DEmRNAs, including 437 downregulated DEmRNAs (such as TGM4 and SCGB1A1) and 241 upregulated DEmRNAs (such as TDRD1 and CRISP3); 6 DElncRNAs, including 1 downregulated DElncRNA (H19) and 5 upregulated DElncRNAs (such as PCA3 and PCGEM1); and 59 DEmiRNAs, including 30 downregulated DEmiRNAs (such as hsa-miR-1274a and hsa-miR-1274b) and 29 upregulated DEmiRNAs (such as hsa-miR-1268 and hsa-miR-1207-5p). Tables 2-4 present the most substantially up-and down-regulated DEmRNAs, DElncRNAs, and DEmiRNAs, respectively.   In order to compare the variation in miRNA, lncRNA, and mRNA expressions between prostate cancer and normal samples, we utilized the volcano plot using the EnhancedVolcano package [24] in R (Figure 4). In addition, two heatmaps demonstrated that 20 clearly distinct DEmRNA expression patterns between prostate and normal samples were identifiable ( Figure 5A). The expression of DElncRNAs is also shown in a heatmap ( Figure 5B). In order to compare the variation in miRNA, lncRNA, and mRNA expressions between prostate cancer and normal samples, we utilized the volcano plot using the En-hancedVolcano package [24] in R ( Figure 4). In addition, two heatmaps demonstrated that 20 clearly distinct DEmRNA expression patterns between prostate and normal samples were identifiable ( Figure 5A). The expression of DElncRNAs is also shown in a heatmap ( Figure 5B).

TCGA Data Analysis
All available TCGA data on PRAD were obtained from the TCGA data portal using TCGAbiolinks package in R programming language. In September 2022, there were RNAseq data on 552 PRAD samples, including 500 primary solid tumor and 52 solid tissue normal samples. We analyzed this data using limma and edgeR packages to retrieve DEGs. DEGs were evaluated with the cut-off criteria of false discovery rate (FDR; adjusted

TCGA Data Analysis
All available TCGA data on PRAD were obtained from the TCGA data portal using TCGAbiolinks package in R programming language. In September 2022, there were RNAseq data on 552 PRAD samples, including 500 primary solid tumor and 52 solid tissue normal samples. We analyzed this data using limma and edgeR packages to retrieve DEGs. DEGs were evaluated with the cut-off criteria of false discovery rate (FDR; adjusted p value) < 0.05 and |log2 fold Change (FC)| > 0.5. Finally, we identified the genes that exist in both GEO datasets and the TCGA dataset ( Figure 6). As a result, we found out that there were 193 upregulated and 416 downregulated DEGs in both GEO datasets and the TCGA dataset. We continued the analysis with these genes. (B)

TCGA Data Analysis
All available TCGA data on PRAD were obtained from the TCGA data portal using TCGAbiolinks package in R programming language. In September 2022, there were RNAseq data on 552 PRAD samples, including 500 primary solid tumor and 52 solid tissue normal samples. We analyzed this data using limma and edgeR packages to retrieve DEGs. DEGs were evaluated with the cut-off criteria of false discovery rate (FDR; adjusted p value) < 0.05 and |log2 fold Change (FC)| > 0.5. Finally, we identified the genes that exist in both GEO datasets and the TCGA dataset ( Figure 6). As a result, we found out that there were 193 upregulated and 416 downregulated DEGs in both GEO datasets and the TCGA dataset. We continued the analysis with these genes.       Figure 10 indicates a network of GO terms and Figure 11 shows the gene-concept network of the top five GO terms.    The connection of genes to the corresponding GO is marked with a special color; there are more genes for a specific GO term if the dot relating to it is bigger.

GO Enrichment Analysis of DEGs
In Figure 12, the intersection of the top 10 GO phrases was represented by the UpSet plot. It highlights the gene overlap between several gene sets.  Figure 11. Top 5 GO terms as a network plot. These GO terms are connected to genes in this graph. The connection of genes to the corresponding GO is marked with a special color; there are more genes for a specific GO term if the dot relating to it is bigger.
In Figure 12, the intersection of the top 10 GO phrases was represented by the UpSet plot. It highlights the gene overlap between several gene sets.

Pathway Analysis
Using Pathview [25] and gage [26] packages in R, to find the probable functional genes, the KEGG pathway analysis of 241 upregulated and 437 downregulated DEGs was carried out (Table 5 and Figure 13).

Pathway Analysis
Using Pathview [25] and gage [26] packages in R, to find the probable functional genes, the KEGG pathway analysis of 241 upregulated and 437 downregulated DEGs was carried out (Table 5 and Figure 13).  Figure 13. Visualization of the pathways [25]. Green boxes are downregulated genes. Figure 13. Visualization of the pathways [25]. Green boxes are downregulated genes.

PPI Network Construction and Selection of Hub Genes
To find the hub genes, a PPI network of DEGs ( Figure 15) with 178 nodes and 185 edges created from STRING was imported into the Cytohubba plugin of Cytoscape 3.9. ITGA2, ITGA3, CAV1, PRKCA, ITGB4, ITGA8, MET, VEGFA, GPC1, MMP9, CAMK2B, LAMA3, PAK1, MYH11, and ITGB6 were the 15 hub genes with the highest degree of connectivity. Table 6 shows information about these hub genes. The greatest degree to the lowest degree is used to order these hubs.
(A) Figure 14. A total of 21 DEGs are related to cell senescence.

PPI Network Construction and Selection of Hub Genes
To find the hub genes, a PPI network of DEGs ( Figure 15) with 178 nodes and 185 edges created from STRING was imported into the Cytohubba plugin of Cytoscape 3.9. ITGA2, ITGA3, CAV1, PRKCA, ITGB4, ITGA8, MET, VEGFA, GPC1, MMP9, CAMK2B, LAMA3, PAK1, MYH11, and ITGB6 were the 15 hub genes with the highest degree of connectivity. Table 6 shows information about these hub genes. The greatest degree to the lowest degree is used to order these hubs.

PPI Network Construction and Selection of Hub Genes
To find the hub genes, a PPI network of DEGs ( Figure 15) with 178 nodes and 185 edges created from STRING was imported into the Cytohubba plugin of Cytoscape 3.9. ITGA2, ITGA3, CAV1, PRKCA, ITGB4, ITGA8, MET, VEGFA, GPC1, MMP9, CAMK2B, LAMA3, PAK1, MYH11, and ITGB6 were the 15 hub genes with the highest degree of connectivity. Table 6 shows information about these hub genes. The greatest degree to the lowest degree is used to order these hubs. (A)

Inspection of the Regulatory Network of miRNA-Hub Genes
The miRNAs that target hub genes were collected from the Networkanalyst web database ( Figure 16). Both Hsa-miR-26b-5p and Hsa-miR-1-3p were considered important miRNAs because they interacted with hub genes at the highest level (degree 5) possible.

Inspection of the Regulatory Network of miRNA-Hub Genes
The miRNAs that target hub genes were collected from the Networkanalyst web database (Figure 16). Both Hsa-miR-26b-5p and Hsa-miR-1-3p were considered important miRNAs because they interacted with hub genes at the highest level (degree 5) possible.

Examination of the Regulatory Network of TF-Hub Genes
By using the Networkanalyst database, we were able to acquire TFs that target hub genes ( Figure 17). The TF-hub gene network revealed that SUZ12 regulates 12 hub genes and may play a significant role in the development of prostate cancer.

Examination of the Regulatory Network of TF-Hub Genes
By using the Networkanalyst database, we were able to acquire TFs that target hub genes ( Figure 17). The TF-hub gene network revealed that SUZ12 regulates 12 hub genes and may play a significant role in the development of prostate cancer.

ceRNA Network Construction in Prostate Cancer
The interaction between lncRNAs and miRNAs was evaluated using miRcode. This step showed that 5 of the 6 lncRNAs may target 20 of the 59 PC-specific DEmiRNAs (Table  7). Then, to determine which mRNAs were targeted by these 20 miRNAs, we utilized miRWalk in combination with the miRTarBase, TargetScan, and miRDB filters. According to the research, 5 of the 684 mRNAs may be targeted by 5 miRNAs (Table 8). If miRNA-

ceRNA Network Construction in Prostate Cancer
The interaction between lncRNAs and miRNAs was evaluated using miRcode. This step showed that 5 of the 6 lncRNAs may target 20 of the 59 PC-specific DEmiRNAs (Table 7). Then, to determine which mRNAs were targeted by these 20 miRNAs, we utilized miRWalk in combination with the miRTarBase, TargetScan, and miRDB filters. According to the research, 5 of the 684 mRNAs may be targeted by 5 miRNAs (Table 8). If miRNAtargeted mRNAs were not found in DEmRNAs, they were eliminated. The information from Tables 7 and 8 was utilized to construct the lncRNA-miRNA-mRNA ceRNA network in Cytoscape 3.9. The ceRNA network contained a total of 5 miRNAs, 5 lncRNAs, and 17 mRNAs ( Figure 18). We displayed this ceRNA network using a Sankey diagram generated by the ggalluvial R package (Version: 0.12.3) [27] in order to better understand the impact of lncRNAs on mRNAs in prostate, of which is mediated by their interaction with miRNAs ( Figure 19). Finally, we determined the nodes' degrees using the cytohubba app, and we displayed the top 10 nodes in the network with the highest degree centrality ( Figure 20). As 10 hub genes in the ceRNA network, we identified hsa-miR-17, hsa-miR-93, hsa-miR-150, hsa-miR-25, PART1, hsa-miR-125b, PCA3, H19, RND3, and ITGB8.     Figure 19. The ceRNA network in prostate cancer is shown by a Sankey diagram. Each rectangle represents a gene, and depending on the size of the rectangle, the degree of relationship between each gene is shown. Figure 19. The ceRNA network in prostate cancer is shown by a Sankey diagram. Each rectangle represents a gene, and depending on the size of the rectangle, the degree of relationship between each gene is shown. Figure 19. The ceRNA network in prostate cancer is shown by a Sankey diagram. Each rectangle represents a gene, and depending on the size of the rectangle, the degree of relationship between each gene is shown.

Validation of Hub Genes via Expression Value
The expression value of hub genes was assessed using the Gepia (http://gepia.cancerpku.cn/) (accessed on 30 September 2022). Additionally, we used CancerMIRNome

Validation of Hub Genes via ROC Curve
We used graphpad prism 9.0 and CancerMIRNome to construct ROC curves. The ROC curve was used to evaluate how accurately the hub genes predicted outcomes. AUC was used to compare the diagnostic values of these hub genes. ROC curves and AUC values of the dataset are shown in Figure 22. The computed AUC values in this study, which were based on the findings, varied from 0.7 to 1-this is considered to have high discriminative

Validation of Hub Genes via ROC Curve
We used graphpad prism 9.0 and CancerMIRNome to construct ROC curves. The ROC curve was used to evaluate how accurately the hub genes predicted outcomes. AUC was used to compare the diagnostic values of these hub genes. ROC curves and AUC values of the dataset are shown in Figure 22. The computed AUC values in this study, which were based on the findings, varied from 0.7 to 1-this is considered to have high discriminative power. The expression levels of 22 hub genes had a high diagnostic value, according to the ROC analysis.

Validation of Hub Genes via ROC Curve
We used graphpad prism 9.0 and CancerMIRNome to construct ROC curves. The ROC curve was used to evaluate how accurately the hub genes predicted outcomes. AUC was used to compare the diagnostic values of these hub genes. ROC curves and AUC values of the dataset are shown in Figure 22. The computed AUC values in this study, which were based on the findings, varied from 0.7 to 1-this is considered to have high discriminative power. The expression levels of 22 hub genes had a high diagnostic value, according to the ROC analysis.

Discussion
In order to find novel biomarkers for prostate cancer, we used a bioinformatics approach and constructed the ceRNA network in this context. We also valued the importance of the identified hub genes in the pathogenesis of prostate cancer and found their association with signaling pathways.
In the first step, we identified several DEGS. TGM4 and SCGB1A1 have been among down-regulated mRNAs in this type of cancer. TGM4 gene codes for transglutaminase 4, a protein with a restricted pattern of expression toward prostate. The encoded protein has been shown to regulate the interactions between prostate cancer cells and vascular endothelial cells through bypassing the ROCK pathway [28]. Moreover, TDRD1 and CRISP3 have been among upregulated mRNAs. TDRD1 is responsible for coding a protein containing a tumor domain that suppresses transposable elements during spermatogenesis. The encoded protein has been shown to be expressed in most human prostate tumors, in spite of its absence in normal prostate tissues [29]. CRISP3 is found in low quantities in seminal plasma. The over-expression of CRISP3 in addition to the down-regulation of PTEN illustrates a subgroup of prostate cancer patients with a high probability of biochemical recurrence [30]. We also reported the down-regulation of H19 and up-regulation of a number of lncRNAs, such as PCA3 and PCGEM1. Notably, PCA3 is probably the most important lncRNA biomarker for prostate cancer [31].
Key factors known to be involved in prostate cancer tumorigenesis were identified with this approach. However, there are limitations. Importantly, the bioinformatic prediction is strongly dependent on the patient-derived data sets and number of data sets. We have used here a large number of data sets and from different original sources to minimize off-target finings. Still, it may be possible that using other patient-derived data sets will provide results that do not show a complete overlap of the ceRNA network.
KEGG pathway analysis has revealed glutathione metabolism and the regulation of actin cytoskeleton as the mostly down-regulated pathways. DEmiRNAs have also been reported to participate in a number of critical signaling pathways that affect prostate carcinogenesis. The ceRNA network contained a total of 5 miRNAs, 5 lncRNAs, and 17 mRNAs. We identified hsa-miR-17, hsa-miR-93, hsa-miR-150, hsa-miR-25, PART1, hsa-miR-125b, PCA3, H19, RND3, and ITGB8 as the 10 hub genes in the ceRNA network. According to the ROC analysis, the expression levels of 19 hub genes showed a high diagnostic value. Therefore, the constructed ceRNA network has been shown to affect important cellular pathways in prostate carcinogenesis and influence the prognosis of patients with this type of cancer. Taken together, we introduce a number of novel, promising diagnostic biomarkers for prostate cancer. This ceRNA-based panel can be applied as a second-level test to patients with certain PSA levels. This level should be identified in future studies.

Data Availability Statement:
The analyzed data sets generated during the study are available from the corresponding author on reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.