A Radiogenomic Approach for Decoding Molecular Mechanisms Underlying Tumor Progression in Prostate Cancer

Prostate cancer (PCa) is a genetically heterogeneous cancer entity that causes challenges in pre-treatment clinical evaluation, such as the correct identification of the tumor stage. Conventional clinical tests based on digital rectal examination, Prostate-Specific Antigen (PSA) levels, and Gleason score still lack accuracy for stage prediction. We hypothesize that unraveling the molecular mechanisms underlying PCa staging via integrative analysis of multi-OMICs data could significantly improve the prediction accuracy for PCa pathological stages. We present a radiogenomic approach comprising clinical, imaging, and two genomic (gene and miRNA expression) datasets for 298 PCa patients. Comprehensive analysis of gene and miRNA expression profiles for two frequent PCa stages (T2c and T3b) unraveled the molecular characteristics for each stage and the corresponding gene regulatory interaction network that may drive tumor upstaging from T2c to T3b. Furthermore, four biomarkers (ANPEP, mir-217, mir-592, mir-6715b) were found to distinguish between the two PCa stages and were highly correlated (average r = ± 0.75) with corresponding aggressiveness-related imaging features in both tumor stages. When combined with related clinical features, these biomarkers markedly improved the prediction accuracy for the pathological stage. Our prediction model exhibits high potential to yield clinically relevant results for characterizing PCa aggressiveness.


Introduction
Prostate cancer (PCa) is the second most common cancer and affects millions of men every year [1,2]. Around 16% of men are diagnosed with PCa in their lifetime in the US [3]. Due to the complex and heterogenic nature of PCa, many cellular pathways and molecular mechanisms underlying PCa progression and upstaging from one stage to another have not been uncovered yet.
The prognosis and determination of best treatment strategies for PCa patients depend on the correct estimation of PCa TNM (Tumor-Node-Metastasis)-stages, based on the universal TNM tumor stage classification, which refer to the degree by which cancer has spread inside the prostate, to the nearby tissues such as seminal vesicles and bladder, and beyond [4]. Traditional classification approaches that two stages. This proof of principle study demonstrates a plausible association of carcinogenic gene and miRNA expression with PCa upstaging and progression, establishes the potential of improving the prediction accuracy of pathological stages of PCa patients via integrating clinical and molecular features, and finally sheds light on the promise of integrating imaging with molecular and clinical data (radiogenomics) for clinical decision-making.

Description of the Radiogenomic Approach
We implemented and applied a radiogenomic approach aiming at unraveling the molecular mechanisms underlying PCa aggressiveness and upstaging and utilizing the clinical data together with imaging and genomic data to improve the accuracy of predicting the pathological stage. As a case study, we directed our analysis to the two frequent yet highly distinctive PCa stages T2c and T3b. We processed clinical data (age, PSA level, Gleason score) and two types of transcriptomic data (gene and miRNA expression data, from the TCGA archive) for 298 primary PCa tumor cases (T2c and T3b) and 52 healthy cases. The clinical and pathological characteristics of the PCa tumor cohort are listed in Table 1. For revealing the functional characteristics of both stages, we identified stage-specific genes and miRNAs. This was performed in two steps: first, we characterized the differentially expressed genes and miRNAs between the samples from stages T2c and T3b and the 52 healthy samples. Second, we excluded those generic PCa-related genes and miRNAs, which are differentially expressed between all the 496 available tumor samples and the 52 available healthy samples. Imaging data were available only for 14 matching cases for both stages (6 images for T2c and 8 for T3b) on the TCIA archive. Consequently, the available imaging data were not sufficient for training the prediction model. Therefore, we used it for assessing the identified molecular biomarkers via correlation analysis. Figure 1 illustrates our integrative approach for the joint analysis of clinical, genomic, and imaging data of PCa patients. It starts with the separate pre-processing of each dataset, followed by unraveling the molecular and functional characteristics (e.g., stage-specific genes and miRNAs, GO functional terms, and KEGG pathways) for each stage. Next, we constructed the gene regulatory network (GRN) that likely drives tumor upstaging from the T2c stage to the T3b stage, and consequently identifies potential gene and miRNA biomarkers from the genomic data, based on a strict differential expression criterion. The expression signatures of these biomarkers were correlated with aggressiveness-related imaging features for both stages T2c and T3b. Finally, the expression profiles of the identified biomarkers were combined with the clinical data to train classifiers and predict the pathological stage.

Description of the Radiogenomic Approach
We implemented and applied a radiogenomic approach aiming at unraveling the molecular mechanisms underlying PCa aggressiveness and upstaging and utilizing the clinical data together with imaging and genomic data to improve the accuracy of predicting the pathological stage. As a case study, we directed our analysis to the two frequent yet highly distinctive PCa stages T2c and T3b. We processed clinical data (age, PSA level, Gleason score) and two types of transcriptomic data (gene and miRNA expression data, from the TCGA archive) for 298 primary PCa tumor cases (T2c and T3b) and 52 healthy cases. The clinical and pathological characteristics of the PCa tumor cohort are listed in Table 1. For revealing the functional characteristics of both stages, we identified stagespecific genes and miRNAs. This was performed in two steps: first, we characterized the differentially expressed genes and miRNAs between the samples from stages T2c and T3b and the 52 healthy samples. Second, we excluded those generic PCa-related genes and miRNAs, which are differentially expressed between all the 496 available tumor samples and the 52 available healthy samples. Imaging data were available only for 14 matching cases for both stages (6 images for T2c and 8 for T3b) on the TCIA archive. Consequently, the available imaging data were not sufficient for training the prediction model. Therefore, we used it for assessing the identified molecular biomarkers via correlation analysis. Figure 1 illustrates our integrative approach for the joint analysis of clinical, genomic, and imaging data of PCa patients. It starts with the separate pre-processing of each dataset, followed by unraveling the molecular and functional characteristics (e.g., stage-specific genes and miRNAs, GO functional terms, and KEGG pathways) for each stage. Next, we constructed the gene regulatory network (GRN) that likely drives tumor upstaging from the T2c stage to the T3b stage, and consequently identifies potential gene and miRNA biomarkers from the genomic data, based on a strict differential expression criterion. The expression signatures of these biomarkers were correlated with aggressiveness-related imaging features for both stages T2c and T3b. Finally, the expression profiles of the identified biomarkers were combined with the clinical data to train classifiers and predict the pathological stage.

Functional Characteristics of T2c and T3b Stages
Differential expression (DE) analysis resulted in 127 genes and 5 miRNAs that characterize (i.e., that are specifically differentially expressed in) the T2c stage. This was done by comparing the T2c samples to the healthy samples and selecting the DE genes/miRNAs which were not identified in the comparison of the T3b samples to the healthy samples, nor identified in the comparison of all tumor samples to all healthy samples, see Figure 2 (genes) and Figure S1 (miRNAs). Similarly, 450 genes and 21 miRNAs were found to be specifically DE for the T3b stage samples, see Figure 3 (miRNAs) and Figure S2 (genes). The functional characteristics of each tumor stage were backed up by inspecting the associated GO biological processes and KEGG pathways via conducting a functional enrichment analysis. The functional analyses of stage-specific genes and miRNAs revealed diverse functional GO terms and KEGG pathways for each stage. We thus listed the significant pathways that were enriched in the specific genes for stage T2c and stage T3b, which may be of relevance to the etiology of the corresponding PCa stage, in Table 2. Moreover, Figure 2c visualizes the generic GO terms enriched within the T2c-specific genes, while in Table S1 the top 20 significant GO terms and the underlying gene sets are listed. Matching with the "growth" theme featured by the GO terms, a frequently occurring gene family in these gene sets is the FGF or fibroblast growth factor gene family, which is known to be associated with prostate tumorigenesis [33], concordant with the T2c stage. For the T3b stage, specific genes were involved in a much wider spectrum of GO terms, see Table S2. For instance, the gene ACTG2, which is associated with prostatic stromal cells [34], was involved in seven significant GO terms. Similarly, the genes ETV1 and SRD5A2 were involved in five and four significant GO terms; they were found to be over-expressed in primary versus metastatic PCa, concordant with the T3b stage, and associated with an increased prostate cancer risk [35], respectively. A similar analysis was applied to the stage-specific miRNAs (21 for T3b and 5 for T2c) as shown in Figure 3 and Figure S1. Markedly more miRNAs and related PCa-pathways (such as the PI3k-Akt pathway [36] and prostatic neoplasms) are implicated in advanced PCa stages ( Figure 3) than in earlier stages ( Figure S1).   supplementary  Table S1. The scatter plot was generated using the web tool REVIGO [37]. The original data for Figure  2 was shown in Supplementary Materials.   Supplementary Table S1. The scatter plot was generated using the web tool REVIGO [37]. The original data for Figure 2 was shown in Supplementary Materials.

Construction of the Prostate Cancer (PCa-GRN) Network
Next we performed differential expression analysis of the gene and miRNA expression data between the two stages. This yielded 125 differentially expressed (DE) genes and 19 DE miRNAs. We constructed a GRN network that combines transcriptional and post-transcriptional regulatory interactions between the DE genes and DE miRNAs (see Methods). We refer to this network as the PCa-GRN network although it represents the progression between two PCa stages only. The PCa-GRN network comprises nine miRNAs regulating 30 target genes, see Figure 4. This PCa-GRN network indicates how miRNAs are playing a critical role in the complex regulation system underlying PCa progression and upstaging as exemplified here from T2c to T3b. This is concordant with the results of the TCGA consortium analysis [1] where they found various miRNA expression patterns between different categories of PCa tumors. In order to quantify the mechanistic impact of these miRNAs and to characterize the central hub nodes that contribute essentially to the overall regulation, we computed the node degree centrality parameters and ranked the nodes according to their degrees. We identified 4 central hub miRNAs (mir-592, mir-587, mir-147, mir-661) that dominate the PCa-GRN network and maintain the interactions between these miRNAs and their neighboring genes. Hence, they could act as driver miRNAs and genetic regulators for PCa progression across the two stages. Remarkably, the aberrant expression patterns of the 4 candidate driver miRNAs have been connected to pathogenesis of various cancer types such as breast carcinoma and colorectal cancer via regulating cancer-specific pathways such as AKT/mTOR signaling pathways [38,39]. However, their molecular interactions with PCa-specific deregulated genes, and their regulation mechanisms within PCa progression were not reported before. This could highlight new insights into these candidate driver miRNAs as potential targets for new drugs delaying PCa progression.

Functional Homogeneity within the Prostate Cancer (PCa-GRN) Network
Furthermore, we evaluated the biological evidence for the PCa-GRN network in more depth to better assess the functional integrity of the biological processes underlying the etiology of PCa progression and upstaging. We estimated the functional homogeneity of the PCa-GRN genes and the target genes of the nine miRNAs by calculating the functional similarity scores between all gene pairs and comparing the resulting distribution to the similarity score distribution of randomly selected gene pairs from the PCa-GRN network. Interestingly, the PCa-GRN genes have significantly more cellular functional homogeneity than randomly selected ones (p-values < 1.2 × 10 −4 , Kolmogorov-Smirnov test), see Figure 5. Therefore, the nine miRNAs, and their 30 target genes comprising the PCa-GRN network could highlight new insights into a miRNA-gene based network that is representing the molecular interactions and dysregulation mechanisms underlying PCa progression from the T2c stage to the T3b stage. been connected to pathogenesis of various cancer types such as breast carcinoma and colorectal cancer via regulating cancer-specific pathways such as AKT/mTOR signaling pathways [38,39]. However, their molecular interactions with PCa-specific deregulated genes, and their regulation mechanisms within PCa progression were not reported before. This could highlight new insights into these candidate driver miRNAs as potential targets for new drugs delaying PCa progression. Figure 4. The GRN gene regulatory network prostate cancer (PCa-GRN) constructed from the miRNAs and genes differentially expressed between the T2c and the T3b tumor samples. The dark red nodes represent potential driver miRNAs. Square nodes denote the miRNAs, whereas the circular grey nodes represent genes. The network was visualized using the Cytoscape tool.

Functional Homogeneity within the Prostate Cancer (PCa-GRN) Network
Furthermore, we evaluated the biological evidence for the PCa-GRN network in more depth to better assess the functional integrity of the biological processes underlying the etiology of PCa progression and upstaging. We estimated the functional homogeneity of the PCa-GRN genes and the target genes of the nine miRNAs by calculating the functional similarity scores between all gene pairs and comparing the resulting distribution to the similarity score distribution of randomly selected gene pairs from the PCa-GRN network. Interestingly, the PCa-GRN genes have significantly more cellular functional homogeneity than randomly selected ones (p-values < 1.2 × 10 −4 , Kolmogorov-Smirnov test), see Figure 5. Therefore, the nine miRNAs, and their 30 target genes comprising the PCa-GRN network could highlight new insights into a miRNA-gene based network that is Figure 4. The GRN gene regulatory network prostate cancer (PCa-GRN) constructed from the miRNAs and genes differentially expressed between the T2c and the T3b tumor samples. The dark red nodes represent potential driver miRNAs. Square nodes denote the miRNAs, whereas the circular grey nodes represent genes. The network was visualized using the Cytoscape tool.

Biomarker Identification
To identify distinguishing biomarkers of genes and miRNAs between the two PCa stages, we restrained the FDR cutoff to 0.001 and increased the fold change threshold to 2.5-fold. Intriguingly, we identified the DE gene ANPEP and the DE miRNAs mir-217, mir-592, mir-6715b, that survived the above criteria. The gene ANPEP and the miRNA mir-6175b were downregulated in the T3b tumor samples while the miRNAs mir-217 and mir-592 were downregulated in the T2c tumor samples, see Figure 6a. PCA analysis of the normalized expression profiles for these four biomarkers revealed reasonable separation between the T2c and the T3b samples, see Figure 6b. Interestingly, many studies reported the aberrant expression patterns of the ANPEP gene in PCa cells [40,41] and furthermore ANPEP was suggested to be a potential prognostic biomarker for PCa patients [42]. Many studies reported the implications of mir-217, mir-592, mir-6715b either as a potential therapeutics to enhance chemosensitivity response in PCa [43], or in promoting proliferation of PCa cancer cells [44], or in the pathogenesis of other cancer types [45], respectively. However, their potential as molecular biomarkers in clinical outcomes has not been outlined before. Hence, miRNAs mir-217, mir-592, and mir-6715b are novel candidate biopsy-derived diagnostic biomarkers for PCa stages as will be demonstrated in the next section. To unravel the deregulation mechanisms of these four potential biomarkers, we investigated their association (see Methods) with other features from

Biomarker Identification
To identify distinguishing biomarkers of genes and miRNAs between the two PCa stages, we restrained the FDR cutoff to 0.001 and increased the fold change threshold to 2.5-fold. Intriguingly, we identified the DE gene ANPEP and the DE miRNAs mir-217, mir-592, mir-6715b, that survived the above criteria. The gene ANPEP and the miRNA mir-6175b were downregulated in the T3b tumor samples while the miRNAs mir-217 and mir-592 were downregulated in the T2c tumor samples,  Figure 6a. PCA analysis of the normalized expression profiles for these four biomarkers revealed reasonable separation between the T2c and the T3b samples, see Figure 6b. Interestingly, many studies reported the aberrant expression patterns of the ANPEP gene in PCa cells [40,41] and furthermore ANPEP was suggested to be a potential prognostic biomarker for PCa patients [42]. Many studies reported the implications of mir-217, mir-592, mir-6715b either as a potential therapeutics to enhance chemosensitivity response in PCa [43], or in promoting proliferation of PCa cancer cells [44], or in the pathogenesis of other cancer types [45], respectively. However, their potential as molecular biomarkers in clinical outcomes has not been outlined before. Hence, miRNAs mir-217, mir-592, and mir-6715b are novel candidate biopsy-derived diagnostic biomarkers for PCa stages as will be demonstrated in the next section. To unravel the deregulation mechanisms of these four potential biomarkers, we investigated their association (see Methods) with other features from DNA methylation, somatic mutations, and protein expression levels of PCa samples, see Figures S3-S5. For instance, the expression of ANPEP was inversely proportional to the hypermethylation of many differentially methylated regions (DMRs) and CpG islands ( Figure S3c). This may explain the mechanisms of ANPEP downregulation in advanced stages of PCa. Further research work is warranted to comprehensively analyze the implications of such observations in PCa progression and diagnosis.   [46] for fusion of MRI delineated prostate regions of interests. We outline the prostate volume in coronal axis T2-weighted fast MRI images for both T2c and T3b samples. (d) The correlation matrix between the normalized expression levels of the four biomarkers and the extracted aggressivenessrelated radiographic features C2 and C3. The significant correlations (FDR < 0.05) are marked with (*). C2 category represents the histogram of tumor volume intensity and basic statistical metrics such as mean, median, standard deviation, and kurtosis. The C3 feature category denotes the texture analysis of the tumor volume and includes Gray-Level Co-occurrence Matrix (GLCM) features such as contrast, energy, and homogeneity metrics [3]. The original data for Figure 6 was shown in Supplementary Materials.

Correlation Analysis with Aggressiveness-Related Imaging Features
To assess the biological relevance of our candidate biomarkers with respect to PCa progression and upstaging, we investigated their correlation to aggressiveness-related imaging features extracted from corresponding MRI images of PCa patients with the two stages under investigation (T2c and T3b), see Methods. We extracted the feature classes C2 and C3 which are related to PCa aggressiveness and upstaging as described by Stoyanova et al., 2016 [3]. The C2 class describes the distribution of intensities of tumor volume intensity and basic statistical metrics such as mean,  [46] for fusion of MRI delineated prostate regions of interests. We outline the prostate volume in coronal axis T2-weighted fast MRI images for both T2c and T3b samples. (d) The correlation matrix between the normalized expression levels of the four biomarkers and the extracted aggressiveness-related radiographic features C2 and C3. The significant correlations (FDR < 0.05) are marked with (*). C2 category represents the histogram of tumor volume intensity and basic statistical metrics such as mean, median, standard deviation, and kurtosis. The C3 feature category denotes the texture analysis of the tumor volume and includes Gray-Level Co-occurrence Matrix (GLCM) features such as contrast, energy, and homogeneity metrics [3]. The original data for Figure 6 was shown in Supplementary Materials.

Correlation Analysis with Aggressiveness-Related Imaging Features
To assess the biological relevance of our candidate biomarkers with respect to PCa progression and upstaging, we investigated their correlation to aggressiveness-related imaging features extracted from corresponding MRI images of PCa patients with the two stages under investigation (T2c and T3b), see Methods. We extracted the feature classes C2 and C3 which are related to PCa aggressiveness and upstaging as described by Stoyanova et al., 2016 [3]. The C2 class describes the distribution of intensities of tumor volume intensity and basic statistical metrics such as mean, median, standard deviation, and kurtosis. The C3 feature class refers to the texture analysis of the tumor volume including the Haralick GLCM features: contrast, energy, homogeneity, and entropy metrics, see Figure 6c. We computed the Pearson correlation between the expression profiles of our candidate biomarkers and the C2 and C3 imaging features for both stage groups (T2c and T3b). Figure 6d shows relatively high positive and negative correlation between the aggressiveness-related features and the expression signatures of the four biomarkers, especially ANPEP (max r = 0.97 and min p-value = 0.032) in T2c patients, and max r = 0.94 and min p-value = 0.025 in T3b patients. Significant correlations are marked by an asterisk. It is also noteworthy that the C3 features exhibited significant differential correlation between T2c patients T3b patients. For the T3b patient group, the C3 features (except the entropy) were found to be much better than the C2 features in correlating with the biomarker expression signatures. This also matches the findings about the plausible diagnostic power of the C3 feature class especially in advanced PCa stages [47].

Assessing the Predictive Power of the Identified Biomarkers
We randomly partitioned the data into two stratified sample sets, the training data (70%) and the test data (30%) and performed the biomarker identification again only on the training dataset (the 70% of the data). We repeated the aforementioned procedure for multiple runs (10 times), see Methods. In fact, the set of biomarkers did not change and we explain this invariance by the strict selection criteria (2.5 fold and FDR of 0.001) we used for identifying them, and we also note their power in separating the two classes as shown by the PCA analysis in Figure 6b. For each run, we performed the training and testing procedure, the results of which are described next.
In order to evaluate our candidate biomarkers as diagnostic features for predicting the corresponding pathological stages, we used three machine-learning methods (Naïve Bayes, Support Vector Machines (SVM) and Random Forest) on three feature sets: (1) the clinical features only (age, PSA level, and Gleason score), (2) the molecular features only (the expression profiles of the candidate biomarkers ANPEP, mir-217, mir-592, mir-6715b), and (3) the combined set (all 7 features). We used three machine-learning methods to ensure the reliability of the prediction efficacy of all used features. Theoretical backgrounds about the three used machine-learning methods and the parameters used for training and testing processes are described in full details in the Supplementary files. Figure 7 exemplifies the prediction accuracy by receiver operating characteristic (ROC) curves for each method on the three feature sets described above. Prediction accuracy was evaluated using the area under the ROC curve (AUC), which measures the ability of a method (based on the respective set of features) to differentiate between the pathological stages. All prediction results are tabulated in Supplementary  Table S8.
The results revealed that the predictions using the molecular features only (AUC = 0.844, AUC = 0.812 and AUC = 0.848 for SVM, Random Forest and Naïve Bayes, respectively, see Supplementary Table S8) show a slightly lower or equal accuracy, when compared to the predictions using clinical features only (AUC = 0.872, AUC = 0.814, and AUC = 0.841 for SVM, Random Forest, and Naïve Bayes, respectively). In line with our findings, Shen et al. were able to differentiate PCa patients with different degrees of aggressiveness using a different set of four miRNA biomarkers (mir-20a, mir-21, mir-221, and mir-145) with an AUC prediction accuracy of 0.82 [48]. It might be also worthwhile to investigate the predictive power of other molecular features (e.g., differentially methylated regions (DMRs)) identified from other OMICs datasets, which were associated with the biomarkers we found, since these biomarkers yielded prediction accuracies close to the ones based on clinical features, in predicting pathological stage. Recent studies have adopted other models of machine-learning such as Fuzzy logic [19] and neural networks [49] for predicting PCa stages based on clinical features only and reported AUC values (0.82 and 0.7, respectively) comparable to those computed by our models using the clinical features as well.
Cancers 2019, 11, x FOR PEER REVIEW 12 of 18 molecular feature sets separately. Hence, this highlights the potential usefulness of combining features from heterogeneous datasets to achieve better prognosis for PCa patients.

Datasets
RNA-Seq, miRNA-Seq, and clinical data for normal and prostate tumor samples were downloaded from the GDC data portal (https://portal.gdc.cancer.gov), namely the TCGA-PRAD project. For consistency, we only considered matching samples, which were common between the two datasets (RNA-Seq, and miRNA-Seq). Each TCGA sample refers to a tissue biopsy taken from a unique individual. This resulted in a total of 546 samples composed of 52 healthy control samples and 496 tumor samples with different tumor stages. The matching imaging traits for the examined tumor stages (T2c, T3b) were obtained from the Cancer Imaging Archive (TCIA) [50], i.e., again from Interestingly, predicting the pathological stages based on the combined set of both clinical and molecular features returned the largest AUC for all the three methods (AUC =~0.9 for SVM, Random Forest and Naïve Bayes), thereby outperforming the prediction models based on either clinical or molecular feature sets separately. Hence, this highlights the potential usefulness of combining features from heterogeneous datasets to achieve better prognosis for PCa patients.

Datasets
RNA-Seq, miRNA-Seq, and clinical data for normal and prostate tumor samples were downloaded from the GDC data portal (https://portal.gdc.cancer.gov), namely the TCGA-PRAD project. For consistency, we only considered matching samples, which were common between the two datasets (RNA-Seq, and miRNA-Seq). Each TCGA sample refers to a tissue biopsy taken from a unique individual. This resulted in a total of 546 samples composed of 52 healthy control samples and 496 tumor samples with different tumor stages. The matching imaging traits for the examined tumor stages (T2c, T3b) were obtained from the Cancer Imaging Archive (TCIA) [50], i.e., again from the TCGA-PRAD project [51]. The datasets can be downloaded using the URLs listed in Supplementary  Table S3.

Genomic Data
The gene and miRNA expression datasets were obtained in raw read counts and were consequently normalized, corrected for library size, and log2-transformed using the Bioconductor [52] package DESeq2 v.1.12.4 [53] that is part of the statistical programming language R [54].

Clinical Data
The clinical data were also normalized using quantile normalization [55] to remove the outliers that might affect the efficacy of the prediction process.

Imaging Data
The MRI imaging data for the examined two stages were subjected to median filters for noise reduction and further segmented using the NIH ImageJ software [56] to determine the region of interest (ROI) delineating the tumor. Next, the imaging feature categories C2 and C3, which are related to the aggressiveness and upstaging of PCa as described by Stoyanova et al. 2016 [3], were extracted. Briefly, C2 category represents the histogram of tumor volume intensity and basic statistical metrics such as mean, median, standard deviation, and kurtosis. The C3 feature category concerns the texture analysis of the tumor volume and includes Gray-Level Co-occurrence Matrix (GLCM) features such as contrast, energy, and homogeneity metrics.

Differential Expression and Association Analysis
The DESeq2 Bioconductor package [53] was used for the differential expression analysis for both gene and miRNA expression data. Namely, genes/miRNAs that exhibited 1.5-fold changes and False Discovery Rate (FDR) cutoff of 0.05 were classified as differentially expressed (DE) genes/miRNAs. The potentially distinctive biomarkers were identified as those DE genes/miRNAs, which showed at least 2.5 -fold changes and FDR < 0.001. The association of the identified biomarkers with methylation, protein levels, and somatic mutation data was assessed using the Spearman correlation (cutoff threshold = 0.65) and the F-statistic measure (FDR < 0.05). FDR was controlled using the Benjamini-Hochberg [57] adjustment.

Construction of Prostate-Specific GRN Network (PRAD-GRN)
The molecular interactions between the differentially expressed (DE) genes and the DE miRNAs were compiled from the regulatory databases of the TFmiR webserver [58]. We considered only interactions that are supported by experimental evidence. Next, we reduced the entire network to a subnetwork whose target nodes or regulator nodes are known to be associated with prostate cancer in order to contextualize the network towards prostate cancer, generating the PCa-GRN. To this end, the human miRNA disease database (HMDD) [59] as well as DisGeNET [60,61] (a database for gene-disease association) were used as sources for prostate cancer-associated miRNAs and genes, respectively. Key driver genes/miRNAs were identified by determining the highly central nodes (hub nodes) via applying the degree centrality measure on the PCa-GRN. The PCa-GRN network was visualized with Cytoscape V3.3.0 [62]. In order to assess the biological relevance of the identified molecular interactions to PCa phenotype (here represented by tumor progression), we used the GoSemSim R package [63] to estimate semantic similarity scores according to the Gene Ontology (GO) annotations. Statistical significance was computed by comparing the similarity scores of the PCa-GRN genes to the similarity scores of randomly selected genes (using the same number of genes). We repeated the permutation procedure 100 times and adopted the Kolmogorov-Smirnov test to check whether the similarity scores of PCa-GRN gene pairs were statistically higher than the scores of randomly selected pairs.

Enrichment Analysis of Genes and miRNAs
The functional annotation tool DAVID [64] was used to identify significantly enriched functional categories in the gene sets. Enrichment analysis of the miRNA sets was performed using TAM [65]. For this, we determined which functional categories were annotated to at least 2 genes/miRNAs and at the same time are significantly overrepresented in the gene/miRNA study set, as previously done in [66]. For both gene and miRNA enrichment analysis, Fisher's Exact test was employed followed by the Benjamini-Hochberg [57] adjustment for controlling the FDR, with a cutoff value of 0.05.

Correlation Analysis
The pairwise correlations between the expression signatures of the identified biomarkers and the extracted aggressiveness-related imaging features was performed using Pearson correlation. The significance was computed using the R method cor.test followed by the Benjamini-Hochberg adjustment for controlling the false discovery rate (FDR), with a cutoff value of 0.05.

Prediction Models
We performed multiple runs (10×) with random data partitions (70% of the data for training and 30% for testing). The sampling of all partitions was stratified, so that the distribution of the two classes is proportional to the original distribution in the whole dataset. The molecular biomarkers were then identified by differential expression, see "Differential Expression and Association Analysis", above, with the strict thresholds of at least 2.5-fold changes and FDR < 0.001. The molecular biomarkers and/or the clinical features already available, were then used to train three machine-learning methods, Naïve Bayes, Support-Vector Machine (SVM) and Random Forest. Model training and data partitioning were performed using the R caret package [67] with the default parameter settings for all classifiers. Besides the method's own default parameter selection in the training step, no additional parameter tuning was performed.
This whole learning process was performed for every method after removing the NA entries from all datasets. Classifiers were trained based on three datasets; the clinical features only, the molecular features only and finally the combined set of both clinical and molecular features, resulting in nine classifiers that were trained and tested. The AUC (Area Under Curve) characteristic was used as the evaluation metric for the prediction results. We used the pROC package [68] for ROC (Receiver operating characteristic) analysis and for computing the averaging step over the runs to obtain one value as the average AUC for the nine classifiers. The learning and prediction parameters are described in sufficient details in Supplementary Tables S4-S6.

Conclusions
The current study presented an integrated regulatory analysis of gene and miRNA expression data for PCa samples to unravel molecular features, related GO functional terms, and pathways underlying PCa progression, and to identify potential biomarkers that can distinguish different PCa stages. The biomarkers we found belong to genes and miRNAs that play critical roles in PCa and other cancer types and showed high correlation with aggressiveness-related imaging features extracted from mp-MRI images. When combined with the traditional clinical features and using the power of machine learning, these biomarkers were able to improve the prediction accuracy of the corresponding PCa pathological stages.
To this end, future research work is demanded for predicting the best treatment strategy, such as chemotherapy, radiotherapy, or surgery. Other future research includes the development of similar integrative approaches such as those based on patient similarity networks that better take into consideration the molecular heterogeneity of PCa. This is actually one major limitation of our approach. Another limitation is the insufficient number of imaging samples that prevented us from combining the imaging features together with clinical and molecular features for enhancing the prediction of the PCa stages. If more imaging samples were provided for the different PCa stages, then such an analysis would have been easily possible. Further joint analysis of the associated molecular features (from DNA methylation, other non-coding RNAs, and somatic mutations) with the four identified biomarkers, as well as wet-lab experiments may enable to characterize more clinically relevant OMICs features that can potentially be used for diagnosis and prognosis of PCa. Finally, our approach can be applied to other cancer types or complex diseases with progressive stages and might be also extended for studying cellular developmental stages as well.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6694/11/9/1293/s1, Figure S1: Functional characteristics of the miRNAs specific to T2c stage in PCa. Figure S2: Functional characteristics of genes specific to the T3b stage in PCa. Figure S3: Association analysis of the ANPEB gene biomarker with other PCa OMICs datasets. Table S1: GO Functional Enrichment of the 127 genes, which are exclusively dysregulated in the T2c stage and not in the T3b stage. The table lists the top 20 significant GO Biological process (BP) terms. Table S2: GO Functional Enrichment of the 450 genes, which are exclusively dysregulated in the T3b stage and not in the T2c stage. The table lists the top 20 significant GO Biological process (BP) terms. Table S3: the download URLs of the analyzed data sets in this study. Table S4: Parameters of the random forest models constructed by the caret train method for the three different input datasets. Table S5: Parameters of naive bayes models constructed by the caret train method for the three different input datasets. Table S6: Parameters of SVM models constructed by the caret train method for the three different input datasets. Table S7: Overview of the three datasets used to train the prediction model. Table S8: the AUC performance of the different methods and datasets splitted into the 10 runs.