Model-Based Integration Analysis Revealed Presence of Novel Prognostic miRNA Targets and Important Cancer Driver Genes in Triple-Negative Breast Cancers

Background: miRNAs (microRNAs) play a key role in triple-negative breast cancer (TNBC) progression, and its heterogeneity at the expression, pathological and clinical levels. Stratification of breast cancer subtypes on the basis of genomics and transcriptomics profiling, along with the known biomarkers’ receptor status, has revealed the existence of subgroups known to have diverse clinical outcomes. Recently, several studies have analysed expression profiles of matched mRNA and miRNA to investigate the underlying heterogeneity of TNBC and the potential role of miRNA as a biomarker within cancers. However, the miRNA-mRNA regulatory network within TNBC has yet to be understood. Results and Findings: We performed model-based integrated analysis of miRNA and mRNA expression profiles on breast cancer, primarily focusing on triple-negative, to identify subtype-specific signatures involved in oncogenic pathways and their potential role in patient survival outcome. Using univariate and multivariate Cox analysis, we identified 25 unique miRNAs associated with the prognosis of overall survival (OS) and distant metastases-free survival (DMFS) with “risky” and “protective” outcomes. The association of these prognostic miRNAs with subtype-specific mRNA genes was established to investigate their potential regulatory role in the canonical pathways using anti-correlation analysis. The analysis showed that miRNAs contribute to the positive regulation of known breast cancer driver genes as well as the activation of respective oncogenic pathway during disease formation. Further analysis on the “risk associated” miRNAs group revealed significant regulation of critical pathways such as cell growth, voltage-gated ion channel function, ion transport and cell-to-cell signalling. Conclusion: The study findings provide new insights into the potential role of miRNAs in TNBC disease progression through the activation of key oncogenic pathways. The results showed previously unreported subtype-specific prognostic miRNAs associated with clinical outcome that may be used for further clinical evaluation.

Similarly, 10 miRNAs were identified as significantly associated with overall survival ( Figure 2B, Table 2). Three of these have also been observed to be strongly linked with good prognosis p-value < 0.05. The results from differential expression showed the up-regulation of seven miRNAs and down-regulation of three miRNAs within this group.  16), were significantly correlated with worse overall survival. miRNA hsa-miR-1208 was observed to be an outlier with an extreme increase in high hazard ratios and was excluded from downstream analysis.

In silico Validation
In silico analysis showed that the proposed miRNA signature with 68% overlapping miRNAs with the Buffa et al. dataset shows significant association (HR = 1, CI = 1.00-1.00, p-value = 2.6E-05) with distant relapse-free survival (DRFS), whereas the miRNA signature showed a broad lack of reproducibility (p-value = 0.13) with over 80% overlap with the Bockhorn dataset when assessed for overall survival (Figures S7,S8). This might be due to the sample scarcity and difference in the platforms or methods used during the samples collection. The heterogeneity within TNBC could also have played an important role in the performance of miRNAs in their prognostic ability.

mRNA Differentially Expressed among the Tumour Subtype Classes
The profiles of 18 triple-negative tumour samples and 131 non-TNBCs were used for differential expression analysis. The results showed that 386 genes were significantly expressed in TNBC tumours. Unsupervised hierarchical clustering separates 386 mRNAs into two large clusters with unique gene subsets ( Figure S2). Cluster 1: This was a large cluster characterised by 180 down-regulated mRNAs. Of these, 19 mRNAs were among known biomarkers observed in previous studies linked to receptor-negative breast tumours. Validated biomarker genes included CDKN2A (logFC −2.44), involved in the G1/S checkpoint of the mitotic cell cycle and Ras protein signal transduction [40,41], EGFR (logFC −2.86) in the Fc-epsilon receptor signaling pathway, the activation of phospholipase C activity, the epidermal growth factor receptor signaling and cell proliferation [42,43], and IGF2BP3 (logFC −2.46) in mRNA transport and the regulation of the cytokines and biosynthesis process [44]. The canonical pathways observed for this cluster were cyclins and cell cycle regulation, cell cycle G1/S checkpoint, glioma, p53 signaling pathway, melanoma and Wnt signaling pathway enriched with EGFR, CCNE1, FZD9, CDKN2A, SFRP1, CDK6, SERPINB5, FGF9, MMP7, CALML5, TCF7L1, SHC4, and PRKX genes.
We further analysed prognostic factors by conducting univariate and multivariate analyses using histopathological information on its own (without the miRNAs' expression data). We observed node positivity and tumour size histopathological factors were significantly associated with DMFS prognosis (Table S2), whereas the percentage of tumour cell infiltration, node positivity and tumour size were among the significantly correlated prognostic factors with OS (Table S3). We then assessed the quality of the fitted model using analysis of deviance likelihood for the selection of co-variates which could impact on the association of prognostic factors with miRNA expression on the outcome prediction. The results from deviance-score analysis suggested that the two-factors (node positivity and tumour size) model was the best-suited model for DMFS prognosis, similar to clinical factors such as tumour size and percentage of tumour cell infiltration model for overall survival. Almost all miRNAs retained their prognostic ability when evaluated in multivariate models using information on node positivity, tumour size and tumour cell infiltration for DMFS and OS (Tables S4-S7).

In silico Validation
In silico analysis showed that the proposed miRNA signature with 68% overlapping miRNAs with the Buffa et al. dataset shows significant association (HR = 1, CI = 1.00-1.00, p-value = 2.6E-05) with distant relapse-free survival (DRFS), whereas the miRNA signature showed a broad lack of reproducibility (p-value = 0.13) with over 80% overlap with the Bockhorn dataset when assessed for overall survival (Figures S7 and S8). This might be due to the sample scarcity and difference in the platforms or methods used during the samples collection. The heterogeneity within TNBC could also have played an important role in the performance of miRNAs in their prognostic ability.

mRNA Differentially Expressed among the Tumour Subtype Classes
The profiles of 18 triple-negative tumour samples and 131 non-TNBCs were used for differential expression analysis. The results showed that 386 genes were significantly expressed in TNBC tumours. Unsupervised hierarchical clustering separates 386 mRNAs into two large clusters with unique gene subsets ( Figure S2).

Prognostic miRNAs and Their Association with Predicted Targets and Enriched Functions
Target prediction analysis on differentially expression genes from the mRNA expression profiles identifies 35 mRNA genes with known involvement in various cancers. Of 28 miRNAs, 16 from both signatures are displayed in Figure 3 (Table S8). TFF1, NPY1R, and IGFBP2; cell differentiation (BMP4, AGTR1, METRN, IL6ST, MAPT, FOXA1, KITLG, and IL20); positive regulation of signal transduction (BMP4, HPX, IL6ST, ESR1, GJA1, F7, ECM1, IRS1, and IL20). In addition, other biological processes that were significantly affected include the extracellular matrix, voltage-gated channel activity, regulation of growth, regulation of cell motion and anti-apoptosis, cell adhesion, and cytokine binding. The top regulated pathways for this cluster were also linked with the downregulation of MTA-3, ERBB2 signal transduction and other oncogenic pathways.

Prognostic miRNAs and Their Association with Predicted Targets and Enriched Functions
Target prediction analysis on differentially expression genes from the mRNA expression profiles identifies 35 mRNA genes with known involvement in various cancers. Of 28 miRNAs, 16 from both signatures are displayed in Figure 3 (Table S8).

18.-miRNAs Signature
We further assessed the relationship of predicted miRNAs with each gene cluster of mRNA data set. Three low-risk miRNAs with significant low expression including hsa-miR-301b, hsa-miR-130a, hsa-miR-181d and one high-risk hsa-miR-1290 miRNA belong to the 18-miRNA DMFS prognostic signature targets cluster, two genes of the mRNA dataset involving REEP1 (logFC 2.16), CNTNAP2 (logFC 2.03), BMP4 (logFC 2.66), FOXA1 (4.33) and CLEC3A (logFC 3.61). These genes were functionally involved in processes such as neuron differentiation and the positive regulation of signal transduction. In general, all of the low-risk miRNA target genes were involved in important regulatory pathways of TNBC disease. hsa-miR-181d targets CACNA2D2 (logFC 2.34) gene, involved in important functions such as voltage-gated ion channel and ion transport. Other miRNAs such as hsa-miR-301b have shown anti-correlation with MUM1L1 (logFC 2.18) and NBEA (logFC 2.25), a regulator of plasma membrane. Highly up-regulated genes such as AFF3 (logFC 4.33) and GJA1 (2.02), targeted by another low-risk miRNA, hsa-miR-30e, are involved in KEGG pathways of arrhythmogenic right ventricular cardiomyopathy (ARVC) and other cellular processes like cell-cell signaling, endoplasmic reticulum and plasma membrane.

Discussion
Triple-negative disease is still characterised by the absence of three known receptor genes, Er, Pr, and Her2, because of underlying heterogeneity and aggressive disease behaviours. These two persistent challenges have slowed the progress of developing new targeted therapies within the triple negative subtype. Therefore, there is a need to identify novel biomarkers that can further stratify triple-negative into simpler molecular subtypes and may help in developing new screening techniques, the early detection of cancer, and better clinical outcome. The focus has recently been on elucidating novel and important elements, factors and mechanisms controlling transcription regulation. Investigating miRNAs involving and controlling transcriptional regulation may provide new understanding of complex diseases such as TNBC. Therefore, this study integrates information from miRNA and mRNA by developing independent models on genome-wide expression profiling and identifies important regulatory pathways, targetable markers for chemotherapy and their involvement in predicting clinical outcomes [5,56].
We investigated the role of miRNAs as a differentiating factor among the tumour subtypes based on expression profiling. We observed 173 miRNAs that clearly separate the breast tumours from their respective normal patients. Further analysis of the patients between triple negative and non-triple negative shows the presence of 172 miRNAs, dysregulated and targeting well-known biomarker genes within this subtype. These biomarker genes are involved in key pathways of signalling pathways such as the p53 signaling pathway, Ras protein signalling and Wnt signalling pathway. Therefore, this offers strong a indication that these miRNAs indirectly control critical pathways and consequently impact the biology of TNBC microenvironments. Interestingly, the 77 common miRNAs between the triple negative and non-triple negative cases have shown higher expression in triple negative patients. The finding that miRNAs have higher expression in aggressive subtypes was previously confirmed by an independent study of Blenkiron et al. [57]. De Rinaldis et al., in their study, also showed that higher expression levels can be traced back to changes in the DNA copy numbers of these miRNAs [58]. Most of these highly expressed miRNAs originated from amplified regions of chromosomal DNA in the tumours. Further investigation into tumour progression from normal to tumour subtypes have indicated only three miRNAs (hsa-miR-154, hsa-miR-145, and hsa-miR-93) that are differentially expressed across the tissues ( Figure 1B). The Venn diagram intersection between the three contrast tissue comparisons-normal versus non-TNBC, normal versus TNBC and TNBC versus non-TNBC-shows that the highest expression changed with triple-negatives. miRNAs hsa-miR-154 and hsa-miR-145 exhibited over-expression in TNBC compared to non-TNBC, whereas hsa-miR-93 is up-regulated when TNBC is compared with normal-like tissue.
A large number of miRNAs from cluster 3 ( Figure 1C) have shown higher expression, except the group of five miRNAs. miRNAs such as hsa-miR-342-3p, hsa-miR-195, hsa-miR-497, hsa-miR-29c, hsa-let-7c, and hsa-miR-497 were observed with higher expression in triple negatives and most of these were targeting genes from cluster 1 of mRNA expression data, which are highly down-regulated in the triple-negative patients. Most of these targets were known cancer driver genes involved in inhibiting tumour growth, speeding up the apoptosis, invasion and regulating migration in tumour cells (Figure 3, Tables S9 and S10). These miRNAs provide evidence of their acting as tumour suppressor in triple-negative and can be used as targets in tackling tumour progression and management; however, further validation will be needed. Another characteristic of these miRNAs is the formation of clusters in various cancers and their acting as modulators for important cell functions. It has been noted that these clusters usually reside very near to the known cancerous sites or genomics hotspots and play a very important role in carcinogenesis [59,60]. We identified two members of the miRNA let family hsa-let-7c/b targeting KRT5 and GABBR2, involved in molecular functions such as G-protein-coupled signaling, protein binding and structural activity. The manipulation of this cluster can be used for experimental purposes and also for therapeutic intervention in triple-negative diseases.
Next, we investigated miRNAs' role as potential biomarkers in receptor-negative breast tumours and triple-negative disease. Using univariate Cox regression analysis, we identified 28 miRNAs significantly associated with survival outcomes that were further validated by applying multivariate analysis for their ability as independent markers. miRNA hsa-mir-1290 is down-regulated in triple-negative patients and has shown a strong association with higher risk groups with the triple-negatives. Previously, the over-expression of hsa-mir-1290 in the Her2-postive group has been reported and linked with better overall survival, as well as serving as a marker for early detection in triple-negatives [56,61]. Our results also showed strong evidence of hsa-mir-1290 as prognostic marker in triple-negative and its targeting two highly over-expressed genes (BMP4, logFC 2.66; FOXA1, logFC 4.33) in the mRNA dataset. BMP4 and FOXA1 have been reportedly involved in patients with relapse-free survival (RFS) and disease-free survival (DFS) in breast cancer [62,63]. Functionally, FOXA1 is a member of the transcription family and a very well-characterised biomarker for triple-negatives and other carcinomas [51]. During the Cox regression analysis, we observed that almost all of the down-regulated miRNAs were associated with high-risk, and targeting up-regulatory genes in the mRNA dataset involved in cellular processes such as proliferation, angiogenesis and apoptosis.
An emerging theme in the survival analysis was the association of up-regulated miRNAs with low-risk in triple negatives. miRNAs such as hsa-miR-29c target important biomarker genes like CDK6. This gene is involved in cell cycle checkpoints and regulation, the regulation of cell proliferation and the p53 signalling pathway. A recent paper was also in agreement with our finding that the down-regulation of hsa-miR-29c is linked to poor survival outcome in breast cancer patients [64]. Other predicted targets for hsa-miR-29c are BCL11A and COL22A1, involved in tumour formation, regulator of structural molecular activity and extracellular region. The low-risk miRNAs from has-let-7 family (hsa-let-7c, hsa-let-7b) act as tumour suppressors through controlling cellular processes such as cell projection, cell proliferation, and cell development, involving the genes KRT5 and GABBR2. Over-expression of hsa-miR-342-3p was a prognostic marker for DMFS and OS, and consistent with a better prognosis in our study. It has been reported that hsa-miR-342-3p regulates BRCA1 expression and MYC transcriptional activity by directly repressing E2F1 in human lung cancers [36,65]. Another low-risk miRNA hsa-miR-497 predicting DMFS and OS regulates several important signalling pathways related to VEGF, ErbB1 and stem cells pluripotency, involving gene targets SH2D2A, OTX1 and TRIM2.
We have compared our results with two independent datasets to investigate reproducibility. We observed that only a small percentage of miRNAs overlapped with the independent datasets. The independent validation results also lack consistency when DMFS and OS signatures were assessed for predicting distant relapse-free survival (DRFS) and OS. Sample scarcity, a different analytical platform and demographic changes may have accounted for inter-study differences. The heterogeneity with in TNBC could have also played an important role in the performance of miRNAs in their prognostic ability.
These findings suggested that the gene clusters identified during the mRNA profiling can be influenced by the targeting of significant miRNAs. These findings also support anti-correlative relationship of OS and DMFS prognostic miRNAs with oncogenesis/suppression of cancer-related pathways in predicting prognosis. Some of the prognostic miRNAs observed in this study are somehow linked to various cancers, but this is the first time that we have identified TNBC subtype-specific miRNAs associated with OS and DMFS survival. MiRNAs such as hsa-miR-342-3p, hsa-miR-195, hsa-miR-155, and hsa-miR-497 were associated with OS, whereas hsa-miR-29c, hsa-miR-342-3p, hsa-let-7c, hsa-let-7b, and hsa-miR-497 were correlated with DMFS. We have also observed that hsa-miR-630, hsa-miR-195, and hsa-miR-101 have contributed to drug resistance in breast cancers and hsa-miR-497 in pancreatic cancer previously [66]. Samples were divided into breast tumour subgroups by carefully examining the receptor status of three known biomarker genes Er, Pr, and Her2. Raw expression datasets were extracted using Bioconductor R 'GEOquery' package [67] for both types of data. Quality reports were generated using 'preprocesscore' package of Bioconductor R for potential outliers. Expression values were quantile-normalised and log-transformed using default functions of 'limma' Bioconductor R package. We performed lmFit() function in order to identify differentially expressed miRNAs among by selecting the coefficients of normal versus TNBC, normal versus non-TNBC and TNBC versus non-TNBC groups ( Figure S6). miRNAs with p-value < 0.05 were considered for downstream analysis. P-values were adjusted for multiple testing using the False Discovery Rate method. We used R base hclust function for the calculation of two-dimensional average-linkage hierarchical clustering by building the Spearman rank correlation matrix. Heat maps were draw using 'gplots' package of R Bioconductor for manual visualisation. We applied 'silhouette' for selecting the optimal clusters for miRNA and mRNA datasets (Table S1).

Collection of Predicted, Validated Targets and Calculation of Correlation Index
For our integrated analysis, we applied five widely used tools for target prediction analysis in order to identify possible interactions between dysregulated miRNAs and mRNA: PITA catalog version 2007 [20], microCosm (also known as miRBase Sequence database) [68] and TargetScan catalog version 6.2 [21]. Similarly, we selected two databases for validated targets: TarBase version v.5c [69] and miRecords version April 27, 2013 [70]. An in-house SQLite (http://CRAN.R-project.org/package= RSQLite) database was built to merge, store and process data of all five prediction tools. In the second step, a fast querying searching was applied for the collection of miRNAs targets for downstream analysis. To evaluate the relationship between miRNAs and gene targets we performed correlation analysis using the expression values of each miRNA and their potential target genes. We particularly focused on inversely expressed miRNA-mRNA pairs because of the mechanism of miRNA gene regulation and degradation during the transcription process. Finally, top anti-correlated pairs were selected and further investigated for their functional impact on the clusters of mRNA expression profiles. An interacting network was build using Cytoscape [71] tool visualising the contradicting effects of miRNAs (Figure 3).

Statistical Analysis
A two-step process was adopted for survival model. Firstly, univariate Cox-regression was performed on the 172 differentially expressed miRNAs for their association with overall survival (OS) and distant metastatic-free survival (DMFS) using 'survival' R base package. p-values were adjusted for multiple-tests using the Benjamin-Hochberg method. miRNAs were ranked according to their p-value, and those exhibiting p-value < 0.05 were considered for multivariate modelling, resulting in an optimal set of miRNAs being derived for both outcome conditions. Censoring occurred at the date of death from any cause (overall survival (OS)), and first evidence of metastases (distant metastatic-free survival (DMFS)) or at the time of 10-year follow up, depending upon the first occurrence of any of them. Different Cox-regression models were applied for the selection of pathological covariates. Histopathological covariates were grouped as Tumour Grading (1 and 2 as low, 3 = high), Node Positivity (positive or negative), Tumour size (continuous variable), % of Tumour Cell Infiltration (low < 50% and high > 50%), Age Group (group A > 50 years, group B < 50 years). We further assessed the quality of each fitted model using the analysis of deviance likelihood test for the selection of co-variate, which could impact the association of prognostic factors with miRNA expression on the outcome prediction (Tables S2-S5). Log-rank and Kaplan-Meier analyses were conducted among the three groups (low, intermediate and high) expressing samples for the calculation difference and drawing of the survival curves using 'survcomp' R package.

Gene Set Enrichment Analysis of Predicted Targets
We performed functional enrichment analysis for each cluster of the mRNA data in order to identify the potential involvement of cellular processes, molecular functions and biological pathways. Two different analysis were performed for gene ontology enrich terms and gene set enriched pathways using ranked lists from publically available databases such as DAVID [72], KEGG [73], Reactome [74] and CGAP [75] to translate the expression profiles of mRNA clusters Figure 1A (See Supplementary Methods Figure S1 for detailed workflow).

Independent Cohorts for Validation for miRNA Prognostic Signature
We performed "in silico" validation of prognostic miRNAs using two independent datasets published by Bockhorn et al., with 18 TNBC and 28 Non-TNBC [76], and Buffa et al., including 210 Non-TNBC [77] breast tumour samples. The ability of the model to predict outcome was assessed by calculating the AUC of respective ROC curves for the period of 10 years. Higher AUC indicates better performance of the model (AUC = 0.5).

Conclusions
In conclusion, we have identified subtype-specific signatures from integrated analyses of miRNAs and mRNAs expression profiles. The results show that the identified miRNAs are prognostic markers for OS and DMFS and directly control the important oncogenic processes and pathways through modulating important cancer driver genes and pathways. The study findings suggested a lack of consistency when DMFS and OS signatures were assessed for predicting the distant relapse-free survival (DRFS) and OS using data from two independent cohorts. Further investigation of tumour progression from normal to tumour subtypes has shown only three miRNAs differentially expressed across the tissues and triple-negatives acquired the highest expression change. The study provides further insight to the complex heterogeneity in triple-negatives and also provides evidence for miRNA as an influencing factor on tumour transcriptional phenotypes at the transcriptome level. The study also shows that the model-based breakdown of system-genomics changes from the dysregulation of epigenetic miRNAs to the transcriptomic pathways. Our results suggest that molecular studies based on miRNAs' biomarkers can help in the early detection of disease and can used as agents of therapeutic interventions in triple-negatives.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6694/12/3/632/s1, Figure S1. Workflow of the integrated analysis summarising steps pre and post processing, and functional analysis; Figure S2. A heatmap representing two clusters of the mRNA expression data; Figure S3. Representation of two dimensional hierarchical clustering heatmap of the miRNAs clusters 1; Figure S4. Representation of two dimensional hierarchical clustering heatmap of the miRNAs clusters two; Figure S5. Representation of two dimensional hierarchical clustering heatmap of the miRNAs clusters three; Figure S6. The figures shows PCA plot of the top 10 differentially expressed miRNAs within normal, TNBC and non-TNBC patients; Figure S7. Performance estimates of signatures of disease metastatic-free survival in the independent cohort validation; Figure S8. Performance estimates of signatures of overall-survival in the independent cohort validation; Table  S1. Table include top dysregulatory miRNAs of the triple negative verus non-triple negative comparison. The colouring order is according to the clusters shown in Figure 1C; Table S2. prognostic performance of histological factors as covariates in univariate and multivariate analysis in association with DMFS; Table S3. prognostic performance of histological factors as covariates in univariate and multivariate analysis in association with OS; Table S4. assessed representation of the quality of fitted model using analysis of deviance for the selection of co-variate for DMFS; Table S5. assessed representation of the quality of fitted model using analysis of deviance for the selection of co-variate for OS. Table S6. Kaplan-Meier estimates of the patient into three groups for DMFS; Table S7. Kaplan-Meier estimates representing difference in three groups for OS; Table S8.