2.1. MiRNA-Signature Identification
We developed a multi-tiered approach, summarized in Figure 1, to identify a surrogate miRNA-based signature for the prognostication of ADC patients.
First, we performed gene expression profile analysis of a total of 515 ADC patients of the TCGA-LUAD cohort with available mRNA data (see Section 4). Patient and tumor characteristics are reported in Table 1. Stage I tumors represented 54% of the cohort, and a smoking habit was present in 71% of patients. Median follow-up in survivors was 2.1 years.
Hierarchical clustering of the TCGA-LUAD cohort patients (n = 515) using the 10-gene signature revealed four main branches, namely clusters C1 (n = 201), C2 (n = 98), C3 (n = 39), and C4 (n = 177) (Figure 2a), consistent with previous findings [11]. Analysis of 3-year overall survival showed no significant differences among clusters C2, C3, and C4 (log-rank test p-value = 0.90 and p-value = 0.48 in stage I and advanced stages, respectively), which were therefore collapsed into a single non-C1 group. The C1 cluster displayed the worst prognosis both in stage I (p-value = 0.0010) and in more advanced stages (p-value = 0.0061) (Figure 2b). Furthermore, the C1 cluster had a significantly higher fraction of male subjects and of patients with more advanced lung cancer, and a higher proportion of smokers that approached statistical significance (Table S1), in line with the reported worse prognosis [9].
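The 3-year log-rank comparisons above can be illustrated with a minimal, dependency-free sketch of the two-group log-rank test, in which deaths occurring after a fixed horizon are treated as censored at that horizon. This is not the authors' pipeline (the software used is described in Section 4); the function name `logrank_test`, its argument layout, and the censoring-at-horizon convention are illustrative assumptions.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b, horizon=None):
    """Two-group log-rank test, optionally truncated at `horizon` (e.g. 3 years).

    Hypothetical helper. times_*: follow-up times; events_*: 1 if death observed,
    0 if censored. Returns (chi-square statistic with 1 df, p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + [
        (t, e, 1) for t, e in zip(times_b, events_b)
    ]
    if horizon is not None:
        # Administrative censoring: follow-up beyond the horizon is cut at the
        # horizon, and deaths after it count as censored.
        data = [(min(t, horizon), e if t <= horizon else 0, g) for t, e, g in data]

    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        deaths = [(e, g) for tt, e, g in at_risk if tt == t and e]
        d = len(deaths)
        d_a = sum(1 for _, g in deaths if g == 0)
        # Observed minus expected deaths in group A at this event time.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance, including the correction for tied deaths.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))
```

Called with `horizon=3` on follow-up times in years for C1 and non-C1 patients, this yields a 3-year comparison of the kind summarized in Figure 2b, under the assumption that truncation was handled in this way.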
We then profiled miRNA expression in 510 of the 515 ADC of the TCGA-LUAD cohort for which miRNA expression data were available. We used both the DESeq2 R package and BRB-ArrayTools (see Section 4) as alternative statistical approaches to identify miRNAs differentially expressed between the C1 and non-C1 ADC clusters. Of the 382 miRNAs analyzed, 200 were found to be differentially expressed by DESeq2 and 90 by BRB-ArrayTools (Table S2A,B, respectively), with 87 miRNAs shared between the two sets. Lasso regularization was then applied to identify optimized miRNA-based signatures capable of discriminating C1 from non-C1 tumors. Two signatures were derived, a 14-miRNA signature (from the 90-miRNA set) and a 19-miRNA signature (from the 200-miRNA set), with 5 miRNAs in common (Table 2); both showed high accuracy in C1/non-C1 stratification (cross-validated AUC = 0.81 and AUC = 0.85, respectively; Figure 2c).
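Lasso selection of a miRNA signature can be sketched as L1-penalized logistic regression on the C1 (1) versus non-C1 (0) label. The minimal example below fits it by proximal gradient descent (ISTA) in plain Python. The solver, the hand-fixed penalty `lam`, and the helper names are assumptions for illustration only; the software actually used and the cross-validated choice of penalty are described in Section 4.

```python
from math import exp


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrinks toward zero and can return exactly 0.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """L1-penalized logistic regression fitted by ISTA (hypothetical helper).

    X: list of feature rows (e.g. standardized miRNA expression); y: 1 = C1, 0 = non-C1.
    Returns (intercept, weights); features with weight exactly 0 are dropped.
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        residuals = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                     for row, yi in zip(X, y)]
        grad_b = sum(residuals) / n
        grad_w = [sum(r * row[j] for r, row in zip(residuals, X)) / n for j in range(p)]
        b -= lr * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return b, w


def selected_features(weights, names):
    # The signature is the set of features with a non-zero coefficient.
    return [name for name, wj in zip(names, weights) if wj != 0.0]
```

The soft-thresholding step is what sets coefficients exactly to zero, so a larger `lam` yields a shorter signature; in practice the penalty would be tuned by cross-validation and the AUC computed on held-out folds.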
To further reduce the complexity of these miRNA-based biomarkers, we looked for a minimal set of miRNAs that could identify C1 aggressive disease with the same accuracy as the 14- and 19-miRNA signatures. We made the following assumptions: (i) the molecular function of a miRNA depends on the network of mRNAs it targets, which in this case are those differentially expressed between C1 and non-C1 tumors; (ii) a prognostic biomarker should be functionally linked to mechanisms involved in tumor progression. Accordingly, we explored the miRNA-mRNA interactome characterizing C1 tumors with ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks; see Section 4), using the set of 200 miRNAs and a set of 2900 mRNA genes found to be significantly regulated in C1-ADC (p-value < 0.05) by DESeq2 (see Section 4). The analysis was restricted to genes identified by DESeq2 in order to reduce technical variability. The following rules were applied to rewire the C1 miRNA-mRNA interactome: (1) we selected miRNA-mRNA pairs generated only in C1 tumors and specific, but not exclusive, to stage I (n = 2858); (2) we selected miRNAs predicted to target C1 genes (n = 1787; miRWalk3.0, see Section 4); (3) we kept miRNAs with an expression trend opposite to that of their C1 target genes (n = 598); and (4) we selected miRNAs interacting with at least three C1 genes (n = 528).
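Rules (1) to (4) act as successive filters on the ARACNe edge list. The sketch below encodes them under explicit assumptions: the `Edge` record layout and the function name are hypothetical; rule (1) is read as "present in the C1 network, absent from the non-C1 network, and present in the stage I network"; rule (3) is read as opposite signs of the C1 versus non-C1 log2 fold changes; and miRWalk3.0 predictions are assumed to be available as a set of (miRNA, gene) pairs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One ARACNe miRNA-mRNA edge (hypothetical record layout)."""
    mirna: str
    gene: str
    in_c1: bool          # edge present in the C1 network
    in_non_c1: bool      # edge present in the non-C1 network
    in_stage_i: bool     # edge present in the stage I network
    mirna_log2fc: float  # C1 vs non-C1 fold change of the miRNA
    gene_log2fc: float   # C1 vs non-C1 fold change of the target gene


def filter_interactome(edges, predicted_targets, min_targets=3):
    """Apply rules (1)-(4) in sequence; return {miRNA: set of C1 target genes}."""
    # (1) C1-only edges that occur in stage I (they may also occur in advanced stages).
    kept = [e for e in edges if e.in_c1 and not e.in_non_c1 and e.in_stage_i]
    # (2) keep pairs supported by target prediction (assumed exported as a set of pairs).
    kept = [e for e in kept if (e.mirna, e.gene) in predicted_targets]
    # (3) miRNA and target gene change in opposite directions.
    kept = [e for e in kept if e.mirna_log2fc * e.gene_log2fc < 0]
    # (4) keep miRNAs with at least `min_targets` remaining target genes.
    targets = defaultdict(set)
    for e in kept:
        targets[e.mirna].add(e.gene)
    return {m: genes for m, genes in targets.items() if len(genes) >= min_targets}
```

Ordering the filters this way mirrors the text; because rule (4) counts surviving targets, it must run last.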
Among the miRNA-mRNA networks identified, we found a set of interacting networks with 7 miRNAs acting as "HUBs", derived from both the 19-miRNA and 14-miRNA signatures (Table 2 and Figure 2d). Hierarchical clustering based on this 7-miRNA signature (Table S3) showed an overall increased expression in the more aggressive C1 tumors (Figure 2e). Importantly, the 7-miRNA signature had a cross-validated AUC of 0.79 in C1/non-C1 stratification, comparable to that of the other two signatures (Figure 3a); a comparable result was obtained when we considered differences in C1 predicted probability (Figure 3b). The C1 class predicted by each of the three signatures (7-, 14-, and 19-miRNA) was associated with a significantly increased hazard of death at 3 years in patients of all stages, with an increase in risk comparable to that of C1 patients identified by the 10-gene signature (Table 3). However, when we restricted the analysis to stage I ADC patients, the best risk stratification was achieved by the 7-miRNA signature, with an approximately two-fold increased risk of death for C1 patients (HR = 2.11; 95% Confidence Interval: 1.11–4.00; p-value = 0.0223) (Table 3). Interestingly, the networks of genes targeted by these 7 miRNAs were significantly enriched (q-value < 0.0001) in gene sets representing molecular mechanisms related to cancer progression, consistent with our initial assumptions (Figure 3c).
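The stage I hazard ratio can be checked for internal consistency against its confidence interval. Assuming the 95% CI is a Wald interval on the log-hazard scale, as produced by a Cox proportional hazards model, the standard error of log(HR) and the two-sided p-value can be back-calculated as below; the function name is hypothetical.

```python
from math import erfc, log, sqrt

Z_975 = 1.959964  # two-sided 95% standard normal quantile


def wald_p_from_ci(hr, lower, upper, z=Z_975):
    """Back-calculate SE(log HR), the z statistic and the Wald p-value from a 95% CI.

    Assumes the CI was computed as exp(log(HR) +/- z * SE).
    """
    se = (log(upper) - log(lower)) / (2 * z)
    z_stat = log(hr) / se
    p = erfc(abs(z_stat) / sqrt(2))  # two-sided normal tail probability
    return se, z_stat, p


se, z_stat, p = wald_p_from_ci(2.11, 1.11, 4.00)
print(f"SE(log HR) = {se:.3f}, z = {z_stat:.2f}, p = {p:.4f}")
```

For HR = 2.11 (95% CI 1.11–4.00) this gives SE ≈ 0.33, z ≈ 2.28 and p ≈ 0.022, in line with the reported p-value of 0.0223; a small discrepancy is expected from rounding of the CI bounds.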
Although most of the 90 miRNAs identified by BRB-ArrayTools (87/90, 97%) were included in the 200-miRNA set found by DESeq2, including 12 of the 14 miRNAs of the BRB-derived model, we also ran ARACNe on this 90-miRNA set. Among the three non-overlapping miRNAs, only hsa-miR-210-3p passed all the selection filters described above. However, when we added this miRNA to the 7-miRNA signature and performed cross-validation in C1/non-C1 stratification, the prediction performance was unchanged (AUC = 0.79).
2.2. Seven-miRNA-Signature Validation
Finally, we validated the 7-miRNA signature in an external cohort of 44 lung adenocarcinoma patients collected at the IRCCS Casa Sollievo della Sofferenza Hospital (CSS). Clinicopathological characteristics of the CSS cohort are reported in Table 1; stage I tumors were overrepresented in the CSS cohort (70%) with respect to the TCGA-LUAD cohort (54%). We performed qRT-PCR analysis of FFPE samples using the 10-gene signature and calculated a relative risk score to stratify the cohort into C1 (n = 16) and non-C1 (n = 28) groups (Figure 3d) (see Section 4). Next, we used TaqMan Low-Density miRNA Arrays to profile the 7-miRNA signature in the same cohort of 44 ADC and, using logistic regression, rederived a model based on the expression profile of the 7-miRNA signature (Table S4). The 7-miRNA model discriminated C1 from non-C1 tumors with an AUC of 0.76 (Figure 3e), with a significant difference (p-value = 0.0028) in C1 predicted probability between the two groups (Figure 3f). Remarkably, when we limited the analysis to stage I tumors, we obtained an AUC of 0.81 (Figure 3e) and a significant difference (p-value = 0.0108) in C1 predicted probability (Figure 3f).
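In the validation cohort, each patient's C1 predicted probability comes from the rederived logistic model, and the AUC summarizes how well that probability separates C1 from non-C1 tumors. A minimal sketch follows; the coefficients of Table S4 are not reproduced here, so any intercept and weights passed in are hypothetical, and the AUC is computed through its Mann-Whitney formulation (the probability that a C1 case scores higher than a non-C1 case).

```python
from math import exp


def predicted_probability(intercept, weights, expression):
    """C1 probability from a fitted logistic model: 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(w * x for w, x in zip(weights, expression))
    return 1.0 / (1.0 + exp(-linear))


def auc(scores_c1, scores_non_c1):
    """ROC AUC as the probability that a C1 case outscores a non-C1 case.

    Ties count as one half. O(n * m) pairwise comparisons, adequate for a
    cohort of a few dozen patients.
    """
    wins = 0.0
    for s_pos in scores_c1:
        for s_neg in scores_non_c1:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(scores_c1) * len(scores_non_c1))
```

Restricting both score lists to stage I patients before calling `auc` corresponds to the stage-restricted analysis described above.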