2.1. Development of MiRaGE Method
The outline of the MiRaGE server [9] is shown in Figure 1, and the methodological details have been described in [3]. When miRNA m is considered, genes are classified into three categories: (1) target genes of m; (2) genes that are not targets of m but are targeted by any other miRNA; (3) genes not targeted by any miRNA. Hereafter, we denote the set of genes in category (1) as Gm and the set of genes in category (2) as G0m. The statistical significance of the difference in gene expression between Gm and G0m is then calculated for each miRNA with various methods.
We have already established a web server [9] that performs the MiRaGE method. The MiRaGE server computes these P-values using the Kolmogorov-Smirnov test, the Wilcoxon rank-sum test, or Student's t-test. Three sets of miRNAs can be analyzed: conserved, weakly conserved, or no restriction. The P-value is then used to check whether miRNA m significantly regulates its target genes between two different experimental conditions. After applying FDR correction (BH method [10]) to the 162 P-values, we selected the miRNAs m whose FDR-corrected P-values are less than 0.05 as miRNAs that regulate their target genes significantly.
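The procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the server's implementation: the function names `mirage_sketch` and `benjamini_hochberg`, the input dictionaries, and the use of two-sided tests are all hypothetical choices made here for simplicity.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(pvalues):
    """Return BH-adjusted P-values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest P-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def mirage_sketch(expr_diff, targets_by_mirna, test="ks", alpha=0.05):
    """Compare Gm with G0m for every miRNA m and keep the significant ones.

    expr_diff: dict gene -> expression difference between two conditions.
    targets_by_mirna: dict miRNA -> set of target genes (hypothetical input).
    Two-sided tests are an assumption of this sketch.
    """
    tests = {
        "ks": lambda a, b: stats.ks_2samp(a, b).pvalue,
        "wilcoxon": lambda a, b: stats.mannwhitneyu(
            a, b, alternative="two-sided").pvalue,
        "t": lambda a, b: stats.ttest_ind(a, b, equal_var=True).pvalue,
    }
    # Genes targeted by no miRNA (category 3) never enter either set.
    targeted = set().union(*targets_by_mirna.values()) & set(expr_diff)
    mirnas = sorted(targets_by_mirna)
    raw = []
    for m in mirnas:
        g_m = set(targets_by_mirna[m]) & targeted   # category (1)
        g_0m = targeted - g_m                       # category (2)
        raw.append(tests[test]([expr_diff[g] for g in g_m],
                               [expr_diff[g] for g in g_0m]))
    adjusted = benjamini_hochberg(raw)
    # Keep miRNAs whose FDR-corrected P-value is below the threshold.
    return {m: q for m, q in zip(mirnas, adjusted) if q < alpha}
```

The sketch assumes every miRNA has at least one target and at least one non-target gene with a measured expression difference; empty groups would need to be handled separately.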
2.2. miRNAs Underrepresented in ES Cells after the Differentiation to Neuronal Cells
Aiba et al.
] measured gene expression profiles of mouse ES cells during differentiation to several lineages. Among them, we analyzed the gene expression profiles during differentiation to neuronal cells with N2-supplement B medium. They measured six time points: each day from day 0 to day 5 of differentiation induction. Importantly, they added Universal Mouse Reference RNA supplemented with mRNAs of ES cells as an internal reference (UMRR + ES), labeled with Cy5. All sample RNAs were labeled with Cy3, mixed with the Cy5-labeled UMRR + ES, and subjected to microarray analysis. The addition of UMRR + ES was intended to allow precise quantification of the expression of each gene relative to the externally added reference RNA [11]. Since there are two biological replicates for each time point, there are in total 2^6 = 64 distinct combinations of replicates across the six time points. After applying principal component analysis (PCA) to all 64 combinations of normalized profiles, we uploaded all of the first PCs to the MiRaGE server. Since the first PCs turn out to represent a monotonic decrease/increase of gene expression over time (data not shown), we can identify miRNAs that affect the gene expression profiles in a monotonic manner.
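The counting of replicate combinations and the extraction of first PCs can be illustrated as follows. This is a sketch under assumptions: the array layout (genes x time points x replicates), the function names, and the choice to treat genes as observations and time points as variables are hypothetical and are not taken from the original analysis.

```python
import itertools

import numpy as np


def replicate_combinations(n_timepoints=6, n_replicates=2):
    """One replicate index per time point: 2**6 = 64 tuples by default."""
    return list(itertools.product(range(n_replicates), repeat=n_timepoints))


def first_pc(profile):
    """profile: genes x timepoints array.

    Returns the per-gene scores on the first PC and the loadings
    of the first PC over the time points.
    """
    centered = profile - profile.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0], vt[0]


def first_pcs_for_all_combinations(data):
    """data: genes x timepoints x replicates array (assumed layout)."""
    n_genes, n_t, n_r = data.shape
    results = {}
    for combo in replicate_combinations(n_t, n_r):
        # Pick the chosen replicate at each time point.
        profile = data[:, np.arange(n_t), np.array(combo)]
        results[combo] = first_pc(profile)
    return results
```

In this layout, a monotonic trend over time would show up in the loadings, while the per-gene scores are the quantities that would be passed on as a gene-wise input.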
Table 1 indicates the miRNAs inferred by the MiRaGE method to be underrepresented in ES cells after induction of differentiation to neuronal cells. In other words, these miRNAs may be downregulated in ES cells after differentiation and may be important for the stemness of ES cells. Many miRNAs known to be biologically critical in ES cell biology appear in Table 1. For instance, miR-302a/b/d, the miR-290 cluster (291a-3p, 294 and 295), miR-200, and miR-429 are reported to be upregulated in undifferentiated ES cells [8]. Recently, it has been reported that induced pluripotent stem cells can be generated with several miRNAs including miR-302 and miR-200 [13]. It is also remarkable that miR-106a/b are listed among the miRNAs dominant in ES cells [15]. Most of the above-mentioned miRNAs showed statistical significance in all three tests used in the MiRaGE method. MiRaGE could thus infer many miRNAs that are believed to be critical for stemness in ES cells. These results demonstrate the power of the MiRaGE method for inferring important miRNAs in biological processes.
As an analytical method, MiRaGE can accept the principal component scores (PCs) of the expression profiles (see Figure 1 and section 3). This opens the interesting possibility of using MiRaGE to infer miRNAs specific to a PC. In other words, if a PC is related to a particular subpopulation in the sample population, we can infer miRNAs that are specifically important for that subpopulation. In addition, changing the statistical method does not drastically affect the list of selected miRNAs (Table 1). This suggests the robustness of the MiRaGE method.
To test whether these miRNAs are generally ranked highly as underrepresented during ES cell differentiation, we analyzed another experiment in the same dataset: the differentiation of ES cells to trophoblast. Table 2 indicates the miRNAs that appeared in the top 50 underrepresented in ES cells in both neural and trophoblast differentiation [10]. Among the top 50, 50% (25) of the miRNAs are commonly underrepresented, and all the above-mentioned miRNAs appear in the list. Among the 25, we found miRNAs that belong to the miR-17-92 cluster (miR-20, 93, and 17). The miR-17-92 cluster is known to be a critical component of the MYC pathway [16] and contributes to tumorigenesis in several malignancies [17]. It is a very interesting possibility that the miRNAs of this cluster may contribute to the stemness of non-tumor pluripotent stem cells.
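The comparison of two top-50 lists amounts to a set intersection. The following sketch shows one way to compute it; the function name `top_k_overlap` and the dictionary inputs (miRNA name mapped to P-value) are hypothetical.

```python
def top_k_overlap(pvals_a, pvals_b, k=50):
    """Return the miRNAs shared by the k smallest P-values of two analyses.

    pvals_a, pvals_b: dict miRNA -> P-value (hypothetical inputs).
    The second return value is the shared fraction of the top k.
    """
    top_a = set(sorted(pvals_a, key=pvals_a.get)[:k])
    top_b = set(sorted(pvals_b, key=pvals_b.get)[:k])
    shared = top_a & top_b
    return shared, len(shared) / k
```

With k = 50, a shared fraction of 0.5 corresponds to 25 miRNAs common to both lists.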
2.3. miRNAs May Be Overrepresented in ES Cells after the Differentiation to Neuronal Cells
There are many miRNAs known to be critical in neurogenesis. We had expected that MiRaGE could find some of those characterized miRNAs in the dataset of Aiba et al.
]. However, when we used the standard MiRaGE method, we could not infer any miRNA as significantly overrepresented after neuronal differentiation (i.e., potentially critical in neural differentiation). To overcome this difficulty, we changed the data processing algorithm as described in the method section and found that three miRNAs showed relatively small P-values in the t-test (Table 3). Among them, miR-184 is known to be overexpressed in the central nervous system [18], supporting the biological relevance of MiRaGE. However, we did not find well-documented neurogenesis miRNAs such as miR-9 or miR-124 [19]. To address this issue, we analyzed another dataset from the same group [20] with the modified MiRaGE method. The comparison between ES cells and adult brain tissue indicated that miR-124 showed modest significance as overrepresented in brain (P = 0.044), suggesting that this miRNA may be more important in mature brain tissue than in neuronally differentiating ES cells.
Generally, most of the analyzed miRNAs showed strong statistical significance as underrepresented in differentiating ES cells, whereas not many miRNAs were statistically significant as overrepresented in our analysis (Tables 1
). We analyzed the differences in expression of the miRNA target genes in the same dataset, and more than half of the probes for all miRNA targets used in the MiRaGE analysis were downregulated in ES cells (53.5%, P < 0.0001). We also analyzed another dataset from a different group: ES cell differentiation to embryoid bodies [21]. The results showed that far fewer miRNA targets were downregulated in ES cells compared with cells differentiated to embryoid bodies (23.6%, P = 0.007), and we did not observe the uneven distribution of P-values in the MiRaGE analysis of that dataset (data not shown). Hence the uneven distribution of P-values shown in Tables 1
may be caused by the experimental procedure [11].
Because of the statistical algorithm, in the original MiRaGE method it is the number of suppressed target mRNA species, rather than the absolute magnitude of suppression of target gene expression, that yields lower (i.e., more significant) P-values for a miRNA. This characteristic of the original MiRaGE is apparent in the difference in P-values shown in Tables 1
. The biological logic of this algorithm is based on the notion that a miRNA with more target species should be biologically more important. As mentioned above, we had to modify the analytical method of MiRaGE for the miRNAs known to be critical in neural differentiation to be ranked highly. This experience suggests that some miRNAs may function by suppressing fewer targets with larger absolute differences in mRNA expression. Such miRNAs may be more difficult to infer as significant using the original MiRaGE method, as the absolute differences in mRNA expression are not used in the analysis. Further experimental and analytical studies will be needed to define which calculation method of MiRaGE is more suitable for individual types of datasets and/or miRNAs.
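The dependence on the number of targets rather than on the magnitude of suppression can be illustrated with a rank-based test on toy numbers. The sketch below uses the Wilcoxon rank-sum (Mann-Whitney) test only; all values are invented for illustration and the helper name `rank_sum_p` is hypothetical.

```python
import numpy as np
from scipy import stats


def rank_sum_p(target_values, background_values):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) P-value."""
    return stats.mannwhitneyu(
        target_values, background_values, alternative="two-sided").pvalue


# Toy expression differences (invented values, no ties).
background = np.linspace(0.0, 1.0, 20)         # genes in G0m
few_strong = np.array([-10.0, -11.0, -12.0])   # 3 targets, strong suppression
few_stronger = few_strong * 10                 # same 3 targets, 10x stronger
many_modest = -0.1 - 0.01 * np.arange(10)      # 10 targets, weak suppression

p_few = rank_sum_p(few_strong, background)
p_few_scaled = rank_sum_p(few_stronger, background)
p_many = rank_sum_p(many_modest, background)
```

Because only ranks enter the test, making the three targets ten times more suppressed leaves the P-value unchanged, whereas ten weakly suppressed targets give a smaller P-value than three strongly suppressed ones.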
The present study showed the potential of the MiRaGE method for predicting miRNAs critical for cellular differentiation. At present, however, MiRaGE may be more suitable for inferring miRNAs that function differently between two quite similar but distinct cell populations, such as histopathological subtypes of various human cancers. For example, ovarian cancers consist of several different morphological subtypes (serous, mucinous, endometrioid, clear cell and undifferentiated), and some of them show poorer prognosis. MiRaGE may be useful for identifying the miRNAs specific to subtypes with poor prognosis and may provide insight into molecular therapeutic targets for these cancers.
How, then, can we improve the MiRaGE method to analyze differences in miRNA function between two datasets with quite different expression profiles? One possibility is to analyze gene expression profiles obtained with miRNA overexpression or suppression. This type of experiment reveals important parameters for MiRaGE, such as the valid targets of a miRNA in a cell and the maximum extent of gene suppression by the miRNA for each target. With these, we could adjust the strength with which a miRNA suppresses each target gene more precisely. We could also introduce additional parameters such as the state of chromatin modifications in the promoter regions of target genes in the two different cell states; for example, we could exclude target genes whose promoter is inactivated in a particular cell. Curation of potential targets using experimental data and promoter activities may also improve the precision of the MiRaGE method.