Ordering of Omics Features Using Beta Distributions on Montecarlo p -Values

: The current trend in genetic research is the study of omics data as a whole, either combining studies or omics techniques. This raises the need for new robust statistical methods that can integrate and order the relevant biological information. A good way to approach the problem is to order the features studied according to the different kinds of data so a key point is to associate good values to the features that permit us a good sorting of them. These values are usually the p -values corresponding to a hypothesis which has been tested for each feature studied. The Montecarlo method is certainly one of the most robust methods for hypothesis testing. However, a large number of simulations is needed to obtain a reliable p -value, so the method becomes computationally infeasible in many situations. We propose a new way to order genes according to their differential features by using a score deﬁned from a beta distribution ﬁtted to the generated p -values. Our approach has been tested using simulated data and colorectal cancer datasets from Inﬁnium methylationEPIC array, Affymetrix gene expression array and Illumina RNA-seq platforms. The results show that this approach allows a proper ordering of genes using a number of simulations much lower than with the Montecarlo method. Furthermore, the score can be interpreted as an estimated p -value and compared with Montecarlo and other approaches like the p -value of the moderated t -tests. We have also identiﬁed a new expression pattern of eighteen genes common to all colorectal cancer microarrays, i.e., 21 datasets. Thus, the proposed method is effective for obtaining biological results using different datasets. Our score shows a slightly smaller type I error for small sizes than the Montecarlo p -value. The type II error of Montecarlo p -value is lower than the one obtained with the proposed score and with a moderated p -value, but these differences are highly reduced for larger sample sizes and higher false discovery rates. Similar performances from type I and II errors and the score enable a clear ordering of the features being evaluated.


Introduction
A major aim in most omics analyses is to order the genes according to their biological relevance with respect to a biological phenomenon [1,2]. To do so, experimental studies are designed and performed but usually the number of available samples in these studies is much lower than the number of expressed features (genes, methylation, etc.). In statistical terms, the number of variables is much higher than the sample size. We have a large number of hypotheses to be tested and distributional hypotheses about these features are not tenable i.e., the null distribution of the statistics is unknown. A common approach to tackle this problem resorts to randomization methods where different realizations of this randomization distribution are generated and, from them, a Montecarlo p-value is obtained [3][4][5][6]. Then these p-values are simply ordered or compared with a given threshold. These randomization procedures are also attractive because, due to their lack of assumptions about the underlying distribution followed by the data, they can be generally applied and are therefore indicated for data which comes from many different sources (as is common in meta-analysis of several experiments with omics data).
Let B be the number of randomizations (plus the p-value obtained using the original data) which are generated in a study performed for all genes (where gene expression is an example of feature); the number of possible Montecarlo p-values is B + 1 and let us call N the total number of genes to be ordered [3]. By computational reasons B use is in the order of 100 to 10,000, whereas the number of gene uses is more than 20,000. It is even not uncommon to work with N = 40,000 or more, as for isoforms or CpG site (the DNA region where a cytosine is followed by a guanine nucleotide and can be methylated). Therefore, and since the possible different p-values in a Montecarlo test of B randomizations are exactly B + 1, a high number of ties appear [4,5]. This clearly precludes obtaining a reliable ranking of the features, as long as the original p-values are used for other purposes like aggregation. The motivation of this paper is precisely to obtain a better value that leads to a better ordering. An appropriate order of the features allows us to properly compare the studies and therefore to be able to group different datasets to obtain robust results.
The order in which the features are differentially identified indicates a stronger relationship with the biological question under study. The most significant features should have a closer relationship with the biological phenomenon. However, the differentially expressed features which are first on the list have usually not been validated experimentally [7,8] so the researcher has no evidence that allows him/her to order them properly. Thus, some weakly related genes are included in the first positions of the list, introducing bias in the biological understanding. Therefore, it is the correct ordering of genes, more than just a p-value, that is essential to adequately study a biological problem [9,10]. Certainly, the difficulty would not arise if the number of randomizations in a Montecarlo test were sufficiently large. Indeed, the result of the p-value when the number of randomisation is large enough is considered as the most accurate estimation we can get from the given data. What our method intends to do is to get the best possible approximation to this limit value without performing an unfeasible number of simulations.
The null hypothesis of no differential expressions per feature can be tested using permutation tests. As stated before, this is a reliable, albeit computationally intensive, choice. Given an experimental design and a statistical test, the researcher tests the null hypothesis of no differential expression for a given feature against the alternative hypothesis of existence of a differential feature, i.e., the studied feature is associated with the experimental design. A random permutation of the label-sample correspondence produces a different dataset, and therefore a different p-value using the same statistical test. Obviously, if we consider B permutations of the label-sample correspondences, B different p-values are obtained (p 1 , . . . , p B ). We denote as p 0 the p-value obtained using the original sample classification, i.e., the original dataset. This comment applies to both the simplest experimental design where paired or independent samples are compared but also to more complex designs where some null effect (coefficient) is tested.
The widely used procedure to test the significance of the feature consists in the comparison between p 0 and p 1 , . . . , p B and the usual method of comparison relies on counting the number of p i 's with a value less than p 0 , which is used to get the Montecarlo pvalue. This is a p-value conditioned to the observed data and to the null distribution used to randomize the sample-treatment correspondence. It is a solid and useful approach but has a major drawback: This Montecarlo p-value can only take a small number of values, indeed only B + 1 different values. If the procedure is repeated many times (multiple comparisons) the same p-value will be obtained many times so a final ordering using them will produce a great number of ties. Our aim in this paper is to use the same robust procedure while doing the final comparison with a method that increases the discrimination between otherwise equivalent randomizations.

Modelling Montecarlo p-Values
As will be stated in Section 2.3, we have paired and non-paired datasets. A statistical test is chosen for testing the null hypothesis of no differential expression. Whatever the statistic used, t 0 will be its value obtained using the original sample classification. The values t 1 , . . . , t B are the corresponding observed statistics for the B randomised sample classification. The randomisation distribution takes into account the experimental design. The Montecarlo p-value ( [3,6]) is defined as It is assumed a two-tail distribution and that values with a large absolute value correspond to the alternative hypothesis. The Montecarlo p-value is certainly a p-value under the null hypothesis of no differential expression because we assume that all possible orderings of {t 0 , t 1 , . . . , t B } have the same probability, which is 1/(B + 1). Instead of using the formerly defined Montecarlo p-value, a common practice consists in using the Z-scale. Let Φ be a cumulative distribution function, either that of the standard normal distribution or of other continuous distribution; then, for each observed statistic t i we define In general, if we consider the observed t i values as samples of a random variable T we have the following transformation Under this transformation an equivalent definition of the Montecarlo p-value is These p-values or the corresponding adjusted p-values are used to order the different features evaluated, which in this way become ranked. The following comment is the motivation of this paper. Note that a Montecarlo p-value obtained from B replications can take only one of the values in {0, 1/(B + 1), . . . , 1} (considered as a p-value, as in Equation (1)) or their corresponding transformed values, as in Equation (4). However, whatever the case, there are only B + 1 possible values. Moreover, note that the number of features is N and let us remember that N uses are in the range of from a few thousand to almost a million (for instance, in methylation studies). On the other hand, the number of replica B uses is about one hundred (generally, no more than one thousand). In the unrealistic case of the observed p-values being uniformly distributed (under the null hypothesis of no differential expression) the mean number of values equal to each possible p-value would be N/(B + 1) which would be the average number of ties. Moreover, even more ties will appear if the distribution is not uniform. In our opinion, a basic result in a differential expression analysis has to be the final ordering of the genes or features studied. It is clear that doing that only using the Montecarlo p-value of Equation (1) or Equation (4) does not allow a sensible order.

Our Approach
The idea proposed is as follows. The Montecarlo p-value depends only on the number of simulations and on the number of generated u i values which are smaller than the original u 0 . It does not depend on the relative location of these generated u i 's with respect to u 0 . To highlight the consequences of this let us consider two synthetic examples of extreme situations, both with u 0 = 0.03. In the first situation all u i 's are greater than u 0 and also very close to 1. In the second situation all u i s are greater than u 0 , too, but in the interval [0.031, 0.032]. In both cases the Montecarlo p-values are zero but clearly the second situation shows that u 0 is not so extreme. It is natural to define some kind of score to distinguish both situations.
Our approach will assume that even the distribution of the random variable U defined in Equation (3) is strictly unknown. We have to take into account that the tests are applied to each row in the expression matrix, i.e., thousands of tests. Any distributional assumption done for all rows is not tenable, i.e., we cannot assume a common cumulative distribution function Φ. This is why we propose the beta distribution just as an approximate and convenient null distribution. We are not assuming that the distribution of U is a beta distribution (which is not true). However, we believe that this family of distributions is general enough to approximate the real unknown distribution of the statistic U, just as a simple approximation. Note that the support of the distribution of U is the unit interval [0, 1]. Whatever its real null distribution would be, it can be reasonably approximated by a beta distribution with appropriate α and β values. In particular the uniform distribution in the unit interval corresponds to α = β = 1 as is well known. From a probabilistic point of view this is a reasonable assumption that will be discussed in Section 4. Let us remember that a random variable follows a beta distribution with parameters (α, β), for 0 ≤ u ≤ 1 and zero otherwise; the parameters are positive real numbers, α, β > 0 and the normalisation factor, the beta function, is defined as This is a distribution suitable to model data in the unit interval [0, 1] with great flexibility.
We will have random variables U 0 , U 1 , . . . , U B corresponding to the original sample classification and to the B replications of the null distribution chosen. This sample will be considered as independent and identically distributed random variables with a beta distribution. Since the p-values are in the unit interval [0, 1], and since this family of distributions is sufficiently flexible, we consider it as a suitable model for a lot of different datasets, as long as the α and β parameters are correctly estimated. For the given U 0 = u 0 , we are interested in We will call score to γ(u 0 ). Really, the Montecarlo p-value defined in (4) is just an estimator of γ(u 0 ) without any distributional assumption, i.e., where, for a set S, 1 S is defined as 1 S (u) = 1 if u ∈ S and zero otherwise. The natural estimator for γ(u 0 ) assuming a beta distribution for the p-value would bê where (α,β) are estimated using the observed u 1 , . . . , u B . The parameters (α, β) were estimated using two methods: The maximum likelihood estimator and the moment estimator [11]. Both estimation methods were implemented in C++ and included as part of an R-package publicly accessible. Furthermore, it is possible to obtain a confidence interval for the score defined in Equation (7) by using the delta method ( [12], p. 587). The confidence region for (α, β) will produce a confidence interval for γ(u 0 ) given by h(α,β) ± n −3/2 Z 1−α/2 ∇h(α,β) I(α,β) −1 ∇h(α,β).
where Z 1−α/2 is the 1 − α/2-quantile of the standard normal distribution, the function h is defined as h(α, β) = p 0 the partial derivatives of h with respect to its variables (α, β). Full details are included in Appendix A. The procedure will estimate the score γ (in fact, an estimation of the p-value) for each gene and it will order the genes according to such score. For the practical applications that will be described in Section 2.3 we will consider experiments with two conditions. Either a paired or a non-paired t-test has been used according to the experimental design. If n denotes the total number of cases plus controls, keeping constant the number of cases (let it be n 1 , and therefore, the number of controls is n 2 = n − n 1 ) a random choice of n 1 elements among n is generated and considered a random selection of cases. Each random selection will produce a different statistic and a p-value. Let t 0 and p 0 be the statistic and p-value obtained with the true classification of cases and controls and let (t i , p i ) be the corresponding values observed for the i-th random selection. To order the genes a beta distribution will be adjusted per gene, and then the estimate will be calculated as stated before. For the transformation between raw p-values and integrated p-values or score (Equation (4)) two Φ functions were considered: The cumulative distribution functions of a standard normal distribution and of a t-distribution with the right number of degrees of freedom assuming a common variance. Additionally the means were compared using a moderated t-test [13].

Applications to Omics Data
Different applications of the just proposed score to three types of omics data are provided. All these datasets can be downloaded from the public repositories Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/, accessed on 5 June 2021), Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra, accessed on 5 June 2021 ) and The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov/, accessed on 5 June 2021 ). For all example datasets two level experimental factor is considered indicating whether each sample corresponds to healthy or to cancerous colorectal tissue. The file SupplementaryMaterialData.pdf, provided in the Supplementary Material, contains a detailed description of where to find and how to preprocess the different datasets.
• RNA-Seq data: Three RNA-seq paired datasets were analyzed. Two of them correspond respectively to the Bioprojects PRJNA413956 [14] and PRJNA218851 [15] and the third is a dataset with 50 pairs of preprocessed data (count files) obtained from the TCGA (see Table 1, experiments 25 to 27). • Methylation data: The Infinium DNA MethylationEPIC assay GSE149282 dataset ( [16]) was included as example of differential methylation analysis. The MethylationEPIC array includes 850,000 methylation sites (CpGs) across the genome at single-nucleotide resolution. The dataset is made with 24 colorectal cancer (CRC) and normal adjacent colon from 12 patients. See Table 1 (experiment 24). We compare the results obtained using the score with the moderated t-test ( [13], implemented in [17]). • Microarray data: The expression array datasets were downloaded from GEO [18], by searching the terms "expression profiling by array", "Homo sapiens", "tissue", "colorectal cancer NOT cell line". This query returned 218 results (to date 15 May 2019). Of these, 195 datasets were excluded because of xenografts, organoid culture, Superseries, NanoString platform and others. Finally 21 datasets corresponding to case/control samples obtained directly from patients were included; from these datasets, 9 were paired, i.e., healthy and cancerous samples from the same patient, and the other 12 were non-paired studies, i.e., those with independent samples (see Table 1, experiments 1 to 23. Datasets 20 and 22 were discarded afterwards and therefore were not included in the table).

Comparison between Conventional Montecarlo p-Value and the Score
To compare the Montecarlo p-value and the score, we selected genes that are differentially expressed in most colorectal cancer studies according to the literature, for instance MYC, CD44, OLFM4 and other genes (see Supplementary Material).
There are three different choices of the proposed method that have to be evaluated: The use of a t-Student vs. normal distribution for the transformation between the raw p-values and the score, the number of randomizations and the way to generate them (between-pair or complete). These possibilities were evaluated for the Montecarlo p-value and the score. The number of randomizations to be executed, B, is relevant since the development of a computationally feasible approach is one of the objectives. For each experiment analyzed, B values from 10 to 1000 in steps of 10 were tested.
A representative example of the results is included in Figure 1 which shows the results of the MYC gene in experiment 1 (all results for each gene in each experiment are included in the Supplementary Material). In general, no differences were observed when using the t-Student distribution ( Figure 1A) vs. normal ( Figure 1B) distribution. On the other hand, major differences were found when the between-pair randomization ( Figure 1A,B) was compared to complete randomization ( Figure 1D,E). Regarding the number of simulations B ( Figure 1C,F), the Montecarlo shows a greater variability than the beta distribution. Furthermore, it is not necessary to perform a large number of simulations to reach a value similar to the one obtained with the maximum B = 1000 evaluated. Indeed, from a small number of simulations and up, the value remains sufficiently stable. From a statistical point of view the natural randomization distribution would be the between-pair randomization distribution because the original data in this example are paired. The statistical practice suggests that in such cases the between-pair distribution should provide a more powerful test. This is what it could be expected but surprisingly the observed p-values for complete distribution in our data seem to perform better. This particular gene is usually reported as associated with colorectal cancer. However, this is not a gold standard (in fact, there is none), but can be used to evaluate the method. The complete randomization distribution detects the gene, which does not happen with the between-pair randomization distribution. This and other examples suggested that we recommend the complete against the between-pair distribution independently of the original design.
Therefore, the parameters selected to be used in further experiments with the beta distribution method were the use of a normal distribution for the transformation between raw p-values and integrated p-values or score, the use of complete randomization and a number of B = 300 simulations.

Simulation Study
A simulation study is proposed to evaluate the ability of the method to mark properly the significance of a gene. Artificial data coming from a model that mimics a real experiment will be generated but obviously the true significance of each fictitious gene is known. This is done with the following steps:

1.
A value for the false discovery rate (FDR) α is given.

2.
For a given model, a realisation is generated. 3.
The Montecarlo p-value, our proposed score and the p-value of the moderated t-test are calculated. A number of simulations (from 100 to 1000) is used for the evaluation of the Montecarlo p-value and of the proposed score.

4.
The Benjamini-Hochberg correction will be applied to the three quantities evaluated in step 3.

5.
The features (genes) declared as significant will be compared with the (real) significant features. 6.
Steps 2 to 5 are repeated.
This experimental setup requires the generation of random but plausible matrices of expression. In order to do so two different stochastic models have been implemented and used. The first one has 200 significant genes and 800 non-significant genes. The expressions of a non-significant gene are independent samples from a normal distribution with mean 20 and standard deviation 1 (N(20, 1)), whereas the expressions of a significant gene will come from a normal, too, N(m, 1), as in the first condition but with mean m = 20 + δ for a given positive δ. We will vary the value of δ from 0.01 to 4 in steps of 0.02. Additionally the same number of observations are generated per condition. This sample size goes from 10 to 20 with unit increment. The number of replicas goes from 100 to 1000 with an increment of 100. Finally, different false discovery rates α were used from 0.001 to 0.05 with an increment of 0.001. The second stochastic model uses the gamma distribution instead of the normal distribution. The parameters were chosen in such a way that we reproduce the habitual setup, i.e., the expressions of a non-significant gene are independent samples from a gamma distribution with mean 20 and standard deviation 1, whereas the mean of the first group for significant features is equal to 20 and for the second group is m = 20 + δ (with δ taking values from 0.01 to 4 in steps of 0.02). The variance is equal to 1 for both groups. The false positive and false negative proportions were estimated (i.e., the type I and II errors). Figure 2 is an example of this simulation; it displays the two types of errors estimated for different experimental settings (the left column corresponds to type I error and the right column to type II error) using a total of 100 simulations and a normal cumulative distribution function. The two first rows correspond to α = 0.001, while the two last rows correspond to α = 0.05. The first and third rows correspond to a number of samples per group n 1 = 10, whereas the second and fourth are for n 1 = 20. The red, green and blue lines correspond to the Montecarlo method, our score and the moderated t-test, respectively. From the point of view of type I error, the performance of our score is very similar to the moderated t-test with values smaller than the Montecarlo p-value. The behaviour for the type II error is not so clear. When the groups compared have 20 values and FDR = 0.05 then practically there is no difference between the methods except that the Montecarlo performs better for small differences between the means. A similar comment applies for comparisons of groups of 10 values, although the observed differences are greater. The performance of our score and that of the moderated p-values are very close. When smaller groups with 10 values are used, the Montecarlo p-value shows a lower type II error. The differences are smaller for FDR = 0.05 than for FDR = 0.001.
In summary, our score improves the performance of the Montecarlo p-value for the type I error and it shows a behaviour very close to that of the moderated the p-values. If the type II error is considered, then the Montecarlo p-value shows the best performance and then our score. Additional plots with comparable results are included in the file SupplementaryMaterialAddons.pdf as supplementary material, in particular some GIF animations showing the behaviour of both types of error are shown.  Type I (left column) and type II (right column) errors when comparing two groups of gamma distributed random values. The difference between the means is the abscisa, δ. Please, notice the different scales in the Y-axis for types I and II. First and third rows (respectively, second and fourth rows) correspond to two groups of 10 values (respectively, 20 values). The two first rows (respectively, the two last rows) use a false discovery rate equal to 0.001 (respectively, 0.05).
The functions rArrayNorm and rArrayGamma included in the associated R package OMICfpp2 were used to generate the random expression matrices. The file fun-BetaMonteca rlo20 contains the function doReplication used in the simulation study and the whole code is included in the last section of SupplementaryMaterialMethods_BetaMonte Carlo.pdf.

Using the Score with Real Datasets
Colorectal cancer datasets from different platforms were used to test the biological effectiveness of the proposed approach.

Using the Score for Multi-Cohort Analysis
The biological results of 21 colorectal cancer (CRC) datasets (see Table 1) were analyzed. As stated before, a normal distribution, complete randomization and 300 realizations were used. A total of 18 genes were found to be significant (score < 0.05) in all the microarrays of gene expression experiments (see Table 2). With a less restrictive criterion (namely, admit a gene if it was present in most of the studies, and missing in no more than two), 197 genes were significant since not all platforms include the same genes. With respect to the genes of the first criterion, most of these 18 were reported associated to colorectal cancer in experimental studies (see Section 4). The moderated t-test method included in [17], (limma) is the most used method for statistical analysis of microarray datasets. Only the genes PRKAR2B and B3GALT4 were reported as significant in the 21 experiments by using the limma model and these two genes were also reported by our score (Table 2). Thus, the results obtained for genes in the Table 2 in each experiment using both methods were contrasted (Figure 3). In general, a clear pattern is observed through the experiments and for each gene using the proposed score, whereas this does not happen in the p-value of the moderated t-test method.

Using the Score on Different Platforms
The infinium methylationEPIC array with 850,000 methylation sites throughout the human genome was analyzed using the score. The method proved to be efficient at analyzing the variables in a short period of time, without ties and with coherent biological results. Figure 4 displays our score versus the p-values using the well established method limma. The observed correlation is 0.96. Note that our approach does not need the parametric hypothesis.
Three RNA-Seq datasets were included in the analysis. The RNA-Seq data are counts. The top ten genes reported as differentially expressed in the PRJNA413956 experiment were ETFDH, RPSAP48, IPO7P2, CEMIP, LILRB5, KIFAP3, ENC1, LILRB5, TROAP and SMG9. For PRJNA218851 dataset were the genes OTOP3, BEST4, SPIB, HAUS6P3, UNC5C, OTOP2, CA7, SALL4, SH2D6 and ETV4. In both cases, the genes have previously been linked to CRC and other cancers, but new genes are also identified, which are reported to be associated with CRC here for the first time. Regarding the results obtained for the TCGA data, 3567 zeros were obtained, which does not allow order concerning the genes in this case. This also occurred with some of the microarray experiments, curiously, with larger sample sizes. To solve this problem, bootstrap were carried out, reducing the number of ties from 3567 to 195. More experiments are necessary to refine this option when the ties do not allow order concerning the significant genes.

Discussion
The major aim of the paper is to propose a score for ordering omics features (gene expression, methylation levels, etc.). It is proposed as an improvement of the usual Montecarlo p-value. It is closely related with it but improves the use of the randomization p-values. This is the focus of the paper. Both approaches have been compared with a well established methodology, the moderated t-test.
The expression data that come from microarrays and similar techniques have a high level of noise and masking between different effects. Therefore, comparisons between experiments performed in different although similar conditions might be biased which implies that the obtained p-values should be taken cautiously. Nevertheless, the relative importance of the expression of a gene in relation with the others in the same experiment is likely to be more meaningful, and therefore ordering is a key issue to be considered.
The central idea is to assume that p-values can be considered as samples that come from a beta distribution. We think that, given the flexibility of this family of distributions, the assumption is tenable. It is true that, if some knowledge of the underlying distribution of the given data were available, tailoring the distribution would be the obvious choice. However, in real situations the use of a beta distribution provides us a powerful tool. It is reasonable to wonder if the family of distributions covered by a beta family is flexible enough, i.e., if multimodal distributions could not be better fitted by something more complex like a mixture of beta distributions. Following the parsimony principle (use the simplest possible model that accounts for the data) we decided not to do so, since the results seem sensible. Nevertheless, this is a possibility to be explored in further work, taking into account the balance between goodness of fit and model complexity.
The raw statistics are transformed by a link function in order to obtain the p-values. These link functions are cumulative distribution functions. The score proposed is to be used mainly for ordering purposes. Nevertheless, the simulation study shows that if it is used as a p-value, the type I and type II errors are similar to those obtained with the moderated p-values, and both are different from the Montecarlo method. Notice that these results were obtained with fewer simulations i.e., less computational workload (with respect to the Montecarlo method) and without explicit assumptions about the specific distribution followed by the data (with respect to the moderated p-values method).
Moreover, the first step of the method used a t-test statistic, but it is worth mentioning that any other statistic could have been used, too. With respect to the practical application of the method, the cumulative distribution function used as a link does not seem to be crucial since similar results have been obtained using the distribution function of standard normal and of the t-Student distribution.
With respect to computation time, the method was programmed in C++ and automatically uses threads in several process units/cores if they are available, which makes it efficient, but it has been embedded into an R package to be called from R code, which makes it easier to use. The fact of being able to obtain good orderings with a relatively low number of randomizations constitutes an advantage with respect to the Montecarlo method.
Further, when paying attention to the generated ordering in a given experiment, our method is better than the classical Montecarlo method.
Finally, we proposed a pattern made of 18 genes that, using our approach, appear differentially expressed in the multi-cohort colorectal cancer datasets analysed. Most of these genes were found significant by validation in the relevant bibliography, for instance, PRKAR2B and B3GALT4 were found to be differentially expressed in all experiments, both using limma and our approach. The protein kinase cAMP-dependent type II regulatory subunit beta (PRKAR2B), has been associated with cancer [38], including colorectal cancer, in more than 50 publications. The B3GALT4 gene has been associated with the prognosis of colorectal cancer [39]. Other genes identified only by our score as the transforming growth factor beta induced (TGFBI) [40] or CXCL1 [41] are also widely related to cancer. Therefore, we consider it interesting to evaluate their joint biological function and their diagnostic value in subsequent studies, since the novel approach proposed here obtains reproducible results between experiments.

Conclusions
The approach proposed in this paper has shown a better performance than the Montecarlo p-values but with much fewer simulations and, differently to other methods, namely moderated t-test, without additional assumptions. It obtains reliable biological results in multiple platforms of omics data and across different experiments.
Institutional Review Board Statement: Ethical review and approval were waived for this study, due to the fact that it uses publicly available data from previous studies which received its ethical approval when they were done.
Informed Consent Statement: Not applicable, since it was obtained when original studies that provide the publicly available data were done.