# A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data


## Abstract


## 1. Introduction

## 2. Properties of LC-MS Untargeted Datasets: High-Dimensional and Multicollinear

**Table 1.** Summary of working examples obtained from LC-MS untargeted metabolomic experiments. Further experimental details and methods can be obtained from the references. (KO = Knock-Out; WT = Wild-Type).

| Example | Biofluid/Tissue | Sample groups | # samples/group | # XCMS variables | System | Reference |
|---|---|---|---|---|---|---|
| Example #1 | Retina | KO | 11 | 4581 | LC/ESI-QTOF | [17] |
| | | WT | 11 | | | |
| Example #2 | Retina | Hypoxia | 12 | 8146 | LC/ESI-QTOF | [16] |
| | | Normoxia | 13 | | | |
| Example #3 | Serum | Untreated | 12 | 9877 | LC/ESI-TOF | [18] |
| | | Treated | 12 | | | |
| Example #4 | Neuronal cell cultures | KO | 15 | 8221 | LC/ESI-QTOF | unpublished data |
| | | WT | 11 | | | |

## 3. Sample Size Calculation in LC-MS Untargeted Metabolomics Studies

**Figure 1.** (A) Power curves for example #2 (∆) and example #4 (□) with sample size on the x-axis and estimated power using 5% FDR on the y-axis. Estimated densities of effect sizes for example #4 (B) and example #2 (C) with the standardized effect size on the x-axis and estimated densities on the y-axis. Bimodal densities as in example #2 reflect more pronounced effects.

## 4. Handling Analytical Variation

The variation across QC samples can be expressed as a relative standard deviation (CV_{QC}) according to Equation (1), where S and X are, respectively, the standard deviation and the mean of each individual feature detected across the QC samples:

$$CV_{QC} = \frac{S}{X} \times 100 \quad (1)$$

Similarly, the total variation (CV_{T}) can be defined according to Equation (2), where S and X are, respectively, the standard deviation and the mean calculated for each mzRT feature across all study samples in the dataset:

$$CV_{T} = \frac{S}{X} \times 100 \quad (2)$$

The variation across QC samples (CV_{QC}) is expected to be low since they are replicates of the same pooled sample. Therefore, Dunn et al. [24] established a quality criterion by which any peak that presents a CV_{QC} > 20% is removed from the dataset and thus ignored in subsequent univariate data analyses. Red and green spots in Figure 2 illustrate the CV_{T} and CV_{QC} frequency distributions, respectively, for example #3, in which QC samples were measured. As expected, the highest percentage of the mzRT features detected across QC samples presents the lowest variation in terms of CV_{QC} (green line). Conversely, the highest percentage of the mzRT features detected across the study samples holds the highest variation in terms of CV_{T} (red line). Notice that the red and green lines intersect around the threshold proposed by Dunn et al. [24]. Additionally, other studies performed on cerebrospinal fluid, serum or liver QC extracts also reported a CV of around 20% on experimental replicates [25,26].

Since the total variation (CV_{T}) can be expressed as the sum of biological variation (CV_{B}) and analytical variation (CV_{A}) according to Equation (3), the computed CV_{T} should be larger than 20% (the most widely accepted analytical variation threshold) for a metabolite feature to carry biological variation:

$$CV_{T} = CV_{B} + CV_{A} \quad (3)$$

Consequently, when QC samples are not available, we propose discarding those mzRT features with CV_{T} < 20%, since their biological variation is below the analytical variation threshold. Figure 2 shows the frequency distribution of CV_{T} for working examples #1, #2 and #4, where QC samples were not available. According to our criterion, the mzRT features to the left of the threshold hold more analytical than biological variation and should be removed from further statistical analysis. This is admittedly a coarse criterion, since it assumes that the analytical variation of all metabolites is similar, which is not accurate given that instrumental drifts do not affect all metabolites evenly. It should also be borne in mind that tightly regulated metabolites presenting low variation, such as glucose, will likely be missed by a 20% CV_{T} cut-off. Notably, example #2 and example #4 show the highest and lowest percentage of mzRT features with more than 50% CV_{T}, respectively. Therefore, there is more intrinsic variation in example #2 than in example #4. Whether such variation relates to the phenomena under study remains to be ascertained using hypothesis testing.
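The CV-based filtering described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the dictionary layout and the feature labels (`M100T50`, `M200T80`) are hypothetical, and how the exact 20% boundary is treated is an assumption.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Relative standard deviation (CV) in percent: 100 * S / X."""
    return 100.0 * stdev(values) / mean(values)


def filter_features(study, qc=None, threshold=20.0):
    """Return the names of features that pass the CV filter.

    study: dict mapping feature name -> intensities across study samples.
    qc:    optional dict mapping feature name -> intensities across QC samples.
    With QC data, features with CV_QC above the threshold are removed.
    Without QC data, features with CV_T at or below the threshold are removed
    (boundary handling is an assumption of this sketch).
    """
    kept = []
    for name, intensities in study.items():
        if qc is not None:
            keep = cv_percent(qc[name]) <= threshold
        else:
            keep = cv_percent(intensities) > threshold
        if keep:
            kept.append(name)
    return kept


# Hypothetical intensities for two made-up mzRT features.
study = {
    "M100T50": [100, 105, 95, 102, 98],    # low total variation
    "M200T80": [100, 300, 50, 400, 150],   # high total variation
}
qc = {
    "M100T50": [100, 101, 99],             # reproducible in QC injections
    "M200T80": [100, 160, 60],             # poorly reproducible in QC injections
}

print(filter_features(study))      # CV_T criterion (no QC samples)
print(filter_features(study, qc))  # CV_QC criterion
```

Note that the two criteria can retain different features: the CV_{QC} filter keeps reproducible measurements, whereas the CV_{T} filter keeps features whose total variation exceeds the assumed analytical threshold.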

**Figure 2.** Comparison for our four working examples of the mzRT relative standard deviation (CV) frequency distributions calculated either across all the samples (CV_{T}) or across QC samples (CV_{QC}). Grey spots represent CV_{T} for examples #1 (◊), #2 (∆) and #4 (□), respectively. Green and red circles represent CV_{QC} and CV_{T}, respectively, for example #3. The blue line represents the 20% CV_{T} cut-off threshold established when QC samples are not available.

## 5. Hypothesis Testing

Hypothesis testing starts by stating a null hypothesis (H_{0}), for example that the group means do not differ. Then, we specify the probability threshold for this null hypothesis to be rejected when in fact it is true. This threshold, called α, is frequently set at 5% and can be thought of as the probability of a false positive result, or Type I error. Next, we use a statistical test to calculate the p-value, i.e., the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. Whenever this p-value is below the pre-defined threshold (α), we reject the null hypothesis. On the other hand, when the calculated p-value is larger than α, we do not have enough evidence to reject the null hypothesis and we fail to reject it. Note that the null hypothesis can never be proven; it is either rejected or not rejected. Conceptually, failing to reject the null hypothesis (failing to find a difference between the means) does not amount to accepting or proving it (showing that there is no difference in reality).

**Table 2.** Best-suited statistical tests for datasets that follow a normal distribution or are far from the normal curve, according to their experimental design.

| Experimental design | Normal distribution (compare means) | Far from normal curve (compare medians) |
|---|---|---|
| Compare two unpaired groups | Unpaired t-test | Mann-Whitney |
| Compare two paired groups | Paired t-test | Wilcoxon signed-rank |
| Compare more than two unmatched groups | One-way ANOVA with multiple comparison | Kruskal-Wallis |
| Compare more than two matched groups | Repeated-measures ANOVA | Friedman |
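The decision logic of Table 2 can be written as a small lookup. The sketch below only encodes the table; the function name `choose_test` and its arguments are hypothetical, and it assumes that the normality question has already been answered elsewhere.

```python
def choose_test(n_groups, paired, normal):
    """Return the test suggested by Table 2 for a given experimental design.

    n_groups: number of groups being compared (2 or more).
    paired:   True for paired/matched designs, False otherwise.
    normal:   True if the data can be treated as normally distributed.
    """
    if n_groups < 2:
        raise ValueError("at least two groups are needed for a comparison")
    more_than_two = n_groups > 2
    table = {
        # (more than two groups, paired/matched, normal): test name
        (False, False, True): "Unpaired t-test",
        (False, False, False): "Mann-Whitney",
        (False, True, True): "Paired t-test",
        (False, True, False): "Wilcoxon signed-rank",
        (True, False, True): "One-way ANOVA with multiple comparison",
        (True, False, False): "Kruskal-Wallis",
        (True, True, True): "Repeated-measures ANOVA",
        (True, True, False): "Friedman",
    }
    return table[(more_than_two, paired, normal)]


# Two unpaired groups (e.g., KO vs. WT) with non-normal data.
print(choose_test(2, paired=False, normal=False))
```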

## 6. Deciding between Parametric or Non-Parametric Tests

#### 6.1. Normality, Homogeneity of Variances and Independence Assumptions

#### 6.2. Parametric and Non-Parametric Tests. Does It Really Matter in LC-MS Untargeted Metabolomics Data?

**Table 3.** Percentages of mzRT features for which the normality assumption, the homoscedasticity assumption, or both are met. H_{0} (Shapiro-Wilk's test) = Data are sampled from a Gaussian distribution. H_{0} (Levene's test) = Variances are equal. Percentages represent those features for which there was not enough evidence to reject H_{0} at the conventional α = 0.05, relative to the total number of features retained after handling analytical variation.

| Example | # mzRT | Groups | Normality (Shapiro-Wilk's test) | Homoscedasticity (Levene's test) | Normality & Homoscedasticity |
|---|---|---|---|---|---|
| Example #1 (Retinas) | 3252 | KO | 66% | 93% | 60% |
| | | WT | 60% | | 54% |
| Example #2 (Retinas) | 7654 | Normoxia | 65% | 77% | 48% |
| | | Hypoxia | 79% | | 60% |
| Example #3 (Serum) | 6131 | Untreated | 85% | 90% | 76% |
| | | Treated | 88% | | 78% |
| Example #4 (Neuronal cells) | 6831 | KO | 72% | 91% | 64% |
| | | WT | 82% | | 73% |

**Figure 3.** Venn diagrams of the mzRT features showing statistical significance using either parametric or non-parametric tests. The areas of the Venn diagrams are proportional to the percentage of significantly varied features out of the total number of features retained after handling analytical variation (indicated in parentheses). The Mann-Whitney test (examples #1, 2 and 4) or the Wilcoxon signed-rank test (example #3) was used for non-parametric comparisons of group medians. Unpaired (examples #1, 2 and 4) or paired (example #3) t-tests were used for parametric comparisons of group means.
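To make the non-parametric comparison concrete, the following is a minimal pure-Python sketch of a two-sided Mann-Whitney test for a single feature. It relies on the normal approximation of the U statistic, without tie or continuity correction, which is a simplification assumed here; statistical packages often use exact or corrected p-values, so results may differ slightly for small groups. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value via normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p


# Hypothetical intensities of one feature in two groups.
u, p = mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(u, p)
```

In an untargeted dataset this function would be applied once per mzRT feature, producing one p-value per feature that is then subject to multiple testing correction.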

## 7. Using Multiple Related Tests that Cumulate the p-Value: The Multiple Testing Problem and the False Discovery Rate

#### 7.1. The Multiple Testing Problem

#### 7.2. Bonferroni Correction

When multiple tests are performed, the family-wise error rate (FWER), i.e., the probability of yielding one or more false positives among all the hypotheses tested, is given by FWER = 1-(1-α)^{k}, where k is the number of hypothesis tests performed and α is the pre-defined threshold of probability in each individual test. Therefore, to maintain a prescribed FWER (e.g., 0.05) in an analysis involving multiple tests, the α assumed in each individual test must be more stringent than the FWER. Bonferroni correction is the standard approach to control the FWER by specifying what α value should be considered in each individual test using Equation (4):

$$\alpha = \frac{FWER}{k} \quad (4)$$

For example #1, with 3252 features retained after handling analytical variation, Equation (4) yields α = 0.05/3252 = 1.54 × 10^{-5} for each individual test to accept an overall FWER of 0.05. Hence, in each individual test, only those features with p-values ≤ 1.54 × 10^{-5} would be declared statistically significant. Assuming this correction, the probability of yielding one or more false positives out of all 3252 hypotheses tested would be FWER = 1-(1-1.54 × 10^{-5})^{3252} = 0.0488. Notice that this probability is much lower than the one obtained if no correction were applied: FWER = 1-(1-0.05)^{3252} ≈ 1. Bonferroni correction substantially increases the stringency of our testing, leaving just 75 metabolite features out of the initial 3252 at a prescribed FWER of 0.05.
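The Bonferroni arithmetic for example #1 can be checked with a short calculation. The sketch assumes independent tests, as the FWER formula 1-(1-α)^k does; the function names are hypothetical.

```python
def fwer(alpha, k):
    """Probability of at least one false positive in k independent tests."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(target_fwer, k):
    """Per-test significance threshold under Bonferroni correction."""
    return target_fwer / k


k = 3252  # features retained for example #1
alpha_per_test = bonferroni_alpha(0.05, k)

print(f"per-test alpha:          {alpha_per_test:.2e}")
print(f"FWER with correction:    {fwer(alpha_per_test, k):.4f}")
print(f"FWER without correction: {fwer(0.05, k):.4f}")
```

The corrected FWER stays just under the 0.05 target, whereas testing every feature at α = 0.05 makes at least one false positive practically certain.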

#### 7.3. The FDR Multiple Testing Correction

Bonferroni provides the strongest control of false positives and therefore high confidence in the selected metabolic features. However, an important advantage of the FDR approach is that it allows researchers to select the error rate they are willing to assume in their subsequent studies. On the other hand, Figure 4 shows that a t-test comparison of the WT and KO groups in example #4 flagged 328 features, all of which turned out to be false positives after FDR correction. This indicates that all these significant outcomes arose by chance and that no real effect underlies this example. Accordingly, had no correction for multiple testing been applied, we would have performed subsequent MS/MS identification experiments on features that represent false positives, a pointless task that wastes time and resources. To avoid such situations, we recommend correcting for multiple testing when performing multiple univariate analyses of untargeted LC-MS datasets, and then focusing on the metabolites with the lowest FDR-derived q-values for further MS/MS identification experiments. In addition, whenever a follow-up targeted validation study is to be attempted, we recommend considering those metabolites showing statistical significance after strict Bonferroni correction.
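As an illustration of FDR correction, the sketch below computes Benjamini-Hochberg adjusted p-values in pure Python. This is a swapped-in, widely used procedure chosen for brevity; it is an assumption of this example and not necessarily the estimator used to obtain the q-values reported in this article. The function name and the input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is never larger than the one of the next-ranked feature.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five features.
p_values = [0.001, 0.5, 0.9, 0.012, 0.04]
adjusted = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
print(adjusted)
print(significant)  # indices of features retained at 5% FDR
```

Features can then be ranked by their adjusted values so that identification efforts start with those least likely to be false positives.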

**Figure 4.** Frequency histogram showing the distribution of p-values typically expected from t-test comparisons of two groups in examples #1, 2 and 4. The green bar represents the total number of features declared significant assuming 5% false positives in a t-test comparison of the two groups. The red bar represents the FDR-estimated number of features considered false positives out of the features declared significant in the t-test. The total number of significant features retained after FDR correction (q < 0.05) is also indicated.

## 8. The Fold Change Criteria

## 9. Univariate LC-MS Untargeted Analysis Workflow

If QC samples are available, compute CV_{QC} and proceed to retain only those metabolic features presenting CV_{QC} < 20%. If QC samples are not available, an alternative procedure is to compute CV_{T} and retain those mzRT features with CV_{T} > 20%.

**Figure 5.** General flow chart for univariate data analysis of untargeted LC-MS-based metabolomics data. Different colors for the four working examples indicate the initial number and the retained number of mzRT features in each step. The FDR and FC cut-offs are fixed at 5% and 1.5, respectively.
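The final selection step of the workflow (5% FDR and a 1.5 fold-change cut-off) can be sketched as follows. The signed fold-change convention used here, where a decrease is reported as a negative reciprocal, is an assumption that would be consistent with the negative FC values in Table 4; the source does not spell out its exact definition. The function names and the feature `hypothetical_feature` are illustrative, while the other two entries reuse the parametric q-values and FC values listed in Table 4.

```python
from statistics import mean


def signed_fold_change(reference, condition):
    """Fold change of condition vs. reference based on group means.

    Increases are returned as the ratio (>= 1); decreases are returned as
    the negative reciprocal (<= -1). This convention is an assumption.
    """
    ratio = mean(condition) / mean(reference)
    return ratio if ratio >= 1.0 else -1.0 / ratio


def select_features(q_values, fold_changes, fdr=0.05, fc_cutoff=1.5):
    """Keep features with q < fdr and an absolute fold change >= fc_cutoff."""
    return [
        name
        for name in q_values
        if q_values[name] < fdr and abs(fold_changes[name]) >= fc_cutoff
    ]


q_values = {
    "Octanoylcarnitine": 4.28e-04,   # from Table 4 (parametric)
    "All-trans-Retinal": 9.24e-05,   # from Table 4 (parametric)
    "hypothetical_feature": 0.03,    # made-up entry with a small effect
}
fold_changes = {
    "Octanoylcarnitine": 5.5,
    "All-trans-Retinal": -3.0,
    "hypothetical_feature": 1.2,
}

print(select_features(q_values, fold_changes))
```

Using the absolute value of the signed fold change lets increases and decreases of the same magnitude pass the cut-off symmetrically.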

**Table 4.** Statistics summary of those metabolites identified using MS/MS experiments in working example #2. The unpaired t-test and the Mann-Whitney test were used for the parametric and non-parametric comparison of hypoxic and normoxic retinas, respectively. Correction for multiple testing was performed assuming 5% FDR.

| Metabolite | p-value (parametric) | q-value (parametric) | FC (mean) | p-value (non-parametric) | q-value (non-parametric) | FC (median) |
|---|---|---|---|---|---|---|
| Hexadecenoylcarnitine | 3.31×10^{-13} | 1.05×10^{-10} | 5.0 | 2.49×10^{-05} | 3.18×10^{-04} | 4.9 |
| Acetylcarnitine-derivative | 1.10×10^{-13} | 5.02×10^{-11} | 7.2 | 2.49×10^{-05} | 3.18×10^{-04} | 7.5 |
| Tetradecenoylcarnitine | 1.29×10^{-13} | 5.29×10^{-11} | 8.8 | 2.49×10^{-05} | 3.18×10^{-04} | 8.8 |
| Decanoylcarnitine | 7.79×10^{-11} | 1.03×10^{-08} | 5.7 | 2.49×10^{-05} | 3.18×10^{-04} | 5.6 |
| Laurylcarnitine | 8.48×10^{-11} | 1.06×10^{-08} | 9.2 | 2.49×10^{-05} | 3.18×10^{-04} | 8.7 |
| 7-ketocholesterol | 4.00×10^{-09} | 1.92×10^{-07} | 3.1 | 2.49×10^{-05} | 3.18×10^{-04} | 3.3 |
| 5,6β-epoxy-cholesterol | 2.12×10^{-08} | 6.61×10^{-07} | 5.1 | 2.49×10^{-05} | 3.18×10^{-04} | 7.0 |
| 7α-hydroxycholesterol | 3.88×10^{-08} | 1.07×10^{-06} | 4.1 | 2.49×10^{-05} | 3.18×10^{-04} | 4.5 |
| All-trans-Retinal | 1.26×10^{-05} | 9.24×10^{-05} | -3.0 | 4.01×10^{-05} | 3.98×10^{-04} | -2.8 |
| Octanoylcarnitine | 9.21×10^{-05} | 4.28×10^{-04} | 5.5 | 5.09×10^{-03} | 1.14×10^{-02} | 17.2 |

## Acknowledgments

## Conflict of Interest

## References and Notes

1. Patti, G.J.; Yanes, O.; Siuzdak, G. Innovation: Metabolomics: the apogee of the omics trilogy. Nat. Rev. Mol. Cell. Biol. **2012**, 13, 263–269.
2. Smith, C.A.; Want, E.J.; O'Maille, G.; Abagyan, R.; Siuzdak, G. XCMS: Processing Mass Spectrometry Data for Metabolite Profiling Using Nonlinear Peak Alignment, Matching, and Identification. Anal. Chem. **2006**, 78, 779–787.
3. Katajamaa, M.; Oresic, M. Processing methods for differential analysis of LC/MS profile data. BMC Bioinformatics **2005**, 6, 179.
4. Lommen, A. MetAlign: Interface-Driven, Versatile Metabolomics Tool for Hyphenated Full-Scan Mass Spectrometry Data Preprocessing. Anal. Chem. **2009**, 81, 3079–3086.
5. Kuhl, C.; Tautenhahn, R.; Böttcher, C.; Larson, T.R.; Neumann, S. CAMERA: An Integrated Strategy for Compound Spectra Extraction and Annotation of Liquid Chromatography/Mass Spectrometry Data Sets. Anal. Chem. **2011**, 84, 283–289.
6. Alonso, A.; Julia, A.; Beltran, A.; Vinaixa, M.; Diaz, M.; Ibanez, L.; Correig, X.; Marsal, S. AStream: an R package for annotating LC/MS metabolomic data. Bioinformatics **2011**, 27, 1339–1340.
7. Liland, K.H. Multivariate methods in metabolomics – from pre-processing to dimension reduction and statistical analysis. Trac-Trend. Anal. Chem. **2011**, 30, 827–841.
8. Hendriks, M.M.W.B.; Eeuwijk, F.A.v.; Jellema, R.H.; Westerhuis, J.A.; Reijmers, T.H.; Hoefsloot, H.C.J.; Smilde, A.K. Data-processing strategies for metabolomics studies. Trac-Trend. Anal. Chem. **2011**, 30, 1685–1698.
9. Kalogeropoulou, A. Pre-processing and analysis of high-dimensional plant metabolomics data. Master Thesis, University of East Anglia, Norwich, UK, 2011.
10. Goodacre, R.; Broadhurst, D.; Smilde, A.; Kristal, B.; Baker, J.; Beger, R.; Bessant, C.; Connor, S.; Capuani, G.; Craig, A.; et al. Proposed minimum reporting standards for data analysis in metabolomics. Metabolomics **2007**, 3, 231–241.
11. Karp, N.A.; Griffin, J.L.; Lilley, K.S. Application of partial least squares discriminant analysis to two-dimensional difference gel studies in expression proteomics. Proteomics **2005**, 5, 81–90.
12. Kenny, L.C.; Broadhurst, D.I.; Dunn, W.; Brown, M.; North, R.A.; McCowan, L.; Roberts, C.; Cooper, G.J.S.; Kell, D.B.; Baker, P.N.; et al. Robust Early Pregnancy Prediction of Later Preeclampsia Using Metabolomic Biomarkers. Hypertension **2010**, 56, 741–749.
13. R Development Core Team. R: A language and environment for statistical computing, 2009. Available online: http://www.R-project.org (accessed on 17 October 2012).
14. Patti, G.J.; Yanes, O.; Shriver, L.P.; Courade, J.P.; Tautenhahn, R.; Manchester, M.; Siuzdak, G. Metabolomics implicates altered sphingolipids in chronic pain of neuropathic origin. Nat. Chem. Biol. **2012**, 8, 232–234.
15. Yanes, O.; Clark, J.; Wong, D.M.; Patti, G.J.; Sanchez-Ruiz, A.; Benton, H.P.; Trauger, S.A.; Desponts, C.; Ding, S.; Siuzdak, G. Metabolic oxidation regulates embryonic stem cell differentiation. Nat. Chem. Biol. **2010**, 6, 411–417.
16. Marchetti, V.; Yanes, O.; Aguilar, E.; Wang, M.; Friedlander, D.; Moreno, S.; Storm, K.; Zhan, M.; Naccache, S.; Nemerow, G.; et al. Differential macrophage polarization promotes tissue remodeling and repair in a model of ischemic retinopathy. Sci. Rep. **2011**, 1, 76.
17. Dorrell, M.I.; Aguilar, E.; Jacobson, R.; Yanes, O.; Gariano, R.; Heckenlively, J.; Banin, E.; Ramirez, G.A.; Gasmi, M.; Bird, A.; et al. Antioxidant or neurotrophic factor treatment preserves function in a mouse model of neovascularization-associated oxidative stress. J. Clin. Invest. **2009**, 119, 611–623.
18. Vinaixa, M.; Rodriguez, M.A.; Samino, S.; Díaz, M.; Beltran, A.; Mallol, R.; Bladé, C.; Ibañez, L.; Correig, X.; Yanes, O. Metabolomics Reveals Reduction of Metabolic Oxidation in Women with Polycystic Ovary Syndrome after Pioglitazone-Flutamide-Metformin Polytherapy. PloS One **2011**, 6, e29052.
19. Grainger, D.J. Megavariate Statistics meets High Data-density Analytical Methods: The Future of Medical Diagnostics? IRTL Rev. **2003**, 1, 1–6.
20. Ferreira, J.A.; Zwinderman, A. Approximate sample size calculations with microarray data: an illustration. Stat. Appl. Genet. Mol. Biol. **2006**, 5, Article25.
21. Ferreira, J.A.; Zwinderman, A.H. Approximate Power and Sample Size Calculations with the Benjamini-Hochberg Method. Int. J. Biostat. **2006**, 2.
22. van Iterson, M.; 't Hoen, P.; Pedotti, P.; Hooiveld, G.; den Dunnen, J.; van Ommen, G.; Boer, J.; Menezes, R. Relative power and sample size analysis on gene expression profiling data. BMC Genomics **2009**, 10, 439.
23. van der Kloet, F.M.; Bobeldijk, I.; Verheij, E.R.; Jellema, R.H. Analytical Error Reduction Using Single Point Calibration for Accurate and Precise Metabolomic Phenotyping. J. Proteome. Res. **2009**, 8, 5132–5141.
24. Dunn, W.B.; Broadhurst, D.; Begley, P.; Zelena, E.; Francis-McIntyre, S.; Anderson, N.; Brown, M.; Knowles, J.D.; Halsall, A.; Haselden, J.N.; et al. Procedures for large-scale metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to mass spectrometry. Nat. Prot. **2011**, 6, 1060–1083.
25. Masson, P.; Alves, A.C.; Ebbels, T.M.D.; Nicholson, J.K.; Want, E.J. Optimization and Evaluation of Metabolite Extraction Protocols for Untargeted Metabolic Profiling of Liver Samples by UPLC-MS. Anal. Chem. **2010**, 82, 7779–7786.
26. Crews, B.; Wikoff, W.R.; Patti, G.J.; Woo, H.-K.; Kalisiak, E.; Heideker, J.; Siuzdak, G. Variability Analysis of Human Plasma and Cerebral Spinal Fluid Reveals Statistical Significance of Changes in Mass Spectrometry-Based Metabolomics Data. Anal. Chem. **2009**, 81, 8538–8544.
27. Riffenburgh, R.H. Statistics in Medicine; Elsevier: Amsterdam, The Netherlands, 2006.
28. Motulsky, H. Intuitive Biostatistics; Oxford University Press: New York, NY, USA, 1995.
29. Box, G.E.P. Non-Normality and Tests on Variances. Biometrika **1953**, 40, 318–335.
30. Ioannidis, J.P.A. Why Most Published Research Findings Are False. PLoS Med. **2005**, 2, e124.
31. Storey, J.D.; Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA **2003**, 100, 9440–9445.
32. Storey, J.D. A direct approach to false discovery rates. J. Roy. Stat. Soc. B Met. **2002**, 64, 479–498.
33. Benjamini, Y.; Drai, D.; Elmer, G.; Kafkafi, N.; Golani, I. Controlling the false discovery rate in behavior genetics research. Behav. Brain. Res. **2001**, 125, 279–284.
34. Benjamini, Y.; Yekutieli, D. Quantitative Trait Loci Analysis Using the False Discovery Rate. Genetics **2005**, 171, 783–790.
35. Broadhurst, D.; Kell, D. Statistical strategies for avoiding false discoveries in metabolomics and related experiments. Metabolomics **2006**, 2, 171–196.

© 2012 by the authors; licensee MDPI, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Vinaixa, M.; Samino, S.; Saez, I.; Duran, J.; Guinovart, J.J.; Yanes, O.
A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data. *Metabolites* **2012**, *2*, 775-795.
https://doi.org/10.3390/metabo2040775
