Untangling the Complexities of Processing and Analysis for Untargeted LC-MS Data Using Open-Source Tools
Abstract
1. Introduction
2. Materials and Methods
2.1. Overview and Workflow Diagram
2.2. Experimental Design and Quality Control
- Case vs. control
- Wild-type vs. transgenic line
- Strain 1 vs. strain 2 vs. strain 3
- Two factors, each with two or more levels, such as +/− treatment for two strains
- Time course for one or two factors, such as +/− treatment for two strains over three time points
- What biological replicates are being analysed, and are they independent of each other (or has the same organism/population been sampled multiple times)?
- Are there technical replicates (i.e., repeated runs of the same sample)?
- Are Quality Control (QC) samples required? Are analytical standards needed?
- What groupings are required to answer the research questions outlined?
- Spike all prepared samples with a compound for which the m/z (and RT) is known and which is unlikely to be otherwise present in the experimental samples;
- Prepare a pooled QC sample from an aliquot of each of the samples and include this at regular intervals in the MS run;
- Include blanks and/or extraction blanks at regular intervals in the MS run;
- Use lock mass calibration (for Waters instruments).
- Check file sizes of .raw files across the MS run;
- Check file sizes of converted .mzML files and reconvert any with an unexpected size;
- Compare spectra between technical replicates.
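The file-size checks above can be partly automated. The sketch below is a minimal Python illustration, not part of the original workflow. It assumes the files for one run sit in a single folder, treats Waters .raw data as directories (their contents are summed), and flags any file whose size differs from the run median by more than a hypothetical 50%. Blanks are expected to be smaller than samples, so a flagged blank may be perfectly normal.

```python
from pathlib import Path
from statistics import median


def path_size(path):
    """Size in bytes of a file, or the summed size of every file under a directory.

    Waters .raw data are directories, so their contents are summed recursively.
    """
    p = Path(path)
    if p.is_dir():
        return sum(f.stat().st_size for f in p.rglob("*") if f.is_file())
    return p.stat().st_size


def scan_run(folder, suffix):
    """Collect sizes for every entry in `folder` whose name ends with `suffix` (e.g. '.raw' or '.mzML')."""
    return {p.name: path_size(p) for p in Path(folder).iterdir() if p.name.endswith(suffix)}


def flag_unusual_sizes(sizes, tolerance=0.5):
    """Return names whose size differs from the run median by more than `tolerance` (a fraction).

    The 0.5 default is a hypothetical threshold, not a recommended value.
    """
    if not sizes:
        return []
    mid = median(sizes.values())
    return sorted(
        name for name, size in sizes.items()
        if mid > 0 and abs(size - mid) / mid > tolerance
    )
```

For example, `flag_unusual_sizes(scan_run("run01_mzml", ".mzML"))` (the folder name is hypothetical) returns the names of files to inspect and, if necessary, reconvert.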
2.3. Metabolite Extraction and Data Acquisition
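Following the recommendations in Section 2.2, pooled QC samples and blanks are placed at regular intervals through the MS run. A minimal Python sketch of one way to build such an injection list is shown below. The spacing (`qc_every`), the labels "blank" and "QC", and the choice to randomise sample order (so that run order is not confounded with treatment group) are illustrative assumptions rather than a prescribed layout.

```python
import random


def build_sequence(samples, qc_every=5, seed=1):
    """Return a randomised injection order with blanks and pooled QCs interleaved.

    A blank and a QC open the run, another blank/QC pair follows every
    `qc_every` samples, and a final pair closes the run.
    """
    order = list(samples)
    random.Random(seed).shuffle(order)  # fixed seed so the order can be recorded and reproduced
    sequence = ["blank", "QC"]
    for position, sample in enumerate(order, start=1):
        sequence.append(sample)
        # Skip the mid-run pair after the last sample; the closing pair is added below.
        if position % qc_every == 0 and position != len(order):
            sequence += ["blank", "QC"]
    sequence += ["blank", "QC"]
    return sequence
```

With ten samples and `qc_every=5`, this gives three QC injections: one at the start, one mid-run and one at the end.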
2.4. Preparing Metadata for Analysis
- “Filename”: the filename of each .mzML file, without the .mzML extension
- “Filetext”: this is the name that has been manually added to the metadata of that sample
- “MSFile” or an equivalent column that contains either “pos” or “neg” within it. Any other columns will be ignored in this file.
- “Filetext”: this must contain all the distinct values of “Filetext” from samplelist.csv
- “Variable1”: the naming of this column is left to the user. For example, in an MS run comparing a wild-type to a control, this column could be named “treatment” and filled with “WT” and “C” as appropriate
- “Variable2”, etc.: further variables. These may include batch identifiers (for example, if many samples were run over multiple days), treatments or environmental variables
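Because the sample list and the metadata table are linked through “Filetext”, a quick consistency check before analysis can catch mismatches early. The Python sketch below is illustrative only. It assumes both tables have been saved as CSV files with the column names described above, uses a hypothetical file name (metadata.csv) for the metadata table, and treats the “pos”/“neg” match as case-sensitive.

```python
import csv


def read_csv(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def check_metadata(samplelist_rows, metadata_rows, mode_column="MSFile"):
    """Return a list of problems; an empty list means these checks found none."""
    problems = []
    described = {row["Filetext"] for row in metadata_rows}
    for row in samplelist_rows:
        name = row["Filename"]
        # Filename should be the part before the extension, e.g. "run01", not "run01.mzML".
        if name.lower().endswith(".mzml"):
            problems.append(f"{name}: Filename should not include the .mzML extension")
        # Every Filetext in the sample list needs a matching row in the metadata table.
        if row["Filetext"] not in described:
            problems.append(f"{name}: Filetext '{row['Filetext']}' missing from metadata")
        # The mode column must identify exactly one polarity.
        mode = row.get(mode_column, "")
        if ("pos" in mode) == ("neg" in mode):
            problems.append(f"{name}: cannot tell polarity from '{mode}'")
    return problems
```

A typical call would be `check_metadata(read_csv("samplelist.csv"), read_csv("metadata.csv"))`; if the polarity column has a different name, pass it as `mode_column`.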
3. Results
3.1. Converting Data to Open Format Using Proteowizard
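Conversion with MSConvert can also be scripted, which helps when a run contains many files. The sketch below builds ProteoWizard `msconvert` command lines from Python. It assumes `msconvert` is on the system PATH and that the vendor files carry a `.raw` extension. The options shown (`--mzML`, `--zlib`, `-o` and the `peakPicking vendor` filter for centroiding) should be checked against the documentation for the installed ProteoWizard version, and whether to centroid at this stage depends on the downstream software. When vendor peak picking is used, it should be the first filter applied.

```python
import subprocess
from pathlib import Path


def msconvert_command(raw_path, out_dir, centroid=True):
    """Build an msconvert command line converting one vendor file to mzML."""
    cmd = ["msconvert", str(raw_path), "--mzML", "--zlib", "-o", str(out_dir)]
    if centroid:
        # Vendor centroiding for all MS levels; kept as the only (and therefore first) filter.
        cmd += ["--filter", "peakPicking vendor msLevel=1-"]
    return cmd


def convert_folder(raw_dir, out_dir, dry_run=True):
    """Convert every .raw entry in raw_dir; with dry_run=True only return the commands."""
    commands = [msconvert_command(p, out_dir) for p in sorted(Path(raw_dir).glob("*.raw"))]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failed conversion
    return commands
```

With `dry_run=True` (the default here), `convert_folder` only returns the commands, so they can be reviewed before anything is run.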
3.2. Preprocessing Data
- Depending on the MS approach, different stages are involved, but they broadly fall into:
- Baseline correction and/or noise reduction (estimating what part of the detected intensity is the sample and “cleaning” or adjusting the spectra to show only the signal believed to be associated with the sample);
- Normalisation and/or standardisation (these can mean a range of different things to different people but broadly cover accounting for differences in sample volume or concentration or total intensity of the signal);
- Grouping and peak picking (peak-detection algorithms, such as wavelet-based methods, are used to determine which parts of the spectra constitute separate peaks based on their m/z values);
- Alignment or peak matching (assessing across samples whether peaks with slightly different m/z values, and in LC-MS slightly different retention times, are the same peak so that samples can be compared more reliably).
- These steps are very important when processing data, as they can have a large impact on data quality; however, the appropriate parameters may vary between datasets and analysis methods. The importance of these factors has been discussed previously by [19].
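To make two of these steps concrete, the sketch below shows total-ion-current (TIC) normalisation, one of several possible normalisation choices, and a greedy grouping of peaks across samples by m/z within a parts-per-million (ppm) tolerance. It is a deliberate simplification for illustration: dedicated tools such as XCMS also use retention time when matching peaks, and the 10 ppm default is a hypothetical value rather than a recommendation.

```python
def tic_normalise(intensities):
    """Scale one sample's feature intensities so they sum to 1 (TIC normalisation).

    intensities: {mz: intensity}. A sample with zero total intensity is returned unchanged.
    """
    total = sum(intensities.values())
    if total == 0:
        return dict(intensities)
    return {mz: value / total for mz, value in intensities.items()}


def group_peaks(samples, ppm=10.0):
    """Greedily group peaks from several samples whose m/z agree within `ppm`.

    samples: {sample_name: [mz, ...]}. Returns a list of groups, each a list of
    (sample_name, mz) pairs, ordered by m/z. Each group is anchored on its
    lowest m/z; retention time is ignored in this simplified version.
    """
    peaks = sorted((mz, name) for name, mzs in samples.items() for mz in mzs)
    groups = []
    for mz, name in peaks:
        if groups:
            anchor = groups[-1][0][1]  # m/z of the first peak in the current group
            if abs(mz - anchor) / anchor * 1e6 <= ppm:
                groups[-1].append((name, mz))
                continue
        groups.append([(name, mz)])
    return groups
```

For example, peaks at m/z 100.0000 and 100.0005 (5 ppm apart) fall into one group, whereas a peak at 100.0020 (20 ppm away) starts a new one.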
3.3. Multivariate Analysis
- Are the metabolomic fingerprints of distinct classes (treatment groups) different from each other?
- Which features of the metabolomic fingerprint are causing them to be different from each other?
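Principal component analysis (PCA) is one common unsupervised way to approach both questions: sample scores show whether classes separate, and feature loadings indicate which features drive that separation. The pure-Python sketch below computes only the first component, using Pareto scaling (often applied to metabolomics data) and power iteration. It is an illustration under these assumptions, not how MetaboAnalyst or any other package implements PCA. Rows are samples and columns are features.

```python
import math
from statistics import mean, stdev


def pareto_scale(matrix):
    """Mean-centre each column, then divide it by the square root of its standard deviation."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    scaled = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        centre = mean(column)
        sd = stdev(column)  # sample standard deviation; needs at least two samples
        factor = math.sqrt(sd) if sd > 0 else 1.0  # constant features stay at zero
        for i, value in enumerate(column):
            scaled[i][j] = (value - centre) / factor
    return scaled


def first_principal_component(matrix, iterations=500):
    """Return (scores, loadings) for PC1 via power iteration on X^T X of the scaled data."""
    x = pareto_scale(matrix)
    n_cols = len(x[0])
    loadings = [1.0] * n_cols  # starting vector; assumed not orthogonal to PC1
    for _ in range(iterations):
        projected = [sum(row[j] * loadings[j] for j in range(n_cols)) for row in x]  # X v
        updated = [sum(x[i][j] * projected[i] for i in range(len(x))) for j in range(n_cols)]  # X^T X v
        norm = math.sqrt(sum(value * value for value in updated))
        if norm == 0:
            break  # no variance in the direction explored
        loadings = [value / norm for value in updated]
    scores = [sum(row[j] * loadings[j] for j in range(n_cols)) for row in x]
    return scores, loadings
```

The sign of a principal component is arbitrary, so separation is judged by whether classes fall on opposite sides of zero rather than by the sign itself. Features with the largest absolute loadings are candidates for the discriminating features referred to in the second question.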
3.4. What Are My Metabolites?
- METLIN to search by m/z;
- KEGG PATHWAY and KEGG COMPOUND [32] to assess the likelihood of detecting certain compounds in the study organism/sample and to gain insight into biological function;
- Data repositories such as MetaboLights;
- Reporting Metabolomics Standards Initiative (MSI) identification levels (see also [37]).
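Searching by m/z amounts to comparing each observed m/z with the expected ion masses of candidate compounds within a mass-error tolerance in parts per million, where ppm error = (observed − theoretical) / theoretical × 10^6. The sketch below illustrates this locally; it is not the METLIN interface. The three-compound candidate list, the 5 ppm tolerance and the restriction to [M+H]+ and [M−H]− ions are assumptions for illustration (real searches also consider other adducts, such as [M+Na]+). A match on m/z alone gives a putative annotation, not a confirmed identification.

```python
PROTON_MASS = 1.007276  # Da

# Hypothetical local candidate list: name -> neutral monoisotopic mass (Da).
CANDIDATES = {
    "glucose": 180.063388,
    "citric acid": 192.027003,
    "tryptophan": 204.089878,
}


def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def annotate(observed_mz, polarity, tolerance_ppm=5.0, candidates=CANDIDATES):
    """List (name, ppm error) for candidates whose [M+H]+ ("pos") or [M-H]- ("neg") ion matches."""
    if polarity == "pos":
        shift = PROTON_MASS
    elif polarity == "neg":
        shift = -PROTON_MASS
    else:
        raise ValueError("polarity must be 'pos' or 'neg'")
    hits = []
    for name, mass in candidates.items():
        error = ppm_error(observed_mz, mass + shift)
        if abs(error) <= tolerance_ppm:
            hits.append((name, round(error, 2)))
    return sorted(hits, key=lambda hit: abs(hit[1]))  # closest match first
```

For example, `annotate(205.0971, "pos")` matches the [M+H]+ ion of tryptophan (expected m/z 205.097154) with an error of about −0.3 ppm.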
3.5. Sharing Metabolomics Data
- Findable
- Accessible
- Interoperable
- Reusable
3.6. Citation of the Tools Used in the Workflow
- All R packages used;
- R and RStudio versions;
- Proteowizard (SeeMS and MSConvert);
- MetaboAnalyst;
- XCMS Online and METLIN;
- Mass-Up;
- MassBank (including access date);
- ECMDB and any other organism specific metabolite databases used;
- KEGG (including BRITE, COMPOUND and PATHWAY);
- PubChem;
- A data availability statement that links to your archived data (e.g., in MetaboLights).
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Allwood, J.W.; Williams, A.; Uthe, H.; van Dam, N.M.; Mur, L.A.J.; Grant, M.R.; Pétriacq, P. Unravelling Plant Responses to Stress—The Importance of Targeted and Untargeted Metabolomics. Metabolites 2021, 11, 558.
2. Want, E.J.; Cravatt, B.F.; Siuzdak, G. The expanding role of mass spectrometry in metabolite profiling and characterization. ChemBioChem 2005, 6, 1941–1951.
3. Vincent, I.M.; Ehmann, D.E.; Mills, S.D.; Perros, M.; Barrett, M.P. Untargeted metabolomics to ascertain antibiotic modes of action. Antimicrob. Agents Chemother. 2016, 60, 2281–2291.
4. Di Minno, A.; Gelzo, M.; Stornaiuolo, M.; Ruoppolo, M.; Castaldo, G. The evolving landscape of untargeted metabolomics. Nutr. Metab. Cardiovasc. Dis. 2021, 31, 1645–1652.
5. Wei, Y.; Jasbi, P.; Shi, X.; Turner, C.; Hrovat, J.; Liu, L.; Rabena, Y.; Porter, P.; Gu, H. Early Breast Cancer Detection Using Untargeted and Targeted Metabolomics. J. Proteome Res. 2021, 20, 3133.
6. Schrimpe-Rutledge, A.C.; Codreanu, S.G.; Sherrod, S.D.; McLean, J.A. Untargeted Metabolomics Strategies—Challenges and Emerging Directions. J. Am. Soc. Mass Spectrom. 2016, 27, 1897–1905.
7. Dudzik, D.; Barbas-Bernados, C.; García, A.; Barbas, C. Quality assurance procedures for mass spectrometry untargeted metabolomics. A review. J. Pharm. Biomed. Anal. 2018, 147, 149–173.
8. Rainer, J.; Vicini, A.; Salzer, L.; Stanstrup, J.; Badia, J.M.; Neumann, S.; Stravs, M.A.; Verri Hernandes, V.; Gatto, L.; Gibb, S.; et al. A Modular and Expandable Ecosystem for Metabolomics Data Annotation in R. Metabolites 2022, 12, 173.
9. Blaženović, I.; Kind, T.; Ji, J.; Fiehn, O. Software tools and approaches for compound identification of LC-MS/MS data in metabolomics. Metabolites 2018, 8, 31.
10. Misra, B.B. New tools and resources in metabolomics: 2016–2017. Electrophoresis 2018, 39, 909–923.
11. Chaleckis, R.; Meister, I.; Zhang, P.; Wheelock, C.E. Challenges, progress and promises of metabolite annotation for LC–MS-based metabolomics. Curr. Opin. Biotechnol. 2019, 55, 44–50.
12. Chang, H.-Y.; Colby, S.M.; Du, X.; Gomez, J.D.; Helf, M.J.; Kechris, K.; Kirkpatrick, C.R.; Li, S.; Patti, G.J.; Renslow, R.S.; et al. A Practical Guide to Metabolomics Software Development. Anal. Chem. 2021, 93, 1912–1923.
13. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2010; Available online: https://www.R-project.org/ (accessed on 23 February 2023).
14. Lu, W.; Su, X.; Klein, M.S.; Lewis, I.A.; Fiehn, O.; Rabinowitz, J.D. Metabolite Measurement: Pitfalls to Avoid and Practices to Follow. Annu. Rev. Biochem. 2017, 86, 277–304.
15. Pezzatti, J.; Boccard, J.; Codesido, S.; Gagnebin, Y.; Joshi, A.; Picard, D.; González-Ruiz, V.; Rudaz, S. Implementation of liquid chromatography-high resolution mass spectrometry methods for untargeted metabolomic analyses of biological samples: A tutorial. Anal. Chim. Acta 2020, 1105, 28–44.
16. Austen, N.; Walker, H.J.; Lake, J.A.; Phoenix, G.K.; Cameron, D.D. The Regulation of Plant Secondary Metabolism in Response to Abiotic Stress: Interactions Between Heat Shock and Elevated CO2. Front. Plant Sci. 2019, 10, 1463.
17. Martens, L.; Chambers, M.; Sturm, M.; Kessner, D.; Levander, F.; Shofstahl, J.; Tang, W.H.; Römpp, A.; Neumann, S.; Pizarro, A.D.; et al. mzML—A Community Standard for Mass Spectrometry Data. Mol. Cell. Proteom. 2011, 10, R110.000133.
18. Kessner, D.; Chambers, M.; Burke, R.; Agus, D.; Mallick, P. ProteoWizard: Open source software for rapid proteomics tools development. Bioinformatics 2008, 24, 2534–2536.
19. Forsberg, E.; Huan, T.; Rinehart, D.; Benton, H.P.; Warth, B.; Hilmers, B.; Siuzdak, G. Data processing, multi-omic pathway mapping, and metabolite activity analysis using XCMS Online. Nat. Protoc. 2018, 13, 633–651.
20. Katajamaa, M.; Orešič, M. Data processing for mass spectrometry-based metabolomics. J. Chromatogr. A 2007, 1158, 318–328.
21. López-Fernández, H.; Santos, H.M.; Capelo, J.L.; Fdez-Riverola, F.; Glez-Peña, D.; Reboiro-Jato, M. Mass-Up: An all-in-one open software application for MALDI-TOF mass spectrometry knowledge discovery. BMC Bioinform. 2015, 16, 318.
22. Smith, C.A.; Want, E.J.; O’Maille, G.; Abagyan, R.; Siuzdak, G. XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification. Anal. Chem. 2006, 78, 779–787.
23. Gibb, S.; Strimmer, K. MALDIquant: A versatile R package for the analysis of mass spectrometry data. Bioinformatics 2012, 28, 2270–2271.
24. Xia, J.; Psychogios, N.; Young, N.; Wishart, D.S. MetaboAnalyst: A web server for metabolomic data analysis and interpretation. Nucleic Acids Res. 2009, 37, 652–660.
25. Metaboanalyst Tutorials. Available online: https://dev.metaboanalyst.ca/docs/Tutorials.xhtml (accessed on 27 January 2023).
26. Tsugawa, H.; Cajka, T.; Kind, T.; Ma, Y.; Higgins, B.; Ikeda, K.; Kanazawa, M.; van der Gheynst, J.; Fiehn, O.; Arita, M. MS-DIAL: Data-independent MS/MS deconvolution for comprehensive metabolome analysis. Nat. Methods 2015, 12, 523–526.
27. Narayanaswamy, P.; Teo, G.; Ow, J.R.; Lau, A.; Kaldis, P.; Tate, S.; Choi, H. MetaboKit: A comprehensive data extraction tool for untargeted metabolomics. Mol. Omics 2020, 16, 436.
28. Howe, E.; Holton, K.; Nair, S.; Schlauch, D.; Sinha, R.; Quackenbush, J. MeV: MultiExperiment Viewer. In Biomedical Informatics for Cancer Research; Ochs, M., Casagrande, J., Davuluri, R., Eds.; Springer: Boston, MA, USA, 2010; pp. 267–277.
29. Kuhl, C.; Tautenhahn, R.; Boettcher, C.; Larson, T.R.; Neumann, S. CAMERA: An integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets. Anal. Chem. 2012, 84, 283–289.
30. Haug, K.; Cochrane, K.; Nainala, V.C.; Williams, M.; Chang, J.; Jayaseelan, K.V.; O’Donovan, C. MetaboLights: A resource evolving in response to the needs of its scientific community. Nucleic Acids Res. 2020, 48, D440–D444.
31. Guijas, C.; Montenegro-Burke, J.R.; Domingo-Almenara, X.; Palermo, A.; Warth, B.; Hermann, G.; Koellensperger, G.; Huan, T.; Uritboonthai, W.; Aisporna, A.E.; et al. METLIN: A Technology Platform for Identifying Knowns and Unknowns. Anal. Chem. 2018, 90, 3156–3164.
32. Kanehisa, M. KEGG Bioinformatics Resource for Plant Genomics and Metabolomics. In Plant Bioinformatics; Methods in Molecular Biology; Edwards, D., Ed.; Humana Press: New York, NY, USA, 2016; Volume 1374.
33. Horai, H.; Arita, M.; Kanaya, S.; Nihei, Y.; Ikeda, T.; Suwa, K.; Ojima, Y.; Tanaka, K.; Tanaka, S.; Aoshima, K.; et al. MassBank: A public repository for sharing mass spectral data for life sciences. J. Mass Spectrom. 2010, 45, 703–714.
34. Kim, S.; Chen, J.; Cheng, T.; Gindulyte, A.; He, J.; He, S.; Li, Q.; Shoemaker, B.A.; Thiessen, P.A.; Yu, B.; et al. PubChem 2023 update. Nucleic Acids Res. 2023, 51, D1373–D1380.
35. Caspi, R.; Altman, T.; Billington, R.; Dreher, K.; Foerster, H.; Fulcher, C.A.; Holland, T.A.; Keseler, I.M.; Kothari, A.; Kubo, A.; et al. The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res. 2014, 42, D459–D471.
36. The Metabolomics Workbench. Available online: https://www.metabolomicsworkbench.org/ (accessed on 27 January 2023).
37. Sumner, L.W.; Lei, Z.; Nikolau, B.J.; Saito, K.; Roessner, U.; Trengove, R. Proposed quantitative and alphanumeric metabolite identification metrics. Metabolomics 2014, 10, 1047–1049.
38. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.W.; da Silva Santos, L.B.; Bourne, P.E.; et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3, 160018.
39. Alseekh, S.; Aharoni, A.; Brotman, Y.; Contrepois, K.; D’Auria, J.; Ewald, J.; Ewald, J.C.; Fraser, P.D.; Giavalisco, P.; Hall, R.D.; et al. Mass spectrometry-based metabolomics: A guide for annotation, quantification and best reporting practices. Nat. Methods 2021, 18, 747–756.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Parker, E.J.; Billane, K.C.; Austen, N.; Cotton, A.; George, R.M.; Hopkins, D.; Lake, J.A.; Pitman, J.K.; Prout, J.N.; Walker, H.J.; et al. Untangling the Complexities of Processing and Analysis for Untargeted LC-MS Data Using Open-Source Tools. Metabolites 2023, 13, 463. https://doi.org/10.3390/metabo13040463
Parker, Elizabeth J., Kathryn C. Billane, Nichola Austen, Anne Cotton, Rachel M. George, David Hopkins, Janice A. Lake, James K. Pitman, James N. Prout, Heather J. Walker, et al. 2023. "Untangling the Complexities of Processing and Analysis for Untargeted LC-MS Data Using Open-Source Tools" Metabolites 13, no. 4: 463. https://doi.org/10.3390/metabo13040463
Parker, E. J., Billane, K. C., Austen, N., Cotton, A., George, R. M., Hopkins, D., Lake, J. A., Pitman, J. K., Prout, J. N., Walker, H. J., Williams, A., & Cameron, D. D. (2023). Untangling the Complexities of Processing and Analysis for Untargeted LC-MS Data Using Open-Source Tools. Metabolites, 13(4), 463. https://doi.org/10.3390/metabo13040463