A Data Set of 255,000 Randomly Selected and Manually Classified Extracted Ion Chromatograms for Evaluation of Peak Detection Methods
Abstract
1. Summary
2. Data Description
2.1. MS Data
2.2. Processed Data
2.3. Classified Data
3. Methods
3.1. Data Set
3.2. Sampling
3.3. Chemical Analysis
3.3.1. Instrumentation
3.3.2. Sample Preparation
3.3.3. Chromatography
3.3.4. Mass Spectrometry
3.4. Data Processing
3.4.1. Peak Picking and Alignment
- Mass detection
  - (a) Noise level: 5e3
- ADAP chromatogram builder
  - (a) Min group size of # of scans: 15
    - In the entire chromatogram there must be at least this number of sequential scans with points above the group intensity threshold set by the user.
  - (b) Group intensity threshold: 5000
  - (c) Min highest intensity: 5000
    - There must be at least one point in the chromatogram that has an intensity greater than or equal to this value.
  - (d) m/z tolerance: 0.001 m/z or 7 ppm
- Smoothing
  - (a) Filter width: 7
    - This parameter sets the intensity of the smoothing effect; a higher width means a more extreme smoothing.
- Chromatogram deconvolution
  - (a) Algorithm: Local minimum search
  - (b) Chromatographic threshold: 80.0%
    - Threshold for removing noise. The algorithm finds the intensity such that the specified percentage of the chromatogram’s data points are below that intensity; all such data points are removed.
  - (c) Search minimum in RT range (min): 0.15
    - A local minimum is considered to be the point that separates two adjacent peaks if it is minimal in the specified retention time range.
  - (d) Minimum relative height: 20.0%
    - Minimum height of a peak relative to the chromatogram’s highest data point.
  - (e) Minimum absolute height: 5.0 ×
    - Minimum absolute height of a peak for it to be recognized.
  - (f) Min ratio of peak top/edge: 2.7
    - Minimum ratio between a peak’s top intensity and side (lowest) data points. This parameter helps to reduce the detection of false peaks in cases where the chromatogram is not smooth.
  - (g) Peak duration range (min): 0.15–4.00
    - Range of acceptable peak durations.
- Alignment
  - (a) m/z tolerance: 0.001 m/z or 7 ppm
  - (b) Weight for m/z: 80
  - (c) Retention time tolerance: 0.3 absolute (min)
  - (d) Weight for RT: 30
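Some of the parameters listed above are easier to follow as code. The following is a minimal Python sketch, not the MZmine 2 implementation: the function names are hypothetical, the combined m/z tolerance is assumed to be the larger of the absolute and the ppm limit, and the chromatographic threshold is approximated with a simple rank-based cutoff.

```python
from typing import List, Sequence


def mz_tolerance(mz: float, abs_tol: float = 0.001, ppm_tol: float = 7.0) -> float:
    """Effective tolerance in m/z units (assumption: the larger limit applies)."""
    return max(abs_tol, mz * ppm_tol / 1e6)


def longest_run_above(intensities: Sequence[float], threshold: float) -> int:
    """Length of the longest run of sequential scans strictly above threshold."""
    best = run = 0
    for value in intensities:
        run = run + 1 if value > threshold else 0
        best = max(best, run)
    return best


def passes_chromatogram_builder(
    intensities: Sequence[float],
    min_group_size: int = 15,
    group_threshold: float = 5000.0,
    min_highest: float = 5000.0,
) -> bool:
    """Check the 'min group size' and 'min highest intensity' criteria."""
    if not intensities:
        return False
    return (
        longest_run_above(intensities, group_threshold) >= min_group_size
        and max(intensities) >= min_highest
    )


def apply_chromatographic_threshold(
    intensities: Sequence[float], percent: float = 80.0
) -> List[float]:
    """Zero all points below the intensity under which `percent` of points fall."""
    if not intensities:
        return []
    ranked = sorted(intensities)
    # Rank-based cutoff (assumption): the value at the `percent` position.
    index = min(int(len(ranked) * percent / 100.0), len(ranked) - 1)
    cutoff = ranked[index]
    return [value if value >= cutoff else 0 for value in intensities]
```

With the defaults above, the 7 ppm limit exceeds the absolute limit of 0.001 only for m/z values above roughly 143, so the absolute tolerance governs the low-mass range under this assumption.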
3.4.2. Extraction of EICs
3.5. Classification
3.5.1. Initial Classification
3.5.2. Automated Post-Processing
- All single-scan signals have been removed from the EICs: every scan with a zero-intensity scan directly before and after it has been set to zero. This was done for all scans in all EICs because, especially at low intensities, random upticks of electronic noise were often observed to cause single-scan peaks.
- All peaks with a scan length of 2 or less have been removed if the sum of their intensities was lower than .
- All peaks with a sum lower than have been removed.
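The post-processing rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' script: the function names are hypothetical, a "peak" is taken to be a run of consecutive non-zero scans, and the two intensity-sum thresholds are required arguments because their values are not given in the text above.

```python
from typing import List, Sequence


def remove_single_scan_spikes(intensities: Sequence[float]) -> List[float]:
    """Zero every scan whose direct neighbours both have zero intensity."""
    cleaned = list(intensities)
    for i in range(1, len(intensities) - 1):
        if intensities[i - 1] == 0 and intensities[i + 1] == 0:
            cleaned[i] = 0
    return cleaned


def remove_weak_peaks(
    intensities: Sequence[float],
    short_peak_min_sum: float,  # hypothetical name; value not given in the text
    any_peak_min_sum: float,    # hypothetical name; value not given in the text
    short_len: int = 2,
) -> List[float]:
    """Zero runs of non-zero scans whose intensity sum is below the thresholds."""
    cleaned = list(intensities)
    start = None
    # A trailing zero acts as a sentinel that closes a run ending at the last scan.
    for i, value in enumerate(list(intensities) + [0]):
        if value > 0 and start is None:
            start = i
        elif value <= 0 and start is not None:
            total = sum(intensities[start:i])
            short_and_weak = (i - start) <= short_len and total < short_peak_min_sum
            if short_and_weak or total < any_peak_min_sum:
                cleaned[start:i] = [0] * (i - start)
            start = None
    return cleaned
```

A possible usage, with placeholder thresholds chosen only for the example:

```python
eic = [0, 10, 0, 5, 5, 0, 100, 200, 100, 0]
step1 = remove_single_scan_spikes(eic)
step2 = remove_weak_peaks(step1, short_peak_min_sum=50, any_peak_min_sum=20)
```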
4. User Notes
Author Contributions
Funding
Conflicts of Interest
Abbreviations
LC-MS | liquid chromatography mass spectrometry |
EIC | extracted ion chromatogram |
m/z | mass-to-charge ratio |
WWTP | waste water treatment plant |
HESI | heated electrospray ionization |
Appendix A. WWTP Information
Parameter | WWTP I | WWTP II |
---|---|---|
Sewer system | Separate sewer system | Combined sewer system |
Treatment technology | Activated sludge | Activated sludge |
Connection rate [%] | 98 | 99.4 |
Size (population equivalents) | 80,000 | 60,000 |
Connected inhabitants | 50,000 | 36,800 |
Connected industry (population equivalents) | 15,000 | 5400 |
Hydraulic retention time [h] | ca. 72 | 54 (dry weather, ∼7500 m³); 15 (rain event, ∼27,000 m³) |
Daily discharge [m³] on sampling day | 10,042 | 6617 |
Temperature of effluent water [°C] on sampling day | 16.7 | 18.01 |
Appendix B. Solvent Gradient Information
Time [min] | Flow Rate [mL/min] | Solvent A [%] Water + 0.1% Formic Acid | Solvent B [%] Methanol + 0.1% Formic Acid | Solvent C [%] Acetone/Isopropanol (50:50) |
---|---|---|---|---|
0 | 0.3 | 95 | 5 | 0 |
1 | 0.3 | 95 | 5 | 0 |
13 | 0.3 | 0 | 100 | 0 |
24 | 0.3 | 0 | 100 | 0 |
24.1 | 0.35 | 5 | 10 | 85 |
26.2 | 0.35 | 5 | 10 | 85 |
26.3 | 0.35 | 95 | 5 | 0 |
31.9 | 0.35 | 95 | 5 | 0 |
32 | 0.3 | 95 | 5 | 0 |
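To read off the mobile-phase composition at an arbitrary time, the gradient table can be interpolated. The Python sketch below assumes linear ramps between consecutive table rows for both flow rate and solvent percentages, which is common for LC gradient programs but is not stated in the table; `GRADIENT` and `composition_at` are hypothetical names.

```python
from bisect import bisect_right
from typing import Tuple

# Rows copied from the table: (time, flow rate, solvent A %, solvent B %, solvent C %)
GRADIENT = [
    (0.0, 0.30, 95, 5, 0),
    (1.0, 0.30, 95, 5, 0),
    (13.0, 0.30, 0, 100, 0),
    (24.0, 0.30, 0, 100, 0),
    (24.1, 0.35, 5, 10, 85),
    (26.2, 0.35, 5, 10, 85),
    (26.3, 0.35, 95, 5, 0),
    (31.9, 0.35, 95, 5, 0),
    (32.0, 0.30, 95, 5, 0),
]


def composition_at(t: float) -> Tuple[float, ...]:
    """Return (flow, A %, B %, C %) at time t, linearly interpolated (assumption)."""
    times = [row[0] for row in GRADIENT]
    if not times[0] <= t <= times[-1]:
        raise ValueError("time outside the gradient program")
    i = bisect_right(times, t) - 1
    if i == len(GRADIENT) - 1:  # exactly at the last set point
        return tuple(float(v) for v in GRADIENT[-1][1:])
    t0, *v0 = GRADIENT[i]
    t1, *v1 = GRADIENT[i + 1]
    frac = (t - t0) / (t1 - t0)
    return tuple(a + frac * (b - a) for a, b in zip(v0, v1))
```

Under this assumption, `composition_at(7.0)` lies halfway along the ramp from 1 to 13 min and yields 47.5% solvent A and 52.5% solvent B.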
Appendix C. Evaluation of Calibration Standards
Appendix C.1. m/z Standard Deviation
Appendix C.2. Retention Time Standard Deviation
Appendix C.3. Median Intensity Ratio of Calibration Levels
Appendix C.4. Peak Duration Standard Deviation
Appendix D. Peak Quality Assessment
Appendix D.1. In-Depth Analysis of the Expert Classification of Peak Quality
All True | All False | All Unclear | True, False | True, Unclear | False, Unclear | True, False, Unclear |
---|---|---|---|---|---|---|
322 | 285 | 1 | 163 | 136 | 104 | 60 |
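The share of unanimous classifications can be computed from the table above. The short Python sketch below assumes that each column counts the EICs for which the experts' labels formed exactly that combination; the variable names are illustrative.

```python
# Counts copied from the table; keys are the label combinations per EIC.
counts = {
    "All True": 322,
    "All False": 285,
    "All Unclear": 1,
    "True, False": 163,
    "True, Unclear": 136,
    "False, Unclear": 104,
    "True, False, Unclear": 60,
}

total = sum(counts.values())
unanimous = sum(n for label, n in counts.items() if label.startswith("All "))
unanimous_share = unanimous / total

print(f"{unanimous} of {total} EICs ({unanimous_share:.1%}) received one label only")
```

Under this reading, 608 of the 1071 EICs (about 56.8%) were labeled unanimously, and the remaining 463 received at least two different labels.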
Appendix D.2. Guidelines and Examples for the Peak Quality Assessment
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Müller, E.; Huber, C.; Beckers, L.-M.; Brack, W.; Krauss, M.; Schulze, T. A Data Set of 255,000 Randomly Selected and Manually Classified Extracted Ion Chromatograms for Evaluation of Peak Detection Methods. Metabolites 2020, 10, 162. https://doi.org/10.3390/metabo10040162