In Silico Optimization of Mass Spectrometry Fragmentation Strategies in Metabolomics

Liquid chromatography (LC) coupled to tandem mass spectrometry (MS/MS) is widely used in identifying small molecules in untargeted metabolomics. Various strategies exist to acquire MS/MS fragmentation spectra; however, the development of new acquisition strategies is hampered by the lack of simulators that let researchers prototype, compare, and optimize strategies before validations on real machines. We introduce Virtual Metabolomics Mass Spectrometer (ViMMS), a metabolomics LC-MS/MS simulator framework that allows for scan-level control of the MS2 acquisition process in silico. ViMMS can generate new LC-MS/MS data based on empirical data or virtually re-run a previous LC-MS/MS analysis using pre-existing data to allow the testing of different fragmentation strategies. To demonstrate its utility, we show how ViMMS can be used to optimize N for Top-N data-dependent acquisition (DDA) acquisition, giving results comparable to modifying N on the mass spectrometer. We expect that ViMMS will save method development time by allowing for offline evaluation of novel fragmentation strategies and optimization of the fragmentation strategy for a particular experiment.


MSSimulations
shows the overall characteristics of m/z, RT and intensity values from running XCMS peak picking on the resulting simulated mzML file and 4 multi-beer samples for comparison (labelled multi-beer-1, ..., multi-beer-4 ). It can be observed from Figure 1 that all m/z and RT boxplots show similar profiles between the 4 randomly selected real beer samples and the simulated sample from ViMMS. We explain the slightly higher intensity distribution in Figure 1c due to the fact that in our sampling scheme, a chemical having high sampled maximum intensity values could be assigned a large ROI that spans a big RT range. During peak picking, these potentially-noisy ROIs are detected as multiple high-intensity peaks by XCMS, resulting in an upward shift in the intensity distribution of detected MS1 features from the simulated mzML file. Improving the regions of interest detection scheme to address this issue is a future work.

Top-N Simulations
From the actual multi-beer-1 fragmentation mzML, we extracted 512,540 ROIs across all scans using a 10 ppm tolerance that determines when peaks should be grouped under one ROI. Only ROIs with at least one peak above the minimum intensity threshold of 1.75E5 are kept. This results in 10,190 ROIs, which are converted into chemical objects in ViMMS, having their corresponding normalised chromatographic peak shapes derived from the ROIs. Simulated Top-10 DDA fragmentation was performed using the same fragmentation parameters used to generate the actual beer1 data (N=10, DEW=15s). Simulation took approximately 2 minutes in ViMMS running on an Intel Core i9 laptop. The generated mzML file and simulator state were loaded into ToppView and Jupyter Notebook for further analysis. Visual inspection of the resulting spectra in ToppView shows that the mzML files from ViMMS and the multi-beer-1 data look broadly similar ( Figure 2). Comparing the number of scans, we obtained a total of 9,891 scans from the simulator to 9,406 scans in the actual beer mzML, resulting in 3,027 fewer MS1 features that can be picked by XCMS from the generated mzML file (Table 1).
To evaluate these differences quantitatively, we matched precursor ion information from the real multi-beer 1 mzML to the generated mzML from ViMMS. 6,757 out of 7,655 (88%) precursor ions can be matched from the actual multi-beer 1 file to the generated file when matching up to 2 decimal places for the m/z values and using 15s tolerance window for retention time. It can be seen from Figure 3 that fragmentation events from both the real and simulated files are in close proximity to each other.  Consider a scenario where within an experimental batch, full-scan and Top-N data are acquired. In a typical analysis of this scenario, the MS1 peaks detected by XCMS from the full-scan MS1 file serve as the ground truth of peaks we wish to fragment. To compute performance in this scenario, XCMS' CentWave peak detection is performed on the fullscan input files mzML files, resulting in the number of ground truth MS1 features shown in Table 2.

Number of ground truth MS1 features
multi-beer-1 9,198 multi-beer-2 9,805 multi-urine-2 7,188 multi-urine-3 7,664 For evaluation, we provide the following definition of positive and negative instances (illustrated in Figure 4): True Positives (TP): peaks from ground truth (found in full-scan files) that are fragmented above the minimum intensity threshold.
False Positives (FP): peaks from ground truth that are not fragmented + peaks from ground truth that are fragmented below the minimum intensity threshold.
False Negatives (FN): peaks not from ground truth that are fragmented above the minimum intensity threshold. The results in this experiment shows that increasing N results in lower precision (Figure 5a), while recall increases with N initially but flattens (Figure 5b). Assessing the F 1 score (Figure 5c), which is the harmonic average of precision and recall, we see that the best fragmentation performance (as represented by the F 1 score) is reached between N from 10 to 20 and plateaus after 20, suggesting that no further performance gain is obtained from increasing the number of precursor ions to fragment.  Figure 6 shows boxplots comparing the F 1 -scores (representative of fragmentation performance) of all parameter combinations from the real and simulated data. The results from our simulator generally match the results from the real data, although with a slightly greater spread in the simulated fragmentation performance. Exploring the results in detail, we see that the best fragmentation performance is obtained at N = 20 and DEW = 30 for both datasets and the worst at N = 1 and DEW = (15, 30, 60) (Table 3). We explain our findings by the fact that the best performance is obtained at the parameter combinations with the lowest trade-off between fragmentation performance and MS2 peak picking quality. The poor results came about from when both N and DEW are too small. In the former, not enough unique MS1 peaks are fragmented due to dynamic exclusion effect, whereas in the latter case, the quality of peak picking from fragmentation files decrease significantly, affecting the number of true positives obtained.

Implementation Details
ViMMS is implemented in Python (Van Rossum et al., 2007), with numerical and matrix computations performed on top of the NumPY (Van Der Walt et al., 2011), SciPy (Jones et al., 2001) and Scikit-learn (Pedregosa et al., 2011) libraries. Prototyping as well as running interactive examples of ViMMS functionalities are performed in Jupyter Notebook environment (Kluyver et al., 2016). For DsDA, we re-use the original script from DsDA (Broeckling et al., 2018) available in the R programming language (Ihaka and Gentleman, 1996) to perform the prioritisation and scoring process. mzML files are read using the pymzML library (Kösters et al., 2018) and written using the psims library (Klein and Zaia, 2019).