Untargeted Metabolomics-Based Screening Method for Inborn Errors of Metabolism using Semi-Automatic Sample Preparation with an UHPLC-Orbitrap-MS Platform

Routine diagnostic screening of inborn errors of metabolism (IEM) is currently performed by different targeted analyses of known biomarkers. This approach is time-consuming, targets a limited number of biomarkers and will not identify new biomarkers. Untargeted metabolomics generates a global metabolic phenotype and has the potential to overcome these issues. We describe a novel, single-platform, untargeted metabolomics method for screening IEM, combining semi-automatic sample preparation with pentafluorophenylpropyl phase (PFPP)-based UHPLC-Orbitrap-MS. We evaluated the analytical performance and diagnostic capability of the method by analysing plasma samples of 260 controls and 53 patients with 33 distinct IEM. Analytical reproducibility was excellent, with peak area coefficients of variation below 20% for the majority of the metabolites. We illustrate that PFPP-based chromatography enhances identification of isomeric compounds. Ranked z-score plots of metabolites annotated in IEM samples were reviewed by two laboratory specialists experienced in biochemical genetics, resulting in the correct diagnosis in 90% of cases. Thus, our untargeted metabolomics platform is robust and differentiates metabolite patterns of different IEM from those of controls. We envision that the current approach to diagnosing IEM, using numerous tests, will eventually be replaced by untargeted metabolomics methods, which also have the potential to discover novel biomarkers and assist in the interpretation of genetic data.


Introduction
Routine diagnostic testing for inborn errors of metabolism (IEM) in clinically selected individuals is currently performed by targeted analysis of known biomarkers [1]. In most laboratories, specific groups of metabolites, such as organic acids, amino acids and acylcarnitines, are analyzed by dedicated

Retention Time Stability
To investigate the chromatographic stability of the LC method, we monitored within-batch and between-batch retention time (RT) variation of 17 stable isotope-labeled standards, which were added to the samples. Within-batch RT variation was never higher than 1.12% for the positive ion mode and 0.37% for the negative ion mode, while the median CV of within-batch variation was 0.22% (all plasma samples and QC samples, N = 29-166). Median and range data for all internal and external standards of all batches are shown in supplementary materials S1. Between-batch RT variation was within 2% (all plasma samples and QC samples), except for [D10]-isoleucine (3.8%, N = 1011), [D2]-uridine (4.7%, N = 1011), [D3]-methylmalonic acid (5.3%, N = 504), and [D4]-tyrosine (5.4%, N = 1011), all of which eluted between 1.70 and 2.80 min. This larger variation probably relates to the use of two analytical columns (although manufactured with core-shell particles from the same batch) during the experiments, since RT variation between different batches analyzed on one column was always within 2%. Between-batch RT variation data for all internal and external standards are shown in supplementary materials S2.
To obtain more information regarding the reproducibility of peak areas across a chromatographic run, we monitored peak areas of metabolites that were consistently annotated in all analyses of the QC sample across the eight batches (supplementary materials S8). Within-batch CVs were determined per metabolite and binned by retention time (bin width: 1 min), covering the whole chromatogram for all eight batches. For each batch, this resulted in 11 bins containing a total of 119 metabolites for the positive ion mode and 10 bins containing a total of 87 metabolites for the negative ion mode. First, we reviewed each bin in the eight batches separately, i.e., 88 bins for the positive ion mode and 80 bins for the negative ion mode. In the positive ion mode, the median CV of the peak area was <30% for 73/88 bins (83%) and <20% for 66/88 bins (75%). In the negative ion mode, 73/80 bins (91%) had a median CV <30%, while in 62/80 bins (78%) the median CV was <20%. The largest variation was observed near the end of the chromatogram in bins 8-9, 9-10 and 10-11 min (supplementary materials S5). Next, we reviewed median CVs across all batches for each bin (see Figure 1). A constant low variation in the peak areas measured in a bin across batches suggests a stable LC-MS method. Only bins 8-9, 9-10 and 10-11 min in the positive ion mode showed a median CV >20%, while in the negative ion mode a median CV >20% was observed in bins 6-7, 8-9 and 9-10 min (Figure 1). Overall, 182 of the 206 (88%) metabolites annotated had a median CV <20%.
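The binning described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the input layout and the peak areas are assumptions made for illustration, and the CV is taken as the sample standard deviation divided by the mean.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def cv_percent(areas):
    """Coefficient of variation (sample SD / mean) in percent."""
    return 100.0 * stdev(areas) / mean(areas)


def median_cv_per_rt_bin(metabolites, bin_width=1.0):
    """Group per-metabolite CVs into retention-time bins and take the median.

    `metabolites` maps a name to (retention_time_min, [QC peak areas]).
    Returns {(bin_start, bin_end): median CV in percent}.
    """
    bins = defaultdict(list)
    for rt, areas in metabolites.values():
        start = int(rt // bin_width) * bin_width
        bins[(start, start + bin_width)].append(cv_percent(areas))
    return {edges: median(cvs) for edges, cvs in sorted(bins.items())}


# Invented QC peak areas, for illustration only.
qc = {
    "metabolite_a": (1.8, [100.0, 110.0, 90.0]),
    "metabolite_b": (1.2, [200.0, 200.0, 200.0]),
    "metabolite_c": (9.4, [10.0, 20.0, 30.0]),
}
binned = median_cv_per_rt_bin(qc)
# Bins whose median CV exceeds the 20% level discussed in the text.
flagged = [edges for edges, cv in binned.items() if cv > 20.0]
```

With these invented values, only the late-eluting 9-10 min bin exceeds 20%, which mirrors the pattern reported for the end of the chromatogram.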

Separation of Isobaric and Isomeric Species
Correct annotation of features is crucial in the diagnostics of IEM, because misidentification or lack of identification can result in false positive or false negative results. In this respect, correct annotation of isobaric and isomeric species is challenging. Using our Orbitrap-MS, a resolution of 140,000 is feasible while maintaining adequate data sampling frequency (1.8 scans/s). To determine the ability to separate isobaric ions by mass spectrometry alone, we identified 1608 combinations of isobaric ions derived from metabolites present in our in-house database, which required a resolution >40,000 for separation. In the ESI(+)-mode, 1032 out of 1608 combinations (64%) could be baseline-separated by acquiring the data at a resolution of 140,000. This included, for example, serine[K+] and cysteine[Na+], which co-elute in our LC method. In the ESI(-)-mode, 1010 out of 1608 combinations (63%) could be resolved by acquiring the data at a resolution of 140,000 (detailed data available on request).
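As a worked example of the serine[K+]/cysteine[Na+] pair, the Python sketch below estimates the resolving power (m/Δm) needed to distinguish the two ions. It makes several assumptions: rounded standard monoisotopic masses, singly charged cation adducts, and the commonly cited inverse-square-root dependence of Orbitrap resolution on m/z. The m/Δm figure is a nominal criterion; baseline separation as defined by the authors may require more.

```python
# Monoisotopic masses in u (standard values, rounded).
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146,
        "S": 31.9720707, "Na": 22.9897693, "K": 38.9637065}
ELECTRON = 0.00054858


def ion_mz(formula, cation):
    """m/z of a singly charged [M + cation]+ ion; formula is {element: count}."""
    neutral = sum(MASS[el] * n for el, n in formula.items())
    return neutral + MASS[cation] - ELECTRON


def required_resolving_power(mz_a, mz_b):
    """Nominal resolving power m/dm for two peaks that lie dm apart."""
    return ((mz_a + mz_b) / 2) / abs(mz_a - mz_b)


def orbitrap_resolution_at(mz, r_ref=140_000, mz_ref=200.0):
    """Assumed scaling: resolution falls with the square root of m/z."""
    return r_ref * (mz_ref / mz) ** 0.5


serine_k = ion_mz({"C": 3, "H": 7, "N": 1, "O": 3}, "K")
cysteine_na = ion_mz({"C": 3, "H": 7, "N": 1, "O": 2, "S": 1}, "Na")
needed = required_resolving_power(serine_k, cysteine_na)
available = orbitrap_resolution_at((serine_k + cysteine_na) / 2)
```

Under these assumptions the two ions differ by roughly 3 mDa near m/z 144, so the required resolving power falls in the mid-40,000 range. This is consistent with the pair belonging to the set that needs a resolution >40,000, and it is well below what a 140,000 setting provides at that m/z.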
To achieve correct annotation of isomeric compounds, chromatographic separation is required. We selected pentafluorophenylpropyl (PFPP) phase-based chromatography, because PFPP, as a stationary phase, shows better retention for small polar compounds, such as organic acids and amino acids, compared to traditional C18-based chromatography and is more stable in terms of retention time variation and required stabilization times, compared to HILIC [12][13][14][15][16]. An example of compounds that are separated by our LC method is the isomeric group N-acetylisoleucine, N-acetylleucine, isohexanoylglycine and hexanoylglycine (C8H15NO3). Separation of these isomers is required to distinguish aminoacylase I deficiency from medium chain acyl-CoA dehydrogenase deficiency. Figure 2 clearly shows separation of N-acetylisoleucine and N-acetylleucine in a plasma sample of a patient with aminoacylase I deficiency, while hexanoylglycine and isohexanoylglycine peaks are present at different retention times in a plasma sample of a patient with medium chain acyl-CoA dehydrogenase deficiency. Other examples of compounds that can be separated by our LC method using PFPP chromatography are the isomeric pairs isoleucine/leucine and betaine/valine (see supplementary material S6), as well as 2-hydroxybutyric acid/3-hydroxybutyric acid/4-hydroxybutyric acid (C4H8O3), glutaric acid/ethylmalonic acid/methylsuccinic acid (C5H8O4) and tiglylglycine/3-methylcrotonylglycine (C7H11NO3) (data not shown).

Data Analysis and IEM Detection
We studied plasma samples of 53 patients covering 33 distinct IEM to evaluate our UHPLC-Orbitrap-MS platform and data processing pipeline. We investigated the ability to (1) detect metabolites relevant to IEM and (2) produce metabolite signatures that can be interpreted to assign a diagnosis. Z-scores of annotated metabolites were calculated using 15 age- and sex-matched control samples originating from the same batch as the patient samples (Table 1). Ranked z-score plots (metabolite with highest z-score at the top), representing metabolic signatures of the 53 patient samples analyzed, were reviewed independently and in a blinded fashion by two laboratory specialists experienced in clinical biochemical genetics to assign the most likely diagnosis. The z-score plot obtained for a hyperargininemia sample is shown in Figure 3 as an example. In 95 of the 106 reviews (90%), the correct diagnosis was achieved. Two diagnoses remained undetected: alkaptonuria and mevalonic aciduria. The known diagnostic biomarkers for these diseases, homogentisic acid and mevalonic acid, were found, but the levels were not significantly different compared to the controls (Table 1). Surprisingly, the urinary homogentisic acid level of the alkaptonuria patient sampled on the same day was strongly elevated (3074 mmol/mol creatinine; reference values <5). Diagnoses were incorrect in some of the cases reviewed for the following IEM: carbamylphosphate synthase I deficiency (2/4 reviews), ornithine transcarbamylase deficiency (1/4), homocystinuria (1/6), tyrosinemia type I (1/4), carnitine transporter deficiency (1/2) and 2-methyl-3-hydroxybutyryl-CoA dehydrogenase deficiency (1/2).
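The z-score ranking can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: z-scores are computed on raw peak areas as (patient value minus control mean) divided by the control sample standard deviation, any transformation or normalisation applied in the actual pipeline is not reproduced, and the values are invented (three controls instead of 15 to keep the example short).

```python
from statistics import mean, stdev


def z_scores(patient, controls):
    """z = (patient value - control mean) / control sample SD, per metabolite.

    `patient` maps metabolite -> peak area; `controls` maps metabolite -> list
    of peak areas measured in the matched control samples of the same batch.
    """
    scores = {}
    for name, value in patient.items():
        ref = controls[name]
        sd = stdev(ref)
        if sd > 0:  # skip metabolites with no spread in the controls
            scores[name] = (value - mean(ref)) / sd
    return scores


def ranked(scores):
    """Metabolites ordered from highest to lowest z-score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented peak areas, for illustration only.
controls = {"arginine": [8.0, 10.0, 12.0], "orotic acid": [1.0, 2.0, 3.0]}
patient = {"arginine": 14.0, "orotic acid": 12.0}
plot_order = ranked(z_scores(patient, controls))
```

In this toy example orotic acid ranks above arginine, so it would appear at the top of the ranked plot.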
To highlight the unique features of untargeted metabolomics, which may result in the discovery of new biomarkers for diagnosis and therapy monitoring, we present a case of hyperargininemia. In the traditional targeted diagnostic process, hyperargininemia would be diagnosed if plasma levels of arginine are strongly elevated together with elevated plasma levels of glutamine and citrulline and high levels of urinary orotic acid [18]. We performed untargeted metabolomics on a plasma sample of a hyperargininemia patient treated by dietary protein restriction. As expected, arginine was only marginally increased, due to treatment, with a z-score of 2.4 and p-value > 0.05 (Figure 3, Table 1). Despite the treatment, we detected several elevated features, which were putatively annotated as 2-oxoarginine, N-acetylarginine, argininic acid, homoarginine and 4-guanidinobutyric acid using the Human Metabolome Database (HMDB). These metabolites have been reported previously in hyperargininemia patients [19][20][21][22]. Inclusion of these biomarkers in the panel of IEM metabolites resulted in high z-scores for N-acetylarginine, homoarginine and 4-guanidinobutyric acid in addition to orotic acid, a well-established biomarker of several urea cycle defects (Figure 3, Table 1). This example shows the strength of untargeted metabolomics analyses: the three additional biomarkers help to establish the diagnosis of hyperargininemia in this sample. Argininic acid and 2-oxoarginine remained undetected during the automated raw data analysis, but manual review of the raw data revealed elevated levels compared to controls. Argininic acid was not peak-picked in either ion mode by Progenesis QI, while 2-oxoarginine was not peak-picked in the negative ion mode and its peak was incorrectly deconvoluted in the positive ion mode (the [M+H] adduct of 2-oxoarginine was annotated as the [M-H2O+H] adduct of 4-hydroxycitrulline).

Discussion
Mass spectrometry analytics has matured in recent decades. As a result, metabolomic studies have grown more precise and comprehensive, now allowing the identification of hundreds to thousands of unique metabolites in the analysis of a single biological sample [23]. The current study concerns a novel platform for untargeted metabolomics in diagnosing IEM based on semi-automatic sample preparation combined with pentafluorophenylpropyl phase-based (Kinetex F5) UHPLC coupled to Orbitrap-MS. The capabilities of these two analytical techniques work synergistically and enable analysis of low-abundance components in complex samples and separation of isobaric and isomeric species. As an example, we showed the clear separation of the isomers isohexanoylglycine and hexanoylglycine and the isomers N-acetylleucine and N-acetylisoleucine (Figure 2). To minimize the chance of false positive and false negative results, we used the Q Exactive Plus at a resolution of 140,000 (full width at half maximum at 200 m/z). The specificity afforded by this resolution is of great importance: as we showed, 64% of the isobaric pairs that need a resolution higher than 40,000 are separated at a resolution of 140,000 in the positive ion mode. Furthermore, the within-batch quality control data show that, even with this ultra-high resolution, peak areas can be determined accurately. Last but not least, we used a semi-automatic sample preparation procedure. To the best of our knowledge, fully or semi-automated sample preparation procedures have not been used in metabolomics studies to diagnose IEM. Other methods have used manual protein precipitation and centrifugation [8][9][10][11]. Semi-automated sample processing potentially increases laboratory efficiency, reduces user errors and increases reproducibility. It must be noted that we did not investigate the performance of semi-automated sample preparation in comparison to manual sample processing, but, from a practical point of view, automation did facilitate integration of sample handling into the lab information system. An additional advantage of the 96-well Phree filter plates that we employed is selective binding of phospholipids, which, if not removed, may cause ion suppression [24].
We have evaluated our method by testing 53 patient samples corresponding to 33 different IEM and achieved a correct diagnosis in 90% of the cases. Disorders from the following disease groups were included: aminoacidopathies, urea cycle disorders, organic acidurias, fatty acid oxidation defects, purine and pyrimidine disorders and peroxisomal disorders. For an additional IEM disease group, lysosomal storage disorders, the utility of clinical testing using metabolomics has not been reported before. We demonstrate that our metabolomics platform is able to detect mannosyl-β1,4-N-acetylglucosamine (GlcNAc-Man), the current biomarker for β-mannosidase deficiency. Preliminary experiments on oligosaccharidoses showed that biomarkers for aspartylglucosaminuria and α-mannosidosis were also easily detectable (data not shown).
A limitation of our platform was encountered in the raw data processing steps using Progenesis QI. Using a mass resolution of 140,000 allows the determination of the fine isotope pattern (N, O and S atoms) of molecules with m/z < 500, which helps to identify biomarkers by restricting the number of possible annotations. Unfortunately, Progenesis QI was unable to extract the fine isotope patterns, which were indeed correctly recorded by the orbitrap MS. Consequently, we could not use the fine isotope patterns to determine isotope similarity scores. For the annotated compounds, the isotope similarity score was always >85%, which we considered acceptable. Another drawback of Progenesis QI is that integration of picked peaks cannot be edited, e.g., to split isobaric compounds, which the algorithm integrated as a single peak, or to manually add peaks, which were missed by the peak detection algorithm. This resulted in missed annotations and the inability to report some isomers, which were, in fact, present in the raw data. Better procedures for peak-picking and deconvolution of adducts and isotope signals are required to optimize metabolite identification (see below).
Applications of untargeted metabolomics platforms to clinical testing in the field of IEM are scarce. Those platforms that used plasma samples include the approach reported by Miller et al., using GC-MS and LC-high resolution MS run in parallel [8], an LC-QTOF/MS method described by Coene et al. [9], and a direct infusion-Orbitrap-MS method reported by Haijes et al. [11]. Comparison of method performances is complicated, since many variations exist between the different approaches. Retention time stability observed on our PFPP column was ≤1.1% within batch and <2% between batches, which is very similar to the values (<1% within run and <2% between run) reported by Coene et al. [9] for the commonly used C18 UHPLC and indicates that very stable chromatography can be achieved on PFPP-based columns. Since the aforementioned metabolomics approaches rely on relative perturbations in metabolite levels, peak area detection must be reproducible across different samples. For the 17 stable isotope-labeled standards added to the samples, the within-batch median peak area CVs were 7-19% (data from all plasma samples included in a batch). These values compare well to the median CV values determined in a similar manner by Haijes et al. [11]: 16-21%. Coene et al. [9] reported a median CV of <20% for approximately 20 metabolites in their QC samples. We found that, in our method, the median within-batch CV values of peak areas of metabolites detected in the QC sample are well below 20% across most of the chromatogram (Figure 1). Peak area CV values were larger only towards the end of the chromatogram, which may be explained by the fact that late-eluting compounds are apolar metabolites, e.g., long chain acylcarnitines, which are present at very low concentrations (<0.1 µmol/L) in the QC sample and have limited solubility in the final diluent (water:methanol 95:5 + 0.5% v/v formic acid). 
The resulting fluctuations in metabolite recovery may be a major cause of the relatively high variation observed. Since these IEM screening methods investigate metabolite abnormalities within-batch, variation in peak areas between different experimental batches is less of an issue. Still, a stable platform is desirable and between-batch variation in peak areas should be monitored. Median between-batch variation of all standards in all samples was acceptable (27%).
A challenge in the use of untargeted metabolomics platforms to screen for IEM is the lack of adequate determination of some clinically relevant metabolites. Using our platform, homogentisic acid and mevalonic acid levels were not increased in an alkaptonuria sample and a hyper-IgD syndrome sample, respectively, and this resulted in failure to establish the diagnoses in these two cases. Similarly, a normal glutamine level was observed in one CPS I sample, which impeded correct diagnosis. Several other metabolites were not annotated or had normal values (Table 1), e.g., orotic acid in two OTC samples, but in these cases the correct diagnosis could still be established, because other metabolites had abnormal levels. Similar findings have been reported by other researchers. The platform described by Miller et al. [8] correctly diagnosed 20 out of the 21 IEM tested. Their method did not identify methylmalonic acid, tetradecenoylcarnitine (C14:1), and guanidinoacetic acid, but only in the latter case the diagnosis of guanidinoacetate methyltransferase (GAMT) deficiency was missed. Coene et al. [9] correctly identified 42 out of 46 diagnoses and could not diagnose argininosuccinate lyase deficiency, dimethylglycine dehydrogenase deficiency and GAMT deficiency, because abnormal values of argininosuccinic acid, dimethylglycine and guanidinoacetic acid were not annotated. A possible explanation for the inability to detect argininosuccinic acid was the lack of retention by C18 chromatography [9]. In our method, using PFPP-based chromatography, argininosuccinic acid was retained and correctly annotated in all three cases tested. Finally, the DI-HRMS method reported by Haijes et al. [11] could make a precise diagnosis in 19 of the 21 IEM tested in plasma but did not identify methylenetetrahydrofolate reductase deficiency and carnitine palmitoyltransferase I deficiency. Several causes may explain the lack of abnormal test results and the failure to establish a diagnosis. 
First, due to the rarity of IEM, some samples used in our method evaluation, as well as in studies by others [8][9][10][11], were taken from patients who had already received specific treatment for their condition, which has resulted in less pronounced or even normalized disease-specific biochemical abnormalities. It is to be expected that undiagnosed and untreated patients will show larger deviations in metabolite patterns, which will improve diagnostic accuracy. Prospective studies are required to demonstrate the full capability of untargeted metabolomics platforms in screening for IEM. Second, technical limitations in the used platforms, such as sample preparation, chromatography, raw data processing and data analysis, may hinder correct test results for certain metabolites [8,9,11]. We expect that developments in data analysis will lead to improvement in the methodology. In raw data processing, for example, better procedures for peak-picking and deconvolution of adducts and isotope signals will improve metabolite identification (our data and [9]). The application of comprehensive databases containing all (possible) biomarkers of IEM, informative metabolite ratios and algorithms assisting in recognition of characteristic metabolite patterns are required to further optimize diagnostic performance. As an example, we show that expanding the number of biomarkers for hyperargininemia facilitates its diagnosis.
It is worth reiterating that comparison of the performance of the different methods reported for IEM screening by untargeted metabolomics platforms is complicated, since many variations exist between the different studies, e.g., different IEM were tested and for each disorder distinct samples with different degrees of metabolite abnormalities were used. Appropriate testing of method performance should be performed by sample exchange programs or External Quality Assurance schemes.
In this study, we describe a novel metabolomics method applying a semi-automatic sample preparation procedure combined with UHPLC-Orbitrap-MS. Our metabolomics platform differentiates signatures of many different IEM from those of controls. The use of metabolomics in the field of IEM diagnostics is still in its infancy and has likely not yet reached its full potential. Nevertheless, our results and those of others [8,9,11] show that the number of IEM detected by metabolomics is increasing. Progress in automation, as applied in our method, and in data analysis will make diagnostics of IEM faster and cheaper. We envision that in the near future the current approach to diagnosing IEM, with numerous targeted tests, will be replaced by untargeted metabolomics methods. Notably, we, as well as others [8,9,11], show the potential of metabolomics in IEM diagnostics, but none of these reports includes full analytical and clinical validation, which is a requirement before application in the clinic. In addition to the potential of screening for known IEM, untargeted metabolomics platforms allow the identification of new biomarkers, useful to establish diagnoses or to use as surrogate markers for disease outcome [9,25,26]. Finally, metabolomics provides a comprehensive biochemical phenotype that facilitates interpretation of possible biochemical consequences of variants of unknown significance identified in whole exome sequencing or whole genome sequencing [9,11,27].

Reagents and Chemicals
Acetonitrile (hypergrade) and formic acid were from Merck (Amsterdam, The Netherlands) and UPLC water and methanol from Biosolve (Valkenswaard, The Netherlands). Internal and external standards (mostly isotope labeled) were selected on the basis of two criteria. First, compounds were selected to represent a number of different compound classes, i.e., amino acids, organic acids, acylcarnitines, purines and pyrimidines and a bile acid. Second, from these different compound classes, compounds were selected to have retention times across the chromatogram. An internal standards mixture was prepared containing 600 µmol/L L-phenylalanine (ring-[D5], 98%), 300 µmol/L thymidine … Standard mixtures used for external calibration of the Q Exactive Plus were Calmix positive and Calmix negative, for the positive and negative ion mode, respectively (Thermo Fisher Scientific, Breda, The Netherlands). Additionally, an in-house 'Metabolic Laboratory' negative ion mode calibration mix was made to ensure mass accuracy for ions with an m/z < 262, containing 500 µmol/L L-phenylalanine, 500 µmol/L methylmalonic acid, 150 µmol/L taurocholic acid and 330 µmol/L uridine (Sigma Aldrich, Zwijndrecht, The Netherlands). The lock mass solution consisted of 400 mg/L caffeine and 400 mg/L 5-bromo-uracil (Sigma Aldrich, Zwijndrecht, The Netherlands). The custom calibration mix for the negative ion mode was made by combining 30 µL lock mass solution, 30 µL Metabolic Laboratory negative ion mode calibration mix and 300 µL Calmix negative. The ClinCal amino acid calibrator from Recipe (Munich, Germany) was used as a QC sample.

Sample Selection
Our metabolomics workflow was tested on a range of 33 distinct IEM. In total, 53 plasma samples from patients with 33 known IEM (1-4 samples per IEM) were analyzed. IEM diagnoses were previously confirmed by enzyme and/or molecular testing when appropriate. Control samples were obtained from remaining material of patients who screened negative for all known IEM. Heparin blood samples of both groups were drawn for routine metabolic screening or therapy monitoring without applying a specific protocol on collection of material (e.g., time, fasting/dietary status, treatment). In agreement with national legislation and institutional guidelines, all patients or their guardians approved the possible anonymous use of the remainder of their samples for method validation purposes. The study was conducted in accordance with the Declaration of Helsinki. Samples were stored in a digital-alarm-controlled freezer at −20 °C before analysis for a period ranging from 2 weeks to 14 years. Samples were analyzed in eight experimental runs (batches), including 2-14 IEM samples and 25-38 controls (random age and gender) per batch.

Semi-Automated Sample Preparation Procedure
Semi-automated sample preparation was performed on a Hamilton Robotics ML-STAR eight-channel pipetting robot equipped with a camera, iSwap robotic hand, an orbital shaker, an orbital heater/shaker and a vacuum station (Bonaduz, Switzerland) according to the following procedure. All samples were thawed at room temperature for 20-30 min and mixed by vortexing. For each sample a 200 µL aliquot was pipetted into a 1.5 mL polypropylene tube with a unique 2D barcode containing the sample identifier. Subsequently, all samples were centrifuged at 13,000 rpm for 5 min at 4 °C.
An input file for the pipetting robot was created, containing the name of each sample, the corresponding 2D barcode and whether the sample should be treated as a patient or non-patient sample (e.g., control, blank). Thereafter, the 2D barcodes of the samples in the sample trays were scanned. If the order in the sample trays matched the order on the worklist, sample preparation started. First, the vacuum station was assembled by the iSwap. Then, 450 µL acetonitrile containing 1% formic acid was added to each well of a Phree 96-well plate (Phenomenex, Maarsen, The Netherlands), which was placed on the shaker. Subsequently, 20 µL of internal standards mix was added, followed by 50 µL of sample. For each sample marked as a patient, this was done in triplicate (three separate wells). The 96-well plate was shaken for 2 min at 1000 rpm. After shaking, the iSwap moved the Phree plate to the vacuum station for filtration. A delta pressure of 600 mbar was applied for 5 min to separate the metabolite extract from the protein precipitate. After filtration, the iSwap moved the Phree plate back to the shaker, where 500 µL methanol + 1% formic acid was added to each well. The plate was shaken for 2 min at 800 rpm and moved back to the vacuum station for filtration (delta pressure 600 mbar for 5 min) to facilitate extraction of remaining metabolites. The collected filtrate was evaporated to dryness at 60 °C on a Porvair Ultravap (Porvair Sciences Limited, Norfolk, UK). The plate was then placed on the heated shaker position of the pipetting robot and 200 µL water/methanol (95:5) with 0.5% formic acid was added to each well, followed by shaking at 800 rpm for 2 min. Subsequently, 20 µL external standards mix was added to each well of a 350 µL microtiter plate and 130 µL of each sample extract was transferred to the microtiter plate. After shaking for 2 min at 200 rpm, the plate was sealed with a pre-slit PTFE cover (Thermo Scientific, Breda, The Netherlands) and placed in the autosampler of a Dionex Ultimate 3000 UHPLC chromatographic system (Thermo Fisher Scientific, Breda, The Netherlands). Sample preparation and the start of the UHPLC-Orbitrap-MS(/MS) analysis always took place on the same day.

UHPLC-Orbitrap-MS(/MS) Analysis
UHPLC-MS(/MS) analysis was performed using a Dionex Ultimate 3000 UHPLC chromatographic system combined with a Q Exactive Plus mass spectrometer fitted with a heated electrospray source operated in the positive or negative ion mode. The software interface was Xcalibur 4.0.27.42, SII 1.3 and MSTune 2.8 SP 1 (Thermo Fisher Scientific, Breda, The Netherlands).
UHPLC separation was performed on a Kinetex F5 2.6 µm 2.1 mm × 150 mm column equipped with an F5 guard column (Phenomenex, Maarsen, The Netherlands). The column was kept at 20 ± 0.1 °C during analysis. Mobile phase A was water with 0.5% formic acid, also containing 40 µg/L caffeine and 40 µg/L 5-bromo-uracil as lock mass compounds; mobile phase B was acetonitrile (ACN) with 1.0% formic acid. The injection volume for all separations was 3 µL. Chromatographic elution was achieved under gradient conditions with a flow rate of 400 µL/min. Elution started with an isocratic step of 1.03 min at 0% B, followed by linear gradients from 0% to 25% B (1.03-2.60 min), 25% to 35% B (2.60-5.70 min), and 35% to 95% B (5.70-7.78 min). The final conditions (95% B) were maintained for 3.59 min before returning to 0% B in 0.03 min, followed by equilibration at start conditions for 3.66 min. The total runtime was 15 min.
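As a cross-check of the gradient timings, the programme can be written as a table of breakpoints and interpolated linearly. The sketch below is illustrative only: the names `GRADIENT` and `percent_b` are assumptions, and the breakpoints after 7.78 min are derived by adding the stated step durations.

```python
# (time in min, %B) breakpoints taken from the gradient description
GRADIENT = [
    (0.00, 0.0),
    (1.03, 0.0),    # end of isocratic step
    (2.60, 25.0),
    (5.70, 35.0),
    (7.78, 95.0),
    (11.37, 95.0),  # 7.78 min + 3.59 min hold at 95% B
    (11.40, 0.0),   # return to 0% B in 0.03 min
    (15.06, 0.0),   # 3.66 min re-equilibration
]


def percent_b(t_min):
    """Linearly interpolate the %B composition at time t_min (minutes)."""
    if t_min <= GRADIENT[0][0]:
        return GRADIENT[0][1]
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return GRADIENT[-1][1]  # after the programme ends


total_runtime = GRADIENT[-1][0]
```

The step durations add up to 15.06 min, which agrees with the reported total runtime of 15 min after rounding.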
The Q Exactive Plus mass spectrometer was operated with a capillary voltage of -3.50 kV in the negative ionization mode and 3.50 kV in the positive ionization mode. The capillary temperature was set at 380 °C and the auxiliary gas temperature at 300 °C. The sheath gas pressure, auxiliary gas pressure and sweep gas flow rate were set at 60, 20 and 3 arbitrary units, respectively, with nitrogen gas. Detection was achieved in both ionization modes from 70 to 1050 m/z. To set the correct pre-scan frequency, chromatographic peak width (FWHM) was set at 3 s in all modes. In full scan mode, the resolution of the analyzer was set at 140,000 (m/∆m, FWHM @ 200 m/z), the maximum inject time at 100 ms and the AGC target at 3E6. In full scan-ddMS2 mode, the survey scan was obtained with a resolution of 70,000 (m/∆m, FWHM @ 200 m/z), a maximum inject time of 100 ms and an AGC target of 3E6. MS/MS scans were acquired with a resolution of 17,500 (m/∆m, FWHM @ […] [D3]-methylmalonic acid, [D2]-uridine, and [D8]-valine were monitored in the negative ion mode. The within-run CV of the peak area of all standards was not allowed to exceed 30%. Additionally, in the QC sample, the CVs of the peak areas of all annotated endogenous metabolites were calculated. The calculated CVs were then binned on retention time (increments of 1.0 min) and the median of each bin was taken.
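The binning of peak area CVs on retention time can be sketched as follows. This is a minimal illustration, not the authors' script: the function names and the example peak areas are invented, and the CV is assumed to be the sample standard deviation divided by the mean.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def cv_percent(areas):
    """Coefficient of variation (%) of repeated peak area measurements."""
    return 100.0 * stdev(areas) / mean(areas)


def median_cv_per_rt_bin(metabolites, bin_width=1.0):
    """Bin CVs on retention time and return the median CV of each bin.

    metabolites: iterable of (retention time in min, [peak areas in QC injections]).
    The returned dict maps the lower edge of each bin (min) to its median CV (%).
    """
    bins = defaultdict(list)
    for rt, areas in metabolites:
        bins[int(rt // bin_width)].append(cv_percent(areas))
    return {index * bin_width: median(cvs) for index, cvs in sorted(bins.items())}


# Invented example: three metabolites measured in three QC injections
qc_metabolites = [
    (0.4, [100, 110, 90]),
    (0.8, [200, 200, 200]),
    (1.5, [50, 55, 45]),
]
median_cvs = median_cv_per_rt_bin(qc_metabolites)
```

The same `cv_percent` helper could be used to check the 30% limit for the standards.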

Between-Batch Peak Area Variation
The median peak area of all internal and external standards within the measured QC samples was monitored. The between-batch variation was not allowed to exceed 30%.
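A possible implementation of this check is sketched below for a single standard. The text does not state how the between-batch variation was computed; the sketch assumes it is the CV of the per-batch median peak areas, and the batch names and peak areas are invented.

```python
from statistics import mean, median, stdev


def between_batch_cv(qc_areas_by_batch):
    """CV (%) of the per-batch median peak area of one standard.

    qc_areas_by_batch: {batch id: [peak areas of the standard in the QC injections]}.
    """
    batch_medians = [median(areas) for areas in qc_areas_by_batch.values()]
    return 100.0 * stdev(batch_medians) / mean(batch_medians)


# Invented example: one standard measured in the QC samples of three batches
example = {
    "batch_1": [98, 100, 102],
    "batch_2": [108, 110, 112],
    "batch_3": [88, 90, 92],
}
cv = between_batch_cv(example)
within_limit = cv <= 30.0  # acceptance limit from the text
```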

Data Processing
The raw data were imported into Progenesis QI v2.4 (Newcastle-upon-Tyne, UK). Progenesis QI performs alignment of the chromatograms, normalization, deisotoping, adduct deconvolution, peak picking and peak annotation. All settings used within Progenesis QI can be found in the supplementary materials (S7). Progenesis QI builds an aggregate of the features detected in all samples included in a batch, so any feature detected in one sample in a batch will be detected in all samples in that batch. Each batch contained different samples of patients with varying IEM. Therefore, the number of annotated metabolites varied per batch due to the presence of abnormal metabolites only detected in patient samples. Initially, HMDB 4.0 was used for compound annotation [29][30][31][32]. Compound ions were annotated by matching the retention time (max. ∆ RT: 0.15 min), isotope pattern similarity (>85%) and m/z value of a feature (max. ∆ ppm: 3) with an in-house database containing metabolites that are known biomarkers for IEM. At the time of writing, this database contains 757 entries of endogenous metabolites, of which 408 have a retention time validated by standards or plasma samples of IEM patients with established metabolite abnormalities. Known co-eluting isomeric compounds share one entry in the database. A selection of 339 metabolites relevant to IEM screening was used to annotate compound ions for routine diagnostic purposes (list available on request). MS/MS spectra were included when available to increase the confidence of the annotation; in that case, the dot product score had to be larger than 0.60. The confidence level of metabolite annotation was established according to the Metabolomics Standards Initiative (MSI) reporting standard [17].
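The annotation criteria can be summarised in a small matching function. The sketch below only illustrates the thresholds given in the text and does not reproduce the Progenesis QI implementation: `DbEntry`, `ppm_error`, `is_match` and the example entry are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbEntry:
    name: str
    mz: float      # reference m/z of the compound ion
    rt_min: float  # reference retention time (min)


def ppm_error(observed_mz, reference_mz):
    """Mass error of the observed m/z in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def is_match(feature_mz, feature_rt, isotope_similarity, entry, dot_product=None):
    """Apply the annotation thresholds given in the text to one feature."""
    if abs(ppm_error(feature_mz, entry.mz)) > 3.0:  # max. delta ppm: 3
        return False
    if abs(feature_rt - entry.rt_min) > 0.15:       # max. delta RT: 0.15 min
        return False
    if isotope_similarity <= 85.0:                  # similarity must be > 85%
        return False
    # The MS/MS criterion only applies when a spectrum is available.
    if dot_product is not None and dot_product <= 0.60:
        return False
    return True


# Hypothetical database entry and feature values
entry = DbEntry("example_metabolite", mz=200.0000, rt_min=3.00)
accepted = is_match(200.0004, 3.10, 92.0, entry)                 # 2 ppm, 0.10 min
rejected_mass = is_match(200.0010, 3.10, 92.0, entry)            # 5 ppm
rejected_msms = is_match(200.0004, 3.10, 92.0, entry, dot_product=0.40)
```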
After these processing steps, we obtained for every batch a matrix of abundances in which every element corresponded to one sample and one feature. Note that some abundances are the sum of the detected adducts and their isotopes. This matrix, together with qualitative data (e.g., ppm error, isotopic pattern and annotation), was exported as a CSV file and processed by our data pipeline. Within the pipeline, z-scores were calculated for each metabolite using 15 control samples originating from the same batch as the patient. The z-score was defined as the number of standard deviations a measured value was above or below the mean of the control group. Controls were matched to the patient on age and sex. First, the controls with the same sex were selected; then we determined the controls closest in age and defined an age cut-off based on the following equation, where age is in years:

$$\mathrm{age}_{\mathrm{patient}}^{0.95} - 0.5 \le \mathrm{age}_{\mathrm{control}} \le \mathrm{age}_{\mathrm{patient}}^{1.05} + 0.5$$

These cut-offs tend to be less strict with increasing age and have a small bias towards older reference samples. When fewer than 15 controls fulfilled these conditions, additional controls were chosen by their similarity in age (which might include the opposite sex). Note that, because of the limited number of controls in a batch, this matching was also limited.
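The control selection and z-score calculation can be sketched in Python. This is an illustration under stated assumptions, not the authors' pipeline: the cut-off is read as powers of the patient age (exponents 0.95 and 1.05), which is consistent with the stated bias towards older controls; the function names, the dictionary layout and the rule of taking the closest controls first are hypothetical.

```python
from statistics import mean, stdev


def age_window(age_patient):
    """Lower and upper age cut-off (years), reading the cut-off as powers of age."""
    return age_patient ** 0.95 - 0.5, age_patient ** 1.05 + 0.5


def select_controls(patient, controls, n=15):
    """Pick up to n controls; patient and controls are dicts with 'age' and 'sex'."""
    low, high = age_window(patient["age"])

    def age_distance(control):
        return abs(control["age"] - patient["age"])

    # Same sex and inside the age window, closest in age first
    matched = sorted(
        (c for c in controls
         if c["sex"] == patient["sex"] and low <= c["age"] <= high),
        key=age_distance,
    )
    if len(matched) < n:
        # Fill up by age similarity alone (may include the opposite sex)
        used = {id(c) for c in matched}
        rest = sorted((c for c in controls if id(c) not in used), key=age_distance)
        matched += rest[: n - len(matched)]
    return matched[:n]


def z_score(value, control_values):
    """Number of control standard deviations between value and the control mean."""
    return (value - mean(control_values)) / stdev(control_values)


# Invented example: a 10-year-old patient and 20 controls aged 1 to 20 years
patient = {"age": 10, "sex": "F"}
controls = [{"age": a, "sex": "F" if a % 2 == 0 else "M"} for a in range(1, 21)]
selected = select_controls(patient, controls)
low, high = age_window(10)
```

For a 10-year-old patient the window runs from about 8.4 to 11.7 years, so it extends slightly further above the patient age than below it.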
Technical triplicates of patient samples allowed us to gain insight into the technical variability of every metabolite. Excessive technical variability was detected using Welch's t-test, which was considered appropriate since the variance of the triplicate differed from the variance of the reference population (the 15 controls). We expected triplicates with a relatively large variance, but an average distant from the reference average, to have low p-values (acceptable measurement). Increasing the variance of the triplicate while fixing the distance to the reference average should lead to increasing p-values (unacceptable measurement). For p-values > 0.05, we considered the technical variability of the triplicate too large to rely on the inferred z-score (average of the triplicates).
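The behaviour described above can be reproduced with a small, dependency-free sketch of Welch's t-test. It is not the authors' implementation: the function names and the example values are invented, and the two-sided p-value is approximated by numerically integrating the t density with Simpson's rule rather than taken from a statistics library.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def welch_t_test(a, b, steps=2000):
    """Return (t, df, two-sided p) for Welch's unequal-variance t-test."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Simpson's rule (even number of steps) for the density from 0 to |t|
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3
    return t, df, max(0.0, 1.0 - 2.0 * central)


# Invented example: 15 controls and two triplicates with the same average
controls = [4.0 + 0.1 * i for i in range(15)]
tight_triplicate = [9.9, 10.0, 10.1]
noisy_triplicate = [2.0, 10.0, 18.0]

_, _, p_tight = welch_t_test(tight_triplicate, controls)
_, _, p_noisy = welch_t_test(noisy_triplicate, controls)
```

Both triplicates have the same average, but only the tight one gives a p-value below 0.05; the noisy triplicate would be flagged as too variable to rely on its z-score.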

Conclusions
In this paper we describe a novel untargeted metabolomics platform for screening IEM combining semi-automatic sample preparation with pentafluorophenylpropyl phase (PFPP)-based UHPLC-Orbitrap-MS. We demonstrate robust performance and show that our method differentiates metabolite patterns of many different IEM from those of controls.
Supplementary Materials: The following are available online at http://www.mdpi.com/2218-1989/9/12/289/s1, Table S1: Within-batch variation in retention time, Table S2: Between-batch variation in retention time, Table S3: Within-batch variation in peak area, Table S4: Between-batch variation in peak area, Figure S5: Peak area variation of all metabolites annotated in the QC sample, Figure S6: Separation of the isomeric pairs isoleucine/leucine and betaine/valine, Table S7: Progenesis QI settings, MS Excel file S8: Information on annotated features used for determination of the variation in peak area.