Urine Metabolites Enable Fast Detection of COVID-19 Using Mass Spectrometry

The COVID-19 pandemic boosted the development of diagnostic tests to meet patient needs and provide accurate, sensitive, and fast disease detection. Despite rapid advancements, limitations related to turnaround time, varying performance metrics due to different sampling sites, illness duration, co-infections, and the need for particular reagents still exist. As an alternative diagnostic test, we present urine analysis through flow-injection–tandem mass spectrometry (FIA-MS/MS) as a powerful approach for COVID-19 diagnosis, targeting the detection of amino acids and acylcarnitines. We adapted a method that is widely used for newborn screening tests on dried blood for urine samples in order to detect metabolites related to COVID-19 infection. We analyzed samples from 246 volunteers with diagnostic confirmation via PCR. Urine samples were self-collected, diluted, and analyzed with a run time of 4 min. A Lasso statistical classifier was built using 75/25% data for training/validation sets and achieved high diagnostic performances: 97/90% sensitivity, 95/100% specificity, and 95/97.2% accuracy. Additionally, we predicted on two withheld sets composed of suspected hospitalized/symptomatic COVID-19-PCR negative patients and patients out of the optimal time-frame collection for PCR diagnosis, with promising results. Altogether, we show that the benchmarked FIA-MS/MS method is promising for COVID-19 screening and diagnosis, and is also potentially useful after the peak viral load has passed.


Introduction
SARS-CoV-2 caused the worst pandemic in the last 100 years. Modern-day laboratory medicine was highly impacted by the need to implement new technologies, shortages of workforce and supplies, equipment overload, and regulatory changes, in addition to the emergence of new mutations [1][2][3]. Given the incessant demand for fast and accurate diagnosis, the critical role of clinical laboratory tests in human health has become apparent [4,5]. The increasing need for patient testing motivated many clinical laboratories to explore different methods for the collection [6][7][8][9][10], handling [11][12][13], and analysis [14][15][16][17][18][19] of samples, along with different specimen types [20,21].

Subjects
Self-collected urine samples from 246 volunteers were prospectively obtained from July to October 2020 at three medical centers in Bragança Paulista (SP, Brazil): the Santa Casa and Bragantino Hospitals and the Integrated Unit of Pharmacology and Gastroenterology (UNIFAG). No fasting guidelines were given to the volunteers prior to sample collection. Using convenience sampling, we recruited healthy volunteers and patients hospitalized with moderate or severe [52] symptoms after being admitted to the medical center. Patients older than 18 years with suspicion of COVID-19 were recruited according to the following eligibility criteria: hospitalized in the medical center, non-pregnant, and without mechanical ventilation or an indwelling catheter; patients facing imminent death were excluded. Healthy non-pregnant volunteers older than 18 years were selected if they declared no previous COVID-19 infection or close contact with infected people.
Institutional Review Board (IRB) approval was received for the study (protocol number 31573020.9.0000.5514, approved on 29 May 2020). Samples were collected from healthy volunteers (n = 104) and from hospitalized volunteers who presented symptoms similar to those found in COVID-19 infections (n = 142). All the healthy and symptomatic volunteers had their diagnoses confirmed by RT-PCR analysis of nasopharyngeal swab samples, performed for recruitment into the study or as part of their clinical care, using Brazilian-certified analysis services. RT-PCR was performed using a TaqPath COVID-19 RT-PCR IVD Kit (Thermo Fisher), and the results were interpreted using the COVID-19 Interpretative Software, according to the manufacturer's instructions, with a cycle threshold (Ct) value of <37. SARS-CoV-2 infection was confirmed for 99 hospitalized volunteers and ruled out for 43. Table 1 provides the patient demographic and clinical information. Patients or volunteers with inconclusive RT-PCR results were resampled or excluded.
Table 1. Clinical and demographic information of individuals recruited for the study, including SARS-CoV-2-negative non-hospitalized subjects (Neg-NH) and SARS-CoV-2-positive hospitalized subjects (Pos-H), used for model building and evaluation. Withheld Sets 1 and 2 containing symptomatic SARS-CoV-2-negative hospitalized subjects (Neg-H) are also shown.

Sample Preparation
Urine samples were heat-inactivated after collection (65 °C, 30 min) [53] in a Class II biological safety cabinet before being aliquoted and frozen until extraction. All the samples were thawed at room temperature. A pooled sample was prepared from equal parts (10 µL) of each sample and then aliquoted into different quality control (QC) samples, which were extracted and distributed every ten injections for instrumental monitoring. This resulted in 10 QC samples for system suitability and 28 QC samples for intra-batch monitoring. Samples (300 µL) were randomized and centrifuged (12,000 rpm, 4 °C, 10 min). Next, the supernatant (150 µL) was collected, followed by the addition of water (120 µL), acetonitrile (15 µL), and internal standard (IS) solution (15 µL of isovaleryl-DL-carnitine-(N,N,N-trimethyl-d9) hydrochloride solution at 11.1 ng mL⁻¹ in methanol). Blank samples were prepared using ultrapure water instead of urine.
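As a quick check of the dilution described above, the following Python sketch computes the final volume, the urine dilution factor, and the internal standard concentration in the prepared solution. It assumes the volumes are simply additive; the variable names are illustrative and are not part of the original protocol.

```python
# Volumes (µL) combined per sample, as listed in the sample preparation.
volumes_ul = {
    "urine_supernatant": 150.0,
    "water": 120.0,
    "acetonitrile": 15.0,
    "internal_standard": 15.0,
}
IS_STOCK_NG_PER_ML = 11.1  # IS stock concentration in methanol

# Assumption: volumes are additive (no contraction on mixing).
total_ul = sum(volumes_ul.values())
urine_dilution_factor = total_ul / volumes_ul["urine_supernatant"]
is_final_ng_per_ml = IS_STOCK_NG_PER_ML * volumes_ul["internal_standard"] / total_ul

print(f"final volume: {total_ul:.0f} µL")
print(f"urine dilution: {urine_dilution_factor:.1f}-fold")
print(f"IS in final solution: {is_final_ng_per_ml:.3f} ng/mL")
```

Under this assumption the urine is diluted two-fold and the internal standard twenty-fold relative to its stock.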

Flow Injection-Tandem MS Analysis
Data acquisition was performed on a Waters® Xevo TQD triple quadrupole mass spectrometer equipped with a Shimadzu® SCL-10A controller, a Shimadzu® LC-20AD pump controller, and a Shimadzu® SIL-20A automatic sampler injector. The methodology employed Flow Injection Analysis (FIA) without chromatographic separation, and 10 µL was used as the injection volume. The mobile phase was composed of water:acetonitrile:formic acid (80:20:0.1 v/v/v). A flow gradient was used, starting with zero flow until 0.5 min. We initially zeroed the flow rate to allow the integration of the entire peak, with no cuts due to the proximity to the y-axis. Afterward, the flow was raised from 0 to 0.5 mL min⁻¹ between 0.5 and 0.51 min, maintained until 3.50 min, and then decreased to 0.1 mL min⁻¹, for a total runtime of 4 min. Multiple reaction monitoring (MRM) transitions were optimized for each compound by analyzing labeled and unlabeled standards, as described in Supplementary Table S1 (ST1). The acquisition was controlled by the TargetLynx software (Waters).
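The flow program described above can be written as a piecewise function of time. The sketch below is only an illustration: the function name is hypothetical, the rise between 0.50 and 0.51 min is assumed to be linear, and the drop to 0.1 mL/min at 3.50 min is assumed to be a step, since the text does not specify either shape.

```python
def flow_rate_ml_per_min(t_min: float) -> float:
    """Pump flow rate (mL/min) at time t_min of the 4 min run."""
    if not 0.0 <= t_min <= 4.0:
        raise ValueError("time outside the 4 min run")
    if t_min <= 0.5:
        return 0.0  # zero flow until 0.5 min
    if t_min < 0.51:
        # Assumed linear ramp from 0 to 0.5 mL/min between 0.50 and 0.51 min.
        return 0.5 * (t_min - 0.5) / 0.01
    if t_min <= 3.5:
        return 0.5  # plateau until 3.50 min
    return 0.1  # assumed step down to 0.1 mL/min until the end of the run


for t in (0.25, 0.505, 2.0, 3.75):
    print(t, flow_rate_ml_per_min(t))
```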

Data Analysis and Statistical Classifiers
The ratio of the peak areas of the analytes and the IS was considered and processed using MetaboAnalyst 5.0 (http://www.metaboanalyst.ca) [54]. Calculations were made based on the relative peak area ratios of each analyte/IS across the different groups. Missing values were replaced by 1/5 of the minimal positive values of their corresponding variables. Relative standard deviation (RSD) was calculated for the intra-batch QC samples, and analytes with RSD > 25% were not considered for statistical modeling. Interquartile range filtering was applied in order to remove variables with near-constant values. Data were normalized by sum, followed by generalized logarithm transformation [55] and Pareto scaling. The resulting dataset was used for statistical analysis using the least absolute shrinkage and selection operator (Lasso).
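The preprocessing steps above (missing-value replacement, RSD filtering on QC samples, sum normalization, generalized log transformation, and Pareto scaling) can be sketched in plain Python as follows. This is a minimal illustration and not the MetaboAnalyst implementation: the function names, the toy data, and the glog constant `a` are assumptions, and the interquartile range filter is omitted for brevity.

```python
import math
import statistics


def impute_missing(rows):
    """Replace None by 1/5 of the smallest positive value of that analyte."""
    n_cols = len(rows[0])
    fills = []
    for j in range(n_cols):
        positives = [r[j] for r in rows if r[j] is not None and r[j] > 0]
        fills.append(min(positives) / 5)
    return [[fills[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def rsd_percent(values):
    """Relative standard deviation (%) = 100 * sample SD / mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


def keep_stable_columns(qc_rows, max_rsd=25.0):
    """Indices of analytes whose RSD across QC injections is <= max_rsd."""
    n_cols = len(qc_rows[0])
    return [j for j in range(n_cols)
            if rsd_percent([r[j] for r in qc_rows]) <= max_rsd]


def normalize_by_sum(rows):
    """Divide each sample by its own total signal."""
    return [[x / sum(r) for x in r] for r in rows]


def glog2(x, a):
    """Generalized log; close to log2(x) when x is much larger than a."""
    return math.log2((x + math.sqrt(x * x + a * a)) / 2)


def pareto_scale(rows):
    """Mean-center each analyte and divide by the square root of its SD."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    roots = [math.sqrt(statistics.stdev(c)) for c in cols]
    return [[(x - m) / s for x, m, s in zip(r, means, roots)] for r in rows]


# Toy peak-area ratios (analyte / IS); rows are samples, columns are analytes.
samples = [[1.0, 4.0, 0.5], [2.0, None, 0.7], [4.0, 8.0, 0.1]]
qc = [[2.0, 6.0, 0.2], [2.1, 6.3, 0.6], [1.9, 5.8, 0.4]]

filled = impute_missing(samples)
kept = keep_stable_columns(qc)
filtered = [[r[j] for j in kept] for r in filled]
normalized = normalize_by_sum(filtered)
a = min(x for r in normalized for x in r) / 10  # assumed glog constant
transformed = [[glog2(x, a) for x in r] for r in normalized]
scaled = pareto_scale(transformed)
print("kept analyte indices:", kept)
```

In this toy example the third analyte varies strongly across the QC injections and is dropped by the RSD filter before normalization.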
As hospitalized patients had their urine samples collected between 0 and 95 days after the swab collection for RT-PCR diagnosis, we used a time frame to select the patients used to build the statistical classifier. For this purpose, we considered time-qualified samples, defined as those from volunteers with an interval of two days or less between urine and swab collections and with symptom onset 14 days or less before urine collection. The classifier was built using 75% of the data from healthy non-hospitalized COVID-19 PCR-negative (n = 78, Neg-NH) and hospitalized COVID-19 PCR-positive (n = 32, Pos-H) patients. We validated the model with the remaining 25% of the data, composed of Pos-H (n = 10) and Neg-NH (n = 26) volunteers. Additionally, we tested the ability of this model to predict on a withheld sample set (Withheld Set 1) composed of suspected hospitalized/symptomatic COVID-19 PCR-negative (n = 24, Neg-H) patients. We also tested this classifier's prediction on samples that were excluded because they did not meet the selected time interval criteria for swab collection/symptom onset. This sample set (Withheld Set 2) was composed of Pos-H (n = 57) and suspected hospitalized/symptomatic Neg-H (n = 19) patients. Cutoff values for positivity definition were selected based on the receiver operating characteristic (ROC) curve for the training and validation sets. We evaluated the model's performance for the validation and test sets by measuring the predictive accuracy, sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV), which were all calculated based on the agreement with PCR diagnosis.
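For reference, the performance metrics listed above follow directly from the counts of true/false positives and negatives relative to the PCR result. The sketch below uses a hypothetical helper, `diagnostic_metrics`, and applies it to the validation-set counts reported in the Results (10 Pos-H samples with 1 classified as negative; 26 Neg-NH samples with none classified as positive).

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Agreement metrics against the reference test (here, RT-PCR)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Validation set: 10 Pos-H (1 called negative), 26 Neg-NH (none called positive).
validation = diagnostic_metrics(tp=9, fn=1, tn=26, fp=0)
for name, value in validation.items():
    print(f"{name}: {100 * value:.1f}%")
```

These counts reproduce the validation figures quoted in the text: 90% sensitivity, 100% specificity, 100% PPV, 96.3% NPV, and 97.2% accuracy.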
Univariate analysis was performed after data normalization using the Kruskal-Wallis test for the three groups (Pos-H, Neg-NH, and Neg-H), followed by Dunn's post hoc test, using the Benjamini-Hochberg (BH) correction for the p-value. Afterward, the Mann-Whitney test was used to examine differences between Pos-H vs. Neg-NH, Pos-H vs. Neg-H, and Neg-NH vs. Neg-H (25), followed by the BH correction of the p-value. The stability of the analytes to the heat-inactivation process was evaluated using RSD (Supplementary Table S2). Calculations were performed in R version 3.6.3 (R Foundation for Statistical Computing). Discriminant metabolic markers found by Lasso analysis were submitted to pathway enrichment analysis using metabolite set enrichment analysis (MSEA) via over-representation analysis on the MetaboAnalyst web platform [56]. Two metabolomics databases were interrogated, i.e., KEGG and the MSEA's disease-associated metabolite sets using urine as a reference (Supplementary Figure S1) [56,57].
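The Benjamini-Hochberg correction mentioned above can be illustrated with a short, dependency-free Python sketch. The authors performed their calculations in R; the function below is an illustrative re-implementation of the standard BH step-up adjustment, and the raw p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw p-values
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

Each raw p-value is multiplied by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw p-values increase.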

Results
Detection of 19 amino acids (such as alanine, leucine, glutamine, and tryptophan) and 15 acylcarnitines (such as free carnitine, malonyl-carnitine, and octadecanoyl-carnitine) was achieved from urine analysis, as presented in Supplementary Table S1 along with the relative standard deviation (RSD) measured for the QC samples (Supplementary Table S2). Although asparagine and aspartate were detected by our method, they were excluded from statistical analyses due to the higher variability measured in their peak area ratios (RSD > 25%, n = 28, ST2). Monitoring the labeled internal standard signal along the QC samples resulted in an RSD of 3.3% (n = 28 QC samples, Supplementary Table S2), showcasing the analytical stability of the method. Note that the heat inactivation process did not appear to alter the peak area of the analytes, as the RSD measured between heat-inactivated and non-inactivated samples was lower than 15% for the entire set of analytes (Supplementary Table S2). Twenty-nine metabolites were detected with metrics within the thresholds established for RSD and thermal stability (Supplementary Table S3); these were then used for statistical analysis. Figure 1 shows that high diagnostic performances were achieved using statistical analysis for the training and validation sample sets. Only 1 out of 32 Pos-H samples in the training set and 1 out of 10 Pos-H samples from the validation set were erroneously classified as negative, resulting in high sensitivity (97% and 90%) and negative predictive values (NPV) of 99.0% and 96.3% for the training and validation sets, respectively. Amongst negative samples, 4 out of 78 were misclassified in the training set. In contrast, none of the 26 samples were misclassified as positive in the validation set, resulting in positive predictive values (PPV) of 89.0% and 100.0% for the training and validation sets, respectively; specificities of 95% and 100% were noted for these sets.
The overall agreement with PCR was 95.0% and 97.2% for the training and validation sets, respectively. The cutoff value for classification was 0.181. The influence of age on the classifier's predictive performance was evaluated, and including it was found to improve the classification metrics only minimally (Supplementary Table S4). We therefore opted not to take this variable into account, with the goal of building a model that is independent of age, because we expect to adapt the model to different populations in the future. We observed that other studies also reported age and sex disparities in their sample sets, which is one of the disadvantages of using convenience sampling approaches. Dewulf et al. investigated a targeted urinary metabolic panel in 56 patients who were hospitalized with COVID-19 (26 non-critical and 30 critical), together with 16 healthy controls and 3 controls with proximal tubule dysfunction unrelated to SARS-CoV-2 [35]. Their control set comprised 31% men, while their positive set comprised 69-83% men. Thomas et al. also reported a divergence in age and sex when evaluating serum metabolites of patients with COVID-19 (n = 33, diagnosed by nucleic acid testing) compared with COVID-19-negative controls (n = 16). They reported 76% of subjects in the disease group as male, aged 56.5 ± 18.1 years old (mean ± standard deviation), and a control group comprising 38% men, aged 37.8 ± 11.6 years old [46]. Ling Yan et al. used the serum peptidome as the diagnostic matrix for COVID-19 [58]. The group infected by COVID-19 had an average age of 46.6 ± 14.9 and 47.2 ± 15.4 (training and validation sets), whereas the control group had an average age of 32.4 ± 11.4 and 29.6 ± 10.2 for the training and validation sets.
When the statistical classifier was used to predict the Withheld Set 1 (Supplementary Table S6A) [...]
The statistical classifier, built using the Lasso algorithm, was based on 14 predictive metabolites, which were given associated mathematical weights according to their relevance to each classifier class, as described in Figure 2. Some of the Lasso-selected variables also showed significant values in the univariate statistical analysis, such as fold change and adjusted p-value, as presented in Supplementary Table S5A. To visualize changes in metabolite abundance, including those in the Neg-H group, that are not accounted for by the binary Lasso model, we additionally performed a univariate analysis based on the Kruskal-Wallis test (Figure 3 and Supplementary Table S5B). We could not find any significant metabolic alteration when comparing Neg-H and Pos-H groups, which is in agreement with their similar clinical states. On the other hand, 13 of 14 metabolites indicated by the Lasso analysis were also altered between Neg-NH and Neg-H, evidencing how the metabolites are affected by hospitalization and clinical symptoms.
Figure caption (fragment): [...] If a p-value is less than 0.0001, it is flagged with four stars (****).
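To illustrate how a Lasso classifier can retain only a subset of variables (here, 14 metabolites) with associated weights, the sketch below fits an L1-penalized logistic regression by proximal gradient descent on toy data. It is not the authors' implementation: the function names, penalty values, learning rate, and data are all assumptions chosen for illustration.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """L1-penalized logistic regression fitted by proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability - label
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Toy data: feature 0 tracks the class, feature 1 carries no class information.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [1, 1, 0, 0]

w_mild, _ = fit_lasso_logistic(X, y, lam=0.1)
w_strong, _ = fit_lasso_logistic(X, y, lam=1.0)
print("mild penalty:", w_mild)
print("strong penalty:", w_strong)
```

With a mild penalty, only the informative feature keeps a nonzero weight; with a sufficiently strong penalty, every weight is shrunk to exactly zero. This behavior is what allows Lasso to act as a variable selector.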
To investigate the biological significance of the metabolites selected by our model and evaluate if the changes observed in the chemical patterns were correlated to biological processes involved in infections, we performed a metabolite enrichment analysis of the discriminatory analytes. This analysis resulted in seven significantly altered pathways (FDR < 0.05), as shown in Figure 4.
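Over-representation analysis is commonly based on a hypergeometric test, which asks how likely it is to observe at least a given number of query metabolites in a pathway by chance. The sketch below shows that calculation under this assumption; the function name and all the numbers are hypothetical and do not reproduce the enrichment results of this study.

```python
from math import comb


def ora_p_value(universe: int, set_size: int, n_query: int, n_hits: int) -> float:
    """P(X >= n_hits) when n_query metabolites are drawn from the universe."""
    upper = min(set_size, n_query)
    total = comb(universe, n_query)
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, n_query - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 4 of 14 query metabolites fall in a 20-member pathway
# drawn from a reference universe of 1000 metabolites.
p = ora_p_value(universe=1000, set_size=20, n_query=14, n_hits=4)
print(f"over-representation p-value: {p:.2e}")
```

In practice, the p-values obtained for all tested pathways would then be corrected for multiple testing (for example, to the FDR < 0.05 threshold used above).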

Discussion
The method we developed for COVID-19 diagnosis is an adaptation of the well-known and worldwide established newborn screening methodology based on selective MS/MS detection. Utilizing a cohort of 246 RT-PCR validated samples, we opted to build a classifier using samples selected based on rigorous criteria that ensured maximum viral load based on the proximity of the onset of symptoms and RT-PCR collection date. Using this approach, we showed that a panel of amino acids and acylcarnitines could be used to develop classification models that are highly sensitive (≥90%), specific (≥95%), and accurate (≥95%) for COVID-19 screening (Figure 1).
The reported performances of serological or antigen tests for diagnosis or confirmation of SARS-CoV-2 infection present sensitivities ranging from 21.8 to 97.9% (serological) and 34.1 to 96% (antigen), as recently reviewed by Bastos et al. [25] and Dinnes et al. [60]. These authors reviewed 104 studies, including 38 serological tests and 16 antigen tests applied to symptomatic volunteers, finding specificities ranging from 80.6 to 100% for serological tests and 34.1 to 96% for antigen tests. Böger et al. reviewed the performance of RT-PCR of nasopharyngeal specimens in four different studies and found 73.3% for sensitivity and 100% for specificity [61]. The method we introduced here uses a simple sample workup consisting of dilution and centrifugation. We developed the method to provide a short processing time, with a run time of 4 min and no chromatographic separation. Good sensitivity and specificity rates were found, as well as the ability to detect COVID-19 infection outside the "optimal detection window", as presented in Figure 1. Altogether, these results showcase the potential of FIA-MS/MS to be used as a screening technique or for time course follow-up. However, further studies for clinical validation should evaluate co-infection with other viruses, such as influenza, and include positive asymptomatic people and other virus variants.
The classification results obtained for the Withheld Set 1, composed of samples from suspected Neg-H patients, suggest that our classifier mainly reflects patient infection status, given the agreement with chest CT scan results (Figure 1). The chest CT scan is a fundamental tool for COVID-19 diagnosis and monitoring. However, it cannot differentiate between an active and a previous viral infection or, indeed, indicate the viral pathogen, resulting in lower specificity than RT-PCR for COVID-19 diagnosis [62][63][64][65][66]. For patients from Withheld Set 1, the negative result from the RT-PCR test was in disagreement with their clinical profile and chest CT scan findings in most cases (19 out of 24). Of the 19 Neg-H patients with a chest CT suggestive of viral infection, our model classified 17 as being positive for COVID-19. For example, patient #34 (see Supplementary Table S6A in Data Supplement), a 76-year-old male, received a negative RT-PCR result but was classified as positive by our classifier. The patient presented a chest CT scan that was suggestive of viral infection, with ground-glass opacity, consolidations, and pulmonary involvement (50%). The patient was in the intensive care unit (ICU) for 13 days, 11 of which required the use of mechanical ventilation, until death. As recognized by many studies [62][63][64][65], repeated PCR tests should be used for patients with an inconclusive diagnosis in order to more accurately diagnose COVID-19. Repeated PCR tests were not performed for the patients in our study, which could have resulted in false-negative diagnoses. The disagreement between RT-PCR and chest CT scan results for the Neg-H volunteers, together with the absence of a second-tier or confirmatory test for these individuals, motivated their exclusion from the training/validation sets and our decision to predict them separately as Withheld Set 1.
The effect of the time lapse between symptom onset and sample collection day was interrogated by analyzing the Withheld Set 2, which showed results similar to those acquired for the training and validation sets. The results suggest that the detected metabolic alterations enabled sample classification for patients who were assessed more than 14 days after symptom onset and more than two days after RT-PCR detection.

Conclusions
In conclusion, we showed that urine analysis, using an adaptation of the known method for newborn screening by FIA-MS/MS, is a promising methodology for COVID-19 screening and diagnosis, with the potential to be used even after the peak viral load passes. The non-invasive sample collection, the lack of need for specific primers, and the possibility of using existing laboratory resources to implement the methodology support the feasibility of fully validating the technique, including through multi-center trials, as already occurs for newborn screening programs. Our method also revealed substantial changes in the metabolome of infected patients and pointed out the relation of COVID-19 to other diseases, providing insights into the physiopathology of the disease. Importantly, our method uses urine, a non-invasive and self-collectible sample that would ease the collection procedure without overburdening medical staff. Urine has also been shown to contain dense and consistent biological information regarding COVID-19 infection.
Further advancements should focus on measuring the specificity of the method for samples that are obtained from patients presenting multiple pathogens, as well as its ability to detect COVID-19 in asymptomatic infected people or to distinguish COVID-19 infection from other critical diseases. Longitudinal experiments following the time course of the infection would also be valuable to better understand the metabolic changes in urine during different phases of the infection. The challenges faced in developing new alternatives for COVID-19 screening underscore the need to provide new methodological insights ahead of the next health security crisis.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/metabo12111056/s1. Table S1: Amino acids and acylcarnitines investigated in urine from COVID-19 patients and their experimental detection parameters. Table S2: Relative standard deviation of metabolites and internal standard for QC samples during batch analysis and after the heat inactivation test. Table S3: Amino acids and acylcarnitines selected after RSD and IQR filtering processes performed in MetaboAnalyst. Table S4: The effect of age and sex on the model's performance. Table S5A: Comparison of analytes between the groups (Pos-H and Neg-NH) of patients using the Mann-Whitney test. Table S5B: Comparison of analytes between the groups (Pos-H, Neg-NH, and Neg-H) of patients using the Kruskal-Wallis test and Dunn's test as a post hoc measure. Table S6A

Institutional Review Board Statement:
This study was approved by the Ethics Committee of USF (Research Ethics Committee approval CAAE# 31573020.9.0000.5514, 29 May 2020).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study before having their biological samples collected.

Data Availability Statement:
The data analyzed and generated in our work are available upon request from the corresponding author. The data are not publicly available due to patient confidentiality, participant privacy, and ethical restrictions.