Metabolomics Diagnosis of COVID-19 from Exhaled Breath Condensate

Infection from severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) can lead to severe respiratory tract damage and acute lung injury. Therefore, it is crucial to study breath-associated biofluids not only to investigate the breath’s biochemical changes caused by SARS-CoV-2 infection, but also to discover potential biomarkers for the development of new diagnostic tools. In the present study, we performed an untargeted metabolomics approach using a bidimensional gas chromatography mass spectrometer (GCxGC-TOFMS) on exhaled breath condensate (EBC) from COVID-19 patients and negative healthy subjects to identify new potential biomarkers for the noninvasive diagnosis and monitoring of the COVID-19 disease. The EBC analysis was further performed in patients with acute or acute-on-chronic cardiopulmonary edema (CPE) to assess the reliability of the identified biomarkers. Our findings demonstrated that an abundance of EBC fatty acids can be used to discriminate COVID-19 patients and that they may have a protective effect, thus suggesting their potential use as a preventive strategy against the infection.


Introduction
Infection from severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) can lead to severe respiratory tract damage and acute lung injury. Autopsy reports on COVID-19 patients demonstrated diffuse alveolar damage and a wider spectrum of histological lesions affecting both the epithelial and vascular components of the lung [1]. Previous studies showed several biochemical changes in the breath in acute respiratory distress syndrome (ARDS) [2][3][4], and in viral (influenza A) and bacterial (Streptococcus pyogenes) infections [5]. Moreover, it is well known that the oral-lung aspiration axis is a key factor that leads to many infectious diseases [6]. Therefore, it is critical to examine breath-associated biofluids not only to investigate breath-biochemical changes caused by SARS-CoV-2 infection, but also to discover potential biomarkers for the development of new diagnostic tests. Although the most reliable test to detect SARS-CoV-2 infection is real-time reverse-transcription polymerase chain reaction (RT-PCR) [7], new methods for the noninvasive diagnosis and monitoring of COVID-19 disease are needed. Computed tomography (CT) was proposed to evaluate the state of the lungs and virus progression, but radiologists concluded that, considering the pathognomonic imaging of COVID-19, CT cannot be used as a diagnostic tool.
As new diagnostic strategies are urgently needed to control the COVID-19 pandemic, the analysis of exhaled breath condensate (EBC) has been proposed for the identification of infected patients [8,9]. EBC, which is the liquid phase of the exhaled air sampled by cooling, represents a potential surrogate of the lung environment and of the lower respiratory tract of patients. It can be easily collected in a noninvasive manner, contains a variety of diluted, nonvolatile molecules (ranging from simple ions to DNA, leukotrienes, proteins, lipids, microbiota, etc.), and represents a promising source of biomarkers for both the diagnosis and monitoring of the disease. According to the first breath study conducted by Ruszkiewicz et al., patients with COVID-19 can be distinguished from those with other conditions during initial healthcare contact [10]. In addition, Berna et al. found that volatile organic compound (VOC) analysis performed on EBC can be used to identify pediatric patients with SARS-CoV-2 infection [11], while Grassin-Delyle et al. were able to discriminate between COVID-19 acute respiratory distress syndrome (ARDS) and non-COVID-19 ARDS patients who were invasively mechanically ventilated [12]. EBC analysis is frequently used to investigate respiratory diseases, infections, obstructive disorders, and lung cancer [13][14][15].
Although these studies show promising results, it is not yet clear how EBC reflects the presence of SARS-CoV-2 and whether its composition might be associated with the virus. In addition, an extended and untargeted metabolomics characterization of EBC is still missing, since only VOC analysis or real-time, online, proton-transfer-reaction time-of-flight mass spectrometry (PTR-MS) techniques have been used so far to analyze a limited number of molecules. The importance of extending the range of quantifiable molecules in total EBC is also supported by previous breath metabolite profiles obtained from other infections, including Mycobacterium tuberculosis (MT) and Aspergillus spp. [16], as well as pneumonia [17].
The aim of the present study was to perform an untargeted metabolomics approach using a bidimensional gas chromatography-mass spectrometer (GCxGC-TOFMS) on EBC from COVID-19 patients and negative healthy controls to identify new potential biomarkers for noninvasive diagnosis, monitoring, and clinical outcome prediction of COVID-19 pneumonia. The EBC analysis was further performed on patients with acute or acute-on-chronic cardiopulmonary edema (CPE) to determine the reliability of identified biomarkers, while healthy ex-COVID-19 subjects, negative for SARS-CoV-2 at the time of EBC collection, were analyzed to explore whether past infection with SARS-CoV-2 resulted in long-term changes in breath biochemistry.

EBC Collection and Clinical Patients' Characteristics
An untargeted metabolomics analysis was performed on exhaled breath condensate from COVID-19 patients, healthy volunteers, and CPE patients in order to investigate breath-biochemical alterations associated with viral infection and to identify new potential biomarkers (Figure 1). Table 1 summarizes the clinical features and demographics.

EBC Metabolome Is Influenced by SARS-CoV-2 Infection
From EBC analysis, 322 small molecules and some potential biomarkers were quantified, along with a unique signature associated with the SARS-CoV-2 infection. We first investigated the differences between COVID-19 patients and healthy subjects. As reported in the 3D score plot in Figure 2a, partial least squares discriminant analysis (PLS-DA) clearly shows the presence of a metabolomics profile associated with the infection. The most predictive or discriminative features that are potentially useful in helping sample classification were also determined through the variable importance in projection (VIP) score. The VIP score summarizes the most prominent molecules responsible for the reported phenotypic variances in COVID-19 EBC (Figure 2b).
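For reference, the VIP score is commonly defined as shown below. The text does not state which formulation or software default was used, so the notation here (K variables, A latent components, weights w, explained sums of squares SS) is an assumption rather than the authors' exact definition.

```latex
\mathrm{VIP}_j \;=\; \sqrt{\, K \,
  \frac{\displaystyle\sum_{a=1}^{A} \mathrm{SS}_a
        \left( \frac{w_{ja}}{\lVert \mathbf{w}_a \rVert} \right)^{2}}
       {\displaystyle\sum_{a=1}^{A} \mathrm{SS}_a} }
% K      : number of variables (metabolite features) in the PLS-DA model
% A      : number of latent components retained
% w_{ja} : PLS weight of variable j on component a
% SS_a   : sum of squares of the class response explained by component a
```

Under this definition the squared VIP scores average to 1 across the K variables, which is why a VIP above 1 is often taken as a rule-of-thumb cutoff for an influential variable.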

Potential EBC Biomarkers of SARS-CoV-2 Infection
We then performed univariate statistical analysis on small molecule abundances. A total of twenty-six metabolites (Supplementary Table S1) were differentially expressed in EBC samples (p-value < 0.05 and fold change >1.3), as reported in the volcano plot in Figure 3b. Additionally, we performed hierarchical clustering heat-map analysis to better visualize the abundance of regulated metabolites and EBC sample grouping (Figure 3a).
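The selection rule behind the volcano plot can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name and inputs are hypothetical, the p-value is assumed to come from a separate univariate test, and the fold-change cutoff is assumed to apply symmetrically (above 1.3 or below 1/1.3), which the text does not state explicitly.

```python
import math

FC_THRESHOLD = 1.3   # fold-change cutoff stated in the text
P_THRESHOLD = 0.05   # p-value cutoff stated in the text

def volcano_point(mean_covid, mean_healthy, p_value):
    """Return (log2 fold change, -log10 p, passes both cutoffs) for one metabolite.

    Assumption: the fold-change cutoff is symmetric, so a metabolite passes
    when |log2 FC| > log2(1.3), whether it is higher or lower in COVID-19.
    """
    log2_fc = math.log2(mean_covid / mean_healthy)
    neg_log10_p = -math.log10(p_value)
    significant = (p_value < P_THRESHOLD
                   and abs(log2_fc) > math.log2(FC_THRESHOLD))
    return log2_fc, neg_log10_p, significant

# Hypothetical abundances: a metabolite halved in COVID-19 EBC with p = 0.01
x, y, keep = volcano_point(mean_covid=1.0, mean_healthy=2.0, p_value=0.01)
```

The two returned coordinates are the usual x and y axes of a volcano plot, and the boolean marks the points that would fall outside both threshold lines.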

Validation of EBC Biomarkers with CPE and Machine Learning
We further investigated whether the identified biomarkers were able to distinguish COVID-19 patients from patients with CPE. The best previously selected biomarkers were not statistically different between CPE and COVID-19 patients; however, using the complete chemical fingerprints of small molecules and machine learning, we were able to discriminate the two groups of samples. We built a genetic algorithm machine learning model based on metabolomic data from 17 COVID-19 patients, 7 CPE subjects, and 16 healthy subjects that were randomly selected from our cohort and grouped based on SARS-CoV-2 positivity. The model was then externally validated on the remaining 17 subjects (COVID-19 n = 9; healthy n = 4; and CPE n = 4), reaching an average area under the curve of 0.98 (80 iterations), with a classification accuracy of 100% in almost all iterations. Interestingly, the 20 most important features selected by the algorithm to perform the classification also included the best biomarkers obtained through univariate analysis: monomyristin, monolaurin, heptadecanoic acid-glycerine-(1)-monoester, nonadecanoic acid-glycerine-(1)-monoester, pentadecanoic acid-glycerine-(1)-monoester, dihydroxypropyl icosanoate, 2-tert-butyl-4-ethylphenol, and monostearin. These results suggest that this approach can be used to discriminate COVID-19-positive patients not only from healthy controls but also from other patients with respiratory diseases, such as CPE.
Additionally, since the healthy group included eight subjects who had been infected in the past by SARS-CoV-2, we explored whether the SARS-CoV-2 infection caused long-term changes in breath biochemistry. We found no significant differences in the levels of SARS-CoV-2-breath-associated molecules between healthy subjects who had contracted COVID-19 and healthy controls who were never infected. This could be explained by the fact that healthy subjects who contracted COVID-19 had mild or no symptoms, with limited or no involvement of the respiratory tract.
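To make the reported metric concrete, the area under the ROC curve for one validation run can be computed as the fraction of positive/negative pairs that the model ranks correctly. The sketch below is illustrative only: the function name and the scores are hypothetical, and only the group sizes (9 COVID-19 versus 4 healthy plus 4 CPE) come from the text.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Group sizes from the validation cohort: 9 SARS-CoV-2 positive, 8 negative.
labels = [1] * 9 + [0] * 8
# Hypothetical classifier scores, invented for illustration.
scores = [0.95, 0.90, 0.88, 0.85, 0.80, 0.78, 0.70, 0.65, 0.40,
          0.45, 0.30, 0.25, 0.20, 0.15, 0.10, 0.08, 0.05]

auc = roc_auc(labels, scores)  # 71 of the 72 positive/negative pairs are ranked correctly
```

Averaging this value over repeated runs (80 in the text) gives a figure comparable to the reported mean AUC.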

Discussion
Although the analysis of EBC has already been proposed for the identification of SARS-CoV-2-infected patients, the few studies reported in the literature are limited to a small number of patients or to a specific subset, such as pediatric patients [9], to the analysis of volatile molecules [10][11][12], or to instruments with a limited range of analyzed molecules, such as GC-IMS [11] or PTR-MS [12].
Our results support the ability of the EBC-metabolomic approach to deeply investigate COVID-19 disease. Although most of the identified compounds were found at higher concentrations in healthy subjects, the comparison of EBC molecule abundances shows the presence of 26 metabolites that are significantly different between healthy subjects and COVID-19 patients. Interestingly, eight of these twenty-six molecules were monoglycerides of fatty acids. It is well known that medium-chain fatty acids (MCFAs), long-chain fatty acids (LCFAs), and monoglycerides of fatty acids have antiviral activity and provide protection against viruses [19]. Among the best eight biomarkers identified, it is worth noting the presence of three saturated monoacylglycerols (monopalmitin, monomyristin, and monolaurin), which are downregulated in COVID-19 patients. These lipids are well known for their antiviral properties and their ability to prevent intestinal coronavirus infections [20]. Both monomyristin and monopalmitin dysregulation in EBC have already been associated with an increased risk of lung cancer or infection susceptibility [21,22], and a significant decrease in monolaurin levels can be found in the serum of patients shortly before developing COVID-19 [23].
According to our metabolomics results, higher levels of monolaurin were found in the EBC of healthy controls compared with COVID-19 patients. A study investigating methods to reduce African swine fever virus infectivity showed that the addition of medium-chain fatty acids and monoacylglycerol laurate to water and to pig feed resulted in a reduction in infectivity [24]. Furthermore, Lerner et al. evaluated the effect of MCFAs in pig feed to prevent the spread of porcine epidemic diarrhea virus (PEDV), and found that the addition of medium-chain fatty acids to feed could decrease the detection of PEDV in food, thus reiterating the antiviral activity of medium-chain fatty acids [25]. Other fatty acids and fatty alcohols with potential viricidal effects were also tested on respiratory syncytial virus (RSV) cell lines, showing that their addition in food could protect infants against infection [26]. Interestingly, research focusing on rat bronchoalveolar lavage fluid discovered that the antibacterial activity of the bronchial mucus was attributable to a higher presence of free fatty acids [27].
Finally, the levels of the dicarboxylic pimelic acid in COVID-19 patients were lower than in healthy subjects, while 2,3-dihydroxy-2-methylpropanoic acid showed the opposite trend, being found at higher concentrations in the EBC from COVID-19 subjects (Supplementary Table S1). These molecules may be involved in metabolic processes associated with SARS-CoV-2 infection and disease development. On the other hand, research performed on NMR spectra of EBC from cats affected by asthma demonstrated that levels of several carboxylic and dicarboxylic acids, such as pimelic acid and suberic acid, could be used to distinguish healthy cats from cats with asthma [28].
The validation of potential biomarkers was performed on patients with CPE, which is a potentially fatal cause of acute respiratory failure and is characterized by symptoms similar to COVID-19. Although the biomarkers identified in the comparison between COVID-19 and healthy subjects were not completely suitable for discriminating CPE from COVID-19, the use of machine learning allowed for the correct classification of patients based on EBC metabolome, suggesting that this approach can be used to discriminate COVID-19-positive patients not only from healthy subjects but also from other patients with pulmonary diseases such as CPE.
Compared to previous research performed on adults [10,12] and children [11], we did not identify the markers previously found to be associated with COVID-19, including acetone, 2-butanone, octanal, heptanal, and nonanal. This could be mainly due to: (i) the different sample preparation methods and, in particular, to the different derivatization processes that, in our case, focused on diverse classes of molecules, although a GCxGC-MS instrument was used; (ii) different instrumentation, as the other authors used GC-IMS and PTR-MS; (iii) different patients' immune responses, which not only depend on the age of the subject, but also on the severity and the progression of the disease.
The last striking result we obtained from this study is that we did not identify long-term changes in breath biochemistry from ex-COVID-19 subjects who had mild or no symptoms. The levels of SARS-CoV-2-breath-associated molecules were no different between healthy subjects who had contracted COVID-19 and healthy subjects who were never infected.
In conclusion, EBC contains different diluted molecules that might represent a promising source of biomarkers of SARS-CoV-2 infection and shows potential for use for earlier therapeutic intervention. Although previous studies have shown that breath analysis could be used to detect SARS-CoV-2-positive subjects, the identified biomarkers were mainly VOCs, alkanes, and volatile molecules. Our data demonstrate that an abundance of fatty acids can be used to discriminate COVID-19 patients and that they might have a protective role, thereby suggesting their potential use as a preventive strategy against infection. Although this is the largest untargeted GCxGC-MS research study in the literature to date, there are some limitations: the average age of COVID-19 patients in our cohort was higher than that of the healthy population, although the CPE patients were more similar in age; based on sample preparation, our analysis mainly focused on a few classes of molecules; and a larger validation would be required to confirm our results.

Patients
The study was performed at the university hospital "Maggiore della Carità" in Novara, Italy, between May 2020 and May 2021, according to the principles outlined in the Declaration of Helsinki. The protocol was approved by the Institutional Review Board (Comitato Etico Interaziendale Novara, protocol No. CE 116/20) and written informed consent was obtained from all subjects according to the Italian regulations. We enrolled 26 patients, named the COVID-19 group, with the following inclusion criteria: (1) SARS-CoV-2 positivity detected via reverse-transcription PCR (RT-PCR) from nasopharyngeal swab; (2) bilateral COVID-19 pneumonia; (3) age ≥18 years; (4) respiratory distress treated with low-flow oxygen therapy; (5) ability to cooperate and breathe through the mouthpiece of the condenser (Turbo DECCS System, Medivac, Parma, Italy). A cohort of 20 healthy volunteers, named the healthy volunteer group, with negative SARS-CoV-2 RT-PCR from nasopharyngeal swab, was considered the control group. Furthermore, 11 patients who had a negative SARS-CoV-2 RT-PCR from nasopharyngeal swab but were hospitalized with dyspneal symptoms due to acute or acute-on-chronic cardiopulmonary edema (CPE) were enrolled as a validation cohort. The sample size was calculated based on previous experiments on EBC from other laboratories; these analyses provided useful data on which to base an estimation of the variability.

Measurements
Demographic characteristics including height and weight, blood sample exams (white blood cell count, lymphocyte count, lactate dehydrogenase, D-dimer, ferritin, platelet count) and American Thoracic Society score [18] performed at hospital entrance, partial pressure of oxygen to fraction of inspired oxygen ratio (PaO2/FiO2) and respiratory rate performed on the day of EBC collection, volume of EBC collected, and coexisting comorbidities were recorded. Furthermore, hospital length of stay was registered.

EBC Collection
Patients were instructed to breathe through the mouthpiece. Once the Turbo DECCS System (Medivac, Parma, Italy) was cooled to −4 °C, EBC was collected for 10 min. Samples were aliquoted and stored at −80 °C before analysis by bidimensional gas chromatography mass spectrometry.

Sample Preparation
A mixture of ACN/IPA/water (3:3:2) solution (1 mL) with tridecanoic acid as internal standard (1 ppm) was added to 500 µL of EBC and then vortexed. The sample was centrifuged at room temperature at 14,500 × g for 15 min. The supernatant was dried using a speed-vacuum system. The derivatization was performed using methoximation (20 µL of methoxamine, 80 °C, 20 min) and BSTFA (50 µL, 80 °C for 20 min). Finally, an internal standard (5.55 µL of hexadecane) was added before the analysis.

GCxGC-TOFMS Analysis
The analyses of the samples were performed using a GCxGC-TOFMS with a dual-stage, quad-jet modulator. The MS instrument was a LECO Pegasus BT 4D (Leco Corp., St. Joseph, MI, USA). As a first-dimension column, we used a 30 m Rxi-5Sil MS (Restek Corp., Bellefonte, PA, USA) capillary column (internal diameter = 0.25 mm) with a stationary phase film thickness of 0.25 µm, while the second-dimension chromatographic column was a 2 m Rxi-17Sil MS (Restek Corp., Bellefonte, PA, USA) with an internal diameter of 0.25 mm and a film thickness of 0.25 µm. For the carrier gas, we used high-purity helium (99.9999%) at a flow rate of 1.4 mL/min. A sample of 1 µL was injected in splitless mode with the inlet at 250 °C. The temperature program was as follows: (a) initial temperature at 70 °C for 2 min; (b) ramped 6 °C/min up to 160 °C, 10 °C/min up to 240 °C, 20 °C/min up to 300 °C; (c) held at this value for 6 min. The secondary column was maintained at +5 °C relative to the GC oven temperature of the first column. The programming rate was the same for both columns. Electron impact ionization was applied at 70 eV. The ion source temperature was set at 250 °C, and the mass range was 25 to 550 m/z with an extraction frequency of 32 kHz for the bidimensional and 30 kHz for the monodimensional analysis. The acquisition rate was 200 spectra/s for 2D analysis. The modulation period for the bidimensional analysis was maintained at 4 s for the entire run. The modulator temperature offset was set at +15 °C relative to the secondary oven temperature, while the transfer line was set at 280 °C [29][30][31].
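As a quick check on the oven program, the run length and the number of modulation cycles follow directly from the stated settings. The sketch below uses only values given in the text; the variable and function names are illustrative, and the result excludes any solvent delay or oven cool-down, which the text does not report.

```python
# Oven program as stated in the text
INITIAL_TEMP_C = 70
INITIAL_HOLD_MIN = 2
RAMPS = [(6, 160), (10, 240), (20, 300)]  # (rate in °C/min, target in °C)
FINAL_HOLD_MIN = 6

# Modulator and detector settings as stated in the text
MODULATION_PERIOD_S = 4
ACQUISITION_RATE_HZ = 200  # spectra per second in 2D mode

def run_time_min():
    """Total oven program time: initial hold + each ramp + final hold."""
    total = INITIAL_HOLD_MIN
    temp = INITIAL_TEMP_C
    for rate, target in RAMPS:
        total += (target - temp) / rate  # minutes spent on this ramp
        temp = target
    return total + FINAL_HOLD_MIN

total_min = run_time_min()                                        # 34.0 min
modulations = total_min * 60 / MODULATION_PERIOD_S                # 510.0 cycles
spectra_per_modulation = MODULATION_PERIOD_S * ACQUISITION_RATE_HZ  # 800 spectra
```

The three ramps take 15, 8, and 3 min, so with the two holds the program lasts 34 min, which corresponds to 510 second-dimension separations of 4 s each.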

Statistical and Data Analysis
Patient data are expressed as medians and interquartile ranges, or absolute numbers and percentages. The chromatograms were acquired in total ion current (TIC) mode. Peaks with a signal-to-noise (S/N) ratio lower than 500.0 were rejected. ChromaTOF version 5.31 was used for raw data processing. Mass spectral assignment was performed by matching with NIST MS Search 2.3 libraries and the FiehnLib. The raw files were aligned with Statistical Compare version 4.74. The statistical analysis, the PLS-DA, and all processing were performed with MATLAB R2017 (The MathWorks Inc., Natick, MA, USA) [32], Metaboanalyst software 5.0, and GraphPad v. 7.
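The peak-rejection rule is simple enough to state as code. In the study this step is handled within the vendor software, so the following is only an illustration of the rule; the peak records, field names, and function name are hypothetical.

```python
SN_THRESHOLD = 500.0  # S/N cutoff stated in the text

def filter_peaks(peaks, threshold=SN_THRESHOLD):
    """Keep peaks whose S/N is at least the threshold; lower values are rejected."""
    return [p for p in peaks if p["sn"] >= threshold]

# Hypothetical peak table, invented for illustration
peaks = [
    {"name": "peak_A", "sn": 1250.0},
    {"name": "peak_B", "sn": 499.9},
    {"name": "peak_C", "sn": 500.0},
]
kept = filter_peaks(peaks)  # peak_A and peak_C pass; peak_B is rejected
```

A peak exactly at the threshold is kept here because the text rejects only peaks with S/N lower than 500.0.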

Machine Learning Analysis
Statistical analysis was supported by machine learning algorithms. We randomly divided the samples into two cohorts composed of 40 (training) and 17 (validation) subjects. From the training cohort, we selected important metabolite features with an information gain ratio feature selection algorithm, and we employed a genetic algorithm as the classifier algorithm. In the genetic algorithm analysis, the parameters were set to 150 iterations, a population size of 10, a crossover rate of 0.8, and a mutation rate of 0.1. Algorithms were built using R package caret (version 4.6.14) with 3-fold cross-validation repeated 5 times. The entire framework was repeated 80 times, and the average accuracy was calculated. The selected important features were then used for the genetic algorithm analysis on the independent test cohort (17 subjects).
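To show where the stated hyperparameters enter, here is a minimal generic genetic algorithm loop in Python. It is a sketch under stated assumptions, not the authors' R implementation: the text does not describe the encoding, selection scheme, or fitness function, so the bit-mask encoding, tournament selection, elitism, and toy fitness below are all hypothetical choices made for illustration.

```python
import random

def run_ga(fitness, n_genes, iterations=150, pop_size=10,
           crossover=0.8, mutation=0.1, seed=0):
    """Bit-string GA with elitism; returns (best_mask, best_score, history).

    Defaults mirror the values in the text (150 iterations, population 10,
    crossover 0.8, mutation 0.1). Requires n_genes >= 2.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(iterations):
        elite = max(pop, key=fitness)
        history.append(fitness(elite))
        next_pop = [elite[:]]  # elitism: the current best survives unchanged
        while len(next_pop) < pop_size:
            # Tournament selection (assumed): best of two random individuals
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            child = p1[:]
            if rng.random() < crossover:      # single-point crossover
                cut = rng.randrange(1, n_genes)
                child = p1[:cut] + p2[cut:]
            # Flip each bit independently with the mutation probability
            child = [1 - g if rng.random() < mutation else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best), history

# Toy fitness (hypothetical): agreement with an arbitrary target mask over 20 features
TARGET = [1, 0] * 10

def toy_fitness(mask):
    return sum(int(m == t) for m, t in zip(mask, TARGET))

best, score, history = run_ga(toy_fitness, n_genes=20)
```

In a real analysis the toy fitness would be replaced by a cross-validated performance measure computed on the training cohort only, so that the 17 validation subjects stay untouched until the final evaluation.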