Integrative Plasma Metabolic and Lipidomic Modelling of SARS-CoV-2 Infection in Relation to Clinical Severity and Early Mortality Prediction

An integrative multi-modal metabolic phenotyping model was developed to assess the systemic plasma sequelae of COVID-19 in patients with SARS-CoV-2 infection (confirmed by rRT-PCR) and different levels of respiratory severity. Plasma samples from 306 unvaccinated COVID-19 patients were collected in 2020 and classified into four severity levels ranging from mild symptoms to severe ventilated cases. These samples were analyzed using a combination of quantitative Nuclear Magnetic Resonance (NMR) spectroscopy and Mass Spectrometry (MS) platforms, giving broad coverage of lipoproteins, lipids, amino acids, the tryptophan-kynurenine pathway, and biogenic amines. All platforms revealed highly significant differences in metabolite patterns between patients and controls (n = 89) whose samples had been collected before the COVID-19 pandemic. The total number of significant metabolites increased with severity, and 344 of the 1034 quantitative variables were common to all severity classes. Metabolic signatures showed a continuum of changes across the respiratory severity levels, with the most significant and extensive changes in the most severely affected patients. Even patients with mild respiratory symptoms showed multiple highly significant abnormal biochemical signatures reflecting serious metabolic deficiencies of the type observed in patients with Post-acute COVID-19 syndrome. The patients with the most severe respiratory disease had a high mortality (56.1%), and we found that mortality in this sub-group could be predicted with high accuracy, in some cases up to 61 days before death, using a separate metabolic model that highlighted a different set of metabolites from those defining the basic disease.
Specifically, hexosylceramides (HCER 16:0, HCER 20:0, HCER 24:1, HCER 26:0, HCER 26:1) were markedly elevated in the non-surviving patient group (Cliff’s delta 0.91–0.95), and two ether-linked phosphatidylethanolamines (PE.O 18:0/18:1, Cliff’s delta = −0.98; PE.P 16:0/18:1, Cliff’s delta = −0.93) were markedly lower in the non-survivors. These results indicate that patients' morbidity-to-mortality trajectories are determined relatively soon after infection, opening the opportunity to select more intensive therapeutic interventions for these “high risk” patients in the early stages of disease.


Introduction
The coronavirus disease 2019 (COVID-19) pandemic continues to present global challenges to individuals, health systems, and economies, and the long-term consequences of the disease are poorly understood. Our ability to develop effective therapeutic management strategies remains reliant on an improved understanding of the pathogenic mechanisms associated with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection [1,2]. Spectroscopic measurement of the metabolic consequences of human disease has proved to be a powerful tool for exploring existing clinical challenges and can also readily be applied to help understand emergent diseases such as COVID-19 [1].
COVID-19 has spread worldwide, infecting over 767 million people and killing more than 6.9 million as of July 2023. We have previously shown that, upon SARS-CoV-2 infection, a metabolic "phenoconversion" from healthy through different stages of infection is readily detectable by both Nuclear Magnetic Resonance (NMR) spectroscopy and mass spectrometry [1][3][4][5][6], impacting multiple metabolic pathways and organ systems. To understand the acute and long-term effects of COVID-19, we and others have compared the plasma of COVID-19 patients with that of healthy controls using a variety of analytical platforms [7][8][9]. In addition to damaging the respiratory system, SARS-CoV-2 infection affects multiple organs [10][11][12][13], creating a continuum of emergent metabolic phenotypes, some of which appear to relate to respiratory severity. In a meta-analysis of 57 studies, more than 50% of previously hospitalized SARS-CoV-2 survivors were found to have persistent postacute pathological sequelae, including neurologic disorders, general functional impairment, fatigue, and cardiac abnormalities [14]. At the molecular level, infection-related signatures have been found across a range of molecular groups and pathways, some of which persist for several months post-infection. For instance, perturbations in glutamine, glutamate, and taurine indicate disruption of hepatic metabolism, elevated levels of α1-acid glycoprotein (GlycA) are associated with inflammation, and disrupted tryptophan metabolism in some individuals may relate to a neurological impact [2,3,15]. Given the potential socioeconomic impact of SARS-CoV-2 infection, it is important to understand whether the respiratory severity of the acute infection is indicative of downstream impact on other organs and systems.
If the metabolic dysregulation associated with Post-Acute COVID-19 Syndrome (PACS) [2,16,17] is not closely related to the severity of the respiratory infection, then follow-up and monitoring of patients who experienced mild symptoms may be as important as that of patients who experienced severe respiratory symptoms in the acute phase of infection, particularly if the biochemical profile indicates the involvement of multiple systems or organs.
The aim of this study was to comprehensively map the metabolic signature of each severity class to determine whether a patient with SARS-CoV-2 infection who experiences mild respiratory symptoms is metabolically distinct from a patient with severe symptoms. To do so, we used a wide range of quantitative parameters (n = 1034) derived from targeted profiling with a combination of NMR spectroscopy and ultra-performance liquid chromatography coupled to high-resolution mass spectrometry (UPLC-MS), based on prior knowledge of disrupted molecular pathways [5,6,18,19]. Here, we explore further metabolic data from a previously reported Spanish COVID-19 cohort [5,20] to measure the impact of respiratory disease severity on the systemic metabolic signatures. SARS-CoV-2 participants were stratified into four classes ranging from mild respiratory symptoms to hospitalization with severe respiratory symptoms requiring ventilation in ICU. Serum from a further group of participants, collected before the pandemic, was included as non-infected controls. This system for stratifying severity has been widely used in studies on the effects of COVID-19 [21,22]. In the most severely affected category, a mortality rate of 56.1% due to an immunological cytokine storm was observed, consistent with other similar studies [23][24][25][26]. We derived a predictive model of mortality for the most severe class using multiple combined panels of molecules to achieve a broad phenotype of SARS-CoV-2 infection.

SARS-CoV-2 Infection Induces Reproducible Metabolic and Lipidomic Consequences across Severity Classes Reflective of Systemic Multi-Organ Effects
As reported in numerous articles, SARS-CoV-2 infection disrupts multiple serum metabolites, reflecting dysregulation of pulmonary, cardiovascular, hepatic, and neurological processes [27]. Many of these pathways are immunologically driven via complex cytokine fluctuations [4,6]. When the integrated metabolite panel from the combined NMR and UPLC-MS assays was used to compare infected with non-infected individuals, regardless of respiratory severity class, the differential molecular signature of SARS-CoV-2 infection included reduced levels of phosphatidylcholines, phosphatidylethanolamines, lysophosphatidylcholines, hexosylceramides, glutamine, Fischer's ratio (sum of the branched chain amino acids/sum of the aromatic amino acids), histidine, high density lipoprotein parameters, and the lactate:pyruvate ratio, and higher levels of ABA1 (the apolipoprotein B100/A1 ratio), LDL triglycerides, formate, pyruvate, phenylalanine, glutamate, aspartate, the neopterin:tryptophan ratio, and the (aspartic acid + glutamic acid)/(asparagine + glutamine) ratio (Figure 1). This profile is consistent with previous studies [28][29][30], including our earlier analysis of a subset of this study containing 75 SARS-CoV-2 positive patients [20], and reflects the multi-organ impact of SARS-CoV-2 infection, with differentially altered parameters indicating increased cardiovascular risk [31] (e.g., the apolipoprotein B100/A1 ratio), liver damage (e.g., Fischer's ratio and taurine [32]), and cellular immune activation (e.g., neopterin and the kynurenine:tryptophan ratio [3,33]).
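Several of the parameters above are ratios derived from individual concentrations rather than measured directly. A minimal Python sketch of how such ratios could be computed from a per-sample concentration table is shown below. The function names and dictionary keys are hypothetical, and taking phenylalanine + tyrosine as the aromatic amino acids in Fischer's ratio is an assumption (the text only says "aromatic amino acids").

```python
from typing import Mapping

# Hypothetical helpers: keys are illustrative metabolite names, values are
# concentrations in any consistent unit (the ratios are unit-free).
Concentrations = Mapping[str, float]


def fischer_ratio(c: Concentrations) -> float:
    """Branched-chain / aromatic amino acids.

    Assumption: aromatic amino acids = phenylalanine + tyrosine.
    """
    bcaa = c["valine"] + c["leucine"] + c["isoleucine"]
    aaa = c["phenylalanine"] + c["tyrosine"]
    return bcaa / aaa


def acidic_to_amide_ratio(c: Concentrations) -> float:
    """(aspartate + glutamate) / (asparagine + glutamine)."""
    return (c["aspartate"] + c["glutamate"]) / (c["asparagine"] + c["glutamine"])


def lactate_pyruvate_ratio(c: Concentrations) -> float:
    """Lactate / pyruvate, described in the text as reflecting LDH activity."""
    return c["lactate"] / c["pyruvate"]


if __name__ == "__main__":
    # Illustrative numbers only, not data from this study.
    sample = {
        "valine": 200.0, "leucine": 100.0, "isoleucine": 50.0,
        "phenylalanine": 60.0, "tyrosine": 40.0,
        "aspartate": 10.0, "glutamate": 50.0,
        "asparagine": 40.0, "glutamine": 560.0,
        "lactate": 1500.0, "pyruvate": 75.0,
    }
    print(fischer_ratio(sample))           # 3.5
    print(acidic_to_amide_ratio(sample))   # 0.1
    print(lactate_pyruvate_ratio(sample))  # 20.0
```

Keeping each ratio as a separate named function makes it easy to add such derived variables as extra columns alongside the directly measured ones before modelling.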
In the combined O-PLS-DA model, calculated using 1034 fully quantified metabolic variables from all four assays (lipids, lipoproteins, and low molecular weight metabolites comprising the MS-derived amino acids and tryptophan pathway metabolites and the NMR-derived small molecules), infected and non-infected participants were differentiated with an AUROC of 0.99 (Table 1). Of the 1034 variables in this model, 598 were significant after correction for multiple testing. All significant metabolites in the combined model, with their Cliff's delta values and adjusted p-values, are listed in Table S5. The level of statistical significance attached to many of the COVID-19 biomarkers is striking. For instance, pyruvic acid (4.04-fold higher than control; p-value 2.67 × 10⁻³⁸), formate (3.71-fold higher than control; adjusted p-value 1.18 × 10⁻⁴³), PC 18:2/18:2 (fold change 0.18 relative to control; p-value 4.48 × 10⁻³⁹), PE.O 16:0/20:4 (fold change 0.18 relative to control; p-value 4.48 × 10⁻³⁹), the Asp+Glu/Asn+Gln ratio (2.71-fold higher than control; p-value 1.21 × 10⁻³⁸), and PE.P 18:1/20:4 (fold change 0.17 relative to control; p-value 2.67 × 10⁻³⁸) were the markers most strongly associated with SARS-CoV-2 infection (Figure S1).
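The text does not state which multiple-testing correction was applied. Purely as an illustration, the sketch below assumes the Benjamini–Hochberg false discovery rate procedure and counts variables whose adjusted p-value is at or below a hypothetical threshold of 0.05; the study itself may have used a different method or threshold, and the function names are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (an assumed method, see text)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value to the smallest so that the adjusted
    # values are monotone in the original p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(p_values: Sequence[float], alpha: float = 0.05) -> int:
    """Number of variables whose adjusted p-value is <= alpha."""
    return sum(q <= alpha for q in benjamini_hochberg(p_values))


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.5]  # toy p-values for four variables
    print(benjamini_hochberg(raw))
    print(count_significant(raw))  # 1
```

In the toy example, two raw p-values below 0.05 become non-significant after adjustment, which is the kind of filtering that reduces 1034 tested variables to a smaller significant set.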
As shown in Table S1, the average age of the controls differs significantly from that of each of the COVID-19 severity groups. The 1034 fully quantified metabolic variables from the four assays were therefore corrected for age, and the controls versus SARS-CoV-2 positive patients were modelled using O-PLS-DA (Figure S2). The metabolites that are most significantly elevated in the model without age correction (Figure 1) are also present, and are the most highly significant, in the age-corrected model (Figure S2). These include formic acid (p-value of the age-corrected model = 3.16 …; Table S5) … [34,35]. This strong correlation of pyruvate with SARS-CoV-2 infection is consistent with reported altered mitochondrial function (failure to utilize pyruvate as an energy source) following viral infection, which can trigger an immune response that shifts towards aerobic glycolysis to increase production of fatty acids, amino acids, and nucleotides [29]. Increased circulating hypoxia-inducible factor-1α (HIF-1α), which induces glycolysis [36], has been reported in SARS-CoV-2 infected patients, as has increased lactate [37]. Disruption of oxidative phosphorylation has been independently observed via transcriptomic measurements in COVID-19 [38], and further studies on the dynamics of mitochondrial disruption during COVID-19 are warranted.
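The text states that the variables were corrected for age but does not describe how. One common approach, shown here only as an assumed illustration, is to regress each variable on age by ordinary least squares and keep the residuals, shifted back to the variable's mean so the values stay on their original scale. The function name is hypothetical.

```python
from typing import List, Sequence


def age_adjust(values: Sequence[float], ages: Sequence[float]) -> List[float]:
    """Remove a linear age trend from one metabolic variable.

    Assumed method: simple OLS of value on age; the returned values are the
    residuals plus the original mean, so they remain on the original scale.
    """
    n = len(values)
    if n != len(ages) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_age = sum(ages) / n
    mean_val = sum(values) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    if sxx == 0:
        raise ValueError("age has no variance; nothing to adjust")
    sxy = sum((a - mean_age) * (v - mean_val) for a, v in zip(ages, values))
    slope = sxy / sxx
    # value - fitted + mean == value - slope * (age - mean_age)
    return [v - slope * (a - mean_age) for a, v in zip(ages, values)]


if __name__ == "__main__":
    ages = [30, 40, 50, 60]
    purely_age_driven = [2 * a + 5 for a in ages]
    print(age_adjust(purely_age_driven, ages))  # [95.0, 95.0, 95.0, 95.0]
```

One design choice matters here: when age and group are confounded, as Table S1 indicates, estimating the age slope from all samples can also remove part of the disease-related difference, whereas estimating it from controls only avoids that at the cost of fewer samples. Which option the study used is not stated.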
Whilst increased serum pyruvate could result from dysregulation of mitochondrial metabolism or of hepatic central carbon metabolism [35], another theory is that the increase in pyruvate concentrations is driven by an increase in lactate dehydrogenase activity [39]. Zhou et al. [39] reported that the lactate to pyruvate ratio, which reflects lactate dehydrogenase activity, was markedly lower in the infected group than in the non-infected group. Marin-Corral et al. [34] published similar findings and proposed that, since the lower lactate to pyruvate ratio in SARS-CoV-2 infection was not accompanied by an increase in serum lactate concentrations, the high pyruvate concentrations may instead reflect an imbalance of nicotinamide adenine dinucleotide (NAD) metabolism, which is required to convert pyruvate into lactate. In support of this hypothesis, they found evidence in SARS-CoV-2 patients of altered ratios of other metabolite pairs whose interconversion requires NAD+ as a cofactor, including the conversion of cortisol into cortisone by 11β-hydroxysteroid dehydrogenase type 2 [34]. De novo synthesis of NAD+ depends on the kynurenine arm of the tryptophan pathway [40], which is also disrupted following SARS-CoV-2 infection. We also found a significantly lower lactate to pyruvate ratio in infected patients (p-value = 6.10 × 10⁻³⁵). However, although the infection-related increase in pyruvate and decrease in the lactate:pyruvate ratio were amongst the strongest differentiators of non-infected and infected samples, we did not find a direct relationship with severity, indicating a quantized shift in metabolic state associated with infection.
We found glutamate and aspartate to be directly associated with SARS-CoV-2 infection, whereas glutamine and asparagine were moderately but significantly inversely associated (Table S5). We therefore calculated the ratio Asp:Glu/Asn:Gln, i.e., (aspartate + glutamate)/(asparagine + glutamine), to summarize these observations and found it to be strongly associated with the severity of infection. Other studies have also shown that SARS-CoV-2 infection results in significantly enriched aspartate and glutamate metabolism [3,28,41] and that impaired glutamate and glutamine pathways were the strongest metabolic indices of SARS-CoV-2 infection [42], with some proposing glutamine supplementation as part of the therapeutic management of the infection [43]. However, some research groups have reported lower plasma glutamate concentrations in SARS-CoV-2 infected individuals [44]. On balance, the literature and our results from the current study indicate that the glutamate to glutamine ratio is strongly associated with SARS-CoV-2 infection [45]. Both aspartate and glutamate are major anaplerotic carbon sources for the citric acid cycle, and their elevation may be a further indication of reduced mitochondrial efficiency during SARS-CoV-2 infection. Krishnan et al. reported higher levels of serum glutamate, found that surface expression of the glutamate transporter xCT (SLC7A11) was increased on monocytes of SARS-CoV-2 infected patients, and showed that glutaminolysis was essential for replication of the SARS-CoV-2 virus [46]. There has been considerable concern about reports of new-onset diabetes caused by SARS-CoV-2 infection [3,47,48]. It is of note that high plasma glutamate and low plasma glutamine, observed here and in earlier work [3], are strong features of the plasma during the acute phase of the disease. We previously reported that the glu/gln ratio remained persistently high in patients following COVID-19 and was one of the least reversible of the metabolic features measured in "long COVID" patients [2].
This suggests a persistent driver of type 2 diabetes in post-COVID-19 patients and warrants further investigation in relation to long-term diabetes risk.
We also found elevated plasma ornithine in infected patient samples, which may indicate upregulation of the urea cycle, possibly driven by the increase in serum aspartic acid. Upregulation of urea cycle metabolites was also reported by Costanzo et al. [6,28]. Elevated ornithine levels have been associated with an increased ammonia burden due to a metabolic block in the urea cycle. In such cases, the ammonia burden normally shifts the glutamate:glutamine ratio towards glutamine. However, we observed the reverse here, with a highly significant shift towards glutamate. Recent large-scale epidemiological studies have shown that a high glu:gln ratio is associated with type 2 diabetes and risk of metabolic diseases [48,49]. Higher plasma concentrations of glutamate, lower glutamine concentrations, and the associated higher glutamate:glutamine ratio have been associated with increased risk of type 2 diabetes in the PREDIMED trial. Proposed mechanisms by which this altered glutamate to glutamine ratio affects diabetes risk include the ability of glutamine to lower blood glucose levels by stimulating insulin secretion via release of glucagon-like peptide-1 (GLP-1) [50]. Conversely, high circulating concentrations of glutamate can increase oxidative damage in pancreatic cells [51]. This is of interest because of the diabetogenic properties of SARS-CoV-2 infection and the increased diabetes risk that is now recognized as a problem associated with long COVID [48].
Of all the metabolites in the predictive molecular panel for infection, formate demonstrated the strongest association with SARS-CoV-2 infection [9,52,53]. Formate is released as a by-product of the kynurenine pathway, when N-formylkynurenine formed from tryptophan is hydrolysed to kynurenine, and one of the metabolic hallmarks of SARS-CoV-2 infection is the reduced bioavailability of tryptophan through activation of indoleamine 2,3-dioxygenase (IDO), which in turn reduces serotonin levels and increases production of kynurenine and quinolinic acid [6,54]. However, formate is also formed by gut microbial metabolism, and the gut microbiome is known to be significantly affected by SARS-CoV-2 infection, so this potential biomarker is likely to have multiple origins directly and indirectly related to the virus infection.
The parameters were ranked according to their significance in the overarching multiclass severity model, and the top 50 parameters belong mostly, but not exclusively, to the lipid and lipoprotein classes (Figure S1). Radar plots were constructed to show the fold changes between the control samples and the SARS-CoV-2 positive patients for the 50 most significant molecules by p-value, ordered clockwise by decreasing fold change (fold change is defined as (B − A)/A, where A is the control value and B the patient value, so 0 means no change). Pyruvate shows the largest fold change between the controls and the SARS-CoV-2 positive patients. The lipids most significant in differentiating infected from non-infected samples were the phosphatidylcholines (PC 18:2/18:2, PC 18:1/18:2, PC 18:2/20:4) and phosphatidylethanolamines (PEP 18:2/18:2, PEP 18:1/18:2, PEP 16:0/18:2). This has been reported previously, but in a study with fewer participants [22]. Several of the HDL subclass 4 lipoprotein parameters (H4A1, H4A2, H4CH, and H4PL) were markedly reduced and featured in the most significant list. Previous studies have found an inverse correlation between levels of HDL particles and severity of SARS-CoV-2 infection, with some showing binding between the spike protein of the virus and HDL [4,55]. In contrast, other studies have proposed that HDL facilitates infection of host cells by binding to angiotensin-converting enzyme 2 (ACE2) [56]. In one study, low pre-COVID levels of HDL cholesterol were found to be correlated with the severity of SARS-CoV-2 infection [57].
In previous research on pulmonary arterial hypertension, low plasma levels of HDL4 were found to be associated with mortality [58]. Plasma HDL4 concentrations were directly associated with several proteins involved in fibrinolysis, and small HDL particles such as HDL4 are known to transport proteins such as prekallikrein [58]. Prekallikrein is the precursor of the serine protease kallikrein, which releases kinins such as bradykinin that are involved in fibrinolysis, blood pressure control, and vascular inflammation [59]. In addition to atheroprotective properties and a role in fibrinolysis, HDL4 (referred to as HDL3b and HDL3c in older nomenclature) has been shown to have antioxidant and anti-inflammatory properties, including the ability to stimulate production of nitric oxide, mainly due to the effect of Apo-A1 [60]. The highest-ranked lipoprotein parameter differentiating SARS-CoV-2 infection from control, which was also lower in the participants who died, was H4A1 (HDL4 apolipoprotein A1), the main protein carried by the small, dense HDL4 particles. Apo-A1 is inversely correlated with cardiovascular disease and is arguably a better predictor of cardiovascular disease [61].
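The fold-change convention used for the radar plots, (B − A)/A, can be sketched as follows, together with the log2 fold change used in the severity plots and the ordering step (keep the 50 smallest p-values, then order clockwise by decreasing fold change). Using group means for A and B is an assumption, since the text does not say whether means or medians were used, and all names here are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def relative_fold_change(control: Sequence[float], case: Sequence[float]) -> float:
    """(B - A) / A with A = control mean, B = case mean; 0 means no change."""
    a, b = mean(control), mean(case)
    return (b - a) / a


def log2_fold_change(control: Sequence[float], case: Sequence[float]) -> float:
    """log2(B / A): also 0 for no change, symmetric for up and down shifts."""
    return math.log2(mean(case) / mean(control))


def radar_order(stats: Dict[str, Tuple[float, float]], n_top: int = 50) -> List[str]:
    """stats maps variable -> (p_value, fold_change).

    Keep the n_top variables with the smallest p-values, then order them by
    decreasing fold change (the clockwise order of the radar plot).
    """
    top = sorted(stats, key=lambda name: stats[name][0])[:n_top]
    return sorted(top, key=lambda name: stats[name][1], reverse=True)


if __name__ == "__main__":
    print(relative_fold_change([1.0, 1.0], [2.0, 2.0]))  # 1.0
    print(log2_fold_change([4.0], [1.0]))                # -2.0
```

The two conventions are related by log2(B/A) = log2(1 + (B − A)/A), so a relative fold change of 1.0 (a doubling) corresponds to a log2 fold change of 1.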
Stratifying the data by molecular class or assay type allows a more detailed assessment of the impact of infection. The models for each assay type are provided in Figure S3. While the model built using all four assays yielded an AUROC of 0.99 for differentiating infected from non-infected samples, the models generated independently for lipoproteins, lipids, and low molecular weight metabolites also showed excellent classification performance (AUROC 0.95–1.00), indicating that any one of these assays on its own could accurately classify SARS-CoV-2 infection (Table 1 and Figure S3).
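The AUROC values quoted above summarize how well a model's scores separate two groups. The sketch below computes AUROC directly from its probabilistic definition, the chance that a randomly chosen infected sample scores higher than a randomly chosen control with ties counted as one half, which equals the normalized Mann–Whitney U statistic. It assumes each sample already has a score (for example, a cross-validated O-PLS-DA prediction); fitting the O-PLS-DA model itself is not shown, and the function name is hypothetical.

```python
from typing import Sequence


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (O(n*m)).

    pos_scores: scores of the positive class (e.g. infected samples).
    neg_scores: scores of the negative class (e.g. controls).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    print(auroc([0.9, 0.8, 0.4], [0.3, 0.4, 0.1]))  # 8.5 / 9, about 0.944
    print(auroc([2.0, 3.0], [0.0, 1.0]))            # 1.0 (perfect separation)
```

For a cohort of roughly 300 patients and 89 controls the quadratic pairwise loop is cheap (about 27,000 comparisons); a rank-based formula would scale better for much larger data sets.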

Metabolite Classes Based on Severity of Infection
The key question is whether the metabolite classes most significantly perturbed in a mild case of COVID-19 are the same as those in a severe infection. To obtain a more comprehensive view of all 1034 variables in the integrated model and of how the significant metabolites change with increasing respiratory severity, significant metabolites were clustered and colored by assay in a pan-metabolic plot (Figure 2): lipids in black, lipoproteins in green, and low molecular weight metabolites in magenta. For all three molecular panels, the core differential metabolites remained the same across severity categories, and the ranked order of significant metabolites was similar, but not identical, across severity levels (Figures 2 and 3, Tables S6–S9). This similarity of the core metabolites across all severity models indicates that the metabolic changes in severity group B are similar to those in group E, at least among the top 50 most significant metabolites. However, the log2 fold changes in group E are clearly greater than those in groups B, C, and D (Figure 3 and Figure S1B). Although the core set of metabolites remained stable as severity progressed, the number of discriminatory metabolites in the O-PLS-DA models tended to increase incrementally with the severity of respiratory symptoms (Table 2). Thus, the model for the mildest severity (group B) contained the fewest statistically significant parameters (404 in Group B, 483 in Group C, 537 in Group D, and 608 in Group E).
Figure 3. Severity: the top 50 most significant metabolites (by p-value) differentiating controls from SARS-CoV-2 positive patients in the integrated data set, ordered by log2 fold change. Fold changes with respect to controls are shown for each severity class: group B (blue), group C (cyan), group D (orange), and group E (red). The metabolite axis is colored according to the assay with which each metabolite is measured: lipids (black), lipoproteins (green), and low molecular weight metabolites (magenta).
Expressed as a percentage of the total measured parameters, the greatest increment in differential parameters is observed between SARS-CoV-2 participants who did not require hospitalization (Group B) and those who were hospitalized but did not require oxygen (Group C). For lipids, the percentage of statistically significant differential parameters ranged from 36% of all lipids measured in the non-hospitalized category to 57% in the most severe category, indicating substantial metabolic dysregulation even at the lowest severity. To display the pan-metabolic responses to different severity levels, we introduced a "Metabolic Barcode" model (Figure 2), in which each line of the barcode represents an individual statistically significant parameter, organized by molecular class, allowing rapid visual comparison of the models, in this case across increasing severity. The lipoproteins and most of the lipid classes share strong similarities across severity levels, whereas the triacylglycerides differ somewhat between the least and most severe class comparisons with control. For the low molecular weight metabolites and the lipoproteins, the number of significant metabolites increased most sharply between categories B (non-hospitalized) and C (hospitalized but not requiring oxygen) and plateaued at the more severe symptom categories.
Since the decision whether to administer oxygen can depend partly on the caregivers, there is some subjectivity in the distinction between hospitalized patients who did or did not receive additional oxygen, and the similarity between categories C and D is therefore not surprising.
Another type of pan-metabolic response graph is shown in Figure 3. Here, the top 50 metabolites differentiating control from SARS-CoV-2 infected classes in the integrated parameter set are ranked by their adjusted p-values in each pairwise comparison of a severity group with the non-infected controls, and the log2 fold change with respect to the non-infected control class is shown for each severity class, colored from blue (least severe) to red (most severe). Thus, the longer the bar, the greater the difference in fold change from the least to the most severe class. The top 25 parameters differentiating infected and control samples on a class-by-class basis are provided in Table 2, which shows that, although the rank order of the significant metabolites (by adjusted p-value) shifts somewhat as severity increases, the same metabolites remain altered in the disease state. Formate is the most significant metabolite in all severity groups except Group E, where it falls to second place behind the neopterin:tryptophan ratio. Pyruvate and Asp:Glu/Asn:Gln are consistently ranked in the top four places across all severity levels. It should also be noted that for the control vs. B group comparison (Table 2), the top 25 significant metabolites are mainly lipids, but as severity increases, several HDL subclass 4 lipoprotein parameters enter the ranking. This is of note because we have previously shown that HDL subclass 4 (the smallest, densest HDL) is significantly reduced in COVID-19 [4,20]. HDL is also reduced in pulmonary hypertension and carries several fibrinolytic proteins, such as alpha-2-antiplasmin, prekallikrein, and coagulation factor XI [58], which potentially reflects a predisposition towards microvascular clotting, a known problem in SARS-CoV-2 infection.
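The "Metabolic Barcode" and the counts discussed above (significant parameters per severity group, their percentage within each molecular class, and the variables common to all severity classes) reduce to simple set operations on the lists of significant variables. The sketch below is a hypothetical illustration of that bookkeeping, not the authors' plotting code; the function names, class labels, and toy input are assumptions.

```python
from typing import Dict, List, Set, Tuple


def metabolic_barcode(
    significant: Dict[str, Set[str]], variable_class: Dict[str, str]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Binary matrix: one row per comparison, one column per variable.

    Columns are grouped by molecular class (then by name), so each 1 is one
    "bar" of the barcode for that comparison.
    """
    variables = sorted(variable_class, key=lambda v: (variable_class[v], v))
    groups = list(significant)
    matrix = [[int(v in significant[g]) for v in variables] for g in groups]
    return variables, groups, matrix


def common_to_all(significant: Dict[str, Set[str]]) -> Set[str]:
    """Variables significant in every comparison."""
    return set.intersection(*significant.values())


def percent_significant_by_class(
    sig: Set[str], variable_class: Dict[str, str]
) -> Dict[str, float]:
    """Percentage of measured variables in each class that are significant."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for v, cls in variable_class.items():
        totals[cls] = totals.get(cls, 0) + 1
        hits[cls] = hits.get(cls, 0) + (v in sig)
    return {cls: 100.0 * hits[cls] / totals[cls] for cls in totals}


if __name__ == "__main__":
    classes = {"PC 18:2/18:2": "lipid", "H4A1": "lipoprotein",
               "formate": "small_molecule", "pyruvate": "small_molecule"}
    sig = {"B": {"formate", "PC 18:2/18:2"},
           "E": {"formate", "pyruvate", "H4A1", "PC 18:2/18:2"}}
    print(metabolic_barcode(sig, classes)[2])  # [[1, 0, 1, 0], [1, 1, 1, 1]]
    print(sorted(common_to_all(sig)))          # ['PC 18:2/18:2', 'formate']
```

Because columns share one fixed ordering across all rows, a bar present in one severity comparison and absent in another shows up as a gap at the same position, which is what makes the barcode easy to compare by eye.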
In the higher severity groups, TPA1 and TPA2 (total Apolipoprotein-A1 and -A2) were significant in differentiating between cases and controls in addition to H4A1 and H4A2, emphasizing the roles of Apolipoprotein-A1 and -A2 in response to SARS-CoV-2 infection.
Inspection of the extended list of the top 50 most significant parameters (Figure 3 and Table 2) shows that lipids dominate the ranked lists, with lower levels of multiple phosphatidylcholines and phosphatidylethanolamines in the infected group being a defining feature. Although the core molecular signature of SARS-CoV-2 infection was similar in character regardless of severity, the main exception lay in the tryptophan pathway, which was differentially affected in the most severely infected group (Table 2).
For most parameters, the difference in the magnitude of fold change between the least (Group B, blue) and most (Group E, red) severe infection groups is not substantial, and many parameters do not show a linear progression with severity, again reinforcing the observation that the molecular signature of the infection is similar regardless of respiratory severity. For example, in severity groups B and C, pyruvic acid, Asp:Glu/Asn:Gln, and glutamic acid show the largest fold change from non-infected controls, but the fold change does not increase substantially as the respiratory severity of infection increases. This suggests that changes in these metabolites and ratios may be more indicative of the presence of infection. The exceptions are serum tryptophan, neopterin, and quinolinic acid (Figure S5), which show an abrupt change in concentration at the transition from groups B and C to groups D and E and are therefore associated with severity rather than with the presence of infection. Nevertheless, apart from the contribution of the tryptophan pathway metabolites, the severity of respiratory infection has little impact on the core metabolic parameters differentiating infected from non-infected individuals. Of note, within severity group E (assisted ventilation), 22 patients were admitted to ICU while 35 were not. Comparing the 1034 fully quantified metabolic variables between patients who were admitted to ICU and those who were not, among patients who were subsequently discharged from hospital, yielded no significant differences (all adjusted p-values non-significant). Figure 3 and Table 2 show that several of the most highly significant metabolites are shared between the controls versus mild cases and the controls versus assisted ventilation comparisons. This suggests that the observed shifts in the metabolomic profile depend mainly on the presence of infection, with minimal contributions from assisted ventilation and ICU admission.
These findings indicate that metabolic monitoring of patients with mild acute-phase respiratory disease may be as important as monitoring of patients who were severely ill in the acute phase. Indeed, it has been shown previously that even with mild symptoms, metabolic perturbations are still present many months after the acute phase of the disease has resolved [2], and several of these perturbations may be associated with altered long-term disease risk.
When the datasets for each assay were modelled independently and stratified by respiratory severity, comparison of each of the four infected categories, from non-hospitalized but symptomatic (Group B) to hospitalized and requiring ventilation (Group E), with the non-infected group yielded robust models (Table 1). O-PLS-DA scores plots, eruption plots, and variable importance plots for each model can be found in the Supplementary Materials (Figures S6–S12 and Tables S10–S29). In most cases, the AUROC values for the model comparing the mildest disease with the non-infected group (AUROC = 0.98–1.00) were as high as those for the model comparing the most severe disease group with the non-infected group (AUROC > 0.99). Although the metabolic plasma profiles showed a continuum of changes across the respiratory severity levels, even patients with mild respiratory symptoms showed multiple highly significant abnormal biochemical signatures reflecting serious metabolic deficiencies of the type observed in patients with Post-acute COVID-19 syndrome.
For the lipoprotein dataset, strong models were produced for each severity class, giving insights into the cardio-metabolic complications of COVID-19 (Figures S6–S8). Seven lipoprotein parameters appear among the ten most significant in each of the four severity comparisons, namely H4CH (HDL subfraction 4 cholesterol), H4PL (HDL subfraction 4 phospholipids), TPA2 (total apolipoprotein A2), H4A1 (HDL subfraction 4 apolipoprotein A1), H4A2 (HDL subfraction 4 apolipoprotein A2), HDA1 (HDL apolipoprotein A1), and H4FC (HDL subfraction 4 free cholesterol), all of which are decreased in SARS-CoV-2 infected individuals. As before, the number of statistically significant metabolites increased as the respiratory severity of the patient worsened (Tables 1 and 2). LDL triglycerides (LDTG) had the greatest statistical significance across all severity classes. Interestingly, ABA1 (apolipoprotein B100/apolipoprotein A1), a known lipoprotein marker of cardiovascular risk [1,31,62,63], was increased in all severity classes compared with the controls, with Cliff's delta ranging from 0.91 to 1.00, indicating that the potential for detrimental cardiovascular effects is present even with mild disease.
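Cliff's delta, used throughout as the effect size, is the probability that a value from one group exceeds a value from the other minus the probability of the reverse. It therefore ranges from −1 to 1, and a value near 1 (as for ABA1) means that almost every patient value lies above almost every control value. The minimal sketch below computes it by pairwise comparison. The sign convention (first argument = patients, second = controls) is an assumption chosen to match the positive deltas reported for increased parameters, and the function name is hypothetical. Counting the same pairs also gives the identity delta = 2 × AUROC − 1 when ties are counted as one half in the AUROC.

```python
from typing import Sequence


def cliffs_delta(x: Sequence[float], y: Sequence[float]) -> float:
    """Cliff's delta = P(x > y) - P(x < y) over all (x_i, y_j) pairs.

    Positive when values in x (e.g. patients) tend to exceed those in y
    (e.g. controls); ties contribute to neither count.
    """
    if not x or not y:
        raise ValueError("both groups need at least one value")
    greater = sum(1 for xi in x for yj in y if xi > yj)
    less = sum(1 for xi in x for yj in y if xi < yj)
    return (greater - less) / (len(x) * len(y))


if __name__ == "__main__":
    patients = [5.0, 6.0, 7.0]
    controls = [1.0, 2.0, 6.0]
    print(cliffs_delta(patients, controls))      # 6 / 9, about 0.667
    print(cliffs_delta([3.0, 4.0], [1.0, 2.0]))  # 1.0: complete separation
```

Because it depends only on the ordering of values, Cliff's delta is unaffected by monotone transformations such as log-scaling, which suits variables spanning very different concentration ranges.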
Pairwise comparisons between SARS-CoV-2 infected groups showed that severity classes B and C could not be significantly differentiated by any of the molecular panels (AUROC < 0.6 for all single-increment models; Table 1). Similarly, groups C and D, and groups D and E, could not be differentiated on their lipid or lipoprotein parameters, although weak models were obtained from the low molecular weight parameter set for group C (hospitalized patients with no oxygen supplementation) versus group D (hospitalized patients with low-flow oxygen) and for group D versus group E. Consequently, far fewer differential metabolites were found for each assay when comparing adjacent severity levels than when comparing any of the infected classes with the non-infected group. This inability to robustly differentiate sequential severity classes supports the suggestion that, metabolically, the impact of SARS-CoV-2 infection itself is much greater than the differences between mild and severe infection.
As shown in the eruption plots constructed from the lipid data (Figure S9), the majority of the lipid parameters are reduced in SARS-CoV-2 infection, from the mildest infection to the most severe. Among the lipids that increase with infection, monoacylglycerol 20:3 is the most significantly upregulated lipid in both the severity group B vs. controls and severity group C vs. controls comparisons, and it remains highly significant in the more severe disease states. Phosphoserine 18:0/18:0 is also highly significant in all models, while ceramide 18:0 increases in significance as respiratory severity increases. Of the lipids that decrease with COVID-19 infection, the most significant in the control vs. group B comparison, phosphocholine 18:1/18:2, is also the most significant in the control vs. group E comparison. In all severity classes vs. controls, phosphocholines, phosphoethanolamines, lysophosphocholines, and hexosylceramides remain the most significant markers of COVID-19 infection regardless of severity class; all of them are present at reduced concentrations compared with the controls and have previously been reported to be reduced in COVID-19 infection [5,64].

Tryptophan Pathway Metabolism Is Substantially Disrupted in Severe SARS-CoV-2 Infection
Although most of the low molecular weight metabolites, including many of the amino and organic acids, were not associated with infection severity, we and others have reported that the tryptophan pathway is disrupted by SARS-CoV-2 infection [6] and is also influenced by the severity of infection: quinolinic acid (positively associated), tryptophan (negatively associated), and 3-hydroxykynurenine (positively associated) correlated with severity (see Figures S11-S13). Indoleamine 2,3-dioxygenase modulates the production of kynurenines and is a known regulator of inflammation during infection, including SARS-CoV-2 infection [65]. In the integrated set of all metabolites stratified by severity, no tryptophan pathway metabolites ranked among the top 50 most discriminatory parameters when comparing either group B or group C with controls, whereas the neopterin to tryptophan ratio (positively associated) ranks in the top 50 for group D versus controls, and both the neopterin to tryptophan and quinolinic acid to tryptophan ratios (positively associated) rank in the top 50 most significant parameters for group E versus controls (Table S9). Low tryptophan levels together with high neopterin concentrations have been associated with cardiovascular disease and cancer [66] and are related to inflammation. Analysis of the low molecular weight parameter set in isolation additionally shows higher serum concentrations of quinolinic acid, neopterin, and 3-hydroxykynurenine, and lower levels of tryptophan and serotonin (Tables S21-S24). Lower serum serotonin levels are apparent only in the models for the two highest severity categories.
Disruption of the tryptophan pathway following SARS-CoV-2 infection has been reported in multiple studies [2,6]. Quinolinic acid and 3-hydroxykynurenine, together with glutamate, which is also directly associated with infection across all severity classes (Tables S21-S24), are excitatory neurotoxins [67]. Reports increasingly associate quinolinic acid with neurodegeneration in conditions such as Huntington's disease, AIDS, dementia, Alzheimer's disease, and Parkinson's disease [67][68][69]. In the current study, the serum kynurenine:tryptophan ratio and picolinic acid concentration were found to track with the severity of SARS-CoV-2 infection. Consistent with this observation, kynurenine significantly differentiates classes C to E, but not class B, from controls, whereas picolinic acid is significant only in the model comparing the most severe class (E) with non-infected controls (Tables S21-S24). Upregulation of the kynurenine pathway is driven by proinflammatory cytokines, including IL-1, TNF-α, and IL-6 [70], and has been noted previously in COVID-19 and in other chronic diseases [71].
These data indicate that catabolism of tryptophan via the kynurenine pathway increases with severity [33]. Picolinic acid, a metabolite further downstream in the pathway, is the most significant metabolite in the D vs. E model, although it was not significant in the C vs. D model. It has previously been shown to have a role in inflammatory disorders of the central nervous system [72] and to be increased in children with malaria [73], and it has also been found to be significant in severity classification of COVID-19 patients: Cihan et al. reported an association between SARS-CoV-2 infection severity and both picolinic acid and the kynurenine:tryptophan ratio (KYN:TRP), and showed a correlation between KYN:TRP and the inflammatory marker IL-6 [33]. Decreases in tryptophan concentration have also been shown to become significant in severely ill COVID-19 patients compared with mildly or moderately ill patients [38,74].

Mortality Prediction in SARS-CoV-2 Positive Patients
This study was conducted early in the pandemic, when the Wuhan sub-variant was dominant, in an exclusively unvaccinated population, with consequently high mortality. Of the severely ill SARS-CoV-2 patients (severity group E), 56.1% died; no patients in severity groups B, C, or D died. In terms of demographics, the subset of group E that did not survive was significantly older than the subset that survived. To remove the effect of this age disparity when comparing the two groups, only participants aged 65-80 years were selected, giving 13 patients who survived (median age = 73) and 13 who did not (median age = 75). The demographics of these two groups are given in Table S30.
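The age restriction described above amounts to filtering on an age window and then summarising age by outcome. The sketch below is illustrative only: the `Patient` record layout is hypothetical, the window is assumed to be inclusive at both ends (the text does not say), and the example records are synthetic rather than study data.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative only.
    patient_id: str
    age: int
    survived: bool


def age_window(patients, low=65, high=80):
    """Keep patients whose age lies in the closed interval [low, high] (assumed inclusive)."""
    return [p for p in patients if low <= p.age <= high]


def median_age_by_outcome(patients):
    """Return (median age of survivors, median age of non-survivors)."""
    survivors = [p.age for p in patients if p.survived]
    non_survivors = [p.age for p in patients if not p.survived]
    return median(survivors), median(non_survivors)


if __name__ == "__main__":
    # Synthetic records for illustration, not study data.
    cohort = [
        Patient("p1", 58, True), Patient("p2", 70, True), Patient("p3", 76, True),
        Patient("p4", 66, False), Patient("p5", 79, False), Patient("p6", 84, False),
    ]
    subset = age_window(cohort)
    print(len(subset), median_age_by_outcome(subset))
```

Restricting by age in this way removes the age imbalance at the cost of sample size, which is why only 13 patients per outcome remain in the comparison.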
For the 13 non-surviving SARS-CoV-2 infected patients, it should be noted that the time from blood collection to death ranged from 8 to 61 days. The resulting model (Figure 4), which used all 1034 variables from the four assays, had an AUROC of 0.96, indicating that the retrospective model was able to predict mortality a median of 25.5 days prior to death. All significant metabolites are listed in Table S31. The severity classification prediction model was then cross-validated. Projection of the COVID-19 patients from this study into the trained model provided high specificity. For severity group B (Figure 4C), all patients were classified as survivors, giving a specificity of 1.00. For group C severity (Figure 4D Effectively, this means that on hospitalization the high-mortality patients had a pharmaco-metabonomic serum signature that was predictive of the outcome of the disease a median of 25.5 days (range 8-61 days) prior to death. The concept of pharmaco-metabonomic prediction was proposed to define the ability to predict metabolic outcomes from pre-intervention or pre-disease metabolic profiles [75,76], and has previously been applied to retrospectively predict survival in patients with acute-on-chronic liver disease [77]. Such prognostic information could be used to guide the selection of therapy for the individual patient. As shown in Figure 4A, the hexosylceramides were key indices of survival, and several of these species were present at significantly higher levels (Cliff's delta > 0.5) in the patients who did not survive (hexosylceramides 16:0, 20:0, 22:0, 24:0, 24:1, 26:0, and 26:1, and dihydrohexosylceramides 18:0/24:0 and 18:0/24:1). An increase in ceramide species in patients who did not survive has previously been shown in a study predicting 7-day mortality [78,79].
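The specificity figures quoted for the projected severity groups follow the standard confusion-matrix definitions. A minimal sketch, assuming death is treated as the positive class (the text does not state the convention explicitly); the labels in the usage lines are synthetic:

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FN, TN, FP) with 1 = died (positive) and 0 = survived."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: fraction of non-survivors predicted to die."""
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive cases")
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: fraction of survivors predicted to survive."""
    if tn + fp == 0:
        raise ValueError("specificity is undefined without negative cases")
    return tn / (tn + fp)


# A group in which nobody died and everyone is predicted to survive:
tp, fn, tn, fp = confusion_counts([0, 0, 0, 0, 0], [0, 0, 0, 0, 0])
print(specificity(tn, fp))  # 1.0; sensitivity is undefined here (no deaths)
```

This also explains why only specificity can be reported for groups B, C, and D: with no deaths in those groups, there are no positive cases from which to estimate sensitivity.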
It has also been demonstrated that ceramides could predict death in patients with stable coronary artery disease and acute coronary syndromes, where it was postulated that the ceramides are associated with lipoprotein aggregation and uptake, superoxide anion production, apoptosis, and inflammation [80]. More specifically, hexosylceramides have been found in higher concentrations in patients with multiorgan dysfunction syndrome than sedated controls in an intensive care unit [81] and have also been linked to viral load in hepatitis C infection [82].
In addition to the elevated hexosylceramides, three sphingomyelins with side-chain lengths 20:1, 26:0, and 26:1 (Cliff's delta values 0.70, 0.86, and 0.75, respectively) were also elevated in the patients who did not survive. Increases in sphingomyelins have previously been shown in COVID-19 infection in humans and in animal models [83]. In contrast to the hexosylceramides and sphingomyelins, phosphoethanolamines with chain lengths 16 and 18 and the triacylglycerides were decreased in the patients who did not survive. Correlation plots (Figure S14) showed differing patterns between the two groups in the model. Using the combined panel of lipids, lipoproteins, and small molecules, the AUROC for predicting survival was 0.96, which compares well with previously published predictive biomarkers and biomarker panels. Among single clinical markers of survival, placental growth factor (PlGF) returned an AUROC of 0.772 for predicting survival at a median of 14 days after hospital admission (range 2 to 57 days) [84], and CRP concentrations correlated with 14-day mortality with a sensitivity of 0.88 and a specificity of 0.56 [85]; chromogranin A [86] and D-dimer [87] have also been proposed. In particular, D-dimer levels, along with high-sensitivity CRP, ferritin, and IL-6, have been reported to correlate with the severity of SARS-CoV-2 infection, with D-dimer demonstrating the best ability to predict mortality [88]. Other research groups have proposed ratios of biomarkers for predicting SARS-CoV-2 mortality, such as (kynurenine/tryptophan)/(citrulline/ornithine), which returned an AUROC of 0.95 [89]. Although various biomarker panels with relatively high sensitivity and specificity have been proposed, many of these have not been validated in other cohorts.
One example is a panel of lactate dehydrogenase, CRP, and lymphopenia which achieved >90% accuracy in predicting mortality in a Chinese cohort but was not replicated in a cohort of Caucasian Dutch individuals [90].
In the current cohort, altered serum lipoproteins also contributed to the model differentiating patients who did and did not survive (Figure 4 and Figure S15). The key changes were predominantly in cholesterol and free cholesterol components. These include HDCH (Cliff's delta = 0.80), LDCH (Cliff's delta = 0.69), H3FC (Cliff's delta = 0.83), and HDFC (Cliff's delta = 0.70). Total Apolipoprotein A1 (TPA1, Cliff's delta = 0.60), HDL Apolipoprotein A1 (HDA1, Cliff's delta = 0.61), and HDL subfraction 4 Apolipoprotein A1 (H4A1, Cliff's delta = 0.61) were all increased in the patients who did not survive. The very low density lipoprotein fractions were present at higher concentrations in those who survived than in those who did not. These observations are in concordance with the work of Masana et al. [91], who noted that low plasma HDL cholesterol and high triglyceride concentrations were correlated with infection severity. Similarly, decreased plasma concentrations of several lipid classes have been reported as a feature of SARS-CoV-2 infection, with lysophosphocholine (LPC) 18:0 and LPC 18:2 being inversely correlated with mortality. Although we did not find a specific correlation between these lipids and survival in the current cohort, LPC 18:0 and LPC 18:2 were associated with both the presence of SARS-CoV-2 infection and its severity (Tables S5, S17 and S19) [92]. Given the predictive strength of the lipid data, we assessed the ability of the top lipid species defining the SARS-CoV-2 positive and control groups according to the adjusted p-value (HCER 16:0 and PE.O 18:0/18:1, respectively) to predict mortality. The AUROC based on these two lipid species was 0.99 (Supplementary Figure S16), suggesting that they may be good prognostic markers of SARS-CoV-2 mortality and that their value warrants validation in independent datasets.
The only low molecular weight metabolite with a Cliff's delta above 0.6 in the model predicting mortality was taurine, which was higher in the patients who did not survive (Mann-Whitney U test, p = 2.3 × 10⁻³). Elevated taurine levels have previously been associated with liver injury and hepatotoxicity [93]. However, taurine is also present in large quantities in skeletal and cardiac muscle [94], and because COVID-19 causes skeletal muscle loss, the elevation may result from muscle breakdown [95]. Other groups have previously achieved accurate mortality prediction using clinical [96,97] or metabolomic [38,89,98] data alone. Here we present a model based on a larger patient cohort, adding statistical power, and measuring considerably more variables, facilitating a deeper understanding of the mechanistic pathways involved in SARS-CoV-2 infection.
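For readers less familiar with the Mann-Whitney U test used for the survivor versus non-survivor comparison, a stdlib-only sketch follows. It computes U from mid-ranks and a two-sided p-value from the large-sample normal approximation with a tie correction. This is a generic illustration, not necessarily the implementation used in the study; for groups as small as 13 versus 13, exact or continuity-corrected p-values from a statistics package may differ slightly.

```python
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, two-sided p-value) using the normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    n = n1 + n2
    # Tie correction: subtract sum(t^3 - t) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    sigma = sqrt(max(var, 0.0))
    if sigma == 0:
        return u1, 1.0  # all values tied: no evidence of a difference
    z = (u1 - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p
```

Because the test works on ranks, it makes no normality assumption, which suits skewed metabolite concentrations such as taurine.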

Participant Enrolment and Sample Collection
The cohort consisted of non-infected control participants (n = 89) and patients who tested positive for SARS-CoV-2 infection by RT-PCR of upper and/or lower respiratory tract swabs (n = 306). These samples were collected early in the pandemic, when the Wuhan sub-variant was dominant, from an exclusively unvaccinated population. The infected participants were divided into four categories based on the severity of respiratory symptoms: group B, symptomatic but not hospitalized; group C, hospitalized, no oxygen required; group D, hospitalized, supplemental oxygen required; and group E, hospitalized, assisted ventilation [21]. The cohort included no asymptomatic patients (group A). The cohort demographics are provided in Table S1.
All serum samples were provided by the Basque Biobank for research (BIOEF). Control serum samples were collected prior to the COVID-19 pandemic by Osarten Kooperativa Elkartea from an apparently healthy population (employees of the Mondragon Cooperative [Basque Country], during the annual medical test). For the control samples, only the participant gender, age, and BMI were provided for this study (Table S1). No information was provided about the possible presence of other diseases such as diabetes or cardiovascular disease. For this reason, the controls are referred to in this study as controls from a normal population rather than as healthy controls.
The COVID-19 samples were collected at the Cruces University Hospital (Barakaldo, Spain) from patients presenting with compatible symptoms, with infection confirmed by an RT-PCR assay on nasal swab samples. All blood was collected in BD vacutainer serum tubes with clot activator, using the same pre-analytical handling procedures for the controls and patients. All participants provided informed consent, according to the Declaration of Helsinki, and data were anonymized to protect their confidentiality. The sample handling protocol was evaluated and approved by the ethics committee of the Basque Country (Report of the ethics committee for research on medicinal products in the Basque Country, CEIm-E, PI+CES-BIOEF 2020-04, and PI219130). Shipment of human samples to the ANPC was approved by the Ministry of Health of the Spanish Government, and the samples were imported under Import Permit 0004275122 issued by the Australian Government Department of Agriculture, Water, and the Environment. Upon receipt, samples were stored at −80 °C. Samples were approved for analysis as part of the International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC)/World Health Organization (WHO) pandemic trial framework (SMHS Research governance office PRN:3976). Research was conducted in accordance with the Murdoch University Human Ethics Committee approval (no. 2020/052 and 2020/053).

¹H NMR Spectroscopy Sample Preparation
All sample preparation and processing followed the guidelines recommended by Loo et al. [99]. Samples were defrosted at room temperature for 1 h prior to preparation for analysis. NMR samples were prepared in a SamplePro Tube (Bruker Biospin, GmbH, Ettlingen, Germany) robotic liquid-handling system with integrated temperature control. Every sample was automatically prepared as a 1:1 mixture of phosphate buffer (75 mM Na₂HPO₄, 2 mM NaN₃, 4.6 mM sodium trimethylsilyl propionate-[2,2,3,3-²H₄] (TSP) in H₂O/D₂O 4:1, pH 7.4 ± 0.1) and serum, for a final volume of 600 µL, in 5 mm SampleJet™ NMR tubes. Samples were then manually shaken for several seconds and stored at 5 °C inside the SampleJet™ automatic sample changer until measurement (<24 h). All methods were validated for COVID-19 samples as previously reported [99].
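Because buffer and serum are mixed 1:1 to a final volume of 600 µL, each contributes 300 µL, and every buffer constituent (and every serum analyte) is present in the tube at half its original concentration. The short sketch below only makes this C₁V₁ = C₂V₂ arithmetic explicit; it assumes ideal volume additivity and the helper names are hypothetical, not part of the published protocol or the Bruker software.

```python
def split_volume(total_ul: float, buffer_parts: float = 1.0, serum_parts: float = 1.0):
    """Split a final volume into (buffer, serum) volumes by mixing ratio."""
    parts = buffer_parts + serum_parts
    return total_ul * buffer_parts / parts, total_ul * serum_parts / parts


def diluted_concentration(stock_mM: float, stock_ul: float, total_ul: float) -> float:
    """C1 * V1 = C2 * V2, assuming volumes add ideally."""
    return stock_mM * stock_ul / total_ul


TOTAL_UL = 600.0
buffer_ul, serum_ul = split_volume(TOTAL_UL)  # 300 uL of each for a 1:1 mix
# Buffer stock concentrations (mM) as stated in the protocol text.
buffer_stock_mM = {"Na2HPO4": 75.0, "NaN3": 2.0, "TSP": 4.6}
in_tube_mM = {
    name: diluted_concentration(conc, buffer_ul, TOTAL_UL)
    for name, conc in buffer_stock_mM.items()
}
# Each buffer constituent is halved in the tube (about 37.5, 1.0 and 2.3 mM).
print(buffer_ul, serum_ul, in_tube_mM)
```

The same factor of two applies to serum metabolites, so any concentration read from the tube has to be scaled back by the dilution factor to report serum values.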

¹H NMR Spectroscopy Data Acquisition and Processing Parameters
NMR spectroscopic analyses were performed on a 600 MHz Bruker Avance III HD spectrometer equipped with a 5 mm BBI probe and fitted with the Bruker SampleJet™ robot cooling system set to 5 °C. A full quantitative calibration was completed prior to the analysis using a protocol described elsewhere [100]. All experiments were acquired using the Bruker In Vitro Diagnostics research (IVDr) methods. For each sample, a standard 1D experiment with solvent pre-saturation (32 scans, 98K data points, spectral width of 30 ppm) was acquired, with a total experiment time of 4 min 3 s, and a total of 112 lipoprotein parameters were measured [18] (Table S2). In addition to the 112 lipoprotein parameters, 11 low molecular weight metabolite concentrations were obtained from the Bruker IVDr Quantification in Plasma/Serum B.I.Quant-PS (acetic acid, acetoacetic acid, acetone, citric acid, creatine, creatinine, formic acid, glucose, D-3-hydroxybutyric acid, lactic acid, pyruvic acid) (Table S3).

Liquid Chromatography Mass Spectrometry (LC-MS)
Biogenic amines, amino acids, and tryptophan metabolites were measured using two LC-MS quantification methods, following previously reported methods for tryptophan and associated catabolites [101] and amino acids [19,102], which together measured forty-five parameters (thirty-six individual metabolite concentrations and nine ratios, Table S2). In brief, samples were thawed at 4 °C and prepared for analysis. For the quantification of the biogenic amines and amino acid metabolites, a Bruker Impact II QToF mass spectrometer (Bruker Daltonics, Billerica, MA, USA) coupled to a Waters Acquity I-class UPLC system (Waters Corp, Milford, MA, USA) was used. High-resolution full-scan mass spectrometry data were acquired using positive electrospray ionisation over a mass range of m/z 30-1000. Tandem mass spectrometry (MS/MS) data were collected on all acquired samples using the Bruker broadband collision-induced dissociation (bbCID) function. The resulting data files were processed for peak integration and quantification using the Target Analysis for Screening Quantification (TASQ; v2.2) software (Bruker Daltonics, Bremen, Germany), with calibration curves fitted linearly using a weighting factor of 1/x. For the measurement of tryptophan and associated catabolites, a Waters TQ-XS triple quadrupole (QQQ) coupled to a Waters Acquity I-class UHPLC system (Waters, Wilmslow, UK) was used. The QQQ was operated in positive electrospray ionisation mode using multiple reaction monitoring (MRM). Raw files were processed for peak integration and metabolite quantification using the TargetLynx package within MassLynx v4.2 (Waters Corp., Milford, MA), with calibration curves fitted linearly using a weighting factor of 1/x. The resulting data matrices were combined and quality-control checked prior to statistical analysis.
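The 1/x-weighted linear calibration used in both quantification workflows can be illustrated with a generic weighted least-squares fit. This is a sketch, not the TASQ or TargetLynx implementation: it assumes the usual meaning of "1/x weighting" (weight wᵢ = 1/xᵢ applied to each squared residual), and the function names are hypothetical.

```python
def weighted_linear_fit(x, y, weights):
    """Weighted least squares for y = slope * x + intercept.

    Minimises sum(w_i * (y_i - slope * x_i - intercept) ** 2).
    """
    sw = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / sw  # weighted mean of x
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / sw  # weighted mean of y
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def calibrate_1_over_x(conc, response):
    """Fit a calibration line with 1/x weighting; calibrator levels must be > 0."""
    if any(c <= 0 for c in conc):
        raise ValueError("1/x weighting requires positive calibrator concentrations")
    weights = [1.0 / c for c in conc]
    return weighted_linear_fit(conc, response, weights)


def back_calculate(response, slope, intercept):
    """Convert a measured response (e.g. analyte/internal-standard area ratio) to concentration."""
    return (response - intercept) / slope


# Synthetic calibrators spanning two orders of magnitude (illustrative only).
conc = [0.5, 1.0, 5.0, 10.0, 50.0, 100.0]
response = [2.0 * c + 0.1 for c in conc]
slope, intercept = calibrate_1_over_x(conc, response)
print(slope, intercept, back_calculate(14.1, slope, intercept))
```

Weighting by 1/x reflects the fact that absolute measurement error usually grows with concentration; without it, the highest calibrators dominate the fit and the relative error at the low end of the range can become large.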

Data Analysis
All computation and data visualization were performed in R using the RStudio IDE and the open-source R package metabom8 (version 0.2), available from GitHub (github.com/tkimhofer/metabom8, accessed on 1 June 2022). Orthogonal projection to latent structures-discriminant analysis (O-PLS-DA) [105] was used to model the respiratory symptom variance in the data and to extract discriminating features. An O-PLS-DA model was calculated to differentiate between infected and non-infected samples for the low molecular weight metabolites, lipids, and lipoproteins. In addition, each severity class was modelled against the control group, and the severity classes were modelled against each other. To balance the group sizes for severity group B, which contained fewer samples than the other severity groups, only 25 randomly selected controls were included in that model.
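A random selection of controls such as the 25 used for the group B comparison can be made reproducible by drawing with a fixed, reported seed. The study's own analysis was performed in R; the Python sketch below is only an illustration, and the control identifiers and seed value are hypothetical.

```python
import random


def subsample_controls(control_ids, n, seed):
    """Return n control IDs drawn without replacement, reproducibly."""
    ids = list(control_ids)
    if n > len(ids):
        raise ValueError("cannot draw more controls than are available")
    rng = random.Random(seed)  # private generator; leaves global random state untouched
    return sorted(rng.sample(ids, n))


# Hypothetical identifiers for the 89 controls; the seed is arbitrary.
controls = [f"CTRL{i:03d}" for i in range(1, 90)]
selected = subsample_controls(controls, 25, seed=2020)
print(len(selected), selected[:3])
```

Using a dedicated generator rather than the module-level random state keeps the selection independent of any other random calls in the analysis script.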
The optimal number of orthogonal components for each model was determined using the area under the receiver operating characteristic curve (AUROC), calculated from predictive component scores generated by an internal sevenfold cross-validation (CV) procedure. Cliff's delta was calculated for all O-PLS-DA models to assess the effect size of the intergroup differences [106].
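Cliff's delta and the AUROC are closely related. Cliff's delta is δ = P(X > Y) − P(X < Y), whereas AUROC = P(X > Y) + ½P(X = Y), so δ = 2·AUROC − 1 when the same two groups are compared. The stdlib sketch below computes both by brute-force pairwise comparison (O(n₁n₂), adequate for cohorts of this size). The sign convention (positive δ means higher values in the first group) is an assumption for illustration; note also that in the study the AUROC was computed from cross-validated O-PLS-DA predictive scores, not from individual raw variables.

```python
def pairwise_counts(x, y):
    """Count pairs with x_i > y_j, x_i < y_j, and ties across the two groups."""
    greater = less = ties = 0
    for xi in x:
        for yj in y:
            if xi > yj:
                greater += 1
            elif xi < yj:
                less += 1
            else:
                ties += 1
    return greater, less, ties


def cliffs_delta(x, y):
    """Cliff's delta in [-1, 1]; positive when values in x tend to exceed those in y."""
    greater, less, _ = pairwise_counts(x, y)
    return (greater - less) / (len(x) * len(y))


def auroc(positive_scores, negative_scores):
    """AUROC as the probability that a positive outscores a negative (ties count 1/2)."""
    greater, _, ties = pairwise_counts(positive_scores, negative_scores)
    return (greater + 0.5 * ties) / (len(positive_scores) * len(negative_scores))


x, y = [3, 4, 5], [1, 2, 3]
print(cliffs_delta(x, y), auroc(x, y))  # delta = 8/9, AUROC = 8.5/9
```

The identity means that a Cliff's delta of 0.91-0.95, as reported for the hexosylceramides, corresponds to a single-variable AUROC of roughly 0.96-0.98 for that variable alone.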

Conclusions
COVID-19 is a heterogeneous disease with strong patient-to-patient variability in symptoms and severity. We used a multi-platform approach to determine the metabolic signature of SARS-CoV-2 severity across a mildly to severely infected cohort. Whilst stratification of the datasets by metabolite class allowed deeper insight into the metabolic consequences of SARS-CoV-2 infection, the combined multi-modal dataset delivered a stronger model for predicting infection presence, severity, and ultimate mortality.
Although the number of significant metabolites, lipids, and lipoproteins increased as respiratory severity increased, the core metabolic signature of infection was the same for lipids, lipoproteins, and most low molecular weight metabolites regardless of severity level, indicating multiorgan involvement of the disease even in mild cases where no hospitalization was required. This raises the question of whether these patients require long-term monitoring in relation to post-acute COVID-19 syndrome (PACS) to establish their long-term recovery and any modified disease risks. Marked alterations in pyruvate, formate, and the lactate to pyruvate ratio indicate perturbation of the tricarboxylic acid cycle and energy metabolism at all levels of infection, whereas the disparity in Asp:Glu/Asn:Gln indicates liver involvement, and the increase in the Apolipoprotein-B100/Apolipoprotein-A1 ratio (ABA1), combined with changes in other lipid and lipoprotein parameters, suggests increased cardiovascular disease risk.
Tryptophan pathway metabolism was heavily disrupted by SARS-CoV-2 infection, but in contrast to the majority of metabolites, we found that disruption of this pathway was associated with infection severity and was substantial only in the hospitalized patients requiring oxygen. The shift in the balance of the pathway from serotonin towards quinolinic acid production indicates a move towards a neurotoxic systemic environment.
It should be noted that this study has limitations. The samples were collected at the start of the pandemic. Several publications have described altered infection symptoms and generally decreased respiratory severity over successive waves of SARS-CoV-2 infection, typically corresponding to the progression of variants [107,108], so disease risks may continue to change and the metabolic sequelae may differ. Certain sociodemographic and pre-existing health factors, such as age, BMI, and chronic health conditions including diabetes and cardiovascular disease, have been shown to be associated with SARS-CoV-2 outcomes, and in this cohort the distribution of some of these parameters is skewed towards the higher severity categories. Of note, and as expected, greater mortality was observed in the more severe respiratory infection classes (statistics on sociodemographic, anthropometric, and selected clinical parameters are provided in Table S1). Nevertheless, the metabolic signature for mortality was distinct from the signature associated with severity, indicating that the prediction of mortality was not solely related to the severity of respiratory symptoms. Mortality was associated with infection severity and could be predicted from the hexosylceramide and sphingomyelin profiles 8-61 days prior to death. Early indices of adverse clinical outcomes have value in identifying the most 'at risk' patients and may provide a window of opportunity for tailoring the therapeutic monitoring and management of those patients.

Conflicts of Interest:
The authors declare no conflict of interest.