Kynurenine and Hemoglobin as Sex-Specific Variables in COVID-19 Patients: A Machine Learning and Genetic Algorithms Approach

Differences in clinical manifestations, immune response, metabolic alterations, and outcomes (including disease severity and mortality) between men and women with COVID-19 have been reported since the pandemic outbreak, making it necessary to implement sex-specific biomarkers for disease diagnosis and treatment. This study aimed to identify sex-associated differences in COVID-19 patients by means of a genetic algorithm (GALGO) and machine learning, employing support vector machine (SVM) and logistic regression (LR) for the data analysis. Both algorithms identified kynurenine and hemoglobin as the most important variables to distinguish between men and women with COVID-19. LR and SVM identified C10:1, cough, and lysoPC a 14:0 as discriminating men with COVID-19 from men without, with LR being the best model. In the case of women with COVID-19 vs. women without, SVM had a higher performance, and both models identified a higher number of variables, including C10:2, lysoPC a C26:0, lysoPC a C28:0, alpha-ketoglutaric acid, lactic acid, cough, fever, anosmia, and dysgeusia. Our results demonstrate that sex differences have implications for the diagnosis and outcome of the disease. Further, genetic and machine learning algorithms are useful tools to predict sex-associated differences in COVID-19.


Introduction
Sex differences in manifestations of viral infections have been observed for multiple respiratory viruses [1,2] where men have shown higher disease severity and mortality compared with women, including SARS-CoV [3], MERS-CoV [4], the H1N1 pandemic [5], and others. A recent meta-analysis of 3.1 million global cases showed that men have a nearly three times higher chance of being admitted to an intensive care unit (ICU) and a higher risk of dying, even though the incidence of COVID-19 infection is similar [6]. In addition, laboratory measures of routinely collected blood and urine samples from infected individuals have revealed differential patterns by sex and age [7]. Researchers have also looked at sex to understand the mechanisms behind the differences in COVID-19 outcome [8], with some studies focusing on the role of hormones, adipose tissue distribution, and metabolites [9][10][11][12].
In recent years, machine learning (ML) has been widely used for biomarker discovery [13][14][15]. Support vector machine (SVM), first proposed by Vapnik [16], has proved to be a powerful technique for pattern recognition, classification, and regression in many fields [17][18][19][20]. SVMs are supervised learners that construct models from available training data with a known classification. To obtain accurate class predictions, SVMs provide a number of free parameters that have to be tuned to reflect the requirements of a given task. Logistic regression (LR) is another technique borrowed by ML from the field of statistics [21]. Genetic algorithms (GA) are ML search techniques inspired by Darwinian evolutionary models. GA are metaheuristics that imitate the long-term optimization process of biological evolution for solving mathematical optimization problems [22].
ML has already been used to build survival and prognostic prediction models in cancer [15,23,24], Alzheimer's disease [25], and obstructive sleep apnea [26]. Similarly, efforts to develop novel diagnostic approaches for COVID-19 using ML algorithms have been proposed [27]. Despite not being focused specifically on sex differences, recent works have reported the use of artificial intelligence (AI) to predict COVID-19 outcomes using clinical data. Jiang et al. [28] used traditional ML methods such as decision tree (DT), random forest (RF), and SVM to predict disease progression to acute respiratory distress syndrome (ARDS) in COVID-19 patients with a 70%-80% overall accuracy. In another study, Xu et al. [29] tested five algorithms for modeling, including LR, RF, SVM, DT, and deep neural networks (DNN) leading to the identification of 19 risk factors to determine whether the patient would develop ARDS: severity evaluation at admission, sex, age, body mass index (BMI), temperature, cough, shortness of breath, hemoptysis, hypertension, diabetes, secondary bacterial infection, lung consolidation, lymphocyte count, CK, NLR, ALT, AST, LDH, and CRP. However, there are still limited data on sex differentials in COVID-19 outcomes.
Jiang et al. algorithmically identified the combinations of clinical characteristics of COVID-19 that predict outcomes, developing a tool with AI capabilities that predicted patients at risk for more severe illness on initial presentation. A mildly elevated alanine aminotransferase (ALT), the presence of myalgias, and an elevated hemoglobin were the clinical features, on presentation, that were the most predictive. The predictive models that learned from historical data of patients from the studied population achieved high accuracy levels in predicting severe cases [28]. Lu et al. employed a neural network algorithm to predict ICU admission, finding that C-reactive protein, lactate dehydrogenase, creatinine, white blood cell count, D-dimer, and lymphocyte count showed temporal divergence between COVID-19 patients hospitalized in the general floor who were upgraded to ICU compared to those who were not [30]. Similarly, Li et al. developed a deep neural network model and a risk-score system to predict ICU admission and in-hospital mortality. Prediction performance was assessed using the area under the receiver operating characteristic curve (AUC). In this study, the authors found that procalcitonin, lactate dehydrogenase, C-reactive protein, ferritin, and oxygen saturation were the top ICU predictors, while the top mortality predictors were age, lactate dehydrogenase, procalcitonin, cardiac troponin, C-reactive protein, and oxygen saturation [31]. By using machine learning, Hou et al. identified age, procalcitonin, C-reactive protein, lactate dehydrogenase, D-dimer, and lymphocytes as the top mortality predictors. The top six ICU admission predictors were procalcitonin, lactate dehydrogenase, C-reactive protein, pulse oxygen saturation, temperature, and ferritin [32].
Ancochea et al. [33] identified sex-dependent differences in clinical features, diagnosis, treatment, and hospital resource use associated with COVID-19 using Natural Language Processing and ML.
Understanding how sex could influence COVID-19 outcomes can have relevant implications for accurate clinical management and the implementation of mitigation strategies. The rapid development of automated diagnostic systems based on artificial intelligence and ML can thus contribute to increasing the speed of patient profiling to help improve the management of the COVID-19 pandemic.

Clinical data from a sample of 157 patients were extracted from the epidemiologic data set of patients admitted to the Respiratory Triage at the General Hospital of the Mexican Institute of Social Security from March to November 2020 in the city of Zacatecas. Plasma samples from these patients were collected at an early stage of the disease (four days on average after onset of symptomatology and prior to diagnosis). Forty individuals suspected of infection due to close contact with a COVID-19 case tested negative (18 men, 22 women) and 117 (68 men, 49 women) had a positive result using reverse transcriptase polymerase chain reaction (RT-qPCR); in these patients, plasma samples were collected within two days of hospitalization, prior to antibiotic use, if any. A description of the clinical features including demographic data, clinical symptoms, and laboratory variables is provided in Table 1. The study protocol was written in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Mexican Institute for Social Security (ID: R-2020-785-068).

Metabolites were measured using a locally developed LC-MS/MS metabolomics assay previously developed for urine and adapted to work with plasma [34]. Mass spectrometric analyses were performed on an ABSciex 4000 Qtrap tandem MS instrument (Applied Biosystems/MDS Analytical Technologies, Foster City, CA, USA) equipped with an Agilent 1260 series UHPLC system (Agilent Technologies, Palo Alto, CA, USA).
The method combines the derivatization and extraction of the analytes and the selective mass-spectrometric detection using multiple reaction monitoring (MRM) pairs. Amino acids, biogenic amines and derivatives, and organic acids were analyzed by a reverse-phase LC-MS/MS custom assay, while glycerophospholipids, acylcarnitines, glucose, and sphingomyelins were measured by direct injection (DI).

Sample Preparation
A working internal standard (ISTD) solution mixture in water (for amino acids, biogenic amines, carbohydrates, carnitines and derivatives, and phosphatidylcholines and their derivatives) was made by mixing all the prepared isotope-labeled stock solutions together. For organic acids, a working internal standard (ISTD) solution mixture in aqueous methanol was made. All standard solutions were aliquoted and stored at −80°C until further use. 2H-, 13C-, and 15N-labeled compounds were purchased from Cambridge Isotope Laboratories, Inc. (Tewksbury, MA, USA) and from Sigma-Aldrich (Oakville, ON, Canada). All other standards including lactic acid, beta-hydroxybutyric acid, alpha-ketoglutaric acid, citric acid, butyric acid, isobutyric acid, propionic acid, p-hydroxyhippuric acid, succinic acid, fumaric acid, pyruvic acid, hippuric acid, methylmalonic acid, homovanillic acid, indole-3-acetic acid, uric acid, and their isotope-labeled standards were all purchased from Sigma-Aldrich (Oakville, ON, Canada).
For organic acid analysis, 150 µL of ice-cold methanol and 10 µL of isotope-labeled internal standard mixture [34] were added to 50 µL of plasma sample for overnight protein precipitation at −20°C, followed by centrifugation at 13,000× g for 20 min. A total of 50 µL of supernatant was loaded into the center of wells of a 96-deep-well plate, followed by the addition of 3-nitrophenylhydrazine reagent. Butylated hydroxytoluene stabilizer (2 mg/mL) and water were added before LC-MS injection.
For amino acids and biogenic amines and derivatives, glycerophospholipids, acylcarnitines, and sphingomyelins, samples were thawed on ice and subsequently vortexed and centrifuged at 13,000× g; 10 µL of each sample was then loaded onto the center of the filter on the upper 96-well plate and dried in a stream of nitrogen. Subsequently, phenylisothiocyanate was added for derivatization. After incubation, the filter spots were dried again using an evaporator. Extraction of the metabolites was then achieved by adding 300 µL of extraction solvent.
For the analysis of organic acids, the mobile phases used were (A) 0.01% (v/v) formic acid in water, and (B) 0.01% (v/v) formic acid in methanol. The gradient profile was as follows: t = 0 min, 30% B; t = 2.0 min, 50% B; t = 12.5 min, 95% B; t = 12.51 min, 100% B; t = 13.5 min, 100% B; t = 13.6 min, 30% B; and finally maintained at 30% B for 4.4 min. The column oven was set to 40°C. The flow rate was 300 µL/min, and the sample injection volume was 10 µL.

DI-MS/MS Method
The LC autosampler was connected directly to the MS ion source by red PEEK tubing. The mobile phase was prepared by mixing 60 µL of formic acid, 10 mL of water, and 290 mL of methanol, and the flow rate was programmed as follows: t = 0 min, 30 µL/min; t = 1.6 min, 30 µL/min; t = 2.4 min, 200 µL/min; t = 2.8 min, 200 µL/min; and t = 3.0 min, 30 µL/min. The sample injection volume was 20 µL.

Quantification
To quantify organic acids, amino acids, biogenic amines, and derivatives, an individual seven-point calibration curve was generated for each analyte. The ratios of each analyte's signal intensity to its corresponding isotope-labeled internal standard were plotted against the specific known concentrations using quadratic regression with a 1/x² weighting.
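As an illustration of the calibration step, the following sketch fits a quadratic curve with 1/x² weighting and then inverts it to read a concentration from a measured analyte-to-internal-standard ratio. The function names, the seven concentration levels, and the response values are hypothetical and are not taken from the study, in which quantification was carried out in Analyst and MultiQuant.

```python
import math


def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def fit_weighted_quadratic(conc, ratio):
    """Fit ratio = a + b*conc + c*conc**2, minimising sum((1/conc**2) * residual**2)."""
    weights = [1.0 / (x * x) for x in conc]
    # Weighted normal equations for the polynomial basis 1, x, x**2.
    normal = [[sum(w * x ** (i + j) for w, x in zip(weights, conc)) for j in range(3)]
              for i in range(3)]
    target = [sum(w * x ** i * y for w, x, y in zip(weights, conc, ratio))
              for i in range(3)]
    return solve_linear(normal, target)


def concentration_from_ratio(coefs, ratio):
    """Invert the curve, taking the root that reduces to the linear solution as c -> 0."""
    a, b, c = coefs
    if abs(c) < 1e-12:
        return (ratio - a) / b
    disc = b * b - 4.0 * c * (a - ratio)
    return (-b + math.sqrt(disc)) / (2.0 * c)


# Hypothetical seven-point calibration (concentrations in arbitrary units).
levels = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0]
responses = [0.01 + 0.5 * x - 0.001 * x * x for x in levels]
coefs = fit_weighted_quadratic(levels, responses)
estimate = concentration_from_ratio(coefs, 4.91)  # ratio generated at level 10
```

The 1/x² weights give the low-concentration calibrators as much influence on the fit as the high ones, which is the usual motivation for this weighting when the calibration range spans several orders of magnitude.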
Lipids, acylcarnitines, and glucose were analyzed semiquantitatively. Single point calibration of a representative analyte was built, using the same group of compounds that share the same core structure, assuming linear regression through zero. All data analysis was conducted using Analyst 1.6.2 and MultiQuant 3.0.3. Metabolites with more than 50% of missing values were removed from further analysis.
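The missing-value rule above can be sketched as a simple filter. The table layout (one list of values per metabolite, with `None` marking a missing measurement), the function name, and the numbers are assumptions made for illustration only.

```python
def drop_sparse_metabolites(table, max_missing=0.5):
    """Keep only metabolites whose fraction of missing values (None) is <= max_missing."""
    kept = {}
    for name, values in table.items():
        missing = sum(v is None for v in values) / len(values)
        if missing <= max_missing:
            kept[name] = values
    return kept


# Hypothetical measurements for four samples.
plasma = {
    "kynurenine": [2.1, 2.4, None, 3.0],          # 25% missing: kept
    "taurine": [None, None, 55.0, None],          # 75% missing: removed
    "lactic acid": [1500.0, None, None, 1720.0],  # exactly 50% missing: kept
}
filtered = drop_sparse_metabolites(plasma)
```

Because the text removes metabolites with more than 50% missing values, a metabolite with exactly half of its values missing is retained in this sketch.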

Cytokine and Chemokine Quantification in Plasma Samples
We used a bead-based flow cytometry assay (Legendplex, Biolegend, San Diego, CA, USA) for the quantitative simultaneous determination of 13 analytes: IL-1β, IFN-α, IFN-γ, TNF-α, IP-10, IL-6, IL-8 (CXCL8), IL-10, IL-12p70, GM-CSF, IFN-β, and IFN-λ. The assays were performed according to the manufacturer's protocols and procedures. Precoated beads were dispensed in a 96-well filter plate and mixed with either the plasma samples or standards. Detection was performed with biotin-labeled antibodies and PE-Streptavidin detection reagents. The flow cytometry data were acquired on a FACS CANTO II flow cytometer (BD Biosciences, Franklin Lakes, NJ, USA) and analyzed with the FirePlex software (Biolegend, USA). Regression analysis was performed, and the limit of detection and limit of quantification were obtained for each molecule (R² > 0.995).

Descriptive Statistics
Medians (interquartile ranges [IQRs]) and frequency (%) were used to report healthy controls and patient baseline characteristics for continuous and categorical variables, respectively. This information is shown in Table 1. Normality was assessed by the D'Agostino-Pearson normality test. Continuous variables were compared using Mann-Whitney U tests or Kruskal-Wallis tests, and categorical variables (sex, smoking, symptoms, and comorbidities) were compared using the chi-square test for trend, with p values of less than 0.05 considered statistically significant. The analyses were conducted using GraphPad Prism version 8.0.1 for Windows (GraphPad Software, La Jolla, CA, USA).

Machine Learning Methodology
To assess clinical, immunologic, and metabolic associations with sex, a genetic approach was used to identify features that could be used in multivariate modeling to predict COVID-19 by sex. The proposed methodology is presented in Figure 1. It consists of four stages: (1) data are split into training/testing, and a blind test is used to search and find the best features to predict COVID-19 using a genetic algorithm; (2) representative models are created using SVM and LR; (3) model training and cross-validation are performed; and (4) the final SVM and LR models are tested on an unseen blind data set to establish the model robustness on new samples.
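To make the feature-search idea of stage 1 concrete, the sketch below runs a very small genetic algorithm several times and counts how often each feature ends up in the best subset. It is not GALGO and does not reproduce its operators: the nearest-centroid fitness function (used here in place of LR or SVM), the population size, the mutation rate, and the synthetic data are all assumptions chosen to keep the example short and dependency-free.

```python
import random
from collections import Counter


def centroid_accuracy(rows, labels, features):
    """Fitness: training accuracy of a nearest-centroid classifier on the chosen features."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(r[f] for r in members) / len(members) for f in features]
    correct = 0
    for r, y in zip(rows, labels):
        dist = {cls: sum((r[f] - m) ** 2 for f, m in zip(features, cen))
                for cls, cen in centroids.items()}
        correct += min(dist, key=dist.get) == y
    return correct / len(rows)


def evolve(rows, labels, n_features, size=3, pop_size=20, max_generations=30,
           goal_fitness=1.0, seed=0):
    """One evolutionary run: returns the best feature subset (chromosome) found."""
    rng = random.Random(seed)

    def fitness(chromosome):
        return centroid_accuracy(rows, labels, chromosome)

    population = [rng.sample(range(n_features), size) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(max_generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best = ranked[0]
        if fitness(best) >= goal_fitness:
            break
        parents = ranked[: pop_size // 2]        # selection: keep the fitter half
        children = []
        while len(children) < pop_size - 1:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, size)
            child = mother[:cut] + father[cut:]  # one-point crossover
            if rng.random() < 0.3:               # mutation: swap in a random feature
                child[rng.randrange(size)] = rng.randrange(n_features)
            while len(set(child)) < size:        # repair duplicated features
                child[rng.randrange(size)] = rng.randrange(n_features)
            children.append(child)
        population = [best] + children           # elitism: carry the best forward
    return best


def frequency_ranking(rows, labels, n_features, runs=10):
    """Repeat independent runs and count how often each feature is selected."""
    counts = Counter()
    for run in range(runs):
        counts.update(evolve(rows, labels, n_features, seed=run))
    return counts


# Synthetic data: feature 0 separates the classes; features 1-7 are noise.
data_rng = random.Random(42)
labels = [i % 2 for i in range(40)]
rows = [[10.0 * y + data_rng.uniform(-1, 1)] + [data_rng.uniform(-1, 1) for _ in range(7)]
        for y in labels]
counts = frequency_ranking(rows, labels, n_features=8)
```

Features that recur across many independent runs are the ones that contribute to good multivariate models, which is the rationale for ranking variables by selection frequency rather than by individual performance.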

Data Preparation and Feature Selection
Four patients were excluded from the ML analyses because they had several missing variables, whereas patients with a single missing variable had it imputed with the mean; the mean was calculated with respect to the subgroup to which each patient belonged, that is, whether they were male or female, control patient or outpatient, hospitalized or in critical condition due to COVID-19. Age and binary variables (0, 1) were excluded from data normalization, which consisted of converting values to z-scores representing standard deviations below or above the mean of a reference population, Equation (1), where x is the observed measure, µ is the population mean, and σ is the population standard deviation [35].

$$z = \frac{x - \mu}{\sigma} \qquad (1)$$

Figure 1, stage 1, shows the selection of variables using a genetic algorithm (GALGO, R package) designed to develop multivariate statistical models [36]. The parameters included were goalFitness = 1, maxBigBangs = 1000, and maxGenerations = 300; these were used with two classification methods, LR and SVM. One thousand models were generated to obtain a fitness goal closer to 1. With each model, the selected features produced a ranking graphic from the most to the least frequent to ultimately build the optimal model.

The first ML technique used was LR, a transformed version of linear regression using the logistic function, which is useful to model the probability of an event given other variables, namely, the probability of belonging to a group based on predicted probabilities from 0 to 1, and which is considered the standard classification method for binary problems [37]. The model takes real-valued inputs that are multiplied by weights, and their sum is passed to the logistic function, Equation (2), to obtain the probability of belonging to one or another group depending on a threshold value [38][39][40]:

$$P = \frac{1}{1 + e^{-z}}, \qquad z = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k \qquad (2)$$

where z is the linear sum of α plus β₁ times X₁ plus β₂ times X₂, and so on up to βₖ times Xₖ; the Xs are the independent variables of interest, α is the constant term, and the βᵢ (slopes) are the unknown parameters.
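The preprocessing and the logistic model described above can be sketched in a few lines. The variable names, the hemoglobin values, and the coefficients are hypothetical; in addition, the sketch standardizes against the sample itself, whereas the text refers to the mean and standard deviation of a reference population.

```python
import math
from statistics import mean, pstdev


def impute_by_subgroup(values, groups):
    """Replace None with the mean of the observed values in the same subgroup."""
    observed = {}
    for v, g in zip(values, groups):
        if v is not None:
            observed.setdefault(g, []).append(v)
    return [mean(observed[g]) if v is None else v for v, g in zip(values, groups)]


def z_scores(values):
    """Equation (1): z = (x - mu) / sigma, using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]


def logistic_probability(x, alpha, betas):
    """Equation (2): P = 1 / (1 + exp(-z)) with z = alpha + sum(beta_i * x_i)."""
    z = alpha + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical hemoglobin values (g/dL) with one missing entry.
hemoglobin = [15.0, None, 16.0, 13.0, 12.0, 14.0]
subgroup = ["men", "men", "men", "women", "women", "women"]
filled = impute_by_subgroup(hemoglobin, subgroup)
scaled = z_scores(filled)
# Hypothetical coefficients; in practice they are estimated from the training data.
p = logistic_probability([scaled[0]], alpha=0.0, betas=[1.0])
```

A predicted probability is turned into a class label by comparing it with a threshold, commonly 0.5, which corresponds to z = 0.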
The second ML technique was SVM, a group of algorithms used for classification and regression. It is a model that represents sample points in space and assigns a new sample to one of two classes by means of a separating hyperplane, which is determined by the points of the two classes that lie closest to it, called the support vectors [41]. Using the simple mathematical Equation (3), it allows linear division of the domain [41].

$$y = w \cdot x + \gamma \qquad (3)$$

Here, y is the optimal hyperplane, x is the vector of input variables, w is the normal vector of the hyperplane, and γ is the constant that indicates the position of the hyperplane with respect to the origin of coordinates; this constant is called the bias.
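A minimal sketch of how a linear hyperplane separates two classes is given below. It only evaluates a fixed hyperplane and does not train an SVM; the sign convention y = w · x + γ, the weights, and the sample points are assumptions for illustration.

```python
def decision_value(w, x, gamma):
    """Signed score y = w . x + gamma for one sample."""
    return sum(wi * xi for wi, xi in zip(w, x)) + gamma


def classify(w, x, gamma):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, x, gamma) >= 0 else -1


# Hypothetical hyperplane over two standardized variables.
w = [1.0, -2.0]
gamma = 0.5
side_a = classify(w, [2.0, 0.0], gamma)  # score 2.0 + 0.5 = 2.5
side_b = classify(w, [0.0, 1.0], gamma)  # score -2.0 + 0.5 = -1.5
```

Training an SVM amounts to choosing w and γ so that the margin between the hyperplane and the nearest points of each class is as large as possible.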
Ten experiments were performed, five using LR and five using SVM. The first three predicted patients without COVID-19 versus infected patients: first, all patients were included regardless of sex to identify all relevant characteristics associated with COVID-19 infection, followed by a second experiment with only women and a third with only men. The fourth and fifth experiments aimed at predicting sex among infected and non-infected patients, respectively.

Model Generation
Once the variables were selected by GALGO in the experiments, wrapper techniques were implemented. Forward selection (FS) was used; this is an iterative method that starts with no variables in the model and adds variables one by one, keeping each addition if it improves the performance of the model, until no further improvement in the classification model is achieved [42]. For each experiment, the possible models were presented, and the one that produced the highest level of prediction was chosen as the biomarker. This model is therefore capable of optimizing the prediction of the patient group. Two models were obtained for each of the experiments, corresponding to the LR and SVM models. This step was carried out within stage 2 (Figure 1).
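The forward selection step can be sketched as a loop over the ranked variables. The scoring function below is a toy stand-in for model performance, and scanning the whole ranked list (rather than stopping at the first variable that fails to improve the score) is one possible reading of the procedure, stated here as an assumption.

```python
def forward_selection(ranked_features, score):
    """Add features in ranked order, keeping each one only if the score improves."""
    selected = []
    best = score(selected)
    for feature in ranked_features:
        candidate = selected + [feature]
        value = score(candidate)
        if value > best:
            selected, best = candidate, value
    return selected, best


# Toy scoring function standing in for the performance of an LR or SVM model.
GAIN = {"hemoglobin": 3, "kynurenine": 2}


def toy_score(features):
    return sum(GAIN.get(f, 0) for f in features)


ranking = ["hemoglobin", "age", "kynurenine", "urea"]
chosen, final_score = forward_selection(ranking, toy_score)
```

In a real analysis `score` would fit the classifier on the candidate variables and return a performance estimate, so each call is considerably more expensive than in this toy example.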

Model Training and Validation
Once the best model was chosen, it was trained with 80% of the data set, and k-fold cross-validation was also performed (k = 5; Figure 1, stage 3), a useful technique to evaluate the effectiveness of the model that mitigates overfitting, in which the training data are divided into k subsets and, in each iteration, one of the subsets is used as test data and the rest as training data. Finally, the results of the iterations are averaged to obtain a single estimate of the performance of the proposed model.

Blind Testing
The model was then tested with the remaining 20% of the data, namely, with data unknown to the model, resembling an evaluation in an unknown population, for instance, from another state. This is shown in stage 4 of Figure 1.
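The 80/20 hold-out and the 5-fold cross-validation can be sketched as index bookkeeping. The function names, the random seed, and the placeholder scorer are assumptions; the sample size of 153 corresponds to the 157 participants minus the four excluded from the ML analyses.

```python
import random


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a blind test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_folds(indices, k=5):
    """Partition indices into k folds whose sizes differ by at most one."""
    return [indices[i::k] for i in range(k)]


def cross_validate(train_indices, fit_and_score, k=5):
    """Average the score obtained when each fold is held out once."""
    folds = k_folds(train_indices, k)
    scores = []
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        scores.append(fit_and_score(training, validation))
    return sum(scores) / k


train_idx, blind_idx = holdout_split(153)
folds = k_folds(train_idx)
# Placeholder scorer: a real one would fit LR or SVM on `training` and score `validation`.
mean_score = cross_validate(
    train_idx, lambda training, validation: len(validation) / len(train_idx)
)
```

The blind indices are never passed to `cross_validate`, which is what keeps the final test an evaluation on data the model has not seen.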

Models Evaluation Metrics
For cross-validation, training, and blind testing, models were evaluated using the following metrics:
Sensitivity: The ability of the test to detect the infection in infected individuals, calculated as the ratio of true positives (TP) to the sum of false negatives (FN) and true positives (TP), (Equation (4)).

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (4)$$

Specificity: The ability of the test to detect negative cases among the healthy, calculated by dividing the true negatives (TN) by the sum of the false positives (FP) and true negatives (TN), (Equation (5)).

$$\text{Specificity} = \frac{TN}{FP + TN} \qquad (5)$$

Receiver operating characteristic (ROC) curves were used to assess the overall performance of the models. The curve depicts the sensitivity as a function of the false positive rate (the complement of specificity). The area under the curve (AUC) is then calculated and interpreted as the probability that the model ranks a random positive example higher than a random negative example [43].
Accuracy, the fraction of predictions that the model made correctly [44], is calculated by dividing the number of correct predictions (TP and TN) by the number of all predictions (TP, TN, FP, and FN), (Equation (6)).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (6)$$
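The metrics above can be computed directly from the confusion counts, and the AUC from its rank interpretation. The labels, scores, and the 0.5 threshold in this sketch are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 1 (positive) and 0 (negative)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (fp + tn)


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores for five individuals.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
```

Unlike sensitivity, specificity, and accuracy, the AUC does not depend on the chosen threshold, because it only uses the ordering of the scores.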
The free statistical software R was used for the analyses and graphics.

Results
A total of 117 patients with confirmed COVID-19 and 40 negative individuals used as controls were enrolled in this study. For the analyses, participants were categorized into four groups: men without COVID-19 (12%), women without COVID-19 (14%), men with COVID-19 (43%), and women with COVID-19 (31%). Table 1 describes the study population stratified by sex, and in general the parameters are in line with previous reports [45]. Laboratory parameters showed decreased levels of lymphocytes in COVID-19 patients, especially among men. Monocyte counts were also decreased in COVID-19 patients, but only men had a statistically significant reduction. Conversely, levels of urea were higher in COVID-19 patients, reaching statistical significance among women with COVID-19. Clinical symptoms included fever, cough, and dyspnea in COVID-19 patients regardless of sex; similarly, no statistical differences in comorbidities were found by sex.

Comparison between COVID-19 Patients and Non-COVID-19 (Negative Controls)
First, we analyzed the differences between COVID-19 patients and those without COVID-19, regardless of sex. Appendix A Figure A1A,B depict the ranking of variables for the LR and SVM models. The selection process was completed by entering the highest ranked variables that improved the models' performance (Appendix A Figure A2A,B). When sex was not adjusted for, the SVM model included 21 variables. Relevant symptoms comprised cough, dysgeusia, anosmia, fever, and chest pain. Obesity was the most important comorbidity. Neutrophil, lymphocyte, and platelet counts were also relevant, as were the cytokines IL-10, IL-6, and IP-10. Metabolites included PC aa C36:6, C10:1, spermidine, lysoPC a 28:0, tryptophan, lysoPC a 26:0, lysoPC a 26:1, propionic acid, and butyric acid. For the LR model, only six variables were included: age, cough, dysgeusia, anosmia, lysoPC a 26:0, and SM C16:0. Thus, cough, anosmia, dysgeusia, and lysoPC a 26:0 were present in both models. A final model was constructed using the variables obtained from the forward selection process in both the SVM and LR algorithms. This model was then cross-validated with k = 5 and blind-tested on unseen samples to assess its performance on new samples. Appendix A Figure A3A,B depict ROC curves for both algorithms. Appendix A Table A1 shows the performance for each model with AUCs ranging from 0.93 to 0.98 for both SVM and LR.
Figure 2 shows highly ranked variables for the LR (Figure 2A) and SVM (Figure 2B) models for men and women with a COVID-19 diagnosis. Figure 3 illustrates the forward selection process used, resulting in three variables selected in the SVM model (i.e., hemoglobin, kynurenine, and taurine), and two in the LR model (i.e., hemoglobin and kynurenine). ROC curves for both models are presented in Figure 4. The models' performance was assessed with the AUC, ranging from 0.66 to 0.94 (Table 2).
Figure caption note (feature rank plots): The negative y axis shows the color-coded rank of each feature as each model was generated. The x axis shows the features ordered by rank. The starting color for each feature is assigned according to the feature's descending rank (from black down to white).

Comparison of COVID-19 Status by Sex
Next, with the aim of elucidating whether kynurenine and hemoglobin were strictly related to the disease, we compared men and women without COVID-19 (negative controls) by the same approach. Appendix A Figure A4 depicts highly ranked variables for the LR (Appendix A Figure A4A) and SVM (Appendix A Figure A4B) models for men and women without a COVID-19 diagnosis (i.e., control patients). Appendix A Figure A5A,B show the forward selection process used, resulting in 16 variables in the SVM model, and four in the LR model. Variables included in both models were C10:2, neutrophils, lymphocytes, and erythrocytes. ROCs are presented in Appendix A Figure A6. Appendix A Table A2 shows the AUC to assess the models' performance, which ranged from 0.66 to 1.
Figure caption note [36]: The logistic regression function was computed using an external implementation coupled to the GALGO library; therefore, the dashed/average lines for this model could not be computed.
Figure 5A,B show highly ranked variables for the LR and SVM models for women with and without a COVID-19 diagnosis. The forward selection process used to identify variables is shown in Figure 6A,B, which resulted in 29 variables for the SVM model and 12 variables for the LR model. Nine variables were selected in both models (i.e., C10:2, lysoPC a C26:0, lysoPC a C28:0, alpha-ketoglutaric acid, lactic acid, cough, fever, anosmia, and dysgeusia). ROCs are shown in Figure 7A,B, and Table 3 presents the AUC to assess the models' performance.
Model A was built with C10:2, cough, alpha-ketoglutaric acid, lysoPC a 26:0, lysoPC a 28:0, histidine, fever, anosmia, propionic acid, lysoPC a 18:2, dysgeusia, and lactic acid. Model B was built with C10:2, cough, anosmia, dysgeusia, proline, arginine, lysoPC a C26:0, lysoPC a C28:0, trans-hydroxyproline, alpha-ketoglutaric acid, lactic acid, fever, succinic acid, lysine, lysoPC a C28:1, indoleacetic acid, IP-10, IL-6, kynurenine, choline, acetylornithine, lysoPC a C26:1, methylhistidine, sarcosine, glutamic acid, ornithine, C10:1, and glucose.
Figure 8A,B show highly ranked variables for the LR and SVM models, respectively, for men with and without a COVID-19 diagnosis. The forward selection process is shown in Figure 9A,B, resulting in four variables in the SVM model and eight in the LR model. Three variables were identified by both models (i.e., C10:1, cough, and lysoPC a C14:0). The ROC curves for both models are shown in Figure 10A,B. Table 4 presents the AUC to assess the performance of the models, ranging from 0.96 to 1.
Figure caption note: The plot shows the number of times each feature was included in a given model (the frequency ranking).

Discussion
Genetic algorithms and machine learning are useful tools that assist researchers and medical professionals in screening, detecting, and predicting several diseases, including COVID-19. Here, they were used to identify potential sex differences associated with COVID-19 using numerous symptoms, metabolites, and cytokines measured in 157 individuals. The genetic algorithm was used to build a multivariate model able to predict and classify COVID-19 patients along with two machine learning algorithms (SVM and LR) that have been extensively tested for classification tasks [46][47][48][49][50]. One of the key advantages of the proposed methodology is that a genetic algorithm searches for the combined classification power of the features rather than for the individual performance of each feature (Table 1).
When comparing COVID-19 patients with control individuals without sex stratification, cough, anosmia, dysgeusia, and lysoPC a 26:0 were identified by both SVM and LR, with similar performance and sensitivity, even though the former required 21 variables instead of only 6 for the latter. Zoabi and colleagues also developed a model for predicting COVID-19 using machine learning that included fever and cough as the most important symptoms [51]. Similarly, Tandan and colleagues found that fever, cough, pneumonia, and sore throat were the most frequent features using a rule-based machine learning technique called association rule mining [52]. Other authors have reported lipid dysregulations in COVID-19 patients, as found in this study, such as that of glycerophospholipid metabolism [53][54][55].
Studies of COVID-19 patients have shown that men have a higher risk of developing severe illness, as well as of dying, compared with women [56]. The underlying mechanisms for such differences, reflected also in clinical symptoms, metabolic alterations, and immune response, remain insufficiently understood. Therefore, more research should focus on the role of sex as a relevant factor in COVID-19. The method used was GALGO [57], an R package for multivariate variable selection based on genetic algorithms that produces thousands of model combinations, keeping only the best variables in the final model. Here, kynurenine and hemoglobin in combination drove the differences between men and women with COVID-19. The inclusion of hemoglobin should not be surprising, even though levels fell within the normal range in this study, as others have reported a significant drop in hemoglobin values associated with disease severity [57]. Other authors have also reported remarkably low levels of hemoglobin and albumin in COVID-19 patients, probably due to the rapid turnover of red blood cells that led to hemoglobin degradation [58]. So, apart from the classic pulmonary immune-inflammation explanation, the occurrence of an oxygen-deprived blood disease (i.e., hemoglobinopathies with iron metabolism dysregulation) appears to be playing a major role in the pathophysiology of this infection [59]. From the metabolic point of view, the simultaneous inclusion of hemoglobin and kynurenine can be expected. Kynurenine is a metabolic product of tryptophan degradation. In COVID-19 patients, there is a decrease in tryptophan levels and an increase in kynurenine compared with non-infected individuals, and there is a positive correlation between tryptophan levels and hemoglobin in men and women with COVID-19. The mechanism behind this in COVID-19 individuals is an increase in indoleamine 2,3-dioxygenase activity.
Cytokine-induced (i.e., interferon-γ and tumor necrosis factor-α) tryptophan degradation via this enzyme suppresses erythropoiesis. Tryptophan is a nutritional pyrrole source essential for hemoglobin synthesis, and therefore the enhanced degradation of tryptophan is involved in the drop in blood hemoglobin levels and in the further development of anemia [60,61]. On the other hand, the kynurenine pathway plays a crucial role in the regulation of the immune response, notably as a counter-regulatory mechanism in the context of inflammation. It has been observed that the immune response against SARS-CoV-2 is different between men and women [62]. A previous article has linked metabolic markers with these immune differences in both sexes (i.e., kynurenic acid, another metabolite involved in the kynurenine pathway) [12]; the authors found that kynurenic acid correlated well with several immune markers only in male patients. In conclusion, the proposed model to address differences between men and women with COVID-19 also detected one metabolite belonging to the kynurenine pathway. Metabolites in the kynurenine pathway are critical immunomodulators, contributing to the immunosuppressive activity of dendritic cells and to CD8+ T cell suppression. Therefore, activating this pathway may allow SARS-CoV-2 to evade immunity [63]. The differential expression of kynurenine in men and women could be interpreted as a different activation of indoleamine 2,3-dioxygenase (IDO), which in turn is a consequence of the production of IFNs and other cytokines. A recent study reported that after TLR7 stimulation, IFN levels were lower in men compared with women.
Toll-like receptor 7-mediated IFN expression may be decreased in men due to the known negative effects of testosterone on IFN expression [64]. Sex-disaggregated data are not widely available, and consistent reporting of data by sex is currently limited. We provide here predictive models for men and for women, since we demonstrated that there are sex-specific differences in COVID-19 patients. The model that included C10:2, lysoPC a 26:0, lysoPC a 28:0, alpha-ketoglutaric acid, lactic acid, cough, fever, anosmia, and dysgeusia differentiated well between women with and without COVID-19; some of these variables have been reported in earlier studies. For instance, anosmia and dysgeusia have been found more frequently in women than in men [33]. Between men with and without COVID-19, cough and the metabolites lysoPC a 14:0 and C10:1 were the predictors included in the models. In an earlier study, lysophosphatidylcholine (14:0) was reported to be negatively associated with COVID-19 [65,66]. Decreased plasma lysophosphatidylcholine (LPC) levels have been associated with unfavorable disease outcomes, such as sepsis mortality and hospital mortality in patients with pneumonia [67]. LPCs have been recognized as important homeostatic mediators involved in inflammation and the activation of immune cells. Furthermore, they have been found to act as strong chemoattractants for monocytes, T cells, and natural killer (NK) cells, drawing them to inflammation sites [67]. Lysophosphatidylcholine (14:0) was also found to be negatively associated with COVID-19 in a study performed by Cai et al. [12]. None of our models differentiating men with COVID-19 from men without identified cytokines/chemokines as important features, whereas in women, SVM identified IP-10 and IL-6 as important features to discriminate between those with COVID-19 and those without. This may be further evidence of sex-related differential immune activation.
Studies agree that male sex is a strong risk factor for increased mortality, together with other factors such as circulating hormones, ACE receptors, immune incompetence, older age, and comorbidities such as diabetes and cardiovascular disease [68,69]. Identifying the mechanisms of sexual dimorphism in SARS-CoV-2 could thus provide important information about the physiopathology of COVID-19.
Model performance dropped in blind tests with "new patients". Moreover, when blind-tested, SVM performed worse than LR, indicating a tendency to overfit the data. Blind testing showed that the best metrics were obtained with LR, as this method builds models with fewer variables and higher statistical significance. Conversely, SVM showed higher performance in terms of model accuracy. Nevertheless, both models retained the statistical power to differentiate and classify COVID-19 patients. If accuracy is taken as an important criterion for comparing models in this domain, it can be argued that SVM performs better than LR. However, regardless of the classification model used, statistically significant metrics and appropriate performance were achieved.
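The drop between training and blind-test performance described above can be quantified in a few lines. The following is a minimal Python sketch, not the pipeline used in this study (which relied on GALGO in R); the function names and the label vectors are hypothetical and were invented purely for illustration.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Overall accuracy: 1 minus the fraction of misclassified samples."""
    errors = sum(t != p for t, p in zip(y_true, y_pred))
    return 1 - errors / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly classified samples within each true class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def blind_test_drop(train, blind):
    """Training accuracy minus blind-test accuracy.

    `train` and `blind` are (y_true, y_pred) pairs. A large positive value
    suggests that the model overfits the training data.
    """
    return overall_accuracy(*train) - overall_accuracy(*blind)


# Hypothetical labels (1 = COVID-19, 0 = no COVID-19); not study data.
train_true = [1, 1, 1, 1, 0, 0, 0, 0]
train_pred = [1, 1, 1, 1, 0, 0, 0, 1]
blind_true = [1, 1, 0, 0]
blind_pred = [1, 0, 0, 1]

drop = blind_test_drop((train_true, train_pred), (blind_true, blind_pred))
print(overall_accuracy(train_true, train_pred))  # 0.875
print(overall_accuracy(blind_true, blind_pred))  # 0.5
print(drop)                                      # 0.375
```

Reporting the per-class accuracy alongside the overall value helps to show whether a drop in blind testing is driven by one class only.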

Conclusions
There are clinical and metabolic differences between men and women with COVID-19. The machine learning classification methods used identified statistically significant variables that predicted sex-specific characteristics. This work reinforces the need to take sex differences into account to accurately diagnose COVID-19 infection. A combination of kynurenine and hemoglobin could, for instance, help discriminate between infected men and women, revealing underlying metabolic and hematologic differences associated with biological sex. We found that different metabolites are needed to discern COVID-19 in women and in men. In women, a panel consisting of C10:2, lysoPC a 26:0, lysoPC a 28:0, alpha-ketoglutaric acid, lactic acid, cough, fever, anosmia, and dysgeusia differentiated well between women with and women without COVID-19. In men, cough, lysoPC a 14:0, and C10:1 were the most useful variables to differentiate COVID-19. However, we have to acknowledge some limitations. One limitation is the cross-sectional, exploratory nature of the study design, which prevented a longitudinal metabolite assessment as well as monitoring of clinical and hematological perturbations. Furthermore, we did not measure sex hormones, which may play a role in the differences found between women and men with COVID-19. Finally, because the necessary data were not collected, we did not analyze the impact of gender.

Informed Consent Statement:
Written informed consent was obtained from all participants. All patients included in this study were informed in writing regarding the collection of their samples for research aims and given the right to refuse such uses.

Data Availability Statement:
The data presented in this study are openly available in Mendeley at doi: 10.17632/x9tw3knwsd.1.

Figure A2. Forward selection model construction for the prediction of COVID-19: (A) logistic regression and (B) support vector machine models. Forward selection started with the most frequent feature obtained in the genetic search and then added one feature at a time; after each addition, the accuracy of the model was obtained, and the feature was kept if it increased the accuracy. The vertical axis shows the classification accuracy. The solid line represents the overall accuracy, calculated from the number of misclassified samples divided by the total number of samples. Colored dashed lines represent the accuracy per class [36]. Note: The logistic regression function was computed using an external implementation coupled to the GALGO library; therefore, the dashed/average lines for said model could not be computed properly.

Model A was built with cough, lysoPC a 26:0, dysgeusia, anosmia, SM16.0, age. Model B was built with cough, dysgeusia, anosmia, neutrophils, chest pain, PC aa C36:6, obesity, lymphocytes, C10:1, spermidine, lysoPC aa C28:0, fever, tryptophan, platelets, lysoPC aa C26:0, IL-10, lysoPC aa C26:1, IL-6, propionic acid, IP-10, butyric acid.

Figure A4. Feature-rank stability in 1000 models for the classification of non-COVID-19 men and women: (A) logistic regression model; (B) support vector machine model. The positive y axis shows the number of times each feature was included in a given model (the frequency ranking). The negative y axis shows the color-coded rank of each feature as each model was generated. The x axis shows the features ordered by rank. The starting color for each feature is assigned according to the feature's descending rank (from black down to white) [36]. The logistic regression function was computed using an external implementation coupled to the GALGO library; therefore, the dashed/average lines for this model could not be computed.
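The forward selection procedure described in the caption of Figure A2 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the GALGO implementation: a nearest-centroid classifier scored on the training samples stands in for the logistic regression and support vector machine models, and the feature ranking and toy data are hypothetical.

```python
from statistics import mean


def centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier that uses only
    the columns listed in `features` (a stand-in for LR or SVM)."""
    classes = sorted(set(y))
    centroids = {
        c: [mean(row[f] for row, label in zip(X, y) if label == c)
            for f in features]
        for c in classes
    }
    correct = 0
    for row, label in zip(X, y):
        point = [row[f] for f in features]
        predicted = min(
            classes,
            key=lambda c: sum((p - q) ** 2
                              for p, q in zip(point, centroids[c])),
        )
        correct += predicted == label
    return correct / len(y)


def forward_selection(X, y, ranked_features, score=centroid_accuracy):
    """Start with the top-ranked feature, then try the remaining features
    one at a time in rank order; keep a feature only if it raises accuracy."""
    selected = [ranked_features[0]]
    best = score(X, y, selected)
    trace = [(ranked_features[0], best, True)]
    for feature in ranked_features[1:]:
        candidate = score(X, y, selected + [feature])
        keep = candidate > best
        trace.append((feature, candidate, keep))
        if keep:
            selected.append(feature)
            best = candidate
    return selected, best, trace


# Hypothetical toy data: three features per sample, two classes.
X = [(0, 5, 0), (1, 5, 0), (4, 5, 0),   # class 0
     (5, 5, 3), (6, 5, 3), (2, 5, 3)]   # class 1
y = [0, 0, 0, 1, 1, 1]
ranked = [0, 1, 2]  # assumed frequency ranking from the genetic search

selected, best, trace = forward_selection(X, y, ranked)
print(selected)  # [0, 2]
print(best)      # 1.0
```

In this toy example the second-ranked feature is constant across samples and is therefore rejected, whereas the third raises the accuracy from 4/6 to 1.0 and is kept. In practice the accuracy would be estimated on held-out samples rather than on the training data.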