Predicting Hypertension Subtypes with Machine Learning Using Targeted Metabolites and Their Ratios

Hypertension is a major global health problem with high prevalence and complex associated health risks. Primary hypertension (PHT) is most common and the reasons behind primary hypertension are largely unknown. Endocrine hypertension (EHT) is another complex form of hypertension with an estimated prevalence varying from 3 to 20% depending on the population studied. It occurs due to underlying conditions associated with hormonal excess mainly related to adrenal tumours and sub-categorised: primary aldosteronism (PA), Cushing’s syndrome (CS), pheochromocytoma or functional paraganglioma (PPGL). Endocrine hypertension is often misdiagnosed as primary hypertension, causing delays in treatment for the underlying condition, reduced quality of life, and costly antihypertensive treatment that is often ineffective. This study systematically used targeted metabolomics and high-throughput machine learning methods to predict the key biomarkers in classifying and distinguishing the various subtypes of endocrine and primary hypertension. The trained models successfully classified CS from PHT and EHT from PHT with 92% specificity on the test set. The most prominent targeted metabolites and metabolite ratios for hypertension identification for different disease comparisons were C18:1, C18:2, and Orn/Arg. Sex was identified as an important feature in CS vs. PHT classification.


Introduction
One of the main risk factors for cardiovascular disease is arterial hypertension. Arterial hypertension is a significant health problem that affects a wide population every year [1]. The underlying mechanisms of primary (essential) arterial hypertension are multiple and largely unknown. There are forms of so-called secondary hypertension, where arterial hypertension is one of the clinical manifestations of the underlying disease. Among those, we distinguish the endocrine hypertension cases, caused by hormonal hypersecretion mainly related to diseases of the adrenal glands. The latter are represented by primary aldosteronism (PA), Cushing's syndrome (CS), and pheochromocytoma/functional paraganglioma (PPGL), which are highly challenging to diagnose in the early stages [2]. The reason for this lies in the cumbersome diagnostic process, requiring complex pre-analytical procedures and expertise in the interpretation of the test results, making it less available for the high number of patients of this global pandemic. Metabolomics has already been successfully used in patients with endocrine-related hypertension [3][4][5] and recently our research group identified different metabolic fingerprint discrimination between primary and endocrine hypertension cases [6]. Metabolomics is a relatively new approach for the parallel and high-throughput identification and quantification of numerous low molecular weight molecules (metabolites). Whilst untargeted metabolomics identifies numerous molecules without prior knowledge of their presence, there is often a lack of quantification and definite biochemical annotation. In contrast, targeted metabolomics provides the advantage of reliable quantification of metabolites with known biochemical annotation making it more suitable for the diagnostic purpose [7].
Machine learning (ML) is capable of processing large datasets in a minimal time frame and can provide accurate clinical insights to aid physicians in diagnosis and treatments. In recent years, ML methods have been widely popular in medicine [8,9], biomarker discovery in high-dimensional omics data [10], and detecting signatures of disease in liquid biopsies [11]. Some studies investigated targeted metabolomics markers of preclinical Alzheimer's disease [12], psoriasis [13], and the detection of intrauterine growth restriction [14]. In the past, a variety of ML methods such as k-nearest neighbours, support vector machines, and decision trees have been evaluated for targeted metabolomics [15,16].
In this study, we investigated various supervised machine learning methods and evaluate their classification performance through overall classification accuracy, specificity, and sensitivity using the targeted metabolomics dataset previously published [6]. The dataset was also investigated within subsets of age and sex to evaluate its impact on the model training, prediction performance, and corresponding selected features. The most prominent metabolites and their ratios were identified for distinguishing various hypertension subtypes.

Omic Dataset
The metabolomics dataset was described in detail in our previous work [6]. Briefly, blood plasma samples were collected from 294 male and female patients between 16-78 years with one of the four underlying hypertension subtypes, (PA, PPGL, CS, and PHT). Of the 282 patients included in the final analyses (see the exclusion of outliers below), we had information on the presence of diabetes mellitus in 88.7% and BMI data for 86.9% of cases. Diabetes mellitus was present in 12% of cases, with a higher prevalence in patients with CS (26.7%) and PPGL (26.5%), as expected [17][18][19]. Obesity (BMI ≥ 30 kg/m 2 ) was present in 24.5% of patients, with the highest prevalence in patients with CS (40%), followed by PA (32.6%), PHT (22.4%), and PPGL (7.7%), in accordance with the literature [17][18][19]. The PA patients comprised of aldosterone-producing adenoma (APA) (n = 66), bilateral adrenal hyperplasia (BAH) (n = 36), and unknown (n = 5, adrenal venous sampling failed: 1 and refused: 4). The samples were provided by 11 centers of the ENS@T-HT consortium (http://www.ensat-ht.eu accessed on 1 June 2022). The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the local ethics committees of participating centers. Table 1 presents a breakdown of the patients by their disease subtypes for analysis, after the exclusion of outliers (see below). The specific inclusion and exclusion criteria for each hypertension subtype are provided in Appendix B. Table 1. Patient data for all disease types namely Cushing's syndrome (CS), primary aldosteronism (PA), pheochromocytoma or paraganglioma (PPGL), and primary hypertension (PHT). There was a significant difference in the distribution of patients according to sex (p < 0.001) and age (p = 0.006) between the disease groups. The difference was significant also when considering CS, PA, and PPGL in the common EHT group for sex (p = 0.009), but not for age (p = 0.088). For distribution difference analysis, the Pearson Chi-Square Test was performed using the SPSS ® Statistics v26.0 (IBM). The targeted metabolomics approach was based on LC-ESI-MS/MS and FIA-ESI-MS/MS measurements by AbsoluteIDQ TM p180 Kit (BIOCRATES Life Sciences AG, Innsbruck, Austria). The assay allows simultaneous quantification of 188 metabolites and includes free carnitine, 39 acylcarnitines, 21 amino acids (19 proteinogenic + citrulline + ornithine), 21 biogenic amines, hexoses (sum of hexoses-about 90-95% glucose), 90 glycerophospholipids (14 lysophosphatidylcholines (lysoPC) and 76 phosphatidylcholines (PC)), and 15 sphingolipids (SM). Further details are provided in Appendix C.

Disease
In addition to the investigated samples, five aliquots of a pooled reference plasma were analysed on each kit plate. The results of these reference plasma aliquots were used for the calculation of potential batch effects and data normalization. We included all metabolite measurements with peaks above the limit of detection, defined as three times the values of the zero samples, as well as those below this threshold if the respective peak was detectable visually. To ensure the comparability of received data between batches, each metabolite value was normalized as previously described [20,21]. Metabolites for which measurement values were valid in less than 3 of 5 reference plasma were excluded from normalization and further statistical analysis. We further excluded metabolites for which the coefficient of variance of reference plasma was >25% within and between batches (exceptions included 8 metabolites for which only the variance between batches, but not within, were only slightly above the predetermined cut-off prior normalization) and those metabolites for which values were not detectable in >40% of samples. From 188 metabolites, 155 passed these selection criteria. In addition to the 155 eligible metabolites, 18 pre-defined metabolite sums and ratios were eligible for further analyses (See Table A1 in Appendix A).
The missing values of the metabolites with <40% of undetectable data were estimated using the KNN method, considering each subgroup of clinical conditions separately [22].
Using the heatmap analysis method, we identified potential outliers among the studied patients as previously described [23], and those patients were excluded from the statistical analysis. In total, 282 patients were eligible for further analyses (See Table 1).
The missing data estimation and outlier detection were performed using the Metabo-Analyst platform [23]. The final dataset was catalogued in RDMP Software [24] for systematic access.

ML Analysis Pipeline
The small metabolites data was evaluated for five different disease comparisons namely All vs. All (i.e., PA vs. PPGL vs. CS vs. PHT), EHT (i.e., PA + PPGL + CS) vs. PHT, PA vs. PHT, PPGL vs. PHT, and CS vs. PHT (See Figure 1). Each of these comparisons was investigated for possible bias due to age and sex by creating six sets. These sets included: A. All patients, all metabolite features (including age and sex); B. All patients, all metabolite features (excluding age and sex); C. Male patients, all metabolite features (including age); D. Female patients, all metabolite features (including age); E. All patients (with age ≥ 50 years), all metabolite features (including sex); and F. All patients (with age < 50 years), all metabolite features (including sex). Set E and F were bifurcated based on average female menopausal age i.e., 50 years to understand the effect of patient age on metabolites. These segregated sets were also useful in comparing their respective significant discriminating features and using them for final model training.
The complete metabolomics dataset was randomly partitioned into 80% training and 20% testing sets (See Table A2 in Appendix A). The training set was used for the Monte Carlo Cross-Validation (MCCV) approach [35] and, therefore, further partitioned into 80% training and 20% validation sets. On the other hand, the testing set was only used to test the final model (See Figure 1). A set of five metrics: balanced accuracy (arithmetic mean between sensitivity and specificity) [36], sensitivity, specificity, F1 score (with beta = 1), and AUC were used to evaluate the classification performance. These were calculated using the confusionMatrix function from caret package [37].
The ML analysis pipeline was divided into three phases. Phase 1 studied the best feature selection and top classification algorithms using All vs. All disease comparison for set A (as they represent the complete dataset) with the MCCV approach. It used 100 random repeats (as in [38]) to train algorithms and then compared their average performance metrics (accuracy, sensitivity, and specificity) on the validation set. The ML analysis pipeline investigated (See Figure 1) three feature selection methods: (a) Using all features, (b) CFS: correlation-based feature selection [25], and (c) Boruta [26]; and eight different supervised learning classifiers (J48 [27], IBk [28], Bayes Net [29], Logitboost [30], Logistic Model Tree (LMT) [31], Simple Logistic (SL) [32], Random Forest (RF) [33], and Sequential minimal optimization (SMO) [34]).
The complete metabolomics dataset was randomly partitioned into 80% training and 20% testing sets (See Table A2 in Appendix A). The training set was used for the Monte Carlo Cross-Validation (MCCV) approach [35] and, therefore, further partitioned into 80% training and 20% validation sets. On the other hand, the testing set was only used to test the final model (See Figure 1). A set of five metrics: balanced accuracy (arithmetic mean between sensitivity and specificity) [36], sensitivity, specificity, F1 score (with beta = 1), and AUC were used to evaluate the classification performance. These were calculated using the confusionMatrix function from caret package [37].
The ML analysis pipeline was divided into three phases. Phase 1 studied the best feature selection and top classification algorithms using All vs All disease comparison for set A (as they represent the complete dataset) with the MCCV approach. It used 100 random repeats (as in [38]) to train algorithms and then compared their average performance metrics (accuracy, sensitivity, and specificity) on the validation set.
In Phase 2, the best feature selection and top 4 classifiers from Phase 1 are used to find the discriminating features (metabolites and their ratios) for remaining disease In Phase 2, the best feature selection and top 4 classifiers from Phase 1 are used to find the discriminating features (metabolites and their ratios) for remaining disease combinations with MCCV. The most selected features during the 100 random repeats are considered as top features and hence saved.
Finally, in Phase 3, the subset of top common features from the training set was downsampled (to avoid class imbalance) and then used for training the best-performing classifier (from Phase 2). This final classifier was then tested on the test set and the predictions were saved (for each disease comparison and set combination). All classifications were implemented with the RWeka package [39] in the R language [40].

Evaluation of Feature Selection Methods & Classifiers
Phase 1 of the ML analysis pipeline investigated ALL vs. ALL (PA vs. PPGL vs. CS vs. PHT) disease comparison using CFS and Boruta feature selection methods. The classification was also performed using all features (i.e., no feature reduction). Table 2 shows the mean values of five performance metrics (i.e., balanced accuracy, sensitivity, specificity, F1 score, and AUC) for all three feature selection approaches when used in conjunction with different classifiers across the 100 MCCV repeats. It was observed that using all features for classification provided the best metrics followed by Boruta and CFS methods. Although the mean accuracies for ALL vs. ALL disease comparisons are low, since it is a complex multi-class problem, still it is evident that Boruta being a wrapperbased method provides reasonably better classification than CFS. Tables A3-A6 show the classification performance for the remaining four disease combinations. Hence, Boruta was empirically selected for the rest of the ML analysis pipeline. Similarly, based on the metrics, SL, LMT, LB, and RF were selected as the top four classifiers. RF was selected instead of NB since it was able to provide a consistent performance irrespective of the choice of the feature selection method). Hence, Boruta and SL, LMT, LB, and RF were selected for Phase 2 of the analysis.

Classification Performance and Discriminating Features
In Phase 2 of the analysis, the classification performance and corresponding top discriminating features for the various disease comparisons were individually evaluated. In Set A and Set B, the highest accuracy (~82%) was observed for CS vs. PHT with SL and LMT. The corresponding F1 score and AUC were 0.8 and 0.9 respectively. On the other hand, RF provided the highest specificity (~92%) in CS vs. PHT (Set A). Although EHT vs. PHT had a low accuracy (~54%) and specificity (16%), it still was able to achieve high sensitivity (~93%) using SL in both Set A and B. The corresponding F1 score and AUC were 0.9 and 0.7 respectively. For ALL vs. ALL, SL and LMT achieved higher accuracy (~60%) and specificity (~80%) in comparison to LB and RF. Amongst the two sets, Set A provided better performance for all five metrics irrespective of the classifier used. As earlier in CS vs. PHT, both SL and LMT provided better performance for PA vs. PHT in comparison to RF and LB. For PPGL vs. PHT, LB and RF outperformed LMT and SL. Overall, there is no notable difference in any of the metrics values within Set A and Set B. This shows that age and sex did not appear as significant features in metabolites-based hypertension classification. In Set C vs. Set D, bifurcation based on patients' sex, higher accuracy was observed for CS vs. PHT in Set D (~73%) compared to Set C (~64%). However, the specificities for Set D were lower than Set C. Also, the corresponding sensitivities for Set D were higher than those compared to Set C. For EHT vs. PHT, PA vs. PHT, and PPGL vs. PHT, Set C had consistently higher accuracies than Set D except for a few classifiers in PPGL vs. PHT. The sensitivities for EHT vs. PHT, PA vs. PHT, and PPGL vs. PHT were higher for the female set (Set D) in comparison to the male set (Set C). The accuracies, sensitivities, and F1 scores for All vs. All were very low for both sets, however, the corresponding specificities were high. In Set A and Set B, the highest accuracy (~82%) was observed for CS vs PHT with SL and LMT. The corresponding F1 score and AUC were 0.8 and 0.9 respectively. On the other hand, RF provided the highest specificity (~92%) in CS vs PHT (Set A). Although EHT vs PHT had a low accuracy (~54%) and specificity (16%), it still was able to achieve high sensitivity (~93%) using SL in both Set A and B. The corresponding F1 score and AUC were 0.9 and 0.7 respectively. For ALL vs ALL, SL and LMT achieved higher accuracy (~60%) and specificity (~80%) in comparison to LB and RF. Amongst the two sets, Set A provided better performance for all five metrics irrespective of the classifier used. As earlier in CS vs PHT, both SL and LMT provided better performance for PA vs PHT in comparison to RF and LB. For PPGL vs PHT, LB and RF outperformed LMT and SL. Overall, there is no notable difference in any of the metrics values within Set A and Set B. This shows that age and sex did not appear as significant features in metabolites-based hypertension classification. In Set C vs Set D, bifurcation based on patients' sex, higher accuracy was observed for CS vs PHT in Set D (~73%) compared to Set C (~64%). However, the specificities for Set D were lower than Set C. Also, the corresponding sensitivities for Set D were higher than those compared to Set C. For EHT vs PHT, PA vs PHT, and PPGL vs PHT, Set C had consistently higher accuracies than Set D except for a few classifiers in PPGL vs PHT. The sensitivities for EHT vs PHT, PA vs PHT, and PPGL vs PHT were higher for the female set (Set D) in comparison to the male set (Set C). The accuracies, sensitivities, and F1 scores for All vs All were very low for both sets, however, the corre- Next, Set E was compared to Set F, where higher accuracies and AUC were observed for younger patients (Set F) only for CS vs. PHT. For other disease combinations, older patients (Set E) had higher accuracies. The specificities for CS vs. PHT and PPGL vs. PHT were higher for Set F than Set E, but opposite in the case of all other disease combinations. Overall, higher sensitivities were observed for EHT vs. PHT in Set F than Set E. Figure 3a shows the list of important metabolites (in green) and metabolite ratios (in pink) with the most common on top and used >50 times during MCCV for various sets within EHT vs. PHT disease classification. C18:1 and C18:2 were the two most prominent features for almost all sets except Set C. Almost similar features were selected for Set A and B. However, for Set C and D, Orn, Orn/Arg, and C9 were not selected for Set D, while C3-DC (C4-OH) was not selected for Set C. Notably, C9 was prominently selected only in Set C and not any other Set. In the case of Set F, three metabolites (C16, SM C16:0, and PC ae C32:2) were selected, which did not appear as prominent in any of the other Sets. On the other hand, Set E Spermidine was selected along with C18:1, C18:2, and Orn. within EHT vs PHT disease classification. C18:1 and C18:2 were the two most prominent features for almost all sets except Set C. Almost similar features were selected for Set A and B. However, for Set C and D, Orn, Orn/Arg, and C9 were not selected for Set D, while C3-DC (C4-OH) was not selected for Set C. Notably, C9 was prominently selected only in Set C and not any other Set. In the case of Set F, three metabolites (C16, SM C16:0, and PC ae C32:2) were selected, which did not appear as prominent in any of the other Sets. On the other hand, Set E Spermidine was selected along with C18:1, C18:2, and Orn.  Figure A1 in Appendix A shows a combined summary list of all features used for classifying the remaining disease combinations for all given sets (Set A-F). Figure 3b shows rank details of selected features during 100 MCCV repeats for EHT vs PHT disease classification based on Set A. Metabolite C18:2 was selected during all 100 MCCV repeats and ranked as second for 32 times, third for 55 times followed by 11 and 2 times in position four and four, respectively. Similarly, C18:1 was selected 99 times,  Figure A1 in Appendix A shows a combined summary list of all features used for classifying the remaining disease combinations for all given sets (Set A-F). Figure 3b shows rank details of selected features during 100 MCCV repeats for EHT vs. PHT disease classification based on Set A. Metabolite C18:2 was selected during all 100 MCCV repeats and ranked as second for 32 times, third for 55 times followed by 11 and 2 times in position four and four, respectively. Similarly, C18:1 was selected 99 times, however, it was ranked first 31 times and second 55 times, followed by 11 and 2 times. This indicates that although C18:2, it was selected more times than C18:1. However, still C18:1 was ranked higher 31 times in comparison to C18:2. In the case of Orn, Orn/Arg, and lysoPC, of C18:2, they are selected as 81, 72, and 59 times, respectively. Amongst the three, Orn was ranked higher consistently (rank third and fourth) and therefore should be considered more important due to its higher ranking. The ranking of all selected features and their frequency of selection during 100 MCCV thus provides a robust evaluation of the prominent discriminating features in disease classification. The corresponding results for the other four disease comparisons were shown in Appendix A (Figures A2-A5).

Final Model Training and Testing
In Phase 3 of the ML pipeline, the training set based on the list of selected features (from Phase 2) is used to train the best classifier (from Phase 1). Table 3 shows the classification results on the test set for the five disease combinations using the best-performing classifier. It also shows the distribution of the reduced feature set along with the balanced accuracy, sensitivity, specificity, F1 score, and AUC. CS vs. PHT provided the best classification (balanced accuracy: 83%, sensitivity: 75%, specificity: 92%) on the test set using the LMT classifier with a reduced set of 22 features (16 metabolites and 5 metabolite ratios and sex). Similarly, for EHT vs. PHT, 92% specificity was achieved although balanced accuracy, and specificity was 74% and 57%, respectively. In terms of age and sex as features, it is evident that age and sex were only selected for ALL vs. ALL and CS vs. PHT respectively and were not used for the training of the remaining three disease combinations' classifiers.
Finally, Table 4 shows the confusion matrix for the classification using the test set for CS vs. PHT disease combination. The values in the diagonal position show the number of correctly classified patients. For example, for CS vs. PHT, 6 CS and 11 PHT patients were correctly classified; however, in total three patients were misclassified. Tables A7-A10 show the confusion matrices for the test sets of the remaining four disease combinations.

Discussion
The application of machine learning has recently facilitated the use of high-throughput omics technologies in healthcare. In this study, we investigate the use of targeted metabolomics data for classifying and distinguishing the various subtypes of endocrine and primary hypertension using machine learning methods. From a clinical perspective, discriminating individuals with endocrine hypertension from primary hypertension is a challenging task that often involves intensive medical work-up and imaging protocols (See details in Appendix B). However, this study used a data-driven approach for identifying metabolomic patterns that can provide further insight into different hypertension subtypes without any other a priori information.
We investigated a range of disease comparisons in different sets using three feature selection methods and eight classifiers with the MCCV approach. Amongst the three feature selection methods, Boruta outperformed others in terms of classification performance as it is a wrapper-based method that detects interactions between features during selection. It evaluates the most optimal subset of features using its importance scoring mechanism [41]. On the other hand, CFS is a filter-based method that does not consider relationships between features during selection. Out of eight, four classifiers (LB, LMT, RF, and SL) provided better performance amongst all while using the same selected metabolomic features.
Our current results correspond well with our preliminary results [6] and also provide a more detailed and insightful feature ranking for each disease classification. For example, in the case of EHT vs. PHT, the common top metabolomic features were C18:2, C18:1, C9, C16, ornithine, spermidine, and ornithine/arginine, pointing to our possible association of acylcarnitine and bioamine metabolic disturbances in the pathogenesis of the morbidity and cardiovascular complications in patients with EHT, as discussed in our previous work [6]. Similarly, for other disease comparisons, distinct discriminating features emerged that can be further investigated. In particular, elevated long-chain acylcarnitines (e.g., C18:1, C18:2) have been observed in patients with heart failure and have been shown to play a role in disrupting cardiac electrophysiology and cell contractility as well as being associated with insulin resistance and diabetes mellitus. The identified amino acids and biogenic amines alterations in patients with endocrine hypertension may be related to increased inflammation and endothelial dysfunction, all of which may contribute together to the increased cardiovascular morbidity observed in EHT compared with PHT, as discussed previously [6]. Further studies are needed to clarify whether these findings are associated with a common pathogenic mechanism or are related to EHT. Instead of using a standardised ML pipeline, this work utilised a novel approach that used three phases to find a robust list of selected metabolomic features, which were used for model training and then evaluated on the test set. The selected features are not considered just based on their random repeat frequency but rather on the number of times a feature is selected along with its ranking, which provides greater insight into the most discriminating features. It was interesting to identify the variation in selected features based on the age of patients. For example, in the case of EHT vs. PHT disease combination, alongside common features (C18:1 and C18:2), a different combination of unique features was selected for patients younger than 50 years of age.
This machine learning-based study had few limitations. Firstly, class imbalance was observed in the acquired dataset. For example, fewer CS patients, since it is a rarer disease. To balance the classifier training, a downsampling approach was adopted, which led to the loss of samples from the majority class. This strong natural disbalance between different aetiologies can be improved in future by using advanced oversampling techniques such as Synthetic Minority Over-sampling TEchnique (SMOTE) [42] for ML model training. Secondly, due to the unavailability of an independent test dataset, the dataset was randomly partitioned into a training/testing dataset for MCCV (with 100 random repeats) approach for an extended validation. The reported results are based on the limited size of the cohort. Further, sensitivity for discrimination was not optimal in all subgroup analyses; it was best in discriminating EHT from PHT. Thus, while we were able to confirm the results of our previous work that our approach could potentially be used as a pre-screening test to identify patients requiring further endocrine testing by a specialist, namely the EHT group [6], it is not suitable for distinguishing the different endocrine entities from each other due to its low sensitivity ( Figure 2). Finally, within our study, we did not differentiate between distinct aetiologies of the hormonal excess in the EHT cases (e.g., adrenal or pituitary cause of cortisol excess, bilateral or unilateral PA).
While clinical presentation, further diagnostic procedures, and treatment will be dependent on the final diagnosis, the overall aim of this study was to evaluate the use of metabolites and their ratios for developing a prediction tool to distinguish the endocrine hypertension forms from primary hypertension as a first screening step in the evaluation of hypertension patients. The subtype classification of the aetiology of hormonal excess in endocrine hypertension cases was considered out of scope at this stage, however, in future studies, it would be interesting to analyse the potential of metabolomics for this purpose. Another study (currently in progress) with a larger prospective dataset would further help in understanding the top discriminating features and allow refinement of the machine learning-based modelling. In future prospective studies, it will be also of interest to analyse the role of metabolomics as a prognostic factor e.g., medical treatment outcome or risk of cardiovascular events in patients with arterial hypertension. Similarly, the most recently studied TroponinT, which is a widely used diagnostic marker for cardiac ischemia, has shown a promising role as a marker for predicting cardiac surgery outcomes [43].

Conclusions
This study classified different hypertension subtypes using targeted metabolomics and their ratios. The ML pipeline comprised of five disease comparisons and nine supervised learning algorithms that used different age and sex-based sets. Amongst all the different disease combinations, CS vs. PHT and EHT vs. PHT provided the highest specificity (92%) on the test dataset using LMT and RF classifiers respectively. The evaluation showed promising results with a reduced set of features, which can be further investigated in the future on a much larger prospective dataset.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by Ethikkommission an der TU Dresden (EK 407122010).

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data generated or analyzed during this study are included in this published article. Some datasets generated during and/or analyzed during the current study are not publicly accessible but are available from the corresponding author on reasonable request.

Acknowledgments:
We thank all participating centers from the ENSAT-HT consortium for contributing to patient recruitment. We thank Julia Scarpa, Werner Römisch-Margl, and Silke Becker for metabolomics measurements performed at the Helmholtz Zentrum München, Genome Analysis Center, Metabolomics Core Facility. We thank all members of the Genetics Department, Biological Resources Center, and Tumor Bank Platform, Hopital européeen Georges Pompidou (BB-0033-00063) for technical support.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results. Table A1. List of metabolites measured with the AbsoluteIDQ ® p180 Kit GAC, Helmholtz Zentrum München. Note: Complete list of the 188 metabolites. With the asterisk (*) are marked the 33 metabolites excluded after selection as described in the method section. With the double-asterisk (**) are marked 8 metabolites included in the analyses for which only the variance between batches, but not within the batches, were only slightly above the predetermined cutoff prior normalization. Abbreviations: Cx:y indicates the lipid chain composition, where "x" is the number of carbons and "y" the number of double bonds. LysoPC, lysophosphatidylcholine, PC, phosphatidylcholine; a, acyl; aa, diacyl; ae, acyl-alkyl; SM, sphingomyelin; SM(OH), hydroxysphingomyelin.