Autoimmune diseases (ADs) are a diverse group of chronic disorders, including rheumatoid arthritis (RA), Hashimoto’s thyroiditis (HT), psoriasis (PSO), vitiligo (VIT) and inflammatory bowel diseases (IBD), that are caused by the immune system’s loss of self-tolerance and currently affect 5–10% of the population [1
ADs can be organ-specific or systemic, leading to different health complications and disabilities. Prevalence estimates for ADs vary because the group is highly diverse and new conditions are continually added to the category of ADs and related conditions [2]. As stated in the report of the Autoimmune Diseases Workshop of the European Parliament, data on the epidemiology of ADs are insufficient and limited to only some ADs, providing only part of the picture [3
ADs are a major public health problem because they are often accompanied by musculoskeletal problems that impair quality of life and account for a significant number of disability-adjusted life years (DALYs) lost, with a great economic and mental impact [4]. ADs rarely occur in isolation; rather, they tend to co-occur with other ADs. For example, patients with autoimmune thyroiditis (Graves’ or Hashimoto’s disease) have at least one more AD, with RA being the most prevalent [6], or cardiometabolic complications that further aggravate the disease [7
ADs share common features and molecular pathways that are linked to the loss of self-tolerance by the immune system. Dysregulated immune responses characterized by increased auto-reactive T cells and reduced regulatory T cells lead to non-resolving, low-grade chronic inflammation, which is a hallmark of ADs. Depending on the type of AD, different autoantigens have been recognized, facilitating the diagnosis of the disease [8]. Diagnosis of an AD can be time-consuming and expensive, given that ADs can manifest with many different symptoms that require the consultation of several different specialists before a diagnosis is reached, especially when the AD can be triggered by a specific treatment, for example antiviral treatments [9
Another important issue with ADs is that they are characterized by relapses and remissions, which manifest as increases and decreases in immune response markers, including TNFα [10]. As a consequence, studies aiming to identify biomarkers that can monitor disease severity or treatment efficacy should rely on mechanisms that provide a systemic overview of the cells and organism as a biological system rather than a localized immune response.
Metabolomics has attracted increasing attention in the field of biomarker discovery because it captures the interaction of genes and environmental triggers that is expressed at a given time and thus can have clinical application. Additionally, sample collection is minimally invasive, as it can be performed on urine or blood, in addition to more location-specific samples such as cerebrospinal fluid (CSF) [12]. Last but not least, metabolomics is a low-cost method that allows repeated measurements in a short time, enabling close monitoring of the metabolic state of the patient in response to disease and treatment adjustments [13
Organic acids (OAs) are intermediate metabolites of critical cellular metabolic pathways, including but not limited to energy production in the mitochondria via the citric acid or tricarboxylate (TCA) cycle, the metabolism of carbohydrates and proteins, ketone body metabolism and other related pathways [14]. Additionally, selected OAs have been linked to microbiome status, antioxidant capacity, the metabolism of neurotransmitters and vitamin bioavailability. OAs can also provide valuable information on nutritional and vitamin adequacy, drug metabolism and microbiome imbalances [15]. Previous studies analyzing organic acids in autoimmune diseases have contributed significantly to biomarker discovery, though the limited number of studies and the lack of replicated findings have hampered their validation [16
The aim of the present study was to identify metabolic changes in the OAs of individuals with ADs and develop predictive algorithms for the presence of ADs through the integration of metabolomics and artificial intelligence (AI).
We performed targeted metabolomics using gas chromatography-mass spectrometry in an exploratory case-control study of prevalent ADs (RA, THY, PSO, VIT, IBD, MS) and other, less prevalent ADs (OTHER). Correlation and pathway analysis demonstrated metabolite-metabolite correlations and inter-pathway changes in ADs, and a predictive algorithm was developed to estimate the probability of AD presence based on the OA profile.
Metabolomics is an emerging tool for the prediction and early diagnosis of autoimmune diseases since it can capture metabolic changes that are associated with the presence of, or predisposition to a disease.
In the present study, we have quantitatively assessed the organic acids profile of patients with ADs, based on which we developed predictive models that reached 92.6% accuracy for patients with AD(s). Previously, we have shown that the integration of artificial intelligence with metabolomics analysis of fatty acids can identify metabolic biomarkers associated with the presence of ADs [18
Comparative analysis of urine OAs in patients with ADs and controls demonstrated statistically significant differences in succinic acid, malic acid, pyroglutamic acid, methylmalonic acid, 2-hydroxyglutaric acid, 2-hydroxyisobutyric acid, 2-hydroxybutyric acid, methylcitric acid and 4-hydroxyphenylpyruvic acid, which remained significant after the Bonferroni correction (Table 1). False discovery rate (FDR) analysis was also performed to identify the affected metabolites after adjusting for multiple comparisons, and lactic acid, 2-hydroxyisobutyric acid, malic acid and 3-hydroxybutyric acid reached statistical significance (FDR < 0.05, fold change (FC) > 1.5). Succinic acid and malic acid are key components of the TCA cycle, and in our results they were markedly decreased in the ADs group compared to the control. Notably, all the metabolites of the TCA cycle were downregulated in the ADs group (citric acid, isocitric acid, oxoglutaric acid, fumaric acid and oxalic acid), even though these differences did not reach statistical significance. The TCA cycle is the central metabolic pathway of the cell for energy production. It is fueled by the catabolism of macronutrients (amino acids, carbohydrates, lipids) and by ketone metabolism. At the same time, TCA intermediates serve as substrates for other metabolic networks, the majority of which are summarized in Figure 7.
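The two multiple-testing adjustments mentioned above can be illustrated with a minimal sketch. It assumes the FDR step follows the Benjamini-Hochberg procedure, which the text does not state explicitly, and the p-values are hypothetical placeholders rather than the study's data.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four metabolites (not the study's data).
raw = [0.001, 0.012, 0.03, 0.2]
bonf = bonferroni(raw)
fdr = benjamini_hochberg(raw)

n_bonf = sum(p < 0.05 for p in bonf)  # tests passing the Bonferroni threshold
n_fdr = sum(p < 0.05 for p in fdr)    # tests passing the FDR threshold
```

With the same raw p-values the FDR adjustment is less conservative than Bonferroni, so the two procedures can flag different sets of metabolites, particularly when an additional fold-change filter is applied.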
In our study, levels of TCA intermediate metabolites were lower in the AD group compared to the control, suggesting reduced energy production and disrupted fueling of the TCA-linked pathways. A possible explanation of these findings is the consumption of nutrient-poor foods due to poor dietary habits, or reduced food intake due to pain and discomfort, leading to insufficient intake of micronutrients that act as co-factors in metabolism. On the other hand, even if nutrient-dense foods are consumed, malabsorption is very common among patients with ADs, particularly IBD and other ADs with gastrointestinal complications. In such cases, vitamins and other micronutrients are not effectively absorbed in the gastrointestinal tract, resulting in reduced transport and use in metabolism.
Levels of pyroglutamic acid or 5-oxoproline, a metabolite of the glutathione cycle that is converted to glutamate by 5-oxoprolinase, were statistically significantly lower in the group of ADs compared to the control. Reduced levels of pyroglutamic acid could indicate low glutathione recycling caused by the insufficiency of dietary amino acids that are required for glutathione synthesis or high glutathione depletion due to the upregulation of detoxification mechanisms [19]. Pyroglutamic acid is also important for free amino acid transportation, and lower pyroglutamic acid levels have been associated with type 2 diabetes and increased glucose levels [20
2-hydroxybutyric acid (α-hydroxybutyric acid) is naturally produced from α-ketobutyrate (2-oxobutanoate) as a byproduct of glutathione anabolism when cystathionine is converted to cysteine. α-Ketobutyrate also derives from the degradation of methionine and threonine (Figure 7). 2-hydroxybutyric acid mainly originates in hepatic cells and reflects the glutathione synthesis flow in conditions of metabolic or oxidative stress, and it has been suggested as an early marker of insulin resistance and impaired glucose regulation. In our study, 2-hydroxybutyric acid was markedly decreased in patients with ADs compared to the control, in line with the changed levels of pyroglutamic acid. These findings indicate a significant disruption of the glutathione cycle and possibly reduced glutathione synthesis and reduced detoxification capacity.
Methylmalonic acid (MMA) is a downstream metabolite of MMA-CoA that participates in the metabolic pathways of vitamin B12 (cobalamin) and is a known marker of vitamin B12 bioavailability [22]. In our study, methylmalonic acid was markedly decreased in patients with AD compared to the control, indicating a perturbed metabolic pathway of vitamin B12. MMA also plays a role in the biosynthesis of pyrimidines (pyrimidine metabolism), in propanoate metabolism and in the synthesis of valine, leucine and isoleucine.
2-hydroxyglutaric acid, a widely used marker for gliomas [23], is naturally produced from 2-ketoglutaric (2-oxoglutaric) acid in butanoate metabolism. Abnormal accumulation of 2-hydroxyglutarate is observed in hydroxyglutaric acidurias, inborn errors of metabolism characterized by neurometabolic manifestations. In the present study, 2-hydroxyglutaric acid was statistically significantly higher in the ADs group compared to the control. Although the effect of elevated 2-hydroxyglutaric acid in nerve cells has not been deciphered, several links have been proposed, including the promotion of oxidative damage, myelin degradation and the disturbance of energy metabolism in nerve cells [24
2-hydroxyisobutyric acid (α-hydroxyisobutyric acid) was found to be statistically significantly increased in patients with ADs compared to the control. According to the prevailing view, 2-hydroxyisobutyric acid is not an endogenous metabolite but a byproduct of methyl tert-butyl ether, which is acquired from the environment, and it is rapidly excreted from the body. However, recent studies indicate that 2-hydroxyisobutyric acid is associated with human health [25] and suggest that its levels are strongly correlated with endogenous metabolites, indicating an endogenous origin [27
3-hydroxybutyric acid (β-hydroxybutyric acid) is a ketone body, as is acetoacetic acid; ketone bodies are formed in the liver from fatty acids during periods of fasting and carbohydrate-restricted diets. Ketone bodies can also be formed after intensive exercise, after excessive alcohol consumption or in type 1 diabetes. Their natural role is to fuel the citric acid cycle to provide energy, or they can be converted into long-chain fatty acids in the brain. The group of ADs had elevated levels of 3-hydroxybutyric acid, which reached statistical significance after FDR adjustment (Figure 2). Elevated levels of 3-hydroxybutyric acid are a clinical marker of ketoacidosis and disturbed insulin sensitivity in fasted and diabetic patients. Therefore, markers of insulin sensitivity, including 3-hydroxybutyric acid and 2-hydroxybutyric acid, may have application in ADs, given the close interrelationship between elevated insulin levels, which reduce lipolysis, and the excessive fatty acid storage that results in local inflammation [28
4-hydroxyphenylpyruvic acid (4-HPPA) is a keto acid involved in the tyrosine catabolic pathway. In particular, 4-HPPA is biosynthesized from L-tyrosine by tyrosine aminotransferase. Subsequently, 4-HPPA can be converted into homogentisic acid by 4-hydroxyphenylpyruvate dioxygenase (HPPD). Homogentisic acid contributes to the regulation of the tocopherol and tocotrienol biosynthetic pathway (vitamin E biosynthesis). Moreover, 4-HPPA, via its multistep conversion into 4-hydroxybenzoate, is related to the ubiquinone biosynthetic pathway. Ubiquinone, also known as coenzyme Q, is a coenzyme family, with coenzyme Q10 being the most common form in humans, present primarily in the mitochondria as a component of the electron transport chain and aerobic cellular respiration [29]. Vitamin C, which is involved in the oxidative degradation of tyrosine, is associated with the activity of HPPD, suggesting that 4-HPPA could be a valuable marker of vitamin C bioavailability and uptake [30]. In the present study, 4-HPPA was found to be significantly decreased in patients with AD, indicating abnormal tyrosine metabolism and a possible association with vitamin C bioavailability. In previous studies, 4-HPPA was found to be associated with diabetes [31] and autoimmune thyroiditis [32
Enrichment analysis was performed for the 28 metabolites. The butanoate metabolism pathway was found to be the most important metabolic pathway since succinic acid, 2-ketoglutaric acid, 2-hydroxyglutaric acid and 3-hydroxybutyric acid were identified in the pathway, followed by the propanoate metabolism.
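The text does not state which enrichment statistic was used. As an illustration only, a common choice for over-representation analysis is the hypergeometric test, sketched below; the function name and the reference-set and pathway sizes in the example are hypothetical, while the values 28 and 4 mirror the number of metabolites analyzed and the number identified in the butanoate pathway.

```python
from math import comb


def hypergeom_enrichment_p(universe, pathway_size, selected, overlap):
    """P(X >= overlap) when `selected` metabolites are drawn without
    replacement from `universe` metabolites, `pathway_size` of which
    belong to the pathway of interest."""
    max_hits = min(pathway_size, selected)
    total = comb(universe, selected)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, selected - k)
        for k in range(overlap, max_hits + 1)
    )
    return tail / total


# Hypothetical sizes: a reference set of 100 metabolites, 10 of them in the
# pathway; 28 metabolites analyzed, 4 of which fall in the pathway.
p_enrich = hypergeom_enrichment_p(universe=100, pathway_size=10,
                                  selected=28, overlap=4)
```

A small p-value indicates that the pathway contains more of the analyzed metabolites than would be expected by chance, which is the sense in which a pathway is ranked as "most important".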
Butanoate (butyrate) metabolism covers the conversion of butyric acid, which is formed by bacterial fermentation of carbohydrates, into succinic acid for the citric acid cycle, ketone bodies (3-hydroxybutyric acid and acetoacetate) or short-chain lipids. Based on our results, butanoate metabolism is substantially altered, as shown by the altered levels of metabolites directly involved in butyrate metabolism (namely 2-hydroxyglutaric acid and succinic acid) and in the related pathways.
Propanoate metabolism is responsible for the metabolism of propionate through a pathway in which propionate is converted to propionyl-CoA and then to MMA under the activity of MMA-CoA mutase and vitamin B12, and then to succinyl-CoA and succinic acid, which is further used in the citric acid cycle. Propionic acid originates from the intestinal microflora, while propionyl-CoA can derive from fatty acid or amino acid metabolism. Among the metabolites we measured, MMA, succinic acid, methylcitric acid and 2-hydroxybutyric acid participate in propanoate metabolism. Our findings suggest that patients with AD have a significant disturbance in propanoate metabolism.
To explore the potential of organic acids as predictive biomarkers for ADs, we developed three predictive models using as input the absolute concentrations of organic acids, age, gender, BMI, alcohol consumption and physical exercise levels. PCA, a variable reduction method, was used to identify similarities and differences between the AD group and the control group, reaching 66.8% predictive accuracy. Binary logistic regression analysis of the Bonferroni-corrected metabolites identified two metabolites and two lifestyle variables as determinants of the model. 2-hydroxyisobutyric acid was negatively and 2-hydroxybutyric acid positively associated with the absence of AD (p < 0.0001 and p = 0.015, respectively). In line with our previously published work, exercise was positively associated with the absence of AD (p < 0.0001), while alcohol consumption was negatively associated with the absence of AD (p = 0.002). In addition, ANN analysis of organic acids and lifestyle factors showed that the most important predictors were, in order of importance, pyroglutamic acid, 2-hydroxyglutaric acid, 2-hydroxyisobutyric acid, 2-hydroxybutyric acid and methylmalonic acid. It should be noted that the “relative importance” of ANN variables depicted in Figure 6 refers only to the presence or lack of predictive information for each variable and does not convey the statistical significance of the included variables, which is given a priori in any ANN model. Predictive accuracy values from the binary logistic regression model and the ANN were comparable, reaching overall scores of 74.9% and 66.8%, respectively, though the ANN was better at discriminating the AD group (92.6%).
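The gap between overall accuracy and accuracy within the AD group is easier to see from a confusion matrix. The following minimal sketch uses hypothetical counts (not the study's data) to show how a classifier can label most cases correctly while still achieving only a moderate overall score.

```python
def class_accuracies(tp, fn, tn, fp):
    """Return (overall, case, control) accuracy from confusion-matrix counts.

    tp / fn: AD cases classified correctly / incorrectly.
    tn / fp: controls classified correctly / incorrectly.
    """
    overall = (tp + tn) / (tp + fn + tn + fp)
    case_acc = tp / (tp + fn)      # sensitivity: correct among AD cases
    control_acc = tn / (tn + fp)   # specificity: correct among controls
    return overall, case_acc, control_acc


# Hypothetical counts, not the study's data: 100 cases and 100 controls.
overall, case_acc, control_acc = class_accuracies(tp=90, fn=10, tn=40, fp=60)
```

In this illustration 90% of cases are classified correctly, but only 40% of controls are, so the overall accuracy is 65%. A high case-group score therefore needs to be read together with the control-group score.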
A strength of this study is the integration of targeted metabolomic analysis of selected organic acids that participate in key cellular metabolic pathways with advanced statistics and artificial intelligence. Targeted metabolomics, the quantitative analysis of known metabolites in human biofluid samples, is a sensitive and low-cost method that allows the determination and measurement of a priori selected metabolites. As discussed elsewhere, the advantage of targeted over untargeted metabolomics is that it can be applied to the validation of potential predictive biomarkers, facilitating their use in clinical practice [16]. ADs, like many other chronic diseases, are present for years before symptoms appear, and unfortunately diagnosis is made only once the disease is established and has resulted in partial tissue or organ damage. Consequently, it is a major challenge for physicians to manage the symptomatology of ADs and slow their progression in order to extend patients’ life expectancy and improve their quality of life [33]. Proper use of valid biomarkers, in addition to the regular check-up, would facilitate the prediction and subsequent early diagnosis of ADs.
The present study has some limitations. The analysis of ADs as a group may obscure disease-specific metabolic profiles that could have diagnostic value. However, as discussed elsewhere, ADs share common features, including genetic loci and molecular pathways, suggesting that a grouped analysis would provide valuable information on the common metabolic disturbance [18]. Additionally, comorbidities are frequent in ADs, and in some cases an underlying AD might remain undiagnosed or unnoticed for years, hampering single-disease analysis. Recent evidence also suggests that different ADs, such as myasthenia gravis and rheumatoid arthritis, have metabolic overlap, supporting the view of common immunometabolic pathways among ADs [32]. As has been described in the related literature [13], sample size determination remains a complex step in metabolomic studies since this type of data is correlated and very sensitive. In statistical theory, there have been some attempts to determine the sample size needed to detect significant effects while accounting for patient heterogeneity and type I and type II errors. Nonetheless, in practical research there are restrictions on the availability of training samples, and researchers usually include only 30–50 patients per group. Although the number of participants in the present work was well above this number, some overfitting issues remain, and our results should be interpreted with caution. To limit this type of bias, we investigated several ANN models with more complex structures (two hidden layers), but the overfitting was even higher in this case. Hence, we used an ANN model with a simple structure (one hidden layer), and we also split our data into three different data sets (training, test, holdout) to measure the level of overfitting. The difference between the predictive accuracy of the “Test” dataset (79.2%) and that of the “Holdout” dataset (66.7%) represents the magnitude of overfitting. This model, despite these issues, could serve as a starting point and a benchmark for future work in this field.
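The three-way split and the overfitting estimate described above can be sketched as follows. The split fractions, the random seed and the function names are assumptions for illustration, since the text does not report them; only the two accuracy values are taken from the text.

```python
import random


def three_way_split(items, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle items and split them into training, test and holdout sets.

    The fractions are illustrative assumptions; the remainder after the
    training and test portions becomes the holdout set.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    holdout = shuffled[n_train + n_test:]
    return train, test, holdout


def overfitting_gap(test_accuracy, holdout_accuracy):
    """Difference in accuracy (percentage points) between test and holdout."""
    return test_accuracy - holdout_accuracy


# 100 hypothetical participant indices.
train, test, holdout = three_way_split(range(100))

# Accuracy values reported in the text for the Test and Holdout datasets.
gap = overfitting_gap(79.2, 66.7)
```

With the reported accuracies, the gap is about 12.5 percentage points. Because the holdout set plays no part in fitting or tuning the model, its accuracy is the more conservative estimate of performance on new samples.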
Statistical analysis indicates that although more than 90% of cases were classified correctly, the model cannot satisfactorily predict the control group, and thus the predictive power of the models is rather limited. This effect was previously observed by our research team when analyzing total fatty acids in an ADs group, and the results were comparable. As a general comment, we need to highlight that the selection of control groups for predictive and analytic purposes is a common issue in case-control studies. In our case-control study, the aim was to investigate the differential expression and predictive value of organic acids for ADs, under the hypothesis that these differ between cases and controls. However, absolute metabolite values are very sensitive to diet and lifestyle factors (as also shown in our study), which makes the control sample diverse and causes it to overlap with the ADs group. Nevertheless, as can be observed from our analysis, ADs are associated with OA levels despite the lifestyle-associated fluctuations of metabolites.
From a statistical standpoint, even though we applied advanced non-linear techniques to investigate the differences between the two study arms, the inclusion criteria for healthy individuals should be considered carefully in future metabolomics studies. Selection bias is an important issue here, since the ideal control group would comprise a random sample from the general population that gave rise to the cases [37]. In our case, we included individuals with no diagnosed disease, following the inclusion criteria. However, a portion of the sample may have a different metabolic profile (compared to the rest of the control group), possibly related to diet, lifestyle or an underlying metabolic complication, which cannot be pre-assessed with established clinical markers. This is reflected in the large standard deviations of the significantly dysregulated metabolites.
To overcome this barrier in the field, large studies covering the above-mentioned factors that affect metabolite levels, followed by longitudinal studies, need to be conducted to optimize the control group criteria for these types of studies by defining the healthy metabolic group.