A Comprehensive Metabolism-Related Gene Signature Predicts the Survival of Patients with Acute Myeloid Leukemia

(1) Background: Acute myeloid leukemia (AML) is a clonal malignancy with heterogeneous genomics and clinical outcomes. Metabolic reprogramming is increasingly recognized to play an important role in leukemogenesis and prognosis in AML, but a comprehensive prognostic model based on metabolism signatures has not yet been developed. (2) Methods: We applied Cox regression analysis and least absolute shrinkage and selection operator (LASSO) regression to establish a metabolism-related prognostic gene signature based on glycolysis, fatty acid metabolism, and tricarboxylic acid (TCA) cycle gene signatures. The Cancer Genome Atlas Acute Myeloid Leukemia (TCGA-LAML) cohort served as the training dataset for model construction. Three independent AML cohorts from the Gene Expression Omnibus (GEO) (GSE37642, GSE10358, and GSE12417), combined into one set, and the Beat-AML dataset were used as two validation sets to test the robustness of the model. Transcriptome data and clinical information from all cohorts were included in the analysis. (3) Results: Stratified by the median metabolism risk score, the five-year overall survival (OS) of the high-risk and low-risk groups in the training set was 8.2% and 41.3%, respectively (p < 0.001). In the combined GEO cohort, the five-year OS of the high-risk and low-risk groups was 25.5% and 37.3%, respectively (p = 0.002). In the Beat-AML cohort, the three-year OS of the high-risk and low-risk groups was 16.2% and 40.2%, respectively (p = 0.0035). The metabolism risk score was significantly negatively associated with long-term survival in AML; it was an unfavorable factor for OS in univariate analysis and remained independent in multivariate analysis. (4) Conclusions: We constructed a comprehensive metabolism-related signature of twelve genes for the risk stratification and outcome prediction of AML.
This novel signature may facilitate the use of metabolic reprogramming factors as prognostic markers and provide new insights into potential metabolic targets for AML treatment.


Introduction
Acute myeloid leukemia (AML) is a malignant clonal disease characterized by a blockade in the differentiation of hematopoietic stem and progenitor cells, leading to the abnormal proliferation of immature myeloblasts. Despite advances in hematopoietic stem cell transplantation (HSCT) and novel agents, the prognosis of AML patients remains suboptimal: approximately 70% of patients who achieve remission eventually relapse. The 5-year overall survival (OS) rate is still unsatisfactory [1]. Risk stratification based on cytogenetics and genomic signatures is widely used in clinical practice to identify favorable, intermediate, and unfavorable risk groups. However, owing to the diversity of genetic mutations and the high heterogeneity of AML, current risk stratification methods cannot accurately predict the outcome of all patients, particularly those with multiple mutations. There is therefore an urgent need to identify prognostic features that can be applied in risk stratification and treatment guidance to improve clinical outcomes, and that may also serve as novel therapeutic targets.
Metabolic reprogramming is increasingly recognized for its significant role in tumor cell proliferation, invasion, and survival, which has opened promising avenues for the development of novel therapeutic targets. Previous studies have demonstrated that cancer cells, unlike normal cells, preferentially utilize aerobic glycolysis and enhance flux through a truncated tricarboxylic acid (TCA) cycle to support tumor growth. In vitro studies have shown that increased glycolysis contributes to the resistance of AML cells to chemotherapy-induced apoptosis [2]. Conversely, the inhibition of glycolysis suppresses leukemia cell proliferation and enhances the cytotoxicity of cytarabine [3]. Mutant isocitrate dehydrogenase 1 and 2 (IDH1/2) convert α-ketoglutarate (α-KG) from the TCA cycle into the oncometabolite 2-hydroxyglutarate (2-HG), which competes with α-KG, inhibits α-KG-dependent DNA and histone demethylases, and ultimately promotes leukemogenesis; specific inhibitors of mutant IDH1/2 block this reaction [4,5]. Moreover, disrupting the TCA cycle in primary AML blasts with BCL-2 inhibitors combined with hypomethylating agents (HMAs) has been shown to eliminate leukemia stem cells (LSCs) by suppressing oxidative phosphorylation (OXPHOS) [6]. Additionally, AML cells rely more on fatty acid β-oxidation for energy production and membrane biogenesis, in contrast to the repressed de novo fatty acid synthesis in differentiated cells [7]. Several studies have demonstrated that rate-limiting enzymes of fatty acid oxidation and synthesis are overexpressed in certain AML cell lines and are associated with worse patient survival [8]. A phase 2 clinical trial showed that combining statins with chemotherapy improved the complete remission (CR) and CR with incomplete count recovery (CRi) rates in relapsed/refractory (R/R) AML [9].
Based on these studies, we set out to establish a metabolism-related prognostic model that combines genes related to glycolysis, fatty acid metabolism, and the TCA cycle to predict long-term prognosis in AML patients. We obtained gene expression profiles and the corresponding clinical information of patients from The Cancer Genome Atlas Acute Myeloid Leukemia (TCGA-LAML) project and performed Cox regression analysis and least absolute shrinkage and selection operator (LASSO) regression. Our findings reveal the prognostic value of metabolism signatures and provide novel insights into potential metabolism-related therapeutic targets for AML.
The biological and clinical characteristics of the patients in the three cohorts are summarized in Table 1. The follow-up duration of the Beat-AML cohort was significantly shorter than that of the other two cohorts (p < 0.001). Differences in general clinical characteristics, including median age, gender, white blood cell (WBC) count at diagnosis, and bone marrow (BM) blast percentage, between the training and validation sets were not statistically significant. The Beat-AML cohort contained significantly fewer AML-M1 and AML-M2 patients than the other two cohorts (p < 0.001) and more favorable-risk cases than the TCGA-LAML cohort (p < 0.001). Most patients in the TCGA-LAML and Beat-AML cohorts were treated with standard "7 + 3" chemotherapy. Patients in the GEO datasets were treated according to the AMLCG-1999 (NCT00266136) protocol, which includes daunorubicin and high-dose cytarabine in induction, and the CALGB (NCT00002925) protocol, which includes cytarabine, daunorubicin, and etoposide. Cytogenetic risk was defined according to the 2022 European LeukemiaNet (ELN) risk classification by genetics at initial diagnosis [16]. Common favorable abnormalities included t(8;21), inv(16), and NPM1 mutations without FLT3 internal tandem duplication (FLT3-ITD). Intermediate abnormalities included FLT3-ITD mutations, t(9;11), and cytogenetic features that belong to neither the favorable nor the high-risk group. High-risk abnormalities mainly included t(v;11), complex karyotypes, and TP53 mutations. Notably, the information needed to assign cytogenetic risk was available for the TCGA-LAML and Beat-AML cohorts but not for the GEO cohorts.

Differentiation of Metabolic Status of the Patients in the TCGA-LAML Dataset
The upregulation of metabolism-related pathways is known to be associated with increased metabolic activity, and the key enzymes of each pathway play a critical role in determining its rate: rate-limiting enzymes can set the overall speed of the entire metabolic pathway. Table 2 lists the key rate-limiting enzymes in glycolysis, the TCA cycle, and fatty acid metabolism, together with their gene symbols. Although the relationship between enzyme activity and metabolic flux is complex, several studies have demonstrated that gene expression levels can partially reflect metabolic activity [17,18]. The expression value of each enzyme was defined as the mean expression of all genes encoding that enzyme. First, we used the mean of the HK1, HK2, and HK3 expression values to represent the expression value of hexokinase, and calculated the expression values of phosphofructokinase-1 and pyruvate kinase in the same way. Second, the sum of the expression values of these three rate-limiting enzymes was taken to represent glycolysis pathway activity; the activities of the TCA cycle and fatty acid metabolism pathways were calculated in the same way from their corresponding rate-limiting enzymes. Finally, we added the three pathway activities together to characterize the metabolic status of each patient. Using the median of this total expression value of the seven key rate-limiting enzymes of the three pathways as the cutoff, we divided the TCGA-LAML training cohort into metabolism high and metabolism low groups.
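The enzyme-level averaging, pathway-level summation, and median split described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: only the hexokinase genes (HK1, HK2, HK3) are named in the text, so ENZYME_B, ENZYME_C, and their GENE_* members are hypothetical placeholders for the remaining Table 2 entries, and assigning patients whose total equals the median to the low group is an assumed tie rule.

```python
from statistics import mean, median

# Hypothetical enzyme -> gene mapping. Only HK1-HK3 (hexokinase) are named in
# the text; ENZYME_B / ENZYME_C and their genes are placeholders for Table 2.
ENZYME_GENES = {
    "hexokinase": ["HK1", "HK2", "HK3"],
    "ENZYME_B": ["GENE_B1", "GENE_B2"],
    "ENZYME_C": ["GENE_C1"],
}
# Hypothetical pathway -> rate-limiting enzyme mapping (the study used 7 enzymes).
PATHWAY_ENZYMES = {
    "glycolysis": ["hexokinase"],
    "tca_cycle": ["ENZYME_B"],
    "fatty_acid": ["ENZYME_C"],
}


def enzyme_value(expr, genes):
    """Mean expression of all genes encoding one enzyme, for one patient."""
    return mean(expr[g] for g in genes)


def metabolic_total(expr):
    """Sum enzyme values within each pathway, then sum the pathway activities."""
    pathway_activity = {
        pathway: sum(enzyme_value(expr, ENZYME_GENES[e]) for e in enzymes)
        for pathway, enzymes in PATHWAY_ENZYMES.items()
    }
    return sum(pathway_activity.values()), pathway_activity


def metabolic_status(cohort):
    """Split patients at the median total; ties at the median go to 'low' (assumption)."""
    totals = {pid: metabolic_total(expr)[0] for pid, expr in cohort.items()}
    cutoff = median(totals.values())
    return {pid: ("high" if t > cutoff else "low") for pid, t in totals.items()}


if __name__ == "__main__":
    cohort = {  # toy expression values, not study data
        "P1": {"HK1": 3, "HK2": 6, "HK3": 3, "GENE_B1": 2, "GENE_B2": 4, "GENE_C1": 1},
        "P2": {"HK1": 1, "HK2": 1, "HK3": 1, "GENE_B1": 1, "GENE_B2": 1, "GENE_C1": 1},
        "P3": {"HK1": 2, "HK2": 2, "HK3": 2, "GENE_B1": 5, "GENE_B2": 5, "GENE_C1": 0},
    }
    print(metabolic_status(cohort))  # {'P1': 'high', 'P2': 'low', 'P3': 'low'}
```

With an odd cohort size and no ties, this "strictly above the median" rule places one fewer patient in the high group than in the low group.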

Gene Set Enrichment Analysis
Gene set enrichment analysis (GSEA) is a bioinformatics method for analyzing large-scale gene expression data that investigates the relationship between predefined gene sets and specific biological conditions; it was performed with GSEA software v4.3.2 (Broad Institute) [19]. Annotated gene sets were obtained from the Molecular Signatures Database (MSigDB), a comprehensive resource of annotated gene sets for use with the GSEA software, available at https://www.gsea-msigdb.org/gsea/msigdb/index.jsp (accessed on 1 October 2022). We used the curated c2 collection (c2.all.v2023.2.Hs.symbols.gmt; hereafter the c2 gene set) from MSigDB as the reference gene set for subsequent analysis; it includes 66 gene sets associated with the TCA cycle, glycolysis, and fatty acid metabolism.
Based on OS, the 131 TCGA-LAML patients were divided into a long-term survival group (OS ≥ 12 months) and a short-term survival group (OS < 12 months). First, GSEA was performed on the long-term and short-term survival groups in the TCGA cohort using the c2 gene set to investigate the distinct features of metabolic processes associated with survival. Second, we extracted all 66 gene sets related to the TCA cycle, glycolysis, and fatty acid metabolism pathways from MSigDB and performed GSEA on the metabolism high and metabolism low groups using these 66 metabolism-related gene sets. Third, we conducted leading-edge analysis in the GSEA software to identify the leading-edge genes, that is, the genes that drive the enrichment signal in the gene sets enriched with FDR < 0.1 and an absolute normalized enrichment score (|NES|) > 1.5 in the comparison of the metabolism high and metabolism low groups.
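The selection rule for leading-edge genes can be illustrated with a short Python sketch. The GSEA software itself computes the leading-edge subset of each gene set; the sketch only shows the filtering step (FDR < 0.1 and |NES| > 1.5) and the union of core-enrichment genes across the retained sets. The dictionary layout and the SET_*/GENE_* names are hypothetical and do not reflect the format of GSEA's report files.

```python
# Hypothetical record layout: one dict per gene set with its normalized
# enrichment score (NES), FDR q-value, and core-enrichment (leading-edge) genes.


def select_gene_sets(results, fdr_max=0.1, abs_nes_min=1.5):
    """Keep gene sets with FDR < fdr_max and |NES| > abs_nes_min."""
    return [r for r in results if r["fdr"] < fdr_max and abs(r["nes"]) > abs_nes_min]


def leading_edge_genes(results, fdr_max=0.1, abs_nes_min=1.5):
    """Union of core-enrichment genes across the selected gene sets, sorted."""
    genes = set()
    for r in select_gene_sets(results, fdr_max, abs_nes_min):
        genes.update(r["core_enrichment"])
    return sorted(genes)


if __name__ == "__main__":
    toy_results = [  # toy values, not study results
        {"name": "SET_A", "nes": 2.1, "fdr": 0.01, "core_enrichment": ["GENE_1", "GENE_2"]},
        {"name": "SET_B", "nes": -1.8, "fdr": 0.05, "core_enrichment": ["GENE_2", "GENE_3"]},
        {"name": "SET_C", "nes": 1.2, "fdr": 0.03, "core_enrichment": ["GENE_4"]},  # |NES| too small
        {"name": "SET_D", "nes": 1.9, "fdr": 0.20, "core_enrichment": ["GENE_5"]},  # FDR too high
    ]
    print([r["name"] for r in select_gene_sets(toy_results)])  # ['SET_A', 'SET_B']
    print(leading_edge_genes(toy_results))  # ['GENE_1', 'GENE_2', 'GENE_3']
```

Genes shared by several enriched sets (GENE_2 here) are counted once, which is why the number of leading-edge genes can be smaller than the sum of the per-set core-enrichment counts.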

Establishment and Validation of Prognostic Model
Firstly, univariate Cox regression analysis was applied to all leading-edge genes from GSEA to assess the association of each gene's expression level with OS, using the survival package in R [20]. Genes with p < 0.05 were selected as prognosis-related genes. LASSO regression is a penalized regression technique that selects important features and prevents overfitting by shrinking some coefficients to exactly zero, and ten-fold cross-validation is a technique used to evaluate and compare models. LASSO regression was performed on the prognosis-related genes with the glmnet package (version 4.1.8) [21] in R, and the optimal lambda was selected by ten-fold cross-validation as the value with the smallest partial-likelihood deviance. Finally, we identified the optimal set of genes and their corresponding regression coefficients. The prognosis risk score was calculated with the following formula:

$$\text{Risk score} = \sum_{i=1}^{n} \beta_i E_i$$

where $\beta_i$ represents the coefficient of gene $i$ from the regression results, $E_i$ represents the expression level of gene $i$, and $n$ is the number of genes in the signature.
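To make the univariate screening step concrete, the following Python sketch fits a single-covariate Cox model by Newton-Raphson on the partial likelihood and applies the p < 0.05 filter with a two-sided Wald test. It is an illustrative from-scratch implementation, not the survival package used by the authors: it handles tied event times with the Breslow approximation, whereas R's coxph defaults to the Efron approximation, so results on tied data can differ slightly. The function names are hypothetical, and the LASSO step is not reproduced here.

```python
import math


def univariate_cox(time, event, x, max_iter=50, tol=1e-10):
    """Fit h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson on the Cox partial
    likelihood (Breslow handling of ties). Returns (beta, se, two-sided p)."""
    beta, info = 0.0, 0.0
    n = len(time)
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i in range(n):
            if not event[i]:
                continue  # censored subjects only contribute through risk sets
            risk = [j for j in range(n) if time[j] >= time[i]]  # still under observation
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            score += x[i] - s1 / s0            # gradient of the log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2   # observed information (negative Hessian)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided Wald p-value
    return beta, se, p


def prognosis_related_genes(time, event, expression, alpha=0.05):
    """Keep genes (name -> list of expression values) with univariate Cox p < alpha."""
    return [g for g, x in expression.items() if univariate_cox(time, event, x)[2] < alpha]


if __name__ == "__main__":
    # Toy data: months of follow-up, death indicator, one gene's expression
    time = [5, 8, 12, 20, 25, 30, 40, 50]
    event = [1, 1, 1, 1, 1, 1, 1, 1]
    gene = [3.0, 2.5, 1.0, 2.8, 1.5, 0.5, 1.2, 0.2]
    beta, se, p = univariate_cox(time, event, gene)
    print(f"beta={beta:.3f}, se={se:.3f}, p={p:.4f}")
```

A positive beta means that higher expression is associated with a higher hazard of death, which is how a gene would enter the signature as an unfavorable marker.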
Risk scores were calculated for each case in the TCGA-LAML (n = 131), Beat-AML (n = 252), and GEO (n = 300) cohorts from the normalized expression data. Patients were then divided into high-risk and low-risk groups using the median risk score as the cutoff. Prognostic performance was evaluated with time-dependent receiver operating characteristic (ROC) curve analysis at three and five years to assess the predictive accuracy of the model. The OS probability of AML patients in the low- and high-risk groups was estimated by the Kaplan-Meier method and compared with the log-rank test. A nomogram was built with the rms (Regression Modeling Strategies) package (version 6.3-0) [22] and the survival package (version 3.4-0) in R.
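The scoring and grouping steps can be sketched in Python: the risk score is the sum over signature genes of beta_i × E_i (coefficient times expression), patients are split at the median score, and each group's survival curve is estimated with the Kaplan-Meier product-limit estimator. The genes and coefficients in the demo are hypothetical (the fitted β values are not given in this section), the tie rule at the median is an assumption, and the log-rank test and time-dependent ROC analysis are omitted.

```python
from statistics import median


def risk_score(expr, coefs):
    """Risk score = sum over signature genes of beta_i * E_i."""
    return sum(beta * expr[gene] for gene, beta in coefs.items())


def median_split(scores):
    """High-risk if strictly above the median score; ties go to low-risk (assumption)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


def kaplan_meier(times, events):
    """Product-limit estimate as a list of (event time, S(t)) steps."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def survival_at(curve, t):
    """Read S(t) off the step function (S = 1 before the first event)."""
    s = 1.0
    for step_time, value in curve:
        if step_time > t:
            break
        s = value
    return s


if __name__ == "__main__":
    coefs = {"GENE_A": 0.5, "GENE_B": -0.2}  # hypothetical beta values
    cohort = {  # toy expression values, not study data
        "P1": {"GENE_A": 2, "GENE_B": 1},
        "P2": {"GENE_A": 1, "GENE_B": 3},
        "P3": {"GENE_A": 3, "GENE_B": 0},
    }
    scores = {pid: risk_score(expr, coefs) for pid, expr in cohort.items()}
    print(median_split(scores))  # {'P1': 'low', 'P2': 'low', 'P3': 'high'}
    curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
    print(curve)
```

For example, kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]) produces steps at t = 2, 3, and 5 with S(t) of about 0.8, 0.6, and 0.3; the subject censored at t = 3 stays in the risk set at t = 3 but not afterwards. Five-year OS of a group would be read as survival_at(curve, 60) when time is measured in months.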
In addition, we performed univariate and multivariate Cox regression analyses in the TCGA training cohort and the Beat-AML validation cohort to assess whether the metabolism-related risk score predicted prognosis independently of widely used clinical factors, namely gender, age, WBC count, bone marrow blast percentage, and cytogenetic risk.

Statistical Analysis
Continuous variables are presented as median and interquartile range (IQR) and were compared using the Mann-Whitney U-test. Categorical variables were compared using Fisher's exact test. A two-tailed p < 0.05 was considered statistically significant. All statistical analyses were performed using R software version 4.0.2 (The CRAN project, www.r-project.org, accessed on 1 October 2022).
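For readers checking the categorical comparisons, the following is a minimal Python sketch of a two-sided Fisher's exact test for a 2×2 table. It uses the same two-sided definition as R's fisher.test (summing the probabilities of all tables with the observed margins that are no more probable than the observed table, with a small relative tolerance for ties). The authors ran their analyses in R; this standalone version only illustrates the computation, and larger r×c comparisons (for example, FAB subtypes across three cohorts) require the generalized test.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, all margins fixed
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum all tables no more probable than the observed one (relative tolerance 1e-7)
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    # Toy 2x2 table (e.g., a category present/absent in two cohorts); not study data
    print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # 0.4857
```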

Summary of the Methods
The methods and the sequential steps performed to construct and validate the metabolism-related gene prognostic signature are presented in Figure 1.

Comparison of the Metabolic Pathways between Long-Term and Short-Term Survival Groups in AML Patients from the TCGA-LAML Dataset
Table S2 shows the metabolic pathways that were significantly enriched (FDR < 0.25) in the gene expression data of the short-term survival group (OS < 12 months) compared with the long-term survival group (OS > 12 months) in the TCGA-LAML dataset. Among all pathways in the c2 gene set from MSigDB, the most significantly enriched metabolic pathways were associated with the TCA cycle, glycolysis, and fatty acid metabolism, suggesting that these three pathways are the metabolic processes most closely associated with the survival of AML patients. We therefore focused on these three pathways to further explore the relationship between metabolic signatures and the survival of AML patients. By integrating the expression levels of the rate-limiting enzymes in glycolysis, the TCA cycle, and fatty acid metabolism, we determined the metabolic status of AML patients in the training cohort for the subsequent analyses.

Identification of Leading-Edge Genes
After performing GSEA on the metabolism high (n = 65) and metabolism low (n = 66) groups in the TCGA-LAML training cohort, we selected seven gene sets with FDR < 0.1 and |NES| > 1.5: glycolysis, glycolysis/gluconeogenesis, citrate cycle (TCA cycle), oxidative phosphorylation, glucose import, fatty acid β-oxidation, and fatty acid β-oxidation using acyl-CoA oxidase. Leading-edge analysis of these seven enriched gene sets then identified 153 leading-edge genes with core enrichment. Table S3 lists the metabolic pathways significantly enriched in the metabolism high group compared with the metabolism low group, together with descriptions of the genes in the corresponding gene sets.
Firstly, univariate Cox regression analysis of the 153 leading-edge genes identified 33 prognosis-related genes (p < 0.05; Table S4). Secondly, to further screen the most predictive genes among these 33 genes, LASSO regression with ten-fold internal cross-validation was applied. We determined the optimal lambda value (0.0625) and used the β coefficients of the 12 genes retained in the model to calculate the metabolism-related risk score. Finally, with the median metabolism-related risk score as the cutoff, the training set (TCGA-LAML) was divided into high-risk (n = 65) and low-risk (n = 66) groups. The five-year OS of the high-risk and low-risk groups was 8.2% (95% CI, 2.6-25.7%) and 41.3% (95% CI, 29.2-58.3%), respectively (p < 0.001, Figure 2A), showing a survival advantage in the low-risk group. The AUC of the metabolism-related risk score for five-year OS in the training set was 0.703 (Figure 2B), indicating a moderate predictive capacity for prognosis in AML. In both validation cohorts, patients in the high-risk group had significantly poorer outcomes than those in the low-risk group (Beat-AML, p = 0.0035; combined GEO, p = 0.002). The AUC of the metabolism-related risk model was 0.694 for three-year OS in the Beat-AML cohort (Figure 3C) and 0.600 for five-year OS in the combined GEO validation cohort (Figure 3D), supporting the prognostic value of the signature, although discrimination in the GEO cohort was modest.

Metabolism-Related Gene Prognostic Signature Is an Independent Prognostic Factor
Table 3 shows the results of univariate and multivariate Cox regression analyses of OS in the TCGA-LAML and Beat-AML cohorts. In the univariate analysis, age, WBC count, cytogenetic risk, and the metabolism-related risk score were significant unfavorable factors associated with OS. When these four factors were entered into the multivariate analysis, the metabolism-related risk score remained an independent prognostic factor after adjustment for the other clinical variables.
The construction of the OS-predictive nomogram for clinical application is shown in Figure 4. Based on the multivariate Cox proportional hazards regression, age, cytogenetic risk, and the metabolism risk score were integrated into a prognostic nomogram to better evaluate an individual's risk in the clinical setting. The AUC values of the prognostic nomogram for 1-year, 2-year, and 3-year OS were 0.820, 0.813, and 0.760, respectively, indicating a favorable ability of the nomogram to estimate survival in AML patients (Figure 4B).
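The arithmetic behind a Cox-model nomogram can be sketched in Python. Each predictor's points are a linear rescaling of its β·x contribution, so that the predictor with the widest span receives 0 to 100 points; the total points map back to the linear predictor lp, and the survival probability at time t is S0(t)^exp(lp), where S0 is the baseline survival at lp = 0. All coefficients, ranges, baseline survival values, and the 0/1/2 coding of cytogenetic risk are hypothetical, and rms applies its own scaling and centering conventions (including separate coefficients for categorical levels), so this is a simplified sketch rather than a reconstruction of Figure 4.

```python
import math

# Hypothetical Cox coefficients and predictor ranges, NOT the fitted values behind
# Figure 4. Cytogenetic risk is coded 0/1/2 (favorable/intermediate/adverse) as a
# single linear term purely for brevity; metabolism risk is 0 (low) or 1 (high).
COEFS = {"age": 0.03, "cyto_risk": 0.5, "metab_high": 0.6}
RANGES = {"age": (18, 90), "cyto_risk": (0, 2), "metab_high": (0, 1)}
# Hypothetical baseline survival S0(t) at a linear predictor of 0, t in years.
BASELINE_SURV = {1: 0.98, 2: 0.965, 3: 0.955}


def bounds(name):
    """Smallest and largest beta * x over the predictor's range."""
    a, b = (COEFS[name] * v for v in RANGES[name])
    return min(a, b), max(a, b)


# The predictor with the widest span of beta * x defines the 0-100 "Points" axis.
SCALE = 100.0 / max(hi - lo for lo, hi in map(bounds, COEFS))


def points(name, value):
    """Nomogram points contributed by one predictor."""
    return SCALE * (COEFS[name] * value - bounds(name)[0])


def predict(patient):
    """Total points -> linear predictor -> S(t | x) = S0(t) ** exp(lp).
    The patient dict must give a value for every predictor in COEFS."""
    total = sum(points(k, v) for k, v in patient.items())
    lp = total / SCALE + sum(bounds(k)[0] for k in COEFS)
    return total, lp, {t: s0 ** math.exp(lp) for t, s0 in BASELINE_SURV.items()}


if __name__ == "__main__":
    total, lp, surv = predict({"age": 60, "cyto_risk": 1, "metab_high": 1})
    print(round(lp, 6))  # 2.9
```

For a hypothetical 60-year-old patient with intermediate cytogenetic risk (coded 1) and high metabolism risk, the total points map back to a linear predictor of 0.03 × 60 + 0.5 + 0.6 = 2.9 under these made-up coefficients, mirroring how the "Total Points" axis of a nomogram is read off against survival probabilities.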
Figure 4. Nomogram for predicting the OS of AML patients. (A) The total score is the sum of the individual scores of all variables, from which the corresponding 1-year, 2-year, and 3-year survival probabilities of an individual patient can be obtained. For example, consider a 60-year-old patient with AML with intermediate cytogenetic risk and a high metabolism risk: the individual scores for cytogenetic risk, age, and metabolism risk are shown successively on the "Points" line (red points), the total score obtained by adding the three individual scores is marked on the "Total Points" line (red point), and the corresponding 1-year, 2-year, and 3-year survival probabilities are read off along the red straight line. (B) ROC analysis and AUC for 1-year, 2-year, and 3-year survival of the nomogram in the Beat-AML cohort.
In addition, among the 12 metabolism-related genes in the signature, Kaplan-Meier curves showed that high ALDH2 expression was significantly associated with poorer outcomes of AML patients in both validation cohorts (p = 0.0017 and p = 0.035, respectively), suggesting that ALDH2 might be a biomarker of poor prognosis (Figure 5A,B). Kaplan-Meier curves for the OS of AML patients with high and low expression of ABCB11, ENO1, HK2, INSR, NUP210, PC, SDHA, SDHB, SESN2, SLC27A4, and SORT1 in the Beat-AML and combined GEO cohorts are presented in Figure S1.

Discussion
AML has been studied thoroughly at the levels of epigenomic and genomic sequencing, gene transcription, and protein expression. ELN risk stratification, the widely used prognostic system based on cytogenetics and genomics, cannot accurately predict survival within the heterogeneous intermediate-risk group of AML. Previous studies demonstrated that metabolic reprogramming, including glycolysis, fatty acid metabolism, and the TCA cycle, is associated with leukemogenesis, therapeutic resistance, and poorer outcomes in AML [8,23]. We therefore constructed a comprehensive prognostic model based on glycolysis, fatty acid metabolism, and the TCA cycle, which was an independent prognostic factor and showed robust predictive ability for the long-term survival of AML in the training and validation sets. When the TCGA-LAML cohort was divided into high-risk and low-risk groups by the median metabolism-related risk score, survival analysis showed a survival advantage in the low-risk group, and a survival advantage of the low-risk group was also demonstrated in the Beat-AML and combined GEO validation cohorts. The AUCs calculated for the two validation cohorts further supported the robustness of the metabolism-related risk signature. Our study suggests that the metabolism-related risk score could be a supplemental tool for the risk stratification of AML.
Several groups have developed metabolism signatures, each focused on a specific metabolic pathway, that predict AML survival and can help guide treatment. Table 4 compares our study with four previous studies that created metabolism-related prognostic signatures, covering the study design and methods, the training and validation cohorts, the main results, the signatures generated, and their prognostic significance. A Chinese group generated a prognostic risk score from a panel of six serum glucose metabolism markers, which showed independent prognostic value in cytogenetically normal AML patients [3]. The consistency and accuracy of a carbohydrate-metabolism-related gene prognostic signature in predicting AML survival were validated using GEO cohorts and the authors' own cohort [24]. Another group recently developed a distinct prognostic risk signature based on six lipid-metabolism-related genes for AML [25]. Wei et al. proposed a metabolism-related prognostic signature index (MRPSI) consisting of three metabolism-related gene pairs [26], and combining the MRPSI with age into a composite metabolism-clinical prognostic model index demonstrated better prognostic accuracy.
As shown in Table 4, each study has a different focus, and several signatures based on a single metabolic pathway have been reported to improve the prediction accuracy for AML. However, a comprehensive metabolic signature for AML is still lacking. GSEA of short-term versus long-term survivors showed that diverse metabolic pathways, mainly the TCA cycle, glycolysis, and fatty acid metabolism, were significantly enriched in the short-term survival group. Our study therefore constructed a comprehensive metabolic signature focused on the combination of the TCA cycle, glycolysis, and fatty acid metabolism pathways for AML. We integrated the expression-based activity of these three pathways to divide the training cohort into metabolism high and metabolism low groups and identified the leading-edge genes between the two groups to establish the metabolism-related risk signature.
Our risk model consists of 12 metabolism-related genes, many of which have previously been implicated in the pathogenesis, progression, and prognosis of leukemia; their functions and effects are summarized in Table 5. ENO1 [27-29] and PC [31] were found to be overexpressed in several types of AML. High expression of SDHA [41] and NUP210 [33,34] was associated with an unfavorable prognosis in AML patients, and elevated SLC27A4 expression has been linked to poorer clinical outcomes in several cancer types [35]. Upregulated SORT1 [37,38] and downregulated INSR [39] have both been reported in chemoresistant or relapsed AML samples. SDHB [40] and ABCB11 [43,44] were reported to play an important role in imatinib resistance. HK2 overexpression was shown to confer resistance of LSCs to DNA-damaging agents [45].
ALDH2 is necessary for protecting hematopoietic stem and progenitor cells (HSPCs) against acetaldehyde toxicity [46] and is also reported to play an important role in the chemoresistance of AML cells. ALDH2 overexpression significantly increased proliferation and colony formation in leukemia cell lines, resulting in increased resistance to doxorubicin [47]. Inhibition of ALDH2 with daidzin and CVT-10216 significantly suppressed mesenchymal stromal cell (MSC)-induced ALDH activity in AML cells and sensitized them to chemotherapy [48]. Moreover, ALDH2 expression is increased in primary AML cells from elderly patients [50]. Consistent with previous research, high ALDH2 expression was significantly associated with poorer outcomes of AML patients in our study, which might be related to the chemoresistance associated with high ALDH2 expression.
There remains an urgent need for novel therapies in AML because many patients relapse. Notably, our study highlights ALDH2 as a potential therapeutic target. Moreover, metabolism-targeted therapy has been shown to overcome chemotherapy resistance to a certain extent. The glycolytic inhibitor 2-deoxyglucose (2-DG) combined with Ara-C enhanced cytotoxic effects in primary blast cells [3], and statins combined with chemotherapy improved the CR and CRi rates in R/R AML [9]. Thus, metabolism-related drugs could potentially be added to chemotherapy in relapsed or refractory patients with a high-risk metabolism-related gene signature to increase their sensitivity to chemotherapy. Our study provides a metabolism-related prognostic signature for potential clinical application; for high-risk patients, adding inhibitors that target glycolysis, fatty acid metabolism, or the TCA cycle to conventional chemotherapy may be worth investigating.
In conclusion, we identified a novel 12-gene metabolism-related prognostic signature for AML using Cox regression analysis and LASSO. This signature could serve as a supplemental tool for the risk stratification and outcome prediction of AML and may improve our understanding of metabolic pathways as potential prognostic biomarkers and therapeutic targets. Because the signature is based primarily on bioinformatic analysis of data from public databases, it needs to be validated in larger clinical cohorts. The metabolism risk score could be used together with currently known genetic alterations for risk stratification to improve prognostic value and to guide the development of novel treatment options that improve final outcomes.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes15010063/s1, Figure S1: Kaplan-Meier curves for overall survival of AML patients with high and low expression of ABCB11, ENO1, HK2, INSR, NUP210, PC, SDHA, SDHB, SESN2, SLC27A4, and SORT1 in the Beat-AML and GEO cohorts; Table S1: The overall features of the TCGA-LAML, Beat-AML, GSE37642, GSE10358, and GSE12417 databases; Table S2: Metabolic pathways that were significantly enriched (FDR < 0.25) in the gene expression data of the group with short-term survival (OS < 12 months) compared with the group with long-term survival (OS > 12 months) in the TCGA-LAML database; Table S3: Metabolic pathways that were significantly enriched (FDR < 0.2 and |NES| > 1.5) in the gene expression data of the metabolism high compared with the metabolism low group in the TCGA-LAML; Table S4: The metabolism pathways of 33 prognosis-related genes identified by univariate Cox regression analysis.

Figure 1. Flow chart summarizing the methods and the sequential steps performed to construct and validate the metabolism-related gene prognostic signature. TCGA-LAML, Cancer Genome Atlas-Acute Myeloid Leukemia-like; TCA cycle, tricarboxylic acid cycle; GSEA, gene set enrichment analysis; LASSO, least absolute shrinkage and selection operator.


Figure 2. Construction of the metabolism-related gene prognostic signature in the TCGA-LAML cohort. Kaplan-Meier curves for overall survival in the two risk groups based on the prognostic signature (A); ROC analysis and AUC for five-year survival of the metabolism-related risk model (B). The blue line shows the ROC curve.
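The Kaplan-Meier curves in Figure 2A (and Figure 3A,B) estimate overall survival within each risk group. As a minimal illustration of what each curve computes, the following stdlib-only Python sketch implements the product-limit estimator and reads off a five-year (60-month) survival value. The toy follow-up times and censoring flags are invented, and the function names `kaplan_meier` and `survival_at` are hypothetical; the paper does not state which software produced its curves.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit (Kaplan-Meier) estimate of overall survival.

    times  -- follow-up time per patient (months in this sketch)
    events -- 1 if death was observed at that time, 0 if the patient was censored
    Returns (time, survival) steps, starting at (0.0, 1.0).
    """
    # Distinct times at which at least one death was observed, in ascending order
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv = 1.0
    curve = [(0.0, 1.0)]
    for t in death_times:
        # Patients still under observation just before time t
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        surv *= 1.0 - deaths / at_risk  # product-limit update
        curve.append((t, surv))
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Read the step function at time t (e.g., t = 60 months for five-year OS)."""
    s = 1.0
    for ti, si in curve:
        if ti > t:
            break
        s = si
    return s


# Toy data for one hypothetical risk group (not study data)
times = [6, 12, 12, 20, 30, 60]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
print(round(survival_at(curve, 60), 3))  # → 0.444
```

Censored patients (event = 0) stay in the risk set until their last follow-up but never count as deaths, which is what distinguishes this estimate from a simple proportion of survivors.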

3.4. External Validation of Metabolism-Related Gene Prognostic Signature in GEO AML (GSE37642, GSE10358, and GSE12417) and the Beat-AML Datasets
Both validation datasets were split into high-risk (Beat-AML, n = 126; GEO, n = 150) and low-risk (Beat-AML, n = 126; GEO, n = 150) groups by the median of the metabolism-related risk score. Figure 3A,B show the Kaplan-Meier curves for overall survival based on the prognostic signature in the Beat-AML and combined GEO cohorts, respectively. The survival analysis indicated that, in both validation cohorts, the samples in the high-risk group had significantly worse overall survival than those in the low-risk group (Beat-AML: three-year OS, 16.2% vs. 40.2%, p = 0.0035; combined GEO: five-year OS, 25.5% vs. 37.3%, p = 0.002).
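The median split described above can be sketched as follows. The sketch assumes, as is usual for LASSO-Cox signatures, that each sample's risk score is a linear combination of signature-gene expression weighted by the fitted coefficients, and that each cohort is dichotomized at its own median. The gene names `GENE_A` to `GENE_C` and their coefficients are hypothetical placeholders, not the published 12-gene coefficients.

```python
from statistics import median
from typing import Dict, List

# Hypothetical LASSO-Cox coefficients for illustration only; the published
# signature uses 12 genes whose coefficients are not reproduced here.
COEFS: Dict[str, float] = {"GENE_A": 0.5, "GENE_B": -0.3, "GENE_C": 0.2}


def risk_scores(expression: List[Dict[str, float]], coefs: Dict[str, float]) -> List[float]:
    """Linear risk score per sample: sum of coefficient x expression over signature genes."""
    return [sum(beta * sample[gene] for gene, beta in coefs.items()) for sample in expression]


def median_split(scores: List[float]) -> List[str]:
    """Label samples above the cohort median as high risk and the rest as low risk."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Toy normalized expression values for four samples of one hypothetical cohort
cohort = [
    {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.0},
    {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 0.0},
    {"GENE_A": 3.0, "GENE_B": 0.0, "GENE_C": 1.0},
    {"GENE_A": 0.0, "GENE_B": 1.0, "GENE_C": 0.0},
]
scores = risk_scores(cohort, COEFS)
groups = median_split(scores)
print(groups)  # → ['high', 'low', 'high', 'low']
```

With an even number of samples and no ties at the median, this yields equal-sized groups, consistent with the 126/126 and 150/150 splits reported above; with an odd number, the median sample falls into the low-risk group under this convention.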



Genes 2024, 19

Figure 3. Validation of the metabolism-related gene prognostic signature in the validation datasets. Kaplan-Meier curves for overall survival based on the prognostic signature in the Beat-AML (A) and combined GEO (B) cohorts; ROC analysis and AUC for three-year survival of the metabolism-related risk model in the Beat-AML cohort (C); ROC analysis and AUC for five-year survival of the metabolism-related risk model in the combined GEO cohort (D). The blue lines show the ROC curves.
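Figure 3C,D summarize discrimination at a fixed horizon (three or five years) with an AUC. The sketch below shows the idea behind such a time-dependent AUC: among patients who died by the horizon (cases) and patients still alive after it (controls), it is the fraction of case-control pairs in which the case has the higher risk score. This is a simplified, assumption-laden version: patients censored before the horizon are simply dropped, whereas established estimators (for example, inverse-probability-of-censoring weighting as in the R package `timeROC`) reweight them. The function name `naive_time_auc` and the toy data are hypothetical, and the paper does not state which estimator it used.

```python
from typing import List


def naive_time_auc(scores: List[float], times: List[float],
                   events: List[int], horizon: float) -> float:
    """Cumulative/dynamic AUC at a fixed horizon, without censoring weights.

    Cases: patients who died on or before the horizon.
    Controls: patients still under follow-up (alive) after the horizon.
    Patients censored before the horizon are dropped (the simplification).
    """
    cases = [s for s, t, e in zip(scores, times, events) if t <= horizon and e == 1]
    controls = [s for s, t in zip(scores, times) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                concordant += 1.0  # patient who died has the higher risk score
            elif case == control:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))


# Toy data in months; a horizon of 60 months corresponds to five-year survival
scores = [2.0, 1.5, 0.2, 1.8, -0.5]
times = [10, 30, 70, 80, 40]
events = [1, 1, 0, 1, 0]
print(naive_time_auc(scores, times, events, horizon=60))  # → 0.75
```

Note that the patient who died at 80 months counts as a control at the 60-month horizon, because they were alive at that point; the AUC is horizon-specific, which is why the three-year and five-year values in Figure 3C,D can differ.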


Figure 4. Construction of the OS-predictive nomogram for clinical application. Each line of the prognostic nomogram consists of the name of a predictive factor (metabolism risk, age, or cytogenetic risk) on the left and the corresponding scale line on the right. The scale on each line represents the factor's range of values, and the length of the line reflects the contribution of the factor to the clinical outcome events. The individual score (Points) corresponds to the value of each variable, and the total score (Total Points) is obtained by adding the individual scores of all variables.
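The points arithmetic of a nomogram such as Figure 4 can be sketched as follows, under a common construction convention (assumed here, not confirmed by the paper): each factor's contribution to the Cox linear predictor is shifted so that its lowest level scores 0, and all contributions are rescaled so that the factor with the widest range spans 0 to 100 points. The coefficients in `CONTRIB` are hypothetical, and age is dichotomized purely for illustration.

```python
from typing import Dict

# Hypothetical linear-predictor contributions (log hazard ratio x encoded level),
# NOT the published model; age is dichotomized here only to keep the sketch short.
CONTRIB: Dict[str, Dict[str, float]] = {
    "metabolism_risk": {"low": 0.0, "high": 0.9},
    "age_group": {"<60": 0.0, ">=60": 0.7},
    "cytogenetic_risk": {"favorable": 0.0, "intermediate": 0.6, "adverse": 1.2},
}


def points_scale(contrib: Dict[str, Dict[str, float]]) -> float:
    """Points per unit of linear predictor: the widest-ranging factor spans 100 points."""
    widest = max(max(v.values()) - min(v.values()) for v in contrib.values())
    return 100.0 / widest


def total_points(patient: Dict[str, str], contrib: Dict[str, Dict[str, float]] = CONTRIB) -> float:
    """Sum of individual Points over all factors (the nomogram's Total Points)."""
    scale = points_scale(contrib)
    total = 0.0
    for factor, level in patient.items():
        levels = contrib[factor]
        total += (levels[level] - min(levels.values())) * scale  # individual Points
    return total


patient = {"metabolism_risk": "high", "age_group": ">=60", "cytogenetic_risk": "adverse"}
print(round(total_points(patient), 1))  # → 233.3
```

Because the points are a linear rescaling of the Cox linear predictor, a higher total corresponds to a higher predicted hazard; converting Total Points into survival probabilities additionally requires the model's baseline survival, which is not reproduced here.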


Figure 5. Kaplan-Meier curves for overall survival of the ALDH2-high and ALDH2-low groups in the Beat-AML (A) and combined GEO (B) cohorts.


Table 1. Overview of biological and clinical data referring to patients in the TCGA-LAML, Beat-AML, and GEO datasets.

Table 2. The key rate-limiting enzymes in glycolysis, the TCA cycle, and fatty acid metabolism, and their corresponding gene symbols.

Table 3. Univariate and multivariate Cox regression analysis based on OS in the TCGA-LAML and Beat-AML cohorts.

HR, hazard ratio. * Cytogenetic risk was assigned according to the 2022 European LeukemiaNet (ELN) risk classification by genetics.
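The hazard ratios in Table 3 come from Cox proportional hazards models. For reference, the general form of the model and the resulting HR for a binary covariate (for example, high versus low metabolism risk) are shown below; this is the textbook definition, and the fitted coefficients themselves are those reported in Table 3.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p\right),
\qquad
\mathrm{HR}_j = \frac{h(t \mid x_j = 1, \mathbf{x}_{-j})}{h(t \mid x_j = 0, \mathbf{x}_{-j})} = e^{\beta_j}
```

In the univariate analysis the model contains only $x_j$; in the multivariate analysis the other covariates $\mathbf{x}_{-j}$ are held fixed, so $e^{\beta_j}$ is an adjusted HR, and $\mathrm{HR}_j > 1$ indicates a higher hazard of death.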

Table 4. Comparison of all prior metabolism-related prognostic signatures for AML with our study.

Table 5. The 12 metabolism-related genes included in the gene prognostic signature.