Machine Learning Identification of Nutrient Intake Variations across Age Groups in Metabolic Syndrome and Healthy Populations

This study undertakes a comprehensive examination of the intricate link between diet nutrition, age, and metabolic syndrome (MetS), utilizing advanced artificial intelligence methodologies. Data from the National Health and Nutrition Examination Survey (NHANES) spanning from 1999 to 2018 were meticulously analyzed using machine learning (ML) techniques, specifically extreme gradient boosting (XGBoost) and the proportional hazards model (COX). Using these analytic methods, we elucidated a significant correlation between age and MetS incidence and revealed the impact of age-specific dietary patterns on MetS. The study delineated how the consumption of certain dietary components, namely retinol, beta-cryptoxanthin, vitamin C, theobromine, caffeine, lycopene, and alcohol, variably affects MetS across different age demographics. Furthermore, it was revealed that identical nutritional intakes pose diverse pathogenic risks for MetS across varying age brackets, with substances such as cholesterol, caffeine, and theobromine exhibiting differential risks contingent on age. Importantly, this investigation succeeded in developing a predictive model of high accuracy, distinguishing individuals with MetS from healthy controls, thereby highlighting the potential for precision in dietary interventions and MetS management strategies tailored to specific age groups. These findings underscore the importance of age-specific nutritional guidance and lay the foundation for future research in this area.


Introduction
Metabolic Syndrome (MetS) represents a complex cluster of risk factors that significantly contribute to the development of cardiovascular diseases and type II diabetes.These factors include hypertension, dyslipidemia, hyperglycemia, and central obesity [1], which are becoming increasingly prevalent worldwide [2,3], alongside a parallel rise in obesity rates [4,5].This trend presents a substantial challenge to global public health infrastructure.Given the intricate nature of MetS, crafting effective prevention and management strategies is crucial.Among these strategies, diet plays a pivotal role in both the development and management of MetS.A detailed exploration of dietary functions reveals its critical influence on metabolic health.Dietary components can directly affect the body's metabolic processes, influencing blood pressure, lipid levels, glucose metabolism, and body fat distribution.By understanding and optimizing dietary intake, it is possible to mitigate the risk factors associated with MetS, offering a promising avenue for reducing its prevalence and the associated health burdens [6][7][8].
The traditional approach to diagnosing MetS involves the assessment of multiple clinical parameters [9,10], a process that is both time-consuming and costly, especially in large-scale population studies.Consequently, there is a pressing need for alternative strategies that can efficiently screen individuals at a high risk of MetS for further evaluation.In this context, the advent of machine learning (ML) offers promising avenues for the development of predictive models for MetS [11,12].Recent advancements in computational capabilities have enabled the application of ML algorithms to predict MetS risk with high accuracy, using both conventional statistical analyses and novel ML methods.Significant contributions to this field include the work of Darko Ivanović, who developed an artificial neural network (ANN) model based on non-invasive, low-cost, and readily obtainable parameters such as sex, age, body mass index, waist-to-height ratio, and blood pressure.This model demonstrated high predictive values for MetS [13].Similarly, Maria Trigka and colleagues employed various ML techniques to predict MetS, incorporating a range of factors, including demographic, socioeconomic, and clinical parameters, achieving an accuracy rate of 89.35% [11].
The aforementioned research predominantly targeted the prognostication of MetS within the general populace, devoid of delving into age-specific distinctions.In the contemporary era, the demographic shift towards an older population is palpable, rendering it imperative to question the applicability of findings from studies traditionally focused on younger cohorts or the general populace to an aging demographic [14].With an eye on the aging population, our investigation endeavors to incorporate dietary habits into the predictive modeling framework to explore which specific nutrients may exert a considerable impact on people with MetS in different age groups.This approach not only addresses a critical gap in existing research but also underscores the potential for dietary interventions in the management and prevention of MetS [15,16].
As a result, in order to explore the relationship between MetS and age and nutrition and to find out which nutrition may have an effect on MetS, our research leverages the advanced capabilities of ML in conjunction with the proportional hazards model (Cox) to analyze data from ten cycles of the National Health and Nutrition Examination Survey (NHANES) from 1999 to 2018 [17,18].Our study embarks on an exploration to meld dietary patterns with fundamental physical indicators to unveil predictive factors for MetS and optimal cardiometabolic health (OCH) across diverse age demographics.Following an intricate analysis of feature importance and through meticulous manual selection [19], we identified a series of predictive markers, encompassing both physical examination metrics and nutritional indicators.Subsequent investigations into these nutritional indicators revealed their significant correlations with MetS.This effort aims to deepen our comprehension of the pathogenesis of MetS, thereby establishing a solid groundwork for the development of evidence-based clinical prevention strategies and the creation of personalized treatment protocols.

Data Collection and Processing
To enhance precision and mitigate sampling errors in our analysis, we amalgamated data spanning ten cycles (1999-2018) of the NHANES.NHANES, a cross-sectional, multistage survey, employs probabilistic sampling of the non-institutionalized civilian US population to effectively represent national demographics.Following data consolidation via "SEQN" linkage, we executed a series of preprocessing steps, including the removal of duplicate entries, exclusion of pregnant individuals, elimination of rows with missing data, excluding minors (age < 18), and identification and removal of outliers (defined as data points exceeding or falling below the mean by five times the standard deviation).Subsequent classification efforts partitioned the subjects into two distinct cohorts based on criteria for MetS and OCH, with the population neither meeting MetS criteria nor qualifying for OCH being excluded (Figure 1).MetS was identified by the presence of any three or more of the following conditions [9]: Waist circumference: >88 cm for women or >102 cm for men.
To address the issue of class imbalance observed across all datasets, the synthetic minority over-sampling technique (SMOTE) was utilized for the datasets.This technique is facilitated by the "imblearn version 0.12.2"Python package.
Following preprocessing, the dataset retained 5838 samples, which included 4721 subjects with MetS, 1117 exhibiting OCH, forming the basis for temporary dataset 1.A notable correlation between the incidence of MetS and age prompted division into two age groups (3.1 for details): the younger cohort (≤44 years) and middle-aged and elderly cohorts (≥45 years).This categorization yielded temporary datasets 2 and 3, respectively.
As a result, temporary datasets 1, 2, and 3 underwent processing via SMOTE to ensure equalized class representation across various age groups.Specifically, dataset 1 (n = 9442) featured a balanced compilation of basic data and nutritional indicators for all age groups; dataset 2 (n = 2532) catered to individuals aged 44 or younger; and dataset 3 (n = 6910) focused on those aged 45 or older, all benefitting from SMOTE's balancing effect.In contrast, dataset 4 (n = 5838), which originated from temporary dataset 1, remained unprocessed by SMOTE.This dataset retained its original unbalanced nature but offered comprehensive coverage of basic data and nutritional indicators.
Each dataset comprised four physical examination indicators and sixty-three dietary indicators, including solely demographic information (sex, age), anthropometric information (height, weight), and nutritional intake (energy, carbohydrates, fiber, protein, cholesterol, etc.).

Model Selection
In the preliminary phase of developing the predictive model, we utilized logistic regression [21], random forest [22], extreme gradient boosting (XGBoost), and support vector classification (SVC).XGBoost, an advanced ensemble method, sequentially constructs decision trees, with each tree aimed at correcting the inaccuracies of its predecessors [17].SVC, renowned for its versatility, discriminates between classes by identifying a hyperplane in the feature space that maximizes class separation [23].To evaluate the predictive performance of each model under uniform training conditions, we set the max_iter for logistic regression at 500, and the n_estimators for both random forest and XGBoost at 500.Additionally, we applied min-max normalization to dataset 1 [24], partitioning it into the training set and the validation set following an 8:2 ratio.

Model Evaluation
In our analysis, we leveraged sensitivity, specificity, accuracy, the area under the receiver operating characteristic curve (AUC), and the area under the precision-recall curve (PR-AUC) to conduct a comprehensive evaluation of the models [25][26][27].The calculations for some metrics are delineated as follows: The TP (true positive) refers to the proportion of patients accurately identified as having MetS.Conversely, FP (false positive) represents the fraction of individuals without MetS who are erroneously classified as having the condition.TN (true negative) denotes the proportion of healthy subjects correctly identified as not having MetS.Lastly, FN (true negative) pertains to the proportion of patients with MetS who are misclassified as healthy.
All these metrics are normalized and thus range from 0 to 1, ensuring a standardized measure of diagnostic performance across varying sample sizes and study designs.
To assess the influence of nutritional indicators on MetS predictions, datasets 2 and 3 were subdivided into two categories: one containing solely demographic and anthropometric information (sex, age, height, weight) and the other encompassing both basic demographic information and nutritional indicators.To guarantee the methodological robustness of study, these subsets were allocated into training and testing groups following an 8:2 distribution ratio, with a randomized shuffling procedure to minimize bias.

Feature Verification and Screening
Then, we tried to identify which specific nutritional indicators exerted a pivotal influence.Datasets 2 and 3 were analyzed using the XGBoost algorithm.The model parameters were meticulously adjusted across a range of n_estimators (200, 500, 1000) and max_depth (3,5,7,10) to refine the model's performance.Notably, the disproportionate impact of height and weight on prediction outcomes necessitated a recalibration of feature importance weights.To mitigate this bias and ensure a balanced representation of nutritional intake indicators, the relative importance of height, weight, sex, and age was deliberately scaled down to three-twentieths of that assigned to other indicators after many experiments.
Through this methodological adjustment, the XGBoost model's inherent feature selection mechanism, complemented by manual curation techniques [19], facilitated the identification of the 20 most critical nutritional indicators from a diverse range of parameters.
Then, datasets 2 and 3 were filtered to retain only the identified 20 nutritional and 4 demographic indicators.This curated dataset was then subjected to further analysis using the XGBoost model to establish a definitive ranking of feature importance.For this phase of the analysis, all features were assigned equal importance weights.However, given the direct correlation between height, weight, and the BMI-a critical determinant of MetS-the feature importance of height and weight was deliberately excluded from the final evaluation.

Risk Ratio Analysis
To evaluate the risk ratios associated with each indicator, we employed a Cox proportional hazards model for the analysis [18].The Cox model was instantiated using the "lifelines version 0.28.0"package in Python.We used Cox model to evaluate the risk ratio of basic data indicators and nutrition indicators.Specifically, we conducted risk analyses for each nutritional intake with MetS.Given the absence of specific illness onset times for the MetS in our dataset, we standardized the illness time parameter across all instances to a value of 1.
A notable methodological adaptation was required to accommodate the Cox model's constraint to numerical data inputs.Consequently, to integrate non-numeric data forms into the analysis, we employed a numeric encoding strategy.For example, educational attainment levels, serving as proxy indicators of socioeconomic status and lifestyle factors potentially influencing MetS risk, were quantified on an ordinal scale.Specifically, we assigned integer values in ascending order based on the level of education: "<HS grade" was encoded as 0, indicating high school not completed; "HS grade" was 1, representing high school completion; "Some college" was 2, denoting some college but no degree; and "College grade" was 3, corresponding to college graduation or higher levels of education.

Final Model Construction
In the concluding segment of our investigation, we directed our focus towards optimizing a predictive model that is universally applicable across diverse age cohorts.This endeavor necessitated the refinement of dataset 1, from which only four physical examination indicators and twenty nutritional indicators were retained.
The assessment of the final model's efficacy entailed a structured comparison across various datasets, each representing distinct combinations of age groups and nutritional indicators.Specifically, the "All Age All Nutrition" dataset encapsulated the entirety of age demographics alongside a comprehensive array of nutritional indicators, as represented by dataset 1.In contrast, the "Some Age All Nutrition" dataset was confined to specific age brackets (either 44 years and younger or 45 years and older, as delineated in datasets 2 or 3) while still encompassing all nutritional indicators.The "Some Age Some Nutrition" dataset further narrowed the scope to include specific age groups in conjunction with a subset of indicators (24 evaluation indicators, derived from a filtration process applied to datasets 2 or 3).The "All Age Some Nutrition" dataset spanned all age groups but was limited to the same subset of indicators (24 evaluation indicators, obtained through the filtration of dataset 1).

The Impact of Age on the Prevalence of MetS
These 4721 participants (69%) were classified as having MetS based on predefined criteria, while Table A1 presents a detailed comparison of characteristics between individuals with MetS and those with OCH.The MetS cohort exhibited a mean BMI that was 11.57 units higher and a mean age that was 31.68 years greater than their healthy counterparts.
Subsequent analysis focusing on age as a variable revealed a positive correlation between the prevalence of MetS and increasing age, alongside a corresponding decline in the incidence of OCH (Figure 2A).Specifically, within the MetS group, individuals aged 51 years and above constituted 64% of the total population diagnosed with the condition (Figure 2B).Additionally, we employed a Cox regression model to evaluate fundamental demographic information, including sex, age, education level, race/ethnicity, and income/poverty income ratio (PIR), identified age as a significant predictor with a coefficient value of 6.49, aligning with findings from prior research [28,29] (Figure 2C).In line with guidelines from the World Health Organization, the dataset was bifurcated into two age groups: ≤44 years and ≥45 years [30].Post-application of the synthetic minority over-sampling technique (SMOTE) for data balancing, Tables A2 and A3 further delineate the characteristics of individuals with MetS versus those in OCH within these age brackets.The analysis highlighted an average age discrepancy of 6.84 years and 8.52 years between the MetS and healthy groups in the younger cohort and middle-aged and elderly cohorts, respectively.These findings underscore the pivotal role of age in the development of MetS.The age distribution of subjects with MetS.(C) Examination of the risk ratios of selected social factors (including sex, age, education level, race/ethnicity, and income PIR) to MetS in dataset 4. In the sex row, females are defined as 0, and males are defined as 1.In the age row, all ages 18 and older are included, those aged 44 and under are defined as 0 and those aged 45 and over are defined as 1.
In the education level row, education is incremental; specifically, '<HS grad' is defined as 0, 'HS grad' is defined as 1, 'Some college/AA degree' is defined as 2, and 'College grad' is defined as 3.
Income, PIR was divided into two groups by their mean (2.47), where less than 2.47 was defined as 0 and greater than or equal to 2.47 was defined as 1.A coefficient >1 suggests a positive correlation In the sex row, females are defined as 0, and males are defined as 1.In the age row, all ages 18 and older are included, those aged 44 and under are defined as 0 and those aged 45 and over are defined as 1.In the education level row, education is incremental; specifically, '<HS grad' is defined as 0, 'HS grad' is defined as 1, 'Some college/AA degree' is defined as 2, and 'College grad' is defined as 3.
Income, PIR was divided into two groups by their mean (2.47), where less than 2.47 was defined as 0 and greater than or equal to 2.47 was defined as 1 In the race/ethnicity column, we refer to the definitions in the NHANES dataset to place the following: 'Mexican American' is defined as 0, 'Other Hispanic' as 1, 'Non-Hispanic White' as 2, 'Non-Hispanic Black' as 3, and 'Other Race' as 4. The income PIR column contains the salary level of everyone in the dataset.A coefficient >1 suggests a positive correlation with metabolic syndrome risk, while <1 indicates a negative correlation.The diamond symbol represents the multivariable-adjusted hazard ratio, with width denoting the 95% CI.

Evaluating Machine Learning Techniques for Predicting MetS Risk in Individuals
To ascertain the most efficacious predictive model for MetS risk, we meticulously compared the performance of several ML algorithms, including logistic regression (LR), random forest (RF), XGBoost, and support vector classifier (SVC) on the validation dataset in dataset 1.The evaluation metrics indicate that the XGBoost model outperformed its counterparts, achieving the highest precision-recall area under the curve (PR-AUC) of 0.9985, surpassing the RF model's PR-AUC of 0.9982 (Figure 3).In terms of sensitivity and accuracy, XGBoost demonstrated superior performance with a sensitivity of 97.1% and an accuracy rate of 98.2%, respectively.Additionally, it also attained the highest specificity score of 99.3%.Given its exemplary performance across multiple evaluation metrics, the XGBoost algorithm was selected for the development of the subsequent predictive model.

Investigating the Impact of Nutritional Intake on MetS
To elucidate the influence of dietary habits on the risk of developing MetS, we partitioned the data according to age and whether to add nutritional indicators into four distinct datasets for analysis.The first and second datasets are both for people aged 44 years and below, and they differ in that the first dataset includes only demographic and anthropometric information (height, weight, sex, age), while the second dataset encompassed these basic data in addition to detailed records of nutritional intake.The third and fourth datasets are both 45 years and older, and they differ in the same way.Utilizing the

Investigating the Impact of Nutritional Intake on MetS
To elucidate the influence of dietary habits on the risk of developing MetS, we partitioned the data according to age and whether to add nutritional indicators into four distinct datasets for analysis.The first and second datasets are both for people aged 44 years and below, and they differ in that the first dataset includes only demographic and anthropometric information (height, weight, sex, age), while the second dataset encompassed these basic data in addition to detailed records of nutritional intake.The third and fourth datasets are both 45 years and older, and they differ in the same way.Utilizing the XGBoost algorithm, we then trained and validated models using these datasets to assess the predictive relevance of dietary information on MetS across different age groups (Figure A2).
In the younger cohort, the dataset integrating nutritional intake alongside basic data exhibited a precision-recall area under the curve (PR-AUC) of 0.996, higher than the 0.994 PR-AUC observed for the dataset limited to basic data.Conversely, the area under the curve (AUC) for the receiver operating characteristic (ROC) analysis indicated a slight improvement with the inclusion of dietary information (0.993) compared to the basic data alone (0.990).
Conversely, for middle-aged and elderly cohorts, the addition of nutritional intake data markedly enhanced the model's performance.This was evidenced by superior PR-AUC (0.998 vs. 0.988) and AUC (0.997 vs. 0.981) metrics for the dataset containing both basic and dietary information relative to the dataset restricted to basic data.
Our analysis substantiates the premise that nutritional intake possesses a discernible effect on the risk of MetS, with a more pronounced impact observed within middle-aged and elderly cohorts [31,32].These findings underscore the importance of incorporating dietary data in predictive models to enhance their accuracy and reliability in identifying individuals at heightened risk of MetS, particularly among those aged 45 and above.

Feature Importance Analysis in MetS Risk Prediction
In our investigation into the determinants of MetS risk, we employed the XGBoost algorithm, leveraging a curated set of 24 features to elucidate their respective contributions to the prediction of MetS (Figure 4, Tables A6 and A7).It is worth noting that BMI is one of the determinants of OCH, and it is not meaningful to analyze height and weight, which form BMI, so we removed height and weight from the final report.
For the younger cohort, the variables with the greatest significance were age, retinol, beta-cryptoxanthin, caffeine, and vitamin C, highlighting the important role of these factors in MetS prediction.Conversely, the feature importance ranking for middle-aged and elderly cohorts identified age, caffeine, lycopene, retinol, and alcohol as the top contributors, which may indicate variances in nutrient impacts with advancing age.
Consistently, age emerged as the leading factor in both age groups, reinforcing its pivotal role in MetS predisposition.The differential ranking of dietary components between the two age groups underscores the potential age-dependent effects of dietary intake on MetS risk.Remarkably, retinol maintained a high position in the feature importance hierarchy across both datasets, signifying its potential as a critical dietary factor in the risk assessment of MetS.
Furthermore, the analysis indicated a relatively minor role of sex in determining MetS risk, suggesting that the influence of age and dietary factors predominates over sex-specific effects in this context.

Nutritional Intake and Its Association with MetS
To elucidate the directional influence of dietary components on the risk of MetS, beyond merely their importance, we employed Cox proportional hazards models to assess the risk ratios associated with 20 specific nutritional intake features (Figure 4C).The results organize the nutritional exposures by their estimated hazard ratios (HRs), delineating those with HRs greater than 1.0 as promotive of MetS and those with HRs less than 1.0 as inhibitive.The distribution pattern revealed that significant associations predominantly clustered away from an HR of one, with exposures exhibiting small p-values dispersed at both extremes of the spectrum.
Our analysis identified 24 significant associations across datasets segmented by age (Tables A4 and A5), with a p-value threshold set at <0.01 for proportionality assignments.In the younger cohort, cholesterol and total sugars were hypothesized to increase risk, whereas vitamin C, theobromine, and alpha-carotene were hypothesized to be protective.In contrast, for the middle-aged and older cohorts, variables such as energy, carbohydrates, and cholesterol were hypothesized to be associated with increased risk, whereas caffeine, theobromine, and alcohol were predicted to be protective.

Feature Importance Analysis in MetS Risk Prediction
In our investigation into the determinants of MetS risk, we employed the XGBoo algorithm, leveraging a curated set of 24 features to elucidate their respective contribu tions to the prediction of MetS (Figure 4, Tables A6 and A7).It is worth noting that BMI one of the determinants of OCH, and it is not meaningful to analyze height and weigh which form BMI, so we removed height and weight from the final report.Notably, alcohol intake revealed an inverse relationship with MetS risk (HR 0.29; 95% CI: 0.20-0.41;p-value < 4 × 10 −12 ) despite relatively low consumption levels within the study population.This observation underscores the nuanced role of alcohol in metabolic health, warranting further investigation.Then, we selected four nutrients for an in-depth examination: theobromine and cholesterol (for the younger cohort) and caffeine and carbohydrates (for middle-aged and elderly cohorts).The violin diagram illustrates the consumption patterns of these nutrients between populations with and without MetS (Figure 5).This analysis revealed that beneficial nutrients were generally consumed in higher quantities by the healthy population, whereas harmful nutrients were more frequently consumed by individuals with MetS, suggesting a tangible impact of these selected nutrients on MetS development.In contrast, the risk ratio of vitamin C in the younger cohort caught our attention, with a risk ratio of 0.55 representing the strong likelihood that it would have some inhibitory effect on MetS.This is also consistent with the findings of the previous article.In fact, the mechanism of action of vitamin C on MetS may be diverse and may be achieved by a Furthermore, we found that cholesterol has a negative effect on people of all ages.Cholesterol presented a risk for the younger cohort (HR 2.35; 95% CI: 1.65-3.34;p-value < 3 × 10 −6 ) and also presented a risk for middle-aged and elderly individuals (HR 2.23; 95% CI: 1.76-2.83;p-value < 4 × 10 −11 ), which is in line with previous findings in the literature [33,34].
In contrast, the risk ratio of vitamin C in the younger cohort caught our attention, with a risk ratio of 0.55 representing the strong likelihood that it would have some inhibitory effect on MetS.This is also consistent with the findings of the previous article.In fact, the mechanism of action of vitamin C on MetS may be diverse and may be achieved by a combination of its antioxidant properties, protective effects against oxidative stress, anti-inflammatory properties, and other potential biological functions [35,36].
Differently, caffeine and theobromine exhibited more pronounced benefits for middleaged and elderly cohorts, potentially due to their roles in addressing age-related health challenges, including cognitive decline, oxidative stress, and lipid metabolism disorders.The antioxidative properties of theobromine and the cardiovascular benefits associated with both caffeine and theobromine highlight the complex interplay between age, nutrient intake, and metabolic health [37][38][39].

Model Optimization for Predictive Analysis across Age Groups
In our endeavor to devise a predictive model capable of accurately forecasting MetS across a diverse age spectrum, we leveraged a dataset enriched with 24 meticulously selected features.This dataset encompassed the features above, including both demographic and nutritional intake indicators, tailored to enhance the predictive precision of our model.
Empirical evidence from our analyses clearly indicates that the "All Age All Nutrition" model is the pinnacle of predictive efficiency, with a ROC of 0.99783 (Figure A3), and that "All Age Some Nutrition" achieved the second performance in all ages.Compared to "All Age All Nutrition", "All Age Some Nutrition" required 43 fewer metrics and had a decrease in PR-AUC and ROC of less than 0.1%.The superior performance of the model constructed by "All Age Some Nutrition" emphasizes its robustness in predicting age-specific demographics, further demonstrating its excellent sensitivity metrics.The comprehensive evaluation of its performance metrics confirms its overall efficacy and acceptability for a wide range of predictive applications.

Discussion
The exploration of the interplay between nutritional intake and MetS has been limited, prompting our investigation using advanced deep learning methodologies, particularly the XGBoost algorithm.This study aimed to predict MetS and identify the OCH population leveraging four physical examination indicators and twenty nutritional indicators.The dataset comprised 5838 samples from the NHANES, enhanced through synthetic minority over-sampling technique (SMOTE) to address class imbalance.The model achieved very good results, showing that it is very effective in distinguishing MetS.
Our analysis extends beyond traditional models by incorporating nutritional indicators alongside physical indicators, distinguishing between OCH and MetS populations.This approach contrasts with earlier studies by Darko Ivanović, Maria Trigka, and others, who predominantly focused on basic demographic and physical parameters, and through these parameters, they predict individuals with MetS in the population, while our study focuses on nutritional indicators alongside physical metrics.By studying the OCH and MetS populations, we have hypothesized a number of factors that have an impact on MetS.The differences in research may stem from different research purposes; our research focuses on theory, which is to identify factors that may have a significant impact on MetS.
In our investigation, we employed sophisticated ML techniques alongside univariate analyses of fundamental physiological markers and dietary intake indicators.This approach enabled us to elucidate previously obscure correlations between diet and health metrics.Our findings highlight the differential impact of age-specific dietary patterns on the prevalence of MetS.We discovered that the influence of certain dietary components, including retinol, beta-cryptoxanthin, caffeine, and vitamin C, was more pronounced in younger cohorts.Conversely, in middle-aged and elderly cohorts, the dietary effects of caffeine, lycopene, retinol, and alcohol were more marked.Subsequent investigations revealed that both young healthy cohorts and their middle-aged and elderly counterparts exhibited higher consumption levels of theobromine or caffeine.In contrast, an increased intake of cholesterol or carbohydrates was observed in young and middle-aged/elderly cohorts diagnosed with MetS.
Vitamin C intake, on the other hand, may be beneficial in younger populations, which is consistent with previous findings, and may be related to the anti-inflammatory properties of vitamin C. Individuals with MetS experience chronic systemic inflammation, and vitamin C has been shown to play a beneficial role in reducing the inflammatory response in the body [35].In patients with MetS who were provided with a balanced diet and 500 mL/day of orange juice, an increase in vitamin C intake but a decrease in CRP and high-sensitivity C-reactive protein (hsCRP) levels were observed after three months of intervention [40].
Moreover, this investigation unveiled that different nutritional intakes may exert disparate pathogenic risks for MetS across various age groups.Notably, our study suggests that cholesterol intake may increase the risk of MetS in both younger and middle-aged and older cohorts.Conversely, the intake of caffeine and theobromine exhibits a possible beneficial effect in the middle-aged and elderly groups.This divergence may be attributed to age-related physiological transformations, including decelerated metabolism, alterations in sex hormone levels, cognitive function decline, augmented oxidative stress, and the dysregulation of lipid metabolism [41][42][43].These findings contribute novel insights into the intricate relationship between diet and MetS, expanding our understanding beyond the existing literature.
Our findings elucidate that identical nutritional components may exert different effects across distinct age demographics.Such observations underscore the necessity of considering patient age when formulating nutritional recommendations for individuals with MetS.Concurrently, our research suggests that an increased intake of theobromine and caffeine, coupled with a reduced consumption of cholesterol and carbohydrates, may contribute to the mitigation of MetS symptoms.Moreover, the role of retinol within various age brackets emerged as a focal point of interest in our study.After consulting information, we found that retinol binds to retinol binding protein (RBP) in the blood and affects energy homeostasis and insulin response [44,45].Moreover, retinol also has anti-inflammatory effects and has an impression of lipid metabolism [46][47][48], indicating that retinol may have a positive impact on MetS through multiple pathways.
Leveraging deep learning methodologies enables us to discern the significance and risk ratios associated with nutritional factors in the context of MetS.While conventional statistical approaches facilitate the identification of disparities between MetS-afflicted individuals and their healthy counterparts, they falter in explicitly delineating the importance of specific factors.The application of feature importance and Cox risk ratio models in the XG-Boost framework provides the nuanced ability to differentiate between key factors, and by using these methods, we can quickly find nutrients that may have some impact on a disease and then analyze them further, a feat not possible with traditional statistical techniques.
The use of the NHANES dataset, a comprehensive compilation of population data, underpins our study.However, we did not analyze some known risk factors, such as smoking and physical activity, for example, because of missing data.This can lead to distorted data being analyzed; specifically, some diets that may be healthy are considered unhealthy because of smoking or an extreme lack of exercise.The selection of a subset of 5838 participants necessitated the application of SMOTE to mitigate category imbalance, a technique that, while effective, may introduce biases reflecting synthetic rather than actual data distributions.This is particularly relevant in the analysis of middle-aged and elderly populations; this is because there is a large population of middle-aged and elderly people suffering from MetS, which leads to a small population of OCH (n = 162).After SMOTE treatment, the resulting population of OCH will also be limited to the original population, which may limit the accuracy and completeness of some nutritional factors.NHANES is a cross-sectional study that does not allow for a direct causal relationship between nutrition and MetS.Therefore, the further validation of the potential associations between nutrients and MetS identified above will need to be pursued through population tracking and other means.
Beyond this, we recognize that the assessment of dietary intake is a complex process in which the intake of various nutrients can amplify or mitigate their respective effects and that supplements were not considered during our experiments.XGBoost was chosen as our primary analytical tool in the model selection, and we acknowledge its strengths in handling complex datasets but also recognize the potential for alternative models or parameter configurations to produce better results.Similarly, our risk analysis utilizes a Cox model under uniform event time constraints, a simplification that may affect the estimation of time-dependent risk factors.

Conclusions
In conclusion, this study harnesses the power of advanced ML technologies to shed light on the intricate relationship between dietary intake and MetS, underlining the critical need for age-specific dietary recommendations within public health initiatives.Our research not only lays the groundwork for subsequent inquiries but also paves the way for the formulation of precise public health policies, thus making a notable contribution to the discipline.

Figure 1 . 1 .
Figure 1.Participant selection and exclusion criteria flowchart.This schematic depicts the process of creating dataset 4 from four attribute-filtered datasets, consolidated into a table via sequence Figure 1.Participant selection and exclusion criteria flowchart.This schematic depicts the process of creating dataset 4 from four attribute-filtered datasets, consolidated into a table via sequence number (seqn), followed by the removal of duplicates, pregnant women, missing, and unreasonable data entries.Unreasonable data are defined as values exceeding five times the standard deviation above the attribute mean.Exclusions also include neither the metabolic syndrome nor the optimal cardiometabolic health population and minors, resulting in 5838 records, comprising 4721 MetS subjects and 1117 with OCH.Dataset 2 encompasses 2532 participants aged ≤44 years post-SMOTE processing.Dataset 3 includes 6910 participants aged ≥45 years, also post-SMOTE.Dataset 1 is derived from dataset 4, featuring 9442 entries post-SMOTE processing.

24 Figure 2 .
Figure 2. Original dataset (dataset 4) analysis.(A) A histogram analyzes the distribution of different age groups, along with the prevalence of OCH versus metabolic syndrome within these groups.(B)

Figure 2 .
Figure 2. Original dataset (dataset 4) analysis.(A) A histogram analyzes the distribution of different age groups, along with the prevalence of OCH versus metabolic syndrome within these

Figure 3 .
Figure 3. Model selection analysis.(A) Compares the accuracy of four models using dataset 1's validation set.(B) Assesses the sensitivity and specificity of these models.(C) Shows the precisionrecall curve.(D) Illustrates the ROC curve for model evaluation using dataset 1's validation set.

Figure 3 .
Figure 3. Model selection analysis.(A) Compares the accuracy of four models using dataset 1's validation set.(B) Assesses the sensitivity and specificity of these models.(C) Shows the precisionrecall curve.(D) Illustrates the ROC curve for model evaluation using dataset 1's validation set.

Figure 4 .
Figure 4. Dietary factors: feature importance and p-value distribution.(A) The top 10 feature im portance in the XGBoost model for age ≤44.(B) The top 10 feature importance in the XGBoost mod for age ≥45.(C) The p-value distribution from two-sided Wald tests, with the Y-axis showing th negative logarithm of each exposure's p-value.The dotted red line indicates the p-value thresho

Figure 4 .
Figure 4. Dietary factors: feature importance and p-value distribution.(A) The top 10 feature importance in the XGBoost model for age ≤44.(B) The top 10 feature importance in the XGBoost model for age ≥45.(C) The p-value distribution from two-sided Wald tests, with the Y-axis showing the negative logarithm of each exposure's p-value.The dotted red line indicates the p-value threshold of 0.01.Significant nutrients negatively associated with metabolic syndrome (HR < 1) are highlighted in green, and those with a positive association (HR > 1) are in red.

Figure 5 .
Figure 5. Significant nutritional elements by age group.The white horizontal line represents the average value.(A) Cholesterol as a possible metabolic syndrome-promoting nutrient in patients aged ≤44.(B) Theobromine as a possible metabolic syndrome-inhibiting nutrient in patients aged ≤44.(C) Carbohydrates as possible metabolic syndrome-promoting nutrients in patients aged ≥45.(D) Caffeine as a possible metabolic syndrome-inhibiting nutrient in patients aged ≥45.

Figure 5 .
Figure 5. Significant nutritional elements by age group.The white horizontal line represents the average value.(A) Cholesterol as a possible metabolic syndrome-promoting nutrient in patients aged ≤44.(B) Theobromine as a possible metabolic syndrome-inhibiting nutrient in patients aged ≤44.(C) Carbohydrates as possible metabolic syndrome-promoting nutrients in patients aged ≥45.(D) Caffeine as a possible metabolic syndrome-inhibiting nutrient in patients aged ≥45.

Figure A1 .
Figure A1.Approach overview.This figure outlines the sequential execution steps of our methodology.Step 1 involves data processing, detailing the handling of temporary dataset 1.In Step 2, we assess four models against five evaluation metrics to determine the most suitable model for selection.Step 3 focuses on feature selection within the dataset, applying feature importance and manual examination to construct a new model.This model undergoes further analysis for feature importance and the impact of feature adjustments.

Figure A2 .
Figure A2.Age group dataset comparison.(A,B) Precision-recall and ROC curves for age group ≤44.(C,D) Precision-recall and ROC curves for age group ≥45.Figure A2.Age group dataset comparison.(A,B) Precision-recall and ROC curves for age group ≤44.(C,D) Precision-recall and ROC curves for age group ≥45.

Figure A2 .
Figure A2.Age group dataset comparison.(A,B) Precision-recall and ROC curves for age group ≤44.(C,D) Precision-recall and ROC curves for age group ≥45.Figure A2.Age group dataset comparison.(A,B) Precision-recall and ROC curves for age group ≤44.(C,D) Precision-recall and ROC curves for age group ≥45.

Table A4 .
A Cox risk analysis (age as covariate) was performed for screening the nutrition in a population <45 years of age.

Table A5 .
A Cox risk analysis (age as covariate) was performed for screening the nutrition in a population ≥45 years of age.