Mapping Mental Trajectories to Physical Risk: An AI Framework for Predicting Sarcopenia from Dynamic Depression Patterns in Public Health

Han, Yaxin; Tian, Renzhi; Pan, Chengchang; Qi, Honggang

doi:10.3390/ai6120300

¹ School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 101408, China

² Department of Nutrition and Food Hygiene, School of Public Health, Peking University, Haidian District, Beijing 100191, China

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

AI 2025, 6(12), 300; https://doi.org/10.3390/ai6120300

Submission received: 20 October 2025 / Revised: 17 November 2025 / Accepted: 18 November 2025 / Published: 21 November 2025

(This article belongs to the Topic Artificial Intelligence in Public Health: Current Trends and Future Possibilities, 2nd Edition)

Abstract

Background: The accelerating global population aging underscores the urgency of addressing public health challenges. Sarcopenia and depression are prevalent, interrelated conditions in older adults, yet prevailing research often treats depression as a static state, neglecting its longitudinal progression and limiting predictive capability for sarcopenia. Methods: Using data from four waves (2011–2018) of the China Health and Retirement Longitudinal Study (CHARLS), we identified distinct depressive symptom trajectories via Group-Based Trajectory Modeling. Seven machine learning algorithms were employed to develop predictive models for sarcopenia risk, incorporating these trajectory patterns and baseline characteristics. Results: Three depressive symptom trajectories were identified: ‘Persistently Low’, ‘Persistently Moderate’, and ‘Persistently High’. Tree-based ensemble methods, particularly Random Forest and XGBoost, demonstrated superior and robust performance (mean accuracy: 0.8265 and 0.8178; mean weighted F1-score: 0.8075 and 0.8084, respectively). Feature importance analysis confirmed depressive symptoms as a core, independent predictor, ranking third (5.7% importance) in the optimal Random Forest model, only after BMI and cognitive function, and surpassing traditional risk factors like age and waist circumference. Conclusions: This study validates that longitudinal depressive symptom trajectories provide superior predictive power for sarcopenia risk compared to single-time-point assessments, effectively mapping mental health trajectories to physical risk. The robust ML framework not only enables early identification of high-risk individuals but also reveals a multidimensional risk profile, highlighting the intricate mind–body connection in aging. These findings advocate for integrating dynamic mental health monitoring into routine geriatric assessments, demonstrating the potential of AI to facilitate a paradigm shift towards proactive, personalized, and scalable prevention strategies in public health and clinical practice.

Keywords: sarcopenia; depression; trajectory analysis; machine learning; China Health and Retirement Longitudinal Study (CHARLS); artificial intelligence; public health; predictive modeling

1. Introduction

The accelerating pace of global population aging has made promoting healthy aging among older adults an urgent public health priority [1,2]. Within this context, the co-occurrence and interaction between physical frailty and mental health disorders represent a central challenge in geriatric health. Sarcopenia—a syndrome characterized by progressive and generalized loss of skeletal muscle mass, strength, and function [3]—and depression, a prevalent affective disorder, are highly common and closely interrelated in the older population [4]. Numerous cross-sectional and longitudinal studies have established a significant association between depression and sarcopenia [5,6,7]. These conditions frequently present as “psychosomatic comorbidities,” collectively exacerbating the risks of disability, hospitalization, and mortality [8].

However, prevailing research paradigms exhibit notable limitations. First, the vast majority of studies treat depression as a static baseline condition, examining its association with sarcopenia based solely on severity at a single time point [9,10,11]. This approach overlooks the inherent dynamic nature and heterogeneity of depression. In its natural course, symptoms often follow distinct trajectory patterns such as ‘persistent high,’ ‘worsening,’ or ‘remitting-relapsing’ [12,13]. This simplified, static perspective substantially limits the ability to prospectively predict future physical risk. Second, although potential biological mechanisms (e.g., hypothalamic–pituitary–adrenal axis dysregulation, chronic inflammation) have been extensively explored [14,15,16], the role of observable and modifiable behavioral and lifestyle factors (such as reduced physical activity, social withdrawal, and sleep disturbances) in linking depression to sarcopenia has received less attention, and they consequently remain poorly represented in existing predictive models [17]. Finally, clinical practice lacks effective tools to accurately identify individuals with depression who are at the highest risk of developing sarcopenia before substantial, often irreversible, muscle loss occurs, thereby missing the critical window for early intervention.

To address these gaps and meet the pressing need for proactive public health interventions, this study introduces a novel AI-driven approach. We hypothesize that the dynamic trajectory of depressive symptoms, rather than a single time-point assessment, serves as a more powerful and clinically realistic indicator for predicting future sarcopenia risk. Recent years have seen machine learning (ML) algorithms demonstrate significant potential in leveraging longitudinal data for disease prediction [18,19]. Building on this, our study aims to develop and validate an ML-based risk prediction model using large-scale, long-term longitudinal cohort data. The core innovation of our model lies in its use of long-term depression trajectory patterns as key predictors. Our primary aims are to:

(1): develop an AI-powered early-warning system for sarcopenia in older adults by identifying high-risk individuals with specific depressive symptom trajectories;
(2): systematically demonstrate the predictive superiority of longitudinal trajectory data over conventional single-point assessments;
(3): identify core predictors of sarcopenia risk linked to depressive states, pinpointing precise targets for subsequent multi-dimensional, personalized non-pharmacological interventions and paving the way for cost-effective, population-level screening.

In parallel with conventional statistical approaches, artificial intelligence and machine learning (ML) have emerged as powerful tools for disease prediction. Several studies have begun to explore ML models for sarcopenia risk assessment. For instance, recent research has successfully employed algorithms such as Random Forests, XGBoost, and Support Vector Machines, primarily utilizing static, cross-sectional data encompassing anthropometric measurements, physical performance tests, and blood biomarkers [20,21]. While these studies demonstrate the feasibility of AI in this domain, a common limitation persists: the reliance on single-time-point assessments of risk factors. This approach fails to capture the temporal dynamics and heterogeneous progression of key modifiable predictors, such as depressive symptoms. Consequently, our study introduces a novel paradigm by integrating longitudinal trajectories of depressive symptoms, identified via Group-Based Trajectory Modeling, as core predictors within an ML framework. We hypothesize that this dynamic representation of mental health will provide superior predictive power for incident sarcopenia compared to models using only static assessments.

To our knowledge, this is the first study to attempt stratifying the risk of sarcopenia in older adults based on the long-term dynamic trajectories of depressive symptoms using machine learning. The findings from this research have the potential to provide a low-cost, scalable risk-stratification tool for geriatric medicine and psychiatry, facilitating a paradigm shift in clinical practice from reactive treatment towards proactive, preventive management.

2. Methods

2.1. Data Resource

This analysis leverages data from the China Health and Retirement Longitudinal Study (CHARLS), a longitudinal survey with national representation that commenced in 2011. To ensure the sample accurately reflects the national population of middle-aged and elderly Chinese, the study design utilized a multi-stage, probability-proportional-to-size sampling approach, with analytical weights provided for generating national estimates. All study procedures complied with the ethical standards of the Declaration of Helsinki and were approved by the Biomedical Ethics Committee of Peking University (IRB00001052-11015). Prior publications detail the full cohort profile, and all participants provided written informed consent [22]. The baseline survey in 2011 enrolled 17,708 individuals from 28 provinces through structured, in-person interviews that gathered extensive demographic, health, and biomarker data. Follow-up data are collected every two to three years. An additional recruitment wave in 2015 expanded the total sample size to 21,095 participants.

2.2. Study Participants

Participants in the CHARLS 2011 wave formed the baseline cohort. Participants were excluded if they met any of the following criteria: (1) unavailable sarcopenia assessment data; (2) incomplete physical performance measures, including handgrip strength (HGS), the five-time chair stand test (5-CST), or the 6-min walk test (6-WT); (3) missing data on age, sex, height, or weight; (4) lack of follow-up depression data. Per protocol, individuals under 60 years of age were exempt from the 6-WT and were classified as having normal results for this component.

2.3. Sarcopenia Assessment

Sarcopenia was defined according to the algorithm derived from the Asian Working Group for Sarcopenia (AWGS) criteria [23], as operationalized in the CHARLS dataset. The diagnosis was based on three components: low muscle strength, low physical performance, and low muscle mass.

Low Muscle Strength: Assessed by handgrip strength (HGS). The maximum values recorded for the left and right hands were averaged. Low muscle strength was defined as an average grip strength of <28 kg for men and <18 kg for women.

Low Physical Performance: Assessed using the five-time chair stand test (5-CST). Low physical performance was defined as a completion time of ≥12 s for five stands.

Low Muscle Mass: Determined by the appendicular skeletal muscle mass index (ASM/height²). The ASM was estimated using a validated anthropometric equation. Low muscle mass was defined as an ASM/height² of <6.88 kg/m² for men and <5.69 kg/m² for women.

Participants were then categorized into four mutually exclusive groups based on these components: (1) No Sarcopenia: Participants with normal muscle mass. (2) Possible Sarcopenia: Participants with low muscle mass, but with normal muscle strength and normal physical performance. (3) Sarcopenia: Participants with low muscle mass, plus exactly one of low muscle strength or low physical performance. (4) Severe Sarcopenia: Participants with low muscle mass, low muscle strength, and low physical performance.
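The four-group rule in this section can be written as a short function. The following is a minimal sketch, not the CHARLS implementation: the function and constant names are hypothetical, ASM is assumed to be precomputed (the anthropometric equation is not given here), and physical performance uses only the chair stand cut-off from the component definitions.

```python
# Cut-offs quoted in Section 2.3; dictionary keys are illustrative labels.
GRIP_CUTOFF_KG = {"male": 28.0, "female": 18.0}
ASMI_CUTOFF_KG_M2 = {"male": 6.88, "female": 5.69}
CHAIR_STAND_CUTOFF_S = 12.0


def classify_sarcopenia(sex, grip_kg, chair_stand_s, asm_kg, height_m):
    """Assign one of the four mutually exclusive categories.

    grip_kg is the averaged handgrip strength, chair_stand_s the time for
    five stands, asm_kg a precomputed appendicular skeletal muscle mass.
    """
    low_strength = grip_kg < GRIP_CUTOFF_KG[sex]
    low_performance = chair_stand_s >= CHAIR_STAND_CUTOFF_S
    low_mass = asm_kg / height_m ** 2 < ASMI_CUTOFF_KG_M2[sex]

    if not low_mass:
        return "No Sarcopenia"
    # Count how many functional deficits accompany the low muscle mass.
    deficits = int(low_strength) + int(low_performance)
    if deficits == 0:
        return "Possible Sarcopenia"
    if deficits == 1:
        return "Sarcopenia"
    return "Severe Sarcopenia"
```

Counting deficits, rather than chaining boolean tests, keeps the "Sarcopenia" and "Severe Sarcopenia" groups from overlapping.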

2.4. Depression Assessment

Depressive symptoms were evaluated with the 10-item Center for Epidemiologic Studies Depression Scale (CESD-10), an instrument designed to measure the frequency of depressive experiences during the previous week. The CESD-10 includes ten questions, categorized into eight items reflecting negative feelings or behaviors and two items indicating positive feelings or behaviors. Responses for negative items were scored according to their frequency: 0 points for less than 1 day, 1 point for 1–2 days, 2 points for 3–4 days, and 3 points for 5–7 days. The two positive items were scored in reverse. Summing all item scores yielded a total score ranging from 0 to 30, where a higher total score corresponds to greater depressive severity. The psychometric properties, including reliability and validity, of the CESD-10 have been confirmed in prior research [24]. Consistent with methodological precedents [25], a cut-off score of 10 was applied to define depression as a binary variable. Accordingly, participants with a total CESD-10 score ≥ 10 were categorized as having clinically significant depressive symptoms.
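The scoring rule can be sketched as follows. This is an illustration only: responses are assumed to be already coded 0 to 3 by frequency category, and the positions of the two positively worded items (here the fifth and eighth) are an assumption, since the text does not state which items they are.

```python
# Zero-based positions of the two positive items; assumed, not taken from the text.
POSITIVE_ITEMS = (4, 7)


def cesd10_total(responses, positive_items=POSITIVE_ITEMS):
    """Sum ten item responses (each coded 0-3), reverse-scoring positive items."""
    if len(responses) != 10:
        raise ValueError("CESD-10 requires exactly ten responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be coded 0, 1, 2, or 3")
    return sum(3 - r if i in positive_items else r
               for i, r in enumerate(responses))


def has_significant_symptoms(total, cutoff=10):
    """Apply the cut-off of 10 described in the text (score >= 10)."""
    return total >= cutoff
```

With this coding, the total ranges from 0 (no negative symptoms, positive feelings on 5–7 days) to 30.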

2.5. Other Input Variables

Other input variables were selected at baseline (Wave 1), collected primarily through questionnaires and physical measurements. These variables were categorized as follows: demographic factors (sex, age, residence/province, marital status), lifestyle factors (nocturnal sleep duration, smoking, alcohol consumption, life satisfaction, social activity, exercise), health status (pain, self-rated health), disease-related factors (activities of daily living, hypertension, dyslipidemia, diabetes, disability, tooth loss, fracture, chronic disease status, cognitive function), socioeconomic factors (education level, income), and physical examination data (waist circumference, body mass index). To minimize bias due to extensive missing data, participants with more than 20% missing information across the input variables at Wave 1 were excluded from the analysis.

2.6. Trajectory Analysis

We employed Group-Based Trajectory Modeling (GBTM) on CESD-10 scores from Waves 1 to 4 to identify distinct subgroups of older adults with similar long-term depressive symptom patterns. We estimated models with one to six trajectory classes. The optimal number of latent trajectories was determined by comparing these models using several key metrics:

(1): Information Criteria: The Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC) were used as the primary statistical criteria. Lower values indicate a better model fit, with BIC placing a stronger penalty on model complexity.
(2): Classification Quality: The Average Posterior Probability (AvePP) for each class was examined to assess the accuracy of class assignment. AvePP values above 0.70 for all classes are considered indicative of good classification certainty.
(3): Class Proportions and Interpretability: The substantive meaning and clinical relevance of the trajectories, along with the proportion of individuals in each class, were critically evaluated. We required that the final solution yield distinct, interpretable patterns with all class proportions being substantively meaningful.

Within the GBTM framework, we used the Latent Class Growth Model (LCGM), which assumes homogeneity within classes, for its parsimony and clinical interpretability. To justify this choice, we conducted a sensitivity analysis comparing LCGM with the more flexible Growth Mixture Model (GMM), which allows for within-class variation. As presented in Supplementary Table S1, for the optimal three-class solution, the GMM did not yield a superior fit compared to the LCGM (SABIC values were identical: 46,742.96), while the LCGM solution demonstrated high classification accuracy (Entropy = 0.80). Given the comparable fit and our primary goal of identifying distinct, interpretable subgroups for downstream prediction, the more parsimonious LCGM was selected for all subsequent analyses.
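The two classification-quality measures mentioned above, AvePP and entropy, can both be computed from the matrix of posterior class-membership probabilities that trajectory software typically exports. The sketch below assumes such a matrix is available as a list of rows (one per participant); the function names are hypothetical, and the entropy formula is the commonly used relative entropy, which may differ in detail from what the authors' software reports.

```python
import math


def average_posterior_probabilities(posteriors):
    """AvePP per class: mean posterior of a class among members assigned to it.

    Each participant is assigned to the class with the highest posterior.
    """
    n_classes = len(posteriors[0])
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for row in posteriors:
        assigned = max(range(n_classes), key=row.__getitem__)
        sums[assigned] += row[assigned]
        counts[assigned] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; values near 1 indicate clear separation."""
    n = len(posteriors)
    k = len(posteriors[0])
    uncertainty = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - uncertainty / (n * math.log(k))
```

A candidate solution would pass the 0.70 rule of thumb when `min(average_posterior_probabilities(posteriors)) > 0.70`.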

The comprehensive results of this model comparison are presented in Section 3.2. Based on this assessment, the three-class trajectory model was selected as the optimal solution. The trajectory group assigned to each individual was subsequently used as a key stratification variable in the prediction models for sarcopenia.

To address the potential sensitivity of downstream predictive models to the number of trajectory classes, we conducted a sensitivity analysis by estimating GBTM solutions with two to five classes. While models with four or five classes showed comparable statistical fit in terms of BIC (e.g., the five-class model achieved a BIC of −20,076.19), the three-class solution was ultimately selected as the optimal model. This decision was based on its superior balance of statistical fit, high classification certainty (all Average Posterior Probabilities > 0.79), and, most critically, its clearer clinical interpretability and the substantive meaning of the resulting trajectories (‘Persistently Low’, ‘Persistently Moderate’, and ‘Persistently High’). The trajectories and cross-dataset model performance for the five-class solution are provided in Supplementary Figures S1 and S2 for reference. This approach of aligning the structural assumptions of the trajectory model with the requirements of the downstream predictive task is consistent with best practices in hybrid analytical frameworks [26].

2.7. Predictive Model Development and Evaluation

We addressed the issues of missing data and class imbalance with the following rigorous protocol. First, to handle missing values in the 28 input variables, we applied a comprehensive, adaptive multiple imputation strategy. The imputation was performed using the IterativeImputer from the scikit-learn library, which employs a chained equations approach. A RandomForestRegressor (with n_estimators = 20) was used as the underlying predictive model for continuous variables within the iterative imputer, which was run for max_iter = 10 iterations to achieve convergence.

To enhance the biological plausibility of the imputations, we tailored the strategy for different variable categories based on their epidemiological characteristics:

(1): Health status variables (e.g., hypertension, diabetes, chronic disease) were imputed using a combination of individual historical data (forward/backward filling where available) followed by mode imputation;
(2): Lifestyle variables (e.g., smoking, alcohol consumption) were imputed using longitudinal patterns;
(3): Cognitive function variables (e.g., total cognition score) were imputed based on linear regression models incorporating age;
(4): Socioeconomic variables (e.g., income) were imputed using stratified means/modes based on geographical region;
(5): Categorical variables were imputed using the mode.

We generated m = 5 complete datasets, each with a different random seed (42 + i for i = 1 to 5). The final analysis used the aggregated results from these five datasets. The imputation process was successful, with a 100% imputation rate, reducing the missingness from the original count to zero across all 28 processed variables.
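The iterative part of this procedure can be sketched with scikit-learn using the settings quoted above (n_estimators = 20, max_iter = 10, seeds 42 + i). This is an illustrative sketch on synthetic data, not the authors' pipeline: the helper name and the toy matrix are hypothetical, and the category-specific rules (forward/backward filling, stratified means, mode imputation) are not reproduced.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the import below)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor


def impute_m_datasets(X, m=5, base_seed=42):
    """Return m completed copies of X, one per random seed (base_seed + i)."""
    completed = []
    for i in range(1, m + 1):
        seed = base_seed + i
        imputer = IterativeImputer(
            estimator=RandomForestRegressor(n_estimators=20, random_state=seed),
            max_iter=10,
            random_state=seed,
        )
        completed.append(imputer.fit_transform(X))
    return completed


# Synthetic continuous data with roughly 10% of entries set to missing.
rng = np.random.default_rng(0)
X_full = rng.normal(size=(80, 4))
X_full[:, 3] = X_full[:, 0] + 0.1 * rng.normal(size=80)
missing_mask = rng.random(X_full.shape) < 0.1
X_missing = X_full.copy()
X_missing[missing_mask] = np.nan

datasets = impute_m_datasets(X_missing)
```

Downstream models would then be fitted on each completed dataset and their results aggregated, as the text describes.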

To assess the potential impact of selection bias introduced by our inclusion criteria (which required complete data across waves), we conducted a sensitivity analysis. We compared the baseline characteristics of participants included in the final analytical sample (n = 6125) with those who were excluded due to missing data (n = 5481). As presented in Supplementary Table S2, the two groups differed significantly in a number of demographic, socioeconomic, and health-related characteristics. The excluded participants were, on average, younger, more likely to be female, had a lower BMI, and had a lower prevalence of chronic conditions such as high blood pressure. However, critically, the two groups did not differ significantly in the levels of the key exposure variable, depressive symptoms (CESD-10 score, p = 0.795), or in cognitive function (total cognition score, p = 0.104). This pattern suggests that while our analytical sample is not fully representative of the entire baseline cohort on some covariates, the core relationships under investigation between mental trajectory and physical outcome may be less affected.

Second, to mitigate the significant class imbalance arising from the lower prevalence of sarcopenia (which can compromise model performance), we implemented SMOTETomek, which combines the Synthetic Minority Over-sampling Technique (SMOTE) with Tomek-link removal. Crucially, this resampling was applied exclusively during the training phase of the 10-fold cross-validation, ensuring that the final model evaluation was performed on a pristine, unaltered test set. This strict separation, combined with the inherent robustness of tree-based ensembles like Random Forest and XGBoost, helps guard against overfitting and supports a reliable estimate of model generalizability to new, imbalanced data. Accordingly, we employed the weighted F1-score as our primary evaluation metric to ensure a balanced assessment of precision and recall across all classes.
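The key point of this protocol, resampling only the training portion of each fold, can be illustrated without any resampling library. The sketch below uses plain random oversampling as a simple stand-in for SMOTETomek (it is not the technique the authors used), and all function names are hypothetical.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def oversample(indices, labels, rng):
    """Stand-in resampler: duplicate minority indices up to the largest class size."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


def resampled_folds(labels, k=10, seed=0):
    """Yield (balanced_train_indices, untouched_validation_indices) per fold."""
    rng = random.Random(seed)
    for held_out in kfold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        # Resampling sees only training indices; the held-out fold is never altered.
        yield oversample(train, labels, rng), held_out
```

Because the validation indices are returned unchanged, performance in each fold is measured on data with the original class distribution.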

In this study, seven machine learning algorithms were employed to develop predictive models for sarcopenia risk based on dynamic depression trajectories, including logistic regression (LR), random forest (RF), XGBoost, multilayer perceptron (MLP), recurrent neural network (RNN), long short-term memory (LSTM), and Transformer. Among them, LR serves as a classical statistical baseline model offering high interpretability [27]. RF and XGBoost represent powerful ensemble methods based on bagging and boosting principles, respectively, which enhance predictive performance by combining multiple decision trees while mitigating overfitting [28,29]. To capture complex temporal patterns in longitudinal depression data, we also implemented several deep learning architectures: MLP as a foundational feedforward network [30], RNN and LSTM for modeling sequential dependencies, and the Transformer model to leverage self-attention mechanisms for capturing long-range interactions in symptom progression [31,32].

To ensure robust model development and evaluation, the dataset was randomly partitioned into a training set (80%) and a held-out test set (20%) [33]. We adhered to standard machine learning protocols, employing 10-fold cross-validation on the training set for hyperparameter optimization [34]. Model performance was evaluated on the independent test set using multiple metrics: accuracy to assess overall predictive performance, and the weighted F1-score (the primary metric, given the class imbalance in our dataset), together with precision and recall individually, to provide detailed insights into the model’s predictive characteristics. The 95% confidence intervals for test accuracy were calculated and reported to quantify the uncertainty of the performance estimates. Statistical significance was determined through hypothesis testing to confirm that model performances were better than random chance. Furthermore, we analyzed feature importance patterns from the best-performing classifiers across different trajectory groups to identify key predictors.
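For reference, the weighted F1-score averages per-class F1 values using each class's share of the true labels as the weight. The sketch below computes it from scratch, together with a confidence interval for accuracy. The interval uses the normal (Wald) approximation, which is an assumption: the text does not state which interval method was used. Function names are hypothetical.

```python
import math
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_cls in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n_cls - tp
        # F1 = 2TP / (2TP + FP + FN); the denominator is >= 1 because n_cls >= 1.
        total += n_cls * (2 * tp / (2 * tp + fp + fn))
    return total / len(y_true)


def accuracy_ci(y_true, y_pred, z=1.96):
    """Accuracy with an approximate 95% interval (normal approximation, assumed)."""
    n = len(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    half_width = z * math.sqrt(acc * (1 - acc) / n)
    return acc, max(0.0, acc - half_width), min(1.0, acc + half_width)
```

Because the weights follow class frequency, a model can score well on this metric while still performing poorly on a small class, so per-class results remain worth inspecting.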

3. Results

3.1. Population Demographics

The baseline characteristics of the study participants, stratified by their depressive symptom trajectories, are presented in Table 1. Based on the depressive symptom trajectories, Group 1 (persistently low depressive symptom group) was characterized by a higher proportion of males, urban residents, and individuals with higher income and education levels. In contrast, Groups 2 and 3 (persistently moderate and persistently high depressive symptom groups) showed a higher prevalence of females, rural residents, lower socioeconomic status, poorer self-reported health, and higher rates of comorbidities, body pain, ADL disorders, and disabilities. Additionally, more severe depressive symptoms were associated with shorter sleep duration and lower cognitive scores.

3.2. Heterogeneous Trajectories of Depressive Symptoms

Based on the model fit statistics (Table 2), the three-class solution was selected as the optimal trajectory model for depressive symptoms. While the Bayesian Information Criterion (BIC) continued to decrease through the five-class model, indicating potential statistical improvement, the three-class model was chosen based on a superior balance of statistical fit and clinical utility. Crucially, the three-class solution demonstrated excellent classification accuracy, with all Average Posterior Probabilities (AvePP) exceeding 0.79, well above the 0.70 threshold that indicates clear and reliable class separation. In contrast, models with four or more classes contained at least one trajectory group with an AvePP below 0.70 (e.g., AvePP = 0.69 for a class in the four-class model), signifying deteriorating classification certainty. Furthermore, the three-class structure, comprising Group 1 (Persistently Low), Group 2 (Persistently Moderate), and Group 3 (Persistently High), yielded substantively meaningful, distinct trajectories with robust class proportions (all > 18%), thereby ensuring clear interpretability and clinical relevance for subsequent risk prediction. Heterogeneous trajectory classes of depressive symptoms are shown in Figure 1. The final three-group trajectory model of depressive symptoms in middle-aged and older adults from CHARLS is presented in Table 3.

3.3. Cross-Dataset Performance Benchmarking of Machine Learning Models

To comprehensively evaluate the predictive performance and generalizability of our approach, we benchmarked seven machine learning algorithms across three independent datasets derived from distinct depressive symptom trajectories. The models included both classical and advanced architectures: Logistic Regression (LR) as an interpretable baseline, tree-based ensemble methods (Random Forest [RF], XGBoost), and deep learning models (Multilayer Perceptron [MLP], Recurrent Neural Network [RNN], Long Short-Term Memory [LSTM], and Transformer).

The aggregated performance metrics (Accuracy and Weighted F1-score) for all models across the three datasets are summarized in Table 4. Across all benchmarks, the tree-based ensemble methods, RF and XGBoost, consistently demonstrated superior and robust performance. RF achieved the highest mean accuracy (0.8265), while XGBoost attained the highest mean Weighted F1-score (0.8084), a critical metric given the class imbalance in our data. The consistent superiority of these two models across all three datasets is visually apparent in the overall performance comparison (Figure 2), as well as in the detailed accuracy and F1-score comparisons (Figure 3 and Figure 4, respectively). A detailed analysis of class-wise performance, as visualized in the detailed heatmaps (Figure 5), revealed a consistent challenge: the accurate identification of the minority class (Class 2). The overall stability and variability of each model across all evaluation metrics are further visualized in Figure 6, which presents the mean and standard deviation of the results. To provide a complete picture of the models’ predictive characteristics, we also compared their weighted precision (Figure 7) and weighted recall (Figure 8) across the datasets. The performance ranking remained remarkably stable: RF and XGBoost occupied the top two positions in all three datasets.

Despite their capacity to model complex temporal patterns, deep learning models (MLP, RNN, LSTM, Transformer) were, on average, outperformed by the ensemble methods (Figures 2–5). Notably, the Transformer model achieved competitive accuracy on Dataset_01 (0.824), suggesting an aptitude for capturing specific long-term temporal dependencies in depressive symptoms. However, its performance exhibited greater variability across datasets compared to the more stable tree-based ensembles. The deep learning models showed clear signs of overfitting and demonstrated poor generalization capability. Furthermore, all deep learning models required substantially greater computational resources for training. This establishes the practical superiority of tree-based ensembles for this specific prediction task.

As indicated in Figure 4 and detailed in Supplementary Table S3, a consistent challenge was the accurate identification of the minority class (Class 2), which represented the smallest patient subgroup across all datasets. This is a known limitation in machine learning with highly imbalanced data. However, for the larger and clinically significant classes (Classes 1, 3, and 4), RF and XGBoost provided reliable and substantially better discrimination than other models, as evidenced by their higher performance in the relevant metrics.

This comprehensive cross-dataset benchmarking, synthesizing evidence from all performance metrics and visualizations (Figures 2–8 and Table 4), confirms that the ensemble methods, particularly RF and XGBoost, deliver the most robust and effective performance for predicting sarcopenia risk based on dynamic depression trajectories. They successfully balance high predictive power with computational efficiency, making them the most suitable candidates for practical deployment in this context. The overall performance statistics summary of each dataset is detailed in Supplementary Table S4.

To evaluate the clinical utility of our predictive models beyond conventional performance metrics, we conducted a comprehensive Decision Curve Analysis (DCA) across all outcome classes and models (Supplementary Figures S3–S13). The analysis revealed that the top-performing tree-based ensemble models, particularly Random Forest and XGBoost, demonstrated superior net benefit compared to both “Treat All” and “Treat None” strategies across a wide range of clinically relevant threshold probabilities (approximately 10–50%).

For the primary sarcopenia outcome classes (Class 3 “Sarcopenia” and Class 4 “Severe Sarcopenia”), these models maintained a positive net benefit throughout most threshold ranges, indicating their practical value in clinical decision-making. This DCA indicates that using our best-performing models to guide intervention decisions would yield a higher net benefit than either default strategy, by appropriately balancing true positives against false positives across varying risk thresholds and clinical preferences. Complementing the DCA, the confusion matrices presented in Figure 9 offer detailed insights into each model’s classification performance across all sarcopenia outcome classes, revealing specific patterns of correct and incorrect predictions.
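The net benefit plotted in a decision curve has a simple closed form: at threshold probability p_t, it equals TP/n − (FP/n) × p_t/(1 − p_t). The sketch below computes it for a single binary outcome. Applying it to one sarcopenia class against the rest is an assumption about how the multi-class outcome would be handled, and the function names and toy data are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on everyone with predicted risk >= threshold.

    y_true holds 0/1 outcomes; y_prob holds predicted probabilities of the outcome.
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'Treat All' strategy ('Treat None' is always 0)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy example: 3 cases among 10 people, evaluated at a 50% threshold.
outcomes = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
risks = [0.9, 0.7, 0.6, 0.2, 0.1, 0.4, 0.3, 0.05, 0.15, 0.25]
model_nb = net_benefit(outcomes, risks, 0.5)
treat_all_nb = net_benefit_treat_all(outcomes, 0.5)
```

A model adds clinical value at a given threshold when its net benefit exceeds both the "Treat All" value and zero; a decision curve repeats this comparison over a grid of thresholds.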

3.4. Feature Importance

To identify the key drivers of sarcopenia risk and validate the central hypothesis of our study, we performed a comprehensive feature importance analysis across all seven machine learning models. The results robustly affirm that depressive symptoms are a core, independent predictor of sarcopenia, while also delineating a comprehensive multidimensional risk profile.

As summarized in Table 5, depressive symptoms (CESD-10 score) consistently ranked as one of the most significant predictors. This finding was most pronounced in the tree-based ensemble models, which demonstrated superior overall performance. In the Random Forest model, depressive symptoms were the third most important predictor (5.7% importance), trailing only the established factors of BMI and cognitive function. This places depressive symptom severity ahead of other critical metrics, such as age, waist circumference, and all other lifestyle factors, in predicting sarcopenia risk.

The critical role of mental health was further embedded within a broader pathophysiological context. The dominance of body composition metrics (BMI and waist circumference) across all models underscores the interplay between metabolic health and musculoskeletal integrity. Similarly, the prominence of age and gender aligns with established epidemiological evidence for sarcopenia.

Furthermore, the feature importance patterns reveal potential mechanistic pathways linking depression to sarcopenia. The high ranking of cognitive function in the Random Forest model suggests a shared neurobiological pathway, supporting the “mind–body connection” in physical frailty. Concurrently, modifiable lifestyle factors such as smoking and alcohol consumption appeared among the top predictors in several models, indicating that depression may influence sarcopenia risk through behavioral pathways involving health-compromising behaviors.

It is noteworthy that the importance of depressive symptoms was more consistently captured by the tree-based models (Random Forest and XGBoost) than by the deep learning architectures. This suggests that the relationship between depression and sarcopenia may be more effectively modeled through explicit feature interactions rather than the implicit representations learned by deep neural networks. This finding further reinforces the practical superiority of tree-based ensembles for this specific clinical prediction task, as they provide both robust performance and clinically interpretable insights. The feature importance patterns are visualized in detail in Figure 10, which compares the top predictors across the seven models. A further detailed ranking of feature importance for each individual model is provided in Figure 11.

In conclusion, this feature importance analysis provides compelling evidence that depressive symptoms serve as a powerful and independent predictor of sarcopenia risk, operating within a complex network of physiological, cognitive, and behavioral factors. This finding supports our core methodological innovation (using dynamic depression trajectories rather than single-point assessments) by confirming that mental health status contains unique and significant information for predicting future physical frailty. This multidimensional risk profile, uncovered by AI, not only predicts risk but also points to actionable targets for public health interventions, such as promoting cognitive activities and smoking cessation alongside mental health care.

It is important to emphasize that the feature importance analysis presented here identifies predictive associations rather than establishing causality. The models highlight which variables, including depressive symptoms, are most informative for predicting sarcopenia risk within our dataset. The observed associations may be influenced by unmeasured confounding factors or complex bidirectional relationships.

4. Discussion

This study exemplifies the application of Artificial Intelligence in Public Health by establishing the dynamic trajectories of depressive symptoms as a powerful predictor of sarcopenia risk in older adults. To our knowledge, this is one of the few investigations to integrate longitudinal mental health patterns with multiple machine learning algorithms for sarcopenia prediction. While previous AI studies have primarily relied on static features [35,36], our findings provide compelling evidence that the temporal evolution of depressive symptoms offers superior predictive power compared to conventional single-point assessments [37]. The decision to employ a discrete 3-class model was driven by the necessity to derive clinically interpretable risk strata for potential public health intervention, a principle that aligns with hybrid modelling approaches which emphasize the integration of domain knowledge for downstream predictive robustness [26].

The exceptional performance of tree-based ensemble methods, particularly Random Forest and XGBoost, across all benchmarking datasets establishes their practical utility for this prediction task [38,39]. While deep learning architectures demonstrated capacity for capturing complex temporal patterns, their higher computational demands and inconsistent cross-dataset performance rendered them less suitable for clinical deployment in this context [40]. The remarkable stability of RF and XGBoost across diverse depressive trajectory groups underscores their robustness for real-world implementation.

Crucially, our feature importance analysis provides insights into the predictive relationship between depression and sarcopenia. The consistent ranking of depressive symptoms among the top predictors, particularly in the best-performing Random Forest model, robustly affirms mental health as a core, independent predictive marker for sarcopenia risk. This finding aligns with emerging evidence of the bidirectional mind–body connection in age-related frailty [41,42]. The prominence of depressive symptoms ahead of established risk factors such as waist circumference and lifestyle behaviors highlights the critical importance of integrating mental health assessment into sarcopenia screening protocols. This finding provides a data-driven mandate for clinicians to view persistent depression not merely as a comorbid condition, but as a potent, independent risk factor for physical frailty. For instance, an older adult exhibiting a ‘Persistently High’ depression trajectory should be considered for proactive muscle health assessments, even in the absence of other traditional risk factors.

Our findings reveal a complex multidimensional risk profile that extends beyond conventional physiological factors. The dominance of body composition indicators (BMI and waist circumference) across most models reaffirms their fundamental role in musculoskeletal health [43]. Meanwhile, the high importance of cognitive function suggests shared neurobiological pathways between mental and physical frailty, possibly involving inflammatory processes or hypothalamic–pituitary–adrenal axis dysregulation [44,45]. The appearance of modifiable lifestyle factors, including smoking and alcohol consumption, as significant predictors indicates potential behavioral mechanisms through which depression influences sarcopenia risk, possibly via reduced physical activity, poor nutrition, or social withdrawal [17].

From a clinical perspective, our ML framework enables a paradigm shift toward proactive prevention. By identifying older adults with high-risk depression trajectories (e.g., “persistently high”), healthcare providers can initiate early, multidimensional interventions targeting the identified risk factors [46,47]. This might include combined approaches addressing psychological distress through therapy, cognitive engagement through mentally stimulating activities, and physical exercise to preserve muscle mass—all before significant, often irreversible muscle loss occurs.

The Decision Curve Analysis further strengthens the clinical relevance of our findings. The demonstrated net benefit of our top-performing models across clinically meaningful threshold probabilities suggests that implementing this prediction framework in practice could lead to tangible improvements in patient outcomes. By providing a favorable balance between identifying true cases and avoiding unnecessary interventions, our models offer a practical tool for risk stratification in resource-constrained healthcare settings.

The clinical translation of our predictive model advocates for a multidisciplinary care approach. Upon identifying high-risk individuals, a coordinated intervention should be initiated. This includes: involving psychologists to address depressive symptoms through evidence-based therapies (e.g., Cognitive Behavioral Therapy); collaborating with dietitians to provide nutritional support targeting muscle health, such as ensuring adequate protein intake and Vitamin D supplementation; and working with physiotherapists to prescribe tailored resistance and balance exercise regimens. This integrated strategy concurrently targets the psychological, behavioral, and physiological pathways linking depression to sarcopenia, thereby enabling proactive and holistic patient care.

The clinical implications of our study are substantial. The robust performance of our models across distinct depression trajectory groups suggests generalizability to diverse clinical populations. The relatively simple implementation requirements of tree-based models make them feasible for integration into existing geriatric assessment workflows, potentially providing a low-cost, scalable risk-stratification tool for both geriatric medicine and psychiatry settings.

Several limitations should be considered when interpreting our findings. First, while the CHARLS cohort is nationally representative, the potential for residual confounding cannot be ruled out, and external validation in other populations is warranted [22]. Second, the assessment of muscle mass relied on anthropometric prediction equations rather than gold-standard methods like dual-energy X-ray absorptiometry (DXA) [48]. This approach, while necessary for large-scale population studies, may introduce measurement error and non-differential misclassification bias in the diagnosis of sarcopenia. Such bias would likely attenuate the observed associations and model performance metrics, suggesting that the true predictive relationship between depressive trajectories and sarcopenia might be stronger than reported here. Third, although we employed multiple imputation for missing data, the potential for selection bias remains. Fourth, our sample required complete data across multiple waves and assessments. The sensitivity analysis (Supplementary Table S2) revealed significant differences in baseline characteristics between the included and excluded participants, indicating that the former were generally older, had a higher burden of chronic disease, and differed in socioeconomic composition. This suggests a potential for selection bias, as participants who were lost to follow-up or had extensive missing data are often systematically less healthy. Consequently, the generalizability of our findings might be somewhat limited to a relatively healthier and more stable segment of the older population. However, it is noteworthy that the key exposure (depressive symptoms) and an important predictor (cognitive function) were balanced between the groups, which strengthens the internal validity of the identified relationship between depression trajectories and sarcopenia risk. External validation in more inclusive cohorts is needed to confirm the generalizability of our model.

Future research should focus on integrating this AI framework into real-world public health workflows and mobile health (mHealth) platforms to enable continuous risk monitoring. Exploring federated learning techniques could also allow for building more generalized models while protecting data privacy across different jurisdictions, representing a key future possibility for AI in public health.

5. Conclusions

This study establishes the significant predictive value of longitudinal depressive symptom trajectories for sarcopenia risk in older adults. Through a comprehensive benchmarking of seven machine learning algorithms, we identified tree-based ensemble methods, particularly Random Forest (RF) and XGBoost, as the most suitable and robust predictors for this task. RF achieved the highest mean accuracy (0.8265), while XGBoost attained the highest mean weighted F1-score (0.8084). Their consistent superiority over other models, including deep learning architectures, combined with computational efficiency and stability across diverse datasets, designates them as the optimal choices for practical deployment in predicting sarcopenia risk based on dynamic depression patterns.

The predictive accuracy of our best model (RF, 0.8265) is highly competitive with the current state-of-the-art. For instance, Kim et al. [36] reported a top test accuracy of 0.848 using physical factors, while Seok et al. [20] achieved 78.8% accuracy with socioeconomic data. Although Ozgur et al. [21] reported higher accuracies (RF: 89.4%), their model was developed on a selective sample of female participants from a university hospital, which may limit its generalizability to community-dwelling populations and both genders. In contrast, our model achieved its robust performance on a large, nationally representative cohort of both men and women, using a novel and dynamic predictor, longitudinal depression trajectories. This approach not only delivers competitive accuracy but also captures the temporal evolution of a core, modifiable risk factor, offering a more holistic and clinically informative tool for population-level screening than static assessments.

Beyond model performance, our feature importance analysis yielded a clinically interpretable, multidimensional risk profile, robustly affirming depressive symptoms as a core, independent predictor, ranking ahead of traditional risk factors like age and waist circumference. This finding, coupled with the high importance of cognitive function and modifiable lifestyle factors, provides a mechanistic understanding and actionable targets for intervention.

Looking forward, several avenues for future work emerge. First, external validation in independent and diverse populations, including other national cohorts and prospective clinical settings, is essential to confirm the generalizability and clinical utility of our model. While our internal validation through rigorous cross-validation and the reporting of confidence intervals provides robust performance estimates, validation in external datasets remains a critical next step. Second, future studies could benefit from incorporating gold-standard measures of muscle mass (e.g., DXA). Finally, to facilitate widespread adoption while addressing data privacy and imbalance concerns, we propose integrating this AI framework into real-world public health workflows and exploring federated learning techniques. Federated learning would allow for building more robust and generalizable models across multiple institutions without centralizing sensitive patient data, directly addressing the challenges of data sharing and population-level imbalance.

In summary, this work provides a validated, AI-driven strategy that moves beyond static risk assessment. By mapping dynamic mental health trajectories to physical frailty risk, our framework facilitates a necessary paradigm shift in geriatric care, from reactive treatment to proactive, personalized, and scalable prevention.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ai6120300/s1, Figure S1: Five-class depressive symptom trajectory groups; Figure S2: Cross-dataset model performance comparison for the five-class trajectory solution; Table S1: Comparison of fit indices between Latent Class Growth Model (LCGM) and Growth Mixture Model (GMM) for depressive symptom trajectories; Table S2: Comparison of baseline characteristics between participants included (n = 6125) and excluded (n = 5481); Table S3a: Detailed per-class performance metrics for all models across three datasets; Table S3b: Confusion matrices for all models across three datasets; Table S4: Overall performance statistics for each dataset; Figures S3–S6: Decision Curve Analysis (DCA) across all outcome classes (Class 1–4); Figures S7–S13: Decision Curve Analysis (DCA) across all machine learning models.

Author Contributions

Conceptualization, Y.H. and C.P.; methodology, Y.H. and C.P.; software, Y.H. and R.T.; validation, Y.H. and R.T.; formal analysis, Y.H. and R.T.; investigation, Y.H. and C.P.; resources, Y.H.; data curation, Y.H. and R.T.; writing—original draft preparation, Y.H.; writing—review and editing, C.P. and R.T.; visualization, Y.H. and R.T.; supervision, C.P.; project administration, H.Q.; funding acquisition, H.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the National Natural Science Foundation of China (Grant No. 62271466).

Institutional Review Board Statement

The study was conducted in accordance with the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board of Peking University (PU IRB) (IRB00001052-11015, 20 January 2011).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the original China Health and Retirement Longitudinal Study (CHARLS).

Data Availability Statement

The China Health and Retirement Longitudinal Study (CHARLS) data is made available to the global research community. The data can be obtained from the CHARLS project upon a straightforward registration (http://charls.pku.edu.cn/ (accessed on 19 August 2025)). Access to certain sensitive variables may require a formal application process.

Acknowledgments

The authors acknowledge the China Health and Retirement Longitudinal Study (CHARLS) for providing data. We are grateful to the original data creators, depositors, copyright holders, and funders.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ailshire, J.; Lee, J. Aging Experiences Around the World: Local Findings and Global Insights From Population Surveys of Aging. Innov. Aging 2020, 4 (Suppl. S1), 559–560.
2. Ogura, S.; Jakovljevic, M.M. Editorial: Global Population Aging—Health Care, Social and Economic Consequences. Front. Public Health 2018, 6, 335.
3. Ueshima, J.; Maeda, K.; Shimizu, A.; Inoue, T.; Murotani, K.; Mori, N.; Satake, S.; Matsui, Y.; Arai, H. Diagnostic accuracy of sarcopenia by “possible sarcopenia” premiered by the Asian Working Group for Sarcopenia 2019 definition. Arch. Gerontol. Geriatr. 2021, 97, 104484.
4. Maurice, C.; Engels, C.; Canouï-Poitrine, F.; Lemogne, C.; Fromantin, I.; Poitrine, E. Dog ownership and mental health among community-dwelling older adults: A systematic review. Int. J. Geriatr. Psychiatry 2022, 37.
5. Tan, J.Y.; Zeng, Q.L.; Ni, M.; Zhang, Y.X.; Qiu, T. Association among calf circumference, physical performance, and depression in the elderly Chinese population: A cross-sectional study. BMC Psychiatry 2022, 22, 278.
6. Kokkeler, K.J.E.; van den Berg, K.S.; Comijs, H.C.; Oude Voshaar, R.C.; Marijnissen, R.M. Sarcopenic obesity predicts nonremission of late-life depression. Int. J. Geriatr. Psychiatry 2019, 34, 1226–1234.
7. Kim, N.H.; Kim, H.S.; Eun, C.R.; Seo, J.A.; Cho, H.J.; Kim, S.G.; Choi, K.M.; Baik, S.H.; Choi, D.S.; Park, M.H.; et al. Depression is associated with sarcopenia, not central obesity, in elderly korean men. J. Am. Geriatr. Soc. 2011, 59, 2062–2068.
8. Rajna, P. Psychosomatic disorders and illnesses: A blind spot of medicine. Orv. Hetil. 2021, 162, 252–261.
9. Yuenyongchaiwat, K.; Jongritthiporn, S.; Somsamarn, K.; Sukkho, O.; Pairojkittrakul, S.; Traitanon, O. Depression and low physical activity are related to sarcopenia in hemodialysis: A single-center study. PeerJ 2021, 9, e11695.
10. Yuenyongchaiwat, K.; Boonsinsukh, R. Sarcopenia and Its Relationships with Depression, Cognition, and Physical Activity in Thai Community-Dwelling Older Adults. Curr. Gerontol. Geriatr. Res. 2020, 2020, 8041489.
11. Olgun Yazar, H.; Yazar, T. Prevalence of sarcopenia in patients with geriatric depression diagnosis. Ir. J. Med. Sci. 2019, 188, 931–938.
12. Gao, Y.; Jia, Z.; Zhao, L.; Han, S. The Effect of Activity Participation in Middle-Aged and Older People on the Trajectory of Depression in Later Life: National Cohort Study. JMIR Public Health Surveill. 2023, 9, e44682.
13. Saunders, R.; Buckman, J.E.J.; Suh, J.W.; Fonagy, P.; Pilling, S.; Bu, F.; Fancourt, D. Variation in symptoms of common mental disorders in the general population during the COVID-19 pandemic: Longitudinal cohort study. BJPsych Open 2024, 10, e45.
14. Koskinen, M.K.; van Mourik, Y.; Smit, A.B.; Riga, D.; Spijker, S. From stress to depression: Development of extracellular matrix-dependent cognitive impairment following social stress. Sci. Rep. 2020, 10, 17308.
15. Dooley, L.N.; Kuhlman, K.R.; Robles, T.F.; Eisenberger, N.I.; Craske, M.G.; Bower, J.E. The role of inflammation in core features of depression: Insights from paradigms using exogenously-induced inflammation. Neurosci. Biobehav. Rev. 2018, 94, 219–237.
16. Thai, M.; Schreiner, M.W.; Mueller, B.A.; Cullen, K.R.; Klimes-Dougan, B. Coordination between frontolimbic resting state connectivity and hypothalamic-pituitary-adrenal axis functioning in adolescents with and without depression. Psychoneuroendocrinology 2021, 125, 105123.
17. Choi, K.W.; Zheutlin, A.B.; Karlson, R.A.; Wang, M.; Dunn, E.C.; Stein, M.B.; Karlson, E.W.; Smoller, J.W. Physical activity offsets genetic risk for incident depression assessed via electronic health records in a biobank cohort study. Depress. Anxiety 2020, 37, 106–114.
18. Krittanawong, C.; Virk, H.U.H.; Bangalore, S.; Wang, Z.; Johnson, K.W.; Pinotti, R.; Zhang, H.; Kaplin, S.; Narasimhan, B.; Kitai, T.; et al. Machine learning prediction in cardiovascular diseases: A meta-analysis. Sci. Rep. 2020, 10, 16057.
19. He, J.; Li, J.; Jiang, S.; Cheng, W.; Jiang, J.; Xu, Y.; Yang, J.; Zhou, X.; Chai, C.; Wu, C. Application of machine learning algorithms in predicting HIV infection among men who have sex with men: Model development and validation. Front. Public Health 2022, 10, 967681.
20. Seok, M.; Kim, W.; Kim, J. Machine Learning for Sarcopenia Prediction in the Elderly Using Socioeconomic, Infrastructure, and Quality-of-Life Data. Healthcare 2023, 11, 2881.
21. Ozgur, S.; Altinok, Y.A.; Bozkurt, D.; Saraç, Z.F.; Akçiçek, S.F. Performance Evaluation of Machine Learning Algorithms for Sarcopenia Diagnosis in Older Adults. Healthcare 2023, 11, 2699.
22. Zhao, Y.; Hu, Y.; Smith, J.P.; Strauss, J.; Yang, G. Cohort profile: The China Health and Retirement Longitudinal Study (CHARLS). Int. J. Epidemiol. 2014, 43, 61–68.
23. Chen, L.-K.; Liu, L.-K.; Woo, J.; Assantachai, P.; Auyeung, T.-W.; Bahyah, K.S.; Chou, M.-Y.; Chen, L.-Y.; Hsu, P.-S.; Krairit, O.; et al. Sarcopenia in Asia: Consensus Report of the Asian Working Group for Sarcopenia. J. Am. Med. Dir. Assoc. 2014, 15, 95–101.
24. Chen, H.; Mui, A.C. Factorial validity of the Center for Epidemiologic Studies Depression Scale short form in older population in China. Int. Psychogeriatr. 2014, 26, 49–57.
25. Boey, K.W. Cross-validation of a short form of the CES-D in Chinese elderly. Int. J. Geriatr. Psychiatry 1999, 14, 608–617.
26. Pratticò, D.; Carlo, D.D.; Silipo, G.; Laganà, F. Hybrid FEM-AI Approach for Thermographic Monitoring of Biomedical Electronic Devices. Computers 2025, 14, 344.
27. Christodoulou, E.; Ma, J.; Collins, G.S.; Steyerberg, E.W.; Verbakel, J.Y.; Van Calster, B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J. Clin. Epidemiol. 2019, 110, 12–22.
28. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
29. Sutou, A.; Wang, J. Influence-Balanced XGBoost: Improving XGBoost for Imbalanced Data Using Influence Functions. IEEE Access 2024, 12, 193473–193486.
30. Tolstikhin, I.O.; Houlsby, N.; Kolesnikov, A.; Beyer, L.; Zhai, X.; Unterthiner, T.; Yung, J.; Steiner, A.; Keysers, D.; Uszkoreit, J.; et al. MLP-Mixer: An all-MLP Architecture for Vision. Adv. Neural Inf. Process. Syst. 2021, 34, 24261–24272.
31. Zaremba, W.; Sutskever, I.; Vinyals, O. Recurrent Neural Network Regularization. arXiv 2014, arXiv:1409.2329.
32. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
33. Collins, G.S.; Reitsma, J.B.; Altman, D.G.; Moons, K.G.M. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD Statement. BMC Med. 2015, 13, 1.
34. Facal, D.; Valladares-Rodriguez, S.; Lojo-Seoane, C.; Pereiro, A.X.; Anido-Rifon, L.; Juncos-Rabadán, O. Machine learning approaches to studying the role of cognitive reserve in conversion from mild cognitive impairment to dementia. Int. J. Geriatr. Psychiatry 2019, 34, 941–949.
35. Kamal, S.; Demetriades, C.; Roy, A.; Garcia-Pittman, E. Using Machine Learning to Predict Poor Mental Health Status in the Geriatric Population. Am. J. Geriatr. Psychiatry 2023, 31, S71–S72.
36. Kim, J.H. Machine-learning classifier models for predicting sarcopenia in the elderly based on physical factors. Geriatr. Gerontol. Int. 2024, 24, 595–602.
37. Nemesure, M.D.; Collins, A.C.; Price, G.D.; Griffin, T.Z.; Pillai, A.; Nepal, S.; Heinz, M.V.; Lekkas, D.; Campbell, A.T.; Jacobson, N.C. Depressive Symptoms as a Heterogeneous and Constantly Evolving Dynamical System: Idiographic Depressive Symptom Networks of Rapid Symptom Changes Among Persons With Major Depressive Disorder. J. Psychopathol. Clin. Sci. 2024, 133, 155–166.
38. Parmar, A.; Katariya, R.; Patel, V. A Review on Random Forest: An Ensemble Classifier. In Proceedings of the International Conference on Intelligent Data Communication Technologies and Internet of Things (ICICI) 2018, Coimbatore, India, 7–8 August 2018; pp. 758–763.
39. Chen, T.Q.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
40. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
41. Loewenthal, J. The Potential for Mind-Body Therapies as Frailty Interventions. Innov. Aging 2024, 8, 393–394.
42. Wan, R.; Huang, J.; Wang, K.; Long, D.; Tao, A.; Huang, J.; Liu, Z. Effectiveness of Mind-Body Exercise in Older Adults With Sarcopenia and Frailty: A Systematic Review and Meta-Analysis. J. Cachexia Sarcopenia Muscle 2025, 16, e13806.
43. Abbate, L.M.; Stevens, J.; Schwartz, T.A.; Renner, J.B.; Helmick, C.G.; Jordan, J.M. Anthropometric measures, body composition, body fat distribution, and knee osteoarthritis in women. Obesity 2006, 14, 1274–1281.
44. Ma, L.N.; Chan, P. Understanding the Physiological Links Between Physical Frailty and Cognitive Decline. Aging Dis. 2020, 11, 405–418.
45. Sharan, P.; Vellapandian, C. Hypothalamic-Pituitary-Adrenal (HPA) Axis: Unveiling the Potential Mechanisms Involved in Stress-Induced Alzheimer’s Disease and Depression. Cureus J. Med. Sci. 2024, 16, e67595.
46. Cappeliez, P. Psychotherapeutic Interventions Among Depressed Elderly People. J. Psychiatry Neurosci. 1991, 16, 170–175.
47. Kil, T.; Yoon, K.A.; Ryu, H.; Kim, M. Effect of group integrated intervention program combined animal-assisted therapy and integrated elderly play therapy on live alone elderly. J. Anim. Sci. Technol. 2019, 61, 379–387.
48. Lukaski, H. Sarcopenia: Assessment of muscle mass. J. Nutr. 1997, 127, S994–S997.

Figure 1. Heterogeneous trajectory classes of depressive symptoms.

Figure 2. Cross-dataset model performance comparison.

Figure 3. Accuracy comparison across all datasets.

Figure 4. F1-Score (weighted) comparison across all datasets.

Figure 5. Detailed performance heatmaps.

Figure 6. Mean and standard deviation of all metrics.

Figure 7. Precision (weighted) comparison across all datasets.

Figure 8. Recall (weighted) comparison across all datasets.

Figure 9. Confusion matrices for each model across all datasets.

Figure 10. Feature importance comparison across datasets.

Figure 11. Feature importance ranking plot.

Table 1. Characteristics of the study participants at baseline.

Variable	Group 1	Group 2	Group 3	p-Value
Age				0.000 **
<60	45.7	41.2	42.2
60–69	36.1	37.4	40.4
70–79	15.1	18.2	15.1
≥80	3.0	3.3	2.3
Sex				0.000 **
Male	57.1	41.1	34.5
Female	42.9	58.9	65.5
Residence				0.000 **
Rural	64.2	74.7	73.9
Urban	35.8	25.3	26.1
Geographical location				0.000 **
Eastern	34.9	26.0	21.1
Central	33.2	37.5	34.2
Western	25.0	30.9	40.4
Northeast	6.9	5.7	4.3
Education				0.000 **
Vocational school	39.7	55.7	59.0
Two/Three Year College	23.7	22.8	21.1
Four Year College	24.2	15.7	14.6
Others	12.4	5.8	5.3
Income				0.000 **
<4999	20.7	29.6	30.9
4999–9999	4.6	6.8	7.4
≥10,000	74.7	63.6	61.7
Life satisfaction				0.000 **
Completely satisfied	0.3	2.4	6.0
Very satisfied	4.4	11.1	19.4
Somewhat satisfied	60.7	61.8	56.3
Not very satisfied	30.2	21.5	16.1
Not at all satisfied	4.4	3.2	2.1
Self-reported health status				0.000 **
Very good	1.6	5.5	10.7
Good	11.3	27.3	36.3
Fair	56.3	53.6	44.0
Poor	18.8	9.4	6.3
Very poor	11.9	4.3	2.7
Hypertension				0.000 **
No	71.9	68.0	65.1
Yes	28.1	32.0	34.9
Diabetes				0.000 **
No	93.4	91.6	90.1
Yes	6.6	8.4	9.9
Dyslipidemia				0.000 **
No	87.1	86.8	82.6
Yes	12.9	13.2	17.4
Comorbidity				0.000 **
No	32.1	20.8	12.9
Yes	67.9	79.2	87.1
Teeth				0.000 **
No	88.7	84.7	84.4
Lost all	11.3	15.3	15.6
Major misfortune injury experience				0.000 **
Never	99.0	98.3	97.9
Ever	1.0	1.7	2.1
ADL disorder				0.000 **
No	90.6	75.0	68.0
Yes	9.4	25.0	32.0
Disability				0.000 **
No	78.8	67.4	62.7
Yes	21.2	32.6	37.3
Smoking				0.000 **
Never	66.9	72.2	75.8
Ever	33.1	27.8	24.2
Alcohol consumption				0.000 **
Never	61.1	71.7	74.8
Ever	38.9	28.3	25.2
Sleep duration				0.000 **
<4 h	3.6	11.0	17.3
4–6	18.6	25.4	30.5
6–8	68.6	55.2	45.8
≥8 h	9.2	8.4	6.4
Exercise				0.128
Yes	93.3	92.5	92.7
No	6.7	7.5	7.3
BMI				0.000 **
<18.5	3.8	6.3	6.3
18.5–24	47.7	49.5	47.5
≥24	48.5	44.2	46.2
Cognitive scores	12.4 ± 3.1	11.1 ± 3.2	10.9 ± 3.2	0.007 **
Depressive symptom score	4.6 ± 3.1	10.5 ± 5.5	14.7 ± 6.4	0.000 **

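** Significant difference (p-value lower than 0.05); p-values shown as 0.000 denote p < 0.001. Categorical variables are reported as column percentages (%); cognitive and depressive symptom scores are reported as mean ± SD.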

Table 2. Fit statistics for trajectory classes of depressive symptoms.

Fit Statistic	1 Class	2 Classes	3 Classes	4 Classes	5 Classes	6 Classes
BIC *	−16,953.87	−19,596.26	−19,809.09	−19,933.42	−20,076.19	−20,042.70
AIC *	17,047.95	19,791.15	20,104.78	20,329.91	20,573.48	20,640.79
Class proportion ¶
	Class 1, 100.00%	Class 1, 44.60%	Class 1, 31.8%	Class 1, 42.87%	Class 1, 24.26%	Class 1, 30.61%
		Class 2, 55.40%	Class 2, 28.4%	Class 2, 12.46%	Class 2, 21.49%	Class 2, 20.82%
			Class 3, 39.8%	Class 3, 15.66%	Class 3, 14.33%	Class 3, 18.40%
				Class 4, 29.01%	Class 4, 11.93%	Class 4, 10.30%
					Class 5, 27.98%	Class 5, 9.75%
						Class 6, 10.12%
APP ‡
	Class 1, 1.00	Class 1, 1.00	Class 1, 1.00	Class 1, 1.00	Class 1, 1.00	Class 1, 1.00
		Class 2, 0.88	Class 2, 0.79	Class 2, 0.74	Class 2, 0.69	Class 2, 0.68
			Class 3, 0.84	Class 3, 0.83	Class 3, 0.81	Class 3, 0.67
				Class 4, 0.69	Class 4, 0.71	Class 4, 0.67
					Class 5, 0.75	Class 5, 0.79
						Class 6, 0.68

CHARLS, China Health and Retirement Longitudinal Study; AIC, Akaike’s information criterion; BIC, Bayesian information criterion; APP, average posterior probability. * A lower absolute value suggests a better model fit; ¶ Proportion of individuals in each class; ‡ Average posterior probability of assignment to each class.
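The average posterior probability (APP) reported in Table 2 can be sketched as follows, assuming the usual procedure in which each individual is assigned to the class with the highest posterior probability and the APP of a class is the mean of that probability over its assigned members. The function name and the toy posterior matrix are hypothetical and are not drawn from the study.

```python
def average_posterior_probabilities(posteriors):
    """Return {class_index: APP} from per-individual posterior probabilities.

    posteriors: list of lists; each inner list holds one individual's
    class-membership probabilities and sums to 1.
    """
    sums, counts = {}, {}
    for probs in posteriors:
        # Modal assignment: the class with the largest posterior probability.
        assigned = max(range(len(probs)), key=probs.__getitem__)
        sums[assigned] = sums.get(assigned, 0.0) + probs[assigned]
        counts[assigned] = counts.get(assigned, 0) + 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


# Toy posterior probabilities for five individuals and three classes (hypothetical).
toy_posteriors = [
    [0.90, 0.05, 0.05],
    [0.80, 0.15, 0.05],
    [0.10, 0.70, 0.20],
    [0.20, 0.20, 0.60],
    [0.05, 0.15, 0.80],
]

app = average_posterior_probabilities(toy_posteriors)
```

A commonly cited rule of thumb treats an APP of about 0.70 or higher in every class as adequate classification; in Table 2, the three-class solution reports 1.00, 0.79, and 0.84.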

Table 3. The final three-group trajectory model of depressive symptoms in middle-aged and older adults from CHARLS.

Trajectory Group	Parameter	Est.	SE	T Value	p Value
Class 1: persistently low depressive symptom group (n = 1945, 31.8%)	Intercept	−0.71386	0.46285	−1.542	0.1230
	Linear (time)	−0.06037	0.61124	−0.099	0.9213
	Quadratic (time²)	−0.16700	0.63941	−0.261	0.7940
	Cubic (time³)	−0.16094	0.64681	−0.249	0.8035
Class 2: persistently moderate depressive symptom group (n = 1740, 28.4%)	Intercept	0.37900	0.67685	0.560	0.5755
	Linear (time)	0.04538	1.32722	0.034	0.9727
	Quadratic (time²)	0.72699	0.89107	0.816	0.4146
	Cubic (time³)	0.71984	0.89413	0.805	0.4208
Class 3: persistently high depressive symptom group (n = 2440, 39.8%)	Intercept	1.12685	0.98740	1.141	0.2538
	Linear (time)	0.07552	1.15527	0.065	0.9479
	Quadratic (time²)	−0.68391	1.17752	−0.581	0.5614
	Cubic (time³)	−0.68764	1.17085	−0.587	0.5570

CHARLS, China Health and Retirement Longitudinal Study; Est., parameter estimate; SE, standard error of the parameter estimate.
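
The T value and p value columns in Table 3 can be related to the Est. and SE columns with a short calculation. The sketch below computes a Wald statistic (estimate divided by standard error) and a two-sided p value for the three intercept terms. It assumes a standard normal reference distribution, which the table does not state; the function name `wald_test` is illustrative only.

```python
import math

# (estimate, standard error) for the intercept terms, copied from Table 3.
intercepts = {
    "Class 1": (-0.71386, 0.46285),
    "Class 2": (0.37900, 0.67685),
    "Class 3": (1.12685, 0.98740),
}


def wald_test(estimate, std_error):
    """Return the Wald statistic and its two-sided p value.

    Assumption: the statistic is compared against a standard normal
    distribution, so p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


for label, (est, se) in intercepts.items():
    z, p = wald_test(est, se)
    print(f"{label}: statistic = {z:.3f}, p = {p:.4f}")
```

Under that assumption, the computed values for the intercepts appear consistent with the tabulated T and p values, and none of the intercepts reaches the 0.05 significance level.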

Table 4. Cross-dataset performance comparison of machine learning models.

Model	Dataset_01 Accuracy	Dataset_01 F1 (Weighted)	Dataset_02 Accuracy	Dataset_02 F1 (Weighted)	Dataset_03 Accuracy	Dataset_03 F1 (Weighted)	Mean Accuracy	Mean F1 (Weighted)
Logistic Regression	0.7417 ± 0.0146	0.7935 ± 0.0120	0.6994 ± 0.0194	0.7414 ± 0.0178	0.6951 ± 0.0247	0.7329 ± 0.0235	0.7121	0.7559
Random Forest	0.8745 ± 0.0105	0.8605 ± 0.0120	0.8144 ± 0.0164	0.7971 ± 0.0186	0.7907 ± 0.0224	0.7650 ± 0.0254	0.8265	0.8075
XGBoost	0.8546 ± 0.0113	0.8508 ± 0.0117	0.8163 ± 0.0164	0.8079 ± 0.0176	0.7825 ± 0.0221	0.7666 ± 0.0254	0.8178	0.8084
MLP	0.7940 ± 0.0131	0.8209 ± 0.0115	0.7544 ± 0.0180	0.7676 ± 0.0178	0.7459 ± 0.0247	0.7496 ± 0.0249	0.7648	0.7794
RNN	0.7464 ± 0.0140	0.7955 ± 0.0117	0.7013 ± 0.0192	0.7413 ± 0.0178	0.7040 ± 0.0243	0.7376 ± 0.0232	0.7172	0.7581
LSTM	0.7927 ± 0.0127	0.8210 ± 0.0109	0.7512 ± 0.0180	0.7699 ± 0.0166	0.7205 ± 0.0247	0.7447 ± 0.0231	0.7548	0.7785
Transformer	0.8240 ± 0.0120	0.8374 ± 0.0112	0.7488 ± 0.0187	0.7674 ± 0.0176	0.7347 ± 0.0228	0.7481 ± 0.0227	0.7692	0.7843
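
The Mean Accuracy and Mean F1 columns in Table 4 appear to be unweighted averages of the three per-dataset point estimates. The sketch below recomputes them from the tabulated values and ranks the models; the ± terms are omitted, and the variable and function names are illustrative rather than taken from the study's code.

```python
from statistics import mean

# Per-dataset (accuracy, weighted F1) point estimates copied from Table 4,
# in the order Dataset_01, Dataset_02, Dataset_03.
results = {
    "Logistic Regression": [(0.7417, 0.7935), (0.6994, 0.7414), (0.6951, 0.7329)],
    "Random Forest": [(0.8745, 0.8605), (0.8144, 0.7971), (0.7907, 0.7650)],
    "XGBoost": [(0.8546, 0.8508), (0.8163, 0.8079), (0.7825, 0.7666)],
    "MLP": [(0.7940, 0.8209), (0.7544, 0.7676), (0.7459, 0.7496)],
    "RNN": [(0.7464, 0.7955), (0.7013, 0.7413), (0.7040, 0.7376)],
    "LSTM": [(0.7927, 0.8210), (0.7512, 0.7699), (0.7205, 0.7447)],
    "Transformer": [(0.8240, 0.8374), (0.7488, 0.7674), (0.7347, 0.7481)],
}


def summarize(per_dataset):
    """Return the unweighted mean accuracy and mean weighted F1, rounded to 4 places."""
    accuracies, f1_scores = zip(*per_dataset)
    return round(mean(accuracies), 4), round(mean(f1_scores), 4)


summary = {model: summarize(scores) for model, scores in results.items()}

best_by_accuracy = max(summary, key=lambda m: summary[m][0])
best_by_f1 = max(summary, key=lambda m: summary[m][1])

for model, (acc, f1) in summary.items():
    print(f"{model:<20} mean accuracy = {acc:.4f}  mean F1 = {f1:.4f}")
print("Highest mean accuracy:", best_by_accuracy)
print("Highest mean weighted F1:", best_by_f1)
```

Recomputed this way, Random Forest has the highest mean accuracy and XGBoost the highest mean weighted F1, which matches the ordering reported in the abstract.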

Table 5. Top predictive features for sarcopenia across machine learning models.

Rank	Logistic Regression	Random Forest	XGBoost	MLP	RNN	LSTM	Transformer
1	BMI (17.6%)	BMI (10.8%)	BMI (17.9%)	BMI (21.8%)	BMI (27.5%)	Age (22.9%)	BMI (23.1%)
2	Age (14.5%)	Cognition scores (7.3%)	Age (12.6%)	Age (29.8%)	Gender (19.2%)	BMI (16.8%)	Age (19.9%)
3	Gender (10.7%)	Depressive Symptoms (5.7%)	Waist (4.0%)	Gender (14.6%)	Age (15.4%)	Gender (20.1%)	Gender (19.4%)
4	Waist (4.8%)	Waist (4.4%)	Drink Level (4.2%)	Education (4.2%)	Waist (6.7%)	Waist (5.7%)	Drink (5.0%)
5	Smoke (6.0%)	Education (4.2%)	Gender (4.6%)	Drink Level (4.0%)	Exercise (3.4%)	Smoke (7.3%)	Dyslipidemia (1.3%)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
