1. Introduction
Obesity has emerged as a global health crisis, posing serious challenges for both individual health and public healthcare systems [1,2]. Developing effective strategies for the prevention and management of obesity is critical [3]. Investigating dietary habits and physical activity levels is essential to grasp the complex and multifactorial nature of the condition [4]. A general overview of obesity’s key factors and related diseases is shown in Figure 1.
Accurately categorizing individuals by their obesity levels allows for targeted interventions and personalized health management. Understanding the factors that contribute to varying degrees of obesity enables healthcare professionals to design customized strategies that support weight management [5], enhance overall health, and mitigate the impact of obesity-related diseases [6].
Advancements in artificial intelligence and machine learning have paved the way for utilizing data-driven approaches to predict obesity categories [7,8]. These technologies enable the discovery of hidden patterns, identification of key variables, and development of predictive models that aid in early detection and proactive management of obesity [9,10].
Although previous research has applied machine learning to obesity analysis, challenges persist in fully capturing the interactions between different factors and accurately classifying the complete range of obesity categories [11,12,13]. To overcome these challenges, this study adopts a multifaceted approach that incorporates advanced models, specifically the hybrid Mamba and attention-enhanced Bi-LSTM (ABi-LSTM), to provide a deeper understanding of obesity dynamics.
In this research, we leveraged the Obesity [14] and CDC datasets [15,16] to construct a robust framework for obesity analysis. We first implemented the improved Mamba model to enhance scalability and process high-dimensional data effectively. Following this, we employed the ABi-LSTM framework to categorize individuals into distinct obesity classes, including Underweight, Normal, Overweight, and Obesity I, II, and III. The proposed ABi-LSTM model provides valuable insights into the critical factors influencing obesity classification, capturing complex temporal patterns and relationships among variables.
The main contributions of this paper are as follows:
Hybrid Improved Mamba Model: We propose a novel hybrid approach by integrating the Mamba model with an enhanced tokenizer to handle high-dimensional obesity data. The new feature tokenizer efficiently converts both numerical and categorical inputs into tokens, including a special token that captures overall sequence information, making the model scalable and efficient for obesity classification.
Channel Independence for Obesity Data: To mitigate model overfitting and enhance prediction accuracy, we introduce the concept of channel independence, which processes each feature channel separately. This reshaping technique allows the model to handle feature-specific information more effectively while maintaining the integrity of the data structure during processing.
Attention-Enhanced Bidirectional LSTM (ABi-LSTM) Model: We develop an innovative ABi-LSTM model that combines attention mechanisms with bidirectional LSTM to better capture temporal patterns in sequential obesity data. This approach improves the model’s ability to classify individuals across multiple obesity levels, using a dynamic attention mechanism to prioritize the most important features in the data.
Effective Multiclass Classification for Obesity Levels: By applying hybrid machine learning models, our study demonstrates improved performance in classifying individuals into distinct obesity categories. This enables more accurate obesity level prediction, which can inform personalized health interventions and targeted management strategies.
Through this multifaceted approach, this study aspires to make meaningful contributions to the ongoing efforts in obesity research. By leveraging a comprehensive framework integrating regression analysis and classification techniques, the study yields valuable insights that can empower healthcare professionals and policymakers to enhance obesity prevention, management, and intervention strategies. Accurate prediction of obesity categories, coupled with interpretable identification of pivotal influencing factors, equips stakeholders with the data-driven evidence needed to develop impactful, tailored initiatives. This study’s findings support the implementation of innovative evidence-based approaches to tackle the complex challenges posed by the obesity epidemic, ultimately improving individual well-being and alleviating the burden on healthcare systems.
2. Related Works
Obesity has emerged as a major public health concern worldwide, significantly increasing the risk of chronic conditions such as cardiovascular diseases, diabetes, and hypertension [17]. The World Health Organization (WHO) has highlighted the alarming rise in obesity prevalence, underscoring the necessity of effective classification and early identification of key determinants to develop personalized interventions [18]. Traditional statistical models and clinical assessments often fall short of capturing the multifactorial complexity of obesity, prompting researchers to turn to advanced machine learning (ML) and deep learning (DL) techniques for more accurate and comprehensive analysis [13].
For instance, analysis of the CHICA pediatric clinical decision support dataset using classical ML models such as RandomTree, RandomForest, ID3, J48, Naïve Bayes, and Bayesian Network revealed that ID3 delivered the highest performance, with 85% accuracy and 89% sensitivity, alongside a positive predictive value (PPV) of approximately 84% and negative predictive value (NPV) near 88% [19]. Nonetheless, under imbalanced class conditions, sensitivity in J48 dropped dramatically, to as low as 29%, indicating a substantial under-detection of obese cases in skewed datasets [19]. Similarly, broader studies employing diverse ML techniques on various datasets have produced a spectrum of results: Montañez et al. achieved an AUC of 90.5% using SVM on genetic profile data [20]; ensemble strategies offered 89.68% accuracy. These classical and ensemble approaches, although often accurate, frequently suffer from overfitting [21], a lack of robustness across heterogeneous datasets, and limitations in capturing the longitudinal patterns inherent in health data [22].
Deep learning approaches have aimed to remedy these gaps by modeling temporal and behavioral dimensions [23]. For example, a multi-layer perceptron trained on individual and behavioral variables (such as age, physical activity, and dietary habits) across normal weight (NW), overweight (OW), and obese (OB) categories achieved 75.8% overall accuracy, with class-specific true positive rates of 90.3% for NW, 34.2% for OW, and 66.7% for OB [24]. While this model performed well in identifying normal-weight individuals, its severe drop in sensitivity for overweight subjects indicates a major limitation in early-stage obesity risk detection [25], where accurate classification of overweight cases is essential to prevent disease progression. Similarly, recurrent neural network (RNN) architectures integrating electronic health record (EHR) data with wearable sensor activity logs achieved between 77% and 86% accuracy in predicting whether an individual’s obesity status would improve [26,27]. However, these RNN-based systems struggled to maintain accuracy when time intervals between records were irregular, and their black-box nature reduced transparency in identifying the behavioral or clinical drivers of obesity change, making them less suitable for settings that demand interpretable predictions.
Long short-term memory (LSTM) networks are a specialized type of recurrent neural network (RNN) designed to address the vanishing gradient problem, which hampers the learning of long-term dependencies in traditional RNNs [28]. LSTM models achieve this by incorporating memory cells and gating mechanisms (input, forget, and output gates) that regulate the flow of information, allowing the network to retain and utilize information over extended sequences. LSTM-based models have also shown strong performance in various obesity-related tasks. For instance, a hybrid deep learning framework integrating LSTM achieved an accuracy of 91.31% in obesity identification using smartphone inertial measurements [29]. Additionally, an attention-based Bi-LSTM model achieved an accuracy of 92.5% in predicting obesity levels [30]. A study that trained LSTM networks over five clinical visits achieved a mean absolute error (MAE) of 0.98 and an R² of 0.72 in predicting future BMI trajectories while identifying 24 key predictive features [31]. Despite these promising results, the approach required substantial computational resources and was prone to overfitting when trained on small pediatric cohorts, limiting its scalability and generalization to broader, more heterogeneous populations. In a separate comparison of convolutional and recurrent components, the CNN alone suffered from reduced accuracy on unseen sequences due to its limited temporal modeling capacity, while the LSTM lacked spatial feature extraction [32].
The recently introduced Mamba architecture is a lightweight selective state-space model optimized for modeling long-range dependencies efficiently. Combined with channel-wise independence and tokenization strategies, it can capture complex interactions among diverse feature types while maintaining computational efficiency. Its design complements recurrent networks like Bi-LSTM, enhancing sequence representation and predictive performance in multifactorial classification tasks. Mamba has yet to be tested in obesity prediction tasks, although it shows promising efficiency: up to five-fold faster throughput than transformers in language or sequence domains [33]. Still, its performance may lag behind modern transformers on some benchmarks, and its compressed input representation might obscure critical sequence details, potentially limiting in-context learning and the fidelity needed for precise health risk modeling [34].
Recent studies have begun to explore state-space and structured transformer models for medical and tabular data analysis, demonstrating their potential in capturing complex feature dynamics efficiently. The S4 model was employed on the MIMIC-III dataset for in-hospital mortality prediction and length-of-stay estimation, achieving AUPRC values of 0.82 and 0.78, respectively, outperforming the standard LSTM and transformer baselines by 3–5% while maintaining lower computational overhead [35]. In another study, FT-Transformer was applied to the UK Biobank and Parkinson’s Progression Markers Initiative (PPMI) datasets for disease classification and risk stratification, attaining an AUC of 0.91 on 10-year cardiovascular risk prediction and 0.87 on Parkinson’s progression forecasting, surpassing the traditional MLP and GBDT models by 4–6% in AUC score [36]. More recently, a Mamba-based architecture was tested on electrocardiogram (ECG) time-series data from the PTB-XL dataset, where it achieved a macro-F1 of 0.79 for multi-label arrhythmia classification, matching the transformer performance with 4.2× faster inference and reduced memory footprint [37]. These results highlight the growing interest in efficient sequence modeling for healthcare applications, although obesity-related risk prediction using such architectures remains unexplored.
Traditional ML approaches like decision trees, SVM, and ensemble methods achieve respectable accuracies (78–96%) in obesity classification depending on data type and form but often struggle with imbalance, overfitting, and lack of longitudinal modeling. Deep learning approaches improve temporal modeling and, in some cases, approach near-perfect accuracy, but they frequently suffer from interpretability issues, limited generalization, and data demands. Mamba offers an efficient and scalable modeling path, yet its suitability for healthcare fidelity tasks remains untested. Consequently, our proposed hybrid framework, which integrates an improved Mamba model (with feature tokenization and channel-independent processing to mitigate overfitting and enhance representation) and an attention-enhanced Bi-LSTM (ABi-LSTM) for capturing temporal dependencies and interpretability, aims to unite efficiency, robustness, and sensitivity [38], overcoming the limitations observed across the existing studies and advancing personalized obesity classification.
3. Materials and Methods
In this study, we propose a hybrid machine learning framework that effectively integrates multiple methodologies to tackle both regression and classification problems associated with obesity prediction. Our framework enhances traditional modeling approaches by incorporating an attention-enhanced Bi-LSTM model for obesity classification and an improved hybrid Mamba-based architecture to refine feature extraction and sequence modeling.
The novelty of our approach lies in the synergistic combination of techniques that individually offer strengths in handling complex multivariate obesity-related data. While existing works have leveraged regression and deep learning models separately, our approach integrates a lightweight yet efficient attention mechanism with an optimized tokenization and channel-independent processing strategy, ensuring improved scalability, robustness, and interpretability. The full methodological framework is illustrated in Figure 2.
The proposed framework employs a two-pronged approach to address both the regression and classification aspects of the obesity dataset. First, an Exploratory Data Analysis (EDA) stage is carried out to gain a deeper understanding of the data, uncover relationships between variables, and prepare the dataset for subsequent modeling.
For this task, a multiple Linear Regression (LR) model is utilized to tackle the regression problem, aiming to predict continuous variables such as body measurements or other quantitative obesity-related factors. To address the classification of obesity categories, the framework leverages a novel attention-based Bi-LSTM model.
3.1. Data Preprocessing
To ensure the integrity and usefulness of the datasets for subsequent analysis, rigorous data preparation techniques were applied following established best practices [39]. The preprocessing workflow is illustrated in Algorithm 1 and further summarized in Figure 3.
Algorithm 1: Data Preprocessing Procedure
Input: Raw dataset D
Output: Preprocessed dataset D′
Step 1: Standardize column names.
Step 2: For each column c in D:
  If c is numerical:
    1.1. Handle missing values: if the missing rate is below 5%, impute with the mean; otherwise, impute with the median.
    1.2. Detect outliers using the z-score and winsorize values beyond the threshold.
  If c is categorical:
    2.1. Apply one-hot encoding.
    2.2. Merge rare categories (frequency below 1%) into Other.
Step 3: Harmonize datasets (if multiple sources): align common variables (e.g., demographics, BMI, height, and weight), standardize units and category labels, and remove conflicting or duplicate records.
Return D′ // Final cleaned and harmonized dataset
The analysis utilized two publicly available datasets: (i) the Kaggle “Estimation of Obesity Levels Based On Eating Habits and Physical Condition” dataset, which contains 2111 individual records with 16 features, and (ii) the CDC “Nutrition, Physical Activity, & Obesity” dataset, which provides state-level data with 51,376 records and 10 features. Preprocessing included standardization, duplicate removal, handling of missing values, outlier detection, and categorical variable encoding. Where appropriate, datasets were harmonized and optionally concatenated for downstream analysis.
For each numerical column, missing values were imputed using mean or median substitution, as shown in Equation (1), while potential outliers were detected using the z-score method (Equation (2)) and handled via truncation or winsorization. Categorical variables were transformed into binary vectors using one-hot encoding, as illustrated in Equation (3).
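The three column-level operations described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the 5% and 1% thresholds follow the parameters reported in this section, and the z-score cut-off `z_max=3.0` is an assumed, commonly used default rather than a value taken from the paper.

```python
import statistics
from collections import Counter


def impute(values, missing_threshold=0.05):
    """Fill None entries with the mean (low missingness) or the median."""
    present = [v for v in values if v is not None]
    missing_rate = 1 - len(present) / len(values)
    if missing_rate < missing_threshold:
        fill = statistics.mean(present)
    else:
        fill = statistics.median(present)
    return [fill if v is None else v for v in values]


def winsorize(values, z_max=3.0):
    """Clip values whose |z-score| exceeds z_max (assumed cut-off) to the boundary."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return list(values)
    low, high = mu - z_max * sigma, mu + z_max * sigma
    return [min(max(v, low), high) for v in values]


def one_hot(labels, rare_threshold=0.01, other="Other"):
    """Merge categories rarer than rare_threshold into `other`, then one-hot encode."""
    counts = Counter(labels)
    n = len(labels)
    merged = [lab if counts[lab] / n >= rare_threshold else other for lab in labels]
    categories = sorted(set(merged))
    vectors = [[1 if lab == c else 0 for c in categories] for lab in merged]
    return categories, vectors
```

In this sketch the rare-category merge is applied before encoding so that the binary vectors never contain a column for a category that is about to be folded into "Other".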
A visual summary of the preprocessing workflow is provided in Figure 3, showing the number of observations and features at each stage, along with additions and removals performed to ensure data quality.
Table 1 provides a detailed numerical summary of preprocessing outcomes for both datasets.
This preprocessing ensures uniformity, removes inconsistencies, and prepares the data for robust and reproducible analysis in downstream modeling tasks.
After preprocessing the individual datasets as described, the Kaggle and CDC datasets were harmonized to allow integration for downstream analysis. Common variables, such as demographic features (age, gender, and race/ethnicity) and anthropometric measurements (BMI, height, and weight), were aligned by standardizing units and categories. Conflicting or duplicate records were identified and removed; for example, entries with missing critical fields (BMI or gender) exceeding a threshold of 5% were excluded. Where variables existed in only one dataset, they were retained with appropriate handling of missing values for the other dataset. After harmonization, the combined dataset provided a coherent structure suitable for subsequent tokenization, feature extraction, and classification tasks.
For reproducibility, the preprocessing parameters were set as follows. Numerical columns with less than 5% missing values were imputed using the mean; columns with higher missingness were imputed using the median to reduce bias. Outliers were flagged when their absolute z-score exceeded a fixed threshold, and affected values were truncated to the threshold values (winsorization), resulting in approximately 2.8% of numerical records being modified. Categorical variables were transformed using one-hot encoding, with categories occurring in fewer than 1% of observations grouped into an “Other” category to reduce sparsity and improve model stability.
3.2. Proposed Framework for Obesity Classification
In this study, we introduce a novel hybrid deep learning framework for multiclass obesity classification, combining the strengths of attention-enhanced bidirectional long short-term memory (ABi-LSTM) and a modified Mamba architecture, termed FTMamba. Our approach is specifically designed to handle the heterogeneous multi-modal nature of obesity-related data, which includes numerical (e.g., BMI, age, and physical activity duration), categorical (e.g., gender, family history, and dietary preferences), and sequential behavioral patterns (e.g., eating habits over time). The proposed architecture, illustrated in Figure 4, leverages sequential modeling, attention-based interpretability, and efficient long-range dependency capture through structured state-space modeling.
The input data is represented as a sequence of feature vectors, shown in Equation (4), where T denotes the sequence length (e.g., number of time steps or ordered features), d is the feature dimension, and x_t is the feature vector at time step t. This formulation allows the model to treat static tabular features as a pseudo-temporal sequence, enabling sequential models to extract complex inter-feature dependencies.
Attention-Enhanced Bi-LSTM (ABi-LSTM)
To capture bidirectional temporal dynamics in obesity-related factors, we employ a multi-layer Bi-LSTM network. At each time step t, forward and backward hidden states are computed as in Equation (5).
The concatenated hidden state at time t can then be formulated as in Equation (6).
Multiple Bi-LSTM layers are stacked to capture hierarchical temporal patterns, with non-linear transformations applied via hyperbolic tangent activation, as expressed in Equations (7) and (8).
To enhance model interpretability and focus on salient features, we integrate a soft attention mechanism. The alignment score e_t between hidden state h_t and a context vector s (e.g., last hidden state or learnable parameter) is computed as in Equation (9), where the two projection matrices and the vector v are trainable parameters. Attention weights α_t are normalized using softmax, as expressed in Equation (10).
The final context vector c is a weighted sum of hidden states, as given in Equation (11).
This context vector is passed through a fully connected layer to produce the final prediction, as shown in Equation (12), where σ is the sigmoid (or softmax for multiclass) activation.
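To make the attention step concrete, the following is a minimal sketch in plain Python. It assumes the standard additive form for the alignment score, e_t = vᵀ tanh(W_h h_t + W_s s), followed by a softmax over time steps and a weighted sum of hidden states; the names `W_h`, `W_s`, and `additive_attention` are illustrative assumptions, not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def additive_attention(hidden_states, s, W_h, W_s, v):
    """Return (weights, context) for e_t = v^T tanh(W_h h_t + W_s s)."""
    Ws_s = matvec(W_s, s)  # shared across all time steps
    scores = []
    for h in hidden_states:
        Wh_h = matvec(W_h, h)
        scores.append(sum(vi * math.tanh(a + b) for vi, a, b in zip(v, Wh_h, Ws_s)))
    weights = softmax(scores)  # attention weights sum to 1
    dim = len(hidden_states[0])
    # Context vector: attention-weighted sum of the hidden states.
    context = [sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(dim)]
    return weights, context
```

Because the weights are non-negative and sum to one, they can be read directly as the relative contribution of each time step (or ordered feature) to the prediction, which is the basis of the interpretability claim.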
Hyperparameter optimization was conducted via grid search over key configurations (hidden dimensions: 32, 64, 128; layers: 2–4; dropout: 0.2–0.6; learning rate: 0.001–0.0001; attention dimension: 32, 64, 128), with five-fold cross-validation used to select the optimal set, summarized in Table 2.
3.3. Hybrid Improved Mamba Architecture
While ABi-LSTM excels in capturing sequential patterns, its quadratic complexity in attention and limited scalability with long sequences motivate the exploration of more efficient alternatives. To this end, we propose FTMamba, a novel hybrid architecture that integrates feature tokenization (FT), channel independence, and the Mamba selective state-space model to efficiently process heterogeneous obesity data. A detailed overview of the method is shown in Figure 5.
3.3.1. Feature Tokenization and Embedding
We first tokenize each input feature into a dense embedding. Each numerical feature is mapped into a d-dimensional space by a linear projection, as given in Equation (13).
For categorical features encoded as one-hot vectors, a lookup table embedding is used, as shown in Equation (14).
The unified tokenization process is formalized in Equation (15).
A key innovation is the strategic placement of the classification token [CLS]. Unlike standard transformers that prepend [CLS], we append it at the end of the token sequence, as expressed in Equation (16).
Equation (16) ensures that the [CLS] token is updated only after all feature tokens have been processed sequentially by Mamba, enabling it to aggregate global context more effectively. The full input sequence is then computed as in Equation (17).
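The tokenization scheme can be illustrated with a short sketch. It assumes that each numerical feature has its own weight and bias vector of length d, that each categorical feature has its own lookup table indexed by category id, and that the [CLS] embedding is a learned vector; all parameter names here are hypothetical, and fixed numbers stand in for learned values.

```python
def tokenize(numerical, categorical, num_weights, num_biases, cat_tables, cls_token):
    """Build the token sequence: numerical tokens, categorical tokens, then [CLS] last."""
    tokens = []
    # Numerical feature j -> x_j * w_j + b_j (a d-dimensional token).
    for x, w, b in zip(numerical, num_weights, num_biases):
        tokens.append([x * wk + bk for wk, bk in zip(w, b)])
    # Categorical feature -> row of its embedding table selected by category id.
    for idx, table in zip(categorical, cat_tables):
        tokens.append(list(table[idx]))
    # The classification token is appended, not prepended, so a causal
    # sequence model reaches it only after every feature token.
    tokens.append(list(cls_token))
    return tokens
```

Placing [CLS] last matters for a recurrent or state-space model because information flows in one direction: a token at position 0 would be processed before any feature had been seen.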
3.3.2. Channel-Independent Mamba Processing
To prevent overfitting and enhance feature-specific learning, we adopt a channel independence strategy [40]. Instead of processing all features jointly, the batch is reshaped so that each feature is treated as a separate sequence (Equation (18)).
Each feature channel is independently processed through the Mamba block, which consists of a selective SSM layer with input-dependent parameters, a 1D convolutional layer (kernel size 3) for local pattern extraction, a GELU activation function, and a linear projection. The core of Mamba is the discretized state-space model (SSM), defined in Equation (19), where Δ is the step size. Crucially, Mamba makes B, Δ, and C input-dependent, as given in Equation (20).
This selective mechanism allows the model to dynamically focus on relevant inputs, improving long-range dependency modeling. After Mamba processing, the outputs are reshaped back to the original batch format, as shown in Equation (21), where S is the output sequence length. The final [CLS] token is extracted and passed through a classification head, as shown in Equation (22).
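The selective recurrence can be sketched for a single channel with a scalar hidden state. This toy version assumes the common discretization Ā = exp(ΔA) and B̄ = ΔB, with Δ, B, and C produced from the current input by simple scalar maps; the weights `w_delta`, `w_b`, and `w_c` are hypothetical stand-ins for learned projections, and the sketch omits the convolution, activation, and output projection of the full block.

```python
import math


def softplus(z):
    """Smooth positive map, used to keep the step size strictly positive."""
    return math.log1p(math.exp(z))


def selective_scan(xs, a=-1.0, w_delta=1.0, w_b=1.0, w_c=1.0):
    """Scalar selective SSM: h_t = A_bar * h_{t-1} + B_bar * x_t, y_t = C_t * h_t."""
    h = 0.0
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)  # input-dependent step size
        b = w_b * x                    # input-dependent input gate
        c = w_c * x                    # input-dependent readout
        a_bar = math.exp(delta * a)    # discretized state decay, in (0, 1) when a < 0
        b_bar = delta * b              # simplified (Euler-style) discretization of B
        h = a_bar * h + b_bar * x
        ys.append(c * h)
    return ys
```

The loop runs once per sequence element with constant work per step, which is the source of the linear-time behavior referred to in this section; production implementations replace the Python loop with a parallel scan.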
The proposed framework introduces several novel contributions to obesity classification. First, it is among the earliest approaches to apply the Mamba architecture in this domain, utilizing its linear-time complexity and selective state updates to efficiently capture long-range interactions in health-related data. Additionally, an appended token is strategically placed at the end of the sequence, ensuring that it aggregates information only after the full sequence has been processed, thereby improving the fidelity of global representations. The framework also incorporates channel-independent token processing, which mitigates feature interference during Mamba computation, reduces overfitting, and enhances generalization—an especially important consideration for small medical datasets. Furthermore, the architecture integrates ABi-LSTM and FTMamba in a complementary manner: ABi-LSTM is tasked with modeling temporal patterns in behavioral sequences, such as daily habits, while FTMamba efficiently processes static tabular features. These two models are trained independently, and their predictions are subsequently fused through weighted averaging or stacking in the final decision layer, enabling the combined strengths of both approaches to be effectively leveraged.
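The weighted-averaging variant of the fusion step can be sketched as follows. The function name and the default weight of 0.5 are assumptions for illustration; in practice the weight would be chosen on validation data, and stacking would replace this function with a learned meta-classifier.

```python
def fuse_predictions(p_lstm, p_mamba, weight=0.5):
    """Late fusion: convex combination of two class-probability vectors."""
    if len(p_lstm) != len(p_mamba):
        raise ValueError("both models must predict the same number of classes")
    fused = [weight * a + (1.0 - weight) * b for a, b in zip(p_lstm, p_mamba)]
    total = sum(fused)
    fused = [p / total for p in fused]  # renormalize against rounding drift
    label = max(range(len(fused)), key=fused.__getitem__)
    return fused, label
```

For example, with ABi-LSTM probabilities `[0.6, 0.3, 0.1]` and FTMamba probabilities `[0.2, 0.7, 0.1]`, equal weighting yields a fused vector close to `[0.4, 0.5, 0.1]` and selects the second class.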
This dual-path architecture enables our model to simultaneously capture the temporal dynamics (via ABi-LSTM), global feature interactions (via Mamba), interpretable attention weights (via ABi-LSTM attention), and computational efficiency (via Mamba’s linear complexity). Both models are tailored to the characteristics of the Obesity and CDC datasets, which contain mixed data types and subtle non-linear relationships between lifestyle factors and obesity levels. By combining enhanced tokenization, selective state-space modeling, and attention mechanisms, our framework achieves both high accuracy and clinical interpretability—key requirements for deployment in real-world health risk assessment.
On the other hand, Mamba provides a computationally efficient alternative to traditional deep learning models, especially for tabular datasets with a mix of categorical and numerical variables [41,42]. By replacing transformer layers with structured state-space modeling (SSM) and incorporating a novel feature tokenizer, Mamba efficiently encodes obesity-related attributes without the quadratic complexity of transformers. The channel independence mechanism further ensures that each feature is processed separately, reducing the risk of overfitting and improving generalization across diverse patient populations.
In our study, the choice of preprocessing steps and hyperparameter settings was carefully guided to optimize the model’s performance for obesity classification and ensure it effectively handles the dataset. For preprocessing, numerical features were standardized to maintain consistent scaling and prevent disproportionate influence from specific variables, while categorical features were tokenized to enable the model to process them efficiently, preserving key relationships between categories. The input dimension was set to 16 to match the number of features in the dataset, ensuring the model could process all available data. The hidden dimension of 64 was selected to provide enough capacity for learning complex patterns while avoiding overfitting, offering a balance between model complexity and generalization. We chose 3 layers to capture multiple levels of abstract features from the data, which is especially effective for sequential learning tasks like obesity classification. To prevent overfitting, a dropout rate of 0.4 was implemented, randomly dropping units during training to prevent the model from relying on specific features. The bidirectional setting was adopted to enable the model to learn from both past and future data points, enhancing its ability to understand time-dependent patterns. For the attention mechanism, soft attention with an attention dimension of 64 was used to allow the model to focus on the most relevant features, improving predictive accuracy. The activation function was chosen as Tanh to map outputs to a bounded range, preventing extreme values during training. Lastly, a learning rate of 0.001 was set to ensure gradual convergence towards the optimal solution without overshooting, optimizing the training process. These choices were informed by empirical testing and previous research to ensure the model’s ability to generalize and achieve strong performance on obesity classification tasks.
5. Results
All experiments were performed on a Windows system with an Intel Core(TM) i9-10900K CPU running at 5.3 GHz, 64 GB of DDR4 RAM, and 2 TB of NVMe SSD storage, using an NVIDIA RTX 3090 24 GB graphics processing unit (GPU). The implementation used PyTorch and Keras, chosen for their efficiency and versatility in deep learning research. The hyperparameters were carefully adjusted to ensure both the necessary precision and effective model convergence. Model performance was evaluated using a broad range of metrics, including accuracy, precision, recall, and F1-score. Extensive tests were carried out using suitable train–validation–test splits to support the validity and applicability of the results.
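For reference, the four reported metrics can be computed from predicted and true labels as in the sketch below. Macro averaging (an unweighted mean over classes) is an assumption made here for the multiclass case; the text does not state which averaging scheme was used.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 for multiclass labels."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Macro averaging weights every obesity class equally, so a model that neglects a small class is penalized even when overall accuracy stays high.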
5.1. Dataset
This study used two datasets: the CDC Data: Nutrition, Physical Activity, & Obesity dataset from Kaggle and the Estimation of Obesity Levels Based On Eating Habits and Physical Condition dataset from the UC Irvine Machine Learning Repository [11]. With 4901 observations, the CDC dataset provides statistics on the prevalence of obesity across various weight categories and behavioral characteristics. The 2111 observations in the UC Irvine dataset provide information on obesity levels based on dietary patterns and physical attributes. The primary attributes of the research sample, encompassing demographic, anthropometric, and physical activity variables, are presented in Table 3.
Table 3 serves to provide a detailed demographic and behavioral breakdown of the study population across different BMI categories: Normal, Overweight, and Obese. This descriptive summary includes key variables such as gender, age, race, education level, marital status, income (Family PIR), physical activity levels, and intensity-based duration of daily activity. The table allows for an initial comparative assessment of how obesity prevalence correlates with various social, economic, and behavioral factors. For instance, it reveals trends such as higher proportions of females in the obese category, a lower percentage of college graduates among obese individuals, and a noticeable decline in vigorous physical activity with increasing BMI. By presenting this comprehensive overview, the table helps to contextualize the datasets used in the study and supports the rationale for further analysis of obesity-related factors.
The CDC Data: Nutrition, Physical Activity, & Obesity dataset (4901 observations) is sourced from the Centers for Disease Control and Prevention (CDC) and offers nationally representative data on nutrition, physical activity, and obesity trends across the United States. This dataset covers a broad demographic range, including individuals of various racial and ethnic backgrounds, such as non-Hispanic White, non-Hispanic Black, Mexican American, other Hispanic, and multiracial groups. The participants’ ages range from 14 to 61 years, and the dataset includes information on diverse socioeconomic factors such as income, education, and family poverty-to-income ratio (PIR). These features ensure that the dataset provides a comprehensive view of the national landscape of obesity, physical activity, and nutrition. Additionally, the data is geographically representative, with state-level and regional statistics that allow for an analysis of trends and disparities in obesity prevalence across different parts of the United States. The dataset’s rich variety of features, including behavioral data on physical activity and sedentary behavior, makes it well-suited for public health research and policy evaluation, offering insights into both individual and population-level obesity factors.
The Obesity dataset (2111 observations), collected for estimating obesity levels based on eating habits and physical conditions, focuses more on individual-level factors associated with obesity. The dataset includes detailed information on dietary habits, such as the frequency of vegetable and fruit consumption, water intake, and meal patterns, as well as physical activity data, including the frequency and intensity of exercise and sedentary behavior. It also includes anthropometric data (height, weight, and BMI) and psychological factors, such as stress levels and emotional eating. While the dataset does not provide explicit geographic diversity, it captures a wide array of individual characteristics that contribute to obesity, making it particularly useful for predictive modeling and classification tasks. The dataset’s detailed focus on personal behaviors and conditions allows for an in-depth understanding of the individual-level drivers of obesity, and its relatively smaller size compared to the CDC dataset makes it a valuable complement for personal health recommendations or personalized intervention strategies. The combined dataset was split into training, validation, and test sets using a 70%/15%/15% ratio. Stratified sampling based on BMI categories was employed to ensure that the class distribution remained consistent across all subsets. For model evaluation, 5-fold cross-validation was performed on the training set, and the average performance metrics across folds were reported. This procedure ensured robust estimation of model performance while mitigating potential bias from class imbalance or random sampling variation.
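The stratified 70%/15%/15% split can be sketched with the standard library alone. The function name, the fixed seed, and the per-class rounding rule are illustrative assumptions; a library routine with a stratification option would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=0):
    """Return train/val/test index lists that preserve per-class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    train, val, test = [], [], []
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)  # shuffle within each class before slicing
        n_train = round(ratios[0] * len(idxs))
        n_val = round(ratios[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test set
    return train, val, test
```

Splitting inside each class, rather than over the pooled records, is what keeps the BMI-category distribution consistent across the three subsets.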
5.2. Exploratory Data Analysis
Individuals are categorized into different weight categories using the body mass index (BMI) [
43], as shown in
Table 4.
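As an illustration of BMI-based categorization, the sketch below maps a BMI value to the six classes used in this study. The cut-offs are the commonly cited WHO adult thresholds and are an assumption here; the thresholds in Table 4 take precedence if they differ. The function names are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Map a BMI value to a weight category (assumed WHO adult cut-offs)."""
    if value < 18.5:
        return "Underweight"
    if value < 25.0:
        return "Normal"
    if value < 30.0:
        return "Overweight"
    if value < 35.0:
        return "Obesity I"
    if value < 40.0:
        return "Obesity II"
    return "Obesity III"


print(bmi_category(bmi(70.0, 1.75)))
```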
The distribution of individuals in various obesity states is shown in
Figure 6, which is based on how each person answered a binary question (Yes/No). According to the data, the “No” group has a higher count than the “Yes” category for the majority of obesity states, with the exception of Overweight level 1, where the numbers are quite close. The state classified as Overweight level 1 has the highest total count, with 342 individuals falling into the “No” category and 346 into the “Yes” group. Next is Overweight level 2, where 259 individuals fall into the “Yes” category and 176 fall into the “No” category. Obesity level 1 displays a roughly equal distribution between the two categories, with 210 individuals in the “No” category and 211 in the “Yes” category. Obesity level 2 has 194 individuals in the “No” category and 164 in the “Yes” category. This information helps in understanding how people are distributed across the obesity states and in pinpointing possible areas for focused treatments or additional study to address the prevalence of obesity in the population.
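The totals implied by the counts quoted above can be tabulated with a short sketch. Only the four states whose counts are reported in the text are included; the variable names are illustrative.

```python
# Yes/No counts per obesity state, as quoted in the text.
counts = {
    "Overweight level 1": {"No": 342, "Yes": 346},
    "Overweight level 2": {"No": 176, "Yes": 259},
    "Obesity level 1": {"No": 210, "Yes": 211},
    "Obesity level 2": {"No": 194, "Yes": 164},
}

summary = {}
for state, c in counts.items():
    total = c["No"] + c["Yes"]
    summary[state] = {"total": total, "yes_share": c["Yes"] / total}

# State with the largest combined count.
largest = max(summary, key=lambda s: summary[s]["total"])

for state, row in summary.items():
    print(f"{state}: total={row['total']}, yes_share={row['yes_share']:.3f}")
print("largest:", largest)
```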
Similarly, weight does not appear anomalous among people classed as very obese who “frequently” or “always” eat between meals, as seen in
Figure 7. Only those who occasionally indulge in binge eating fall into the “Overweight” or “Obesity” categories.
Figure 2 shows the distribution of people across obesity levels based on how they answered a multiple-choice question. With the largest total count, the Overweight level 1 state had a sizable proportion of respondents who said “Sometimes” (about 320) and “Frequently” (about 350). Many people in the Overweight level 2 state also answered “Sometimes” (about 180) and “Frequently” (roughly 260). By contrast, the counts for the “Sometimes,” “Frequently,” and “No” categories (around 210 each) are more uniform in the Obesity level 1 state.
Counts are somewhat lower in the Obesity level 2 and Obesity level 3 states, where the “Sometimes” and “No” categories had the largest numbers (about 195 and 165 for Obesity level 2 and roughly 175 and 90 for Obesity level 3). This information sheds light on the prevalence of various obesity states in the community and may help to guide future research or focused initiatives.
5.3. Analysis of Key Obesity Factors
To identify the most influential factors contributing to obesity, we first conducted a regression analysis incorporating demographic, lifestyle, and environmental variables.
Table 5 presents regression coefficients and
p-values for additional key determinants, highlighting factors such as daily caloric intake, screen time, sleep duration, stress levels, and access to healthy foods. Positive coefficients indicate a higher obesity risk, while negative coefficients suggest a protective effect. For instance, higher caloric intake (0.002,
p < 0.001) and increased screen time (0.075,
p < 0.01) are associated with greater obesity risk, whereas better access to healthy foods (−0.10,
p = 0.001) and longer sleep duration (−0.15,
p = 0.05) are linked to reduced risk.
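One way to read the sign and size of these coefficients is on the odds scale. The sketch below assumes the coefficients come from a logistic regression, so that each one is a change in log-odds per unit of the predictor; the text does not state the regression type or the predictor units, so this is an interpretive assumption, and the helper `odds_ratio` is hypothetical.

```python
import math

# Coefficients quoted in the text (predictor units are not specified there).
coefficients = {
    "daily caloric intake": 0.002,
    "screen time": 0.075,
    "access to healthy foods": -0.10,
    "sleep duration": -0.15,
}


def odds_ratio(beta, delta=1.0):
    """Multiplicative change in odds for a `delta`-unit increase,
    assuming `beta` is a log-odds (logistic regression) coefficient."""
    return math.exp(beta * delta)


odds_ratios = {name: odds_ratio(beta) for name, beta in coefficients.items()}
for name, ratio in odds_ratios.items():
    direction = "higher" if ratio > 1 else "lower"
    print(f"{name}: odds ratio {ratio:.3f} ({direction} odds per unit)")
```

Under this assumption, a ratio above 1 corresponds to a positive coefficient (higher risk) and a ratio below 1 to a negative coefficient (protective effect).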
To complement regression-based insights, we applied a SHAP-inspired analysis to our obesity classifier using the hybrid ABi-LSTM + Mamba framework.
Table 6 shows the relative importance of lifestyle, environmental, and psychological factors.
The SHAP analysis highlights that the most influential predictors of obesity are physical activity frequency, daily caloric intake, and screen time, aligning with the regression findings. Protective factors such as sleep duration, access to healthy foods, and body image satisfaction consistently contribute to reducing obesity risk. Less influential variables include age and socioeconomic status, although their effects remain statistically significant.
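To make the idea behind Shapley-based attribution concrete, the sketch below computes exact Shapley values by enumerating feature coalitions for a toy scoring function. It is not the SHAP-inspired procedure used in this study: the model `risk_score`, its weights, the feature ordering, and the baseline are all hypothetical, and exact enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def risk_score(features):
    """Hypothetical linear score: [activity, caloric intake, screen time]."""
    activity, calories, screen = features
    return -0.8 * activity + 0.5 * calories + 0.3 * screen


x = [1.0, 3.0, 4.0]          # individual being explained (toy values)
baseline = [2.0, 2.0, 2.0]   # reference individual (toy values)
phi = shapley_values(risk_score, x, baseline)
print([round(p, 3) for p in phi])
```

The attributions sum to the difference between the score for `x` and the score for the baseline, which is the property that makes Shapley values convenient for ranking feature contributions.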
Combining regression coefficients with SHAP-based feature importance provides a robust understanding of obesity determinants. Regression analysis quantifies the magnitude and statistical significance of individual predictors, while the SHAP analysis offers a classifier-centered perspective of feature relevance. Overall, this integrated analysis confirms that lifestyle behaviors (physical activity, caloric intake, and screen time), environmental access (healthy foods), and psychological variables (body image and emotional eating) are central determinants of obesity. Such insights support targeted interventions and public health strategies aimed at modifying the most impactful factors.
5.4. Obesity Classification
We performed a classification task to categorize data into discrete obesity groups. The model was trained using a one-vs.-all (OvA) multiclass strategy, where each obesity class was treated as the positive class against all the other classes as negative. This approach allowed us to compute class-specific metrics for precision, recall, F1-score, true positive rate (TPR), false positive rate (FPR), and Area Under the Curve (AUC), providing a detailed assessment of the model’s performance across all the obesity categories. This strategy is particularly effective for datasets with unbalanced class distributions as it allows per-class evaluation while maintaining overall model generalization.
Our major performance indicators for evaluating the model were precision, recall, and F1-score. Recall indicates the percentage of correctly predicted instances among all instances that actually belong to a class, while precision measures the proportion of correctly predicted positive instances among all predicted positives. The F1-score, computed as the harmonic mean of precision and recall, balances these two metrics, making it particularly suitable for imbalanced datasets.
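The per-class metrics under a one-vs.-all view can be sketched as follows: one class is treated as positive, all others as negative, and the usual binary counts are formed. The function name and the toy labels are hypothetical.

```python
def one_vs_all_metrics(y_true, y_pred, positive):
    """Precision, recall (TPR), F1, and FPR with `positive` against all other classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


# Toy labels (hypothetical): N = Normal, O = Overweight, U = Underweight.
y_true = ["N", "N", "O", "O", "O", "U"]
y_pred = ["N", "O", "O", "O", "N", "U"]
for label in ["U", "N", "O"]:
    print(label, one_vs_all_metrics(y_true, y_pred, label))
```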
Figure 8 illustrates the class-specific performance metrics (accuracy, precision, recall, and F1-score) obtained using the OvA strategy. Our results demonstrate high levels of recall, precision, and F1-score across all the obesity categories: recall ranged from 0.84 to 0.98, precision from 0.82 to 0.99, and F1-score from 0.80 to 1.00, highlighting the robustness of the model. Priority was given to F1-score during grid search and cross-validation to ensure balanced evaluation of both minority and majority classes, improving model generalization and interpretability.
We further assessed the discriminative capability of the model using Receiver Operating Characteristic (ROC) curves and corresponding Area Under the Curve (AUC) metrics.
Figure 9 and
Figure 10 display the TPR, FPR, and AUC values for each class. The TPR measures the rate of correctly identified positive instances, the FPR quantifies the rate of false positives, and the AUC provides a summary measure of the model’s ability to distinguish between classes.
Class-specific AUC values indicate strong model performance: Class 0 (Underweight) AUC = 0.93, Class 1 (Normal weight) AUC = 0.91, Class 2 (Overweight) AUC = 0.97, Class 3 (Obesity Type I) AUC = 0.97, Class 4 (Obesity Type II) AUC = 0.99, and Class 5 (Obesity Type III) AUC = 0.92. Both micro-average and macro-average AUC metrics were 0.95 across all classes, demonstrating stable performance.
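As a small worked check, the macro-average AUC is the unweighted mean of the class-specific values listed above. The sketch below computes it and also shows a rank-based AUC for a single one-vs.-all problem; the function `roc_auc` and the toy scores are illustrative. The micro-average cannot be recovered from per-class AUCs alone, since it requires the pooled per-sample scores.

```python
def roc_auc(scores, is_positive):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Class-specific AUC values as reported in the text.
class_auc = {
    "Underweight": 0.93,
    "Normal weight": 0.91,
    "Overweight": 0.97,
    "Obesity Type I": 0.97,
    "Obesity Type II": 0.99,
    "Obesity Type III": 0.92,
}
macro_auc = sum(class_auc.values()) / len(class_auc)
print(f"macro-average AUC = {macro_auc:.3f}")

# Toy one-vs.-all scores (hypothetical) for a single class.
toy_auc = roc_auc([0.9, 0.8, 0.4, 0.3], [True, False, True, False])
print(f"toy AUC = {toy_auc:.2f}")
```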
Finally, we compared our approach with several widely used machine learning and advanced models.
Table 7 presents the comparison, showing that our hybrid ABi-LSTM + Mamba framework consistently outperforms traditional and state-of-the-art models in accuracy, sensitivity, specificity, PPV, and NPV.
Overall, the one-vs.-all strategy motivates reporting metrics per class, supports fair evaluation of imbalanced categories, and shows the model’s strong performance across all the obesity groups.
The overall performance of the proposed method demonstrates robust classification capability across obesity categories. As shown in
Table 8, the model achieved an accuracy of 93.42% (95% CI: 91.2–95.6%), indicating that the vast majority of samples were correctly classified. Both precision and recall were 91.0% (95% CI: 88.5–93.5%), reflecting a balanced ability to correctly identify positive cases while minimizing false positives. The F1-score of 90.0% (95% CI: 87.4–92.6%), which balances precision and recall, further confirms the model’s effectiveness. Overall, these results highlight the strong and reliable predictive performance of the proposed ABi-LSTM + Mamba framework for obesity classification.
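The text does not state how the 95% confidence intervals were obtained. One common option, shown here purely as an assumption, is a t-based interval over the per-fold scores from 5-fold cross-validation. The fold accuracies below are hypothetical placeholders, not results from this study, and `mean_ci` is an illustrative helper.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical t value for 4 degrees of freedom (5 folds).
T_CRIT_95 = {4: 2.776}


def mean_ci(fold_scores):
    """Mean and t-based 95% interval across cross-validation folds."""
    n = len(fold_scores)
    centre = mean(fold_scores)
    half_width = T_CRIT_95[n - 1] * stdev(fold_scores) / sqrt(n)
    return centre, centre - half_width, centre + half_width


# Hypothetical per-fold accuracies, for illustration only.
fold_accuracy = [0.92, 0.94, 0.93, 0.95, 0.93]
centre, low, high = mean_ci(fold_accuracy)
print(f"mean={centre:.4f}, 95% CI=({low:.4f}, {high:.4f})")
```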
5.5. Ablation Analysis
The ablation analysis presented in
Table 9 highlights the impact of each component on the performance of the obesity classification model. The full model, which incorporates all the key components, achieves the highest performance, with an accuracy of 93.42%, sensitivity of 91.11%, specificity of 92.34%, positive predictive value (PPV) of 92.19%, and negative predictive value (NPV) of 93.21%.
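For reference, sensitivity, specificity, PPV, and NPV can each be derived per class from a multiclass confusion matrix and then averaged. The sketch below assumes rows are true classes and columns are predicted classes; the 3×3 matrix is a hypothetical example, and whether the reported values are macro-averaged in this way is not stated in the text.

```python
def _ratio(numerator, denominator):
    """Safe division that returns 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0


def per_class_rates(confusion):
    """Sensitivity, specificity, PPV, and NPV for each class.
    `confusion[i][j]` counts samples of true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    rates = []
    for k in range(len(confusion)):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(row[k] for row in confusion) - tp
        tn = total - tp - fn - fp
        rates.append({
            "sensitivity": _ratio(tp, tp + fn),
            "specificity": _ratio(tn, tn + fp),
            "ppv": _ratio(tp, tp + fp),
            "npv": _ratio(tn, tn + fn),
        })
    return rates


# Hypothetical confusion matrix for three classes.
confusion = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
rates = per_class_rates(confusion)
macro_sensitivity = sum(r["sensitivity"] for r in rates) / len(rates)
print(rates[0], round(macro_sensitivity, 3))
```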
When the attention mechanism in the Bi-LSTM is removed, the model’s performance drops notably, particularly in sensitivity and PPV, underscoring the critical role of attention in identifying important features and focusing the model’s learning on the most relevant predictors of obesity. Replacing the Bi-LSTM with a bidirectional GRU, a recurrent architecture with similar gating mechanisms, results in a moderate decrease in performance, suggesting that, while the GRU is effective at modeling temporal dependencies, the Bi-LSTM’s longer memory capacity provides a measurable advantage in capturing sequential patterns in obesity-related data. Replacing the Bi-LSTM with a feedforward neural network (FFNN) produces a more substantial drop in all the metrics because the FFNN has no recurrent gating mechanism and therefore cannot model temporal dependencies at all.
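To illustrate what an attention layer contributes on top of recurrent hidden states, the sketch below applies a generic dot-product attention pooling: each hidden state is scored against a query vector, the scores are normalized with a softmax, and the states are averaged with those weights. This is a simplified stand-in, not the attention formulation of the ABi-LSTM; the function name, query vector, and hidden states are hypothetical.

```python
import math


def attention_pool(hidden_states, query):
    """Softmax-weighted sum of hidden states; scores are dot products with `query`."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    peak = max(scores)  # subtract the maximum for numerical stability
    exp_scores = [math.exp(s - peak) for s in scores]
    norm = sum(exp_scores)
    weights = [e / norm for e in exp_scores]
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


# Toy hidden states for three time steps (hypothetical values).
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_pool(hidden, query=[2.0, 2.0])
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

Removing attention corresponds to replacing the learned weights with a fixed rule, such as using only the last hidden state or a uniform average.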
The exclusion of the channel independence mechanism results in a slight reduction in performance, suggesting that allowing features to interact freely across channels can lead to overfitting, which the channel independence strategy mitigates. Replacing the Mamba architecture with a traditional transformer model also decreases performance, highlighting Mamba’s ability to model long-range dependencies while remaining computationally efficient. The omission of the tokenization strategy, which separates numerical and categorical features, results in a considerable decline in performance, particularly in terms of accuracy, indicating that an effective tokenization strategy is essential for the proper representation of diverse feature types.
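A minimal sketch of a tokenization step that treats numerical and categorical features separately is given below. It assumes one token per feature, with numerical tokens carrying a float value and categorical tokens carrying an integer index from a per-feature vocabulary. The feature names, records, and helper functions are hypothetical, and the tokenizer used in the proposed framework may differ.

```python
def build_vocab(records, categorical):
    """Assign each category of each categorical feature an integer index."""
    vocab = {name: {} for name in categorical}
    for record in records:
        for name in categorical:
            vocab[name].setdefault(record[name], len(vocab[name]))
    return vocab


def tokenize(record, numerical, categorical, vocab):
    """Return (numerical tokens, categorical tokens) for one record."""
    num_tokens = [(name, float(record[name])) for name in numerical]
    cat_tokens = [(name, vocab[name][record[name]]) for name in categorical]
    return num_tokens, cat_tokens


# Hypothetical records and feature names.
records = [
    {"age": 21, "weight": 64.0, "snacking": "Sometimes"},
    {"age": 35, "weight": 90.5, "snacking": "Frequently"},
]
numerical, categorical = ["age", "weight"], ["snacking"]
vocab = build_vocab(records, categorical)
print(tokenize(records[1], numerical, categorical, vocab))
```

Keeping the two token types apart lets numerical values pass through a continuous projection while categorical indices go through an embedding lookup.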
The removal of regularization techniques, such as dropout, also leads to a decrease in performance, particularly in sensitivity, suggesting that regularization is necessary to prevent the model from overfitting to the training data. Similarly, the absence of feature preprocessing, such as scaling, causes a significant drop in accuracy, emphasizing the importance of preprocessing steps to standardize features and prevent disproportionate influence from specific variables. Reducing the sequence length and hidden dimension also negatively affects performance, demonstrating that maintaining an adequate sequence length and model capacity is crucial for capturing complex patterns in obesity-related data.
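Feature scaling of the kind mentioned above is often implemented as z-score standardization, with the statistics estimated on the training split only and then reused for validation and test data. The sketch below assumes this variant; the text does not specify which scaling method was used, and the function names and values are illustrative.

```python
from statistics import mean, pstdev


def fit_scaler(column):
    """Estimate mean and standard deviation from training data only."""
    mu = mean(column)
    sigma = pstdev(column) or 1.0  # guard against constant columns
    return mu, sigma


def transform(column, mu, sigma):
    """Standardize values with previously fitted statistics."""
    return [(value - mu) / sigma for value in column]


# Toy weight column in kilograms (hypothetical values).
train_weight = [60.0, 70.0, 80.0]
mu, sigma = fit_scaler(train_weight)
scaled_train = transform(train_weight, mu, sigma)
scaled_test = transform([70.0, 90.0], mu, sigma)
print([round(v, 3) for v in scaled_train], [round(v, 3) for v in scaled_test])
```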
Overall, the results of the ablation analysis provide strong evidence that each component of the model contributes to its success. In particular, the comparison between Bi-LSTM and Bi-GRU confirms that, while both recurrent architectures are effective, the Bi-LSTM approach offers a consistent edge in capturing long-range dependencies relevant to obesity classification.
6. Discussion
This study aims to present a novel approach for predicting obesity categories and identifying key determinants through the integration of advanced machine learning models. While the proposed framework demonstrates significant improvements in accuracy and classification performance, it is crucial to discuss the limitations, generalizability, and practical applications of the model.
In our study, we used two datasets: the CDC Data: Nutrition, Physical Activity, & Obesity dataset and the Obesity Levels Based on Eating Habits and Physical Condition dataset. Both datasets have relatively diverse demographic representations across gender, age, ethnicity, and socioeconomic status. However, we acknowledge that generalizability across populations with distinct demographic profiles (e.g., populations from different countries or ethnic groups) may still pose challenges. Although our framework demonstrates strong performance within the scope of the selected datasets, its ability to generalize to new, unseen populations requires further validation. Future work could extend this research to more diverse and geographically varied datasets to better assess the robustness of the model across different populations.
As mentioned, the datasets used in this study include varied demographic factors such as age, gender, race, and socioeconomic status. However, biases may still exist within these data that could affect the model’s performance. For example, the distribution of certain demographic groups, such as the overrepresentation of specific racial or ethnic categories in the dataset, might lead to overfitting or skewed predictions. In particular, gender and ethnicity are often associated with differing obesity rates and patterns of physical activity. Thus, demographic bias could influence model outcomes, potentially affecting its application to underrepresented groups. To make the datasets’ diversity clearer, their demographic composition is described in
Section 5.1. While we are optimistic about the model’s performance on these datasets, we suggest conducting further studies that evaluate how well the model adapts to populations that were absent from or underrepresented in the current data.
Both datasets used in this study are benchmark datasets and are relatively clean, with minimal missing values or significant noise. As a result, our model did not face substantial challenges in this regard. However, we acknowledge that real-world healthcare datasets may exhibit varying levels of missing or noisy data. In our case, any missing values were handled using standard imputation techniques, such as mean or median imputation. Future iterations of the framework could benefit from incorporating more sophisticated data-cleaning and preprocessing techniques, such as outlier detection or advanced imputation algorithms (e.g., KNN imputation or multivariate imputation by chained equations), to enhance its robustness when applied to more complex, noisy datasets.
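Mean or median imputation of the kind mentioned above can be sketched in a few lines. The example assumes missing entries are encoded as `None`; the function name and the toy column are illustrative.

```python
from statistics import mean, median


def impute(column, strategy="mean"):
    """Replace missing entries (None) with the mean or median of observed values."""
    observed = [value for value in column if value is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if value is None else value for value in column]


# Toy column with one missing value (hypothetical data).
water_intake = [1.0, None, 3.0, 8.0]
print(impute(water_intake, "mean"))
print(impute(water_intake, "median"))
```

Median imputation is less sensitive to extreme values, which is why the two strategies give different fills for this skewed toy column.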
While the paper presents theoretical advancements in obesity prediction, we also recognize the importance of practical deployment strategies in real-world settings. Our framework has the potential to provide meaningful insights into individual- and population-level obesity trends. Healthcare professionals can leverage the model to identify at-risk individuals, offering personalized recommendations based on obesity categories and the key determinants of obesity in their cases. By integrating this framework into clinical decision support systems, healthcare providers could optimize treatment plans tailored to individual needs, improving patient outcomes.
Additionally, the policy implications of our model are significant. Policymakers could use the insights from the model to design more targeted interventions aimed at reducing obesity rates, especially in at-risk groups. For example, the framework could identify specific demographic factors that contribute to higher obesity rates in certain regions or communities, allowing policymakers to allocate resources more effectively for preventive measures.