Article

Hybrid Mamba and Attention-Enhanced Bi-LSTM for Obesity Classification and Key Determinant Identification

by Chongyang Fu 1, Mohd Shahril Nizam Bin Shaharom 2,* and Syed Kamaruzaman Bin Syed Ali 1,*
1 Department of Educational Foundations and Humanities, Faculty of Education, University of Malaya, Kuala Lumpur 50603, Malaysia
2 Department of Curriculum and Instructional Technology, Faculty of Education, University of Malaya, Kuala Lumpur 50603, Malaysia
* Authors to whom correspondence should be addressed.
Electronics 2025, 14(17), 3445; https://doi.org/10.3390/electronics14173445
Submission received: 15 July 2025 / Revised: 16 August 2025 / Accepted: 25 August 2025 / Published: 29 August 2025
(This article belongs to the Special Issue Knowledge Representation and Reasoning in Artificial Intelligence)

Abstract

Obesity is a major public health challenge linked to increased risks of chronic diseases. Effective prevention and intervention strategies require accurate classification and identification of key determinants. This study aims to develop a robust deep learning framework to enhance the accuracy and interpretability of obesity classification using comprehensive datasets, and to compare its performance with both traditional and state-of-the-art deep learning models. We propose a hybrid deep learning framework that combines an improved Mamba model with an attention-enhanced bidirectional LSTM (ABi-LSTM). The framework utilizes the Obesity and CDC datasets. A feature tokenizer is integrated into the Mamba model to improve scalability and representation learning. Channel-independent processing is employed to prevent overfitting through independent feature analysis. The ABi-LSTM component is used to capture complex temporal dependencies in the data, thereby enhancing classification performance. The proposed framework achieved an accuracy of 93.42%, surpassing existing methods such as ID3 (91.87%), J48 (89.98%), Naïve Bayes (90.31%), Bayesian Network (89.23%), as well as deep learning-based approaches such as VAE (92.12%) and LightCNN (92.50%). Additionally, the model improved sensitivity to 91.11% and specificity to 92.34%. The hybrid model demonstrates superior performance in obesity classification and determinant identification compared to both traditional and advanced deep learning methods. These results underscore the potential of deep learning in enabling data-driven personalized healthcare and targeted obesity interventions.

1. Introduction

Obesity has emerged as a global health crisis, posing serious challenges for both individual health and public healthcare systems [1,2]. Developing effective strategies for the prevention and management of obesity is critical [3]. Investigating dietary habits and physical activity levels is essential to grasp the complex and multifactorial nature of the condition [4]. A general overview of obesity’s key factors and related diseases is shown in Figure 1.
Accurately categorizing individuals by their obesity levels allows for targeted interventions and personalized health management. Understanding the factors that contribute to varying degrees of obesity enables healthcare professionals to design customized strategies that support weight management [5], enhance overall health, and mitigate the impact of obesity-related diseases [6].
Advancements in artificial intelligence and machine learning have paved the way for utilizing data-driven approaches to predict obesity categories [7,8]. These technologies enable the discovery of hidden patterns, identification of key variables, and development of predictive models that aid in early detection and proactive management of obesity [9,10].
Although previous research has applied machine learning to obesity analysis, challenges persist in fully capturing the interactions between different factors and accurately classifying the complete range of obesity categories [11,12,13]. To overcome these challenges, this study adopts a multifaceted approach that incorporates advanced models, specifically the hybrid Mamba and attention-enhanced Bi-LSTM (ABi-LSTM), to provide a deeper understanding of obesity dynamics.
In this research, we leveraged the Obesity [14] and CDC datasets [15,16] to construct a robust framework for obesity analysis. We first implemented the improved Mamba model to enhance scalability and process high-dimensional data effectively. Following this, we employed the ABi-LSTM framework to categorize individuals into distinct obesity classes, including Underweight, Normal, Overweight, and Obesity I, II, and III. The proposed ABi-LSTM model provides valuable insights into the critical factors influencing obesity classification, capturing complex temporal patterns and relationships among variables.
The main contributions of this paper are as follows:
  • Hybrid Improved Mamba Model: We propose a novel hybrid approach by integrating the Mamba model with an enhanced tokenizer to handle high-dimensional obesity data. The new feature tokenizer efficiently converts both numerical and categorical inputs into tokens, including a special [CLS] token that captures overall sequence information, making the model scalable and efficient for obesity classification.
  • Channel Independence for Obesity Data: To mitigate model overfitting and enhance prediction accuracy, we introduce the concept of channel independence, which processes each feature channel separately. This reshaping technique allows the model to handle feature-specific information more effectively while maintaining the integrity of the data structure during processing.
  • Attention-Enhanced Bidirectional LSTM (ABi-LSTM) Model: We develop an innovative ABi-LSTM model that combines attention mechanisms with bidirectional LSTM to better capture temporal patterns in sequential obesity data. This approach improves the model’s ability to classify individuals across multiple obesity levels, using a dynamic attention mechanism to prioritize the most important features in the data.
  • Effective Multiclass Classification for Obesity Levels: By applying hybrid machine learning models, our study demonstrates improved performance in classifying individuals into distinct obesity categories. This enables more accurate obesity level prediction, which can inform personalized health interventions and targeted management strategies.
Through this multifaceted approach, this study aspires to make meaningful contributions to the ongoing efforts in obesity research. By leveraging a comprehensive framework integrating regression analysis and classification techniques, the study yields valuable insights that can empower healthcare professionals and policymakers to enhance obesity prevention, management, and intervention strategies. Accurate prediction of obesity categories, coupled with interpretable identification of pivotal influencing factors, equips stakeholders with the data-driven evidence needed to develop impactful tailored initiatives. This study’s findings support the implementation of innovative evidence-based approaches to tackle the complex challenges posed by the obesity epidemic, ultimately improving individual well-being and alleviating the burden on healthcare systems.

2. Related Works

Obesity has emerged as a major public health concern worldwide, significantly increasing the risk of chronic conditions such as cardiovascular diseases, diabetes, and hypertension [17]. The World Health Organization (WHO) has highlighted the alarming rise in obesity prevalence, underscoring the necessity of effective classification and early identification of key determinants to develop personalized interventions [18]. Traditional statistical models and clinical assessments often fall short of capturing the multifactorial complexity of obesity, prompting researchers to turn to advanced machine learning (ML) and deep learning (DL) techniques for more accurate and comprehensive analysis [13].
For instance, analysis of the CHICA pediatric clinical decision support dataset using classical ML models such as RandomTree, RandomForest, ID3, J48, Naïve Bayes, and Bayesian Network revealed that ID3 delivered the highest performance, with 85% accuracy and 89% sensitivity, alongside a positive predictive value (PPV) of approximately 84% and negative predictive value (NPV) near 88% [19]. Nonetheless, under imbalanced class conditions, sensitivity in J48 dropped dramatically—to as low as 29%—indicating a substantial under-detection of obese cases in skewed datasets [19]. Similarly, broader studies employing diverse ML techniques on various datasets have produced a spectrum of results: Montañez et al. achieved an AUC of 90.5% using SVM on genetic profile data [20]; ensemble strategies offered 89.68% accuracy. These classical and ensemble approaches, although often accurate, frequently suffer from overfitting [21], a lack of robustness across heterogeneous datasets, and limitations in capturing the longitudinal patterns inherent in health data [22].
Deep learning approaches have aimed to remedy these gaps by modeling temporal and behavioral dimensions [23]. For example, a multi-layer perceptron trained on individual and behavioral variables—such as age, physical activity, and dietary habits—across normal weight (NW), overweight (OW), and obese (OB) categories achieved 75.8% overall accuracy, with class-specific true positive rates of 90.3% for NW, 34.2% for OW, and 66.7% for OB [24]. While this model performed well in identifying normal-weight individuals, its severe drop in sensitivity for overweight subjects indicates a major limitation in early-stage obesity risk detection [25], where accurate classification of overweight cases is essential to prevent disease progression. Similarly, recurrent neural network (RNN) architectures integrating electronic health record (EHR) data with wearable sensor activity logs achieved between 77% and 86% accuracy in predicting whether an individual’s obesity status would improve [26,27]. However, these RNN-based systems struggled to maintain accuracy when time intervals between records were irregular, and their black-box nature reduced transparency in identifying the behavioral or clinical drivers of obesity change, making them less suitable for settings that demand interpretable predictions.
Long short-term memory (LSTM) networks are a specialized type of recurrent neural network (RNN) designed to address the vanishing gradient problem, which hampers the learning of long-term dependencies in traditional RNNs [28]. LSTM models achieve this by incorporating memory cells and gating mechanisms (input, forget, and output gates) that regulate the flow of information, allowing the network to retain and utilize information over extended sequences. LSTM-based models have also shown strong performance in various obesity-related tasks. For instance, a hybrid deep learning framework integrating LSTM achieved an accuracy of 91.31% in obesity identification using smartphone inertial measurements [29]. Additionally, an attention-based Bi-LSTM model achieved an accuracy of 92.5% in predicting obesity levels [30]. Another study trained LSTM networks over five clinical visits and achieved a mean absolute error (MAE) of 0.98 and an $R^2$ of 0.72 in predicting future BMI trajectories while identifying 24 key predictive features [31]. Despite these promising results, the approach required substantial computational resources and was prone to overfitting when trained on small pediatric cohorts, limiting its scalability and generalization to broader, more heterogeneous populations. Related work further reports that a CNN alone suffered from reduced accuracy on unseen sequences due to its limited temporal modeling capacity, while an LSTM alone lacked spatial feature extraction [32].
The recently introduced Mamba architecture is a lightweight hybrid attention-based architecture optimized for modeling long-range dependencies efficiently. By combining multi-head attention with channel-wise independence and tokenization strategies, Mamba can capture complex interactions among diverse feature types while maintaining computational efficiency. Its design complements recurrent networks like Bi-LSTM, enhancing sequence representation and predictive performance in multifactorial classification tasks. This selective state-space model, designed for efficient long-sequence modeling, has yet to be tested in obesity prediction tasks, although it shows promising efficiency: up to five-fold faster throughput than transformers in language or sequence domains [33]. Still, its performance may lag behind modern transformers on some benchmarks, and its compressed input representation might obscure critical sequence details, potentially limiting in-context learning and fidelity needed for precise health risk modeling [34].
Recent studies have begun to explore state-space and structured transformer models for medical and tabular data analysis, demonstrating their potential in capturing complex feature dynamics efficiently. The S4 model was employed on the MIMIC-III dataset for in-hospital mortality prediction and length-of-stay estimation, achieving AUPRC values of 0.82 and 0.78, respectively, outperforming the standard LSTM and transformer baselines by 3–5% while maintaining lower computational overhead [35]. In another study, FT-Transformer was applied to the UK Biobank and Parkinson’s Progression Markers Initiative (PPMI) datasets for disease classification and risk stratification, attaining an AUC of 0.91 on 10-year cardiovascular risk prediction and 0.87 on Parkinson’s progression forecasting, surpassing the traditional MLP and GBDT models by 4–6% in AUC score [36]. More recently, a Mamba-based architecture was tested on electrocardiogram (ECG) time-series data from the PTB-XL dataset, where it achieved a macro-F1 of 0.79 for multi-label arrhythmia classification, matching the transformer performance with 4.2× faster inference and reduced memory footprint [37]. These results highlight the growing interest in efficient sequence modeling for healthcare applications, although obesity-related risk prediction using such architectures remains unexplored.
Traditional ML approaches like decision trees, SVM, and ensemble methods achieve respectable accuracies (78–96%) in obesity classification depending on data type and form but often struggle with imbalance, overfitting, and lack of longitudinal modeling. Deep learning approaches improve temporal modeling and, in some cases, approach near-perfect accuracy, but they frequently suffer from interpretability issues, limited generalization, and data demands. Mamba offers an efficient and scalable modeling path, yet its suitability for healthcare fidelity tasks remains untested. Consequently, our proposed hybrid framework integrates an improved Mamba model (with feature tokenization and channel-independent processing to mitigate overfitting and enhance representation) and an attention-enhanced Bi-LSTM (ABi-LSTM) for capturing temporal dependencies and interpretability. It aims to unite efficiency, robustness, and sensitivity [38], overcoming the limitations observed across the existing studies and advancing personalized obesity classification.

3. Materials and Methods

In this study, we propose a hybrid machine learning framework that effectively integrates multiple methodologies to tackle both regression and classification problems associated with obesity prediction. Our framework enhances traditional modeling approaches by incorporating an attention-enhanced Bi-LSTM model for obesity classification and an improved hybrid Mamba-based architecture to refine feature extraction and sequence modeling.
The novelty of our approach lies in the synergistic combination of techniques that individually offer strengths in handling complex multivariate obesity-related data. While existing works have leveraged regression and deep learning models separately, our approach integrates a lightweight yet efficient attention mechanism with an optimized tokenization and channel-independent processing strategy, ensuring improved scalability, robustness, and interpretability. The full methodological framework is illustrated in Figure 2.
The proposed framework employs a two-pronged approach to address both the regression and classification aspects of the obesity dataset. First, an Exploratory Data Analysis (EDA) stage is carried out to gain a deeper understanding of the data, uncover relationships between variables, and prepare the dataset for subsequent modeling.
For this task, a multiple Linear Regression (LR) model is utilized to tackle the regression problem, aiming to predict continuous variables such as body measurements or other quantitative obesity-related factors. To address the classification of obesity categories, the framework leverages a novel attention-based Bi-LSTM model.

3.1. Data Preprocessing

To ensure the integrity and usefulness of the datasets for subsequent analysis, rigorous data preparation techniques were applied following established best practices [39]. The preprocessing workflow is illustrated in Algorithm 1 and further summarized in Figure 3.
Algorithm 1: Data Preprocessing Procedure.
Input: Raw dataset $D = \{x_1, x_2, \ldots, x_n\}$
Output: Preprocessed dataset $\hat{D}$
Step 1: Standardize column names
Step 2: For each column $c_i \in D$
  • If $c_i$ is numerical:
    1.1. Handle missing values:
    • If missing rate $< 5\%$, impute with mean.
    • Else, impute with median.
    1.2. Detect outliers:
    • Compute $z = (X - \mu) / \sigma$.
    • If $|z| > 3$, winsorize or truncate to threshold.
  • If $c_i$ is categorical:
    2.1. Apply one-hot encoding.
    2.2. Merge rare categories (frequency $< 1\%$) into Other.
Step 3: Harmonize datasets (if multiple sources)
  • Align common variables (e.g., demographics, BMI, height, and weight).
  • Standardize units and category labels.
  • Remove conflicting or duplicate records.
Return $\hat{D}$    // Final cleaned and harmonized dataset
The analysis utilized two publicly available datasets: (i) the Kaggle “Estimation of Obesity Levels Based On Eating Habits and Physical Condition” dataset, which contains 2111 individual records with 16 features, and (ii) the CDC “Nutrition, Physical Activity, & Obesity” dataset, which provides state-level data with 51,376 records and 10 features. Preprocessing included standardization, duplicate removal, handling of missing values, outlier detection, and categorical variable encoding. Where appropriate, datasets were harmonized and optionally concatenated for downstream analysis.
For each numerical column, missing values were imputed using mean or median substitution, as shown in Equation (1), while potential outliers were detected using the z-score method (Equation (2)) and handled via truncation or winsorization. Categorical variables were transformed into binary vectors using one-hot encoding, as illustrated in Equation (3).
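$$\text{MissingValueImputation}(\text{Column}) = \frac{\sum_{i=1}^{n} \text{Non-missing Value}_i}{\text{Number of Non-missing Values}} \tag{1}$$
$$z = \frac{X - \mu}{\sigma} \tag{2}$$
$$\text{OneHotEncodedVector} = [0, 0, \ldots, 1, 0, \ldots, 0] \tag{3}$$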
A visual summary of the preprocessing workflow is provided in Figure 3, showing the number of observations and features at each stage, along with additions and removals performed to ensure data quality. Table 1 provides a detailed numerical summary of preprocessing outcomes for both datasets.
This preprocessing ensures uniformity, removes inconsistencies, and prepares the data for robust and reproducible analysis in downstream modeling tasks.
After preprocessing individual datasets as described, the Kaggle and CDC datasets were harmonized to allow integration for downstream analysis. Common variables, such as demographic features (age, gender, and race/ethnicity) and anthropometric measurements (BMI, height, and weight), were aligned by standardizing units and categories. Conflicting or duplicate records were identified and removed; for example, entries with missing critical fields (BMI or gender) exceeding a threshold of 5% were excluded. Where variables existed in only one dataset, they were retained with appropriate handling of missing values for the other dataset. After harmonization, the combined dataset provided a coherent structure suitable for subsequent tokenization, feature extraction, and classification tasks.
For reproducibility, the preprocessing parameters were set as follows. Numerical columns with less than 5% missing values were imputed using the mean; columns with higher missingness were imputed using the median to reduce bias. Outliers were detected using a z-score threshold of $|z| > 3$, and affected values were truncated to the threshold values (winsorization), resulting in approximately 2.8% of numerical records being modified. Categorical variables were transformed using one-hot encoding, with categories occurring in fewer than 1% of observations grouped into an “Other” category to reduce sparsity and improve model stability.
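The per-column rules of Algorithm 1 can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not the authors' implementation: the function names (`impute_numeric`, `winsorize_z`, `one_hot_with_other`) are hypothetical, missing values are assumed to be represented as `None`, the population standard deviation is assumed for the z-score, and rare categories are merged before encoding (the reverse of the order listed in Algorithm 1, which yields the same final vectors).

```python
import statistics
from collections import Counter


def impute_numeric(values, mean_cutoff=0.05):
    """Fill None entries with the mean if the missing rate is below
    mean_cutoff, otherwise with the median (Algorithm 1, step 1.1)."""
    present = [v for v in values if v is not None]
    missing_rate = 1 - len(present) / len(values)
    if missing_rate < mean_cutoff:
        fill = statistics.mean(present)
    else:
        fill = statistics.median(present)
    return [fill if v is None else v for v in values]


def winsorize_z(values, z_max=3.0):
    """Clip values with |z| > z_max to the threshold (step 1.2).
    Assumption: population standard deviation is used for sigma."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return list(values)
    low, high = mu - z_max * sigma, mu + z_max * sigma
    return [min(max(v, low), high) for v in values]


def one_hot_with_other(labels, min_freq=0.01):
    """Merge categories rarer than min_freq into 'Other', then one-hot
    encode (steps 2.1 and 2.2, applied in merged-first order)."""
    counts = Counter(labels)
    n = len(labels)
    merged = [lab if counts[lab] / n >= min_freq else "Other" for lab in labels]
    categories = sorted(set(merged))
    vectors = [[1 if lab == cat else 0 for cat in categories] for lab in merged]
    return categories, vectors
```

Applied column by column, these three helpers cover steps 1.1, 1.2, 2.1, and 2.2; the harmonization of step 3 depends on dataset-specific column mappings and is not sketched here.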

3.2. Proposed Framework for Obesity Classification

In this study, we introduce a novel hybrid deep learning framework for multiclass obesity classification, combining the strengths of attention-enhanced bidirectional long short-term memory (ABi-LSTM) and a modified Mamba architecture—termed FTMamba. Our approach is specifically designed to handle the heterogeneous multi-modal nature of obesity-related data, which includes numerical (e.g., BMI, age, and physical activity duration), categorical (e.g., gender, family history, and dietary preferences), and sequential behavioral patterns (e.g., eating habits over time). The proposed architecture, illustrated in Figure 4, leverages sequential modeling, attention-based interpretability, and efficient long-range dependency capture through structured state-space modeling.
The input data is represented as a sequence of feature vectors, shown in Equation (4).
$$X = [x_1, x_2, \ldots, x_T] \in \mathbb{R}^{T \times d} \tag{4}$$
where $T$ denotes the sequence length (e.g., number of time steps or ordered features), $d$ is the feature dimension, and $x_t \in \mathbb{R}^{d}$ is the feature vector at time step $t$. This formulation allows the model to treat static tabular features as a pseudo-temporal sequence, enabling sequential models to extract complex inter-feature dependencies.

Attention-Enhanced Bi-LSTM (ABi-LSTM)

To capture bidirectional temporal dynamics in obesity-related factors, we employ a multi-layer Bi-LSTM network. At each time step $t$, forward and backward hidden states are computed as in Equation (5).
$$\overrightarrow{h}_t = \mathrm{LSTM}_{\text{forward}}(x_t, \overrightarrow{h}_{t-1}), \qquad \overleftarrow{h}_t = \mathrm{LSTM}_{\text{backward}}(x_t, \overleftarrow{h}_{t+1}) \tag{5}$$
The concatenated hidden state at time $t$ can then be formulated as Equation (6).
$$H_1 = [\overrightarrow{h}_t ; \overleftarrow{h}_t], \quad t = 1, \ldots, T \tag{6}$$
Multiple Bi-LSTM layers are stacked to capture hierarchical temporal patterns, with non-linear transformations applied via hyperbolic tangent activation, as expressed in Equations (7) and (8).
$$H_2 = \tanh(W_2 H_1 + b_2) \tag{7}$$
$$H_3 = \tanh(W_3 H_2 + b_3) \tag{8}$$
To enhance model interpretability and focus on salient features, we integrate a soft attention mechanism. The alignment score $e_t$ between hidden state $h_t$ and a context vector $s$ (e.g., last hidden state or learnable parameter) is computed as Equation (9).
$$e_t = v^{\top} \tanh(W_a h_t + U_a s) \tag{9}$$
where $W_a$, $U_a$, and $v$ are trainable parameters. Attention weights $\alpha_t$ are normalized using softmax as expressed in Equation (10).
$$\alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{T} \exp(e_k)} \tag{10}$$
The final context vector $c$ is a weighted sum of hidden states as given in Equation (11).
$$c = \sum_{t=1}^{T} \alpha_t h_t \tag{11}$$
This context vector is passed through a fully connected layer to produce the final prediction, as shown in Equation (12).
$$y = \sigma(W_o c + b_o) \tag{12}$$
where $\sigma(\cdot)$ is the sigmoid (or softmax for multiclass) activation.
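The attention pooling step (alignment scores, softmax normalization, and weighted sum of hidden states) can be illustrated with a small standard-library sketch. It assumes the hidden states have already been produced by the Bi-LSTM layers and are passed in as plain lists; the function names and the toy parameter values are hypothetical and chosen only for illustration, not taken from the trained model.

```python
import math


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, s, W_a, U_a, v):
    """Additive attention: e_t = v^T tanh(W_a h_t + U_a s),
    alpha = softmax(e), c = sum_t alpha_t * h_t."""
    projected_s = matvec(U_a, s)
    scores = []
    for h in hidden_states:
        projected_h = matvec(W_a, h)
        scores.append(
            sum(vi * math.tanh(a + b) for vi, a, b in zip(v, projected_h, projected_s))
        )
    alpha = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(alpha[t] * hidden_states[t][j] for t in range(len(hidden_states)))
        for j in range(dim)
    ]
    return alpha, context
```

The returned weights sum to one, so they can be read as the relative contribution of each time step (or ordered feature) to the prediction, which is the basis of the interpretability claim.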
Hyperparameter optimization was conducted via grid search over key configurations (hidden dimensions: 32, 64, 128; layers: 2–4; dropout: 0.2–0.6; learning rate: 0.001–0.0001; attention dimension: 32, 64, 128), with five-fold cross-validation used to select the optimal set, summarized in Table 2.

3.3. Hybrid Improved Mamba Architecture

While ABi-LSTM excels in capturing sequential patterns, its quadratic complexity in attention and limited scalability with long sequences motivate the exploration of more efficient alternatives. To this end, we propose FTMamba, a novel hybrid architecture that integrates feature tokenization (FT), channel independence, and the Mamba selective state-space model to efficiently process heterogeneous obesity data. A detailed overview of the method is shown in Figure 5.

3.3.1. Feature Tokenization and Embedding

We first tokenize each input feature into a dense embedding. For numerical features $x_j$, a linear projection maps them into a $d$-dimensional space, given in Equation (13).
$$T_j^{\text{num}} = x_j \cdot w_j + b_j, \quad w_j, b_j \in \mathbb{R}^{d} \tag{13}$$
For categorical features encoded as one-hot vectors $e_{x_j} \in \mathbb{R}^{K_j}$, a lookup table embedding is used, as shown in Equation (14).
$$T_j^{\text{cat}} = e_{x_j}^{\top} W_j + b_j, \quad W_j \in \mathbb{R}^{K_j \times d} \tag{14}$$
The unified tokenization process is formalized as Equation (15).
$$T_j = \begin{cases} x_j \cdot w_j + b_j & \text{if } x_j \text{ is numerical} \\ e_{x_j}^{\top} W_j + b_j & \text{if } x_j \text{ is categorical} \end{cases} \in \mathbb{R}^{d} \tag{15}$$
A key innovation is the strategic placement of the classification token [CLS]. Unlike standard transformers that prepend [CLS], we append it at the end of the token sequence, expressed in Equation (16).
$$T_{[\mathrm{CLS}]} = w_{\text{cls}} + b_{\text{cls}}, \quad w_{\text{cls}}, b_{\text{cls}} \in \mathbb{R}^{d} \tag{16}$$
Appending the token defined in Equation (16) ensures that [CLS] is updated only after all feature tokens have been processed sequentially by Mamba, enabling it to aggregate global context more effectively. The full input sequence is then given by Equation (17).
$$T = [T_1, T_2, \ldots, T_N, T_{[\mathrm{CLS}]}] \in \mathbb{R}^{(N+1) \times d} \tag{17}$$
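A minimal sketch of the feature tokenizer follows, using standard-library Python and randomly initialized parameters in place of learned ones. The names `build_tokenizer` and `tokenize` are hypothetical. Two simplifications are assumed: the one-hot product is implemented as a row lookup in the embedding table (which is equivalent), and the [CLS] token is stored as a single vector standing in for the sum of its weight and bias.

```python
import random


def build_tokenizer(num_features, cat_cardinalities, d, seed=0):
    """Create randomly initialized tokenizer parameters (stand-ins for
    learned weights): per-feature w_j and b_j, embedding tables, and [CLS]."""
    rng = random.Random(seed)

    def rand_vec():
        return [rng.uniform(-0.1, 0.1) for _ in range(d)]

    return {
        "num_w": [rand_vec() for _ in range(num_features)],
        "num_b": [rand_vec() for _ in range(num_features)],
        "cat_emb": [[rand_vec() for _ in range(k)] for k in cat_cardinalities],
        "cat_b": [rand_vec() for _ in cat_cardinalities],
        "cls": rand_vec(),  # assumption: w_cls + b_cls folded into one vector
    }


def tokenize(params, x_num, x_cat):
    """Map one sample to N + 1 tokens of dimension d, with [CLS] appended last."""
    tokens = []
    # Numerical features: scalar times weight vector plus bias.
    for x, w, b in zip(x_num, params["num_w"], params["num_b"]):
        tokens.append([x * wi + bi for wi, bi in zip(w, b)])
    # Categorical features: embedding row lookup (one-hot product) plus bias.
    for index, table, b in zip(x_cat, params["cat_emb"], params["cat_b"]):
        tokens.append([ei + bi for ei, bi in zip(table[index], b)])
    # The classification token goes at the end of the sequence.
    tokens.append(list(params["cls"]))
    return tokens
```

For a sample with two numerical and two categorical features, `tokenize` returns five tokens, the last of which is the [CLS] vector.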

3.3.2. Channel-Independent Mamba Processing

To prevent overfitting and enhance feature-specific learning, we adopt a channel independence strategy [40]. Instead of processing all features jointly, the batch is reshaped so that each feature is treated as a separate sequence (Equation (18)).
$$\mathrm{Batch}(X_I) \in \mathbb{R}^{(B \times N,\, 1,\, d)} \tag{18}$$
Each feature channel is independently processed through the Mamba block, which consists of a selective SSM layer with input-dependent parameters, a 1D convolutional layer (kernel size 3) for local pattern extraction, a GELU activation function, and a linear projection. The core of Mamba is the discretized state-space model (SSM), defined in Equation (19).
$$h_t = \bar{A} h_{t-1} + \bar{B} x_t \tag{19}$$
where $\bar{A} = e^{\Delta A}$, $\bar{B} = (\Delta A)^{-1}(e^{\Delta A} - I)\,\Delta B$, and $\Delta$ is the step size. Crucially, Mamba makes $B$, $\Delta$, and $C$ input-dependent, as given in Equation (20).
$$B_t = f_B(x_t), \quad \Delta_t = f_{\Delta}(x_t), \quad C_t = f_C(x_t) \tag{20}$$
This selective mechanism allows the model to dynamically focus on relevant inputs, improving long-range dependency modeling. After Mamba processing, the outputs are reshaped back to the original batch format, as shown in Equation (21).
$$\mathrm{Batch}(\hat{X}) \in \mathbb{R}^{(B,\, S,\, N)} \tag{21}$$
where $S$ is the output sequence length. The final [CLS] token is extracted and passed through a classification head, as shown in Equation (22).
$$y_{\text{pred}} = \mathrm{Softmax}(W_{\text{out}} h_{[\mathrm{CLS}]} + b_{\text{out}}) \tag{22}$$
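The selective recurrence can be made concrete with a scalar-state sketch in standard-library Python. Everything below is a simplifying assumption for illustration: the state is one-dimensional, the input-dependent maps are single hypothetical weights (`w_delta`, `w_B`, `w_C`), a softplus keeps the step size positive, and the readout $y_t = C_t h_t$ is the usual state-space output, which the text above does not write out. The convolution, GELU, and projection layers of the full block are omitted.

```python
import math


def softplus(z):
    """Smooth positive map used here to keep the step size above zero."""
    return math.log1p(math.exp(z))


def selective_scan(xs, A=-1.0, w_delta=1.0, w_B=1.0, w_C=1.0):
    """Scalar selective SSM: input-dependent Delta_t, B_t, C_t with
    zero-order-hold discretization. Returns (outputs, hidden states)."""
    h = 0.0
    outputs, states = [], []
    for x in xs:
        delta = softplus(w_delta * x)  # Delta_t = f_Delta(x_t), always > 0
        B = w_B * x                    # B_t = f_B(x_t)
        C = w_C * x                    # C_t = f_C(x_t)
        A_bar = math.exp(delta * A)    # discretized state transition
        B_bar = (A_bar - 1.0) / (delta * A) * (delta * B)
        h = A_bar * h + B_bar * x      # state update, Equation (19)
        states.append(h)
        outputs.append(C * h)          # assumed readout y_t = C_t * h_t
    return outputs, states
```

With a negative `A`, the state decays between inputs, and the size of the decay depends on the current input through the step size, which is what lets the model decide how much past context to keep.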
The proposed framework introduces several novel contributions to obesity classification. First, it is among the earliest approaches to apply the Mamba architecture in this domain, utilizing its linear-time complexity and selective state updates to efficiently capture long-range interactions in health-related data. Additionally, an appended [CLS] token is strategically placed at the end of the sequence, ensuring that it aggregates information only after the full sequence has been processed, thereby improving the fidelity of global representations. The framework also incorporates channel-independent token processing, which mitigates feature interference during Mamba computation, reduces overfitting, and enhances generalization, an especially important consideration for small medical datasets. Furthermore, the architecture integrates ABi-LSTM and FTMamba in a complementary manner: ABi-LSTM is tasked with modeling temporal patterns in behavioral sequences, such as daily habits, while FTMamba efficiently processes static tabular features. These two models are trained independently, and their predictions are subsequently fused through weighted averaging or stacking in the final decision layer, enabling the combined strengths of both approaches to be effectively leveraged.
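The weighted-averaging variant of the fusion step can be sketched as follows. The function name `fuse_probabilities` and the default weight of 0.5 are assumptions for illustration; the fusion weight actually used is not stated in this section, and the stacking variant is not shown.

```python
def fuse_probabilities(p_lstm, p_mamba, weight=0.5):
    """Late fusion by weighted averaging of two class-probability vectors.
    `weight` is the share given to the ABi-LSTM branch (hypothetical default)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(p_lstm) != len(p_mamba):
        raise ValueError("both models must predict the same classes")
    fused = [weight * a + (1.0 - weight) * b for a, b in zip(p_lstm, p_mamba)]
    # Predicted class is the index of the largest fused probability.
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted
```

Because both inputs are probability vectors and the weights sum to one, the fused vector is again a valid probability distribution over the obesity classes.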
This dual-path architecture enables our model to simultaneously capture the temporal dynamics (via ABi-LSTM), global feature interactions (via Mamba), interpretable attention weights (via ABi-LSTM attention), and computational efficiency (via Mamba’s linear complexity). Both models are tailored to the characteristics of the Obesity and CDC datasets, which contain mixed data types and subtle non-linear relationships between lifestyle factors and obesity levels. By combining enhanced tokenization, selective state-space modeling, and attention mechanisms, our framework achieves both high accuracy and clinical interpretability—key requirements for deployment in real-world health risk assessment.
On the other hand, Mamba provides a computationally efficient alternative to traditional deep learning models, especially for tabular datasets with a mix of categorical and numerical variables [41,42]. By replacing transformer layers with structured state-space modeling (SSM) and incorporating a novel feature tokenizer, Mamba efficiently encodes obesity-related attributes without the quadratic complexity of transformers. The channel independence mechanism further ensures that each feature is processed separately, reducing the risk of overfitting and improving generalization across diverse patient populations.
In our study, the choice of preprocessing steps and hyperparameter settings was carefully guided to optimize the model’s performance for obesity classification and ensure it effectively handles the dataset. For preprocessing, numerical features were standardized to maintain consistent scaling and prevent disproportionate influence from specific variables, while categorical features were tokenized to enable the model to process them efficiently, preserving key relationships between categories. The input dimension was set to 16 to match the number of features in the dataset, ensuring the model could process all available data. The hidden dimension of 64 was selected to provide enough capacity for learning complex patterns while avoiding overfitting, offering a balance between model complexity and generalization. We chose 3 layers to capture multiple levels of abstract features from the data, which is especially effective for sequential learning tasks like obesity classification. To prevent overfitting, a dropout rate of 0.4 was implemented, randomly dropping units during training to prevent the model from relying on specific features. The bidirectional setting was adopted to enable the model to learn from both past and future data points, enhancing its ability to understand time-dependent patterns. For the attention mechanism, soft attention with an attention dimension of 64 was used to allow the model to focus on the most relevant features, improving predictive accuracy. The activation function was chosen as Tanh to map outputs to a bounded range, preventing extreme values during training. Lastly, a learning rate of 0.001 was set to ensure gradual convergence towards the optimal solution without overshooting, optimizing the training process. These choices were informed by empirical testing and previous research to ensure the model’s ability to generalize and achieve strong performance on obesity classification tasks.

4. Evaluation Metrics

To comprehensively evaluate the performance of both classification and regression tasks, we employed a set of widely used statistical metrics. Each metric is defined formally below, with references to their corresponding equations.

4.1. Classification Metrics

4.1.1. Accuracy

Accuracy measures the overall correctness of the classifier by calculating the proportion of correctly predicted cases relative to the total number of samples, as defined in Equation (23).
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{23}$$

4.1.2. Precision (Positive Predictive Value, PPV)

Precision quantifies the reliability of positive predictions, i.e., the fraction of predicted positive cases that are truly positive (Equation (24)).
$$\text{Precision} = \frac{TP}{TP + FP} \tag{24}$$

4.1.3. Recall (Sensitivity)

Recall evaluates the model’s ability to correctly identify actual positive cases, as shown in Equation (25).
$$\text{Recall} = \frac{TP}{TP + FN} \tag{25}$$

4.1.4. Specificity

Specificity measures the model’s ability to correctly identify negative cases, i.e., the proportion of true negatives among all actual negatives (Equation (26)).
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{26}$$

4.1.5. Negative Predictive Value (NPV)

NPV evaluates the reliability of negative predictions, i.e., the proportion of predicted negatives that are truly negative, as given in Equation (27).
$$\text{NPV} = \frac{TN}{TN + FN} \tag{27}$$

4.1.6. F1-Score

The F1-score balances precision and recall by computing their harmonic mean, offering a single measure that accounts for both false positives and false negatives (Equation (28)).
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{28}$$
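As a worked illustration of Equations (23) to (28), the sketch below computes all six metrics from the four confusion-matrix counts. The function name `binary_metrics` and the example counts are hypothetical; returning 0.0 for an empty denominator is a convention chosen for this sketch, not something specified in the paper.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute Equations (23)-(28) from confusion-matrix counts."""
    def ratio(num, den):
        # Convention for this sketch: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * precision * recall, precision + recall),
    }

# Hypothetical counts for a 100-sample binary problem.
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these counts, accuracy is 85/100, recall is 40/50, and specificity is 45/50, which shows how a classifier can be more reliable on negatives than on positives even when overall accuracy looks high.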

4.1.7. Receiver Operating Characteristic (ROC) Curve and AUC

The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) at varying thresholds. The Area Under the Curve (AUC), defined in Equation (29), summarizes this performance as a single value between 0 and 1.
$$\text{AUC} = \int_{0}^{1} TPR(FPR)\, d(FPR) \tag{29}$$
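In practice the integral in Equation (29) is evaluated numerically from a finite set of threshold points. The following sketch uses the trapezoidal rule, which is one common choice; the paper does not state which numerical scheme was used. The function name `auc_trapezoid` and the ROC points are hypothetical.

```python
def auc_trapezoid(fpr, tpr):
    """Approximate the area under an ROC curve with the trapezoidal rule."""
    points = sorted(zip(fpr, tpr))  # integrate along increasing FPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical ROC points, from the strictest to the loosest threshold.
fpr = [0.0, 0.1, 0.4, 1.0]
tpr = [0.0, 0.6, 0.9, 1.0]
auc = auc_trapezoid(fpr, tpr)
```

A random classifier traces the diagonal and yields an area of 0.5, while a perfect classifier yields 1.0.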

4.2. Regression Analysis Metrics

4.2.1. Regression Coefficients

Regression coefficients quantify the direction and magnitude of influence of each independent variable on the dependent outcome. Their significance is further validated by the p-values.

4.2.2. p-Values

The p-value assesses the statistical significance of regression coefficients, with predictors considered significant at $p < 0.05$.

4.2.3. Coefficient of Determination ($R^2$)

The coefficient of determination ($R^2$), defined in Equation (30), indicates the proportion of variance in the dependent variable explained by the regression model.
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{30}$$
Here, $y_i$ represents the observed values, $\hat{y}_i$ the predicted values, and $\bar{y}$ the mean of the observed data.
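Equation (30) translates directly into a few lines of code. The sketch below is illustrative only: the function name `r_squared` and the sample values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot, as in Equation (30)."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted)
```

Here the residual sum of squares is 0.10 and the total sum of squares is 5.0, so the model explains about 98% of the variance. Predicting the mean for every sample gives a value of 0.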

5. Results

All experiments were run on a Windows system with an Intel Core(TM) i9-10900K CPU running at 5.3 GHz, 64 GB of DDR4 RAM, and 2 TB of NVMe SSD storage, together with an NVIDIA RTX 3090 graphics processing unit (GPU) with 24 GB of memory. The implementation relied on the PyTorch backend and Keras. The hyperparameters were tuned to balance classification performance and stable model convergence. Model performance was evaluated using a broad range of metrics, including accuracy, precision, recall, and F1-score. Extensive tests were carried out using train–validation–test splits to support the validity and applicability of the results.

5.1. Dataset

This study used two datasets: the CDC Data: Nutrition, Physical Activity, & Obesity dataset from Kaggle and the Estimation of Obesity Levels Based On Eating Habits and Physical Condition dataset from the UC Irvine Machine Learning Repository [11]. With 4901 observations, the CDC dataset provides statistics on the prevalence of obesity across various weight categories and behavioral characteristics. The 2111 observations in the UC Irvine dataset provide information on obesity levels based on dietary patterns and physical attributes. The primary attributes of the research sample, encompassing demographic, anthropometric, and physical activity variables, are presented in Table 3.
Table 3 serves to provide a detailed demographic and behavioral breakdown of the study population across different BMI categories: Normal, Overweight, and Obese. This descriptive summary includes key variables such as gender, age, race, education level, marital status, income (Family PIR), physical activity levels, and intensity-based duration of daily activity. The table allows for an initial comparative assessment of how obesity prevalence correlates with various social, economic, and behavioral factors. For instance, it reveals trends such as higher proportions of females in the obese category, a lower percentage of college graduates among obese individuals, and a noticeable decline in vigorous physical activity with increasing BMI. By presenting this comprehensive overview, the table helps to contextualize the datasets used in the study and supports the rationale for further analysis of obesity-related factors.
The CDC Data: Nutrition, Physical Activity, & Obesity dataset (4901 observations) is sourced from the Centers for Disease Control and Prevention (CDC) and offers nationally representative data on nutrition, physical activity, and obesity trends across the United States. This dataset covers a broad demographic range, including individuals of various racial and ethnic backgrounds, such as non-Hispanic White, non-Hispanic Black, Mexican American, other Hispanic, and multiracial groups. The participants’ ages range from 14 to 61 years, and the dataset includes socioeconomic factors such as income, education, and family poverty-to-income ratio (PIR). Together, these features provide a comprehensive view of the national landscape of obesity, physical activity, and nutrition. Additionally, the data is geographically representative, with state-level and regional statistics that allow for an analysis of trends and disparities in obesity prevalence across different parts of the United States. The dataset’s rich variety of features, including behavioral data on physical activity and sedentary behavior, makes it well-suited for public health research and policy evaluation, offering insights into both individual and population-level obesity factors.
The Obesity dataset (2111 observations), collected for estimating obesity levels based on eating habits and physical conditions, focuses more on individual-level factors associated with obesity. The dataset includes detailed information on dietary habits, such as the frequency of vegetable and fruit consumption, water intake, and meal patterns, as well as physical activity data, including the frequency and intensity of exercise and sedentary behavior. It also includes anthropometric data (height, weight, and BMI) and psychological factors, such as stress levels and emotional eating. While the dataset does not provide explicit geographic diversity, it captures a wide array of individual characteristics that contribute to obesity, making it particularly useful for predictive modeling and classification tasks. The dataset’s detailed focus on personal behaviors and conditions allows for an in-depth understanding of the individual-level drivers of obesity, and its relatively smaller size compared to the CDC dataset makes it a valuable complement for personal health recommendations or personalized intervention strategies.
The combined dataset was split into training, validation, and test sets using a 70%/15%/15% ratio. Stratified sampling based on BMI categories was employed to ensure that the class distribution remained consistent across all subsets. For model evaluation, 5-fold cross-validation was performed on the training set, and the average performance metrics across folds were reported. This procedure ensured robust estimation of model performance while mitigating potential bias from class imbalance or random sampling variation.
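A stratified 70%/15%/15% split of the kind described above can be sketched with the standard library alone. This is an illustrative sketch, not the authors' code: the function name `stratified_split`, the fixed seed, the rounding rule, and the toy class counts are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class before slicing
        n_train = round(ratios[0] * len(indices))
        n_val = round(ratios[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Toy label vector with an imbalanced class distribution (hypothetical counts).
labels = ["Normal"] * 100 + ["Overweight"] * 60 + ["Obese"] * 40
train_idx, val_idx, test_idx = stratified_split(labels)
```

Because the slicing happens inside each class, every subset keeps the same 5:3:2 class ratio as the full label vector. The 5-fold cross-validation would then be applied to `train_idx` only.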

5.2. Exploratory Data Analysis

Individuals are categorized into different weight categories using the body mass index (BMI) [43], as shown in Table 4.
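For illustration, BMI-based categorization can be sketched as follows. The thresholds used here are the standard WHO adult cut-offs, which is an assumption of this sketch; the exact boundaries and labels in Table 4 may differ. The category names follow the six classes reported in Section 5.4, and the function names `bmi` and `bmi_category` are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

def bmi_category(value):
    """Map a BMI value to a weight category (WHO adult cut-offs, assumed)."""
    if value < 18.5:
        return "Underweight"
    if value < 25.0:
        return "Normal weight"
    if value < 30.0:
        return "Overweight"
    if value < 35.0:
        return "Obesity Type I"
    if value < 40.0:
        return "Obesity Type II"
    return "Obesity Type III"

# Hypothetical individual: 70 kg and 1.75 m.
example_bmi = bmi(70.0, 1.75)
example_category = bmi_category(example_bmi)
```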
Figure 6 shows the distribution of individuals across the obesity states, grouped by how each person answered a binary (Yes/No) question. According to the data, the “No” group has a higher count than the “Yes” group for the majority of obesity states, with the exception of Overweight level 1, where the numbers are quite close. Overweight level 1 has the highest total count, with 342 individuals in the “No” category and 346 in the “Yes” category. Next is Overweight level 2, with 259 individuals in the “Yes” category and 176 in the “No” category. Obesity level 1 displays a roughly equal distribution between the two categories, with 210 individuals in the “No” category and 211 in the “Yes” category. Obesity level 2 has 194 individuals in the “No” category and 164 in the “Yes” category. This information helps to characterize how people are distributed across the obesity states and to identify possible targets for focused interventions or further study of obesity prevalence in the population.
Similarly, the association between people classed as very obese and their weight does not appear to be anomalous for those who “frequently” or “always” eat between meals, as seen in Figure 7. Only those who indulge in binge eating occasionally fall into the “Overweight” or “Obesity” categories. Figure 2 shows the distribution of people in various obesity levels based on how they answered a multiple-choice question. With the largest total count, the Overweight level 1 state had a sizable proportion of respondents who said “Sometimes” (about 320) and “Frequently” (about 350). There are also many people in the Overweight level 2 state who answered “Sometimes” (about 180) and “Frequently” (roughly 260). On the other hand, the distribution of numbers for the “Sometimes,” “Frequently,” and “No” categories (around 210 each) is more uniform in the case of the Obese level 1 state.
For the Obese level 2 and Obese level 3 states, the counts are somewhat lower, and the “Sometimes” and “No” categories have the largest numbers (about 195 and 165 for Obese level 2 and roughly 175 and 90 for Obese level 3, respectively). This information sheds light on the prevalence of various obesity states in the community and may help to guide future research or focused initiatives.

5.3. Analysis of Key Obesity Factors

To identify the most influential factors contributing to obesity, we first conducted a regression analysis incorporating demographic, lifestyle, and environmental variables. Table 5 presents regression coefficients and p-values for additional key determinants, highlighting factors such as daily caloric intake, screen time, sleep duration, stress levels, and access to healthy foods. Positive coefficients indicate a higher obesity risk, while negative coefficients suggest a protective effect. For instance, higher caloric intake (0.002, p < 0.001) and increased screen time (0.075, p < 0.01) are associated with greater obesity risk, whereas better access to healthy foods (−0.10, p = 0.001) and longer sleep duration (−0.15, p = 0.05) are linked to reduced risk.
To complement regression-based insights, we applied a SHAP-inspired analysis to our obesity classifier using the hybrid ABi-LSTM + Mamba framework. Table 6 shows the relative importance of lifestyle, environmental, and psychological factors.
The SHAP analysis highlights that the most influential predictors of obesity are physical activity frequency, daily caloric intake, and screen time, aligning with the regression findings. Protective factors such as sleep duration, access to healthy foods, and body image satisfaction consistently contribute to reducing obesity risk. Less influential variables include age and socioeconomic status, although their effects remain statistically significant.
Combining regression coefficients with SHAP-based feature importance provides a robust understanding of obesity determinants. Regression analysis quantifies the magnitude and statistical significance of individual predictors, while the SHAP analysis offers a classifier-centered perspective of feature relevance. Overall, this integrated analysis confirms that lifestyle behaviors (physical activity, caloric intake, and screen time), environmental access (healthy foods), and psychological variables (body image and emotional eating) are central determinants of obesity. Such insights support targeted interventions and public health strategies aimed at modifying the most impactful factors.

5.4. Obesity Classification

We performed a classification task to categorize data into discrete obesity groups. The model was trained using a one-vs.-all (OvA) multiclass strategy, where each obesity class was treated as the positive class against all the other classes as negative. This approach allowed us to compute class-specific metrics for precision, recall, F1-score, true positive rate (TPR), false positive rate (FPR), and Area Under the Curve (AUC), providing a detailed assessment of the model’s performance across all the obesity categories. This strategy is particularly effective for datasets with unbalanced class distributions as it allows per-class evaluation while maintaining overall model generalization.
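The per-class evaluation side of the one-vs.-all strategy can be sketched as follows: each class in turn is treated as positive and all others as negative, and the resulting counts feed the binary metrics. This sketch covers evaluation only and does not reproduce how the model was trained. The function names and the toy labels are hypothetical.

```python
def one_vs_all_counts(y_true, y_pred, positive):
    """Binarize a multiclass problem: `positive` versus every other class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth == positive and pred != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def per_class_report(y_true, y_pred):
    """Precision, recall (TPR), FPR, and F1 for each class under OvA."""
    report = {}
    for cls in sorted(set(y_true)):
        tp, fp, fn, tn = one_vs_all_counts(y_true, y_pred, cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[cls] = {"precision": precision, "recall": recall,
                       "fpr": fpr, "f1": f1}
    return report

# Toy three-class example (hypothetical labels and predictions).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
report = per_class_report(y_true, y_pred)
```

In this toy example, class 1 has perfect recall but a precision of 2/3, because one class 0 sample was predicted as class 1. Reporting metrics per class exposes this kind of asymmetry, which a single overall accuracy figure would hide.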
Our major performance indicators for evaluating the model were precision, recall, and F1-score. Recall indicates the percentage of correctly predicted instances among all instances that actually belong to a class, while precision measures the proportion of correctly predicted positive instances among all predicted positives. The F1-score, computed as the harmonic mean of precision and recall, balances these two metrics, making it particularly suitable for imbalanced datasets.
Figure 8 illustrates the class-specific performance metrics (accuracy, precision, recall, and F1-score) obtained using the OvA strategy. Our results demonstrate high levels of recall, precision, and F1-score across all the obesity categories: recall ranged from 0.84 to 0.98, precision from 0.82 to 0.99, and F1-score from 0.80 to 1.00, highlighting the robustness of the model. Priority was given to F1-score during grid search and cross-validation to ensure balanced evaluation of both minority and majority classes, improving model generalization and interpretability.
We further assessed the discriminative capability of the model using Receiver Operating Characteristic (ROC) curves and corresponding Area Under the Curve (AUC) metrics. Figure 9 and Figure 10 display the TPR, FPR, and AUC values for each class. The TPR measures the rate of correctly identified positive instances, the FPR quantifies the rate of false positives, and the AUC provides a summary measure of the model’s ability to distinguish between classes:
Class-specific AUC values indicate strong model performance: Class 0 (Underweight) AUC = 0.93, Class 1 (Normal weight) AUC = 0.91, Class 2 (Overweight) AUC = 0.97, Class 3 (Obesity Type I) AUC = 0.97, Class 4 (Obesity Type II) AUC = 0.99, and Class 5 (Obesity Type III) AUC = 0.92. Both micro-average and macro-average AUC metrics were 0.95 across all classes, demonstrating stable performance.
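As a quick consistency check, the macro-average AUC is the unweighted mean of the class-specific values listed above. The short sketch below recomputes it; the variable names are hypothetical, and the micro-average cannot be recomputed this way because it requires the pooled per-sample scores.

```python
# Class-specific AUC values as reported in the text above.
class_auc = {
    "Underweight": 0.93,
    "Normal weight": 0.91,
    "Overweight": 0.97,
    "Obesity Type I": 0.97,
    "Obesity Type II": 0.99,
    "Obesity Type III": 0.92,
}

# Macro-average: every class counts equally, regardless of its sample size.
macro_auc = sum(class_auc.values()) / len(class_auc)
```

The mean is approximately 0.948, which rounds to the reported macro-average of 0.95.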
Finally, we compared our approach with several widely used machine learning and advanced models. Table 7 presents the comparison, showing that our hybrid ABi-LSTM + Mamba framework consistently outperforms traditional and state-of-the-art models in accuracy, sensitivity, specificity, PPV, and NPV.
Overall, the one-vs.-all strategy explains why metrics are reported per class, supports fair evaluation for imbalanced categories, and shows the model’s strong performance across all the obesity groups.
The overall performance of the proposed method demonstrates robust classification capability across obesity categories. As shown in Table 8, the model achieved an accuracy of 93.42% (95% CI: 91.2–95.6%), indicating that the vast majority of samples were correctly classified. Both precision and recall were 91.0% (95% CI: 88.5–93.5%), reflecting a balanced ability to correctly identify positive cases while minimizing false positives. The F1-score of 90.0% (95% CI: 87.4–92.6%) further confirms the model’s effectiveness, providing a harmonic balance between precision and recall. Overall, these results highlight the strong and reliable predictive performance of the proposed ABi-LSTM + Mamba framework for obesity classification.

5.5. Ablation Analysis

The ablation analysis presented in Table 9 highlights the impact of each component on the performance of the obesity classification model. The full model, which incorporates all the key components, achieves the highest performance, with an accuracy of 93.42%, sensitivity of 91.11%, specificity of 92.34%, positive predictive value (PPV) of 92.19%, and negative predictive value (NPV) of 93.21%.
When the attention mechanism in the Bi-LSTM is removed, the model’s performance drops notably, particularly in sensitivity and PPV, underscoring the critical role of attention in identifying important features and focusing the model’s learning on the most relevant predictors of obesity. Replacing the Bi-LSTM with a bidirectional GRU (a recurrent architecture with similar gating mechanisms) results in a moderate decrease in performance, suggesting that, while the GRU is effective at modeling temporal dependencies, the Bi-LSTM’s longer memory capacity provides a measurable advantage in capturing sequential patterns in obesity-related data. Replacing the Bi-LSTM with a feedforward neural network (FFNN) produces a more substantial drop in all the metrics because the FFNN lacks any recurrent gating mechanism and therefore cannot model temporal dependencies at all.
The exclusion of the channel independence mechanism results in a slight reduction in performance, demonstrating that allowing features to interact freely across channels can lead to overfitting, which the channel independence strategy mitigates. Replacing the Mamba architecture with a traditional transformer model leads to a decrease in performance as well, highlighting the efficiency of Mamba in modeling long-range dependencies while maintaining computational efficiency. The omission of the tokenization strategy, which separates numerical and categorical features, results in a considerable decline in performance, particularly in terms of accuracy, indicating that an effective tokenization strategy is essential for the proper representation of diverse feature types.
The removal of regularization techniques, such as dropout, also leads to a decrease in performance, particularly in sensitivity, suggesting that regularization is necessary to prevent the model from overfitting to the training data. Similarly, the absence of feature preprocessing, such as scaling, causes a significant drop in accuracy, emphasizing the importance of preprocessing steps to standardize features and prevent disproportionate influence from specific variables. Reducing the sequence length and hidden dimension also negatively affects performance, demonstrating that maintaining an adequate sequence length and model capacity is crucial for capturing complex patterns in obesity-related data.
Overall, the results of the ablation analysis provide strong evidence that each component of the model contributes to its success. In particular, the comparison between Bi-LSTM and Bi-GRU confirms that, while both recurrent architectures are effective, the Bi-LSTM approach offers a consistent edge in capturing long-range dependencies relevant to obesity classification.

6. Discussion

This study aims to present a novel approach for predicting obesity categories and identifying key determinants through the integration of advanced machine learning models. While the proposed framework demonstrates significant improvements in accuracy and classification performance, it is crucial to discuss the limitations, generalizability, and practical applications of the model.
In our study, we used two datasets—the CDC Data: Nutrition, Physical Activity, & Obesity dataset and the Obesity Levels Based on Eating Habits and Physical Condition dataset. Both datasets have relatively diverse demographic representations across gender, age, ethnicity, and socioeconomic status. However, we acknowledge that generalizability across populations with distinct demographic profiles (e.g., populations from different countries or ethnic groups) may still pose challenges. Although our framework demonstrates strong performance within the scope of the selected datasets, it is possible that its ability to generalize to new unseen populations requires further validation. Future work could involve extending this research to include more diverse and geographically varied datasets to better assess the robustness of the model across different populations.
As mentioned, the datasets used in this study include varied demographic factors such as age, gender, race, and socioeconomic status. However, biases may still exist within these data that could affect the model’s performance. For example, the overrepresentation of specific racial or ethnic categories in the dataset might lead to overfitting or skewed predictions. In particular, gender and ethnicity are often associated with differing obesity rates and patterns of physical activity. Thus, demographic bias could influence model outcomes, potentially affecting its application to underrepresented groups. Section 5.1 describes the diversity of the datasets in more detail. While we are optimistic about the model’s performance on these datasets, we suggest conducting further studies to evaluate how well the model adapts to populations that were not represented or were underrepresented in the current data.
Both datasets used in this study are benchmarked and are relatively clean, with minimal missing values or significant noise. As a result, our model did not face substantial challenges in this regard. However, we acknowledge that real-world healthcare datasets may exhibit varying levels of missing or noisy data. In our case, any missing values were handled using standard imputation techniques, such as mean or median imputation. Future iterations of the framework could benefit from incorporating more sophisticated data-cleaning and preprocessing techniques, such as outlier detection or advanced imputation algorithms (e.g., KNN imputation or multivariate imputation by chained equations) to enhance its robustness when applied to more complex noisy datasets.
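The mean or median imputation mentioned above can be sketched as follows. This is a minimal illustration that assumes missing entries are encoded as `None`; the function name `impute` and the toy column are hypothetical, and the paper does not specify which strategy was applied to which feature.

```python
from statistics import mean, median

def impute(column, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in column]

# Toy column with two missing entries (hypothetical values).
sleep_hours = [7.0, None, 6.0, 9.0, None]
filled_mean = impute(sleep_hours, "mean")
filled_median = impute(sleep_hours, "median")
```

The median is less sensitive to outliers than the mean, which makes it the safer default for skewed features. To avoid leakage, the fill value should be computed on the training set and reused for the validation and test sets.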
While the paper presents theoretical advancements in obesity prediction, we also recognize the importance of practical deployment strategies in real-world settings. Our framework has the potential to provide meaningful insights into individual- and population-level obesity trends. Healthcare professionals can leverage the model to identify at-risk individuals, offering personalized recommendations based on obesity categories and the key determinants of obesity in their cases. By integrating this framework into clinical decision support systems, healthcare providers could optimize treatment plans tailored to individual needs, improving patient outcomes.
Additionally, the policy implications of our model are significant. Policymakers could use the insights from the model to design more targeted interventions aimed at reducing obesity rates, especially in at-risk groups. For example, the framework could identify specific demographic factors that contribute to higher obesity rates in certain regions or communities, allowing policymakers to allocate resources more effectively for preventive measures.

7. Conclusions

Our analysis indicates that height and weight are critical factors in determining obesity, with these variables playing a central role in classification tasks. However, we acknowledge that obesity is a multifactorial condition, and other important factors such as genetics, lifestyle choices, eating habits, and socioeconomic status also contribute significantly to obesity outcomes. While our dataset provided useful information for classification and clustering, we recognize the limitations of using artificial samples that may not fully represent real-world scenarios. For more robust and generalizable conclusions, future research should incorporate real-world data, including a broader range of factors that influence obesity. By using larger, more diverse datasets that encompass the full spectrum of obesity-related determinants, we can improve the model’s accuracy and relevance, contributing to better understanding and intervention strategies for obesity across different populations.

Author Contributions

Data curation, C.F.; formal analysis, C.F.; funding acquisition, C.F.; investigation, C.F. and S.K.B.S.A.; methodology, C.F. and M.S.N.B.S.; project administration, M.S.N.B.S.; resources, M.S.N.B.S.; software, C.F.; supervision, M.S.N.B.S. and S.K.B.S.A.; validation, C.F. and M.S.N.B.S.; visualization, C.F.; writing—original draft, C.F.; writing—review and editing, C.F., M.S.N.B.S. and S.K.B.S.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The public datasets used in this study are described in Section 5.1.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

  1. Alfaris, N.; Alqahtani, A.M.; Alamuddin, N.; Rigas, G. Global impact of obesity. Gastroenterol. Clin. 2023, 52, 277–293.
  2. Westbury, S.; Oyebode, O.; Van Rens, T.; Barber, T.M. Obesity stigma: Causes, consequences, and potential solutions. Curr. Obes. Rep. 2023, 12, 10–23.
  3. Aggarwal, S.; Pandey, K. Early identification of PCOS with commonly known diseases: Obesity, diabetes, high blood pressure and heart disease using machine learning techniques. Expert Syst. Appl. 2023, 217, 119532.
  4. Thompson, H.J.; Lutsiv, T.; McGinley, J.N.; Hussan, H.; Playdon, M.C. Dietary Oncopharmacognosy as a Crosswalk between Precision Oncology and Precision Nutrition. Nutrients 2023, 15, 2219.
  5. Dhir, P.; Evans, T.S.; Drew, K.J.; Maynard, M.; Nobles, J.; Homer, C.; Ells, L. Views, perceptions, and experiences of type 2 diabetes or weight management programs among minoritized ethnic groups living in high-income countries: A systematic review of qualitative evidence. Obes. Rev. 2024, 25, e13708.
  6. Alkhatry, M. Understanding and Managing Obesity: A Multidisciplinary Approach. In Weight Loss—A Multidisciplinary Perspective; IntechOpen: London, UK, 2024.
  7. Rahman, M.S.; Ahmed, K.; Nafis, T.A.; Hossain, M.R.; Majumder, S. Predicting Obesity: A Comparative Analysis of Machine Learning Models Incorporating Different Features. Ph.D. Thesis, Brac University, Dhaka, Bangladesh, 2023.
  8. Ferdowsy, F.; Rahi, K.S.A.; Jabiullah, M.I.; Habib, M.T. A machine learning approach for obesity risk prediction. Curr. Res. Behav. Sci. 2021, 2, 100053.
  9. Dirik, M. Application of machine learning techniques for obesity prediction: A comparative study. J. Complex. Health Sci. 2023, 6, 16–34.
  10. Admojo, F.T.; Rismayanti, N. Estimating Obesity Levels Using Decision Trees and K-Fold Cross-Validation: A Study on Eating Habits and Physical Conditions. Indones. J. Data Sci. 2024, 5, 37–44.
  11. Gülü, M.; Yagin, F.H.; Gocer, I.; Yapici, H.; Ayyildiz, E.; Clemente, F.M.; Ardigò, L.P.; Zadeh, A.K.; Prieto-González, P.; Nobari, H. Exploring obesity, physical activity, and digital game addiction levels among adolescents: A study on machine learning-based prediction of digital game addiction. Front. Psychol. 2023, 14, 1097145.
  12. Chatterjee, A.; Gerdes, M.W.; Martinez, S.G. Identification of risk factors associated with obesity and overweight—A machine learning overview. Sensors 2020, 20, 2734.
  13. Safaei, M.; Sundararajan, E.A.; Driss, M.; Boulila, W.; Shapi’i, A. A systematic literature review on obesity: Understanding the causes & consequences of obesity and reviewing various machine learning approaches used to predict obesity. Comput. Biol. Med. 2021, 136, 104754.
  14. Palechor, F.M.; De la Hoz Manotas, A. Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from Colombia, Peru and Mexico. Data Brief 2019, 25, 104344.
  15. Shrestha, N. Application of binary logistic regression model to assess the likelihood of overweight. Am. J. Theor. Appl. Stat. 2019, 8, 18–25.
  16. Ward, Z.J.; Long, M.W.; Resch, S.C.; Gortmaker, S.L.; Cradock, A.L.; Giles, C.; Hsiao, A.; Wang, Y.C. Redrawing the US obesity landscape: Bias-corrected estimates of state-specific adult obesity prevalence. PLoS ONE 2016, 11, e0150735.
  17. Williams, E.P.; Mesidor, M.; Winters, K.; Dubbert, P.M.; Wyatt, S.B. Overweight and obesity: Prevalence, consequences, and causes of a growing public health problem. Curr. Obes. Rep. 2015, 4, 363–370.
  18. World Health Organization. The Challenge of Obesity in the WHO European Region and the Strategies for Response; World Health Organization, Regional Office for Europe: Copenhagen, Denmark, 2007.
  19. LeCroy, M.N.; Kim, R.S.; Stevens, J.; Hanna, D.B.; Isasi, C.R. Identifying key determinants of childhood obesity: A narrative review of machine learning studies. Child. Obes. 2021, 17, 153–159.
  20. Montañez, C.A.C.; Fergus, P.; Hussain, A.; Al-Jumeily, D.; Abdulaimma, B.; Hind, J.; Radi, N. Machine learning approaches for the prediction of obesity using publicly available genetic profiles. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 2743–2750.
  21. Dietterich, T.G. Ensemble methods in machine learning. In Proceedings of the International Workshop on Multiple Classifier Systems, Cagliari, Italy, 21–23 June 2000; Springer: Berlin/Heidelberg, Germany, 2000; pp. 1–15.
  22. Cascarano, A.; Mur-Petit, J.; Hernandez-Gonzalez, J.; Camacho, M.; de Toro Eadie, N.; Gkontra, P.; Chadeau-Hyam, M.; Vitria, J.; Lekadir, K. Machine and deep learning for longitudinal biomedical data: A review of methods and applications. Artif. Intell. Rev. 2023, 56, 1711–1771.
  23. Ayoub, M.; Liao, Z.; Li, L.; Wong, K.K. HViT: Hybrid vision inspired transformer for the assessment of carotid artery plaque by addressing the cross-modality domain adaptation problem in MRI. Comput. Med. Imaging Graph. 2023, 109, 102295.
  24. Colmenarejo, G. Machine learning models to predict childhood and adolescent obesity: A review. Nutrients 2020, 12, 2466.
  25. Farrahi, V.; Rostami, M. Machine learning in physical activity, sedentary, and sleep behavior research. J. Act. Sedentary Sleep Behav. 2024, 3, 5.
  26. Xue, Q.; Wang, X.; Meehan, S.; Kuang, J.; Gao, J.A.; Chuah, M.C. Recurrent neural networks based obesity status prediction using activity data. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; pp. 865–870.
  27. Huang, J.D.; Wang, J.; Ramsey, E.; Leavey, G.; Chico, T.J.; Condell, J. Applying artificial intelligence to wearable sensor data to diagnose and predict cardiovascular disease: A review. Sensors 2022, 22, 8002.
  28. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  29. Degbey, G.S.; Hwang, E.; Park, J.; Lee, S. Deep Learning-Based Obesity Identification System for Young Adults Using Smartphone Inertial Measurements. Int. J. Environ. Res. Public Health 2024, 21, 1178.
  30. Ayub, H.; Khan, M.A.; Shehryar Ali Naqvi, S.; Faseeh, M.; Kim, J.; Mehmood, A.; Kim, Y.J. Unraveling the potential of attentive Bi-LSTM for accurate obesity prognosis: Advancing public health towards sustainable cities. Bioengineering 2024, 11, 533.
  31. Jin, B.T.; Choi, M.H.; Moyer, M.F.; Kim, D.A. Predicting malnutrition from longitudinal patient trajectories with deep learning. PLoS ONE 2022, 17, e0271487.
  32. Zhen, H.; Niu, D.; Yu, M.; Wang, K.; Liang, Y.; Xu, X. A hybrid deep learning model and comparison for wind power forecasting considering temporal-spatial feature extraction. Sustainability 2020, 12, 9490.
  33. Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023, arXiv:2312.00752.
  34. Waleffe, R.; Byeon, W.; Riach, D.; Norick, B.; Korthikanti, V.; Dao, T.; Gu, A.; Hatamizadeh, A.; Singh, S.; Narayanan, D.; et al. An empirical study of mamba-based language models. arXiv 2024, arXiv:2406.07887.
  35. Gentimis, T.; Ala’J, A.; Durante, A.; Cook, K.; Steele, R. Predicting hospital length of stay using neural networks on mimic iii data. In Proceedings of the 2017 IEEE 15th Intl Conf on Dependable, Autonomic and Secure Computing, 15th Intl Conf on Pervasive Intelligence and Computing, 3rd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), Orlando, FL, USA, 6–10 November 2017; pp. 1194–1201. [Google Scholar]
  36. Du, Y.; Fan, S.S.; Wu, H.; He, J.; He, Y.; Meng, X.Y.; Xu, X. Convergent and Divergent Mitochondrial Pathways as Causal Drivers and Therapeutic Targets in Neurological Disorders. Curr. Issues Mol. Biol. 2025, 47, 636. [Google Scholar] [CrossRef]
  37. Qiang, Y.; Dong, X.; Liu, X.; Yang, Y.; Hu, F.; Wang, R. Ecgmamba: Towards ecg classification with state space models. In Proceedings of the 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Lisbon, Portugal, 3–6 December 2024; pp. 6498–6505. [Google Scholar]
  38. Hussain, S.; Ayoub, M.; Yang, Y.; Wahid, J.A.; Khan, A.; Moller, D.P.F.; Hou, W. Ensemble Deep Learning Framework for Situational Aspects-Based Annotation and Classification of International Student’s Tweets during COVID-19. Comput. Mater. Contin. 2023, 75, 5355. [Google Scholar] [CrossRef]
  39. Çetin, V.; Yıldız, O. A comprehensive review on data preprocessing techniques in data analysis. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 2022, 28, 299–312. [Google Scholar] [CrossRef]
  40. Han, L.; Ye, H.J.; Zhan, D.C. The capacity and robustness trade-off: Revisiting the channel independent strategy for multivariate time series forecasting. IEEE Trans. Knowl. Data Eng. 2024, 36, 7129–7142. [Google Scholar] [CrossRef]
  41. Thielmann, A.F.; Kumar, M.; Weisser, C.; Reuter, A.; Säfken, B.; Samiee, S. Mambular: A sequential model for tabular deep learning. arXiv 2024, arXiv:2408.06291. [Google Scholar] [CrossRef]
  42. Qu, H.; Ning, L.; An, R.; Fan, W.; Derr, T.; Liu, H.; Xu, X.; Li, Q. A survey of mamba. arXiv 2024, arXiv:2408.01129. [Google Scholar] [PubMed]
  43. Nuttall, F.Q. Body mass index: Obesity, BMI, and health: A critical review. Nutr. Today 2015, 50, 117–128. [Google Scholar] [CrossRef] [PubMed]
  44. Kitis, S.; Goker, H. Detection of Obesity Stages Using Machine Learning Algorithms. Anbar J. Eng. Sci. 2023, 14, 80–88. [Google Scholar] [CrossRef]
  45. Daud, N.A.; Noor, N.L.M.; Aljunid, S.A.; Noordin, N.; Teng, N.I.M.F. Predictive analytics: The application of J48 algorithm on grocery data to predict obesity. In Proceedings of the 2018 IEEE Conference on Big Data and Analytics (ICBDA), Langkawi, Malaysia, 21–22 November 2018; pp. 1–6. [Google Scholar]
  46. Lingren, T.; Thaker, V.; Brady, C.; Namjou, B.; Kennebeck, S.; Bickel, J.; Patibandla, N.; Ni, Y.; Van Driest, S.L.; Chen, L.; et al. Developing an algorithm to detect early childhood obesity in two tertiary pediatric medical centers. Appl. Clin. Inform. 2016, 7, 693–706. [Google Scholar]
  47. Duarte, C.W.; Klimentidis, Y.C.; Harris, J.J.; Cardel, M.; Fernández, J.R. A hybrid Bayesian Network/Structural Equation Modeling (BN/SEM) approach for detecting physiological networks for obesity-related genetic variants. In Proceedings of the 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), Atlanta, GA, USA, 12–15 November 2011; pp. 696–702. [Google Scholar]
  48. Yue, Y.; De Ridder, D.; Manning, P.; Deng, J.D. Variational autoencoder learns better feature representations for eeg-based obesity classification. In Proceedings of the International Conference on Pattern Recognition, Kolkata, India, 1–5 December 2024; Springer: Cham, Switzerland, 2024; pp. 179–191. [Google Scholar]
  49. Mana, N.A.M.A.; Fook, C.Y.; Chin, L.C.; Vijean, V.; Ardeenawatie, S.; Muthusamy, H. Deep Learning-Based Body Mass Index (BMI) Prediction Using Pre-trained CNN Models. In Proceedings of the International e-Conference on Intelligent Systems and Signal Processing: e-ISSP 2020, Gujarat, India, 28–30 December 2020; Springer: Singapore, 2022; pp. 617–631. [Google Scholar]
  50. Jeong, J.H.; Lee, I.G.; Kim, S.K.; Kam, T.E.; Lee, S.W.; Lee, E. DeepHealthNet: Adolescent Obesity Prediction System Based on a Deep Learning Framework. IEEE J. Biomed. Health Inform. 2024, 28, 2282–2293. [Google Scholar] [CrossRef]
  51. Jeon, S.; Kim, M.; Yoon, J.; Lee, S.; Youm, S. Machine learning-based obesity classification considering 3D body scanner measurements. Sci. Rep. 2023, 13, 3299. [Google Scholar] [CrossRef]
Figure 1. Key factors and relevant diseases of obesity.
Figure 2. Proposed methodology framework used in this study.
Figure 3. Preprocessing workflow showing dataset sizes, cleaning steps, and final dataset.
Figure 4. Proposed ABi-LSTM framework for obesity classification.
Figure 5. Proposed hybrid Mamba architecture with feature tokenization, channel independence, and an appended [CLS] token. Here, A denotes the state transition matrix that updates the hidden states, B the input projection matrix that maps tokens to the state space, C the output projection matrix that maps states to outputs, and D the skip connection that enables direct input–output interaction.
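
The roles of A, B, C, and D named in the Figure 5 caption can be illustrated with a plain discrete linear state-space recurrence, h_t = A h_{t-1} + B x_t and y_t = C h_t + D x_t. The sketch below is only an illustration under simplifying assumptions: it uses fixed matrices and a scalar input per step, whereas Mamba makes its parameters input-dependent and discretizes a continuous-time system. The function name `ssm_scan` and the toy values are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def ssm_scan(
    A: Sequence[Sequence[float]],
    B: Sequence[float],
    C: Sequence[float],
    D: float,
    xs: Sequence[float],
) -> List[float]:
    """Run a fixed-parameter linear state-space recurrence over a scalar sequence.

    h_t = A h_{t-1} + B x_t   (state update)
    y_t = C h_t + D x_t       (readout plus skip connection)
    """
    n = len(B)
    h = [0.0] * n  # hidden state starts at zero
    ys: List[float] = []
    for x in xs:
        # State transition (A) plus input projection (B).
        h = [sum(A[i][j] * h[j] for j in range(n)) + B[i] * x for i in range(n)]
        # Output projection (C) plus direct input-output path (D).
        ys.append(sum(C[i] * h[i] for i in range(n)) + D * x)
    return ys


# Toy example: a two-dimensional state driven by a single impulse.
toy_outputs = ssm_scan(
    A=[[0.5, 0.0], [0.0, 0.25]],
    B=[1.0, 1.0],
    C=[1.0, 2.0],
    D=0.5,
    xs=[1.0, 0.0, 0.0],
)
print(toy_outputs)
```

With a diagonal A whose entries are below 1, the response to the impulse decays over time, which is the sense in which A "updates" the hidden state while D passes the current input straight through.
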
Figure 6. Distribution of individuals across obesity states (Normal weight, Overweight level 1, Overweight level 2, Obesity State, Obese level 1, Obese level 2, and Obese level 3) by response to a binary question.
Figure 7. Distribution of individuals across obesity states (Normal weight, Overweight level 1, Overweight level 2, Obesity State, Obese level 1, Obese level 2, and Obese level 3) categorized by frequency of a condition (Sometimes, Frequently, Always, or No).
Figure 8. Performance evaluation of the proposed classification method. (a) Accuracy metrics for each obesity class, (b) precision, (c) recall, and (d) F1-score evaluations, computed using a one-vs.-all strategy.
Figure 9. ROC-based evaluation for each obesity class using the one-vs.-all multiclass strategy. (a) Classes 0, 1, and 2; (b) Classes 3, 4, and 5. Micro-average and macro-average metrics are also shown.
Figure 10. ROC curves for six obesity classes with AUC values ranging from 0.91 to 0.99, demonstrating strong discriminative capability of the model across all categories.
Table 1. Summary of dataset preprocessing steps for Kaggle and CDC datasets.

| Step | Kaggle Dataset | CDC Dataset | Final Dataset |
|---|---|---|---|
| Initial observations | 2111 | 51,376 | XXX |
| Initial features | 16 | 10 | YY |
| Removed records (duplicates/missing) | 0 | 1500 | XXX-1500 |
| Removed features | 0 | 2 | YY-2 |
| Added features/Encoded | 0 | 1 | YY |
| Final observations | 2111 | 49,876 | XXX |
| Final features | 16 | 9 | YY |

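
The per-dataset counts in Table 1 follow from simple bookkeeping: final observations are the initial observations minus the removed records, and final features are the initial features minus the removed ones plus the added ones. The short check below reproduces the Kaggle and CDC columns from the table values; the helper name `apply_cleaning` is hypothetical, and the "Final Dataset" column is left out because the table gives only placeholders for it.

```python
def apply_cleaning(initial: int, removed: int, added: int = 0) -> int:
    """Return the count left after removing and then adding items."""
    return initial - removed + added


# Values taken from the Kaggle and CDC columns of Table 1.
kaggle_final_obs = apply_cleaning(initial=2111, removed=0)
kaggle_final_features = apply_cleaning(initial=16, removed=0, added=0)
cdc_final_obs = apply_cleaning(initial=51_376, removed=1_500)
cdc_final_features = apply_cleaning(initial=10, removed=2, added=1)

print(kaggle_final_obs, kaggle_final_features)
print(cdc_final_obs, cdc_final_features)
```
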
Table 2. Hyperparameter settings for the ABi-LSTM model.

| Parameter | Value |
|---|---|
| Input Dimension | 16 |
| Hidden Dimension | 64 |
| Number of Layers | 3 |
| Dropout Rate | 0.4 |
| Bidirectional | Yes |
| Attention Mechanism | Soft Attention |
| Attention Dimension | 64 |
| Activation Functions | Tanh |
| Learning Rate | 0.001 |
| Optimizer | Adam |
| Batch Size | 32 |

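
To give a sense of the model size implied by Table 2, the sketch below stores the listed hyperparameters in a small configuration object and counts the trainable weights of the recurrent stack alone. It assumes the parameter layout used by PyTorch's `nn.LSTM` (per layer and direction: an input-to-hidden matrix, a hidden-to-hidden matrix, and two bias vectors, each covering four gates); the paper does not state which framework was used, so the class `ABiLSTMConfig`, the function `lstm_param_count`, and the resulting number are illustrative assumptions. The attention layer and classifier head are not counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ABiLSTMConfig:
    """Hyperparameters copied from Table 2 (field names are hypothetical)."""
    input_dim: int = 16
    hidden_dim: int = 64
    num_layers: int = 3
    dropout: float = 0.4
    bidirectional: bool = True
    attention_dim: int = 64
    learning_rate: float = 1e-3
    batch_size: int = 32


def lstm_param_count(cfg: ABiLSTMConfig) -> int:
    """Count LSTM weights assuming the nn.LSTM layout (4 gates, 2 bias vectors)."""
    directions = 2 if cfg.bidirectional else 1
    total = 0
    for layer in range(cfg.num_layers):
        # Layers after the first receive the concatenated outputs of both directions.
        in_dim = cfg.input_dim if layer == 0 else cfg.hidden_dim * directions
        per_direction = 4 * (
            in_dim * cfg.hidden_dim      # input-to-hidden weights
            + cfg.hidden_dim ** 2        # hidden-to-hidden weights
            + 2 * cfg.hidden_dim         # two bias vectors
        )
        total += per_direction * directions
    return total


config = ABiLSTMConfig()
print(lstm_param_count(config))  # 240640 under the stated assumptions
```
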
Table 3. Statistical characteristics of the population in the dataset.
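
| Variables | Normal (18.5 ≤ BMI < 25 kg/m²): Observations | Normal: Mean (%) | Overweight (25 ≤ BMI < 30 kg/m²): Observations | Overweight: Mean (%) | Obese (BMI ≥ 30 kg/m²): Observations | Obese: Mean (%) |
|---|---|---|---|---|---|---|
| Gender | | | | | | |
| Male | 1046 | 47.72 | 1505 | 59.65 | 1131 | 46.22 |
| Female | 1146 | 52.28 | 1018 | 40.35 | 1316 | 53.78 |
| Age | 2192 | 48.6 | 2523 | 52.13 | 2447 | 50.02 |
| Race | | | | | | |
| Non-Hispanic White | 806 | 36.77 | 857 | 33.97 | 708 | 28.93 |
| Non-Hispanic Black | 334 | 15.24 | 368 | 14.59 | 508 | 20.76 |
| Mexican American | 420 | 19.16 | 593 | 23.5 | 606 | 24.77 |
| Others/Multi-Racial | 342 | 15.6 | 396 | 15.7 | 410 | 16.76 |
| Other Hispanic | 290 | 13.23 | 309 | 12.25 | 215 | 8.79 |
| Education Level | | | | | | |
| Less than 9th Grade | 702 | 32.03 | 917 | 36.35 | 862 | 35.23 |
| 9-11th Grade | 251 | 11.45 | 280 | 11.1 | 292 | 11.93 |
| High School or Equivalent | 347 | 15.83 | 438 | 17.36 | 465 | 19 |
| Some College or AA Degree | 344 | 15.69 | 369 | 14.63 | 400 | 16.35 |
| College Graduate or Above | 451 | 20.57 | 421 | 16.69 | 341 | 13.94 |
| Refused | 95 | 4.33 | 95 | 3.77 | 87 | 3.56 |
| Do Not Know | 1 | 0.05 | 1 | 0.04 | 0 | 0 |
| Marital Status | | | | | | |
| Married | 807 | 36.82 | 1016 | 40.27 | 903 | 36.9 |
| Widowed | 445 | 20.3 | 514 | 20.37 | 485 | 19.82 |
| Divorced | 305 | 13.91 | 331 | 13.12 | 356 | 14.55 |
| Separated | 148 | 6.75 | 203 | 8.05 | 219 | 8.95 |
| Never Married | 327 | 14.92 | 276 | 10.94 | 319 | 13.04 |
| Living with Partner | 126 | 5.75 | 143 | 5.67 | 122 | 4.99 |
| Refused | 34 | 1.55 | 40 | 1.59 | 43 | 1.76 |
| Family PIR | 2192 | 2.62 | 2523 | 2.74 | 2447 | 2.6 |
| Sum Intensity Value | 2192 | 1,584,527.52 | 2523 | 1,562,816.99 | 2447 | 1,298,389.77 |
| Duration of Different Activity Intensity Levels (in Minutes) | | | | | | |
| Sedentary | 2192 | 7988.8 | 2523 | 1498.6 | 2447 | 1407 |
| Light | 2192 | 1468.2 | 2523 | 522.37 | 2447 | 450.55 |
| Lifestyle Moderate | 2192 | 500.84 | 2523 | 106.91 | 2447 | 73.21 |
| Vigorous | 2192 | 112.7 | 2523 | 4.63 | 2447 | 1.93 |
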
Table 4. BMI categories and their corresponding BMI ranges.

| BMI Category | BMI Range (kg/m²) |
|---|---|
| Underweight | < 18.5 |
| Normal | 18.5–24.9 |
| Overweight | 25.0–29.9 |
| Obesity I | 30.0–34.9 |
| Obesity II | 35.0–39.9 |
| Obesity III | ≥ 40.0 |

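
The mapping in Table 4 can be written as a small lookup function. The sketch below is a minimal illustration rather than the authors' labeling code: it treats each band as a half-open interval so that values falling between the printed bounds (for example 24.95) or exactly on 40 are still assigned a category, and that boundary handling is an assumption. The function names `bmi` and `bmi_category` are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Map a BMI value to the categories of Table 4 using half-open bands."""
    if value <= 0:
        raise ValueError("BMI must be positive")
    if value < 18.5:
        return "Underweight"
    if value < 25.0:
        return "Normal"
    if value < 30.0:
        return "Overweight"
    if value < 35.0:
        return "Obesity I"
    if value < 40.0:
        return "Obesity II"
    return "Obesity III"


# Example: 70 kg at 1.75 m gives a BMI of about 22.9.
print(bmi_category(bmi(70.0, 1.75)))
```
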
Table 5. Regression results highlighting additional lifestyle and environmental determinants of obesity.

| Determinant | Coefficient | p-Value | Interpretation |
|---|---|---|---|
| Daily Caloric Intake | 0.002 | <0.001 | Higher caloric intake correlates with obesity. |
| Screen Time (hours/day) | 0.075 | <0.01 | Increased screen time linked to higher obesity risk. |
| Sleep Duration (hours) | −0.15 | 0.05 | Fewer sleep hours associated with higher obesity levels. |
| Stress Level (1–10 scale) | 0.05 | <0.01 | Higher stress levels correlate with increased obesity. |
| Access to Healthy Foods | −0.10 | 0.001 | Better access reduces obesity risk. |

Table 6. SHAP-based feature importance for illustrative purposes. Values indicate relative influence on classifier predictions.

| Feature | SHAP Importance | Direction of Effect |
|---|---|---|
| Physical Activity Frequency | 0.25 | Negative (protective) |
| Daily Caloric Intake | 0.22 | Positive (risk) |
| Screen Time (hours/day) | 0.18 | Positive (risk) |
| Sleep Duration (hours) | 0.12 | Negative (protective) |
| Emotional Eating | 0.10 | Positive (risk) |
| Access to Healthy Foods | 0.08 | Negative (protective) |
| Body Image Satisfaction | 0.05 | Negative (protective) |
| Age | 0.04 | Positive (risk) |
| Socioeconomic Status | 0.03 | Positive (risk) |

Table 7. Comparison of our study with other machine learning and advanced models. Metrics are based on the one-vs.-all multiclass evaluation.
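
| Method | Accuracy (%) | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|---|
| ID3 [44] | 91.87 | 89.22 | 85.92 | 81.54 | 79.12 |
| J48 [45] | 89.98 | 87.76 | 88.67 | 83.95 | 81.43 |
| NB [46] | 90.31 | 89.98 | 89.21 | 76.99 | 75.44 |
| BN [47] | 89.23 | 88.23 | 88.11 | 82.43 | 79.11 |
| VAE [48] | 92.12 | 90.23 | 91.34 | 91.02 | 90.11 |
| LightCNN [49] | 92.50 | 91.60 | 92.12 | 91.70 | 90.30 |
| DeepHealthNet [50] | 90.11 | 89.82 | 91.47 | 90.90 | 91.28 |
| Body Scan Model [51] | 89.65 | 90.80 | 91.22 | 90.85 | 90.92 |
| Ours | 93.42 | 91.11 | 92.34 | 92.19 | 93.21 |
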
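
The five metrics in Table 7 are reported under a one-vs.-all evaluation. As a reminder of how such values are obtained, the sketch below derives them for one class from a multiclass confusion matrix by collapsing all other classes into a single "rest" class. It is a generic illustration, not the authors' evaluation script: the function name `one_vs_all_metrics`, the row-is-true/column-is-predicted convention, and the toy matrix are assumptions made for the example.

```python
from typing import Dict, List


def _ratio(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def one_vs_all_metrics(cm: List[List[int]], k: int) -> Dict[str, float]:
    """One-vs.-all metrics for class k; cm[i][j] counts true class i predicted as j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                          # class k predicted as something else
    fp = sum(cm[i][k] for i in range(n)) - tp     # other classes predicted as k
    tn = total - tp - fn - fp
    return {
        "accuracy": _ratio(tp + tn, total),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Toy three-class confusion matrix (made-up counts, not data from the paper).
toy_cm = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
toy_metrics = one_vs_all_metrics(toy_cm, k=0)
print(toy_metrics)
```

Averaging these per-class values across classes gives a macro-average, while pooling the per-class counts before dividing gives a micro-average.
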
Table 8. Overall performance metrics of the proposed method with 95% confidence intervals (CIs).

| Metric | Point Estimate (%) | 95% CI (%) |
|---|---|---|
| Accuracy | 93.42 | 91.2–95.6 |
| Precision | 91.0 | 88.5–93.5 |
| Recall | 91.0 | 88.5–93.5 |
| F1-score | 90.0 | 87.4–92.6 |

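
Table 8 does not say how the 95% confidence intervals were obtained or how many test samples they rest on. One common choice for a proportion such as accuracy is the normal-approximation (Wald) interval, p ± z·sqrt(p(1 − p)/n). The sketch below shows that calculation only as a possible reading; the function name `wald_ci` and the sample size in the usage line are hypothetical and are not taken from the paper.

```python
import math
from typing import Tuple


def wald_ci(p_hat: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for a proportion, clipped to [0, 1]."""
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Usage with a hypothetical test-set size of 500 samples.
low, high = wald_ci(0.9342, n=500)
print(f"{100 * low:.1f}-{100 * high:.1f}")
```

For proportions close to 1 or for small test sets, a Wilson or bootstrap interval is usually a safer choice than the Wald form.
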
Table 9. Ablation analysis of the proposed obesity classification model. Each row represents the performance after removing or replacing a specific component.

| Ablation Condition | Accuracy | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|
| No Attention Mechanism (Bi-LSTM) | 89.75% | 87.65% | 89.50% | 88.12% | 89.00% |
| Replace Bi-LSTM with Bi-GRU | 91.05% | 89.10% | 90.72% | 90.20% | 90.88% |
| Replace Bi-LSTM with FFNN | 85.32% | 83.21% | 84.89% | 85.10% | 86.45% |
| No Channel Independence | 87.56% | 86.00% | 88.23% | 86.98% | 87.45% |
| No Mamba (Replace with Transformer) | 88.20% | 85.55% | 87.70% | 86.60% | 88.00% |
| No Tokenization Strategy (Standard) | 83.12% | 80.75% | 82.40% | 82.55% | 83.00% |
| No Dropout (Regularization) | 86.90% | 84.30% | 85.50% | 85.40% | 86.80% |
| No Feature Preprocessing (Scaling) | 84.65% | 82.85% | 83.70% | 84.00% | 84.25% |
| Reduced Sequence Length (T) | 85.55% | 83.90% | 84.80% | 85.12% | 85.45% |
| Reduced Hidden Dimension (32) | 82.50% | 80.25% | 81.85% | 81.90% | 82.30% |
| Full Model (Ours) | 93.42% | 91.11% | 92.34% | 92.19% | 93.21% |

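
One way to read Table 9 is to rank the ablations by how much accuracy each one costs relative to the full model. The snippet below does that arithmetic using the accuracy column of the table; the variable and function names are hypothetical, and the ranking is a descriptive summary of the reported numbers rather than a statistical test.

```python
from typing import Dict, List, Tuple

FULL_MODEL_ACCURACY = 93.42  # Full Model (Ours), Table 9

# Accuracy (%) for each ablation condition, copied from Table 9.
ABLATION_ACCURACY: Dict[str, float] = {
    "No Attention Mechanism (Bi-LSTM)": 89.75,
    "Replace Bi-LSTM with Bi-GRU": 91.05,
    "Replace Bi-LSTM with FFNN": 85.32,
    "No Channel Independence": 87.56,
    "No Mamba (Replace with Transformer)": 88.20,
    "No Tokenization Strategy (Standard)": 83.12,
    "No Dropout (Regularization)": 86.90,
    "No Feature Preprocessing (Scaling)": 84.65,
    "Reduced Sequence Length (T)": 85.55,
    "Reduced Hidden Dimension (32)": 82.50,
}


def accuracy_drops(full: float, ablations: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (condition, drop in percentage points) sorted from largest to smallest."""
    drops = [(name, round(full - acc, 2)) for name, acc in ablations.items()]
    return sorted(drops, key=lambda item: item[1], reverse=True)


ranked = accuracy_drops(FULL_MODEL_ACCURACY, ABLATION_ACCURACY)
for name, drop in ranked:
    print(f"{name}: -{drop:.2f} points")
```

On the reported numbers, halving the hidden dimension and dropping the tokenization strategy cost the most accuracy, while swapping Bi-LSTM for Bi-GRU costs the least.
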
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Fu, C.; Shahril Nizam Bin Shaharom, M.; Kamaruzaman Bin Syed Ali, S. Hybrid Mamba and Attention-Enhanced Bi-LSTM for Obesity Classification and Key Determinant Identification. Electronics 2025, 14, 3445. https://doi.org/10.3390/electronics14173445