1. Introduction
Heart rate variability (HRV), obtained from beat-to-beat oscillations in consecutive RR intervals, has long been recognized as a noninvasive biomarker of autonomic cardiac regulation [
1,
2]. There is a considerable amount of evidence showing that reductions in HRV are associated with impaired metabolic control, chronic inflammation, insulin resistance, and increased risk of cardiovascular disease. Over the last decade, a number of well-designed studies have solidified this association [
3,
4]. Constant comparisons consistently show that patients with metabolic syndrome (MetS) exhibit reduced vagal modulation, reduced global variability, and nonlinear HRV, reflecting autonomic rigidity [
3,
4]. Supporting data from controlled metabolic challenges suggest that autonomic dysfunction may also develop in individuals with nondiabetic conditions showing excessive glucose excursions after ingestion [
5]. These findings collectively establish HRV as an early indicator of cardiovascular and cardiometabolic impairment. They provide an informative physiological contrast in endurance-trained individuals.
Higher-intensity or prolonged bouts of exercise are often associated with dramatically increased HRV, a more prominent parasympathetic profile, and improved autonomic flexibility. Current evidence in athletes suggests retained complexity, increased dynamic stability of RR intervals, and improved recovery of autonomic modulation following metabolic load [
6]. These features delineate an individual autonomic phenotype, providing useful information on how autonomic regulation adapts to different metabolic scenarios when compared with those within MetS and healthy control cohorts.
The oral glucose tolerance test (OGTT) serves as a rigorous experimental stimulus to evaluate such adaptations. The OGTT, as a standardized metabolic stimulus, consistently elicits coordinated autonomic, endocrine, and vascular responses [
7,
8]. An interest in characterizing cardiac autonomic behavior during the OGTT has also greatly expanded. Recent studies show that, through neurohumoral pathways, transient increments in glucose and insulin regulate RR dynamics [
6,
9]. These regulatory responses are often blunted in MetS [
10], whereas endurance-trained marathoners exhibit a robust parasympathetic rebound and accelerated re-stabilization of autonomic activity. These patterns suggest that the responses of the RR interval during an OGTT reflect subtle disturbances in neurocardiac control that remain undetectable under resting conditions. However, despite this diagnostic potential, relatively few studies have assessed autonomic regulation in all stages of OGTT, and even fewer have described dynamic HRV activity throughout the protocol.
Relatively short HRV indices capture relevant yet fundamental aspects of autonomic control, but fall short of providing the complete temporal organization and dynamic architecture of RR interval behavior [
9]. Therefore, recent analytical frameworks increasingly use nonlinear and complexity-based metrics, including entropy measurements, fractal scaling exponents, symbolic dynamics, and Poincaré descriptors, which have been shown to contribute to the early detection of metabolic dysregulation and cardiometabolic risk [
9]. However, HRV descriptors may miss transient or non-stationary patterns that develop solely during the metabolic stimulation period. This constraint has driven the rise of integrative techniques that combine traditional HRV metrics, nonlinear dynamics, and time-series analysis.
Artificial intelligence developments have expedited this transition. Deep learning approaches, especially convolutional neural networks (CNNs), have achieved a level of expert performance in arrhythmia detection, ECG morphology measurement, and multilead cardiac classification [
11,
12]. Long short-term memory (LSTM) networks and gated recurrent units can also leverage long-range dependencies and autonomic oscillations in physiological time-series [
13,
14,
15,
16,
17]. More recent hybrid architectures, such as CNN-LSTM architectures, attention mechanisms, and transformer-based models, have helped advance the characterization of autonomic state, stress reactivity, and cardiometabolic risk [
18]. Nonetheless, the majority of available studies analyze RR interval sequences and heart rate variability (HRV) descriptors in isolation. Integrative multimodal approaches remain scarce; only a limited number of studies have applied deep learning frameworks to metabolic challenge paradigms, such as the OGTT test [
19]. Furthermore, many studies continue to employ record-wise validation, a practice recognized to introduce data leakage and yield inflated performance estimates [
20].
Based on the above limitations, this paper proposes a multimodal deep learning framework that combines knowledge from RR timing series with HRV descriptors from the time, frequency, and nonlinear domains across the five stages of OGTT. We assess three architectures for increasing complexity, which are (i) a convolutional neural network with a multilayer perceptron, (ii) the same convolutional encoder along with a support vector machine (SVM), and (iii) a temporal architecture with long short-term memory layers augmented with an SVM classifier. An 80/20 subject-wise split was used, with 80% allocated to training and internal validation and 20% to unbiased testing. This multimodal construction can facilitate the joint modeling of RR dynamics and multidomain HRV descriptors during the OGTT and the study of the extent to which representations reflect autonomic modifications across different metabolic states. The subject-wise evaluation approach also guarantees sound generalization estimates and addresses the data leakage problem typical in record-wise validation. The present framework integrates temporal data of RR intervals with the physiological content embedded in HRV characteristics, thereby yielding a more comprehensive characterization of autonomic responses in MetS, healthy individuals, and endurance athletes. This suggests that multimodal fusion facilitates the discrimination of metabolic dysfunction beyond what either modality alone can achieve.
2. Materials and Methods
2.1. Study Population and Protocol
Based on clinical and functional criteria, forty adult male volunteers participated in this study. Fifteen participants were diagnosed with metabolic syndrome (MetS) according to the NCEP ATP III guidelines [
21]. The control group (C) consisted of ten healthy individuals, and the remaining fifteen participants were endurance-trained marathon runners (M) who typically ran between 180 and 240 km per week. Participants were recruited from individuals undergoing routine clinical evaluation at the University Hospital of Caracas.
Endurance-trained marathon runners were included as a physiological reference group, in addition to participants with metabolic syndrome and healthy controls, as they exhibit enhanced autonomic regulation associated with long-term aerobic training [
22,
23]. This design enabled the comparison of three different autonomic profiles spanning a continuum of autonomic regulation, ranging from impaired modulation in metabolic syndrome to enhanced vagal modulation in endurance-trained athletes. Participants were excluded if they had known cardiovascular disease, arrhythmias, or ECG abnormalities that could significantly interfere with reliable RR interval analysis, diabetes mellitus or other metabolic disorders different from metabolic syndrome, neurological diseases affecting autonomic regulation, or treatment with medications known to influence heart rate variability, such as beta blockers. To reduce potential confounding effects on autonomic measurements, participants were instructed to refrain from caffeine, alcohol consumption, and intense physical exercise prior to the recordings.
All participants underwent a standardized oral glucose tolerance test (OGTT) after an overnight fast. Venous blood samples were obtained at five time points corresponding to baseline (0 min), 30 min, 60 min, 90 min, and 120 min following the ingestion of a 75 g glucose load. At each stage, glucose, insulin, triglycerides, and HDL cholesterol levels were measured. Immediately after each blood draw, a resting 12 lead ECG was recorded for at least 10 min at a sampling frequency of 1000 Hz and a resolution of 16 bits. Thus, five ECG recordings were obtained for each participant, corresponding to the five OGTT stages, yielding a total of 200 ECG recordings (40 subjects × 5 stages).
All procedures were conducted in accordance with the ethical principles of the Declaration of Helsinki and its subsequent amendments. The recruitment procedures, informed consent process, and data acquisition protocol have been previously described in a peer-reviewed IEEE publication [
24].
Table 1 presents the clinical and biochemical characteristics of the three study groups. Participants with MetS exhibited higher body mass index, blood pressure, triglyceride concentrations, and fasting and post-load glucose and insulin levels compared with the control group and marathon runners. In contrast, marathon runners showed lower fasting and post-load glucose and insulin concentrations, as well as reduced area under the curve values, indicating enhanced metabolic efficiency.
2.2. Signal Processing
All ECG recordings were processed using a structured, transparent, and fully reproducible signal processing pipeline designed to ensure the consistent detection of cardiac events and reliable characterization of autonomic modulation. We first extracted R peaks independently from the 12- leads using a modified implementation of the Pan-Tompkins algorithm, one of the most widely validated approaches for real-time QRS detection [
25]. Each channel was bandpass-filtered (5–15 Hz) using a third-order Butterworth filter to attenuate baseline drift and high-frequency artifacts [
26]. The filtered signals were then differentiated to enhance rapid QRS transitions, squared to amplify steep slopes, and integrated using a moving window operator to obtain an energy envelope highlighting candidate QRS complexes.
Initial peak detections were obtained through adaptive thresholding applied to the integrated signal. To avoid multiple detections within a single cardiac cycle, a physiological refractory period of 200 ms was enforced during peak detection. Each detected peak was subsequently refined by locating the maximum amplitude within a local window of ±50 ms around the candidate peak in the filtered ECG signal.
To improve robustness, particularly under noise, motion artifacts, or morphological variability across leads, detections from the twelve ECG channels were combined using a multichannel fusion strategy. Candidate R peaks detected independently in each lead were temporally aligned and grouped using a clustering tolerance of 50 ms. The final R peak position was defined as the median timing of the clustered detections, producing a fused R-peak sequence that integrates information from all leads while suppressing spurious detections.
Figure 1 presents an example of the multichannel detection approach applied to a ten-second segment of a twelve-lead ECG. The red markers identify the R peaks detected in each channel, and an isolated artifact visible in lead V4 is excluded by the fusion mechanism. This outcome demonstrates that integrating information from all leads enhances the robustness of the detection process and supports a more physiologically consistent identification of cardiac cycles.
After multichannel fusion, RR interval time-series were constructed from the fused R peak positions. RR intervals were defined as the temporal difference between consecutive R peaks and expressed in milliseconds. The resulting RR series was subsequently inspected for abnormal intervals and artifacts. Outliers were identified using a moving median filter with a window size of five beats. RR intervals deviating more than 20% from the local median were considered artifacts and removed from the sequence.
Missing or corrupted RR intervals were then corrected using linear interpolation in order to obtain a continuous RR time-series suitable for variability analysis. For each OGTT stage, contiguous RR segments were selected for analysis. Short-term HRV analysis is typically conducted using segments of approximately five minutes, which are considered sufficient for reliable estimation time domain, frequency domain, and nonlinear indices of autonomic modulation. Five-minute recordings represent the standard duration for short-term HRV analysis and have been widely adopted in physiological and clinical studies, according to the recommendations of the Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology [
27,
28]. In this study, each ECG recording lasted approximately ten minutes, allowing for the selection of artifact-free five- minute RR segments to ensure consistent HRV characterization across the different OGTT stages. In accordance with standard recommendations for short-term HRV analysis, five-minute RR segments were used to characterize autonomic modulation.
For the deep learning models, a sequence of 256 consecutive RR intervals was extracted from each segment to represent short-term heart rate dynamics and serve as input to the sequential branch of the multimodal classifier.
2.3. HRV Feature Extraction
Once the fused R peak positions were obtained, the RR interval series for all OGTT stages were analyzed using automated procedures that included visual inspection to confirm that the temporal patterns were physiologically plausible. Ectopic beats, artifacts, and intervals that deviated from the expected cardiac timing were identified using adaptive criteria based on local variability, slope changes, and median filtering. Only small patches that clearly required correction were interpolated, and this was restricted to the minimum number of points required to maintain the natural temporal structure of the signal. This cautious method preserved the nonlinear nature of the RR dynamics and minimized the risk of altering important features through excessive preprocessing.
An extensive selection of HRV descriptors for the five OGTT time points (0, 30, 60, 90, and 120 min) was then computed following a previously described protocol [
29,
30]. The features were categorized into three domains representing complementary dimensions of autonomic function. The time domain included mean RR, SDNN, RMSSD, pNN50, and SDSD, which are commonly used measures to quantify short- and long-term variability. The frequency domain consisted of VLF, LF, HF power, and the LF/HF ratio, all obtained using Welch spectral estimation to provide stable representations of oscillatory components. The nonlinear domain included Poincaré indices (SD1 and SD2), Sample Entropy (SampEn), Approximate Entropy (ApEn), and detrended fluctuation analysis coefficients (DFA-
and DFA-
), which capture the complexity and self-similarity in cardiovascular regulation.
Collectively, these descriptors provide a comprehensive view of the dynamics emerging under autonomic control, incorporating rapid fluctuations, slower oscillatory components, and nonlinear regulatory behavior. In total, 32 HRV descriptors were computed for each OGTT stage. Prior to model training, HRV features were standardized using z-score normalization computed from the training dataset to ensure comparable scaling across subjects.
The RR interval trajectories for one representative participant from each study group are shown in
Figure 2. As illustrated, the metabolic syndrome case displays reduced variability, healthy individuals show moderate autonomic modulation, and marathon runners exhibit broader and more adaptive fluctuations consistent with enhanced autonomic flexibility. The HRV parameters extracted at each stage are summarized in three tables in the
Section 2, presenting median and interquartile range values for the time, frequency, and nonlinear domains. Differences between groups were evaluated using the Kruskal–Wallis test followed by Dunn’s multiple comparison tests when appropriate. Considering five OGTT stages per subject, the final dataset consisted of 200 multistage recordings (40 subjects × 5 stages), each represented by a multimodal pair composed of a 256-sample RR sequence and a 32-dimensional HRV feature vector.
2.4. Classification Methodology
2.4.1. Model Development
To examine autonomic alterations during the OGTT, we implemented three supervised learning architectures that represent current strategies for analyzing biomedical time-series. The first architecture was a multimodal convolutional model that combined a convolutional encoder with a multilayer perceptron. The second architecture used the same convolutional and dense modules, but replaced the final softmax layer with a support vector machine classifier. The third architecture extended the convolutional front end by incorporating recurrent processing with long short-term memory units, together with a final support vector machine classifier.
These architectures were selected based on previous evidence showing that convolutional encoders achieve strong performance for ECG-based morphology learning [
14,
15,
17]. Recurrent units, particularly long short-term memory layers, capture temporal dependencies that are relevant to autonomic modulation [
13,
16]. Furthermore, hybrid configurations that combine deep feature encoders with discriminative classifiers have demonstrated improved decision boundaries and enhanced generalization on moderate-sized biomedical datasets [
13,
15,
17].
All models processed the RR interval sequences obtained using the multilead Pan-Tompkins procedure. The multimodal configurations also received a set of 32 standardized HRV descriptors that represented the time domain, frequency domain, and nonlinear dynamics. This design enabled the models to integrate the beat-to-beat morphology encoded in the RR sequence with the broader statistical and complex variability patterns captured by the HRV measures [
9,
29].
2.4.2. Subject-Wise Data Partitioning
During model development, the dataset was divided into three independent subsets: training, validation, and test. To guarantee strict independence between participants and prevent information leakage across repeated measurements, the partitioning was performed at the subject level rather than at the sample level. Consequently, all RR interval sequences and HRV feature vectors belonging to the same participant were assigned exclusively to a single subset. This constraint is essential in physiological time-series analysis, where intra-subject correlations may otherwise lead to overly optimistic performance estimates.
A subject-wise hold-out strategy was adopted. First, the 80% of the participants were assigned to the development subset and the remaining 20% to the test subset. Then, within the development subset, an additional subject-wise split was performed to reserve 10% of the subjects for internal validation. This validation subset was used to monitor convergence and apply early stopping during training, while the test subset remained completely unseen until the final evaluation stage.
To further avoid data leakage, preprocessing steps applied to the HRV features, including missing-value imputation and standardization, were fitted exclusively on the training subset and subsequently transferred to the validation and test subsets. In addition, variables associated with subject identity were excluded prior to training. Final model performance was assessed on the held-out test subjects using accuracy, precision, sensitivity, specificity, and F1 score.
2.5. Model Architectures
Three multimodal classification architectures, as can be seen in
Figure 3, were designed to evaluate different strategies for integrating temporal RR interval dynamics with statistical heart rate variability (HRV) descriptors. All models receive two complementary inputs: (1) a temporal sequence of RR intervals and (2) a vector of handcrafted HRV features derived from the same recordings. The RR sequence was represented as a fixed-length vector of 256 samples obtained after preprocessing and interpolation, while the HRV branch consisted of a standardized 32-dimensional feature vector including time-domain, frequency-domain, and nonlinear metrics.
2.5.1. Model A: CNN–MLP (End-to-End Multimodal Baseline)
Model A implements an end-to-end multimodal architecture in which RR interval sequences and HRV descriptors are jointly encoded and classified using a neural network. The RR stream is processed by a two-block one-dimensional convolutional encoder designed to capture short-range temporal structures in cardiac rhythm dynamics.
The first convolutional block consists of a 1D convolution layer with 64 filters (kernel size = 5, stride = 1, ReLU activation), followed by batch normalization, max pooling (pool size = 2), and dropout (rate = 0.2). The second convolutional block includes a 1D convolution layer with 128 filters (kernel size = 3, ReLU activation), batch normalization, max pooling, and dropout (rate = 0.2). The resulting feature maps are flattened to produce a learned RR embedding of 64 dimensions.
In parallel, the HRV branch receives a standardized 32-dimensional feature vector. These descriptors are processed through a multilayer perceptron with 64 and 32 neurons (ReLU activation), producing a 32-dimensional HRV embedding. The RR and HRV embeddings are then directly concatenated, implementing a late-fusion strategy that combines temporal features and global autonomic descriptors into a fused multimodal representation of 96 dimensions.
The fused embedding is then processed by a final classification layer with softmax activation to estimate the probability of the three classes: Metabolic Syndrome (MetS), Control (C), and Marathon Athletes (M).
Convolutional encoders similar to this architecture have demonstrated strong performance for ECG morphology learning and rhythm discrimination in large-scale studies [
12,
19]. Furthermore, combining deep-learned representations with handcrafted HRV descriptors has been shown to improve robustness when datasets are relatively small [
10,
11,
20].
2.5.2. Model B: CNN–MLP + SVM (Hybrid Deep Feature Extractor)
Model B shares the same multimodal feature encoder as Model A, but replaces the neural softmax classifier with a linear one-versus-rest Support Vector Machine (SVM). After supervised training of the CNN–MLP backbone, the latent representation obtained after fusion corresponds to a compact 96-dimensional multimodal feature vector.
This representation is subsequently used as input to an SVM classifier with a linear kernel. Hybrid deep feature extractor + SVM schemes have been widely adopted in biomedical signal analysis, where margin-based classifiers often improve decision boundaries and generalization when datasets are limited or moderately sized [
13,
14,
19,
31].
This configuration allows for the architecture to retain the expressive capacity of the deep convolutional encoder while leveraging the stability and strong generalization properties of SVM classifiers.
2.5.3. Model C: CNN + LSTM–MLP + SVM (Temporal Contextual Multimodal Model)
Model C extends the previous architectures by incorporating explicit temporal modeling of RR interval dynamics. As in the previous models, the RR stream is first processed through the same two convolutional blocks (64 and 128 filters) used in Model A and Model B, producing a sequence of convolutional feature maps that encode local morphological patterns of the RR signal.
Instead of directly flattening these features, the resulting sequence is fed into a unidirectional Long Short-Term Memory (LSTM) layer with 64 hidden units. This recurrent layer allows for the model to capture medium-range temporal dependencies and autonomic fluctuations that cannot be fully represented by convolutional operators alone.
The temporal embedding generated by the LSTM layer is then concatenated with the 32-dimensional HRV embedding, again using late multimodal fusion. The resulting fused representation has 96 dimensions and is subsequently used for classification through a linear one-versus-rest SVM classifier.
Hybrid CNN–LSTM architectures have consistently demonstrated state-of-the-art performance in ECG sequence modeling and physiological time-series analysis [
13,
19,
32]. Similarly, deep encoders combined with SVM classifiers have shown improved discriminative capability in ECG-based disease detection tasks [
33].
2.6. Statistical Hypothesis Tests
We performed statistical hypothesis tests with a 5% significance level to evaluate differences in HRV parameters among the three study groups across the OGTT stages. Prior to group comparisons, normality was assessed using the Shapiro–Wilk test, which indicated that most variables did not follow a Gaussian distribution. Therefore, all subsequent analyses were conducted using nonparametric methods.
Group comparisons at each OGTT stage were performed using the Kruskal–Wallis test for independent samples. When the null hypothesis was rejected, pairwise differences were examined using Dunn’s post hoc test with Bonferroni adjusted p values.
All resulting
p-values obtained for the time, frequency, and nonlinear HRV parameters are presented in
Table 2,
Table 3 and
Table 4. These tables provide a consolidated view of the statistical comparisons across the OGTT stages and study groups.
2.7. Evaluation Metrics
The behaviour of the proposed classifiers was assessed using a confusion matrix, which summarizes the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) for each class. From these quantities, we derived a set of widely used performance indicators in biomedical signal classification, including accuracy (Acc), sensitivity (
), specificity (
), positive and negative predictive values (PPV and NPV), and the false positive and false negative rate (FPR and FNR). These metrics are recommended for multiclass physiological classification tasks, especially when class imbalance or substantial inter-subject variability is present [
34,
35,
36].
The false positive and false negative rates, denoted as
and
, were computed using Equations (
1) and (
2):
Predictive values and class-wise sensitivity and specificity, denoted as
,
,
, and
, were obtained using Equations (
3)–(
6):
Overall accuracy, denoted as
, which aggregates all confusion matrix components into a single global indicator, was expressed as shown in Equation (
7):
Beyond these confusion matrix-derived measures, we also quantified macro and weighted F1 scores, balanced accuracy, Cohen’s
, and the Matthews correlation coefficient (MCC), all of which provide complementary insights into multiclass discrimination performance [
34]. The area under the receiver operating characteristic curve (AUROC) was computed under a one vs rest (OvR) formulation, reporting both macro- and micro-averaged values. For the HRV feature branch of the multimodal classifier, feature importance was estimated using SHAP-based gradient explanations, enabling the assessment of the relative contribution of each HRV descriptor to the model’s predictions.
3. Results
A total of 40 participants completed the five stages of the OGTT, providing high-quality RR interval time-series for HRV analysis and enabling the construction of multimodal classification models. In this section, we first describe group differences in HRV parameters across OGTT stages, and then present the performance of multimodal deep learning classifiers for three-class discrimination between metabolic syndrome (MetS), healthy controls (C), and marathoners (M). Unless otherwise stated, values are reported as medians (interquartile ranges), and between-group comparisons correspond to Kruskal–Wallis tests with post hoc pairwise contrasts.
3.1. Time Domain HRV Parameters
As shown in
Table 2, subjects with MetS exhibited consistently shorter mean RR intervals across all OGTT stages compared with marathoners, with statistically significant differences at every time point (all
p values < 0.01). This pattern reflects a higher resting heart rate and reduced vagal modulation, in line with population-based studies linking metabolic dysregulation to diminished HRV amplitude and parasympathetic withdrawal [
4,
9]. Differences between the MetS and control groups were more modest; nevertheless, a significant reduction in mean RR was observed at the 30 min stage (
p = 0.014), suggesting an accentuated chronotropic response to the early glucose load in the MetS group. At baseline, SDNN was significantly lower in MetS than in controls (47.6 vs. 91.0 ms,
p = 0.040), indicating reduced global HRV under resting conditions, which agrees with previous evidence of overall variability loss in individuals with clustered metabolic risk factors [
3,
9]. Across the remaining OGTT stages, no statistically significant group differences were found for RMSSD, SDSD, or pNN50, suggesting that short-term beat-to-beat variability is primarily altered at rest rather than during the acute metabolic challenge. Endurance athletes consistently showed longer RR intervals and higher variability indices (SDNN, RMSSD, SD1) than the other groups, consistent with the enhanced parasympathetic tone and improved autonomic efficiency described in trained populations [
37]. Overall, time domain results indicate that autonomic impairment in MetS is most evident at baseline, characterized by shorter RR intervals and lower global variability. In contrast, OGTT-induced changes in short-term indices show limited discriminatory power between groups.
3.2. Frequency Domain HRV Parameters
Table 3 summarizes the frequency domain HRV results across the study groups and OGTT stages. At the baseline measurement, notable contrasts appeared in the very low-frequency (VLF) and low-frequency (LF) ranges. Individuals with MetS displayed substantially lower VLF power than both marathoners (
p = 0.036) and healthy controls (
p = 0.016). This reduction in slower oscillatory components agrees with previous work linking MetS to weakened baroreflex activity and a decline in long-period autonomic rhythms [
3,
4,
10]. LF power also tended to be lower in MetS compared with healthy controls (
p = 0.047), suggesting a reduced oscillatory contribution to autonomic regulation.
The LF/HF ratio at baseline was higher in healthy controls than in MetS and Marathoners groups, a trend consistent with the sympathetic predominance often observed in sedentary adults [
30]. Marathoners, on the other hand, exhibited lower LF/HF values, reflecting the vagally oriented autonomic profile characteristic of endurance training [
37].
When the remaining OGTT stages were examined, the frequency-based measures did not reveal statistically significant differences among the groups, indicating that the short-term spectral adjustments following glucose ingestion were broadly comparable. Taken together, the baseline spectral indicators provided stronger group separation than the dynamic changes observed during the glucose challenge. These findings suggest that the autonomic alterations associated with MetS become more pronounced under resting conditions, where reductions in global spectral power and weakening of low-frequency rhythms are most evident.
3.3. Nonlinear HRV Parameters
Table 4 presents the comparison of nonlinear HRV descriptors across groups and OGTT stages. At baseline, significant differences were observed in the short-term fractal scaling exponent DFA-
. Participants with MetS showed lower values than both controls (
p = 0.017) and marathoners (
p = 0.028). Reduced DFA-
indicates a shift toward more random short-term dynamics and diminished fractal organization, a pattern frequently reported in cardiometabolic dysfunction [
3,
4].
Baseline differences were also evident in long-range variability indices. SD2, which reflects broader oscillatory behaviour in the Poincaré plane, was significantly lower in MetS than in controls (
p = 0.022). This reduction has been linked to impaired baroreflex engagement and decreased autonomic adaptability in metabolic disorders [
4]. Conversely, the SD1 to SD2 ratio was higher in MetS (
p = 0.003), suggesting a compression of long-term variance relative to short-term fluctuations, consistent with early autonomic impairment [
9].
Entropy-based measures showed fewer baseline differences. Approximate entropy and sample entropy tended to be lower in MetS, indicating reduced irregularity and diminished dynamical richness in the cardiac rhythm. These metrics were originally proposed as markers of complexity and physiological variability in cardiovascular time-series [
38]. Notably, sample entropy at the 120 min stage was significantly lower in MetS than in marathoners (
p = 0.041) and controls (
p = 0.024). These late-stage differences may reflect group specific autonomic responses to the progressive metabolic load during the OGTT, consistent with the sensitivity of entropy indices to metabolic and inflammatory stressors.
Across OGTT stages, nonlinear metrics did not show consistent differences among groups, suggesting that resting-state complexity markers are more informative than acute glucose responses. Overall, the nonlinear descriptors reveal that MetS is characterized by reduced fractal organization, diminished long-term variability, and lower system irregularity, features that reflect impaired autonomic complexity and early metabolic dysregulation [
9].
3.4. Multimodal Deep Learning Models for Metabolic Syndrome Classification
A total of 200 multistage recordings were used for model development, corresponding to five OGTT stages (baseline, 30, 60, 90, and 120 min) from 40 subjects. Each sample consisted of two complementary inputs: (i) a short RR interval sequence describing beat-to-beat dynamics and (ii) a vector of 32 handcrafted HRV descriptors derived from time, frequency, and nonlinear domains. HRV features were standardized using z-score normalization computed exclusively on the training data.
Three multimodal neural architectures were evaluated to integrate sequential RR dynamics with HRV descriptors. The RR branch was modeled using a one-dimensional convolutional neural network (1D-CNN) that processes the input sequence . The architecture includes two convolutional blocks composed of Conv1D layers (64 and 128 filters; kernel sizes 5 and 3), ReLU activations, and MaxPooling (pool size = 2). The resulting representation is projected to a 64-dimensional embedding .
The HRV branch processes the standardized feature vector using a multilayer perceptron with two dense layers (64 and 32 neurons, ReLU activation), producing a 32-dimensional embedding .
Both modalities are integrated using late multimodal fusion via vector concatenation,
which serves as the joint representation for classification.
Based on this shared encoder, three classifier configurations were investigated:
- 1.
CNN–MLP: The fused representation is classified using a fully connected layer with Softmax activation for three-class prediction (MetS, Control, Marathon).
- 2.
CNN–MLP + SVM: The multimodal encoder acts as a feature extractor and the fused embedding is classified using a linear Support Vector Machine.
- 3.
CNN + LSTM–MLP + SVM: An additional LSTM layer (64 units) is included in the RR branch to model temporal dependencies before multimodal fusion, with the fused representation classified using an SVM.
Models were trained using the Adam optimizer ( learning rate) and categorical cross-entropy loss. Training was performed with a batch size of 32 for up to 100 epochs with dropout regularization (0.2 in the convolutional blocks and 0.3 in the MLP layers). To prevent data leakage, dataset partitioning was performed at the subject level so that all recordings from a participant were assigned exclusively to either the training or testing set.
The main architectural parameters and training configuration are summarized in
Table 5.
3.4.1. Global Performance Across Models
Table 6 provides an overview of the global performance achieved through the independent test set with respect to the three class problem (MetS, C, and M). All models showed high discriminative quality, with macro F1 scores ranging from 0.92 to 0.95 and discriminative quality ranging from 0.92 to 0.95. All three indicate that the multimodal encoders perform well and are generalizable, despite their modest data size.
The CNN–MLP with a Softmax output layer achieved the best performance in terms of accuracy (0.95), macro F1 score (0.95), and strong agreement metrics (Cohen’s kappa = 0.92, MCC = 0.93).
The CNN–MLP + SVM was the second configuration, with an accuracy of 0.93, a macro F1 score of 0.92, and balanced accuracy and reliability metrics greater than 0.89. This similarity reaffirms that most of the discriminative power comes from the shared multimodal encoder rather than the classifier head.
The performance on the CNN + LSTM–MLP + SVM architecture was competitive, with an accuracy of 0.92 and a macro F1 score of 0.92. It is slightly less accurate than the fully convolutional models, but the AUC was 0.97, indicating excellent class separability across varying decision thresholds.
Overall, all three multimodal approaches yielded consistent, physiologically coherent results, with the CNN-MLP configuration being the most balanced, while the CNN + LSTM–MLP + SVM architecture proved to be a powerful discriminator in terms of AUC.
3.4.2. Per Class Performance
Table 7 reports the precision, recall, and F1 score per class for each architecture in the test set. In the CNN-MLP model, the control group achieved perfect recall (1.00) with high precision (0.91), while marathoners achieved high precision and perfect recall (F1 = 0.97). The MetS class achieved perfect precision (1.00), but slightly reduced recall (0.87), indicating that the few errors primarily involved MetS subjects misclassified as healthy controls. Clinically, this behavior is conservative, as it minimizes false positive MetS predictions.
The CNN-MLP + SVM model reproduced an almost identical per class pattern, confirming that the discriminative strength arises mainly from the shared multimodal encoder rather than the classifier head. The CNN + LSTM–MLP + SVM architecture also produced strong class-wise performance, though with a moderate reduction in recall for the control (0.80) and MetS (0.87) groups. In contrast, marathoners remained consistently well classified across all models, suggesting that their autonomic profiles during the OGTT are the most distinct.
Overall, the integration of RR dynamics and HRV descriptors enabled reliable separation not only between MetS and non-syndromic subjects, but also between healthy controls and endurance athletes, demonstrating that subtle autonomic differences across groups can be effectively captured through multimodal ECG analysis.
3.4.3. Contribution of HRV Parameters
In the HRV branch of multimodal models, the focus of importance analysis was on the most compact category of variables, comprising those that contributed more to the final decision (
Table 8). These three architectures ranked similarly, exhibiting the influence of global variability indices (mean RR and SDNN), sympathovagal balance (LF/HF ratio), geometric descriptors extracted from the Poincaré plot (SD1/SD2 and ellipse area
S), and nonlinear complexity measures (sample entropy and DFA-
).
It shows that the models take advantage of physiologically interpretable HRV patterns that jointly encode overall variability, frequency band modulation, and signal complexity. When combined with the temporal information in the RR sequence, these descriptors provide complementary views of autonomic regulation, thereby supporting consistent discrimination among individuals with MetS, controls, and marathoners.
3.4.4. Graphical Representation
Figure 4 summarizes the learning dynamics of the three multimodal architectures. Across all models, both training and validation curves exhibit rapid convergence during the first epochs, followed by stable behavior without divergence between loss trajectories. This pattern is consistent with controlled model capacity and adequate regularization, indicating that none of the architectures experienced marked overfitting. The CNN + LSTM–MLP backbone shows the smoothest validation loss profile, suggesting enhanced temporal generalization due to its recurrent component.
Receiver operating characteristic (ROC) curves for all classes (Control, Marathoner, and MetS) and all architectures are shown in
Figure 5. In all three models, the ROC curves consistently remain close to the upper left region of the plot, with class-wise AUC values ranging approximately from 0.93 to 0.99. Control and Marathoner profiles achieve the highest AUCs (up to 0.99), whereas the MetS class exhibits slightly lower but still excellent AUCs (0.93–0.96), reflecting a more heterogeneous autonomic phenotype. These results align with the global performance metrics reported in
Table 6 and confirm strong separability among autonomic profiles across OGTT stages.
To complement these analyses,
Figure 6 displays the normalized confusion matrices for the three multimodal models. All configurations consistently show high recall for the Marathoner class, while the CNN-MLP + SVM and CNN + LSTM-MLP + SVM models achieve perfect classification of the Control group. The remaining misclassifications predominantly affect subjects with MetS, who are occasionally assigned to the Control or Marathoner classes, in line with the partial physiological overlap between these groups. Taken together, these visual results reinforce the high discriminative capacity of the proposed multimodal architectures and the added value of integrating RR dynamics with multidomain HRV features.
4. Discussion
MetS was associated with an altered cardiac autonomic profile characterized by reduced vagal control, consistent with established physiological models and prior evidence. Participants with MetS exhibited significantly lower vagally mediated HRV indices, specifically
rMSSD and
SD1, together with reduced HF power and a relatively higher
ratio. Collectively, these findings support diminished parasympathetic modulation and a relative sympathovagal imbalance. This interpretation is consistent with previous evidence showing that MetS is associated with lower short-term HRV and reduced autonomic flexibility [
4,
10,
39].
To place these findings in context, several previous studies have systematically examined the relationship between MetS and HRV using observational, systematic review, and meta-analytic approaches. Overall, the available evidence consistently supports a reduction in vagally mediated HRV indices, lower global variability, and broader impairment in autonomic regulation among individuals with MetS. This convergence reinforces the interpretation that the autonomic alterations observed in the present study are not isolated findings, but rather part of a reproducible cardiometabolic phenotype [
4,
10,
39].
In contrast, endurance athletes demonstrated greater autonomic efficiency, evidenced by higher HF power, increased nonlinear complexity (
SampEn), and preserved
DFA-α1 dynamics in our cohort. The higher vagally mediated profile observed in athletes is in agreement with prior work showing that aerobic fitness and endurance training are associated with more favorable cardiac parasympathetic regulation and improved autonomic adaptation [
40,
41]. Healthy participants exhibited an intermediate autonomic profile between the MetS and athlete groups.
Furthermore, individuals with MetS displayed significantly lower VLF and LF power compared with controls, suggesting attenuation of longer-period oscillatory rhythms and possible impairment in baroreflex related cardiovascular modulation. This interpretation is supported by evidence indicating that metabolic syndrome is accompanied by neuroadrenergic and reflex abnormalities, including sympathetic activation and altered baroreflex control [
42]. From a physiological perspective, these baseline alterations are compatible with impaired cardiovascular autonomic regulation in MetS.
Notably, despite these baseline differences, spectral HRV indices exhibited broadly similar short-term responses to glucose ingestion during the OGTT. This suggests that although basal autonomic tone is altered in MetS, acute autonomic adjustments to a metabolic challenge may remain at least partially preserved, albeit from a less favorable autonomic baseline. Given the limited direct literature specifically addressing dynamic HRV responses to OGTT across these phenotypes, this aspect should be interpreted cautiously and warrants further study.
Consistent with the current HRV literature, we adopt the term “relative sympathovagal imbalance” rather than “sympathetic predominance,” acknowledging the physiological complexity and the ongoing debate regarding the interpretation of the
ratio as a direct index of sympathetic activity [
43]. Integrating multidomain HRV descriptors with short RR interval dynamics during the OGTT, we investigated autonomic modulation in participants and its contribution to differentiating participants with MetS, C, and M. The results indicate that time, frequency, and nonlinear domain HRV indices capture complementary facets of autonomic regulation under resting conditions and glucose stimulus induced metabolic stress.
At all OGTT phases, the M group showed the typical autonomic features of endurance-trained subjects, characterized by longer RR intervals, greater global variability (SDNN), and higher short-term parasympathetic indices (RMSSD and SD1). These findings are consistent with other evidence indicating that habitual endurance exercise induces increased vagal activity, retained complexity, and greater autonomic responsiveness [
37]. This pattern, by contrast, is consistent with MetS patients, who exhibit consistently shorter RR intervals and overall decreased variability, especially at baseline. This pattern implies sustained vagal withdrawal and very limited autonomic adjustment, and is consistent with an established pattern of studies associating metabolic dysregulation with decreased HRV amplitude and baroreflex function [
3,
4,
10].
These interpretations were also consistent with spectral analysis. MetS participants had reduced VLF and LF power during rest, confirming that metabolic dysfunction impairs slower oscillatory rhythms and disturbs the sympathetic–parasympathetic homeostatic balance. Nonlinear indicators provided insight: diminished fractal structure (DFA-
), lower long-term variability (SD2), and a decline in entropy after 120 min is indicative of a loss of physiological complexity due to glucose exposure. These results corroborate previous work demonstrating that nonlinear HRV indices are sensitive to early autonomic decline and loss of adaptive function in metabolic conditions [
9,
30].
Our multimodal deep learning results were convergent evidence. An integrative task combining RR temporal structure with HRV descriptions achieved a good discriminative performance in the three class classification tasks. All multimodal architectures had class-wise AUCs between 0.93 and 0.99, with C and M providing the highest class specificity. MetS returned slightly lower values, but similar results that reflect MetS’ rather distinct autonomic phenotype. Normalized confusion matrices showed the CNN–MLP + SVM, as well as CNN + LSTM–MLP + SVM architecture, achieved perfect recall for C and M while correctly identifying 87% of cases of MetS. Distribution of misclassification patterns with respect to C and M was consistent with the expected physiological overlap between MetS and non-MetS autonomic profiles.
The similarity of results between the Softmax and SVM classifier heads indicates that a large majority of discriminative information is encoded in the multimodal encoder. The models were also validated for the inclusion of physiologically interpretable measures such as mean RR, SDNN, LF/HF ratio, fractal exponent, and entropy based on feature importance checks, emphasizing the validity of the learned structures according to bio significance. Combined, these results indicate that integrating RR dynamics with multidomain HRV descriptors not only captures the autonomic regulation of the OGTT more effectively, but also allows for discrimination between metabolic phenotypes beyond either modality alone.
A limitation of this study is that autonomic regulation was assessed indirectly through RR interval dynamics and multidomain HRV indices derived from ECG recordings, without complementary standardized autonomic function tests such as deep breathing, Valsalva maneuver, active standing, or baroreflex sensitivity assessment. This decision was consistent with the specific aim of the study, which was to characterize cardiac autonomic responses to metabolic stimulation during the OGTT using non-invasive ECG-based markers. Therefore, the present findings should be interpreted as indirect evidence of altered cardiac autonomic modulation during OGTT rather than as a comprehensive evaluation of autonomic nervous system function. Future studies may strengthen the physiological interpretation by integrating HRV analysis with formal autonomic testing.
5. Conclusions
We show that the integration of multiple-domain HRV descriptors and short-term RR interval dynamics during OGTT captures complementary aspects of autonomic regulation, thus facilitating a clearly discriminative process among individuals with MetS, C, and M, and that resting HRV was particularly revealing in that participants who had MetS consistently demonstrated a reduced global variability, attenuation of low-frequency oscillations, reduced fractal organization, and diminished signal complexity. Together, these observations indicate a chronic impairment in autonomic cardiovascular control linked to metabolic dysregulation.
For the multimodal classifiers tested in this study, performance was impressive, with accuracies near 0.95 and macro AUCs > 0.96. The CNN-MLP model produced the most balanced behaviour amongst all three groups, and the CNN + LSTM-MLP + SVM combination displayed the most obvious separation between classes. Feature relevance analyses bolstered the physiological plausibility of these models, suggesting a consistent integration of variability indexes, spectral characteristics, and nonlinear signals that guided their selection.
In summary, this research suggests new potential for HRV-based multimodal frameworks as a non-invasive yet efficient means for early screening and stratification of MetS. When these physiological descriptors are combined with clinical characteristics such as biochemical markers, anthropometric profiles, and lifestyle indicators, predictive power can be increased, and the potential for more individualized assessment of cardiometabolic risk could be realized.
Future work should consider additional ECG-derived features beyond standard HRV metrics for PR and QT intervals, QTc formulations, ST segment properties, and beat-to-beat morphological variables. Studying the coupling of these variables might provide more subtle autonomic and electrophysiological signatures of the metabolic dysfunction. Future research must also consider model calibration, longitudinal monitoring to evaluate temporal stability, and the inclusion of individuals in the study cohort with isolated metabolic abnormalities, prediabetes, or cardiovascular comorbidities. These processes will make it more generalizable and enable the translation of these HRV-based multimodal systems into clinically relevant, practical use.