1. Introduction
Diabetes mellitus and prediabetes remain major global public health challenges. The International Diabetes Federation’s (IDF) Diabetes Atlas (11th edition) estimates that 11.1% of adults aged 20–79 years (approximately 589 million people) had diabetes in 2025, and projects an increase to about 853 million by 2050 (approximately 12.5% of adults) [
1]. Importantly, more than 40% of adults with diabetes are estimated to be undiagnosed, highlighting persistent gaps in early detection of impaired glucose metabolism [
1]. From the perspective of primary prevention of type 2 diabetes, identification at the prediabetes stage is particularly important. In the Diabetes Prevention Program (DPP), intensive lifestyle intervention and metformin reduced the risk of developing type 2 diabetes by 58% and 31%, respectively, compared with placebo among individuals with impaired glucose tolerance [
2]. These findings underscore the importance of identifying high-risk individuals at early stages and linking them to effective interventions.
Fasting-plasma glucose (FPG) and glycated hemoglobin (HbA1c) are widely used for screening impaired glucose metabolism, but both require invasive blood sampling and are typically obtained at discrete time points in clinical practice, which may present practical challenges for frequent, large-scale primary screening outside clinical settings.
Moreover, HbA1c reflects average glycemic exposure over the preceding 2–3 months and does not capture day-to-day glycemic variability (GV). Previous studies have highlighted the clinical relevance of GV beyond mean glucose levels; Suh and Kim summarized common GV metrics, such as SD, MAGE, and CV, and discussed associations with oxidative stress and endothelial dysfunction [
3], while Monnier et al. reported that acute glucose fluctuations were more strongly associated with oxidative stress markers than sustained hyperglycemia [
4]. Although continuous glucose monitoring (CGM) enables detailed assessment of glycemic profiles, including GV, it requires subcutaneous sensor placement and remains less suitable for broadly accessible, low-cost primary screening. These limitations motivate the search for noninvasive indicators that can be deployed at scale.
Autonomic nervous system (ANS) function has emerged as a potential target for the early identification of metabolic risk. Glucose homeostasis is regulated in part by the balance between sympathetic and parasympathetic activity, and autonomic imbalance has been implicated in insulin resistance and the pathophysiology of type 2 diabetes. Population-based studies suggest that autonomic dysfunction may function not merely as a downstream consequence of diabetes but also as an aggravating factor or a parallel pathological process. For example, the Toon Health Study reported that reduced heart rate variability (HRV) and relative sympathetic dominance modified the association between insulin resistance and metabolic syndrome [
5].
Cardiac autonomic neuropathy (CAN) has also been documented in individuals with prediabetes, with reported prevalence estimates ranging from 9 to 38%. Moreover, large cohort studies, such as the Cooperative Health Research in the Region of Augsburg (KORA) S4, have demonstrated higher rates of CAN in prediabetes compared with normal glucose tolerance [
6,
7]. Consistent with these findings, recent reviews have further supported the notion that autonomic dysfunction may precede the clinical onset of diabetes [
8].
Heart rate variability (HRV) is widely used as a noninvasive surrogate marker of autonomic function, with established physiological interpretations described in the Task Force guideline and subsequent reviews [
9,
10]. Epidemiological evidence consistently links reduced HRV to adverse glucose metabolism. Meta-analyses have reported lower HRV across multiple indices in type 2 diabetes and metabolic syndrome, including SDNN (standard deviation of normal-to-normal intervals) and RMSSD (root mean square of successive differences between consecutive RR intervals, estimated here from PPG-derived inter-beat intervals), total power, and HF power (high-frequency power, typically 0.15–0.40 Hz) [
11,
12]. Population-based studies, such as the Maastricht Study, have further shown that HRV indices are reduced not only in type 2 diabetes but also in prediabetes compared with normal glucose metabolism [
13]. Longitudinal studies suggest that temporal patterns of HRV provide additional prognostic information; for example, HRV trajectories predicted diabetes onset in the Rotterdam Study [
14], and multiple HRV indices independently predicted diabetes risk in ELSA-Brasil [
15]. These findings motivate attention to dynamic autonomic patterns rather than single timepoint summaries.
Associations between autonomic function and glucose metabolism may be particularly evident during sleep, when major daytime confounders such as physical activity, diet, and acute psychological stress are minimized. Experimental studies have demonstrated that selective slow-wave sleep suppression reduces insulin sensitivity by approximately 25% [
16], and that sleep fragmentation worsens glucose metabolism via sympathetic activation [
17]. Epidemiological evidence further links sleep characteristics with glycemic markers and diabetes risk, including associations between poor sleep quality and higher HbA1c [
18], and U-shaped relationships between sleep duration and diabetes risk [
19].
Importantly, a study using 24-h ECG recordings in patients with type 2 diabetes demonstrated that sleep-stage-specific HRV indices—particularly nonlinear measures analyzed separately for REM and NREM sleep—were significantly associated with metabolic function and glycemic control [
20].
Together, these observations provide physiological and methodological support for focusing on sleep-time HR and HRV dynamics.
Recent advances in consumer wearable sensors enable longitudinal monitoring of physiological signals such as heart rate (HR) and HRV in daily life. Large-scale studies have linked wearable-derived data to chronic disease outcomes, including type 2 diabetes [
21,
22]. Although pulse rate variability (PRV) derived from photoplethysmography (PPG) does not perfectly match ECG-derived HRV, validation studies support PPG-based indices as practical surrogates under appropriate conditions. Agreement between PRV and HRV has been reported at rest and during free-living conditions [
23,
24], with particularly high accuracy during sleep when motion artifacts are minimized [
25,
26]. These findings support the feasibility of sleep-time wearable monitoring as a noninvasive approach to assessing autonomic dynamics related to glycemic status.
The primary objective of this exploratory study is to identify sleep-derived HR/HRV features—particularly dynamic features capturing variability, trends, and instability—that are associated with glucose metabolism status. We hypothesize that dynamic autonomic indices, such as overnight trends in HRV dispersion and patterns of HR variability, show stronger associations with glycemic markers than conventional static mean values. To address this objective, we apply LASSO (Least Absolute Shrinkage and Selection Operator)-based feature selection to screen candidate features and conduct between-group comparisons to assess effect sizes, with the aim of generating hypotheses for confirmatory investigation in larger cohorts.
2. Materials and Methods
2.1. Participants and Study Design
This prospective observational study enrolled adults who wore continuous glucose-monitoring (CGM) devices and wrist-worn wearable devices simultaneously under free-living conditions. Twenty-one adults were initially enrolled in the study. Three participants were excluded due to incomplete data (insufficient valid sleep-time HRV recordings or CGM data). The final analytical sample comprised 18 participants with valid sleep-time heart rate variability (HRV) data (9 males and 9 females; mean age: 39.6 ± 10.4 years; mean body mass index (BMI): 22.7 ± 2.3 kg/m2) were included in the analysis. Based on preliminary screening, participants had no confirmed diagnosis of diabetes, no history of serious cardiovascular disease (e.g., heart failure, severe arrhythmia, ischemic heart disease), and no skin conditions or allergies that would interfere with device wear.
This study was approved by the Institutional Review Board of The University of Osaka (protocol code R4-17).
Before participation, all participants received oral and written explanations of the study objectives, procedures, anticipated risks and benefits, and privacy protections. Participation was voluntary, and participants could withdraw at any time without disadvantage. Written informed consent was obtained from all participants. Data were managed using anonymized research IDs and analyzed and reported in a non-identifiable form.
Participants were instructed to wear the devices continuously for up to 15 days during daily life. Day 1 served as an acclimation period; data collected from Day 2 onward were used for analysis.
Continuous Glucose Monitoring: Abbott FreeStyle Libre sensors were worn on the upper arm to record interstitial glucose continuously at approximately 15 min intervals. CGM systems, including FreeStyle Libre, measure interstitial glucose via electrodes and transmit values to receivers or smartphones at intervals ranging from several minutes to 15 min [
27,
28,
29]. Glucose values with timestamps were extracted using the pipeline described below and used to compute daily mean glucose and the glucose management indicator (GMI; estimated HbA1c).
Wearable Device (Heart Rate, Activity, and HRV): Heart rate (HR) and HRV were measured using Fitbit Charge 6 wrist-worn activity trackers (Google LLC, Mountain View, CA, USA). The device employs an optical heart rate sensor based on wrist reflectance photoplethysmography (PPG), for which green light (wavelength approximately 525 nm) is commonly used in consumer wearables to achieve favorable signal quality at the wrist while reducing susceptibility to motion artifacts compared with longer-wavelength light sources, such as red or infrared LEDs [
30,
31,
32,
33].
The device internally samples the PPG signal at a frequency sufficient for beat-to-beat interval detection. For research analysis, the data are downsampled and aggregated: heart rate values are recorded at 1-s intervals during exercise and 5-s intervals during standard monitoring (including sleep). HRV metrics (specifically RMSSD) are computed by the device over non-overlapping 5-min windows based on high-resolution beat-to-beat intervals and are reported as discrete values at 5-min intervals [
33,
34]. Sleep period detection was performed using the Fitbit device’s proprietary sleep-staging algorithm. Although the specific algorithm implemented in the Charge 6 has not been independently validated in peer-reviewed literature, previous validation studies of earlier Fitbit models with sleep-staging capabilities against polysomnography have reported high sensitivity (0.95–0.96) for detecting sleep epochs [
34].
Sleep periods were defined using sleep onset and wake time estimates (StartTime and EndTime) provided by the device. For each participant and night, all physiological time-series data (HR and HRV) were temporally aligned, and only data points falling within the interval between the estimated sleep onset and wake time were extracted and analyzed as sleep-period data. No sleep-stage-specific classification was applied; sleep periods were treated in a stage-agnostic manner based solely on the device-estimated sleep onset and offset.
2.2. Feature Extraction: Two-Stage Approach
Features were extracted using a two-stage hierarchical approach: (1) window-level statistics computed within each 30 min sleep window, and (2) night-level aggregation across all windows within a sleep session.
2.2.1. Stage 1: Window-Level Feature Extraction
Sleep periods were segmented into non-overlapping 30 min windows. For each window w, we computed descriptive statistics for heart rate (HR) and log-transformed RMSSD (ln RMSSD) based on the time series x sampled within that window.
For each 30 min window w, let x denote the subwindow time series consisting of all HR or (ln RMSSD) samples observed within the window. The following within-window descriptors were computed from this subwindow time series: the mean, defined as the average value of x within the window; the standard deviation, defined as the dispersion of x around its window-specific mean; and the variance, defined as the squared standard deviation, reflecting the overall variability of x within the window.
To characterize short-term monotonic trends, we fitted a linear model to the subwindow time series
x within each 30 min window
w:
where
denotes the elapsed time in hours from the start of window
w (range: 0–0.5 h). The parameters are defined as follows:
(Slope): This represents the rate of change in the physiological metric x per hour, quantifying the direction and magnitude of the monotonic trend within the window.
(Intercept): This parameter represents the fitted baseline value of the metric at the exact start of the window (). Physiologically, this estimates the initial level of the autonomic parameter for that specific epoch, filtering out the subsequent temporal drift.
(Residual): Residual represents the deviation of the observed data point from the fitted linear trend at time i. This term captures higher-frequency fluctuations and short-term variability (such as instantaneous autonomic responses to micro-arousals) not explained by the linear trajectory.
The estimated slope has units of change per hour (e.g., bpm/h for HR and ln(ms)/h for ). Here, bpm/h (beats per minute per hour) denotes the change in heart rate (measured in beats per minute) per hour of elapsed time. For example, bpm/h means that HR increases by 2 bpm for every hour of elapsed time within the 30 min window. Similarly, ln(ms)/h denotes the change in per hour of elapsed time.
As a complementary, nonparametric measure of trend magnitude, we computed the endpoint-to-endpoint rate of change within each window as
where
denotes the start time of window
w. Accordingly,
has units per minute (e.g., bpm/min for HR).
2.2.2. Stage 2: Night-Level Aggregation
For each participant-night d, multiple valid non-overlapping 30 min sleep windows were retained. Let denote the number of retained windows for night d, and let denote a window-level descriptor computed from heart rate (HR) or natural-log-transformed RMSSD (lnRMSSD) in window .
Window-level descriptors were aggregated across windows to construct night-level features that characterize the overall level, temporal evolution, and across-window variability of autonomic dynamics during the sleep session.
To summarize the overall level and across-window variability of a window-level descriptor during a sleep session, we computed the nightly mean and the nightly standard deviation across all retained windows within night d. The nightly mean represents the typical level of the descriptor over the night, whereas the nightly standard deviation quantifies window-to-window variability. The latter was primarily used for dynamic window-level descriptors, such as within-window linear slopes and endpoint-to-endpoint speeds, which explicitly reflect short-term fluctuations within each 30 min window.
To capture systematic changes in window-level descriptors over the course of sleep, we regressed
on elapsed time from the first retained window of the night:
where
denotes the elapsed time in minutes from the start of the first retained window of night
d.
The slope , referred to as the nocturnal trend, quantifies whether the window-level descriptor systematically increases or decreases over the night. The nocturnal trend has units per minute (e.g., bpm/min for HR, ln(ms)/min for ) because is expressed in minutes. This differs from the within-window slopes , which use elapsed time in hours and thus have units per hour.
For LASSO-based feature screening, night-level features were constructed as follows: For window-level descriptors reflecting distributional properties within each 30 min window (window-wise mean, standard deviation, variance, minimum, and maximum of HR and lnRMSSD), we used the nightly mean and the nocturnal trend. For dynamic window-level descriptors characterizing short-term changes within each window (within-window linear slope and endpoint-to-endpoint speed), we used the nightly mean and the nightly standard deviation across windows.
A one-to-one correspondence between these manuscript definitions and the feature identifiers used in the analysis is provided in
Supplementary Table S1.
2.3. Glucose Metabolism Status Definition and Outcome Labeling
2.3.1. Estimated HbA1c (eHbA1c) from CGM Mean Glucose
For each participant, we computed an HbA1c-equivalent index from the participant-level mean CGM glucose (mg/dL) using the A1c-Derived Average Glucose (ADAG) relationship:
This expression is a rearrangement of the ADAG regression linking laboratory-measured HbA1c to average glucose, providing an HbA1c-equivalent summary derived from mean glucose [
35].
The glucose management indicator (GMI) has been proposed as an alternative CGM-based index that expresses mean glucose on an HbA1c-like scale using a CGM-specific regression [
36]. Because eHbA1c/GMI may differ from laboratory HbA1c due to inter-individual factors (e.g., variability in erythrocyte lifespan and glycation kinetics), such CGM-derived indices should be interpreted as HbA1c-equivalent proxies rather than diagnostic measurements [
37].
2.3.2. Outcome Labeling for Machine Learning
We defined a screening-oriented binary outcome at the participant level using a CGM-derived eHbA1c threshold of 5.5%. Participants with eHbA1c < 5.5% were classified as the lower-glycemic-risk group, whereas those with eHbA1c ≥ 5.5% were classified as the higher-glycemic-risk group. This threshold was selected for risk stratification based on evidence from Japanese cohorts evaluating HbA1c cutoffs for future type 2 diabetes risk (Hisayama Study) [
38] and supported by longitudinal findings reporting substantially elevated diabetes incidence within the HbA1c 5.7–6.4% range (TOPICS 3) [
39]. We emphasize that these group labels are derived from a CGM-based HbA1c-equivalent proxy and are used solely to operationalize an outcome for screening-oriented research; they do not constitute a clinical diagnosis.
We selected 5.5% as a conservative, sensitivity-prioritized cutoff for early-stage screening in an apparently healthy population based on the following considerations.
- 1.
Japanese epidemiological evidence. In the Hisayama Study, a population-based prospective cohort of Japanese adults (
; 7-year follow-up), individuals with baseline HbA1c 5.5–5.9% had approximately a 4.8-fold higher risk of developing type 2 diabetes than those with HbA1c < 5.5% (95% CI: 2.0–11.6) [
38]. Similarly, the Toranomon Hospital Health Management Center study (TOPICS 3;
) reported substantially elevated annual diabetes incidence in the 5.7–6.4% range [
39].
- 2.
Asian-specific risk profiles. Meta-analyses suggest that Asian populations may exhibit increased diabetes risk at lower HbA1c levels than Western populations, potentially due to differences in beta-cell function and insulin sensitivity [
40]. This supports adopting a lower screening threshold in Japanese cohorts.
- 3.
Screening philosophy. For primary screening aimed at identifying individuals who may benefit from early lifestyle intervention, prioritizing sensitivity is methodologically appropriate. The 5.5% cutoff may help detect early glycemic dysregulation before progression to the ADA-defined prediabetes range (5.7–6.4%) or overt diabetes (≥).
We acknowledge that this threshold is lower than the ADA prediabetes criterion (5.7%) [
41]. Therefore, individuals identified as screen-positive by the model would require confirmatory laboratory testing (measured HbA1c and/or OGTT) before clinical classification.
2.4. Feature Selection: LASSO-Based Variable Screening
The primary analytical objective was to identify candidate HR/HRV features associated with glucose metabolism indicators.
Nocturnal mean glucose (continuous) at the participant–night level was used as the outcome variable for feature screening.
Candidate predictors included 28 HR/HRV-derived variables together with demographic covariates (age and BMI). Variables with more than 50% missing values or zero variance were excluded. The remaining values were imputed using a two-stage procedure: within-subject mean imputation followed by overall mean imputation. All predictors were standardized to have zero mean and unit standard deviation prior to model fitting.
Elastic Net regression was applied with
, corresponding to equal weighting of LASSO and ridge penalties. LASSO is a regularized regression technique that adds an L
1 penalty term (
) to the standard least squares objective function. This penalty shrinks the coefficients of less informative predictors toward zero and, critically, forces some to become exactly zero, thereby performing automatic feature selection. However, when predictors are highly correlated (as HR/HRV features often are due to shared physiological origins), LASSO may arbitrarily select one variable and ignore others. Elastic Net addresses this by combining the L
1 penalty with the L
2 penalty of Ridge regression (
), providing both sparsity (feature selection) and robustness to multicollinearity [
42,
43]. The regularization parameter
was selected using 5-fold cross-validation. Features with non-zero coefficients at the optimal value of
were retained as candidate correlates of glucose metabolism.
2.5. Between-Group Comparisons
Independent of the feature screening stage, we conducted univariate between-group comparisons at the participant level to assess whether individual HR/HRV features differed between the lower- and higher-glycemic-risk groups. Because group sizes were unequal and homogeneity of variance could not be assumed, Welch’s t-test was used for continuous variables. Effect sizes were quantified using Cohen’s d to complement -values and to facilitate interpretation of the magnitude and practical relevance of between-group differences.
2.6. Statistical Analysis
All analyses were performed in MATLAB R2025a. Baseline demographic characteristics were compared between groups using Welch’s t-test for continuous variables and Fisher’s exact test for categorical variables, as appropriate. Statistical significance was assessed at a two-sided threshold of .
Given the exploratory nature of this study and the limited sample size, we report unadjusted p-values and emphasize effect sizes and consistency of findings across related features. Accordingly, statistical results should be interpreted as hypothesis-generating rather than confirmatory, and any potentially informative signals identified here warrant validation in larger, independent cohorts.
3. Results
This section first reports the results obtained using the dynamic autonomic indices newly introduced in this study. These indices, derived from sleep-period heart rate (HR) and heart rate variability (HRV) signals—such as HR change speed and slopes of HRV-related measures—were designed to capture temporal trends and fluctuations in autonomic regulation during sleep. We found that these dynamic indices showed stronger associations with individual glucose metabolism status than conventionally used static mean HR/HRV indices.
This section then presents participant characteristics, results of feature selection, classification performance, and between-group comparisons. Overall, the dynamic autonomic indices, which reflect time-varying sympathetic and parasympathetic activity, demonstrated higher explanatory power for nocturnal mean glucose than static summary measures, supporting their potential utility for screening-oriented assessment of glycemic risk.
3.1. Participant Characteristics
The analytical sample comprised 18 participants contributing 189 person–nights. In total, 7 participants (38.9%) were classified as the higher-glycemic-risk group (eHbA1c ≥ 5.5%), and 11 participants (61.1%) as the lower-glycemic-risk group (eHbA1c < 5.5%). Age and BMI did not differ significantly between groups (both ).
The mean glucose and eHbA1c were significantly higher in the higher-glycemic-risk group (both
;
Table 1).
Figure 1 shows the between-group comparison of eHbA1c by glycemic status.
3.2. Feature Selection Results: Dynamic Features Show Stronger Associations
Elastic Net regression (
; 5-fold CV) identified 14 features with non-zero coefficients from 28 initial HR/HRV variables plus demographics (
Figure 2).
3.2.1. Demographic Covariates
Age exhibited the largest standardized coefficient (), indicating a strong positive association with nocturnal mean glucose. BMI also showed a positive contribution. These associations are consistent with established epidemiological evidence linking older age and higher BMI to impaired glucose metabolism.
3.2.2. HRV Dynamic Indices
Among the dynamic features derived from , three indices retained non-zero coefficients in the feature selection procedure.
The first was the nightly mean of the within-window linear slopes, denoted as . This index represents the average linear slope of computed within individual 30 min sleep windows and then averaged across all windows within a night. As such, it summarizes the typical short-term change in parasympathetic activity during sleep, with positive values indicating that tended to increase within windows on average.
The second index was the nocturnal trend in the within-window standard deviation,
. This feature was obtained by regressing the window-level standard deviation of
,
, on elapsed time across the night (Equation (
3)). It captures whether HRV variability systematically increases (
) or decreases (
) as sleep progresses.
The third index, , represents the nocturnal trend of the within-window variance of . Analogous to the standard-deviation-based index, this measure was computed using the window-level variance and reflects changes in the overall dispersion of parasympathetic activity over the course of the night.
3.2.3. HR Dynamic Indices
In addition to HRV-based measures, three dynamic features derived from heart rate (HR) also retained non-zero coefficients in the feature selection procedure.
One such feature was the nocturnal trend of the within-window HR variance, denoted as . This index was obtained by regressing the window-level HR variance, , on elapsed time across the night. It captures whether short-term HR dispersion systematically increases () or decreases () as sleep progresses.
Another retained feature was the nightly mean of the within-window HR variance, , which reflects the average magnitude of short-term HR variability across all 30 min sleep windows within a night. This measure summarizes the overall level of HR instability during sleep.
The third feature was the nightly standard deviation of the within-window HR speed, . This index quantifies how variable the rate of HR change is from window to window over the course of the night, with higher values indicating more heterogeneous dynamics, characterized by rapid HR changes in some windows and minimal changes in others.
3.3. Between-Group Differences in Dynamic Features
To assess effect sizes independently of the LASSO-based feature screening procedure, we performed univariate between-group comparisons at the subject level (
Figure 3).
Two HRV dynamic indices exhibited statistically significant between-group differences with large effect sizes. Specifically, the nocturnal trend of the within-window standard deviation of , , differed significantly between groups (, Cohen’s ). Similarly, the nocturnal trend of the within-window variance of , , also showed a significant difference (, Cohen’s ). In both cases, the magnitude of the effect sizes exceeded conventional thresholds for large effects.
3.4. Summary of Key Findings
Table 2 summarizes the candidate features identified through LASSO screening and their between-group comparison results.
The direction of these effects was consistent across indices. Participants in the lower-glycemic-risk group exhibited negative nocturnal trends (), indicating that HRV variability decreased over the course of the night. This pattern is consistent with progressive autonomic stabilization during sleep, particularly reflecting parasympathetic modulation. In contrast, participants in the higher-glycemic-risk group showed positive nocturnal trends (), indicating increasing HRV variability overnight, a pattern suggestive of persistent autonomic instability or disrupted autonomic regulation during sleep.
By contrast, the remaining features—including HR-derived dynamic indices, the nightly mean of within-window slopes , age, and BMI—did not differ significantly between groups at the subject level (all ).
4. Discussion
This exploratory study identified candidate sleep-derived HR/HRV features that may be associated with glucose metabolism status. The key finding is that dynamic autonomic indices—capturing overnight trends and variability patterns—showed stronger associations with nocturnal glucose than conventional static mean values. Two HRV dynamic features (the nocturnal trends and ) differed significantly between glycemic status groups with large effect sizes (), providing converging evidence for their potential physiological relevance.
4.1. Physiological Interpretation of Dynamic HR/HRV Features
The prominence of dynamic indices in our feature selection results aligns with growing recognition that temporal patterns of autonomic activity—not just absolute levels—carry information about metabolic health. Prior longitudinal evidence supports this interpretation: the Rotterdam Study reported that changes in HRV trajectories over time predicted incident type 2 diabetes [
14], and ELSA-Brasil found that unfavorable HRV patterns independently predicted diabetes risk [
15].
A static mean HRV (e.g., ) reflects the overall level of parasympathetic activity. In contrast, the nocturnal trend of HRV variability () captures whether autonomic regulation stabilizes or remains unstable as sleep progresses. This distinction may be clinically meaningful: healthy sleep is characterized by progressive autonomic stabilization, particularly during deep non-rapid eye movement (NREM) sleep, whereas disrupted sleep architecture may manifest as persistent variability.
4.2. Interpretation of the Observed Directional Pattern
The directional pattern observed in this study—namely, a negative nocturnal trend in HRV variability () in the lower-glycemic-risk group contrasted with a positive trend () in the higher-glycemic-risk group—may be interpreted in light of several plausible physiological mechanisms.
Sleep architecture disruption may play an important role in the observed patterns. NREM sleep is generally characterized by parasympathetic dominance and relatively stable HRV, whereas rapid eye movement (REM) sleep shows greater sympathetic influence [
44]. In healthy individuals, the predominance of deep NREM sleep early in the night may lead to progressive autonomic stabilization (
). Disruptions in sleep-stage continuity—commonly observed in metabolic dysfunction—could attenuate this stabilization pattern, resulting in
or
.
Additionally, micro-arousals and sleep fragmentation warrant consideration as potential contributors to the directional differences between groups. Experimental studies have shown that sleep fragmentation shifts autonomic balance toward sympathetic dominance and reduces insulin sensitivity [
17]. The increasing HRV variability (
) observed in the higher-glycemic-risk group may reflect more frequent micro-arousals or disrupted sleep transitions, which transiently perturb autonomic regulation and increase within-window HRV dispersion.
Circadian autonomic dysregulation may also contribute to the observed patterns, potentially interacting with the mechanisms described above. Prior reviews have summarized links between sleep disruption, autonomic dysfunction, and insulin resistance [
45,
46]. The nocturnal trend of HRV variability may integrate these interacting processes into a single quantitative marker that captures the overall quality of overnight autonomic regulation.
4.3. Consistency with Prior Cohort Studies
The present findings are consistent with evidence from larger population-based cohorts. The Maastricht Study (
) reported significantly lower HRV indices in individuals with prediabetes and type 2 diabetes [
13]. In addition, Cheng et al. reported that sleep-time HRV measures were more strongly correlated with metabolic markers than waking-period measures [
20]. Although the present study is smaller in scale, this consistency supports the rationale for focusing on sleep-time HR/HRV dynamics as candidate correlates of glycemic status.
4.4. Why Dynamic Features Were Selected over Static Means
The LASSO-based screening procedure preferentially selected dynamic indices—such as nocturnal trends and nightly variability of within-window slopes and speeds—over static mean measures. This pattern may reflect several complementary considerations. First, absolute HRV levels vary substantially between individuals due to factors that are not necessarily related to glycemic status, including physical fitness, resting heart rate, and genetic influences. In contrast, overnight trends may be less sensitive to these baseline differences, potentially providing a more direct window into physiological regulation linked to metabolic status.
Second, static mean values compress an entire sleep period into a single summary statistic and therefore discard information about temporal evolution. Dynamic indices preserve how autonomic activity changes across the night, which may be more tightly coupled to metabolic processes that also vary over the sleep cycle, including state-dependent shifts in autonomic balance and sleep-stage transitions.
Finally, these observations are consistent with theoretical perspectives on autonomic–metabolic interactions, which emphasize autonomic flexibility and the capacity to regulate appropriately across different physiological states [
47]. Dynamic indices such as
may better reflect this regulatory capacity than static averages, thereby offering greater sensitivity to subtle dysregulation.
Furthermore, the concept that dynamic variability patterns contain information beyond static averages has been observed in other modalities. For example, Piersanti et al. (2025) demonstrated that variability metrics of wrist skin temperature, rather than mean skin temperature, serve as novel digital biomarkers of glycemia [
48]. This parallel finding suggests that metabolic dysregulation may manifest as subtle instability across multiple physiological systems, reinforcing the value of focusing on dynamic features in wearable-based assessments.
4.5. Limitations
This study has several important limitations that constrain interpretation of the findings. The sample size was small, with only 18 participants, which limits statistical power and precludes definitive inference. Accordingly, the identified associations should be regarded as preliminary signals rather than confirmatory evidence. In line with this exploratory aim, p-values were reported without correction for multiple comparisons and should be interpreted with appropriate caution.
The use of LASSO-based feature selection in a small-sample setting also introduces potential instability. When the number of predictors is large relative to the sample size, LASSO coefficients may be sensitive to data partitioning, and the specific features selected can vary across resamples. Approaches such as stability selection or bootstrap-based validation would strengthen confidence in the robustness of the identified candidate features.
In addition, this study employed a single-cohort design based on a convenience sample drawn from a Japanese population. As a result, the generalizability of the findings to other populations, age ranges, or ethnic groups remains uncertain, and external validation in independent cohorts is essential before broader conclusions can be drawn.
Glucose metabolism status was operationalized using CGM-derived eHbA1c rather than laboratory-measured HbA1c. Although the ADAG relationship linking mean glucose to HbA1c is well validated at the population level [
35], individual-level discrepancies may arise due to biological variability in glycation kinetics and red blood cell lifespan. Future studies should incorporate laboratory HbA1c measurements as the reference standard. It should also be noted that both groups fell within a relatively narrow range of estimated HbA1c values near the prediabetes threshold. Such range restriction may attenuate observable between-group differences, and the identified associations should therefore be interpreted as hypothesis-generating rather than definitive.
Additionally, sex distribution differed marginally between groups (), with a higher proportion of males in the higher glycemic risk group (6/7) compared with the lower glycemic risk group (3/11). Given known sex differences in both HRV and glucose metabolism, this imbalance represents a potential confound. Future studies should aim for sex-balanced recruitment or include sex as a covariate in multivariable models.
Finally, HR and HRV features were derived from consumer-grade wearable devices using photoplethysmography (PPG), which has known limitations compared with electrocardiography (ECG)-based measurements [
23]. Although focusing on sleep periods and time-domain indices, such as RMSSD, may mitigate some sources of noise, measurement constraints inherent to consumer wearables should be considered when interpreting the results.
Furthermore, the sleep periods analyzed in this study were identified using the proprietary sleep-staging algorithm implemented in the Fitbit device. Although validation studies of earlier Fitbit models with sleep-staging capabilities have reported high sensitivity (0.95–0.96) for detecting sleep epochs against polysomnography [
34], specificity for distinguishing quiet wakefulness from sleep has been shown to range from approximately 0.58 to 0.69 in these validation studies [
34]. As a result, some epochs classified as sleep may include brief periods of wakefulness. Such misclassification could introduce noise into sleep-stage-agnostic HR/HRV estimates, although this effect is expected to be attenuated by our use of multi-night aggregation and by focusing on relative nocturnal trends rather than absolute values.
4.6. Future Directions
The exploratory findings of this study highlight several avenues for future confirmatory research. First, external validation in larger and independent cohorts represents a critical next step. Replication using samples of at least 50–100 participants, ideally with laboratory-measured HbA1c, will be necessary to evaluate the robustness and generalizability of the identified associations.
Second, the application of stability selection approaches may strengthen confidence in feature identification. For example, running LASSO-based screening across multiple bootstrap subsamples and quantifying selection frequencies would allow for assessment of the robustness of candidate features against sampling variability.
Third, longitudinal follow-up studies are needed to clarify the prognostic value of the proposed dynamic autonomic indices. Prospective designs examining whether these features predict future progression toward impaired glucose metabolism would help distinguish markers of current glycemic status from early indicators of disease development.
Finally, mechanistic investigations integrating wearable-derived autonomic measures with polysomnography could provide insight into the physiological substrates underlying the observed patterns. Such studies may clarify whether nocturnal HRV variability trends are related to specific aspects of sleep architecture, such as micro-arousal frequency or slow-wave sleep duration.
4.7. Night-to-Night Variability and Clinical Implications
Night-to-night variability is a critical consideration for any wearable-based screening approach, as sleep-time autonomic dynamics are influenced by multiple time-varying factors including sleep architecture (which varies naturally from night to night), prior-day physical activity, dietary intake, psychological stress, circadian phase, and environmental conditions. We acknowledge that substantial within-subject night-to-night variability exists in the extracted HR/HRV features.
The implications differ depending on whether assessment is based on single nights or multi-night aggregation. For individual-night classification, high variability would limit reliability. A single-night assessment might yield “low risk” one night and “high risk” another due to transient factors unrelated to underlying glycemic status. We explicitly caution that single-night assessments should not be used for definitive risk stratification. In contrast, for multi-night aggregation, averaging features across multiple nights (as we did at the participant level, approximately 10.5 nights per participant) attenuates transient noise and reveals more stable, participant-level autonomic patterns. This is analogous to averaging multiple blood pressure or fasting glucose measurements to improve diagnostic accuracy. Our between-group comparisons using participant-level aggregated features showed large effect sizes (Cohen’s ), suggesting that the signal-to-noise ratio improves substantially with multi-night averaging.
Emerging evidence suggests that intra-individual variability itself may carry prognostic information. In cardiovascular epidemiology, visit-to-visit variability in blood pressure predicts adverse outcomes independently of mean levels. By analogy, the magnitude of night-to-night fluctuations in autonomic features could reflect autonomic instability or impaired regulatory capacity linked to metabolic dysfunction.
We hypothesize that the dynamic trend features proposed in this study may be more robust to night-to-night variability than conventional static mean values. Static means are highly sensitive to absolute levels, which can vary substantially with prior-day activity or stress. In contrast, dynamic trends capture the “shape” of overnight recovery—the direction and pattern of autonomic change—which may reflect more fundamental regulatory processes that are less affected by transient perturbations.
For practical implementation, we propose a two-stage screening workflow: (1) initial screening via multi-night wearable-based assessment (aggregating data over 7–14 nights) to identify at-risk individuals; (2) confirmatory laboratory testing (measured HbA1c and/or oral glucose tolerance test) for those flagged as screen-positive. This approach leverages the scalability of consumer devices while maintaining diagnostic rigor.
Finally, we note that the “first-night effect” commonly observed in laboratory polysomnography is substantially mitigated by wearable monitoring in free-living conditions. Furthermore, our protocol explicitly excluded Day 1 as an acclimation period, ensuring that analyzed nights reflect habitual sleep patterns.
Future studies should (i) determine the optimal aggregation period (minimum number of nights required for stable feature estimates), (ii) assess whether within-subject variability metrics provide incremental predictive value beyond mean feature levels, and (iii) establish the test-retest reliability of dynamic features over longitudinal periods.
5. Conclusions
In this study using consumer wearables and CGM under free-living conditions, we examined whether sleep-derived autonomic signatures are associated with glucose metabolism status. By extracting window-level and night-level HR/HRV descriptors and applying Elastic Net feature screening, we found that dynamic autonomic indices—capturing overnight trends and changes in variability—were more informative than conventional static mean measures for explaining nocturnal mean glucose.
Among the candidate features, the nocturnal trends of within-window HRV variability quantified from emerged as the most salient. Specifically, (the nocturnal trend of within-window standard deviation) and (the nocturnal trend of within-window variance) not only retained non-zero coefficients during feature screening but also differed significantly between the lower- and higher-glycemic-risk groups with large effect sizes (both ; Cohen’s ). The directional pattern was consistent across both indices: HRV variability tended to decrease overnight in the lower-glycemic-risk group (), whereas it tended to increase in the higher-glycemic-risk group (). This contrast is physiologically plausible and aligns with prior evidence linking autonomic dysregulation and sleep-related instability to impaired glucose metabolism.
These results should be interpreted as hypothesis-generating given the small sample size, the single-cohort design, and the use of CGM-derived eHbA1c as an HbA1c-equivalent proxy. Nevertheless, the present findings suggest that sleep-time patterns of autonomic regulation—particularly nocturnal trends in HRV variability—may provide a practical, non-invasive set of candidate markers for screening-oriented assessment of glycemic risk using widely available consumer devices. Confirmatory studies in larger and independent cohorts, incorporating laboratory-measured HbA1c (and, where feasible, polysomnography or validated sleep metrics), are needed to evaluate generalizability, refine feature definitions, and determine whether these dynamic indices provide prognostic information for future progression to impaired glucose metabolism.