4.1. Hypothesis Testing
H1. Caregiver burden is hypothesized to be associated with wearable-derived features across sleep, heart rate, and activity domains.
Supported. Feature screening identified several wearable-derived variables showing exploratory associations with caregiver burden, and the most consistent candidate signals were observed for sleep variability and heart rate measures. In particular, REM sleep variability showed the largest observed correlation in the screening analysis (ρ = −0.32, p = 0.004, q = 0.051), although it narrowly missed the conventional FDR threshold. Light and total sleep variability demonstrated moderate associations at the nominal level (p < 0.05), but these did not remain significant after FDR correction. Maximum daily heart rate also showed a positive adjusted association with caregiver burden in the multivariable model (β = 2.59, p = 0.005). Notably, maximum daily heart rate did not show a significant association in the bivariate correlation analysis but was associated with ZBI after adjustment for other variables; this finding should be interpreted in the context of the non-significant overall model test and modest adjusted R². This pattern suggests that certain physiological signals may become informative only after accounting for shared variance with other features, highlighting the importance of cautious multivariate interpretation when analyzing wearable-derived data.
These exploratory associations were observed across both correlation-based screening and regression analyses. Rather than indicating a single dominant predictor, the results suggest that caregiver burden may be reflected across multiple physiological domains, including sleep patterns and heart rate responses. This convergence across analytical approaches supports the interpretation that wearable-derived signals may capture meaningful but incomplete aspects of caregiving-related stress, although the strength and consistency of these associations vary across features.
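The FDR correction referred to above can be illustrated with a minimal sketch of the Benjamini-Hochberg procedure. The study does not state which FDR method or software was used, so the choice of Benjamini-Hochberg, the function name, and the p-values below are assumptions for illustration only and are not the study's data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input.

    Assumption: the screening used a BH-style correction; the source does
    not name the specific FDR procedure.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        q_values[idx] = running_min
    return q_values


# Hypothetical screening p-values (illustrative, not from the study).
example_p = [0.004, 0.020, 0.030, 0.200, 0.600]
example_q = benjamini_hochberg(example_p)
```

Because each q-value depends on the number of features screened, a feature with a small nominal p-value can still fall just above the 0.05 threshold when many features are tested, as was the case for REM sleep variability.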
H2. Variability-based physiological features are hypothesized to be more strongly associated with caregiver burden than mean-based counterparts.
Partially supported. A contrast was observed between mean-based and variability-based features. None of the mean-based features showed significant associations across either correlation screening or adjusted regression analyses, whereas variability-based features—particularly sleep-related measures—showed more consistent exploratory associations with caregiver burden. REM sleep variability showed the most consistent candidate pattern across analyses, while light sleep variability showed a similar nominal pattern and total sleep variability showed a marginal association. In contrast, most activity- and heart rate-related variability features were not significantly associated with caregiver burden. Taken together, these findings suggest that temporal fluctuations in sleep may be more informative than average physiological levels in capturing caregiving-related burden. However, this pattern was not uniform across all variability measures, and the FDR-adjusted results indicate that these sleep variability findings should be treated as exploratory candidate signals.
H3. These associations are hypothesized to remain significant after adjusting for relevant clinical and demographic covariates.
Supported for select features. In the fully adjusted multivariable model, maximum daily heart rate showed a positive adjusted association with caregiver burden (β = 2.59, p = 0.005), indicating that peak heart rate responses may be associated with higher burden levels. The contrast between Strategy A and Strategy B further highlights the impact of feature interdependence. Several variability-based features that were not significant in the full model became nominally significant when evaluated individually, suggesting that collinearity among related measures may obscure their effects in simultaneous models. In feature-by-feature adjusted models, REM sleep variability also showed a nominal association (β = −0.39, p = 0.038), with light sleep variability showing a similar nominal association and total sleep variability showing a marginal effect; however, these individual-feature findings did not remain significant after FDR correction.
However, not all associations persisted after adjustment. The modest explanatory performance of the multivariable model (adjusted R² = 0.08) indicates that wearable-derived features capture only a portion of the variance in caregiver burden. This is consistent with the multifactorial nature of caregiver burden, which is influenced by psychological, social, and contextual factors beyond physiological signals.
4.2. Clinical and Practical Implications for Wearable-Based Monitoring
The present findings provide preliminary evidence that wearable-derived signals, particularly sleep variability and heart rate, may serve as candidate objective indicators of caregiver burden. Maximum daily heart rate showed a positive adjusted association with burden in the multivariable model (β = 2.59, p = 0.005) and may reflect periods of elevated physiological stress. Because heart rate can be continuously and passively collected, these signals may support real-time monitoring of caregiver state as a complement to self-report.
The observed effect sizes should be interpreted cautiously. Although maximum daily heart rate was positively associated with caregiver burden in the adjusted model (β = 2.59, p = 0.005), the overall explanatory performance was modest (adjusted R² = 0.08), and the magnitude of this association is unlikely to support direct clinical classification of caregiver burden from heart rate alone. The association between REM sleep variability and ZBI was also modest and exploratory (ρ = −0.32; β = −0.394) and did not survive FDR correction. These effect sizes therefore support the potential value of wearable-derived features as low-burden, continuously collected candidate indicators but highlight the need for longitudinal validation before these measures can inform clinical decision-making.
These findings are consistent with intervention strategies aimed at regulating heart rate and reducing stress-related physiological load. Approaches such as mindfulness-based stress reduction (MBSR) and slow-paced breathing have shown effectiveness in reducing perceived stress and stabilizing heart rate responses in caregivers [35, 36]. In addition, structured respite care may help reduce repeated high-stress episodes by providing periodic relief from caregiving demands.
Several sleep time measures, including sleep duration, minutes asleep, and minutes awake, were also significant in the multivariable model, although these findings should be interpreted cautiously because of potential overlap among related sleep variables.
Findings related to sleep variability also provide important but exploratory insights. REM sleep variability showed a nominal negative association with caregiver burden after adjustment (β = −0.39, p = 0.038), with similar patterns observed for light sleep variability. One possible interpretation is that some degree of sleep timing or stage variability may reflect flexibility in adapting to irregular caregiving demands. However, this interpretation should be considered cautiously because the association did not survive FDR correction and alternative mechanisms may also explain the observed pattern.
A clinically plausible alternative explanation is REM truncation or a floor effect rather than adaptive flexibility. REM sleep tends to occur more prominently in the latter part of the sleep period; therefore, caregivers with high burden who experience chronically shortened or interrupted sleep may have fewer opportunities to enter or sustain REM sleep. Under this scenario, lower day-to-day REM variability could arise because REM duration is already constrained near a lower range, producing a statistical floor effect. The negative association may also reflect stress-related REM suppression, irregular awakenings related to nighttime caregiving, or unmeasured medication effects. Accordingly, REM sleep variability should be interpreted as a candidate signal with multiple plausible mechanisms rather than as direct evidence of adaptive sleep flexibility.
Sensitivity analyses provided further context for interpreting the primary findings. The REM sleep variability coefficient changed little after adjustment for mean total sleep time (β = −0.388, p = 0.060), although it no longer met the nominal significance threshold; this suggests that overall sleep duration does not fully explain the association, but REM-specific truncation or floor effects cannot be ruled out. The association remained directionally consistent after adjustment for the Vascular Risk Score (β = −0.394, p = 0.048), suggesting that general cardiovascular health is unlikely to confound this finding. However, the association was attenuated after adjustment for care-recipient cognitive status (β = −0.281, p = 0.140), suggesting that disease severity of the care recipient may partly contribute to the observed pattern. Sleep fragmentation, operationalized as the ratio of awake minutes to total time in bed, was not significantly associated with caregiver burden (ρ = 0.070, p = 0.545), suggesting that this aggregate proxy does not capture the same dimension of sleep disruption as REM stage variability.
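The sleep fragmentation proxy described above (awake minutes divided by total time in bed) can be sketched as follows. The function names, the per-night input format, and the averaging of nightly ratios over the monitoring window are assumptions; the source does not specify how nightly values were aggregated.

```python
def fragmentation_ratio(minutes_awake, time_in_bed_minutes):
    """Ratio of awake minutes to total time in bed for one night."""
    if time_in_bed_minutes <= 0:
        raise ValueError("time in bed must be positive")
    return minutes_awake / time_in_bed_minutes


def mean_fragmentation(nights):
    """Average the nightly ratio over a monitoring window.

    `nights` is a list of (minutes_awake, time_in_bed_minutes) pairs.
    Averaging across nights is an illustrative assumption.
    """
    ratios = [fragmentation_ratio(awake, in_bed) for awake, in_bed in nights]
    return sum(ratios) / len(ratios)


# Hypothetical three-night window (not study data).
window = [(45, 450), (60, 480), (30, 400)]
window_fragmentation = mean_fragmentation(window)
```

A single ratio of this kind summarizes wakefulness across the whole night, which may explain why it does not track the stage-specific variability captured by the REM feature.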
From a clinical and practical perspective, the continuous and passive nature of wearable data enables real-time monitoring and timely intervention in caregiving contexts. Heart rate signals, in particular, can be used to identify periods of elevated stress and provide immediate support. For example, wearable-based systems could deliver just-in-time interventions (e.g., breathing guidance or stress alerts) when signals exceed predefined thresholds [37]. Integrating such functionality into digital caregiver support platforms offers a scalable and low-burden approach to personalized care, enabling proactive support without requiring users to explicitly report distress.
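As a purely hypothetical sketch of the threshold-based triggering described above, the following function flags sustained runs of heart rate readings above a predefined threshold. The threshold value, the minimum run length, and the function name are illustrative assumptions; no such rule was implemented or validated in the present study.

```python
def sustained_exceedances(heart_rates, threshold_bpm, min_consecutive):
    """Return start indices of runs that stay above the threshold.

    A run is reported once, at the moment it reaches `min_consecutive`
    consecutive readings above `threshold_bpm`.
    """
    starts = []
    run_length = 0
    for i, hr in enumerate(heart_rates):
        if hr > threshold_bpm:
            run_length += 1
            if run_length == min_consecutive:
                starts.append(i - min_consecutive + 1)
        else:
            # Any reading at or below the threshold ends the current run.
            run_length = 0
    return starts


# Hypothetical per-minute readings in bpm (not study data).
readings = [82, 95, 121, 124, 126, 90, 122, 88]
alerts = sustained_exceedances(readings, threshold_bpm=120, min_consecutive=3)
```

Requiring several consecutive readings above the threshold is one simple way to reduce alerts caused by brief motion artifacts, although any practical threshold would need to be personalized and validated.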
The present study relied on inter-day mean and standard deviation of daily heart rate summaries, rather than intra-day heart rate variability (HRV). This choice was driven by the resolution of data available through the Fitbit Web API, which does not provide beat-to-beat RR interval data required for standard HRV metrics (e.g., RMSSD, SDNN). The inter-day mean of daily heart rate is analogous to the circadian mesor used in prior wearable research to characterize physiological load [23], and its inter-day variability captures temporal fluctuations in cardiovascular state across days, a distinct construct from intra-day autonomic regulation. While intra-day HRV over 24-h or 5-min epochs is a more established and sensitive measure of autonomic stress responses, the inter-day approach used here is appropriate for the available data resolution and provides a complementary perspective on sustained physiological burden over multi-day windows. Future work should investigate intra-day HRV, ideally derived from chest-worn ECG or validated optical HRV sensors, as a more direct measure of caregiver autonomic load, as it may provide stronger and more mechanistically interpretable associations with caregiver burden than the daily summary metrics used here [37].
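The inter-day features described above can be sketched as the mean and standard deviation of one participant's daily summaries over a monitoring window. The function name, the use of the sample (n − 1) standard deviation, and the example values are assumptions; the source does not state which standard deviation estimator was used.

```python
from statistics import mean, stdev


def inter_day_features(daily_values):
    """Summarize daily summaries (one value per day) across a window.

    Returns the inter-day mean and the inter-day sample standard
    deviation. Using the sample (n - 1) estimator is an assumption.
    """
    if len(daily_values) < 2:
        raise ValueError("need at least two days to estimate variability")
    return {
        "inter_day_mean": mean(daily_values),
        "inter_day_sd": stdev(daily_values),
    }


# Hypothetical 7-day series of maximum daily heart rate in bpm
# (illustrative values, not study data).
max_hr_by_day = [112, 118, 109, 131, 120, 115, 124]
features = inter_day_features(max_hr_by_day)
```

The same function applies to any daily summary, such as nightly REM minutes, which is how a variability feature differs from its mean-based counterpart: both are computed from the same daily series but describe level and fluctuation, respectively.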
Unlike prior wearable-based caregiver studies that examined isolated sleep metrics in small samples [16, 21], the present work simultaneously analyzed multimodal Fitbit data across sleep stage architecture, heart rate dynamics, and activity patterns, with a focus on inter-day variability and rigorous sensitivity analyses adjusting for clinical covariates. These methodological advances, combined with the passive and continuous nature of consumer-grade wearables, position this framework as a scalable foundation for objective caregiver burden monitoring, early detection of physiological stress, and real-time support in caregiving contexts.
4.3. Limitations
Several limitations should be considered when interpreting these findings. Not all wearable-derived signals were equally informative: most activity-related and heart rate variability features showed weak or non-significant associations across analyses. First, the cross-sectional design allows identification of associations but does not support causal inference. It remains to be determined whether sleep variability plays a protective role against caregiver burden, whether caregivers with lower burden are better able to maintain flexible sleep patterns, or whether the observed REM variability pattern reflects sleep curtailment or other unmeasured factors. Longitudinal studies will be required to clarify this directionality.
Second, the sample size (n = 78), while adequate for exploratory analysis, limits statistical power. A post hoc power analysis indicated that the full multivariable model (k = 10 predictors, n = 78) achieved a power of 0.99 to detect the observed effect size (f² = 0.47, above Cohen’s conventional threshold of 0.35 for a large effect), suggesting adequate power to detect effects of this magnitude. However, the feature-by-feature adjusted models for individual variability features had more modest statistical power (estimated power ≈ 0.57 for REM sleep variability), indicating that some true small-to-medium associations may have gone undetected. A sample of approximately 150–200 participants would be required to achieve 80% power to detect medium-sized individual feature effects (f² ≈ 0.10) in the adjusted models. The modest explanatory performance of the multivariable model (adjusted R² = 0.08) should be interpreted in this context. Caregiver burden is a complex and multifactorial phenomenon, and it is unlikely that any single wearable-derived signal can fully explain its variance. Consistent with this, the overall model was not statistically significant (F-test p = 0.208), indicating that while individual features may carry meaningful information, wearable data alone do not constitute a complete model of caregiver burden. Rather, this result likely reflects the high dimensionality and interdependence of physiological and behavioral signals captured in real-world settings.
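For reference, the effect size and model fit statistics discussed above are related by standard definitions, sketched below. The function names and the input R² value are illustrative assumptions; only n = 78 and k = 10 are taken from the text, and the source does not report the unadjusted R² from which its values were derived.

```python
def cohens_f2(r_squared):
    """Cohen's f-squared for a regression model: R^2 / (1 - R^2)."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R-squared must be in [0, 1)")
    return r_squared / (1.0 - r_squared)


def adjusted_r2(r_squared, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# Illustrative R^2 (an assumption, not a reported result), combined with
# the sample size and predictor count given in the text.
illustrative_r2 = 0.25
f2 = cohens_f2(illustrative_r2)
adj = adjusted_r2(illustrative_r2, n=78, k=10)
```

The adjustment penalizes the number of predictors relative to the sample size, so with ten predictors and 78 participants the adjusted value can be substantially lower than the unadjusted R².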
Third, several sleep-related variables included in the multivariable model represent overlapping constructs (e.g., total sleep duration, minutes asleep, and minutes awake), which showed high intercorrelations (r > 0.76) and may introduce multicollinearity, leading to unstable or opposing coefficient estimates. A revised parsimonious model retaining only sleep efficiency produced consistent results for the heart rate finding, supporting its robustness. Future work may benefit from feature selection or dimensionality reduction approaches to address redundancy among closely related variables.
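A simple screen for the kind of redundancy described above is to flag pairs of features whose pairwise correlation exceeds a chosen cutoff. The sketch below uses Pearson correlation with the r > 0.76 value mentioned in the text as a default cutoff; the function names, feature names, and data are hypothetical, and the source does not state which correlation coefficient was used for this check.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_collinear_pairs(features, threshold=0.76):
    """List feature pairs whose absolute correlation exceeds the threshold."""
    names = sorted(features)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson_r(features[first], features[second])
            if abs(r) > threshold:
                flagged.append((first, second, round(r, 3)))
    return flagged


# Hypothetical nightly values in minutes (not study data); time in bed
# is the sum of minutes asleep and minutes awake, so overlap is expected.
sleep_features = {
    "minutes_asleep": [400, 380, 420, 360, 410, 390],
    "minutes_awake": [50, 55, 45, 70, 40, 60],
    "time_in_bed": [450, 435, 465, 430, 450, 450],
}
collinear = flag_collinear_pairs(sleep_features)
```

Pairs flagged in this way are candidates for removal or combination before fitting a simultaneous model, which is the rationale for the parsimonious model that retained only sleep efficiency.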
A further limitation concerns the heterogeneous timing of ZBI administration relative to the wearable monitoring period: 45 participants completed the ZBI prior to monitoring, 15 after the monitoring period, and 17 during or on the first day of monitoring. Although the 7-day window selection strategy was designed to maximize temporal proximity, some temporal misalignment between physiological data and burden assessment may have remained. Because a formal sensitivity analysis across timing subgroups was not feasible with the available data, future studies should administer burden assessments concurrently with wearable monitoring to strengthen temporal validity.
Specific medication data, including beta-blockers, antiarrhythmics, and other heart-rate-modifying drugs, were not collected in this study. Such medications may systematically alter average and maximum heart rate values measured by photoplethysmography-based wearables, and conditions such as atrial fibrillation are known to reduce the accuracy of optical heart rate estimation in consumer devices. While our sensitivity analysis using the Vascular Risk Score suggests that general cardiovascular health does not confound the primary findings (VRS p > 0.40 in all models), this approach addresses only aggregate cardiovascular risk rather than specific pharmacological effects. Future studies should prospectively collect detailed medication logs to more precisely account for pharmacological influences on wearable-derived heart rate features.
Consumer-grade wearable devices carry inherent measurement limitations under free-living conditions. Optical PPG-based heart rate estimation is susceptible to motion artifacts and reduced accuracy during high-intensity activity or in participants with arrhythmias [27]. Fitbit-derived sleep stage classifications show only moderate agreement with polysomnography, particularly for brief awakenings and REM detection [25, 26], and may systematically underestimate REM duration in participants with fragmented sleep, potentially biasing the REM variability features central to this study. These limitations reinforce the exploratory nature of the present findings and the need for replication using laboratory-validated measures.
Sleep timing and polyphasic sleep patterns were not fully captured in the present analysis. The Fitbit API provided daily aggregated sleep session data, which does not allow identification of separate daytime napping bouts or precise sleep onset and offset times. A sleep fragmentation proxy (ratio of awake minutes to total time in bed) was examined but showed no significant association with caregiver burden (ρ = 0.070, p = 0.545). Future studies should utilize raw intraday Fitbit data or actigraphy-based algorithms capable of detecting polyphasic sleep, daytime napping, and phase-shifted sleep patterns, which may be particularly relevant in caregiver populations experiencing nocturnal disruptions due to care demands.
In addition, the negative association observed for REM sleep variability, while evident in exploratory analyses, should be interpreted cautiously and should not be taken as evidence that higher REM variability is inherently adaptive. Because REM sleep is concentrated in the latter portion of the sleep period, chronic sleep curtailment in highly burdened caregivers could truncate REM episodes, reduce the observable range of REM duration, and produce a floor-effect pattern in which the standard deviation of REM sleep is mechanically lower. Stress-related REM suppression and unmeasured clinical factors may also contribute. Medication use was not included as an adjustment variable in the present analysis, although antidepressants, hypnotics, sedatives, and other sleep or neuropsychiatric medications can affect REM sleep and sleep-stage architecture. Future studies should collect and adjust for medication exposure to distinguish caregiving-related physiological patterns from pharmacological or sleep-duration-driven effects.
In addition, although PSQI was included only as a covariate rather than a feature of interest, its use introduces a degree of circularity because PSQI is itself a self-reported measure of sleep quality. Therefore, the adjusted models should be interpreted as estimating the association between wearable-derived features and caregiver burden after accounting for baseline subjective sleep quality, rather than as fully objective wearable-only models. Future studies should evaluate models relying exclusively on objective wearable-derived features to assess their standalone predictive value.
Sensitivity analyses showed that the REM sleep variability association remained largely consistent after adjustment for mean total sleep time (β = −0.388, p = 0.060), suggesting that the association is unlikely to be fully explained by overall sleep duration, although REM-specific truncation or floor effects cannot be completely ruled out. However, the association was attenuated after adjustment for care-recipient cognitive status (β = −0.281, p = 0.140), suggesting that underlying disease severity may partly contribute to the observed pattern. Medication use, which was not accounted for in the current analyses, represents an important covariate to be addressed in future work.
Finally, the study sample consisted of informal caregivers, which may limit generalizability to other caregiving populations, such as professional caregivers or different care settings. Despite these limitations, the findings provide a useful foundation for future work and highlight the potential of wearable-derived data as a complementary tool for understanding caregiver burden in real-world environments.