Using Consumer-Wearable Activity Trackers for Risk Prediction of Life-Threatening Heart Arrhythmia in Patients with an Implantable Cardioverter-Defibrillator: An Exploratory Observational Study

Ventricular arrhythmia (VA) is a leading cause of sudden death and health deterioration. Recent advances in predictive analytics and wearable technology for behavior assessment show promise but require further investigation. Yet, previous studies have only assessed other health outcomes and monitored patients for short durations (7–14 days). This study explores how behaviors reported by a consumer wearable can assist VA risk prediction. An exploratory observational study was conducted with participants who had an implantable cardioverter-defibrillator (ICD) and wore a Fitbit Alta HR consumer wearable. Fitbit reported behavioral markers for physical activity (light, fair, vigorous), sleep, and heart rate. A case-crossover analysis using conditional logistic regression assessed the effects of time-adjusted behaviors over 1–8 weeks on VA incidence. Twenty-seven patients (25 males, median age 59 years) were included. Among the participants, ICDs recorded 262 VA events during 8093 days monitored by Fitbit (median follow-up period 960 days). Longer light to fair activity durations and a higher heart rate increased the odds of a VA event (p < 0.001). In contrast, lengthier fair to vigorous activity and sleep durations decreased the odds of a VA event (p < 0.001). Future studies using consumer wearables in a larger population should prioritize these outcomes to further assess VA risk.


Introduction
Heart arrhythmias constitute a growing challenge to healthcare systems worldwide, and ventricular arrhythmias (VA) are an important and increasingly frequent cause of sudden death and health deterioration. Implantable cardioverter-defibrillators (ICDs) and remote monitoring have led to significant advances in reliably avoiding malignant VAs [1,2], leading to reduced hospitalization rates and improved quality of care [3,4]. Still, VAs are life-threatening and pose challenges for risk assessment, including prevention of inappropriate therapy and detection of impending events in time for proactive clinical intervention [5].

Measured Outcomes
ICD-Reported Outcomes
The first data source was the ICDs, which provided data report files on remote heart rhythm monitoring in XML format through Medtronic CareLink [31]. Three types of VA events were reported by the ICDs (Table 1) and reflected in the Mainspring™ Report Export [31]: ventricular tachycardia (VT), ventricular tachycardia at two thresholds (VT1, VT2), and ventricular fibrillation into ventricular tachycardia (VF-VT). The incidence of a VA event (yes or no) during monitoring was used as the main outcome variable.

Table 1. VA event types reported by the implanted ICDs.
VA Event Type | Description
VT | Ventricular tachycardia is a very fast heart rhythm that begins in the ventricles. It is defined as a heart rate of more than 100 beats/min with at least three irregular heartbeats in a row.
VT1 | Ventricular Tachycardia Zone 1: Medtronic has an option to divide VTs into heart-rate zones. This division allows physicians to program different treatments for the different zones. For example, Zone 1 may range from 100 to 180 beats/min.
VT2 | Ventricular Tachycardia Zone 2: Zone 2 is similar to VT1, but with a different beat-per-minute interval.
VF-VT | Ventricular fibrillation into ventricular tachycardia: VT is potentially lethal, VF even more so. In ventricular fibrillation, the ventricular rates are higher than in VT.

Fitbit-Reported Data
The second data source was the Fitbit Alta HR (Fitbit Inc., San Francisco, CA, USA) consumer wearable activity tracker. The Fitbit data were collected through an application programming interface [32] that provided behavioral markers for physical activity, sleep, and heart rate in the CSV format. A data format example is available in Listing S1: Data Format Example. The markers reported by the Fitbit and used in this study were either raw (steps, heart rate) or processed according to Fitbit's proprietary activity recognition algorithms (sedentary, physical activity, and sleep duration) [32].
The Fitbits counted participants' steps and classified the physical intensity as sedentary, light, fair, or vigorous for each 15-min interval in a day (up to 96 intervals/day). For periods of assumed sleep, the Fitbits classified the sleep type as asleep, awake, restless, or unknown for 1-min intervals (up to 1440 intervals/day). Fitbit did not provide precise thresholds for its physical activity recognition algorithms [33]. Thus, in this analysis, variables for cumulative adjacent intensities (e.g., light + fair) and variables for combinations of sleep types (e.g., awake + asleep) were derived. Sleep was not measured for all patients, and sedentary duration also included sleep. Therefore, all durations that included sedentary duration were deemed unreliable and excluded from analysis. The Fitbits also reported heart rate in 1-min intervals (up to 1440 intervals/day). For the 15-min intervals, minimum, mean, median, maximum, and standard deviation (SD) heart rate values were derived from the 1-min heart rates. This additional step was necessary to compute the aggregate variables within a feasible time while maintaining a high measurement frequency and aligning the heart rate intervals with those for the other behavioral markers. Consequently, all variables were derived for the 15-min intervals (Table 2).
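The 15-min heart rate summaries described above can be sketched as follows. This is a minimal illustration, not the authors' code (which is in Listing S2): the function name `summarize_heart_rate`, the input layout (minute of day mapped to beats/min), and the choice of the sample SD are assumptions made here.

```python
from statistics import mean, median, stdev

def summarize_heart_rate(minute_hr):
    """Summarize 1-min heart rates into 15-min intervals.

    minute_hr maps minute of day (0..1439) to beats/min (hypothetical layout).
    Returns a dict: interval index (0..95) -> summary statistics.
    """
    buckets = {}
    for minute, bpm in minute_hr.items():
        # Minutes 0-14 fall in interval 0, minutes 15-29 in interval 1, and so on.
        buckets.setdefault(minute // 15, []).append(bpm)

    summary = {}
    for interval, values in sorted(buckets.items()):
        summary[interval] = {
            "min": min(values),
            "mean": mean(values),
            "median": median(values),
            "max": max(values),
            # Assumption: sample SD; it is undefined for one value, so 0.0 is used.
            "sd": stdev(values) if len(values) > 1 else 0.0,
        }
    return summary
```

Intervals without any 1-min reading are simply absent from the result, so missing data are not silently filled in.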

Data Analysis
The Fitbit variables were aggregated for analysis. The variables were first aggregated over days, then weeks, then intervals of 1-8 consecutive weeks called periods. The reason for deriving different periods was to explore the risk of VA events for the purposes of timely clinical intervention (e.g., behaviors leading to events within one week vs. behaviors leading to events within eight weeks). Inferential and descriptive analyses using the aggregations were then conducted. The data analysis was performed in Python [34] using the Anaconda environment [35] (data aggregation and descriptive analysis) and in R [36] using the RStudio environment [37] and the Survival library [38] (inferential analysis). The data analysis code is available in Listing S2: Data Analysis Code. A similar approach, leveraging the aggregation of data for different periods before the event, has been proposed and evaluated as a co-calibration method [39].

Data Quality Assurance and Data Aggregation
Fitbit measurements reported as "0" were excluded from analysis. Valid days, weeks, and periods were then derived according to several scenarios. First, only days with at least one, two, four, or eight hours of physical activity data available (i.e., classified as any combination of sedentary, light, fair, or vigorous) between 8 a.m. and 8 p.m. were deemed valid days and included in the analysis as four separate scenarios. Second, only weeks with at least four, five, or seven valid days were included as valid weeks as three separate scenarios. Third, only periods with sufficient valid weeks were deemed valid periods according to three increasingly strict scenarios: minimum 50%, 75%, and 100% valid weeks (Figure 1). This system totaled 36 combinations of scenarios based on the four scenarios for valid days, three for valid weeks, and three for valid periods. We further elaborate on data validation in Appendix A.2.
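The validity rules above can be sketched as three nested checks plus an enumeration of the 36 scenario combinations. The names and the input layout (hours of activity data between 8 a.m. and 8 p.m. per day) are assumptions for illustration only.

```python
from itertools import product

# Thresholds taken from the text: 4 day scenarios x 3 week scenarios x 3 period scenarios.
DAY_MIN_HOURS = (1, 2, 4, 8)
WEEK_MIN_VALID_DAYS = (4, 5, 7)
PERIOD_MIN_VALID_SHARE = (0.50, 0.75, 1.00)

SCENARIOS = list(product(DAY_MIN_HOURS, WEEK_MIN_VALID_DAYS, PERIOD_MIN_VALID_SHARE))

def is_valid_day(hours_with_data, min_hours):
    """A day is valid if it has enough hours of activity data (8 a.m. to 8 p.m.)."""
    return hours_with_data >= min_hours

def is_valid_week(week_hours, min_hours, min_days):
    """week_hours: hours of data for each of the seven days (hypothetical layout)."""
    valid_days = sum(is_valid_day(h, min_hours) for h in week_hours)
    return valid_days >= min_days

def is_valid_period(period_weeks, min_hours, min_days, min_share):
    """period_weeks: list of 1-8 weeks, each a list of daily hours of data."""
    valid_weeks = sum(is_valid_week(w, min_hours, min_days) for w in period_weeks)
    return valid_weeks / len(period_weeks) >= min_share
```

Iterating over `SCENARIOS` and passing each tuple to `is_valid_period` reproduces the grid of validity definitions; the exact handling of edge cases in the original analysis may differ.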
Data aggregation was conducted through accumulation of all variables from the 15-min intervals in valid Fitbit days. Steps, physical activity duration, and sleep duration were summarized via daily aggregation to support the subsequent analysis. This approach has also been implemented in other studies using Fitbit data [39–41]. Daily heart rates were aggregated, and heart rates were reported by minimum, mean, median, maximum, and SD across the 15-min intervals. For each valid week, the mean daily count of steps, mean physical activity durations, mean sleep durations, and minimum heart rate were accumulated from the daily aggregations. The same aggregations were performed on valid periods using the aggregations on valid weeks (Figure 2).

Figure 1. Data validation for days, weeks, and periods for one combination of scenarios: at least four hours of physical activity data between 8 a.m. and 8 p.m. for valid days, at least four valid days for valid weeks, and at least 50% valid weeks for valid periods. Days (depicted in green in the top left) contain physically active or inactive time (physically active time is depicted with solid blue, while physically inactive time is depicted with pale blue). If at least 50% of the time between 8 a.m. and 8 p.m. is active, the day is valid. Weeks (depicted in yellow in the top center) contain seven days (valid days are depicted with solid green, while invalid days are depicted with pale green). If the week has at least four valid days, that week is valid. Periods (in orange in the top right) contain 1-8 weeks (valid weeks are depicted with solid yellow, while invalid weeks are depicted with pale yellow). If at least 50% of the weeks of a period are valid, the period is valid.
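The weekly aggregation step can be illustrated with a short sketch. The field names are hypothetical, and taking the weekly minimum heart rate as the lowest daily minimum is an assumption, since the text does not specify how the weekly minimum was obtained.

```python
from statistics import mean

def aggregate_week(valid_days):
    """Aggregate daily totals from the valid days of one week.

    valid_days: list of dicts with hypothetical keys
    steps, light_min, fair_min, vigorous_min, hr_min.
    """
    return {
        "steps": mean(d["steps"] for d in valid_days),
        "light_min": mean(d["light_min"] for d in valid_days),
        "fair_min": mean(d["fair_min"] for d in valid_days),
        "vigorous_min": mean(d["vigorous_min"] for d in valid_days),
        # Cumulative adjacent intensities, as derived in the study.
        "light_fair_min": mean(d["light_min"] + d["fair_min"] for d in valid_days),
        "fair_vigorous_min": mean(d["fair_min"] + d["vigorous_min"] for d in valid_days),
        # Assumption: weekly minimum heart rate = lowest daily minimum.
        "hr_min": min(d["hr_min"] for d in valid_days),
    }
```

The same function shape could be reused one level up, averaging weekly values over the valid weeks of a period.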

Analytic Design
The descriptive analysis consisted of two parts. The first part concerned summary statistics (median, mean, and SD) for data quality and behavioral markers. The second part concerned changes in Fitbit wear in the temporal vicinity of VA events observed for individual patients. For brevity, only the first part of the descriptive analysis is included in this paper. The second part is detailed in Appendix B.1.3.
The inferential analysis assessed the extent to which the given behaviors affected the odds of a VA event over time. This assessment was performed by means of a case-crossover design using conditional logistic regression [42,43]. This approach was chosen because it enabled meticulous analysis of cases of patients who had experienced a VA event; the patients served as their own controls.
For each patient, all monitoring days were used to capture the outcome of VA or no VA on any given day. Windows of valid Fitbit-measured periods of 1-8 weeks were used to define the exposure immediately preceding each day of VA (case periods) or no VA (control periods). Figure 3 provides an example. In this way, the data were extended to include several time periods, one for each day of monitoring for each patient [44], and all patients with both VA events and Fitbit-measured behaviors contributed cases, controls, or both to the analysis.

Figure 3. Case-crossover analysis: case and control periods for a combination of scenarios: an arbitrary scenario for valid days, at least four valid days for valid weeks, and at least 50% valid weeks for valid periods. Periods (depicted with orange contour) contain valid weeks (depicted with strong yellow contour and fill) and invalid weeks (depicted with pale yellow contour only). Week validity depends on having at least four valid days (depicted with strong green contour and fill) and at most three invalid days (depicted with pale green contour only). Case periods are followed by an event on the next day (depicted by a magenta vertical line). Control periods are not followed by an event on the next day. Valid periods (depicted with strong orange contour and fill) are either case periods (above the timeline of wearable monitoring) or control periods (below the timeline). Invalid periods are neither case periods nor control periods. The analysis uses all patients' case and control periods.
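The construction of case and control periods can be sketched as follows, assuming that a period of a given length ends on the day before the index day, as the Figure 3 caption describes. The function name and the tuple layout are hypothetical, and the validity filtering of periods is deliberately left out.

```python
from datetime import date, timedelta

def label_periods(monitoring_days, event_days, weeks):
    """Build one exposure period per index day and label it as case or control.

    monitoring_days: iterable of date objects for one patient.
    event_days: iterable of dates on which the ICD recorded a VA event.
    weeks: period length in weeks (1-8).
    Returns tuples (period_start, period_end, index_day, is_case).
    """
    events = set(event_days)
    days = sorted(monitoring_days)
    first_day = days[0]
    rows = []
    for index_day in days:
        period_start = index_day - timedelta(days=7 * weeks)
        period_end = index_day - timedelta(days=1)
        if period_start < first_day:
            continue  # not enough monitoring history before this day
        # Case period: followed by an event on the next day; otherwise control.
        rows.append((period_start, period_end, index_day, index_day in events))
    return rows
```

In a fuller version, each period would additionally be checked against the chosen validity scenario before entering the analysis.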

Conditional Logistic Regression
Conditional logistic regression was used to assess how a one-unit change in behaviors (e.g., one extra minute of physical activity at a certain intensity or an extra beat per minute for the heart rate) affected the odds of a VA event. The predictors in the conditional logistic regression models were (a) behavior aggregate variables (continuous exposure) and (b) time-specific variables for time-point adjustment: (i) season (spring, summer, fall, winter), (ii) day of week (1-7), and (iii) weekday status (weekday, weekend day) of the date immediately succeeding the period. A formula defines a specific combination of predictors: (a) behavior aggregate variables and (b) time-specific variables. Three conditional logistic regression formulae were derived. A total of 108 scenario-formula combinations resulted from 36 scenario combinations and three formulae (Table 3).
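For reference, the textbook form of the conditional likelihood that conditional logistic regression maximizes is sketched below. The notation is introduced here and is not taken from the study: stratum i is a patient, R_i is the set of that patient's periods, D_i is the subset of case periods with d_i = |D_i|, and x_ij is the predictor vector of period j. Whether the software settings used this exact form or an approximation is not stated in the text.

```latex
L(\beta) = \prod_{i}
  \frac{\exp\left(\sum_{j \in D_i} \beta^{\top} x_{ij}\right)}
       {\sum_{S \subseteq R_i,\; |S| = d_i} \exp\left(\sum_{j \in S} \beta^{\top} x_{ij}\right)},
\qquad
\mathrm{OR}_k = \exp(\beta_k)
```

Patient-specific intercepts cancel within each stratum, which is why each patient can serve as their own control, and exp(β_k) is the odds ratio for a one-unit increase in predictor k.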
For each of the 108 scenario-formula combinations, conditional logistic regression models were created for periods of a fixed duration of 1-8 weeks (denoted as separate models) and for periods of any duration up to that fixed duration (denoted as combined models), as illustrated in Figure 4. A total of 108 × (8 + 8) = 1728 models resulted.
As the objective was to explore patterns without focusing on individual results, any odds ratio (OR) with a p-value below the significance threshold α = 0.05 was reported, without adjustments for multiple comparisons. However, highly significant ORs (e.g., p < 0.001) were expected. If, for a given behavior, across all models, (1) there were no significant ORs or (2) some significant ORs were sub-unit and some significant ORs were supra-unit, the OR was reported as inconclusive.

Figure 4. Periods in separate and combined models for a fixed duration of four weeks. Periods (depicted in orange) span across 1-8 weeks (depicted in yellow). For the separate models, only periods of precisely four weeks are included. For the combined models, periods of up to four weeks (precisely one, two, three, or four weeks) are included.
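The model grid and the reporting rule can be sketched in a few lines. The integer indices used for scenarios and formulae, and the name `classify_behavior`, are placeholders for illustration; the authors' implementation is in Listing S2.

```python
ALPHA = 0.05

# 36 validity scenarios x 3 formulae x (8 separate + 8 combined) durations.
MODEL_SPECS = [
    (scenario, formula, kind, weeks)
    for scenario in range(36)
    for formula in range(3)
    for kind in ("separate", "combined")
    for weeks in range(1, 9)
]

def classify_behavior(results, alpha=ALPHA):
    """Apply the reporting rule to one behavior across all models.

    results: list of (odds_ratio, p_value) pairs, one per model.
    """
    significant = [odds_ratio for odds_ratio, p_value in results if p_value < alpha]
    above = any(odds_ratio > 1 for odds_ratio in significant)
    below = any(odds_ratio < 1 for odds_ratio in significant)
    if above and not below:
        return "increased odds"
    if below and not above:
        return "decreased odds"
    # No significant ORs, or significant ORs on both sides of 1.
    return "inconclusive"
```

`len(MODEL_SPECS)` reproduces the 1728 models mentioned in the text.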

Participant Information
Of the 65 heart patients with an ICD or an ICD with cardiac resynchronization therapy (CRT-D) who were invited to participate in the study, 27 participants provided written informed consent. Of these, 25 were male (93%), and the median age among participants was 59 years (mean 57.3 ± 11.1), as presented in Table 4.

Table 4 (partial; only the rows for patients 19-27 are recoverable).
Patient | Gender | Age (years) | VA events 1 | Fitbit days 2 | Device 3
19 | Male | 45 | 45 | 450 | ICD
20 | Male | 67 | 127 | 801 | ICD
21 | Male | 66 | 0 | 148 | Not specified
22 | Male | 69 | 1 | 395 | ICD
23 | Male | 38 | 0 | 98 | Not specified
24 | Male | 59 | 0 | 136 | Not specified
25 | Male | 51 | 3 | 842 | ICD
26 | Male | 49 | 0 | 891 | Not specified
27 | Male | 74 | 0 | 796 | Not specified
1 Refers to the number of VA events recorded. 2 Refers to the number of days with Fitbit measurements. 3 Refers to the use of an implantable cardioverter-defibrillator (ICD) or ICD with cardiac resynchronization therapy (CRT-D).

Descriptive Analysis of Data Quality and Behavioral Markers
VA events were reported in 16 of the 27 participants. A total of 262 VA events of different types were recorded, with a mean of 16.4 ± 31.7 events/patient over a mean duration of 32.5 ± 28.9 months/person. Of these events, 56 were ventricular tachycardia (VT; mean 3.5 ± 8.8 events/patient), 172 were ventricular tachycardia type 1 (VT1; mean 10.8 ± 23.6 events/patient), and 34 were ventricular fibrillation into ventricular tachycardia (VF-VT) events (mean 2.1 ± 2.3 events/patient). The VA events by type for each patient are presented in Appendix B.1.1 (Table A1).
Fitbit-recorded behavioral data were available for all 27 patients with a total of 11,769 days (mean 435.9 ± 316.3 days/patient, median 357 days). The median ICD follow-up period was 960 days (mean 991.3 ± 880.9 days/patient). The follow-up periods for each patient are presented in Appendix B.1.1 (Table A2). The valid and invalid Fitbit days for each patient are also available in Appendix B.1.1 (Tables A3-A5).
As previously mentioned, Fitbit-recorded behavioral markers for physical activity, sleep, and heart rate were collected for the 27 patients. Mean daily physical activity consisted of 7667.7 ± 3521.6 steps; 352.8 ± 89.8 min/patient of light activity; 43.4 ± 40.1 min/patient of fair activity; and 60.6 ± 33.0 min/patient of vigorous activity. For heart rate, the Fitbits recorded a mean of 50.3 ± 6.7 beats/min for the daily minimum, 69.3 ± 9.8 beats/min for the daily mean, 66.2 ± 10.7 beats/min for the daily median, 124.9 ± 12.7 beats/min for the daily maximum, and 4.2 ± 0.9 beats/min for the daily SD. More details are available in Appendix B.1.2 (Tables A6-A8).

Inferential Analysis
Increases in light to fair physical activity duration and in heart rate resulted in an increased risk of a VA event; conversely, increases in fair to vigorous physical activity duration and in sleep duration that included the awake sleep type (i.e., asleep + awake) resulted in a decreased risk of a VA event (Figures 5 and 6 and Table 5).

Figures 5 and 6. ORs are depicted as bars: OR > 1 (red and upwards) and OR < 1 (green and downwards). Bar height corresponds to the distance between the OR and 1.

Spending more time in light to fair physical activity increases the risk of VA events. On average, the odds increased by 9 to 20 percent when time spent in light-intensity physical activity increased by 15 min per day, as measured over 1-3 weeks. Furthermore, 15 additional minutes of light or fair activity led to an average odds increase of 9 to 12 percent, as measured over 1-2 weeks.
However, fair to vigorous activity reduces the risk of VA events. The odds decreased by 32 to 34 percent on average with every 15 extra minutes of fair activity per day, measured over 4-8 weeks. The odds also decreased by 16 to 21 percent on average with every 15 extra minutes of vigorous activity, measured over 2-8 weeks. Furthermore, 15 more minutes of combined fair and vigorous activity reduced the odds by 12 to 16 percent, as measured over 2-8 weeks.
A higher heart rate increases the risk of VA events. Ten extra beats per minute increase the odds of a VA event as follows: minimum heart rate measured over one week doubled the odds, mean heart rate measured over 2-3 weeks increased the odds four to ten times, median heart rate measured over 2-4 weeks increased the odds four to sixteen times, and maximum heart rate measured over two weeks doubled the odds of a VA event. More findings are available in Appendix B.2 (Tables A9-A12).
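The models are specified per one-unit change (one minute, one beat per minute), whereas the findings above are stated per 15 min and per 10 beats/min. The text does not state how the conversion was done; a common approach, assumed here, is to raise the per-unit OR to the power of the number of units. The function names and the example value of 1.006 are hypothetical and are not results from the study.

```python
def rescale_odds_ratio(or_per_unit, units):
    """Convert a per-unit odds ratio to an odds ratio for `units` units.

    Equivalent to exp(units * beta) when or_per_unit = exp(beta).
    """
    return or_per_unit ** units

def percent_change_in_odds(odds_ratio):
    """Express an odds ratio as a percent change in the odds."""
    return (odds_ratio - 1.0) * 100.0

# Hypothetical per-minute OR, for illustration only (not a study result).
example_or_15_min = rescale_odds_ratio(1.006, 15)  # about 1.09
```

Under this assumption, a doubling of the odds per 10 beats/min corresponds to a per-beat OR of 2 ** 0.1, that is, about 1.07.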

Key Findings Compared to Prior Work
This exploratory observational study assessed the relationship between behavioral activity changes and the risk probability of potentially life-threatening VA events by comparing data from ICDs and Fitbit wearable activity trackers. We found that higher heart rates and spending more time in light to fair physical activity increased the risk of imminent VA events, whereas fair to vigorous activity reduced the risk. Few previous studies have assessed the risk probability of VA events using technology-reported data from ICDs and wearable activity trackers, especially with longer follow-up times. By assessing the utility of consumer-grade wearable activity trackers for early risk assessment of VA events, the aim is to build towards the validation of interpreting significant behavior changes (activity levels, sleep) as a vital clinical sign for early clinical intervention among patients at risk of life-threatening heart arrhythmias.
Our cohort was representative of an ICD population with regard to age and gender, with a predominance of males. The results indicated that increased duration of light or light + fair physical activity increased the risk probability of a VA. Conversely, a decrease in fair, fair + vigorous, or vigorous activity levels increased the risk probability of a VA. Our results therefore support a cardioprotective effect of exercise [45] and suggest that there is an increased risk of developing arrhythmia with decreased activity levels. Few previous studies have focused on the outcome of VA and its relationship to physical activity measured by an ICD device [46]. For example, in one study among an all-female cohort, ICD device-measured physical activity started to decline 16 days before a VA and defibrillator shock [47]. Moreover, declining physical activity has been previously used as a predictor for outcomes such as heart failure and mortality [46]. An inverse relationship between activity level and cardiovascular events has been found using a wearable activity tracker for activity monitoring in a cohort without a prior or concurrent cardiovascular disease [48].
Based on our findings, future studies could measure light physical activity over shorter time intervals (1-3 weeks). They may consider 9 percent as a baseline odds increase for 15 extra minutes of light to fair activity. Fair and vigorous physical activity could be measured between 2 and 8 weeks, where future studies may consider the average odds decreases of 32% and 16% for every 15 extra minutes of these activity intensities. Heart rate yielded significant effects within 1-3 weeks of monitoring. Ten extra beats per minute increased the odds twofold on average. Heart rate could be monitored most closely, as changes associated with the minimum and maximum heart rates were visible in short periods (one and two weeks, respectively).
This study does not report key findings related to sleep. This choice was made because we noted that Fitbit may not always have reported sleep durations for the patients, as the awake, sleeping, and non-wear times were accumulated in the dataset. Nevertheless, prior literature has described an association between sleep behavior and physical activity for patient-reported physical function, quality of life, and cognitive function, though often with the limiting factor of self-reported sleep outcomes in homogeneous populations [49,50]. As there are interactions between physical activity and sleep throughout the day [51], we suggest that sleep measurements should be included in future studies; the validation of technology-reported sleep duration with self-reported sleep duration could ensure realistic measurements.

Implications for Designing Systems for Ventricular Arrhythmia Risk Assessment Using Wearable Activity Data
The results from this study and similar previous studies suggest that several critical aspects may influence the quality of data collected and pose potential scaling issues when leveraging consumer-wearable activity trackers for VA risk assessment among ICD patients. Although the results are indicative, they suggest that consumer-wearable activity trackers can be leveraged for VA event risk assessment, potentially enabling better (self-) management of activities contributing to health (especially physical activity), and may ultimately lead to improved health outcomes.
There are several implications for systems designed for risk prediction of VAs. First, there is, on the one hand, the potential derived from using external research-grade wearable activity trackers, which can capture high-granularity and high-accuracy data [52]. Such devices have produced more accurate measures of daily activity compared to activity tracking using accelerometers embedded in ICDs, which is limited to daily summaries of physical activity [53]. On the other hand, the consumer-wearable data must be accurate enough for the clinical purpose; in our case, the sleep datasets were deemed unreliable. Second, in the larger context of current developments, comparisons between research-grade and consumer-wearable activity trackers have shown strong validity, although the validity ranged widely between devices [25]. Third, there is a need for consensus on many levels regarding the use of wearable accelerometers, such as ways to manage the differences among proprietary algorithms for behavioral markers [15,16]. Finally, to identify implications for technology and system design, exploration of the benefits of long-term use of external activity trackers is needed.
For patients who accept consumer-wearable activity trackers, the accuracy of the predictive performance and the timeliness of notifications are critical for the success of usage and collected data quality [54]. In an era of remote, decentralized, and increasingly personalized patient care, our results indicate that physical activity measured through consumer-wearable activity trackers can play a role in cardiovascular event risk assessment. However, there is an evident need for more extensive, prospective, and well-designed studies to quantify the utility of physical activity as a vital signal of clinical deterioration and VA.

Strengths and Limitations
A strength of this study is that it is the first to examine the outcome of VA using raw data from ICDs and to compare it with data from a consumer-wearable activity tracker. Second, the activity tracker wear time was on average over one year, which far exceeds the average wear time in previous studies [16]. Third, the wearable trackers measured multiple daily behaviors continuously, allowing behaviors to unfold over extended periods of time. Fourth, an accurate, user-friendly, and data collection-friendly device was used. The selected device may have contributed to the data quality to a greater extent than other consumer-wearable activity trackers [19,20].
This study has several limitations. First, the small sample size of unique patients prevents a separate analysis based on age or gender and limits our confidence regarding the generalizability of the findings to a larger population. The results are based on a large amount of longitudinal data with several time epochs per individual patient. This approach poses a risk of carry-over bias but arguably resulted in a conservative analysis, given that any association between exposure and outcome had to be robust enough to nullify inverse or null associations during parts of the same time epoch. We allowed a permissive significance threshold of α = 0.05 without adjustments for multiple tests, but also found highly significant results (e.g., p < 0.001). These results would therefore remain significant under multiple-comparison corrections of up to 50×. Furthermore, the patient effect was adjusted for by treating each behavior-time-event data point as unique to the patient, and no additional variables could be added, which avoided collinearity.
Second, the definitions of behaviors reported by Fitbit, specifically the thresholds in the Fitbit activity recognition algorithm, are unknown for different physical activity intensities, as well as for sleep. This limitation was accounted for by including cumulative variables for physical intensities (e.g., light + fair) and sleep (e.g., asleep + awake). In addition, Fitbit did not distinguish sedentary duration as time awake, sleeping, or non-wear. This limitation made it difficult to delineate and thereby analyze these behaviors and may explain the mean recorded sleep duration of under four hours. This limitation was accounted for by excluding the sedentary duration from the analysis.
Third, feedback about behaviors (e.g., visualizations of the number of steps) and observations reported to patients by the Fitbit device and associated application might have influenced the behaviors under study. The patients may have changed their physical activity or sleep patterns based on the feedback provided by the device.
Finally, the patient data did not contain baseline characteristics, such as concurrent heart disease, presence of heart failure, medications, or comorbidities (e.g., hypertension or diabetes), that may be confounders influencing both the behavioral and the VA outcomes.

Conclusions
In light of the increased availability and reliability of consumer-wearable activity trackers, this study explored the extent to which daily behaviors reported by such trackers can assist in VA risk assessment in ICD patients. The results indicated that increased levels of activity are cardioprotective, decreasing the odds of experiencing a VA event. Future studies using consumer-wearable activity trackers in a larger population can further refine our findings to assess the risk of VA.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jpm12060942/s1. The supplementary files contain a Data Format Example (Listing S1) and the Data Analysis Code (Listing S2).

Abbreviations
The following abbreviations are used in this manuscript:
From a space of consumer wearable activity trackers with over 200 models [22], we selected Fitbit for this study driven by three considerations: Fitbit embraces human factors for extended wear, reports daily life behavioral markers accurately, and allows researchers to collect data reliably. First, in past usability studies, participants found Fitbit "easy to use, useful, and acceptable" and the most usable compared to other wearables [23,24]. Second, Fitbit aims to motivate consumers to "reach health and fitness goals by tracking activity, exercise, sleep, weight, and more" [25]. Previous studies measured the accuracy of Fitbit consumer-friendly devices in reporting daily life behaviors of physical activity [26][27][28] and sleep [29,30]. Furthermore, Fitbit was selected for Digital Health software pre-certification by the US FDA [55]. Third, Fitbit reliably exposes behavioral markers for physical activity, sleep, and heart rate multiple times per day (sufficient for longitudinal studies) in the JSON and CSV formats, easy to read by both humans and machines. We selected the Fitbit Alta HR wearable for our study, a lightweight activity tracker that can monitor physical activity, sleep, and heart rate throughout the day.

Appendix A.2 Data Quality Validation
We evaluated the quality of our data under three types of scenarios; for each, the data are called 'valid' if their quality is above the threshold, and 'invalid' otherwise. The main results of the paper are based on the third scenario type for days and all scenarios for weeks and periods. The remaining scenario types for days are elaborated on below.
A day was deemed valid if it did not include a VA event and:
The scenario types permitted missing data for device battery charging and handling (1), the presence of at least some physical activity (2), and extended sedentary behaviors expected from older patients during the day (3). Conversely, the scenario types aimed at reducing the impact of missing measurements by modeling the day as a period of 24 h [45].
A week was deemed valid if it did not include a VA event and had at least four, six, or seven valid days (three separate scenarios). These thresholds were meant to ensure enough days during the week while allowing for a few days without monitoring. These thresholds are concordant with previous studies using Fitbit consumer wearables [31].
A 1-8-week period was deemed valid if it did not include a VA event and had at least 50%, 75%, or 100% valid weeks (three separate scenarios). We chose at least one week for the analysis to monitor data representative of daily life since Fitbit's accuracy of the active minutes improves from one day to seven days [26]. An example is visible in Figure A1.
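The week and period rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code (Listing S2): the function and parameter names are hypothetical, and day-level validity is taken as a given boolean input because the day criteria are not reproduced here.

```python
from typing import Sequence


def week_is_valid(valid_days: Sequence[bool], has_va_event: bool,
                  min_valid_days: int = 4) -> bool:
    """A week is valid if it has no VA event and enough valid days.

    min_valid_days is 4, 6, or 7 in the three week scenarios.
    """
    return (not has_va_event) and sum(valid_days) >= min_valid_days


def period_is_valid(valid_weeks: Sequence[bool], has_va_event: bool,
                    min_valid_fraction: float = 0.5) -> bool:
    """A 1-8-week period is valid if it has no VA event and a sufficient
    share of valid weeks (0.5, 0.75, or 1.0 in the three period scenarios).
    """
    if not valid_weeks:  # an empty period cannot be valid
        return False
    share = sum(valid_weeks) / len(valid_weeks)
    return (not has_va_event) and share >= min_valid_fraction


# Hypothetical week with four valid days and no VA event.
example_week = [True, True, True, True, False, False, False]
print(week_is_valid(example_week, has_va_event=False, min_valid_days=4))
print(week_is_valid(example_week, has_va_event=False, min_valid_days=6))
```

The example week passes the most permissive scenario (at least four valid days) but fails the stricter six-day scenario, which is how the same data can be valid under one scenario and invalid under another.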
The Fitbit activity tracker for Patient 4 recorded only a few days of sleep before a VA. The daily heart rate kept increasing for two weeks before the event. Then, right after the event and for three months, Fitbit recorded daily sleep durations almost every day. Then, Fitbit recorded sleep more sporadically, as before the event. Figure A2 shows the measurement period in extenso for Patient 4.
Upon experiencing the VA event, Patient 4 may have decided to extend the wear of his activity tracker during the awake sedentary time (as self-reported) and during the sleeping time to monitor his heart rate in expectation of a future VA event. The absence of a second VA event within three months may have contributed to the decay of this effort.
Patient 12 (male, 66-year-old, device type CRT-D) was monitored over a period of 1147 days (Table A2). He experienced 23 VA events, five of type VT, 12 of type VT1, and six of type VF-VT (Table A1) over a period of 1034 days (Table A2). He was also a compliant Fitbit user, wearing the device for 647 days (Table A2), of which up to 543 were valid (Tables A3-A5). His physical activity was below average (Table A6), his sleep was above average (Table A7), and his Fitbit-reported heart rate was not extreme (Table A8). However, he suffered from disease-related anxiety, was very alert to bodily signs, and had difficulty sleeping: "as soon as there is even a little thing in these zones [in his chest region], and I would even say just one, like a sprain, I get nervous" [20]. He self-reported that Fitbit helped him feel safe: "now I get certainty. Is something wrong or not" [20], but he also voiced doubts about its usefulness after he and his wife saw it overestimate sleep: "we could not make the numbers fit because I had been awake a lot" [20].
This patient wore the device compliantly for one year, measuring physical activity and sleep. The Fitbit tracker measured sleep for about nine months before the initial VA event. Right before the event, the heart rate increased for about one week. For three months after the initial VA event, the patient wore the tracker with interruptions. A series of other VA events led to a similar interruption of wear. His steps decreased in the time intervals following VA events. Figure A3 shows the measurement period in extenso for Patient 12.
After experiencing the first VA event, the compliance of Patient 12 decreased, with interrupted wear time lasting several weeks, and he performed fewer steps, potentially recovering from the event. However, before the second event, reduced compliance was observed again, with partial Fitbit wear during the day. The change in wear indicates that his behaviors or physical state may have changed (e.g., he started a new type of activity that may have contributed to the VA event).
Patient 18 (male, 47-year-old, device type ICD) was monitored over a period of 993 days (Table A2). He experienced 20 VA events, nine of type VT, six of type VT1, and five of type VF-VT (Table A1) over a period of 972 days (Table A2). He wore the Fitbit for 326 days (Table A2) of which up to 218 were valid (Tables A3-A5). He was less active than average (Table A6), measured more sleep than average (Table A7), and had a lower heart rate than average (Table A8). He only voiced doubts about using the Fitbit device, indicating that "sensing is more useful than activity data" [20], but was otherwise compliant in wearing it.
We first observed a decrease in wear after a series of VA events for Patient 18. He was compliant in the first three months when both physical activity and sleep were measured. Then, wear continued for one month, but almost without sleep. During this period, an event occurred. After the event, Fitbit recorded one month of similar compliance and three months of reduced compliance. A second event was followed by increased compliance for less than two months. Finally, a third series of events over 15-20 days coincided with the cessation of compliant wear. Figure A4 shows the measurement period in extenso for Patient 18.
Before the event, the wearing pattern changed, indicating that Patient 18 may have begun a new activity or entered a physical state that reduced wear at night or natural sleep and culminated in a VA. One month after the event, the wearing pattern continued, indicating that the patient may have continued daily life without changing his activity or state that contributed to the first event. Towards the end of the monitoring period, the patient may have experienced too many events or lost faith in using the device for self-monitoring.

Appendix B.2 Inferential Analysis
Tables A9-A11 illustrate the results across all conditional logistic regression models for the three validation scenario types for days. Table A12 summarizes the results across all validation scenario types for days, in agreement with the results reported in the main paper. Additionally, Table A13 presents the results by period duration in the third validation scenario.
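To illustrate what a conditional logistic regression model estimates in a case-crossover design, the following minimal sketch fits a single coefficient by Newton-Raphson. It assumes one control period per case period (1:1 matching) and a single exposure, which may differ from the authors' design; the function names and the exposure differences are hypothetical and are not study data.

```python
import math


def conditional_score(beta, diffs):
    """Derivative of the conditional log-likelihood for 1:1 matched pairs.

    diffs holds (exposure in case period) - (exposure in control period).
    """
    total = 0.0
    for d in diffs:
        # Probability that the observed case period is the case within the pair
        p = 1.0 / (1.0 + math.exp(-beta * d))
        total += d * (1.0 - p)
    return total


def fit_paired_clogit(diffs, n_iter=50, tol=1e-10):
    """Newton-Raphson estimate of the log odds ratio per unit of exposure."""
    beta = 0.0
    for _ in range(n_iter):
        probs = [1.0 / (1.0 + math.exp(-beta * d)) for d in diffs]
        # Observed information (negative second derivative)
        info = sum(d * d * p * (1.0 - p) for d, p in zip(diffs, probs))
        step = conditional_score(beta, diffs) / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical exposure differences (e.g., hours of a behavior), not study data.
diffs = [1.0, 2.0, -0.5, 1.5, -1.0, 0.5, 1.0, -0.5]
beta_hat = fit_paired_clogit(diffs)
odds_ratio = math.exp(beta_hat)
print(f"beta = {beta_hat:.3f}, odds ratio = {odds_ratio:.3f}")
```

Because each patient serves as their own control, characteristics that are constant within a patient cancel out of the conditional likelihood; an odds ratio above 1 indicates that higher exposure in a period is associated with higher odds of a VA event in that period.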
