Validation of Commercial Activity Trackers in Everyday Life of People with Parkinson’s Disease

Maintaining physical activity is an important clinical goal for people with Parkinson’s disease (PwPD). We investigated the validity of two commercial activity trackers (ATs) to measure daily step counts. We compared a wrist- and a hip-worn commercial AT against the research-grade Dynaport Movemonitor (DAM) during 14 days of daily use. Criterion validity was assessed in 28 PwPD and 30 healthy controls (HCs) by a 2 × 3 ANOVA and intraclass correlation coefficients (ICC2,1). The ability to measure daily step fluctuations compared to the DAM was studied by a 2 × 3 ANOVA and Kendall correlations. We also explored compliance and user-friendliness. Both the ATs and the DAM measured significantly fewer steps/day in PwPD compared to HCs (p < 0.01). Step counts derived from the ATs showed good to excellent agreement with the DAM in both groups (ICC2,1 > 0.83). Daily fluctuations were detected adequately by the ATs, showing moderate associations with DAM-rankings. While compliance was high overall, 22% of PwPD were disinclined to use the ATs after the study. Overall, we conclude that the ATs had sufficient agreement with the DAM for the purpose of promoting physical activity in mildly affected PwPD. However, further validation is needed before clinical use can be widely recommended.


Introduction
Maintaining sufficient levels of physical activity (PA) is recognized as an important component in the treatment of Parkinson's disease (PD) [1,2] and is adopted in current clinical guidelines of physiotherapy [3,4]. A recent retrospective cohort study showed that higher PA levels were associated with better gait and balance scores six years later in PD [5].
Although this class II study should be interpreted cautiously, the case for improving or at least maintaining PA levels in PD is strong [6]. Furthermore, various studies have shown that high PA is associated with a reduced risk for conversion to PD [7,8], and increasing both PA and exercise intensity were recommended as the most important lifestyle changes to delay the onset of PD in prodromal cases [6].
Despite all these positive findings indicating the importance of PA as a strategy to impact disease outcomes, it is also known that maintaining or improving PA is not trivial in PD. The same longitudinal study mentioned above [5] demonstrated that PA levels declined significantly over six years in PD, while remaining stable in age-matched healthy controls (HCs). Even in de novo PwPD with low disease severity [9] and in a cohort with [...]
1. To examine whether the ATs can detect day-to-day fluctuations accurately and adequately rank days with high and low step counts when compared with the research-grade device in PwPD versus HCs.
2. To explore the correlations with other gait and balance capacity measures in PD.
3. To examine the compliance and user-friendliness of the tracking devices in PD.

Participants
For this study, 28 PwPD and 18 HCs were prospectively recruited through a GDPR-compliant database of study volunteers between August 2018 and March 2021. Data of another fifteen HCs (total HC n = 33) were retrospectively obtained from a study in chronic obstructive pulmonary disease applying exactly the same methodological procedures [19]. The inclusion criterion for both groups was an age of 40 years or older. PwPD also needed to have a diagnosis of idiopathic PD according to the UK Brain Bank criteria [20]. Exclusion criteria were: (I) self-reported neurological diseases other than PD or other conditions significantly affecting mobility; (II) a body mass index (BMI) over 40 kg/m²; (III) use of an assistive walking device; (IV) >1 fall per week in the past 6 months based on self-report; and (V) a Mini-Mental State Examination (MMSE) score of <24/30. All participants provided written informed consent prior to participation in accordance with the Declaration of Helsinki, and ethical approval was obtained from the Ethical Committee Research UZ/KU Leuven (S60227).

Instruments
Three different commercial ATs were used in this study: the Fitbit Zip, Fitbit Alta, and Fitbit Inspire (Fitbit Inc., San Francisco, USA). The Fitbit Zip and Fitbit Alta were designed for wearing on the hip and wrist, respectively. The more recently developed Fitbit Inspire was designed for both locations and was brought into the study to replace defective Fitbit Zip and Fitbit Alta ATs, which were no longer available. All three commercial ATs contained a triaxial accelerometer and proprietary algorithms to provide direct feedback to the wearer via the device display. The Fitbit Zip had a 3 V coin battery with an autonomy of 4 to 6 months. The Fitbit Alta and Fitbit Inspire had built-in batteries requiring charging every 5 to 7 days. The information displayed by the trackers was reduced to the minimum (step counts and clock) and the ATs' built-in prompts and rewards were disabled. The research-grade monitor was the Dynaport Movemonitor (DAM; McRoberts BV, The Hague, the Netherlands), which was worn on the lower back with an elastic strap. The DAM contains a triaxial accelerometer, a triaxial magnetometer, a temperature sensor, and a barometer. It has a maximal measurement duration of 14 days without charging the battery at a sample frequency of 200 Hz. The DAM only records movement signals and does not run an onboard algorithm allowing direct feedback to the wearer. The DAM was previously validated in a laboratory setting for detecting step counts in PwPD compared to videotaped step counts (n = 32; ICC = 0.98; absolute percentage error 6.9 ± 3.0) [21,22]. Of note, however, short walks resulted in the highest absolute percentage error of step counts (3 m: 18.4 ± 21.0; 5 m: 9.6 ± 3.4). The DAM is currently used in an ongoing study to obtain regulatory endorsement for real-world digital mobility in PD and other chronic diseases [23].

Procedure
In this study, participants underwent a baseline assessment, after which a 14-day activity monitoring period was started. During the baseline assessment, the following measures were collected in both groups: (1) demographics, (2) the Montreal Cognitive Assessment [...]
Next, participants were instructed to wear the ATs and the DAM for 14 days during waking hours, except while bathing, showering, or swimming. No specific instructions were given to monitor their step counts regularly during use of the ATs. Each participant wore a Fitbit Zip/Fitbit Inspire at the hip (Hip-AT) and a Fitbit Alta/Fitbit Inspire at the wrist (Wrist-AT). AT settings were adjusted according to the age, height, and weight of the participants, and both ATs were worn on the same body side (see Figure 1). The HC participants wore the ATs on their non-dominant side to reduce noise due to other arm movements. PwPD wore the commercial ATs on their least affected side (wrist or hip) as determined by the MDS-UPDRS-III. The DAM was positioned at the lower back and fastened by a strap (see Figure 1). Participants received a visual demonstration on how to put on and recharge the ATs. They also received a manual describing all the information for home use. Although batteries could last for multiple days, participants were instructed to recharge them each night to avoid step count discrepancies between the ATs and the DAM due to battery depletion. After the 14-day monitoring period, the devices were re-collected. A brief exit questionnaire (see Supplementary Materials), developed in our center [19], evaluated the user experience on a 5-point Likert scale including the following items: (1) the comfort of wearing, (2) recharging the AT, (3) how often they looked at the AT display, (4) for how long they would wear the AT in future daily routine, and (5) which AT they preferred. Finally, open questions were included to list the ATs' positive and negative aspects.
Daily step counts were extracted from the online Fitbit platform after re-collecting the commercial devices. The DAM-data were uploaded and processed on the McRoberts cloud service, generating activity reports which included both the step counts and wearing time. All data were manually extracted from the respective platforms and entered into REDCap (www.project-redcap.org). Wearing time was only registered by the DAM. Only days with a wearing time of 8 h or more and only participants with at least 3 days of valid step count data from both the AT and the DAM were included for analysis [19].
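The inclusion rule for valid days can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used (data were entered manually into REDCap): the record layout and the key names `participant`, `wear_hours`, `dam_steps`, `hip_steps`, and `wrist_steps` are hypothetical, and a day is treated as valid only when all three devices provide a step count.

```python
from collections import defaultdict

MIN_WEAR_HOURS = 8   # from the text: days need a DAM wearing time of 8 h or more
MIN_VALID_DAYS = 3   # from the text: at least 3 valid days per participant

# Hypothetical field names for the step counts of the three devices.
STEP_KEYS = ("dam_steps", "hip_steps", "wrist_steps")


def select_valid_days(records):
    """Group valid day-records per participant and drop participants
    with fewer than MIN_VALID_DAYS valid days.

    records: iterable of dicts with the hypothetical keys
    'participant', 'wear_hours', and the three STEP_KEYS.
    """
    valid_by_participant = defaultdict(list)
    for rec in records:
        enough_wear = rec["wear_hours"] >= MIN_WEAR_HOURS
        has_all_counts = all(rec.get(key) is not None for key in STEP_KEYS)
        if enough_wear and has_all_counts:
            valid_by_participant[rec["participant"]].append(rec)
    return {
        participant: days
        for participant, days in valid_by_participant.items()
        if len(days) >= MIN_VALID_DAYS
    }
```

A participant with four recorded days of which one has less than 8 h of wearing time would be retained with three valid days; a participant with only two complete days would be dropped.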

Statistical Analysis
Normality was tested using Shapiro-Wilk tests and by inspecting histograms and Q-Q plots. Parametric statistics were applied for all analyses. In case of non-normally distributed data, non-parametric statistics were also applied. If both parametric and non-parametric analyses showed similar results, parametric results are reported.
Criterion validity for daily step counts of the commercial AT was assessed through a 2 × 3 (group × device) ANOVA and absolute agreement intraclass correlation coefficients (ICC2,1) between each AT and the DAM per group. Bland-Altman plots were used to visually investigate the agreement between the AT and the DAM.
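For readers unfamiliar with these agreement statistics, the following Python sketch shows how an absolute-agreement ICC(2,1) and the Bland-Altman bias with 95% limits of agreement can be computed. It assumes the standard Shrout and Fleiss two-way random-effects formula and one mean steps/day value per participant and device; it is not the SPSS procedure used in this study, and the function names are hypothetical.

```python
from statistics import mean, stdev


def icc_2_1(ratings):
    """Absolute-agreement, single-measure ICC(2,1) (Shrout and Fleiss).

    ratings: list of rows, one per participant; each row holds the k
    measurements of that participant (e.g., [AT steps/day, DAM steps/day]).
    """
    n = len(ratings)        # number of participants
    k = len(ratings[0])     # number of devices
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between devices
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def bland_altman(test, reference):
    """Return (bias, lower limit, upper limit) of agreement for paired values."""
    diffs = [t - r for t, r in zip(test, reference)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)   # 95% limits of agreement
    return bias, bias - spread, bias + spread
```

Because ICC(2,1) penalizes systematic offsets between devices, a tracker that undercounts by a constant amount scores below 1 even when it orders participants perfectly, whereas the Bland-Altman bias isolates that offset.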
To evaluate whether the ATs were able to monitor day-to-day fluctuations, the delta between the step counts for each consecutive day was calculated. Next, for each subject, this delta was averaged and expressed as a percentage of the subject's average step count over the 14-day period. A 2 × 3 ANOVA was used to test the differences between groups and ATs for this daily variance. In addition, the step counts for each participant and for each device were ranked separately from the most to the least active day. Next, the consistency of the ranking between the DAM and that of the AT was investigated using a Kendall correlation. Correlations were interpreted as: weak correlation r = 0.30-0.49; moderate correlation r = 0.50-0.69; strong correlation r = 0.70-0.89; and very strong correlation r ≥ 0.90 [24].
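The two fluctuation measures described above can be illustrated with a short Python sketch. Several points are assumptions rather than details given in the text: the day-to-day delta is taken as an absolute difference, the valid days are treated as consecutive, and the Kendall coefficient is computed as tau-b (which handles tied days). The function names are hypothetical. Computing Kendall's tau on the raw daily step counts gives the same value as computing it on the per-device day rankings, because the statistic depends only on the ordering.

```python
from itertools import combinations
from math import sqrt


def mean_daily_fluctuation_pct(steps):
    """Mean absolute day-to-day change, as a percentage of the mean steps/day.

    steps: daily step counts of one participant for one device, in day order.
    """
    deltas = [abs(later - earlier) for earlier, later in zip(steps, steps[1:])]
    mean_delta = sum(deltas) / len(deltas)
    mean_steps = sum(steps) / len(steps)
    return 100 * mean_delta / mean_steps


def kendall_tau_b(x, y):
    """Kendall's tau-b between two paired sequences (e.g., AT vs. DAM days)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                    # tied on both: ignored by tau-b
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    # Undefined (division by zero) if either sequence is constant.
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denominator
```

For example, a participant alternating between 1000 and 2000 steps over three days has a mean delta of 1000 steps against a mean of about 1333 steps/day, giving a fluctuation of 75%.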
To explore the concurrent validity of AT-based step counts, Pearson correlations were used to examine the association with disease measures and gait and balance capacity outcomes in PwPD only. Differences in compliance between PwPD and HCs were calculated using an independent t-test on the average number of days of use and average wearing time. User experience and preferences were analyzed in PwPD only. Questionnaire data were analyzed descriptively as in the original publication [19]. Positive and negative statements were counted and added to the results only to help interpretation of the Likert-scale rated items.
All data were analyzed using SPSS version 28.0 (IBM, Armonk, NY, USA) and the significance level was set at p < 0.05 for all analyses. In case of significant interaction effects in the ANOVA analyses, Bonferroni corrected post hoc tests were applied. Exploratory correlation analyses were not corrected for multiple testing.

Results
Three HC participants were excluded because they had fewer than three valid days of step count data. Demographics are presented in Table 1, showing that the groups were matched for age. As expected, PwPD had a significantly shorter 6 MinWT distance and higher subjectively reported walking difficulties on the 12-WS compared to HCs. Twelve PwPD self-reported having freezing of gait. Furthermore, the life space assessment score tended to be lower for PwPD than for HCs (p = 0.06).

Criterion Validity
Overall, there was a significant interaction effect (p < 0.001, see Table 2 and Figure 2A), with between-group post hoc tests revealing that both ATs measured fewer steps/day in PwPD compared to HCs, a discriminative ability which was similar to that of the DAM (p < 0.01). With the DAM, PwPD reached 87% of the HC values; with the Hip-AT, 83%; and with the Wrist-AT, 70%. Furthermore, the Hip-AT significantly underestimated the steps/day compared to the DAM in both PwPD and HCs (∆PwPD: −746 (−10%) steps/day; p < 0.001; Figure 3A, ∆HC: −505 (−6%) steps/day; p < 0.001; Figure 3C). In contrast, while the Wrist-AT significantly overestimated the steps/day in HCs (∆ 1613 (20%) steps/day; p < 0.001; Figure 3D), its step counts did not differ significantly from those of the DAM in PwPD (∆ −243 (−3%) steps/day; p = 0.29; Figure 3B). Despite these errors, overall daily step counts derived from the ATs had good to excellent agreement with the step detections of the DAM (ICC2,1 > 0.83, see Table 2). These findings are supported by the Bland-Altman plots depicted in Figure 3. Interestingly, although the bias of the Hip-AT (−746 steps/day; Figure 3A) was larger than that of the Wrist-AT (−243 steps/day; Figure 3B), the 95% confidence intervals were narrower for the Hip-AT in PwPD, resulting in better ICC values.

Detection of Daily Fluctuations
As presented in Table 3 and Figure 2B, a significant interaction effect was observed for the daily fluctuations (p = 0.005). Post hoc analysis revealed that both the Hip-AT (p < 0.001) and the Wrist-AT (p = 0.03) significantly overestimated the daily fluctuations, by 12.8% and 9.5%, respectively, in comparison with the DAM in PwPD only. No significant differences were observed between HCs and PwPD within each device (all p > 0.20). In line with this, the Kendall analysis indicated that the ranking of high- and low-step days was significantly associated between each commercial AT and the DAM, although less strongly in PwPD (Hip-AT: r = 0.64; p < 0.001, Wrist-AT: r = 0.60; p < 0.001) than in HCs (Hip-AT: r = 0.74; p < 0.001, Wrist-AT: r = 0.64; p < 0.001). In Figure 4, the larger dots indicate that more participants received a similar ranking from the ATs versus the DAM. PwPD had more scattered rankings for both the Hip-AT (Figure 4A) and the Wrist-AT (Figure 4B) in comparison with HCs (Figure 4C and 4D, respectively). This greater scatter was more prominent for the Wrist-AT than for the Hip-AT, although this ranking difference between ATs was similar in HCs.

Table 3 notes: post hoc contrast p-values, DAM vs. Wrist-AT: p = 0.03; p > 0.99. DAM = Dynaport Movemonitor; Hip-AT = hip-worn activity tracker; Wrist-AT = wrist-worn activity tracker. Significant p-values are indicated in bold. Values were calculated as mean (standard deviation) of the daily fluctuations relative to the average steps/day (%).

Concurrent Validity
In PwPD, the mean step counts measured by both the Hip-AT and the Wrist-AT correlated significantly with the 6 MinWT (both: R = 0.55; p < 0.01), which was similar for the DAM (R = 0.56; p < 0.01). Similarly, all three devices' step counts correlated significantly with the Mini-BESTest

User Experiences
Compliance with the AT was comparable between HCs and PwPD, with an average daily wear time of 13.56 (1.70) hours in PwPD and 14.12 (1.61) hours in HCs (p = 0.20). However, the number of valid days was higher in PwPD (13.25 ± 0.93) than in HCs (11.70 ± 2.65; p = 0.005). The number of valid days did not differ between the retrospectively (11.71 ± 2.08) and prospectively (11.69 ± 2.99; p = 1.00) included HCs. Note that the three HC participants excluded for insufficient days of data were not included in this analysis. Table 4 details the user experiences in PwPD only. Wearing the Wrist-AT was considered to be pleasant more often and the number of steps/day was more frequently checked on this device compared to the Hip-AT. In a similar vein, 21 (75%) of PwPD preferred the Wrist-AT, while only six (21%) preferred the Hip-AT. Four of the six PwPD with a preference for the Hip-AT indicated that this was because of their fine motor difficulties in strapping on the Wrist-AT. The other two PwPD were surprised by the differences in the step counts between the Hip-AT and Wrist-AT, and had the impression that the Hip-AT was more accurate. Only 11 (39%) PwPD were willing to use the Wrist-AT for at least a year. Thirteen PwPD were unwilling to use the Hip-AT again, and for six PwPD the same applied to the Wrist-AT. Those unwilling to use the Wrist-AT were also unwilling to use the Hip-AT. The remaining seven PwPD who were unwilling to use the Hip-AT were willing to continue with the Wrist-AT, of whom four were even willing to do this for at least a year. Overall, 22% of the PwPD were disinclined to make further use of an AT in daily life.

How long would you be willing to wear the tracker in the future as part of your clinical routine? *

Response            Wrist-AT    Hip-AT
A year or longer    11 (39%)    6 (22%)
Months              5 (18%)     5 (18%)
Weeks               4 (14%)     3 (11%)
Days                1 (3.5%)    0 (0%)
Never               6 (22%)     13 (46%)

* One PwPD did not respond to this question.

Discussion
This study investigated the validity and user experience of commercial ATs for step count monitoring in PwPD. We contrasted the ATs' ability to measure step counts to that of a research-grade device by comparing their performance between PwPD and HCs. Contrary to our hypothesis, step count measurement was worse in HCs compared to PwPD for the Wrist-AT, showing a consistent overestimation by 20% in HCs. Even though the Hip-AT significantly underestimated the number of steps, there was excellent agreement between the step counts of the Hip-AT and the research-grade monitor in PwPD, in contrast to the Wrist-AT. Otherwise, a largely similar pattern of good agreement between devices was found between and within groups. Furthermore, the between-group post hoc analyses indicated that the ATs were able to discriminate PwPD from HCs as well as the DAM did. These results are encouraging as they were derived from prolonged daily life walking for 14 days, constituting a unique feature of this study. Previous work also found valid results when comparing ATs with different ground truths, i.e., investigator in situ counts and video-based step detection [14-17]. However, these reports relied on shorter measurement periods and limited gait protocols.
Another important result from this study was that the ATs were able to measure daily fluctuations of step counts. Although daily fluctuations were overestimated by both ATs in comparison with the DAM in PwPD, the ATs could rank more and less physically active days similarly to the DAM. Taken together, this means that caution is warranted when interpreting fluctuations between consecutive days, whereas progression over a period of 14 days can be discerned reliably. This ability holds promise for future use of commercial ATs in a therapeutic context. Two recent studies have shown that activity tracking in conjunction with therapeutic advice delivered remotely was able to impact gait and balance capacity [25] and prevent the decline of step counts over one year in a more severely affected subgroup of PwPD [13]. However, as a considerable group of PwPD (43%) indicated that they were not inclined to use the devices for months or years as part of a clinical routine, adopting ATs in a therapeutic setting for PA stimulation may not be straightforward in PwPD. In line with prior work [11], ATs may need to be integrated into rehabilitation programs by PD-specialized healthcare professionals in order to achieve optimal PA levels.
We found significant correlations between step counts and the 6 MinWT, a well-reported test of prolonged walking capacity, and the Mini-BESTest, representing balance capacity. Furthermore, higher step counts were modestly associated with higher LSA scores, a measure of self-reported mobility. These associations were robust as they were observed across all three devices. These results are contrary to the outcomes from other studies suggesting that daily life step counts represent a different construct than gait capacity measures [26]. The discrepancy may be attributed to the type of ATs in the present study, which allowed participants to make use of feedback displays. Indeed, 79% of the PwPD used this function at least once a day. This may have encouraged participants to 'live up' to their capacity level. If so, it will not have influenced the validity outcomes as the ATs and the DAM were always worn together. Speculatively, it also underscores the potential of the AT as a motivational tool for therapeutic purposes. However, since this study was limited to 14 days, it remains unknown how long a possible 'boosting' effect would be maintained without therapeutic follow-up. The lack of significant associations between step counts as derived from the commercial ATs and measures of disease severity in the present study indicates the need to use more refined outcomes from research-grade ambulatory monitoring devices, such as the DAM, as possible biomarkers of disease severity and progression [23].
Preference for wrist-worn devices concurs with other studies investigating compliance with wearable sensors. Even though Silva de Lima et al. [27] did not compare a wrist-worn device with others, they found that in 805 participants with PD, compliance with a wrist-worn device reached 62-68% hours/participant/day. This rate only declined by 23-26% after 13 weeks. The reason for liking the wrist-worn device appeared to be the ease of checking the number of steps. However, two PwPD in our study found it more difficult to apply the Wrist-AT versus the Hip-AT. Interestingly, the Wrist-AT seemed to overestimate the step counts, particularly in HCs, compensating for the PD-related underestimation of step counts. This overestimation in HCs could be attributed to the fact that upper limb activities, such as folding laundry, were erroneously detected as steps [28]. We attempted to minimize this by instructing the HCs to wear the Wrist-AT on their non-dominant hand. As PwPD are more restricted in manual tasks [29], this drawback might not have impeded step estimations as much in PwPD. To minimize the effect of reduced arm swing on the Wrist-AT's step count detection, we instructed the PwPD to wear the Wrist-AT on their least affected side. However, since then, another study has shown that step counts from the more affected wrist may be more accurate [30]. Future studies need to examine why the affected arm enhances the accuracy of the detection. Possibly, this may be because the stationary arm is nearer the body's center of mass [28] and thus closer to the spatial location of the DAM at the lower back. Yet, our lower accuracy for the Hip-AT compared to the DAM does not support this notion.
Several limitations need to be considered when interpreting the present findings. In this study, we validated the commercial ATs against a research-grade DAM, which is considered a well-validated activity monitor available on the market [21,22]. Despite the fact that the DAM was previously validated for step measurement in PwPD, and is currently used in a large ongoing validation study including four different disease cohorts [23], most of its validation was conducted in a laboratory setting using straight-line walking. Only recently, the algorithms for the DAM's step detection underwent further technical validation in a semi-structured and a daily life setting, improvements which were not yet available for implementation in the present study [31]. The commercial ATs under investigation in this study also did not allow access to the raw data of the internal sensor hardware, precluding passing through a technical validation framework described by Mazza et al. [31]. As a result, this study does not offer recommendations on how to improve the accuracy of the step count readings.
Although simultaneous use of the devices was a strength of this study, it also meant that participants were able to compare results, which may have influenced the subjective evaluation of the devices. In contrast to the daily wear time of the DAM, the wear time of the ATs could not be recorded objectively. Furthermore, valid days were based on the availability of data in all three devices, which may have underestimated the actual compliance in wear time. PwPD were in the early to mid-stage of PD without cognitive impairment, had adequate activity levels, and formed a convenience sample, which limits the generalizability of the findings to the wider population of PwPD. However, 43% of this cohort presented with freezing of gait and overall PwPD had lower step counts, suggesting that gait disorder was present in this group as expected. Finally, the differences between PwPD and HCs could be explained by the COVID-19 restrictions, possibly affecting the groups' activity levels differently. All but three of the HCs were assessed prior to the COVID-19 pandemic, while this was only the case for seven of the 28 PwPD. Still, the within-subject comparison between the ATs and the research-grade monitor was not affected by the pandemic.

Conclusions
This study demonstrated that although commercial ATs lack some accuracy in registering daily step counts compared to a research-grade device, they have sufficient criterion validity for daily use in early- to mid-stage PwPD. We base this conclusion on the high agreement found between the ATs and the research-grade device for global step counts, as well as on their ability to differentiate high from low step-count days. The concurrent validity with other mobility outcomes also supports the use of ATs, as does the excellent compliance and adequate user-friendliness. About half of the PwPD indicated that they would consider continued use for a prolonged period. Therefore, we cautiously suggest that commercial ATs may be useful tools in therapeutic programs aimed at enhancing daily PA levels. However, we also foresee that therapists' input may be required to encourage more severely affected PwPD to apply ATs consistently to facilitate their long-term use.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethical Committee Research UZ/KU Leuven (S60227).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to GDPR restrictions on pseudonymized participant information.