Predicting Fall Counts Using Wearable Sensors: A Novel Digital Biomarker for Parkinson’s Disease

People with Parkinson’s disease (PD) experience significant impairments to gait and balance; as a result, the rate of falls in people with Parkinson’s disease is much greater than that of the general population. Falls can have a catastrophic impact on quality of life, often resulting in serious injury and even death. The number (or rate) of falls is often used as a primary outcome in clinical trials on PD. However, falls data can be unreliable, expensive and time-consuming to collect. We sought to validate and test a novel digital biomarker for PD that uses wearable sensor data obtained during the Timed Up and Go (TUG) test to predict the number of falls that will be experienced by a person with PD. Three datasets, containing a total of 1057 (671 female) participants, including 71 previously diagnosed with PD, were included in the analysis. Two statistical approaches were considered in predicting falls counts: the first based on a previously reported falls risk assessment algorithm, and the second based on elastic net and ensemble regression models. A predictive model for falls counts in PD showed a mean R2 value of 0.43, mean error of 0.42 and a mean correlation of 30% when the results were averaged across two independent sets of PD data. The results also suggest a strong association between falls counts and a previously reported inertial sensor-based falls risk estimate. In addition, significant associations were observed between falls counts and a number of individual gait and mobility parameters. Our preliminary research suggests that the falls counts predicted from the inertial sensor data obtained during a simple walking task have the potential to be developed as a novel digital biomarker for PD, and this deserves further validation in the targeted clinical population.


Introduction
Parkinson's disease (PD), a progressive neurodegenerative disease, has significant deleterious effects on gait and balance. The prevalence of PD has been estimated as 0.3% in industrialized countries [1], increasing with age to 1% in the over-60s, rising further in the over-80s. The costs associated with Parkinson's disease are significant, estimated to be $23 Bn per year [2,3] in the US and £449 M-£3. 3 Bn per year in the UK [4].
The clinical manifestation of PD is characterized by a broad spectrum of motor and non-motor symptoms [5], and the well-recognized four cardinal features of PD are tremor at rest, rigidity, akinesia (or bradykinesia) and postural instability. A clear symptom of the disease, known as Parkinsonian gait, is characterized by slow, shuffling steps, coupled with impaired dynamic balance.
People with PD are at a much higher risk of falls compared to the general population [6]. They are also twice as likely to fall as patients with other neurological conditions [7,8], falling more frequently particularly in the advanced stages of the disease. It has been estimated that 38-68% of PD patients will fall at some point during the course of their disease [6,[9][10][11]; however, a range of novel medications and non-pharmacological interventions are under development to address this unmet need [12][13][14][15]. The current clinical evidence suggests the best predictor of a fall in a person with PD is the occurrence of a fall in the previous year [16]. As a result, some clinical trials are adopting self-reported falls during a follow-up period as the primary endpoint [12]. When falls are used as an outcome measure, longitudinal studies with longer durations are needed to obtain sufficient falls data to detect a drug compound's effect when compared to the baseline. Furthermore, a lengthy study burdens the patients and may delay the time to market for a compound or intervention that could potentially reduce the frequency of falls.
Accurately measuring the number of falls per patient in a clinical trial can be challenging. Data collection usually involves self-reporting either via diaries or regular (e.g., weekly) investigator follow-up. Self-reporting is prone to bias as it relies on patient recall and can be affected by the individual's perception of a fall [17]. In addition, inaccurate reporting of falls is well-documented [18]. Fall detection technologies use body-worn or ambient sensors to measure or detect the impacts associated with falls. While these systems have been shown to be accurate and sensitive in detecting falls in situations where young adults simulate falls under controlled conditions [19,20], in real-world settings, they have been shown to suffer from a significant rate of false positives. Moreover, they are prone to noncompliance with regard to the long-term wearing of the device [19,21].
The assessment of falls risk in PD patients by measuring individual predictors of falls, such as pathological gait or impaired balance, provides an opportunity to obtain an indication of each patient's individualized risk of falling [22][23][24][25][26]. In contrast to directly detected falls, falls risk can be assessed frequently, objectively and reliably using clinical tests quantified using wearable sensors in a clinical setting or potentially under free-living conditions [24,[27][28][29][30][31]. Moreover, utilizing falls risk assessment as an outcome measure has the potential to reduce the trial sample size and duration. The rate or number of falls observed in PD patients during a clinical trial is often used as a primary outcome measure to capture meaningful, interpretable change attributable to an intervention. However, a digital biomarker predicting the number of falls is currently lacking.
A more promising approach is to use wearable sensors to provide a more sensitive and objective assessment of the response to intervention than what is currently offered by typical functional tests (such as the TUG (Timed Up and Go) test, 180 • turn test, the Tinetti Scale [32], the Functional Reach Test and the Berg Balance Test [33]). As an example, the TUG is a standard clinical test of mobility where patients are observed and timed as they rise from a chair, walk 3 meters, turn, walk back to the chair and sit back down. The time taken to complete the test (TUG time), measured using a stopwatch, is compared with standard values with longer times associated with a greater risk of falls. However, studies using TUG time to distinguish fallers and non-fallers in PD patients report only moderate sensitivity [22,34,35], with similar results reported for other standard functional tests [36,37].
Previous research from our group has shown promising results when instrumenting the TUG test with inertial sensors (QTUG), combining signal processing and machine learning algorithms to produce a statistical estimate of the patient's risk of having a fall [27,38] as well as a statistical estimate of their level of frailty [39]. QTUG has been shown to be reliable in the measurement of gait and mobility [40], as well as accurate in predicting falls in PD [22] and community dwelling older adults [38,41]. We believe a statistical model based on inertial sensor measures of movement has the potential to be used as a surrogate measure of falls counts in patients with PD.
The current literature supports the utility of an array of objective gait and mobility parameters as predictors of falls risk in PD patients; however, in clinical trials, self-reporting of the falls count is still considered the gold standard to evaluate the efficacy of new therapies. To the best of our knowledge, this is the first study employing comprehensive wearable sensor data to develop an algorithm to predict the number of falls that will be experienced by a person with PD. Our work will offer a preliminary clinical validation of a novel digital biomarker that can support the effect detection and interpretation of clinical trial outcomes. Given that falls can be catastrophic for people with PD, therapies and interventions that aim to reduce falls could have a significant and beneficial effect on the quality of life for disease sufferers.

Datasets
Three datasets containing a total of 1057 participants were included in the analysis, including 71 participants previously diagnosed with PD. The data consisted of one set of healthy community dwelling older adults recruited into a large research study (Technology Research for Independent Living "TRIL" dataset), referred to hereafter as the Training Dataset "TD", and two sets of Parkinson's Disease patients (Order of Saint Francis "PD1" dataset and University College Dublin's "Healthy PD", referred to as "PD2" dataset hereafter). All the participants completed at least one QTUG assessment depending on the study protocol. Table 1 below provides summary details on the three datasets included. Further details on each dataset are provided in the sections below and in Appendix A. Of the two PD datasets reported, the PD2 participants were considered less impaired than the PD1 participants (see UPRDS (Unified Parkinson's Disease Rating scale part III) scores in Table 1). Each study received ethical approval from the local ethics committee (see detailed information below). All the data reported here were anonymized and stored in line with data privacy regulations in each country (e.g., HIPAA, GDPR).
Marked differences in falls rates can be observed across the datasets, with the PD2 dataset reporting a rate of 0.37 falls per participant (measured retrospectively) in contrast to the PD1 data, which reported 12.1 falls per participant over the study duration (24 weeks measured longitudinally). The distribution of falls for each dataset is reported in Figure 1, which provides a histogram of the falls counts for each dataset. Falls distributions are clustered around low numbers of falls, with a majority (>50%) of participants in each dataset reporting no falls. The falls data obtained in TD were reported clinically to an experienced research nurse who cross-checked against hospital records (where possible). The PD1 falls data were obtained prospectively through daily falls diaries and collected weekly for 6 months from the baseline. The PD2 falls data were self-reported to the researcher at the time of each QTUG assessment over the 12-week study duration. The differences between the falls outcome data obtained for PD1 and PD2 meant that the two datasets could not be pooled and needed to be analyzed separately. Marked differences in falls rates can be observed across the datasets, with the PD2 dataset reporting a rate of 0.37 falls per participant (measured retrospectively) in contrast to the PD1 data, which reported 12.1 falls per participant over the study duration (24 weeks measured longitudinally). The distribution of falls for each dataset is reported in Figure 1, which provides a histogram of the falls counts for each dataset. Falls distributions are clustered around low numbers of falls, with a majority (>50%) of participants in each dataset reporting no falls. The falls data obtained in TD were reported clinically to an experienced research nurse who cross-checked against hospital records (where possible). The PD1 falls data were obtained prospectively through daily falls diaries and collected weekly for 6 months from the baseline. The PD2 falls data were self-reported to the researcher at the time of each QTUG assessment over the 12-week study duration. The differences between the falls outcome data obtained for PD1 and PD2 meant that the two datasets could not be pooled and needed to be analyzed separately.

Training Dataset (TD)
The data used to train all the statistical models (Training Dataset (TD)) were obtained from the TRIL research project, which examined technologies to support positive ageing and included a focus on the prevention of falls. These data were combined with a number of other smaller datasets arising from separate research studies to form a reference dataset, used to train the falls risk estimate (FRE) classifier models and mobility risk scores included in the Kinesis QTUG™ product [27,38,42,43].
The data consisted mainly of community dwelling older adults assessed at St James Hospital, Dublin, Ireland and included N = 1015 subjects for analysis. Ethical approval

Training Dataset (TD)
The data used to train all the statistical models (Training Dataset (TD)) were obtained from the TRIL research project, which examined technologies to support positive ageing and included a focus on the prevention of falls. These data were combined with a number of other smaller datasets arising from separate research studies to form a reference dataset, used to train the falls risk estimate (FRE) classifier models and mobility risk scores included in the Kinesis QTUG™ product [27,38,42,43].
The data consisted mainly of community dwelling older adults assessed at St James Hospital, Dublin, Ireland and included N = 1015 subjects for analysis. Ethical approval was received from the St James hospital research ethics committee. Each participant completed a battery of functional tests, including the Timed Up and Go and a 6-meter walk, instrumented with inertial sensors. In addition, each participant received a Comprehensive Geriatric Assessment (CGA), which included vision tests, a medication review and a blood pressure and cardiovascular assessment. Participants also received a cognitive function assessment: the Mini Mental State Examination (MMSE). Participants had an average age of 72.2 years, mean height of 166.6 cm and mean weight of: 74.6 kg. Twenty-nine participants reported they had been diagnosed with PD prior to assessment.
Inclusion criteria: inclusion criteria were subjects 60 years and older, with no history of stroke, able to walk without assistance, able to provide written informed consent.
Exclusion criteria: aged under 60 years of age, unable to provide informed consent or MMSE less than 18.

Parkinson's Disease Dataset 1 (PD1)
This study was a single site longitudinal study of Parkinson's disease patients. A total of 16 participants were recruited from the OSF HealthCare-Illinois Neurological Institute (Peoria, IL, USA); sensor data were not available for one participant, leaving 15 participants for analysis (5 female, mean age 67.3 ± 7.1 years). Patients were assessed over a 6-month period; QTUG assessments were conducted on a monthly basis following an initial baseline assessment. A total of 94 QTUG recordings were available for the 15 participants. Participants were evaluated three times using the UPDRS part III: at baseline, 90 days and 180 days. The study [22] included a weekly falls diary as well as SF−36, UPDRS and medication information for each assessment. 'ON' or 'OFF' state was documented only as a function of the clinical data captured and has not been explicitly analyzed. Participants did not receive any pharmaceutical or other intervention over the course of the trial.
All patients were required to provide informed consent. Ethical approval was received from the Peoria Institutional Review Board.
Inclusion criteria: able to provide written informed consent, aged 40 to 80, Idiopathic Parkinson's disease (meeting UK Brain Bank criteria), responsive to Levodopa for at least four years, MMSE score greater than 22 and able to walk at least 3 m independently.
Exclusion criteria: atypical Parkinsonism, Hoehn and Yahr stage V, MMSE 21 or less, use of assisted device for ambulation, co-morbidities affecting balance: severe neuropathy, weakness, bilateral hip replacement, syncopal episodes causing falls, diagnosed with lumbar radiculopathy, spinal stenosis or any other back conditions with the potential to affect fall behavior, drug abuse or alcoholism.

Parkinson's Disease Dataset 2 (PD2)
The second Parkinson's Disease dataset (PD2) arose from the "Healthy PD" study, conducted in University College Dublin (Dublin, Ireland), and examined the effect of a 12-week exercise intervention study [44] in patients with PD.
Twenty-seven subjects, each with idiopathic PD ranging from stage I to stage III on the Hoehn and Yahr (H&Y) staging scale, voluntarily consented to participate in the study. Experimental protocols were approved by the Human Research Ethics Committee for Sciences at University College Dublin. Participants were evaluated four times during the study. Following an initial baseline assessment, each participant took part in an exercise intervention program, was subsequently re-assessed post-intervention and then followed-up three months after study completion. The participants did not receive any pharmaceutical intervention over the course of the trial. The participants reported their history of falls in the previous 12 months at each QTUG assessment; one participant did not provide falls history.
Inclusion criteria: diagnosed with Parkinson's, stage I to stage III on the Hoehn and Yahr (H&Y), able to provide written informed consent.

QTUG Assessment Protocol
Each participant was assessed using a Timed Up and Go (TUG) test, instrumented with inertial sensors (QTUG), placed on each shin below the knee (see Figure 2). QTUG assessments follow a highly prescriptive protocol: test distance measured as exactly 3 meters, turn point marked on the ground using tape (and not marked using a cone). Participants were always instructed to wear comfortable walking shoes and to complete the TUG "as fast as safely possible". Every effort was made to control underfoot conditions (e.g., removing obstacles or loose carpeting) and to ensure at least four-meter linear space, ensuring adequate space to turn. Where possible, participants were encouraged to complete the test without the use of a walking aid; if a walking was used, this was noted in the software.

QTUG Assessment Protocol
Each participant was assessed using a Timed Up and Go (TUG) test, instrumented with inertial sensors (QTUG), placed on each shin below the knee (see Figure 2). QTUG assessments follow a highly prescriptive protocol: test distance measured as exactly 3 meters, turn point marked on the ground using tape (and not marked using a cone). Participants were always instructed to wear comfortable walking shoes and to complete the TUG "as fast as safely possible". Every effort was made to control underfoot conditions (e.g., removing obstacles or loose carpeting) and to ensure at least four-meter linear space, ensuring adequate space to turn. Where possible, participants were encouraged to complete the test without the use of a walking aid; if a walking was used, this was noted in the software.
Each QTUG assessment (QTUG™, Kinesis Health Technologies, Dublin, Ireland) produces 71 different calculated parameters, which include a range of features quantifying gait, mobility, turning and transfers [27] (details on the processing applied to produce each parameter are reported elsewhere [38,45]). In addition, each assessment produces a statistical falls risk estimate (FREsensor) [38,45] based on the inertial sensor data as well as a statistical Frailty estimate based on the inertial sensor data (FEsensor) [39].

Figure 2.
Inertial sensor is placed on the shin using a Velcro strap for each assessment. Sensor data are streamed to a tablet device via Bluetooth. Sensor accelerometer and gyroscope axes are shown. Figure 2. Inertial sensor is placed on the shin using a Velcro strap for each assessment. Sensor data are streamed to a tablet device via Bluetooth. Sensor accelerometer and gyroscope axes are shown.
Each QTUG assessment (QTUG™, Kinesis Health Technologies, Dublin, Ireland) produces 71 different calculated parameters, which include a range of features quantifying gait, mobility, turning and transfers [27] (details on the processing applied to produce each parameter are reported elsewhere [38,45]). In addition, each assessment produces a statistical falls risk estimate (FRE sensor ) [38,45] based on the inertial sensor data as well as a statistical Frailty estimate based on the inertial sensor data (FE sensor ) [39].

Statistical analysis 4.1. Exploratory Analysis
The association of FRE sensor produced by QTUG with falls count was explored for each dataset. A one-way ANOVA with significance level set to p < 0.05, where the number of falls is treated as an ordinal variable, was used to examine this relationship. As the TD dataset was used to develop and validate the current FRE sensor , only the PD1 and PD1 datasets were included in this analysis as they are statistically independent from the data used to generate FRE sensor .
The association of the mobility risk scores with falls counts is examined for the PD2 dataset, which is statistically independent of the reference dataset used in creating the mobility risk scores. A one-way ANOVA (with significance level set to p < 0.05), where falls count is treated as a categorical variable, was used to examine this relationship.
The exploratory analysis of the association of falls counts with individual gait and mobility parameters produced by QTUG is included in Appendix B.

Predictive Model of Falls Counts
Two main approaches were taken to develop a novel method to predict fall rate (counts) in PD:

1.
Using existing trained classifiers to predict falls counts (QTUG FRE and Mobility score models) 2.
Ensemble model based on elastic net models with Poisson regression For each approach, training and validation were carried out using the training data (TD) set, while testing on each of the models was carried out on two independent PD datasets (PD1 and PD2).
Falls count data for all models were log transformed to reduce the effect of zeroinflation on the distribution using the following expression: where min(NumFalls) is zero for all datasets reported here. To analyze or plot predicted falls counts against actual falls, the prediction can be converted back to NumFalls using the following expression: All analyses were performed using Matlab v9.3 (R2017b).

Predicting Falls Counts Using Existing QTUG Risk Estimates (FRE Model)
Two statistically independent datasets (PD1 and PD2) were used to test the performance of a predictive model of falls counts (referred to hereafter as the FRE model) based on measures produced by existing trained QTUG classifier models using three standard features: FRE sensor , FE sensor and TUG time.
FRE sensor and FE sensor were trained using the training dataset, as reported elsewhere [38,39], and are based on regularized discriminant and logistic regression models, respectively. TUG time (the time to complete the TUG test) has been frequently shown in the literature to be associated with falls [46,47].
To test the performance of the FRE model, data for each of the independent PD datasets were then applied to the model using negative binomial regression. Negative binomial regression was used due to zero-inflation and over-dispersion of the falls count data.

Predicting Falls Counts Using QTUG Mobility Risk Scores (Mobility Score Model)
QTUG produces five mobility risk scores that were calculated from the reference dataset (N = 1495) [27,43] (this set contains data from both the TD and PD1 datasets, and a number of other clinical datasets). The PD2 dataset was used to examine the performance of the QTUG mobility risk scores in predicting falls count (Mobility score model). To test the performance of the Mobility score model, data for each of the independent PD datasets were then fitted to each independent dataset using negative binomial regression.

Predicting Falls Counts with Elastic Net Regression
To train and validate a novel predictive model for falls counts in PD using QTUG data, a cross-validated elastic net procedure with a Poisson distribution was used. The model was trained using the training dataset and tested on the two independent PD datasets (PD1 and PD2).
The training data were considered in a number of different subsets as follows: • TD-All-All training data • TD-Fallers-training data excluding non-fallers (number of falls >0) • TD-PD-PD patients within the training dataset only • TD-Fallers-PD-PD patients who had experienced at least one fall • TD-NoPD-training dataset excluding PD patients • TD-Fallers-NoPD-dataset excluding fallers and patients with PD A set of trained regression models was produced for each of the training datasets listed above. To produce a trained model for testing on the independent datasets, the selected model was trained on all available training data and applied to the test set. Where validation was required prior to testing, the training set was split into training and validation sets. In all approaches, model selection was performed using 10-fold cross-validation to reduce bias.
For the elastic net model, an alpha value was set, a priori, to 0.1; models were also a priori constrained to a minimum model size of three and a maximum model size of 20. Model selection was also constrained to include the TUG time and to exclude a number of features with previously reported poor reliability (such as stance time asymmetry).
We considered an ensemble of regression models where the ensemble prediction is completed by predicting falls count as a linear combination of the estimates from the TD-NoPD and TD-Fallers-NoPD models. The ensemble coefficients for the linear combination were obtained through linear regression, estimating true falls count using predictions from each constituent model on the TD-PD validation set.
The PD1 dataset contains a number of very large outlier falls count values (e.g., NumFalls = 127) that distort model predictions. The results are presented for all the data as well as with outliers removed (NumFalls > 10). For the training set, a small number of outliers were removed (where NumFalls > 10). The PD2 dataset did not contain any outliers.

Model Performance Metrics
The performances of each model on the training and testing sets were evaluated using the following metrics: coefficient of determination (R 2 ), root mean squared error (RMSE), Spearman's rank correlation (ρ) and model size (number of features).
The coefficient of determination was included for completeness as a measure of 'goodness of fit'. However, as the falls count data approximate a zero-inflated Poisson distribution, it should be noted that this measure does not always provide a reliable assessment of model fit in the presence of outliers or with low samples sizes.

Results
We report the results for a range of statistical models intended to predict the fall rates in PD patients. The sensor data for all the datasets were not normally distributed. Falls count data were heavily clustered around zero, suggesting that the data may follow a negative binomial or a zero-inflated Poisson distribution.
The results of a battery of statistical tests analyzing the association with falls counts for TD, PD1 and PD2 datasets are included in Appendix B.

Predictive Model of Falls Counts Using QTUG
The results for the predictive model of falls counts using QTUG data are detailed below. This involved training and validating a suite of models using the training (TD) dataset. The selected models were then tested on the two independent PD datasets, which were held out from all the model training.

Existing QTUG Falls Risk Model
This section details the performance of the previously trained QTUG FRE sensor digital biomarker as a surrogate measure of falls counts as well as in classifying falls risk. A significant association between FRE sensor and falls count was observed for the PD2 dataset (F = 4.37, p < 0.05, respectively). The PD2 dataset showed a notable but non-significant association (F = 2.01, p = 0.17) with falls counts. Figure 3 shows the association of FRE sensor with falls counts for each of the PD1 and PD2 datasets.

Predictive Model of Falls Counts Using QTUG
The results for the predictive model of falls counts using QTUG data are detailed below. This involved training and validating a suite of models using the training (TD) dataset. The selected models were then tested on the two independent PD datasets, which were held out from all the model training.

Existing QTUG Falls Risk Model
This section details the performance of the previously trained QTUG FREsensor digital biomarker as a surrogate measure of falls counts as well as in classifying falls risk. A significant association between FREsensor and falls count was observed for the PD2 dataset (F = 4.37, p < 0.05, respectively). The PD2 dataset showed a notable but non-significant association (F = 2.01, p = 0.17) with falls counts. Figure 3 shows the association of FREsensor with falls counts for each of the PD1 and PD2 datasets. The results are reported in Table 2 for the prediction of falls counts using an existing pre-trained QTUG classifier model (FRE model) tested on two independent datasets (PD1 and PD2). A mean RMSE of 0.42 and a mean Spearman's correlation coefficient of 0.30 were obtained when the model performance was averaged across two independent PD datasets.
The PD1 dataset (N = 15) contained a number of very large outlier falls count values (e.g., Number of falls (NumFalls) = 127), which distort model predictions. The results are presented for all the data as well as with outliers removed (Number of Falls >10). In addition, we present the data when grouped into falls count categories (where Number of falls >5 is placed into the five falls categories); the results are provided for three different scenarios (tested on all data, outliers removed and placed into categories 0-5+).
The PD2 data (N = 26) did not contain any outliers, so the results with outliers excluded are not presented. For the training set, outliers were also removed.
In addition, a negative binomial model was fit to the mobility scores (Mobility model) and tested using the PD2 dataset, which yielded ab RMSE of 0.33 and correlation coefficient of 0.55. The results are reported in Table 2 for the prediction of falls counts using an existing pre-trained QTUG classifier model (FRE model) tested on two independent datasets (PD1 and PD2). A mean RMSE of 0.42 and a mean Spearman's correlation coefficient of 0.30 were obtained when the model performance was averaged across two independent PD datasets. The PD1 dataset (N = 15) contained a number of very large outlier falls count values (e.g., Number of falls (NumFalls) = 127), which distort model predictions. The results are presented for all the data as well as with outliers removed (Number of Falls >10). In addition, we present the data when grouped into falls count categories (where Number of falls >5 is placed into the five falls categories); the results are provided for three different scenarios (tested on all data, outliers removed and placed into categories 0-5+).
The PD2 data (N = 26) did not contain any outliers, so the results with outliers excluded are not presented. For the training set, outliers were also removed.
In addition, a negative binomial model was fit to the mobility scores (Mobility model) and tested using the PD2 dataset, which yielded ab RMSE of 0.33 and correlation coefficient of 0.55. Figure 4 below shows a scatter plot (top panel) and boxplots (bottom panel) of predicted falls counts versus actual falls counts for the existing QTUG data model, tested on the two independent datasets. For the PD1 data, outliers where number of falls is greater than 10 are removed.  Figure 4 below shows a scatter plot (top panel) and boxplots (bottom panel) of predicted falls counts versus actual falls counts for the existing QTUG data model, tested on the two independent datasets. For the PD1 data, outliers where number of falls is greater than 10 are removed.

Elastic Net Ensemble Models Training Falls Count Models Using Cross-Validation
The model selection was conducted using 10-fold cross-validation for each model. Once the model selection was complete, all the training data from each subset were fitted to the data by re-substitution in order to produce a set of coefficients for each model, along with results for the performance of the model on the training data. The training results and coefficients for each elastic net regression model and sub-set are detailed in Appendices C and D respectively.

Testing Falls Count Models on Independent PD Datasets
The performance of each of the trained elastic net falls count models on each of the independent Parkinson's disease tests sets is detailed in Table 3 below. The 'TD-All' model yielded a mean RMSE of 0.62 and Spearman's rank correlation coefficient of 0.27 across the two statistically independent PD datasets.  Figure 5 below demonstrates the performance of the ensemble model when tested on the two independent PD datasets as both a scatter plot and a boxplot. The results show a mean RMSE of 1.28 and mean rank correlation coefficient of 0.38.

Discussion
This manuscript reports the results of a wearable sensor-based method to generate surrogate measures of falls counts in PD. We believe this approach has the potential to be further developed as a more sensitive readout of falls in PD. Three distinct and independent datasets, containing a total of 1057 participants (including 71 previously diagnosed with PD) were included in the analysis.
Using a comprehensive kinematic assessment, we explored the possibility to predict falls counts using two existing trained models (the FRE model and Mobility model), previously tested for falls risk assessment, and a novel mathematical approach using elastic net, ensemble learning and Poisson regression. Previous research has shown that the FRE and Mobility score models were associated with falls in a number of populations. The gait and mobility parameters produced by the QTUG algorithm can be noisy and mutually correlated, so the feature vector dimension needs to be reduced through feature or model selection. This point, combined with the fact that falls counts are thought to follow a zeroinflated Poisson process, meant that we considered an elastic net procedure combined with Poisson regression and ensemble modelling to be a promising means to obtain an accurate model of falls counts.
The results for the FRE model found that falls counts can be predicted with a mean RMSE of 0.42 and a mean correlation of 30% with falls counts for two statistically independent datasets of patients with PD. Similarly, the results for the Mobility model found that falls can be predicted with an RMSE of 0.33 and correlation coefficient of 0.55 when

Discussion
This manuscript reports the results of a wearable sensor-based method to generate surrogate measures of falls counts in PD. We believe this approach has the potential to be further developed as a more sensitive readout of falls in PD. Three distinct and independent datasets, containing a total of 1057 participants (including 71 previously diagnosed with PD) were included in the analysis.
Using a comprehensive kinematic assessment, we explored the possibility to predict falls counts using two existing trained models (the FRE model and Mobility model), previously tested for falls risk assessment, and a novel mathematical approach using elastic net, ensemble learning and Poisson regression. Previous research has shown that the FRE and Mobility score models were associated with falls in a number of populations. The gait and mobility parameters produced by the QTUG algorithm can be noisy and mutually correlated, so the feature vector dimension needs to be reduced through feature or model selection. This point, combined with the fact that falls counts are thought to follow a zero-inflated Poisson process, meant that we considered an elastic net procedure combined with Poisson regression and ensemble modelling to be a promising means to obtain an accurate model of falls counts.
The results for the FRE model found that falls counts can be predicted with a mean RMSE of 0.42 and a mean correlation of 30% with falls counts for two statistically independent datasets of patients with PD. Similarly, the results for the Mobility model found that falls can be predicted with an RMSE of 0.33 and correlation coefficient of 0.55 when tested on an independent dataset of PD patients. An ensemble of Poisson regression models produced a mean RMSE of 1.28 and mean rank correlation coefficient of 0.38, while the results, averaged across a separate male/female elastic net Poisson regression model, yielded an RMSE of 0.57 and correlation with falls counts of 0.29 for two independent datasets. The best results reported here were obtained by using an existing trained classifier model (FRE model) to predict falls counts. However, some limitations should be considered in the interpretation of our findings. First, we believe the ensemble modelling approach was hampered by a small sample of PD patients (given the FRE model was trained on a much larger, more varied dataset) and may be more promising and might perform better when trained with a larger and more representative dataset containing a larger proportion of PD patients. Second, the falls count data for PD1 contained a number of extreme outliers and, to evaluate their effects, the analyses were performed both including and excluding those samples. As expected, outliers had a significant impact on the reported outcome, probably due to the fact that the training data (TD) only contained low falls counts and, as a result, the model did not generalize well to extremely large falls count values. Third, due to the small sample size, medication status, on/off periods and disease severity were not considered in the analysis. Furthermore, the PD2 dataset was perhaps unusually healthy for a sample of PD patients, able to undertake a 12-week exercise program (with no participants dropping out) that consisted of spinning, circuit training and tai-chi; this may have led to lower than expected numbers of falls. Moreover, it is important to consider the intrinsic nature of falls, which are essentially a random, stochastic event. For this reason, predicting the exact number of falls that will occur in a given time frame is inherently difficult; the infrequency of the event belies the risk that may or may not be captured within the clinical trial horizon. However, this approach, in combination with statistical fall risk estimates, could provide a clinical view with a higher level of granularity; statistical indices, which produce a probabilistic estimate of future falls based on an analysis of movement, might offer additional insight into the state of the neuromuscular control system as opposed to solely relying on an approach that aims to catch an infrequent and potentially catastrophic endpoint (i.e., a fall event).
The results also suggest a significant association between the number of falls (falls counts) and the sensor-based falls risk estimate model (FRE sensor ) for community dwelling older adults reported previously [22,27] and currently deployed in a commercial product (Kinesis QTUG™). In addition, strong associations were observed between falls counts and a number of individual gait and mobility parameters, particularly measures of gait variability and average values of temporal-spatial gait during the TUG test (Appendix B).
Several studies have examined the value of instrumented gait and mobility tests in the assessment of Parkinson's [22,29,48,49]. A recent meta-analysis of 26 studies [50] found that spatiotemporal characteristics of gait, such as slower walking speed, lower cadence and shorter strides, can increase the risk of future falls. Importantly, clinical features can be combined with spatiotemporal gait dynamics to elucidate falls pathophysiology [51]. The most consistent results are in relation to stride time variability, which was significantly associated with falls counts (fall frequency) and not related to tremor, rigidity or bradykinesia in the "off" state [52]. However, stride time variability significantly improved in response to levodopa, both in fallers and non-fallers, but remained increased in fallers when compared to non-fallers. Hoskovcová et al. [34] found that stride time variability may predict falls in prospectively identified PD fallers, which agrees with previous research by Hausdorff et al. [53] and Lord et al. [54] suggesting the increased gait variability predicts falls in community dwelling older adults [55]. In addition, authors found that stride time variability correlated with the total BDI-II score, which was increased in PD fallers. While the association of various measures of gait with falls in PD has been well-established, this study demonstrates the potential of combining such measures into a predictive model for use as a clinical trial endpoint.
A limitation of this study is the small sample sizes available for the two independent PD datasets; as such, the results reported here may need to be replicated in a larger study.
Future work will aim to replicate these findings in a larger study, including the evaluation of longitudinal relationships between mobility parameters, falls counts and UPDRS scores in PD, and their utility in measuring disease progression. Improvements in the predictive model for falls counts will entail an extended ensemble model approach that would include both the FRE and Mobility score models.

Conclusions
To conclude, our findings support the goal of integrating wearable sensor technology into the clinical and routine care of patients with movement disorders and may offer novel objective endpoints for future clinical trials.

Institutional Review Board Statement:
This manuscript contains data from three different studies. Each study received ethical approval from the local ethics committee (see detailed information below). All data reported here were anonymized and stored in line with data privacy regulations in each country (e.g., HIPAA, GDPR).

Data Availability Statement:
The data that support the findings of this study are available from the Principal Investigators of each of the original studies. Restrictions apply to the availability of these data, which were used under license for the current study, so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of the investigators.

Acknowledgments:
The authors would like to acknowledge the participants in the study.

Abbreviations
Abbreviations used in this manuscript.

Appendix A. Anthropomorphic Data
Anthropomorphic and clinical data for each dataset, stratified by gender, are included in Table A1 below.

Appendix B. Association of QTUG Parameters with Falls Counts
QTUG sensor parameters often do not follow perfectly normal distributions, so nonparametric statistical tests are used where appropriate. As the number of falls is a count variable, it can be modelled as an ordinal variable or a Poisson process.
Three sets of statistical tests were carried out for each dataset using binary falls classification (faller/non-faller) and falls count data (number of falls). The Mann-Whitney rank sum test was used to compare each QTUG parameter for discrimination between faller and non-faller populations, where a faller is defined as having one injurious fall or more than one previous fall. Spearman's rank correlation was used to examine the relationship of each QTUG parameter against falls count data. One-way ANOVA, using the falls count as the categorical variable, was used to examine the association of the QTUG parameters with falls counts.
The alpha value for each hypothesis test was set to p < 0.05 to detect statistical significance. Where possible, falls counts were analyzed in the 0-5+ range, but data for each value in this range were not available for all datasets (e.g., PD2, only 0, 1, 3 counts available).
Note: the statistical tests detailed above should be considered as purely exploratory. Model selection is not performed based on this analysis due to the possibility of a type I error arising from multiple comparisons.
Seventy-one QTUG sensor parameters were included in the analysis per dataset [27,38]. These parameters can be grouped into categories as follows: falls risk and frailty scores, mobility risk scores, temporal gait parameters, spatial gait parameters, turn parameters, gait variability, gait symmetry, angular velocity parameters. Table A2 details the results of a battery of statistical tests conducted on QTUG parameters calculated from the TD dataset (largely drawn from a sample of community dwelling older adults). Analyses examined statistical differences in each measure between fallers and non-fallers as well as the relationship between each measure and the number of falls reported by participants. Table A3 details the results of a battery of statistical tests conducted on QTUG parameters calculated from the PD1 dataset (longitudinal study of Parkinson's Disease patients) [22]. Analyses examined statistical differences in each measure between fallers and non-fallers as well as the relationship between each measure and the number of falls reported by participants. Strong correlations were observed between a number of gait variability and temporal-spatial gait measures and falls counts. Table A2. Statistical analysis of QTUG parameters with falls counts for TD dataset. M and F refer to population stratified by gender (male and female). U refers to Mann-Whitney rank sum statistic, ρ refers to Spearman's correlation coefficient, while F refers to F-score from 1-way Anova test. Statistically significant differences (p < 0.05) are indicated by *.

Mann-Whitney Spearman Anova
Parameter Name Faller (Mean ± Std)  Table A3. Statistical analysis of QTUG parameters with falls counts for PD1 dataset. M and F refer to population stratified by gender. U refers to Mann-Whitney rank sum statistic, ρ refers to Spearman's correlation coefficient, while F refers to F-score from 1-way Anova test. Statistically significant differences (p < 0.05) are indicated by *.

Mann-Whitney Spearman Anova
Parameter Name Faller (Mean ± std) Analysis of the five mobility risk scores with falls counts for the PD2 dataset found non-significant associations between each mobility risk score and the number of falls as follows: • Speed score, F = 2.09, p = 0.15 • Turn score, F = 0.71, p-value: 0.50 • Transfer score, F = 2.14, p = 0.14 • Variability score, F = 1.3, p = 0.29 • Symmetry score, F = 2.41, p = 0.11 Figure A1 below illustrates the association of each mobility risk score with falls counts for the PD2 dataset. Note: falls count data were only available for the count values 0, 1 and 3.    [44]. Analyses examined statistical differences in each measure between fallers and non-fallers, as well as the relationship between each measure and the number of falls reported by participants. Significant associations were found between falls counts and a number of gait and mobility parameters. Table A4. Statistical analysis of QTUG parameters with falls counts for PD2 dataset. M and F refer to population stratified by gender. U refers to Mann-Whitney rank sum statistic, ρ refers to Spearman's correlation coefficient, while F refers to F-score from 1-way Anova test. Statistically significant differences (p < 0.05) are indicated by *.

Mann-Whitney
Spearman Anova   [44]. Analyses examined statistical differences in each measure between fallers and non-fallers, as well as the relationship between each measure and the number of falls reported by participants. Significant associations were found between falls counts and a number of gait and mobility parameters. Table A4. Statistical analysis of QTUG parameters with falls counts for PD2 dataset. M and F refer to population stratified by gender. U refers to Mann-Whitney rank sum statistic, ρ refers to Spearman's correlation coefficient, while F refers to F-score from 1-way Anova test. Statistically significant differences (p < 0.05) are indicated by *.

Appendix C. Model Training Performance
Training set results for Poisson regression models for the TD dataset are detailed in Table A5 below. The results are reported with outliers removed (number of falls >10), resulting in three samples being excluded from the training sets.  Figure A2 below shows the elastic net trace and deviance plots for TD-All model fit. The selected model chosen was for the lambda value yielding the minimum standard error (SE) for a minimum model size of 3 and a maximum model size of 20.

Appendix C. Model Training Performance
Training set results for Poisson regression models for the TD dataset are detailed in Table A5 below. The results are reported with outliers removed (number of falls >10), resulting in three samples being excluded from the training sets.  Figure A2 below shows the elastic net trace and deviance plots for TD-All model fit. The selected model chosen was for the lambda value yielding the minimum standard error (SE) for a minimum model size of 3 and a maximum model size of 20. Figure A2. Elastic net trace and deviance plots for TD-All fit; lambda is selected as the value giving minimum deviance for a model size greater than or equal to 3 and less than or equal to 20. Figure A2. Elastic net trace and deviance plots for TD-All fit; lambda is selected as the value giving minimum deviance for a model size greater than or equal to 3 and less than or equal to 20.

Appendix D. Model Coefficients
Regression coefficients and the selected feature for each model trained on TD data are detailed in the tables below (Tables A6-A12). Each set of model coefficients is identified by the TD data subset used to train it.

Appendix D. Model Coefficients
Regression coefficients and the selected feature for each model trained on TD data are detailed in the tables below (Tables A6-A12). Each set of model coefficients is identified by the TD data subset used to train it.