Simplified Decision-Tree Algorithm to Predict Falls for Community-Dwelling Older Adults

The present study developed a simplified decision-tree algorithm for fall prediction with easily measurable predictors, using data from a longitudinal cohort study in which 2520 community-dwelling older adults aged 65 years or older participated. Fall history, age, sex, fear of falling, prescribed medication, knee osteoarthritis, lower limb pain, gait speed, and timed up and go test were assessed in the baseline survey as fall predictors, and recent falls were assessed in the follow-up survey. We created a fall-prediction algorithm using decision-tree analysis (C5.0) that included 14 nodes with six predictors, and the model could stratify the probabilities of fall incidence from 30.4% to 71.9%. Additionally, the decision-tree model outperformed a logistic regression model with respect to the area under the curve (0.70 vs. 0.64), accuracy (0.65 vs. 0.62), sensitivity (0.62 vs. 0.50), positive predictive value (0.66 vs. 0.65), and negative predictive value (0.64 vs. 0.59). Our decision-tree model consists of common and easily measurable fall predictors, and its white-box algorithm can explain the reasons for risk stratification; therefore, it can be implemented in clinical practice. Our findings provide useful information for the early screening of fall risk and the promotion of timely strategies for fall prevention in community and clinical settings.


Introduction
Falls are a major public health problem, and approximately 28-35% of individuals aged ≥65 years fall each year [1]. Fall-related injuries are associated with disability [2] and mortality [3]; the fall-associated financial costs, including those of ambulance services and health and social care, are substantial and continuously increasing worldwide [1]. Therefore, early screening of fall risk is necessary to promote effective fall prevention strategies.
Previous research has revealed several fall risk factors, such as previous fall history [4], gait and balance impairments [4,5], arthritis [4], pain [4], polypharmacy [5,6], and fear of falling (FOF) [7]; thus, multifactorial risk assessment is often recommended [8]. The American Geriatrics Society and British Geriatrics Society (AGS/BGS) Panel has published clinical practice guidelines for the prevention of falls in older persons and provided a conceptual algorithm with multiple risk factors for assessment and intervention to reduce the frequency of falls in older adults [9]. However, decision-making algorithms for fall prediction have not been fully examined statistically with respect to the hierarchy or optimal combination of risk assessments.
Recently, machine learning methods that can iteratively learn nonlinear interactions from large samples using computer algorithms have been applied in various fields, including disease risk assessment and prediction [10]. In particular, decision-tree analysis can provide an intuitive diagram that represents risk prediction without the need for complicated calculations [11]. Thus, decision-tree analysis has been used in many fields for decision-making purposes to develop models that can classify subjects into various risk categories [12].
We identified several previous studies that have examined the utility of the decision-tree model in predicting falls in community-dwelling older adults [13][14][15]. Stel et al. created a decision-tree model to predict recurrent falls based on known risk factors (e.g., fall history, physical performance, pain, physical activity, and limitation in activities of daily living) and showed that the risk of recurrent falls could be stratified from 9% to 70% [13]. However, they did not report performance measures, such as accuracy or area under the curve (AUC), because they did not validate their model with another dataset. Gomez et al. and Lam et al. also proposed decision-tree-based fall-prediction models that included multiple risk factors and reported performance measures for community-dwelling older adults [14,15]. However, their prediction variables included the scores of test batteries (i.e., the short physical performance battery or frailty criteria), which already combine multiple assessment items. Some recent studies have attempted to improve predictive accuracy by using ensemble methods, which build many (often hundreds of) decision trees and aggregate the predictions from each tree. Speiser et al. developed a prediction model for serious fall injury using the random forest method and achieved higher prediction accuracy than that of a single decision-tree model [16]. Ye et al. developed a fall-prediction model using extreme gradient boosting with electronic health records and achieved high performance (C-statistic = 0.81) [17]. Although these ensemble methods provide relatively high prediction accuracy, their results are difficult to visualize, interpret, or explain. Therefore, there remains an opportunity to examine the minimum and optimal combinations of fall predictors consisting of common and easily measurable items, and it is worthwhile to illustrate the results as a single decision tree.
This study aimed to develop a simplified decision-tree algorithm for fall prediction using easily measurable predictors with longitudinal cohort data. We hypothesized that the decision-tree model would predict falls more accurately than a logistic regression model.

Design
In this longitudinal observational study, fall predictors for community-dwelling older adults were assessed in a baseline survey (August 2011 to February 2012) and recent falls were assessed in a follow-up survey (August 2015 to February 2016). The participants were enrolled from a sub-cohort of the National Center for Geriatrics and Gerontology-Study of Geriatric Syndromes, a population-based national cohort study.
All assessments were conducted by trained nurses and study assistants at community centers. Before the study began, we trained all the staff regarding the appropriate protocols for conducting these assessments.

Participants
Individuals aged 65 years or older who lived in Obu, Japan, were invited to participate in this study. We applied the following exclusion criteria in the baseline survey: (1) history of Alzheimer's disease, stroke, or Parkinson's disease; (2) severe cognitive impairment, defined as a Mini-Mental State Examination [18] score below 20; (3) certification of functional disability by the national long-term care insurance system; (4) missing data for these criteria; and (5) lack of assessment of fall risk factors. The remaining candidates were invited to complete a follow-up survey 48 ± 2 months after the baseline survey. During the follow-up period, we excluded participants who (1) had moved to another city, (2) had died, or (3) did not receive the follow-up survey. After the follow-up survey, we also excluded participants who (1) were assessed outside the 48 ± 2-month follow-up window or (2) did not complete the fall assessment in the follow-up survey. After these exclusions, data from 2520 participants were analyzed (Figure 1). Written informed consent was obtained from all participants prior to their inclusion in the study. This study was conducted in accordance with the Declaration of Helsinki, and the ethics committee of the National Center for Geriatrics and Gerontology approved the study protocol (approval number: 1440-2).

Assessment of Falls
Falls were assessed by face-to-face interviews in both the baseline (fall history) and follow-up (primary outcome) surveys. A fall was defined as "an unexpected event in which a person comes to rest on the ground, floor, or a lower level" [19]. A recent fall was measured by participants' responses to the following question: "Do you have any history of a fall within the past year?" [20]. In this study, fall history was defined as at least one fall within the past year at the baseline survey, and at least one fall within the past year at the follow-up survey was used as the outcome of the decision-tree algorithm [21,22].

Assessment of Fall Predictors
As fall-prediction variables, we assessed common fall risk factors: fall history, age, sex, FOF, prescribed medication, knee osteoarthritis, lower limb pain, gait speed, and timed up and go test (TUG). FOF was assessed by a closed-ended question: "Are you afraid of falling?" [23]. Participants who selected "very much" or "somewhat" were classified as having FOF, whereas those who chose "a little" or "not at all" were classified as not having FOF [24]. Prescribed medication was assessed as the total number of drugs continuously prescribed to the individual by a doctor, and ≥5 drugs was defined as polypharmacy [6]. Medical history of knee osteoarthritis and the presence of daily pain in the lower limbs were assessed through face-to-face interviews. Regarding physical performance tests, the AGS/BGS guideline states that fall risk assessments should include gait and balance evaluation [9]; therefore, in addition to gait speed, we measured TUG, which is recommended in the guideline [9]. Gait speed was measured in five trials using a stopwatch. Participants were asked to walk on a flat, straight surface at a comfortable pace. Two markers indicated the start and end of a 2.4-m timed walk path, with a 2-m section to be traversed before the start marker so that participants were walking at a comfortable pace by the time they reached the timed section. Participants were asked to continue walking 2 m past the end marker to ensure a consistent walking pace on the timed path [25]. Because our protocol used a relatively short walk path, a preliminary experiment was conducted to confirm the correlation between 10-m and 2.4-m gait speeds (r = 0.989, p < 0.01) [26]. The mean gait speed from the five trials was used as a fall-prediction variable in this study.
The TUG time was measured as the time taken to rise from a standard armchair, walk a distance of 3 m at a normal and safe pace, turn around, walk back to the chair, and sit down again [27]. A previous study demonstrated that TUG had a high reliability (intraclass correlation coefficient (3,3) = 0.98) and could identify fallers (accuracy = 0.87) [28]. Two trials of TUG were conducted, and the mean time to complete the test was used as a prediction variable in this study.
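The predictor definitions above amount to a small coding step from raw measurements to model inputs. The following is a minimal, illustrative sketch only: `BaselineRecord`, `has_fof`, and `derive_predictors` are hypothetical names, and it assumes that "mean gait speed" means computing each trial's speed as 2.4 m divided by the stopwatch time and then averaging across trials.

```python
from dataclasses import dataclass
from statistics import mean

# Values taken from the protocol described above.
TIMED_PATH_M = 2.4          # length of the timed gait-speed section (m)
POLYPHARMACY_MIN_DRUGS = 5  # >= 5 continuously prescribed drugs

FOF_YES = {"very much", "somewhat"}
FOF_NO = {"a little", "not at all"}


@dataclass
class BaselineRecord:
    """Hypothetical container for one participant's raw baseline measurements."""
    fof_answer: str            # response to "Are you afraid of falling?"
    n_prescribed_drugs: int    # drugs continuously prescribed by a doctor
    gait_times_s: list[float]  # stopwatch times over the 2.4-m section (five trials)
    tug_times_s: list[float]   # TUG completion times (two trials)


def has_fof(answer: str) -> bool:
    """Dichotomize the four-level FOF response."""
    key = answer.strip().lower()
    if key in FOF_YES:
        return True
    if key in FOF_NO:
        return False
    raise ValueError(f"unexpected FOF response: {answer!r}")


def derive_predictors(rec: BaselineRecord) -> dict:
    """Turn raw measurements into the predictor values used for modelling."""
    # Assumption: speed per trial is distance / time, then averaged over trials.
    gait_speed = mean(TIMED_PATH_M / t for t in rec.gait_times_s)
    return {
        "fof": has_fof(rec.fof_answer),
        "polypharmacy": rec.n_prescribed_drugs >= POLYPHARMACY_MIN_DRUGS,
        "gait_speed_m_s": gait_speed,
        "tug_s": mean(rec.tug_times_s),  # mean of the TUG trials
    }
```

Averaging per-trial speeds and averaging times first give slightly different values; the sketch picks the former because the text speaks of the mean gait speed.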

Statistical Analysis
First, we classified the participants as fallers or non-fallers according to fall status in the follow-up survey and compared their baseline characteristics using Student's t-test for continuous variables and the χ² test for categorical variables. We also calculated the odds ratios and 95% confidence intervals of all potential predictors assessed in this study. Second, we created fall-prediction models. In this procedure, random resampling was applied to the minority class (fallers) to convert the imbalanced data into balanced data (fallers:non-fallers = 1:1), because some supervised algorithms deliver inferior performance when trained on imbalanced datasets [29]. Next, we performed a decision-tree analysis using the C5.0 algorithm to identify the optimal and minimum combination of risk factors necessary to predict fall status in the follow-up survey. The C5.0 algorithm is a classification approach that generates a tree top-down by recursively partitioning the data according to an information-based (entropy) splitting criterion [30]. While the tree is built, the optimal cut-off point for each continuous variable (the threshold at which fall risk is most clearly classified) is calculated automatically as a branch. To improve the model's performance, we generated 100 boosted decision trees. We applied global pruning with 75% pruning severity to avoid overfitting, and the minimum node size was set at 100. Furthermore, 10-fold cross-validation [31] was performed to test the stability of the decision tree. We also created a logistic regression model as a benchmark for the decision-tree model; this analysis used a backward stepwise approach with the same prediction variables as the decision-tree analysis.
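The balancing and cross-validation steps above can be sketched in scikit-learn. This is a swapped-in approximation, not the study's SPSS Modeler C5.0 pipeline: `DecisionTreeClassifier(criterion="entropy")` is a CART-style entropy tree, `min_samples_leaf=100` is an assumed analogue of the minimum node size, and boosting and global pruning are omitted. `random_oversample` and `cross_validated_auc` are hypothetical helpers. Oversampling is done inside each training fold so that duplicated fallers never leak into a test fold; the text does not state the order the study used.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def random_oversample(X, y, rng):
    """Duplicate randomly drawn minority-class rows until a binary y is 1:1."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    deficit = counts.max() - counts.min()
    minority_idx = np.flatnonzero(y == minority)
    extra = rng.choice(minority_idx, size=deficit, replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]


def cross_validated_auc(X, y, n_splits=10, min_leaf=100, seed=0):
    """Mean and SD of test-fold AUC for an entropy-split tree over stratified folds.

    Assumes y is coded 0 = non-faller, 1 = faller.
    """
    rng = np.random.default_rng(seed)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs = []
    for train_idx, test_idx in folds.split(X, y):
        # Balance only the training part of this fold.
        X_tr, y_tr = random_oversample(X[train_idx], y[train_idx], rng)
        tree = DecisionTreeClassifier(
            criterion="entropy", min_samples_leaf=min_leaf, random_state=seed
        )
        tree.fit(X_tr, y_tr)
        p_fall = tree.predict_proba(X[test_idx])[:, 1]  # column 1 = class 1 (faller)
        aucs.append(roc_auc_score(y[test_idx], p_fall))
    return float(np.mean(aucs)), float(np.std(aucs))
```

A small spread of fold AUCs is one way to read "stability"; a large spread would suggest the tree structure depends heavily on which participants are in the training data.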
Finally, we evaluated the performance of the decision-tree and logistic regression models using the AUC derived from receiver operating characteristic (ROC) analysis, as well as accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). All analyses were performed using IBM SPSS Statistics 25 and IBM SPSS Modeler 18 (IBM Japan, Tokyo, Japan). The level of statistical significance was set at p < 0.05.
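For reference, the performance measures listed above follow from the confusion matrix of predicted versus observed fallers, with fallers as the positive class. The sketch below uses hypothetical function names; the AUC is computed through its standard equivalence with the Mann-Whitney probability that a randomly chosen faller receives a higher predicted risk than a randomly chosen non-faller (ties count one half). It is not the SPSS computation itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Performance:
    accuracy: float
    sensitivity: float  # true-positive rate among observed fallers
    specificity: float  # true-negative rate among observed non-fallers
    ppv: float          # share of predicted fallers who actually fell
    npv: float          # share of predicted non-fallers who did not fall


def performance_from_counts(tp: int, fp: int, fn: int, tn: int) -> Performance:
    """Threshold-based measures from confusion-matrix counts."""
    return Performance(
        accuracy=(tp + tn) / (tp + fp + fn + tn),
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )


def auc_mann_whitney(scores_fallers, scores_nonfallers) -> float:
    """AUC as P(score of a faller > score of a non-faller), ties counted as 0.5."""
    wins = 0.0
    for sp in scores_fallers:
        for sn in scores_nonfallers:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_fallers) * len(scores_nonfallers))
```

The pairwise loop is quadratic and meant only to make the definition explicit; rank-based formulas give the same value faster.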

Flow of Participants and Their Characteristics
A total of 2520 community-dwelling older adults aged 65 years or older met our criteria, and their longitudinal data were analyzed. Among these 2520 participants, 415 (16.5%) reported recent falls in the 48-month follow-up survey. The differences in baseline characteristics between fallers and non-fallers are shown in Table 1. Compared with non-fallers, fallers were significantly older (p = 0.001), had a higher prevalence of fall history (p < 0.001) and FOF (p < 0.001), took more prescribed medication (p = 0.002), had a higher prevalence of knee osteoarthritis (p = 0.005) and lower limb pain (p = 0.015), and showed slower gait speed (p < 0.001) and longer TUG time (p = 0.007).

Prospective Association between Potential Predictors and Future Falls
Odds ratios (ORs) and 95% confidence intervals (95% CIs) of all potential predictors for falls in the follow-up survey are shown in Table 2. All potential predictors except sex were significantly associated with falls in the crude model, and the ORs (95% CIs) of each predictor were as follows: age group (≥75
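As a reminder of how a crude OR and its 95% CI arise for a binary predictor, the sketch below uses the Wald (Woolf) interval on the log-odds scale. For a single binary predictor, this coincides with the OR and Wald CI from an unadjusted logistic regression. The function name and the 2x2 layout are illustrative assumptions; the counts in Table 2 are not reproduced here.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def odds_ratio_wald_ci(a, b, c, d, z=Z_95):
    """Crude odds ratio and Wald (Woolf) CI from a 2x2 table.

    a: fallers with the factor      b: non-fallers with the factor
    c: fallers without the factor   d: non-fallers without the factor
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)
```

Because the interval is symmetric on the log scale, it is asymmetric around the OR itself, which is why published CIs for ORs usually extend further above the point estimate than below it.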

Fall-Prediction Models Using Decision-Tree and Logistic Regression
The final decision-tree model is shown in Figure 2. This model includes 14 nodes with six predictors as follows: fall history, polypharmacy, TUG, FOF, lower limb pain, and age group. The decision tree subdivided the samples into eight risk groups with fall incidence probabilities ranging from 30.4% to 71.9% (Figure 3). We also performed multivariable logistic regression analysis using a backward stepwise approach, and the final logistic regression model is shown in Table 3.
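The backward stepwise logistic regression mentioned above starts from all candidate predictors and repeatedly drops the one with the weakest evidence until every remaining predictor meets a retention criterion. The sketch below is a hedged, self-contained illustration: it fits the model by Newton-Raphson, uses two-sided Wald p-values, and assumes a removal threshold of p > 0.10. The study does not state which SPSS removal statistic or threshold was used, and `fit_logit`, `wald_p`, and `backward_stepwise` are hypothetical names.

```python
import math

import numpy as np


def fit_logit(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X must already contain an intercept column. Returns (coefficients, standard errors).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1 - p))[:, None])  # Fisher information
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


def wald_p(beta, se):
    """Two-sided p-values for the Wald z statistics beta / se."""
    return np.array([math.erfc(abs(b / s) / math.sqrt(2)) for b, s in zip(beta, se)])


def backward_stepwise(X, y, names, p_remove=0.10):
    """Drop the least significant predictor until all have p <= p_remove (assumed rule)."""
    keep = list(range(X.shape[1]))
    while keep:
        design = np.column_stack([np.ones(len(y)), X[:, keep]])
        beta, se = fit_logit(design, y)
        p = wald_p(beta, se)[1:]  # ignore the intercept
        worst = int(np.argmax(p))
        if p[worst] <= p_remove:
            break
        del keep[worst]
    return [names[i] for i in keep]
```

Because each removal depends on which predictors are still in the model, stepwise selection can retain a different subset from the one a tree chooses, consistent with the differences between the two final models described in the Discussion.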

Discussion
We aimed to develop a simplified decision-tree algorithm for fall prediction using easily measurable predictors and examine prediction validity using longitudinal cohort data. We created a decision-tree model that uses six predictors that are common and easily measurable items, and the model could stratify the probabilities of fall incidence ranging from 30.4% to 71.9%. Additionally, the decision-tree model outperformed the logistic regression model with respect to AUC, accuracy, sensitivity, PPV, and NPV.
Regarding the components of our decision-tree model, previous studies have demonstrated associations of fall history [4], polypharmacy [5,6], TUG [28], FOF [7], pain [4], and age group [4] with falls in older people; thus, the items selected in our decision-tree analysis corroborate these findings. Moreover, all items in our decision-tree model are easily measurable and widely used as fall risk factors in clinical and research fields; therefore, our model is suitable for use in a wide variety of situations. The selected predictors differed between the two models: TUG and lower limb pain were included only in the decision-tree model, whereas knee osteoarthritis and gait speed were included only in the logistic regression model. Logistic regression assumes that predictors act additively and linearly on the log-odds of falling, whereas decision-tree analysis partitions the sample into subgroups and can capture nonlinear relationships and interactions; therefore, each model might select a different combination of fall predictors. Thus, decision-tree analysis, with its nonlinear algorithm, may be useful in revealing stratified relationships between each fall predictor and subsequent fall risk.
A notable feature of our decision-tree model is that predictors are ordered visually by importance according to their positions in the branches of the algorithm. Fall history is located at the top of the tree. Previous falls are known to be the most influential predictor [4], and the existing AGS/BGS algorithm recommends assessing fall history first [9]; therefore, we believe that the structure of our model is valid. Additionally, our decision tree calculated the optimal TUG cut-off point for fall prediction as 9.7 s. The 13.5-s cut-off reported by Shumway-Cook et al. [28] has been widely used, and a systematic review showed that published TUG cut-off points for independent-living older persons varied between 8.1 and 16.0 s [32]. Our cut-off of 9.7 s is relatively fast, possibly because it was calculated among subjects without fall history or polypharmacy. Therefore, for older people who live independently and have not had any recent falls, the TUG cut-off for fall prediction may need to be set at a shorter time than previously believed.
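The automatically chosen 9.7-s TUG cut-off discussed above results from the tree searching over candidate thresholds of a continuous predictor. The sketch below shows the general idea with information gain, the entropy criterion underlying the C4.5/C5.0 family. It is a simplification: candidate thresholds are midpoints between adjacent distinct values, and C5.0's own refinements (gain ratio for choosing among predictors, boosting, pruning) are omitted. `best_threshold` is a hypothetical name, and the example data are toy values, not study data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information gain) of the best binary split 'value <= t'.

    Returns None if the predictor has only one distinct value.
    """
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [lab for _, lab in pairs]
    n = len(ys)
    parent = entropy(ys)
    best = None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # no threshold separates identical values
        left, right = ys[:i], ys[i:]
        gain = parent - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if best is None or gain > best[1]:
            best = ((xs[i - 1] + xs[i]) / 2, gain)
    return best


# Toy example: TUG times (s) with 1 = fell at follow-up.
tug = [8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 12.0]
fell = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_threshold(tug, fell))  # (9.75, 1.0)
```

Because the search runs within whichever subgroup reaches that node, the selected cut-off depends on the branches above it, which is consistent with the interpretation that 9.7 s reflects participants without fall history or polypharmacy.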
The decision-tree model in this study outperformed the logistic regression model with respect to AUC, accuracy, sensitivity, PPV, and NPV. Although specificity was lower in the decision-tree model than in the logistic regression model, the decision-tree model's relatively high sensitivity would still make it suitable as a primary screening tool for fall risk. Additionally, because our decision-tree model consists of common and easily measurable fall predictors, it provides a minimal, personalized combination of predictors for estimating fall probability and can serve as an efficient and effective tool in various healthcare settings.
A major strength of this study is that we analyzed large-scale, well-characterized cohort data using a longitudinal design. Additionally, we created a white-box decision-tree model using common fall risk factors without complicated calculations; thus, our fall-prediction model can be readily applied to a variety of situations. However, this study has some limitations. First, our decision-tree model was based only on items assessed in medical or physical contexts; therefore, further examination that incorporates other aspects of fall risk should be conducted to improve prediction performance. Second, although we used a cross-validation method, we did not use a hold-out dataset. Therefore, overfitting may still occur, and our results should be further validated in external cohorts with characteristics similar to those of the present cohort. Third, participants were healthy enough to undergo health checkups at the community center, and even so, 37.9% of participants dropped out before the follow-up survey. This selection bias may have led to an underrepresentation of baseline fall risk factors and future falls.

Conclusions
In this study, we developed a simplified decision-tree algorithm for fall prediction and confirmed its prediction validity using longitudinal cohort data. The decision-tree model outperformed the logistic regression model using the same predictors and could stratify the probabilities of fall incidence into various ranges. Our findings provide useful information for the early screening of fall risk and promote timely preventive strategies in community and clinical settings.