Comparative Analysis of Fall Risk Assessment Features in Community-Elderly and Stroke Survivors: Insights from Sensor-Based Data

Fall-risk assessment studies generally focus on identifying characteristics that affect postural balance in a specific group of subjects. However, falls affect a multitude of individuals. Among the groups with the most recurrent fallers are the community-dwelling elderly and stroke survivors. Thus, this study focuses on identifying a set of features that can explain fall risk for these two groups of subjects. Sixty-five community-dwelling elderly (forty-nine female, sixteen male) and thirty-five stroke survivors (twenty-two male, thirteen female) participated in our study. Using an inertial sensor, features were extracted from the acceleration data of a Timed Up and Go (TUG) test performed by both groups of individuals. A short-form Berg balance scale (SFBBS) score and the TUG test score were used for labeling the data. With a 100-fold cross-validation approach, the Relief-F and Extra Trees Classifier (ETC) algorithms were used to extract sets of the top 5, 10, 15, 20, 25, and 30 features. Random forest classifiers were trained for each set of features. The best models were selected, and the features repeated for each group of subjects were analyzed and discussed. The results show that only the stand duration was an important feature for the prediction of fall risk across all clinical tests and both groups of individuals.


Introduction
Falls are problems that affect different groups of individuals, and most studies on fall risk focus on older adults [1]. However, stroke survivors also experience falls. Given that these two categories of people fall frequently, it is crucial to create effective fall prevention programs since the expenses associated with falls place an increasing burden on the public health system [2].
Previous studies have determined that falls are a multifactorial problem [3]. Mobility [4], gait instability [5][6][7], and balance issues [4,7,8] are some of the most common factors that affect older adults. Similarly, the most common factors that affect stroke patients include balance [9][10][11][12] and mobility issues [11,13]. In response, fall-risk prevention programs use clinical tests to detect subjects who suffer from these issues. Two of the most common clinical tests associated with fall-risk prevention are the Timed Up and Go (TUG) test and the Berg balance scale (BBS), as they have been developed and evaluated in several fall-risk assessment studies [14]. Despite their effectiveness, their implementation requires the presence and expertise of a medical professional. This has presented the opportunity for researchers to study wearable inertial devices as auxiliary tools for medical professionals, given their light weight, portability, low cost [14], and ability to collect sensitive and reliable TUG data [15,16], which enables researchers to identify reliable parameters [17].
Healthcare 2023, 11, 1938
Researchers have identified features that can be used with statistical or machine-learning models to identify older adults at risk of falling [18], as well as post-stroke individuals at risk of falling [19], using data collected by inertial sensors during the TUG test. However, the features identified by these studies have no medical interpretation for medical professionals, which makes analyzing the underlying health problems related to these falls challenging. To address these limitations, a similar fall-risk assessment study used a waist-mounted accelerometer to estimate the BBS score of community-dwelling elderly [20]. The features used had medical meaning and are easily interpreted by medical professionals. However, most features extracted in that study require extensive signal processing and data cleaning, which makes the procedure difficult to reproduce in elderly homes without the constant monitoring of trained personnel. Moreover, none of these studies implemented a multifactor clinical test, which is more efficient at capturing the complex nature of falls [21,22].
Finally, to our knowledge, no research has analyzed the similarities between community-dwelling elderly adults and post-stroke patients that can predict fall risk. Our motivation to include both groups of subjects is their frequency of falls, the severity of the injuries they suffer because of their condition, and the limited number of studies analyzing features related to falls among post-stroke subjects. We will calculate numerous features from the TUG signals of all individuals. Using feature selection algorithms, we will estimate feature importance and compare it between both groups of subjects. Consequently, this study will focus on summarizing the features required to detect the largest number of classes with a higher potential to fall while simultaneously using machine-learning algorithms to discuss the benefits of automatic screening.

General Approach
Subjects from both groups wore an inertial sensor while they performed the TUG test. The data were segmented into sit to stand (Sist), walk, turn, and stand to sit (Stsi), after which a set of features was calculated from the acceleration data of each subject. The scores for the TUG and SFBBS clinical tests were used as labels for the features. Using the Extra Trees Classifier (ETC) and Relief-F algorithms, feature importance was calculated for both groups. A random forest (RF) classifier was used to classify the subjects into fall risk or healthy using different sets of features selected based on their importance. Finally, the most important features were analyzed to determine whether any features could be regarded as important for fall-risk prediction, independent of the type of subject being tested.

Subjects
Inertial sensors are commonly used to identify fall risk in both community-dwelling elderly and stroke survivors. This study focuses on identifying a set of features that can explain fall risk for subjects of these two groups.

Community-Dwelling Elderly
Community-dwelling elderly subjects from a hospital in central Taiwan participated in a set of clinical tests between April 2014 and May 2015. The studies involving human participants were reviewed and approved by Tsaotun Psychiatric Center, Ministry of Health and Welfare (IRB No. 104013). A team of physiotherapists and rehabilitation physicians assisted and monitored the participants. All subjects wore a waist-mounted inertial sensor while completing the clinical assessments. As summarized in Table 1, data were collected from 65 elderly adults (with an average age of 76 ± 7 years). These subjects were recruited after confirming that they had no history of musculoskeletal injuries or central nervous system injuries and that they could walk without aid to perform the clinical tests.

Stroke Survivors
Between April 2018 and October 2018, we recruited stroke survivors from a hospital in north Taiwan to participate in a series of clinical tests (IRB No. TYGH106045). Subjects capable of performing these tests with or without walking assistance were included in this study. In total, we gathered data from 35 different individuals (22 men and 13 women) who had suffered an ischemic stroke. All subjects who participated in our study did so willingly and provided consent to have their acceleration data collected. A summary of the demographic data for the stroke survivors in our study is shown in Table 1.

Clinical Tests
In this research, two different clinical tests were performed by all test subjects, i.e., the short-form Berg balance scale (SFBBS) and the TUG. The SFBBS was conducted by a professional physiotherapist. Subjects who took the SFBBS test were required to perform seven different activities: standing still with both eyes closed, sitting-to-standing transitions, standing on both feet while reaching forward with one arm, picking up an object from the floor, turning 360 degrees while standing, standing with one foot in front of the other, and standing on one leg unsupported. The physiotherapist assigned a score to each task performed by the subjects. This score ranged from 0 (subject could not perform the task) to 4 (subject performed the task without problems). Consequently, subjects who had no problems performing any of the seven tasks obtained the maximum score of 28. In contrast, subjects who had problems with their static balance obtained a score lower than 23, which a previous study found to be a significant threshold for identifying patients with posture problems [23].
When performing the TUG test [24], subjects began by sitting on a chair. Then, they were asked to stand up, walk at a natural pace forward, turn 180 degrees when they reached a mark on the floor, walk back toward the chair, and sit down.
In this study, we used an inertial sensor to collect data from subjects when they performed the TUG test. We did not collect any acceleration data from the SFBBS test. Therefore, the features used in this study were extracted exclusively from the TUG acceleration signals of subjects. Moreover, we used the scores of the SFBBS test and the score of the TUG test to label the subjects as fallers and non-fallers. The duration of the TUG test was used to classify those subjects who performed the test in over 12.47 s as fallers, since a previous study found that this threshold was substantial for community-dwelling elderly [23]. Similarly, we labeled subjects with a SFBBS score lower than 23 as fallers, as this was also substantial in previous studies [20,25].
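The labeling rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function names and the example subjects are hypothetical, and only the two thresholds (12.47 s and 23 points) come from the text.

```python
# Thresholds taken from the text; everything else is illustrative.
TUG_THRESHOLD_S = 12.47   # TUG duration above this value -> faller
SFBBS_THRESHOLD = 23      # SFBBS score below this value -> faller


def label_by_tug(duration_s: float) -> int:
    """Return 1 (faller) if the TUG test took longer than 12.47 s, else 0."""
    return int(duration_s > TUG_THRESHOLD_S)


def label_by_sfbbs(score: int) -> int:
    """Return 1 (faller) if the SFBBS score (0 to 28) is lower than 23, else 0."""
    if not 0 <= score <= 28:
        raise ValueError("SFBBS scores range from 0 to 28")
    return int(score < SFBBS_THRESHOLD)


# Hypothetical subjects: (TUG duration in seconds, SFBBS score)
subjects = [(9.8, 27), (14.2, 25), (11.0, 20)]
labels = [(label_by_tug(t), label_by_sfbbs(s)) for t, s in subjects]
print(labels)  # [(0, 0), (1, 0), (0, 1)]
```

Keeping the two labels separate mirrors the fact that each clinical test is used as its own labeling criterion for the features.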

Wearable Accelerometer
To find a set of features that are not related to the type of sensor used, different sensors were used to collect data from each subject group. For the community-dwelling elderly, the ADXL345 accelerometer was used. This sensor collected data at a frequency of 30 Hz from three different axes, namely mediolateral (ML), vertical (V), and anterior-posterior (AP). For the stroke survivor subjects, a triaxial accelerometer (RD3152MMA7260Q, Freescale Semiconductor-NXP, Austin, TX, USA) sensor was used. It was calibrated at a frequency of 45 Hz and recorded acceleration data from the ML, V, and AP axes. Each sensor was attached to a waist-mounted strap, and it was situated at the lower back of the subjects. This location approximates the center of mass of most individuals, making it the most common across similar studies [14]. An illustration of the location of the sensors for both experiments can be found in Figure 1.
Figure 1. Illustration showing the estimated location of the inertial sensor (white box). The sensor was attached to a belt and located on the lower back of subjects (between the L4 and L5 vertebrae). The three axes of the sensor are shown along with their orientations for reference.

Feature Extraction
Using Python, we calculated a set of 79 different features from the inertial sensor data; these features are listed in Table 2. Every feature was calculated for each axis (ML, V, and AP) and has been related to fall risk by previous studies. This section introduces these features from a physiological point of view.
Root mean square (RMS) represents the degree of spread of the data with respect to zero [26]. As the data in this study were collected at the lower back level, RMS measures the degree of variability in trunk acceleration. This feature is commonly used in similar studies, as maintaining balance relies heavily on trunk control, since the trunk is the approximate location of the center of body mass [27,28]. Consequently, previous studies have found low acceleration RMS to be associated with instability [29][30][31], which directly affects posture.
Table 2 (fragment): (36) Step Length, (37) Gait Speed, (38) Step Time.
Jerk measures the rate of change in acceleration [32]. Jerk is a common feature in previous studies, as healthy subjects exert higher muscle strength when performing sit-to-stand or stand-to-sit transitions [33], which may result in noticeable acceleration changes. These acceleration changes can also be reflected in subjects as they lean forward during standing or backwards while sitting and can be captured by their maximum acceleration values [34,35]. Moreover, subjects with postural balance problems perform the standing and sitting transitions in a more restricted manner, as they have a reduced ability to control their movement while performing these tasks. This restricted movement has been captured by previous studies and found to be considerably different between fallers and non-fallers, as shown by the standard deviation of acceleration [36,37], median acceleration values [37], and range of acceleration values [37].
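As a rough illustration of the signal statistics discussed above, the following sketch computes them for one axis of one TUG segment with NumPy. It is not the study's implementation: the function name, the finite-difference estimate of jerk, and the choice to summarize jerk by its mean absolute value are assumptions made for the example.

```python
import numpy as np


def acceleration_features(acc, fs):
    """Summary statistics for one axis of one TUG segment.

    acc: 1-D acceleration samples; fs: sampling frequency in Hz.
    """
    acc = np.asarray(acc, dtype=float)
    jerk = np.diff(acc) * fs  # finite-difference rate of change of acceleration
    return {
        "rms": float(np.sqrt(np.mean(acc ** 2))),
        "std": float(np.std(acc)),
        "median": float(np.median(acc)),
        "max": float(np.max(acc)),
        "range": float(np.max(acc) - np.min(acc)),
        # Assumption: jerk is summarized by its mean absolute value.
        "mean_abs_jerk": float(np.mean(np.abs(jerk))),
    }


# Hypothetical vertical-axis samples recorded at 30 Hz
features = acceleration_features([1.0, -1.0, 1.0, -1.0], fs=30)
print(features)
```

Passing the sampling frequency explicitly matters here because the two subject groups were recorded at different rates (30 Hz and 45 Hz), and jerk depends on the time step.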
Individuals who are at risk of falling also exhibit abnormal sway when walking [27,38]. This abnormal sway can be caused by a strategy of remaining in control of their balance to avoid falling [39]. Consequently, this cautious strategy affects the total time it takes subjects to walk. In fact, studies have found walk duration to be a good predictor of falls [40][41][42][43]. A similar strategy used by frail subjects at risk of falling is to take smaller steps to improve their balance while walking, as reflected by shorter step lengths [44][45][46] and stride lengths [43,47]. Reducing walking speed is also common, as shown by recent studies that found significant relations between risk of falling and gait speed [43,47], cadence [43,47], stride time [48], and step time [48].
Stride length measures the distance covered from the moment a particular heel touches the ground, goes through a gait cycle, and touches the ground again. Similarly, step length measures the distance covered from the moment a heel touches the ground to the moment the heel on the opposite side touches the ground, which is usually half of a stride. As observed by previous studies, stroke survivors suffer from variations in step and stride length caused by underlying paretic leg impairment [49,50]. These differences provide information on the severity of gait abnormalities and have implications for fall-risk assessment. Understanding these differences and analyzing their relationship with fall risk can provide comprehensive information for fall-risk assessment of stroke patients and help to screen subjects or design more effective interventions. Similar variations are also common in elderly subjects with dementia [51]. These gait abnormalities increase the risk of falling, as observed by recent studies which found that subjects at risk of falling have a higher coefficient of variation (CV) for step time and stride time [51][52][53] than healthy subjects.
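The timing counterparts of these gait features can be illustrated with a short sketch. It assumes that heel-strike times have already been detected (the detection step is not shown and is not described in the text), and it uses the sample standard deviation for the CV, which is an assumption. The function name and the example timestamps are hypothetical.

```python
import numpy as np


def gait_timing_features(heel_strikes_s):
    """heel_strikes_s: times (s) of successive heel strikes, alternating feet."""
    t = np.asarray(heel_strikes_s, dtype=float)
    step_times = np.diff(t)           # heel strike to opposite heel strike
    stride_times = t[2:] - t[:-2]     # heel strike to next strike of same foot

    def cv(values):
        # Coefficient of variation in percent (sample standard deviation)
        return float(np.std(values, ddof=1) / np.mean(values) * 100.0)

    return {
        "step_time": float(np.mean(step_times)),
        "stride_time": float(np.mean(stride_times)),
        "cadence": float(60.0 / np.mean(step_times)),  # steps per minute
        "cv_step_time": cv(step_times),
        "cv_stride_time": cv(stride_times),
    }


# Hypothetical heel-strike times: a regular walker and an asymmetric one
regular = gait_timing_features([0.0, 0.5, 1.0, 1.5, 2.0])
asymmetric = gait_timing_features([0.0, 0.4, 1.0, 1.4, 2.0])
print(regular["cv_step_time"], asymmetric["cv_step_time"])
```

In the asymmetric example the stride time stays near one second while the step time alternates between 0.4 s and 0.6 s, so the step-time CV rises even though the stride-time CV stays close to zero.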
Postural problems can also be identified during the standing and sitting transitions. Sit-to-stand and stand-to-sit durations have been found to differ significantly between healthy elderly and those with transitional posture problems [54][55][56]. Sit-to-stand duration has also been found to be statistically significant among stroke survivors, as they need considerably more time to achieve stability when standing up [57]. Similarly, stand-to-sit duration was found to be a good predictor of falls among stroke survivors, as subjects tend to shift their weight towards one leg, which makes it difficult to sit naturally [57].

Multiscale Entropy (MSE) Analysis
The calculation of MSE begins by defining the scaling factors τ to be analyzed. Then, for each scaling factor, a coarse-grained series is extracted from a given time series of length N. This is done by averaging the data points within consecutive, non-overlapping windows of size τ that together cover the entire time series; thus, the resulting coarse-grained series has a length of N/τ data points. An example from another study [58] illustrating the calculation of coarse-grained series can be found in Figure 2.
Next, for each coarse-grained series, sample entropy (SampEn) is calculated. SampEn measures the complexity of a signal by estimating the probability that sequences that are similar over m consecutive data points remain similar when they are extended by one data point. As observed in Equation (1), a signal with low complexity has a SampEn value close to zero.

$$\mathrm{SampEn}(m, r, N) = -\ln \frac{A}{B} \quad (1)$$
where B is the number of pairs of sequences of length m that are similar within a tolerance r, and A is the number of those pairs that remain similar at length m + 1. When nearly all pairs remain similar, A/B approaches one and SampEn approaches zero.
Finally, after calculating SampEn, the complexity index (CI) is calculated as the sum of the SampEn values of all coarse-grained series (for all scaling factors τ), as illustrated in Equation (2). CI was found useful to categorize falling behavior [59], as it can measure the information contained in physiological time series over multiple scales.
$$\mathrm{CI} = \sum_{\tau} \mathrm{SampEn}(\tau) \quad (2)$$
where SampEn(τ) denotes the sample entropy of the coarse-grained series obtained with scaling factor τ.
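The three steps of the MSE analysis (coarse-graining, SampEn, and the complexity index) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the study's code: the function names are hypothetical, SampEn is taken in its standard form (negative logarithm of the ratio of similar pairs at lengths m + 1 and m, with the Chebyshev distance), and the defaults m = 2 and r = 0.2 times the standard deviation of the original series are common conventions that the text does not specify.

```python
import numpy as np


def coarse_grain(x, tau):
    """Average consecutive, non-overlapping windows of size tau."""
    x = np.asarray(x, dtype=float)
    n = len(x) // tau
    return x[: n * tau].reshape(n, tau).mean(axis=1)


def sample_entropy(x, m, tol):
    """SampEn = -ln(A / B) with Chebyshev distance and absolute tolerance tol."""
    x = np.asarray(x, dtype=float)
    n_templates = len(x) - m  # same number of templates for both lengths

    def count_similar_pairs(length):
        templates = np.array([x[i : i + length] for i in range(n_templates)])
        count = 0
        for i in range(len(templates) - 1):
            dist = np.max(np.abs(templates[i + 1 :] - templates[i]), axis=1)
            count += int(np.sum(dist <= tol))
        return count

    b = count_similar_pairs(m)
    a = count_similar_pairs(m + 1)
    if a == 0 or b == 0:
        return float("nan")  # undefined when no similar pairs are found
    return float(-np.log(a / b))


def complexity_index(x, max_scale, m=2, r=0.2):
    """Sum of SampEn over the scaling factors 1..max_scale."""
    tol = r * np.std(x)  # assumption: tolerance fixed from the original series
    return sum(sample_entropy(coarse_grain(x, tau), m, tol)
               for tau in range(1, max_scale + 1))


# A perfectly regular signal has SampEn = 0 at every scale
periodic = [0.0, 1.0] * 50
print(complexity_index(periodic, max_scale=3))
```

For short recordings such as a single TUG trial, the coarse-grained series shrinks quickly with τ, so the largest usable scaling factor is limited by the number of samples.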

Permutation Entropy (PE)
The first step in calculating PE is to use a window (of length D) and slide it τ data points each step through the entire time-series data (of length T). This results in a two-dimensional matrix of shape D × (T − (D − 1)τ), where each column represents the data scanned at each step by the sliding window. The example used in the remainder of this section has a window of size D = 3 and a matrix with five columns.
Each column is then mapped to an ordinal pattern. To achieve this, we first find all possible ordinal patterns, which capture the ordinal rankings of the data. We can find these by calculating all possible permutations for a given window size. In this particular case, given our sliding window of size D = 3, there are D! = 6 ordinal patterns, $\pi_1$ to $\pi_6$. To map these ordinal patterns to the matrix obtained above, it is necessary to observe the order of the values in each column. For example, given the first column, the permutation that should be mapped to it should be $\pi_1 = \{2, 1, 0\}$, since 8 > 5 > 4. Therefore, if we map all the permutations to our matrix, we would obtain the following permutation matrix:
$$\begin{bmatrix} 2 & 2 & 1 & 0 & 2 \\ 0 & 0 & 0 & 2 & 1 \\ 1 & 1 & 2 & 1 & 0 \end{bmatrix}$$
Given the permutation matrix, the frequency of each permutation that appears throughout all sequences is then calculated. This frequency is then divided by the total number of sequences (that is, the number of columns in the matrix), which gives a probability p. For the permutation matrix obtained above, the probabilities $p_{\pi_i}$ of each ordinal pattern are
$$p_{\pi_1} = 0/5, \quad p_{\pi_2} = 1/5, \quad p_{\pi_3} = 1/5, \quad p_{\pi_4} = 0/5, \quad p_{\pi_5} = 2/5, \quad p_{\pi_6} = 1/5$$
Finally, the value of PE for a given order D is obtained using Equation (3):
$$\mathrm{PE}_D = -\sum_{i=1}^{D!} p_{\pi_i} \log_2 p_{\pi_i} \quad (3)$$
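The PE procedure can be sketched in Python as follows. The sketch makes several assumptions that the text does not confirm: τ is interpreted as the spacing between the samples inside each window (which matches the stated number of columns, T − (D − 1)τ), each window is encoded by the rank of its values, the entropy uses a base-2 logarithm without normalization, and patterns with zero probability contribute nothing to the sum. The function names are hypothetical.

```python
import math
from itertools import permutations

import numpy as np


def shannon_entropy_bits(probabilities):
    """-sum(p * log2(p)), skipping patterns that never occur (p = 0)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def permutation_entropy(x, order=3, delay=1):
    """Permutation entropy (in bits) of a 1-D series."""
    x = np.asarray(x, dtype=float)
    n_columns = len(x) - (order - 1) * delay
    if n_columns <= 0:
        raise ValueError("time series too short for this order and delay")
    counts = {pattern: 0 for pattern in permutations(range(order))}
    for start in range(n_columns):
        window = x[start : start + order * delay : delay]
        # Rank of each value inside the window, e.g. (8, 5, 4) -> (2, 1, 0)
        ranks = np.argsort(np.argsort(window, kind="stable"), kind="stable")
        counts[tuple(int(v) for v in ranks)] += 1
    return shannon_entropy_bits(c / n_columns for c in counts.values())


# Probabilities listed in the text for the D = 3 example
example_probabilities = [0 / 5, 1 / 5, 1 / 5, 0 / 5, 2 / 5, 1 / 5]
print(round(shannon_entropy_bits(example_probabilities), 3))  # 1.922

# Hypothetical short series used only to exercise the function
print(permutation_entropy([4, 7, 9, 10, 6, 11, 3], order=3))
```

A strictly increasing or decreasing series produces a single ordinal pattern and therefore a PE of zero, while a series in which all D! patterns are equally likely reaches the maximum of log2(D!) bits.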

Feature Importance and Classification
The methodology proposed for feature importance selection is inspired by a recent study [60]. Using the score and specific criteria for each clinical test, we labeled each subject as either fall risk or non-fall risk. With a 100-fold cross-validation strategy, we used two different feature selection algorithms, namely Relief-F and ETC, to find the top 5, 10, 15, 20, 25, and 30 features for each clinical test. We selected this number of folds as it is a common technique to reduce bias towards samples in small datasets [61]. Using each set of features, we trained a random forest algorithm with a 100-fold cross-validation approach to classify subjects into fall-risk or healthy categories. Furthermore, we selected the best model for each clinical test, feature selection algorithm, and subject group, based on the average AUC score across folds. Finally, we selected those features found in both models as the set of important features for the respective clinical test. This entire procedure is illustrated in Figure 3.
Figure 3. Illustration of the procedure to estimate important features.

Relief-F
Relief-F has been used for feature selection in fall-risk assessment studies and gait analysis studies [60,62,63]. This popularity can be attributed to several characteristics, such as its computational efficiency when dealing with large feature spaces (which are common in fall-risk assessment studies). It is capable of detecting feature dependencies by indirectly deriving interactions through the concept of nearest neighbors [64]. Furthermore, Relief-F is a non-parametric feature selection method, which allows it to determine feature importance across a wide range of datasets without relying on the underlying distribution of the data [65]. In contrast to other filter-based feature selection methods, Relief-F is more robust against imbalanced datasets [65]. Thus, it was preferred for our imbalanced dataset.
The main objective of this algorithm is to estimate the quality of attributes (features) based on their ability to classify samples that are similar. Features that can correctly classify neighboring samples obtain a high quality estimate, whereas features that misclassify neighboring samples are ranked poorly. The algorithm starts by setting the quality estimates (weights) of all features, $W_j$, to 0. Then, for each iteration i = 1, 2, ..., m, the algorithm randomly selects a sample, $x_r$, and, using the Manhattan distance, $d_{rq}$, computes a set of k nearest neighbors for each class. Finally, it updates the quality estimate, $W_j$, for each neighbor $x_q$ using Equation (4) if $x_r$ and $x_q$ belong to the same class or using Equation (5) if they belong to different classes.
$$W_j^{i} = W_j^{i-1} - \frac{\Delta_j(x_r, x_q)}{m} \, d_{rq} \quad (4)$$
$$W_j^{i} = W_j^{i-1} + \frac{p_{y_q}}{1 - p_{y_r}} \, \frac{\Delta_j(x_r, x_q)}{m} \, d_{rq} \quad (5)$$
where $W_j^{i}$ is the weight of the feature $F_j$ at iteration i; $p_{y_r}$ and $p_{y_q}$ are the prior probabilities of the classes of $x_r$ and $x_q$, respectively; m is the maximum number of iterations, set by the user; and $\Delta_j$ is the difference of the feature value $F_j$ between $x_r$ and $x_q$, and is expressed as
$$\Delta_j(x_r, x_q) = \frac{|x_{rj} - x_{qj}|}{\max(F_j) - \min(F_j)}$$
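To make the weight update concrete, the following NumPy sketch implements a simplified variant of Relief-F. It is an illustration under assumptions, not the implementation used in the study: the function name and parameters are hypothetical, feature differences are scaled by each feature's range, neighbors are found with the Manhattan distance, and all k neighbors of a class contribute equally (no rank-based weighting of neighbors).

```python
import numpy as np


def relieff_weights(X, y, n_neighbors=3, n_iterations=None, seed=0):
    """Simplified Relief-F feature weights (higher means more relevant)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape
    m = n_samples if n_iterations is None else n_iterations
    rng = np.random.default_rng(seed)

    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # avoid division by zero for constant features
    classes, counts = np.unique(y, return_counts=True)
    priors = dict(zip(classes.tolist(), (counts / n_samples).tolist()))

    weights = np.zeros(n_features)
    for _ in range(m):
        r = int(rng.integers(n_samples))       # randomly selected sample x_r
        delta = np.abs(X - X[r]) / span        # per-feature differences to x_r
        dist = delta.sum(axis=1)               # Manhattan distance to x_r
        for c in classes.tolist():
            candidates = np.flatnonzero((y == c) & (np.arange(n_samples) != r))
            if candidates.size == 0:
                continue
            nearest = candidates[np.argsort(dist[candidates])[:n_neighbors]]
            update = delta[nearest].mean(axis=0) / m
            if c == y[r]:
                weights -= update              # same class: penalize differences
            else:
                scale = priors[c] / (1.0 - priors[y[r].item()])
                weights += scale * update      # other class: reward differences
    return weights


# Synthetic check: feature 0 equals the label, feature 1 is unrelated to it
y_demo = np.array([0] * 20 + [1] * 20)
X_demo = np.column_stack([y_demo.astype(float),
                          np.tile([0.1, 0.9, 0.5, 0.3], 10)])
print(relieff_weights(X_demo, y_demo))
```

Ranking the features by decreasing weight and keeping the first 5, 10, 15, 20, 25, or 30 would give feature sets of the sizes discussed in the text.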

Extra Trees Classifier (ETC)
Decision trees (DTs) are algorithms that classify samples by recursively splitting the data on the feature that best separates it. For classification trees, the splitting criterion is determined using metrics such as the Gini index or entropy. The main problem with individual DTs is that they tend to be inaccurate; random forests (RFs) [66] address this by combining many DTs to make predictions. However, combining multiple DTs without any data preparation results in a highly biased prediction. Thus, RFs use bootstrapping to reduce correlation across trees. This technique consists of generating datasets of the same size as the original by randomly selecting samples with replacement. Then, for each bootstrapped dataset, a decision tree is created using only a random subset of n features at each step. Given a new sample, the classification results of all trees are aggregated to obtain the final classification result (bootstrapping followed by aggregation is called bagging). Finally, the importance of each feature is calculated by averaging its decrease in impurity across trees.
ETC has also been used for feature selection in studies of fall-risk assessment [60,62,67]. Non-parametric in nature, ETCs serve as effective tools for uncovering nonlinear associations and are considered valuable in the analysis of data [68]. They achieve this by using randomized splitting points for each tree. Moreover, they are highly interpretable and can be used for both discrete and continuous data. Furthermore, they are able to reach a balanced ratio between variance and bias when compared to other feature selection algorithms [69]. ETCs do not use bootstrapped datasets but rather consider the whole dataset for each DT. In addition, they consider a random subset of features at each step when building a DT, and this subset is generally larger in ETC than in RF. Finally, the split decision at each node is random, as opposed to the impurity criteria used by RF, which allows them to be computationally less expensive. To select features, ETCs use mean decrease impurity methods, which allow ranking of features in order of classification significance.
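The combination of ETC-based ranking and random forest classification can be sketched with scikit-learn. This is a hedged illustration, not the study's pipeline: the helper names, the number of trees, the 5-fold split, and the synthetic data are all assumptions made for the example, and the ranking is computed once on all data only for brevity.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def rank_features_etc(X, y, seed=0):
    """Feature indices sorted by decreasing mean-decrease-in-impurity importance."""
    etc = ExtraTreesClassifier(n_estimators=200, random_state=seed).fit(X, y)
    return np.argsort(etc.feature_importances_)[::-1]


def mean_auc_top_k(X, y, ranking, k, n_splits=5, seed=0):
    """Cross-validated AUC of a random forest trained on the top-k features."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    scores = cross_val_score(rf, X[:, ranking[:k]], y, cv=cv, scoring="roc_auc")
    return float(scores.mean())


# Synthetic stand-in data: feature 0 tracks the label, the rest is noise.
rng = np.random.default_rng(0)
y = np.array([0, 1] * 50)
X = rng.normal(size=(100, 10))
X[:, 0] = y + 0.1 * rng.normal(size=100)

ranking = rank_features_etc(X, y)
for k in (5, 10):
    print(k, round(mean_auc_top_k(X, y, ranking, k), 3))
```

In a full evaluation, the ranking would normally be recomputed inside each training fold so that the held-out samples do not influence which features are selected.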

Results and Discussion
We performed the analysis by first calculating all sets of important features using each feature selection algorithm. Then, we analyzed the best-performing model for each feature selection algorithm, clinical test, and subject group. The features used by each best-performing model were then compared, as these features contain relevant information related to the falling problem. Furthermore, we compared these features across clinical tests. The ranking of features differs across feature selection algorithms, as they use different criteria to rank features. Consequently, features found to be important across both feature selection mechanisms and both groups of subjects are discussed.

Top Features Selected by Both Feature Selection Algorithms for Each Subject Group
The sets of top 5, 10, 15, 20, 25, and 30 features (in descending order) selected by both feature selection algorithms for community-dwelling elderly can be found in Table 3. As observed, for most clinical tests, the gait-related features are at the top of the table. This further indicates the importance of gait-related features to fall-risk screening, which is consistent with previous studies [70,71]. Moreover, features from the ML axis are predominant. This is consistent with previous studies that present ML features as indicative of fall risk in the elderly [72], irrespective of laboratory or clinical measures of postural stability.
The top features for stroke survivors are summarized in Table 4. Most gait-related features are at the top of the table, which highlights their importance for fall-risk assessment. As with the community-dwelling elderly, gait speed and step length are consistently within the top five important features. This is an indication that these two features might hold valuable information when studying falls across these two subject groups. This finding is also consistent with previous studies [73,74]. Moreover, most features for stroke survivors are related to the vertical axis, rather than the ML axis. We believe this is due to the need to maintain proprioceptive balance in stroke patients [75].

Best-Performing Models for Each Clinical Test, Feature Selection Mechanism, and Subject Group
The average AUC scores of the best-performing models are highlighted in Table 5. In most cases, the best-performing models have a high AUC score, indicating good overall classification performance. For stroke survivors, the AUC of the TUG test is better for all sets of features. Similarly, for community-dwelling elderly adults, almost all the best-performing models use the TUG test. The table also shows that half of the best-performing models for stroke survivors use the set of top 30 features, whereas half of the best models for community-dwelling elderly subjects use the set of top 15 features. This could be attributed to the larger number of samples in our community-dwelling elderly group. In most cases, the models that use the multifactor clinical score as a label also use a smaller set of features than the models that use the SFBBS or TUG clinical scores. This can be explained by the more robust classification criteria obtained after combining the SFBBS and TUG clinical scores. In addition, regarding the feature selection mechanism, the AUC results of ETC are all higher than those of Relief-F, which is consistent with the findings of a similar study [60].

As stated earlier, this study focuses on finding sets of features that can be used to study the underlying problems related to fall risk. Table 6 demonstrates that the trained models are indeed able to obtain good classification results, which suggests that the features selected by these models are related to fall risk. After finding the best model for each clinical test and each subject group, we also included other statistics from these models to give a more complete performance summary, as can be found in Table 6. With Relief-F as the feature selection mechanism, the AUC scores for TUG are the highest. Similarly, the precision results for community-dwelling elderly show that the TUG test obtained the best results since the features were directly extracted from the TUG acceleration signals. In contrast, when analyzing the F1 score, we can observe that the multifactor clinical test (SFBBS + TUG) had better prediction performance for the stroke survivor subjects. Moreover, when analyzing the overall classification performance using the features selected by ETC, we observed that TUG shows the best classification performance across both groups. This is expected, as the features were extracted directly from the inertial acceleration data collected during the TUG test.

Despite the higher AUC scores for most TUG tests, it is important to take into consideration that the SFBBS and TUG tests measure different characteristics of a subject's balance. TUG focuses on gait, while SFBBS focuses on static balance. Therefore, considering the multifactor test can provide a deeper and more robust understanding of a subject's balance, as was suggested by previous studies [56,57].
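For reference, the metrics compared above can be computed as in the following sketch. It uses the standard definitions, with AUC taken as the probability that a randomly chosen positive subject receives a higher score than a randomly chosen negative one (ties counted as one half). The function names, the 0.5 threshold, and the toy labels and scores are assumptions for illustration, not values from this study.

```python
def auc_score(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_f1(labels, scores, threshold=0.5):
    """Precision, recall, and F1 for the positive class at a score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data: 1 = at risk of falling, 0 = not at risk (made up).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = auc_score(toy_labels, toy_scores)
toy_prf = precision_recall_f1(toy_labels, toy_scores)
```

AUC does not depend on a decision threshold, whereas precision and F1 do, which is one reason the two kinds of metric can favor different clinical labels.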

Most Important Features for the Community-Dwelling Elderly and Stroke Survivors
For the community-dwelling elderly subjects, the set of repeated features used by the best models of both feature selection algorithms can be found in Table 7. As observed, only the stand duration feature is present across all clinical tests (highlighted by a "*"). This is consistent with previous studies, which found this feature to be significantly different between healthy and fall-risk subjects [53,76] and between healthy elderly and those with transitional posture problems [54]. Meanwhile, gait speed was found to be important across two of the three clinical tests. This is consistent with previous studies that found this feature helpful for fall prediction [43,47].

Table 7. Features selected by the best models for the community-dwelling elderly. Features found to be repeated across all clinical tests are marked with a "*".
Entries in extracted reading order (column assignment not preserved): Walk Duration; Stand Duration *; Gait Speed; Gait Speed; Stand Duration *; Step Length; Stand Duration *; MSE-V CI; MSE-V Mean.

For the stroke survivors, the set of repeated features used by the best models of both feature selection algorithms is shown in Table 8. Two features were important across all three clinical tests, i.e., step length and stand duration (highlighted by a "*").
Step length was also found in previous studies to predict falls [43] and to be important for determining postural balance [77]. This can be related to gait speed, as subjects who have poor balance will try to maximize the time that their feet are in direct contact with the ground to avoid falling.
Step length was also important for fall-risk prediction among stroke survivors [74]. Moreover, stand duration, step length, and stride time (important across two different clinical tests) are believed to be related to gait asymmetry and to fall risk [78] because they indicate the level of lower limb control the subject has while walking [79].

Table 8. Features selected by the best models for the stroke survivors. Features found to be repeated across all clinical tests are marked with a "*".
Entries in extracted reading order (column assignment not preserved): Walk duration; Stand duration *; Walk duration; Gait speed; Step length *; Sit duration; Step length *; Stride time; CV step time; Mean sit (V); RMS walk (ML); Stand duration *; CV stride time; CV stride time; Sit duration; Step length *; Std sit (AP); RMS turn (Y); Jerk sit (AP); Cadence; Median turn (AP); Stand duration *; Step time; Stride time.

Finally, by analyzing the features found to be important across both groups (Table 9), it can be observed that four features were found to be important for the SFBBS clinical test: walk duration, gait speed, step length, and stand duration. Moreover, stand duration was found to be important for both the TUG test and the multifactor test. This can be attributed to a reduction and weakening of the hamstring muscles (located on the legs) of fall-risk individuals, which causes them to hastily perform the standing transitions [76]. Differences in the sit-to-stand transition can also be explained by a reduction in balance as the center of mass is raised further from the ground [37]. In summary, no matter which group of subjects is studied (community-dwelling or post-stroke), stand duration carries important information that can help researchers and doctors to judge and further study fall risk among subjects.

Table 9. Most important features across both groups of subjects.
SFBBS: Walk Duration, Gait Speed, Step Length, Stand Duration
TUG: Stand Duration
SFBBS + TUG: Stand Duration

The finding that stand duration is a critical feature for posture assessment is backed by several studies, which found it to be significantly different between healthy individuals and those at risk of falling [53,54,76,80]. Stand duration is calculated by measuring the total time in seconds it takes for the person to stand up from the sitting position. This time is measured from the moment they lean their upper body forward to the moment they are standing upright, with legs fully stretched. In older individuals, a variation in stand duration can be explained by reduced muscle strength. This weakening of muscles results in a loss of balance during this transition, which can lead to falls. In fact, clinical tests involving repeated sit-to-stand exercises have been found in previous studies to accurately identify individuals with reduced lower muscle strength [33,81]. Individuals with reduced muscle strength rely on their arms for support while standing, which increases the time they require to stand up [33].
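The stand duration definition above can be expressed as a short sketch. It assumes the two events (start of the forward lean and the fully upright posture) have already been located as sample indices in the recording; how they are detected is not covered here, and the function name, the indices, and the 100 Hz sampling rate are hypothetical.

```python
def stand_duration_seconds(lean_start_index, upright_index, sampling_rate_hz):
    """Duration of the sit-to-stand transition in seconds.

    lean_start_index: sample at which the forward lean begins.
    upright_index: sample at which the subject is fully upright.
    sampling_rate_hz: sensor sampling rate (an assumed input here).
    """
    if sampling_rate_hz <= 0:
        raise ValueError("sampling rate must be positive")
    if upright_index < lean_start_index:
        raise ValueError("upright sample must not precede the start of the lean")
    return (upright_index - lean_start_index) / sampling_rate_hz


# Hypothetical example: events at samples 120 and 270 of a 100 Hz recording.
example_duration = stand_duration_seconds(120, 270, 100)
```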
According to multiple studies, post-stroke individuals are most susceptible to falling during sit-to-stand transitions [82][83][84]. This is generally caused by individuals shifting their weight towards their unaffected leg when standing up [57,82,85,86,87]. This shifting of weight causes individuals to require longer times to perform these transitions. Consequently, sit-to-stand tests have also been recommended as a tool to measure lower muscle strength in post-stroke individuals, as well as a screening tool for individuals at risk of falling [88,89].

Table 10 summarizes the stand duration values, in seconds, obtained by each group of subjects. As observed, healthy individuals across all clinical tests were able to stand up in a shorter time than individuals considered to be at risk of falling, which suggests this feature can provide important information to identify subjects with postural problems. The values for the community-dwelling elderly are consistent with a previous study that performed the TUG test on four different elderly subjects and measured the time it took individuals to stand up [80]. From this table, it can also be observed that stroke survivors who are not healthy (according to all clinical tests) have significant problems standing up compared to healthy stroke survivors, as is evident from the time they required to stand up.

Conclusions
This study analyzed the fall risk in individuals by automatically extracting features from the inertial sensor data collected from a TUG test and using machine learning to classify subjects as fallers and non-fallers. Our results show that the set of features extracted can provide good screening performance on either single or multifactor clinical tests. Using two feature selection algorithms, we found a set of important features, which were also found to be related to fall risk in previous studies.
We recognize that our study has some limitations. Chiefly, most of our subjects are female, which makes it difficult to draw conclusions that are representative of larger populations. However, due to the recruitment restrictions that were necessary during our study, as well as the general difficulty of recruiting individuals for medical studies, it was not possible to recruit a balanced number of female and male subjects. Nonetheless, this study provides a valuable contribution from the current perspective of technology-assisted scientific research.
Studies on fall-risk assessment are mostly limited by the type of subjects that participate in them. To address this limitation, this study is the first to compare the set of important features between two groups of subjects: the community-dwelling elderly and stroke survivors. By comparing these two groups, this study focused on finding a set of features that can be used to predict fall risk, independent of the type of individuals being studied. Results showed that, across all clinical tests, only stand duration was found to be important to fall risk. This is consistent with a multitude of fall-risk assessment studies and is generally attributed to the weakening of muscles and the reduction in balance during this transition in subjects who are at risk of falling.
Finally, it is important to understand that fall risk has multiple contributing factors, including muscle strength, cognitive function, and environmental factors. The purpose of this study is to use technology-assisted methods to measure acceleration data during a clinical balance test and to identify the association between these measurements and the risk of falling. Such an approach is valuable for studying the commonalities between specific features and populations. Future studies could further explore other possible characteristics and factors to develop a more comprehensive and reliable fall-risk assessment model.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available because they belong to the individuals who participated in the study.