Exploration of Fall-Evaluation Scores Using Clinical Tools with the Short-Form Berg Balance Scale and Timed Up and Go and Motion Detection Sensors

The results obtained by medical experts and inertial sensors via clinical tests to determine fall risks are compared. A clinical test is used to perform the whole timed up and go (TUG) test and segment-based TUG (sTUG) tests, considering various cutoff points. In this paper, (a) t-tests are used to verify fall-risk categorization; and (b) a logistic regression with 100 stepwise iterations is used to divide features into training (80%) and testing sets (20%). The features of (a) and (b) are compared, measuring the similarity of each approach's decisive features to those of the clinical-test results. In (a), the most significant features are the Y and Z axes, regardless of the segmentation, whereas sTUG outperforms TUG in (b). Comparing the results of (a) and (b) based on the overall TUG test, the Z axis multiscale entropy (MSE) features show significance regardless of the approach: expert opinion or logistic prediction. Among various clinical test combinations, the only commonalities between (a) and (b) are the Y-axis MSE features when walking. Thus, machine learning should be based on both expert domain knowledge and a preliminary analysis with objective screening. Finally, the clinical test results are compared with the inertial sensor results, prompting the proposal for multi-oriented data analysis to objectively verify the sensor results.


Introduction
Falling is a common problem among older people living in the community and can have serious consequences for their lives and for society [1]. As reported by the World Health Organization (WHO), the population of elderly people (aged 60 or over) will increase from 900 million in 2015 to about 2 billion by 2050, accounting for 22% of the world's 2050 population [2]. Fall prediction is a multifaceted problem involving complex interactions among physiological, behavioral, and environmental factors [3]. Clinical fall-risk assessments typically include questionnaires and functional assessments of posture, gait, cognition, and other risk factors for falling [3]. These clinical assessments provide a snapshot of the fall risk but are usually subjective, and the use of threshold assessment scores for classification is considered to be decreasing rather than increasing [4]. Therefore, previous relevant studies have used the timed up and go (TUG) time score as a clinical evaluation item to overcome the subjective interpretation of falls [5]. The objective of clinical screenings is to find the outliers. The most common method used for clinically determining the thresholds is the clustering criteria. The TUG test is a simple, rapid, and widely applicable clinical assessment of the balance and mobility of older people. However, the environmental assessment tools use clinical assessment scales or test data for analyzing the fall risk. The TUG assessment has been used to assess the fall risk in previous cohort studies of different versions and approaches using wearable sensors [5][6][7][8][9][10][11], predominantly by testing in controlled laboratory environments [9] or focusing on inpatients [11] and healthy subjects [5,11-13].
In addition, although a few studies have focused on community-dwelling older people, there is no consensus about the scores for this group, most of which can be attributed to subjects belonging to different groups. For example, Shumway-Cook et al. [14] discriminated a community within 13.5 s. Steffen et al. [15] used 15 s to determine whether there is an optimal cutoff point for the risk of falling and whether the older people living in a family have a risk of falling. Subjects of different groups have different cutoff points. However, in the older-people community, there is no way to clearly divide this point. In case of the same problem, most research intends to select specific subjects for discussion; however, it is considerably difficult to select subjects applicable to the health of the real environment or the subhealth.
Under first-line clinical diagnosis, clinical-test evaluation allows a comprehensive quantitative comparison of the performances of different tasks; however, most fall-risk assessments require the supervision of a medical professional [16]. Therefore, these tests are usually not applicable to large groups, such as community-dwelling older people, under first-line clinical diagnosis because of medical manpower constraints. Screenings for older people surviving a fall must avoid cumbersome processes, minimize the time required for case screening, and reduce dropout. In community service, the risk of falling and the use of artificial intelligence machine learning techniques must be discussed as part of the Internet of Things (IoT) to support medical services and improve the medical quality. "Connected health" brings the concept of intelligent health, which is an application based on the IoT. The effectiveness of sensor-based care models in clinical outcomes and cost savings was mentioned in a literature review on connected medicine; however, an understanding of how to use the value of the data can be derived from the patients by monitoring them at home and in the community, so obtaining information and then combining it with other biopsychosocial data to obtain information about whether patients need intervention, knowing that corresponding care information is less discussed [17]. We believe that the use of data to optimize the development and verification of, and practical integration into, medical services, will be discussed as a benchmark study for the introduction of connected health.
Furthermore, the data collected by sensors for determining the risk of falls have shown academic research value. Previous studies have shown that high specificity and sensitivity are the main objectives of a reliable fall prediction system [18] because the final decision will affect the classification and influence of the intermediate voters in case of several patients and crossovers. In some previous studies [16,19], inertial sensors were introduced in the community to support community care services, and the TUG features were calculated using a stopwatch and by performing entropy measurements using sensors [9]. Entropy methods are frequently utilized to evaluate physiological data, such as blood pressure, heart rate, and postural stability, and entropy analysis is used for gait evaluation across a spectrum of pathologies [20][21][22][23]. Sample entropy is based on the probability that two sequences that are similar will remain similar at the next point of a time series. Healthy physical conditions are more complex than pathological movement: a more random signal, as produced by a healthy limb, is associated with a higher sample entropy value, whereas a low entropy value represents a less random signal, as produced by an injured limb [24]. Unlike statistical measures or asymmetry indices, entropy values can provide insight into the presence of what is considered to be desirable variability [24]. Entropy measures for time series, such as sample entropy (SampEn) and approximate entropy (ApEn) [25], can be used to measure the unpredictability (the opposite of regularity) of a time series. More recently, Costa et al. [26,27] proposed multiscale entropy (MSE), a new entropy-based measure for time series that seems to better quantify complexity.
These sensors can objectively understand the risk of falling; however, it is difficult to perform subsequent analysis in field experiments because of the large amount of data obtained from different axes of the sensor and because of the implementation of multiple evaluations. Moreover, the sensor does not have a cutoff point for classification; furthermore, there is no comparable experiment group, and it is impossible to extract the association of the fall-risk factors one-by-one. Therefore, when classifying high-dimensional data obtained from devices such as sensors, it is unclear whether the dataset is clearly separable (i.e., whether the interval between similar types is small or the interval between the different types is large).
If the dataset is not large (e.g., only a few hundred or thousand data items or only dozens or hundreds of fields per record), it may be insufficient for training complex artificial intelligence algorithms. In contrast, small parts of large datasets can be sampled before the operation, and the data can be rapidly understood visually, albeit with limited accuracy, using a simple algorithm with low computational complexity. Recently, Wu et al. [28] used multiscale entropy to analyze the triaxial acceleration signals of the overall TUG and found that the segment-based TUG (sTUG) signals were more irregular and complex for the non-fall-risk subjects than for the fall-risk subjects. These results indicate that, compared with the overall TUG, sTUG can reveal more information about postural (transition/gait) stability and provide more predictors of fall risk for assessment among community-dwelling elderly people. Therefore, we attempted to further explore the correlation between the inertial sensor and the clinical test as further feedback. We must ensure the correlation of the data with clinical variables. We also need to learn more about how to use the MSE-derived sensor features for the best clinical capture and to inform the patients about the environment in which the target data are obtained. By transforming data into information, we hope to use these data as physiological data for primary medical institutions for preliminary screening and prediction.
For those who actually perform the data analysis tasks, many ambiguous areas remain, such as the crossover among older people between fallers and non-fallers, as well as the different classification cut points. Therefore, a perception of the feasibility, acceptability, and availability of new tools and techniques in clinical practice [29,30] via data visualization is essential. This perception is far more critical than the usage of community services in obtaining an initial judgment of subhealth. Assessing the physical balance of older people in the community helps to identify and follow up those individuals who are likely to fall [31][32][33]. However, clinical trials of the scale filters are too sensitive, making older people likely to underestimate the risk of falling [15,34-38]. In addition, fall-risk assessment should be verified across various aspects [20,39-41] to ensure that the patient can reach a thoroughly informed decision. Such an assessment's main purposes include understanding the principles of data, discovering potential data structures, and extracting important variables. This approach also involves the detection of outliers, verification of hypotheses, development of data reduction models, and determination of optimization factor settings [40]. Therefore, in this study we hope to use the cutoff points of clinical tasks as benchmark results to explore how to incorporate sensors in the auxiliary tasks of TUG and short-form Berg balance scale (SFBBS) clinical tasks to reflect the results of patients in the past and to assess the value of the sensor data. This aim integrates connected health thinking into the environment, allowing individuals to shift from a "Reaction" mode to "Preaction" mode.

Protocol
The experimental design of this research is cross-sectional. The research data were sourced from a cross-disciplinary professional team in a regional hospital in central Taiwan, including geriatric physicians, rehabilitation physicians, physical therapists, occupational therapists, and social workers. From April 2014 to May 2015, a community fall risk assessment for the elderly in the neighborhood of the Feng Yuan hospital (including 7 administrative regions) was conducted, randomly including clinical tests such as TUG and the SFBBS. On average, older people spent 20 min to complete the suite of fall assessment tests. Simultaneously, data were collected using TUG tests with sensors. The X, Y, and Z axes were aligned with vertical (V; Up, +; Down, −), mediolateral (ML; Right, +; Left, −), and anterior-posterior (AP; Forward, +; Backward, −) directions; accordingly, a segment-based TUG (sTUG) test was designed, which can be obtained according to the domain knowledge, including the "sit-to-stand (STS)", "sit-to-walk", "turn", and "stand-to-sit (STS2)" segments. This study was approved by the Institutional Review Board of Tsaotun Psychiatric Center, Ministry of Health and Welfare.

Subjects
We present a field experiment conducted by a professional team from a hospital in central Taiwan that recruited older people in communities to assess their risk of falling. Seventy-four community-dwelling older persons living in different communities in Taiwan were recruited as a convenience sample (17 males, age: 75.23 ± 11.50; 57 females, age: 73.12 ± 8.56). As reviewed by the professional team, the enrolment criteria for the subjects excluded those with musculoskeletal system injuries, any history of central nervous system injuries, and/or walking dependently with or without the use of any aid within the previous three months. After the researchers explained the details, each subject agreed to participate and signed a consent form prior to entering the laboratory.

Inertial Sensor
The inertial sensor used to collect data during the TUG test included a triaxial accelerometer sensor board (Freescale RD3152MMA7260Q) with a sampling rate of 45 readings/s, an internal battery, and an SD storage card placed inside a plastic box. The sensor was small and lightweight and could be worn by the test subjects without causing any discomfort. The sensor was strapped to a belt at approximately the center of mass, between the 3rd and 5th lumbar vertebrae (L3-L5) [22], as shown in Figure 1. The sensor weighed approximately 26.5 g, and the length, width, and height were 232, 7, and 21 mm, respectively, placing no substantial burden on the subject. The accelerometer collected data from the X (vertical, V; up: +, down: −), Y (mediolateral, ML; right: +, left: −), and Z (anterior-posterior, AP; forward: +, backward: −) axes. We used MSE to calculate the complexity of the three axes as features. The MSE calculation process involved the following sub-processes [27]: (1) coarse graining, (2) sample entropy, and (3) the complexity index. The "coarse-graining" process is based on the scale factor of the segment windows, in which the average of the data values for each segment window is calculated, forming a new time series. Each element $y_j^{(\tau)}$ of the new coarse-grained time series is calculated according to

$$y_j^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} x_i, \quad 1 \le j \le N/\tau \quad (1)$$

where $\tau$ is the scale factor, $N$ is the size of the original dataset, and $x_i$ is a data point in the original time series. For a scale factor of 1, the coarse-grained time series is simply the original time series; a scale factor of 2 averages each non-overlapping pair of consecutive points from the original time series, a scale factor of 3 averages each non-overlapping group of three consecutive points, and so on. When calculating the first and remaining time series, the duration between them is determined and the maximum value is taken, as shown in Figure 2.
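The coarse-graining step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study; the function name `coarse_grain` and the choice to drop trailing points that do not fill a complete window are assumptions.

```python
def coarse_grain(series, scale):
    """Average consecutive, non-overlapping windows of length `scale`.

    Assumption: trailing points that do not fill a whole window are dropped.
    """
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    n_windows = len(series) // scale
    return [
        sum(series[j * scale:(j + 1) * scale]) / scale
        for j in range(n_windows)
    ]


# Made-up acceleration values, for illustration only.
signal = [1.0, 3.0, 2.0, 4.0, 6.0, 8.0]
print(coarse_grain(signal, 1))  # [1.0, 3.0, 2.0, 4.0, 6.0, 8.0]
print(coarse_grain(signal, 2))  # [2.0, 3.0, 7.0]
print(coarse_grain(signal, 3))  # [2.0, 6.0]
```

At scale 1 the series is returned unchanged, and each larger scale shortens the series by the scale factor, which is why the length of the original recording limits the largest usable scale.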
The sample entropy (SampEn) was calculated for each coarse-grained time series in order to obtain entropy measurement values at each scale [42]. SampEn at each time scale $\tau$ was expressed as the negative of the natural logarithm of the conditional probability that two sequences that are similar within a tolerance $r$ for $m$ consecutive points remain similar for $m + 1$ points. The formula for SampEn is

$$\mathrm{SampEn}(m, r, N) = -\ln \frac{C^{m+1}(r)}{C^{m}(r)} \quad (2)$$

where $N$ is the total number of data points, $m$ is the number of consecutive data points, $C^{m}(r)$ counts the pairs of sequences of length $m$ that match within the tolerance, and $r$ is the tolerance for accepting the match, which was chosen to be between 10% and 20% of the sample standard deviation $\sigma$ of the time series. In this study, the parameters were set as follows: $m = 2$ and $r = 0.15\sigma$. The complexity index is a function of sample entropy, which in turn is a function of the scale factor, and is calculated using Equation (3):

$$\mathrm{CI} = \sum_{\tau=1}^{\tau_{\max}} \mathrm{SampEn}(\tau) \quad (3)$$

where $\tau_{\max}$ is the largest scale factor used. The sum of the sample entropy (SampEn) at each scale is the complexity index (CI), as shown in Figure 3.
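The sample entropy and complexity index calculations can be sketched as follows. This is an illustrative pure-Python version, not the implementation used in the study. It assumes that the tolerance r is fixed at 0.15 times the standard deviation of the original series and reused at every scale, that the maximum absolute difference is the distance between sequences, and that an undefined entropy (no matches) is reported as infinity; all function names are hypothetical.

```python
import math
import random


def coarse_grain(series, scale):
    """Average consecutive, non-overlapping windows of length `scale`."""
    n_windows = len(series) // scale
    return [sum(series[j * scale:(j + 1) * scale]) / scale for j in range(n_windows)]


def sample_std(series):
    """Sample standard deviation (n - 1 in the denominator)."""
    mean = sum(series) / len(series)
    return math.sqrt(sum((x - mean) ** 2 for x in series) / (len(series) - 1))


def sample_entropy(series, m, r):
    """SampEn = -ln(A / B): B counts matching pairs of length m, A of length m + 1."""
    n = len(series)

    def count_matches(length):
        count = 0
        # The same n - m starting points are used for both lengths.
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(series[i + k] - series[j + k]) for k in range(length)) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # assumption: undefined entropy is reported as infinity
    return -math.log(a / b)


def complexity_index(series, max_scale, m=2, r_factor=0.15):
    """Sum of SampEn over scales 1..max_scale, with r fixed from the original series."""
    r = r_factor * sample_std(series)
    return sum(
        sample_entropy(coarse_grain(series, scale), m, r)
        for scale in range(1, max_scale + 1)
    )


# Synthetic signals for illustration only (not study data).
regular = [0.0, 1.0] * 20
rng = random.Random(0)
noisy = [rng.gauss(0.0, 1.0) for _ in range(200)]
print(sample_entropy(regular, 2, 0.15 * sample_std(regular)))
print(sample_entropy(noisy, 2, 0.15 * sample_std(noisy)))
```

The perfectly periodic signal yields an entropy of zero because every sequence that matches for two points also matches for three, whereas the random signal yields a larger value, in line with the interpretation that higher entropy reflects a less regular signal.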
Wearable inertial sensor specifications: ADXL345 three-axis accelerometer, ITG3200 three-axis gyroscope, HMC5883L three-axis digital compass.

Clinical Tests
Many factors contribute to the falls of older people. To enhance the participation of the older people and understand the influencing factors from different perspectives, this study used the SFBBS and TUG assessment scales and compared their results with those of the sensor. The details of each clinical scale are provided below. SFBBS, used to assess balance, is a performance-based measure of balance during specific movement tasks. It assesses the static and dynamic balance and fall risks in adults and the geriatric population. Karthikeyan et al. [43] suggested that, for community-dwelling elderly people, a cutoff score of 23 or less (out of 28 points) indicates a deficit in balance and gives a better predictive value. The TUG test, used to evaluate gait ability, is a well-known clinical test for mobility and fall risk [44]. It assesses mobility, balance, walking ability, and fall risk in older adults. Shumway-Cook et al. [14] suggested a cutoff point of 13.5 s for community-dwelling elderly people (over 65 years old), but Alexandre et al. [45] suggested a cutoff point of 12.47 s to achieve a better predictive value. In this study, we discuss each cutoff point. In addition, we follow the past literature [9,28] to segment the TUG test (sTUG), as shown in Figure 4, dividing it into sit-to-stand (STS), walk, turn, and stand-to-sit (STS2). We also used MSE to calculate the complexity of the sTUG for each axis as a feature. Herein, we attempt to ensure that the knowledge discovered from the cutoff-point-based comparison via sensors can be communicated to domain experts, to provide an explanation when deploying it, and to use this knowledge with new technology to explore the manner in which the sensor measurements are meaningful.
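As a small illustration of how these cutoff points turn clinical scores into a dichotomous fall-risk label, consider the sketch below. The function name and the `combine` option, which takes either the union ("any") or the intersection ("all") of the two criteria, are assumptions made for illustration; the example scores are made up.

```python
def at_fall_risk(tug_seconds, sfbbs_score, tug_cutoff=13.5, sfbbs_cutoff=23, combine="any"):
    """Label a subject as at fall risk from TUG time and SFBBS score.

    A TUG time above the cutoff or an SFBBS score at or below the cutoff
    counts as a positive finding. `combine` is a hypothetical option:
    "any" flags a subject when either test is positive, "all" only when both are.
    """
    tug_flag = tug_seconds > tug_cutoff
    sfbbs_flag = sfbbs_score <= sfbbs_cutoff
    if combine == "any":
        return tug_flag or sfbbs_flag
    if combine == "all":
        return tug_flag and sfbbs_flag
    raise ValueError("combine must be 'any' or 'all'")


# The same made-up subject under the two TUG cutoffs discussed in the text.
print(at_fall_risk(12.9, 25, tug_cutoff=13.5))   # False
print(at_fall_risk(12.9, 25, tug_cutoff=12.47))  # True
```

The example shows why each cutoff point is examined separately: a subject with a TUG time between 12.47 s and 13.5 s changes groups depending on which threshold is applied.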


Data Analysis
We used different entry points for optimized clinical-test classification and substituted the multiscale entropy (MSE) analysis results of the axes and segments. We also resampled the raw data [46] because the data length must be considered when performing an MSE analysis. A time series of 1800 data points, the length used herein, can be coarse-grained up to Scale 6, in which case the shortest coarse-grained time series contains 300 data points [47]; we therefore provided up to 1800 data points for each TUG segment for the MSE calculation. To compare the results with the expert judgments, the fall-risk assessment included the MSE analysis, logistic regression, receiver operating characteristic (ROC) curves, the area under the ROC curve (AUC), and clinical tests. The data analysis is divided into two parts. The first part is based on the fall-risk judgment of experienced clinicians, using TUG = 13.5, TUG = 12.47, and SFBBS = 23 as the cut points that define the fall-risk groups, with a t-test to determine the significant differences between the features. The second part uses stepwise logistic regression to select the features step by step. The data are divided into two groups: an 80% training set and a 20% testing set. The training set is used to estimate the model parameters, and the testing set is used to test the model built on the training set. The verification results can be used as an indicator for selecting the best model. By comparing these two types of data analysis results, we can better understand the similarities and differences between the features determined using the inertial sensor and those determined by the medical experts. Table 1 shows the features of the inertial sensor and TUG for analyzing fall events in this study.
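The relationship between data length and the largest usable scale can be checked with a short sketch. The resampling method used in the study is not described here, so the linear interpolation below is only an assumed stand-in, and the function name `resample_linear` is hypothetical.

```python
def resample_linear(series, target_len):
    """Resample a series to `target_len` points by linear interpolation.

    Assumption: linear interpolation stands in for the (unspecified)
    resampling method of the study.
    """
    n = len(series)
    if n < 2 or target_len < 2:
        raise ValueError("need at least two points in and out")
    resampled = []
    for k in range(target_len):
        position = k * (n - 1) / (target_len - 1)  # fractional index into the input
        low = int(position)
        high = min(low + 1, n - 1)
        fraction = position - low
        resampled.append(series[low] * (1 - fraction) + series[high] * fraction)
    return resampled


# A made-up 10 s recording at 45 readings/s, stretched to 1800 points.
raw = [float(i % 45) for i in range(450)]
segment = resample_linear(raw, 1800)

# Length of the coarse-grained series at each scale from 1 to 6.
print([len(segment) // scale for scale in range(1, 7)])  # [1800, 900, 600, 450, 360, 300]
```

With 1800 points, the series at Scale 6 still contains 300 points, which matches the minimum length stated above.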

Results
Our discussion and analysis use the inertial sensor axes and the segmented TUG as the features of data analysis and can be divided into three main parts. We used a clinical test in all TUG and sTUG tests and considered various cutoff points. (a) We used t-test analysis to verify the fall-risk categorization. (b) We used a stepwise logistic regression analysis repeated 100 times, each time dividing the data into two groups: an 80% training set and a 20% testing set. (c) We compared the features between (a) and (b) to understand whether the decisive features are actually similar to the results of the clinical tests.

Clinical Test of Cut Point Based on Experienced Clinician's Fall-Risk Evaluation
We display multiple data to discuss the results of the inertial sensor because previous publications have discussed and verified the clinical-test scores (according to fallers and non-fallers). Karthikeyan et al. [43] suggested the occurrence of impaired balance when the SFBBS score is 23 or below; therefore, this value was considered to be our cutoff point. Shumway-Cook et al. [14] recommended that community-dwelling older adults should be considered at high risk at TUG times of greater than 13.5 s. Alexandre et al. [45] suggested that a cutoff point of 12.47 s resulted in a good predictive value. Thus, we divided the subjects into two groups at each clinical-test cutoff point and compared the accelerometer features of the two groups with a t-test. A p-value smaller than 0.05 implies a statistically significant difference, indicating that the fallers distinguished by the feature closely match the clinical-test result. We use this as the expert judgment. Table 2 shows the results for the cutoff points of the TUG and SFBBS and their intersection or convergence. At first glance, with or without segmentation, the most significant features are the Y and Z axes.
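The group comparison can be illustrated with a two-sample t statistic. The text does not state which t-test variant was used, so the sketch below assumes Welch's unequal-variance form; it computes only the statistic and its degrees of freedom, leaving the p-value to a statistics package. The complexity-index values are made up.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic and degrees of freedom for two independent groups.

    Assumption: the unequal-variance (Welch) form; the study does not
    specify which t-test variant was applied.
    """
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # Squared standard errors of the two group means.
    se2_a = statistics.variance(group_a) / len(group_a)
    se2_b = statistics.variance(group_b) / len(group_b)
    t_stat = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t_stat, dof


# Hypothetical Z-axis complexity-index values for two cutoff-defined groups.
risk_group = [3.1, 2.8, 3.4, 2.9, 3.0]
no_risk_group = [4.2, 4.6, 3.9, 4.4, 4.1]
t_stat, dof = welch_t(risk_group, no_risk_group)
print(round(t_stat, 3), round(dof, 2))
```

A large absolute t value relative to the t distribution with the reported degrees of freedom corresponds to a small p-value, which is the criterion (p < 0.05) applied to each feature in Table 2.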
Moreover, we plotted ROC curves to explore the ability of the clinical measures and CI values to predict the fall-risk outcome. All the results shown in Figure 5 exhibit an AUC between 0.7 and 0.9, indicating acceptable or excellent discrimination. The sTUG AUCs are all greater than 0.85, whereas the TUG AUCs are only just greater than 0.79. Moreover, Table 2 shows that at least six sTUG features are significant at the different cutoff points simultaneously, whereas only two features are significant for the overall TUG; this difference comes in addition to the gain in AUC obtained with segmentation.
Appl. Sci. 2020, 10, 6931
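For readers unfamiliar with the AUC, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen at-risk subject receives a higher score than a randomly chosen subject not at risk, with ties counted as one half. This is a generic illustration, not the procedure used to produce Figure 5, and the scores are made up. The scores are assumed to be oriented so that higher values indicate higher risk.

```python
def auc_from_scores(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equivalent to the normalized Mann-Whitney U statistic; ties count 0.5.
    """
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up predicted risk scores for at-risk and not-at-risk subjects.
at_risk = [0.9, 0.7, 0.6, 0.4]
not_at_risk = [0.5, 0.3, 0.2, 0.1]
print(auc_from_scores(at_risk, not_at_risk))  # 0.9375
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the values between 0.7 and 0.9 are interpreted.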

Clinical Test of Stepwise Logistic Regression as Fall-Risk Evaluation (Prediction Results)
Because the clinical test uses the cutoff point as a dichotomous categorical variable representing the expert opinion, we used logistic regression, a machine learning scheme, as the automatic selection method to explore the prediction results. We used the clinical tests (including the intersection or convergence of TUG = 12.47, TUG = 13.5, and SFBBS = 23) and the inertial sensor axes, with or without a segmented TUG, as the features, and selected among them with the stepwise regression method. To avoid overfitting and minimize bias, we used a 100-iteration random-shuffle-split cross-validation (100-RSSCV). In each iteration, a single random shuffle split selected a random 80% subset of the data for training the model, with the remaining 20% used for model testing, while we observed the selection frequency and importance of each fall-risk feature. Hence, the number of times each feature was selected can be easily calculated. Table 3 shows the results. We regarded features selected more than 50 times as important. The results show that, for the overall TUG, T2 and T3 were selected more than 50 times, in some cases all 100 times, and that, for sTUG, F4 and F8 were selected more than 50 times. We also plotted ROC curves to explore the ability of the clinical measures and CI values to predict the fall-risk outcome. All the results shown in Figure 6 exhibit an AUC between 0.7 and 0.96, indicating acceptable or excellent discrimination. The sTUG AUCs are all greater than 0.85, whereas the TUG AUCs are only just greater than 0.78.
Bold-face values: features selected more than 50 times in the 100-iteration random-shuffle-split cross-validation (100-RSSCV).
Figure 6. ROC for classification by logistic regression for the clinical test and TUG segmentation.
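The bookkeeping behind the selection frequencies can be sketched as follows. This is not the stepwise logistic regression used in the study: to keep the example self-contained, a simple effect-size rule stands in for the feature selector, and the held-out 20% of each split is left unused. All function names, the threshold, and the synthetic data are assumptions made for illustration.

```python
import random
import statistics
from collections import Counter


def effect_size_selector(rows, labels, threshold=1.0):
    """Stand-in selector (NOT stepwise logistic regression): keep features whose
    standardized mean difference between the two groups exceeds `threshold`."""
    selected = []
    for f in range(len(rows[0])):
        group1 = [row[f] for row, y in zip(rows, labels) if y == 1]
        group0 = [row[f] for row, y in zip(rows, labels) if y == 0]
        pooled_sd = ((statistics.variance(group1) + statistics.variance(group0)) / 2) ** 0.5
        if pooled_sd > 0 and abs(statistics.mean(group1) - statistics.mean(group0)) / pooled_sd > threshold:
            selected.append(f)
    return selected


def selection_frequency(rows, labels, selector, n_splits=100, train_fraction=0.8, seed=0):
    """Count how often each feature is selected over repeated random 80/20 splits."""
    rng = random.Random(seed)
    n_train = int(train_fraction * len(rows))
    indices = list(range(len(rows)))
    counts = Counter()
    for _ in range(n_splits):
        rng.shuffle(indices)
        train = indices[:n_train]  # the remaining indices would form the testing set
        counts.update(selector([rows[i] for i in train], [labels[i] for i in train]))
    return counts


# Synthetic data: 74 subjects, feature 0 is informative, feature 1 is pure noise.
data_rng = random.Random(1)
labels = [i % 2 for i in range(74)]
rows = [[3.0 * y + data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for y in labels]

counts = selection_frequency(rows, labels, effect_size_selector)
important = sorted(f for f, c in counts.items() if c > 50)
print(counts, important)
```

Replacing `effect_size_selector` with a stepwise logistic regression routine and scoring each fitted model on the held-out 20% would bring the sketch closer to the procedure described above; the counting of selections and the more-than-50 rule would stay the same.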

Comparing (a) and (b)
Comparing the results of (a) and (b), first, from the overall TUG of Tables 2 and 3, T3 is a feature that has significant differences regardless of expert opinion or logistic prediction, but the differences between different clinical test combinations can still be seen. For example, the features of TUG = 12.47 are T1, T2, and T3, but the feature of TUG = 13.5 is only T3. The use of clinical test in the stepwise logistic regression for fall-risk evaluation (prediction results) under different clinical test combinations is noteworthy. T2 and T3 are significantly different. This indicates that the features selected for fall risk are all T2 and T3.
From the perspective of sTUG (Tables 2 and 3), the expert use of the cutoff point to judge the risk of falling (clinical tests) is noted. Using different clinical test combinations, F7, F8, F9, F10, F11, and F12 exhibit significant differences. The clinical test's stepwise logistic regression for fall-risk evaluation (prediction results) results in F4 and F8 under different clinical test combinations. The only duplicate between the two is F8. At first glance, expert opinion requires at least six features (F7, F8, F9, F10, F11, and F12) to be screened for falling risk. However, from the prediction results, a maximum of five can be screened, and most of the combinations only require four features. The results of the sTUG AUC are all more than 0.85.

Discussion
The current literature has identified a wide range of accelerometer-based features, but there is no consensus on the optimal variables to examine in the obtained data. The chosen features and the manner of extraction often differ among studies. In this study, we aimed to compare features previously associated with a fall-risk cutoff point, with an emphasis on selecting clinically relevant features, so that the inertial sensor diagnosis can be considered equivalent to expert diagnosis. We treated the fall risk judged by cutoff points from the past literature as the result of expert opinion (Table 2). In the overall TUG result, we found that T3 is important and significantly different regardless of the cutoff point. T3 represents the Z (AP) axis of the inertial sensor. O'Neill et al. (1994) [48] reported that most falls involve falling forward. Accordingly, the AP direction (Z axis) can classify clusters. As in Weiss et al. [37], we obtained the same result as the traditional method of using a stopwatch to distinguish fallers from non-fallers, which suggests that getting the subject to move forward and perform the task is the first inertial step. It is worth noting that T2 is also a very important feature; it is equivalent to an expert clinically observing how subjects naturally swing their hands when walking [49]. The T1 X-axis (up: +, down: −) values are not easy to use as observations because people naturally maintain their own balance through proprioception.
Moreover, the results of the clinical-test cutoff points for fall-risk evaluation (expert opinion) in Table 2 show that F7, F8, F9, F10, F11, and F12 are significant at each cutoff point. It is worth noting that, compared with the overall TUG, the significant differences lie in the Y and Z axes. Adding segments also increases the AUC as a whole; for example, for SFBBS = 23 or TUG = 13.5, the AUC increases from 0.7974 to 0.9061. F8 and F12 indicate the Y and Z axes for walking; we assume that this interpretation is similar to the above for both TUG and sTUG. The four segment features F9, F10, F11, and F12 also focus on the Z axis (anterior-posterior, AP; forward: +, backward: −). In our opinion, based on most of the AUC results in Table 2, the cutoff AUC is between 0.7 and 0.9, indicating acceptable or excellent discrimination. Using sensors as objective instruments to assist frontline personnel does not necessarily require professionals to make observations [50], or even segmentation plus axial exploration. Combining the results of the clinical tests can further increase the AUC. In addition, multifactor fall prevention measures for the elderly have been recommended as one of the most effective methods [51], and multifactor analysis has been used to assess the fall risk among the elderly. Scholars have successively developed and verified fall-risk assessment tools for the elderly in communities, but most of these tools are designed for screening and cannot include all the risk factors required for a complete fall-risk assessment [52,53].
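To make the AUC values discussed above concrete, the following minimal sketch computes an AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties count one half). The labels and scores are toy values, not study data, and the verbal bands follow a commonly cited rule of thumb (0.7 to 0.8 acceptable, 0.8 to 0.9 excellent, 0.9 and above outstanding); these cut values are an assumption and are not taken from this study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def discrimination_band(auc):
    """Assumed rule-of-thumb labels; the thresholds are illustrative only."""
    if auc >= 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    return "poor"

# Toy example: 1 = high fall risk, 0 = low fall risk
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc, discrimination_band(auc))
```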
We examined the different clinical tests and the cutoff points of the intersection and set points and found that, for the overall TUG, T3 (CI_Z) was significant in each clinical test; that is, the Z-axis MSE result can be used directly for clinical-test classification, similar to the preliminary result of using SFBBS and TUG as multifactor initial screening factors for fall-risk assessment [52,53]. We consider that, in addition to pursuing more accurate results, multiple (oriented) factors can also be considered. Moreover, from the perspective of action segmentation, STS and turning differ significantly only in the Z axis, whereas walk and STS2 differ in the Y and Z axes, respectively. This result appears to depend on the axial direction. In particular, the Z axis shows a significant difference in every segmented action, but segmentation must be added to improve accuracy. Integrating multiple information sources to enhance clinical decision-making and provide complete patient records is difficult; this is a major finding of this study, particularly for the decision on segmenting actions [9,28].
The sTUG results yield an interesting observation: the AUC results do not differ. The features selected by the stepwise logistic regression for fall-risk evaluation (prediction results) are F4 and F8, the X and Y axes of walking. In contrast, the features from the clinical-test cutoff points for fall-risk evaluation (expert opinion) are F7, F8, F9, F10, F11, and F12, focusing on the Y and Z axes. Thus, in addition to the experts' observation method, the prediction results can help confirm the X-axis status and key actions. It is worth mentioning that combining the prediction results with the clinical tests can reach the AUC level of expert opinion using at most five features. As an aid to first-line medical staff, sensors can replace manpower [54,55], even in the subsequent data analysis, and fall-risk prediction can also be discussed using the multiple factors of this study. It is difficult to integrate multiple information sources to enhance clinical decision-making and provide complete patient records. In our study, we compared two approaches: making preliminary community screening recommendations to the hospital and constructing predictive regression models. The clustering results can reveal information hidden in personal health record systems to the experts and patients in a cluster. Such a sensor can effectively enable user matching, providing a new pipeline for users to collect information and allowing them to pay more attention to their own healthcare and medical plans.
Classification and regression are two of the most representative problem formulations in both machine learning and statistical analysis. We used the same concept to analyze the classification. Note that the same idea can be expressed differently by statistical (clinical test) and machine learning (logistic regression) techniques. In addition, past studies have indicated that cutoff points determined from different subjects differ [14,43]. Therefore, several studies [15,52,53] used multiple evaluations to resolve the conflict. There is no shortage of fall assessment tools, even in clinical settings; however, because falls have several causes, multiple factors must be evaluated to improve the accuracy of fall-risk prediction [13,37]. In this study, we incorporated sensors and used logistic regression as an analytical method for prediction across multiple assessment tools, while also simulating experts to verify the usability of the sensors. Data collection among the elderly can be very difficult [5,56], so balancing the sample by gender is not easy. Therefore, this study focused on the judgment of falls, and any discussion of gender differences was necessarily limited.

Conclusions
This study used logistic-regression analysis to present inertial data on a fall-risk scale from various perspectives, allowing medical practitioners to screen for high-risk patients and providing a decision-making reference for predicting future falls, particularly during the early stages of a disease. Doctors with rigorous medical training can integrate clinical experience and scientific computing aids to make judgments and decisions. Comparing the two sets of results (expert opinion vs. prediction), we found that the AUCs are not much different but the features are not the same. In the overall TUG, considering the higher AUC (prediction results), we found that the Y- and Z-axis sensor features can classify fallers and non-fallers in each clinical-test combination. After segmentation, logistic regression was used to determine the AUC of the optimized result; most combinations need only four features to achieve a considerable result. Therefore, this study proposes that machine learning should be based on expert domain knowledge in addition to a preliminary analysis with objective screening. Hence, we compared the clinical test results with the inertial sensor results. This study also proposes multi-oriented data analysis to objectively verify the sensor results.