Timed Up and Go and Six-Minute Walking Tests with Wearable Inertial Sensor: One Step Further for the Prediction of the Risk of Fall in Elderly Nursing Home People

Assessing the risk of fall in elderly people is a difficult challenge for clinicians. Since falls are one of the leading causes of death in this population, numerous clinical tests have been created and validated over the past 30 years to ascertain the risk of falls. More recently, the development of low-cost motion capture sensors has made it easier to observe gait differences between fallers and nonfallers. The aim of this study is twofold: first, to design a method combining clinical tests and motion capture sensors in order to optimize the prediction of the risk of fall; second, to assess the ability of artificial intelligence to predict the risk of fall from raw sensor data only. Seventy-three nursing home residents over the age of 65 underwent the Timed Up and Go (TUG) and six-minute walking tests while equipped with a custom-built wearable Inertial Measurement Unit, during two sets of measurements at a six-month interval. The falls observed during that interval enabled us to divide residents into two categories: fallers and nonfallers. We show that combining the TUG test results with gait variability indicators measured during a six-minute walking test improves the accuracy of the six-month fall-risk prediction from 68% to 76%. In addition, we show that an artificial intelligence algorithm trained on the raw sensor data of 57 participants reaches an accuracy of 75% on the remaining 16 participants.


Introduction
Falls are an inevitable part of aging, and their prediction and prevention are of paramount importance to health care. According to the World Health Organization (WHO), falls are the second leading cause of accidental death, and approximately 646,000 people die every year following falls, particularly people over the age of 65. In the European Union (EU), an average of 35,848 elderly people (65 and above) per year are reported to […] (TUG+), and an AI algorithm based on sensor measurements only. The first two tests require the presence of a therapist, while the third could be implemented in an autonomous wearable system.
Participants who chose to discontinue their participation, who were admitted to hospital, whose medication was changed in a way that prevented the continuation of the experiment, or who died before the completion of the experiment were also excluded.
Ninety-two individuals started the initial test. Twelve of them were excluded from the survey in view of the above criteria. Seventy-three participants completed the experiment.

Protocol
The survey took place between May (t1) and November (t2) 2018. At t1, all participants performed the TUG test. Participants sat with their back against the backrest of a 46 cm high chair. When signaled, they were asked to get up, walk three meters, turn around and sit down again. Participants then performed the six-minute walking test, with the right to take short pauses as required. Measurements were taken by the same two experimenters (L.C. and R.G.), who always walked beside the participant to prevent a fall in case of loss of balance. The six-minute tests were performed on a typical 25 m point-to-point track with successive turnarounds; the turning points were clearly marked with strong adhesive tape stuck on the floor. During the six-minute walking test, participants were equipped with a custom-built sensor collecting several kinematic quantities.
Sensors 2020, 20, 3207 4 of 15
This ultralow-cost sensor, called DYSKIMOT, was presented in a previous survey [14]. It is based on the Magnetic Angular Rate and Gravity (MARG) sensor LSM9DS1 (SparkFun), which combines a 3-axis accelerometer, gyrometer and magnetometer with a temperature sensor (Figure 1A,B). It is light (10.44 g) and small enough (3 × 3 cm) to be worn by a patient without any disturbance (Figure 1B). Among other quantities, the MARG sensor measures the acceleration $\vec{a}(t)$ (in g, with selectable ranges up to ±16 g) and the angular velocity $\vec{\omega}(t)$ (in °/s, with selectable ranges up to ±2000 °/s) at a sampling frequency of 100 Hz. The data are transmitted to a PC via an Arduino Uno Rev 3 and a USB cable (RS232 serial link) and then passed to custom acquisition software for further analysis. More details, including a comparison between DYSKIMOT and a gold-standard optoelectronic sensor, can be found in [14]. The sensitivity depends on the sensor and on the selected range; detailed information is given in the datasheet (https://www.st.com/en/mems-and-sensors/lsm9ds1.html). For example, the gyrometer sensitivity is 8.75 × 10⁻³ °/s/LSB at the ±245 °/s range, i.e., the range used in the present study, and the accelerometer sensitivity is 0.322 mg/LSB at the ±4 g range. During the experiment, DYSKIMOT was positioned in the lumbar region of the participant at the level of the L4 vertebra (Figure 1C,D), with the sensor's Cartesian axes aligned with the walking directions.
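Converting raw readings to physical units amounts to multiplying by the range-dependent sensitivity. The sketch below assumes that each raw reading arrives as a low/high byte pair in two's complement, as in the LSM9DS1 output registers; the actual format sent by the Arduino to the acquisition software is not described here, and all function names are hypothetical. The accelerometer sensitivity is left as a parameter to be taken from the datasheet for the configured range.

```python
# Hypothetical helpers: raw LSM9DS1 counts (LSB) -> physical units.
GYRO_SENSITIVITY_DPS_PER_LSB = 8.75e-3  # deg/s per LSB at the ±245 deg/s range (from the text)
STANDARD_GRAVITY_MS2 = 9.80665          # m/s^2 per g


def int16_from_registers(low_byte, high_byte):
    """Assemble a signed 16-bit reading from its low and high output bytes."""
    return int.from_bytes(bytes([low_byte, high_byte]), "little", signed=True)


def gyro_counts_to_dps(counts):
    """Angular velocity in deg/s from raw gyrometer counts."""
    return [c * GYRO_SENSITIVITY_DPS_PER_LSB for c in counts]


def accel_counts_to_ms2(counts, sensitivity_mg_per_lsb):
    """Acceleration in m/s^2 from raw accelerometer counts.

    sensitivity_mg_per_lsb depends on the selected range; read it from the
    LSM9DS1 datasheet for the range actually configured.
    """
    return [c * sensitivity_mg_per_lsb * 1e-3 * STANDARD_GRAVITY_MS2 for c in counts]
```

For example, a gyrometer reading of 1000 LSB at the ±245 °/s range corresponds to 8.75 °/s.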
Between t1 and t2, the nursing home staff recorded every fall of each participant. At t2, these fall records were used to categorize participants as fallers or nonfallers. At the end of the survey, 23 fallers and 50 nonfallers were identified. Among the 23 fallers, 17 fell once, two fell twice, two fell three times and two fell four times. Because our population contained few "frequent fallers" (more than one fall), we used a binary classification rather than a more detailed description, for instance in terms of the number of falls between t1 and t2.

Data Analysis
The time series included $\vec{\omega} = (\omega_{ml}, \omega_{v}, \omega_{ap})$ and $\vec{a} = (a_{ml}, a_{v}, a_{ap})$ (two vectors of three components each). Typical traces of selected time series are shown in Figure 2. The indices ml, v and ap stand for mediolateral, vertical and anteroposterior, respectively. The data recorded at t1 and t2 were analyzed for the 73 participants still included at t2.
We chose to focus on the assessment of time series variability. As shown in [5], gait variability can be correlated with the risk of fall. Further studies such as [9] have analyzed it using mathematical tools such as the fractal dimension and have demonstrated that those tools can discriminate between healthy and disabled individuals, healthy individuals showing a higher fractal dimension. The parameters considered in the present study are the standard deviation (SD) and the fractal dimension (D) of each of the six time series ($\vec{a}$ and $\vec{\omega}$) recorded during the six-minute walking test. The fractal dimension was computed with the box-counting method: if $N(\varepsilon)$ is the number of square boxes of side $\varepsilon$ needed to cover the plot of the time series under study, then $N(\varepsilon) \propto \varepsilon^{-D}$ as $\varepsilon \to 0$. D is therefore the slope of $\ln N(\varepsilon)$ versus $\ln(1/\varepsilon)$ in a log-log plot for small values of $\varepsilon$:
$$D = \lim_{\varepsilon \to 0} \frac{\ln N(\varepsilon)}{\ln(1/\varepsilon)}.$$
Computational details about the method we use are presented in [19], and additional mathematical references can be found in [20]. SD and D give complementary information about gait variability: SD indicates the magnitude of the fluctuations, while D reflects the complexity of the time series, i.e., smooth (D close to 1) or abrupt (D close to 2) relative changes between successive measurements.
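As an illustration of the box-counting definition above, the following pure-Python sketch estimates D for a sampled series. It is one common variant (dyadic grid on the unit square, boxes counted per time column from the samples only), not necessarily the exact procedure of [19]; the function name and the default number of scales are assumptions.

```python
import math
import random


def box_counting_dimension(y, n_scales=6):
    """Estimate the box-counting dimension D of the graph of a sampled series y.

    Time and amplitude are both rescaled to [0, 1]. For eps = 1/m with
    m = 2, 4, ..., 2**n_scales, every box between the lowest and highest
    sample of each time column is counted, and D is the least-squares slope
    of ln N(eps) versus ln(1/eps).
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two samples")
    y_min, y_max = min(y), max(y)
    span = y_max - y_min
    z = [(v - y_min) / span if span > 0 else 0.0 for v in y]

    log_inv_eps, log_counts = [], []
    for k in range(1, n_scales + 1):
        m = 2 ** k                    # boxes per side, eps = 1/m
        col_lo = [m] * m              # lowest occupied box row in each column
        col_hi = [-1] * m             # highest occupied box row in each column
        for i, zi in enumerate(z):
            col = min(i * m // (n - 1), m - 1)  # sample time t_i = i/(n-1)
            row = min(int(zi * m), m - 1)
            col_lo[col] = min(col_lo[col], row)
            col_hi[col] = max(col_hi[col], row)
        n_boxes = sum(hi - lo + 1 for lo, hi in zip(col_lo, col_hi) if hi >= 0)
        log_inv_eps.append(math.log(m))
        log_counts.append(math.log(n_boxes))

    # Least-squares slope of ln N(eps) against ln(1/eps)
    mean_x = sum(log_inv_eps) / len(log_inv_eps)
    mean_y = sum(log_counts) / len(log_counts)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_inv_eps, log_counts))
    den = sum((a - mean_x) ** 2 for a in log_inv_eps)
    return num / den


if __name__ == "__main__":
    line = [i / 4095 for i in range(4096)]        # smooth signal: D is exactly 1
    rng = random.Random(0)
    noise = [rng.random() for _ in range(4096)]   # irregular signal: D close to 2
    print(box_counting_dimension(line), box_counting_dimension(noise))
```

The finest box size should stay well above the sampling resolution, which is why only a few dyadic scales are used here.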

Because the data failed normality tests, a Mann-Whitney test with a significance level of 0.05 was used to assess differences between fallers and nonfallers at t1.
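A minimal pure-Python sketch of this comparison, using mid-ranks for ties and the large-sample normal approximation, is given below. It is an illustration under those assumptions; Sigmaplot and R may use exact p-values or continuity and tie corrections, so their p-values can differ slightly. The function name is hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, mid-ranks for ties).

    Returns (U for sample x, approximate two-sided p-value). No continuity
    or tie correction is applied to the variance.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1        # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u1, p
```

Called with the values of one variability index for fallers and for nonfallers, the function returns U and a p-value to compare with 0.05.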
Three classification algorithms were then defined to classify participants as presenting a risk of fall or not: the TUG test, the TUG+ test and the AI algorithm (see below for more details). Standard binary classification tools were then used to compare our "diagnostic" (risk of fall or not) with the actual faller or nonfaller status of our participants: sensitivity (Se), specificity (Sp), positive (LR+) and negative (LR−) likelihood ratios, positive (PPV) and negative (NPV) predictive values, and accuracy (Acc). The TUG test at t1 was first analyzed by computing a Receiver Operating Characteristic (ROC) curve. The time maximizing Youden's index, J = Se + Sp − 1, was chosen as the threshold t* separating participants with and without a risk of fall. An augmented TUG test (TUG+) was then designed following the decision tree shown in Figure 3. It includes information from the variability indices that showed a significant difference between fallers and nonfallers at t1. If the clinical test identifies an individual as a faller, he/she is assessed a second time using one or more kinematic parameters that differ significantly between fallers and nonfallers (according to the fall records). The threshold values were chosen after several attempts at designing an augmented TUG test, retaining those that gave the best accuracy.
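The Youden-threshold search can be sketched as follows. The sketch assumes that a participant is labelled at risk when the TUG time is strictly above the threshold and, as ROC software commonly does, takes candidate thresholds at midpoints between consecutive observed values. The function name is hypothetical. In R, pROC's coords() with best.method = "youden" performs a comparable search.

```python
def youden_threshold(scores, is_faller):
    """Return (threshold, J) maximizing Youden's index J = Se + Sp - 1.

    scores: TUG times (s); is_faller: matching booleans.
    A participant is classified "at risk" when score > threshold.
    """
    values = sorted(set(scores))
    if len(values) < 2:
        raise ValueError("need at least two distinct scores")
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    n_pos = sum(1 for f in is_faller if f)
    n_neg = len(is_faller) - n_pos
    best_thr, best_j = None, float("-inf")
    for thr in candidates:
        tp = sum(1 for s, f in zip(scores, is_faller) if f and s > thr)
        tn = sum(1 for s, f in zip(scores, is_faller) if not f and s <= thr)
        j = tp / n_pos + tn / n_neg - 1   # Se + Sp - 1
        if j > best_j:
            best_thr, best_j = thr, j
    return best_thr, best_j
```

For instance, with the made-up TUG times [10, 12, 20, 25, 30] s, where only the last two belong to fallers, the function returns (22.5, 1.0). These values are toy data, not study data.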
The AI algorithm was designed as follows. The time series of each participant were divided into fixed-duration windows so that the inputs of our model are fixed-size vectors. Models were trained and tested for different window sizes; a value of 20 s was chosen (see Section 3.3 for a justification). Splitting one 6 min time series into shorter ones may also prevent the AI from learning the long-term autocorrelations that are typical of human walking [9,21]. When generating the training dataset, consecutive windows overlapped by 10 s (Figure 4). This increases the size of the dataset and removes any bias related to the starting position of the windows. The dataset was split into a training set and a test/validation set, as a compromise between having enough fallers' data for test/validation and for training.
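The windowing step can be sketched as below. The sketch assumes the 100 Hz sampling frequency and the 20 s windows stated above, and reads the 10 s overlap as a 10 s step between the starts of consecutive windows. Names are hypothetical, and incomplete trailing windows are simply dropped.

```python
FS_HZ = 100      # sampling frequency stated in the text
WINDOW_S = 20    # window length retained in the text
STEP_S = 10      # 20 s windows overlapping by 10 s -> 10 s step


def make_windows(channels, fs=FS_HZ, window_s=WINDOW_S, step_s=STEP_S):
    """Cut a multichannel recording into fixed-size overlapping windows.

    channels: list of equal-length sample lists (e.g., the six components
    of a and omega). Returns a list of windows; each window is a list of
    per-channel slices of window_s * fs samples.
    """
    width = int(window_s * fs)
    step = int(step_s * fs)
    n = len(channels[0])
    return [
        [ch[start:start + width] for ch in channels]
        for start in range(0, n - width + 1, step)
    ]
```

With these settings, a 6 min recording (36,000 samples per channel) yields 35 overlapping windows of 2000 samples.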
The test/validation set consists of 16 participants (8 fallers and 8 nonfallers, randomly chosen), and the training set consists of 57 participants (16 fallers and 41 nonfallers). We then trained and tested models based on Convolutional Neural Networks (CNN) [22] to find the best accuracy for the prediction of the risk of fall on the test/validation set. We chose a deep learning (CNN) classifier over other machine learning solutions because it can extract features by itself. The TUG and TUG+ methods rely on features we selected ourselves (TUG time and variability indices), so a deep learning approach was preferred as a complementary method.
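Because the test set is defined in terms of participants, the split is best done at the participant level, before windowing, so that no participant contributes windows to both sets. A minimal sketch, assuming integer participant identifiers and a fixed seed for reproducibility (both hypothetical):

```python
import random


def split_participants(faller_ids, nonfaller_ids, n_test_per_class=8, seed=0):
    """Randomly draw n_test_per_class fallers and nonfallers for the test set.

    Returns (train_ids, test_ids), both sorted; windows should be generated
    afterwards, separately for each set.
    """
    rng = random.Random(seed)
    test = set(rng.sample(sorted(faller_ids), n_test_per_class))
    test |= set(rng.sample(sorted(nonfaller_ids), n_test_per_class))
    train = (set(faller_ids) | set(nonfaller_ids)) - test
    return sorted(train), sorted(test)
```

Fixing the seed makes the split reproducible; a different seed gives a different balanced test set of the same size.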

Sigmaplot (v. 11.0) and R (v. 3.5.0) were used to perform the statistical calculations, including the R package pROC for the ROC analysis. The AI algorithm was implemented with the standard Keras library on top of TensorFlow 2.0.

Variability Indices
The variability indices (SD and D) computed from the six time series recorded during the six-minute test are shown in Table 2, together with the p-values of the comparison of medians. Two parameters differ significantly between fallers and nonfallers: the standard deviation of the anteroposterior acceleration, SDa_ap, is significantly larger for fallers, and the fractal dimension of the vertical acceleration, Da_v, is significantly smaller. These two parameters were therefore selected for the TUG+ test. Note that the fractal dimensions are generally smaller for fallers than for nonfallers.

TUG and TUG+ Tests
Fallers performed the TUG test significantly more slowly than nonfallers. The ROC curve and the confusion matrix for the TUG test are shown in Figures 5 and 6, and the related parameters are given in Table 3. Youden's index yields an optimal TUG threshold of 22.5 s: in our population, an individual whose TUG time exceeds this threshold is classified as presenting a risk of fall.
The 22.5 s threshold is kept in the TUG+ test. The SDa_ap and Da_v thresholds were adjusted to maximize the accuracy of the TUG+ test; they are shown in Figure 5. The TUG+ test is more accurate than the TUG test, and a McNemar test performed on the TUG and TUG+ classifications confirms that the two tests differ significantly (p = 0.013). A detailed comparison of both tests is given in Table 3. The confusion matrix for the TUG+ test is also shown in Figure 6.
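The McNemar comparison of two paired classifications depends only on the participants on whom the two tests disagree. The sketch below uses the continuity-corrected chi-square statistic, which is also the default form of R's mcnemar.test. It computes the p-value from the chi-square distribution with one degree of freedom, written with erfc. The function name is hypothetical.

```python
import math


def mcnemar_test(pred_a, pred_b):
    """McNemar test for two paired binary classifications of the same participants.

    Returns (chi2, p). b counts participants flagged by test A only and
    c those flagged by test B only; concordant pairs do not contribute.
    """
    b = sum(1 for x, y in zip(pred_a, pred_b) if x and not y)
    c = sum(1 for x, y in zip(pred_a, pred_b) if y and not x)
    if b + c == 0:
        return 0.0, 1.0                       # the two tests never disagree
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)  # continuity correction
    # Chi-square with 1 df: P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

For example, 10 participants flagged by the first test only and 2 by the second only give chi2 = 49/12 ≈ 4.08 and p ≈ 0.043.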
We deliberately designed the TUG and TUG+ tests to be simple, as clinical tests usually are. For example, the TUG+ test can be seen as a checklist with three questions: Is the patient's TUG time above its threshold? Is SDa_ap above its threshold? Is Da_v below its threshold? A "diagnostic" of risk/no risk of fall can be made from these three answers. Both tests could have been designed in a more complex way by using logistic regressions; this approach is presented in Appendix A. Both versions of the TUG+ test perform equivalently.
Table 3. Characterization of the TUG and TUG+ tests. The following parameters are displayed: sensitivity (Se), specificity (Sp), positive (LR+) and negative (LR−) likelihood ratios, positive (PPV) and negative (NPV) predictive values and accuracy (Acc). Improvements of the TUG+ and AI methods with respect to the TUG test are shown in bold.
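The checklist can be written as a small decision function. Figure 3 is not reproduced here, so the way the three answers are combined below is an assumption. It follows the two-step decision tree described in the methods: a participant flagged by the TUG time remains at risk only if at least one kinematic index confirms the flag. The SDa_ap and Da_v thresholds are left as parameters because their values are given only graphically. The function name is hypothetical.

```python
TUG_THRESHOLD_S = 22.5  # Youden threshold reported in the text


def tug_plus_at_risk(tug_s, sd_a_ap, d_a_v, sd_threshold, d_threshold,
                     tug_threshold_s=TUG_THRESHOLD_S):
    """One reading of the TUG+ checklist (hypothetical combination rule).

    Step 1: the TUG time is above its threshold.
    Step 2: the flag is confirmed if SDa_ap is above its threshold
            or Da_v is below its threshold.
    """
    flagged_by_tug = tug_s > tug_threshold_s
    confirmed = sd_a_ap > sd_threshold or d_a_v < d_threshold
    return flagged_by_tug and confirmed
```

Under this reading, TUG+ can only remove positives flagged by the TUG test. If the combination in Figure 3 differs, only the two boolean lines need to change.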

AI Classification
Models were trained and tested for the following window sizes: 1, 2, 5, 10, 20 and 60 s. Twenty-second windows showed the best convergence rate, as shown in Figure 7. We then performed a random-search hyperparameter study to optimize the accuracy of our AI algorithm [23]. The convergence rate may seem low, but this results from the small size of our dataset (by deep learning standards) and from the wide region of the parameter space we explored.
Figure 7. Convergence rate of the AI algorithm versus window size. The convergence rate is the ratio between the number of models that reached a precision of at least 65% on the test data and the total number of models trained for a given window size.
We kept the solution presented in Figure 8, which leads to the confusion matrix displayed in Figure 6D. It showed maximal accuracy while being equally specific and sensitive (Sp = Se). One solution reached a higher accuracy (Acc = 81%), but at the expense of sensitivity (Se = 62.5%), and was therefore not kept.
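These figures can be checked with the usual confusion-matrix formulas, which are those listed for Table 3. Assume the rates are computed per participant on the 16-participant test set (8 fallers, 8 nonfallers). Then Se = Sp = 75% corresponds to 6 correctly classified participants in each group. Acc = 81% with Se = 62.5% corresponds to 5/8 fallers and 8/8 nonfallers (13/16 = 81.25%), i.e., Sp = 100%. A minimal sketch with a hypothetical function name:

```python
def binary_metrics(tp, fn, tn, fp):
    """Table 3-style metrics from a 2x2 confusion matrix (faller = positive)."""
    se = tp / (tp + fn)                       # sensitivity
    sp = tn / (tn + fp)                       # specificity
    return {
        "Se": se,
        "Sp": sp,
        "LR+": se / (1 - sp) if sp < 1 else float("inf"),
        "LR-": (1 - se) / sp if sp > 0 else float("inf"),
        "PPV": tp / (tp + fp) if tp + fp else float("nan"),
        "NPV": tn / (tn + fn) if tn + fn else float("nan"),
        "Acc": (tp + tn) / (tp + fn + tn + fp),
    }
```

For the retained model, binary_metrics(tp=6, fn=2, tn=6, fp=2) gives Se = Sp = Acc = 0.75 and LR+ = 3.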

Discussion
Our study aimed at designing tests to assess the risk of fall in nursing home residents. The predictions of these tests were compared with the faller/nonfaller status of the participants. The originality of the present work lies in the inclusion of two examination times separated by six months. The nursing home's fall records over that period were used to classify participants as fallers or nonfallers.
We first confirm that the TUG test alone can predict falls despite its simplicity. We do not observe sensitivity and specificity as high as in [24] (Sp and Se of 87%), which could be explained by the fact that their sample included only 30 participants. Furthermore, we observe a difference of 3.9 s between fallers and nonfallers, which is consistent with that of [25] (difference of 3.59 s). The threshold defining the risk of falls varies between 13 and 32.6 s across the studies quoted by [26]; our value of 22.5 s is intermediate. This latter study shows that the TUG test separates fallers from nonfallers better among nursing home residents than among home-based elderly people. This conclusion is shared by [26].
Kinematic analysis of the six-minute walking test using the DYSKIMOT sensor reveals that this low-cost wearable device is able to measure significant differences between fallers and nonfallers. This is the first time our in-house system has been applied to a geriatric population. The sensor we used is not yet wireless. Placing it on the lower limb (e.g., in one shoe) would have been relevant, since instrumented socks are known to identify gait events [27,28], but the wire was uncomfortable for participants. This is consistent with the findings of [29] showing that patients mostly favor devices placed on the upper part of the body. The most comfortable solution we found was placement in the lumbar region. This placement also puts the sensor near the participant's center of mass, which is crucial for stability and balance.
The fluctuations of the anteroposterior acceleration have a significantly larger magnitude in fallers. This is consistent with the findings of [30] showing that elderly people who have already fallen have longer deceleration periods during a walking cycle than healthy young people. This behavior aims at reducing the swing phase to shorten unstable periods during the walking cycle. Fallers also present smaller fractal dimensions. This observation can be related to the optimal complexity framework of [31], whose authors argued that physiological signals recorded in a healthy individual have a maximal complexity (e.g., a high fractal dimension or entropy), a loss of complexity being associated with aging or disease. For instance, a lower complexity in the walking pattern of healthy aged participants was observed in [32]. In our population, fallers are less able to perform quick modifications of their walking pattern and show a less complex behavior than nonfallers. Note that the six-minute walking test was interrupted by turnarounds because nursing home corridors are typically 25 m long. Although a 30 m course is preferred for a point-to-point track, the minimum recommended length is 15 m [33]. Moreover, turnarounds are complex motor tasks that are representative of the daily activities of nursing home residents. The kinematic information they contain could therefore be isolated from the straight-walking time series for a separate analysis.
According to the review [34], techniques combining sensors and clinical tests are encouraging, but the protocols used have yet to be standardized (sensor position, choice of clinical tests, data analysis). Consequently, reported combinations of clinical tests and kinematic parameters reach accuracies ranging from 47.9% to 100%, depending on the sample [34]. The choice of the TUG test appears adequate in view of its widespread use in geriatric rehabilitation and its simplicity. Combined with kinematic data, this test provides important information for predicting the risk of fall. The lumbar position of the inertial sensor is relevant because it is close to the body's center of mass; studies such as [35] show that this position provides the best information for differentiating fallers from nonfallers. Furthermore, our choice of a six-minute walking test with an inertial sensor provides long-term information about an individual's gait, which is not available with the TUG test alone. Our study shows that combining the TUG test with kinematic parameters such as SDa_ap and Da_v, collected during the six-minute walking test, improves the accuracy of fall prediction.
We report here, for the first time, an increase in accuracy of TUG+ over TUG in predicting falls in elderly nursing home residents. The novelty of TUG+ is that it combines stopwatch-measured TUG durations with gait variability indicators measured by the DYSKIMOT sensor during a six-minute walking test. Previous studies followed a similar approach. The study [36] attempted to improve the TUG test by increasing the walking distance and by timing each phase of the movement (rising from the chair, walking, turning around) in order to gain more information during the test. Others chose to complement the TUG test with additional sensors to improve its effectiveness [37,38]. In [38], the TUG test is specifically coupled with inertial sensors and reaches an accuracy of 88%. In another survey, the same authors combined a questionnaire-based clinical evaluation with kinematic data measured by an inertial sensor during the TUG test [37]. They obtained accuracies of 68% for the clinical evaluation alone, 73% for the inertial sensors alone and 76% for the combined evaluation. These results are similar to ours: we find that the TUG+ test predicts the risk of falls in the elderly better than the TUG test alone, raising its accuracy by 8.2 percentage points (Table 3). Using the TUG+ test, about 74% of the individuals in our sample were correctly categorized. To our knowledge, this is the first time that TUG and six-minute walking test results have been merged in this way.
Finally, we built a complementary approach: AI analysis of the kinematic time series of the six-minute walking test to predict the risk of fall. AI techniques are nowadays able to detect falls in real time [39][40][41]. The particularity of our first attempt is that it focuses on a six-month prediction rather than on real-time detection. The obvious weakness of our AI classification, based on a convolutional neural network, is the size of our dataset. Still, it demonstrates that the risk of fall can be predicted from the kinematic data of an elderly person walking for six minutes, with an accuracy, specificity and sensitivity of 75%.
The TUG+ test is an interesting solution for nursing homes because patients generally favor systems that do not replace a therapist [42]. Since our thresholds were fitted on the full population, we can only conclude that a simple augmented clinical test is able to assess the risk of fall within our sample. The next step in this research is to apply the tests built in the present study to a different sample of nursing home residents in order to fully assess their predictive power. We hope to present such results in future work.
AI classification of the risk of fall, combined with a wearable sensor, gives hope that relevant tools for monitoring the risk of fall of home-based elderly people will become available in the near future. The MARG sensor we used is small (9 cm²) and light (10.44 g) enough to imagine several sensors attached to a wearable shirt, as proposed in [16,43]. These works show that an undershirt equipped with 11 sensors can recognize several complex manual material handling tasks and basic postures (sitting, standing and lying down), as well as walking and running. In view of these promising results, increasing the number of sensors in our system may improve the accuracy of the AI in assessing the risk of fall. We leave this for future work.

Acknowledgments:
The authors thank the following Belgian nursing homes for their support to the project: l'Adret (Gosselies), le Centenaire (Châtelet), le home Notre-Dame De Bonne Espérance (Châtelet) and Au Temps des Cerises (Châtelet). The authors also thank CeREF's technical department for having allowed the use of DYSKIMOT sensor.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A TUG and TUG+ Tests with Logistic Regressions
Instead of computing the threshold for the TUG time from the ROC curve, one can perform a logistic regression of $\mathrm{LOGIT} = \ln\frac{P}{1-P}$ versus the TUG time, P being the probability of being a faller (the R package caret was used). This test, called TUG (Logit), is defined by $\mathrm{LOGIT} = -1.71 + 0.0435\,\mathrm{TUG}\,(\mathrm{s})$ and the threshold $P^{*} = 0.41$; its confusion matrix is presented in Figure A1 and its parameters are given in Table A1. The threshold was fitted to reach optimal accuracy.
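Because LOGIT increases with the TUG time, the rule P ≥ P* (assumed here to mean "at risk") reduces to a single cut-off on the TUG time. The following worked computation, derived only from the coefficients above, recovers that equivalent cut-off:

```latex
P = \frac{1}{1 + e^{-\mathrm{LOGIT}}}, \qquad
\mathrm{LOGIT}^{*} = \ln\frac{P^{*}}{1 - P^{*}} = \ln\frac{0.41}{0.59} \approx -0.364,
\\[4pt]
\mathrm{TUG}^{*} = \frac{\mathrm{LOGIT}^{*} + 1.71}{0.0435} \approx \frac{1.346}{0.0435} \approx 30.9\ \mathrm{s}.
```

Under this reading, TUG (Logit) flags participants above roughly 30.9 s, well above the 22.5 s Youden threshold of the TUG test. This is consistent with the two tests classifying participants differently.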

The TUG+ test can also be reformulated using a logistic regression of LOGIT versus the TUG time, SDa_ap and Da_v. This test, called TUG+ (Logit), is defined by $\mathrm{LOGIT} = -3.58 + 0.0481\,\mathrm{TUG}\,(\mathrm{s}) + 7.29\,\mathrm{SDa}_{ap}\,(\mathrm{m\,s^{-2}}) + 0.577\,\mathrm{Da}_{v}$. Using the same threshold gives the results presented in Figure A1 and Table A1.
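The TUG+ (Logit) rule can be applied directly from the reported coefficients. A minimal sketch, assuming SDa_ap in m/s² as stated above and the same threshold P* = 0.41, read as "P ≥ P* means at risk". The function and constant names are hypothetical.

```python
import math

# Coefficients of the TUG+ (Logit) model as reported in the text.
INTERCEPT = -3.58
COEF_TUG_S = 0.0481      # per second of TUG time
COEF_SD_A_AP = 7.29      # per m/s^2 of SDa_ap
COEF_D_A_V = 0.577       # per unit of Da_v (dimensionless)
P_THRESHOLD = 0.41       # same threshold P* as for TUG (Logit)


def tug_plus_logit_probability(tug_s, sd_a_ap_ms2, d_a_v):
    """Probability of being a faller from the TUG+ (Logit) model."""
    logit = (INTERCEPT + COEF_TUG_S * tug_s
             + COEF_SD_A_AP * sd_a_ap_ms2 + COEF_D_A_V * d_a_v)
    return 1.0 / (1.0 + math.exp(-logit))   # inverse of LOGIT = ln(P / (1 - P))


def tug_plus_logit_at_risk(tug_s, sd_a_ap_ms2, d_a_v, p_threshold=P_THRESHOLD):
    """Classify as at risk when the predicted probability reaches the threshold."""
    return tug_plus_logit_probability(tug_s, sd_a_ap_ms2, d_a_v) >= p_threshold
```

All other inputs being equal, the predicted probability increases with the TUG time, as expected from its positive coefficient.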
The classifications of the TUG (Logit) test differ significantly from those of the TUG test (McNemar test, p < 0.01), and the TUG test is more accurate than the TUG (Logit) test. In contrast, the TUG+ (Logit) and TUG+ tests perform very similarly (McNemar test, p = 0.149).