Reliability of IMU-Derived Static Balance Parameters in Neurological Diseases

Static balance is a commonly used health measure in clinical practice. Usually, static balance parameters are assessed via force plates or, more recently, with inertial measurement units (IMUs). Multiple parameters have been developed over the years to compare patient groups and understand changes over time. However, the day-to-day variability of these parameters using IMUs has not yet been tested in a neurogeriatric cohort. The aim of the study was to examine day-to-day variability of static balance parameters of five experimental conditions in a cohort of neurogeriatric patients using data extracted from a lower back-worn IMU. A group of 41 neurogeriatric participants (age: 78 ± 5 years) underwent static balance assessment on two occasions 12–24 h apart. Participants performed a side-by-side stance, a semi-tandem stance, a tandem stance on hard ground with eyes open, and a semi-tandem assessment on a soft surface with eyes open and closed for 30 s each. The intra-class correlation coefficient (two-way random, average of the k raters’ measurements, ICC(2,k)) and minimal detectable change at a 95% confidence level (MDC95%) were calculated for the sway area, velocity, acceleration, jerk, and frequency. Velocity, acceleration, and jerk were calculated in both anterior-posterior (AP) and medio-lateral (ML) directions. Nine to 41 participants could successfully perform the respective balance tasks. Considering all conditions, acceleration-related parameters in the AP and ML directions gave the highest ICC results. The MDC95% values for all parameters ranged from 39% to 220%, with frequency being the most consistent with values of 39–57%, followed by acceleration in the ML (43–55%) and AP direction (54–77%). The present results show moderate to poor ICC and MDC values for IMU-based static balance assessment in neurogeriatric patients. This suggests a limited reliability of these tasks and parameters, which should induce a careful selection of potential clinically relevant parameters.


Introduction
Maintaining balance seems to be a simple task, but it is an essential prerequisite to stand, walk, and engage in everyday activities [1,2]. The underlying processes to sustain balance are based on the interconnection of the vestibular, visual, and somatosensory systems in the central nervous system [3]. Unconsciously, humans stabilize their gaze and perceive their body position in order to maintain balance or walk [4]. Alterations of this ability serve as risk factors for disabilities or a worsening of health status, and can lead to reduced quality of life, particularly when falls occur [5]. Patients suffering from neurological diseases [3] or orthopedic problems [6,7] often show impaired static balance, which can also lead to gait alterations, as observed in patients with multiple sclerosis or Parkinson's disease [8-10].
Static balance analyses are well established in the clinical environment, but often require a dedicated laboratory or expensive equipment [11]. The precision of such equipment (e.g., 3D motion capture systems and force plates) has, however, been tested extensively, yielding very good reliability of the outcome measures [12,13].
An alternative to fixed systems is the application of wearable health technology, such as inertial measurement units (IMUs) [14,15]. IMUs commonly consist of accelerometers, gyroscopes, and magnetometers, and provide the opportunity to assess gait and balance directly on the clinical ward, without the need for a dedicated laboratory [16]. Such supervised assessments can be used to monitor improvements over time, e.g., during hospitalization, or serve as digital clinical endpoints to quantify the success or failure of an intervention [17].
IMUs generate time series of raw data that are then digitally processed to extract spatiotemporal parameters [18,19]. Their validity has been evaluated abundantly [20], and the clinical evaluation in a large sample of neurological inpatients has been shown to be very successful [15]. Although these are promising results, assessment of day-to-day variability, reliability, and minimal detectable change (MDC) of IMU-based static balance measures is lacking.
Therefore, the aim of this study is to assess the day-to-day variability and use the results to provide hypothetical reliability and MDC measures of standardized balance assessment in a cohort of neurogeriatric patients, by using different tasks and extracting the most relevant parameters.

Materials and Methods
A convenience sample of 41 inpatients was chosen (20 men (age: 78 ± 4 years, BMI = 25.9 kg/m²) and 21 women (age: 79 ± 4 years, BMI = 25.1 kg/m²)). Study participants were referred to the neurogeriatric or neurological wards of the Neurology Department at the University Hospital Schleswig-Holstein, Campus Kiel between 09/2017 and 12/2019 (the study protocol is reported in [16]).
Inclusion criteria were the presence of a neurological disorder, age of at least 60 years, ability to stand alone for at least ten seconds, and the ability to walk three meters without personal assistance (walking aids allowed) [16]. Exclusion criteria were current or past chronic substance abuse (except nicotine), corrected visual acuity < 60%, >2 falls in the previous week (risk of falling too high during the examination), ≤5 points in the Montreal Cognitive Assessment (MoCA) test [17,21], and not being able to perform at least one of the balance tasks. The ethics committee of the medical faculty of the University of Kiel approved the study (No. D427/17), and all participants gave written informed consent prior to participation. The three most common diagnoses (60% of all investigated diagnoses) were stroke (n = 16), Parkinson's disease (n = 6), and back pain (n = 3).

Quantitative Gait and Balance Assessment
To determine the day-to-day reliability and MDC, two standardized IMU-based balance assessments were performed within 12-24 h. For each of them, participants were equipped with a wearable IMU system (Rehawatch®, Hasomed, Magdeburg, Germany) consisting of three IMUs worn at both ankles and at the lower back (L4-L5). Each IMU contains a 3D accelerometer (±8 g), a 3D gyroscope (±2000 °/s), and a magnetometer (±1.3 Gs), resulting in nine degrees of freedom. The assessment included the following tasks: side-by-side stance, semi-tandem stance, tandem stance on hard ground with eyes open, and semi-tandem assessment on a soft surface with eyes open and closed for 30 s each (Figure 1). If the tasks on the hard surface were successfully completed, then the participants were asked to stand for 30 s on a soft pad (Airex balance pad, 50 × 41 × 6 cm) in a semi-tandem position with eyes open and eyes closed.

Sensor Data Processing
The IMU data were processed by custom-written scripts using MATLAB (MathWorks, Natick, MA) based on Mancini et al. [19]. The collected parameters provided information about the sway area (surface) (cm²/s⁴), velocity (vel) (cm/s), acceleration (acc) (cm/s²), jerk (cm/s³), and frequency (Hz). Velocity, acceleration, and jerk were expressed as the root mean square value and computed in both anterior-posterior (AP) and medio-lateral (ML) directions, as there is evidence that differences in both directions can represent different pathologies or compensation strategies of the body [19,22].
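As an illustration of how root mean square values of this kind can be derived from a single acceleration axis (AP or ML), the following Python sketch may be helpful. It is not the custom MATLAB code used in this study: the function names, the sampling rate fs, the mean removal, and the use of finite differences and trapezoidal integration are all assumptions made for illustration, and the input signal is synthetic.

```python
import math

def rms(values):
    """Root mean square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def remove_mean(values):
    """Subtract the mean so that a constant offset does not inflate the RMS."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def differentiate(values, fs):
    """First-order finite difference, scaled by the sampling rate fs (Hz)."""
    return [(b - a) * fs for a, b in zip(values, values[1:])]

def integrate(values, fs):
    """Cumulative trapezoidal integration at sampling rate fs (Hz)."""
    out = [0.0]
    for a, b in zip(values, values[1:]):
        out.append(out[-1] + 0.5 * (a + b) / fs)
    return out

def sway_rms_parameters(acc, fs):
    """RMS acceleration, velocity, and jerk for one axis; acc is in cm/s^2."""
    acc0 = remove_mean(acc)
    vel = remove_mean(integrate(acc0, fs))   # cm/s
    jerk = differentiate(acc0, fs)           # cm/s^3
    return {"acc": rms(acc0), "vel": rms(vel), "jerk": rms(jerk)}

# Synthetic 30 s trial: 1 Hz sinusoidal sway with 10 cm/s^2 amplitude.
fs = 100.0  # hypothetical sampling rate in Hz, not taken from the study
acc_ml = [10.0 * math.sin(2.0 * math.pi * i / fs) for i in range(3000)]
params = sway_rms_parameters(acc_ml, fs)
print(params)
```

For a pure sinusoid, the RMS equals the amplitude divided by √2, which gives a quick plausibility check for each of the three values. Real recordings would additionally require filtering and alignment of the sensor axes with the anatomical directions, which this sketch omits.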

Statistical Analysis
The mean (M) and standard deviation (SD) values from the two measurements were calculated for the respective parameters. A paired-sample t-test was then performed to compare potential differences between the two measurements. The relative reliability (rR) was expressed by the intra-class correlation coefficient (ICC), two-way random, average of the k raters' measurements, ICC(2,k), given by the formula:
$$\mathrm{ICC}(2,k) = \frac{\mathrm{BMS} - \mathrm{EMS}}{\mathrm{BMS} + (\mathrm{JMS} - \mathrm{EMS})/n}$$
Here, the between-target mean square (BMS), residual mean square (EMS), within participants mean square (JMS), and number of participants (n) were included in the analysis [23,24]. The ICC is used to evaluate both systematic and random errors that could affect the rR of the exercises. An ICC of >0.9 indicates excellent, >0.75-0.9 good, >0.5-0.75 moderate, and ≤0.5 poor reliability [25,26].
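The ICC(2,k) can be obtained from the mean squares of a two-way analysis of variance as (BMS − EMS) / (BMS + (JMS − EMS) / n). The Python sketch below shows one way to compute it for a table with one row per participant and one column per assessment. It is an illustration only, not the analysis code of this study: the function names and the toy scores are hypothetical, and JMS is computed here as the between-assessment mean square of the standard two-way decomposition, which is an assumption about how the term is defined.

```python
def icc_2k(scores):
    """ICC(2,k) for scores[i][j]: participant i, assessment j."""
    n = len(scores)          # number of participants
    k = len(scores[0])       # number of assessments (here: days)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_sessions = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_between - ss_sessions

    bms = ss_between / (n - 1)               # between-target mean square
    jms = ss_sessions / (k - 1)              # between-assessment mean square (assumed)
    ems = ss_error / ((n - 1) * (k - 1))     # residual mean square
    return (bms - ems) / (bms + (jms - ems) / n)

def classify_icc(icc):
    """Reliability category using the thresholds stated in the text."""
    if icc > 0.9:
        return "excellent"
    if icc > 0.75:
        return "good"
    if icc > 0.5:
        return "moderate"
    return "poor"

# Hypothetical scores of four participants on day 1 and day 2.
toy_scores = [[10, 12], [14, 13], [20, 22], [16, 17]]
icc = icc_2k(toy_scores)
print(round(icc, 3), classify_icc(icc))
```

For the toy table, the mean squares are BMS = 37, JMS = 2, and EMS = 1, so the ICC is 36 / 37.25. The high value reflects that the made-up participants differ far more from each other than from day to day, which is the opposite of the pattern reported in this study.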
Absolute reliability (aR) describes the participant-internal variability that is due to repeated measurements or, in other words, describes the smallest amount of change required to designate a change as real and beyond the bounds of measurement error [26]. MDC values are important to interpret change scores, as they allow the evaluation of the effectiveness of therapeutic interventions [27].
MDC values are calculated based on the standard error of measurement (SEM). The SEM is calculated as follows:
$$\mathrm{SEM} = \mathrm{SD} \times \sqrt{1 - \mathrm{ICC}}$$
The MDC, which represents the minimum detectable change in two repeated measurements that is not due to random variation in the measurements, is calculated for a 95% confidence interval as:
$$\mathrm{MDC}_{95} = \mathrm{SEM} \times 1.96 \times \sqrt{2}$$
Here, 1.96 is the z-value of the normal distribution for a two-sided 95% confidence interval, and √2 is used to account for the variance of the two measurements.
The MDC95 is also expressed as a percentage, and defined as:
$$\mathrm{MDC}_{95}\% = \frac{\mathrm{MDC}_{95}}{\mathrm{mean}} \times 100$$
Here, mean is the mean value of the respective parameters for all measurements of the two assessments. MDC95% represents the minimum detectable change presented as a percentage that is not due to random variations of the measurements [25].
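The chain from ICC to MDC95% can be sketched in a few lines of Python, assuming the commonly used definitions SEM = SD × √(1 − ICC), MDC95 = SEM × 1.96 × √2, and MDC95% = MDC95 / mean × 100. The function names and the input values are hypothetical and chosen only to make the arithmetic easy to follow; which SD enters the SEM (for example, the SD over all measurements of both assessments) is an assumption that should be checked against the analysis at hand.

```python
import math

def sem(sd, icc):
    """Standard error of measurement from the SD and the ICC."""
    return sd * math.sqrt(1.0 - icc)

def mdc95(sem_value):
    """Minimal detectable change at the 95% confidence level."""
    return sem_value * 1.96 * math.sqrt(2.0)

def mdc95_percent(mdc_value, mean_value):
    """MDC95 expressed as a percentage of the mean of all measurements."""
    return mdc_value / mean_value * 100.0

# Hypothetical parameter: mean 10, SD 2 (same unit), ICC 0.75.
sem_value = sem(2.0, 0.75)
mdc_value = mdc95(sem_value)
mdc_pct = mdc95_percent(mdc_value, 10.0)
print(sem_value, round(mdc_value, 2), round(mdc_pct, 1))
```

In this made-up case, the SEM is half of the SD, and a change would have to exceed roughly 28% of the mean to be distinguishable from measurement error. Lower ICC values or a larger SD relative to the mean push the MDC95% up quickly, which is the mechanism behind values above 100%.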

Results
Forty-one participants completed the side-by-side stance, 39 the semi-tandem stance, and 21 the tandem stance task on a hard surface. Nineteen participants completed the semi-tandem stance on a soft surface with eyes open, and nine participants the semi-tandem stance on a soft surface with eyes closed.
Table 1 shows that the rR was in a poor range for all parameters (ICC < 0.5). The best ICC values were reached by acc AP (0.36), vel AP (0.35), and frequency (0.29). The worst value was reached by jerk ML (0.02). The ICC values were slightly better in the AP direction than in the ML direction. MDC95% was best for the parameters acc ML = 43%, frequency = 48%, and acc AP = 73%. The values were slightly better in the ML direction than in the AP direction.
Table 2 shows the measurement results of the semi-tandem stance on hard ground. The rRs were in a moderate range for the parameters acc AP (ICC = 0.52) and surface (ICC = 0.50). All other parameters showed poor ICC values. ICC values were slightly better in the AP direction than in the ML direction. The best MDC95% values were reached by frequency (43%), acc ML (51%), and acc AP (54%). The worst value was reached by jerk ML (213%). The values of the parameters area, vel AP, vel ML, jerk AP, and jerk ML were between 87% and 162%. Better values for acceleration and velocity were found in the ML direction than in the AP direction, and vice versa for the jerk.
Table 3 shows the results of the tandem stance; rR was in a poor range for all parameters (ICC < 0.5). The best values were reached by acc AP (0.48) and frequency (0.35). Jerk ML (ICC = 0.01) reached the worst value. ICC values were slightly better in the AP direction than in the ML direction. The best aR was found for frequency (MDC95% = 57%) and acc AP (95%). Jerk AP (220%) reached the worst value. The other parameters were between 101% and 190%. Values for the parameters velocity and jerk were better in the ML direction than in the AP direction, and vice versa for acceleration.
Table 4 shows the measurement results of the semi-tandem stance on soft ground with open eyes. Acc ML showed a good rR, with an ICC of 0.75. The parameters area, vel AP, vel ML, and acc AP had a moderate ICC (0.5-0.75). The poorest rR was reached by jerk AP (0.03). The best MDC95% values were reached by acc ML (53%) and frequency (55%). Jerk AP had the worst value (202%). The other parameters were between 77% and 200%. Better results were achieved in the ML direction than in the AP direction.
Table 5 shows the measurement results of the semi-tandem stance on soft ground with eyes closed. The best rR was reached by acc ML (ICC = 0.65), which reflected moderate reliability. Acc AP, vel ML, surface, and frequency also reached moderate ICC values (0.50-0.60). The worst ICC values were reached by jerk ML, jerk AP, and vel AP.

Semi-Tandem Stance on a Soft Surface (Eyes Closed)
The best MDC95% values were obtained by the parameters frequency (39%) and acc ML (55%). The worst value was jerk AP (278%). The other values were between 66% and 197%. ML direction values were better than those in the AP direction.

MDC95% Values of All Parameters and Experimental Conditions
The MDC95% values of all exercises and associated parameters are shown in Figure 2. The parameter frequency showed good MDC95% values (39-57%) for all exercises, followed by the parameters acc ML and acc AP , with MDC95% values of 43% to 77%. By far the worst MDC95% values were determined by the parameter jerk, with values between 162% and 278%. The parameters vel ML , vel AP , and surface were always in an MDC95% range of >100%, except for vel ML in the semi-tandem stance on soft ground with open eyes, with an MDC95% = 80%, and the surface parameter in the exercise semi-tandem stance on hard ground with an MDC95% = 87%.

Discussion
In this study, IMUs were used to evaluate the day-to-day variability of static balance parameters in neurogeriatric patients. Two assessments containing five experimental conditions were performed within 24 h to evaluate the reliability using ICC [22] and MDC [28] for eight balance parameters.
The highest ICC values (between 0.5 and 0.75, which reflects moderate reliability) were obtained during the semi-tandem stance on a soft surface with eyes open and eyes closed. The highest ICC values were obtained with the parameters acc ML (ICC = 0.75) and area (ICC = 0.66). Our results are in line with previous findings showing MDC95% values of 113% in older adults during single-leg stance time, which is also in a poor reliability range [29]. Our findings are in contrast with results presenting the evaluation of very coarse balance scales, such as the Berg Balance Scale in Parkinson's disease patients, showing MDC95% values of 10-13% and ICC values of 0.87-0.95 [30,31].
When comparing cohorts using the above parameters [20,32], our results need to be interpreted with caution. The questionable reliability and high MDC values may seem surprising. However, a direct comparison with previous studies is difficult due to different cohorts and experimental protocols. The data reported by [19] contain 13 subjects with early untreated Parkinson's disease and 12 age-matched control subjects. Their ICC values for the control group ranged between 0.60 and 0.89, and for the PD group between 0.55 and 0.86. However, the participants performed the experiment twice within 30 min, thereby reducing variability. In contrast, in our setup, measurements were repeated on the following day, thereby capturing day-to-day variability of function, which is known to be high in neurogeriatric patients. We would argue that this captures the true performance range of neurogeriatric patients more accurately than directly repeated tasks. In addition, when calculating the MDC95% using the reported ICC, SEM, and mean values from their experiment, the values also range from 14-32% for healthy controls and 20-160% for patients with Parkinson's disease, confirming the large random variations observed in our experiments. Similarly, data from [33] contain 21 healthy subjects and 17 patients with diabetic peripheral neuropathies, and the reported ICCs were excellent (ICC(1, 1) = 0.76, F(41, 40.5) = 0.71, 95% CI = (0.57, 0.86), p < 10⁻⁶); however, the authors calculated the reliability by combining measurements with eyes open and closed from data obtained within the same measurement session.
Two main strategies exist to maintain static balance: the ankle strategy and the hip strategy [34]. Depending on the difficulty of the balance task, these strategies might be used separately or together [35], especially when external perturbations below or at the level of the feet are introduced [36]. While the ankle strategy maintains balance by stabilizing the body about the ankle joint with minimal movement of the upper body, the hip strategy involves the upper body and influences the moment of inertia about the ankle to maintain balance [37]. The ankle strategy is expected to be employed for unperturbed stance, whereas the hip strategy is expected to be employed for perturbations, or when the support surface is soft and little ankle torque can be applied [38]. Consequently, the results for the simple exercises (side-by-side stance and semi-tandem stance on hard ground) were better (therefore more reliable) on average, and for the more difficult exercises (semi-tandem stance on soft ground with open/closed eyes and tandem stance) were worse (less reliable), which could also be related to the choice of strategy.
As a consequence, even though IMUs have already been proven to be a reliable instrument during the timed up and go test [39] and walking [40], and have been used to measure static balance under clinical and out-of-hospital conditions [16,19,20,41,42], not every experimental condition and extracted parameter seems equally suitable for clinically relevant measurements [43].
This study has limitations that may explain the low ICCs and large MDC values. Firstly, there may be some learning effect from the first to the second assessment. However, we do not believe that this substantially influenced the results, as the plots did not show a systematic improvement of parameters during the second assessment compared to the first. Secondly, different performance between day one and day two has already been described in a sit-to-stand test [25] and balance and ambulation tests [31]. This is exactly what studies such as the one presented here aim to measure, and patients with neurogeriatric conditions may be particularly prone to fluctuations of performance between two days. In fact, we know especially from Parkinson's disease that long-term fluctuations can be regularly observed [44]. Finally, group sizes were different across the conditions investigated. This may have additionally influenced the outcome parameters, such as the ICC and the MDC95% [45,46], as larger group sizes could potentially increase the reliability.

Conclusions
Our results support previous studies reporting that static balance can be assessed with IMUs. However, reliability of the extracted parameters remains questionable in a neurogeriatric cohort. The most stable parameters were acceleration (especially in the ML direction) and frequency, which may thus have the highest potential to reflect disease progression and response to treatment. Future reliability studies investigating additional neurogeriatric cohorts, healthy control groups, and other diseased cohorts are urgently needed to fully understand the potential of mobile health technology-derived parameters that are generally considered highly accurate.