Augmenting Clinical Outcome Measures of Gait and Balance with a Single Inertial Sensor in Age-Ranged Healthy Adults

Gait and balance impairments are linked with reduced mobility and increased risk of falling. Wearable sensing technologies, such as inertial measurement units (IMUs), may augment clinical assessments by providing continuous, high-resolution data. This study tested and validated the utility of a single IMU to quantify gait and balance features during routine clinical outcome tests, and evaluated changes in sensor-derived measurements with age, sex, height, and weight. Age-ranged, healthy individuals (N = 49, 20–70 years) wore a lower back IMU during the 10 m walk test (10MWT), Timed Up and Go (TUG), and Berg Balance Scale (BBS). Spatiotemporal gait parameters computed from the sensor data were validated against gold standard measures, demonstrating excellent agreement for stance time, step time, gait velocity, and step count (intraclass correlation (ICC) > 0.90). There was good agreement for swing time (ICC = 0.78) and moderate agreement for step length (ICC = 0.68). A total of 184 features were calculated from the acceleration and angular velocity signals across these tests, 36 of which had significant correlations with age. This approach was also demonstrated for an individual with stroke, providing higher resolution information about balance, gait, and mobility than the clinical test scores alone. Leveraging mobility data from wireless, wearable sensors can help clinicians and patients more objectively pinpoint impairments, track progression, and set personalized goals during and after rehabilitation.


Introduction
Gait and balance play a vital role in functional mobility. From a clinical perspective, these activities of everyday living are essential to maintain independent functional mobility and are determinants of quality of life [1] and risk of falls [2] in elderly and impaired populations (e.g., stroke, multiple sclerosis, cerebral palsy, and Parkinson's disease).
In a clinical setting, the monitoring, treatment, and evaluation of gait and balance deficits currently rely on intermittent use of standardized outcome tests; for example, tests of walking speed, walking distance, balance confidence, and the ability to hold static postures or complete dynamic movements. Many of these tests are scored using a single performance-based measure, whereas others are qualitative and may be subjective in that they rely on patient self-reports and therapist observations [3]. These measures, though effective in maintaining clinical integrity, lack the resolution needed to detect subtle, impairment-based changes occurring during the recovery process. Furthermore, these methods are subject to inter- and intra-observer variability [4], making it difficult to design person-specific, progressive therapeutic strategies to improve deficits. Automated systems, computing clinically relevant measures from real-time data, would enable therapists and physicians to continuously track gait and balance objectively and at a higher resolution than currently possible. Such an approach would likely help clinicians make informed decisions about early interventions, treatment efficacy, and patient recovery progress.
Wearable sensors, such as inertial measurement units (IMUs), are promising tools to augment the current clinical tests of gait and balance. This technology provides continuous, objective, and high-resolution movement data that may better quantify test performance. These sensors are also relatively inexpensive, easy to use, lightweight, and unobtrusive compared with specialized laboratory equipment (e.g., force plates and infrared motion capture systems). It is not surprising, then, that there is an extensive body of work on algorithm development for human motion analysis using IMUs. For instance, multiple systematic reviews summarize studies using inertial sensors to estimate gait kinematics and kinetics [5], gait cycle segmentation by signal processing [6] or machine learning [7], and postural stability metrics [8]. Despite continued interest in wearable technology, there is still a lack of research that validates or implements sensor data in actual clinical practice. An exception is the work by Bergamini et al. [9], in which multiple inertial sensors were used to measure gait stability during a 10 m Walk Test (10MWT) in subacute stroke patients. Importantly, the sensor data were able to discriminate between different levels of walking ability as well as fall risk. In another example, the authors of [10] explored age-related changes of gait and balance using IMUs at multiple locations during a similar, though nonstandardized, Instrumented Stand and Walk test (iSAW). They found a linear deterioration of postural sway and gait with age for some features of sensor data, and alternative patterns for other features (e.g., deterioration after the 6th or 7th decade or no change throughout the age span). The next logical step in this line of investigation is to examine sensor data collected during multiple standardized gait and balance tests, which are common tools used to evaluate aging or impaired individuals.
For a realistic implementation of wearable technology in the clinical setting, limiting the number of sensor devices is critical to day-to-day usability and field deployment. To minimize the temporal, physical, and cognitive burdens on clinicians, it is highly desirable to assess performance with as few devices as possible, particularly for impaired populations. Various algorithms have been developed for a single IMU on the lower back (the approximate location of the center of mass) to quantify gait and balance. To augment measures currently used in the clinical setting, we propose a "clinical meta-feature extraction" (CMFE) process, which we define as a comprehensive combination of algorithms to extract quantitative features of gait and balance. The CMFE process is intended to consolidate previously developed signal processing approaches to robustly quantify gait and balance using IMU signal features during multiple standardized clinical tests.
Using CMFE, we sought to develop a normative dataset of gait and balance features from healthy, age-ranged individuals using a single inertial sensor on the lower back. We selected the BioStampRC sensor (MC10 Inc.; Lexington, MA, USA), which is a flexible, wireless, multimodal, research-grade device. This device was chosen for its low profile and flexible mechanical properties, as well as its ability to collect multiple sensing modalities (e.g., triaxial accelerometer and gyroscope data on a single device). Bilateral shank placement of the BioStampRC was previously validated against an activity monitor for counting steps and temporal measures of the gait cycle [11]. For a single device on the Sensors 2019, 19, 4537 3 of 28 lower back, metrics from the BioStampRC have also been validated during standing balance against force plate data [12]. In the CMFE approach, we combined previous algorithms to extract kinematic movement estimates and descriptive signal characteristics in the time and frequency domains. These algorithms were selected for their relevance to the sensor location at the lower back, the clinical tests of interest, and previously demonstrated accuracy. The feature set included spatiotemporal gait kinematics (e.g., stance time and step length), which rely on accurate detection of gait cycle events such as heel strike and toe-off. Because of the novel combination of algorithms in CMFE for identifying gait cycle events, we also validated the subset of spatiotemporal gait features against gold standard measures.
The objectives of this study are threefold: (1) to validate sensor-derived spatiotemporal gait kinematics (based on the detection of gait cycle events) against gold standard measures to assess accuracy and bias, (2) to implement the CMFE approach to compute sensor-derived features of gait and balance during common clinical outcome measures for age-ranged healthy individuals, and (3) to quantify the effect of age and phenotype characteristics (sex, height, and weight) on these sensor-derived features. This approach lays a foundation to monitor high-resolution gait and balance measures in different impaired populations. As a proof of concept, we also applied CMFE to compute sensor-derived features for a single individual with stroke and compared their outcome data to the healthy group.

Participants
Fifty-one healthy adults participated in the study (N = 51; age range: 20 to 70 years). Two participants were excluded from analysis because of issues with the sensor battery, leaving 49 participants, who formed three age groups (Table 1). These participants had no known musculoskeletal or neurological issues. In addition, one 57-year-old male with a right-side parieto-occipital and cerebellar hemorrhagic stroke participated while undergoing inpatient rehabilitation at the Shirley Ryan AbilityLab (Chicago, IL). The patient was 42 days post-stroke and presented with left-side hemiparesis. He was discharged from the hospital to his home the following day.
All individuals provided written informed consent before participation. The study was approved by the Institutional Review Board of Northwestern University (Chicago, IL, USA) in accordance with federal regulations, university policies, and ethical standards regarding research on human subjects.

Protocol and Data Collection
Participants performed a sequence of three tests based on common clinical outcome measures, in random order:
1. The 10 m walk test (10MWT) of gait speed, with three trials each at a self-selected velocity (SSV) and a fast velocity (FV). Increasing gait speed has been correlated with a higher quality of life [1] and community mobility [13]. The traditional clinical outcome of the 10MWT is average walking speed in the SSV and FV conditions. Participants walked over an instrumented walkway (GAITRite).
2. Static postural stability conditions of the Berg Balance Scale (BBS), including standing unsupported with feet apart (SU), standing with eyes closed (SEC), standing with feet together (SFT), standing in tandem stance (ST) with the nondominant (or paretic) leg behind, and standing on one leg (SOL) on the nondominant (or paretic) leg. This test assesses functional balance and is associated with risk of falling [2]. A trained clinician scores each item on a 5-point ordinal scale, ranging from 0 (lowest function) to 4 (highest function). The traditional clinical outcome of the BBS is the total score.
3. Timed Up and Go (TUG) test of functional mobility, with two trials collected. This test assesses functional mobility and is used to predict the risk of falls [14]. Participants began seated in a chair, rose to a standing position without use of their hands (Sit-to-Stand), walked 3 m (Walk), turned 180° (Turn 1), walked 3 m back to the chair (Walk), turned 180° (Turn 2), and sat down in the chair without use of their hands (Stand-to-Sit). The traditional clinical outcome of the TUG is the total time required to complete the test.
To validate step count estimates, participants also performed four naturalistic walking trials in a circuit at a self-selected velocity. The circuit required approximately 91 m (300 ft) of walking, including straight walking, walking through three open doorways, and turning corners (two right turns, two left turns). Visual step count, recorded by a researcher who walked behind the participant and clicked a tally counter each time the participant's foot struck the ground, served as the gold standard for validating the step count computed from sensor data. The individual with stroke performed a similar, abbreviated circuit, 33.5 m (110 ft) in length, on the inpatient hospital floor (one open doorway, one right turn, and one left turn).

Sensor Technology
Participants wore a skin-mounted IMU (BioStampRC; MC10, Inc., Cambridge, MA, USA; dimensions: 65 × 35 × 3 mm, weight: 7 g) positioned on the fifth lumbar vertebra (L5), approximating the location of the body center of mass (CoM). The sensor was attached to the skin with an overlying layer of transparent adhesive film (Tegaderm; 3M, St. Paul, MN, USA). The BioStampRC collected triaxial acceleration (range ±4 g) and triaxial angular velocity (range ±2000°/s) at 31.25 Hz. Sensor axes were aligned with the local coordinate system of the L5 vertebra (Figure 1). A Samsung Galaxy tablet running the proprietary BioStampRC application was used to collect the sensor data and annotate the beginning and end of each trial/condition during the clinical tests.
De-identified sensor data were uploaded to the MC10 BioStampRC Cloud and then downloaded to a HIPAA-compliant (Health Insurance Portability and Accountability Act of 1996) secure server. Data processing and analysis were implemented in MATLAB 2017a (MathWorks, Natick, MA, USA).

Data Exclusions
Two individuals were excluded from certain clinical tests. One participant was excluded from the 10MWT analysis in the FV condition because of a particularly high walking velocity (2.75 m/s); at this speed, the sensor sampling rate was too low to capture the time and frequency components needed to estimate gait events. Additionally, one participant was excluded from the TUG analysis because of additional signal noise, likely caused by poor sensor adhesion (e.g., from sweat or prolonged wear of the Tegaderm) and the resulting movement artifacts, which made it difficult to identify the phases of the TUG. For the step count validation in the naturalistic walking bouts, three trials from a single participant were excluded because no visual step count was available as the gold standard.

Clinical Meta-Feature Extraction
The CMFE process involved extracting a wide-ranging set of sensor features from the clinical tests as follows. First, the accelerometer signals were transformed to a horizontal-vertical coordinate system to correct for slight variations in sensor placement, so that the triaxial signals corresponded to dynamic accelerations in three true anatomical directions: anteroposterior (AP), mediolateral (ML), and vertical (V). This was done using the approach reported in [15], projecting the raw measured accelerations $a_x$, $a_y$, and $a_z$ onto the anatomical planes and removing the static vertical acceleration due to gravity (1 g). The true AP, ML, and V accelerations were estimated using a provisional vertical acceleration $\hat{a}_V$ and the following set of equations, where all values are normalized to gravity:

$$\hat{a}_V = a_z \sin\theta_z + a_y \cos\theta_z \tag{1a}$$

$$a_{AP} = -a_z \cos\theta_z + a_y \sin\theta_z \tag{1b}$$

Here, $\theta_x$ and $\theta_z$ are the angles between the true horizontal (ML) plane and the IMU-fixed x- and z-axes, respectively, with positive rotation being upwards from the horizontal plane. These angles were computed from the mean acceleration $\bar{a}$ in the corresponding direction, based on the approximations $\sin\theta_x \approx \bar{a}_x$ and $\sin\theta_z \approx \bar{a}_z$ for large $n$ [15]. Finally, all accelerations were converted to m/s². For walking-related tests (10MWT, TUG walking phase, and naturalistic walking bouts for step count), accelerometer signals were filtered using a fourth-order Butterworth low-pass filter at 10 Hz [16] to obtain the preprocessed accelerations $a_{AP}$, $a_{ML}$, and $a_V$.
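The projection in Eqs. (1a) and (1b) can be sketched in Python with NumPy. This is a minimal illustration, not the authors' MATLAB code. It assumes, as the equations imply, that the sensor z-axis lies along AP and the y-axis is vertical. The ML and final vertical steps, which use $\theta_x$ but are not written out here, are filled in with an assumed rotation about the AP axis that follows the same pattern as Eqs. (1a) and (1b); the sign convention of the x-axis is also an assumption. The 10 Hz low-pass filter is omitted.

```python
import numpy as np

G = 9.81  # m/s^2 per g (assumed conversion constant)


def tilt_correct(a_x, a_y, a_z):
    """Project raw lower-back accelerations (in g) onto AP, ML and V axes.

    Eqs. (1a)-(1b) are applied as written. The ML/V step is an assumed
    rotation about the AP axis using theta_x (x-axis sign convention assumed).
    Returns dynamic accelerations in m/s^2 with 1 g removed from V.
    """
    a_x, a_y, a_z = (np.asarray(a, dtype=float) for a in (a_x, a_y, a_z))
    # Tilt angles from mean accelerations: sin(theta) ~ mean(a) over many samples.
    theta_z = np.arcsin(np.clip(a_z.mean(), -1.0, 1.0))
    theta_x = np.arcsin(np.clip(a_x.mean(), -1.0, 1.0))

    a_v_prov = a_z * np.sin(theta_z) + a_y * np.cos(theta_z)  # Eq. (1a)
    a_ap = -a_z * np.cos(theta_z) + a_y * np.sin(theta_z)     # Eq. (1b)

    # Assumed ML/V rotation using theta_x, then gravity (1 g) removed from V.
    a_ml = a_x * np.cos(theta_x) - a_v_prov * np.sin(theta_x)
    a_v = a_x * np.sin(theta_x) + a_v_prov * np.cos(theta_x) - 1.0

    return a_ap * G, a_ml * G, a_v * G


# Stationary sensor tilted 0.2 rad about the ML axis: outputs should be ~0.
n = 100
ap, ml, v = tilt_correct(np.zeros(n), np.full(n, np.cos(0.2)), np.full(n, np.sin(0.2)))
print(np.abs(ap).max(), np.abs(ml).max(), np.abs(v).max())
```

For a stationary but tilted sensor, all three outputs should be close to zero: the tilt is absorbed into the estimated angles, and 1 g is subtracted from the vertical component.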
Gait Event Detection Algorithm: Gait events were detected following the flowchart in Figure 2. Foot contact events were estimated by applying a continuous wavelet transform (CWT) approach to the preprocessed vertical acceleration $a_V$ [17]. This algorithm uses two wavelets, Gaussian and Mexican Hat, to detect initial contact (IC) and end contact (EC), respectively (function cwt in MATLAB).
To determine the scale for each wavelet, a nonlinear frequency-scale relationship was implemented [16]. First, the acceleration signal was integrated and then differentiated by CWT with a Gaussian wavelet (gaus1), and the local minima of the result were identified as IC events. The signal was differentiated again with the Mexican Hat wavelet (gaus2), and the resulting local maxima were identified as EC events. Only peaks with a magnitude greater than 20% of the mean of all peaks were considered for EC detection. Each IC event was assumed to pair with the subsequent EC event, and additional (false) ICs were removed if they occurred less than 0.25 s or more than 2.25 s after the previous IC [18]. Finally, the angular velocity about the vertical axis (yaw) was filtered using a fourth-order low-pass Butterworth filter at 2 Hz and used to assign gait events to the right or left leg.
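The post-processing rules in this paragraph (the 0.25 s / 2.25 s interval rule for false ICs, the 20% peak threshold for EC candidates, and IC/EC pairing) can be sketched in plain Python. The sketch assumes the CWT stage has already produced candidate event times in seconds and EC peak magnitudes. It compares each candidate IC with the last IC that was kept, which is one reading of "the previous IC"; the function names are illustrative.

```python
def clean_initial_contacts(ic_times, min_gap=0.25, max_gap=2.25):
    """Keep an IC only if it falls min_gap..max_gap s after the last kept IC (assumed reading)."""
    kept = []
    for t in sorted(ic_times):
        if not kept or min_gap <= t - kept[-1] <= max_gap:
            kept.append(t)
    return kept


def select_end_contacts(peak_times, peak_magnitudes, fraction=0.2):
    """Keep EC candidate peaks larger than `fraction` of the mean peak magnitude."""
    mean_mag = sum(abs(m) for m in peak_magnitudes) / len(peak_magnitudes)
    return [t for t, m in zip(peak_times, peak_magnitudes) if abs(m) > fraction * mean_mag]


def pair_contacts(ic_times, ec_times):
    """Pair each IC (sorted) with the first EC that follows it (None if none follows)."""
    ecs = sorted(ec_times)
    pairs, j = [], 0
    for ic in ic_times:
        while j < len(ecs) and ecs[j] <= ic:
            j += 1
        pairs.append((ic, ecs[j] if j < len(ecs) else None))
    return pairs


ics = clean_initial_contacts([0.0, 0.1, 0.55, 1.1, 4.0])  # 0.1 too close, 4.0 too far
ecs = select_end_contacts([0.15, 0.3, 0.7, 1.25], [1.0, 0.05, 1.2, 0.9])
print(ics, ecs, pair_contacts(ics, ecs))
```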
Temporal gait parameters for a gait cycle i were estimated as follows [16], where T Stance , T Stride , T Step , and T Swing are the stance time, stride time, step time, and swing time, respectively, and t EC and t IC are the times of end contact and initial contact, respectively.
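The equations for these parameters are not reproduced in this text, so the sketch below assumes a convention that is common for a single lower-back sensor: with `t_ec[i]` the EC that follows `t_ic[i]`, stance(i) = t_EC(i+1) − t_IC(i), stride(i) = t_IC(i+2) − t_IC(i), step(i) = t_IC(i+1) − t_IC(i), and swing(i) = stride(i) − stance(i). It is illustrative only and may differ from the indexing used in [16].

```python
def temporal_parameters(t_ic, t_ec):
    """Per-cycle stance, stride, step and swing times in seconds.

    Assumed convention: t_ec[i] is the EC that follows t_ic[i];
    stance(i) = t_ec[i+1] - t_ic[i], stride(i) = t_ic[i+2] - t_ic[i],
    step(i) = t_ic[i+1] - t_ic[i], swing(i) = stride(i) - stance(i).
    """
    cycles = []
    for i in range(min(len(t_ic) - 2, len(t_ec) - 1)):
        stance = t_ec[i + 1] - t_ic[i]
        stride = t_ic[i + 2] - t_ic[i]
        step = t_ic[i + 1] - t_ic[i]
        cycles.append({"stance": stance, "stride": stride,
                       "step": step, "swing": stride - stance})
    return cycles


cycles = temporal_parameters([0.0, 0.55, 1.1, 1.65], [0.15, 0.7, 1.25, 1.8])
print(len(cycles), cycles[0])
```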
Step count was defined as the number of initial contact events identified.
Step Length Estimation Algorithm: Step length was estimated using a modified inverted pendulum model. During gait, the CoM undergoes changes in height, which are used to estimate the step length $\hat{L}_{Step}$ [19]:

$$\hat{L}_{Step} = 2\sqrt{2Lh - h^{2}}$$

where $L$ is the pendulum length (distance from the lower back sensor to the ground) and $h$ is the change in height obtained by double integration of $a_V$. A constant offset is added to improve this estimate and compute a final step length $L_{Step}$ [20]:

$$L_{Step} = \hat{L}_{Step} + KS$$

where $S$ is the participant's shoe size (vector of shoe sizes for all participants in Equation (4a)), $K$ is an optimum proportional constant used for all participants, and $L^{*}_{Step}$ is a vector array of the actual step length obtained from the gold standard (GAITRite) during the 10MWT. Because step length increases with walking speed, two constants were computed for the two velocity conditions for the healthy participants: $K = 1.13$ for SSV and $K = 1.50$ for FV. Separate $K$ constants were computed for the stroke participant: $K = -0.25$ for SSV and $K = -0.11$ for FV. Without the $KS$ correction term, $\hat{L}_{Step}$ generally underestimated the actual step length for the healthy participants and overestimated the actual step length for the stroke participant.
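A minimal sketch of the two step-length stages follows. It assumes the standard inverted-pendulum relation $\hat{L}_{Step} = 2\sqrt{2Lh - h^2}$ and a least-squares fit for $K$; the text calls $K$ an "optimum proportional constant" but does not state the fitting criterion. The units of shoe size $S$ are not given, so the reported $K$ values are not reused here, and $S$ is assumed to be expressed in units that make $K S$ a length.

```python
import math


def pendulum_step_length(pendulum_length, h):
    """Inverted-pendulum step length 2*sqrt(2*L*h - h^2); same units as inputs."""
    return 2.0 * math.sqrt(2.0 * pendulum_length * h - h ** 2)


def fit_offset_constant(shoe_sizes, estimated, actual):
    """Least-squares K minimising sum((actual - estimated - K*S)^2) (assumed criterion)."""
    num = sum(s * (a - e) for s, e, a in zip(shoe_sizes, estimated, actual))
    den = sum(s * s for s in shoe_sizes)
    return num / den


def corrected_step_length(estimated, shoe_size, k):
    """Final step length: pendulum estimate plus the K*S offset."""
    return estimated + k * shoe_size


est = pendulum_step_length(1.0, 0.03)  # L = 1.0 m, h = 3 cm
k = fit_offset_constant([1.0, 2.0], [0.5, 0.5], [0.6, 0.7])
print(round(est, 4), round(k, 4), corrected_step_length(0.5, 2.0, k))
```

Fitting $K$ separately per velocity condition, as described for SSV and FV, would simply mean calling `fit_offset_constant` on each condition's data.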
The integration drift of $a_V$ was removed by Empirical Mode Decomposition (EMD) [20,21]. First, the vertical velocity $v_V$ was obtained by integrating $a_V$ and then decomposed into Intrinsic Mode Functions (IMFs). Each IMF represents a component of the original $v_V$, ordered from high to low frequency. To reconstruct $v_V$ without the integration drift, specific IMFs were selected using the Hurst exponent, a measure of the predictability of a time series [22]. IMF components were visually inspected for the presence of trends to determine a quantitative cutoff for the Hurst exponent, and components with Hurst exponents > 0.8 were removed from the signal. The same process was applied when integrating $v_V$ to obtain a drift-free reconstruction of the CoM vertical displacement $h$.
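The Hurst-exponent screen can be sketched with NumPy. The sketch assumes the IMFs have already been produced by some EMD implementation (none is shown here) and estimates the Hurst exponent by rescaled-range (R/S) analysis, which is one common estimator; the text does not state which estimator was used. Components with H > 0.8 are discarded and the rest are summed.

```python
import numpy as np


def hurst_rs(x, min_window=8):
    """Hurst exponent by rescaled-range (R/S) analysis over dyadic window sizes."""
    x = np.asarray(x, dtype=float)
    sizes, rs_means = [], []
    size = min_window
    while size <= len(x) // 2:
        ratios = []
        for start in range(0, len(x) - size + 1, size):
            seg = x[start:start + size]
            dev = np.cumsum(seg - seg.mean())  # cumulative deviations from the mean
            sd = seg.std()
            if sd > 0:
                ratios.append((dev.max() - dev.min()) / sd)
        if ratios:
            sizes.append(size)
            rs_means.append(np.mean(ratios))
        size *= 2
    if len(sizes) < 2:
        raise ValueError("series too short for R/S analysis")
    slope, _ = np.polyfit(np.log(sizes), np.log(rs_means), 1)  # log(R/S) ~ H log(n)
    return float(slope)


def remove_trend_imfs(imfs, cutoff=0.8):
    """Sum the IMFs whose Hurst exponent is <= cutoff (trend-like ones are dropped)."""
    imfs = np.asarray(imfs, dtype=float)
    kept = [imf for imf in imfs if hurst_rs(imf) <= cutoff]
    return np.sum(kept, axis=0) if kept else np.zeros(imfs.shape[1])


rng = np.random.default_rng(0)
noise = rng.standard_normal(1024)    # stand-in for an oscillatory IMF
trend = np.linspace(0.0, 5.0, 1024)  # stand-in for an integration-drift component
print(hurst_rs(noise), hurst_rs(trend))
v_clean = remove_trend_imfs([noise, trend])
```

In this toy example, white noise (H near 0.5) is kept and the linear ramp (H near 1) is discarded.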

Static Postural Balance Algorithm: The frequency-domain features (Table 2) were estimated using the fast Fourier transform (function "fft" in MATLAB). Time-domain features were estimated from the acceleration, as well as its differentiated (jerk) and integrated (velocity) signals. Finally, the ellipse features (Table 2) were obtained by computing the eigenvalues and eigenvectors of the covariance matrix of the acceleration signals in the AP and ML directions [24].
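Two of these feature families can be sketched with NumPy: ellipse axes from the eigendecomposition of the AP/ML covariance matrix, and a power-percentile frequency such as F50% or F95% from the FFT power spectrum. Both definitions are assumptions, since Table 2 is not reproduced here. The ellipse is scaled to a 95% region with the chi-square quantile 5.991 (2 degrees of freedom), and F95% is taken as the lowest frequency below which 95% of the spectral power lies.

```python
import numpy as np

CHI2_95_2DOF = 5.991  # chi-square 0.95 quantile, 2 dof (assumed 95% ellipse scaling)


def sway_ellipse(a_ap, a_ml, scale=CHI2_95_2DOF):
    """Semi-axes (major, minor), area and major-axis angle (deg from AP) of the sway ellipse."""
    cov = np.cov(np.vstack([a_ap, a_ml]))       # 2x2 AP/ML covariance
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    semi_axes = np.sqrt(scale * eigvals)[::-1]  # major first
    area = np.pi * semi_axes[0] * semi_axes[1]
    major = eigvecs[:, 1]                       # eigenvector of the largest eigenvalue
    angle = np.degrees(np.arctan2(major[1], major[0]))
    return semi_axes, area, angle


def power_frequency(x, fs, fraction):
    """Lowest frequency below which `fraction` of the spectral power lies (e.g. F50%, F95%)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    cumulative = np.cumsum(power) / power.sum()
    idx = min(int(np.searchsorted(cumulative, fraction)), len(freqs) - 1)
    return float(freqs[idx])


semi, area, angle = sway_ellipse([2.0, -2.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0])
fs = 31.25                   # BioStampRC sampling rate (Hz)
t = np.arange(625) / fs      # 20 s of data
f95 = power_frequency(np.sin(2 * np.pi * 2.0 * t), fs, 0.95)
print(semi, area, angle, f95)
```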
Figure caption (partially recovered): (a) [15,16], (b) [16,23], (c) [17], (d) [18], (e) [16], and (f) [16,20]. $a_V$, $a_{ML}$, $a_{AP}$ = acceleration in vertical, mediolateral, and anteroposterior directions, respectively; CWT = continuous wavelet transform; $t_{IC}$ = times of initial contact; $t_{EC}$ = times of end contact; $i$ = index of gait cycle; $T_{Stance}$ = stance time; $T_{Stride}$ = stride time; $T_{Step}$ = step time; $T_{Swing}$ = swing time; EMD = empirical mode decomposition; $h$ = vertical displacement of CoM; $L$ = distance from sensor (approximately located at CoM) to ground during upright standing; $K$ = optimization constant; $L_{Step}$ = step length; $V_{Step}$ = step velocity.
TUG Phase Detection Algorithm: This algorithm was developed to detect four main phases in the TUG: rising from a chair (sit-to-stand), walking, turning, and sitting down (stand-to-sit) [25][26][27]. Sit-to-Stand and Stand-to-Sit phases were estimated by a reconstruction of the pitch signal after using a discrete wavelet approach with a Daubechies mother wavelet (db5) and an approximation level 5 (5A). Two turning phases were identified under the same approach but using the yaw signal and an approximation level 2 (2A). Finally, the gait event detection algorithm described above was used to identify the walking phase, walking features, and step counts in each turn. A flowchart of the TUG Phase Detection Algorithm is given in Figure 3.
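The phase segmentation can be sketched with NumPy under a plainly swapped-in simplification: the Haar wavelet replaces db5, because a Haar approximation at level N with the detail coefficients zeroed reduces to block means over 2^N samples and needs no wavelet library. With PyWavelets, `pywt.wavedec` and `pywt.waverec` with `'db5'` would be the closer analogue. Active segments are then taken where the smoothed signal exceeds a fraction of its peak; the 0.3 threshold and the function names are hypothetical.

```python
import numpy as np


def haar_approximation(x, level):
    """Level-`level` Haar approximation with details zeroed: block means over 2**level samples."""
    x = np.asarray(x, dtype=float)
    block = 2 ** level
    padded = np.pad(x, (0, (-len(x)) % block), mode="edge")  # pad to whole blocks
    means = padded.reshape(-1, block).mean(axis=1)
    return np.repeat(means, block)[: len(x)]


def active_segments(x, threshold_fraction=0.3):
    """(start, end) sample ranges where |x| exceeds a fraction of its peak (hypothetical rule)."""
    active = np.abs(x) > threshold_fraction * np.abs(x).max()
    edges = np.diff(active.astype(int))
    starts = list(np.flatnonzero(edges == 1) + 1)
    ends = list(np.flatnonzero(edges == -1) + 1)
    if active[0]:
        starts.insert(0, 0)
    if active[-1]:
        ends.append(len(x))
    return [(int(s), int(e)) for s, e in zip(starts, ends)]


# Synthetic pitch velocity: a rise near the start and a sit-down near the end.
pitch = np.zeros(512)
pitch[32:96] = 1.0
pitch[384:448] = -1.5
smoothed = haar_approximation(pitch, level=5)
segments = active_segments(smoothed)
sit_to_stand, stand_to_sit = segments[0], segments[-1]
print(segments)
```

Applied to the pitch signal at level 5, the first and last segments would be expected to correspond to Sit-to-Stand and Stand-to-Sit; applied to yaw at level 2, the segments would correspond to the two turns.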

Features Summary
A total of 184 features were calculated from the acceleration and angular velocity signals during the clinical tests and naturalistic walking bouts, summarized in Table 2. Of these, six features derived from gait event detection-stance time, swing time, step time, step length, step velocity, and step count-were first validated against gold standard measures.

Table 2 (partially recovered row): Step Length Symmetry Ratio (SSV, FV); unitless; step length ratio of right and left leg (spatial symmetry).
Examples of features and activity segmentation estimated from these algorithms are shown in Figure 4 for the clinical tests of gait and balance.

Statistical Analysis
Statistical analysis was performed using SPSS v25 (IBM, Armonk, NY, USA). Bland-Altman plots were used to express the error between the sensor system and the corresponding gold standard (i.e., MC10 vs. GAITRite; MC10 vs. visual step count). Absolute agreement between the two systems was evaluated using intraclass correlation coefficients (ICC) and limits of agreement (LoA). Relative agreement between the systems was determined using Pearson's correlation coefficient (r). ICC values were classified as excellent (greater than 0.9), good (0.75 to 0.9), moderate (0.5 to 0.75), or poor (less than 0.5) [30].
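The agreement statistics can be sketched with NumPy. The text specifies absolute agreement but not the ICC model, so the two-way, absolute-agreement, single-measure form (often written ICC(2,1)) is an assumption here. The limits of agreement use the conventional bias ± 1.96 SD of the paired differences, and `classify_icc` encodes the thresholds quoted above. This is a sketch, not the SPSS procedure.

```python
import numpy as np


def icc_2_1(ratings):
    """Two-way, absolute-agreement, single-measure ICC for an n_subjects x k_methods array."""
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * np.sum((y.mean(axis=1) - grand) ** 2)    # between subjects
    ss_cols = n * np.sum((y.mean(axis=0) - grand) ** 2)    # between methods (bias)
    ss_err = np.sum((y - grand) ** 2) - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def bland_altman(sensor, reference):
    """Bias and 95% limits of agreement (bias +/- 1.96 SD of the differences)."""
    d = np.asarray(sensor, dtype=float) - np.asarray(reference, dtype=float)
    bias = d.mean()
    half_width = 1.96 * d.std(ddof=1)
    return bias, bias - half_width, bias + half_width


def classify_icc(icc):
    """Agreement categories using the thresholds stated in the text."""
    if icc > 0.9:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.5:
        return "moderate"
    return "poor"


ref = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(icc_2_1(np.column_stack([ref, ref])), icc_2_1(np.column_stack([ref, ref + 0.5])))
print(bland_altman(ref + 0.1, ref), classify_icc(0.78))
```

A constant offset between methods lowers the absolute-agreement ICC below 1 even though the two methods are perfectly correlated, which is why absolute and relative agreement are reported separately.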
Overall, 183 of the 184 sensor-derived features were assessed across the three clinical tests for age effects (step count in the naturalistic walking bouts was used for validation only). Spatiotemporal gait features for the right and left legs were averaged for the healthy participants, who exhibited relatively symmetrical gait. Gait symmetry for the healthy participants was verified by comparing the empirical cumulative distributions for the left and right legs in each feature. Feature intercorrelations from each clinical test were explored using a matrix of Pearson correlation coefficients [10] to examine the presence or absence of relationships between gait and balance features.
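One way to quantify the left/right comparison of empirical cumulative distributions is the largest vertical gap between them, i.e., the two-sample Kolmogorov-Smirnov statistic (also available as `scipy.stats.ks_2samp`). The text does not say whether the comparison was visual or statistical, so the sketch below is an illustrative stand-in, and the stance-time values are made up for demonstration.

```python
import bisect


def ecdf_max_distance(left, right):
    """Largest vertical gap between two empirical CDFs (two-sample KS statistic)."""
    left, right = sorted(left), sorted(right)

    def ecdf(sample, x):
        # Fraction of the sample that is <= x.
        return bisect.bisect_right(sample, x) / len(sample)

    points = sorted(set(left) | set(right))
    return max(abs(ecdf(left, x) - ecdf(right, x)) for x in points)


# Illustrative values only (not study data).
left_stance = [0.62, 0.64, 0.65, 0.66, 0.68]
right_stance = [0.63, 0.64, 0.66, 0.67, 0.69]
print(ecdf_max_distance(left_stance, right_stance))
```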
The relationship between each feature and age was initially assessed using univariate correlations. Normality of the features was tested using the D'Agostino-Pearson omnibus K² test, with the significance level set to 0.05. The strength and direction of the correlations with age were measured with the Pearson product-moment correlation for normally distributed features and Spearman's rank-order correlation for non-normally distributed features. Partial correlations (r*) were computed to control for the effects of weight and height. Correlations were considered non-negligible (that is, some association existed between the feature and age) for |r| ≥ 0.3 [31].
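A partial correlation controlling for height and weight can be computed by regressing both the feature and age on the covariates and correlating the residuals, as sketched below with NumPy for the Pearson case only. The normality screen and Spearman correlation are not reimplemented; `scipy.stats.normaltest` (D'Agostino-Pearson K²) and `scipy.stats.spearmanr` provide them. The example values are illustrative, not study data.

```python
import numpy as np


def partial_corr(feature, age, covariates):
    """Pearson correlation of feature and age after regressing out covariates (with intercept)."""
    feature = np.asarray(feature, dtype=float)
    age = np.asarray(age, dtype=float)
    z = np.column_stack([np.ones(len(age)), np.asarray(covariates, dtype=float)])

    def residual(y):
        beta, *_ = np.linalg.lstsq(z, y, rcond=None)
        return y - z @ beta

    return float(np.corrcoef(residual(feature), residual(age))[0, 1])


def is_non_negligible(r, threshold=0.3):
    """Non-negligible association rule from the text: |r| >= 0.3."""
    return abs(r) >= threshold


# Illustrative values only (not study data).
age = np.array([20, 30, 40, 50, 60, 70], dtype=float)
height = np.array([1.60, 1.80, 1.70, 1.65, 1.75, 1.70])
weight = np.array([60, 80, 70, 75, 65, 72], dtype=float)
covariates = np.column_stack([height, weight])
feature = 0.5 * age + 10.0 * height  # the height effect is removed by partialling
r_star = partial_corr(feature, age, covariates)
print(r_star, is_non_negligible(r_star))
```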
Hierarchical multiple regression was performed to quantify the effect of age on features with significant, non-negligible univariate correlations (|r| ≥ 0.3, p < 0.05). The goal of these models was to determine whether adding age as a predictor variable significantly improves the proportion of explained variance (R²) for the feature in question. Here, weight, height, and sex were entered in that order, and age was added last [10], thereby testing the effect of age alone on a feature after controlling for the other phenotype characteristics.
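The final step of such a model, adding age after weight, height, and sex, can be sketched with NumPy by comparing R² for the two nested ordinary least-squares models and forming the F-change statistic for one added predictor. A p-value could then be obtained with `scipy.stats.f.sf(f_change, 1, n - 5)`. This is a sketch with illustrative data, not the SPSS procedure; the earlier steps (weight, then height, then sex) would be computed the same way.

```python
import numpy as np


def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit of y on predictors plus an intercept."""
    y = np.asarray(y, dtype=float)
    x = np.column_stack([np.ones(len(y)), np.asarray(predictors, dtype=float)])
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def age_step(y, weight, height, sex, age):
    """R^2 before/after adding age last, the R^2 change, and the F-change (1 added predictor)."""
    base = np.column_stack([weight, height, sex])
    full = np.column_stack([weight, height, sex, age])
    r2_base, r2_full = r_squared(y, base), r_squared(y, full)
    df_resid = len(y) - full.shape[1] - 1
    f_change = (r2_full - r2_base) / ((1.0 - r2_full) / df_resid)
    return r2_base, r2_full, r2_full - r2_base, f_change


# Illustrative values only (not study data).
weight = np.array([60, 80, 70, 75, 65, 72, 68, 77], dtype=float)
height = np.array([1.60, 1.80, 1.70, 1.65, 1.75, 1.70, 1.68, 1.78])
sex = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)
age = np.array([20, 25, 30, 40, 45, 55, 60, 70], dtype=float)
wobble = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3])
feature = 0.05 * age + 0.01 * weight + wobble
r2_base, r2_full, delta_r2, f_change = age_step(feature, weight, height, sex, age)
print(r2_base, r2_full, delta_r2, f_change)
```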
Finally, to compare the resolution of the proposed features with that of current clinical outcome measures (total duration for the TUG, walking velocity for the 10MWT, and therapist scores between 0 and 4 for the BBS), the clinical outcome measures were tested for differences within clinical tests and among age groups using two-way ANOVAs with main effects of age and test condition, as well as their interaction.
There was excellent agreement between the gold standard measure and estimates from sensor data for most spatiotemporal gait features, including stance time, step time, and gait velocity (ICC > 0.90, LoA < 20%), as well as for step count (ICC = 0.98, LoA = 10%). There was good agreement for swing time (ICC = 0.78, LoA = 18%), with a tendency to overestimate longer swing times. There was moderate agreement for step length (ICC = 0.68, LoA = 20%), with greater errors typically seen for larger steps in the FV condition.

Feature Independence between Clinical Tests
Figure 6 maps the Pearson product-moment correlation coefficients for the estimated features from one condition of each clinical test. Sensor-derived features were highly correlated within each clinical test, but clearly separable between the dynamic mobility (10MWT and TUG) and static balance (BBS) tests. This suggests that the features for each clinical test effectively represent different domains. Stronger correlations are seen between the 10MWT and TUG, which is expected as both tests include walking. In the BBS, time and frequency domain features were highly correlated within domains but showed almost no correlation between domains.

Correlation between Age and Sensor-Derived Features
Tables A1-A8 (see Appendix A) describe how sensor-derived features across the different clinical tests are related to age. Statistically significant, non-negligible correlations with age (|r| ≥ 0.3, p < 0.05) were found in 28 out of 183 features across the three clinical tests. After adjusting for weight and height, this number increased to 36 features that were correlated with age.
The strongest correlation with age was in the TUG second turn phase, for the yaw acceleration (i-ii) (r = −0.581, p ≤ 0.001; Table A8). This indicates that older participants rotated more slowly about their body midline in the first half of the second turn, before sitting. This particular finding illustrates the ability of sensor-based motion measurement to uncover behaviors that would be "invisible" to the typical clinician. Other non-negligible correlations with age (|r| ≥ 0.3, p < 0.05) were found for the following features. In the BBS: F95% ML in SU (1/23 features), SC ML in SEC (1/23), F95% ML in SFT (1/23), F50% AP in ST (1/23), and, in SOL (5/23), the maximum, mean, and RMS acceleration in the ML direction, the AP ellipse axis, and ML sway velocity. In the 10MWT SSV (4/12): mean stance time, mean step length, mean velocity, and duration. In the 10MWT FV (5/12): mean step length, maximum power frequency, mean velocity, number of steps, and duration. In the TUG (17/42): in the sit-to-stand phase, mean pitch velocity (i-iii), maximum pitch velocity and mean pitch acceleration (i-ii), and mean AP acceleration (i-iii); in the walking phase, RMS of AP acceleration, number of steps, and phase duration; in the second turn phase, maximum yaw velocity, mean yaw acceleration (ii-iii), and phase duration; and in the stand-to-sit phase, range and maximum pitch velocity (i-ii), range of pitch velocity and mean pitch acceleration (ii-iii), standard deviation of pitch velocity (i-iii), and mean and standard deviation of AP acceleration. As with the TUG second turn, these results illustrate how quantifying motion with a sensor can reveal subtle age-related differences that are not captured by clinical scores.
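One common way to obtain correlations "after adjusting for weight and height" is a partial correlation, in which the feature and age are each residualized on the covariates before computing Pearson's r. The section does not state the exact procedure, so the sketch below is an assumption for illustration; the function name and synthetic data are hypothetical. The t statistic uses n − 2 − k degrees of freedom for k covariates, and the two-sided p-value can be obtained as `2 * scipy.stats.t.sf(abs(t), df)`.

```python
import numpy as np


def partial_correlation(x, y, covariates):
    """Pearson r between x and y after regressing both on the covariates.

    Returns (r, t, df), where df = n - 2 - number_of_covariates.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    design = np.column_stack([np.ones(len(x)), covariates])  # intercept + covariates

    def residualize(v):
        beta, *_ = np.linalg.lstsq(design, v, rcond=None)
        return v - design @ beta

    r = float(np.corrcoef(residualize(x), residualize(y))[0, 1])
    df = len(x) - 2 - (design.shape[1] - 1)
    t = r * np.sqrt(df / (1.0 - r ** 2))
    return r, float(t), df


# Synthetic cohort (hypothetical values): the feature depends on age and weight.
rng = np.random.default_rng(1)
n = 100
age = rng.uniform(20, 70, n)
weight = rng.normal(70, 10, n)
height = rng.normal(170, 8, n)
feature = -0.05 * age + 0.1 * weight + rng.normal(0, 0.5, n)
r, t, df = partial_correlation(feature, age, np.column_stack([weight, height]))
non_negligible = abs(r) >= 0.3  # would be combined with p < 0.05 from the t distribution
print(f"partial r = {r:.3f}, t = {t:.2f}, df = {df}")
```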

Hierarchical Multivariate Regression for Age Effects in Sensor-Derived Features
Hierarchical multivariate regression was performed for the 36 features that had significant, non-negligible, independent correlations with age (|r| ≥ 0.3; p < 0.05) after adjusting for weight and height (Table 3). This approach quantifies the relative effects of age, sex, height, and weight on each feature of interest, and assesses the effect of age while controlling for these phenotype variables. The included features comprised nine from the BBS, nine from the 10MWT, and 18 from the TUG. Adding age to the model significantly increased the explained variance for all features, by 6.9-32.5%. Among these 36 features, weight was also a significant predictor for F95% ML in the SU balance task and for the Turn 2 duration in the TUG; height was also a significant predictor for mean stance time in the 10MWT-SSV; and sex was also a significant predictor for ML sway velocity in the SOL balance task, as well as for mean step length and number of steps in the 10MWT-FV. For the remaining features, age was the only significant predictor.
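The hierarchical step described above can be sketched as a comparison of two ordinary least-squares models: the phenotype variables (sex, height, weight) entered first, then age added, with the increase in R² and its F statistic quantifying the contribution of age. This is a minimal sketch assuming that block order; the function names and synthetic data are hypothetical, and the significance of the F statistic (1 and n − p − 1 degrees of freedom) would be assessed against an F distribution.

```python
import numpy as np


def r_squared(predictors, y):
    """Coefficient of determination of an OLS fit with an intercept."""
    y = np.asarray(y, float)
    X = np.column_stack([np.ones(len(y)), predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2))


def age_r2_change(sex, height, weight, age, feature):
    """Two-step hierarchical regression: phenotype block first, then age."""
    step1 = np.column_stack([sex, height, weight])
    step2 = np.column_stack([step1, age])
    r2_1, r2_2 = r_squared(step1, feature), r_squared(step2, feature)
    n, p = len(feature), step2.shape[1]
    # F statistic for the R^2 change when one predictor (age) is added
    f_change = (r2_2 - r2_1) / ((1.0 - r2_2) / (n - p - 1))
    return r2_1, r2_2, r2_2 - r2_1, f_change


# Synthetic example with N = 49, the cohort size of this study (values hypothetical).
rng = np.random.default_rng(2)
n = 49
sex = rng.integers(0, 2, n)
height = rng.normal(170, 9, n)
weight = rng.normal(72, 12, n)
age = rng.uniform(20, 70, n)
feature = 0.02 * age + 0.01 * weight + rng.normal(0, 0.3, n)
r2_1, r2_2, delta, f = age_r2_change(sex, height, weight, age, feature)
print(f"R2 step 1 = {r2_1:.3f}, step 2 = {r2_2:.3f}, change = {delta:.3f}, F = {f:.1f}")
```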

Differences between Age Groups and Stroke Rehabilitation Participant
An example comparison of the clinical test scores and sensor-derived features is shown in Figure 7 for each test. A single sensor-derived feature was chosen from each test to illustrate differences between age groups and/or between these healthy participant groups and an individual with stroke.
The 10MWT is scored based on gait velocity, which decreased with age in both SSV and FV conditions (Figure 7a, top). There were main effects of age group (p = 0.003) and speed condition (p < 0.001) on gait velocity, with no interaction effect (p = 0.95). Post hoc tests showed that the 55-70 age group had significantly lower gait velocities than the 20-34 age group (p = 0.001). Gait velocity for the stroke participant in SSV and FV was notably lower, at 0.36 and 0.48 m/s, respectively.
For the sensor-derived feature of step length, mean values also decreased with age in both conditions (Figure 7b, top). There were main effects of age group (p = 0.014) and speed condition (p < 0.001) on step length, with no interaction effect (p = 0.91). Post hoc tests showed that the 55-70 age group had significantly shorter step length than the 20-34 age group (p = 0.006). The stroke participant took longer steps on his non-paretic side than his paretic side in both speed conditions, with shorter steps on average compared to healthy controls.
The BBS tasks are scored by a therapist on a discrete scale of 0 to 4, based on an individual's ability to complete the task safely and for the required amount of time. All healthy participants received a perfect score on the static standing conditions, except one individual in the 34 to 54 age range who scored a 3 for Standing on One Leg because they could not hold the position for a full 10 seconds (Figure 7a, middle). The stroke participant scored a 3 on SU, SEC, and SFT (able to complete under supervision) and a 0 on ST and SOL (loss of balance on attempt). The sensor approach provides continuous metrics to characterize performance, and specific differences between the balance conditions and age groups are seen in the length of the anteroposterior (AP) axis of the 95% ellipse computed from sensor data (Figure 7b, middle). Generally, AP axis length increased with the difficulty of the condition, meaning that individuals had greater acceleration in the forward-backward direction as balance was more challenged. There was a main effect of condition (p < 0.001) but not of age group (p = 0.46) on AP axis length, and no significant interaction (p = 0.51). AP axis length for the stroke participant was greater in all conditions.
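The AP axis of a 95% ellipse can be derived from the covariance of the ML and AP acceleration signals. The sketch below is an assumption, not the paper's definition: it uses the chi-square approximation (5.991 for two degrees of freedom) for the 95% region, reports full axis lengths, and treats the principal axis most aligned with AP as the "AP axis". Some posturography studies use an F-distribution multiplier instead, which gives slightly larger ellipses for small samples.

```python
import numpy as np

CHI2_95_2DOF = 5.991  # 95th percentile of the chi-square distribution, 2 degrees of freedom


def ellipse_axes_95(acc_ml, acc_ap):
    """Full lengths (ap_axis, ml_axis) of the 95% ellipse of ML/AP acceleration.

    Assumption: the "AP axis" is the principal axis whose direction is closest to AP.
    """
    cov = np.cov(np.column_stack([acc_ml, acc_ap]), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # columns of eigvecs are unit axis directions
    eigvals = np.clip(eigvals, 0.0, None)   # guard against tiny negative round-off
    lengths = 2.0 * np.sqrt(CHI2_95_2DOF * eigvals)  # full (not half) axis lengths
    ap_idx = int(np.argmax(np.abs(eigvecs[1, :])))   # row 1 holds the AP component
    return float(lengths[ap_idx]), float(lengths[1 - ap_idx])


rng = np.random.default_rng(3)
ml = rng.normal(0, 0.1, 5000)  # hypothetical ML acceleration (m/s^2)
ap = rng.normal(0, 0.3, 5000)  # hypothetical AP acceleration (m/s^2)
ap_axis, ml_axis = ellipse_axes_95(ml, ap)
print(f"AP axis = {ap_axis:.3f}, ML axis = {ml_axis:.3f}")
```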

Finally, the TUG is scored as the time to complete all five phases of the test, which increased with age (Figure 7a, bottom). There was a main effect of age group on total TUG duration (p = 0.019), for which the 55-70 age group had significantly longer durations than the 20-34 age group (p = 0.007). The stroke participant completed the TUG in 22.06 s. Our CMFE approach can distinguish the duration of each phase of the TUG to determine in which phase an individual moves faster or slower (Figure 7b, bottom). There were main effects of age group (p < 0.001) and phase (p < 0.001) on phase durations, as well as an interaction effect (p = 0.002). Post hoc tests showed that the 55-70 age group had longer durations for the second turn (p = 0.041) and walking (p = 0.006) phases than the 20-34 age group. The phase durations for the stroke participant were 1.25 s for sit-to-stand, 15.23 s for walking, 2.84 s for the first turn, 2.00 s for the second turn, and 0.74 s for stand-to-sit (uncontrolled descent).
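As a simple arithmetic check, the five phase durations reported for the stroke participant sum to the total TUG time, and the walking phase accounts for roughly two-thirds of it. The snippet only reuses the values stated in the preceding paragraph.

```python
# Phase durations (s) reported for the stroke participant.
stroke_tug_phases = {
    "sit_to_stand": 1.25,
    "walking": 15.23,
    "first_turn": 2.84,
    "second_turn": 2.00,
    "stand_to_sit": 0.74,
}

total_s = sum(stroke_tug_phases.values())
walking_share = stroke_tug_phases["walking"] / total_s

print(f"total TUG time: {total_s:.2f} s")          # total TUG time: 22.06 s
print(f"share spent walking: {walking_share:.0%}")  # share spent walking: 69%
```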

Discussion
In this study, we estimated features from different clinical tests performed in the rehabilitation setting using a novel combination of algorithms and data from a single IMU placed on the lower back. The end goal is to augment the information that can be obtained from current clinical tests, by automatically computing high-resolution measures of gait and balance.
The first objective was to validate a subset of sensor-derived spatiotemporal gait features against gold standard measures (GAITRite and visual step count) and to characterize systematic differences between the two systems. Temporal gait parameters in the 10MWT demonstrated excellent agreement (mean step time, stance time, and gait velocity) or good agreement (swing time), similar to previous work [16,17].
Step length demonstrated moderate agreement, with estimation errors that increased with gait velocity (SSV vs. FV conditions).
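Agreement between sensor-derived and GAITRite measures is summarized by the ICC, but this section does not state which ICC form was used. As an assumption for illustration, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single measurement, after Shrout and Fleiss) from a subjects-by-methods array; the example step times are hypothetical. Because this form penalizes systematic offsets between methods, it is sensitive to the kind of bias the validation aims to characterize.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: (n_subjects, k_methods) array, e.g. columns = IMU and GAITRite.
    """
    x = np.asarray(ratings, float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)  # between-subject mean square
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)  # between-method mean square
    resid = x - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))  # residual mean square
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))


imu = [0.52, 0.55, 0.49, 0.60, 0.58]       # hypothetical mean step times (s)
gaitrite = [0.53, 0.55, 0.50, 0.61, 0.57]  # hypothetical reference values (s)
print(f"ICC(2,1) = {icc_2_1(np.column_stack([imu, gaitrite])):.3f}")
```

A constant offset between methods leaves the residual term at zero but still lowers ICC(2,1) through the between-method mean square, which a consistency-type ICC would ignore.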
A potential explanation for the lower accuracy in step length lies in the walking kinematics of our participants. When modeling gait as a rigid inverted pendulum, it is assumed that the distance between the point of contact and the CoM is constant and that, as in an ideal pendulum, kinetic and gravitational potential energy are exchanged equally. In this model, increasing gait velocity increases the vertical displacement of the center of mass and consequently produces a larger estimated step length. However, most of the subjects exhibited only a small change in vertical displacement between the fast and self-selected velocity conditions (mean 0.48 ± 0.88 cm), which would result in underestimation of step length at faster gait velocities. The virtual limb model proposed by the authors of [32] may better explain participants' behavior. In this model, a virtual limb (pendulum) compresses during the stance phase at higher velocities, thereby reducing the vertical displacement of the center of mass and enhancing elastic energy storage (i.e., in the muscles and tendons). Another possible source of error in step length estimates is the double integration used to obtain the position of the CoM from acceleration signals, as integration accumulates error and produces "drift" in the integrated signal. Though we attempted to remove drift via empirical mode decomposition (EMD), other drift removal techniques paired with sensors at the foot may improve estimates [33]. Future work will examine the main source of step length error by comparing vertical excursions estimated from the inertial sensor signal to those from motion capture data.
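The inverted pendulum relationship discussed above is commonly written as step length = 2·sqrt(2lh − h²), where l is the pendulum length and h the vertical excursion of the CoM (or sensor) during the step, optionally scaled by a correction factor. The sketch below assumes this form, which may differ in detail from the paper's Equation (4a); the function name and the 0.95 m pendulum length are hypothetical. It illustrates that the estimate depends on speed only through h, so estimated step length can grow at fast speeds only if h grows too.

```python
import math


def pendulum_step_length(h, leg_length, k=1.0):
    """Step length (m) from the inverted pendulum model, scaled by a correction factor k.

    h: vertical excursion of the CoM/sensor during the step (m).
    leg_length: pendulum length, e.g. sensor height above the ground (m).
    """
    if not 0.0 <= h <= 2.0 * leg_length:
        raise ValueError("h must lie between 0 and twice the pendulum length")
    return k * 2.0 * math.sqrt(2.0 * leg_length * h - h * h)


leg = 0.95  # hypothetical pendulum length (m)
for h in (0.030, 0.030 + 0.0048):  # second value adds the 0.48 cm mean change reported above
    print(f"h = {h * 100:.2f} cm -> estimated step length = {pendulum_step_length(h, leg):.3f} m")
```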
The second and third objectives were to compute a series of sensor-derived features during clinical outcome tests of gait and balance, and to examine the effect of age and phenotype on these features. This was achieved using our novel CMFE process, which combines several previously published algorithms to obtain a large, multidimensional feature set. These features are by no means exhaustive, and additional features may also be sensitive to age. For example, joint angles or muscle activation patterns, obtained with additional sensor types and locations, could further augment test outcomes. Alternative algorithms for gait event detection could also be considered to improve the accuracy of spatiotemporal features, including signal processing with different filters (FIR or IIR) [6], sensor fusion [34], or machine learning [7]. We selected specific algorithms for gait event detection in CMFE based on a minimal sensor configuration (e.g., sensing modalities and body placement) and on their reliability for the target population, but different sensor types, body placement, or computational approaches could be substituted when extracting clinically relevant features during standardized outcome measures.
The extracted features included 23 in each of the five static balance conditions of the BBS, 42 across all phases of the TUG, 13 in each condition of the 10MWT, and an additional feature in the difference between self-selected and fast gait velocity in the 10MWT. Of these 184 total features, 36 were significantly correlated with age. Hierarchical multivariate regression confirmed that age was the most consistent contributor to changes in these features. Specific findings regarding age-related features are discussed below for each clinical outcome test.

BBS Static Balance Performance
Our findings confirmed that balance declines with age across the static standing conditions (SU, SEC, SFT, ST, and SOL). Following the pattern reported by the authors of [10], participants demonstrated increasing time-domain sway features with age (i.e., mean and maximum velocity, acceleration, and jerk) and decreasing frequency domain features in the ML plane (F95% for SU and SFT, and SC for SEC).
Age alone was the most significant predictor of seven features in the BBS, including positive correlations with time domain features for standing on one leg (Max Acc, Mean Acc, and RMS in the ML plane, and Ellipse Axis in the AP plane) and negative correlations with frequency domain features in the SEC, SFT, and ST tasks. This can be interpreted as larger and slower postural corrections with age. Aging affects neural factors such as increased reaction times [35] and biomechanical factors such as muscle weakness [35], which would affect balance performance in a pattern consistent with our findings.
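Frequency-domain features such as F50% and F95% are typically defined as the frequencies below which 50% and 95% of the signal power lies. The sketch below computes them from a periodogram of a de-meaned acceleration signal; the use of a plain FFT periodogram (rather than, for example, Welch's method), the 100 Hz sampling rate, and the synthetic ML signal are assumptions for illustration. Lower values indicate that sway power is concentrated at lower frequencies, consistent with the slower postural corrections described above.

```python
import numpy as np


def spectral_edge_frequencies(signal, fs, fractions=(0.5, 0.95)):
    """Frequencies (Hz) below which the given fractions of signal power lie."""
    x = np.asarray(signal, float)
    x = x - x.mean()  # remove the DC offset (e.g. gravity) before the FFT
    power = np.abs(np.fft.rfft(x)) ** 2  # one-sided periodogram (unscaled)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    cumulative = np.cumsum(power) / np.sum(power)
    # first frequency bin at which the cumulative power reaches each fraction
    return {f: float(freqs[np.searchsorted(cumulative, f)]) for f in fractions}


fs = 100.0                # hypothetical sampling rate (Hz)
t = np.arange(1000) / fs  # 10 s of data
acc_ml = 0.05 * np.sin(2 * np.pi * 0.8 * t) + 0.03 * np.sin(2 * np.pi * 3.0 * t)  # synthetic sway
edges = spectral_edge_frequencies(acc_ml, fs)
print(edges)
```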

10MWT Performance
Gait velocity computed from the sensor data was negatively correlated with age in both self-selected and fast walking conditions, confirming that sensors can effectively capture the reductions in walking speed that are well-documented with age [36].
Step length decreased with age, whereas stance time increased with age and height.
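Stance, swing, and step times follow from the sequence of initial contact (IC) and final contact (FC) events detected in the lower back signal. The sketch below assumes a common event convention for trunk-mounted IMUs (alternating feet, each IC followed by toe-off of the opposite foot); the function names, event times, and step lengths are hypothetical, and gait velocity is taken here as total step length divided by the time from first to last contact.

```python
from statistics import mean


def temporal_gait_parameters(ic, fc):
    """Mean step, stance, and swing time (s) from gait events.

    Assumed convention (hypothetical): ic[i] are initial contacts of alternating feet,
    and fc[i] is the toe-off of the opposite foot that follows ic[i].
    """
    n = min(len(ic) - 2, len(fc) - 1)  # strides with all required events
    step = [ic[i + 1] - ic[i] for i in range(len(ic) - 1)]
    stance = [fc[i + 1] - ic[i] for i in range(n)]    # same foot: contact to toe-off
    swing = [ic[i + 2] - fc[i + 1] for i in range(n)]  # same foot: toe-off to next contact
    return {"step_time": mean(step), "stance_time": mean(stance), "swing_time": mean(swing)}


def gait_velocity(step_lengths, ic):
    """Mean velocity (m/s): summed step lengths over the time from first to last contact."""
    return sum(step_lengths) / (ic[-1] - ic[0])


ic = [0.00, 0.55, 1.10, 1.65, 2.20]  # hypothetical initial contacts (s)
fc = [t + 0.10 for t in ic]          # hypothetical toe-offs, 0.10 s double support
params = temporal_gait_parameters(ic, fc)
velocity = gait_velocity([0.70] * 4, ic)
print(params, f"{velocity:.2f} m/s")
```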

TUG Performance
In line with previous studies [25], the sit-to-stand and stand-to-sit phases of the TUG contained the largest number of age-correlated features, most of which were derived from the pitch angular velocity signal. Age alone was the most significant predictor of 17 features in the TUG, including positive correlations with the mean (i-iii) and maximum velocity and mean acceleration (i-ii) of the pitch signal in the sit-to-stand phase, as well as with the RMS of AP acceleration in the walking phase. In the stand-to-sit phase, we also found negative correlations with the range of pitch velocity, the maximum velocity and mean acceleration (i-ii) of the pitch signal, and the mean and standard deviation of the AP acceleration.
Aging is associated with lower limb strength deficits (i.e., in hip and knee flexion/extension and ankle dorsiflexion) [37]. Our findings suggest that older individuals rely more on trunk momentum to stand up from a sitting position. Specifically, they exhibit increased trunk flexion to translate the CoM over the base of support and subsequently extend the trunk with increased angular velocity (pitch), which contributes to the vertical momentum of the CoM [38,39]. Finally, the negative correlations in the second turn and stand-to-sit phases suggest a slower, more controlled turn and transition to sitting.
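Features such as the mean, maximum, range, and standard deviation of pitch angular velocity, and mean angular acceleration, are computed over sub-intervals of each TUG phase, denoted (i-ii), (ii-iii), and (i-iii) above. The event markers delimiting those sub-intervals are defined elsewhere in the paper and are represented here by hypothetical start and end times; the finite-difference derivative used for angular acceleration is an assumption.

```python
import numpy as np


def segment_features(angular_velocity, fs, start_s, end_s):
    """Summary features of a gyroscope signal between two event times (s).

    start_s/end_s stand in for within-phase event markers (i, ii, iii);
    how those events are detected is not reproduced here.
    """
    w = np.asarray(angular_velocity, float)
    seg = w[int(round(start_s * fs)):int(round(end_s * fs)) + 1]
    angular_acc = np.gradient(seg, 1.0 / fs)  # finite-difference derivative
    return {
        "mean_velocity": float(seg.mean()),
        "max_velocity": float(seg.max()),
        "range_velocity": float(seg.max() - seg.min()),
        "std_velocity": float(seg.std(ddof=1)),
        "mean_acceleration": float(angular_acc.mean()),
    }


fs = 100.0
t = np.arange(201) / fs
pitch_velocity = np.sin(np.pi * t)  # synthetic pitch angular velocity (rad/s)
print(segment_features(pitch_velocity, fs, start_s=0.2, end_s=0.9))
```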

Strengths and Limitations
A key strength of this study is its deployability: a single sensor was used to quantify the effect of age on gait and balance during well-established clinical outcome tests, and spatiotemporal gait features were validated at the lowest sampling rate reported in the literature, to our knowledge. Sensor-based features extracted from these clinical tests could be grouped into separate domains assessing static balance or gait and mobility. These features expand the traditional single-score measures of the clinical outcome tests. As illustrated in Figure 7, these sensor-derived features can mitigate floor/ceiling effects by distinguishing continuous differences between individuals and tasks. Finally, we demonstrate a proof of concept for implementing this approach in stroke patients, illustrating how sensor-derived features from a single patient can be compared to normative data from healthy, age-ranged cohorts.
There are some limitations to this study that should be considered. First, the sample size for each age group was relatively small, and the maximum age of participants in the study was 70 in a fairly active cohort of older adults. This limits our ability to predict age-related changes in sensor data for an older or more general population. For instance, the univariate correlations of sensor-based features were relatively weak, with 35 features exhibiting a low correlation (|r| values 0.3 to 0.5) after controlling for weight and height, one feature exhibiting a moderate correlation (0.5 to 0.7), and no features exhibiting high (0.7 to 0.9) or very high correlations (0.9 to 1.0) [31]. Performance in BBS, 10MWT, and TUG continues to decline for individuals over the age of 70 [36], so extending the age range may capture a stronger relationship between sensor features and age. Similarly, for sensor-based features showing a non-negligible correlation with age (|r| ≥ 0.3, p < 0.05), the hierarchical multivariate regression model of age, sex, height, and weight yielded relatively low R² values, ranging from 0.104 to 0.604. This suggests that the four-variable model does not explain most of the variance in these data and would be insufficient to predict outcomes accurately. However, the significance of age or other phenotype characteristics as predictors indicates a relationship between these variables and the sensor-based features. Further research is required to determine additional predictors of these sensor-based features, as well as their clinical relevance to age or impairment.
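The correlation ranges cited above can be applied mechanically to any coefficient. The helper below encodes them, treating |r| < 0.3 as negligible, as in the feature selection criterion; assigning boundary values (e.g., |r| = 0.5) to the higher category is an assumption. Applied to the strongest age correlation reported (r = −0.581), it returns the single "moderate" case.

```python
def correlation_strength(r):
    """Label |r| using the cited ranges; |r| < 0.3 is treated as negligible."""
    a = abs(r)
    for threshold, label in ((0.9, "very high"), (0.7, "high"), (0.5, "moderate"), (0.3, "low")):
        if a >= threshold:
            return label
    return "negligible"


print(correlation_strength(-0.581))  # moderate
```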
It should also be noted that, in the current study, estimating step length from sensor data requires an optimization constant K that is derived from gold standard data (in this case, actual step length from a GAITRite instrumented mat), as described in Equation (4a). Without this constant and the correction term presented in Equation (4b), we observed that step length was generally underestimated for the healthy group and overestimated for the stroke participant. Thus, we believe it would be critical to compute new, robust K values for gait-impaired populations, to ensure that the procedure would generalize to new individuals without measuring actual step length each time.
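Equation (4a) is not reproduced in this section, so the sketch below assumes the simplest case: a single multiplicative K fitted by least squares so that K times the model estimate best matches GAITRite step lengths, without the correction term of Equation (4b). The leave-one-subject-out helper illustrates one way to check whether a K fitted on some participants generalizes to a new individual, which is the concern raised above; the function names and data are hypothetical.

```python
import numpy as np


def fit_step_length_k(model_lengths, reference_lengths):
    """Least-squares K minimizing sum((reference - K * model) ** 2)."""
    m = np.asarray(model_lengths, float)
    ref = np.asarray(reference_lengths, float)
    return float(m @ ref / (m @ m))


def leave_one_subject_out_k(model_by_subject, reference_by_subject):
    """K for each subject, fitted only on the other subjects' steps."""
    ks = []
    for i in range(len(model_by_subject)):
        m = np.concatenate([s for j, s in enumerate(model_by_subject) if j != i])
        ref = np.concatenate([s for j, s in enumerate(reference_by_subject) if j != i])
        ks.append(fit_step_length_k(m, ref))
    return ks


model = [np.array([0.50, 0.52]), np.array([0.60, 0.58]), np.array([0.45, 0.47])]  # hypothetical
reference = [1.2 * m for m in model]  # hypothetical GAITRite step lengths (m)
print(leave_one_subject_out_k(model, reference))
```

If the held-out K values vary widely across participants, a single population K would not transfer well to new individuals.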
Future work will incorporate IMUs at the lower limbs to improve the step length estimation, and to maintain accuracy in step detection and spatiotemporal kinematics for gait-impaired populations [40]. Although a single lower back IMU was generally sufficient for gait cycle segmentation for healthy or mildly impaired individuals, additional IMUs or magneto-inertial sensors placed more distally (closer to the point of foot impact) are likely necessary when patients exhibit more drastic functional impairments such as slow walking [41], dropped foot [42], shuffling, or non-alternating steps [43].

Conclusions
In summary, we validated spatiotemporal gait parameters and quantified age-related changes across well-established clinical outcome tests of gait, mobility, and balance using a single IMU on the lower back. We demonstrated that sensor-derived features can provide the resolution needed to detect age-related changes and can augment current clinical outcome measures. Our results suggest that the TUG is particularly sensitive to age-related differences, as it yielded half of the age-correlated features (18 of 36). The clinical meta-feature extraction (CMFE) approach with a single inertial sensor was feasible for estimating temporal gait features, though less accurate for step length estimation using the inverted pendulum model.
Overall, this study lays a foundation for amassing clinically relevant baseline features from a healthy population to evaluate recovery progression across different impaired populations (e.g., stroke and multiple sclerosis). We expect that this approach would allow clinicians and therapists to better distinguish individual differences when evaluating gait and balance in the laboratory or in the community, thereby paving the way for more data-driven diagnoses and treatment of mobility impairment.