Inertial Sensor Reliability and Validity for Static and Dynamic Balance in Healthy Adults: A Systematic Review

Compared to laboratory equipment, inertial sensors are inexpensive and portable, permitting the measurement of postural sway and balance to be conducted in any setting. This systematic review investigated the inter-sensor and test-retest reliability, and the concurrent and discriminant validity, of inertial sensors for measuring static and dynamic balance in healthy adults. Medline, PubMed, Embase, Scopus, CINAHL, and Web of Science were searched to January 2021. Nineteen studies met the inclusion criteria. Meta-analysis was possible for reliability studies only, and it found that inertial sensors are reliable for measuring static standing balance with eyes open. A synthesis of the included studies shows moderate to good reliability for dynamic balance. Concurrent validity is moderate for both static and dynamic balance. Sensors discriminate old from young adults by amplitude of mediolateral sway, gait velocity, step length, and turn speed. Fallers are discriminated from non-fallers by sensor measures during walking, stepping, and sit to stand. The accuracy of discrimination could not be determined conclusively. Using inertial sensors to measure postural sway in healthy adults provides real-time data collected in the natural environment and enables discrimination between fallers and non-fallers. The ability of inertial sensors to identify differences in postural sway components related to altered performance in clinical tests can inform targeted interventions for the prevention of falls and near falls.


Introduction
Postural control of balance is essential for keeping upright, moving effectively, and reacting to environmental challenges [1]. Good balance improves quality of life and wellbeing. Conversely, balance deficits can lead to a near fall or fall that may result in physical, psychological, or social consequences and, in some cases, death [2]. Near falls occur due to a loss of balance from a slip, trip or stumble where a fall is avoided "because a corrective action is taken to recover balance" [3] (p. 49). Although near falls are a predictor for falls [4], there is limited research concerning near falls, resulting in an unknown trajectory of the decline from near falls to falls [5]. People living in the community who have near falls and do not sustain an injury escape the attention of the health system. However, they are the group most likely to benefit from interventions to prevent falls. Until recently, having a fall has been the best predictor of having another fall. Recent evidence has identified clinical tests, namely single leg stance, lunge, and tandem walk five steps, that are able to discriminate near-fallers from fallers and non-fallers [6]. While gross changes in the performance of these tests are associated with falls history, there is no understanding of the contribution of postural sway to these outcomes.
Postural sway, the movement of the body over the base of support, is an indicator of balance. The traditional methods of measuring the speed, direction, and amplitude of postural sway by force plates or motion capture in gait laboratories have been superseded by wearable inertial sensors, with recent interest in their measurement of standing balance [7] and gait [8]. Compared to the laboratory equipment, inertial sensors are inexpensive, portable, and permit measurements of postural sway to be taken in any setting specific to the population under investigation [9]. Additionally, wearable inertial sensors are small, lightweight, unobtrusive, and can be fixed on the body by tape, belt, or strap. Sensor data can be captured on three axes and can therefore provide detailed information in three dimensions of subtle changes in postural sway for static or dynamic conditions. Inertial sensor measures of sway can discriminate between various age groups, and between healthy adults and adults with Parkinson's disease [10], multiple sclerosis [11], and other neurological conditions [12]. Falls risk assessment by wearable inertial sensor is more sensitive than clinical testing using the timed up and go [13]. However, the reliability and validity of inertial sensors to measure postural sway is still unclear [14], especially in seemingly healthy populations without known pathology who experience near falls and falls.
Therefore, the aim of this systematic review was to examine and synthesize the current literature on the validity and reliability of wearable inertial sensors to measure postural sway in healthy adults undertaking static and dynamic balance tests.

Search Strategy
Three stages of searches were undertaken, following PRISMA guidelines [15]. The first stage was to identify systematic reviews that investigated 'postural balance', 'inertial sensors', and 'reproducibility of results' via the reference list of a scoping review of systematic reviews previously conducted [16]. This search identified five systematic reviews [3,12,14,17,18]. One further relevant systematic review [7] was published after the scoping review went to press. The critical appraisal of these six recent systematic reviews [3,7,12,14,17,18] was undertaken by two independent reviewers, with a third person to mediate in the case of disagreement. None of these reviews directly answered the aims of this study. Therefore, a new search was conducted as stage two.
The existing systematic reviews assisted the development of search strategies, terms, and dates. Three main concepts informed keywords, MeSH, and search terms: 'postural control', 'inertial sensors', and 'validity/reliability' (see Appendix A for full list of search terms). Relevant truncations and expansions were applied for each database, which included Medline, PubMed, Embase, Scopus, CINAHL, and Web of Science. The dates for searching were from January 2019 to January 2021. Searches were conducted by a research librarian experienced in conducting systematic reviews.
Selection criteria followed PICO (population, intervention, comparison, outcome) principles as follows: (P) healthy adults including healthy adults as a control group; (I) wearable inertial sensor to measure static and dynamic balance; (C) force plates, motion capture or other digital or clinical measure; (O) reliability, validity, accuracy. Exclusions were for papers published with children or non-human subjects, balance or equilibrium other than postural, postural alignment, pressure sensors, and studies that investigated only static or dynamic balance, not both. Papers published before 2010 were excluded on the basis of technological advances in sensor manufacture in the past 10 years. Smartphone use was excluded because of the need to hold a device in the hand, thereby altering natural arm movement for balance maintenance or recovery [19]. Moreover, the range of balance tests interpretable by phone does not incorporate novel balance tests, such as tandem walk and lunge [6]. Only primary investigation studies were incorporated, including conference proceedings if peer reviewed. Language was limited to English.
The third search examined the reference lists of the included studies and the six systematic reviews for relevant studies that fitted the inclusion criteria.

Eligibility, Quality and Data Extraction
Two independent reviewers screened titles and abstracts against the selection criteria prior to full text review. A third author was available for arbitration but was not required. All search information was managed using Covidence systematic review software. Critical appraisal of the internal and external validity of included studies was undertaken using the JBI critical appraisal checklist for analytical cross-sectional studies [20].
The first two authors extracted data from the first five studies into Excel and crosschecked for accuracy. The first author then extracted the remainder of the data, which were checked for thoroughness by the third author.

Data Pooling
Data pooling was multistage. Studies were initially grouped broadly to validity or reliability, then refined within these two contexts. Validity was categorized as concurrent (compared to gold standard), discriminant (able to distinguish between groups), and convergent (related to the clinical measure). Reliability was categorized as internal consistency (inertial sensor accurately measures postural sway) or test-retest reliability (sensor data replicates the results of the same postural sway activity in the same person at two timepoints). Balance activities were dichotomized to static or dynamic tests, then further refined to sort into the same measurement outcomes, e.g., single leg stance for static balance; timed up and go for dynamic. Finally, the outcome measures for validity and reliability were grouped, e.g., Pearson's rho for validity; intraclass correlations for reliability. Heterogeneity was examined using τ², I², and Cochran's Q statistic using the interpretations: τ² = 0 suggests no heterogeneity; I² values <25%, 26-50%, and >75% suggest low, moderate, and high heterogeneity respectively; and a significant Q statistic indicates that the studies do not share similar effects [21].
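The three heterogeneity statistics named above can be illustrated with a short sketch. It assumes inverse-variance weighting and the DerSimonian-Laird estimator for τ², neither of which is stated in this review; the function name and the example numbers are hypothetical and are not data from the included studies.

```python
def heterogeneity(effects, variances):
    """Return (Cochran's Q, I^2 in percent, DerSimonian-Laird tau^2).

    effects   -- one effect estimate per study
    variances -- the sampling variance of each estimate
    Assumption: inverse-variance weights and the DerSimonian-Laird
    estimator; the review does not state which estimator was used.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    # Fixed-effect pooled estimate, used as the reference point for Q.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to between-study differences.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # tau^2: between-study variance, truncated at zero.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, i2, tau2


# Hypothetical example: two studies with equal precision.
q, i2, tau2 = heterogeneity([0.0, 1.0], [0.25, 0.25])
print(q, i2, tau2)
```

When all studies report the same effect, Q is zero and both I² and τ² are truncated to zero, which corresponds to the "no heterogeneity" reading given above.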

Statistical Analysis
Interrater agreement between two reviewers was captured at three stages, namely title/abstract screen, full text inclusion, and reference list inclusion. Rater agreement was analysed using Cohen's kappa with agreement values interpreted as ≥0.81 excellent, 0.61-0.8 good, 0.41-0.6 fair, and ≤0.4 poor [22]. For meta-analysis, homogeneity of balance activity, sensor location, and measurement outcome was required [23]. Where heterogeneity prevented meta-analysis, synthesis of the data was conducted.
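As an illustration of the agreement statistic, the sketch below computes Cohen's kappa for two raters and maps it onto the interpretation bands quoted above. The function names and the screening decisions in the example are hypothetical; the review's own calculation is not described beyond the statistic used.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each labelled the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must label the same non-empty item list")
    labels = set(rater_a) | set(rater_b)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def interpret_kappa(kappa):
    """Bands as quoted in the text: >=0.81, 0.61-0.8, 0.41-0.6, <=0.4."""
    if kappa >= 0.81:
        return "excellent"
    if kappa >= 0.61:
        return "good"
    if kappa >= 0.41:
        return "fair"
    return "poor"


# Hypothetical screening decisions for four records.
a = ["include", "include", "exclude", "exclude"]
b = ["include", "exclude", "exclude", "exclude"]
print(cohens_kappa(a, b), interpret_kappa(cohens_kappa(a, b)))
```

The quoted bands leave small gaps (for example between 0.8 and 0.81); the sketch resolves these by assigning such values to the lower band.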

Results
The search strategy identified 5430 articles. Following duplicate removal, as well as screening of titles, abstracts, and full text, 19 articles met the inclusion criteria. One paper repeated a previous study with different analysis and was therefore excluded [24] (see PRISMA flow diagram, Figure 1).
Dynamic balance was assessed with postural transitions (sit to stand, stand to sit, transfer chair to chair), stepping (first step, step up), walking for a set time or distance, running, turning around, and jumping forward and sideways (Table 1). The outcome measures to evaluate these activities were varied. For example, gait was measured by step length, velocity, regularity, height, length, continuity, or symmetry; stride length or velocity; cadence and/or stance time. No single clinical test was used consistently. Dynamic clinical measures also included walking tests (timed up and go [31,33,36,37,42], 10 m walk [41], six-minute walk test [34], and 25-foot walk test [37]) and jumping (dynamic postural stability index (DPSI)) [25].
When both static and dynamic balance were assessed in a single balance test, they were measured by the Berg balance scale [28,31,42] and MiniBEST [31,34]. The time or distance within standardized tests differed between studies, e.g., in the TUG, the standard 3 m walking distance was increased to 7 m to provide more consistent data for gait parameters [36,37], and included additional single or dual tasks [31]. Static balance data from the sensors were analysed by multiple methods, most commonly root mean square (RMS) of acceleration, but also maximum, minimum, or mean of acceleration, jerk, various measures of velocity, and Euclidean norm minus one (ENMO), providing challenges in grouping for meta-analysis. Dynamic balance similarly had multiple different analyses of step and stride length, stance time, cadence, and velocity.
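Two of the static sway measures named above, RMS of acceleration and ENMO, can be sketched as follows. This is a minimal illustration under stated assumptions: acceleration is expressed in units of g, the RMS is taken after removing the mean of the trace, and ENMO follows the common definition in which negative values are clipped to zero. The included studies may have pre-processed their signals differently, and the function names are hypothetical.

```python
import math


def rms(samples):
    """Root mean square of a single-axis acceleration trace.

    Assumption: the mean is removed first so that a constant offset
    (for example gravity on a tilted sensor) does not inflate the value.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    mean = sum(samples) / len(samples)
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))


def enmo(ax, ay, az):
    """Mean Euclidean norm minus one (in g) over a triaxial recording.

    Assumption: negative values are clipped to zero, as in the common
    definition of ENMO.
    """
    if not ax or not (len(ax) == len(ay) == len(az)):
        raise ValueError("axes must be non-empty and of equal length")
    values = [
        max(0.0, math.sqrt(x * x + y * y + z * z) - 1.0)
        for x, y, z in zip(ax, ay, az)
    ]
    return sum(values) / len(values)


# Hypothetical mediolateral trace and a two-sample triaxial recording.
print(rms([1.0, -1.0, 1.0, -1.0]))
print(enmo([0.0, 0.0], [0.0, 0.0], [1.0, 2.0]))
```

Because the two measures summarize different aspects of the signal (variability on one axis versus overall magnitude above gravity), results expressed in one cannot be pooled directly with results expressed in the other.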

Quality Assessment
All included papers were observational studies. Quality assessment used the JBI analytical cross-sectional study critical appraisal checklist (see Table 2 Quality Assessment). All papers described the exposure, outcomes, and appropriate data analysis methods in detail. All studies but one [39] used standard, objective criteria. However, four studies lacked explicit selection criteria [27,29,38,39] and several provided no detail of the setting for the study [25-27,29,32,34,37,39,40,42,43], which impacts replicability. Five of the papers provided no identification or management of confounding factors [25,26,29,34,38], which impacts the trustworthiness of the results in these papers. Interrater agreement between two reviewers for screening, full text, and reference list selections was analysed using Cohen's kappa, with a result of k = 0.805 interpreted as a good result.

Sensors
Sensor type, number, position, fixation, sampling frequency, and calibration methods differed between studies as outlined in Table 3. Only one study used a dual axis accelerometer (antero-posterior and mediolateral) [32] while the remainder used triaxial sensors, providing accelerometry data for the additional vertical plane. Thirteen studies also used inertial sensors with inbuilt gyroscopes providing further rotational velocity information [26,28,29,33-37,41-43]. The most common inertial sensors were Opals and XSens, where accelerometry data measured concurrent input from multiple sensors placed on the trunk and extremities. Various options for sensor body position and fixation were identified between studies. The preferred position for a sole sensor was on the lumbar spine [25,28,40,42] as this position corresponded closest to the centre of gravity of the body. Further, sensors situated on the low lumbar spine produced greater accuracy than thoracic sensors [27]. Studies using multiple sensor systems located them on the lower back, sternum, wrists, and ankles. Methods of fixation were not described in nine studies (47%). When stated, fixation by elasticated belts or bands [25,30,34-36,39,40] or adhesive tape [27,28,42] was the preferred method. Two papers discussed movement artefacts [25,42]. However, only one excluded data due to sensor movement [42]. Sampling frequency ranged from 20 to 400 Hz, although Velazquez-Perez [41] provided no information on this. Only one paper [31] described down-sampling, which is the process of reducing the sample rate of a signal to manage the size of data. Accelerometry and gyroscopic data were analysed using sensor-specific tools [27,30], or in programs such as MATLAB [25-29,31,32,34,35,37,38,42,43] or Mobility Lab [33,34,36]. The statistical program 'R' was used in one study [39] and STATISTICA in another [41].
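Down-sampling as defined above can be illustrated with a minimal sketch that reduces the sample rate by an integer factor. It assumes block averaging, which acts as a crude low-pass filter before the rate is reduced; this is a swapped-in technique for illustration and is not the procedure of the study that reported down-sampling [31], whose method is not described here. The function name is hypothetical.

```python
def downsample(samples, factor):
    """Reduce the sample rate by an integer factor using block averaging.

    Each output value is the mean of `factor` consecutive input samples;
    trailing samples that do not fill a complete block are dropped.
    Assumption: block averaging is adequate anti-aliasing for this
    illustration; a dedicated low-pass filter would normally be preferred.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    usable = len(samples) - len(samples) % factor
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]


# Hypothetical one-second recording at 400 Hz reduced to 100 Hz.
signal_400hz = [float(i % 8) for i in range(400)]
signal_100hz = downsample(signal_400hz, 4)
print(len(signal_400hz), len(signal_100hz))
```

Under these assumptions a 400 Hz recording reduced by a factor of four holds a quarter of the original samples, which shows how down-sampling manages data size across the 20 to 400 Hz range reported above.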

Validity
The validity of the inertial sensor to measure balance was explored through concurrent (compared to gold standard), discriminant (able to distinguish between groups), and convergent (related to the clinical measure) validity. Data pooling was not possible for meta-analysis concerning validity due to the variety of protocols and outcome measures undertaken.
Regarding sensor position, a single lumbar spine sensor identified significant differences between younger and older healthy adults [32,42] and was able to distinguish the different dynamic balance tasks of lateral and forward jumps [25]. Further, the single sensor was as accurate as the six-sensor array [33].
Convergent validity evaluated the inertial sensor balance measures against clinical balance tools. The six-sensor array identified differences between the study group and healthy controls when observed, whereas timed clinical tests could not [33,37].
Test-retest reliability showed reasonable consistency between studies. Meta-analysis was possible when static stance incorporated feet apart eyes open, measured by RMS of acceleration ML and AP, and when intraclass correlations were undertaken for statistical analysis [25,34,36]. Results from the grouped studies produced high homogeneity (I² = 0.0%) with similar effects (Cochran's Q non-significant, 0.14), indicating trustworthiness of the sensors to measure static balance (Figure 2). However, the lower quality of two of the included papers [25,34] (Table 2) influenced the strength of findings. Therefore, meta-analysis results were considered informative rather than conclusive. Measurements of static sway distance, sway area, path length, mean velocity, and RMS were reliable, indicated by moderate to good correlations ranging between ICC 0.57 and 0.79 [34,36]. Although dynamic balance was measured in diverse balance tasks, all test-retest parameters of dynamic balance produced moderate to excellent correlations (ICC 0.696-0.94), indicating strong correlations and good reliability in healthy adults [25,34,36].
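The intraclass correlations reported above can be illustrated with a sketch of one common form, ICC(2,1) (two-way random effects, absolute agreement, single measurement). The ICC model used by each included study is not stated in this section, so the choice of form is an assumption, and the function name and example scores are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) for test-retest data.

    scores -- list of rows, one per participant, each row holding one
              value per test session.
    Assumption: two-way random effects, absolute agreement, single
    measurement; the included studies may have used other ICC forms.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 participants and >= 2 sessions per row")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("scores have no variance")
    return (ms_rows - ms_err) / denom


# Hypothetical sway values for three participants tested twice, with a
# constant shift of one unit between sessions.
print(icc_2_1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]))
```

In this form a systematic shift between sessions lowers the coefficient even when participants keep the same rank order, which is why the choice of ICC model matters when values from different studies are compared.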

Discussion
The aim of this systematic review was to investigate and synthesize the validity and reliability of wearable inertial sensors to measure postural sway in static and dynamic balance for healthy adults. Test-retest reliability results were consistently moderate to excellent for static and dynamic balance across the included studies. Meta-analysis was impossible for the validity studies due to heterogenous samples and methods. However, the synthesis showed moderate to good validity overall. These findings indicate consistency against gold standard equipment for measures of ML and AP sway in static balance and step time, step length, and gait velocity for dynamic balance. While the sensors were able to discriminate young from old, and fallers from non-fallers, the accuracy of discriminating healthy controls from diagnostic groups varied between studies.
The variability in equipment included multiple types of sensor. While all studies used accelerometer data, fewer included gyroscope data, suggesting data from accelerometers may be sufficient for clinical interventions. This reduced complexity may encourage more clinicians who are unfamiliar with the technical aspects of the new equipment to integrate sensors into practice. The multiple strategies for data acquisition, feature extraction, signal processing, and data analysis presented a heterogenous mix unsuitable for meta-analysis.
There was no consistent number of sensors or sensor placement position. However, the lumbar spine (L3-L5) was the preferred site overall. A single inertial sensor was as reliable as multiple sensors when placed near the centre of mass (L3-L5) and showed moderate to good validity and test-retest reliability for both static and dynamic balance. A single sensor placed over the centre of mass would provide simplicity in the clinical setting, particularly during telehealth interactions when instruction, observations, and interventions are provided remotely. A single sensor also aligns with recent literature for identifying differences between fallers and non-fallers [44]. However, using different placements for static and dynamic balance activities [29], and different body positions for sensor fixation, created challenges with pooling data. While different research questions demand different types of analysis, the standardization of the sensor position would permit a comparison of results across studies. The validity of sway measures from wearable inertial sensors compared to the gold standard force plates or motion capture provided promising results across studies. These results concur with a previous scoping review of systematic reviews [16] as well as recent studies investigating the concurrent validity of sensors to measure balance in healthy adults [8,45-47]. In healthy populations, this indicates that inertial sensors provide valid data when used in home and community settings [48]. This provides flexibility for clinical treatment and trials, particularly in rural and remote settings, or during social distancing such as with COVID-19 [49]. Importantly, it ensures that performance during testing is not altered by an unfamiliar environment. Therefore, these findings provide reassurance that the sensors are a valid proxy for the gold standard as a means of measuring static and dynamic balance in the community.
Sensors were valid in discriminating sway between younger and older participants, reinforcing the sway changes that occur due to ageing [50]. Sensors also discriminated fallers from non-fallers. The sensor data discriminated sub-tasks within clinical tests such as separating components for the timed up and go into the sit-to-stand, walk straight, turn, and stand-to-sit, which is consistent with previous findings in timed up and go [51].
While several studies measured the sway differences between fallers and non-fallers, no studies investigated differences between non-fallers, fallers, and those who had experienced near falls. As this review provides evidence that sensors can identify subtle changes in sway between healthy people of different ages, it is possible that sensors may identify sway differences between near fallers, fallers, and non-fallers. The early detection of subtle changes in postural sway is required to identify the risk of near falls [4,52] and can be measured reliably and with confidence of validity using inertial sensors.
The main limitation of this investigation was the inability to pool included studies for meta-analysis due to heterogeneity in balance activity, sensor location, and measurement outcomes. Additionally, restricting inclusion to articles written in English may be considered a limitation.

Conclusions
Measuring postural sway using inertial sensors in healthy adults permits assessment and treatment in the person's natural environment, providing reassurance of accurate measures during times of social distancing. The ability to identify separate components of clinical tests using sensors permits the detection of subtle sway changes that may contribute to understanding sway differences for near falls as well as falls. Further research is required to evaluate the convergent validity of using a single sensor over the centre of mass rather than a six-sensor array for clinical balance tests such as the timed up and go test. Similarly, further research using a single sensor to discriminate sway differences between healthy and diagnostic groups, distinct age groups, and fallers/non-fallers would encourage the clinical uptake of sensors.