1. Introduction
Human gait requires the complex interaction of the neurological, musculoskeletal and vestibular systems, making it a rich source of information for monitoring and assessment of their function. Common functional movement tests such as the Functional Gait Assessment and Berg Balance Scale, which are used to assess an individual’s postural stability and ability to perform multiple motor tasks, contain a significant number of gait-related activities [
1]. Compared to straight-line walking, these provide additional challenges to standard gait, for example by adding directed head movements and following a turning route. These reflect scenarios which people may find difficult in daily life, particularly when they have an underlying condition affecting movement. While there are multiple existing methods of instrumenting and analysing these tests using camera-based methods or often simply under direct clinical supervision, recent work has moved towards monitoring and assessing gait function using wearable sensors.
Wearable sensors have been deployed in a multitude of clinical and academic studies both to study movement directly and as an addition to other sensing modalities [
2,
3,
4,
5], as well as for larger-scale population behaviour [
6]. They are convenient for gait analysis since their use is not restricted to a specific location and they do not require the direct supervision of a clinician, while they require less effort to set up than a camera-based system. However, a wearable sensor must have a suitable form factor and be placed easily and consistently without interfering with a user’s natural gait pattern.
Wearable devices are often composed of inertial measurement units (IMUs), containing accelerometers, gyroscopes and sometimes magnetometers, and worn at a wide range of locations such as the feet, shank, trunk or wrist [
7,
8,
9,
10]. Each possible location provides a trade-off between wearability and availability of information about the aspect of gait being monitored, and this has been explored extensively in the previous literature [
11,
12]. IMU sensors placed at the ear, often referred to as ‘earables’, provide a suitable solution given these constraints, since the ear is a stable, comfortable and aesthetically acceptable place to house a sensor, which is reflected in the widespread commercial use of headphones and earbuds [
13,
14,
15]. Many individuals use hearing aids, especially those in ageing populations where movement disorders are more prevalent, and some commercially available earbuds and hearing aids now contain an IMU or accelerometer. Since the vestibular system is located in the inner ear, the head is a useful body segment to track in itself when monitoring individuals with balance disorders. Further, due to the direct coupling to the trunk, the signals produced by a head- or ear-worn sensor describe the movement of the centre of mass well [
16,
17] and even filter out some lower body motion artifacts to produce a clearer signal for gait event detection [
18]. The head’s location on the sagittal plane also means that the movement of both left and right sides of the body can be measured using the same system.
Gait analysis using IMUs involves processing of the signals produced by the accelerometer and gyroscope. While IMUs have been used to analyse many aspects of gait, most metrics used for gait analysis firstly require separation of the time series signal into gait cycles (consecutive initial ground contacts of the same foot) and subphases; hence, it is important that this is performed accurately. Further, the gait events themselves are used to calculate temporal parameters such as stance and swing times, which are clinically useful in their own right. There is a body of existing work on classifying gait events using head- and ear-worn IMUs. Jarchi et al. proposed a method based on singular spectrum analysis (SSA) with longest common subsequence (LCSS) on the anterior–posterior (AP) and mediolateral (ML) acceleration signals to determine initial contact (IC) and terminal contact (TC) events in level, downstairs and upstairs walking [
19]. Jarchi’s method, however, requires averaging over many cycles and was shown to perform poorly in other studies [
20]. Diao et al. applied a similar method using SSA to determine these events for both healthy and postoperative individuals during straight-line walking, but used the superior–inferior (SI) signal rather than the AP signal to identify ICs, achieving low error in stride, swing and stance time relative to a commercial IMU-based gait evaluation system [
21]. Seifer et al. improved on Diao’s algorithm by using the first dominant oscillation of the ML acceleration, which reduced the error in determining the laterality of each step [
20].
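The SSA step shared by these methods can be illustrated with a minimal sketch. It is not the implementation of Jarchi, Diao or Seifer: it only shows basic SSA (trajectory matrix, singular value decomposition, diagonal averaging) followed by a simple local-maximum search on the reconstructed dominant oscillation. The function names `ssa_components` and `local_maxima`, the window length and the choice of summing the two leading components are assumptions made for illustration; the published algorithms add further steps (axis selection, component selection, TC search) that are not reproduced here.

```python
import numpy as np

def ssa_components(signal, window, n_components):
    """Return the leading SSA components of a 1-D signal (one row each)."""
    x = np.asarray(signal, dtype=float)
    k = x.size - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window]
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    comps = []
    for i in range(n_components):
        elementary = s[i] * np.outer(u[:, i], vt[i])
        # Diagonal averaging: average each anti-diagonal back into a series
        flipped = elementary[::-1]
        comps.append([flipped.diagonal(d).mean() for d in range(-window + 1, k)])
    return np.array(comps)

def local_maxima(x):
    """Indices of strict local maxima (candidate gait events in this sketch)."""
    return [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]
```

A pure oscillation typically appears as a pair of SSA components with near-equal singular values, so the dominant oscillation in this sketch would be taken as `ssa_components(acc, window, 2).sum(axis=0)`, where `acc` is a hypothetical acceleration trace. The window length has to be chosen by the user, which is the manual parameter selection discussed under the limitations of this study.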
Xu et al. used a deep learning approach to account for short walking bouts; however, their method suffers from a high false positive rate and was tested with only thirteen participants in a non-natural walking scenario [
22]. Also applying a deep learning approach, Decker et al. recently achieved an F1 score of 99% for IC detection on level ground walking across a range of walking speeds; however, they only collected data from healthy individuals and scored relatively lower (91%) for TC events [
23]. Jung et al. validated five walking and six running gait parameters detected by an ear-worn system against a motion capture system and instrumented treadmill, though they did not develop a new gait parameter estimation algorithm [
24]. Hwang et al. noticed a pattern of peaks in the SI signal mapping to foot-off, producing a method to detect ICs and TCs from only the SI acceleration. However, this was only tested on seven young, healthy participants and made a large assumption about the number of peaks per cycle, which varies depending on the participant’s natural gait pattern and the nature of the surface being walked on [
25].
While several of these methods have been shown to perform well in straight-line walking, none have been tested in more challenging scenarios such as during turning and while walking with head turns. These have been shown to uncover underlying movement-related deficiencies which do not appear during straight-line walking and are therefore powerful clinical tools [
7,
26,
27]. Head-worn IMUs have been used to investigate the coupling of head and trunk during turning, though not for gait event detection [
28]. Other studies have investigated gait event detection during turning movements with IMUs; however, these were placed at the shank [
7], feet [
8] and lower trunk [
27]. Further, the majority of studies in this field lack diversity in age, gender and level of movement function, with a bias towards young, typically functioning males. Movement patterns vary greatly across these cohorts, making it crucial to collect data from a representative participant group.
Developing algorithms for an ear-worn IMU-based system which are robust to turning and walking with directed head turns provides a means to quantitatively analyse these tasks in less restrictive settings. This provides a basis to develop a more scalable system to improve access to clinical tests which include these movements. Therefore, this study seeks to provide the following contributions:
Evaluation of gait event detection using an ear-worn IMU system during complex gait movement involving head turns against an optical ground truth.
Comparison of ear-worn IMU gait detection algorithm performance for a large participant cohort, involving both typically functioning individuals and those with an underlying movement disorder.
Proposal and evaluation of a new gait event detection algorithm, TP-EAR (Temporal Parameters from the EAR), which provides more robust TC detection and improved estimation of stance and swing time during complex gait tasks.
4. Discussion
This study investigated the performance of two algorithms for detecting gait events during complex walking tasks using ear-worn IMUs. These were tested on data collected from 68 participants across a broad range of age, sex and size, and the sample crucially included 18 participants with a known movement disorder. Temporal parameters, namely stride, stance and swing times, were calculated using the detected events and compared with a ground truth labeled using optical data.
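As a reminder of how these temporal parameters follow from the detected events, the sketch below computes them for a single foot using the standard definitions: stride time between consecutive ICs of the same foot, stance time from IC to the following TC, and swing time from that TC to the next IC. The function name `temporal_parameters`, the millisecond units and the decision to skip cycles without a TC are assumptions for illustration rather than details taken from the study's pipeline.

```python
def temporal_parameters(ic_times, tc_times):
    """Stride, stance and swing times for ONE foot.

    ic_times, tc_times: sorted event times (e.g. in ms) for the same foot.
    Cycles with no TC between two consecutive ICs are skipped.
    """
    strides, stances, swings = [], [], []
    for ic, next_ic in zip(ic_times, ic_times[1:]):
        # TC of this cycle: first terminal contact between the two ICs
        tcs = [tc for tc in tc_times if ic < tc < next_ic]
        if not tcs:
            continue  # missed TC: no stance/swing can be computed
        tc = tcs[0]
        strides.append(next_ic - ic)   # IC -> next IC
        stances.append(tc - ic)        # IC -> TC
        swings.append(next_ic - tc)    # TC -> next IC
    return strides, stances, swings
```

Because stance and swing times depend on both event types, any error in TC detection propagates into them directly, whereas stride time depends on ICs alone.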
4.1. Comparison by Activity
4.1.1. Sensitivity
Both algorithms generally performed strongly for IC sensitivity, with half of the activity/group combinations scoring 99.9% or higher. Given that both algorithms used SSA and peak identification to find ICs, it follows that the sensitivity was generally similar between them. The scores align with those reported by [
20], who reported IC sensitivity above 99% in their implementation of the Diao algorithm for straight-line walking, and [
25], who reported a 0.24% MAE in step counting around an athletics track using a head IMU. For Walk and WalkH, both algorithms missed very few IC events; however, both performed relatively worse for WalkV and Turn when averaged across both participant groups. Since WalkV incorporates vertical head movements, which add an extra component to the SI signal, it makes sense that the algorithms performed slightly worse here. Similarly, the turn interrupts the straight-line gait signal, which makes it harder for the SSA algorithm to identify the dominant oscillation. That said, both algorithms still identified the vast majority of IC events correctly. TP-EAR tended to achieve slightly higher sensitivity than Diao, particularly for the more challenging WalkV and Turn, which supports the use of the underlying SI signal peak rather than simply using the dominant oscillation peak. Further, the IC sensitivity of both algorithms exceeded the 95–97% IC precision reported by Romijnders et al., who used shank-mounted IMUs to measure gait events during turning [
7]. This suggests that they would be suitable for counting steps completed during a turn, which has been shown to differentiate between mildly impaired Parkinson’s Disease patients and age-matched controls [
35].
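Sensitivity figures of this kind are obtained by pairing detected events with ground-truth events and counting how many true events were found. The sketch below shows one simple way to do this, using greedy one-to-one matching within a tolerance. The function names `match_events` and `sensitivity`, the greedy strategy and the default tolerance (set to the 300 ms cut-off mentioned elsewhere in this discussion) are assumptions for illustration, not a description of the exact matching procedure used in this study.

```python
def match_events(detected, truth, tolerance_ms=300):
    """Greedily pair each ground-truth event with its nearest unused detection."""
    remaining = sorted(detected)
    pairs = []
    for t in sorted(truth):
        candidates = [d for d in remaining if abs(d - t) <= tolerance_ms]
        if candidates:
            best = min(candidates, key=lambda d: abs(d - t))
            pairs.append((t, best))
            remaining.remove(best)  # each detection can be used only once
    return pairs

def sensitivity(detected, truth, tolerance_ms=300):
    """Fraction of ground-truth events with a matched detection: TP / (TP + FN)."""
    if not truth:
        raise ValueError("sensitivity is undefined without ground-truth events")
    return len(match_events(detected, truth, tolerance_ms)) / len(truth)
```

Sensitivity counts only missed events; spurious detections would be captured by precision instead, which is why the two measures are reported separately in the literature cited above.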
Much greater differentiation took place when comparing the TC events. TP-EAR was highly sensitive across all activities, scoring higher for all events than the 97.5% sensitivity reported by [
20] during straight-line walking and comparably to the 99% precision for healthy participants reported by [
36]. However, while Diao was highly sensitive for Walk and WalkV (the lower result for WalkV is caused by compounding of the missed IC events), its sensitivity dropped off slightly for WalkH and more so for Turn. The high sensitivity for Walk and WalkV would be expected since there was no change to the prescribed movement pattern in the ML direction. The relatively high TC sensitivity for WalkH, where the horizontal head turning directly affected the ML acceleration signal, may have been because, although the pattern was changed, there were still peaks/troughs within 300 ms of the IC which the algorithm interpreted as TC events. Therefore, the poorer performance of the Diao algorithm on this activity was reflected more in the higher stance and swing time errors. Meanwhile, the Diao algorithm’s poor performance on Turn was likely due to a combination of the unusual foot placement and the extra disruption of the ML signal due to the imbalance imposed by the turning motion, which was avoided by TP-EAR since it relied on the SI signal.
4.1.2. Laterality Accuracy
The method of laterality detection was the same for both algorithms; therefore, the results were largely comparable. Small differences stemmed from the adapted IC detection algorithm used by TP-EAR, which changed the number of detected events and the location of the IC in the time series signal. The main finding was the large drop-off in laterality detection accuracy for WalkH, especially for the typically functioning group. This was expected since the horizontal head movements caused the ML signal to differ significantly from that expected during straight-line walking. During turning in the typically functioning group, the laterality accuracy was much higher than for WalkH but lower than for Walk and WalkV. Given that there is some disruption to the ML signal over the course of the turn, but less than in WalkH, it follows that laterality detection should be slightly worse than for Walk but better than for WalkH. However, the laterality detection during turning reached 100% in the non-typically functioning group, which may reflect differences in their turning strategy (discussed more in
Section 4.3).
4.2. Temporal Parameters
Both algorithms produced consistent stride time errors across Walk, WalkV and WalkH, with MAEs between 10 ms and 17 ms for the typically functioning participants and between 19 ms and 37 ms for the non-typically functioning participants. This is comparable with the stride time MAE reported using the Diao improved algorithm in [
20] and to Zijlstra et al.’s lower trunk approach (between 2 ms and 15 ms depending on speed) [
10]. Since the sampling interval is 10 ms, the MAEs observed here equate to between one and four samples. The TP-EAR algorithm showed lower MSE magnitudes for these activities than Diao (in fact, they were nearly zero), suggesting that using the underlying SI signal peak removes a bias introduced by using the dominant oscillation.
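The distinction between the two error measures can be made concrete with a short sketch. It assumes that MSE here denotes the mean signed error (the average of estimate minus reference, which exposes bias), as opposed to the MAE, which measures the typical error magnitude; that reading, the function name `error_summary` and the dictionary keys are assumptions for illustration. The conversion to samples uses the 10 ms sampling interval stated above.

```python
def error_summary(estimated_ms, reference_ms, sample_interval_ms=10):
    """Mean absolute error, mean signed error (bias) and MAE in samples."""
    if len(estimated_ms) != len(reference_ms) or not estimated_ms:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [e - r for e, r in zip(estimated_ms, reference_ms)]
    n = len(errors)
    mae = sum(abs(x) for x in errors) / n
    mean_signed = sum(errors) / n  # positive: overestimation on average
    return {
        "mae_ms": mae,
        "mean_signed_ms": mean_signed,
        "mae_samples": mae / sample_interval_ms,
    }
```

An estimator can therefore have a non-zero MAE while its mean signed error is close to zero, which is the pattern described for TP-EAR: individual errors remain, but they do not lean systematically early or late.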
The stride time MAE for Turn was greater for both algorithms, between 40 ms and 50 ms, while the MSE showed a negative bias for both participant groups. This reflected the added difficulty for the SSA-reconstructed component to capture the change in SI signal during the turn where the nature of foot contact, and therefore the location of each IC, changed. The stride time MAE was much higher than that reported by [
11] using lower leg IMUs on a slalom course (between 9.4 ms and 15.2 ms depending on algorithm and location); however, this may be due to the turn in this study being much sharper and hence more disruptive to gait. The mean and range of the stride time values calculated by the IMU still align well with the ground truth (shown in
Appendix B), suggesting it still adequately captures the underlying pattern.
Both the mean and standard deviation in difference between IMU and ground truth were higher for stance and swing times, which depend on TC as well as IC accuracy. This was expected since lower accuracy and precision are frequently reported for TC versus IC events, even when using the trajectories from optical markers placed on the feet [
37]. The increase in error was particularly evident in WalkH, where the TC detection was more difficult due to the disruption to the ML acceleration signal. While the difference in AE and SE is notable, it is likely that this would be much greater if the missed events were taken into account, since TP-EAR missed fewer events. The MAE across the other events was comparable between the two algorithms, though the magnitude of SE was generally much lower for TP-EAR, suggesting it was much less biased.
Across all activities, there were a handful of large outliers, some of which were larger than 300 ms, which are apparent when observing
Figure 6 and
Figure 7. These occurred when a gait event was detected early and the following event was detected late (or vice-versa). If one or both of these events was wrongly estimated by a margin close to the cut-off (300 ms), the calculated temporal parameter could therefore be out by more than 300 ms. While this happened rarely, depending on the application scenario it may help to incorporate a threshold to exclude known outlier values, for example when calculating average temporal parameters across many trials for a given individual.
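One possible form of such a threshold is sketched below for the averaging case. Since no ground truth is available in deployment, the sketch excludes values that deviate from the individual's median by more than a fixed margin before averaging. The function name `mean_without_outliers`, the use of the median as the reference and the 300 ms default are assumptions for illustration; a suitable threshold would need to be chosen for the application.

```python
from statistics import mean, median

def mean_without_outliers(values_ms, max_dev_ms=300):
    """Average temporal parameters after dropping values far from the median.

    Returns (mean of kept values, list of kept values).
    """
    if not values_ms:
        raise ValueError("no values to average")
    centre = median(values_ms)
    kept = [v for v in values_ms if abs(v - centre) <= max_dev_ms]
    return mean(kept), kept
```

A fixed margin is simple to reason about, but it could also discard genuinely irregular strides, so for populations with highly variable gait the excluded values may be worth inspecting rather than silently dropping.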
4.3. Participant Group Comparison
For Walk and WalkV, the sensitivity and laterality determination were slightly lower for non-typically functioning participants when compared to the typically functioning participants, and the AE and SE were also slightly higher. This was most noticeable for WalkV with the Diao algorithm, though less so with TP-EAR, which suggests that the vertical head movements contributed to an offset in the dominant SI component which could be compensated for by TP-EAR.
Contrary to what may be expected, the sensitivity of both algorithms improved for the Non-Typical group for WalkH and Turn, and TP-EAR attained 100% sensitivity for both IC and TC. For WalkH, this may be due to differences in the timing or intensity of the horizontal movements of the head while performing the task between groups, which may be supported by the fact that they spent relatively longer in stance phase compared to Walk than the typically functioning group (as shown in
Table A7), and that the laterality determination accuracy is much higher, indicating less change to the ML signal.
The better performance of both algorithms on the Non-Typical group for Turn may be due to the greater tendency for the typical group to adopt more aggressive turning strategies. Courtine et al. [
31] found that trunk roll during turning is greater at higher speeds, which would provide greater disruption to the signal. Observing the optical data showed that some typically functioning participants landed on their toes and/or pivoted strongly around one foot to complete a tight turning circle, while this was not the case for the non-typically functioning group. In contrast, the 100% laterality determination accuracy for Non-Typical participants implies that their ML signal was not significantly disrupted. This may reflect a reluctance among the non-typically functioning group to disrupt their walking pattern when planting their feet during the turn, though this should be investigated further.
Ultimately, gait stability assessment is used to benefit those with impaired function, so the fact that both algorithms performed relatively better for this group during tasks with horizontal head movements may improve their suitability for clinical use.
4.4. Limitations
This study had several limitations. Firstly, although the cohort was diverse and included a significant number of participants with known movement disorders, none of these disorders were severe. Including participants with higher levels of impairment would be a greater test of the algorithms’ ability to handle non-typical gait. Future work could also test the TP-EAR algorithm with very slow gait to see whether the peak corresponding to TC is still sufficiently prominent to be detected, as its amplitude has been shown to decrease with slower speeds [
10].
Another limitation was the necessity of manual selection of the window length parameter when applying the SSA algorithm. To the authors’ best knowledge, this has not been discussed in the context of gait analysis using IMUs; however, there is existing research on finding the mathematically optimal window size for a given signal [
38], which could provide an avenue for improving current SSA-based gait analysis implementations.
Additionally, while the scope of this study was to perform gait event detection for complex walking tasks used during functional assessments, it would be desirable to integrate this into in-the-wild monitoring. The set of activities performed was relatively contrived, so testing the algorithms in a less structured setting would allow assessment of their performance for more natural movements. Further to this, implementing the algorithm to run on-device and in real time would improve its transferability to a home-monitoring system. While running SSA is relatively computationally expensive since it relies on singular value decomposition, there exist efficient implementations which run on embedded devices [
39].
Future work should look more thoroughly into the analysis of turning using ear-worn IMUs. It would be beneficial, for instance, to recreate the turning path and speed using the IMU signals, either to give context to the detected gait events or to look more deeply into the strategies adopted by different participant groups. Analysis could also be extended to the sit-to-stand and stand-to-sit sections of the full timed up-and-go trial, which provide further movement-related challenges. While some work has been completed on this using glasses mounted with an IMU, it has yet to be implemented using an ear-worn IMU [
40].