Reliability and Validity of Running Cadence and Stance Time Derived from Instrumented Wireless Earbuds

Instrumented earbuds equipped with accelerometers were developed in response to limitations of currently used running wearables regarding sensor location and feedback delivery. The aim of this study was to assess test–retest reliability, face validity and concurrent validity for cadence and stance time in running. Participants wore an instrumented earbud (new method) while running on a treadmill with embedded force plates (well-established method). They ran at a range of speeds and performed several instructed head movements while running at a comfortable speed. Cadence and stance time were derived from raw earbud and force-plate data and compared within and between both methods using t-tests, intra-class correlation coefficients (ICC) and Bland–Altman analysis. Test–retest reliability was good-to-excellent for both methods. Face validity was demonstrated for both methods, with cadence and stance time varying with speed in the expected directions. Between-methods agreement for cadence was excellent for all speeds and instructed head movements. For stance time, agreement was good-to-excellent for all conditions, except while running at 13 km/h and shaking the head. Overall, the measurement of cadence and stance time using an accelerometer embedded in a wireless earbud showed good test–retest reliability, face validity and concurrent validity, indicating that instrumented earbuds may provide a promising alternative to currently used wearable systems.


Introduction
Using an instrumented treadmill to measure gait parameters in the lab is a widely accepted and commonly applied method [1,2]. The force platforms embedded in the treadmill allow accurate recording of all steps in a trial, which constitutes a significant improvement over the previously used force platforms embedded in the floor that could only measure a single stance [1]. However, measurements with instrumented treadmills are restricted to the lab and, although biomechanically similar [3], some aspects of treadmill gait differ from overground gait because of the often fixed imposed speed in combination with the limited treadmill surface [4][5][6]. Moreover, the option of providing real-time feedback for long-term performance improvement is limited in a lab as it requires that participants regularly return to the lab. Various methods for gait analysis using inertial sensors have been developed for both walking and running. These inertial sensors can measure multiple steps, and are mobile, allowing measurement and feedback in the field [7][8][9][10][11][12][13][14].
Although the use of inertial sensors is popular among runners, usually in the form of wearables [15][16][17], they still have limitations that need to be resolved. First, the wearable sensors are often located on the legs [7,8,10,13] or the lower back [9,13], which is not very practical for the runner. In addition, the sensor placement can influence output variables and there is still no consensus on the optimal location for sensor placement [11][12][13][18][19][20], although algorithms for the determination of cadence have been developed that do not depend on sensor location [21]. Second, real-time feedback on gait parameters such as cadence and contact times derived from inertial sensors is often provided visually on a watch, and looking at the watch interferes with running [15,22]. Auditory instructions or feedback

Data Collection
Each participant completed all conditions in a repeated-measures design. Before the measurements began, participants ran on the treadmill at a comfortable speed to warm up and familiarize themselves with running on the treadmill. Then, the maximum running speed was determined by discussing it with the participant and optionally trying to run at that speed for a short while. Participants were told that they could abort a trial at any time if the speed turned out to be too high.
The participants were invited to run at four speeds ranging from 7 km/h, around which the transition from walking to running occurs [36], to 16 km/h, which was the maximum speed of the belt, with increments of 3 km/h, i.e., they ran at 7 km/h, 10 km/h, 13 km/h, and the maximum running speed of the participant, up to a maximum of 16 km/h. Each speed was measured for one minute. During the speed trials, no instruction regarding head movement was given. The order in which participants performed the speed conditions was randomized. After having completed all speed conditions once, participants performed a trial with instructed head movements while running at a comfortable running speed. During this head-movement trial, participants started running while keeping the head in a neutral position (facing forward). After about 15 s, participants were verbally instructed by the researcher to perform a particular head-movement condition every 15 s. Participants received seven such instructions, pertaining to a variety of different head-movement conditions (Table 1), which were explained beforehand. They were instructed to perform the head movements as realistically as they could during a run. The order in which those head-movement conditions were performed within the head-movement trial was randomized. After this trial with instructed head movements, the participants performed all speed conditions a second time in the same order to establish the test-retest reliability. Each trial started with a jump on the instrumented treadmill for synchronization purposes. Subsequently, the treadmill was started and brought up to speed. At the end of each trial, the speed of the treadmill was decreased to standstill. To avoid fatigue, participants were allowed a break in between trials for as long as they deemed necessary to recover. The experiment was concluded when the participants had completed all trials.

Data Analysis
All calculations were conducted in MATLAB (MathWorks®, R2018b). The earbud data were resampled to 500 Hz and synchronized to the force-plate data manually by overlapping the jumps in the different datasets. Differences in maximum running speed and technical problems led to an incomplete dataset, within which not all speeds could be compared for all participants. Therefore, the sample size was reported for each comparison, and the maximum speed condition was not analyzed statistically due to the small sample size.
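The resampling step can be illustrated with a minimal Python sketch. The interpolation method used in the study is not stated, so linear interpolation is an assumption here, as are the function name `resample_linear` and the 100 Hz input rate in the example.

```python
import math


def resample_linear(times, values, target_fs):
    """Resample a signal onto a uniform grid by linear interpolation.

    times  : increasing sample times in seconds
    values : samples recorded at those times
    Linear interpolation is an assumption; the study does not name its method.
    """
    t0, t1 = times[0], times[-1]
    n_out = int(math.floor((t1 - t0) * target_fs + 1e-9)) + 1
    out = []
    j = 0
    for k in range(n_out):
        t = t0 + k / target_fs
        # advance to the input interval [times[j], times[j + 1]] containing t
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        ta, tb = times[j], times[j + 1]
        w = (t - ta) / (tb - ta)
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return out


# Hypothetical 100 Hz ramp resampled to 500 Hz
resampled = resample_linear([0.0, 0.01, 0.02], [0.0, 1.0, 2.0], 500)
```

Once both signals share the 500 Hz grid, the jump at the start of each trial gives a common landmark for alignment, which the study performed manually.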
For all trials, the time instances of foot strike ($t_{\text{foot strike}}$) and foot off ($t_{\text{foot off}}$) were determined based on the force data and the acceleration data from the earbud. For the different speed conditions, data for 50 s of consistent running were used for the analysis, and for the head-movement conditions, data for 10 s around the head movement were used.
Force plate. Foot-strike and foot-off events were determined as the time instances at which the vertical component of the ground reaction force crossed a threshold of 25% of body weight in the upward and downward direction, respectively (Figure 3a). Based on the determined events, cadence (in steps/min) was calculated according to Equation (1) and stance time (in s) according to Equation (2), where $n$ is the number of steps and $t_{\text{foot strike}(i+1)} > t_{\text{foot off}(i)} > t_{\text{foot strike}(i)}$.
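The threshold-based event detection can be sketched in Python as follows. The formulas used here are assumptions, since the equation bodies are not reproduced in this text: cadence is taken as 60 divided by the mean strike-to-strike interval, and stance time as the mean of foot off minus the preceding foot strike. The function names and the rectangular synthetic force signal are hypothetical.

```python
import math


def detect_events(force, body_weight, fs, fraction=0.25):
    """Return foot-strike and foot-off times (s) from vertical force samples (N)."""
    threshold = fraction * body_weight
    strikes, offs = [], []
    for i in range(1, len(force)):
        prev, cur = force[i - 1], force[i]
        if prev < threshold <= cur:      # upward crossing: foot strike
            strikes.append(i / fs)
        elif prev >= threshold > cur:    # downward crossing: foot off
            offs.append(i / fs)
    return strikes, offs


def cadence_and_stance(strikes, offs):
    """Assumed formulas: cadence from the mean strike-to-strike interval,
    stance time as the mean of (foot off - preceding foot strike)."""
    stances = []
    j = 0
    for i, strike in enumerate(strikes):
        while j < len(offs) and offs[j] <= strike:
            j += 1                       # skip foot-off events before this strike
        if j == len(offs):
            break
        next_strike = strikes[i + 1] if i + 1 < len(strikes) else math.inf
        if offs[j] < next_strike:        # enforce strike(i) < off(i) < strike(i+1)
            stances.append(offs[j] - strike)
    cadence = 60.0 * (len(strikes) - 1) / (strikes[-1] - strikes[0])
    stance_time = sum(stances) / len(stances)
    return cadence, stance_time


# Synthetic 2 s signal at 1000 Hz: one step every 0.4 s, 0.25 s of contact each
fs = 1000
force = [1500.0 if 100 <= (i % 400) < 350 else 0.0 for i in range(2000)]
strikes, offs = detect_events(force, body_weight=700.0, fs=fs)
cadence, stance_time = cadence_and_stance(strikes, offs)
```

With this idealized input the sketch recovers a cadence of 150 steps/min and a stance time of 0.25 s. Real force traces rise and fall gradually, so the detected stance time depends on the chosen threshold.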
Earbud. A custom-written algorithm, which was developed specifically for the instrumented earbud by Dopple B.V., was used to determine cadence and stance time. This algorithm comprises a sliding window analysis (2 s, 0.6 s overlap) for real-time gait-event estimation. For each sample in a window, it first calculates the root sum square of the 3D acceleration to limit the effect of orientation differences. Then, the integral over time of this root sum square series is calculated to represent the velocity time series (Figure 3b). Next, the negative zero crossings of the velocity are determined, which roughly represent the middle of the flight phase (Figure 3c). Finally, for every flight phase identified in the window, the corresponding width of the horizontal section in the root sum square of the acceleration is determined, whose boundaries provide estimates of foot-off and foot-strike events (Figure 3d,e). All events were collated and duplicates were removed before calculating the cadence and stance time according to Equations (1) and (2), respectively.
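The first three steps of the earbud procedure (root sum square, integration, negative zero crossings) can be sketched for a single window as below. This is not the Dopple B.V. algorithm, whose details are not published here. Two steps are assumptions added so that zero crossings exist at all: the window mean is subtracted from the root sum square before integration, and the mean is subtracted from the resulting velocity series. The function name and the synthetic signal are hypothetical, and the final step (width of the horizontal section) is not implemented.

```python
import math


def mid_flight_times(acc_xyz, fs):
    """Estimate mid-flight instants (s) within one window of 3D acceleration.

    acc_xyz : list of (x, y, z) samples
    fs      : sampling rate in Hz
    """
    # Root sum square removes the dependence on sensor orientation
    rss = [math.sqrt(x * x + y * y + z * z) for x, y, z in acc_xyz]

    # ASSUMPTION: remove the window mean so the integral oscillates around zero
    mean_rss = sum(rss) / len(rss)
    dt = 1.0 / fs
    velocity, total = [], 0.0
    for value in rss:
        total += (value - mean_rss) * dt   # cumulative sum approximates the integral
        velocity.append(total)

    # ASSUMPTION: centre the velocity series before looking for crossings
    mean_v = sum(velocity) / len(velocity)
    velocity = [v - mean_v for v in velocity]

    # Negative zero crossing: velocity goes from non-negative to negative
    return [i / fs for i in range(1, len(velocity))
            if velocity[i - 1] >= 0.0 > velocity[i]]


# Idealized 2 s window at 100 Hz: 0.25 s contact (3 g) and 0.15 s flight (0 g) per step
fs = 100
window = [(0.0, 0.0, 3.0 if (i % 40) < 25 else 0.0) for i in range(200)]
times = mid_flight_times(window, fs)
```

In this idealized window each flight phase spans samples 25 to 39 of a 40-sample step, and the detected crossings fall at sample 32 of each step, close to the middle of the flight phase.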

Statistical Analysis
The statistical analysis was performed in IBM SPSS Statistics 25. Results were deemed significant when p < 0.05 (α = 0.05).
For each speed, within-method test-retest reliability was assessed for cadence and stance time using a two-tailed paired-samples t-test to identify systematic biases between test and retest, accompanied by the intra-class correlation coefficient (ICC) based on a single-rater, absolute agreement, 2-way mixed-effects model (values <0.5, 0.5 to 0.75, 0.75 to 0.9, and >0.9 indicate poor, moderate, good, and excellent agreement, respectively [37]). Bland-Altman analysis was used to establish the bias and 95% limits of agreement [38]. Bland-Altman plots were made to visualize the bias and limits of agreement.
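As a minimal illustration of the Bland-Altman quantities and the paired-samples t statistic, the sketch below computes the bias, the 95% limits of agreement (bias ± 1.96 SD of the differences) and the t value with its degrees of freedom. The cadence values are hypothetical and the function names are illustrative; the p-value lookup is omitted because it needs the t distribution, which is not in the Python standard library.

```python
import math
import statistics


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) of a paired-samples t-test."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical cadences (steps/min) of four participants in test and retest
test = [170, 172, 168, 175]
retest = [169, 173, 167, 174]
bias, lower, upper = bland_altman(test, retest)
t_value, dof = paired_t(test, retest)
```

For these values the differences are 1, −1, 1 and 1, giving a bias of 0.5 steps/min, limits of agreement of −1.46 and 2.46 steps/min, and t = 1.0 with 3 degrees of freedom.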
Face validity was determined per method by comparing cadence and stance time over speeds with a one-way repeated-measures ANOVA, with paired-samples t-tests for post-hoc comparisons. Greenhouse-Geisser corrections were applied when Mauchly's test of sphericity was significant.
The between-methods agreement for cadences and stance times was assessed for the test and retest separately, as well as for the two tests combined [39], again using paired-samples t-tests accompanied by ICC and Bland-Altman analysis.
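The single-rater, absolute-agreement ICC can be sketched from the two-way ANOVA mean squares. The formula below is the standard ICC(A,1) form; treating it as identical to the SPSS output used in the study is an assumption, as is the handling of the exact boundary values in `interpret_icc`. The input data are hypothetical.

```python
def icc_a1(data):
    """ICC(A,1): two-way model, absolute agreement, single measurement.

    data : list of rows, one per participant, each with k measurements
           (e.g., [test, retest] or [earbud, force plate]).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-participants mean square
    msc = ss_cols / (k - 1)                 # between-measurements mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def interpret_icc(icc):
    """Labels from the cut-offs in the text; boundary handling is an assumption."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


identical = icc_a1([[1, 1], [2, 2], [3, 3]])   # perfect agreement
offset = icc_a1([[1, 2], [2, 3], [3, 4]])      # constant offset of 1
```

The second example shows why absolute agreement was chosen: a constant offset between two measurements leaves a consistency-type correlation at 1 but lowers the absolute-agreement ICC to 2/3.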

Results
Datafiles of the force and acceleration data for all trials per participant, together with the calculated variables, can be found in the Supplementary Materials. The mean cadence and stance time, as measured for the different speeds by the two methods in test and retest, are shown in Figure 4.

Within-Method Reliability: Speed Conditions
Test-retest biases were not significant for either method (Figure 5; Table 2). The ICC showed good-to-excellent agreement for the earbud method and excellent agreement for the force-plate method (Table 2). The Bland-Altman plots showed similar limits of agreement for both methods and indicated no dependence of the difference on the mean (Figure 5).

Face Validity: Difference over Speed Conditions in Expected Direction
For both methods, cadence increased significantly with speed (Figure 4a

Between-Methods Agreement: Speed Conditions
For most speeds, no significant between-methods biases in cadence and stance time were found (Figure 6a,c; Table 3). Only the stance time in the retest at 7 km/h had a significant bias of −0.008 ± 0.010 s (p = 0.02; Table 3), showing that the average stance time as measured with the force plate was 0.008 s longer than the average stance time as measured with the earbud. The test, retest and combined ICC showed excellent agreement for the cadence for all speeds (Table 3) and for the stance time at 7 and 10 km/h (Table 3); the stance time at 13 km/h showed moderate agreement when test and retest were combined, with poor agreement on the test and good agreement on the retest (Table 3). Bland-Altman plots revealed no clear dependence of the difference on the mean (Figure 6a,c).

Between-Methods Agreement: Instructed Head-Movement Conditions
The between-methods bias was not significant for either cadence or stance time for any head movement (Table 4; Figure 6b,d). The ICC showed excellent agreement for cadence for all head movements, and for stance time for all head movements except shaking the head, for which the agreement was good (Table 4). The Bland-Altman plots also revealed that the difference between the methods was largest for the instruction to shake the head (Figure 6d).

Figure 6. Bland-Altman plots of the between-methods differences for the speed conditions (test and retest combined; (a,c)) and the head-movement conditions (b,d).
Table 3. Between-methods bias and agreement of cadence and stance time between earbud and force-plate methods for the speed conditions (bias = earbud − force plate).

Discussion
In this study, we determined cadence and stance time from accelerometer data collected in an earbud (new method) and force-plate data (well-established method) for a range of running speeds. For both methods, we assessed the test-retest reliability per speed condition and face validity over speed conditions (i.e., does cadence increase and stance time decrease with increasing speed?). In addition, concurrent validity was assessed by determining the between-methods agreement of cadence and stance time for the different speeds, as well as for a variety of instructed head movements performed at a comfortable running speed. The test-retest reliability was good-to-excellent for both earbud and force-plate methods for both cadence and stance time, with similar limits of agreement. Face validity was also good for both methods, with cadence and stance time differing significantly between running speeds in the expected directions [35]. We found excellent between-methods agreement for the cadence determined over the range of speeds and during the instructed head movements. The between-methods agreement for the stance time was good-to-excellent, with lower agreement at 13 km/h and while shaking the head. For the retest at 7 km/h, the between-methods bias in stance time was significant, although small (8 ms), while the corresponding agreement was excellent (Table 3).
Although both test-retest reliability and between-methods agreement were excellent, the limits of agreement for cadence for between-methods differences were considerably smaller than for test-retest differences within the methods (about three times; compare Figure 6a with Figure 5a,b, respectively). Variations found in test-retest reliability can partly be attributed to the variation in the performance of the participant [40], especially given the similarity in the limits of agreement and ICC for the test and retest assessments of the earbud and force-plate methods (Figure 5a,b). These results indicate that the two methods can be used interchangeably to measure running cadence over a range of speeds.
For the stance time, the limits of agreement were not smaller for the between-methods agreement compared to the test-retest reliability, although the limits of agreement on the test-retest comparisons were still similar for the earbud and force-plate methods (Figure 5c,d). It should be noted that the stance time, as calculated from the force-plate data, is dependent on the chosen threshold. Different thresholds are used in the literature (e.g., set thresholds [1,12,13,41], weight-dependent thresholds [39], or variability-dependent thresholds [2]). A higher force-plate threshold leads to systematically shorter stance times and vice versa, which would influence the bias between the force-plate and earbud data. In this study, we chose to use a weight-dependent threshold in view of the large weight differences between participants, which could have influenced event detection when using a set threshold. Future research should further examine the effects of threshold choice in force plate-based gait analysis and, ideally, culminate in evidence-based guidelines.
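The direction of the threshold effect can be made concrete with a small sketch. It assumes an idealized half-sine vertical force profile, F(t) = P·sin(πt/T) over a true contact time T, which is a simplification and not a model used in the study. Under that assumption the time spent above a threshold θ is T·(1 − (2/π)·arcsin(θ/P)). The peak force of 2 body weights and the contact time of 0.3 s are hypothetical values.

```python
import math


def measured_stance_time(true_contact, peak_force_bw, threshold_bw):
    """Time (s) a half-sine force profile spends at or above the threshold.

    true_contact  : true contact time T in seconds
    peak_force_bw : peak vertical force P in body weights (assumed half-sine)
    threshold_bw  : detection threshold in body weights
    """
    if not 0.0 <= threshold_bw < peak_force_bw:
        raise ValueError("threshold must lie in [0, peak force)")
    ratio = threshold_bw / peak_force_bw
    return true_contact * (1.0 - (2.0 / math.pi) * math.asin(ratio))


# Hypothetical step: 0.3 s true contact time, peak force of 2 body weights
thresholds = [0.0, 0.25, 0.5, 1.0]
stance = [measured_stance_time(0.3, 2.0, th) for th in thresholds]
```

Each higher threshold yields a shorter measured stance time, which is why the choice of threshold shifts the bias between force-plate and earbud estimates.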
Overall, the between-methods comparisons yielded similar agreements for the different speeds and the different instructed head movements, indicating that incidental head movements did not have a substantive impact on the derived cadences and stance times. For this study, we calculated the mean cadence/stance time over a certain time window. For the instructed head movements, this meant the mean cadence/stance time for a window of 10 s around the head movement. Some of the instructed head movements, in particular the head shake, tended to cause errors in the earbud-based cadence/stance time estimates for the steps during which the particular head movement was performed, but not for the other steps (Figure 7). The size of the window over which the mean is taken will influence the impact of errors in cadence/stance time associated with incidental head movements. If this is found to be a problem, using measures that are less sensitive to outliers, such as the median, will further limit the impact of incidental head movements on calculated variables. The relative differences found in cadence and stance time between the two methods are similar to the relative differences found in some previous studies that validated the use of ear-worn sensors in walking [31,34], and smaller than in other studies [32]. Our results extend the finding that ear-worn sensors can be used for gait analysis during walking to running, thus broadening the applicability of this type of sensor.
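The difference between mean and median in the presence of a single disturbed step can be shown with a short sketch. The per-step stance times below are hypothetical, with one value inflated to mimic an estimate corrupted by a head shake.

```python
import statistics

# Hypothetical per-step stance times (s) in one window; the 0.40 s value
# stands in for a step whose estimate was disturbed by a head movement
stance_times = [0.25, 0.25, 0.26, 0.25, 0.24, 0.25, 0.40, 0.25, 0.26, 0.24]

window_mean = statistics.mean(stance_times)
window_median = statistics.median(stance_times)
```

Here the mean is pulled up to about 0.265 s by the single outlier, whereas the median stays at 0.25 s. A shorter window would contain fewer undisturbed steps and so make the mean more sensitive still.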
In conclusion, our results showed that cadence and stance time during running can be derived reliably from instrumented wireless earbuds equipped with an accelerometer, with good face validity and concurrent validity. Combined with the previously mentioned advantages of a practical sensor location and the possibility of providing online auditory instructions or feedback, such as pacing to modulate cadence [23] and impact forces [42], instrumented earbuds may become a promising alternative to currently used wearable systems, and could even be developed into an all-in-one feedback device for both treadmill and outdoor running. A follow-up study comparing cadence and stance times derived using the earbud method with results obtained via a different validated method that is suitable for deriving cadence and stance times outdoors is recommended to substantiate this promise. Aspects of running outdoors that could be considered are speed variability and higher inter-step variability due to turns, uneven terrain and obstacles, and impact differences due to surface characteristics [43].