The Agreement between Wearable Sensors and Force Plates for the Analysis of Stride Time Variability

The variability and regularity of stride time may help identify individuals at a greater risk of injury during military load carriage. Wearable sensors could provide a cost-effective, portable solution for recording these measures, but establishing their validity is necessary. This study aimed to determine the agreement of several measures of stride time variability across five wearable sensors (Opal APDM, Vicon Blue Trident, Axivity, Plantiga, Xsens DOT) and force plates during military load carriage. Nineteen Australian Army trainee soldiers (age: 24.8 ± 5.3 years, height: 1.77 ± 0.09 m, body mass: 79.5 ± 15.2 kg, service: 1.7 ± 1.7 years) completed three 12-min walking trials on an instrumented treadmill at 5.5 km/h, carrying a 23 kg external load. Simultaneously, 512 stride time intervals were identified from the treadmill-embedded force plates and from each sensor, and linear (standard deviation and coefficient of variation) and non-linear (detrended fluctuation analysis and sample entropy) measures were calculated from them. Sensor and force plate agreement was evaluated using Pearson’s r and intraclass correlation coefficients. All sensors had at least moderate agreement (ICC > 0.5) and a strong positive correlation (r > 0.5). These results suggest wearable devices could be employed to quantify linear and non-linear measures of stride time variability during military load carriage.


Sensors 2024, 24, 3378

Introduction
Injury in the military is problematic, as it impacts a soldier's physical and mental health, training, and progression and an army's general readiness for deployment [1]. The burden of injury is predicted to cost the Australian Defence Force more than AUD 210 million per year, with around 17.7 injuries occurring per 100 person-years of active service [2,3]. Load carriage has become a focal point of investigation as one of the highest self-reported causes of soldier injury and has been shown to contribute to overall musculoskeletal injury (MSI) risk [4-6]. Military load carriage commonly exceeds 20 kg during general patrol duties; however, evidence suggests that loads above 13 kg can increase MSI risk by 50-60% [7]. At a macro level, applying load management principles such as adjusting the intensity, frequency, and duration of military training activities can decrease the number and severity of injuries [8,9]. However, much research has endeavoured to identify the more subtle predispositions, behaviours, adaptations, and risk factors that precede military-related injury.
The examination of gait characteristics under military-relevant load has shown that stride length, stride frequency, gait variability, and trunk lean are all affected when compared with unloaded gait [10,11]. Kinematic changes may be necessary, but understanding their potential thresholds and identifying maladaptation could be vital to understanding injury risk. Traditional linear statistical measures of stride time variability, such as the standard deviation (SD) and coefficient of variation (CV), are used to quantify the magnitude of stride-to-stride fluctuations [12]. These linear measures have primarily been used in the aging population and can differentiate a healthy gait from the gait of those at an increased fall risk [13] and those with motor diseases [14]. Regarding military load carriage, Springer [15] performed a longitudinal study monitoring the gait characteristics of 76 soldiers during their first year of service. Using multiple logistic regression, they reported that linear stride time variability was associated with lower-body injury risk. Despite this, non-linear measures are considered more sensitive than linear measures [16] but have yet to be used to examine stride time variability during military load carriage in the context of injury risk. Further information exists within the temporal structure of stride time and can be explored using non-linear measures such as detrended fluctuation analysis (DFA) [17,18] and sample entropy (SE) [19]. DFA quantifies the persistence of stride time patterns, whereas SE measures their predictability or regularity. A healthy stride time pattern exhibits persistent long-term correlation; in contrast, a loss of persistence and an increase in regularity are often associated with an impaired gait [20]. Although non-linear measures are more sensitive to changes in stride time than linear measures, they require the accurate detection of heel contact events.
Force plates are considered one of the gold standard methods for recording heel contacts [21,22]. However, these can be large, expensive, and immobile, which limits the application and accessibility of stride time analysis in a field setting [23]. Wearable devices (e.g., inertial measurement units (IMUs)) offer a potential solution, as they are comparatively small, light, and inexpensive, making them ideal for monitoring soldiers during military activities [24]. Previous research has shown that IMUs attached to the foot are valid for detecting heel contact events, stride time, walking speed, swing time, stride length, and cadence [25,26]. However, IMU validity for measuring linear variability has been reported to range from poor to excellent (ICC: 0.22-0.90) when sensors are placed at the ankle or waist [27,28]. The difficulty of consistently identifying heel contact events from the IMU signal over time is thought to contribute to this inconsistency [28,29]. Few studies have assessed the use of IMUs for quantifying non-linear measures of stride time, such as DFA [30]. To date, no studies have explored these measures in a military context, where speeds and loads can affect a soldier's walking dynamic behaviour.
The aim of this study was to assess the agreement between several wearable sensors and force plates to quantify linear and non-linear stride time measures using recommended approaches [31]. The results will help determine the utility of wearable sensors in measuring non-linear stride time variables of walking in the military context, which can potentially be used as markers of MSI risk.

Experimental Overview
The research design of this study was part of a broader investigation aimed at exploring various biomechanical and physiological aspects of load carriage [33]. The participants attended three sessions, with each session taking place one week apart. In the first session, the participants completed two minutes of walking on a Tandem Force-Sensing Treadmill (AMTI Inc., Watertown, MA, USA) to familiarise themselves with the device and task. In each session, the participants completed a 12 min treadmill walk whilst holding a replica F88 Austeyr rifle (3.2 kg) at 5.5 km/h at a 0° incline. The participants were fitted with 23 kg of a body-borne load distributed via a weighted vest (anterior load: 17 kg, posterior load: 6 kg), and they wore loose active clothing with military boots (Figure 1).

The participants were instructed to walk in the middle of the treadmill and make an exaggerated stomp as their first step to assist with force plate and sensor synchronisation. A countdown of "3-2-1-go" was used to begin each trial. During the trials, the participants were informed about the remaining time, but no additional audible or visual stimulus was present throughout the trials. Five wearable sensors were secured to/in each participant's boots using a combination of Fixomull, rigid tape, and/or Velcro straps (Figure 2). As this study was a part of a more extensive research programme, the participants also wore a portable spiroergometer, wireless electromyography sensors, and motion capture markers throughout their trials.

Data Analysis

Instrumented Treadmill
The force and centre of pressure data were collected from the treadmill's integrated force plates (FP; 1000 Hz) via Vicon Nexus software (v2.14, Vicon Motion Systems Ltd., Oxford, UK) and imported into MATLAB 2021a (Natick, MA, USA) [34] for post-processing. Time-series data were downsampled to 120 Hz using cubic spline interpolation to match the lowest sampling rate of the wearable sensors. A 120 Hz sample rate was deemed appropriate, as it met the suggested minimum frequency for capturing subtle variations in non-linear measures [35]. The treadmill featured dual tandem force plates, allowing a participant's foot to move freely across both force plates while walking. Centre of pressure (COP) data were filtered using a 50 Hz second-order low-pass Butterworth filter. To calculate heel contact times, the antero-posterior (AP) COP was utilised. Heel contact was defined as the moment when the AP COP transitioned from a posterior to an anterior direction. MATLAB's "findpeaks" function identified this local maximum for each contact. Medio-lateral (ML) COP was used to determine whether the first heel contact was from the left or right heel using the first three seconds of a trial, with the "findpeaks" function identifying the first occurrence.
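As an illustration of the force plate event detection described above, the following Python sketch locates local maxima in an AP COP trace and converts them to stride times. It is not the authors' MATLAB code: the function names, the 0.3 s minimum peak spacing, and the synthetic sinusoidal COP trace are assumptions made for the example, and the low-pass filtering step is omitted.

```python
import math

FS = 120  # Hz, common sampling rate after downsampling


def find_local_maxima(signal, min_distance):
    """Return indices of local maxima at least `min_distance` samples apart.

    Taller peaks take priority when two candidates are too close together.
    """
    candidates = [
        i for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] >= signal[i + 1]
    ]
    candidates.sort(key=lambda i: signal[i], reverse=True)
    kept = []
    for i in candidates:
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return sorted(kept)


def stride_times(contact_indices, fs):
    """Stride times (s) between consecutive contacts of the same foot.

    Contacts alternate between feet, so every second contact is the same foot.
    """
    same_foot = contact_indices[::2]
    return [(b - a) / fs for a, b in zip(same_foot, same_foot[1:])]


# Synthetic AP COP (hypothetical): one peak per step, 60 samples (0.5 s) apart.
SAMPLES_PER_STEP = 60
ap_cop = [math.sin(2 * math.pi * i / SAMPLES_PER_STEP) for i in range(12 * FS)]

contacts = find_local_maxima(ap_cop, min_distance=int(0.3 * FS))
strides = stride_times(contacts, FS)
print(len(contacts), strides[:3])
```

Taking every second contact is one simple way to obtain same-foot stride times once the side of the first contact is known (here from the ML COP, as described above).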

Wearable Sensors
Five sensors were used in this study: Axivity AX3 (AX; 200 Hz; Axivity Ltd., Newcastle upon Tyne, UK), Xsens DOT (XS; 120 Hz; Movella Inc., Henderson, NV, USA), Plantiga (PG; 500 Hz; Plantiga Technologies Inc., Vancouver, BC, Canada), Opal APDM (OP; 128 Hz; APDM Inc., Portland, OR, USA), and Vicon Blue Trident (BT; 1000 Hz; Vicon Motion Systems Ltd., Oxford, UK). The placement of the sensors on the boot is illustrated in Figure 2. All sensors were chosen because they are commercially available, with the aim of including a sample of popular sensors that incorporated both cost-effective and high-end options. Each sensor recorded 3D-acceleration data, which were then saved in the sensor's respective proprietary software. The raw data were later exported into MATLAB 2021a for post-processing. Time series data from sensors with a sampling rate greater than 120 Hz were downsampled to 120 Hz using cubic spline interpolation. A 60 Hz second-order low-pass Butterworth filter was applied to the vertical acceleration. To commence, heel contacts were detected using MATLAB's "findpeaks" function. The reference peak (RP) was calculated as the average peak height of all the peaks identified. Peaks exceeding half of the RP were accepted as heel contacts. In conjunction with the RP, the average time between accepted heel contacts was calculated and used to reanalyse the time series and find all heel contacts. The AX, XS, PG, and OP sensors captured right heel contacts, whereas the BT sensor captured left heel contacts.
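The two-pass peak detection described above can be sketched as follows. This is a hedged Python illustration, not the authors' implementation: the text does not state how the average inter-contact time was used in the second pass, so the minimum spacing of half the mean interval (`spacing_fraction`) and the tallest-first selection are assumptions, as are the function names and the synthetic acceleration trace. Downsampling and filtering are omitted.

```python
def local_maxima(signal):
    """Indices of all local maxima, with no height or spacing limits."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] >= signal[i + 1]
    ]


def detect_heel_contacts(accel, spacing_fraction=0.5):
    """Two-pass heel contact detection on vertical acceleration."""
    peaks = local_maxima(accel)
    # Reference peak (RP): mean height of every peak found in the first pass.
    rp = sum(accel[i] for i in peaks) / len(peaks)
    accepted = [i for i in peaks if accel[i] > 0.5 * rp]
    # Mean time (in samples) between accepted peaks.
    mean_interval = (accepted[-1] - accepted[0]) / (len(accepted) - 1)
    # Second pass (assumed): keep the tallest peaks first and reject any
    # peak closer than a fraction of the mean interval to a kept peak.
    min_distance = spacing_fraction * mean_interval
    kept = []
    for i in sorted(accepted, key=lambda i: accel[i], reverse=True):
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return sorted(kept)


# Synthetic vertical acceleration (hypothetical): small ripples, a large
# heel impact every 132 samples (1.1 s) and a smaller impact 12 samples later.
FS = 120
STRIDE = 132
accel = [0.1 if i % 10 == 5 else 0.0 for i in range(1400)]
true_contacts = list(range(60, 1400, STRIDE))
for c in true_contacts:
    accel[c] = 3.0
    accel[c + 12] = 1.6

detected = detect_heel_contacts(accel)
stride_times = [(b - a) / FS for a, b in zip(detected, detected[1:])]
print(detected[:3], stride_times[:3])
```

In this example the first pass accepts both the heel impacts and the smaller secondary impacts, and the spacing rule in the second pass removes the secondary ones.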

Data Processing
Data recording began as the participant stood in a static position on the treadmill. BT data were synchronised with the treadmill force plates and recorded in Vicon Nexus. The first 20 s of recorded data were removed to eliminate any effects of gait initiation. Matched stride time series were then adjusted to include 513 consecutive heel contacts (512 stride times). The stride time series' mean (M), CV, and SD were calculated in RStudio (RStudio, 2022 [36]), whereas the DFA and SE were calculated in MATLAB. The DFA-alpha scaling exponent was calculated using the average evenly spaced windows method, which has been shown to increase DFA-alpha's precision [17]. The window size range was set from nmin = 4 to nmax = N/9 (57), with k (21) estimated using the method described by Liddy and Haddad [37]. SE was calculated following Richman and Moorman's [38] method, with the parameters of r = 0.2 (tolerance ratio) and m = 2 (vector length).
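To make the two non-linear measures concrete, the sketch below gives a minimal Python version of each. It should be read with caution: the DFA function is a standard first-order DFA with log-spaced window sizes between nmin = 4 and nmax = N/9, not the average evenly spaced windows method used in the study, and the sample entropy function assumes the population SD for the tolerance and a "less than or equal" match criterion. All names and the synthetic stride time series are hypothetical.

```python
import math
import random
import statistics


def linfit(xs, ys):
    """Least-squares slope and intercept of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def dfa_alpha(series, n_min=4, n_max=None, k=21):
    """First-order DFA scaling exponent using up to k log-spaced window sizes."""
    total = len(series)
    n_max = n_max or round(total / 9)
    mean = statistics.fmean(series)
    # Integrated, mean-centred profile of the stride time series.
    profile, running = [], 0.0
    for value in series:
        running += value - mean
        profile.append(running)
    ratio = n_max / n_min
    sizes = sorted({round(n_min * ratio ** (j / (k - 1))) for j in range(k)})
    log_n, log_f = [], []
    for n in sizes:
        t = list(range(n))
        windows = total // n
        squared_residuals = 0.0
        for w in range(windows):
            segment = profile[w * n:(w + 1) * n]
            slope, intercept = linfit(t, segment)
            squared_residuals += sum(
                (y - (slope * ti + intercept)) ** 2 for ti, y in zip(t, segment)
            )
        log_n.append(math.log(n))
        # Log of the root-mean-square fluctuation F(n).
        log_f.append(0.5 * math.log(squared_residuals / (windows * n)))
    return linfit(log_n, log_f)[0]


def sample_entropy(series, m=2, r=0.2):
    """Sample entropy with tolerance r * SD and template length m."""
    tolerance = r * statistics.pstdev(series)  # population SD (assumption)
    n_templates = len(series) - m

    def count_matches(length):
        count = 0
        for i in range(n_templates):
            for j in range(i + 1, n_templates):  # self-matches excluded
                if all(abs(series[i + s] - series[j + s]) <= tolerance
                       for s in range(length)):
                    count += 1
        return count

    return -math.log(count_matches(m + 1) / count_matches(m))


# Hypothetical stride time series (s): uncorrelated noise and a random walk.
rng = random.Random(0)
white = [1.1 + 0.02 * rng.gauss(0, 1) for _ in range(512)]
walk, level = [], 1.1
for _ in range(512):
    level += 0.002 * rng.gauss(0, 1)
    walk.append(level)

alpha_white, alpha_walk = dfa_alpha(white), dfa_alpha(walk)
se_white, se_walk = sample_entropy(white), sample_entropy(walk)
print(alpha_white, alpha_walk, se_white, se_walk)
```

For 512 stride times the default upper window size rounds to 57, matching the range stated above. The uncorrelated series should give a DFA alpha near 0.5 and the higher sample entropy of the two, whereas the random walk should give an alpha near 1.5 and a lower sample entropy.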

Statistical Analysis
Out of a possible 1995 stride time data points (19 participants × 7 sensors, including the left and right FP, × 3 sessions × 5 stride time measures, including the mean), 1540 were collected from the participants. There were a few reasons for the reduced number of data points. One participant did not return after their initial session due to an unrelated injury, and several participants did not complete all conditions. Of the collected data points, 24 were identified as outliers (modified z-score below −3.5 or above 3.5) and were removed [39], leaving 1516 stride time data points. After this procedure, all variables were normally distributed (Shapiro-Wilk's test) with homogeneous variance (Levene's test; p > 0.05). Using the "irr" RStudio package [40], Pearson's r was used to assess the relative agreement between the sensor and FP measures. A two-way random-effects single-measures intraclass-correlation model (ICC 2,1) was used to evaluate the absolute agreement. The agreement was categorised as poor (ICC < 0.5), moderate (ICC = 0.5-0.75), good (ICC = 0.75-0.90), or excellent (ICC > 0.9) using the criteria proposed by Koo and Li [41]. The significance was set at p < 0.05. Bland-Altman plots were produced using the "blandr" RStudio package [42]. The "Metrics" RStudio package [43] was used to calculate the root mean squared error (RMSE) between the measures obtained with each sensor and the FP. A paired-samples t test was performed using the "stats" RStudio package [36] to assess if the means between the sensors and the FP were statistically different for each measure.
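The outlier screening and agreement statistics above can be illustrated with a short Python sketch. It is an assumption-laden stand-in for the R packages named in the text: the modified z-score uses the usual median and median-absolute-deviation form with the 0.6745 constant, ICC(2,1) is computed from the two-way ANOVA mean squares, and the function names and example values are hypothetical.

```python
import statistics


def modified_z_scores(values):
    """Modified z-scores based on the median and median absolute deviation."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    return [0.6745 * (v - median) / mad for v in values]


def remove_outliers(values, cutoff=3.5):
    """Drop values whose modified z-score lies outside +/- cutoff."""
    scores = modified_z_scores(values)
    return [v for v, z in zip(values, scores) if abs(z) <= cutoff]


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `ratings` holds one row per trial, with one value per measurement method.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def rmse(a, b):
    """Root mean squared error between paired values."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


# Hypothetical stride time CV values (%) from the FP and one sensor.
fp_cv = [1.8, 2.1, 1.5, 2.6, 1.9, 2.3]
sensor_cv = [1.9, 2.0, 1.6, 2.7, 1.9, 2.4]
print(icc_2_1(list(zip(fp_cv, sensor_cv))), rmse(fp_cv, sensor_cv))
print(remove_outliers([1.0, 1.1, 0.9, 1.05, 0.95, 5.0]))
```

Because ICC(2,1) penalises a systematic offset between methods through the column mean square, it reflects absolute rather than relative agreement.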

Results
Table 1 presents summary statistics (mean, standard deviation, and RMSE) for the force plates (FP) and each wearable sensor for all variables. There were significant differences between the force plates and AX for the stride time mean and standard deviation, XS for the stride time mean, and OP for all variables (Table 1).

Absolute Agreement
Absolute agreement between the sensor and FP-calculated stride time measures, as measured by ICC, was demonstrated to be at least moderate (p < 0.05; Table 2). Linear measures of SD and CV exhibited good-to-excellent agreement between all sensors and FP (ICC > 0.88). For the non-linear measures of DFA and SE, there was good-to-excellent agreement between FP and each of XS, AX, and PG, whereas for BT and OP, the level of agreement was moderate-to-good (Table 2).

Relative Agreement
All sensors demonstrated significant (p < 0.05) and strong (r > 0.5) positive correlations with FP for all stride time measures (Table 3). Linear measures reported stronger correlations (r = 0.88-0.99) than non-linear measures (r = 0.65-0.91). The inspection of the Bland-Altman plots shows positive and negative dispersion for all sensors with the force plates, with very little evidence of systematic or proportional bias and very few data points outside the limits of agreement.
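For readers unfamiliar with these quantities, a minimal Python sketch of Pearson's r and the Bland-Altman bias and limits of agreement is given below. It assumes the conventional limits of bias ± 1.96 SD of the paired differences; the function names and example values are hypothetical and do not reproduce the study's data or the R packages used.

```python
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation between paired values."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bland_altman(sensor, reference):
    """Return the bias and the lower and upper limits of agreement."""
    diffs = [s - r for s, r in zip(sensor, reference)]
    bias = statistics.fmean(diffs)
    spread = 1.96 * statistics.stdev(diffs)  # sample SD of the differences
    return bias, bias - spread, bias + spread


# Hypothetical stride time SD values (ms) from one sensor and the FP.
sensor_sd = [14.2, 18.9, 11.5, 21.0, 16.3]
fp_sd = [14.0, 19.4, 11.9, 20.6, 16.0]
print(pearson_r(sensor_sd, fp_sd))
print(bland_altman(sensor_sd, fp_sd))
```

A bias close to zero with differences scattered evenly above and below it corresponds to the absence of systematic bias described above.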

Discussion
This study aimed to evaluate the agreement between several wearable sensors and force plates to quantify linear and non-linear stride time measures. All sensors exhibited at least moderate agreement and a strong positive correlation with FP across all measures. Wearable sensors appear to be a cost-effective and portable option for assessing linear and non-linear measures of stride time and its variability during load carriage.
Linear measures (SD, CV) demonstrated good-to-excellent agreement (ICC > 0.85) and a strong positive correlation (r > 0.88) with FP. This is consistent with the ICC values reported for waist-worn IMUs [28] but higher than the values reported for IMUs placed on the ankle [44]. The observed differences are likely due to the dissimilar algorithms utilised for calculating heel strikes from the IMUs. Rantalainen [44] in 2019 used IMUs placed on the ankle and hypothesised that their poor concurrent validity resulted from their gyroscope-based heel detection algorithm, which is different from the accelerometer-based algorithm used in the current study. Rantalainen [28] was able to achieve excellent concurrent validity for waist-worn IMUs; however, this was achieved using an algorithm that removed heel strikes outside a ±0.2 s threshold of the force-plate-recorded heel strike. The current study shows that linear measures calculated from wearable devices are comparable to those of force plates without the use of an algorithm that relies on another concurrent analysis.
The non-linear measures obtained with the wearable sensors exhibited lower agreement (ICC = 0.57-0.92) and weaker correlations (r = 0.65-0.91) with the FP than the linear measures. This is likely because non-linear measures explore the temporal structure of a time series rather than the overall magnitude of variability (e.g., SD), making them more sensitive to subtle changes occurring on a stride-by-stride basis. DFA and SE have a greater dependency on the accurate detection of gait events and are likely influenced by the disparity between the original sampling frequencies of the sensors and the force plates [45]. Liddy [46] explored the effect of the sampling rate on stride time DFA alpha and concluded that a lower sampling frequency could distort a stride time series and decrease the presence of long-term correlations. However, the latter only occurred for sampling frequencies below 120 Hz, with DFA calculations using 120-360 Hz being unaffected. Notably, most of the included sensors in this study sampled within the mentioned range, except for PG (500 Hz) and BT (1000 Hz), which sampled at a higher rate. However, the FP sampled at a considerably higher rate (1000 Hz) than most of the sensors, making FP more sensitive to subtle stride time variations. Alternatively, it is possible that using a sampling frequency that is too high may introduce non-biological white noise, which may result in lower agreement between the sensors and FP for non-linear measures [47].
It has been suggested that downsampling COP data can lead to a linear decrease in the DFA alpha and a linear increase in SE. This has the potential to affect the results of the current study, as several sensors and FPs were downsampled by different magnitudes to reach the common frequency of 120 Hz [48]. However, after conducting a subsequent comparison between the FP and the sensors using their original sampling frequency, the ICC and correlation values were similar when compared with the 120 Hz comparison (see Appendix A). The one exception to this was SE (decreased agreement with the original frequency), which is known to be affected by the sample frequency, as the interval between consecutive data points decreases when the sample rate is increased [49].
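One way to picture the influence of sampling frequency is to treat sampling as a rounding of each heel contact time to the nearest sample instant. The Python sketch below does this for a hypothetical stride time series; it models only timing quantisation, not the effects of spline downsampling or filtering, and every name and value in it is an assumption made for illustration.

```python
import random
import statistics


def quantise(times, fs):
    """Snap event times (s) to the nearest sample instant at rate fs (Hz)."""
    return [round(t * fs) / fs for t in times]


def stride_times(contact_times):
    """Intervals (s) between consecutive heel contacts."""
    return [b - a for a, b in zip(contact_times, contact_times[1:])]


def cv_percent(values):
    """Coefficient of variation as a percentage of the mean."""
    return 100 * statistics.stdev(values) / statistics.fmean(values)


# Hypothetical "true" heel contact times: 513 contacts give 512 stride times.
rng = random.Random(42)
contact_times, t = [], 0.0
for _ in range(513):
    contact_times.append(t)
    t += 1.1 + 0.02 * rng.gauss(0, 1)

true_strides = stride_times(contact_times)
for fs in (120, 1000):
    sampled = stride_times(quantise(contact_times, fs))
    worst = max(abs(a - b) for a, b in zip(sampled, true_strides))
    # Each contact is off by at most half a sample, so each stride time
    # is off by at most one sample period (1 / fs).
    print(fs, round(cv_percent(sampled), 3), worst <= 1 / fs + 1e-9)
```

Under this simplified model, the timing error of each stride is bounded by one sample period, about 8.3 ms at 120 Hz compared with 1 ms at 1000 Hz.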
The Bland-Altman plots (Figures 3 and 4) and the RMSE values (Table 1) demonstrate that there were only small measurement differences between each sensor and the FP when calculating linear and non-linear measures. Despite the generally small differences observed, there are instances where values fall outside the limits of agreement in the Bland-Altman plots (Figures 3 and 4). These outliers may be due to measurement or calibration errors. Alternatively, they could be due to unexpected changes in a participant's gait, which could affect their heel contact with the force plate and impact their detection by the sensor's algorithm to a different extent. Furthermore, the non-linear measures (Figure 4) show greater dispersion than the linear measures (Figure 3), indicating more variable agreement. The results of the paired t tests indicated a significant difference in the means calculated from certain sensors (OP, AX [SD]) when compared to the measurements obtained from the FP. Previous studies have shown that differences of >0.19 in DFA [50] and >0.66 in SE [51] can differentiate between healthy and impaired walking. Therefore, it is highly probable that wearable sensors could also be used to discriminate between similar population groups.
There were a few limitations that may have influenced the results of the study. The position of the OP sensors being over the shoelaces rather than close to the heel may have affected the stride time values obtained from this sensor, as demonstrated by the significant t test and lower levels of absolute and relative agreement. However, due to the OP software (Moveo Explorer, 2020 [52]) and sensor shape, it was not feasible to change its location. A similar issue occurred with the PG sensor, which only had one possible location (insole). Future research should randomise each device's location to ensure that the sensor's position on the foot does not affect the agreement with force plate measures. The decision to position the sensors around the boot of the participants was based on the consideration that this area would have a minimal impact on a soldier's ability to perform their duties in a field setting. Consistent heel contact detection at the ankle/foot has been recognised as a potential limitation of existing gait analysis methods [26,27]. When quantifying a discrete signal (stride time) from a continuous signal, a discretisation error is present in the calculation of all measures. Although the sampling frequency may have affected the results of the present study, the aim of the study was to compare currently available technologies and not to explore the fundamentals of time series non-linear analysis and the effects of technical specifications on them. Each sensor's absolute and relative agreement with the force plates sampling at their original frequency is shown in Appendix A (without the removal of outliers). All results are comparable to those of their downsampled agreement, apart from SE, which, as previously mentioned, is known to be influenced by sample frequency [49]. Continuous advancements in wearable devices and a consensus on the technical specifications and parameters used for non-linear analyses may help solve some of the present clinometric issues that prevent their expanded use.
In summary, the study demonstrated that linear and non-linear measures obtained from wearable device data are comparable to those obtained with FP during a loaded military march. Therefore, this technology can be used in future research to examine dynamic behaviour metrics, for example when assessing injury risk, and should encourage broader, in-field research to understand how constraints (e.g., load, terrain) affect soldiers' dynamic performance during military walking.

Figure 1. Experimental setup showing a participant on the instrumented treadmill. The coloured squares indicate the position of each embedded force plate.

Figure 2. Schematic of sensor positions on/in military boots.