1. Introduction
Three-dimensional (3D) motion capture within human biomechanics typically involves the use of infrared cameras and retro-reflective markers affixed to participants. This methodology, which can track 3D marker location at the sub-millimeter level depending on the capture volume, is commonly accepted as the method that most closely approaches the “gold standard” of fluoroscopy [1]. Although these marker-based methods are widely used in clinical and research domains, several limitations may challenge their practicality for certain applications.
Fundamentally, motion capture aims to measure movement of the skeletal system. Retro-reflective markers placed on the skin of key bony landmarks (e.g., lateral malleolus) may move relative to the underlying skeletal system (i.e., bones) due to the movement of clothing (if present), skin, and soft tissue (e.g., adipose) [2]. Several methodological approaches have sought to minimize the influence of these artifacts; however, they cannot eliminate them [3]. As such, participants are typically asked to wear minimal or tight clothing to reduce this error source. Further, when data collection periods are long or participants sweat profusely (e.g., the maximal aerobic capacity running test), the propensity for markers moving or falling off is heightened. Both situations would compromise data fidelity.
Markers placed on the participant are used to define a biomechanical model of the skeletal system. For example, markers are commonly placed on the femoral condyles with the vector connecting them representing the knee joint axis (i.e., the axis about which flexion and extension occur). The placement of the marker on the exact location of the condyle is subject to both random and systematic error, as placement in research investigations can vary both between researchers and within a researcher across sessions (i.e., in a repeated measures design) [4,5,6]. Thus, although markers can be tracked precisely, their location relative to the underlying skeletal system is approximate, inducing data variability and compromising reliability.
From a pragmatic standpoint, the time required to securely fix markers to participants is a serious drawback to the ease of use and clinical adoption of marker-based motion capture. Accurately fixing markers to a participant could take between 20 and 30 min for a skilled researcher, in addition to the time required to prepare the markers prior to the data collection session. Participants may also be cognizant of the markers on their body or clothing and potentially alter their movement as a result.
Marker-less (ML) motion capture systems are similar to marker-based systems in that they seek to accurately and repeatably measure 3D human segmental motion. Fundamentally, both approaches attempt to represent the complex human skeletal system with a simplified biomechanical model. ML systems do not require markers to be placed on the participant and rely on synchronized 2D video cameras to obtain a 3D reconstruction. The obviation of markers (i) vastly reduces the time and complexity of subject preparation, (ii) is non-invasive, and (iii) allows for data collection in non-laboratory settings (e.g., sporting games). With recent technological advances in computational speed and an apparent commercial market for ML systems, several software packages are available for purchase (e.g., Theia3D, DariMotion, and The Captury) or available as open source (e.g., OpenPose and OpenCap). It is beyond the scope of this manuscript to dissect the similarities and differences between the approaches that each ML software package takes towards generating 3D pose representations of the biomechanical model; however, the reader is pointed towards Wade et al. [7] for a comprehensive review.
Theia3D (Theia Marker-less Inc., Kingston, ON, Canada), a commercial provider of ML motion capture, utilizes a machine-learning based approach to solve the 3D biomechanical model representation from synchronized and calibrated 2D video. Kinematics measured with Theia3D and concurrently with the gold standard marker-based motion capture have compared favorably for treadmill walking gait, overground walking gait, baseball pitching, and boxing [8,9,10,11]. In particular, walking gait spatial metrics (e.g., stride length, step length, and stride width) determined from Theia3D had good comparisons to marker-based motion capture and fell below the minimal detectable change for these parameters [12]. Sagittal and frontal plane lower body joint angles determined from Theia3D deviated from marker-based motion capture by between 2.6 and 11°, while transverse plane lower body joint angles deviated up to 13° for walking gait [10]. However, a follow-up study determined the inter-session repeatability for walking lower extremity joint angles determined from Theia3D was less than 2.5° across all joint angles which outperforms traditional marker-based motion capture [13]. For upper extremity sporting motions (baseball pitching and boxing), Theia3D’s joint angle determination differed more substantially from the gold standard with reduced agreement occurring along the internal/external joint axes [9,14]. Despite the differences, segment velocities compared favorably, and patterns were similar enough to warrant consideration for future applications. To date, no investigation has reported the reliability of treadmill running gait kinematics when assessed with ML motion capture over multiple sessions.
Determination of key event markers (e.g., foot strike and toe off) is critical for the biomechanical analysis of running gait. In experimental and clinical setups without a force plate, these event markers can either be determined via visual inspection of a video recording or by kinematic algorithms. Kinematic algorithms using marker-based motion capture have achieved an absolute error of less than 24.7 ms for foot contact (using a rearfoot strike) and less than 5.3 ms for toe off when compared to kinetic outputs [15]. Milner and Paquette [16] also reported excellent agreement between event timings from a force plate and their foot contact algorithm which utilizes the vertical velocity of the pelvis segment. The algorithm’s accuracy was assessed for all foot strike types (rear, mid, and forefoot) during overground running.
To date, no investigation has reported the repeatability of ML motion capture (Theia3D) for the determination of kinematics (e.g., joint angles) and subsequent detection of key gait events during treadmill running across a range of speeds. Therefore, the following specific aims are proposed for this study:
- (1) To determine the level of agreement between spatiotemporal metrics (stance time, step length, and cadence) derived from automatic detection of foot strike and toe off during running to a gold standard assessment (an instrumented pressure treadmill).
- (2) To determine the intra-trial variability, inter-session variability, and variability ratio of lower extremity joint angles (hip, knee, and ankle) computed from an ML motion capture system across the entire running gait cycle.
- (3) To determine the inter-session repeatability, standard error of measurement, and minimal detectable change of 15 key discrete biomechanical metrics of the stance phase of running.
2. Materials and Methods
All study procedures were approved by the University’s Institutional Review Board and conducted in accordance with the Declaration of Helsinki. Participants were required to (i) be between the ages of 18 and 30 years, (ii) run at least 16 km per week for at least three consecutive months prior to testing, (iii) have a minimum of three years of running experience, and (iv) be familiar with treadmill running. Additionally, participants were required to be free of any history of major medical problems, including metabolic or cardiovascular disease, endocrine disorders, thermoregulatory disorders, or musculoskeletal injuries within the previous eight weeks. Twenty-one healthy, adult runners (14 females and 7 males; age: 19.5 (1.4) years; height: 1.72 (0.08) m; mass: 64.2 (12.2) kg; running experience: 7.3 (2.4) years) met the above criteria, granted informed consent, and were included within the current investigation.
2.1. Instrumentation
A stadiometer (Tanita Corporation; Arlington Heights, IL, USA) was used to record each participant’s mass and standing height prior to the first data collection session. A PhysTread Pressure Treadmill (Noraxon USA, Scottsdale, AZ, USA) containing 3120 sensors collected force and pressure data (100 Hz) during the running trials. Eight Sony RX0 II cameras (Sony Corporation; Minato, Japan; 120 Hz) were synchronized via Sony camera control boxes and positioned to accommodate a capture volume of approximately 2.4 × 2.4 × 3.1 m with the treadmill centralized (Figure 1). The cameras were moved minimally throughout the entire investigation, but they were recalibrated prior to each day of data collection. The treadmill was moved between sessions; however, tape placed on the floor allowed the treadmill position to be nearly identical. Participants were permitted to use any running footwear that they desired (Table 1); however, the same footwear was worn across the three testing sessions. Participants were instructed to wear their “normal” running attire, consisting of shorts (or short tights) and a shirt, which was to be tucked into the shorts. Previous work has reported that clothing type had only a negligible influence during overground walking with the ML motion capture software used in this study [17].
2.2. Procedures
This repeated measures research investigation featured three distinct data collection days with at least one day and no more than ten days between visits. Each participant ran for a total of six minutes on the instrumented pressure-sensitive treadmill (TM) at three distinct speeds that were self-selected by the participant based upon their rating of perceived effort. The speeds corresponded to a 3 out of 10 in effort (trial 1), a 5 out of 10 in effort (trial 2), and a 7 out of 10 in effort (trial 3). The speeds were held constant over the next two testing days. The last ~25 s at each speed were recorded by the eight video cameras and the pressure-sensitive TM. Video and TM data were not synchronized but recordings were started and stopped at approximately the same time by two researchers using “Go” and “Stop” commands.
2.3. Data Processing and Analysis
Noraxon MyoPressure software (Noraxon USA, Scottsdale, AZ, USA) collected and processed force and pressure TM data. Data were filtered with a 4th order Butterworth lowpass filter (50 Hz) prior to bilateral stance time, step length, and cadence being computed and subsequently averaged for each trial. Visual observation of foot strike and evaluation of center of pressure mapping was used to classify foot strike into rearfoot strike (RFS) and non-rearfoot strike patterns.
Theia3D ML motion capture software (Theia Marker-less Inc., Kingston, ON, Canada) determined three-dimensional human segment location based upon the eight video recordings per trial. The derived 19-segment kinematic model consisted of an upper extremity and lower extremity kinematic chain. Lower extremity joints (hip, knee, and ankle) connecting neighboring segments within the kinematic chain were modeled as having three rotational degrees of freedom.
Kinematic data (4 × 4 segment rotation matrices) and locations of virtual bilateral heel markers were exported and subsequently processed with Visual3D Professional (C-Motion Inc.; Germantown, MD, USA). Data were filtered with a GCVSPL filter with a 12 Hz cutoff frequency [18]. A built-in Visual3D model with segment properties (segmental mass and center of mass location) derived from previous work [19,20,21] was applied to determine full body center of mass (COM) location throughout each trial.
The Z-coordinate (vertical direction) of the distal endpoint of the toe segment was utilized to determine the toe-off gait event for each running gait cycle. Toe off was tagged two frames before the frame at which the Z-coordinate switched from its local minimum to superior vertical movement (Figure 2). The vertical velocity of the COM (vCOMz) was utilized to determine foot contact. This algorithm, similar to that of Milner and Paquette [16], who utilized the peak inferiorly directed pelvis segment velocity to detect foot strike, determined foot contact to be the data frame preceding the minimum of vCOMz (Figure 3). Stance time, cadence, and step length computed from these kinematically derived gait events were then compared to instrumented TM data from the 21 participants across the three respective speeds (63 trials) from the first data collection session.
Lower extremity joint angles (hip, knee, and ankle) were calculated between adjacent segments in all planes via Visual3D software using a standardized approach (Cardan rotation sequence: X-Y-Z) consistent with the joint coordinate system [22]. Segmental angles of the pelvis and trunk were computed relative to the laboratory coordinate system. Each gait cycle within the 25–30 s trials was time normalized (100 points) and exported to a custom Python software (ver. 5.3.3) script for final analysis. Time normalization was performed separately using the ML kinematic-derived gait events. All trials exceeded seven gait cycles, which has been reported to increase the confidence level to 90% (R > 0.9) for running kinematic data [23].
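As an illustration of the time normalization step, the sketch below splits a joint angle signal into gait cycles at successive foot contacts and resamples each cycle to 100 points. Linear interpolation is an assumption here (the interpolation scheme is not stated in the text), and `split_cycles` and `time_normalize` are hypothetical helper names.

```python
def split_cycles(signal, contacts):
    """Slice a signal into cycles between consecutive same-side foot contacts."""
    return [signal[a:b + 1] for a, b in zip(contacts[:-1], contacts[1:])]


def time_normalize(cycle, n_points=100):
    """Linearly resample one gait cycle to n_points samples (assumed scheme)."""
    if len(cycle) < 2:
        raise ValueError("a cycle needs at least two samples")
    last = len(cycle) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)  # fractional index into the cycle
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(cycle[lo] * (1 - frac) + cycle[hi] * frac)
    return out
```

After this step every cycle has the same length, so cycles can be compared point by point regardless of stride duration or running speed.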
The timing of gait events derived from ML kinematics was compared to those from an instrumented TM for agreement using three metrics: (1) mean differences (MD), (2) Bland–Altman limits of agreement (LoA) over 95% confidence intervals, and (3) intraclass correlation coefficients (ICCs) with a two-way mixed effects model and mean of k measurements with absolute agreement. Correlations were classified as poor, moderate, good, and excellent for ICC values below 0.5, between 0.5 and 0.75, between 0.75 and 0.9, and over 0.9, respectively [24].
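The mean difference, the 95% limits of agreement, and the ICC classification bands can be sketched in a few lines of Python. This is an illustrative sketch only: the direction of the difference (first system minus second), the use of the sample standard deviation, and the handling of values that fall exactly on a band boundary are assumptions, and the ICC itself is not computed here.

```python
from statistics import mean, stdev


def bland_altman(system_a, system_b):
    """Return the mean difference and 95% limits of agreement (MD ± 1.96 SD)."""
    diffs = [a - b for a, b in zip(system_a, system_b)]
    md = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    return md, (md - 1.96 * sd, md + 1.96 * sd)


def classify_icc(icc):
    """Map an ICC value onto the poor/moderate/good/excellent bands."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"
```

For example, `classify_icc(0.982)` returns `"excellent"` and `classify_icc(0.487)` returns `"poor"`.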
Two distinct reliability analyses (time series of full gait cycle and discrete variables) were conducted. Reliability across the entire gait cycle was assessed via intra-trial and inter-session variability based upon the work of Schwartz et al. [25] and Kanko et al. [13] (Figure 4). Intra-trial variability was defined as the average standard deviation across all the gait cycles (total > 30) within a trial. This average standard deviation within each trial was then averaged across all speeds and runners (n = 63) to arrive at an average intra-trial variability value. Data from the first data collection session were used for this analysis. The intra-trial variability reflects the typical joint movement fluctuations that may be associated with TM running and ML motion capture. Inter-session variability was defined as the standard deviation of each joint angle across the time series (multiple gait cycles) for each speed and runner. Inter-session variability may reflect methodological fluctuations that arise from differences in camera calibration, participants’ clothing, lighting, etc. A variability ratio was then computed as the quotient of the inter-session variability and the intra-trial variability. A ratio greater than one would indicate methodological variances between sessions that exceed the typical variance within a TM running trial.
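One possible reading of these definitions is sketched below for a single joint angle, runner, and speed. The interpretation is an assumption: intra-trial variability is taken as the point-by-point standard deviation across time-normalized cycles, averaged over the cycle, and inter-session variability as the point-by-point standard deviation across the session mean curves, averaged over the cycle. The function names are hypothetical.

```python
from statistics import mean, stdev


def pointwise_sd(curves):
    """Standard deviation across curves at each time point (equal-length lists)."""
    return [stdev(samples) for samples in zip(*curves)]


def mean_curve(cycles):
    """Average curve across the time-normalized cycles of one trial."""
    return [mean(samples) for samples in zip(*cycles)]


def intra_trial_variability(cycles):
    """Average over the gait cycle of the SD across cycles within one trial."""
    return mean(pointwise_sd(cycles))


def inter_session_variability(sessions):
    """Average over the gait cycle of the SD across session mean curves."""
    return mean(pointwise_sd([mean_curve(cycles) for cycles in sessions]))


def variability_ratio(sessions, reference_session=0):
    """Inter-session variability divided by intra-trial variability."""
    intra = intra_trial_variability(sessions[reference_session])
    return inter_session_variability(sessions) / intra
```

Here `sessions` is a list with one entry per data collection day, each entry holding the time-normalized cycles from that day. Using the first session as the intra-trial reference mirrors the statement above that first-session data were used for the intra-trial analysis.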
As clinical running gait analysis typically focuses on discrete variables, inter-session reliability was also assessed for kinematic variables at foot contact, toe off, and peak joint angles during the stance phase. Reliability for these variables was assessed using ICC3,1, standard error of measurement (SEM), and minimum detectable change (MDC). JASP software (ver. 0.16.4) was utilized for all statistical computations.
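The text does not state how SEM and MDC were calculated. The sketch below uses the formulas commonly applied in reliability studies, SEM = SD × √(1 − ICC) and MDC at the 95% confidence level = 1.96 × √2 × SEM, as an assumption rather than as the study's confirmed method.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC) (assumed formula)."""
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem, z=1.96):
    """MDC = z * sqrt(2) * SEM, with z = 1.96 for 95% confidence (assumed)."""
    return z * math.sqrt(2.0) * sem
```

With a hypothetical between-subject standard deviation of 2.0° and an ICC of 0.75, the SEM would be 1.0° and the corresponding MDC roughly 2.8°.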
3. Results
Twenty-one participants completed all study procedures with 6.5 ± 3.2 days between visits. Participants demonstrated varied foot strikes (13 rearfoot and 8 non-rearfoot) while wearing an assortment of footwear across a range of TM speeds (2.67–4.44 m/s) (Table 1). On average, participants completed 37.6 ± 4.6 running cycles per trial.
Table 1. Twenty-one participants completed the study procedures with sex, footwear and foot strike breakdown below. Foot strike was determined from visual inspection and center of pressure data.
| | Rearfoot Strike (n) | Rearfoot Strike (%) | Non-Rearfoot Strike (n) | Non-Rearfoot Strike (%) | Full Population (n) | Full Population (%) |
|---|---|---|---|---|---|---|
| Sex | | | | | | |
| Female | 10 | 47.6% | 4 | 21.1% | 14 | 66.7% |
| Male | 3 | 14.3% | 4 | 21.1% | 7 | 33.3% |
| Footwear | | | | | | |
| Adidas | 1 | 4.8% | 0 | 0.0% | 1 | 4.8% |
| Asics | 2 | 9.5% | 0 | 0.0% | 2 | 9.5% |
| Brooks | 0 | 0.0% | 2 | 10.5% | 2 | 9.5% |
| Hoka | 3 | 14.3% | 1 | 5.3% | 4 | 19.0% |
| Mizuno | 1 | 4.8% | 0 | 0.0% | 1 | 4.8% |
| New Balance | 2 | 9.5% | 1 | 5.3% | 3 | 14.3% |
| Nike | 1 | 4.8% | 2 | 10.5% | 3 | 14.3% |
| On Cloud | 2 | 9.5% | 0 | 0.0% | 2 | 9.5% |
| Saucony | 1 | 4.8% | 2 | 10.5% | 3 | 14.3% |
3.1. Spatiotemporal Metrics
Spatiotemporal metrics computed from events determined through ML kinematics were compared to these same metrics computed from the instrumented TM. Mean cadence (ML: 170.6 ± 9.0 steps·min−1; TM: 170.4 ± 9.0 steps·min−1; MD = −0.13; LoA: (−0.80, 0.54); and ICC = 1.0), mean stance time (ML: 0.234 ± 0.019 s; TM: 0.233 ± 0.021 s; MD = −0.001; LoA: (−0.014, 0.011); and ICC = 0.982), and mean step length (ML: 1.22 ± 0.12 m; TM: 1.22 ± 0.12 m; MD = 0.02 cm; LoA: (−0.80, 0.83); and ICC = 1.0) demonstrated excellent agreement between ML motion capture and instrumented TM when averaged across all trials (n = 63).
3.2. Intra-Trial Variability
Average intra-trial variability for all lower extremity joint angles was less than 3° (Table 2, Figure 5). When averaged across all joints and planes, intra-trial variability was 1.7°. Maximum variability (4.7°) occurred for knee flexion during the swing phase.
3.3. Inter-Session Variability
Average inter-session variability for all lower extremity joint angles was less than 2° (Table 2, Figure 5). When averaged across all joints and planes, inter-session variability was 1.1° with knee flexion/extension demonstrating the greatest variability (2.3°) between the three data collection sessions.
3.4. Variability Ratio
Inter-session variability for all joints and planes was less than intra-trial variability resulting in all variability ratios being less than 1. When averaged across all joints and planes, the variability ratio was 0.67.
3.5. Reliability of Discrete Metrics
At initial foot contact, all sagittal plane variables demonstrated either good or excellent reliability (ICC3,1 range: 0.781–0.956; Table 3). In the frontal plane, both the knee and trunk variables demonstrated excellent reliability, while the ankle, hip, and pelvis variables demonstrated moderate reliability (ICC3,1 range: 0.674–0.745). In the transverse plane, the knee, pelvis, and trunk variables demonstrated either good or excellent reliability (ICC3,1 range: 0.791–0.952), while the ankle and knee demonstrated moderate and good reliability, respectively. Across all planes, SEM values were low, ranging from 0.43 to 1.31° and averaging 0.68°.
All kinematic peak angles during stance demonstrated either good or excellent reliability (ICC range: 0.760–0.958), except for transverse plane rotations of the ankle and hip which demonstrated moderate and poor reliability, respectively (Table 4). Across all planes, SEM values were low (<1.1°), averaging 0.67°.
At toe off, all kinematic metrics in the sagittal plane demonstrated excellent reliability (ICC range: 0.917–0.956), except for the hip and pelvis variables (ICC = 0.659, 0.669) (Table 5). All kinematic metrics in the frontal plane demonstrated either good or excellent reliability (ICC range: 0.831–0.911), except for the ankle variable (ICC = 0.644). All kinematic metrics in the transverse plane demonstrated either good or excellent reliability (ICC range: 0.827–0.940), except for the ankle and hip variables (ICC = 0.501, 0.703). Across all planes, SEM values were low (<0.4°), averaging 0.27°.
4. Discussion
Distinguishing between measurement error and true change in a repeated measures study design (clinical or research application) requires reliable data sources. The aim of the present study was to assess the reliability of an ML motion capture system to measure key biomechanical metrics during TM running. In the current investigation, spatiotemporal metrics derived from the automatic detection of foot strike and toe off from ML kinematics demonstrated excellent agreement (ICC ≥ 0.982) with the same metrics derived from an instrumented pressure TM. Mean differences in stance time were less than 10 milliseconds (less than the sampling interval of the instrumented TM), indicating that the automated kinematic-based algorithms detected stance time within one frame of force data. These results are within the error tolerance reported for automated event detection methods from marker-based motion capture of TM running [15,16]. As this study’s sample of runners included a variety of foot strikes utilizing various footwear and running speeds, the proposed ML kinematic algorithms appear to be quite robust and possibly viable as an alternative for clinics/labs without a force measurement device.
The average intra-trial variation of lower extremity joint angles in all planes during TM running was small (<3°). This is smaller than the intra-trial variability demonstrated during overground walking using the same ML motion capture system [13]. The reduced variation can most likely be attributed to a more consistent velocity obtained during TM gait versus overground gait. Another potential reason for the reduced variation was that the present investigation used more than 30 gait cycles per trial, while an overground trial would only include 2–3 gait cycles. The increased number of gait cycles may better capture the normal variability during gait and reduce the influence of the extreme gait cycles. The maximum variability in the joint kinematics was 4.7° for knee flexion during the swing phase. Although it would be challenging to discern what percentage of this measured variance is due to intrinsic (natural variation) versus extrinsic (ML motion capture software) factors, increased knee flexion variability has been previously reported during terminal swing in running [26]. The greatest variations over the entire gait cycle during TM running occurred for the sagittal plane joint angles (average 2.0°). The frontal plane joint angles demonstrated the least variance (average 1.2°).
The variability between sessions was less than the variability within a running TM trial, and as a result, the variability ratio for all lower extremity joint angles was less than one. This indicates that performing multiple sessions with an ML motion capture system (while being deliberate about methodological consistency) does not increase kinematic variability. This demonstrates one of the advantages of an ML motion capture system over a marker-based motion capture system. With a marker-based motion capture system, the placement of the reflective markers and the movement of the skin/clothing present potential sources of error [27]. However, with an ML motion capture system, participants can be instructed to maintain footwear between sessions and wear typical running clothes (as in the current study) without increasing measurement variability. Keller et al. [17] reported that clothing choice produced, on average, root-mean-square differences of 2.6° within lower extremity joint angles during overground walking while using the same ML motion capture system used in this study. This magnitude exceeded inter-session differences for the current study and highlights that some standardization of clothing (i.e., “running clothes”) may reduce inter-session variability for TM running.
The final objective of the current investigation was to examine the reliability of discrete measures typically reported within research investigations or clinical assessments. Good or excellent agreement was noted in over 70% (30/45) of measures and moderate agreement in 27%. Only peak hip internal rotation excursion during stance demonstrated poor agreement (ICC3,1 = 0.487) over the three sessions. On average, discrete measures demonstrated good reliability across all planes with sagittal and frontal plane metrics slightly better than transverse plane metrics (sagittal ICC = 0.849, frontal ICC = 0.828, and transverse ICC = 0.770). Bramah et al. [28] reported similar findings (sagittal ICC = 0.788, frontal ICC = 0.833, and transverse ICC = 0.771) in their investigation of the repeatability of TM running using a marker-based motion capture system. In the current study, the transverse plane angles of the ankle and hip had the worst reliability (ICC = 0.618, 0.620) and were lower than values (ICC = 0.759, 0.752) reported by Bramah et al. [28]. Measurement of transverse plane angles has been noted as a challenge for ML motion capture with previous investigations reporting reduced performance in their accurate measurement [9,10]. Overall, an ML motion capture system can reliably measure key discrete metrics of TM running biomechanics over multiple sessions in a manner similar to using a marker-based system.
The average SEM for discrete measures at initial contact (0.68°), peak stance phase angle (0.67°), and toe off (0.27°) gave an indication of the precision of an ML motion capture system in assessing kinematics at these events. Additionally, the MDC reported for these metrics is required to ascertain whether biomechanical changes surpass clinical thresholds. On average, both the SEMs and MDCs reported are low; however, as noted by Bramah et al. [28], future work would need to be conducted to determine if such small changes have practical significance as they relate to either running injury or performance.
Several limitations should be noted with regard to the current investigation. The ML motion capture system was not synchronized with the instrumented TM. Although data collection of the respective hardware was started and stopped at approximately the same time, it was not exactly matched. However, this limitation is mitigated slightly by the inclusion of some metrics that rely only on a change in time (e.g., stance time) and not the time value itself. The sampling frequency of the instrumented TM was low (100 Hz), thus limiting its resolution to 10 milliseconds in the determination of stance time. This could explain the minimal deviations that were noted in stance times between systems. Future work should collect synchronized data with an instrumented TM capable of collecting at a higher sampling rate to evaluate the accuracy of the event detection algorithms introduced within this investigation. The current work was also delimited to (i) TM running, (ii) 0% incline, and (iii) healthy runners. Caution is warranted when applying the current results to testing environments (e.g., overground running) outside of these constraints.