1. Introduction
Quantitative gait analysis is a cornerstone of movement science, rehabilitation, and clinical neurology, providing objective measures of locomotor function that support diagnosis, prognosis, and intervention evaluation. Traditionally, three-dimensional (3D) motion capture has relied on marker-based optical systems, which are considered the gold standard for kinematic analysis; however, they are constrained by high costs, specialized infrastructure, and participant burden [1,2]. Advances in computer vision and machine learning have enabled the emergence of markerless motion capture (MMC), which derives body kinematics from standard video streams using human pose estimation algorithms. 3D-MMC systems substantially reduce the cost of motion capture because machine learning lessens the reliance on specialized cameras and equipment. Open-source pose estimation has been shown to be a valid and reliable method for measuring kinematic and spatiotemporal gait parameters, comparable to laboratory-based marker-based 3D motion capture [3,4,5]. Thus, the 3D-MMC approach offers greater accessibility and scalability by reducing setup complexity and facilitating data collection in both laboratory and real-world environments [4,6].
Recent systematic reviews have consistently demonstrated that 3D-MMC achieves good to excellent validity and reliability for spatiotemporal gait parameters compared to marker-based systems, with intraclass correlation coefficients often exceeding 0.90 for metrics such as walking speed, cadence, and step length [1,4,7]. However, kinematic accuracy is joint- and plane-dependent: sagittal hip and knee angles show moderate to good agreement, while ankle kinematics and non-sagittal planes remain less reliable [1,2,4]. These findings underscore the importance of adhering to rigorous methodological practices to optimize measurement quality and ensure comparability across studies. Several reviews emphasize that outcomes are susceptible to factors such as camera configuration, calibration procedures, participant setup, and trial segmentation, yet standardized protocols for MMC gait analysis remain scarce [2,8]. The most recent systematic review and meta-analysis focusing specifically on OpenCap concluded that validity and reliability are generally good to excellent, but that methodological frameworks are heterogeneous and standardized procedures are lacking [7].
Dual-task (DT) gait paradigms are widely used to probe the interaction between locomotion and concurrent cognitive demands, and DT-related changes in spatiotemporal parameters are frequently interpreted as indicators of cognitive–motor interference. However, meta-analytic work also emphasizes substantial methodological heterogeneity in DT protocols and outcome definitions, which complicates comparisons across studies and limits the interpretability of DT cost metrics [9,10,11,12,13]. In this context, markerless motion capture protocols that are explicitly documented and accompanied by quantitative estimates of reliability and measurement error may provide a useful foundation for more reproducible DT gait research.
Therefore, the primary objective of the present study was to establish a standardized protocol and procedures for 3D-MMC-based gait analysis using OpenCap and to quantify the within-session relative and absolute reliability of key spatiotemporal gait parameters obtained with this protocol. By documenting procedures for camera positioning, calibration, lighting, participant setup, and trial segmentation, this work aims to advance reproducible, comparable MMC practices. As a secondary, illustrative objective, we applied this protocol to a set of simple cognitive dual-task walking conditions and used intraclass correlation coefficients (ICCs), standard errors of measurement (SEMs), and minimal detectable change (MDC) values to determine which dual-task-related changes in gait parameters can be distinguished from measurement error.
3. Results
In total, 500 walking trials were performed (50 participants × 10 trials), and 491 trials (98.2%) were successfully processed using the OpenCap gait analysis program. Nine trials (1.8%) failed to process due to technical errors (two failed to upload and seven failed overground gait processing) and were therefore excluded from the statistical analysis. These nine failed trials affected seven participants: one ST-CW trial was excluded for one participant, and eight DT trials were excluded across six participants. This left complete data for 49 participants in the reliability analysis and 43 participants in the dual-task analysis. Participant baseline demographics are presented in Table 2.
Within-subject test–retest reliability across the five ST-CW trials demonstrated moderate to excellent relative reliability, as presented in Table 3. ICC(3,1) values ranged from 0.50 to 0.89, with the highest reliability observed for gait speed, stride length, and cadence. Step width exhibited good reliability, while double-support time showed moderate reliability. When the mean of all five ST-CW trials was analyzed, the average-measure ICC(3,5) values showed excellent reliability for gait speed, stride length, cadence, and step width; double-support time demonstrated good to excellent reliability.
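The gain from single-measure ICC(3,1) to average-measure ICC(3,5) follows the Spearman–Brown prophecy formula. A minimal sketch (the function name is ours, and 0.886 is an illustrative single-trial coefficient within the reported 0.50–0.89 range):

```python
def spearman_brown(icc_single: float, k: int) -> float:
    """Projected reliability of the mean of k trials from a single-trial ICC
    (Spearman-Brown prophecy formula)."""
    return (k * icc_single) / (1 + (k - 1) * icc_single)

# A single-trial ICC of 0.886 projects to ~0.975 for the mean of five trials,
# illustrating why the average-measure ICC(3,5) values are predominantly excellent.
print(round(spearman_brown(0.886, 5), 3))  # 0.975
```

Even the lowest single-trial coefficient in the reported range benefits substantially from averaging: spearman_brown(0.50, 5) is approximately 0.83.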
Within-subject variability from the GLM (SD(within)) and manually derived SEM(1) values were virtually identical for all spatiotemporal variables, confirming consistency between the two absolute reliability approaches. For single trials, the corresponding MDC(1) values indicated that relatively modest changes in gait speed and stride length, but larger changes in step width, cadence, and double support, are required to exceed measurement error at the 95% confidence level. When outcomes were expressed as the mean of five trials, both SEM(5) and MDC(5) were markedly reduced for all variables, indicating a clear gain in measurement precision when multiple trials are utilized; this was especially evident for gait speed, stride length, and cadence, whereas step width and double support remained comparatively less precise but still benefited from multiple trials.
Table 4 reports the calculations described in the methods section for each spatiotemporal variable’s absolute reliability.
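The absolute-reliability indices in Table 4 follow directly from standard definitions. A brief sketch of the arithmetic (the function names and example inputs are ours, chosen only to illustrate the √k precision gain from averaging trials):

```python
import math

def sem(sd: float, icc: float) -> float:
    """Standard error of measurement: SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)

def mdc95(sem_value: float) -> float:
    """Minimal detectable change at the 95% level: MDC = 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem_value

def sem_of_mean(sem_single: float, k: int) -> float:
    """Averaging k trials divides the SEM by sqrt(k)."""
    return sem_single / math.sqrt(k)

# Using the mean of five trials shrinks both SEM and MDC by a factor of
# sqrt(5) ~ 2.24, the precision gain described above.
s1 = sem(sd=0.22, icc=0.886)  # illustrative gait-speed inputs
print(round(mdc95(s1) / mdc95(sem_of_mean(s1, 5)), 2))  # 2.24
```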
Descriptive statistics for the ANOVA analysis of gait parameters across the six task conditions are presented in Table 5 (n = 43). Gait speed was highest during ST-CW (1.42 ± 0.22 m/s) and lowest during DT5-MULT (1.24 ± 0.28 m/s). Stride length was longest during ST-CW (1.53 ± 0.17 m) and decreased progressively across dual-task conditions, with the shortest observed in DT5-MULT (1.45 ± 0.22 m). Step width increased under dual-task conditions, with ST-CW demonstrating the narrowest mean width (10.19 ± 3.31 cm) and DT4-SC the widest (12.00 ± 3.62 cm). Cadence moved in the opposite direction, being highest in ST-CW (109.99 ± 7.51 steps/min) and decreasing notably in DT5-MULT (101.13 ± 11.22 steps/min). Double support time was lowest during ST-CW (28.67 ± 2.35%) and increased slightly in the dual-task conditions, peaking during DT5-MULT (30.41 ± 3.41%).
Figure 2 presents a visualization of the data with box and scatter plots for the five gait variables for all six conditions.
The multivariate test revealed a significant main effect of task condition on the combined spatiotemporal gait parameters (Pillai’s Trace = 0.501, F(25, 1050) = 4.67, p < 0.001, partial η² = 0.100; Wilks’ Lambda = 0.552, F(25, 767) = 5.31, p < 0.001, partial η² = 0.112), indicating that task condition accounted for a substantial proportion of the variance in the multivariate outcome.
Mauchly’s test indicated that the assumption of sphericity was violated for all dependent variables: speed (W = 0.308, p < 0.001), stride length (W = 0.370, p < 0.001), step width (W = 0.525, p = 0.028), cadence (W = 0.169, p < 0.001), and DS (W = 0.213, p < 0.001). Therefore, Greenhouse–Geisser corrections were applied to the univariate analyses. The univariate analyses showed that task condition had a significant effect on gait speed, F(3.34, 140.37) = 17.97, p < 0.001, partial η² = 0.300; stride length, F(3.7, 155.25) = 8.03, p < 0.001, partial η² = 0.161; step width, F(4.05, 170.18) = 7.51, p < 0.001, partial η² = 0.152; cadence, F(3.28, 137.88) = 19.35, p < 0.001, partial η² = 0.315; and DS, F(2.86, 119.97) = 4.96, p = 0.003, partial η² = 0.106.
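The Greenhouse–Geisser correction rescales the ANOVA degrees of freedom by an epsilon estimated from the covariance of the repeated measures. As a dependency-free sketch (the function name and matrix layout are ours), Box's epsilon can be computed from the double-centered covariance matrix of the k conditions:

```python
def gg_epsilon(S):
    """Box/Greenhouse-Geisser epsilon from a k x k covariance matrix S of the
    repeated measures. Epsilon = 1 under sphericity; its lower bound is 1/(k-1)."""
    k = len(S)
    grand = sum(sum(row) for row in S) / (k * k)
    row_mean = [sum(row) / k for row in S]
    # Double-center the covariance matrix.
    A = [[S[i][j] - row_mean[i] - row_mean[j] + grand for j in range(k)]
         for i in range(k)]
    trace = sum(A[i][i] for i in range(k))
    ss = sum(A[i][j] ** 2 for i in range(k) for j in range(k))
    return trace ** 2 / ((k - 1) * ss)

# A compound-symmetric covariance (equal variances, equal covariances)
# satisfies sphericity, so epsilon = 1 and no correction is needed.
cs = [[2.0 if i == j else 0.6 for j in range(6)] for i in range(6)]
print(round(gg_epsilon(cs), 3))  # 1.0
```

The corrected degrees of freedom reported above are the uncorrected values multiplied by the estimated epsilon.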
Bonferroni-adjusted pairwise comparisons showed that gait speed during ST-CW was significantly greater than during DT1-SC (p = 0.011), DT2-SC (p = 0.001), DT4-SC (p < 0.001), and DT5-MULT (p < 0.001), with no significant difference compared to DT3-SC (p = 0.053). Gait speed during DT5-MULT was significantly slower than during all other conditions, and there were no significant differences in gait speed among the DT-SC conditions. Furthermore, stride length during ST-CW was significantly greater than during DT5-MULT (p = 0.002). Step width was significantly narrower during ST-CW than during DT2-SC (p = 0.002), DT3-SC (p < 0.001), DT4-SC (p = 0.002), and DT5-MULT (p < 0.001). Cadence during ST-CW was significantly greater than during DT1-SC (p = 0.002), DT2-SC (p < 0.001), DT3-SC (p < 0.001), DT4-SC (p < 0.001), and DT5-MULT (p < 0.001). Lastly, double support time during ST-CW was significantly shorter than during DT2-SC (p < 0.001) and DT5-MULT (p = 0.002).
To determine whether these condition-related differences exceeded measurement error, we compared the observed changes in mean spatiotemporal parameters with the MDC(5) values derived from the single-task reliability analysis (Table 4). For gait speed, the reduction from ST-CW (1.42 m/s) to DT5-MULT (1.24 m/s) was 0.18 m/s, which exceeds the MDC(5) of 0.091 m/s. Similarly, reductions in cadence from ST-CW (110.0 spm) to all DT conditions exceeded the MDC(5) of 3.48 spm. In contrast, increases in step width and double support between ST-CW and several DT conditions were of smaller magnitude and were closer to or below their respective MDC(5) values.
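This interpretation step reduces to a simple threshold check. A minimal sketch using the gait-speed and cadence figures above (the helper name is ours):

```python
def exceeds_mdc(change: float, mdc: float) -> bool:
    """True when an observed change is larger than the minimal detectable
    change, i.e., unlikely to reflect measurement error alone."""
    return abs(change) > mdc

# Gait speed: ST-CW 1.42 m/s -> DT5-MULT 1.24 m/s, against MDC(5) = 0.091 m/s
print(exceeds_mdc(1.24 - 1.42, 0.091))   # True
# Cadence: ST-CW 110.0 spm -> DT5-MULT 101.1 spm, against MDC(5) = 3.48 spm
print(exceeds_mdc(101.1 - 110.0, 3.48))  # True
```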
4. Discussion
The primary aim of this study was to establish a standardized protocol and procedures for 3D MMC-based gait analysis using OpenCap and to quantify the reliability and within-session precision of key spatiotemporal gait parameters obtained. By providing detailed documentation of our setup, including camera positioning, calibration procedures, participant alignment and preparation, and trial segmentation, we aimed to establish a replicable framework for future studies that utilize markerless technology in gait analysis, both within and outside laboratory settings.
OpenCap’s published best practices highlight key elements for ensuring high-quality data capture during overground trials, including maximizing camera field-of-view overlap, minimizing occlusion, providing stable and consistent camera placement, and performing thorough calibration across the capture volume [15]. In alignment with these recommendations, we positioned two iOS cameras laterally at a consistent height (80 cm) on stable tripods at a 27-degree angle off the centerline. This setup allowed participants to be within both camera views when they started the walking trials and to remain in view until the end of the six-meter measurement area. In turn, the last viable stride captured was clearly within the measurement area, providing certainty that the stride was not taken during an acceleration or deceleration phase of the 10 m walkway, in line with the best practices of general 10 m walking tests [14].
In terms of calibration, our protocol adhered closely to OpenCap’s recommendations by using a rigid, matte-finish calibration board (the exact checkerboard available on OpenCap’s best practices website). The board was placed at the same location within the measurement area, aligned with a fixed floor marker to ensure reproducibility across trials and sessions. OpenCap emphasizes the importance of calibrating across the whole capture area and using high-contrast, non-reflective calibration tools, both of which were integral to our protocol. Additionally, our use of a standard neutral pose calibration positioned over the same fixed marker within the measurement area ensured accurate biomechanical model initialization, further reducing variability in spatiotemporal and kinematic data.
We specifically mounted the calibration checkerboard at or near the same height as the cameras (80 cm). Although the camera height in comparison to the calibration board height is not mentioned in OpenCap’s best practices, our preliminary trials before the study’s onset indicated a higher rate of failed gait analysis in the OpenCap software when the cameras were positioned higher than the calibration board, and vice versa. The authors of the current research believe that this is an important distinction that warrants further study to verify our assumptions.
Regarding lighting, OpenCap’s best practices simply state “a well-lit environment with even lighting across the calibration board and capture space.” Theia 3D, an alternative MMC platform, has published a blog post suggesting a minimum of 500 lux and a standard of 1000 lux as best practice [26]. The lighting in our laboratory was measured across multiple sessions, with average illumination values consistently ranging from approximately 950 to 1080 lux across the calibration board, camera zones, and walkway endpoints. Lighting uniformity was mostly satisfactory, with standard deviations between 9 and 40 lux in most zones; however, localized variability was observed at the calibration board (954 ± 125 lux) and the start of the walkway (872 ± 135 lux). These minor inconsistencies are likely attributable to the laboratory’s half-vaulted ceiling (placing the LED lighting farther from the measurement area) and the windows on the far west side (at the start of the walkway), where illumination fluctuated with incoming sunlight depending on the time of day and weather. Despite these variations, the overall mean illuminance was 963.8 ± 108.2 lux, indicating a well-lit environment that supports stable video capture. These results suggest that the laboratory’s fixed overhead LED lighting system provides a reliable illumination environment for markerless motion analysis, although minor improvements in uniformity at specific locations may further enhance tracking accuracy. Furthermore, to the best of our knowledge, no scientific research has been conducted on this topic; it would be of great interest for future research to compare different lighting scenarios and determine appropriate lighting ranges for 3D-MMC.
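For laboratories replicating this lux survey, the zone-by-zone summary can be scripted in a few lines. A sketch with hypothetical spot readings (the 500-lux floor is the minimum suggested by Theia 3D, not an OpenCap requirement, and the function name is ours):

```python
from statistics import mean, stdev

def summarize_zone(name, readings, min_lux=500):
    """Mean +/- SD illuminance for one capture zone, flagging any spot
    reading below the suggested minimum."""
    low = [r for r in readings if r < min_lux]
    return {"zone": name, "mean": round(mean(readings), 1),
            "sd": round(stdev(readings), 1), "below_min": len(low)}

# Hypothetical spot readings (lux) taken at the calibration board.
print(summarize_zone("calibration board", [980, 1050, 870, 920, 1010]))
```

A zone returning below_min > 0 would warrant supplemental lighting before capture.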
Beyond adhering to the best practices published on the OpenCap webpage [15], our study contributes further structure by formalizing trial segmentation (designating acceleration, measurement, and deceleration zones) and participant instructions to minimize inter-trial variability. A recent systematic review by Cheng et al. (2025) highlighted that, among current computer vision-based motion capture systems, OpenCap holds the most promise for clinical gait analysis [4]. Additionally, the current detailed documentation and schematic (Figure 1) serve not only as a record of our protocol but also as a proposed template for researchers seeking to further standardize OpenCap gait analysis procedures.
OpenCap records video on iOS devices and uploads this data to a cloud-based server over Wi-Fi, where the spatiotemporal parameters are generated once the user selects the appropriate secondary processing option (overground or treadmill gait). In the present study, 500 normal and dual-task walking trials were recorded, of which 491 (98.2%) were successfully uploaded, processed, and passed our quality checks. Two trials (0.4%) failed at the upload stage, most likely due to unstable Wi-Fi in our laboratory location compared with other areas on campus. The remaining seven failed trials (1.4%) were successfully uploaded but failed to complete overground gait processing. We suspect that atypical movement-related events during these trials may have challenged the underlying tracking and segmentation algorithms. This cannot be confirmed because OpenCap does not currently provide detailed error diagnostics. We suggest that reporting such capture and processing metrics become standard in research using OpenCap, as it provides transparent information on data completeness, highlights potential procedural shortfalls, and offers a practical benchmark for the robustness of processing protocols.
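The capture and processing metrics we advocate reporting can be tabulated with a few lines of code. A sketch using our trial counts (the function and field names are ours):

```python
def processing_report(total, upload_failures, processing_failures):
    """Summarize data completeness for a markerless capture session."""
    processed = total - upload_failures - processing_failures
    pct = lambda n: round(100.0 * n / total, 1)
    return {"trials": total,
            "processed": processed,
            "success_rate_pct": pct(processed),
            "upload_failure_pct": pct(upload_failures),
            "processing_failure_pct": pct(processing_failures)}

# 500 trials, 2 upload failures, 7 overground-processing failures -> 98.2% success
print(processing_report(500, 2, 7))
```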
Relative and absolute reliability analyses show that our standardized protocol and procedures for 3D MMC (OpenCap) yield moderate-to-excellent relative reliability and relatively small absolute errors for key spatiotemporal gait parameters. Single-measure ICC(3,1) values ranged from moderate to excellent, and the corresponding ICC(3,5) coefficients for the mean of five ST-CW trials were predominantly excellent, indicating highly stable individual measurements under the proposed protocol. Gait speed, stride length, and cadence showed good to excellent test–retest reliability for single measurements (ICC(3,1)). When averaged over five trials (ICC(3,5)), the relative reliability was even higher, with only DS falling in the good-to-excellent range. These data should serve as a reminder to collect data across multiple trials when using 3D-MMC systems to improve measurement reliability.
The current data demonstrate that SD(within) and SEM(1) were nearly identical across all spatiotemporal outcomes, indicating that the variance components extracted from the repeated-measures GLM and the classical ICC-based SEM calculations described in the methods section and presented in Table 4 capture the same underlying trial-to-trial error. This internal consistency supports the use of SD(within) as a straightforward GLM-derived precision metric, and of SEM and MDC as complementary, clinically interpretable indices of absolute reliability. These data also confirm that trial-to-trial noise in the OpenCap software is minimal when standardized procedures are used, achieving precision comparable to other validated gait technologies [1,22,23,27,28,29,30,31]. However, caution is warranted for the step width and DS variables, where the variance attributable to measurement error is much greater; these outcomes should therefore be interpreted conservatively.
Furthermore, within classical test theory (Lord & Novick, 1968), and as more recently noted by Borsboom & Mellenbergh (2022) [32,33], reliability is defined as the ratio of true-score variance to observed-score variance, such that the proportion of variance attributable to measurement error equals 1 − ICC. In our case, the standard error of measurement (SEM) was computed as SEM = SD × √(1 − ICC). It follows directly that (SEM/SD)² = 1 − ICC, and thus (SEM/SD)² × 100 represents the percentage of the observed variance that is due to measurement error. Using this approach, our single-trial OpenCap measurements suggest that measurement error accounts for 11.40% of the observed variance in gait speed, 12.4% in stride length, 21.7% in step width, 11.9% in cadence, and 50.3% in double support (DS). When the mean of five trials is used (SEM(5) = SEM(1)/√5, so the error variance is divided by five), the proportion of variance attributable to measurement error is markedly reduced: 2.28% for gait speed, 2.48% for stride length, 4.34% for step width, 2.38% for cadence, and 10.06% for DS. Equivalently, reliability for each variable can be expressed as ICC = 1 − (SEM/SD)². Essentially, coming back full circle, a 2.28% error variance for gait speed implies an ICC of approximately 0.975 when averaged over five trials, as reported in the ICC tables.
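The arithmetic in this paragraph can be checked directly. A short sketch (the function names are ours):

```python
def error_variance_pct(icc: float) -> float:
    """Percentage of observed variance attributable to measurement error,
    which equals (SEM/SD)^2 * 100 = (1 - ICC) * 100 under classical test theory."""
    return 100.0 * (1.0 - icc)

def error_pct_of_mean(single_trial_pct: float, k: int) -> float:
    """Averaging k trials divides the error variance by k, since
    SEM(k) = SEM(1)/sqrt(k) and therefore (SEM/SD)^2 shrinks by k."""
    return single_trial_pct / k

# Gait speed: an 11.4% single-trial error variance becomes 2.28% for the
# five-trial mean, consistent with the excellent average-measure ICCs above.
print(round(error_pct_of_mean(error_variance_pct(0.886), 5), 2))  # 2.28
```

The same two lines reproduce the remaining five-trial percentages (2.48%, 4.34%, 2.38%, 10.06%) from the corresponding single-trial error variances.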
As a secondary objective, we used dual-task conditions to illustrate how the reliability estimates obtained under the standardized protocol can inform the interpretation of condition-related changes. When the DT data were examined in light of the MDC(5) values derived from the single-task reliability analysis, reductions in gait speed, stride length, and cadence between ST-CW and DT conditions consistently exceeded their respective MDC(5), indicating that these changes are unlikely to arise from measurement error alone. In contrast, increases in step width and double support were smaller relative to their MDC(5) values and therefore should be interpreted more cautiously. This is not surprising, as most validity and reliability research to date has found that frontal plane variables demonstrate greater variance than criterion measures [1,2,4]. These findings demonstrate that, under a rigorously standardized 3D-MMC protocol, OpenCap is sufficiently precise to detect relatively small experimental manipulations at the group level, while also highlighting which gait variables are less suitable for detecting subtle changes.
Prior research has typically utilized serial counting in 3s and/or 7s, usually counting down, but not exclusively [34,35,36,37,38,39]. However, the authors developed the current DT protocols with a progression of complexity in mind. We approached the serial counting method by counting up (DT1-SC and DT3-SC), counting down (DT2-SC and DT4-SC), and multiplying (DT5-MULT); within that model, we asked the participants to count in different ways, such as in odd or even numbers, in sets/groups, and by multiplying with progressively larger numbers. Although the data are not explicitly presented here, there were no statistical differences among the dual-task questions within the same class (e.g., counting up in odd (DT1.1) or even numbers (DT1.2), or counting down in sets by 20 (DT4.1), 50 (DT4.2), or 100 (DT4.3)). In the current model, counting down appears to be more cognitively demanding than counting up, regardless of the number or pattern (odds vs. evens, 3s vs. 7s, etc.). The current data help support the argument that serial counting down in 3s and/or 7s is a functional cognitive DT test for detecting changes in spatiotemporal gait parameters in healthy young adults. However, future research should further examine this hypothesis to determine the hierarchical structure of serial-counting dual-tasks. A recently published paper by Almutari et al. attempted to answer a similar question using six different dual-tasks; however, only two of those tasks involved serial counting, and the study found no significant differences between the six observed dual-tasks [38].
Importantly, our study design minimized methodological variability that might otherwise confound these results. Standardized trial instructions, consistent calibration procedures, and strict task delivery ensured that observed differences in gait were attributable to cognitive load rather than to inconsistencies in data collection or participant understanding. This methodological rigor complements meta-analytic evidence that dual-task costs scale with cognitive task complexity [37]. The sensitivity of markerless motion capture in detecting small changes in gait speed, stride length, and cadence, as demonstrated here, underscores its utility for advancing dual-task gait research, especially when standardization procedures are strictly followed.
Our results contribute to the growing body of evidence that even simple cognitive tasks, when performed concurrently, can significantly affect spatiotemporal gait parameters in healthy college-age adults, and this effect is not exclusive to an aging population. This work highlights the importance of establishing standardized protocols in dual-task gait analysis to facilitate comparisons across studies and populations. Future research should explore how different categories of cognitive tasks, as well as task prioritization strategies, further modulate gait, and how MMC systems can be optimized to support large-scale, multi-site investigations. The authors believe that baseline data should be collected in younger, healthy adult populations to standardize the baseline cost of cognitive dual-tasking, which can then be compared in aging and/or diseased populations.
The current study has several limitations that should be acknowledged. First, all reliability and dual-task analyses were conducted in a single laboratory session with a relatively homogeneous sample of healthy young adults; precision and dual-task effects may differ in heterogeneous or clinical populations and across different environments and settings. Second, reliability was established for one specific OpenCap configuration (two cameras, fixed distances, lighting, and overground walkway length), so the results should not be generalized to other MMC setups without additional validation and reliability testing. Third, we focused only on spatiotemporal variables; joint-level kinematics and gait variability measures were not evaluated here. Fourth, we did not include a criterion reference system, such as a marker-based 3D motion capture system or an instrumented walkway, for direct validation of OpenCap outputs. In contrast, several previous reviews (cited in the introduction) have compared markerless or wearable systems against such gold standards. Fifth, participants’ baseline cognitive capacity was not measured, and cognitive-task performance during walking was not quantified; therefore, the dual-task analyses presented here should be viewed primarily as an illustration of how to apply the reliability metrics rather than a definitive characterization of cognitive–motor interference. Finally, although averaging the five ST-CW trials improved precision, step width and double support remained less reliable, so conclusions based on these variables should be interpreted with caution.
Importantly, while our secondary DT analyses demonstrated the sensitivity of this protocol in detecting changes, the greater contribution lies in demonstrating how MMC technology can be applied rigorously in research and clinical settings. As the field moves toward broader adoption of MMC, further dialog is needed to standardize methodologies and ensure data comparability across laboratories and studies. This work aims to initiate this conversation and provide a foundation for future methodological discourse, ultimately leading to consensus within the research community.
Given the rapid emergence of MMC, such as OpenCap, the scientific community must engage in discussions about standardized data collection procedures and best practices. Establishing shared practices will ensure that data generated using this technology is valid, reliable, and comparable across laboratories. Although our analysis of dual-task gait changes provides insight into cognitive-motor interactions, it is secondary to our primary objective of advancing transparent, rigorous methods for gait data acquisition using this promising technology.