Validation of a Markerless Multi-Camera Pipeline for Bouldering Fall Kinematics
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
- Supplement the synchronization accuracy verification method between multiple cameras, such as inter-frame delay error measurement, and also supplement the time difference threshold for different cameras to capture the same action after synchronization.
- Section 2.5.1 does not specify the key parameter settings during joint trajectory reconstruction, such as confidence threshold and selection of joint point optimization algorithm.
- Supplement the specific standards and basis for outlier handling.
- Section 3.1 only presents the relative errors of overall segment lengths. Statistical test results of relative errors for each segment can be supplemented separately according to the three scenarios of "Standing/Loss of Balance/Roll" to clarify whether different fall postures have a statistically significant impact on reconstruction accuracy.
- It is recommended to add a comparative analysis of gender groups in each core result section (such as fall height, peak velocity, peak acceleration) to clarify whether gender factors have a significant impact on the measurement results.
Author Response
Response to reviewers:
Title: Validation of a Markerless Multi-Camera Pipeline for Bouldering Fall Kinematics
Manuscript ID: sensors-4064311
We would like to thank the Reviewers for their careful evaluation of our manuscript and for their constructive comments. We have revised the manuscript accordingly and believe that these changes have improved the clarity and quality of the work.
In the response below, reviewers’ comments are reproduced in black, and our responses are provided in blue. All modifications have been incorporated in the revised manuscript and are highlighted.
Sincerely,
Nathan Carretier, Erwan Beurienne, Marie-Hélène Beauséjour, Lucas Gros, Claire Bruna-Rosso, Marine Dorsemaine, Michel Behr, Nicolas Bailly, Julien Clément
Reviewer 1:
- Comment 1: Supplement the synchronization accuracy verification method between multiple cameras, such as inter-frame delay error measurement, and also supplement the time difference threshold for different cameras to capture the same action after synchronization.
Response: Thank you for pointing this out. We clarified and expanded Section 2.4 (“Calibration and Synchronization”) to explicitly describe how inter-camera synchronization accuracy was verified and to report the corresponding time threshold. Because camera recordings were initiated using a wireless remote, small differences in recording start times across views could occur. To synchronize the five cameras, we identified the exact hand-contact frame of the hand clap in each camera view using Kinovea (frame-by-frame). The camera for which the clap occurred earliest was used as the reference (corresponding to the shortest recording), and the beginning of the other videos was trimmed by the measured temporal offset so that the clap aligned across all views. We then quantified the residual inter-camera delay after trimming and confirmed it was ≤1 frame at 240 fps (≤4.17 ms) by comparing the first and last frames as well as the total number of frames of each trimmed video, confirming that all views covered the same time window. These additions are included in Section 2.4 of the revised manuscript.
Lines 187-200: “Before each climb, the athlete performed a standardized sequence: a vertical jump, a T-pose, and a single hand clap. Camera recordings were initiated using a wireless remote connected to all cameras, which can introduce small differences in recording start times across views. Therefore, synchronization between cameras was performed using the hand clap event. First, the exact frame of hand contact was identified in each camera view using Kinovea (frame-by-frame). The camera for which the clap occurred earliest was used as the reference (and corresponded to the shortest recording, as all cameras captured the same scene with only a small trigger delay). For each other camera, the temporal offset relative to the reference was computed, and the beginning of the video was trimmed by the corresponding amount so that the clap was aligned across all five views. After trimming, we verified that the residual inter-camera delay was ≤1 frame at 240 fps (≤4.17 ms) by comparing the first and last frames as well as the total number of frames of each trimmed video, confirming that all views covered the same time window.”
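The trimming logic described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' processing code: the camera labels follow the C1–C5 convention, but the clap frame indices and video lengths are hypothetical, and the function names (`trim_offsets`, `trimmed_lengths`, `residual_check`) are invented for illustration.

```python
# Sketch of clap-based trimming across five cameras.
# Hypothetical inputs: clap contact frame indices (read manually, e.g. in
# Kinovea) and the total frame count of each raw video.
FPS = 240.0


def trim_offsets(clap_frames):
    """Frames to drop from the start of each video so the clap aligns.

    The reference camera is the one in which the clap occurs earliest
    (smallest frame index); its offset is therefore zero.
    """
    ref = min(clap_frames.values())
    return {cam: frame - ref for cam, frame in clap_frames.items()}


def trimmed_lengths(n_frames, offsets):
    """Number of frames remaining in each video after trimming its start."""
    return {cam: n_frames[cam] - offsets[cam] for cam in n_frames}


def residual_check(lengths, fps=FPS, max_frames=1):
    """Return (within tolerance?, spread in ms) for the trimmed lengths.

    Equal lengths only imply a common time window if all cameras stopped
    together, which is why first/last frames should also be inspected.
    """
    spread = max(lengths.values()) - min(lengths.values())
    return spread <= max_frames, spread * 1000.0 / fps


clap = {"C1": 512, "C2": 508, "C3": 510, "C4": 505, "C5": 509}
total = {"C1": 3007, "C2": 3003, "C3": 3005, "C4": 3000, "C5": 3005}
offsets = trim_offsets(clap)  # C4 is the reference here (offset 0)
ok, spread_ms = residual_check(trimmed_lengths(total, offsets))
print(offsets, ok, round(spread_ms, 2))  # a 1-frame spread is 4.17 ms
```

One frame at 240 fps is 1000/240 ≈ 4.17 ms, which is where the ≤4.17 ms tolerance comes from.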
- Comment 2: Section 2.5.1 does not specify the key parameter settings during joint trajectory reconstruction, such as confidence threshold and selection of joint point optimization algorithm.
Response: Thank you for the suggestion. We agree that key reconstruction parameters should be explicitly reported for reproducibility. Therefore, we expanded Section 2.5.1 to specify the pose-estimation model and mode (RTMLib, Body_with_feet, performance), the single-person setting (multi_person = false), and the key confidence (likelihood) and reprojection error thresholds used for person association (tracked keypoint: Neck; 20 px; likelihood = 0.3) and triangulation (minimum cameras = 2; reprojection error = 25 px; likelihood = 0.3). We also clarified missing data handling (linear interpolation <10 frames; larger gaps left as NaN) and noted that Pose2Sim does not implement an explicit joint point optimization/IK fitting step, as trajectories are obtained via multi-view triangulation. These changes appear in Section 2.5.1 of the revised manuscript.
Lines 210-222: “Videos were processed with Pose2Sim v0.10.20 [15,16,24], which performs multi-camera markerless 3D reconstruction of joint trajectories. Pose estimation was performed with RTMLib using the Body_with_feet model in performance mode, and a single-person configuration (multi_person = false). Person association followed a single-person workflow (tracked keypoint: Neck) with an association reprojection-error threshold of 20 px and a keypoint confidence (likelihood) threshold of 0.3. 3D joint trajectories were reconstructed by triangulation using at least two cameras (minimum cameras for triangulation = 2), with a triangulation likelihood threshold of 0.3 and a triangulation reprojection-error threshold of 25 px. Pose2Sim does not include an explicit joint-point optimization step (e.g., inverse-kinematics fitting); joint trajectories result from multi-view triangulation under the above confidence and reprojection constraints. Missing data were linearly interpolated only for gaps shorter than 10 frames; larger gaps were left as missing values (filled with NaN). All parameters were defined in the project configuration file and applied consistently across all trials.”
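The gap-filling rule quoted above (linear interpolation only for gaps shorter than 10 frames, longer gaps kept as NaN) can be illustrated with a small standard-library function. This is a hypothetical re-implementation of the stated rule for one coordinate of one keypoint, not Pose2Sim's internal code. The handling of gaps at the start or end of a trial (left as NaN because only one neighbour exists) is an assumption.

```python
import math


def fill_short_gaps(values, max_gap=10):
    """Linearly interpolate NaN runs shorter than `max_gap` frames.

    Runs of `max_gap` frames or more, and runs touching either end of the
    series (no valid neighbour on one side), are left as NaN.
    """
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if not math.isnan(out[i]):
            i += 1
            continue
        start = i
        while i < n and math.isnan(out[i]):
            i += 1
        gap = i - start  # i is now the first valid index after the run (or n)
        if start > 0 and i < n and gap < max_gap:
            left, right = out[start - 1], out[i]
            for k in range(start, i):
                frac = (k - start + 1) / (gap + 1)
                out[k] = left + frac * (right - left)
    return out


nan = float("nan")
short = fill_short_gaps([0.0, nan, nan, 3.0])        # 2-frame gap: filled
long_gap = fill_short_gaps([0.0] + [nan] * 10 + [1.0])  # 10-frame gap: kept NaN
```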
- Comment 3: Supplement the specific standards and basis for outlier handling.
Response: We agree that the criteria for outlier handling required further clarification. We have therefore added a dedicated explanation in the Methods section (Section 2.6, Variables and Statistical Analysis) specifying the rationale for the 1000% relative error threshold. We now explain that this threshold corresponds to a tenfold deviation from the expected segment length and was selected to identify major tracking failures caused by incorrect joint identification rather than intrinsic segment length estimation errors.
Lines 291-298: “Segment lengths were computed as joint-to-joint distances and averaged across frames for each trial. The analysis was performed twice: (1) after removing extreme values (>1000% relative error) corresponding to a tenfold increase relative to the expected segment length and indicative of clear pose estimation failures due to incorrect joint identification rather than intrinsic segment length estimation errors and (2) including all data points to examine how fall condition (Standing, Loss of Balance, Roll) influences reconstruction variability. Both results are reported in a single table to distinguish typical performance from condition-dependent tracking errors.”
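A minimal sketch of the segment-length check, assuming the relative error is computed as 100·(L_est − L_ref)/L_ref against some reference length. The reference source and the signed definition are assumptions, since the quoted text does not define them. Under this definition, a 1000% error means the estimate deviates from the reference by ten times the reference length: a positive error of that size corresponds to an estimated length of about eleven times the expected one. All function names are hypothetical.

```python
import math


def mean_segment_length(joint_a, joint_b):
    """Mean joint-to-joint distance over frames, skipping frames with NaN."""
    dists = [
        math.dist(a, b)
        for a, b in zip(joint_a, joint_b)
        if not any(math.isnan(c) for c in (*a, *b))
    ]
    return sum(dists) / len(dists)


def relative_error_pct(estimated, reference):
    """Signed relative error in percent (assumed definition)."""
    return 100.0 * (estimated - reference) / reference


def split_extremes(errors_pct, threshold_pct=1000.0):
    """Separate typical errors from extreme tracking failures."""
    kept = [e for e in errors_pct if abs(e) <= threshold_pct]
    extreme = [e for e in errors_pct if abs(e) > threshold_pct]
    return kept, extreme


# Hypothetical per-trial errors (%) for one segment
kept, extreme = split_extremes([5.0, -12.0, 1066.7, 2500.0])
```

Reporting both `kept` and the full list mirrors the two analyses described (with and without extreme values).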
- Comment 4: Section 3.1 only presents the relative errors of overall segment lengths. Statistical test results of relative errors for each segment can be supplemented separately according to the three scenarios of "Standing/Loss of Balance/Roll" to clarify whether different fall postures have a statistically significant impact on reconstruction accuracy.
Response: We agree that reporting the statistical results by fall scenario improves clarity. Therefore, we added in Section 3.1 a scenario-specific analysis: paired t-tests of segment-length relative errors were computed separately for Standing, Loss of Balance, and Roll conditions (with and without exclusion of extreme values). These tests did not reveal any systematic bias for any segment in any scenario (all p > 0.05; p-values ranged from 0.16–0.93 in Standing, 0.11–0.93 in Loss of Balance, and 0.08–0.84 in Roll). To keep Table 1 readable and avoid overloading the manuscript with additional columns, we reported the scenario-specific p-value ranges directly in the Results text (Section 3.1), while Table 1 continues to summarize the mean ± SD patterns (with/without outliers) across scenarios.
Lines 333-339: “When examined separately by fall scenario (Standing, Loss of Balance, Roll), paired t-tests of segment-length relative errors did not reveal any systematic bias for any segment (all p > 0.05), both when including and excluding extreme values. Specifically, p-values ranged from 0.16–0.93 in Standing, 0.11–0.93 in Loss of Balance, and 0.08–0.84 in Roll. These results indicate that fall posture did not significantly affect mean reconstruction accuracy, while Roll trials primarily increased variability and the occurrence of extreme tracking failures (Table 1).”
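For readers reproducing this kind of check, a paired t-test reduces to a one-sample test on the per-trial differences. The sketch below uses only the Python standard library and also returns Cohen's d_z (mean difference divided by the SD of the differences), one common effect size for paired designs. Which quantities are paired (e.g., Pose2Sim versus reference segment lengths) is an assumption here, and the function name is hypothetical. The sketch treats every trial as an independent pair. The p-value requires the t distribution with n − 1 degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel(a, b)` returns the same statistic together with a two-sided p-value.

```python
import math
import statistics


def paired_t_and_dz(a, b):
    """Paired t statistic, degrees of freedom and Cohen's d_z for a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_d / (sd_d / math.sqrt(n))
    dz = mean_d / sd_d  # standardized mean of the paired differences
    return t, n - 1, dz


# Hypothetical segment lengths (m): estimated vs reference, four trials
t_stat, df, dz = paired_t_and_dz([0.43, 0.41, 0.45, 0.40],
                                 [0.42, 0.42, 0.42, 0.42])
```

Note that t = d_z·√n, so a small d_z can still reach significance with many trials, and vice versa.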
- Comment 5: It is recommended to add a comparative analysis of gender groups in each core result section (such as fall height, peak velocity, peak acceleration) to clarify whether gender factors have a significant impact on the measurement results.
Response: Thank you for this recommendation. After careful consideration, we chose not to include a sex-stratified analysis in the manuscript. Our sample is small and unbalanced (7 females and 3 males, 25 female falls vs. 15 male falls). Moreover, sex is strongly intertwined with anthropometrics (e.g., body mass and height), exposure (fall height; controlled vs. natural trials), and landing strategies (posture/anticipation). Given the strong correlation between these factors and the large fall-to-fall variability, isolating a specific sex effect would be highly uncertain and prone to overinterpretation. Since the primary aim of this paper is methodological validation, we believe adding such analyses would introduce noise and distract from the main contribution. We therefore clarified this point in the Limitations and note that a dedicated study with larger, balanced groups would be required to assess sex effects robustly.
Lines 562-567: “Sex-stratified analyses were not included because the sample was small and unbalanced (7 females and 3 males, 25 female falls vs. 15 male falls). Moreover, sex is intertwined with anthropometrics (body mass/height), exposure (fall height; controlled vs. natural trials), and landing strategies (posture/anticipation), making it difficult to isolate a sex effect without overinterpretation; a dedicated study with larger, balanced groups will be required.”
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Line 60: It’s more accurate to say that Kinovea is a manual annotation and motion analysis tool.
Line 112: When describing the equipment, don’t mix the manufacturer’s details with the specifications. Example: Five cameras (HERO12, GoPro, San Mateo, CA, USA) with a max resolution of 2.7K at 240 fps…. Same for the IMUs mentioned later.
Line 114: Give a few more details regarding the camera setup and name the cameras before mentioning C4 (e.g., “The cameras were labeled C1–C5…”). Also, please explain why C4 was chosen (it is obvious from Figure 1, but please mention it in the text as well).
Line 209: Wouldn’t filtering the downsampled 240 Hz signal at 13 Hz effectively remove signals at 86.6 Hz and lower from the 1600 Hz signal? That would affect the information contained. In general, downsampling is the last step, especially if the event examined (such as falling) has high-frequency content.
Figure 3: The fonts are too small and the resolution is not high enough, so the labels can be hard to read.
Figure 7: Similarly, the Bland-Altman plots are of low resolution; zooming in makes them look blurry.
3.6 Impact of resampling: Doesn’t that mean the lower resolution has distorted the data? Section 3.6.1 seems to say exactly that. Explain why your results are still valid. The Discussion and Conclusion sections discuss the practical implications, but here the technical aspects (such as events that were lost) should be presented.
Author Response
Reviewer 2:
- Comment 1: Line 60: It’s more accurate to say that Kinovea is a manual annotation and motion analysis tool.
Response: Thank you for pointing this out. We revised the Kinovea description to clarify that it is a manual video annotation and motion analysis tool enabling frame-by-frame measurements. This change has been made in the revised manuscript in the Methods section (Kinovea description).
Lines 60-62: “Two-dimensional (2D) video analysis software Kinovea (kinovea.org) is a manual video annotation and motion analysis tool that enables frame-by-frame measurements of displacement and velocity.”
- Comment 2: Line 112: when describing the equipment, don’t mix the manufacturer’s details with the spec. Example: Five cameras (HERO12, GoPro, San Mateo, CA, USA) with a max resolution of 2.7K at 240 fps…. Same for the IMUs mentioned later.
Response: We agree with this comment. Therefore, we revised the camera and IMU descriptions to separate the device identification (model and manufacturer) from the measurement specifications. In particular, we now report the camera recording settings (240 fps, maximum resolution 2.7K) and the IMU sampling rate (1600 Hz) and measurement range (±200 g) in sentences separate from the manufacturer details. This change has been made in the revised manuscript in the Methods (equipment description).
Lines 121-126: “(HERO12 Black, GoPro Inc., San Mateo, CA, USA) were installed around the wall to maximize visibility of the athletes and minimize occlusions. The cameras were labeled C1–C5 from right to left in the setup (Figure 1), and each camera was placed at the same location across all recording sessions to facilitate subsequent data processing. Videos were recorded at 240 fps with a maximum resolution of 2.7K.”
Lines 133-135: “Three IMUs (Blue Trident, Vicon Motion Systems Ltd., Oxford, UK) [13,14] were attached to the right ankle, sacrum, and forehead. Triaxial acceleration was sampled at 1600 Hz with a measurement range of ±200 g.”
- Comment 3: Line 114: Give a few more details regarding the camera setup and name the cameras before mentioning C4 (e.g., “The cameras were labeled C1–C5…”). Also, please explain why C4 was chosen (it is obvious from Figure 1, but please mention it in the text as well).
Response: We agree with this comment. Therefore, we clarified the camera naming convention by stating that the five cameras were labeled C1–C5 from right to left (Figure 1), and that each camera was placed at the same location across all sessions to facilitate consistent downstream processing. We also added the rationale for selecting C4 for Kinovea-based 2D measurements and calibration, namely that the frontal view provides the most orthogonal perspective during the T-pose and the fall, minimizing perspective distortion for scaling and planar kinematic extraction. This revision has been made in the Methods (Experimental Setup).
Lines 123-129: “The cameras were labeled C1–C5 from right to left in the setup (Figure 1), and each camera was placed at the same location across all recording sessions to facilitate subsequent data processing. Videos were recorded at 240 fps with a maximum resolution of 2.7K. For Kinovea, we used the frontal view (C4) because it provided the most orthogonal perspective of the athlete during the T-pose, thereby minimizing perspective distortion when scaling and extracting planar kinematics, as shown in Figure 1.”
- Comment 4: Wouldn’t filtering the downsampled 240 Hz signal at 13 Hz effectively remove signals at 86.6 Hz and lower from the 1600 Hz signal? That would affect the information contained. In general, downsampling is the last step, especially if the event examined (such as falling) has high-frequency content.
Response: We agree that the order of filtering and resampling must be stated clearly to avoid ambiguity and aliasing. Importantly, filtering at 13 Hz does not “remove the 1600 Hz signal”; it only removes signal content above 13 Hz after resampling, as a smoothing step. IMU signals were initially recorded at 1600 Hz. Before resampling to 240 Hz, we applied an explicit anti-aliasing low-pass filter (4th-order zero-phase Butterworth, 110 Hz cutoff), chosen conservatively below the Nyquist frequency of the target sampling rate (240 Hz → 120 Hz). The anti-aliased signals were then resampled to 240 Hz. Subsequently, Yu’s residual method was applied on the 240 Hz data to select the biomechanically relevant low-pass cutoff for impact analysis; this procedure yielded ~13 Hz, which was then used for final smoothing. These clarifications were added to the Methods section (Filtering, Section 2.5.4).
Lines 259-265: “Prior to resampling to 240 Hz, an explicit anti-aliasing low-pass filter was applied (4th-order zero-phase Butterworth, 110 Hz cutoff), selected conservatively below the Nyquist frequency of the target sampling rate (120 Hz). The anti-aliased signals were then resampled to 240 Hz. Yu’s residual method was subsequently applied on the resampled signals to determine the biomechanically relevant low-pass cutoff for smoothing; this yielded ~13 Hz, which was then applied to the 240 Hz IMU signals [30,31].”
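The two cutoffs in this pipeline are easiest to compare when expressed relative to the Nyquist frequency of the signal they act on, the `Wn` convention used by digital filter design tools (for example, `scipy.signal.butter` interprets `Wn` as a fraction of Nyquist when `fs` is not given). The sketch below is a hypothetical arithmetic check, not the authors' code. It also shows one plausible reading of the 86.6 Hz figure in the comment: that value appears only if the *normalized* 13 Hz/240 Hz cutoff were reinterpreted at 1600 Hz. Filtering the 240 Hz signal at 13 Hz removes content above 13 Hz in physical units.

```python
def normalized_cutoff(fc_hz, fs_hz):
    """Cutoff expressed as a fraction of the Nyquist frequency (fs / 2)."""
    nyquist = fs_hz / 2.0
    if not 0.0 < fc_hz < nyquist:
        raise ValueError("cutoff must lie strictly between 0 and Nyquist")
    return fc_hz / nyquist


# Stage 1: anti-aliasing on the raw 1600 Hz IMU data; the 110 Hz cutoff must
# also sit below the Nyquist frequency of the 240 Hz target (120 Hz).
wn_antialias = normalized_cutoff(110.0, 1600.0)
assert 110.0 < 240.0 / 2.0

# Stage 2: smoothing of the resampled 240 Hz data at 13 Hz (physical units).
wn_smooth = normalized_cutoff(13.0, 240.0)

# Reinterpreting that normalized value at 1600 Hz gives ~86.7 Hz; this would
# only apply if coefficients designed for 240 Hz were run on 1600 Hz samples.
hz_if_coefficients_reused = wn_smooth * 1600.0 / 2.0
```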
- Comments 5 and 6: Figure 3: The fonts are too small and the resolution is not high enough, so the labels can be hard to read. Figure 7: Similarly, the Bland-Altman plots are of lower resolution; zooming in makes them look blurry.
Response: We agree that the readability and resolution of Figures 3 and 7 needed improvement. We therefore regenerated both figures at a higher export resolution to ensure that all labels remain legible when zooming in. In the revised manuscript, Figures 3 and 7 have been replaced with these higher-quality exports.
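For context, a Bland-Altman plot displays, for each trial, the difference between two methods against their mean, together with the mean difference (bias) and the 95% limits of agreement (bias ± 1.96 SD of the differences). The sketch below computes these quantities with the standard library. The velocity values are hypothetical, and the function name is illustrative.

```python
import statistics


def bland_altman(method_a, method_b, z=1.96):
    """Return per-trial means, differences, bias and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2.0 for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return means, diffs, bias, (bias - z * sd, bias + z * sd)


# Hypothetical peak velocities (m/s) from two methods for four falls
p2s = [4.10, 3.85, 4.40, 3.95]
kinovea = [4.05, 3.90, 4.32, 3.98]
means, diffs, bias, (lo, hi) = bland_altman(p2s, kinovea)
```

When plotting `means` against `diffs` with matplotlib, exporting with a higher `dpi` (e.g., `plt.savefig("fig7.png", dpi=300)`) or to a vector format such as PDF or SVG keeps labels sharp when zooming in.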
- Comment 7: 3.6 Impact of resampling: Doesn’t that mean the lower resolution has distorted the data? Section 3.6.1 seems to say exactly that. Explain why your results are still valid. The Discussion and Conclusion sections discuss the practical implications, but here the technical aspects (such as events that were lost) should be presented.
Response: We agree that lowering temporal resolution attenuates peak accelerations, which can be interpreted as a distortion of the original high-frequency IMU signal. To address the technical aspects requested, we (1) clarified our resampling/processing pipeline in the Methods (Section 2.5.4, Filtering), specifying that IMU accelerations were anti-aliased prior to downsampling (4th-order zero-phase Butterworth, 110 Hz cutoff), resampled to 240 Hz, and then low-pass filtered with a cutoff selected by Yu’s residual method (≈13 Hz) to ensure comparability with video-processing constraints. We also (2) expanded Section 3.6.1 to explicitly explain which events are affected: short-duration impact transients contain high-frequency components, so the sharpest peaks and, when present, multiple closely spaced sub-peaks may be underestimated or merged after resampling. This effect is quantified in Section 3.6.1 by the observed peak reductions (~20–37% depending on sensor location). Importantly, our results remain valid because this resampling analysis is reported to isolate the technical contribution of temporal resolution (and associated filtering) to peak underestimation and to enable comparison with video-based methods, whereas our conclusions regarding impact intensity continue to rely on the original 1600 Hz IMU data as the reference.
Lines 259-265: “Prior to resampling to 240 Hz, an explicit anti-aliasing low-pass filter was applied (4th-order zero-phase Butterworth, 110 Hz cutoff), selected conservatively below the Nyquist frequency of the target sampling rate (120 Hz). The anti-aliased signals were then resampled to 240 Hz. Yu’s residual method was subsequently applied on the resampled signals to determine the biomechanically relevant low-pass cutoff for smoothing; this yielded ~13 Hz, which was then applied to the 240 Hz IMU signals [30,31].”
Lines 428-431: “This attenuation reflects the loss of high-frequency components in short-duration impact transients; consequently, the sharpest peaks (and, when present, multiple closely spaced sub-peaks) may be underestimated or merged after resampling.”
Lines 435-437: “Importantly, this resampling analysis was used to quantify the technical effect of limited temporal resolution on peak estimation and to enable a fair comparison with video-based methods.”
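One mechanism behind this attenuation can be shown in isolation: a brief peak can fall between samples of a coarser grid. The sketch below samples a hypothetical 2 ms Gaussian pulse at 1600 Hz and at 240 Hz, in the worst case where the true peak lies halfway between two 240 Hz samples. It deliberately ignores the additional attenuation from anti-aliasing and low-pass filtering. Because the pulse width is an arbitrary assumption rather than a measured impact duration, the loss it produces (roughly 40% here) should not be compared directly with the reported 20–37% reductions.

```python
import math


def pulse(t, t_peak, sigma=0.002):
    """Unit-amplitude Gaussian stand-in for a brief impact transient."""
    return math.exp(-((t - t_peak) ** 2) / (2.0 * sigma ** 2))


def sampled_peak(fs_hz, t_peak, duration_s=0.05, sigma=0.002):
    """Largest sample of the pulse when it is sampled at fs_hz from t = 0."""
    n = int(round(duration_s * fs_hz))
    return max(pulse(i / fs_hz, t_peak, sigma) for i in range(n))


# Worst case: the true peak falls halfway between two 240 Hz samples.
t_peak = 10.5 / 240.0
peak_1600 = sampled_peak(1600.0, t_peak)  # grid lands on (or next to) the peak
peak_240 = sampled_peak(240.0, t_peak)    # nearest samples are 2.08 ms away
```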
Reviewer 3 Report
Comments and Suggestions for Authors
This manuscript validates a markerless multi-camera motion capture pipeline (Pose2Sim) for quantifying bouldering fall kinematics in a real climbing gym environment. The authors compared Pose2Sim outputs against two reference methods: 2D video analysis using Kinovea for displacement and velocity, and high-frequency inertial measurement units (IMUs) for peak acceleration. Ten adolescent climbers performed 40 falls captured with five synchronized high-speed cameras (240 fps) and three IMUs (1600 Hz) placed at the ankle, sacrum, and forehead. The results demonstrate excellent agreement between Pose2Sim and Kinovea for fall height and peak velocity (ICCs > 0.98), confirming the suitability of markerless video methods for capturing global fall kinematics. However, Pose2Sim systematically underestimated peak impact accelerations compared to IMUs, particularly at the ankle and sacrum. A resampling analysis showed that temporal resolution strongly influences peak acceleration estimates, partially explaining these discrepancies. The authors conclude that Pose2Sim is well suited for studying fall trajectories and typology in ecological bouldering settings, while IMUs remain essential for quantifying impact intensity.
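The ICC form used for the Pose2Sim versus Kinovea comparison is not stated in this report. The sketch below assumes a two-way random-effects, absolute-agreement, single-measure ICC (ICC(2,1) in Shrout and Fleiss's notation), computed from the usual ANOVA mean squares with the standard library. Like a naive analysis, it treats every fall as an independent row, which is the participant-clustering question raised in the major comments. The fall heights are hypothetical.

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `data` holds one row per trial and one column per method (k >= 2).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between trials
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between methods
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical fall heights (m): [Pose2Sim, Kinovea] per trial
heights = [[1.52, 1.50], [2.10, 2.13], [3.05, 3.01], [2.48, 2.47]]
print(round(icc_2_1(heights), 3))
```

A constant offset between methods lowers this absolute-agreement ICC. A consistency form (ICC(3,1)) would ignore such an offset, so the choice depends on whether systematic bias should count as disagreement.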
Major Comments and Questions for the Authors
- The study addresses an important and timely topic, especially given the growth of indoor bouldering and the practical limitations of laboratory motion capture. While Pose2Sim has been validated previously, its application to high-impact, non-cyclic movements in an ecological gym setting is novel. However, the Introduction would benefit from a clearer and more explicit statement of what is fundamentally new compared to prior Pose2Sim validations, ideally summarized in a short paragraph at the end of the Introduction.
- The study focuses exclusively on adolescent climbers (14–17 years). Please discuss more explicitly how age, body mass, and movement strategies might influence impact kinematics and whether the findings can be generalized to adult or elite climbers.
- Clarify why only 10 athletes were selected from the initial cohort of 22, beyond the statistical power rationale. Were there systematic differences between included and excluded participants?
- The mix of “natural falls” and “controlled jumps” is reasonable, but these two conditions likely differ substantially in neuromuscular control and landing strategy. Did the authors examine whether method agreement differed between these fall types?
- Figure 3 (fall scenario map) is informative, but the manuscript would benefit from reporting the number of trials per category directly in the text to improve transparency.
- Acceleration was derived from markerless positional data, which requires numerical differentiation and filtering. While the Discussion addresses this limitation well, it would be helpful to explicitly clarify in the Methods that acceleration estimates are second derivatives of position and therefore highly sensitive to noise.
- The choice of using the acceleration norm for IMUs is appropriate; however, this removes directional information. Please clarify whether directional acceleration components (e.g., vertical) were explored and why they were not retained.
- The introduction of “peak acceleration width” is interesting and potentially valuable. However:
  - Please provide a stronger rationale for selecting the 90% threshold.
  - Discuss whether peak width has known biomechanical or injury-related relevance in bouldering or whether it is proposed here as an exploratory metric.
- The statistical framework is generally sound. However, given the repeated-measures nature of the data (multiple falls per participant), please clarify whether participant-level clustering was considered or whether all falls were treated as independent observations.
- Consider reporting effect sizes (e.g., Cohen’s d or partial η²) alongside p-values for key comparisons, particularly for acceleration differences.
- The Discussion appropriately attributes acceleration underestimation to frame rate, filtering, and differentiation. However, it would strengthen the paper to:
  - Explicitly state that this limitation reflects fundamental constraints of video-based kinematics, not a failure of Pose2Sim itself.
  - Clarify whether higher-frame-rate cameras or alternative smoothing/differentiation strategies could realistically close this gap in future studies.
- The applied relevance is strong, but the authors may consider adding a short subsection or paragraph explicitly outlining practical recommendations for researchers and practitioners (e.g., when Pose2Sim alone is sufficient vs. when IMUs are indispensable).
- Figures are generally clear, but some boxplots (e.g., Figure 8) are visually dense. Consider simplifying legends or adding brief explanatory captions for readers less familiar with acceleration metrics.
- Ensure consistency in terminology (e.g., “P2S” vs. “Pose2Sim”) throughout the manuscript.
- Double-check that all supplementary tables are referenced explicitly in the main text.
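The point that acceleration estimated from markerless positions is a second derivative, and therefore highly noise-sensitive, can be made concrete with a central finite difference. In the sketch below, ideal free-fall positions sampled at 240 fps give back g exactly. A hypothetical worst-case ±1 mm position error alternating from frame to frame adds an error of 4·noise/Δt² ≈ 230 m/s² to every estimate. This illustrates why low-pass filtering is needed before differentiation, and why that filtering in turn flattens short impact peaks. The noise pattern is an assumption chosen for clarity, not a measured Pose2Sim error.

```python
def second_derivative(x, dt):
    """Central-difference second derivative; the two endpoints are dropped."""
    return [(x[i + 1] - 2.0 * x[i] + x[i - 1]) / dt ** 2
            for i in range(1, len(x) - 1)]


G = 9.81          # m/s^2
DT = 1.0 / 240.0  # frame interval at 240 fps
t = [i * DT for i in range(20)]
clean = [-0.5 * G * ti ** 2 for ti in t]  # ideal free fall from rest (m)

# Hypothetical worst-case position noise: +/-1 mm alternating frame to frame
noise = 0.001
noisy = [y + (noise if i % 2 == 0 else -noise) for i, y in enumerate(clean)]

acc_clean = second_derivative(clean, DT)  # recovers -9.81 m/s^2
acc_noisy = second_derivative(noisy, DT)
# Each noisy estimate is off by 4 * noise / DT**2 = 230.4 m/s^2 (about 23 g)
max_error = max(abs(a - b) for a, b in zip(acc_noisy, acc_clean))
```

Because the error scales with 1/Δt², doubling the frame rate with unchanged position noise would quadruple this amplification in this simple model. Higher frame rates therefore help resolve short peaks only when paired with lower keypoint noise or appropriate smoothing.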
Quality of English
The manuscript is written in clear, professional, and technically accurate English. Sentence structure and terminology are appropriate for an international biomechanics and engineering audience. Minor stylistic improvements could be made by shortening some long sentences in the Discussion and avoiding occasional repetition of similar phrases (e.g., “systematic underestimation”). Overall, the English quality is very good, and only light copyediting is recommended.
Author Response
Reviewer 3:
- Comment 1: The Introduction would benefit from a clearer statement of what is fundamentally new compared to prior Pose2Sim validations, ideally summarized at the end of the Introduction.
Response: Thank you for this suggestion. We revised the final paragraph of the Introduction to state the key novelty more explicitly: unlike prior Pose2Sim validations mostly performed on cyclic tasks in controlled laboratory settings, this study evaluates Pose2Sim in a real-condition indoor bouldering context with high-impact, non-cyclic falls (occlusions/rapid rotations). We also clarified our dual validation approach (global kinematics vs. Kinovea; impact peaks vs. high-frequency IMUs). Location in revised manuscript: End of the Introduction (final paragraph).
Lines 93-101: “Unlike most prior P2S validations performed on cyclic tasks in controlled laboratory settings, the present work evaluates P2S in a real-condition indoor bouldering context characterized by high-impact, non-cyclic falls and rotations. Importantly, this context also imposes constrained camera viewpoints (no full 360° coverage due to the wall and gym layout), which differs from typical multi-camera laboratory configurations. This study therefore extends P2S validation to a practical field setting and explicitly delineates what video-based kinematics can reliably estimate (global fall kinematics) versus what remains limited for impact quantification (peak accelerations) without high-frequency IMUs.”
- Comment 2: The study focuses exclusively on adolescent climbers (14–17 years). Please discuss more explicitly how age, body mass, and movement strategies might influence impact kinematics and whether the findings can be generalized to adult or elite climbers.
Response: We agree with this comment. Therefore, we revised the Limitations paragraph of the Discussion to explicitly state why generalization to adult or elite climbers should be made with caution, and we specified how age, height and body mass, and landing strategies could influence impact kinematics (impact magnitude/timing, fall typology, energy dissipation strategies). This text replaces the original sentence about adolescents and generalization.
Lines 554-562: “Additionally, our participant group consisted of adolescent climbers; therefore, generalization to adult or elite climbers should be made with caution. The mean height and body mass of our participants (163.9 cm, 52.8 kg) were lower than normative values reported for a 50th percentile adult male (174.9 cm, 78.6 kg) [43], and such anthropometric differences are known to influence impact kinematics (e.g., peak magnitude and timing). Moreover, age and experience can influence landing strategies (e.g., anticipation of contact, limb stiffness modulation, and intentional rolling or energy dissipation). Together, these factors could affect both the distribution of fall typologies and absolute peak accelerations.”
- Comment 3: Clarify why only 10 athletes were selected from the initial cohort of 22, beyond the statistical power rationale. Were there systematic differences between included and excluded participants?
Response: We agree with this comment. Therefore, we clarified in the Methods that the subset of 10 athletes was selected purposefully to (i) ensure a broad representation of fall profiles (e.g., rotation, landing outcome, start height categories) and anthropometric variability, (ii) retain only trials with sufficient data quality for synchronized multi-camera reconstruction and IMU processing (e.g., minimal additional occlusions such as bystanders obstructing camera views, adequate visibility for 3D reconstruction, and complete IMU recordings), and (iii) assemble a final dataset providing adequate statistical power for the planned method-comparison analyses.
Lines 154-161: “From this dataset, we retained a purposeful subset of 10 athletes and extracted 40 trials for analysis (12 natural falls and 28 controlled jumps). Trial selection aimed to (i) provide adequate statistical power for method-comparison analyses, (ii) balance fall scenarios (start height, presence/absence of rotation, and landing outcome), and (iii) ensure sufficient data quality for synchronized multi-camera reconstruction and IMU processing (e.g., minimal additional occlusions such as bystanders obstructing camera views, adequate visibility for 3D reconstruction, and complete IMU recordings), as detailed in Section 2.6 (Variables and Statistical Analysis).”
- Comment 4: The mix of “natural falls” and “controlled jumps” is reasonable, but these two conditions likely differ substantially in neuromuscular control and landing strategy. Did the authors examine whether method agreement differed between these fall types?
Response: We agree that natural falls and controlled jumps may differ in anticipatory neuromuscular control and landing strategy. In this study, we did not stratify method agreement analyses by “natural” versus “controlled”, as our primary objective was to validate the pipeline across a broad set of landing conditions. Instead, trials were grouped by landing posture (Standing, Loss of balance, Roll), which more directly governs impact mechanics and also drives markerless tracking difficulty (e.g., rapid rotations, self-occlusions). We have now clarified this point in the Discussion as a limitation and note that future studies should explicitly compare anticipated versus unanticipated falls to examine potential differences in kinematics and method agreement.
Lines 571-576: “Furthermore, our dataset mixed natural falls and controlled jumps; although these conditions may differ in anticipatory neuromuscular control, we did not stratify analyses by “natural” vs “controlled”. Instead, trials were grouped by landing posture (Standing, Loss of balance, Roll), which more directly governs impact mechanics and markerless tracking difficulty (e.g., rapid rotations and self-occlusions). Future studies should explicitly compare anticipated vs unanticipated falls.”
- Comment 5: Figure 3 (fall scenario map) is informative, but the manuscript would benefit from reporting the number of trials per category directly in the text to improve transparency.
Response: We agree with this comment. Therefore, we added the trial counts for each category (start height, rotation, and reception position) directly in the Methods paragraph introducing Figure 3, so readers do not need to infer them from the diagram. This change can be found in the revised manuscript in Section 2.3 (Experimental Setup), in the paragraph that introduces Figure 3, immediately after the sentence listing the categories (fall heights, rotation, and reception positions).
Lines 168-172: “Across the 40 selected trials, start heights were distributed as Top (n = 19), Middle (n = 15), and Bottom (n = 6). Rotations were observed in 13 trials (Without rotation: n = 27). Reception positions were “Standing on feet” (n = 15), “On feet with loss of balance” (n = 12), and “On feet and roll” (n = 13), as summarized in Figure 3.”
- Comment 6: Acceleration was derived from markerless positional data, which requires numerical differentiation and filtering. While the Discussion addresses this limitation well, it would be helpful to explicitly clarify in the Methods that acceleration estimates are second derivatives of position and therefore highly sensitive to noise.
Response: We agree with this comment. Therefore, we added an explicit clarification in the Methods (Section 2.6 Variables and Statistical Analysis) stating that for video-based methods (Pose2Sim and Kinovea), velocity is obtained by differentiating position trajectories over time and acceleration is then obtained by differentiating velocity (i.e., acceleration is the second derivative of position).
Lines 288-290: “For video-based methods (P2S and Kinovea), velocity was obtained by differentiating position trajectories over time, and acceleration was then obtained by differentiating velocity over time (i.e., acceleration is the second derivative of position).”
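To illustrate why a second derivative of position is so sensitive to noise, the minimal sketch below differentiates a synthetic vertical trajectory twice with central differences (`numpy.gradient`). It is an illustration under stated assumptions, not the P2S or Kinovea processing code: the 60 Hz frame rate, the 1 mm tracking noise, and the helper name `differentiate_twice` are hypothetical.

```python
import numpy as np


def differentiate_twice(position, fs):
    """Return velocity and acceleration of a uniformly sampled position signal.

    Hypothetical helper: applies central differences (numpy.gradient) twice,
    so acceleration is the second derivative of position.
    """
    dt = 1.0 / fs
    velocity = np.gradient(position, dt)      # first derivative of position
    acceleration = np.gradient(velocity, dt)  # derivative of velocity
    return velocity, acceleration


fs = 60.0                        # assumed camera frame rate (Hz)
t = np.arange(30) / fs           # 0.5 s of samples
z = 2.0 - 0.5 * 9.81 * t**2      # ideal free fall from 2 m (m)

v, a = differentiate_twice(z, fs)
print(f"interior acceleration: {a[15]:.2f} m/s^2")  # -9.81 for an ideal parabola

# Add 1 mm of Gaussian tracking noise to the position and repeat
rng = np.random.default_rng(0)
_, a_noisy = differentiate_twice(z + rng.normal(0.0, 0.001, z.size), fs)
print(f"acceleration error SD: {np.std(a_noisy - a):.2f} m/s^2")
```

For interior samples, applying central differences twice turns white position noise of SD σ into acceleration noise of roughly 0.61·σ·fs², i.e. about 2.2 m·s⁻² for 1 mm at 60 Hz, which is why some smoothing is unavoidable before differentiation.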
- Comment 7: The choice of using the acceleration norm for IMUs is appropriate; however, this removes directional information. Please clarify whether directional acceleration components (e.g., vertical) were explored and why they were not retained.
Response: We agree with this comment. Therefore, we clarified in the Methods (Section 2.6 Variables and Statistical Analysis) that IMU peak acceleration was quantified using the resultant acceleration norm (‖a‖). We also state that directional components (e.g., vertical acceleration) were not retained because IMU signals are expressed in a local sensor frame that may rotate during the fall and landing, making component-wise alignment with the global vertical axis uncertain in this real-condition setting. This change can be found in the revised manuscript at the end of Section 2.6.
Lines 315-319: “IMU peak acceleration was quantified using the resultant acceleration norm (‖a‖). Directional components (e.g., vertical acceleration) were not retained because IMU signals are expressed in a local sensor frame that may rotate during the fall and landing. The norm therefore provides a robust, orientation-independent indicator of impact intensity, at the cost of losing directional information.”
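The orientation-independence argument can be checked numerically. In the minimal sketch below, hypothetical sensor-frame samples are re-expressed in a sensor frame rotated by 40° about its x-axis: the per-axis "vertical" (z) component changes, whereas the resultant norm ‖a‖ does not. The sample values, the rotation, and the helper names are illustrative assumptions.

```python
import numpy as np


def resultant_norm(acc_xyz):
    """Per-sample resultant acceleration ||a|| for an (N, 3) array (m/s^2)."""
    return np.linalg.norm(acc_xyz, axis=1)


def rotation_x(theta):
    """Rotation matrix about the x-axis (illustrative change of sensor orientation)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


# Hypothetical accelerations expressed in the original sensor frame (m/s^2)
acc = np.array([[0.0, 0.0, 9.81], [30.0, 5.0, 120.0], [-2.0, 1.0, 15.0]])
# Components of the same vectors in a sensor frame rotated 40 degrees about x
acc_tilted = acc @ rotation_x(np.deg2rad(40.0))

print(acc[:, 2], acc_tilted[:, 2])                      # the z component changes
print(resultant_norm(acc), resultant_norm(acc_tilted))  # the norm does not
```

Because a rotation preserves vector length, the norm is identical in any sensor orientation, which is exactly the property traded against the loss of directional information.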
- Comment 8: The introduction of “peak acceleration width” is interesting and potentially valuable. However: Please provide a stronger rationale for selecting the 90% threshold. Discuss whether peak width has known biomechanical or injury-related relevance in bouldering or whether it is proposed here as an exploratory metric.
Response: We agree with this comment. Therefore, we added a short rationale in the Methods (Section 2.6 Variables and Statistical Analysis) to justify the 90% threshold and to clarify that peak acceleration width is reported as an exploratory metric in bouldering. Specifically, we now explain that the 90% threshold was chosen to capture the short near-peak portion of the impact transient while reducing sensitivity to lower-amplitude post-impact oscillations and baseline noise. We also explicitly state that the injury relevance of this width metric in bouldering is not yet established and that it is included as a descriptor of impact “sharpness” rather than a validated injury predictor.
Lines 308-314: “This threshold was selected to quantify the duration of the main impact transient (near-peak region) while reducing sensitivity to lower-amplitude post-impact oscillations and baseline noise. Because the biomechanical and injury-related relevance of this metric in bouldering is not yet established, peak acceleration width is reported here as an exploratory descriptor of impact sharpness. A 90% threshold was suggested here as a pragmatic compromise: narrow enough to focus on the near-peak transient, yet wide enough to remain robust to sampling resolution and small peak-shape variations.”
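The full definition of peak acceleration width is not restated in this response. The sketch below assumes one plausible definition: the duration of the contiguous region around the global maximum in which the signal stays at or above 90% of the peak, with the threshold crossings located by linear interpolation between samples. The function name `peak_width` and the synthetic pulses are hypothetical.

```python
import numpy as np


def peak_width(signal, fs, fraction=0.9):
    """Duration (s) of the contiguous near-peak region where signal >= fraction * peak.

    Hypothetical definition: crossings are located by linear interpolation
    between the samples on each side of the global maximum.
    """
    x = np.asarray(signal, dtype=float)
    i_peak = int(np.argmax(x))
    level = fraction * x[i_peak]

    # Walk outward from the peak while samples remain above the threshold
    left = i_peak
    while left > 0 and x[left - 1] >= level:
        left -= 1
    right = i_peak
    while right < x.size - 1 and x[right + 1] >= level:
        right += 1

    # Sub-sample crossing positions (fall back to the edge sample at the boundaries)
    t_left = float(left)
    if left > 0:
        t_left = left - (x[left] - level) / (x[left] - x[left - 1])
    t_right = float(right)
    if right < x.size - 1:
        t_right = right + (x[right] - level) / (x[right] - x[right + 1])
    return (t_right - t_left) / fs


# Usage on synthetic data: a triangular pulse and a sampled Gaussian pulse
tri = 100.0 - 10.0 * np.abs(np.arange(21) - 10)  # peak of 100 at sample 10
w_tri = peak_width(tri, fs=1000.0)

sigma = 0.005                                    # Gaussian SD (s)
t = np.arange(160) / 1600.0
gauss = np.exp(-0.5 * ((t - 0.05) / sigma) ** 2)
w_gauss = peak_width(gauss, fs=1600.0)
w_exact = 2 * sigma * np.sqrt(2 * np.log(1 / 0.9))  # analytic 90% width
print(w_tri, w_gauss, w_exact)
```

Interpolating the crossings avoids quantizing the width to whole sampling intervals, which is one way to obtain the robustness to sampling resolution mentioned above.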
- Comment 9: The statistical framework is generally sound. However, given the repeated-measures nature of the data (multiple falls per participant), please clarify whether participant-level clustering was considered or whether all falls were treated as independent observations.
Response: Thank you for this important comment. We agree that the repeated-measures nature of the dataset could, in principle, motivate the use of participant-level clustering or mixed-effects models. However, such models were not adopted here for several reasons. First, the number of participants was limited (n = 10), with an unbalanced number of falls per participant (ranging from 1 to 6), which would make random-effects estimation unstable and potentially unreliable. Second, the primary objective of the statistical analyses was to compare measurement methods at the level of individual fall events, rather than to model participant-specific effects. Finally, falls were not repeated executions of a standardized task but discrete, non-stationary events with substantial trial-to-trial variability in height, posture, and landing strategy, even within the same participant. For these reasons, analyses were conducted at the trial level and falls were treated as independent observations. This choice is acknowledged as a limitation and is now explicitly stated in the manuscript.
Lines 320-325: “Analyses were conducted at the trial (fall) level. Although multiple falls were recorded per participant, mixed-effects or repeated-measures models were not applied due to the limited number of participants (n = 10), the unbalanced number of falls per participant (1–6), and the non-repetitive nature of fall events. The primary objective of the analyses was to compare measurement methods at the level of individual falls rather than to model participant-specific effects.”
Lines 567-571: “Even though multiple falls were recorded per participant, analyses were conducted at the trial level. Mixed-effects or repeated-measures models were not applied due to the limited number of participants and the unbalanced number of falls per participant; as a result, treating falls as independent observations may not fully account for within-participant correlation.”
- Comment 10: Consider reporting effect sizes (e.g., Cohen’s d or partial η²) alongside p-values for key comparisons, particularly for acceleration differences.
Response: We agree with this comment. Therefore, we added effect sizes alongside p-values for the key paired comparisons of peak accelerations. Specifically, we now report Cohen’s dz (paired effect size) directly in Table S4, with each comparison presented as p-value (Cohen’s dz) to facilitate interpretation of the magnitude of differences beyond statistical significance. In addition, we stated in the Results (Section 3.5.1 Peak Magnitude) that these effect sizes are reported in Table S4.
Lines 628-629:
Table S4. Peak acceleration (m·s⁻²) by segment and method (IMUs, P2S, Kinovea): mean ± SD and post hoc paired t-test p-values (Cohen’s dz)
| Segment | IMU (Mean ± SD) | P2S (Mean ± SD) | Kinovea (Mean ± SD) | p (dz) IMU vs P2S | p (dz) IMU vs Kinovea | p (dz) P2S vs Kinovea |
|---|---|---|---|---|---|---|
| Ankle | 198.1 ± 75.7 | 133.2 ± 64.0 | 83.5 ± 66.4 | <0.001 (0.87) | <0.001 (1.42) | 0.0007 (0.58) |
| Sacrum | 100.4 ± 44.6 | 56.6 ± 20.0 | 45.6 ± 30.0 | <0.001 (1.43) | <0.001 (1.23) | 0.0223 (0.38) |
| Forehead | 40.3 ± 15.6 | 43.8 ± 16.1 | 38.0 ± 25.3 | 0.0531 (-0.32) | 0.5124 (0.11) | 0.1303 (0.25) |
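For reference, Cohen's dz for a paired comparison is the mean of the within-trial differences divided by their standard deviation, dz = mean(d)/SD(d), so that the paired t statistic equals dz·√n. The sketch below computes both from hypothetical values (not the study data). The difference is taken as first method minus second, which matches the sign convention implied by the negative forehead value for IMU vs P2S in Table S4; the helper names are illustrative.

```python
import numpy as np


def cohens_dz(x, y):
    """Paired effect size dz = mean(d) / SD(d), with d = x - y and SD using ddof=1."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return d.mean() / d.std(ddof=1)


def paired_t(x, y):
    """Paired t statistic t = mean(d) / (SD(d) / sqrt(n)), i.e. t = dz * sqrt(n)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))


# Hypothetical peak accelerations (m/s^2) for five falls; not the study data
imu = [210.0, 150.0, 260.0, 180.0, 120.0]
p2s = [140.0, 120.0, 170.0, 150.0, 100.0]

dz = cohens_dz(imu, p2s)
t_stat = paired_t(imu, p2s)
print(f"dz = {dz:.2f}, t = {t_stat:.2f}")  # positive dz: IMU higher than P2S
```

In SciPy, `scipy.stats.ttest_rel(x, y)` returns the same paired t statistic together with its p-value, so dz can be reported next to it at no extra cost.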
- Comment 11: “The Discussion appropriately attributes acceleration underestimation to frame rate, filtering, and differentiation. However, it would strengthen the paper to: (i) explicitly state that this limitation reflects fundamental constraints of video-based kinematics, not a failure of Pose2Sim itself; (ii) clarify whether higher-frame-rate cameras or alternative smoothing/differentiation strategies could realistically close this gap in future studies.”
Response: We have clarified the Discussion (paragraph 3) to emphasize that the observed underestimation of peak impact acceleration reflects fundamental constraints of video-derived kinematics (finite frame rate and noise amplification by numerical differentiation), rather than a Pose2Sim-specific limitation. We also added that higher frame rates and alternative smoothing/differentiation strategies may reduce attenuation, but peak impacts are still likely to remain underestimated compared with high-frequency IMUs for very short transients.
Lines 490-492: “More broadly, this limitation reflects fundamental constraints of video-based kinematics (finite frame rate and noise amplification by numerical differentiation), not a P2S-specific issue.”
Lines 495-504: “Our resampling analysis confirmed that temporal resolution plays a major role: downsampling IMU data from 1600 Hz to 240 Hz reduced peak accelerations by ~20 to 37% [27–29]. However, even after accounting for this sampling effect, systematic differences remained. Second, some degree of smoothing must be applied to video-derived trajectories to reduce noise, and this filtering intrinsically blunts the true peaks in the signal [31–33]. Additionally, calculating acceleration from positional data requires numerical differentiation, which amplifies any small tracking errors and further diminishes the amplitude of detected peaks [31]. In addition, inconsistencies in joint identification and pose reconstruction remain a known limitation of markerless pipelines, which can further affect the reliability of acceleration estimates [42].”
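The sampling effect can be illustrated with a synthetic example. The sketch below assumes an idealized Gaussian impact transient (SD 3 ms, peak 200 m·s⁻²) recorded at 1600 Hz and re-sampled onto a 240 Hz grid at different time offsets by linear interpolation, without an anti-aliasing filter. These parameters and the resampling scheme are assumptions, not the procedure used in the resampling analysis, so the printed attenuation is not expected to match the reported ~20 to 37%.

```python
import numpy as np

FS_IMU, FS_VIDEO = 1600.0, 240.0
SIGMA = 0.003   # hypothetical impact pulse SD (s)
PEAK = 200.0    # hypothetical true peak (m/s^2)


def impact_pulse(t, t0=0.1):
    """Idealized Gaussian impact transient centered at t0 (illustrative only)."""
    return PEAK * np.exp(-0.5 * ((t - t0) / SIGMA) ** 2)


t_imu = np.arange(int(0.2 * FS_IMU)) / FS_IMU
a_imu = impact_pulse(t_imu)


def downsampled_peak(offset):
    """Peak after point-resampling the 1600 Hz record onto a 240 Hz grid.

    Linear interpolation with no anti-aliasing filter: a simplification that
    isolates where the coarse samples fall relative to the true maximum.
    """
    t_video = offset + np.arange(int(0.19 * FS_VIDEO)) / FS_VIDEO
    return np.interp(t_video, t_imu, a_imu).max()


offsets = np.linspace(0.0, 1.0 / FS_VIDEO, 20, endpoint=False)
ratios = np.array([downsampled_peak(o) for o in offsets]) / a_imu.max()
print(f"retained peak fraction: min {ratios.min():.2f}, mean {ratios.mean():.2f}")
```

The retained fraction depends on how close the nearest 240 Hz sample falls to the true maximum; an anti-aliasing low-pass filter, as used by typical resampling routines, would usually attenuate short peaks further.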
- Comment 12: The applied relevance is strong, but the authors may consider adding a short subsection or paragraph explicitly outlining practical recommendations for researchers and practitioners (e.g., when Pose2Sim alone is sufficient vs. when IMUs are indispensable).
Response: We agree that the applied relevance would benefit from clearer, actionable guidance. Therefore, we added a short “Practical recommendations” paragraph in the Discussion to explicitly state when Pose2Sim alone is sufficient versus when IMUs are indispensable, based on our findings (accurate fall height/velocity vs. attenuated impact peaks from video-based acceleration). We also added a brief reinforcing sentence in the Conclusion to summarize these use-cases.
Lines 536-547: “Based on the present validation, P2S alone is appropriate when the research or applied question concerns global fall kinematics, such as fall height, peak velocity, trajectory, body orientation, and fall typology in real-condition gym settings. In contrast, when the objective is to quantify impact intensity (e.g., peak acceleration at the ankle or sacrum, short-duration impact transients), high-sampling-rate IMUs remain indispensable, as video-derived accelerations are intrinsically limited by frame rate, smoothing, and numerical differentiation. In practice, we recommend a hybrid workflow: use markerless multi-camera reconstruction to characterize the whole-body fall kinematics and landing strategy, and deploy IMUs on critical sites (e.g., ankle and pelvis/sacrum) to capture the magnitude of impact peaks. When IMUs are not feasible, higher-frame-rate cameras and optimized filtering may reduce attenuation, but peak impact metrics should be interpreted conservatively.”
- Comment 13: Figures are generally clear, but some boxplots (e.g., Figure 8) are visually dense. Consider simplifying legends or adding brief explanatory captions for readers less familiar with acceleration metrics.
Response: We agree that Figure 8 could be dense for readers less familiar with boxplots and acceleration metrics. Therefore, we revised the caption of Figure 8 to briefly explain how to read the boxplots (median, interquartile range, whiskers, and individual trials) while keeping the figure unchanged. In addition, we added a short explanatory sentence in the Results to clarify why variability is expected in this dataset (falls selected across multiple start heights and landing outcomes). These changes are included in Section 3.5.1 (Peak magnitude), immediately after Figure 8, and in the Figure 8 caption in the revised manuscript.
Lines 395-399: “Boxplots display the median (center line), interquartile range (box), and whiskers; points represent individual trials. Asterisks indicate significant post hoc paired t-test differences following RM-ANOVA (α = 0.05). For sacrum and forehead, n = 39 because one trial showed no identifiable acceleration peak at those locations (landing impulse mainly concentrated at the ankle).”
Lines 400-408: “As shown in Figure 8, peak-acceleration distributions were wider at the ankle and sacrum than at the forehead, indicating greater trial-to-trial variability and higher impacts at lower-body sensors. This variability is expected because trials were purposely selected across multiple start heights and landing outcomes, leading to a broad range of impact intensities.
At the ankle and sacrum, IMU peak accelerations were significantly higher than both P2S and Kinovea (all p < 0.05). At the forehead, no significant differences were observed between methods (all p > 0.05). Effect sizes (Cohen’s dz) are reported alongside p-values in Table S4.”
- Comment 14: Ensure consistency in terminology (e.g., “P2S” vs. “Pose2Sim”) throughout the manuscript.
Response: We agree that terminology should be fully consistent. Therefore, we standardized the nomenclature throughout the manuscript by defining Pose2Sim (P2S) at first mention and using P2S consistently thereafter (and similarly harmonizing related abbreviations and units). This change was applied across the full text, except where the software version is cited in Section 2.5.1 (Data Processing, P2S): “Videos were processed with Pose2Sim v0.10.20.”
- Comment 15: Double-check that all supplementary tables are referenced explicitly in the main text.
Response: We agree that all supplementary tables should be explicitly referenced in the main text. Therefore, we ensured that each supplementary table (Tables S1–S5) is cited at the relevant location.
Author Response File:
Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors: The authors have adequately addressed all of my concerns. I recommend the manuscript for acceptance.
Reviewer 3 Report
Comments and Suggestions for Authors: Thank you for addressing the comments.
Comments on the Quality of English Language: Thank you for addressing the comments.

