Assessing the Contribution of Arm Swing to Countermovement Jump Height Using Three Different Measurement Methods in Physically Active Men
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This manuscript addresses an important topic by comparing different methods for measuring countermovement jump (CMJ) height, providing useful insights into their reliability and validity.
The authors are commended for their detailed methodological approach and comprehensive statistical analysis, as well as for clearly defining flight time and incorporating arm swing effects, which enhances the relevance of the findings.
However, there are some concerns and suggestions that should be addressed to improve the clarity, rigor, and reproducibility of the study.
Materials and Methods
Line 100: The rationale for selecting a 3–7 day interval between testing sessions is not fully explained. Although a recovery period of 48–72 hours is generally considered sufficient for submaximal tasks such as countermovement jumps (CMJ), this is valid only if the participant’s activity is strictly controlled. The rather wide interval range may have introduced variability in neuromuscular readiness and fatigue status, particularly if physical activity between sessions was not closely monitored.
The authors should provide:
- an elaborated justification for applying a 3–7 day interval between testing sessions, and
- a clear description of how the participants’ behavior was monitored or accounted for during this interval.
Line 118: The use of vertical ground reaction force (GRFv) integration to calculate take-off velocity (Vto) in Equation 1 is considered appropriate; however, it is noted that unfiltered GRF data was utilized.
Given that numerical integration is sensitive to high-frequency noise, error in velocity and consequently in jump height estimation may be introduced by this choice.
The authors should:
- justify the decision not to apply filtering, and
- discuss the potential impact on data accuracy.
Furthermore, the authors should clarify if the GRF data from both lower extremities were used jointly (i.e., summed) in the analysis. If so, they should also indicate whether an analysis was conducted to assess potential asymmetries between legs and whether such differences might affect the validity of the impulse-momentum method outcomes.
Such information would strengthen the methodological clarity and enhance the interpretability of the findings.
Line 125-126: The start of the integration window is defined as the point when GRFv deviates by 1% from body weight, which is acceptable. However, the endpoint of the integration window is not explicitly described. It is assumed that integration ends at take-off, but this should be clearly stated.
The authors should provide detailed definition of both the beginning and end of the integration window to enhance reproducibility and clarity.
Line 130-131: The definition of flight time (tflight) as the interval between the first and second instances when the vertical ground reaction force (GRFv) crosses a threshold of 10 N is appreciated for its clarity. Similarly, the authors should provide explicit detail regarding how take-off time is defined for the integration window in the FPimp method. Consistent and clear definitions of take-off and landing events across methods are important to ensure comparability and reproducibility of jump height calculations.
The authors are advised to include a graph or plot illustrating the GRFv signal with marked points for motion start, take-off, and landing; this would enhance methodological clarity and assist readers in understanding the data processing steps.
Statistical Analysis
Line 140-174: The statistical analysis section is generally comprehensive and includes appropriate reliability and validity metrics for this type of methodological comparison. However, the current structure would benefit from clearer segmentation and a more logical progression.
As it stands, the section shifts between concepts such as intra-session reliability, concurrent validity, inter-day reliability, and group comparisons in a way that may disrupt the reader’s ability to follow the analytic strategy step-by-step.
It is recommended that the section be reorganized using subheadings for each major type of analysis (e.g., Intra-session reliability, Concurrent validity, Inter-day reliability, comparisons). This would greatly improve readability and help the reader better understand how the statistical methods align with the research objectives.
Additionally, while the manuscript initially specifies that some analyses were based on all three trials and others on the average of the three, this distinction becomes inconsistently applied and unclear throughout the section. Clarifying and consistently indicating which metric was used in each case is important for clarity and procedure reproducibility.
Lastly, it is unclear why the majority of analyses were performed exclusively on Day 1 data, given that data from two sessions were collected. This decision should be explicitly justified in the manuscript. Alternatively, including Day 2 data in the primary analyses (either as a separate analysis or as part of an average) could enhance the robustness of the findings.
Line 150: The phrase “inter-individual concurrent validity” is redundant and may cause confusion. Concurrent validity is typically assessed across the sample; thus, “concurrent validity” alone would suffice and improve clarity.
Results
The Results section presents the key findings in line with the methods described. However, for text flow, clarity, and better readability, it is suggested that the order of the results’ presentation strictly follow the analysis sequence outlined in the Methods.
For example, inter-day reliability is reported before concurrent validity and reliability, which disrupts the logical reading progression.
Line 177-180: Inter-day reliability is crucial as it reflects the stability and repeatability of the methods across different time points. However, the current analysis and results mainly focus on data from Day 1, leaving unresolved whether the validity and reliability of the methods remain consistent on a second day. Thus, it is suggested that the authors provide a more comprehensive insight concerning the methods’ concurrent reliability and validity across different days, thereby strengthening the robustness of the conclusions.
Line 199-202: The reported interaction suggests that the effect of the method on jump height depends on the jump modality (CMJAS vs. CMJNAS). However, the follow-up tests indicate a consistent method ranking (PUSH2 > FPtime > FPimp) across both modalities. This pattern is more indicative of a main effect of method, not an interaction. The authors should clarify or reconsider the interpretation of the interaction effect.
Discussion
Line 238-239: The opening sentence of the Discussion should be revised to better reflect the study’s specific purposes as outlined in the Introduction.
Line 248-250: The reported significant interaction between jump modality and measurement method on CMJ height seems to be more accurately described as a main method effect, since the ranking of methods (PUSH2 > FPtime > FPimp) remains consistent across both CMJ with and without arm swing. Clarifying this distinction in the Results and Discussion sections will enhance the accuracy and clarity of data interpretation.
Line 255-261: The discussion on how trunk kinematic variations between CMJ with and without arm swing might influence the PUSH2 IMU’s jump height overestimation is interesting but lacks sufficient detail.
It would be valuable if the authors could explain how such kinematic differences specifically affect the sensor signals and the calculation of jump height.
Additionally, the methods section does not describe how jump height was derived from the sensor data, making it difficult to evaluate the potential impact of these kinematic variations on the measurement accuracy. Including a clear explanation of the jump height calculation from the sensor and how different movement patterns might influence this calculation would improve clarity and strengthen the study’s conclusions.
Line 296-301: The authors note that differences in jump height overestimation across studies may be due to updates in the PUSH2 software versions, which likely include changes in the underlying computation algorithms. However, the manuscript does not clearly specify which algorithm is used by PUSH2 to calculate jump height.
Author Response
Dear Reviewer,
We would like to express our sincere gratitude for the valuable comments you provided about our manuscript. Your insight was very useful in improving our manuscript and we hope that it will now meet your standards. Please find our answer to your comments (italicized) below.
Materials and Methods
Line 100: The rationale for selecting a 3–7 day interval between testing sessions is not fully explained. Although a recovery period of 48–72 hours is generally considered sufficient for submaximal tasks such as countermovement jumps (CMJ), this is valid only if the participant’s activity is strictly controlled. The rather wide interval range may have introduced variability in neuromuscular readiness and fatigue status, particularly if physical activity between sessions was not closely monitored.
The authors should provide:
- an elaborated justification for applying a 3–7 day interval between testing sessions, and
- a clear description of how the participants’ behavior was monitored or accounted for during this interval.
Response
Thank you for your valuable remarks. About the two points raised:
1. The 3–7 day interval between testing sessions aligns with previous CMJ reliability studies. For instance, Iglesias-Caamano et al. (2022) used a 3-day interval, Cormack et al. (2008) used a 7-day interval, and Dobbin et al. (2018) reported an average interval of 7.9 days (range 5–14 days). This period allows for neuromuscular recovery while minimizing learning or fatigue effects. Crucially, our high inter-day reliability (ICC > 0.94 for jump heights, Table 2) supports that this interval was appropriate for stable performance. We modified the manuscript to introduce this rationale (lines 104-105).
2. We added in the Methods (Line ZZZ) that “Participants were also asked to maintain their normal routines, avoid strenuous exercise 48 hours before each test and during the interval between sessions.” We also added a sentence in the limitations to acknowledge the absence of a strict control between days. (Line 318-320)
Line 118: The use of vertical ground reaction force (GRFv) integration to calculate take-off velocity (Vto) in Equation 1 is considered appropriate; however, it is noted that unfiltered GRF data was utilized. Given that numerical integration is sensitive to high-frequency noise, error in velocity and consequently in jump height estimation may be introduced by this choice. The authors should justify the decision not to apply filtering and discuss the potential impact on data accuracy.
Response
Thank you for your comment. Indeed, high-frequency noise is a common concern in these computations. Our decision to use unfiltered GRF data for FPimp was primarily to avoid the systematic errors that inappropriate low-pass filtering can introduce, as highlighted by Street et al. (2001). They reported that filters with cut-offs < 580 Hz could underestimate jump height by up to 26%, an error potentially far exceeding that from raw signal noise, especially with our high sampling rate (1200 Hz). This approach aligns with researchers such as Comfort et al. (2018, IJSPP), who also used unfiltered data citing Street et al. (2001). Furthermore, the integration process inherently mitigates the influence of high-frequency noise on the derived take-off velocity.
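To make the FPimp pipeline concrete — onset when GRFv deviates by more than 1% from body weight, integration of the unfiltered net force, take-off at the 10 N threshold, then h = v_to²/2g — a minimal sketch is shown below. This is illustrative only: the function name, sampling constant, and any example trace are our assumptions, not the study's actual analysis code.

```python
import numpy as np

G = 9.81    # gravitational acceleration (m/s^2)
FS = 1200   # sampling rate reported in the manuscript (Hz)

def jump_height_impulse(grfv, body_mass):
    """Impulse-momentum jump height from a summed, unfiltered GRFv trace.

    grfv      : 1-D array of summed vertical GRF from both plates (N)
    body_mass : participant mass (kg), e.g. from the weighing phase
    """
    weight = body_mass * G
    # Integration start: first sample deviating >1% from body weight
    start = int(np.argmax(np.abs(grfv - weight) > 0.01 * weight))
    # Integration end (take-off): first subsequent sample below 10 N
    takeoff = start + int(np.argmax(grfv[start:] < 10.0))
    # Net acceleration of the centre of mass during the movement
    accel = (grfv[start:takeoff] - weight) / body_mass
    v_to = accel.sum() / FS          # rectangle-rule numerical integration
    return v_to ** 2 / (2.0 * G)     # h = v_to^2 / (2 g)
```

As a toy check, a constant 700 N net push applied to a 70 kg participant for 0.3 s yields v_to = 3 m/s and a height of about 0.46 m.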
Furthermore, the authors should clarify if the GRF data from both lower extremities were used jointly (i.e., summed) in the analysis. If so, they should also indicate whether an analysis was conducted to assess potential asymmetries between legs and whether such differences might affect the validity of the impulse-momentum method outcomes. Such information would strengthen the methodological clarity and enhance the interpretability of the findings.
Response
As originally stated in our Methods (Line 120-121), “Unfiltered left and right side vertical ground reaction forces (GRFv) were then summed for analysis [24].” Thus, all impulse-momentum calculations used the total vertical force from both legs.
While we find the reviewer’s suggestion about asymmetries deeply interesting, our focus this time was on comparing jump height measurement methods. Combined GRFv in the FPimp method directly reflects the net total vertical force that accelerates the system's center of mass, regardless of its distribution between limbs. This approach, and the consistent use of summed GRFv for take-off and landing detection in both FPimp and FPtime methods, ensures a robust comparison between these force platform derived measures.
Line 125-126: The start of the integration window is defined as the point when GRFv deviates by 1% from body weight, which is acceptable. However, the endpoint of the integration window is not explicitly described. It is assumed that integration ends at take-off, but this should be clearly stated. The authors should provide detailed definition of both the beginning and end of the integration window to enhance reproducibility and clarity.
Response
Thank you for your suggestion. We have now explicitly stated and referenced (Line 126-127) that the integration window for the FPimp method ends at take-off, defined as the first frame where GRFv drops below 10 N.
Line 130-131: The definition of flight time (tflight) as the interval between the first and second instances when the vertical ground reaction force (GRFv) crosses a threshold of 10 N is appreciated for its clarity. Similarly, the authors should provide explicit detail regarding how take-off time is defined for the integration window in the FPimp method. Consistent and clear definitions of take-off and landing events across methods are important to ensure comparability and reproducibility of jump height calculations. The authors are advised to include a graph or plot illustrating the GRFv signal with marked points for motion start, take-off, and landing; this would enhance methodological clarity and assist readers in understanding the data processing steps.
Response
Thank you for these valuable suggestions. Take-off for FPimp integration is now explicitly defined in the Methods (Line 126-128). Additionally, we have added a figure (Figure 2) illustrating a typical GRFv signal with key events marked (weighing, motion start, 10 N take-off/landing thresholds).
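As a companion to the event markers in the new Figure 2, the FPtime computation — flight time between the two 10 N crossings, then h = g·t²/8 — could be sketched as follows. The 10 N threshold and the flight-time formula follow the manuscript; the sampling constant and function name are illustrative assumptions.

```python
import numpy as np

G = 9.81    # m/s^2
FS = 1200   # Hz, as reported in the manuscript

def jump_height_flight_time(grfv):
    """Flight-time jump height from a summed GRFv trace.

    Take-off: first sample where GRFv drops below 10 N.
    Landing : first subsequent sample where GRFv rises back above 10 N.
    """
    airborne = grfv < 10.0
    takeoff = int(np.argmax(airborne))
    landing = takeoff + int(np.argmax(~airborne[takeoff:]))
    t_flight = (landing - takeoff) / FS
    # h = g t^2 / 8 assumes take-off and landing occur at the same height,
    # one reason flight-time methods can overestimate relative to FPimp
    return G * t_flight ** 2 / 8.0
```

For example, a 0.4 s flight time gives 9.81 × 0.4² / 8 ≈ 0.196 m.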
Statistical Analysis
Line 140-174: The statistical analysis section is generally comprehensive and includes appropriate reliability and validity metrics for this type of methodological comparison. However, the current structure would benefit from clearer segmentation and a more logical progression.
As it stands, the section shifts between concepts such as intra-session reliability, concurrent validity, inter-day reliability, and group comparisons in a way that may disrupt the reader’s ability to follow the analytic strategy step-by-step.
Response
Following your judicious advice, the Statistical Analysis section has been reorganized with subheadings for each major analysis type to improve its logical progression.
It is recommended that the section be reorganized using subheadings for each major type of analysis (e.g., Intra-session reliability, Concurrent validity, Inter-day reliability, comparisons). This would greatly improve readability and help the reader better understand how the statistical methods align with the research objectives.
Response
We agree with this sensible remark. Consequently, the presentation order in the Results section, along with the organization of figures and tables, has been restructured to match this clearer analytical sequence.
Additionally, while the manuscript initially specifies that some analyses were based on all three trials and others on the average of the three, this distinction becomes inconsistently applied and unclear throughout the section. Clarifying and consistently indicating which metric was used in each case is important for clarity and procedure reproducibility.
Response
Thank you for your comment. The use of individual versus averaged trials is now specified at the beginning of the Statistical Analysis section (Line 149-151). The reorganization of the Results section to match the analytical sequence should also make this distinction clearer throughout.
Lastly, it is unclear why the majority of analyses were performed exclusively on Day 1 data, given that data from two sessions were collected. This decision should be explicitly justified in the manuscript. Alternatively, including Day 2 data in the primary analyses (either as a separate analysis or as part of an average) could enhance the robustness of the findings.
Response
Thank you for your feedback. Following your advice, Day 2 data have now been incorporated into all relevant primary analyses throughout the manuscript. This has enhanced robustness and provided new insights into inter-day variability. Consequently, Bland-Altman results are now summarized in Table 3, and the plots have been removed to avoid redundancy.
Line 150: The phrase “inter-individual concurrent validity” is redundant and may cause confusion. Concurrent validity is typically assessed across the sample; thus, “concurrent validity” alone would suffice and improve clarity.
Response
Well noted. This has been revised to “concurrent validity” throughout.
Results
The Results section presents the key findings in line with the methods described. However, for text flow, clarity, and better readability, it is suggested that the order of the results’ presentation strictly follow the analysis sequence outlined in the Methods.
For example, inter-day reliability is reported before concurrent validity and reliability, which disrupts the logical reading progression.
Response
Thank you for your suggestion. As mentioned earlier, we have now restructured the Results section to strictly follow the sequence of analyses outlined in the revised Methods section.
Line 177-180: Inter-day reliability is crucial as it reflects the stability and repeatability of the methods across different time points. However, the current analysis and results mainly focus on data from Day 1, leaving unresolved whether the validity and reliability of the methods remain consistent on a second day. Thus, it is suggested that the authors provide a more comprehensive insight concerning the methods’ concurrent reliability and validity across different days, thereby strengthening the robustness of the conclusions.
Response
Acknowledged. Day 2 data are now included in all relevant analyses.
Line 199-202: The reported interaction suggests that the effect of the method on jump height depends on the jump modality (CMJAS vs. CMJNAS). However, the follow-up tests indicate a consistent method ranking (PUSH2 > FPtime > FPimp) across both modalities. This pattern is more indicative of a main effect of method, not an interaction. The authors should clarify or reconsider the interpretation of the interaction effect.
Response
Thank you for your insightful comment regarding the interpretation of the interaction effect. In response, we first added a statement on the simple main effect, noting that CMJAS height was significantly greater than CMJNAS height across all three measurement methods (added to Line 186-190): “Regarding jump modalities, post hoc paired t-test with Bonferroni correction revealed that CMJAS height was significantly greater than CMJNAS height across all three measurement methods (p < 0.01).”
Then we clearly linked the AIabs results to the interaction effect by adding the following statement (Line 197-199): “The significant difference in AIabs explains the interaction observed in the two-way ANOVA: PUSH2 amplified the overestimation of jump height due to arm swing to a greater extent in CMJAS than the other methods.”
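For illustration only — the manuscript's exact AIabs formula is not reproduced in this response, so the definitions below (the absolute gain in jump height from arm swing, and that gain as a percentage of the no-arm-swing height) are assumed common conventions rather than the authors' confirmed computation:

```python
def arm_swing_indices(h_as, h_nas):
    """Assumed arm-contribution indices (may differ from the manuscript's AIabs):
    absolute index = CMJAS height minus CMJNAS height (m)
    relative index = the same gain expressed as % of CMJNAS height
    """
    ai_abs = h_as - h_nas
    ai_rel = 100.0 * ai_abs / h_nas
    return ai_abs, ai_rel
```

Under these definitions, a method that inflates CMJAS height more than CMJNAS height (as described for PUSH2) inflates both indices, which is exactly the pattern driving the reported interaction.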
Discussion
Line 238-239: The opening sentence of the Discussion should be revised to better reflect the study’s specific purposes as outlined in the Introduction.
Response
Well noted. The revised sentence now reads (L245-246):
“This study evaluated the reliability and validity of force platform-derived impulse (FPimp), force platform-derived flight time (FPtime), and an inertial measurement unit (PUSH2) for assessing CMJ height, both with and without arm swing, and subsequently examined the implications of these methods on arm contribution indices.”
Line 248-250: The reported significant interaction between jump modality and measurement method on CMJ height seems to be more accurately described as a main method effect, since the ranking of methods (PUSH2 > FPtime > FPimp) remains consistent across both CMJ with and without arm swing. Clarifying this distinction in the Results and Discussion sections will enhance the accuracy and clarity of data interpretation.
Response
Thank you for this valuable comment. In addition to changes to the Results section mentioned earlier, we have added a sentence at the beginning of the third paragraph of the Discussion (Line 257-259) to explicitly acknowledge the consistent ranking of methods, while also highlighting the nature of the significant interaction: “CMJ heights were consistently ranked, from the highest to the lowest, PUSH2 - FPtime - FPimp for both CMJAS and CMJNAS.”
Line 255-261: The discussion on how trunk kinematic variations between CMJ with and without arm swing might influence the PUSH2 IMU’s jump height overestimation is interesting but lacks sufficient detail.
Response
We acknowledge that the PUSH2 algorithm is proprietary, which limits a detailed mechanistic explanation. We have addressed this by stating in the Discussion (L263-266): “However, PUSH Inc. has not disclosed their algorithms making it uncertain whether the algorithms for the two CMJ modalities are identical. This is problematic because the computation algorithm, which includes data filtering, can have a sizeable effect on the measurement [37].”
Regarding the influence of trunk kinematics, we have included “Consequently, kinematic differences in the trunk might lead to a more pronounced overestimation of jump height from PUSH2 attached to the participants’ waist when arm swing is involved.” (L271-273)
We believe these statements reflect the extent of what can be discussed given the "black-box" nature of the device.
It would be valuable if the authors could explain how such kinematic differences specifically affect the sensor signals and the calculation of jump height. Additionally, the methods section does not describe how jump height was derived from the sensor data, making it difficult to evaluate the potential impact of these kinematic variations on the measurement accuracy. Including a clear explanation of the jump height calculation from the sensor and how different movement patterns might influence this calculation would improve clarity and strengthen the study’s conclusions.
Response
Our response to this point is essentially the same as the explanation provided for your previous comment.
Line 296-301: The authors note that differences in jump height overestimation across studies may be due to updates in the PUSH2 software versions, which likely include changes in the underlying computation algorithms. However, the manuscript does not clearly specify which algorithm is used by PUSH2 to calculate jump height.
Response
Thank you for your remark. Indeed, as the algorithm is proprietary, it is not freely available. We have, however, included the software version (Line 122-123), which most likely corresponds to the version of the algorithm. Additionally, the introduction (Line 61) and the limitations paragraph (Line 320-322) mention the “black box” issue with commercially available solutions. We regret not being able to provide more information.
Reviewer 2 Report
Comments and Suggestions for Authors
The current paper looked at validating jump heights across three methods: force impulse, flight time, and a black-box algorithm via a PUSH 2.0 band. Overall, the authors have done a good job with the paper, highlighting the rationale for the research and presenting the methods and results clearly.
With the increasing use of wearable sensors and constant monitoring in sport, I think this paper is highly relevant in the current context.
My recommendation is to accept with minor revisions.
Please see my comments below:
L94-97: Power calculation - I was unable to replicate the power calculation. Can you provide the exact inputs for GPower? Furthermore, was the output sample-size divided by 3 to get the final number of 14? The effect size also seems too high for difference between devices. Is this meant to be the ES for between jump types (i.e., AS vs NAS)? If so, I would consider two power calculations. Alternatively, I would consider removing this (unless it is a journal requirement), as power calculations in sports science literature are usually calculated incorrectly due to inherent limitations with the field.
L127: Please provide the Take-off velocity calculation for the FPimp formula. Specifically, at what point did you consider the participant to be "taken off". Was it same as flight time? (i.e., GRF < 10N).
Figure 2: I would suggest the authors center zero in the graphs and make y-axis limits symmetric for better readability.
L259-261: Please provide more context into why you think these specific differences may lead to changes in the measurement from PUSH2 (or an IMU in general). For example, are you assuming they are not using the Gyroscope, or if they are using only a single axis to estimate vertical height, or something entirely different?
L289-291: This was an interesting deviation from the rest of the results. Can you please discuss this further with your thoughts on why this may have shown up.
L311-312: Please separate this into two sentences and provide more clarity about the instruction for arm-swing. Many practitioners will glance through the discussion and jump to the conclusion, and clarification here may help.
Author Response
Dear Reviewer,
We would like to express our sincere gratitude for the valuable comments you provided about our manuscript. Your insight was very useful in improving our manuscript and we hope that it will now meet your standards. Please find our answer to your comments (italicized) below.
L94-97: Power calculation - I was unable to replicate the power calculation. Can you provide the exact inputs for GPower? Furthermore, was the output sample-size divided by 3 to get the final number of 14? The effect size also seems too high for difference between devices. Is this meant to be the ES for between jump types (i.e., AS vs NAS). If so, I would consider two power calculations. Alternatively, I would consider removing this (unless it is a journal requirement), as power calculations in sports science literature are usually calculated incorrectly due to inherent limitations with the field.
Response
Thank you for your comment. Upon reviewing our G*Power procedure, we realized there might have been a misunderstanding in our initial selection of the statistical test option. Thankfully, the revised computation does not affect the practical outcome, as our number of participants matched the power analysis.
Here are the exact parameters used: 'ANOVA: Repeated measures, within factors' test (F tests), with “Effect size f: 0.253” (corresponding to η_p^2 = 0.06), “α err prob: 0.05”, “Power (1-β err prob): 0.80”, “Number of groups: 1” (all subjects performed all conditions), “Number of measurements: 6” (3 methods × 2 modalities), “Correlation among repeated measures: 0.5” (default), and “Nonsphericity correction ε: 1” (default).
We added this information to the manuscript (Line 93-98).
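The correspondence between f = 0.253 and η_p² = 0.06 quoted above follows the standard conversion f = √(η_p² / (1 − η_p²)); a quick illustrative check (the function name is ours):

```python
import math

def cohens_f_from_partial_eta_sq(eta_p_sq):
    """Cohen's f from partial eta squared: f = sqrt(eta / (1 - eta))."""
    return math.sqrt(eta_p_sq / (1.0 - eta_p_sq))

# partial eta squared of 0.06 corresponds to f of roughly 0.253,
# matching the G*Power input reported in the response
f = cohens_f_from_partial_eta_sq(0.06)
```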
L127: Please provide the Take-off velocity calculation for the FPimp formula. Specifically, at what point did you consider the participant to be "taken off". Was it same as flight time? (i.e., GRF < 10N).
Response
Thank you for your remark. Indeed, for the FPimp method, the instant of take-off was defined as when the vertical Ground Reaction Force (GRFv) dropped below 10 N. This definition is the same as that used to determine take-off for the FPtime method. We have clarified this in the revised Methods section (Line 133).
Figure 2: I would suggest the authors center zero in the graphs and make y-axis limits symmetric for better readability.
Response
Thank you for your suggestion. Based on remarks from other reviewers, we have removed the Bland-Altman plots and have instead summarized the key agreement statistics (systematic bias and 95% Limits of Agreement) in Table 4.
L259-261: Please provide more context into why you think these specific differences may lead to changes in the measurement from PUSH2 (or an IMU in general). For example, are you assuming they are not using the Gyroscope, or if they are using only a single axis to estimate vertical height, or something entirely different?
Response
Thank you for your comment. Because PUSH2's algorithm is not publicly available, we cannot definitively state its internal workings. Therefore, we refrained from detailed technical speculation in the manuscript. However, our reasoning for discussing potential trunk kinematic influences (Lines 271-273) is based on general IMU principles: 1) Given its capability to measure barbell velocity, PUSH2 likely uses more than just a single-axis accelerometer. 2) Its reported lower accuracy in highly rotational movements (like power cleans) (Thompson et al., Sports, 2020) might suggest that gyroscope data, if used, may not fully compensate for complex trunk rotations during jumps.
In the manuscript, we kept this general by stating that trunk kinematic differences might lead to overestimation by PUSH2, without speculating on unconfirmed algorithmic details. We appreciate you prompting this deeper consideration.
L289-291: This was an interesting deviation from the rest of the results. Can you please discuss this further with your thoughts on why this may have shown up.
Response
We acknowledge that the PUSH2 algorithm is proprietary, which limits a detailed mechanistic explanation from our end for why such device-specific patterns, like proportional bias, might occur. However, your comment prompted us to re-examine the literature more closely. We found previous research that similarly reported proportional bias when comparing PUSH2 with force platforms, which we have now cited and briefly discussed in the revised manuscript (L304-306).
“While proportional bias was generally not observed between FPimp and FPtime, it tended to be present or higher in comparisons involving PUSH2 against the other two methods, which aligns with findings from other research using PUSH2 [20].”
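For context, the agreement statistics discussed here — systematic bias, 95% limits of agreement, and a check for proportional bias — can be computed as in the sketch below; the function names and the regression-on-means check are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def bland_altman(a, b):
    """Systematic bias and 95% limits of agreement between two methods."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diff = a - b
    bias = diff.mean()
    sd = diff.std(ddof=1)                 # sample SD of the differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

def proportional_bias_slope(a, b):
    """Slope of the differences regressed on the means; a slope clearly
    different from zero suggests proportional bias (error grows with
    jump height, the pattern reported for PUSH2 vs. force platforms)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.polyfit((a + b) / 2.0, a - b, 1)[0]
```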
L311-312: Please separate this into two sentences and provide more clarity about the instruction for arm-swing. Many practitioners will glance through the discussion and jump to the conclusion, and clarification here may help.
Response
In response to feedback from another reviewer, the entire Conclusion section has been revised.
Reviewer 3 Report
Comments and Suggestions for Authors
This study assessed the reliability and validity of three measurement methods for determining jump height during countermovement jumps with and without arm swing in physically active men, and analyzed their impact on arm contribution indices. The obtained results provide sport training and athlete monitoring with reliable measurement tools and methods, thereby helping to optimize training programs and enhance athletic performance.
The following comments should be addressed before the manuscript is considered for publication:
1. Specific results are not recommended in the abstract; they are more suitable for the conclusion section.
2. It is recommended to use a table to compare the characteristics, advantages, and disadvantages of the three methods.
3. Algorithm optimization can affect measurement results. Since the paper doesn't clarify the algorithm used by PUSH Band, it can't represent the general use of IMU methods.
4. Please provide the type and model number of the IMU sensor.
5. The paper reads more like a product test report and lacks theoretical elaboration and innovative methods. It is recommended that the authors adopt more theoretical methods or analyze the experimental data in greater depth, so as to enhance the academic value of the paper, for example by constructing mathematical models or proposing new algorithms.
6. The conclusion should highlight the academic contributions of this paper and provide recommendations for future research directions.
Author Response
Dear Reviewer,
We would like to express our sincere gratitude for the valuable comments you provided about our manuscript. We did our utmost to provide well-argued answers to each, and we hope that they properly address your concerns. Please find our answers to your comments (italicized) below.
1. Specific results are not recommended in the abstract; they are more suitable for the conclusion section.
Response
Thank you for your suggestion. After careful consideration, we found that the absence of data in the abstract might cause ambiguity for readers who first evaluate the merit of our study before investing time in reading the full manuscript. Additionally, the journal's guidelines for the conclusion state: 'A final summarizing comment of the main conclusions or interpretations.' Accordingly, our Conclusion section (Lines 26-30) only synthesizes the primary takeaways and the broader implications derived from our findings, rather than reiterating specific numerical results from the Results section. Therefore, while we appreciate the merit of your suggestion, based on adherence to the journal's guidelines and concerns about clarity, we decided to leave both sections unchanged.
2. It is recommended to use a table to compare the characteristics, advantages, and disadvantages of the three methods.
Response
Thank you for your remark. We agree on the utility of a comparison table for readers. However, the primary focus of this study is the empirical validation of these methods for CMJ height under specific conditions (with/without arm swing) and their impact on arm contribution indices, rather than a general review of all their characteristics, advantages, and disadvantages. While such a table is undeniably informative, we also feel that a comprehensive comparison would extend beyond the scope of the present study. We have briefly discussed relevant method characteristics in the Introduction and Discussion as they pertain to our findings. Therefore, we have opted to keep the manuscript focused on our specific research aims. Thank you for your understanding.
3. Algorithm optimization can affect measurement results. Since the paper doesn't clarify the algorithm used by PUSH Band, it can't represent the general use of IMU methods.
Response
Thank you for your comment regarding the specificity of our findings to the PUSH Band 2.0 due to its proprietary algorithm, and how this might limit generalization to all IMU methods. We fully agree that this point is important. While we cannot discuss the algorithm's inner workings, we tried to address this concern, at least partially, by providing the software version (Lines 122-123) and explicitly mentioning the need for caution in generalizing the present results in the paragraph on limitations (Lines 315-322).
4. Please provide the type and model number of the IMU sensor.
Response
Thank you for your suggestion. Regarding the IMU sensor details, the available information about the PUSH Band 2.0 is provided on lines 62-63. Unfortunately, as with many commercially available devices, the manufacturer does not publicly disclose the specific model numbers of its internal sensor components.
5. The paper is more like a test report of the product, which lacks theoretical elaboration and innovative methods. It is recommended that the authors adopt more theoretical methods or analyze the experimental data in depth, so as to enhance the academic value of the paper. For example, constructing mathematical models or proposing new algorithms.
Response
Thank you for your suggestion. We would like to clarify that the primary objective of this study was not to develop or refine IMU algorithms, nor to construct new theoretical mathematical models for jump analysis. Instead, our main purpose was to evaluate the practical utility and measurement properties (i.e., reliability and validity) of commercially available and commonly used measurement tools, including a popular IMU device (PUSH Band 2.0), in a real-world context of assessing countermovement jump performance with and without arm swing.
6. The conclusion should highlight the academic contributions of this paper and provide recommendations for future research directions.
Response
Thank you for this comment. While our manuscript targets coaches and the practical implementation of athletic evaluation, its academic contribution lies in clarifying the effect of arm swing on jump performance measurement, which already occupies most of the conclusion; it is therefore difficult to highlight it further. We have added recommendations for future research (L322-325):
“Future research may need to clarify how arm swing impacts the reliability of jump performance measurement across athlete populations, notably higher-level athletes implementing jumping as a tool to monitor training interventions over longer time periods.”
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The major revisions previously suggested have been successfully implemented, resulting in substantial improvements to the manuscript. The methodological and structural changes—particularly in the Materials and Methods and Results sections—have greatly enhanced the clarity, rigor, and logical flow of the paper. The justifications provided for key methodological decisions (e.g., GRF filtering, test–retest interval) are sound and well-supported by relevant literature.
A few minor revisions are still recommended to further refine the manuscript. These include slight adjustments to the sequencing of analysis presentations and the relocation of interpretive statements from the Results to the Discussion section, in order to better align with academic writing conventions.
Overall, the manuscript is significantly improved and is approaching a publishable standard.
Specific Comments:
Methods Section (Line 120):
The rationale for using unfiltered GRF data to calculate take-off velocity (Vto) and jump height via integration was well-argued in the response. It is recommended that this justification be added to the Methods section. Including this detail would strengthen the methodological rigor of the manuscript and address potential concerns about signal processing that readers may have.
Line 150:
The sentence, "The height of each participant’s three CMJNAS or CMJAS was used to compute intra-session and concurrent reliability, along with Bland–Altman analyses. The average of the three trials was used for the other analyses," would benefit from additional clarification. It is currently unclear which specific analyses (e.g., ANOVAs, descriptive statistics, test–retest reliability, correlations) are included under “other analyses.”
Consider revising the sentence for clarity. For example:
“The height of each participant’s three CMJNAS or CMJAS trials was used to compute intra-session and concurrent reliability, along with Bland–Altman analyses. The average of the three trials was used for the other analyses, including ANOVAs, assessments of concurrent validity, and inter-day test–retest reliability.”
Lines 162–165:
The Results section has been restructured to more closely follow the sequence outlined in the revised Statistical Analysis section, which improves overall clarity. However, the inter-day test–retest reliability analysis is still presented early in the Results, immediately following the intra-session reliability results. While this thematic grouping makes sense (as both assess reliability), it does not accurately reflect the order of presentation in the Methods, where test–retest reliability is presented after the ANOVA analyses.
To ensure alignment between the Methods and Results sections, consider moving the following sentence from its current location to follow the intra-session reliability description (line 155):
“In addition, test–retest inter-day reliability of all variables between Day 1 and 2 was assessed using ICC2,1 (i.e., two-way random-effects model with absolute agreement) with a 95% CI [31].”
This change would improve consistency in the analytical structure across sections.
Lines 197–199:
The explanation provided in these lines—“The significant difference in AIabs explains the interaction observed in the two-way ANOVA: PUSH2 amplified the overestimation of jump height due to arm swing to a greater extent in CMJAS than the other methods.”—is insightful and appropriately interprets the observed statistical interaction. However, this type of interpretive comment is more suitable for the Discussion section. The Results section should remain focused on reporting the findings without interpretation.
Author Response
Methods Section (Line 120):
The rationale for using unfiltered GRF data to calculate take-off velocity (Vto) and jump height via integration was well-argued in the response. It is recommended that this justification be added to the Methods section. Including this detail would strengthen the methodological rigor of the manuscript and address potential concerns about signal processing that readers may have.
Response
Thank you for your comment. Following your suggestion, we have added the explanation as follows:
Unfiltered left- and right-side vertical ground reaction forces (GRFv) were then summed for analysis because low-pass filtering can cause substantial underestimation of jump height [27]. (L120-121)
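The impulse-momentum calculation referenced here (take-off velocity obtained by integrating the net vertical GRF, then converted to height via h = v_to² / 2g) can be sketched as follows. This is an illustrative reconstruction under stated assumptions (quiet standing at the start of the signal, take-off at its end), not the authors' actual processing script:

```python
import numpy as np

def jump_height_from_grf(grf, mass, fs, g=9.81):
    """Estimate jump height from summed vertical GRF (impulse-momentum).

    Integrates net force / mass over the push-off to obtain the
    take-off velocity, then converts it to height: h = v_to^2 / (2g).
    Assumes `grf` (N) starts at quiet standing and ends at take-off,
    with `mass` in kg and sampling frequency `fs` in Hz.
    """
    grf = np.asarray(grf, dtype=float)
    accel = (grf - mass * g) / mass                     # net vertical acceleration (m/s^2)
    # trapezoidal integration of acceleration -> velocity at take-off
    v_to = np.sum((accel[1:] + accel[:-1]) / 2.0) / fs
    return v_to ** 2 / (2.0 * g)
```

Because the velocity is built by numerical integration, any high-frequency noise in the force signal accumulates into the take-off velocity, which is precisely why the filtering decision discussed above matters.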
Line 150: The sentence, "The height of each participant’s three CMJNAS or CMJAS was used to compute intra-session and concurrent reliability, along with Bland–Altman analyses. The average of the three trials was used for the other analyses," would benefit from additional clarification. It is currently unclear which specific analyses (e.g., ANOVAs, descriptive statistics, test–retest reliability, correlations) are included under “other analyses.”
Consider revising the sentence for clarity. For example:
“The height of each participant’s three CMJNAS or CMJAS trials was used to compute intra-session and concurrent reliability, along with Bland–Altman analyses. The average of the three trials was used for the other analyses, including ANOVAs, assessments of concurrent validity, and inter-day test–retest reliability.”
Response
Thank you for your comment. We have revised the sentence exactly as you suggested for improved clarity. (L150-151)
Lines 162–165: The Results section has been restructured to more closely follow the sequence outlined in the revised Statistical Analysis section, which improves overall clarity. However, the inter-day test–retest reliability analysis is still presented early in the Results, immediately following the intra-session reliability results. While this thematic grouping makes sense (as both assess reliability), it does not accurately reflect the order of presentation in the Methods, where test–retest reliability is presented after the ANOVA analyses. To ensure alignment between the Methods and Results sections, consider moving the following sentence from its current location to follow the intra-session reliability description (line 155):
“In addition, test–retest inter-day reliability of all variables between Day 1 and 2 was assessed using ICC2,1 (i.e., two-way random-effects model with absolute agreement) with a 95% CI [31].”
This change would improve consistency in the analytical structure across sections.
Response
Thank you for your helpful comment. In accordance with your suggestion, we have moved the sentence and revised the structure of both the Statistical Analysis and Results sections to ensure consistency. Specifically, the order of analyses is now as follows: (L157-159)
intra-session reliability,
inter-day test–retest reliability,
ANOVA,
concurrent validity,
Bland–Altman analysis,
and concurrent reliability.
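The inter-day test–retest statistic used above, ICC2,1 (two-way random-effects model, absolute agreement, single measure), can be computed directly from the two-way ANOVA mean squares. A minimal illustrative sketch, not the authors' analysis code:

```python
import numpy as np

def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `data` is an (n subjects x k sessions) array. Uses the classic
    mean-square formulation:
    (MSR - MSE) / (MSR + (k-1)*MSE + k*(MSC - MSE)/n)
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    ss_rows = k * ((data.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((data.mean(axis=0) - grand) ** 2).sum()   # between sessions
    ss_err = ((data - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because absolute agreement penalizes systematic day-to-day shifts (the session term in the denominator), ICC2,1 is stricter than a consistency-type ICC, which is what makes it appropriate for inter-day test–retest reliability.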
Lines 197–199:
The explanation provided in these lines—“The significant difference in AIabs explains the interaction observed in the two-way ANOVA: PUSH2 amplified the overestimation of jump height due to arm swing to a greater extent in CMJAS than the other methods.”—is insightful and appropriately interprets the observed statistical interaction. However, this type of interpretive comment is more suitable for the Discussion section. The Results section should remain focused on reporting the findings without interpretation.
Response
Thank you for your comment. Based on the suggestion from another reviewer, we intended to explain the reason for the observed interaction at this point. However, we acknowledge that the original sentence extended into interpretation. We have now revised the sentence to present only the factual outcome:
“This greater difference in AIabs from PUSH2 explains the significant interaction between jump modality and measurement method.” (L199-201)
Reviewer 3 Report
Comments and Suggestions for Authors
It is recommended that the abstract and the conclusion section be rewritten to highlight the purpose and significance of the article's research. The abstract is overloaded with data and abbreviations, which obscures the focus and innovation of the study. The conclusion section is too brief to demonstrate the extensive work done by the authors.
Author Response
Thank you for your constructive comments. In accordance with your suggestions, we have revised the Abstract to more clearly highlight the study’s purpose and significance. Specifically, we removed less essential numerical details and avoided the use of abbreviations such as “IMU,” “AIabs,” and “AIrel” to enhance clarity and focus.
Regarding the Conclusion section, we agree that it was previously too brief to reflect the scope of our work. We have revised it to better summarize the key findings and highlight the practical implications of the study. (L329-333)