Criterion Validity of iOS and Android Applications to Measure Steps and Distance in Adults
Round 1
Reviewer 1 Report
This work presents a criterion validity study of iOS and Android applications for measuring steps and distance in adults.
Please compare your work with similar approaches and present a clear discussion about the common outcomes and limitations.
Clearly state why and how this paper can support future research activities in this field.
Author Response
Please compare your work with similar approaches and present a clear discussion about the common outcomes and limitations.
Clearly state why and how this paper can support future research activities in this field.
We believe that we have discussed our findings in depth (presenting both the strengths and the limitations of this study) in light of previous similar approaches.
In addition, in lines 385-388 it is clearly stated how this paper can support future similar research activities.
Reviewer 2 Report
In this manuscript, the authors examine the measurement accuracy of several freeware accelerometer-based physical activity smartphone apps during incremental-intensity treadmill walking and jogging. The focus was on two measures, step counts and distance. Thirty healthy adults participated in a cross-sectional study where they were fitted with an Android phone and an iPhone, each running the same set of four different apps simultaneously (Runtastic Pedometer, Accupedo, Pacer and Argus). Each participant walked or jogged for 5-minute intervals at each of three predefined speeds of 4.8, 6.0, and 8.4 km/h on a treadmill, and two researchers counted every step taken with a digital tally counter. Accuracy/validity was evaluated by comparing each app with the criterion measure using repeated measures ANOVA, Mean Absolute Percentage Errors (MAPE), and Bland-Altman plots. Results suggest that the Android apps performed slightly more accurately than the iOS apps when measuring step counts; MAPE were generally low for all apps, and accuracy increased at higher speeds. However, errors were generally higher for estimating distance. The authors conclude that accelerometer-based apps are accurate tools for counting steps during treadmill walking and jogging and could be considered suitable for use as an outcome measure within a clinical trial, but that none of the apps was suitable for measuring distance during treadmill walking and jogging.
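For readers unfamiliar with MAPE, the following is a minimal sketch of how it could be computed for one app against the criterion measure. The function name and the sample numbers are hypothetical and are not taken from the manuscript; the authors' exact computation may differ.

```python
def mape(app_values, criterion_values):
    """Mean absolute percentage error of app readings against the criterion.

    Each pair is one participant's reading (e.g. steps in a 5-minute trial).
    Criterion values must be non-zero.
    """
    if len(app_values) != len(criterion_values):
        raise ValueError("both lists must have one value per participant")
    if not app_values:
        raise ValueError("at least one pair of values is required")
    # Absolute error of each reading, expressed as a percentage of the criterion.
    errors = [100.0 * abs(a - c) / c for a, c in zip(app_values, criterion_values)]
    return sum(errors) / len(errors)


# Hypothetical step counts: app readings versus manually counted steps.
app_steps = [95, 110]
counted_steps = [100, 100]
print(mape(app_steps, counted_steps))  # 7.5
```

A lower MAPE means the app's readings sit closer to the criterion on average, regardless of whether the app overcounts or undercounts.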
This is a well-written evaluation of four physical activity apps, with findings that are helpful for facilitating physical activity surveillance, with implications for suitability for clinical trials. Minor suggestions for strengthening the manuscript follow.
- Pg. 4, lines 150-160. It would help to explicitly clarify that the objective measures described in this paragraph were used as the criterion measures for step counts and distance.
- Pg. 5, Statistical analyses. It would help if the authors could walk the reader through the statistical analyses, instead of assuming that they will be familiar with repeated measures ANOVA and Bland-Altman tables/plots (not all readers will be as familiar as the authors with these methods). For example, it would help to be more explicit about what repeated measures were used in each ANOVA, since there are 4 sets of paired measures (1 for Android, 1 for iPhone) of two outcomes (step counts, distance) over three speeds (4.8, 6.0, and 8.4 km/h). It is hard to tell from the table what was treated as a repeated measure, given the separate F tests for each of the 8 apps, as well as an overall F test in the text (lines 228-236). Please clarify.
- It is customary to include the degrees of freedom for an F test. Although the degrees of freedom are included for the overall MANOVA on pg. 5, the degrees of freedom for each F test in Tables 2 and 4 need to be shown, at least in a footnote for each table, assuming that the degrees of freedom are the same for each F test.
- It would help to walk the reader through the Bland-Altman results in Tables 3 and 5 and the plots in Appendix A. Otherwise, it may be hard for the reader to understand what these results mean from an intuitive perspective. The authors need to be careful in their terminology. Consider using “widest” instead of “highest” when referring to limits of agreement, since “highest” may mislead the reader into thinking that there is high agreement when the authors intended to refer to “wider” limits of agreement. It would also help to explain explicitly what the slope represents in the Bland-Altman tables and plots.
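The Bland-Altman quantities mentioned in the last point can be illustrated with a minimal sketch. It assumes the conventional formulation: differences (app minus criterion) are plotted against the mean of the two measures, the limits of agreement are the mean difference plus or minus 1.96 standard deviations, and the slope comes from regressing the differences on the means. Whether the manuscript used exactly this formulation is an assumption, and the function name and data are hypothetical.

```python
import statistics


def bland_altman(app_values, criterion_values):
    """Return bias, limits of agreement, and slope for paired measurements."""
    if len(app_values) != len(criterion_values) or len(app_values) < 2:
        raise ValueError("need at least two paired values")
    diffs = [a - c for a, c in zip(app_values, criterion_values)]
    means = [(a + c) / 2 for a, c in zip(app_values, criterion_values)]

    bias = statistics.mean(diffs)   # systematic over- or undercounting
    sd = statistics.stdev(diffs)    # sample SD of the differences
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd  # limits of agreement

    # Least-squares slope of differences on means: a non-zero slope suggests
    # that the error changes with the size of the measurement.
    mean_of_means = statistics.mean(means)
    sxx = sum((m - mean_of_means) ** 2 for m in means)
    if sxx == 0:
        raise ValueError("slope is undefined when all pair means are equal")
    sxy = sum((m - mean_of_means) * (d - bias) for m, d in zip(means, diffs))
    slope = sxy / sxx
    return bias, (lower, upper), slope


# Hypothetical step counts for four participants.
bias, limits, slope = bland_altman([98, 102, 101, 99], [100, 100, 100, 100])
print(bias, limits, slope)
```

Narrow limits of agreement indicate that individual app readings stay close to the criterion; wide limits indicate poor agreement even when the bias is near zero.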
Author Response
- Pg. 4, lines 150-160. It would help to explicitly clarify that the objective measures described in this paragraph were used as the criterion measures for step counts and distance.
Now in lines 161-163 it is explicitly mentioned that the objective measures were used as the criterion measures.
- Pg. 5, Statistical analyses. It would help if the authors could walk the reader through the statistical analyses, instead of assuming that they will be familiar with repeated measures ANOVA and Bland-Altman tables/plots (not all readers will be as familiar as the authors with these methods). For example, it would help to be more explicit about what repeated measures were used in each ANOVA, since there are 4 sets of paired measures (1 for Android, 1 for iPhone) of two outcomes (step counts, distance) over three speeds (4.8, 6.0, and 8.4 km/h). It is hard to tell from the table what was treated as a repeated measure, given the separate F tests for each of the 8 apps, as well as an overall F test in the text (lines 228-236). Please clarify.
More information regarding repeated measures ANOVA and Bland-Altman plots is now included in the Statistical analysis section (lines 215-218).
- It is customary to include the degrees of freedom for an F test. Although the degrees of freedom are included for the overall MANOVA on pg. 5, the degrees of freedom for each F test in Tables 2 and 4 need to be shown, at least in a footnote for each table, assuming that the degrees of freedom are the same for each F test.
In the RM-ANOVA analyses, the degrees of freedom were initially presented not only for the overall RM-ANOVA, but also for the statistically significant post-hoc pairwise comparisons (1,29). Now, this information is also included as a footnote under Tables 2 and 4.
- It would help to walk the reader through the Bland-Altman results in Tables 3 and 5 and the plots in Appendix A. Otherwise, it may be hard for the reader to understand what these results mean from an intuitive perspective. The authors need to be careful in their terminology. Consider using “widest” instead of “highest” when referring to limits of agreement, since “highest” may mislead the reader into thinking that there is high agreement when the authors intended to refer to “wider” limits of agreement. It would also help to explain explicitly what the slope represents in the Bland-Altman tables and plots.
“Highest” was replaced by “widest”, as suggested by the reviewer. The explanation of what the slope represents is now presented in the Statistical analysis section (lines 271-273 and 311-314). Furthermore, the final sentences of the Bland-Altman plot results explain what the various slopes represent.
Reviewer 3 Report
Dear Author,
Thank you for the opportunity to review the paper entitled “Criterion validity of iOS and Android applications to measure steps and distance in adults”. The study examined the criterion validity of freeware accelerometer-based PA smartphone apps during incremental-intensity treadmill walking and jogging. The paper takes up an interesting research problem that is gaining popularity. The study's conclusions provide a practical message that can be applied to future research in this area. I like the extensive introduction, which shows the findings to date. However, I miss a paragraph indicating the use of other accelerometers for physical activity measurement (e.g. wearable sensors, like ActiGraph, SenseWear Armband, RT3), which are also used for step count or distance analysis. The article is clear and methodologically correct, but I have some minor comments and suggestions that I think should be considered.
Minor
- I found an incorrect reference format
- Lines 151-153: the sentence regarding the distance in the test, I think it can be moved to the results
- Tables should be placed in the manuscript instead of at the end (lines 183, 238,…), likewise with figures (lines 246, 283,…). According to the journal's guidelines: All Figures, Schemes and Tables should be inserted into the main text close to their first citation and must be numbered following their number of appearance (Figure 1, Scheme I, Figure 2, Scheme II, Table 1, etc.).
Author Response
I miss a paragraph indicating the use of other accelerometers for physical activity measurement (e.g. wearable sensors, like ActiGraph, SenseWear Armband, RT3), which are also used for step count or distance analysis.
A paragraph covering 3 systematic reviews of wearable physical activity monitors (e.g., Fitbit, Garmin, etc.), which are more similar to the smartphone apps than research-based activity trackers are, is now included in lines 62-71.
- I found an incorrect reference format.
The journal accepts Free Format Submissions at initial submission, so the APA format was used at first due to time constraints. The reference format has now been changed according to the journal’s guidelines.
- Lines 151-153: the sentence regarding the distance in the test, I think it can be moved to the results.
The sentence on the distance covered in the 3 trials has been removed from its original position and moved to the Results section (lines 277-278).
- Tables should be placed in the manuscript instead of at the end (lines 183, 238,…), likewise with figures (lines 246, 283,…). According to the journal's guidelines: All Figures, Schemes and Tables should be inserted into the main text close to their first citation and must be numbered following their number of appearance (Figure 1, Scheme I, Figure 2, Scheme II, Table 1, etc.).
Thank you for noticing this. However, because all tables and figures are large, and the page orientation of the Word document would have to be changed to fit each table separately, we preferred to place all tables and figures at the end of the manuscript, on the understanding that the editors will position them accordingly during the editing process. All Tables and Figures are numbered in order of their in-text appearance (Table 1, Table 2, Figure 1, etc.).
Round 2
Reviewer 1 Report
My comments are not addressed in the revised version of the manuscript.
The proposed method is not quantitatively compared with similar approaches to show better performance. This comparison should be done using a table.
Author Response
I would like to thank the reviewer for his/her comments, which are intended to further improve the manuscript.
However, I would like to ask the reviewer to explain more fully what (s)he means by the sentence “The proposed method is not quantitatively compared with similar approaches to show better performance”. Does (s)he mean that the methodology followed is not adequately developed/performed/presented/analyzed? If this is the case, the methodology presented is similar to that of previous validation studies that have been carried out worldwide, and I firmly believe that it is adequately presented and explained. If the reviewer has more specific comments on this, it would be greatly appreciated if (s)he could explain more fully what (s)he means. Also, what does “show better performance” mean? Performance of the methodology presented, of the results, of the apps, etc.? This should be explained more fully. In addition, I firmly believe that in the Discussion section the Results of this study have been clearly discussed and connected to the Results of many previous similar validation studies.
Furthermore, the reviewer mentions that “This comparison should be done using a table”. Once more, it is not clear what the reviewer means by this. For example, what elements should be included in this table?
Finally, I would like to highlight that the 2 other reviewers had minimal comments following the 1st review round, and all of their comments have already been adequately addressed. Those comments did not identify any serious flaws in the Methodology followed, the presentation of the Results, or the Discussion section. It would be greatly appreciated if the comments of the 1st reviewer were more focused and better explained.
I would like to thank you once more for the support during the review of this paper.