Optimizing Assessment Thresholds of a Computer Gaming Intervention for Students with or at Risk for Mathematics Learning Disabilities: Accuracy and Response Time Trade-Offs
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Thank you for the opportunity to review this manuscript. The study tackles an important problem in early mathematics assessment and brings a creative analytic approach through drift diffusion modeling. The dataset is strong, and the authors clearly invested effort into the analysis. The work has the potential to make a useful contribution to research on fluency measurement for students with or at risk for mathematics learning difficulties.
Several areas need attention to strengthen the paper. The introduction provides helpful background, but the main research questions are not stated plainly. Adding a short section that lays out the research questions up front would give readers a clearer roadmap. The discussion offers thoughtful insights, yet some of the broader recommendations extend past what the current sample can fully support. Tightening the link between specific results and the claims drawn from them will make the argument more convincing. The manuscript also relies on long, complex sentences that sometimes obscure key points. More concise writing and consistent terminology will make the work easier to follow. With these revisions, the paper can communicate its contributions more clearly and more effectively.
Comments on the Quality of English Language
The English is generally clear, but several areas would benefit from editing for precision and flow. Long sentences in the introduction and discussion make important ideas harder to follow, and a few shifts in terminology create minor confusion. A round of careful revision focused on sentence clarity, conciseness, and consistent terminology will strengthen the readability of the manuscript.
Author Response
Comments 1: [Comments and Suggestions for Authors
Thank you for the opportunity to review this manuscript. The study tackles an important problem in early mathematics assessment and brings a creative analytic approach through drift diffusion modeling. The dataset is strong, and the authors clearly invested effort into the analysis. The work has the potential to make a useful contribution to research on fluency measurement for students with or at risk for mathematics learning difficulties.
Several areas need attention to strengthen the paper. The introduction provides helpful background, but the main research questions are not stated plainly. Adding a short section that lays out the research questions up front would give readers a clearer roadmap. The discussion offers thoughtful insights, yet some of the broader recommendations extend past what the current sample can fully support. Tightening the link between specific results and the claims drawn from them will make the argument more convincing. The manuscript also relies on long, complex sentences that sometimes obscure key points. More concise writing and consistent terminology will make the work easier to follow. With these revisions, the paper can communicate its contributions more clearly and more effectively.]
Response 1: [Thank you for the thoughtful suggestions to sharpen our paper. We reorganized the introduction and added a section that states the research questions. Throughout the paper, we also shortened long sentences, as suggested, to clarify key points and improve readability, and we tightened the link between the results and the discussion.]
Comments 2: [Comments on the Quality of English Language
The English is generally clear, but several areas would benefit from editing for precision and flow. Long sentences in the introduction and discussion make important ideas harder to follow, and a few shifts in terminology create minor confusion. A round of careful revision focused on sentence clarity, conciseness, and consistent terminology will strengthen the readability of the manuscript.]
Response 2: [We completed additional rounds of editing focused on sentence clarity, conciseness, and consistent use of terminology.]
Reviewer 2 Report
Comments and Suggestions for Authors
The study presents an interesting and valuable contribution to understanding the balance between speed and accuracy in mathematics learning interventions using the drift diffusion model. The topic is relevant and timely, particularly in the context of technology-based assessments for students with or at risk for mathematics learning disabilities. However, several areas of the manuscript require substantial improvement to strengthen its clarity, coherence, and scholarly impact.
Comments for author File: Comments.pdf
Author Response
Comments 1: [Comments and Suggestions for Authors
The study presents an interesting and valuable contribution to understanding the balance between speed and accuracy in mathematics learning interventions using the drift diffusion model. The topic is relevant and timely, particularly in the context of technology-based assessments for students with or at risk for mathematics learning disabilities. However, several areas of the manuscript require substantial improvement to strengthen its clarity, coherence, and scholarly impact.]
Response 1: [Thank you for the suggestion. We revised to improve clarity, coherence, and scholarly impact throughout the paper.]
Reviewer 3 Report
Comments and Suggestions for Authors
See the attached file.
Comments for author File: Comments.pdf
Author Response
Dear Authors,
Thank you for the opportunity to review your manuscript. This is an innovative and technically rigorous study that applies the drift diffusion model (DDM) to explore speed-accuracy trade-offs in a computer-based mathematics intervention for students with or at risk for learning disabilities. The work contributes to the growing intersection between educational data science and special education interventions. I found the study valuable and methodologically strong, but several areas would benefit from clarification and tightening before publication.
1. Conceptual Framing
- Comments 1: [The introduction presents a solid theoretical base but becomes dense and technically focused early. Consider opening with a clearer educational rationale: why studying response time and accuracy trade-offs matters for early numeracy intervention and teacher decision-making.]
Response 1: [We added educational rationale in section 1.3 Accuracy and Timing Elements.]
- Comments 2: [Clarify whether the target population is students with learning disabilities in general or specifically with mathematics learning disabilities (MLD). This distinction affects interpretation throughout the text.]
Response 2: [We have clarified throughout the paper that we focused specifically on students with or at risk for MLD, rather than on students with learning disabilities in general.]
- Comments 3: [The background on the NumberShire game could appear earlier to ground readers unfamiliar with this intervention.]
Response 3: [We agree with this suggestion. However, in light of the other reviewers’ comments, our team discussed at length which components should appear first (e.g., fluency, timing elements, DDM). Consequently, we intentionally introduced NumberShire in the Materials and Methods section. If this is a significant weakness, we would be happy to reorganize the order.]
2. Educational Relevance
- Comments 4: [While the modeling is robust, the educational implications of the findings remain somewhat underdeveloped. Expand on how teachers, curriculum developers, or game designers could use these thresholds to guide instructional pacing, progress monitoring, or adaptive feedback.]
Response 4: [As suggested, we expanded the discussion section to describe how developers and designers can use our findings to guide design and development, and how teachers can apply them.]
- Comments 5: [Explicitly connect your results to established frameworks such as RTI/MTSS or data-driven differentiation in mathematics education.]
Response 5: [Thank you for the suggestion. We now explicitly connect our results to MTSS and to instructional decisions about providing additional support or intensifying instruction for students with or at risk for MLD.]
3. Methods and Analysis
- Comments 6: [The methods section is very strong but should be more reader-friendly for a non-psychometric audience. A figure or flowchart explaining the DDM parameters (boundary separation, drift rate, etc.) and how these relate to student behavior would help.]
Response 6: [To improve accessibility for readers unfamiliar with drift diffusion modeling, we added Figure 1 (and renumbered the other figures accordingly), which provides a simple flowchart illustrating the key DDM parameters (boundary separation, drift rate, bias, and non-decision time) and how they relate to students’ observable response patterns.]
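For readers who want to see how the four parameters named in Response 6 interact, the following is a minimal Python sketch of a single simulated drift diffusion trial. It is an illustration only, not the authors’ estimation procedure: the function name `simulate_ddm_trial`, the Euler step size, the choice to treat the upper boundary as a correct response, and the example parameter values are all assumptions made for demonstration.

```python
import math
import random


def simulate_ddm_trial(v, a, z, t0, dt=0.001, sigma=1.0, max_time=10.0, rng=None):
    """Simulate one trial of a basic drift diffusion process (illustrative only).

    v  -- drift rate: average speed of evidence accumulation (positive = toward
          the upper boundary, which this sketch treats as the correct response)
    a  -- boundary separation: distance between the two response boundaries
    z  -- bias: starting point as a proportion of a (0.5 = unbiased)
    t0 -- non-decision time in seconds (e.g., encoding and motor response)
    Returns (correct, rt_seconds); correct is None if max_time elapses first.
    """
    if rng is None:
        rng = random.Random()
    x = z * a                        # evidence starts between the two boundaries
    t = 0.0
    step_sd = sigma * math.sqrt(dt)  # Euler-Maruyama noise per time step
    while t < max_time:
        x += v * dt + rng.gauss(0.0, step_sd)
        t += dt
        if x >= a:
            return True, t0 + t      # upper boundary reached: correct response
        if x <= 0.0:
            return False, t0 + t     # lower boundary reached: incorrect response
    return None, t0 + t              # no decision within max_time


if __name__ == "__main__":
    rng = random.Random(2024)
    trials = [simulate_ddm_trial(v=1.5, a=1.5, z=0.5, t0=0.3, rng=rng) for _ in range(500)]
    accuracy = sum(1 for correct, _ in trials if correct) / len(trials)
    mean_rt = sum(rt for _, rt in trials) / len(trials)
    print(f"accuracy={accuracy:.2f}, mean RT={mean_rt:.2f} s")
```

In this sketch, widening `a` tends to yield slower but more accurate responses (the speed-accuracy trade-off the study examines), a larger `v` tends to yield responses that are both faster and more accurate, and `t0` adds a constant to every RT.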
- Comments 7: [Simplify technical explanations when possible (e.g., reduce repetition of parameter ranges).]
Response 7: [We reviewed the analysis portion of the Methods section and revised several passages to simplify technical content, particularly by reducing redundant explanations of parameter ranges and model specifications.]
4. Discussion and Implications
- Comments 8: [The “sweet spot” idea is a valuable finding; however, clarify its practical meaning. What does “4.1 seconds optimal RT” imply for instructional design or game settings?]
Response 8: [We now describe how we applied the sweet spot (87% accuracy at a 4.1-second RT) to propose a benchmark of 15 digits correct per minute (DCPM). To clarify its practical meaning, we added how teachers can review students’ performance and then provide additional support or move students between tiers.]
- Comments 9: [The discussion could better integrate the results with current math fluency literature and digital learning research.]
Response 9: [We added sentences connecting our results to the current fluency literature (e.g., Stocker et al., 2019). We also added a suggestion that developers consider 15 DCPM as a criterion for technology-based mathematics interventions.]
- Comments 10: [Add a short subsection summarizing limitations and future directions, particularly regarding generalizability beyond the current dataset and implications for real-time adaptive interventions.]
Response 10: [Absolutely. We added a note to the Limitations section.]
This paper represents a valuable and well-executed contribution that bridges cognitive modeling and educational technology in special education contexts. By emphasizing clarity, practical interpretation, and contextual framing, it will be a strong fit for Education Sciences.
Reviewer 4 Report
Comments and Suggestions for Authors
education-3983573
Thank you for the opportunity to review the manuscript, “Optimizing Assessment Thresholds of a Computer Gaming Intervention for Students with or at risk for Mathematics Learning Disabilities: Accuracy and Response Time Trade-offs” for publication in Education Sciences.
This is an extremely well-written and clearly explained manuscript. The authors highlight the use of accuracy and response times within the context of technology-based intervention and assessment. The authors used the NumberShire intervention and analyzed accuracy and response data from first-grade students with or at risk for mathematics learning disabilities (MLD). While the manuscript is quite technical, the presentation of the results, the conversation around mathematical fluency, and the discussion of how computer-based programs and assessments can utilize accuracy and response rate to individualize instruction are important.
Introduction: The introduction clearly frames how procedural fluency supports conceptual understanding and provides nice insight into how mathematical fluency is defined and the importance of accuracy and timing elements. The study’s purpose and research goals are explicit and well-written.
- P.1, Line 35 – Sentence: “Yet, academically low-achieving students, especially those with or at risk for MLD have experienced learning difficulties across the school years.” This sentence reads as a transition from the prior sentence to why conceptual and procedural fluency is important for students with MLD. However, the claim that they experience learning difficulties across the school years could be reframed or made more relevant to mathematical learning. The next sentence makes that connection clearer with the description of inefficient strategies and longer response times.
- P.2, Lines 43‒58: You give background information related to the accuracy-speed trade-off and introduce Fitts’ law as a model to predict speed based on task difficulty. Could you share an additional sentence or two about Fitts’ law to ensure the reader has a firm understanding?
- P.2, Lines 54‒55: Minor edit “different group of students” should be “groups” and “More study is needed” should also be rephrased.
- P.3, Line 90: Minor edit to remove “in” from the sentence, “Other researchers suggest that in early mathematical fluency is an indicator…”.
- P.4, Line 145: Minor edit “yet it has not been widely used in the field of special education, particularly technology-based interventions…,” should the word “among” be inserted after “particularly”?
Method: The methodology is clear and replicable. I was not familiar with this intervention, so I appreciated reading the clear descriptions and the inclusion of the sample lesson. The figure also complemented the gameplay pathways description. The tables were clear and easy to read. The data analysis description was extremely thorough and well-described.
- P.5, Line 197: Missing the word “with” when describing students at risk for MLD.
- P.5, Line 201: Missing a hyphen between first and grade.
- P.5, Line 207: The inclusion of the quotation to demonstrate the elder’s explicit instruction might benefit from some rephrasing for clarity.
- P.5, Line 226: Remove “the” between suggest and significant.
- P.6, Lines 247 and 248: I believe APA indicates that number ranges should utilize an en dash as opposed to a hyphen.
- P.6, Line 263: The authors state that “An Embedded Assessment section” was described previously; however, I could not find a clearly indicated section in the previous pages related to embedded assessment. Embedded assessment is mentioned in the following section regarding the data set. Is this what the author is referring to?
- P.7, Lines 290‒303: The authors describe that they identified 27 mini-games that addressed the focal standard of addition and subtraction. Can the authors verify at this point whether they included both addition and subtraction assessment items in the analysis or if it was only addition? Did the authors consider whether RT differed based on problem type e.g., did addition facilitate faster times and accuracy than subtraction? Was there any reason as to why 2 minutes was the cutoff for outliers? In this paragraph, the authors mention that RT was collected in milliseconds from the server. I am assuming that the authors converted milliseconds to seconds for ease of reporting because the average reported is in seconds. Would it also be of use to report on the RT range in the table?
- P.9, Line 316: The P in point-biserial does not need to be capitalized.
- P.9, Line 329: The authors use the notation of t and subscript 0 for non-decision time; however, earlier in the paper the notation was different. Could the authors verify the correct one?
- P.9, Line 356: I understand that when you conducted ANOVAs you used partial eta-squared as the effect size. Is there a reason this effect size was used?
- P.9, General: As you discuss your data analysis plan, e.g., conducting scatterplots etc., would it be useful to indicate which tables/figures represent that information?
Results: The results and associated tables and figures are all well-explained and illustrated.
- P.10, Lines 364‒372: Could you provide the significance values for the correlations?
- P.12, Line 411: The sentence seems to be missing the word “if”: “to establish [if] the model adequately fits…”
Discussion: The discussion is clear and connects back to the ideas presented in the introduction.
- P.18, Lines 543‒545: Could the authors reword the sentence that begins with “Based” for clarity?
- P.18, Line 546: Could the authors expand on the idea of “capping a maximum RT”? Is this suggesting that within the game, they would allow about 6.4 seconds and then force the student into a specific learning pathway? The Domingue citation is missing a period after et al.
- P.18, Line 551: The authors share that optimal response thresholds are novel in SpEd research. Could they add a sentence or two about research that has used this concept or extend to the practical utility of the threshold?
- P.19, Lines 561‒564: The authors provide an interesting conclusion in this sentence. Could they add an additional sentence to translate to the implication of that finding? If we are forcing moderate pacing or that decrease in response time, how is that helping our students overall?
- P.19, Lines 566‒589: The authors reiterate their subgroup performance data, and these results are interesting. With your expertise in this area, why do you think the subgroup empirical tests were significant, but the DDM parameter values were consistent across groups?
- P.19, Line 590: Should this be “technology-based”?
- P.20, Lines 608‒627: You discuss how RT could be utilized in conjunction with accuracy to optimize learning pathways. Could you contextualize what this would mean in the sense of the technology-based intervention? Toward the end of these lines, you introduce that this information could help teachers intensify instruction and individualize learning experience. Could you expand and share what you envision there?
- P.20‒21, Lines 652‒672. This paragraph is especially helpful with the ultimate vision and practical implications. I wonder if having some of these ideas introduced earlier could be useful.
References
- There are some inconsistencies with APA formatting throughout the references (e.g., missing & before last author, missing periods after initials and dates, and sentence case in titles).
Tables, Figures, Appendices
- Figure 1: Bullet 3 refers to the student as “she” and Bullet 4 should have the word “to” instead of “do”.
- Table 1: I appreciated the clarity of this table, especially being able to see the average response times for a correct and incorrect response by group. It is interesting to see how quickly students were able to respond to these assessment items. The TX group had 853 students, and you report demographic data for 801 students. I assume that the reduction in sample size was due to attrition. I understand that 27 students did not report demographic data. Did you still include their data in the analyses? I appreciate that you reported whether students had IEPs or not. For those 58 students with IEPs, do you know their eligibilities?
- Figure 3: It would be awesome if the editor could make these plots slightly larger, so it is easier to see the line, points, and decile bins. I noticed in the description the font changes and there are some minor edits (“female” -> “females”, Els -> ELs).
- Table 2: In the note, I think the sentence, “The RT are addressed by back transformed from log RT…” should be rephrased for clarity. Could you explain why there is a division in the table with two sets of bins?
- Table 5: You provide the β value for Gender in the Table but do not for EL and IEPs (the table reverts to F values on page 17).
- Appendix A:
- Under “Feature Set,” there is a statement regarding the slate that includes the words “Gabe’s idea”; is this a typo? In Bullet 3, should the line read “When the character models read…” as opposed to “reading”? In Bullet 4, should the readers of the manuscript know about “eliminate discussion after each item” or was this a directive for those making the lessons? Bullets 5 and 6 are missing punctuation.
- Under “WE DO Instructional Design Principles,” in Bullet 2, 1 + 6 + 7 should read as 1 + 6 = 7. In Bullet 4, should the word “hit” be replaced with “click”?
Author Response
Comments and Suggestions for Authors
Thank you for the opportunity to review the manuscript, “Optimizing Assessment Thresholds of a Computer Gaming Intervention for Students with or at risk for Mathematics Learning Disabilities: Accuracy and Response Time Trade-offs” for publication in Education Sciences.
This is an extremely well-written and clearly explained manuscript. The authors highlight the use of accuracy and response times within the context of technology-based intervention and assessment. The authors used the NumberShire intervention and analyzed accuracy and response data from first-grade students with or at risk for mathematics learning disabilities (MLD). While the manuscript is quite technical, the presentation of the results, the conversation around mathematical fluency, and the discussion of how computer-based programs and assessments can utilize accuracy and response rate to individualize instruction are important.
Introduction: The introduction clearly frames how procedural fluency supports conceptual understanding and provides nice insight into how mathematical fluency is defined and the importance of accuracy and timing elements. The study’s purpose and research goals are explicit and well-written.
Comments 1: P.1, Line 35 – Sentence: “Yet, academically low-achieving students, especially those with or at risk for MLD have experienced learning difficulties across the school years.” This sentence reads as a transition from the prior sentence to why conceptual and procedural fluency is important for students with MLD. However, the claim that they experience learning difficulties across the school years could be reframed or made more relevant to mathematical learning. The next sentence makes that connection clearer with the description of inefficient strategies and longer response times.
Response 1: Thank you for the suggestion to further improve the quality of the sentence. We revised as suggested.
Comments 2: P.2, Lines 43‒58: You give background information related to the accuracy-speed trade-off and introduce Fitts’ law as a model to predict speed based on task difficulty. Could you share an additional sentence or two about Fitts’ law to ensure the reader has a firm understanding?
Response 2: Thank you for the suggestion! We added a sentence to directly explain what Fitts’ law is and revised the next sentence accordingly.
Comments 3: P.2, Lines 54‒55: Minor edit “different group of students” should be “groups” and “More study is needed” should also be rephrased.
Response 3: We rephrased as suggested.
Comments 4: P.3, Line 90: Minor edit to remove “in” from the sentence, “Other researchers suggest that in early mathematical fluency is an indicator…”.
Response 4: We removed “in” as suggested.
Comments 5: P.4, Line 145: Minor edit “yet it has not been widely used in the field of special education, particularly technology-based interventions…,” should the word “among” be inserted after “particularly”?
Response 5: We revised as suggested.
Method: The methodology is clear and replicable. I was not familiar with this intervention, so I appreciated reading the clear descriptions and the inclusion of the sample lesson. The figure also complemented the gameplay pathways description. The tables were clear and easy to read. The data analysis description was extremely thorough and well-described.
Comments 6: P.5, Line 197: Missing the word “with” when describing students at risk for MLD.
Response 6: Thank you for catching this! We added “with”.
Comments 7: P.5, Line 201: Missing a hyphen between first and grade.
Response 7: We added the hyphen here and in the abstract.
Comments 8: P.5, Line 207: The inclusion of the quotation to demonstrate the elder’s explicit instruction might benefit from some rephrasing for clarity.
Response 8: As suggested, we added the quotation of the elder’s explicit instruction as an example.
Comments 9: P.5, Line 226: Remove “the” between suggest and significant.
Response 9: We removed “the” as suggested.
Comments 10: P.6, Lines 247 and 248: I believe APA indicates that number ranges should utilize an en dash as opposed to a hyphen.
Response 10: As suggested, we changed hyphens to en dashes.
Comments 11: P.6, Line 263: The authors state that “An Embedded Assessment section” was described previously; however, I could not find a clearly indicated section in the previous pages related to embedded assessment. Embedded assessment is mentioned in the following section regarding the data set. Is this what the author is referring to?
Response 11: Our sincere apologies. Our initial draft included an Embedded Assessment section, but it was not fully incorporated into the other sections when we finalized the draft. We reorganized the subsections and now describe embedded assessment in Section 2.1.2.
Comments 12: P.7, Lines 290‒303: The authors describe that they identified 27 mini-games that addressed the focal standard of addition and subtraction. Can the authors verify at this point whether they included both addition and subtraction assessment items in the analysis or if it was only addition? Did the authors consider whether RT differed based on problem type e.g., did addition facilitate faster times and accuracy than subtraction? Was there any reason as to why 2 minutes was the cutoff for outliers? In this paragraph, the authors mention that RT was collected in milliseconds from the server. I am assuming that the authors converted milliseconds to seconds for ease of reporting because the average reported is in seconds. Would it also be of use to report on the RT range in the table?
Response 12: Thank you for pointing this out. The previous sentence was vague. The current data set of 27 mini-games focuses on whole-number addition, so we removed “subtracting” to clarify this. Accordingly, we revised the discussion in Section 4.2 to state that the scope of our study is “adding whole numbers.” Regarding the 2-minute cutoff, we based our outlier criterion on Stocker et al.’s (2019) finding that the lowest criterion is 20 correct responses per minute, or about 3 seconds per response. Because a single response taking two minutes is far slower than this lowest criterion, we operationalized such responses as outliers for the purpose of our study. We would be happy to add this explanation if needed. Finally, thank you for the suggestion to report the RT range. Because of the outlier cutoff, we do not think reporting the RT range would be informative, but please let us know if you strongly prefer that we report it.
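The outlier rule and unit conversion discussed in Response 12 can be sketched in Python as follows. This is a hypothetical illustration, not the authors’ code: the function names `clean_rts` and `back_transformed_mean`, the input format (a list of server-logged RTs in milliseconds), keeping RTs exactly at the cutoff, and dropping non-positive values are all assumptions. The back-transformed mean reflects the Table 2 note that RTs were back-transformed from log RT, assuming the mean is taken on log RTs and then exponentiated.

```python
import math

CUTOFF_MS = 2 * 60 * 1000        # 2-minute outlier cutoff, in milliseconds
LOWEST_CRITERION_PER_MIN = 20    # lowest fluency criterion attributed to Stocker et al. (2019)

# Seconds per response implied by that criterion: 60 / 20 = 3.0 s,
# so the 120 s cutoff removes only responses far slower than that criterion.
SECONDS_PER_RESPONSE_AT_CRITERION = 60 / LOWEST_CRITERION_PER_MIN


def clean_rts(rts_ms):
    """Drop RTs above the cutoff and convert the remainder from ms to seconds.

    Assumptions: RTs exactly at the cutoff are kept, and non-positive values
    are treated as logging errors and dropped (they cannot be log-transformed).
    """
    return [ms / 1000 for ms in rts_ms if 0 < ms <= CUTOFF_MS]


def back_transformed_mean(rts_s):
    """Mean of log RTs, back-transformed to seconds (equivalently, the geometric mean)."""
    return math.exp(sum(math.log(s) for s in rts_s) / len(rts_s))


if __name__ == "__main__":
    kept = clean_rts([1500, 4100, 130_000, 2_900])
    print(kept)  # [1.5, 4.1, 2.9]
    print(round(back_transformed_mean(kept), 2))
```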
Comments 13: P.9, Line 316: The P in point-biserial does not need to be capitalized.
Response 13: That’s right! We revised as suggested.
Comments 14: P.9, Line 329: The authors use the notation of t and subscript 0 for non-decision time; however, earlier in the paper the notation was different. Could the authors verify the correct one?
Response 14: Thank you so much for catching this. It should be t0 (with the 0 subscripted). We double-checked and corrected the notation throughout the paper.
Comments 15: P.9, Line 356: I understand that when you conducted ANOVAs you used partial eta-squared as the effect size. Is there a reason this effect size was used?
Response 15: Because the analyses were conducted at the item level with a very large number of observations, even small differences reached statistical significance. We used partial eta-squared because it is the standard effect size for ANOVAs in educational and behavioral research and it helps convey the practical magnitude of the findings (Fritz et al., 2012). We also decomposed the significant interactions to support clearer interpretation.
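For reference, partial eta-squared for an ANOVA effect is SS_effect / (SS_effect + SS_error), which can also be recovered from a reported F statistic as F·df_effect / (F·df_effect + df_error). The short Python sketch below illustrates both forms with made-up numbers; the function names are hypothetical and it is not taken from the authors’ analysis scripts.

```python
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta-squared: SS_effect / (SS_effect + SS_error)."""
    if ss_effect < 0 or ss_error < 0 or ss_effect + ss_error == 0:
        raise ValueError("sums of squares must be non-negative and not both zero")
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_f(f: float, df_effect: int, df_error: int) -> float:
    """Same quantity recovered from a reported F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


if __name__ == "__main__":
    # Made-up numbers: with a large error df (many item-level observations),
    # F(1, 988) = 12 is statistically significant, yet the effect accounts
    # for only about 1% of the effect-plus-error variance.
    print(partial_eta_squared_from_f(12.0, 1, 988))  # 0.012
```

This illustrates the point in Response 15: with very many observations, significance alone says little, while partial eta-squared conveys the practical magnitude.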
Comments 16: P.9, General: As you discuss your data analysis plan, e.g., conducting scatterplots etc., would it be useful to indicate which tables/figures represent that information?
Response 16: We revised the description to state explicitly which figure displays each planned scatterplot. For example, we now note that scatterplots illustrating accuracy–RT relationships are shown in Figure 3 (previously Figure 2).
Results: The results and associated tables and figures are all well-explained and illustrated.
Comments 17: P.10, Lines 364‒372: Could you provide the significance values for the correlations?
Response 17: All correlations between accuracy and response time across subgroups were statistically significant (p < .001). We added this information to the note for Figure 3 (previously Figure 2).
Comments 18: P.12, Line 411: The sentence seems to be missing the word “if”: “to establish [if] the model adequately fits…”
Response 18: We revised the sentence to include the missing word “if,” so it now reads: “to establish if the model adequately fits…”
Discussion: The discussion is clear and connects back to the ideas presented in the introduction.
Comments 19: P.18, Lines 543‒545: Could the authors reword the sentence that begins with “Based” for clarity?
Response 19: We revised the sentence to clarify that the accuracy rate of the slowest group was lower than that of the other groups.
Comments 20: P.18, Line 546: Could the authors expand on the idea of “capping a maximum RT”? Is this suggesting that within the game, they would allow about 6.4 seconds and then force the student into a specific learning pathway? The Domingue citation is missing a period after et al.
Response 20: Thank you for the suggestion. As suggested, we added the phrase “forcing the student into a specific learning pathway after 6.4 seconds.” Thank you also for catching the missing period after et al., which we have added.
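One way a game could implement the “capping a maximum RT” idea from Response 20 is sketched below in Python. Only the 6.4-second cap comes from the manuscript discussion; the function name `route_item`, the placeholder pathway labels, and the choice to treat a late answer like a timeout are hypothetical assumptions for illustration, not a description of how NumberShire routes students.

```python
from typing import Optional

MAX_RT_SECONDS = 6.4  # RT cap discussed in Response 20


def route_item(answer_correct: Optional[bool], elapsed_s: float,
               cap_s: float = MAX_RT_SECONDS) -> str:
    """Hypothetical routing rule for one embedded-assessment item.

    answer_correct -- True/False once the student responds, None if no response yet
    elapsed_s      -- seconds since the item was presented
    The returned labels are placeholders, not actual NumberShire pathways.
    """
    if answer_correct is None:
        # No answer yet: keep waiting until the cap, then force the support pathway.
        return "support_pathway" if elapsed_s >= cap_s else "waiting"
    if elapsed_s > cap_s:
        # A late answer is treated the same as reaching the cap (an assumption).
        return "support_pathway"
    return "continue" if answer_correct else "corrective_feedback"
```

For example, `route_item(True, 4.1)` returns `"continue"`, while `route_item(None, 6.5)` returns `"support_pathway"`.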
Comments 21: P.18, Line 551: The authors share that optimal response thresholds are novel in SpEd research. Could they add a sentence or two about research that has used this concept or extend to the practical utility of the threshold?
Response 21: Definitely! We added a short description noting that optimal response thresholds have been used in clinical settings.
Comments 22: P.19, Lines 561‒564: The authors provide an interesting conclusion in this sentence. Could they add an additional sentence to translate to the implication of that finding? If we are forcing moderate pacing or that decrease in response time, how is that helping our students overall?
Response 22: This is a great suggestion. We added a sentence describing the practical implication of this finding: it can help guide teachers’ instructional decision-making in support of students with or at risk for MLD.
Comments 23: P.19, Lines 566‒589: The authors reiterate their subgroup performance data, and these results are interesting. With your expertise in this area, why do you think the subgroup empirical tests were significant, but the DDM parameter values were consistent across groups?
Response 23: This is a great question. We believe the difference between the empirical tests and the DDM-estimated patterns stems from the nature of DDM parameters. Unlike conventional analytic techniques, DDM can generate and capture additional parameters, such as non-decision time and starting point. To avoid unnecessary confusion among readers, we did not elaborate on this point, but we would be happy to add this explanation if space permits.
Comments 24: P.19, Line 590: Should this be “technology-based”?
Response 24: Definitely. Thank you again for catching this typo. We revised it as suggested.
Comments 25: P. 20, Lines 608‒627: You discuss how RT could be utilized in conjunction with accuracy to optimize learning pathways. Could you contextualize what this would mean in the sense of the technology-based intervention? Toward the end of these lines, you introduce that this information could help teachers intensify instruction and individualize learning experience. Could you expand and share what you envision there?
Response 25: This is another great suggestion. We actually have described and expanded on our vision in the next section of future work. For example, we explained how it looks like to revising the threshold by adjusting different DDM parameters.
Comments 26: P. 20‒21, Lines 652‒672: This paragraph is especially helpful in conveying the ultimate vision and practical implications. I wonder whether introducing some of these ideas earlier would be useful.
Response 26: We appreciate your insight and had also considered introducing these ideas earlier. To avoid major changes to the flow, we added “see the next section for future direction” in the earlier section to alert readers.
References
Comments 27: There are some inconsistencies with APA formatting throughout the references (e.g., missing & before the last author, missing periods after initials and dates, and missing sentence case in titles).
Response 27: We carefully reviewed and revised the entire reference list to ensure full alignment with APA 7th edition guidelines, including adding missing ampersands and periods, correcting author initials and dates, and applying sentence-case formatting to all article and chapter titles.
Tables, Figures, Appendices
Comments 28: Figure 1: Bullet 3 refers to the student as “she” and Bullet 4 should have the word “to” instead of “do”.
Response 28: We revised both Bullet 3 and Bullet 4 as suggested.
Comments 29: Table 1: I appreciated the clarity of this table, especially being able to see the average response times for a correct and incorrect response by group. It is interesting to see how quickly students were able to respond to these assessment items. The TX group had 853 students, and you report demographic data for 801 students. I assume that the reduction in sample size was due to attrition. I understand that 27 students did not report demographic data. Did you still include their data in the analyses? I appreciate that you reported whether students had IEPs or not. For those 58 students with IEPs, do you know their eligibilities?
Response 29: Regarding the sample size discrepancy, 853 students were recruited into the larger study, but only 801 students completed the focal game-based assessments and provided demographic information. Thus, 801 students were included in the analyses. We added this clarification to Section 2.2. Students missing demographic data (n = 27) were retained in the analytic models using their available item-level responses. We did not collect specific disability eligibility categories; we asked only whether students had an IEP.
Comments 30: Figure 3: It would be awesome if the editor could make these plots slightly larger, so it is easier to see the lines, points, and decile bins. I also noticed that the font changes in the description and that there are some minor edits needed (“female” -> “females”, Els -> ELs).
Response 30: We revised Figure 3 to increase the plot size and improve visibility of the lines, points, and decile bins. We also corrected the font inconsistencies and updated the labels (e.g., “females,” “ELs”) in the note.
Comments 31: Table 2: In the note, I think the sentence, “The RT are addressed by back transformed from log RT…” should be rephrased for clarity. Could you explain why there is a division in the table with two sets of bins?
Response 31: We revised the wording in the note for clarity; it now states that response times were “back-transformed from log RT values.” Regarding the two sets of bins, Table 2 presents decile bins (10 groups) for each outcome. Because the table includes both observed accuracy (RA) and response time (RT), each bin is represented twice: the upper row shows accuracy and the lower row shows response time. The title of Table 2 reflects this structure and clarifies the distinction between the two rows.
Comments 32: Table 5: You provide the β value for gender in the table but not for EL and IEP status (the table reverts to F values on page 17).
Response 32: We revised Table 5 to report β values for accuracy outcomes, including EL status and IEPs, and removed the unintended reversion to F-values.
Comments 33: Appendix A:
- Under “Feature Set,” there is a statement regarding the slate that includes the words “Gabe’s idea”; is this a typo? In Bullet 3, should the line read “When the character models read…” rather than “reading”? In Bullet 4, are readers of the manuscript meant to see “eliminate discussion after each item,” or was this a directive for those creating the lessons? Bullets 5 and 6 are missing punctuation.
- Under “WE DO Instructional Design Principles,” in Bullet 2, 1 + 6 + 7 should read as 1 + 6 = 7. In Bullet 4, should the word “hit” be replaced with “click”?
Response 33: Thank you again for your close attention to detail. We realized that we had included a draft version of the sample lesson plan rather than the finalized version. We replaced it with the finalized version, which addresses all concerns regarding Appendix A.

