Article
Peer-Review Record

A Preliminary Usability Study of a Novel Educational Training System to Teach ScratchJr. in School

by María Jesús Manzanares 1, Diana Pérez Marín 2,* and Celeste Pizarro 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 31 October 2025 / Revised: 19 December 2025 / Accepted: 22 December 2025 / Published: 1 January 2026
(This article belongs to the Special Issue Future Trends in Computer Programming Education)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The authors present a study on 50 children of 6-7 years of age on the usage of "Boby", a pedagogical conversational agent app, to learn ScratchJr. The paper's research questions mainly focus on app appreciation vs. kids liking block programming, or their ability to read, or their usual use of tablets for study vs. for gaming. The kids' answers show high app appreciation without any statistical dependence on any of the three factors. The study had 2 kids with special needs who were able to complete the tasks with some help. No particular difference has been noticed in these students' answers w.r.t. the other ones.

While the topic of using a pedagogical conversational agent for helping kids is interesting, the conclusion that app appreciation was not correlated to any of the three characteristics, in my opinion, is not sufficiently interesting.
The research could be greatly improved if a control group were present, if some assessment were done regarding learning outcomes instead of only how much the children liked the app (as this could be influenced by the presence of the author as an observer in the class), with a larger sample, and with more students with special needs, so that their case can be analyzed with higher confidence.

Author Response

We agree that a bigger sample of students with special needs is needed, as the original paper was submitted to validate the system with both neurotypical and neurodivergent students. However, as the first editor indicates, and we agree, the paper could be of interest to the readers of the journal because it presents a novel use of Pedagogic Conversational Agents. The aim is not to show that this technology is better than others for training young students to learn ScratchJr., but to validate that students can use it, find it satisfactory, and can complete the activities relating to the input/output concept.

It is also true, as the first editor commented, that more programming concepts could and should be evaluated, with a bigger sample, particularly in the case of students with special needs. All in all, this initial pilot study with 48 students without special needs and 2 students with special needs opens the door to more research on how to teach programming at an early age, not only to neurotypical students but also to neurodivergent students.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The work introduces the development and evaluation of the "Boby" Pedagogical Conversational Agent (PCA). The study aimed to validate this system, which implements the PCA in a "student" role, for teaching programming to a cohort of children aged 6–7, including both neurotypical and neurodivergent participants. The study addresses a timely and important topic by focusing on inclusivity and combining pedagogical conversational agents (PCAs) with education for neurotypical and neurodivergent children aged 6–7. The development of the "Boby" Pedagogical Conversational Agent (PCA) shows promise, and the work is commendable for its adherence to a user-centered design philosophy and Universal Design for Learning principles. The authors have definitely succeeded in methodological rigor and transparency in the statistical analysis, explicitly identifying and reporting issues like data imbalances, quasi-separation, and low expected cell frequencies.

Notwithstanding the mentioned strong points of the work, the Reviewer desires to bring to the authors’ attention the following issues to enhance and adequately position the work in accordance with its actual content.  

First of all, drawing sweeping conclusions about the system's effectiveness for the entire neurodivergent population from this small sample is considered a fatal flaw and a severe breach of academic ethics. Therefore, the claim that both neurotypical and neurodivergent students are able to complete the activity is an unsupported generalization.

Secondly, even the analysis for the neurotypical cohort (N=48) is critically flawed. The authors admit the data is highly skewed (e.g., satisfaction over 80% 'Yes'). Consequently, their inferential models (logistic regression) became numerically unstable due to issues like quasi-separation and low expected cell frequencies (with 77.8% of cells lower than 5), rendering the p-values and odds ratios unreliable. The complex statistical models failed because the data was unsuitable for this type of analysis.
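As an illustration of the low-expected-count problem described above, the following minimal Python sketch computes the expected cell counts of a contingency table under independence and the share of cells that fall below 5. The table is hypothetical (it is not the study's data) and was chosen only so that 7 of its 9 cells fall below the threshold; the function names are illustrative as well.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def share_below(expected, threshold=5.0):
    """Fraction of cells whose expected count is below the threshold."""
    cells = [value for row in expected for value in row]
    return sum(value < threshold for value in cells) / len(cells)


# Hypothetical 3x3 table, N = 48: satisfaction (rows) by a three-level factor (columns).
observed = [
    [30, 6, 4],  # "Yes"
    [3, 1, 1],   # "Somewhat"
    [2, 1, 0],   # "No"
]

share = share_below(expected_counts(observed))
print(f"{share:.1%} of cells have an expected count below 5")
```

When most expected counts are this small, the chi-square approximation is generally considered unreliable, which is the concern raised in the comment above.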

Thirdly, the authors ground the paper's significance in the false assertion that "there are no examples in the literature of teaching programming to neurodivergent students with the support of this type of agent [Pedagogical Conversational Agent]". This is factually incorrect, as research into "chatbots" (a primary form of PCA) for neurodiverse learners already exists, establishing a field that is "emerging-but-extant".

Regarding the organization and referencing, the State of the Art, for example, is inadequate and biased, failing to situate the work within the broader research field. Specifically, it omits crucial alternative paradigms, such as the central "Tangible vs. Digital" debate. The review fails to cite existing work on screen-free, tangible robots (such as KIBO) used with neurodiverse children, which directly challenges the paper's digital, screen-based approach.

Therefore, to make the work scientifically defensible and valid for publication, the following mandatory, non-negotiable revisions must be addressed:

The paper must be fundamentally re-written as a "preliminary usability study" of the "Boby" agent. In particular, the title, abstract, introduction, and conclusion must all be changed to reflect this modest scope, all language claiming "validation" for "neurodivergent children" must be removed, and all claims of effectiveness or generalization for "neurodivergent" learners must be deleted. As a consequence, the N=2 neurodivergent sample must be re-presented, at most, as two "illustrative case studies" within the discussion section.

As for the statistical re-analysis, the authors must explicitly acknowledge that their inferential statistical models (logistic regression and chi-square) failed due to low variance, skew, and quasi-separation. The unstable, misleading regression tables (e.g., Tables 3 and 7) must be removed. As a consequence, results must be re-reported using only simple, robust descriptive statistics (e.g., frequencies, percentages of satisfaction and task completion), as this is the only scientifically valid conclusion from the current dataset.
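A descriptive summary of the kind requested above can be sketched in a few lines of Python. The responses below are hypothetical placeholders, not the study's data, and `frequency_table` is an illustrative helper name.

```python
from collections import Counter


def frequency_table(responses):
    """Map each answer to (count, percentage of all responses, one decimal)."""
    counts = Counter(responses)
    total = len(responses)
    return {
        answer: (count, round(100 * count / total, 1))
        for answer, count in counts.most_common()
    }


# Hypothetical satisfaction answers for 48 children (placeholder values).
responses = ["Yes"] * 40 + ["No"] * 8

for answer, (count, percent) in frequency_table(responses).items():
    print(f"{answer}: {count} ({percent}%)")
```

Reporting counts alongside percentages keeps the small sample size visible to the reader.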

As for the comprehensive literature review revision, the unsupported claim of "no examples" (the "strawman gap") must be removed. Moreover, the authors are strongly advised to rewrite Section 2 to incorporate a new, comprehensive literature search that includes the existing work on chatbots/PCAs for neurodiverse learners, the "Tangible vs. Digital" debate (citing tangible robots like KIBO), and other scaffolding techniques for this population.

Finally, regarding organizational corrections, the introduction's basic structural error (the erroneous roadmap) must be corrected to reflect professionalism and the paper's actual structure. The authors are also strongly encouraged to expand the Materials and Methods section to include system design and architecture details.

Author Response

The work introduces the development and evaluation of the "Boby" Pedagogical Conversational Agent (PCA). The study aimed to validate this system, which implements the PCA in a "student" role, for teaching programming to a cohort of children aged 6–7, including both neurotypical and neurodivergent participants. The study addresses a timely and important topic by focusing on inclusivity and combining pedagogical conversational agents (PCAs) with education for neurotypical and neurodivergent children aged 6–7.

Many thanks. As we commented to reviewer 1, it is necessary to conduct more research on inclusivity to teach programming to young learners. We agree that more students with special needs should have been included in the sample, but this was not possible this time. All in all, it is an initial pilot study that could serve as a start for more research on the topic of Pedagogic Conversational Agents to teach programming to all.

 

The development of the "Boby" Pedagogical Conversational Agent (PCA) shows promise, and the work is commendable for its adherence to a user-centered design philosophy and Universal Design for Learning principles. The authors have definitely succeeded in methodological rigor and transparency in the statistical analysis, explicitly identifying and reporting issues like data imbalances, quasi-separation, and low expected cell frequencies.

Many thanks.

 

Notwithstanding the mentioned strong points of the work, the Reviewer desires to bring to the authors’ attention the following issues to enhance and adequately position the work in accordance with its actual content.  

First of all, drawing sweeping conclusions about the system's effectiveness for the entire neurodivergent population from this small sample is considered a fatal flaw and a severe breach of academic ethics. Therefore, the claim that both neurotypical and neurodivergent students are able to complete the activity is an unsupported generalization.

 

We totally agree. The title has been changed and the conclusions have been toned down regarding the validation with neurodivergent students. It is really an initial pilot study.

 

Secondly, even the analysis for the neurotypical cohort (N=48) is critically flawed. The authors admit the data is highly skewed (e.g., satisfaction over 80% 'Yes'). Consequently, their inferential models (logistic regression) became numerically unstable due to issues like quasi-separation and low expected cell frequencies (with 77.8% of cells lower than 5), rendering the p-values and odds ratios unreliable. The complex statistical models failed because the data was unsuitable for this type of analysis.

We agree. The statistical section has been rewritten as also indicated by the first editor.

 

Thirdly, the authors ground the paper's significance in the false assertion that "there are no examples in the literature of teaching programming to neurodivergent students with the support of this type of agent [Pedagogical Conversational Agent]". This is factually incorrect, as research into "chatbots" (a primary form of PCA) for neurodiverse learners already exists, establishing a field that is "emerging-but-extant".

Regarding the organization and referencing, the State of the Art, for example, is inadequate and biased, failing to situate the work within the broader research field. Specifically, it omits crucial alternative paradigms, such as the central "Tangible vs. Digital" debate. The review fails to cite existing work on screen-free, tangible robots (such as KIBO) used with neurodiverse children, which directly challenges the paper's digital, screen-based approach.

 

The state-of-the-art section has also been rewritten taking these comments into account. We did not initially intend to cover the literature on tangible robots, because we know that some schools in some countries cannot use this technology: they only have tablets and cannot afford to buy robots. For this reason, research that we know about, and whose good results we are aware of, is difficult to validate when the resources are not available. As is also known, the cost of technology can create a gap between users who can and cannot benefit from these possibilities, and there is no debate when the schools, at least in some cases, only have tablets.

 

Therefore, to make the work scientifically defensible and valid for publication, the following mandatory, non-negotiable revisions must be addressed:

The paper must be fundamentally re-written as a "preliminary usability study" of the "Boby" agent. In particular, the title, abstract, introduction, and conclusion must all be changed to reflect this modest scope, all language claiming "validation" for "neurodivergent children" must be removed, and all claims of effectiveness or generalization for "neurodivergent" learners must be deleted. As a consequence, the N=2 neurodivergent sample must be re-presented, at most, as two "illustrative case studies" within the discussion section.

 

The title, abstract, introduction and conclusion have been changed as indicated by the reviewer.

 

As for the statistical re-analysis, the authors must explicitly acknowledge that their inferential statistical models (logistic regression and chi-square) failed due to low variance, skew, and quasi-separation. The unstable, misleading regression tables (e.g., Tables 3 and 7) must be removed. As a consequence, results must be re-reported using only simple, robust descriptive statistics (e.g., frequencies, percentages of satisfaction and task completion), as this is the only scientifically valid conclusion from the current dataset.

The statistical section has been rewritten.

 

As for the comprehensive literature review revision, the unsupported claim of "no examples" (the "strawman gap") must be removed. Moreover, the authors are strongly advised to rewrite Section 2 to incorporate a new, comprehensive literature search that includes the existing work on chatbots/PCAs for neurodiverse learners, the "Tangible vs. Digital" debate (citing tangible robots like KIBO), and other scaffolding techniques for this population.

 

The state-of-the-art section has been rewritten.

 

 

Finally, regarding organizational corrections, the introduction's basic structural error (the erroneous roadmap) must be corrected to reflect professionalism and the paper's actual structure. The authors are also strongly encouraged to expand the Materials and Methods section to include system design and architecture details.

We agree. More design and architecture details have been added.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The work of Pérez Marín and colleagues explores the advantages of early interactions between neurotypical and neurodivergent children aged 6-7 years, particularly in coding via programming-oriented activities. This contribution is interesting and timely and might interest the community. This reviewer is inclined to recommend this study for publication pending resolution of the following points:

Major comments

  1. Structural issues: The manuscript is not paginated, making its processing difficult and deviating from basic formatting standards of research papers.
  2. Model’s statistical limitations: The research questions are tested through hypotheses H1 and H2. However, the authors should consider a more standard statistical approach, such as restating H1 and H2 in terms of the null (H0) and alternative (H1) hypotheses.
  3. Missing statistical information: The sample size is 50, comprising 33 girls and 17 boys. Is there a reason for this imbalance? To what extent would a different setting (e.g., the same number of girls and boys) compromise the research outcomes?
  4. Missing statistical information: Related to the above point, the study contains two neurodivergent and 48 neurotypical children. Could the authors elaborate on their expectation regarding the output in a more balanced scenario, i.e., the same number of neurodivergent and neurotypical students?
  5. Structural issues: As stated by the authors, “…the aim is to explore which factors influence children’s ability to complete activities with Boby and the outcomes they achieve when interacting with this type of system before beginning to use Scratch Jr.” However, neither the Conclusions nor the Abstract mentions any potential factors influencing children’s abilities.
  6. Formatting inconsistencies: The integer part of fractional numbers is expressed explicitly (e.g., Table 6:189, 0.104, and -0.039; Table 7: 0.116) and implicitly (e.g., Table 6: .910, .747, and .81; Table 7: .745 and .005). Please revise the entire text and express the integer part of all fractional numbers explicitly, following research papers’ standards; for instance, 0.910, 0.747, and 0.81 (see Table 6).
  7. Issues with figures: In Figure 5, the font size makes critical elements, including legends and axes numbers, inaccessible. Please increase the font size (using the manuscript text size as a guide) and include a name for the ordinate axis.  
  8. Issues with figures: In Figure 5, please include letters to facilitate each panel, e.g., (a), (b), …. Additionally, modify this figure’s legend accordingly to reference each panel and facilitate the overall understanding.
  9. Issues with figures: In Figure 6, the font size makes critical elements, including legends and axes numbers, inaccessible. Please increase the font size using the manuscript’s font size as a reference. Also, please include a name for the ordinate axis.  
  10. Issues with figures: In Figure 7, the font size makes critical elements, including legends and axes numbers, inaccessible. Please increase the font size using the manuscript’s font size as a reference.
  11. Issues with figures: In Figure 8, the font size makes critical elements, including legends and axes numbers, inaccessible. Please consider a bigger font size, using the manuscript’s font size as a guide.
  12. Pattern violations: In order to facilitate the manuscript reading, it is recommended to carefully revise it and rephrase sentences like “According to [4], …” as “According to Manches and Plowman, …”.
  13. Missing information: To facilitate reproducibility, it is recommended that the authors provide a link to the data. For instance, the Data Availability Statement could be rephrased to include the data link: “The data utilized and generated by this study is publicly available at the following University’s repository: http://...
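The leading-zero convention requested in major comment 6 could be applied mechanically before typesetting. The sketch below is one possible approach, assuming table values are available as plain text; the function name and the regular expression are illustrative and are not part of the manuscript's tooling.

```python
import re

# A decimal point (optionally after a minus sign) that is not preceded by
# a digit, letter, underscore, or another dot, and is followed by digits.
_IMPLICIT_ZERO = re.compile(r"(?<![\w.])(-?)\.(\d+)")


def add_leading_zero(text):
    """Rewrite values such as '.910' or '-.039' as '0.910' and '-0.039'."""
    return _IMPLICIT_ZERO.sub(r"\g<1>0.\2", text)


print(add_leading_zero("Table 7: .745 and .005"))
print(add_leading_zero("0.104, -.039 and .81"))
```

Values that already carry an explicit integer part, such as 0.104, are left unchanged.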

 

Minor comments

  1. In the Abstract, rephrase “…at early age…” as “at an early age…”.
  2. In Table 2, rephrase “…Chi-cuadrado…” as “…Chi-square…”.

 

Author Response

The work of Pérez Marín and colleagues explores the advantages of early interactions between neurotypical and neurodivergent children aged 6-7 years, particularly in coding via programming-oriented activities. This contribution is interesting and timely and might interest the community.

Many thanks.

 

This reviewer is inclined to recommend this study for publication pending resolution of the following points:

Major comments

  1. Structural issues: The manuscript is not paginated, making its processing difficult and deviating from basic formatting standards of research papers.

The paper has been paginated, although we have followed the template provided and are not sure whether the editors will change that.

 

  2. Model’s statistical limitations: The research questions are tested through hypotheses H1 and H2. However, the authors should consider a more standard statistical approach, such as restating H1 and H2 in terms of the null (H0) and alternative (H1) hypotheses.

The statistical section has been rewritten in a more descriptive approach as suggested by the other reviewers too.

 

  3. Missing statistical information: The sample size is 50, comprising 33 girls and 17 boys. Is there a reason for this imbalance? To what extent would a different setting (e.g., the same number of girls and boys) compromise the research outcomes?

Given that it is a real class and we could not choose the number of girls and boys in it, the sample reflects the imbalances usually found in schools. However, given that we are not analyzing gender, we believe this should not compromise the research outcomes.

 

During the experiment, boys and girls were treated in the same way. We did not intend to investigate whether there are differences in their use of the agent, but we now think this could be interesting future work.

 

  4. Missing statistical information: Related to the above point, the study contains two neurodivergent and 48 neurotypical children. Could the authors elaborate on their expectation regarding the output in a more balanced scenario, i.e., the same number of neurodivergent and neurotypical students?

 

As the other reviewers have also commented, we have rewritten the paper, including the title, abstract, introduction and conclusions, to highlight that it is currently only a preliminary study, with a pilot study involving 2 students with special needs. Now that this first pilot study has been completed, we are working on further research with more students with special needs. We believe this study could be the starting point for more research on the topic.

 

  5. Structural issues: As stated by the authors, “…the aim is to explore which factors influence children’s ability to complete activities with Boby and the outcomes they achieve when interacting with this type of system before beginning to use Scratch Jr.” However, neither the Conclusions nor the Abstract mentions any potential factors influencing children’s abilities.

We agree. This paragraph has been removed and the conclusions have been toned down, given also the limited size of the sample.

  6. Formatting inconsistencies: The integer part of fractional numbers is expressed explicitly (e.g., Table 6:189, 0.104, and -0.039; Table 7: 0.116) and implicitly (e.g., Table 6: .910, .747, and .81; Table 7: .745 and .005). Please revise the entire text and express the integer part of all fractional numbers explicitly, following research papers’ standards; for instance, 0.910, 0.747, and 0.81 (see Table 6).

The statistical section has been rewritten.

 

  7–11. Issues with figures

Figures have been redone.

 

  12. Pattern violations: In order to facilitate the manuscript reading, it is recommended to carefully revise it and rephrase sentences like “According to [4], …” as “According to Manches and Plowman, …”.

Done.

 

  13. Missing information: To facilitate reproducibility, it is recommended that the authors provide a link to the data. For instance, the Data Availability Statement could be rephrased to include the data link: “The data utilized and generated by this study is publicly available at the following University’s repository: http://...

We agree. The data is public at

https://edatos.consorciomadrono.es/dataset.xhtml?persistentId=doi:10.21950/XP2SPQ

and we have rewritten the Data Availability Statement to show the link.

 

Minor comments

  1. In the Abstract, rephrase “…at early age…” as “at an early age…”.

Done.

  2. In Table 2, rephrase “…Chi-cuadrado…” as “…Chi-square…”.

Previous Table 2 has been removed as the Statistical Section has been rewritten.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I am really sorry, but I don't find the paper interesting.

Author Response

We are sorry to hear that, but the academic editor and the other reviewers all agree that the paper is interesting and that this research is needed.

Reviewer 2 Report

Comments and Suggestions for Authors

The revised manuscript has successfully addressed the primary methodological and ethical concerns raised in the initial review, demonstrating significant effort and scientific integrity. The work is now substantially strengthened and is very close to acceptance, requiring only a few final, minor editorial corrections to ensure compliance with all mandated revisions.

The transition from a "validation" study to a "preliminary usability study" is complete and highly effective. The authors’ decision to reframe the neurodivergent sample as a "pilot study" and to rely exclusively on descriptive statistics (frequencies and percentages) successfully resolved the statistical and ethical "fatal flaws" identified in the first version. The inclusion of detailed system architecture (Figures 5 and 6) and the expansion of the literature review to include the Tangible vs. Digital debate (citing KIBO) are commendable additions that enhance the paper's scientific merit.

The manuscript is highly suitable for publication upon verification of the following minor corrections.

1. Literature Review: Unsupported Claim (Factual Correction)

Correction required: The authors must delete the sentence in the Introduction that asserts: "However, there are no examples in the literature of teaching programming to neurodivergent students with the support of this type of agent."

Rationale: This claim is factually incorrect and was explicitly mandated for removal. Retaining this assertion undermines the paper's credibility by ignoring existing research on chatbots and PCAs for neurodiverse populations. If not deleted, the sentence must be heavily nuanced to specifically limit the scope to agents used for ScratchJr block-based programming. Otherwise, add a new paragraph in Section 2 (State of the Art) that explicitly discusses the "Tangible vs. Digital" debate, citing the Albo-Canals (2018) (Albo-Canals, J., Martelo, A. B., Relkin, E., Hannon, D., Heerink, M., Heinemann, M., ... & Bers, M. U. (2018). A pilot study of the KIBO robot in children with severe ASD. International Journal of Social Robotics, 10(3), 371-383.) study on KIBO and ASD to provide a balanced view of the field.

2. Organizational Structure: Roadmap Error (Editorial Polish)

Correction required: The roadmap paragraph in the Introduction must be corrected to reflect the professional and accurate ordering of the sections in the revised manuscript.

Rationale: The revised version correctly implements the structure (Materials and Methods, followed by Results), but the roadmap text still mistakenly lists "Section 3 describes the results... Section 4 presents the materials and methods". The descriptions for Sections 3 and 4 must be swapped in the introduction to match the actual document order. In addition, the roadmap paragraph in the Introduction should accurately reflect the actual 7-section structure.

 

Author Response

We are glad to read that our effort to apply the comments is visible. Many thanks again for your valuable comments which we believe have improved the paper very much.

We have removed the sentence as indicated, and the roadmap paragraph has been corrected in the new version.

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have addressed all suggestions and recommendations. This reviewer now recommends the publication of this work as is.  

Author Response

Many thanks.
