The Impact of Educational LLM Agent Use on Teachers’ Curriculum Content Creation: The Chain Mediating Role of School Support and Teacher Self-Efficacy
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Thank you for the opportunity to read your manuscript on how educational LLM agents relate to teachers’ curriculum content creation, with school support and teacher self-efficacy examined as mediating mechanisms. The topic is timely and relevant; the study draws on a substantial survey (N=464) complemented by 23 semi-structured interviews, which together provide a strong empirical base. The mixed-methods design and the clear articulation of the four focal constructs are notable strengths.
At the same time, my overall impression is that the paper tries to do too much at once. The quantitative model, the qualitative strand, and a broad theoretical arc each have merit, but they currently sit alongside one another more than they cohere into a single, tightly focused narrative. As a result, reader uptake becomes harder than necessary. A productive first step would be to decide which thread is primary. If the quantitative SEM is the backbone, consider integrating the interviews more surgically—as targeted, joint-display style illustrations that illuminate specific paths in the model—rather than as a parallel, free-standing analysis. If, instead, you wish to deepen the qualitative contribution, that strand may warrant its own paper, with this manuscript streamlined around the modeling results. The abstract and introduction already foreground the survey and chain mediation; letting that emphasis guide the structure would improve coherence.
On measurement and validity, the most pressing issue is common method bias (CMB) and discriminant validity. The current approach relies chiefly on Harman’s single-factor test and a comparison of alternative CFA models. While your four-factor baseline fit is acceptable overall (e.g., CFI=0.970), the RMSEA=0.076 is at the upper bound of conventional guidelines, and Harman’s test alone does not rule out CMB. I recommend supplementing with a CFA common latent factor (or ULMC) and/or a marker-variable approach, and reporting the results of these sensitivity checks. In addition, please report AVE and CR for each construct, apply the Fornell–Larcker criterion and HTMT ratios, and include multicollinearity diagnostics (e.g., VIF) for paths that could be affected by overlap. This is especially important because several correlations are very high (for example, School Support with Teacher Self-Efficacy and with Teachers’ Curriculum Content Creation), which raises the possibility of insufficient discriminant validity and/or conceptual redundancy across scales—particularly alongside very high internal consistencies (α≈.92–.96).
With respect to model estimation and reporting, the paper would benefit from fuller transparency. Please (i) state the SEM software and estimator, (ii) report all key fit indices with confidence intervals (e.g., the 90% CI for RMSEA; SRMR consistently across models), and (iii) justify any correlated errors or alternative specifications you considered. For mediation, you nicely summarize total/direct/indirect effects with bootstrapping; please add the number of bootstrap samples, the CI type (e.g., BCa), random seed if relevant, and standardized indirect effects with brief effect-size interpretations. This will make the statistical evidence easier to evaluate and replicate.
Sampling and generalizability also deserve clearer treatment. You note a focus on economically developed coastal regions and, at the same time, report a high proportion of participants with rural household registration. That combination can certainly occur, but the context and its implications should be reconciled and discussed (including potential self-selection and non-response bias in an online sample). A short limitations paragraph on external validity would help readers interpret the scope conditions for your findings.
The qualitative component is promising but under-specified methodologically. Please describe the coding approach (e.g., thematic vs. grounded; open/axial/selective), number and training of coders, inter-rater reliability (e.g., Cohen’s κ), how saturation was determined, and translation/anonymization procedures where applicable. Including a few longer, well-chosen quotes that map directly onto model paths (preferably alongside a compact joint display) would strengthen integration between strands and move the qualitative section from illustration toward explanation.
Editorially, a few consistency edits will improve polish and readability: settle on a single term for the focal technology (e.g., “educational LLM agents” throughout); harmonize significance markers and footnotes across tables; and, if you retain the model figure, consider adding standardized coefficients and significance indicators to it. The very high Cronbach’s alpha values suggest there may be room to shorten scales or, at least, to discuss potential item redundancy. Finally, rather than “data available on request,” please consider depositing anonymized data, code (or SEM input), and the finalized instrument in a public repository; this will materially enhance transparency and impact.
In sum, this is a high-potential manuscript with a strong dataset and a timely question. With a sharper narrative focus, stronger evidence on measurement/CMB, more complete reporting of the SEM and bootstrapping details, and a clearer account of the qualitative methods, the paper can become a compelling contribution to the literature on LLM-supported pedagogy. My recommendation at this stage is major revision. Thank you for your careful work and for engaging a topic of real practical importance for teachers and schools.
Author Response
Respected Reviewer:
First and foremost, we would like to express our sincere gratitude to you for putting forward your valuable review comments on this study. Your comments not only accurately point out the areas that need improvement in the research but also provide a clear and professional direction for our revision work. In response to all your questions and suggestions, we have sorted them out one by one and carried out systematic responses and supplementary improvements, striving to make the logical system of the research more rigorous and the content more practical.
Problem 1:At the same time, my overall impression is that the paper tries to do too much at once. The quantitative model, the qualitative strand, and a broad theoretical arc each have merit, but they currently sit alongside one another more than they cohere into a single, tightly focused narrative. As a result, reader uptake becomes harder than necessary. A productive first step would be to decide which thread is primary. If the quantitative SEM is the backbone, consider integrating the interviews more surgically—as targeted, joint-display style illustrations that illuminate specific paths in the model—rather than as a parallel, free-standing analysis. If, instead, you wish to deepen the qualitative contribution, that strand may warrant its own paper, with this manuscript streamlined around the modeling results. The abstract and introduction already foreground the survey and chain mediation; letting that emphasis guide the structure would improve coherence.
Response:
Firstly, we extend our sincere gratitude to the review panel for their invaluable suggestions regarding the research. In response to your query concerning ‘A productive first step would be to decide which thread is primary’, our reply is as follows: the core contribution and principal logical framework of this study is underpinned by the chained mediation model derived from quantitative research.
Secondly, regarding your suggestion that ‘If the quantitative SEM is the backbone, consider integrating the interviews more surgically—as targeted, joint-display style illustrations that illuminate specific paths in the model—rather than as a parallel, free-standing analysis’, our reply is as follows: We have now combined the previously standalone quantitative chained mediation model diagram with the qualitative analysis diagram. This integration positions the qualitative data as a ‘concrete annotation’ to the quantitative model rather than an independent illustration. Please refer to Figure 2 within the article for specifics. (L 400-490)
Problem 2:On measurement and validity, the most pressing issue is common method bias (CMB) and discriminant validity. The current approach relies chiefly on Harman’s single factor test and a comparison of alternative CFA models. While your four-factor baseline fit is acceptable overall (e.g., CFI=0.970), the RMSEA=0.076 is at the upper bound of conventional guidelines, and Harman’s test alone does not rule out CMB. I recommend supplementing with a CFA common latent factor (or ULMC) and/or a marker-variable approach and reporting the results of these sensitivity checks. In addition, please report AVE and CR for each construct, apply the Fornell–Larcker criterion and HTMT ratios, and include multicollinearity diagnostics (e.g., VIF) for paths that could be affected by overlap. This is especially important because several correlations are very high (for example, School Support with Teacher Self-Efficacy and with Teachers’ Curriculum Content Creation), which raises the possibility of insufficient discriminant validity and/or conceptual redundancy across scales—particularly alongside very high internal consistencies (α≈.92–.96).
Response:
Firstly, regarding your suggestion that ‘I recommend supplementing with a CFA common latent factor (or ULMC) and/or a marker-variable approach, and reporting the results of these sensitivity checks’, we have added a CFA with an unmeasured latent method factor (ULMC) and updated the report accordingly (L 321-333). The specific modifications are as follows:
Because this study used multiple scales completed by the same participants, we examined the potential for common method bias. Following Zhou and Long’s (2004) recommendations, we used two methods to test for it: Harman’s single-factor test and controlling for an unmeasured latent method factor. When all items loaded onto a single common factor, the fit indices were poor: χ²/df=10.817, CFI=0.884, TLI=0.867, GFI=0.692, RMSEA=0.146, and RMR=0.038. In contrast, the four-factor model produced substantially better fit indices (χ²/df=3.688, RMSEA=0.076, NFI=0.959, CFI=0.970, GFI=0.904, RMR=0.018). After adding the method factor to the four-factor model, the fit indices were: χ²/df=3.077, CFI=0.977, TLI=0.972, GFI=0.921, RMSEA=0.067, and RMR=0.063. The changes were ΔCFI=0.007 and ΔTLI=0.008 (both <0.1), and ΔRMSEA=0.009 and ΔRMR=0.045 (both <0.05), indicating that there is no significant common method bias in the measurements.
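The comparison in the paragraph above can be re-derived with a few lines of arithmetic. The sketch below uses only the fit indices quoted in the response; the variable names are illustrative, and the cut-offs (0.1 for CFI/TLI, 0.05 for RMSEA/RMR) follow our reading of the thresholds stated there rather than any universal standard.

```python
# Fit indices as quoted in the response: four-factor model vs. the same
# model with an added unmeasured latent method factor.
four_factor = {"CFI": 0.970, "TLI": 0.964, "RMSEA": 0.076, "RMR": 0.018}
with_method = {"CFI": 0.977, "TLI": 0.972, "RMSEA": 0.067, "RMR": 0.063}

# Cut-offs as read from the response (an interpretation, not a fixed rule).
cutoff = {"CFI": 0.1, "TLI": 0.1, "RMSEA": 0.05, "RMR": 0.05}

# Magnitude of the change in each index after adding the method factor.
delta = {k: round(abs(with_method[k] - four_factor[k]), 3) for k in four_factor}

# True when every change stays below its cut-off.
all_below_cutoff = all(delta[k] < cutoff[k] for k in delta)

for name, value in delta.items():
    print(f"delta {name} = {value:.3f} (cut-off {cutoff[name]})")
print("all changes below cut-off:", all_below_cutoff)
```

The script reports magnitudes only, as the response does; the direction of each change can be read from the two dictionaries.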
Secondly, regarding your suggestion that ‘In addition, please report AVE and CR for each construct, apply the Fornell–Larcker criterion and HTMT ratios’, we have added this content; the changes are as follows (L 263-283):
The educational LLM application (CR=0.943, AVE=0.805), school support (CR=0.926, AVE=0.757), teacher self-efficacy (CR=0.955, AVE=0.809), and teacher curriculum design content creation (CR=0.957, AVE=0.847) all exhibited composite reliability (CR) ≥ 0.7 and average variance extracted (AVE) ≥ 0.5. This indicates that the items of each construct converge well and capture its core conceptual content. In addition, for every construct the square root of the AVE exceeded its correlations with the other constructs, thereby meeting the Fornell–Larcker criterion. The HTMT values for educational LLM application were 0.638 with school support, 0.731 with teacher self-efficacy, and 0.71 with teacher curriculum design content creation; school support had HTMT values of 0.698 and 0.619 with teacher self-efficacy and teacher curriculum design content creation, respectively; and teacher self-efficacy had an HTMT of 0.694 with teacher curriculum design content creation. All heterotrait-monotrait ratios (HTMT) were <0.85, confirming clear construct boundaries and the absence of dimensional overlap.
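As a rough illustration of the quantities in this paragraph, the sketch below computes the square roots of the reported AVEs (the values a Fornell–Larcker check compares against the inter-construct correlations, which are not reproduced in this response) and checks the reported HTMT values against the 0.85 threshold. The helper `cr_and_ave` and the loadings passed to it are hypothetical; the item-level loadings are not given here.

```python
from math import sqrt

# AVE and HTMT values as reported in the response (abbreviated construct names).
ave = {"LLM": 0.805, "SS": 0.757, "TSE": 0.809, "CCC": 0.847}
htmt = {
    ("LLM", "SS"): 0.638, ("LLM", "TSE"): 0.731, ("LLM", "CCC"): 0.71,
    ("SS", "TSE"): 0.698, ("SS", "CCC"): 0.619, ("TSE", "CCC"): 0.694,
}

# Fornell-Larcker compares sqrt(AVE) of each construct with its correlations.
sqrt_ave = {k: round(sqrt(v), 3) for k, v in ave.items()}

# HTMT check against the 0.85 threshold used in the response.
htmt_ok = all(v < 0.85 for v in htmt.values())

def cr_and_ave(loadings):
    """Composite reliability and AVE from standardized loadings of one construct."""
    total = sum(loadings)
    squared = sum(l * l for l in loadings)
    error = sum(1 - l * l for l in loadings)  # error variance per item
    return total ** 2 / (total ** 2 + error), squared / len(loadings)

# Hypothetical loadings, for illustration only.
demo_cr, demo_ave = cr_and_ave([0.9, 0.9, 0.9])

print(sqrt_ave, htmt_ok, round(demo_cr, 3), round(demo_ave, 3))
```

Each construct's correlations with the others would need to fall below its entry in `sqrt_ave` for the criterion to hold.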
Thirdly, regarding your suggestion that ‘include multicollinearity diagnostics (e.g., VIF) for paths that could be affected by overlap’, we have added this section; the changes are as follows (L 350-359):
However, high correlations among variables can give rise to multicollinearity. To check this, we regressed teacher curriculum design content creation (dependent variable) on educational LLM application, school support, and teacher self-efficacy (independent variables). The variance inflation factors (VIF) for educational LLM application (VIF=5.626), school support (VIF=3.205), and teacher self-efficacy (VIF=7.04) were all <10. This suggests that, despite high inter-variable correlations, the regression model shows no severe multicollinearity, and the estimated effects of each variable on the dependent variable are reasonably reliable.
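To make the VIF values easier to interpret, the sketch below inverts the definition VIF = 1 / (1 − R²), where R² is the share of a predictor's variance explained by the other predictors. It uses only the three VIFs quoted above; the stricter cut-off of 5 is a commonly cited rule of thumb added here for comparison and is not taken from the manuscript.

```python
# VIF values as reported in the response.
vif = {
    "educational LLM application": 5.626,
    "school support": 3.205,
    "teacher self-efficacy": 7.04,
}

# R^2 of each predictor regressed on the remaining predictors: R^2 = 1 - 1/VIF.
r_squared = {name: round(1 - 1 / value, 3) for name, value in vif.items()}

# Predictors exceeding a stricter rule-of-thumb cut-off of 5 (an assumption).
above_five = [name for name, value in vif.items() if value > 5]

for name in vif:
    print(f"{name}: VIF={vif[name]}, implied R^2={r_squared[name]}")
print("VIF > 5:", above_five)
```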
Problem 3:With respect to model estimation and reporting, the paper would benefit from fuller transparency. Please (i) state the SEM software and estimator, (ii) report all key fit indices with confidence intervals (e.g., the 90% CI for RMSEA; SRMR consistently across models), and (iii) justify any correlated errors or alternative specifications you considered. For mediation, you nicely summarize total/direct/indirect effects with bootstrapping; please add the number of bootstrap samples, the CI type (e.g., BCa), random seed if relevant, and standardized indirect effects with brief effect-size interpretations. This will make statistical evidence easier to evaluate and replicate.
Response:
Firstly, regarding your suggestion that ‘state the SEM software and estimator’, we have added this content; the changes are as follows (L 313-314):
We employed SPSS AMOS 26.0 and the Maximum Likelihood (ML) estimator to conduct structural equation modelling (SEM) analysis in order to test our proposed hypotheses.
Secondly, regarding your suggestion that ‘report all key fit indices with confidence intervals’, we have added this content; the changes are as follows (L 311-312):
| Model | χ2 | df | χ2/df | CFI | TLI | NFI | GFI | RMSEA | RMSEA 90% CI | SRMR | Δχ2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline Model (M1) | 416.794 | 113 | 3.688 | 0.970 | 0.964 | 0.959 | 0.904 | 0.076 | [0.068, 0.084] | 0.0234 | |
| Model 1 | 868.401 | 116 | 7.486 | 0.925 | 0.912 | 0.915 | 0.783 | 0.118 | [0.111, 0.126] | 0.0409 | 359.606 |
| Model 2 | 570.135 | 116 | 4.915 | 0.955 | 0.947 | 0.944 | 0.866 | 0.092 | [0.084, 0.100] | 0.0288 | 153.341 |
| Model 3 | 763.599 | 116 | 6.582 | 0.936 | 0.925 | 0.925 | 0.807 | 0.110 | [0.102, 0.117] | 0.0349 | 346.805 |
| Model 4 | 961.679 | 118 | 8.150 | 0.916 | 0.903 | 0.906 | 0.768 | 0.124 | [0.117, 0.132] | 0.0418 | 544.885 |
| Model 5 | 1287.207 | 119 | 10.817 | 0.884 | 0.867 | 0.874 | 0.692 | 0.146 | [0.138, 0.153] | 0.0466 | 870.413 |
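The derived columns of the fit table (χ2/df and Δχ2 relative to the baseline model) can be recomputed from the χ2 and df entries. The sketch below assumes Δχ2 is each model's χ2 minus the baseline χ2; the dictionary layout and names are illustrative.

```python
# (chi-square, df) pairs as listed in the fit table.
models = {
    "Baseline (M1)": (416.794, 113),
    "Model 1": (868.401, 116),
    "Model 2": (570.135, 116),
    "Model 3": (763.599, 116),
    "Model 4": (961.679, 118),
    "Model 5": (1287.207, 119),
}

base_chi2, base_df = models["Baseline (M1)"]

# chi-square / df for every model.
ratio = {name: round(chi2 / df, 3) for name, (chi2, df) in models.items()}

# Difference from the baseline model (assumed definition of the last column).
delta_chi2 = {
    name: round(chi2 - base_chi2, 3)
    for name, (chi2, df) in models.items()
    if name != "Baseline (M1)"
}
delta_df = {
    name: df - base_df
    for name, (chi2, df) in models.items()
    if name != "Baseline (M1)"
}

for name in delta_chi2:
    print(f"{name}: chi2/df={ratio[name]}, delta chi2={delta_chi2[name]}, delta df={delta_df[name]}")
```

Under this assumption the recomputed differences match the tabulated Δχ2 for Models 2 to 5. For Model 1 the same subtraction gives 451.607 rather than the tabulated 359.606, so that entry may be worth re-checking against the software output.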
Thirdly, regarding your suggestion that ‘justify any correlated errors or alternative specifications you considered’, we have addressed this in the limitations section; the specific changes are as follows (L 663-670):
This study did not incorporate certain control variables (such as demographic variables), potentially introducing minor omitted-variable bias. However, adding control variables increases model complexity, and pre-testing indicated that these variables exert no significant influence on the core pathways. Consequently, establishing the baseline model remains the priority. Future research may employ multi-group SEM to validate the cross-sample robustness of the baseline model, or construct cross-lagged models using longitudinal data, to further strengthen causal inference.
Fourthly, regarding your suggestion that ‘please add the number of bootstrap samples, the CI type (e.g., BCa), random seed if relevant, and standardized indirect effects with brief effect-size interpretations’, we have added this information and replaced the original chain mediation results with standardized chain mediation results. For the specific values, see Table 5; the modifications are as follows (L 373-382):
In the chain mediation analysis, standardized data were used to ensure the accuracy and robustness of the mediation tests. The tests were run with the PROCESS v4.1 macro using the following settings: a 95% confidence level with BCa (bias-corrected and accelerated) bootstrap confidence intervals, to better accommodate potentially non-normal data; and a random seed of 20241211, to make the procedure replicable. These settings provide a reliable statistical basis for judging the significance of the chain mediation effects.
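To illustrate why fixing the seed makes a bootstrap interval replicable, here is a minimal sketch under simplifying assumptions. It is not the PROCESS procedure: it uses a single mediator (X → M → Y), a percentile interval instead of BCa, 1000 resamples (a number chosen for illustration, since the response does not state it), and synthetic data. Only the seed value 20241211 is taken from the response.

```python
import random

def _mean(v):
    return sum(v) / len(v)

def slope(x, y):
    """OLS slope of y regressed on the single predictor x."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def partial_slope(m, x, y):
    """Coefficient of m in the two-predictor regression y ~ m + x."""
    mm, mx, my = _mean(m), _mean(x), _mean(y)
    s_mm = sum((a - mm) ** 2 for a in m)
    s_xx = sum((a - mx) ** 2 for a in x)
    s_mx = sum((a - mm) * (b - mx) for a, b in zip(m, x))
    s_my = sum((a - mm) * (b - my) for a, b in zip(m, y))
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (s_my * s_xx - s_xy * s_mx) / (s_mm * s_xx - s_mx ** 2)

def indirect_effect(x, m, y):
    """Product of the a path (x -> m) and the b path (m -> y, controlling x)."""
    return slope(x, m) * partial_slope(m, x, y)

def bootstrap_ci(x, m, y, n_boot=1000, seed=20241211):
    """95% percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)  # fixed seed -> identical resamples on every run
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        estimates.append(indirect_effect(
            [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]

# Synthetic data for illustration only (true indirect effect 0.5 * 0.4 = 0.2).
data_rng = random.Random(0)
x = [data_rng.gauss(0, 1) for _ in range(300)]
m = [0.5 * xi + data_rng.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.4 * mi + data_rng.gauss(0, 1) for xi, mi in zip(x, m)]

ci_first = bootstrap_ci(x, m, y)
ci_second = bootstrap_ci(x, m, y)
print(ci_first, ci_first == ci_second)
```

Running the function twice with the same seed returns the same interval, which is the property a reported seed is meant to guarantee; a chain model would add a second mediator and a third regression.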
Problem 4:Sampling and generalizability also deserve clearer treatment. You note a focus on economically developed coastal regions and, at the same time, report a high proportion of participants with rural household registration. That combination can certainly occur, but the context and its implications should be reconciled and discussed (including potential self-selection and non-response bias in an online sample). A short limitations paragraph on external validity would help readers interpret the scope conditions for your findings.
Response:
Firstly, regarding your suggestion that ‘You note a focus on economically developed coastal regions and, at the same time, report a high proportion of participants with rural household registration’, we provide explanations in the Participants section (L 205-215) and the Limitations section (L 670-695), as follows:
Participants section (L 205-215): It should be clarified that the term ‘coastal developed regions’ as used in this study refers specifically to coastal provinces such as Zhejiang, Fujian, and Jiangsu, which have relatively high levels of economic development. Although these provinces rank among the nation's leaders in overall socio-economic development, they still contain extensive rural areas and show a pronounced urban-rural development gradient. During data collection, questionnaires were distributed through teacher training programmes. These training sessions included not only urban schoolteachers but also a substantial number of frontline teachers from rural areas within these provinces.
Limitation section (L 670-695): Thirdly, this study has limitations in sample coverage and disciplinary coverage, which to some extent constrain the generalizability of its conclusions. First, the geographical distribution of the sample is markedly imbalanced, with data predominantly drawn from economically developed coastal regions. The proportion of respondents from western and less developed areas is extremely low, making it difficult to reflect how the research variables operate across diverse educational settings or to reveal the potential impact of regional developmental disparities on the core research questions. Second, the selection of subject areas is narrow. The current sample concentrates exclusively on science disciplines and excludes subjects such as humanities and languages. Given the significant differences in teaching methodologies, learning contexts, and teacher-student interaction across disciplines, the findings may not transfer to the broader educational landscape. Future research may therefore be expanded in two respects. First, the geographical coverage of the sample should be broadened, with particular emphasis on supplementing survey data from western and less developed regions. Comparing samples across multiple regions and developmental levels would allow the cross-regional applicability of this study's conclusions to be tested and the interaction between regional development factors and the core research variables to be explored. Second, humanities and language disciplines should be incorporated into the research scope. Comparing the fit of the research model across academic fields would enhance the disciplinary generality of the theoretical framework, thereby providing more comprehensive empirical support for constructing an education intervention system that covers all disciplines.
Problem 5:The qualitative component is promising but under-specified methodologically. Please describe the coding approach (e.g., thematic vs. grounded; open/axial/selective), number and training of coders, inter-rater reliability (e.g., Cohen’s κ), how saturation was determined, and translation/anonymization procedures where applicable. Including a few longer, well-chosen quotes that map directly onto model paths (preferably alongside a compact joint display) would strengthen integration between strands and move the qualitative section from illustration toward explanation.
Response:
Firstly, regarding your suggestion that ‘Please describe the coding approach’, we have added this content; the changes are as follows (L 386-400):
In the qualitative analysis section, this study employs content analysis, a semi-quantitative research method that enables repeatable, valid inferences from texts (or other meaning-bearing entities such as videos) to their contextual usage by describing explicit communication content objectively, systematically, and quantitatively. Using this approach, the study codes and analyzes materials related to teachers' use of educational LLMs in course content creation. The process strictly adhered to the six stages of content analysis methodology: establishing research objectives, defining the research population and selecting units of analysis, designing the analytical dimension system, sampling and quantifying the materials, and conducting evaluative recording and analytical inference. Throughout, the standardized procedures of content analysis were applied, including dual coding, consistency testing and reporting, and resolving discrepancies through discussion.
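The consistency testing mentioned for dual coding is often summarized with Cohen's κ, the statistic the reviewer names. The sketch below shows the calculation on hypothetical codes from two coders; the category labels and the ten coded segments are invented for illustration and do not come from the study's data.

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who labelled the same segments."""
    n = len(codes_a)
    # Observed agreement: share of segments given the same code by both coders.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Expected agreement by chance, from each coder's marginal distribution.
    count_a, count_b = Counter(codes_a), Counter(codes_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes for ten interview segments.
coder_a = ["support"] * 4 + ["efficacy"] * 3 + ["direct"] * 3
coder_b = ["support"] * 3 + ["efficacy"] * 4 + ["direct"] * 2 + ["support"]

kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))  # 0.697
```

Here the coders agree on 8 of 10 segments (observed agreement 0.80) against a chance agreement of 0.34, giving κ ≈ 0.70.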
Secondly, regarding your suggestion that ‘Including a few longer, well-chosen quotes that map directly onto model paths’, we have integrated the previously separate quantitative chain mediation model diagram with the qualitative analysis diagram, so that the qualitative data serve as a "concrete annotation" of the quantitative model, and we have provided supplementary explanations. For details, please refer to L 404-457:
Direct effect: The direct effect was 0.6850 (95% CI [0.5944, 0.7757]), accounting for 76.44% of the total effect. This indicates that Educational LLM Agents Use directly predicts curriculum content creation, providing strong support for Hypothesis H1. As Respondent 19 stated: "Our school secured a pilot program for an educational large model. Through such models, we efficiently obtained substantial course design content creation materials, enriching our teaching content diversity." The way teachers leverage educational LLM agents to enhance curriculum content creation fully validates Hypothesis 1.
Indirect effect via School Support: The indirect path “Educational LLM Agents Use → School Support → Curriculum Content Creation” was not significant (95% CI [–0.0261, 0.1349]), as the confidence interval included zero. Hypothesis H2 was therefore not supported. In the interviews, Respondent 7 reflected: "When using AI agents to assist in designing course teaching content, the Spark Teacher Assistant AI agent recommended during school training proved unhelpful. I was designing a junior secondary school Rainbow Fountain experiment plan at the time, and the content provided by the Spark Teacher Assistant recommended by the school was not what I needed. It was less useful than the “Physics, Chemistry and Biology Experiment Inquiry Design” personalized AI platform recommended to me by my apprentice. This platform could provide the experiment's purpose, a list of experimental materials, experimental steps, expected results, experimental conclusions, and safety precautions. Notably, the materials list was presented in tabular format, categorized into containers, reagents, tools, and auxiliary materials, with specific items and safety warnings included." This demonstrates that resources provided by schools, if they fail to precisely match teachers' needs, cannot effectively drive the creation of curriculum content. This further explains why the path in Hypothesis 2 was not significant.
Indirect effect via Teacher Self-Efficacy: The indirect path “Educational LLM Agents Use → Teacher Self-Efficacy → Curriculum Content Creation” was significant, with an effect size of 0.1192 (95% CI [0.0372, 0.2111]), accounting for 13.30% of the total effect. This means that Educational LLM Agents Use enhances Teacher Self-Efficacy, which in turn facilitates curriculum content creation. As Respondent 14 noted: "I use the teaching materials provided by Kimi and find their quality increasingly high, with reference sources also provided. The optimization following follow-up inquiries has also improved significantly. I frequently share exemplary application cases within our teaching research group chat, and later the year group even invited me to share my experience, which gave me a real sense of pride. Moreover, during the teaching research experience sharing session, there was deeper case discussion centered around the content design of my elective course, which actually proved very helpful for further developing that elective." Teachers' use of educational LLM agents enhances their sense of self-efficacy, indirectly driving curriculum content creation, thereby fully validating Hypothesis 3.
Chain mediation effect: The path “Educational LLM Agents Use → School Support → Teacher Self-Efficacy → Curriculum Content Creation” produced a significant chain mediation effect (effect size = 0.0434; 95% CI [0.0140, 0.0756]), accounting for 4.84% of the total effect. While smaller in magnitude, this pathway shows that School Support indirectly contributes to curriculum content creation by strengthening Teacher Self-Efficacy. As Respondent 19 noted, ‘The school secured a pilot for a large-scale educational model,’ and Respondent 14 remarked, ‘Using Kimi yielded valuable application cases, which we later shared as best practices – a source of considerable pride.’ The use of educational LLM agents drives institutional support that enhances teacher self-efficacy, ultimately raising curriculum content creation capabilities. This fully validates Hypothesis 4.
The total indirect effect was 0.2111 (95% CI [0.0992, 0.3317]), accounting for 23.56% of the total effect. These results confirm Educational LLM Agents Use not only directly shapes teachers’ curriculum content creation but also exerts indirect effects through School Support and Teacher Self-Efficacy (Table 5).
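The percentage shares reported for each path follow from dividing each effect by the total effect, taken here as the direct effect plus the total indirect effect. The sketch below reproduces that arithmetic from the reported coefficients; the implied size of the School Support path is derived on the assumption that the total indirect effect is the sum of the three specific indirect paths, and that value is not reported in the text above.

```python
# Effects as reported in the response.
direct = 0.6850
total_indirect = 0.2111
via_self_efficacy = 0.1192
chain = 0.0434

# Total effect = direct + total indirect.
total = round(direct + total_indirect, 4)

def share(effect):
    """Percentage of the total effect, rounded to two decimals."""
    return round(100 * effect / total, 2)

shares = {
    "direct": share(direct),
    "via teacher self-efficacy": share(via_self_efficacy),
    "chain": share(chain),
    "total indirect": share(total_indirect),
}

# Implied School Support path, assuming the three specific paths sum to the total.
via_school_support = round(total_indirect - via_self_efficacy - chain, 4)

print(total, shares, via_school_support)
```

The computed shares (76.44%, 13.30%, 4.84%, and 23.56%) agree with the figures in the text, and the implied School Support path of 0.0485 lies inside its reported confidence interval.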
Problem 6:Editorially, a few consistency edits will improve polish and readability: settle on a single term for the focal technology (e.g., “educational LLM agents” throughout); harmonize significance markers and footnotes across tables; and, if you retain the model figure, consider adding standardized coefficients and significance indicators to it. The very high Cronbach’s alpha values suggest there may be room to shorten scales or, at least, to discuss potential item redundancy. Finally, rather than “data available on request,” please consider depositing anonymized data, code (or SEM input), and the finalized instrument in a public repository; this will materially enhance transparency and impact.
Response:
Firstly, regarding your suggestion that ‘settle on a single term for the focal technology (e.g., “educational LLM agents” throughout); harmonize significance markers and footnotes across tables’, we have reviewed the entire text and ensured the consistent use of the term "educational LLM agents" throughout. Furthermore, we have aligned the significance markers and footnotes between Tables 2 and 3.
Secondly, regarding your suggestion that ‘if you retain the model figure, consider adding standardized coefficients and significance indicators to it’, we incorporated your earlier suggestions by integrating the chain mediation model with the interviews, as shown in Figure 2.
Thirdly, regarding your suggestion that ‘The very high Cronbach’s alpha values suggest there may be room to shorten scales or, at least, to discuss potential item redundancy’, we have supplemented the reliability and validity evidence with the following: composite reliability (CR) ≥ 0.7 for all constructs, average variance extracted (AVE) ≥ 0.5 for all constructs, the Fornell–Larcker criterion, and the heterotrait-monotrait ratio (HTMT); see L 263-283 for details. We have also reflected on this in the limitations, as detailed below (L 697-703):
The scales employed in this study exhibit high Cronbach's alpha values, potentially indicating item redundancy. However, owing to constraints on sample accessibility and the timing of the research, it was not feasible to revise the scales. Future research will therefore incorporate semantic differentiation screening and reverse-scored items during the scale design phase. This approach aims to broaden the range of what the items express while preserving the core of each construct, thereby balancing internal consistency with the richness of construct representation.
Fourthly, regarding your suggestion that ‘Finally, rather than “data available on request,” please consider depositing anonymized data, code (or SEM input), and the finalized instrument in a public repository’, our response is as follows:
We sincerely appreciate the reviewer’s insightful suggestion to deposit anonymized data, code (and SEM input files), and the finalized research instrument in a public repository, which would indeed significantly enhance the transparency and influence of the study. We fully endorse the principle of open sharing and are inclined to make the relevant data publicly available. However, since one co-author is currently conducting in-depth follow-up research based on this dataset, they suggest not releasing it to the public for the time being and will take charge of uploading the data after their own paper is completed. If you still recommend uploading the materials and opening a feedback channel, we will discuss this further to determine the final plan.
We would like to extend our sincere thanks to you again! Your rigorous and detailed review comments have not only helped us identify the omissions in the research but also promoted the dual improvement of the academic quality and practical guiding significance of the paper. We will strictly follow your comments to complete the final revision of the paper and will continue to absorb your suggestions in future research to further deepen the relevant issues in the field of educational digitalization.
Reviewer 2 Report
Comments and Suggestions for Authors
In general terms, the manuscript identifies a relevant gap, but it is still formulated in too broad a manner. You point out that further research is needed on how LLM-based educational agents influence teachers' curriculum design practices and how school support and self-efficacy intervene as mediators. This is well-intentioned, but as a reviewer, I would suggest making it much more specific: it is necessary to clearly show what previous studies have done exactly (e.g., focusing on student learning, intention to use, or teacher attitudes) and what they have not addressed (chain mediation model, focus on curriculum content creation, joint role of school support and self-efficacy).
The theoretical novelty can also be reinforced. You invoke social cognitive theory, but the gap is not formulated as a clear theoretical problem. It would be helpful to explicitly state that there is no empirical evidence modeling, in the context of educational LLMs, the sequence “environment (institutional support) → cognition (teacher self-efficacy) → behavior (innovation and curriculum content creation)”. This positions your contribution not merely as “another case involving AI,” but as a test and refinement of a classic framework of human behavior.
Finally, the study's context and design are an advantage that is not currently being exploited as part of the gap. You could emphasize that there are almost no studies of secondary school teachers in systems heavily driven by AI policies (such as the Chinese one) and that most previous work is exclusively quantitative. Your mixed-methods approach allows you to also cover the level of subjective and organizational meaning, and this should be explicitly presented as part of the gap that the study begins to fill.
Author Response
Respected Reviewer:
First and foremost, we would like to express our sincere gratitude to you for putting forward your valuable review comments on this study. Your comments not only accurately point out the areas that need improvement in the research but also provide a clear and professional direction for our revision work. In response to all your questions and suggestions, we have sorted them out one by one and carried out systematic responses and supplementary improvements, striving to make the logical system of the research more rigorous and the content more practical.
Problem 1: In general terms, the manuscript identifies a relevant gap, but it is still formulated in too broad a manner. You point out that further research is needed on how LLM-based educational agents influence teachers' curriculum design practices and how school support and self-efficacy intervene as mediators. This is well-intentioned, but as a reviewer, I would suggest making it much more specific: it is necessary to clearly show what previous studies have done exactly (e.g., focusing on student learning, intention to use, or teacher attitudes) and what they have not addressed (chain mediation model, focus on curriculum content creation, joint role of school support and self-efficacy).
Response:
Regarding your suggestion that ‘I would suggest making it much more specific: it is necessary to clearly show what previous studies have done exactly and what they have not addressed’, we have incorporated two recent studies focusing on student learning efficacy, and further elaborated on unresolved issues. The specific revisions are as follows (L 58-72):
Despite these insights, some significant gaps remain. First, most research has concentrated on the impact of educational LLM agents on student learning (Sharma et al., 2025; Lim et al., 2025), leaving the mechanisms that shape teachers' curriculum design practices underexplored. Second, the interaction between school support and teachers' self-efficacy as mediators in the relationship between educational LLM agent use and content creation is not well understood, particularly regarding the potential for a chain mediation pathway. These gaps hinder our understanding of how educational LLM agents can effectively empower teachers' curriculum innovation. Finally, current research on secondary school teachers driven by strong AI education policies remains scarce, with existing studies predominantly quantitative in nature (Ferikoğlu & Akgün, 2022), rendering it difficult to address deeper issues concerning teachers' subjective perceptions and organizational dynamics. This study employs a mixed-methods approach, utilizing quantitative surveys to capture macro-level characteristics while employing qualitative interviews to explore micro-level experiences and organizational logic. This targeted methodology addresses existing research gaps, highlighting dual innovation in both contextual relevance and methodological design.
Problem 2: The theoretical novelty can also be reinforced. You invoke social cognitive theory, but the gap is not formulated as a clear theoretical problem. It would be helpful to explicitly state that there is no empirical evidence modeling, in the context of educational LLMs, the sequence “environment (institutional support) → cognition (teacher self-efficacy) → behavior (innovation and curriculum content creation)”. This positions your contribution not merely as “another case involving AI,” but as a test and refinement of a classic framework of human behavior.
Response:
Regarding your suggestion that ‘The theoretical novelty can also be reinforced’, we have added an explanatory passage. The specific changes are as follows (L 713-743):
Moreover, Bandura's social cognitive theory initially focused on interpersonal interaction scenarios, with its core triadic interaction determinism (individual factors, behavior, environment) long employed to explain cognitive and behavioral linkages among human groups (Bandura, 1986). In recent years, with the proliferation of artificial intelligence technologies, some scholars have extended the theory's application to human-computer interaction domains (Guan et al., 2025). However, existing research has scarcely addressed mutual cognitive facilitation between humans and personalized intelligent agents. Consequently, this study transcends social cognitive theory's traditional interpersonal interaction framework while expanding the boundaries of human-computer interaction research. By incorporating ‘cognitive interaction between humans and personalized educational agents’ into theoretical analysis, it provides novel empirical support for the theoretical evolution of social cognitive theory in the era of intelligent education.
Problem 3: Finally, the study's context and design are an advantage that is not currently being exploited as part of the gap. You could emphasize that there are almost no studies of secondary school teachers in systems heavily driven by AI policies (such as the Chinese one) and that most previous work is exclusively quantitative. Your mixed-methods approach allows you to also cover the level of subjective and organizational meaning, and this should be explicitly presented as part of the gap that the study begins to fill.
Response:
Regarding your suggestion that ‘the study's context and design are an advantage that is not currently being exploited as part of the gap’, we have made the following additions and changes (L 64-72):
Finally, current research on secondary school teachers driven by strong AI education policies remains scarce, with existing studies predominantly quantitative in nature (Ferikoğlu & Akgün, 2022), rendering it difficult to address deeper issues concerning teachers' subjective perceptions and organizational dynamics. This study employs a mixed-methods approach, utilizing quantitative surveys to capture macro-level characteristics while employing qualitative interviews to explore micro-level experiences and organizational logic. This targeted methodology addresses existing research gaps, highlighting dual innovation in both contextual relevance and methodological design.
We would like to extend our sincere thanks to you again! Your rigorous and detailed review comments have not only helped us identify the omissions in the research but also promoted the dual improvement of the academic quality and practical guiding significance of the paper. We will strictly follow your comments to complete the final revision of the paper and will continue to absorb your suggestions in future research to further deepen the relevant issues in the field of educational digitalization.
Reviewer 3 Report
Comments and Suggestions for Authors
Avoid yes/no RQ: they are not very nuanced.
l. 89: cite those existing studies
l. 136-7: good point
Figures are useful
l. 187: what was your population who received the questionnaire: was it developed coastal regions of China? If so, state that in this line. And what grade range - and subject matter-- did you disseminate the questionnaire to?
l. 195: you stated you focused on developed regions, but you say the majority were rural. Please explain.
l. 211: good that you based your questionnaires on validated tools.
It would be useful to see the questionnaire or at least the main question topics.
Good statistical methods
Appropriate limitations
Proofread citations for consistency (especially capitalization)
Author Response
Respected Reviewer:
First and foremost, we would like to express our sincere gratitude to you for putting forward your valuable review comments on this study. Your comments not only accurately point out the areas that need improvement in the research but also provide a clear and professional direction for our revision work. In response to all your questions and suggestions, we have sorted them out one by one and carried out systematic responses and supplementary improvements, striving to make the logical system of the research more rigorous and the content more practical.
Problem 1: Avoid yes/no RQ: they are not very nuanced.
Response:
Regarding your suggestion that ‘Avoid yes/no RQ’, we have made the following changes (L 77-86):
1. To what extent and in what ways does the application of educational LLM agents exert a significant influence on the quality of teachers’ curriculum content creation?
2. What is the mediating role and effect size of school support in the association between educational LLM agent utilization and teachers’ curriculum content creation quality?
3. To what degree does teachers’ self-efficacy mediate the relationship between the use of educational LLM agents and the quality of their curriculum content creation?
4. How does school support indirectly boost the quality of teachers’ curriculum content creation by improving their self-efficacy, and what is the magnitude of this chain mediation effect?
Problem 2: l. 89: cite those existing studies
Response:
Regarding your suggestion that ‘cite those existing studies’, we have made the following changes (L 94-97):
Curriculum content creation refers to a series of organized activities guided by teaching objectives, centered around online teaching resources, knowledge content systems, and the pre-class, in-class, and post-class teaching phases (Liang & Lu, 2025).
Problem 3: l. 187: what was your population who received the questionnaire: was it developed coastal regions of China? If so, state that in this line. And what grade range - and subject matter-- did you disseminate the questionnaire to?
l. 195: you stated you focused on developed regions, but you say the majority were rural. Please explain.
Response:
① Regarding your suggestion that ‘what was your population who received the questionnaire’, we respond as follows (L 200-204):
During the formal phase, we distributed the questionnaire online via Wen Juan Xing (Questionnaire Star). We collected a total of 520 questionnaires. After excluding 56 invalid ones (including those with missing data, blank items, or duplicate options), we finally obtained 464 valid questionnaires, with an effective recovery rate of 89.2%.
② Regarding your suggestion that ‘what grade range - and subject matter-- did you disseminate the questionnaire to’, we respond as follows (L 214-223):
By teaching grade, most participants taught Grade 7 (n = 215, 46.4%) and Grade 8 (n = 147, 31.7%); this distribution may relate to lower secondary teachers’ higher willingness to participate in the survey. By teaching subject, science (n = 188, 40.5%) and mathematics (n = 163, 35.1%) teachers were the most common, while information technology teachers were relatively few. Detailed demographic information is provided in Table 1.
③ Regarding your suggestion that ‘was it developed coastal regions of China’ and ‘you stated you focused on developed regions, but you say the majority were rural. Please explain’, we respond as follows (L 206-215):
It should be clarified that the term ‘coastal developed regions’ as defined in this study specifically refers to coastal provinces such as Zhejiang, Fujian, and Jiangsu, which exhibit relatively high levels of economic development. Although these provinces rank among the nation's leaders in overall socio-economic advancement, they still contain extensive rural areas, demonstrating a pronounced urban-rural development gradient. During the data collection phase, questionnaires were distributed through teacher training programmes. These training sessions encompassed not only urban school teachers but also a substantial number of frontline educators from rural areas within the developed provinces.
In addition, we reflect on the limitations (L 673-678):
The geographical distribution of the sample is markedly imbalanced, with research data predominantly sourced from economically developed coastal regions. The proportion of survey samples from western and less developed areas is extremely low, making it difficult to fully reflect the operational mechanisms of research variables across diverse educational settings. It also fails to comprehensively reveal the potential impact of regional developmental disparities on the core research questions.
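As a quick check of the response-rate arithmetic quoted in the response above (520 questionnaires collected, 56 excluded), the following minimal sketch recomputes the valid count and the effective recovery rate. The function name `effective_recovery_rate` is hypothetical and used only for illustration; it is not part of the authors' materials.

```python
def effective_recovery_rate(collected: int, invalid: int) -> tuple[int, float]:
    """Return (number of valid questionnaires, valid share of collected ones in percent)."""
    if collected <= 0 or not 0 <= invalid <= collected:
        raise ValueError("collected must be positive and invalid must lie in [0, collected]")
    valid = collected - invalid
    return valid, 100.0 * valid / collected


# Figures taken from the response above.
valid, rate = effective_recovery_rate(collected=520, invalid=56)
print(f"valid = {valid}, effective recovery rate = {rate:.1f}%")
```

The same helper can be reused to sanity-check the subgroup percentages reported in Table 1 against the valid sample size.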
Problem 4: l. 211: good that you based your questionnaires on validated tools.
It would be useful to see the questionnaire or at least the main question topics.
Response:
Regarding your suggestion that ‘It would be useful to see the questionnaire or at least the main question topics’, we have included four scales and their specific questions in the appendix (L 944-979):
Appendix A
Survey Instruments in the Study
Questionnaire survey of educational LLM agents applied to teachers' professional development
Dear teacher:
Hello! In order to gain a deeper understanding of your cognition, usage status and related attitudes towards agent tools, and explore the application value and potential of agents in education and teaching scenarios, we are conducting this questionnaire survey. This survey is anonymous, and all data is only used for academic research and is strictly confidential. It is estimated that it will take you 5-7 minutes to fill out the questionnaire. Please answer truthfully according to your actual situation.
Thank you for your support and cooperation in your busy schedule!
1. Basic information
Your gender is: □ Male □ Female
Your domicile:
□ Rural □ townships □ counties □ prefecture-level cities □ provincial capitals □ municipalities directly under the Central Government
The middle school grades you teach are:
□ 1st □ 2nd □ 3rd grade □ Cross-grade teaching (please specify: ______)
The subjects you teach are:
□ Chinese □ Mathematics □ English □ Physics □ Chemistry □ Biology □ History □ Geography □ Politics (Ethics & Rule of Law) □ Music □ Fine Arts □ Physical Education □ Information Technology □ Others (please specify: ______)
2. Agent tools and functional cognition
Please hit "√" after the option that matches your situation, you can select multiple options:
I understand the tool platform of the agent: □ Domestic: Baidu Wenxin Agent □ Domestic: iFLYTEK Spark Platform □ Domestic: Tencent Hunyuan Platform □ Others (please specify: ______)
I understand the functions of the following agents: □ Wensheng Video Function □ Wensheng Document Function □ Answering Function □ Wensheng Diagram Function □ Audio Clip Function □ Code Generation Function □ Others (Please Specify: ______)
3. Attitude towards the use and operation of agent tools
The meaning of each alternative answer is as follows (same below):
5 Very true: means that this statement holds true for you in almost all cases;
4 Conformity: means that under normal circumstances, this statement is consistent with you;
3 Uncertainty: means that in half of the cases, this statement is consistent with you;
2 Non-conformity: means that under normal circumstances, this statement is not in accordance with you;
1 Very inconsistent: means that in almost all cases this statement is inconsistent with you
| Scale problem | Degree of conformity | | | | |
| --- | --- | --- | --- | --- | --- |
| 1. I will use an educational LLM agent for personalized learning. | 1 | 2 | 3 | 4 | 5 |
| 2. Educational LLM agents will enhance my teaching practice and interaction with digital resources. | 1 | 2 | 3 | 4 | 5 |
| 3. When using an educational LLM agent, I feel that technology can help me alleviate some of the teaching tasks that I don't think are ideal. | 1 | 2 | 3 | 4 | 5 |
| 4. The Educational LLM Agent Application takes on important ancillary tasks that I cannot complete due to time constraints. | 1 | 2 | 3 | 4 | 5 |
| 5. I will be hiring an educational LLM agent because it has a significantly higher adoption rate among teaching staff and peers at my institution. | 1 | 2 | 3 | 4 | 5 |
| 6. Both institutions and faculty teams support the use of LLM agents in education. | 1 | 2 | 3 | 4 | 5 |
| 7. My tutor will provide strong support in my use of educational LLM agents in course design content creation. | 1 | 2 | 3 | 4 | 5 |
| 8. Overall, the agency facilitates support for the implementation of educational LLM agents. | 1 | 2 | 3 | 4 | 5 |
| 9. I am confident in executing curriculum design content creation efficiently. | 1 | 2 | 3 | 4 | 5 |
| 10. I am confident in the process of creating course design content using an educational LLM agent. | 1 | 2 | 3 | 4 | 5 |
| 11. If you want to use an education-grade language model agent to produce course design content, I believe you can succeed. | 1 | 2 | 3 | 4 | 5 |
| 12. I am confident in using an educational LLM agent to improve the quality of course design content. | 1 | 2 | 3 | 4 | 5 |
| 13. If I encounter technical issues in creating course design content using an education-grade LLM agent, I believe I can handle it effectively. | 1 | 2 | 3 | 4 | 5 |
| 14. I believe that the Education LLM Agency course will positively impact my course design abilities and instructional content creation. | 1 | 2 | 3 | 4 | 5 |
| 15. The Education LLM Agent has enhanced my ability to solve the challenges of course content creation, as well as my critical thinking skills. | 1 | 2 | 3 | 4 | 5 |
| 16. By using educational LLM agents, I have observed an improvement in course content creation standards. | 1 | 2 | 3 | 4 | 5 |
| 17. The Education LLM Agent will positively influence my course content creation ability and teaching results. | 1 | 2 | 3 | 4 | 5 |
Thank you again for taking the time to fill out this survey! Good luck with your work and happy life!
Problem 5: Good statistical methods
Response:
Regarding your suggestion that ‘Good statistical methods’, we adopted a mixed approach combining quantitative and qualitative analysis, as detailed below (L 314-318 or L 382-396).
We conducted quantitative analyses using SPSS 26.0. First, we used descriptive statistics and correlation analyses to overview sample characteristics. Secondly, we employed SPSS AMOS 26.0 and the Maximum Likelihood (ML) estimator to conduct structural equation modelling (SEM) analysis in order to test our proposed hypotheses.
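To make the chain mediation logic concrete, the following is a minimal sketch of how the three indirect effects in a serial (chain) mediation model can be computed as products of path coefficients, with a percentile bootstrap interval for the chain effect. It is an illustration under stated assumptions, not the authors' AMOS analysis: it uses ordinary least squares on observed scores rather than ML-estimated latent variables, the data are simulated, and the names `ols`, `chain_indirect_effects`, and `bootstrap_ci` are hypothetical. Here `x` stands for educational LLM agent use, `m1` for school support, `m2` for teacher self-efficacy, and `y` for curriculum content creation.

```python
import numpy as np


def ols(y, *predictors):
    """Least-squares slopes of y on the predictors (an intercept is fitted, then dropped)."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


def chain_indirect_effects(x, m1, m2, y):
    """Product-of-coefficients estimates for the serial model X -> M1 -> M2 -> Y."""
    (a1,) = ols(m1, x)                    # X -> M1
    a2, d21 = ols(m2, x, m1)              # X -> M2 and M1 -> M2
    c_prime, b1, b2 = ols(y, x, m1, m2)   # direct effect, M1 -> Y, M2 -> Y
    return {
        "via_m1": a1 * b1,        # X -> M1 -> Y
        "via_m2": a2 * b2,        # X -> M2 -> Y
        "chain": a1 * d21 * b2,   # X -> M1 -> M2 -> Y
        "direct": c_prime,
    }


def bootstrap_ci(x, m1, m2, y, key="chain", n_boot=1000, seed=0):
    """Percentile bootstrap 95% interval for one of the effects, resampling respondents."""
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[i] = chain_indirect_effects(x[idx], m1[idx], m2[idx], y[idx])[key]
    return np.percentile(draws, [2.5, 97.5])


# Simulated data only (assumed path values); these are NOT the study's data.
rng = np.random.default_rng(42)
n = 464
x = rng.normal(size=n)
m1 = 0.5 * x + rng.normal(size=n)
m2 = 0.3 * x + 0.4 * m1 + rng.normal(size=n)
y = 0.2 * x + 0.3 * m1 + 0.5 * m2 + rng.normal(size=n)

effects = chain_indirect_effects(x, m1, m2, y)
low, high = bootstrap_ci(x, m1, m2, y)
print(effects)
print(f"95% bootstrap CI for the chain effect: [{low:.3f}, {high:.3f}]")
```

A chain effect whose bootstrap interval excludes zero is the usual evidence for serial mediation. In the OLS setting the total effect of `x` on `y` decomposes exactly into the direct effect plus the three indirect effects, which gives a convenient consistency check.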
In the qualitative analysis section, this study employs content analysis, a semi-quantitative research method that enables repeatable, valid inferences from texts (or other meaning-bearing entities such as videos) to their contextual usage. It objectively, systematically, and quantitatively describes explicit communication content. Utilizing this approach, the study conducts objective, systematic, and quantitative content coding and analysis of materials related to teachers' use of educational LLMs in course content creation. The specific process strictly adhered to the six stages of content analysis methodology: establishing research objectives, defining the research population and selecting units of analysis, designing the analytical dimension system, sampling and quantifying the materials, and conducting evaluative recording and analytical inference. Throughout, standardized procedures inherent to content analysis were applied, including dual coding, consistency testing and reporting, and resolving discrepancies through discussion.
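The passage above mentions dual coding and consistency testing without naming a statistic. One common choice is Cohen's kappa, which corrects raw agreement between two coders for agreement expected by chance. The sketch below is a minimal illustration under that assumption; the function name `cohens_kappa` and the category labels are hypothetical and do not come from the study.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assign one category per unit of analysis."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of units")
    n = len(coder_a)
    # Observed proportion of units on which the coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal category frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes assigned by two coders to eight interview segments.
coder_a = ["support", "efficacy", "support", "creation",
           "efficacy", "support", "creation", "efficacy"]
coder_b = ["support", "efficacy", "creation", "creation",
           "efficacy", "support", "creation", "support"]

kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.3f}")
```

Segments on which the coders disagree are the ones to resolve through discussion, as described in the passage.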
Problem 6: Appropriate limitations
Response:
Regarding your suggestion that ‘Appropriate limitations’, we have made further optimizations. The changes are as follows (L 658-708):
Despite the strong application potential of educational LLM agents, their current development still faces certain limitations.
Firstly, Han (2024) pointed out that issues like data privacy and algorithmic bias remain significant challenges, and these challenges constrain the deeper integration of educational LLM agents into education.
Secondly, this study did not incorporate certain control variables (such as demographic variables), potentially introducing minor omission bias. However, adding supplementary control variables increases model complexity, and pre-testing indicates these variables exert no significant influence on core pathways. Consequently, establishing the baseline model remains a priority. Future research may employ multi-group SEM to validate the cross-sample robustness of the baseline model or construct cross-lagged models using longitudinal data to further enhance the reliability of causal inference.
Thirdly, this study exhibits certain limitations in terms of sample size and disciplinary coverage, which to some extent constrain the generalizability of its conclusions. Firstly, the geographical distribution of the sample is markedly imbalanced, with research data predominantly sourced from economically developed coastal regions. The proportion of survey samples from western and less developed areas is extremely low, making it difficult to fully reflect the operational mechanisms of research variables across diverse educational settings. It also fails to comprehensively reveal the potential impact of regional developmental disparities on the core research questions. Secondly, the selection of subject areas is singularly focused. Current research samples and data concentrate exclusively on science disciplines, excluding subjects such as humanities and languages. Given the significant differences in teaching methodologies, learning contexts, and teacher-student interaction dynamics across disciplines, the findings struggle to be transferred to the broader educational landscape encompassing all subject areas. Therefore, future research may be optimized and expanded in two respects. Firstly, efforts should be made to broaden the geographical coverage of the sample, with particular emphasis on supplementing survey data from western and less developed regions. By comparing samples across multiple regions and developmental levels, the cross-regional applicability of this study's conclusions may be validated, whilst simultaneously exploring the interactive effects between regional development factors and core research variables. Secondly, disciplinary boundaries should be transcended by incorporating humanities and language disciplines into the research scope. 
Comparing the adaptability of research models across different academic fields will enhance the theoretical framework's disciplinary universality, thereby providing more comprehensive empirical support for constructing an education intervention system with full disciplinary coverage.
Fourthly, the scales employed in this study exhibit high Cronbach's alpha values, potentially indicating redundancy. However, owing to constraints on sample accessibility and the unique timing of the research, it may not be feasible to revise the scales. Consequently, future research will incorporate semantic differentiation screening and reverse-scored items during the scale design phase. This approach aims to broaden the expressive dimensions of items while preserving the essence of the core construct, thereby balancing internal consistency with the richness of construct representation.
Last but not least, other factors, such as those related to schools and local education authorities, may influence how strongly teachers’ curriculum content creation is affected. Therefore, future research should further investigate the independent roles and interactive effects of factors at the educational administration and school levels. Such efforts will help uncover the underlying mechanisms connecting educational LLM agent use and teachers’ curriculum content creation from a wider perspective.
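The fourth limitation above links very high Cronbach's alpha values to possible item redundancy. A minimal sketch of the standard formula, alpha = k/(k-1) × (1 − Σ item variances / variance of total scores), shows why: when items move almost in lockstep, the summed item variances are small relative to the variance of the totals and alpha approaches 1. The function name `cronbach_alpha` and the ratings below are hypothetical; they are not the study's data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores is a list of respondents, each a list of k item ratings."""
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance(column) for column in zip(*item_scores)]
    # Sample variance of respondents' total scores.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point ratings from six respondents on a four-item scale.
responses = [
    [5, 5, 4, 5],
    [4, 4, 4, 4],
    [3, 3, 3, 2],
    [5, 4, 5, 5],
    [2, 2, 3, 2],
    [4, 5, 4, 4],
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}")
```

Adding reverse-scored or semantically distinct items, as the passage proposes, would typically lower the inter-item correlations and pull alpha away from the ceiling.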
Problem 7: Proofread citations for consistency (especially capitalization)
Response:
Regarding your suggestion that ‘Proofread citations for consistency (especially capitalization)’, we have reviewed and modified the document format. The changes are as follows: L 925-926.
We would like to extend our sincere thanks to you again! Your rigorous and detailed review comments have not only helped us identify the omissions in the research but also promoted the dual improvement of the academic quality and practical guiding significance of the paper. We will strictly follow your comments to complete the final revision of the paper and will continue to absorb your suggestions in future research to further deepen the relevant issues in the field of educational digitalization.
