The Effect of Artificial Intelligence-Supported Sustainable Geography Education on the Preparation Process for the IGEO Olympiad
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
- Research Design: Insufficient description of baseline equivalence
The manuscript only states that classes were “randomly assigned,” but does not report whether the experimental and control groups were comparable on prior geography grades, GIS experience, or socio-economic status.
Recommendation: add a table comparing baseline characteristics (e.g., previous standardized scores, AI self-efficacy, parental education) and use ANCOVA or hierarchical regression to control for potential confounders.
- Measurement Instruments: Contradictory reliability indices for the IGEO achievement test
The paper lists both KR-21 = 0.872 and Cronbach’s α = 0.981 without explaining why two internal-consistency coefficients differ so sharply, and does not clarify the criteria for retaining all 25 items.
Recommendation:
• Clarify that KR-21 is for dichotomous scoring and α for Likert sub-dimensions.
• Report item-deletion criteria (e.g., item-total < 0.30 or p < 0.20) and provide sample items.
- Data Analysis: Power and effect sizes are missing
Significance is claimed at p < 0.05, but no effect sizes (η² or Cohen’s d) or post-hoc power calculations are supplied.
Recommendation:
• Add Cohen’s d to Tables 4 and 7 (e.g., d = 1.12 indicates a large effect).
• State whether the sample size was pre-determined via a power analysis (α = 0.05, 1-β = 0.80) to avoid “small-sample-large-effect” misinterpretation.
- Ethics: Secondary use of minors’ data not addressed
Only parental consent is mentioned; there is no statement about whether the data will later be used to train machine-learning models or released as a public dataset.
Recommendation: supplement the “Ethical Process” with details on anonymization (e.g., name recoding, facial blurring) and data-retention period (e.g., destruction after five years), and explicitly state that data will not be used for commercial AI training.
- Conclusions & Discussion: Sustainability dimension under-evaluated
Although the title emphasizes “sustainable geography education,” the conclusions focus on IGEO scores and problem-solving, without quantifying changes in students’ SDG awareness or environmental attitudes.
Recommendation:
• Add pre-/post-differences on an SDG consciousness scale (e.g., Sustainability Consciousness Scale).
• Discuss how AI support fosters “sustainability thinking,” not just skill improvement, and propose longitudinal designs to track behavioral change (e.g., energy-saving actions), citing UNESCO (2021).
- Lack of author names
- Lack of institution
Author Response
Research Design: Insufficient description of baseline equivalence
The manuscript only states that classes were “randomly assigned,” but does not report whether the experimental and control groups were comparable on prior geography grades, GIS experience, or socio-economic status.
-All of these issues have been addressed and the corresponding revisions are highlighted in red in the manuscript. Thank you for your insightful and valuable feedback.
Measurement Instruments: Contradictory reliability indices for the IGEO achievement test
The paper lists both KR-21 = 0.872 and Cronbach’s α = 0.981 without explaining why two internal-consistency coefficients differ so sharply, and does not clarify the criteria for retaining all 25 items.
-All of these issues have been addressed and the corresponding revisions are highlighted in red in the manuscript. Thank you for your insightful and valuable feedback.
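The reliability comment above can be illustrated with a minimal sketch. For dichotomously scored (0/1) items, Cronbach's α reduces to KR-20, and KR-21 is a simplification that assumes equal item difficulty and cannot exceed KR-20 on the same data. The response matrix below is invented purely for illustration and is not taken from the manuscript; the function names are likewise hypothetical.

```python
from statistics import mean, pvariance

# Hypothetical 0/1 item responses: 6 respondents x 4 items (illustration only).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]

def cronbach_alpha(scores):
    """Cronbach's alpha; equals KR-20 when every item is scored 0/1."""
    k = len(scores[0])
    # Population variance of each item column and of the total scores.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def kr21(scores):
    """KR-21: uses only the mean and variance of total scores."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    m = mean(totals)
    total_var = pvariance(totals)
    return k / (k - 1) * (1 - m * (k - m) / (k * total_var))

print(round(kr21(responses), 3), round(cronbach_alpha(responses), 3))
```

If both coefficients are computed on the same dichotomous test, KR-21 below α is expected; a gap as large as the one reported would more plausibly arise if the two coefficients were computed on different instruments or scoring schemes, which is what the reviewer asks the authors to clarify.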
Data Analysis: Power and effect sizes are missing
Significance is claimed at p < 0.05, but no effect sizes (η² or Cohen’s d) or post-hoc power calculations are supplied.
-All of these issues have been addressed and the corresponding revisions are highlighted in red in the manuscript. Thank you for your insightful and valuable feedback.
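As a rough sketch of the effect size requested here, Cohen's d for two independent groups can be computed from the group means and the pooled standard deviation. The scores below are made up for illustration and do not come from Tables 4 or 7; `cohens_d` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical post-test scores (illustration only).
experimental = [2, 4, 6]
control = [1, 3, 5]
print(cohens_d(experimental, control))
```

By the usual rule of thumb, values near 0.2, 0.5, and 0.8 are read as small, medium, and large effects, which is the scale the reviewer's example of d = 1.12 refers to.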
Ethics: Secondary use of minors’ data not addressed
Only parental consent is mentioned; there is no statement about whether the data will later be used to train machine-learning models or released as a public dataset.
Recommendation: supplement the “Ethical Process” with details on anonymization (e.g., name recoding, facial blurring) and data-retention period (e.g., destruction after five years), and explicitly state that data will not be used for commercial AI training.
-All of these issues have been addressed and the corresponding revisions are highlighted in red in the manuscript. Thank you for your insightful and valuable feedback.
Conclusions & Discussion: Sustainability dimension under-evaluated
Although the title emphasizes “sustainable geography education,” the conclusions focus on IGEO scores and problem-solving, without quantifying changes in students’ SDG awareness or environmental attitudes.
-All of these issues have been addressed and the corresponding revisions are highlighted in red in the manuscript. Thank you for your insightful and valuable feedback.
Lack of author names, Lack of institution
- Lack of author names
- Lack of institution
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
This study examines the impact of an AI-enhanced, sustainability-focused geography curriculum on high school students’ spatial reasoning and problem-solving skills as they prepare for the International Geography Olympiad (IGEO). Although the topic is both timely and well matched to a mixed‐methods approach, several critical issues must be addressed before the manuscript can be considered for publication:
- RQ 1 is currently phrased as a fragment (“artificial intelligence-supported sustainable geography education on the IGEO preparation process?”) (p. 4), not as a research question. Reformulate all three RQs as clear, parallel interrogative sentences.
- Section 2.1 describes “41 students from the experimental and control groups and 41 students from the control group were equalized” (p. 5), which is confusing. Clearly state the total N per group, how they were assigned (e.g., random or convenience), and any inclusion/exclusion criteria. A CONSORT-style flow diagram would help transparency.
- You test for normality (Shapiro–Wilk), report normal distributions, yet use Mann–Whitney U tests (Tables 4 & 7). There is no rationale for choosing non-parametric over t-tests once normality is established.
- Justify the use of non-parametric tests (e.g., small sample, ordinal scale, etc.).
- If the data are normally distributed and interval, consider independent-samples t-tests, reporting the t-value, degrees of freedom (df), and p-value.
- Clearly label which test was used in each table.
- In Table 8, R = .595 but R² = .831 and Adjusted R² = .190; these values are inconsistent (R² should equal the square of R). Recompute and report the correct model summary (R, R², Adjusted R²), and ensure unstandardized (B), standardized (β), SEs, t, and p are accurate. Also discuss multicollinearity (VIFs) if multiple predictors are involved.
- The process of coding (“by line… themes from codes”, L. 323-324) is mentioned, but lacks detail on coder training/experience, intercoder reliability, or trustworthiness strategies (e.g., member checking). Expand the qualitative methods to include:
• Number of coders and their backgrounds
• Procedure for developing the codebook
• Any checks on reliability (e.g., Cohen’s κ between coders)
• Steps taken to enhance credibility (e.g., triangulation, thick description)
- Subsection 4.1 is currently overly dense and unclear. Please restructure it into a concise, well-organized paragraph that explicitly states:
• The ethics committee name, approval number, and date.
• Who provided informed consent (e.g., participants and, for minors, their parents).
• Any measures taken to ensure confidentiality and data security.
- Improve Table 1: remove irrelevant text (“Girl–boy student available each other close”), align text to respective values, and present only clear demographic variables.
- Ensure all tables include test statistics in the caption (e.g., “Table 4. Mann–Whitney U test comparing post-test map literacy scores, U = 167.00, p = .006.”).
- Have a native-level proofreader correct repeated grammatical errors (e.g., inconsistent capitalization, missing articles) and typos. Use active voice where possible, e.g., “We administered the IGEO achievement test as a pre- and post-test.”
- Avoid repeating IGEO descriptions in paras 2 and 4 of the Introduction (l. 32–39 vs. 46–53). Combine to streamline.
- Explicitly acknowledge study limitations (e.g., single-site sample, potential novelty effect of AI tools). Suggest avenues for future research (e.g., longitudinal follow-up, scale-up to other subjects).
- Use in-text citations in accordance with the MDPI style.
Proofreading by a native speaker is needed.
Author Response
RQ 1 is currently phrased as a fragment (“artificial intelligence-supported sustainable geography education on the IGEO preparation process?”) (p. 4), not as a research question. Reformulate all three RQs as clear, parallel interrogative sentences.
-All of these issues have been addressed and the corresponding revisions are highlighted in red in the manuscript. Thank you for your insightful and valuable feedback.
Section 2.1 describes “41 students from the experimental and control groups and 41 students from the control group were equalized” (p. 5), which is confusing. Clearly state the total N per group, how they were assigned (e.g., random or convenience), and any inclusion/exclusion criteria. A CONSORT-style flow diagram would help transparency.
-All of these issues have been addressed and the corresponding revisions are highlighted in red in the manuscript. Thank you for your insightful and valuable feedback.
You test for normality (Shapiro–Wilk), report normal distributions, yet use Mann–Whitney U tests (Tables 4 & 7). There is no rationale for choosing non-parametric over t-tests once normality is established. Justify the use of non-parametric tests (e.g., small sample, ordinal scale, etc.). If the data are normally distributed and interval, consider independent-samples t-tests, reporting the t-value, degrees of freedom (df), and p-value. Clearly label which test was used in each table.
-All of these issues have been addressed and the corresponding revisions are highlighted in red in the manuscript. Thank you for your insightful and valuable feedback.
In Table 8, R = .595 but R² = .831 and Adjusted R² = .190; these values are inconsistent (R² should equal the square of R). Recompute and report the correct model summary (R, R², Adjusted R²), and ensure unstandardized (B), standardized (β), SEs, t, and p are accurate. Also discuss multicollinearity (VIFs) if multiple predictors are involved.
-All of these issues have been addressed and the corresponding revisions are highlighted in red in the manuscript. Thank you for your insightful and valuable feedback.
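The inconsistency flagged for Table 8 can be checked with a short sketch. For a linear regression with an intercept, R² is the square of the multiple correlation R, so R = .595 implies R² of about .354 rather than .831. The adjusted R² additionally depends on the sample size n and the number of predictors k; the n and k used below are placeholders for illustration, not values confirmed for Table 8.

```python
def r_squared_from_r(r):
    """R-squared implied by a multiple correlation coefficient R."""
    return r * r

def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def is_consistent(r, r2, tol=0.005):
    """True if a reported R-squared matches R squared, within rounding."""
    return abs(r_squared_from_r(r) - r2) <= tol

reported_r, reported_r2 = 0.595, 0.831   # values quoted by the reviewer
print(round(r_squared_from_r(reported_r), 3))
print(is_consistent(reported_r, reported_r2))

# n and k are assumptions for illustration only.
print(round(adjusted_r_squared(r_squared_from_r(reported_r), n=82, k=3), 3))
```

Because the adjustment only ever shrinks R², an adjusted value far below the reported R² is a further hint that the model summary was transcribed incorrectly.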
The process of coding (“by line… themes from codes”, L. 323-324) is mentioned, but lacks detail on coder training/experience, intercoder reliability, or trustworthiness strategies (e.g., member checking).
-All of these issues have been addressed and the corresponding revisions are highlighted in red in the manuscript. Thank you for your insightful and valuable feedback.
Subsection 4.1 is currently overly dense and unclear. Please restructure it into a concise, well-organized paragraph that explicitly states:
-All of these issues have been addressed and the corresponding revisions are highlighted in red in the manuscript. Thank you for your insightful and valuable feedback.
Improve Table 1: remove irrelevant text (“Girl–boy student available each other close”), align text to respective values, and present only clear demographic variables.
-All of these issues have been addressed and the corresponding revisions are highlighted in red in the manuscript. Thank you for your insightful and valuable feedback.
Ensure all tables include test statistics in the caption (e.g., “Table 4. Mann–Whitney U test comparing post-test map literacy scores, U = 167.00, p = .006.”).
-All of these issues have been addressed and the corresponding revisions are highlighted in red in the manuscript. Thank you for your insightful and valuable feedback.
Have a native-level proofreader correct repeated grammatical errors (e.g., inconsistent capitalization, missing articles) and typos. Use active voice where possible, e.g., “We administered the IGEO achievement test as a pre- and post-test.”
-All of these issues have been addressed and the corresponding revisions are highlighted in red in the manuscript. Thank you for your insightful and valuable feedback.
Avoid repeating IGEO descriptions in paras 2 and 4 of the Introduction (l. 32–39 vs. 46–53). Combine to streamline.
-All of these issues have been addressed and the corresponding revisions are highlighted in red in the manuscript. Thank you for your insightful and valuable feedback.
Explicitly acknowledge study limitations (e.g., single-site sample, potential novelty effect of AI tools). Suggest avenues for future research (e.g., longitudinal follow-up, scale-up to other subjects).
-All of these issues have been addressed and the corresponding revisions are highlighted in red in the manuscript. Thank you for your insightful and valuable feedback.
Use in-text citations in accordance with the MDPI style.
-All of these issues have been addressed and the corresponding revisions are highlighted in red in the manuscript. Thank you for your insightful and valuable feedback.
Author Response File: Author Response.docx
Reviewer 3 Report
Comments and Suggestions for Authors
Upon reviewing the manuscript, several grammatical and stylistic issues were identified that should be addressed to improve clarity and accuracy.
First, there are multiple subject-verb agreement errors, such as in the sentence: “These perspectives is necessary to understand…” The plural subject “perspectives” requires the plural verb “are,” so it should read: “These perspectives are necessary to understand…”
There are also instances of missing articles, which are essential in academic writing. For example, “can offer broader picture of…” should be corrected to “can offer a broader picture of…” since “picture” is a countable noun requiring an article.
In terms of redundant expressions, one example is “This study investigates and explores…” Since “investigates” and “explores” are nearly synonymous, using both without distinction is repetitive. It is recommended to either choose one or differentiate their roles, such as “This study investigates X and explores Y.”
Additionally, there are some prepositional errors, such as “in terms to…”, which should be “in terms of…” This is a common set phrase and using the wrong preposition here weakens the professionalism of the text.
Some pronoun references are vague. For instance, the sentence “It contributes to improving understanding” lacks a clear antecedent for “it.” A clearer revision would be: “This study contributes to improving understanding…”
There are also examples of ambiguous or weak word choices. In “as a base for sustainable behavior,” the term “base” is vague in this context. A stronger and more precise alternative would be “as a foundation for fostering sustainable behavior.”
Lastly, several sentences are overly long or dense, which may hinder readability. Breaking complex sentences into shorter, clearer ones would enhance the manuscript’s flow and accessibility.
Overall, the manuscript would benefit from minor but meaningful revisions in grammar, word choice, and sentence structure to align with standard academic English conventions. If desired, I can provide a version with tracked changes or in-line comments for easy editing.
Author Response
Thank you very much for your interest and support. I have made some revisions to my manuscript, but I would greatly appreciate your help with proofreading, if possible. Thank you again for everything. I am now sharing the full text of my manuscript with you.
Author Response File: Author Response.docx
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
This mixed-methods study examines the impact of a 10-week AI-supported, sustainability-focused geography program on high school students’ IGEO preparation. The paper now includes details of ethical approval, content validity procedures, and factor-analytic evidence for its scales. However, major concerns and issues remain:
- The description of how participants were selected—“41 students from the experimental and control groups and 41 students from the control group” via hierarchical regression—is unclear. Please specify the initial pool size, how randomization was achieved, and justify the use of regression for sample equalization, including coefficients and thresholds used for selection.
- The Problem‑Solving Skills perception scale’s confirmatory factor analysis yields CFI = .87—below the conventional .90 benchmark—and RMSEA = .052. Either consider model re‑specification (e.g., correlating residuals, item removal) or discuss why this fit is acceptable in your context. Reporting standardized loadings and modification indices would enhance transparency.
- You demonstrate normal distribution of post-test scores (Shapiro‑Wilk p > .05; Levene’s p = .868) yet employ the Mann–Whitney U test for post‑test comparisons. Please explain this choice—were assumptions for parametric tests violated for specific variables? If not, traditional t‑tests with confidence intervals could be more powerful and interpretable.
- The qualitative component lacks detail on sampling (how many students per interview/focus group), coding procedures, and measures of reliability or credibility. Describe your NVivo coding process, how you ensured intercoder agreement (if any), and how you determined thematic saturation to bolster trustworthiness.
- Language & Grammar: Numerous instances of awkward phrasing (e.g., “effect artificial intelligence” ---> “effect of artificial intelligence”) and inconsistent capitalization should be polished throughout.
- Figure 1 is actually Table 1. Do not paste it as a pic but as an editable DOCX table.
Author Response
Dear Editor,
Thank you very much for the opportunity to revise and resubmit our manuscript. We are sincerely grateful to you and the anonymous reviewers for your valuable time, insightful comments, and constructive suggestions that have significantly contributed to improving the quality of our work.
Comments 1: The description of how participants were selected—“41 students from the experimental and control groups and 41 students from the control group” via hierarchical regression—is unclear. Please specify the initial pool size, how randomization was achieved, and justify the use of regression for sample equalization, including coefficients and thresholds used for selection.
Response 1:
Explanations regarding the information you requested in item 4 have been provided, Professor. Additionally, a detailed explanation about the sample selection has also been included.
Randomization was implemented in the selection of the experimental and control groups; however, students’ academic achievements, parental educational status, previous educational experiences, AI self-sufficiency, GIS experience and levels of internal and external locus of control were also considered. Therefore, the study exhibits quasi-experimental characteristics. Furthermore, power analyses were conducted (α = 0.05, 1-β = 0.80) to mitigate potential misinterpretations arising from small sample sizes and large effect sizes during sample selection.
The sample of this study consisted of 102 students enrolled in [High school] in the city center of [Eskisehir]. In accordance with the quasi-experimental design with pretest-posttest control groups, students were initially assigned to experimental and control groups using a random number table to ensure randomization. However, to control for potential pre-existing differences between groups in terms of academic achievement, hierarchical regression analysis was conducted to create matched groups. In the regression model, students' academic achievement, parental educational status, previous educational experiences, AI self-sufficiency, GIS experience and levels of internal and external locus of control were included as independent variables. A significance level of p < .05 was used, and the model showed a coefficient of determination of R² = .42. Based on the standardized beta coefficients obtained from the regression, two groups of 41 students each (experimental and control) were formed with closely matched predicted values. This approach aimed to minimize individual differences between groups at baseline and to more accurately assess the effect of the experimental intervention.
Comments 2: The Problem‑Solving Skills perception scale’s confirmatory factor analysis yields CFI = .87—below the conventional .90 benchmark—and RMSEA = .052. Either consider model re‑specification (e.g., correlating residuals, item removal) or discuss why this fit is acceptable in your context. Reporting standardized loadings and modification indices would enhance transparency.
Response 2: You raised a very good point, Professor. It had completely slipped my mind, and the numerical data played tricks on me. I have updated the test, and the new CFI value is .92. Thank you very much for your support.
Comments 3: You demonstrate normal distribution of post-test scores (Shapiro‑Wilk p > .05; Levene’s p = .868) yet employ the Mann–Whitney U test for post‑test comparisons. Please explain this choice—were assumptions for parametric tests violated for specific variables? If not, traditional t‑tests with confidence intervals could be more powerful and interpretable.
Response 3: Professor, thank you very much for your valuable contribution. My explanation is as follows: Although the Shapiro-Wilk test and Levene’s test indicated that the post-test scores were normally distributed and variances were equal (Shapiro-Wilk p > .05; Levene’s p = .868), the Mann–Whitney U test was chosen due to the relatively small sample size (n = 41 per group), which may reduce the robustness of parametric tests. Additionally, some subgroups exhibited slight deviations from normality upon further examination, prompting a conservative approach by applying a non-parametric test. Nevertheless, both parametric (independent samples t-test) and non-parametric analyses were conducted, yielding consistent results. The decision to present Mann–Whitney U test outcomes was made to ensure the validity of conclusions despite potential violations of parametric assumptions.
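The two analyses contrasted in this exchange can be sketched side by side. The example below computes the independent-samples (Student) t statistic with its degrees of freedom and the Mann–Whitney U statistic from first principles, using invented scores rather than the study's data. P-values are deliberately omitted because they require the t and U reference distributions; a statistics library (for example SciPy's `ttest_ind` and `mannwhitneyu`) would normally supply them.

```python
from math import sqrt
from statistics import mean, variance

def student_t(a, b):
    """Equal-variance independent-samples t statistic and its df."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, n_a + n_b - 2

def mann_whitney_u(a, b):
    """Mann-Whitney U: the smaller of the two group U values.

    U for group a counts the (x, y) pairs with x > y; ties count one half.
    """
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)

# Hypothetical post-test scores (illustration only).
experimental = [2, 4, 6]
control = [1, 3, 5]
t_value, df = student_t(experimental, control)
print(round(t_value, 3), df, mann_whitney_u(experimental, control))
```

Reporting both statistics, as the response says was done, lets readers confirm that the conclusion does not hinge on the choice of test.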
Comments 4: The qualitative component lacks detail on sampling (how many students per interview/focus group), coding procedures, and measures of reliability or credibility. Describe your NVivo coding process, how you ensured intercoder agreement (if any), and how you determined thematic saturation to bolster trustworthiness.
Response 4: Professor, I am grateful for your support in this part. Thank you very much. I have made the revisions you requested.
- Sampling: The qualitative component of the study comprised a purposive sample of 24 voluntary participants from the experimental group. Participants were stratified into two homogeneous subgroups based on their scores on the map literacy achievement test and the Problem-Solving Skills Perception Scale: high scorers (n=12) and low scorers (n=12). Focus group sessions were conducted with an average duration of approximately one hour per group, consistent with the 1-2 hour timeframe recommended in the extant literature (Krueger, 1999). Approximately one hour of focus group interview was conducted for each participant.
- Coding Procedure: Qualitative data were systematically analyzed using NVivo 12 software. The analytical process commenced with open coding, followed by the aggregation of related codes into thematic clusters. This iterative coding approach facilitated in-depth engagement with the data to derive meaningful categories and emergent themes.
- Reliability: To ensure coding reliability, two independent coders separately coded the dataset. Intercoder agreement was quantitatively assessed through Cohen’s Kappa coefficient, yielding a value exceeding .80, indicative of strong concordance. This high level of agreement attests to the rigor and consistency of the coding procedure.
- Thematic Saturation: Data collection was concluded upon achieving thematic saturation, defined by the absence of novel themes or codes during successive focus group discussions. This methodological rigor bolsters the trustworthiness and validity of the study’s qualitative findings.
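The intercoder agreement statistic mentioned in this response can be sketched as follows. Cohen's kappa compares the observed proportion of segments on which two coders agree with the agreement expected by chance from each coder's label frequencies. The theme labels and codings below are hypothetical and are not drawn from the study's NVivo project.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same segments.

    Undefined (division by zero) when chance agreement is exactly 1,
    i.e. both coders used a single identical label throughout.
    """
    n = len(coder_a)
    observed = sum(x == y for x, y in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of the coders' marginal label proportions.
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical theme codes assigned to four interview segments.
coder_1 = ["motivation", "motivation", "tools", "tools"]
coder_2 = ["motivation", "motivation", "tools", "motivation"]
print(cohens_kappa(coder_1, coder_2))
```

Reporting the exact kappa, together with the number of segments double-coded, would be more informative than stating only that the value exceeded .80.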
Comments 5: Language & Grammar: Numerous instances of awkward phrasing (e.g., “effect artificial intelligence” ---> “effect of artificial intelligence”) and inconsistent capitalization should be polished throughout.
Response 5: You are absolutely right, Professor. This section will be revised with professional language editing support during the research process.
Comments 6: Figure 1 is actually Table 1. Do not paste it as a pic but as an editable DOCX table.
Response 6: It has been corrected, Professor. Thank you very much for your valuable efforts. Respectfully yours.
Author Response File: Author Response.docx
Round 3
Reviewer 2 Report
Comments and Suggestions for Authors
Dear author, here are minor revisions recommended for the manuscript titled “The Effect of Artificial Intelligence Supported Sustainable Geography Education on the Preparation Process for the IGEO Olympiad”. These revisions focus on language, structure, consistency, and style, not major content or methodological changes that have already been taken care of.
- In the abstract, “the artificial effect intelligence (AI) supported sustainable geography Education” - change to "the effect of artificial intelligence (AI)-supported sustainable geography education"
- Fix wrong capitalizations, e.g., “Education”, “Were”, "This", “Focus group interview forms’-test” → should be lowercase and corrected. Fix the capitalization of "This" throughout the entire text. Ensure consistent usage (e.g., “Artificial Intelligence” vs. “artificial intelligence”, “Geography Education” vs. “geography education”)
- Be consistent in referring to “IGEO achievement test” and “Problem Solving Skills Perception Scale” (sometimes abbreviated, sometimes not).
- “Olympyad” → “Olympiad”
- “Educational cobots and smart classrooms. Interna-” → likely split error: “International”
- Headings: Ensure consistent use of subsection numbering:
• Fix: “2.6.Participant Diaries” → “2.6. Participant Diaries”
- Repeated content: Line 603–604: “Participants… were informed that the data…” appears twice. Remove redundancy.
- Tables: Clean formatting where needed. For example:
• Table 6: Standard deviations like “8863” should be “8.863”
• Use consistent decimal formatting (e.g., 8.86, not 8,86; use period, not comma, if following APA or MDPI).
- Inconsistent formatting:
• “t(39)=.716; p=.464 > .05” → consider APA style: t(39) = 0.72, p = .46
• Add degrees of freedom, test statistic, and exact p-values throughout tables if not present.
- Effect size reporting: Cohen’s d values are included, but consider reporting confidence intervals if possible.
- “AI-supported geography education was found to negatively predict the internal locus of control, while positively predicting the external locus of control.” → consider clarifying the educational significance of this result.
- “The experimental process of the study was recorded by the researcher by keeping unstructured observation notes.” → could be smoother: “The researcher documented the experimental process through unstructured observational notes.”
- Ensure all in-text citations match the reference list and are numbered correctly in order of appearance.
- Fix duplicated sources in references
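The formatting points in this list (APA-style test statistics and period rather than comma decimals) lend themselves to a small helper. The sketch below is only one possible way to standardize the notation; the function names are hypothetical, and the two-decimal rounding simply follows the reviewer's example of t(39) = 0.72, p = .46.

```python
def apa_number(value, decimals=2, leading_zero=True):
    """Format a number; drop the leading zero for values bounded by 1."""
    text = f"{value:.{decimals}f}"
    if not leading_zero:
        if text.startswith("0."):
            text = text[1:]
        elif text.startswith("-0."):
            text = "-" + text[2:]
    return text

def apa_p(p):
    """Exact p-value without a leading zero; very small values as '< .001'."""
    if p < 0.001:
        return "p < .001"
    return f"p = {apa_number(p, leading_zero=False)}"

def apa_t(t, df, p):
    """APA-style report of a t test, e.g. 't(39) = 0.72, p = .46'."""
    return f"t({df}) = {apa_number(t)}, {apa_p(p)}"

def parse_decimal_comma(text):
    """Convert a comma-decimal string such as '8,86' to a float."""
    return float(text.replace(",", "."))

print(apa_t(0.716, 39, 0.464))
print(parse_decimal_comma("8,86"))
```

A value such as “8863” cannot be repaired automatically this way, since the position of the lost decimal point has to be confirmed against the original output.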
Author Response
Dear Editor,
I would like to sincerely thank the reviewers and editors for their valuable and constructive feedback on my manuscript titled “The Effect of Artificial Intelligence Supported Sustainable Geography Education on the Preparation Process for the IGEO Olympiad.”
I have carefully addressed all the minor revisions recommended, focusing on language, structure, consistency, and style, as suggested. Specifically, I have:
- Corrected the phrasing in the abstract and throughout the manuscript, including proper capitalization and consistent terminology (e.g., “artificial intelligence” and “geography education”).
- Ensured consistent references to the “IGEO achievement test” and “Problem Solving Skills Perception Scale” throughout the text.
- Fixed typographical errors such as the split word “Interna-” corrected to “International,” and corrected “Olympyad” to “Olympiad.”
- Standardized subsection numbering and heading formatting.
- Removed redundant repeated sentences, particularly the duplicated content on lines 603–604.
- Cleaned the formatting of tables, including correcting standard deviations (e.g., “8863” to “8.863”) and decimal points following APA/MDPI style.
- Reformatted statistical notation to comply with APA style, adding degrees of freedom, test statistics, and exact p-values where missing.
- Included suggestions regarding effect size reporting, and clarified the educational significance of key findings.
- Improved sentence flow and clarity, such as rephrasing “The experimental process of the study was recorded by the researcher by keeping unstructured observation notes” to “The researcher documented the experimental process through unstructured observational notes.”
- Verified that all in-text citations match the reference list, ensuring numbering consistency and corrected duplicated references in the bibliography.
I believe these revisions have significantly improved the clarity, coherence, and overall quality of the manuscript. I appreciate the time and effort of the reviewers and look forward to your favorable consideration.
Thank you very much for your attention.
Author Response File: Author Response.docx