Artificial Intelligence Performance in Introductory Biology: Passing Grades but Poor Performance at High Cognitive Complexity
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The title is appropriate and descriptive; however, I recommend simplifying it by removing the phrase “collegiate-level” for clarity, for example, “AI Performance in Introductory Biology.”
The abstract effectively outlines the study’s aim, methods, and key findings. To enhance its practical relevance, I suggest adding one sentence that highlights why the findings matter for educators or curriculum developers, particularly in shaping assessment strategies in AI-integrated learning environments.
The introduction could benefit from one brief sentence connecting this study to broader educational technology literature, not just biology. This would strengthen its interdisciplinary value.
Line 30: “and create new content”, I suggest revising to “and generate novel content” for academic precision.
Line 33: “over two-thirds of current occupations will be partially automated by AI.” I recommend specifying “occupational tasks” rather than “occupations,” which is broader.
Line 39: “altering forms of assessments to avoid AI usage”, revise to “modifying assessment formats to limit AI interference.”
Line 45: “how they perform on various assessments in different disciplines”, I believe it could be more direct as “their performance across assessment types in biology education.”
The Materials and Methods section is clearly structured, with appropriate use of standardized grading rubrics, alignment to Bloom’s Taxonomy, and valid statistical analyses. However, several areas would benefit from clarification and improved terminology.
Lines 107–108: The statement that “no iterative prompting was performed” is important but could be expanded. Clarify that only a single prompt was used per question and no feedback loops were allowed; this ensures methodological rigor.
Line 115: “Each AI was treated as a unique individual”; I suggest “Each AI was evaluated as an individual participant” to avoid anthropomorphism.
Line 132: I recommend replacing the “Level 1, Level 2…” phrasing with “Bloom Level 1 (Remember), Bloom Level 2 (Understand)…” for precision.
The Results section is clearly structured and effectively presents findings across multiple dimensions, including assessment type, cognitive complexity, and final grades. The use of statistical analysis is appropriate, and the tables and figures are well-aligned with the narrative. To improve clarity, the authors should briefly contextualize AI scores by indicating whether they represent passing or failing performance. Some descriptions of AI score improvements are repetitive and could be summarized more succinctly. The section on Bloom’s Taxonomy would benefit from a concise statement summarizing the overall trend of declining AI performance with increasing cognitive complexity. Additionally, all table references should be accompanied by brief interpretations rather than being presented as standalone data.
The Discussion appropriately interprets the findings in relation to prior literature and clearly addresses the implications for educational practice, particularly assessment design in AI-integrated classrooms. The authors effectively link their results to Bloom’s Taxonomy and provide practical recommendations for educators. To enhance clarity and scholarly rigor, a few areas could be strengthened. First, the section would benefit from a more focused synthesis of the main findings at the outset, rather than dispersing them throughout. Additionally, while the limitations of image and sequence-based tasks are acknowledged, the discussion could better address the study’s broader limitations, such as generalizability beyond biology or single-institution sampling. The forward-looking suggestions (e.g., AI as a reading companion) are thoughtful but could be better supported with emerging empirical evidence. A brief cautionary note on overreliance on AI-generated content would also add balance.
The conclusion effectively reiterates the central findings, that while generative AI tools can perform well on lower-order cognitive tasks, they struggle with higher-order assessments and tasks requiring image or sequence analysis. The call for educators to redesign assessments to focus on cognitive complexity is both appropriate and timely. However, the conclusion would benefit from a clearer, more concise summary of the study’s practical implications. Currently, key takeaways are somewhat embedded within broader reflections. The authors may consider ending with a sharper statement emphasizing how the findings can inform AI-integrated curriculum and assessment reform.
The references are generally current, relevant, and diverse. Thank you.
Comments on the Quality of English Language
The manuscript is readable but would benefit from minor language editing to improve clarity, consistency, and academic tone.
Author Response
We would like to thank Reviewer 1 for their careful reading of our manuscript and their insightful feedback to improve the work. We agree with the Reviewer on all of their points, and have made changes based on all comments and suggestions. Please see below for the specific edits. Thank you!
Reviewer 1 comment: The title is appropriate and descriptive; however, I recommend simplifying it by removing the phrase “collegiate-level” for clarity, for example, “AI Performance in Introductory Biology.”
Response: We thank the reviewer for their feedback on our manuscript. We have altered the title to remove “collegiate-level”.
Reviewer 1 comment: The abstract effectively outlines the study’s aim, methods, and key findings. To enhance its practical relevance, I suggest adding one sentence that highlights why the findings matter for educators or curriculum developers, particularly in shaping assessment strategies in AI-integrated learning environments.
Response: We have altered the last sentence of the abstract to speak more directly to educators and to highlight how our findings can be used in curriculum development, stating: “By understanding their capabilities at different levels of complexity, educators will be better able to adapt assessments based on AI ability, particularly through the utilization of image and sequence-based questions, and integrate AI into higher education curriculum.”
Reviewer 1 comment: The introduction could benefit from one brief sentence connecting this study to broader educational technology literature, not just biology. This would strengthen its interdisciplinary value.
Response: We have edited the manuscript, specifically altering the last sentences of the first paragraph of the introduction (lines 52–57) to state: “However, before educators invest in different pedagogical approaches and curricular restructuring, it is critical to understand the capabilities of generative AI tools and their performance across assessment types. While our study focused on biology, the results inform collegiate-level curriculum more broadly, as the assessments use common formats, including problem sets, exams, and papers, found across many disciplines.”
Reviewer 1 comment: Line 33: “over two-thirds of current occupations will be partially automated by AI.” I recommend specifying “occupational tasks” rather than “occupations,” which is broader.
Response: We have edited the manuscript and used the reviewer’s suggested language.
Reviewer 1 comment: Line 39: “altering forms of assessments to avoid AI usage”, revise to “modifying assessment formats to limit AI interference.”
Response: We have edited the manuscript and used the reviewer’s suggested language.
Reviewer 1 comment: Line 45: “how they perform on various assessments in different disciplines”, I believe it could be more direct as “their performance across assessment types in biology education.”
Response: We have edited the manuscript and used the reviewer’s suggested language.
Reviewer 1 comment: The Materials and Methods section is clearly structured, with appropriate use of standardized grading rubrics, alignment to Bloom’s Taxonomy, and valid statistical analyses. However, several areas would benefit from clarification and improved terminology. Lines 107–108: The statement that “no iterative prompting was performed” is important but could be expanded. Clarify that only a single prompt was used per question and no feedback loops were allowed; this ensures methodological rigor.
Response: We have clarified the Methods section, specifically at line 107-108 that the reviewer asked about. We have changed the language to state: “The AIs were prompted with each question from the assessments a single time to best mimic the student experience of a single attempt to answer an assessment question. Only a single prompt was used per question and no feedback loops were allowed.”
Reviewer 1 comment: Line 115: “Each AI was treated as a unique individual”; I suggest “Each AI was evaluated as an individual participant” to avoid anthropomorphism.
Response: We have edited the manuscript and used the reviewer’s suggested language.
Reviewer 1 comment: Line 132: I recommend replacing the “Level 1, Level 2…” phrasing with “Bloom Level 1 (Remember), Bloom Level 2 (Understand)…” for precision.
Response: We have edited the manuscript and used the reviewer’s suggested language.
Reviewer 1 comment: The Results section is clearly structured and effectively presents findings across multiple dimensions, including assessment type, cognitive complexity, and final grades. The use of statistical analysis is appropriate, and the tables and figures are well-aligned with the narrative. To improve clarity, the authors should briefly contextualize AI scores by indicating whether they represent passing or failing performance.
Response: This is an excellent point by the reviewer, as it is a major conclusion of the paper, and we now realize that we were not clear about which scores represent passing vs. failing performance. At the end of the first paragraph in the Results section (line 187), we now state “The threshold for passing was set at 60% (D-); scores at or above 60% are considered passing, while scores below 60% are considered a failing performance.”
Reviewer 1 comment: Some descriptions of AI score improvements are repetitive and could be summarized more succinctly. The section on Bloom’s Taxonomy would benefit from a concise statement summarizing the overall trend of declining AI performance with increasing cognitive complexity.
Response: We thank the reviewer for their recommendation. We have added a concise introductory sentence to the paragraph that summarizes performance by Bloom Level, stating:
“The performance of AIs was impacted by the cognitive complexity of the question, with a significant decline in performance at increasing levels of complexity. Student performance remained relatively constant (no statistically significant change) across all Bloom Levels, while AI performance dropped with increasing complexity (Figure 2).”
We have also streamlined the section describing the AI score improvements, cutting much of the text and representing the scores and p-values in parentheses rather than repetitive sentence structures (Lines 262-276).
Reviewer 1 comment: Additionally, all table references should be accompanied by brief interpretations rather than being presented as standalone data.
Response: We have altered all of the sentences that reference the tables and included interpretations so they are not presented as standalone data. For Table 1, we now state:
“All average scores described above (scores on assessments with images included and excluded) and the associated p-values are displayed in Table 1, demonstrating the increased performance when images are excluded.”
For Table 2, we now state: “Comparison of AI scores to student scores with the image-excluded data set, and the associated significance values described above, are displayed in Table 2, demonstrating that Bard scored worse than students on both assessment types while GPT-3.5 scored higher on exams.”
Reviewer 1 comment: The Discussion appropriately interprets the findings in relation to prior literature and clearly addresses the implications for educational practice, particularly assessment design in AI-integrated classrooms. The authors effectively link their results to Bloom’s Taxonomy and provide practical recommendations for educators. To enhance clarity and scholarly rigor, a few areas could be strengthened. First, the section would benefit from a more focused synthesis of the main findings at the outset, rather than dispersing them throughout.
Response: We thank the reviewer for the recommendation. We have altered the beginning of the Discussion section such that the first paragraph now provides a synthesis of the major overall findings. The section now states:
"Overall, the AIs performed poorly when required to analyze image and DNA-sequence based questions, as well as performing poorly at higher levels of cognitive complexity as defined by Bloom’s Taxonomy. When image and DNA-sequence questions were removed from assessments, performance improved with GPT-3.5 and 4 receiving scores higher than students. However, AI performance was still significantly lower than students at the highest tested Bloom Level (levels 4) with images and DNA-sequence excluded."
Reviewer 1 comment: Additionally, while the limitations of image and sequence-based tasks are acknowledged, the discussion could better address the study’s broader limitations, such as generalizability beyond biology or single-institution sampling.
Response: This is an excellent suggestion from the reviewer, and we have added the following language to the first paragraph of the Discussion to acknowledge these limitations:
“While this study is limited to a single course at a single institution, it provides evidence that AIs are able to perform, albeit poorly, at a collegiate level. While this study is also limited to the discipline of biology, it provides context and a roadmap to educators in a range of disciplines, encouraging the usage of graphical, image-based analysis as well as higher Bloom Level assessments to capture student learning.”
Reviewer 1 comment: The forward-looking suggestions (e.g., AI as a reading companion) are thoughtful but could be better supported with emerging empirical evidence. A brief cautionary note on overreliance on AI-generated content would also add balance.
Response: We thank the reviewer for this suggestion, as we did not include the studies to support these statements. We have changed the language in lines 586–602 to include empirical evidence, more specific examples, and a cautionary note on AI over-usage, citing Melisa et al. (2025) regarding concerns about student motivation for evaluation and reflection.
“One potential application at the collegiate level is usage of AI as a reading companion for primary literature. Initial studies have investigated the effective usage of AI-based reading assistants such as ExplainPaper and SciSpace for first-year college students (Watkins 2025). The investigation revealed that AI reading assistants are beneficial for students with reading comprehension difficulties and for non-native speakers (Watkins 2025). Other AI tools are currently being explored for higher education English language learning (Zhai & Wibowo, 2023; Pan, Guo & Lai, 2024) and as writing assistance tools that provide feedback (Nazari, Shabbir & Setiawan, 2021). However, there is currently little development of AI-supported reading tools for primary literature. It is often challenging for undergraduate STEM students to begin reading primary literature, as it is written for an expert-level audience with extensive jargon and assumed knowledge (Kozeracki et al. 2006; Hoskins, Lopatto & Stevens, 2011). Other studies have investigated barriers to AI-tool usage, finding that students avoid AI in educational settings due to a lack of familiarity or a preference for traditional methods. Few students, however, expressed negative opinions about the value and utility of AI (Hanshaw and Sullivan 2025). Over-reliance on AI can hinder student motivation for skill acquisition, critical evaluation, and self-reflection (Melisa et al. 2025).”
Reviewer 1 comment: The conclusion effectively reiterates the central findings, that while generative AI tools can perform well on lower-order cognitive tasks, they struggle with higher-order assessments and tasks requiring image or sequence analysis. The call for educators to redesign assessments to focus on cognitive complexity is both appropriate and timely. However, the conclusion would benefit from a clearer, more concise summary of the study’s practical implications. Currently, key takeaways are somewhat embedded within broader reflections. The authors may consider ending with a sharper statement emphasizing how the findings can inform AI-integrated curriculum and assessment reform.
Response:
We thank the reviewer for this important point; we agree that the manuscript ended on the topic of AI usage rather than recapping the major conclusions of the study and informing AI-integrated curriculum. We have rewritten the final paragraph of the Discussion to now state:
"In this study, we investigated the ability of four AIs to perform on collegiate-level curriculum, specifically measuring their ability with introductory biology assessments. The four AIs, GPT-4, GPT-3.5, Bing and Bard were all capable of receiving a passing grade (60% or above) (Table 4) in BIO100, albeit with very poor performance. The low scores were linked to an inability to interpret images, including graphical representation of data and DNA sequence. When assessment questions containing this content was removed from analysis, AI performance improved significantly with some AIs (GPT-4 and GPT-3.5) receiving higher final grades than students. These results are concerning for reasons of academic integrity and the inappropriate usage of these tools to aid student performance. However, college educators should consider the limitations of these AIs to perform at higher levels of cognitive complexity. When challenged with higher Bloom Levels (apply and analyze), performance was significantly weaker. These results together suggest that college educators should continue to reinforce assessments challenging their students to analyze and apply knowledge, particularly through assessment that involve images, graphs, and in the case of biology, DNA sequence. Through these findings, our biology curriculum is shifting towards final presentations, poster sessions, and other oral formats for students to showcase their ability to analyze and interpret biological concepts and data. Educators in other fields beyond biology can similarly leverage these approaches by configuring assessments to focus on higher-order cognitive tasks such as analysis, application, evaluation and creation. This approach would be more effective for educators if these assessments included image-based content. Another approach includes basing assessments on novel scenarios and applications that fall outside of AI training data. In our biology curriculum, we are utilizing questions that probe student understanding with data that does not fit any known system or example. The future of higher education will be undoubtedly impacted by continuously improving AIs, and embracing their potential while mitigating negative impacts will be critical for success in student learning."
Reviewer 2 Report
Comments and Suggestions for Authors
Suggestions
- Critical Reflection on Limitations – Consider discussing more explicitly how excluding ~36% of questions (pp. 4–5) may artificially elevate AI scores. This is especially relevant when recommending that educators focus on higher-order tasks and image-based questions.
- Implications for Assessment Design – The discussion notes that GPT-3.5 and GPT-4 excel at lower Bloom levels but struggle with analysis (Level 4) and image-based interpretation (p. 10, Table 3). Strengthen recommendations for future educators by outlining specific strategies (e.g., integrating visual data interpretation, iterative assessments).
- Depth of Analysis for Final Paper Results – Expand on why GPT-4’s paper (scoring 85%) was near-student level, while Bing and Bard performed poorly (pp. 11–12). This could provide insight into differences in model architecture or output depth.
- Generalizability Beyond Biology – Briefly comment on whether these results may transfer to other STEM disciplines or remain biology-specific, as this will broaden the article’s relevance.
Comments
This is a well-designed and clearly written paper that makes a novel contribution to understanding generative AI performance in biology education. The introduction is well-contextualized, and the methods and results are robust, with appropriate statistical analysis and clear visualizations (Figures 1–2, Tables 1–5). To strengthen the paper, please (1) discuss more explicitly how excluding ~36–37% of image/DNA-based questions may elevate AI scores and affect generalizability; (2) strengthen recommendations for assessment design by suggesting concrete strategies for integrating visual data interpretation and higher-order cognitive tasks (Table 3 shows GPT-3.5/GPT-4 weakness at Bloom Level 4); (3) expand analysis of final paper results to explain why GPT-4 produced near-student-level work while Bing and Bard performed poorly (pp. 11–12); and (4) comment briefly on whether results may extend to other STEM disciplines to broaden relevance.
I look forward to reviewing your paper again.
Author Response
We would like to thank Reviewer 2 for their careful reading of our manuscript and their insightful feedback to improve the work. The Reviewer asked for expansion and explanation on several points and we have made these changes and additions. Please see below for the specific edits. Thank you!
Reviewer 2 comment:
This is a well-designed and clearly written paper that makes a novel contribution to understanding generative AI performance in biology education. The introduction is well-contextualized, and the methods and results are robust, with appropriate statistical analysis and clear visualizations (Figures 1–2, Tables 1–5). To strengthen the paper, please
(1) discuss more explicitly how excluding ~36–37% of image/DNA-based questions may elevate AI scores and affect generalizability;
Response:
We thank the reviewer for this important suggestion. We agree that it is important to note that the removal of a large percentage of assessment questions could over-inflate AI scores. The removal of these questions may also improve generalizability to other fields. For example, the usage of DNA sequence is highly specialized to biology, and image-based graphical representations of data often tend to be more represented in the sciences. Removing these questions can provide more generalizable information on AI ability to answer text-based questions. We have added the following language to the Discussion section of the manuscript in lines 521–526:
"The removal of these questions constituted over 30% of assessment content, and thus could over-inflate AI performance. Their removal allows for greater generalizability as DNA sequence is specific to biology, and image-based graphs and data analysis tend to be more represented in the sciences. By removing these questions, we are able to better assess purely text-based questions for a more accurate representation of AI ability. "
Reviewer 2 comment: (2) strengthen recommendations for assessment design by suggesting concrete strategies for integrating visual data interpretation and higher-order cognitive tasks (Table 3 shows GPT-3.5/GPT-4 weakness at Bloom Level 4);
Response: We thank the reviewer for their suggestions. We have written a new final paragraph in the Discussion section to better recap the findings of the investigation and to provide concrete strategies for college educators:
"In this study, we investigated the ability of four AIs to perform on collegiate-level curriculum, specifically measuring their ability with introductory biology assessments. The four AIs, GPT-4, GPT-3.5, Bing and Bard were all capable of receiving a passing grade (60% or above) (Table 4) in BIO100, albeit with very poor performance. The low scores were linked to an inability to interpret images, including graphical representation of data and DNA sequence. When assessment questions containing this content was removed from analysis, AI performance improved significantly with some AIs (GPT-4 and GPT-3.5) receiving higher final grades than students. These results are concerning for reasons of academic integrity and the inappropriate usage of these tools to aid student performance. However, college educators should consider the limitations of these AIs to perform at higher levels of cognitive complexity. When challenged with higher Bloom Levels (apply and analyze), performance was significantly weaker. These results together suggest that college educators should continue to reinforce assessments challenging their students to analyze and apply knowledge, particularly through assessment that involve images, graphs, and in the case of biology, DNA sequence. Through these findings, our biology curriculum is shifting towards final presentations, poster sessions, and other oral formats for students to showcase their ability to analyze and interpret biological concepts and data. Educators in other fields beyond biology can similarly leverage these approaches by configuring assessments to focus on higher-order cognitive tasks such as analysis, application, evaluation and creation. This approach would be more effective for educators if these assessments included image-based content. Another approach includes basing assessments on novel scenarios and applications that fall outside of AI training data. In our biology curriculum, we are utilizing questions that probe student understanding with data that does not fit any known system or example. The future of higher education will be undoubtedly impacted by continuously improving AIs, and embracing their potential while mitigating negative impacts will be critical for success in student learning."
Reviewer 2 comment: (3) expand analysis of final paper results to explain why GPT-4 produced near-student-level work while Bing and Bard performed poorly (pp. 11–12);
Response: We thank the reviewer for this suggestion. We needed to better explore why this is the case. We were able to find other empirical evidence for the increased performance of GPT-4 over other models and have cited these studies and their contexts, as well as included a description of GPT-4’s increased parameter count over GPT-3.5 (1.76 trillion parameters vs. 175 billion parameters), which provides the model with enhanced creativity, reasoning, accuracy, and contextual awareness (Annepaka and Pakray 2025). The new section is located at lines 425–437 and states:
"The paper produced by GPT-4 was difficult to distinguish from similar work submitted by students and sufficiently met the expectations of the assignment. GPT-4’s ability to replicate expert-level explanations has been documented in other comparison studies, including higher performance than GPT-3.5 and other AIs on PhD-level medical questions (Khosravi et al. 2024), clinical decision-making (Lahat et al. 2024), emergency medicine examination (Liu et al. 2024) and collegiate-level coding (Yeadon et al. 2024). GPT-4’s outsized performance is likely due to its increased parameters. GPT-4 is based on 1.76 trillion parameters, a ten-fold increase from GPT-3.5’s 175 billion parameters, which provides the model with enhanced creativity, reasoning, accuracy, and contextual awareness (Annepaka and Pakray 2025). This increased capacity likely allows GPT-4 to produce writing near student level, unlike GPT-3.5, Bard and Bing which were clearly distinguishable from student work."
Reviewer 2 comment: (4) comment briefly on whether results may extend to other STEM disciplines to broaden relevance.
Response: We thank the reviewer for their suggestion, and agree that we needed to broaden our scope to other fields and include concrete examples for educators in all fields. We have added the following paragraph to the end of the Discussion to sharpen the emphasis on how the findings can inform AI-integrated curriculum and assessment reform:
"In this study, we investigated the ability of four AIs to perform on collegiate-level curriculum, specifically measuring their ability on introductory biology assessments. The four AIs, GPT-4, GPT-3.5, Bing, and Bard, were all capable of receiving a passing grade (60% or above; Table 4) in BIO100, albeit with very poor performance. The low scores were linked to an inability to interpret images, including graphical representations of data and DNA sequence. When assessment questions containing this content were removed from the analysis, AI performance improved significantly, with some AIs (GPT-4 and GPT-3.5) receiving higher final grades than students. These results are concerning for reasons of academic integrity and the inappropriate usage of these tools to aid student performance. However, college educators should consider the limited ability of these AIs to perform at higher levels of cognitive complexity. When challenged with higher Bloom Levels (apply and analyze), performance was significantly weaker. Together, these results suggest that college educators should continue to reinforce assessments challenging their students to analyze and apply knowledge, particularly through assessments that involve images, graphs, and, in the case of biology, DNA sequence. Based on these findings, our biology curriculum is shifting towards final presentations, poster sessions, and other oral formats for students to showcase their ability to analyze and interpret biological concepts and data. Educators in fields beyond biology can similarly leverage these approaches by configuring assessments to focus on higher-order cognitive tasks such as analysis, application, evaluation, and creation. This approach would be more effective for educators in all fields if these assessments included image-based content. Another approach is basing assessments on novel scenarios and applications that fall outside of AI training data. In our biology curriculum, we are utilizing questions that probe student understanding with data that does not fit any known system or example. The future of higher education will undoubtedly be impacted by continuously improving AIs, and embracing their potential while mitigating negative impacts will be critical for success in student learning."
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I find that the manuscript has been substantially improved in clarity, structure, and depth, effectively addressing previous comments, and I consider it acceptable for publication in its present form.

