Advanced Prompt Engineering in Emergency Medicine and Anesthesia: Enhancing Simulation-Based e-Learning
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Medical education is rapidly evolving with the integration of artificial intelligence (AI), particularly through the application of generative AI to create dynamic learning environments.
The authors examine the transformative role of prompt engineering in enhancing simulation-based learning in emergency medicine. By enabling the generation of realistic, context-specific clinical case scenarios, prompt engineering fosters critical thinking and decision-making skills among medical trainees.
To guide systematic implementation, they introduce the PROMPT⁺ Framework, a structured methodology for designing, evaluating, and refining prompts in AI-driven simulations, while incorporating essential ethical considerations. Furthermore, they emphasize the importance of developing specialized AI models tailored to regional guidelines, standard operating procedures, and educational contexts to ensure relevance and alignment with current standards and practices.
Their paper suggests how AI-driven simulations can improve scalability and adaptability in medical training, equipping healthcare professionals with the necessary skills to address complex real-world challenges while advancing educational outcomes across all stages of their careers.
The contribution is interesting. I have some minor comments for the authors, offered in a purely academic spirit.
1) In my opinion, the introduction is a bit sparse and should be expanded with contributions from other studies, perhaps reviews.
2) The presentation also loses its way when the sections are introduced and should be further strengthened.
3) The study does not follow a standard structure (introduction, methods, results, discussion, etc.), but the academic editor can intervene on this if desired, since the organization is clear to me.
4) Sections 2 and 3 (“prompt techniques” and “use case”) are clear but should be smoothed out to make them more functional for Section 4, which reports the essential core of the study.
5) The results are not present. Are they the framework itself?
6) The limitations are missing from the Discussion.
7) Given the type of study, a section on future developments and applications could be included.
Author Response
Comments 1: In my opinion, the introduction is a bit sparse and should be expanded with contributions from other studies, perhaps reviews.
Response 1: Thank you for your insightful comment. We have expanded the Introduction by integrating additional systematic reviews on AI-driven medical education, particularly regarding its role in clinical decision-making, competency-based training, and prompt engineering methodologies.
Comments 2: The presentation also loses its way when the sections are introduced and should be further strengthened.
Response 2: Thank you for your valuable feedback. We have strengthened the Introduction’s structure by improving the flow between sections and explicitly outlining the organization of the paper. This ensures a clearer transition from the problem statement to the research objectives and provides a structured preview of the following sections.
Comments 3: The study does not follow a standard structure (introduction, methods, results, discussion, etc.), but the academic editor can intervene on this if desired, since the organization is clear to me.
Response 3: Thank you for your observation. We have consciously deviated from the traditional IMRaD structure to better align with the methodological nature of our study. Since our primary contribution is the PROMPT⁺ Framework, we structured the paper to emphasize the development, application, and implications of this approach. This allows for a more coherent presentation of the conceptual and practical aspects of AI-driven simulation design. However, we appreciate your consideration that the academic editor may provide additional guidance on this matter.
Comments 4: Sections 2 and 3 (“prompt techniques” and “use case”) are clear but should be smoothed out to make them more functional for Section 4, which reports the essential core of the study.
Response 4: We appreciate your recognition of the clarity of Sections 2 and 3 and your suggestion to enhance their connection to Section 4, where the PROMPT⁺ Framework is introduced.
To improve the logical flow, we have refined the transitions between these sections, ensuring a smoother progression from the description of individual prompt techniques (Section 2) to their practical application in a clinical use case (Section 3), and finally to their structured integration within the PROMPT⁺ Framework (Section 4).
Comments 5: The results are not present. Are they the framework itself?
Response 5: Thank you for your thoughtful comments. We have deliberately structured the paper differently from the traditional IMRaD format to better align with its methodological focus. Since our primary contribution is the PROMPT⁺ Framework, the study is structured to highlight its development, application, and implications, rather than presenting results in a conventional experimental format.
To clarify, the framework itself serves as the main outcome of our research, synthesizing best practices into a structured methodology for AI-driven medical simulations. We have adjusted the manuscript to make this explicit and ensure that the role of the PROMPT⁺ Framework as the core result is clear to the reader.
Comments 6: The limitations are missing from the Discussion.
Response 6: Thank you for your valuable feedback. We have now explicitly incorporated a detailed Limitations section within the Discussion, addressing key concerns such as bias in large language models, the generalizability of the PROMPT⁺ Framework, the need for empirical validation, and ethical considerations in AI-driven medical simulations. These additions ensure a balanced discussion of our study’s findings and highlight important areas for future research.
Comments 7: Given the type of study, a section on future developments and applications could be included.
Response 7: Thank you for your insightful suggestion. We have now included a dedicated section on future developments and applications within the Discussion, outlining potential research directions.
Reviewer 2 Report
Comments and Suggestions for Authors
The authors' research topic is very interesting and sits squarely at the intersection of medicine and engineering. The medical field demands particular rigor. Does the PROMPT framework proposed by the authors have specific experiments and results? At the end of the first chapter, starting from line 58, the authors should add a list of the main contributions and a preview of the remaining chapters. The authors must create a new second chapter on related work, explaining the main differences between existing research and this study and the position of this study in the field. What does the plus sign in the title of Chapter 4 mean? What does "+:" mean in line 258? At present, the reviewer's overall impression after reading the article is that there is a lot of exposition, but it lacks a theoretical basis and specific experimental results. In addition, it should be clarified which parts are existing research and which are original to the authors; these two parts are best explained in separate chapters. Thanks.
Author Response
Comments 1: The authors' research topic is very interesting and sits squarely at the intersection of medicine and engineering. The medical field demands particular rigor. Does the PROMPT framework proposed by the authors have specific experiments and results? At the end of the first chapter, starting from line 58, the authors should add a list of the main contributions and a preview of the remaining chapters. The authors must create a new second chapter on related work, explaining the main differences between existing research and this study and the position of this study in the field. What does the plus sign in the title of Chapter 4 mean? What does "+:" mean in line 258? At present, the reviewer's overall impression after reading the article is that there is a lot of exposition, but it lacks a theoretical basis and specific experimental results. In addition, it should be clarified which parts are existing research and which are original to the authors; these two parts are best explained in separate chapters. Thanks.
Response 1: Thank you for your detailed and constructive feedback. We appreciate your emphasis on methodological rigor and the need for a clearer differentiation between existing research and our original contributions. To strengthen the clarity and structure of the paper, we have implemented several key revisions.
As this study is primarily conceptual, focusing on the development of the PROMPT⁺ Framework rather than empirical validation, we do not present experimental results. The framework has been designed based on practical insights and real-world application within AI-driven medical simulations, drawing from our experience in medical education and prompt engineering.
We specifically chose Retrieval-Augmented Generation, Chain-of-Thought Prompting, and Role-Specific Prompting. Other advanced prompting techniques were considered but ultimately not selected due to their limitations in this context. We prioritized techniques that ensure structured reasoning, adherence to clinical guidelines, and role-specific contextualization, as these represent fundamental pillars of effective medical simulations. Structured reasoning through Chain-of-Thought Prompting supports step-by-step clinical decision-making, preventing premature conclusions and reinforcing diagnostic logic. Contextual augmentation through Retrieval-Augmented Generation ensures that all generated responses align with evidence-based guidelines, maintaining medical accuracy and reliability. Role-Specific Prompting enables realistic professional interactions by aligning the AI’s responses with the responsibilities and communication style of specific medical roles.
Other approaches either introduced unnecessary complexity or lacked adaptability.
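For illustration, the following minimal sketch (ours, not taken from the manuscript; the role, scenario, and guideline excerpt are hypothetical placeholders) shows how the three selected techniques can be combined into a single structured prompt:

```python
# Minimal sketch (not from the manuscript) of combining the three chosen
# techniques; role, scenario, and guideline text are hypothetical.

GUIDELINE_EXCERPT = (  # contextual augmentation: verified reference text
    "Suspected anaphylaxis: administer intramuscular epinephrine promptly, "
    "per current resuscitation guidance."
)

def build_simulation_prompt(role: str, scenario: str) -> str:
    """Compose one structured prompt from the three techniques."""
    return "\n".join([
        # Role-Specific Prompting: fix the AI's professional perspective.
        f"You are acting as a {role} in a training simulation.",
        # Contextual augmentation: anchor responses to retrieved guideline text.
        f"Base every recommendation on this guideline excerpt: {GUIDELINE_EXCERPT}",
        # Chain-of-Thought Prompting: require step-by-step clinical reasoning.
        "Reason step by step: assessment, differential diagnosis, action, re-evaluation.",
        "Do not state a final diagnosis before completing each step.",
        f"Scenario: {scenario}",
    ])

if __name__ == "__main__":
    print(build_simulation_prompt(
        role="emergency physician",
        scenario="Adult with acute dyspnea and urticaria after a new antibiotic.",
    ))
```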
While we acknowledge the importance of empirical validation, it falls outside the scope of this work. However, we have explicitly addressed this limitation in the Discussion and outlined future directions, including the need for controlled studies to assess the educational impact of structured prompt engineering.
To improve clarity, we have added an explicit preview of the paper’s structure. We have carefully considered your suggestion to include a Related Work section but have opted against adding a separate section. Instead, we have woven relevant comparisons and contextual references throughout the manuscript to maintain a more fluid structure while ensuring that the study is well-positioned within the existing body of research.
Regarding your question about the “+” component in PROMPT⁺, this represents the ethical and reflective dimension of the framework, ensuring that AI-generated simulations incorporate considerations such as bias mitigation, human oversight, and accountability. We have clarified this aspect in Section 4 to ensure that its purpose is well-defined.
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The reviewer does not deny the efforts made by the authors in this study. The topic and direction of this study are novel and interesting. However, if the development of the framework is not evaluated (especially with specific parameters and specific values), the research and contribution will be incomplete. How are reviewers to judge the quality and feasibility of the methods proposed by the authors? The reviewers encourage the authors to submit academic journal papers that describe complete research projects. The authors must improve the technicality and scientificity of the research content. The different sections of the PROMPT framework in Figure 1 should be set with relevant parameters for evaluation. For example, when optimizing decision-making and reducing risks are specific goals, what are the corresponding prompts for different clinical case scenarios? Quantitative evaluation is essential. If there are any questions about the above content, the authors can refer to the opinions of other reviewers first. Thanks.
Author Response
Comments 1: The reviewer does not deny the efforts made by the authors in this study. The topic and direction of this study are novel and interesting. However, if the development of the framework is not evaluated (especially with specific parameters and specific values), the research and contribution will be incomplete. How are reviewers to judge the quality and feasibility of the methods proposed by the authors? The reviewers encourage the authors to submit academic journal papers that describe complete research projects. The authors must improve the technicality and scientificity of the research content. The different sections of the PROMPT framework in Figure 1 should be set with relevant parameters for evaluation. For example, when optimizing decision-making and reducing risks are specific goals, what are the corresponding prompts for different clinical case scenarios? Quantitative evaluation is essential. If there are any questions about the above content, the authors can refer to the opinions of other reviewers first. Thanks.
Response 1:
We sincerely appreciate your thoughtful feedback on our manuscript. Your insights have been invaluable in refining our work, and we have carefully addressed each of your concerns to enhance the scientific rigor, technical precision, and methodological clarity of our study. One of the primary concerns raised was the need for a quantitative evaluation of the PROMPT⁺ Framework to ensure its scientific validity. In response, we have introduced a structured evaluation process aligned with the core PROMPT⁺ principles. This assessment systematically examines AI-generated responses for their adherence to clinical reasoning standards, decision-making accuracy, and adaptability across diverse medical scenarios. We now provide clear evaluation metrics, including Likert-scale expert assessments to measure realism and accuracy, pre/post-test comparisons to track improvements in decision-making, and error rate tracking to assess AI reliability in recognizing and correcting incorrect learner responses. These measures allow for a direct comparison between AI-assisted simulations and traditional case-based learning, ensuring that AI-generated content meaningfully contributes to medical education.
To address your concerns regarding the technicality and scientificity of the research, we have refined our descriptions of the prompt engineering techniques used within the framework. Specifically, we now clearly delineate the roles of Role-Specific Prompting, Chain-of-Thought Prompting, and Retrieval-Augmented Generation. We added Table 2, which now provides a detailed breakdown of the evaluation criteria and measurement methods applied within the PROMPT⁺ structure. The table outlines how each evaluation metric aligns with decision-making optimization and risk reduction strategies, directly addressing the need for a more rigorous empirical foundation. In addition, we acknowledge your request for a structured validation of the framework and have now outlined a clear future study design.
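To make these measures concrete, the following illustrative sketch (ours; all values are invented sample data, not study results) shows how the three metric types named above could be computed once assessment data are collected:

```python
# Illustrative sketch (ours); all values are invented sample data.
from statistics import mean

expert_likert = [4, 5, 4, 3, 5]                       # 1-5 realism/accuracy ratings
pre_scores, post_scores = [62, 55, 70], [78, 71, 83]  # decision-making tests (%)
seeded_errors, detected_errors = 20, 17               # planted vs. AI-flagged learner errors

realism = mean(expert_likert)                          # Likert-scale expert assessment
learning_gain = mean(post_scores) - mean(pre_scores)   # pre/post-test comparison
error_recognition = detected_errors / seeded_errors    # error rate tracking

print(f"Mean expert rating: {realism:.2f}/5")
print(f"Mean pre/post gain: {learning_gain:.1f} percentage points")
print(f"Error recognition rate: {error_recognition:.0%}")
```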
We sincerely appreciate your constructive input, which has strengthened our manuscript.
Round 3
Reviewer 2 Report
Comments and Suggestions for Authors
At the end of the Abstract, what kind of necessary skills do the authors provide for medical professionals? What are the complex real-world challenges? These brief remarks are not specific. In addition, the reviewers still have doubts about the safety of generative AI in the medical field. Although the authors have presented cases and specific parameter frameworks, scientific and systematic evaluation is still needed. The authors must improve the technical and scientific nature of the research. For example, what are the specific generative AIs proposed by the authors for use in the medical field? How do the authors implement their workflows? What specific programming languages and computing resources, algorithms, models, etc. are used? What are their differences and biases? How about manpower, cost, feasibility, adaptability, replicability, scalability, etc.? Sorry, it is difficult for reviewers to accept any research without specific evaluations and quantitative numerical results. Thanks.
Comments on the Quality of English Language
The reviewers have no experience commenting on the quality of the English language. The reviewers encourage the authors to have the full text of the manuscript reviewed and confirmed by a native English speaker. Thanks.
Author Response
Comment and Response:
Dear Reviewer,
We sincerely appreciate your detailed feedback, which has helped us refine and clarify our manuscript. Below, we systematically address each of your concerns.
“At the end of the Abstract, what kind of necessary skills do the authors provide for medical professionals? What are the complex real-world challenges? These brief remarks are not specific.”
We appreciate the reviewer’s request for greater specificity in the Abstract. To clarify, we have revised the final sentence to more accurately reflect the intended scope of the PROMPT⁺ Framework. The framework aims to provide a structured approach to engaging with AI-generated medical content, offering a method for learners to reflect on clinical reasoning, critically assess AI-generated recommendations, and explore the potential role of AI-assisted decision-making in medical training workflows.
Additionally, we have refined the description of real-world challenges.
“The framework aims to provide a structured approach for engaging with AI-generated medical content, allowing learners to reflect on clinical reasoning, critically assess AI-generated recommendations, and consider the potential role of AI tools in medical training workflows. Additionally, we acknowledge certain challenges associated with the use of AI in education, such as maintaining reliability and addressing potential biases in AI outputs. Our study explores how AI-driven simulations could contribute to scalability and adaptability in medical education, potentially offering structured methods for healthcare professionals to engage with generative AI in training contexts.”
“The reviewers still have doubts about the safety of generative AI in the medical field.”
We appreciate the reviewer’s concerns regarding the safety of generative AI in medical applications. It is important to clarify that the PROMPT⁺ Framework is exclusively designed for educational purposes and does not influence real-world patient care or clinical decision-making. It will never be applied in direct patient interactions or used as a diagnostic or treatment tool.
The framework serves as a structured training tool, allowing medical professionals to refine their prompt engineering skills and critically evaluate AI-generated outputs in a controlled learning environment.
To further address safety concerns, we have incorporated multiple safeguards within the framework. As outlined in the revised manuscript, these include:
- Human-in-the-loop validation, ensuring that all AI-generated cases undergo expert review before use in training.
- Retrieval-Augmented Generation to improve accuracy by anchoring AI outputs to verified medical guidelines.
- Bias mitigation strategies, including dataset audits.
- Post-simulation debriefings, where learners critically reflect on AI-generated responses and compare them against evidence-based clinical reasoning.
Additionally, we emphasize that large language models do not have real-time access to evolving clinical guidelines, which is why the framework integrates structured dataset validation protocols to ensure that training materials remain aligned with current medical standards.
We hope this clarification reassures the reviewer that the PROMPT⁺ Framework is not designed for direct patient care, but rather as a structured educational tool to enhance AI literacy and critical evaluation skills in medical trainees.
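As a minimal, non-authoritative sketch of the human-in-the-loop safeguard listed above (the case text, review criterion, and function names are hypothetical illustrations, not the authors' implementation), the expert-review gate can be thought of as follows:

```python
# Non-authoritative sketch of the human-in-the-loop safeguard: an
# AI-generated case reaches the training pool only after expert approval.
from dataclasses import dataclass

@dataclass
class GeneratedCase:
    text: str
    approved: bool = False
    reviewer_notes: str = ""

def expert_review(case: GeneratedCase, guideline_conformant: bool, notes: str) -> GeneratedCase:
    """Expert gate: record the verdict of a human reviewer."""
    case.approved = guideline_conformant
    case.reviewer_notes = notes
    return case

def release_for_training(case: GeneratedCase) -> None:
    """Refuse to release any case that has not passed expert review."""
    if not case.approved:
        raise ValueError(f"Case blocked by expert review: {case.reviewer_notes}")
    print("Case released to the simulation pool.")

case = expert_review(
    GeneratedCase(text="AI-generated anaphylaxis scenario ..."),
    guideline_conformant=True,
    notes="Drug doses match current resuscitation guidance.",
)
release_for_training(case)
```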
“What are the specific generative AIs proposed by the authors for use in the medical field? How do the authors implement their workflows? What specific programming languages and computing resources, algorithms, models, etc. are used?”
The focus of this study is on optimizing prompt engineering techniques, rather than modifying AI model architectures. The PROMPT⁺ Framework utilizes GPT-4 (OpenAI) and BioGPT (Microsoft), each selected for their respective strengths in medical reasoning and clinical text generation.
The framework does not require proprietary AI models, dedicated computing resources, or complex infrastructure. It is designed to be accessible on publicly available AI platforms and can be used by medical students and trainees with standard computing devices and internet access.
“The focus of this study is on optimizing prompt engineering techniques to enhance generative AI-driven medical simulations. Rather than modifying the underlying model architecture, we refine the interaction between users and large language models through structured prompt design. The PROMPT⁺ Framework is designed to be compatible with various generative AI models and has been tested using GPT-4 (OpenAI) and BioGPT (Microsoft) due to their capabilities in medical text generation and clinical reasoning. However, the framework is not dependent on these specific models and can be applied with any large language model that supports structured prompt interactions, allowing for flexibility in different educational and institutional contexts.”
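For concreteness, the following hedged sketch shows how such a structured prompt could be sent to a publicly available model via the OpenAI Python SDK; it assumes an API key is set in the OPENAI_API_KEY environment variable, and the exact model name may differ by platform:

```python
# Hedged sketch: sending a structured prompt to a public model with the
# OpenAI Python SDK. Assumes OPENAI_API_KEY is set; model names vary.
from openai import OpenAI

client = OpenAI()

system_prompt = (
    "You are an emergency physician in a training simulation. "
    "Reason step by step and follow current resuscitation guidelines."
)

response = client.chat.completions.create(
    model="gpt-4",  # any chat model that supports structured prompts would do
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Begin a case: adult with acute chest pain."},
    ],
)
print(response.choices[0].message.content)
```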
“How about manpower, cost, feasibility, adaptability, replicability, scalability, etc.?”
We would like to reiterate that the PROMPT⁺ Framework is not a software tool, AI model, or programmed system, but rather a structured self-assessment methodology for learners to evaluate and refine their prompts when interacting with generative AI.
Since the framework is purely a guideline for structured prompt design, it does not require programming, institutional infrastructure, or specialized computing resources. It is designed for medical trainees to use independently with any publicly available AI model (e.g., ChatGPT) as a way to self-check and improve their AI interactions.
Because it is not a programmed application, concerns about manpower, cost, scalability, and feasibility are not relevant in the same way they would be for an AI-based system. The framework is inherently adaptable and replicable, as it provides a structured approach to prompt engineering rather than a technology that needs implementation or maintenance.
Furthermore, this study was conducted specifically for the Special Issue: Techniques and Applications in Prompt Engineering and Generative AI, where the focus is on methodologies and structured approaches rather than the development of new AI architectures or software systems. We hope this clarification provides a better understanding of the scope and intent of our work.
“Sorry, it is difficult for reviewers to accept any research without specific evaluations and quantitative numerical results.”
The evaluation methodology is systematically detailed in Table 2 (Overview of Evaluation Criteria and Measurement Methods). Each component of the PROMPT⁺ Framework is assessed using structured parameters, including:
- Expert review & Likert-scale assessments for realism and guideline adherence
- Pre/post-test comparisons for evaluating structured decision-making
- Error tracking to measure the effectiveness of structured prompting in improving AI-generated case quality
While this study focuses on the conceptualization of the framework, future research should include controlled comparative studies to assess structured versus unstructured prompting effects on diagnostic reasoning and learning outcomes.
“The different sections of the PROMPT framework in Figure 1 should be set with relevant parameters for evaluation.”
As outlined in Table 2, each component of the PROMPT⁺ Framework is already associated with specific checklist questions, corresponding prompting techniques, and structured measurement methods. These include:
- Guideline adherence scoring to ensure AI outputs align with medical standards.
- Decision-making adaptability tests to evaluate AI response refinement.
- Expert benchmarking comparisons to validate AI outputs against human decision pathways.
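To illustrate how such parameters could be organized, the sketch below pairs evaluation aspects with checklist questions and measurement methods in the spirit of Table 2; the entries are hypothetical paraphrases, not the table's actual wording:

```python
# Hypothetical pairing of evaluation aspects with checklist questions and
# measurement methods, in the spirit of Table 2 (not its actual wording).
evaluation_plan = {
    "guideline adherence": {
        "checklist_question": "Do AI outputs follow current clinical guidelines?",
        "measurement": "expert scoring against guideline text",
    },
    "decision-making adaptability": {
        "checklist_question": "Does the AI refine its response to new findings?",
        "measurement": "scenario-variation tests with scored follow-ups",
    },
    "expert benchmarking": {
        "checklist_question": "Do AI decision pathways match expert pathways?",
        "measurement": "side-by-side comparison with clinician-authored solutions",
    },
}

for aspect, spec in evaluation_plan.items():
    print(f"{aspect}: {spec['measurement']}")
```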
Round 4
Reviewer 2 Report
Comments and Suggestions for Authors
The reviewer has seen all the author's efforts, retained the previous comments, and has no further comments. The author can refer to other reviewers' opinions. The reviewer hopes that the author's research will soon provide detailed results and be used to serve the world sincerely. Thanks.
Author Response
Comments 1: The reviewer has seen all the author's efforts, retained the previous comments, and has no further comments. The author can refer to other reviewers' opinions. The reviewer hopes that the author's research will soon provide detailed results and be used to serve the world sincerely. Thanks.
Response 1: We are deeply grateful for your continued support and for recognizing our efforts to address all previous comments.