How the Choice of LLM and Prompt Engineering Affects Chatbot Effectiveness
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
1. In the abstract, LLMs should be the abbreviation for large language models.
2. The author needs to quantify the experimental results in the abstract to further illustrate the effectiveness of the research.
3. The author mentions safety in lines 82 and 143 of the manuscript but lacks the necessary literature discussion. It is recommended that the author add relevant work: On the vulnerability of safety alignment in open-access LLMs; Exploring Clean Label Backdoor Attacks and Defense in Language Models; Jailbreaking attack against multimodal large language model. Any papers recommended in the report are for reference only. They are not mandatory. You may cite and reference other papers related to this topic.
4. The introduction section of the manuscript needs further elaboration on the contributions of this paper.
5. In the results of Table 5, the accuracy for 'purchase phone number' and 'transfer phone number' is 0. The author should further explain the reasons for this.
6. The resolution of Figure 4 needs further adjustment.
Author Response
Information: Due to the additional sentences resulting from the reviewers' comments, I had to rephrase the entire abstract to fit within the 200-word limit. Therefore, the previous version has been completely crossed out and rewritten.
Comment 1: In the abstract, LLMs should be the abbreviation for large language models.
Response 1: Thank you for pointing this out. I agree with this comment. Therefore, I have updated the abbreviation "language models" to "large language models (LLMs)" in the abstract. This change can be found on line 4.
Comment 2: The author needs to quantify the experimental results in the abstract to further illustrate the effectiveness of the research.
Response 2: Thank you for your suggestion. I have quantified the experimental results in the abstract by including specific accuracy improvements. This change can be found on lines 7-9.
Comment 3: The author mentions safety in lines 82 and 143 of the manuscript but lacks the necessary literature discussion. It is recommended that the author add relevant work: On the vulnerability of safety alignment in open-access LLMs; Exploring Clean Label Backdoor Attacks and Defense in Language Models; Jailbreaking attack against multimodal large language model. Any papers recommended in the report are for reference only. They are not mandatory. You may cite and reference other papers related to this topic.
Response 3: Thank you for your suggestion. I have added relevant literature to discuss the safety aspects of large language models. This change can be found on lines 122-131.
Comment 4: The introduction section of the manuscript needs further elaboration on the contributions of this paper.
Response 4: Thank you for your suggestion. I have added a detailed elaboration on the contributions of this paper. This change can be found in lines 73-83.
Comment 5: In the results of Table 5, the accuracy for 'purchase phone number' and 'transfer phone number' is 0. The author should further explain the reasons for this.
Response 5: Thank you for your comment. I have provided a detailed explanation for the accuracy results in Table 5. This explanation can be found in the revised manuscript on lines 342-355.
Comment 6: The resolution of Figure 4 needs further adjustment.
Response 6: Thank you for your suggestion. I have regenerated all the charts using Microsoft Excel, which allows for better quality. Additionally, all figures have been saved in PNG format instead of JPG, as this provides better quality for charts and diagrams. This change ensures that the figures are clearer and more readable.
Reviewer 2 Report
Comments and Suggestions for Authors
In this paper, the authors test various LLMs, prompt formats, and commands on Rasa Pro to optimize chatbot performance. They find that structured formats like JSON and YAML significantly improve response accuracy. The study demonstrates that effective prompt engineering enhances chatbot efficiency, reducing reliance on larger, more computationally expensive models. However, to strengthen the paper, the following revisions are recommended:
- Justification for Rasa Pro – Clearly explain why Rasa Pro was chosen as the platform for this research.
- Model Selection Rationale – Add a paragraph detailing why these specific models were selected and include a comparative table summarizing their characteristics.
- Consistency in Model Naming – Standardize the model names throughout the paper to avoid confusion. For instance, the abstract and conclusion use "Gemini-1.5-Flash-8B" and "Gemma2-9B-IT" (Line 6), while the results section uses "Gemini1.5F-8B" and "Gemma-9B" (Table 2).
- Broader Applicability – Discuss how the research findings can be extended to other conversational AI systems.
These revisions will enhance clarity and improve the paper’s readiness for publication.
Author Response
Information: Due to the additional sentences resulting from the reviewers' comments, I had to rephrase the entire abstract to fit within the 200-word limit. Therefore, the previous version has been completely crossed out and rewritten.
Comment 1: Justification for Rasa Pro - Clearly explain why Rasa Pro was chosen as the platform for this research.
Response 1: Thank you for your suggestion. I have provided a clear justification for choosing Rasa Pro as the platform for this research. This change can be found on lines 223-231 of the revised manuscript.
Comment 2: Model Selection Rationale - Add a paragraph detailing why these specific models were selected and include a comparative table summarizing their characteristics.
Response 2: Thank you for your valuable feedback. I have added a paragraph detailing the rationale behind the selection of the specific models. This change can be found on lines 309-314 of the revised manuscript. Regarding the comparative table, I believe that Table 2 already provides a comprehensive overview of the LLMs used in the research, including their key characteristics. I hope this meets your expectations and addresses your concerns.
Comment 3: Consistency in Model Naming - Standardize the model names throughout the paper to avoid confusion. For instance, the abstract and conclusion use "Gemini-1.5-Flash-8B" and "Gemma2-9B-IT" (Line 6), while the results section uses "Gemini1.5F-8B" and "Gemma-9B" (Table 2).
Response 3: Thank you for your feedback. In the abstract, I used the full names "Gemini-1.5-Flash-8B" and "Gemma2-9B-IT" for clarity, as the abstract is intended to function independently of the full document. Throughout the rest of the document, I consistently used the shortened names "Gemini1.5F-8B" and "Gemma-9B," which are defined in Table 2. In the description of the table columns, I clearly stated that these working names would be used throughout the document for easier identification. I hope this approach maintains clarity and consistency across the manuscript.
Comment 4: Broader Applicability - Discuss how the research findings can be extended to other conversational AI systems.
Response 4: Thank you for your valuable feedback. I have expanded the discussion on the broader applicability of the research findings to other conversational AI systems. This addition highlights how the principles of prompt engineering used in this study can be applied to various platforms, particularly those that allow for the replacement of LLMs and modification of prompts. This change can be found in the Conclusions section on lines 472-482 of the revised manuscript.
Reviewer 3 Report
Comments and Suggestions for Authors
Thank you for inviting me to review this manuscript. The title is "How the Choice of LLM and Prompt Engineering Affects Chatbot Effectiveness", submitted to Electronics. The topic is useful for scholars studying chatbot performance and prompt engineering. I have some observations and suggestions for this manuscript that I would like to share with the authors:
Abstract
The authors may add 1-2 sentences for the background at the beginning.
More information about the data, such as data information, time period of data collection, can be introduced in the middle of an abstract, especially for "a set of real conversations with customers of a mobile phone operator" in line 156. These are very important.
Give more details about the theoretical framework.
Practical implications can be added at the end of the abstract.
Keywords
The authors include 9 keyword phrases. Perhaps the most important 4-5 phrases can be included to keep the reader focused.
Introduction
Please separate the introduction from the literature review.
Some paragraphs are short. Please join them together for better paragraph development.
Try not to use the present tense in the manuscript. Please use the present tense more often.
Please state the aim of the research more clearly in the introduction.
Add an overall structure paragraph at the end of the introduction.
Literature Review
Authors may consider using a table to summarise the results of the literature review. This table contains several columns, including authors, year, methods, data size, research objectives and key findings. The contents of this table can be organised chronologically or by topic. The current study is placed in the last row of this table so that readers can clearly understand the development of this technology and how the current study fits into the field.
Authors can improve the paragraph development for the literature review. Some sentence patterns of the topic sentences are very repetitive, e.g. "This research introduces ..." in line 92; "This article introduces ..." in line 101; "This study investigates ..." in line 110; "This study examines ..." in line 122; and "This study investigates the ..." in line 129. Authors can start topic sentences with different subjects, noun phrases or even circumstances.
The authors can strengthen the connection between the above studies, how they relate to each other and how they relate to the present study.
At the end of the literature review, a clearer research gap can be identified and how the present study fills the research gap.
Methodology
The sub-heading would be Section 3 Research Methodology.
Authors can begin by outlining the research approach of the study, e.g. quantitative, qualitative or mixed methods. Which parts are quantitative, which are qualitative, and why.
Authors may want to include a discussion of inter-rater agreement on data interpretation. If your team members did not agree on these aspects, how did you resolve any disagreements?
Findings and Discussion
The authors mainly discuss their own findings and data on pages 8 to 11. They could refer more to the studies reviewed in the literature review section. It would be good to link the present study to previous work and show how it develops from the field. The authors could provide a more detailed comparison of their findings with others' work, e.g. what are the similarities and differences, and explore the underlying reasons. This will give the reader a better understanding of the results and connect them to the field.
Conclusion
Authors might consider using a table to summarise the main findings of the study so that the reader has a quick reference.
Comments on the Quality of English Language
Minor editing is required before publication. However, the overall meaning is concise. The flow of information is smooth and easy to follow.
Author Response
Information: Due to the additional sentences resulting from the reviewers' comments, I had to rephrase the entire abstract to fit within the 200-word limit. Therefore, the previous version has been completely crossed out and rewritten.
Comment 1: (Abstract) The authors may add 1-2 sentences for the background at the beginning.
Response 1: Thank you for your suggestion. I have added sentences that provide background at the beginning of the abstract. This change can be found on lines 1-2.
Comment 2: (Abstract) More information about the data, such as data information, time period of data collection, can be introduced in the middle of an abstract, especially for "a set of real conversations with customers of a mobile phone operator" in line 156. These are very important.
Response 2: Thank you for your suggestion. I have added more information about the data in the abstract: "The study utilized a dataset of 400 sample test phrases created based on real customer service conversations with a mobile phone operator's customers." This change can be found on lines 10-12.
Additionally, I have included detailed information about the dataset in the Research Methodology section. This change can be found on lines 251-254.
Comment 3: (Abstract) Give more details about the theoretical framework.
Response 3: Thank you for your suggestion. I have provided more details about the theoretical framework in the abstract. This change can be found on lines 14-15.
Comment 4: (Abstract) Practical implications can be added at the end of the abstract.
Response 4: Thank you for your suggestion. I have added practical implications at the end of the abstract. This change can be found on lines 16-18.
Comment 5: (Keywords) The authors include 9 keyword phrases. Perhaps the most important 4-5 phrases can be included to keep the reader focused.
Response 5: Thank you for your suggestion. I have reduced the number of keyword phrases to focus on the most important ones. The revised keywords are: "Large language models; Optimization methods; Chatbots; Prompt engineering." This change can be found on lines 32-34.
Comment 6: (Introduction) Please separate the introduction from the literature review.
Response 6: Thank you for your suggestion. The introduction has been separated from the literature review. The literature review is now presented in a separate section titled "Related Work".
Comment 7: (Introduction) Some paragraphs are short. Please join them together for better paragraph development.
Response 7: Thank you for your suggestion. The paragraphs that required it have been joined together, particularly in the "Related Work" section containing the literature review, to improve paragraph development and coherence.
Comment 8: (Introduction) Try not to use the present tense in the manuscript. Please use the present tense more often.
Response 8: I inferred that the suggestion was to avoid using the future tense and instead use the present tense. Thank you for your feedback. I have made the necessary adjustments in several places.
Comment 9: (Introduction) Please state the aim of the research more clearly in the introduction.
Response 9: Thank you for your suggestion. I have addressed this by clearly stating the aim of the research in the introduction. This change can be found on lines 73-83.
Comment 10: (Introduction) Add an overall structure paragraph at the end of the introduction.
Response 10: Thank you for your suggestion. I have added a paragraph outlining the structure of the paper at the end of the introduction. This change can be found on lines 84-86.
Comment 11: (Literature Review) Authors may consider using a table to summarise the results of the literature review. This table contains several columns, including authors, year, methods, data size, research objectives and key findings. The contents of this table can be organised chronologically or by topic. The current study is placed in the last row of this table so that readers can clearly understand the development of this technology and how the current study fits into the field.
Response 11: Thank you for your suggestion. While I understand the value of summarizing the literature review in a table format, not all the literature discussed in the review focuses solely on the chronological development of the field. To maintain coherence and provide a comprehensive narrative, I have chosen to retain the descriptive format. This approach allows for a more nuanced discussion of the various aspects and contributions of each study. I believe this will better serve the purpose of illustrating the current state of research and how the present study fits into the broader context.
Comment 12: (Literature Review) Authors can improve the paragraph development for the literature review. Some sentence patterns of the topic sentences are very repetitive, e.g. "This research introduces ..." in line 92; "This article introduces ..." in line 101; "This study investigates ..." in line 110; "This study examines ..." in line 122; and "This study investigates the ..." in line 129. Authors can start topic sentences with different subjects, noun phrases or even circumstances.
Response 12: Thank you for your comment. I have revised the paragraph development in the "Related Work" section to improve variety in sentence patterns.
Comment 13: (Literature Review) The authors can strengthen the connection between the above studies, how they relate to each other and how they relate to the present study.
Response 13: Thank you for your valuable feedback. I have addressed this by adding connecting sentences throughout the "Related Work" section to clarify how the reviewed studies relate to each other and to the present study.
Comment 14: (Literature Review) At the end of the literature review, a clearer research gap can be identified and how the present study fills the research gap.
Response 14: Thank you for your valuable feedback. I have addressed this by adding a section at the end of the "Related Work" to clearly identify the research gap and explain how the present study addresses this gap. This change can be found on lines 217-223.
Comment 15: (Methodology) The sub-heading would be Section 3 Research Methodology. Authors can begin by outlining the research approach of the study, e.g. quantitative, qualitative or mixed methods. Which parts are quantitative, which are qualitative, and why. Authors may want to include a discussion of inter-rater agreement on data interpretation. If your team members did not agree on these aspects, how did you resolve any disagreements?
Response 15: I greatly appreciate this insightful suggestion. I have addressed this by restructuring the "Research Methodology" section to include sub-sections that outline the mixed-methods approach, detailing both the quantitative and qualitative methods used in the study. Additionally, I have included a discussion on inter-rater agreement to ensure the reliability of the qualitative analysis. These changes can be found in the updated "Research Methodology" section on lines 247-277.
Comment 16: (Findings and Discussion) The authors mainly discuss their own findings and data on pages 8 to 11. They could refer more to the studies reviewed in the literature review section. It would be good to link the present study to previous work and show how it develops from the field. The authors could provide a more detailed comparison of their findings with others' work, e.g. what are the similarities and differences, and explore the underlying reasons. This will give the reader a better understanding of the results and connect them to the field.
Response 16: Thank you for your valuable feedback. I have addressed this by incorporating references to previous studies reviewed in the "Related Work" section. Specifically, I have linked our findings to the work of He et al. (2024) and Bocklisch et al. (2024), highlighting the similarities and differences, and exploring the underlying reasons. These changes can be found in the "Results and Analysis" section on lines 394-397 and 402-406.
Comment 17: (Conclusion) Authors might consider using a table to summarise the main findings of the study so that the reader has a quick reference.
Response 17: Thank you for your valuable suggestion. I have addressed this by including a table in the "Conclusions" section to summarize the main findings of the study. This table, labeled as Table 10, provides a quick reference for the reader. These changes can be found in the "Conclusions" section on lines 448-462.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
Thank you for your reply; my concern has been addressed.
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have addressed most of my concerns. I have no further comments.
All the best.