Improving Chatbot Usability Through Structured Prompt-Based Interaction Design

Bastidas-Guacho, Gisel Katerine; Azogue Martínez, Edison Patricio; Gabilanes Martínez, Marco Antonio; Moreno-Vallejo, Patricio Xavier

doi:10.3390/mti10060060

Open AccessArticle

Improving Chatbot Usability Through Structured Prompt-Based Interaction Design

by

Gisel Katerine Bastidas-Guacho

^1,*

,

Edison Patricio Azogue Martínez

¹

,

Marco Antonio Gabilanes Martínez

¹

and

Patricio Xavier Moreno-Vallejo

²

¹

Faculty of Computer Science and Electronics, Escuela Superior Politécnica de Chimborazo (ESPOCH), Riobamba 060155, Ecuador

²

Faculty of Business Administration, Escuela Superior Politécnica de Chimborazo (ESPOCH), Riobamba 060155, Ecuador

^*

Author to whom correspondence should be addressed.

Multimodal Technol. Interact. 2026, 10(6), 60; https://doi.org/10.3390/mti10060060

Submission received: 12 March 2026 / Revised: 29 April 2026 / Accepted: 13 May 2026 / Published: 28 May 2026

Download

Browse Figures

Versions Notes

Abstract

This study presents a comparative evaluation of the usability of an intelligent chatbot implemented in a childcare center management system, focusing on the impact of a prompt-enhanced conversational configuration on user experience. The Chatbot Usability Questionnaire (CUQ) was used to assess perceived usability under two conditions: a baseline configuration and an enhanced configuration incorporating role-based prompting and preprocessing mechanisms. The results indicate a substantial increase in CUQ scores, from 69 in the baseline condition to 91 in the enhanced condition, suggesting improved perceived usability. Rather than isolating prompt engineering as a standalone variable, this work evaluates a system-level design approach that integrates structured prompts, role-based contextualization, and interaction refinement strategies. This study contributes to the understanding of how prompt-enhanced conversational designs can improve response clarity, relevance, and interaction quality in multi-role environments, including parents, teachers, and administrators. The findings provide empirical evidence that such configurations are associated with more coherent and role-appropriate interactions in service-oriented chatbot systems.

Keywords:

structured prompting; chatbot usability; service-oriented chatbots; human–computer interaction; usability evaluation; chatbot usability questionnaire

1. Introduction

AI-based chatbots have been increasingly adopted in educational and service-oriented contexts, where they support communication, information access, and interaction with digital systems. These systems can provide immediate assistance by answering questions, offering guidance, and facilitating access to relevant information. Prior research has also identified prompt engineering as an important factor in shaping the quality and relevance of generated responses, particularly in goal-oriented interactions with generative AI systems. Higher-quality prompting skills have been associated with better system outputs, highlighting the importance of prompt formulation in applied conversational settings [1].

The scientific literature has extensively documented the benefits of chatbots in educational environments. Labadze et al. [2], in a systematic review of 67 studies, identified key areas of benefit, including support for tasks and study, personalized assistance, and the development of diverse skills. For educators, the main benefits include time-saving assistance and pedagogical improvement, although they also identified challenges related to reliability, accuracy, and ethical considerations. Similarly, Okonkwo and Ade-Ibijola [3] analyzed 53 articles and highlighted the potential of chatbots to provide fast and personalized services, emphasizing that their integration into educational institutions can significantly improve service accessibility and efficiency.

However, important challenges persist in the effective implementation of these systems. Wollny et al. [4] identified three major challenges in their comprehensive review: aligning chatbot evaluations with implementation goals, considering learning as a process rather than only an outcome, and reporting findings in ways that enable systematic comparisons. Kuhail et al. [5] found that educational chatbots are primarily designed as web platforms to teach computer science, languages, and general education, highlighting the importance of interaction design principles to improve user experience.

On the other hand, prompting has emerged as a paradigm for adapting pre-trained language models to downstream tasks without requiring full model retraining. Rather than modifying model parameters, prompting uses carefully designed input instructions to influence model behavior and adapt responses to specific tasks and contexts. Liu et al. [6] provides a comprehensive conceptual and methodological overview of prompting in natural language processing. Their work highlights prompting not only as a technical mechanism for task adaptation, but also as a broader shift in how human users interact with increasingly capable language models, making it relevant for applications that depend on precise instruction design and context-sensitive responses.

Usability evaluation in educational chatbots requires specialized instruments that capture the particularities of conversational systems. Holmes et al. [7] developed the Chatbot Usability Questionnaire (CUQ) specifically to address the limitations of conventional tools, evaluating aspects such as chatbot personality, user experience, error handling, and onboarding. Prompt design has been increasingly recognized as a relevant factor in shaping user interaction with generative AI systems. Lee and Palmer [8] discuss the importance of prompt engineering in higher education contexts, while Walter [9] highlights the role of strategic input design in supporting meaningful and context-sensitive interactions with AI tools.

This study aims to evaluate the impact of a prompt-enhanced conversational configuration on perceived usability in a childcare center management context. Rather than isolating prompt engineering as a standalone variable, the proposed approach integrates role-based contextualization and interaction refinement mechanisms within the chatbot system. The study focuses on assessing overall user experience within a multi-role environment, including parents, teachers, and administrators, with particular emphasis on the predominant user group in the system context.

2. Related Work

This section presents a structured review of the relevant literature organized into three major areas: chatbots in education, prompt engineering, and usability/user experience.

2.1. Chatbots in Education: Systematic Reviews

Labadze et al. [2], through a review of studies on AI chatbots in education, identified key benefits for students in three main areas: task and study support, personalized learning experiences, and development of diverse skills. For educators, the main benefits include time-saving assistance and pedagogical improvement. However, they also identified significant challenges related to reliability, accuracy, and ethical considerations.

In a similar line, Okonkwo and Ade-Ibijola [3] analyzed 53 articles on chatbot applications in education, providing a comprehensive understanding of previous studies, benefits, challenges, and future research directions. Their findings emphasize the potential of chatbots to provide fast and personalized services to both institutional staff and students. The authors highlight that integrating chatbot technology into educational institutions can significantly improve service accessibility and efficiency.

Wollny et al. [4] conducted a comprehensive systematic review asking “Are we there yet?” regarding the state of chatbots in education. They analyzed 80 articles published between 2016 and 2021 and identified three major research challenges: (1) aligning chatbot evaluations with implementation goals, (2) considering learning as a process rather than only an outcome, and (3) reporting findings in a way that enables systematic comparisons. Their work reveals that, although chatbots show potential, a considerable gap remains between expectations and reality.

Kuhail et al. [5] presented a systematic review focused specifically on interaction with educational chatbots, analyzing 74 studies. Their findings reveal that chatbots are primarily designed on web platforms to teach computer science, languages, and general education. More than half of the chatbots functioned as teaching agents, while others served as learning companions or teaching assistants. The study highlights the importance of interaction design principles for improving user experience.

Deng and Yu [10] conducted a meta-analysis of the effect of chatbot technology in sustainable education, synthesizing results from multiple quantitative studies. Their findings indicate that chatbots have a significant positive effect on learning achievement, explicit reasoning, and knowledge retention; however, results regarding critical thinking, engagement, and motivation were contradictory, suggesting the need for further research.

2.2. Prompt Engineering

Prompt engineering has emerged as a relevant area for improving interactions with generative AI systems, particularly in contexts where response precision and relevance are important.

Lee and Palmer [8] presented a systematic review of the role of prompt engineering in interactions with generative AI in higher education, showing that well-designed prompts can transform GenAI interactions in teaching and learning. Their findings highlight the importance of developing practical AI interaction skills, including meaningful prompt engineering. The study emphasizes that higher education should explicitly incorporate these competencies into curricula.

Walter [9] examined the transformative impact of AI in educational environments, focusing on the need for AI literacy, prompt engineering competence, and enhanced critical thinking skills. The study highlights that prompt engineering involves strategic input design to obtain specific responses from AI systems, thus enriching educational experiences. Walter argues that these competencies are as fundamental in the twenty-first century as digital literacy was in previous decades.

Knoth et al. [1] investigated non-expert perspectives when using LLM-based AI systems through prompt engineering and how AI literacy affects prompting behavior. Their results showed that higher-quality prompt engineering skills predict the quality of LLM output, suggesting that prompt engineering is a required skill for goal-directed use of generative AI tools. The study reveals a significant correlation between AI literacy and the ability to formulate effective prompts.

Park and Choo [11] developed practical prompt engineering strategies specifically for educators using generative AI. Their research provides a structured framework that includes techniques such as zero-shot prompting, few-shot prompting, and chain-of-thought prompting. The authors emphasize that prompt quality can strongly influence the effectiveness of generative AI in educational settings and provide concrete examples of how educators can improve their interactions with these tools.

Kambhampati et al. [12] critically analyzed the nature of prompt engineering in large language models, arguing that it represents a paradigm shift in how humans interact with computational systems. Their work discusses cognitive and educational implications of this new human–computer communication mode and suggests the need to develop new metalinguistic competencies.

2.3. Usability and User Experience

Usability and user experience evaluation is fundamental to determining the effectiveness of educational chatbots. Instruments specifically designed for conversational systems have proven more appropriate than conventional tools.

Holmes et al. [7] developed the Chatbot Usability Questionnaire (CUQ) specifically to measure chatbot usability, addressing the limitations of conventional tools such as SUS for conversational systems. The CUQ evaluates positive and negative aspects of chatbot usability through 16 questions, with scores calculated on a 100-point scale to enable comparability with SUS. This instrument represents a significant advance in the standardization of chatbot evaluations.

In a later study, Holmes et al. [13] validated the CUQ through a study with 26 participants who evaluated three chatbots classified as good, average, and poor quality. The results suggested that the questionnaire demonstrated construct validity and reliability. Factor analysis revealed that CUQ measures four main factors: personality, user experience, error handling, and chatbot onboarding. This multidimensional structure enables more granular and specific evaluations.

Følstad and Brandtzaeg [14] investigated user experience with chatbots through a questionnaire study, identifying pragmatic and hedonic attributes that influence user satisfaction. They found that efficient assistance (positive) and interpretation problems (negative) were important elements in reports of satisfying and frustrating episodes. Hedonic attributes such as entertainment value were also significant, although to a lesser extent than pragmatic attributes.

Borsci and Schmettow [15] reexamined the Chatbot Usability Scale (BUS-11), a validated instrument for evaluating chatbot interaction quality composed of 11 items across five subscales. Using 3186 observations from 44 chatbots, they identified a simpler factorial structure and confirmed that BUS-11 is a reliable and universal scale that can be used from both psychometric and designometric perspectives, demonstrating that chatbot-specific instruments capture unique aspects of conversational experience.

Adamopoulou and Moussiades [16] presented a comprehensive review of chatbots, discussing architectures, development techniques, and evaluation methods. Their work emphasizes that effective evaluation should consider multiple dimensions: technical functionality, user experience, and achievement of domain-specific goals. The authors argue that usability in educational chatbots requires specialized metrics that reflect the pedagogical context.

3. Methodology

This study adopts a quantitative experimental approach complemented by a literature review to evaluate how a structured prompt-based interaction design is associated with the perceived usability of a chatbot designed for a child development and early stimulation center. The chatbot is composed of two AI models, DeepSeek R1 and Llama 3.1, integrated into a hybrid architecture. In this implementation, the models do not receive prior domain-specific training or fine-tuning.

3.1. Description of the AI Models Used

The selection of DeepSeek R1 and Llama 3.1 was based on a comparative review of different versions of both models and other open-source alternatives. They were selected for two main reasons for chatbot implementation: (i) their availability and reported reasoning capabilities, which support their use in complex conversational settings, and (ii) their open-access availability, which facilitates practical implementation in institutions with limited resources.

DeepSeek R1 was selected because its reasoning-oriented architecture has been associated with multi-step problem solving and self-verification capabilities [17]. Recent research has also suggested that reasoning-focused prompting may improve the interpretability of large language model responses in complex domains [18]. In this study, these characteristics supported its inclusion as a model capable of handling queries that require more elaborate interpretation within the childcare support context.

Furthermore, Llama 3.1 was integrated due to its efficiency in processing direct queries, response speed, and capacity for behavioral adaptation through prompt engineering. These features are particularly useful in everyday interactions with parents, teachers, and administrators. He et al. [19] describe instruction-following and behavior modulation properties associated with Llama-based architectures, which support its use in prompt-guided conversational applications. In this study, these characteristics were considered relevant for supporting responsive and adaptable interaction behavior.

The decision to use two models in parallel was motivated by the search for a more robust and versatile system, reducing single-model error bias and improving response reliability and availability. Studies such as Luo et al. [20] and Chen et al. [21] suggest that combining multiple LLMs may improve robustness, availability, and task coverage in conversational systems. This perspective motivated the inclusion of two models in the proposed architecture.

3.2. Experimental Process

The methodological development for the experiment includes the planning, implementation, and experimental validation of the chatbot.

3.2.1. Technological Implementation and Chatbot Architecture

The chatbot was developed using two operating modes (Support Mode and AI Mode). In the first case, RESTful API services that feed predefined responses for Support Mode were integrated into a layered architecture within a web management system for a childcare center. The second case, which is the focus of this study, was implemented with a monolithic architecture that consumes external AI model services (DeepSeek R1 and Llama 3.1) via RESTful APIs provided by Hugging Face, enabling centralized access to both models.

Figure 1 present the interfaces developed for both chatbot modes. These screenshots illustrate the visual design, interactive elements, and general layout used during the experimental process, evidencing the functional differentiation between the two modes.

3.2.2. Experimental Configuration Process

Condition A Configuration (Without Prompt Engineering)

The chatbot operates without specific prompts for contextualization or precision support, and both models (DeepSeek R1 and Llama 3.1) operate with standard settings and default parameters based on the official DeepSeek documentation and the Llama 3.1 sampling configuration reported by Nhat Minh et al. [22]. In this condition, the chatbot simply receives the user’s query, processes it, and forwards it to one of the two models, prioritizing DeepSeek due to its stronger reasoning capability and handling of complex responses. If DeepSeek takes more than 20 s to respond, the query is forwarded to Llama 3.1. The resulting response then passes through a response standardization stage to remove tags such as <think> and other irregular artifacts that models may return as part of their internal reasoning traces, producing a clean final response for the user.

Condition B Configuration (with Structured Prompt-Based Interaction Design)

The chatbot implements a structured prompt-based interaction design aimed at improving contextualization and query interpretation. This configuration incorporates role-aware prompting strategies, contextual structuring of inputs, and response personalization adapted to different user profiles. Additional implementation details of the prompt-enhanced conversational configuration, including the base prompt structure, role-specific variants, representative few-shot examples, filtering rules, and post-processing rules, are provided in Appendix A.

In this condition, after receiving the user’s query, the system applies a set of preprocessing mechanisms, including orthographic normalization, role-based contextual adjustments, and thematic filtering, in order to reduce ambiguity and avoid role mismatches (e.g., a user receiving responses intended for a different profile). The processed query is then routed through the same model selection mechanism and latency threshold used in Condition A, ensuring consistency in the underlying model behavior across both experimental configurations. DeepSeek R1 served as the primary model in both conditions, while Llama 3.1 was activated only when the same latency threshold was exceeded; model allocation was not analyzed as an independent factor in this study.

In addition, response post-processing mechanisms are applied to refine the final output, including formatting adjustments and content filtering, to ensure clarity and coherence in the generated responses. Therefore, while Condition B incorporates a prompt-enhanced interaction design, the routing logic was preserved unchanged with respect to Condition A and was not treated as a differential factor between conditions.

Prompting Techniques and Adjustments

During implementation, internal tests were conducted to support prompt engineering deployment. The process consisted of two main phases: (1) construction of a base prompt establishing initial model contextualization, including identity, purpose, functional scope, and institutional knowledge; and (2) application of specialized prompt engineering techniques to optimize responses according to each usage context. Table 1 details the five techniques used to build role-specific prompts (parent, teacher, administrator) and general system prompts.

3.2.3. Usability Test Design

The chatbot was evaluated through controlled interactions in 10-min sessions, in which users performed queries categorized into three levels of complexity (low, medium, high). Task complexity was defined based on the cognitive and informational demands required to formulate and interpret responses: low-complexity tasks involved direct and specific queries requiring straightforward answers; medium-complexity tasks required contextual understanding or the combination of multiple pieces of information; and high-complexity tasks involved more elaborate reasoning, interpretation, or advisory responses related to childcare practices or decision-making scenarios. Representative task types were defined for each category to ensure consistency in interaction patterns.

Subsequently, the Chatbot Usability Questionnaire (CUQ), a validated instrument that measures personality, interaction experience, onboarding, and error handling, was administered. Table 2 summarizes the CUQ evaluation dimensions.

The participant group included users corresponding to different system roles (parents, teachers, and administrative staff). However, participants were not grouped for statistical comparison by role, and the analysis was conducted at an aggregate level across all users.

Given that the majority of active users in the system corresponded to parents, the evaluation primarily reflects the usability experience of this predominant user group within the childcare context.

The evaluation followed a within-subject design, in which participants interacted with both configurations (Condition A and Condition B) within the same session. Each participant completed the CUQ instrument after interacting with each condition.

Both configurations were evaluated using comparable interaction structures and the same task complexity framework (low, medium, high) to ensure consistency in user experience demands across conditions. The evaluation focused on perceived usability, and therefore did not include objective performance-based metrics such as task completion rate, response time, or interaction efficiency.

However, the experimental design did not include explicit randomization or counterbalancing of the order of conditions. Additionally, no washout period was implemented between interaction phases. As a result, potential order effects, learning effects, and carryover effects cannot be completely ruled out.

4. Results

This study evaluated the usability of a chatbot using the Chatbot Usability Questionnaire (CUQ) under two experimental configurations: a baseline configuration (Condition A) and a structured prompt-based interaction design (Condition B). In both phases, participants (parents, teachers, and administrators) interacted with the system in 10-min sessions for each configuration and subsequently completed the CUQ.

4.1. Preparatory Phase

During the preparatory phase, functional tests and validation of the two chatbot configurations were conducted. Correct integration of DeepSeek R1 and Llama 3.1 was verified, along with stability of communication with the Hugging Face APIs. At this stage, interaction scenarios were defined, model operating parameters were adjusted, and prompts for Condition B were designed.

Condition A operated without prompt engineering, using standard parameters (temperature = 0.7, top_p = 0.95) and processing queries directly without contextualization. Condition B implemented optimized contextual prompts with automatic role recognition and response personalization. This design incorporated role-aware contextualization intended to adapt the chatbot’s tone and content according to the user profile (parent, teacher, or administrator).

4.2. Population and Instruments

The sample consisted of 30 participants from the childcare center, distributed across the three main system user roles as follows: 23 parents, 6 teachers, and 1 administrator. The majority of participants corresponded to parents, who represent the primary and most active user group in the evaluated system context.

Each participant performed controlled queries classified into low, medium, and high complexity levels, ensuring comparable interaction structures across both experimental conditions. The Chatbot Usability Questionnaire (CUQ) was administered after each interaction phase to assess perceived usability.

Although multiple user roles were considered in the system design, the present analysis focuses on overall usability outcomes and does not include a role-based statistical comparison. Therefore, the reported results primarily reflect general user experience, influenced by the predominant user group.

Chatbot Usability Questionnaire (CUQ): The CUQ is a specialized instrument for evaluating the usability of conversational interfaces, designed to capture unique aspects of human–chatbot interaction [7]. The instrument consists of 16 items rated on a 5-point Likert scale: “Strongly disagree,” “Disagree,” “Neutral,” “Agree,” and “Strongly agree”. The CUQ score was calculated using Equation (1).

CUQ = (\frac{(\sum O_{i} - 8) + (40 - \sum E_{i})}{64}) \times 100

(1)

where

\sum O_{i}

is the sum of odd-numbered item scores and

\sum E_{i}

is the sum of even-numbered item scores.

4.3. Experimental Parameter Configuration

Both chatbot configurations maintained the same core routing logic, latency threshold, and general system configuration to ensure comparability across experimental conditions. At the same time, some response-generation parameters were adapted according to user role (parent, teacher, administrator) as part of the operational design of the system. These role-specific settings were preserved consistently across conditions and were not treated as a differential factor between the baseline and prompt-enhanced configurations. Table 3 shows the parameter configuration used for the different role profiles.

4.4. Experimental Results

The results in Table 4 show notable differences between experimental conditions. Condition A obtained lower mean scores in explaining the system purpose (

M = 2.2

) and understanding user queries (

M = 3.0

), whereas Condition B reached mean scores of 4.8 in both dimensions. Additional improvements were observed in personality perception (4.0 vs. 4.7) and reduced robotic perception (1.8 vs. 1.4), suggesting that the enhanced configuration supported more natural and effective interactions.

The global usability calculation in Table 5 confirmed a 32% increase in the CUQ score, from 69 points (marginal) to 91 points (acceptable). This reflects a substantial improvement in user experience after prompt optimization.

The usability scales shown in Figure 2 and Figure 3 indicate a transition from marginal to acceptable usability between conditions, suggesting that the enhanced configuration reached a satisfactory level for practical use in childcare center settings.

4.5. Statistical Analysis

In order to complement the descriptive results, inferential statistical analysis was conducted using the recoded CUQ scores. Negative items were reverse-coded so that higher values consistently represent better usability.

A paired t-test was conducted to compare recoded CUQ scores between conditions. The baseline condition obtained a mean score of M = 3.76 (SD = 0.28), whereas the prompt-enhanced condition reached M = 4.63 (SD = 0.17). The difference between conditions was statistically significant, t(29) = 20.57, p < 0.001, with a mean paired difference of 0.87 (95% CI [0.78, 0.95]); the paired-samples effect size was Cohen’s d = 3.76. The Wilcoxon signed-rank test confirmed the same pattern (p < 0.001), indicating a consistent increase in perceived usability across participants.

At the dimension level, all four CUQ dimensions improved under the prompt-enhanced condition. The largest increase was observed in the initial onboarding process (

Δ = 1.77

), followed by general interaction experience (

Δ = 0.85

), error-handling capability (

Δ = 0.42

), and chatbot personality (

Δ = 0.30

) as shown in Table 6. These results suggest that the prompt-enhanced condition showed its largest improvement in helping users understand the system and engage more effectively during early interaction.

Item-level analysis further supported this trend. Most items showed improvements in the expected direction, particularly those related to system purpose, response usefulness, input recognition, and relevance of responses. However, the effects were not entirely uniform. Two items showed no statistically significant differences, and one item related to perceived complexity exhibited a slight decrease after reverse coding. This indicates that, although the overall impact was clearly positive, some aspects of usability may be less sensitive to prompt-based optimization than others.

5. Discussion

The results suggest that the prompt-enhanced conversational configuration is associated with improved perceived usability of the chatbot system. The significant improvement observed in the global CUQ score indicates that the enhanced configuration was associated with a more favorable user experience. In particular, the strongest improvement was found in the onboarding dimension, suggesting that structured prompt-based strategies were particularly useful for clarifying the chatbot’s purpose and supporting users during their initial interaction.

Improvements were also observed in general interaction experience, chatbot personality, and error-handling capability. These findings indicate that structured prompt-based interaction design can improve the relevance, coherence, and communicative clarity of chatbot responses in service-oriented environments. However, the item-level results also show that the effects were not entirely uniform across all aspects of usability, suggesting that some perceptions, such as complexity, may be influenced by factors beyond the interaction design alone.

According to the CUQ results, the enhanced configuration improved query understanding (from 3.0 to 4.8), suggesting better interpretation of users’ natural language. It also improved the explanation of system purpose (from 2.2 to 4.8), indicating clearer initial communication regarding the chatbot’s role and scope. Moreover, response usefulness increased from 2.6 to 4.8, suggesting greater relevance and precision in interactions.

These findings are consistent with prior research showing that chatbot usability is strongly influenced by response clarity, conversational coherence, and system transparency. Larbi et al. [23], using the CUQ in an applied chatbot setting, showed that usability perceptions are closely related to how clearly and naturally the chatbot communicates with users. Similarly, Labadze et al. [2] emphasize that, in educational and support-oriented environments, the effectiveness of AI chatbots depends not only on the language model itself but also on the quality of the interaction design. In the same direction, Fulgencio [24] highlights that structured prompt design and iterative refinement can improve interaction quality and user acceptance in educational chatbot deployments. From a more general perspective, Patil et al. [25] argue that carefully designed prompts are particularly important in domains requiring contextual precision and clear communication, since they can reduce ambiguity and improve response relevance. In this sense, the present study provides empirical evidence that a prompt-enhanced conversational configuration can improve perceived usability in a childcare support context.

Considering the multi-role context (parents, teachers, and administrators), an overall increase in user satisfaction was observed. However, since the analysis was conducted at an aggregate level and the majority of participants corresponded to parents, these results primarily reflect general usability trends influenced by the predominant user group. The analysis of chatbot perception also showed a reduction in robotic interaction perception (from 1.8 to 1.4) and a considerable increase in naturalness and communicative closeness (from 4.0 to 4.7). These results are also in line with prior literature suggesting that well-designed prompts and structured conversational strategies contribute to more natural and satisfactory interactions [24,25].

The chatbot was deployed in an early childhood education context (Initial Level 1), where it serves as a support tool for parents, teachers, and administrative staff to access guidance on child development activities and practices. The system is intended for non-expert users, both in terms of technology use and child development knowledge, and aims to facilitate access to understandable and context-appropriate information.

In this sense, the primary objective of the present study is to evaluate usability, specifically whether users can effectively interact with the system and understand its responses. The chatbot is designed as a guidance tool and does not replace professional or clinical expertise. Therefore, the evaluation focuses on user experience rather than on the verification of content accuracy or safety. The study does not evaluate learning outcomes or knowledge acquisition, but focuses on usability and interaction quality within a childcare support context.

5.1. Limitations

The study presents several methodological limitations. First, the experimental design did not include randomization or counterbalancing of condition order, which may introduce order effects and learning effects. Since participants interacted with both conditions within the same session, familiarity with the system may have influenced responses in the second phase.

Second, no washout period was implemented between conditions, which limits the ability to isolate carryover effects. Although comparable task structures were used, their equivalence was not formally validated.

Furthermore, although the system supports multiple user roles (parents, teachers, and administrators), the majority of participants corresponded to parents, who represent the primary user group in the evaluated context. The analysis was conducted at an aggregate level, and no role-based statistical comparison was performed. As a result, the findings may be influenced by the predominant user group and do not reflect potential differences across roles.

The study was also conducted with a relatively small sample of 30 participants within a specific childcare center context. This sample size is partly explained by the limited number of users served by the institution, which operates on a relatively small scale. Therefore, the findings should be interpreted with caution and may not be directly generalizable to other institutional, cultural, or linguistic settings.

Finally, although the chatbot operates in a context related to child development guidance, the present study does not include an evaluation of response accuracy, reliability, or safety. The system is intended as a support tool for non-expert users and does not replace professional or clinical expertise.

5.2. Future Work

Future research should extend the present study in several directions. First, larger and more diverse samples should be considered in order to validate the robustness of the observed usability improvements across different institutional, cultural, and linguistic contexts.

Second, future studies should incorporate objective performance-based evaluation metrics such as task completion rate, response time, and interaction efficiency in order to complement perceived usability measures.

Third, qualitative feedback should be systematically collected and analyzed through open-ended comments, user narratives, or interaction-based observations to provide a richer human–computer interaction perspective.

In addition, more detailed role-specific analyses should be conducted to explore whether parents, teachers, and administrators differ in their perception of the chatbot configuration.

Finally, expert-based validation should be incorporated to assess the appropriateness, reliability, and safety of the generated responses, particularly given the childcare guidance context in which the chatbot operates.

6. Conclusions

The use of a structured prompt-based interaction design in the hybrid chatbot based on DeepSeek R1 and Llama 3.1 was associated with a substantial improvement in perceived usability. The increase in the overall CUQ score indicates a transition from marginal to acceptable usability, suggesting that carefully designed interaction strategies can positively influence the quality of user experience. In particular, role-aware contextualization and response refinement contributed to clearer communication of system purpose and greater perceived relevance of responses. These results highlight the importance of interaction design choices in shaping how users experience conversational systems in terms of coherence, clarity, and role-appropriate communication. Accordingly, this study provides empirical evidence that prompt-enhanced conversational configurations can improve the practical usability of chatbot systems deployed in childcare support settings. These findings should be interpreted within the scope of perceived usability and not as a validation of response accuracy or safety.

Author Contributions

Conceptualization, E.P.A.M., M.A.G.M., and G.K.B.-G.; methodology, E.P.A.M., M.A.G.M., and G.K.B.-G.; software, E.P.A.M., M.A.G.M., and P.X.M.-V.; validation, G.K.B.-G., and P.X.M.-V.; formal analysis, G.K.B.-G., and P.X.M.-V.; investigation, E.P.A.M., M.A.G.M., and G.K.B.-G.; resources, E.P.A.M., M.A.G.M., and G.K.B.-G.; data curation, E.P.A.M., and M.A.G.M.; writing—original draft preparation, E.P.A.M., M.A.G.M., and G.K.B.-G.; writing—review and editing, P.X.M.-V.; visualization, E.P.A.M., M.A.G.M., and P.X.M.-V.; supervision, G.K.B.-G.; project administration, G.K.B.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Written informed consent was waived due to the anonymous and minimal-risk nature of the survey. Before participation, respondents were informed about the study objectives, the anonymous treatment of data, and the intended use of the collected information. Participation was voluntary, and completion of the questionnaire was considered implied informed consent.

Data Availability Statement

The data presented in this study are not publicly available due to privacy and ethical restrictions, as they derive from anonymous survey responses collected in an educational setting.

Acknowledgments

The authors thank the participating childcare center and respondents for their collaboration in the usability evaluation.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Prompt Documentation for Reproducibility

This appendix provides additional implementation details of the prompt-enhanced conversational configuration used in Condition B. Its purpose is to improve transparency and reproducibility by documenting the base prompt structure, role-specific prompt variants, representative few-shot examples, input filtering and normalization rules, output post-processing rules, and representative interaction examples.

The materials presented here reflect the operational design principles implemented in the chatbot system. Because the final prompt was dynamically assembled at runtime, the templates below are reported in structured form to document the main components and constraints of the interaction design.

Appendix A.1. Base Prompt Template

The chatbot used a universal base prompt to define the assistant identity, domain scope, communication style, and global behavioral constraints. The following template summarizes the base prompt structure used in the system:

You are Ivy, a virtual assistant specialized in child development and early childhood guidance. You have experience in pedagogy, developmental psychology, and support for families, educators, and childcare staff.

Your main function is to provide guidance and support related to child development, early education, daily care, and institutional information relevant to the childcare center context.

Linguistic tolerance: always interpret the user’s intention, even if the message contains spelling errors, abbreviations, informal wording, or grammatical mistakes. Effective communication is more important than linguistic perfection.

Required response style:

Provide direct, clear, and specific answers.
Include concrete examples whenever appropriate.
Provide practical and actionable recommendations.
Indicate when professional support should be sought, if the situation requires it.
Maintain a coherent, respectful, and supportive tone.

Restrictions:

Do not generate internal reasoning traces.
Do not answer questions outside the permitted thematic scope for the detected user role.
Do not provide misleading certainty in cases that require professional evaluation.

Appendix A.2. Role-Specific Prompt Variants

In addition to the universal base prompt, the system incorporated role-specific prompt segments to adapt response style, vocabulary, and thematic emphasis according to the user profile.

Appendix A.2.1. Parent Role Variant

Use accessible, empathetic, and practical language. Avoid unnecessary technical terminology. Include everyday examples that can be applied at home. Prioritize guidance related to routines, stimulation activities, signs of developmental concern, and practical recommendations for daily care.

Permitted themes include:

basic child development,
daily care routines,
stimulation activities at home,
warning signs that may require professional consultation,
general childcare guidance.

Appendix A.2.2. Teacher Role Variant

Use a more technical and pedagogical tone. Provide structured and professionally oriented responses. When relevant, include methodological guidance, developmental criteria, educational strategies, classroom applications, and observation-oriented recommendations.

Permitted themes include:

child development in educational settings,
early childhood pedagogical strategies,
observation and evaluation support,
learning activities,
educational terminology relevant to childcare and early education.

Appendix A.2.3. Administrator Role Variant

Use an institutional and management-oriented tone. Focus on process clarity, service organization, administrative guidance, and practical support for center-related operations. When appropriate, emphasize efficiency, organization, and role-appropriate communication.

Permitted themes include:

childcare center management support,
communication processes,
administrative guidance related to the institution,
service-oriented operational information,
general center-related information.

Appendix A.3. Dynamic Prompt Construction

The final prompt provided to the model in Condition B was dynamically assembled from four components:

Universal base prompt;
role-specific prompt variant;
contextual prompt segment derived from recent conversation history;
current user message and task-specific instructions.

Conceptually, the runtime construction followed the structure shown in Equation (A1):

P_{f i n a l} = P_{b a s e} + P_{r o l e} + P_{c o n t e x t} + P_{i n s t r u c t i o n}

(A1)

where:

$P_{b a s e}$ defines assistant identity and global constraints,
$P_{r o l e}$ adapts behavior according to the user profile,
$P_{c o n t e x t}$ incorporates recent interaction history,
$P_{i n s t r u c t i o n}$ corresponds to the current user query and immediate response objective.

Appendix A.4. Representative Few-Shot Examples

The system incorporated representative example interactions in order to guide response style and improve consistency. The following examples illustrate the type of few-shot prompting used.

Appendix A.4.1. Example 1: Parent-Oriented Query

User input: What can I do at home to help my 2-year-old improve speech?

Assistant-style example:

Here are some simple activities you can do at home to support language development:

Talk to your child during daily routines, such as dressing or meals.
Name objects clearly and repeat words naturally.
Read short picture books together and point to images.
Encourage imitation of sounds, words, and gestures.

If you notice that your child rarely responds to sounds, does not attempt to communicate, or has lost previously acquired abilities, it would be advisable to consult a specialist.

Appendix A.4.2. Example 2: Teacher-Oriented Query

User input: Suggest an activity to stimulate fine motor development in early childhood.

Assistant-style example:

A suitable activity is a guided object-transfer exercise using large buttons, blocks, or soft manipulatives.

Objective: Strengthen grasp control, hand-eye coordination, and precision.

Procedure:

Present a container with safe manipulatives.
Ask the child to transfer the objects from one container to another.
Gradually vary size, texture, or positioning difficulty.

Observation criteria:

grasp stability,
bilateral coordination,
persistence during the task.

Appendix A.4.3. Example 3: Administrator-Oriented Query

User input: How should I explain to families the purpose of developmental follow-up?

Assistant-style example:

Developmental follow-up helps the center observe each child’s progress over time and identify strengths, needs, and possible support actions. It is not intended as a diagnosis, but as a structured way to guide communication with families and support timely attention when necessary.

It can be explained to families as a tool for:

understanding developmental progress,
documenting observations,
improving communication between the center and families,
identifying when additional support may be useful.

Appendix A.5. Input Filtering and Linguistic Normalization Rules

Before sending the final query to the model, the system applied preprocessing mechanisms to improve robustness and reduce ambiguity.

Appendix A.5.1. Linguistic Normalization

The preprocessing stage included:

conversion to lowercase,
removal of orthographic variation where needed,
correction of common spelling errors,
interpretation of abbreviations and informal variants,
tolerance to partial or colloquial expressions.

Representative normalization examples are shown in Table A1.

Table A1. Representative text normalization examples.

Input Form	Normalized Form
desarollo	desarrollo
estimulasion	estimulación
como te yamas	como te llamas
administrasion	administracion
metodolojia	metodologia
pediatrra	pediatra

Appendix A.5.2. Role-Based Topic Filtering

The system also validated whether a query was within the permitted thematic scope for the detected role. If a query was outside the allowed domain, the assistant did not attempt an unrestricted answer and instead returned a bounded role-specific response.

Appendix A.5.3. Representative Out-of-Scope Responses

Parent

I am a virtual assistant specialized in child development and childcare guidance. I can help with questions related to those topics. If you would like, you can ask me about routines, development, stimulation, or childcare practices.

Teacher

I am a virtual assistant specialized in child development and early education support. I can assist with questions related to educational strategies, developmental observation, and childcare-related pedagogy.

Administrator

I am a virtual assistant specialized in childcare center support and related institutional guidance. I can help with administrative and service-oriented topics relevant to the childcare context.

Appendix A.6. Output Post-Processing Rules

After model generation, the system applied a response-cleaning stage to improve clarity and consistency before presenting the output to the user.

The post-processing phase included:

removal of internal tags or reasoning traces such as <think>;
deletion of malformed or irrelevant technical artifacts;
normalization of spacing and formatting;
preservation of readable structure in headings or lists when present;
content filtering for clarity and role appropriateness.

This stage was intended to ensure that the final answer was coherent, readable, and aligned with the expected response style of the system.

Appendix A.7. Representative Interaction Examples

The following examples illustrate the behavior of the prompt-enhanced configuration. These are representative examples intended to document interaction style and system behavior.

Appendix A.7.1. Example A: Initial Parent Interaction

User: My child is 1 year old and still does not say any words. What can I do?

Assistant:

At this age, some children are still beginning to develop speech, but you can support language development in simple ways at home:

speak frequently during daily routines,
name people and objects clearly,
use songs and short picture books,
encourage imitation of sounds and gestures.

If your child does not respond to sounds, avoids interaction, or shows other concerning signs, it would be advisable to consult a pediatric or developmental specialist.

Appendix A.7.2. Example B: Teacher Clarification Request

User: Give me an activity for development.

Assistant:

I can help with that. Could you tell me which area you would like to support?

For example:

language,
fine motor skills,
gross motor skills,
social-emotional development,
cognitive development.

Appendix A.8. Scope of Reproducibility

This appendix documents the principal prompt design elements used in Condition B. However, because the system assembled prompts dynamically according to role, context, and interaction state, the appendix should be interpreted as a structured representation of the implemented configuration rather than as a single static prompt file.

The materials reported here are intended to improve methodological transparency and facilitate comparative replication of the prompt-enhanced conversational design.

References

Knoth, N.; Tolzin, A.; Janson, A.; Leimeister, J.M. AI Literacy and Its Implications for Prompt Engineering Strategies. Comput. Educ. Artif. Intell. 2024, 6, 100225. [Google Scholar] [CrossRef]
Labadze, L.; Grigolia, M.; Machaidze, L. Role of AI Chatbots in Education: Systematic Literature Review. Int. J. Educ. Technol. High. Educ. 2023, 20, 56. [Google Scholar] [CrossRef]
Okonkwo, C.W.; Ade-Ibijola, A. Chatbots Applications in Education: A Systematic Review. Comput. Educ. Artif. Intell. 2021, 2, 100033. [Google Scholar] [CrossRef]
Wollny, S.; Schneider, J.; Di Mitri, D.; Weidlich, J.; Rittberger, M.; Drachsler, H. Are We There Yet? A Systematic Literature Review on Chatbots in Education. Front. Artif. Intell. 2021, 4, 654924. [Google Scholar] [CrossRef] [PubMed]
Kuhail, M.A.; Alturki, N.; Alramlawi, S.; Alhejori, K. Interacting with Educational Chatbots: A Systematic Review. Educ. Inf. Technol. 2023, 28, 973–1018. [Google Scholar] [CrossRef]
Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; Neubig, G. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Comput. Surv. 2023, 55, 195. [Google Scholar] [CrossRef]
Holmes, S.; Moorhead, A.; Bond, R.; Zheng, H.; Coates, V.; McTear, M. Usability Testing of a Healthcare Chatbot: Can We Use Conventional Methods to Assess Conversational User Interfaces? In Proceedings of the 31st European Conference on Cognitive Ergonomics (ECCE 2019), Belfast, UK, 10–13 September 2019; pp. 207–214. [Google Scholar] [CrossRef]
Lee, D.; Palmer, E. Prompt Engineering in Higher Education: A Systematic Review to Help Inform Curricula. Int. J. Educ. Technol. High. Educ. 2025, 22, 7. [Google Scholar] [CrossRef]
Walter, Y. Embracing the Future of Artificial Intelligence in the Classroom: The Relevance of AI Literacy, Prompt Engineering, and Critical Thinking in Modern Education. Int. J. Educ. Technol. High. Educ. 2024, 21, 15. [Google Scholar] [CrossRef]
Deng, X.; Yu, Z. A Meta-Analysis and Systematic Review of the Effect of Chatbot Technology Use in Sustainable Education. Sustainability 2023, 15, 2940. [Google Scholar] [CrossRef]
Park, J.; Choo, S. Generative AI Prompt Engineering for Educators: Practical Strategies. J. Spec. Educ. Technol. 2025, 40, 411–417. [Google Scholar] [CrossRef]
Kambhampati, S.; Valmeekam, K.; Guan, L.; Verma, M.; Stechly, K.; Bhambri, S.; Saldyt, L.; Murthy, A. LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks. arXiv 2024, arXiv:2402.01817. [Google Scholar] [CrossRef]
Holmes, S.; Bond, R.; Moorhead, A.; Zheng, J.; Coates, V.; McTear, M. Towards Validating a Chatbot Usability Scale. In Human-Computer Interaction—INTERACT 2023; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2023; Volume 14033, pp. 321–339. [Google Scholar] [CrossRef]
Følstad, A.; Brandtzaeg, P.B. Users’ Experiences with Chatbots: Findings from a Questionnaire Study. Qual. User Exp. 2020, 5, 3. [Google Scholar] [CrossRef]
Borsci, S.; Schmettow, M. Reexamining the Chatbot Usability Scale (BUS11) to Assess User Experience with Customer Relationship Management Chatbots. Pers. Ubiquitous Comput. 2024, 28, 1033–1044. [Google Scholar] [CrossRef]
Adamopoulou, E.; Moussiades, L. Chatbots: History, Technology, and Applications. Mach. Learn. Appl. 2020, 2, 100006. [Google Scholar] [CrossRef]
Guo, D.; Yang, D.; Zhang, H.; Song, J.; Wang, P.; Zhu, Q.; Xu, R.; Zhang, R.; Ma, S.; Bi, X.; et al. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature 2025, 645, 633–638. [Google Scholar] [CrossRef]
Gupta, G.K.; Pande, P.; Acharya, N.; Singh, A.K.; Niroula, S. LLMs in Disease Diagnosis: A Comparative Study of DeepSeekR1 and O3 Mini Across Chronic Health Conditions. arXiv 2025. [Google Scholar] [CrossRef]
He, Z.; Shu, W.; Ge, X.; Chen, L.; Wang, J.; Zhou, Y.; Liu, F.; Guo, Q.; Huang, X.; Wu, Z.; et al. Llama Scope: Extracting Millions of Features from Llama 3.1 8B with Sparse Autoencoders. arXiv 2024, arXiv:2410.20526. [Google Scholar] [CrossRef]
Luo, H.; Liu, Y.; Zhang, R.; Wang, J.; Sun, G.; Niyato, D.; Yu, H.; Xiong, Z.; Wang, X.; Shen, X. Toward Edge General Intelligence with Multiple-Large Language Model (Multi-LLM): Architecture, Trust, and Orchestration. arXiv 2025, arXiv:2507.00672. [Google Scholar] [CrossRef]
Chen, Z.; Li, J.; Chen, P.; Li, Z.; Sun, K.; Luo, Y.; Mao, Q.; Li, M.; Xiao, L.; Yang, D.; et al. Harnessing multiple large language models: A survey on llm ensemble. arXiv 2025, arXiv:2502.18036. [Google Scholar]
Nguyen Nhat Minh, N.; Baker, A.; Neo, C.; Roush, A.; Kirsch, A.; Shwartz-Ziv, R. Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs, 2025. arXiv 2025, arXiv:2407.01082. [Google Scholar]
Larbi, D.; Denecke, K.; Gabarron, E. Usability testing of a social media chatbot for increasing physical activity behavior. J. Pers. Med. 2022, 12, 828. [Google Scholar] [CrossRef]
Fulgencio, S.V. Developing effective educational chatbots with GPT: Insights from a pilot study in a university subject. Trends High. Educ. 2024, 3, 155–168. [Google Scholar] [CrossRef]
Patil, R.; Heston, T.F.; Bhuse, V. Prompt engineering in healthcare. Electronics 2024, 13, 2961. [Google Scholar] [CrossRef]

Figure 1. Chatbot interfaces for support mode (a) and AI mode (b).

Figure 2. Usability evaluation for Condition A: 69 points (Marginal).

Figure 3. Usability evaluation for Condition B: 91 points (Acceptable).

Table 1. Prompting Techniques.

Principle/Technique	Description	Application in the System	Theoretical Basis
Role-Based Prompting	Adaptation of the prompt according to the user role/profile.	Specialized prompts for parents, teachers, and administrators using role-specific terminology.	Role conditioning and contextualization improve the relevance and alignment of responses with user intent.
Few-Shot Learning	Provision of representative examples to steer the model toward the intended style/content.	Example interactions embedded in the prompt context to guide the response.	In-context demonstrations reduce ambiguity and increase response consistency.
Structured Output Prompting	Constraints on response structure through predefined formatting rules.	Standardized headings and hierarchical organization of the output.	Structured responses improve readability and reduce cognitive load.
Negative Prompting	Explicit specification of disallowed content or behaviors.	Exclusion of restricted topics and suppression of non-requested internal content.	Constraint specification mitigates undesired generations and improves safety.
Enhanced Linguistic Tolerance	Handling of orthographic variation and informal language patterns.	Interpretation of abbreviations and common misspellings in user inputs.	Robust language handling improves accessibility across diverse writing styles.

Table 2. CUQ Dimensions for Chatbot Evaluation.

Evaluated Dimension	Description
Chatbot personality	Assesses the perception of human-like attributes and naturalness in the interaction (items 1, 2, and 4).
Initial onboarding process	Evaluates how easily users understand system operation during first contact (items 3, 5, and 6).
General interaction experience	Measures conversational fluency, perceived naturalness, and effectiveness during system use (items 7–12, 15, and 16).
Error-handling capability	Assesses system responses to confusion, misunderstandings, or unexpected user inputs (items 13 and 14).

Table 3. Model Parameters Used in the Chatbot Configuration.

Parameter	Description	Default	Parent	Teacher	Administrator
temperature	Controls response creativity and randomness (0.0–1.0).	0.7	0.7	0.6	0.75
top_p	Nucleus sampling parameter controlling token selection diversity.	0.95	0.95	0.95	0.95
stream	Enables real-time token streaming during response generation.	true	true	true	true

Table 4. Comparative CUQ Results.

Question	Mean Condition A	Mean Condition B
1. The chatbot’s personality was realistic and engaging.	4.0	4.7
2. The chatbot seemed too robotic.	1.8	1.4
3. The chatbot was welcoming during initial setup.	4.2	4.0
4. The chatbot seemed very unfriendly.	1.4	1.3
5. The chatbot clearly explained its scope and purpose.	2.2	4.8
6. The chatbot gave no indication of its purpose.	3.8	1.1
7. It was easy to navigate the chatbot.	4.3	4.7
8. It would be easy to get confused while using the chatbot.	1.6	1.3
9. The chatbot understood me well.	3.0	4.8
10. The chatbot stopped recognizing many of my inputs.	1.5	1.9
11. The chatbot’s responses were useful, appropriate, and informative.	2.6	4.8
12. The chatbot’s responses were irrelevant.	3.4	1.3
13. The chatbot handled errors or mistakes well.	3.9	4.8
14. The chatbot seemed unable to handle errors.	1.5	1.4
15. The chatbot was very easy to use.	4.4	4.9
16. The chatbot was very complex.	1.5	2.0

Table 5. Final CUQ Usability Scores by Condition.

Calculation	Condition A	Condition B
Odd-item sum	28.6	37.5
Even-item sum	16.5	11.7
Final score	69	91

Table 6. Recoded CUQ Results by Dimension.

Dimension	Mean (Without PE)	Mean (with PE)	Mean Difference	p-Value
Chatbot personality	4.28	4.58	0.30	<0.001
Initial onboarding process	2.87	4.63	1.77	<0.001
General interaction experience	3.79	4.65	0.85	<0.001
Error-handling capability	4.22	4.63	0.42	<0.001

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Bastidas-Guacho, G.K.; Azogue Martínez, E.P.; Gabilanes Martínez, M.A.; Moreno-Vallejo, P.X. Improving Chatbot Usability Through Structured Prompt-Based Interaction Design. Multimodal Technol. Interact. 2026, 10, 60. https://doi.org/10.3390/mti10060060

AMA Style

Bastidas-Guacho GK, Azogue Martínez EP, Gabilanes Martínez MA, Moreno-Vallejo PX. Improving Chatbot Usability Through Structured Prompt-Based Interaction Design. Multimodal Technologies and Interaction. 2026; 10(6):60. https://doi.org/10.3390/mti10060060

Chicago/Turabian Style

Bastidas-Guacho, Gisel Katerine, Edison Patricio Azogue Martínez, Marco Antonio Gabilanes Martínez, and Patricio Xavier Moreno-Vallejo. 2026. "Improving Chatbot Usability Through Structured Prompt-Based Interaction Design" Multimodal Technologies and Interaction 10, no. 6: 60. https://doi.org/10.3390/mti10060060

APA Style

Bastidas-Guacho, G. K., Azogue Martínez, E. P., Gabilanes Martínez, M. A., & Moreno-Vallejo, P. X. (2026). Improving Chatbot Usability Through Structured Prompt-Based Interaction Design. Multimodal Technologies and Interaction, 10(6), 60. https://doi.org/10.3390/mti10060060

Article Menu

Improving Chatbot Usability Through Structured Prompt-Based Interaction Design

Abstract

1. Introduction

2. Related Work

2.1. Chatbots in Education: Systematic Reviews

2.2. Prompt Engineering

2.3. Usability and User Experience

3. Methodology

3.1. Description of the AI Models Used

3.2. Experimental Process

3.2.1. Technological Implementation and Chatbot Architecture

3.2.2. Experimental Configuration Process

Condition A Configuration (Without Prompt Engineering)

Condition B Configuration (with Structured Prompt-Based Interaction Design)

Prompting Techniques and Adjustments

3.2.3. Usability Test Design

4. Results

4.1. Preparatory Phase

4.2. Population and Instruments

4.3. Experimental Parameter Configuration

4.4. Experimental Results

4.5. Statistical Analysis

5. Discussion

5.1. Limitations

5.2. Future Work

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A. Prompt Documentation for Reproducibility

Appendix A.1. Base Prompt Template

Appendix A.2. Role-Specific Prompt Variants

Appendix A.2.1. Parent Role Variant

Appendix A.2.2. Teacher Role Variant

Appendix A.2.3. Administrator Role Variant

Appendix A.3. Dynamic Prompt Construction

Appendix A.4. Representative Few-Shot Examples

Appendix A.4.1. Example 1: Parent-Oriented Query

Appendix A.4.2. Example 2: Teacher-Oriented Query

Appendix A.4.3. Example 3: Administrator-Oriented Query

Appendix A.5. Input Filtering and Linguistic Normalization Rules

Appendix A.5.1. Linguistic Normalization

Appendix A.5.2. Role-Based Topic Filtering

Appendix A.5.3. Representative Out-of-Scope Responses

Parent

Teacher

Administrator

Appendix A.6. Output Post-Processing Rules

Appendix A.7. Representative Interaction Examples

Appendix A.7.1. Example A: Initial Parent Interaction

Appendix A.7.2. Example B: Teacher Clarification Request

Appendix A.8. Scope of Reproducibility

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI