1. Introduction
AI-based chatbots have been increasingly adopted in educational and service-oriented contexts, where they support communication, information access, and interaction with digital systems. These systems can provide immediate assistance by answering questions, offering guidance, and facilitating access to relevant information. Prior research has also identified prompt engineering as an important factor in shaping the quality and relevance of generated responses, particularly in goal-oriented interactions with generative AI systems. Higher-quality prompting skills have been associated with better system outputs, highlighting the importance of prompt formulation in applied conversational settings [
1].
The scientific literature has extensively documented the benefits of chatbots in educational environments. Labadze et al. [
2], in a systematic review of 67 studies, identified key areas of benefit, including support for tasks and study, personalized assistance, and the development of diverse skills. For educators, the main benefits include time-saving assistance and pedagogical improvement, although they also identified challenges related to reliability, accuracy, and ethical considerations. Similarly, Okonkwo and Ade-Ibijola [
3] analyzed 53 articles and highlighted the potential of chatbots to provide fast and personalized services, emphasizing that their integration into educational institutions can significantly improve service accessibility and efficiency.
However, important challenges persist in the effective implementation of these systems. Wollny et al. [
4] identified three major challenges in their comprehensive review: aligning chatbot evaluations with implementation goals, considering learning as a process rather than only an outcome, and reporting findings in ways that enable systematic comparisons. Kuhail et al. [
5] found that educational chatbots are primarily designed as web platforms to teach computer science, languages, and general education, highlighting the importance of interaction design principles to improve user experience.
On the other hand, prompting has emerged as a paradigm for adapting pre-trained language models to downstream tasks without requiring full model retraining. Rather than modifying model parameters, prompting uses carefully designed input instructions to influence model behavior and adapt responses to specific tasks and contexts. Liu et al. [
6] provides a comprehensive conceptual and methodological overview of prompting in natural language processing. Their work highlights prompting not only as a technical mechanism for task adaptation, but also as a broader shift in how human users interact with increasingly capable language models, making it relevant for applications that depend on precise instruction design and context-sensitive responses.
Usability evaluation in educational chatbots requires specialized instruments that capture the particularities of conversational systems. Holmes et al. [
7] developed the Chatbot Usability Questionnaire (CUQ) specifically to address the limitations of conventional tools, evaluating aspects such as chatbot personality, user experience, error handling, and onboarding. Prompt design has been increasingly recognized as a relevant factor in shaping user interaction with generative AI systems. Lee and Palmer [
8] discuss the importance of prompt engineering in higher education contexts, while Walter [
9] highlights the role of strategic input design in supporting meaningful and context-sensitive interactions with AI tools.
This study aims to evaluate the impact of a prompt-enhanced conversational configuration on perceived usability in a childcare center management context. Rather than isolating prompt engineering as a standalone variable, the proposed approach integrates role-based contextualization and interaction refinement mechanisms within the chatbot system. The study focuses on assessing overall user experience within a multi-role environment, including parents, teachers, and administrators, with particular emphasis on the predominant user group in the system context.
4. Results
This study evaluated the usability of a chatbot using the Chatbot Usability Questionnaire (CUQ) under two experimental configurations: a baseline configuration (Condition A) and a structured prompt-based interaction design (Condition B). In both phases, participants (parents, teachers, and administrators) interacted with the system in 10-min sessions for each configuration and subsequently completed the CUQ.
4.1. Preparatory Phase
During the preparatory phase, functional tests and validation of the two chatbot configurations were conducted. Correct integration of DeepSeek R1 and Llama 3.1 was verified, along with stability of communication with the Hugging Face APIs. At this stage, interaction scenarios were defined, model operating parameters were adjusted, and prompts for Condition B were designed.
Condition A operated without prompt engineering, using standard parameters (temperature = 0.7, top_p = 0.95) and processing queries directly without contextualization. Condition B implemented optimized contextual prompts with automatic role recognition and response personalization. This design incorporated role-aware contextualization intended to adapt the chatbot’s tone and content according to the user profile (parent, teacher, or administrator).
4.2. Population and Instruments
The sample consisted of 30 participants from the childcare center, distributed across the three main system user roles as follows: 23 parents, 6 teachers, and 1 administrator. The majority of participants corresponded to parents, who represent the primary and most active user group in the evaluated system context.
Each participant performed controlled queries classified into low, medium, and high complexity levels, ensuring comparable interaction structures across both experimental conditions. The Chatbot Usability Questionnaire (CUQ) was administered after each interaction phase to assess perceived usability.
Although multiple user roles were considered in the system design, the present analysis focuses on overall usability outcomes and does not include a role-based statistical comparison. Therefore, the reported results primarily reflect general user experience, influenced by the predominant user group.
Chatbot Usability Questionnaire (CUQ): The CUQ is a specialized instrument for evaluating the usability of conversational interfaces, designed to capture unique aspects of human–chatbot interaction [
7]. The instrument consists of 16 items rated on a 5-point Likert scale: “Strongly disagree,” “Disagree,” “Neutral,” “Agree,” and “Strongly agree”. The CUQ score was calculated using Equation (
1).
where
is the sum of odd-numbered item scores and
is the sum of even-numbered item scores.
4.3. Experimental Parameter Configuration
Both chatbot configurations maintained the same core routing logic, latency threshold, and general system configuration to ensure comparability across experimental conditions. At the same time, some response-generation parameters were adapted according to user role (parent, teacher, administrator) as part of the operational design of the system. These role-specific settings were preserved consistently across conditions and were not treated as a differential factor between the baseline and prompt-enhanced configurations.
Table 3 shows the parameter configuration used for the different role profiles.
4.4. Experimental Results
The results in
Table 4 show notable differences between experimental conditions. Condition A obtained lower mean scores in explaining the system purpose (
) and understanding user queries (
), whereas Condition B reached mean scores of 4.8 in both dimensions. Additional improvements were observed in personality perception (4.0 vs. 4.7) and reduced robotic perception (1.8 vs. 1.4), suggesting that the enhanced configuration supported more natural and effective interactions.
The global usability calculation in
Table 5 confirmed a 32% increase in the CUQ score, from 69 points (marginal) to 91 points (acceptable). This reflects a substantial improvement in user experience after prompt optimization.
The usability scales shown in
Figure 2 and
Figure 3 indicate a transition from marginal to acceptable usability between conditions, suggesting that the enhanced configuration reached a satisfactory level for practical use in childcare center settings.
4.5. Statistical Analysis
In order to complement the descriptive results, inferential statistical analysis was conducted using the recoded CUQ scores. Negative items were reverse-coded so that higher values consistently represent better usability.
A paired t-test was conducted to compare recoded CUQ scores between conditions. The baseline condition obtained a mean score of M = 3.76 (SD = 0.28), whereas the prompt-enhanced condition reached M = 4.63 (SD = 0.17). The difference between conditions was statistically significant, t(29) = 20.57, p < 0.001, with a mean paired difference of 0.87 (95% CI [0.78, 0.95]); the paired-samples effect size was Cohen’s d = 3.76. The Wilcoxon signed-rank test confirmed the same pattern (p < 0.001), indicating a consistent increase in perceived usability across participants.
At the dimension level, all four CUQ dimensions improved under the prompt-enhanced condition. The largest increase was observed in the initial onboarding process (
), followed by general interaction experience (
), error-handling capability (
), and chatbot personality (
) as shown in
Table 6. These results suggest that the prompt-enhanced condition showed its largest improvement in helping users understand the system and engage more effectively during early interaction.
Item-level analysis further supported this trend. Most items showed improvements in the expected direction, particularly those related to system purpose, response usefulness, input recognition, and relevance of responses. However, the effects were not entirely uniform. Two items showed no statistically significant differences, and one item related to perceived complexity exhibited a slight decrease after reverse coding. This indicates that, although the overall impact was clearly positive, some aspects of usability may be less sensitive to prompt-based optimization than others.
5. Discussion
The results suggest that the prompt-enhanced conversational configuration is associated with improved perceived usability of the chatbot system. The significant improvement observed in the global CUQ score indicates that the enhanced configuration was associated with a more favorable user experience. In particular, the strongest improvement was found in the onboarding dimension, suggesting that structured prompt-based strategies were particularly useful for clarifying the chatbot’s purpose and supporting users during their initial interaction.
Improvements were also observed in general interaction experience, chatbot personality, and error-handling capability. These findings indicate that structured prompt-based interaction design can improve the relevance, coherence, and communicative clarity of chatbot responses in service-oriented environments. However, the item-level results also show that the effects were not entirely uniform across all aspects of usability, suggesting that some perceptions, such as complexity, may be influenced by factors beyond the interaction design alone.
According to the CUQ results, the enhanced configuration improved query understanding (from 3.0 to 4.8), suggesting better interpretation of users’ natural language. It also improved the explanation of system purpose (from 2.2 to 4.8), indicating clearer initial communication regarding the chatbot’s role and scope. Moreover, response usefulness increased from 2.6 to 4.8, suggesting greater relevance and precision in interactions.
These findings are consistent with prior research showing that chatbot usability is strongly influenced by response clarity, conversational coherence, and system transparency. Larbi et al. [
23], using the CUQ in an applied chatbot setting, showed that usability perceptions are closely related to how clearly and naturally the chatbot communicates with users. Similarly, Labadze et al. [
2] emphasize that, in educational and support-oriented environments, the effectiveness of AI chatbots depends not only on the language model itself but also on the quality of the interaction design. In the same direction, Fulgencio [
24] highlights that structured prompt design and iterative refinement can improve interaction quality and user acceptance in educational chatbot deployments. From a more general perspective, Patil et al. [
25] argue that carefully designed prompts are particularly important in domains requiring contextual precision and clear communication, since they can reduce ambiguity and improve response relevance. In this sense, the present study provides empirical evidence that a prompt-enhanced conversational configuration can improve perceived usability in a childcare support context.
Considering the multi-role context (parents, teachers, and administrators), an overall increase in user satisfaction was observed. However, since the analysis was conducted at an aggregate level and the majority of participants corresponded to parents, these results primarily reflect general usability trends influenced by the predominant user group. The analysis of chatbot perception also showed a reduction in robotic interaction perception (from 1.8 to 1.4) and a considerable increase in naturalness and communicative closeness (from 4.0 to 4.7). These results are also in line with prior literature suggesting that well-designed prompts and structured conversational strategies contribute to more natural and satisfactory interactions [
24,
25].
The chatbot was deployed in an early childhood education context (Initial Level 1), where it serves as a support tool for parents, teachers, and administrative staff to access guidance on child development activities and practices. The system is intended for non-expert users, both in terms of technology use and child development knowledge, and aims to facilitate access to understandable and context-appropriate information.
In this sense, the primary objective of the present study is to evaluate usability, specifically whether users can effectively interact with the system and understand its responses. The chatbot is designed as a guidance tool and does not replace professional or clinical expertise. Therefore, the evaluation focuses on user experience rather than on the verification of content accuracy or safety. The study does not evaluate learning outcomes or knowledge acquisition, but focuses on usability and interaction quality within a childcare support context.
5.1. Limitations
The study presents several methodological limitations. First, the experimental design did not include randomization or counterbalancing of condition order, which may introduce order effects and learning effects. Since participants interacted with both conditions within the same session, familiarity with the system may have influenced responses in the second phase.
Second, no washout period was implemented between conditions, which limits the ability to isolate carryover effects. Although comparable task structures were used, their equivalence was not formally validated.
Furthermore, although the system supports multiple user roles (parents, teachers, and administrators), the majority of participants corresponded to parents, who represent the primary user group in the evaluated context. The analysis was conducted at an aggregate level, and no role-based statistical comparison was performed. As a result, the findings may be influenced by the predominant user group and do not reflect potential differences across roles.
The study was also conducted with a relatively small sample of 30 participants within a specific childcare center context. This sample size is partly explained by the limited number of users served by the institution, which operates on a relatively small scale. Therefore, the findings should be interpreted with caution and may not be directly generalizable to other institutional, cultural, or linguistic settings.
Finally, although the chatbot operates in a context related to child development guidance, the present study does not include an evaluation of response accuracy, reliability, or safety. The system is intended as a support tool for non-expert users and does not replace professional or clinical expertise.
5.2. Future Work
Future research should extend the present study in several directions. First, larger and more diverse samples should be considered in order to validate the robustness of the observed usability improvements across different institutional, cultural, and linguistic contexts.
Second, future studies should incorporate objective performance-based evaluation metrics such as task completion rate, response time, and interaction efficiency in order to complement perceived usability measures.
Third, qualitative feedback should be systematically collected and analyzed through open-ended comments, user narratives, or interaction-based observations to provide a richer human–computer interaction perspective.
In addition, more detailed role-specific analyses should be conducted to explore whether parents, teachers, and administrators differ in their perception of the chatbot configuration.
Finally, expert-based validation should be incorporated to assess the appropriateness, reliability, and safety of the generated responses, particularly given the childcare guidance context in which the chatbot operates.