Article

The Application of AI Chatbot System Based on CLIL Concept in the Teaching of Artificial Intelligence Courses

1 Faculty of Education, The University of Melbourne, Parkville, Melbourne, VIC 3010, Australia
2 Department of Computer Science, Durham University, Stockton Road, Durham DH1 3LE, UK
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(3), 1633; https://doi.org/10.3390/app16031633
Submission received: 4 November 2025 / Revised: 27 January 2026 / Accepted: 29 January 2026 / Published: 5 February 2026
(This article belongs to the Special Issue Application of Smart Learning in Education)

Abstract

The interdisciplinary nature of artificial intelligence courses forces non-computer science majors to contend with the simultaneous challenges of terminology comprehension and language cognition. To increase the efficiency of terminology teaching, this project develops and deploys an OpenAI-based AI chatbot teaching system that incorporates the concept of content and language integrated learning (CLIL). The system creates a dual-track “terminology layer-cognition layer” framework that includes term recognition, multi-level explanation (contextual examples and conceptual associations), task-driven dialogues, and conversation memory bank (CMB) modules. It then guides students through natural language interactions to master the core AI terms in context. The system’s effectiveness was confirmed in a controlled experiment with 98 participants (including computer and non-computer majors) separated into two groups: experimental (chatbot teaching) and control (conventional PPT teaching). In terms of terminology mastery, the experimental group’s posttest score (86.0 ± 5.33) was considerably higher than that of the control group (66.98 ± 5.6). Non-computer science major students showed a more significant improvement effect (83.29 ± 4.5 vs. 63.62 ± 4.68 for the control group). Non-computing students evaluated the clarity of systematic terminology explanation (4.33 ± 0.76) and the effectiveness of contextual assistance (4.21 ± 0.88) as the most important aspects of their learning experience. These experimental results show that the fusion AI chatbot teaching system developed in this study can improve teaching efficiency while effectively reducing cognitive load, and that the task-guided and immediate feedback mechanism can significantly increase students’ learning engagement.

1. Introduction

The rapid growth of AI technology has accelerated the process of introducing AI courses into the system of general education and professional foundation courses in colleges and universities [1,2]. This shift in education represents the evolution of AI education from specialization to popularization [3]. Courses like Introduction to Artificial Intelligence and AI Fundamentals and Applications, particularly for non-computer majors, have become a significant entrance point for students from interdisciplinary backgrounds to get in touch with AI technology [4,5]. These curricula show the broad applicability of AI in numerous sectors, as well as the trend in AI education towards multidisciplinary integration [6]. However, such courses encounter numerous obstacles in teaching practice. According to research, non-computer science majors struggle to learn programming and math skills [7]. The absence of fundamental abilities hinders their comprehension of essential ideas like “deep learning,” “neural networks,” and “fully connected layers,” while the technical intricacy of these concepts intensifies the learning obstacles [8,9]. In STEM disciplines (Science, Technology, Engineering, and Mathematics), the intricacy of the terminology intensifies comprehension challenges for non-specialized students, rendering this linguistic barrier a significant obstacle in computer science education [10,11]. Moreover, these obstacles to learning can diminish non-majors’ motivation and classroom involvement, thereby impacting their learning continuity [12,13]. This disengagement consequently obstructs the attainment of specialized knowledge, creating a detrimental cycle that must be disrupted by innovative instructional concepts and instruments [14].
Terminology and abstract symbols present considerable obstacles for non-specialized learners, who are ensnared in a dual predicament of terminology and linguistic conceptions [15]. Content and Language Integrated Learning (CLIL) is an innovative educational approach that combines topic knowledge with language instruction, proving effective in imparting specialized knowledge and language skills [16,17,18]. CLIL, as a “dual-focus” pedagogical method, promotes the concurrent integration of subject and language acquisition in genuine circumstances [19,20]. It fosters the synergistic advancement of cognitive and linguistic proficiency via direct understanding of the target language and professional articulation of expertise [21]. In medical, engineering, and other professional domains, CLIL can successfully enhance students’ comprehension of specialized content, augment their proficiency in professional terminology, and so facilitate the concurrent advancement of both professional knowledge and linguistic competence [22,23,24]. The incorporation of the CLIL idea in artificial intelligence technology courses might successfully diminish language comprehension barriers and enhance students’ understanding and acceptance of abstract concepts [25,26,27]. Nonetheless, there is an absence of technical curriculum adaption frameworks for implementing CLIL concepts in AI education both domestically and internationally, particularly with rigorous research on the synergistic improvement of terminology proficiency and conceptual comprehension. Furthermore, conventional CLIL instruction is significantly reliant on the teacher’s proficiency and contextual framework, complicating the implementation of tailored adaptations in large or diverse student cohorts. Consequently, the implementation of interactive and scalable digital tools will be essential to improve the practical functionality of CLIL.
In recent years, AI-driven educational systems, particularly AI Chatbots, have proliferated, showcasing significant promise for interactive teaching and learning in the education sector due to their natural language processing and semantic dialogue capabilities [28,29]. Chatbots amalgamate the advantages of traditional instructor-led and online learning, delivering a tailored one-on-one educational experience that effectively emulates the function of an online instructor, thereby fostering an immersive learning environment for students through contextual dialogues and adaptive feedback [30,31,32]. The chatbot improves learning via interaction and attains notable outcomes in the amalgamation of language acquisition and cultural material [29,33,34]. Nonetheless, current research has predominantly concentrated on language acquisition or humanities disciplines. A substantial gap remains in the practical investigation of integrating Chatbots and CLIL in technology course instruction.
Despite the theoretical dual benefits of the CLIL teaching approach, its practical execution encounters significant hurdles, primarily its substantial dependence on instructors’ professional competence and their capacity to design scenarios; personalized instruction is difficult to deliver in large classes or in classes with considerable disparities in student backgrounds. In recent years, artificial intelligence-driven educational systems, especially AI chatbots with natural language processing capabilities, have increasingly shown promise in individualized instruction, instant feedback, and contextual flexibility. Nevertheless, existing research predominantly emphasizes language acquisition or humanities disciplines [35,36,37], and the CLIL concept has yet to be systematically integrated with chatbot technology in technical courses. Although CLIL is applied extensively in professional education, a practical and scalable implementation framework is still lacking for technical courses whose terminology is both rapidly growing and highly specialized [38,39]. Likewise, while educational chatbots have proliferated, most studies concentrate on language learning or general knowledge inquiries and lack adequate grounding in deeper pedagogical theories such as CLIL, particularly in practical classroom studies. This study therefore investigates how to transform the CLIL teaching concept into a scalable, interactive, and personalized terminology instruction pathway via an AI chatbot system, so as to mitigate the dual language and cognitive challenges that non-computer science students encounter when acquiring artificial intelligence terminology.
Consequently, a distinct vacuum exists in the current research domain, notably the absence of a systematic solution that integrates the theoretical benefits of CLIL with the interactive scalability of AI chatbots, while directly addressing the challenges of terminology instruction in technical courses. This study aims to fill this gap.
To resolve the aforementioned issues, this study develops and implements an AI chatbot educational system that incorporates the principles of CLIL, built on the OpenAI platform for non-computer science students in the “Introduction to Artificial Intelligence” course. The system emphasizes essential concepts and their conceptual frameworks within the AI course, facilitating students’ comprehension of the knowledge content through natural language conversations while concurrently enhancing their proficiency in the relevant English terminology. It supports several functions, including terminology elucidation, contextual feedback, adaptive dialogue, question answering, and error correction, thereby assisting students in internalizing knowledge through task-oriented assistance. To evaluate the system’s pedagogical efficacy, this paper designs and conducts a controlled teaching experiment: computer science majors and non-computer science majors learn identical content through either “chatbot teaching” or “traditional PPT teaching.” Terminology mastery, comprehension of knowledge, and learning initiative are assessed using a pre-test, a post-test, and a learning experience questionnaire. The findings indicate that the chatbot group markedly surpasses the traditional teaching group in terminology comprehension accuracy and learning initiative, with the knowledge acquisition gain most pronounced among non-computer science students.
This paper’s primary contributions are: (1) The systematic introduction of the CLIL concept into AI course instruction, accompanied by a proposed “language-content integration-oriented terminology teaching path.” Whereas the original CLIL concept depends on educators to create tailored courses, the proposed path is combined with algorithms that automatically produce CLIL pathways at scale, supporting large-scale general education and the standardization of the technique.
(2) Following the CLIL concept, the system incorporates an additional terminology layer (Term) and a cognition layer (Cognition) into the fundamental design of the chatbot, thereby establishing a theoretical model and a pedagogical framework that integrates chatbot technology. This addresses the problem that conventional CLIL depends on educators and is difficult to scale.
(3) The TF-IDF technique is employed to identify high-frequency terms and develop an AI terminology corpus, and a Conversation Memory Bank (CMB) is engineered at the cognitive level to address the disrupted conversational context that may arise in chatbots, hence facilitating coherent learning.
(4) The system’s efficacy in improving the learning outcomes of non-computer science majors is substantiated by an empirical study, offering a viable approach and practical foundation for future interdisciplinary and multilingual AI course instruction.

2. Theoretical Basis and Related Work

2.1. The Fit Between CLIL Concept and AI Terminology Learning

CLIL is an educational framework that prioritizes “language as a medium” for acquiring specialized knowledge. The strategy promotes students’ capacity to articulate, comprehend, and apply their knowledge in a second or foreign language while studying specialized disciplinary topics. Y. Zhu investigated the acquisition of scientific content knowledge among fifth-grade students in two primary schools in China through the implementation of the CLIL concept, and also analyzed the relationship between scientific achievement and second language competency [40]. J.J. Xin created a content and language integrated learning (CLIL) methodology to improve the English-speaking abilities of culinary students [41]. However, in “Introduction to Artificial Intelligence” courses, students must understand a large number of core terms and concepts expressed in English. These terms often carry complex modeling mechanisms and abstract mathematical relationships, so non-computer science students who lack programming or computing backgrounds face both conceptual comprehension barriers and a language cognitive burden. A single translation or vocabulary cross-reference can therefore hardly fulfill their deep learning needs.
The CLIL concept emphasizes the significance of incorporating terminology into discussions and activities within authentic contexts, hence enhancing the efficacy of terminology acquisition through the amalgamation of language and content. This method assists students in forming the “term-context-concept” triad and offers theoretical backing for the development of future interprofessional AI education. Traditional CLIL instruction is heavily reliant on the teachers’ proficiency and contextual framework, complicating the implementation of tailored adaptations in large or diverse student groups. Consequently, the implementation of interactive and scalable digital tools has emerged as essential for improving the practical functionality of CLIL.

2.2. Advantages and Limitations of Educational Chatbots

Conversational AI has significantly advanced within the realm of educational technology in recent years, particularly through the use of instructional chatbots. Educational chatbots are commonly employed in language acquisition, knowledge elucidation, autonomous training, and many instructional contexts. In contrast to Learning Management Systems (LMS), chatbots possess superior linguistic interaction capabilities, provide real-time feedback, and exhibit contextual adaptation, making them particularly well-suited for the integrated teaching goals of terminology understanding and language transfer within a CLIL framework.
Table 1 illustrates that educational chatbots exhibit considerable benefits across multiple areas. Such a system typically offers 24/7 availability, facilitating prompt terminology explanations and concept reminders that diminish students’ reliance on the instructor. Furthermore, it possesses the capability for multi-round discourse, enabling successive exchanges of questions and responses and fostering a more dynamic semantic context for terminology elucidation. The system also possesses a degree of customisation, enabling it to dynamically modify its explanations or feedback based on the student’s proficiency level. The low-pressure environment established by the chatbot can effectively diminish anxiety during terminology acquisition and foster independent learning. Additionally, the chatbot’s interaction logs can furnish educators with insights into students’ cognitive states, facilitating data-informed instruction.
Additionally, Table 1 highlights the inherent limits of the existing system. Chatbots frequently struggle to produce effective responses to open-ended inquiries or complex terminological logic relationships. Furthermore, despite their ability to engage in multiple conversational exchanges, the absence of conversational memory hampers the system’s capacity to maintain contextual consistency, resulting in logical inconsistencies and redundant feedback loops. Moreover, in non-English environments, the current model continues to exhibit deficiencies in comprehending technical language, which constitutes a significant impediment in CLIL-type technology courses.
Table 1. Advantages and Limitations of Educational Chatbots.
Aspect | Advantages | Limitations
Responsiveness [42] | Provides 24/7 support for learning and terminology explanation | Weak understanding of open-ended questions and complex queries
Interaction style [43] | Enables multi-round dialogue, allowing more continuous and active learning | Often lacks conversational memory, making it difficult to maintain context
Personalization [44] | Offers initial personalization by adjusting difficulty or explanation depth | Limited ability to recognize individual differences; feedback logic is shallow
Emotion and motivation [43] | Creates low-pressure environments to reduce anxiety and promote autonomous learning | Redundant feedback loops may cause user fatigue or confusion
Learning analytics [45] | Generates dialogue logs to enable data-informed teaching | Cannot replace human formative feedback or emotional/pedagogical adaptability
Language adaptability | Supports multilingual settings, suitable for terminology learning in CLIL | Lower language understanding accuracy in non-English contexts

2.3. Examples of Educational Chatbot Systems

The advancement of educational technology has led to the emergence of several educational chatbot systems, extensively utilized in language acquisition, subject matter instruction, cultural exchange, and other domains. A thorough review indicates that these chatbots are frequently developed with specific subject objectives and educational contexts, incorporating distinct interaction styles and task-focused functionalities. Freudbot, designed explicitly for psychology education, facilitates students’ comprehension of Freudian theories through discourse in a digital setting, highlighting the importance of interaction in online learning [46]. Ethnobot, conversely, replicates the interview procedure of an anthropological researcher for the purpose of gathering ethnographic data, highlighting interactive culture investigation [47]. CSIEC produces grammar-conscious responses for English grammar learning activities, harmonizing linguistic reasoning with individualized feedback [48].
Furthermore, systems for younger learners are being developed, such as BookBuddy, which offers book recommendations, vocabulary Q&A, and reading assistance via a sub-bot to address the requirements of children’s tiered learning [49]. Gengobot is a multilingual grammar assistant that provides grammatical explanations in Japanese, Indonesian, and English [50]. Xbot is intended for middle school children to acquire mathematical and logical skills, focusing on the improvement of logic, programming, and mathematics through interactive assignments [51]. Notably, AssasaraBot exemplifies the application of CLIL concepts in practical teaching and learning contexts [29]. This system facilitates students’ bilingual cultural education via organized bilingual tasks in a secondary school culture course, effectively merging language and content teaching objectives, and demonstrating the viability of chatbots in a CLIL context.
Table 2 illustrates that these systems differ in their design objectives, service targets, and operational tactics. Most systems concentrate on language acquisition, K-12 education, or cultural initiatives, chiefly addressing humanistic tasks; none has yet developed a sophisticated solution for comprehending technical terms and constructing complex knowledge.
Table 2. Representative Educational Chatbots and Their Features.
System | Target Domain | Key Features
Freudbot [46] | Psychology | Teaches Freud’s theories via dialogue in online learning
Ethnobot [47] | Anthropology | Simulates ethnographic interviews to collect cultural data
CSIEC [48] | English Language Learning | Generates grammar-aware replies with user personalization
BookBuddy [49] | K-12 Language Learning | Recommends books and supports vocabulary and reading
AssasaraBot [29] | History & Language (CLIL) | Supports bilingual learning through structured task dialogue
Cleverbot [52] | General Chatting/Language | Enables open-ended language practice via free-form chat
Gengobot [50] | Japanese Language Grammar | Provides a multilingual grammar reference interface
Xbot [51] | STEM Education (K-12) | Enhances logic, programming, and math via chatbot tasks
Mondly/Andy [53] | English Conversation | Offers speaking practice and vocabulary reinforcement options

3. System Design and Functionality

3.1. General System Architecture and Design Background

This study develops an educational chatbot system utilizing the OpenAI platform to investigate the application of the CLIL concept in teaching AI terminology to non-computer science students in the “Introduction to Artificial Intelligence” course. The system aims to enhance students’ comprehension and application of essential terms by combining natural language interaction, term recognition, task-oriented approaches, and learning log analysis into an interactive learning environment tailored for term-focused educational contexts. The overall structure is illustrated in Figure 1, which presents a CLIL-TC collaborative architecture segmented into six essential functional modules, adhering to the CLIL principles of dual focus, contextualization, and cognitive integration. Table 3 provides detailed functional descriptions.
From a basic question-and-answer assistant, the educational chatbot has evolved into a human-computer interaction tool with natural language understanding and generation capabilities, acting as an intelligent tutor with explicit teaching logic and contextual adaptability in education. Applied to terminology teaching, the system supports the following:
Instantaneous term recognition and interpretation: the system identifies professional terms entered by students in real time and offers multi-level explanations, including definitions, examples, and related concepts, lessening the cognitive burden of terminology.
Task-driven context embedding: terms are organically integrated into subject-related contexts through designed dialogue activities, fostering the natural relationship between term, context, and concept.
Personalized learning paths: the system achieves preliminary differentiated teaching by dynamically modifying task complexity and explanation depth based on students’ replies.
Conversation memory and coherent learning: the CMB conversation memory module maintains consistency over several dialogue rounds, avoiding the problem of context disruption in terminology instruction.
The system also incorporates several fundamental principles of traditional learning theories into its pedagogical logic and interaction design. It adheres to the constructivist learning principle that learners actively build knowledge within meaningful contexts: the system generates authentic AI application dialogue scenarios and directs students to investigate, inquire, and connect concepts across many conversational exchanges, with the objective of deepening their conceptual comprehension of terminology. Secondly, the system draws on the reinforcement and feedback ideas of behaviorist learning theory: prompt feedback on correct and incorrect responses, intentional repetition of terminology in subsequent tasks, and visualization of task completion progress deliver timely reinforcement, cultivate proper term usage habits, and augment learning motivation. The CLIL concept offers the overarching framework, treating language as both a cognitive instrument and a social practice; within it, the system can be considered a technology-enhanced learning environment that employs constructivist and behaviorist tactics to facilitate the integrated advancement of language and content.
The system employs a modular architectural design, comprising five essential functional modules: Dialogue Interface, Terminology Processor, Task Flow Controller, Learning Logger, and CMB module. Table 3 presents the functional description of each module. The system is implemented on a web platform accessible via desktop and mobile devices, requiring no program installation, which facilitates its use in large class instruction and inter-school promotion. All functional modules are interconnected through terminology acquisition and contextual proficiency to achieve the CLIL objective of “language-content integration.”

3.2. Function Module Design

The system is structured around two primary educational objectives, terminology acquisition and contextual comprehension, and comprises five interrelated functional modules: Dialogue Interface, Terminology Processor, Task Flow Controller, Learning Logger, and CMB module. Each module performs a distinct function within the overarching system architecture and is interconnected via the dialogue interface and task logic, creating a comprehensive terminology-driven learning feedback loop. The functional logic and implementation technique of each of the five modules is elucidated in turn.

3.2.1. Dialogue Interface

This module handles natural language interaction between the user and the robot, serving as the primary interface for student-system engagement. The system accommodates input in both English and Chinese; it identifies terminology inquiries, instructional learning tasks, or open-ended questions posed by students and provides appropriate responses through the language model interface. The output language follows the input language category, while the core terminology is preserved in its original language and, when required, supplemented with a native-language gloss so that students can grasp its meaning in practical circumstances. For example, the term “backbone” originates in English, and a bare Chinese translation may lead to confusion. This module aligns with the dual-focus principle of CLIL, facilitating non-specialized students’ comprehension of terminology and their progression from language to content.
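The language-detection and term-preservation behavior described above can be sketched as follows. This is an illustrative approximation, not the paper's actual implementation: the mini-glossary, the CJK-character heuristic for language detection, and the gloss-in-parentheses format are all assumptions made for demonstration.

```python
import re

# Hypothetical mini-glossary: English term -> Chinese auxiliary gloss.
# The real system draws on a full AI terminology knowledge base.
GLOSSARY = {
    "backbone": "主干网络",
    "neural network": "神经网络",
}

def detect_language(text: str) -> str:
    """Classify input as 'zh' if it contains any CJK characters, else 'en'."""
    return "zh" if re.search(r"[\u4e00-\u9fff]", text) else "en"

def annotate_terms(text: str) -> str:
    """Keep each known English term in its original form and append a
    native-language gloss, e.g. 'backbone (主干网络)'."""
    out = text
    for term, gloss in GLOSSARY.items():
        out = re.sub(
            rf"\b{re.escape(term)}\b",
            lambda m, g=gloss: f"{m.group(0)} ({g})",
            out,
            flags=re.IGNORECASE,
        )
    return out
```

The key design point this sketch illustrates is that the term itself is never translated away: only an auxiliary gloss is appended, so students keep encountering the canonical English form.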

3.2.2. Terminology Processor

The Terminology Processor is tasked with detecting terminology and performing semantic analysis of student input to guarantee terminological clarity and conceptual consistency within the CLIL framework. The module includes an integrated glossary of AI course vocabulary and identifies the semantic context and intended usage of the terms. For each recognized term, the system automatically queries the terminology knowledge base and provides three types of information: a standard-language definition, a native-language auxiliary explanation, and contextual example sentences, giving students a multi-dimensional “term-concept-context” understanding framework. This module aligns with the CLIL notion of contextualization: rather than generic examples, it produces closely connected instances that improve students’ learning efficiency.

3.2.3. Task Flow Controller

This module is tasked with arranging and managing the overall instructional flow of the system’s dialogues to actualize the CLIL idea of “language task-driven content learning,” adhering to the principle of cognitive integration inherent in CLIL. The module establishes several instructional units derived from teaching scripts, with each unit comprising four phases: introduction of terminology, elucidation of knowledge points, terminology quizzes (e.g., true/false questions, fill-in-the-blanks, definition selection), and task summarization. The module regulates the activation of task-switching nodes in the dialogue to guarantee that the learning process adheres to the sequential logic of “term recognition-term application-term reproduction”. The module also dynamically modifies task difficulty and explanatory depth based on students’ performance to provide initial individualized assistance.

3.2.4. Learning Logger

This module is tasked with the real-time logging of students’ interactive behaviors during the learning process to assist teachers in subsequent teaching diagnosis and feedback regulation. It records input content, frequency of terminology inquiries, response outcomes, and error rates, among other metrics. All records are time-stamped and classified by teaching unit, facilitating the evaluation of teaching efficacy, the analysis of learning difficulties, and the monitoring of terminology knowledge. The system interface produces concise stage-by-stage summaries for students, encompassing terminology mastery progress, task completion rates, and individualized reminders to foster self-monitoring skills.
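A minimal sketch of the logging and per-unit aggregation just described is given below. The record schema and the in-memory list are illustrative assumptions; the deployed system presumably persists logs to a database.

```python
from collections import defaultdict
from datetime import datetime, timezone

class LearningLogger:
    """Time-stamps each interaction and tags it with its teaching unit,
    so per-unit error rates can be summarized for teaching diagnosis."""

    def __init__(self):
        self.records = []

    def log(self, unit: str, query: str, correct: bool) -> None:
        self.records.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "unit": unit,          # e.g. "neural networks"
            "query": query,        # the student's input
            "correct": correct,    # quiz/response outcome
        })

    def error_rate_by_unit(self) -> dict:
        """Return the fraction of incorrect responses per teaching unit."""
        totals, errors = defaultdict(int), defaultdict(int)
        for r in self.records:
            totals[r["unit"]] += 1
            if not r["correct"]:
                errors[r["unit"]] += 1
        return {u: errors[u] / totals[u] for u in totals}
```

The per-unit error rates are exactly the kind of aggregate a teacher would consult when locating learning difficulties, and the same records can drive the stage-by-stage summaries shown to students.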

3.2.5. CMB Module

The Conversation Memory Bank (CMB) module is designed to address context disruption and terminology isolation in instructional robots through dynamic context embedding and terminology disambiguation. Concretely, the text embedding model provided by OpenAI transforms the dialogue text into 768-dimensional vectors; Pinecone provides the vector database that stores the dialogue history vectors and supports similarity retrieval; and SQLite provides a metadata repository that logs conversation duration, user identification, term significance, and so on. Together these allow the model to maintain conversational memory and engage in ongoing dialogues within the same context.
Conventional CLIL instruction encounters the challenge that the integration of linguistic objectives and content objectives depends on the specific experiences of teachers, making large-scale implementation problematic. Conversely, static teaching resources are incapable of accommodating the multifaceted metacognitive requirements of multidisciplinary learners. This study proposes the CLIL-TC model, which statistically analyzes core terms from the topic corpus and creates specific binding situations in conjunction with GPT-4, thereby integrating terminology learning into the students’ existing knowledge framework. Table 4 delineates the benefits of the CLIL-TC framework in comparison to the original CLIL model.

3.3. Teaching Content Organization and Terminology Embedding Design

To guarantee that the educational chatbot possesses a distinct pedagogical objective and terminology development capability in the AI course, the system is founded on the conventional knowledge framework of the “Introduction to Artificial Intelligence” course, categorizing the instructional material into three fundamental knowledge areas: supervised learning, neural networks, and reinforcement learning. Each module aligns with a collection of high-frequency phrases, which are contextualized during the task promotion process, facilitating students’ comprehension, application, and reinforcement of the terms’ meanings in authentic dialogues. The instruction provided by this system can be categorized as technology-mediated adaptive inquiry-based learning. The task-guiding system, by establishing serialized and context-specific problem-solving tasks, encapsulates the essence of inquiry-based and problem-oriented learning, compelling students to comprehend concepts through application. The immediate feedback system is a crucial aspect of programmed instruction and formative evaluation. The system offers ongoing and tailored feedback mechanisms via the dialogue interface, allowing students to promptly verify their comprehension or rectify misconceptions. This serves as both a method of behavioral reinforcement and a procedure that fosters metacognitive development. This instructional design, which integrates exploratory tasks with specific feedback, is the fundamental reason the system may significantly improve students’ interest and proficiency.

3.3.1. Teaching Module Division and Term Selection

Screening the high-frequency terminology database with the TF-IDF algorithm not only quantifies the importance of words but also avoids the errors introduced by subjective screening. The core idea of the algorithm is that a term’s importance increases with its frequency within a document, offset by how common the term is across the whole corpus. The specific calculation is as follows:
TF = T / T_W
IDF = log(A / T_D)
TF-IDF = TF × IDF
where TF represents the frequency of a word in a single document, T represents the number of times the word appears in the document, and T_W represents the total number of words in the document. IDF measures the scarcity of a word across the whole corpus; A denotes the total number of documents in the corpus, and T_D denotes the number of documents containing the word. A larger TF-IDF value indicates that the word is more important to the current document. The high-frequency terminology corpus filtered by TF-IDF is the cornerstone of the CLIL system’s accurate terminology teaching, ensuring that students prioritize the linguistic expressions of the core concepts of their disciplines. The selected textbooks and course materials are aggregated; for video lectures, the subtitles are extracted and converted to text. The resulting text is pre-processed by tokenization, removing stop words such as “is” and “the”.
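The three formulas above translate directly into code. In the sketch below, the tiny three-document corpus is purely illustrative and not drawn from the study’s actual teaching materials.

```python
import math

def tf_idf(term, doc, corpus):
    """TF = T / T_W, IDF = log(A / T_D), TF-IDF = TF * IDF."""
    tf = doc.count(term) / len(doc)                   # T / T_W
    containing = sum(1 for d in corpus if term in d)  # T_D
    idf = math.log(len(corpus) / containing)          # log(A / T_D)
    return tf * idf

corpus = [
    ["label", "training", "data", "label"],   # supervised learning notes
    ["neuron", "activation", "layer"],        # neural network notes
    ["reward", "agent", "policy", "reward"],  # reinforcement learning notes
]
# "reward" appears 2 of 4 tokens in one of 3 documents:
score = tf_idf("reward", corpus[2], corpus)  # (2/4) * log(3/1)
```

Ranking each module’s vocabulary by this score yields the high-frequency term lists summarized in Table 5.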
The obtained keyword data are calculated by using the above formula, and according to the syllabus and the focus of the course, the three teaching modules and their core terms selected by the system are shown in Table 5:
The selection of terminology adheres to the principles of “high frequency, representativeness, and moderate cognitive load,” prioritizing terms that are commonly utilized in subsequent courses or practical applications, and that exemplify linguistic expression and conceptual structure, thereby aiding in the construction of a unified library system and the design of multilingual interpretation.

3.3.2. Terminology Embedding and Context Trigger Mechanisms

The dialogue process design employs a context embedding strategy rather than presenting isolated terminology definitions, seamlessly incorporating terms into the dialogue through learning task situations. For instance, in the Supervised Learning module, the system initially presents the “fruit and vegetable classification task” scenario to facilitate students’ comprehension of the concepts of labels and training data, then directs students to discern the distinction between classification and regression through inquiries, selectively enhancing the explanation of decision boundaries or loss functions based on students’ responses.
This approach guarantees that the terminology is reiterated throughout the “input-processing-output” framework, circumventing the short-term memory issues associated with the isolated display of language in conventional instruction. The system establishes a “term recurrence” node, which actively retrieves previously taught terms in subsequent tasks, prompting students to re-explain them or select the appropriate English expressions, so reinforcing the dual transfer of terms and knowledge.
To guarantee the efficacy of instruction and evaluate the proficiency of various terminology levels, the system tasks and dialogue exercises are categorized into three ascending levels according to cognitive complexity: (1) Recognition and recall: emphasizes the re-identification of terminology and fundamental definitions; (2) Understanding and association: necessitates the differentiation or correlation of terms within uncomplicated contexts; (3) Application and analysis: demands the synthesis of multiple terms for elucidation or reasoning in intricate scenarios that replicate genuine problem-solving situations. The system adapts based on pupils’ success at the current level and introduces further assignments only if mastery is verified.
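A minimal sketch of this mastery-gated progression is shown below. The 80% accuracy threshold and five-answer evidence window are illustrative assumptions; the paper does not report the system’s exact advancement criterion.

```python
# The three cognitive levels described in the text, in ascending order.
LEVELS = ["recognition_recall", "understanding_association", "application_analysis"]

def next_level(current, recent_results, threshold=0.8, window=5):
    """Advance to the next level only once mastery of the current one is verified.

    recent_results: list of 1 (correct) / 0 (incorrect) answers at this level.
    """
    recent = recent_results[-window:]
    if len(recent) < window:
        return current                      # not enough evidence yet
    accuracy = sum(recent) / len(recent)
    if accuracy >= threshold and current < len(LEVELS) - 1:
        return current + 1
    return current

level = 0
level = next_level(level, [1, 1, 0, 1, 1])   # 80% correct -> advance
stayed = next_level(level, [1, 0, 0, 1, 1])  # 60% correct -> stay
```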

3.3.3. Multilingual Support and Interpretation Hierarchy Design

This system is designed for non-computer science majors, who exhibit considerable variation in language proficiency and professional backgrounds. To enhance the adaptability and clarity of terminology explanations, the system incorporates a multi-language support and hierarchical presentation mechanism in its terminology output strategy. Rather than offering a single lexical definition, the terminology explanation uses a multi-layered structured output technique encompassing several dimensions such as language translation, context reproduction, and concept expansion.
Table 6 delineates the layers of terminology explanation: Primary Definition, Chinese Explanation, Contextualized Usage, and Conceptual Relationship. The English Definition ensures academic precision, the Chinese Explanation reduces cognitive barriers, the Contextualized Usage highlights the practical application of terms, and the Conceptual Relationship aids students in comprehending the logical framework of the terms, such as linking the activation function with ReLU, sigmoid, etc.
The system assesses terminology competence based on the student’s input and interaction history, dynamically deciding whether to present the complete hierarchy or a simplified explanation. For frequently encountered terms, the system may exclude the Chinese explanations and retain solely the contextual example sentences as a reference; conversely, for terms introduced for the first time, the system emphasizes the presentation of the four-layered structure to facilitate students’ comprehension of the terms’ semantics while mastering their linguistic expressions and applications.
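The dynamic layer-selection logic can be sketched as follows. The familiarity counter and the cut-off of three prior encounters are hypothetical assumptions for illustration; the layer names follow the four-layer structure described above.

```python
# Layer names follow the four-layer explanation structure (Table 6).
LAYERS = ["english_definition", "chinese_explanation",
          "contextualized_usage", "conceptual_relationship"]

def select_layers(term, seen_counts):
    """First encounters get the full four-layer hierarchy; familiar terms
    (here: seen at least 3 times, an assumed cut-off) get context only."""
    if seen_counts.get(term, 0) >= 3:
        return ["contextualized_usage"]   # simplified explanation
    return list(LAYERS)                   # full hierarchy

history = {"label": 5, "overfitting": 0}  # hypothetical interaction history
familiar = select_layers("label", history)
new_term = select_layers("overfitting", history)
```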
This multilingual hierarchical framework enhances the adaptability and diversity of terminology interpretation while efficiently executing the CLIL idea of concurrent language and knowledge instruction.

3.4. User Interaction Process and Experience Design

To achieve the objective of terminology-focused interactive learning, the system incorporates task scripts and a context embedding mechanism into the dialogue generation logic, establishing a closed loop of “input-interpretation-feedback-reproduction” interaction. This interactive approach facilitates the progressive enhancement of terminology instruction while addressing several user experience requirements, including coherence, contextual authenticity, and task pacing regulation. The system initially presents the current learning module to the students and succinctly highlights the essential terminology within the module; for instance, in the supervised learning module, the system directs students to focus on keywords such as label and regression. The system then commences an initial dialogue by establishing a scenario or posing open-ended questions to orient the student within the terminology-related context. When students indicate their interest, inquiries, or ambiguous comprehension of a term, the system activates the terminology explanation module, which delivers English definitions, Chinese supplementary explanations, contextual examples, and terminology cues derived from the multi-layer structure in Table 6 to assist students in comprehending the linguistic and conceptual framework of the target term from various angles.
Upon grasping the vocabulary, the system will autonomously produce relevant task exercises, often comprising judgment questions, word selection, and brief explanatory responses. Upon completion of their responses, the system will promptly deliver feedback regarding the accuracy of the answers and offer specific linguistic advice or reinterpretation based on the context of the responses. In the ensuing dialogues, the system will deliberately prompt the students to utilize the current language to assess their language transfer capabilities and conceptual reproduction levels, ensuring that the terminology does not remain just in the passive receipt phase. Upon the conclusion of each task round, the system produces a comprehensive learning summary that details mastered terminology, reviews common erroneous terms, and offers recommendations for the subsequent module. This facilitates student self-monitoring and serves as a reference for educators in future instruction.
Figure 2 illustrates that the system’s task interface employs the reinforcement learning module as its foundation, showcasing the interactive learning process of the term “reward,” which encapsulates the closed loop of “term guidance-contextual explanation-task practice-instant feedback-concept reproduction.” At the outset of the dialogue, the robot introduces the concept of “reward” by posing a brief inquiry to elicit students’ prior knowledge. If students do not comprehend the term, the system automatically accesses the knowledge base to provide the English meaning and supplementary contextual explanations to enable comprehensive conceptual grasp.
The system subsequently generates real-time task exercises, such as the True/False questions shown in the figure, to assess students’ comprehension of the context in which the terms are used. Upon submission of their answers, the system delivers instantaneous feedback, reinforcing accurate impressions or rectifying misconceptions based on the outcomes. The system simultaneously adds the term “reward” to the “Mastered Terms” list; if a comprehension discrepancy is detected, it is documented in the “Confusing Terms” section for later review. Upon task completion, the system offers keywords for the subsequent module, informed by the learning progress, such as “value function,” to assist students in constructing a knowledge network connecting terms.
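The “Mastered Terms”/“Confusing Terms” bookkeeping described above might be implemented along these lines; the data structure and the promotion rule are illustrative assumptions rather than the system’s documented design.

```python
def record_answer(tracker, term, correct):
    """File a term under 'mastered' or 'confusing' based on the answer outcome."""
    bucket = "mastered" if correct else "confusing"
    tracker.setdefault(bucket, set()).add(term)
    # A term answered correctly later is promoted out of the confusing list.
    if correct:
        tracker.get("confusing", set()).discard(term)
    return tracker

tracker = {}
record_answer(tracker, "reward", False)  # misunderstanding logged
record_answer(tracker, "reward", True)   # later mastered, promoted
record_answer(tracker, "policy", True)
```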
The interface displays the learning process in a straightforward dialogue format, integrating graphical feedback results and terminology classification, which significantly improves the visualization and interactivity of terminology instruction. It showcases the system’s capacity to facilitate multi-dimensional synergies among terminology, language skills, and cognitive hierarchies in CLIL teaching contexts.
Moreover, to augment the flexibility and anthropomorphic quality of the interaction, the system permits students to pose queries freely during the dialogue, prompting the robot to dynamically modify the work trajectory or the profundity of term elucidation based on the keywords. This technique significantly improves the initiative and personalization of the learning process, while also demonstrating the system’s adaptability and feedback capacity as an educational tool.

3.5. Overview of the Teaching System’s Configuration

The system utilizes a web application design that separates the front-end from the back-end, with essential services hosted on the cloud. The back-end and AI service layer, encompassing the fundamental dialogue logic and natural language processing functionalities, is facilitated by the OpenAI GPT-4 API (gpt-4-0125-preview). The dialogue history management employs the Pinecone vector database (index type: starter) for the storage of dialogue embedding vectors, complemented by the SQLite relational database for the preservation of session metadata, user learning logs, and structured terminology libraries. The back-end application layer employs Python 3.9+ and the FastAPI framework to develop RESTful API services, managing user requests, orchestrating several modules, and invoking AI services. A reactive single-page application is developed utilizing the Vue.js framework in the front-end interaction layer, offering a dialogue-like interface as illustrated in Figure 2. All service containers have been encapsulated and deployed on cloud servers compatible with Docker, accessible via a standard web browser.
The system workflow is centered on the cycle of “user input-system processing-multimodal feedback”. The front end transmits user input (text) to the back-end API, which executes dialogue reception and preparation tasks. The back end first performs basic cleaning by eliminating special characters. The preprocessed text is then forwarded to a specialized classification prompt for term recognition and intent classification. This prompt requires the GPT-4 model to execute two functions: (a) ascertain whether the input includes terminology from the established AI lexicon database; (b) evaluate whether the user’s aim is “term inquiry,” “task execution,” or “free question.”
Guided by the identified intent and terms, the system aggregates recent dialogue context and generates a contextually informed response. It uses OpenAI’s text-embedding-ada-002 model to convert the latest turn of the current conversation into a 768-dimensional vector. The vector is sent to the Pinecone index for a similarity search (top_k = 5), retrieving the prior conversation segments most relevant to the present query. The recalled history is prepended to the final prompt as “dialogue memory” to ensure the coherence of the response. For task flow control, the system maintains a finite state machine (FSM) based on the instructional script. The system identifies the appropriate node (e.g., “show examples,” “ask questions,” “provide feedback”) to activate in the FSM according to the current user state (e.g., the module in progress, completed exercises) and the input content, subsequently incorporating the pertinent instructions into the prompt sent to GPT-4. In the final stage of structured output and post-processing, GPT-4 is required to generate its response in precise alignment with a specified JSON format. After parsing the JSON, the back end enriches it with Chinese explanations sourced from the term interpretation layers in the local terminology database, updates the final content and the learning log in the SQLite database, and returns the result to the front end for presentation.
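The processing cycle described above can be condensed into a sketch: intent and user state select an FSM node, recalled history is prepended as dialogue memory, and the model is asked for structured JSON. The node table, function names, and the simulated model output below are assumptions for illustration; the actual embedding, Pinecone, and GPT-4 calls are omitted.

```python
import json

# Hypothetical (intent, state) -> FSM node table; real transitions follow
# the instructional script.
NODES = {
    ("term_inquiry", "idle"): "show_examples",
    ("task_execution", "exercise_open"): "provide_feedback",
    ("free_question", "idle"): "ask_questions",
}

def build_prompt(intent, state, user_input, recalled_history):
    """Pick an FSM node, then assemble the prompt with recalled memory first."""
    node = NODES.get((intent, state), "ask_questions")
    memory = "\n".join(recalled_history)  # top_k recalled segments
    return node, (
        f"[dialogue memory]\n{memory}\n"
        f"[instruction] node={node}\n"
        f"[user] {user_input}\n"
        'Respond strictly as JSON: {"reply": "...", "terms": ["..."]}'
    )

node, prompt = build_prompt(
    "term_inquiry", "idle", "What does reward mean?",
    ["Earlier: we defined agent and environment."],
)
# Post-processing: parse the (here simulated) structured model output.
reply = json.loads('{"reply": "A reward is...", "terms": ["reward"]}')
```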

4. Teaching Experiment

4.1. Teaching Experiment Design

The proliferation of AI courses in higher education has rendered the enhancement of non-computer science students’ comprehension of fundamental terminology a critical objective of contemporary AI pedagogical reform. This work proposes and performs a controlled teaching experiment utilizing a pre-test and post-test to evaluate the efficacy of the proposed AI chatbot system grounded in the CLIL concept within the “Introduction to Artificial Intelligence” course, as illustrated in Figure 3. The primary aim of the experiment is to assess the overall teaching efficacy of the system for terminology mastery, knowledge comprehension, and learning motivation, particularly for non-computer science students.
A total of 98 undergraduate students were recruited for this experiment, encompassing both computer-related majors (e.g., computer science and technology, data science and big data technology) and non-computer majors (e.g., agricultural engineering, food science, and economic management). All participants were in their freshman to junior years, had not undergone comprehensive training in AI courses, and possessed fundamental English reading skills. The study categorized students into two subgroups, computer science and non-computer science, based on their professional backgrounds, and subsequently assigned them randomly to the experimental group (utilizing the AI chatbot system) and the control group (employing traditional PPT-recorded course and PDF materials) for comparative instruction within each subgroup. Table 7 displays the number of participants in each subgroup, guaranteeing balanced and comparable experimental conditions. The 98 participants comprised 52 males and 46 females, aged 18 to 22 years (mean 20.1 ± 1.2 years). Convenience sampling was utilized, and participants were randomly assigned within their professional-background strata. The data were gathered via an online questionnaire platform and had to be completed within a specified time frame. Throughout the two-week intervention period, neither group attended any live online or offline classes; both studied independently via online resources. The primary objective of this experiment is to evaluate the difference in efficacy between two instructional media delivering the same content and requiring equivalent cognitive engagement. The learning activities and assessment questions presented to the experimental and control groups are entirely congruent in terminology coverage, cognitive complexity, and learning objectives; the sole manipulated variable is the mode of interaction through which knowledge is delivered and applied.
The experiments are organized around the three core modules of the AI curriculum, namely supervised learning, neural network and reinforcement learning. The experimental period is two weeks, and all students receive a unified knowledge pre-test before the experiment to check their initial terminology mastery. In the experimental phase, the experimental group accessed the AI chatbot system through the webpage to complete the modular task-based learning and terminology interaction, while the control group completed the same content in the form of instructor-recorded videos, PPT lectures, and accompanying PDF materials. The experimental design steps are shown in Table 8.
At the end of the teaching session, all students completed the same terminology mastery and conceptual understanding post-test questionnaire, while the experimental group additionally filled out the system interaction experience questionnaire to evaluate the system’s interface friendliness, terminology clarity and task fluency. This experimental design not only allows quantitative comparison of the differences in terminology mastery between the two teaching paths, but also further analyzes the adaptability and teaching potential of the CLIL-oriented dialogue system for students from different disciplinary backgrounds, providing empirical support for the interdisciplinary AI education model.

4.2. Assessment Tools and Data Collection Methods

This study employed two evaluation instruments to systematically assess the influence of the proposed AI chatbot teaching system on terminology acquisition and student experience. A knowledge post-test questionnaire measuring AI terminology proficiency was administered to both the experimental and control groups, while a system experience evaluation questionnaire was utilized exclusively for the experimental group to gather subjective feedback. Both types of surveys were disseminated via an internet system, with a stipulated response time of 15 min. This study’s entire assessment framework and design process, illustrated in Figure 4, is constructed from two dimensions: objective knowledge testing and subjective usage perception, creating a holistic assessment pathway that encompasses terminology mastery and system interaction experience.

4.2.1. AI Terminology Knowledge Post-Test Questionnaire Design

A standardized knowledge questionnaire was developed and administered to thoroughly evaluate students’ understanding of essential AI terminology before and after the instructional intervention. The instrument emphasizes three principal knowledge modules—Supervised Learning, Neural Networks, and Reinforcement Learning—and evaluates students’ multidimensional cognitive performance regarding term recognition, definitional comprehension, and contextual application through diverse question formats.
The questionnaire has four types of questions to augment the range and depth of the assessment: Single-choice, True/False, Term-definition matching, and Contextual fill-in. The Single-choice and Judgment questions assess students’ capacity to identify fundamental characteristics and semantic classifications of terms; the Term-Definition Matching questions necessitate students to correctly associate terms with various definitions, evaluating their conceptual understanding; and the Contextual Fill-in questions involve specific contexts and terminology application, aimed at testing students’ ability to apply their knowledge to real-world conversational situations.
The system experience questionnaire employs a five-point Likert scale and includes seven dimensions (see Table 9). The selection of these criteria is intentional; they are closely linked to the essential elements of the learning process that this study aims to enhance: The precision of term definitions and the effectiveness of the context assess the system’s core function in alleviating students’ difficulties with term comprehension; the coherence of dialogue and the reinforcement of learning are associated with the quality of the system’s interaction design; the fluency of English output and the perception of personalization pertain to the focus on language environment and differentiated learning in CLIL. Thus, the data from this questionnaire can systematically clarify how the system influences these experiential factors and subsequently affects students’ learning outcomes and engagement.

4.2.2. System Experience Evaluation Questionnaire Design

The questionnaire was developed for the students in the experimental group to systematically gather their feedback regarding the utilization of the chatbot system and their learning experience. To ensure the standardization and comparability of quantitative results, the questionnaire employs a five-point Likert scale (1 signifies “strongly disagree,” and 5 signifies “strongly agree”) across seven items that address key dimensions from terminology acquisition to linguistic interaction. These encompass: clarity of terminology, contextualization, acceptance of English output, logical consistency in systematic discourse, reinforcement of learning by term repetition, adaptation to individual learning speed, and overall satisfaction.
The research team redeveloped all entries by referencing existing educational technology evaluation tools, integrating CLIL teaching concepts and AI dialogue system features. These entries underwent multiple reviews by teaching experts and linguists to ensure consistency and practicality regarding content coverage, expression style, and semantic clarity. The questionnaire, as a fundamental instrument for gathering subjective experiences, can elucidate students’ authentic perceptions of system interaction, human–computer understanding, and the promptness of linguistic feedback during system usage. It is particularly effective for analyzing cognitive disparities and perceptual structural characteristics among students from diverse professional backgrounds utilizing the same educational system.
Furthermore, each student in the experimental group must complete the questionnaire prior to the conclusion of the course, following the completion of all teaching module tasks, to prevent cognitive bias from influencing their judgment. The collected data will be utilized to compute the average score of each student across the seven categories and to develop the system satisfaction evaluation index, which will facilitate later trend analysis, modeling of experiencing disparities, and satisfaction forecasting. The configuration of the pertinent questionnaire and the substance of the entries are presented in Table 10.
Based on the questionnaire scores, an average experience index will be calculated for each student, which can also be used for subsequent visualization of multidimensional perceptual trends and model satisfaction analysis. The Cronbach’s α coefficient for the post-test of the terminology knowledge assessment was 0.84, while the overall Cronbach’s α coefficient for the system experience questionnaire was 0.88, signifying that the measurement instruments exhibit strong internal consistency.
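For reference, the Cronbach’s α behind these reliability figures can be computed as below; the five-respondent score matrix is synthetic for illustration and is not study data.

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix of respondents x items.

    alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)
    """
    k = len(scores[0])   # number of items
    def var(xs):         # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

synthetic = [  # 5 respondents x 3 Likert items (illustrative only)
    [4, 5, 4], [3, 3, 4], [5, 5, 5], [2, 3, 2], [4, 4, 5],
]
alpha = cronbach_alpha(synthetic)
```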

5. Experimental Results and Analysis

5.1. Pre-Test Results and Group Homogeneity Test

To guarantee that the experimental and control groups were equivalent prior to the intervention, all 98 participants took a pre-test on AI terminology.
Table 11 shows the descriptive statistics for the pre-test scores. We used an independent sample t-test to determine if the groups’ baseline values were equal. The findings revealed no significant difference in pre-test scores between the experimental group (M = 52.1, SD = 10.2) and the control group (M = 50.9, SD = 9.8), t(96) = 0.62, p = 0.536. This finding demonstrates that the two groups of students had similar beginning knowledge levels of AI terminology prior to receiving various training approaches. As a result, the observed disparities in post-test scores can be attributed to the instructional intervention rather than differences in baseline ability.
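The reported comparison can be approximately reconstructed from the summary statistics alone, assuming an even split of 49 students per group (an assumption; the paper reports only the total N = 98). The pooled two-sample t computed this way lands near, though not exactly at, the reported t = 0.62, a gap consistent with rounding in the published means and standard deviations.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Equal-variance two-sample t statistic from summary statistics."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Experimental: M = 52.1, SD = 10.2; control: M = 50.9, SD = 9.8;
# n = 49 per group is an inferred assumption.
t = pooled_t(52.1, 10.2, 49, 50.9, 9.8, 49)
# |t| is far below the ~1.98 critical value at df = 96: no baseline difference.
```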

5.2. Terminology Knowledge Post-Test Questionnaire Score Analysis

This study assesses the efficacy of the chatbot-based teaching system for AI terminology acquisition by comparing the post-test scores of computer science and non-computer science majors in both the experimental and control groups, as presented in Table 12.
The mean terminology test scores of students in the experimental group were significantly superior to those of the control group (86.0 ± 5.33 vs. 66.98 ± 5.6), demonstrating that the method markedly enhances terminology mastery. Subsequent analysis from a disciplinary perspective reveals that computer science majors in the experimental group achieved the highest mean score (88.6 ± 4.8), while non-computer science majors attained a marginally lower score (83.29 ± 4.5), both demonstrating commendable performance. Conversely, in the control group, the mean scores for both categories of students were comparatively lower (70.2 ± 4.44 and 63.62 ± 4.68, respectively), with the disparity being particularly pronounced. This outcome indicates that the chatbot system benefited both categories of students, with a more pronounced knowledge gain among those from non-computing backgrounds, thereby potentially narrowing the terminology acquisition gap.
Figure 5 illustrates the distribution of scores among the student subgroups. Figure 5a illustrates the distribution trend of individual student scores within each group, revealing that the experimental group had superior performance and reduced variability, particularly among computer science majors, whose scores were predominantly clustered at 85 or above. Conversely, the control group’s data exhibited considerable fluctuation, with certain students scoring below 60 points. Figure 5b presents the box-and-whisker and normal distribution plots of the scores for each group, indicating that the median and interquartile range of the experimental group are more concentrated, while the control group exhibits greater variance, implying substantial individual variability in terminology mastery under traditional teaching methods. The results of non-computer majors in the experimental group closely resemble those of computer majors, demonstrating the system’s transferability and broad applicability in enhancing terminology acquisition for students with weaker backgrounds.

5.3. Analysis of Scores on the System Experience Evaluation Questionnaire

To further investigate the disparities in experiences among students from various disciplinary backgrounds during the utilization of the chatbot system, Table 13 presents the mean and standard deviation of the scores for computer-related majors and non-computer majors in the experimental group across the seven system evaluation dimensions.
Non-computer majors showed a propensity for more favorable evaluations across all dimensions, with a mean score range of 4.08–4.42 and an overall mean of 4.23 ± 0.77, substantially surpassing the mean of computer majors, 3.94 ± 0.79.
Non-computer science majors achieved superior scores in “Clarity of term explanations” (4.33 ± 0.76), “Coherence of dialogue flow” (4.42 ± 0.71), and “Effectiveness of context” (4.21 ± 0.88), suggesting that the chatbot system effectively supports these students in terminology clarification and contextual aid.
Conversely, computer science students excelled in the dimensions of “Comfort with English output” (4.08 ± 0.91) and “Coherence of dialogue flow” (4.08 ± 0.76), demonstrating enhanced receptivity to verbal expression and conversational logic, yet exhibiting a relative conservatism in terminological clarity and perception of personalization. The scores of non-computer science students across several parameters are predominantly clustered in the high range, hence reinforcing the system’s capacity to facilitate terminology understanding and improve user experience for the “weak background” cohort.
This study presents a visualization of the system experience dimensions, as depicted in Figure 6, to thoroughly assess the experimental group’s interactions with the chatbot system across disciplinary contexts, utilizing the mean rating data from Table 13. Figure 6a illustrates the distribution of ratings among computer science majors across the seven dimensions, revealing a consistent overall performance with minimal variation between dimensions. The mean ratings predominantly cluster between 3.8 and 4.1, suggesting a balanced overall perception of the system by this cohort, particularly regarding “Comfort with English output” (D3) and “Coherence of dialogue flow” (D4). Conversely, the non-computing majors depicted in Figure 6b exhibited more favorable feedback across the dimensions, with consistently higher mean ratings than the computing majors and a more concentrated distribution between 4.2 and 4.4, particularly regarding “Clarity of term explanations” (D1), “Effectiveness of context” (D2), and “Perceived personalization” (D6), suggesting that the system is more proficient in providing terminology guidance and personalized pacing control for this demographic. This outcome aligns with the statistical trend in Table 12, further corroborating the efficacy of the chatbot system in aiding non-majors with terminology acquisition and improving interactive participation.
We conducted a two-way analysis of variance (ANOVA) with post-test score as the dependent variable and teaching method (chatbot/traditional) and academic background (computer/non-computer) as factors. The main effect of teaching method was significant (F(1, 94) = 152.36, p < 0.001, η² = 0.618), again demonstrating the overall superiority of the chatbot teaching mode. The main effect of academic background was also significant (F(1, 94) = 18.24, p < 0.001, η² = 0.162), with computer majors scoring higher overall. The interaction between teaching method and academic background, however, was not significant, F(1, 94) = 2.15, p = 0.146. This suggests that, although the descriptive data in Table 12 show a larger improvement for non-computer majors in the experimental group, the two teaching methods acted on students from different majors in statistically similar patterns, with no significant heterogeneity of effect. The chatbot system substantially benefited both student groups.
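As a minimal sketch of the analysis behind these statistics, a balanced two-way ANOVA with partial η² can be computed in pure Python. The toy scores below are fabricated for illustration only; the paper's F values come from its own raw data.

```python
from statistics import mean

def two_way_anova(cells):
    """Balanced two-way ANOVA from {(a_level, b_level): [scores]}.

    Returns {"A": (F, partial_eta_sq), "B": ..., "AxB": ...}.
    Assumes every cell holds the same number of scores (balanced design).
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))            # scores per cell
    scores = [x for v in cells.values() for x in v]
    grand = mean(scores)

    cell_m = {k: mean(v) for k, v in cells.items()}
    a_m = {a: mean([x for (ai, _), v in cells.items() if ai == a for x in v])
           for a in a_levels}
    b_m = {b: mean([x for (_, bi), v in cells.items() if bi == b for x in v])
           for b in b_levels}

    # Sums of squares for main effects, interaction, and within-cell error.
    ss = {"A": n * len(b_levels) * sum((a_m[a] - grand) ** 2 for a in a_levels),
          "B": n * len(a_levels) * sum((b_m[b] - grand) ** 2 for b in b_levels),
          "AxB": n * sum((cell_m[(a, b)] - a_m[a] - b_m[b] + grand) ** 2
                         for a in a_levels for b in b_levels)}
    ss_err = sum((x - cell_m[k]) ** 2 for k, v in cells.items() for x in v)

    df = {"A": len(a_levels) - 1, "B": len(b_levels) - 1}
    df["AxB"] = df["A"] * df["B"]
    ms_err = ss_err / (len(scores) - len(a_levels) * len(b_levels))

    # F ratio and partial eta squared for each effect.
    return {k: ((ss[k] / df[k]) / ms_err, ss[k] / (ss[k] + ss_err)) for k in ss}

# Fabricated mini data set: 2 teaching methods x 2 academic backgrounds.
res = two_way_anova({
    ("chatbot", "cs"):     [86, 88, 85],
    ("chatbot", "non-cs"): [83, 84, 82],
    ("trad",    "cs"):     [70, 71, 69],
    ("trad",    "non-cs"): [64, 63, 65],
})
print({k: (round(f, 1), round(eta, 3)) for k, (f, eta) in res.items()})
```

With these toy numbers the teaching-method factor dominates, the background factor is smaller, and the interaction is weakest, mirroring the pattern of results reported above.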
The system’s learning logger module continuously captured students’ interactions during the procedure. This analysis focuses on the number of active questions (questions proposed by students rather than prompted by the system) and the duration of effective dialogues (the time from login to active logout), treated as objective behavioral indicators of learning initiative and exploratory spirit. Analysis of the experimental group’s activity logs showed strong initiative during system-based learning: students initiated an average of 5.8 ± 3.2 active questions, well above the 2 questions required by the system’s task flow, and effective dialogues lasted an average of 42.5 ± 10.7 min, much longer than the projected minimum time to complete all predefined tasks (about 25 min). These behavioral data objectively support the conclusion that students engaged in active learning.
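The two behavioral indicators can be derived from session logs along these lines. The log format, field names, and sample records are hypothetical, introduced only to illustrate the computation.

```python
from datetime import datetime
from statistics import mean, stdev

# Hypothetical log format: one record per session, with timestamps and an
# ordered event list distinguishing student- from system-initiated turns.
sessions = [
    {"login": "2025-03-01 10:00", "logout": "2025-03-01 10:41",
     "events": ["system_prompt", "student_question", "student_answer",
                "student_question", "student_question"]},
    {"login": "2025-03-02 09:30", "logout": "2025-03-02 10:18",
     "events": ["system_prompt", "student_question", "student_answer"]},
]

def session_metrics(sessions, fmt="%Y-%m-%d %H:%M"):
    """Return (active questions per session, session duration in minutes).

    'Active questions' counts student-initiated events, mirroring the paper's
    distinction between student-proposed and system-asked questions.
    """
    qs, mins = [], []
    for s in sessions:
        qs.append(sum(1 for e in s["events"] if e == "student_question"))
        t0 = datetime.strptime(s["login"], fmt)
        t1 = datetime.strptime(s["logout"], fmt)
        mins.append((t1 - t0).total_seconds() / 60)
    return qs, mins

qs, mins = session_metrics(sessions)
print(f"questions: {mean(qs):.1f} +/- {stdev(qs):.1f}")
print(f"duration:  {mean(mins):.1f} +/- {stdev(mins):.1f} min")
```

Aggregating these per-session values as mean ± SD yields exactly the kind of summary reported above (5.8 ± 3.2 questions; 42.5 ± 10.7 min).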

6. Conclusions and Outlook

To address the dual language and cognitive barriers that non-computer science students encounter when learning artificial intelligence concepts, this study developed and empirically validated an AI chatbot teaching system that incorporates the CLIL concept. At the level of teaching method, the system offers a scalable, interactive approach to terminology instruction while showcasing the instrumental innovation of AI technology in education. The experimental results demonstrate that the system significantly improves learning engagement and terminology mastery, particularly for students with weaker disciplinary backgrounds. The system integrates essential modules, including term recognition, contextual elucidation, task facilitation, and learning logging, enabling students to achieve term comprehension and knowledge construction through natural language interaction in both Chinese and English, with multi-tiered presentation and staged review of terms in dynamic contexts. The pre-test/post-test controlled experiment with 98 students from various disciplines indicates that the chatbot teaching mode surpasses the traditional PPT recording mode in terminology mastery, learning initiative, and terminology transfer ability, with non-computer science students benefiting most. The findings confirm the viability of integrating the CLIL concept with AI interaction technology in the instruction of AI terminology, offering a novel approach to the educational objectives of language assistance and conceptual mastery in terminology-intensive courses. The proposed “CLIL-TC” framework and chatbot system improve the instructional efficacy of a particular AI course while also demonstrating interdisciplinary applicability and potential for large-scale implementation; the framework fundamentally represents a domain-general model for cognitive enhancement.
Its dual-track architecture and modular design suggest that it can be technically adapted to other highly term-dependent fields such as medicine, engineering, and law. In medical education, for example, the system could be rapidly adapted to help students acquire intricate clinical terminology and pathological concepts during simulated diagnostic conversations.
This study presents a functional prototype of a chatbot system designed to fulfill specific research objectives. While the experimental findings substantiate the fundamental hypothesis, the system has inherent constraints in the comprehensiveness of its terminology database, the adaptability of its dialogues, and the extent of personalization, consistent with an initial developmental phase. Future work will concentrate on augmenting and refining the system by (1) incorporating multimodal explanatory functionality, such as combining text and images to elucidate neural network architectures; (2) devising more sophisticated dialogue state tracking to support open-ended, exploratory inquiry; and (3) leveraging extensive long-term usage data to improve task recommendation and feedback strategies via reinforcement learning. Future publications will provide a fully open-source technical architecture, comprehensive API design documentation, and an extensive performance evaluation to meet the rigorous standards of scientific reproducibility.
This study has confirmed the beneficial impact of AI chatbots on the efficiency of terminology acquisition; however, it is essential to critically assess the inherent challenges of deploying such technological tools in education. The efficacy of the system relies largely on its established terminology database and task scripts; when confronted with topics outside the course scope or with complex exploratory inquiries, there remain risks to the authenticity and pedagogical suitability of the generated content, which may unintentionally constrain students’ divergent and critical thinking. Immediate feedback in human-computer interaction can boost the sense of engagement, but it may also foster a reliance on instant gratification, undermining the perseverance and resilience students need to confront learning challenges. Ultimately, technological means cannot supplant the emotional support, value orientation, and immediate adaptability offered by educators grounded in extensive teaching experience. Consequently, future research and applications should prioritize human-machine collaborative teaching models rather than attempting to “replace” educators with AI: AI should serve as an empowering instrument that enhances productivity and enables personalized attention, while educators retain their essential role in guiding students toward deep reflection, nurturing critical thinking, and promoting humanistic care.

Author Contributions

Conceptualization, Z.L. and Q.W.; methodology, Z.L.; software, Z.L.; validation, Z.L. and Q.W.; formal analysis, Z.L.; investigation, Z.L.; resources, Q.W.; data curation, Z.L.; writing—original draft preparation, Z.L.; writing—review and editing, Q.W.; visualization, Z.L.; supervision, Q.W.; project administration, Q.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All data generated or analyzed during this study are included in the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. The operational framework of the AI chatbot used in the AI terminology instruction system grounded in CLIL-TC.
Figure 2. Chatbot interface for task-driven and term-focused AI learning.
Figure 3. Conceptual diagram of the overall experimental process.
Figure 4. Structure of Objective and Subjective Evaluations for the AI Chatbot Teaching System.
Figure 5. Post-test scores on AI terminology by academic background and instructional group. (a) Individual student scores across four subgroups: computer majors-experimental, non-computer majors-experimental, computer majors-control, and non-computer majors-control. (b) Score distributions visualized using box plots overlaid with kernel density estimates (violin plots).
Figure 6. Visualization of chatbot system experience across evaluation dimensions. (a) Ratings from students in computer-related majors (n = 25); (b) Ratings from students in non-computer-related majors (n = 24).
Table 3. Core Functional Modules of the Chatbot System.
Module Name | Function Description
Dialogue Interface | Enables natural language interaction between student and bot; supports bilingual input and mainly English output with Chinese assistance when needed.
Terminology Processor | Identifies AI-related terms from dialogue and provides standard definitions, examples, and contextual explanations.
Terminology Progress Manager | Manages teaching progress based on scripted learning tasks, including terminology introduction, concept delivery, quizzes, and task summaries.
Learning Logger | Records student interaction logs and response data for teacher analysis and feedback refinement.
CMB Module | Introduces a vector database that stores dialogue history and solves the problem of context breakage.
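The paper describes the conversation memory bank (CMB) only as a vector database that stores dialogue history to prevent context breakage. The following is a minimal sketch of that retrieval idea, using bag-of-words vectors in place of real embeddings; the class name and its API are assumptions, not details from the paper.

```python
import math
from collections import Counter

class ConversationMemoryBank:
    """Minimal CMB sketch: store past turns as bag-of-words vectors and
    retrieve the most similar ones to restore lost context. A production
    system would use learned embeddings and a real vector database."""

    def __init__(self):
        self.turns = []  # list of (text, Counter) pairs

    @staticmethod
    def _vec(text):
        return Counter(text.lower().split())

    @staticmethod
    def _cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing keys
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def add(self, text):
        self.turns.append((text, self._vec(text)))

    def recall(self, query, k=2):
        """Return the k stored turns most similar to the query."""
        q = self._vec(query)
        ranked = sorted(self.turns, key=lambda t: self._cosine(q, t[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]

cmb = ConversationMemoryBank()
cmb.add("backpropagation updates the weights of a neural network")
cmb.add("an agent receives a reward from the environment")
cmb.add("an activation function such as ReLU adds non-linearity")
print(cmb.recall("what reward does the agent get?", k=1))
```

A follow-up question about rewards retrieves the earlier reinforcement learning turn, which is the context-restoration behavior the CMB module is meant to provide.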
Table 4. Comparison between the CLIL-TC framework and the original CLIL framework.
Comparison Dimension | The Original CLIL Framework | The CLIL-TC Framework
Teaching agent | All-round teachers in all specialties | A chatbot that can be infinitely replicated
Teaching methods | Manually written cases | GPT-4-generated professional scenarios
Term explanation | Unified language interpretation | Explanation through related application cases
Language output | Classroom discussion | Mandatory term fill-in
The essence of teaching | Transmission of human experience | Adaptive learning ecosystem
Case coverage | General cases | Professionally customized cases
Applicable scenarios | Small-class education | Large-scale interdisciplinary teaching
Table 5. Teaching Modules and Associated Terminology.
Module | Key Terms
Supervised Learning | label, training data, classification, regression
Neural Network | layer, activation function, backpropagation, epoch
Reinforcement Learning | agent, environment, reward, policy, Q-learning
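The Terminology Processor's recognition step could be sketched as a simple inventory match against the key terms of Table 5. The matching logic below is a naive illustrative sketch (plain substring search), not the deployed implementation.

```python
# Term inventory mirroring Table 5 of the paper.
MODULE_TERMS = {
    "Supervised Learning": ["label", "training data", "classification",
                            "regression"],
    "Neural Network": ["layer", "activation function", "backpropagation",
                       "epoch"],
    "Reinforcement Learning": ["agent", "environment", "reward", "policy",
                               "Q-learning"],
}

def recognize_terms(utterance):
    """Return {module: [terms found]} for key terms in a student utterance.

    Naive substring matching: good enough to illustrate the idea, though a
    real system would tokenize and handle morphology (e.g., 'labeled').
    """
    text = utterance.lower()
    hits = {}
    for module, terms in MODULE_TERMS.items():
        found = [t for t in terms if t.lower() in text]
        if found:
            hits[module] = found
    return hits

print(recognize_terms("Why does backpropagation need a label "
                      "for the training data?"))
```

Once a term is recognized, the system can hand it to the multi-level explanation pipeline described below.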
Table 6. Multi-Level Terminology Explanation Structure.
Explanation Layer | Content Description
Primary Definition | Standard English definition of the term based on AI knowledge conventions
Supportive Explanation | Chinese support for non-specialist learners
Contextualized Usage | Example sentence using the term in an AI-related application or dialogue context
Conceptual Relationship | Optional links to related terms or concepts (e.g., activation function → ReLU)
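A plausible data structure for the four explanation layers of Table 6 is sketched below. The field names, `render` method, and sample entry are illustrative assumptions, not the system's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TermExplanation:
    """One entry in the multi-level explanation structure (cf. Table 6)."""
    term: str
    primary_definition: str        # standard English definition
    supportive_explanation: str    # Chinese support for non-specialists
    contextual_usage: str          # example sentence in an AI context
    related_terms: List[str] = field(default_factory=list)  # optional links

    def render(self):
        """Format the layers in the order the table prescribes."""
        lines = [f"{self.term}: {self.primary_definition}",
                 f"  [zh] {self.supportive_explanation}",
                 f"  e.g. {self.contextual_usage}"]
        if self.related_terms:
            lines.append("  see also: " + ", ".join(self.related_terms))
        return "\n".join(lines)

entry = TermExplanation(
    term="activation function",
    primary_definition="a non-linear function applied to a neuron's output",
    supportive_explanation="激活函数",
    contextual_usage="ReLU is the most common activation function in deep networks.",
    related_terms=["ReLU", "sigmoid"],
)
print(entry.render())
```

Keeping the Chinese support and the conceptual links as separate fields lets the chatbot emit only the layers a given learner needs, which matches the CLIL-TC dual-track design.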
Table 7. Participant distribution by academic background and group assignment.
Academic Background | Experimental Group | Control Group | Subtotal
Computer-related majors | 25 students | 25 students | 50 students
Non-computer-related majors | 24 students | 24 students | 48 students
Total | 49 students | 49 students | 98 students
Table 8. Research Process.
Phase | Step | Specific Operations and Procedures | Task/Output
Preparation stage | 1 | Recruit participants, obtain informed consent, and randomly assign groups | 98 participants identified and assigned to groups (see Table 7)
Preparation stage | 2 | Develop and deploy the AI educational chatbot system; prepare traditional teaching materials | Teaching environment set up
Preparation stage | 3 | Design and validate the pre-test/post-test sheets and the experience questionnaire | Assessment tools finalized
Pre-test stage | 4 | All participants complete the AI terminology knowledge pre-test online | Baseline data collected to assess initial level balance
Intervention stage | 5 | Experimental group: access the AI chatbot through the web and complete terminology task learning under system guidance | Two weeks of differentiated teaching intervention
Intervention stage | 6 | Control group: study the same content through PPT lectures, recorded videos, and PDF materials on the online platform | Two weeks of differentiated teaching intervention
Post-test and data collection | 7 | All participants complete the same AI terminology knowledge post-test online | Main outcome data on terminology mastery collected
Post-test and data collection | 8 | Experimental group only: complete the system experience questionnaire online | Subjective learning experience data collected
Data analysis stage | 9 | Data cleaning and coding | Dataset generated for result analysis
Table 9. Structure and Scoring of the Post-Test Questionnaire for AI Terminology Knowledge.
Question Type | Example Description | Score Per Item | Number of Items | Subtotal
Single-choice | Which of the following is an activation function? | 5 points | 4 | 20 points
True/False | “ReLU is a supervised model.” (True/False) | 5 points | 4 | 20 points
Matching | Match AI terms with their correct definitions | 10 points | 3 | 30 points
Contextual fill-in | In reinforcement learning, the ____ receives rewards. | 10 points | 3 | 30 points
Total | | | | 100 points
Table 10. Structure of the System Experience Evaluation Questionnaire (5-point Likert Scale).
Evaluation Dimension | Sample Item
Clarity of term explanations | The system provided clear and understandable definitions.
Effectiveness of context | The contextual examples helped me understand technical concepts.
Comfort with English output | I felt comfortable reading English responses from the chatbot.
Coherence of dialogue flow | The chatbot’s dialogue structure was logical and coherent.
Reinforcement of learning | The follow-up use of the terms helped reinforce my understanding.
Perceived personalization | The system responded to my pace and allowed me to explore concepts flexibly.
Overall satisfaction | Overall, I found the chatbot system helpful, interesting, and engaging.
Table 11. Comparison of pre-test and post-test terminology scores between the two groups.
Measurement | Group (n) | Score (Mean ± Standard Deviation) | Between-Group Comparison (Independent-Samples t-Test)
Pre-test | Experimental group (n = 49) | 52.1 ± 10.2 | t(96) = 0.62, p = 0.536; no significant difference
Pre-test | Control group (n = 49) | 50.9 ± 9.8 |
Post-test | Experimental group (n = 49) | 86.0 ± 5.33 | t(96) = 12.34, p < 0.001; Cohen’s d = 1.82
Post-test | Control group (n = 49) | 66.98 ± 5.6 |
Table 12. Post-test Scores on AI Terminology by Group and Academic Background.
Group | Computer Majors | Non-Computer Majors | Overall (Group-wise)
Experimental | 88.6 ± 4.8 | 83.29 ± 4.5 | 86.0 ± 5.33
Control | 70.2 ± 4.44 | 63.62 ± 4.68 | 66.98 ± 5.6
Overall (Major-wise) | 79.4 ± 10.36 | 73.46 ± 10.93 | 76.49 ± 11.0
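The "overall" cells in Table 12 are sample-size-weighted means of the subgroup means. Assuming 25 computer majors and 24 non-computer majors in the experimental group (the sizes reported for Figure 6), and mirrored sizes in the control group as an assumption, the printed marginals can be reproduced:

```python
def weighted_mean(pairs):
    """pairs: iterable of (n, mean); returns the pooled (weighted) mean."""
    total_n = sum(n for n, _ in pairs)
    return sum(n * m for n, m in pairs) / total_n

# Experimental group: 25 computer majors (88.6), 24 non-computer majors (83.29).
exp_overall = weighted_mean([(25, 88.6), (24, 83.29)])

# Computer majors across groups: 25 experimental (88.6), 25 control (70.2).
cs_overall = weighted_mean([(25, 88.6), (25, 70.2)])

print(round(exp_overall, 2), round(cs_overall, 2))
```

Both values match the table's marginal cells (86.0 and 79.4), which is a useful internal-consistency check on the reported subgroup sizes.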
Table 13. System Experience Ratings by Major, Experimental Group Only.
Evaluation Dimension | Computer Majors (Mean ± SD) | Non-Computer Majors (Mean ± SD) | Overall (Mean ± SD)
Clarity of term explanations | 3.84 ± 0.8 | 4.33 ± 0.76 | 4.08 ± 0.81
Effectiveness of context | 3.84 ± 0.75 | 4.21 ± 0.88 | 4.02 ± 0.83
Comfort with English output | 4.08 ± 0.91 | 4.25 ± 0.73 | 4.16 ± 0.82
Coherence of dialogue flow | 4.08 ± 0.76 | 4.42 ± 0.71 | 4.24 ± 0.75
Reinforcement of learning | 4.0 ± 0.87 | 4.17 ± 0.76 | 4.08 ± 0.81
Perceived personalization | 3.92 ± 0.76 | 4.12 ± 0.8 | 4.02 ± 0.77
Overall satisfaction | 3.84 ± 0.75 | 4.08 ± 0.77 | 3.96 ± 0.76
Overall (Mean ± SD) | 3.94 ± 0.79 | 4.23 ± 0.77 | 4.08 ± 0.79

Share and Cite

MDPI and ACS Style

Liu, Z.; Wang, Q. The Application of AI Chatbot System Based on CLIL Concept in the Teaching of Artificial Intelligence Courses. Appl. Sci. 2026, 16, 1633. https://doi.org/10.3390/app16031633

