Article

Intelligent Assistant with Artificial Intelligence for Language Learning

by Diego De-La-Cruz-Salcedo 1, Edgar Peña-Casas 2, Monica Salcedo-Hernandez 2, Myriam Guichard-Huasasquiche 3 and Jose Salcedo-Hernandez 4,*
1. Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional Mayor de San Marcos (UNMSM), Av. Universitaria S/N, Lima 15081, Peru
2. Facultad de Ingeniería de Sistemas, Universidad Nacional San Luis Gonzaga (UNICA), Av. Los Maestros S/N, Ica 11004, Peru
3. Facultad de Economía, Universidad Nacional San Luis Gonzaga (UNICA), Av. Los Maestros S/N, Ica 11004, Peru
4. Facultad de Ingeniería Mecánica Eléctrica y Electrónica, Universidad Nacional San Luis Gonzaga (UNICA), Av. Los Maestros S/N, Ica 11004, Peru
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(6), 3072; https://doi.org/10.3390/app16063072
Submission received: 19 January 2026 / Revised: 13 February 2026 / Accepted: 15 March 2026 / Published: 22 March 2026

Abstract

An intelligent assistant with artificial intelligence (AI) for language learning was developed. A dynamic user interface and a supervised feedback module were designed using web technologies (HTML5, CSS3, and JavaScript ES6+) and Python v3.9 to support language learning. The dynamic user interface was then integrated with an AI-powered application programming interface (API). A comparative evaluation of the intelligent assistant against a traditional language learning method showed an improvement in learning outcomes.

1. Introduction

The learning of a second language constitutes an essential competence in the current academic, scientific, and professional environment. In particular, proficiency in English has become an indispensable requirement for accessing scientific information, participating in international networks, and improving employment and research opportunities [1,2]. However, traditional teaching methods often present significant limitations, including limited personalization, insufficient immediate feedback, and reduced adaptability to different learning styles [3].
In this context, artificial intelligence (AI) has emerged as a transformative tool in language education. Recent advances in natural language processing (NLP), speech recognition, and machine learning have enabled the development of intelligent assistants capable of understanding learners’ language, identifying errors, and providing formative feedback in real time [4,5,6]. These technologies also enable personalized learning through continuous performance analysis and content adaptation to the specific needs of each user [7].
During the COVID-19 pandemic, higher education experienced rapid digital transformation, leading to the intensive use of technological tools in remote teaching. However, various reports indicate that following the return to face-to-face instruction, many institutions have gradually reverted to traditional methodologies such as lecture-based teaching, reducing the systematic integration of emerging technologies, particularly AI-based tools [8,9]. This regression has been attributed to factors such as infrastructure limitations, insufficient faculty training, and the perception that technology does not naturally integrate into existing pedagogical models. Upon returning to the physical classroom, the lack of AI tools specifically designed to complement—rather than replace—the instructor has generated resistance. As recent studies indicate, technologies that do not offer a clear advantage in reducing instructional workload or producing tangible improvements in student performance tend to be abandoned in favor of familiar but less efficient practices in terms of personalization.
Although the recent expansion of artificial intelligence has motivated numerous studies highlighting its potential to enhance learning experiences—by increasing personalization, motivation, and learner autonomy—significant challenges remain regarding its effective integration into conventional pedagogical methodologies [5,9]. Authors such as Zawacki–Richter et al. [5] emphasize the lack of adaptive tools specifically oriented toward language learning in university contexts, while the National Academies of Sciences [8] warn of the need for clear policies that allow the benefits of AI to be leveraged without displacing the pedagogical focus.
These factors justify the development of AI-based intelligent assistants capable of complementing teaching practices, expanding opportunities for student practice, and improving key language skills such as reading comprehension, written production, and oral communication.
The objective of this study is to design and implement an intelligent assistant powered by artificial intelligence to improve language learning, integrating multimodal interaction components, immediate feedback, and progressive adaptability.
Unlike traditional teaching methods, AI-driven assistants have demonstrated a strong capacity to adapt to individual learner difficulties, reducing cognitive barriers and optimizing real-time feedback [10]. Furthermore, features such as continuous interaction and personalized performance monitoring contribute to improved language acquisition and sustained progress over the medium term [11,12]. An additional advantage of these systems lies in their capacity for continuous improvement: as they interact with more users, they refine their responses, correction patterns, and pedagogical strategies, thereby enhancing learning personalization.
The proposed system integrates three main components: (i) a natural language processing (NLP) model for text comprehension and generation; (ii) a speech recognition and synthesis module that enables oral interaction; and (iii) a supervised feedback module developed by the authors, designed to identify grammatical, lexical, and phonetic errors and provide personalized correction suggestions. In addition, a controlled experimental protocol is proposed to evaluate the impact of the system in comparison with traditional teaching methods, using both technical metrics (word error rate (WER) and F1-score) and educational metrics. Through this approach, the study aims to demonstrate that an intelligent assistant with these characteristics can enhance the effectiveness and personalization of language learning, contributing to the advancement of AI-mediated digital education.
Despite the rapid proliferation of general-purpose AI tools capable of generating language feedback, a critical research gap remains in how these systems ensure pedagogical precision when interacting with beginner language learners. Most current solutions rely exclusively on generative models that provide corrections without an explicit, verifiable link to linguistic error categories or proficiency standards such as the CEFR.
This study does not propose another AI assistant per se. Instead, it introduces and experimentally validates a hybrid supervised–generative architecture in which a fine-tuned Multilingual BERT model (Base version) first classifies learner errors into a pedagogically grounded taxonomy (lexical, syntactic, phonetic). This classification is automatically injected as structured metadata into the generative prompt of GPT-4 (v4-turbo), constraining the model to produce feedback aligned with the detected error type.
This integration constitutes the primary scientific contribution of the work, as it demonstrably reduces pedagogical hallucinations and increases the precision of automated feedback compared to standalone generative models. To evaluate this contribution, the following research questions are posed: (RQ1) To what extent does the integration of a supervised feedback module improve the morphological and syntactic accuracy of university students compared to non-specialized AI assistants? (RQ2) How reliable is ASR-based multimodal feedback for supporting pronunciation monitoring?
By answering these questions, this work contributes validated architecture and a measurable framework for evaluating AI-driven pedagogical interventions.
The main contributions of this study are summarized as follows:
  • Hybrid supervised–generative architecture: A novel integration of a fine-tuned Multilingual BERT classifier with GPT-4 through structured metadata injection, enabling pedagogically constrained feedback generation.
  • CEFR-aligned error taxonomy: Development and validation of a three-tier linguistic error classification framework (lexical, syntactic, phonetic) tailored to A1–A2 learners.
  • Pedagogical grounding mechanism: An experimentally validated metadata control strategy that reduces pedagogical hallucinations in generative AI feedback.
  • Reproducible evaluation framework: A multi-metric assessment combining F1-score, Word Error Rate (WER), hallucination rate, expert-rated pedagogical relevance, and statistical effect size analysis.
  • Empirical validation of learning gains: A quasi-experimental study demonstrating statistically significant improvement (p < 0.001; Cohen’s d = 1.12) compared to traditional instruction.
The structure of this article is as follows: Section 2 describes the state of the art in intelligent assistants for language learning; Section 3 presents the materials and methods employed, including system design, the architecture of the supervised feedback module, and the experimental protocol; Section 4 reports the results obtained and their comparative analysis with traditional methods; finally, Section 5 presents the conclusions and future research directions, highlighting the main contributions of the study and potential extensions of the proposed system.

2. State of the Art

In recent years, artificial intelligence-assisted language learning has evolved from simple individual practice applications into intelligent systems capable of interacting adaptively with learners. These advances have been driven by the development of large-scale language models, machine learning algorithms, and high-precision speech recognition services [4,5,7,9,13,14,15,16,17,18,19]. Several studies highlight how AI-based conversational agents and automated feedback systems have transformed language practice by enabling continuous interaction and immediate correction [7,9,13].

2.1. Commercial Systems and Current Educational Solutions

Despite these advances, the comparative analysis of recent studies on AI-driven language learning systems reveals several persistent limitations in the state of the art [5,7,9,13,19]:
  • Lack of technological openness: Many existing assistants rely on proprietary architectures that hinder replicability and limit academic experimentation [5,13].
  • Limited personalization of the learning environment: Systems often provide limited visual, auditory, or interaction customization, which affects learner engagement and accessibility [9,19].
  • Absence of supervised feedback: Error correction is typically limited to generic grammatical suggestions without pattern analysis or pedagogically grounded recommendations [7,13].
  • Restricted focus on language skills: Several solutions prioritize conversation or pronunciation, without integrating reading and writing practice within the same environment [9,19].
  • Lack of transparency and technical documentation: Many publications do not report internal system design or evaluation metrics (e.g., WER, F1-score), which complicates scientific validation and comparison [5,7].
While commercial platforms such as Duolingo and ELSA Speak demonstrate high deployment maturity and large-scale adoption, their primary objective is product optimization rather than experimental architectural transparency. In contrast, research-oriented intelligent assistants are designed to test specific pedagogical hypotheses, evaluate algorithmic mechanisms, and provide reproducible methodological descriptions.
Therefore, although commercial systems illustrate the practical feasibility of AI in language learning, their technological sophistication does not necessarily equate to scientific contribution in terms of architectural innovation, supervised integration strategies, or controlled pedagogical experimentation. The present study positions itself within the research-oriented paradigm, emphasizing reproducibility, modular transparency, and experimental validation over market deployment scale.

2.2. Methodological and Architectural Paradigms in AI-Based Language Learning

Recent research on AI-assisted language learning systems can be broadly categorized into three architectural paradigms.
  • End-to-end generative architectures, where large language models (LLMs) autonomously interpret input and generate feedback without intermediate supervision layers. These systems prioritize conversational fluency and scalability but often lack pedagogical transparency and explicit alignment with linguistic taxonomies [9,14,19].
  • Rule-based or automated writing evaluation (AWE) systems, which rely on predefined linguistic rules or feature engineering to provide structured corrections [6,10]. While these systems offer interpretability, they may struggle with contextual adaptability and multimodal interaction.
  • Emerging hybrid architectures, which integrate supervised classification models with generative components to combine interpretability with contextual richness. However, most existing implementations focus on either error detection or conversational interaction, rarely integrating both within a unified pedagogical framework.
From a learning paradigm perspective, many AI language assistants adopt a reactive correction model, where feedback is generated post hoc based on surface-level linguistic errors. Fewer systems implement structured pedagogical scaffolding aligned with proficiency standards such as the CEFR.
The proposed system aligns with the hybrid architectural paradigm but introduces a metadata-driven control mechanism that explicitly constrains generative feedback through supervised linguistic categorization. This approach situates the work within a pedagogically grounded, modular AI framework rather than a purely generative conversational model.
Beyond digitally mediated language systems, recent research has explored the paradigm of Physical AI or Embodied AI, where intelligent models operate under physical and environmental constraints, integrating perception–action loops and real-world interaction dynamics (e.g., DOI:10.1016/j.ress.2025.111898). These approaches emphasize sensorimotor grounding and adaptive behavior within cyber–physical or robotic systems. While such paradigms represent an important evolution of artificial intelligence research, their primary focus lies in embodied interaction rather than digitally mediated linguistic tutoring. The present study is therefore situated within cognitively grounded digital AI, where multimodal interaction occurs exclusively through speech and text interfaces in higher-education contexts.

2.3. Identified Gaps

Based on these limitations, a research opportunity emerges for systems that combine technological openness, pedagogically grounded feedback, and measurable evaluation frameworks. The proposed assistant addresses these gaps through the following contributions:
  • Technological openness: An open, modular architecture developed using standard web technologies, facilitating replicability in educational contexts without proprietary licenses;
  • Personalization of the learning environment: A dynamic configuration panel that allows multimodal interface adaptation to learner preferences;
  • Supervised feedback: An adaptive module based on supervised machine learning designed to identify recurring linguistic error patterns following a defined pedagogical taxonomy;
  • Integrated language skills approach: Multimodal input/output (voice and text) targeting CEFR levels A1–B2 to promote comprehensive skill development;
  • Technical transparency: Explicit reporting of evaluation metrics (WER < 10%, F1 > 0.85) to enable scientific validation.
Unlike general-purpose generative AI tools that provide unstructured corrections, the key innovation of this assistant lies in its specialized supervised feedback architecture. This system implements a principled feedback taxonomy through a Multilingual BERT Base model specifically trained to act as a pedagogical filter. This constitutes a reusable research output—a validated strategy for error classification that ensures alignment with CEFR standards, providing a level of instructional precision that advances the state of the art beyond simple engineering integration.

3. Materials and Methods

3.1. Objectives and Requirements

The design of the intelligent assistant begins with a clear definition of its pedagogical and functional objectives. At this stage, the target language was defined as English, aimed at university students from a basic proficiency level (A1–A2 according to the Common European Framework of Reference for Languages, CEFR), with an emphasis on reading comprehension, academic writing, and oral expression skills [20].
System requirements were divided into three categories:
  • Functional requirements: Providing instant feedback, grammatical correction, and enabling both written and voice-based input in real time;
  • Non-functional requirements: A user-friendly interface enabling intuitive use, response times below 2 s, and multiplatform compatibility (web and mobile);
  • Educational requirements: Alignment with CEFR standards, personalization of learning, and support for authentic materials (scientific articles, audio recordings, etc.).

3.2. System Design and Architecture

The Intelligent Assistant is composed of three main modules:
  • Natural Language Processing (NLP): Responsible for semantic understanding and response generation through the OpenAI API;
  • Speech recognition and synthesis (ASR/TTS): Implemented using native browser APIs (SpeechRecognition and SpeechSynthesis), enabling oral interaction in both Spanish and English;
  • Interactive feedback module: Responsible for managing conversation history, storing personalized user settings, and providing an adaptive learning experience.
Figure 1 illustrates the overall system architecture, in which each module communicates through internal APIs and HTTP requests to the backend server, allowing independent component updates and system scalability. The modular design facilitates extensibility to additional languages and future learning domains.
The system was implemented in JavaScript using the Node.js (Express) environment for the backend and deployed in the cloud via Render. The user interface was developed using HTML, CSS, and vanilla JavaScript, featuring a responsive and multiplatform design, also deployed on Render, enabling seamless use on both mobile and desktop devices.
The system is organized into two main layers:
A.
Client layer (frontend): A graphical interface that manages user interaction.
  • Allows the submission of written or spoken messages;
  • Provides personalization controls (volume, font, language, color, and background image);
  • Manages conversation history and user settings through localStorage.
B.
Server layer (backend): An Express.js service that processes requests to the GPT-4 model.
  • Receives conversational history and the active language;
  • Sends requests to the OpenAI API, receives responses, and returns them in JSON format;
  • Incorporates error handling and quota management.
Communication between both layers is carried out via HTTP POST requests to the /api/chat endpoint.
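As a minimal sketch of this request contract: the `/api/chat` path and the history-plus-active-language payload come from the architecture described above, while the JSON field names and the local port are illustrative assumptions, not part of the paper.

```python
import json
from urllib import request

def build_chat_payload(history, language):
    # 'history' and 'language' mirror what the backend receives
    # (conversation history + active language); the exact field
    # names are assumptions for illustration only.
    return json.dumps({"history": history, "language": language})

def post_chat(history, language, base_url="http://localhost:3000"):
    # HTTP POST to the /api/chat endpoint; the backend responds
    # with a JSON body containing the model's reply.
    req = request.Request(
        f"{base_url}/api/chat",
        data=build_chat_payload(history, language).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping the payload builder separate from the transport call makes the contract easy to unit-test without a running server.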

3.3. Technical Implementation and Reproducibility

To ensure the scientific rigor and reproducibility of this study, the assistant’s core logic is structured as a synchronous data pipeline that bridges supervised classification and generative response. The implementation details are as follows:
A.
Supervised Feedback Module (BERT) Training
The Multilingual BERT Base model was fine-tuned to act as a pedagogical filter using a transfer learning approach:
  • Data Source and Refinement: The model was pre-trained on the Wikitext corpus to ensure a broad understanding of linguistic structures. From this base, a subset of 12,000 sentences was curated and annotated by language experts to represent common learner errors at CEFR A1–A2 levels.
  • Dataset Splitting: To prevent overfitting and ensure the reliability of the reported F1-score (0.92), a standard 80/10/10 split was implemented for training, validation, and testing, respectively.
  • Hyperparameters: Fine-tuning was executed using the AdamW optimizer with a learning rate of 2 × 10⁻⁵ and a batch size of 16 over four epochs.
  • Annotation Taxonomy: Each sentence was labeled according to a three-tier taxonomy: Lexical (word choice); Syntactic (grammar and structure); Phonetic (based on ASR transcriptions).
The annotated dataset was constructed through a structured multi-stage review process. From the initial corpus, 12,000 learner-level sentences representative of CEFR A1–A2 proficiency were selected based on frequency of common learner errors reported in the prior literature.
Two language instructors independently reviewed and labeled the sentences according to the predefined three-tier taxonomy (Lexical, Syntactic, Phonetic). Discrepancies were resolved through a consensus discussion process to ensure annotation consistency and pedagogical validity.
To reduce potential bias and avoid dominance of a single category during training, the dataset was stratified to maintain proportional representation across error types. The final distribution was approximately balanced, reflecting realistic learner error patterns while preventing overfitting toward a single linguistic dimension.
Before model training, the dataset was shuffled and verified to ensure no duplication between training, validation, and test splits.
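The shuffling, 80/10/10 splitting, and duplication check described above can be sketched as follows; the fixed random seed and the exact de-duplication strategy are assumptions added for reproducibility of the sketch.

```python
import random

def split_dataset(sentences, seed=42):
    """Shuffle annotated sentences and split them 80/10/10 into
    train/validation/test, verifying no overlap between splits."""
    unique = list(dict.fromkeys(sentences))  # drop exact duplicates first
    rng = random.Random(seed)
    rng.shuffle(unique)
    n = len(unique)
    n_train, n_val = int(n * 0.8), int(n * 0.1)
    train = unique[:n_train]
    val = unique[n_train:n_train + n_val]
    test = unique[n_train + n_val:]
    # verify no leakage across the three splits
    assert not (set(train) & set(val))
    assert not (set(train) & set(test))
    assert not (set(val) & set(test))
    return train, val, test
```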
B.
Data Pipeline and Integration
Unlike general-purpose AI tools, where feedback depends entirely on user prompt formulation, the proposed system introduces a supervised pedagogical layer that operates independently of the generative model.
When a learner submits input (text or ASR-transcribed speech), the content is first processed by the Supervised Feedback Module based on a fine-tuned Multilingual BERT classifier trained on CEFR-aligned annotated errors. This module identifies morphological, syntactic, and lexical error patterns and maps them into a predefined pedagogical feedback taxonomy.
The output of this classification is not presented directly to the user. Instead, it is transformed into structured pedagogical metadata that is injected into the GPT-4 system prompt. As a result, GPT-4 does not autonomously decide how to correct the learner; rather, it follows a feedback structure previously validated by the supervised model.
This architectural separation ensures pedagogical consistency, reproducibility, and CEFR alignment that cannot be replicated through prompt engineering alone or by directly using general-purpose AI tools.
The interaction then follows the execution sequence described below:
  • Input Categorization: User input (text or ASR-transcribed speech) is processed by the BERT module to identify the error type: Lexical, Syntactic, or Phonetic;
  • Metadata Injection: The identified category is injected as a metadata tag (e.g., [ERROR_TYPE: SYNTACTIC]) into the system prompt sent to GPT-4;
  • Contextual Generation: GPT-4 generates a response that must first address the tagged error before continuing the conversation.
The interaction logic is visualized in Figure 2, showing how the input travels from the student to the final pedagogical response through a metadata-driven process.
This process establishes a deterministic information flow between modules. The BERT classifier does not generate feedback text; instead, it produces an error label that acts as a control signal for the generative engine. GPT-4 receives this signal as part of its system context and is constrained to generate a response that first addresses the detected error category before continuing the interaction. Therefore, the coherence of the pedagogical suggestion does not emerge from the LLM autonomously, but from the structured guidance provided by the supervised model.
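A minimal sketch of this metadata injection step, assuming the `[ERROR_TYPE: ...]` tag format used by the system; the prompt wording surrounding the tag is illustrative, not the production prompt.

```python
ERROR_TYPES = {"LEXICAL", "SYNTACTIC", "PHONETIC"}

BASE_PROMPT = (
    "Act as a supportive English tutor; do not provide the full "
    "answer immediately. Keep vocabulary within CEFR A1-B2."
)

def build_system_prompt(error_type=None):
    """Inject the supervised classifier's label as a metadata tag
    that constrains the generative model's next response."""
    if error_type is None:
        return BASE_PROMPT
    label = error_type.upper()
    if label not in ERROR_TYPES:
        raise ValueError(f"unknown error category: {error_type}")
    return (
        f"{BASE_PROMPT}\n[ERROR_TYPE: {label}]\n"
        "Address the tagged error before continuing the conversation."
    )
```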
C.
Prompt Engineering and Constraints
To maintain the assistant’s role as a tutor for A1–A2 university students, the generative engine is governed by a system-level prompt:
  • Role Constraint: “Act as a supportive English tutor; do not provide the full answer immediately”;
  • Complexity Level: The output vocabulary and sentence structure are restricted to CEFR A1–B2 levels to ensure comprehensibility;
  • Latency Management: To meet the requirement of response times below 2 s, the system streams model output and applies indexing and data caching.
D.
Learning Strategy and Long-Term Adaptability
The supervised feedback module follows a two-stage learning strategy:
  • Initial Supervised Fine-Tuning:
    The Multilingual BERT Base model was fine-tuned on a curated dataset of 12,000 annotated learner sentences using transfer learning. This phase establishes a stable pedagogical classifier aligned with CEFR A1–A2 error patterns.
  • Periodic Batch Updating Strategy:
    Rather than performing real-time online learning, the system adopts a controlled batch retraining approach. Anonymized user interaction data are periodically reviewed and re-annotated when necessary. Model updates are executed offline to prevent concept drift and preserve classification stability.
To ensure long-term adaptability while maintaining reproducibility, updates follow these constraints:
  • Version-controlled model checkpoints;
  • Re-evaluation of the original validation/test split;
  • Monitoring of F1-score variance (±2% tolerance);
  • Bias assessment across lexical, syntactic, and phonetic categories.
This strategy balances pedagogical consistency with incremental improvement, avoiding instability caused by uncontrolled self-learning mechanisms.
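The F1-variance gate above can be expressed as a simple acceptance check. Here the ±2% tolerance is interpreted as two absolute points of F1 against the reported baseline of 0.92, and the per-category form is an assumption extending the bias-assessment constraint.

```python
def accept_checkpoint(per_category_f1, baseline_f1=0.92, tolerance=0.02):
    """Accept a retrained checkpoint only if every error category
    (lexical, syntactic, phonetic) stays within the tolerance band
    around the baseline F1-score on the original test split."""
    return all(abs(f1 - baseline_f1) <= tolerance
               for f1 in per_category_f1.values())
```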

3.4. System Development Stages

The first stage involves defining objectives and requirements, including aspects such as the target language, target audience, skills to be addressed, and the platform to be used. It is also essential to determine the assistant’s functionalities, such as instant feedback, error correction, and activity recommendations [20].
The second stage focuses on developing the user interface, covering visual design (intuitive screens, buttons, icons), interaction flow (voice or text input, presentation of lessons and activities), and gamification elements to enhance motivation and engagement [21].
The third stage consists of integrating artificial intelligence technologies, including the NLP platform, speech recognition and text-to-speech synthesis, and the error analysis and feedback system [21].
The fourth stage involves testing and validation, encompassing usability testing, user experience (UX) testing, evaluation of AI performance, speech recognition testing, content validation, and security testing (data encryption validation, prevention of Cross-Site Scripting attacks, and protection against code injection) [21].
The fifth stage addresses continuous development and optimization, including improvements to AI models, data collection, data-driven adjustments, content updates, and performance optimization (e.g., response time reduction through indexing and data caching) [20,21].
The sixth stage consists of system deployment and monitoring, including public release (deployment on AWS, Google Cloud, or Azure), real-time performance monitoring (integration of Google Analytics and Firebase), user support, and technical maintenance [20].
The seventh stage focuses on scalability and maintenance, including infrastructure optimization, accommodation of user base growth, and updating AI components with new advances and improvements [21].

3.5. User Interface

The user interface was designed following a User-Centered Design (UCD) approach, prioritizing simplicity, accessibility, and learner motivation [22]. The following elements were implemented:
  • Visual design: Use of clear iconography, intuitive menus, and a user-friendly color scheme to minimize cognitive load;
  • Interaction flow: Users can communicate via text or voice. Lessons are presented as interactive activities (quizzes, role-plays, text writing, etc.) with immediate feedback.
The interface was developed using HTML for structural layout and CSS for visual styling, ensuring consistency across devices. It is fully responsive and designed to run on modern web browsers without requiring additional installation. Conversation history is stored locally using localStorage, enabling persistence across sessions. The design promotes a fluid conversational experience, particularly for beginner language learners. The user interface is visualized in Figure 3.

3.6. Artificial Intelligence Technologies

Several artificial intelligence technologies were integrated to enhance human–computer interaction and support the assistant’s core functionalities:
  • Natural Language Processing (NLP): The NLP module uses the GPT-4 model to interpret user input and generate contextually appropriate responses. The model was configured through specialized prompts that define its behavior as a language assistant, correcting grammatical errors and fostering natural conversation in English or Spanish, depending on the user’s selection.
  • Speech Recognition and Synthesis (ASR/TTS): Speech recognition was implemented using the native SpeechRecognition API, which converts user utterances into visible text prior to submission, allowing manual review. Speech synthesis was managed through SpeechSynthesisUtterance, with dynamic voice selection based on the active language. This integration ensures natural interaction without requiring additional software and preserves user privacy by operating directly within the browser.
  • Supervised Feedback Module: The system stores user interaction history to provide adaptive feedback within each session. Supervision is performed using a Multilingual BERT Base model trained on 12,000 annotated sentences. The current implementation does not include fine-tuning of the generative model or autonomous learning from user data. However, the modular architecture allows potential future extensions toward LLM fine-tuning under controlled experimental conditions.
The system’s technical contribution is centered on the integration of the Supervised Feedback Module with the generative NLP engine. This module acts as an intermediary pedagogical layer that processes user input before final response generation. This architecture follows a pipeline approach:
  • Phase 1: The Multilingual BERT model identifies and categorizes the error pattern (Lexical, Syntactic, or Phonetic) with a validated F1-score of 0.92.
  • Phase 2: This structured classification is injected as metadata into the GPT-4 prompt.
  • Phase 3: The LLM generates a targeted suggestion based on the specific error tag, ensuring pedagogical consistency.
This specific design choice allows the assistant to move beyond ‘black-box’ corrections, offering a transparent and replicable method for automated tutoring.
The system was designed under a modular architecture that allows future model updates without requiring a complete system redesign.
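The three-phase pipeline above can be condensed into a single dispatch function. The classifier and generator are passed in as toy stubs here, since the real system calls a fine-tuned Multilingual BERT model and the GPT-4 API, respectively.

```python
def pedagogical_pipeline(user_input, classify, generate):
    # Phase 1: supervised classification of the error pattern
    error_type = classify(user_input)
    # Phase 2: inject the category as structured metadata
    system_prompt = f"[ERROR_TYPE: {error_type.upper()}]"
    # Phase 3: constrained generation addressing the tagged error
    return generate(system_prompt, user_input)

# Example with illustrative stubs in place of the real models:
reply = pedagogical_pipeline(
    "I goed home",
    classify=lambda text: "syntactic",
    generate=lambda prompt, text: f"{prompt} feedback for: {text}",
)
```

Because the classifier's output is a control signal rather than user-facing text, swapping either model leaves the pipeline contract unchanged.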

3.7. Testing and Validation

The evaluation phase was designed as an exploratory pilot study aimed at assessing system feasibility, technical reliability, and user acceptance rather than establishing causal learning gains. A closed beta version was deployed with 48 undergraduate students from different faculties who voluntarily interacted with the system under authentic usage conditions.
To ensure system quality and reliability, multiple tests were conducted:
  • Usability testing: Conducted using the think-aloud method [17], allowing observation of interaction patterns, navigation issues, and user difficulties during real-time task completion.
  • User experience (UX) testing: The System Usability Scale (SUS) was administered after system interaction. The assistant achieved a mean score of 80 (SD = 6.4; 95% CI [77.9, 82.1]), indicating high perceived usability according to established SUS benchmarks [23].
  • AI performance evaluation: Speech recognition reliability was assessed through Word Error Rate (WER) computed over 320 spoken utterances collected during pilot sessions. The system achieved a mean WER of 8.7% (SD = 2.1%), reflecting consistent recognition performance across users and tasks. The Supervised Feedback Module was evaluated on a held-out test set (10% of the annotated dataset) using five-fold cross-validation, achieving a mean F1-score of 0.92 (SD = 0.03), indicating stable classification performance across folds.
  • Security testing: SSL encryption, SQL injection prevention mechanisms, and protection against Cross-Site Scripting (XSS) attacks were implemented and validated to ensure user data protection.
Given the exploratory scope of this pilot deployment, the evaluation relied on descriptive statistics rather than inferential testing. No randomized group assignment or controlled comparison with traditional instruction was performed at this stage. Future studies will incorporate controlled experimental designs and statistical hypothesis testing to rigorously assess learning gains attributable to the system.

3.8. Continuous Development and Optimization

Based on data collected during testing, a continuous improvement strategy was implemented to enable dynamic adjustments, performance optimization, and adaptation of the user experience to different learning profiles:
  • AI model updates: Periodic training of models using anonymized interaction data was scheduled;
  • User behavior analysis: Usage analytics were employed to adjust learning pathways and personalize the educational experience;
  • Performance optimization: Caching techniques, latency reduction, and automatic scalability were implemented to ensure fast response times, even under high demand.
The system was designed to incorporate A/B testing, allowing comparison of different versions of features and content [22].

3.9. Deployment and Monitoring

The initial release was conducted through a closed beta version distributed to 48 students from different faculties. Post-launch monitoring included the following:
  • Cloud deployment: System implementation using cloud services such as Render (backend) and Vercel (frontend);
  • Real-time monitoring: Integration of UptimeRobot and Sentry to verify server availability at regular intervals and report real-time errors from both frontend and backend components.
A continuous update policy was implemented to maintain system functionality and security.

3.10. Scalability and Maintenance

The system was developed with scalability in mind, ensuring reliable operation with a growing user base without compromising performance or service quality:
  • Horizontal scalability: Achieved through load balancing and distributed cloud containers;
  • Automated maintenance: Unit and integration tests executed with each update via CI/CD pipelines (GitHub Actions v4, running on an Ubuntu-latest environment);
  • AI updates: A model update strategy was established to maintain pedagogical and technological relevance, based on recent advances in NLP and machine learning using Python v3.9 and Transformers v4.37.0.
Finally, a technical and pedagogical sustainability plan was defined, including partnerships with educational institutions to validate and extend the system’s use.

4. Results and Discussion

The results obtained during the development, testing, and validation phases of the artificial intelligence (AI)-based intelligent assistant indicate a significant improvement in the language learning process compared with traditional methods. This section discusses the most relevant findings in terms of pedagogical effectiveness, user experience, technical performance, and scalability potential.

4.1. Technical Performance of the Speech Recognition System

The automatic speech recognition (ASR) system, implemented using the native Web Speech API (SpeechRecognition), achieved an overall Word Error Rate (WER) below 10% during tests with real users. No domain-specific fine-tuning was performed, and the reported WER reflects empirical evaluation of browser-based recognition performance within the study context.
The results indicate the following:
  • Overall average WER: 8.0%
  • Guided reading: 6.2%
  • Free conversation: 9.5%
  • Academic oral production: 10.3%
To evaluate the reliability of ASR-based feedback for pronunciation monitoring (RQ2), the system’s Word Error Rate (WER) was analyzed as an indicator of transcription fidelity. A WER below 10% is generally considered acceptable for educational feedback systems, particularly in beginner-level language contexts.
The observed average WER of 8.0%, with limited variance across task types (σ = 1.2), indicates stable transcription performance. This level of accuracy ensures that most phonological deviations detected by the system correspond to actual learner pronunciation errors rather than recognition artifacts.
Therefore, the ASR component can be considered sufficiently reliable to support pronunciation monitoring in A1–A2 learners, as it provides consistent input to the supervised feedback module without introducing significant distortion. Figure 4 presents a comparison of the speech recognition system accuracy measured using WER.
These results provide empirical support for RQ2, confirming that ASR-based multimodal feedback is technically reliable for pronunciation monitoring within beginner-level educational settings.
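WER, as used throughout this evaluation, is the word-level edit distance (substitutions, insertions, and deletions) between the reference transcript and the ASR hypothesis, divided by the number of reference words. A minimal reference implementation follows; the example sentences are illustrative, not drawn from the study corpus.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
print(wer("the cat sat on the mat", "the cat sit on mat"))
```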

4.2. Performance of the Supervised Feedback Module

One of the most highly valued aspects by users was the assistant’s ability to provide immediate and personalized feedback. The integration of natural language processing (NLP) and speech recognition technologies enabled fluid and natural interactions, fostering a learning experience closer to real-world contexts [7,24]. This functionality not only increased student motivation but also reduced frustration associated with frequent errors, allowing for a more constructive learning approach.
The BERT-based module achieved an average F1 score of 0.92 in the classification of linguistic errors. Table 1 shows the performance broken down by error category.
Additionally, the system achieved the following:
  • Precision: 0.91
  • Recall: 0.93
  • Average response time: 1.4 s
The model identifies the error category and generates a coherent contextual suggestion through the NLP module. Figure 5 shows a comparison of the feedback system performance measured using the F1 score.
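The F1 score is the harmonic mean of precision and recall; the reported module-level F1 of 0.92 is consistent with the stated precision (0.91) and recall (0.93), as the short check below illustrates.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported module-level values: precision 0.91, recall 0.93.
print(round(f1(0.91, 0.93), 2))  # 0.92, matching the reported F1 score
```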
Ablation Study and Baseline Comparison
To isolate the specific contribution of the Supervised Feedback Module, an ablation study was conducted comparing the proposed hybrid architecture (BERT + GPT-4) against a standalone LLM baseline (GPT-4 without supervised error classification).
Experimental Setup
A total of 420 learner utterances (independent from the BERT training dataset) were randomly sampled from the test corpus. Each utterance contained at least one identifiable lexical, syntactic, or phonetic error aligned with the CEFR A1–A2 taxonomy.
Two configurations were evaluated:
  • Baseline (GPT-4 only): The learner input was directly sent to GPT-4 using the same pedagogical system prompt but without metadata injection.
  • Proposed Hybrid (BERT + GPT-4): The input was first classified by the fine-tuned Multilingual BERT model. The predicted error label was injected as structured metadata (e.g., [ERROR_TYPE: SYNTACTIC]) into the GPT-4 system prompt.
Evaluation Metrics
Three complementary metrics were used:
  1. Classification F1-score: For the GPT-4-only baseline, the error category was inferred by prompting the model to explicitly identify the detected error type before generating feedback. These predicted labels were compared against the gold-standard annotated taxonomy to compute precision, recall, and F1-score.
  2. Pedagogical Hallucination Rate: A pedagogical hallucination was operationally defined as feedback that (a) misclassified the linguistic error, (b) provided irrelevant instructional advice, or (c) contradicted CEFR-level alignment. The hallucination rate was calculated as the proportion of outputs meeting any of these criteria.
  3. Pedagogical Relevance: Two independent language education experts evaluated feedback quality using a five-point Likert scale (1 = irrelevant, 5 = highly pedagogically aligned). Inter-rater agreement was calculated using Cohen’s κ (κ = 0.82, indicating strong agreement). Average scores were then categorized as Low (<3) or High (≥3).
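Cohen’s κ corrects raw rater agreement for the agreement expected by chance, κ = (p_o − p_e)/(1 − p_e). The sketch below computes it for a toy pair of rating sequences (illustrative only, not the study’s annotation data).

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters,
    kappa = (p_o - p_e) / (1 - p_e)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence of the raters' label distributions.
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum((ca[label] / n) * (cb[label] / n) for label in set(ca) | set(cb))
    return (p_o - p_e) / (1 - p_e)

# Toy example: two raters labelling 10 feedback items High/Low.
a = ["H", "H", "H", "H", "L", "L", "L", "L", "H", "L"]
b = ["H", "H", "H", "L", "L", "L", "L", "H", "H", "L"]
print(round(cohens_kappa(a, b), 2))  # 0.6
```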
Results
As shown in Table 2, the hybrid architecture significantly outperformed the standalone LLM baseline.
  • F1-score improved from 0.71 (GPT-4 only) to 0.92 (Hybrid).
  • Pedagogical hallucinations decreased from 28.8% to 7.4%.
  • Mean pedagogical relevance score increased from 2.6 to 4.3.
This reduction of 21.4 percentage points in the hallucination rate confirms that the supervised classification layer acts as a grounding mechanism, constraining the generative engine and stabilizing instructional feedback.
Table 2. Ablation study: Impact of the Supervised Feedback Module on pedagogical accuracy.

Configuration                  | Accuracy (F1) | Hallucination Rate | Pedagogical Relevance
Baseline (GPT-4 only)          | 0.71          | 28.8%              | 2.6
Proposed Hybrid (BERT + GPT-4) | 0.92          | 7.4%               | 4.3

4.3. Learning Improvement Compared with Traditional Methods

To evaluate the assistant’s effectiveness, a quasi-experimental study was conducted with 48 undergraduate students (n = 48) from the Universidad Nacional San Luis Gonzaga (UNICA). Participants (A1–A2 level) were randomly assigned to an Experimental Group (n = 24) and a Control Group (n = 24).
To ensure instructional alignment, both groups followed equivalent learning objectives based on CEFR A1–A2 descriptors. The instructional focus during the intervention period included present and past tense usage, basic conditional structures (type 0 and 1), contextual academic vocabulary, and short-text reading comprehension. The intelligent assistant was configured to generate activities and feedback specifically targeting these competencies.
The assessment instrument consisted of a 30-item test divided into three sections: (i) reading comprehension with multiple-choice inference questions based on short academic texts, (ii) morphosyntactic accuracy tasks including sentence completion and error correction, and (iii) contextual vocabulary usage. The test items were developed according to CEFR A1–A2 descriptors and reviewed by two independent language instructors to ensure content validity.
The results, obtained through a pre-test/post-test protocol, show that students using the intelligent assistant achieved a 38.2% improvement in their overall scores compared to the control group. A paired t-test was performed to validate these results, yielding a significance level of p < 0.001 (t = 5.42), which indicates that the improvement is statistically robust. Furthermore, the Cohen’s d effect size was 1.12, representing a large practical impact.
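For a paired pre-test/post-test design, the test statistic and effect size are t = mean(diff)/(sd(diff)/√n) and Cohen’s d = mean(diff)/sd(diff). The sketch below applies both formulas to small synthetic score vectors; the numbers are illustrative only and are not the study’s data.

```python
import math
from statistics import mean, stdev

def paired_t_and_d(pre, post):
    """Paired t statistic and Cohen's d for paired samples:
    t = mean(diff) / (sd(diff) / sqrt(n)),  d = mean(diff) / sd(diff)."""
    diffs = [b - a for a, b in zip(pre, post)]
    m, s, n = mean(diffs), stdev(diffs), len(diffs)
    return m / (s / math.sqrt(n)), m / s

# Synthetic illustrative scores (NOT the study data).
pre  = [10, 12, 11, 13, 9]
post = [14, 15, 13, 16, 12]
t, d = paired_t_and_d(pre, post)
print(round(t, 2), round(d, 2))  # 9.49 4.24
```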
The statistical improvement was specifically concentrated in two key dimensions:
  • Reading Comprehension: The experimental group showed a higher distribution of scores in contextual inference and narrative vocabulary (M = 17.8, SD = 1.4) compared to the control group (M = 14.1, SD = 1.9).
  • Morphosyntactic Competence: There was a significant reduction in errors related to verb tenses and conditional structures. The inferential analysis suggests that the immediate feedback from the Supervised Feedback Module allowed students to self-correct in real time.
Although the quantitative instrument focused on grammar and reading, qualitative feedback indicates that oral interaction increased students’ confidence. This suggests a positive transfer from oral practice to grammatical accuracy. These findings align with the recent literature on AI-driven personalization [1,4,5,6,14,15] but provide new evidence on the role of hybrid architectures in stabilizing learning gains.

4.4. Usability and User Experience

Usability testing using the think-aloud method and System Usability Scale (SUS) surveys reflected a high level of system acceptance, with an average score of 80. In usability studies applied to educational technologies—particularly interactive platforms and online learning systems—average SUS scores typically range between 70 and 75, which is interpreted as acceptable performance [25].
To objectively evaluate user experience, the methodology proposed by Tullis and Albert for the collection and analysis of usability metrics was followed [23]. Based on their standards for interactive systems, the SUS scale was applied, considering that a score above 68 is acceptable and above 80 represents excellence in perceived usability, thereby validating the interface design. This result suggests that the user-centered design approach was effective in ensuring intuitive and accessible navigation, even for students with limited technological experience. The responsive interface structure also enabled a consistent experience across different devices, a fundamental aspect for encouraging continuous and autonomous system use [23].
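The SUS score is computed from ten 1–5 Likert responses using the standard scoring rule: odd-numbered (positively worded) items contribute (score − 1), even-numbered (negatively worded) items contribute (5 − score), and the sum is scaled by 2.5 to a 0–100 range. A minimal implementation:

```python
def sus_score(responses):
    """System Usability Scale score from ten 1-5 Likert responses.
    Odd-numbered items contribute (score - 1), even-numbered items
    (5 - score); the sum is scaled by 2.5 to a 0-100 range."""
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5

# A maximally positive respondent (5 on odd items, 1 on even items) scores 100.
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
```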

4.5. Scalability and Performance

The system was deployed using a cloud-based containerized architecture to allow flexible resource allocation. To obtain a preliminary indication of scalability behavior, a controlled load simulation was conducted by generating concurrent HTTP POST requests to the /api/chat endpoint under realistic conversational conditions.
The exploratory test simulated up to 20 concurrent users interacting with the system over short time intervals. During this simulation:
  • Mean response latency remained below 1.5 s;
  • No critical server failures were observed;
  • Error rate remained below 2%.
These preliminary observations suggest stable performance under moderate institutional usage scenarios (e.g., classroom-level deployment). However, a full-scale throughput benchmarking study—including higher concurrency levels, distributed load balancing analysis, and long-duration stress testing—was not conducted within the scope of this pilot study.
Therefore, while the modular cloud-based architecture supports horizontal scalability in principle, large-scale deployment scenarios (e.g., hundreds of simultaneous users) will require dedicated performance engineering, autoscaling policies, and systematic load-testing frameworks. Future work will incorporate formal stress-testing protocols and detailed throughput metrics to validate performance under high-demand conditions.
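The load simulation described above can be sketched with a thread pool that fires concurrent requests and aggregates latency and error statistics. In the sketch below the `/api/chat` endpoint is replaced by a local stub with a fixed simulated delay (an assumption for self-containment); a real test would issue HTTP POST requests to the deployed service instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def stub_chat_endpoint(message: str) -> dict:
    """Stand-in for the /api/chat endpoint: simulates model latency.
    A real load test would issue HTTP POST requests over the network."""
    time.sleep(0.05)  # simulated processing time
    return {"status": 200, "reply": f"feedback for: {message}"}

def run_load_test(concurrent_users: int = 20) -> dict:
    """Fire one request per simulated user; collect latency and error stats."""
    latencies = []

    def one_request(i):
        start = time.perf_counter()
        resp = stub_chat_endpoint(f"utterance {i}")
        latencies.append(time.perf_counter() - start)
        return resp["status"]

    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        statuses = list(pool.map(one_request, range(concurrent_users)))
    return {
        "mean_latency_s": sum(latencies) / len(latencies),
        "error_rate": sum(s != 200 for s in statuses) / concurrent_users,
    }

print(run_load_test(20))
```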

4.6. Limitations and Opportunities for Improvement

Despite the positive results, several limitations were identified. For instance, certain regional accents or unclear pronunciations affected speech recognition accuracy, particularly at beginner levels. Furthermore, although the system is capable of suggesting personalized activities, its ability to adapt to more complex pedagogical strategies can still be optimized. Future versions of the assistant may benefit from reinforcement learning models that dynamically adjust content progression. A potential threat to the internal validity of this study is the ‘novelty effect’, where students may show increased engagement simply due to the introduction of new technology. Future longitudinal studies are required to determine if these learning gains persist over longer academic periods.

Ethical, Privacy, and Pedagogical Considerations

Beyond technical limitations, the integration of continuous AI assistance in language learning environments raises important ethical and pedagogical considerations.
  • Although the system does not permanently store personally identifiable information, interaction logs processed during sessions may contain sensitive linguistic or contextual data. While SSL encryption and secure API communication protocols were implemented, long-term large-scale deployments would require stricter governance mechanisms, including anonymization pipelines, explicit institutional data policies, and compliance with data protection regulations.
  • The use of AI-mediated feedback introduces a potential dependency risk. Learners may over-rely on automated correction mechanisms, potentially reducing opportunities for productive struggle and metacognitive reflection. To mitigate this, the system was designed to provide guided suggestions rather than full answers; however, sustained longitudinal studies are necessary to evaluate whether continuous AI support affects learner autonomy over time.
  • Algorithmic bias remains a possible concern. The supervised feedback module was trained on a curated dataset aligned with CEFR A1–A2 learner errors. Although care was taken to ensure representativeness, linguistic bias related to accent, regional variation, or annotation subjectivity cannot be fully excluded. Future research should incorporate more diverse learner corpora and bias auditing procedures.
  • AI systems should complement rather than replace instructors. The assistant is positioned as a supportive pedagogical tool that extends practice opportunities outside classroom hours, not as a substitute for human guidance. Maintaining this balance is critical to prevent instructional displacement and ensure responsible AI integration in higher education.

5. Conclusions

The development of an artificial intelligence (AI)-based intelligent assistant for language learning has proven to be an effective alternative to traditional teaching methods. Throughout this study, a technological solution was successfully designed, developed, implemented, and validated, being capable of delivering a personalized, interactive, and adaptable learning experience tailored to the needs of university students.
The results demonstrate a significant improvement in key language skills, including reading comprehension, written production, and oral expression. These improvements, evidenced through both quantitative and qualitative metrics, are associated with the assistant’s ability to provide immediate feedback, enable natural interaction through voice and text, and offer personalized learning pathways based on continuous analysis of user performance.
The system’s modular and scalable approach, combined with the use of artificial intelligence technologies such as natural language processing (NLP) and speech recognition, ensured robust technical performance and a positive user experience. The high scores obtained in usability and user satisfaction tests (SUS), together with the low error rates observed in the AI modules, support the viability of the assistant as an effective educational tool.
Furthermore, a solid foundation for the system’s future growth has been established. The flexible architecture allows for continuous updates of AI models, integration of new content, and expansion to additional languages or educational levels. Automated monitoring and maintenance capabilities also ensure system stability and long-term evolution.
Nevertheless, several challenges were identified that should be addressed in future versions, such as improving the recognition of regional accents and achieving greater sophistication in the personalization of pedagogical strategies. These aspects represent opportunities to further enhance the user experience and maximize the assistant’s educational impact.
In conclusion, the intelligent assistant developed in this study represents a significant advancement in the application of artificial intelligence to language learning, aligning with current trends in digital transformation in higher education. Its implementation can effectively contribute to improving students’ language competencies and reducing barriers to accessing quality education, particularly in contexts where proficiency in a second language is essential for academic and professional integration.

Future Research Directions

Future research should extend the present work along three main lines.
First, adaptive pedagogy mechanisms can be further developed by incorporating longitudinal learner modeling. Integrating reinforcement learning or dynamic difficulty adjustment algorithms could enable the assistant to personalize instructional sequences based not only on immediate errors but also on cumulative learner progression patterns.
Second, multimodal learning analytics should be explored to better understand the interaction between textual, phonetic, and behavioral data. Combining ASR metrics (e.g., WER), linguistic error taxonomy, response latency, and user interaction patterns may allow the construction of predictive models capable of identifying learning trajectories and early indicators of stagnation or improvement.
Third, deeper forms of human–AI collaboration should be investigated. Rather than positioning the assistant solely as an autonomous feedback provider, future systems may incorporate teacher-in-the-loop architectures, where instructors can review, adjust, or guide AI-generated pedagogical strategies. Such collaborative frameworks would strengthen instructional alignment and mitigate over-reliance risks.
Finally, larger-scale controlled experimental studies with diverse learner populations are necessary to validate the long-term educational impact of hybrid supervised–generative architectures across different linguistic contexts and proficiency levels.

Author Contributions

Conceptualization, D.D.-L.-C.-S.; methodology, D.D.-L.-C.-S. and J.S.-H.; software, D.D.-L.-C.-S. and J.S.-H.; validation, D.D.-L.-C.-S. and M.G.-H.; formal analysis, E.P.-C. and M.G.-H.; investigation, M.S.-H. and M.G.-H.; resources, E.P.-C.; data curation, M.S.-H.; writing—original draft preparation, D.D.-L.-C.-S.; writing—review and editing, M.S.-H. and J.S.-H.; visualization, D.D.-L.-C.-S. and M.G.-H.; supervision, E.P.-C.; project administration, E.P.-C.; funding acquisition, E.P.-C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval were waived for this study due to the use of fully anonymized data where participants cannot be identified in any way, posing no risk to their privacy or well-being.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study. Specific written informed consent was waived due to the complete anonymity of the participants and the fact that the data cannot be traced back to any individual. Participants were informed about the research purposes and voluntarily agreed to participate by completing the survey.

Data Availability Statement

The data presented in this study are available in this article.

Acknowledgments

This paper was developed based on the data obtained from the experimental testing of the intelligent assistant with students from San Luis Gonzaga National University.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sabzalieva, E.; Valentini, A. ChatGPT e Inteligencia Artificial en la Educación Superior: Guía de Inicio Rápido; UNESCO Biblioteca Digital: Paris, France, 2023. [Google Scholar]
  2. Pedreño Muñoz, A.; González Gosálbez, R.; Mora Illán, T.; Pérez Fernández, E.M.; Ruiz Sierra, J.; Torres Penalva, A. La Inteligencia Artificial en las Universidades: Retos y Oportunidades, 1st ed.; Grupo 1 Million Bot; Amazon Services International LLC.: Seattle, WA, USA, 2024; Available online: https://1millionbot.com/la-inteligencia-artificial-en-las-universidades-retos-y-oportunidades/ (accessed on 20 August 2025).
  3. Crompton, H.; Burke, D. Artificial intelligence in higher education: The state of the field. Int. J. Educ. Technol. High. Educ. 2023, 20, 22. [Google Scholar] [CrossRef]
  4. Ghafar, Z.N.; Salh, H.F.; Abdulrahim, M.A.; Farxha, S.S.; Arf, S.F.; Rahim, R.I. The role of artificial intelligence technology on English language learning: A literature review. Can. J. Lang. Lit. Stud. 2023, 3, 17–31. [Google Scholar] [CrossRef]
  5. Zawacki-Richter, O.; Marín, V.I.; Bond, M.; Gouverneur, F. Systematic review of research on artificial intelligence applications in higher education—Where are the educators? Int. J. Educ. Technol. High. Educ. 2019, 16, 39. [Google Scholar] [CrossRef]
  6. Ranalli, J.; Link, S.; Chukharev-Hudilainen, E. Automated writing evaluation for formative assessment of second language writing. Assess. Writ. 2017, 32, 17–28. [Google Scholar] [CrossRef]
  7. Belda-Medina, J.; Calvo-Ferrer, J.R. Using Chatbots as AI Conversational Partners in Language Learning. Appl. Sci. 2022, 12, 8427. [Google Scholar] [CrossRef]
  8. National Academies of Sciences, Engineering, and Medicine. Artificial Intelligence and the Future of Work; The National Academies Press: Washington, DC, USA, 2024. [Google Scholar] [CrossRef]
  9. Huang, W.; Hew, K.F.; Fryer, L.K. Chatbots for language learning—Are they really useful? A systematic review. J. Comput. Assist. Learn. 2022, 38, 237–257. [Google Scholar] [CrossRef]
  10. Neri, M.; Retelsdorf, J. The role of linguistic features in the evaluation of student writing: A systematic review. Educ. Res. Rev. 2022, 37, 100460. [Google Scholar] [CrossRef]
  11. Chen, Y.-L.; Hsu, C.-C.; Lin, C.-Y.; Hsu, H.-H. Robot-assisted language learning: Integrating artificial intelligence and virtual reality into English tour guide practice. Educ. Sci. 2023, 12, 437. [Google Scholar] [CrossRef]
  12. Sajja, R.; Sermet, Y.; Cikmaz, M.; Cwiertny, D.; Demir, I. Artificial intelligence-enabled intelligent assistant for personalized and adaptive learning in higher education. Information 2024, 15, 596. [Google Scholar] [CrossRef]
  13. Bibauw, S.; François, T.; Desmet, P. Discussing with a computer to practice a foreign language: Research synthesis and conceptual framework of dialogue-based CALL. Comput. Assist. Lang. Learn. 2019, 32, 827–877. [Google Scholar] [CrossRef]
  14. Hockly, N. Artificial intelligence in English language teaching: The good, the bad and the ugly. RELC J. 2023, 54, 445–451. [Google Scholar] [CrossRef]
  15. Gkountara, D.N.; Prasad, R. A review of artificial intelligence in foreign language learning. In Proceedings of the IEEE International Conference on Artificial Intelligence in Education, Kobe, Japan, 23–25 January 2023; pp. 123–130. [Google Scholar] [CrossRef]
  16. Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.L.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A.; et al. Training language models to follow instructions with human feedback. arXiv 2022, arXiv:2203.02155. [Google Scholar] [CrossRef]
  17. Cohen, A.D.; Wang, I.K.-H. Revisiting ‘think aloud’ in language learner strategy research. Lang. Teach. Res. 2024, in press. [Google Scholar] [CrossRef]
  18. Dubey, P.; Dubey, P.; Raja, R.; Kshatri, S.S. Bridging language gaps: The role of NLP and speech recognition in oral English instruction. MethodsX 2025, 14, 103359. [Google Scholar] [CrossRef] [PubMed]
  19. Chen, X.; Zou, D.; Cheng, G.; Xie, H. Trends, research issues and applications of artificial intelligence in language education. Educ. Technol. Soc. 2023, 26, 112–131. [Google Scholar] [CrossRef]
  20. Lee, K.-A.; Lim, S.-B. Designing a leveled conversational teachable agent for English language learners. Appl. Sci. 2023, 13, 6541. [Google Scholar] [CrossRef]
  21. Adiguzel, T.; Kaya, M.H.; Cansu, F.K. Revolutionizing education with AI: Exploring the transformative potential of ChatGPT. Contemp. Educ. Technol. 2023, 15, ep429. [Google Scholar] [CrossRef] [PubMed]
  22. McCrocklin, S.; Edalatishams, I. Revisiting popular speech recognition software for ESL speech. TESOL Q. 2020, 54, 1086–1097. [Google Scholar] [CrossRef]
  23. Tullis, T.; Albert, B. Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics, 2nd ed.; Morgan Kaufmann: Amsterdam, The Netherlands, 2013; ISBN 978-0-12-415781-1. [Google Scholar] [CrossRef]
  24. Goutte, C.; Gaussier, E. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In Advances in Information Retrieval: 27th European Conference on IR Research, ECIR 2005, Proceedings; Springer: Berlin, Germany, 2005; pp. 345–359. [Google Scholar] [CrossRef]
  25. Li, J. Recent advances in end-to-end automatic speech recognition. APSIPA Trans. Signal Inf. Process. 2022, 11, 18536–18565. [Google Scholar] [CrossRef]
Figure 1. Diagram of the AI-based intelligent assistant for language learning.
Figure 2. Data processing pipeline for supervised feedback and response generation.
Figure 3. User Interface of the intelligent assistant. Non-English terms: “” (You), “Escribe un mensaje o habla…” (Type a message or speak…), “Enviar” (Send), and “Español” (Spanish).
Figure 4. Accuracy of the speech recognition system measured in WER.
Figure 5. Performance of the feedback system measured using the F1 score.
Table 1. Performance by error category.

Error Type | F1 Score
Lexical    | 0.93
Syntactic  | 0.92
Phonetic   | 0.91
Average    | 0.92