Search Results (20)

Search Parameters:
Keywords = spoken dialogue systems

25 pages, 19135 KB  
Article
Development of a Multi-Platform AI-Based Software Interface for the Accompaniment of Children
by Isaac León, Camila Reyes, Iesus Davila, Bryan Puruncajas, Dennys Paillacho, Nayeth Solorzano, Marcelo Fajardo-Pruna, Hyungpil Moon and Francisco Yumbla
Multimodal Technol. Interact. 2025, 9(9), 88; https://doi.org/10.3390/mti9090088 - 26 Aug 2025
Viewed by 904
Abstract
The absence of parental presence has a direct impact on the emotional stability and social routines of children, especially during extended periods of separation from their family environment, as in the case of daycare centers, hospitals, or when they remain alone at home. At the same time, the technology currently available to provide emotional support in these contexts remains limited. In response to the growing need for emotional support and companionship in child care, this project proposes the development of a multi-platform software architecture based on artificial intelligence (AI), designed to be integrated into humanoid robots that assist children between the ages of 6 and 14. The system enables daily verbal and non-verbal interactions intended to foster a sense of presence and personalized connection through conversations, games, and empathetic gestures. Built on the Robot Operating System (ROS), the software incorporates modular components for voice command processing, real-time facial expression generation, and joint movement control. These modules allow the robot to hold natural conversations, display dynamic facial expressions on its LCD (Liquid Crystal Display) screen, and synchronize gestures with spoken responses. Additionally, a graphical interface enhances the coherence between dialogue and movement, thereby improving the quality of human–robot interaction. Initial evaluations conducted in controlled environments assessed the system’s fluency, responsiveness, and expressive behavior. Subsequently, it was implemented in a pediatric hospital in Guayaquil, Ecuador, where it accompanied children during their recovery. It was observed that this type of AI-based software can significantly enhance the experience of children, opening promising opportunities for its application in clinical, educational, recreational, and other child-centered settings. Full article
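
A rough, non-authoritative sketch of how such modular components could be coordinated is shown below; the intent-to-expression mapping, the module names, and the plain-Python dispatch (no ROS) are illustrative assumptions rather than the authors' implementation.

```python
import time
from dataclasses import dataclass

# Hypothetical mapping from a detected intent to speech, facial expression,
# and gesture; in the real system these would be separate ROS modules.
RESPONSES = {
    "greeting": ("Hello! How are you today?", "smile", "wave"),
    "game":     ("Let's play a guessing game!", "excited", "raise_arms"),
}

@dataclass
class RobotAction:
    speech: str
    expression: str   # shown on the LCD face
    gesture: str      # joint-movement routine

def handle_command(intent: str) -> RobotAction:
    speech, expression, gesture = RESPONSES.get(
        intent, ("Sorry, I did not understand.", "neutral", "idle"))
    return RobotAction(speech, expression, gesture)

def execute(action: RobotAction) -> None:
    # In a real robot these would be dispatched to the face, arm and
    # text-to-speech modules; here we simply log them in sync.
    print(f"[face]   {action.expression}")
    print(f"[arms]   {action.gesture}")
    print(f"[speech] {action.speech}")
    time.sleep(0.1)  # placeholder for gesture/speech duration alignment

if __name__ == "__main__":
    execute(handle_command("greeting"))
```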

17 pages, 6837 KB  
Article
Mitigating LLM Hallucinations Using a Multi-Agent Framework
by Ahmed M. Darwish, Essam A. Rashed and Ghada Khoriba
Information 2025, 16(7), 517; https://doi.org/10.3390/info16070517 - 21 Jun 2025
Cited by 1 | Viewed by 5855
Abstract
The rapid advancement of Large Language Models (LLMs) has led to substantial investment in enhancing their capabilities and expanding their feature sets. Despite these developments, a critical gap remains between model sophistication and their dependable deployment in real-world applications. A key concern is the inconsistency of LLM-generated outputs in production environments, which hinders scalability and reliability. In response to these challenges, we propose a novel framework that integrates custom-defined, rule-based logic to constrain and guide LLM behavior effectively. This framework enforces deterministic response boundaries while considering the model’s reasoning capabilities. Furthermore, we introduce a quantitative performance scoring mechanism that achieves an 85.5% improvement in response consistency, facilitating more predictable and accountable model outputs. The proposed system is industry-agnostic and can be generalized to any domain with a well-defined validation schema. This work contributes to the growing research on aligning LLMs with structured, operational constraints to ensure safe, robust, and scalable deployment. Full article
(This article belongs to the Special Issue Intelligent Agent and Multi-Agent System)
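
As a loose illustration of the general pattern of constraining an LLM with custom rule-based checks (not the paper's actual framework), one can wrap the model call in a validate-and-retry loop; `call_llm`, the JSON schema, and the retry policy below are placeholders.

```python
import json
from typing import Callable, Optional

# Illustrative rule: every response must be JSON with these keys.
REQUIRED_KEYS = {"answer", "confidence"}

def validate(raw: str) -> Optional[dict]:
    """Return the parsed response if it satisfies the rules, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    try:
        conf = float(data["confidence"])
    except (TypeError, ValueError):
        return None
    return data if 0.0 <= conf <= 1.0 else None

def constrained_generate(prompt: str, call_llm: Callable[[str], str],
                         max_retries: int = 3) -> dict:
    """Retry until the model output passes the rule-based checks."""
    for _ in range(max_retries):
        parsed = validate(call_llm(prompt))
        if parsed is not None:
            return parsed
        prompt += "\nRespond only with JSON containing 'answer' and 'confidence'."
    raise RuntimeError("Model output failed validation after retries.")

if __name__ == "__main__":
    stub = lambda p: '{"answer": "42", "confidence": 0.9}'   # stand-in model call
    print(constrained_generate("What is 6 x 7?", stub))
```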

20 pages, 3901 KB  
Article
Designing Social Robots with LLMs for Engaging Human Interaction
by Maria Pinto-Bernal, Matthijs Biondina and Tony Belpaeme
Appl. Sci. 2025, 15(11), 6377; https://doi.org/10.3390/app15116377 - 5 Jun 2025
Cited by 3 | Viewed by 2569
Abstract
Large Language Models (LLMs), particularly those enhanced through Reinforcement Learning from Human Feedback, such as ChatGPT, have opened up new possibilities for natural and open-ended spoken interaction in social robotics. However, these models are not inherently designed for embodied, multimodal contexts. This paper presents a user-centred approach to integrating an LLM into a humanoid robot, designed to engage in fluid, context-aware conversation with socially isolated older adults. We describe our system architecture, which combines real-time speech processing, layered memory summarisation, persona conditioning, and multilingual voice adaptation to support personalised, socially appropriate interactions. Through iterative development and evaluation, including in-home exploratory trials with older adults (n = 7) and a preliminary study with young adults (n = 43), we investigated the technical and experiential challenges of deploying LLMs in real-world human–robot dialogue. Our findings show that memory continuity, adaptive turn-taking, and culturally attuned voice design enhance user perceptions of trust, naturalness, and social presence. We also identify persistent limitations related to response latency, hallucinations, and expectation management. This work contributes design insights and architectural strategies for future LLM-integrated robots that aim to support meaningful, emotionally resonant companionship in socially assistive settings. Full article
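
A minimal sketch of layered memory summarisation in general form: recent turns stay verbatim while older turns are folded into a rolling summary that, together with a persona string, forms the prompt context. The `summarize` callable and the persona text are placeholders, not the authors' architecture.

```python
from typing import Callable, List

class LayeredMemory:
    """Keep the last `window` turns verbatim; fold older turns into a summary."""

    def __init__(self, summarize: Callable[[str, str], str], window: int = 6):
        self.summarize = summarize     # e.g. an LLM call "update summary with ..."
        self.window = window
        self.summary = ""
        self.recent: List[str] = []

    def add_turn(self, speaker: str, text: str) -> None:
        self.recent.append(f"{speaker}: {text}")
        while len(self.recent) > self.window:
            oldest = self.recent.pop(0)
            self.summary = self.summarize(self.summary, oldest)

    def as_prompt_context(self, persona: str) -> str:
        # Persona conditioning + long-term summary + short-term verbatim turns.
        return "\n".join([persona, f"Summary so far: {self.summary}", *self.recent])

if __name__ == "__main__":
    naive = lambda summary, turn: (summary + " " + turn).strip()  # stand-in summarizer
    mem = LayeredMemory(naive, window=2)
    for i in range(4):
        mem.add_turn("user", f"message {i}")
    print(mem.as_prompt_context("You are a friendly companion robot."))
```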

20 pages, 6941 KB  
Article
EmoSDS: Unified Emotionally Adaptive Spoken Dialogue System Using Self-Supervised Speech Representations
by Jaehwan Lee, Youngjun Sim, Jinyou Kim and Young-Joo Suh
Future Internet 2025, 17(4), 143; https://doi.org/10.3390/fi17040143 - 25 Mar 2025
Cited by 2 | Viewed by 1131
Abstract
In recent years, advancements in artificial intelligence, speech, and natural language processing technology have enhanced spoken dialogue systems (SDSs), enabling natural, voice-based human–computer interaction. However, discrete, token-based LLMs in emotionally adaptive SDSs focus on lexical content while overlooking essential paralinguistic cues for emotion expression. Existing methods use external emotion predictors to compensate for this but introduce computational overhead and fail to fully integrate paralinguistic features with linguistic context. Moreover, the lack of high-quality emotional speech datasets limits models’ ability to learn expressive emotional cues. To address these challenges, we propose EmoSDS, a unified SDS framework that integrates speech and emotion recognition by leveraging self-supervised learning (SSL) features. Our three-stage training pipeline enables the LLM to learn both discrete linguistic content and continuous paralinguistic features, improving emotional expressiveness and response naturalness. Additionally, we construct EmoSC, a dataset combining GPT-generated dialogues with emotional voice conversion data, ensuring greater emotional diversity and a balanced sample distribution across emotion categories. The experimental results show that EmoSDS outperforms existing models in emotional alignment and response generation, achieving a minimum 2.9% increase in text generation metrics, enhancing the LLM’s ability to interpret emotional and textual cues for more expressive and contextually appropriate responses. Full article
(This article belongs to the Special Issue Generative Artificial Intelligence in Smart Societies)
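
For readers unfamiliar with SSL speech features, the snippet below extracts wav2vec 2.0 representations with torchaudio and mean-pools them into one vector per utterance, the kind of continuous paralinguistic input an emotion-aware model can consume; this is a generic illustration, not the EmoSDS feature pipeline.

```python
import torch
import torchaudio

# Load a pre-trained self-supervised speech model (wav2vec 2.0 base).
bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model().eval()

def ssl_features(wav_path: str) -> torch.Tensor:
    """Return one pooled SSL feature vector for a (mono) utterance."""
    waveform, sr = torchaudio.load(wav_path)          # (channels, samples)
    if sr != bundle.sample_rate:                      # resample to 16 kHz
        waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
    with torch.no_grad():
        features, _ = model.extract_features(waveform)  # list of layer outputs
    last_layer = features[-1]                           # (batch, frames, dim)
    return last_layer.mean(dim=1).squeeze(0)            # mean-pool over time

# pooled = ssl_features("utterance.wav")  # e.g. a 768-dimensional vector
```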

14 pages, 310 KB  
Article
Automated Generation of Clinical Reports Using Sensing Technologies with Deep Learning Techniques
by Celia Cabello-Collado, Javier Rodriguez-Juan, David Ortiz-Perez, Jose Garcia-Rodriguez, David Tomás and Maria Flores Vizcaya-Moreno
Sensors 2024, 24(9), 2751; https://doi.org/10.3390/s24092751 - 25 Apr 2024
Cited by 5 | Viewed by 2758
Abstract
This study presents a pioneering approach that leverages advanced sensing technologies and data processing techniques to enhance the process of clinical documentation generation during medical consultations. By employing sophisticated sensors to capture and interpret various cues such as speech patterns, intonations, or pauses, the system aims to accurately perceive and understand patient–doctor interactions in real time. This sensing capability allows for the automation of transcription and summarization tasks, facilitating the creation of concise and informative clinical documents. Through the integration of automatic speech recognition sensors, spoken dialogue is seamlessly converted into text, enabling efficient data capture. Additionally, deep models such as Transformer models are utilized to extract and analyze crucial information from the dialogue, ensuring that the generated summaries encapsulate the essence of the consultations accurately. Despite encountering challenges during development, experimentation with these sensing technologies has yielded promising results. The system achieved a maximum ROUGE-1 metric score of 0.57, demonstrating its effectiveness in summarizing complex medical discussions. This sensor-based approach aims to alleviate the administrative burden on healthcare professionals by automating documentation tasks and safeguarding important patient information. Ultimately, by enhancing the efficiency and reliability of clinical documentation, this innovative method contributes to improving overall healthcare outcomes. Full article
(This article belongs to the Special Issue Intelligent Sensors for Healthcare and Patient Monitoring)
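
The reported evaluation uses ROUGE-1 (maximum score of 0.57); a small reference implementation of the unigram-overlap ROUGE-1 F1 score, without the stemming or stopword handling that full toolkits apply, looks like this:

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap ROUGE-1 F1 between a reference and a generated summary."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum((ref_counts & cand_counts).values())   # clipped unigram matches
    if overlap == 0:
        return 0.0
    recall = overlap / sum(ref_counts.values())
    precision = overlap / sum(cand_counts.values())
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    ref = "patient reports chest pain since monday no fever"
    hyp = "the patient has chest pain since monday"
    print(round(rouge1_f1(ref, hyp), 3))
```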

14 pages, 1680 KB  
Article
AI to Train AI: Using ChatGPT to Improve the Accuracy of a Therapeutic Dialogue System
by Karolina Gabor-Siatkowska, Marcin Sowański, Rafał Rzatkiewicz, Izabela Stefaniak, Marek Kozłowski and Artur Janicki
Electronics 2023, 12(22), 4694; https://doi.org/10.3390/electronics12224694 - 18 Nov 2023
Cited by 12 | Viewed by 6209
Abstract
In this work, we present the use of one artificial intelligence (AI) application (ChatGPT) to train another AI-based application. As the latter, we present a dialogue system named Terabot, which was used in the therapy of psychiatric patients. Our study was motivated by the fact that for such a domain-specific system, it was difficult to acquire large real-life data samples to increase the training database: this would require recruiting more patients, which is both time-consuming and costly. To address this gap, we employed a neural large language model, ChatGPT version 3.5, to generate data solely for training our dialogue system. During initial experiments, we identified intents that were most often misrecognized. Next, we fed ChatGPT a series of prompts that triggered the language model to generate numerous additional training entries, e.g., alternatives to the phrases that had been collected during initial experiments with healthy users. This way, we enlarged the training dataset by 112%. In our case study, for testing, we used 2802 speech recordings originating from 32 psychiatric patients. As an evaluation metric, we used the accuracy of intent recognition. The speech samples were converted into text using automatic speech recognition (ASR). The analysis showed that the patients’ speech challenged the ASR module significantly, resulting in deteriorated speech recognition and, consequently, low accuracy of intent recognition. However, thanks to the augmentation of the training data with ChatGPT-generated data, the intent recognition accuracy increased by a relative 13%, reaching 86% overall. We also emulated the case of an error-free ASR and showed the impact of ASR misrecognitions on the intent recognition accuracy. Our study showcased the potential of using generative language models to develop other AI-based tools, such as dialogue systems. Full article
(This article belongs to the Special Issue Application of Machine Learning and Intelligent Systems)
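
A generic sketch of the augmentation idea: for intents that are frequently misrecognized, prompt a generative model for paraphrases of the seed phrases and add them to the training set. The prompt wording and the `generate` callable are placeholders, not the prompts used in the study.

```python
from typing import Callable, Dict, List

def augment_intents(seed_utterances: Dict[str, List[str]],
                    generate: Callable[[str], str],
                    n_variants: int = 5) -> Dict[str, List[str]]:
    """Add LLM-generated paraphrases for each intent's seed phrases."""
    augmented = {intent: list(utts) for intent, utts in seed_utterances.items()}
    for intent, utterances in seed_utterances.items():
        for utt in utterances:
            prompt = (f"Give {n_variants} alternative ways a patient might say: "
                      f"'{utt}'. One per line.")
            variants = [v.strip() for v in generate(prompt).splitlines() if v.strip()]
            augmented[intent].extend(variants[:n_variants])
    return augmented

if __name__ == "__main__":
    stub = lambda p: "I feel anxious today\nI'm feeling uneasy"   # stand-in generator
    seeds = {"report_anxiety": ["I am anxious"]}
    print(augment_intents(seeds, stub, n_variants=2))
```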

14 pages, 325 KB  
Article
Improving Abstractive Dialogue Summarization Using Keyword Extraction
by Chongjae Yoo and Hwanhee Lee
Appl. Sci. 2023, 13(17), 9771; https://doi.org/10.3390/app13179771 - 29 Aug 2023
Cited by 6 | Viewed by 3076
Abstract
Abstractive dialogue summarization aims to generate a short passage that contains the important content of a particular dialogue spoken by multiple speakers. In abstractive dialogue summarization systems, capturing the subject of the dialogue is challenging owing to the properties of colloquial texts. Moreover, such systems often generate uninformative summaries. In this paper, we propose a novel keyword-aware dialogue summarization system (KADS) that captures the subject of the dialogue through the efficient use of keywords, alleviating the problems mentioned above. Specifically, we first extract the keywords from the input dialogue using a pre-trained keyword extractor. Subsequently, KADS efficiently injects this keyword information into the transformer-based summarization system. Extensive experiments performed on three benchmark datasets show that the proposed method outperforms the baseline system. Additionally, we demonstrate that the proposed keyword-aware dialogue summarization system exhibits a high performance gain in low-resource conditions where the number of training examples is highly limited. Full article
(This article belongs to the Special Issue Application of Machine Learning in Text Mining)
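
To make the keyword-aware input construction concrete, the sketch below extracts keywords with TF-IDF (a simple stand-in for the pre-trained keyword extractor used in the paper) and prepends them to the dialogue before it reaches a summarizer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_keywords(dialogue_turns, top_n=5):
    """Rank words in the dialogue by TF-IDF (stand-in keyword extractor)."""
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform([" ".join(dialogue_turns)])
    scores = tfidf.toarray()[0]
    vocab = vectorizer.get_feature_names_out()
    ranked = sorted(zip(scores, vocab), reverse=True)
    return [word for _, word in ranked[:top_n]]

def keyword_aware_input(dialogue_turns):
    """Prepend keywords so the summarizer sees the likely topic up front."""
    keywords = extract_keywords(dialogue_turns)
    return "keywords: " + ", ".join(keywords) + " | dialogue: " + " ".join(dialogue_turns)

if __name__ == "__main__":
    turns = ["Anna: Are we still meeting at the station at six?",
             "Ben: Yes, platform two. Bring the concert tickets."]
    print(keyword_aware_input(turns))
```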

25 pages, 1000 KB  
Article
A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers
by Juan Zuluaga-Gomez, Amrutha Prasad, Iuliia Nigmatulina, Petr Motlicek and Matthias Kleinert
Aerospace 2023, 10(5), 490; https://doi.org/10.3390/aerospace10050490 - 22 May 2023
Cited by 22 | Viewed by 6686
Abstract
In this paper we propose a novel virtual simulation-pilot engine for speeding up air traffic controller (ATCo) training by integrating different state-of-the-art artificial intelligence (AI)-based tools. The virtual simulation-pilot engine receives spoken communications from ATCo trainees, and it performs automatic speech recognition and understanding. Thus, it goes beyond only transcribing the communication and can also understand its meaning. The output is subsequently sent to a response generator system, which resembles the spoken read-back that pilots give to the ATCo trainees. The overall pipeline is composed of the following submodules: (i) an automatic speech recognition (ASR) system that transforms audio into a sequence of words; (ii) a high-level air traffic control (ATC)-related entity parser that understands the transcribed voice communication; and (iii) a text-to-speech submodule that generates a spoken utterance that resembles a pilot based on the situation of the dialogue. Our system employs state-of-the-art AI-based tools such as Wav2Vec 2.0, Conformer, BERT and Tacotron models. To the best of our knowledge, this is the first work fully based on open-source ATC resources and AI tools. In addition, we develop a robust and modular system with optional submodules that can enhance the system’s performance by incorporating real-time surveillance data, metadata related to exercises (such as sectors or runways), or even a deliberate read-back error to train ATCo trainees to identify them. Our ASR system can reach as low as 5.5% and 15.9% absolute word error rates (WER) on high- and low-quality ATC audio. We also demonstrate that adding surveillance data into the ASR can yield a callsign detection accuracy of more than 96%. Full article
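
The described pipeline lends itself to a thin orchestration layer; the sketch below wires stubbed ASR, entity-parsing, read-back generation, and TTS steps together, including an optional injected read-back error. All component functions are placeholders, not the Wav2Vec 2.0, Conformer, BERT, or Tacotron models the paper uses.

```python
import random
from typing import Callable

def simulation_pilot_turn(audio: bytes,
                          asr: Callable[[bytes], str],
                          parse_entities: Callable[[str], dict],
                          synthesize: Callable[[str], bytes],
                          error_rate: float = 0.0) -> bytes:
    """One ATCo-trainee turn: transcribe, understand, read back, speak."""
    transcript = asr(audio)                       # audio -> word sequence
    entities = parse_entities(transcript)         # callsign, command, value, ...
    readback = f"{entities['value']} {entities['command']}, {entities['callsign']}"
    if random.random() < error_rate:              # optional deliberate error
        readback = readback.replace(entities["value"], "wrong value")
    return synthesize(readback)                   # pilot-like spoken response

if __name__ == "__main__":
    out = simulation_pilot_turn(
        b"...",
        asr=lambda a: "lufthansa one two three descend flight level eight zero",
        parse_entities=lambda t: {"callsign": "lufthansa one two three",
                                  "command": "descend",
                                  "value": "flight level eight zero"},
        synthesize=lambda text: text.encode(),
        error_rate=0.0,
    )
    print(out.decode())
```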

15 pages, 2028 KB  
Article
Analysis of Physical Education Classroom Teaching after Implementation of the Chinese Health Physical Education Curriculum Model: A Video-Based Assessment
by Chao Liu, Cuixiang Dong, Xiaohui Li, Huanhuan Huang and Qiulin Wang
Behav. Sci. 2023, 13(3), 251; https://doi.org/10.3390/bs13030251 - 12 Mar 2023
Cited by 4 | Viewed by 4531
Abstract
This study assessed the Chinese health physical education curriculum model recently suggested to meet the recommended physical education curriculum reforms addressing the declining physical and mental health of students in China. We used video analyses of 41 physical education classroom teaching cases with a physical education classroom teaching behavior analysis system to provide quantitative and qualitative behavioral data. We established reference ranges for classroom teaching behavior indicators, summarized classroom teaching patterns, and assessed classroom discourse and the emotional climate. Notable findings included teachers in elementary schools using closed-ended questions, predictable responses, and general feedback significantly more often than teachers in senior high school, and ball sports instructors using demonstration and competition significantly more frequently than instructors in athletics. Overall, three teaching patterns were most commonly used—lecture, practice, and dialogue—with practice being dominant. Analysis of the top 50 most commonly spoken words by teachers identified five types of discourse—motivational, directive, specialized, transitional, and regulatory—with motivational words being most frequent. The classroom atmosphere was mainly positive. These findings provide evidence that the use of this curriculum model may bring positive changes to physical education classroom teaching methods in China and will inform subsequent innovative physical education classroom teaching practices. Full article
(This article belongs to the Special Issue Behaviors in Educational Settings)

12 pages, 1436 KB  
Article
Pre-Trained Joint Model for Intent Classification and Slot Filling with Semantic Feature Fusion
by Yan Chen and Zhenghang Luo
Sensors 2023, 23(5), 2848; https://doi.org/10.3390/s23052848 - 6 Mar 2023
Cited by 9 | Viewed by 5149
Abstract
The comprehension of spoken language is a crucial aspect of dialogue systems, encompassing two fundamental tasks: intent classification and slot filling. Currently, the joint modeling approach for these two tasks has emerged as the dominant method in spoken language understanding modeling. However, the existing joint models have limitations in terms of their relevancy and utilization of contextual semantic features between the multiple tasks. To address these limitations, a joint model based on BERT and semantic fusion (JMBSF) is proposed. The model employs pre-trained BERT to extract semantic features and utilizes semantic fusion to associate and integrate this information. The results of experiments on two benchmark datasets, ATIS and Snips, in spoken language comprehension demonstrate that the proposed JMBSF model attains 98.80% and 99.71% intent classification accuracy, 98.25% and 97.24% slot-filling F1-score, and 93.40% and 93.57% sentence accuracy, respectively. These results reveal a significant improvement compared to other joint models. Furthermore, comprehensive ablation studies affirm the effectiveness of each component in the design of JMBSF. Full article
(This article belongs to the Section Intelligent Sensors)
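
As a generic baseline for the joint formulation (not the JMBSF semantic-fusion design itself), a shared BERT encoder with an intent head on the pooled output and a slot head on the token outputs can be sketched as follows:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class JointIntentSlotModel(nn.Module):
    """Shared BERT encoder with separate intent and slot classification heads."""

    def __init__(self, num_intents: int, num_slots: int,
                 model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.intent_head = nn.Linear(hidden, num_intents)  # utterance-level
        self.slot_head = nn.Linear(hidden, num_slots)      # token-level

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        intent_logits = self.intent_head(out.pooler_output)      # (B, num_intents)
        slot_logits = self.slot_head(out.last_hidden_state)      # (B, T, num_slots)
        return intent_logits, slot_logits

# Training typically sums a cross-entropy loss over intents and one over slot tags.
```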

23 pages, 2695 KB  
Article
A Multi-Attention Approach Using BERT and Stacked Bidirectional LSTM for Improved Dialogue State Tracking
by Muhammad Asif Khan, Yi Huang, Junlan Feng, Bhuyan Kaibalya Prasad, Zafar Ali, Irfan Ullah and Pavlos Kefalas
Appl. Sci. 2023, 13(3), 1775; https://doi.org/10.3390/app13031775 - 30 Jan 2023
Cited by 6 | Viewed by 4595
Abstract
The modern digital world, and the innovative, state-of-the-art applications that characterize it, make the current digital age a captivating era for many worldwide. These innovations include dialogue systems, such as Apple’s Siri, Google Now, and Microsoft’s Cortana, that reside on users’ personal devices and assist them in their daily activities. These systems track users’ intentions by analyzing their speech, their context (drawn from previous turns), and several other external details, and they respond or act in the form of speech output. For these systems to work efficiently, a dialogue state tracking (DST) module is required to infer the current state of the dialogue in a conversation by processing previous states up to the current state. However, developing a DST module that tracks and exploits dialogue states effectively and accurately is challenging. The notable challenges that warrant immediate attention include scalability, handling unseen slot-value pairs during training, and retraining the model when the domain ontology changes. In this article, we present a new end-to-end framework combining BERT, a Stacked Bidirectional LSTM (BiLSTM), and a multiple attention mechanism to formalize DST as a classification problem and address the aforementioned issues. The BERT-based module encodes the user’s and system’s utterances. The Stacked BiLSTM extracts contextual features, and the multiple attention mechanisms calculate the attention between its hidden states and the utterance embeddings. We experimentally evaluated our method against current approaches over a variety of datasets. The results indicate a significant overall improvement. The proposed model is scalable in terms of parameter sharing, and it handles unseen instances during training. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
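
To make the attention step concrete, the sketch below scores stacked-BiLSTM hidden states against an utterance embedding and returns an attention-weighted context vector; the dimensions, layer count, and projection are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    """Stacked BiLSTM over token embeddings, attended by an utterance vector."""

    def __init__(self, emb_dim=768, hidden=256, layers=2):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, num_layers=layers,
                              bidirectional=True, batch_first=True)
        self.query_proj = nn.Linear(emb_dim, 2 * hidden)   # map utterance emb to key space

    def forward(self, token_embs, utterance_emb):
        # token_embs: (B, T, emb_dim), utterance_emb: (B, emb_dim)
        states, _ = self.bilstm(token_embs)                    # (B, T, 2*hidden)
        query = self.query_proj(utterance_emb).unsqueeze(2)    # (B, 2*hidden, 1)
        scores = torch.bmm(states, query).squeeze(2)           # (B, T)
        weights = torch.softmax(scores, dim=1).unsqueeze(1)    # (B, 1, T)
        context = torch.bmm(weights, states).squeeze(1)        # (B, 2*hidden)
        return context  # fed to a downstream classifier over dialogue-state labels

if __name__ == "__main__":
    m = BiLSTMAttention()
    ctx = m(torch.randn(2, 10, 768), torch.randn(2, 768))
    print(ctx.shape)  # torch.Size([2, 512])
```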

30 pages, 4163 KB  
Article
Dialogue Management and Language Generation for a Robust Conversational Virtual Coach: Validation and User Study
by Alain Vázquez, Asier López Zorrilla, Javier Mikel Olaso and María Inés Torres
Sensors 2023, 23(3), 1423; https://doi.org/10.3390/s23031423 - 27 Jan 2023
Cited by 14 | Viewed by 4373
Abstract
Designing human–machine interactive systems requires cooperation between different disciplines. In this work, we present a Dialogue Manager and a Language Generator that are the core modules of a Voice-based Spoken Dialogue System (SDS) capable of carrying out challenging, long and complex coaching conversations. We also develop an efficient integration procedure for the whole system, which acts as an intelligent and robust Virtual Coach. The coaching task significantly differs from the classical applications of SDSs, resulting in a much higher degree of complexity and difficulty. The Virtual Coach has been successfully tested and validated in a user study with independently living elderly participants, in three different countries with three different languages and cultures: Spain, France and Norway. Full article
(This article belongs to the Section Communications)

18 pages, 860 KB  
Article
Improved Spoken Language Representation for Intent Understanding in a Task-Oriented Dialogue System
by June-Woo Kim, Hyekyung Yoon and Ho-Young Jung
Sensors 2022, 22(4), 1509; https://doi.org/10.3390/s22041509 - 15 Feb 2022
Cited by 5 | Viewed by 4570
Abstract
Successful applications of deep learning technologies in the natural language processing domain have improved text-based intent classifications. However, in practical spoken dialogue applications, the users’ articulation styles and background noises cause automatic speech recognition (ASR) errors, and these may lead language models to misclassify users’ intents. To overcome the limited performance of the intent classification task in the spoken dialogue system, we propose a novel approach that jointly uses both recognized text obtained by the ASR model and a given labeled text. In the evaluation phase, only the fine-tuned recognized language model (RLM) is used. The experimental results show that the proposed scheme is effective at classifying intents in the spoken dialogue system containing ASR errors. Full article
(This article belongs to the Special Issue VOICE Sensors with Deep Learning)

17 pages, 1397 KB  
Article
Development of Speech Recognition Systems in Emergency Call Centers
by Alakbar Valizada, Natavan Akhundova and Samir Rustamov
Symmetry 2021, 13(4), 634; https://doi.org/10.3390/sym13040634 - 9 Apr 2021
Cited by 21 | Viewed by 6224
Abstract
In this paper, various methodologies for acoustic and language models, as well as labeling methods for automatic speech recognition of spoken dialogues in emergency call centers, were investigated and comparatively analyzed. Because dialogue speech in call centers has a specific context and occurs in noisy, emotional environments, available speech recognition systems show poor performance. Therefore, to accurately recognize dialogue speech, the main modules of speech recognition systems (language models and acoustic training methodologies), as well as symmetric data labeling approaches, have been investigated and analyzed. To find an effective acoustic model for dialogue data, different types of Gaussian Mixture Model/Hidden Markov Model (GMM/HMM) and Deep Neural Network/Hidden Markov Model (DNN/HMM) methodologies were trained and compared. Additionally, effective language models for dialogue systems were defined based on extrinsic and intrinsic methods. Lastly, our suggested data labeling approaches with spelling correction were compared with common labeling methods and outperformed them by a notable margin. Based on the results of the experiments, we determined that a DNN/HMM acoustic model, a trigram language model with Kneser–Ney discounting, and spelling correction of the training data as a labeling method are effective configurations for dialogue speech recognition in emergency call centers. It should be noted that this research was conducted with two different types of datasets collected from emergency calls: the Dialogue dataset (27 h), which encapsulates call agents’ speech, and the Summary dataset (53 h), which contains voiced summaries of those dialogues describing emergency cases. Even though the speech taken from the emergency call center is in the Azerbaijani language, which belongs to the Turkic group of languages, our approaches are not tightly tied to specific language features. Hence, it is anticipated that the suggested approaches can be applied to other languages of the same group. Full article
(This article belongs to the Section Computer)
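
Word error rate, the metric commonly used to compare such configurations, is the word-level Levenshtein edit distance between reference and hypothesis divided by the reference length; a small reference implementation:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance over words (subs + ins + dels) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

if __name__ == "__main__":
    print(word_error_rate("send an ambulance to the old bridge",
                          "send ambulance to the old bridge"))  # one deletion -> 1/7
```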

14 pages, 1723 KB  
Article
Dialogue Enhanced Extended Reality: Interactive System for the Operator 4.0
by Manex Serras, Laura García-Sardiña, Bruno Simões, Hugo Álvarez and Jon Arambarri
Appl. Sci. 2020, 10(11), 3960; https://doi.org/10.3390/app10113960 - 7 Jun 2020
Cited by 34 | Viewed by 4499
Abstract
The nature of industrial manufacturing processes and the continuous need to adapt production systems to new demands require tools to support workers during transitions to new processes. At the early stage of transitions, the human error rate is often high, and the impact on quality and production loss can be significant. Over the past years, eXtended Reality (XR) technologies (such as virtual, augmented, immersive, and mixed reality) have become a popular approach to enhance operators’ capabilities in the Industry 4.0 paradigm. The purpose of this research is to explore the usability of dialogue-based XR enhancement to ease the cognitive burden associated with manufacturing tasks, through the augmentation of linked multi-modal information available to support operators. The proposed Interactive XR architecture, which uses the modular and user-centred architecture of Spoken Dialogue Systems as a basis, was tested in two use case scenarios: the maintenance of a robotic gripper and a shop-floor assistant for electric panel assembly. In both cases, we confirmed a high user acceptance rate and efficient knowledge communication and distribution, even for operators without prior experience or with cognitive impairments, thereby demonstrating the suitability of the solution for assisting human workers in industrial manufacturing processes. The results provide an initial validation of the Interactive XR architecture as a multi-device, user-friendly experience for industrial processes that is flexible enough to encompass multiple tasks. Full article
