Search Results (20)

Search Parameters:
Keywords = spoken dialogue systems

25 pages, 19135 KB  
Article
Development of a Multi-Platform AI-Based Software Interface for the Accompaniment of Children
by Isaac León, Camila Reyes, Iesus Davila, Bryan Puruncajas, Dennys Paillacho, Nayeth Solorzano, Marcelo Fajardo-Pruna, Hyungpil Moon and Francisco Yumbla
Multimodal Technol. Interact. 2025, 9(9), 88; https://doi.org/10.3390/mti9090088 - 26 Aug 2025
Viewed by 904
Abstract
The absence of parental presence has a direct impact on the emotional stability and social routines of children, especially during extended periods of separation from their family environment, as in the case of daycare centers, hospitals, or when they remain alone at home. At the same time, the technology currently available to provide emotional support in these contexts remains limited. In response to the growing need for emotional support and companionship in child care, this project proposes the development of a multi-platform software architecture based on artificial intelligence (AI), designed to be integrated into humanoid robots that assist children between the ages of 6 and 14. The system enables daily verbal and non-verbal interactions intended to foster a sense of presence and personalized connection through conversations, games, and empathetic gestures. Built on the Robot Operating System (ROS), the software incorporates modular components for voice command processing, real-time facial expression generation, and joint movement control. These modules allow the robot to hold natural conversations, display dynamic facial expressions on its LCD (Liquid Crystal Display) screen, and synchronize gestures with spoken responses. Additionally, a graphical interface enhances the coherence between dialogue and movement, thereby improving the quality of human–robot interaction. Initial evaluations conducted in controlled environments assessed the system’s fluency, responsiveness, and expressive behavior. Subsequently, it was implemented in a pediatric hospital in Guayaquil, Ecuador, where it accompanied children during their recovery. It was observed that this type of AI-based software can significantly enhance the experience of children, opening promising opportunities for its application in clinical, educational, recreational, and other child-centered settings. Full article
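
A rough, non-authoritative sketch of how such modular components could be coordinated is shown below; the intent-to-expression mapping, the module names, and the plain-Python dispatch (no ROS) are illustrative assumptions rather than the authors' implementation.

```python
import time
from dataclasses import dataclass

# Hypothetical mapping from a detected intent to speech, facial expression,
# and gesture; in the real system these would be separate ROS modules.
RESPONSES = {
    "greeting": ("Hello! How are you today?", "smile", "wave"),
    "game":     ("Let's play a guessing game!", "excited", "raise_arms"),
}

@dataclass
class RobotAction:
    speech: str
    expression: str   # shown on the LCD face
    gesture: str      # joint-movement routine

def handle_command(intent: str) -> RobotAction:
    speech, expression, gesture = RESPONSES.get(
        intent, ("Sorry, I did not understand.", "neutral", "idle"))
    return RobotAction(speech, expression, gesture)

def execute(action: RobotAction) -> None:
    # In a real robot these would be dispatched to the face, arm and
    # text-to-speech modules; here we simply log them in sync.
    print(f"[face]   {action.expression}")
    print(f"[arms]   {action.gesture}")
    print(f"[speech] {action.speech}")
    time.sleep(0.1)  # placeholder for gesture/speech duration alignment

if __name__ == "__main__":
    execute(handle_command("greeting"))
```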

17 pages, 6837 KB  
Article
Mitigating LLM Hallucinations Using a Multi-Agent Framework
by Ahmed M. Darwish, Essam A. Rashed and Ghada Khoriba
Information 2025, 16(7), 517; https://doi.org/10.3390/info16070517 - 21 Jun 2025
Cited by 1 | Viewed by 5855
Abstract
The rapid advancement of Large Language Models (LLMs) has led to substantial investment in enhancing their capabilities and expanding their feature sets. Despite these developments, a critical gap remains between model sophistication and their dependable deployment in real-world applications. A key concern is the inconsistency of LLM-generated outputs in production environments, which hinders scalability and reliability. In response to these challenges, we propose a novel framework that integrates custom-defined, rule-based logic to constrain and guide LLM behavior effectively. This framework enforces deterministic response boundaries while considering the model’s reasoning capabilities. Furthermore, we introduce a quantitative performance scoring mechanism that achieves an 85.5% improvement in response consistency, facilitating more predictable and accountable model outputs. The proposed system is industry-agnostic and can be generalized to any domain with a well-defined validation schema. This work contributes to the growing research on aligning LLMs with structured, operational constraints to ensure safe, robust, and scalable deployment. Full article
(This article belongs to the Special Issue Intelligent Agent and Multi-Agent System)
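
As a loose illustration of the general pattern of constraining an LLM with custom rule-based checks (not the paper's actual framework), one can wrap the model call in a validate-and-retry loop; `call_llm`, the JSON schema, and the retry policy below are placeholders.

```python
import json
from typing import Callable, Optional

# Illustrative rule: every response must be JSON with these keys.
REQUIRED_KEYS = {"answer", "confidence"}

def validate(raw: str) -> Optional[dict]:
    """Return the parsed response if it satisfies the rules, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    try:
        conf = float(data["confidence"])
    except (TypeError, ValueError):
        return None
    return data if 0.0 <= conf <= 1.0 else None

def constrained_generate(prompt: str, call_llm: Callable[[str], str],
                         max_retries: int = 3) -> dict:
    """Retry until the model output passes the rule-based checks."""
    for _ in range(max_retries):
        parsed = validate(call_llm(prompt))
        if parsed is not None:
            return parsed
        prompt += "\nRespond only with JSON containing 'answer' and 'confidence'."
    raise RuntimeError("Model output failed validation after retries.")

if __name__ == "__main__":
    stub = lambda p: '{"answer": "42", "confidence": 0.9}'   # stand-in model call
    print(constrained_generate("What is 6 x 7?", stub))
```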

20 pages, 3901 KB  
Article
Designing Social Robots with LLMs for Engaging Human Interaction
by Maria Pinto-Bernal, Matthijs Biondina and Tony Belpaeme
Appl. Sci. 2025, 15(11), 6377; https://doi.org/10.3390/app15116377 - 5 Jun 2025
Cited by 3 | Viewed by 2569
Abstract
Large Language Models (LLMs), particularly those enhanced through Reinforcement Learning from Human Feedback, such as ChatGPT, have opened up new possibilities for natural and open-ended spoken interaction in social robotics. However, these models are not inherently designed for embodied, multimodal contexts. This paper presents a user-centred approach to integrating an LLM into a humanoid robot, designed to engage in fluid, context-aware conversation with socially isolated older adults. We describe our system architecture, which combines real-time speech processing, layered memory summarisation, persona conditioning, and multilingual voice adaptation to support personalised, socially appropriate interactions. Through iterative development and evaluation, including in-home exploratory trials with older adults (n = 7) and a preliminary study with young adults (n = 43), we investigated the technical and experiential challenges of deploying LLMs in real-world human–robot dialogue. Our findings show that memory continuity, adaptive turn-taking, and culturally attuned voice design enhance user perceptions of trust, naturalness, and social presence. We also identify persistent limitations related to response latency, hallucinations, and expectation management. This work contributes design insights and architectural strategies for future LLM-integrated robots that aim to support meaningful, emotionally resonant companionship in socially assistive settings. Full article
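
A minimal sketch of layered memory summarisation in general form: recent turns stay verbatim while older turns are folded into a rolling summary that, together with a persona string, forms the prompt context. The `summarize` callable and the persona text are placeholders, not the authors' architecture.

```python
from typing import Callable, List

class LayeredMemory:
    """Keep the last `window` turns verbatim; fold older turns into a summary."""

    def __init__(self, summarize: Callable[[str, str], str], window: int = 6):
        self.summarize = summarize     # e.g. an LLM call "update summary with ..."
        self.window = window
        self.summary = ""
        self.recent: List[str] = []

    def add_turn(self, speaker: str, text: str) -> None:
        self.recent.append(f"{speaker}: {text}")
        while len(self.recent) > self.window:
            oldest = self.recent.pop(0)
            self.summary = self.summarize(self.summary, oldest)

    def as_prompt_context(self, persona: str) -> str:
        # Persona conditioning + long-term summary + short-term verbatim turns.
        return "\n".join([persona, f"Summary so far: {self.summary}", *self.recent])

if __name__ == "__main__":
    naive = lambda summary, turn: (summary + " " + turn).strip()  # stand-in summarizer
    mem = LayeredMemory(naive, window=2)
    for i in range(4):
        mem.add_turn("user", f"message {i}")
    print(mem.as_prompt_context("You are a friendly companion robot."))
```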

20 pages, 6941 KB  
Article
EmoSDS: Unified Emotionally Adaptive Spoken Dialogue System Using Self-Supervised Speech Representations
by Jaehwan Lee, Youngjun Sim, Jinyou Kim and Young-Joo Suh
Future Internet 2025, 17(4), 143; https://doi.org/10.3390/fi17040143 - 25 Mar 2025
Cited by 2 | Viewed by 1131
Abstract
In recent years, advancements in artificial intelligence, speech, and natural language processing technology have enhanced spoken dialogue systems (SDSs), enabling natural, voice-based human–computer interaction. However, discrete, token-based LLMs in emotionally adaptive SDSs focus on lexical content while overlooking essential paralinguistic cues for emotion expression. Existing methods use external emotion predictors to compensate for this but introduce computational overhead and fail to fully integrate paralinguistic features with linguistic context. Moreover, the lack of high-quality emotional speech datasets limits models’ ability to learn expressive emotional cues. To address these challenges, we propose EmoSDS, a unified SDS framework that integrates speech and emotion recognition by leveraging self-supervised learning (SSL) features. Our three-stage training pipeline enables the LLM to learn both discrete linguistic content and continuous paralinguistic features, improving emotional expressiveness and response naturalness. Additionally, we construct EmoSC, a dataset combining GPT-generated dialogues with emotional voice conversion data, ensuring greater emotional diversity and a balanced sample distribution across emotion categories. The experimental results show that EmoSDS outperforms existing models in emotional alignment and response generation, achieving a minimum 2.9% increase in text generation metrics, enhancing the LLM’s ability to interpret emotional and textual cues for more expressive and contextually appropriate responses. Full article
(This article belongs to the Special Issue Generative Artificial Intelligence in Smart Societies)
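
For readers unfamiliar with SSL speech features, the snippet below extracts wav2vec 2.0 representations with torchaudio and mean-pools them into one vector per utterance, the kind of continuous paralinguistic input an emotion-aware model can consume; this is a generic illustration, not the EmoSDS feature pipeline.

```python
import torch
import torchaudio

# Load a pre-trained self-supervised speech model (wav2vec 2.0 base).
bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model().eval()

def ssl_features(wav_path: str) -> torch.Tensor:
    """Return one pooled SSL feature vector for a (mono) utterance."""
    waveform, sr = torchaudio.load(wav_path)          # (channels, samples)
    if sr != bundle.sample_rate:                      # resample to 16 kHz
        waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
    with torch.no_grad():
        features, _ = model.extract_features(waveform)  # list of layer outputs
    last_layer = features[-1]                           # (batch, frames, dim)
    return last_layer.mean(dim=1).squeeze(0)            # mean-pool over time

# pooled = ssl_features("utterance.wav")  # e.g. a 768-dimensional vector
```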

14 pages, 310 KB  
Article
Automated Generation of Clinical Reports Using Sensing Technologies with Deep Learning Techniques
by Celia Cabello-Collado, Javier Rodriguez-Juan, David Ortiz-Perez, Jose Garcia-Rodriguez, David Tomás and Maria Flores Vizcaya-Moreno
Sensors 2024, 24(9), 2751; https://doi.org/10.3390/s24092751 - 25 Apr 2024
Cited by 5 | Viewed by 2758
Abstract
This study presents a pioneering approach that leverages advanced sensing technologies and data processing techniques to enhance the process of clinical documentation generation during medical consultations. By employing sophisticated sensors to capture and interpret various cues such as speech patterns, intonations, or pauses, the system aims to accurately perceive and understand patient–doctor interactions in real time. This sensing capability allows for the automation of transcription and summarization tasks, facilitating the creation of concise and informative clinical documents. Through the integration of automatic speech recognition sensors, spoken dialogue is seamlessly converted into text, enabling efficient data capture. Additionally, deep models such as Transformer models are utilized to extract and analyze crucial information from the dialogue, ensuring that the generated summaries encapsulate the essence of the consultations accurately. Despite encountering challenges during development, experimentation with these sensing technologies has yielded promising results. The system achieved a maximum ROUGE-1 metric score of 0.57, demonstrating its effectiveness in summarizing complex medical discussions. This sensor-based approach aims to alleviate the administrative burden on healthcare professionals by automating documentation tasks and safeguarding important patient information. Ultimately, by enhancing the efficiency and reliability of clinical documentation, this innovative method contributes to improving overall healthcare outcomes. Full article
(This article belongs to the Special Issue Intelligent Sensors for Healthcare and Patient Monitoring)
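
The reported evaluation uses ROUGE-1 (maximum score of 0.57); a small reference implementation of the unigram-overlap ROUGE-1 F1 score, without the stemming or stopword handling that full toolkits apply, looks like this:

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap ROUGE-1 F1 between a reference and a generated summary."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum((ref_counts & cand_counts).values())   # clipped unigram matches
    if overlap == 0:
        return 0.0
    recall = overlap / sum(ref_counts.values())
    precision = overlap / sum(cand_counts.values())
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    ref = "patient reports chest pain since monday no fever"
    hyp = "the patient has chest pain since monday"
    print(round(rouge1_f1(ref, hyp), 3))
```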

14 pages, 1680 KB  
Article
AI to Train AI: Using ChatGPT to Improve the Accuracy of a Therapeutic Dialogue System
by Karolina Gabor-Siatkowska, Marcin Sowański, Rafał Rzatkiewicz, Izabela Stefaniak, Marek Kozłowski and Artur Janicki
Electronics 2023, 12(22), 4694; https://doi.org/10.3390/electronics12224694 - 18 Nov 2023
Cited by 12 | Viewed by 6209
Abstract
In this work, we present the use of one artificial intelligence (AI) application (ChatGPT) to train another AI-based application. As the latter, we present a dialogue system named Terabot, which was used in the therapy of psychiatric patients. Our study was motivated by the fact that for such a domain-specific system, it was difficult to acquire large real-life data samples to increase the training database: this would require recruiting more patients, which is both time-consuming and costly. To address this gap, we employed a neural large language model, ChatGPT version 3.5, to generate data solely for training our dialogue system. During initial experiments, we identified intents that were most often misrecognized. Next, we fed ChatGPT a series of prompts that triggered the language model to generate numerous additional training entries, e.g., alternatives to the phrases that had been collected during initial experiments with healthy users. This way, we enlarged the training dataset by 112%. In our case study, for testing, we used 2802 speech recordings originating from 32 psychiatric patients. As an evaluation metric, we used the accuracy of intent recognition. The speech samples were converted into text using automatic speech recognition (ASR). The analysis showed that the patients’ speech challenged the ASR module significantly, resulting in deteriorated speech recognition and, consequently, low accuracy of intent recognition. However, thanks to the augmentation of the training data with ChatGPT-generated data, the intent recognition accuracy increased by a relative 13%, reaching 86% overall. We also emulated the case of an error-free ASR and showed the impact of ASR misrecognitions on the intent recognition accuracy. Our study showcased the potential of using generative language models to develop other AI-based tools, such as dialogue systems. Full article
(This article belongs to the Special Issue Application of Machine Learning and Intelligent Systems)
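
A generic sketch of the augmentation idea: for intents that are frequently misrecognized, prompt a generative model for paraphrases of the seed phrases and add them to the training set. The prompt wording and the `generate` callable are placeholders, not the prompts used in the study.

```python
from typing import Callable, Dict, List

def augment_intents(seed_utterances: Dict[str, List[str]],
                    generate: Callable[[str], str],
                    n_variants: int = 5) -> Dict[str, List[str]]:
    """Add LLM-generated paraphrases for each intent's seed phrases."""
    augmented = {intent: list(utts) for intent, utts in seed_utterances.items()}
    for intent, utterances in seed_utterances.items():
        for utt in utterances:
            prompt = (f"Give {n_variants} alternative ways a patient might say: "
                      f"'{utt}'. One per line.")
            variants = [v.strip() for v in generate(prompt).splitlines() if v.strip()]
            augmented[intent].extend(variants[:n_variants])
    return augmented

if __name__ == "__main__":
    stub = lambda p: "I feel anxious today\nI'm feeling uneasy"   # stand-in generator
    seeds = {"report_anxiety": ["I am anxious"]}
    print(augment_intents(seeds, stub, n_variants=2))
```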

14 pages, 325 KB  
Article
Improving Abstractive Dialogue Summarization Using Keyword Extraction
by Chongjae Yoo and Hwanhee Lee
Appl. Sci. 2023, 13(17), 9771; https://doi.org/10.3390/app13179771 - 29 Aug 2023
Cited by 6 | Viewed by 3076
Abstract
Abstractive dialogue summarization aims to generate a short passage that contains the important content of a particular dialogue spoken by multiple speakers. In abstractive dialogue summarization systems, capturing the subject of the dialogue is challenging owing to the properties of colloquial texts. Moreover, such systems often generate uninformative summaries. In this paper, we propose a novel keyword-aware dialogue summarization system (KADS) that captures the subject of the dialogue through the efficient use of keywords, alleviating the problems mentioned above. Specifically, we first extract the keywords from the input dialogue using a pre-trained keyword extractor. Subsequently, KADS efficiently injects this keyword information into the transformer-based summarization system. Extensive experiments performed on three benchmark datasets show that the proposed method outperforms the baseline system. Additionally, we demonstrate that the proposed keyword-aware dialogue summarization system exhibits a high performance gain in low-resource conditions where the number of training examples is highly limited. Full article
(This article belongs to the Special Issue Application of Machine Learning in Text Mining)
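
To make the keyword-aware input construction concrete, the sketch below extracts keywords with TF-IDF (a simple stand-in for the pre-trained keyword extractor used in the paper) and prepends them to the dialogue before it reaches a summarizer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_keywords(dialogue_turns, top_n=5):
    """Rank words in the dialogue by TF-IDF (stand-in keyword extractor)."""
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform([" ".join(dialogue_turns)])
    scores = tfidf.toarray()[0]
    vocab = vectorizer.get_feature_names_out()
    ranked = sorted(zip(scores, vocab), reverse=True)
    return [word for _, word in ranked[:top_n]]

def keyword_aware_input(dialogue_turns):
    """Prepend keywords so the summarizer sees the likely topic up front."""
    keywords = extract_keywords(dialogue_turns)
    return "keywords: " + ", ".join(keywords) + " | dialogue: " + " ".join(dialogue_turns)

if __name__ == "__main__":
    turns = ["Anna: Are we still meeting at the station at six?",
             "Ben: Yes, platform two. Bring the concert tickets."]
    print(keyword_aware_input(turns))
```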

25 pages, 1000 KB  
Article
A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers
by Juan Zuluaga-Gomez, Amrutha Prasad, Iuliia Nigmatulina, Petr Motlicek and Matthias Kleinert
Aerospace 2023, 10(5), 490; https://doi.org/10.3390/aerospace10050490 - 22 May 2023
Cited by 22 | Viewed by 6686
Abstract
In this paper we propose a novel virtual simulation-pilot engine for speeding up air traffic controller (ATCo) training by integrating different state-of-the-art artificial intelligence (AI)-based tools. The virtual simulation-pilot engine receives spoken communications from ATCo trainees, and it performs automatic speech recognition and understanding. Thus, it goes beyond only transcribing the communication and can also understand its meaning. The output is subsequently sent to a response generator system, which resembles the spoken read-back that pilots give to the ATCo trainees. The overall pipeline is composed of the following submodules: (i) an automatic speech recognition (ASR) system that transforms audio into a sequence of words; (ii) a high-level air traffic control (ATC)-related entity parser that understands the transcribed voice communication; and (iii) a text-to-speech submodule that generates a spoken utterance that resembles a pilot based on the situation of the dialogue. Our system employs state-of-the-art AI-based tools such as Wav2Vec 2.0, Conformer, BERT and Tacotron models. To the best of our knowledge, this is the first work fully based on open-source ATC resources and AI tools. In addition, we develop a robust and modular system with optional submodules that can enhance the system’s performance by incorporating real-time surveillance data, metadata related to exercises (such as sectors or runways), or even a deliberate read-back error to train ATCo trainees to identify them. Our ASR system can reach as low as 5.5% and 15.9% absolute word error rates (WER) on high- and low-quality ATC audio. We also demonstrate that adding surveillance data into the ASR can yield a callsign detection accuracy of more than 96%. Full article
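
The described pipeline lends itself to a thin orchestration layer; the sketch below wires stubbed ASR, entity-parsing, read-back generation, and TTS steps together, including an optional injected read-back error. All component functions are placeholders, not the Wav2Vec 2.0, Conformer, BERT, or Tacotron models the paper uses.

```python
import random
from typing import Callable

def simulation_pilot_turn(audio: bytes,
                          asr: Callable[[bytes], str],
                          parse_entities: Callable[[str], dict],
                          synthesize: Callable[[str], bytes],
                          error_rate: float = 0.0) -> bytes:
    """One ATCo-trainee turn: transcribe, understand, read back, speak."""
    transcript = asr(audio)                       # audio -> word sequence
    entities = parse_entities(transcript)         # callsign, command, value, ...
    readback = f"{entities['value']} {entities['command']}, {entities['callsign']}"
    if random.random() < error_rate:              # optional deliberate error
        readback = readback.replace(entities["value"], "wrong value")
    return synthesize(readback)                   # pilot-like spoken response

if __name__ == "__main__":
    out = simulation_pilot_turn(
        b"...",
        asr=lambda a: "lufthansa one two three descend flight level eight zero",
        parse_entities=lambda t: {"callsign": "lufthansa one two three",
                                  "command": "descend",
                                  "value": "flight level eight zero"},
        synthesize=lambda text: text.encode(),
        error_rate=0.0,
    )
    print(out.decode())
```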

15 pages, 2028 KB  
Article
Analysis of Physical Education Classroom Teaching after Implementation of the Chinese Health Physical Education Curriculum Model: A Video-Based Assessment
by Chao Liu, Cuixiang Dong, Xiaohui Li, Huanhuan Huang and Qiulin Wang
Behav. Sci. 2023, 13(3), 251; https://doi.org/10.3390/bs13030251 - 12 Mar 2023
Cited by 4 | Viewed by 4531
Abstract
This study assessed the Chinese health physical education curriculum model recently suggested to meet the recommended physical education curriculum reforms addressing the declining physical and mental health of students in China. We used video analyses of 41 physical education classroom teaching cases with a physical education classroom teaching behavior analysis system to provide quantitative and qualitative behavioral data. We established reference ranges for classroom teaching behavior indicators, summarized classroom teaching patterns, and assessed classroom discourse and the emotional climate. Notable findings included teachers in elementary schools using closed-ended questions, predictable responses, and general feedback significantly more often than teachers in senior high school, and ball sports instructors using demonstration and competition significantly more frequently than instructors in athletics. Overall, three teaching patterns were most commonly used—lecture, practice, and dialogue—with practice being dominant. Analysis of the top 50 most commonly spoken words by teachers identified five types of discourse—motivational, directive, specialized, transitional, and regulatory—with motivational words being most frequent. The classroom atmosphere was mainly positive. These findings provide evidence that the use of this curriculum model may bring positive changes to physical education classroom teaching methods in China and will inform subsequent innovative physical education classroom teaching practices. Full article
(This article belongs to the Special Issue Behaviors in Educational Settings)

12 pages, 1436 KB  
Article
Pre-Trained Joint Model for Intent Classification and Slot Filling with Semantic Feature Fusion
by Yan Chen and Zhenghang Luo
Sensors 2023, 23(5), 2848; https://doi.org/10.3390/s23052848 - 6 Mar 2023
Cited by 9 | Viewed by 5149
Abstract
The comprehension of spoken language is a crucial aspect of dialogue systems, encompassing two fundamental tasks: intent classification and slot filling. Currently, the joint modeling approach for these two tasks has emerged as the dominant method in spoken language understanding modeling. However, the existing joint models have limitations in terms of their relevancy and utilization of contextual semantic features between the multiple tasks. To address these limitations, a joint model based on BERT and semantic fusion (JMBSF) is proposed. The model employs pre-trained BERT to extract semantic features and utilizes semantic fusion to associate and integrate this information. The results of experiments on two benchmark datasets, ATIS and Snips, in spoken language comprehension demonstrate that the proposed JMBSF model attains 98.80% and 99.71% intent classification accuracy, 98.25% and 97.24% slot-filling F1-score, and 93.40% and 93.57% sentence accuracy, respectively. These results reveal a significant improvement compared to other joint models. Furthermore, comprehensive ablation studies affirm the effectiveness of each component in the design of JMBSF. Full article
(This article belongs to the Section Intelligent Sensors)
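
As a generic baseline for the joint formulation (not the JMBSF semantic-fusion design itself), a shared BERT encoder with an intent head on the pooled output and a slot head on the token outputs can be sketched as follows:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class JointIntentSlotModel(nn.Module):
    """Shared BERT encoder with separate intent and slot classification heads."""

    def __init__(self, num_intents: int, num_slots: int,
                 model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.intent_head = nn.Linear(hidden, num_intents)  # utterance-level
        self.slot_head = nn.Linear(hidden, num_slots)      # token-level

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        intent_logits = self.intent_head(out.pooler_output)      # (B, num_intents)
        slot_logits = self.slot_head(out.last_hidden_state)      # (B, T, num_slots)
        return intent_logits, slot_logits

# Training typically sums a cross-entropy loss over intents and one over slot tags.
```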

23 pages, 2695 KB  
Article
A Multi-Attention Approach Using BERT and Stacked Bidirectional LSTM for Improved Dialogue State Tracking
by Muhammad Asif Khan, Yi Huang, Junlan Feng, Bhuyan Kaibalya Prasad, Zafar Ali, Irfan Ullah and Pavlos Kefalas
Appl. Sci. 2023, 13(3), 1775; https://doi.org/10.3390/app13031775 - 30 Jan 2023
Cited by 6 | Viewed by 4595
Abstract
The modern digital world, and the innovative, state-of-the-art applications that characterize it, make the current digital age a captivating era for many worldwide. These innovations include dialogue systems, such as Apple’s Siri, Google Now, and Microsoft’s Cortana, that reside on users’ personal devices and assist them in their daily activities. These systems track users’ intentions by analyzing their speech, their context (drawn from previous turns), and several other external details, and they respond or act in the form of speech output. For these systems to work efficiently, a dialogue state tracking (DST) module is required to infer the current state of the dialogue in a conversation by processing previous states up to the current state. However, developing a DST module that tracks and exploits dialogue states effectively and accurately is challenging. The notable challenges that warrant immediate attention include scalability, handling unseen slot-value pairs during training, and retraining the model when the domain ontology changes. In this article, we present a new end-to-end framework combining BERT, a Stacked Bidirectional LSTM (BiLSTM), and a multiple attention mechanism to formalize DST as a classification problem and address the aforementioned issues. The BERT-based module encodes the user’s and system’s utterances. The Stacked BiLSTM extracts contextual features, and the multiple attention mechanisms calculate the attention between its hidden states and the utterance embeddings. We experimentally evaluated our method against current approaches over a variety of datasets. The results indicate a significant overall improvement. The proposed model is scalable in terms of parameter sharing, and it handles unseen instances during training. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
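
To make the attention step concrete, the sketch below scores stacked-BiLSTM hidden states against an utterance embedding and returns an attention-weighted context vector; the dimensions, layer count, and projection are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    """Stacked BiLSTM over token embeddings, attended by an utterance vector."""

    def __init__(self, emb_dim=768, hidden=256, layers=2):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, num_layers=layers,
                              bidirectional=True, batch_first=True)
        self.query_proj = nn.Linear(emb_dim, 2 * hidden)   # map utterance emb to key space

    def forward(self, token_embs, utterance_emb):
        # token_embs: (B, T, emb_dim), utterance_emb: (B, emb_dim)
        states, _ = self.bilstm(token_embs)                    # (B, T, 2*hidden)
        query = self.query_proj(utterance_emb).unsqueeze(2)    # (B, 2*hidden, 1)
        scores = torch.bmm(states, query).squeeze(2)           # (B, T)
        weights = torch.softmax(scores, dim=1).unsqueeze(1)    # (B, 1, T)
        context = torch.bmm(weights, states).squeeze(1)        # (B, 2*hidden)
        return context  # fed to a downstream classifier over dialogue-state labels

if __name__ == "__main__":
    m = BiLSTMAttention()
    ctx = m(torch.randn(2, 10, 768), torch.randn(2, 768))
    print(ctx.shape)  # torch.Size([2, 512])
```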

30 pages, 4163 KB  
Article
Dialogue Management and Language Generation for a Robust Conversational Virtual Coach: Validation and User Study
by Alain Vázquez, Asier López Zorrilla, Javier Mikel Olaso and María Inés Torres
Sensors 2023, 23(3), 1423; https://doi.org/10.3390/s23031423 - 27 Jan 2023
Cited by 14 | Viewed by 4373
Abstract
Designing human–machine interactive systems requires cooperation between different disciplines. In this work, we present a Dialogue Manager and a Language Generator that are the core modules of a Voice-based Spoken Dialogue System (SDS) capable of carrying out challenging, long and complex coaching conversations. We also develop an efficient integration procedure for the whole system, which acts as an intelligent and robust Virtual Coach. The coaching task significantly differs from the classical applications of SDSs, resulting in a much higher degree of complexity and difficulty. The Virtual Coach has been successfully tested and validated in a user study with independently living elderly participants, in three different countries with three different languages and cultures: Spain, France and Norway. Full article
(This article belongs to the Section Communications)

18 pages, 860 KB  
Article
Improved Spoken Language Representation for Intent Understanding in a Task-Oriented Dialogue System
by June-Woo Kim, Hyekyung Yoon and Ho-Young Jung
Sensors 2022, 22(4), 1509; https://doi.org/10.3390/s22041509 - 15 Feb 2022
Cited by 5 | Viewed by 4570
Abstract
Successful applications of deep learning technologies in the natural language processing domain have improved text-based intent classifications. However, in practical spoken dialogue applications, the users’ articulation styles and background noises cause automatic speech recognition (ASR) errors, and these may lead language models to misclassify users’ intents. To overcome the limited performance of the intent classification task in the spoken dialogue system, we propose a novel approach that jointly uses both recognized text obtained by the ASR model and a given labeled text. In the evaluation phase, only the fine-tuned recognized language model (RLM) is used. The experimental results show that the proposed scheme is effective at classifying intents in the spoken dialogue system containing ASR errors. Full article
(This article belongs to the Special Issue VOICE Sensors with Deep Learning)

17 pages, 1397 KB  
Article
Development of Speech Recognition Systems in Emergency Call Centers
by Alakbar Valizada, Natavan Akhundova and Samir Rustamov
Symmetry 2021, 13(4), 634; https://doi.org/10.3390/sym13040634 - 9 Apr 2021
Cited by 21 | Viewed by 6224
Abstract
In this paper, various methodologies for acoustic and language models, as well as labeling methods for automatic speech recognition of spoken dialogues in emergency call centers, were investigated and comparatively analyzed. Because dialogue speech in call centers has a specific context and occurs in noisy, emotional environments, available speech recognition systems show poor performance. Therefore, to accurately recognize dialogue speech, the main modules of speech recognition systems (language models and acoustic training methodologies), as well as symmetric data labeling approaches, have been investigated and analyzed. To find an effective acoustic model for dialogue data, different types of Gaussian Mixture Model/Hidden Markov Model (GMM/HMM) and Deep Neural Network/Hidden Markov Model (DNN/HMM) methodologies were trained and compared. Additionally, effective language models for dialogue systems were defined based on extrinsic and intrinsic methods. Lastly, our suggested data labeling approaches with spelling correction were compared with common labeling methods and outperformed them by a notable margin. Based on the results of the experiments, we determined that a DNN/HMM acoustic model, a trigram language model with Kneser–Ney discounting, and spelling correction of the training data as a labeling method are effective configurations for dialogue speech recognition in emergency call centers. It should be noted that this research was conducted with two different types of datasets collected from emergency calls: the Dialogue dataset (27 h), which encapsulates call agents’ speech, and the Summary dataset (53 h), which contains voiced summaries of those dialogues describing emergency cases. Even though the speech taken from the emergency call center is in the Azerbaijani language, which belongs to the Turkic group of languages, our approaches are not tightly tied to specific language features. Hence, it is anticipated that the suggested approaches can be applied to other languages of the same group. Full article
(This article belongs to the Section Computer)
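
Word error rate, the metric commonly used to compare such configurations, is the word-level Levenshtein edit distance between reference and hypothesis divided by the reference length; a small reference implementation:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance over words (subs + ins + dels) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

if __name__ == "__main__":
    print(word_error_rate("send an ambulance to the old bridge",
                          "send ambulance to the old bridge"))  # one deletion -> 1/7
```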

14 pages, 1723 KB  
Article
Dialogue Enhanced Extended Reality: Interactive System for the Operator 4.0
by Manex Serras, Laura García-Sardiña, Bruno Simões, Hugo Álvarez and Jon Arambarri
Appl. Sci. 2020, 10(11), 3960; https://doi.org/10.3390/app10113960 - 7 Jun 2020
Cited by 34 | Viewed by 4499
Abstract
The nature of industrial manufacturing processes and the continuous need to adapt production systems to new demands require tools to support workers during transitions to new processes. At the early stage of transitions, the human error rate is often high, and the impact on quality and production loss can be significant. Over the past years, eXtended Reality (XR) technologies (such as virtual, augmented, immersive, and mixed reality) have become a popular approach to enhance operators’ capabilities in the Industry 4.0 paradigm. The purpose of this research is to explore the usability of dialogue-based XR enhancement to ease the cognitive burden associated with manufacturing tasks, through the augmentation of linked multi-modal information available to support operators. The proposed Interactive XR architecture, which uses the modular and user-centred architecture of Spoken Dialogue Systems as a basis, was tested in two use case scenarios: the maintenance of a robotic gripper and a shop-floor assistant for electric panel assembly. In both cases, we confirmed a high user acceptance rate and efficient knowledge communication and distribution, even for operators without prior experience or with cognitive impairments, thereby demonstrating the suitability of the solution for assisting human workers in industrial manufacturing processes. The results provide an initial validation of the Interactive XR architecture as a multi-device, user-friendly experience for industrial processes that is flexible enough to encompass multiple tasks. Full article
