Search Results (72)

Search Parameters:
Keywords = Human-Computer Interface (HCI)

37 pages, 618 KiB  
Systematic Review
Interaction, Artificial Intelligence, and Motivation in Children’s Speech Learning and Rehabilitation Through Digital Games: A Systematic Literature Review
by Chra Abdoulqadir and Fernando Loizides
Information 2025, 16(7), 599; https://doi.org/10.3390/info16070599 - 12 Jul 2025
Viewed by 222
Abstract
The integration of digital serious games into speech learning (rehabilitation) has demonstrated significant potential in enhancing accessibility and inclusivity for children with speech disabilities. This review of the state of the art examines the role of serious games, Artificial Intelligence (AI), and Natural Language Processing (NLP) in speech rehabilitation, with a particular focus on interaction modalities, engagement autonomy, and motivation. We have reviewed 45 selected studies. Our key findings show how intelligent tutoring systems, adaptive voice-based interfaces, and gamified speech interventions can empower children to engage in self-directed speech learning, reducing dependence on therapists and caregivers. The diversity of interaction modalities, including speech recognition, phoneme-based exercises, and multimodal feedback, demonstrates how AI and Assistive Technology (AT) can personalise learning experiences to accommodate diverse needs. Furthermore, the incorporation of gamification strategies, such as reward systems and adaptive difficulty levels, has been shown to enhance children’s motivation and long-term participation in speech rehabilitation. The gaps identified show that despite advancements, challenges remain in achieving universal accessibility, particularly regarding speech recognition accuracy, multilingual support, and accessibility for users with multiple disabilities. This review advocates for interdisciplinary collaboration across educational technology, special education, cognitive science, and human–computer interaction (HCI). Our work contributes to the ongoing discourse on lifelong inclusive education, reinforcing the potential of AI-driven serious games as transformative tools for bridging learning gaps and promoting speech rehabilitation beyond clinical environments. Full article

17 pages, 2108 KiB  
Article
Designing for Dyads: A Comparative User Experience Study of Remote and Face-to-Face Multi-User Interfaces
by Mengcai Zhou, Jingxuan Wang, Ono Kenta, Makoto Watanabe and Chacon Quintero Juan Carlos
Electronics 2025, 14(14), 2806; https://doi.org/10.3390/electronics14142806 - 12 Jul 2025
Viewed by 200
Abstract
Collaborative digital games and interfaces are increasingly used in both research and commercial contexts, yet little is known about how the spatial arrangement and interface sharing affect the user experience in dyadic settings. Using a two-player iPad pong game, this study compared user experiences across three collaborative gaming scenarios: face-to-face single-screen (F2F-OneS), face-to-face dual-screen (F2F-DualS), and remote dual-screen (Rmt-DualS) scenarios. Eleven dyads participated in all conditions using a within-subject design. After each session, the participants completed a 21-item user experience questionnaire and took part in brief interviews. The results from a repeated-measure ANOVA and post hoc paired t-tests showed significant scenario effects for several experience items, with F2F-OneS yielding higher engagement, novelty, and accomplishment than remote play, and qualitative interviews supported the quantitative findings, revealing themes of social presence and interaction. These results highlight the importance of spatial and interface design in collaborative settings, suggesting that both technical and social factors should be considered in multi-user interface development. Full article
(This article belongs to the Special Issue Innovative Designs in Human–Computer Interaction)
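A minimal Python sketch of the analysis reported above (a repeated-measures ANOVA over the three scenarios followed by post hoc paired t-tests), assuming a hypothetical long-format table; the column names and scores are illustrative, not the authors' data or code.

```python
# Sketch of a repeated-measures ANOVA plus post hoc paired t-tests,
# using hypothetical questionnaire scores (not the study's data).
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one row per participant, scenario, and item score.
df = pd.DataFrame({
    "participant": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "scenario": ["F2F-OneS", "F2F-DualS", "Rmt-DualS"] * 3,
    "engagement": [6.5, 6.0, 5.0, 6.0, 5.5, 4.5, 6.8, 6.1, 5.2],
})

# Repeated-measures ANOVA with scenario as the within-subject factor.
anova = AnovaRM(df, depvar="engagement", subject="participant",
                within=["scenario"]).fit()
print(anova)

# Post hoc paired t-test between two scenarios (correction for multiple tests omitted).
wide = df.pivot(index="participant", columns="scenario", values="engagement")
t, p = stats.ttest_rel(wide["F2F-OneS"], wide["Rmt-DualS"])
print(f"F2F-OneS vs Rmt-DualS: t={t:.2f}, p={p:.3f}")
```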

21 pages, 480 KiB  
Perspective
Towards Predictive Communication: The Fusion of Large Language Models and Brain–Computer Interface
by Andrea Carìa
Sensors 2025, 25(13), 3987; https://doi.org/10.3390/s25133987 - 26 Jun 2025
Viewed by 478
Abstract
Integration of advanced artificial intelligence with neurotechnology offers transformative potential for assistive communication. This perspective article examines the emerging convergence between non-invasive brain–computer interface (BCI) spellers and large language models (LLMs), with a focus on predictive communication for individuals with motor or language impairments. First, I will review the evolution of language models—from early rule-based systems to contemporary deep learning architectures—and their role in enhancing predictive writing. Second, I will survey existing implementations of BCI spellers that incorporate language modeling and highlight recent pilot studies exploring the integration of LLMs into BCI. Third, I will examine how, despite advancements in typing speed, accuracy, and user adaptability, the fusion of LLMs and BCI spellers still faces key challenges such as real-time processing, robustness to noise, and the integration of neural decoding outputs with probabilistic language generation frameworks. Finally, I will discuss how fully integrating LLMs with BCI technology could substantially improve the speed and usability of BCI-mediated communication, offering a path toward more intuitive, adaptive, and effective neurotechnological solutions for both clinical and non-clinical users. Full article
(This article belongs to the Section Biomedical Sensors)
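The abstract discusses integrating neural decoding outputs with probabilistic language generation. As a rough illustration (not taken from the article), the sketch below fuses hypothetical per-letter BCI classifier scores with a language-model prior by elementwise multiplication and renormalisation.

```python
# Toy fusion of BCI evidence and a language-model prior over the next character.
# All alphabet entries and probabilities below are hypothetical.
import numpy as np

alphabet = list("abcd")  # toy alphabet

# Hypothetical BCI evidence: classifier scores for each candidate letter.
bci_likelihood = np.array([0.40, 0.35, 0.15, 0.10])

# Hypothetical language-model prior given the text typed so far.
lm_prior = np.array([0.10, 0.60, 0.20, 0.10])

posterior = bci_likelihood * lm_prior
posterior /= posterior.sum()

best = alphabet[int(np.argmax(posterior))]
print(dict(zip(alphabet, posterior.round(3))), "-> select:", best)
```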

26 pages, 1708 KiB  
Article
Research on Task Complexity Measurements in Human–Computer Interaction in Nuclear Power Plant DCS Systems Based on Emergency Operating Procedures
by Ensheng Pang and Licao Dai
Entropy 2025, 27(6), 600; https://doi.org/10.3390/e27060600 - 4 Jun 2025
Viewed by 492
Abstract
Within the scope of digital transformation in nuclear power plants (NPPs), task complexity in human–computer interaction (HCI) has become a critical factor affecting the safe and stable operation of NPPs. This study systematically reviews and analyzes existing complexity sources and assessment methods and suggests that complexity is primarily driven by core factors such as the quantity of, variety of, and relationships between elements. By innovatively introducing Halstead’s E measure, this study constructs a quantitative model of dynamic task execution complexity (TEC), addressing the limitations of traditional entropy-based metrics in analyzing interactive processes. By combining entropy metrics and the E measure, a task complexity quantification framework is established, encompassing both the task execution and intrinsic dimensions. Specifically, Halstead’s E measure focuses on analyzing operators and operands, defining interaction symbols between humans and interfaces to quantify task execution complexity (TEC). Entropy metrics, on the other hand, measure task logical complexity (TLC), task scale complexity (TSC), and task information complexity (TIC) based on the intrinsic structure and scale of tasks. Finally, the weighted Euclidean norm of these four factors determines the task complexity (TC) of each step. Taking the emergency operating procedures (EOP) for a small-break loss-of-coolant accident (SLOCA) in an NPP as an example, the entropy and E metrics are used to calculate the task complexity of each step, followed by experimental validation using NASA-TLX task load scores and step execution time for regression analysis. The results show that task complexity is significantly positively correlated with NASA-TLX subjective scores and task execution time, with the determination coefficients reaching 0.679 and 0.785, respectively. This indicates that the complexity metrics have high explanatory power, showing that the complexity quantification model is effective and has certain application value in improving human–computer interfaces and emergency procedures. Full article
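As a rough illustration of the quantities named above, the sketch below computes Halstead's effort E from operator/operand counts and combines four complexity factors with a weighted Euclidean norm; the weights and example counts are assumptions for illustration, not values from the paper.

```python
# Halstead effort and a weighted Euclidean norm over four complexity factors.
import math

def halstead_effort(n1, n2, N1, N2):
    """Halstead effort E = V * D, with n1/n2 distinct and N1/N2 total operators/operands."""
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)        # V = N * log2(n)
    difficulty = (n1 / 2) * (N2 / n2)              # D = (n1/2) * (N2/n2)
    return volume * difficulty

def task_complexity(tec, tlc, tsc, tic, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted Euclidean norm of TEC, TLC, TSC, TIC (weights assumed, not from the paper)."""
    factors = (tec, tlc, tsc, tic)
    return math.sqrt(sum(w * f ** 2 for w, f in zip(weights, factors)))

# Hypothetical operator/operand counts for one EOP step's interaction symbols.
print(round(halstead_effort(n1=5, n2=7, N1=12, N2=15), 1))
# Hypothetical normalised factor values for one step.
print(round(task_complexity(tec=1.8, tlc=2.1, tsc=1.5, tic=2.4), 2))
```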

20 pages, 12032 KiB  
Article
Influence of Visual Coding Based on Attraction Effect on Human–Computer Interface
by Linlin Wang, Yujie Liu, Xinyi Tang, Chengqi Xue and Haiyan Wang
J. Eye Mov. Res. 2025, 18(2), 12; https://doi.org/10.3390/jemr18020012 - 8 Apr 2025
Viewed by 369
Abstract
Decision-making is often influenced by contextual information on the human–computer interface (HCI), with the attraction effect being a common situational effect in digital nudging. To address the role of visual cognition and coding in the HCI based on the attraction effect, this research takes online websites as experimental scenarios and demonstrates how the coding modes and attributes influence the attraction effect. The results show that similarity-based attributes enhance the attraction effect, whereas difference-based attributes do not modulate its intensity, suggesting that the influence of the relationship driven by coding modes is weaker than that of coding attributes. Additionally, variations in the strength of the attraction effect are observed across different coding modes under the coding attribute of similarity, with color coding having the strongest effect, followed by size, and labels showing the weakest effect. This research analyzes the stimulating conditions of the attraction effect and provides new insights for exploring the relationship between cognition and visual characterization through the attraction effect at the HCI. Furthermore, our findings can help apply the attraction effect more effectively and assist users in making more reasonable decisions. Full article

25 pages, 2844 KiB  
Article
Real-Time Gesture-Based Hand Landmark Detection for Optimized Mobile Photo Capture and Synchronization
by Pedro Marques, Paulo Váz, José Silva, Pedro Martins and Maryam Abbasi
Electronics 2025, 14(4), 704; https://doi.org/10.3390/electronics14040704 - 12 Feb 2025
Viewed by 2026
Abstract
Gesture recognition technology has emerged as a transformative solution for natural and intuitive human–computer interaction (HCI), offering touch-free operation across diverse fields such as healthcare, gaming, and smart home systems. In mobile contexts, where hygiene, convenience, and the ability to operate under resource constraints are critical, hand gesture recognition provides a compelling alternative to traditional touch-based interfaces. However, implementing effective gesture recognition in real-world mobile settings involves challenges such as limited computational power, varying environmental conditions, and the requirement for robust offline–online data management. In this study, we introduce ThumbsUp, which is a gesture-driven system, and employ a partially systematic literature review approach (inspired by core PRISMA guidelines) to identify the key research gaps in mobile gesture recognition. By incorporating insights from deep learning–based methods (e.g., CNNs and Transformers) while focusing on low resource consumption, we leverage Google’s MediaPipe in our framework for real-time detection of 21 hand landmarks and adaptive lighting pre-processing, enabling accurate recognition of a “thumbs-up” gesture. The system features a secure queue-based offline–cloud synchronization model, which ensures that the captured images and metadata (encrypted with AES-GCM) remain consistent and accessible even with intermittent connectivity. Experimental results under dynamic lighting, distance variations, and partially cluttered environments confirm the system’s superior low-light performance and decreased resource consumption compared to baseline camera applications. Additionally, we highlight the feasibility of extending ThumbsUp to incorporate AI-driven enhancements for abrupt lighting changes and, in the future, electromyographic (EMG) signals for users with motor impairments. Our comprehensive evaluation demonstrates that ThumbsUp maintains robust performance on typical mobile hardware, showing resilience to unstable network conditions and minimal reliance on high-end GPUs. These findings offer new perspectives for deploying gesture-based interfaces in the broader IoT ecosystem, thus paving the way toward secure, efficient, and inclusive mobile HCI solutions. Full article
(This article belongs to the Special Issue AI-Driven Digital Image Processing: Latest Advances and Prospects)
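A minimal sketch of recognising a "thumbs-up" from MediaPipe's 21 hand landmarks, in the spirit of the framework described above; the folding heuristic and confidence threshold are assumptions, not the ThumbsUp system's actual rule.

```python
# Detect a thumbs-up from MediaPipe hand landmarks with a simple geometric heuristic.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def is_thumbs_up(landmarks) -> bool:
    """Heuristic on normalised landmark y-coordinates (smaller y = higher in the image)."""
    # Thumb extended upward: tip (4) above the thumb MCP joint (2).
    thumb_up = landmarks[4].y < landmarks[2].y
    # Other fingers folded: each fingertip below its PIP joint.
    folded = all(landmarks[tip].y > landmarks[pip].y
                 for tip, pip in [(8, 6), (12, 10), (16, 14), (20, 18)])
    return thumb_up and folded

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.6) as hands:
    ok, frame = cap.read()
    if ok:
        result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.multi_hand_landmarks:
            lm = result.multi_hand_landmarks[0].landmark
            print("thumbs-up" if is_thumbs_up(lm) else "no gesture")
cap.release()
```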

32 pages, 475 KiB  
Review
Multimodal Interaction, Interfaces, and Communication: A Survey
by Elias Dritsas, Maria Trigka, Christos Troussas and Phivos Mylonas
Multimodal Technol. Interact. 2025, 9(1), 6; https://doi.org/10.3390/mti9010006 - 14 Jan 2025
Cited by 4 | Viewed by 7604
Abstract
Multimodal interaction is a transformative human-computer interaction (HCI) approach that allows users to interact with systems through various communication channels such as speech, gesture, touch, and gaze. With advancements in sensor technology and machine learning (ML), multimodal systems are becoming increasingly important in various applications, including virtual assistants, intelligent environments, healthcare, and accessibility technologies. This survey concisely overviews recent advancements in multimodal interaction, interfaces, and communication. It delves into integrating different input and output modalities, focusing on critical technologies and essential considerations in multimodal fusion, including temporal synchronization and decision-level integration. Furthermore, the survey explores the challenges of developing context-aware, adaptive systems that provide seamless and intuitive user experiences. Lastly, by examining current methodologies and trends, this study underscores the potential of multimodal systems and sheds light on future research directions. Full article

22 pages, 3073 KiB  
Article
Encouraging Sustainable Choices Through Socially Engaged Persuasive Recycling Initiatives: A Participatory Action Design Research Study
by Emilly Marques da Silva, Daniel Schneider, Claudio Miceli and António Correia
Informatics 2025, 12(1), 5; https://doi.org/10.3390/informatics12010005 - 8 Jan 2025
Cited by 1 | Viewed by 1744
Abstract
Human-Computer Interaction (HCI) research has illuminated how technology can influence users’ awareness of their environmental impact and the potential for mitigating these impacts. From hot water saving to food waste reduction, researchers have systematically and widely tried to find pathways to speed up achieving sustainable development goals through persuasive technology interventions. However, motivating users to adopt sustainable behaviors through interactive technologies presents significant psychological, cultural, and technical challenges in creating engaging and long-lasting experiences. Aligned with this perspective, there is a dearth of research and design solutions addressing the use of persuasive technology to promote sustainable recycling behavior. Guided by a participatory design approach, this investigation focuses on the design opportunities for leveraging persuasive and human-centered Internet of Things (IoT) applications to enhance user engagement in recycling activities. The assumption is that one pathway to achieve this goal is to adopt persuasive strategies that may be incorporated into the design of sustainable applications. The insights gained from this process can then be applied to various sustainable HCI scenarios and therefore contribute to HCI’s limited understanding in this area by providing a series of design-oriented research recommendations for informing the development of persuasive and socially engaged recycling platforms. In particular, we advocate for the inclusion of educational content, real-time interactive feedback, and intuitive interfaces to actively engage users in recycling activities. Moreover, recognizing the cultural context in which the technology is socially situated becomes imperative for the effective implementation of smart devices to foster sustainable recycling practices. To this end, we present a case study that seeks to involve children and adolescents in pro-recycling activities within the school environment. Full article

14 pages, 8055 KiB  
Article
A Comparative Study of the User Interaction Behavior and Experience in a Home-Oriented Multi-User Interface (MUI) During Family Collaborative Cooking
by Mengcai Zhou, Minglun Li, Kenta Ono and Makoto Watanabe
Future Internet 2024, 16(12), 478; https://doi.org/10.3390/fi16120478 - 20 Dec 2024
Cited by 2 | Viewed by 710
Abstract
This study sought to ascertain the necessity of crafting specialized multi-user interfaces for scenarios involving multiple users and to provide guidance for the design of multi-user human–computer interactions by identifying the disparities in the interaction behavior and user experience when employing a conventional one-user interface (OUI) recipe versus a multi-user interface (MUI) recipe in the context of family collaborative cooking. To address this objective, this study employed a before-and-after comparison approach. Subsequently, adult users submitted self-assessments of their experiences using the OUI and MUI. The evaluation tools included a user experience survey questionnaire and a Likert seven-point scale, including aspects such as visual confirmation, content, operation, and satisfaction. Post-experiment interviews were also conducted with family members. The MUI exhibited greater effectiveness in terms of visual confirmation, with the “layout” assuming a role analogous to that of “text” in facilitating visual confirmation. Moreover, the operation of the MUI was found to be somewhat enjoyable. Nevertheless, no significant disparities were observed between the OUI group and the MUI group concerning content readability and most operational aspects. Furthermore, the users described their satisfaction with the MUI to be superior to that of the OUI, offering fun, convenience, and a clear appearance. Findings from my research clearly demonstrate that it is both valuable and essential to design a dedicated MUI. Full article
(This article belongs to the Special Issue Advances and Perspectives in Human-Computer Interaction—2nd Edition)

30 pages, 5615 KiB  
Article
The Personality of the Intelligent Cockpit? Exploring the Personality Traits of In-Vehicle LLMs with Psychometrics
by Qianli Lin, Zhipeng Hu and Jun Ma
Information 2024, 15(11), 679; https://doi.org/10.3390/info15110679 - 31 Oct 2024
Cited by 1 | Viewed by 2246
Abstract
The development of large language models (LLMs) has promoted a transformation of human–computer interaction (HCI) models and has attracted the attention of scholars to the evaluation of personality traits of LLMs. As an important interface for the HCI and human–machine interface (HMI) in the future, the intelligent cockpit has become one of LLM’s most important application scenarios. When in-vehicle intelligent systems based on in-vehicle LLMs begin to become human assistants or even partners, it has become important to study the “personality” of in-vehicle LLMs. Referring to the relevant research on personality traits of LLMs, this study selected the psychological scales Big Five Inventory-2 (BFI-2), Myers–Briggs Type Indicator (MBTI), and Short Dark Triad (SD-3) to establish a personality traits evaluation framework for in-vehicle LLMs. Then, we used this framework to evaluate the personality of three in-vehicle LLMs. The results showed that psychological scales can be used to measure the personality traits of in-vehicle LLMs. In-vehicle LLMs showed commonalities in extroversion, agreeableness, conscientiousness, and action patterns, yet differences in openness, perception, decision-making, information acquisition methods, and psychopathy. According to the results, we established anthropomorphic personality personas of different in-vehicle LLMs. This study represents a novel attempt to evaluate the personalities of in-vehicle LLMs. The experimental results deepen our understanding of in-vehicle LLMs and contribute to the further exploration of personalized fine-tuning of in-vehicle LLMs and the improvement in the user experience of the automobile in the future. Full article
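A hedged sketch of the general procedure described above (presenting Likert-scale items to an LLM and aggregating numeric answers per trait); `ask_model` and the items are hypothetical placeholders, not the study's models or the actual BFI-2/MBTI/SD-3 wording.

```python
# Administer placeholder Likert items to a hypothetical in-vehicle LLM and average per trait.
from statistics import mean

def ask_model(prompt: str) -> int:
    """Hypothetical stand-in for a call to the in-vehicle LLM, returning a 1-5 rating."""
    return 4  # stubbed response

ITEMS = {  # placeholder items, not actual scale wording
    "extraversion": ["I see myself as someone who is outgoing."],
    "agreeableness": ["I see myself as someone who is considerate of others."],
}

scores = {}
for trait, items in ITEMS.items():
    ratings = [
        ask_model(f"Rate 1 (disagree) to 5 (agree), answer with a number only: {item}")
        for item in items
    ]
    scores[trait] = mean(ratings)

print(scores)  # e.g. {'extraversion': 4, 'agreeableness': 4}
```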

18 pages, 3089 KiB  
Article
Surface Electromyography-Based Recognition of Electronic Taste Sensations
by Asif Ullah, Fengqi Zhang, Zhendong Song, You Wang, Shuo Zhao, Waqar Riaz and Guang Li
Biosensors 2024, 14(8), 396; https://doi.org/10.3390/bios14080396 - 16 Aug 2024
Cited by 2 | Viewed by 1890
Abstract
Taste sensation recognition is a core for taste-related queries. Most prior research has been devoted to recognizing the basic taste sensations using the Brain–Computer Interface (BCI), which includes EEG, MEG, EMG, and fMRI. This research aims to recognize electronic taste (E-Taste) sensations based on surface electromyography (sEMG). Silver electrodes with platinum plating of the E-Taste device were placed on the tongue’s tip to stimulate various tastes and flavors. In contrast, the electrodes of the sEMG were placed on facial muscles to collect the data. The dataset was organized and preprocessed, and a random forest classifier was applied, giving a five-fold accuracy of 70.43%. The random forest classifier was used on each participant dataset individually and in groups, providing the highest accuracy of 84.79% for a single participant. Moreover, various feature combinations were extracted and acquired 72.56% accuracy after extracting eight features. For a future perspective, this research offers guidance for electronic taste recognition based on sEMG. Full article
(This article belongs to the Section Biosensor and Bioelectronic Devices)
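A minimal sketch of the reported evaluation setup (a random forest classifier scored with five-fold cross-validation) on synthetic sEMG-style feature vectors; the feature dimensions, class count, and hyperparameters are assumptions, not the authors' configuration.

```python
# Random forest with five-fold cross-validation on synthetic sEMG feature vectors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))      # e.g. 8 time/frequency features per sEMG window
y = rng.integers(0, 5, size=300)   # hypothetical labels for five taste sensations

clf = RandomForestClassifier(n_estimators=200, random_state=0)
acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"five-fold accuracy: {acc.mean():.3f} +/- {acc.std():.3f}")
```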

25 pages, 1263 KiB  
Article
Cognitive Classifier of Hand Gesture Images for Automated Sign Language Recognition: Soft Robot Assistance Based on Neutrosophic Markov Chain Paradigm
by Muslem Al-Saidi, Áron Ballagi, Oday Ali Hassen and Saad M. Saad
Computers 2024, 13(4), 106; https://doi.org/10.3390/computers13040106 - 22 Apr 2024
Cited by 3 | Viewed by 2279
Abstract
In recent years, Sign Language Recognition (SLR) has become an additional topic of discussion in the human–computer interface (HCI) field. The most significant difficulty confronting SLR is finding algorithms that will scale effectively with a growing vocabulary size and a limited supply of training data for signer-independent applications. Due to its sensitivity to shape information, automated SLR based on hidden Markov models (HMMs) cannot characterize the confusing distributions of the observations in gesture features with sufficiently precise parameters. In order to simulate uncertainty in hypothesis spaces, many scholars provide an extension of the HMMs, utilizing higher-order fuzzy sets to generate interval-type-2 fuzzy HMMs. This expansion is helpful because it brings the uncertainty and fuzziness of conventional HMM mapping under control. The neutrosophic sets are used in this work to deal with indeterminacy in a practical SLR setting. Existing interval-type-2 fuzzy HMMs cannot consider uncertain information that includes indeterminacy. However, the neutrosophic hidden Markov model successfully identifies the best route between states when there is vagueness. The three neutrosophic membership functions (truth, indeterminate, and falsity grades) provide more layers of autonomy for assessing the HMM's uncertainty. This approach could be helpful for an extensive vocabulary and hence seeks to solve the scalability issue. In addition, it may function independently of the signer, without needing data gloves or any other input devices. The experimental results demonstrate that the neutrosophic HMM is nearly as computationally difficult as the fuzzy HMM but has a similar performance and is more robust to gesture variations. Full article
(This article belongs to the Special Issue Uncertainty-Aware Artificial Intelligence)
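As a rough, simplified illustration only (not the paper's formulation), the toy below runs a Viterbi-style decoding in which each emission carries neutrosophic truth/indeterminacy/falsity grades folded into a single score as T * (1 - I) * (1 - F); the states, grades, and transition values are all hypothetical.

```python
# Toy Viterbi decoding with neutrosophic (truth, indeterminacy, falsity) emission grades.
import numpy as np

states = ["rest", "sign_A", "sign_B"]
trans = np.array([[0.6, 0.2, 0.2],
                  [0.3, 0.5, 0.2],
                  [0.3, 0.2, 0.5]])

# Per-frame (T, I, F) grades for each state: shape (n_frames, n_states, 3), hypothetical.
obs = np.array([
    [[0.8, 0.1, 0.1], [0.3, 0.4, 0.3], [0.2, 0.3, 0.5]],
    [[0.2, 0.3, 0.5], [0.7, 0.2, 0.1], [0.4, 0.4, 0.2]],
])

def emission_score(grades):
    t, i, f = grades
    return t * (1 - i) * (1 - f)   # one simple way to fold the three grades

n_frames, n_states = obs.shape[0], len(states)
delta = np.zeros((n_frames, n_states))
back = np.zeros((n_frames, n_states), dtype=int)
delta[0] = [emission_score(obs[0, s]) / n_states for s in range(n_states)]
for t in range(1, n_frames):
    for s in range(n_states):
        cand = delta[t - 1] * trans[:, s]
        back[t, s] = int(np.argmax(cand))
        delta[t, s] = cand[back[t, s]] * emission_score(obs[t, s])

# Backtrack the best state path.
path = [int(np.argmax(delta[-1]))]
for t in range(n_frames - 1, 0, -1):
    path.insert(0, back[t, path[0]])
print([states[s] for s in path])
```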

17 pages, 6752 KiB  
Article
iBVP Dataset: RGB-Thermal rPPG Dataset with High Resolution Signal Quality Labels
by Jitesh Joshi and Youngjun Cho
Electronics 2024, 13(7), 1334; https://doi.org/10.3390/electronics13071334 - 2 Apr 2024
Cited by 14 | Viewed by 6049
Abstract
Remote photo-plethysmography (rPPG) has emerged as a non-intrusive and promising physiological sensing capability in human–computer interface (HCI) research, gradually extending its applications in health-monitoring and clinical care contexts. With advanced machine learning models, recent datasets collected in real-world conditions have gradually enhanced the performance of rPPG methods in recovering heart-rate and heart-rate-variability metrics. However, the signal quality of reference ground-truth PPG data in existing datasets is by and large neglected, while poor-quality references negatively influence models. Here, this work introduces, for the first time, a new imaging blood volume pulse (iBVP) dataset of synchronized RGB and thermal infrared videos with ground-truth PPG signals from the ear, together with high-resolution signal-quality labels. Participants perform rhythmic breathing, head-movement, and stress-inducing tasks, which help reflect real-world variations in psycho-physiological states. This work conducts dense (per sample) signal-quality assessment to discard noisy segments of the ground truth and corresponding video frames. We further present a novel end-to-end machine learning framework, iBVPNet, that features an efficient and effective spatio-temporal feature aggregation for the reliable estimation of BVP signals. Finally, this work examines the feasibility of extracting BVP signals from thermal video frames, which is under-explored. The iBVP dataset and source codes are publicly available for research use. Full article
(This article belongs to the Special Issue Future Trends and Challenges in Human-Computer Interaction)
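A hedged sketch of the kind of dense signal-quality filtering described above: windows of a ground-truth PPG trace are kept only if their per-sample quality labels stay high. Window length, threshold, and array names are assumptions for illustration, not dataset specifics.

```python
# Keep only PPG windows whose per-sample quality exceeds a threshold.
import numpy as np

fs = 64                             # hypothetical PPG sampling rate (Hz)
win = 4 * fs                        # 4-second windows
rng = np.random.default_rng(0)
ppg = rng.normal(size=60 * fs)      # one minute of synthetic PPG samples
quality = rng.random(60 * fs)       # per-sample quality labels in [0, 1]

kept_windows = []
for start in range(0, len(ppg) - win + 1, win):
    if quality[start:start + win].mean() > 0.8:   # discard noisy segments
        kept_windows.append(ppg[start:start + win])

print(f"kept {len(kept_windows)} clean windows out of {len(ppg) // win}")
```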

22 pages, 1850 KiB  
Article
Voice Synthesis Improvement by Machine Learning of Natural Prosody
by Joseph Kane, Michael N. Johnstone and Patryk Szewczyk
Sensors 2024, 24(5), 1624; https://doi.org/10.3390/s24051624 - 1 Mar 2024
Cited by 3 | Viewed by 2965
Abstract
Since the advent of modern computing, researchers have striven to make the human–computer interface (HCI) as seamless as possible. Progress has been made on various fronts, e.g., the desktop metaphor (interface design) and natural language processing (input). One area receiving attention recently is voice activation and its corollary, computer-generated speech. Despite decades of research and development, most computer-generated voices remain easily identifiable as non-human. Prosody in speech has two primary components, intonation and rhythm, both of which are often lacking in computer-generated voices. This research aims to enhance computer-generated text-to-speech algorithms by incorporating melodic and prosodic elements of human speech. This study explores a novel approach to adding prosody using machine learning, specifically an LSTM neural network, to add paralinguistic elements to a recorded or generated voice. The aim is to increase the realism of computer-generated text-to-speech algorithms, to enhance electronic reading applications, and to improve artificial voices for those who need assistance to speak. A computer that can also convey meaning with a spoken audible announcement will improve human-to-computer interactions. Applications for such an algorithm may include improving high-definition audio codecs for telephony, renewing old recordings, and lowering barriers to the utilization of computing. This research deployed a prototype modular platform for digital speech improvement by analyzing and generalizing algorithms into a modular system through laboratory experiments to optimize combinations and performance in edge cases. The results were encouraging, with the LSTM-based encoder able to produce realistic speech. Further work will involve optimizing the algorithm and comparing its performance against other approaches. Full article
(This article belongs to the Section Intelligent Sensors)
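A minimal sketch of the kind of model named above: an LSTM mapping a sequence of per-phoneme feature vectors to prosodic targets such as pitch and duration. The feature sizes and data are assumptions; this is not the authors' architecture.

```python
# Small LSTM that predicts per-phoneme prosodic targets (pitch, duration).
import torch
import torch.nn as nn

class ProsodyLSTM(nn.Module):
    def __init__(self, n_features=16, hidden=64, n_targets=2):  # pitch, duration
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_targets)

    def forward(self, x):            # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)
        return self.head(out)        # (batch, seq_len, n_targets)

model = ProsodyLSTM()
phoneme_feats = torch.randn(1, 20, 16)   # one utterance of 20 phoneme frames (synthetic)
prosody = model(phoneme_feats)
print(prosody.shape)                     # torch.Size([1, 20, 2])
```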

20 pages, 16971 KiB  
Article
Human–Computer Interaction Multi-Task Modeling Based on Implicit Intent EEG Decoding
by Xiu Miao and Wenjun Hou
Appl. Sci. 2024, 14(1), 368; https://doi.org/10.3390/app14010368 - 30 Dec 2023
Cited by 2 | Viewed by 2087
Abstract
In the short term, a fully autonomous level of machine intelligence cannot be achieved. Humans are still an important part of HCI systems, and intelligent systems should be able to “feel” and “predict” human intentions in order to achieve dynamic coordination between humans and machines. Intent recognition is very important to improve the accuracy and efficiency of the HCI system. However, it is far from enough to focus only on explicit intent. There is a lot of vague and hidden implicit intent in the process of human–computer interaction. Based on passive brain–computer interface (pBCI) technology, this paper proposes a method to integrate humans into HCI systems naturally, which is to establish an intent-based HCI model and automatically recognize the implicit intent according to human EEG signals. In view of the existing problems of few divisible patterns and low efficiency of implicit intent recognition, this paper finally proves that EEG can be used as the basis for judging human implicit intent through extracting multi-task intention, carrying out experiments, and constructing algorithmic models. The CSP + SVM algorithm model can effectively improve the EEG decoding performance of implicit intent in HCI, and the effectiveness of the CSP algorithm on intention feature extraction is further verified by combining 3D space visualization. The translation of implicit intent information is of significance for the study of intent-based HCI models, the development of HCI systems, and the improvement of human–machine collaboration efficiency. Full article
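A hedged sketch of the CSP + SVM pipeline reported above, using mne.decoding.CSP with scikit-learn's SVC on synthetic epochs; the epoch shape, channel count, and labels are placeholders, not the study's EEG data.

```python
# CSP feature extraction followed by an SVM, cross-validated on synthetic EEG epochs.
import numpy as np
from mne.decoding import CSP
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 16, 250))   # 80 epochs, 16 channels, 250 time samples
y = rng.integers(0, 2, size=80)      # two hypothetical implicit-intent classes

clf = make_pipeline(CSP(n_components=4, log=True), SVC(kernel="rbf"))
print(cross_val_score(clf, X, y, cv=5).mean())
```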
