Search Results (453)

Search Parameters:
Keywords = voice interactions

16 pages, 2440 KiB  
Article
Dog–Stranger Interactions Can Facilitate Canine Incursion into Wilderness: The Role of Food Provisioning and Sociability
by Natalia Rojas-Troncoso, Valeria Gómez-Silva, Annegret Grimm-Seyfarth and Elke Schüttler
Biology 2025, 14(8), 1006; https://doi.org/10.3390/biology14081006 - 6 Aug 2025
Abstract
Most research on domestic dog (Canis familiaris) behavior has focused on pets with restricted movement. However, free-ranging dogs exist in diverse cultural contexts globally, and their interactions with humans are less well understood. Tourists can facilitate unrestricted dog movement into wilderness areas, where dogs may negatively impact wildlife. This study investigated which stimuli—namely, voice, touch, or food—along with inherent factors (age, sex, sociability) motivate free-ranging dogs to follow a human stranger. We measured the distance (up to 600 m) over which 129 free-ranging owned and stray dogs from three villages in southern Chile followed an experimenter who presented them with one of the above stimuli or none (control). To evaluate the effect of dog sociability (i.e., positive versus stress-related or passive behaviors), we performed a 30 s socialization test (standing near the dog without interacting) before presenting a 10 s stimulus twice. We also tracked whether the dog was in the company of other dogs. Each focal dog was video-recorded and tested up to three times over five days. Generalized linear mixed-effects models revealed that dogs’ motivation to follow a stranger was significantly influenced by the food stimulus, by a high proportion of sociable behaviors directed towards humans, and by the company of other dogs during the experiment. Juveniles tended to follow a stranger more than adults or seniors, but no effects were found for the dog’s sex, the presence of an owner, the repetition of trials, the study location, or for individuals as a random variable. This research highlights that sociability, as an inherent factor, shapes dog–stranger interactions in free-ranging dogs when food is given. In the context of wildlife conservation, we recommend that managers raise awareness among local communities and tourists to avoid feeding dogs, especially during outdoor activities close to wilderness. Full article
(This article belongs to the Special Issue Biology, Ecology, Management and Conservation of Canidae)

36 pages, 1010 KiB  
Article
SIBERIA: A Self-Sovereign Identity and Multi-Factor Authentication Framework for Industrial Access
by Daniel Paredes-García, José Álvaro Fernández-Carrasco, Jon Ander Medina López, Juan Camilo Vasquez-Correa, Imanol Jericó Yoldi, Santiago Andrés Moreno-Acevedo, Ander González-Docasal, Haritz Arzelus Irazusta, Aitor Álvarez Muniain and Yeray de Diego Loinaz
Appl. Sci. 2025, 15(15), 8589; https://doi.org/10.3390/app15158589 (registering DOI) - 2 Aug 2025
Viewed by 213
Abstract
The growing need for secure and privacy-preserving identity management in industrial environments has exposed the limitations of traditional, centralized authentication systems. In this context, SIBERIA was developed as a modular solution that empowers users to control their own digital identities, while ensuring robust protection of critical services. The system is designed in alignment with European standards and regulations, including EBSI, eIDAS 2.0, and the GDPR. SIBERIA integrates a Self-Sovereign Identity (SSI) framework with a decentralized blockchain-based infrastructure for the issuance and verification of Verifiable Credentials (VCs). It incorporates multi-factor authentication by combining a voice biometric module, enhanced with spoofing-aware techniques to detect synthetic or replayed audio, and a behavioral biometrics module that provides continuous authentication by monitoring user interaction patterns. The system enables secure and user-centric identity management in industrial contexts, ensuring high resistance to impersonation and credential theft while maintaining regulatory compliance. SIBERIA demonstrates that it is possible to achieve both strong security and user autonomy in digital identity systems by leveraging decentralized technologies and advanced biometric verification methods. Full article
(This article belongs to the Special Issue Blockchain and Distributed Systems)
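The multi-factor decision described in the SIBERIA abstract can be sketched as a simple score-fusion gate. This is an illustrative sketch under assumptions, not SIBERIA's actual API: the class, function names, and thresholds below are invented for demonstration.

```python
# Illustrative multi-factor authentication gate combining a voice-biometric
# similarity score, an anti-spoofing liveness score, and a behavioral-
# biometrics score. All names and thresholds are hypothetical.

from dataclasses import dataclass

@dataclass
class AuthScores:
    voice_similarity: float   # 0..1, speaker-verification match score
    liveness: float           # 0..1, anti-spoofing confidence (1 = live audio)
    behavior: float           # 0..1, continuous behavioral-biometrics score

def authenticate(scores: AuthScores,
                 voice_thr: float = 0.8,
                 liveness_thr: float = 0.7,
                 behavior_thr: float = 0.6) -> bool:
    """Accept only when every factor clears its threshold (AND fusion)."""
    return (scores.voice_similarity >= voice_thr
            and scores.liveness >= liveness_thr
            and scores.behavior >= behavior_thr)
```

AND fusion is the strictest policy; a real deployment might instead use weighted score fusion, or step-up authentication when a single factor is borderline.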

14 pages, 283 KiB  
Article
Teens, Tech, and Talk: Adolescents’ Use of and Emotional Reactions to Snapchat’s My AI Chatbot
by Gaëlle Vanhoffelen, Laura Vandenbosch and Lara Schreurs
Behav. Sci. 2025, 15(8), 1037; https://doi.org/10.3390/bs15081037 - 30 Jul 2025
Viewed by 238
Abstract
Due to technological advancements such as generative artificial intelligence (AI) and large language models, chatbots enable increasingly human-like, real-time conversations through text (e.g., OpenAI’s ChatGPT) and voice (e.g., Amazon’s Alexa). One AI chatbot that is specifically designed to meet the social-supportive needs of youth is Snapchat’s My AI. Given its increasing popularity among adolescents, the present study investigated whether adolescents’ likelihood of using My AI, as well as their positive or negative emotional experiences from interacting with the chatbot, is related to socio-demographic factors (i.e., gender, age, and socioeconomic status (SES)). A cross-sectional study was conducted among 303 adolescents (64.1% girls, 35.9% boys, 1.0% other, 0.7% preferred not to say their gender; Mage = 15.89, SDage = 1.69). The findings revealed that younger adolescents were more likely to use My AI and experienced more positive emotions from these interactions than older adolescents. No significant relationships were found for gender or SES. These results highlight the potential for age to play a critical role in shaping adolescents’ engagement with AI chatbots on social media and their emotional outcomes from such interactions, underscoring the need to consider developmental factors in AI design and policy. Full article
26 pages, 6831 KiB  
Article
Human–Robot Interaction and Tracking System Based on Mixed Reality Disassembly Tasks
by Raúl Calderón-Sesmero, Adrián Lozano-Hernández, Fernando Frontela-Encinas, Guillermo Cabezas-López and Mireya De-Diego-Moro
Robotics 2025, 14(8), 106; https://doi.org/10.3390/robotics14080106 - 30 Jul 2025
Viewed by 199
Abstract
Disassembly is a crucial process in industrial operations, especially in tasks requiring high precision and strict safety standards when handling components with collaborative robots. However, traditional methods often rely on rigid and sequential task planning, which makes it difficult to adapt to unforeseen changes or dynamic environments. This rigidity not only limits flexibility but also leads to prolonged execution times, as operators must follow predefined steps that do not allow for real-time adjustments. Although techniques like teleoperation have attempted to address these limitations, they often hinder direct human–robot collaboration within the same workspace, reducing effectiveness in dynamic environments. In response to these challenges, this research introduces an advanced human–robot interaction (HRI) system leveraging a mixed-reality (MR) interface embedded in a head-mounted device (HMD). The system enables operators to issue real-time control commands using multimodal inputs, including voice, gestures, and gaze tracking. These inputs are synchronized and processed via the Robot Operating System (ROS2), enabling dynamic and flexible task execution. Additionally, the integration of deep learning algorithms ensures precise detection and validation of disassembly components, enhancing accuracy. Experimental evaluations demonstrate significant improvements, including reduced task completion times, enhanced operator experience, and strict adherence to safety standards. This scalable solution offers broad applicability for general-purpose disassembly tasks, making it well-suited for complex industrial scenarios. Full article
(This article belongs to the Special Issue Robot Teleoperation Integrating with Augmented Reality)
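Fusing voice, gesture, and gaze commands, as this HRI system does, requires an arbitration rule. The sketch below is a hypothetical illustration of one such rule (most recent, highest-confidence input wins), not the paper's ROS2 implementation; all names and parameters are assumptions.

```python
# Hypothetical multimodal command arbitration: among recent inputs from
# voice, gesture, and gaze recognizers, pick the highest-confidence command.
# Illustrative only; not the paper's actual ROS2 pipeline.

from dataclasses import dataclass

@dataclass
class ModalInput:
    modality: str      # "voice", "gesture", or "gaze"
    command: str       # e.g. "unscrew", "pause", "resume"
    confidence: float  # 0..1 recognizer confidence
    stamp: float       # arrival time in seconds

def arbitrate(inputs, window: float = 1.0, min_conf: float = 0.5):
    """Return the command of the most confident input that arrived within
    `window` seconds of the newest input and exceeds `min_conf`."""
    if not inputs:
        return None
    newest = max(i.stamp for i in inputs)
    live = [i for i in inputs
            if newest - i.stamp <= window and i.confidence >= min_conf]
    if not live:
        return None
    return max(live, key=lambda i: i.confidence).command
```

In a real system each recognizer would publish on its own topic and the arbiter would also debounce conflicting commands; this sketch only shows the selection step.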

17 pages, 8512 KiB  
Article
Interactive Holographic Display System Based on Emotional Adaptability and CCNN-PCG
by Yu Zhao, Zhong Xu, Ting-Yu Zhang, Meng Xie, Bing Han and Ye Liu
Electronics 2025, 14(15), 2981; https://doi.org/10.3390/electronics14152981 - 26 Jul 2025
Viewed by 315
Abstract
Against the backdrop of the rapid advancement of intelligent speech interaction and holographic display technologies, this paper introduces an interactive holographic display system. The system applies 2D-to-3D conversion during content acquisition, yielding high-precision point cloud data for digital human holograms, and uses a Complex-valued Convolutional Neural Network Point Cloud Gridding (CCNN-PCG) algorithm to generate a computer-generated hologram (CGH) with depth information from the point cloud data. The system uses ChatGLM for natural language processing and emotion-adaptive responses, enabling multi-turn voice dialogs and text-driven model generation. The CCNN-PCG algorithm reduces computational complexity and improves display quality. Simulations and experiments show that CCNN-PCG enhances reconstruction quality and speeds up computation by over 2.2 times. This research provides a theoretical framework and practical technology for holographic interactive systems, applicable in virtual assistants, educational displays, and other fields. Full article
(This article belongs to the Special Issue Artificial Intelligence, Computer Vision and 3D Display)
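The core arithmetic inside a complex-valued CNN such as CCNN-PCG is convolution over complex numbers, which is natural for holography since optical fields carry amplitude and phase. A toy 1-D version (an illustration of the arithmetic only, not the paper's network; real frameworks store separate real and imaginary planes) can be written with Python's built-in complex type. Note it computes cross-correlation (no kernel flip), as is conventional in CNNs.

```python
# Toy 1-D complex-valued convolution, the basic operation inside a
# complex-valued CNN. Python's complex type handles the (a+bi)(c+di)
# products directly.

def complex_conv1d(signal, kernel):
    """Valid-mode 1-D sliding product of complex sequences (no kernel flip)."""
    n, k = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(n - k + 1)]

# A tiny complex "field" filtered by a two-tap complex kernel.
field = [1 + 1j, 2 + 0j, 0 + 1j, 1 - 1j]
kern = [1 + 0j, 0 + 1j]
out = complex_conv1d(field, kern)
```

Multiplying by a unit-magnitude complex kernel tap (here `0 + 1j`) rotates the phase of each sample, which is exactly the kind of operation phase-sensitive CGH computation relies on.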

26 pages, 2261 KiB  
Article
Real-Time Fall Monitoring for Seniors via YOLO and Voice Interaction
by Eugenia Tîrziu, Ana-Mihaela Vasilevschi, Adriana Alexandru and Eleonora Tudora
Future Internet 2025, 17(8), 324; https://doi.org/10.3390/fi17080324 - 23 Jul 2025
Viewed by 250
Abstract
In the context of global demographic aging, falls among the elderly remain a major public health concern, often leading to injury, hospitalization, and loss of autonomy. This study proposes a real-time fall detection system that combines a modern computer vision model (YOLOv11 with integrated pose estimation) with an Artificial Intelligence (AI)-based voice assistant designed to reduce false alarms and improve intervention efficiency and reliability. The system continuously monitors human posture via video input, detects fall events based on body dynamics and keypoint analysis, and initiates a voice-based interaction to assess the user’s condition. Depending on the user’s verbal response or the absence thereof, the system determines whether to trigger an emergency alert to caregivers or family members. All processing, including speech recognition and response generation, is performed locally to preserve user privacy and ensure low-latency performance. The approach is designed to support independent living for older adults. Evaluation of 200 simulated video sequences acquired by the development team demonstrated high precision and recall, along with a decrease in false positives when voice-based confirmation was incorporated. The system was also evaluated on an external dataset to assess its robustness. Our results highlight the system’s reliability and scalability for real-world in-home elderly monitoring applications. Full article
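Fall detection from pose keypoints typically combines vertical velocity of the hips with a posture check. The heuristic below is a simplified illustration of that idea; the keypoint layout, thresholds, and function name are assumptions for demonstration, not the system's tuned values.

```python
# Simplified fall heuristic from pose keypoints: flag a fall when the hip
# center drops quickly AND the body's bounding box becomes wider than tall
# (horizontal posture). Thresholds and keypoint format are illustrative.

def is_fall(prev_hip_y, curr_hip_y, keypoints,
            drop_thr=0.3, dt=0.5):
    """keypoints: list of (x, y) in normalized image coords (y grows down)."""
    drop_speed = (curr_hip_y - prev_hip_y) / dt      # downward = positive
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    horizontal = width > height                      # lying-down posture
    return drop_speed > drop_thr and horizontal
```

A production system would smooth keypoints over several frames and, as in this paper, confirm via voice interaction before alerting, since a fast sit-down can look like the first half of a fall.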

22 pages, 11043 KiB  
Article
Digital Twin-Enabled Adaptive Robotics: Leveraging Large Language Models in Isaac Sim for Unstructured Environments
by Sanjay Nambiar, Rahul Chiramel Paul, Oscar Chigozie Ikechukwu, Marie Jonsson and Mehdi Tarkian
Machines 2025, 13(7), 620; https://doi.org/10.3390/machines13070620 - 17 Jul 2025
Viewed by 425
Abstract
As industrial automation evolves towards human-centric, adaptable solutions, collaborative robots must overcome challenges in unstructured, dynamic environments. This paper extends our previous work on developing a digital shadow for industrial robots by introducing a comprehensive framework that bridges the gap between physical systems and their virtual counterparts. The proposed framework advances toward a fully functional digital twin by integrating real-time perception and intuitive human–robot interaction capabilities. The framework is applied to a hospital test lab scenario, where a YuMi robot automates the sorting of microscope slides. The system incorporates a RealSense D435i depth camera for environment perception, Isaac Sim for virtual environment synchronization, and a locally hosted large language model (Mistral 7B) for interpreting user voice commands. These components work together to achieve bi-directional synchronization between the physical and digital environments. The framework was evaluated through 20 test runs under varying conditions. A validation study measured the performance of the perception module, simulation, and language interface, with a 60% overall success rate. Additionally, synchronization accuracy between the simulated and physical robot joint movements reached 98.11%, demonstrating strong alignment between the digital and physical systems. By combining local LLM processing, real-time vision, and robot simulation, the approach enables untrained users to interact with collaborative robots in dynamic settings. The results highlight its potential for improving flexibility and usability in industrial automation. Full article
(This article belongs to the Topic Smart Production in Terms of Industry 4.0 and 5.0)
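The reported 98.11% synchronization accuracy implies some trajectory-error metric between simulated and physical joint angles. The abstract does not give the formula, so the sketch below assumes one plausible formulation, accuracy = 100% × (1 − MAE / joint range), purely for illustration.

```python
# One plausible joint-synchronization accuracy metric: 100% minus the mean
# absolute error between simulated and physical joint angles, expressed as
# a fraction of the joint's motion range. Assumed formulation, not the
# paper's exact metric.

def sync_accuracy(sim, phys, joint_range):
    """sim, phys: equal-length lists of joint angles (degrees)."""
    assert len(sim) == len(phys) and joint_range > 0
    mae = sum(abs(s - p) for s, p in zip(sim, phys)) / len(sim)
    return 100.0 * (1.0 - mae / joint_range)
```

Normalizing by the joint range keeps the score comparable across joints with very different travel, which matters when averaging into a single figure for the whole robot.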

20 pages, 1798 KiB  
Article
An Approach to Enable Human–3D Object Interaction Through Voice Commands in an Immersive Virtual Environment
by Alessio Catalfamo, Antonio Celesti, Maria Fazio, A. F. M. Saifuddin Saif, Yu-Sheng Lin, Edelberto Franco Silva and Massimo Villari
Big Data Cogn. Comput. 2025, 9(7), 188; https://doi.org/10.3390/bdcc9070188 - 17 Jul 2025
Viewed by 474
Abstract
Nowadays, the Metaverse is facing many challenges. In this context, Virtual Reality (VR) applications allowing voice-based human–3D object interactions are limited by current hardware/software constraints. In fact, adopting Automated Speech Recognition (ASR) systems to interact with 3D objects in VR applications through users’ voice commands presents significant challenges due to the hardware and software limitations of headset devices. This paper aims to bridge this gap by proposing a methodology to address these issues. In particular, we start from a Mel-Frequency Cepstral Coefficient (MFCC) extraction algorithm able to capture the unique characteristics of the user’s voice and pass the resulting features as input to a Convolutional Neural Network (CNN) model. After that, in order to integrate the CNN model with a VR application running on a standalone headset, such as Oculus Quest, we converted it into the Open Neural Network Exchange (ONNX) format, i.e., a Machine Learning (ML) interoperability open standard format. The proposed system demonstrates good performance and represents a foundation for the development of user-centric, effective computing systems, enhancing accessibility to VR environments through voice-based commands. Experiments demonstrate that a native CNN model developed in TensorFlow performs comparably to the corresponding CNN model converted into the ONNX format, paving the way towards the development of VR applications running on headsets controlled through the user’s voice. Full article
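The MFCC pipeline named above follows a standard recipe: frame the signal, window it, take a power spectrum, apply a triangular mel filterbank, take logs, then a DCT. The pure-Python sketch below illustrates that recipe in miniature (a direct DFT instead of an FFT, a crude filterbank); it is a didactic simplification, not the paper's extractor.

```python
# Minimal MFCC-style front end: frame -> Hamming window -> power spectrum
# (direct DFT) -> triangular mel filterbank -> log -> DCT-II. Simplified
# for illustration; real systems use an FFT and tuned mel parameters.

import cmath, math

def power_spectrum(frame):
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2 / n
            for k in range(n // 2 + 1)]

def mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_filterbank(n_filters, n_bins, sr):
    # Filter edges equally spaced on the mel scale, mapped back to FFT bins.
    pts = [i * mel(sr / 2) / (n_filters + 1) for i in range(n_filters + 2)]
    hz = [700.0 * (10 ** (m / 2595.0) - 1.0) for m in pts]
    bins = [int(round(h * 2 * (n_bins - 1) / sr)) for h in hz]
    fb = []
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        row = [0.0] * n_bins
        for b in range(left, right + 1):
            if left <= b <= center and center > left:
                row[b] = (b - left) / (center - left)
            elif center < b <= right and right > center:
                row[b] = (right - b) / (right - center)
        fb.append(row)
    return fb

def mfcc(frame, sr, n_filters=6, n_coeffs=4):
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * t / (len(frame) - 1)))
                for t, s in enumerate(frame)]                 # Hamming window
    spec = power_spectrum(windowed)
    energies = [math.log(max(sum(w * s for w, s in zip(row, spec)), 1e-10))
                for row in mel_filterbank(n_filters, len(spec), sr)]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(energies))              # DCT-II
            for c in range(n_coeffs)]
```

In the paper's setting, vectors like these would feed the CNN, which is then exported to ONNX so the standalone headset can run inference on-device.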

14 pages, 217 KiB  
Article
Narration as Characterization in First-Person Realist Fiction: Complicating a Universally Acknowledged Truth
by James Phelan
Humanities 2025, 14(7), 151; https://doi.org/10.3390/h14070151 - 16 Jul 2025
Viewed by 293
Abstract
I argue that the universally accepted assumption that in realist fiction a character narrator’s narration contributes to their characterization needs to be complicated. Working with a conception of narrative as rhetoric that highlights readerly interest in the author’s handling of the mimetic, thematic, and synthetic components of narrative, I suggest that the question about narration as characterization is one about the relation between the mimetic (character as possible person) and synthetic (character as invented construct) components. In addition, understanding the mimetic-synthetic relation requires attention to issues at the macro and micro levels of such narratives. At the macro level, I note the importance of (1) the tacit knowledge, shared by both authors and audiences, of the fictionality of character narration, which means authors write and readers read with an interest in its payoffs; and of (2) the recognition that character narration functions simultaneously along two tracks of communication: that between the character narrator and their narratee, and that between the author and their audience. These macro level matters then provide a frame within which authors and readers understand what happens at the micro level. At that level, I identify seven features of a character’s telling that have the potential to be used for characterization—voice, occasion, un/reliability, authority, self-consciousness, narrative control, and aesthetics. I also note that these features have their counterparts in the author’s telling. Finally, I propose that characterization via narration results from the interaction between the salient features of the character’s telling and their counterparts in the author’s telling. I develop these points through the analysis of four diverse case studies: Mark Twain’s Huckleberry Finn, Robert Browning’s “My Last Duchess,” Nadine Gordimer’s “Homage,” and Ernest Hemingway’s A Farewell to Arms. Full article
21 pages, 1118 KiB  
Review
Integrating Large Language Models into Robotic Autonomy: A Review of Motion, Voice, and Training Pipelines
by Yutong Liu, Qingquan Sun and Dhruvi Rajeshkumar Kapadia
AI 2025, 6(7), 158; https://doi.org/10.3390/ai6070158 - 15 Jul 2025
Viewed by 1486
Abstract
This survey provides a comprehensive review of the integration of large language models (LLMs) into autonomous robotic systems, organized around four key pillars: locomotion, navigation, manipulation, and voice-based interaction. We examine how LLMs enhance robotic autonomy by translating high-level natural language commands into low-level control signals, supporting semantic planning and enabling adaptive execution. Systems like SayTap improve gait stability through LLM-generated contact patterns, while TrustNavGPT achieves a 5.7% word error rate (WER) under noisy voice-guided conditions by modeling user uncertainty. Frameworks such as MapGPT, LLM-Planner, and 3D-LOTUS++ integrate multi-modal data—including vision, speech, and proprioception—for robust planning and real-time recovery. We also highlight the use of physics-informed neural networks (PINNs) to model object deformation and support precision in contact-rich manipulation tasks. To bridge the gap between simulation and real-world deployment, we synthesize best practices from benchmark datasets (e.g., RH20T, Open X-Embodiment) and training pipelines designed for one-shot imitation learning and cross-embodiment generalization. Additionally, we analyze deployment trade-offs across cloud, edge, and hybrid architectures, emphasizing latency, scalability, and privacy. The survey concludes with a multi-dimensional taxonomy and cross-domain synthesis, offering design insights and future directions for building intelligent, human-aligned robotic systems powered by LLMs. Full article
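Word error rate (WER), the metric behind figures such as TrustNavGPT's reported 5.7%, is the word-level Levenshtein edit distance between hypothesis and reference, divided by the reference length. A minimal implementation:

```python
# Word error rate via Levenshtein edit distance over word sequences:
# (substitutions + insertions + deletions) / number of reference words.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[-1][-1] / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis inserts many spurious words, which is why noisy voice-guided navigation benchmarks report it as a percentage rather than an accuracy.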

37 pages, 618 KiB  
Systematic Review
Interaction, Artificial Intelligence, and Motivation in Children’s Speech Learning and Rehabilitation Through Digital Games: A Systematic Literature Review
by Chra Abdoulqadir and Fernando Loizides
Information 2025, 16(7), 599; https://doi.org/10.3390/info16070599 - 12 Jul 2025
Viewed by 525
Abstract
The integration of digital serious games into speech learning (rehabilitation) has demonstrated significant potential in enhancing accessibility and inclusivity for children with speech disabilities. This review of the state of the art examines the role of serious games, Artificial Intelligence (AI), and Natural Language Processing (NLP) in speech rehabilitation, with a particular focus on interaction modalities, engagement, autonomy, and motivation. We have reviewed 45 selected studies. Our key findings show how intelligent tutoring systems, adaptive voice-based interfaces, and gamified speech interventions can empower children to engage in self-directed speech learning, reducing dependence on therapists and caregivers. The diversity of interaction modalities, including speech recognition, phoneme-based exercises, and multimodal feedback, demonstrates how AI and Assistive Technology (AT) can personalise learning experiences to accommodate diverse needs. Furthermore, the incorporation of gamification strategies, such as reward systems and adaptive difficulty levels, has been shown to enhance children’s motivation and long-term participation in speech rehabilitation. The gaps identified show that, despite advancements, challenges remain in achieving universal accessibility, particularly regarding speech recognition accuracy, multilingual support, and accessibility for users with multiple disabilities. This review advocates for interdisciplinary collaboration across educational technology, special education, cognitive science, and human–computer interaction (HCI). Our work contributes to the ongoing discourse on lifelong inclusive education, reinforcing the potential of AI-driven serious games as transformative tools for bridging learning gaps and promoting speech rehabilitation beyond clinical environments. Full article

29 pages, 7197 KiB  
Review
Recent Advances in Electrospun Nanofiber-Based Self-Powered Triboelectric Sensors for Contact and Non-Contact Sensing
by Jinyue Tian, Jiaxun Zhang, Yujie Zhang, Jing Liu, Yun Hu, Chang Liu, Pengcheng Zhu, Lijun Lu and Yanchao Mao
Nanomaterials 2025, 15(14), 1080; https://doi.org/10.3390/nano15141080 - 11 Jul 2025
Viewed by 568
Abstract
Electrospun nanofiber-based triboelectric nanogenerators (TENGs) have emerged as a highly promising class of self-powered sensors for a broad range of applications, particularly in intelligent sensing technologies. By combining the advantages of electrospinning and triboelectric nanogenerators, these sensors offer superior characteristics such as high sensitivity, mechanical flexibility, lightweight structure, and biocompatibility, enabling their integration into wearable electronics and biomedical interfaces. This review presents a comprehensive overview of recent progress in electrospun nanofiber-based TENGs, covering their working principles, operating modes, and material composition. Both pure polymer and composite nanofibers are discussed, along with various electrospinning techniques that enable control over morphology and performance at the nanoscale. We explore their practical implementations in both contact-type and non-contact-type sensing, such as human–machine interaction, physiological signal monitoring, gesture recognition, and voice detection. These applications demonstrate the potential of TENGs to enable intelligent, low-power, and real-time sensing systems. Furthermore, this paper points out critical challenges and future directions, including durability under long-term operation, scalable and cost-effective fabrication, and seamless integration with wireless communication and artificial intelligence technologies. With ongoing advancements in nanomaterials, fabrication techniques, and system-level integration, electrospun nanofiber-based TENGs are expected to play a pivotal role in shaping the next generation of self-powered, intelligent sensing platforms across diverse fields such as healthcare, environmental monitoring, robotics, and smart wearable systems. Full article
(This article belongs to the Special Issue Self-Powered Flexible Sensors Based on Triboelectric Nanogenerators)

15 pages, 1359 KiB  
Article
Phoneme-Aware Hierarchical Augmentation and Semantic-Aware SpecAugment for Low-Resource Cantonese Speech Recognition
by Lusheng Zhang, Shie Wu and Zhongxun Wang
Sensors 2025, 25(14), 4288; https://doi.org/10.3390/s25144288 - 9 Jul 2025
Viewed by 441
Abstract
Cantonese Automatic Speech Recognition (ASR) is hindered by tonal complexity, acoustic diversity, and a lack of labelled data. This study proposes a phoneme-aware hierarchical augmentation framework that enhances performance without additional annotation. A Phoneme Substitution Matrix (PSM), built from Montreal Forced Aligner alignments and Tacotron-2 synthesis, injects adversarial phoneme variants into both transcripts and their aligned audio segments, enlarging pronunciation diversity. Concurrently, a semantic-aware SpecAugment scheme exploits wav2vec 2.0 attention heat maps and keyword boundaries to adaptively mask informative time–frequency regions; a reinforcement-learning controller tunes the masking schedule online, forcing the model to rely on a wider context. On the Common Voice Cantonese 50 h subset, the combined strategy reduces the character error rate (CER) from 26.17% to 16.88% with wav2vec 2.0 and from 38.83% to 23.55% with Zipformer. At 100 h, the CER further drops to 4.27% and 2.32%, yielding relative gains of 32–44%. Ablation studies confirm that phoneme-level and masking components provide complementary benefits. The framework offers a practical, model-independent path toward accurate ASR for Cantonese and other low-resource tonal languages. This paper presents an intelligent sensing-oriented modeling framework for speech signals, which is suitable for deployment on edge or embedded systems to process input from audio sensors (e.g., microphones) and shows promising potential for voice-interactive terminal applications. Full article
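The paper's semantic-aware SpecAugment steers masks using wav2vec 2.0 attention maps and keyword boundaries; the sketch below shows only the basic underlying operation (random time and frequency masks on a spectrogram), as an illustration rather than the paper's method.

```python
# Basic SpecAugment-style masking on a spectrogram represented as a list of
# time frames, each a list of frequency-bin magnitudes: zero out a random
# time span and a random frequency band. The paper's semantic-aware variant
# instead places masks over informative regions; this shows the plain form.

import random

def spec_augment(spec, max_t=2, max_f=2, rng=None):
    rng = rng or random.Random(0)                 # deterministic default
    n_t, n_f = len(spec), len(spec[0])
    out = [row[:] for row in spec]                # copy; leave input intact
    t0, t_len = rng.randrange(n_t), rng.randint(1, max_t)
    f0, f_len = rng.randrange(n_f), rng.randint(1, max_f)
    for t in range(t0, min(t0 + t_len, n_t)):     # time mask: blank frames
        out[t] = [0.0] * n_f
    for row in out:                               # frequency mask: blank band
        for f in range(f0, min(f0 + f_len, n_f)):
            row[f] = 0.0
    return out
```

Making the mask placement adaptive, as the paper does with an RL controller, turns the two `randrange` calls into a learned policy while the masking operation itself stays the same.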

16 pages, 815 KiB  
Review
Microvascularization of the Vocal Folds: Molecular Architecture, Functional Insights, and Personalized Research Perspectives
by Roxana-Andreea Popa, Cosmin-Gabriel Popa, Delia Hînganu and Marius Valeriu Hînganu
J. Pers. Med. 2025, 15(7), 293; https://doi.org/10.3390/jpm15070293 - 7 Jul 2025
Viewed by 418
Abstract
Introduction: The vascular architecture of the vocal folds plays a critical role in sustaining the dynamic demands of phonation. Disruptions in this microvascular system are linked to various pathological conditions, including Reinke’s edema, hemorrhage, and laryngeal carcinoma. This review explores the structural and functional components of vocal fold microvascularization, with emphasis on pericytes, endothelial interactions, and neurovascular regulation. Materials and Methods: A systematic review of the literature was conducted using databases such as PubMed, Scopus, Web of Science, and Embase. Keywords included “pericytes”, “Reinke’s edema”, and “vocal fold microvascularization”. Selected studies were peer-reviewed and met criteria for methodological quality and relevance to laryngeal microvascular physiology and pathology. Results: The vocal fold vasculature is organized in a parallel, tree-like pattern with distinct arterioles, capillaries, and venules. Capillaries dominate the superficial lamina propria, while transitional vessels connect to deeper arterioles surrounded by smooth muscle. Pericytes, present from birth, form tight associations with endothelial cells and contribute to capillary stability, vessel remodeling, and mechanical protection during vibration. Their thick cytoplasmic processes suggest a unique adaptation to the biomechanical stress of phonation. Arteriovenous anastomoses regulate perfusion by shunting blood according to functional demand. Furthermore, neurovascular control is mediated by noradrenergic fibers and neuropeptides such as VIP and CGRP, modulating vascular tone and glandular secretion. The limited lymphatic presence in the vocal fold mucosa contributes to edema accumulation while also restricting carcinoma spread, offering both therapeutic challenges and advantages. 
Conclusions: A deeper understanding of vocal fold microvascularization enhances clinical approaches to voice disorders and laryngeal disease, offering new perspectives for targeted therapies and regenerative strategies. Full article
(This article belongs to the Special Issue Clinical Diagnosis and Treatment in Otorhinolaryngology)

19 pages, 901 KiB  
Article
The Effects of Psychological Capital and Workplace Bullying on Intention to Stay in the Lodging Industry
by Can Olgun and Brijesh Thapa
Tour. Hosp. 2025, 6(3), 127; https://doi.org/10.3390/tourhosp6030127 - 2 Jul 2025
Viewed by 378
Abstract
Workplace bullying is a widespread yet rarely recognized stressor that impairs employee productivity and organizational harmony. It requires attention in the hospitality industry, where a high volume of interpersonal interactions occurs. It is essential to address employees’ overall outlook and attitudes toward hardships resulting from stressful work environments. This study examined workplace bullying by highlighting the role of psychological capital in employees’ responses to hostile work environments. The relationships among employee voice, perceived organizational support, organizational commitment, and intention to stay were further elaborated based on a conceptual model. An online survey was distributed to hotel employees, and the results were analyzed using structural equation modeling. The indirect effects of psychological capital on perceived organizational support and organizational commitment were stronger than those of workplace bullying. The results demonstrate that employees with higher psychological capital have more proactive response tendencies to workplace bullying. Full article
