Search Results (614)

Search Parameters:
Keywords = human voice

24 pages, 6313 KB  
Article
IoT-Driven Pull Scheduling to Avoid Congestion in Human Emergency Evacuation
by Erol Gelenbe and Yuting Ma
Sensors 2026, 26(3), 837; https://doi.org/10.3390/s26030837 (registering DOI) - 27 Jan 2026
Abstract
The efficient and timely management of human evacuation during emergency events is an important area of research where the Internet of Things (IoT) can be of great value. Significant areas of application for optimum evacuation strategies include buildings, sports arenas, cultural venues, such as museums and concert halls, and ships that carry passengers, such as cruise ships. In many cases, the evacuation process is complicated by constraints on space and movement, such as corridors, staircases, and passageways, that can cause congestion and slow the evacuation process. In such circumstances, the Internet of Things (IoT) can be used to sense the presence of evacuees in different locations, to sense hazards and congestion, to assist in making decisions based on sensing to guide the evacuees dynamically in the most effective direction to limit or eliminate congestion and maximize safety, and to notify the passengers of the directions they should take or whether they should stop and wait, through signaling with active IoT devices that can include voice and visual indications and signposts. This paper uses an analytical queueing network approach to analyze an emergency evacuation system, and suggests the use of the Pull Policy, which employs the IoT to direct evacuees in a manner that reduces downstream congestion by signalling them to move forward when the preceding evacuees exit the system. The IoT-based Pull Policy is analyzed using a realistic representation of evacuation from an existing commercial cruise ship, with a queueing network model that also allows for a computationally very efficient comparison of different routing rules with wide-ranging variations in speed parameters of each of the individual evacuees. Numerical examples are used to demonstrate its value for the timely evacuation of passengers within the confined space of a cruise ship. Full article
(This article belongs to the Section Internet of Things)
27 pages, 4789 KB  
Article
Assessing Interaction Quality in Human–AI Dialogue: An Integrative Review and Multi-Layer Framework for Conversational Agents
by Luca Marconi, Luca Longo and Federico Cabitza
Mach. Learn. Knowl. Extr. 2026, 8(2), 28; https://doi.org/10.3390/make8020028 - 26 Jan 2026
Abstract
Conversational agents are transforming digital interactions across various domains, including healthcare, education, and customer service, thanks to advances in large language models (LLMs). As these systems become more autonomous and ubiquitous, understanding what constitutes high-quality interaction from a user perspective is increasingly critical. Despite growing empirical research, the field lacks a unified framework for defining, measuring, and designing user-perceived interaction quality in human–artificial intelligence (AI) dialogue. Here, we present an integrative review of 125 empirical studies published between 2017 and 2025, spanning text-, voice-, and LLM-powered systems. Our synthesis identifies three consistent layers of user judgment: a pragmatic core (usability, task effectiveness, and conversational competence), a social–affective layer (social presence, warmth, and synchronicity), and an accountability and inclusion layer (transparency, accessibility, and fairness). These insights are formalised into a four-layer interpretive framework—Capacity, Alignment, Levers, and Outcomes—operationalised via a Capacity × Alignment matrix that maps distinct success and failure regimes. It also identifies design levers such as anthropomorphism, role framing, and onboarding strategies. The framework consolidates constructs, positions inclusion and accountability as central to quality, and offers actionable guidance for evaluation and design. This research redefines interaction quality as a dialogic construct, shifting the focus from system performance to co-orchestrated, user-centred dialogue quality. Full article
16 pages, 803 KB  
Article
AI-Powered Physiotherapy: Evaluating LLMs Against Students in Clinical Rehabilitation Scenarios
by Ioanna Michou, Athanasios Fouras, Dionysia Chrysanthakopoulou, Marina Theodoritsi, Savina Mariettou, Sotiria Stellatou and Constantinos Koutsojannis
Appl. Sci. 2026, 16(3), 1165; https://doi.org/10.3390/app16031165 - 23 Jan 2026
Viewed by 116
Abstract
Generative artificial intelligence (GenAI), particularly large language models (LLMs) such as ChatGPT and DeepSeek, is transforming healthcare by enhancing clinical decision-making, education, and patient interaction. This exploratory study compares the responses of ChatGPT (GPT-4.1) and DeepSeek-V2 against 90 final-year physiotherapy students in Greece on the quality of the responses to 60 clinical questions across four rehabilitation domains: low back pain, multiple sclerosis, frozen shoulder, and knee osteoarthritis (15 questions per domain). The questions spanned basic knowledge, diagnosis, alternative treatments, and rehabilitation practices. The responses were evaluated for their relevance, accuracy, clarity, completeness, and consistency with clinical practice guidelines (CPGs), emphasizing conceptual understanding. This study provides novel contributions by (i) benchmarking LLMs in physiotherapy-specific domains (low back pain, multiple sclerosis, frozen shoulder, and knee osteoarthritis) underrepresented in prior AI-health evaluations; (ii) directly comparing the LLM written response quality to student performance under exam constraints; and (iii) highlighting the improvement potential for education, complementing ChatGPT’s established role in physician decision support. The results indicate that the LLMs produced higher-quality written responses than students in most domains, particularly in the global response quality and the conceptual depth of written responses, highlighting their potential as educational aids for knowledge-based tasks, although not equivalent to clinical expertise. This suggests a role for AI in physiotherapy as a supportive tool rather than a replacement for hands-on clinical skills, and raises the question of whether GenAI could transform physiotherapy practice by augmenting, rather than threatening, human-centered care; its potential as a knowledge support tool in education remains pending validation in clinical contexts. This study explores these findings, compares them with the related work, and discusses whether GenAI will transform or threaten physiotherapy practice. Ethical considerations, limitations, and future directions, including AI voice assistants and AI characters, are addressed. Full article
43 pages, 595 KB  
Review
An Overview of Severe Myalgic Encephalomyelitis
by Mark Vink and Alexandra Vink-Niese
J. Clin. Med. 2026, 15(2), 805; https://doi.org/10.3390/jcm15020805 - 19 Jan 2026
Viewed by 1755
Abstract
In this article, we have reviewed the literature on severe myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). ME/CFS is a clinical diagnosis in the absence of a diagnostic test. However, in research settings and disability disputes, 2-day cardiopulmonary exercise testing can be used to diagnose and document the abnormal response to exercise. Biomedical research into this disease has been scarce and underfunded for decades. Consequently, there are no effective treatments. In its most severe form, it is more disabling than many other diseases, and patients are bedbound 24/7, dependent on carers, and spend their days in dark and quiet rooms. Even the soft sound of a human voice can lead to further deterioration. Some of the very severely ill suffer from life-threatening malnutrition and need to be tube-fed. The COVID-19 pandemic has led to a sharp increase in the number of patients with post-infectious diseases, and many of them fulfill ME/CFS criteria. Dedicated, focused research using advanced medical technologies is needed to gain further understanding of the underlying disease mechanism. This will enable us to find effective pharmacological treatments and address the unmet medical needs of these very ill people. Full article
(This article belongs to the Special Issue POTS, ME/CFS and Long COVID: Recent Advances and Future Direction)
18 pages, 5332 KB  
Article
Research on Active Interference Technology Based on Piezoelectric Flexible Structure
by Chaoyan Wang, Xiaodong Zhou, Chao Zhang, Hongli Ji and Jinhao Qiu
Actuators 2026, 15(1), 62; https://doi.org/10.3390/act15010062 - 16 Jan 2026
Viewed by 178
Abstract
To address the issue of voice leakage during the rapid deployment of meeting rooms, a piezoelectric flexible interference structure (PFIS) for active sound masking is developed in this paper. The PFIS uses rubber as the base, allowing it to bend or fold, offering good flexibility. The PFIS generates vibration through direct contact with the target object, without the need for adhesives or installation, fulfilling the need for rapid deployment. The experiment studied the driving of PFIS under three types of interference signals, analyzing the interference performance of PFIS by combining the vibration response of the surface of the table. The results show that the vibration response generated by PFIS on the surface of the table is significantly greater than when only a human voice is present. When a 3.5 kg weight is added to the surface of PFIS, its vibration performance increases by 5.6 times. Furthermore, increasing the voltage enhances the vibration interference effect of the PFIS across the entire frequency range; after adding weight, the vibration interference performance of the PFIS is significantly improved for frequencies above 2500 Hz. It has been verified that PFIS has strong vibration interference performance, effectively masking the vibrations of objects under human voice, providing a new technical solution for information security protection in sensitive areas. Full article
22 pages, 8300 KB  
Article
Sign2Story: A Multimodal Framework for Near-Real-Time Hand Gestures via Smartphone Sensors to AI-Generated Audio-Comics
by Gul Faraz, Lei Jing and Xiang Li
Sensors 2026, 26(2), 596; https://doi.org/10.3390/s26020596 - 15 Jan 2026
Viewed by 248
Abstract
This study presents a multimodal framework that uses smartphone motion sensors and generative AI to create audio comics from live news headlines. The system operates without direct touch or voice input, instead responding to simple hand-wave gestures. The system demonstrates potential as an alternative input method, which may benefit users who find traditional touch or voice interaction challenging. In the experiments, we investigated the generation of comics based on the latest tech-related news headlines, retrieved using Really Simple Syndication (RSS) and triggered by a simple hand-wave gesture. The proposed framework demonstrates extensibility beyond comic generation, as various other tasks utilizing large language models and multimodal AI could be integrated by mapping them to different hand gestures. Our experiments with open-source models like LLaMA, LLaVA, Gemma, and Qwen revealed that LLaVA delivers superior results in generating panel-aligned stories compared to Qwen3-VL, both in terms of inference speed and output quality, relative to the source image. These large language models (LLMs) collectively contribute imaginative and conversational narrative elements that enhance diversity in storytelling within the comic format. Additionally, we implement an AI-in-the-loop mechanism to iteratively improve output quality without human intervention. Finally, AI-generated audio narration is incorporated into the comics to create an immersive, multimodal reading experience. Full article
(This article belongs to the Special Issue Body Area Networks: Intelligence, Sensing and Communication)
30 pages, 6201 KB  
Article
AFAD-MSA: Dataset and Models for Arabic Fake Audio Detection
by Elsayed Issa
Computation 2026, 14(1), 20; https://doi.org/10.3390/computation14010020 - 14 Jan 2026
Viewed by 187
Abstract
As generative speech synthesis produces near-human synthetic voices and reliance on online media grows, robust audio-deepfake detection is essential to fight misuse and misinformation. In this study, we introduce the Arabic Fake Audio Dataset for Modern Standard Arabic (AFAD-MSA), a curated corpus of authentic and synthetic Arabic speech designed to advance research on Arabic deepfake and spoofed-speech detection. The synthetic subset is generated with four state-of-the-art proprietary text-to-speech and voice-conversion models. Rich metadata—covering speaker attributes and generation information—is provided to support reproducibility and benchmarking. To establish reference performance, we trained three AASIST models and compared their performance to two baseline transformer detectors (Wav2Vec 2.0 and Whisper). On the AFAD-MSA test split, AASIST-2 achieved perfect accuracy, surpassing the baseline models. However, its performance declined under cross-dataset evaluation. These results underscore the importance of data construction. Detectors generalize best when exposed to diverse attack types. In addition, continual or contrastive training that interleaves bona fide speech with large, heterogeneous spoofed corpora will further improve detectors’ robustness. Full article
26 pages, 1643 KB  
Article
Methodologies of Care: A Multimodal, Participatory Research Approach with Vulnerable Families Among South African Communities
by James Reid, Chanté Johannes, Shenaaz Wareley, Collen Ngadhi, Avukonke Nginase, Katerina Demetriou and Nicolette V. Roman
Methods Protoc. 2026, 9(1), 11; https://doi.org/10.3390/mps9010011 - 13 Jan 2026
Viewed by 155
Abstract
Multimodal methods provide valuable opportunities within Participatory Action Research (PAR), to foster meaningful participation, and amplify marginalized voices. However, conventional research approaches have not always adequately captured the complex realities of the lived experiences of families, and multimodal techniques have remained underutilized for the exploration of such experiences. This study aimed to explore the use of creative multimodal methods, within a PAR framework, grounded in care among vulnerable South African families. A qualitative design was adopted, incorporating Human-centered Design principles, within a PAR approach. The participants were recruited from the Saldanha Bay Municipality area (n = 70), as well as Mitchells Plain (n = 59). The multimodal methodology included Draw-and-Tell, painting, object and photo elicitation, I-Poems, and LEGO®-based activities. Data were annotated and transcribed verbatim, followed by thematic analysis. A total of 42 participants contributed towards the validation of the methods. The participants described experiences of deep emotional insight, self-reflection, and self-recognition, through engagement with the multimodal activities. The findings revealed that these approaches were: (1) credible, producing internally valid and contextually rich data; (2) contributory, generating original and applicable insights into family life; (3) communicable, offering accessible and structured ways for diverse participants to express their experiences; and (4) conforming, ensuring ethical engagement through inclusive participation. These findings demonstrate the potential of creative, arts-based, and participatory approaches, to advance methodological innovation in qualitative family research. Full article
(This article belongs to the Section Public Health Research)
23 pages, 6094 KB  
Systematic Review
Toward Smart VR Education in Media Production: Integrating AI into Human-Centered and Interactive Learning Systems
by Zhi Su, Tse Guan Tan, Ling Chen, Hang Su and Samer Alfayad
Biomimetics 2026, 11(1), 34; https://doi.org/10.3390/biomimetics11010034 - 4 Jan 2026
Viewed by 610
Abstract
Smart virtual reality (VR) systems are becoming central to media production education, where immersive practice, real-time feedback, and hands-on simulation are essential. This review synthesizes the integration of artificial intelligence (AI) into human-centered, interactive VR learning for television and media production. Searches in Scopus, Web of Science, IEEE Xplore, ACM Digital Library, and SpringerLink (2013–2024) identified 790 records; following PRISMA screening, 94 studies met the inclusion criteria and were synthesized using a systematic scoping review approach. Across this corpus, common AI components include learner modeling, adaptive task sequencing (e.g., RL-based orchestration), affect sensing (vision, speech, and biosignals), multimodal interaction (gesture, gaze, voice, haptics), and growing use of LLM/NLP assistants. Reported benefits span personalized learning trajectories, high-fidelity simulation of studio workflows, and more responsive feedback loops that support creative, technical, and cognitive competencies. Evaluation typically covers usability and presence, workload and affect, collaboration, and scenario-based learning outcomes, leveraging interaction logs, eye tracking, and biofeedback. Persistent challenges include latency and synchronization under multimodal sensing, data governance and privacy for biometric/affective signals, limited transparency/interpretability of AI feedback, and heterogeneous evaluation protocols that impede cross-system comparison. We highlight essential human-centered design principles—teacher-in-the-loop orchestration, timely and explainable feedback, and ethical data governance—and outline a research agenda to support standardized evaluation and scalable adoption of smart VR education in the creative industries. Full article
(This article belongs to the Special Issue Biomimetic Innovations for Human–Machine Interaction)
16 pages, 372 KB  
Entry
AI, Authorship, Copyright, and Human Originality
by Anja Neubauer, Martin Wynn and Robin Bown
Encyclopedia 2026, 6(1), 9; https://doi.org/10.3390/encyclopedia6010009 - 2 Jan 2026
Viewed by 563
Definition
This entry explores the implications of generative AI for the underlying foundational premises of copyright law and the potential threat it poses to human creativity. It identifies the gaps and inconsistencies in legal frameworks as regards authorship, training-data use, moral rights, and human originality in the context of AI systems that are capable of imitating human expression at both syntactic and semantic levels. The entry includes: (i) a comparative analysis of the legal frameworks of the United Kingdom, United States, and Germany, using the Berne Convention as a harmonising baseline, (ii) a systematic synthesis of the relevant academic literature, and (iii) insights gained from semi-structured interviews with legal scholars, AI developers, industry stakeholders, and creators. Evidence suggests that existing laws are ill-equipped for semantic and stylistic reproduction; there is no agreement on authorship, no clear licensing model for training data, and inadequate protection for the moral identity of creators—especially posthumously, where explicit protections for likeness, voice, and style are fragmented. The entry puts forward a draft global framework to restore legal certainty and cultural value, incorporating a semantics-aware definition of the term “work”, and encompassing licensing and remuneration of training data, enhanced moral and posthumous rights, as well as enforceable transparency. At the same time, parallel personality-based safeguards, including rights of publicity, image, or likeness, although present in all three jurisdictions studied, are not subject to the same copyright and thus do not offer any coherent or adequate protection against semantic or stylistic imitation, which once again highlights the need for a more unified and robust copyright strategy. Full article
(This article belongs to the Section Social Sciences)
22 pages, 3592 KB  
Article
Empirical Evidence of AI-Enabled Accessibility in Digital Gastronomy: Development and Evaluation of the Receitas +Power Platform
by Paulo Serra, Ângela Oliveira, Filipe Fidalgo, Bruno Serra, Tiago Infante and Luís Baião
Gastronomy 2026, 4(1), 2; https://doi.org/10.3390/gastronomy4010002 - 31 Dec 2025
Viewed by 230
Abstract
This study explores how artificial intelligence can promote accessibility and inclusiveness in digital culinary environments. Centred on the Receitas +Power platform, the research adopts an exploratory, multidimensional case study design integrating qualitative and quantitative analyses. The investigation addresses three research questions concerning (i) user empowerment beyond recommendation systems, (ii) accessibility best practices across disability types, and (iii) the effectiveness of AI-enabled inclusive solutions. The system was developed following user-centred design principles and WCAG 2.2 standards, combining generative AI modules for recipe creation with accessibility features such as voice interaction and adaptive navigation. The evaluation, conducted with 87 participants, employed the System Usability Scale complemented by thematic qualitative feedback. Results indicate excellent usability (M = 80.6), high reliability (Cronbach’s α = 0.798–0.849), and moderate positive correlations between usability and accessibility dimensions (r = 0.45–0.55). Participants highlighted the platform’s personalisation, clarity, and inclusivity, confirming that accessibility enhances rather than restricts user experience. The findings provide empirical evidence that AI-driven adaptability, when grounded in universal design principles, offers an effective and ethically sound pathway toward digital inclusion. Receitas +Power thus advances the field of inclusive digital gastronomy and presents a replicable framework for human–AI co-creation in accessible web technologies. Full article
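The abstract reports a mean System Usability Scale (SUS) score of 80.6 but does not spell out how a SUS score is computed. As a minimal sketch, assuming the standard 10-item SUS questionnaire with 1-5 Likert responses, a single participant's score could be derived as below. The function name `sus_score` is hypothetical and is not taken from the paper.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Standard SUS scoring for one participant (assumed here, not from the paper).

    responses: ten answers on a 1-5 scale, in questionnaire order.
    Odd-numbered items are positively worded and contribute (answer - 1);
    even-numbered items are negatively worded and contribute (5 - answer).
    The summed contributions (0-40) are scaled to 0-100 by multiplying by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")

    total = 0
    for index, answer in enumerate(responses):
        item_number = index + 1
        if item_number % 2 == 1:
            total += answer - 1   # positively worded item
        else:
            total += 5 - answer   # negatively worded item
    return total * 2.5


if __name__ == "__main__":
    print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # best possible answers
    print(sus_score([3] * 10))                        # all-neutral answers
```

A study-level figure such as the reported mean would then be the average of these per-participant scores across the sample.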
17 pages, 1717 KB  
Review
Trends in Marine Mammal Literature in Human Care: A Need for More Welfare-, Environmental- and Management-Related Research
by Sabrina Brando, Sara Torres Ortiz, Geoff Hosey and Heather M. Manitzas Hill
J. Zool. Bot. Gard. 2025, 6(4), 65; https://doi.org/10.3390/jzbg6040065 - 18 Dec 2025
Viewed by 763
Abstract
Marine mammals have been successfully maintained under human care; however, the media, public, and professionals within the field frequently voice welfare concerns. This study systematically surveyed peer-reviewed (PR) literature from 1948 to 2024 (n = 1308) and included an opportunistic sample of non-peer-reviewed (NPR) literature from the past 40 years (n = 756) to evaluate research efforts associated with species housed in zoos and aquariums. The current study updates and extends previous efforts to assess research categories. The findings indicate that the volume of research published mirrors the species abundance in human care. Across taxa, PR papers concentrate on science that enhances the understanding of biological functions (Acoustics, Biology, Breeding, Behaviour, Health) but is not necessarily tailored to improve management or optimal care. In contrast, a substantial portion of the NPR literature focuses on daily handling and management, highlighting Environment and Management and Enrichment-related activities. While welfare-related research has increased in both PR and NPR literature, this review underscores the need for additional welfare-related empirical studies to further enhance animal care and wellbeing. We encourage those involved in the practical care of such taxa to empirically evaluate these interventions and disseminate their findings in the PR literature. Full article
59 pages, 7553 KB  
Review
Turn-Taking Modelling in Conversational Systems: A Review of Recent Advances
by Rutherford Agbeshi Patamia, Ha Pham Thien Dinh, Ming Liu and Akansel Cosgun
Technologies 2025, 13(12), 591; https://doi.org/10.3390/technologies13120591 - 15 Dec 2025
Viewed by 1921
Abstract
Effective turn-taking is fundamental to conversational interactions, shaping the fluidity of communication across human dialogues and interactions with spoken dialogue systems (SDS). Despite its apparent simplicity, conversational turn-taking involves complex timing mechanisms influenced by various linguistic, prosodic, and multimodal cues. This review synthesises recent theoretical insights and practical advancements in understanding and modelling conversational timing dynamics, emphasising critical phenomena such as voice activity (VA), turn floor offsets (TFO), and predictive turn-taking. We first discuss foundational concepts, such as voice activity detection (VAD) and inter-pausal units (IPUs), and highlight their significance for systematically representing dialogue states. Central to the challenge of interactive systems is distinguishing moments when conversational roles shift versus when they remain with the current speaker, encapsulated by the concepts of “hold” and “shift”. The timing of these transitions, measured through Turn Floor Offsets (TFOs), aligns closely with minimal human reaction times, suggesting biological underpinnings while exhibiting cross-linguistic variability. This review further explores computational turn-taking heuristics and models, noting that simplistic strategies may reduce interruptions yet risk introducing unnatural delays. Integrating multimodal signals, prosodic, verbal, visual, and predictive mechanisms is emphasised as essential for future developments in achieving human-like conversational responsiveness. Full article
(This article belongs to the Special Issue Collaborative Robotics and Human-AI Interactions)
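The "hold" versus "shift" distinction and the turn floor offset (TFO) described in this abstract can be illustrated with a small sketch. It assumes that voice activity detection has already produced inter-pausal units (IPUs) as speaker-labelled time intervals. The `IPU` type and `label_transitions` function are hypothetical names for illustration, and the rule shown is a deliberately simplified one, not a method taken from the review.

```python
from typing import List, NamedTuple, Tuple


class IPU(NamedTuple):
    """One inter-pausal unit: a stretch of continuous speech by one speaker."""
    speaker: str
    start: float  # seconds
    end: float    # seconds


def label_transitions(ipus: List[IPU]) -> List[Tuple[str, float]]:
    """Label each pair of consecutive IPUs as a 'hold' or a 'shift'.

    The offset is next.start - current.end in seconds: positive values are
    gaps (silence), negative values are overlaps. For a 'shift' this offset
    plays the role of the turn floor offset (TFO).
    """
    ordered = sorted(ipus, key=lambda unit: unit.start)
    transitions = []
    for current, following in zip(ordered, ordered[1:]):
        offset = round(following.start - current.end, 3)
        kind = "hold" if following.speaker == current.speaker else "shift"
        transitions.append((kind, offset))
    return transitions


if __name__ == "__main__":
    dialogue = [
        IPU("A", 0.0, 1.0),
        IPU("A", 1.4, 2.0),  # same speaker resumes after a pause: hold
        IPU("B", 2.2, 3.0),  # other speaker takes the floor after a gap: shift
        IPU("A", 2.9, 3.5),  # starts before B finishes: shift with overlap
    ]
    for kind, offset in label_transitions(dialogue):
        print(kind, offset)
```

A predictive turn-taking model would try to anticipate these labels before the silence occurs, rather than assigning them after the fact as this sketch does.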
17 pages, 4452 KB  
Article
SAUCF: A Framework for Secure, Natural-Language-Guided UAS Control
by Nihar Shah, Varun Aggarwal and Dharmendra Saraswat
Drones 2025, 9(12), 860; https://doi.org/10.3390/drones9120860 - 14 Dec 2025
Viewed by 480
Abstract
Precision agriculture increasingly recognizes the transformative potential of unmanned aerial systems (UASs) for crop monitoring and field assessment, yet research consistently highlights significant usability barriers as the main constraints to widespread adoption. Complex mission planning processes, including detailed flight plan creation and waypoint management, pose substantial technical challenges that mainly affect non-expert operators. Farmers and their teams generally prefer user-friendly, straightforward tools, as evidenced by the rapid adoption of GPS guidance systems, which underscores the need for simpler mission planning in UAS operations. To enhance accessibility and safety in UAS control, especially for non-expert operators in agriculture and related fields, we propose a Secure UAS Control Framework (SAUCF): a comprehensive system for natural-language-driven UAS mission management with integrated dual-factor biometric authentication. The framework converts spoken user instructions into executable flight plans by leveraging a language-model-powered mission planner that interprets transcribed voice commands and generates context-aware operational directives, including takeoff, location monitoring, return-to-home, and landing operations. Mission orchestration is performed through a large language model (LLM) agent, coupled with a human-in-the-loop supervision mechanism that enables operators to review, adjust, or confirm mission plans before deployment. Additionally, SAUCF offers a manual override feature, allowing users to assume direct control or interrupt missions at any stage, ensuring safety and adaptability in dynamic environments. Proof-of-concept demonstrations on a UAS platform with on-board computing validated reliable speech-to-text transcription, biometric verification via voice matching and face authentication, and effective Sim2Real transfer of natural-language-driven mission plans from simulation environments to physical UAS operations. 
Initial evaluations showed that SAUCF reduced mission planning time, minimized command errors, and simplified complex multi-objective workflows compared to traditional waypoint-based tools, though comprehensive field validation remains necessary to confirm these preliminary findings. The integration of natural-language-based interaction, real-time identity verification, human-in-the-loop LLM orchestration, and manual override capabilities allows SAUCF to significantly lower the technical barrier to UAS operation while ensuring mission security, operational reliability, and operator agency in real-world conditions. These findings lay the groundwork for systematic field trials and suggest that prioritizing ease of operation in mission planning can drive broader deployment of UAS technologies. Full article
(This article belongs to the Section Artificial Intelligence in Drones (AID))

15 pages, 3698 KB  
Article
Discovering the Effects of Superior-Surface Vocal Fold Lesions via Fluid–Structure Interaction Analysis
by Manoela Neves, Anitha Niyingenera, Norah Delaney and Rana Zakerzadeh
Bioengineering 2025, 12(12), 1360; https://doi.org/10.3390/bioengineering12121360 - 13 Dec 2025
Abstract
This study examines the impact of vocal fold (VF) lesions located on the superior surface on glottal airflow dynamics and tissue oscillatory behaviors using biomechanical simulations of a two-layered realistic VF model. It is hypothesized that morphological changes in the VFs due to the presence of a lesion cause changes in tissue elasticity and rheological properties, contributing to dysphonia. Previous research has lacked the integration of lesions in computational simulations of anatomically accurate larynx-VF models to explore their effects on phonation and contribution to voice disorders. Addressing the current gap in the literature, this paper considers a computational model of a two-layered VF structure incorporating a lesion that represents a hemorrhagic polyp. A three-dimensional, subject-specific, multilayered geometry of VFs is constructed based on STL files derived from a human larynx CT scan, and a fluid–structure interaction (FSI) methodology is employed to simulate the coupling of glottal airflow and VF tissue dynamics. To evaluate the effects of the lesion’s presence, two FSI models, one with a lesion embedded in the cover layer and one without, are simulated and compared. Analysis of airflow dynamics and tissue vibrational patterns between these two models is used to determine the impact of the lesion on the biomechanical characteristics of phonation. The polyp is found to slightly increase airflow resistance through the glottis and disrupt vibratory symmetry by decreasing the vibration frequency of the affected fold, leading to weaker and less rhythmic oscillations. The results also indicate that the lesion increases tissue stress in the affected fold, which agrees with clinical observations. While quantitative ranges depend on lesion size and tissue properties, these consistent and physically meaningful trends highlight the biomechanical mechanisms by which lesions influence phonation.
(This article belongs to the Section Biomechanics and Sports Medicine)