Search Results (73)

Search Parameters:
Keywords = audio delay

20 pages, 264 KB  
Article
Collaboration Between Nurses and Patients’ Families in Managing Chronic Heart Failure in Older Adults: A Qualitative Study
by Abdulaziz M. Alodhailah, Albandari Almutairi, Thurayya Eid, Rayhanah R. Almutairi, Asrar S. Almutairi, Ashwaq A. Almutairi, Waleed M. Alshehri, Bader M. Almutairy and Faihan F. Alshaibany
Healthcare 2026, 14(7), 853; https://doi.org/10.3390/healthcare14070853 - 27 Mar 2026
Abstract
Background: Chronic heart failure (CHF) in older adults requires sustained self-management and close follow-up, yet day-to-day care is often carried out by families with support from primary healthcare nurses. In Saudi Arabia, where family caregiving is culturally normative, collaboration between nurses and patients’ families may be pivotal to effective CHF management, but remains insufficiently understood in primary healthcare contexts. Methods: A qualitative study informed by an interpretive phenomenological approach was conducted. Participants (n = 24; 12 nurses and 12 family caregivers) were recruited using purposive sampling from primary healthcare centers in Riyadh, Saudi Arabia. In-depth, semi-structured interviews were conducted in Arabic or English, audio-recorded, transcribed verbatim, and analyzed using reflexive thematic analysis following Braun and Clarke’s six-phase framework. Strategies to enhance trustworthiness included member checking, peer debriefing, maintenance of an audit trail, and reflexive journaling. Results: Twenty-four participants (12 nurses and 12 family caregivers) were interviewed. Four interrelated themes were generated from both nurses’ and family caregivers’ accounts. 
(1) “We Are Caring Together”: Collaboration was experienced as shared responsibility for daily CHF management, grounded in trust; (2) Navigating Roles and Boundaries: Participants described unclear expectations, role overlap, and tension between professional authority and family knowledge; (3) Communication as the Engine of Collaboration: Effective partnerships depended on clear information exchange, caregiver-tailored education, and continuity of contact, while communication gaps created uncertainty and delayed support-seeking; and (4) Cultural and System Constraints Shaping Collaboration: Strong family obligation motivated caregiving but also intensified moral pressure and limited help-seeking, while time pressure and fragmented services constrained meaningful engagement and continuity across settings. Conclusions: Nurse–family collaboration in CHF management is relational, shaped by trust, role negotiation, and communication, and constrained by cultural norms and system pressures. This study contributes to the literature by demonstrating how moral obligation, hierarchical professional norms, and system fragmentation distinctively shape collaboration in the Saudi primary care context, extending existing conceptualizations derived primarily from Western individualist settings. Strengthening collaboration requires explicit role clarification, health literacy–informed caregiver education, continuity of contact, and organizational supports. Findings are limited by purposive sampling, single-city context, and exclusion of patient perspectives. Full article
13 pages, 214 KB  
Article
Living with Retinitis Pigmentosa in Türkiye: Diagnosis, Independence, and Access to Care
by Nurcan Gürsoy and Ersan Gürsoy
Healthcare 2026, 14(5), 593; https://doi.org/10.3390/healthcare14050593 - 27 Feb 2026
Viewed by 182
Abstract
Background: Retinitis pigmentosa (RP) is a progressive inherited retinal dystrophy that affects daily functioning, psychological well-being, and social participation. Although quantitative research describes disease burden, less is known about how individuals experience progressive vision loss in everyday life and within healthcare and social contexts. Methods: This qualitative study used semi-structured face-to-face interviews with adults diagnosed with RP. Purposive sampling was applied to ensure variation in demographic and clinical characteristics. Interviews were conducted in a tertiary ophthalmology clinic in Erzincan, Türkiye, between June and October 2025. Audio recordings were transcribed verbatim and analyzed using Braun and Clarke’s reflexive thematic analysis. Reporting followed the Consolidated Criteria for Reporting Qualitative Research (COREQ) checklist. Results: Sixteen participants (P1–P16) were included. Five themes were identified: (1) making sense of the illness and the diagnostic journey; (2) functional loss and the negotiation of independence; (3) psychological adaptation and identity reconstruction; (4) social relationships and social encounters; and (5) interaction with systems and the environment—accessibility and healthcare. Participants described early symptom normalization, delayed diagnostic pathways, and uncertainty persisting after diagnosis. Independence was shaped by safety concerns, environmental barriers, and reliance on support. Psychological adjustment fluctuated between fear of progression and efforts to sustain resilience. Social participation was influenced by support networks, concerns about being a burden, and stigma linked to invisible disability. 
Conclusions: Living with RP extends beyond visual impairment; building on prior qualitative work, our findings contextualize these experiences in Türkiye, highlighting how accessibility gaps, bureaucratic encounters in public institutions, and cost barriers within healthcare and public services can shape uncertainty, independence, and social participation. Full article
(This article belongs to the Section Mental Health and Psychosocial Well-being)
18 pages, 10692 KB  
Article
Short-Time Homomorphic Deconvolution (STHD): A Novel 2D Feature for Robust Indoor Direction of Arrival Estimation
by Yeonseok Park and Jun-Hwa Kim
Sensors 2026, 26(2), 722; https://doi.org/10.3390/s26020722 - 21 Jan 2026
Viewed by 345
Abstract
Accurate indoor positioning and navigation remain significant challenges, with audio sensor-based sound source localization emerging as a promising sensing modality. Conventional methods, often reliant on multi-channel processing or time-delay estimation techniques such as Generalized Cross-Correlation, encounter difficulties regarding computational complexity, hardware synchronization, and reverberant environments where time-difference-of-arrival cues are masked. While machine learning approaches have shown potential, their performance depends heavily on the discriminative power of input features. This paper proposes a novel feature extraction method named Short-Time Homomorphic Deconvolution, which transforms multi-channel audio signals into a 2D Time × Time-of-Flight representation. Unlike prior 1D methods, this feature effectively captures the temporal evolution and stability of time-of-flight differences between microphone pairs, offering a rich and robust input for deep learning models. We validate this feature using a lightweight Convolutional Neural Network integrated with a dual-stage channel attention mechanism, designed to prioritize reliable spatial cues. The system was trained on a large-scale dataset generated via simulations and rigorously tested using real-world data acquired in an ISO-certified anechoic chamber. Experimental results demonstrate that the proposed model achieves precise Direction of Arrival estimation with a Mean Absolute Error of 1.99 degrees in real-world scenarios. Notably, the system exhibits remarkable consistency between simulation and physical experiments, proving its effectiveness for robust indoor navigation and positioning systems. Full article
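For readers unfamiliar with the time-delay estimation baseline this abstract mentions, the following is a minimal sketch of delay estimation by plain, unweighted cross-correlation, the simplest member of the generalized cross-correlation family. It is not the paper's STHD feature. The function names, the 16 kHz sampling rate, and the synthetic test signal are all assumptions made for illustration.

```python
import random

def estimate_delay(sig, ref, max_lag):
    """Return the lag in samples at which `sig` best matches a delayed `ref`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate sig[n] with ref[n - lag] over the overlapping samples.
        score = 0.0
        for n in range(len(sig)):
            m = n - lag
            if 0 <= m < len(ref):
                score += sig[n] * ref[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def tdoa_seconds(sig, ref, fs, max_lag):
    """Convert the best lag into a time difference of arrival in seconds."""
    return estimate_delay(sig, ref, max_lag) / fs

# Hypothetical two-microphone example: the second channel lags by 5 samples.
random.seed(0)
mic_a = [random.gauss(0.0, 1.0) for _ in range(256)]
mic_b = [0.0] * 5 + mic_a[:-5]
print(tdoa_seconds(mic_b, mic_a, fs=16000, max_lag=20))
```

In reverberant rooms the correlation peak can be smeared or displaced by reflections, which is the weakness of such estimators that the abstract points to.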
15 pages, 1386 KB  
Article
Symmetry and Asymmetry Principles in Deep Speaker Verification Systems: Balancing Robustness and Discrimination Through Hybrid Neural Architectures
by Sundareswari Thiyagarajan and Deok-Hwan Kim
Symmetry 2026, 18(1), 121; https://doi.org/10.3390/sym18010121 - 8 Jan 2026
Viewed by 428
Abstract
Symmetry and asymmetry are foundational design principles in artificial intelligence, defining the balance between invariance and adaptability in multimodal learning systems. In audio-visual speaker verification, where speech and lip-motion features are jointly modeled to determine whether two utterances belong to the same individual, these principles govern both fairness and discriminative power. In this work, we analyze how symmetry and asymmetry emerge within a gated-fusion architecture that integrates Time-Delay Neural Networks and Bidirectional Long Short-Term Memory encoders for speech, ResNet-based visual lip encoders, and a shared Conformer-based temporal backbone. Structural symmetry is preserved through weight-sharing across paired utterances and symmetric cosine-based scoring, ensuring verification consistency regardless of input order. In contrast, asymmetry is intentionally introduced through modality-dependent temporal encoding, multi-head attention pooling, and a learnable gating mechanism that dynamically re-weights the contribution of audio and visual streams at each timestep. This controlled asymmetry allows the model to rely on visual cues when speech is noisy, and conversely on speech when lip visibility is degraded, yielding adaptive robustness under cross-modal degradation. Experimental results demonstrate that combining symmetric embedding space design with adaptive asymmetric fusion significantly improves generalization, reducing Equal Error Rate (EER) to 3.419% on the VoxCeleb-2 test dataset without sacrificing interpretability. The findings show that symmetry ensures stable and fair decision-making, while learnable asymmetry enables modality awareness, together forming a principled foundation for next-generation audio-visual speaker verification systems. Full article
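The two ideas in this abstract, order-independent cosine scoring and a gate that re-weights audio against visual features, can be sketched in a few lines. This is an illustrative toy, not the authors' architecture: the gate here is a hand-set scalar rather than a value learned per timestep, and all names and vectors are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(audio, visual, gate_logit):
    """Blend two same-length feature vectors with a scalar gate in (0, 1).

    A large positive logit favours the audio stream, a large negative
    logit favours the visual stream (assumption: a single scalar gate).
    """
    g = sigmoid(gate_logit)
    return [g * a + (1.0 - g) * v for a, v in zip(audio, visual)]

def cosine_score(x, y):
    """Symmetric similarity: swapping the arguments gives the same score."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm

# Hypothetical embeddings for two utterances.
enrol = gated_fusion([0.9, 0.1, 0.3], [0.7, 0.2, 0.4], gate_logit=2.0)
probe = gated_fusion([0.8, 0.2, 0.2], [0.6, 0.3, 0.5], gate_logit=-2.0)
print(cosine_score(enrol, probe) == cosine_score(probe, enrol))
```

The scoring function is symmetric by construction, while the fusion step is deliberately asymmetric between modalities, which mirrors the split the abstract describes.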
19 pages, 285 KB  
Article
Mothers’ Experiences in Accessing Early Intervention Services for Children with Developmental Disabilities
by Špela Golubović, Jelena Radonjić, Mirjana Djordjević and Sonja Golubović
Psychiatry Int. 2025, 6(4), 144; https://doi.org/10.3390/psychiatryint6040144 - 19 Nov 2025
Cited by 1 | Viewed by 1016
Abstract
Children with developmental disabilities (DD) require early and coordinated services, yet parents often face obstacles in accessing adequate support. This study examined parents’ experiences with early intervention in Serbia to identify barriers, supports, and context-specific challenges. Semistructured interviews were conducted with 15 parents of children aged ≤ 6 years. Interviews (30–50 min) were audio-recorded, transcribed verbatim, and thematically analyzed in line with the Consolidated Criteria for Reporting Qualitative Studies. Seven themes emerged: recognition of concerns, first steps in seeking help, complexity of procedures, information gaps, emotional and practical challenges, collaboration with professionals, and recommendations for improvement. Parents typically noticed developmental delays, especially in language and motor skills, by age two but encountered lengthy and fragmented referral pathways, long waiting lists, and insufficient guidance. Parents emphasized the value of empathetic professionals and peer networks while also reporting stigma and social isolation. This study contributes new evidence on how structural barriers and cultural attitudes in Serbia shape families’ access to early intervention. Findings highlight the need for streamlined referral systems, transparent and accessible information for families, and interdisciplinary training for professionals. Addressing these issues could reduce delays, alleviate parental stress, and promote better developmental outcomes for children with DD. Full article
20 pages, 1978 KB  
Article
StressSpeak: A Speech-Driven Framework for Real-Time Personalized Stress Detection and Adaptive Psychological Support
by Laraib Umer, Javaid Iqbal, Yasar Ayaz, Hassan Imam, Adil Ahmad and Umer Asgher
Diagnostics 2025, 15(22), 2871; https://doi.org/10.3390/diagnostics15222871 - 12 Nov 2025
Viewed by 1432
Abstract
Background: Stress is a critical determinant of mental health, yet conventional monitoring approaches often rely on subjective self-reports or physiological signals that lack real-time responsiveness. Recent advances in large language models (LLMs) offer opportunities for speech-driven, adaptive stress detection, but existing systems are limited to retrospective text analysis, monolingual settings, or detection-only outputs. Methods: We developed a real-time, speech-driven stress detection framework that integrates audio recording, speech-to-text conversion, and linguistic analysis using transformer-based LLMs. The system provides multimodal outputs, delivering recommendations in both text and synthesized speech. Nine LLM variants were evaluated on five benchmark datasets under zero-shot and few-shot learning conditions. Performance was assessed using accuracy, precision, recall, F1-score, and misclassification trends (false-negatives and false-positives). Real-time feasibility was analyzed through latency modeling, and user-centered validation was conducted across domains. Results: Few-shot fine-tuning improved model performance across all datasets, with Large Language Model Meta AI (LLaMA) and Robustly Optimized BERT Pretraining Approach (RoBERTa) achieving the highest F1-scores and reduced false-negatives, particularly for suicide risk detection. Latency analysis revealed a trade-off between responsiveness and accuracy, with delays ranging from ~2 s for smaller models to ~7.6 s for LLaMA-7B on 30 s audio inputs. Multilingual input support and multimodal output enhanced inclusivity. User feedback confirmed strong usability, accessibility, and adoption potential in real-world settings. Conclusions: This study demonstrates that real-time, LLM-powered stress detection is both technically robust and practically feasible. By combining speech-based input, multimodal feedback, and user-centered validation, the framework advances beyond traditional detection-only models toward scalable, inclusive, and deployment-ready digital mental health solutions. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
23 pages, 2166 KB  
Article
Performance Analysis of Switch Buffer Management Policy for Mixed-Critical Traffic in Time-Sensitive Networks
by Ling Zheng, Yingge Feng, Weiqiang Wang and Qianxi Men
Mathematics 2025, 13(21), 3443; https://doi.org/10.3390/math13213443 - 29 Oct 2025
Cited by 1 | Viewed by 990
Abstract
Time-sensitive networking (TSN), a cutting-edge technology enabling efficient real-time communication and control, provides strong support for traditional Ethernet in terms of real-time performance, reliability, and deterministic transmission. In TSN systems, although time-triggered (TT) flows enjoy deterministic delay guarantees, audio video bridging (AVB) and best effort (BE) traffic still share link bandwidth through statistical multiplexing, a process that remains nondeterministic. This competition in shared memory switches adversely affects data transmission performance. In this paper, a priority queue threshold control policy is proposed and analyzed for mixed-critical traffic in time-sensitive networks. The core of this policy is to set independent queues for different types of traffic in the shared memory queuing system. To prevent low-priority traffic from monopolizing the shared buffer, its entry into the queue is blocked when buffer usage exceeds a preset threshold. A two-dimensional Markov chain is introduced to accurately construct the system’s queuing model. Through detailed analysis of the queuing model, the truncated chain method is used to decompose the two-dimensional state space into solvable one-dimensional sub-problems, and the approximate solution of the system’s steady-state distribution is derived. Based on this, the blocking probability, average queue length, and average queuing delay of different priority queues are accurately calculated. Finally, according to the optimization goal of the overall blocking probability of the system, the optimal threshold value is determined to achieve better system performance. Numerical results show that this strategy can effectively allocate the shared buffer space in multi-priority traffic scenarios. Compared with the conventional schemes, the queue blocking probability is reduced by approximately 40% to 60%. Full article
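The admission rule at the heart of the threshold policy described above can be written down directly. This is a minimal sketch under stated assumptions: the two priority labels, the function name, and the choice of a non-strict comparison at the threshold are illustrative and are not taken from the paper, whose abstract only says that low-priority traffic is blocked once buffer usage exceeds a preset threshold.

```python
def admit(priority, buffer_used, capacity, threshold):
    """Decide whether an arriving frame may enter the shared buffer.

    priority    -- "high" or "low" (hypothetical labels)
    buffer_used -- frames currently held in the shared buffer
    capacity    -- total size of the shared buffer
    threshold   -- occupancy at which low-priority frames start being blocked
    """
    if buffer_used >= capacity:
        return False  # buffer full: every arrival is blocked
    if priority == "low" and buffer_used >= threshold:
        return False  # keep the remaining space for high-priority traffic
    return True

# With capacity 10 and threshold 6, the last 4 slots are reserved.
for used in (5, 6, 9, 10):
    print(used, admit("high", used, 10, 6), admit("low", used, 10, 6))
```

Choosing the threshold is the optimisation problem the paper studies: a low value protects high-priority traffic but raises low-priority blocking, and a high value does the opposite.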
16 pages, 647 KB  
Article
Implementation of a Generative AI-Powered Digital Interactive Platform for Clinical Language Therapy in Children with Language Delay: A Pilot Study
by Chia-Hui Chueh, Tzu-Hui Chiang, Po-Wei Pan, Ko-Long Lin, Yen-Sen Lu, Sheng-Hui Tuan, Chao-Ruei Lin, I-Ching Huang and Hsu-Sheng Cheng
Life 2025, 15(10), 1628; https://doi.org/10.3390/life15101628 - 18 Oct 2025
Viewed by 1974
Abstract
Early intervention is pivotal for optimizing neurodevelopmental outcomes in children with language delay, where increased language stimulation can optimize therapeutic outcomes. Extending speech–language therapy from clinical settings to the home is a promising strategy; however, practical barriers and a lack of scalable, customizable home-based models limit the implementation of this approach. The integration of AI-powered digital interactive tools could bridge this gap. This pilot feasibility study adopted a single-arm pre–post (before–after) design within a two-phase, mixed-methods framework to evaluate a generative AI-powered interactive platform supporting home-based language therapy in children with either idiopathic language delay or autism spectrum disorder (ASD)-related language impairment: two conditions known to involve heterogeneous developmental profiles. The participants received clinical language assessments and engaged in home-based training using AI-enhanced tablet software, and 2000 audio recordings were collected and analyzed to assess pre- and postintervention language abilities. A total of 22 children aged 2–12 years were recruited, with 19 completing both phases. Based on 6-week cumulative usage, participants were stratified with respect to hours of AI usage into Groups A (≤5 h, n = 5), B (5 < h ≤ 10, n = 5), C (10 < h ≤ 15, n = 4), and D (>15 h, n = 5). A threshold effect was observed: only Group D showed significant gains between baseline and postintervention, with total words (58→110, p = 0.043), characters (98→192, p = 0.043), type–token ratio (0.59→0.78, p = 0.043), nouns (34→56, p = 0.043), verbs (12→34, p = 0.043), and mean length of utterance (1.83→3.24, p = 0.043) all improving. No significant changes were found in Groups A to C. These findings indicate the positive impact of extended use on the development of language. 
Generative AI-powered digital interactive tools, when they are integrated into home-based language therapy programs, can significantly improve language outcomes in children who have language delay and ASD. This approach offers a scalable, cost-effective extension of clinical care to the home, demonstrating the potential to enhance therapy accessibility and long-term outcomes. Full article
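Two of the outcome measures reported above, type–token ratio and mean length of utterance, are simple ratios over transcribed speech. The sketch below shows how they are commonly computed. It assumes word-level tokens; the unit the study actually used for utterance length (words, characters, or morphemes) is not stated in the abstract, and the sample transcript is invented.

```python
def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens; higher means more varied vocabulary."""
    return len(set(tokens)) / len(tokens)

def mean_length_of_utterance(utterances):
    """Average number of tokens per utterance (word-based, by assumption)."""
    return sum(len(u) for u in utterances) / len(utterances)

# Hypothetical transcript: each inner list is one utterance.
transcript = [["dog", "runs"], ["the", "dog", "runs", "fast"]]
all_tokens = [tok for utt in transcript for tok in utt]
print(type_token_ratio(all_tokens))
print(mean_length_of_utterance(transcript))
```

Type–token ratio is sensitive to sample length, so pre- and post-intervention values are most comparable when the recordings are of similar size.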
(This article belongs to the Section Medical Research)
15 pages, 10411 KB  
Article
Application of Foundation Models for Colorectal Cancer Tissue Classification in Mass Spectrometry Imaging
by Alon Gabriel, Amoon Jamzad, Mohammad Farahmand, Martin Kaufmann, Natasha Iaboni, David Hurlbut, Kevin Yi Mi Ren, Christopher J. B. Nicol, John F. Rudan, Sonal Varma, Gabor Fichtinger and Parvin Mousavi
Technologies 2025, 13(10), 434; https://doi.org/10.3390/technologies13100434 - 27 Sep 2025
Viewed by 1249
Abstract
Colorectal cancer (CRC) remains a leading global health challenge, with early and accurate diagnosis crucial for effective treatment. Histopathological evaluation, the current diagnostic gold standard, faces limitations including subjectivity, delayed results, and reliance on well-prepared tissue slides. Mass spectrometry imaging (MSI) offers a complementary approach by providing molecular-level information, but its high dimensionality and the scarcity of labeled data present unique challenges for traditional supervised learning. In this study, we present the first implementation of foundation models for MSI-based cancer classification using desorption electrospray ionization (DESI) data. We evaluate multiple architectures adapted from other domains, including a spectral classification model known as FACT, which leverages audio–language pretraining. Compared to conventional machine learning approaches, these foundation models achieved superior performance, with FACT achieving the highest cross-validated balanced accuracy (93.27%±3.25%) and AUROC (98.4%±0.7%). Ablation studies demonstrate that these models retain strong performance even under reduced data conditions, highlighting their potential for generalizable and scalable MSI-based cancer diagnostics. Future work will explore the integration of spatial and multi-modal data to enhance clinical utility. Full article
(This article belongs to the Special Issue Application of Artificial Intelligence in Medical Image Analysis)
18 pages, 13021 KB  
Article
EMPhone: Electromagnetic Covert Channel via Silent Audio Playback on Smartphones
by Yongjae Kim, Hyeonjun An and Dong-Guk Han
Sensors 2025, 25(18), 5900; https://doi.org/10.3390/s25185900 - 21 Sep 2025
Viewed by 1087
Abstract
Covert channels enable hidden communication that poses significant security risks, particularly when smartphones are used as transmitters. This paper presents the first end-to-end implementation and evaluation of an electromagnetic (EM) covert channel on modern Samsung Galaxy S21, S22, and S23 smartphones (Samsung Electronics Co., Ltd., Suwon, Republic of Korea). We first demonstrate that a previously proposed method relying on zero-volume playback is no longer effective on these devices. Through a detailed analysis of EM emissions in the 0.1–2.5 MHz range, we discovered that consistent, volume-independent signals can be generated by exploiting the hardware’s recovery delay after silent audio playback. Based on these findings, we developed a complete system comprising a stealthy Android application for transmission, a time-based modulation scheme, and a demodulation technique designed around the characteristics of the generated signals to ensure reliable reception. The channel’s reliability and robustness were validated through evaluations of modulation time, probe distance, and message length. Experimental results show that the maximum error-free bit rate (bits per second, bps) reached 0.558 bps on Galaxy S21 and 0.772 bps on Galaxy S22 and Galaxy S23. Reliable communication was feasible up to 0.5 cm with a near-field probe, and a low alignment-aware bit error rate (BER) was maintained even for 100-byte messages. This work establishes a practical threat, and we conclude by proposing countermeasures to mitigate this vulnerability. Full article
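To put the reported bit rates in perspective, the short calculation below estimates how long a 100-byte message would take at the maximum error-free rates quoted above. It assumes a constant bit rate with no preamble, framing, or retransmission overhead, none of which the abstract specifies, so the results are rough lower bounds rather than measured figures.

```python
def transmission_seconds(message_bytes, bit_rate_bps):
    """Time to send a message at a constant bit rate, ignoring framing overhead."""
    return message_bytes * 8 / bit_rate_bps

# Bit rates are the maximum error-free values quoted in the abstract.
for device, rate in [("Galaxy S21", 0.558), ("Galaxy S22/S23", 0.772)]:
    seconds = transmission_seconds(100, rate)
    print(f"{device}: {seconds / 60:.1f} min for a 100-byte message")
```

Under these assumptions a 100-byte message takes roughly 24 minutes on the Galaxy S21 and roughly 17 minutes on the Galaxy S22 and S23, which suggests the channel is suited to small secrets such as keys rather than bulk data.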
(This article belongs to the Section Electronic Sensors)
28 pages, 7369 KB  
Article
Comparison of Impulse Response Generation Methods for a Simple Shoebox-Shaped Room
by Lloyd May, Nima Farzaneh, Orchisama Das and Jonathan S. Abel
Acoustics 2025, 7(3), 56; https://doi.org/10.3390/acoustics7030056 - 6 Sep 2025
Cited by 1 | Viewed by 3494
Abstract
Simulated room impulse responses (RIRs) are important tools for studying architectural acoustics. Many methods exist to generate RIRs, each with unique properties that need to be considered when choosing an RIR synthesis technique. Despite the variation in synthesis techniques, there is a dearth of comparisons between these techniques. To address this, a comprehensive comparison of four major categories of RIR synthesis techniques was conducted: wave-based methods (hybrid FEM and modal analysis), geometrical acoustics methods (the image source method and ray tracing), delay-network reverberators (SDNs), and statistical methods (Sabine-NED). To compare these techniques, RIRs were recorded in a simple shoebox-shaped racquetball court, and we compared the synthesized RIRs against these recordings. We conducted both objective analyses, such as energy decay curves, normalized echo density, and frequency-dependent decay times, and a perceptual assessment of synthesized RIRs, which consisted of a listening assessment with 29 participants that utilized a MUSHRA comparison methodology. Our results reveal distinct advantages and limitations across synthesis categories. For example, the Sabine-NED technique was indistinguishable from the recorded IR, but it does not scale well with increasing geometric complexity. These findings provide valuable insights for selecting appropriate synthesis techniques for applications in architectural acoustics, immersive audio rendering, and virtual reality environments. Full article
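One of the objective measures named above, the energy decay curve, is conventionally obtained by Schroeder backward integration of the squared impulse response. The sketch below shows that computation on a synthetic exponentially decaying response. The function name and the toy signal are assumptions for illustration and do not reproduce the paper's analysis pipeline.

```python
import math

def energy_decay_curve_db(rir):
    """Schroeder backward integration of a room impulse response, in dB.

    Entry k is the energy remaining from sample k to the end,
    normalised so that the first entry is 0 dB.
    """
    energy = [x * x for x in rir]
    tails, running = [], 0.0
    for e in reversed(energy):
        running += e
        tails.append(running)
    tails.reverse()
    total = tails[0]
    return [10.0 * math.log10(t / total) if t > 0 else float("-inf")
            for t in tails]

# Synthetic response: amplitude decays by a constant factor per sample.
toy_rir = [0.9 ** n for n in range(50)]
edc = energy_decay_curve_db(toy_rir)
print(round(edc[0], 2), round(edc[10], 2))
```

Decay times such as T30 are then read off by fitting a line to a portion of this curve and extrapolating to a 60 dB drop.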
25 pages, 4385 KB  
Article
Robust DeepFake Audio Detection via an Improved NeXt-TDNN with Multi-Fused Self-Supervised Learning Features
by Gul Tahaoglu
Appl. Sci. 2025, 15(17), 9685; https://doi.org/10.3390/app15179685 - 3 Sep 2025
Cited by 3 | Viewed by 4572
Abstract
Deepfake audio refers to speech that has been synthetically generated or altered through advanced neural network techniques, often with a degree of realism sufficient to convincingly imitate genuine human voices. As these manipulations become increasingly indistinguishable from authentic recordings, they present significant threats to security, undermine media integrity, and challenge the reliability of digital authentication systems. In this study, a robust detection framework is proposed, which leverages the power of self-supervised learning (SSL) and attention-based modeling to identify deepfake audio samples. Specifically, audio features are extracted from input speech using two powerful pretrained SSL models: HuBERT-Large and WavLM-Large. These distinctive features are then integrated through an Attentional Multi-Feature Fusion (AMFF) mechanism. The fused features are subsequently classified using a NeXt-Time Delay Neural Network (NeXt-TDNN) model enhanced with Efficient Channel Attention (ECA), enabling improved temporal and channel-wise feature discrimination. Experimental results show that the proposed method achieves a 0.42% EER and 0.01 min-tDCF on ASVspoof 2019 LA, a 1.01% EER on ASVspoof 2019 PA, and a pooled 6.56% EER on the cross-channel ASVspoof 2021 LA evaluation, thus highlighting its effectiveness for real-world deepfake detection scenarios. Furthermore, on the ASVspoof 5 dataset, the method achieved a 7.23% EER, outperforming strong baselines and demonstrating strong generalization ability. Moreover, the macro-averaged F1-score of 96.01% and balanced accuracy of 99.06% were obtained on the ASVspoof 2019 LA dataset, while the proposed method achieved a macro-averaged F1-score of 98.70% and balanced accuracy of 98.90% on the ASVspoof 2019 PA dataset. 
On the highly challenging ASVspoof 5 dataset, which includes crowdsourced, non-studio-quality audio, and novel adversarial attacks, the proposed method achieves macro-averaged metrics exceeding 92%, with a precision of 92.07%, a recall of 92.63%, an F1-measure of 92.35%, and a balanced accuracy of 92.63%. Full article
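The headline metric in this abstract, Equal Error Rate, is the operating point where the rate of accepted spoofs equals the rate of rejected bona fide samples. The sketch below approximates it by sweeping a threshold over the observed scores. It is a simplified illustration rather than the official ASVspoof scoring tool, and it assumes that a higher score means "more likely bona fide"; the example scores are invented.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate EER by trying every observed score as a threshold."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False rejection: bona fide samples scoring below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed samples scoring at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Hypothetical detector scores.
bonafide = [0.9, 0.6, 0.4, 0.8]
spoof = [0.1, 0.5, 0.7, 0.2]
print(equal_error_rate(bonafide, spoof))
```

With few samples the two error curves may not cross exactly, so the function returns the midpoint at the closest threshold; evaluation toolkits typically interpolate instead.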

20 pages, 642 KB  
Article
Impact of Audio Delay and Quality in Network Music Performance
by Konstantinos Tsioutas, George Xylomenos and Ioannis Doumanis
Future Internet 2025, 17(8), 337; https://doi.org/10.3390/fi17080337 - 28 Jul 2025
Cited by 1 | Viewed by 3715
Abstract
Network Music Performance (NMP) refers to network-based remote collaboration when applied to music performances, such as musical education, music production and live music concerts. In NMP, the most important parameter for the Quality of Experience (QoE) of the participants is low end-to-end audio delay. Increasing delays prevent musicians’ synchronization and lead to a suboptimal musical experience. Visual contact between the participants is also crucial for their experience but highly demanding in terms of bandwidth. Since audio compression induces additional coding and decoding delays on the signal path, most NMP systems rely on audio quality reduction when bandwidth is limited to avoid violating the stringent delay limitations of NMP. To assess the delay and quality tolerance limits for NMP and see if they can be satisfied by emerging 5G networks, we asked eleven pairs of musicians to perform musical pieces of their choice in a carefully controlled laboratory environment, which allowed us to set different end-to-end delays or audio sampling rates. To assess the QoE of these NMP sessions, each musician responded to a set of questions after each performance. The analysis of the musicians’ responses revealed that actual musicians in delay-controlled NMP scenarios can synchronize at delays of up to 40 ms, compared to the 25–30 ms reported in rhythmic hand-clapping experiments. Our analysis also shows that audio quality can be considerably reduced by sub-sampling, so as to save bandwidth without significant QoE loss. Finally, we find that musicians rely more on audio and less on video to synchronize during an NMP session. These results indicate that NMP can become feasible in advanced 5G networks.
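The bandwidth saving from sub-sampling uncompressed audio can be illustrated with simple arithmetic: the bitrate of a PCM stream is the product of sampling rate, bit depth, and channel count. The sketch below is illustrative only. The 16-bit stereo format, the 48 kHz and 24 kHz rates, and the 128-sample buffer are assumed values, not parameters reported in the abstract.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth=16, channels=2):
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


def buffer_duration_ms(frame_samples, sample_rate_hz):
    """Time spanned by one audio buffer, i.e. its contribution to delay."""
    return frame_samples / sample_rate_hz * 1000


# Assumed example rates: halving the sampling rate halves the bitrate.
full_rate = pcm_bitrate_kbps(48_000)   # 1536.0 kbps
half_rate = pcm_bitrate_kbps(24_000)   # 768.0 kbps

# A fixed-size buffer spans more time at a lower sampling rate, so the
# buffer size may need to shrink to keep the same buffering delay.
delay_full = buffer_duration_ms(128, 48_000)
delay_half = buffer_duration_ms(128, 24_000)
print(full_rate, half_rate, round(delay_full, 2), round(delay_half, 2))
```

Under these assumptions, sub-sampling trades audio bandwidth for bitrate without adding codec delay, which is the trade-off the abstract describes.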

23 pages, 524 KB  
Article
Clinician Experiences with Adolescents with Comorbid Chronic Pain and Eating Disorders
by Emily A. Beckmann, Claire M. Aarnio-Peterson, Kendra J. Homan, Cathleen Odar Stough and Kristen E. Jastrowski Mano
J. Clin. Med. 2025, 14(15), 5300; https://doi.org/10.3390/jcm14155300 - 27 Jul 2025
Viewed by 977
Abstract
Background/Objectives: Chronic pain and eating disorders are two prevalent and disabling pediatric health concerns, with serious, life-threatening consequences. These conditions can co-occur, yet little is known about best practices addressing comorbid pain and eating disorders. Delayed intervention for eating disorders may have grave implications, as eating disorders have one of the highest mortality rates among psychological disorders. Moreover, chronic pain not only persists but worsens into adulthood when left untreated. This study aimed to understand pediatric clinicians’ experiences with adolescents with chronic pain and eating disorders. Methods: Semi-structured interviews were conducted with hospital-based physicians (N = 10; 70% female; M years of experience = 15.3) and psychologists (N = 10; 80% female; M years of experience = 10.2) specializing in anesthesiology/pain, adolescent medicine/eating disorders, and gastroenterology across the United States. Audio transcripts were coded, and thematic analysis was used to identify key themes. Results: Clinicians described frequently encountering adolescents with chronic pain and eating disorders. Clinicians described low confidence in diagnosing comorbid eating disorders and chronic pain, which they attributed to lack of screening tools and limited training. Clinicians collaborated with and consulted clinicians who encountered adolescents with chronic pain and/or eating disorders. Conclusions: Results reflect clinicians’ desire for additional resources, training, and collaboration to address the needs of this population. Targets for future research efforts in comorbid pain and eating disorders were highlighted. Specifically, results support the development of screening tools, program development to improve training in complex medical and psychiatric presentations, and methods to facilitate more collaboration and consultation across health care settings, disciplines, and specialties.
(This article belongs to the Section Clinical Pediatrics)

21 pages, 699 KB  
Article
Remote Intent Service: Supporting Transparent Task-Oriented Collaboration for Mobile Devices
by Seyul Lee, Sooyong Kang and Hyuck Han
Electronics 2025, 14(14), 2849; https://doi.org/10.3390/electronics14142849 - 16 Jul 2025
Viewed by 792
Abstract
Platform support for mobile collaboration among multiple smart devices has been an active research issue in the computing community. Using platform-level collaboration functionalities, a mobile device can share its resources, I/O events, and even apps easily with other devices, which enables developing a new kind of application that runs across multiple devices. In this work, we further extend the collaboration functionalities in mobile platforms by developing a novel platform service, remote intent service (RIS), which enables a running application on a device to outsource the execution of a specific task to another application on a remote device. Using the remote intent service, for example, we can view a document attached to an email using a document viewer application on a remote device that has a larger screen, or conveniently browse an audio file that exists on another mobile device and play it locally. We implemented the remote intent service on the Android platform and measured the latency for executing such tasks on a remote device. The experimental results confirm that the remote intent service, for sending the intent plus retrieving the result, incurs an additional delay of less than 250 ms in total, and thus it is practical.