Search Results (288)

Search Parameters:
Keywords = 3D audio

23 pages, 2198 KB  
Article
Designing Augmented Virtuality: Impact of Audio and Video Features on User Experience in a Virtual Opera Performance
by Selina Palige, Franziska Legler, Frank Dittrich and Angelika C. Bullinger
Electronics 2026, 15(3), 577; https://doi.org/10.3390/electronics15030577 - 28 Jan 2026
Viewed by 126
Abstract
Emerging technologies offer cultural institutions opportunities to expand their audiences and make their content more accessible. Augmented Virtuality (AV), which integrates real-world content into virtual environments, shows particular potential. It enables the transmission of live or pre-recorded stage performances, such as concerts and theater productions, from the stage to virtual audiences via increasingly affordable head-mounted displays (HMDs). However, as AV remains under-researched, little is known about which recording features enable an immersive user experience while maintaining cost-efficiency—an essential requirement for resource-constrained cultural institutions. In this context, we investigate the influence of features of audio and video recordings on key dimensions of user experience in the use case of a real opera performance integrated into a virtual opera house. We conducted a 2 × 2 within-subjects study with 30 participants to measure the effects of 2D versus 3D videos and stereophonic versus spatial audio rendering on several key dimensions of user experience. The results show that spatial audio has a positive impact on Place Illusion, whereas video dimensionality had no significant effect. Recommendations for the design of AV applications are derived from study results, aiming at balancing immersive user experience and cost-efficiency for virtual cultural participation. Full article

20 pages, 5360 KB  
Article
Experimental Investigation of Deviations in Sound Reproduction
by Paul Oomen, Bashar Farran, Luka Nadiradze, Máté Csanád and Amira Val Baker
Acoustics 2026, 8(1), 7; https://doi.org/10.3390/acoustics8010007 - 28 Jan 2026
Viewed by 537
Abstract
Sound reproduction is the electro-mechanical re-creation of sound waves using analogue and digital audio equipment. Although sound reproduction implies that repeated acoustical events are close to identical, numerous fixed and variable conditions affect the acoustic result. To arrive at a better understanding of the magnitude of deviations in sound reproduction, amplitude deviation and phase distortion of a sound signal were measured at various reproduction stages and compared under a set of controlled acoustical conditions, one condition being the presence of a human subject in the acoustic test environment. Deviations in electroacoustic reproduction were smaller than ±0.2 dB amplitude and ±3 degrees phase shift when comparing trials recorded on the same day (Δt < 8 h, mean uncertainty u = 1.58%). Deviations increased significantly, to more than two times the amplitude deviation and three times the phase shift, when comparing trials recorded on different days (Δt > 16 h, u = 4.63%). Deviations increased further, to more than 15 times the amplitude deviation and phase shift, when a human subject was present in the acoustic environment (u = 24.64%). For the first time, this study shows that the human body does not merely absorb but can also amplify sound energy. The degree of attenuation or amplification per frequency shows complex variance depending on the type of reproduction and the subject, indicating a nonlinear dynamic interaction. The findings of this study may serve as a reference to update acoustical standards and to improve the accuracy and reliability of sound reproduction and its application in measurements, diagnostics and therapeutic methods. Full article
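
The deviation quantities above (amplitude in dB, phase shift in degrees) can be illustrated with a short spectral comparison. The Python sketch below is not the authors' measurement chain — calibration, the controlled acoustic conditions, and the multi-stage reproduction setup are omitted, and the 1 kHz test tone and 48 kHz rate are assumed example values — it only shows how two time-aligned trials of the same reproduced signal might be compared per frequency:

```python
# Minimal sketch (not the authors' pipeline): quantify amplitude and phase
# deviation between two recordings of the same reproduced test signal.
# Assumes both trials are time-aligned, equal-length mono arrays at rate fs.
import numpy as np

def deviation_spectra(trial_a, trial_b, fs):
    """Return per-frequency amplitude deviation (dB) and phase deviation (deg)."""
    n = min(len(trial_a), len(trial_b))
    window = np.hanning(n)
    spec_a = np.fft.rfft(trial_a[:n] * window)
    spec_b = np.fft.rfft(trial_b[:n] * window)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    eps = 1e-12
    amp_dev_db = 20.0 * np.log10((np.abs(spec_b) + eps) / (np.abs(spec_a) + eps))
    # Phase of the cross-spectrum gives the phase shift of trial_b relative to trial_a.
    phase_dev_deg = np.degrees(np.angle(spec_b * np.conj(spec_a)))
    return freqs, amp_dev_db, phase_dev_deg

# Example with synthetic trials differing by a small gain and phase offset.
fs = 48000
t = np.arange(fs) / fs
trial_a = np.sin(2 * np.pi * 1000 * t)
trial_b = 1.02 * np.sin(2 * np.pi * 1000 * t + 0.05)   # ~+0.17 dB, ~2.9 deg at 1 kHz
freqs, amp_dev, phase_dev = deviation_spectra(trial_a, trial_b, fs)
idx = np.argmin(np.abs(freqs - 1000))
print(f"at 1 kHz: {amp_dev[idx]:+.2f} dB, {phase_dev[idx]:+.1f} deg")
```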

17 pages, 4145 KB  
Article
Acoustic Metadata Design on Object-Based Audio Using Estimated 3D-Position from Visual Image Toward Depth-Directional Sound Image Localization
by Subaru Kato, Masato Nakayama, Takanobu Nishiura and Yoshiharu Soeta
Acoustics 2026, 8(1), 3; https://doi.org/10.3390/acoustics8010003 - 23 Jan 2026
Viewed by 111
Abstract
Multichannel audio is a sound field reproduction technology that uses multiple loudspeakers. Object-based audio is a playback method for multichannel audio that enables the construction of sound images at specified positions using coordinates within the playback space. However, the sound image positions must be manually specified by audio content creators, which increases the production workload, especially for works containing many sound images or feature films. We have previously proposed a method to reduce the workload of content creators by constructing sound images based on object positions in visual images. However, a significant challenge remains since depth localization of the sound image is not accurate enough. This paper aims to improve localization accuracy by changing the range of sound image movement along the depth direction. To confirm the localization accuracy of sound images constructed using the proposed method, we conducted a subjective evaluation experiment. The experiment identified the optimal movement range by presenting participants with visual images synchronized with sound images moving across varying spatial scales. Consequently, we were able to identify the range of sound image movement in the depth direction necessary for presenting sound images with high consistency with the visual images. Full article

18 pages, 10692 KB  
Article
Short-Time Homomorphic Deconvolution (STHD): A Novel 2D Feature for Robust Indoor Direction of Arrival Estimation
by Yeonseok Park and Jun-Hwa Kim
Sensors 2026, 26(2), 722; https://doi.org/10.3390/s26020722 - 21 Jan 2026
Viewed by 197
Abstract
Accurate indoor positioning and navigation remain significant challenges, with audio sensor-based sound source localization emerging as a promising sensing modality. Conventional methods, often reliant on multi-channel processing or time-delay estimation techniques such as Generalized Cross-Correlation, encounter difficulties regarding computational complexity, hardware synchronization, and reverberant environments where time difference in arrival cues are masked. While machine learning approaches have shown potential, their performance depends heavily on the discriminative power of input features. This paper proposes a novel feature extraction method named Short-Time Homomorphic Deconvolution, which transforms multi-channel audio signals into a 2D Time × Time-of-Flight representation. Unlike prior 1D methods, this feature effectively captures the temporal evolution and stability of time-of-flight differences between microphone pairs, offering a rich and robust input for deep learning models. We validate this feature using a lightweight Convolutional Neural Network integrated with a dual-stage channel attention mechanism, designed to prioritize reliable spatial cues. The system was trained on a large-scale dataset generated via simulations and rigorously tested using real-world data acquired in an ISO-certified anechoic chamber. Experimental results demonstrate that the proposed model achieves precise Direction of Arrival estimation with a Mean Absolute Error of 1.99 degrees in real-world scenarios. Notably, the system exhibits remarkable consistency between simulation and physical experiments, proving its effectiveness for robust indoor navigation and positioning systems. Full article
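
As an illustration of the general shape of such a 2D feature, the sketch below stacks per-frame inter-channel correlation curves for one microphone pair into a Time × lag map. Note that it uses GCC-PHAT as a stand-in for the paper's Short-Time Homomorphic Deconvolution step (which is not reproduced here), and the frame length, hop, and lag range are assumed example values:

```python
# Illustrative sketch only: build a 2D "time x inter-channel delay" map for one
# microphone pair by computing GCC-PHAT on short frames and stacking the
# correlation curves. GCC-PHAT substitutes for the paper's homomorphic
# deconvolution step; frame sizes and the +/- max_lag range are assumptions.
import numpy as np

def gcc_phat(frame_a, frame_b, max_lag):
    """GCC-PHAT curve; its peak lag estimates the delay of frame_b relative to frame_a."""
    n = 2 * len(frame_a)
    spec = np.fft.rfft(frame_b, n) * np.conj(np.fft.rfft(frame_a, n))
    spec /= np.abs(spec) + 1e-12                    # phase transform (PHAT) weighting
    cc = np.fft.irfft(spec, n)
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))  # lags -max_lag..+max_lag

def delay_map(sig_a, sig_b, frame_len=1024, hop=512, max_lag=32):
    """Stack per-frame GCC-PHAT curves into a (num_frames, 2*max_lag+1) feature."""
    starts = range(0, min(len(sig_a), len(sig_b)) - frame_len, hop)
    return np.stack([gcc_phat(sig_a[i:i + frame_len],
                              sig_b[i:i + frame_len], max_lag) for i in starts])

# Toy example: channel b is channel a delayed by 5 samples.
rng = np.random.default_rng(0)
a = rng.standard_normal(16000)
b = np.concatenate((np.zeros(5), a[:-5]))
feat = delay_map(a, b)
print(feat.shape, "peak lag:", feat.mean(axis=0).argmax() - 32)   # -> peak lag 5
```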

18 pages, 935 KB  
Article
A Lightweight Audio Spectrogram Transformer for Robust Pump Anomaly Detection
by Hangyu Zhang and Yi-Horng Lai
Machines 2026, 14(1), 114; https://doi.org/10.3390/machines14010114 - 19 Jan 2026
Viewed by 160
Abstract
Industrial pumps are critical components in manufacturing and process plants, where early acoustic anomaly detection is essential for preventing unplanned downtime and reducing maintenance costs. In practice, however, strong background noise, severe class imbalance between rare faults and abundant normal data, and the limited computing resources of edge devices make reliable deployment challenging. In this work, a lightweight Audio Spectrogram Transformer (Tiny-AST) is proposed for robust pump anomaly detection under imbalanced supervision. Building on the Audio Spectrogram Transformer, the internal Transformer encoder is redesigned by jointly reducing the embedding dimension, depth, and number of attention heads, and combined with a class frequency-based balanced sampling strategy and time–frequency masking augmentation. Experiments on the pump subset of the MIMII dataset across three SNR levels (−6 dB, 0 dB, 6 dB) demonstrate that Tiny-AST achieves an effective trade-off between computational efficiency and noise robustness. With 1.01 M parameters and 1.68 GFLOPs, it maintains superior performance under heavy noise (−6 dB) compared to ultra-lightweight CNNs (MobileNetV3) and offers significantly lower computational cost than standard compact baselines (ResNet18, EfficientNet-B0). Furthermore, comparisons highlight the performance gains of this lightweight supervised approach over traditional unsupervised benchmarks (e.g., autoencoders, GANs) by effectively leveraging scarce fault samples. These results indicate that a carefully designed lightweight Transformer, together with appropriate sampling and augmentation, can provide competitive acoustic anomaly detection performance while remaining suitable for deployment on resource-constrained industrial edge devices. Full article
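
The following PyTorch sketch shows the flavor of a downsized spectrogram-transformer classifier — patch-embedding a log-mel spectrogram and shrinking the encoder's embedding dimension, depth, and head count. The specific sizes, patch shape, and two-class head are assumed example values, not the Tiny-AST configuration reported in the paper:

```python
# Illustrative PyTorch sketch of a downsized audio spectrogram transformer.
# Embedding size, depth, head count, patch size, and input shape below are
# assumed example values, not the paper's Tiny-AST configuration.
import torch
import torch.nn as nn

class TinySpectrogramTransformer(nn.Module):
    def __init__(self, n_mels=64, n_frames=256, patch=16,
                 embed_dim=96, depth=4, heads=3, num_classes=2):
        super().__init__()
        # Patchify the (1, n_mels, n_frames) log-mel spectrogram with a strided conv.
        self.patch_embed = nn.Conv2d(1, embed_dim, kernel_size=patch, stride=patch)
        num_patches = (n_mels // patch) * (n_frames // patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=heads,
                                           dim_feedforward=4 * embed_dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, spec):                        # spec: (batch, 1, n_mels, n_frames)
        x = self.patch_embed(spec).flatten(2).transpose(1, 2)   # (batch, patches, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])                   # classify from the [CLS] token

model = TinySpectrogramTransformer()
logits = model(torch.randn(8, 1, 64, 256))          # e.g. normal vs. anomalous pump clips
print(logits.shape)                                 # torch.Size([8, 2])
```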

24 pages, 5019 KB  
Article
A Dual Stream Deep Learning Framework for Alzheimer’s Disease Detection Using MRI Sonification
by Nadia A. Mohsin and Mohammed H. Abdul Ameer
J. Imaging 2026, 12(1), 46; https://doi.org/10.3390/jimaging12010046 - 15 Jan 2026
Viewed by 229
Abstract
Alzheimer’s Disease (AD) is an advanced brain illness that affects millions of individuals across the world. It causes gradual damage to brain cells, leading to memory loss and cognitive dysfunction. Although Magnetic Resonance Imaging (MRI) is widely used in AD diagnosis, existing studies rely solely on visual representations, leaving alternative features unexplored. The objective of this study is to explore whether MRI sonification can provide complementary diagnostic information when combined with conventional image-based methods. In this study, we propose a novel dual-stream multimodal framework that integrates 2D MRI slices with their corresponding audio representations. MRI images are transformed into audio signals using multi-scale, multi-orientation Gabor filtering, followed by a Hilbert space-filling curve scan to preserve spatial locality. The image and sound modalities are processed using a lightweight CNN and YAMNet, respectively, and then fused via logistic regression. In the experiments, the multimodal framework achieved its highest accuracy of 98.2% in distinguishing AD from Cognitively Normal (CN) subjects, with 94% for AD vs. Mild Cognitive Impairment (MCI) and 93.2% for MCI vs. CN. This work provides a new perspective and highlights the potential of audio transformation of imaging data for feature extraction and classification. Full article
(This article belongs to the Section AI in Imaging)
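
The locality-preserving scan mentioned above can be sketched in isolation. The code below covers only the Hilbert-curve flattening of a square slice into a 1D sequence; the Gabor filter bank, conversion to an audible waveform, and the CNN/YAMNet streams are omitted, and the 256 × 256 slice size is an assumed example:

```python
# Sketch of the Hilbert-curve scan step only: map a square 2D slice to a 1D
# sequence while preserving spatial locality. The rest of the paper's pipeline
# (Gabor filtering, audio rendering, CNN/YAMNet fusion) is not reproduced.
import numpy as np

def hilbert_d2xy(order, d):
    """Convert distance d along a Hilbert curve to (x, y) on a 2^order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                       # rotate the quadrant when needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_scan(image):
    """Flatten a square image (side = power of two) along a Hilbert curve."""
    side = image.shape[0]
    order = side.bit_length() - 1
    coords = [hilbert_d2xy(order, d) for d in range(side * side)]
    return np.array([image[y, x] for x, y in coords], dtype=np.float32)

slice_2d = np.random.rand(256, 256)       # stand-in for a (filtered) MRI slice
signal_1d = hilbert_scan(slice_2d)
print(signal_1d.shape)                    # (65536,) -> later normalized / rendered as audio
```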

26 pages, 29009 KB  
Article
Quantifying the Relationship Between Speech Quality Metrics and Biometric Speaker Recognition Performance Under Acoustic Degradation
by Ajan Ahmed and Masudul H. Imtiaz
Signals 2026, 7(1), 7; https://doi.org/10.3390/signals7010007 - 12 Jan 2026
Viewed by 388
Abstract
Self-supervised learning (SSL) models have achieved remarkable success in speaker verification tasks, yet their robustness to real-world audio degradation remains insufficiently characterized. This study presents a comprehensive analysis of how audio quality degradation affects three prominent SSL-based speaker verification systems (WavLM, Wav2Vec2, and HuBERT) across three diverse datasets: TIMIT, CHiME-6, and Common Voice. We systematically applied 21 degradation conditions spanning noise contamination (SNR levels from 0 to 20 dB), reverberation (RT60 from 0.3 to 1.0 s), and codec compression (various bit rates), then measured both objective audio quality metrics (PESQ, STOI, SNR, SegSNR, fwSNRseg, jitter, shimmer, HNR) and speaker verification performance metrics (EER, AUC-ROC, d-prime, minDCF). At the condition level, multiple regression with all eight quality metrics explained up to 80% of the variance in minDCF for HuBERT and 78% for WavLM, but only 35% for Wav2Vec2; EER predictability was lower (69%, 67%, and 28%, respectively). PESQ was the strongest single predictor for WavLM and HuBERT, while Shimmer showed the highest single-metric correlation for Wav2Vec2; fwSNRseg yielded the top single-metric R2 for WavLM, and PESQ for HuBERT and Wav2Vec2 (with much smaller gains for Wav2Vec2). WavLM and HuBERT exhibited more predictable quality-performance relationships compared to Wav2Vec2. These findings establish quantitative relationships between measurable audio quality and speaker verification accuracy at the condition level, though substantial within-condition variability limits utterance-level prediction accuracy. Full article
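
A condition-level analysis of this kind boils down to computing an error metric per degradation condition and regressing it on quality scores. The sketch below does this with synthetic verification scores and a placeholder quality metric; it is only meant to show the mechanics (EER from genuine/impostor scores, then a simple regression), not the paper's datasets or models:

```python
# Sketch of a condition-level analysis: compute EER per degradation condition
# from genuine/impostor scores, then regress it on an objective quality metric.
# All numbers below are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    """EER: operating point where false accepts equal false rejects."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fpr - fnr))
    return (fpr[idx] + fnr[idx]) / 2.0

rng = np.random.default_rng(0)
rows = []
for condition in range(21):                        # e.g. 21 degradation conditions
    sep = 2.0 - 0.05 * condition                   # harsher condition -> less separation
    genuine = rng.normal(sep, 1.0, 500)
    impostor = rng.normal(0.0, 1.0, 500)
    labels = np.concatenate((np.ones(500), np.zeros(500)))
    scores = np.concatenate((genuine, impostor))
    pesq_like = 4.5 - 0.1 * condition + rng.normal(0, 0.05)   # stand-in quality metric
    rows.append((pesq_like, equal_error_rate(labels, scores)))

X = np.array([[r[0]] for r in rows])
y = np.array([r[1] for r in rows])
reg = LinearRegression().fit(X, y)
print(f"R^2 of quality-metric regression on EER: {reg.score(X, y):.2f}")
```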

29 pages, 721 KB  
Systematic Review
Sex and Gender Aspects in Vestibular Disorders: Current Knowledge and Emerging Perspectives—A Systematic Review
by Leonardo Franz, Andrea Frosolini, Daniela Parrino, Giulio Badin, Chiara Pavone, Roberta Cenedese, Agnese Vitturi, Margherita Terenzani, Charles Nicholas Babb, Cosimo de Filippis, Elisabetta Zanoletti and Gino Marioni
Diagnostics 2026, 16(2), 197; https://doi.org/10.3390/diagnostics16020197 - 8 Jan 2026
Viewed by 602
Abstract
Background/Objectives: As precision medicine advances, attention to sex and gender determinants across epidemiological and clinical domains has intensified. However, in the audio-vestibular field, knowledge on sex- and gender-related aspects remains relatively limited. The main aim of this review has been to analyze the available gender medicine-based evidence in vestibular disorders. In particular, our investigation considered the following: (i) pathophysiology and clinical presentation, including differences in predominant signs and symptoms, diagnostic modalities and findings, underlying biological mechanisms associated with vestibular disorders across sex-specific groups; (ii) prognostic variables, including response to treatment, recovery rates, and long-term functional outcomes; (iii) the potential role of sex- and gender-specific diagnostic and therapeutic approaches in the management of vestibular disorders. Methods: Our protocol was registered on PROSPERO (CRD42025641292). A literature search was conducted screening PubMed, Scopus and Web of Science databases. After removal of duplicates and implementation of our inclusion/exclusion criteria, 67 included studies were identified and analyzed. Results: Several studies reported a higher incidence of vestibular dysfunctions among females, with proposed associations involving hormonal fluctuations, calcium metabolism and vitamin D. Estrogen receptors within the inner ear and their regulatory effects on calcium homeostasis have been proposed as potential mechanisms underlying these sex-specific differences. Furthermore, lifestyle factors, comorbidities and differential health-seeking behaviors between males and females may also modulate disease expression and clinical course. Conclusions: Gender-specific variables could not be independently analyzed because none of the included studies systematically reported gender-related data, representing a limitation of the available evidence. Current evidence suggests the presence of sex-related differences in the epidemiology and clinical expression of vestibular disorders, but substantial gaps remain regarding mechanisms, outcomes, and clinical implications. Future research should prioritize prospective, adequately powered studies specifically designed to assess sex and gender influences, integrating biological, psychosocial, and patient-reported outcomes, and adopting standardized sex- and gender-sensitive reporting frameworks. Full article
(This article belongs to the Section Clinical Diagnosis and Prognosis)

33 pages, 3147 KB  
Review
Perception–Production of Second-Language Mandarin Tones Based on Interpretable Computational Methods: A Review
by Yujiao Huang, Zhaohong Xu, Xianming Bei and Huakun Huang
Mathematics 2026, 14(1), 145; https://doi.org/10.3390/math14010145 - 30 Dec 2025
Viewed by 495
Abstract
We survey recent advances in second-language (L2) Mandarin lexical tones research and show how an interpretable computational approach can deliver parameter-aligned feedback across perception–production (P ↔ P). We synthesize four strands: (A) conventional evaluations and tasks (identification, same–different, imitation/read-aloud) that reveal robust tone-pair asymmetries and early P ↔ P decoupling; (B) physiological and behavioral instrumentation (e.g., EEG, eye-tracking) that clarifies cue weighting and time course; (C) audio-only speech analysis, from classic F0 tracking and MFCC–prosody fusion to CNN/RNN/CTC and self-supervised pipelines; and (D) interpretable learning, including attention and relational models (e.g., graph neural networks, GNNs) opened with explainable AI (XAI). Across strands, evidence converges on tones as time-evolving F0 trajectories, so movement, turning-point timing, and local F0 range are more diagnostic than height alone, and the contrast between Tone 2 (rising) and Tone 3 (dipping/low) remains the persistent difficulty; learners with tonal vs. non-tonal language backgrounds weight these cues differently. Guided by this synthesis, we outline a tool-oriented framework that pairs perception and production on the same items, jointly predicts tone labels and parameter targets, and uses XAI to generate local attributions and counterfactual edits, making feedback classroom-ready. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)
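
The trajectory-level cues highlighted above (turning-point timing, local F0 range, net movement) are straightforward to derive once an F0 contour is available. The sketch below computes them from a synthetic dipping contour standing in for a pitch-tracker output; the frame rate and contour shape are assumptions for illustration:

```python
# Sketch of contour-level tone cues (turning-point timing, local F0 range,
# net movement) from an already-extracted F0 trajectory. The synthetic
# "dipping" contour stands in for a pitch tracker's output.
import numpy as np

def tone_contour_features(f0_hz, frame_rate_hz):
    """Return turning-point timing and F0 range/movement in semitones."""
    voiced = f0_hz[~np.isnan(f0_hz)]
    semitones = 12.0 * np.log2(voiced / voiced[0])   # movement relative to onset
    turning_idx = int(np.argmin(semitones))          # lowest point of a dipping tone
    return {
        "turning_point_s": turning_idx / frame_rate_hz,
        "turning_point_rel": turning_idx / len(semitones),
        "f0_range_st": float(semitones.max() - semitones.min()),
        "net_movement_st": float(semitones[-1] - semitones[0]),
    }

# Synthetic Tone-3-like contour: dips ~4 semitones then returns, 25 frames at 100 fps.
frames = 25
t = np.linspace(0, 1, frames)
f0 = 200.0 * 2 ** ((-4 * np.sin(np.pi * np.clip(t, 0, 0.7) / 0.7)) / 12)
print(tone_contour_features(f0, frame_rate_hz=100))
```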

19 pages, 2610 KB  
Article
Open HTML5 Widgets for Smart Learning: Enriching Educational 360° Virtual Tours and a Comparative Evaluation vs. H5P
by Félix Fariña-Rodriguez, Jose Luis Saorín, Dámari Melian Díaz, Jose Luis Saorín-Ferrer and Cecile Meier
Appl. Sci. 2026, 16(1), 338; https://doi.org/10.3390/app16010338 - 29 Dec 2025
Viewed by 309
Abstract
In educational smart learning contexts, 360° virtual tours deliver authentic, cross-device experiences, but uptake is limited by subscription-based authoring tools and free options that restrict in-tour rich media embedding. To address this, we present a library of eight open-source HTML5 widgets (image gallery, PDF viewer, quiz, 3D model viewer, webpage viewer, audio player, YouTube viewer, and image comparison) that can be embedded directly in the viewer as HTML pop-ups (e.g., CloudPano) or run standalone, with dual packaging (single self-contained HTML or server-hosted assets referenced by URL). Evaluation is limited to technical efficiency (resource size, load performance, and cross-device/browser compatibility), with pedagogical outcomes and learner performance beyond the scope. The architecture minimizes dependencies and enables reuse in virtual classrooms via iframes. We provide a unified web interface and a repository to promote adoption, auditability, and community contributions. The results show that standalone widgets are between 20 and 100 times smaller than H5P equivalents produced with Lumi Education and exhibit shorter measured load times (0.1–0.5 ms). Seamless integration is demonstrated for CloudPano and Moodle. By lowering costs, simplifying deployment, and broadening in-tour media capabilities, the proposed widgets offer a pragmatic pathway to enrich educational 360° tours. Full article
(This article belongs to the Special Issue Application of Smart Learning in Education)

23 pages, 5039 KB  
Article
A3DSimVP: Enhancing SimVP-v2 with Audio and 3D Convolution
by Junfeng Yang, Mingrui Long, Hongjia Zhu, Limei Liu, Wenzhi Cao, Qin Li and Han Peng
Electronics 2026, 15(1), 112; https://doi.org/10.3390/electronics15010112 - 25 Dec 2025
Viewed by 291
Abstract
In modern high-demand applications, such as real-time video communication, cloud gaming, and high-definition live streaming, achieving both superior transmission speed and high visual fidelity is paramount. However, unstable networks and packet loss remain major bottlenecks, making accurate and low-latency video error concealment a critical challenge. Traditional error control strategies, such as Forward Error Correction (FEC) and Automatic Repeat Request (ARQ), often introduce excessive latency or bandwidth overhead. Meanwhile, receiver-side concealment methods struggle under high motion or significant packet loss, motivating the exploration of predictive models. SimVP-v2, with its efficient convolutional architecture and Gated Spatiotemporal Attention (GSTA) mechanism, provides a strong baseline by reducing complexity and achieving competitive prediction performance. Despite its merits, SimVP-v2’s reliance on 2D convolutions for implicit temporal aggregation limits its capacity to capture complex motion trajectories and long-term dependencies. This often results in artifacts such as motion blur, detail loss, and accumulated errors. Furthermore, its single-modality design ignores the complementary contextual cues embedded in the audio stream. To overcome these issues, we propose A3DSimVP (Audio- and 3D-Enhanced SimVP-v2), which integrates explicit spatio-temporal modeling with multimodal feature fusion. Architecturally, we replace the 2D depthwise separable convolutions within the GSTA module with their 3D counterparts, introducing a redesigned GSTA-3D module that significantly improves motion coherence across frames. Additionally, an efficient audio–visual fusion strategy supplements visual features with contextual audio guidance, thereby enhancing the model’s robustness and perceptual realism. We validate the effectiveness of A3DSimVP’s improvements through extensive experiments on the KTH dataset. Our model achieves a PSNR of 27.35 dB, surpassing the 27.04 of the SimVP-v2 baseline. Concurrently, our improved A3DSimVP model reduces the loss metrics on the KTH dataset, achieving an MSE of 43.82 and an MAE of 385.73, both lower than the baseline. Crucially, our LPIPS metric is substantially lowered to 0.22. These data tangibly confirm that A3DSimVP significantly enhances both structural fidelity and perceptual quality while maintaining high predictive accuracy. Notably, A3DSimVP attains faster inference speeds than the baseline with only a marginal increase in computational overhead. These results establish A3DSimVP as an efficient and robust solution for latency-critical video applications. Full article
(This article belongs to the Special Issue Digital Intelligence Technology and Applications, 2nd Edition)
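
The core architectural substitution described above — replacing 2D depthwise separable convolutions with their 3D counterparts so the kernel also spans the temporal axis — can be sketched compactly in PyTorch. The channel counts and kernel sizes below are assumed examples, and the gating/attention logic of GSTA is not reproduced:

```python
# Sketch of the core substitution: a depthwise separable convolution in 2D
# (per frame) versus its 3D counterpart that also spans time. Channel counts
# and kernel sizes are assumed examples; GSTA's gating/attention is omitted.
import torch
import torch.nn as nn

class DepthwiseSeparable2d(nn.Module):
    def __init__(self, channels, kernel=3):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel, padding=kernel // 2,
                                   groups=channels)           # one filter per channel
        self.pointwise = nn.Conv2d(channels, channels, 1)      # mix channels

    def forward(self, x):                                      # x: (B, C, H, W)
        return self.pointwise(self.depthwise(x))

class DepthwiseSeparable3d(nn.Module):
    def __init__(self, channels, kernel=3):
        super().__init__()
        self.depthwise = nn.Conv3d(channels, channels, kernel, padding=kernel // 2,
                                   groups=channels)            # spans (time, H, W)
        self.pointwise = nn.Conv3d(channels, channels, 1)

    def forward(self, x):                                      # x: (B, C, T, H, W)
        return self.pointwise(self.depthwise(x))

frames = torch.randn(2, 32, 8, 64, 64)                         # batch, channels, T, H, W
print(DepthwiseSeparable3d(32)(frames).shape)                  # temporal context preserved
```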

24 pages, 5274 KB  
Article
Improved BiLSTM-TDOA-Based Localization Method for Laying Hen Cough Sounds
by Feng Qiu, Qifeng Li, Yanrong Zhuang, Xiaoli Ding, Yue Wu, Yuxin Wang, Yujie Zhao, Haiqing Zhang, Zhiyu Ren, Chengrong Lai and Ligen Yu
Agriculture 2026, 16(1), 28; https://doi.org/10.3390/agriculture16010028 - 22 Dec 2025
Viewed by 315
Abstract
Cough sounds are a key acoustic indicator for detecting respiratory diseases in laying hens, which have become increasingly prevalent with the intensification of poultry housing systems. As an important early signal, cough sounds play a vital role in disease prevention and precision health management through timely recognition and spatial localization. In this study, an improved BiLSTM–TDOA method was proposed for the accurate recognition and localization of laying hen cough sounds. Nighttime audio data were collected and preprocessed to extract 81 acoustic features, including formant parameters, MFCC, LPCC, and their first and second derivatives. These features were then input into a BiLSTM-Attention model, which achieved a precision of 97.50%, a recall of 90.70%, and an F1-score of 0.9398. An improved TDOA algorithm was then applied for three-dimensional sound source localization, which resulted in mean absolute errors of 0.1453 m, 0.1952 m, and 0.1975 m along the X, Y, and Z axes across 31 positions. The results demonstrated that the proposed method enabled accurate recognition and 3D localization of abnormal vocalizations in laying hens, which will provide a novel approach for early detection, precise control, and intelligent health monitoring of respiratory diseases in poultry houses. Full article
(This article belongs to the Special Issue Modeling of Livestock Breeding Environment and Animal Behavior)
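
The localization stage can be illustrated independently of the detection model. The sketch below solves for a 3D source position from time differences of arrival relative to a reference microphone using nonlinear least squares; the microphone coordinates, array geometry, and c = 343 m/s are assumed example values, and the BiLSTM-Attention cough detector is omitted:

```python
# Sketch of a TDOA localization stage only (the BiLSTM cough detector is
# omitted): solve for a 3D source position from time differences of arrival
# relative to a reference microphone. Microphone positions and c = 343 m/s
# are assumed example values.
import numpy as np
from scipy.optimize import least_squares

C = 343.0                                            # speed of sound, m/s
mics = np.array([[0.0, 0.0, 2.5], [4.0, 0.0, 2.5],
                 [4.0, 6.0, 2.0], [0.0, 6.0, 3.0]])  # mic positions (m), mic 0 = reference

def residuals(pos, tdoas):
    """Range-difference residuals for mics 1..N-1 against the reference mic."""
    ranges = np.linalg.norm(mics - pos, axis=1)
    return (ranges[1:] - ranges[0]) - C * tdoas

def locate(tdoas, init=(2.0, 3.0, 1.0)):
    return least_squares(residuals, x0=np.array(init), args=(tdoas,)).x

# Simulate a source, the TDOAs it would produce, then recover its position.
true_pos = np.array([1.2, 4.0, 0.4])
true_ranges = np.linalg.norm(mics - true_pos, axis=1)
tdoas = (true_ranges[1:] - true_ranges[0]) / C
print(locate(tdoas))                                 # ~ [1.2, 4.0, 0.4]
```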

15 pages, 2973 KB  
Article
Vibro-Acoustic Characterization of Additively Manufactured Loudspeaker Enclosures: A Parametric Study of Material and Infill Influence
by Jakub Konopiński, Piotr Sosiński, Mikołaj Wanat and Piotr Góral
Signals 2025, 6(4), 73; https://doi.org/10.3390/signals6040073 - 12 Dec 2025
Viewed by 1059
Abstract
This paper presents a comparative analysis of the influence of Fused Deposition Modeling (FDM) parameters—specifically material type, infill geometry, and density—on the vibro-acoustic characteristics of loudspeaker enclosures. The enclosures were designed as exponential horns to intensify resonance phenomena for precise evaluation. Twelve unique configurations were fabricated using three materials with distinct damping properties (PLA, ABS, wood-composite) and three internal geometries (linear, honeycomb, Gyroid). Key vibro-acoustic properties were assessed via digital signal processing of recorded audio signals, including relative frequency response and time-frequency (spectrogram) analysis, and correlated with a predictive Finite Element Analysis (FEA) model of mechanical vibrations. The study unequivocally demonstrates that a material with a high internal damping coefficient is a critical factor. The wood-composite enabled a reduction in the main resonance amplitude by approximately 4 dB compared to PLA with the same geometry, corresponding to a predicted 86% reduction in mechanical vibration. Furthermore, the results show that a synergy between a high-damping material and an advanced, energy-dissipating infill (Gyroid) is crucial for achieving high acoustic fidelity. The wood-composite with 10% Gyroid infill was identified as the optimal design, offering the most effective resonance damping and the most neutral tonal characteristic. This work provides a valuable contribution to the field by establishing a clear link between FDM parameters and acoustic outcomes, delivering practical guidelines for performance optimization in personalized audio systems. Full article
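
The signal-processing side of such a comparison — estimating each enclosure's recorded spectrum, expressing one relative to another in dB, and locating the dominant resonance — can be sketched as follows. The synthetic signals and the 420 Hz resonance are placeholders, not the study's measurements:

```python
# Sketch of the kind of comparison described above: estimate each enclosure's
# recorded spectrum with Welch's method, express one relative to the other in
# dB, and locate the dominant resonance. Signals below are synthetic stand-ins.
import numpy as np
from scipy.signal import welch, find_peaks

fs = 48000

def make_recording(resonance_hz, resonance_gain, seconds=4):
    """Broadband noise plus a narrow resonance, standing in for a measured recording."""
    rng = np.random.default_rng(1)
    t = np.arange(int(fs * seconds)) / fs
    noise = rng.standard_normal(t.size)
    return noise + resonance_gain * np.sin(2 * np.pi * resonance_hz * t)

pla_like = make_recording(420.0, 3.0)     # stronger resonance (low-damping material)
wood_like = make_recording(420.0, 1.9)    # damped enclosure

f, pxx_a = welch(pla_like, fs=fs, nperseg=8192)
_, pxx_b = welch(wood_like, fs=fs, nperseg=8192)

relative_db = 10 * np.log10(pxx_b / pxx_a)          # wood-composite relative to PLA
peaks, _ = find_peaks(10 * np.log10(pxx_a), prominence=10)
main = peaks[np.argmax(pxx_a[peaks])]
print(f"main resonance ~{f[main]:.0f} Hz, "
      f"level difference there: {relative_db[main]:+.1f} dB")
```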

20 pages, 2845 KB  
Article
From Gaze to Music: AI-Powered Personalized Audiovisual Experiences for Children’s Aesthetic Education
by Jiahui Liu, Jing Liu and Hong Yan
Behav. Sci. 2025, 15(12), 1684; https://doi.org/10.3390/bs15121684 - 4 Dec 2025
Viewed by 489
Abstract
The cultivation of aesthetic appreciation through engagement with exemplary artworks constitutes a fundamental pillar in fostering children’s cognitive and emotional development, while simultaneously facilitating multidimensional learning experiences across diverse perceptual domains. However, children in early stages of cognitive development frequently encounter substantial challenges when attempting to comprehend and internalize complex visual narratives and abstract artistic concepts inherent in sophisticated artworks. This study presents an innovative methodological framework designed to enhance children’s artwork comprehension capabilities by systematically leveraging the theoretical foundations of audio-visual cross-modal integration. Through investigation of cross-modal correspondences between visual and auditory perceptual systems, we developed a sophisticated methodology that extracts and interprets musical elements based on gaze behavior patterns derived from prior pilot studies when observing artworks. Utilizing state-of-the-art deep learning techniques, specifically Recurrent Neural Networks (RNNs), these extracted visual–musical correspondences are subsequently transformed into cohesive, aesthetically pleasing musical compositions that maintain semantic and emotional congruence with the observed visual content. The efficacy and practical applicability of our proposed method were validated through empirical evaluation involving 96 children (analyzed through objective behavioral assessments using eye-tracking technology), complemented by qualitative evaluations from 16 parents and 5 experienced preschool educators. Our findings show statistically significant improvements in children’s sustained engagement and attentional focus under AI-generated, artwork-matched audiovisual support, potentially scaffolding deeper processing and informing future developments in aesthetic education. The results demonstrate statistically significant improvements in children’s sustained engagement (fixation duration: 58.82 ± 7.38 s vs. 41.29 ± 6.92 s, p < 0.001, Cohen’s d ≈ 1.29), attentional focus (AOI gaze frequency increased 73%, p < 0.001), and subjective evaluations from parents (mean ratings 4.56–4.81/5) when visual experiences are augmented by AI-generated, personalized audio-visual experiences. Full article
(This article belongs to the Section Cognition)

29 pages, 4304 KB  
Review
From Pixels to Motion: A Systematic Analysis of Translation-Based Video Synthesis Techniques
by Pratim Saha and Chengcui Zhang
Information 2025, 16(11), 990; https://doi.org/10.3390/info16110990 - 16 Nov 2025
Viewed by 725
Abstract
Translation-based Video Synthesis (TVS) has emerged as a transformative technology that enables sophisticated manipulation and generation of dynamic visual content. This comprehensive survey systematically examines the evolution of TVS methodologies, encompassing both image-to-video (I2V) and video-to-video (V2V) translation approaches. We analyze the progression from domain-specific facial animation techniques to generalizable diffusion-based frameworks, investigating architectural innovations that address fundamental challenges in temporal consistency and cross-domain adaptation. Our investigation categorizes V2V methods into paired approaches, including conditional GAN-based frameworks and world-consistent synthesis, and unpaired approaches organized into five distinct paradigms: 3D GAN-based processing, temporal constraint mechanisms, optical flow integration, content-motion disentanglement learning, and extended image-to-image frameworks. Through comprehensive evaluation across diverse datasets, we analyze the performance using spatial quality metrics, temporal consistency measures, and semantic preservation indicators. We present a qualitative analysis comparing methods evaluated on identical benchmarks, revealing critical trade-offs between visual quality, temporal coherence, and computational efficiency. Current challenges persist in long-term temporal coherence, with future research directions identified in long-range video generation, audio-visual synthesis for enhanced realism, and development of comprehensive evaluation metrics that better capture human perceptual quality. This survey provides a structured understanding of methodological foundations, evaluation frameworks, and future research opportunities in TVS. We identify pathways for advancing cross-domain generalization, improving computational efficiency, and developing enhanced evaluation metrics for practical deployment, contributing to the broader understanding of temporal video synthesis technologies. Full article
(This article belongs to the Special Issue Computer and Multimedia Technology)