Search Results (305)

Search Parameters:
Keywords = audio-visual information

24 pages, 1020 KB  
Article
Research on the Diagnosis of Abnormal Sound Defects in Automobile Engines Based on Fusion of Multi-Modal Images and Audio
by Yi Xu, Wenbo Chen and Xuedong Jing
Electronics 2026, 15(7), 1406; https://doi.org/10.3390/electronics15071406 - 27 Mar 2026
Abstract
Against the backdrop of the global carbon neutrality target, predictive maintenance (PdM) of automotive engines represents a core technical strategy to advance the sustainable development of the automotive industry. Conventional single-modal diagnostic approaches for engine abnormal sound defects suffer from low accuracy and weak anti-interference capability. Existing multi-modal fusion methods fail to deeply mine the physical coupling between cross-modal features and often entail excessive model complexity, hindering deployment on resource-constrained on-board edge devices. To resolve these limitations, this study proposes a Physical Prior-Embedded Cross-Modal Attention (PPE-CMA) mechanism for lightweight multi-modal fusion diagnosis of engine abnormal sound defects. First, wavelet packet decomposition (WPD) and mel-frequency cepstral coefficients (MFCC) are integrated to extract time-frequency features from engine audio signals, while a channel-pruned ResNet18 is employed to extract spatial features from engine thermal imaging and vibration visualization images. Second, the PPE-CMA module is designed to adaptively assign attention weights to audio and image features by exploiting the physical coupling between engine fault acoustic and visual characteristics, enabling efficient cross-modal feature fusion with redundant information suppression. A rigorous theoretical derivation is provided to link cosine similarity with the physical correlation of engine fault acoustic-visual features, justifying the attention weight constraint (β = 1 − α) from the perspective of fault feature physical coupling. Third, an improved lightweight XGBoost classifier is constructed for fault classification, and a hybrid data augmentation strategy customized for engine multi-modal data is proposed to address the small-sample challenge in industrial applications. Ablation experiments on ResNet18 pruning ratios verify the optimal trade-off between diagnostic performance and computational efficiency, while feature distribution analysis validates the authenticity and effectiveness of the hybrid augmentation strategy. Experimental results on a self-constructed multi-modal dataset show that the proposed method achieves 98.7% diagnostic accuracy and a 98.2% F1-score, retaining 96.5% accuracy under 90 dB environmental noise, with an end-to-end inference speed of 0.8 ms per sample (including preprocessing, feature extraction, and classification). Cross-engine and cross-domain validation on a 2.0T diesel engine small-sample dataset and the open-source SEMFault-2024 dataset yield average accuracies of 94.8% and 95.2%, respectively, demonstrating strong generalization. This method effectively enhances the accuracy and robustness of engine abnormal sound defect diagnosis, offering a lightweight technical solution for on-board real-time fault diagnosis and in-plant online quality inspection. By reducing engine fault-induced energy loss and spare parts waste, it further promotes energy conservation and emission reduction in the automotive industry. Quantified experimental data on fuel efficiency improvement and carbon emission reduction are provided to substantiate the ecological benefits of the proposed framework.
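As a rough illustration of the weighting scheme this abstract describes, the Python sketch below derives an attention weight α from the cosine similarity between audio and image feature vectors and applies the complementary constraint β = 1 − α. The function name `ppe_cma_fuse` and the sigmoid mapping with its `temperature` parameter are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ppe_cma_fuse(audio_feat: torch.Tensor,
                 image_feat: torch.Tensor,
                 temperature: float = 1.0) -> torch.Tensor:
    """Fuse equal-dimension audio and image features with weights
    alpha and beta = 1 - alpha derived from cosine similarity
    (the physical-coupling proxy mentioned in the abstract)."""
    # Per-sample cosine similarity in [-1, 1].
    sim = F.cosine_similarity(audio_feat, image_feat, dim=-1, eps=1e-8)
    # Map similarity to an attention weight alpha in (0, 1).
    alpha = torch.sigmoid(sim / temperature).unsqueeze(-1)
    beta = 1.0 - alpha  # complementary constraint beta = 1 - alpha
    return alpha * audio_feat + beta * image_feat

# Example: a batch of 4 samples with 128-dim features per modality.
fused = ppe_cma_fuse(torch.randn(4, 128), torch.randn(4, 128))
print(fused.shape)  # torch.Size([4, 128])
```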

37 pages, 5953 KB  
Article
Fire Detection Using Sound Analysis Based on a Hybrid Artificial Intelligence Algorithm
by Robert-Nicolae Boştinaru, Sebastian-Alexandru Drǎguşin, Nicu Bizon, Dumitru Cazacu and Gabriel-Vasile Iana
Algorithms 2026, 19(3), 240; https://doi.org/10.3390/a19030240 - 23 Mar 2026
Abstract
Fire detection is a critical task for early warning systems, particularly in environments where visual sensing is unreliable. While most existing approaches rely on image-based or smoke-based detection, acoustic signals provide complementary information capable of capturing early combustion-related events. This study investigates deep learning models for sound-based fire detection, focusing on convolutional and Transformer-based architectures. VGG16 and VGG19 convolutional neural networks are adapted to process time-frequency audio representations for binary classification into Fire and No-Fire classes. An Audio Spectrogram Transformer (AST) is further employed to model long-range temporal dependencies in acoustic data. Finally, a hybrid VGG19-AST architecture is proposed, in which convolutional layers extract local spectral–temporal features, and Transformer-based self-attention performs global sequence modeling. The models are evaluated on a curated dataset containing fire sounds and diverse environmental background noises under multiple noise conditions. Experimental results demonstrate competitive performance across convolutional and Transformer-based models, while the proposed hybrid VGG19-AST architecture achieves the most consistent overall results. The findings suggest that integrating convolutional feature extraction with self-attention-based global modeling enhances robustness under complex acoustic variability. The proposed hybrid framework provides a scalable and cost-effective solution for sound-based fire detection, particularly in scenarios where visual monitoring may be obstructed or ineffective.
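A minimal sketch of the hybrid idea described above: a small convolutional front end extracts local spectral–temporal features from a spectrogram, and a Transformer encoder models the resulting token sequence globally. Layer sizes, the two-block CNN, and the mean-pooling choice are assumptions standing in for the paper's VGG19-AST configuration.

```python
import torch
import torch.nn as nn

class HybridFireClassifier(nn.Module):
    """CNN front end for local features + Transformer encoder for
    global sequence modeling, ending in a Fire / No-Fire head."""
    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        # Local spectral-temporal feature extractor (VGG stand-in).
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Global self-attention over the CNN tokens (AST stand-in).
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 2)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, 1, mel_bins, frames) time-frequency input
        x = self.conv(spec)               # (B, d_model, H, W)
        x = x.flatten(2).transpose(1, 2)  # (B, H*W, d_model) tokens
        x = self.encoder(x).mean(dim=1)   # mean-pooled sequence
        return self.head(x)               # class logits

logits = HybridFireClassifier()(torch.randn(2, 1, 64, 128))
print(logits.shape)  # torch.Size([2, 2])
```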

20 pages, 2559 KB  
Article
Enhancing Reflection in VR-Based Evacuation Training Through Synchronized Auditory Clue Presentation: A Pilot Study
by Hiroyuki Mitsuhara, Ryoichi Yamanaka, Maya Matsushige and Yasunori Kozuki
Appl. Sci. 2026, 16(6), 3048; https://doi.org/10.3390/app16063048 - 21 Mar 2026
Abstract
Virtual reality (VR)-based evacuation training provides a safe and immersive environment for participants to experience disaster scenarios. However, existing systems often prioritize the experience itself, leaving the critical stage of reflection—essential for refining and stabilizing evacuation knowledge—under-supported. This study presents a qualitative pilot investigation into an extended reflection support function for a VR-based evacuation training system. Unlike traditional replay functions that only visualize avatar movements, our system synchronizes spatialized environmental sounds and recorded verbal utterances, i.e., voices of the user and non-player characters (NPCs), with the visual replay. A preliminary experiment involving eight university students was conducted to evaluate how these auditory clues influence the reflection-on-action process. Qualitative results indicate that audio clues help participants recall their internal decision-making processes and provide essential context for understanding the actions of others (NPCs). The findings suggest that the integration of auditory information facilitates evacuation knowledge refinement, i.e., the transition from mere experience to the formulation of concrete survival concepts. Although limited by a small sample size, this study highlights the potential of multi-modal reflection support in VR-based evacuation training.

31 pages, 4706 KB  
Article
LGCDF: Label-Guided Contrastive Disentanglement Fusion of Sensitive Attribute-Free Representations for Fair Multimodal Sentiment Analysis
by Rongfei Chen, Xinming Zhang, Siwei Cheng, Tingting Zhang, Hanlin Zhang and Wei Zhang
Appl. Sci. 2026, 16(6), 2952; https://doi.org/10.3390/app16062952 - 19 Mar 2026
Abstract
Multimodal sentiment analysis (MSA) has emerged as a prominent research frontier, enabling a comprehensive understanding of complex human emotions through the synergistic integration of heterogeneous multimodal signals. However, most existing approaches rely on idealized signal distribution assumptions, overlooking the detrimental impact of demographic bias on representation fairness and fusion robustness. This paper proposes a Label-Guided Contrastive Decoupling Fusion (LGCDF) framework that enhances model robustness to demographic bias by learning and fusing multimodal representations invariant to Sensitive Attributes (SAs). Specifically, the proposed LGCDF framework employs gender-sensitive attribute information as modality-level constraints to achieve language-centric cross-modal sentiment alignment, which is accomplished by computing contrastive losses between text–audio and text–visual feature pairs. Moreover, it introduces an SA-guided contrastive decoupling mechanism that decomposes multimodal representations into SA-related and -independent components. The SA-independent components are subsequently fused through a cross-modal attention fusion strategy, thereby facilitating fair sentiment representation and enabling efficient and robust multimodal information fusion. Extensive experimental results demonstrate that the proposed LGCDF framework achieves superior performance in fair representation learning and cross-modal information fusion while maintaining strong robustness under varying gender distribution biases.
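The language-centric alignment step can be pictured with a standard InfoNCE-style contrastive loss between text–audio and text–visual feature pairs, as sketched below. The loss form, the `temperature` value, and the batch-diagonal positives are conventional assumptions; the paper's exact objective is not reproduced here.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, other: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Contrast each anchor row against every row of `other`;
    matching batch indices are positives, the rest negatives."""
    a = F.normalize(anchor, dim=-1)
    b = F.normalize(other, dim=-1)
    logits = a @ b.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(a.size(0))  # diagonal = positive pairs
    return F.cross_entropy(logits, targets)

text, audio, visual = (torch.randn(8, 256) for _ in range(3))
# Language-centric alignment: text anchors both other modalities.
alignment_loss = info_nce(text, audio) + info_nce(text, visual)
print(alignment_loss.item())
```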

22 pages, 2166 KB  
Article
Sound-to-Image Translation Through Direct Cross-Modal Connection Using a Convolutional–Attention Generative Model
by Leonardo A. Fanzeres, Climent Nadeu and José A. R. Fonollosa
Appl. Sci. 2026, 16(6), 2942; https://doi.org/10.3390/app16062942 - 18 Mar 2026
Abstract
Sound plays a fundamental role in human perception, conveying information about events, objects, and spatial dynamics that may not be visually accessible. However, current technologies such as Acoustic Event Detection typically reduce complex soundscapes to textual labels, often failing to preserve their semantic richness. This limitation motivates the exploration of sound-to-image (S2I) translation as an alternative connection between audio and visual modalities. Unlike multimodal approaches guided by intermediary constraints during the learning process, we investigate S2I translation without class supervision, cluster-based alignment, or textual mediation, a paradigm we refer to as direct S2I translation. To the best of our knowledge, apart from our previous work, no prior study addresses S2I translation under this fully direct setting. We propose a convolutional–attention generative framework composed of an audio encoder and a densely connected GAN integrating self-attention and cross-attention mechanisms. The attention-based model is systematically compared with a purely convolutional baseline. Results show that introducing attention at early stages of the generator significantly improves translation performance, increasing the likelihood of producing interpretable and semantically coherent visual representations of sound. These findings indicate that attention strengthens semantic correspondence between audio and vision while preserving the fully direct nature of the translation process.
(This article belongs to the Section Computing and Artificial Intelligence)
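The cross-attention coupling that the abstract credits for the improvement can be sketched as visual tokens in the generator attending to the audio encoder's output sequence, as below. All shapes and token counts are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Query from the visual stream, key/value from the audio stream.
cross_attn = nn.MultiheadAttention(embed_dim=128, num_heads=4,
                                   batch_first=True)
img_tokens = torch.randn(2, 64, 128)    # generator feature map as tokens
audio_tokens = torch.randn(2, 32, 128)  # audio encoder output sequence
fused, attn_weights = cross_attn(img_tokens, audio_tokens, audio_tokens)
print(fused.shape)  # torch.Size([2, 64, 128])
```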

16 pages, 2235 KB  
Article
Sensing the Sacred: Non-Verbal Performance and the Pluralities of Contemporary Religious Space
by Frederico Dinis
Religions 2026, 17(3), 376; https://doi.org/10.3390/rel17030376 - 17 Mar 2026
Abstract
This article investigates how site-specific audiovisual performances can reconfigure the contemporary relationship between art and the sacred in contexts characterised by religious plurality and late-modern disenchantment. In response to the erosion of traditional religious language, it examines how non-verbal mediation through sound, moving images and embodied presence can enable alternative ways of engaging with sacred spaces. Drawing on three artistic interventions created within different religious contexts, the article shows that performative memory emerges as a presence-in-absence phenomenon, activated through sensory, spatial and atmospheric engagement. The analysis reveals that religious spaces act as active agents in the process of performative remembrance, generating shared experiences centred on themes of shelter, humility, and fragility. Methodologically, the research takes a practice-as-research approach, informed by an emergent research design. This approach combines site immersion, audiovisual performance and reflexive analysis in order to articulate the knowledge produced through artistic practice. The findings suggest that these performances counter the accelerated temporal regimes characteristic of late-modern life by cultivating slowness, attentiveness, and affective resonance. The article concludes that performative memory functions as a relational practice through which the sacred persists and is reimagined beyond doctrinal representation, fostering inclusive forms of encounter within plural religious environments. In this way, the study contributes to broader sociological and humanistic debates on art, religion, and the transformation of sacred experience in contemporary society.

16 pages, 2783 KB  
Article
The Spectacle of Power: Hybridisation and Digital Populism in White House Communication (2025)
by Ana Velasco Molpeceres, Jonattan Rodríguez Hernández and Eglée Ortega Fernández
Soc. Sci. 2026, 15(3), 186; https://doi.org/10.3390/socsci15030186 - 14 Mar 2026
Abstract
This article examines the institutional communication of the White House on X (formerly Twitter) during the first nine months of Donald Trump’s second presidency (January–October 2025). Through a mixed-methods approach that combines thematic, network, and lexical–discursive analysis, the study explores how the presidential account (@WhiteHouse) integrates informational, emotional, and performative dimensions within a hybrid media system. The dataset comprises 4297 tweets, analysed through Graphext, NodeXL/Gephi, and Sketch Engine. The findings reveal that audiovisual and symbolic content dominate over political or policy-related topics, while financial and technological actors occupy central positions in the network of mentions. Lexical analysis highlights three semantic nuclei—Trump, President, and America—that structure a moralised and affective narrative of leadership. The results indicate that White House communication operates as a hybrid and post-bureaucratic model, where political legitimacy increasingly depends on visibility and reputational association with market logics.
(This article belongs to the Special Issue Big Data and Political Communication)

26 pages, 2077 KB  
Review
Intervention Practices for Promoting Well-Being and Cognitive Development in Hospitalized Children: A Scoping Review
by Sofía Castro-Trigo, Alexa von Hagen, Paloma Alonso-Stuyck, Pau Miquel, Donovan Barba-Reynoso, Agustina Quintero, Julieta Zorrilla de San Martín and Augusto Ferreira-Umpiérrez
Eur. J. Investig. Health Psychol. Educ. 2026, 16(3), 41; https://doi.org/10.3390/ejihpe16030041 - 10 Mar 2026
Abstract
Psychosocial and cognitive interventions are increasingly implemented in pediatric hospital settings. However, evidence regarding their structure, delivery, and outcomes remains dispersed. This scoping review aimed to synthesize current evidence on these interventions, focusing on their design, professional delivery, reported outcomes, and existing research gaps. It was conducted using established scoping review methodology and is reported in accordance with PRISMA-ScR guidelines. Systematic searches were conducted in PubMed, Scopus, Web of Science, PsycINFO, and ProQuest Dissertations to identify peer-reviewed and grey literature published between 2009 and 2024. Following study selection based on predefined inclusion criteria, data were charted using a standardized data extraction form and analyzed to synthesize and map key characteristics of interventions and outcomes in relation to the review questions. Sixty-one studies met the inclusion criteria. Interventions primarily targeted school-aged children and adolescents and were delivered by psychologists, educators, and nurses, frequently within interdisciplinary teams. A wide range of materials and resources were used, including digital technologies, playful and artistic materials, audiovisual and informational supports, and sensory or therapeutic objects. Techniques primarily involved guided conversation, cognitive and body-based exercises, and play-based approaches. Outcomes mainly focused on emotional well-being and recovery, while fewer interventions explicitly addressed cognitive processes such as attention and executive functioning. Overall, reported effects were generally positive. These findings suggest that psychosocial and cognitive interventions in pediatric hospital settings reflect a wide range of approaches, while also revealing methodological heterogeneity, variability in reporting, and the underrepresentation of low- and middle-income countries, pointing to the need for more robust and inclusive future research.

27 pages, 12591 KB  
Article
Audio–Visual Fusion Sim2Real Platform for Anti-UAV Detection and Tracking
by Xiaohong Nian, Haolun Liu and Xunhua Dai
Drones 2026, 10(3), 190; https://doi.org/10.3390/drones10030190 - 10 Mar 2026
Abstract
To address the escalating security challenges posed by unauthorized Unmanned Aerial Vehicles, this paper presents a Sim2Real physics-informed audio–visual fusion simulation platform designed to enhance Counter-Unmanned Aerial Vehicle detection and tracking performance. The proposed method integrates two complementary sensing pipelines: a physics-based acoustic localization system utilizing Time Difference of Arrival principles and a deep learning-driven visual detection framework. To ensure robust surveillance against non-cooperative targets, these pipelines are not only fused through strict spatiotemporal synchronization but also mutually reinforce each other—acoustic data guides visual attention in low-visibility scenarios typical of adversarial intrusions, while visual detections refine acoustic parameter estimation. Building upon prior work in multi-modal perception, we extend the framework to dynamic environments characterized by configurable visual obstructions, including smoke and fog, which frequently compromise conventional optical anti-drone systems. Experiments demonstrate that the fusion system progressively adapts to degraded visual conditions, extending tracking continuity from approximately 50% coverage under vision-only operation to near-continuous target awareness, with a moderate trade-off in average angular precision when acoustic-only segments are included. Physical validation with quadrotor Unmanned Aerial Vehicles confirms the platform’s capability to bridge simulation-to-reality gaps. Our results highlight the system’s robustness against sensor degradation and its potential to accelerate the development of resilient multisensor Counter-Unmanned Aerial Vehicle systems while reducing dependency on costly field testing.
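The acoustic half of the pipeline rests on the Time Difference of Arrival principle, which the NumPy sketch below illustrates for a two-microphone pair: the inter-microphone delay is recovered by cross-correlation and converted to a far-field bearing. The sample rate, microphone spacing, and synthetic delay are assumed values, not the platform's configuration.

```python
import numpy as np

fs = 48_000          # sample rate in Hz (assumed)
c = 343.0            # speed of sound in m/s
mic_spacing = 0.5    # microphone separation in m (assumed)

# Synthetic broadband source arriving 12 samples later at mic 2.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(4096)
x2 = np.roll(x1, 12)

# Cross-correlate and locate the peak to recover the delay.
corr = np.correlate(x2, x1, mode="full")
lag = int(corr.argmax()) - (len(x1) - 1)  # delay in samples
tdoa = lag / fs                           # delay in seconds

# Far-field direction of arrival implied by the delay.
angle = np.degrees(np.arcsin(np.clip(c * tdoa / mic_spacing, -1.0, 1.0)))
print(lag, round(float(angle), 1))        # 12 and the implied bearing
```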

24 pages, 4302 KB  
Article
Adapted Route Instructions for Navigation Technologies in Support of Wheelchair Mobility in Urban Areas: Online Survey
by Sanaz Azimi, Mir Abolfazl Mostafavi, Krista L. Best, Aurélie Dommes and Angélique Montuwy
ISPRS Int. J. Geo-Inf. 2026, 15(3), 110; https://doi.org/10.3390/ijgi15030110 - 5 Mar 2026
Abstract
Wheelchair users face environmental barriers that limit their mobility and social participation. Although existing navigation tools support urban mobility, they often lack clear orientation and confirmation cues, and information on accessible and safe routes to meet wheelchair users’ needs. This study aims to identify the most adapted route instructions for wheelchair users, examine the impact of user characteristics (sociodemographic information and profiles) on their instruction choices, and evaluate instruction delivery modalities. An online questionnaire collected participants’ characteristics and agreement with the proposed route instruction formulations (different combinations of information like turn-by-turn instructions, landmarks, and accessibility information) regarding clarity, sufficiency, adaptability, and safety criteria. Formulations were evaluated across 14 navigation situations involving accessibility and safety challenges. Participants also rated communication modalities. Thirty-two wheelchair users (19 males, 13 females; mean age = 45.8 years; mean wheelchair experience = 23.5 years) participated. Data analysis reveals the importance of enriched turn-by-turn instructions, including non-turning actions, alerts, landmarks, and/or street names for participants. Alert-based formulations were favored in most situations, like uneven sidewalks, slopes and intersections. More enriched instructions were significantly more acceptable among women and participants with greater wheelchair experience. Multimodal delivery, particularly visual and audio information, was also preferred. These findings help develop adaptive navigation tools, improving wheelchair users’ safe, confident mobility, autonomy, and social participation.

15 pages, 669 KB  
Article
Dementia Detection from Spontaneous Speech Using Cross-Attention Fusion
by Felix Agbavor and Hualou Liang
J. Dement. Alzheimer's Dis. 2026, 3(1), 12; https://doi.org/10.3390/jdad3010012 - 2 Mar 2026
Abstract
Background/Objectives: Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that affects the daily lives of older adults, impacting their cognitive abilities as well as speech and language communication. Early detection is crucial, as it enables timely intervention and helps improve the quality of life for those affected. While large language models (LLMs) have shown promise for dementia detection from spontaneous speech, most studies are unimodal and miss complementary signals across modalities. Methods: We present an LLM-powered multimodal cross-attention framework that integrates lexical (text), acoustic (speech), and visual (image) information for dementia detection using the ADReSSo 2021 picture-description dataset. Within this framework, text data are encoded using ModernBERT, audio features are extracted using wav2vec 2.0-base-960, and the Cookie Theft image is represented through CLIP ViT-L/14. These embeddings are linearly projected to a shared space and then combined via Transformer-based cross-attention, yielding a fused vector for AD detection. Results: Our results show that the trimodal model achieved the best overall performance when paired with an SVC classifier, reaching an accuracy of 0.8732 and an F1 score of 0.8571, surpassing both the top-performing unimodal and bimodal configurations. For interpretability, a sensitivity analysis of modality contributions reveals that text plays the primary role, audio provides complementary improvements, and image offers modest yet stabilizing contextual support. Conclusions: These results highlight that the method of multimodal embedding fusion significantly influences performance: a cross-attention block achieves an effective balance between accuracy and simplicity, producing integrated representations that align well with interpretable downstream classifiers.
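The fusion step described here can be sketched as three linear projections into a shared space followed by one Transformer-style cross-attention pass with the text embedding as the query, as below. The 768-dimensional inputs and the 256-dimensional shared space are assumptions chosen to approximate the named encoders, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

d_shared = 256  # assumed shared-space size
proj_text, proj_audio, proj_image = (nn.Linear(768, d_shared)
                                     for _ in range(3))
attn = nn.MultiheadAttention(d_shared, num_heads=4, batch_first=True)

# Per-sample embeddings standing in for the three encoder outputs.
text = proj_text(torch.randn(1, 768))
audio = proj_audio(torch.randn(1, 768))
image = proj_image(torch.randn(1, 768))

# Text queries the stacked audio/image tokens; the pooled output is
# the fused vector handed to a downstream classifier (e.g., an SVC).
context = torch.stack([audio, image], dim=1)         # (B, 2, d_shared)
fused, _ = attn(text.unsqueeze(1), context, context)
fused_vector = fused.squeeze(1)                      # (B, d_shared)
print(fused_vector.shape)  # torch.Size([1, 256])
```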

39 pages, 3488 KB  
Article
“We Leave at Least a Little Seed”: The School’s Role in Developing Students’ Agency Toward Climate Change
by Jennifer Cunha, Marcelo Félix, Sara Miranda and Pedro Rosário
Sustainability 2026, 18(5), 2350; https://doi.org/10.3390/su18052350 - 28 Feb 2026
Abstract
As in schools worldwide, climate change (CC) is addressed in curricula and environmental programs in Portugal. Grounded in Bandura’s human agency theory, effective CC mitigation requires the capacity to intentionally initiate, sustain, and reflect on behaviors to reduce greenhouse gas emissions, i.e., climate agency. This study aimed to map the school’s role (environmental initiatives and CC teaching) in developing students’ climate agency and its determinants. Participants included 42 school representatives and 24 teachers from various subjects. Data sets, collected through online surveys, semi-structured interviews, and a focus group, were analyzed using content analysis. School representatives emphasized school initiatives requiring significant levels of student engagement (e.g., cleanups) but with limited participation. Most teachers reported employing transmissive teaching approaches, complemented by audio–visual resources and classroom discussions. Interviewees identified facilitators (e.g., family pro-environmental behaviors and municipal support), but mostly obstacles (e.g., limited instruction time and surface approach to learning) that contributed to a perceived minimal impact of CC education on their students. Overall, the data suggest that current environmental programs and CC teaching are not consistently developing students’ climate agency. The findings highlight the need to rethink formal and informal approaches to promote high-quality CC education and student agency in addressing the climate crisis.

20 pages, 1914 KB  
Article
Influence of Multimodal AR-HUD Navigation Prompt Design on Driving Behavior at F-Type-5 M Intersections
by Ziqi Liu, Zhengxing Yang and Yifan Du
J. Eye Mov. Res. 2026, 19(1), 22; https://doi.org/10.3390/jemr19010022 - 11 Feb 2026
Abstract
In complex urban traffic environments, the design of multimodal prompts in augmented reality head-up displays (AR-HUDs) plays a critical role in driving safety and operational efficiency. Despite growing interest in audiovisual navigation assistance, empirical evidence remains limited regarding when prompts should be delivered and whether visual and auditory information should remain temporally aligned. To address this gap, this study aims to examine how audiovisual prompt timing and prompt mode influence driving behavior in AR-HUD navigation systems at complex F-type-5 m intersections through a within-subject experimental design. A 2 (prompt mode: synchronized vs. asynchronous) × 3 (prompt timing: −1000 m, −600 m, −400 m) design was employed to assess driver response time, situational awareness, and eye-movement measures, including average fixation duration and fixation count. The results showed clear main effects of both prompt mode and prompt timing. Compared with asynchronous prompts, synchronized prompts consistently resulted in shorter response times, reduced visual demand, and higher situational awareness. Driving performance also improved as prompt timing shifted closer to the intersection, from −1000 m to −400 m. However, no significant interaction effects were found, suggesting that prompt mode and prompt timing can be treated as relatively independent design factors. In addition, among the six experimental conditions, the −400 m synchronized condition yielded the most favorable overall performance, whereas the −1000 m asynchronous condition performed worst. These findings indicate that in time-critical and low-tolerance scenarios, such as F-type-5 m intersections, near-distance synchronized multimodal prompts should be prioritized. This study provides empirical support for optimizing prompt timing and cross-modal temporal alignment in AR-HUD systems and offers actionable implications for interface and timing design.

12 pages, 589 KB  
Article
Inclusive and Sustainable Digital Innovation Within the Amara Berri System
by Ana Belén Olmos Ortega, Cristina Medrano Pascual, Rosa Ana Alonso Ruiz, María García Pérez and María Ángeles Valdemoros San Emeterio
Sustainability 2026, 18(2), 947; https://doi.org/10.3390/su18020947 - 16 Jan 2026
Abstract
The current debate on digital education is at a crossroads between the need for technological innovation and the growing concern about the impact of passive screen use. In this context, identifying sustainable pedagogical models that integrate Information and Communication Technologies (ICT) in a meaningful and inclusive way is an urgent need. This article presents a case study of the Amara Berri System (ABS), aiming to analyze how inclusive and sustainable digital innovation is operationalized within the system and whether teachers’ length of service is associated with the implementation and perceived impact of inclusive ICT practices. The investigation is based on a mixed-methods sequential design. A questionnaire was administered to a sample of 292 teachers to collect data on their practices and perceptions. Subsequently, a focus group with eight teachers was conducted to further explore the meaning of their practices. Quantitative results show that the implementation and positive evaluation of inclusive ICT practices correlate significantly with teachers’ seniority within the system, which suggests that the model is formative in itself. Qualitative analysis shows that ICTs are not an end in themselves within the ABS, but an empowering tool for the students. The “Audiovisual Media Room”, managed by students, functions as a space for social and creative production that gives technology a pedagogical purpose. The study concludes that the sustainability of digital innovation requires coherence with the pedagogical project. Findings offer valuable implications for the design of teacher training contexts that foster the integration of technology within a framework of truly inclusive education.
(This article belongs to the Special Issue Sustainable Digital Education: Innovations in Teaching and Learning)

24 pages, 5019 KB  
Article
A Dual Stream Deep Learning Framework for Alzheimer’s Disease Detection Using MRI Sonification
by Nadia A. Mohsin and Mohammed H. Abdul Ameer
J. Imaging 2026, 12(1), 46; https://doi.org/10.3390/jimaging12010046 - 15 Jan 2026
Abstract
Alzheimer’s Disease (AD) is a progressive brain disorder that affects millions of individuals across the world. It causes gradual damage to the brain cells, leading to memory loss and cognitive dysfunction. Although Magnetic Resonance Imaging (MRI) is widely used in AD diagnosis, the existing studies rely solely on the visual representations, leaving alternative features unexplored. The objective of this study is to explore whether MRI sonification can provide complementary diagnostic information when combined with conventional image-based methods. In this study, we propose a novel dual-stream multimodal framework that integrates 2D MRI slices with their corresponding audio representations. MRI images are transformed into audio signals using multi-scale, multi-orientation Gabor filtering, followed by a Hilbert space-filling curve to preserve spatial locality. The image and sound modalities are processed using a lightweight CNN and YAMNet, respectively, then fused via logistic regression. The multimodal model achieved the highest accuracy: 98.2% for distinguishing AD from Cognitively Normal (CN) subjects, 94% for AD vs. Mild Cognitive Impairment (MCI), and 93.2% for MCI vs. CN. This work provides a new perspective and highlights the potential of audio transformation of imaging data for feature extraction and classification.
(This article belongs to the Section AI in Imaging)
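The locality-preserving flattening at the heart of the sonification can be illustrated with a standard Hilbert-curve traversal, as in the sketch below: spatially adjacent pixels stay adjacent in the resulting 1D trace. The Gabor filtering front end and the YAMNet classifier are omitted; the random image stands in for a filtered MRI slice.

```python
import numpy as np

def d2xy(n: int, d: int) -> tuple[int, int]:
    """Map Hilbert-curve index d to (x, y) on an n x n grid
    (n a power of two); standard iterative construction."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

n = 64                                   # image side length (power of two)
image = np.random.rand(n, n)             # stand-in for a filtered MRI slice
trace = np.array([image[y, x] for x, y in (d2xy(n, d) for d in range(n * n))])
print(trace.shape)                       # (4096,) locality-preserving 1D signal
```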
