Search Results (129)

Search Parameters:
Keywords = audiovisual interactions

18 pages, 1150 KiB  
Article
Navigating by Design: Effects of Individual Differences and Navigation Modality on Spatial Memory Acquisition
by Xianyun Liu, Yanan Zhang and Baihu Sun
Behav. Sci. 2025, 15(7), 959; https://doi.org/10.3390/bs15070959 - 15 Jul 2025
Abstract
Spatial memory is a critical component of spatial cognition, particularly in unfamiliar environments. As navigation systems become integral to daily life, understanding how individuals with varying spatial abilities respond to different navigation modes is increasingly important. This study employed a virtual driving environment to examine how participants with good or poor spatial abilities performed under three navigation modes: visual, audio, and combined audio–visual. A total of 78 participants were divided into two groups, good sense of direction (G-SOD) and poor sense of direction (P-SOD), according to their Santa Barbara Sense of Direction (SBSOD) scores, and were randomly assigned to one of the three navigation modes. Participants followed navigation cues and simulated driving to the end point twice during the learning phase, then completed a route-retracing task, a scene-recognition task, and an order-recognition task. Significant main effects were found for both SOD group and navigation mode, with no interaction. G-SOD participants outperformed P-SOD participants in the route-retracing task. The audio navigation mode led to better performance in tasks involving complex spatial decisions, such as turning at intersections and recalling route order. Accuracy in recognizing scenes did not differ significantly across SOD groups or navigation modes. These findings suggest that audio navigation may reduce visual distraction and support more effective spatial encoding, and that individual spatial abilities influence navigation performance independently of guidance type. They highlight the importance of aligning navigation modalities with users’ cognitive profiles and support the development of adaptive navigation systems that accommodate individual differences in spatial ability. Full article
(This article belongs to the Section Cognition)
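The grouping and assignment procedure described in the abstract can be sketched in a few lines. The median split and the balancing scheme below are illustrative assumptions: the abstract does not state the exact SBSOD cutoff or randomization method used.

```python
import random
import statistics

def split_by_sbsod(scores):
    """Split participants into good (G-SOD) and poor (P-SOD)
    sense-of-direction groups by a median split of SBSOD scores.
    The study's actual cutoff is not stated; a median split is
    one common convention."""
    cutoff = statistics.median(scores.values())
    g_sod = [p for p, s in scores.items() if s >= cutoff]
    p_sod = [p for p, s in scores.items() if s < cutoff]
    return g_sod, p_sod

def assign_modes(participants, modes=("visual", "audio", "audio-visual"), seed=0):
    """Randomly assign participants to navigation modes, keeping
    the three mode groups as balanced as possible."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: modes[i % len(modes)] for i, p in enumerate(shuffled)}
```

With 78 participants this yields three mode groups of 26 each, crossed with the two SOD groups.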

24 pages, 2293 KiB  
Article
Research on the Healing Effect of the Waterscapes in Chinese Classical Gardens in Audiovisual Interaction
by Zhigao Zhai, Luning Cao, Qinhan Li, Zheng Gong, Li Guo and Deshun Zhang
Buildings 2025, 15(13), 2310; https://doi.org/10.3390/buildings15132310 - 1 Jul 2025
Abstract
As an important part of world cultural heritage, waterscapes in Chinese classical gardens are renowned for their unique design, rich cultural connotations, and distinctive esthetic value. However, objective studies of their impact on mental health are lacking. This paper focuses on Xishu Garden, a Chinese classical garden, and examines four types of waterscapes (twelve in total) using eye-tracking technology and the Perceived Restorativeness Scale (PRS). The aim of this study is to explore the restorative effects of different types of waterscapes under visual-only and audiovisual conditions, with particular attention paid to their mechanisms of action. The research results indicate that (1) waterscapes with audiovisual interaction have a greater restorative value; (2) dynamic waterscapes have greater visual appeal than still waterscapes do, but the latter have stronger environmentally restorative effects; and (3) the visual behavioral characteristics of waterscapes change during audiovisual interactions. This study contributes theoretical support for the maintenance and enhancement of Chinese classical gardens and the planning and design of modern urban green spaces, and it enriches our understanding of the role of waterscapes in restorative environments. Full article
(This article belongs to the Special Issue Acoustics and Well-Being: Towards Healthy Environments)

15 pages, 770 KiB  
Data Descriptor
NPFC-Test: A Multimodal Dataset from an Interactive Digital Assessment Using Wearables and Self-Reports
by Luis Fernando Morán-Mirabal, Luis Eduardo Güemes-Frese, Mariana Favarony-Avila, Sergio Noé Torres-Rodríguez and Jessica Alejandra Ruiz-Ramirez
Data 2025, 10(7), 103; https://doi.org/10.3390/data10070103 - 30 Jun 2025
Abstract
The growing implementation of digital platforms and mobile devices in educational environments has generated the need to explore new approaches for evaluating the learning experience beyond traditional self-reports or instructor presence. In this context, the NPFC-Test dataset was created from an experimental protocol conducted at the Experiential Classroom of the Institute for the Future of Education. The dataset was built by collecting multimodal indicators such as neuronal, physiological, and facial data using a portable EEG headband, a medical-grade biometric bracelet, a high-resolution depth camera, and self-report questionnaires. The participants were exposed to a digital test lasting 20 min, composed of audiovisual stimuli and cognitive challenges, during which synchronized data from all devices were gathered. The dataset includes timestamped records related to emotional valence, arousal, and concentration, offering a valuable resource for multimodal learning analytics (MMLA). The recorded data were processed through calibration procedures, temporal alignment techniques, and emotion recognition models. It is expected that the NPFC-Test dataset will support future studies in human–computer interaction and educational data science by providing structured evidence to analyze cognitive and emotional states in learning processes. In addition, it offers a replicable framework for capturing synchronized biometric and behavioral data in controlled academic settings. Full article
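As a rough illustration of the kind of temporal alignment the abstract mentions, the sketch below matches each sample of one stream (e.g., EEG) to the nearest-in-time sample of another (e.g., the biometric bracelet). The tolerance value and the nearest-neighbor strategy are assumptions for illustration, not the dataset's documented procedure.

```python
import bisect

def align_nearest(base_ts, other_ts, tolerance=0.05):
    """For each timestamp in the base stream, return the index of the
    nearest sample in the other stream, or None if no sample lies
    within `tolerance` seconds. Assumes `other_ts` is non-empty and
    sorted in ascending order."""
    aligned = []
    for t in base_ts:
        i = bisect.bisect_left(other_ts, t)
        # The nearest neighbor is either just before or just after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other_ts)]
        best = min(candidates, key=lambda j: abs(other_ts[j] - t))
        aligned.append(best if abs(other_ts[best] - t) <= tolerance else None)
    return aligned
```

A 20-minute session sampled at different rates per device can then be merged into one timestamped table by mapping each base sample to its matched indices.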

16 pages, 1093 KiB  
Article
A Lightweight Framework for Audio-Visual Segmentation with an Audio-Guided Space–Time Memory Network
by Yunpeng Zuo and Yunwei Zhang
Appl. Sci. 2025, 15(12), 6585; https://doi.org/10.3390/app15126585 - 11 Jun 2025
Abstract
As a multimodal fusion task, audio-visual segmentation (AVS) aims to locate sounding objects at the pixel level within a given image. This capability holds significant importance and practical value in applications such as intelligent surveillance, multimedia content analysis, and human–robot interaction. However, existing AVS models typically feature complex architectures, require a large number of parameters, and are challenging to deploy on embedded platforms. Furthermore, these models often lack integration with object tracking mechanisms and fail to address the issue of the mis-segmentation of unvoiced objects caused by environmental noise in real-world scenarios. To address these challenges, this research proposes a lightweight audio-visual segmentation framework incorporating an audio-guided space–time memory network (AG-STMNet). First, a mask generator with a scoring mechanism was developed to identify sounding objects from generated masks. This component integrates Fastsam, a lightweight, pre-trained, object-aware segmentation model, with WAV2CLIP, a parameter-efficient audio-visual alignment model. Subsequently, AG-STMNet, an audio-guided video object segmentation network, was introduced to track sounding objects using video object segmentation techniques while mitigating environmental noise. Finally, the mask generator and AG-STMNet were combined to form the complete framework. The experimental results demonstrate that the framework achieves a mean Intersection over Union (mIoU) score of 41.5, indicating its potential as a viable lightweight solution for practical applications. Full article
(This article belongs to the Special Issue Artificial Intelligence and Its Application in Robotics)
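The mIoU score reported above can be computed as follows. This is the standard definition of mean Intersection over Union for binary masks, sketched in pure Python; it is the generic metric, not the authors' evaluation code.

```python
def iou(pred, gt):
    """Intersection over Union for two binary masks given as nested
    lists of 0/1 values with identical shapes."""
    inter = sum(p & g for prow, grow in zip(pred, gt) for p, g in zip(prow, grow))
    union = sum(p | g for prow, grow in zip(pred, gt) for p, g in zip(prow, grow))
    # Two empty masks are defined here as a perfect match.
    return inter / union if union else 1.0

def mean_iou(pairs):
    """Mean IoU over (prediction, ground-truth) mask pairs, as commonly
    reported for audio-visual segmentation benchmarks."""
    return sum(iou(p, g) for p, g in pairs) / len(pairs)
```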

24 pages, 9841 KiB  
Article
The Audiovisual Assessment of Monocultural Vegetation Based on Facial Expressions
by Mary Nwankwo, Qi Meng, Da Yang and Mengmeng Li
Forests 2025, 16(6), 937; https://doi.org/10.3390/f16060937 - 3 Jun 2025
Abstract
Plant vegetation is nature’s symphony, offering sensory experiences that influence ecological systems, human well-being, and emotional states and significantly impact human societal progress. This study investigated the emotional and perceptual impacts of specific monocultural vegetation (palm and rubber) in Nigeria, through audiovisual interactions using facial expression analysis, soundscape, and visual perception assessments. The findings reveal three key outcomes: (1) Facial expressions varied significantly by vegetation type and time of day, with higher “happy” valence values recorded for palm vegetation in the morning (mean = 0.39), and for rubber vegetation in the afternoon (mean = 0.37). (2) Gender differences in emotional response were observed, as male participants exhibited higher positive expressions (mean = 0.40) compared to females (mean = 0.33). (3) Perceptual ratings indicated that palm vegetation was perceived as more visually beautiful (mean = 4.05), whereas rubber vegetation was rated as having a more pleasant soundscape (mean = 4.10). However, facial expressions showed weak correlations with soundscape and visual perceptions, suggesting that other cognitive or sensory factors may be more influential. This study addresses a critical gap in soundscape research for monocultural vegetation and offers valuable insights for urban planners, environmental psychologists, and restorative landscape designs. Full article
(This article belongs to the Special Issue Soundscape in Urban Forests—2nd Edition)

21 pages, 813 KiB  
Review
Light, Sound, and Melatonin: Investigating Multisensory Pathways for Visual Restoration
by Dario Rusciano
Medicina 2025, 61(6), 1009; https://doi.org/10.3390/medicina61061009 - 28 May 2025
Abstract
Multisensory integration is fundamental for coherent perception and interaction with the environment. While cortical mechanisms of multisensory convergence are well studied, emerging evidence implicates specialized retinal ganglion cells—particularly melanopsin-expressing intrinsically photosensitive retinal ganglion cells (ipRGCs)—in crossmodal processing. This review explores how hierarchical brain networks (e.g., superior colliculus, parietal cortex) and ipRGCs jointly shape perception and behavior, focusing on their convergence in multisensory plasticity. We highlight ipRGCs as gatekeepers of environmental light cues. Their anatomical projections to multisensory areas like the superior colliculus are well established, although direct evidence for their role in human audiovisual integration remains limited. Through melanopsin signaling and subcortical projections, they may modulate downstream multisensory processing, potentially enhancing the salience of crossmodal inputs. A key theme is the spatiotemporal synergy between melanopsin and melatonin: melanopsin encodes light, while melatonin fine-tunes ipRGC activity and synaptic plasticity, potentially creating time-sensitive rehabilitation windows. However, direct evidence linking ipRGCs to audiovisual rehabilitation remains limited, with their role primarily inferred from anatomical and functional studies. Future implementations should prioritize quantitative optical metrics (e.g., melanopic irradiance, spectral composition) to standardize light-based interventions and enhance reproducibility. Nonetheless, we propose a translational framework combining multisensory stimuli (e.g., audiovisual cues) with circadian-timed melatonin to enhance recovery in visual disorders like hemianopia and spatial neglect. By bridging retinal biology with systems neuroscience, this review redefines the retina’s role in multisensory processing and offers novel, mechanistically grounded strategies for neurorehabilitation. Full article
(This article belongs to the Section Ophthalmology)

51 pages, 41402 KiB  
Article
A Digitally Enhanced Ethnography for Craft Action and Process Understanding
by Xenophon Zabulis, Nikolaos Partarakis, Vasiliki Manikaki, Ioanna Demeridou, Arnaud Dubois, Inés Moreno, Valentina Bartalesi, Nicolò Pratelli, Carlo Meghini, Sotiris Manitsaris and Gavriela Senteri
Appl. Sci. 2025, 15(10), 5408; https://doi.org/10.3390/app15105408 - 12 May 2025
Abstract
Traditional ethnographic methods have long been employed to study craft practices, yet they often fall short of capturing the full depth of embodied knowledge, material interactions, and procedural workflows inherent in craftsmanship. This paper introduces a digitally enhanced ethnographic framework that integrates Motion Capture, 3D scanning, audiovisual documentation, and semantic knowledge representation to document both the tangible and dynamic aspects of craft processes. By distinguishing between endurant (tools, materials, objects) and perdurant (actions, events, transformations) entities, we propose a structured methodology for analyzing craft gestures, material behaviors, and production workflows. The study applies this proposed framework to eight European craft traditions—including glassblowing, tapestry weaving, woodcarving, porcelain pottery, marble carving, silversmithing, clay pottery, and textile weaving—demonstrating the adaptability of digital ethnographic tools across disciplines. Through a combination of multimodal data acquisition and expert-driven annotation, we present a comprehensive model for craft documentation that enhances the preservation, education, and analysis of artisanal knowledge. This research contributes to the ongoing evolution of ethnographic methods by bridging digital technology with Cultural Heritage studies, offering a robust framework for understanding the mechanics and meanings of craft practices. Full article

15 pages, 4273 KiB  
Article
Speech Emotion Recognition: Comparative Analysis of CNN-LSTM and Attention-Enhanced CNN-LSTM Models
by Jamsher Bhanbhro, Asif Aziz Memon, Bharat Lal, Shahnawaz Talpur and Madeha Memon
Signals 2025, 6(2), 22; https://doi.org/10.3390/signals6020022 - 9 May 2025
Abstract
Speech Emotion Recognition (SER) technology helps computers understand human emotions in speech, which fills a critical niche in advancing human–computer interaction and mental health diagnostics. The primary objective of this study is to enhance SER accuracy and generalization through innovative deep learning models. Despite its importance in various fields like human–computer interaction and mental health diagnosis, accurately identifying emotions from speech can be challenging due to differences in speakers, accents, and background noise. The work proposes two innovative deep learning models to improve SER accuracy: a CNN-LSTM model and an Attention-Enhanced CNN-LSTM model. These models were tested on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), collected between 2015 and 2018, which comprises 1440 audio files of male and female actors expressing eight emotions. Both models achieved impressive accuracy rates of over 96% in classifying emotions into eight categories. By comparing the CNN-LSTM and Attention-Enhanced CNN-LSTM models, this study offers comparative insights into modeling techniques, contributes to the development of more effective emotion recognition systems, and offers practical implications for real-time applications in healthcare and customer service. Full article

15 pages, 2408 KiB  
Article
From Emotion to Virality: The Use of Social Media in the Populism of Chega and VOX
by Ricardo Domínguez-García, João Pedro Baptista, Concha Pérez-Curiel and Daniela Esperança Monteiro da Fonseca
Soc. Sci. 2025, 14(5), 255; https://doi.org/10.3390/socsci14050255 - 23 Apr 2025
Abstract
This study analyses the digital communication strategies of the radical right parties VOX (Spain) and Chega (Portugal) on the social media platforms X, Instagram, and TikTok during the electoral periods. Using a comparative content analysis with quantitative and qualitative approaches, the research reveals that both parties employ a populist discourse marked by confrontation with the political elite and the use of emotional appeals to mobilize their followers. VOX directs its attacks at the left and the Spanish Prime Minister, while Chega emphasizes criticism of the political system as a whole. The results show that polarization and the evocation of emotions such as indignation, pride, and hope are central strategies in their posts. Furthermore, messages with strong emotional charge and audiovisual elements generate a greater impact, especially on TikTok and Instagram, where virality is significantly higher than on X. The study concludes that the communication strategies of these parties are based on ‘data populism’, where interaction and visibility on social media reinforce their political narratives and consolidate their base of support. Full article

22 pages, 6743 KiB  
Article
The Effect of Audiovisual Environment in Rail Transit Spaces on Pedestrian Psychological Perception
by Mingli Zhang, Xinyi Zou, Xuejun Hu, Haisheng Xie, Feng Han and Qi Meng
Buildings 2025, 15(9), 1400; https://doi.org/10.3390/buildings15091400 - 22 Apr 2025
Abstract
The environmental quality of rail transit spaces has increasingly attracted attention, as factors such as train noise and visual disturbances from elevated lines can impact pedestrians’ psychological perception through the audiovisual environment in these spaces. This study first collects audiovisual materials from rail transit spaces and pedestrian perception data through on-site surveys, measurements, VR environment simulations, and custom Deep Learning (DL) models. Using cluster analysis, the environments are categorized based on visual and auditory perceptions and evaluations of rail transit stations, delineating and classifying the spaces into different zones. The study further explores the interactive effects of audiovisual environmental factors on psychological perception within these zones. The results indicate that, based on audiovisual perception, the space within 300 m of a rail transit station can be divided into three zones and four distinct types of audiovisual perception spaces. The effect of the type of auditory environment on visual indicators was smaller than the effect of the visual environment on auditory indicators, and the category of vision had the greatest effect on the subjective indicators of hearing within Zones 1 and 2. This study not only provides a scientific basis for improving the environmental quality of rail transit station areas but also offers new perspectives and practical approaches for urban transportation planning and design. Full article

13 pages, 2815 KiB  
Article
More than Interactivity: Designing a Critical AI Game Beyond Ludo-Centrism
by Hongwei Zhou, Fandi Meng, Katherine Kosolapova and Noah Wardrip-Fruin
Humanities 2025, 14(4), 88; https://doi.org/10.3390/h14040088 - 15 Apr 2025
Abstract
This article presents our work-in-progress game Sea of Paint, aimed at exploring concerns around contemporary machine-learning-based AI technologies. It is a narrative-driven game with dialogues and a custom-made text-to-image system as its core mechanics. We identify our design approach as non-ludo-centric, that is, one that de-emphasizes the importance of mechanical interactions. We argue that contemporary game design language has largely been ludo-centric, where audiovisual and narrative aspects are framed as having somewhat static and complementary roles to rules and mechanics: as context, content, or smoothening and juicing up interactions. Although we do not believe that game design writ large has been ludo-centric, given the diversity of games in both commercial and experimental spaces, we still argue that the entanglement of design decisions across a game’s different aspects has been under-discussed. By presenting our project, we demonstrate how the interrelations across mechanical, narrative, and visual aspects help us communicate our critical AI themes more effectively and explore their potential more thoroughly. Full article
(This article belongs to the Special Issue Electronic Literature and Game Narratives)

22 pages, 1195 KiB  
Article
Harmonizing Sight and Sound: The Impact of Auditory Emotional Arousal, Visual Variation, and Their Congruence on Consumer Engagement in Short Video Marketing
by Qiang Yang, Yudan Wang, Qin Wang, Yushi Jiang and Jingpeng Li
J. Theor. Appl. Electron. Commer. Res. 2025, 20(2), 69; https://doi.org/10.3390/jtaer20020069 - 8 Apr 2025
Abstract
Social media influencers strategically design the auditory and visual features of short videos to enhance consumer engagement. Among these, auditory emotional arousal and visual variation play crucial roles, yet their interactive effects remain underexplored. Drawing on multichannel integration theory, this study applies multimodal machine learning to analyze 12,842 short videos from Douyin, integrating text analysis, sound recognition, and image processing. The results reveal an inverted U-shaped relationship between auditory emotional arousal and consumer engagement, where moderate arousal maximizes interaction while excessively high or low arousal reduces engagement. Visual variation, however, exhibits a positive linear effect, with greater variation driving higher engagement. Notably, audiovisual congruence significantly enhances engagement, as high alignment between arousal and visual variation optimizes consumer information processing. These findings advance short video marketing research by uncovering the multisensory interplay in consumer engagement. They also provide practical guidance for influencers in optimizing voice and visual design strategies to enhance content effectiveness. Full article
(This article belongs to the Topic Interactive Marketing in the Digital Era)
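The inverted U-shaped relationship described above can be captured by a simple quadratic model: engagement peaks at a moderate arousal level and falls off on either side. The coefficients below are illustrative placeholders, not estimates from the Douyin data; the inverted U only requires the quadratic term to be negative.

```python
def engagement(arousal, b0=0.2, b1=1.6, b2=-1.0):
    """Quadratic engagement model E(a) = b0 + b1*a + b2*a^2.
    With b2 < 0 this produces an inverted U over arousal."""
    return b0 + b1 * arousal + b2 * arousal ** 2

def peak_arousal(b1=1.6, b2=-1.0):
    """Arousal that maximizes engagement: dE/da = b1 + 2*b2*a = 0,
    so a* = -b1 / (2*b2)."""
    return -b1 / (2 * b2)
```

In practice such coefficients would be estimated by regressing observed engagement on arousal and its square; a negative, significant quadratic coefficient is the usual statistical evidence for an inverted U.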

21 pages, 3952 KiB  
Article
Which Factors Enhance the Perceived Restorativeness of Streetscapes: Sound, Vision, or Their Combined Effects? Insights from Four Street Types in Nanjing, China
by Xi Lu, Jiamin Xu, Eckart Lange and Jingwen Cao
Land 2025, 14(4), 757; https://doi.org/10.3390/land14040757 - 1 Apr 2025
Abstract
Streetscapes play a critical role in restorative landscapes, offering opportunities for promoting public well-being. Previous studies have predominantly examined the influence of visual and auditory stimuli on perceived restorativeness independently, leaving their interactive effects poorly understood. In this research, 360 participants completed a series of experiments across four distinct street types, covering visual comfort assessment, acoustic environment assessment, and perceived restorativeness. They were assigned to a control group or one of three experimental groups, each receiving a specific enhancement: visual stimuli, auditory stimuli, or combined audiovisual stimuli. The findings revealed that the experimental groups reported a greater sense of restorativeness than the control group. Notably, auditory stimuli demonstrated a more pronounced restorative effect than visual stimuli, while limited differences were found between auditory and audiovisual stimuli. The differences in outcomes among the four street types are compared and discussed, yielding context-specific guidelines for enhancing streetscape restorativeness. The findings highlight the value of enhancing the masking effect of the soundscape in street environmental design. The study adds a novel multi-sensory approach to the current body of research on restorative landscapes, providing significant insights for the planning and design of streetscapes. Full article

30 pages, 2781 KiB  
Article
Hybrid Multi-Attention Network for Audio–Visual Emotion Recognition Through Multimodal Feature Fusion
by Sathishkumar Moorthy and Yeon-Kug Moon
Mathematics 2025, 13(7), 1100; https://doi.org/10.3390/math13071100 - 27 Mar 2025
Abstract
Multimodal emotion recognition involves leveraging complementary relationships across modalities to enhance the assessment of human emotions. Networks that integrate diverse information sources outperform single-modal approaches while offering greater robustness against noisy or missing data. Current emotion recognition approaches often rely on cross-modal attention mechanisms, particularly between audio and visual modalities; however, these methods typically assume that the modalities are complementary. Non-complementary relationships commonly arise in real-world data, reducing the effectiveness of feature integration built on that assumption. While audio–visual co-learning provides a broader understanding of contextual information for practical implementation, discrepancies between audio and visual data, such as semantic inconsistencies, pose challenges and lay the groundwork for inaccurate predictions. Such methods are therefore limited in modeling intramodal and cross-modal interactions. To address these problems, we propose a multimodal learning framework for emotion recognition, called the Hybrid Multi-ATtention Network (HMATN). Specifically, we introduce a collaborative cross-attentional paradigm for audio–visual amalgamation, intended to capture salient features across modalities while preserving both intermodal and intramodal relationships. The model calculates cross-attention weights by analyzing the relationship between combined feature representations and the individual modalities. Meanwhile, the network employs the Hybrid Attention of Single and Parallel Cross-Modal (HASPCM) mechanism, comprising a single-modal attention component and a parallel cross-modal attention component, which exploits complementary and hidden multimodal information to enrich the feature representation. Finally, the efficiency of the proposed method is demonstrated through experiments on complex videos from the AffWild2 and AFEW-VA datasets. The results show that the developed attentional audio–visual fusion model offers a cost-efficient solution that surpasses state-of-the-art techniques, even when the input data are noisy or missing modalities. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)
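The cross-attention weights the abstract refers to follow the generic scaled dot-product pattern. The sketch below shows that textbook mechanism with one modality's features attending over another's (e.g., audio queries over visual keys/values); it illustrates the generic building block, not the HMATN architecture itself.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: each query vector (one
    modality) attends over key/value vectors (the other modality).
    Inputs are lists of equal-length feature vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

Fusion architectures like the one described stack such blocks in both directions (audio attending to video and vice versa) and combine them with single-modal attention.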

20 pages, 7113 KiB  
Article
Juggling Balls and Mathematics: An Ethnomathematical Exploration
by Giovanna Zito and Veronica Albanese
Educ. Sci. 2025, 15(3), 387; https://doi.org/10.3390/educsci15030387 - 20 Mar 2025
Abstract
Ethnomathematics, as a field of study, promotes recognizing the diversity in ways of thinking and doing mathematics, challenging the hierarchies and exclusions typical of traditional mathematics education. This research explores the practice of juggling, specifically analyzing three-ball juggling sequences to uncover the mathematical structures and patterns embedded in this ancient art form. During a workshop at a social association, two jugglers and seven juggling learners interacted with one of the researchers, a mathematics educator, to co-construct a shared model, establishing a symmetrical dialogue based on Alangui’s principles of “mutual interrogation” between the practice of juggling and the domain of mathematics. The knowledge exchange process is envisioned as a “barter” in which both the mathematics educator and the jugglers contribute their unique perspectives to generate new and hybrid understandings. Using a qualitative approach, the analysis of the data collected during the ethnographic fieldwork (notes, audiovisual recordings) shows how the initial model, created by mathematicians and jugglers, was reinterpreted to better align with the cultural community’s practice. The research revealed that juggling serves as a concrete context for exploring abstract mathematical concepts and that mathematical analysis of juggling sequences helps jugglers gain a deeper understanding of underlying structures, enhancing their creativity. The hybrid model developed in this study offers a promising resource for integrating ethnomathematical perspectives into formal mathematics education, fostering a more situated and engaging learning experience for students. Full article
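One standard formalism for analyzing juggling sequences of this kind is siteswap notation (the abstract does not say which model the workshop used). In siteswap, a sequence of throw heights is valid exactly when no two throws land on the same beat, and the average theorem says the mean throw height of a valid pattern equals the number of balls:

```python
def is_valid_siteswap(pattern):
    """Check a juggling sequence in siteswap notation: a throw of
    height t made on beat i lands on beat (i + t). The pattern is
    valid iff all landings are distinct modulo the period."""
    n = len(pattern)
    landings = sorted((i + t) % n for i, t in enumerate(pattern))
    return landings == list(range(n))

def ball_count(pattern):
    """Average theorem: for a valid pattern, the mean throw height
    equals the number of balls in the air."""
    return sum(pattern) / len(pattern)
```

For example, the basic three-ball cascade is 3 3 3, and 5 3 1 is another valid three-ball pattern, while 4 3 2 is invalid because all three throws would land on the same beat.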
