
102 Results Found

  • Article
  • Open Access
11 Citations
7,827 Views
21 Pages

Audiovisual n-Back Training Alters the Neural Processes of Working Memory and Audiovisual Integration: Evidence of Changes in ERPs

  • Ao Guo,
  • Weiping Yang,
  • Xiangfu Yang,
  • Jinfei Lin,
  • Zimo Li,
  • Yanna Ren,
  • Jiajia Yang and
  • Jinglong Wu

(1) Background: This study investigates whether audiovisual n-back training leads to training effects on working memory and transfer effects on perceptual processing. (2) Methods: Before and after training, the participants were tested using the audi...
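For readers unfamiliar with the paradigm, the sketch below shows how responses in a 2-back task might be scored; the (auditory, visual) stimulus coding and the n = 2 level are illustrative assumptions, not the study's actual protocol.

```python
# Minimal sketch of n-back scoring (illustrative only; not the study's protocol).
# A trial is a "target" when its stimulus matches the one presented n steps earlier.

def score_n_back(stimuli, responses, n=2):
    """stimuli: list of stimulus labels; responses: list of bools (participant pressed)."""
    hits = false_alarms = targets = 0
    for i, pressed in enumerate(responses):
        is_target = i >= n and stimuli[i] == stimuli[i - n]
        targets += is_target
        if pressed and is_target:
            hits += 1
        elif pressed and not is_target:
            false_alarms += 1
    hit_rate = hits / targets if targets else 0.0
    return {"hits": hits, "false_alarms": false_alarms, "hit_rate": hit_rate}

# Example: audiovisual stimuli coded as (auditory, visual) pairs.
trials = [("A", "1"), ("B", "2"), ("A", "1"), ("C", "3"), ("A", "1")]
presses = [False, False, True, False, False]
print(score_n_back(trials, presses, n=2))
```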

  • Article
  • Open Access
7 Citations
3,419 Views
13 Pages

9 December 2020

The education of future teachers is going through a crisis related to audiovisual education. Following what is known as the 2015–2030 Agenda, the world of education is being confronted with a new learning model that must respond to the proposal...

  • Communication
  • Open Access
2 Citations
3,252 Views
9 Pages

24 January 2020

In considering capacity measures of audiovisual integration, it has become apparent that there is a wide degree of variation both within (based on unimodal and multimodal stimulus characteristics) and between participants. Recent work has discussed p...

  • Article
  • Open Access
14 Citations
4,144 Views
14 Pages

Audiovisual Training in Virtual Reality Improves Auditory Spatial Adaptation in Unilateral Hearing Loss Patients

  • Mariam Alzaher,
  • Chiara Valzolgher,
  • Grégoire Verdelet,
  • Francesco Pavani,
  • Alessandro Farnè,
  • Pascal Barone and
  • Mathieu Marx

17 March 2023

Unilateral hearing loss (UHL) leads to an alteration of binaural cues, resulting in a significant increase in spatial errors in the horizontal plane. In this study, nineteen patients with UHL were recruited and randomized in a cross-over design into...

  • Article
  • Open Access
497 Views
20 Pages

24 November 2025

Introduction: Speech perception relies on integrating auditory and visual information, shaped by both perceptual and cognitive factors. Musical training has been shown to affect multisensory processing, whereas cognitive processes, such as recalibrat...

  • Proceeding Paper
  • Open Access
3 Citations
1,462 Views
5 Pages

Audiovisual Pills as a Tool for Training and Professional Preparation

  • Graciela Padilla-Castillo,
  • Eglée Ortega-Fernández and
  • Jonattan Rodríguez-Hernández

The social and cultural context after the COVID-19 pandemic is on many levels very different from the previous context. Studies carried out during 2020 and 2021 point to an unprecedented increase in the use of social media. This research delves into...

  • Article
  • Open Access
5 Citations
3,291 Views
17 Pages

Off-Screen Sound Separation Based on Audio-visual Pre-training Using Binaural Audio

  • Masaki Yoshida,
  • Ren Togo,
  • Takahiro Ogawa and
  • Miki Haseyama

7 May 2023

This study proposes a novel off-screen sound separation method based on audio-visual pre-training. In the field of audio-visual analysis, researchers have leveraged visual information for audio manipulation tasks, such as sound source separation. Alt...

  • Article
  • Open Access
2,477 Views
15 Pages

Feasibility of Developing Audiovisual Material for Training Needs in a Vietnam Orphanage: A Mixed-Method Design

  • Patricia Jovellar-Isiegas,
  • Carolina Jiménez-Sánchez,
  • Almudena Buesa-Estéllez,
  • Pilar Gómez-Barreiro,
  • Inés Alonso-Langa,
  • Sandra Calvo and
  • Marina Francín-Gallego

Disabled children living in orphanages in low-income countries may not have access to the therapy they need. The COVID-19 pandemic has complicated the situation dramatically, making online training activities a possible innovative option to meet the...

  • Article
  • Open Access
5 Citations
3,650 Views
18 Pages

8 January 2022

Nowadays, audiovisual media play a central role in access to information and in personal relationships. Among the audiovisual media is cinema, which, due to its heterogeneous nature, can fulfill diverse educational functions. The objective of this stu...

  • Article
  • Open Access
2 Citations
2,840 Views
12 Pages

An Investigation into Audio–Visual Speech Recognition under a Realistic Home–TV Scenario

  • Bing Yin,
  • Shutong Niu,
  • Haitao Tang,
  • Lei Sun,
  • Jun Du,
  • Zhenhua Ling and
  • Cong Liu

23 March 2023

Robust speech recognition in real-world situations is still an important problem, especially when it is affected by environmental interference factors and conversational multi-speaker interactions. Supplementing audio information with other modalitie...

  • Article
  • Open Access
1 Citation
2,421 Views
25 Pages

Emotional Induction Among Firefighters Using Audiovisual Stimuli: An Experimental Study

  • Frédéric Antoine-Santoni,
  • Arielle Syssau,
  • Claude Devichi,
  • Jean-Louis Rossi,
  • Thierry Marcelli,
  • François-Joseph Chatelon,
  • Adil Yakhloufi,
  • Pauline-Marie Ortoli,
  • Sofiane Meradji and
  • Dominique Grandjean-Kruslin
  • 3 additional authors

14 March 2025

This study investigates the effectiveness of immersive audiovisual simulations in eliciting emotional responses and replicating the psychological and cognitive demands of high-risk operational environments, particularly in firefighting scenarios. Con...

  • Article
  • Open Access
42 Citations
8,828 Views
22 Pages

Speech Processing for Language Learning: A Practical Approach to Computer-Assisted Pronunciation Teaching

  • Natalia Bogach,
  • Elena Boitsova,
  • Sergey Chernonog,
  • Anton Lamtev,
  • Maria Lesnichaya,
  • Iurii Lezhenin,
  • Andrey Novopashenny,
  • Roman Svechnikov,
  • Daria Tsikach and
  • John Blake
  • 2 additional authors

This article contributes to the discourse on how contemporary computer and information technology may help in improving foreign language learning not only by supporting better and more flexible workflow and digitizing study materials but also through...

  • Article
  • Open Access
16 Citations
9,730 Views
14 Pages

25 November 2019

Speaker diarization systems aim to find ‘who spoke when?’ in multi-speaker recordings. The dataset usually consists of meetings, TV/talk shows, telephone and multi-party interaction recordings. In this paper, we propose a novel multimodal...
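As a rough illustration of the ‘who spoke when?’ output that diarization systems produce, the sketch below collapses frame-level speaker labels into time-stamped segments; the 10 ms frame step is an assumed value, and this is not the multimodal method proposed in the paper.

```python
# Illustrative only: turn frame-level speaker labels into (start, end, speaker) segments.
# Assumes a fixed frame step of 0.01 s; real diarization systems also handle overlap and silence.

def frames_to_segments(labels, frame_step=0.01):
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((round(start * frame_step, 2),
                             round(i * frame_step, 2),
                             labels[start]))
            start = i
    return segments

print(frames_to_segments(["spk1"] * 150 + ["spk2"] * 50))
# [(0.0, 1.5, 'spk1'), (1.5, 2.0, 'spk2')]
```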

  • Opinion
  • Open Access
1 Citation
2,044 Views
6 Pages

Face-to-face communication is one of the most common means of communication in daily life. We benefit from both auditory and visual speech signals that lead to better language understanding. People prefer face-to-face communication when access to aud...

  • Article
  • Open Access
57 Citations
8,004 Views
24 Pages

5 January 2022

Audio-visual emotion recognition is the task of identifying human emotional states by combining the audio modality and the visual modality simultaneously, which plays an important role in intelligent human-machine interactions. With the help of d...

  • Article
  • Open Access
1 Citation
752 Views
13 Pages

Enhanced Cross-Audiovisual Perception in High-Level Martial Arts Routine Athletes Stems from Increased Automatic Processing Capacity

  • Xiaohan Wang,
  • Zeshuai Wang,
  • Ya Gao,
  • Wu Jiang,
  • Zikang Meng,
  • Tianxin Gu,
  • Zonghao Zhang,
  • Haoping Yang and
  • Li Luo

29 July 2025

Multisensory integration is crucial for effective cognitive functioning, especially in complex tasks such as those requiring rapid audiovisual information processing. High-level martial arts routine athletes, trained in integrating visual and auditor...

  • Article
  • Open Access
8 Citations
5,306 Views
12 Pages

Knowledge in Images and Sounds: Informative, Narrative and Aesthetic Analysis of the Video for MOOC

  • Mario Rajas-Fernández,
  • Manuel Gértrudix-Barrio and
  • Miguel Baños-González

The virtual courses developed by higher education institutions incorporate the video format as one of the most used resources in the delivery of their online training offerings. Within the different types of audiovisual productions found in MOOCs, the in...

  • Article
  • Open Access
24 Citations
6,419 Views
17 Pages

While VR-based training has been proven to improve learning effectiveness over conventional methods, there is a lack of study on its learning effectiveness due to the implementation of training modes. This study aims to investigate the learning effec...

  • Article
  • Open Access
1 Citation
1,094 Views
22 Pages

The Effect of Audiovisual Environment in Rail Transit Spaces on Pedestrian Psychological Perception

  • Mingli Zhang,
  • Xinyi Zou,
  • Xuejun Hu,
  • Haisheng Xie,
  • Feng Han and
  • Qi Meng

22 April 2025

The environmental quality of rail transit spaces has increasingly attracted attention, as factors such as train noise and visual disturbances from elevated lines can impact pedestrians’ psychological perception through the audiovisual environme...

  • Article
  • Open Access
3 Citations
3,090 Views
22 Pages

Multi-Corpus Learning for Audio–Visual Emotions and Sentiment Recognition

  • Elena Ryumina,
  • Maxim Markitantov and
  • Alexey Karpov

15 August 2023

Recognition of emotions and sentiment (affective states) from human audio–visual information is widely used in healthcare, education, entertainment, and other fields; therefore, it has become a highly active research area. The large variety of...

  • Article
  • Open Access
2,658 Views
20 Pages

27 November 2023

People rely on multiple learning systems to complete weather prediction (WP) tasks with visual cues. However, how people perform in audio and audiovisual modalities remains elusive. The present research investigated how the cue modality influences pe...

  • Article
  • Open Access
1 Citation
1,905 Views
13 Pages

20 April 2025

Background: Foreign body airway obstruction is a sudden emergency that can occur unexpectedly in healthy people, leading to severe consequences if immediate first aid is not provided. Unlike the Heimlich maneuver for adults, the first aid for infant...

  • Article
  • Open Access
4 Citations
2,286 Views
10 Pages

System for Game-like Therapy in Balance Issues Using Audiovisual Feedback and Force Platform

  • Markéta Janatová,
  • Jakub Pětioký,
  • Kristýna Hoidekrová,
  • Tomáš Veselý,
  • Karel Hána,
  • Pavel Smrčka,
  • Lubomír Štěpánek,
  • Marcela Lippert-Grünerová and
  • Jaroslav Jeřábek

Background: The aim of the work is to verify the usability of a stabilometric platform and audiovisual feedback in the group-based therapy of patients with vertebral algic syndrome, to analyze an immediate effect after a single therapeutic unit, and...

  • Article
  • Open Access
3,364 Views
17 Pages

Speech recognition approaches typically fall into three categories: audio, visual, and audio–visual. Visual speech recognition, or lip reading, is the most difficult because visual cues are ambiguous and data is scarce. To address these challen...

  • Article
  • Open Access
1 Citation
2,555 Views
21 Pages

Two-Stage Fusion-Based Audiovisual Remote Sensing Scene Classification

  • Yaming Wang,
  • Yiyang Liu,
  • Wenqing Huang,
  • Xiaoping Ye and
  • Mingfeng Jiang

30 October 2023

Scene classification in remote sensing is a pivotal research area, traditionally relying on visual information from aerial images for labeling. The introduction of ground environment audio as a novel geospatial data source adds valuable information f...

  • Article
  • Open Access
10 Citations
5,787 Views
19 Pages

24 November 2017

Although isomorphic pitch layouts are proposed to afford various advantages for musicians playing new musical instruments, this paper details the first substantive set of empirical tests on how two fundamental aspects of isomorphic pitch layouts affe...

  • Article
  • Open Access
11 Citations
7,211 Views
16 Pages

Research on Robust Audio-Visual Speech Recognition Algorithms

  • Wenfeng Yang,
  • Pengyi Li,
  • Wei Yang,
  • Yuxing Liu,
  • Yulong He,
  • Ovanes Petrosian and
  • Aleksandr Davydenko

5 April 2023

Automatic speech recognition (ASR) that relies on audio input suffers from significant degradation in noisy conditions and is particularly vulnerable to speech interference. However, video recordings of speech capture both visual and audio signals, p...

  • Article
  • Open Access
12 Citations
6,233 Views
22 Pages

Multimodal emotion recognition has emerged as a promising approach to capture the complex nature of human emotions by integrating information from various sources such as physiological signals, visual behavioral cues, and audio-visual content. Howeve...

  • Article
  • Open Access
5 Citations
3,596 Views
15 Pages

Multimodal Lip-Reading for Tracheostomy Patients in the Greek Language

  • Yorghos Voutos,
  • Georgios Drakopoulos,
  • Georgios Chrysovitsiotis,
  • Zoi Zachou,
  • Dimitris Kikidis,
  • Efthymios Kyrodimos and
  • Themis Exarchos

28 February 2022

Voice loss constitutes a crucial disorder which is highly associated with social isolation. The use of multimodal information sources, such as audiovisual information, is crucial since it can lead to the development of straightforward personalized w...

  • Article
  • Open Access
2,768 Views
15 Pages

Multimodal Diarization Systems by Training Enrollment Models as Identity Representations

  • Victoria Mingote,
  • Ignacio Viñals,
  • Pablo Gimeno,
  • Antonio Miguel,
  • Alfonso Ortega and
  • Eduardo Lleida

21 January 2022

This paper describes a post-evaluation analysis of the system developed by ViVoLAB research group for the IberSPEECH-RTVE 2020 Multimodal Diarization (MD) Challenge. This challenge focuses on the study of multimodal systems for the diarization of aud...

  • Article
  • Open Access
20 Citations
7,868 Views
31 Pages

Speech Emotion Recognition Using Convolutional Neural Networks with Attention Mechanism

  • Konstantinos Mountzouris,
  • Isidoros Perikos and
  • Ioannis Hatzilygeroudis

23 October 2023

Speech emotion recognition (SER) is an interesting and difficult problem to handle. In this paper, we deal with it through the implementation of deep learning networks. We have designed and implemented six different deep learning networks, a deep bel...
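The six networks themselves are not reproduced in this listing, but a minimal PyTorch sketch of the general idea named in the title, a small CNN over a spectrogram followed by attention pooling over time, might look like the following; the layer sizes and the number of emotion classes are assumptions.

```python
# Minimal sketch (not the paper's architecture): CNN features + attention pooling for SER.
import torch
import torch.nn as nn

class CnnAttentionSER(nn.Module):
    def __init__(self, n_mels=64, n_classes=6):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat_dim = 32 * (n_mels // 4)
        self.attn = nn.Linear(feat_dim, 1)      # scores each time step
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, spec):                    # spec: (batch, 1, n_mels, time)
        h = self.cnn(spec)                      # (batch, 32, n_mels/4, time/4)
        h = h.flatten(1, 2).transpose(1, 2)     # (batch, time/4, feat_dim)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights over time
        pooled = (w * h).sum(dim=1)             # weighted temporal pooling
        return self.classifier(pooled)

logits = CnnAttentionSER()(torch.randn(2, 1, 64, 200))
print(logits.shape)  # torch.Size([2, 6])
```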

  • Article
  • Open Access
25 Citations
5,529 Views
22 Pages

Emotion Elicitation Under Audiovisual Stimuli Reception: Should Artificial Intelligence Consider the Gender Perspective?

  • Marian Blanco-Ruiz,
  • Clara Sainz-de-Baranda,
  • Laura Gutiérrez-Martín,
  • Elena Romero-Perales and
  • Celia López-Ongil

Identification of emotions triggered by differently sourced stimuli can be applied to automatic systems that help, relieve or protect vulnerable population groups. The selection of the best stimuli makes it possible to train these artificial intelligence-based...

  • Article
  • Open Access
14 Citations
6,308 Views
12 Pages

7 February 2023

This paper investigates multimodal sensor architectures with deep learning for audio-visual speech recognition, focusing on in-the-wild scenarios. The term “in the wild” is used to describe AVSR for unconstrained natural-language audio st...

  • Article
  • Open Access
4 Citations
3,386 Views
17 Pages

19 January 2023

In this work we present a bimodal multitask network for audiovisual biometric recognition. The proposed network performs the fusion of features extracted from face and speech data through a weighted sum to jointly optimize the contribution of each mo...
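The abstract describes fusing face and speech features through a weighted sum; a minimal sketch of such a fusion layer, with a learnable mixing weight and assumed embedding and identity-class sizes, could look like this.

```python
# Illustrative weighted-sum fusion of face and speech embeddings (assumed 256-d each).
import torch
import torch.nn as nn

class WeightedSumFusion(nn.Module):
    def __init__(self, dim=256, n_identities=100):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.5))  # learnable modality weight
        self.head = nn.Linear(dim, n_identities)

    def forward(self, face_emb, speech_emb):
        a = torch.sigmoid(self.alpha)                 # keep the weight in (0, 1)
        fused = a * face_emb + (1 - a) * speech_emb   # weighted sum of the two modalities
        return self.head(fused)

model = WeightedSumFusion()
out = model(torch.randn(4, 256), torch.randn(4, 256))
print(out.shape)  # torch.Size([4, 100])
```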

  • Article
  • Open Access
1 Citation
1,392 Views
12 Pages

23 August 2023

Compared to traditional unimodal methods, multimodal audio-visual correspondence learning has many advantages in the field of video understanding, but it also faces significant challenges. In order to fully utilize the feature information from both m...

  • Article
  • Open Access
24 Citations
6,474 Views
12 Pages

Recognition of Emotions in Speech Using Convolutional Neural Networks on Different Datasets

  • Marta Zielonka,
  • Artur Piastowski,
  • Andrzej Czyżewski,
  • Paweł Nadachowski,
  • Maksymilian Operlejn and
  • Kamil Kaczor

21 November 2022

Artificial Neural Network (ANN) models, specifically Convolutional Neural Networks (CNN), were applied to extract emotions based on spectrograms and mel-spectrograms. This study uses spectrograms and mel-spectrograms to investigate which feature extr...
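As a rough illustration of the kind of input such CNN models consume, the snippet below computes a log mel-spectrogram with librosa; the file path and parameter values are placeholders, not those used in the study.

```python
# Illustrative only: compute a log mel-spectrogram as CNN input for speech emotion recognition.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=16000)           # file path is a placeholder
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64,
                                     n_fft=1024, hop_length=256)
log_mel = librosa.power_to_db(mel, ref=np.max)             # shape: (n_mels, frames), in dB
print(log_mel.shape)
```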

  • Article
  • Open Access
1,468 Views
16 Pages

11 June 2025

As a multimodal fusion task, audio-visual segmentation (AVS) aims to locate sounding objects at the pixel level within a given image. This capability holds significant importance and practical value in applications such as intelligent surveillance, m...

  • Article
  • Open Access
37 Citations
5,187 Views
21 Pages

28 September 2020

Speech emotion recognition (SER) classifies emotions using low-level features or a spectrogram of an utterance. When SER methods are trained and tested using different datasets, they have shown performance reduction. Cross-corpus SER research identif...

  • Article
  • Open Access
12 Citations
4,305 Views
13 Pages

9 August 2020

Deep learning (DL) models have emerged in recent years as the state-of-the-art technique across numerous machine learning application domains. In particular, image processing-related tasks have seen a significant improvement in terms of performance d...

  • Article
  • Open Access
11 Citations
3,424 Views
12 Pages

Recently, automatic speech recognition (ASR) and visual speech recognition (VSR) have been widely researched owing to the development in deep learning. Most VSR research works focus only on frontal face images. However, assuming real scenes, it is ob...

  • Article
  • Open Access
1 Citation
2,892 Views
27 Pages

Location-Based Game for Thought-Provoking Evacuation Training

  • Hiroyuki Mitsuhara,
  • Chie Tanimura,
  • Junko Nemoto and
  • Masami Shishibori

Participation in evacuation training can aid survival in the event of an unpredictable disaster, such as an earthquake. However, conventional evacuation training is not well designed for provoking critical thinking in participants regarding the proce...

  • Article
  • Open Access
3 Citations
3,122 Views
35 Pages

Audiovisual Tracking of Multiple Speakers in Smart Spaces

  • Frank Sanabria-Macias,
  • Marta Marron-Romera and
  • Javier Macias-Guarasa

5 August 2023

This paper presents GAVT, a highly accurate audiovisual 3D tracking system based on particle filters and a probabilistic framework, employing a single camera and a microphone array. Our first contribution is a complex visual appearance model that acc...
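For context on the particle-filter backbone mentioned above, a bare-bones one-dimensional predict/weight/resample step, purely illustrative and unrelated to GAVT's actual appearance and acoustic models, is sketched below.

```python
# Bare-bones 1-D particle filter step (illustrative; not GAVT's audiovisual model).
import numpy as np

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, 500)                 # initial state hypotheses

def pf_step(particles, measurement, motion_std=0.1, meas_std=0.5):
    # Predict: propagate each particle with a simple random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.size)
    # Weight: Gaussian likelihood of the measurement given each particle.
    weights = np.exp(-0.5 * ((measurement - particles) / meas_std) ** 2)
    weights /= weights.sum()
    # Resample: draw particles in proportion to their weights.
    idx = rng.choice(particles.size, particles.size, p=weights)
    return particles[idx]

for z in [0.2, 0.4, 0.5, 0.7]:                        # fake measurements
    particles = pf_step(particles, z)
print(particles.mean())                               # state estimate near 0.7
```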

  • Article
  • Open Access
28 Citations
5,225 Views
26 Pages

Mixed-Reality Demonstration and Training of Glassblowing

  • Anne Laure Carre,
  • Arnaud Dubois,
  • Nikolaos Partarakis,
  • Xenophon Zabulis,
  • Nikolaos Patsiouras,
  • Elina Mantinaki,
  • Emmanouil Zidianakis,
  • Nedjma Cadi,
  • Evangelia Baka and
  • Sotirios Manitsaris
  • 3 additional authors

2 January 2022

Traditional crafts exhibit tangible and intangible dimensions. Intangible dimensions include the practitioner’s gestural know-how in craft practice and have received less attention than tangible dimensions in digitization projects. This work...

  • Article
  • Open Access
6 Citations
5,130 Views
19 Pages

Chilean Primary Learners’ Motivation and Attitude towards English as a Foreign Language

  • Maria-Jesus Inostroza,
  • Cristhian Perez-Villalobos and
  • Pia Tabalí

This study aims to identify motivational and attitude variables among Chilean young English learners from Concepción. A child-appropriate Likert scale questionnaire was distributed to 137 students from the 3rd, 4th, and 5th grade of two state-...

  • Article
  • Open Access
38 Citations
8,480 Views
19 Pages

EmoTour: Estimating Emotion and Satisfaction of Users Based on Behavioral Cues and Audiovisual Data

  • Yuki Matsuda,
  • Dmitrii Fedotov,
  • Yuta Takahashi,
  • Yutaka Arakawa,
  • Keiichi Yasumoto and
  • Wolfgang Minker

15 November 2018

With the spread of smart devices, people may obtain a variety of information on their surrounding environment thanks to sensing technologies. To design more context-aware systems, psychological user context (e.g., emotional status) is a substantial f...

  • Article
  • Open Access
13 Citations
5,634 Views
11 Pages

There is a strong correlation between the like/dislike responses to audio–visual stimuli and the emotional arousal and valence reactions of a person. In the present work, our attention is focused on the automated detection of dislike responses...

  • Article
  • Open Access
2,486 Views
10 Pages

Light and Color: Art and Science Using an Interdisciplinary Approach in Primary Education Teacher Training

  • Francisco Javier Serón Torrecilla,
  • Ana de Echave Sanz,
  • Carlos Rodríguez Casals,
  • Eva M. Terrado Sieso,
  • Jorge Pozuelo Muñoz and
  • Esther Cascarosa Salillas

4 November 2024

Light and color present complex interactions whose understanding is not always intuitive. Over the past forty years, numerous studies have aimed to identify preconceived notions and the various models employed in schools. Numerous teaching and learni...

  • Article
  • Open Access
5 Citations
5,460 Views
20 Pages

Developing a Dataset of Audio Features to Classify Emotions in Speech

  • Alvaro A. Colunga-Rodriguez,
  • Alicia Martínez-Rebollar,
  • Hugo Estrada-Esquivel,
  • Eddie Clemente and
  • Odette A. Pliego-Martínez

Emotion recognition in speech has gained increasing relevance in recent years, enabling more personalized interactions between users and automated systems. This paper presents the development of a dataset of features obtained from RAVDESS (Ryerson Au...
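As a rough sketch of how such a feature table can be assembled, the snippet below extracts a few per-utterance features with librosa; the directory layout, feature choices and MFCC count are assumptions, not the authors' pipeline.

```python
# Illustrative only: extract simple per-utterance audio features into a table.
import glob
import librosa
import numpy as np
import pandas as pd

rows = []
for path in glob.glob("ravdess/*.wav"):              # placeholder directory layout
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    rows.append({
        "file": path,
        "rms_mean": float(librosa.feature.rms(y=y).mean()),
        **{f"mfcc{i}_mean": float(m) for i, m in enumerate(mfcc.mean(axis=1))},
    })
features = pd.DataFrame(rows)
print(features.head())
```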

  • Article
  • Open Access
1 Citation
2,344 Views
20 Pages

29 April 2024

Auscultation of heart sounds is an important veterinary skill requiring an understanding of anatomy, physiology, pathophysiology and pattern recognition. This cross-sectional study was developed to evaluate a targeted, audio-visual training resource...
