Search Results (3)

Search Parameters:
Keywords = audiovisual cuts

17 pages, 439 KB  
Article
MultiAVSR: Robust Speech Recognition via Supervised Multi-Task Audio–Visual Learning
by Shad Torrie, Kimi Wright and Dah-Jye Lee
Electronics 2025, 14(12), 2310; https://doi.org/10.3390/electronics14122310 - 6 Jun 2025
Viewed by 3272
Abstract
Speech recognition approaches typically fall into three categories: audio, visual, and audio–visual. Visual speech recognition, or lip reading, is the most difficult because visual cues are ambiguous and data is scarce. To address these challenges, we present MultiAVSR, a new multi-task audio–visual speech recognition framework for training a model on all three types of speech recognition simultaneously, primarily to improve visual speech recognition. Unlike prior works, which use separate models or complex semi-supervision, our framework employs a supervised multi-task hybrid Connectionist Temporal Classification/Attention loss, cutting training exaFLOPs to just 18% of those required by semi-supervised multi-task models. MultiAVSR achieves a state-of-the-art visual speech recognition word error rate of 21.0% on the LRS3-TED dataset. Furthermore, it exhibits robust generalization capabilities, achieving a remarkable 44.7% word error rate on the WildVSR dataset. Our framework also demonstrates reduced dependency on external language models, which is critical for real-time visual speech recognition. For the audio and audio–visual tasks, our framework improves robustness in various noisy environments, with average relative word error rate improvements of 16% and 31%, respectively. These improvements across the three tasks illustrate the robust results our supervised multi-task speech recognition framework enables.
(This article belongs to the Special Issue Advances in Information, Intelligence, Systems and Applications)
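The abstract above names a supervised multi-task hybrid CTC/Attention loss but does not give its weights or say how the three tasks are combined. As a rough illustration only, the sketch below uses the widely used hybrid formulation (a weighted sum of a CTC loss and an attention loss) and assumes the per-task losses are simply added. The function names, the `ctc_weight` default and the equal task weighting are all hypothetical, not taken from the article.

```python
from typing import Dict, Tuple


def hybrid_loss(ctc_loss: float, attention_loss: float, ctc_weight: float = 0.1) -> float:
    """Weighted sum of a CTC loss and an attention (cross-entropy) loss.

    ctc_weight is a hypothetical default; the abstract does not report it.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


def multitask_loss(task_losses: Dict[str, Tuple[float, float]], ctc_weight: float = 0.1) -> float:
    """Sum the hybrid loss over tasks (assumed: equal weight per task).

    task_losses maps a task name to its (ctc_loss, attention_loss) pair.
    """
    return sum(hybrid_loss(ctc, att, ctc_weight) for ctc, att in task_losses.values())


if __name__ == "__main__":
    # Made-up loss values for the three tasks named in the abstract.
    example = {
        "audio": (2.0, 1.0),
        "visual": (4.0, 2.0),
        "audio_visual": (2.0, 1.0),
    }
    print(multitask_loss(example, ctc_weight=0.5))
```

In a real training loop the two loss terms would be tensors produced by the model's CTC and attention decoders; plain floats are used here only to keep the sketch self-contained.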
16 pages, 288 KB  
Article
Black (W)hole Foods: Okra, Soil and Blackness in The Underground Railroad (Barry Jenkins, USA, 2021)
by William Brown
Philosophies 2022, 7(5), 117; https://doi.org/10.3390/philosophies7050117 - 14 Oct 2022
Cited by 1 | Viewed by 4050
Abstract
This essay analyses the role played by okra in The Underground Railroad, together with how it functions in relation to the soil that sustains it and which allows it to grow. I argue that okra represents an otherwise lost African past for both protagonist Cora and for the show in general and that this transplanted plant, similar to the transplanted Africans who endured the Middle Passage on the way to ‘New World’ slave plantations, survives by going through ‘black holes’, something that is not only linked poetically to the established trope of the otherwise absent Black mother but which also finds support from physics, where wormholes (similar to the holes created by worms in the soil) take us through black holes and into new worlds, realities or dimensions. This is reflected in Jenkins’s series (as well as Whitehead’s novel) by the titular Underground Railroad itself, which sees Cora and others disappear underground only to reappear in new states (the show travels from Georgia to South Carolina to North Carolina to Tennessee to Indiana and so on), as well as specifically in the show through the formal properties of the audio-visual (cinematic/televisual) medium, which, with its cuts and movements, similarly keeps shifting through space and time in a nonlinear but generative fashion. Finally, I suggest that we cannot philosophise the plant or the medium of film (or television or streaming media) without philosophising race, with The Underground Railroad serving as a means for bringing together plants and plantations, soil and wormholes and Blackness and black holes, which, collectively and playfully, I group under the umbrella term ‘black (w)hole foods’.
(This article belongs to the Special Issue Thinking Cinema—With Plants)
10 pages, 884 KB  
Article
Brain Symmetry in Alpha Band When Watching Cuts in Movies
by Celia Andreu-Sánchez, Miguel Ángel Martín-Pascual, Agnès Gruart and José María Delgado-García
Symmetry 2022, 14(10), 1980; https://doi.org/10.3390/sym14101980 - 22 Sep 2022
Cited by 3 | Viewed by 1952
Abstract
The purpose of this study is to determine if there is asymmetry in brain activity between the two hemispheres while watching cuts in movies. We presented videos with cuts to 36 participants, recorded electrical brain activity through electroencephalography (EEG) and analyzed asymmetry in frontal, somatomotor, temporal, parietal and occipital areas. EEG power and alpha (8–13 Hz) asymmetry were analyzed based on 4032 epochs (112 epochs from videos × 36 participants) in each hemisphere. On average, we found negative asymmetry, indicating a greater alpha power in the left hemisphere and a greater activity in the right hemisphere in frontal, temporal and occipital areas. The opposite was found in somatomotor and temporal areas. However, given the high inter-subject variability, these asymmetries did not seem to be significant. Our results suggest that cuts in audiovisuals do not provoke any specific asymmetrical brain activity in the alpha band in viewers. We conclude that brain asymmetry when decoding audiovisual content may be more related to narrative content than to formal style.
(This article belongs to the Special Issue Cognitive Neuroscience and Symmetry)
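The abstract above does not state how the asymmetry index was computed. One common convention, assumed here, is the difference of log alpha power, ln(right) minus ln(left), which is negative when alpha power is greater in the left hemisphere, matching the sign described in the abstract. The sketch below applies that assumed formula to made-up power values and checks the epoch count the abstract reports; the function names and sample numbers are hypothetical.

```python
import math
from typing import Iterable, Tuple


def alpha_asymmetry(left_power: float, right_power: float) -> float:
    """Assumed index: ln(right) - ln(left) of alpha-band (8-13 Hz) power.

    Negative values mean greater alpha power over the left hemisphere.
    """
    if left_power <= 0 or right_power <= 0:
        raise ValueError("band power must be positive")
    return math.log(right_power) - math.log(left_power)


def mean_asymmetry(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average the index over (left_power, right_power) pairs, one per epoch."""
    values = [alpha_asymmetry(left, right) for left, right in pairs]
    if not values:
        raise ValueError("at least one epoch is required")
    return sum(values) / len(values)


# Epoch count reported in the abstract: 112 epochs per participant, 36 participants.
EPOCHS_PER_PARTICIPANT = 112
PARTICIPANTS = 36
TOTAL_EPOCHS = EPOCHS_PER_PARTICIPANT * PARTICIPANTS

if __name__ == "__main__":
    # Made-up alpha power values (arbitrary units) for three epochs.
    sample = [(2.0, 1.5), (1.8, 1.7), (2.2, 2.0)]
    print(TOTAL_EPOCHS, mean_asymmetry(sample))
```

With high variability between participants, a mean index like this can sit close to zero even when individual values are large, which is one way to read the abstract's finding that the observed asymmetries were not significant.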