Search Results (26)

Search Parameters:
Keywords = audiovisual series

19 pages, 1779 KiB  
Article
Through the Eyes of the Viewer: The Cognitive Load of LLM-Generated vs. Professional Arabic Subtitles
by Hussein Abu-Rayyash and Isabel Lacruz
J. Eye Mov. Res. 2025, 18(4), 29; https://doi.org/10.3390/jemr18040029 - 14 Jul 2025
Viewed by 474
Abstract
As streaming platforms adopt artificial intelligence (AI)-powered subtitle systems to satisfy global demand for instant localization, the cognitive impact of these automated translations on viewers remains largely unexplored. This study used a web-based eye-tracking protocol to compare the cognitive load that GPT-4o-generated Arabic subtitles impose with that of professional human translations among 82 native Arabic speakers who viewed a 10 min episode (“Syria”) from the BBC comedy drama series State of the Union. Participants were randomly assigned to view the same episode with either professionally produced Arabic subtitles (Amazon Prime’s human translations) or machine-generated GPT-4o Arabic subtitles. In a between-subjects design, with English proficiency entered as a moderator, we collected fixation count, mean fixation duration, gaze distribution, and attention concentration (K-coefficient) as indices of cognitive processing. GPT-4o subtitles raised cognitive load on every metric; viewers produced 48% more fixations in the subtitle area, recorded 56% longer fixation durations, and spent 81.5% more time reading the automated subtitles than the professional subtitles. The subtitle area K-coefficient tripled (0.10 to 0.30), a shift from ambient scanning to focal processing. Viewers with advanced English proficiency showed the largest disruptions, which indicates that higher linguistic competence increases sensitivity to subtle translation shortcomings. These results challenge claims that large language models (LLMs) lighten viewer burden; despite fluent surface quality, GPT-4o subtitles demand far more cognitive resources than expert human subtitles and therefore reinforce the need for human oversight in audiovisual translation (AVT) and media accessibility.

27 pages, 316 KiB  
Article
Hearing Written Magic in Harry Potter Films: Insights into Power and Truth in the Scoring for In-World Written Words
by Jamie Lynn Webster
Humanities 2025, 14(6), 125; https://doi.org/10.3390/h14060125 - 10 Jun 2025
Viewed by 1460
Abstract
This paper explores how sound design in the Harry Potter film series shapes the symbolic significance of written words within the magical world. Sound mediates between language and meaning; while characters gain knowledge by reading and seeing, viewers are guided emotionally and thematically by how these written texts are framed through sound. For example, Harry’s magical identity is signalled to viewers through the score long before he fully understands himself: first through music when he speaks to a snake, then more explicitly when he receives his letter from Hogwarts. Throughout the series, characters engage with a wide array of written media (textbooks, letters, newspapers, diaries, maps, and inscriptions) that gradually shift in narrative function, from static props to dynamic, multi-sensory agents of transformation. Using a close analysis of selected scenes to examine layers of utterances, diegetic sounds, underscore, and sound design, this study draws on metaphor theory and adaptation theory to examine how sound design gives writing a metaphorical voice, sometimes framing it as character, landscape, or moral authority. As the series progresses, becoming more autonomous from the literary source, written words take on greater symbolic significance, and sound increasingly determines which texts are granted narrative power, whose voices are trusted, and how viewers interpret truth and agency across media. Ultimately, written words in the films are animated through sound into agents of growth, memory, resistance, and transformation. Thus, the audio-visual treatment of written magic reveals not just what is written, but what matters.
(This article belongs to the Special Issue Music and the Written Word)
17 pages, 1352 KiB  
Article
Fusion Classification Method Based on Audiovisual Information Processing
by Peiju Chen, Xuan Zhang, Huijun Zhao, Huiliang Cao, Xuemei Chen and Xiaochen Liu
Appl. Sci. 2025, 15(8), 4104; https://doi.org/10.3390/app15084104 - 8 Apr 2025
Cited by 1 | Viewed by 605
Abstract
In the presence of external interference, multimodal target classification plays a crucial role. Traditional single-modal classification systems are limited by the singularity of data representation and their sensitivity to environmental conditions, making it challenging to meet the robustness requirements for target classification under external disturbances. This paper addresses the inadequacies of single-modal target classification by proposing a target classification algorithm based on audiovisual fusion. The innovative contributions of this work are as follows. (1) To resolve the issue of the lack of correlation between audio signals and image signals, we introduce a method that converts audio signals into spectrograms and fuses them with target images. The advantage of this method is that the spectrogram can fully utilize the effective information in the audio, ensuring stability, while also effectively addressing the challenge of fusing one-dimensional time series audio signals with two-dimensional discrete image signals. (2) We propose a convolutional extraction and modal fusion network framework that incorporates an attention mechanism module during the fusion process, ensuring the stability and robustness of the fused data for audiovisual target classification. Validation was conducted on both a custom dataset and the YouTube-8M dataset. The experimental results indicate that the proposed method demonstrates improvements in accuracy of 2.9%, 2.4%, 1.2%, and 0.9% compared to other multimodal fusion target classification methods on the custom dataset. This demonstrates the effectiveness of the proposed multimodal fusion recognition approach and fully validates the theoretical rationale behind our method.

21 pages, 3952 KiB  
Article
Which Factors Enhance the Perceived Restorativeness of Streetscapes: Sound, Vision, or Their Combined Effects? Insights from Four Street Types in Nanjing, China
by Xi Lu, Jiamin Xu, Eckart Lange and Jingwen Cao
Land 2025, 14(4), 757; https://doi.org/10.3390/land14040757 - 1 Apr 2025
Viewed by 687
Abstract
Streetscapes play a critical role in restorative landscapes, offering opportunities for promoting public well-being. Previous studies have predominantly examined the influence of visual and auditory stimuli on perceived restorativeness independently. There is a limited understanding of their interactive effects. In this research, 360 participants completed a series of experiments considering four distinct street types, including visual comfort assessment, acoustic environment assessment, and perceived restorativeness. They were assigned to a control group and one of three experimental groups, each receiving specific enhancement: visual stimuli, auditory stimuli, or a combination of audiovisual stimuli. The findings revealed that the experimental groups reported a greater sense of restorativeness compared to the control group. Notably, auditory stimuli demonstrated a more pronounced restorative effect than visual stimuli, while limited differences were found between auditory and audiovisual stimuli. The differences in experimental outcomes among the four street types are compared and discussed, highlighting context-specific guidelines for enhancing streetscape restorativeness. The findings highlight the value of enhancing the masking effect of soundscapes in street environmental design. The study adds a novel multi-sensory approach to the current body of research on restorative landscapes, providing significant insights for the planning and design of streetscapes.

24 pages, 19241 KiB  
Article
Secular “Angels”. Para-Angelic Imagery in Popular Culture
by Urszula Jarecka
Religions 2025, 16(3), 396; https://doi.org/10.3390/rel16030396 - 20 Mar 2025
Viewed by 2092
Abstract
Religious symbols and figures are gaining new life in popular culture. Reinterpretations of symbols rooted in the visual arts tradition are appearing in film, TV series and short audiovisual forms presented on the Internet, especially on social media. This also applies to angels, the subject of the author’s research. This article discusses images of “secular angels”, decontextualized religious symbols, popularized throughout the 20th and 21st centuries in the visual media of Western culture. From the rich research material, the most characteristic images are selected for discussion and interpreted in the spirit of discourse analysis. The images of modern “angels” in the texts of popular culture refer not so much to their biblical prototypes as to the moral condition of man in consumerist, individualistic societies focused on living for pleasure. Film, TV series and Internet images of “angels” also show the controversies and social problems (such as racism) faced by contemporary Western societies.
(This article belongs to the Special Issue The Interplay between Religion and Culture)

17 pages, 1032 KiB  
Article
Mapping Australian Culture and Society in the Animated Series Bluey—The Use of Audiovisual Material in Early EFL Learning
by Amaya Arigita-García, Lidia Mañoso-Pacheco, José Luis Estrada-Chichón and Roberto Sánchez-Cabrero
Societies 2024, 14(12), 252; https://doi.org/10.3390/soc14120252 - 27 Nov 2024
Viewed by 2737
Abstract
Bluey stands as the current pinnacle in children’s television series, lauded and adorned with multiple accolades for its educational and social merits. It stands out for its portrayal of childhood social learning within familial settings, offering a realistic depiction of everyday challenges. In addition, Bluey is based on the everyday life of Australian society, clearly reflecting the country’s customs, social values, and natural environments, making it an invaluable resource for enriching the cultural learning of the English language and culture from an Australian point of view, an issue that is rarely addressed in the specialist literature. Thus, this study seeks to identify the cultural and societal facets of Australia depicted in it, with the aim of assessing its pedagogical value in teaching English to non-native learners within the context of primary education. Thirty evaluators analyzed the 52 episodes of the first season of Bluey, endeavoring to identify elements across nine thematic areas. To mitigate variances among evaluators, elements were verified only if agreement was reached by at least three evaluators. In total, evaluators identified 3327 elements representing Australian culture, comprising these categories: (1) Childhood; (2) Devices; (3) Lifestyles; (4) Food; (5) Language; (6) Sports; (7) Animals; (8) Nature; and (9) Places. A total of 1223 elements received verification by the requisite number of evaluators. The resulting catalog of Australia-specific elements per episode serves as a valuable tool in selecting the most instructive episodes for English-language and Australian cultural education for non-natives. This compilation facilitates a nuanced approach to teaching English, rooted in the diverse and culturally rich Australian context, thus breaking away from strictly British and American cultural associations and embracing a broader linguistic and cultural landscape.

17 pages, 2957 KiB  
Article
Out-of-School Exposure to English in EFL Teenage Learners: Is It Related to Academic Performance?
by Linh Tran and Imma Miralpeix
Educ. Sci. 2024, 14(4), 393; https://doi.org/10.3390/educsci14040393 - 10 Apr 2024
Viewed by 4835
Abstract
Learning a Foreign Language (FL) beyond the classroom has become common practice thanks to advances in technology and the use of English as a Lingua Franca. This study explores the types and amount of out-of-school informal exposure to English that Spanish secondary school students typically receive in their daily lives. Informed by recent literature on the influence of extramural activities on FL proficiency, the second aim of this study is to investigate the potential relationship between out-of-school exposure and academic performance, as measured by English school grades. Data were obtained from a questionnaire answered by secondary school students aged 12–16 (N = 2015) regarding the different types and amounts of activities they perform in English outside school. Findings revealed that teenage learners were most frequently exposed to English through audiovisual input. Social media interaction, along with reading and writing (with or without digital support), were closely associated with their English marks. Other popular activities, such as listening to music or playing video games, were not found to be related to proficiency or even showed a negative correlation with it, while less popular activities, such as watching subtitled movies and series, could have greater potential for language learning. This study contributes to the understanding of informal practices in FL learning settings and provides insights that can help bridge interactive language practices and formal curriculum to create holistic learning experiences for language learners.
(This article belongs to the Special Issue Informal and Incidental Second Language Vocabulary Learning)

16 pages, 1931 KiB  
Review
Film-Induced Tourism, Destination Branding and Game of Thrones: A Review of the Peñíscola de Cine Project
by Pablo Jesús Huerta-Viso, Germán Llorca Abad and Lourdes Canós-Darós
Sustainability 2024, 16(1), 186; https://doi.org/10.3390/su16010186 - 25 Dec 2023
Cited by 2 | Viewed by 4268
Abstract
This paper addresses an alternative perspective on tourism success, emphasising sustainability over traditional quantitative metrics such as arrival numbers. It explores the impact of fiction films and TV series on individuals’ mental representations of destinations featured on screen, as well as the capacity of film discourse to construct a brand aligned with local stakeholders’ interests. Qualitative methods have been employed, conducting a literature review on sustainable film tourism and destination branding. Local news and an interview with the head of the Peñíscola Film Office complemented academic insights. The primary goal is to examine the “Peñíscola de Cine” project as a paradigm of success, initiated by the city council of Peñíscola, Spain. This project positions the municipality as a natural film set through productions like Game of Thrones (2011–2019), illustrating how film can contribute to destination branding and community engagement. The study highlights the positive contribution of film tourism to sustainability by diversifying and de-seasonalising a territory’s offerings. It also attracts a more educated and environmentally conscious audience. However, it cautiously discusses the potential risks, as evidenced by misapplications in Goathland, England, and Skellig Michael, Ireland, following their appearances in Heartbeat (1992–2010) and Star Wars (1977–2019), respectively. The paper concludes by suggesting film-friendly measures for destination management organizations (DMOs), emphasising the pivotal role of film commissions and film offices in crafting effective marketing strategies and capturing the interest of audiovisual production companies.

22 pages, 3769 KiB  
Article
Multilingualism as a Functional Element, a Useful Category for the Study of the Construction and Translation of Linguistically Diverse Discourse
by Lorena Hurtado-Malillos
Languages 2023, 8(3), 198; https://doi.org/10.3390/languages8030198 - 23 Aug 2023
Viewed by 2149
Abstract
This article is a discursive and equivalence-generating study of the use of the multilingual property as a narrative transmission mechanism in audiovisual texts. Specific functions can be constructed and different events and aspects of the plot can be presented through the introduction of linguistic variation and its deliberate application to achieve defined purposes. The analysis is based on functionalist approaches to the study of fiction and translation and on the binary branching classification model of solution types for determining textual problems in translation based on the form these adopt. This article presents the findings of multilingual property identification and translation related to the application of this forms- and functions-based approach. Several classifications of solution types are also developed with representative examples extracted from film and series.

15 pages, 3752 KiB  
Article
The Processing of Audiovisual Speech Is Linked with Vocabulary in Autistic and Nonautistic Children: An ERP Study
by Kacie Dunham-Carr, Jacob I. Feldman, David M. Simon, Sarah R. Edmunds, Alexander Tu, Wayne Kuang, Julie G. Conrad, Pooja Santapuram, Mark T. Wallace and Tiffany G. Woynaroski
Brain Sci. 2023, 13(7), 1043; https://doi.org/10.3390/brainsci13071043 - 8 Jul 2023
Cited by 3 | Viewed by 2740
Abstract
Explaining individual differences in vocabulary in autism is critical, as understanding and using words to communicate are key predictors of long-term outcomes for autistic individuals. Differences in audiovisual speech processing may explain variability in vocabulary in autism. The efficiency of audiovisual speech processing can be indexed via amplitude suppression, wherein the amplitude of the event-related potential (ERP) is reduced at the P2 component in response to audiovisual speech compared to auditory-only speech. This study used electroencephalography (EEG) to measure P2 amplitudes in response to auditory-only and audiovisual speech and norm-referenced, standardized assessments to measure vocabulary in 25 autistic and 25 nonautistic children to determine whether amplitude suppression (a) differs or (b) explains variability in vocabulary in autistic and nonautistic children. A series of regression analyses evaluated associations between amplitude suppression and vocabulary scores. Both groups demonstrated P2 amplitude suppression, on average, in response to audiovisual speech relative to auditory-only speech. Between-group differences in mean amplitude suppression were nonsignificant. Individual differences in amplitude suppression were positively associated with expressive vocabulary through receptive vocabulary, as evidenced by a significant indirect effect observed across groups. The results suggest that efficiency of audiovisual speech processing may explain variance in vocabulary in autism.

10 pages, 282 KiB  
Article
Analysis of the Narratives with Characters That Make Ethnic Diversity Visible—Miraculous: Tales of Ladybug & Cat Noir
by Miriam E. Aguasanta-Regalado, Ángel San Martín Alonso and Isabel M. Gallardo-Fernández
Educ. Sci. 2023, 13(5), 460; https://doi.org/10.3390/educsci13050460 - 29 Apr 2023
Viewed by 3699
Abstract
This study follows the line of different authors who examined the visibility of ethnic diversity in children’s television series and the psychoeducational implications of these media narratives for children. Specifically, this work analyses the behaviours/actions developed by the model characters of cultural diversity and how these characters promote a perspective on diversity that conditions children. Employing a qualitative methodology, we use content analysis and critical discourse analysis as tools to be able to read, describe and interpret said content. The results highlight that these children’s programmes present a culture that reinforces certain values and behaviours. Likewise, the TV programmes analysed present stories marked by models of cultural diversity that contribute to the maintenance of certain social structures and the normalisation of inequality. We believe that educational institutions, through media education, should go deeper and teach students to look critically, deciphering codes of the audiovisual language present in the elements of children’s stories. In the complex society of the 21st century, we must consider that the needs of children change depending on how their identity intersects with aspects such as ethnicity, class, gender, etc., in order to equip them with the appropriate tools to deal with these problems.
27 pages, 3294 KiB  
Article
Evaluating the Presence of Sustainable Development Goals in Digital Teen Series: An Analytical Proposal
by Sara Valenzuela-Monreal, Javier Lozano Delmar and Rafael A. Araque-Padilla
Systems 2023, 11(4), 195; https://doi.org/10.3390/systems11040195 - 12 Apr 2023
Cited by 4 | Viewed by 3479
Abstract
In 2015, the United Nations adopted the Sustainable Development Goals. Several experts on sustainable development have highlighted the need for educational transformation to achieve them on time. Simultaneously, the influence of teen series on the personal and social development of teenagers has been increasingly demonstrated, especially after their boom through video-on-demand digital platforms. Therefore, it is worth asking how the 2030 Agenda goals are presented in teen series, especially in those of public television, such as the Spanish one, due to its commitment to young people and the SDGs (ratified in its official documents). The aim of this study is to propose an analysis tool and, subsequently, to apply it to a content analysis of the digital teen series Boca Norte. The results of the analysis reveal that social issues are presented in Boca Norte, while environmental ones are not. In addition, the results show limitations in the integration of SDG-related issues, especially because they are focused on social relationships between characters rather than on realities, contexts and consequences. The tool’s findings could impact or be linked to teenagers’ education. These conclusions prove that the proposed tool is useful, even for the development of new series for public television aligned with its public commitment.
(This article belongs to the Special Issue Communication for the Digital Media Age)

16 pages, 288 KiB  
Article
Black (W)hole Foods: Okra, Soil and Blackness in The Underground Railroad (Barry Jenkins, USA, 2021)
by William Brown
Philosophies 2022, 7(5), 117; https://doi.org/10.3390/philosophies7050117 - 14 Oct 2022
Cited by 1 | Viewed by 3382
Abstract
This essay analyses the role played by okra in The Underground Railroad, together with how it functions in relation to the soil that sustains it and which allows it to grow. I argue that okra represents an otherwise lost African past for both protagonist Cora and for the show in general and that this transplanted plant, similar to the transplanted Africans who endured the Middle Passage on the way to ‘New World’ slave plantations, survives by going through ‘black holes’, something that is not only linked poetically to the established trope of the otherwise absent Black mother but which also finds support from physics, where wormholes (similar to the holes created by worms in the soil) take us through black holes and into new worlds, realities or dimensions. This is reflected in Jenkins’s series (as well as Whitehead’s novel) by the titular Underground Railroad itself, which sees Cora and others disappear underground only to reappear in new states (the show travels from Georgia to South Carolina to North Carolina to Tennessee to Indiana and so on), as well as specifically in the show through the formal properties of the audio-visual (cinematic/televisual) medium, which, with its cuts and movements, similarly keeps shifting through space and time in a nonlinear but generative fashion. Finally, I suggest that we cannot philosophise the plant or the medium of film (or television or streaming media) without philosophising race, with The Underground Railroad serving as a means for bringing together plants and plantations, soil and wormholes and Blackness and black holes, which, collectively and playfully, I group under the umbrella term ‘black (w)hole foods’.
(This article belongs to the Special Issue Thinking Cinema—With Plants)
18 pages, 1639 KiB  
Article
Aesthetic Representation of Antisocial Personality Disorder in British Coming-of-Age TV Series
by Marta Lopera-Mármol, Manel Jiménez-Morales and Mònika Jiménez-Morales
Soc. Sci. 2022, 11(3), 133; https://doi.org/10.3390/socsci11030133 - 17 Mar 2022
Cited by 5 | Viewed by 15572
Abstract
TV series’ depictions of mental disorders have received considerable scholarly attention. However, few studies have considered the role of aesthetic elements in representing mental disorders. Therefore, in this study, we analysed how aesthetic features influence the representation of “psychopathy” in British coming-of-age TV series through the case study of The End of the F***ing World. We chose to analyse psychopathy due to its over-representation in the media and its often-mistaken conflation with the actual mental disorder of antisocial personality disorder (ASPD). We applied an aesthetic methodology in our analysis. We analysed the series in terms of language, appearance, behaviour, music and sound, technical devices, and intertextuality, closely observing three sequences of various episodes that correspond to the character’s symptoms, diagnosis, medication, and treatment. Our findings show that the aesthetic characteristics, characters, and events of the plot can act as expressive means through which the experience of living with a mental disorder can be accurately represented and simultaneously entertain viewers with drama and suspense. The series challenges the reductionist perspective and previous stereotypes of audio–visual pieces related to ASPD, suggesting that future TV series can better represent mental disorders with the correct use of television aesthetics and cinematic devices.

24 pages, 1285 KiB  
Article
Data Augmentation for Audio-Visual Emotion Recognition with an Efficient Multimodal Conditional GAN
by Fei Ma, Yang Li, Shiguang Ni, Shao-Lun Huang and Lin Zhang
Appl. Sci. 2022, 12(1), 527; https://doi.org/10.3390/app12010527 - 5 Jan 2022
Cited by 45 | Viewed by 6930
Abstract
Audio-visual emotion recognition is the task of identifying human emotional states by combining the audio modality and the visual modality simultaneously, which plays an important role in intelligent human-machine interactions. With the help of deep learning, previous works have made great progress in audio-visual emotion recognition. However, these deep learning methods often require a large amount of data for training. In reality, data acquisition is difficult and expensive, especially for the multimodal data with different modalities. As a result, the training data may be in the low-data regime, which cannot be effectively used for deep learning. In addition, class imbalance may occur in the emotional data, which can further degrade the performance of audio-visual emotion recognition. To address these problems, we propose an efficient data augmentation framework by designing a multimodal conditional generative adversarial network (GAN) for audio-visual emotion recognition. Specifically, we design generators and discriminators for audio and visual modalities. The category information is used as their shared input to make sure our GAN can generate fake data of different categories. In addition, the high dependence between the audio modality and the visual modality in the generated multimodal data is modeled based on Hirschfeld-Gebelein-Rényi (HGR) maximal correlation. In this way, we relate different modalities in the generated data to approximate the real data. Then, the generated data are used to augment our data manifold. We further apply our approach to deal with the problem of class imbalance. To the best of our knowledge, this is the first work to propose a data augmentation strategy with a multimodal conditional GAN for audio-visual emotion recognition. We conduct a series of experiments on three public multimodal datasets, including eNTERFACE’05, RAVDESS, and CMEW. The results indicate that our multimodal conditional GAN is highly effective for data augmentation of audio-visual emotion recognition.