Search Results (28)

Search Parameters:
Keywords = talking face generation

21 pages, 2869 KiB  
Article
Multimodal Feature-Guided Audio-Driven Emotional Talking Face Generation
by Xueping Wang, Yuemeng Huo, Yanan Liu, Xueni Guo, Feihu Yan and Guangzhe Zhao
Electronics 2025, 14(13), 2684; https://doi.org/10.3390/electronics14132684 - 2 Jul 2025
Viewed by 548
Abstract
Audio-driven emotional talking face generation aims to generate talking face videos with rich facial expressions and temporal coherence. Current diffusion model-based approaches predominantly depend on either single-label emotion annotations or external video references, which often struggle to capture the complex relationships between modalities, resulting in less natural emotional expressions. To address these issues, we propose MF-ETalk, a multimodal feature-guided method for emotional talking face generation. Specifically, we design an emotion-aware multimodal feature disentanglement and fusion framework that leverages Action Units (AUs) to disentangle facial expressions and models the nonlinear relationships among AU features using a residual encoder. Furthermore, we introduce a hierarchical multimodal feature fusion module that enables dynamic interactions among audio, visual cues, AUs, and motion dynamics. This module is optimized through global motion modeling, lip synchronization, and expression subspace learning, enabling full-face dynamic generation. Finally, an emotion-consistency constraint module is employed to refine the generated results and ensure the naturalness of expressions. Extensive experiments on the MEAD and HDTF datasets demonstrate that MF-ETalk outperforms state-of-the-art methods in both expression naturalness and lip-sync accuracy. For example, it achieves an FID of 43.052 and E-FID of 2.403 on MEAD, along with strong synchronization performance (LSE-C of 6.781, LSE-D of 7.962), confirming the effectiveness of our approach in producing realistic and emotionally expressive talking face videos. Full article
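
As a rough, hypothetical sketch of the kind of residual encoding over Action Unit (AU) features described in this abstract (not the authors' MF-ETalk code; the AU count and hidden size are assumptions), a minimal PyTorch module might look like this:

```python
# Hypothetical sketch (not the authors' implementation): a residual encoder
# over Action Unit (AU) feature vectors, one way to model nonlinear
# relationships among AUs before fusing them with audio/visual features.
import torch
import torch.nn as nn


class ResidualAUEncoder(nn.Module):
    def __init__(self, num_aus: int = 17, hidden_dim: int = 128, num_blocks: int = 3):
        super().__init__()
        self.input_proj = nn.Linear(num_aus, hidden_dim)
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
            )
            for _ in range(num_blocks)
        ])

    def forward(self, au: torch.Tensor) -> torch.Tensor:
        # au: (batch, num_aus) AU intensities for one frame
        x = self.input_proj(au)
        for block in self.blocks:
            x = x + block(x)           # residual connection
        return x                       # (batch, hidden_dim) AU embedding


if __name__ == "__main__":
    encoder = ResidualAUEncoder()
    dummy_au = torch.rand(4, 17)       # 4 frames, 17 AU intensities each
    print(encoder(dummy_au).shape)     # torch.Size([4, 128])
```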

15 pages, 259 KiB  
Article
Researching Young People and Far-Right Populism
by Pam Nilan
Soc. Sci. 2025, 14(5), 270; https://doi.org/10.3390/socsci14050270 - 28 Apr 2025
Viewed by 1295
Abstract
This paper considers the challenges facing qualitative researchers who study far-right populism and youth. First, there is the question of the method itself. Across the relevant literature, it seems more popular to use online methodologies rather than conduct face-to-face interviews. This is not surprising given the difficulties of talking face-to-face with a specific cohort of young people who are often suspicious of outsiders and who may even pose a personal security risk to the interviewer. Second, the age, gender, and institutional status of a researcher may constitute an obstacle to the effectiveness of a face-to-face interview. Common features of far-right populism are mistrust of elites and misogyny. Moreover, the online world of youth today is a dynamic technological sphere that may be hard to grasp for someone from a previous generation. This paper is a reflective essay that uses examples of research in action. It aims to invite reader reflection on attuning research approaches to the lived experiences of youth drawn to far-right populism. Full article
(This article belongs to the Special Issue Researching Youth on the Move: Methods, Ethics and Emotions)
15 pages, 3962 KiB  
Article
Continuous Talking Face Generation Based on Gaussian Blur and Dynamic Convolution
by Ying Tang, Yazhi Liu and Wei Li
Sensors 2025, 25(6), 1885; https://doi.org/10.3390/s25061885 - 18 Mar 2025
Cited by 1 | Viewed by 594
Abstract
In the field of talking face generation, two-stage audio-based generation methods have attracted significant research interest. However, these methods still face challenges in achieving lip–audio synchronization during face generation, as well as issues with the discontinuity between the generated parts and original face in rendered videos. To overcome these challenges, this paper proposes a two-stage talking face generation method. The first stage is the landmark generation stage. A dynamic convolutional transformer generator is designed to capture complex facial movements. A dual-pipeline parallel processing mechanism is adopted to enhance the temporal feature correlation of input features and the ability to model details at the spatial scale. In the second stage, a dynamic Gaussian renderer (adaptive Gaussian renderer) is designed to realize seamless and natural connection of the upper- and lower-boundary areas through a Gaussian blur masking technique. We conducted quantitative analyses on the LRS2, HDTF, and MEAD neutral expression datasets. Experimental results demonstrate that, compared with existing methods, our approach significantly improves the realism and lip–audio synchronization of talking face videos. In particular, on the LRS2 dataset, the lip–audio synchronization rate was improved by 18.16% and the peak signal-to-noise ratio was improved by 12.11% compared to state-of-the-art works. Full article
(This article belongs to the Special Issue Convolutional Neural Network Technology for 3D Imaging and Sensing)
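
The Gaussian-blur masking idea mentioned in this abstract can be illustrated with a small OpenCV/NumPy sketch (not the authors' renderer; the lower-face rectangle and kernel size are assumptions for illustration):

```python
# Illustrative sketch only: feathered blending of a generated lower-face
# region into the original frame via a Gaussian-blurred alpha mask.
import cv2
import numpy as np


def blend_with_gaussian_mask(original: np.ndarray,
                             generated: np.ndarray,
                             box: tuple,
                             ksize: int = 31) -> np.ndarray:
    """box = (x, y, w, h) is a hypothetical lower-face rectangle."""
    x, y, w, h = box
    mask = np.zeros(original.shape[:2], dtype=np.float32)
    mask[y:y + h, x:x + w] = 1.0
    # Blurring the hard mask softens the seam between generated and original regions.
    mask = cv2.GaussianBlur(mask, (ksize, ksize), 0)
    mask = mask[..., None]                       # broadcast over color channels
    blended = mask * generated.astype(np.float32) + (1.0 - mask) * original.astype(np.float32)
    return np.clip(blended, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    frame = np.full((256, 256, 3), 128, dtype=np.uint8)   # stand-in original frame
    synth = np.full((256, 256, 3), 200, dtype=np.uint8)   # stand-in generated frame
    out = blend_with_gaussian_mask(frame, synth, box=(64, 128, 128, 96))
    print(out.shape, out.dtype)
```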

17 pages, 3610 KiB  
Article
Multi-Level Feature Dynamic Fusion Neural Radiance Fields for Audio-Driven Talking Head Generation
by Wenchao Song, Qiong Liu, Yanchao Liu, Pengzhou Zhang and Juan Cao
Appl. Sci. 2025, 15(1), 479; https://doi.org/10.3390/app15010479 - 6 Jan 2025
Viewed by 1562
Abstract
Audio-driven cross-modal talking head generation has experienced significant advancement in the last several years, and it aims to generate a talking head video that corresponds to a given audio sequence. Among these approaches, the NeRF-based method can generate videos featuring a specific person with more natural motion compared to the one-shot methods. However, previous approaches failed to distinguish the importance of different regions, resulting in the loss of information-rich region features. To alleviate the problem and improve video quality, we propose MLDF-NeRF, an end-to-end method for talking head generation, which can achieve better vector representation through multi-level feature dynamic fusion. Specifically, we designed two modules in MLDF-NeRF to enhance the cross-modal mapping ability between audio and different facial regions. We initially developed a multi-level tri-plane hash representation that uses three sets of tri-plane hash networks with varying resolution limits to capture the dynamic information of the face more accurately. Then, we introduce the idea of multi-head attention and design an efficient audio-visual fusion module that explicitly fuses audio features with image features from different planes, thereby improving the mapping between audio features and spatial information. Meanwhile, the design helps to minimize interference from facial areas unrelated to audio, thereby improving the overall quality of the representation. The quantitative and qualitative results indicate that our proposed method can effectively generate talking heads with natural actions and realistic details. Compared with previous methods, it performs better in terms of image quality, lip sync, and other aspects. Full article
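
A minimal PyTorch sketch of audio-visual fusion via multi-head cross-attention is shown below; the module name, feature dimensions, and the assumption that plane-wise visual features attend to audio tokens are illustrative, not the MLDF-NeRF implementation:

```python
# Hypothetical sketch of audio-visual fusion via multi-head cross-attention:
# visual features sampled from an (assumed) tri-plane query the audio features.
import torch
import torch.nn as nn


class AudioVisualFusion(nn.Module):
    def __init__(self, dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=num_heads,
                                          batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, plane_feats: torch.Tensor, audio_feats: torch.Tensor) -> torch.Tensor:
        # plane_feats: (batch, num_points, dim) features sampled from one plane
        # audio_feats: (batch, num_audio_tokens, dim) audio embedding sequence
        fused, _ = self.attn(query=plane_feats, key=audio_feats, value=audio_feats)
        return self.norm(plane_feats + fused)   # residual fusion


if __name__ == "__main__":
    fusion = AudioVisualFusion()
    planes = torch.randn(2, 1024, 64)     # points sampled from a plane
    audio = torch.randn(2, 16, 64)        # 16 audio tokens
    print(fusion(planes, audio).shape)    # torch.Size([2, 1024, 64])
```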

27 pages, 2436 KiB  
Article
Seeing the Sound: Multilingual Lip Sync for Real-Time Face-to-Face Translation
by Amirkia Rafiei Oskooei, Mehmet S. Aktaş and Mustafa Keleş
Computers 2025, 14(1), 7; https://doi.org/10.3390/computers14010007 - 28 Dec 2024
Cited by 3 | Viewed by 4155
Abstract
Imagine a future where language is no longer a barrier to real-time conversations, enabling instant and lifelike communication across the globe. As cultural boundaries blur, the demand for seamless multilingual communication has become a critical technological challenge. This paper addresses the lack of robust solutions for real-time face-to-face translation, particularly for low-resource languages, by introducing a comprehensive framework that not only translates language but also replicates voice nuances and synchronized facial expressions. Our research tackles the primary challenge of achieving accurate lip synchronization across culturally diverse languages, filling a significant gap in the literature by evaluating the generalizability of lip sync models beyond English. Specifically, we develop a novel evaluation framework combining quantitative lip sync error metrics and qualitative assessments by human observers. This framework is applied to assess two state-of-the-art lip sync models with different architectures for Turkish, Persian, and Arabic languages, using a newly collected dataset. Based on these findings, we propose and implement a modular system that integrates language-agnostic lip sync models with neural networks to deliver a fully functional face-to-face translation experience. Inference Time Analysis shows this system achieves highly realistic, face-translated talking heads in real time, with a throughput as low as 0.381 s. This transformative framework is primed for deployment in immersive environments such as VR/AR, Metaverse ecosystems, and advanced video conferencing platforms. It offers substantial benefits to developers and businesses aiming to build next-generation multilingual communication systems for diverse applications. While this work focuses on three languages, its modular design allows scalability to additional languages. However, further testing in broader linguistic and cultural contexts is required to confirm its universal applicability, paving the way for a more interconnected and inclusive world where language ceases to hinder human connection. Full article
(This article belongs to the Special Issue Computational Science and Its Applications 2024 (ICCSA 2024))
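
To illustrate the kind of per-stage inference-time analysis this abstract reports (the 0.381 s figure is the paper's own result and is not reproduced here), a generic timing harness for a modular translation pipeline might look like the following; the stage functions are placeholders, not the authors' models:

```python
# Generic timing harness for a modular face-to-face translation pipeline.
# The stage functions below are placeholders standing in for real speech
# recognition, translation, speech synthesis, and lip sync models.
import time
from typing import Callable


def stub_asr(audio: bytes) -> str: return "hello"
def stub_translate(text: str) -> str: return "merhaba"
def stub_tts(text: str) -> bytes: return b"\x00" * 16000
def stub_lip_sync(audio: bytes) -> bytes: return b"frame-data"


def time_pipeline(audio: bytes, stages) -> dict:
    timings, data = {}, audio
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)
        timings[name] = time.perf_counter() - start
    timings["total"] = sum(timings.values())
    return timings


if __name__ == "__main__":
    pipeline = [("asr", stub_asr), ("translation", stub_translate),
                ("tts", stub_tts), ("lip_sync", stub_lip_sync)]
    print(time_pipeline(b"\x00" * 32000, pipeline))
```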

24 pages, 1556 KiB  
Review
Audio-Driven Facial Animation with Deep Learning: A Survey
by Diqiong Jiang, Jian Chang, Lihua You, Shaojun Bian, Robert Kosk and Greg Maguire
Information 2024, 15(11), 675; https://doi.org/10.3390/info15110675 - 28 Oct 2024
Cited by 1 | Viewed by 6959
Abstract
Audio-driven facial animation is a rapidly evolving field that aims to generate realistic facial expressions and lip movements synchronized with a given audio input. This survey provides a comprehensive review of deep learning techniques applied to audio-driven facial animation, with a focus on both audio-driven facial image animation and audio-driven facial mesh animation. These approaches employ deep learning to map audio inputs directly onto 3D facial meshes or 2D images, enabling the creation of highly realistic and synchronized animations. This survey also explores evaluation metrics, available datasets, and the challenges that remain, such as disentangling lip synchronization and emotions, generalization across speakers, and dataset limitations. Lastly, we discuss future directions, including multi-modal integration, personalized models, and facial attribute modification in animations, all of which are critical for the continued development and application of this technology. Full article
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)

15 pages, 1617 KiB  
Article
Destigmatizing Palliative Care among Young Adults—A Theoretical Intervention Mapping Approach
by Yann-Nicolas Batzler, Manuela Schallenburger, Jacqueline Schwartz, Chantal Marazia and Martin Neukirchen
Healthcare 2024, 12(18), 1863; https://doi.org/10.3390/healthcare12181863 - 16 Sep 2024
Viewed by 1511
Abstract
Background: In medicine, stigmatization pertains to both afflicted individuals and diseases themselves but can also encompass entire medical fields. In regard to demographic change and the rising prevalence of oncological diseases, palliative care will become increasingly important. However, palliative care faces multiple stigmas. These include equating of palliative care with death and dying. A timely integration of palliative care would have the potential to alleviate symptom burden, diminish the risk of overtreatment, and thus save healthcare-related costs. Several interventions have been developed to destigmatize palliative care. However, they have mainly focused on the general public. Aim: The aim of this work is to develop a theoretical framework for an interventional campaign targeted at young adults to systematically destigmatize palliative care. Methods: The basis for the development of the campaign is a systematic review conducted by our working group that assessed the perception and knowledge of palliative care of young adults aged 18 to 24 years. To design a possible intervention, the Intervention Mapping approach was used. Results: The target group of young adults can be effectively reached in secondary schools, vocational schools, and universities. The target population should be able to discuss the content of palliative care and openly talk about death and dying. At the environmental level, palliative care should be more present in public spaces, and death and dying should be freed from taboos. Within an intervention with palliative care experts and patients serving as interventionists, these changes can be achieved by incorporating evidence-based methods of behavioral change. Conclusions: An early engagement with palliative care could contribute to the long-term reduction of stigmas and address the demographic shift effectively. A multimodal intervention approach comprising knowledge dissemination, exchange, and media presence provides an appropriate framework to counter the existing stigmatization of palliative care within the peer group of young adults. Full article
(This article belongs to the Special Issue Development and Impact of Palliative and End-of-Life Care Services)

20 pages, 12092 KiB  
Article
Low-Cost Optimized U-Net Model with GMM Automatic Labeling Used in Forest Semantic Segmentation
by Alexandru-Toma Andrei and Ovidiu Grigore
Sensors 2023, 23(21), 8991; https://doi.org/10.3390/s23218991 - 5 Nov 2023
Cited by 3 | Viewed by 2133
Abstract
Currently, Convolutional Neural Networks (CNN) are widely used for processing and analyzing image or video data, and an essential part of state-of-the-art studies rely on training different CNN architectures. They have broad applications, such as image classification, semantic segmentation, or face recognition. Regardless of the application, one of the important factors influencing network performance is the use of a reliable, well-labeled dataset in the training stage. Most of the time, especially if we talk about semantic classification, labeling is time and resource-consuming and must be done manually by a human operator. This article proposes an automatic label generation method based on the Gaussian mixture model (GMM) unsupervised clustering technique. The other main contribution of this paper is the optimization of the hyperparameters of the traditional U-Net model to achieve a balance between high performance and the least complex structure for implementing a low-cost system. The results showed that the proposed method decreased the resources needed, computation time, and model complexity while maintaining accuracy. Our methods have been tested in a deforestation monitoring application by successfully identifying forests in aerial imagery. Full article
(This article belongs to the Special Issue Machine Learning Based Remote Sensing Image Classification)
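
The general idea of GMM-based automatic label generation can be sketched with scikit-learn as below; this is illustrative only, and the pixel-color features and two-class setup are assumptions rather than the paper's configuration:

```python
# Illustrative sketch: cluster pixel colors with a Gaussian mixture model to
# produce rough per-pixel labels (e.g., forest vs. non-forest) that could
# serve as automatic training masks for a segmentation network such as U-Net.
import numpy as np
from sklearn.mixture import GaussianMixture


def gmm_auto_labels(image: np.ndarray, n_classes: int = 2) -> np.ndarray:
    """image: (H, W, 3) RGB array; returns an (H, W) integer label map."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float32) / 255.0
    gmm = GaussianMixture(n_components=n_classes, covariance_type="full",
                          random_state=0)
    labels = gmm.fit_predict(pixels)
    return labels.reshape(h, w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_aerial = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
    mask = gmm_auto_labels(fake_aerial, n_classes=2)
    print(mask.shape, np.unique(mask))
```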

18 pages, 4164 KiB  
Article
BlinkLinMulT: Transformer-Based Eye Blink Detection
by Ádám Fodor, Kristian Fenech and András Lőrincz
J. Imaging 2023, 9(10), 196; https://doi.org/10.3390/jimaging9100196 - 26 Sep 2023
Cited by 6 | Viewed by 5931
Abstract
This work presents BlinkLinMulT, a transformer-based framework for eye blink detection. While most existing approaches rely on frame-wise eye state classification, recent advancements in transformer-based sequence models have not been explored in the blink detection literature. Our approach effectively combines low- and high-level feature sequences with linear complexity cross-modal attention mechanisms and addresses challenges such as lighting changes and a wide range of head poses. Our work is the first to leverage the transformer architecture for blink presence detection and eye state recognition while successfully implementing an efficient fusion of input features. In our experiments, we utilized several publicly available benchmark datasets (CEW, ZJU, MRL Eye, RT-BENE, EyeBlink8, Researcher’s Night, and TalkingFace) to extensively show the state-of-the-art performance and generalization capability of our trained model. We hope the proposed method can serve as a new baseline for further research. Full article
(This article belongs to the Section Image and Video Processing)
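
A compact PyTorch sketch of sequence-level eye-state classification with a standard transformer encoder is given below; it only illustrates the sequence-modeling idea, since BlinkLinMulT uses linear-complexity cross-modal attention, and all dimensions and inputs are hypothetical:

```python
# Hypothetical sketch: classify per-frame eye state (open/closed) from a
# sequence of per-frame eye features using a standard transformer encoder.
import torch
import torch.nn as nn


class BlinkSequenceClassifier(nn.Module):
    def __init__(self, feat_dim: int = 64, num_layers: int = 2, num_heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(feat_dim, 2)     # open vs. closed per frame

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_frames, feat_dim) per-frame eye features
        encoded = self.encoder(frame_feats)
        return self.head(encoded)              # (batch, num_frames, 2) logits


if __name__ == "__main__":
    model = BlinkSequenceClassifier()
    clip = torch.randn(1, 75, 64)              # e.g., 3 s of video at 25 fps
    print(model(clip).shape)                   # torch.Size([1, 75, 2])
```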

23 pages, 574 KiB  
Review
Towards Heritage Transformation Perspectives
by Rasa Pranskūnienė and Erika Zabulionienė
Sustainability 2023, 15(7), 6135; https://doi.org/10.3390/su15076135 - 3 Apr 2023
Cited by 6 | Viewed by 4987
Abstract
When facing the challenge of preserving cultural heritage for future generations, it becomes important to talk about heritage transformations and the perspectives of these transformations. Thus, this integrative review article seeks to discuss heritage transformations and their perspectives for future tourism development, by analyzing various theoretical and empirical literature sources. The results of this integrative review analysis highlighted the importance of paying attention to the three layers of perspectives: personal, local, and regional. Thus, the discussion opened up the following “IPR” theoretical insights: heritage transformations—“I”—as personal transformations, heritage transformations—“Place”—as local perspective, heritage transformations—“R”—as regional perspective. It has revealed that all three discussed heritage transformation perspectives are experiencing significant connections. The biggest challenge of current and future heritage transformations is a dependence on being constantly interconnected (individually, locally, regionally) and on being constantly influenced by the world’s challenges and development trends. When looking towards future tourism development, the interconnected layers of heritage transformation perspectives could lead to the constant integration and creation of interwoven tourism values and experiences. Full article
(This article belongs to the Special Issue Tourism, Sustainable Development, and Cultural Heritage)

14 pages, 1140 KiB  
Review
Towards a More Realistic In Vitro Meat: The Cross Talk between Adipose and Muscle Cells
by Margherita Pallaoro, Silvia Clotilde Modina, Andrea Fiorati, Lina Altomare, Giorgio Mirra, Paola Scocco and Alessia Di Giancamillo
Int. J. Mol. Sci. 2023, 24(7), 6630; https://doi.org/10.3390/ijms24076630 - 1 Apr 2023
Cited by 10 | Viewed by 4334
Abstract
According to statistics and future predictions, meat consumption will increase in the coming years. Considering both the environmental impact of intensive livestock farming and the importance of protecting animal welfare, the necessity of finding alternative strategies to satisfy the growing meat demand is compelling. Biotechnologies are responding to this demand by developing new strategies for producing meat in vitro. The manufacturing of cultured meat has faced criticism concerning, above all, the practical issues of culturing together different cell types typical of meat that are partly responsible for meat’s organoleptic characteristics. Indeed, the existence of a cross talk between adipose and muscle cells has critical effects on the outcome of the co-culture, leading to a general inhibition of myogenesis in favor of adipogenic differentiation. This review aims to clarify the main mechanisms and the key molecules involved in this cross talk and provide an overview of the most recent and successful meat culture 3D strategies for overcoming this challenge, focusing on the approaches based on farm-animal-derived cells. Full article
(This article belongs to the Special Issue Advances in Organ-on-Chip)

14 pages, 329 KiB  
Review
The Future of Epidemic and Pandemic Vaccines to Serve Global Public Health Needs
by Andrew Farlow, Els Torreele, Glenda Gray, Kiat Ruxrungtham, Helen Rees, Sai Prasad, Carolina Gomez, Amadou Sall, Jorge Magalhães, Piero Olliaro and Petro Terblanche
Vaccines 2023, 11(3), 690; https://doi.org/10.3390/vaccines11030690 - 17 Mar 2023
Cited by 35 | Viewed by 6589
Abstract
This Review initiates a wide-ranging discussion over 2023 by selecting and exploring core themes to be investigated more deeply in papers submitted to the Vaccines Special Issue on the “Future of Epidemic and Pandemic Vaccines to Serve Global Public Health Needs”. To tackle the SARS-CoV-2 pandemic, an acceleration of vaccine development across different technology platforms resulted in the emergency use authorization of multiple vaccines in less than a year. Despite this record speed, many limitations surfaced including unequal access to products and technologies, regulatory hurdles, restrictions on the flow of intellectual property needed to develop and manufacture vaccines, clinical trials challenges, development of vaccines that did not curtail or prevent transmission, unsustainable strategies for dealing with variants, and the distorted allocation of funding to favour dominant companies in affluent countries. Key to future epidemic and pandemic responses will be sustainable, global-public-health-driven vaccine development and manufacturing based on equitable access to platform technologies, decentralised and localised innovation, and multiple developers and manufacturers, especially in low- and middle-income countries (LMICs). There is talk of flexible, modular pandemic preparedness, of technology access pools based on non-exclusive global licensing agreements in exchange for fair compensation, of WHO-supported vaccine technology transfer hubs and spokes, and of the creation of vaccine prototypes ready for phase I/II trials, etc. However, all these concepts face extraordinary challenges shaped by current commercial incentives, the unwillingness of pharmaceutical companies and governments to share intellectual property and know-how, the precariousness of building capacity based solely on COVID-19 vaccines, the focus on large-scale manufacturing capacity rather than small-scale rapid-response innovation to stop outbreaks when and where they occur, and the inability of many resource-limited countries to afford next-generation vaccines for their national vaccine programmes. Once the current high subsidies are gone and interest has waned, sustaining vaccine innovation and manufacturing capability in interpandemic periods will require equitable access to vaccine innovation and manufacturing capabilities in all regions of the world based on many vaccines, not just “pandemic vaccines”. Public and philanthropic investments will need to leverage enforceable commitments to share vaccines and critical technology so that countries everywhere can establish and scale up vaccine development and manufacturing capability. This will only happen if we question all prior assumptions and learn the lessons offered by the current pandemic. We invite submissions to the special issue, which we hope will help guide the world towards a global vaccine research, development, and manufacturing ecosystem that better balances and integrates scientific, clinical trial, regulatory, and commercial interests and puts global public health needs first. Full article
27 pages, 5842 KiB  
Review
Green Energy by Hydrogen Production from Water Splitting, Water Oxidation Catalysis and Acceptorless Dehydrogenative Coupling
by Jesús Antonio Luque-Urrutia, Thalía Ortiz-García, Miquel Solà and Albert Poater
Inorganics 2023, 11(2), 88; https://doi.org/10.3390/inorganics11020088 - 20 Feb 2023
Cited by 20 | Viewed by 5212
Abstract
In this review, we want to explain how the burning of fossil fuels is pushing us towards green energy. Actually, for a long time, we have believed that everything is profitable, that resources are unlimited and there are no consequences. However, the reality is often disappointing. The use of non-renewable resources, the excessive waste production and the abandonment of the task of recycling has created a fragile thread that, once broken, may never restore itself. Metaphors aside, we are talking about our planet, the Earth, and its unique ability to host life, including ourselves. Our world has its balance; when the wind erodes a mountain, a beach appears, or when a fire devastates an area, eventually new life emerges from the ashes. However, humans have been distorting this balance for decades. Our evolving way of living has increased the number of resources that each person consumes, whether food, shelter, or energy; we have overworked everything to exhaustion. Scientists worldwide have already said actively and passively that we are facing one of the biggest problems ever: climate change. This is unsustainable and we must try to revert it, or, if we are too late, slow it down as much as possible. To make this happen, there are many possible methods. In this review, we investigate catalysts for using water as an energy source, or, instead of water, alcohols. On the other hand, the recycling of gases such as CO2 and N2O is also addressed, but we also observe non-catalytic means of generating energy through solar cell production. Full article
(This article belongs to the Special Issue Inorganics for Catalysts: Design, Synthesis and Applications)

16 pages, 4534 KiB  
Article
Emotionally Controllable Talking Face Generation from an Arbitrary Emotional Portrait
by Zikang Zhao, Yujia Zhang, Tianjun Wu, Hao Guo and Yao Li
Appl. Sci. 2022, 12(24), 12852; https://doi.org/10.3390/app122412852 - 14 Dec 2022
Cited by 4 | Viewed by 3519
Abstract
With the continuous development of cross-modality generation, audio-driven talking face generation has made substantial advances in terms of speech content and mouth shape, but existing research on talking face emotion generation is still relatively unsophisticated. In this work, we present Emotionally Controllable Talking Face Generation from an Arbitrary Emotional Portrait to synthesize lip-sync and an emotionally controllable high-quality talking face. Specifically, we take a facial reenactment perspective, using facial landmarks as an intermediate representation driving the expression generation of talking faces through the landmark features of an arbitrary emotional portrait. Meanwhile, decoupled design ideas are used to divide the model into three sub-networks to improve emotion control. They are the lip-sync landmark animation generation network, the emotional landmark animation generation network, and the landmark-to-animation translation network. The two landmark animation generation networks are responsible for generating content-related lip area landmarks and facial expression landmarks to correct the landmark sequences of the target portrait. Following this, the corrected landmark sequences and the target portrait are fed into the translation network to generate an emotionally controllable talking face. Our method controls the expressions of talking faces by driving the emotional portrait images while ensuring the generation of animated lip-sync, and can handle new audio and portraits not seen during training. A multi-perspective user study and extensive quantitative and qualitative evaluations demonstrate the superiority of the system in terms of visual emotion representation and video authenticity. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
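
A toy NumPy sketch of the landmark-correction idea follows: lip landmarks from an audio-driven branch and expression landmarks from an emotional reference overwrite the corresponding points of the target sequence. The 68-point layout and index ranges are assumptions for illustration, not the authors' design:

```python
# Toy sketch of landmark-sequence correction: take mouth landmarks from an
# audio-driven prediction and brow/eye/nose landmarks from an emotional
# reference, then overwrite the target portrait's landmark sequence.
import numpy as np

LIP_IDX = np.arange(48, 68)      # mouth region in the common 68-point layout
EXPR_IDX = np.arange(17, 48)     # brows, eyes, nose (illustrative choice)
# Jawline points 0-16 are left unchanged from the target portrait here.


def correct_landmarks(target_seq: np.ndarray,
                      lip_seq: np.ndarray,
                      expr_seq: np.ndarray) -> np.ndarray:
    """All inputs: (num_frames, 68, 2) landmark sequences of (x, y) points."""
    corrected = target_seq.copy()
    corrected[:, LIP_IDX] = lip_seq[:, LIP_IDX]      # audio-driven lip shapes
    corrected[:, EXPR_IDX] = expr_seq[:, EXPR_IDX]   # reference-driven expression
    return corrected


if __name__ == "__main__":
    frames = 25
    target = np.zeros((frames, 68, 2))
    lips = np.ones((frames, 68, 2))
    expr = np.full((frames, 68, 2), 2.0)
    out = correct_landmarks(target, lips, expr)
    print(out[0, 50], out[0, 20], out[0, 5])   # lip, expression, unchanged jaw point
```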

13 pages, 2294 KiB  
Article
Using Architectural Mapping to Understand Behavior and Space Utilization in a Surgical Waiting Room of a Safety Net Hospital
by Elizabeth N. Liao, Lara Z. Chehab, Michelle Ossmann, Benjamin Alpers, Devika Patel and Amanda Sammann
Int. J. Environ. Res. Public Health 2022, 19(21), 13870; https://doi.org/10.3390/ijerph192113870 - 25 Oct 2022
Cited by 2 | Viewed by 2916
Abstract
Objective: To use architectural mapping to understand how patients and families utilize the waiting space at an outpatient surgery clinic in a safety-net hospital. Background: The waiting period is an important component of patient experience and satisfaction. Studies have found that patients value privacy, information transparency and comfort. However, approaches common in the architecture field have rarely been used to investigate interactions between patients and the built environment in a safety-net healthcare setting. Methods: This was a prospective observational study in a general surgery outpatient clinic at a safety-net hospital and level 1 trauma center. We used a web-based application generated from the design and architecture industry, to quantitatively track waiting space utilization over 2 months. Results: A total of 728 observations were recorded across 5 variables: time, location, chair selection, person/object, and activity. There were 536 (74%) observations involving people and 179 (25%) involving personal items. People most frequently occupied chairs facing the door (43%, n = 211), and least frequently occupied seats in the hallway (5%, n = 23), regardless of the time of their appointment (p-value = 0.92). Most common activities included interacting with personal phone, gazing into space, and talking face to face. Thirteen percent of people brought mobility devices, and 64% of objects were placed on an adjacent chair, indicating the desire for increased personal space. Conclusion: Architectural behavioral mapping is an effective information gathering tool to help design waiting space improvement in the safety-net healthcare setting. Full article