This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

All’s Well That FID’s Well? Result Quality and Metric Scores in GAN Models for Lip-Synchronization Tasks

by Carina Geldhauser 1,*,†, Johan Liljegren 2,† and Pontus Nordqvist 2,†
1 Department Mathematik, ETH Zurich, 8092 Zurich, Switzerland
2 Centre for Mathematical Sciences, Lund University, P.O. Box 118, 22100 Lund, Sweden
* Author to whom correspondence should be addressed.
† All authors contributed equally to this work; hence, author names are listed in alphabetical order.
Electronics 2025, 14(17), 3487; https://doi.org/10.3390/electronics14173487
Submission received: 2 July 2025 / Revised: 17 August 2025 / Accepted: 25 August 2025 / Published: 31 August 2025
(This article belongs to the Special Issue New Trends in AI-Assisted Computer Vision)

Abstract

This exploratory study investigates the usability of performance metrics for generative adversarial network (GAN)-based models for speech-driven facial animation. These models transfer speech information from an audio file to a still image to generate talking-head videos in a small-scale “everyday usage” setting. Two models, LipGAN and a custom implementation of a Wasserstein GAN with gradient penalty (L1WGAN-GP), are examined for their visual performance and for their scores on commonly used metrics. Quantitative comparisons using FID, SSIM, and PSNR on the GRID test dataset show mixed results, and the metrics fail to capture the local artifacts crucial for lip synchronization, pointing to limitations in their applicability to video animation tasks. The study highlights the inadequacy of current quantitative measures and emphasizes the continued necessity of human qualitative assessment for evaluating talking-head video quality.
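
For reference (this is the standard definition, not a formula quoted from the article), the Fréchet Inception Distance compares multivariate Gaussians fitted to Inception-v3 features of real and generated frames:

FID = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\big( \Sigma_r + \Sigma_g - 2 (\Sigma_r \Sigma_g)^{1/2} \big),

where (\mu_r, \Sigma_r) and (\mu_g, \Sigma_g) are the feature means and covariances of the real and generated image sets; lower values indicate more similar distributions.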
Keywords: speech-driven facial animation; generative adversarial networks (GANs); lip synchronization; image-to-video synthesis; audio-driven talking-head generation; evaluation metrics
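
As a concrete illustration of how SSIM, PSNR, and FID can be computed for generated talking-head frames, the following minimal Python sketch uses scikit-image and torchmetrics. It is not code from the article; the frame lists, helper names, and choice of libraries are assumptions for illustration only.

# Minimal illustrative sketch (not from the paper): per-frame SSIM/PSNR and a
# dataset-level FID for generated vs. reference talking-head frames.
# Assumes paired lists of uint8 RGB frames with identical shape (H, W, 3).
import numpy as np
import torch
from skimage.metrics import structural_similarity, peak_signal_noise_ratio
from torchmetrics.image.fid import FrechetInceptionDistance


def frame_scores(real_frames, fake_frames):
    """Average SSIM and PSNR over paired frames (hypothetical helper)."""
    ssim_vals = [structural_similarity(r, f, channel_axis=-1)
                 for r, f in zip(real_frames, fake_frames)]
    psnr_vals = [peak_signal_noise_ratio(r, f)
                 for r, f in zip(real_frames, fake_frames)]
    return float(np.mean(ssim_vals)), float(np.mean(psnr_vals))


def fid_score(real_frames, fake_frames):
    """FID over all frames, using torchmetrics' Inception-v3 feature extractor."""
    fid = FrechetInceptionDistance(feature=2048)
    # torchmetrics expects uint8 tensors of shape (N, 3, H, W) by default.
    real = torch.from_numpy(np.stack(real_frames)).permute(0, 3, 1, 2)
    fake = torch.from_numpy(np.stack(fake_frames)).permute(0, 3, 1, 2)
    fid.update(real, real=True)
    fid.update(fake, real=False)
    return float(fid.compute())

In practice, such frame-level scores average over the whole image, which is one reason they can miss the localized artifacts around the mouth region that matter most for lip synchronization.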

Share and Cite

MDPI and ACS Style

Geldhauser, C.; Liljegren, J.; Nordqvist, P. All’s Well That FID’s Well? Result Quality and Metric Scores in GAN Models for Lip-Synchronization Tasks. Electronics 2025, 14, 3487. https://doi.org/10.3390/electronics14173487

AMA Style

Geldhauser C, Liljegren J, Nordqvist P. All’s Well That FID’s Well? Result Quality and Metric Scores in GAN Models for Lip-Synchronization Tasks. Electronics. 2025; 14(17):3487. https://doi.org/10.3390/electronics14173487

Chicago/Turabian Style

Geldhauser, Carina, Johan Liljegren, and Pontus Nordqvist. 2025. "All’s Well That FID’s Well? Result Quality and Metric Scores in GAN Models for Lip-Synchronization Tasks" Electronics 14, no. 17: 3487. https://doi.org/10.3390/electronics14173487

APA Style

Geldhauser, C., Liljegren, J., & Nordqvist, P. (2025). All’s Well That FID’s Well? Result Quality and Metric Scores in GAN Models for Lip-Synchronization Tasks. Electronics, 14(17), 3487. https://doi.org/10.3390/electronics14173487

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
