Applied Sciences
  • This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
  • Article
  • Open Access

30 December 2025

CoFaDiff: Coordinating Identity Fidelity and Text Consistency in Diffusion-Based Face Generation

1 Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China
2 University of Chinese Academy of Sciences, Beijing 101408, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(1), 414; https://doi.org/10.3390/app16010414 (registering DOI)

Abstract

Personalized face image generation is essential for Artificial Intelligence-Generated Content (AIGC) applications such as personalized digital avatars and user-customized media creation. However, existing diffusion-based approaches still suffer from insufficient identity consistency and limited text editability. In this work, we propose CoFaDiff, a diffusion-based face generation framework designed to coordinate identity consistency with text-driven editability. To enhance identity consistency, our method integrates a dual-encoder scheme that jointly leverages CLIP and ArcFace to capture both semantic and discriminative facial features, incorporates a progressive curriculum learning strategy to obtain more robust identity representations, and adopts a hybrid IdentityNet–IPAdapter architecture that explicitly models facial location, pose, and the corresponding identity embeddings in a unified manner. To enhance text-driven editability, we introduce three complementary optimization strategies. First, long-prompt fine-tuning is employed to reduce the model’s dependency on identity conditions. Second, a semantic alignment loss is incorporated to regularize the influence of identity embeddings within the semantic space of the pretrained diffusion model. Third, during classifier-free guided sampling, we modulate the strength of the identity condition by stacking different numbers of zero-valued identity tokens, enabling users to flexibly balance identity consistency and text editability according to their needs. Experiments on FFHQ and IMDB-WIKI demonstrate that CoFaDiff achieves superior identity consistency and text editability compared with recent baselines.
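The abstract does not say how stacking zero-valued identity tokens weakens the identity condition. One plausible mechanism, assumed here rather than taken from the paper, is softmax dilution in an IP-Adapter-style identity cross-attention branch. If the key/value projections have no bias, a zero token receives an attention logit of 0 and a zero value, so it adds nothing to the output but adds 1 to the softmax denominator. The pure-Python sketch below illustrates this under those assumptions. The helper names `stack_zero_tokens`, `identity_branch`, and `dilution_factor` are hypothetical, and the projections are taken as identity maps for brevity. It shows that appending k zero tokens scales the identity output by Z / (Z + k), where Z is the sum of the exponentiated logits of the real identity tokens.

```python
import math


def softmax_attention(query, keys, values):
    """Single-query scaled dot-product attention over lists of token vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max logit for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]


def stack_zero_tokens(id_tokens, num_zero):
    """Hypothetical helper: append num_zero all-zero identity tokens."""
    dim = len(id_tokens[0])
    return id_tokens + [[0.0] * dim for _ in range(num_zero)]


def identity_branch(query, id_tokens, num_zero=0):
    """Identity cross-attention with optional zero-token padding (hypothetical).

    Assumption: key/value projections are bias-free, so a zero token maps to
    a zero key (logit 0) and a zero value. Projections are identity maps here.
    """
    tokens = stack_zero_tokens(id_tokens, num_zero)
    return softmax_attention(query, tokens, tokens)


def dilution_factor(query, id_tokens, num_zero):
    """Predicted scaling of the identity output: Z / (Z + num_zero)."""
    d = len(query)
    z = sum(
        math.exp(sum(q * k for q, k in zip(query, key)) / math.sqrt(d))
        for key in id_tokens
    )
    return z / (z + num_zero)


if __name__ == "__main__":
    query = [1.0, 0.5]
    id_tokens = [[0.8, 0.2], [0.3, 0.9]]
    for k in (0, 4, 16):
        out = identity_branch(query, id_tokens, num_zero=k)
        print(k, [round(x, 3) for x in out], round(dilution_factor(query, id_tokens, k), 3))
```

Under this assumption, the number of zero tokens acts as a monotone knob: the factor Z / (Z + k) starts at 1 for k = 0 and shrinks toward 0 as k grows, while the direction of the identity output is unchanged. In a classifier-free guided sampler, only the identity-conditioned pass would use the padded tokens, so a larger k would presumably leave more influence to the text prompt.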
