A Unified Framework Based on Distribution Shift Modeling for Revealing and Eliminating Backdoor Attacks in Diffusion Models

Yang, Kairui; Gu, Xu; An, Fanglin; Ye, Jun; Zhang, Zhengqi

doi:10.3390/app16105077

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article


by Kairui Yang ¹, Xu Gu ¹, Fanglin An ¹, Jun Ye ¹ and Zhengqi Zhang ²,*

¹ School of Cyberspace Security, Hainan University, Haikou 570228, China

² Key Laboratory of Data Science and Intelligence Education (Hainan Normal University), Ministry of Education, Shanwei Institute of Technology, Haikou 571158, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(10), 5077; https://doi.org/10.3390/app16105077

Submission received: 14 April 2026 / Revised: 16 May 2026 / Accepted: 16 May 2026 / Published: 19 May 2026


Abstract

Diffusion models have achieved groundbreaking progress in image generation, text-to-image synthesis, and other multimodal generation tasks, and have become the mainstream architecture in generative artificial intelligence. However, studies have shown that diffusion models are vulnerable to backdoor attacks: by injecting specific triggers into the training data, attackers can manipulate the model into generating preset target images during inference, posing a serious security threat. Existing defenses suffer from three major limitations: detection methods typically rely on prior knowledge of specific attack types or require large amounts of real data; removal methods lack theoretical modeling of the intrinsic mechanism of backdoor injection; and there is no unified defense framework with low data dependency. To address these issues, this paper proposes a unified defense framework named DIFFDEFEND. For the first time, it characterizes the essence of backdoor injection as "layer-by-layer propagation of distribution shifts" and builds on this view a complete solution that achieves high-precision detection and effective removal without requiring real data. Specifically, this paper first proposes a multi-stage joint trigger inversion method that exploits consistency constraints on the distribution shift across multiple time steps to recover the trigger stably. Second, it constructs a dual-modal detector that combines the uniformity score of generated images with a total variation loss to identify backdoored models with high precision. Finally, it designs a distribution-guided purification mechanism that freezes a clean reference model and optimizes a removal loss and a retention loss, rapidly eliminating backdoor effects without relying on real data while preserving the model's generation quality.

Extensive experiments on three mainstream architectures (DDPM, NCSN, and LDM) and 13 different samplers demonstrate that DIFFDEFEND achieves near-100% detection accuracy, reduces the backdoor attack success rate to nearly zero, and leaves the model's generation quality essentially unchanged, significantly outperforming existing methods.
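The abstract names the detector's two statistics and the purifier's two loss terms but does not define them on this page. The NumPy sketch below illustrates one plausible reading under stated assumptions. Total variation uses the standard anisotropic definition. The uniformity score is a hypothetical stand-in, computed as the mean pairwise Pearson correlation across a batch of generated images. The decision rule, the thresholds `tau_u` and `tau_tv`, and the mean-squared-error form of the removal and retention losses are likewise illustrative assumptions, not the authors' DIFFDEFEND implementation. Plain arrays stand in for generated images and for the noise predictions of the model under purification and of the frozen reference model.

```python
import numpy as np


def total_variation(img: np.ndarray) -> float:
    """Anisotropic total variation of an (H, W, C) image: mean absolute
    difference between vertically adjacent pixels plus the same for
    horizontally adjacent pixels."""
    dh = np.abs(img[1:, :, :] - img[:-1, :, :]).mean()
    dw = np.abs(img[:, 1:, :] - img[:, :-1, :]).mean()
    return float(dh + dw)


def uniformity_score(batch: np.ndarray) -> float:
    """HYPOTHETICAL uniformity score for a batch of shape (N, H, W, C):
    mean pairwise Pearson correlation between flattened images.
    Near 1 when the batch collapses onto one image (e.g. a backdoor
    target), near 0 for diverse outputs."""
    flat = batch.reshape(batch.shape[0], -1).astype(float)
    flat = flat - flat.mean(axis=1, keepdims=True)          # center each image
    unit = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T                                      # (N, N) correlations
    off_diag = sim[~np.eye(len(sim), dtype=bool)]            # drop self-pairs
    return float(off_diag.mean())


def flag_backdoor(batch: np.ndarray, tau_u: float = 0.9, tau_tv: float = 0.1) -> bool:
    """HYPOTHETICAL dual-statistic rule: flag a model whose trigger-conditioned
    samples are both near-identical (high uniformity) and smooth (low mean TV).
    Thresholds are placeholders, not values from the paper."""
    uni = uniformity_score(batch)
    tv = float(np.mean([total_variation(img) for img in batch]))
    return bool(uni > tau_u and tv < tau_tv)


def purification_loss(eps_triggered: np.ndarray, eps_clean: np.ndarray,
                      eps_ref_clean: np.ndarray, lam: float = 1.0) -> float:
    """ASSUMED two-term objective. Removal: given a triggered input, the model
    under purification should predict what the frozen reference predicts on
    the clean input. Retention: on clean inputs it should stay close to the
    reference. Both terms are mean squared errors; lam weights retention."""
    removal = np.mean((eps_triggered - eps_ref_clean) ** 2)
    retention = np.mean((eps_clean - eps_ref_clean) ** 2)
    return float(removal + lam * retention)
```

In a real pipeline, the image batch would come from sampling the suspect model with the inverted trigger, and the three noise arrays from the two networks' predictions at sampled time steps; that wiring is not shown. Keeping the reference frozen means both loss terms pull the purified model toward a fixed target, which is why no real training data is needed in this sketch.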

Keywords: diffusion models; backdoor attacks; distribution shift; trigger inversion; model purification

Share and Cite

MDPI and ACS Style

Yang, K.; Gu, X.; An, F.; Ye, J.; Zhang, Z. A Unified Framework Based on Distribution Shift Modeling for Revealing and Eliminating Backdoor Attacks in Diffusion Models. Appl. Sci. 2026, 16, 5077. https://doi.org/10.3390/app16105077

AMA Style

Yang K, Gu X, An F, Ye J, Zhang Z. A Unified Framework Based on Distribution Shift Modeling for Revealing and Eliminating Backdoor Attacks in Diffusion Models. Applied Sciences. 2026; 16(10):5077. https://doi.org/10.3390/app16105077

Chicago/Turabian Style

Yang, Kairui, Xu Gu, Fanglin An, Jun Ye, and Zhengqi Zhang. 2026. "A Unified Framework Based on Distribution Shift Modeling for Revealing and Eliminating Backdoor Attacks in Diffusion Models" Applied Sciences 16, no. 10: 5077. https://doi.org/10.3390/app16105077

APA Style

Yang, K., Gu, X., An, F., Ye, J., & Zhang, Z. (2026). A Unified Framework Based on Distribution Shift Modeling for Revealing and Eliminating Backdoor Attacks in Diffusion Models. Applied Sciences, 16(10), 5077. https://doi.org/10.3390/app16105077

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
