A Unified Framework Based on Distribution Shift Modeling for Revealing and Eliminating Backdoor Attacks in Diffusion Models
Abstract
1. Introduction
2. Related Work
2.1. Forward Process of Diffusion Models
2.2. Reverse Process of Diffusion Models
3. Proposed Method
3.1. Backdoor Attacks on Diffusion Models
3.2. Distribution Shift Propagation
3.3. Backdoor Defense for Diffusion Models
3.4. Core Idea of Defense
4. Design of the DIFFDEFEND Framework
4.1. Overall Architecture
4.2. Trigger Inversion Module: Multi-Stage Joint Inversion
| Algorithm 1: Multi-Stage Joint Trigger Inversion |
| Input: Diffusion model M, timestep set , number of iterations N, number of samples B. |
| Output: Inverted trigger , shift coefficients . |
| 1: Initialize (for all ) |
| 2: for iter = 1 to N do |
| 3: |
| 4: |
| 5: Sample noise |
| 6: Calculate |
| 7: the previous timestep of t in T (if t is the first, use the last) |
| 8: |
| 9: |
| 10: end for |
| 11: |
| 12: for t do |
| 13: |
| 14: end for |
| 15: end for |
| 16: return |
4.3. Backdoor Detection Module: Dual-Modality Detector
| Algorithm 2: Bimodal Backdoor Detection |
| Input: Model M to be tested, inverted trigger τ, number of samples n, reference model set (optional), threshold θ (if no reference models) |
| Output: Whether backdoored (True/False) |
| 1: Initialize images ← [] |
| 2: for i = 1 to n do |
| 3: Sample noise |
| 4: Generate image x ← Sample(M,ϵ) |
| 5: images ← images∪{x} |
| 6: end for |
| 7: Calculate uniformity score |
| 8: Calculate average TV loss |
| 9: if is not empty then |
| 10: Load pre-trained random forest classifier RF |
| 11: return |
| 12: else |
| 13: return |
| 14: end if |
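The decision logic of Algorithm 2 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the uniformity score is assumed to be the mean pairwise L2 distance between generated images (lower means more collapsed), the TV loss is assumed to be the mean absolute difference between neighboring pixels, and the threshold branch is assumed to compare the uniformity score against θ. The optional `rf` argument stands for any pre-trained classifier exposing a scikit-learn style `predict` method.

```python
import numpy as np

def total_variation(img):
    """Mean absolute difference between neighboring pixels of one (H, W, C) image."""
    dh = np.abs(img[1:, :, :] - img[:-1, :, :]).mean()
    dw = np.abs(img[:, 1:, :] - img[:, :-1, :]).mean()
    return float(dh + dw)

def uniformity_score(images):
    """Assumed definition: mean pairwise L2 distance between flattened images."""
    flat = images.reshape(len(images), -1)
    n = len(flat)
    dists = [np.linalg.norm(flat[i] - flat[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

def detect_backdoor(images, theta, rf=None):
    """Return True if the batch of generated images looks backdoored."""
    u = uniformity_score(images)
    tv = float(np.mean([total_variation(x) for x in images]))
    if rf is not None:
        # Reference models available: classify the two-feature vector.
        return bool(rf.predict([[u, tv]])[0])
    # No reference models: assumed rule, collapsed outputs fall below theta.
    return bool(u < theta)
```

In this sketch, a batch in which every sample is the same target image has a uniformity score of zero and is flagged for any positive θ, while a diverse batch of clean samples is not.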
4.4. Backdoor Removal Module: Distribution-Guided Sanitization
| Algorithm 3: Distribution-Guided Backdoor Purification |
| Input: Backdoored model , inverted trigger τ, learning rate η, number of iterations E, |
| batch size B, hyperparameters α, β, optional clean data |
| Output: Purified model |
| 1: Copy as , freeze parameters |
| 2: for epoch = 1 to E do |
| 3: Sample noise |
| 4: Calculate trigger inputs |
| 5: Forward propagation: |
| 6: Calculate removal loss: |
| 7: Calculate retention loss: |
| 8: |
| 9: if is not empty then |
| 10: Calculate diffusion loss |
| 11: |
| 12: end if |
| 13: Backpropagate to update parameters: |
| 14: end for |
| 15: return |
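The loss formulas of Algorithm 3 are not legible in this version of the text, so the following is only a hedged sketch of one plausible form, in the spirit of earlier distribution-shift defenses. It assumes that the removal loss pulls the trainable model's output on triggered noise toward the frozen copy's output on clean noise, that the retention loss keeps the trainable model close to the frozen copy on clean noise, and that α weights the retention term. The function name, the mean-squared-error choice, and the role of α are all assumptions; the optional diffusion loss and β are omitted.

```python
import numpy as np

def purification_losses(model, frozen, eps, tau, alpha=1.0):
    """Compute assumed removal and retention losses for one noise batch.

    model, frozen: callables mapping a noise batch to a prediction
    eps: clean noise batch; tau: inverted trigger (broadcastable to eps)
    """
    clean_ref = frozen(eps)  # frozen copy evaluated on clean noise
    # Assumed removal loss: triggered input should behave like clean input.
    removal = float(np.mean((model(eps + tau) - clean_ref) ** 2))
    # Assumed retention loss: clean behavior should not drift.
    retention = float(np.mean((model(eps) - clean_ref) ** 2))
    total = removal + alpha * retention
    return removal, retention, total
```

In a real training loop these quantities would be computed in an automatic-differentiation framework so that the total loss can be backpropagated into the trainable copy, as in step 13 of the algorithm.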
4.5. Analysis
4.5.1. Convergence Analysis
4.5.2. Theoretical Analysis
5. Result Analysis
5.1. Experimental Design
5.2. Experimental Results
6. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| ACC | Accuracy |
| ASR | Attack Success Rate |
| CNN | Convolutional Neural Network |
| DDPM | Denoising Diffusion Probabilistic Model |
| DMs | Diffusion Models |
| FID | Fréchet Inception Distance |
| FN | False Negative |
| FP | False Positive |
| GAN | Generative Adversarial Network |
| GPU | Graphics Processing Unit |
| HE | Homomorphic Encryption |
| LDM | Latent Diffusion Model |
| NCSN | Noise-Conditioned Score Network |
| MLaaS | Machine Learning as a Service |
| ReLU | Rectified Linear Unit |
| TP | True Positive |
| SSIM | Structural Similarity Index |
| TV Loss | Total Variation Loss |

| Inversion Method | Inversion Loss | Trigger MSE | Detection Accuracy |
|---|---|---|---|
| Single-Step | 0.234 | 0.156 | 92% |
| Multi-Stage (2 Steps) | 0.089 | 0.043 | 98% |
| Multi-Stage (3 Steps) | 0.042 | 0.018 | 99% |

| Attack Method | Model Architecture | Accuracy w/ Ref | Accuracy w/o Ref |
|---|---|---|---|
| BadDiff | DDPM-C | 99.8% | 98.5% |
| BadDiff | DDPM-A | 100% | 99.2% |
| TrojDiff | DDPM-C | 99.5% | 96.0% |
| TrojDiff | DDIM-C | 99% | 97.5% |
| VillanDiff | NCSN-C | 100% | 98.8% |
| VillanDiff | LDM-A | 100% | 99.0% |

| Attack Method | ΔASR | ΔSSIM | ΔFID |
|---|---|---|---|
| BadDiff | −0.99 | −0.98 | +0.03 |
| TrojDiff | −0.98 | −0.96 | +0.04 |
| VillanDiff | −0.97 | −0.95 | +0.02 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Yang, K.; Gu, X.; An, F.; Ye, J.; Zhang, Z. A Unified Framework Based on Distribution Shift Modeling for Revealing and Eliminating Backdoor Attacks in Diffusion Models. Appl. Sci. 2026, 16, 5077. https://doi.org/10.3390/app16105077

