# How Much Is Enough? A Study on Diffusion Times in Score-Based Generative Models

## Abstract


## 1. Introduction

**both** for training and for sampling. This suggests choosing a smaller diffusion time. Given the importance of this problem, in this work we set out to study whether suitable operating regimes exist that strike the right balance between computational efficiency and model quality. The main contributions of this work are the following:

**Contribution 1:**

**Contribution 2:**

## 2. A Tradeoff on Diffusion Time

#### 2.1. Preliminaries: The ELBO Decomposition

#### 2.2. The Tradeoff on Diffusion Time

**Table 1.** Two main families of diffusion processes, where ${\sigma}^{2}(t)={\sigma}_{\min}^{2}{\left(\frac{{\sigma}_{\max}^{2}}{{\sigma}_{\min}^{2}}\right)}^{t}$ and $\beta(t)={\beta}_{0}+({\beta}_{1}-{\beta}_{0})t$.

Diffusion Process | SDE Coefficients | $p({x}_{t},t\,|\,{x}_{0})=\mathcal{N}(m,sI)$ | ${p}_{noise}(x)$
---|---|---|---
Variance Exploding | $\alpha(t)=0$, $g(t)=\sqrt{\frac{\mathrm{d}{\sigma}^{2}(t)}{\mathrm{d}t}}$ | $m={x}_{0}$, $s={\sigma}^{2}(t)-{\sigma}^{2}(0)$ | $\mathcal{N}(0,({\sigma}^{2}(T)-{\sigma}^{2}(0))I)$
Variance Preserving | $\alpha(t)=-\frac{1}{2}\beta(t)$, $g(t)=\sqrt{\beta(t)}$ | $m={e}^{-\frac{1}{2}{\int}_{0}^{t}\beta(\tau)\,\mathrm{d}\tau}{x}_{0}$, $s=1-{e}^{-{\int}_{0}^{t}\beta(\tau)\,\mathrm{d}\tau}$ | $\mathcal{N}(0,I)$
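The perturbation kernels in Table 1 are available in closed form, so both families can be evaluated directly. The sketch below is a minimal NumPy illustration; the schedule constants ($\sigma_{\min}=0.01$, $\sigma_{\max}=50$, $\beta_0=0.1$, $\beta_1=20$) are common defaults, not values taken from this paper.

```python
import numpy as np

def ve_kernel(x0, t, sigma_min=0.01, sigma_max=50.0):
    """Variance Exploding: p(x_t | x_0) = N(x_0, (sigma^2(t) - sigma^2(0)) I)."""
    sigma2 = lambda s: sigma_min**2 * (sigma_max**2 / sigma_min**2) ** s
    mean = x0
    var = sigma2(t) - sigma2(0.0)
    return mean, var

def vp_kernel(x0, t, beta0=0.1, beta1=20.0):
    """Variance Preserving: the integral of beta(t) = beta0 + (beta1 - beta0) t
    has the closed form beta0 * t + (beta1 - beta0) * t^2 / 2."""
    int_beta = beta0 * t + 0.5 * (beta1 - beta0) * t**2
    mean = np.exp(-0.5 * int_beta) * x0
    var = 1.0 - np.exp(-int_beta)
    return mean, var
```

Note how the VP kernel converges to $\mathcal{N}(0, I)$ as $t$ grows, matching the $p_{noise}$ column of Table 1.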

**Lemma 1.**

**Definition 1.**

**optimal score** ${\widehat{\mathit{s}}}_{\mathsf{\theta}}$ for any diffusion time T, as the score obtained using parameters that minimize $\mathit{I}({\mathit{s}}_{\mathsf{\theta}},T)$. Similarly, we define the

**optimal score gap** $\mathcal{G}({\widehat{\mathit{s}}}_{\mathsf{\theta}},T)$ for any diffusion time T, as the gap attained when using the optimal score.

**Note that Section 2.2 does not imply that $\mathcal{G}({\mathit{s}}_{{\mathsf{\theta}}_{a}},{T}_{2})\ge \mathcal{G}({\mathit{s}}_{{\mathsf{\theta}}_{b}},{T}_{1})$ holds for generic parameters ${\mathsf{\theta}}_{a},{\mathsf{\theta}}_{b}$.**
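To make the role of the training objective $I(s_{\theta}, T)$ concrete, here is a hedged sketch of a Monte Carlo estimator of a VP denoising score-matching objective over $t \in (0, T]$: the regression target is the conditional score of $p(x_t \mid x_0)$ from Table 1. The weighting and schedule constants are illustrative choices, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def dsm_loss(score_fn, x0_sampler, T, n=4096, beta0=0.1, beta1=20.0):
    """Monte Carlo estimate of a VP denoising score-matching objective:
    t ~ U(0, T], x_t ~ p(x_t | x_0) with the mean/variance of Table 1,
    and the regression target is the conditional score -(x_t - m x_0)/s."""
    t = rng.uniform(1e-5, T, size=n)
    int_beta = beta0 * t + 0.5 * (beta1 - beta0) * t**2
    m = np.exp(-0.5 * int_beta)            # mean coefficient of p(x_t | x_0)
    s = 1.0 - np.exp(-int_beta)            # variance of p(x_t | x_0)
    x0 = x0_sampler(n)
    eps = rng.standard_normal(n)
    xt = m * x0 + np.sqrt(s) * eps
    target = -eps / np.sqrt(s)             # = grad_x log p(x_t | x_0)
    # weighting by s turns the loss into the familiar noise-prediction MSE
    return np.mean(s * (score_fn(xt, t) - target) ** 2)
```

As a sanity check: for standard normal data the marginal score is $-x$ at every $t$, so it attains a strictly lower loss than the trivial zero score.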

#### 2.3. Is There an Optimal Diffusion Time?

**our final objective in this work is not to find and use an optimal diffusion time**. Instead, our result on the existence of optimal diffusion times (which can be smaller than those set by popular heuristics) serves to motivate the choice of small diffusion times, which, however, calls for a method to overcome approximation errors. For completeness, in Appendix H we show that optimizing the ELBO to obtain an optimal diffusion time $T^{\star}$ is technically feasible, without resorting to exhaustive grid search.
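As a hedged illustration of how such a $T^{\star}$ could be located without a grid (the actual procedure is the subject of Appendix H): if the ELBO is treated as a unimodal scalar function of $T$, any bracketing line search applies. The golden-section search below and the quadratic surrogate peaked at a hypothetical $T^{\star}=0.37$ are purely illustrative.

```python
def golden_section_max(f, lo, hi, tol=1e-4):
    """Maximize a scalar function assumed unimodal on [lo, hi].
    Needs O(log(1/tol)) evaluations instead of an exhaustive grid."""
    invphi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - invphi * (b - a), a + invphi * (b - a)
        if f(c) >= f(d):
            b = d          # the maximum lies in [a, d]
        else:
            a = c          # the maximum lies in [c, b]
    return 0.5 * (a + b)

# toy stand-in for T -> ELBO(T), peaked at a hypothetical T* = 0.37
t_star = golden_section_max(lambda T: -(T - 0.37) ** 2, 0.05, 1.0)
```

In practice, each evaluation of $f$ would be an ELBO estimate at a given $T$, so keeping the evaluation count logarithmic in the tolerance is what makes the optimization feasible.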

#### 2.4. Relation with Diffusion Process Noise Schedule

#### 2.5. Relation with Literature on Bounds and Goodness of Score Assumptions

**always** a non-decreasing function of T. First, we question whether the current best practice for setting diffusion times is adequate: we find that, in realistic implementations, diffusion times are larger than necessary. Second, we introduce a new approach with provably the same performance as standard diffusion models but lower computational complexity, as highlighted in Section 3.

## 3. A New, Practical Method for Decreasing Diffusion Times

**both** a small gap $\mathcal{G}({\mathit{s}}_{\mathsf{\theta}},T)$ and a small discrepancy $\mathrm{KL}\left[\,p(\mathit{x},T)\,\|\,{p}_{noise}(\mathit{x})\,\right]$. Before that, let us use Figure 3 to summarize all the densities involved and the effects of the various approximations, which will be useful for visualizing our proposal.
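The idea can be sketched at an intuition level: diffuse the data only up to a small $T$, fit an auxiliary model to $p(x, T)$, and draw initial reverse-diffusion states from it instead of from $p_{noise}$. The snippet below uses a moment-matched Gaussian as a deliberately crude stand-in for the auxiliary model (richer models such as normalizing flows are used in practice); the data distribution and schedule constants are toy choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def diffuse_vp(x0, T, beta0=0.1, beta1=20.0):
    """Push data through the VP kernel of Table 1 up to a small time T."""
    ib = beta0 * T + 0.5 * (beta1 - beta0) * T**2
    return np.exp(-0.5 * ib) * x0 + np.sqrt(1.0 - np.exp(-ib)) * rng.standard_normal(x0.shape)

def fit_gaussian_aux(x_T):
    """Crude auxiliary model: moment-matched Gaussian approximating p(x, T)."""
    return x_T.mean(), x_T.std()

def sample_aux(mu, sd, n):
    """Draw initial reverse-diffusion states from the auxiliary model."""
    return mu + sd * rng.standard_normal(n)

# at small T the marginal p(x, T) is still far from N(0, 1); the auxiliary
# model captures it, while starting from p_noise would incur a large KL
x0 = 3.0 + rng.standard_normal(20000)    # toy data centred at 3
mu, sd = fit_gaussian_aux(diffuse_vp(x0, T=0.2))
```

Here the fitted mean stays close to $e^{-\frac{1}{2}\int_0^{0.2}\beta(\tau)\mathrm{d}\tau} \cdot 3$, i.e., the auxiliary model tracks the partially diffused data rather than the standard normal prior.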

#### 3.1. Auxiliary Model Fitting and Guarantees

**Proposition 1.**

**Proposition 2.**

#### 3.2. Comparison with Schrödinger Bridges

#### 3.3. An Extension for Density Estimation

## 4. Experiments

**version of the methods, the number of sampling steps can decrease linearly with** T, in accordance with theory [45], while retaining good BPD and FID scores. Similarly, although not in a linear fashion, the number of steps of the ODE samplers can be reduced by using a smaller diffusion time T.
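The linear scaling is mechanical: with a fixed step size $\Delta t$, integrating the reverse SDE from $T$ down to $0$ takes $T/\Delta t$ steps. A hedged Euler-Maruyama sketch for the VP reverse SDE follows; the analytic toy score and schedule constants are illustrative, not the paper's trained model.

```python
import numpy as np

def reverse_sde_sample(score, T, dt=1e-3, n=2000, beta0=0.1, beta1=20.0, seed=0):
    """Euler-Maruyama integration of the VP reverse SDE from t = T down to 0.
    The step count is int(T / dt): halving T halves the number of steps."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)            # stand-in for p_noise (or an aux model)
    steps = int(T / dt)
    for i in range(steps):
        t = T - i * dt
        beta = beta0 + (beta1 - beta0) * t
        # reverse-time drift -(1/2) beta x - beta * score, integrated backwards
        x = x + (0.5 * beta * x + beta * score(x, t)) * dt \
              + np.sqrt(beta * dt) * rng.standard_normal(n)
    return x, steps

# for standard normal data the marginal score is -x at every t,
# so the sampler should preserve the N(0, 1) distribution
samples, nsteps = reverse_sde_sample(lambda x, t: -x, T=0.4)
```

With `T=0.4` and `dt=1e-3` the loop runs 400 times, against 1000 for `T=1.0`: the cost reduction comes for free once a smaller diffusion time is justified.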

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## Appendix A. Generic Definitions and Assumptions

## Appendix B. Deriving Equation (4) from [32]

## Appendix C. Proof of Equation (5)

**Proof.**

## Appendix D. Proof of Equation (8)

**Proof.**

## Appendix E. Proof of Lemma 1

#### Appendix E.1. The Variance Preserving (VP) Convergence

#### Appendix E.2. The Variance Exploding (VE) Convergence

## Appendix F. Proof for the Optimal Score Gap Term, Section 2.2

**Proof.**

## Appendix G. Proof of Section 2.3

**Proof.**

## Appendix H. Optimization of $T^{\star}$

## Appendix I. Proof of Proposition 2

**Proof.**

## Appendix J. Invariance to Noise Schedule

#### Appendix J.1. Preliminaries

#### Appendix J.2. Different Noise Schedules

**Theorem A1.**

**Proof.**

**Theorem A2.**

**Proof.**

## Appendix K. Experimental Details

#### Appendix K.1. Toy Example Details

#### Appendix K.2. Section 4 Details

#### Appendix K.3. Varying T

## Appendix L. Non-Curated Samples

**Figure A10.** CELEBA images. **Top Left**: our method with pretrained score model and Glow ($T=0.2$). **Top Right**: our method with pretrained score model and Glow ($T=0.5$). **Bottom Left**: baseline diffusion ($T=1.0$). **Bottom Right**: FID scores for our method and baseline ($T=1.0$).

## References

- Sohl-Dickstein, J.; Weiss, E.; Maheswaranathan, N.; Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015. [Google Scholar]
- Song, Y.; Ermon, S. Generative Modeling by Estimating Gradients of the Data Distribution. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
- Song, Y.; Sohl-Dickstein, J.; Kingma, D.P.; Kumar, A.; Ermon, S.; Poole, B. Score-Based Generative Modeling through Stochastic Differential Equations. In Proceedings of the International Conference on Learning Representations, Virtual, 30 April–3 May 2021. [Google Scholar]
- Vahdat, A.; Kreis, K.; Kautz, J. Score-based Generative Modeling in Latent Space. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–14 December 2021. [Google Scholar]
- Kingma, D.; Salimans, T.; Poole, B.; Ho, J. Variational Diffusion Models. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–14 December 2021. [Google Scholar]
- Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–12 December 2020. [Google Scholar]
- Song, J.; Meng, C.; Ermon, S. Denoising Diffusion Implicit Models. In Proceedings of the International Conference on Learning Representations, Virtual, 30 April–3 May 2021. [Google Scholar]
- Kong, Z.; Ping, W.; Huang, J.; Zhao, K.; Catanzaro, B. DiffWave: A Versatile Diffusion Model for Audio Synthesis. In Proceedings of the International Conference on Learning Representations, Virtual, 30 April–3 May 2021. [Google Scholar]
- Lee, S.G.; Kim, H.; Shin, C.; Tan, X.; Liu, C.; Meng, Q.; Qin, T.; Chen, W.; Yoon, S.; Liu, T.Y. PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022. [Google Scholar]
- Dhariwal, P.; Nichol, A. Diffusion Models Beat GANs on Image Synthesis. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–14 December 2021. [Google Scholar]
- Nichol, A.Q.; Dhariwal, P. Improved Denoising Diffusion Probabilistic Models. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021. [Google Scholar]
- Tashiro, Y.; Song, J.; Song, Y.; Ermon, S. CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–14 December 2021. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
- Kingma, D.P.; Salimans, T.; Jozefowicz, R.; Chen, X.; Sutskever, I.; Welling, M. Improved Variational Inference with Inverse Autoregressive Flow. In Proceedings of the Advances in Neural Information Processing Systems 29, Barcelona, Spain, 5–10 December 2016. [Google Scholar]
- Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. In Proceedings of the International Conference on Learning Representations, Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
- Tran, B.H.; Rossi, S.; Milios, D.; Michiardi, P.; Bonilla, E.V.; Filippone, M. Model Selection for Bayesian Autoencoders. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–14 December 2021. [Google Scholar]
- Anderson, B.D. Reverse-Time Diffusion Equation Models. Stoch. Process. Their Appl. **1982**, 12, 313–326. [Google Scholar] [CrossRef][Green Version]
- Song, Y.; Durkan, C.; Murray, I.; Ermon, S. Maximum Likelihood Training of Score-Based Diffusion Models. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–14 December 2021. [Google Scholar]
- Särkkä, S.; Solin, A. Applied Stochastic Differential Equations; Institute of Mathematical Statistics Textbooks, Cambridge University Press: Cambridge, UK, 2019. [Google Scholar] [CrossRef][Green Version]
- Zheng, H.; He, P.; Chen, W.; Zhou, M. Truncated Diffusion Probabilistic Models. CoRR 2022. abs/2202.09671. Available online: http://xxx.lanl.gov/abs/2202.09671 (accessed on 28 March 2023).
- Austin, J.; Johnson, D.D.; Ho, J.; Tarlow, D.; van den Berg, R. Structured Denoising Diffusion Models in Discrete State-Spaces. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–14 December 2021. [Google Scholar]
- Jolicoeur-Martineau, A.; Li, K.; Piché-Taillefer, R.; Kachman, T.; Mitliagkas, I. Gotta Go Fast When Generating Data with Score-Based Models. CoRR 2021. abs/2105.14080. Available online: http://xxx.lanl.gov/abs/2105.14080 (accessed on 28 March 2023).
- Salimans, T.; Ho, J. Progressive Distillation for Fast Sampling of Diffusion Models. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022. [Google Scholar]
- Xiao, Z.; Kreis, K.; Vahdat, A. Tackling the Generative Learning Trilemma with Denoising Diffusion GANs. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022. [Google Scholar]
- Watson, D.; Ho, J.; Norouzi, M.; Chan, W. Learning to Efficiently Sample from Diffusion Probabilistic Models. CoRR 2021. abs/2106.03802. Available online: http://xxx.lanl.gov/abs/2106.03802 (accessed on 28 March 2023).
- Dockhorn, T.; Vahdat, A.; Kreis, K. Score-Based Generative Modeling with Critically-Damped Langevin Diffusion. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022. [Google Scholar]
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022. [Google Scholar]
- Bao, F.; Li, C.; Zhu, J.; Zhang, B. Analytic-DPM: An Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022. [Google Scholar]
- De Bortoli, V.; Thornton, J.; Heng, J.; Doucet, A. Diffusion Schrödinger Bridge with Applications to Score-Based Generative Modeling. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–14 December 2021. [Google Scholar]
- De Bortoli, V. Convergence of denoising diffusion models under the manifold hypothesis. arXiv **2022**, arXiv:2208.05314. [Google Scholar]
- Lee, H.; Lu, J.; Tan, Y. Convergence for score-based generative modeling with polynomial complexity. arXiv **2022**, arXiv:2206.06227. [Google Scholar]
- Huang, C.W.; Lim, J.H.; Courville, A.C. A Variational Perspective on Diffusion-Based Generative Models and Score Matching. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–14 December 2021. [Google Scholar]
- Villani, C. Optimal Transport: Old and New; Springer: Berlin/Heidelberg, Germany, 2009; Volume 338. [Google Scholar]
- Chen, S.; Chewi, S.; Li, J.; Li, Y.; Salim, A.; Zhang, A.R. Sampling is as easy as learning the score: Theory for diffusion models with minimal data assumptions. arXiv **2022**, arXiv:2209.11215. [Google Scholar]
- Chen, Y.; Georgiou, T.T.; Pavon, M. Stochastic control liaisons: Richard Sinkhorn meets Gaspard Monge on a Schrödinger bridge. SIAM Rev. **2021**, 63, 249–313. [Google Scholar] [CrossRef]
- Chen, T.; Liu, G.H.; Theodorou, E. Likelihood Training of Schrödinger Bridge using Forward-Backward SDEs Theory. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022. [Google Scholar]
- Chen, R.T.Q.; Rubanova, Y.; Bettencourt, J.; Duvenaud, D.K. Neural Ordinary Differential Equations. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018. [Google Scholar]
- Grathwohl, W.; Chen, R.T.Q.; Bettencourt, J.; Duvenaud, D. Scalable Reversible Generative Models with Free-form Continuous Dynamics. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Kynkäänniemi, T.; Karras, T.; Aittala, M.; Aila, T.; Lehtinen, J. The Role of ImageNet Classes in Fréchet Inception Distance. CoRR 2022. abs/2203.06026. Available online: http://xxx.lanl.gov/abs/2203.06026 (accessed on 28 March 2023).
- Theis, L.; van den Oord, A.; Bethge, M. A Note on the Evaluation of Generative Models. In Proceedings of the International Conference on Learning Representations, San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
- Rasmussen, C. The Infinite Gaussian Mixture Model. In Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA, 29 November–4 December 1999. [Google Scholar]
- Görür, D.; Edward Rasmussen, C. Dirichlet Process Gaussian Mixture Models: Choice of the Base Distribution. J. Comput. Sci. Technol. **2010**, 25, 653–664. [Google Scholar] [CrossRef][Green Version]
- Kingma, D.P.; Dhariwal, P. Glow: Generative Flow with Invertible 1 × 1 Convolutions. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018. [Google Scholar]
- Hoogeboom, E.; Gritsenko, A.A.; Bastings, J.; Poole, B.; van den Berg, R.; Salimans, T. Autoregressive Diffusion Models. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022. [Google Scholar]
- Kloeden, P.E.; Platen, E. Numerical Solution of Stochastic Differential Equations; Springer: Berlin/Heidelberg, Germany, 1995. [Google Scholar]
- Karras, T.; Aittala, M.; Aila, T.; Laine, S. Elucidating the Design Space of Diffusion-Based Generative Models. arXiv **2022**, arXiv:2206.00364. [Google Scholar]

**Figure 1.** Effect of T on a toy model: low diffusion times are detrimental to sample quality (likelihood of 1024 samples, median and 95% quantile over 8 random seeds).

**Figure 2.** ELBO decomposition, ELBO, and likelihood for a 1D toy model, as a function of diffusion time T. The numerical results confirm the tradeoff and optimality predicted by our theory.

**Figure 3.** Intuitive illustration of the forward and backward diffusion processes. Discrepancies between distributions are illustrated as distances. Color coding is discussed in the text.

**Figure 4.** Complexity of the auxiliary model as a function of diffusion time (median and 95% quantiles over 4 random seeds).

**Figure 5.** Visualization of some samples. Top to bottom: ScoreSDE [3] ($T=1$, BPD $=1.16$), ScoreSDE ($T=0.4$, BPD $=1.25$), Our ($T=0.4$, BPD $=1.17$).

**Figure 7.** Training curves of score models for different diffusion times T, recorded over the span of $1.3$ million iterations.

Dataset | Time T | BPD (↓)
---|---|---
MNIST | $1.0$ | $1.16$
MNIST | $0.6$ | $\mathbf{1.16}$
MNIST | $0.4$ | $1.25$
MNIST | $0.2$ | $1.75$
CIFAR10 | $1.0$ | $3.09$
CIFAR10 | $0.6$ | $\mathbf{3.07}$
CIFAR10 | $0.4$ | $3.09$
CIFAR10 | $0.2$ | $3.38$

**Table 3.** Experimental results on MNIST. For our method, (S) denotes the extension in Section 3.3.

Model | NFE (↓) (ODE) | BPD (↓) | BPD (↓) with (S)
---|---|---|---
ScoreSDE | 300 | $1.16$ | −
ScoreSDE ($T=0.6$) | 258 | $1.16$ | −
Our ($T=0.6$) | 258 | $1.16$ | $1.14$
ScoreSDE ($T=0.4$) | 235 | $1.25$ | −
Our ($T=0.4$) | 235 | $1.17$ | $1.16$
ScoreSDE ($T=0.2$) | 191 | $1.75$ | −
Our ($T=0.2$) | 191 | $1.33$ | $1.31$

**Table 4.** Experimental results on CIFAR10, including other relevant baselines and sampling efficiency enhancements from the literature.

Model | FID $(\downarrow)$ | BPD $(\downarrow)$ | NFE $(\downarrow)$ (SDE) | NFE $(\downarrow)$ (ODE)
---|---|---|---|---
ScoreSDE [3] | $3.64$ | $3.09$ | 1000 | 221
ScoreSDE ($T=0.6$) | $5.74$ | $3.07$ | 600 | 200
ScoreSDE ($T=0.4$) | $24.91$ | $3.09$ | 400 | 187
ScoreSDE ($T=0.2$) | $339.72$ | $3.38$ | 200 | 176
Our ($T=0.6$) | $3.72$ | $3.07$ | 600 | 200
Our ($T=0.4$) | $5.44$ | $3.06$ | 400 | 187
Our ($T=0.2$) | $14.38$ | $3.06$ | 200 | 176
ARDM [44] | − | $2.69$ | 3072 | −
VDM [5] | $4.0$ | $2.49$ | 1000 | −
D3PMs [21] | $7.34$ | $3.43$ | 1000 | −
DDPM [6] | $3.21$ | $3.75$ | 1000 | −
Gotta Go Fast [22] | $2.44$ | − | 180 | −
LSGM [4] | $2.10$ | $2.87$ | $120/138$ | −
ARDM-P [44] | − | $2.68/2.74$ | $200/50$ | −

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Franzese, G.; Rossi, S.; Yang, L.; Finamore, A.; Rossi, D.; Filippone, M.; Michiardi, P.
How Much Is Enough? A Study on Diffusion Times in Score-Based Generative Models. *Entropy* **2023**, *25*, 633.
https://doi.org/10.3390/e25040633
