Utilizing Latent Diffusion Model to Accelerate Sampling Speed and Enhance Text Generation Quality
Abstract
1. Introduction
- Enriching the ChnSentiCorp dataset by adding topic attributes and expanding the binary emotion classification into seven finer-grained emotions.
- Leveraging a pretrained encoder-decoder to encode text into a lower-dimensional latent vector space, handling the conversion from discrete text to a continuous representation and circumventing the rounding-loss problems inherent in traditional embedding-based methods.
- Designing a sequence diffusion process without classifier guidance that casts controllable text generation as a Seq2Seq task and performs diffusion directly on the low-dimensional latent vectors, avoiding the generation-quality degradation caused by introducing external classifiers (a short code sketch after this list illustrates the latent encoding and noising steps).
- Incorporating theme and emotion information so that the generated texts adhere more closely to the intended topics and exhibit fine-grained emotional expression.
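A minimal sketch of the two ideas above, assuming a Hugging Face BART checkpoint, a linear noise schedule, and the 512-step setting reported in the experiments; this is illustrative only, not the authors' released code, and the checkpoint name and helper functions are assumptions.

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

# Illustrative English checkpoint; the paper targets Chinese text, so a Chinese
# BART checkpoint would be substituted in practice.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
bart = BartForConditionalGeneration.from_pretrained("facebook/bart-base").eval()

def encode_to_latent(text: str) -> torch.Tensor:
    """Map discrete text to a continuous latent sequence with the BART encoder."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        return bart.get_encoder()(**inputs).last_hidden_state  # (1, seq_len, hidden)

# Forward (noising) process of the latent diffusion. The linear beta schedule is
# an assumption; T = 512 matches the best-performing setting in Section 4.
T = 512
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(z0: torch.Tensor, t: int) -> torch.Tensor:
    """Corrupt a clean latent z0 into z_t; a denoiser is trained to invert this."""
    noise = torch.randn_like(z0)
    a_bar = alphas_cumprod[t]
    return a_bar.sqrt() * z0 + (1.0 - a_bar).sqrt() * noise

z0 = encode_to_latent("The hotel room was clean and the staff were friendly.")
z_t = q_sample(z0, t=100)  # (z_t, t) pairs supervise the latent denoiser during training
```

Because the diffusion runs on these continuous encoder states rather than on word embeddings, decoding back to text would go through the pretrained BART decoder on the denoised latent, which is how this design can avoid the rounding step used by embedding-space diffusion models.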
2. Related Works
2.1. Text Generation Based on Pretrained Models
2.2. Text Generation Based on Diffusion Models
3. Materials and Methods
3.1. Using BART for Text Sequence Encoding and Decoding
3.2. Sequence Diffusion Process in Latent Space
3.3. Context-Guided Strategy Based on Prompt
4. Experiment and Result Analysis
4.1. Dataset
4.2. Evaluation Metrics
4.3. Experimental Results Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Topic distribution of the enriched dataset:

| Computer | Hotel | Books | Total |
|---|---|---|---|
| 2741 | 3910 | 2773 | 9090 |
Emotion distribution of the enriched dataset:

| Anger | Disgust | Sadness | Fear | Surprise | Liking | Happiness | Total |
|---|---|---|---|---|---|---|---|
| 824 | 749 | 2741 | 13 | 228 | 2435 | 2100 | 9090 |
Perplexity (ppl↓) on Datasets A and B under different numbers of diffusion time steps:

| Method | Text Embedding Method | Time Steps | Dataset A ppl↓ | Dataset B ppl↓ |
|---|---|---|---|---|
| LaDiffuSeq | BART | 32 | 223.57 | 145.93 |
| LaDiffuSeq | BART | 64 | 97.88 | 83.65 |
| LaDiffuSeq | BART | 256 | 43.525 | 44.59 |
| LaDiffuSeq | BART | 512 | 31.085 | 38.635 |
| D3PM | none | 512 | 225.15 | 152.75 |
| Diffusion-LM | end-to-end | 2000 | 196.164 | 130.145 |
| DiffuSeq | end-to-end | 2000 | 34.418 | 43.197 |
| SeqDiffuSeq | end-to-end | 2000 | 67.877 | 47.917 |
| GPT-2 | GPT | 1 | 38.7 | 35.78 |
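The ppl↓ columns above are typically produced by scoring each generated sentence with an external pretrained language model and exponentiating the mean token-level cross-entropy. The sketch below shows that procedure; using GPT-2 as the scorer and the sample sentence are assumptions for illustration, not necessarily the exact evaluation setup used in the paper.

```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(sentence: str) -> float:
    """Perplexity of one sentence under the scoring LM (lower is more fluent)."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss  # mean cross-entropy over predicted tokens
    return math.exp(loss.item())

samples = ["The hotel staff were friendly and the room was spotless."]
print(sum(perplexity(s) for s in samples) / len(samples))  # corpus-level average ppl
```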
Comparison with baseline methods on Datasets A and B:

| Method | Dataset A bleu↑ | Dataset A self_bleu↓ | Dataset A BERTScore↑ | Dataset A ppl↓ | Dataset B bleu↑ | Dataset B self_bleu↓ | Dataset B BERTScore↑ | Dataset B ppl↓ |
|---|---|---|---|---|---|---|---|---|
| Diffusion-LM | 0.256 | 0.402 | 0.547 | 196.164 | 0.268 | 0.451 | 0.587 | 130.145 |
| DiffuSeq | 0.478 | 0.499 | 0.567 | 34.418 | 0.875 | 0.917 | 0.930 | 43.197 |
| SeqDiffuSeq | 0.476 | 0.571 | 0.589 | 67.877 | 0.501 | 0.627 | 0.711 | 47.917 |
| LaDiffuSeq (no prompt) | 0.493 | 0.481 | 0.670 | 33.332 | 0.882 | 0.485 | 0.932 | 39.733 |
| LaDiffuSeq | 0.501 | 0.476 | 0.672 | 31.085 | 0.899 | 0.473 | 0.939 | 38.635 |
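The bleu↑ and self_bleu↓ columns follow the usual definitions: BLEU against reference texts measures fidelity, while self-BLEU scores each generated sample against the other generated samples, so lower values mean more diverse output. A minimal sketch in the Texygen style is given below; the whitespace tokenization and toy sentences are simplifying assumptions (Chinese text would be word-segmented first), not the authors' exact evaluation script.

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

smooth = SmoothingFunction().method1

def bleu(references: list[str], hypothesis: str) -> float:
    """4-gram BLEU of one hypothesis against a set of reference sentences."""
    refs = [r.split() for r in references]
    return sentence_bleu(refs, hypothesis.split(), smoothing_function=smooth)

def self_bleu(samples: list[str]) -> float:
    """Average BLEU of each sample against all other samples (lower = more diverse)."""
    scores = [bleu(samples[:i] + samples[i + 1:], hyp) for i, hyp in enumerate(samples)]
    return sum(scores) / len(scores)

generated = [
    "the hotel staff were very helpful",
    "the room was small but clean",
    "the hotel staff were very helpful and polite",
]
print(self_bleu(generated))
```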
Results under different control attributes (Dataset A):

| Control Attributes | bleu↑ | self_bleu↓ | BERTScore↑ | ppl↓ |
|---|---|---|---|---|
| unconditional | 0.505 | 0.469 | 0.675 | 31.069 |
| 3 themes | 0.503 | 0.469 | 0.677 | 31.372 |
| 2 emotions | 0.502 | 0.472 | 0.677 | 31.493 |
| 7 emotions | 0.499 | 0.478 | 0.676 | 31.037 |
| 3 themes + 2 emotions | 0.503 | 0.473 | 0.673 | 31.100 |
| 3 themes + 7 emotions | 0.501 | 0.476 | 0.672 | 31.085 |
Results under different control attributes (Dataset B):

| Control Attributes | bleu↑ | self_bleu↓ | BERTScore↑ | ppl↓ |
|---|---|---|---|---|
| unconditional | 0.892 | 0.473 | 0.939 | 38.352 |
| debate topic | 0.895 | 0.472 | 0.933 | 38.424 |
| stance | 0.899 | 0.476 | 0.931 | 38.550 |
| debate topic + stance | 0.899 | 0.473 | 0.939 | 38.635 |