AI Clothing Pattern Generation: Combining Improved Pix2Pix Image Generation and Diffusion Model Repairing
Abstract
1. Introduction
2. Related Work
2.1. GAN Image Generation
2.2. Quality Repairing for Diffusion Model
2.3. GAN and Diffusion Model Fusion Paradigms
2.4. Discussion of Recent Garment Generation Models
3. Method
3.1. Improved Pix2Pix Model
3.2. Diffusion Probabilistic Models
3.3. Intrinsic Collaborative Mechanism of the Two-Stage Framework
- (1) Training Stage: Data Augmentation and Difficulty Decoupling for Collaborative Optimization. First, the improved Pix2Pix module is trained on the original small-scale labeled paired dataset to learn the structural mapping from design sketches to pattern diagrams. After training, the Pix2Pix model generates a large number of coarse-grained pattern samples with complete basic structures, which are combined with the original real patterns to form an augmented dataset for the diffusion model. This expands the training data from the original 990 samples to 3120 augmented samples, mitigating the overfitting and training instability of the diffusion model on small datasets. The two-stage framework also decouples learning difficulty: the Pix2Pix module learns only the core geometric structure and topological mapping, while the diffusion module learns only detail refinement and line optimization given the structural prior. Neither module has to learn global structure and local details simultaneously, a requirement that can cause convergence difficulties and performance degradation in a single model, so the learning difficulty of each module is reduced.
- (2) Inference Stage: Structural Prior Constraints and Detail Enhancement for Collaborative Generation. During inference, the improved Pix2Pix first generates a coarse-grained pattern with a complete basic structure from the input design sketch, and this pattern is used as the conditional input for the diffusion model. The coarse-grained pattern provides a strict structural prior and geometric constraints, keeping the diffusion model from deviating from the target structure during denoising. This addresses the core problem of structural distortion that occurs when a diffusion model is used on its own in small-sample scenarios.
- (3) Performance Complementarity: Addressing the Limitations of Single Models. The standalone improved Pix2Pix model maintains stable structural generation under small-sample conditions, but it is limited by the adversarial learning mechanism of GANs. It is prone to edge blurring, local line breaks, and detail loss in complex structures such as curved lines and cutting lines, and therefore fails to meet industrial production requirements. The standalone conditional diffusion model has excellent detail generation capability, but with small annotated datasets it is susceptible to structural deformation, topological deviation, and unstable generation, which makes the generated patterns unsuitable for direct production use. The two-stage framework pairs the structural stability of the former with the detail quality of the latter.
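The data flow of the two stages described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: `build_augmented_set`, `two_stage_inference`, `toy_generator`, and `toy_denoise_step` are hypothetical names, images are represented as plain lists of floats, and the figure of 2130 generated samples is an assumption obtained by subtracting the 990 real samples from the 3120 total.

```python
import random
from typing import Callable, List, Sequence, Tuple

Image = List[float]  # toy stand-in for an image; real code would use tensors


def build_augmented_set(
    real_patterns: Sequence[Image],
    sketches: Sequence[Image],
    coarse_generator: Callable[[Image], Image],
) -> List[Image]:
    """Training stage: real patterns plus stage-1 outputs form the diffusion training set."""
    generated = [coarse_generator(s) for s in sketches]
    return list(real_patterns) + generated


def two_stage_inference(
    sketch: Image,
    coarse_generator: Callable[[Image], Image],
    denoise_step: Callable[[Image, Image, int], Image],
    num_steps: int,
    rng: random.Random,
) -> Tuple[Image, Image]:
    """Inference stage: the coarse pattern conditions every denoising step."""
    coarse = coarse_generator(sketch)            # structural prior from stage 1
    x = [rng.gauss(0.0, 1.0) for _ in coarse]    # start from Gaussian noise
    for t in reversed(range(num_steps)):         # t = num_steps - 1, ..., 0
        x = denoise_step(x, coarse, t)
    return coarse, x


# Toy stand-ins (assumptions for illustration only, not trained models).
def toy_generator(sketch: Image) -> Image:
    return [0.5 * v for v in sketch]


def toy_denoise_step(x: Image, cond: Image, t: int) -> Image:
    # Blend toward the condition; at t = 0 the output equals the condition.
    return [(t * xi + ci) / (t + 1) for xi, ci in zip(x, cond)]


real = [[1.0, 0.0]] * 990              # 990 real patterns, as stated in the text
extra_sketches = [[0.2, 0.4]] * 2130   # assumed count so that the total is 3120
augmented = build_augmented_set(real, extra_sketches, toy_generator)
print(len(augmented))

coarse, refined = two_stage_inference(
    [0.2, 0.4], toy_generator, toy_denoise_step, 10, random.Random(0)
)
print(coarse, refined)
```

In a real system the two callables would be the trained Pix2Pix generator and one reverse step of the conditional diffusion model. The sketch only fixes the interface between them: the coarse pattern is computed once and passed unchanged to every denoising step.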
4. Experiments
4.1. Dataset and Implementation Details
4.2. Analysis of Weighting Factors and Evaluation Indicators
4.3. Experimental Results and Comparison with State-of-the-Art Methods
4.3.1. Generation of Pattern-Making for the Sleeve
4.3.2. Generation of Pattern-Making for the Back Panels
4.4. Ablation Study
4.4.1. Loss Function Ablation
4.4.2. Module Ablation
4.4.3. Discriminator Architecture Ablation
5. Conclusions
- (1) Improving the structure-capturing ability of Pix2Pix: A multi-scale discriminator (fusing a local PatchGAN and a global discriminator) and a composite structure-aware loss function balance global structural consistency and local detail accuracy, overcoming the limitations of traditional Pix2Pix in expressing structural differences in clothing patterns.
- (2) Two-stage generation-repair framework for data scarcity: In the proposed hybrid architecture combining a GAN and a diffusion model, the improved Pix2Pix establishes a rough structural correspondence under limited supervision and provides augmented training samples for the diffusion model. The diffusion model then optimizes details on top of these structures. This alleviates the instability of the diffusion model on small datasets and improves structural integrity and detail quality together.
- (3) Progressive verification logic for components of different complexity: Model performance is verified first on sleeves (a simple component containing only contour lines) and then on the back piece (a complex component containing cutting lines and pleats). This progression validates the basic generation ability of the framework and confirms its robustness in handling complex structures. Compared with existing methods (ControlNet, GAN, and FUNIT), the framework achieves the best scores in the clothing pattern generation task, with SSIM on Canny edge images reaching 0.869, PSNR reaching 22.31, and LPIPS reaching 0.1318, which supports its practicality and effectiveness for automated clothing pattern production.
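As a rough illustration of how a local PatchGAN branch and a global branch might be fused into one discriminator objective, the sketch below combines the two real-versus-fake losses with a weighted sum. The function names, the binary cross-entropy formulation, and the equal weights `w_local` and `w_global` are assumptions made for this example; the fusion rule and weights used in the paper are not stated in this section.

```python
import math
from typing import Sequence


def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def branch_loss(real_logits: Sequence[float], fake_logits: Sequence[float]) -> float:
    """Average real-vs-fake loss for one discriminator branch."""
    real_term = sum(bce_with_logits(z, 1.0) for z in real_logits) / len(real_logits)
    fake_term = sum(bce_with_logits(z, 0.0) for z in fake_logits) / len(fake_logits)
    return 0.5 * (real_term + fake_term)


def multi_scale_d_loss(
    patch_real: Sequence[float],   # one logit per image patch (local branch)
    patch_fake: Sequence[float],
    global_real: float,            # one logit per image (global branch)
    global_fake: float,
    w_local: float = 0.5,          # assumed weight, not taken from the paper
    w_global: float = 0.5,         # assumed weight, not taken from the paper
) -> float:
    """Weighted sum of the local (patch-level) and global (image-level) losses."""
    local = branch_loss(patch_real, patch_fake)
    global_ = branch_loss([global_real], [global_fake])
    return w_local * local + w_global * global_


# An undecided discriminator (all logits zero) versus a confident, correct one.
undecided = multi_scale_d_loss([0.0] * 4, [0.0] * 4, 0.0, 0.0)
confident = multi_scale_d_loss([3.0] * 4, [-3.0] * 4, 3.0, -3.0)
print(undecided, confident)
```

The local branch scores each patch separately, so it reacts to broken or blurred line segments, while the global branch produces a single score for the overall silhouette. A weighted sum is the simplest way to let both signals reach the generator.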
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| GAN | Generative Adversarial Network |
| cGANs | Conditional Generative Adversarial Networks |
| DPMs | Diffusion Probabilistic Models |
| SSIM | Structural Similarity Index |
| PSNR | Peak Signal-to-Noise Ratio |
| LPIPS | Learned Perceptual Image Patch Similarity |
| MSE | Mean Square Error |
| IoU | Intersection over Union |











| Category | Method | SSIM (Original Images) | SSIM (Canny Edge Images) | PSNR | LPIPS |
|---|---|---|---|---|---|
| Existing Methods | ControlNet | 0.765 | 0.788 | 19.58 | 0.2397 |
| Existing Methods | GAN | 0.771 | 0.787 | 18.00 | 0.2622 |
| Existing Methods | FUNIT | 0.770 | 0.750 | 19.00 | 0.3223 |
| Existing Methods | Pix2Pix | 0.745 | 0.780 | 19.25 | 0.2013 |
| Proposed Method | Improved Pix2Pix | 0.764 | 0.780 | 20.20 | 0.1710 |
| Proposed Method | DPMs repaired | 0.782 | 0.869 | 22.31 | 0.1318 |

| Experimental Condition | SSIM | PSNR | LPIPS |
|---|---|---|---|
| Remove | 0.744 | 19.47 | 0.1992 |
| Remove | 0.738 | 18.98 | 0.2121 |
| Remove | 0.698 | 18.79 | 0.2344 |
| Remove | 0.730 | 18.96 | 0.2033 |
| Improved Pix2Pix (full loss) | 0.780 | 20.20 | 0.1710 |

| Method | SSIM | PSNR | LPIPS |
|---|---|---|---|
| Remove multi-scale discriminator | 0.783 | 20.00 | 0.1968 |
| Remove self-attention mechanism | 0.750 | 19.33 | 0.1844 |
| Remove diffusion model repair (DPMs) | 0.780 | 20.20 | 0.1710 |

| Method | SSIM | PSNR | LPIPS |
|---|---|---|---|
| Local only (D1) | 0.745 | 19.38 | 0.2043 |
| Global only (D2) | 0.762 | 19.52 | 0.1919 |
| Multi-scale (D1 + D2) | 0.780 | 20.20 | 0.1710 |
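
For readers who want to reproduce the PSNR column, the standard definition is PSNR = 10 · log10(peak² / MSE), reported in decibels. The sketch below is a minimal pure-Python version that assumes 8-bit images with a peak value of 255 and flattened pixel lists. The evaluation code behind the tables is not given, so the function names and the peak value are assumptions.

```python
import math
from typing import Sequence


def mse(reference: Sequence[float], generated: Sequence[float]) -> float:
    """Mean square error between two equally sized, flattened images."""
    if len(reference) != len(generated) or not reference:
        raise ValueError("images must be non-empty and the same size")
    return sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)


def psnr(reference: Sequence[float], generated: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(reference, generated)
    if err == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


# Toy 4-pixel example: every pixel is off by 16 grey levels, so MSE = 256.
reference = [0.0, 64.0, 128.0, 255.0]
generated = [16.0, 80.0, 112.0, 239.0]
value = psnr(reference, generated)
print(round(value, 2))  # about 24.05 dB
```

Because PSNR is logarithmic in the MSE, a gain of about 3 dB corresponds to roughly halving the mean square error.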
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zheng, X.; Li, X.; Liu, B.; Xu, B. AI Clothing Pattern Generation: Combining Improved Pix2Pix Image Generation and Diffusion Model Repairing. Electronics 2026, 15, 1751. https://doi.org/10.3390/electronics15081751
