Arbitrary Motion Style Transfer via Contrastive Learning
Abstract
1. Introduction
- We propose a novel framework that incorporates contrastive learning into arbitrary motion style transfer, exploiting both the similarities and the differences between motions.
- We introduce a dedicated module that performs contrastive learning on randomly sampled temporal motion clips, strengthening the style encoder's ability to represent style motions (a minimal sketch of this idea follows the list).
- We conduct extensive experiments on the impact of contrastive learning and temporal contrastive learning on arbitrary motion style transfer, analyzing how well each preserves content and style.
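As a reading aid, the following is a minimal, hypothetical sketch of an InfoNCE-style temporal contrastive loss of the kind the second contribution describes. It is not the authors' released code; the names `style_encoder`, `clip_len`, and `tau` are illustrative assumptions.

```python
# Minimal sketch (assumption, not the authors' code) of an InfoNCE-style
# contrastive loss over random temporal clips of style motions.
import torch
import torch.nn.functional as F

def random_temporal_clip(motion: torch.Tensor, clip_len: int) -> torch.Tensor:
    """Crop one random temporal window (shared across the batch) from a (B, T, D) motion tensor."""
    T = motion.shape[1]
    start = torch.randint(0, T - clip_len + 1, (1,)).item()
    return motion[:, start:start + clip_len]

def temporal_contrastive_loss(style_encoder, motion, clip_len=32, tau=0.07):
    """Two random clips of the same motion form a positive pair; clips from
    other motions in the batch act as negatives (InfoNCE over the batch)."""
    z1 = F.normalize(style_encoder(random_temporal_clip(motion, clip_len)), dim=-1)
    z2 = F.normalize(style_encoder(random_temporal_clip(motion, clip_len)), dim=-1)
    logits = z1 @ z2.t() / tau                            # (B, B) cosine similarities
    labels = torch.arange(z1.shape[0], device=z1.device)  # positives lie on the diagonal
    return F.cross_entropy(logits, labels)
```

In this formulation, pulling together embeddings of different clips of the same motion while pushing apart embeddings of other motions encourages the style encoder to capture temporally consistent style features rather than clip-specific content.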
2. Related Work
2.1. Arbitrary Image Style Transfer
2.2. Contrastive Learning
2.3. Motion Style Transfer
3. Methodology
3.1. Architecture
3.2. Training
4. Experiments
4.1. Dataset
4.2. Comparison
4.3. Interpolation
4.4. Ablation Study
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Yumer, M.E.; Mitra, N.J. Spectral style transfer for human motion between independent actions. ACM Trans. Graph. (TOG) 2016, 35, 1–8.
2. Holden, D.; Saito, J.; Komura, T. A deep learning framework for character motion synthesis and editing. ACM Trans. Graph. (TOG) 2016, 35, 1–11.
3. Bazzi, A.; Slock, D.T.; Meilhac, L. A Newton-type Forward Backward Greedy method for multi-snapshot compressed sensing. In Proceedings of the 2017 51st Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 29 October–1 November 2017; pp. 1178–1182.
4. Aberman, K.; Weng, Y.; Lischinski, D.; Cohen-Or, D.; Chen, B. Unpaired motion style transfer from video to animation. ACM Trans. Graph. (TOG) 2020, 39, 64:1–64:12.
5. Park, S.; Jang, D.K.; Lee, S.H. Diverse motion stylization for multiple style domains via spatial-temporal graph-based generative model. Proc. ACM Comput. Graph. Interact. Tech. 2021, 4, 1–17.
6. Jang, D.K.; Park, S.; Lee, S.H. Motion Puzzle: Arbitrary motion style transfer by body part. ACM Trans. Graph. (TOG) 2022, 41, 1–16.
7. Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2414–2423.
8. Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Computer Vision–ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016, Proceedings, Part II; Springer: Cham, Switzerland, 2016; pp. 694–711.
9. Huang, X.; Belongie, S. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1501–1510.
10. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
11. Kwon, J.; Kim, S.; Lin, Y.; Yoo, S.; Cha, J. AesFA: An Aesthetic Feature-Aware Arbitrary Neural Style Transfer. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 13310–13319.
12. Wang, P.; Han, K.; Wei, X.S.; Zhang, L.; Wang, L. Contrastive learning based hybrid networks for long-tailed image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 943–952.
13. Li, T.; Cao, P.; Yuan, Y.; Fan, L.; Yang, Y.; Feris, R.S.; Indyk, P.; Katabi, D. Targeted supervised contrastive learning for long-tailed recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 6918–6928.
14. Wang, Y.; Liu, Y.; Zhou, S.; Huang, Y.; Tang, C.; Zhou, W.; Chen, Z. Emotion-oriented Cross-modal Prompting and Alignment for Human-centric Emotional Video Captioning. IEEE Trans. Multimed. 2025, 1.
15. Liu, Y.; Zhang, H.; Zhan, Y.; Chen, Z.; Yin, G.; Wei, L.; Chen, Z. Noise-resistant multimodal transformer for emotion recognition. Int. J. Comput. Vis. 2024, 1–21.
16. Liu, Y.; Feng, S.; Liu, S.; Zhan, Y.; Tao, D.; Chen, Z.; Chen, Z. Sample-Cohesive Pose-Aware Contrastive Facial Representation Learning. Int. J. Comput. Vis. 2025, 1–19.
17. Park, T.; Efros, A.A.; Zhang, R.; Zhu, J.Y. Contrastive learning for unpaired image-to-image translation. In Computer Vision–ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020, Proceedings, Part IX; Springer: Berlin/Heidelberg, Germany, 2020; pp. 319–345.
18. Wu, Z.; Zhu, Z.; Du, J.; Bai, X. CCPL: Contrastive coherence preserving loss for versatile style transfer. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 189–206.
19. Su, Y.; Lan, T.; Wang, Y.; Yogatama, D.; Kong, L.; Collier, N. A contrastive framework for neural text generation. Adv. Neural Inf. Process. Syst. 2022, 35, 21548–21561.
20. Lee, S.; Lee, D.B.; Hwang, S.J. Contrastive learning with adversarial perturbations for conditional text generation. arXiv 2020, arXiv:2012.07280.
21. Singh, A.; Chakraborty, O.; Varshney, A.; Panda, R.; Feris, R.; Saenko, K.; Das, A. Semi-supervised action recognition with temporal contrastive learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10389–10399.
22. Song, X.; Zhao, S.; Yang, J.; Yue, H.; Xu, P.; Hu, R.; Chai, H. Spatio-temporal contrastive domain adaptation for action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9787–9795.
23. Chen, H.; Zhao, L.; Wang, Z.; Zhang, H.; Zuo, Z.; Li, A.; Xing, W.; Lu, D. Artistic style transfer with internal-external learning and contrastive learning. Adv. Neural Inf. Process. Syst. 2021, 34, 26561–26573.
24. Zhang, Y.; Tang, F.; Dong, W.; Huang, H.; Ma, C.; Lee, T.Y.; Xu, C. Domain enhanced arbitrary image style transfer via contrastive learning. In Proceedings of the ACM SIGGRAPH 2022 Conference Proceedings, Vancouver, BC, Canada, 7–11 August 2022; pp. 1–8.
25. Zhang, Y.; Tang, F.; Dong, W.; Huang, H.; Ma, C.; Lee, T.Y.; Xu, C. A unified arbitrary style transfer framework via adaptive contrastive learning. ACM Trans. Graph. 2023, 42, 1–16.
26. Hsu, E.; Pulli, K.; Popović, J. Style translation for human motion. In ACM SIGGRAPH 2005 Papers; ACM: New York, NY, USA, 2005; pp. 1082–1089.
27. Ikemoto, L.; Arikan, O.; Forsyth, D. Generalizing motion edits with Gaussian processes. ACM Trans. Graph. (TOG) 2009, 28, 1–12.
28. Xia, S.; Wang, C.; Chai, J.; Hodgins, J. Realtime style transfer for unlabeled heterogeneous human motion. ACM Trans. Graph. (TOG) 2015, 34, 1–10.
29. Holden, D.; Habibie, I.; Kusajima, I.; Komura, T. Fast neural style transfer for motion data. IEEE Comput. Graph. Appl. 2017, 37, 42–49.
30. Du, H.; Herrmann, E.; Sprenger, J.; Cheema, N.; Hosseini, S.; Fischer, K.; Slusallek, P. Stylistic Locomotion Modeling with Conditional Variational Autoencoder. In Proceedings of the Eurographics (Short Papers), Genoa, Italy, 6–10 May 2019; pp. 9–12.
31. Tao, T.; Zhan, X.; Chen, Z.; van de Panne, M. Style-ERD: Responsive and coherent online motion style transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 6593–6603.
32. Mason, I.; Starke, S.; Komura, T. Real-time style modelling of human locomotion via feature-wise transformations and local motion phases. Proc. ACM Comput. Graph. Interact. Tech. 2022, 5, 1–18.
33. Tang, X.; Wang, H.; Hu, B.; Gong, X.; Yi, R.; Kou, Q.; Jin, X. Real-time controllable motion transition for characters. ACM Trans. Graph. (TOG) 2022, 41, 1–10.
34. Song, W.; Jin, X.; Li, S.; Chen, C.; Hao, A.; Hou, X.; Li, N.; Qin, H. Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 821–830.
35. Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Instance normalization: The missing ingredient for fast stylization. arXiv 2016, arXiv:1607.08022.
36. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9729–9738.
37. Wu, Z.; Xiong, Y.; Yu, S.X.; Lin, D. Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3733–3742.
38. Wen, Y.H.; Yang, Z.; Fu, H.; Gao, L.; Sun, Y.; Liu, Y.J. Autoregressive stylized motion synthesis with generative flow. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13612–13621.
39. Yan, S.; Xiong, Y.; Lin, D. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
Method | CC ↓ | SC ↓
---|---|---
Aberman et al. [4] | 33.91 ± 6.33 | 35.34 ± 6.45
Jang et al. [6] | 10.05 ± 4.39 | 9.83 ± 4.20
Ours w/ | 9.37 ± 3.99 | 10.11 ± 3.60
Ours w/ | 9.01 ± 4.00 | 9.68 ± 3.59
Ours w/ | 10.05 ± 4.11 | 9.19 ± 3.15
Ours (limited) | 9.27 ± 4.41 | 10.18 ± 4.12
Ours | 9.20 ± 4.05 | 9.34 ± 3.54
Method | CRA ↑ (%) | SRA ↑ (%)
---|---|---
Jang et al. [6] | 29.83 | 54.94
Ours w/ | 50.68 | 14.11
Ours w/ | 52.81 | 12.81
Ours w/ | 52.12 | 16.03
Ours | 50.75 | 16.71
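For context on the tables above: CRA and SRA plausibly abbreviate content and style recognition accuracy, i.e., the accuracy of pretrained content and style classifiers evaluated on the transferred motions. This expansion is our reading of common practice in the motion style transfer literature, not something stated in the excerpt; a minimal sketch of such an evaluation, with hypothetical `content_clf`/`style_clf` classifiers, follows.

```python
# Hypothetical sketch of classifier-based evaluation; the classifier models
# and label tensors below are placeholders, not the paper's evaluation code.
import torch

@torch.no_grad()
def recognition_accuracy(classifier, motions, labels):
    """Fraction of transferred motions whose predicted label matches the target."""
    preds = classifier(motions).argmax(dim=-1)
    return (preds == labels).float().mean().item()

# cra = recognition_accuracy(content_clf, stylized_motions, content_labels)  # CRA
# sra = recognition_accuracy(style_clf, stylized_motions, style_labels)      # SRA
```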
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).