Style Transfer of Chinese Wuhu Iron Paintings Using Hierarchical Visual Transformer
Abstract
1. Introduction
- We propose a new network that encodes content and style features reliably by modeling both short- and long-range dependencies with a hierarchical visual transformer, and that transfers the style of Chinese Wuhu Iron Paintings effectively through a purpose-designed attentional decoder.
- We further design a content correction module with a residual dense architecture that captures redundant features and noise and rejects them, preserving the visual fidelity and friendliness of the transferred images.
- We collected a dataset of Iron Paintings from Wuhu, China, and evaluated our method on it qualitatively and quantitatively to verify its validity.
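To make the "residual dense architecture" mentioned in the second contribution concrete, the following is a minimal sketch of the connectivity pattern only, not the authors' content correction module. It assumes a single feature vector (one pixel) and replaces convolutions with bias-free linear maps; the names `linear`, `relu`, `residual_dense_block`, and all weights are hypothetical and chosen for illustration.

```python
from typing import List

Vector = List[float]


def linear(x: Vector, weights: List[Vector]) -> Vector:
    """Bias-free linear map; stands in for a 1x1 convolution at one pixel."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x: Vector) -> Vector:
    """Element-wise rectifier."""
    return [max(0.0, v) for v in x]


def residual_dense_block(
    x: Vector,
    layer_weights: List[List[Vector]],
    fusion_weights: List[Vector],
) -> Vector:
    """Dense connections, local fusion, and a residual skip for one vector."""
    features = [x]
    for weights in layer_weights:
        # Each layer sees the input concatenated with all earlier outputs.
        concatenated = [v for f in features for v in f]
        features.append(relu(linear(concatenated, weights)))
    # Local fusion: project the full concatenation back to len(x).
    fused = linear([v for f in features for v in f], fusion_weights)
    # Residual skip: the block outputs its input plus a learned correction.
    return [a + b for a, b in zip(x, fused)]


# Toy weights (assumptions): two layers with growth rate 1 on a 2-d input.
x = [1.0, -2.0]
layers = [[[0.5, 0.5]], [[1.0, 0.0, 1.0]]]
fusion = [[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
print(residual_dense_block(x, layers, fusion))
```

With all-zero fusion weights the block reduces to the identity, which is why a residual design is a natural fit for a module meant to apply small corrections rather than to re-synthesize the image.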
2. Related Work
2.1. Chinese Wuhu Iron Paintings
2.1.1. Analysis of the Artistic Characteristics of Chinese Wuhu Iron Paintings
2.1.2. Extraction of Artistic Characteristics of Wuhu Iron Paintings in China
2.2. Image Style Transfer
2.3. Vision Transformer for Image
3. Methodology
3.1. Hierarchical Visual Transformer
3.2. ELA-Decoder Module
3.3. Content Correction Module
3.4. Network Training
4. Experiment
4.1. Implementation Details
4.2. Comparison Experiment
4.2.1. Qualitative Comparison
4.2.2. Quantitative Comparison
4.3. Ablation Study
4.3.1. ELA-Decoder Module
4.3.2. Content Correction Module
4.4. Expert Scoring Experiment
5. Analysis of Design Application Based on the Conversion of Wuhu Iron Paintings Style
6. Discussion
6.1. Wuhu Iron Paintings Dataset
6.2. Limitation
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Gao, I.; Ilharco, G.; Lundberg, S.; Ribeiro, M.T. Adaptive testing of computer vision models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 4003–4014.
- Wang, X.; Wang, W.; Cao, Y.; Shen, C.; Huang, T. Images speak in images: A generalist painter for in-context visual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 6830–6839.
- Talebi, H.; Milanfar, P. Learning to resize images for computer vision tasks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 497–506.
- Cheng, W.H.; Song, S.; Chen, C.Y.; Hidayati, S.C.; Liu, J. Fashion meets computer vision: A survey. ACM Comput. Surv. (CSUR) 2021, 54, 1–41.
- Huang, B.; Yu, Z.; Chen, A.; Geiger, A.; Gao, S. 2d gaussian splatting for geometrically accurate radiance fields. In Proceedings of the SIGGRAPH ’24: Special Interest Group on Computer Graphics and Interactive Techniques Conference, Denver, CO, USA, 27 July–1 August 2024; ACM SIGGRAPH 2024 Conference Papers. pp. 1–11.
- Gortler, S.J.; Grzeszczuk, R.; Szeliski, R.; Cohen, M.F. The lumigraph. In Seminal Graphics Papers: Pushing the Boundaries; Association for Computing Machinery: New York, NY, USA, 2023; Volume 2, pp. 453–464.
- Pan, X.; Tewari, A.; Leimkühler, T.; Liu, L.; Meka, A.; Theobalt, C. Drag your gan: Interactive point-based manipulation on the generative image manifold. In Proceedings of the ACM SIGGRAPH 2023 Conference Proceedings, Los Angeles, CA, USA, 6–10 August 2023; pp. 1–11.
- Tewel, Y.; Gal, R.; Chechik, G.; Atzmon, Y. Key-locked rank one editing for text-to-image personalization. In Proceedings of the ACM SIGGRAPH 2023 Conference Proceedings, Los Angeles, CA, USA, 6–10 August 2023; pp. 1–11.
- Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2414–2423.
- Efros, A.; Freeman, W. Image Quilting for Texture Synthesis and Transfer; SIGGRAPH: Tokyo, Japan, 2001.
- Gatys, L.; Ecker, A.S.; Bethge, M. Texture synthesis using convolutional neural networks. Adv. Neural Inf. Process. Syst. 2015, 28, 262–270.
- Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part II 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 694–711.
- Chen, D.; Yuan, L.; Liao, J.; Yu, N.; Hua, G. Stylebank: An explicit representation for neural image style transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1897–1906.
- Yin, W.; Yin, H.; Baraka, K.; Kragic, D.; Björkman, M. Dance style transfer with cross-modal transformer. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 2–7 January 2023; pp. 5058–5067.
- Tang, H.; Liu, S.; Lin, T.; Huang, S.; Li, F.; He, D.; Wang, X. Master: Meta style transformer for controllable zero-shot and few-shot artistic style transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 18329–18338.
- Zhang, C.; Xu, X.; Wang, L.; Dai, Z.; Yang, J. S2wat: Image style transfer via hierarchical vision transformer using strips window attention. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 7024–7032.
- An, J.; Huang, S.; Song, Y.; Dou, D.; Liu, W.; Luo, J. Artflow: Unbiased image style transfer via reversible neural flows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 862–871.
- Chen, H.; Wang, Z.; Zhang, H.; Zuo, Z.; Li, A.; Xing, W.; Lu, D. Artistic style transfer with internal-external learning and contrastive learning. Adv. Neural Inf. Process. Syst. 2021, 34, 26561–26573.
- Zhang, Y.; Tang, F.; Dong, W.; Huang, H.; Ma, C.; Lee, T.Y.; Xu, C. Domain enhanced arbitrary image style transfer via contrastive learning. In Proceedings of the ACM SIGGRAPH 2022 Conference Proceedings, Vancouver, BC, Canada, 7–11 August 2022; pp. 1–8.
- Ma, L.; Wang, B. Design and Inheritance of Iron Painting Intangible Cultural Heritage Based on Modern Information Technology. In Proceedings of the 2020 International Conference on Data Processing Techniques and Applications for Cyber-Physical Systems: DPTA 2020; Springer: Berlin/Heidelberg, Germany, 2021; pp. 407–414.
- Ma, L. The Inheritance Strategy of Intangible Cultural Heritage Based on Internet and Information Technology–Taking Wuhu Iron Painting as an Example. In Proceedings of the 2021 International Conference on Forthcoming Networks and Sustainability in AIoT Era (FoNeS-AIoT), Nicosia, Turkey, 27–28 December 2021; pp. 280–283.
- Lyu, K. On Intangible Cultural Heritage Research on the Inheritance and Development of “Wuhu Iron Painting” in Wuhu City. MESSAGE from the President of Suan Sunandha Rajabhat University. In Proceedings of the 1st International Conference on Management, Innovation, Economics and Social Sciences, Bangkok, Thailand, 25–26 July 2020; p. 445.
- Li, G.; Hu, J. The inheritance and development of Wuhu iron paintings from the perspective of cultural industry. In Proceedings of the 3rd International Conference on Public Art and Human Development (ICPAHD 2023), Tianjin, China, 22–24 December 2023; EDP Sciences: Les Ulis Cedex, France, 2024; Volume 183, p. 01017.
- Tiancheng, Z.; Tieyi, C. The preliminary study on the application of modern advanced processing technique in non-legacy cultural and creative product design–Taking Wuhu iron painting as an example. In Proceedings of the E3S Web of Conferences, E3S Web of Conferences, Tallinn, Estonia, 6–9 September 2020; Volume 179, p. 02091.
- Zhang, Y.; Huang, N.; Tang, F.; Huang, H.; Ma, C.; Dong, W.; Xu, C. Inversion-based style transfer with diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 10146–10156.
- Cai, Q.; Ma, M.; Wang, C.; Li, H. Image neural style transfer: A review. Comput. Electr. Eng. 2023, 108, 108723.
- Liu, K.; Zhan, F.; Chen, Y.; Zhang, J.; Yu, Y.; El Saddik, A.; Lu, S.; Xing, E.P. Stylerf: Zero-shot 3d style transfer of neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 8338–8348.
- Woodland, M.; Wood, J.; Anderson, B.M.; Kundu, S.; Lin, E.; Koay, E.; Odisio, B.; Chung, C.; Kang, H.C.; Venkatesan, A.M.; et al. Evaluating the performance of StyleGAN2-ADA on medical images. In Simulation and Synthesis in Medical Imaging; Springer: Berlin/Heidelberg, Germany, 2022; pp. 142–153.
- Zhang, Y.; He, Z.; Xing, J.; Yao, X.; Jia, J. Ref-npr: Reference-based non-photorealistic radiance fields for controllable scene stylization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 4242–4251.
- Yang, G. The imagery and abstraction trend of Chinese contemporary oil painting. Linguist. Cult. Rev. 2021, 5, 454–471.
- Liu, W. Analysis on the Collision and Fusion of Eastern and Western Paintings in the Context of Globalization. Thought 2021, 7, 8.
- Fan, Z.; Zhu, Y.; Yan, C.; Li, Y.; Zhang, K. A comparative study of color between abstract paintings, oil paintings and Chinese ink paintings. In Proceedings of the 15th International Symposium on Visual Information Communication and Interaction, Chur, Switzerland, 16–18 August 2022; pp. 1–8.
- Liu, F. Research on oil painting creation based on Computer Technology. J. Phys. Conf. Ser. 2021, 1915, 022005.
- Wen, X.; White, P. The role of landscape art in cultural and national identity: Chinese and European comparisons. Sustainability 2020, 12, 5472.
- Hongxian, L.; Tahir, A.; Bakar, S.A.S.A. The Developing Process of Ideological Trend of the Nationalization in Chinese Oil Painting. Asian J. Res. Educ. Soc. Sci. 2024, 6, 465–474.
- Ao, J.; Ye, Z.; Li, W.; Ji, S. Impressions of Guangzhou city in Qing dynasty export paintings in the context of trade economy: A color analysis of paintings based on k-means clustering algorithm. Herit. Sci. 2024, 12, 77.
- Sheng, J.; Song, C.; Wang, J.; Han, Y. Convolutional neural network style transfer towards Chinese paintings. IEEE Access 2019, 7, 163719–163728.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762.
- Wei, Z.; Dong, P.; Hui, Z.; Li, A.; Li, L.; Lu, M.; Pan, H.; Li, D. Auto-prox: Training-free vision transformer architecture search via automatic proxy discovery. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 15814–15822.
- Fan, Q.; Huang, H.; Chen, M.; Liu, H.; He, R. Rmt: Retentive networks meet vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 5641–5651.
- Da, C.; Luo, C.; Zheng, Q.; Yao, C. Vision grid transformer for document layout analysis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 19462–19472.
- Tang, C.; Zhang, L.L.; Jiang, H.; Xu, J.; Cao, T.; Zhang, Q.; Yang, Y.; Wang, Z.; Yang, M. Elasticvit: Conflict-aware supernet training for deploying fast vision transformer on diverse mobile devices. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 5829–5840.
- Ji, G.P.; Zhuge, M.; Gao, D.; Fan, D.P.; Sakaridis, C.; Gool, L.V. Masked vision-language transformer in fashion. Mach. Intell. Res. 2023, 20, 421–434.
- Wensel, J.; Ullah, H.; Munir, A. Vit-ret: Vision and recurrent transformer neural networks for human activity recognition in videos. IEEE Access 2023, 11, 72227–72249.
- Liu, Y.; Matsoukas, C.; Strand, F.; Azizpour, H.; Smith, K. Patchdropout: Economizing vision transformers using patch dropout. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 2–7 January 2023; pp. 3953–3962.
- Wang, Y.; Lu, L.; Yang, W.; Chen, Y. Local or global? A novel transformer for Chinese named entity recognition based on multi-view and sliding attention. Int. J. Mach. Learn. Cybern. 2024, 15, 2199–2208.
- Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning (PMLR), Virtual, 18–24 July 2021; pp. 10347–10357.
- Deng, Y.; Tang, F.; Dong, W.; Ma, C.; Pan, X.; Wang, L.; Xu, C. Stytr2: Image style transfer with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11326–11336.
- Graham, B.; El-Nouby, A.; Touvron, H.; Stock, P.; Joulin, A.; Jégou, H.; Douze, M. Levit: A vision transformer in convnet’s clothing for faster inference. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 12259–12269.
- Fan, H.; Xiong, B.; Mangalam, K.; Li, Y.; Yan, Z.; Malik, J.; Feichtenhofer, C. Multiscale vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 6824–6835.
- Li, W.; Chen, Y.; Guo, X.; He, X. ST2SI: Image Style Transfer via Vision Transformer using Spatial Interaction. Comput. Graph. 2024, 124, 104084.
- Deng, Y.; He, X.; Tang, F.; Dong, W. Z*: Zero-shot Style Transfer via Attention Reweighting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 6934–6944.
- Liu, Y.; Yu, W.; Zhang, Z.; Wang, Q.; Che, L. Axial Attention Transformer for Fast High-quality Image Style Transfer. In Proceedings of the 2024 IEEE International Symposium on Circuits and Systems (ISCAS), Singapore, 19–22 May 2024; pp. 1–5.
- Wu, H.; Xiao, B.; Codella, N.; Liu, M.; Dai, X.; Yuan, L.; Zhang, L. Cvt: Introducing convolutions to vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 22–31.
- Zhu, M.; He, X.; Wang, N.; Wang, X.; Gao, X. All-to-key attention for arbitrary style transfer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 23109–23119.
- Qu, C.; Lu, L.; Wang, A.; Yang, W.; Chen, Y. Novel multi-domain attention for abstractive summarisation. CAAI Trans. Intell. Technol. 2023, 8, 796–806.
- Xue, M.; He, J.; He, Y.; Liu, Z.; Wang, W.; Zhou, M. Low-light image enhancement via clip-fourier guided wavelet diffusion. arXiv 2024, arXiv:2401.03788.
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
- Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2472–2481.
- Park, D.Y.; Lee, K.H. Arbitrary style transfer with style-attentional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5880–5888.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
- Phillips, F.; Mackintosh, B. Wiki art gallery, inc.: A case for critical thinking. Issues Account. Educ. 2011, 26, 593–608.
- Ghiasi, G.; Lee, H.; Kudlur, M.; Dumoulin, V.; Shlens, J. Exploring the structure of a real-time, arbitrary neural artistic stylization network. arXiv 2017, arXiv:1705.06830.
Type | Specific Features | Feature Description |
---|---|---|
Modeling | Flowing line | Unlike other images, Wuhu Iron Paintings are made of steel and focus on the expression of modeling lines. |
Modeling | Brilliance | Wuhu Iron Paintings are made of metal and have a strong metallic luster, unlike the data currently used in the field. |
Modeling | Simple outline | In producing Wuhu Iron Paintings, the shapes of objects are highly generalized, forming brief and condensed stylistic imagery. |
Color | Black and white | Wuhu Iron Paintings inherit the color characteristics of Chinese painting: a white background with black lines composing the picture creates a strong black-and-white contrast, consistent with the fondness for ink tones in Chinese painting. |
Hairstyle | 3D artistry | Wuhu Iron Paintings are three-dimensional works with a distinctive relief texture. The three-dimensionality takes two forms: the height of each object itself, and the layering of objects interspersed with one another. Most data currently used in the field are flat, which makes transferring the style of Wuhu Iron Paintings challenging. |
Hairstyle | Blank space | Influenced by Chinese painting, Wuhu Iron Paintings deliberately leave specific areas blank (white). This blank space is of four kinds: white of composition, white of mood, white of reality, and white of scene. It is also one of the difficulties we face, because Western art images do not have this type of art. |
Metric | CAST | StyTr2 | S2WAT | WCT | Ours |
---|---|---|---|---|---|
Content Loss ↓ | 2.17 | 1.91 | 1.67 | 2.56 | 1.62 |
Style Loss ↓ | 4.43 | 1.67 | 1.75 | 2.23 | 1.63 |
Time (seconds) ↓ | 0.042 | 0.237 | 0.558 | 0.590 | 0.573 |
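The table above reports content and style losses without this excerpt giving their definitions. As a hedged illustration only, the sketch below uses a formulation common in arbitrary style transfer work: content loss as the mean squared difference between encoder feature maps, and style loss as the mean squared difference between per-channel means and standard deviations. Whether the paper uses exactly these definitions is an assumption; `content_loss`, `style_loss`, and the toy feature maps are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List

# One feature map: a list of channels, each a flat list of activations.
FeatureMap = List[List[float]]


def content_loss(stylized: FeatureMap, content: FeatureMap) -> float:
    """Mean squared difference between two feature maps of equal shape."""
    diffs = [
        (a - b) ** 2
        for ch_s, ch_c in zip(stylized, content)
        for a, b in zip(ch_s, ch_c)
    ]
    return fmean(diffs)


def style_loss(stylized: FeatureMap, style: FeatureMap) -> float:
    """Mean squared difference of per-channel means and standard deviations."""
    terms = []
    for ch_s, ch_t in zip(stylized, style):
        terms.append((fmean(ch_s) - fmean(ch_t)) ** 2)
        terms.append((pstdev(ch_s) - pstdev(ch_t)) ** 2)
    return fmean(terms)


# Toy feature maps (assumptions), two channels with two positions each.
out = [[1.0, 2.0], [3.0, 4.0]]
ref = [[1.0, 2.0], [3.0, 6.0]]
print(content_loss(out, ref), style_loss(out, ref))
```

Under this formulation the style term depends only on channel statistics, so two feature maps with the same values in a different spatial arrangement have zero style loss, while the content term penalizes any position-wise difference.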
Methods | Ghiasi et al. [63] | CAST | StyTr2 | S2WAT | Ours |
---|---|---|---|---|---|
Animal species sample | 19 | 47 | 25 | 18 | 69 |
Landscape sample | 11 | 28 | 35 | 33 | 71 |
Houseware 1 | 16 | 26 | 41 | 16 | 79 |
Houseware 2 | 14 | 34 | 46 | 29 | 54 |
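This excerpt does not state what the entries in the table above measure. Assuming they are counts of participants who preferred each method for a given sample (an assumption, as are the shortened labels and the helper names `shares` and `top_method`), a short sketch can turn each row into preference shares and report the most-preferred method.

```python
from typing import Dict

# Values copied from the table above; keys are shortened labels.
TABLE: Dict[str, Dict[str, int]] = {
    "Animal species sample": {
        "Ghiasi et al.": 19, "CAST": 47, "StyTr2": 25, "S2WAT": 18, "Ours": 69,
    },
    "Landscape sample": {
        "Ghiasi et al.": 11, "CAST": 28, "StyTr2": 35, "S2WAT": 33, "Ours": 71,
    },
    "Houseware 1": {
        "Ghiasi et al.": 16, "CAST": 26, "StyTr2": 41, "S2WAT": 16, "Ours": 79,
    },
    "Houseware 2": {
        "Ghiasi et al.": 14, "CAST": 34, "StyTr2": 46, "S2WAT": 29, "Ours": 54,
    },
}


def shares(row: Dict[str, int]) -> Dict[str, float]:
    """Convert raw counts for one sample into fractions of the row total."""
    total = sum(row.values())
    return {method: count / total for method, count in row.items()}


def top_method(row: Dict[str, int]) -> str:
    """Return the method with the highest count in one row."""
    return max(row, key=row.get)


for sample, row in TABLE.items():
    winner = top_method(row)
    print(f"{sample}: {winner} ({shares(row)[winner]:.1%} of {sum(row.values())})")
```

Normalizing by the row total matters here because the rows do not all sum to the same number, so raw counts are not directly comparable across samples.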
Criterion | Animal Sample | Landscape Sample | Houseware Sample 1 | Houseware Sample 2 |
---|---|---|---|---|
Line fluidity | 3.98 | 4.27 | 4.02 | 4.18 |
Metallic expression | 4.16 | 4.07 | 4.29 | 4.30 |
Contour simplicity | 3.91 | 4.11 | 4.23 | 4.25 |
Black and white contrast effect | 4.25 | 4.40 | 3.98 | 4.14 |
Stereoscopic depth of picture | 4.35 | 4.14 | 4.25 | 4.30 |
Blank space | 4.09 | 4.21 | 3.96 | 4.17 |
Aggregate score | 4.12 | 4.20 | 4.12 | 4.22 |
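This excerpt does not say how the aggregate score in the table above is computed. The reported values are consistent with an unweighted mean of the six criterion scores, rounded to two decimals, which the sketch below checks. The column keys (with the two houseware columns distinguished as 1 and 2) and the helper `aggregate` are labels chosen here for illustration.

```python
from statistics import fmean
from typing import Dict, List

# Six criterion scores per sample, in table order (top to bottom).
CRITERIA_SCORES: Dict[str, List[float]] = {
    "Animal": [3.98, 4.16, 3.91, 4.25, 4.35, 4.09],
    "Landscape": [4.27, 4.07, 4.11, 4.40, 4.14, 4.21],
    "Houseware 1": [4.02, 4.29, 4.23, 3.98, 4.25, 3.96],
    "Houseware 2": [4.18, 4.30, 4.25, 4.14, 4.30, 4.17],
}

# Aggregate scores as reported in the last row of the table.
REPORTED: Dict[str, float] = {
    "Animal": 4.12,
    "Landscape": 4.20,
    "Houseware 1": 4.12,
    "Houseware 2": 4.22,
}


def aggregate(scores: List[float]) -> float:
    """Unweighted mean of the criterion scores, rounded to two decimals."""
    return round(fmean(scores), 2)


for sample, scores in CRITERIA_SCORES.items():
    print(sample, aggregate(scores), aggregate(scores) == REPORTED[sample])
```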
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhou, Y.; Ren, Y.; Wu, C.; Xue, M. Style Transfer of Chinese Wuhu Iron Paintings Using Hierarchical Visual Transformer. Sensors 2024, 24, 8103. https://doi.org/10.3390/s24248103