Transformer-Based Semantic Segmentation of Japanese Knotweed in High-Resolution UAV Imagery Using Twins-SVT
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Sites
2.2. UAV-Based Aerial Data Collection
2.3. Dataset Construction and Preprocessing
2.4. Model Configuration and Optimization
2.4.1. Encoder: Twins-SVT-Small Transformer
2.4.2. Decoder: UPerNet with Pyramid Pooling
2.4.3. Training Strategy and Environment
2.4.4. Implementation Details
2.4.5. Evaluation Metrics
3. Results
3.1. Quantitative Evaluation
3.2. Training Dynamics and Convergence Analysis
3.3. Qualitative Evaluation and Visualization
4. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Bailey, J.P.; Conolly, C.A. Prize-winners to pariahs—A history of Japanese knotweed s.l. (Polygonaceae) in the British Isles. Watsonia 2001, 23, 93–110.
- Child, L.E.; Wade, M.R. The Japanese Knotweed Manual: The Management and Control of an Invasive Alien Weed; Packard Publishing Ltd.: Chichester, UK, 2003.
- Conolly, C.A. The distribution and history in the British Isles of some alien species of Polygonum and Reynoutria. Watsonia 1977, 11, 291–311.
- Beerling, D.J. Biological flora of the British Isles: Fallopia japonica (Houtt.) Ronse Decraene. J. Ecol. 1991, 79, 1249–1272.
- Soll, J. Controlling Knotweed (Polygonum cuspidatum) in the Pacific Northwest; The Nature Conservancy: Arlington, VA, USA, 2004.
- Hocking, S.; Toop, T.; Jones, D.; Graham, I.; Eastwood, D. Assessing the relative impacts and economic costs of Japanese knotweed management methods. Sci. Rep. 2023, 13, 3872.
- Powles, S.B.; Yu, Q. Control of Conyza spp. with glyphosate: A review of the situation in Europe. Weed Res. 2015, 55, 1–16.
- Shaw, R.H.; Bryner, S.; Tanner, R. The life history and host range of the Japanese knotweed psyllid, Aphalara itadori. Biol. Control 2011, 58, 328–335.
- Valicharla, S.K.; Li, X.; Greenleaf, J.; Turcotte, R.; Hayes, C.; Park, Y.-L. Precision detection and assessment of ash death and decline caused by the emerald ash borer using drones and deep learning. Plants 2023, 12, 798.
- Park, Y.-L.; Naharki, K.; Karimzadeh, R.; Seo, B.Y.; Lee, G.S. Rapid assessment of insect pest outbreak using drones: A case study with Spodoptera exigua (Hübner) (Lepidoptera: Noctuidae) in soybean fields. Insects 2023, 14, 555.
- Anderson, K.; Gaston, K.J. Lightweight unmanned aerial vehicles will revolutionize spatial ecology. Front. Ecol. Environ. 2013, 11, 138–146.
- Valicharla, S.K.; Karimzadeh, R.; Naharki, K.; Li, X.; Park, Y.-L. Detection and multi-class classification of invasive knotweeds with drones and deep learning models. Drones 2024, 8, 293.
- Shi, J.; Malik, J. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 888–905.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI); Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 19–23 June 2018; pp. 7794–7803.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Places: A 10 Million Image Database for Scene Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1452–1464.
- Soille, P. Morphological Image Analysis: Principles and Applications, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2003.
- Zhou, B.; Zhao, H.; Puig, X.; Fidler, S.; Barriuso, A.; Torralba, A. Semantic understanding of scenes through the ADE20K dataset. Int. J. Comput. Vis. 2019, 127, 302–321.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
- Pyšek, P.; Richardson, D.M. Invasive species, environmental change and management, and health. Annu. Rev. Environ. Resour. 2010, 35, 25–55.
- Chu, X.; Tian, Z.; Xie, W.; Li, Y.; Jiang, Y.; Liu, Z.; Hu, H. Twins: Revisiting the design of spatial attention in vision transformers. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtual, 6–14 December 2021.
- Xiao, T.; Liu, Y.; Zhou, B.; Jiang, Y.; Sun, J. Unified perceptual parsing for scene understanding. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 418–434.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101.
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
- Chen, L.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. arXiv 2021, arXiv:2105.15203.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021.
- Park, Y.-L.; Gururajan, S.; Thistle, H.; Chandran, R.; Reardon, R. Aerial release of Rhinoncomimus latipes (Coleoptera: Curculionidae) to control Persicaria perfoliata (Polygonaceae) using an unmanned aerial system. Pest Manag. Sci. 2018, 74, 141–148.
- Naharki, K.; Hayes, C.; Park, Y.-L. Aerial systems for releasing natural enemy insects of purple loosestrife using drones. Drones 2024, 8, 635.
- Tsouros, D.C.; Bibi, S.; Sarigiannidis, P.G. A review on UAV-based applications for precision agriculture. Information 2019, 10, 349.
- Kim, J.; Huebner, C.D.; Reardon, R.; Park, Y.-L. Spatially targeted biological control of mile-a-minute weed using Rhinoncomimus latipes (Coleoptera: Curculionidae) and an unmanned aircraft system. J. Econ. Entomol. 2021, 114, 1889–1895.
- Karimzadeh, R.; Naharki, K.; Park, Y.-L. Detection of bean damage caused by Epilachna varivestis (Coleoptera: Coccinellidae) using drones, sensors, and image analysis. J. Econ. Entomol. 2024, 117, 2143–2150.
- Dalponte, M.; Coomes, D.A. Deep learning for remote sensing data: A review. Remote Sens. Environ. 2018, 204, 207–223.
| Parameter | Stage 1 | Stage 2 | Stage 3 | Stage 4 |
|---|---|---|---|---|
| Embedding dimension | 64 | 128 | 256 | 512 |
| Attention heads | 2 | 4 | 8 | 16 |
| Depth | 2 | 2 | 10 | 4 |
| MLP expansion ratio | 4 | 4 | 4 | 4 |
| Window size | 7 × 7 | 7 × 7 | 7 × 7 | 7 × 7 |
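The stage layout above follows the usual hierarchical-transformer pattern: the embedding dimension doubles at each stage while the per-head dimension stays fixed, and most of the depth sits in stage 3. As an illustrative sketch (not the authors' code), the configuration can be transcribed and sanity-checked like this:

```python
# Twins-SVT-Small encoder configuration, transcribed from the table above.
# This is an illustrative sketch, not the implementation used in the paper.
stages = [
    {"dim": 64,  "heads": 2,  "depth": 2,  "mlp_ratio": 4, "window": (7, 7)},
    {"dim": 128, "heads": 4,  "depth": 2,  "mlp_ratio": 4, "window": (7, 7)},
    {"dim": 256, "heads": 8,  "depth": 10, "mlp_ratio": 4, "window": (7, 7)},
    {"dim": 512, "heads": 16, "depth": 4,  "mlp_ratio": 4, "window": (7, 7)},
]

# Per-head dimension is constant (dim / heads = 32) across all four stages,
# and the total transformer depth is the sum of the per-stage depths.
head_dims = [s["dim"] // s["heads"] for s in stages]
total_depth = sum(s["depth"] for s in stages)
print(head_dims)    # [32, 32, 32, 32]
print(total_depth)  # 18
```

Keeping the head dimension fixed while doubling the embedding width is what lets each stage add attention heads without changing the cost per head.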
| Model | Class | IoU (%) | Accuracy (%) | Pretrain |
|---|---|---|---|---|
| Twins-SVT [26] | Background | 98.18 | 99.02 | ImageNet-1K |
| Twins-SVT [26] | Knotweed | 91.70 | 95.98 | ImageNet-1K |
| SegFormer [33] | Background | 97.85 | 98.86 | ImageNet-1K |
| SegFormer [33] | Knotweed | 90.23 | 95.10 | ImageNet-1K |
| Swin-T [34] | Background | 95.23 | 98.79 | ImageNet-1K |
| Swin-T [34] | Knotweed | 90.01 | 95.57 | ImageNet-1K |
| ViT [20] | Background | 97.63 | 98.23 | ImageNet-1K |
| ViT [20] | Knotweed | 89.65 | 88.92 | ImageNet-1K |
| Model | mIoU (%) | AAcc (%) |
|---|---|---|
| Twins-SVT [26] | 94.94 | 97.50 |
| SegFormer [33] | 94.04 | 96.98 |
| Swin-T [34] | 92.62 | 96.18 |
| ViT [20] | 93.64 | 93.58 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Valicharla, S.K.; Karimzadeh, R.; Li, X.; Park, Y.-L. Transformer-Based Semantic Segmentation of Japanese Knotweed in High-Resolution UAV Imagery Using Twins-SVT. Information 2025, 16, 741. https://doi.org/10.3390/info16090741