Multi-Scale Feature Learning for Farmland Segmentation Under Complex Spatial Structures
Abstract
1. Introduction
- (1) The fragmented distribution, irregular geometries, and large scale differences of farmland parcels under China’s smallholder-dominated cultivation mode lead to blurred parcel boundaries and inter-class spectral confusion in remote sensing imagery, which severely limits the completeness and boundary accuracy of farmland extraction.
- (2) Remote sensing imagery containing farmland typically has complex backgrounds, making it difficult to detect and extract farmland features accurately.
- (3) Large-scale, high-resolution farmland mapping is not effectively automated by current methods. These three issues directly affect the accuracy of parcel area statistics and the agricultural management decisions that depend on them.
2. Materials and Methods
2.1. Overview of the Study Area
2.1.1. Sources and Production of Sentinel-2 Datasets
- (1) Phenological characteristics: The crop maturation period in September exhibits distinctive spectral signatures that facilitate accurate cropland segmentation.
- (2) Data quality: Compared to the summer months, September typically has reduced cloud cover, ensuring clearer imagery, while the red-edge bands enhance feature extraction.
- (3) Algorithm compatibility: The high spatial resolution and multi-spectral bands are well suited to deep learning model training.
- (4) Regional adaptability: This temporal window aligns with the optimal growing season for major agricultural regions in the Northern Hemisphere.
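The red-edge advantage noted above can be illustrated with a simple band-ratio computation. The sketch below uses hypothetical Sentinel-2 reflectance values (B4 = red, B5 = red edge, B8 = NIR) and NDRE as one standard red-edge index; it is an illustration of why red-edge bands add discriminative signal, not the paper's preprocessing pipeline.

```python
import numpy as np

# Hypothetical Sentinel-2 surface reflectance patches (values in [0, 1]).
# Top-left pixels mimic dense crops; the bottom-right pixel mimics bare soil.
red = np.array([[0.08, 0.10], [0.09, 0.30]])       # B4
red_edge = np.array([[0.20, 0.22], [0.21, 0.32]])  # B5
nir = np.array([[0.45, 0.48], [0.46, 0.35]])       # B8

def normalized_difference(a, b):
    """Generic (a - b) / (a + b) index, guarding against zero division."""
    return (a - b) / np.clip(a + b, 1e-6, None)

ndvi = normalized_difference(nir, red)       # classic vegetation index
ndre = normalized_difference(nir, red_edge)  # red-edge variant, chlorophyll-sensitive

print(ndvi.round(2))
print(ndre.round(2))
```

Crop pixels score high in both indices while the soil-like pixel stays low, which is the spectral separation the September acquisition window is chosen to maximize.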
2.1.2. Public Datasets
2.2. Model Architecture
2.2.1. Improvements to the U-Net++ Model
2.2.2. Multi-Headed Attention
- (1) The channel attention pathway adopts an efficient channel attention strategy without dimensionality reduction. Using adaptive 1D convolutions to dynamically adjust channel weights, it enhances the feature channels most closely linked to crop spectral responses, enabling effective differentiation between crop varieties and growth stages.
- (2) The spatial attention pathway employs a conventional spatial attention mechanism. It merges features from channel-wise average pooling and max pooling, then constructs spatial weight maps with fixed 7 × 7 convolution kernels, strengthening the regular geometric boundaries, internal texture structures, and spatial layout characteristics of farmland parcels.
- (3) The global context pathway uses a linear-complexity self-attention mechanism. By building long-range dependencies between pixels, it models semantic relationships between farmland and adjacent areas, providing scene-level prior information for local discrimination.
- (4) A dynamic adaptive fusion mechanism addresses the drawbacks of fixed-weight fusion. A lightweight gate network, conditioned on the input features, generates a set of spatially adaptive weight maps matching the outputs of the three attention pathways. Through element-wise weighted summation, the module dynamically allocates each pathway's contribution according to the local content of the input imagery, achieving input-adaptive feature fusion.
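The three pathways and the gated fusion described above can be sketched end to end in NumPy. This is an illustrative toy, not the trained module: the ECA kernel size, the 7 × 7 spatial kernel, the linear-attention factorization, and the 1 × 1-convolution-style gate are modeled after the description, and every learned parameter is a random placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def eca_channel_attention(x, k=3):
    """Efficient channel attention: 1D conv over the pooled channel
    descriptor, no dimensionality reduction. x has shape (C, H, W)."""
    c = x.shape[0]
    desc = x.mean(axis=(1, 2))                      # global average pool -> (C,)
    kernel = rng.normal(size=k)                     # placeholder 1D conv weights
    padded = np.pad(desc, k // 2, mode="edge")
    conv = np.array([padded[i:i + k] @ kernel for i in range(c)])
    return x * sigmoid(conv)[:, None, None]         # reweight channels

def spatial_attention(x, k=7):
    """CBAM-style spatial attention: channel-wise avg/max pooling + k x k conv."""
    stacked = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    kernel = rng.normal(size=(2, k, k))                  # placeholder conv weights
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    h, w = x.shape[1:]
    logits = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return x * sigmoid(logits)[None]                # per-pixel spatial weights

def linear_self_attention(x):
    """Linear-complexity global attention: softmax over keys first, so the
    cost is O(N * C^2) rather than O(N^2)."""
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T                  # (N, C)
    w_q, w_k, w_v = (rng.normal(size=(c, c)) for _ in range(3))
    q, k_, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    context = softmax(k_, axis=0).T @ v             # (C, C) global summary
    out = softmax(q, axis=1) @ context              # (N, C)
    return out.T.reshape(c, h, w)

def gated_fusion(x):
    """Gate network predicts per-pixel weights for the three attention paths."""
    paths = np.stack([eca_channel_attention(x),
                      spatial_attention(x),
                      linear_self_attention(x)])    # (3, C, H, W)
    gate_w = rng.normal(size=(3, x.shape[0]))       # 1x1-conv-like gate on input
    weights = softmax(np.einsum("pc,chw->phw", gate_w, x), axis=0)
    return np.einsum("phw,pchw->chw", weights, paths)  # weighted sum of paths

x = rng.normal(size=(8, 16, 16))
y = gated_fusion(x)
print(y.shape)  # fusion preserves the (C, H, W) feature-map shape
```

The key design point mirrored here is that the gate weights sum to 1 at every pixel, so the module interpolates between the three pathways rather than simply adding them.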
2.2.3. Combine Loss Function
2.3. Accuracy Assessment
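The accuracy metrics reported later (Precision, Recall, F1, IoU) are standard pixel-wise measures derived from the confusion matrix. The sketch below assumes binary farmland/background masks, which matches the segmentation setting; it is a generic reference implementation, not the authors' evaluation code.

```python
import numpy as np

def segmentation_metrics(pred, target):
    """Pixel-wise Precision, Recall, F1 and IoU for binary masks (0/1)."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()   # farmland predicted and present
    fp = np.logical_and(pred, ~target).sum()  # false alarms
    fn = np.logical_and(~pred, target).sum()  # missed farmland
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return {k: round(100 * float(v), 2) for k, v in
            {"precision": precision, "recall": recall,
             "f1": f1, "iou": iou}.items()}

pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0]])
target = np.array([[1, 1, 1, 0],
                   [1, 0, 0, 0]])
print(segmentation_metrics(pred, target))
# → {'precision': 75.0, 'recall': 75.0, 'f1': 75.0, 'iou': 60.0}
```

Note that IoU is always the strictest of the four, which is why the tables below show IoU several points under F1 for every model.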
2.4. Experimental Environment
3. Results
3.1. Weighting of the Combine Loss Function
3.2. Evaluation of Ablation
3.3. Results of the Sentinel-2 Dataset
3.4. Results of the iFLYTEK Dataset
3.5. Extraction Results in Challenging Scenarios
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Weiss, M.; Jacob, F.; Duveiller, G. Remote sensing for agricultural applications: A meta-review. Remote Sens. Environ. 2020, 236, 111402.
- Waldner, F.; Diakogiannis, F.I. Deep learning on edge: Extracting field boundaries from satellite images with a convolutional neural network. Remote Sens. Environ. 2020, 245, 111741.
- Watkins, B.; Van Niekerk, A. A comparison of object-based image analysis approaches for field boundary delineation using multi-temporal Sentinel-2 imagery. Comput. Electron. Agric. 2019, 158, 294–302.
- Food and Agriculture Organization of the United Nations (FAO). The Future of Food and Agriculture—Trends and Challenges. Available online: https://www.fao.org/3/a-i7962e.pdf (accessed on 13 February 2026).
- Sykas, D.; Sdralkakis, M.; Zografakis, D.; Papoutsis, I. A Sentinel-2 multiyear, multicountry benchmark dataset for crop classification and segmentation with deep learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 3323–3339.
- Raei, E.; Asanjani, A.A.; Nikoo, M.R.; Sadegh, M.; Pourshahabi, S.; Adamowski, J.F. A deep learning image segmentation model for agricultural irrigation system classification. Comput. Electron. Agric. 2022, 198, 106977.
- Matton, N.; Canto, G.S.; Waldner, F.; Valero, S.; Morin, D.; Inglada, J.; Arias, M.; Bontemps, S.; Koetz, B.; Defourny, P. An automated method for annual cropland mapping along the season for various globally-distributed agrosystems using high spatial and temporal resolution time series. Remote Sens. 2015, 7, 13208–13232.
- Zhou, X.; Zheng, H.B.; Xu, X.Q.; He, J.Y.; Ge, X.K.; Yao, X.; Cheng, T.; Zhu, Y.; Cao, W.X.; Tian, Y.C. Predicting grain yield in rice using multi-temporal vegetation indices from UAV-based multispectral and digital imagery. ISPRS J. Photogramm. Remote Sens. 2017, 130, 246–255.
- Long, J.; Li, M.; Wang, X.; Stein, A. Delineation of agricultural fields using multi-task BsiNet from high-resolution satellite images. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102871.
- Li, M.; Long, J.; Stein, A.; Wang, X. Using a semantic edge-aware multi-task neural network to delineate agricultural parcels from remote sensing images. ISPRS J. Photogramm. Remote Sens. 2023, 200, 24–40.
- Blaschke, T. Object based image analysis for remote sensing. ISPRS J. Photogramm. Remote Sens. 2010, 65, 2–16.
- Mountrakis, G.; Im, J.; Ogole, C. Support vector machines in remote sensing: A review. ISPRS J. Photogramm. Remote Sens. 2011, 66, 247–259.
- Blaschke, T.; Hay, G.J.; Kelly, M.; Lang, S.; Hofmann, P.; Addink, E.; Feitosa, R.Q.; Van der Meer, F.; Van der Werff, H.; Van Coillie, F.; et al. Geographic object-based image analysis—Towards a new paradigm. ISPRS J. Photogramm. Remote Sens. 2014, 87, 180–191.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th Advances in Neural Information Processing Systems (NIPS 2012), Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239.
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 833–851.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015; pp. 234–241.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 30th Advances in Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
- Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: A nested U-Net architecture for medical image segmentation. In Proceedings of the 4th Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support (DLMIA 2018), Granada, Spain, 16–20 September 2018; pp. 3–11.
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers make strong encoders for medical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI), Strasbourg, France, 27 September–1 October 2021; pp. 66–78.
- Wan, Q.; Huang, Z.; Lu, J.; Yu, G.; Zhang, L. SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation. arXiv 2023, arXiv:2301.13156v4.
- Shunying, W.; Ya’nan, Z.; Xianzeng, Y.; Li, F.; Tianjun, W.; Jiancheng, L. BSNet: Boundary-semantic-fusion network for farmland parcel mapping in high-resolution satellite images. Comput. Electron. Agric. 2023, 206, 107683.
- Li, J.; Wei, Y.; Wei, T.; He, W. A Comprehensive Deep-Learning Framework for Fine-Grained Farmland Mapping From High-Resolution Images. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5601215.
- Long, C.; Wenlong, S.; Tao, S.; Yizhu, L.; Wei, J.; Jun, L.; Hongjie, L.; Tianshi, F.; Rongjie, G.; Abbas, H.; et al. Field Patch Extraction Based on High-Resolution Imaging and U2-Net++ Convolutional Neural Networks. Remote Sens. 2023, 15, 4900.
- Lu, H.; Wang, H.; Ma, Z.; Ren, Y.; Fu, W.; Shan, Y.; Hu, S.; Zhang, G.; Meng, Z. Farmland boundary extraction based on the AttMobile-DeeplabV3+ network and least squares fitting of straight lines. Front. Plant Sci. 2023, 14, 1228590.
- Zhang, J.; Li, Y.; Tong, Z.; He, L.; Zhang, M.; Niu, Z.; He, H. GLCANet: Global–Local Context Aggregation Network for Cropland Segmentation from Multi-Source Remote Sensing Images. Remote Sens. 2024, 16, 4627.
- Wang, Y.; Gu, L.; Jiang, T.; Gao, F. MDE-UNet: A Multitask Deformable UNet Combined Enhancement Network for Farmland Boundary Segmentation. IEEE Geosci. Remote Sens. Lett. 2023, 20, 3001305.
- Zhong, B.; Wei, T.; Luo, X.; Du, B.; Hu, L.; Ao, K.; Yang, A.; Wu, J. Multi-Swin Mask Transformer for Instance Segmentation of Agricultural Field Extraction. Remote Sens. 2023, 15, 549.
- Lu, R.; Wang, N.; Zhang, Y.; Lin, Y.; Wu, W.; Shi, Z. Extraction of Agricultural Fields via DASFNet with Dual Attention Mechanism and Multi-scale Feature Fusion in South Xinjiang, China. Remote Sens. 2022, 14, 2253.
- Lu, X.; Ming, D.; Du, T.; Chen, Y.; Dong, D.; Zhou, C. Delineation of cultivated land parcels based on deep convolutional networks and geographical thematic scene division of remotely sensed images. Comput. Electron. Agric. 2022, 192, 106611.
- Cao, Y.; Zhao, Z.; Huang, Y.; Lin, X.; Luo, S.; Xiang, B.; Yang, H. Case instance segmentation of small farmland based on Mask R-CNN of feature pyramid network with double attention mechanism in high resolution satellite images. Comput. Electron. Agric. 2023, 212, 108073.
- Zhang, D.; Pan, Y.; Zhang, J.; Hu, T.; Li, N.; Chen, Q. A generalized approach based on convolutional neural networks for large area cropland mapping at very high resolution. Remote Sens. Environ. 2020, 247, 111912.
- Miao, L.; Li, X.; Zhou, X.; Yao, L.; Deng, Y.; Hang, T. SNUNet3+: A Full-Scale Connected Siamese Network and a Dataset for Cultivated Land Change Detection in High-Resolution Remote-Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4400818.
- Zhang, Z.; Huang, L.; Tang, B.H.; Le, W.; Wang, M.; Cheng, J.; Wu, Q. MATNet: Multiattention Transformer network for cropland semantic segmentation in remote sensing images. Int. J. Digit. Earth 2024, 17, 2392845.
- Xu, Y.; Zhu, Z.; Guo, M.; Huang, Y. Multiscale Edge-Guided Network for Accurate Cultivated Land Parcel Boundary Extraction From Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4501020.
- Hong, Q.; Zhu, Y.; Liu, W.; Ren, T.; Shi, C.; Lu, Z.; Yang, Y.; Deng, R.; Qian, J.; Tan, C. A Segmentation Network for Farmland Ridge Based on Encoder-Decoder Architecture in Combined with Strip Pooling Module and ASPP. Front. Plant Sci. 2024, 15, 1328075.
- Wang, B.; Zhou, Y.; Zhu, W.; Feng, L.; He, J.; Wu, T.; Luo, J.; Zhang, X. AAMS-YOLO: Enhanced Farmland Parcel Detection for High-Resolution Remote Sensing Images. Int. J. Digit. Earth 2024, 17, 2432532.
- Zhao, Z.; Liu, Y.; Zhang, G.; Tang, L.; Hu, X. The winning solution to the iFLYTEK challenge 2021 cultivated land extraction from high-resolution remote sensing images. In Proceedings of the 14th International Conference on Advanced Computational Intelligence (ICACI), Wuhan, China, 18–20 March 2022; pp. 376–380.
- Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S.; Xie, S. ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders. arXiv 2023, arXiv:2301.00808.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Fu, Y.; Lou, M.; Yu, Y. SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation. arXiv 2025, arXiv:2412.11890v2.
| Configuration | Contents |
|---|---|
| Optimizer | SGD |
| Scheduler | CosineAnnealingLR |
| Batch size | 8 |
| Total epochs | 150 |
| Initial learning rate | 0.001 |
| Min learning rate | 0.00001 |
| Weight decay | 0.0001 |
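The learning-rate schedule in the table above has a closed form. The sketch below reproduces the cosine annealing curve from the listed hyperparameters, assuming `T_max` equals the total epoch count of 150 (the table does not state `T_max` explicitly).

```python
import math

# Hyperparameters taken from the configuration table above.
INITIAL_LR, MIN_LR, TOTAL_EPOCHS = 1e-3, 1e-5, 150

def cosine_annealing_lr(epoch):
    """Learning rate at a given epoch under cosine annealing
    (the same formula PyTorch's CosineAnnealingLR uses)."""
    cos_term = (1 + math.cos(math.pi * epoch / TOTAL_EPOCHS)) / 2
    return MIN_LR + (INITIAL_LR - MIN_LR) * cos_term

print(cosine_annealing_lr(0))    # starts at the initial rate, 0.001
print(cosine_annealing_lr(75))   # midpoint: roughly 0.000505
print(cosine_annealing_lr(150))  # decays to the minimum rate, 1e-05
```

The smooth decay avoids the abrupt drops of step schedules, which tends to stabilize fine-tuning in the later epochs.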
| α | β | IoU (%) |
|---|---|---|
| 1 | 0 | 88.95 |
| 1 | 0.5 | 89.43 |
| 1 | 0.8 | 89.87 |
| 1 | 1 | 89.95 |
| 0.8 | 1 | 90.85 |
| 0.5 | 1 | 90.12 |
| 0 | 1 | 89.18 |
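The table above sweeps the weights α and β of the combined loss, with the best IoU at α = 0.8, β = 1. The outline does not spell out which two terms the weights apply to; a common pairing for farmland segmentation is binary cross-entropy with Dice loss, so the sketch below is an assumed instantiation, not the paper's definition.

```python
import numpy as np

EPS = 1e-7

def bce_loss(prob, target):
    """Pixel-averaged binary cross-entropy on predicted probabilities."""
    p = np.clip(prob, EPS, 1 - EPS)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

def dice_loss(prob, target):
    """Soft Dice loss: 1 - 2|P.T| / (|P| + |T|), region-overlap driven."""
    inter = (prob * target).sum()
    return float(1 - (2 * inter + EPS) / (prob.sum() + target.sum() + EPS))

def combine_loss(prob, target, alpha=0.8, beta=1.0):
    # Defaults use the best weighting from the sweep above: alpha=0.8, beta=1.
    return alpha * bce_loss(prob, target) + beta * dice_loss(prob, target)

prob = np.array([[0.9, 0.8], [0.2, 0.1]])    # predicted farmland probabilities
target = np.array([[1.0, 1.0], [0.0, 0.0]])  # ground-truth mask
print(round(combine_loss(prob, target), 4))  # → 0.2814
```

Pairing a pixel-wise term with a region-overlap term is a standard way to balance boundary sharpness against whole-parcel completeness, which is consistent with the sweep favoring a nonzero weight on both.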
| Method | Combine Loss | ConvNeXt V2 | Multi-Scale Fusion | MHA | Precision (%) | Recall (%) | F1 (%) | IoU (%) |
|---|---|---|---|---|---|---|---|---|
| UNet++ | | | | | 92.01 | 87.16 | 89.51 | 81.93 |
| UNet++ | √ | | | | 92.65 | 89.91 | 83.78 | 83.78 |
| UNet++ | | √ | | | 93.47 | 91.86 | 92.41 | 86.18 |
| UNet++ | | | √ | | 92.88 | 89.98 | 91.42 | 84.48 |
| UNet++ | | | | √ | 93.12 | 90.72 | 91.90 | 85.16 |
| Method | Precision (%) | Recall (%) | F1 (%) | IoU (%) |
|---|---|---|---|---|
| PSPNet | 89.28 ± 0.68 | 79.90 ± 0.77 | 84.30 ± 0.71 | 74.91 ± 0.78 |
| SegNet | 90.90 ± 0.65 | 85.79 ± 0.74 | 88.26 ± 0.68 | 80.02 ± 0.75 |
| DeepLabV3+ | 92.77 ± 0.60 | 86.72 ± 0.68 | 89.63 ± 0.63 | 82.28 ± 0.70 |
| UNet++ | 92.01 ± 0.62 | 87.16 ± 0.71 | 89.51 ± 0.65 | 81.93 ± 0.72 |
| TransUNet | 91.72 ± 0.59 | 91.78 ± 0.69 | 91.24 ± 0.62 | 83.90 ± 0.69 |
| SeaFormer | 92.08 ± 0.55 | 91.95 ± 0.64 | 92.11 ± 0.57 | 85.24 ± 0.66 |
| SegMAN | 94.86 ± 0.48 | 92.87 ± 0.61 | 93.81 ± 0.52 | 88.78 ± 0.62 |
| CSMNet | 95.91 ± 0.45 | 93.95 ± 0.57 | 94.92 ± 0.49 | 90.85 ± 0.59 |
| Method | Parameters (M) | FLOPs (G) | FPS |
|---|---|---|---|
| PSPNet | 29.81 | 27.27 | 87.34 |
| SegNet | 18.62 | 24.31 | 94.66 |
| DeepLabV3+ | 39.63 | 41.03 | 57.3 |
| UNet++ | 32.16 | 34.90 | 40.51 |
| TransUNet | 95.71 | 75.56 | 17.5 |
| SeaFormer | 10.1 | 16.8 | 127.8 |
| SegMAN | 28.2 | 38.13 | 35.5 |
| CSMNet | 35.72 | 44.85 | 30.18 |
| Method | Precision (%) | Recall (%) | F1 (%) | IoU (%) |
|---|---|---|---|---|
| PSPNet | 94.32 ± 0.55 | 92.25 ± 0.69 | 93.72 ± 0.64 | 88.79 ± 0.70 |
| SegNet | 94.93 ± 0.50 | 93.28 ± 0.67 | 94.06 ± 0.61 | 89.12 ± 0.68 |
| DeepLabV3+ | 95.59 ± 0.45 | 93.10 ± 0.63 | 94.54 ± 0.57 | 90.09 ± 0.62 |
| UNet++ | 95.43 ± 0.47 | 93.11 ± 0.65 | 94.23 ± 0.59 | 89.56 ± 0.65 |
| TransUNet | 95.94 ± 0.43 | 93.69 ± 0.60 | 94.77 ± 0.54 | 90.49 ± 0.63 |
| SeaFormer | 95.50 ± 0.45 | 94.68 ± 0.56 | 95.01 ± 0.51 | 90.73 ± 0.60 |
| SegMAN | 96.16 ± 0.39 | 95.26 ± 0.50 | 95.93 ± 0.46 | 92.48 ± 0.54 |
| CSMNet | 97.23 ± 0.35 | 96.39 ± 0.48 | 96.94 ± 0.42 | 93.69 ± 0.51 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Han, Y.; Wang, Y.; Zhang, Y.; Ai, H.; Qin, C.; Zhang, X. Multi-Scale Feature Learning for Farmland Segmentation Under Complex Spatial Structures. Entropy 2026, 28, 242. https://doi.org/10.3390/e28020242

