Precise Extraction of Croplands from Remote Sensing Images in Egypt by a Dual-Encoder U-Net with Multi-Scale Axial Attention and Boundary Constraints
Abstract
1. Introduction
- A novel dual-encoder framework is proposed for cropland parcel extraction that couples local detail preservation with global context acquisition. By fusing a CNN encoder (VGG16) with an RMT-based global encoder that models both long- and short-range dependencies, it can reduce boundary fragmentation and semantic inconsistency across diverse scenes.
- A multi-scale spectral–spatial axial attention module is introduced to better represent parcel geometry and directional structures. It can capture anisotropic field patterns and improve discrimination under spectral confusion, conditions in which conventional spatial attention and single-scale context aggregation are often insufficient.
- An edge-aware boundary enhancement mechanism is designed to explicitly incorporate multi-directional gradient cues into cropland segmentation. It can mitigate boundary misclassification caused by mixed pixels and produce more accurate and continuous parcel outlines, especially for irregular and fragmented cropland patches.
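The RMT-based global encoder in the first contribution builds on RMT (Fan et al., 2024), whose core operation, Manhattan Self-Attention (MaSA), multiplies the softmax attention map element-wise by a spatial decay matrix $D_{nm} = \gamma^{|x_n - x_m| + |y_n - y_m|}$, where $(x_n, y_n)$ are the grid coordinates of token $n$. The single-head NumPy sketch below illustrates only that mechanism. The decay rate γ = 0.9, the $1/\sqrt{d}$ scaling, the toy sizes, and the function names are illustrative assumptions, not the configuration used in MAA-BCNet; the RMT paper also decomposes the operation along the two image axes for efficiency, which is not shown here.

```python
import numpy as np


def manhattan_decay(h: int, w: int, gamma: float) -> np.ndarray:
    """Decay matrix D[n, m] = gamma ** (|x_n - x_m| + |y_n - y_m|) for an h x w token grid
    flattened in row-major order (token index n = y * w + x)."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    ys, xs = ys.ravel(), xs.ravel()
    dist = np.abs(ys[:, None] - ys[None, :]) + np.abs(xs[:, None] - xs[None, :])
    return gamma ** dist


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-subtraction for numerical stability."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def masa(q, k, v, h, w, gamma=0.9):
    """Single-head Manhattan self-attention over h*w tokens; q, k, v have shape (h*w, d)."""
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d))        # (N, N) content-based weights
    attn = attn * manhattan_decay(h, w, gamma)  # explicit spatial prior: nearby tokens keep more weight
    return attn @ v


rng = np.random.default_rng(0)
h, w, d = 4, 5, 8
tokens = rng.standard_normal((h * w, d))
out = masa(tokens, tokens, tokens, h, w)
print(out.shape)  # (20, 8)
```

Because the decay depends only on grid distance, it injects an explicit locality prior while every token can still attend to distant tokens with reduced weight, which matches the long- and short-range modeling the contribution describes.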
2. Materials and Methods
2.1. Study Area
2.2. Materials
- Feature ranking. A total of nineteen features, comprising ten original bands and nine vegetation indices, were ranked by RF feature importance (Figure 2).
- Feature subset assessment. Feature subsets were constructed iteratively from the top-K ranked features. For each subset, an RF model was retrained and evaluated by cross-validation, with the mean accuracy across folds as the evaluation metric. Model performance was then compared across values of K.
- Optimal K selection. Cross-validation accuracy peaks at 0.9796 when K = 9 (Figure 3). Adding further features brings at most marginal changes in accuracy while increasing computation, and accuracy declines gradually once K exceeds 13, likely because redundant or noisy features degrade model performance.
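The three steps above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions: synthetic data stands in for the nineteen per-pixel features, importance is the impurity-based `feature_importances_` of `RandomForestClassifier` (the article may instead use an out-of-bag or permutation measure), and the forest size and five-fold cross-validation are illustrative choices. The helpers `rank_features` and `top_k_scores` are hypothetical names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def rank_features(X, y, seed=0):
    """Step 1: return feature indices sorted by RF importance, highest first."""
    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    rf.fit(X, y)
    return np.argsort(rf.feature_importances_)[::-1]


def top_k_scores(X, y, order, cv=5, seed=0):
    """Step 2: mean cross-validated accuracy of an RF retrained on the top-K features, K = 1..n."""
    scores = []
    for k in range(1, len(order) + 1):
        rf = RandomForestClassifier(n_estimators=100, random_state=seed)
        acc = cross_val_score(rf, X[:, order[:k]], y, cv=cv, scoring="accuracy")
        scores.append(acc.mean())
    return np.array(scores)


# Synthetic stand-in for 19 per-pixel features (ten bands + nine indices).
X, y = make_classification(n_samples=400, n_features=19, n_informative=6,
                           n_redundant=4, random_state=0)
order = rank_features(X, y)
scores = top_k_scores(X, y, order)

# Step 3: pick the K with the highest mean CV accuracy.
best_k = int(np.argmax(scores)) + 1
print(f"best K = {best_k}, CV accuracy = {scores[best_k - 1]:.4f}")
```

Choosing K by `argmax` ignores the accuracy/computation trade-off described above; a common alternative is to take the smallest K whose score lies within a small tolerance of the maximum.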
2.3. Methods
2.3.1. Dual-Encoder Network with Multi-Scale Axial Attention and Boundary Constraints
2.3.2. RMT Module
2.3.3. CBAM_s Module
2.3.4. EdgeDetect Module
- Semantic enhancement. CBAM_s prioritizes cropland regions via channel-axial attention.
- Spatial localization. Multi-directional Sobel operators extract gradient cues along field edges.
- Boundary refinement. Gradient features are fused and normalized to suppress noise while sharpening the edges.
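The localization and refinement steps can be illustrated on a single 2-D map, for example a cropland probability map or one feature channel. The NumPy sketch below is not the EdgeDetect module itself: the two diagonal kernels, summation of absolute responses as the fusion rule, and min-max scaling as the normalization are assumptions, and the CBAM_s attention step is omitted. `correlate3x3` and `edge_map` are hypothetical names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# 3x3 Sobel kernels for four directions (the diagonal pair is an assumed, commonly used variant).
SOBEL_KERNELS = [
    np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float),   # 0 deg: horizontal gradient
    np.array([[0, 1, 2], [-1, 0, 1], [-2, -1, 0]], dtype=float),   # 45 deg diagonal
    np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float),   # 90 deg: vertical gradient
    np.array([[-2, -1, 0], [-1, 0, 1], [0, 1, 2]], dtype=float),   # 135 deg diagonal
]


def correlate3x3(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Same-size 2-D cross-correlation with edge-replicated borders."""
    padded = np.pad(img, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))   # shape (H, W, 3, 3)
    return np.einsum("hwij,ij->hw", windows, kernel)


def edge_map(img: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Fuse absolute directional gradients by summation, then min-max normalize to [0, 1]."""
    fused = sum(np.abs(correlate3x3(img, k)) for k in SOBEL_KERNELS)
    return (fused - fused.min()) / (fused.max() - fused.min() + eps)


# Toy map with a vertical parcel boundary between columns 2 and 3.
prob = np.zeros((6, 6))
prob[:, 3:] = 1.0
edges = edge_map(prob)
print(edges.round(2))
```

On this toy input only the two columns adjacent to the boundary respond, while the flat interiors on both sides map to zero, which is the behaviour the refinement step relies on to suppress noise away from field edges.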
2.3.5. Evaluation Metrics
2.3.6. Parameters
3. Results
3.1. Ablation Experiments
3.2. Comparison Experiments
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations

| Abbreviation | Definition |
|---|---|
| MAA-BCNet | A novel dual-encoder deep learning method that integrates multi-scale axial attention and boundary constraints |
| ML | Machine learning |
| OBIA | Object-Based Image Analysis |
| SVM | Support Vector Machines |
| RF | Random Forest |
| GBMs | Gradient Boosting Machines |
| k-NN | k-Nearest Neighbors |
| ViT | Vision Transformer |
| scSE | spatial-channel squeeze-and-excitation |
| RMT | Retentive Networks Meet Vision Transformers |
| OOB | Out-of-bag |
| CBAM_s | Convolutional Block Attention Module with axial spatial module |
| RetNet | Retentive Network |
| MaSA | Manhattan Self-Attention |
| CBAM | Convolutional Block Attention Module |
| GELU | Gaussian Error Linear Unit |
| MaxPool | Max Pooling |
| BN | Batch Normalization |
| AvgPool | Average Pooling |
| IoU | Intersection over Union |
| AUC | Area Under the Curve |
| ROC | Receiver Operating Characteristic |
| FPR | False Positive Rate |
| TPR | True Positive Rate |
| ED | EdgeDetect |
References
1. Namany, S.; Govindan, R.; Al-Ansari, T. Operationalising transboundary cooperation through game theory: An energy water food nexus approach for the Middle East and North Africa. Futures 2023, 152, 103198.
2. Omar, A.R.; Bardsley, D.K. Conceptualising climate change vulnerability across the agrarian transition: The example of Egypt. Environ. Dev. 2024, 52, 101087.
3. Robson, J.S.; Ayad, H.M.; Wasfi, R.A.; El-Geneidy, A.M. Spatial disintegration and arable land security in Egypt: A study of small- and moderate-sized urban areas. Habitat Int. 2012, 36, 253–260.
4. Sattar, A.; Brown, C.; Rounsevell, M.; Alexander, P. Typology analysis of Egyptian agricultural households reveals increasing income diversification and abandonment of agricultural activities. Agric. Syst. 2024, 218, 104000.
5. Zhao, Y.; Ji, C.; Chen, Y.; Zhu, X. Who gains, who loses?—The impact of the belt and road initiative on bilateral agricultural trade. China Econ. Rev. 2024, 88, 102284.
6. Gopal, S.; Woodcock, C.E.; Strahler, A.H. Fuzzy Neural Network Classification of Global Land Cover from a 1° AVHRR Data Set. Remote Sens. Environ. 1999, 67, 230–243.
7. Rydberg, A.; Borgefors, G. Integrated method for boundary delineation of agricultural fields in multispectral satellite images. IEEE Trans. Geosci. Remote Sens. 2001, 39, 2514–2520.
8. Turker, M.; Kok, E.H. Field-based sub-boundary extraction from remote sensing imagery using perceptual grouping. ISPRS J. Photogramm. Remote Sens. 2013, 79, 106–121.
9. Graesser, J.; Ramankutty, N. Detection of cropland field parcels from Landsat imagery. Remote Sens. Environ. 2017, 201, 165–180.
10. Belgiu, M.; Csillik, O. Sentinel-2 cropland mapping using pixel-based and object-based time-weighted dynamic time warping analysis. Remote Sens. Environ. 2018, 204, 509–523.
11. Cheng, T.; Ji, X.; Yang, G.; Zheng, H.; Ma, J.; Yao, X.; Zhu, Y.; Cao, W. DESTIN: A new method for delineating the boundaries of crop fields by fusing spatial and temporal information from World View and Planet satellite imagery. Comput. Electron. Agric. 2020, 178, 105787.
12. Cai, Z.; Hu, Q.; Zhang, X.; Yang, J.; Wei, H.; He, Z.; Song, Q.; Wang, C.; Yin, G.; Xu, B. An Adaptive Image Segmentation Method with Automatic Selection of Optimal Scale for Extracting Cropland Parcels in Smallholder Farming Systems. Remote Sens. 2022, 14, 3067.
13. Ming, D.; Li, J.; Wang, J.; Zhang, M. Scale parameter selection by spatial statistics for GeOBIA: Using mean-shift based multi-scale segmentation as an example. ISPRS J. Photogramm. Remote Sens. 2015, 106, 28–41.
14. Lambert, M.-J.; Waldner, F.; Defourny, P. Cropland Mapping over Sahelian and Sudanian Agrosystems: A Knowledge-Based Approach Using PROBA-V Time Series at 100-m. Remote Sens. 2016, 8, 232.
15. Phalke, A.R.; Özdoğan, M.; Thenkabail, P.S.; Erickson, T.; Gorelick, N.; Yadav, K.; Congalton, R.G. Mapping croplands of Europe, Middle East, Russia, and Central Asia using Landsat, Random Forest, and Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2020, 167, 104–122.
16. Ramezan, C.A.; Warner, T.A.; Maxwell, A.E.; Price, B.S. Effects of Training Set Size on Supervised Machine-Learning Land-Cover Classification of Large-Area High-Resolution Remotely Sensed Data. Remote Sens. 2021, 13, 368.
17. Li, Y.; Liu, W.; Ge, Y.; Yuan, S.; Zhang, T.; Liu, X. Extracting Citrus Growing Regions by Multiscale UNet Using Sentinel-2 Satellite Imagery. Remote Sens. 2024, 16, 36.
18. Zhang, D.; Pan, Y.; Zhang, J.; Hu, T.; Zhao, J.; Li, N.; Chen, Q. A generalized approach based on convolutional neural networks for large area cropland mapping at very high resolution. Remote Sens. Environ. 2020, 247, 111912.
19. Xu, J.; Zhu, Y.; Zhong, R.; Lin, Z.; Xu, J.; Jiang, H.; Huang, J.; Li, H.; Lin, T. DeepCropMapping: A multi-temporal deep learning approach with improved spatial generalizability for dynamic corn and soybean mapping. Remote Sens. Environ. 2020, 247, 111946.
20. Persello, C.; Tolpekin, V.A.; Bergado, J.R.; de By, R.A. Delineation of agricultural fields in smallholder farms from satellite images using fully convolutional networks and combinatorial grouping. Remote Sens. Environ. 2019, 231, 111253.
21. Zhang, H.; Liu, M.; Wang, Y.; Shang, J.; Liu, X.; Li, B.; Song, A.; Li, Q. Automated delineation of agricultural field boundaries from Sentinel-2 images using recurrent residual U-Net. Int. J. Appl. Earth Obs. Geoinf. 2021, 105, 102557.
22. Cai, Z.; Hu, Q.; Zhang, X.; Yang, J.; Wei, H.; Wang, J.; Zeng, Y.; Yin, G.; Li, W.; You, L.; et al. Improving agricultural field parcel delineation with a dual branch spatiotemporal fusion network by integrating multimodal satellite data. ISPRS J. Photogramm. Remote Sens. 2023, 205, 34–49.
23. Wang, H.; Zhang, M.; Li, W.; Gao, Y.; Gui, Y.; Zhang, Y. Unbalanced Class Learning Network with Scale-Adaptive Perception for Complicated Scene in Remote Sensing Images Segmentation. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4406712.
24. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Navab, N., Hornegger, J., Wells, W., Frangi, A., Eds.; Springer: Cham, Switzerland, 2015; Volume 9351.
25. Wei, H.; Xu, X.; Ou, N.; Zhang, X.; Dai, Y. DEANet: Dual Encoder with Attention Network for Semantic Segmentation of Remote Sensing Imagery. Remote Sens. 2021, 13, 3900.
26. Wang, D.; Sun, Y.; Chen, H.; Zhao, X. Image segmentation network based on enhanced dual encoder. Sci. Rep. 2025, 15, 35983.
27. Ahmed, A.; Sun, G.; Bilal, A.; Li, Y.; Ebad, S.A. Precision and efficiency in skin cancer segmentation through a dual encoder deep learning model. Sci. Rep. 2025, 15, 4815.
28. Khan, S.D.; Alarabi, L.; Basalamah, S. Deep Hybrid Network for Land Cover Semantic Segmentation in High-Spatial Resolution Satellite Images. Information 2021, 12, 230.
29. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762.
30. Wang, Z.; Xia, M.; Weng, L.; Hu, K.; Lin, H. Dual Encoder–Decoder Network for Land Cover Segmentation of Remote Sensing Image. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 2372–2385.
31. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
32. Fan, Q.; Huang, H.; Chen, M.; Liu, H.; He, R. RMT: Retentive Networks Meet Vision Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 5641–5651.
33. Sun, Y.; Dong, L.; Huang, S.; Ma, S.; Xia, Y.; Xue, J.; Wang, J.; Wei, F. Retentive Network: A Successor to Transformer for Large Language Models. arXiv 2023, arXiv:2307.08621.
34. Li, Y.; Liu, X.; Ferreira, V.; Balzter, H.; Zhou, H.; Ge, Y.; Lai, M.; Chu, S.; Ding, H.; Gu, Z. Surface water mapping from remote sensing in Egypt’s dry season using an improved U-Net model with multi-scale information and attention mechanism. Int. J. Appl. Earth Obs. Geoinf. 2025, 142, 104666.
35. Ghaznavi, A.; Saberioon, M.; Brom, J.; Itzerott, S. Comparative performance analysis of simple U-Net, residual attention U-Net, and VGG16-U-Net for inventory inland water bodies. Appl. Comput. Geosci. 2024, 21, 100150.
36. Zhang, G.; Zhao, C.; Jia, M.; Zhang, R.; Jiang, H.; Wang, Z. Mapping dominant plant communities in the degraded Zoige swamp using Sentinel-1/2 imagery and its implications for vegetation restoration. Ecol. Indic. 2025, 175, 113557.
37. Liu, J.; Yan, J.; Wang, L.; Huang, L.; He, H.; Liu, H. Remote Sensing Time Series Classification Based on Self-Attention Mechanism and Time Sequence Enhancement. Remote Sens. 2021, 13, 1804.
38. Zheng, J.; Fu, Y.; Chen, X.; Zhao, R.; Lu, J.; Zhao, H.; Chen, Q. EGCM-UNet: Edge Guided Hybrid CNN-Mamba UNet for farm land remote sensing image semantic segmentation. Geocarto Int. 2024, 40, 2440407.
39. John, D.; Zhang, C. An attention-based U-Net for detecting deforestation within satellite sensor imagery. Int. J. Appl. Earth Obs. Geoinf. 2022, 107, 102685.
40. Miao, L.; Li, X.; Zhou, X.; Yao, L.; Deng, Y.; Hang, T.; Zhou, Y.; Yang, H. SNUNet3+: A full-scale connected Siamese network and a dataset for cultivated land change detection in high-resolution remote-sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4400818.
41. Lu, R.; Zhang, Y.; Huang, Q.; Zeng, P.; Shi, Z.; Ye, S. A refined edge aware convolutional neural networks for agricultural parcel delineation. Int. J. Appl. Earth Obs. Geoinf. 2024, 133, 104084.
42. Gohar, A.A.; Cashman, A.; El-bardisy, H.A.E.H. Modeling the impacts of water-land allocation alternatives on food security and agricultural livelihoods in Egypt: Welfare analysis approach. Environ. Dev. 2021, 39, 100650.
43. Bratley, K.H.; Woodcock, C.E. Estimating the expansion and reduction of agricultural extent in Egypt using Landsat time series. Int. J. Appl. Earth Obs. Geoinf. 2024, 133, 104141.
44. Akbari, E.; Amini, J.; Sumfleth, K. Crop mapping using Random Forest and Particle Swarm Optimization: A classification–feature selection ensemble procedure for multi-temporal Sentinel-2 data. Remote Sens. 2020, 12, 1449.
45. Snevajs, H.; Charvat, K.; Onckelet, V.; Kvapil, J.; Zadrazil, F.; Kubickova, H.; Seidlova, J.; Batrlova, I. Crop detection using time series of Sentinel-2 and Sentinel-1 and existing land parcel information systems. Remote Sens. 2022, 14, 1095.
46. Judith, J.; Tamilselvi, R.; Beham, M.P.; Lakshmi, S.S.P.; Panthakkan, A.; Mansoori, S.A.; Ahmad, H.A. Remote sensing based crop health classification using NDVI and fully connected neural networks. arXiv 2025, arXiv:2504.10522.
47. Delegido, J.; Verrelst, J.; Alonso, L.; Moreno, J. Evaluation of Sentinel-2 Red-Edge Bands for Empirical Estimation of Green LAI and Chlorophyll Content. Sensors 2011, 11, 7063–7081.
48. Qi, J.; Chehbouni, A.; Huete, A.R.; Kerr, Y.H.; Sorooshian, S. A modified soil adjusted vegetation index. Remote Sens. Environ. 1994, 48, 119–126.
49. Fei, H.; Fan, Z.; Wang, C.; Zhang, N.; Wang, T.; Chen, R.; Bai, T. Cotton Classification Method at the County Scale Based on Multi-Features and Random Forest Feature Selection Algorithm and Classifier. Remote Sens. 2022, 14, 829.
50. Liu, J.; Feng, Q.; Gong, J.; Zhou, J.; Liang, J.; Li, Y. Winter wheat mapping using a random forest classifier combined with multi-temporal and multi-sensor data. Int. J. Digit. Earth 2018, 11, 783–802.
51. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
52. Yu, G.; Goussies, N.A.; Yuan, J.; Liu, Z. Fast Action Detection via Discriminative Random Forest Voting and Top-K Subvolume Search. IEEE Trans. Multimed. 2011, 13, 507–517.
53. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
54. Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation. arXiv 2020, arXiv:1912.05074.
55. Zhu, W.; Huang, Y.; Zeng, L.; Chen, X.; Liu, Y.; Qian, Z.; Du, N.; Fan, W.; Xie, X. AnatomyNet: Deep learning for fast and fully automated whole-volume segmentation of head and neck anatomy. Med. Phys. 2018, 46, 576–589.

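| Indices | Equations | Reference |
|---|---|---|
| NDVI | $(\mathrm{NIR} - \mathrm{Red})/(\mathrm{NIR} + \mathrm{Red})$ | [45] |
| EVI | $2.5 \times (\mathrm{NIR} - \mathrm{Red})/(\mathrm{NIR} + 6 \times \mathrm{Red} - 7.5 \times \mathrm{Blue} + 1)$ | [45] |
| GNDVI | $(\mathrm{NIR} - \mathrm{Green})/(\mathrm{NIR} + \mathrm{Green})$ | [46] |
| MSAVI | $\left(2\,\mathrm{NIR} + 1 - \sqrt{(2\,\mathrm{NIR} + 1)^2 - 8\,(\mathrm{NIR} - \mathrm{Red})}\right)/2$ | [46] |
| NDVIre5 | $(\mathrm{NIR} - \mathrm{RE}_5)/(\mathrm{NIR} + \mathrm{RE}_5)$ | [47] |
| NDVIre6 | $(\mathrm{NIR} - \mathrm{RE}_6)/(\mathrm{NIR} + \mathrm{RE}_6)$ | [47] |
| NDVIre7 | $(\mathrm{NIR} - \mathrm{RE}_7)/(\mathrm{NIR} + \mathrm{RE}_7)$ | [47] |
| SAVI | $(1 + L)(\mathrm{NIR} - \mathrm{Red})/(\mathrm{NIR} + \mathrm{Red} + L)$ | [48] |
| OSAVI | $(\mathrm{NIR} - \mathrm{Red})/(\mathrm{NIR} + \mathrm{Red} + 0.16)$ | [48] |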
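The indices in the table can be computed per pixel from surface-reflectance arrays, as in the NumPy sketch below. The band roles (which sensor band serves as NIR or as the red-edge bands RE5 to RE7), the conventional soil factor L = 0.5 for SAVI, and the red-edge NDVI form (NIR − RE)/(NIR + RE) are assumptions not confirmed by this section; `vegetation_indices` is a hypothetical helper.

```python
import numpy as np


def _ratio(num, den):
    """Element-wise num / den that returns 0 where the denominator is 0."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    return np.divide(num, den, out=np.zeros_like(num), where=den != 0)


def vegetation_indices(blue, green, red, nir, re5, re6, re7, L=0.5):
    """Nine spectral indices from reflectance arrays scaled to [0, 1]."""
    blue, green, red, nir, re5, re6, re7 = (
        np.asarray(b, dtype=float) for b in (blue, green, red, nir, re5, re6, re7)
    )
    return {
        "NDVI": _ratio(nir - red, nir + red),
        "EVI": _ratio(2.5 * (nir - red), nir + 6 * red - 7.5 * blue + 1),
        "GNDVI": _ratio(nir - green, nir + green),
        # The sqrt argument equals (2*NIR - 1)**2 + 8*Red, so it is non-negative for Red >= 0.
        "MSAVI": (2 * nir + 1 - np.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2,
        "NDVIre5": _ratio(nir - re5, nir + re5),
        "NDVIre6": _ratio(nir - re6, nir + re6),
        "NDVIre7": _ratio(nir - re7, nir + re7),
        "SAVI": _ratio((1 + L) * (nir - red), nir + red + L),
        "OSAVI": _ratio(nir - red, nir + red + 0.16),
    }


# One vegetated pixel with illustrative reflectance values.
vi = vegetation_indices(blue=np.array([0.05]), green=np.array([0.08]), red=np.array([0.10]),
                        nir=np.array([0.50]), re5=np.array([0.20]), re6=np.array([0.30]),
                        re7=np.array([0.40]))
print({name: round(float(v[0]), 4) for name, v in vi.items()})
```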

| Model | Precision | Recall | F1 Score | IoU | FLOPs |
|---|---|---|---|---|---|
| VGG16-U_net++ | 0.8327 | 0.9189 | 0.8737 | 0.7768 | 37.71 |
| RMT-U_net++ | 0.8385 | 0.9027 | 0.8694 | 0.7703 | 38.15 |
| U_net++ | 0.8464 | 0.8829 | 0.8643 | 0.7610 | 35.73 |
| Dual-Encoder-U_net++ | 0.8458 | 0.9155 | 0.8793 | 0.7846 | 38.68 |

| Model | ED Module | CBAM_s Module | Precision | Recall | F1-Score | IoU | FLOPs |
|---|---|---|---|---|---|---|---|
| MAA-BCNet | √ | √ | 0.9077 | 0.9492 | 0.9280 | 0.8657 | 43.27 |
| MAA-BCNet (without CBAM_s) | √ | × | 0.8787 | 0.9441 | 0.9102 | 0.8352 | 41.34 |
| MAA-BCNet (without ED) | × | √ | 0.8751 | 0.9382 | 0.9056 | 0.8274 | 39.78 |
| MAA-BCNet (without ED and CBAM_s) | × | × | 0.8458 | 0.9155 | 0.8793 | 0.7846 | 38.68 |

| Feature Selection | F1-Score | IoU | FLOPs |
|---|---|---|---|
| Raw bands (ten bands) | 0.9172 | 0.8526 | 59.74 |
| Selected features (top-K, nine features) | 0.9297 | 0.8682 | 55.28 |
| Spectral indices only (nine indices) | 0.9214 | 0.8571 | 56.97 |
| PCA components (three components from nine features) | 0.9280 | 0.8657 | 43.27 |
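The last row uses principal component analysis (PCA) to compress the nine selected features into three components; its F1-Score, IoU, and FLOPs match the MAA-BCNet rows in the other tables, suggesting that this is the input configuration of the reported model. A minimal NumPy sketch of such a reduction follows. It assumes per-feature standardization and a single PCA fitted over all pixels of one image stack; neither detail is specified in this section, and `pca_reduce` is a hypothetical name.

```python
import numpy as np


def pca_reduce(features: np.ndarray, n_components: int = 3):
    """Project an (H, W, C) feature stack onto its first principal components.

    Returns (H, W, n_components) component scores and the explained-variance ratios.
    """
    h, w, c = features.shape
    x = features.reshape(-1, c).astype(float)
    x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-12)  # standardize each feature (assumption)
    # Rows of vt are the principal axes of the centred data, ordered by singular value.
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = x @ vt[:n_components].T
    explained = s ** 2 / np.sum(s ** 2)
    return scores.reshape(h, w, n_components), explained[:n_components]


rng = np.random.default_rng(0)
stack = rng.standard_normal((64, 64, 9))                              # stand-in for nine feature layers
stack[..., 1] = stack[..., 0] + 0.1 * rng.standard_normal((64, 64))  # add a strongly correlated pair
pcs, ratio = pca_reduce(stack, n_components=3)
print(pcs.shape, ratio.round(3))
```

Correlated layers such as bands and the indices derived from them collapse onto the leading components, which is why three components can retain most of the information while shrinking the network input.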

| Model | Precision | Recall | F1 Score | IoU |
|---|---|---|---|---|
| MAA-BCNet | 0.9077 | 0.9492 | 0.9280 | 0.8657 |
| DeeplabV3_plus | 0.8789 | 0.9463 | 0.9114 | 0.8371 |
| PSPnet | 0.8669 | 0.9016 | 0.8839 | 0.7920 |
| Link_net | 0.8413 | 0.8802 | 0.8603 | 0.7549 |
| FCN_resnet101 | 0.7617 | 0.8476 | 0.8024 | 0.6699 |
| U_net++ | 0.8464 | 0.8829 | 0.8643 | 0.7610 |
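Assuming the scores are computed from dataset-level pixel counts of true positives (TP), false positives (FP), and false negatives (FN) for the cropland class (the averaging scheme is not stated here), Precision = TP/(TP + FP), Recall = TP/(TP + FN), F1 = 2PR/(P + R), and IoU = TP/(TP + FP + FN), which together imply IoU = F1/(2 − F1). The plain-Python sketch below encodes these definitions and checks them against the MAA-BCNet row; the reported rows agree with the identity to within about 0.0001. The function names are hypothetical.

```python
def seg_metrics(tp: int, fp: int, fn: int) -> dict:
    """Binary segmentation metrics for one class from pixel counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


def f1_from_pr(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def iou_from_f1(f1: float) -> float:
    """For a single binary class, IoU = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)


# Consistency check against the MAA-BCNet row (Precision 0.9077, Recall 0.9492).
print(round(f1_from_pr(0.9077, 0.9492), 4))  # 0.928
print(round(iou_from_f1(0.9280), 4))         # 0.8657
```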

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Li, Y.; Ding, H.; Balzter, H.; Ferreira, V.; Ge, Y.; Wang, H.; Zhou, H.; Sun, T.; Shi, L.; Lai, M.; et al. Precise Extraction of Croplands from Remote Sensing Images in Egypt by a Dual-Encoder U-Net with Multi-Scale Axial Attention and Boundary Constraints. Land 2026, 15, 305. https://doi.org/10.3390/land15020305
Chicago/Turabian Style: Li, Yong, Han Ding, Heiko Balzter, Vagner Ferreira, Ying Ge, Hongyan Wang, Huiyu Zhou, Tengbo Sun, Lulu Shi, Meiyun Lai, and et al. 2026. "Precise Extraction of Croplands from Remote Sensing Images in Egypt by a Dual-Encoder U-Net with Multi-Scale Axial Attention and Boundary Constraints" Land 15, no. 2: 305. https://doi.org/10.3390/land15020305
APA Style: Li, Y., Ding, H., Balzter, H., Ferreira, V., Ge, Y., Wang, H., Zhou, H., Sun, T., Shi, L., Lai, M., & Liu, X. (2026). Precise Extraction of Croplands from Remote Sensing Images in Egypt by a Dual-Encoder U-Net with Multi-Scale Axial Attention and Boundary Constraints. Land, 15(2), 305. https://doi.org/10.3390/land15020305

