Hybrid ConvLSTM U-Net Deep Neural Network for Land Use and Land Cover Classification from Multi-Temporal Sentinel-2 Images: Application to Yaoundé, Cameroon
Abstract
1. Introduction
2. Related Work and Critical Literature Review
2.1. Related Background
2.1.1. Data for LULC Classification
- Google Dynamic World [28] delivers near-real-time land cover maps derived from Sentinel-2 imagery using a deep learning model trained on large-scale automatically labeled data. It distinguishes nine land cover classes, including built-up areas, crops, grass, trees, water, and bare ground. Its main strength lies in its high temporal frequency and global consistency, making it suitable for large-scale monitoring and rapid change detection.
- ESA WorldCover [29] provides a global land cover map at 10 m resolution with eleven classes, generated through a supervised classification framework combining Sentinel-1 and Sentinel-2 data. The product emphasizes thematic consistency at the global scale and is optimized for interoperability with climate, environmental, and biodiversity studies.
- Esri Global Land Cover [30] is another global product derived from Sentinel-2 imagery, designed for integration into GIS workflows. It offers a simplified multi-class legend and visually smooth maps that are particularly suited for cartographic and visualization purposes.
2.1.2. Convolutional Neural Networks (CNNs) and U-Net Variants
2.1.3. Recurrent and ConvLSTM-Based Models
2.1.4. Hybrid CNN–RNN Models for Multi-Temporal Remote Sensing
2.1.5. Remote Sensing Foundation Models and Vision Transformers
2.2. Research Gap, Motivations, and Challenges
2.2.1. Research Gap
2.2.2. Motivation and Challenge
3. Materials and Methods
3.1. Study Area and Data Collection
3.1.1. Study Area
3.1.2. Data Collection and Characterization
3.2. Proposed Method: Hybrid ConvLSTM U-Net Architecture
3.2.1. Preparation of Spatio-Temporal Sequences of Data
3.2.2. Hybrid ConvLSTM U-Net Model Architecture
4. Experiments and Results Analysis
4.1. Experimental Protocol
4.2. Model Setting and Training
Performance Metrics Calculation
4.3. Results Analysis
4.3.1. Learning Curve Analysis
4.3.2. Performance Results Analysis
4.4. Qualitative Analysis of Predictions
4.5. Study of Model Complexity
5. Discussion: Relevance for Urban Planning, Limitations, and Future Directions
Limitations and Perspectives
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| LULC | Land Use and Land Cover |
| LSTM | Long Short-Term Memory |
| ConvLSTM | Convolutional Long Short-Term Memory |
| CNN | Convolutional Neural Network |
Appendix A

References
1. Belinga, A.G.; El Haziti, M. Overviewing the emerging methods for predicting urban sprawl features. E3S Web of Conf. 2023, 418, 03008.
2. Belinga, A.G.; Koumetio, S.C.T.; El Haziti, M. Exploring the potentialities and challenges of deep learning for simulation and prediction of urban sprawl features. Data Policy 2025, 7, e2.
3. Tekouabou, S.C.K.; Diop, E.B.; Azmi, R.; Jaligot, R.; Chenal, J. Reviewing the application of machine learning methods to model urban form indicators in planning decision support systems: Potential, issues and challenges. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 5943–5967.
4. Tékouabou, S.C.; Chenal, J.; Azmi, R.; Toulni, H.; Diop, E.B.; Nikiforova, A. Identifying and classifying urban data sources for machine learning-based sustainable urban planning and decision support systems development. Data 2022, 7, 170.
5. Phiri, D.; Simwanda, M.; Salekin, S.; Nyirenda, V.R.; Murayama, Y.; Ranagalage, M. Sentinel-2 data for land cover/use mapping: A review. Remote Sens. 2020, 12, 2291.
6. Kamda Silapeux, A.; Ponka, R.; Frazzoli, C.; Fokou, E. Waste of fresh fruits in Yaoundé, Cameroon: Challenges for retailers and impacts on consumer health. Agriculture 2021, 11, 89.
7. Masolele, R.N.; De Sy, V.; Marcos, D.; Verbesselt, J.; Gieseke, F.; Mulatu, K.A.; Moges, Y.; Sebrala, H.; Martius, C.; Herold, M. Using high-resolution imagery and deep learning to classify land-use following deforestation: A case study in Ethiopia. GISci. Remote Sens. 2022, 59, 1446–1472.
8. Gallwey, J.; Robiati, C.; Coggan, J.; Vogt, D.; Eyre, M. A Sentinel-2 based multispectral convolutional neural network for detecting artisanal small-scale mining in Ghana: Applying deep learning to shallow mining. Remote Sens. Environ. 2020, 248, 111970.
9. Pelletier, C.; Webb, G.I.; Petitjean, F. Temporal convolutional neural network for the classification of satellite image time series. Remote Sens. 2019, 11, 523.
10. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.c. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28, 1.
11. Rußwurm, M.; Körner, M. Convolutional LSTMs for cloud-robust segmentation of remote sensing imagery. arXiv 2018, arXiv:1811.02471.
12. Rußwurm, M.; Körner, M. Multi-temporal land cover classification with sequential recurrent encoders. ISPRS Int. J. Geo-Inf. 2018, 7, 129.
13. Li, Z.; Shen, H.; Cheng, Q.; Liu, Y.; You, S.; He, Z. Deep learning based cloud detection for medium and high resolution remote sensing images of different sensors. ISPRS J. Photogramm. Remote Sens. 2019, 150, 197–212.
14. Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36.
15. Zhang, H.; Li, Y.; Zhang, Y.; Shen, Q. Spectral-spatial classification of hyperspectral imagery using a dual-channel convolutional neural network. Remote Sens. Lett. 2017, 8, 438–447.
16. Sefrin, O.; Riese, F.M.; Keller, S. Deep learning for land cover change detection. Remote Sens. 2020, 13, 78.
17. Arrechea-Castillo, D.A.; Solano-Correa, Y.T.; Muñoz-Ordóñez, J.F.; Pencue-Fierro, E.L.; Figueroa-Casas, A. Multiclass land use and land cover classification of Andean Sub-Basins in Colombia with Sentinel-2 and Deep Learning. Remote Sens. 2023, 15, 2521.
18. Zhang, G.; Roslan, S.N.A.B.; Wang, C.; Quan, L. Research on land cover classification of multi-source remote sensing data based on improved U-net network. Sci. Rep. 2023, 13, 16275.
19. Liu, S.; Cao, S.; Lu, X.; Peng, J.; Ping, L.; Fan, X.; Teng, F.; Liu, X. Lightweight Deep Learning Model, ConvNeXt-U: An Improved U-Net Network for Extracting Cropland in Complex Landscapes from Gaofen-2 Images. Sensors 2025, 25, 261.
20. Xu, X. Multi-temporal Land Cover Segmentation via Trans-ConvLSTM. In Proceedings of the International Conference on Big Data Analytics for Cyber-Physical System in Smart City; Springer: Berlin/Heidelberg, Germany, 2022; pp. 422–430.
21. Wenger, R.; Puissant, A.; Weber, J.; Idoumghar, L.; Forestier, G. Multimodal and multitemporal land use/land cover semantic segmentation on Sentinel-1 and Sentinel-2 imagery: An application on a MultiSenGE dataset. Remote Sens. 2022, 15, 151.
22. Majidizadeh, A.; Hasani, H.; Jafari, M. Semantic segmentation of oblique UAV video based on ConvLSTM in complex urban area. Earth Sci. Inform. 2024, 17, 3413–3435.
23. Yele, V.P.; Badhe, N.B.; Alegavi, S.; Sedamkar, R. Multi Attention Convolutional Sparse Coding U-Net for Enhanced Land-Use and Land-Cover Segmentation Using Hyperspectral Images. Sens. Imaging 2025, 26, 69.
24. Buttar, P.K.; Sachan, M.K. Land Cover Segmentation Using 3-D FCN-Based Architecture With Coordinate Attention. IEEE Geosci. Remote Sens. Lett. 2024, 21, 2502905.
25. Li, R.; Zheng, S.; Duan, C.; Wang, L.; Zhang, C. Land cover classification from remote sensing images based on multi-scale fully convolutional network. Geo-Spat. Inf. Sci. 2022, 25, 278–294.
26. Alam, A.; Bhat, M.S.; Maheen, M. Using Landsat satellite data for assessing the land use and land cover change in Kashmir valley. GeoJournal 2020, 85, 1529–1543.
27. Usman, M.; Liedl, R.; Shahid, M.; Abbas, A. Land use/land cover classification and its change detection using multi-temporal MODIS NDVI data. J. Geogr. Sci. 2015, 25, 1479–1506.
28. Brown, C.F.; Brumby, S.P.; Guzder-Williams, B.; Birch, T.; Hyde, S.B.; Mazzariello, J.; Czerwinski, W.; Pasquarella, V.J.; Haertel, R.; Ilyushchenko, S.; et al. Dynamic World, Near real-time global 10 m land use land cover mapping. Sci. Data 2022, 9, 251.
29. Zanaga, D.; Van De Kerchove, R.; Daems, D.; De Keersmaecker, W.; Brockmann, C.; Kirches, G.; Wevers, J.; Cartus, O.; Santoro, M.; Fritz, S.; et al. ESA WorldCover 10 m 2021 v200; Zenodo: Geneva, Switzerland, 2022.
30. Karra, K.; Kontgis, C.; Statman-Weil, Z.; Mazzariello, J.C.; Mathis, M.; Brumby, S.P. Global land use/land cover with Sentinel 2 and deep learning. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 12–16 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 4704–4707.
31. Mahmud, B.U.; Hong, G.Y.; Mamun, A.A.; Ping, E.P.; Wu, Q. Deep learning-based segmentation of 3D volumetric image and microstructural analysis. Sensors 2023, 23, 2640.
32. Lu, S.; Guo, J.; Zimmer-Dauphinee, J.R.; Nieusma, J.M.; Wang, X.; vanValkenburgh, P.; Wernke, S.A.; Huo, Y. Vision foundation models in remote sensing: A survey. IEEE Geosci. Remote Sens. Mag. 2025, 13, 190–215.
33. Guo, X.; Lao, J.; Dang, B.; Zhang, Y.; Yu, L.; Ru, L.; Zhong, L.; Huang, Z.; Wu, K.; Hu, D.; et al. Skysense: A multi-modal remote sensing foundation model towards universal interpretation for earth observation imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Denver, CO, USA, 16–22 June 2024; pp. 27672–27683.
34. Li, Y.; Tan, J.; Dang, B.; Ye, M.; Bartalev, S.A.; Shinkarenko, S.; Wang, L.; Zhang, Y.; Ru, L.; Guo, X.; et al. Unleashing the potential of remote sensing foundation models via bridging data and computility islands. Innovation 2025, 6, 100841.
35. Wu, K.; Zhang, Y.; Ru, L.; Dang, B.; Lao, J.; Yu, L.; Luo, J.; Zhu, Z.; Sun, Y.; Zhang, J.; et al. A semantic-enhanced multi-modal remote sensing foundation model for Earth observation. Nat. Mach. Intell. 2025, 7, 1235–1249.
36. Zhu, Q.; Lao, J.; Ji, D.; Luo, J.; Wu, K.; Zhang, Y.; Ru, L.; Wang, J.; Chen, J.; Yang, M.; et al. Skysense-o: Towards open-world remote sensing interpretation with vision-centric visual-language modeling. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 11–15 June 2025; pp. 14733–14744.
37. Luo, J.; Pang, Z.; Zhang, Y.; Wang, T.; Wang, L.; Dang, B.; Lao, J.; Wang, J.; Chen, J.; Tan, Y.; et al. Skysensegpt: A fine-grained instruction tuning dataset and model for remote sensing vision-language understanding. arXiv 2024, arXiv:2406.10100.
38. Chen, Z.; Zhao, S. Automatic monitoring of surface water dynamics using Sentinel-1 and Sentinel-2 data with Google Earth Engine. Int. J. Appl. Earth Obs. Geoinf. 2022, 113, 103010.
39. Nasiri, V.; Deljouei, A.; Moradi, F.; Sadeghi, S.M.M.; Borz, S.A. Land use and land cover mapping using Sentinel-2, Landsat-8 Satellite Images, and Google Earth Engine: A comparison of two composition methods. Remote Sens. 2022, 14, 1977.
40. Azad, R.; Asadi-Aghbolaghi, M.; Fathy, M.; Escalera, S. Bi-directional ConvLSTM U-Net with densley connected convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019.
41. Gillies, S. Rasterio Documentation; MapBox: San Francisco, CA, USA, 2019; Volume 23.
42. Kramer, O. Scikit-learn. In Machine Learning for Evolution Strategies; Springer: Berlin/Heidelberg, Germany, 2016; pp. 45–53.
43. Bisong, E. Matplotlib and seaborn. In Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners; Springer: Berlin/Heidelberg, Germany, 2019; pp. 151–165.
44. Rahman, M.A.; Wang, Y. Optimizing intersection-over-union in deep neural networks for image segmentation. In Advances in Visual Computing, Proceedings of the International Symposium on Visual Computing, Las Vegas, NV, USA, 12–14 December 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 234–244.
45. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 658–666.
46. Zhou, D.; Fang, J.; Song, X.; Guan, C.; Yin, J.; Dai, Y.; Yang, R. IoU loss for 2D/3D object detection. In Proceedings of the 2019 International Conference on 3D Vision (3DV), Québec City, QC, Canada, 15–18 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 85–94.









| Ref | Data Source | Method | Key Findings/Limitations |
|---|---|---|---|
| [12] | Sentinel-2 (time series) | Recurrent encoders | Effective for temporal dynamics, but limited spatial detail. |
| [11] | Sentinel imagery | ConvLSTM | Cloud-robust segmentation, but less efficient for spatial context. |
| [8] | Sentinel-2 | CNN | Detects artisanal mining, limited temporal modeling. |
| [7] | High-resolution imagery (Ethiopia) | CNN + DL classifiers | Good accuracy for deforestation monitoring, seasonal variation issues. |
| [18] | Multi-source imagery | Improved U-Net | Enhances segmentation accuracy with structural modifications. |
| [19] | Gaofen-2 | ConvNeXt-U (U-Net variant) | Lightweight U-Net for cropland extraction, efficient but not temporal. |
| [20] | Multi-temporal datasets | Trans-ConvLSTM | Strong spatio-temporal modeling, higher complexity. |
| [21] | Sentinel-1 & Sentinel-2 | Multimodal CNN | Improved robustness, but requires multiple sensors. |
| [22] | UAV video | ConvLSTM | Handles complex urban dynamics, lacks transferability to satellite imagery. |
| [23] | Hyperspectral images | Sparse Coding U-Net + Attention | High accuracy, computationally expensive. |
| [17] | Sentinel-2 (Colombia) | Deep CNN | Good performance in mountainous regions, not temporal. |
| [24] | RS imagery | 3D FCN + attention | Strong feature extraction, but high GPU demand. |
| [25] | Multi-scale imagery | FCN | Handles multi-scale, but ignores temporal correlations. |

| Year | <10% | <20% | <30% | <40% | <50% | <60% |
|---|---|---|---|---|---|---|
| 2018 | 0 | 0 | 2 | 2 | 3 | 3 |
| 2019 | 2 | 5 | 14 | 23 | 24 | 27 |
| 2020 | 2 | 6 | 8 | 15 | 18 | 23 |
| 2021 | 2 | 2 | 8 | 11 | 19 | 22 |
| 2022 | 0 | 2 | 4 | 8 | 14 | 16 |
| 2023 | 3 | 4 | 5 | 12 | 17 | 19 |
| 2024 | 3 | 3 | 7 | 13 | 20 | 28 |
| 2025 | 0 | 1 | 1 | 3 | 5 | 6 |
| Total | 12 | 23 | 49 | 87 | 120 | 144 |

| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual positive | TP | FN |
| Actual negative | FP | TN |
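
The four cells of the confusion matrix above are enough to derive the per-class scores reported in the results table (precision, recall, F1-score, and IoU). The following is a minimal sketch using the standard textbook definitions, not necessarily the authors' implementation. The names `BinaryCounts`, `safe_div`, and `class_metrics` are hypothetical, the pixel counts are made up for illustration, and returning 0.0 on an empty denominator is an assumed convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryCounts:
    """Confusion-matrix cells for one class treated as 'positive' (one-vs-rest)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def safe_div(num: float, den: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def class_metrics(c: BinaryCounts) -> dict:
    """Compute precision, recall, F1-score and IoU from confusion-matrix cells."""
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    # IoU (Jaccard index) ignores true negatives entirely.
    iou = safe_div(c.tp, c.tp + c.fp + c.fn)
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


# Made-up pixel counts, for illustration only.
counts = BinaryCounts(tp=80, fp=20, fn=10, tn=890)
metrics = class_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a single class, F1 and IoU are tied by IoU = F1 / (2 − F1), so IoU can never exceed F1. This gives a quick sanity check on any reported row.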

| Classes | Validation Precision | Validation Recall | Validation F1-Score | Validation IoU | Test Precision | Test Recall | Test F1-Score | Test IoU |
|---|---|---|---|---|---|---|---|---|
| U-Net | ||||||||
| Class 0 | 0.383 | 0.528 | 0.444 | 0.286 | 0.312 | 0.494 | 0.383 | 0.237 |
| Class 1 | 0.727 | 0.656 | 0.624 | 0.457 | 0.674 | 0.422 | 0.519 | 0.350 |
| Class 2 | 0.790 | 0.701 | 0.743 | 0.590 | 0.807 | 0.694 | 0.746 | 0.595 |
| Macro-Avg | 0.633 | 0.592 | 0.603 | 0.444 | 0.598 | 0.537 | 0.549 | 0.394 |
| ConvLSTM | ||||||||
| Class 0 | 0.654 | 0.736 | 0.692 | 0.530 | 0.642 | 0.741 | 0.688 | 0.524 |
| Class 1 | 0.884 | 0.677 | 0.766 | 0.622 | 0.882 | 0.666 | 0.759 | 0.612 |
| Class 2 | 0.884 | 0.881 | 0.882 | 0.790 | 0.911 | 0.901 | 0.906 | 0.828 |
| Macro-Avg | 0.807 | 0.765 | 0.780 | 0.647 | 0.812 | 0.769 | 0.784 | 0.655 |
| Hybrid ConvLSTM U-Net | ||||||||
| Class 0 | 0.816 | 0.818 | 0.817 | 0.692 | 0.808 | 0.840 | 0.824 | 0.701 |
| Class 1 | 0.973 | 0.841 | 0.902 | 0.822 | 0.980 | 0.837 | 0.903 | 0.823 |
| Class 2 | 0.927 | 0.953 | 0.940 | 0.891 | 0.948 | 0.958 | 0.953 | 0.910 |
| Macro-Avg | 0.905 | 0.871 | 0.886 | 0.801 | 0.912 | 0.878 | 0.893 | 0.811 |
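
The Macro-Avg rows in the table above appear to be unweighted means of the three per-class scores. The sketch below checks that reading against the test scores of the hybrid model, and also checks the per-class identity IoU = F1 / (2 − F1). The numbers are copied from the table. The helper names `macro_average` and `iou_from_f1` are hypothetical, and rounding to three decimals is an assumption about how the table was produced.

```python
# Per-class test scores of the hybrid model, copied from the table above.
hybrid_test = {
    "Class 0": {"precision": 0.808, "recall": 0.840, "f1": 0.824, "iou": 0.701},
    "Class 1": {"precision": 0.980, "recall": 0.837, "f1": 0.903, "iou": 0.823},
    "Class 2": {"precision": 0.948, "recall": 0.958, "f1": 0.953, "iou": 0.910},
}


def macro_average(scores: dict, metric: str) -> float:
    """Unweighted mean of one metric over all classes."""
    values = [row[metric] for row in scores.values()]
    return sum(values) / len(values)


def iou_from_f1(f1: float) -> float:
    """Per-class identity between the Dice/F1 score and the Jaccard/IoU score."""
    return f1 / (2.0 - f1)


# Macro averages, rounded to three decimals as in the table.
macro = {
    metric: round(macro_average(hybrid_test, metric), 3)
    for metric in ("precision", "recall", "f1", "iou")
}
print(macro)

# IoU recomputed from each reported F1-score.
recomputed_iou = {
    name: round(iou_from_f1(row["f1"]), 3) for name, row in hybrid_test.items()
}
print(recomputed_iou)
```

Both checks reproduce the reported hybrid-model test values to three decimals, which supports reading the Macro-Avg rows as plain class means.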
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Belinga, A.G.; Tékouabou Koumetio, S.C.; El Haziti, M. Hybrid ConvLSTM U-Net Deep Neural Network for Land Use and Land Cover Classification from Multi-Temporal Sentinel-2 Images: Application to Yaoundé, Cameroon. Math. Comput. Appl. 2026, 31, 18. https://doi.org/10.3390/mca31010018

