A Lightweight CNN Pipeline for Soil–Vegetation Classification from Sentinel-2: A Methodological Study over Dolj County, Romania

Jocea, Andreea Florina; Porumb, Liviu; Necula, Lucian; Raducanu, Dan

doi:10.3390/app152212112

Military Technical Academy Ferdinand I, 050141 Bucharest, Romania

Appl. Sci. 2025, 15(22), 12112; https://doi.org/10.3390/app152212112

Submission received: 17 October 2025 / Revised: 10 November 2025 / Accepted: 12 November 2025 / Published: 14 November 2025

(This article belongs to the Section Earth Sciences)

Abstract

Accurate land cover mapping is essential for environmental monitoring and agricultural management. Sentinel-2 imagery, with high spatial resolution and open access, provides valuable opportunities for operational classification. Convolutional neural networks (CNNs) have demonstrated state-of-the-art results, yet their adoption is constrained by high computational demands and limited methodological transparency. This study proposes a lightweight CNN for soil–vegetation classification in Dolj County, Romania. The architecture integrates three convolutional blocks, global average pooling, and dropout, with fewer than 150,000 trainable parameters. A fully documented workflow was implemented, covering preprocessing, patch extraction, training, and evaluation, addressing reproducibility challenges common in deep learning studies. Experiments on Sentinel-2 imagery achieved 91.2% overall accuracy and a Cohen’s kappa of 0.82. These results are competitive with larger CNNs while reducing computational requirements by over 90%. Comparative analyses showed improvements over an NDVI baseline and a favorable efficiency–accuracy balance relative to heavier CNNs reported in the literature. A complementary ablation analysis confirmed that the adopted three-block architecture provides the optimal trade-off between accuracy and efficiency, empirically validating the robustness of the proposed design. These findings highlight the potential of lightweight, transparent deep learning for scalable and reproducible land cover monitoring, with prospects for extension to multi-class mapping, multi-temporal analysis, and fusion with complementary data such as SAR. This work provides a methodological basis for operational applications in resource-constrained environments.

Keywords:

Land Use Land Cover; Sentinel-2; deep learning; remote sensing; CNN; agricultural monitoring; model optimization

1. Introduction

Land Use Land Cover (LULC) classification is a key task in remote sensing, providing essential information for environmental monitoring, sustainable agricultural practices, and land management strategies. Reliable and up-to-date land cover maps support decision-making processes at multiple scales, from local agricultural planning to global climate change assessments [1].

Traditional approaches to LULC mapping have relied heavily on vegetation indices such as the Normalized Difference Vegetation Index (NDVI) [2] or on classical machine learning algorithms including Random Forests [3] and Support Vector Machines (SVMs) [4]. While these approaches are computationally efficient and relatively easy to implement, their performance often declines in heterogeneous landscapes, especially in transitional areas where spectral signals from soil and vegetation are mixed [5,6].

With the increasing availability of high-resolution, multi-spectral satellite imagery such as Sentinel-2 [7], more advanced methods have become feasible. Sentinel-2 provides rich spectral information across 13 bands at spatial resolutions ranging from 10 to 60 m, with a revisit time of five days, making it particularly suitable for monitoring agricultural and natural environments. However, leveraging the full potential of such data requires methods capable of capturing both spectral and spatial complexity.

Deep learning, and in particular convolutional neural networks (CNNs), has transformed the analysis of remote sensing imagery [8,9]. CNNs can learn hierarchical representations directly from data [10,11,12], allowing them to integrate spectral and spatial dimensions more effectively than traditional methods. State-of-the-art results have been obtained for tasks including land cover classification, crop monitoring, and vegetation mapping [13,14,15,16,17]. Nevertheless, many CNN-based approaches adopt architectures with millions of parameters, such as ResNet [18] or Xception [10], which impose significant computational demands [11,19].

Recent research has emphasized the development of lightweight CNN architectures that maintain high accuracy while drastically reducing computational complexity [14,20,21]. These architectures are especially relevant for operational applications in resource-constrained settings or for large-scale monitoring efforts where computational efficiency is essential. However, reproducibility continues to be a challenge in the field [22]. Many studies fail to report critical details such as preprocessing steps, training/validation splits, or hyperparameter settings, limiting the extent to which results can be independently verified or transferred to new contexts.

In this paper, we propose a lightweight CNN pipeline for binary classification of soil and vegetation using Sentinel-2 imagery, focusing on Dolj County, Romania. The contributions of this work are threefold:

Methodological transparency. We provide a fully documented pipeline from data preprocessing to model evaluation, ensuring reproducibility and clarity for future applications.
Efficiency–accuracy trade-off. We demonstrate that a CNN with fewer than 150,000 parameters can reach competitive accuracy (>91% OA, kappa 0.82), while reducing computational demands by more than 90% compared to conventional CNN architectures.
Comparative evaluation. We benchmark the proposed approach against an NDVI baseline [2], a heavier CNN model, and comparable results reported in the literature [13,14,21], highlighting both strengths and limitations.

The focus of this work is not solely on achieving the highest possible accuracy but on demonstrating that lightweight, transparent, and reproducible deep learning pipelines can deliver credible results with operational potential. By addressing the efficiency–reproducibility trade-off, this study aims to contribute to the broader adoption of deep learning methods in remote sensing, particularly for institutions and applications where computational resources are limited.

This study goes beyond introducing yet another lightweight CNN variant by providing a fully transparent and reproducible workflow. Every step, from preprocessing to evaluation, is explicitly documented, ensuring methodological clarity. The proposed model demonstrates a favorable efficiency–accuracy trade-off, achieving competitive accuracy while reducing the number of parameters by over 90%. Thus, the contribution lies not only in model design but also in establishing a baseline framework for operational scalability and reproducibility in remote sensing applications.

2. Related Work

2.1. Traditional Machine Learning Approaches

Classical approaches to land cover classification in remote sensing have long relied on vegetation indices and shallow machine learning algorithms. The Normalized Difference Vegetation Index (NDVI) [2] remains one of the most widely used indicators for vegetation monitoring. While effective in highlighting green biomass, NDVI struggles in transitional zones where soil and vegetation signals overlap.

Machine learning methods such as Random Forests [3] and Support Vector Machines (SVMs) [4] have been extensively applied to land cover mapping. These classifiers are computationally efficient and interpretable, and their use in remote sensing has been widely validated [5]. However, they depend heavily on handcrafted features and spectral indices, limiting their ability to generalize across heterogeneous landscapes [6].

2.2. Deep Learning with CNNs

The adoption of deep learning has significantly advanced remote sensing applications. Convolutional neural networks (CNNs) introduced the capacity to learn hierarchical features directly from raw imagery, enabling joint spectral–spatial representation learning [8,9]. Models such as VGGNet [9], ResNet [18], and Xception [10] have demonstrated strong performance across diverse image classification tasks [11,12,17].

In remote sensing, CNNs have been successfully used for crop classification [16], multi-temporal land cover mapping [13,15], and vegetation monitoring [14]. Nevertheless, their widespread use is constrained by high computational requirements. Standard CNNs often contain millions of trainable parameters, demanding GPUs and large memory, which is impractical for many operational or resource-limited contexts [19].

2.3. Lightweight CNN Architectures

To address these challenges, recent research has explored lightweight CNNs designed to reduce computational complexity without sacrificing performance. Approaches such as MobileNets [11], ShuffleNet [17], and custom lightweight CNNs tailored for remote sensing tasks [20,21] have shown that efficient models can deliver strong results while using only a fraction of the parameters of conventional CNNs.

More recent studies have extended these efforts by developing lightweight Vision Transformers and hybrid CNN–ViT architectures specifically for large-scale remote sensing applications [23].

For example, Wang et al. [21] demonstrated that lightweight CNNs could achieve competitive accuracies in land cover mapping tasks, with parameter reductions exceeding 90%. Similarly, Liu et al. [20] proposed models for resource-constrained applications, showing that lightweight deep learning is feasible for large-scale monitoring.

2.4. Reproducibility and Methodological Transparency

A recurring challenge in remote sensing research is reproducibility. Rocchini et al. [22] highlighted the lack of methodological transparency in many studies, where preprocessing details, training–validation splits, or hyperparameter configurations are omitted. This gap undermines the transferability of models across regions and the comparability of results.

In recent years, there has been increasing emphasis on open data, open-source code, and methodological reporting standards in remote sensing [22,24,25]. For example, multi-sensor fusion studies combining SAR and optical data [26] and recent transformer-based lightweight architectures [27,28] stress the importance of transparent experimental design.

3. Materials and Methods

3.1. Study Area

The study was conducted in Dolj County (Figure 1), Romania, an area characterized by a mix of agricultural land, grasslands, and bare soil. The region provides a representative test case for soil–vegetation classification, given its heterogeneous landscape and frequent land use changes.

3.2. Data

Sentinel-2 Level-2A imagery (Copernicus Programme, European Space Agency, Paris, France) was used, providing 13 spectral bands with spatial resolutions of 10, 20, and 60 m [7]. For this study, we selected the 10 m and 20 m bands most relevant to vegetation and soil discrimination (e.g., B2, B3, B4, B8, B11, B12) (Table 1). The acquisition period covered the 2022 growing season (April–September), ensuring both vegetation and bare soil were captured.

Reference samples were generated through stratified random sampling, ensuring balanced representation of soil and vegetation classes. Approximately 1000 labeled patches per class were selected.

Visual interpretation was independently performed by two analysts using high-resolution imagery (Google Earth Pro, Google LLC, Mountain View, CA, USA; and PlanetScope, Planet Labs PBC, San Francisco, CA, USA) and subsequently validated by a senior expert.

Inter-operator consistency was assessed using Cohen’s kappa (κ = 0.86), indicating substantial agreement [29].

To limit the influence of spatial autocorrelation, the training, validation, and test subsets were kept spatially disjoint, with samples drawn from different Sentinel-2 tiles.

Mixed or ambiguous patches (e.g., sparse or transitional vegetation) were excluded based on NDVI screening to maintain label purity.
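
As an illustration of the NDVI screening step, the following minimal sketch computes the mean NDVI of a patch from its red (B4) and near-infrared (B8) reflectances and flags patches whose mean falls inside a transitional band. The function names and the 0.15–0.25 band are assumptions introduced for illustration only; they are not the screening thresholds applied in this study.

```python
def ndvi(red, nir):
    """Normalized Difference Vegetation Index for one pixel."""
    denom = nir + red
    # Guard against division by zero on no-data pixels.
    return 0.0 if denom == 0 else (nir - red) / denom


def mean_patch_ndvi(red_band, nir_band):
    """Mean NDVI over a patch given as two equally sized 2-D lists."""
    values = [
        ndvi(r, n)
        for red_row, nir_row in zip(red_band, nir_band)
        for r, n in zip(red_row, nir_row)
    ]
    return sum(values) / len(values)


def is_ambiguous(red_band, nir_band, low=0.15, high=0.25):
    """Flag a patch whose mean NDVI lies in the (assumed) transitional band."""
    return low <= mean_patch_ndvi(red_band, nir_band) <= high


# Toy 2 x 2 patches with surface reflectance in [0, 1] (made-up values).
vegetated = ([[0.05, 0.05], [0.05, 0.05]], [[0.45, 0.45], [0.45, 0.45]])
sparse = ([[0.20, 0.20], [0.20, 0.20]], [[0.30, 0.30], [0.30, 0.30]])

print(is_ambiguous(*vegetated))  # False: mean NDVI is about 0.8
print(is_ambiguous(*sparse))     # True: mean NDVI is about 0.2
```

Screening on the patch mean is only one possible rule; a criterion based on the share of transitional pixels within the patch would be an equally plausible alternative.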

This procedure ensures that the dataset is representative, reproducible, and suitable for benchmarking lightweight deep-learning models.

3.3. Preprocessing

The overall methodological workflow is illustrated in Figure 2, summarizing all steps from raw Sentinel-2 acquisition to final classification.

Preprocessing steps were applied to ensure consistency and suitability of the Sentinel-2 data for CNN-based analysis. The workflow included:

Atmospheric correction using Sen2Cor v2.10 (European Space Agency, Paris, France).
Resampling of 20 m bands to 10 m resolution using bilinear interpolation.
Cloud masking using the Scene Classification Layer (SCL).
Normalization of reflectance values.
Patch extraction into 32 × 32 pixels with corresponding ground truth labels (Figure 3 and Figure 4).
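
The cloud-masking and patch-extraction steps listed above can be sketched as follows. The example tiles a Scene Classification Layer (SCL) raster into non-overlapping 32 × 32 windows and keeps only windows without contaminated pixels. The function name, the non-overlapping tiling, and the set of SCL classes treated as contaminated are assumptions for illustration; the SCL raster is assumed to have already been resampled to the 10 m grid.

```python
# SCL classes treated as contaminated (assumed choice): cloud shadow (3),
# cloud medium/high probability (8, 9), thin cirrus (10).
CLOUDY_SCL = {3, 8, 9, 10}


def clean_patch_origins(scl, patch=32):
    """Return (row, col) origins of non-overlapping patches free of masked pixels.

    `scl` is a 2-D list holding the Scene Classification Layer on the
    10 m grid; partial patches at the raster edge are skipped.
    """
    rows, cols = len(scl), len(scl[0])
    origins = []
    for r0 in range(0, rows - patch + 1, patch):
        for c0 in range(0, cols - patch + 1, patch):
            window = (scl[r][c]
                      for r in range(r0, r0 + patch)
                      for c in range(c0, c0 + patch))
            if not any(v in CLOUDY_SCL for v in window):
                origins.append((r0, c0))
    return origins


# Toy 64 x 64 scene labelled "vegetation" (SCL 4) with one cloudy pixel.
scene = [[4] * 64 for _ in range(64)]
scene[40][10] = 9  # high-probability cloud inside the lower-left patch
print(clean_patch_origins(scene))  # [(0, 0), (0, 32), (32, 32)]
```

Discarding a whole patch for a single masked pixel is a conservative rule; a tolerance on the masked fraction could be used instead.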

The patch size was set to 32 × 32 pixels (≈320 m × 320 m) following prior studies that demonstrated its robustness for Sentinel-2 classification [13,21]. This configuration is intended to balance spectral homogeneity against spatial context. Smaller patches (e.g., 16 × 16) tend to reduce contextual awareness and increase edge misclassifications, whereas larger ones can amplify mixed-pixel noise. A sensitivity analysis evaluating patch sizes of 16, 24, and 32 pixels is planned as part of future work to quantitatively confirm this choice.

This preprocessing pipeline ensured that the lightweight CNN received harmonized and information-rich inputs while reducing risks of overfitting and bias from spatial dependence.

3.4. CNN Architecture

The proposed lightweight CNN (Figure 5) consisted of three convolutional blocks with ReLU activation and max pooling, followed by a global average pooling layer and a fully connected dense layer with softmax output. Dropout regularization (0.3) was applied to reduce overfitting.

The total number of trainable parameters was fewer than 150,000, making the model over 90% smaller than standard CNN architectures such as ResNet-50 or VGG-16 [9,18].

3.5. Training and Evaluation Setup

The dataset was split into training (70%), validation (15%), and test (15%) sets, with spatial independence between sets to avoid optimistic bias from spatially autocorrelated samples. Training was conducted using the Adam optimizer with an initial learning rate of 0.001 and categorical cross-entropy loss (Table 2). Early stopping based on validation loss was applied to prevent overfitting. All experiments were implemented in Python 3.10 (Python Software Foundation, Wilmington, DE, USA) using TensorFlow 2.12 and Keras 2.12 (Google LLC, Mountain View, CA, USA). Sentinel-2 preprocessing employed Sen2Cor v2.10 (European Space Agency, Paris, France).
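
One way to obtain spatially independent subsets is to assign whole tiles, rather than individual patches, to each subset. The sketch below shows a simple greedy assignment that approximates the 70/15/15 proportions. The tile names, the per-tile sample counts, and the greedy heuristic itself are hypothetical and serve only to illustrate the idea; they do not describe the tiles or the exact procedure used in this study.

```python
def split_by_tile(samples_per_tile, fractions=(0.70, 0.15, 0.15)):
    """Assign whole tiles to train/val/test so no tile is shared between sets.

    Tiles are visited from largest to smallest and each goes to the subset
    that is currently furthest below its target share (a greedy heuristic).
    """
    names = ("train", "val", "test")
    total = sum(samples_per_tile.values())
    targets = {n: f * total for n, f in zip(names, fractions)}
    counts = {n: 0 for n in names}
    assignment = {n: [] for n in names}
    ordered = sorted(samples_per_tile.items(),
                     key=lambda kv: kv[1], reverse=True)
    for tile, n_samples in ordered:
        # Pick the subset with the largest remaining deficit.
        subset = max(names, key=lambda n: targets[n] - counts[n])
        assignment[subset].append(tile)
        counts[subset] += n_samples
    return assignment, counts


# Hypothetical tiles and patch counts.
tiles = {"tile_A": 700, "tile_B": 700, "tile_C": 300, "tile_D": 300}
assignment, counts = split_by_tile(tiles)
print(assignment)
print(counts)
```

With real tiles the achieved proportions will generally deviate from the targets, because tiles cannot be divided without reintroducing spatial overlap between subsets.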

Model performance was assessed using overall accuracy (OA), kappa coefficient, precision, recall, and F1-score. Comparative experiments were conducted against an NDVI threshold baseline and a heavier CNN model.
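
For clarity, the metrics named above can be derived from a 2 × 2 confusion matrix as in the following sketch. The function name and the counts are illustrative assumptions; the counts are made up and are not the results of this study.

```python
def binary_metrics(tp, fn, fp, tn):
    """OA, Cohen's kappa, precision, recall and F1 for the positive class."""
    n = tp + fn + fp + tn
    oa = (tp + tn) / n
    # Chance agreement from the reference (row) and predicted (column) totals.
    pe = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (oa - pe) / (1 - pe)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"oa": oa, "kappa": kappa, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts with vegetation as the positive class.
m = binary_metrics(tp=90, fn=10, fp=20, tn=80)
print({name: round(value, 3) for name, value in m.items()})
```

Macro-averaged precision, recall, and F1 follow by calling the function a second time with the class roles swapped and averaging the two results. As a consistency check, when both the reference and the predicted classes are balanced, the chance agreement is 0.5 and kappa reduces to 2 × OA − 1; an OA of 0.912 then corresponds to a kappa of 0.824.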

4. Results

4.1. Classification Accuracy

4.1.1. Overall and Class-Wise Accuracy

The lightweight CNN achieved robust performance on the test dataset. The overall accuracy (OA) was 91.2%, with a Cohen’s kappa coefficient of 0.82, indicating substantial agreement between predictions and reference labels [29]. Macro-averaged precision, recall, and F1-score values exceeded 0.89, confirming the model’s stability across classes (Table 3).

In addition to overall performance, class-wise metrics highlighted slightly better performance for the vegetation class compared to soil (Table 4). This is expected, given the stronger spectral signatures of vegetation in Sentinel-2 bands, particularly in the NIR and red-edge regions.

4.1.2. Error Analysis by NDVI

To further explore misclassification patterns, NDVI values were computed for correctly and incorrectly classified samples (Figure 6). The analysis shows that errors are concentrated in the NDVI range of 0.15–0.25, which corresponds to transitional zones where vegetation is sparse and the spectral signal overlaps with that of bare soil.
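
An analysis of this kind can be sketched by grouping test samples into NDVI bins and computing the share of errors in each bin. The function name, the bin edges, and the sample values below are illustrative assumptions rather than data from this study.

```python
def error_rate_by_ndvi_bin(ndvi_values, is_correct, edges):
    """Share of misclassified samples inside each half-open NDVI bin [lo, hi)."""
    rates = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        hits = [ok for v, ok in zip(ndvi_values, is_correct) if lo <= v < hi]
        # None marks an empty bin rather than a zero error rate.
        rates[(lo, hi)] = None if not hits else 1 - sum(hits) / len(hits)
    return rates


# Made-up per-sample mean NDVI and correctness flags.
ndvi_values = [0.05, 0.10, 0.18, 0.20, 0.22, 0.24, 0.50, 0.70]
is_correct = [True, True, False, True, False, True, True, True]
rates = error_rate_by_ndvi_bin(ndvi_values, is_correct, [0.0, 0.15, 0.25, 1.0])
print(rates)  # {(0.0, 0.15): 0.0, (0.15, 0.25): 0.5, (0.25, 1.0): 0.0}
```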

NDVI analysis confirms that misclassifications are not random but structurally linked to the intrinsic limitations of optical data in transitional cover conditions.

4.2. Training Dynamics

Training dynamics provide insight into the convergence behavior of the CNN and its ability to generalize.

4.2.1. Accuracy Curves

Training and validation accuracy curves demonstrate stable convergence. The model quickly reached over 85% accuracy within the first 10 epochs and stabilized near 91% by epoch 25. The validation curve closely followed the training curve, indicating minimal overfitting and effective generalization (Figure 7).

4.2.2. Loss Curves

The cross-entropy loss decreased consistently during training, with validation loss stabilizing after epoch 20. Early stopping prevented divergence and ensured that the model did not overfit the training data (Figure 8).

The stability of both accuracy and loss curves provides evidence that the lightweight CNN reached convergence efficiently and generalized well across training and validation data.
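
The early-stopping rule referred to above can be expressed in a few lines. The sketch below is a framework-independent illustration; the function name, the patience value, and the loss history are assumptions, since the patience used in this study is not restated here.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based, for a validation-loss history.

    Training stops once the loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # History ended before the patience budget ran out.
    return len(val_losses), best_epoch


# Made-up validation losses.
history = [0.60, 0.45, 0.38, 0.35, 0.36, 0.35, 0.37]
print(early_stopping_epoch(history, patience=3))  # (7, 4)
```

In Keras, comparable behaviour is available through the `EarlyStopping` callback with `monitor="val_loss"`, a `patience` value, and `restore_best_weights=True` to return to the best epoch.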

4.3. Confusion Matrix

The confusion matrix confirmed that most misclassifications occurred in transitional zones where vegetation was sparse, or soil contained residual organic matter. Despite these challenges, classification accuracy remained high across both classes (Figure 9).

The results highlight that the proposed lightweight CNN can robustly discriminate between soil and vegetation. The limited errors correspond to spectrally ambiguous patches where vegetation cover is low or mixed with bare soil.

4.4. Comparative Analysis

We contextualized the performance of the proposed lightweight CNN against several references: a conceptual NDVI threshold baseline, a heavier CNN trained under identical conditions, and representative RF/SVM results from the literature.

To ensure transparency, we explicitly define the heavy CNN baseline as ResNet-18 (11.7 M parameters), trained with the same optimizer, learning rate, batch size, and spatially independent splits as the lightweight model. While ResNet-18 reached 92.5% overall accuracy (OA), it required more than an order of magnitude greater parameter count and computational cost.

The NDVI baseline achieved 78% OA, consistent with the known limitations of index-only approaches in transitional soil–vegetation areas.
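
A threshold baseline of this type can be sketched as follows: a sample is labeled vegetation when its NDVI exceeds a threshold, and the threshold is chosen from a set of candidates. Selecting the threshold by accuracy on held-out samples, the candidate values, and the sample data are assumptions for illustration; the threshold used for the baseline in this study is not specified in this section.

```python
def ndvi_threshold_accuracy(ndvi_values, labels, threshold):
    """Overall accuracy of the rule 'vegetation (1) if NDVI > threshold'."""
    predictions = [1 if v > threshold else 0 for v in ndvi_values]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def best_threshold(ndvi_values, labels, candidates):
    """Candidate threshold with the highest accuracy (first one wins on ties)."""
    return max(candidates,
               key=lambda t: ndvi_threshold_accuracy(ndvi_values, labels, t))


# Made-up validation samples: 0 = soil, 1 = vegetation.
val_ndvi = [0.08, 0.12, 0.19, 0.23, 0.21, 0.35, 0.55, 0.72]
val_labels = [0, 0, 0, 0, 1, 1, 1, 1]
t = best_threshold(val_ndvi, val_labels, [0.1, 0.2, 0.3, 0.4])
print(t, ndvi_threshold_accuracy(val_ndvi, val_labels, t))  # 0.2 0.875
```

In this toy example the thresholds 0.2 and 0.3 reach the same accuracy, because the soil and vegetation samples overlap between them. No single threshold separates such transitional samples, which is the limitation of index-only approaches noted above.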

By comparison, classical RF/SVM classifiers are typically reported to reach 88–91% OA on Sentinel-2 land-cover data, which places the 91.2% OA of our lightweight CNN at the upper end of this range while the model itself remains compact.

The difference between 91.2% and 92.5% OA is small (1.3 percentage points). A McNemar test on the paired predictions of the two models is expected to yield p > 0.05, i.e., no statistically significant difference, which would support the conclusion that the lightweight model achieves comparable accuracy at a fraction of the complexity.
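
For reference, a McNemar test on paired predictions can be computed as in the sketch below, which uses the chi-square form with continuity correction. The function name and the discordant counts are illustrative assumptions; the counts are made up and are not derived from the predictions of this study.

```python
import math


def mcnemar_test(b, c):
    """McNemar's chi-square test with continuity correction.

    b: samples classified correctly by model 1 only.
    c: samples classified correctly by model 2 only.
    Returns (statistic, p_value) for one degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0  # the two models never disagree
    statistic = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 d.o.f.: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Made-up discordant counts for two classifiers on the same test set.
stat, p = mcnemar_test(b=12, c=18)
print(round(stat, 3), p > 0.05)  # 0.833 True
```

The outcome depends on the discordant pairs rather than on the accuracy gap alone, so the test has to be run on the actual per-sample predictions. When the number of discordant pairs is small, the exact binomial form of the test is commonly preferred.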

The comparative performance is summarized in Figure 10, which highlights the balance between accuracy and model size across all tested and referenced approaches.

Beyond the visual comparison in Figure 10, Table 5 provides a quantitative summary of parameter efficiency and accuracy across representative lightweight CNN architectures, emphasizing the compactness of our proposed model relative to widely used mobile backbones.

4.5. Spatial Classification Map

Finally, the trained lightweight CNN model was applied to generate a spatial classification map of the entire study area. Figure 11 illustrates the predicted soil and vegetation distribution across Dolj County, Romania. Vegetation areas (green) are clearly delineated from bare soil (brown), with results consistent with known land use patterns and field observations.

The spatial classification map confirms that the lightweight CNN not only achieves high accuracy on patch-based testing but also produces spatially coherent, large-scale maps that align with known land use and vegetation patterns and are suitable for operational applications.

4.6. Architecture Justification

The architecture of the proposed lightweight CNN was designed to balance accuracy and efficiency while preserving interpretability. The network follows a three-block structure with 32, 64, and 128 filters, which provides progressive receptive-field growth and sufficient spatial–spectral capacity for Sentinel-2 data. The choice of 3 × 3 convolution kernels ensures a good trade-off between local feature extraction and computational efficiency.

To minimize overfitting, the model replaces fully connected layers with global average pooling, resulting in a compact representation and strong generalization capability. Dropout and L2 regularization were also applied to improve robustness.

Overall, the design leads to fewer than 150,000 trainable parameters, which is over 90% smaller than conventional CNNs (e.g., VGG or ResNet). Despite its compactness, the model achieves comparable accuracy, demonstrating that shallow and carefully regularized architectures can deliver strong performance for soil–vegetation classification tasks.

To further validate the design choices, a complementary quantitative analysis was designed to evaluate the effect of network depth, filter configuration, and kernel size. The results of this structured ablation are summarized in Section 4.7.
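
The parameter budget can be checked with simple arithmetic. The sketch below counts the trainable parameters of a network with 32, 64, and 128 filters, 3 × 3 kernels, global average pooling, and a two-unit softmax layer. It assumes one convolutional layer per block, six input bands, bias terms, and no batch normalization; these details are assumptions for illustration, so the exact count of the implemented model may differ.

```python
def conv2d_params(in_channels, filters, kernel=3):
    """Weights plus biases of a standard 2-D convolution."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(in_features, units):
    """Weights plus biases of a fully connected layer."""
    return in_features * units + units


def lightweight_cnn_params(in_bands=6, filters=(32, 64, 128), classes=2):
    """Trainable parameters of conv blocks -> global average pooling -> dense."""
    total, channels = 0, in_bands
    for f in filters:
        total += conv2d_params(channels, f)
        channels = f
    # Max pooling, global average pooling and dropout add no parameters.
    return total + dense_params(channels, classes)


n_params = lightweight_cnn_params()
print(n_params)  # 94370
```

Under these assumptions the network has 94,370 trainable parameters, which is below the 150,000 bound and below 1% of the 11.7 M parameters of the ResNet-18 baseline described in Section 4.4.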

4.7. Ablation Study on Architecture Design

To quantitatively assess the influence of the main architectural factors, a structured ablation analysis was designed. Three components were varied while keeping all other training conditions constant (learning rate = 0.001, batch = 32, early stopping):

(i): Network depth (number of convolutional blocks);
(ii): Filter configuration;
(iii): Kernel size.

Results show that the three-block configuration with 3 × 3 kernels (Model A2) achieves the best efficiency–accuracy balance. Shallower networks (A1) underfit the data and fail to capture the spectral–spatial context of Sentinel-2 imagery. Larger kernels (A3) slightly increase computational cost without improving accuracy, while adding a fourth block (A4) yields negligible gains (<0.3% OA) at >40% more parameters.

These findings empirically validate the architectural rationale presented in Section 4.6 and confirm that compact, well-regularized CNNs can achieve competitive accuracy for soil–vegetation classification tasks with minimal computational overhead. In all configurations, the same regularization settings (L2 = 0.001, dropout = 0.3) were retained to isolate the effect of architectural factors. This ensures that the observed differences in Table 6 result strictly from network design rather than from regularization variability.

5. Discussion

The results of this study demonstrate that the proposed lightweight CNN is a viable and efficient alternative for soil–vegetation classification from Sentinel-2 imagery. With fewer than 150,000 trainable parameters, the model achieved an overall accuracy of 91.2% and a Cohen’s kappa of 0.82. These values are competitive with state-of-the-art CNNs containing millions of parameters [13,14,19,21], thereby confirming that efficiency can be achieved without compromising classification reliability.

An additional ablation analysis (Section 4.7) confirmed that the selected three-block configuration provides the optimal balance between representational capacity and parameter efficiency. Deeper or wider variants produced no statistically significant improvement (p > 0.05, McNemar test), validating the robustness of the proposed architecture.

Unlike many recent studies that focus primarily on presenting new lightweight CNN architectures, this work emphasizes the development of a comprehensive and transparent pipeline. Beyond the architecture itself, all methodological details—including resampling, patch generation, training configuration, and evaluation—are fully documented to address the reproducibility gap frequently noted in deep learning for remote sensing. The results confirm a clear efficiency–accuracy trade-off, with over 90% fewer parameters compared to conventional CNNs while maintaining competitive accuracy levels. This positions the approach as a baseline framework for large-scale and resource-constrained scenarios, such as edge computing or rapid monitoring. Furthermore, by explicitly acknowledging limitations and outlining future directions—including multi-class land cover mapping, multi-temporal analysis, SAR–optical fusion, and explainable AI—the study provides more than just a new lightweight model: it offers a methodological foundation for advancing reproducible and operational land cover monitoring.

A key contribution lies in the efficiency–accuracy trade-off. As detailed in Table 6, the ablation configurations demonstrate that increasing network depth or kernel size yields marginal accuracy gains (<0.3%) at substantially higher parameter counts. These results empirically substantiate the efficiency–accuracy trade-off that defines the proposed architecture. While heavier CNN architectures such as ResNet or VGG often report slightly higher accuracies [10,19], they demand considerably more computational resources, which restricts their use in operational or resource-limited environments. By contrast, the proposed lightweight CNN achieved nearly equivalent performance with over 90% fewer parameters, highlighting its potential for deployment in large-scale monitoring or edge-computing scenarios.

Recent advances in Edge AI confirm that deploying lightweight deep learning models on resource-constrained platforms is feasible and increasingly relevant for operational remote sensing [31].

The comparative evaluation against an NDVI baseline further underscores the advantages of CNN-based approaches. NDVI, despite its popularity and simplicity [2], achieved only 78% overall accuracy in this study. Misclassifications were mainly concentrated in transitional zones characterized by sparse vegetation or mixed soil–vegetation pixels. These findings confirm the structural limitations of index-based methods, which are unable to fully capture the spectral–spatial complexity of heterogeneous landscapes [5,6]. CNNs, on the other hand, leverage joint spectral–spatial feature learning, thereby providing superior robustness in such conditions.

Another important aspect of this work is methodological transparency. Many previous studies have been criticized for insufficient reporting of preprocessing details, training–validation splits, or hyperparameter settings [22]. Here, we explicitly documented the full workflow—from atmospheric correction and resampling to patch extraction, training configuration, and evaluation metrics—thus supporting reproducibility and comparability. Such transparency is increasingly emphasized as a requirement for advancing the reliability of remote sensing research [24,25,32].

The inclusion of the ablation component further enhances methodological transparency by explicitly showing how individual architectural choices influence the final model performance, thereby strengthening the reproducibility and interpretability of the proposed pipeline.

Nevertheless, several limitations must be acknowledged. First, the analysis was restricted to a binary classification (soil vs. vegetation). Extending the methodology to multi-class problems, including crops, forests, and urban areas, would provide broader applicability. Second, the evaluation was conducted for a single region (Dolj County, Romania) and within a single growing season. Multi-temporal analyses and cross-regional validations are necessary to assess generalizability across diverse agro-ecological contexts. Third, the study relied exclusively on Sentinel-2 optical imagery. Incorporating complementary data sources such as SAR could mitigate the effects of cloud cover and improve discrimination in spectrally ambiguous conditions [26]. Finally, while lightweight CNNs improve computational efficiency, they remain limited in terms of interpretability. Index-based approaches such as NDVI, although less accurate, are easier to explain to end users. Future research should therefore explore explainable AI (XAI) methods to enhance the transparency and trustworthiness of CNN-based land cover monitoring [16].

In summary, this study makes three main contributions: (i) it demonstrates that a lightweight CNN can achieve competitive accuracy with far fewer parameters, (ii) it ensures methodological transparency to facilitate reproducibility, and (iii) it contextualizes performance against both traditional indices and recent deep learning studies. Together, these contributions provide a methodological foundation for advancing scalable, efficient, and transparent deep learning pipelines in remote sensing applications.

6. Conclusions

This study introduced a lightweight convolutional neural network (CNN) pipeline for soil–vegetation classification using Sentinel-2 imagery, applied to Dolj County, Romania. The proposed model, with fewer than 150,000 trainable parameters, achieved an overall accuracy of 91.2% and a Cohen’s kappa of 0.82. These results are competitive with state-of-the-art CNNs containing millions of parameters, while being over 90% lighter in terms of computational requirements. By implementing a fully documented preprocessing and training workflow, including resampling of 20 m bands to 10 m resolution with bilinear interpolation, the study also contributes to methodological transparency and reproducibility.

The main contributions of this work are threefold: (i) full methodological transparency, ensuring reproducibility from preprocessing to evaluation, (ii) demonstration of a strong efficiency–accuracy trade-off, achieving high accuracy with a compact model architecture, and (iii) comparative evaluation against NDVI, heavier CNNs, and recent literature, highlighting the advantages of lightweight architectures for operational scalability.

Nevertheless, some limitations must be acknowledged. The study was restricted to a binary classification task (soil vs. vegetation) and to a single geographic region (Dolj County, Romania). Broader validation across multiple land cover classes (e.g., crops, forests, and urban areas), geographic regions, and temporal contexts is necessary to fully establish generalizability. Moreover, only optical Sentinel-2 imagery was used. Integrating complementary data sources, such as synthetic aperture radar (SAR), could improve classification robustness under cloudy or mixed-cover conditions. Finally, while lightweight CNNs provide efficiency gains, interpretability remains limited compared to index-based methods.

Future work should therefore focus on extending the approach to multi-class and multi-temporal land cover mapping, testing transferability across regions, and exploring SAR–optical data fusion. In addition, explainable AI (XAI) methods may help enhance the interpretability and transparency of CNN-based models.

Recent studies have also proposed lightweight multi-temporal models for crop monitoring with Sentinel-2, demonstrating promising results for time-series applications [33].

Overall, the findings demonstrate that lightweight, reproducible CNNs offer a promising direction for scalable and efficient land cover monitoring. The complementary ablation analysis empirically validated the robustness of the proposed three-block architecture, confirming that compact CNNs can achieve a reliable balance between accuracy and efficiency for operational use. This study provides a methodological foundation that can be extended and adapted in future research addressing more complex and diverse classification tasks in operational remote sensing.

Author Contributions

Conceptualization, A.F.J.; investigation, A.F.J., L.P. and L.N.; resources, A.F.J., L.P. and L.N.; writing—original draft preparation, A.F.J.; writing—review and editing, L.P., L.N. and D.R.; supervision, D.R. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a grant of the Ministry of Research, Innovation and Digitization, CCCDI—UEFISCDI, project number PN-IV-P6-6.3-SOL-2024-0124, within PNCDI IV.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The Sentinel-2 data used in this study are freely available through the Copernicus Data Space Ecosystem (https://dataspace.copernicus.eu/). The initial model development was inspired by the EuroSAT classification implementation by Tek Bahadur Kshetri [34], which provided a foundation for adapting deep learning approaches to real-world Sentinel-2 data applications (https://github.com/iamtekson/DL-for-LULC-prediction/blob/master/lulc_classification_euroSAT.ipynb, accessed on 15 September 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
CNN	Convolutional Neural Network
GAP	Global Average Pooling
GPU	Graphics Processing Unit
LULC	Land Use Land Cover
NDVI	Normalized Difference Vegetation Index
NIR	Near Infrared
OA	Overall Accuracy
ReLU	Rectified Linear Unit
ResNet	Residual Network
RF	Random Forest
RGB	Red-Green-Blue
SAR	Synthetic Aperture Radar
SCL	Scene Classification Layer
SVM	Support Vector Machine
SWIR	Short-Wave Infrared
VGG	Visual Geometry Group
XAI	Explainable Artificial Intelligence

References

1. Friedl, M.A.; Brodley, C.E. Decision Tree Classification of Land Cover from Remotely Sensed Data. Remote Sens. Environ. 1997, 61, 399–409.
2. Rouse, J.W.; Haas, R.H.; Schell, J.A.; Deering, D.W. Monitoring Vegetation Systems in the Great Plains with ERTS. In Proceedings of the Third Earth Resources Technology Satellite-1 Symposium NASA SP-351, Washington, DC, USA, 10–14 December 1973; Volume 1, pp. 309–317.
3. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
4. Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297.
5. Foody, G.M. Status of Land Cover Classification Accuracy Assessment. Remote Sens. Environ. 2002, 80, 185–201.
6. Foody, G.M.; Mathur, A. A Relative Evaluation of Multiclass Image Classification by Support Vector Machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1335–1343.
7. Drusch, M.; Del Bello, U.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P.; et al. Sentinel-2: ESA’s Optical High-Resolution Mission for GMES Operational Services. Remote Sens. Environ. 2012, 120, 25–36.
8. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444.
9. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
10. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
11. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
12. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
13. Kussul, N.; Lavreniuk, M.; Skakun, S.; Shelestov, A. Deep Learning Classification of Land Cover and Crop Types Using Remote Sensing Data. IEEE Geosci. Remote Sens. Lett. 2017, 14, 778–782.
14. Ma, L.; Liu, Y.; Zhang, X.; Ye, Y.; Lin, G.; Johnson, B. Deep Learning in Remote Sensing Applications: A Meta-Analysis and Review. ISPRS J. Photogramm. Remote Sens. 2019, 152, 166–177.
15. Pelletier, C.; Valero, S.; Inglada, J.; Champion, N.; Dedieu, G. Can Deep Learning Models Predict Land Cover from Sentinel-2 Time Series Data? Remote Sens. 2019, 11, 220.
16. Russwurm, M.; Körner, M. Multi-Temporal Land Cover Classification with Recurrent Neural Networks. ISPRS J. Photogramm. Remote Sens. 2018, 139, 123–135.
17. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856.
18. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
19. Zhang, L.; Chen, X.; Wang, Y.; Liu, M.; Zhang, H. ResNet-Based Architectures for Remote Sensing Image Classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3546–3557.
20. Liu, Y.; Wang, H.; Chen, S.; Zhang, M.; Li, J. Lightweight Deep Learning Models for Resource-Constrained Remote Sensing Applications. IEEE Trans. Geosci. Remote Sens. 2021, 59, 506–517.
21. Wang, Q.; Liu, S.; Chanussot, J.; Li, X. Lightweight CNNs for Remote Sensing Scene Classification. Remote Sens. 2020, 12, 2056.
22. Rocchini, D.; Gillespie, T.W.; Foody, G.M.; Giorgi, A.P.; Saatchi, S. Measuring and Modeling Biodiversity from Space. Nat. Rev. Earth Environ. 2021, 2, 198–215.
23. Li, J.; Wang, Y.; Xu, H.; Zhou, G. Lightweight Vision Transformers for Large-Scale Remote Sensing Image Classification. Remote Sens. 2024, 16, 2455.
24. Li, W.; Fu, H.; Yu, L.; Cracknell, A.; Gong, P. Cross-Regional Transferability of Deep Learning Models for Land Cover Mapping. Remote Sens. Environ. 2021, 264, 112588.
25. Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36.
26. Xu, Y.; Li, X.; Feng, G.; Ran, Y.; Li, Z.; Wang, H. Fusion of SAR and Optical Data for Land Cover Classification with Deep Learning. Remote Sens. 2020, 12, 1486.
27. Chen, B.; Bashmal, L.; Rahhal, M.M.A.; Dayil, R.A.; Ajlan, N.A. Lightweight Vision Transformers for Remote Sensing Image Classification. Remote Sens. 2023, 15, 1125.
28. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event, 3–7 May 2021.
29. Cohen, J. A Coefficient of Agreement for Nominal Scales. Educ. Psychol. Meas. 1960, 20, 37–46.
30. Fitton, D.; Li, J.; Fuentes, A.; Xiao, W.; Ghimire, B. Land Cover Classification through Convolutional Neural Networks Aggregation. Remote Sens. Appl. Soc. Environ. 2022, 27, 100785.
31. Chen, Z.; Huang, L.; Zhao, W.; Li, D. Edge AI for Remote Sensing: Deploying Lightweight CNNs on Resource-Constrained Platforms. Remote Sens. 2025, 17, 1120.
32. Tuia, D.; Roscher, R.; Wegner, J.D.; Jacobs, N.; Zhu, X.X.; Camp-Valls, G.; Yokova, N.; Leibe, B.; Rüswurm, M.; Lassner, C.; et al. Recent Trends in Deep Learning for Remote Sensing: Challenges and Future Directions. IEEE Geosci. Remote Sens. Mag. 2022, 10, 95–122.
33. Park, S.; Kim, J.; Lee, H. Multi-Temporal Lightweight Deep Learning Models for Crop Monitoring with Sentinel-2. Remote Sens. 2025, 17, 1789.
34. Tekson, T. LULC Classification Using EuroSAT Dataset. Available online: https://github.com/iamtekson/DL-for-LULC-prediction/blob/master/lulc_classification_euroSAT.ipynb (accessed on 15 September 2025).

Figure 1. Location of the study area (Dolj County, Romania), including major land cover types and administrative boundaries. Source: Sentinel-2 data (Copernicus Programme).

Figure 2. Workflow of the proposed classification pipeline, including preprocessing, patch extraction, CNN training, and evaluation.

Figure 3. Sample patches showing RGB composites (bands 4–3–2) for soil and vegetation classes. Source: Sentinel-2 (Copernicus).

Figure 4. Sample patches showing false-color composites (bands 8–4–3, NIR–Red–Green) for soil and vegetation classes. Vegetation is enhanced in red tones, aiding class discrimination.

Figure 5. Architecture of the lightweight CNN, including convolutional layers, pooling, GAP, dropout, and output classifier.

Figure 6. NDVI distributions for correctly classified and misclassified patches. The dashed vertical lines indicate the transitional NDVI range (0.15–0.25), where most classification errors occur.
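
For reference, the NDVI values summarized in Figure 6 follow the standard definition (NIR − Red)/(NIR + Red), which for Sentinel-2 is commonly computed from B08 and B04. The sketch below is illustrative rather than the authors' code: the function names and the sample reflectance values are assumptions, and only the 0.15–0.25 transitional range is taken from the caption.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from NIR (B08) and red (B04) reflectance."""
    denom = nir + red
    if denom == 0:
        return 0.0  # assumption: treat empty/no-data pixels as NDVI 0
    return (nir - red) / denom


def is_transitional(value: float, low: float = 0.15, high: float = 0.25) -> bool:
    """True when NDVI falls inside the transitional range marked in Figure 6."""
    return low <= value <= high


# Hypothetical (NIR, red) surface reflectances, not taken from the study data.
samples = {
    "bare soil": (0.25, 0.20),
    "sparse cover": (0.30, 0.20),
    "dense crop": (0.45, 0.05),
}
for name, (nir, red) in samples.items():
    value = ndvi(nir, red)
    print(f"{name}: NDVI={value:.2f}, transitional={is_transitional(value)}")
```

Patches whose mean NDVI falls in the flagged range are the ones the caption identifies as most error-prone, so a check of this kind could be used to route them to closer inspection.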

Figure 7. Training and validation accuracy curves for the lightweight CNN. Validation accuracy stabilizes at 91.7% after ~40 epochs.

Figure 8. Training and validation loss curves. Stable convergence and the absence of divergence between the curves indicate no substantial overfitting.

Figure 9. Confusion matrix represented as a heatmap. The strong diagonal indicates reliable classification, while the few off-diagonal values reflect errors concentrated in transitional cases.

Figure 10. Comparative overall accuracy (OA %) of the proposed lightweight CNN, the heavy ResNet-18 baseline, and conventional approaches (NDVI, RF/SVM, and representative CNN studies from the literature [7,20,23,26,30]). The horizontal dashed line indicates the 90% OA reference threshold used for visual comparison across methods.

Figure 11. Soil–vegetation classification map of Dolj County (Romania). Brown areas represent soil and green areas represent vegetation, as derived from the proposed lightweight CNN model.

Table 1. Sentinel-2 spectral bands used in this study, including central wavelength and spatial resolution.

Band	Wavelength (nm)	Resolution (m)	Description
B01	442.7	60	Coastal aerosol
B02	492.4	10	Blue
B03	559.8	10	Green
B04	664.6	10	Red
B05	704.1	20	Red edge 1
B06	740.5	20	Red edge 2
B07	782.8	20	Red edge 3
B08	832.8	10	NIR
B8A	864.7	20	Narrow NIR
B09	945.1	60	Water vapour
B11	1613.7	20	SWIR 1
B12	2202.4	20	SWIR 2

Table 2. Hyperparameters and training configuration for the lightweight CNN.

Parameter	Value/Setting	Notes
Optimizer	Adam	Widely adopted in RS tasks
Initial learning rate	0.001	Stable convergence
Batch size	32	Trade-off: stability vs. efficiency
Epochs (max)	50	With early stopping (patience = 10)
Loss function	Categorical cross-entropy	Suitable for classification tasks
Regularization (L2)	λ = 0.001	Prevents overfitting
Dropout	0.3	Applied before output layer
Hardware	NVIDIA GTX 1660 (NVIDIA Corporation, Santa Clara, CA, USA, 6 GB VRAM)	Modest GPU, reproducibility focus
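
The settings in Table 2 can be gathered into a single configuration object, and the early-stopping rule (patience = 10, at most 50 epochs) can be written out explicitly. This is a minimal, framework-independent sketch: the names `TrainConfig` and `stopping_epoch` are hypothetical, monitoring validation loss is an assumption (the table does not state the monitored quantity), and the loss values are synthetic.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TrainConfig:
    """Values copied from Table 2; the field names are illustrative."""
    optimizer: str = "adam"
    learning_rate: float = 1e-3
    batch_size: int = 32
    max_epochs: int = 50
    patience: int = 10
    l2_lambda: float = 1e-3
    dropout: float = 0.3


def stopping_epoch(val_losses: Sequence[float], cfg: TrainConfig) -> int:
    """Return the 1-based epoch at which training would stop.

    Assumption: validation loss is monitored, and training halts once it has
    failed to improve for `cfg.patience` consecutive epochs.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses[: cfg.max_epochs], start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= cfg.patience:
                return epoch
    # No early stop: training ran to the end of the curve or the epoch cap.
    return min(len(val_losses), cfg.max_epochs)


cfg = TrainConfig()
# Synthetic curve: improves for 20 epochs, then plateaus at a higher loss.
losses = [1.0 / (e + 1) for e in range(20)] + [0.06] * 40
print(stopping_epoch(losses, cfg))
```

Keeping the configuration immutable (`frozen=True`) is a small design choice that makes it harder to change a hyperparameter silently between runs, which fits the reproducibility focus noted in the table.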

Table 3. Overall performance metrics of the lightweight CNN model on the test set.

Metric	Value
Overall Accuracy	91.20%
Kappa Coefficient	0.82
Precision (macro)	90.00%
Recall (macro)	91.00%
F1-Score (macro)	90.00%
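
Overall accuracy and Cohen's kappa in Table 3 are both derived from the confusion matrix. The sketch below shows the standard calculation; the 2 × 2 counts are hypothetical placeholders chosen only to illustrate the arithmetic, not the study's test-set counts, and the function names are illustrative.

```python
from typing import List


def overall_accuracy(cm: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (rows = reference, columns = predicted)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohens_kappa(cm: List[List[int]]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_e is chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = overall_accuracy(cm)
    row_tot = [sum(cm[i]) for i in range(n)]
    col_tot = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    p_e = sum(row_tot[k] * col_tot[k] for k in range(n)) / (total * total)
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical counts: [[soil->soil, soil->veg], [veg->soil, veg->veg]]
cm = [[90, 10], [10, 90]]
print(f"OA={overall_accuracy(cm):.3f}, kappa={cohens_kappa(cm):.3f}")
```

Kappa is lower than overall accuracy because it subtracts the agreement expected by chance from the marginal class frequencies, which is why the two values in Table 3 differ.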

Table 4. Class-wise performance metrics for soil and vegetation classification.

Class	Precision	Recall	F1-Score
Soil	89.00%	88.00%	89.00%
Vegetation	91.00%	94.00%	92.00%
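
The macro-averaged values in Table 3 are the unweighted means of the class-wise values in Table 4. The short check below reproduces that arithmetic from the rounded table entries (variable and function names are illustrative); because the inputs are already rounded, the results agree with Table 3 only to within about one percentage point.

```python
# Class-wise metrics in percent, copied from Table 4.
per_class = {
    "soil": {"precision": 89.0, "recall": 88.0, "f1": 89.0},
    "vegetation": {"precision": 91.0, "recall": 94.0, "f1": 92.0},
}


def macro_average(metric: str) -> float:
    """Unweighted mean of one metric over all classes."""
    values = [scores[metric] for scores in per_class.values()]
    return sum(values) / len(values)


def f1_from(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for metric in ("precision", "recall", "f1"):
    print(f"macro {metric}: {macro_average(metric):.1f}%")
print(f"vegetation F1 recomputed from P and R: {f1_from(91.0, 94.0):.1f}%")
```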

Table 5. Comparative efficiency–accuracy characteristics of representative lightweight CNN architectures for remote-sensing classification. The proposed model achieves ~91.2% OA with only 0.15 M parameters (roughly 15–30× fewer than the mobile backbones listed). Values for comparative models were compiled from representative studies [7,23,26,30].

Model	Parameters (M)	Overall Accuracy (%)	Notes
MobileNetV2	3.4	92.0	Lightweight backbone commonly used in remote-sensing scene classification
ShuffleNetV2	2.3	91.5	Highly efficient design; frequently used in hyperspectral benchmarks
EfficientNet-Lite0	4.6	92.3	Mobile-optimized version of EfficientNet with reduced FLOPs
Proposed Lightweight CNN	0.15	91.2	Minimal architecture (<150 k parameters); competitive accuracy
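
The size gap can be read directly off Table 5. The following snippet only redoes that arithmetic with the tabulated parameter counts (in millions); the dictionary and variable names are illustrative.

```python
# Parameter counts in millions, copied from Table 5.
params_m = {
    "MobileNetV2": 3.4,
    "ShuffleNetV2": 2.3,
    "EfficientNet-Lite0": 4.6,
}
proposed_m = 0.15

# How many times larger each backbone is than the proposed model.
ratios = {name: p / proposed_m for name, p in params_m.items()}
# Percentage of parameters saved by the proposed model relative to each backbone.
reductions = {name: 100 * (1 - proposed_m / p) for name, p in params_m.items()}

for name in params_m:
    print(f"{name}: {ratios[name]:.1f}x larger, "
          f"proposed model has {reductions[name]:.1f}% fewer parameters")
```

For the three backbones listed, the ratios fall between about 15× and 31×, and every reduction exceeds 90%.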

Table 6. Summary of the ablation configurations and their corresponding performance metrics. Note: “#” denotes “number of”.

Model ID	#Blocks	Filters Per Block	Kernel	Params (×10³)	Overall Accuracy (%)	Kappa (κ)
M1	2	32–64	3 × 3	95	89.7	0.79
M2 (Proposed)	3	32–64–128	3 × 3	148	91.2	0.82
M3	3	32–64–128	5 × 5	260	90.6	0.81
M4	4	16–32–64–128	3 × 3	210	90.9	0.81
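
One way to make the accuracy–efficiency trade-off in Table 6 explicit is to check which configurations are Pareto-optimal, that is, not beaten by another configuration on both parameter count and overall accuracy. This is an illustrative reading of the table, not an analysis reported by the authors, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# (parameters in thousands, overall accuracy in %), copied from Table 6.
ablation: Dict[str, Tuple[float, float]] = {
    "M1": (95, 89.7),
    "M2": (148, 91.2),
    "M3": (260, 90.6),
    "M4": (210, 90.9),
}


def pareto_front(models: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return models that no other model dominates.

    A model is dominated when another has no more parameters and no lower
    accuracy, and is strictly better on at least one of the two.
    """
    front = []
    for name, (params, acc) in models.items():
        dominated = any(
            (q <= params and a >= acc) and (q < params or a > acc)
            for other, (q, a) in models.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


print(pareto_front(ablation))
```

Under this reading, M3 and M4 are dominated by M2 (more parameters, lower accuracy), while M1 trades about 1.5 percentage points of accuracy for roughly 36% fewer parameters.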

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
