1. Introduction
Land Use Land Cover (LULC) classification is a key task in remote sensing, providing essential information for environmental monitoring, sustainable agricultural practices, and land management strategies. Reliable and up-to-date land cover maps support decision-making processes at multiple scales, from local agricultural planning to global climate change assessments [1].
Traditional approaches to LULC mapping have relied heavily on vegetation indices such as the Normalized Difference Vegetation Index (NDVI) [2] or on classical machine learning algorithms including Random Forests [3] and Support Vector Machines (SVMs) [4]. While these approaches are computationally efficient and relatively easy to implement, their performance often declines in heterogeneous landscapes, especially in transitional areas where spectral signals from soil and vegetation are mixed [5,6].
With the increasing availability of high-resolution, multi-spectral satellite imagery such as Sentinel-2 [7], more advanced methods have become feasible. Sentinel-2 provides rich spectral information across 13 bands at spatial resolutions ranging from 10 to 60 m, with a revisit time of five days for the two-satellite constellation, making it particularly suitable for monitoring agricultural and natural environments. However, leveraging the full potential of such data requires methods capable of capturing both spectral and spatial complexity.
Deep learning, and in particular convolutional neural networks (CNNs), has transformed the analysis of remote sensing imagery [8,9]. CNNs can learn hierarchical representations directly from data [10,11,12], allowing them to integrate spectral and spatial dimensions more effectively than traditional methods. State-of-the-art results have been obtained for tasks including land cover classification, crop monitoring, and vegetation mapping [13,14,15,16,17]. Nevertheless, many CNN-based approaches adopt architectures with millions of parameters, such as ResNet [18] or Xception [10], which impose significant computational demands [11,19].
Recent research has emphasized the development of lightweight CNN architectures that maintain high accuracy while drastically reducing computational complexity [14,20,21]. These architectures are especially relevant for operational applications in resource-constrained settings or for large-scale monitoring efforts where computational efficiency is essential. However, reproducibility continues to be a challenge in the field [22]. Many studies fail to report critical details such as preprocessing steps, training/validation splits, or hyperparameter settings, limiting the extent to which results can be independently verified or transferred to new contexts.
In this paper, we propose a lightweight CNN pipeline for binary classification of soil and vegetation using Sentinel-2 imagery, focusing on Dolj County, Romania. The contributions of this work are threefold:
Methodological transparency. We provide a fully documented pipeline from data preprocessing to model evaluation, ensuring reproducibility and clarity for future applications.
Efficiency–accuracy trade-off. We demonstrate that a CNN with fewer than 150,000 parameters can reach competitive accuracy (>91% OA, kappa 0.82), while reducing computational demands by more than 90% compared to conventional CNN architectures.
Comparative evaluation. We benchmark the proposed approach against an NDVI baseline [2], a heavier CNN model, and comparable results reported in the literature [13,14,21], highlighting both strengths and limitations.
The focus of this work is not solely on achieving the highest possible accuracy but on demonstrating that lightweight, transparent, and reproducible deep learning pipelines can deliver credible results with operational potential. By addressing the efficiency–reproducibility trade-off, this study aims to contribute to the broader adoption of deep learning methods in remote sensing, particularly for institutions and applications where computational resources are limited.
This study goes beyond introducing yet another lightweight CNN variant by providing a fully transparent and reproducible workflow. Every step, from preprocessing to evaluation, is explicitly documented, ensuring methodological clarity. The proposed model demonstrates a favorable efficiency–accuracy trade-off, achieving competitive accuracy while reducing the number of parameters by over 90%. Thus, the contribution lies not only in model design but also in establishing a baseline framework for operational scalability and reproducibility in remote sensing applications.
2. Related Work
2.1. Traditional Machine Learning Approaches
Classical approaches to land cover classification in remote sensing have long relied on vegetation indices and shallow machine learning algorithms. The Normalized Difference Vegetation Index (NDVI) [2] remains one of the most widely used indicators for vegetation monitoring. While effective in highlighting green biomass, NDVI struggles in transitional zones where soil and vegetation signals overlap.
Machine learning methods such as Random Forests [3] and Support Vector Machines (SVMs) [4] have been extensively applied to land cover mapping. These classifiers are computationally efficient and interpretable, and their use in remote sensing has been widely validated [5]. However, they depend heavily on handcrafted features and spectral indices, limiting their ability to generalize across heterogeneous landscapes [6].
2.2. Deep Learning with CNNs
The adoption of deep learning has significantly advanced remote sensing applications. Convolutional neural networks (CNNs) introduced the capacity to learn hierarchical features directly from raw imagery, enabling joint spectral–spatial representation learning [8,9]. Models such as VGGNet [9], ResNet [18], and Xception [10] have demonstrated strong performance across diverse image classification tasks [11,12,17].
In remote sensing, CNNs have been successfully used for crop classification [16], multi-temporal land cover mapping [13,15], and vegetation monitoring [14]. Nevertheless, their widespread use is constrained by high computational requirements. Standard CNNs often contain millions of trainable parameters, demanding GPUs and large memory, which is impractical for many operational or resource-limited contexts [19].
2.3. Lightweight CNN Architectures
To address these challenges, recent research has explored lightweight CNNs designed to reduce computational complexity without sacrificing performance. Approaches such as MobileNets [11], ShuffleNet [17], and custom lightweight CNNs tailored for remote sensing tasks [20,21] have shown that efficient models can deliver strong results while using only a fraction of the parameters of conventional CNNs.
More recent studies have extended these efforts by developing lightweight Vision Transformers and hybrid CNN–ViT architectures specifically for large-scale remote sensing applications [23].
For example, Wang et al. [21] demonstrated that lightweight CNNs could achieve competitive accuracies in land cover mapping tasks, with parameter reductions exceeding 90%. Similarly, Liu et al. [20] proposed models for resource-constrained applications, showing that lightweight deep learning is feasible for large-scale monitoring.
2.4. Reproducibility and Methodological Transparency
A recurring challenge in remote sensing research is reproducibility. Rocchini et al. [22] highlighted the lack of methodological transparency in many studies, where preprocessing details, training–validation splits, or hyperparameter configurations are omitted. This gap undermines the transferability of models across regions and the comparability of results.
In recent years, there has been increasing emphasis on open data, open-source code, and methodological reporting standards in remote sensing [22,24,25]. For example, multi-sensor fusion studies combining SAR and optical data [26] and recent transformer-based lightweight architectures [27,28] stress the importance of transparent experimental design.
3. Materials and Methods
3.1. Study Area
The study was conducted in Dolj County, Romania (Figure 1), an area characterized by a mix of agricultural land, grasslands, and bare soil. The region provides a representative test case for soil–vegetation classification, given its heterogeneous landscape and frequent land use changes.
3.2. Data
Sentinel-2 Level-2A imagery (Copernicus Programme, European Space Agency, Paris, France) was used. The Sentinel-2 MultiSpectral Instrument acquires 13 spectral bands with spatial resolutions of 10, 20, and 60 m [7]. For this study, we selected the 10 m and 20 m bands most relevant to vegetation and soil discrimination (e.g., B2, B3, B4, B8, B11, B12; Table 1). The acquisition period covered the 2022 growing season (April–September), ensuring that both vegetation and bare soil were captured.
Reference samples were generated through stratified random sampling, ensuring balanced representation of soil and vegetation classes. Approximately 1000 labeled patches per class were selected.
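A minimal sketch of class-balanced stratified sampling is given below, in plain Python. It assumes a hypothetical candidate pool of `(sample_id, label)` pairs and a fixed random seed. The study does not describe its sampling frame or seed, so both are illustrative. Each class is treated as a stratum, and the same number of samples is drawn from each one without replacement.

```python
import random
from collections import defaultdict


def stratified_sample(candidates, n_per_class, seed=42):
    """Draw the same number of samples from every class (stratum).

    candidates: iterable of (sample_id, label) pairs (hypothetical format).
    Returns a dict mapping each label to its list of sampled ids.
    """
    by_class = defaultdict(list)
    for sample_id, label in candidates:
        by_class[label].append(sample_id)

    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    selected = {}
    for label, ids in sorted(by_class.items()):
        if len(ids) < n_per_class:
            raise ValueError(f"class {label!r} has only {len(ids)} candidates")
        # Simple random sampling without replacement inside each stratum.
        selected[label] = rng.sample(ids, n_per_class)
    return selected


# Hypothetical candidate pool: 5000 soil and 8000 vegetation locations.
pool = [(f"s{i}", "soil") for i in range(5000)]
pool += [(f"v{i}", "vegetation") for i in range(8000)]
sample = stratified_sample(pool, n_per_class=1000)
print({label: len(ids) for label, ids in sample.items()})  # {'soil': 1000, 'vegetation': 1000}
```

Re-running with the same seed reproduces the same selection, which supports the reproducibility goal stated in the introduction.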
Visual interpretation was performed independently by two analysts using high-resolution imagery (Google Earth Pro, Google LLC, Mountain View, CA, USA; and PlanetScope, Planet Labs PBC, San Francisco, CA, USA) and subsequently validated by a senior expert.
Inter-operator consistency, assessed with Cohen's kappa, was κ = 0.86, indicating substantial agreement [29].
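Cohen's kappa adjusts the observed agreement between the two analysts for the agreement expected by chance from their marginal label frequencies: κ = (p_o - p_e) / (1 - p_e). The sketch below computes it for two label lists. The ten-patch example is hypothetical. It only shows how a kappa value arises and does not reproduce the reported 0.86.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each rater's marginal class frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels from two analysts for ten patches (not study data).
analyst_1 = ["veg"] * 5 + ["soil"] * 5
analyst_2 = ["veg"] * 4 + ["soil"] * 5 + ["veg"]
print(round(cohens_kappa(analyst_1, analyst_2), 2))  # 0.6
```

Here the analysts agree on 8 of 10 patches (p_o = 0.8), but balanced marginals already give p_e = 0.5, so kappa is 0.6 rather than 0.8.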
To prevent spatial autocorrelation from inflating accuracy estimates, the training, validation, and test subsets were spatially disjoint, with samples drawn from different Sentinel-2 tiles.
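One way to obtain spatially disjoint subsets is to assign whole tiles, rather than individual patches, to the training, validation, and test sets. The following is a hedged sketch of that idea. The `(sample_id, tile_id)` format, the tile names, and the seed are all hypothetical. The 70/15/15 fractions are applied to tiles, so the resulting patch proportions are only approximate when tiles contain different numbers of patches.

```python
import random


def split_by_tile(samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Assign whole tiles to the train, validation, and test subsets.

    samples: list of (sample_id, tile_id) pairs (hypothetical format).
    Each tile lands in exactly one subset, so the subsets never share a tile.
    Fractions are applied to tiles, so sample proportions are only approximate.
    """
    tiles = sorted({tile for _, tile in samples})
    random.Random(seed).shuffle(tiles)
    n_train = round(fractions[0] * len(tiles))
    n_val = round(fractions[1] * len(tiles))
    groups = {
        "train": set(tiles[:n_train]),
        "val": set(tiles[n_train:n_train + n_val]),
        "test": set(tiles[n_train + n_val:]),
    }
    return {
        name: [sid for sid, tile in samples if tile in members]
        for name, members in groups.items()
    }


# Hypothetical example: 2000 patches spread evenly over 20 tiles.
samples = [(f"p{i}", f"tile_{i % 20:02d}") for i in range(2000)]
splits = split_by_tile(samples)
print({name: len(ids) for name, ids in splits.items()})  # {'train': 1400, 'val': 300, 'test': 300}
```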
Mixed or ambiguous patches (e.g., sparse or transitional vegetation) were excluded based on NDVI screening to maintain label purity.
This procedure ensures that the dataset is representative, reproducible, and suitable for benchmarking lightweight deep-learning models.
3.3. Preprocessing
The overall methodological workflow is illustrated in Figure 2, summarizing all steps from raw Sentinel-2 acquisition to final classification.
Preprocessing steps were applied to ensure consistency and suitability of the Sentinel-2 data for CNN-based analysis. The workflow included:
Atmospheric correction using Sen2Cor v2.10 (European Space Agency, Paris, France).
Resampling of 20 m bands to 10 m resolution using bilinear interpolation.
Cloud masking using the Scene Classification Layer (SCL).
Normalization of reflectance values.
Patch extraction into 32 × 32 pixel patches with corresponding ground truth labels (Figure 3 and Figure 4).
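The resampling and cloud-masking steps can be sketched on small nested-list grids without any geospatial library. This minimal illustration rests on stated assumptions. Reflectance bands are upsampled from 20 m to 10 m with bilinear interpolation using pixel-centre alignment. The categorical SCL layer, which Level-2A products deliver at 20 m, is upsampled by nearest-neighbour replication so that class codes are never blended. The set of SCL classes treated as invalid is also an assumption, because the study does not list which classes were masked. Sen2Cor and other tools may handle edges differently.

```python
import math

# SCL classes treated as invalid here (an assumption; the study does not list them):
# 0 no data, 1 saturated/defective, 3 cloud shadows,
# 8 cloud medium probability, 9 cloud high probability, 10 thin cirrus.
INVALID_SCL = {0, 1, 3, 8, 9, 10}


def nearest_upsample(grid, factor=2):
    """Replicate each cell factor x factor times (suitable for categorical layers such as SCL)."""
    return [[v for v in row for _ in range(factor)] for row in grid for _ in range(factor)]


def bilinear_upsample(band, factor=2):
    """Upsample a 2-D reflectance grid with bilinear interpolation.

    Output pixel centres are mapped back onto the input grid (pixel-centre
    alignment); coordinates falling outside it are clamped to the border.
    """
    h, w = len(band), len(band[0])
    out = []
    for i in range(h * factor):
        y = min(max((i + 0.5) / factor - 0.5, 0.0), h - 1)
        y0 = math.floor(y)
        y1, fy = min(y0 + 1, h - 1), y - y0
        row = []
        for j in range(w * factor):
            x = min(max((j + 0.5) / factor - 0.5, 0.0), w - 1)
            x0 = math.floor(x)
            x1, fx = min(x0 + 1, w - 1), x - x0
            top = band[y0][x0] * (1 - fx) + band[y0][x1] * fx
            bottom = band[y1][x0] * (1 - fx) + band[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def valid_mask(scl):
    """True where the SCL class is not flagged as cloud, shadow, or invalid."""
    return [[value not in INVALID_SCL for value in row] for row in scl]


# Toy 20 m inputs (hypothetical values): one band and its SCL layer.
b11_20m = [[0.10, 0.20], [0.30, 0.40]]
scl_20m = [[4, 9], [5, 3]]
b11_10m = bilinear_upsample(b11_20m)
mask_10m = valid_mask(nearest_upsample(scl_20m))
```

The bilinear and nearest-neighbour upsamplers are kept separate on purpose. Averaging neighbouring SCL codes (for example 4 and 9) would produce meaningless class values.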
The patch size was set to 32 × 32 pixels (≈320 m × 320 m) following prior studies that demonstrated its robustness for Sentinel-2 classification [13,21]. This configuration provides a good trade-off between spectral homogeneity and spatial context. Smaller patches (e.g., 16 × 16) tend to reduce contextual awareness and increase edge misclassifications, whereas larger ones can amplify mixed-pixel noise. A sensitivity analysis evaluating patch sizes of 16, 24, and 32 pixels is planned as part of future work to quantitatively confirm this choice.
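Non-overlapping patch extraction on the 10 m grid can be sketched as a sliding window whose stride equals the patch size. A 32-pixel side therefore corresponds to 32 × 10 m = 320 m on the ground. The stride, the decision to skip incomplete edge windows, and the toy scene are assumptions for illustration. The function works per band, or on a grid whose cells hold band vectors.

```python
def extract_patches(grid, size=32, stride=32):
    """Yield (row, col, patch) for every complete size x size window of a 2-D grid.

    With stride == size the patches do not overlap; pixels along the right and
    bottom edges that do not fill a whole patch are skipped.
    """
    h, w = len(grid), len(grid[0])
    for r in range(0, h - size + 1, stride):
        for c in range(0, w - size + 1, stride):
            yield r, c, [row[c:c + size] for row in grid[r:r + size]]


PIXEL_SIZE_M = 10  # Sentinel-2 10 m grid after resampling
print(32 * PIXEL_SIZE_M)  # 320 -> each patch covers about 320 m x 320 m

# Hypothetical 100 x 70 pixel scene filled with zeros.
scene = [[0] * 70 for _ in range(100)]
patches = list(extract_patches(scene))
print(len(patches))  # 6
```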
This preprocessing pipeline ensured that the lightweight CNN received harmonized and information-rich inputs while reducing risks of overfitting and bias from spatial dependence.
3.4. CNN Architecture
The proposed lightweight CNN (Figure 5) consisted of three convolutional blocks with ReLU activation and max pooling, followed by a global average pooling layer and a fully connected dense layer with softmax output. Dropout regularization (0.3) was applied to reduce overfitting.
The total number of trainable parameters was fewer than 150,000, making the model over 90% smaller than standard CNN architectures such as ResNet-50 or VGG-16 [9,18].
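The parameter budget can be checked by hand. The sketch below counts trainable parameters for one plausible reading of the architecture: one 3 × 3 convolution per block with 32, 64, and 128 filters (the configuration given in Section 4.6), six input bands (the example bands B2, B3, B4, B8, B11, and B12), and a two-class softmax layer after global average pooling. Three of these choices are assumptions: a single convolution per block, the band count, and the absence of batch normalization.

```python
def conv2d_params(kernel, c_in, c_out):
    """Trainable parameters of a 2-D convolution: kernel weights plus one bias per filter."""
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Trainable parameters of a fully connected layer (weights plus biases)."""
    return n_in * n_out + n_out


def lightweight_cnn_params(in_bands=6, filters=(32, 64, 128), kernel=3, n_classes=2):
    """Count parameters of a conv -> pool stack followed by GAP and a softmax layer."""
    total, c_in = 0, in_bands
    for c_out in filters:  # one convolution per block (assumption)
        total += conv2d_params(kernel, c_in, c_out)
        c_in = c_out
    # Max pooling, global average pooling, and dropout have no trainable weights.
    total += dense_params(c_in, n_classes)
    return total


print(lightweight_cnn_params())  # 94370
```

Under these assumptions the network has 94,370 trainable parameters, comfortably inside the stated 150,000 budget. Global average pooling keeps the classifier head at only 258 parameters, because it feeds 128 features rather than a flattened feature map into the dense layer.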
3.5. Training and Evaluation Setup
The dataset was split into training (70%), validation (15%), and test (15%) sets, with spatially independent subsets to avoid optimistic accuracy estimates caused by spatial autocorrelation. Training was conducted using the Adam optimizer with an initial learning rate of 0.001 and categorical cross-entropy loss (Table 2). Early stopping based on validation loss was applied to prevent overfitting. All experiments were implemented in Python 3.10 (Python Software Foundation, Wilmington, DE, USA) using TensorFlow 2.12 and Keras 2.12 (Google LLC, Mountain View, CA, USA). Sentinel-2 preprocessing employed Sen2Cor v2.10 (European Space Agency, Paris, France).
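Early stopping on validation loss follows a simple patience rule, which the plain-Python sketch below reproduces approximately. In Keras this behaviour is usually configured with `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)`. The patience value of 5 and the loss curve used here are hypothetical, since this section reports neither.

```python
def early_stopping(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based, for patience-based early stopping.

    Training stops once the validation loss has failed to improve by more than
    min_delta for `patience` consecutive epochs; best_epoch is the epoch whose
    weights would be restored.
    """
    best_loss, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Hypothetical validation-loss curve (not the study's values).
losses = [0.60, 0.45, 0.38, 0.33, 0.31, 0.30, 0.305, 0.302, 0.31, 0.301, 0.303, 0.29]
print(early_stopping(losses, patience=5))  # (11, 6)
```

The run stops at epoch 11 and keeps the epoch-6 weights, so the late dip at epoch 12 is never reached. This is the cost of choosing a small patience value.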
Model performance was assessed using overall accuracy (OA), kappa coefficient, precision, recall, and F1-score. Comparative experiments were conducted against an NDVI threshold baseline and a heavier CNN model.
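The accuracy-type metrics listed above follow directly from a confusion matrix. The sketch below computes overall accuracy together with per-class and macro-averaged precision, recall, and F1-score. The 2 × 2 matrix is hypothetical and is not taken from the study's results.

```python
def classification_metrics(cm, labels):
    """Overall accuracy plus per-class and macro-averaged precision, recall, F1.

    cm[i][j] is the number of samples whose reference class is labels[i]
    and whose predicted class is labels[j].
    """
    k = len(labels)
    n = sum(sum(row) for row in cm)
    oa = sum(cm[i][i] for i in range(k)) / n
    per_class = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(k))  # column total
        actual = sum(cm[i])  # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    macro = {m: sum(c[m] for c in per_class.values()) / k for m in ("precision", "recall", "f1")}
    return oa, per_class, macro


# Hypothetical 2 x 2 confusion matrix for 300 test patches (not the study's results).
cm = [[130, 20],   # reference soil: 130 predicted soil, 20 predicted vegetation
      [10, 140]]   # reference vegetation: 10 predicted soil, 140 predicted vegetation
oa, per_class, macro = classification_metrics(cm, ["soil", "vegetation"])
print(round(oa, 3), round(macro["f1"], 3))
```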
4. Results
4.1. Classification Accuracy
4.1.1. Overall and Class-Wise Accuracy
The lightweight CNN achieved robust performance on the test dataset. The overall accuracy (OA) was 91.2%, with a Cohen's kappa coefficient of 0.82, indicating substantial agreement between predictions and reference labels [29]. Macro-averaged precision, recall, and F1-score values all exceeded 0.89, indicating balanced performance across both classes (Table 3).
In addition to overall performance, class-wise metrics highlighted slightly better performance for the vegetation class compared to soil (Table 4). This is expected, given the stronger spectral signatures of vegetation in Sentinel-2 bands, particularly in the NIR and red-edge regions.
4.1.2. Error Analysis by NDVI
To further explore misclassification patterns, NDVI values were computed for correctly and incorrectly classified samples (Figure 6). The analysis shows that errors are concentrated in the NDVI range of 0.15–0.25, which corresponds to transitional zones where vegetation is sparse and the spectral signal overlaps with that of bare soil.
NDVI analysis confirms that misclassifications are not random but structurally linked to the intrinsic limitations of optical data in transitional cover conditions.
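The NDVI-stratified error analysis amounts to binning test samples by NDVI and computing the error rate within each bin. A minimal sketch follows. The bin edges reuse the 0.15–0.25 range discussed above, while the per-sample NDVI values and correctness flags are hypothetical.

```python
def error_rate_by_ndvi(ndvi_values, correct, edges=(-1.0, 0.15, 0.25, 1.0)):
    """Error rate of a classifier within NDVI bins [edges[i], edges[i + 1]).

    The last bin also includes its upper edge. Returns {(low, high): rate},
    with None for empty bins.
    """
    bins = list(zip(edges[:-1], edges[1:]))
    counts = {b: [0, 0] for b in bins}  # bin -> [n_samples, n_errors]
    for value, is_correct in zip(ndvi_values, correct):
        for low, high in bins:
            in_last = high == edges[-1] and value == high
            if low <= value < high or in_last:
                counts[(low, high)][0] += 1
                if not is_correct:
                    counts[(low, high)][1] += 1
                break
    return {b: (errors / n if n else None) for b, (n, errors) in counts.items()}


# Hypothetical per-sample NDVI values and correctness flags (not study data).
ndvi_values = [0.05, 0.10, 0.18, 0.20, 0.22, 0.60, 0.70, 1.00]
correct = [True, True, False, True, False, True, True, True]
for (low, high), rate in error_rate_by_ndvi(ndvi_values, correct).items():
    print(f"NDVI {low:.2f} to {high:.2f}: error rate {rate:.2f}")
```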
4.2. Training Dynamics
Training dynamics provide insight into the convergence behavior of the CNN and its ability to generalize.
4.2.1. Accuracy Curves
Training and validation accuracy curves demonstrate stable convergence. The model quickly reached over 85% accuracy within the first 10 epochs and stabilized near 91% by epoch 25. The validation curve closely followed the training curve, indicating minimal overfitting and effective generalization (Figure 7).
4.2.2. Loss Curves
The cross-entropy loss decreased consistently during training, with validation loss stabilizing after epoch 20. Early stopping halted training once the validation loss stopped improving, preventing the model from overfitting the training data (Figure 8).
The stability of both accuracy and loss curves provides evidence that the lightweight CNN reached convergence efficiently and generalized well across training and validation data.
4.3. Confusion Matrix
Inspection of the misclassified samples summarized in the confusion matrix showed that most errors occurred in transitional zones where vegetation was sparse or soil contained residual organic matter. Despite these challenges, classification accuracy remained high across both classes (Figure 9).
The results highlight that the proposed lightweight CNN can robustly discriminate between soil and vegetation. The limited errors correspond to spectrally ambiguous patches where vegetation cover is low or mixed with bare soil.
4.4. Comparative Analysis
We contextualized the performance of the proposed lightweight CNN against several references: a conceptual NDVI threshold baseline, a heavier CNN trained under identical conditions, and representative RF/SVM results from the literature.
To ensure transparency, we explicitly define the heavy CNN baseline as ResNet-18 (11.7 M parameters), trained with the same optimizer, learning rate, batch size, and spatially independent splits as the lightweight model. While ResNet-18 reached 92.5% overall accuracy (OA), it required more than an order of magnitude greater parameter count and computational cost.
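The size gap can be checked with simple arithmetic, using 150,000 as an upper bound for the lightweight model and the 11.7 M parameters stated for ResNet-18:

```latex
\frac{11.7 \times 10^{6}}{1.5 \times 10^{5}} = 78,
\qquad
1 - \frac{1.5 \times 10^{5}}{11.7 \times 10^{6}} \approx 1 - 0.0128 = 0.9872
```

The lightweight model therefore has at least about 78 times fewer parameters than ResNet-18. That is a reduction of roughly 98.7%, well above the 90% threshold used throughout the paper.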
The NDVI baseline achieved 78% OA, consistent with the known limitations of index-only approaches in transitional soil–vegetation areas.
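For reference, an NDVI threshold baseline can be sketched as follows, with NDVI = (B8 - B4) / (B8 + B4) for Sentinel-2. The threshold of 0.3 and the example reflectances are hypothetical, because the baseline's threshold is not reported here. The third example shows sparse vegetation whose low NDVI falls below the threshold, so it is labelled as soil. This is the failure mode described for transitional areas.

```python
def ndvi(nir, red, eps=1e-9):
    """NDVI = (NIR - Red) / (NIR + Red); for Sentinel-2, NIR is B8 and Red is B4."""
    return (nir - red) / (nir + red + eps)  # eps guards against division by zero


def ndvi_threshold_label(nir, red, threshold=0.3):
    """Label a pixel (or patch-mean reflectance) as vegetation above an NDVI threshold."""
    return "vegetation" if ndvi(nir, red) > threshold else "soil"


# Hypothetical (B8, B4) surface reflectances with reference labels.
pixels = [
    ((0.40, 0.05), "vegetation"),  # dense canopy, NDVI about 0.78
    ((0.25, 0.20), "soil"),        # bare soil, NDVI about 0.11
    ((0.22, 0.15), "vegetation"),  # sparse vegetation, NDVI about 0.19
]
predictions = [ndvi_threshold_label(nir, red) for (nir, red), _ in pixels]
oa = sum(p == ref for p, (_, ref) in zip(predictions, pixels)) / len(pixels)
print(predictions, round(oa, 3))
```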
By comparison, classical RF/SVM classifiers typically report 88–91% OA on Sentinel-2 land-cover data, which places our lightweight CNN (91.2% OA) at the upper end of this range while remaining far more compact.
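The difference between 91.2% and 92.5% OA is small. Whether it is statistically significant depends on the discordant predictions, that is, the test samples classified correctly by one model but not by the other. A McNemar test on the paired test-set predictions assesses exactly this. A non-significant outcome (p > 0.05) would support the conclusion that the lightweight model achieves comparable accuracy at a fraction of the complexity.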
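A McNemar test depends only on the discordant pairs: b, the test patches that one model classifies correctly and the other does not, and c, the reverse. With roughly 300 test patches (15% of about 2000), a 1.3-point OA gap corresponds to a net difference of only about four patches. The p-value, however, also depends on the total number of discordant pairs. The sketch below implements the exact (binomial) version and the continuity-corrected chi-square version in plain Python. The counts in the example are hypothetical and are not the study's results.

```python
import math


def discordant_counts(y_true, pred_a, pred_b):
    """b: samples model A gets right and model B gets wrong; c: the reverse."""
    b = c = 0
    for t, a, m in zip(y_true, pred_a, pred_b):
        if a == t and m != t:
            b += 1
        elif a != t and m == t:
            c += 1
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value: binomial test of b vs c with p = 0.5."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


def mcnemar_chi2(b, c):
    """Continuity-corrected McNemar statistic and its chi-square (1 df) p-value."""
    if b + c == 0:
        return 0.0, 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical discordant counts (not the study's data): the heavier model is
# right on 14 patches the light model misses, and wrong on 10 it gets right.
print(round(mcnemar_exact(10, 14), 3))
print(mcnemar_chi2(10, 14))
```

The exact version is usually preferred when b + c is small, which is likely here given a test set of a few hundred patches.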
The comparative performance is summarized in Figure 10, which highlights the balance between accuracy and model size across all tested and referenced approaches.
Beyond the visual comparison in Figure 10, Table 5 provides a quantitative summary of parameter efficiency and accuracy across representative lightweight CNN architectures, emphasizing the compactness of our proposed model relative to widely used mobile backbones.
4.5. Spatial Classification Map
Finally, the trained lightweight CNN model was applied to generate a spatial classification map of the entire study area.
Figure 11 illustrates the predicted soil and vegetation distribution across Dolj County, Romania. Vegetation areas (green) are clearly delineated from bare soil (brown), with results consistent with known land use patterns and field observations.
The spatial classification map confirms that the lightweight CNN not only achieves high accuracy on patch-based testing but also produces spatially coherent large-scale outputs that align with known land use and vegetation patterns and are suitable for operational applications.
4.6. Architecture Justification
The architecture of the proposed lightweight CNN was designed to balance accuracy and efficiency while preserving interpretability. The network follows a three-block structure with 32, 64, and 128 filters, which provides progressive receptive-field growth and sufficient spatial–spectral capacity for Sentinel-2 data. The choice of 3 × 3 convolution kernels ensures a good trade-off between local feature extraction and computational efficiency.
To minimize overfitting, the model uses global average pooling followed by a single dense softmax layer instead of the stacks of fully connected layers found in conventional architectures such as VGG, resulting in a compact representation and strong generalization capability. Dropout and L2 regularization were also applied to improve robustness.
Overall, the design leads to fewer than 150,000 trainable parameters, which is over 90% smaller than conventional CNNs (e.g., VGG or ResNet). Despite its compactness, the model achieves comparable accuracy, demonstrating that shallow and carefully regularized architectures can deliver strong performance for soil–vegetation classification tasks.
To further validate the design choices, a complementary quantitative analysis was performed to evaluate the effect of network depth, filter configuration, and kernel size. The results of this structured ablation are summarized in Section 4.7.
4.7. Ablation Study on Architecture Design
To quantitatively assess the influence of the main architectural factors, a structured ablation analysis was designed. Three components were varied while keeping all other training conditions constant (learning rate = 0.001, batch = 32, early stopping):
- (i) Network depth (number of convolutional blocks);
- (ii) Filter configuration;
- (iii) Kernel size.
Results show that the three-block configuration with 3 × 3 kernels (Model A2) achieves the best efficiency–accuracy balance. Shallower networks (A1) underfit the data and fail to capture the spectral–spatial context of Sentinel-2 imagery. Larger kernels (A3) slightly increase computational cost without improving accuracy, while adding a fourth block (A4) yields negligible gains (<0.3% OA) at >40% more parameters.
These findings empirically validate the architectural rationale presented in Section 4.6 and confirm that compact, well-regularized CNNs can achieve competitive accuracy for soil–vegetation classification tasks with minimal computational overhead. In all configurations, the same regularization settings (L2 = 0.001, dropout = 0.3) were retained to isolate the effect of architectural factors. This ensures that the observed differences in Table 6 result strictly from network design rather than from regularization variability.
5. Discussion
The results of this study demonstrate that the proposed lightweight CNN is a viable and efficient alternative for soil–vegetation classification from Sentinel-2 imagery. With fewer than 150,000 trainable parameters, the model achieved an overall accuracy of 91.20% and a Cohen's kappa of 0.82. These values are competitive with state-of-the-art CNNs containing millions of parameters [13,14,19,21], thereby confirming that efficiency can be achieved without compromising classification reliability.
An additional ablation analysis (Section 4.7) confirmed that the selected three-block configuration provides the optimal balance between representational capacity and parameter efficiency. Deeper or wider variants produced no statistically significant improvement (p > 0.05, McNemar test), validating the robustness of the proposed architecture.
Unlike many recent studies that focus primarily on presenting new lightweight CNN architectures, this work emphasizes the development of a comprehensive and transparent pipeline. Beyond the architecture itself, all methodological details—including resampling, patch generation, training configuration, and evaluation—are fully documented to address the reproducibility gap frequently noted in deep learning for remote sensing. The results confirm a clear efficiency–accuracy trade-off, with over 90% fewer parameters compared to conventional CNNs while maintaining competitive accuracy levels. This positions the approach as a baseline framework for large-scale and resource-constrained scenarios, such as edge computing or rapid monitoring. Furthermore, by explicitly acknowledging limitations and outlining future directions—including multi-class land cover mapping, multi-temporal analysis, SAR–optical fusion, and explainable AI—the study provides more than just a new lightweight model: it offers a methodological foundation for advancing reproducible and operational land cover monitoring.
A key contribution lies in the efficiency–accuracy trade-off. As detailed in Table 6, the ablation configurations demonstrate that increasing network depth or kernel size yields marginal accuracy gains (<0.3%) at substantially higher parameter counts. These results empirically substantiate the efficiency–accuracy trade-off that defines the proposed architecture. While heavier CNN architectures such as ResNet or VGG often report slightly higher accuracies [10,19], they demand considerably more computational resources, which restricts their use in operational or resource-limited environments. By contrast, the proposed lightweight CNN achieved nearly equivalent performance with over 90% fewer parameters, highlighting its potential for deployment in large-scale monitoring or edge-computing scenarios.
Recent advances in Edge AI confirm that deploying lightweight deep learning models on resource-constrained platforms is feasible and increasingly relevant for operational remote sensing [31].
The comparative evaluation against an NDVI baseline further underscores the advantages of CNN-based approaches. NDVI, despite its popularity and simplicity [2], achieved only 78% overall accuracy in this study. Misclassifications were mainly concentrated in transitional zones characterized by sparse vegetation or mixed soil–vegetation pixels. These findings confirm the structural limitations of index-based methods, which are unable to fully capture the spectral–spatial complexity of heterogeneous landscapes [5,6]. CNNs, on the other hand, leverage joint spectral–spatial feature learning, thereby providing superior robustness in such conditions.
Another important aspect of this work is methodological transparency. Many previous studies have been criticized for insufficient reporting of preprocessing details, training–validation splits, or hyperparameter settings [22]. Here, we explicitly documented the full workflow, from atmospheric correction and resampling to patch extraction, training configuration, and evaluation metrics, thus supporting reproducibility and comparability. Such transparency is increasingly emphasized as a requirement for advancing the reliability of remote sensing research [24,25,32].
The inclusion of the ablation component further enhances methodological transparency by explicitly showing how individual architectural choices influence the final model performance, thereby strengthening the reproducibility and interpretability of the proposed pipeline.
Nevertheless, several limitations must be acknowledged. First, the analysis was restricted to a binary classification (soil vs. vegetation). Extending the methodology to multi-class problems, including crops, forests, and urban areas, would provide broader applicability. Second, the evaluation was conducted for a single region (Dolj County, Romania) and within a single growing season. Multi-temporal analyses and cross-regional validations are necessary to assess generalizability across diverse agro-ecological contexts. Third, the study relied exclusively on Sentinel-2 optical imagery. Incorporating complementary data sources such as SAR could mitigate the effects of cloud cover and improve discrimination in spectrally ambiguous conditions [26]. Finally, while lightweight CNNs improve computational efficiency, they remain limited in terms of interpretability. Index-based approaches such as NDVI, although less accurate, are easier to explain to end users. Future research should therefore explore explainable AI (XAI) methods to enhance the transparency and trustworthiness of CNN-based land cover monitoring [16].
In summary, this study makes three main contributions: (i) it demonstrates that a lightweight CNN can achieve competitive accuracy with far fewer parameters, (ii) it ensures methodological transparency to facilitate reproducibility, and (iii) it contextualizes performance against both traditional indices and recent deep learning studies. Together, these contributions provide a methodological foundation for advancing scalable, efficient, and transparent deep learning pipelines in remote sensing applications.
6. Conclusions
This study introduced a lightweight convolutional neural network (CNN) pipeline for soil–vegetation classification using Sentinel-2 imagery, applied to Dolj County, Romania. The proposed model, with fewer than 150,000 trainable parameters, achieved an overall accuracy of 91.20% and a Cohen’s kappa of 0.82. These results are competitive with state-of-the-art CNNs containing millions of parameters, while being over 90% lighter in terms of computational requirements. By implementing a fully documented preprocessing and training workflow, including resampling of 20 m bands to 10 m resolution with bilinear interpolation, the study also contributes to methodological transparency and reproducibility.
The main contributions of this work are threefold: (i) full methodological transparency, ensuring reproducibility from preprocessing to evaluation, (ii) demonstration of a strong efficiency–accuracy trade-off, achieving high accuracy with a compact model architecture, and (iii) comparative evaluation against NDVI, heavier CNNs, and recent literature, highlighting the advantages of lightweight architectures for operational scalability.
Nevertheless, some limitations must be acknowledged. The study was restricted to a binary classification task (soil vs. vegetation) and to a single geographic region (Dolj County, Romania). Broader validation across multiple land cover classes (e.g., crops, forests, and urban areas), geographic regions, and temporal contexts is necessary to fully establish generalizability. Moreover, only optical Sentinel-2 imagery was used. Integrating complementary data sources, such as synthetic aperture radar (SAR), could improve classification robustness under cloudy or mixed-cover conditions. Finally, while lightweight CNNs provide efficiency gains, interpretability remains limited compared to index-based methods.
Future work should therefore focus on extending the approach to multi-class and multi-temporal land cover mapping, testing transferability across regions, and exploring SAR–optical data fusion. In addition, explainable AI (XAI) methods may help enhance the interpretability and transparency of CNN-based models.
Recent studies have also proposed lightweight multi-temporal models for crop monitoring with Sentinel-2, demonstrating promising results for time-series applications [33].
Overall, the findings demonstrate that lightweight, reproducible CNNs offer a promising direction for scalable and efficient land cover monitoring. The complementary ablation analysis empirically validated the robustness of the proposed three-block architecture, confirming that compact CNNs can achieve a reliable balance between accuracy and efficiency for operational use. This study provides a methodological foundation that can be extended and adapted in future research addressing more complex and diverse classification tasks in operational remote sensing.