Article

Optimising Deep Learning-Based Segmentation of Crop and Soil Marks with Spectral Enhancements on Sentinel-2 Data

1 Center for Cultural Heritage Technology, Istituto Italiano di Tecnologia, 31056 Treviso, Italy
2 Department of Environmental Sciences, Informatics and Statistics, Università Ca' Foscari, 30171 Venezia, Italy
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(24), 4014; https://doi.org/10.3390/rs17244014
Submission received: 16 October 2025 / Revised: 26 November 2025 / Accepted: 8 December 2025 / Published: 12 December 2025

Highlights

What are the main findings?
  • The study presents the first systematic evaluation of how spectral enhancement techniques applied to Sentinel-2 imagery influence deep learning models for detecting palaeochannel-related soil and crop marks. Among the tested approaches, the multi-temporal composite (MV) consistently achieved the highest segmentation accuracy.
  • Seasonal variability strongly affects detection performance: early growth and post-harvest periods provide the most favourable conditions, while peak vegetation severely reduces visibility and segmentation accuracy across all enhancement techniques.
What are the implications of the main findings?
  • The results demonstrate that incorporating spectral enhancement techniques and seasonally tailored preprocessing strategies significantly improves the robustness and precision of deep learning-based palaeochannel detection workflows.
  • By highlighting the interplay between spectral transformations, seasonal conditions, and model behaviour, this study establishes a new benchmark for integrating enhancement methods into AI-driven prospection pipelines, supporting more accurate, scalable, and season-adaptive applications in archaeological and environmental remote sensing.

Abstract

This study presents the first systematic investigation into the influence of spectral enhancement techniques on the segmentation accuracy of specific soil and vegetation marks associated with palaeochannels. These marks are often subtle and can be seasonally obscured by vegetation dynamics and soil variability. Spectral enhancement methods, such as spectral indices and statistical aggregations, are routinely applied to improve their visual discriminability and interpretability. Despite recent progress in automated detection workflows, no prior research has rigorously quantified the effects of these enhancement techniques on the performance of deep learning–based segmentation models. This gap at the intersection of remote sensing and AI-driven analysis is critical, as addressing it is essential for improving the accuracy, efficiency, and scalability of subsurface feature detection across large and heterogeneous landscapes. In this study, two state-of-the-art deep learning architectures, U-Net and YOLOv8, were trained and tested to assess the influence of these spectral transformations on model performance, using Sentinel-2 imagery acquired across three seasonal windows. Across all experiments, spectral enhancement techniques led to clear improvements in segmentation accuracy compared with raw multispectral inputs. The multi-temporal Median Visualisation (MV) composite provided the most stable performance overall, achieving mean IoU values of 0.22 ± 0.02 in April, 0.07 ± 0.03 in August, and 0.19 ± 0.03 in November for U-Net, outperforming the full 12-band Sentinel-2 stack, which reached only 0.04, 0.02, and 0.03 in the same periods. FCC and VBB also performed competitively, e.g., FCC reached 0.21 ± 0.02 (April) and VBB 0.18 ± 0.03 (April), showing that compact three-band enhancements consistently exceed the segmentation quality obtained from using all spectral bands. Performance varied with environmental conditions, with April yielding the highest accuracy, while August remained challenging across all methods. These results highlight the importance of seasonally informed spectral preprocessing and establish an empirical benchmark for integrating enhancement techniques into AI-based archaeological and geomorphological prospection workflows.

1. Introduction

This study investigates how spectral enhancement techniques affect the performance of deep learning models for detecting subtle subsurface features in multispectral satellite imagery. Although spectral enhancements, such as band transformations, contrast adjustments, or index-based composites, have long been applied to improve visual inspection of remote sensing data, their relevance to modern data-driven pipelines remains underexplored. This work takes an exploratory approach to determine whether spectral preprocessing improves model accuracy and temporal robustness or instead introduces biases that degrade generalisation under different environmental conditions.
The identification of subsurface features in satellite imagery often relies on the detection of ‘soil marks’ and ‘crop marks’, spectral anomalies produced by alterations in vegetation stress, moisture content and soil composition [1,2,3]. These signals are typically weak, short-lived, and seasonal in nature, and highly sensitive to environmental factors, making their identification challenging on raw remote sensing data. To enhance the visibility of these features and their separability against the background, various image enhancement techniques, including band selection and combination, band math operations, statistical transformations, and convolution-based filters, have long been used [4,5,6,7]. These techniques were initially developed to support visual interpretation or classical machine-learning approaches (e.g., Random Forests, SVMs), which can readily incorporate numerous spectral bands and indices without architectural restrictions [8,9]. With the introduction of Deep Learning, the field has shifted toward the use of Convolutional Neural Networks (CNNs) and related architectures capable of learning hierarchical spatial and spectral representations directly from raw multi-band data [10]. Those methods have demonstrated significant improvements in terms of accuracy, computational efficiency, and scalability across a wide range of tasks [11,12,13,14,15,16]. However, it remains unclear whether these spectral enhancements provide similar benefits to deep learning models or whether they introduce distortions that ultimately hinder automated detection. Conventional CNN architectures often accept only a reduced number of input channels due to practical constraints related to parameter count, computational cost, and the risk of overfitting, especially in challenging tasks when the number of spectral bands is large relative to the training samples [17,18]. As a result, spectral transformations and preprocessing steps that are highly effective in classical ML workflows may not produce the same benefits for standard deep-learning architectures without careful adaptation.
Preprocessing and image enhancement techniques have already been explored in deep learning workflows for subsurface feature applications, particularly in LiDAR-based segmentation. LiDAR detection tasks are more mature than tasks on multispectral images due to the stability of feature visibility, reduced seasonal variability, and stronger topographic expression [19,20,21]. Although enhancement methods in LiDAR workflows have produced promising results, they have also shown inconsistent or task-dependent effects on performance, often driven by data characteristics and by the response of different model architectures [22,23,24]. This variability is particularly important because it highlights that even in a modality where visibility is stable and well-standardised, preprocessing does not guarantee improved performance. Multispectral imagery differs fundamentally from LiDAR. Whereas LiDAR preprocessing primarily enhances geometric information, such as elevation or surface structure, by filtering, normalisation, or noise removal, multispectral enhancement alters spectral reflectance values that depend on dynamic environmental conditions. The fundamentally different nature of LiDAR’s structural signals and multispectral spectral responses means that preprocessing may interact with feature detection in different and less predictable ways. Soil and crop marks in multispectral data are less distinct, seasonal, and often obscured by vegetation or land management patterns, creating additional challenges for automated detection. To date, no systematic evaluation has assessed the influence of spectral preprocessing on deep learning models for segmenting subsurface features in multispectral data, leaving open questions about both its utility and potential biases.
Beyond evaluating whether spectral enhancement techniques improve detectability in absolute terms, it is also necessary to assess whether their effectiveness is consistent across different seasonal conditions. Temporal robustness is critical because soil and crop marks are not uniformly visible throughout the year, and they depend on dynamic environmental variables such as vegetation cover, crop phenology, soil moisture, and land management practices [25,26,27,28,29].
To account for these fluctuations, previous studies have typically addressed seasonal variability by selecting imagery acquired during “optimal windows” of maximum visibility and repeating visual inspection across multiple dates to maximise detection potential [30,31,32]. More recent approaches have introduced automated procedures, such as local statistical metrics that quantify the spectral separability between features of interest and their background, allowing a more objective assessment of detectability under different seasonal conditions [28,33,34,35,36,37]. However, most workflows still rely largely on manual methods and expert interpretation, which remain time-consuming, difficult to scale, and poorly suited to large-scale or long-term studies and monitoring.
Deep learning offers an opportunity to overcome these limitations by automating feature identification across temporally variable imagery. It also enables a systematic assessment of whether spectral enhancement techniques mitigate or amplify seasonal effects. For example, a band transformation may enhance buried features when vegetation stress is high but suppress them under bare-soil conditions, effectively altering the contrast across seasons. Evaluating enhancement methods within a deep learning framework, therefore, provides not only a measure of detectability but also a means to test their stability and generalisability across environmental conditions. By explicitly considering seasonality and its interaction with spectral enhancement, this study aims to provide a more rigorous understanding of when and how preprocessing benefits automated detection of subsurface features.
To this end, we introduced a benchmark framework based on multispectral Sentinel-2 imagery to evaluate how spectral enhancement interacts with deep learning models for the semantic segmentation of palaeochannels. Palaeochannels, the relic traces of former fluvial systems that dried up due to climatic, geological or human-induced factors, provide an ideal case study because they produce low-contrast, seasonally variable spectral signatures [38,39]. Their widespread distribution and diversity of environmental settings provide a rigorous test bed for assessing whether spectral preprocessing can improve model robustness under diverse spatial and temporal scenarios [40]. The study areas, located in two coastal plains in northeastern and southeastern Italy, provide a representative range of geomorphological and agricultural conditions in which palaeochannel visibility varies considerably. This variability is shaped by both the natural diversity of alluvial landscapes and modern land-management practices, which alter the expression of crop and soil marks. To account for these influences, imagery from three periods of the year was included, capturing seasonal changes in vegetation, soil moisture, and land use. This multitemporal perspective enables us to evaluate how spectral enhancement techniques perform under different environmental conditions and their impact on the detectability of palaeochannel traces. This study pursues three primary objectives:
  • Evaluate the impact of spectral enhancement techniques on deep learning segmentation performance in multispectral imagery, quantifying whether and how preprocessing improves the detectability of subtle soil and crop marks from subsurface features.
  • Assess the temporal robustness of enhancement techniques, determining how seasonal variability, vegetation phenology, and soil conditions affect model performance and whether certain preprocessing strategies are stable across environmental conditions.
  • Establish a benchmark framework for systematically comparing spectral preprocessing methods within deep learning workflows, providing a reproducible and generalisable foundation for future studies.
This article is structured as follows. Section 2.1 presents the study area. Section 2.2 details the datasets, preprocessing methods, deep learning architectures, training strategies, and evaluation metrics. Section 3 reports experimental results, including analyses of seasonal variation and preprocessing effects. Section 4 discusses the implications of these findings for multispectral feature detection and provides recommendations for future applications. Finally, Section 5 concludes with a summary of contributions and key insights.

2. Materials and Methods

2.1. Study Area

For this study, three areas in Italy were selected in large coastal plains along the Adriatic Sea: in northeastern Italy, the Veneto and Friuli Venezia Giulia plains and the Po Delta plain; in southern Italy, the Salpi Lagoon plain (Figure 1). These areas were chosen to capture a representative range of geomorphological and agricultural settings where palaeochannel visibility varies due to both natural and anthropogenic factors. The coastal plains are characterised by intensive agriculture with extensive cereal and forage cultivation, often irrigated, where spatial fragmentation and seasonal land-cover dynamics strongly influence the detectability of palaeochannel traces. The selection of these zones also considered their utility for the segmentation experiments, splitting training and validation areas to include a balanced representation of palaeochannel instances, and defining a geographically independent test area to evaluate model generalisation to previously unseen landscapes (see Section 2.2.4).
The study areas are largely characterised by Holocene alluvial deposits formed during successive aggradational phases of major fluvial systems [41,42,43,44]. In this context, the primary palaeochannel-related landforms include abandoned meander loops, relict channel belts, levee–ridge complexes, and crevasse splays. These features result from the gradual vertical and lateral accretion of channel sands, levee deposits, and bar complexes over finer-grained overbank sediments, which are dominated by silt and clay. As a result, palaeochannels and associated ridges often appear as slightly elevated features, contrasting with the more compressible and subsidence-prone floodplain. The Holocene cover, however, rests on Late Pleistocene (LGM) deposits consisting of coarser, sandy and gravelly alluvial fan sediments, which, although partially buried, can exhibit more pronounced erosional or depositional relief where preserved.
From a remote sensing perspective, palaeochannel traces can be detected in optical imagery as subtle variations in tone, texture, moisture, and vegetation patterns. Coarser sandy channel fills and levee ridges may retain less moisture and support distinct crop or vegetation growth, producing spectral and phenological contrasts in multispectral imagery. Conversely, finer-grained abandoned channels may accumulate moisture and organic-rich soils, appearing as darker, wetter, or more vegetated anomalies, as confirmed by other studies that used visual inspections of remote sensing data in the same geographic area [45,46]. Although local variability exists in terms of soil type and vegetation among the study areas, visual inspection of palaeochannel traces revealed consistent forms and patterns across all three study areas examined.
Palaeochannels hold significant interdisciplinary value as their study informs research in geomorphology, archaeology, and hydrogeology, revealing past hydrological regimes, human-environment interactions, and landscape evolution [41,42,43,44]. Furthermore, the information derived from palaeochannel mapping contributes to sustainable land-use planning, water resource management, and environmental conservation, underscoring their relevance not only to academic research but also to practical applications in heritage preservation and ecosystem management [47].

2.2. Methodology

To address the research questions, the workflow was organised into successive stages of data preparation, spectral enhancement, and model training. Sentinel-2 imagery was processed with a range of enhancement techniques to improve feature visibility, and deep learning models were then applied for the automated detection of palaeochannels (Figure 2).
In the first stage, Sentinel-2 satellite imagery was acquired and processed to produce five different spectral enhancement products (Section 2.2.1 and Section 2.2.2). To account for seasonal variability, these enhancements were computed across three different periods of the year, resulting in a total of fifteen distinct raster products from the combination of each visualisation and each period.
In the second stage, palaeochannel labels were manually annotated for each image of the dataset to account for the volatile nature of the traces, resulting in fifteen corresponding label datasets (Section 2.2.3).
In the final stage, datasets were prepared for deep learning applications and partitioned into training and validation sets. Two deep learning architectures for semantic segmentation, U-Net and YOLOv8, were implemented and rigorously evaluated on a separate test area to assess their detection performance (Section 2.2.4, Section 2.2.5 and Section 2.2.6).

2.2.1. Imagery Acquisition

Sentinel-2 satellite imagery (European Space Agency, Copernicus Programme) from three distinct months of 2022, April, August, and November, was acquired and processed using Google Earth Engine [48]. These months were selected to balance the need for accurate labelling with the temporal variability of agricultural landscapes. Since each period requires manual annotation, limiting the analysis to three representative months reduces the annotation workload while still capturing meaningful variability, following the strategy adopted in [49]. In that study, the authors assessed the visibility of 30 randomly selected traces within the northeastern coastal plain of Italy (area partially overlapping with this study) and identified two major periods of high visibility: April and November, along with the adjacent months, as well as one period of lower visibility, August, intentionally selected to divide the year into three evenly spaced observation windows. Accordingly, the selected months in our study follow a similar distribution across the year, enabling us to capture different phenological stages, vegetation cover, soil moisture conditions, and periods of bare soil exposure, all of which are critical for influencing the detectability of palaeochannel traces under variable visibility conditions. At the same time, this strategy minimises redundant information, since consecutive months often show limited variations in their spectral signatures. In the study area, April marks the beginning of the crop cycle, during which agricultural fields exhibit minimal crop coverage as crops are still in their early growth stages. August corresponds to the period of highest vegetation coverage, largely due to artificial irrigation. November represents the onset of post-harvest conditions and reduced vegetation coverage. To quantitatively measure the amount of vegetation in the agricultural fields during each period, the mean NDVI value was computed, and images were ranked. The results confirmed that August had the highest vegetation coverage, with a mean NDVI of 0.36, followed by April at 0.26 and November at 0.21.
To obtain cloud-free coverage across the large and scattered target areas, multiple Sentinel-2 acquisitions were mosaicked from the level 2A collection, which provides surface reflectance values. Acquisition dates were kept within a 10-day window to ensure comparable vegetation and environmental conditions. For each selected month, a cloud-free image mosaic was then created, filtered using cloud and cirrus pixel masks.
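For illustration, this acquisition step can be sketched with the Google Earth Engine Python API. The snippet below is a minimal example under stated assumptions, not the exact script used in this study: the area-of-interest geometry and date range are placeholders, and cloud and cirrus masking uses the standard QA60 quality band.

```python
import ee

ee.Initialize()

def mask_s2_clouds(img):
    # QA60 bits 10 and 11 flag opaque clouds and cirrus, respectively.
    qa = img.select("QA60")
    clear = qa.bitwiseAnd(1 << 10).eq(0).And(qa.bitwiseAnd(1 << 11).eq(0))
    return img.updateMask(clear)

aoi = ee.Geometry.Rectangle([12.0, 45.3, 13.5, 46.0])  # placeholder extent

# Level-2A surface reflectance, mosaicked within a 10-day window (April example).
april = (ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
         .filterBounds(aoi)
         .filterDate("2022-04-10", "2022-04-20")
         .map(mask_s2_clouds)
         .mosaic()
         .clip(aoi))
```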

2.2.2. Spectral Enhancements

Five image enhancement products were developed using the 10 m resolution bands of the Sentinel-2 imagery. All the products generated in this step were constrained into a 3-band image stack format to comply with the input requirements of the deep learning model architecture (see Section 2.2.4 for details). From the visible spectrum (wavelengths 490–665 nm), band 4 (central wavelength at 665 nm), band 3 (central wavelength at 560 nm), and band 2 (central wavelength at 490 nm) were utilised to create a three-band RGB composite, providing a true-colour representation. Additionally, the near-infrared band 8 (central wavelength at 842 nm) was utilised to generate a False Colour Composite (FCC) by stacking band 8, band 3, and band 2. This composite was specifically designed to enhance vegetation patterns and soil moisture variations, which are more pronounced in the NIR spectrum [5,35,50].
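Assuming the `april` mosaic from the acquisition sketch above, the two composites reduce to simple band selections (band names follow the Earth Engine Sentinel-2 convention):

```python
rgb = april.select(["B4", "B3", "B2"])  # true colour: red, green, blue
fcc = april.select(["B8", "B3", "B2"])  # false colour: NIR, green, blue
```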
A preliminary visual inspection was also conducted on several spectral indices and processing techniques commonly used in archaeological and geomorphological prospections to highlight differences in vegetation vigour by exploiting the spectral distance between green, red and near-infrared bands. To limit the computational load of this study, only spectral enhancements that provided clear visual improvements over standard RGB and FCC inspection were retained; the others were discarded because of their inconsistent performance across different parts of the territory and across the three periods, which often degraded the results.
Additionally, we developed a new image composite that included selected bands from the HSV (Hue, Saturation, and Value) colour space transformation [51] and the Tasselled Cap Transformation (TCT) [52,53,54], two widely used techniques that have shown good performance in similar contexts for visual inspection [40,55]. From these transformations, only the most informative products were selected, based on a thorough visual inspection of the results. This inspection indicated that the Value band of the HSV transformation provided the clearest visibility for palaeochannel traces, while, among the three TCT products, the Brightness band contributed most significantly to their identification. By combining the best products from both transformations, we created a new three-layer stack image named VBB, including the Value band, the Brightness band of the TCT, and band 2 (Blue) from Sentinel-2 imagery. Band 2 was included due to its strong capacity to enhance palaeochannel visibility in bare soil conditions.
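As an illustration of how such a stack can be assembled, the sketch below builds a VBB composite from locally loaded reflectance arrays. It assumes bands scaled to [0, 1]; the Tasselled Cap Brightness coefficients are placeholders to be taken from the TCT literature cited above, and the function name is hypothetical.

```python
import numpy as np

def vbb_composite(bands, brightness_coeffs):
    """bands: dict of 2-D reflectance arrays in [0, 1], keyed by Sentinel-2 band name.
    brightness_coeffs: dict mapping band name -> TCT Brightness coefficient (from the literature)."""
    r, g, b = bands["B4"], bands["B3"], bands["B2"]
    # The HSV Value channel is the per-pixel maximum of the RGB components.
    value = np.maximum(np.maximum(r, g), b)
    # TCT Brightness is a weighted sum of the reflectance bands.
    brightness = sum(coef * bands[name] for name, coef in brightness_coeffs.items())
    # Stack Value, Brightness and band 2 (Blue) into the 3-band VBB image.
    return np.stack([value, brightness, b], axis=0)
```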
An additional spectral enhancement technique was designed to capture the temporal variability expressed in short time series, with the objective of minimising the influence of image noise and atmospheric disturbance present in single acquisitions. This method, referred to as Median Visualisation (MV), enhances stable and persistent spectral signals in both soil and vegetation. MV is generated by calculating the per-band median of reflectance across cloud-free images within a two-month window, centred on the acquisition dates used for single-image enhancements. For this calculation, only the RGB bands were selected. This process returned three composite images: a ‘Spring’ image, derived from the median of images centred around mid-April; a ‘Summer’ image, centred around mid-August; and an ‘Autumn’ image, centred around mid-November (seasonal labels follow the Northern Hemisphere calendar) (Figure 3 and Figure 4).
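A minimal Earth Engine sketch of the MV compositing is shown below for the ‘Spring’ image; the dates are illustrative of a two-month window centred on mid-April, and `aoi` and `mask_s2_clouds` are as defined in the acquisition sketch above.

```python
spring_mv = (ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
             .filterBounds(aoi)
             .filterDate("2022-03-15", "2022-05-15")  # two-month window centred on mid-April
             .map(mask_s2_clouds)
             .select(["B4", "B3", "B2"])              # RGB bands only
             .median()                                 # per-band, per-pixel median
             .clip(aoi))
```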
To evaluate the performance of the deep learning models, a 12-band image stack was created, including all relevant spectral bands from Sentinel-2. This image was designed to maximise the spectral information available to the model by utilising the full range of raw data. The objective was to assess how effectively the model could leverage this broad spectral information to identify and detect key features directly from minimally processed imagery, without relying on any data preprocessing or enhancement techniques.
As the visual inspection revealed a marked reduction in palaeochannel visibility in August due to vegetation greenness, soil dryness, and lower spectral contrast, an additional set of vegetation indices was applied to improve the interpretability in this month. To increase the spectral contrast between vegetated and non-vegetated surfaces, the Normalised Difference Vegetation Index (NDVI) was computed from the red and near-infrared bands, exploiting their strong reflectance disparity to improve the visibility of subtle vegetation marks. To reduce the influence of exposed soils, the Modified Soil-Adjusted Vegetation Index (MSAVI) was included. MSAVI reduces soil-background interference and provides more stable performance in areas with sparse or stressed vegetation. In addition, the Enhanced Vegetation Index (EVI) was employed owing to its improved sensitivity in high-biomass conditions and its ability to reduce atmospheric noise through optimised correction factors [56,57].
The primary objective was to evaluate whether vegetation indices could suppress background noise and isolate subtle hydro-morphological patterns during the peak vegetation period (August), where raw RGB or multispectral imagery provides limited contrast, as shown in Figure 5. While several palaeochannel traces remain detectable in the raw August RGB image—primarily where bare soil or sparse vegetation is present—the vegetation indices suppress most soil-based signals, causing many palaeochannels to disappear and leaving only a limited subset of features associated with active vegetation contrast. This demonstrates the reduced interpretability of vegetation-based indices for palaeochannel detection during summer conditions. Following visual inspection, only NDVI was retained for further evaluation, as all three indices (NDVI, MSAVI, EVI) exhibited the same limitations: loss of bare-soil palaeochannels and minimal enhancement of vegetated traces. Despite this restricted visibility, NDVI was carried forward into the deep learning experiments to assess whether a model could still leverage its vegetation-driven contrast. NDVI was reformatted into three-band composite stacks to maintain compatibility with the deep learning input format and was specifically designed to compensate for the reduced spectral separability observed in August, thereby improving the visibility of palaeochannel traces during this challenging seasonal phase.
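For completeness, the NDVI product can be derived directly from the monthly mosaic. How the index was arranged into three bands is illustrated here by replicating it across channels; this replication is an assumption for illustration rather than a documented choice.

```python
ndvi = april.normalizedDifference(["B8", "B4"]).rename("NDVI")
# Replicate the single index into a 3-band stack to match the model input format
# (one possible arrangement; the exact composition is an assumption).
ndvi_stack = ee.Image.cat([ndvi, ndvi, ndvi])
```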
Regarding computational costs, all image enhancements were generated using the Google Earth Engine platform. While most enhancements can also be computed locally using standard band-math operations, the median visualisation (MV) is particularly suited to cloud-based processing. Unlike the other products, MV requires accessing multiple acquisitions, and downloading these image stacks locally would be time-consuming. In cloud-computing environments, this step is efficient, and the time required to generate the enhancements is negligible compared to the overall cost of training deep learning models.

2.2.3. Palaeochannels Dataset

Each enhancement technique was applied to imagery of the three distinct periods. Dataset labelling was conducted in ArcGIS Pro (version 3.1.1; ESRI, Redlands, CA, USA) using polygonal features and underwent multiple stages of cross-checking and revision to ensure consistency, address any missing instances and minimise false-positive identification. Since the visibility of palaeochannels varied depending on the enhancement techniques and periods (Figure 3 and Figure 4), the dataset required multiple annotations. As a result, 15 distinct sub-datasets were labelled, each generated from combinations of the five enhancement techniques across the three periods. Summary statistics of the labelled sub-datasets are reported in Table 1 for the training set and in Supplementary Materials for the test set (Table S1). The highest count of instances was recorded in April on the MV image, where 2581 features were identified. In November, the number of instances ranged from 1977 to 2061 across the different enhancement techniques. In contrast, the lowest count was recorded in August, where the FCC displayed the highest visibility of palaeochannels, with 1701 instances. To evaluate the 12-band model, labels from the RGB and FCC datasets were merged to incorporate information from the most relevant portions of the spectrum (RGB + NIR). This merging results in a lower absolute number of instances, as geometrically overlapping labels can fuse previously distinct features. The proportion of tiles containing at least one pixel of the palaeochannel class ranges from 58.8% to 69.9% of all tiles, while the ratio of palaeochannel pixels to background pixels varies between 0.76% and 1.27% of the total pixel count. In the test area (Supplementary Material, Table S1), ratios of palaeochannel pixels are higher due to the absence of empty tiles, ranging from 1.48% to 3.77%.

2.2.4. Experimental Setting

To evaluate the robustness of the findings, we employed two training strategies based on image tiling at a resolution of 256 × 256 pixels. The first strategy is based on a geographically informed 80:20 split between training and validation data, resulting in 2807 km² (2409 tiles) for training and 472 km² (464 tiles) for validation (Figure 1). Validation areas were selected within each of the three main regions of the dataset to ensure that all major environmental and geomorphological contexts were represented during model evaluation. This selection also preserved a proportional balance in the number of palaeochannel instances across the two sets, avoiding biases associated with spatial clustering of features. In addition to the training and validation partitions, we defined a geographically distinct test area covering 63 km². This area, which was entirely excluded during model training, enables a realistic assessment of the model’s generalisation capabilities when applied to unseen landscapes that share similar environmental and geomorphological characteristics. Evaluating performance on this separate area provides a stronger indication of the robustness and transferability of the proposed models.
The second training strategy is based on a 5-fold random cross-validation applied to the training region. All tiles are randomly partitioned into five disjoint folds with an approximately homogeneous distribution of palaeochannel instances. At each iteration, one fold is held out for validation, and the remaining four folds are used for training, yielding five independently trained models. This procedure quantifies the variability of model performance under different training subsets and reduces the risk of overfitting to a single, arbitrary train/validation split. All models are then evaluated on the same geographically separate test area, which is not used for training or validation. Because this test region is spatially disjoint from the training area, it provides a direct measure of spatial generalisation, while the cross-validation results capture the sensitivity of the model to changes in training data composition. Aggregating the predictions and metrics over the five folds on the test set, therefore, provides a more robust and stable estimate of performance.
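A minimal sketch of the second strategy, assuming tiles are referenced by index and using scikit-learn's KFold; a stratified split on a "contains palaeochannel" flag would approximate the homogeneous distribution of instances described above, and `train_one_model` is a hypothetical helper wrapping the training loop.

```python
import numpy as np
from sklearn.model_selection import KFold

tile_ids = np.arange(2409)  # tiles of the training region (count from this section)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

fold_models = []
for fold, (train_idx, val_idx) in enumerate(kf.split(tile_ids)):
    train_tiles, val_tiles = tile_ids[train_idx], tile_ids[val_idx]
    # Train one model per fold on the remaining four folds.
    model = train_one_model(train_tiles, val_tiles)
    fold_models.append(model)

# All five models are then evaluated on the same geographically disjoint test area,
# and their metrics are aggregated (mean ± std) for the results tables.
```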
Each visualisation is trained using only the images and labels corresponding to its specific month, resulting in 15 model configurations, with each model being trained and evaluated under each strategy. This design prevents mixing images from different periods, which could otherwise obscure the temporal patterns and spectral features characteristic of each visualisation, ultimately ensuring that the models learn features that are truly representative of each observation period.

2.2.5. Semantic Segmentation Models

Two networks for semantic segmentation, specifically U-Net [58] and YOLOv8 [59], were employed for palaeochannel segmentation. U-Net is particularly effective in scenarios with limited data, owing to its distinctive architecture, which includes a contracting path, bottleneck layer, and expansive path. This structure allows it to capture highly contextual information while preserving fine details. The incorporation of skip connections further enhances accuracy by integrating low-level features [60]. YOLOv8, a variant of the YOLO series, was also used in this study due to its anchor-free detection method, which reduces hyperparameters and improves model scalability [61]. YOLOv8-Seg was included as a complementary architecture to evaluate whether a modern, one-stage, anchor-free segmentation framework could offer advantages in boundary localisation and inference efficiency. YOLOv8 is pre-trained on the COCO (Common Objects in Context) dataset, a widely used large-scale collection of annotated natural images providing rich segmentation and object-level labels that help pretrained models generalise effectively [62], and is available in five models varying in size and parameter count. We experimented with all five models of YOLOv8 (n, s, m, l, and x) and assessed their accuracy in relation to speed. We selected “yolov8m-seg” for its medium size and its favourable balance of accuracy and speed, outperforming the larger and extra-large variants in this specific task.
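Training the selected variant with the Ultralytics API follows the usual pattern; the snippet below is a sketch in which the dataset configuration file and the number of epochs are placeholders, not values reported in this study.

```python
from ultralytics import YOLO

model = YOLO("yolov8m-seg.pt")        # medium segmentation variant, COCO-pretrained
model.train(
    data="palaeochannels.yaml",        # hypothetical dataset definition (images + masks)
    imgsz=256,                         # tile size used in this study
    epochs=100,                        # placeholder training budget
    batch=16,
)
metrics = model.val()                  # validation metrics on the held-out split
```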
The training process for both models was carefully structured to maximise performance. To further optimise model performance, extensive hyperparameter tuning was carried out for both architectures. For U-Net, experiments were conducted with multiple learning rates, batch sizes, and loss-weight combinations to determine the configuration that maximised segmentation accuracy and stability. The tuning process was supported by Optuna, an automated optimisation framework, which systematically searched for the best-performing parameters based on validation IoU. Similarly, for YOLOv8, several trials were performed by adjusting the learning rate, confidence threshold, and data augmentation strength. These experiments ensured that the final training was performed using the most effective hyperparameters, balancing accuracy, generalisation, and computational efficiency.
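The tuning loop can be sketched as follows; `train_unet` is a hypothetical wrapper around the training procedure that returns the validation IoU, and the search ranges and trial budget are illustrative.

```python
import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-3, log=True)
    weight_decay = trial.suggest_float("weight_decay", 1e-5, 1e-2, log=True)
    dice_w = trial.suggest_float("dice_weight", 0.1, 1.0)
    focal_w = trial.suggest_float("focal_weight", 0.1, 1.0)
    # Train with the sampled configuration and report validation IoU.
    return train_unet(lr=lr, weight_decay=weight_decay,
                      dice_weight=dice_w, focal_weight=focal_w)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)  # illustrative trial budget
print(study.best_params)
```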
As a result, U-Net was configured with the mit_b5 encoder, a design partly inspired by Vision Transformer but tailored and optimised for semantic segmentation [63]. The encoder uses weights pre-trained on ImageNet. The U-Net decoder is integrated with the Spatial and Channel ‘Squeeze and Excitation’ (SCSE) attention module, which recalibrates the learned feature maps by boosting the most informative spatial and channel features [64]. The best learning rate was found to be 4.45 × 10⁻⁵, paired with a weight decay of 4.49 × 10⁻³, which provided a stable balance between convergence speed and generalisation. The loss function achieved its best performance when combining the Dice loss [65], which is well-suited for highly unbalanced segmentation tasks, and the Focal loss [66], to further address class imbalance by down-weighting easy negatives and emphasising hard-to-classify palaeochannel pixels, thereby improving the model’s sensitivity to faint and fragmented features. The loss weights used are approximately 0.44 for the Dice loss and 0.25 for the Focal loss. Additionally, optimal results were obtained using the Focal loss with α = 0.66 and γ = 4.95, which increased the weight assigned to difficult pixels and reduced the influence of background pixels. The optimal decision threshold for converting logits into binary masks was 0.26, and a batch size of 16 yielded the most reliable training dynamics. The Adam optimiser was utilised for both models.
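This configuration can be expressed, for example, with the segmentation_models_pytorch library. The snippet below is a sketch assembling the reported hyperparameters, not the authors' exact training script.

```python
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="mit_b5",            # Mix Transformer encoder
    encoder_weights="imagenet",       # ImageNet pre-training
    decoder_attention_type="scse",    # Spatial and Channel Squeeze-and-Excitation
    in_channels=3,                    # 3-band enhancement products
    classes=1,                        # binary palaeochannel mask
)

dice = smp.losses.DiceLoss(mode="binary")
focal = smp.losses.FocalLoss(mode="binary", alpha=0.66, gamma=4.95)

def combined_loss(logits, target):
    # Weighted combination reported above (~0.44 Dice + ~0.25 Focal).
    return 0.44 * dice(logits, target) + 0.25 * focal(logits, target)

optimiser = torch.optim.Adam(model.parameters(), lr=4.45e-5, weight_decay=4.49e-3)
# Logits are binarised with the tuned decision threshold:
# mask = torch.sigmoid(logits) > 0.26
```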
To compare the 3-band spectral enhancements with the raw Sentinel-2 imagery, the U-Net encoder is configured with the full set of available input channels, allowing the model to ingest all spectral information directly. Because no pretrained backbone is used, all encoder and decoder weights, including the first convolutional layer, are initialised from scratch. This avoids the need to adapt RGB-pretrained filters to multispectral data and ensures that every input band is treated symmetrically from the beginning of training. The network therefore learns band-specific representations entirely from the palaeochannel dataset, without any architectural constraints. We also experimented with adapting the RGB-pretrained encoders to our 12-band Sentinel-2 input. We modified only the first convolutional layer by setting in_channels = 12 in the U-Net configuration. This replaces the original 3-channel input kernel with a 12-channel kernel initialised by redistributing the pretrained RGB filters across all 12 input bands while keeping all deeper encoder and decoder weights unchanged. This preserves the benefits of large-scale RGB pretraining and allows the first layer to learn band-specific responses for the additional spectral channels during fine-tuning.
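Both 12-band variants map onto the same library interface, assuming the chosen encoder accepts arbitrary input channel counts; in the pretrained case, the first-layer adaptation described above is delegated to the library when `in_channels` differs from three.

```python
# From-scratch variant: all weights randomly initialised, every band treated symmetrically.
model_12b_scratch = smp.Unet(
    encoder_name="mit_b5",
    encoder_weights=None,
    in_channels=12,
    classes=1,
)

# Pretrained variant: the 3-channel input kernel is replaced by a 12-channel kernel
# derived from the RGB-pretrained filters; deeper weights are kept unchanged.
model_12b_pretrained = smp.Unet(
    encoder_name="mit_b5",
    encoder_weights="imagenet",
    in_channels=12,
    classes=1,
)
```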
We did not extend YOLO to 12 input bands because this would require modifying its backbone and detection head to support non-RGB multispectral inputs, which is non-trivial and would necessitate re-designing and re-training the entire architecture from scratch. In contrast, U-Net naturally supports arbitrary numbers of input channels, allowing us to integrate all 12 Sentinel-2 bands. Since U-Net also showed consistently stronger palaeochannel segmentation performance, we focused our efforts on developing and optimising the multispectral U-Net pipeline.

2.2.6. Metrics and Evaluation

Segmentation performance was evaluated using Intersection over Union (IoU), Precision and Recall metrics. Due to the strong class imbalance in palaeochannel detection, we also included the F1 score, which is less sensitive to background pixels and better highlights true positive detection. The F1 score offers a more balanced assessment of segmentation performance by incorporating both precision and recall, making it especially useful for dealing with class imbalance and overall detection quality [67,68].
$$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{F1\ score} = \frac{2\,TP}{2\,TP + FP + FN}$$
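Expressed directly on binary prediction and ground-truth masks, the four metrics reduce to a few lines; a small epsilon guards against empty denominators.

```python
import numpy as np

def pixel_metrics(pred, target, eps=1e-7):
    """pred, target: binary arrays of identical shape (1 = palaeochannel)."""
    tp = np.sum((pred == 1) & (target == 1))
    fp = np.sum((pred == 1) & (target == 0))
    fn = np.sum((pred == 0) & (target == 1))
    iou = tp / (tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return iou, precision, recall, f1
```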
To systematically assess temporal and visual variability in model performance, we conducted a set of pairwise statistical comparisons using the Wilcoxon signed-rank test, complemented by rank-biserial effect sizes. The analysis was performed independently for U-Net and YOLOv8 and separately for IoU and F1 score, using fold-wise paired observations to ensure robust within-condition comparisons. For each enhancement type (RGB, FCC, VBB, MV, 12-band), we evaluated seasonal differences by testing performance across the three-month pair combinations (April–August, April–November, August–November). Within each month, all enhancement products were also compared pairwise to quantify their relative contribution to segmentation accuracy.
The resulting heatmaps summarise the statistical patterns across conditions, jointly reporting the Wilcoxon p-values, indicating whether performance differences are statistically meaningful (p < 0.05), and the rank-biserial effect sizes, which measure the strength and direction of the difference on a −1 to +1 scale. Considering the significance and the magnitude of the effect together provides a rigorous basis for interpreting how each model responds to seasonal changes and spectral preprocessing. This framework also minimises distortions caused by class imbalance or fold-level variability, ensuring an interpretable and methodologically robust comparison of enhancement strategies.
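A sketch of one paired comparison is given below, computing both the Wilcoxon p-value and the rank-biserial effect size from fold-wise scores; the effect-size computation follows the standard signed-rank formulation and the example values are placeholders, not results from this study.

```python
import numpy as np
from scipy.stats import wilcoxon, rankdata

def paired_comparison(scores_a, scores_b):
    """scores_a, scores_b: fold-wise metrics (e.g. IoU) for two conditions."""
    a, b = np.asarray(scores_a, float), np.asarray(scores_b, float)
    stat, p = wilcoxon(a, b)
    d = a - b
    d = d[d != 0]                      # zero differences are discarded by the test
    ranks = rankdata(np.abs(d))
    w_plus = ranks[d > 0].sum()
    w_minus = ranks[d < 0].sum()
    r_rb = (w_plus - w_minus) / (w_plus + w_minus)  # rank-biserial in [-1, +1]
    return p, r_rb

# Example with placeholder fold-wise IoU values (April vs. August):
p, effect = paired_comparison([0.22, 0.20, 0.23, 0.24, 0.21],
                              [0.07, 0.05, 0.08, 0.09, 0.06])
```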
After selecting the best-performing models based on pixel-level metrics on the independent test area, a post hoc evaluation was conducted. These additional analyses do not influence model selection, but rather provide further insight into the models’ detection ability and the behaviour of each visualisation. An object-level evaluation was performed to assess the models’ ability to correctly detect palaeochannel traces as coherent spatial entities. This perspective is essential in archaeological and geomorphological applications, where the interpretability and usability of a prediction depend on the correct identification and delineation of individual features rather than only on local pixel accuracy. Predicted polygons were matched to ground-truth instances using an overlap criterion based on the Intersection over Union (IoU). A prediction is counted as a true positive (TP) if its IoU with a ground-truth instance exceeds a predefined overlap threshold τ. Ground-truth objects that do not match with at least one prediction above τ are counted as false negatives (FN), while predictions that do not reach τ with any ground-truth object are considered false positives (FP). Polygons with fewer than 20 pixels were not counted, to remove potential salt-and-pepper effects and artefacts. The threshold τ was introduced as a tunable parameter to explore detection robustness under different levels of spatial agreement, which is essential given the fuzzy boundaries and interpretative uncertainty typical of palaeochannel mapping. Object-based metrics were computed over the entire test area, without tile-wise aggregation, to avoid artefacts due to tiling and to reflect performance at the scale at which palaeochannel features are interpreted.
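The matching procedure can be sketched with shapely geometries as follows. A greedy one-to-one assignment is assumed here for illustration, the default τ is a placeholder, and the minimum-size filter converts the 20-pixel threshold into area using the 10 m pixel size.

```python
def object_level_metrics(pred_polys, gt_polys, tau=0.5, min_pixels=20, pixel_area=100.0):
    """pred_polys, gt_polys: lists of shapely Polygons; pixel_area in m^2 (10 m pixels)."""
    preds = [p for p in pred_polys if p.area >= min_pixels * pixel_area]
    matched_gt, tp = set(), 0
    for p in preds:
        best_iou, best_j = 0.0, None
        for j, g in enumerate(gt_polys):
            if j in matched_gt:
                continue
            iou = p.intersection(g).area / p.union(g).area
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou >= tau:                   # true positive if overlap exceeds τ
            tp += 1
            matched_gt.add(best_j)
    fp = len(preds) - tp                      # unmatched predictions
    fn = len(gt_polys) - len(matched_gt)      # unmatched ground-truth objects
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1
```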

3. Results

In April, both models achieved their highest segmentation performance, with MV emerging as the strongest enhancement for both U-Net and YOLOv8 (Table 2 and Table 3). For U-Net, MV produced the best overall results, achieving an IoU of 0.22 ± 0.02, the highest recall (0.31 ± 0.08), and the strongest F1 score (0.36 ± 0.02). FCC performed closely, with an IoU of 0.21 ± 0.02 and an F1-score of 0.35 ± 0.03, while RGB reached the highest precision (0.61 ± 0.09) but at the cost of lower recall. YOLOv8 followed the same pattern: MV again produced the top IoU (0.09 ± 0.02), recall (0.17 ± 0.04), and F1 score (0.18 ± 0.05), while FCC yielded the highest precision (0.54 ± 0.10). The NDVI showed the lowest IoU among the enhancements (0.05 ± 0.01), while the overall lowest IoU was observed in 12-band imagery (0.04 ± 0.03).
The statistical comparison (Figure 6a,b) confirmed that April consistently outperformed August for nearly all enhancements, with significant p-values: for example, U-Net VBB APR–AUG p = 0.031, effect size rb = 1.00, and FCC APR–AUG p = 0.031, indicating a clear seasonal advantage. Differences between April and November were generally weaker but still evident (e.g., U-Net MV APR–NOV, p = 0.031, rb = 1.00), and consistent across VBB and FCC. U-Net also significantly outperformed YOLOv8 in April for several enhancements (Figure 6c,d), especially for FCC (APR p = 0.031) and VBB (APR p = 0.031), reflecting U-Net’s consistent numerical advantage.
In August, segmentation accuracy declined sharply for both architectures due to reduced spectral contrast and dense vegetation. For U-Net, the highest IoUs were obtained by RGB (0.07 ± 0.02) and MV (0.07 ± 0.03), while FCC dropped to 0.06 ± 0.01 and VBB to 0.02 ± 0.01. VBB delivered the highest precision (0.43 ± 0.23) but extremely low recall, whereas MV showed the most balanced behaviour, obtaining the highest recall (0.10 ± 0.05) and the strongest F1 score (0.14 ± 0.05). YOLOv8 showed the same trend: RGB achieved the highest IoU (0.04 ± 0.01), FCC gave the strongest precision (0.46 ± 0.21), and MV again produced the best recall (0.05 ± 0.03) and F1 score (0.07 ± 0.04). The minimum IoU was produced by the 12-band imagery (0.02 ± 0.01).
The comparison confirmed that August was statistically the weakest month, with significant disadvantages relative to April across nearly all enhancements (Figure 6a,b): for example, U-Net VBB APR–AUG p = 0.031, FCC APR–AUG p = 0.031, and MV APR–AUG p = 0.062. Most AUG–NOV comparisons were nonsignificant (e.g., U-Net RGB AUG–NOV p = 0.688), indicating that both months shared low-signal conditions. Model-to-model comparisons also showed little significant difference in August: for instance, YOLOv8 vs. U-Net under MV gave p = 0.062, consistent with the uniformly weak performance (Figure 6e).
In November, segmentation performance improved again for both models. For U-Net, MV remained the strongest enhancement, achieving an IoU of 0.19 ± 0.03, the highest recall (0.26 ± 0.11), and the highest F1 score (0.32 ± 0.05). FCC produced the highest precision (0.614 ± 0.15) and a moderate IoU (0.08 ± 0.02), while VBB showed stable performance (IoU 0.09 ± 0.02, precision 0.58 ± 0.22). YOLOv8 again ranked MV as the best enhancement for IoU (0.06 ± 0.03) and precision (0.51 ± 0.09), while recall was slightly higher for RGB (0.08 ± 0.05) and VBB (0.08 ± 0.03). The highest F1 score came from VBB (0.13 ± 0.03), followed closely by MV (0.11 ± 0.05). The lowest IoU was produced by NDVI (0.04 ± 0.01) among the enhancements, while the overall lowest performance was shown by 12-band imagery (0.03 ± 0.02) (Table 2 and Table 3).
November significantly outperformed August for most enhancements, with low p-values: for example, U-Net FCC AUG–NOV p = 0.062, VBB AUG–NOV p = 0.031, matching the observed improvement in numerical values. April–November differences were mixed (e.g., MV APR–NOV p = 0.031, FCC APR–NOV p = 0.031), consistent with the moderate differences in mean values. U-Net significantly outperformed YOLOv8 for several enhancements in November—especially FCC (p = 0.062)—while MV and VBB showed smaller or nonsignificant differences, reflecting their similar values (Figure 6c,d).
Object-level evaluation confirmed the same seasonal pattern observed in pixel-level metrics (Table 4). In April, MV achieved the highest detection ability (F1_obj = 0.45 ± 0.10), followed closely by VBB (0.43 ± 0.11) and FCC (0.39 ± 0.11). Performance dropped substantially in August, where only MV retained moderate object-level detectability (0.24 ± 0.07), while VBB and FCC dropped below 0.10. In November, FCC (0.26 ± 0.06) and MV (0.25 ± 0.05) provided comparably strong results, confirming that temporal compositing and contrast-based enhancements yield more coherent palaeochannel detections (Table 4). Additional information on precision and recall, together with the complete behaviour curves as a function of the threshold τ, is provided in the Supplementary Materials for each visualisation (Figures S8–S19).

4. Discussion

4.1. Impact of Spectral Enhancement Techniques on Semantic Segmentation Models

The evaluation of spectral enhancements shows that the effectiveness of each product is strongly linked to the physical mechanisms encoded in its construction. MV consistently outperformed the other enhancements, likely because it integrates multi-temporal reflectance information and suppresses atmospheric noise and scene-specific artefacts, producing stable composites where palaeochannel traces remain visible even when individual images are noisy or partially obscured. This multi-temporal compositing effect was reflected in both the quantitative metrics and comparative tests, which showed MV maintaining significant advantages across most spectral enhancements, particularly in April and November, when this seasonal composite captured persistent soil–vegetation contrasts. FCC also demonstrated strong and consistent performance. Including the near-infrared band enhances the differentiation of vegetation and soil moisture, which correlates with subsurface hydro-morphology, making channel outlines more distinct during vegetated periods. VBB exhibits stable and competitive performance, combining the most informative components of HSV and the Tasseled Cap Transformation, both of which emphasise brightness and albedo variations associated with buried features. Together, these results indicate that enhancements which reinforce persistent soil-moisture structure (MV), highlight NIR-driven vegetation–soil contrast (FCC), or emphasise brightness/albedo gradients (VBB) provide the most useful inputs for deep learning segmentation. In contrast, RGB and especially vegetation indices (e.g., NDVI) encode narrower spectral responses, limiting their ability to preserve the subtle spatial gradients through which palaeochannels appear, a pattern confirmed by their systematically lower Wilcoxon effect sizes (shown in Figure 7, Figure 8, Figure 9 and Figure 10).
Finally, the raw 12-band Sentinel-2 input showed the lowest overall performance, confirming that higher spectral dimensionality alone does not translate into improved segmentation. The poor results likely stem from spectral redundancy, noise amplification, and mixed spatial resolution among the Sentinel-2 bands, which collectively expand the learning space without enhancing discriminative power. As noted in previous work, multispectral data often requires dimensionality reduction or targeted band selection to be effective [69,70,71]. The comparisons underlined the weak performance of 12B, indicating statistically meaningful disadvantages relative to all composite-based enhancements.
Visual inspection showed that NDVI and related vegetation indices almost completely suppressed palaeochannel traces in August, retaining only strong vegetation responses and discarding the soil-based contrasts on which the traces rely. Consequently, the index-derived images exhibited minimal correspondence with the ground-truth features. When these images were used as inputs to the segmentation models, performance dropped sharply. These results indicate that vegetation indices provide insufficient palaeochannel signal for effective segmentation, particularly in areas that are not fully vegetated.
Taken together, these findings demonstrate that spectral enhancement is not merely auxiliary preprocessing but a decisive component of the segmentation pipeline that fundamentally shapes the detectability of palaeochannels. MV provides the most robust performance overall, while FCC and VBB serve as strong alternatives. The consistent superiority of these enhancements across U-Net and YOLOv8 highlights their broader utility and supports the importance of benchmarking spectral transformations in multispectral deep learning workflows for archaeological and geomorphological applications.

4.2. Role of Seasonal Variations in Palaeochannel Detection

The analysis of seasonal variability highlights the strong dependence of palaeochannel visibility on phenological and soil conditions, and how these environmental shifts modulate the behaviour of deep learning segmentation models. Across both architectures, April consistently emerged as the most favourable period for palaeochannel detection. The combination of minimal vegetation cover, high soil exposure, and strong spectral contrast created conditions in which features were most clearly expressed, resulting in the highest segmentation quality and the largest number of detected instances. April significantly outperformed August, with large effect sizes across MV, FCC, and VBB, indicating that pre-harvest conditions amplify the spectral cues that both models rely on (Figure 7, Figure 8, Figure 9 and Figure 10). These findings align with previous studies showing that early crop growth stages maximise crop-mark visibility [72,73,74].
November produced intermediate performance across the models. While vegetation die-off contributed to partial feature recovery, the absence of crop-related spectral enhancement mechanisms meant that palaeochannel signatures were less pronounced than in April. MV continued to produce the most visually coherent segmentations, while RGB occasionally recovered more structure in YOLOv8 than FCC or VBB, although this effect was inconsistent and strongly model-dependent. The comparisons (Figure 7, Figure 8, Figure 9 and Figure 10) showed that April and November were more balanced overall, with November occasionally matching April in FCC-enhanced imagery. This highlights that soil exposure alone is not always sufficient to reveal palaeochannel morphology; instead, crop-induced spectral differences play a more substantial role in strengthening feature detectability.
August represented the most challenging period for both architectures. Dense summer vegetation, higher soil moisture heterogeneity, and reduced spectral contrast significantly obscured palaeochannel expressions, resulting in the lowest segmentation performance across all enhancement types. Although RGB appeared comparatively less affected in YOLOv8, this effect was marginal and did not translate into meaningful recoverability of channel structures. U-Net showed slightly more resilience, with RGB and MV performing similarly, but overall feature detection remained weak. The comparisons (Figure 7, Figure 8, Figure 9 and Figure 10) confirmed that August consistently underperformed, with strong negative effect sizes when compared to April and negligible differences relative to November. This underscores the substantial masking effect of vegetation and environmental noise, making palaeochannel detection during peak vegetation stages extremely limited.
Architectural differences further shaped seasonal sensitivity. U-Net’s encoder–decoder design is inherently well-suited to capturing diffuse, curvilinear, and low-contrast patterns, allowing it to benefit more from the enhanced visibility present in April and November. Conversely, YOLOv8’s feature extraction remains influenced by object-centric, edge-driven representations inherited from its detection ancestry, making it less responsive to the broad seasonal shifts that favour palaeochannel visibility. This architectural mismatch helps explain why YOLOv8 shows weaker seasonal contrast and less benefit from sophisticated spectral enhancements, particularly in months where palaeochannel traits are faint or irregular.
Despite the modest absolute performance values achieved in challenging months, qualitative inspection demonstrates that deep learning models can still recover intricate morphological structures when spectral and seasonal conditions are favourable. However, low recall across all enhancements in difficult periods highlights the persistent issue of missed detections, which is a limitation tied to both environmental constraints and the diffuse nature of palaeochannel signatures. These observations reinforce the need for multi-temporal strategies and carefully selected acquisition windows to reliably capture subsurface geomorphological features.

4.3. Post Hoc Evaluation

The comparison between pixel-level and object-level performance highlights clear differences in how consistently each visualisation supports palaeochannel detection. Object-level scores are generally higher than pixel-level ones, indicating that the models capture the presence and overall shape of palaeochannels rather than precise boundary details. The RGB composite shows the weakest and least consistent results, with low scores in both metrics and limited agreement between them. This suggests that RGB alone does not provide enough spectral contrast to reliably identify palaeochannel traces under different seasonal conditions. The FCC visualisation performs better, with improvements at the pixel level reflected at the object level, and performance remains relatively stable across all periods. This indicates that adding NIR information helps capture soil and vegetation differences in a way that remains consistent throughout the year. The VBB enhancement performs well in April but becomes less reliable in August and November, showing larger variations and reduced stability. Overall, the MV composite proved the most robust. It achieves the highest and most coherent scores across both metrics and maintains stable performance across all months.

4.4. General Remarks on the Challenges of the Detection Task

The relatively low pixel-based performance obtained in our experiments must be interpreted in relation to the intrinsic complexity of the dataset. Even though all study areas represent coastal plain environments, the palaeochannel traces show substantial variation due to their different geomorphological contexts. This variability is further increased by extensive modern land-reclamation works, which often reuse ancient channels for contemporary drainage or irrigation and further alter the natural configuration of the landscape. Agricultural variability adds another important source of heterogeneity: crop type, seasonal growth stage, and rotation patterns differ across regions and years, producing changing background textures and colours. Although bare-soil periods tend to improve visibility, even experienced interpreters may struggle to distinguish palaeochannels within the highly fragmented agricultural landscape, which interrupts the continuity of the traces: they often appear broken, incomplete, or obscured by more prominent modern elements such as field boundaries, irrigation lines, or drainage marks.
Uncertainty in the annotation process is a major challenge for deep learning models, both during training and when evaluating predictions. Many palaeochannel traces have soft or fading edges, showing a gradual transition into the surrounding alluvial plain. Because these boundaries are not clearly defined, different annotators may interpret the limits in different ways, providing inconsistent examples to the model. Strict pixel-based metrics such as IoU or pixel-level F1 treat these labels as if the boundaries were precise and unambiguous, which is unrealistic for this type of geomorphological feature, and they therefore tend to underestimate the true quality of the predictions. The object-based metrics, however, reveal that the models generally detect the presence and overall shape of the palaeochannels even when they fail to predict precise edges. This suggests that the models have learned the main structural patterns, but their accuracy decreases when fine-scale delineation is required in highly variable or noisy landscapes. Examples of uncertainties in the labelling stage are visible in the overlay maps on the test set (Figure 11), which show the distribution of true positives, false positives, and false negatives. Several false-positive detections (in red) lie in direct continuity with the mapped palaeochannel traces, indicating that they may correspond to genuine features that were not annotated in the ground truth. Despite these issues, from an archaeological and geomorphological perspective, such predictions remain useful as a first layer of interpretation and can guide more detailed, expert-driven analysis.
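Overlay maps of this kind are straightforward to reproduce; the minimal sketch below colours true positives, false positives, and false negatives from a binary prediction and its ground truth, following the colour convention of Figure 11.

```python
import numpy as np

def overlay_map(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Return an RGB image: white = TP, red = FP, green = FN, black = background."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    rgb = np.zeros((*pred.shape, 3), dtype=np.uint8)
    rgb[pred & gt] = (255, 255, 255)   # true positives
    rgb[pred & ~gt] = (255, 0, 0)      # false positives
    rgb[~pred & gt] = (0, 255, 0)      # false negatives
    return rgb
```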
Finally, it is essential to emphasise that the primary objective of this study was not to optimise performance, but to compare different visualisation enhancements and temporal conditions under a controlled and reproducible experimental design, thereby minimising sources of bias. From this perspective, methodological consistency was prioritised over maximising individual scores. More robust modelling approaches could involve training on multi-temporal or multi-condition images simultaneously, allowing the model to learn complementary spectral and textural cues. This may stabilise predictions across varying backgrounds. Nonetheless, without sufficiently large and balanced datasets, such strategies may also increase sensitivity to site-specific characteristics.
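Purely as an illustration of this multi-temporal strategy, the sketch below stacks enhancement composites from several acquisition windows along the channel axis before they are passed to a segmentation network; the file names are hypothetical and the rasterio-based loading is an assumption, not part of the pipeline used in this study.

```python
import numpy as np
import rasterio

def load_multitemporal_stack(paths):
    """Stack single-period enhancement rasters into one multi-channel array (C_total, H, W)."""
    bands = []
    for path in paths:                      # e.g. ["mv_april.tif", "mv_august.tif", "mv_november.tif"]
        with rasterio.open(path) as src:
            bands.append(src.read())        # each read returns an array of shape (C, H, W)
    return np.concatenate(bands, axis=0)    # fed to a model configured for the stacked channel count
```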

5. Conclusions

This study advances the automated detection of palaeochannels by demonstrating that spectral enhancement strategies, when carefully selected, substantially improve deep learning segmentation compared to the direct use of raw multispectral data. The evaluation highlighted that enhancement products are not interchangeable: their ability to emphasise palaeochannel continuity or suppress noise directly influenced model outputs, underscoring the need for tailored preprocessing in archaeological and geomorphological applications. Equally important is the role of environmental seasonality. By analysing multiple acquisition periods, the study shows that vegetation growth cycles strongly constrain detectability, establishing that optimal timing is as critical as spectral choice for reliable mapping. More broadly, the integration of spectral optimisation, seasonally aware imagery, and multitemporal analysis constitutes a transferable framework that strengthens the robustness of palaeochannel detection.
Despite the computational intensity of these experiments and the effort required to generate dedicated label sets for each combination of enhanced image and temporal period, the methodology adopted establishes a solid foundation for future work, with plans to expand its scope in several directions. The use of freely and globally available Sentinel-2 data ensures the reproducibility of the experiments and encourages their application to case studies worldwide. Future research should broaden the range of enhancement techniques used in deep learning models by incorporating additional widely used products and spectral indices that may further improve palaeochannel detection. While some indices were excluded from our experiments due to low performance, this does not necessarily imply that they are ineffective in other contexts. Their utility may vary under different geographical, geomorphological, and environmental conditions, or when applied to the detection of other features that produce soil and crop marks, such as subsoil archaeological structures. Additionally, exploring the potential of bands in the SWIR range, such as B11 and B12, despite their lower spatial resolution, could provide critical insights into their utility for soil-mark detection.
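As a hedged illustration of how such additional indices could be derived, the sketch below computes NDVI and a SWIR-based moisture index (NDMI, using band B11) from co-registered Sentinel-2 reflectance arrays; the choice of indices and the assumption that the 20 m SWIR band has been resampled to 10 m are illustrative only.

```python
import numpy as np

def ndvi(b08: np.ndarray, b04: np.ndarray) -> np.ndarray:
    """Normalised Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (b08 - b04) / (b08 + b04 + 1e-6)

def ndmi(b08: np.ndarray, b11: np.ndarray) -> np.ndarray:
    """Normalised Difference Moisture Index: (NIR - SWIR1) / (NIR + SWIR1), with B11 resampled to 10 m."""
    return (b08 - b11) / (b08 + b11 + 1e-6)
```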
This research focuses on classical preprocessing methods applied prior to model training, recognising that these techniques reflect long-standing remote sensing practices that have not been tested against learning-based systems. At the same time, it acknowledges alternative strategies, such as using enhancement as data augmentation or implementing learnable spectral transforms within the model architecture, which represent promising directions beyond the scope of this initial benchmark. By establishing baseline evidence on the effects of traditional spectral enhancement, this work provides a foundation for future research into end-to-end or adaptive spectral learning and clarifies whether long-standing assumptions about image preprocessing remain valid in the era of deep learning.
Beyond the local context, the findings carry broader environmental and economic implications. Accurate palaeochannel detection informs sustainable land-use planning, guides water resource management, and supports conservation of floodplain ecosystems. In agricultural landscapes, understanding the spatial distribution of former channels can also inform irrigation planning, soil management, and crop rotation strategies, potentially improving yield efficiency while minimising environmental impact. More broadly, the framework combining spectral enhancement, seasonal imagery, and multitemporal analysis is transferable to other coastal plains worldwide, offering a reproducible approach for detecting subtle subsurface features, also beyond palaeochannels, and supporting interdisciplinary applications in geomorphology, archaeology, and heritage management.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs17244014/s1, Table S1: Summary statistics of the test set labels; Figures S1–S8: Statistical comparisons for U-Net IoU results and YOLOv8 results (F1 and IoU); Figures S9–S19: Object-level predictions and metrics.

Author Contributions

Conceptualisation, A.Y., G.P., S.V. and A.T.; methodology, A.Y. and G.P.; software, A.Y.; validation, A.Y. and G.P.; data curation, A.Y.; writing—original draft preparation, A.Y. and G.P.; writing—review and editing, S.V., A.T. and G.P.; supervision, S.V. and A.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting this study’s findings are available from the corresponding author, A.T., upon reasonable request.

Acknowledgments

We gratefully acknowledge the Data Science and Computation Facility of the Istituto Italiano di Tecnologia for their support and for providing access to the IIT High-Performance Computing infrastructure used for the modelling work.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Selected coastal plain used for training, validation, and testing. Validation areas are geographically separate subsets within the training regions, while a distinct test area is in a different region to assess model generalisation. (a) Overview map of Italy showing the study areas; (b) the Veneto and Friuli plains, Po Delta plain, and the test area; (c) the Salpi lagoon plain. Basemap: ESRI World Imagery.
Figure 2. Workflow of the proposed methodology for creating spectral enhancements from Sentinel-2 data and assessing their impact on deep learning model performance.
Figure 3. Seasonal variation in palaeochannels on MV images: (a) April; (b) August; (c) November. White arrows highlight notable differences in the appearance and continuity of palaeochannels across periods.
Figure 4. Different visual appearances of palaeochannel traces across the enhanced images: (a) RGB; (b) FCC; (c) VBB; (d) MV; (e) NDVI. To improve visualisation, a histogram stretch was applied to enhance the contrast of crop and soil marks.
Figure 5. Comparison of palaeochannel visibility across seasonal RGB imagery and vegetation indices for the same area in the training set. Panels (a–c) show RGB composites from April, August, and November, with the corresponding ground-truth palaeochannels overlaid in red to illustrate seasonal differences in surface expression. Panels (d–f) present the NDVI, MSAVI, and EVI visualisations computed for August.
Figure 6. F1 score comparison across months (a,b), models (c,d), and enhancements (e,f), using p-values and rank-biserial correlations.
Figure 7. Sample tile from the test set to compare predicted masks on the MV enhancement image. Each row represents a different season ((a): Spring; (b): Summer; (c): Winter). Columns from left to right: MV image, ground truth image (white: palaeochannel instances), predicted masks, and overlay comparison (green: false negative, red: false positives, yellow: true positive).
Figure 8. Comparison of palaeochannel detection results on the FCC visualisation. Each row represents a different month ((a): April, (b): August, (c): November). Columns from left to right: FCC visualisation, ground truth labels, model predictions, and overlay comparison (green: false negative, red: false positives, yellow: true positive).
Figure 9. Comparison of palaeochannel detection results on the RGB visualisation. Each row represents a different month ((a): April, (b): August, (c): November). Columns from left to right: RGB visualisation, ground truth labels, model predictions, and overlay comparison (green: false negative, red: false positives, yellow: true positive).
Figure 10. Comparison of palaeochannel detection results on the VBB visualisation. Each row represents a different month ((a): April, (b): August, (c): November). Columns from left to right: VBB visualisation, ground truth labels, model predictions, and overlay comparison (green: false negative, red: false positives, yellow: true positive).
Figure 11. Predicted masks overlying ground truth labels in the test set for the MV enhancement, assessed using an object-detection metric (threshold = 0.5). True positives (white), false positives (red), and false negatives (green).
Table 1. Summary statistics of the annotated palaeochannel dataset for each enhanced visualisation and observation period in the training set. Number of instances indicates the count of individual annotated polygons. Instance area is reported in pixels (Sentinel-2 10 m resolution; 1 pixel = 100 m2). Median value and IQR range are reported. Tiles with events report the percentage of tiles containing at least one pixel of the target class over the total training-area tiles. Event coverage denotes the proportion of palaeochannel pixels relative to the total number of pixels in the dataset.
| Month | Visualisation | N° of Instances | Area of Instances (Pixels, Median [IQR]) | Tiles w/ Event (% of Total Tiles) | Event Coverage (% of Total Pixels) |
|---|---|---|---|---|---|
| April | RGB | 2474 | 125 [73, 247] | 59.0 | 1.10 |
| April | FCC | 2011 | 131 [66, 264] | 60.9 | 1.17 |
| April | VBB | 2285 | 125 [61, 248] | 58.8 | 1.10 |
| April | MV | 2581 | 128 [67, 244] | 67.0 | 1.30 |
| April | (RGB + FCC) | 2432 | 136 [70, 265] | 61.5 | 1.25 |
| August | RGB | 1699 | 94 [46, 185] | 60.9 | 0.86 |
| August | FCC | 1701 | 91 [45, 181] | 61.8 | 0.87 |
| August | VBB | 1588 | 92 [47, 181] | 57.9 | 0.76 |
| August | MV | 1608 | 93 [47, 181] | 58.2 | 0.76 |
| August | (RGB + FCC) | 1667 | 98 [48, 194] | 62.1 | 0.91 |
| November | RGB | 1977 | 135 [72, 247] | 58.9 | 1.22 |
| November | FCC | 1956 | 131 [72, 233] | 68.8 | 1.19 |
| November | VBB | 1977 | 140 [77, 257] | 68.8 | 1.23 |
| November | MV | 2061 | 132 [72, 236] | 69.9 | 1.27 |
| November | (RGB + FCC) | 1791 | 137 [76, 255] | 69.9 | 1.27 |
Table 2. Performance of U-Net across enhancements and months, reported as mean ± standard deviation for IoU, precision, recall, and F1. Best mean values per month are highlighted in bold.
| Enhancement | Month | IoU | Precision | Recall | F1 |
|---|---|---|---|---|---|
| MV | April | 0.22 ± 0.02 | 0.51 ± 0.11 | 0.31 ± 0.08 | 0.36 ± 0.02 |
| MV | August | 0.07 ± 0.03 | 0.32 ± 0.13 | 0.10 ± 0.05 | 0.14 ± 0.05 |
| MV | November | 0.19 ± 0.03 | 0.50 ± 0.11 | 0.26 ± 0.11 | 0.32 ± 0.05 |
| RGB | April | 0.09 ± 0.4 | 0.61 ± 0.9 | 0.11 ± 0.7 | 0.17 ± 0.08 |
| RGB | August | 0.07 ± 0.02 | 0.30 ± 0.1 | 0.09 ± 0.04 | 0.13 ± 0.03 |
| RGB | November | 0.06 ± 0.04 | 0.52 ± 0.27 | 0.07 ± 0.06 | 0.11 ± 0.07 |
| FCC | April | 0.21 ± 0.02 | 0.47 ± 0.05 | 0.29 ± 0.04 | 0.35 ± 0.03 |
| FCC | August | 0.06 ± 0.01 | 0.27 ± 0.05 | 0.07 ± 0.02 | 0.12 ± 0.03 |
| FCC | November | 0.08 ± 0.02 | 0.61 ± 0.15 | 0.09 ± 0.03 | 0.15 ± 0.04 |
| VBB | April | 0.18 ± 0.03 | 0.05 ± 0.11 | 0.26 ± 0.11 | 0.32 ± 0.05 |
| VBB | August | 0.03 ± 0.01 | 0.43 ± 0.23 | 0.03 ± 0.03 | 0.04 ± 0.03 |
| VBB | November | 0.09 ± 0.02 | 0.58 ± 0.22 | 0.10 ± 0.04 | 0.16 ± 0.05 |
| 12Band | April | 0.04 ± 0.03 | 0.09 ± 0.05 | 0.10 ± 0.05 | 0.09 ± 0.06 |
| 12Band | August | 0.02 ± 0.01 | 0.19 ± 0.06 | 0.04 ± 0.05 | 0.10 ± 0.03 |
| 12Band | November | 0.03 ± 0.02 | 0.15 ± 0.07 | 0.03 ± 0.03 | 0.05 ± 0.4 |
| NDVI | April | 0.05 ± 0.01 | 0.32 ± 0.12 | 0.08 ± 0.04 | 0.13 ± 0.05 |
| NDVI | August | 0.03 ± 0.01 | 0.25 ± 0.10 | 0.05 ± 0.03 | 0.08 ± 0.03 |
| NDVI | November | 0.04 ± 0.01 | 0.30 ± 0.11 | 0.07 ± 0.03 | 0.11 ± 0.04 |
Table 3. Performance of YOLOv8 across enhancements and months, reported as mean ± standard deviation for IoU, precision, recall, and F1. Best mean values per month are highlighted in bold.
| Enhancement | Month | IoU | Precision | Recall | F1 |
|---|---|---|---|---|---|
| MV | April | 0.09 ± 0.02 | 0.42 ± 0.07 | 0.17 ± 0.04 | 0.18 ± 0.05 |
| MV | August | 0.03 ± 0.02 | 0.18 ± 0.13 | 0.05 ± 0.03 | 0.07 ± 0.04 |
| MV | November | 0.06 ± 0.03 | 0.51 ± 0.09 | 0.07 ± 0.04 | 0.11 ± 0.05 |
| RGB | April | 0.06 ± 0.05 | 0.26 ± 0.07 | 0.09 ± 0.07 | 0.12 ± 0.08 |
| RGB | August | 0.04 ± 0.01 | 0.24 ± 0.15 | 0.05 ± 0.04 | 0.07 ± 0.03 |
| RGB | November | 0.06 ± 0.02 | 0.44 ± 0.07 | 0.08 ± 0.05 | 0.01 ± 0.05 |
| FCC | April | 0.07 ± 0.02 | 0.54 ± 0.10 | 0.08 ± 0.02 | 0.14 ± 0.05 |
| FCC | August | 0.02 ± 0.01 | 0.46 ± 0.21 | 0.03 ± 0.03 | 0.05 ± 0.03 |
| FCC | November | 0.04 ± 0.02 | 0.47 ± 0.09 | 0.05 ± 0.04 | 0.08 ± 0.05 |
| VBB | April | 0.06 ± 0.03 | 0.32 ± 0.09 | 0.07 ± 0.03 | 0.12 ± 0.05 |
| VBB | August | 0.006 ± 0.004 | 0.02 ± 0.01 | 0.01 ± 0.009 | 0.01 ± 0.009 |
| VBB | November | 0.07 ± 0.02 | 0.45 ± 0.10 | 0.08 ± 0.03 | 0.13 ± 0.03 |
Table 4. Pixel-level F1 scores and object-level F1 scores, reported as mean and standard deviation, were computed using overlap thresholds of 0.3, 0.5, and 0.7. (*) indicate models derived from training strategy 1 (5-fold cross-validation); (‡) denote models obtained from training strategy 2 (geographical split).
| Month | Enhancement | F1 Score | F1_Obj Score |
|---|---|---|---|
| April | RGB ‡ | 0.27 | 0.24 ± 0.08 |
| April | FCC * | 0.39 | 0.39 ± 0.11 |
| April | VBB * | 0.37 | 0.43 ± 0.11 |
| April | MV * | 0.39 | 0.45 ± 0.10 |
| August | RGB * | 0.18 | 0.20 ± 0.06 |
| August | FCC * | 0.16 | 0.18 ± 0.07 |
| August | VBB * | 0.05 | 0.07 ± 0.01 |
| August | MV * | 0.18 | 0.24 ± 0.07 |
| November | RGB * | 0.13 | 0.16 ± 0.04 |
| November | FCC * | 0.21 | 0.26 ± 0.06 |
| November | VBB * | 0.22 | 0.21 ± 0.08 |
| November | MV * | 0.31 | 0.25 ± 0.05 |