2.1. Study Area and PlanetScope Imagery
The 15 × 15 km study area was chosen to complement the Northern Croatia site, providing a contrasting parcel-size distribution and contrasting pedological characteristics. Whereas the earlier, compact area of interest highlighted smallholder fragmentation, this Slavonija site contains intensive arable blocks interspersed with peri-urban strips, allowing a stratified evaluation by parcel area (0–10,000 m², 10,000–100,000 m², and >100,000 m²). To further assess generalization, we additionally evaluated the workflow on a second area of interest (AOI) in Northern Croatia representing a smallholder-dominated landscape; the results are provided in Appendix C.
No ancillary layers were used; all inputs were derived from PlanetScope satellite imagery. The dense PlanetScope revisit increases the likelihood of capturing asynchronous phenology among neighbouring fields, which is expected to enhance temporal contrast at parcel boundaries in the HSV composites.
A one-year PlanetScope SuperDove time series from October 2024 to October 2025 was assembled. Scenes were cloud-free according to provider flags and passed visual quality control. No additional per-pixel haze or shadow correction was applied beyond the provider quality masks and manual screening; therefore, residual thin haze, undetected cloud edges, and cast shadows may persist in parts of the imagery and may introduce localized artefacts in the vegetation index time series and the subsequent harmonic descriptors. Missing observations were not interpolated; harmonic parameters were fitted using the available valid samples only. The images were received orthorectified, atmospherically corrected, and harmonized to Sentinel-2 data as surface reflectance at native 3 m resolution. All scenes were delivered in EPSG:32634. The scenes were obtained through Planet’s Education and Research Program.
The spatial extent of the AOI is shown in Figure 1, using a PlanetScope RGB composite. The landscape is predominantly agricultural, organized into elongated, rectangular field units. A transportation corridor traverses the AOI diagonally, dividing it into two areas. The northern sector has a finer spatial grain, consisting of smaller agricultural parcels, linear settlement patterns aligned with road networks, a meandering river, and scattered woodland features. The southern sector is dominated by larger arable fields, with forested patches occurring intermittently. Vegetated surfaces are represented by darker green, while bare or sparsely vegetated soils appear as lighter, reddish-brown tones. Field boundaries are mainly inferred from spectral and tonal discontinuities associated with crop phenological stages and management practices rather than from clearly defined physical boundaries.
SuperDove’s eight bands, namely coastal blue, blue, green, green I, yellow, red, red-edge (RE), and near-infrared (NIR), enable the calculation of indices that exploit the green and red-edge regions in addition to the classic red and NIR contrasts, supporting chlorophyll-sensitive formulations [11,12,13]. Instrument characterisations and validation studies confirm the added spectral utility of the yellow and red-edge bands in terrestrial and aquatic applications, motivating tests beyond traditional indices towards green and red-edge variants that may strengthen boundary contrast where crops diverge in canopy structure or chemistry [11,12,13]. The band set is used for the per-index calculations that build the harmonic descriptors from which a false colour composite is created. Leveraging all bands provides an index-agnostic basis for subsequent harmonic recolouring and segmentation.
2.2. Indices Calculation
Vegetation indices provide compact, interpretable summaries of canopy status. In this study, eleven indices are evaluated: the Normalized Difference Vegetation Index (NDVI) [14], the Green Normalized Difference Vegetation Index (GNDVI) [14], the Normalized Difference Red-Edge Index (NDRE) [15], the Enhanced Vegetation Index (EVI) [16] and the Two-band Enhanced Vegetation Index (EVI2) [16], the Soil-Adjusted Vegetation Index (SAVI, L = 0.5) [17] and the Modified Soil-Adjusted Vegetation Index (MSAVI) [18], the Normalized Difference Water Index (NDWI) [19], the Chlorophyll Index–Green (CIg) [15] and the Chlorophyll Index–Red-Edge (CIre) [15], and the Normalized Difference Yellow Vegetation Index (NDYVI) [20]. Collectively, these indices span complementary sensitivities: structural greenness and canopy vigour, pigment and nitrogen proxies via green and red-edge leverage, robustness to soil background in high biomass, explicit soil suppression in sparse cover, separation of water influences, and band-specific contrasts that exploit yellow and red-edge sensitivity. The diversity of indices is expected to yield different boundary expressions at field edges once their harmonic phase, amplitude, and mean are encoded into the HSV colour space.
All vegetation indices were computed per date, per pixel from PlanetScope surface reflectance on a common 3 m grid. The same scene set was used for every index. Spectral symbols follow SuperDove naming: BLUE, GREEN, GREEN-I, YELLOW, RED, RED-EDGE (RE), and NIR. Formulas and sources for all indices are provided in Table 1.
These eleven indices serve as inputs for the harmonic analysis, with each index modelled independently to extract phase, amplitude, and mean per pixel before HSV recolouring and segmentation.
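For illustration, the sketch below computes ten of the eleven indices per pixel using their commonly published textbook forms. It assumes that reflectance has already been scaled to unitless values, that GREEN (not GREEN-I) is the green band, and that NDWI follows the green/NIR formulation; these choices, the function name `vegetation_indices`, and the small `eps` guard are assumptions and may differ from the exact formulas in Table 1. NDYVI is omitted because its formula is not reproduced in this section.

```python
import numpy as np


def vegetation_indices(blue, green, red, re, nir):
    """Textbook forms of ten indices (NDYVI omitted); inputs are reflectance arrays."""
    eps = 1e-9  # assumption: guards against division by zero on very dark pixels

    def nd(a, b):
        # Generic normalized difference (a - b) / (a + b).
        return (a - b) / (a + b + eps)

    return {
        "NDVI": nd(nir, red),
        "GNDVI": nd(nir, green),
        "NDRE": nd(nir, re),
        "EVI": 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0),
        "EVI2": 2.5 * (nir - red) / (nir + 2.4 * red + 1.0),
        "SAVI": 1.5 * (nir - red) / (nir + red + 0.5),  # L = 0.5
        "MSAVI": (2 * nir + 1 - np.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2,
        "NDWI": nd(green, nir),  # assumed green/NIR form
        "CIg": nir / (green + eps) - 1.0,
        "CIre": nir / (re + eps) - 1.0,
    }
```

Because every index is a pure function of the per-date band arrays, the same call can be applied to each scene of the stack before the harmonic analysis.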
2.3. Harmonic Analysis
The harmonic analysis follows the framework validated in Papić et al. (2025) [8], is presented as an end-to-end processing pipeline in Figure 2, and is applied independently to each of the eleven indices in Table 1. This section provides the implementation details required for reproducibility across all workflow stages. Starting from PlanetScope imagery, scenes are quality-screened (provider cloud flags and manual quality control), reprojected to a common grid, and assembled into a temporal stack. Vegetation indices are computed per date and are normalized temporally. The resulting time series are summarized by fitting a single annual harmonic to obtain three descriptors, phase (timing), amplitude (seasonality), and mean (baseline), which are then mapped to the HSV colour space and exported as a recoloured false colour composite optimized for field boundary delineation and used as input for SAM. The resulting instance masks are vectorized, filtered, and evaluated using pixel-wise and object-wise accuracy metrics.
Per-pixel index trajectories $x_k$ ($k = 1, \ldots, n$ acquisitions) undergo two-stage modelling. First, a linear trend in years since the first observation is removed to suppress slow cross-scene drifts that can bias a single sinusoid. The time in years since the first image, $s_k$, is defined as:

$$ s_k = \frac{t_k - t_1}{365.2422} $$

where $t_1$ is the ordinal day of the first valid acquisition, $t_k$ is the ordinal day of acquisition $k$, and 365.2422 is the mean tropical year length in days. The model is fit as:

$$ x_k = \beta_0 + \beta_1 s_k + \varepsilon_k $$

where $\beta_0$ is the intercept, $\beta_1$ is the linear trend, and $\varepsilon_k$ is a zero-mean random error term capturing unexplained variability at acquisition $k$. The parameters $\beta_0$ and $\beta_1$ are estimated using ordinary least squares (OLS), and the detrended residual at acquisition $k$ is then formed as:

$$ r_k = x_k - \left(\hat{\beta}_0 + \hat{\beta}_1 s_k\right) $$
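Seasonal timing is represented on a centred calendar angle. The mean ordinal day $\bar{t}$ is defined over the set of valid dates $\mathcal{K}$, and the annual angular time $\theta_k$, in radians, is formulated as:

$$ \bar{t} = \frac{1}{|\mathcal{K}|}\sum_{k \in \mathcal{K}} t_k, \qquad \theta_k = \frac{2\pi\,(t_k - \bar{t})}{365.2422} $$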
The detrended series is then approximated by:

$$ r_k \approx a_0 + a_1 \cos\theta_k + b_1 \sin\theta_k $$

where $a_0$ is a constant bias that can remain after detrending, and $a_1$ and $b_1$ are the cosine and sine coefficients of the first harmonic. The coefficients $a_0$, $a_1$, and $b_1$ are estimated by OLS independently for each pixel and each index. Only the fundamental component is used; thus, the phase is directly interpretable on the calendar circle, and the descriptors are comparable across indices. Amplitude $A$ summarizes the seasonal range, phase $\phi$ (in radians) encodes the calendar timing of the seasonal maximum on the annual circle, and the empirical mean $\bar{x}$ of the original index over the valid dates $\mathcal{K}$ is derived as:

$$ A = \sqrt{a_1^2 + b_1^2}, \qquad \phi = \operatorname{atan2}(b_1, a_1), \qquad \bar{x} = \frac{1}{|\mathcal{K}|}\sum_{k \in \mathcal{K}} x_k $$

For $\bar{x}$, the empirical mean is used rather than the harmonic intercept $a_0$. The detrending and annual harmonic fitting are executed independently for each index, which preserves index-specific seasonality, so that the one-year trace of each index at each pixel is summarized by a single annual harmonic. To further contextualize the performance of the proposed harmonic-HSV encoding, two RGB-based baselines were also evaluated using the same SAM configuration and post-processing pipeline; the results are shown in Appendix D.
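The two-stage model can be sketched per pixel with NumPy least squares. Because $a_1\cos\theta + b_1\sin\theta = A\cos(\theta - \phi)$ with $A = \sqrt{a_1^2 + b_1^2}$ and $\phi = \operatorname{atan2}(b_1, a_1)$, the fitted harmonic peaks at $\theta = \phi$, which is why the phase marks the timing of the seasonal maximum. The function name `harmonic_descriptors` and the use of `np.linalg.lstsq` are illustrative assumptions rather than the authors' code; in practice the fit would be vectorized over the whole raster.

```python
import numpy as np

TROPICAL_YEAR = 365.2422  # mean tropical year length in days


def harmonic_descriptors(t, x):
    """Return (phase, amplitude, mean) for one pixel's index time series.

    t: ordinal days of the acquisitions; x: index values (NaN marks a missing
    observation, which is dropped rather than interpolated).
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    valid = np.isfinite(x)
    t, x = t[valid], x[valid]

    # Stage 1: remove a linear trend in years since the first valid acquisition.
    s = (t - t.min()) / TROPICAL_YEAR
    design_trend = np.column_stack([np.ones_like(s), s])
    beta, *_ = np.linalg.lstsq(design_trend, x, rcond=None)
    r = x - design_trend @ beta  # detrended residuals r_k

    # Stage 2: fit a0 + a1*cos(theta) + b1*sin(theta) on the centred calendar angle.
    theta = 2.0 * np.pi * (t - t.mean()) / TROPICAL_YEAR
    design_harm = np.column_stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])
    (a0, a1, b1), *_ = np.linalg.lstsq(design_harm, r, rcond=None)

    amplitude = float(np.hypot(a1, b1))
    phase = float(np.arctan2(b1, a1))  # radians in (-pi, pi]; harmonic peaks at theta = phase
    return phase, amplitude, float(x.mean())  # empirical mean, not the intercept a0
```

A purely linear series yields an amplitude close to zero, since the first stage absorbs it entirely, whereas a cosine centred on $\bar{t}$ is recovered with phase close to zero.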
2.4. Perceptual Recolouring
The three harmonic descriptors, phase $\phi$, amplitude $A$, and mean $\bar{x}$, are converted into a Hue–Saturation–Value (HSV) image used as input for segmentation. The mapping follows the perceptual recolouring rationale established by Papić et al. [8], with AOI-wide scaling to preserve cross-tile consistency.
Hue $H$ is defined from the annual phase $\phi$ and encodes relative seasonal timing:

$$ H = \frac{\phi \bmod 2\pi}{2\pi} $$

Here, $H = 0$ and $H = 1$ coincide and represent the same timing on the annual circle.
To stabilize contrast across indices with different dynamic ranges, AOI-wide winsorization is applied to the amplitude $A$ over all finite pixels of the current index [21]. The bounds $p_L$ and $p_H$ are defined as the 2nd and 98th percentiles of $A$ over all finite pixels for that index:

$$ p_L = P_{2}(A), \qquad p_H = P_{98}(A) $$
Subsequently, the winsorized amplitude $A^{*}$ is defined as:

$$ A^{*} = \min\left(\max(A, p_L),\, p_H\right) $$

which is then normalized to saturation $S$:

$$ S = \frac{A^{*} - p_L}{p_H - p_L} $$

The intuition is that pixels with amplitudes below $p_L$, i.e., with very low seasonality, are mapped to $S = 0$ and rendered in grey tones, pixels with amplitudes above $p_H$ are mapped to $S = 1$, and the majority of pixels are spread linearly in between. This step stabilizes contrast across indices without suppressing parcel-scale contrast.
The value $V$ is given by the empirical mean $\bar{x}$. To keep the mapping index-agnostic and robust, $\bar{x}$ is clipped to [−1, 1] and then mapped to [0, 1]:

$$ V = \frac{\min\left(\max(\bar{x}, -1),\, 1\right) + 1}{2} $$

This preserves rank order for naturally bounded indices, e.g., NDVI, while preventing CIg and CIre from over-brightening due to their large numeric ranges.
The (H, S, V) triplet is then converted to RGB using the standard HSV transform. This perceptual encoding mirrors the strategy proposed by Papić et al. [8] while generalizing it across various indices.
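A minimal sketch of the recolouring step for one index is given below. It assumes the hue mapping $H = (\phi \bmod 2\pi)/2\pi$, uses Python's standard `colorsys.hsv_to_rgb` applied element-wise as a stand-in for the standard HSV transform, and sets the hue of pixels with undefined phase to 0; the function name `harmonic_to_rgb` is hypothetical.

```python
import colorsys

import numpy as np


def harmonic_to_rgb(phase, amplitude, mean, p_low=2.0, p_high=98.0):
    """Map phase/amplitude/mean rasters of one index to an RGB composite in [0, 1]."""
    # Hue: phase wrapped to [0, 2*pi) and scaled to [0, 1); H = 0 and H = 1 coincide.
    hue = np.where(np.isfinite(phase), np.mod(phase, 2.0 * np.pi) / (2.0 * np.pi), 0.0)

    # Saturation: AOI-wide winsorization of amplitude at the 2nd/98th percentiles.
    finite = np.isfinite(amplitude)
    p_l, p_h = np.percentile(amplitude[finite], [p_low, p_high])
    a_star = np.clip(amplitude, p_l, p_h)
    sat = (a_star - p_l) / max(p_h - p_l, 1e-12)

    # Value: empirical mean clipped to [-1, 1] and mapped linearly to [0, 1].
    val = (np.clip(mean, -1.0, 1.0) + 1.0) / 2.0

    # Standard HSV -> RGB conversion, applied element-wise.
    r, g, b = np.vectorize(colorsys.hsv_to_rgb)(hue, sat, val)
    return np.stack([r, g, b], axis=-1)
```

Computing the percentiles over the whole AOI, rather than per tile, is what keeps colours comparable across tiles.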
Figure 3, panel (a) illustrates a multitemporal vegetation-index stack: each greyscale sheet represents the same AOI on a different acquisition date. Stacking these sheets yields a one-year time series for each pixel and each index.
Figure 3, panel (b) shows the per-pixel harmonic modelling of that series. The blue trace depicts the observed index values over time. A simple linear trend in years since the first image is removed, producing residuals, which are then fitted with a single annual harmonic.
Figure 3, panel (c) depicts the mapping to the HSV colour space: the phase is mapped to hue, so that parcels with different peak times receive different colours, while parcels with similar calendars appear in nearby hues along the colour wheel. The amplitude is mapped to saturation: weak seasonality desaturates towards grey, while strongly seasonal crops are rendered vividly. The empirical index mean is mapped to value, making low-mean surfaces darker than high-mean surfaces. Scaling is applied per index at the AOI level, using fixed saturation percentiles and conservative value clipping to keep colours consistent across tiles. The result is a single three-band image per index, which is the direct input to SAM segmentation.
2.6. Validation and Accuracy Metrics
The ground-truth (GT) parcels are from the Paying Agency for Agriculture, Fisheries, and Rural Development (PAAFRD, 2025), delivered as a shapefile.
Figure 4 depicts the ground-truth parcels overlaid on an RGB composite of the study area. The pixel grid used for pixel-based metrics is built from the GT extent at 3 m resolution, so all rasterizations are co-registered by construction. Prior to evaluation, polygons with an area of less than 350 m² were removed, and an Intersection-over-Union (IoU)-based screening step was applied: for each predicted polygon, the maximum IoU with any GT parcel was computed, and only predictions with IoU > 0.50 were retained. This screening step uses the ground truth and is therefore not a deployable post-processing strategy; consequently, all reported metrics should be interpreted as conditional segmentation quality for sufficiently overlapping candidates. We retain this GT-dependent screening to keep the pipeline fixed and to isolate the effect of vegetation index encoding across variants. In a deployable workflow, this step would be replaced by a GT-agnostic acceptance/rejection module (e.g., a lightweight classifier trained on a small labelled subset, shape heuristics, or thresholding), which is outside the scope of the present study. All metrics were computed over the AOI extent, and an identical evaluation workflow and identical threshold values were applied to each vegetation index variant.
For object-based scoring, a one-to-one matching procedure was performed between the retained predictions and the GT parcels. GT polygons were processed in order of occurrence, and candidate predictions were retrieved by querying a spatial index using bounding-box intersection as a coarse filter. Among the candidates, the polygon with the highest IoU was selected; when this value exceeded 0.50, a match was registered, and both objects were marked as used. Under this constraint, each GT parcel was matched to at most one prediction, and each prediction could participate in at most one match.
After matching, true positives (TP) were defined as predictions assigned to a GT parcel with IoU > 0.50. False positives (FP) were defined as predictions without an assigned GT parcel, and false negatives (FN) were defined as GT parcels without an assigned prediction. These counts were then used to compute the object-based performance metrics.
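The GT-anchored greedy matching can be sketched on a precomputed IoU matrix. This is a simplification under stated assumptions: the IoU values are assumed to have been computed beforehand from the polygons (e.g., with a geometry library), the bounding-box spatial-index filter is replaced by the condition IoU > 0, only predictions not yet used are considered as candidates, mean IoU is taken over matched pairs, and the earlier IoU screening step is not included. The function name `greedy_one_to_one` is hypothetical.

```python
import numpy as np


def greedy_one_to_one(iou, threshold=0.5):
    """Greedy GT-anchored one-to-one matching.

    iou[g, p] is the IoU between GT parcel g and prediction p; rows are
    visited in order of occurrence.
    """
    n_gt, n_pred = iou.shape
    used = np.zeros(n_pred, dtype=bool)
    matches = []
    for g in range(n_gt):
        # Candidates: overlapping predictions that have not been matched yet.
        cand = np.flatnonzero((iou[g] > 0) & ~used)
        if cand.size == 0:
            continue
        best = cand[np.argmax(iou[g, cand])]
        if iou[g, best] > threshold:
            matches.append((g, int(best), float(iou[g, best])))
            used[best] = True

    tp = len(matches)
    fp = n_pred - tp  # predictions without an assigned GT parcel
    fn = n_gt - tp    # GT parcels without an assigned prediction
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    mean_iou = float(np.mean([m[2] for m in matches])) if matches else 0.0
    return matches, dict(tp=tp, fp=fp, fn=fn, precision=precision,
                         recall=recall, f1=f1, mean_iou=mean_iou)
```

Because IoU > 0.5 cannot hold between one prediction and two non-overlapping GT parcels, restricting candidates to unused predictions does not change the outcome for a non-overlapping reference layer.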
To quantify the performance of each index with respect to parcel area, the metrics described above were also stratified by the area of the GT parcel into three categories:
Parcels with an area of less than 10,000 m²;
Parcels with an area between 10,000 m² and 100,000 m²;
Parcels with an area greater than 100,000 m².
Within each parcel-size category, the following object-based counts were defined and counted:
TP: GT parcels in the category that were matched to exactly one prediction with IoU > 0.50.
FN: GT parcels in the category for which no matching prediction was assigned.
FP: Predictions that were not assigned to any GT parcel.
These counts were used to calculate precision, recall, F1 score, and mean IoU. True negatives (TNs) were not reported: the evaluation is object-based and GT-anchored, so TNs would correspond to background area and are not informative for the metrics considered.
Pixel-wise metrics were calculated by rasterizing both the ground truth and the prediction set for each index. The vector polygons were burned into binary masks, with parcels assigned a value of 1 and the background a value of 0. On the resulting grid, the following counts were computed:
TP: Cells where both GT and prediction masks were equal to 1 (correct parcel coverage).
FP: Cells where the GT mask was 0 and the prediction mask was 1 (non-parcel area labelled as parcel).
FN: Cells where the GT mask was 1 and the prediction mask was 0 (missed parcel area).
TNs are not counted, as they represent the background. Using these counts, precision, recall, F1 score, and IoU were calculated.
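Given co-registered binary masks, the pixel-wise counts and scores reduce to a few array operations. The sketch assumes the masks have already been rasterized on the common 3 m grid (for example with `rasterio.features.rasterize`); `pixel_metrics` is a hypothetical helper name.

```python
import numpy as np


def pixel_metrics(gt_mask, pred_mask):
    """Pixel-wise precision, recall, F1 and IoU from two co-registered binary masks."""
    gt = np.asarray(gt_mask, dtype=bool)
    pr = np.asarray(pred_mask, dtype=bool)
    tp = np.count_nonzero(gt & pr)    # correct parcel coverage
    fp = np.count_nonzero(~gt & pr)   # non-parcel area labelled as parcel
    fn = np.count_nonzero(gt & ~pr)   # missed parcel area
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return dict(precision=precision, recall=recall, f1=f1, iou=iou)
```

True negatives never enter these formulas, which is consistent with not counting background cells.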
Fragmentation was quantified by counting the number of predicted polygons associated with each GT parcel. A GT parcel was classified as fragmented when it was linked to more than one prediction, and fragmentation statistics were reported by parcel-size category.
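The text does not specify how a prediction is associated with a GT parcel. The sketch below assumes that each prediction is associated with the GT parcel it overlaps most (mirroring the largest-overlap rule used for the classification errors) and works on integer label rasters with 0 as background; both the association rule and the name `fragmentation` are assumptions.

```python
import numpy as np


def fragmentation(gt_labels, pred_labels):
    """Count predictions per GT parcel and list fragmented parcels.

    Label rasters: 0 = background, positive integers = parcel / prediction ids.
    Assumption: each prediction is linked to the GT parcel it overlaps most.
    """
    counts = {}
    for p in np.unique(pred_labels):
        if p == 0:
            continue
        under = gt_labels[(pred_labels == p) & (gt_labels > 0)]
        if under.size == 0:
            continue  # prediction does not touch any GT parcel
        ids, areas = np.unique(under, return_counts=True)
        g = int(ids[np.argmax(areas)])
        counts[g] = counts.get(g, 0) + 1
    fragmented = sorted(g for g, n in counts.items() if n > 1)
    return counts, fragmented
```

Grouping the returned parcel ids by their area category then gives fragmentation statistics per parcel-size class.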
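Additional object-level geometric errors were computed, including the Global Over-Classification (GOC), Global Under-Classification (GUC), and Global Total Classification (GTC) errors. Let $S_i$ denote the $i$-th predicted parcel and $O_i$ the GT parcel with which $S_i$ has the largest area of overlap. The over-classification (OC) error was defined following [1]:

$$ OC(S_i) = 1 - \frac{|S_i \cap O_i|}{|O_i|} $$

the under-classification (UC) error as:

$$ UC(S_i) = 1 - \frac{|S_i \cap O_i|}{|S_i|} $$

and the total classification (TC) error as:

$$ TC(S_i) = \sqrt{\frac{OC(S_i)^2 + UC(S_i)^2}{2}} $$

To quantify spatial overreach, omission, and overall geometric fidelity, global error means were computed as area-weighted averages across all $N$ predicted polygons:

$$ GOC = \frac{\sum_{i=1}^{N} OC(S_i)\,|S_i|}{\sum_{i=1}^{N} |S_i|}, \qquad GUC = \frac{\sum_{i=1}^{N} UC(S_i)\,|S_i|}{\sum_{i=1}^{N} |S_i|}, \qquad GTC = \frac{\sum_{i=1}^{N} TC(S_i)\,|S_i|}{\sum_{i=1}^{N} |S_i|} $$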
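On label rasters, the per-object errors and their area-weighted global means can be computed as below, with pixel counts standing in for areas. Predictions that overlap no GT parcel are skipped in this sketch, which is an assumption since the text does not state how such polygons enter the averages; `global_classification_errors` is a hypothetical name.

```python
import numpy as np


def global_classification_errors(gt_labels, pred_labels):
    """Area-weighted GOC, GUC and GTC from label rasters (0 = background)."""
    oc, uc, weights = [], [], []
    for p in np.unique(pred_labels):
        if p == 0:
            continue
        s_mask = pred_labels == p
        under = gt_labels[s_mask & (gt_labels > 0)]
        if under.size == 0:
            continue  # assumption: predictions without GT overlap are skipped
        ids, areas = np.unique(under, return_counts=True)
        o = ids[np.argmax(areas)]                  # O_i: GT parcel with largest overlap
        inter = areas.max()                        # |S_i ∩ O_i| in pixels
        area_s = np.count_nonzero(s_mask)          # |S_i|
        area_o = np.count_nonzero(gt_labels == o)  # |O_i|
        oc.append(1.0 - inter / area_o)
        uc.append(1.0 - inter / area_s)
        weights.append(area_s)
    if not weights:
        return {"GOC": float("nan"), "GUC": float("nan"), "GTC": float("nan")}
    oc, uc, w = map(np.asarray, (oc, uc, weights))
    tc = np.sqrt((oc ** 2 + uc ** 2) / 2.0)
    # Weighted means with weights |S_i|.
    return {name: float(np.sum(w * e) / np.sum(w))
            for name, e in (("GOC", oc), ("GUC", uc), ("GTC", tc))}
```

For a prediction covering exactly half of a GT parcel and nothing else, OC = 0.5 and UC = 0, so TC = √0.125 ≈ 0.354.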
To explicitly evaluate the spatial agreement between predicted and reference parcel outlines, boundary-sensitive metrics were also computed. Area-based metrics such as IoU and F1 primarily reflect the amount of area shared between polygons and are less sensitive to local boundary displacements. For this reason, Boundary F1 (bF1) and Boundary IoU (bIoU) are included as boundary-focused complements to the main accuracy metrics. The evaluation was performed on a common raster grid derived from the reference data extent: both the ground-truth parcel layer and the predicted parcel polygons were rasterized at the same 3 m spatial resolution. After rasterization, parcel boundaries were extracted from the label images by identifying pixels located at transitions between different labels. Let $L_{GT}$ denote the rasterized ground-truth label image and $L_{PR}$ the rasterized prediction label image. Their corresponding binary boundary maps are denoted by $B_{GT}$ and $B_{PR}$, where a value of 1 marks a boundary pixel and 0 marks a non-boundary pixel. Because exact pixel-level coincidence of two boundaries is unrealistic in parcel delineation, a tolerant boundary-matching strategy was adopted in which both boundary masks are dilated by a tolerance $\delta$. In this study, $\delta$ was set to two pixels, which corresponds to 6 m. The buffered boundary supports are defined as:

$$ \tilde{B}_{GT} = B_{GT} \oplus D_{\delta}, \qquad \tilde{B}_{PR} = B_{PR} \oplus D_{\delta} $$

where $\oplus$ denotes binary dilation and $D_{\delta}$ is a structuring element of radius $\delta$.
Boundary precision and recall were computed by testing whether predicted boundary pixels fell within the tolerance neighbourhood of the ground-truth boundary, and vice versa. Boundary precision is defined as:

$$ P_b = \frac{|B_{PR} \cap \tilde{B}_{GT}|}{|B_{PR}|} $$

which measures the proportion of predicted boundary pixels that lie within the tolerated neighbourhood of the reference boundary. Boundary recall is defined as:

$$ R_b = \frac{|B_{GT} \cap \tilde{B}_{PR}|}{|B_{GT}|} $$

which measures the proportion of reference boundary pixels that are recovered within the tolerated neighbourhood of the predicted boundary. Using these two quantities, the Boundary F1 score was computed as their harmonic mean:

$$ bF1 = \frac{2\,P_b\,R_b}{P_b + R_b} $$

In addition to bF1, we also computed the boundary Intersection-over-Union to provide a direct overlap measure between the tolerated boundary supports of the reference and the prediction. Boundary IoU was defined as:

$$ bIoU = \frac{|\tilde{B}_{GT} \cap \tilde{B}_{PR}|}{|\tilde{B}_{GT} \cup \tilde{B}_{PR}|} $$
Unlike bF1, which is derived from separate precision and recall terms, bIoU summarizes the symmetric overlap between the dilated boundary sets in a single ratio. Higher values indicate stronger geometric consistency between predicted and reference contours after accounting for the predefined tolerance. As it is based exclusively on the boundary regions, bIoU is more sensitive to contour placement than the standard IoU.
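The boundary scores can be sketched with NumPy alone. Two choices here are assumptions not fixed by the text: a pixel is marked as a boundary pixel when any of its four neighbours carries a different label (so both sides of a transition are marked), and the tolerance δ is implemented as a square (2δ + 1) × (2δ + 1) dilation window. The function names `label_boundaries`, `dilate`, and `boundary_scores` are hypothetical.

```python
import numpy as np


def label_boundaries(labels):
    """Mark pixels whose 4-neighbour carries a different label."""
    b = np.zeros(labels.shape, dtype=bool)
    diff_v = labels[1:, :] != labels[:-1, :]
    diff_h = labels[:, 1:] != labels[:, :-1]
    b[1:, :] |= diff_v
    b[:-1, :] |= diff_v
    b[:, 1:] |= diff_h
    b[:, :-1] |= diff_h
    return b


def dilate(mask, delta):
    """Binary dilation with a square (2*delta+1)^2 window, without wrap-around."""
    h, w = mask.shape
    padded = np.pad(mask, delta, constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(2 * delta + 1):
        for dx in range(2 * delta + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def boundary_scores(gt_labels, pred_labels, delta=2):
    """Tolerant boundary precision, recall, bF1 and bIoU on label rasters."""
    b_gt, b_pr = label_boundaries(gt_labels), label_boundaries(pred_labels)
    d_gt, d_pr = dilate(b_gt, delta), dilate(b_pr, delta)
    p_b = np.count_nonzero(b_pr & d_gt) / max(np.count_nonzero(b_pr), 1)
    r_b = np.count_nonzero(b_gt & d_pr) / max(np.count_nonzero(b_gt), 1)
    bf1 = 2 * p_b * r_b / (p_b + r_b) if p_b + r_b else 0.0
    biou = np.count_nonzero(d_gt & d_pr) / max(np.count_nonzero(d_gt | d_pr), 1)
    return dict(precision=p_b, recall=r_b, bf1=bf1, biou=biou)
```

A one-pixel shift of an otherwise identical parcel stays within the 2-pixel tolerance, so boundary precision and recall remain 1, while bIoU drops below 1 because the dilated supports no longer coincide; this illustrates why bIoU is the stricter of the two scores.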
The identical encoding, segmentation, post-processing, and evaluation (pixel-wise and object-wise) settings were applied to the second AOI to enable a cross-site comparison; results are summarized in Appendix C.