Harmonic Phenology Mapping: From Vegetation Indices to Field Delineation

Papić, Filip; Miler, Mario; Medak, Damir; Rumora, Luka

doi:10.3390/rs18071011

Faculty of Geodesy, University of Zagreb, Kačićeva 26, 10000 Zagreb, Croatia

Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(7), 1011; https://doi.org/10.3390/rs18071011

Submission received: 4 February 2026 / Revised: 16 March 2026 / Accepted: 25 March 2026 / Published: 27 March 2026

(This article belongs to the Special Issue Deep Learning-Based Analysis of High-Resolution Remote Sensing Images: Registration, Fusion, and Change Detection)

Highlights

What are the main findings?

A phenology-encoded HSV composite enables zero-shot parcel delineation without retraining.
Among the eleven tested indices, performance is tiered: MSAVI, EVI, EVI2, and SAVI yield the best results, with errors also being dependent on parcel area.

What are the implications of the main findings?

A simple, scalable and interpretable methodology is defined for operations; further improvements are easily implementable.
The same harmonic descriptors used for the segmentation also support crop mapping.

Abstract

Operational agricultural monitoring in the Central European lowlands requires timely parcel boundaries; however, unmarked field edges produce minimal spectral contrast in single-date imagery. Previous works demonstrated that harmonic NDVI encoding enables zero-shot field delineation using foundational models, but the influence of the spectral index choice on temporal boundaries remained unquantified. This study systematically evaluates eleven vegetation indices—NDVI, GNDVI, NDRE, EVI, EVI2, SAVI, MSAVI, NDWI, CIg, CIre, and NDYVI—within a fixed harmonic phenology encoding pipeline. A one-year PlanetScope time series (15 × 15 km, Slavonija, Croatia) was decomposed via annual sinusoidal regression to extract per-pixel phase, amplitude, and mean parameters. These harmonic descriptors were mapped to HSV colour channels and segmented using the Segment Anything Model without fine-tuning. Official agricultural parcels (PAAFRD, 2025) provided ground truth for pixel-wise, object-wise, and size-stratified evaluation. Performance stratified into three tiers based on object-wise metrics. Soil-adjusted and enhanced-greenness indices (MSAVI, EVI, EVI2, and SAVI) achieved F1 = 0.51–0.52, and mIoU = 0.70–0.71, statistically outperforming standard ratio formulations (NDVI: F1 = 0.49) and chlorophyll indices (CIg, CIre: F1 = 0.45–0.47). Pixel-wise scores remained compressed (F1 > 0.88 across all indices), indicating consistent interior coverage but index-dependent boundary precision. Error analysis revealed scale-dependent patterns: merging dominated small parcels (<10,000 m²), while fragmentation increased with parcel size. Results demonstrate that spectral formulation is a systematic design factor in phenology-based delineation, with soil background correction and dynamic range compression improving seasonal trajectory separability. 
The harmonic parameters generated by this framework provide feature-ready input for crop classification, suggesting that integrated boundary extraction and crop mapping workflows merit further investigation.

Keywords:

field delineation; SAM; vegetation indices; harmonic analysis; zero-shot segmentation; phenology

1. Introduction

Accurate delineation of agricultural field boundaries underpins land administration, subsidy control, yield monitoring, crop statistics, and environmental compliance. However, boundaries are often weakly expressed, particularly in smallholder and strip-field systems where hedgerows, tracks, or ditches are discontinuous or absent. Recent surveys of agricultural parcel and boundary delineation (APBD) highlight a surge in techniques, from edge and region-growing pipelines to modern deep learning networks, but persistent challenges in fragmented landscapes and a lack of temporal cues in single-date inputs are also noted [1,2]. Although satellite time series are now widely available, the use of seasonal phenology as a primary signal for boundary detection remains limited in practice. Traditional field delineation methods evolved from manual digitisation and semi-automated workflows based on aerial photogrammetry and cadastral maps, which are accurate but labour-intensive, slow, and costly, especially in fragmented smallholder landscapes or rapidly changing areas [1]. With satellite remote sensing, large-area timely monitoring became possible, and modern approaches are broadly divided into pixel-based, edge-based, region-based, and hybrid methods that fuse boundaries with regional homogeneity [1]. Recent deep learning systems still rely on spatial–spectral contrasts from a single date, with phenological dynamics treated as an auxiliary input rather than central.

Foundational models for segmentation have transformed the field by enabling zero-shot mask proposals without task-specific training. The Segment Anything Model (SAM) is promptable and broadly transferable: trained on more than one billion masks across eleven million images, it can generalize to new, unseen domains, including Earth observation, through its input modes [3]. In parallel, geospatial adaptations have emerged that tailor inputs, tiling, and non-maximum suppression (NMS) to parcel geometry and report competitive object-wise scores across heterogeneous agricultural regions [4], while early-season delineation efforts target operational timelines with limited temporal depth [5].

Harmonic analysis summarizes seasonal trajectories into mean, amplitude, and phase using a compact wave function, enabling the interpolation of sparse or irregular acquisitions without extensive gap filling [6]. Studies using Landsat and MODIS data have shown that harmonic regression robustly captures phenology and can be mapped to downstream tasks where timing (phase) and seasonal range (amplitude) have agronomic significance [6,7]. When these descriptors are mapped to perceptual channels, ambiguous adjacent parcels can be separated through temporal contrast, even when single-date reflectance is similar [6,7].

Recent work by our group [8] addressed the temporal encoding challenge through perceptual colour mapping: harmonic Normalized Difference Vegetation Index (NDVI) parameters (phase, amplitude, and mean) were projected into cylindrical colour spaces (Hue–Saturation–Value—HSV, Hue–Whiteness–Blackness—HWB, and Luminance–Chroma–Hue—LCH) and segmented via SAM. That proof of concept, tested on a compact Northern Croatia site (5 × 5 km), established the viability of training-free, phenology-based delineation but examined only a single spectral formulation. The question of whether index physics—soil sensitivity, red-edge incorporation, and dynamic range—systematically affects boundary detectability when encoded through identical temporal decomposition and colour space projection remained open.

Spectral index choice matters mechanistically. Soil-adjusted formulations (MSAVI, SAVI) suppress within-field heterogeneity that could trigger spurious fragmentation. Enhanced-greenness variants (EVI, EVI2) normalize amplitude distributions across canopy densities, potentially stabilizing saturation values in colour space encodings. Red-edge indices (NDRE, CIre) respond to nitrogen variability, which may either sharpen crop-type boundaries or amplify management noise. Yet no study has isolated index formulation as the sole experimental variable within a fixed harmonic-to-segmentation pipeline, controlling for decomposition method, colour space, and segmentation architecture.

We address this gap through a controlled comparison across eleven vegetation indices spanning greenness, soil adjustment, red-edge, and water formulations. By holding the harmonic model, colour encoding, segmentation backbone, and evaluation protocol constant, we attribute performance differences directly to index-specific characteristics. Testing on a larger, more heterogeneous agricultural landscape (15 × 15 km, Slavonija, Croatia) enables size-stratified error analysis and the statistical validation of index rankings.

Many new SAM-derived delineation frameworks provide additional context. FieldSeg combines Sentinel temporal composites with SAM and careful patch management, reporting scalable extraction across eight regions and clarifying practical factors such as input normalization, mask filtering, and tiling overlaps [4]. Other studies enhance boundaries using detail-enhancement or edge-aware filters on SAM embeddings (Boundary SAM) or employ hybrid prompts (DeepLabV3+) with SAM blocks (fabSAM) to reduce merging and fragmentation errors common in dense strip fields [9,10]. Meanwhile, Delineate Anything reframes the task as instance segmentation trained on the FBIS-22M dataset, demonstrating resolution-agnostic generalization and strong metrics, setting a high standard for supervised approaches [2]. This highlights that input representation and post-processing remain crucial, even with strong backbones [2,4,9,10].

Considering all this, we present a method-development study that keeps the segmenter, tiling, thresholds, and evaluation protocol fixed, while varying the vegetation index used for harmonic to HSV recolouring. This study addresses the open question of how the choice of vegetation index affects boundary detectability when phenology is summarized via harmonic descriptors and mapped to a cylindrical colour space. The objective is to determine whether index-specific encodings produce systematically different delineation outcomes under an otherwise fixed, training-free segmentation pipeline, and to quantify any differences using standard pixel-wise and object-wise metrics.

2. Materials and Methods

2.1. Study Area and PlanetScope Imagery

The 15 × 15 km study area was chosen to complement the Northern Croatia site, providing contrasting parcel-size distribution and pedological characteristics. Whereas the earlier compact area of interest highlighted the smallholder fragmentation, this Slavonija site contains intensive arable blocks interspersed with peri-urban strips, allowing for stratified evaluation by parcel area (0–10,000 m², 10,000–100,000 m², and >100,000 m²). To further assess generalization, we additionally evaluated the workflow on a second area of interest (AOI) in Northern Croatia representing a smallholder dominated landscape; the results are provided in Appendix C.

No ancillary layers were used; all inputs were derived from PlanetScope satellite imagery. The dense PlanetScope revisit increases the likelihood of capturing asynchronous phenology among neighbouring fields, which is expected to enhance temporal contrast at parcel boundaries in the HSV composites.

A one-year PlanetScope SuperDove time series from October 2024 to October 2025 was assembled. Scenes were cloud-free according to provider flags and passed visual quality control. No additional per-pixel haze or shadow correction was applied beyond the provider quality masks and manual screening; therefore, residual thin haze, undetected cloud edges, and cast shadows may persist in parts of the imagery and may introduce localized artefacts in the vegetation index time series and the subsequent harmonic descriptors. Missing observations were not interpolated; harmonic parameters were fitted using the available valid samples only. The images were received orthorectified, atmospherically corrected, and harmonized to Sentinel-2 data as surface reflectance at native 3 m resolution. All scenes were delivered in EPSG:32634. The scenes were obtained through Planet’s Education and Research Program.

The spatial extent of the AOI is shown in Figure 1, using a PlanetScope RGB composite. The landscape is predominantly agricultural, organized into elongated, rectangular field units. A transportation corridor traverses the AOI diagonally, dividing it into two areas. The northern sector has a finer spatial grain, consisting of smaller agricultural parcels, linear settlement patterns aligned with road networks, a meandering river, and scattered woodland features. The southern sector is dominated by larger arable fields, with forested patches occurring intermittently. Vegetated surfaces are represented by darker green while bare or sparsely vegetated soils appear as lighter, reddish-brown tones. Field boundaries are mainly inferred from spectral and tonal discontinuities associated with crop phenological stages and management practices rather than from clearly defined physical boundaries.

SuperDove’s eight bands—coastal blue, blue, green, green I, yellow, red, red-edge (RE), and near-infrared (NIR)—enable the calculation of indices that exploit the green and red-edge regions in addition to classic red and NIR contrasts, supporting chlorophyll-sensitive formulations [11,12,13]. Instrument characterisations and validation studies confirm the added spectral utility of the yellow and red-edge bands in terrestrial and aquatic applications, motivating tests beyond traditional indices towards green and red-edge variants that may strengthen boundary contrast where crops diverge in canopy structure or chemistry [11,12,13]. The band set is used for per-index calculations to build the harmonic descriptors used to create a false colour composite. Leveraging all bands provides an index-agnostic basis for subsequent harmonic recolouring and segmentation.

2.2. Indices Calculation

Vegetation indices provide compact, interpretable summaries of canopy status. In this study, eleven indices are evaluated: the Normalized Difference Vegetation Index (NDVI) [14], the Green Normalized Difference Vegetation Index (GNDVI) [14], the Normalized Difference Red-Edge Index (NDRE) [15], the Enhanced Vegetation Index (EVI) [16] and the Two-band Enhanced Vegetation Index (EVI2) [16], the Soil-Adjusted Vegetation Index (SAVI, L = 0.5) [17] and the Modified Soil-Adjusted Vegetation Index (MSAVI) [18], the Normalized Difference Water Index (NDWI) [19], the Chlorophyll Index—Green (CIg) [15] and the Chlorophyll Index—Red-Edge (CIre) [15], and the Normalized Difference Yellow Vegetation Index (NDYVI) [20]. Collectively, these indices span complementary sensitivities: structural greenness and canopy vigour, pigment and nitrogen proxies via green and red-edge leverage, robustness to soil background in high biomass, explicit soil suppression in sparse cover, separation of water influences, and band-specific contrasts that exploit yellow and red-edge sensitivity. The diversity of indices is expected to yield different boundary expressions at field edges once their harmonic phase, amplitude, and mean are encoded into the HSV colour space.

All vegetation indices were computed per date, per pixel from PlanetScope surface reflectance on a common 3 m grid. The same scene set was used for every index. Spectral symbols follow SuperDove naming: BLUE, GREEN, GREEN-I, YELLOW, RED, RED-EDGE (RE), and NIR. Formulas and sources for all indices are provided in Table 1.

These eleven indices serve as inputs for the harmonic analysis, with each index modelled independently to extract phase, amplitude, and mean per pixel before HSV recolouring and segmentation.
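As a brief illustration of the per-date, per-pixel index computation, the sketch below evaluates four of the eleven indices from red and NIR surface reflectance. Table 1 is not reproduced here, so the formulas are the commonly cited formulations of NDVI, SAVI (L = 0.5), MSAVI, and EVI2 and are assumed to match it; the function name `vegetation_indices` and the assumption that reflectance is scaled to [0, 1] are illustrative only.

```python
import numpy as np


def vegetation_indices(nir, red, L=0.5):
    """Return a dict of index arrays computed per pixel.

    Assumes surface reflectance scaled to [0, 1] (an assumption of this
    sketch); formulas are the commonly cited ones, not copied from Table 1.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    # Division by zero yields NaN/inf instead of raising; such pixels
    # would be treated as invalid samples downstream.
    with np.errstate(divide="ignore", invalid="ignore"):
        ndvi = (nir - red) / (nir + red)
        savi = (1.0 + L) * (nir - red) / (nir + red + L)
        evi2 = 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)
        root = np.sqrt((2.0 * nir + 1.0) ** 2 - 8.0 * (nir - red))
        msavi = (2.0 * nir + 1.0 - root) / 2.0
    return {"NDVI": ndvi, "SAVI": savi, "EVI2": evi2, "MSAVI": msavi}


# Two example pixels: a dense canopy and a sparse one.
example = vegetation_indices(np.array([0.45, 0.30]), np.array([0.05, 0.20]))
```

Applying the same function to every date in the stack would give one time series per index and pixel, which is the form of input the harmonic analysis expects.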

2.3. Harmonic Analysis

The harmonic analysis follows the framework validated in Papić et al. (2025) [8] and is presented as an end-to-end processing pipeline in Figure 2, applied independently to each of the eleven indices in Table 1. This section provides implementation details required for reproducibility across all workflow stages. Starting from PlanetScope imagery, scenes are quality-screened (provider cloud flags and manual quality control), reprojected to a common grid, and assembled into a temporal stack. Vegetation indices are computed per date and are normalized temporally. The constructed time series are summarized by fitting a single annual harmonic to obtain three descriptors, phase (timing), amplitude (seasonality), and mean (baseline), which are then mapped to the HSV colour space and exported as a recoloured false colour composite optimized for field boundary delineation and used as inputs for SAM. The resulting instance masks are vectorized, filtered, and evaluated using pixel-wise and object-wise accuracy metrics.

Per-pixel index trajectories x_k (k = 1, …, n acquisitions) undergo two-stage modelling. First, a linear trend is removed by regressing the index on time, which suppresses slow cross-scene drifts that can bias a single sinusoid. Time s_k is expressed in years since the first image:

s_{k} = \frac{t_{k} - t_{1}}{365.2422}

(1)

where t₁ is the ordinal day of the first valid acquisition, t_k is the ordinal day of acquisition k, and 365.2422 is the mean tropical year length in days. The model is fit as:

x_{k} = β_{0} + β_{1} s_{k} + ε_{k} .

(2)

where β₀ is the intercept, β₁ is the linear trend, and ε_k is a zero-mean random error term that captures unexplained variability at acquisition k. Parameters β₀ and β₁ are estimated using ordinary least squares (OLS), and the detrended residual at acquisition k is then formed:

r_{k} = x_{k} - ({\hat{β}}_{0} + {\hat{β}}_{1} s_{k})

(3)

Seasonal timing is represented on a centred calendar angle. The mean ordinal day t̄ is defined over the set of valid dates ν, which contains n_v acquisitions, and the annual angular time θ_k, in radians, is formulated as:

\bar{t} = \frac{1}{n_{v}} \sum_{k \in ν} t_{k}, \quad θ_{k} = \frac{2 π}{365.2422} (t_{k} - \bar{t}) .

(4)

The detrended series is then approximated by:

r_{k} = a_{0} + a_{1} \cos θ_{k} + b_{1} \sin θ_{k} .

(5)

where a₀ is a constant bias that can remain after detrending, and a₁ and b₁ are the cosine and sine coefficients of the first harmonic. The coefficients a₀, a₁, and b₁ are estimated by OLS independently for each pixel and each index. Only the fundamental component is used; thus, the phase is directly interpretable on the calendar circle, and descriptors are comparable across indices. The amplitude A summarizes the seasonal range; the phase ϕ, in radians, encodes the calendar timing of the seasonal maximum on the annual circle; and x̄ is the empirical mean of the original index over the valid dates. They are derived as:

A = \sqrt{a_{1}^{2} + b_{1}^{2}}, \quad ϕ = \operatorname{atan2}(b_{1}, a_{1}), \quad \bar{x} = \frac{1}{n_{v}} \sum_{k \in ν} x_{k},

(6)

A \geq 0, \quad ϕ \in (- π, π] .

(7)

For x̄, the empirical mean is used, not the harmonic intercept. The detrending and annual harmonic fitting are executed independently for each index, which preserves index-specific seasonality. A one-year index trace at each pixel is thus summarized by a single annual harmonic. To further contextualize the performance of the proposed harmonic-HSV encoding, two RGB-based baselines were also evaluated using the same SAM configuration and post-processing pipeline, and the results are shown in Appendix D.
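The two-stage fit of Equations (1)–(7) can be sketched for a single pixel as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the study: the function name `harmonic_descriptors` is hypothetical, invalid samples are assumed to have been dropped beforehand, and NumPy's least-squares solver stands in for whatever OLS routine was actually used.

```python
import numpy as np

YEAR = 365.2422  # mean tropical year length in days, as in Eq. (1)


def harmonic_descriptors(t, x):
    """Return (phase, amplitude, mean) for one pixel.

    t: ordinal days of the valid acquisitions, x: index values on those days.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)

    # Eqs. (1)-(3): linear detrend in years since the first acquisition.
    s = (t - t[0]) / YEAR
    design_lin = np.column_stack([np.ones_like(s), s])
    beta, *_ = np.linalg.lstsq(design_lin, x, rcond=None)
    r = x - design_lin @ beta

    # Eq. (4): calendar angle centred on the mean ordinal day.
    theta = 2.0 * np.pi * (t - t.mean()) / YEAR

    # Eq. (5): single annual harmonic fitted to the residuals.
    design_har = np.column_stack(
        [np.ones_like(theta), np.cos(theta), np.sin(theta)]
    )
    (a0, a1, b1), *_ = np.linalg.lstsq(design_har, r, rcond=None)

    # Eq. (6): amplitude, phase, and the empirical mean of the original index.
    amplitude = float(np.hypot(a1, b1))
    phase = float(np.arctan2(b1, a1))
    return phase, amplitude, float(x.mean())


# Synthetic check: a pure cosine sampled every five days for one year.
days = np.arange(0, 365, 5)
angle = 2.0 * np.pi * (days - days.mean()) / YEAR
phase, amplitude, mean = harmonic_descriptors(days, 0.5 + 0.3 * np.cos(angle))
```

For this symmetric synthetic series the fitted trend is flat, so the recovered amplitude is 0.3 and the phase is zero up to floating-point error.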

2.4. Perceptual Recolouring

The three harmonic descriptors, phase ϕ, amplitude A, and mean x̄, are converted into a Hue–Saturation–Value (HSV) image used as input for segmentation. The mapping follows the perceptual recolouring rationale established by Papić et al. [8], with AOI-wide scaling to preserve cross-tile consistency.

Hue H is defined from the annual phase ϕ and encodes relative seasonal timing:

H = \frac{ϕ + π}{2 π}, \quad H \in [0,1] .

(8)

H = 0 and H = 1 coincide and represent the same timing on the annual circle.

To stabilize contrast across indices with different dynamic ranges, AOI-wide winsorization is applied to the amplitude A for the current index [21]. The bounds p_L and p_H are defined as the 2nd and 98th percentiles of A over all finite pixels for that index:

p_{L} = \operatorname{perc}_{2}(A), \quad p_{H} = \operatorname{perc}_{98}(A),

(9)

Subsequently, winsorized amplitude A* is defined as:

A^{*} = \min (\max (A, p_{L}), p_{H}),

(10)

which is then normalized to saturation S:

S = \frac{A^{*} - p_{L}}{p_{H} - p_{L}} .

(11)

Intuitively, pixels with very low seasonality (A ≤ p_L) are mapped to S = 0 and rendered grey, pixels with A ≥ p_H are mapped to S = 1, and the majority of pixels in between are spread linearly. This step stabilizes indices without suppressing parcel-scale contrast.

The value V is given by the empirical mean x̄. To keep the mapping index-agnostic and robust, x̄ is clipped to [−1, 1] and then mapped to [0, 1]:

{\bar{x}}^{*} = \min (\max (\bar{x}, - 1), 1), \quad V = \frac{{\bar{x}}^{*} + 1}{2} .

(12)

This preserves rank order for naturally bounded indices, e.g., NDVI, while also preventing CIg and CIre from over-brightening due to large numeric ranges.

The (H,S,V) triplet then gets converted to RGB using the standard HSV transform. This perceptual encoding mirrors the strategy proposed by Papić et al. [8], while generalizing it across various indices.
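The mapping of Equations (8)–(12) can be sketched as below. The function name `harmonic_to_rgb` is hypothetical, all inputs are assumed to be finite arrays covering the whole AOI for one index, and the small constant guarding the denominator of the saturation is an assumption of this sketch. The standard-library `colorsys.hsv_to_rgb` is used for the final conversion for clarity; a vectorized transform would be preferable at AOI scale.

```python
import colorsys

import numpy as np


def harmonic_to_rgb(phase, amplitude, mean):
    """Map per-pixel (phase, amplitude, mean) arrays to an RGB image in [0, 1]."""
    phase = np.asarray(phase, dtype=float)
    amplitude = np.asarray(amplitude, dtype=float)
    mean = np.asarray(mean, dtype=float)

    # Eq. (8): hue from phase.
    hue = (phase + np.pi) / (2.0 * np.pi)

    # Eqs. (9)-(11): AOI-wide winsorization, then rescaling to [0, 1].
    p_lo, p_hi = np.percentile(amplitude, [2, 98])
    a_star = np.clip(amplitude, p_lo, p_hi)
    sat = (a_star - p_lo) / max(p_hi - p_lo, 1e-12)  # guard is an assumption

    # Eq. (12): clip the empirical mean to [-1, 1] and map to [0, 1].
    val = (np.clip(mean, -1.0, 1.0) + 1.0) / 2.0

    # Standard HSV -> RGB transform, applied element-wise.
    to_rgb = np.vectorize(colorsys.hsv_to_rgb)
    r, g, b = to_rgb(hue, sat, val)
    return np.stack([r, g, b], axis=-1)


rgb = harmonic_to_rgb(
    phase=np.array([[-np.pi, 0.0], [np.pi / 2, np.pi]]),
    amplitude=np.array([[0.0, 0.1], [0.2, 0.3]]),
    mean=np.array([[0.2, 0.5], [2.0, -3.0]]),
)
```

In this toy example the pixel with zero amplitude falls below p_L and is rendered grey, while the pixel whose mean is clipped to −1 is rendered black.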

Figure 3, panel (a) illustrates a multitemporal vegetation-index stack: each greyscale sheet represents the same AOI on a different acquisition date. Stacking these sheets yields a one-year time series for each pixel for all indices. Figure 3, panel (b) shows the per-pixel harmonic modelling of that series. The blue trace depicts the observed index values over time. A simple linear trend in years since the first image is removed, producing residuals. These residuals are then fitted with a single annual harmonic. Figure 3, panel (c) depicts the mapping to the HSV colour space: the phase is mapped to the hue so that parcels with different peak times receive different colours, while parcels with similar calendars appear in nearby hues along the colour wheel. The amplitude is mapped to saturation: weak seasonality desaturates towards grey, while strongly seasonal crops are rendered vividly. The empirical index mean is mapped to the value, making low-mean surfaces darker than high-mean surfaces. Scaling is applied per index at the AOI level, using fixed saturation percentiles and conservative value clipping to keep colours consistent across tiles. The resulting output is a single three-band image per index, which is the direct input to SAM segmentation.

2.5. Segmentation with Segment Anything Model

Each HSV composite is segmented using the Segment Anything Model (SAM) in automatic, prompt-free mode. All runs use the same configuration; no fine-tuning is performed. The HSV composites are used as RGB inputs, and images are partitioned into 512 × 512 px tiles. Tiling, multi-scale crop generation, and scene reconstruction using tile-wise masks were handled by the Segment Geospatial (0.12.3) wrapper built around SAM’s Automatic Mask Generator. The hyperparameters remain unchanged from Papić et al. (2025) [8]:

crop_n_layers = 2, crop_overlap_ratio = 0.35, crop_n_points_downscale_factor = 1,

(13)

points_per_side = 32, pred_iou_thresh = 0.75, stability_score_thresh = 0.80 .

(14)

All runs used the Vision Transformer Huge (ViT-H) backbone, with weights sam_vit_h_4b8939.pth. Duplicates created by overlaps were suppressed using SAM’s internal non-maximum suppression and deduplication, and the results from each vegetation index input were kept separate, resulting in eleven distinct prediction sets. The instance masks were converted to polygons without post-processing. Predictions smaller than 350 m² were removed. To check whether the tiling and overlap handling introduced systematic edge effects, the performance was quantified in seam-adjacent versus interior regions. Tile borders were expanded by 16 pixels and rasterized to delineate seam zones; the remaining area was treated as interior. Ground truth and predictions were rasterized at 3 m resolution, and IoU, precision, recall, and F1 scores were computed independently in both zones to test the effects of tiling on accuracy.
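The seam-versus-interior partition can be illustrated with a simple raster mask. The sketch below assumes a regular, non-overlapping 512 px tile grid and interprets the 16-pixel expansion as a band of 16 pixels on either side of each internal tile border; the actual tile layout produced by the wrapper is not specified here, and the function name `seam_mask` is hypothetical.

```python
import numpy as np


def _near_tile_line(n, tile, margin):
    """Boolean vector marking indices within `margin` px of an internal tile border."""
    idx = np.arange(n)
    near = np.zeros(n, dtype=bool)
    for border in range(tile, n, tile):  # internal borders only
        near |= (idx >= border - margin) & (idx < border + margin)
    return near


def seam_mask(height, width, tile=512, margin=16):
    """True in seam zones, False in tile interiors."""
    rows = _near_tile_line(height, tile, margin)
    cols = _near_tile_line(width, tile, margin)
    return rows[:, None] | cols[None, :]


mask = seam_mask(1024, 1024)
seam_fraction = mask.mean()  # share of the raster treated as seam zone
```

Pixel-wise scores could then be computed twice, once restricted to `mask` and once to `~mask`, to compare seam-adjacent and interior accuracy.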

2.6. Validation and Accuracy Metrics

The ground-truth (GT) parcels are from the Paying Agency for Agriculture, Fisheries, and Rural Development (PAAFRD, 2025), delivered as a shapefile. Figure 4 depicts the ground-truth parcels overlaid on an RGB composite of the study area. The pixel grid used for pixel-based metrics is built from the GT extent at 3 m resolution, so all rasterizations are perfectly co-registered. Prior to evaluation, polygons with an area less than 350 m² were removed. An Intersection-over-Union (IoU)-based screening step was applied prior to evaluation. For each predicted polygon, the maximum IoU with any ground truth (GT) parcel was computed, and only predictions with IoU > 0.50 were retained. This screening step is ground truth-aware and is therefore not a deployable post-processing strategy; consequently, all reported metrics should be interpreted as conditional segmentation quality for sufficiently overlapping candidates. We retain this GT-dependent screening to keep the pipeline fixed and isolate the effect of vegetation index encoding across variants. In a deployable workflow, this step would be replaced by a GT-agnostic acceptance/rejection module (a lightweight classifier trained on a small, labelled subset, shape heuristics, thresholding…), which is outside the scope of the present study. All metrics were computed over the AOI extent, and an identical evaluation workflow and threshold values were applied for each vegetation index variant.

For object-based scoring, a one-to-one matching procedure was performed between the retained predictions and the GT parcels. GT polygons were processed in order of occurrence, and candidate predictions were retrieved by querying a spatial index using bounding-box intersection as a coarse filter. Among the candidates, the polygon with the highest IoU was selected; when this value exceeded 0.50, a match was registered, and both objects were marked as used. Under this constraint, each GT parcel was matched to one prediction at most, and each prediction was allowed to participate in one match at most.

After matching, true positives (TP) were defined as predictions assigned to a GT parcel with IoU > 0.50. False positives (FP) were defined as predictions without an assigned GT parcel, and false negatives (FN) were defined as GT parcels without an assigned prediction. These counts were then used to compute the object-based performance metrics.

\mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad F1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},

(15)

\mathrm{IoU} = \frac{TP}{TP + FP + FN}, \quad \mathrm{mIoU} = \frac{1}{N} \sum_{n} \mathrm{IoU}_{n}

(16)
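The one-to-one matching and the resulting object-based counts can be sketched as follows. For brevity the sketch starts from a precomputed IoU matrix instead of polygons and a spatial index; the function name `match_and_score` and the dictionary layout are illustrative assumptions, and already-matched predictions are assumed to be excluded from later candidate sets.

```python
def match_and_score(iou, threshold=0.50):
    """Greedy one-to-one matching; iou[g][p] is the IoU of GT g and prediction p."""
    n_gt = len(iou)
    n_pred = len(iou[0]) if n_gt else 0
    used = set()  # predictions already assigned to a GT parcel
    tp = 0
    for g in range(n_gt):  # GT parcels in order of occurrence
        candidates = [p for p in range(n_pred) if p not in used]
        if not candidates:
            break
        best = max(candidates, key=lambda p: iou[g][p])
        if iou[g][best] > threshold:
            used.add(best)
            tp += 1
    fp = n_pred - tp  # predictions without an assigned GT parcel
    fn = n_gt - tp    # GT parcels without an assigned prediction
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "recall": recall, "F1": f1}


# Three GT parcels (rows) against three predictions (columns).
scores = match_and_score([
    [0.8, 0.1, 0.0],
    [0.0, 0.4, 0.0],
    [0.0, 0.0, 0.6],
])
```

Here the second GT parcel has no prediction above the threshold, so the example yields two true positives, one false positive, and one false negative.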

To quantify the performance of each index with respect to area, the previously mentioned metrics were also categorized by the area of the ground truth parcel, divided into three categories:

Parcels with an area less than 10,000 m²;
Parcels with an area between 10,000 m² and 100,000 m²;
Parcels with an area greater than 100,000 m².

Within each parcel-size category, the following object-based counts were defined and counted:

TP: GT parcels in the category that were matched to exactly one prediction with IoU > 0.50.
FN: GT parcels in the category for which no matching prediction was assigned.
FP: Predictions that were not assigned to any GT parcel.

These counts were used to calculate precision, recall, F1 score, and mean IoU. True negatives (TNs) were not reported, as the evaluation was performed in an object-based, GT-anchored manner, and TNs would correspond to background area and were not informative for the metrics considered.

Pixel-wise metrics are calculated by rasterizing both the ground truth and the prediction set for each index. The vector polygons were burnt as binary masks, where parcels had a value of 1, and the background had a value of 0. On the resulting grid, the following counts were computed:

TP: Cells where both GT and prediction masks were equal to 1 (correct parcel coverage).
FP: Cells where the GT mask was 0 and the prediction mask was 1 (non-parcel area labelled as parcel).
FN: Cells where the GT mask was 1, and the prediction mask was 0 (missed parcel area).

TNs are not counted, as they represent the background. Using these counts, precision, recall, F1 score, and IoU were calculated.
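A minimal sketch of the pixel-wise scoring, assuming both polygon sets have already been burnt into co-registered binary masks, is given below; the function name `pixel_metrics` is illustrative.

```python
import numpy as np


def pixel_metrics(gt_mask, pred_mask):
    """Precision, recall, F1, and IoU from two binary masks on the same grid."""
    gt = np.asarray(gt_mask, dtype=bool)
    pred = np.asarray(pred_mask, dtype=bool)
    tp = np.count_nonzero(gt & pred)    # correct parcel coverage
    fp = np.count_nonzero(~gt & pred)   # non-parcel area labelled as parcel
    fn = np.count_nonzero(gt & ~pred)   # missed parcel area
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "F1": f1, "IoU": iou}


demo = pixel_metrics([[1, 1, 0], [0, 0, 0]], [[1, 0, 1], [0, 0, 0]])
```

True negatives never enter these formulas, which is consistent with their omission from the reported counts.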

Fragmentation was quantified by counting the number of predicted polygons associated with each GT parcel. A GT parcel was classified as fragmented when it was linked to more than one prediction, and fragmentation statistics were reported by parcel-size category.

Additional object-level geometric errors were computed, including Global Over-Classification (GOC), Global Under-Classification (GUC), and Global Total Classification (GTC) error. Let S_i denote the i-th predicted parcel and O_i denote the GT parcel with which S_i had the largest area of overlap. The over-classification (OC) error was defined following [1]:

\mathrm{OC}(S_{i}) = 1 - \frac{\operatorname{area}(S_{i} \cap O_{i})}{\operatorname{area}(O_{i})},

(17)

and the under-classification (UC) error was defined as:

\mathrm{UC}(S_{i}) = 1 - \frac{\operatorname{area}(S_{i} \cap O_{i})}{\operatorname{area}(S_{i})},

(18)

and the total classification (TC) error as:

\mathrm{TC}(S_{i}) = \sqrt{\frac{\mathrm{OC}(S_{i})^{2} + \mathrm{UC}(S_{i})^{2}}{2}} .

(19)

To quantify spatial overreach, omission, and overall geometric fidelity, global error means were computed as area-weighted averages across all predicted polygons:

\mathrm{GOC} = \sum_{i=1}^{n} w_{i} \, \mathrm{OC}(S_{i}), \quad \mathrm{GUC} = \sum_{i=1}^{n} w_{i} \, \mathrm{UC}(S_{i}), \quad \mathrm{GTC} = \sum_{i=1}^{n} w_{i} \, \mathrm{TC}(S_{i}),

(20)

w_{i} = \frac{\operatorname{area}(S_{i})}{\sum_{k=1}^{n} \operatorname{area}(S_{k})} .

(21)
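As a worked illustration of Equations (17)–(21), the sketch below computes the per-polygon and area-weighted global errors from precomputed areas. The tuple layout (predicted area, GT area, intersection area) and the function names are assumptions of this example; in practice the areas would come from polygon overlay.

```python
import math


def classification_errors(area_pred, area_gt, area_inter):
    """Per-polygon OC, UC, and TC following Eqs. (17)-(19)."""
    oc = 1.0 - area_inter / area_gt
    uc = 1.0 - area_inter / area_pred
    tc = math.sqrt((oc ** 2 + uc ** 2) / 2.0)
    return oc, uc, tc


def global_errors(records):
    """Area-weighted means (GOC, GUC, GTC) following Eqs. (20)-(21).

    records: iterable of (area_pred, area_gt, area_inter) tuples, one per
    predicted polygon S_i and its largest-overlap GT parcel O_i.
    """
    records = list(records)
    total_pred_area = sum(r[0] for r in records)
    goc = guc = gtc = 0.0
    for area_pred, area_gt, area_inter in records:
        w = area_pred / total_pred_area
        oc, uc, tc = classification_errors(area_pred, area_gt, area_inter)
        goc += w * oc
        guc += w * uc
        gtc += w * tc
    return goc, guc, gtc


# Two predictions: one overshoots its parcel, one covers only part of it.
goc, guc, gtc = global_errors([(100.0, 80.0, 80.0), (300.0, 400.0, 300.0)])
```

With these numbers the weights are 0.25 and 0.75, giving GOC = 0.1875 and GUC = 0.05.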

To explicitly evaluate the spatial agreement between predicted and reference parcel outlines, boundary-sensitive metrics are also computed and reported. Metrics such as IoU and F1 primarily reflect the amount of shared area between polygons and are less sensitive to local boundary displacements; Boundary F1 (bF1) and Boundary IoU (bIoU) are therefore included as boundary-focused complements to the main accuracy metrics. The evaluation was performed on a common raster grid derived from the reference data extent, with both the ground truth parcel layer and the predicted parcel polygons rasterized to the same 3 m spatial resolution. After rasterization, parcel boundaries were extracted from the label images by identifying pixels located at transitions between different labels. Let L_GT denote the rasterized ground truth label image and L_PR the rasterized prediction label image. Their corresponding binary boundary maps are denoted by B_GT and B_PR, where a value of 1 marks a boundary pixel and 0 marks a non-boundary pixel. Because exact pixel-level coincidence of two boundaries is unrealistic in parcel delineation, a tolerant boundary-matching strategy was adopted: both boundary masks were dilated by a buffer δ, set in this study to two pixels (6 m). The buffered boundary supports are defined as:

B_{GT}^{δ} = \operatorname{dilate}(B_{GT}, δ), \quad B_{PR}^{δ} = \operatorname{dilate}(B_{PR}, δ) .

(22)

Boundary precision and recall were computed by testing whether predicted boundary pixels fell within the tolerance neighbourhood of the ground truth boundary and vice versa. Boundary precision is defined as:

P_{b} = \frac{|B_{PR} \cap B_{GT}^{δ}|}{|B_{PR}|},

(23)

which measures the proportion of predicted boundary pixels that lie within the tolerated neighbourhood of the reference boundary. Boundary recall is defined as:

R_{b} = \frac{|B_{GT} \cap B_{PR}^{δ}|}{|B_{GT}|},

(24)

which measures the proportion of reference boundary pixels that are recovered within the tolerated neighbourhood of the predicted boundary. Using these two quantities, the Boundary F1 score was computed as the harmonic mean:

bF1 = \frac{2 \cdot P_{b} \cdot R_{b}}{P_{b} + R_{b}}.

(25)

In addition to bF1, we also compute boundary Intersection-over-Union to provide a direct overlap measure between the tolerated boundary supports of the reference and prediction. Boundary IoU was defined as:

bIoU = \frac{|B_{GT}^{δ} \cap B_{PR}^{δ}|}{|B_{GT}^{δ} \cup B_{PR}^{δ}|}.

(26)

Unlike bF1, which is derived from separate precision and recall terms, bIoU summarizes the symmetric overlap between the dilated boundary sets in a single ratio. Higher values indicate stronger geometric consistency between predicted and reference contours after accounting for the predefined tolerance. As it is based exclusively on the boundary regions, bIoU is more sensitive to contour placement than the standard IoU.
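The boundary metrics in Equations (22)–(26) can be illustrated with a minimal sketch on small label grids. It uses only the Python standard library and represents boundary maps as sets of (row, column) pixels. Several details are assumptions made for illustration rather than the authors' implementation: a pixel counts as a boundary pixel when any of its four neighbours carries a different label, dilation uses a square neighbourhood of radius δ pixels clipped to the grid, and the helper `two_parcels` is a hypothetical toy generator.

```python
def boundary_pixels(labels):
    """Pixels with at least one 4-neighbour carrying a different label (assumed rule)."""
    rows, cols = len(labels), len(labels[0])
    out = set()
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] != labels[r][c]:
                    out.add((r, c))
                    break
    return out


def dilate(pixels, delta, shape):
    """Grow a pixel set by a square neighbourhood of radius delta, clipped to the grid."""
    rows, cols = shape
    out = set()
    for r, c in pixels:
        for dr in range(-delta, delta + 1):
            for dc in range(-delta, delta + 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    out.add((rr, cc))
    return out


def boundary_scores(labels_gt, labels_pr, delta=2):
    """Return (P_b, R_b, bF1, bIoU) following Equations (22)-(26)."""
    shape = (len(labels_gt), len(labels_gt[0]))
    b_gt, b_pr = boundary_pixels(labels_gt), boundary_pixels(labels_pr)
    b_gt_d, b_pr_d = dilate(b_gt, delta, shape), dilate(b_pr, delta, shape)
    p_b = len(b_pr & b_gt_d) / len(b_pr) if b_pr else 0.0
    r_b = len(b_gt & b_pr_d) / len(b_gt) if b_gt else 0.0
    bf1 = 2 * p_b * r_b / (p_b + r_b) if p_b + r_b > 0 else 0.0
    union = b_gt_d | b_pr_d
    biou = len(b_gt_d & b_pr_d) / len(union) if union else 0.0
    return p_b, r_b, bf1, biou


def two_parcels(split, rows=8, cols=12):
    """Hypothetical toy label image: parcel 1 left of `split`, parcel 2 to the right."""
    return [[1 if c < split else 2 for c in range(cols)] for _ in range(rows)]


# Reference boundary at column 6; predictions displaced by one and by four pixels.
near = boundary_scores(two_parcels(6), two_parcels(7), delta=2)
far = boundary_scores(two_parcels(6), two_parcels(10), delta=2)
print(near)
print(far)
```

In this toy case, a one-pixel displacement keeps precision and recall at 1 because it stays within the two-pixel tolerance, while bIoU already drops below 1. A four-pixel displacement drives bF1 to zero, yet the dilated supports still overlap slightly, so bIoU stays above zero.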

The identical encoding, segmentation, post-processing, and evaluation (pixel-wise and object-wise) settings were applied to the second AOI to enable a cross-site comparison; results are summarized in Appendix C.

2.7. Statistical Analysis

The indices were compared on a within-parcel basis using non-parametric tests for repeated measures. For each ground truth parcel, the best IoU against each index’s prediction was computed to form a per-parcel matrix: for each ground truth polygon G, candidate predictions were retrieved via a spatial index and IoU was computed with each, and the maximum IoU found was recorded for that parcel-index pair. The same approach was used for the F1 score. A Friedman omnibus test was then used to assess whether at least one method exhibited a different rank distribution across parcels [22]. Subsequently, paired Wilcoxon signed-rank tests were applied to all index pairs, and the Holm correction was applied to control the family-wise error rate [22,23]. For the Friedman omnibus test, the test statistic and p-value are reported; for the Wilcoxon tests, the median difference, the lower and upper bounds of the 95% confidence interval, and the Holm-adjusted p-value are reported for each pair and metric.
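The Holm step-down adjustment applied to the pairwise p-values can be sketched in a few lines of standard-library Python. This is a generic illustration of the procedure, not the authors' code: the function name `holm_adjust` and the example p-values are hypothetical, and the raw p-values are assumed to come from the paired Wilcoxon signed-rank tests.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Visit the hypotheses from the smallest to the largest raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]  # hypothetical raw p-values for three index pairs
print(holm_adjust(raw))
```

Because of the monotonicity step, several comparisons can end up sharing the same adjusted p-value: in the example, the second and third pairs both receive the value set by the second-smallest raw p-value.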

3. Results

3.1. Harmonically Recoloured Outputs

Figure 5, Figure 6 and Figure 7 illustrate the harmonically recoloured composites generated from the fitted annual harmonic descriptors for all evaluated vegetation indices. In these false colour images, Hue encodes the timing of peak vegetation activity (phase), saturation encodes the strength of the seasonal dynamics (amplitude), and value represents the baseline vegetation level (empirical mean). As a consequence, parcels with similar crop calendars share similar Hues, while strongly seasonal fields appear more saturated than weakly seasonal or mixed-cover areas.
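The channel assignment described above can be sketched for a single pixel using Python's standard `colorsys` module. The scaling is an assumption for illustration only: phase is taken to be in radians and wrapped onto [0, 1) for hue, while amplitude and mean are linearly rescaled and clipped using hypothetical bounds `amp_max`, `mean_min`, and `mean_max`; the normalization actually used in the pipeline may differ.

```python
import colorsys
import math


def harmonic_to_rgb(phase, amplitude, mean, amp_max, mean_min, mean_max):
    """Map (phase, amplitude, mean) to an RGB triple in [0, 1] via HSV."""
    # Hue: timing of the seasonal peak, wrapped onto one full colour cycle.
    hue = (phase % (2 * math.pi)) / (2 * math.pi)
    # Saturation: strength of the seasonal dynamics, clipped to [0, 1].
    sat = min(max(amplitude / amp_max, 0.0), 1.0)
    # Value: baseline vegetation level, rescaled between assumed bounds.
    val = min(max((mean - mean_min) / (mean_max - mean_min), 0.0), 1.0)
    return colorsys.hsv_to_rgb(hue, sat, val)


# A strongly seasonal pixel and a pixel with no seasonality (hypothetical values).
seasonal = harmonic_to_rgb(0.0, 0.4, 0.8, amp_max=0.4, mean_min=0.0, mean_max=0.8)
flat = harmonic_to_rgb(1.0, 0.0, 0.5, amp_max=0.4, mean_min=0.0, mean_max=1.0)
print(seasonal, flat)
```

Under this mapping, a pixel with zero amplitude renders as a grey whose brightness depends only on the mean, whatever its phase, which is consistent with weakly seasonal areas appearing desaturated in the composites.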

Across vegetation indices, the composites show consistent large-scale phenological patterns but differ in how clearly they separate adjacent parcels and how strongly non-vegetated elements are suppressed. Indices designed to reduce soil background effects typically produce higher saturation within cultivated parcels while maintaining comparatively uniform value in bare or sparsely vegetated zones, which visually improves parcel-to-parcel contrast. Indices more sensitive to canopy greenness emphasize dense vegetation but may compress contrast in areas where vegetation is uniformly high, making neighbouring parcels appear more similar. Red-edge and chlorophyll-related indices often highlight crop vigour differences more strongly within the same general phenological timing, which can increase within-field texture and may either aid or hinder boundary delineation, depending on local heterogeneity. Finally, NDWI behaves differently from greenness indices by emphasizing moisture-related variation; this can accentuate drainage patterns and non-crop-related features and therefore changes the visual relationship between parcel interiors and boundaries.

For additional context, we evaluated three commonly used classical segmentation baselines (Felzenszwalb, SLIC, and Quickshift); their pixel-wise and object-wise results are reported in Appendix B.

Figure 5 and Figure 6 summarize these behaviours for the main greenness and soil-adjusted indices (NDVI, GNDVI, NDRE, NDYVI, EVI, EVI2, SAVI, and MSAVI), while Figure 7 shows indices with different physical sensitivity (NDWI) and chlorophyll proxies (CIg, CIre).

3.2. Segmentation Outputs

Each HSV composite is segmented using the Segment Anything Model (SAM) on a fixed 512 × 512 px grid. SAM’s native output is a binary raster mask, which is then vectorized for further analysis and presentation. All panels use identical scale and symbology and are shown over their corresponding base map.
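A fixed 512 × 512 px grid over the composite can be sketched as a list of tile windows. The sketch assumes non-overlapping tiles with clipped edge tiles, which the text does not specify, and an image of roughly 5000 × 5000 px, which is what a 15 × 15 km extent at 3 m resolution would give. The function name `tile_windows` is hypothetical.

```python
def tile_windows(height, width, tile=512):
    """Return (row, col, tile_height, tile_width) windows covering the image."""
    windows = []
    for row in range(0, height, tile):
        for col in range(0, width, tile):
            # Edge tiles are clipped to the image extent (assumed, not padded).
            windows.append((row, col, min(tile, height - row), min(tile, width - col)))
    return windows


windows = tile_windows(5000, 5000)
print(len(windows), windows[-1])
```

Parcels crossing the seams between neighbouring windows are split across tiles, which is why a separate seam-band evaluation is informative for a tiled workflow of this kind.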

Figure 8 presents the segmentation outputs for four indices over the previously defined AOI: Panel (a) NDVI, Panel (b) GNDVI, Panel (c) NDRE, and Panel (d) NDYVI. Panels show the vectorized instance polygons derived from SAM on the previously shown HSV inputs.

Figure 9 depicts the segmentation outputs for four indices over the AOI: Panel (a) EVI, Panel (b) EVI2, Panel (c) SAVI (L = 0.5), and Panel (d) MSAVI.

Figure 10 shows the segmentation outputs for three additional indices: Panel (a) NDWI, Panel (b) CIg, and Panel (c) CIre.

3.3. Validation Metrics

Object-based metrics were used to indicate whether a distinct geometry was produced for each reference parcel, while the size-stratified tables were used to describe how performance varied with parcel area. Pixel-based metrics were used to quantify spatial agreement on a shared 3 m raster grid.

Across vegetation indices, three recurring segmentation failure modes were observed, each of which substantially affected the object-level scores:

Over-segmentation: Multiple predicted objects were assigned to a single GT parcel. Precision was reduced due to the increase in FPs, while recall was generally preserved. An example is shown in Figure 11a.
Under-segmentation: A single predicted polygon was found to cover two or more GT parcels. Recall was reduced because fewer reference parcels were matched, while precision was typically less affected. An example is shown in Figure 11b.
Fragmentation: A parcel was represented by numerous small predicted components. When at least one component exceeded IoU > 0.50, a true positive was recorded alongside multiple false positives, leading to reduced precision. An example is shown in Figure 11c.
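The effect of these failure modes on object-level scores can be reproduced with a small sketch. Parcels are simplified to axis-aligned rectangles `(xmin, ymin, xmax, ymax)`, and the matching rule is an assumption for illustration: each reference parcel takes its best-overlapping unused prediction and counts as a true positive only when IoU > 0.50. The function names are hypothetical and the matching procedure used in the study may differ in detail.

```python
def rect_area(r):
    """Area of an axis-aligned rectangle; zero if the rectangle is empty."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def rect_iou(a, b):
    """Intersection-over-Union of two axis-aligned rectangles."""
    inter = rect_area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = rect_area(a) + rect_area(b) - inter
    return inter / union if union > 0 else 0.0


def object_scores(gt, pred, thr=0.5):
    """Return (TP, FP, FN, precision, recall, F1) under one-to-one matching."""
    used, tp = set(), 0
    for g in gt:
        best_iou, best_j = 0.0, None
        for j, p in enumerate(pred):
            if j in used:
                continue
            iou = rect_iou(g, p)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou > thr:
            used.add(best_j)
            tp += 1
    fp, fn = len(pred) - tp, len(gt) - tp
    precision = tp / (tp + fp) if pred else 0.0
    recall = tp / (tp + fn) if gt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return tp, fp, fn, precision, recall, f1


# Fragmentation: one parcel covered by one large and two small pieces.
fragmented = object_scores([(0, 0, 10, 10)], [(0, 0, 10, 6), (0, 6, 10, 8), (0, 8, 10, 10)])
# Under-segmentation: one predicted polygon spanning two neighbouring parcels.
merged = object_scores([(0, 0, 14, 10), (14, 0, 20, 10)], [(0, 0, 20, 10)])
print(fragmented)
print(merged)
```

In the fragmented case, the largest piece reaches IoU = 0.6 and is counted as a true positive, while the two remaining pieces become false positives, so recall stays at 1 and precision falls to one third. In the merged case, the single polygon matches only the larger parcel, so precision stays at 1 while recall drops to one half.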

3.3.1. Pixel-Wise Metrics

Table 2 shows the pixel-wise metrics. All indices achieve high scores, confirming that the interiors of the parcels are well covered for each vegetation index, and most of the differences between indices are marginal. NDWI achieves the highest pixel-wise metrics overall, with NDVI practically indistinguishable from it; EVI2, EVI, MSAVI, and SAVI form a tight cluster just behind. NDRE trails the greenness family, and CIg and CIre perform the worst.

3.3.2. Object-Wise Metrics

Table 3 presents the object-related metrics, with columns representing the individual metrics: precision, recall, F1 score, and mean IoU. MSAVI leads in precision, recall, and F1, with EVI2 and SAVI effectively tied just behind it. For mean IoU, EVI ranks first with NDWI second, while EVI2 and MSAVI are practically indistinguishable. Greenness-based indices sit mid-pack, NDRE trails them, and CIg and CIre produce the lowest results.

Table 4 shows the object-wise metrics for parcels smaller than 10,000 m². MSAVI, EVI, and SAVI lead in precision, recall, and F1, while NDRE, CIg, and CIre fall behind. NDWI achieves the highest mean IoU, but the margin is tiny and does not change the overall ordering.

Table 5 shows the results for parcels greater than 10,000 m² and smaller than 100,000 m². Performance improves over small parcels, with higher recall and F1 across all indices. A stable cluster of MSAVI, EVI2, EVI, and SAVI leads in precision, recall and F1, with NDWI close behind. NDVI, GNDVI, NDYVI, and NDRE form a middle tier, while CIg and CIre trail behind. For mIoU, EVI produces the best results but by an insignificant margin.

Table 6 shows the results for parcels larger than 100,000 m². Recall is the stronger component across indices, at around 0.8, which indicates that the method’s coverage is broad for large fields, while precision is comparatively lower, consistent with boundary overreach. NDWI leads in F1 and recall, with EVI and EVI2 close behind.

3.3.3. Tile Boundary Errors

Table 7 reports the differences between the seam band and the overall AOI. The dominant effect is a recall drop at the seams across all indices, while precision is essentially unchanged. F1 and IoU are lower in the seam for every index. The largest IoU penalty occurs for NDRE, followed by CIg and NDVI, with the smallest penalties affecting MSAVI, EVI, and SAVI.

3.3.4. Fragmentation Metrics

Table 8, Table 9 and Table 10 report the fragmentation ratio, defined as the number of predicted polygons normalized by the number of GT parcels and stratified by parcel-size class. Values below 1 indicate under-segmentation, while values above 1 indicate over-segmentation.

Table 8 reports the results for small parcels; ratios are well below one across all indices, indicating merging and under-segmentation at the smallest scale. NDVI achieves the highest result, while CIre achieves the lowest.

Table 9 reports the results for medium parcels; all ratios are slightly above one, indicating mild fragmentation and over-segmentation. NDWI produces a ratio closest to one, while CIg and NDVI are towards the higher end of the results.

Table 10 reports the ratios for large parcels; ratios rise to nearly 2.5, suggesting the fragmentation of large, homogeneous fields. NDWI again yields the lowest fragmentation ratio, while CIre and CIg produce the highest fragmentation ratios.

3.3.5. Over-Segmentation and Under-Segmentation Metrics

Table 11, Table 12 and Table 13 report the Global Over-Classification Error (GOC), Global Under-Classification Error (GUC), and their combined measure, the Global Total Classification Error (GTC), grouped by parcel area.

The largest errors occur in the bin with the smallest parcels (Table 11), where both GOC and GUC are high, resulting in GTC values in the high 0.3 range across all indices. The best performers are MSAVI, EVI, EVI2, NDWI, and SAVI, while NDRE, CIg, and CIre are at the high end of the error spectrum. GUC and GOC are relatively balanced, indicating that small parcels are prone to both spillover and missed interiors due to their tight boundaries.

Errors decrease notably for the bin with medium parcels (Table 12). Here, GOC clearly dominates GUC, indicating that medium parcels are more affected by boundary spillover than by omission. EVI, EVI2, MSAVI, SAVI, and NDWI produce the best results, while CIg and CIre remain among the weakest. Overall, GTC is in the mid 0.2 range, which is significantly lower than for small parcels.

Errors are lowest for the bin with large parcels (Table 13). GOC greatly exceeds GUC, as large fields exhibit overreach but with very little omission; their interiors are mostly complete, as supported by the high mIoU for large parcels reported in Table 6. NDWI achieves the lowest GTC in this bin, with EVI, EVI2, MSAVI, and NDRE close behind. CIre and CIg are the most error-prone, and GTC sits roughly between 0.15 and 0.20.

GTC decreases monotonically with parcel size, and over-classification dominates under-classification across all bin sizes. Regarding the indices, a stable top cluster is formed by EVI, EVI2, MSAVI, SAVI, and NDWI, which consistently yield the lowest total error. NDRE, NDVI, GNDVI, and NDYVI form a middle tier, while CIg and CIre consistently yield the greatest errors.

3.3.6. Boundary-Sensitive Metrics

Boundary-sensitive evaluation with a two-pixel (6 m) tolerance showed clear differences in contour agreement among the tested indices. The results are shown in Table 14. Overall, the strongest boundary performance was obtained with MSAVI, which achieved the highest bF1 (0.4741) and bIoU (0.3724), closely followed by EVI (bF1 = 0.4725, bIoU = 0.3262) and EVI2 (bF1 = 0.4712, bIoU = 0.3257). SAVI also performed strongly, confirming that soil-adjusted and enhanced vegetation formulations produced the most accurate parcel contours in this setting. By contrast, lower boundary agreement was observed for chlorophyll- and red-edge-oriented indices. Standard greenness indices such as NDVI and GNDVI occupied an intermediate position, while NDWI performed slightly better than NDVI in terms of boundary overlap.

3.3.7. Statistical Significance Testing

Across 11 indices, the Friedman test rejects the null hypothesis of equal performance. IoU and F1 produced identical rank orders and decisions, so only one set of values is reported. The results are shown in Table 14. Accordingly, the Friedman test statistic, Q, is equal to 2504.13, and the p-value is less than 0.001.

We applied paired Wilcoxon signed-rank tests with Holm correction on per-parcel comparisons across the 11 indices. Tests were run across all parcels. The main text reports a reduced, representative set of pairs for IoU in Table 15 and F1 in Table 16, while complete pairwise tables for both metrics are in Appendix A, Table A1 (IoU) and Table A2 (F1). These pairs capture the overall pattern: a top cluster contains MSAVI, EVI, EVI2, and SAVI, which are mutually indistinguishable. A middle cluster contains NDVI and NDWI, which are indistinguishable within the pair, followed by GNDVI and NDYVI, then NDRE, with CIg and CIre consistently performing the worst.

4. Discussion

Across eleven vegetation indices under a fixed recolouring scheme and zero-shot segmentation pipeline, the delineations were consistently strong at the pixel level, while parcel level precision and recall were noticeably lower. Index choice influenced the ability to properly delineate parcels, with soil-adjusted and enhanced-greenness indices tending to stabilize boundaries, while chlorophyll ratio indices consistently provided the worst results. Repeated measures tests indicated systematic differences among indices under identical preprocessing and segmentation pipelines, reinforcing that index physics, not tuning, drives the observed shift in results.

These findings align with prior evidence that temporal encodings can produce parcel-coherent masks with zero-shot SAM: a compact annual harmonic projected into cylindrical colour spaces yields stable interiors and sharper inter-parcel contrast. Under HSV, Hue separates neighbours by timing, saturation scales with the seasonal range, and value reflects the baseline level. The dominant errors were over-segmentation, under-segmentation at the parcel level, and boundary under-reach with otherwise correct masks, which explains the strong pixel coverage but lower object-wise precision. Hue can split large uniform fields when subtle management or moisture gradients affect timing; high saturation improves detectability but can also encourage duplicate fragments; and value tracks empirical mean greenness, which dampens short-lived events. Index interactions are consistent with these mechanics: soil-adjusted and enhanced-greenness indices (MSAVI, SAVI, EVI, and EVI2) generally reduce merges and spurious splits, while chlorophyll-ratio indices (CIg, CIre) are more sensitive to edge instability, and NDWI covers interiors well but may reduce inter-parcel chromatic contrast.

Papić et al. [8] demonstrated that recolouring multitemporal NDVI into cylindrical colour spaces and segmenting with a zero-shot segmenter yields parcel-coherent fields without retraining. HSV, HWB, and LCH each emphasized different aspects of the seasonal signal and produced distinct precision–recall trade-offs. Our results preserve this core mechanism: contrast from time using a compact annual harmonic is mapped to perceptual channels, while showing that index choice affects the strength of these cues. Compared to classic delineation methods based on gradients, region-growing or spectral-similarity merging, this approach derives contrast from seasonal trajectories rather than single-date spatial patterns [1].

Limitations include evaluation in a limited set of regions and a single season with one sensor family, geometry and timing mismatches in administrative ground truth, and sensitivity of time series encodings to residual atmospheric effects. In addition, the evaluation uses a ground truth-aware IoU screening step prior to computing metrics; therefore, the reported results reflect conditional quality for sufficiently overlapping candidates and may overestimate performance in a fully automated setting. The single-harmonic assumption captures the dominant seasonal mode but not double cropping or rapid cut regrowth. Mapping amplitude to the saturation and mean to value stabilizes the seasonality but may also mute diagnostically brief events. Index ordering shows AOI dependence in the smallholder setting, indicating that relative performance is not fully stable across contrasting parcel regimes.

Strengths include a controlled study design attributing differences to index physics; eleven indices spanning greenness, soil-adjusted, red-edge, and water-sensitive variants over a full-year series; a training-free and interpretable encoding that is both index- and sensor-agnostic; and rigorous evaluation with multiple metric types and statistical testing.

Index selection is a significant factor for boundary quality in temporally encoded, zero-shot delineation. The same harmonic descriptors used for delineation also carry crop-specific information, enabling the development of a unified methodology for field delineation and crop mapping from a single interpretable representation, well aligned with CAP-style monitoring and seasonal change analysis [8,11,12]. Recent reviews of crop mapping and yield prediction highlight the benefits of multitemporal inputs and modern deep networks but also underline label demands and generalization issues [24]. In smallholder mosaics, large-scale delineation with transfer learning and weak supervision has achieved strong boundary quality while drastically reducing manual labels yet still requires supervised training, which positions our zero-shot methodology as a fast, interpretable starting point that can seed supervised refinements [25]. For crop classification, crop-specific spectro-temporal feature selection improves map accuracy by tailoring features per class [26]. Our harmonic descriptors provide compact, discriminative features that can be integrated into such methodologies or guide class-wise feature subsets.

Future work should proceed along two complementary tracks. The first comprises zero-shot upgrades: pairing the same recoloured inputs with specialized SAM derivatives, such as Delineate Anything [2], FieldSeg [4], BoundarySAM [9], and fabSAM [10], and applying boundary-aware refinements, such as Principal Component Analysis (PCA), high-frequency enhancement, guided filtering, or light prompt/decoder tuning, to tighten edges and suppress spurious splits without altering the temporal encoding [9,10]. The second is a supervised path: training compact segmenters such as U-Net, SegNet, or DeepLabV3+ directly on HSV composites, using a frozen DINOv3 backbone plus a lightweight decoder for data-efficient fine-tuning, applying size-aware sampling to reduce fragmentation, and implementing simple topology repair post-inference [27,28]. In parallel, the harmonic descriptors embedded in the HSV image can be used for parcel-level crop mapping.

5. Conclusions

This work proposes a simple, training-free method that converts annual harmonic summaries of vegetation index time series into HSV composites and feeds them to a zero-shot segmenter for parcel delineation over a 15 × 15 km AOI in Slavonija.

Harmonically summarizing vegetation index time series and mapping them to HSV, then segmenting with a zero-shot segmenter, provides a solid and transparent baseline for field delineation at scale. The pipeline is fast, training-free, and index-agnostic. The same phase–amplitude–mean descriptors embedded in the composites are also discriminative for crop mapping, enabling a single, scalable workflow that delineates fields and assigns crop types, while leaving a clear path to supervised heads and multi-sensor extensions.

Author Contributions

Conceptualization, F.P. and M.M.; methodology, F.P.; software, F.P.; validation, L.R. and D.M.; formal analysis, F.P.; investigation, F.P. and M.M.; resources, F.P. and L.R.; data curation, F.P.; writing—original draft preparation, F.P.; writing—review and editing, M.M. and D.M.; visualization, F.P.; supervision, M.M.; project administration, L.R.; funding acquisition, L.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Croatian Science Foundation for the FORMAT project: “Fusion of multitemporal optical and radar microsatellite data for land cover change detection”, Grant Number IP-2022-10-2639. This research was funded by the University of Zagreb, Faculty of Geodesy, through the institutional research projects Primjena spektralnih modela za segmentaciju zemljišnog pokrova—PRISMA (2025–2029) and Prostorno modeliranje urbanih šuma korištenjem LiDAR podataka i otvorenih GIS alata—GeoUrbanBio (2025–2029), under the Call for Institutional Research Project Funding, financed by the European Union—NextGenerationEU. The views and opinions expressed are those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the European Commission can be held responsible for them.

Data Availability Statement

Restrictions apply to the availability of this data. The PlanetScope imagery analyzed in this study was obtained under a research licence from Planet Labs PBC and cannot be redistributed. Derived data (figures) are included in the article. The ground truth data obtained from PAAFRD also cannot be redistributed.

Acknowledgments

Planet Team (2025). Planet Application Program Interface: In Space for Life on Earth. San Francisco, CA. https://api.planet.com, accessed on 5 February 2026. Paying Agency for Agriculture, Fisheries, and Rural Development (2026). Zagreb, Croatia. https://www.apprrr.hr/, accessed on 19 January 2026. During the preparation of this work the authors used ChatGPT 5.2 in order to improve the quality of writing. After using this tool, the authors reviewed and edited the content as needed and took full responsibility for the content of the publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

APBD	Agricultural Parcel and Boundary Delineation
SAM	Segment Anything Model
NDVI	Normalized Difference Vegetation Index
GNDVI	Green Normalized Difference Vegetation Index
NDRE	Normalized Difference Red Edge
EVI	Enhanced Vegetation Index
EVI2	Two-band Enhanced Vegetation Index
SAVI	Soil-Adjusted Vegetation Index
MSAVI	Modified Soil-Adjusted Vegetation Index
NDWI	Normalized Difference Water Index
CIg	Chlorophyll Index Green
CIre	Chlorophyll Index Red-Edge
NDYVI	Normalized Difference Yellow Vegetation Index
HSV	Hue-Saturation-Value
HWB	Hue-Whiteness-Blackness
LCH	Luminance-Chroma-Hue
AOI	Area-of-Interest
CAP	Common Agricultural Policy
NIR	Near Infra-Red
OLS	Ordinary Least Squares
IoU	Intersection-over-Union
GT	Ground Truth
PAAFRD	Paying Agency for Agriculture, Fisheries and Rural Development
TP	True Positive
FP	False Positive
FN	False Negative
GOC	Global Over-Classification Error
GUC	Global Under-Classification Error
GTC	Global Total Classification Error
mIoU	Mean Intersection-over-Union
PCA	Principal Component Analysis

Appendix A

In this appendix, the full set of post hoc pairwise comparisons among vegetation-index variants is provided to support the main results. After statistically significant differences among indices were established using a Friedman test (Q = 2504.13, p < 0.001), Wilcoxon signed-rank tests were carried out for all index pairs, and Holm correction was applied to control the family-wise error rate.

The outcomes are summarized in Table A1 and Table A2 for IoU and F1, respectively. For every index pair, the median difference in scores is reported, together with a 95% bootstrap confidence interval and the Holm-adjusted p-value. Positive median differences indicate higher performance of the first-listed index. In addition to the full pairwise comparisons, per-parcel comparisons against NDVI showed that the top-tier indices MSAVI, EVI2, EVI, and SAVI significantly outperformed NDVI, with all Holm-adjusted p-values below 0.001. These indices exceeded NDVI on 62.9–63.4% of parcels, with median IoU improvements of 0.012–0.0147. In contrast, NDWI did not significantly outperform NDVI (adjusted p = 0.817), with a zero median difference and a win rate of 48.6%. Overall, the statistical results indicate a consistent advantage of several top-tier formulations over NDVI, whereas NDWI did not show a significant per-parcel improvement.
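The per-pair summary quantities (median difference, bootstrap confidence interval, and win rate) can be sketched with the Python standard library. The sketch assumes a percentile bootstrap of the median and defines the win rate as the share of parcels on which the first index scores strictly higher. The bootstrap variant, the number of resamples, the function name `paired_summary`, and the example scores are all assumptions for illustration, not values or choices taken from the study.

```python
import random
import statistics


def paired_summary(scores_a, scores_b, n_boot=2000, seed=0):
    """Return (median difference, lower 95% bound, upper 95% bound, win rate)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    # Percentile bootstrap: resample parcels with replacement, take each median.
    medians = sorted(
        statistics.median(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot)
    )
    lower = medians[int(0.025 * n_boot)]
    upper = medians[int(0.975 * n_boot) - 1]
    # Share of parcels on which index A strictly beats index B.
    win_rate = sum(d > 0 for d in diffs) / len(diffs)
    return statistics.median(diffs), lower, upper, win_rate


# Hypothetical per-parcel best-IoU values for two indices on six parcels.
index_a = [0.72, 0.65, 0.80, 0.55, 0.90, 0.61]
index_b = [0.70, 0.66, 0.78, 0.55, 0.85, 0.60]
print(paired_summary(index_a, index_b))
```

Ties count as neither a win nor a loss here, so a win rate near 50% together with a zero median difference indicates that neither index is systematically ahead.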

Table A1. Full pairwise Wilcoxon signed-rank tests for IoU scores across indices.

Index A	Index B	Median IoU Difference	Lower 95% CI ¹	Upper 95% CI ¹	Adjusted p-Value ²
CIre	MSAVI	−0.02329	−0.02519	−0.02061	<0.001
CIre	EVI2	−0.02288	−0.02571	−0.02064	<0.001
CIre	EVI	−0.02352	−0.02622	−0.02130	<0.001
CIre	SAVI	−0.02236	−0.02496	−0.01950	<0.001
EVI	NDRE	0.01004	0.00882	0.01181	<0.001
EVI2	NDRE	0.01029	0.00880	0.01187	<0.001
MSAVI	NDRE	0.01031	0.00896	0.01200	<0.001
CIg	MSAVI	−0.01577	−0.01833	−0.01354	<0.001
CIg	EVI2	−0.01648	−0.01897	−0.01409	<0.001
NDRE	SAVI	−0.00972	−0.01115	−0.00836	<0.001
CIg	EVI	−0.01683	−0.01908	−0.01413	<0.001
EVI2	NDYVI	0.00852	0.00723	0.00986	<0.001
CIg	SAVI	−0.01519	−0.01781	−0.01288	<0.001
CIre	NDVI	−0.01476	−0.01747	−0.01245	<0.001
MSAVI	NDYVI	0.00821	0.00667	0.00977	<0.001
CIre	NDWI	−0.01462	−0.01673	−0.01218	<0.001
EVI	NDYVI	0.00836	0.00705	0.00962	<0.001
EVI2	GNDVI	0.00826	0.00704	0.00984	<0.001
EVI	GNDVI	0.00875	0.00744	0.01023	<0.001
CIre	NDYVI	−0.01041	−0.01218	−0.00825	<0.001
GNDVI	MSAVI	−0.00895	−0.01011	−0.00753	<0.001
GNDVI	SAVI	−0.00747	−0.00869	−0.00620	<0.001
NDYVI	SAVI	−0.00698	−0.00799	−0.00592	<0.001
EVI2	NDVI	0.00585	0.00465	0.00703	<0.001
CIre	GNDVI	−0.00911	−0.01143	−0.00674	<0.001
EVI	NDVI	0.00575	0.00454	0.00698	<0.001
CIre	NDRE	−0.00617	−0.00807	−0.00434	<0.001
MSAVI	NDVI	0.00615	0.00454	0.00745	<0.001
NDVI	SAVI	−0.00485	−0.00603	−0.00387	<0.001
NDRE	NDVI	−0.00309	−0.00415	−0.00207	<0.001
CIg	NDVI	−0.00775	−0.00998	−0.00577	<0.001
CIg	NDWI	−0.00664	−0.00901	−0.00458	<0.001
NDRE	NDWI	−0.00301	−0.00420	−0.00173	<0.001
EVI2	NDWI	0.00454	0.00329	0.00604	<0.001
EVI	NDWI	0.00481	0.00327	0.00622	<0.001
MSAVI	NDWI	0.00486	0.00303	0.00625	<0.001
CIg	CIre	0.00074	0.00000	0.00173	<0.001
CIg	NDYVI	−0.00352	−0.00524	−0.00188	<0.001
NDRE	NDYVI	−0.00087	−0.00133	−0.00039	<0.001
NDWI	SAVI	−0.00343	−0.00514	−0.00216	<0.001
CIg	GNDVI	−0.00398	−0.00582	−0.00208	<0.001
GNDVI	NDVI	−0.00084	−0.00157	−0.00006	<0.001
GNDVI	NDWI	−0.00085	−0.00180	0.00000	<0.001
NDWI	NDYVI	0.00133	0.00014	0.00221	<0.001
EVI	SAVI	0.00023	0.00000	0.00047	<0.001
GNDVI	NDRE	0.00086	0.00016	0.00162	<0.001
EVI2	SAVI	0.00000	0.00000	0.00019	<0.001
NDVI	NDYVI	0.00062	0.00000	0.00137	<0.001
MSAVI	SAVI	0.00000	0.00000	0.00010	<0.001
CIg	NDRE	−0.00028	−0.00199	0.00000	<0.001
EVI2	MSAVI	0.00000	0.00000	0.00000	0.032
EVI	EVI2	0.00000	0.00000	0.00000	0.591
EVI	MSAVI	0.00000	0.00000	0.00000	0.817
GNDVI	NDYVI	0.00000	0.00000	0.00000	0.817
NDVI	NDWI	0.00000	0.00000	0.00000	0.817

¹ Confidence interval; ² Holm-adjusted p-value.

Table A2. Full pairwise Wilcoxon signed-rank tests for F1 scores across indices.

Index A	Index B	Median F1 Difference	Lower 95% CI ¹	Upper 95% CI ¹	Adjusted p-Value ²
CIre	MSAVI	−0.02047	−0.02247	−0.01814	<0.001
CIre	EVI2	−0.02027	−0.02224	−0.01787	<0.001
CIre	EVI	−0.02070	−0.02285	−0.01881	<0.001
CIre	SAVI	−0.01971	−0.02169	−0.01743	<0.001
EVI2	NDRE	0.00914	0.00811	0.01044	<0.001
EVI	NDRE	0.00918	0.00793	0.01024	<0.001
MSAVI	NDRE	0.00921	0.00774	0.01038	<0.001
CIg	MSAVI	−0.01378	−0.01550	−0.01210	<0.001
CIg	EVI2	−0.01413	−0.01600	−0.01228	<0.001
NDRE	SAVI	−0.00847	−0.00954	−0.00728	<0.001
CIg	EVI	−0.01398	−0.01585	−0.01251	<0.001
EVI2	NDYVI	0.00732	0.00628	0.00824	<0.001
CIre	NDVI	−0.01297	−0.01486	−0.01120	<0.001
CIg	SAVI	−0.01342	−0.01529	−0.01169	<0.001
CIre	NDWI	−0.01279	−0.01447	−0.01107	<0.001
MSAVI	NDYVI	0.00736	0.00614	0.00837	<0.001
EVI	NDYVI	0.00730	0.00625	0.00845	<0.001
EVI2	GNDVI	0.00733	0.00623	0.00852	<0.001
EVI	GNDVI	0.00770	0.00651	0.00881	<0.001
CIre	NDYVI	−0.00903	−0.01061	−0.00738	<0.001
GNDVI	MSAVI	−0.00767	−0.00906	−0.00649	<0.001
GNDVI	SAVI	−0.00662	−0.00760	−0.00556	<0.001
NDYVI	SAVI	−0.00604	−0.00673	−0.00511	<0.001
EVI2	NDVI	0.00505	0.00415	0.00614	<0.001
CIre	GNDVI	−0.00820	−0.01009	−0.00673	<0.001
CIre	NDRE	−0.00567	−0.00695	−0.00413	<0.001
EVI	NDVI	0.00502	0.00420	0.00594	<0.001
MSAVI	NDVI	0.00521	0.00401	0.00619	<0.001
NDVI	SAVI	−0.00438	−0.00524	−0.00324	<0.001
NDRE	NDVI	−0.00289	−0.00383	−0.00201	<0.001
CIg	NDVI	−0.00700	−0.00894	−0.00479	<0.001
CIg	NDWI	−0.00636	−0.00829	−0.00429	<0.001
NDRE	NDWI	−0.00269	−0.00369	−0.00183	<0.001
EVI2	NDWI	0.00414	0.00310	0.00558	<0.001
EVI	NDWI	0.00436	0.00309	0.00542	<0.001
MSAVI	NDWI	0.00451	0.00325	0.00601	<0.001
CIg	CIre	0.00072	0.00000	0.00167	<0.001
CIg	NDYVI	−0.00309	−0.00453	−0.00202	<0.001
NDRE	NDYVI	−0.00082	−0.00122	−0.0004	<0.001
CIg	GNDVI	−0.00375	−0.00545	−0.00194	<0.001
NDWI	SAVI	−0.00336	−0.00440	−0.00210	<0.001
GNDVI	NDVI	−0.00068	−0.00128	−0.00011	<0.001
GNDVI	NDWI	−0.00088	−0.00163	0.00000	<0.001
NDWI	NDYVI	0.00116	0.00012	0.00210	<0.001
EVI	SAVI	0.00020	0.00000	0.00043	<0.001
GNDVI	NDRE	0.00089	0.00014	0.00151	<0.001
EVI2	SAVI	0.00000	0.00000	0.00020	<0.001
NDVI	NDYVI	0.00060	0.00000	0.00118	<0.001
CIg	NDRE	−0.00029	−0.00173	0.00000	<0.001
MSAVI	SAVI	0.00000	0.00000	0.00011	<0.001
EVI2	MSAVI	0.00000	0.00000	0.00000	0.043
EVI	EVI2	0.00000	0.00000	0.00000	0.709
EVI	MSAVI	0.00000	0.00000	0.00000	0.899
GNDVI	NDYVI	0.00000	0.00000	0.00000	0.899
NDVI	NDWI	0.00000	0.00000	0.00000	0.899

¹ Confidence interval; ² Holm-adjusted p-value.

Appendix B

To contextualize the performance of the proposed SAM-based approach, we additionally evaluated three commonly used classical segmentation algorithms (Felzenszwalb graph-based segmentation, SLIC superpixels, and Quickshift superpixels), which were applied directly to the same harmonic composite MSAVI input. The goal is to provide a representative non-learning reference of the kind typically used for image segmentation.

Baseline segmentations were polygonized and filtered using the previously described algorithm, and accuracy was quantified using pixel-wise metrics and object-wise metrics, while matching at an IoU threshold of 0.50. The results are summarized in Table A3 and Table A4, and the segmentation outputs are depicted in Figure A1.

Figure A1. (a) Segmentation output of harmonic Felzenszwalb, (b) segmentation output of harmonic Quickshift, and (c) segmentation output of harmonic SLIC.

As shown in Table A3, superpixel-based methods achieved high pixel-wise scores on this AOI, whereas Felzenszwalb yielded a substantially lower pixel-wise IoU and F1. This behaviour is expected, as pixel-wise metrics can be inflated when methods generate extensive contiguous masks that overlap the parcel interiors, even if parcel boundaries are imprecise or instances are not well separated.

Table A3. Pixel-wise metrics for different segmentation methods.

Method	Precision	Recall	F1	IoU
Felzenszwalb	0.9691	0.4863	0.6476	0.4789
Quickshift	0.9490	0.8911	0.9192	0.8504
SLIC	0.9611	0.8399	0.8965	0.8124
SAM	0.9797	0.8057	0.8842	0.7924

Object-wise evaluation in Table A4 reveals a different picture. Despite high pixel-wise scores, SLIC and Quickshift show limited parcel-level delineation quality at IoU > 0.50 due to over-segmentation and fragmentation. Quickshift achieved the highest object-wise recall amongst the three, while SLIC produced lower object-wise F1. Felzenszwalb comparatively produced higher precision but very low recall, which indicates that it missed many parcels.

Table A4. Object-wise metrics for different segmentation methods.

Method	Precision	Recall	F1	mIoU
Felzenszwalb	0.4920	0.1036	0.1712	0.7420
Quickshift	0.2770	0.3942	0.3254	0.6995
SLIC	0.2053	0.2775	0.2360	0.6589
SAM	0.5198	0.4869	0.5028	0.6998
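To make the object-wise evaluation concrete, the sketch below matches predicted parcels to reference parcels one-to-one at an IoU threshold and derives precision, recall, F1, and mIoU from the matches. It is an illustration under stated assumptions rather than the authors' implementation: parcels are represented as sets of pixel coordinates instead of polygons, matching is greedy by descending IoU, a pair counts as a match when IoU ≥ 0.50, and mIoU is taken as the mean IoU over matched pairs. All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def object_metrics(
    gt: List[Set[Pixel]], pred: List[Set[Pixel]], threshold: float = 0.5
) -> Dict[str, float]:
    # Score every reference/prediction pair, best overlap first.
    pairs = sorted(
        ((iou(g, p), gi, pi) for gi, g in enumerate(gt) for pi, p in enumerate(pred)),
        reverse=True,
    )
    used_gt, used_pred, matched_ious = set(), set(), []
    for score, gi, pi in pairs:
        if score < threshold:
            break  # remaining pairs overlap even less
        if gi in used_gt or pi in used_pred:
            continue  # enforce one-to-one matching
        used_gt.add(gi)
        used_pred.add(pi)
        matched_ious.append(score)

    tp = len(matched_ious)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    miou = sum(matched_ious) / tp if tp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "miou": miou}


# Toy example: one good match (IoU = 3/5) and one spurious prediction.
gt_parcels = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(5, 5), (5, 6)}]
pred_parcels = [{(0, 0), (0, 1), (1, 0), (2, 0)}, {(9, 9)}]
print(object_metrics(gt_parcels, pred_parcels))
```

Under this definition a method can cover parcel interiors almost completely and still score poorly, because a fragmented parcel yields several predictions that each fall below the threshold.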

These results highlight that pixel-wise agreement does not necessarily translate into accurate parcel delineation, for which correct instance separation and boundary placement are essential. This motivates the use of SAM, which is designed for instance segmentation and can leverage learned priors to better respect object boundaries, while our harmonic-HSV encoding focuses the input representation on phenological separability rather than on raw texture alone. Although parcel interiors exhibit distinct textural and phenological patterns in the composites, delineation remains challenging for three reasons: adjacent parcels can share similar seasonal signatures; within-field heterogeneity introduces internal edges that can attract classical segmentation boundaries; and narrow field margins can be sub-pixel or mixed at 3 m resolution, which limits boundary precision.

Appendix C

The main experiment quantifies how different vegetation indices affect segmentation performance when encoded via the proposed method. However, parcel delineation is sensitive to landscape structure, most notably field size and boundary density. To assess whether the conclusions drawn from the primary AOI transfer to a more challenging setting, an additional evaluation was performed on an independent agricultural region in Northern Croatia dominated by smallholder parcels.

The same processing steps were applied as in the main study, with no changes to parameters. Figure A2 shows the location and extent of the second AOI in Northern Croatia, while Table A5 and Table A6 report object-wise and pixel-wise metrics for all index variants.

Table A5. Northern Croatia object-wise metrics.

Index	Precision	Recall	F1	mIoU
CIg	0.1490	0.2340	0.1821	0.6549
CIre	0.1252	0.2480	0.1664	0.6463
EVI2	0.0835	0.1905	0.1161	0.6387
EVI	0.1260	0.2740	0.1726	0.6371
GNDVI	0.1292	0.2475	0.1698	0.6449
MSAVI	0.1307	0.2680	0.1757	0.6437
NDRE	0.1240	0.2510	0.1660	0.6481
NDVI	0.1319	0.2620	0.1754	0.6465
NDWI	0.1460	0.2430	0.1824	0.6466
NDYVI	0.1331	0.2565	0.1752	0.6458
SAVI	0.1299	0.2745	0.1764	0.6405

Table A6. Northern Croatia pixel-wise metrics.

Index	Precision	Recall	F1	IoU
CIg	0.9601	0.4925	0.6510	0.4826
CIre	0.9594	0.5400	0.6910	0.5279
EVI2	0.9651	0.5186	0.6746	0.5090
EVI	0.9695	0.5444	0.6973	0.5352
GNDVI	0.9671	0.5185	0.6751	0.5095
MSAVI	0.9693	0.5599	0.7098	0.5502
NDRE	0.9639	0.5554	0.7047	0.5441
NDVI	0.9675	0.5555	0.7058	0.5454
NDWI	0.9688	0.5267	0.6824	0.5180
NDYVI	0.9681	0.5377	0.6914	0.5284
SAVI	0.9711	0.5644	0.7139	0.5551

Figure A2. Secondary AOI overlaid with GT parcels.

Table A5 summarizes object-wise results in the smallholder AOI. Overall object-wise performance is substantially lower than in the primary AOI, which is expected in landscapes with numerous small parcels, where mixed pixels along narrow parcel margins produce a disproportionate penalty under the defined IoU threshold. The differences between indices are also compressed within this AOI: F1 spans only 0.166–0.182 for ten of the eleven indices, with EVI2 (0.116) as the single outlier. This implies that parcel geometry and boundary ambiguity dominate the error budget more strongly than index choice.

Despite the compressed ranges, a small upper tier is observable: NDWI and CIg yield the highest object-wise F1 values (0.1824 and 0.1821), followed closely by the soil-adjusted indices (SAVI and MSAVI) and the greenness indices. At the same time, several indices exhibit notable rank shifts relative to the primary AOI, which indicates that the full index ordering is not strictly preserved under smallholder conditions. Table A6 reports the pixel-wise metrics for the same AOI. They are generally higher than their object-wise counterparts, reflecting that many predicted masks overlap parcel interiors even when instance separation and boundary placement are imperfect. In summary, evaluation of the second site indicates that absolute segmentation accuracy degrades substantially in a smallholder landscape at 3 m resolution, and that index-dependent effects persist but become harder to separate because of the narrower performance spread.

Appendix D

To contextualize the proposed harmonic-HSV encoding, two RGB-based baselines were evaluated with the same SAM configuration and post-processing pipeline: a single-date RGB image selected from the peak growing season, and a naive multi-date RGB composite constructed without harmonic modelling. The naive composite was generated by taking the per-pixel median of the red, green, and blue bands across three dates, producing a single three-band image that summarizes the temporal stack in a simple, non-phenological manner. The purpose of this comparison was to assess whether the proposed harmonic-HSV representation provides a measurable benefit over more straightforward three-channel inputs.
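The naive multi-date baseline amounts to a per-pixel, per-band median over the selected dates. A minimal pure-Python sketch is given below; it assumes co-registered images of equal size stored as nested lists, and the function names and toy values are hypothetical rather than taken from the study.

```python
from statistics import median
from typing import Dict, List

Band = List[List[float]]  # one band as rows x cols


def median_band(dates: List[Band]) -> Band:
    """Per-pixel median of a single band across acquisition dates."""
    rows, cols = len(dates[0]), len(dates[0][0])
    return [
        [median(d[r][c] for d in dates) for c in range(cols)]
        for r in range(rows)
    ]


def median_rgb_composite(stack: List[Dict[str, Band]]) -> Dict[str, Band]:
    """Apply the per-pixel median independently to the R, G, and B bands."""
    return {
        band: median_band([image[band] for image in stack])
        for band in ("red", "green", "blue")
    }


# Toy 1 x 2 image observed on three dates (values are illustrative only).
stack = [
    {"red": [[0.1, 0.9]], "green": [[0.2, 0.2]], "blue": [[0.3, 0.1]]},
    {"red": [[0.3, 0.2]], "green": [[0.4, 0.6]], "blue": [[0.1, 0.5]]},
    {"red": [[0.2, 0.5]], "green": [[0.3, 0.4]], "blue": [[0.2, 0.3]]},
]
print(median_rgb_composite(stack))
```

Because the median discards the order of the acquisitions, two parcels with the same set of reflectances but different seasonal timing become indistinguishable, which is the information the harmonic phase term retains.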

Object-wise results are summarized in Table A7. Among the tested inputs, the MSAVI harmonic-HSV composite achieved the highest object-wise F1 score (0.5028), indicating the best balance between precision and recall at the parcel level. The naive multi-date RGB composite yielded a lower object-wise F1 (0.4411), although its object-wise precision (0.7417) was considerably higher than that of the harmonic composite (0.5198), suggesting that it generated fewer false positive parcel matches but missed a larger proportion of reference parcels. The single-date RGB baseline performed the worst in terms of F1 (0.2469): its precision was marginally the highest of the three inputs (0.7421), but its recall was far lower (0.1481). This indicates that relying on a single acquisition substantially reduces parcel detectability relative to both temporally informed representations.

Table A7. Object-wise metrics including two RGB baselines.

Type	Precision	Recall	F1	mIoU
Naive multi-date	0.7417	0.3138	0.4411	0.7137
Single date	0.7421	0.1481	0.2469	0.7281
MSAVI	0.5198	0.4869	0.5028	0.6998

Pixel-wise results are provided in Table A8. Here, the MSAVI harmonic-HSV composite also achieved the strongest performance overall, with the highest recall (0.8057), F1 (0.8842), and IoU (0.7924). The naive multi-date RGB composite produced a lower pixel-wise F1 (0.7480) and IoU (0.5975), while the single-date RGB baseline again showed the weakest performance (F1 = 0.5381, IoU = 0.3681). Both RGB baselines attained very high precision (above 0.98), but this was accompanied by reduced recall, indicating that they left a larger share of the parcel area unsegmented than the harmonic-HSV encoding.

Table A8. Pixel-wise metrics including two RGB baselines.

Type	Precision	Recall	F1	IoU
Naive multi-date	0.9903	0.6010	0.7480	0.5975
Single date	0.9869	0.3699	0.5381	0.3681
MSAVI	0.9797	0.8057	0.8842	0.7924

Overall, these baseline experiments show that the proposed harmonic-HSV representation improves segmentation performance relative to both a single-date RGB input and a naive multi-date RGB composite, with the largest gains observed in recall-sensitive metrics. This suggests that encoding the annual phenological signal into a compact three-channel representation benefits parcel delineation beyond what can be obtained from straightforward RGB compositing alone.

References

Zheng, J.; Ye, Z.; Wen, Y.; Huang, J.; Zhang, Z.; Li, Q.; Hu, Q.; Xu, B.; Zhao, L.; Fu, H. A Comprehensive Review of Agricultural Parcel and Boundary Delineation from Remote Sensing Images: Recent Progress and Future Perspectives. IEEE Geosci. Remote Sens. Mag. 2026; early access.
Lavreniuk, M.; Kussul, N.; Shelestov, A.; Yailymov, B.; Salii, Y.; Kuzin, V.; Szantoi, Z. Delineate Anything: Resolution-Agnostic Field Boundary Delineation on Satellite Imagery. arXiv 2025, arXiv:2504.02534.
Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.-Y.; et al. Segment Anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 4015–4026.
Ferreira, L.B.; Martins, V.S.; Aires, U.R.V.; Wijewardane, N.; Zhang, X.; Samiappan, S. FieldSeg: A scalable agricultural field extraction framework based on the Segment Anything Model and 10-m Sentinel-2 imagery. Comput. Electron. Agric. 2025, 232, 110086.
Amin, G.; Oberlin, T.; Demarez, V. Early-season delineation of agricultural fields using a fully convolutional multi-task network and satellite images. Sci. Remote Sens. 2025, 12, 100256.
Wilson, B.T.; Knight, J.F.; McRoberts, R.E. Harmonic regression of Landsat time series for modeling attributes from national forest inventory data. ISPRS J. Photogramm. Remote Sens. 2018, 137, 29–46.
Ben Abbes, A.; Bounouh, O.; Farah, I.R.; de Jong, R.; Martínez, B. Comparative study of three satellite image time-series decomposition methods for vegetation change detection. Eur. J. Remote Sens. 2018, 51, 607–615.
Papić, F.; Rumora, L.; Medak, D.; Miler, M. Turning Seasonal Signals into Segmentation Cues: Recolouring the Harmonic Normalized Difference Vegetation Index for Agricultural Field Delineation. Sensors 2025, 25, 5926.
Awad, B.; Erer, I. Boundary SAM: Improved parcel boundary delineation using SAM’s image embeddings and detail enhancement filters. IEEE Geosci. Remote Sens. Lett. 2025, 22, 2502905.
Xie, Y.; Wu, H.; Tong, H.; Xiao, L.; Zhou, W.; Li, L.; Wanger, T.C. fabSAM: A Farmland Boundary Delineation Method Based on the Segment Anything Model. arXiv 2025, arXiv:2501.12487.
PlanetScope|Planet Documentation. Available online: https://docs.planet.com/data/imagery/planetscope/#psbsd (accessed on 24 November 2025).
Chasles, R.G.; Maciel, D.A.; Barbosa, C.C.F.; Novo, E.M.L.M.; Martins, V.S.; Paulino, R.; Wanderley, R.; Júnior, R.F.; Lima, T.M.; Bacellar, P.; et al. Accuracy assessment of PlanetScope SuperDove products for aquatic reflectance retrieval over Brazilian inland and coastal waters. ISPRS J. Photogramm. Remote Sens. 2025, 227, 678–690.
Vanhellemont, Q. Evaluation of eight-band SuperDove imagery for aquatic applications. Opt. Express 2023, 31, 13851–13872.
Yang, C.; Everitt, J.H.; Bradford, J.M.; Murden, D. Airborne Hyperspectral Imagery and Yield Monitor Data for Mapping Cotton Yield Variability. Precis. Agric. 2004, 5, 445–461.
Hunt, E.R., Jr.; Daughtry, C.S.T.; Eitel, J.U.H.; Long, D.S. Remote sensing leaf chlorophyll content using a visible band index. Agron. J. 2011, 103, 1090–1099.
Jiang, Z.; Huete, A.R.; Didan, K.; Miura, T. Development of a two-band enhanced vegetation index without a blue band. Remote Sens. Environ. 2008, 112, 3833–3845.
Huete, A.R. A soil-adjusted vegetation index (SAVI). Remote Sens. Environ. 1988, 25, 295–309.
Qi, J.; Chehbouni, A.; Huete, A.R.; Kerr, Y.H.; Sorooshian, S. A modified soil adjusted vegetation index. Remote Sens. Environ. 1994, 48, 119–126.
Gao, B.-C. NDWI—A normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sens. Environ. 1996, 58, 257–266.
Wei, Y.; Lu, M.; Yu, Q.; Li, W.; Wang, C.; Tang, H.; Wu, W. The normalized difference yellow vegetation index (NDYVI): A new index for crop identification by using GaoFen-6 WFV data. Comput. Electron. Agric. 2024, 226, 109417.
Pringle, M.J. Robust prediction of time-integrated NDVI. Int. J. Remote Sens. 2013, 34, 4791–4811.
García, S.; Fernández, A.; Luengo, J.; Herrera, F. Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Inf. Sci. 2010, 180, 2044–2064.
Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30.
Joshi, A.; Pradhan, B.; Gite, S.; Chakraborty, S. Remote-Sensing Data and Deep-Learning Techniques in Crop Mapping and Yield Prediction: A Systematic Review. Remote Sens. 2023, 15, 2014.
Wang, S.; Waldner, F.; Lobell, D.B. Unlocking Large-Scale Crop Field Delineation in Smallholder Farming Systems with Transfer Learning and Weak Supervision. Remote Sens. 2022, 14, 5738.
Yin, L.; You, N.; Zhang, G.; Huang, J.; Dong, J. Optimizing Feature Selection of Individual Crop Types for Improved Crop Mapping. Remote Sens. 2020, 12, 162.
Luo, Z.; Yang, W.; Yuan, Y.; Gou, R.; Li, X. Semantic segmentation of agricultural images: A survey. Inf. Process. Agric. 2024, 11, 172–186.
Siméoni, O.; Vo, H.V.; Seitzer, M.; Baldassarre, F.; Oquab, M.; Jose, C.; Khalidov, V.; Szafraniec, M.; Yi, S.; Ramamonjisoa, M.; et al. DINOv3. arXiv 2025, arXiv:2508.10104.

Figure 2. Workflow of the proposed method.

Figure 3. (a) Vegetation index time series, where each greyscale slice corresponds to one acquisition date; stacking the slices forms a per-pixel annual index time series. (b) Per-pixel harmonic modelling of the time series: observed index values (blue points) are detrended with a linear term and the residuals are fitted with a single annual harmonic. The fit yields three descriptors: phase (timing of the seasonal peak), amplitude (seasonality strength), and mean (baseline level). (c) Mapping of harmonic descriptors to the HSV colour space to form the false colour composite: hue = phase (interpreted as a circular timing axis over the annual cycle, where similar hues indicate similar peak timing); saturation = amplitude (low seasonality desaturates towards grey); and value = mean (low baseline appears darker). Scaling is applied per index at the AOI level using fixed saturation percentiles and conservative value clipping to ensure consistent colour ranges.
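The per-pixel steps described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: time is assumed to be expressed in fractional years, the harmonic is fitted to the detrended residuals by ordinary least squares, and the scaling constants `amp_max` and `mean_max` are hypothetical stand-ins for the AOI-level percentile scaling and value clipping, whose exact settings are not reproduced here.

```python
import colorsys
import math
from typing import List, Tuple


def fit_annual_harmonic(t: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Return (phase, amplitude, mean) of one pixel's index time series."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n

    # Step 1: remove a linear trend fitted by ordinary least squares.
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / s_tt
    resid = [yi - (y_bar + slope * (ti - t_bar)) for ti, yi in zip(t, y)]

    # Step 2: fit a*cos(2*pi*t) + b*sin(2*pi*t) to the residuals
    # by solving the 2 x 2 normal equations.
    c = [math.cos(2 * math.pi * ti) for ti in t]
    s = [math.sin(2 * math.pi * ti) for ti in t]
    s_cc = sum(ci * ci for ci in c)
    s_ss = sum(si * si for si in s)
    s_cs = sum(ci * si for ci, si in zip(c, s))
    s_cy = sum(ci * ri for ci, ri in zip(c, resid))
    s_sy = sum(si * ri for si, ri in zip(s, resid))
    det = s_cc * s_ss - s_cs ** 2
    a = (s_cy * s_ss - s_sy * s_cs) / det
    b = (s_sy * s_cc - s_cy * s_cs) / det

    amplitude = math.hypot(a, b)
    # The fitted harmonic peaks at t = phase / (2*pi) within the year.
    phase = math.atan2(b, a) % (2 * math.pi)
    return phase, amplitude, y_bar


def harmonic_to_rgb(
    phase: float, amplitude: float, mean: float, amp_max: float, mean_max: float
) -> Tuple[float, float, float]:
    """Map phase, amplitude, and mean to hue, saturation, and value, then to RGB."""
    hue = phase / (2 * math.pi)
    sat = min(max(amplitude / amp_max, 0.0), 1.0)  # assumed linear scaling
    val = min(max(mean / mean_max, 0.0), 1.0)      # assumed linear scaling
    return colorsys.hsv_to_rgb(hue, sat, val)


# Synthetic pixel: 24 evenly spaced observations peaking mid-year.
times = [(i + 0.5) / 24 for i in range(24)]
values = [0.5 + 0.3 * math.cos(2 * math.pi * (ti - 0.5)) for ti in times]
phase, amplitude, mean = fit_annual_harmonic(times, values)
print(phase / (2 * math.pi), amplitude, mean)
print(harmonic_to_rgb(phase, amplitude, mean, amp_max=0.5, mean_max=1.0))
```

Treating phase as hue is what makes the timing axis circular: a peak in late December and one in early January land on neighbouring hues rather than at opposite ends of a linear scale.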

Figure 5. (a) NDVI harmonic composite, (b) GNDVI harmonic composite, (c) NDRE harmonic composite, and (d) NDYVI harmonic composite.

Figure 6. (a) EVI harmonic composite, (b) EVI2 harmonic composite, (c) SAVI harmonic composite, and (d) MSAVI harmonic composite.

Figure 7. (a) NDWI harmonic composite, (b) CIg harmonic composite, and (c) CIre harmonic composite.

Figure 8. (a) NDVI segmentation output, (b) GNDVI segmentation output, (c) NDRE segmentation output, and (d) NDYVI segmentation output.

Figure 9. (a) EVI segmentation output, (b) EVI2 segmentation output, (c) SAVI segmentation output, and (d) MSAVI segmentation output.

Figure 10. (a) NDWI segmentation output, (b) CIg segmentation output, and (c) CIre segmentation output.

Figure 11. Representative examples of the main segmentation error types relative to GT. GT parcel boundaries are shown in red, and predicted polygons are shown in grey: (a) over-segmentation, (b) under-segmentation, and (c) fragmentation.

Table 1. Vegetation indices with their formulas and sources.

Index	Formula	Reference
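NDVI	$\frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}}$	[14]
GNDVI	$\frac{\mathrm{NIR} - \mathrm{GREEN}}{\mathrm{NIR} + \mathrm{GREEN}}$	[14]
NDRE	$\frac{\mathrm{NIR} - \mathrm{RE}}{\mathrm{NIR} + \mathrm{RE}}$	[15]
EVI	$2.5 \times \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + 6 \times \mathrm{RED} - 7.5 \times \mathrm{BLUE} + 1}$	[16]
EVI2	$2.5 \times \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + 2.4 \times \mathrm{RED} + 1}$	[16]
SAVI	$(1 + L) \times \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED} + L},\ L = 0.5$	[17]
MSAVI	$\frac{2 \times \mathrm{NIR} + 1 - \sqrt{(2 \times \mathrm{NIR} + 1)^{2} - 8 \times (\mathrm{NIR} - \mathrm{RED})}}{2}$	[18]
NDWI	$\frac{\mathrm{GREEN} - \mathrm{NIR}}{\mathrm{GREEN} + \mathrm{NIR}}$	[19]
CIg	$\frac{\mathrm{NIR}}{\mathrm{GREEN}} - 1$	[15]
CIre	$\frac{\mathrm{NIR}}{\mathrm{RE}} - 1$	[15]
NDYVI	$\frac{\mathrm{NIR} - \mathrm{YELLOW} - \mathrm{RE}}{\mathrm{NIR} + \mathrm{YELLOW} + \mathrm{RE}}$	[20]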
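As an illustration of how the formulas in Table 1 translate into code, the sketch below implements a subset of the indices for a single pixel. It assumes that band values are surface reflectances scaled to the 0–1 range, which the constant terms in EVI, EVI2, SAVI, and MSAVI presuppose; the function names and sample reflectances are hypothetical.

```python
import math


def normalized_difference(a: float, b: float) -> float:
    """Generic (a - b) / (a + b); covers NDVI, GNDVI, NDRE, and NDWI."""
    return (a - b) / (a + b)


def ndvi(nir: float, red: float) -> float:
    return normalized_difference(nir, red)


def ndwi(green: float, nir: float) -> float:
    return normalized_difference(green, nir)


def evi(nir: float, red: float, blue: float) -> float:
    return 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)


def evi2(nir: float, red: float) -> float:
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1)


def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    # soil_factor corresponds to L in Table 1.
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def msavi(nir: float, red: float) -> float:
    root = math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))
    return (2 * nir + 1 - root) / 2


def ci_green(nir: float, green: float) -> float:
    return nir / green - 1


# Illustrative reflectances for a vegetated pixel (not measured data).
nir, red, green, blue = 0.5, 0.1, 0.12, 0.05
for name, value in [
    ("NDVI", ndvi(nir, red)),
    ("EVI", evi(nir, red, blue)),
    ("EVI2", evi2(nir, red)),
    ("SAVI", savi(nir, red)),
    ("MSAVI", msavi(nir, red)),
    ("NDWI", ndwi(green, nir)),
    ("CIg", ci_green(nir, green)),
]:
    print(f"{name}: {value:.4f}")
```

Note that the ratio-based chlorophyll indices (CIg, CIre) are unbounded above, whereas the normalized differences stay within −1 to 1, so the dynamic range entering the harmonic fit differs considerably between index families.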

Table 2. Pixel-wise precision, recall, F1, and IoU.

Index	Precision	Recall	F1	IoU
CIg	0.9758	0.7693	0.8603	0.7549
CIre	0.9738	0.7724	0.8615	0.7567
EVI2	0.9796	0.8116	0.8877	0.7981
EVI	0.9790	0.8075	0.8851	0.7938
GNDVI	0.9789	0.7954	0.8776	0.7819
MSAVI	0.9797	0.8057	0.8842	0.7924
NDRE	0.9775	0.7815	0.8686	0.7677
NDVI	0.9794	0.8152	0.8898	0.8015
NDWI	0.9819	0.8161	0.8913	0.8040
NDYVI	0.9799	0.7972	0.8792	0.7844
SAVI	0.9785	0.8018	0.8814	0.7879

Table 3. Object-wise precision, recall, F1, and mIoU.

Index	Precision	Recall	F1	mIoU
CIg	0.4693	0.4437	0.4562	0.6883
CIre	0.4592	0.4229	0.4403	0.6848
EVI2	0.5194	0.4856	0.5019	0.6999
EVI	0.5157	0.4815	0.4980	0.7014
GNDVI	0.4846	0.4540	0.4688	0.6921
MSAVI	0.5198	0.4869	0.5028	0.6998
NDRE	0.4924	0.4473	0.4688	0.6942
NDVI	0.4786	0.4621	0.4702	0.6956
NDWI	0.5145	0.4641	0.4880	0.7000
NDYVI	0.4948	0.4594	0.4764	0.6931
SAVI	0.5113	0.4818	0.4961	0.6982

Table 4. Object-wise precision, recall, F1, and mIoU for small parcels (<10,000 m²).

Index	Precision	Recall	F1	mIoU
CIg	0.3116	0.1695	0.2196	0.6079
CIre	0.2968	0.1527	0.2017	0.5963
EVI2	0.3769	0.2177	0.2760	0.6078
EVI	0.3654	0.2123	0.2685	0.6064
GNDVI	0.3203	0.1832	0.2331	0.5994
MSAVI	0.3803	0.2195	0.2784	0.6067
NDRE	0.3075	0.1677	0.2171	0.6051
NDVI	0.3143	0.1877	0.2351	0.6028
NDWI	0.3509	0.1936	0.2496	0.6096
NDYVI	0.3331	0.1855	0.2382	0.6015
SAVI	0.3689	0.2168	0.2731	0.6042

Table 5. Object-wise precision, recall, F1, and mIoU for medium parcels (10,000 m²–100,000 m²).

Index	Precision	Recall	F1	mIoU
CIg	0.5824	0.7083	0.6392	0.6922
CIre	0.5719	0.6818	0.6220	0.6886
EVI2	0.6329	0.7455	0.6846	0.7096
EVI	0.6293	0.7420	0.6810	0.7111
GNDVI	0.6015	0.7134	0.6527	0.6978
MSAVI	0.6368	0.7455	0.6869	0.7107
NDRE	0.6147	0.7160	0.6615	0.6971
NDVI	0.5960	0.7231	0.6535	0.7015
NDWI	0.6223	0.7175	0.6665	0.7048
NDYVI	0.6096	0.7221	0.6611	0.6987
SAVI	0.6252	0.7374	0.6766	0.7096

Table 6. Object-wise precision, recall, F1, and mIoU for large parcels (larger than 100,000 m²).

Index	Precision	Recall	F1	mIoU
CIg	0.3437	0.8246	0.4852	0.8001
CIre	0.3359	0.8097	0.4748	0.7985
EVI2	0.3836	0.8545	0.5295	0.8311
EVI	0.3945	0.8582	0.5405	0.8328
GNDVI	0.3765	0.8470	0.5212	0.8216
MSAVI	0.3726	0.8619	0.5203	0.8259
NDRE	0.3924	0.8433	0.5355	0.8214
NDVI	0.3762	0.8731	0.5258	0.8234
NDWI	0.4335	0.8993	0.5850	0.8315
NDYVI	0.3810	0.8545	0.5270	0.8217
SAVI	0.3802	0.8582	0.5269	0.8218

Table 7. Tile boundary errors.

Index	ΔPrecision	ΔRecall	ΔF1	ΔIoU
CIg	−0.0007	−0.0222	−0.0143	−0.0218
CIre	0.0014	−0.0121	−0.0071	−0.0108
EVI2	−0.0031	−0.0088	−0.0066	−0.0106
EVI	0.0005	−0.0062	−0.0036	−0.0057
GNDVI	0.0012	−0.0176	−0.0103	−0.0162
MSAVI	−0.0032	−0.0019	−0.0025	−0.0039
NDRE	−0.0009	−0.0263	−0.0169	−0.0259
NDVI	−0.0020	−0.0174	−0.0113	−0.0182
NDWI	0.0008	−0.0099	−0.0056	−0.0091
NDYVI	0.0007	−0.0138	−0.0082	−0.0130
SAVI	0.0010	−0.0070	−0.0039	−0.0061

Table 8. Fragmentation metrics for parcels smaller than 10,000 m².

Index	Number of Ground Truth Polygons	Number of Predicted Polygons	Ratio
CIg	2200	1156	0.5255
CIre	2200	1089	0.4950
EVI2	2200	1253	0.5695
EVI	2200	1243	0.5650
GNDVI	2200	1217	0.5532
MSAVI	2200	1236	0.5618
NDRE	2200	1161	0.5277
NDVI	2200	1278	0.5809
NDWI	2200	1181	0.5368
NDYVI	2200	1190	0.5409
SAVI	2200	1267	0.5759

Table 9. Fragmentation metrics for parcels between 10,000 m² and 100,000 m².

Index	Number of Ground Truth Polygons	Number of Predicted Polygons	Ratio
CIg	1961	2410	1.2290
CIre	1961	2360	1.2035
EVI2	1961	2313	1.1795
EVI	1961	2329	1.1877
GNDVI	1961	2348	1.1973
MSAVI	1961	2312	1.1790
NDRE	1961	2306	1.1759
NDVI	1961	2402	1.2249
NDWI	1961	2282	1.1637
NDYVI	1961	2342	1.1943
SAVI	1961	2327	1.1866

Table 10. Fragmentation metrics for parcels larger than 100,000 m².

Index	Number of Ground Truth Polygons	Number of Predicted Polygons	Ratio
CIg	268	659	2.4590
CIre	268	667	2.4888
EVI2	268	612	2.2836
EVI	268	601	2.2425
GNDVI	268	622	2.3209
MSAVI	268	638	2.3806
NDRE	268	593	2.2127
NDVI	268	635	2.3694
NDWI	268	568	2.1194
NDYVI	268	617	2.3022
SAVI	268	617	2.3022
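The Ratio column in Tables 8–10 is the number of predicted polygons divided by the number of ground truth polygons in the size class. The short sketch below reproduces the CIg rows as a worked example; the function name is hypothetical, and the reading of the ratio (below 1 as fewer predictions than reference parcels, above 1 as more) is an interpretation consistent with the merging and fragmentation patterns reported for small and large parcels.

```python
def fragmentation_ratio(n_predicted: int, n_ground_truth: int) -> float:
    """Predicted polygon count relative to the reference polygon count."""
    if n_ground_truth <= 0:
        raise ValueError("n_ground_truth must be positive")
    return n_predicted / n_ground_truth


# CIg polygon counts copied from Tables 8, 9, and 10: (ground truth, predicted).
cig_counts = {
    "small (<10,000 m²)": (2200, 1156),
    "medium (10,000–100,000 m²)": (1961, 2410),
    "large (>100,000 m²)": (268, 659),
}

for size_class, (n_gt, n_pred) in cig_counts.items():
    print(f"{size_class}: {fragmentation_ratio(n_pred, n_gt):.4f}")
```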

Table 11. Over-segmentation and under-segmentation metrics for parcels smaller than 10,000 m².

Index	GOC	GUC	GTC
CIg	0.3402	0.3179	0.3871
CIre	0.3453	0.3249	0.3927
EVI2	0.3400	0.3019	0.3756
EVI	0.3365	0.3035	0.3757
GNDVI	0.3491	0.3183	0.3907
MSAVI	0.3333	0.3072	0.3754
NDRE	0.3427	0.3336	0.3936
NDVI	0.3485	0.3044	0.3847
NDWI	0.3259	0.3152	0.3767
NDYVI	0.3363	0.3248	0.3881
SAVI	0.3394	0.3033	0.3770

Table 12. Over-segmentation and under-segmentation metrics for parcels between 10,000 m² and 100,000 m².

Index	GOC	GUC	GTC
CIg	0.2672	0.1336	0.2482
CIre	0.2693	0.1456	0.2561
EVI2	0.2440	0.1245	0.2289
EVI	0.2433	0.1274	0.2296
GNDVI	0.2578	0.1338	0.2421
MSAVI	0.2443	0.1248	0.2289
NDRE	0.2547	0.1361	0.2398
NDVI	0.2633	0.1217	0.2393
NDWI	0.2492	0.1298	0.2344
NDYVI	0.2588	0.1295	0.2399
SAVI	0.2492	0.1260	0.2320

Table 13. Over-segmentation and under-segmentation metrics for parcels larger than 100,000 m².

Index	GOC	GUC	GTC
CIg	0.2483	0.0372	0.1925
CIre	0.2521	0.0515	0.2037
EVI2	0.2214	0.0309	0.1708
EVI	0.2118	0.0313	0.1642
GNDVI	0.2257	0.0372	0.1767
MSAVI	0.2215	0.0312	0.1710
NDRE	0.2066	0.0359	0.1620
NDVI	0.2221	0.0324	0.1723
NDWI	0.2009	0.0245	0.1515
NDYVI	0.2214	0.0261	0.1678
SAVI	0.2295	0.0327	0.1767

Table 14. Boundary metrics.

Index	Boundary Precision	Boundary Recall	Boundary F1	Boundary IoU
CIg	0.4094	0.4009	0.4051	0.2791
CIre	0.3953	0.3872	0.3912	0.2681
EVI2	0.4712	0.4711	0.4712	0.3257
EVI	0.4745	0.4704	0.4725	0.3262
GNDVI	0.4328	0.4242	0.4285	0.2975
MSAVI	0.4743	0.4740	0.4742	0.3274
NDRE	0.4281	0.4101	0.4189	0.2910
NDVI	0.4373	0.4391	0.4382	0.3045
NDWI	0.4488	0.4303	0.4393	0.3064
NDYVI	0.4295	0.4174	0.4234	0.2954
SAVI	0.4670	0.4662	0.4666	0.3219

Table 15. Results of the pairwise Wilcoxon signed-rank tests for IoU scores across indices.

Index A	Index B	Median IoU Difference	Lower 95% CI ¹	Upper 95% CI ¹	Adjusted p-Value ²
CIre	MSAVI	−0.02329	−0.02519	−0.02061	<0.001
CIg	EVI2	−0.01648	−0.01897	−0.01409	<0.001
MSAVI	NDRE	0.01031	0.00896	0.01200	<0.001
EVI2	NDVI	0.00585	0.00465	0.00703	<0.001
NDVI	NDWI	0.00000	0.00000	0.00000	0.817
EVI	EVI2	0.00000	0.00000	0.00000	0.591

¹ Confidence interval; ² Holm-adjusted p-value.

Table 16. Results of the pairwise Wilcoxon signed-rank tests for F1 scores across indices.

Index A	Index B	Median F1 Difference	Lower 95% CI ¹	Upper 95% CI ¹	Adjusted p-Value ²
CIre	MSAVI	−0.02047	−0.02247	−0.01814	<0.001
CIg	EVI2	−0.01413	−0.01600	−0.01228	<0.001
MSAVI	NDRE	0.00921	0.00774	0.01038	<0.001
EVI2	NDVI	0.00505	0.00415	0.00614	<0.001
NDVI	NDWI	0.00000	0.00000	0.00000	0.899
EVI	EVI2	0.00000	0.00000	0.00000	0.709

¹ Confidence interval; ² Holm-adjusted p-value.
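The Holm adjustment referred to in the footnotes of Tables 15 and 16 is a step-down correction for multiple comparisons. A minimal sketch of the procedure is given below; the raw p-values in the example are hypothetical and are not taken from the study, and the function name is illustrative.

```python
from typing import List


def holm_adjust(p_values: List[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from three pairwise comparisons.
raw = [0.01, 0.04, 0.03]
print(holm_adjust(raw))
```

Compared with a plain Bonferroni correction, which multiplies every p-value by the number of comparisons, the step-down scheme is less conservative while still controlling the family-wise error rate.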

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

