Article

Maya Pottery Red: Hue as a Perceptual Prior for Object Detection in UAV-Based Areal Survey

1 Department of Geography & GIS, University of Cincinnati, Cincinnati, OH 45221, USA
2 Department of Anthropology, University of Cincinnati, Cincinnati, OH 45221, USA
* Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(11), 1836; https://doi.org/10.3390/rs18111836
Submission received: 10 April 2026 / Revised: 17 May 2026 / Accepted: 28 May 2026 / Published: 3 June 2026
(This article belongs to the Special Issue Applications of Remote Sensing in Landscapes and Human Settlements)

Highlights

What are the main findings?
  • Developed a Hue-Weighted Loss Function and Two-Phase Workflow for small-object detection.
  • HSV-based filtering reduced candidates by 99.1% while retaining 97.8% of targets (F1: 0.731).
What are the implications of the main findings?
  • Chromatic priors can also assist search-and-rescue, environmental, and traffic detection.
  • Low-altitude UAV chromatic detection scales survey records while reducing manual effort.

Abstract

The detection of small archaeological artifacts in high-resolution aerial imagery is challenged by minimal target size and local spectral and geometric similarity to background soils. This study identifies a failure mode in end-to-end deep learning where radiometrically dominant chromatic signals destabilize gradient-based optimization, leading to rapid training collapse. Using UAV imagery of Maya archaeological sites in Belize, we examine fingernail-sized ceramic sherds characterized by a consistent reddish hue. A Hue-Weighted Loss Function (HWLF) is introduced as a diagnostic instrument. Under severe class imbalance, chromatic gradients suppress geometric feature learning, collapsing detection within 300 iterations. Motivated by this discovery, we propose a staged detection architecture that decouples geometric candidate generation from chromatic validation. Candidates are detected via a transformer-based object detector and validated using hue constraints derived from unmodified 16-bit HSV representations. This approach reduced the Phase I candidate pool (177,148 geometric detections) to 1647 prioritized detections—a 99.1% reduction—while retaining 97.8% of annotated targets (F1 = 0.731). Chromatic priors may be more effective as decoupled post-inference discriminants than as embedded end-to-end optimization signals under severe class imbalance, where their gradient influence risks suppressing geometric feature learning entirely.

1. Introduction

1.1. The Challenge of Small-Object Detection in Heritage Documentation

Detecting small, low-contrast objects in Unmanned Aerial Vehicle (UAV)-based remote sensing remains a persistent challenge (Figure 1), particularly in environments where targets exhibit minimal geometric distinction from their surroundings [1,2]. In the Maya lowlands of Northern Belize, this problem is especially acute in archaeological surface surveys, where artifacts such as ceramic pottery sherds appear as sparse, visually subtle features embedded within heterogeneous backgrounds of soil, limestone, and vegetation. While traditional pedestrian survey remains the authoritative method for artifact recovery [3], it is limited by labor intensity, observer fatigue, and physical accessibility. In large archaeological landscapes such as the Lamanai region (Figure 2), the survey scale motivates the development of automated, UAV-assisted workflows to augment expert Human-in-the-Loop (HITL) analysis (Table 1). These methodologies offer significant advantages in spatial coverage and detection consistency compared to traditional field-walking.

1.2. Ecological Context and the “Maya Pottery Red” (MPR) Signature

The settlement development of the New River Lagoon and its surrounding hinterlands is deeply tied to the local ecology [4]. As established by Rushton [5], the intersection of soil productivity and hydrological access dictated the long-term spatial distribution of Maya settlements in this region. This ecological framework provides the predictive logic for where material remains are most likely to be concentrated [6]. However, identifying these remains, specifically ceramic sherds, requires isolating a diagnostic signal against a physiographically complex background. This complexity arises from the high-frequency textural noise of karst limestone fragments, decaying organic matter, and the variegated shadows cast by low-altitude vegetation, all of which closely mimic the morphology of weathered ceramic artifacts. Low-altitude UAV survey can augment HITL ground-truthing by guiding field teams to the locations on the mounds where sherds are likely to be present.
Surface assemblages in this region frequently exhibit a characteristic reddish hue, “Maya Pottery Red” (MPR), which contrasts with the grey limestone and brown Vertisol soils [7,8]. While such chromatic cues are readily used in human visual interpretation, they are often underutilized or destroyed during standard image processing. Standard 8-bit JPEG compression and typical orthomosaic generation often average out these subtle hue variations, rendering the artifacts described by Rushton’s settlement models invisible to traditional deep-learning workflows.

1.3. Related Work and the Technical Gap

Recent advances in archaeological remote sensing have successfully applied deep learning and LiDAR to landscape-scale features, including mounds and causeways [9,10]. These studies establish the foundational importance of high-altitude imagery for structural detection [11] and the necessity of precise geospatial indexing for long-term site analysis [12]. Advancements in LiDAR visualization have revolutionized the mapping of architectural footprints in the Maya lowlands [13]. Furthermore, the integration of UAV imaging, photogrammetry, and deep learning provides a robust baseline for documentation [14,15,16,17]; however, these workflows rarely target objects at the sub-decimeter scale. The detection of low-visibility surface debris necessitates a transition toward high-resolution, spectrally aware autonomous workflows. Our primary targets in this study are fingernail-sized ancient pottery sherds, averaging 1.5 cm² in area, with a distinctive reddish hue.
Recent advances in object detection and salient object detection for remote sensing imagery have addressed the challenge of discriminating structured targets from complex backgrounds at multiple scales. Methods such as HFCNet [18] and CMNFNet [19] exploit heterogeneous dual-encoder and graph-convolution architectures to fuse local and global features from optical RGB imagery, while HDNet [20] addresses infrared small-target detection through spatial frequency contrast in the Fourier domain. Multimodal approaches, including UMINet [21,22], extend this framework by fusing visible-light appearance features with depth or thermal channels. These methods demonstrate that incorporating complementary structural, spatial-frequency, or cross-modal features can substantially improve target discrimination in cluttered environments. However, a common thread across these approaches is the absence of explicit chromatic analysis: color, where present in the input, is treated as a learned spatial feature rather than a physically grounded discriminant. For targets with a known spectral signature—such as the distinctive reddish hue of Maya Pottery Red sherds—hue angle within an HSV representation constitutes a direct and measurable discriminant.
Despite these gains, a significant gap remains at the artifact scale: spectral confusion between weathered ceramic fragments and background materials such as crushed limestone in-fill from Maya foundations, cornstalks, and plastic garbage.
The Two-Phase Workflow developed here applies this principle: circular mean hue per candidate detection is extracted from 16-bit HSV imagery and used both as a loss-weighting signal during training and as a post-inference filter, making chromatic conformity to the MPR spectral centroid the primary criterion for candidate validation.

1.4. The Proposed Two-Phase Framework

While classical computer vision approaches have long employed color-space transformations (e.g., HSV or CIELAB) for segmentation [23,24], the explicit integration of chromatic priors into modern deep learning architectures, including transformer-based detectors such as ViTDet, remains limited. In most existing pipelines, color is treated as a secondary feature rather than a primary discriminant.
This study investigates chromatic priors for small-object detection in UAV-based archaeological survey. For a 16-bit HSV pipeline, a Hue-Weighted Loss Function (HWLF) is introduced, which encodes the MPR spectral signature directly within the gradient optimization process. The training behavior observed motivates a Two-Phase Workflow that applies chromatic discrimination as a post-inference filter, implementing Chromatic Signal Isolation.
The Two-Phase Workflow is developed as an architectural response to this finding. Rather than attenuating the chromatic signal to prevent extinction, the workflow exploits radiometric dominance by relocating chromatic discrimination outside the gradient optimization process entirely. Inference and filtering operate as independent stages: a standard object detector generates the candidate pool, refined through HSV-based filtering applied to unmodified 16-bit DNG imagery. This separation preserves the full discriminative strength of the MPR signature while allowing shape and texture learning to proceed without chromatic interference. The complete architecture is illustrated in Figure 3, and the process is documented in Algorithm 1.
Algorithm 1. Two-Phase Chromatic Artifact Detection Framework
Input:
  16-bit HSV image array $A$; trained detector $D$;
  Phase II parameters: target hue $h_{\text{target}} = 40.6^\circ$, bounds $(H_{\text{floor}}, H_{\text{ceil}})$;
  $S_{\min} = 18.0$; weights $w_c = 0.70$, $w_g = 0.30$
Output:
  Final detection set $F$
Phase I: Geometric Candidate Generation
  1. Decompose $A$ into overlapping 512 × 512 pixel tiles $\{t_j\}$ with stride 384 pixels (128 pixel overlap)
  2. For each tile $t_j$:
     a. Apply detector $D$
     b. Extract candidate detections with bounding boxes and geometric confidence scores
  3. Merge overlapping detections across tile boundaries
  4. For each candidate $c_i$:
     a. Compute circular mean hue $h_i$ from full-resolution array $A$ (Equation (8))
     b. Compute mean saturation $S_i$ from $A$
  5. Construct candidate set $C = \{c_i\}$ with attributes (bounding box, confidence, $h_i$, $S_i$)
Phase II: Chromatic Validation
  6. For each candidate $c_i \in C$:
     a. Compute circular hue distance: $\delta_i = \min(|h_i - h_{\text{target}}|,\; 360^\circ - |h_i - h_{\text{target}}|)/180^\circ$
     b. Compute chromatic score based on $\delta_i$
     c. Compute composite score: $\text{Score}_i = w_c \times \text{ColorScore}_i + w_g \times \text{GeomScore}_i$
     d. If $(H_{\text{floor}} \le h_i \le H_{\text{ceil}})$ and $(S_i \ge S_{\min})$, retain $c_i$
  7. Collect retained detections $R$, ranked by composite score
Post-processing
  8. Apply radius-based deduplication with $r = 0.01$ m
  9. Return final detection set $F$
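The tile decomposition in step 1 of Phase I can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and clamping a final tile to the image edge is an assumption, since the source does not state how the trailing strip is handled.

```python
def tile_origins(length, tile=512, stride=384):
    """Top-left offsets along one axis so that tiles cover [0, length)."""
    if length <= tile:
        return [0]
    origins = list(range(0, length - tile + 1, stride))
    # Assumption: add a final tile clamped to the edge so no pixels are skipped.
    if origins[-1] != length - tile:
        origins.append(length - tile)
    return origins


def tile_grid(width, height, tile=512, stride=384):
    """All (x, y) tile origins for an image of the given pixel size."""
    return [(x, y)
            for y in tile_origins(height, tile, stride)
            for x in tile_origins(width, tile, stride)]


# Native frame size reported for the UAV sensor: 5280 x 3956 pixels.
grid = tile_grid(5280, 3956)
print(len(grid), grid[0], grid[-1])
```

With a stride of 384 pixels, adjacent tiles share 128 pixels, so a sherd cut by one tile boundary falls wholly inside a neighbouring tile; the resulting duplicate detections are what the merge step in Phase I is meant to resolve.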
This study makes two primary contributions. First, it demonstrates that incorporating domain-informed chromatic priors within a staged detection architecture can resolve failure modes in cases where chromatic dominance prevents stable end-to-end gradient optimization. By leveraging the full 16-bit radiometric fidelity of the sensor data, this approach decouples dominant chromatic signals from the learning of geometric features (shape and texture). Second, it introduces and evaluates a hue-weighted loss function (HWLF). In this study, the HWLF serves as a diagnostic instrument to identify optimization limits and to characterize the consistent reddish hue of ceramic artifacts relative to agricultural soils. We show that this chromatic signal, while highly dominant, destabilizes end-to-end training but functions as an effective discriminant when explicitly isolated within a staged detection framework.

2. Materials and Methods

2.1. Motivation and Conceptual Design

This small object detection challenge is addressed by formalizing the characteristic reddish hue of ancient Maya pottery sherds of northern Belize as a discriminative chromatic feature (see Figure 4a–c). The conceptual design of the detection pipeline is predicated on the formalization of “Maya Pottery Red” (MPR) as a distinct chromatic prior within a circular coordinate system. In this framework, the universe of perceivable hues is mapped as an angular continuum from 0° to 360°, with MPR serving as the target centroid.

2.1.1. Empirical Establishment of the MPR Norm

To ensure the model operates against a statistically robust representation of the target material’s spectral signature, the MPR norm was empirically established through a systematic sampling protocol. We analyzed a reference set of Maya pottery sherds (n = 502) captured under field conditions in Belize. Raw UAV DNG files were converted into standardized 16-bit HSV NumPy arrays to preserve the spectral depth necessary for high-fidelity chromatic analysis. A comparison of the derived hue (hex = #A46F43) with a confirmed detection is shown in Figure 5.
To isolate the diagnostic ‘Maya Pottery Red’ signature from ambient environmental noise, input imagery was transformed from the standard RGB additive model to the Hue-Saturation-Value (HSV) color space. This conversion follows established frameworks for color-based object identification, which utilize color-space thresholding as a primary classification tool to enhance feature discriminability [23].
To establish a statistically robust Maya Pottery Red (MPR) chromatic norm, characteristic hue signatures were extracted from ground-truthed specimens defined by Human-in-the-Loop (HITL) polygon shapefiles. As illustrated in Figure 6, this workflow involved isolating pixel arrays from high-resolution UAV imagery followed by a two-pixel edge-erosion step to eliminate boundary-pixel contamination from surrounding soils. For each sherd, the representative hue was computed using a circular mean—a necessary statistical treatment for angular data to avoid wrap-around artifacts inherent in linear averaging. The resulting 16-bit HSV NumPy arrays provided the empirical basis for the numerical MPR centroid and the hue-window thresholds employed in the Phase II filtering stage.

2.1.2. Chromatic Range and Inductive Bias

The objective of the hue-weighted prior is to quantify the “distance” of any detected candidate from the MPR norm, calculated as an angular offset on a 360° chromatic circle. For computational efficiency, these angular values are normalized to a float range of 0.0 to 1.0. This float is utilized during the processing of the 16-bit NumPy arrays, while geospatial and administrative metadata are carried forward in an accompanying JSON sidecar to ensure data persistence throughout the pipeline.
To determine if a candidate detection’s color constitutes an archaeological match, we measure the circular mean of all pixels within the artifact outline. This calculation excludes a two-pixel erosion boundary to mitigate spectral contamination from adjacent soil or vegetation. Based on the empirical distribution of the sampled sherds, the hue-weighted prior produces three emergent behavioral zones as a continuous function of angular deviation from the MPR centroid—not as discrete logical gates:
(1) Low-penalty zone: Candidates whose mean hue closely approximates the MPR norm (within approximately ±10°) incur minimal weighting penalties, allowing the model to prioritize geometric learning for chromatically consistent detections.
(2) Graduated amplification zone: As angular deviation increases beyond this region, the cubic penalty function produces progressively stronger suppression, with penalty severity scaling nonlinearly with distance from the centroid. This scaled penalty assignment is referred to as the Accelerated Penalty region in Figure 7.
(3) Effective exclusion zone: Candidates deviating beyond approximately 50° from the MPR centroid accumulate penalties sufficient to suppress detection in practice. This is an emergent consequence of the cubic weighting function, not a hard threshold. Hard chromatic gates, in which detections are accepted or rejected by fixed hue bounds, appear only in the Phase II post-inference filter and constitute a separate mechanism from the HWLF.
This relationship is formalized as a hue-weighted objective prior, implemented as a hue-weighted loss function component. By integrating this domain-specific chromatic bias into the detection objective, the system maintains compatibility with standard architectures while ensuring that spectral information is weighted alongside geometric and semantic features. This inductive bias (here, the encoding of a domain-specific chromatic prior directly into the loss function) directs the model toward recognizing certain characteristic patterns associated with Maya ceramic properties, and specifically the iron-oxide signatures [25] within complex and noisy environmental contexts.

2.2. Mathematical Formalization of Hue-Weighted Loss Function

Custom loss function construction for small-object detection represents an active area of research, motivated by the limitations of standard loss formulations that apply uniform weighting across all positive samples regardless of domain-specific discriminative structure [26]. Principled approaches to managing competing loss components and imbalanced sample distributions during training have been shown to substantially affect detection outcomes [27,28]. In this study, the Hue-Weighted Loss Function (HWLF) formalizes hue as a primary discriminant by encoding chromatic distance from the MPR target directly within the training objective, providing both a theoretical basis for chromatic integration and a diagnostic instrument for understanding optimization behavior under severe class imbalance.

2.2.1. Circular Hue Distance Metric

Since hue is a periodic quantity defined on the 0° to 360° circle, the distance between a predicted hue $h_i$ and the target “Maya Pottery Red” (MPR) hue $h_{\text{target}}$ must be calculated using a circular metric. This accounts for the 0°/360° wraparound discontinuity. We define the angular distance $d_{\text{hue}}$ as shown in Equation (1):
$$d_{\text{hue}}(h_i, h_{\text{target}}) = \min\left(|h_i - h_{\text{target}}|,\; 360^\circ - |h_i - h_{\text{target}}|\right) \tag{1}$$
To facilitate integration into the loss function, this distance is normalized to the interval [0, 1] as shown in Equation (2), where 0 represents the perfect chromatic match and 1 represents the maximum possible chromatic opposition (180°):
$$\delta_i = \frac{d_{\text{hue}}(h_i, h_{\text{target}})}{180^\circ} \tag{2}$$
Here, δi ∈ [0, 1] represents the normalized hue deviation for object i. This normalization ensures the chromatic error is scale-invariant and compatible with the standard gradients used in deep learning backpropagation. Importantly, δi feeds into a smooth, continuous penalty function—no hard thresholds or discrete logic gates are applied at this stage. The graduated response to chromatic deviation is an emergent property of the cubic weighting function defined in Section 2.2.2. Hard chromatic gates, where candidate detections are accepted or rejected by fixed hue bounds, appear only in the Phase II post-inference filter and constitute a separate mechanism from the HWLF.
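Equations (1) and (2) can be sketched in a few lines of Python. The function names are illustrative, and the modulo on the raw difference is an added safeguard for inputs outside the 0° to 360° range rather than part of the published formula.

```python
def hue_distance_deg(h, h_target):
    """Equation (1): shortest angular distance between two hues, in degrees."""
    diff = abs(h - h_target) % 360.0
    return min(diff, 360.0 - diff)


def normalized_hue_deviation(h, h_target=40.6):
    """Equation (2): deviation scaled to [0, 1]; 1 corresponds to a 180 degree offset."""
    return hue_distance_deg(h, h_target) / 180.0


# A hue of 350 degrees is 50.6 degrees from the MPR centroid, not 309.4 degrees.
print(hue_distance_deg(350.0, 40.6))
print(normalized_hue_deviation(350.0))
```

The example shows why a circular metric is needed: a linear difference would treat a red hue just below 360° as nearly opposite to the target, when it is in fact about 50° away.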

2.2.2. Hue-Weighted Penalty Function

To integrate chromatic priors into the optimization process, we apply a loss weighting factor based on the normalized hue deviation δi. This function is designed to amplify the distinction between the target “Maya Pottery Red” (MPR) and the complex spectral background of the agricultural fields. The hue-based weighting function is given by Equation (3):
$$w_{\text{hue}}(h_i) = 1 + \alpha \cdot \delta_i^{3} \tag{3}$$
where:
- $\alpha$ is a hyperparameter controlling penalty severity (e.g., $\alpha = 10$)
- $\delta_i^{3}$ is the cubic term that provides nonlinear amplification of the error.
Rationale:
The selection of the cubic term ($\delta_i^{3}$) is critical for balancing the model’s sensitivity. In contrast to a linear penalty, the cubic curve produces a region of low gradient sensitivity near the target centroid [29].
- Chromatic Tolerance for MPR: For objects closely approximating Maya Pottery Red ($h_{\text{target}} = 40.6^\circ$, $\delta_i \approx 0$), $w \approx 1$. This minimal penalty allows the model to prioritize geometric learning for artifacts within the acceptable chromatic range.
- Non-linear Rejection of Distractors: As the deviation increases, the penalty grows nonlinearly via the cubic term. Small deviations ($\delta_i < 0.3$), which may be caused by minor lighting variations or mineralogical shifts in the clay, are penalized lightly, ensuring the model remains robust to heterogeneous field conditions.
- Maximum Penalty for Distant Hues: For objects at the maximum chromatic opposition ($\delta_i \approx 1$), such as certain limestone fragments or dense vegetation, the function reaches its peak weighting of $w \approx 1 + \alpha$. This severe penalty ensures that these geometrically similar but chromatically distinct materials are effectively rejected by the model.
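To make the shape of the cubic weighting concrete, the sketch below evaluates Equation (3) at a few angular offsets from the centroid. It assumes α = 10 and a target hue of 40.6°, the values quoted in the text; the function name is illustrative.

```python
def hue_weight(h, h_target=40.6, alpha=10.0):
    """Equation (3): w = 1 + alpha * delta**3, with delta from Equations (1)-(2)."""
    diff = abs(h - h_target) % 360.0
    delta = min(diff, 360.0 - diff) / 180.0
    return 1.0 + alpha * delta ** 3


# Weight at increasing angular offsets from the MPR centroid.
for offset in (0.0, 10.0, 50.0, 180.0):
    print(f"offset {offset:5.1f} deg -> w = {hue_weight(40.6 + offset):.3f}")
```

Under these assumptions the weight stays very close to 1 within 10° of the centroid, rises to roughly 1.21 at 50°, and reaches 1 + α = 11 at the opposite side of the hue circle.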

2.2.3. Total Loss Function

The complete multi-task loss function is defined in Equation (4):
$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{cls}} + \lambda_{\text{bbox}}\,\mathcal{L}_{\text{bbox}} + \lambda_{\text{hue}}\,\mathcal{L}_{\text{hue}} \tag{4}$$
- $\mathcal{L}_{\text{total}}$ (Total Loss): The objective value minimized during training, with lower values indicating improved detection and identification of pottery sherds.
- $\mathcal{L}_{\text{cls}}$ (Classification Loss): Quantifies object-class prediction accuracy by penalizing the misclassification of pottery sherds as background classes such as soil, rock, or vegetation.
- $\mathcal{L}_{\text{bbox}}$ (Bounding Box Regression Loss): Measures spatial localization accuracy by penalizing deviations between predicted bounding boxes and ground-truth sherd outlines.
- $\mathcal{L}_{\text{hue}}$ (Hue-Weighted Loss): Encourages chromatic consistency by penalizing deviations between detected object color and the defined Maya Pottery Red (MPR) target.
- $\lambda_{\text{bbox}}$ and $\lambda_{\text{hue}}$ (Lambda Coefficients): Control the relative contribution of localization and chromatic error terms to the total loss during model optimization.
To mitigate potential model extinction, we conceptualized a dynamic relative loss-weighting heuristic that would adjust the loss weights according to real-time training performance, thereby balancing the influence of the HWLF. This heuristic is discussed here for theoretical completeness and was not operationalized in the experimental pipeline. Instead, a Two-Phase Workflow, which separates geometric proposal generation from chromatic refinement, proved the more efficient and stable practical alternative.

2.2.4. Classification Loss (Hue-Weighted)

The hue-weighted classification loss is given by Equation (5):
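$$\mathcal{L}_{\text{cls}} = \frac{1}{N} \sum_{i=1}^{N} w_{\text{hue}}(h_i) \cdot \mathrm{CE}(\hat{y}_i, y_i) \tag{5}$$
where:
- $\mathrm{CE}(\hat{y}_i, y_i)$ is the cross-entropy between predicted class $\hat{y}_i$ and ground truth $y_i$
- $N$ is the number of predictions
- $w_{\text{hue}}(h_i)$ scales the penalty based on hue distance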
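A framework-free sketch of Equation (5) is given below. It is illustrative only: the study implements the loss inside Detectron2, whereas this version uses plain Python lists, and the function name, argument layout, and the small probability clamp are assumptions introduced here.

```python
import math


def hue_weighted_ce(probs, labels, hues, h_target=40.6, alpha=10.0):
    """Equation (5): mean of w_hue(h_i) * CE over N predictions.

    probs  -- per-prediction class probabilities (each row sums to 1)
    labels -- ground-truth class index for each prediction
    hues   -- representative hue of each prediction, in degrees
    """
    total = 0.0
    for p, y, h in zip(probs, labels, hues):
        diff = abs(h - h_target) % 360.0
        delta = min(diff, 360.0 - diff) / 180.0
        weight = 1.0 + alpha * delta ** 3              # Equation (3)
        total += weight * -math.log(max(p[y], 1e-12))  # clamp avoids log(0)
    return total / len(labels)


# Two equally uncertain predictions: one at the MPR hue, one at the opposite hue.
loss = hue_weighted_ce([[0.5, 0.5], [0.5, 0.5]], [1, 0], [40.6, 220.6])
print(loss)
```

In this toy case both predictions have the same cross-entropy, yet the chromatically opposite sample contributes eleven times as much to the loss, which illustrates how strongly the hue term can dominate the classification signal.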

2.2.5. Bounding Box Regression Loss

The bounding box regression loss is defined in Equation (6):
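$$\mathcal{L}_{\text{bbox}} = \frac{1}{N_{\text{pos}}} \sum_{i \in \text{positive}} \mathrm{smooth}_{L1}\left(\hat{b}_i - b_i\right) \tag{6}$$
where $\hat{b}_i$ and $b_i$ are the predicted and ground-truth bounding boxes, and $N_{\text{pos}}$ is the number of positive samples.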

2.2.6. Hue Consistency Loss (Auxiliary Term)

The auxiliary hue consistency term is given by Equation (7):
$$\mathcal{L}_{\text{hue}} = \frac{1}{N_{\text{pos}}} \sum_{i \in \text{positive}} \delta_i^{2} \tag{7}$$
This auxiliary term explicitly encourages the model to attend to objects with hues near $h_{\text{target}}$.
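The auxiliary term and its place in the total objective can be sketched as follows. The function names are hypothetical; the default coefficients are the values listed under Hyperparameters (λbbox = 1.0, λhue = 0.5), and the component losses are passed in as plain numbers for illustration.

```python
def hue_consistency_loss(deltas):
    """Equation (7): mean squared normalized hue deviation over positive samples."""
    return sum(d ** 2 for d in deltas) / len(deltas)


def total_loss(l_cls, l_bbox, l_hue, lambda_bbox=1.0, lambda_hue=0.5):
    """Equation (4): weighted sum of the three loss components."""
    return l_cls + lambda_bbox * l_bbox + lambda_hue * l_hue


# Three positive samples with normalized deviations of 0, 0.2, and 0.4.
l_hue = hue_consistency_loss([0.0, 0.2, 0.4])
print(l_hue, total_loss(l_cls=1.0, l_bbox=0.5, l_hue=l_hue))
```

Because the deviations are bounded by 1, the auxiliary term is also bounded by 1, so its contribution to the total is capped at the value of the hue coefficient.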

2.2.7. Hyperparameters

- $h_{\text{target}} = 40.6^\circ$ (Maya Pottery Red), as established by circular mean analysis of the 502-sherd reference set; the Phase II chromatic filter target is refined to 38.4° following field calibration
- $\alpha = 10$ (penalty severity, tunable)
- $\lambda_{\text{bbox}} = 1.0$ (bounding box loss weight)
- $\lambda_{\text{hue}} = 0.5$ (hue consistency loss weight)
The operational target hue used by the Phase II chromatic filter was subsequently refined to 38.4° through iterative Human-in-the-Loop (HITL) calibration, in which expert review of successive inference passes indicated that incremental adjustment improved the true-positive to false-positive ratio of field detections.
The resulting 2.2° shift lies within the range of intra-scene chromatic variability measured across extracted sherd pixel arrays. Although both training and Independent Validation imagery originate from the same flight campaigns, local factors such as partial occlusion, soil contact, and mixed-pixel effects introduce measurable hue dispersion at the object level. Parameter calibration was performed on a held-out subset excluded from final evaluation.

2.2.8. Implementation Considerations: Hue Extraction During Training

For each annotated object i, the representative hue hi is computed as shown in Equation (8):
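$$h_i = \operatorname{atan2}\left(\frac{1}{|P_i|} \sum_{p \in P_i} \sin H(p),\; \frac{1}{|P_i|} \sum_{p \in P_i} \cos H(p)\right) \times \frac{180}{\pi} \tag{8}$$
where $P_i$ is the set of pixels in object $i$, and $H(p)$ is the hue value of pixel $p$ in HSV space.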
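A minimal sketch of Equation (8) using only the Python standard library is shown below. The function name is illustrative, and the final modulo, which maps the atan2 result back onto the 0° to 360° circle, is an implementation detail added here rather than something stated in the equation.

```python
import math


def circular_mean_deg(hues_deg):
    """Equation (8): circular mean of hue angles, returned in [0, 360)."""
    n = len(hues_deg)
    s = sum(math.sin(math.radians(h)) for h in hues_deg) / n
    c = sum(math.cos(math.radians(h)) for h in hues_deg) / n
    # atan2 returns a value in (-180, 180]; the modulo maps it onto the hue circle.
    return math.degrees(math.atan2(s, c)) % 360.0


wrapped = [350.0, 20.0]              # two reddish hues straddling the 0/360 seam
print(circular_mean_deg(wrapped))    # close to 5 degrees
print(sum(wrapped) / len(wrapped))   # arithmetic mean: 185 degrees
```

The two printed values show the wraparound problem directly: the arithmetic mean of two reddish hues lands on the cyan side of the circle, while the circular mean stays between them.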
The practical implementation of these filters is handled via a modular Python 3.8.20 pipeline that processes detection metadata in a vectorized format. The system utilizes NumPy-based chromatic filtering to evaluate each candidate against the established HSV windows, ensuring that high-volume inference data is efficiently refined with minimal computational overhead.
The location of each annotated sherd was derived via inheritance from the original 16-bit DNG metadata. The extraction of these sherds for processing in the Two-Phase Workflow was managed through a Python-automated pipeline that parsed the original source images sequentially, and not as an orthomosaic. This individual-frame processing was intentionally selected to conserve the exact radiometric and geometric integrity of the sensor data.
This approach stands in deliberate contrast to the use of combined orthomosaics common in large-scale automated surveys [1,2,16,30]. While orthomosaicking is efficient for landscape-scale feature detection, the process involves radiometric synthesis and geometric interpolation that are destructive to artifact-level signals. In an orthomosaic, pixel values are resampled and averaged to create seamless transitions, a form of spectral blending that destroys the 16-bit hue precision required for Maya Pottery Red (MPR) identification. Furthermore, the geometric stretching inherent in orthorectification can distort a sharp, 12-pixel sherd into an irregular, blurred cluster, rendering it unrecognizable to the detection model. By processing raw, un-synthesized frames, our workflow ensures that the chromatic prior is evaluated against the most accurate representation of the physical artifact.

2.2.9. Gradient Flow

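The hue-weighted term $w_{\text{hue}}(h_i)$ scales the classification gradient as shown in Equation (9):
$$\frac{\partial \mathcal{L}_{\text{cls}}}{\partial \theta} = \frac{1}{N} \sum_{i=1}^{N} w_{\text{hue}}(h_i) \cdot \frac{\partial\, \mathrm{CE}(\hat{y}_i, y_i)}{\partial \theta} \tag{9}$$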
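Because the weight grows with chromatic distance, classification errors on objects whose hue lies far from the target hue contribute larger gradients than errors on objects near it, pushing the network to separate MPR-consistent objects from chromatically distinct distractors.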
Under conditions of severe class imbalance, however, this same selective pressure proved sufficient to overwhelm geometric gradient signals entirely, resulting in total suppression of detections—a finding documented in Section 3.2 and instrumental in motivating the decoupled Two-Phase Workflow.
Operational Example—Chromatic Disambiguation: As demonstrated in Figure 7, the HWLF provides the selective pressure to differentiate geometrically similar objects. A vegetation fragment (h ≈ 120°) falls deep within the cubic amplification zone (w ≈ 1.86), while a pottery sherd (h ≈ 41°) remains in the low-penalty zone (w ≈ 1), embedding domain knowledge directly into the optimization process without architectural modification.
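The two weights quoted in the operational example can be checked by hand from Equations (1) to (3), assuming α = 10 and a target hue of 40.6° as listed in Section 2.2.7:

```latex
\begin{aligned}
\delta_{\text{veg}} &= \frac{|120^\circ - 40.6^\circ|}{180^\circ} \approx 0.441,
  & w_{\text{veg}} &= 1 + 10 \times 0.441^{3} \approx 1.86 \\
\delta_{\text{sherd}} &= \frac{|41^\circ - 40.6^\circ|}{180^\circ} \approx 0.0022,
  & w_{\text{sherd}} &= 1 + 10 \times 0.0022^{3} \approx 1.0000001
\end{aligned}
```

Under these values the vegetation fragment carries almost twice the classification weight of the sherd, while the sherd's weight is indistinguishable from 1.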

2.3. Implementation and Pipeline Verification

The proposed method is integrated into a high-fidelity pipeline originating from raw UAV imagery. To preserve the spectral precision required for the HWLF, imagery is concurrently converted into two formats: (1) high-bit-depth 16-bit HSV NumPy arrays, which retain the full color depth essential for accurate chromatic separation, and (2) RGB GeoTIFF images for manual annotation and quality assurance. The following subsections describe each stage of implementation in sequence.

2.3.1. Image Conversion and Annotation Workflow

Raw DNG files captured by the DJI Mavic 3 Classic UAV (native resolution: 5280 × 3956 pixels) were converted in parallel into two representations. Sixteen-bit HSV NumPy arrays were generated to preserve the full radiometric depth of the sensor data, retaining hue precision critical to the chromatic prior framework. Concurrent RGB GeoTIFF exports were produced for manual review and annotation. These GeoTIFFs were labeled using ESRI ArcGIS Pro v3.3.2 [31] Label Objects for Deep Learning geoprocessing tools, in which a domain expert drew polygon outlines around individual sherd instances, producing a georeferenced polygon shapefile. Training samples were subsequently extracted from the corresponding 16-bit HSV NumPy arrays using these shapefiles as spatial masks, with each extracted sample saved alongside a JSON sidecar file to ensure metadata persistence and georeferenced traceability throughout the pipeline.

2.3.2. Training Dataset and Augmentation

The training dataset was drawn from a collection of 500 annotated sherd instances extracted from 107 source UAV images acquired over two platform mounds in agricultural fields near Indian Church Village, Lamanai, Belize. Each sherd was extracted as an individually bounded raster tile from the underlying 16-bit HSV NumPy array, yielding 972 total sample tiles after overlapping tile extraction (mean of approximately 1.9 tiles per sherd). The dataset was partitioned into a training split of 777 tiles (80%) and a validation split of 195 tiles (20%). Note: A separate reference collection of 502 sherds was assembled independently and used exclusively for empirical establishment of the MPR chromatic norm (htarget = 40.6°, described in Section 2.2.7). The 500-sherd training set and the 502-sherd chromatic reference set are distinct collections; both were drawn from the same two platform mounds but assembled in separate annotation sessions. A slight divergence between the reference centroid and the IVT TP distribution was observed across independently assembled datasets; this relationship is discussed further in Section 3.3. Dynamic geometric augmentation was applied at training time via the custom data mapper: horizontal flip (p = 0.5), vertical flip (p = 0.5), and random 90° rotation (k randomly selected from {0, 1, 2, 3}). No chromatic augmentations were applied in order to preserve the radiometric fidelity of the 16-bit HSV training data. No pre-augmented copies were saved to disk; all transforms were generated on-the-fly during each training iteration.
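The on-the-fly geometric augmentation described above can be sketched with NumPy as follows. This is an assumption-laden illustration rather than the authors' data mapper: the function name and array layout (height, width, channel) are hypothetical, and the demo tile is random data.

```python
import numpy as np


def augment(tile, rng):
    """Apply random flips and a 90 degree rotation to an (H, W, 3) HSV tile."""
    if rng.random() < 0.5:
        tile = np.flip(tile, axis=1)          # horizontal flip
    if rng.random() < 0.5:
        tile = np.flip(tile, axis=0)          # vertical flip
    k = int(rng.integers(0, 4))               # k drawn from {0, 1, 2, 3}
    tile = np.rot90(tile, k=k, axes=(0, 1))   # rotate in the image plane only
    # Only pixel positions change; the HSV values themselves are untouched.
    return np.ascontiguousarray(tile)


rng = np.random.default_rng(0)
tile = rng.integers(0, 60000, size=(64, 64, 3), dtype=np.uint16)
out = augment(tile, rng)
print(out.shape, out.dtype)
```

Because every transform is a pure rearrangement of pixels, the hue distribution of each tile is preserved exactly, which is consistent with the decision to avoid chromatic augmentation. In a real data mapper the annotation masks and boxes would need the same transform applied.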

2.3.3. Hue Extraction Protocol

For each annotated sherd, the representative hue value was extracted from the corresponding 16-bit HSV NumPy array using the following two-stage procedure. First, at the level of the individual sherd: the HITL polygon annotation was decoded from Run-Length Encoded (RLE) format to a binary pixel mask co-registered with the source tile array. A morphological erosion was then applied to the mask using a 3 × 3 structuring element over two iterations, effectively stripping a 2-pixel boundary from all edges of the annotated region. This erosion step was implemented to exclude pixels at the perimeter of the annotation outline, where spectral contamination from adjacent soil, vegetation, or minor digitization imprecision is most likely to occur, ensuring that only the spectrally pure interior of the sherd is sampled. Following erosion, the hue channel (channel 0 of the HSV array) was isolated and all pixel values within the eroded mask interior were extracted. The arithmetic mean of these pixel values was computed to yield a single representative hue value for that sherd, expressed as a 16-bit normalized float in the range [0, 1] (corresponding to 0–360°). Arithmetic mean is appropriate at this stage because MPR pixel values within a single sherd cluster tightly around ~40°, far from the 0°/360° angular wraparound boundary where arithmetic averaging would introduce error; the difference from a circular mean is negligible under these conditions. Second, at the level of the reference population: the circular mean was applied across the per-sherd representative hue values derived from the full 502-sherd chromatic reference set. Circular mean is the statistically correct treatment for population-level averaging of angular data, avoiding the wraparound artefacts that would affect a simple arithmetic mean when values are distributed across the full hue range. 
This population-level circular mean—computed as atan2(mean(sin(H)), mean(cos(H))) to respect the periodic nature of the hue dimension—yielded the empirical MPR centroid of 40.6°, which served as the reference hue for the Hue-Weighted Loss Function and the Phase II chromatic filtering threshold.
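The two-stage procedure can be sketched in NumPy as shown below. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function names, the (H, W, 3) array layout, the treatment of out-of-array pixels as background during erosion, and the handling of masks that erode to nothing are all hypothetical choices.

```python
import numpy as np

def erode_3x3(mask, iterations=2):
    """Binary erosion with a 3 x 3 structuring element, NumPy only.

    Pixels outside the array count as background, so each iteration
    strips one pixel from every edge of the mask.
    """
    out = np.asarray(mask, dtype=bool)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, constant_values=False)
        kept = np.ones((h, w), dtype=bool)
        for dy in range(3):
            for dx in range(3):
                kept &= padded[dy:dy + h, dx:dx + w]
        out = kept
    return out

def sherd_hue_deg(hsv, mask):
    """Stage 1: arithmetic mean hue (degrees) over the eroded mask interior.

    hsv: (H, W, 3) float array, hue in channel 0, scaled to [0, 1].
    Returns None if erosion leaves no pixels (assumed handling).
    """
    interior = erode_3x3(mask, iterations=2)
    if not interior.any():
        return None
    return float(hsv[..., 0][interior].mean() * 360.0)

def circular_mean_deg(hues_deg):
    """Stage 2: population circular mean, atan2(mean(sin H), mean(cos H))."""
    h = np.deg2rad(np.asarray(hues_deg, dtype=float))
    mean = np.arctan2(np.sin(h).mean(), np.cos(h).mean())
    return float(np.rad2deg(mean) % 360.0)
```

As a check on the wraparound argument, hues of 350° and 20° have a circular mean of 5°, whereas their arithmetic mean is 185°. For values clustered near 40°, the two estimators agree closely.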

2.3.4. Model Architecture

The hue-weighted loss function is implemented as a custom module within a Cascade Mask R-CNN architecture utilizing a ViT-based backbone [32] in a Detectron2 framework [33]. Training was performed from scratch without COCO pretraining, to avoid introducing priors from RGB-dominant datasets that are not designed to discriminate detail in HSV-encoded feature spaces. This architecture employs a multi-stage refinement approach with progressively stricter IoU thresholds at each cascade stage, producing progressively higher-quality proposals and supporting precise localization of small objects. The ViTDet backbone replaces the traditional ResNet/FPN with a plain Vision Transformer, enabling global self-attention across the full feature map without the inductive locality bias of convolutional architectures.

2.3.5. Training Configuration

Training was configured for a maximum of 50,000 iterations but terminated due to detection extinction before this limit was reached, completing approximately 6000 iterations on a single NVIDIA RTX A4000 Laptop GPU (8 GB VRAM), using gradient accumulation to manage memory constraints. Internal validation using a dynamic mean average precision (mAP) metric was monitored throughout training. A detailed analysis of training behavior, including the emergence of detection suppression under the HWLF and the characterization of the extinction phenomenon, is provided in Section 3.2. The outcome of this training phase motivated the architectural pivot described in Section 2.3.6.

2.3.6. Two-Phase Workflow Implementation

Preliminary experiments integrating the HWLF directly into end-to-end optimization produced unstable training behavior under severe class imbalance (Section 3.2). To preserve chromatic discrimination while maintaining detector stability, chromatic validation was decoupled from geometric proposal generation into a Two-Phase Workflow leveraging the existing HSV pipeline and chromatic analysis infrastructure.
Phase I performs geometric candidate generation using the Cascade Mask R-CNN with a custom sliding window tiling strategy, which decomposes each source image into overlapping 512 × 512-pixel tiles at a stride of 384 pixels (128-pixel overlap), maintaining coverage of small objects that would otherwise straddle tile boundaries. Phase II applies the chromatic analytical tools as a deterministic gatekeeper, evaluating each candidate’s mean hue, saturation, value, and hue standard deviation against the calibrated MPR norm using 16-bit HSV data. This modular architecture repurposes the chromatic logic underlying the HWLF as a precision-targeted post-inference validation filter while maintaining the structural stability of the detection pipeline (Figure 8).
Phase II parameters were calibrated using a held-out subset of imagery through iterative evaluation against expert-validated annotations. Visual summaries of ranked detections were generated to assess the separation between target and background chromatic signatures, supporting the selection of stable parameter configurations (Figure 9).
Critically, once Phase I inference is complete, the resulting master catalog permits subsequent re-parameterization of Phase II chromatic thresholds in approximately 1.2 s per iteration, enabling rapid exploratory calibration without repeating inference.
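A minimal sketch of the Phase II gatekeeper is given below, assuming the master catalog stores per-candidate summary statistics as NumPy arrays. The catalog keys, the function names, and every threshold default other than the 38.4° operational target hue (Section 3.3) are placeholders chosen for illustration; the calibrated values are not stated in this section.

```python
import numpy as np

def circular_distance_deg(h, target):
    """Shortest angular distance between hues, in degrees (0 to 180)."""
    d = np.abs(np.asarray(h, dtype=float) - target) % 360.0
    return np.minimum(d, 360.0 - d)

def phase2_filter(catalog, target_hue=38.4, max_hue_dist=10.0,
                  min_sat=0.20, min_val=0.20, max_hue_std=8.0):
    """Return a boolean keep-mask over the Phase I master catalog.

    catalog: dict of equal-length 1-D arrays with hypothetical keys
    'hue_mean' and 'hue_std' (degrees), 'sat_mean' and 'val_mean' ([0, 1]).
    All thresholds except target_hue are illustrative placeholders.
    """
    return (
        (circular_distance_deg(catalog["hue_mean"], target_hue) <= max_hue_dist)
        & (np.asarray(catalog["sat_mean"]) >= min_sat)
        & (np.asarray(catalog["val_mean"]) >= min_val)
        & (np.asarray(catalog["hue_std"]) <= max_hue_std)
    )
```

Because the filter reduces to a handful of vectorized comparisons over stored statistics, changing a threshold only requires re-evaluating the mask, which is consistent with the rapid re-parameterization described above.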

2.3.7. Tiling and Computational Performance

Processing was performed on an NVIDIA RTX A4000 Laptop GPU (8 GB VRAM). Phase I decomposed each of the 100 source HSV arrays (5280 × 3956 pixels) into overlapping 512 × 512-pixel tiles at a stride of 384 pixels (128-pixel overlap), yielding 13,992 non-empty tiles; an additional 1408 tiles were bypassed automatically via a median-saturation threshold (<0.05), eliminating inference on bare-soil and structurally empty frames. Tiling completed in approximately 7.5 min. The Cascade ViTDet model was applied to all 13,992 tiles, producing 727,991 raw candidate detections; post-inference chromatic enrichment retained 177,148 candidates in the master catalog. Full inference required approximately 1 h 50 min, yielding a total pipeline runtime of approximately 5 h for the 100-image evaluation dataset.
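The tile counts can be cross-checked with a short sketch. The origin rule used here (a tile starts at every multiple of the stride inside the image, with edge tiles padded or clipped) is an assumption, as are the function names and the placement of saturation in channel 1. Under that rule each 5280 × 3956 array yields 154 tiles, which matches the reported 13,992 + 1408 = 15,400 tiles for 100 images.

```python
import numpy as np

def tile_origins(width, height, stride=384):
    """Top-left (x, y) corners of a sliding-window grid (assumed rule)."""
    return [(x, y)
            for y in range(0, height, stride)
            for x in range(0, width, stride)]

def is_low_saturation(hsv_tile, threshold=0.05):
    """True when the tile's median saturation falls below the threshold.

    hsv_tile: (H, W, 3) float array with saturation in channel 1 (assumed).
    """
    return float(np.median(hsv_tile[..., 1])) < threshold

origins = tile_origins(5280, 3956)
tiles_per_image = len(origins)   # 14 columns x 11 rows
```

Tiles flagged by `is_low_saturation` would be skipped before inference, corresponding to the bypass step described above.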

2.3.8. Evaluation Criteria

Detection performance was evaluated using standard object detection metrics: Precision = TP/(TP + FP); Recall = TP/(TP + FN); F1 Score = 2 × (Precision × Recall)/(Precision + Recall); where TP, FP, and FN denote true positives, false positives, and false negatives, respectively, as determined by comparison with expert HITL annotations (see Section IVT Protocol and Results). A centroid-based matching criterion was employed. A detection was considered a true positive if its centroid, buffered by a 1.0 m radius, intersected a ground-truth annotation polygon. The 1.0 m radius is a conservative tolerance grounded in three independent sources of positional uncertainty. First, the DJI Mavic 3 Classic platform specifies a horizontal hovering accuracy of ±0.5 m under GNSS-only satellite positioning [34]; the 1.0 m matching radius is twice this value. Second, at the ultra-low acquisition altitudes used in this study (2.0–3.0 m above the home takeoff point), the ground sampling distance ranged from approximately 0.55 mm/pixel to 0.82 mm/pixel, making GSD a negligible source of matching error relative to GNSS uncertainty—the 1.0 m radius spans approximately 1200–1800 pixels at these scales. Third, flight altitude was set as meters above the GPS-determined home takeoff point rather than as true terrain-following AGL; over a site with platform mounds of variable relief, actual clearance above the ground surface varied across each flight, introducing additional effective altitude uncertainty beyond the nominal GNSS specification. Taken together, these three factors confirm that the 1.0 m matching radius is a conservative and well-grounded tolerance for this acquisition geometry. Matching was performed under a radius-coverage approach in which each annotation was assessed independently against all detections, without penalizing dense detection clusters for proximity to already-matched neighbors. The full evaluation methodology is described in Section IVT Protocol and Results.
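One possible reading of the radius-coverage criterion is sketched below. It is an interpretation, not the evaluation code used in the study: the function names are hypothetical, coordinates are assumed to be in a projected metric system, and the distance is measured to the polygon boundary, which is adequate only because sherd polygons are far smaller than the 1.0 m radius. Polygons are assumed to have no repeated vertices.

```python
import numpy as np

def distance_to_polygon(point, polygon):
    """Minimum distance from a point to a polygon boundary (same units)."""
    p = np.asarray(point, dtype=float)
    a = np.asarray(polygon, dtype=float)
    b = np.roll(a, -1, axis=0)                      # end point of each edge
    ab = b - a
    t = np.clip(((p - a) * ab).sum(axis=1) / (ab * ab).sum(axis=1), 0.0, 1.0)
    closest = a + t[:, None] * ab                   # nearest point per edge
    return float(np.linalg.norm(p - closest, axis=1).min())

def radius_coverage(detections, polygons, radius=1.0):
    """Assess detections and annotations independently of one another.

    Returns (detection_hits, annotation_hits): boolean lists marking
    detections within `radius` of any polygon, and polygons within
    `radius` of any detection. No one-to-one assignment is enforced,
    so dense clusters are not penalized.
    """
    d = np.array([[distance_to_polygon(p, poly) for poly in polygons]
                  for p in detections]).reshape(len(detections), len(polygons))
    within = d <= radius
    return within.any(axis=1).tolist(), within.any(axis=0).tolist()
```

Under this reading, unmatched annotations are false negatives and unmatched detections are false positives.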

3. Results

3.1. Diagnostic Evaluation of the Hue-Weighted Loss Function

The Hue-Weighted Loss Function (HWLF) was initially evaluated as a mechanism for encoding chromatic priors within the training objective. Across multiple training configurations (Table 2), incorporation of the HWLF resulted in increased total loss relative to baseline models. This increase reflects the intended penalization of chromatic deviation rather than improved predictive performance and is therefore not directly comparable across configurations with differing input normalization.
In practice, while the HWLF successfully constrained predictions toward the target hue distribution, it did not yield stable improvements in detection performance. Instead, its primary utility emerged as a diagnostic tool, revealing the influence of chromatic constraints on model optimization and motivating the development of an alternative deployment strategy.

3.2. HWLF Training Behaviour and Extinction Phenomenon

End-to-end training of the Cascade ViTDet model under the Hue-Weighted Loss Function (HWLF) produced a characteristic and diagnostically significant training trajectory. Metrics were logged every 20 iterations across the 6000-iteration run; the records summarized here are drawn from the training log of the primary HSV training run (output5_pseudo_rgb_transfer; metrics source: C:/d2/Outputs/output5_pseudo_rgb_transfer/metrics.json).
Two Detectron2-specific ROI head metrics characterize the extinction trajectory: fg_cls_accuracy, the fraction of foreground region proposals correctly classified as target objects at the primary cascade stage; and false_negative rate, the fraction of annotated ground-truth instances missed by the classifier. Across training runs, extinction onset was consistently observed within the first 100–300 training iterations (Figure 10). In the representative trajectory documented here, the model was actively detecting at iteration 19 (fg_cls_accuracy = 0.68, false_negative = 0.32), with measurable degradation by iteration 119 (fg_cls_accuracy = 0.50, false_negative = 0.50) and complete extinction—fg_cls_accuracy = 0.0, false_negative = 1.0—established before iteration 300. Notably, this collapse occurred entirely within the learning rate warmup phase, before the optimizer had reached its target learning rate, indicating that the chromatic gradient was sufficient to redirect optimization even at sub-warmup gradient magnitudes.
Critically, total_loss continued to decrease monotonically throughout training: from 272.23 at iter 19 to 0.73 at iter 4999. The optimizer was functioning correctly and converging to a low-loss solution. The solution it found was degenerate: the region proposal network (RPN) learned to suppress foreground region proposals almost entirely, such that the ROI head was presented overwhelmingly with background samples. The background-to-foreground ratio in the ROI head grew progressively throughout training—approximately 5:1 at early iterations, rising through approximately 270:1 at iter 3279 and 449:1 at iter 3299—reaching a stable terminal state of 511:1 by iter 4999 (roi_head/num_bg_samples = 511.0, num_fg_samples = 1.0—Detectron2 ROI head sample counts recording the mean number of background and foreground proposals, respectively, presented to the classifier per training batch; fg_cls_accuracy = 0.0 across all three cascade stages). The RPN’s anchor classification loss (loss_rpn_cls—a Detectron2 metric for the binary objectness loss within the Region Proposal Network, measuring how confidently the RPN distinguishes object-containing from empty image regions) declined from 109.9 at iter 19 to 0.20 at iter 4999, confirming that the network successfully learned to treat nearly all regions as background. Overall accuracy (cls_accuracy), which counts background classifications, remained high throughout at 0.998 at iter 4999—because correctly classifying 511 background samples per batch is trivially achievable once no foreground candidates are proposed.
The mechanism by which the HWLF induced this degenerate solution operated at the batch level rather than the per-candidate level. For each training batch, the HWLF computed a single scalar weight from the mean of the per-annotation hue penalty values attached to that batch’s ground-truth instances. This scalar was then applied uniformly to all classification loss terms (loss_cls) across all three cascade stages. The per-annotation penalty followed the formula w = 1 + α·(δ/180)³, where α = 10 and δ is the distance between each annotation’s representative hue and the 40.6° MPR reference. In the implementation used during this training run, the hue values stored by the data mapper were normalized to the [0, 1] floating-point range of the 16-bit HSV arrays, while the reference hue parameter remained expressed in degrees (40.6°). This unit mismatch produced a near-constant δ ≈ 40.5° for virtually all annotations, regardless of their actual hue, yielding a batch weight of w ≈ 1.114: a flat ~11% amplification of classification loss applied to every training batch that contained annotated foreground instances.
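The arithmetic behind the near-constant weight can be reproduced with a short sketch. The function names are hypothetical, and treating δ as a circular distance in degrees is an assumption based on the paper's description of hue as a circular variable. The cubic penalty, α = 10, and the 40.6° reference come from the text.

```python
def circular_delta_deg(h_deg, ref_deg):
    """Shortest angular distance between two hues, in degrees."""
    d = abs(h_deg - ref_deg) % 360.0
    return min(d, 360.0 - d)

def hue_penalty(h, ref_deg=40.6, alpha=10.0):
    """Per-annotation penalty w = 1 + alpha * (delta / 180) ** 3."""
    return 1.0 + alpha * (circular_delta_deg(h, ref_deg) / 180.0) ** 3

def batch_weight(hues, ref_deg=40.6, alpha=10.0):
    """Batch scalar: mean of the per-annotation penalties."""
    return sum(hue_penalty(h, ref_deg, alpha) for h in hues) / len(hues)

# Unit mismatch: hues stored in [0, 1] but compared with a reference in degrees.
mismatched = batch_weight([0.10, 0.1128, 0.13])

# Same three hues expressed in degrees (36.0, 40.6, 46.8), as intended.
consistent = batch_weight([36.0, 40.6, 46.8])
```

With normalized inputs, every hue in [0, 1] lies between 39.6° and 40.6° from the reference, so the weight stays within roughly 1.106 to 1.115 regardless of the annotation's actual colour. With consistent units, the same three annotations would produce a weight barely above 1.0.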
While modest in isolation, this constant amplification operated under conditions of severe class imbalance. The class imbalance ratio in the ROI head began at approximately 2.5:1 and grew to 511:1 as the RPN progressively suppressed foreground proposals. Under these conditions, the gradient path of least resistance was to make no foreground proposals at all: without foreground candidates, the HWLF scalar weight became inapplicable (no annotated instances reached the ROI head), classification loss approached zero, and total loss converged to the residual contributed by box regression and RPN terms. This is the degenerate solution the model found and stabilized in.
The extinction behavior is not a training failure in the conventional sense; it demonstrates that even a moderate, constant amplification of classification loss is sufficient to drive a detector into the all-background degenerate solution when operating under extreme class imbalance (Table 3). This finding establishes an important architectural constraint: end-to-end chromatic loss weighting in a region-based detector requires either (1) a bidirectional formulation that rewards on-target hue while penalizing off-target hue, preventing the zero-detection path from being loss-minimizing, or (2) a post hoc decoupling of geometric and chromatic functions. The latter approach was adopted for the present study and is described in Section 2.3.6.

3.3. Munsell Calibration and Chromatic Fidelity

The MPR chromatic norm of 40.6°, established by circular mean analysis of the 502-sherd reference set (Section 2.1.1), was mapped to the Munsell color system using a standard colorimetric conversion, which places the 40.6° measurement within the Yellow-Red (YR) hue band. Under a D65 illuminant, MPR resolves to the standard Munsell designation 7.5YR 5/6; the corresponding 8-bit sRGB rendering of that Munsell chip is #A46F43. While archaeological literature suggests a slightly darker average for Maya Red [8], this field-condition estimate accounts for weathering, ambient illumination, and the sensor’s dynamic range. The operational filter target was subsequently refined to 38.4° through iterative HITL calibration against 523 field detections, as described in Section 2.2.7.
The 38.4° operational target hue was established during iterative HITL calibration prior to final evaluation and remained fixed during inference on the held-out IVT dataset. The observed true-positive hue distribution (~35.6° centroid) therefore reflects the chromatic characteristics of successfully detected objects within an independent image population rather than a recalibration of the deployment threshold itself. Because the IVT dataset consisted of previously unseen photographs and independently sampled target objects acquired under different local scene conditions, variation between the operational threshold and the empirical TP centroid is expected and consistent with normal intra-scene and inter-sample chromatic variability.

3.4. Detection Performance of the Two-Phase Workflow

Across 100 UAV images acquired under variable illumination conditions, the chromatic filtering stage reduced 177,148 geometric candidate detections to 1647 filtered detections, corresponding to a 99.1% reduction from the Phase I master catalog.
Despite this reduction, 97.8% of independently annotated target objects were retained, indicating that the chromatic prior effectively suppresses background detections while preserving the majority of true positives.
These results demonstrate that post-inference chromatic refinement in HSV color space provides an efficient mechanism for small-object discrimination in high-clutter environments.
The structure of the resulting detection records is summarized in Table 4, with the full attribute schema and dataset provided in Table S2 and the Supplementary Materials.

3.5. Independent Validation

The Two-Phase Workflow was evaluated on an Independent Validation dataset consisting of 100 previously unseen UAV images acquired over the same study area. None of these images were used during training or internal calibration. All metrics are reported with respect to an independently annotated validation set using the centroid-based matching criterion defined in Section IVT Protocol and Results.

IVT Protocol and Results

The term Independent Validation Test (IVT) is used in preference to the more common designation External Validation Test to reflect the specific design of this evaluation. External validation typically implies imagery sourced from a different acquisition campaign, geographic location, or sensor platform. In this study the 100 validation images were acquired during the same UAV flight campaign as the training imagery, over the same two platform mounds, on the same day. They are therefore not external in origin. Independence is established instead at two levels. First, none of the 100 images were used during model training, internal calibration, or parameter optimization—they were withheld from the development process in their entirety. Second, all ground-truth annotations for the IVT dataset were produced by one of this project’s co-authors, who had no involvement in model development, training, or hyperparameter tuning. As a Lamanai drone and GIS specialist with fifteen years of pedestrian survey and ceramic analysis experience at the study site, this co-author annotated the 100 images independently and compared those annotations against the model inferences to produce the confusion matrix of TP, FP, and FN counts from which all reported performance metrics are derived.
To perform the IVT, a set of 100 previously unviewed images of the same two pedestrian-surveyed mounds was extracted from the flight dataset. These source images cover the same field and were recorded on the same day as the images used to train the original model with the HWLF, but none of them were used in training the deep learning model. The 100 DNG files were converted into two formats: 16-bit HSV NumPy arrays with JSON sidecars, and GeoTIFFs. The NumPy/JSON sets served as source imagery for Phase I inference with a Cascade Mask R-CNN with a ViTDet (Vision Transformer Detector) backbone [32] in a Detectron2 framework [33]. Each image raster was decomposed using a custom overlapping-tile inference approach (512 × 512 px tiles, 128 px overlap, stride 384 px) in the style of SAHI tiling [35].
Phase I inference on these tiles produced 727,991 raw candidate detections across 13,992 tiles from 100 images, before any chromatic filtering. This very large number of detections resulted from the abundance of limestone pebbles derived from the crushed building stone and rubble fill used in the foundations of the ancient Maya platforms. Scattered among these many tens of thousands of tiny limestone fragments, even expert Human-in-the-Loop (HITL) annotators could each independently discern only approximately 500 sherds through painstaking manual review. The standard architecture was unable to discriminate effectively between the pebbles and the sherds, a problem addressed by applying chromatic filtering in the HSV pipeline as a post-inference discriminant.
Hyperparameter configurations were iteratively evaluated using structured outputs consisting of georeferenced detection records and corresponding visual summaries. Performance was assessed against expert-validated annotations, with candidate configurations required to achieve a minimum true positive rate of 80% across the top 100 ranked detections.
Validation was conducted using annotated contact sheets reviewed by domain experts, and acceptance required consistent performance across two consecutive configurations to ensure stability rather than stochastic variation. A detection was considered a true positive if its centroid, buffered by a 1.0 m radius, intersected a ground-truth annotation polygon.
Under radius-coverage matching, each annotation was independently assessed against all detections; dense detection clusters were not penalized for proximity to already-matched neighbors. This centroid-buffer framework provides a spatially appropriate alternative to IoU metrics, which are not applicable to point-based detection outputs, and accounts for the ~1–3 m horizontal positional uncertainty of non-RTK UAV systems.
The Independent Validation Test (IVT) was conducted by an expert annotator serving as a Human-in-the-Loop (HITL) to ensure the highest possible ground-truth authority. This expertise is grounded in fifteen years of pedestrian survey, ground-truthing, and ceramic analysis within the Lamanai settlement zone, supplemented by three years of UAV-specific experimentation. By leveraging this deep longitudinal knowledge of the local ceramic assemblage and its geological context, the HITL generated an authoritative shapefile of sherd outlines within QGIS v.3.34 [36].
During evaluation, a substantial portion of detections initially classified as false positives were found, upon expert review, to correspond to real targets that were not annotated in the reference dataset. After adjudicating these cases, 101 of the 786 false positives were reclassified as true positives, and the ground truth was updated accordingly. All performance metrics reported in this study are based on the corrected TP/FP/FN counts. This outcome is consistent with the established challenge of annotation incompleteness in high-entropy, small-object remote sensing datasets, and reinforces the proposition that automated detection acts as a necessary reductive filter for analyst fatigue, contributing to a more exhaustive artifact enumeration when integrated with expert review [37].
Following authoritative annotation by the expert HITL, these 100 images were subjected to object detection analysis by the two-phase system. The detections were compared against the HITL annotations to generate totals for TP, FP, and FN. The IVT yielded 962 true positives (TP), 685 false positives (FP), and 22 false negatives (FN).
Performance was assessed using standard object detection metrics, including precision, recall, and F1-score, based on comparison with expert-validated ground-truth annotations. The results (Table 5) demonstrate that the Two-Phase Workflow maintains high recall while substantially improving precision relative to geometric-only detection.
These totals were used to compute standard evaluation metrics, including Precision, Recall, F1 score, and Average Precision (AP). Table 5 reports these metrics for the Independent Validation Test (IVT), as well as for the Phase I inference stage and Phase II filtering stage.
Ground Truth (GT) = 984: the total number of pottery sherd outlines independently drawn across all 100 images of the 100pix IVT dataset by our expert HITL annotator, a Lamanai specialist with fifteen years of pedestrian survey experience in that settlement zone.
Of those 984 annotated sherds:
- 962 were detected by the Two-Phase Workflow (TP)
- 22 were missed by it (FN)
The 984 also includes the 101 sherds that the detector found but the annotator had initially missed—those were added to GT during the HITL adjudication review, bringing the count from the original ~883 up to the final 984.
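The reported counts can be checked against the metric definitions in Section 2.3.8 with a few lines of arithmetic. The function name is illustrative; all numbers are taken from the text, and the precision value is derived here from those counts.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

tp, fp, fn = 962, 685, 22
precision, recall, f1 = detection_metrics(tp, fp, fn)

# Share of the 177,148-candidate master catalog removed by Phase II.
reduction_pct = 100 * (1 - (tp + fp) / 177148)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"candidate reduction={reduction_pct:.1f}%")
```

The counts are internally consistent: TP + FP equals the 1647 filtered detections, TP + FN equals the 984 ground-truth annotations, and the 786 initial false positives less the 101 reclassified detections gives the final 685.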
Phase II metrics were computed using radius-coverage matching at R = 1.0 m against 984 ground-truth annotations, accounting for the ±0.5 m horizontal positional uncertainty of the DJI Mavic 3 Classic under GNSS-only operation [34]. Phase I AP was computed using centroid-distance matching at the same radius.
The relationship between Phase I inference and Phase II chromatic filtering constitutes a three-stage reduction funnel: raw geometric candidates (727,991) → HSV-enriched master catalog (177,148) → chromatically filtered high-confidence detections (1647). The effect of Phase II chromatic filtering on precision is illustrated in Figure 11.
Phase I inference produced 727,991 raw geometric candidates across 13,992 tiles. Initial chromatic enrichment retained 177,148 candidates in the master catalog—representing all detections meeting minimum geometric confidence thresholds and carrying full HSV metadata. Phase II chromatic filtering applied stringent hue, saturation, value, and hue-standard-deviation thresholds derived from the circular statistical analysis of the iron oxide-rich clay signature established as the Maya Pottery Red (MPR) norm, reducing the candidate set to 1647 high-confidence detections submitted for expert validation.
The 1647 chromatically filtered detections represent the Two-Phase Workflow’s final validated output, reduced from 177,148 candidates retained in the Phase I master catalog. Critically, that catalog and the Phase II master catalog remain available as reproducible assets: any re-parameterization of the Phase II chromatic thresholds can be applied to the full 177,148-candidate set in approximately 1.2 s per iteration, enabling rapid sensitivity analysis without repeating the computationally intensive inference stage.
These findings indicate that hue-based post-inference filtering can be reliably applied to unseen data without retraining the underlying detection model (see Figure 9).

4. Discussion

This study sought to determine whether low-altitude UAV imaging coupled with hue-aware object detection could reliably identify ancient Maya pottery fragments in situ.
The detection suppression documented in Section 3.2 demonstrates the potency of chromatic priors as a selective pressure under extreme class imbalance—sufficient to dominate optimization dynamics when not balanced by an explicit reward structure for valid detections.
This finding establishes an important architectural constraint for chromatically weighted region-based detectors: stable single-phase convergence likely requires either (1) a bidirectional formulation that rewards on-target hue while penalizing off-target hue, thereby preventing the zero-detection path from becoming loss-minimizing, or (2) decoupling of geometric proposal generation and chromatic validation into separate operational stages. The latter approach was adopted in the present study through the Two-Phase Workflow described in Section 2.3.6.
These results establish that the MPR chromatic signature is radiometrically dominant in the gradient landscape: when encoded as a static loss penalty, hue-weighted gradients suppress all other feature learning. The integrated Two-Phase Workflow—combining Cascade Mask R-CNN candidate generation with HSV NumPy-based chromatic filtering—exploits this dominance post-inference, demonstrating consistent efficacy in isolating artifacts within the complex agricultural landscapes of northern Belize.
By prioritizing HSV-encoded spectral metadata, the workflow preserved radiometric fidelity while establishing hue as a primary discriminant variable. These findings indicate that spectrally faithful HSV-domain processing provides a principled mechanism for achieving the chromatic discrimination precision required for effective artifact-scale UAV detection in high-clutter environments.

4.1. Chromatic Priors as a Diagnostic and Design Tool

This study demonstrates that chromatic priors can play a meaningful role in small-object detection, particularly in environments where geometric and textural cues alone are insufficient for reliable discrimination. The utility of chromatic priors extends to any domain where targets are defined by distinctive, non-natural spectral signatures—such as search-and-rescue operations, environmental monitoring, commercial branding, industrial automation, and traffic analysis—where the target’s hue remains a reliable diagnostic even when its geometry is obscured by environmental noise. The Hue-Weighted Loss Function (HWLF) provides a formal mechanism for encoding hue as a circular variable within the optimization process, allowing domain-specific chromatic priors to be explicitly reified within the model training architecture.
The experimental results show that direct integration of the HWLF into end-to-end training under severe class imbalance induces radiometric dominance: hue-weighted penalties suppress shape and texture gradients, driving the optimization process toward the degenerate all-background solution documented in Section 3.2. This behavior is interpreted as empirical evidence of the discriminative strength of the MPR chromatic signature within the gradient landscape—a finding with direct implications for chromatic prior design in imbalanced detection tasks [38].
This study did not pursue mitigation of the HWLF extinction behavior—whether through dynamic loss modulation, focal loss integration, or learning rate scheduling—as the Two-Phase Workflow satisfied the operational requirements of the project. The HWLF formulation is offered to the field as a starting point; dynamic chromatic loss modulation consistent with focal loss modulation principles [27,29,39] and time-varying penalty schedules are explored as future directions in Section 4.5.

4.2. Decoupled Architectures for Stable Deployment

Because it preserves precise chromatic values across the full sensor dynamic range, the 16-bit HSV pipeline was a central design feature of the detection pipeline from the outset. When end-to-end HWLF training proved unstable under severe class imbalance, the existing HSV analytical infrastructure facilitated the transition to a post-inference operation, motivating the adoption of the decoupled Two-Phase Workflow.
The Two-Phase Workflow resolves the optimization instabilities observed during end-to-end training through modular separation of geometric proposal generation and chromatic validation. Within this framework, hue functions not as a generic learned feature, but as a physically grounded deterministic chromatic constraint. This constraint is anchored in the empirical MPR norm established in Section 2.1.1 and is operationalized through the circular distance metrics defined in Equations (1) and (2).
This separation preserves explicit chromatic measurement outside the gradient optimization process while allowing hue to function as an interpretable discriminant variable. By gating candidates against the MPR centroid (Section 2.3.6) using the extraction protocols defined in Equation (8), the system maintains radiometric fidelity in a controlled and interpretable manner. In contrast, standard RGB-based pipelines typically operate on reduced bit-depth representations, in which conversion from higher-precision color spaces can compress subtle chromatic variations, limiting the fidelity of hue-based discrimination.
Viewed more broadly, the workflow represents a transition from a purely learned representation toward a hybrid domain-informed detection system capable of maintaining stable performance despite the extreme class imbalance inherent in archaeological surface survey.

4.3. Implications for Remote Sensing and Archaeological Survey

The results have direct implications for UAV-based archaeological survey, where detection targets are often small, sparse, and visually ambiguous. The ability to reduce false positives by over 99% while preserving the majority of true detections suggests that chromatic refinement can substantially improve the efficiency of downstream human validation workflows. The use of 16-bit imagery derived from raw sensor data preserves radiometric fidelity that is typically lost in standard orthomosaic pipelines. This enables more reliable exploitation of subtle spectral differences, such as the “Maya Pottery Red” (MPR) signature, which may not be recoverable from compressed imagery. The centroid-buffer evaluation framework further reflects practical constraints in UAV data acquisition, explicitly accounting for positional uncertainty in non-RTK systems. This reinforces the importance of aligning evaluation methodology with sensor characteristics in applied remote sensing contexts.

4.4. Limitations and Future Work

Despite these advances, several limitations remain. The effectiveness of the approach depends on the presence of a stable and discriminative chromatic signature, which may not be available in all environments. Variability in illumination, soil moisture, and sensor calibration can introduce shifts in observed hue, potentially reducing robustness across acquisition conditions.
The present study validates the chromatic decoupling framework within a single environmental and ceramic context; broader transferability across landscapes, sensors, and ceramic assemblages remains a subject for future evaluation. This geographic specificity is a byproduct of the high-resolution requirements of the study, which necessitated a deep focus on the Lamanai residential zones to establish the MPR baseline.
Future work will focus on expanding validation across more diverse datasets and environmental contexts, as well as integrating illumination normalization and color constancy techniques to improve cross-session stability. Additionally, incorporating complementary features such as texture, morphology, and spatial context may further enhance detection performance in cases where chromatic information alone is insufficient. Finally, the HWLF itself remains a promising area for further investigation. While its direct application in end-to-end training proved unstable in this context, alternative formulations incorporating balanced reward structures or adaptive weighting may enable more effective integration of chromatic priors within learned optimization frameworks.

4.5. Future Directions: Bidirectional Hue-Weighted Loss and Dynamic Penalty Scheduling

The unidirectional HWLF penalty produced detection extinction within 100–300 iterations by leaving the degenerate all-background path open (Section 3.2). Future implementations should adopt a bidirectional reward–penalty architecture—incentivizing detections within the MPR window while penalizing those outside it—to close this path. A time-varying penalty schedule, analogous to curriculum learning, would allow geometric competence to develop before high-magnitude chromatic pressures are introduced.
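
The contrast between the two formulations can be sketched as follows. The cubic form `1 + alpha * delta**3` follows the HWLF penalty given in the Figure 7 caption; everything else is a hypothetical illustration: the normalisation of the hue distance to [0, 1], the signed reward term, the window width, and the linear ramp. How such a term would enter the classification loss is left open here.

```python
def hue_delta(hue_deg: float, centroid_deg: float = 40.6) -> float:
    """Circular hue distance from the centroid, normalised to [0, 1]."""
    diff = abs(hue_deg - centroid_deg) % 360.0
    return min(diff, 360.0 - diff) / 180.0


def unidirectional_weight(delta: float, alpha: float) -> float:
    """Cubic penalty of the form w = 1 + alpha * delta**3 (penalty only)."""
    return 1.0 + alpha * delta ** 3


def bidirectional_term(delta: float, alpha: float, beta: float, window: float) -> float:
    """Hypothetical signed term: negative (reward) inside the window,
    positive (penalty) outside it."""
    if delta <= window:
        return -beta * (1.0 - delta / window)
    return alpha * (delta - window) ** 3


def penalty_scale(iteration: int, start: int, ramp: int) -> float:
    """Linear ramp from 0 to 1 beginning at `start` and lasting `ramp` iterations."""
    if iteration <= start:
        return 0.0
    return min(1.0, (iteration - start) / ramp)
```

Multiplying the chromatic term by `penalty_scale` keeps it at zero during early training and introduces it gradually, which is one simple way to realise a curriculum-style schedule.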
In the documented run, extinction was complete by iteration 259, within the warmup phase (target: iteration 519, learning rate $1 \times 10^{-5}$), at a learning rate of $5.18 \times 10^{-6}$, approximately 52% of peak. The HWLF chromatic penalty was therefore sufficient to redirect gradient optimization before the optimizer reached its target rate. Future implementations should delay chromatic penalty activation until geometric detection has stabilized. Cross-run characterization of extinction onset was precluded in the present study, as fg_cls_accuracy and false_negative rate were logged in only a single run; future experiments should record these metrics throughout training.
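
One way to decide when geometric detection "has stabilized" is to gate activation on the logged foreground accuracy rather than on a fixed iteration count. The sketch below is an assumption-laden illustration: the function name, the threshold, and the patience value are hypothetical, and the log is taken to be an ordered list of (iteration, fg_cls_accuracy) pairs.

```python
def activation_iteration(fg_accuracy_log, threshold=0.9, patience=5):
    """Return the first logged iteration at which accuracy has stayed at or
    above `threshold` for `patience` consecutive log entries, else None.

    fg_accuracy_log: list of (iteration, fg_cls_accuracy) pairs in order.
    """
    streak = 0
    for iteration, accuracy in fg_accuracy_log:
        # Any dip below the threshold resets the run of stable entries.
        streak = streak + 1 if accuracy >= threshold else 0
        if streak >= patience:
            return iteration
    return None
```

A gate of this kind requires that fg_cls_accuracy be logged throughout training, which is consistent with the recommendation above.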

4.6. The Two-Phase Workflow as a Decoupled Analytical Alternative

The gradient extinction observed under unidirectional HWLF training establishes a critical boundary condition: when a chromatic prior exerts selective pressure strong enough to collapse detection, decoupling it from the initial optimization process becomes an architectural necessity. The Two-Phase Workflow addresses this by separating geometric candidate generation from chromatic validation. Rather than allowing the MPR spectral prior to compete with the geometric learning signal for gradient influence, the Two-Phase architecture uses it as a deterministic filter operating on the pre-computed geometric candidate catalog.
This modularity preserves the full analytical infrastructure of the HWLF while applying chromatic constraints post-inference, allowing the MPR signal to function as a precise discriminant without triggering the degenerate suppression paths documented in Section 3.2.
By isolating the MPR signal within a post-inference phase, the system achieves a discriminative precision that proved unstable when the same signal was integrated directly into backbone training. The decoupled architecture is therefore matched to the specific radiometric and geometric properties of the target.
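
The deterministic filter over a pre-computed catalog can be sketched as follows. The 0.70/0.30 weighting, the 40.6° centroid, and the 18.0 saturation floor are taken from this paper; the `Candidate` record, the linear `chromatic_conformity` function, and its 20° tolerance are hypothetical stand-ins for the calibrated Phase II scoring.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    object_id: str
    circ_hue: float     # circular mean hue of mask pixels, degrees
    saturation: float   # mean saturation of mask pixels (hypothetical field)
    geom_score: float   # Phase I geometric confidence in [0, 1]


def chromatic_conformity(hue_deg, centroid_deg=40.6, tol_deg=20.0):
    """Hypothetical conformity: 1 at the centroid, falling linearly to 0 at tol_deg."""
    diff = abs(hue_deg - centroid_deg) % 360.0
    dist = min(diff, 360.0 - diff)
    return max(0.0, 1.0 - dist / tol_deg)


def rank_candidates(catalog, sat_floor=18.0):
    """Filter by the saturation floor and hue window, then sort by composite score."""
    scored = []
    for c in catalog:
        if c.saturation < sat_floor:
            continue
        colour = chromatic_conformity(c.circ_hue)
        if colour == 0.0:
            continue  # outside the hue window
        scored.append((0.70 * colour + 0.30 * c.geom_score, c.object_id))
    return sorted(scored, reverse=True)
```

Because the filter reads stored attributes rather than re-running inference, thresholds can be changed and the catalog re-ranked without touching the Phase I model.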

5. Conclusions

This study investigated the integration of chromatic priors into a deep-learning detection pipeline for UAV-based archaeological survey. An HSV pipeline was employed to provide high-quality chromatic data for hue discrimination, and a Hue-Weighted Loss Function (HWLF) was developed to bias model training toward the spectral signature of Maya Pottery Red (MPR) sherds by encoding circular hue distance within the classification loss. Experimental results demonstrated that, under conditions of severe class imbalance, the chromatic penalty exerted sufficient selective pressure to drive gradient extinction within 300 iterations—a failure mode in which the model collapses to an all-background solution. This outcome was reproducible across multiple configurations and constitutes a diagnostic finding: it confirms that MPR hue is a sufficiently dominant discriminant to overwhelm geometric gradient signals when applied end-to-end under the constraints of the present dataset.
In response to this behavior, a Two-Phase Workflow was developed to decouple geometric candidate generation from chromatic validation. This architectural transition resolved the gradient extinction issues observed during initial training. Applied to an independent validation dataset of 100 UAV images, the Two-Phase Workflow reduced the Phase I candidate pool of 177,148 geometric detections to 1647 prioritized candidates—a 99.1% reduction—while retaining 97.8% of annotated ground-truth sherds (F1 = 0.731). Empirical validation indicates that HSV-based post-inference refinement provides the discrimination thresholds required for viable surface survey, effectively mitigating the false-positive proliferation characteristic of UAV-based detection in high-clutter archaeological environments.
The results indicate that, under extreme class imbalance, chromatic priors are more effective when applied deterministically to radiometrically faithful imagery than when integrated into gradient-based optimization. The 16-bit HSV pipeline preserves the full dynamic range of the sensor output, enabling precise isolation of the MPR spectral centroid at h = 40.6° with saturation floor s ≥ 18.0. These results characterize the surface assemblages of Northern Belize and validate the chromatic decoupling framework within a single environmental and ceramic context; broader transferability across landscapes, sensors, and assemblage types remains a subject for future evaluation. Future work will investigate bidirectional loss formulations, dynamic penalty scheduling, and multi-site transfer validation. The framework nonetheless offers a scalable template for integrating domain-specific chromatic priors into other archaeological and remote sensing contexts. In principle, it extends to detection tasks in which the target class presents a stable chromatic signature, such as search-and-rescue and environmental monitoring, where spectral priors can serve as post-inference gatekeepers.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/rs18111836/s1; Table S1: Phase I Master Catalog—177,148 geometry-scored detection candidates (NewRiver_Phase1_Master_Catalog_Supplementary.zip); Table S2: Phase II Final Detections—1647 post-filter MPR sherd candidates including shapefile and processing script (NewRiver_Phase2_Final_Detections_Supplementary.zip). Additional information can be found at https://doi.org/10.7945/3as8-z409 (accessed on 20 March 2026).

Author Contributions

Conceptualization, A.M.; methodology, B.B. and A.M.; software, B.B.; validation, B.B. and A.M.; formal analysis, B.B., A.M. and N.D.; field data collection, A.M. and B.B.; data curation, B.B.; writing (original draft preparation), B.B.; writing (review and editing), A.M. and N.D.; visualization, B.B.; supervision, A.M. and N.D.; project administration, B.B. and A.M.; funding acquisition, B.B. and A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the 2025 BearcatAI Grant Award Program, University of Cincinnati.

Data Availability Statement

The data presented in this study are openly available in Scholar@UC at https://scholar.uc.edu/concern/datasets/h702q793b?locale=en (accessed on 20 March 2026). The available dataset includes high-resolution drone imagery tiles, geometric proposals from the Phase I detection model, the final validated pottery sherd catalogs, and the executable versions of the Two-Phase Workflow inference and filtering method as New River Pottery Sherd Detection System v1.a, a 64-bit Windows executable file (.exe).

Acknowledgments

The following provided invaluable logistical and technical support: the Department of Anthropology and the Department of Geography & GIS at the University of Cincinnati, and Helen Haines of the Ka’kabish Archaeological Research Project. The authors acknowledge the use of AI-assisted tools in the preparation of this manuscript, including Anthropic Claude Code 2.1.126 and Claude.ai, Google Gemini v.3.5, Microsoft Copilot v1.25054.80.0, ESRI ArcGIS Pro 3.3.2, Affinity Photo 1.10.6.1665, Anaconda 24.11.3, and Zotero 7.0.15, for tasks including pipeline development, literature organization, and manuscript preparation. All AI-assisted outputs were reviewed and edited by the authors, who take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AGL: Above Ground Level
AI: Artificial Intelligence
CE: Cross-Entropy
DEM: Digital Elevation Model
DNG: Digital Negative (a raw image file format)
GIS: Geographic Information Systems
HITL: Human-in-the-Loop
HSV: Hue, Saturation, Value (color space)
HWLF: Hue-Weighted Loss Function
IoU: Intersection over Union
IVT: Independent Validation Test
JSON: JavaScript Object Notation
LiDAR: Light Detection and Ranging
mAP: mean Average Precision
MPR: Maya Pottery Red
ORCID iD: Open Researcher and Contributor ID
RGB: Red, Green, Blue (color space)
ROI: Region of Interest
RTK: Real-Time Kinematic
SAHI: Slicing Aided Hyper Inference
SOD: Small Object Detection
UAV: Unmanned Aerial Vehicle
ViT: Vision Transformer
ViTDet: Vision Transformer Detector

References

  1. Casana, J.; Wiewel, A.; Cool, A.; Hill, A.C.; Fisher, K.D.; Laugier, E.J. Archaeological aerial thermography in theory and practice. Adv. Archaeol. Pract. 2017, 5, 310–327. [Google Scholar] [CrossRef]
  2. Orengo, H.A.; Garcia-Molsosa, A. A brave new world for archaeological survey: Automated machine learning-based potsherd detection using high-resolution drone imagery. J. Archaeol. Sci. 2019, 112, 105013. [Google Scholar] [CrossRef]
  3. Katz, S.A.; Kimmel, A.P.; Wilk, E. Fieldwalking into the Twenty-First Century: The Enduring Strengths of Pedestrian Survey and Opportunities for Innovation. SAA Archaeol. Rec. 2026, 26, 33–37. [Google Scholar]
  4. McLellan, A.; Haines, H.R.; Bacon, J. Lidar, Hydrology, and Wetland Management Strategies in the Periphery of Lamanai, Belize. Lat. Am. Antiq. 2026, 1–16. [Google Scholar] [CrossRef]
  5. Rushton, E.A.; Metcalfe, S.E.; Whitney, B.S. A late-Holocene vegetation history from the Maya lowlands, Lamanai, Northern Belize. Holocene 2013, 23, 485–493. [Google Scholar] [CrossRef]
  6. McLellan, A. From Lamanai to Ka’kabish: Human and Environment Interaction, Settlement Change, and Urbanism in Northern Belize. Doctoral Dissertation, UCL (University College London), London, UK, 2020. Available online: https://discovery.ucl.ac.uk/id/eprint/10089565/ (accessed on 1 March 2026).
  7. Willey, G.R.; Culbert, T.P.; Adams, R.E. Maya Lowland ceramics: A report from the 1965 Guatemala City conference. Am. Antiq. 1967, 32, 289–315. [Google Scholar] [CrossRef]
  8. Smith, R.E.; Gifford, J.C. Pottery of the Maya Lowlands. In Handbook of Middle American Indians, Volumes 2 and 3: Archaeology of Southern Mesoamerica; University of Texas Press: Austin, TX, USA, 1965; pp. 498–534. [Google Scholar] [CrossRef]
  9. Inomata, T.; Triadan, D.; Vázquez López, V.A.; Fernandez-Diaz, J.C.; Omori, T.; Méndez Bauer, M.B.; García Hernández, M.; Beach, T.; Cagnato, C.; Aoyama, K. Monumental architecture at Aguada Fénix and the rise of Maya civilization. Nature 2020, 582, 530–533. [Google Scholar] [CrossRef] [PubMed]
  10. Kokalj, Ž.; Džeroski, S.; Šprajc, I.; Štajdohar, J.; Draksler, A.; Somrak, M. Machine learning-ready remote sensing data for Maya archaeology. Sci. Data 2023, 10, 558. [Google Scholar] [CrossRef]
  11. Character, L.; Beach, T.; Inomata, T.; Garrison, T.G.; Luzzadder-Beach, S.; Baldwin, J.D.; Cambranes, R.; Pinzón, F.; Ranchos, J.L. Broadscale deep learning model for archaeological feature detection across the Maya area. J. Archaeol. Sci. 2024, 169, 106022. [Google Scholar] [CrossRef]
  12. Britton, B.J.; McLellan, A.; Brewer, J.; Carr, C.; Dunning, N.; Liu, L. Evaluating Broadscale Deep Learning for Maya Settlement Detection in G-LiHT Lidar. J. Archaeol. Method Theory 2025, 33, 15. [Google Scholar] [CrossRef]
  13. Thompson, A.E. Detecting Classic Maya settlements with lidar-derived relief visualizations. Remote Sens. 2020, 12, 2838. [Google Scholar] [CrossRef]
  14. Kokalj, Ž. Standardizing Visualization in Ancient Maya Lidar Research: Techniques, Challenges and Recommendations. Archaeol. Prospect. 2025, 32, 967–988. [Google Scholar] [CrossRef]
  15. Argyrou, A.; Agapiou, A.; Papakonstantinou, A.; Alexakis, D.D. Comparison of machine learning pixel-based classifiers for detecting archaeological ceramics. Drones 2023, 7, 578. [Google Scholar] [CrossRef]
  16. Agapiou, A.; Sarris, A.; Papadopoulos, N. Remote Sensing for Cultural Heritage Management: From Site Detection to Digital Documentation. Remote Sens. 2021, 13, 2115. [Google Scholar] [CrossRef]
  17. Agapiou, A.; Vionis, A.; Papantoniou, G. Detection of archaeological surface ceramics using deep learning image-based methods and very high-resolution UAV imageries. Land 2021, 10, 1365. [Google Scholar] [CrossRef]
  18. Adamopoulos, E.; Papadopoulou, E.E.; Mpia, M.; Deligianni, E.O.; Papadopoulou, G.; Athanasoulis, D.; Konioti, M.; Koutsoumpou, M.; Anagnostopoulos, C.N. 3D survey and monitoring of ongoing archaeological excavations via terrestrial and drone LIDAR. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2023, 10, 3–10. [Google Scholar] [CrossRef]
  19. Van De Sande, K.; Gevers, T.; Snoek, C. Evaluating color descriptors for object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1582–1596. [Google Scholar] [CrossRef]
  20. Bouchard, L.F.; Lazreg, M.B.; Toews, M. Coloring Deep CNN Layers with Activation Hue Loss. arXiv 2023, arXiv:2310.03911. [Google Scholar] [CrossRef]
  21. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2017; pp. 2980–2988. [Google Scholar] [CrossRef]
  22. Gao, L.; Fu, P.; Xu, M.; Wang, T.; Liu, B. UMINet: A unified multi-modality interaction network for RGB-D and RGB-T salient object detection. Vis. Comput. 2024, 40, 1565–1582. [Google Scholar] [CrossRef]
  23. Pérez, M.; de Lucio, O.G.; Mitrani, A.; Lope, C.P.; Cruz Alvarado, W.; Sobral, H.; Márquez Herrera, C.; Ortiz Ruiz, S. Tapping into the Past: First Approach to a Diachronic Material Characterization of Mayapán Pottery. Ceramics 2025, 8, 131. [Google Scholar] [CrossRef]
  24. Wang, D.; Zhu, H.; Zhao, Y.; Shi, J. A method for constructing a loss function for multi-scale object detection networks. Sensors 2025, 25, 1738. [Google Scholar] [CrossRef] [PubMed]
  25. Kendall, A.; Gal, Y.; Cipolla, R. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2018; pp. 7482–7491. [Google Scholar] [CrossRef]
  26. Shrivastava, A.; Gupta, A.; Girshick, R. Training region-based object detectors with online hard example mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2016; pp. 761–769. [Google Scholar] [CrossRef]
  27. Su, Z.; Adam, A.; Nasrudin, M.F. Adaptive focal loss for keypoint-based deep learning detectors addressing class imbalance. IEEE Access 2025, 13, 31842–31856. [Google Scholar] [CrossRef]
  28. Soroush, M.; Mehrtash, A.; Khazraee, E.; Ur, J.A. Deep learning in archaeological remote sensing: Automated qanat detection in the Kurdistan region of Iraq. Remote Sens. 2020, 12, 500. [Google Scholar] [CrossRef]
  29. Esri. ArcGIS Pro (Version 3.x). Environmental Systems Research Institute. 2024. Available online: https://www.esri.com/en-us/arcgis/products/arcgis-pro/overview (accessed on 20 March 2026).
  30. Li, Y.; Mao, H.; Girshick, R.; He, K. Exploring Plain Vision Transformer Backbones for Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2022; pp. 19118–19128. [Google Scholar]
  31. Wu, Y.; Kirillov, A.; Massa, F.; Lo, W.Y.; Girshick, R. Detectron2. 2019. Available online: https://github.com/facebookresearch/detectron2 (accessed on 20 March 2026).
  32. DJI. Mavic 3 Classic Specs. Available online: https://www.dji.com/mavic-3-classic/specs (accessed on 30 March 2026).
  33. Akyon, F.C.; Altinuc, S.O.; Temizel, A. Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection. In Proceedings of the IEEE International Conference on Image Processing (ICIP); IEEE: Piscataway, NJ, USA, 2022; pp. 966–970. [Google Scholar] [CrossRef]
  34. QGIS Development Team. QGIS Geographic Information System. Open Source. Geospatial Foundation Project. 2024. Available online: https://qgis.org (accessed on 20 March 2026).
  35. Zimmer-Dauphinee, J.; VanValkenburgh, P.; Wernke, S.A. Eyes of the machine: AI-assisted satellite archaeological survey in the Andes. Antiquity 2024, 98, 245–259. [Google Scholar] [CrossRef]
  36. Elkan, C. The foundations of cost-sensitive learning. In Proceedings of the 17th International Joint Conference on Artificial Intelligence; Lawrence Erlbaum Associates Ltd.: Mahwah, NJ, USA, 2001; Volume 17, pp. 973–978. Available online: https://cseweb.ucsd.edu/~elkan/rescale.pdf (accessed on 20 March 2026).
  37. Liu, Y.; Xu, M.; Xiao, T.; Tang, H.; Hu, Y.; Nie, L. Heterogeneous Feature Collaboration Network for Salient Object Detection in Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5626314. [Google Scholar] [CrossRef]
  38. Xu, M.; Wang, S.; Hu, Y.; Tang, H.; Cong, R.; Nie, L. Cross-Model Nested Fusion Network for Salient Object Detection in Optical Remote Sensing Images. IEEE Trans. Cybern. 2025, 55, 5332–5345. [Google Scholar] [CrossRef]
  39. Xu, M.; Yu, C.; Li, Z.; Tang, H.; Hu, Y.; Nie, L. HDNet: A Hybrid Domain Network with Multiscale High-Frequency Information Enhancement for Infrared Small-Target Detection. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5004115. [Google Scholar] [CrossRef]
Figure 1. End-to-end pipeline for UAV-based Maya pottery sherd detection. The workflow integrates 16-bit HSV conversion, Cascade Mask R-CNN with a ViTDet backbone for candidate generation, and Independent Validation on unseen imagery to isolate the Maya Pottery Red (MPR) signature.
Figure 2. The Lamanai region includes these two platform mounds near Indian Church Village in Belize. Sherd outlines used for training are shown in light green in successive insets.
Figure 3. Two-Phase Workflow architecture implementing Chromatic Signal Isolation. Phase I (Inferencing) and Phase II (Filtering) operate as independent stages, allowing the radiometrically dominant MPR chromatic signature to function as a post-inference discriminant without destabilizing gradient optimization. This decoupled architecture exploits the discriminative strength of the MPR hue prior, while preserving stable geometric feature learning in Phase I.
Figure 4. Context-to-detection validation sequence. (a) Full scene context: UAV nadir imagery of survey site with sherd location flagged in the field (yellow circle). (b) Detection chip: 128 × 128 px tile centered on a candidate detection showing the raw model output. (c) HITL validation: operator shapefile overlay on the detection chip with chromatic metrics extracted from the mask perimeter.
Figure 5. Maya Pottery Red (hexadecimal value #A46F43) compared to a detection of a sherd of about 1.5 cm², approximately the size of a fingernail. Sherd color is more distinguishable than its shape.
Figure 6. Sequence showing a drone image of two flags and a sherd; the HITL shapefile drawn around the sherds and flags; pixels within the overlay perimeter extracted as a NumPy/JSON array; 2-pixel edge erosion applied; and the target’s colorimetric metrics recorded as HSV NumPy/JSON and GeoTIFF.
Figure 7. Integrated Hue-Weighted Penalty and ‘Two-Phase’ Selective Filtering for Maya Pottery Detection. (A) Comparison of the HWLF cubic penalty ($w_{\mathrm{hue}} = 1 + \alpha \cdot \delta_i^{3}$) against a linear control, highlighting the plateau of low gradient sensitivity. (B) Process flow of the ‘Two-Phase Workflow’ training and inference logic. (C) Comparative field examples of a pottery sherd (True Positive, h ≈ 41°) and vegetation fragment (False Positive, w ≈ 1.86), demonstrating the utility of chromatic weighting in cases of high geometric similarity.
Figure 8. Decoupled Phase Two Chromatic Filtering Workflow, detailing the Stage II analytical gatekeeper.
Figure 9. Phase II chromatic calibration output: ranked candidate detections sorted by composite score (0.70 × chromatic conformity to the MPR spectral centroid at h = 40.6°, s ≥ 18.0; 0.30 × geometric score), illustrating separation between Maya Pottery Red target responses and false-positive soil and vegetation detections. Top 42 ranked candidates shown. Phase II parameters calibrated against expert-validated annotations: Precision = 58.4%, Recall = 97.8%, F1 = 0.731 (1647 final detections from 177,148 Phase I candidates; 99.1% candidate reduction).
Figure 10. HWLF training loss trajectory demonstrating detection extinction (output5_pseudo_rgb_transfer; model_0004999.pth). Upper panel: foreground classification accuracy (fg_cls_accuracy, blue) and false negative rate (FN, red) showing simultaneous collapse to 0.0 and 1.0, respectively, by iteration 259. Lower panel: total loss declining from 1.69 toward ~0.44 after extinction, a numerically stable degenerate convergence in which the optimizer successfully minimizes loss by predicting all-background. Extinction occurred entirely within the learning rate warmup phase (warmup peak at iteration 519, lr = $1 \times 10^{-5}$); at the point of extinction the learning rate stood at $5.18 \times 10^{-6}$ (52% of peak). Training was configured for 50,000 iterations; the run terminated at approximately 6000 iterations due to degenerate convergence.
Figure 11. Precision-Recall curves for the Two-Phase Workflow. The results illustrate the efficacy of decoupling chromatic signatures from geometric candidate generation; applying the Maya Pottery Red (MPR) filter during Phase II yields a substantial precision gain by removing 99.1% of Phase I candidates while maintaining high recall (97.8%) for sub-decimeter targets.
Table 1. Comparison of the traditional archaeological survey technique known as Pedestrian Survey (also “Field Walking”) with UAV-based AI object detection for effective area survey.

| Feature | Traditional Pedestrian Survey | UAV/AI Object Detection Model |
|---|---|---|
| Subjectivity | Highly dependent on “archaeological eye” and fatigue. | Consistent, mathematically defined “Maya Red” threshold. |
| Coverage | Limited to small transects (e.g., 5–15 m spacing). | Rapid, 100% coverage of the ground surface via high-res imagery. |
| Speed | Person-hours per hectare are extremely high. | Surveying and processing is rapid; large datasets completed with minimal latency. |
| Precision | Recorded via handheld GPS with variable accuracy. | Targets are located with centimeter accuracy using UAV imaging with precise AGL. |
Table 2. Training session results comparing baseline and hue-weighted loss.

| Metric | RGB Baseline (Hue OFF) | RGB + Hue Test (Hue ON) | HSV Baseline (Hue OFF) | HSV + Hue Test (Hue ON) |
|---|---|---|---|---|
| Total Loss | 0.9853 | 0.5951 | 0.0850 | 0.1635 |
| Cls Loss (Stage 2) | 0.0201 | 0.0429 | 0.0102 | 0.0147 |
| Time (s/iter) | 1.6303 | 2.3927 | 3.7974 | 3.5282 |
| Max Memory | 11,473 M | 11,472 M | 9913 M | 11,471 M |
Table 3. Selected training metrics from the HWLF extinction run (production_hsv_batch3_v36). fg_cls_accuracy and false_negative are reported at Cascade Stage 0 (primary detection stage). BG:FG ratio denotes the background-to-foreground sample ratio in the ROI head per iteration. Rows are grouped by the four training stages described in the text.

| Iteration | Total Loss | fg_cls_acc | false_neg | BG:FG | loss_rpn_cls | Stage |
|---|---|---|---|---|---|---|
| 19 | 272.234 | 1.000 | 0.000 | 2:1 | 109.942 | I. Normal detection |
| 39 | 237.660 | 1.000 | 0.000 | 2:1 | 108.259 | I. Normal detection |
| 59 | 162.375 | 0.250 | 0.750 | 2:1 | 107.533 | II. Initial collapse |
| 79 | 166.318 | 0.000 | 1.000 | 2:1 | 108.196 | II. First extinction |
| 99 | 127.262 | 0.000 | 1.000 | 2:1 | 104.549 | II. Extinction sustained |
| 199 | 80.525 | 1.000 | 0.000 | 6:1 | 70.114 | III. Extended recovery |
| 499 | 9.913 | 1.000 | 0.000 | 8:1 | 4.612 | III. Extended recovery |
| 999 | 10.315 | 1.000 | 0.000 | 6:1 | 3.114 | III. Extended recovery |
| 1999 | 7.287 | 1.000 | 0.000 | 6:1 | 2.385 | III. Extended recovery |
| 2999 | 2.492 | 1.000 | 0.000 | 43:1 | 0.487 | III. Ratio climbing |
| 3279 | 1.792 | 0.000 | 1.000 | 270:1 | 0.396 | IV. Transition to lock-in |
| 3999 | 1.493 | 0.000 | 1.000 | 511:1 | 0.397 | IV. Permanent lock-in |
| 4999 | 0.728 | 0.000 | 1.000 | 511:1 | 0.196 | IV. Terminal state |
Table 4. Comprehensive attribute schema for the Master Detection Catalog. The catalog enables deterministic Phase II filtering by coupling standard instance segmentation outputs (RLE masks, confidence) with domain-specific chromatic priors (Circular Mean Hue) and morphometric indices. This structured format supports near-instantaneous hyperparameter tuning across the ≈177,000 candidate detections.

| Category | Key Field(s) | Data Type | Functional Description |
|---|---|---|---|
| Indexing | Object_ID | String | Unique UID: {Flight}{Tile}{Detection}. |
| Confidence | Gold_Score | Float | Weighted composite: 0.70 × Color + 0.30 × Geom |
| Chromatic | CIRC_HUE | Float | Circular mean hue (0–360°) of mask pixels. |
| | STD_HUE | Float | Primary discriminator for multi-chromatic noise. |
| Morphology | MASK_PX | Integer | Total pixel count within RLE segmentation. |
| | Circularity | Float | Isoperimetric quotient for shape refinement. |
| Spatial | UTM_Easting | Float | Centroid coordinate (UTM 16N, WGS84). |
| Data Record | RLE_Mask | String | COCO-standard run-length encoded geometry. |
| | Chip_Path | String | Local path to extracted 128 px image chip. |
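
Two of the catalog attributes rest on standard formulas, sketched here for reference. The circular mean and the isoperimetric quotient are textbook definitions; reading STD_HUE as a circular standard deviation derived from the mean resultant length is an assumption, as are the function names.

```python
import math


def circular_mean_hue(hues_deg):
    """Circular mean of hue angles, in degrees, wrapped into the 0 to 360 range."""
    sin_sum = sum(math.sin(math.radians(h)) for h in hues_deg)
    cos_sum = sum(math.cos(math.radians(h)) for h in hues_deg)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0


def circular_std_hue(hues_deg):
    """Circular standard deviation in degrees, from the mean resultant length."""
    n = len(hues_deg)
    sin_mean = sum(math.sin(math.radians(h)) for h in hues_deg) / n
    cos_mean = sum(math.cos(math.radians(h)) for h in hues_deg) / n
    r = min(1.0, math.hypot(sin_mean, cos_mean))
    if r == 0.0:
        return float("inf")  # hues cancel out completely
    return math.degrees(math.sqrt(-2.0 * math.log(r)))


def isoperimetric_quotient(area, perimeter):
    """4*pi*A / P**2: equals 1 for a circle and is smaller for other shapes."""
    return 4.0 * math.pi * area / perimeter ** 2
```

The circular mean matters for reddish targets because hue wraps at 0°/360°: hues of 350° and 10° average to roughly 0° on the circle, whereas an arithmetic mean would give 180°.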
Table 5. IVT performance metrics for the Two-Phase Workflow. Phase I metrics are calculated from 727,991 raw candidates against 984 ground-truth annotations (TP = 962, FP = 727,029, FN = 22); Phase II metrics reflect post-chromatic-filter output (1647 candidates). The identical Recall in both phases (0.978) confirms that the Phase II chromatic filter introduces no additional false negatives; it exclusively removes false positives.

| Model | Detections | Precision | Recall | F1 Score | AP |
|---|---|---|---|---|---|
| Phase I | 727,991 | 0.001 | 0.978 | 0.002 | 0.282 |
| Phase II | 1647 | 0.584 | 0.978 | 0.731 | 0.374 |
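
The Phase II figures can be reproduced from the counts given in the caption. The short check below assumes, as the caption states, that recall is unchanged between phases, so the Phase II true-positive count remains 962 and the remaining 1647 − 962 = 685 detections are false positives; the function name is illustrative.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard detection metrics from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, f1


# Phase II: TP stays at 962, so FP = 1647 - 962 = 685 and FN = 984 - 962 = 22.
p2, r2, f2 = precision_recall_f1(tp=962, fp=1647 - 962, fn=22)
print(round(p2, 3), round(r2, 3), round(f2, 3))  # 0.584 0.978 0.731

# Candidate reduction quoted in the text: 177,148 geometric detections to 1647.
reduction = 1 - 1647 / 177_148
print(f"{reduction:.1%}")  # 99.1%
```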
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Britton, B.; McLellan, A.; Dunning, N. Maya Pottery Red: Hue as a Perceptual Prior for Object Detection in UAV-Based Areal Survey. Remote Sens. 2026, 18, 1836. https://doi.org/10.3390/rs18111836
